better cache

2026-02-10 01:20:58 -05:00
parent c773affbc8
commit f056e67eae
39 changed files with 830 additions and 17 deletions
--- a/openspec/changes/archive/2026-02-10-better-cache/.openspec.yaml
+++ b/openspec/changes/archive/2026-02-10-better-cache/.openspec.yaml
--- a/openspec/changes/archive/2026-02-10-better-cache/design.md
+++ b/openspec/changes/archive/2026-02-10-better-cache/design.md
@@ -0,0 +1,48 @@
+## Context
+
+The site is an Astro static build served via nginx. Content is gathered by build-time ingestion (`site/scripts/fetch-content.ts`) that reads/writes a repo-local cache file (`site/content/cache/content.json`).
+
+Today, repeated ingestion runs can re-hit external sources (YouTube API/RSS, podcast RSS, WordPress `wp-json`) and re-do normalization work. We want a shared caching layer to reduce IO and network load and to make repeated runs faster and more predictable.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Add a Redis-backed cache layer usable from Node scripts (ingestion) with TTL-based invalidation.
+- Use the cache layer to reduce repeated network/API calls and parsing work for:
+  - social content ingestion (YouTube/podcast/Instagram list)
+  - WordPress `wp-json` ingestion
+- Provide a sensible default TTL (1 hour; see Decisions) with an environment override.
+- Add a manual cache clear command/script.
+- Provide verification (tests and/or logs) that cache hits occur and TTL expiration behaves as expected.
+
+**Non-Goals:**
+- Adding a runtime server for the site (the site remains static HTML served by nginx).
+- Caching browser requests to nginx (no CDN/edge cache configuration in this change).
+- Perfect cache coherence across multiple machines/environments (dev+docker is the target).
+
+## Decisions
+
+- **Decision: Use Redis as the shared cache backend (docker-compose service).**
+  - Rationale: Redis is widely adopted, lightweight, supports TTLs natively, and is easy to run in dev via Docker.
+  - Alternative considered: Local file-based cache only. Rejected because it doesn’t provide a shared service and is harder to invalidate consistently.
+
+- **Decision: Cache at the “source fetch” and “normalized dataset” boundaries.**
+  - Rationale: The biggest cost is network + parsing/normalization. Caching raw API responses (or normalized outputs) by source+params gives the best win.
+  - Approach:
+    - Cache keys like `youtube:api:<channelId>:<limit>`, `podcast:rss:<url>`, `wp:posts`, `wp:pages`, `wp:categories`.
+    - Store JSON values, set TTL, and log hit/miss per key.
+
+- **Decision: Default TTL = 1 hour (3600s), configurable via env.**
+  - Rationale: A 1h TTL is a common baseline for content freshness vs load. It also aligns with typical ingestion schedules (hourly/daily).
+  - Allow overrides for local testing and production tuning.
+
+- **Decision: Cache clear script uses Redis `FLUSHDB` in the configured Redis database.**
+  - Rationale: Simple manual operation and easy to verify.
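+  - Guardrail: `FLUSHDB` only removes keys in the currently selected DB index (`0` by default), so point the cache at a dedicated index when the Redis instance is shared; that keeps the script scoped.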
+
+## Risks / Trade-offs
+
+- [Risk] Redis introduces a new dependency and operational moving part. -> Mitigation: Keep Redis optional; ingestion should fall back to no-cache mode if Redis is not reachable.
+- [Risk] Stale content if TTL too long. -> Mitigation: Default to 1h and allow env override; provide manual clear command.
+- [Risk] Cache key mistakes lead to wrong content reuse. -> Mitigation: Centralize key generation and add tests for key uniqueness and TTL behavior.
+
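The key scheme listed under the caching-boundary decision, together with the "centralize key generation" mitigation, could be sketched as a single helper. This is a minimal TypeScript sketch and not the commit's implementation: `cacheKey` and its encoding rule are hypothetical.

```typescript
// Hypothetical helper: builds namespaced cache keys such as
// `youtube:api:<channelId>:<limit>` from a source, a kind, and parameters.
function cacheKey(
  source: string,
  kind: string,
  ...params: Array<string | number>
): string {
  // Encode each part so a ":" inside a value (for example, a feed URL)
  // cannot be mistaken for the separator and make two requests collide.
  const parts = [source, kind, ...params].map((part) =>
    encodeURIComponent(String(part)),
  );
  return parts.join(":");
}
```

With this sketch, `cacheKey("youtube", "api", "UC123", 10)` yields `youtube:api:UC123:10`, and a podcast feed URL is percent-encoded so it stays a single key segment.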
--- a/openspec/changes/archive/2026-02-10-better-cache/proposal.md
+++ b/openspec/changes/archive/2026-02-10-better-cache/proposal.md
@@ -0,0 +1,28 @@
+## Why
+
+Reduce IO and external fetch load by adding a shared caching layer so repeated requests for the same content do not re-hit disk/network unnecessarily.
+
+## What Changes
+
+- Add a caching layer (Redis or similar lightweight cache) used by the site’s data/ingestion flows.
+- Add a cache service to `docker-compose.yml`.
+- Define a default cache invalidation interval (TTL) and allow it to be overridden via environment variables.
+- Add a script/command to manually clear the cache on demand.
+- Add verification that the cache is working (cache hits/misses and TTL behavior).
+
+## Capabilities
+
+### New Capabilities
+- `cache-layer`: Provide a shared caching service (Redis or equivalent) with TTL-based invalidation and a manual clear operation for the website’s data flows.
+
+### Modified Capabilities
+- `social-content-aggregation`: Use the cache layer to avoid re-fetching or re-processing external content sources on repeated runs/requests.
+- `wordpress-content-source`: Use the cache layer to reduce repeated `wp-json` fetches and parsing work.
+
+## Impact
+
+- Deployment/local dev: add Redis (or equivalent) to `docker-compose.yml` and wire environment/config for connection + TTL.
+- Scripts/services: update ingestion/build-time fetch to read/write via cache and log hit/miss for verification.
+- Tooling: add a cache-clear script/command (and document usage).
+- Testing: add tests or a lightweight verification step proving cached reads are used and expire as expected.
+
--- a/openspec/changes/archive/2026-02-10-better-cache/specs/cache-layer/spec.md
+++ b/openspec/changes/archive/2026-02-10-better-cache/specs/cache-layer/spec.md
@@ -0,0 +1,38 @@
+## ADDED Requirements
+
+### Requirement: Redis-backed cache service
+The system MUST provide a Redis-backed cache service for use by ingestion and content processing flows.
+
+The cache service MUST be runnable in local development via Docker Compose.
+
+#### Scenario: Cache service available in Docker
+- **WHEN** the Docker Compose stack is started
+- **THEN** a Redis service is available to other services/scripts on the internal network
+
+### Requirement: TTL-based invalidation
+Cached entries MUST support TTL-based invalidation.
+
+The system MUST define a default TTL and MUST allow overriding the TTL via environment/config.
+
+#### Scenario: Default TTL applies
+- **WHEN** a cached entry is written without an explicit TTL override
+- **THEN** it expires after the configured default TTL
+
+#### Scenario: TTL override applies
+- **WHEN** a TTL override is configured via environment/config
+- **THEN** new cached entries use that TTL for expiration
+
+### Requirement: Cache key namespace
+Cache keys MUST be namespaced by source and parameters so that different data requests do not collide.
+
+#### Scenario: Two different sources do not collide
+- **WHEN** the system caches a YouTube fetch and a WordPress fetch
+- **THEN** they use different key namespaces and do not overwrite each other
+
+### Requirement: Manual cache clear
+The system MUST provide a script/command to manually clear the cache.
+
+#### Scenario: Manual clear executed
+- **WHEN** a developer runs the cache clear command
+- **THEN** the cache is cleared and subsequent ingestion runs produce cache misses
+
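The TTL and manual-clear scenarios above can be exercised without a running Redis server if the get/set logic sits behind a small class with an injectable clock. The sketch below is an assumption-laden stand-in: `MemoryJsonCache` is hypothetical, and an in-memory `Map` replaces Redis purely for illustration.

```typescript
// Hypothetical in-memory stand-in for the Redis-backed cache; it only
// illustrates the default-TTL, TTL-override, and manual-clear scenarios.
class MemoryJsonCache {
  private entries = new Map<string, { json: string; expiresAt: number }>();
  private defaultTtlSeconds: number;
  private now: () => number; // returns milliseconds

  constructor(defaultTtlSeconds: number, now: () => number = () => Date.now()) {
    this.defaultTtlSeconds = defaultTtlSeconds;
    this.now = now;
  }

  // Store a value as JSON; use the default TTL when no override is given.
  set(key: string, value: unknown, ttlSeconds?: number): void {
    const ttl = ttlSeconds ?? this.defaultTtlSeconds;
    this.entries.set(key, {
      json: JSON.stringify(value),
      expiresAt: this.now() + ttl * 1000,
    });
  }

  // Return the parsed value, or undefined on a miss or after expiry.
  get<T>(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return JSON.parse(entry.json) as T;
  }

  // Manual clear: drop every entry so subsequent reads are misses.
  clear(): void {
    this.entries.clear();
  }
}
```

Injecting the clock lets a test advance time past the TTL instead of sleeping.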
--- a/openspec/changes/archive/2026-02-10-better-cache/specs/social-content-aggregation/spec.md
+++ b/openspec/changes/archive/2026-02-10-better-cache/specs/social-content-aggregation/spec.md
@@ -0,0 +1,23 @@
+## MODIFIED Requirements
+
+### Requirement: Refresh and caching
+The system MUST cache the latest successful ingestion output and MUST serve the cached data to the site renderer.
+
+The system MUST support periodic refresh on a schedule (at minimum daily) and MUST support a manual refresh trigger.
+
+On ingestion failure, the system MUST continue serving the most recent cached data.
+
+The ingestion pipeline MUST use the cache layer (when configured and reachable) to reduce repeated network and parsing work for external sources (for example, YouTube API/RSS and podcast RSS).
+
+#### Scenario: Scheduled refresh fails
+- **WHEN** a scheduled refresh run fails to fetch one or more sources
+- **THEN** the site continues to use the most recent successfully cached dataset
+
+#### Scenario: Manual refresh requested
+- **WHEN** a manual refresh is triggered
+- **THEN** the system attempts ingestion immediately and updates the cache if ingestion succeeds
+
+#### Scenario: Cache hit avoids refetch
+- **WHEN** a refresh run is executed within the cache TTL for a given source+parameters
+- **THEN** the ingestion pipeline uses cached data for that source instead of refetching over the network
+
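The "cache hit avoids refetch" scenario, combined with the "when configured and reachable" condition, amounts to a read-through helper that treats cache errors as misses. A minimal sketch follows; the `AsyncCache` interface and the `cached` function are hypothetical names, and a real Redis client wrapper would sit behind the interface.

```typescript
// Hypothetical minimal cache interface for illustration.
interface AsyncCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Read-through helper: return cached JSON on a hit; otherwise call
// `fetcher`, store the result with a TTL, and return it. Cache errors are
// logged and treated as misses so ingestion proceeds in no-cache mode.
async function cached<T>(
  cache: AsyncCache | null,
  key: string,
  ttlSeconds: number,
  fetcher: () => Promise<T>,
  log: (msg: string) => void = console.log,
): Promise<T> {
  if (cache) {
    try {
      const hit = await cache.get(key);
      if (hit !== null) {
        log(`cache hit ${key}`);
        return JSON.parse(hit) as T;
      }
      log(`cache miss ${key}`);
    } catch (err) {
      log(`cache unavailable for ${key}: ${String(err)}`);
    }
  }
  const fresh = await fetcher();
  if (cache) {
    try {
      await cache.set(key, JSON.stringify(fresh), ttlSeconds);
    } catch (err) {
      log(`cache write failed for ${key}: ${String(err)}`);
    }
  }
  return fresh;
}
```

Passing `null` as the cache models the unconfigured case; a throwing cache models an unreachable Redis.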
--- a/openspec/changes/archive/2026-02-10-better-cache/specs/wordpress-content-source/spec.md
+++ b/openspec/changes/archive/2026-02-10-better-cache/specs/wordpress-content-source/spec.md
@@ -0,0 +1,19 @@
+## MODIFIED Requirements
+
+### Requirement: Build-time caching
+WordPress posts, pages, and categories MUST be written into the repo-local content cache used by the site build.
+
+If the WordPress fetch fails, the system MUST NOT crash the entire build pipeline; it MUST either:
+- keep the last-known-good cached WordPress content (if present), or
+- store an empty WordPress dataset and allow the rest of the site to build.
+
+When the cache layer is configured and reachable, the WordPress ingestion MUST cache `wp-json` responses (or normalized outputs) using a TTL so repeated ingestion runs avoid unnecessary network requests and parsing work.
+
+#### Scenario: WordPress fetch fails
+- **WHEN** a WordPress API request fails
+- **THEN** the site build can still complete and the blog surface renders a graceful empty state
+
+#### Scenario: Cache hit avoids wp-json refetch
+- **WHEN** WordPress ingestion is executed within the configured cache TTL
+- **THEN** it uses cached data instead of refetching from `wp-json`
+
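The failure rule in this requirement (keep last-known-good content, otherwise store an empty dataset) can be written as one small function. The sketch below assumes a hypothetical `WordPressDataset` shape and a hypothetical `loadWordPress` helper; neither name comes from the commit.

```typescript
// Hypothetical dataset shape for illustration only.
interface WordPressDataset {
  posts: unknown[];
  pages: unknown[];
  categories: unknown[];
}

const EMPTY_WORDPRESS: WordPressDataset = {
  posts: [],
  pages: [],
  categories: [],
};

// Try a fresh fetch. On failure, keep the last-known-good dataset when one
// exists; otherwise fall back to an empty dataset so the build continues.
async function loadWordPress(
  fetchFresh: () => Promise<WordPressDataset>,
  lastKnownGood: WordPressDataset | undefined,
): Promise<WordPressDataset> {
  try {
    return await fetchFresh();
  } catch (err) {
    console.warn(`WordPress fetch failed: ${String(err)}`);
    return lastKnownGood ?? EMPTY_WORDPRESS;
  }
}
```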
--- a/openspec/changes/archive/2026-02-10-better-cache/tasks.md
+++ b/openspec/changes/archive/2026-02-10-better-cache/tasks.md
@@ -0,0 +1,26 @@
+## 1. Cache Service And Config
+
+- [x] 1.1 Add Redis service to `docker-compose.yml` and wire basic health/ports for local dev
+- [x] 1.2 Add cache env/config variables (Redis URL/host+port, DB index, default TTL seconds) and document in `site/.env.example`
+
+## 2. Cache Client And Utilities
+
+- [x] 2.1 Add a small Redis cache client wrapper (get/set JSON with TTL, namespaced keys) for Node scripts
+- [x] 2.2 Add logging for cache hit/miss per key to support verification
+- [x] 2.3 Ensure caching is optional: if Redis is unreachable, ingestion proceeds without caching
+
+## 3. Integrate With Ingestion
+
+- [x] 3.1 Cache YouTube fetches (API and/or RSS) by source+params and reuse within TTL
+- [x] 3.2 Cache podcast RSS fetch by URL and reuse within TTL
+- [x] 3.3 Cache WordPress `wp-json` fetches (posts/pages/categories) and reuse within TTL
+
+## 4. Cache Invalidation
+
+- [x] 4.1 Add a command/script to manually clear the cache (scoped to configured Redis DB)
+- [x] 4.2 Document the cache clear command usage
+
+## 5. Verification
+
+- [x] 5.1 Add a test that exercises the cache wrapper (set/get JSON + TTL expiration behavior)
+- [x] 5.2 Add a test or build verification that a second ingestion run within TTL produces cache hits
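Task 1.2 calls for a default TTL in seconds that the environment can override. One way to resolve it is sketched below. The variable name `CACHE_TTL_SECONDS` and the function `resolveTtlSeconds` are assumptions, since the names used in `site/.env.example` are not shown in this diff.

```typescript
const DEFAULT_TTL_SECONDS = 3600; // 1 hour, the default chosen in design.md

// Resolve the TTL from an env-like object. `CACHE_TTL_SECONDS` is a
// hypothetical variable name used only for this sketch.
function resolveTtlSeconds(env: Record<string, string | undefined>): number {
  const raw = env["CACHE_TTL_SECONDS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_TTL_SECONDS;
  const parsed = Number(raw);
  // Fall back to the default for non-numeric, fractional, zero, or
  // negative values rather than writing entries with a bad TTL.
  if (!Number.isInteger(parsed) || parsed <= 0) return DEFAULT_TTL_SECONDS;
  return parsed;
}
```

In a Node script this would typically be called as `resolveTtlSeconds(process.env)`; taking the environment as a parameter keeps the function easy to test.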
--- a/openspec/changes/archive/2026-02-10-blogs-section/.openspec.yaml
+++ b/openspec/changes/archive/2026-02-10-blogs-section/.openspec.yaml
--- a/openspec/changes/archive/2026-02-10-blogs-section/design.md
+++ b/openspec/changes/archive/2026-02-10-blogs-section/design.md
--- a/openspec/changes/archive/2026-02-10-blogs-section/proposal.md
+++ b/openspec/changes/archive/2026-02-10-blogs-section/proposal.md
--- a/openspec/changes/archive/2026-02-10-blogs-section/specs/blog-section-surface/spec.md
+++ b/openspec/changes/archive/2026-02-10-blogs-section/specs/blog-section-surface/spec.md
--- a/openspec/changes/archive/2026-02-10-blogs-section/specs/seo-content-surface/spec.md
+++ b/openspec/changes/archive/2026-02-10-blogs-section/specs/seo-content-surface/spec.md
--- a/openspec/changes/archive/2026-02-10-blogs-section/specs/wordpress-content-source/spec.md
+++ b/openspec/changes/archive/2026-02-10-blogs-section/specs/wordpress-content-source/spec.md
--- a/openspec/changes/archive/2026-02-10-blogs-section/tasks.md
+++ b/openspec/changes/archive/2026-02-10-blogs-section/tasks.md
--- a/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/.openspec.yaml
+++ b/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/.openspec.yaml
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-02-10
--- a/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/design.md
+++ b/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/design.md
--- a/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/proposal.md
+++ b/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/proposal.md
--- a/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/specs/homepage-content-modules/spec.md
+++ b/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/specs/homepage-content-modules/spec.md
--- a/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/tasks.md
+++ b/openspec/changes/archive/2026-02-10-hide-ig-if-no-data/tasks.md
--- a/openspec/specs/blog-section-surface/spec.md
+++ b/openspec/specs/blog-section-surface/spec.md
@@ -0,0 +1,66 @@
+## Purpose
+
+Expose a blog section on the site backed by cached WordPress content, including listing, detail pages, and category browsing.
+
+## ADDED Requirements
+
+### Requirement: Primary navigation entry
+The site MUST add a header navigation link to the blog index at `/blog` labeled "Blog".
+
+#### Scenario: Blog link in header
+- **WHEN** a user views any page
+- **THEN** the header navigation includes a "Blog" link that navigates to `/blog`
+
+### Requirement: Blog index listing (posts)
+The site MUST provide a blog index page at `/blog` that lists WordPress posts as cards containing:
+- featured image (when available)
+- title
+- excerpt/summary
+
+The listing MUST be ordered by publish date descending (newest first).
+
+#### Scenario: Blog index lists posts
+- **WHEN** the cached WordPress dataset contains posts
+- **THEN** `/blog` renders a list of post cards ordered by publish date descending
+
+### Requirement: Blog post detail
+The site MUST provide a blog post detail page for each WordPress post that renders:
+- title
+- publish date
+- featured image (when available)
+- full post content
+
+#### Scenario: Post detail renders
+- **WHEN** a user navigates to a blog post detail page
+- **THEN** the page renders the full post content from the cached WordPress dataset
+
+### Requirement: WordPress pages support
+The blog section MUST support WordPress pages by rendering page detail routes that show:
+- title
+- featured image (when available)
+- full page content
+
+#### Scenario: Page detail renders
+- **WHEN** a user navigates to a WordPress page detail route
+- **THEN** the page renders the full page content from the cached WordPress dataset
+
+### Requirement: Category-based secondary navigation
+The blog section MUST render a secondary navigation under the header derived from the cached WordPress categories.
+
+Selecting a category MUST navigate to a category listing page showing only posts in that category.
+
+#### Scenario: Category nav present
+- **WHEN** the cached WordPress dataset contains categories
+- **THEN** the blog section shows a secondary navigation with those categories
+
+#### Scenario: Category listing filters posts
+- **WHEN** a user navigates to a category listing page
+- **THEN** only posts assigned to that category are listed
+
+### Requirement: Graceful empty states
+If there are no WordPress posts available, the blog index MUST render a non-broken empty state and MUST still render header/navigation.
+
+#### Scenario: No posts available
+- **WHEN** the cached WordPress dataset contains no posts
+- **THEN** `/blog` renders a helpful empty state
+
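The ordering and category-filter requirements above reduce to two pure functions over the cached posts. The sketch below uses a hypothetical `BlogPost` shape; the field names are assumptions and not taken from the site's code.

```typescript
// Hypothetical post shape; field names are assumptions for illustration.
interface BlogPost {
  slug: string;
  title: string;
  publishedAt: string; // ISO 8601 date/time
  categorySlugs: string[];
}

// Newest first, without mutating the cached dataset.
function sortNewestFirst(posts: BlogPost[]): BlogPost[] {
  return [...posts].sort(
    (a, b) => Date.parse(b.publishedAt) - Date.parse(a.publishedAt),
  );
}

// Posts for a category listing page, still ordered newest first.
function postsInCategory(posts: BlogPost[], categorySlug: string): BlogPost[] {
  return sortNewestFirst(
    posts.filter((post) => post.categorySlugs.includes(categorySlug)),
  );
}
```

An empty input yields an empty array, which is the case the graceful empty state has to handle.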
--- a/openspec/specs/cache-layer/spec.md
+++ b/openspec/specs/cache-layer/spec.md
@@ -0,0 +1,42 @@
+## Purpose
+
+Provide a shared caching layer (Redis-backed) for ingestion and content processing flows, with TTL-based invalidation and manual cache clearing.
+
+## ADDED Requirements
+
+### Requirement: Redis-backed cache service
+The system MUST provide a Redis-backed cache service for use by ingestion and content processing flows.
+
+The cache service MUST be runnable in local development via Docker Compose.
+
+#### Scenario: Cache service available in Docker
+- **WHEN** the Docker Compose stack is started
+- **THEN** a Redis service is available to other services/scripts on the internal network
+
+### Requirement: TTL-based invalidation
+Cached entries MUST support TTL-based invalidation.
+
+The system MUST define a default TTL and MUST allow overriding the TTL via environment/config.
+
+#### Scenario: Default TTL applies
+- **WHEN** a cached entry is written without an explicit TTL override
+- **THEN** it expires after the configured default TTL
+
+#### Scenario: TTL override applies
+- **WHEN** a TTL override is configured via environment/config
+- **THEN** new cached entries use that TTL for expiration
+
+### Requirement: Cache key namespace
+Cache keys MUST be namespaced by source and parameters so that different data requests do not collide.
+
+#### Scenario: Two different sources do not collide
+- **WHEN** the system caches a YouTube fetch and a WordPress fetch
+- **THEN** they use different key namespaces and do not overwrite each other
+
+### Requirement: Manual cache clear
+The system MUST provide a script/command to manually clear the cache.
+
+#### Scenario: Manual clear executed
+- **WHEN** a developer runs the cache clear command
+- **THEN** the cache is cleared and subsequent ingestion runs produce cache misses
+
--- a/openspec/specs/homepage-content-modules/spec.md
+++ b/openspec/specs/homepage-content-modules/spec.md
@@ -38,7 +38,8 @@ When `metrics.views` is not available, the system MUST render the high-performin
 ### Requirement: Graceful empty and error states
 If a module has no content to display, the homepage MUST render a non-broken empty state for that module and MUST still render the rest of the page.

+The Instagram module is an exception: if there are no Instagram items to display, the homepage MUST omit the Instagram module entirely (no empty state block) and MUST still render the rest of the page.
+
 #### Scenario: No Instagram items available
 - **WHEN** the cached dataset contains no Instagram items
- **THEN** the Instagram-related module renders an empty state and the homepage still renders other modules
-
+- **THEN** the Instagram-related module is not rendered and the homepage still renders other modules
--- a/openspec/specs/seo-content-surface/spec.md
+++ b/openspec/specs/seo-content-surface/spec.md
@@ -45,9 +45,19 @@ The site MUST provide:
 - `sitemap.xml` enumerating indexable pages
 - `robots.txt` that allows indexing of indexable pages

+The sitemap MUST include the blog surface routes:
+- `/blog`
+- blog post detail routes
+- blog page detail routes
+- blog category listing routes
+
 #### Scenario: Sitemap is available
 - **WHEN** a crawler requests `/sitemap.xml`
- **THEN** the server returns an XML sitemap listing `/`, `/videos`, `/podcast`, and `/about`
+- **THEN** the server returns an XML sitemap listing `/`, `/videos`, `/podcast`, `/about`, and `/blog`
+
+#### Scenario: Blog URLs appear in sitemap
+- **WHEN** WordPress content is available in the cache at build time
+- **THEN** the generated sitemap includes the blog detail URLs for those items

 ### Requirement: Structured data
 The site MUST support structured data (JSON-LD) for Video and Podcast content when detail pages exist, and MUST ensure the JSON-LD is valid JSON.
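The sitemap requirement above can be read as "static routes plus one route per cached WordPress item". A rough sketch follows. The detail and category route patterns (`/blog/<slug>`, `/blog/pages/<slug>`, `/blog/category/<slug>`) are assumptions, because the spec names the route kinds but not their URL shapes.

```typescript
// Hypothetical shapes; only the slug matters for building paths.
interface SluggedItem {
  slug: string;
}

interface WordPressCache {
  posts: SluggedItem[];
  pages: SluggedItem[];
  categories: SluggedItem[];
}

// Static routes named in the sitemap scenario.
const STATIC_PATHS = ["/", "/videos", "/podcast", "/about", "/blog"];

// Build the list of sitemap paths. The three route patterns below are
// assumed for illustration and may differ from the site's real routes.
function sitemapPaths(wp: WordPressCache): string[] {
  const segment = (slug: string) => encodeURIComponent(slug);
  return [
    ...STATIC_PATHS,
    ...wp.posts.map((post) => `/blog/${segment(post.slug)}`),
    ...wp.pages.map((page) => `/blog/pages/${segment(page.slug)}`),
    ...wp.categories.map((cat) => `/blog/category/${segment(cat.slug)}`),
  ];
}
```

With an empty WordPress cache the function still returns the static routes, including `/blog`.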
--- a/openspec/specs/social-content-aggregation/spec.md
+++ b/openspec/specs/social-content-aggregation/spec.md
@@ -57,6 +57,8 @@ The system MUST support periodic refresh on a schedule (at minimum daily) and MU

 On ingestion failure, the system MUST continue serving the most recent cached data.

+The ingestion pipeline MUST use the cache layer (when configured and reachable) to reduce repeated network and parsing work for external sources (for example, YouTube API/RSS and podcast RSS).
+
 #### Scenario: Scheduled refresh fails
 - **WHEN** a scheduled refresh run fails to fetch one or more sources
 - **THEN** the site continues to use the most recent successfully cached dataset
@@ -65,3 +67,6 @@ On ingestion failure, the system MUST continue serving the most recent cached da
 - **WHEN** a manual refresh is triggered
 - **THEN** the system attempts ingestion immediately and updates the cache if ingestion succeeds

+#### Scenario: Cache hit avoids refetch
+- **WHEN** a refresh run is executed within the cache TTL for a given source+parameters
+- **THEN** the ingestion pipeline uses cached data for that source instead of refetching over the network
--- a/openspec/specs/wordpress-content-source/spec.md
+++ b/openspec/specs/wordpress-content-source/spec.md
@@ -0,0 +1,69 @@
+## Purpose
+
+Provide a build-time content source backed by a WordPress site via the `wp-json` REST APIs.
+
+## ADDED Requirements
+
+### Requirement: WordPress API configuration
+The system MUST allow configuring a WordPress content source using environment/config values:
+- WordPress base URL
+- credentials (username + password or application password) when required by the WordPress instance
+
+The WordPress base URL MUST be used to construct requests to the WordPress `wp-json` REST APIs.
+
+#### Scenario: Config provided
+- **WHEN** WordPress configuration values are provided
+- **THEN** the system can attempt to fetch WordPress content via `wp-json`
+
+### Requirement: Fetch posts
+The system MUST fetch the latest WordPress posts via `wp-json` and map them into an internal representation with:
+- stable ID
+- slug
+- title
+- excerpt/summary
+- content HTML
+- featured image URL when available
+- publish date/time and last modified date/time
+- category assignments (IDs and slugs when available)
+
+#### Scenario: Posts fetched successfully
+- **WHEN** the WordPress posts endpoint returns a non-empty list
+- **THEN** the system stores the mapped post items in the content cache for rendering
+
+### Requirement: Fetch pages
+The system MUST fetch WordPress pages via `wp-json` and map them into an internal representation with:
+- stable ID
+- slug
+- title
+- excerpt/summary when available
+- content HTML
+- featured image URL when available
+- publish date/time and last modified date/time
+
+#### Scenario: Pages fetched successfully
+- **WHEN** the WordPress pages endpoint returns a non-empty list
+- **THEN** the system stores the mapped page items in the content cache for rendering
+
+### Requirement: Fetch categories
+The system MUST fetch WordPress categories via `wp-json` and store them for rendering a category-based secondary navigation under the blog section.
+
+#### Scenario: Categories fetched successfully
+- **WHEN** the WordPress categories endpoint returns a list of categories
+- **THEN** the system stores categories (ID, slug, name) in the content cache for blog navigation
+
+### Requirement: Build-time caching
+WordPress posts, pages, and categories MUST be written into the repo-local content cache used by the site build.
+
+If the WordPress fetch fails, the system MUST NOT crash the entire build pipeline; it MUST either:
+- keep the last-known-good cached WordPress content (if present), or
+- store an empty WordPress dataset and allow the rest of the site to build.
+
+When the cache layer is configured and reachable, the WordPress ingestion MUST cache `wp-json` responses (or normalized outputs) using a TTL so repeated ingestion runs avoid unnecessary network requests and parsing work.
+
+#### Scenario: WordPress fetch fails
+- **WHEN** a WordPress API request fails
+- **THEN** the site build can still complete and the blog surface renders a graceful empty state
+
+#### Scenario: Cache hit avoids wp-json refetch
+- **WHEN** WordPress ingestion is executed within the configured cache TTL
+- **THEN** it uses cached data instead of refetching from `wp-json`
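The "Fetch posts" mapping in this spec can be illustrated against the WordPress REST API post object. In the sketch below, `WpRestPost` lists a subset of the fields the REST API returns, while `BlogPostItem` and `mapPost` are hypothetical names for the internal representation. It also assumes the request used the `_embed` query parameter so that featured media is embedded in the response.

```typescript
// Subset of the WordPress REST API post object. `_embedded` is only
// present when the request includes the `_embed` query parameter.
interface WpRestPost {
  id: number;
  slug: string;
  date_gmt: string;
  modified_gmt: string;
  title: { rendered: string };
  excerpt: { rendered: string };
  content: { rendered: string };
  categories: number[];
  _embedded?: { "wp:featuredmedia"?: { source_url?: string }[] };
}

// Hypothetical internal representation; field names are assumptions.
interface BlogPostItem {
  id: string;
  slug: string;
  title: string;
  excerpt: string;
  contentHtml: string;
  featuredImageUrl: string | null;
  publishedAt: string;
  modifiedAt: string;
  categoryIds: number[];
}

// Map one REST API post into the internal shape.
function mapPost(post: WpRestPost): BlogPostItem {
  const media = post._embedded?.["wp:featuredmedia"]?.[0];
  return {
    id: `wp-post-${post.id}`, // prefixed so IDs stay stable and source-scoped
    slug: post.slug,
    title: post.title.rendered,
    excerpt: post.excerpt.rendered,
    contentHtml: post.content.rendered,
    featuredImageUrl: media?.source_url ?? null,
    publishedAt: post.date_gmt,
    modifiedAt: post.modified_gmt,
    categoryIds: post.categories,
  };
}
```

Category slugs are not part of the post object itself, so resolving them would need a join against the separately fetched categories list.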