better cache

2026-02-10 01:20:58 -05:00
parent c773affbc8
commit f056e67eae
39 changed files with 830 additions and 17 deletions

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-10

@@ -0,0 +1,48 @@
## Context
The site is an Astro static build served via nginx. Content is gathered by build-time ingestion (`site/scripts/fetch-content.ts`) that reads/writes a repo-local cache file (`site/content/cache/content.json`).
Today, repeated ingestion runs can re-hit external sources (YouTube API/RSS, podcast RSS, WordPress `wp-json`) and re-do normalization work. We want a shared caching layer to reduce IO and network load and to make repeated runs faster and more predictable.
## Goals / Non-Goals
**Goals:**
- Add a Redis-backed cache layer usable from Node scripts (ingestion) with TTL-based invalidation.
- Use the cache layer to reduce repeated network/API calls and parsing work for:
  - social content ingestion (YouTube/podcast/Instagram list)
  - WordPress `wp-json` ingestion
- Provide a default “industry standard” TTL with environment override.
- Add a manual cache clear command/script.
- Provide verification (tests and/or logs) that cache hits occur and TTL expiration behaves as expected.
**Non-Goals:**
- Adding a runtime server for the site (the site remains static HTML served by nginx).
- Caching browser requests to nginx (no CDN/edge cache configuration in this change).
- Perfect cache coherence across multiple machines/environments (dev+docker is the target).
## Decisions
- **Decision: Use Redis as the shared cache backend (docker-compose service).**
  - Rationale: Redis is widely adopted, lightweight, supports TTLs natively, and is easy to run in dev via Docker.
  - Alternative considered: Local file-based cache only. Rejected because it doesn't provide a shared service and is harder to invalidate consistently.
- **Decision: Cache at the “source fetch” and “normalized dataset” boundaries.**
  - Rationale: The biggest cost is network + parsing/normalization. Caching raw API responses (or normalized outputs) by source+params gives the best win.
  - Approach:
    - Cache keys like `youtube:api:<channelId>:<limit>`, `podcast:rss:<url>`, `wp:posts`, `wp:pages`, `wp:categories`.
    - Store JSON values, set TTL, and log hit/miss per key.
- **Decision: Default TTL = 1 hour (3600s), configurable via env.**
  - Rationale: A 1h TTL is a common baseline for balancing content freshness against load. It also aligns with typical ingestion schedules (hourly/daily).
  - Allow overrides for local testing and production tuning.
- **Decision: Cache clear script uses Redis `FLUSHDB` in the configured Redis database.**
  - Rationale: Simple manual operation and easy to verify.
  - Guardrail: `FLUSHDB` only clears the currently selected database, so point the cache at a dedicated Redis DB index (configurable, `0` by default) that nothing else uses.
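Centralizing key generation and TTL resolution keeps the key scheme above in one place. A minimal sketch, assuming a hypothetical `CACHE_TTL_SECONDS` environment variable; the helper names `cacheKey` and `resolveTtlSeconds` are also illustrative, and parameters are percent-encoded (a deviation from the literal key examples) so a `:` inside a URL cannot blur the namespace boundary:

```typescript
// Sketch only: names and the CACHE_TTL_SECONDS variable are hypothetical.
type CacheSource = "youtube" | "podcast" | "instagram" | "wp";

const DEFAULT_TTL_SECONDS = 3600; // 1 hour, per the TTL decision

// Builds keys like "youtube:api:UC123:10" or "wp:posts".
// Each parameter is percent-encoded, so "https://..." cannot add extra ":" segments.
function cacheKey(source: CacheSource, kind: string, ...params: Array<string | number>): string {
  const encoded = params.map((p) => encodeURIComponent(String(p)));
  return [source, kind, ...encoded].join(":");
}

// Reads the TTL override from env; falls back to the default when the value is
// missing, non-numeric, or not a positive integer (so a typo never disables expiry).
function resolveTtlSeconds(env: Record<string, string | undefined>): number {
  const raw = env["CACHE_TTL_SECONDS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_TTL_SECONDS;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_TTL_SECONDS;
}
```

For example, `cacheKey("youtube", "api", channelId, 10)` yields `youtube:api:<channelId>:10`, and passing `process.env` to `resolveTtlSeconds` returns 3600 unless a valid override is set.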
## Risks / Trade-offs
- [Risk] Redis introduces a new dependency and operational moving part. -> Mitigation: Keep Redis optional; ingestion should fall back to no-cache mode if Redis is not reachable.
- [Risk] Stale content if TTL too long. -> Mitigation: Default to 1h and allow env override; provide manual clear command.
- [Risk] Cache key mistakes lead to wrong content reuse. -> Mitigation: Centralize key generation and add tests for key uniqueness and TTL behavior.
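The no-cache fallback and hit/miss logging can live in one cache-aside helper. This is a sketch, not the implemented wrapper: `CacheBackend` is a hypothetical interface modeled on the node-redis v4 `get(key)` and `set(key, value, { EX })` calls, and passing `null` models "Redis not configured":

```typescript
// Hypothetical subset of a Redis client (shaped like node-redis v4).
interface CacheBackend {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, options: { EX: number }): Promise<unknown>;
}

// Cache-aside: return the cached JSON on a hit, otherwise fetch, store with TTL, return.
async function getOrFetch<T>(
  backend: CacheBackend | null,
  key: string,
  ttlSeconds: number,
  fetcher: () => Promise<T>,
  log: (msg: string) => void = console.log,
): Promise<T> {
  if (backend) {
    try {
      const cached = await backend.get(key);
      if (cached !== null) {
        log(`cache hit ${key}`);
        return JSON.parse(cached) as T;
      }
      log(`cache miss ${key}`);
    } catch (err) {
      // Redis unreachable (or a bad cached value): fall through and fetch live.
      log(`cache read failed for ${key}: ${String(err)}`);
    }
  }
  const value = await fetcher();
  if (backend) {
    try {
      await backend.set(key, JSON.stringify(value), { EX: ttlSeconds });
    } catch (err) {
      // A failed write must not fail ingestion.
      log(`cache write failed for ${key}: ${String(err)}`);
    }
  }
  return value;
}
```

Read and write failures are logged and swallowed, so an unreachable Redis degrades to plain fetching instead of failing the ingestion run.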

@@ -0,0 +1,28 @@
## Why
Reduce IO and external fetch load by adding a shared caching layer so repeated requests for the same content do not re-hit disk/network unnecessarily.
## What Changes
- Add a caching layer (Redis or similar lightweight cache) used by the site's data/ingestion flows.
- Add a cache service to `docker-compose.yml`.
- Define an industry-standard cache invalidation interval (TTL) with a sensible default and allow it to be configured via environment variables.
- Add a script/command to manually clear the cache on demand.
- Add verification that the cache is working (cache hits/misses and TTL behavior).
## Capabilities
### New Capabilities
- `cache-layer`: Provide a shared caching service (Redis or equivalent) with TTL-based invalidation and a manual clear operation for the website's data flows.
### Modified Capabilities
- `social-content-aggregation`: Use the cache layer to avoid re-fetching or re-processing external content sources on repeated runs/requests.
- `wordpress-content-source`: Use the cache layer to reduce repeated `wp-json` fetches and parsing work.
## Impact
- Deployment/local dev: add Redis (or equivalent) to `docker-compose.yml` and wire environment/config for connection + TTL.
- Scripts/services: update ingestion/build-time fetch to read/write via cache and log hit/miss for verification.
- Tooling: add a cache-clear script/command (and document usage).
- Testing: add tests or a lightweight verification step proving cached reads are used and expire as expected.

@@ -0,0 +1,38 @@
## ADDED Requirements
### Requirement: Redis-backed cache service
The system MUST provide a Redis-backed cache service for use by ingestion and content processing flows.
The cache service MUST be runnable in local development via Docker Compose.
#### Scenario: Cache service available in Docker
- **WHEN** the Docker Compose stack is started
- **THEN** a Redis service is available to other services/scripts on the internal network
### Requirement: TTL-based invalidation
Cached entries MUST support TTL-based invalidation.
The system MUST define a default TTL and MUST allow overriding the TTL via environment/config.
#### Scenario: Default TTL applies
- **WHEN** a cached entry is written without an explicit TTL override
- **THEN** it expires after the configured default TTL
#### Scenario: TTL override applies
- **WHEN** a TTL override is configured via environment/config
- **THEN** new cached entries use that TTL for expiration
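Both TTL scenarios can be checked without a live Redis or real sleeps by using an in-memory fake with an injectable clock. A sketch for tests only: `MemoryTtlStore` is hypothetical scaffolding, and in production Redis enforces expiry itself via `SET ... EX`:

```typescript
// Test-only fake that mimics TTL expiry; the clock is injectable for determinism.
class MemoryTtlStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly defaultTtlSeconds: number;
  private readonly now: () => number;

  constructor(defaultTtlSeconds: number, now: () => number = Date.now) {
    this.defaultTtlSeconds = defaultTtlSeconds;
    this.now = now; // returns milliseconds, like Date.now
  }

  // Uses the default TTL unless an explicit override is passed.
  set(key: string, value: string, ttlSeconds?: number): void {
    const ttl = ttlSeconds ?? this.defaultTtlSeconds;
    this.entries.set(key, { value, expiresAt: this.now() + ttl * 1000 });
  }

  // Returns null once the entry has expired, evicting it lazily.
  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }
}
```

A test advances the fake clock past the TTL and asserts a miss, which covers both "default TTL applies" and "TTL override applies" in milliseconds of runtime.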
### Requirement: Cache key namespace
Cache keys MUST be namespaced by source and parameters so that different data requests do not collide.
#### Scenario: Two different sources do not collide
- **WHEN** the system caches a YouTube fetch and a WordPress fetch
- **THEN** they use different key namespaces and do not overwrite each other
### Requirement: Manual cache clear
The system MUST provide a script/command to manually clear the cache.
#### Scenario: Manual clear executed
- **WHEN** a developer runs the cache clear command
- **THEN** the cache is cleared and subsequent ingestion runs produce cache misses
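A sketch of a scoped clear command, assuming hypothetical `REDIS_HOST`, `REDIS_PORT`, and `REDIS_DB` variables and a client that exposes `flushDb()` (node-redis v4 does). Because `FLUSHDB` only empties the selected database, the DB index is carried in the `redis://host:port/db` URL:

```typescript
// Hypothetical minimal client surface for the clear command.
interface FlushableClient {
  flushDb(): Promise<unknown>;
}

// Builds redis://host:port/db from env, defaulting to DB 0 on localhost:6379.
function redisUrlFromEnv(env: Record<string, string | undefined>): string {
  const host = env["REDIS_HOST"] ?? "localhost";
  const port = env["REDIS_PORT"] ?? "6379";
  const db = env["REDIS_DB"] ?? "0";
  if (!/^\d+$/.test(db)) {
    throw new Error(`REDIS_DB must be a non-negative integer, got "${db}"`);
  }
  return `redis://${host}:${port}/${db}`;
}

// Empties only the database the client is connected to (FLUSHDB, not FLUSHALL).
async function clearCache(
  client: FlushableClient,
  log: (msg: string) => void = console.log,
): Promise<void> {
  await client.flushDb();
  log("cache cleared; the next ingestion run should report cache misses");
}
```

In a real script the URL would be handed to the Redis client (for node-redis v4, `createClient({ url })` followed by `connect()`), then `clearCache` runs, then `quit()` closes the connection.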

@@ -0,0 +1,23 @@
## MODIFIED Requirements
### Requirement: Refresh and caching
The system MUST cache the latest successful ingestion output and MUST serve the cached data to the site renderer.
The system MUST support periodic refresh on a schedule (at minimum daily) and MUST support a manual refresh trigger.
On ingestion failure, the system MUST continue serving the most recent cached data.
The ingestion pipeline MUST use the cache layer (when configured and reachable) to reduce repeated network and parsing work for external sources (for example, YouTube API/RSS and podcast RSS).
#### Scenario: Scheduled refresh fails
- **WHEN** a scheduled refresh run fails to fetch one or more sources
- **THEN** the site continues to use the most recent successfully cached dataset
#### Scenario: Manual refresh requested
- **WHEN** a manual refresh is triggered
- **THEN** the system attempts ingestion immediately and updates the cache if ingestion succeeds
#### Scenario: Cache hit avoids refetch
- **WHEN** a refresh run is executed within the cache TTL for a given source+parameters
- **THEN** the ingestion pipeline uses cached data for that source instead of refetching over the network
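When only some sources fail, the "continue serving the most recent cached data" rule can be applied per source. A minimal sketch, assuming results are collected per source before the cache file is written; `SourceResult` and `mergeRefresh` are hypothetical names:

```typescript
// Outcome of one source's refresh attempt.
type SourceResult<T> = { ok: true; data: T } | { ok: false; error: string };

// Replaces only the sources that refreshed successfully; failed sources keep
// their previously cached data. The previous object is not mutated.
function mergeRefresh<T>(
  previous: Record<string, T>,
  results: Record<string, SourceResult<T>>,
  log: (msg: string) => void = console.warn,
): Record<string, T> {
  const next: Record<string, T> = { ...previous };
  for (const [source, result] of Object.entries(results)) {
    if (result.ok) {
      next[source] = result.data;
    } else {
      // A source with no previous entry simply stays absent.
      log(`refresh failed for ${source}: ${result.error}; keeping cached data`);
    }
  }
  return next;
}
```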

@@ -0,0 +1,19 @@
## MODIFIED Requirements
### Requirement: Build-time caching
WordPress posts, pages, and categories MUST be written into the repo-local content cache used by the site build.
If the WordPress fetch fails, the system MUST NOT crash the entire build pipeline; it MUST either:
- keep the last-known-good cached WordPress content (if present), or
- store an empty WordPress dataset and allow the rest of the site to build.
When the cache layer is configured and reachable, the WordPress ingestion MUST cache `wp-json` responses (or normalized outputs) using a TTL so repeated ingestion runs avoid unnecessary network requests and parsing work.
#### Scenario: WordPress fetch fails
- **WHEN** a WordPress API request fails
- **THEN** the site build can still complete and the blog surface renders a graceful empty state
#### Scenario: Cache hit avoids wp-json refetch
- **WHEN** WordPress ingestion is executed within the configured cache TTL
- **THEN** it uses cached data instead of refetching from `wp-json`

@@ -0,0 +1,26 @@
## 1. Cache Service And Config
- [x] 1.1 Add Redis service to `docker-compose.yml` and wire basic health/ports for local dev
- [x] 1.2 Add cache env/config variables (Redis URL/host+port, DB index, default TTL seconds) and document in `site/.env.example`
## 2. Cache Client And Utilities
- [x] 2.1 Add a small Redis cache client wrapper (get/set JSON with TTL, namespaced keys) for Node scripts
- [x] 2.2 Add logging for cache hit/miss per key to support verification
- [x] 2.3 Ensure caching is optional: if Redis is unreachable, ingestion proceeds without caching
## 3. Integrate With Ingestion
- [x] 3.1 Cache YouTube fetches (API and/or RSS) by source+params and reuse within TTL
- [x] 3.2 Cache podcast RSS fetch by URL and reuse within TTL
- [x] 3.3 Cache WordPress `wp-json` fetches (posts/pages/categories) and reuse within TTL
## 4. Cache Invalidation
- [x] 4.1 Add a command/script to manually clear the cache (scoped to configured Redis DB)
- [x] 4.2 Document the cache clear command usage
## 5. Verification
- [x] 5.1 Add a test that exercises the cache wrapper (set/get JSON + TTL expiration behavior)
- [x] 5.2 Add a test or build verification that a second ingestion run within TTL produces cache hits

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-10

@@ -0,0 +1,53 @@
## Context
The site is currently a static Astro build served via nginx. Content is populated by a build-time fetch step (`site/scripts/fetch-content.ts`) that writes a repo-local cache file consumed by the Astro pages/components.
We want to add a new Blog section backed by a WordPress site via the `wp-json` REST APIs, including:
- a primary header nav link (`/blog`)
- blog listing pages (cards with featured image, title, excerpt)
- blog detail pages (full content)
- a blog-only secondary navigation based on WordPress categories
- support for both WordPress posts and pages
## Goals / Non-Goals
**Goals:**
- Add `/blog` with a listing of WordPress posts rendered as static HTML at build time.
- Add detail pages for WordPress posts and pages, rendered as static HTML at build time.
- Add category-based browsing within the Blog section (secondary navigation + category listing pages).
- Use environment variables for WordPress configuration (site URL and credentials) and fetch via `wp-json`.
- Keep pages indexable and included in sitemap output.
**Non-Goals:**
- Real-time updates without rebuilds (v1 remains build-time fetched).
- Implementing “like” storage in WordPress or a database (nice-to-have can be a simple outbound share action later).
- Full WordPress theme parity (we render a simplified reading surface).
## Decisions
- **Decision: Build-time ingestion into the existing content cache.**
  - Rationale: Matches the current architecture (cache file + static build), keeps the site fast and crawlable, and avoids introducing a runtime server layer.
  - Alternative: Client-side fetch from WP directly. Rejected for SEO and performance (it would rely on client-side rendering and adds CORS/auth complexity).
- **Decision: Prefer WordPress Application Passwords over raw user passwords (if possible).**
  - Rationale: Application passwords are the standard WP approach for API access and can be revoked without changing the user's login password.
  - Alternative: Basic auth with the regular username/password. Allowed if that's what your WP setup supports, but credentials must be treated as secrets in `.env`.
- **Decision: Normalize WordPress content into a small internal schema.**
  - Rationale: Keeps UI components simple and consistent with existing content rendering patterns (cards + detail pages).
  - Implementation: Add a `wordpress` source to the cache schema, with fields for `id`, `slug`, `kind` (`post|page`), `title`, `excerpt`, `contentHtml`, `featuredImageUrl`, `publishedAt`, `updatedAt`, `categories`.
- **Decision: Route structure.**
  - Rationale: Keep URLs clear and stable.
  - Proposed:
    - `/blog` (latest posts)
    - `/blog/category/<slug>` (posts in category)
    - `/blog/post/<slug>` (post detail)
    - `/blog/page/<slug>` (page detail)
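The normalized schema and proposed routes could be expressed as below. Field names follow the design; the type names and the `blogItemPath`/`blogCategoryPath` helpers are hypothetical, and the sketch assumes slugs are already URL-safe (WordPress normally sanitizes them):

```typescript
interface WordPressCategory {
  id: number;
  slug: string;
  name: string;
}

// Internal schema from the design decision above.
interface WordPressItem {
  id: number;
  slug: string;
  kind: "post" | "page";
  title: string;
  excerpt: string;
  contentHtml: string;
  featuredImageUrl: string | null;
  publishedAt: string; // ISO 8601
  updatedAt: string; // ISO 8601
  categories: number[]; // category IDs; empty for pages
}

// "/blog/post/<slug>" or "/blog/page/<slug>"; slugs are used as-is.
function blogItemPath(item: Pick<WordPressItem, "kind" | "slug">): string {
  return `/blog/${item.kind}/${item.slug}`;
}

// "/blog/category/<slug>"
function blogCategoryPath(category: Pick<WordPressCategory, "slug">): string {
  return `/blog/category/${category.slug}`;
}
```

Because the item kind is part of the path, a post and a page with the same slug still get distinct URLs. If the listing is later paginated, note that `/blog/page/<n>` is a common pagination pattern and would overlap with page detail routes.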
## Risks / Trade-offs
- [Risk] WP API rate limits / downtime break the build. → Mitigation: Cache last-known-good content.json; on fetch failure, retain existing cache and log errors.
- [Risk] WordPress HTML content can contain unexpected markup or scripts. → Mitigation: Render it as static HTML at build time, but sanitize or strip scripts; document the allowed HTML subset.
- [Risk] Auth method differs per WP hosting. → Mitigation: Support both public endpoints for reading (preferred) and authenticated requests when needed; keep config flexible.

@@ -0,0 +1,30 @@
## Why
Add a blog section so the site can publish indexable textual content (in addition to videos/podcast), improving SEO and giving visitors another reason to return and engage.
## What Changes
- Add a new primary navigation link in the header: **Blog** (between **Podcast** and **About**).
- Add a blog index route that lists WordPress posts as cards (featured image, title, excerpt/summary).
- Add blog detail routes so a user can read the full content of a post.
- Add a secondary navigation within the blog section driven by WordPress categories (exact structure negotiable).
- Support rendering both WordPress **posts** and **pages** within the blog section.
- Add configuration via environment variables for WordPress site URL and credentials, and fetch content via the WordPress `wp-json` REST APIs.
- (Optional / later) Like and share feature for blog content.
## Capabilities
### New Capabilities
- `wordpress-content-source`: Fetch posts, pages, and categories from a configured WordPress site via `wp-json`, and provide them in a form the site can render (including featured images and excerpts).
- `blog-section-surface`: Provide blog routes (index, category views, content detail pages) and a secondary navigation for blog browsing.
### Modified Capabilities
- `seo-content-surface`: Include the blog routes in the indexable surface (e.g., sitemap coverage and crawlable HTML for `/blog` and blog detail pages).
## Impact
- Site UI/layout: header navigation update; new blog pages; secondary blog navigation.
- Content pipeline: extend the content fetching/caching flow to include WordPress content; update any normalized schemas/types as needed.
- Configuration: add WordPress settings to environment/config and ensure they are supported in local dev and Docker.
- SEO: ensure blog pages have correct titles, descriptions/excerpts, canonical URLs, and appear in `sitemap.xml`.

@@ -0,0 +1,62 @@
## ADDED Requirements
### Requirement: Primary navigation entry
The site MUST add a header navigation link to the blog index at `/blog` labeled "Blog".
#### Scenario: Blog link in header
- **WHEN** a user views any page
- **THEN** the header navigation includes a "Blog" link that navigates to `/blog`
### Requirement: Blog index listing (posts)
The site MUST provide a blog index page at `/blog` that lists WordPress posts as cards containing:
- featured image (when available)
- title
- excerpt/summary
The listing MUST be ordered by publish date descending (newest first).
#### Scenario: Blog index lists posts
- **WHEN** the cached WordPress dataset contains posts
- **THEN** `/blog` renders a list of post cards ordered by publish date descending
### Requirement: Blog post detail
The site MUST provide a blog post detail page for each WordPress post that renders:
- title
- publish date
- featured image (when available)
- full post content
#### Scenario: Post detail renders
- **WHEN** a user navigates to a blog post detail page
- **THEN** the page renders the full post content from the cached WordPress dataset
### Requirement: WordPress pages support
The blog section MUST support WordPress pages by rendering page detail routes that show:
- title
- featured image (when available)
- full page content
#### Scenario: Page detail renders
- **WHEN** a user navigates to a WordPress page detail route
- **THEN** the page renders the full page content from the cached WordPress dataset
### Requirement: Category-based secondary navigation
The blog section MUST render a secondary navigation under the header derived from the cached WordPress categories.
Selecting a category MUST navigate to a category listing page showing only posts in that category.
#### Scenario: Category nav present
- **WHEN** the cached WordPress dataset contains categories
- **THEN** the blog section shows a secondary navigation with those categories
#### Scenario: Category listing filters posts
- **WHEN** a user navigates to a category listing page
- **THEN** only posts assigned to that category are listed
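The ordering and category-filter rules reduce to two small selectors. A sketch, assuming the cached posts carry an ISO 8601 `publishedAt` and a list of category IDs; the `PostSummary` type and the function names are hypothetical:

```typescript
// Minimal shape the selectors need from a cached post.
interface PostSummary {
  slug: string;
  publishedAt: string; // ISO 8601 timestamp
  categories: number[]; // WordPress category IDs
}

// Newest first, without mutating the cached array.
function latestPosts<T extends PostSummary>(posts: readonly T[]): T[] {
  return [...posts].sort((a, b) => Date.parse(b.publishedAt) - Date.parse(a.publishedAt));
}

// Only posts assigned to the category, still newest first.
function postsInCategory<T extends PostSummary>(posts: readonly T[], categoryId: number): T[] {
  return latestPosts(posts.filter((p) => p.categories.includes(categoryId)));
}
```

An empty result from either selector is the cue for the page to render its empty state instead of an empty list.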
### Requirement: Graceful empty states
If there are no WordPress posts available, the blog index MUST render a non-broken empty state and MUST still render header/navigation.
#### Scenario: No posts available
- **WHEN** the cached WordPress dataset contains no posts
- **THEN** `/blog` renders a helpful empty state

@@ -0,0 +1,21 @@
## MODIFIED Requirements
### Requirement: Sitemap and robots
The site MUST provide:
- `sitemap.xml` enumerating indexable pages
- `robots.txt` that allows indexing of indexable pages
The sitemap MUST include the blog surface routes:
- `/blog`
- blog post detail routes
- blog page detail routes
- blog category listing routes
#### Scenario: Sitemap is available
- **WHEN** a crawler requests `/sitemap.xml`
- **THEN** the server returns an XML sitemap listing `/`, `/videos`, `/podcast`, `/about`, and `/blog`
#### Scenario: Blog URLs appear in sitemap
- **WHEN** WordPress content is available in the cache at build time
- **THEN** the generated sitemap includes the blog detail URLs for those items
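A sketch of deriving the blog sitemap entries from the cached dataset at build time. `BlogDataset` and `blogSitemapUrls` are hypothetical, and if the site's sitemap tooling already enumerates every generated route, this step may be unnecessary:

```typescript
// Only the slugs are needed to build URLs.
interface BlogDataset {
  posts: { slug: string }[];
  pages: { slug: string }[];
  categories: { slug: string }[];
}

// Absolute URLs for /blog plus every post, page, and category route.
function blogSitemapUrls(site: string, data: BlogDataset): string[] {
  const base = site.replace(/\/+$/, ""); // avoid "//" when joining
  const paths = [
    "/blog",
    ...data.posts.map((p) => `/blog/post/${p.slug}`),
    ...data.pages.map((p) => `/blog/page/${p.slug}`),
    ...data.categories.map((c) => `/blog/category/${c.slug}`),
  ];
  return paths.map((path) => `${base}${path}`);
}
```

`/blog` is always emitted, so the index stays in the sitemap even when the WordPress dataset is empty.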

@@ -0,0 +1,60 @@
## ADDED Requirements
### Requirement: WordPress API configuration
The system MUST allow configuring a WordPress content source using environment/config values:
- WordPress base URL
- credentials (username + password or application password) when required by the WordPress instance
The WordPress base URL MUST be used to construct requests to the WordPress `wp-json` REST APIs.
#### Scenario: Config provided
- **WHEN** WordPress configuration values are provided
- **THEN** the system can attempt to fetch WordPress content via `wp-json`
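A sketch of turning the configuration into `wp-json` requests. The `/wp-json/wp/v2/<resource>` routes, the `per_page` (maximum 100), `page`, and `_embed` query parameters, and HTTP Basic auth with an Application Password are standard WordPress REST API behavior; the environment variable names `WP_USERNAME` and `WP_APP_PASSWORD` and the helper names are hypothetical:

```typescript
type WpResource = "posts" | "pages" | "categories";

// String concatenation (rather than new URL(path, base)) keeps a subdirectory
// install such as https://example.com/wordpress intact.
function wpEndpoint(baseUrl: string, resource: WpResource, page = 1): string {
  const base = baseUrl.replace(/\/+$/, "");
  const params = new URLSearchParams({ per_page: "100", page: String(page) });
  // _embed inlines featured media, so featured image URLs need no extra request.
  if (resource !== "categories") params.set("_embed", "1");
  return `${base}/wp-json/wp/v2/${resource}?${params.toString()}`;
}

// Public content needs no auth; credentials are only sent when both are set.
// btoa handles Latin-1 only, which covers typical usernames and app passwords.
function wpAuthHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const user = env["WP_USERNAME"];
  const password = env["WP_APP_PASSWORD"];
  if (!user || !password) return {};
  return { Authorization: `Basic ${btoa(`${user}:${password}`)}` };
}
```

WordPress reports the number of result pages in the `X-WP-TotalPages` response header, so a fetch loop can request `page=1..N` until it is exhausted.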
### Requirement: Fetch posts
The system MUST fetch the latest WordPress posts via `wp-json` and map them into an internal representation with:
- stable ID
- slug
- title
- excerpt/summary
- content HTML
- featured image URL when available
- publish date/time and last modified date/time
- category assignments (IDs and slugs when available)
#### Scenario: Posts fetched successfully
- **WHEN** the WordPress posts endpoint returns a non-empty list
- **THEN** the system stores the mapped post items in the content cache for rendering
### Requirement: Fetch pages
The system MUST fetch WordPress pages via `wp-json` and map them into an internal representation with:
- stable ID
- slug
- title
- excerpt/summary when available
- content HTML
- featured image URL when available
- publish date/time and last modified date/time
#### Scenario: Pages fetched successfully
- **WHEN** the WordPress pages endpoint returns a non-empty list
- **THEN** the system stores the mapped page items in the content cache for rendering
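A sketch of mapping raw `wp-json` post and page objects into the internal representation. The raw field names are from the WordPress REST API; `WpRawItem`, `MappedItem`, and `mapWpItem` are hypothetical, and the featured image lookup assumes the request used `_embed`:

```typescript
// Subset of the raw wp-json post/page object used here.
interface WpRawItem {
  id: number;
  slug: string;
  date_gmt: string; // e.g. "2026-02-10T06:20:58" (no zone suffix)
  modified_gmt: string;
  title: { rendered: string };
  excerpt?: { rendered: string }; // often absent for pages
  content: { rendered: string };
  categories?: number[]; // posts only
  _embedded?: { "wp:featuredmedia"?: Array<{ source_url?: string }> };
}

interface MappedItem {
  id: number;
  slug: string;
  kind: "post" | "page";
  title: string;
  excerpt: string;
  contentHtml: string;
  featuredImageUrl: string | null;
  publishedAt: string;
  updatedAt: string;
  categories: number[];
}

function mapWpItem(raw: WpRawItem, kind: "post" | "page"): MappedItem {
  return {
    id: raw.id,
    slug: raw.slug,
    kind,
    title: raw.title.rendered,
    excerpt: raw.excerpt?.rendered ?? "",
    contentHtml: raw.content.rendered,
    // Missing or inaccessible media yields no source_url, hence null.
    featuredImageUrl: raw._embedded?.["wp:featuredmedia"]?.[0]?.source_url ?? null,
    // *_gmt values lack a zone suffix; appending "Z" makes them parse as UTC.
    publishedAt: `${raw.date_gmt}Z`,
    updatedAt: `${raw.modified_gmt}Z`,
    categories: kind === "post" ? raw.categories ?? [] : [],
  };
}
```

`title.rendered` and `excerpt.rendered` can contain HTML entities and markup, so templates should render them as HTML (after sanitization) rather than as plain text.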
### Requirement: Fetch categories
The system MUST fetch WordPress categories via `wp-json` and store them for rendering a category-based secondary navigation under the blog section.
#### Scenario: Categories fetched successfully
- **WHEN** the WordPress categories endpoint returns a list of categories
- **THEN** the system stores categories (ID, slug, name) in the content cache for blog navigation
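A sketch of storing categories as `{ id, slug, name }` and resolving a post's category IDs to slugs for category links. The `id`, `slug`, and `name` fields exist on `wp-json` category objects; the helper names are hypothetical:

```typescript
interface Category {
  id: number;
  slug: string;
  name: string;
}

// Copies only the fields the blog navigation needs, dropping the rest.
function mapCategories(raw: Array<{ id: number; slug: string; name: string }>): Category[] {
  return raw.map(({ id, slug, name }) => ({ id, slug, name }));
}

// Resolves IDs to slugs in the post's order; unknown IDs (for example a
// category deleted after the post was cached) are skipped.
function categorySlugs(ids: readonly number[], categories: readonly Category[]): string[] {
  const byId = new Map<number, string>();
  for (const c of categories) byId.set(c.id, c.slug);
  const slugs: string[] = [];
  for (const id of ids) {
    const slug = byId.get(id);
    if (slug !== undefined) slugs.push(slug);
  }
  return slugs;
}
```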
### Requirement: Build-time caching
WordPress posts, pages, and categories MUST be written into the repo-local content cache used by the site build.
If the WordPress fetch fails, the system MUST NOT crash the entire build pipeline; it MUST either:
- keep the last-known-good cached WordPress content (if present), or
- store an empty WordPress dataset and allow the rest of the site to build.
#### Scenario: WordPress fetch fails
- **WHEN** a WordPress API request fails
- **THEN** the site build can still complete and the blog surface renders a graceful empty state
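The failure rule can be sketched as a wrapper that never rethrows. `WordPressDataset`, `emptyWordPress`, and `loadWordPress` are hypothetical names, and `previous` stands for the WordPress slice of the existing `content.json`, if any:

```typescript
interface WordPressDataset {
  posts: unknown[];
  pages: unknown[];
  categories: unknown[];
}

// A fresh object each time, so callers cannot mutate a shared default.
function emptyWordPress(): WordPressDataset {
  return { posts: [], pages: [], categories: [] };
}

// Returns fresh data on success; on failure, the last-known-good dataset if
// one exists, otherwise an empty dataset. It never rethrows, so a WordPress
// outage cannot fail the whole build.
async function loadWordPress(
  fetchAll: () => Promise<WordPressDataset>,
  previous: WordPressDataset | undefined,
  log: (msg: string) => void = console.error,
): Promise<WordPressDataset> {
  try {
    return await fetchAll();
  } catch (err) {
    log(`WordPress fetch failed: ${String(err)}`);
    return previous ?? emptyWordPress();
  }
}
```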

@@ -0,0 +1,28 @@
## 1. WordPress Config And Fetch
- [x] 1.1 Add WordPress env/config variables (base URL + credentials) and document them in `site/.env.example`
- [x] 1.2 Extend `site/scripts/fetch-content.ts` to fetch WordPress posts, pages, and categories via `wp-json` and write to `site/content/cache/content.json`
- [x] 1.3 Add a failure mode where WP fetch errors do not crash the whole fetch/build (keep last-known-good or write empty WP dataset)
## 2. Normalize And Select
- [x] 2.1 Extend content cache/types to represent WordPress items (post/page) and categories
- [x] 2.2 Add selector helpers for WordPress posts/pages/categories (ordered by publish date, filter by category)
## 3. Blog UI Surface
- [x] 3.1 Add `/blog` index page that renders WordPress post cards (featured image, title, excerpt)
- [x] 3.2 Add post detail routes (e.g., `/blog/post/<slug>`) that render the full post content
- [x] 3.3 Add page detail routes (e.g., `/blog/page/<slug>`) that render the full page content
- [x] 3.4 Add blog secondary navigation under header based on cached categories, with category listing pages (e.g., `/blog/category/<slug>`)
- [x] 3.5 Add header nav link "Blog" between "Podcast" and "About"
## 4. SEO And Sitemap
- [x] 4.1 Ensure blog pages include title/description/canonical URL metadata
- [x] 4.2 Update sitemap generation to include `/blog` and blog content routes when WP content is present at build time
## 5. Verification
- [x] 5.1 Add at least one test to assert the header includes the Blog link
- [x] 5.2 Add a build verification that `/blog` is generated and renders an empty state when no WP content is available

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-10

@@ -0,0 +1,33 @@
## Context
The homepage includes an Instagram section that renders even when there are no Instagram items in the cached dataset (or configured Instagram post URL list). Today the page shows an empty-state block, which takes away valuable homepage real estate.
The site is built with Astro and served as static HTML via nginx.
## Goals / Non-Goals
**Goals:**
- Omit the Instagram module entirely when there are no Instagram items available.
- Keep the rest of the homepage modules rendering unchanged.
- Preserve existing behavior when Instagram items exist (module visible, embeds render).
**Non-Goals:**
- Changing how Instagram data is ingested or configured (this remains a content/config concern).
- Adding new analytics events (existing Umami tracking remains as-is).
- Redesigning the homepage module layout beyond removing the empty Instagram block.
## Decisions
- **Decision: Prefer conditional rendering over “empty state” content.**
- Rationale: The objective is to reclaim space and keep the page focused; rendering nothing is the simplest and most consistent behavior.
- Alternative considered: Keep the empty state but reduce its height. Rejected because it still consumes space and draws attention to missing content.
- **Decision: Make the visibility decision at the homepage composition level.**
- Rationale: The homepage (`site/src/pages/index.astro`) is the module orchestrator and already has access to the list of Instagram items; the simplest change is to gate rendering of the section.
- Alternative considered: Add “hide if empty” behavior inside a shared module component. Rejected for now because the Instagram section is already bespoke in the homepage.
## Risks / Trade-offs
- [Risk] The homepage may look “suddenly different” if Instagram items disappear due to content/config changes. → Mitigation: This is the intended outcome; ensure the layout still flows nicely without the section.
- [Risk] Future requirements might want a minimal hint instead of full omission. → Mitigation: If needed later, we can reintroduce a smaller, inline CTA without restoring the full module.

@@ -0,0 +1,23 @@
## Why
The homepage currently reserves space for the Instagram module even when there are no Instagram items available, which wastes above-the-fold real estate and makes the page less relevant.
## What Changes
- When the Instagram feed dataset is empty, the homepage MUST omit the Instagram module entirely (not render an empty state block).
- The rest of the homepage modules MUST continue to render normally.
## Capabilities
### New Capabilities
- (none)
### Modified Capabilities
- `homepage-content-modules`: change the "no Instagram items available" behavior from rendering an empty state to hiding/omitting the Instagram module.
## Impact
- Affected UI: homepage module composition and layout.
- Likely code changes in the homepage renderer (e.g., `site/src/pages/index.astro`) and any Instagram module component(s).
- Tests/verification should cover: with Instagram items => module visible; without items => module omitted.

@@ -0,0 +1,11 @@
## MODIFIED Requirements
### Requirement: Graceful empty and error states
If a module has no content to display, the homepage MUST render a non-broken empty state for that module and MUST still render the rest of the page.
The Instagram module is an exception: if there are no Instagram items to display, the homepage MUST omit the Instagram module entirely (no empty state block) and MUST still render the rest of the page.
#### Scenario: No Instagram items available
- **WHEN** the cached dataset contains no Instagram items
- **THEN** the Instagram-related module is not rendered and the homepage still renders other modules

@@ -0,0 +1,15 @@
## 1. Reproduce And Locate
- [x] 1.1 Confirm where the homepage renders the Instagram section and what data source it uses (`site/src/pages/index.astro`)
- [x] 1.2 Identify the condition that currently triggers the Instagram empty state (no items) and the markup that reserves space
## 2. Implement Conditional Rendering
- [x] 2.1 Update the homepage to omit the Instagram section entirely when there are 0 Instagram items (no header, no empty-state text, no container)
- [x] 2.2 Ensure when Instagram items exist, the Instagram section still renders correctly (embeds/links unchanged)
## 3. Verify
- [x] 3.1 Build the site and confirm the homepage HTML does not include the Instagram section when `site/content/instagram-posts.json` is empty
- [x] 3.2 Build the site and confirm the homepage HTML includes the Instagram section when `site/content/instagram-posts.json` has at least one item
- [x] 3.3 Smoke-test in Docker/nginx (`localhost:8080`) to ensure the homepage layout renders correctly in both cases