better cache

This commit is contained in:
2026-02-10 01:20:58 -05:00
parent c773affbc8
commit f056e67eae
39 changed files with 830 additions and 17 deletions

View File

@@ -0,0 +1,48 @@
## Context
The site is an Astro static build served via nginx. Content is gathered by build-time ingestion (`site/scripts/fetch-content.ts`) that reads/writes a repo-local cache file (`site/content/cache/content.json`).
Today, repeated ingestion runs can re-hit external sources (YouTube API/RSS, podcast RSS, WordPress `wp-json`) and re-do normalization work. We want a shared caching layer to reduce IO and network load and to make repeated runs faster and more predictable.
## Goals / Non-Goals
**Goals:**
- Add a Redis-backed cache layer usable from Node scripts (ingestion) with TTL-based invalidation.
- Use the cache layer to reduce repeated network/API calls and parsing work for:
- social content ingestion (YouTube/podcast/Instagram list)
- WordPress `wp-json` ingestion
- Provide a default “industry standard” TTL with environment override.
- Add a manual cache clear command/script.
- Provide verification (tests and/or logs) that cache hits occur and TTL expiration behaves as expected.
**Non-Goals:**
- Adding a runtime server for the site (the site remains static HTML served by nginx).
- Caching browser requests to nginx (no CDN/edge cache configuration in this change).
- Perfect cache coherence across multiple machines/environments (dev+docker is the target).
## Decisions
- **Decision: Use Redis as the shared cache backend (docker-compose service).**
- Rationale: Redis is widely adopted, lightweight, supports TTLs natively, and is easy to run in dev via Docker.
- Alternative considered: Local file-based cache only. Rejected because it doesnt provide a shared service and is harder to invalidate consistently.
- **Decision: Cache at the “source fetch” and “normalized dataset” boundaries.**
- Rationale: The biggest cost is network + parsing/normalization. Caching raw API responses (or normalized outputs) by source+params gives the best win.
- Approach:
- Cache keys like `youtube:api:<channelId>:<limit>`, `podcast:rss:<url>`, `wp:posts`, `wp:pages`, `wp:categories`.
- Store JSON values, set TTL, and log hit/miss per key.
- **Decision: Default TTL = 1 hour (3600s), configurable via env.**
- Rationale: A 1h TTL is a common baseline for content freshness vs load. It also aligns with typical ingestion schedules (hourly/daily).
- Allow overrides for local testing and production tuning.
- **Decision: Cache clear script uses Redis `FLUSHDB` in the configured Redis database.**
- Rationale: Simple manual operation and easy to verify.
- Guardrail: Use a dedicated Redis DB index (e.g., `0` by default) so the script is scoped.
## Risks / Trade-offs
- [Risk] Redis introduces a new dependency and operational moving part. -> Mitigation: Keep Redis optional; ingestion should fall back to no-cache mode if Redis is not reachable.
- [Risk] Stale content if TTL too long. -> Mitigation: Default to 1h and allow env override; provide manual clear command.
- [Risk] Cache key mistakes lead to wrong content reuse. -> Mitigation: Centralize key generation and add tests for key uniqueness and TTL behavior.

View File

@@ -0,0 +1,28 @@
## Why
Reduce IO and external fetch load by adding a shared caching layer so repeated requests for the same content do not re-hit disk/network unnecessarily.
## What Changes
- Add a caching layer (Redis or similar lightweight cache) used by the sites data/ingestion flows.
- Add a cache service to `docker-compose.yml`.
- Define an industry-standard cache invalidation interval (TTL) with a sensible default and allow it to be configured via environment variables.
- Add a script/command to manually clear the cache on demand.
- Add verification that the cache is working (cache hits/misses and TTL behavior).
## Capabilities
### New Capabilities
- `cache-layer`: Provide a shared caching service (Redis or equivalent) with TTL-based invalidation and a manual clear operation for the websites data flows.
### Modified Capabilities
- `social-content-aggregation`: Use the cache layer to avoid re-fetching or re-processing external content sources on repeated runs/requests.
- `wordpress-content-source`: Use the cache layer to reduce repeated `wp-json` fetches and parsing work.
## Impact
- Deployment/local dev: add Redis (or equivalent) to `docker-compose.yml` and wire environment/config for connection + TTL.
- Scripts/services: update ingestion/build-time fetch to read/write via cache and log hit/miss for verification.
- Tooling: add a cache-clear script/command (and document usage).
- Testing: add tests or a lightweight verification step proving cached reads are used and expire as expected.

View File

@@ -0,0 +1,38 @@
## ADDED Requirements
### Requirement: Redis-backed cache service
The system MUST provide a Redis-backed cache service for use by ingestion and content processing flows.
The cache service MUST be runnable in local development via Docker Compose.
#### Scenario: Cache service available in Docker
- **WHEN** the Docker Compose stack is started
- **THEN** a Redis service is available to other services/scripts on the internal network
### Requirement: TTL-based invalidation
Cached entries MUST support TTL-based invalidation.
The system MUST define a default TTL and MUST allow overriding the TTL via environment/config.
#### Scenario: Default TTL applies
- **WHEN** a cached entry is written without an explicit TTL override
- **THEN** it expires after the configured default TTL
#### Scenario: TTL override applies
- **WHEN** a TTL override is configured via environment/config
- **THEN** new cached entries use that TTL for expiration
### Requirement: Cache key namespace
Cache keys MUST be namespaced by source and parameters so that different data requests do not collide.
#### Scenario: Two different sources do not collide
- **WHEN** the system caches a YouTube fetch and a WordPress fetch
- **THEN** they use different key namespaces and do not overwrite each other
### Requirement: Manual cache clear
The system MUST provide a script/command to manually clear the cache.
#### Scenario: Manual clear executed
- **WHEN** a developer runs the cache clear command
- **THEN** the cache is cleared and subsequent ingestion runs produce cache misses

View File

@@ -0,0 +1,23 @@
## MODIFIED Requirements
### Requirement: Refresh and caching
The system MUST cache the latest successful ingestion output and MUST serve the cached data to the site renderer.
The system MUST support periodic refresh on a schedule (at minimum daily) and MUST support a manual refresh trigger.
On ingestion failure, the system MUST continue serving the most recent cached data.
The ingestion pipeline MUST use the cache layer (when configured and reachable) to reduce repeated network and parsing work for external sources (for example, YouTube API/RSS and podcast RSS).
#### Scenario: Scheduled refresh fails
- **WHEN** a scheduled refresh run fails to fetch one or more sources
- **THEN** the site continues to use the most recent successfully cached dataset
#### Scenario: Manual refresh requested
- **WHEN** a manual refresh is triggered
- **THEN** the system attempts ingestion immediately and updates the cache if ingestion succeeds
#### Scenario: Cache hit avoids refetch
- **WHEN** a refresh run is executed within the cache TTL for a given source+parameters
- **THEN** the ingestion pipeline uses cached data for that source instead of refetching over the network

View File

@@ -0,0 +1,19 @@
## MODIFIED Requirements
### Requirement: Build-time caching
WordPress posts, pages, and categories MUST be written into the repo-local content cache used by the site build.
If the WordPress fetch fails, the system MUST NOT crash the entire build pipeline; it MUST either:
- keep the last-known-good cached WordPress content (if present), or
- store an empty WordPress dataset and allow the rest of the site to build.
When the cache layer is configured and reachable, the WordPress ingestion MUST cache `wp-json` responses (or normalized outputs) using a TTL so repeated ingestion runs avoid unnecessary network requests and parsing work.
#### Scenario: WordPress fetch fails
- **WHEN** a WordPress API request fails
- **THEN** the site build can still complete and the blog surface renders a graceful empty state
#### Scenario: Cache hit avoids wp-json refetch
- **WHEN** WordPress ingestion is executed within the configured cache TTL
- **THEN** it uses cached data instead of refetching from `wp-json`

View File

@@ -0,0 +1,26 @@
## 1. Cache Service And Config
- [x] 1.1 Add Redis service to `docker-compose.yml` and wire basic health/ports for local dev
- [x] 1.2 Add cache env/config variables (Redis URL/host+port, DB index, default TTL seconds) and document in `site/.env.example`
## 2. Cache Client And Utilities
- [x] 2.1 Add a small Redis cache client wrapper (get/set JSON with TTL, namespaced keys) for Node scripts
- [x] 2.2 Add logging for cache hit/miss per key to support verification
- [x] 2.3 Ensure caching is optional: if Redis is unreachable, ingestion proceeds without caching
## 3. Integrate With Ingestion
- [x] 3.1 Cache YouTube fetches (API and/or RSS) by source+params and reuse within TTL
- [x] 3.2 Cache podcast RSS fetch by URL and reuse within TTL
- [x] 3.3 Cache WordPress `wp-json` fetches (posts/pages/categories) and reuse within TTL
## 4. Cache Invalidation
- [x] 4.1 Add a command/script to manually clear the cache (scoped to configured Redis DB)
- [x] 4.2 Document the cache clear command usage
## 5. Verification
- [x] 5.1 Add a test that exercises the cache wrapper (set/get JSON + TTL expiration behavior)
- [x] 5.2 Add a test or build verification that a second ingestion run within TTL produces cache hits

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-10