astro-website/openspec/changes/archive/2026-02-10-better-cache/design.md
2026-02-10 01:20:58 -05:00
## Context
The site is an Astro static build served via nginx. Content is gathered by build-time ingestion (`site/scripts/fetch-content.ts`) that reads/writes a repo-local cache file (`site/content/cache/content.json`).
Today, repeated ingestion runs can re-hit external sources (YouTube API/RSS, podcast RSS, WordPress `wp-json`) and re-do normalization work. We want a shared caching layer to reduce IO and network load and to make repeated runs faster and more predictable.
## Goals / Non-Goals
**Goals:**
- Add a Redis-backed cache layer usable from Node scripts (ingestion) with TTL-based invalidation.
- Use the cache layer to reduce repeated network/API calls and parsing work for:
  - social content ingestion (YouTube/podcast/Instagram list)
  - WordPress `wp-json` ingestion
- Provide a default “industry standard” TTL with environment override.
- Add a manual cache clear command/script.
- Provide verification (tests and/or logs) that cache hits occur and TTL expiration behaves as expected.
**Non-Goals:**
- Adding a runtime server for the site (the site remains static HTML served by nginx).
- Caching browser requests to nginx (no CDN/edge cache configuration in this change).
- Perfect cache coherence across multiple machines/environments (dev+docker is the target).
## Decisions
- **Decision: Use Redis as the shared cache backend (docker-compose service).**
  - Rationale: Redis is widely adopted, lightweight, supports TTLs natively, and is easy to run in dev via Docker.
  - Alternative considered: a local file-based cache only. Rejected because it doesn't provide a shared service and is harder to invalidate consistently.
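A minimal compose service along these lines would satisfy this decision; the service name, image tag, and volume name below are assumptions for illustration, not taken from the repo's actual compose file:

```yaml
# Sketch only: names and versions are illustrative assumptions.
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```

The named volume keeps cached entries across container restarts in dev; TTLs still expire entries as usual.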
- **Decision: Cache at the “source fetch” and “normalized dataset” boundaries.**
  - Rationale: The biggest cost is network + parsing/normalization. Caching raw API responses (or normalized outputs) by source+params gives the best win.
  - Approach:
    - Cache keys like `youtube:api:<channelId>:<limit>`, `podcast:rss:<url>`, `wp:posts`, `wp:pages`, `wp:categories`.
    - Store JSON values, set TTL, and log hit/miss per key.
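Centralized key generation (also the mitigation for the key-mistake risk below) could be sketched as follows; the function name and encoding choice are illustrative assumptions, not from the codebase:

```typescript
// Sketch: one helper builds every cache key so all callers agree on the
// shape. Names here are illustrative, not from the codebase.
type KeyParts = (string | number)[];

// Join parts with ":" after encoding any ":" inside a part, so two
// different inputs can never collapse onto the same key.
function cacheKey(...parts: KeyParts): string {
  return parts
    .map((p) => String(p).replace(/:/g, "%3A"))
    .join(":");
}

// Examples matching the key shapes above:
console.log(cacheKey("youtube", "api", "UCabc123", 25)); // youtube:api:UCabc123:25
console.log(cacheKey("podcast", "rss", "https://example.com/feed.xml"));
console.log(cacheKey("wp", "posts")); // wp:posts
```

Escaping the separator matters mostly for URL-bearing keys like `podcast:rss:<url>`, where the raw value already contains `:`.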
- **Decision: Default TTL = 1 hour (3600s), configurable via env.**
  - Rationale: A 1h TTL is a common baseline for content freshness vs load. It also aligns with typical ingestion schedules (hourly/daily).
  - Allow overrides for local testing and production tuning.
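Reading the override might look like the sketch below; the variable name `CACHE_TTL_SECONDS` and the fallback-on-invalid behavior are assumptions:

```typescript
// Sketch: default TTL of 3600s, overridable via an environment variable.
// CACHE_TTL_SECONDS is an assumed name, not confirmed by the codebase.
const DEFAULT_TTL_SECONDS = 3600;

function resolveTtlSeconds(
  env: Record<string, string | undefined> = process.env,
): number {
  const raw = env.CACHE_TTL_SECONDS;
  const parsed = raw === undefined ? NaN : Number(raw);
  // Fall back to the default on missing, non-numeric, or non-positive values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TTL_SECONDS;
}

console.log(resolveTtlSeconds({})); // 3600
console.log(resolveTtlSeconds({ CACHE_TTL_SECONDS: "60" })); // 60
```

Rejecting non-positive values keeps a typo like `CACHE_TTL_SECONDS=-1` from silently disabling expiration.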
- **Decision: Cache clear script uses Redis `FLUSHDB` in the configured Redis database.**
  - Rationale: It is a simple manual operation that is easy to verify.
  - Guardrail: Use a dedicated Redis DB index (e.g., `0` by default) so the script is scoped.
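Wired as a `package.json` script, the clear command could look roughly like this (the script name is an assumption; `redis-cli -n <db> flushdb` flushes only the selected database, which is the scoping guardrail above):

```json
{
  "scripts": {
    "cache:clear": "redis-cli -n 0 flushdb"
  }
}
```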
## Risks / Trade-offs
- [Risk] Redis introduces a new dependency and operational moving part. -> Mitigation: Keep Redis optional; ingestion should fall back to a no-cache mode if Redis is not reachable.
- [Risk] Stale content if the TTL is too long. -> Mitigation: Default to 1h and allow env override; provide a manual clear command.
- [Risk] Cache key mistakes lead to wrong content reuse. -> Mitigation: Centralize key generation and add tests for key uniqueness and TTL behavior.
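The no-cache fallback from the first mitigation can be sketched against a minimal cache interface; every name below is illustrative, and a real implementation would back the interface with a Redis client:

```typescript
// Sketch of graceful degradation: ingestion calls getOrFetch(), which
// consults a cache when available but always succeeds even if the cache
// is down. All names here are illustrative, not from the codebase.
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Stand-in for "Redis unreachable": every read misses, every write is a no-op.
const nullCache: Cache = {
  async get() { return null; },
  async set() { /* no-op */ },
};

async function getOrFetch<T>(
  cache: Cache,
  key: string,
  ttlSeconds: number,
  fetcher: () => Promise<T>,
): Promise<T> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) {
      console.log(`cache hit: ${key}`); // per-key hit/miss logging from the Approach above
      return JSON.parse(hit) as T;
    }
  } catch {
    // Cache read failed: degrade to no-cache mode rather than failing ingestion.
  }
  console.log(`cache miss: ${key}`);
  const value = await fetcher();
  try {
    await cache.set(key, JSON.stringify(value), ttlSeconds);
  } catch {
    // Ignore write failures for the same reason.
  }
  return value;
}
```

With `nullCache`, the fetcher always runs and ingestion behaves exactly as it does today, which is the intended fallback semantics.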