## Context

Current operations are concentrated in `backend/cli.py` with a single `force-fetch` command and no unified admin maintenance suite. Operational actions such as archive cleanup, translation regeneration, image refresh, and cache/news reset require manual code/DB operations.

Existing backend services already contain reusable primitives: ingestion (`process_and_store_news`), archival helpers (`archive_old_news`, `delete_archived_news`), and translation generation pipelines in `backend/news_service.py`.

## Goals / Non-Goals

**Goals:**
- Introduce an admin command suite that consolidates common maintenance and recovery actions.
- Implement queued image refetch for latest 30 items, sequentially processed with exponential backoff.
- Improve image refresh relevance by combining keyword and mood/sentiment cues with deterministic fallback behavior.
- Provide safe destructive operations (`clear-news`, `clean-archive`, cache clear) with operator guardrails.
- Add translation regeneration and parameterized fetch count command to reduce manual intervention.

**Non-Goals:**
- Replacing the scheduled ingestion model.
- Introducing external queue infrastructure (RabbitMQ/Redis workers) for this phase.
- Redesigning storage models or adding new DB tables unless strictly necessary.
- Building a web-based admin dashboard in this change.

## Decisions

### Decision: Extend existing CLI with subcommands

**Decision:** Expand `backend/cli.py` into a multi-subcommand admin command suite.

**Rationale:**
- Reuses existing deployment/runtime assumptions.
- Keeps operations scriptable via terminal/cron and avoids UI scope expansion.

**Alternatives considered:**
- New standalone admin binary: rejected due to duplicated bootstrapping/runtime checks.

### Decision: Queue image refetch in-process with sequential workers

**Decision:** Build a bounded in-memory queue for latest 30 items and process one-by-one.

**Rationale:**
- Meets rate-limit resilience requirement without new infrastructure.
- Deterministic and easy to monitor in command output.

**Alternatives considered:**
- Parallel refetch workers: rejected due to higher provider throttling risk.

### Decision: Exponential backoff for external image calls

**Decision:** Apply exponential backoff with capped retries for rate-limited or transient failures.

**Rationale:**
- Reduces burst retry amplification.
- Improves success rate under API pressure.

### Decision: Safety-first destructive command ergonomics

**Decision:** Destructive operations require explicit confirmation/flags and support dry-run where meaningful.

**Rationale:**
- Prevents accidental data loss.
- Makes admin actions auditable and predictable.

### Decision: Fetch-N command reuses ingestion pipeline

**Decision:** Add a fetch-count option that drives existing ingestion/fetch flow rather than building a second implementation.

**Rationale:**
- Preserves deduplication/retry logic and minimizes divergence.

## Risks / Trade-offs

- **[Risk] Operator misuse of destructive commands** -> Mitigation: confirmation gate + explicit flags + dry-run.
- **[Risk] Backoff can increase command runtime** -> Mitigation: cap retries and print ETA-style progress output.
- **[Risk] Queue processing interruption mid-run** -> Mitigation: idempotent per-item updates and resumable reruns.
- **[Trade-off] In-process queue is simpler but non-distributed** -> Mitigation: acceptable for admin-invoked maintenance scope.

## Migration Plan

1. Extend CLI parser with admin subcommands and argument validation.
2. Add reusable maintenance handlers (archive clean, cache clear, clear news, rebuild, regenerate translations, fetch-n).
3. Implement queued image-refetch handler with exponential backoff and per-item progress logs.
4. Add safeguards (`--confirm`, optional `--dry-run`) for destructive operations.
5. Document command usage and examples in README.

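
As a rough illustration of step 3, the queued refetch with capped exponential backoff might look like the sketch below. Everything named here is hypothetical: `refetch_images`, the `RateLimited` exception, the injected `fetch_image` callable, and the retry/delay defaults are assumptions for illustration, not existing code in `backend/`.

```python
import time
from collections import deque
from typing import Callable, Dict, Iterable, Optional


class RateLimited(Exception):
    """Hypothetical error for a throttled or transiently failing image call."""


def refetch_images(
    item_ids: Iterable[int],
    fetch_image: Callable[[int], str],
    *,
    limit: int = 30,           # "latest 30 items" from the goals
    max_retries: int = 4,      # assumed cap, not specified in this design
    base_delay: float = 1.0,   # assumed starting delay in seconds
    max_delay: float = 30.0,   # assumed ceiling for a single wait
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, Optional[str]]:
    """Process up to `limit` items one at a time, retrying with backoff."""
    queue = deque(list(item_ids)[:limit])  # bounded, in-memory, FIFO
    total = len(queue)
    results: Dict[int, Optional[str]] = {}

    while queue:
        item_id = queue.popleft()
        results[item_id] = None  # stays None if every attempt fails
        for attempt in range(max_retries + 1):
            try:
                results[item_id] = fetch_image(item_id)
                break
            except RateLimited:
                if attempt == max_retries:
                    break  # give up on this item and move to the next
                # Delay doubles on each retry: base, 2x, 4x, ... up to max_delay.
                sleep(min(max_delay, base_delay * 2 ** attempt))
        status = "ok" if results[item_id] else "failed"
        print(f"[{len(results)}/{total}] item {item_id}: {status}")

    return results
```

Passing `sleep` as a parameter keeps the handler testable without real waiting. Because each item is handled independently and failures are recorded as `None`, an interrupted or partially failed run can simply be rerun, which matches the resumable-rerun mitigation listed under Risks.
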
Rollback:
- Keep existing `force-fetch` path intact.
- Revert new subcommands while preserving unaffected ingestion pipeline.

## Open Questions

- What cache layers are considered in-scope for `clear-cache` (in-memory only vs additional filesystem cache)?
- Should `rebuild-site` chain all maintenance actions or remain a defined subset with explicit steps?
- Should `fetch n` enforce an upper bound to avoid accidental high-cost runs?
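
To make the parser and guardrail steps of the Migration Plan more concrete, and to show one possible answer to the upper-bound question, here is a minimal `argparse` sketch. It assumes the CLI is built on the standard library; the helper names (`bounded_count`, `build_parser`, `plan_destructive`), the `fetch` subcommand spelling, and the cap of 100 are illustrative assumptions rather than decisions made in this design.

```python
import argparse


def bounded_count(max_value: int):
    """Build an argparse type that accepts integers in [1, max_value]."""
    def parse(text: str) -> int:
        value = int(text)  # argparse reports a ValueError as a usage error
        if not 1 <= value <= max_value:
            raise argparse.ArgumentTypeError(
                f"count must be between 1 and {max_value}"
            )
        return value
    return parse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("force-fetch")  # existing command stays available

    fetch = sub.add_parser("fetch")
    fetch.add_argument("count", type=bounded_count(100))  # 100 is an assumed cap

    # Destructive commands share the same guard flags.
    for name in ("clear-news", "clean-archive", "clear-cache"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--confirm", action="store_true")
        cmd.add_argument("--dry-run", action="store_true")
    return parser


def plan_destructive(args: argparse.Namespace) -> str:
    """Decide whether a destructive command may run, preferring dry-run."""
    if args.dry_run:
        return "dry-run"
    if not args.confirm:
        raise SystemExit(f"{args.command}: refusing to run without --confirm")
    return "execute"
```

With this shape, `cli.py clear-news --dry-run` would report what it would delete, `cli.py clear-news` alone would exit with an error, and `cli.py fetch 1000` would be rejected at parse time.
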