Context

Operational tooling is currently concentrated in backend/cli.py, which exposes a single force-fetch command; there is no unified admin maintenance suite. Actions such as archive cleanup, translation regeneration, image refresh, and cache/news reset require manual code or DB operations. Existing backend services already contain reusable primitives: ingestion (process_and_store_news), archival helpers (archive_old_news, delete_archived_news), and translation generation pipelines in backend/news_service.py.

Goals / Non-Goals

Goals:

  • Introduce an admin command suite that consolidates common maintenance and recovery actions.
  • Implement queued image refetch for the latest 30 items, processed sequentially with exponential backoff.
  • Improve image refresh relevance by combining keyword and mood/sentiment cues with deterministic fallback behavior.
  • Provide safe destructive operations (clear-news, clean-archive, clear-cache) with operator guardrails.
  • Add a translation regeneration command and a parameterized fetch-count command to reduce manual intervention.

Non-Goals:

  • Replacing the scheduled ingestion model.
  • Introducing external queue infrastructure (RabbitMQ/Redis workers) for this phase.
  • Redesigning storage models or adding new DB tables unless strictly necessary.
  • Building a web-based admin dashboard in this change.

Decisions

Decision: Extend existing CLI with subcommands

Decision: Expand backend/cli.py into a multi-subcommand admin command suite.

Rationale:

  • Reuses existing deployment/runtime assumptions.
  • Keeps operations scriptable via terminal/cron and avoids UI scope expansion.

Alternatives considered:

  • New standalone admin binary: rejected due to duplicated bootstrapping/runtime checks.
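
A minimal sketch of how the parser could be laid out with argparse subparsers. The subcommand refetch-images, the --limit flag, and the function name build_parser are assumptions for illustration; only force-fetch, clear-news, clean-archive, clear-cache, --confirm, and --dry-run are named elsewhere in this document, and the final names are not settled here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subparser per admin action (illustrative names)."""
    parser = argparse.ArgumentParser(
        prog="cli.py", description="Admin maintenance commands"
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # The existing behavior stays reachable as its own subcommand.
    sub.add_parser("force-fetch", help="Run the existing force-fetch path")

    # Hypothetical subcommand name and flag for the queued image refetch.
    refetch = sub.add_parser("refetch-images", help="Refetch images for recent items")
    refetch.add_argument("--limit", type=int, default=30)

    # Destructive subcommands share the same safety flags.
    for name in ("clear-news", "clean-archive", "clear-cache"):
        destructive = sub.add_parser(name, help=f"Destructive: {name}")
        destructive.add_argument("--confirm", action="store_true")
        destructive.add_argument("--dry-run", action="store_true")

    return parser
```

Calling build_parser().parse_args(["clear-news", "--dry-run"]) yields a namespace whose command attribute can be used to dispatch to the matching handler.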

Decision: Queue image refetch in-process with sequential workers

Decision: Build a bounded in-memory queue for the latest 30 items and process them one by one.

Rationale:

  • Meets rate-limit resilience requirement without new infrastructure.
  • Deterministic and easy to monitor in command output.

Alternatives considered:

  • Parallel refetch workers: rejected due to higher provider throttling risk.
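
The bounded, sequential queue might look roughly like the sketch below. The function name refetch_sequentially and the refetch_one callback are hypothetical stand-ins; the real handler would supply its own item lookup and image update logic.

```python
from collections import deque
from itertools import islice
from typing import Callable, Dict, Iterable, List


def refetch_sequentially(
    item_ids: Iterable[int],
    refetch_one: Callable[[int], bool],
    limit: int = 30,
) -> Dict[str, List[int]]:
    """Process at most `limit` items one at a time and print per-item progress."""
    # Truncate while enqueueing so the queue never holds more than `limit` ids.
    queue = deque(islice(item_ids, limit))
    total = len(queue)
    results: Dict[str, List[int]] = {"ok": [], "failed": []}

    while queue:
        item_id = queue.popleft()
        position = total - len(queue)
        try:
            ok = bool(refetch_one(item_id))
        except Exception as exc:  # one bad item must not abort the whole run
            ok = False
            print(f"[{position}/{total}] item {item_id}: error: {exc}")
        else:
            print(f"[{position}/{total}] item {item_id}: {'ok' if ok else 'failed'}")
        results["ok" if ok else "failed"].append(item_id)

    return results
```

Because items are handled strictly in order and each outcome is recorded, a rerun can be limited to the ids reported under "failed".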

Decision: Exponential backoff for external image calls

Decision: Apply exponential backoff with capped retries for rate-limited or transient failures.

Rationale:

  • Reduces burst retry amplification.
  • Improves success rate under API pressure.
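
As an illustration of capped exponential backoff, the sketch below doubles the delay after each failed attempt up to a ceiling. TransientError, the default retry count, and the base and cap values are assumptions; the actual provider errors and limits are not specified in this document.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for rate-limited or transient provider failures."""


def backoff_delays(max_retries: int = 4, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Return the delay before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on TransientError up to `max_retries` times."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except TransientError:
            sleep(delay)  # wait longer after each consecutive failure
    # Final attempt: any exception now propagates to the caller.
    return fn()
```

The sleep parameter is injectable so the retry schedule can be exercised in tests without real waiting.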

Decision: Safety-first destructive command ergonomics

Decision: Destructive operations require explicit confirmation/flags and support dry-run where meaningful.

Rationale:

  • Prevents accidental data loss.
  • Makes admin actions auditable and predictable.
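
One possible shape for the confirmation gate is sketched below. The helper name run_destructive and its callbacks are hypothetical; the point is only that a dry run reports without mutating, and that nothing executes unless confirmation was passed explicitly.

```python
from typing import Callable


def run_destructive(
    action_name: str,
    count_affected: Callable[[], int],
    execute: Callable[[], None],
    confirm: bool = False,
    dry_run: bool = False,
) -> str:
    """Gate a destructive action behind --confirm, with an optional dry run."""
    affected = count_affected()

    if dry_run:
        # Dry run takes precedence over confirm, so it can never mutate data.
        print(f"[dry-run] {action_name} would affect {affected} item(s)")
        return "dry-run"

    if not confirm:
        print(f"Refusing {action_name}: pass --confirm to affect {affected} item(s)")
        return "refused"

    execute()
    print(f"{action_name} completed: {affected} item(s) affected")
    return "executed"
```

Returning a status string keeps the outcome easy to log, which supports the auditability goal above.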

Decision: Fetch-N command reuses ingestion pipeline

Decision: Add a fetch-count option that drives existing ingestion/fetch flow rather than building a second implementation.

Rationale:

  • Preserves deduplication/retry logic and minimizes divergence.
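
A hedged sketch of the fetch-count argument follows. MAX_FETCH_COUNT is a placeholder, since whether an upper bound applies is still listed under Open Questions, and the ingest callback stands in for the existing ingestion flow, whose signature is not shown in this document.

```python
import argparse
from typing import Callable

MAX_FETCH_COUNT = 200  # assumption: placeholder cap, not a decided value


def fetch_count(value: str) -> int:
    """argparse type: accept an integer between 1 and MAX_FETCH_COUNT."""
    count = int(value)  # a ValueError here is reported by argparse as invalid
    if not 1 <= count <= MAX_FETCH_COUNT:
        raise argparse.ArgumentTypeError(
            f"count must be between 1 and {MAX_FETCH_COUNT}"
        )
    return count


def build_fetch_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical fetch-n command taking a validated count."""
    parser = argparse.ArgumentParser(prog="fetch-n")
    parser.add_argument("count", type=fetch_count, help="number of items to fetch")
    return parser


def run_fetch(count: int, ingest: Callable[[int], int]) -> int:
    """Delegate to the existing ingestion flow instead of reimplementing it."""
    return ingest(count)
```

Validating the count at parse time means an out-of-range value is rejected before any provider call is made.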

Risks / Trade-offs

  • [Risk] Operator misuse of destructive commands -> Mitigation: confirmation gate + explicit flags + dry-run.
  • [Risk] Backoff can increase command runtime -> Mitigation: cap retries and print ETA-style progress output.
  • [Risk] Queue processing interruption mid-run -> Mitigation: idempotent per-item updates and resumable reruns.
  • [Trade-off] In-process queue is simpler but non-distributed -> Mitigation: acceptable for admin-invoked maintenance scope.

Migration Plan

  1. Extend CLI parser with admin subcommands and argument validation.
  2. Add reusable maintenance handlers (archive clean, cache clear, clear news, rebuild, regenerate translations, fetch-n).
  3. Implement queued image-refetch handler with exponential backoff and per-item progress logs.
  4. Add safeguards (--confirm, optional --dry-run) for destructive operations.
  5. Document command usage and examples in README.

Rollback:

  • Keep existing force-fetch path intact.
  • Revert new subcommands while preserving unaffected ingestion pipeline.

Open Questions

  • What cache layers are considered in-scope for clear-cache (in-memory only vs additional filesystem cache)?
  • Should rebuild-site chain all maintenance actions or remain a defined subset with explicit steps?
  • Should fetch-n enforce an upper bound to avoid accidental high-cost runs?