Context
ClawFort currently performs automated hourly news ingestion through APScheduler (scheduled_news_fetch() in backend/news_service.py). The same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.
The change adds an explicit manual trigger path for operations use cases:
- first-time bootstrap (populate content immediately after setup)
- recovery after failed external API calls
- ad-hoc operational refresh without waiting for scheduler cadence
Constraints:
- Reuse existing fetch pipeline to avoid logic drift
- Keep behavior idempotent with existing duplicate detection
- Preserve scheduler behavior; manual runs must not mutate scheduler configuration
Goals / Non-Goals
Goals:
- Provide a Python command to force an immediate news fetch.
- Reuse existing retry, dedup, and storage logic.
- Return clear terminal output and process exit status for automation.
- Keep command safe to run repeatedly.
Non-Goals:
- Replacing APScheduler-based hourly fetch.
- Introducing new API endpoints for manual triggering.
- Changing data schema or retention policy.
- Building a full operator dashboard.
Decisions
Decision: Add a dedicated CLI entrypoint module
Decision: Add a small CLI entrypoint under backend (for example backend/cli.py) with a subcommand that invokes the fetch pipeline.
Rationale:
- Keeps operational workflow explicit and scriptable.
- Avoids coupling manual trigger behavior to HTTP routes.
- Works in local dev and containerized runtime.
Alternatives considered:
- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
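A minimal sketch of what such an entrypoint could look like, using argparse from the standard library. The function names build_parser and main, and the injected force_fetch callable, are illustrative assumptions rather than existing ClawFort code. The subcommand name force-fetch follows the migration plan.

```python
import argparse
from typing import Callable, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build the operator CLI parser with a single force-fetch subcommand."""
    parser = argparse.ArgumentParser(
        prog="cli", description="ClawFort operator commands (sketch)"
    )
    # required=True makes argparse reject invocations without a subcommand.
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser(
        "force-fetch", help="Run the news fetch pipeline once, immediately"
    )
    return parser


def main(
    argv: Optional[Sequence[str]] = None,
    force_fetch: Callable[[], int] = lambda: 0,  # hypothetical handler, returns an exit code
) -> int:
    """Parse arguments and dispatch to the handler; the return value is the exit code."""
    args = build_parser().parse_args(argv)
    if args.command == "force-fetch":
        return force_fetch()
    return 2  # not reachable while force-fetch is the only subcommand
```

If the module lived at backend/cli.py and ended with raise SystemExit(main()), it could be run as python -m backend.cli force-fetch both locally and inside the container. Injecting the handler keeps the dispatch testable without touching the network or the database.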
Decision: Invoke the existing news pipeline directly
Decision: The command should call process_and_store_news() (or the existing sync wrapper) instead of implementing parallel fetch logic.
Rationale:
- Guarantees parity with scheduled runs.
- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
- Minimizes maintenance overhead.
Alternatives considered:
- New command-specific fetch implementation: rejected due to drift risk.
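The command can stay a thin adapter around the existing pipeline function. The sketch below assumes that process_and_store_news() may be a coroutine function, which the mention of a sync wrapper suggests but does not confirm. run_pipeline_once is a hypothetical helper name, and the pipeline is passed in as a parameter so the example runs on its own.

```python
import asyncio
import inspect
from typing import Any, Callable


def run_pipeline_once(pipeline: Callable[[], Any]) -> Any:
    """Call the pipeline exactly once and return whatever it returns.

    If the pipeline is declared with async def, drive it to completion with
    asyncio.run; otherwise call it directly. No fetch logic lives here.
    """
    if inspect.iscoroutinefunction(pipeline):
        return asyncio.run(pipeline())
    return pipeline()


# Stand-in for the real pipeline, used only to demonstrate the adapter.
async def fake_pipeline() -> int:
    return 2


result = run_pipeline_once(fake_pipeline)
```

In the real command the argument would be process_and_store_news (or its sync wrapper), so retries, fallback providers, image handling, and dedup run exactly as they do on the scheduled path.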
Decision: Standardize command exit semantics
Decision: Exit code 0 for successful command execution (including zero new items), non-zero for operational failures (for example unhandled exceptions or fatal setup errors).
Rationale:
- Enables CI/cron/operator scripts to react deterministically.
- Matches common CLI conventions.
Alternatives considered:
- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
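The exit-code rule can be sketched as a small wrapper. It assumes the pipeline returns the number of newly inserted items, which this document does not confirm. run_force_fetch and the two constants are hypothetical names.

```python
import sys
from typing import Callable

EXIT_OK = 0
EXIT_FAILURE = 1


def run_force_fetch(pipeline: Callable[[], int]) -> int:
    """Run the pipeline once and map the outcome to a process exit code."""
    try:
        inserted = pipeline()
    except Exception as exc:
        # Operational failure: report on stderr and signal non-zero.
        print(f"force-fetch failed: {exc}", file=sys.stderr)
        return EXIT_FAILURE
    # Zero new items is still success; dedup can legitimately filter everything.
    print(f"force-fetch ok: {inserted} new item(s) inserted")
    return EXIT_OK
```

Printing the inserted count on success lets an operator or script tell a no-op run apart from a failure without inspecting the database.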
Decision: Keep manual and scheduled paths independent
Decision: Manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.
Rationale:
- Avoids race-prone manipulation of scheduler internals.
- Reduces complexity and risk in production runtime.
Alternatives considered:
- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.
Risks / Trade-offs
- [Risk] Overlapping manual and scheduled runs may happen at boundary times -> Mitigation: document operational guidance and keep dedup checks as safety net.
- [Risk] External API failures still occur during forced runs -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
- [Trade-off] Command success does not guarantee new rows -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.
Migration Plan
- Add CLI module and force-fetch subcommand wired to existing pipeline.
- Add command result reporting and exit code behavior.
- Document usage in README for bootstrap and recovery flows.
- Validate command in local runtime and container runtime.
Rollback:
- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.
Open Questions
- Should force-fetch support an optional --max-attempts override, or stay fixed to pipeline defaults for v1?
- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?
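To make the process-lock option concrete, here is one possible shape, offered as a sketch and not a decision. It relies on atomic exclusive file creation (os.O_CREAT | os.O_EXCL). single_run_lock, AlreadyRunning, and the lock path are all hypothetical.

```python
import contextlib
import os
from typing import Iterator


class AlreadyRunning(RuntimeError):
    """Raised when another run already holds the lock file."""


@contextlib.contextmanager
def single_run_lock(path: str) -> Iterator[None]:
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(f"lock file exists: {path}") from None
    try:
        # Record the owner PID to help operators diagnose a stale lock.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(path)
```

Two caveats apply to this approach: a process that is killed abruptly leaves a stale lock file that must be removed by hand, and the lock only prevents overlap with scheduled runs if the scheduled path acquires the same lock.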