## Context

ClawFort currently performs automated hourly news ingestion through APScheduler (`scheduled_news_fetch()` in `backend/news_service.py`). The same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.

The change adds an explicit manual trigger path for operations use cases:
- first-time bootstrap (populate content immediately after setup)
- recovery after failed external API calls
- ad-hoc operational refresh without waiting for the next scheduled run

Constraints:
- Reuse existing fetch pipeline to avoid logic drift
- Keep behavior idempotent with existing duplicate detection
- Preserve scheduler behavior; manual runs must not mutate scheduler configuration

## Goals / Non-Goals

**Goals:**
- Provide a Python command to force an immediate news fetch.
- Reuse existing retry, dedup, and storage logic.
- Return clear terminal output and process exit status for automation.
- Keep command safe to run repeatedly.

**Non-Goals:**
- Replacing APScheduler-based hourly fetch.
- Introducing new API endpoints for manual triggering.
- Changing data schema or retention policy.
- Building a full operator dashboard.

## Decisions

### Decision: Add a dedicated CLI entrypoint module
**Decision:** Add a small CLI entrypoint under backend (for example `backend/cli.py`) with a subcommand that invokes the fetch pipeline.

**Rationale:**
- Keeps operational workflow explicit and scriptable.
- Avoids coupling manual trigger behavior to HTTP routes.
- Works in local dev and containerized runtime.

**Alternatives considered:**
- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
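
A minimal sketch of what such an entrypoint could look like, using only the standard-library `argparse`. The program name `clawfort`, the subcommand name `force-fetch`, and the `force_fetch` handler parameter are assumptions made for illustration; the design does not fix any of them.

```python
"""Hypothetical sketch of backend/cli.py; every name here is illustrative."""
import argparse
from typing import Callable, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # "clawfort" and "force-fetch" are assumed names, not taken from the codebase.
    parser = argparse.ArgumentParser(
        prog="clawfort", description="ClawFort operator commands"
    )
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser(
        "force-fetch", help="Run the news fetch pipeline once, immediately"
    )
    return parser


def main(
    argv: Optional[Sequence[str]] = None,
    force_fetch: Callable[[], int] = lambda: 0,
) -> int:
    """Parse arguments, dispatch to the subcommand handler, return an exit code.

    force_fetch is a placeholder for the real handler; it returns an exit code.
    """
    args = build_parser().parse_args(argv)
    if args.command == "force-fetch":
        return force_fetch()
    # Not reachable while force-fetch is the only registered subcommand.
    return 2
```

A real module would likely end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard so the handler's return value becomes the process exit status. Keeping the handler injectable also makes the dispatch testable without touching the network.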

### Decision: Invoke the existing news pipeline directly
**Decision:** The command should call `process_and_store_news()` (or the existing sync wrapper) instead of implementing parallel fetch logic.

**Rationale:**
- Guarantees parity with scheduled runs.
- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
- Minimizes maintenance overhead.

**Alternatives considered:**
- New command-specific fetch implementation: rejected due to drift risk.
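
The direct call can be sketched as follows. This assumes `process_and_store_news()` is a coroutine that returns the number of stored items; the document does not state its signature or return value, so the function below is a stand-in rather than the real pipeline, and `run_force_fetch` is a hypothetical wrapper name.

```python
import asyncio


async def process_and_store_news() -> int:
    """Stand-in for the real pipeline function.

    Assumption: the real function is async and returns the inserted row count.
    """
    await asyncio.sleep(0)  # placeholder for fetch, dedup, images, persistence
    return 0


def run_force_fetch() -> int:
    """Run one pipeline pass to completion from synchronous CLI code."""
    # asyncio.run creates a fresh event loop, runs the coroutine, and closes
    # the loop, so no scheduler or long-lived loop state is touched.
    return asyncio.run(process_and_store_news())
```

If the codebase already exposes a sync wrapper, calling that wrapper instead of `asyncio.run` would keep the manual path even closer to the scheduled one.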

### Decision: Standardize command exit semantics
**Decision:** Exit with code `0` on successful command execution (including runs that insert zero new items) and non-zero on operational failures (for example, unhandled exceptions or fatal setup errors).

**Rationale:**
- Enables CI/cron/operator scripts to react deterministically.
- Matches common CLI conventions.

**Alternatives considered:**
- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
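
The exit semantics above can be sketched as a small wrapper. The function name `exit_code_for`, the specific failure code `1`, and the output wording are assumptions; the design only requires `0` for success and some non-zero value for failure.

```python
import sys
from typing import Callable

EXIT_OK = 0
EXIT_FAILURE = 1  # assumed value; the design only requires non-zero


def exit_code_for(run_fetch: Callable[[], int]) -> int:
    """Run the fetch callable and translate its outcome into an exit code.

    run_fetch is assumed to return the number of newly inserted items.
    """
    try:
        inserted = run_fetch()
    except Exception as exc:
        # Operational failure: report on stderr and signal non-zero.
        print(f"force-fetch failed: {exc}", file=sys.stderr)
        return EXIT_FAILURE
    # Zero inserted items is still a success; the count lets operators
    # tell a dedup no-op apart from a run that stored new rows.
    print(f"force-fetch succeeded: inserted={inserted}")
    return EXIT_OK
```

Printing the inserted count on success keeps a zero-item run distinguishable from a failure without overloading the exit code.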

### Decision: Keep manual and scheduled paths independent
**Decision:** The manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.

**Rationale:**
- Avoids race-prone manipulation of scheduler internals.
- Reduces complexity and risk in production runtime.

**Alternatives considered:**
- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.

## Risks / Trade-offs

- **[Risk] Overlapping manual and scheduled runs may happen at boundary times** -> Mitigation: document operational guidance and keep dedup checks as safety net.
- **[Risk] External API failures still occur during forced runs** -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
- **[Trade-off] Command success does not guarantee new rows** -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.

## Migration Plan

1. Add CLI module and force-fetch subcommand wired to existing pipeline.
2. Add command result reporting and exit code behavior.
3. Document usage in README for bootstrap and recovery flows.
4. Validate command in local runtime and container runtime.

Rollback:
- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.

## Open Questions

- Should force-fetch support an optional `--max-attempts` override, or stay fixed to pipeline defaults for v1?
- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?
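
To make the second question concrete, here is one possible shape for a process lock, sketched with an exclusive-create lock file. This is an illustration of the option, not a decision: the lock path, `single_run_lock`, and `FetchAlreadyRunning` are all hypothetical names, and the sketch does not handle stale locks left behind by a crashed process.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator

# Hypothetical default location; a real deployment would pick its own path.
DEFAULT_LOCK_PATH = os.path.join(tempfile.gettempdir(), "clawfort-fetch.lock")


class FetchAlreadyRunning(RuntimeError):
    """Raised when another fetch run appears to hold the lock."""


@contextmanager
def single_run_lock(path: str = DEFAULT_LOCK_PATH) -> Iterator[None]:
    try:
        # O_EXCL makes creation fail if the file already exists,
        # so at most one process can create the lock file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise FetchAlreadyRunning(f"lock file exists: {path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(path)
```

Such a lock would only serialize processes that opt in to it, so the scheduled job would need to take the same lock for the guard to cover manual and scheduled overlap.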