bulk commit changes!

2026-02-13 02:32:06 -05:00
parent c8f98c54c9
commit bf4a40f533
152 changed files with 2210 additions and 19 deletions
--- a/openspec/changes/archive/2026-02-13-p02-force-fetch-command/design.md
+++ b/openspec/changes/archive/2026-02-13-p02-force-fetch-command/design.md
@@ -0,0 +1,93 @@
+## Context
+
+ClawFort currently runs automated hourly news ingestion through APScheduler (`scheduled_news_fetch()` in `backend/news_service.py`); the same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.
+
+The change adds an explicit manual trigger path for operational use cases:
+- first-time bootstrap (populate content immediately after setup)
+- recovery after failed external API calls
+- ad-hoc operational refresh without waiting for scheduler cadence
+
+Constraints:
+- Reuse existing fetch pipeline to avoid logic drift
+- Keep repeated runs idempotent by relying on the existing duplicate detection
+- Preserve scheduler behavior; manual runs must not mutate scheduler configuration
+
+## Goals / Non-Goals
+
+**Goals:**
+- Provide a Python command to force an immediate news fetch.
+- Reuse existing retry, dedup, and storage logic.
+- Return clear terminal output and process exit status for automation.
+- Keep command safe to run repeatedly.
+
+**Non-Goals:**
+- Replacing APScheduler-based hourly fetch.
+- Introducing new API endpoints for manual triggering.
+- Changing data schema or retention policy.
+- Building a full operator dashboard.
+
+## Decisions
+
+### Decision: Add a dedicated CLI entrypoint module
+**Decision:** Add a small CLI entrypoint module under `backend/` (for example `backend/cli.py`) with a subcommand that invokes the fetch pipeline.
+
+**Rationale:**
+- Keeps operational workflow explicit and scriptable.
+- Avoids coupling manual trigger behavior to HTTP routes.
+- Works in local dev and containerized runtime.
+
+**Alternatives considered:**
+- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
+- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
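+
+A minimal sketch of such an entrypoint, assuming the standard-library `argparse` module. The `force-fetch` subcommand name and the `_run_force_fetch` handler are hypothetical; the handler is only a placeholder marking where the pipeline call would go:
+
+```python
+"""Hypothetical backend/cli.py sketch: operator commands for ClawFort."""
+import argparse
+import sys
+from typing import Optional, Sequence
+
+
+def _run_force_fetch(args: argparse.Namespace) -> int:
+    # Placeholder: the real handler would invoke the existing news pipeline.
+    print("force-fetch: starting one-off news fetch")
+    return 0
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="python -m backend.cli",
+        description="ClawFort operator commands",
+    )
+    # A subcommand is required, so a bare invocation fails with usage help.
+    subcommands = parser.add_subparsers(dest="command", required=True)
+    force_fetch = subcommands.add_parser(
+        "force-fetch", help="Run the news pipeline once, immediately"
+    )
+    force_fetch.set_defaults(handler=_run_force_fetch)
+    return parser
+
+
+def main(argv: Optional[Sequence[str]] = None) -> int:
+    args = build_parser().parse_args(argv)
+    # Each subcommand handler returns the process exit code.
+    return args.handler(args)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
+```
+
+Assuming `backend` is importable as a package, this would be invoked as `python -m backend.cli force-fetch`. `main()` returns the exit code instead of calling `sys.exit` itself, which keeps it easy to exercise from tests.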
+
+### Decision: Invoke the existing news pipeline directly
+**Decision:** The command should call `process_and_store_news()` (or the existing sync wrapper) instead of implementing parallel fetch logic.
+
+**Rationale:**
+- Guarantees parity with scheduled runs.
+- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
+- Minimizes maintenance overhead.
+
+**Alternatives considered:**
+- New command-specific fetch implementation: rejected due to drift risk.
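+
+The design leaves open whether the command calls `process_and_store_news()` directly or through a sync wrapper. A minimal sketch that tolerates either, assuming the pipeline callable takes no arguments (the `run_pipeline_once` helper is hypothetical):
+
+```python
+import asyncio
+import inspect
+from typing import Any, Callable
+
+
+def run_pipeline_once(pipeline: Callable[[], Any]) -> Any:
+    """Run the existing fetch pipeline exactly once, whether it is sync or async.
+
+    Assumption: `pipeline` takes no arguments; its return value is passed through.
+    """
+    result = pipeline()
+    if inspect.iscoroutine(result):
+        # A plain CLI process has no running event loop, so asyncio.run is safe here.
+        result = asyncio.run(result)
+    return result
+```
+
+Passing the pipeline function in, rather than re-implementing any fetch step, keeps the manual path tied to the same retry, fallback, image, and dedup code that scheduled runs use.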
+
+### Decision: Standardize command exit semantics
+**Decision:** Exit code `0` for successful command execution (including zero new items), non-zero for operational failures (for example unhandled exceptions or fatal setup errors).
+
+**Rationale:**
+- Enables CI/cron/operator scripts to react deterministically.
+- Matches common CLI conventions.
+
+**Alternatives considered:**
+- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
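+
+A sketch of these exit semantics, assuming the pipeline run can be wrapped in a callable that returns the number of newly inserted items. The `force_fetch` name and that return contract are assumptions, not existing code. It also prints the inserted count that the Risks / Trade-offs section relies on:
+
+```python
+import sys
+from typing import Callable, Optional, TextIO
+
+EXIT_OK = 0
+EXIT_FAILURE = 1
+
+
+def force_fetch(
+    run: Callable[[], int],
+    out: Optional[TextIO] = None,
+    err: Optional[TextIO] = None,
+) -> int:
+    """Run the pipeline once and map the outcome to a process exit code.
+
+    Assumption: `run` returns the number of newly inserted items.
+    """
+    if out is None:
+        out = sys.stdout
+    if err is None:
+        err = sys.stderr
+    try:
+        inserted = run()
+    except Exception as exc:
+        # Operational failure (e.g. API errors after retries, setup errors): non-zero exit.
+        print(f"force-fetch failed: {exc!r}", file=err)
+        return EXIT_FAILURE
+    # Zero inserted items is still success: dedup can legitimately skip everything.
+    print(f"force-fetch completed: {inserted} new item(s) inserted", file=out)
+    return EXIT_OK
+```
+
+Catching `Exception` rather than `BaseException` lets Ctrl-C (`KeyboardInterrupt`) and `SystemExit` propagate normally.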
+
+### Decision: Keep manual and scheduled paths independent
+**Decision:** Manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.
+
+**Rationale:**
+- Avoids race-prone manipulation of scheduler internals.
+- Reduces complexity and risk in production runtime.
+
+**Alternatives considered:**
+- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.
+
+## Risks / Trade-offs
+
+- **[Risk] Manual and scheduled runs may overlap near scheduled run times** -> Mitigation: document operational guidance and rely on dedup checks as a safety net.
+- **[Risk] External API failures still occur during forced runs** -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
+- **[Trade-off] Command success does not guarantee new rows** -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.
+
+## Migration Plan
+
+1. Add CLI module and force-fetch subcommand wired to existing pipeline.
+2. Add command result reporting and exit code behavior.
+3. Document usage in README for bootstrap and recovery flows.
+4. Validate command in local runtime and container runtime.
+
+Rollback:
+- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.
+
+## Open Questions
+
+- Should force-fetch support an optional `--max-attempts` override, or stay fixed to pipeline defaults for v1?
+- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?
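+
+If the process-lock option is chosen, one possible shape is a non-blocking exclusive `fcntl.flock` held for the duration of the run. This is a POSIX-only sketch; the `single_run_lock` helper, the `AlreadyRunning` exception, and any lock-file path are hypothetical:
+
+```python
+import fcntl
+import os
+from contextlib import contextmanager
+from typing import Iterator
+
+
+class AlreadyRunning(RuntimeError):
+    """Raised when another run already holds the lock."""
+
+
+@contextmanager
+def single_run_lock(path: str) -> Iterator[None]:
+    """Hold an exclusive, non-blocking flock on `path` while the body runs."""
+    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
+    try:
+        try:
+            # LOCK_NB makes a held lock fail immediately instead of waiting.
+            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
+        except BlockingIOError as exc:
+            raise AlreadyRunning(f"another run holds {path}") from exc
+        yield
+    finally:
+        # Closing the descriptor releases the flock.
+        os.close(fd)
+```
+
+Such a lock only prevents overlap with scheduled runs if `scheduled_news_fetch()` acquires the same lock file; otherwise it only stops two manual runs from overlapping. Because the kernel releases the lock when the descriptor is closed, including on process exit, a crashed run does not leave a stale lock behind.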