clawfort/openspec/changes/archive/2026-02-13-p02-force-fetch-command/design.md at 0e21e035f5e222d0b6a47583527f531706372a6f

Santhosh Janardhanan bf4a40f533 bulk commit changes!

2026-02-13 02:32:06 -05:00

Context

ClawFort currently ingests news automatically every hour through APScheduler (scheduled_news_fetch() in backend/news_service.py). The same pipeline handles retries, deduplication, image optimization, and persistence, but there is no operator-facing command to run it on demand.

This change adds an explicit manual trigger path for operational use cases:

first-time bootstrap (populate content immediately after setup)
recovery after failed external API calls
ad-hoc operational refresh without waiting for scheduler cadence

Constraints:

Reuse existing fetch pipeline to avoid logic drift
Keep behavior idempotent with existing duplicate detection
Preserve scheduler behavior; manual runs must not mutate scheduler configuration

Goals / Non-Goals

Goals:

Provide a Python command to force an immediate news fetch.
Reuse existing retry, dedup, and storage logic.
Return clear terminal output and process exit status for automation.
Keep command safe to run repeatedly.

Non-Goals:

Replacing APScheduler-based hourly fetch.
Introducing new API endpoints for manual triggering.
Changing data schema or retention policy.
Building a full operator dashboard.

Decisions

Decision: Add a dedicated CLI entrypoint module

Decision: Add a small CLI entrypoint under backend (for example backend/cli.py) with a subcommand that invokes the fetch pipeline.

Rationale:

Keeps operational workflow explicit and scriptable.
Avoids coupling manual trigger behavior to HTTP routes.
Works in local dev and containerized runtime.

Alternatives considered:

Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
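
A minimal sketch of what such an entrypoint could look like, using argparse from the standard library. The program name clawfort and the stubbed process_and_store_news() body are assumptions for illustration; in the real module the function would be imported from backend/news_service.py, and its return value (assumed here to be the inserted count) may differ.

```python
import argparse


def process_and_store_news() -> int:
    """Stand-in for the real pipeline (assumption: it returns the inserted count)."""
    return 0


def build_parser() -> argparse.ArgumentParser:
    # "clawfort" is a hypothetical program name used only for help output.
    parser = argparse.ArgumentParser(prog="clawfort", description="ClawFort operator commands")
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser("force-fetch", help="Run the news fetch pipeline once, immediately")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "force-fetch":
        inserted = process_and_store_news()
        print(f"force-fetch complete: {inserted} new item(s) inserted")
        return 0
    return 2  # not reachable while force-fetch is the only subcommand


# In backend/cli.py the module would typically end with:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

With that guard in place the command could be run as python -m backend.cli force-fetch, both locally and inside the container.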

Decision: Invoke the existing news pipeline directly

Decision: The command should call process_and_store_news() (or the existing sync wrapper) instead of implementing parallel fetch logic.

Rationale:

Guarantees parity with scheduled runs.
Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
Minimizes maintenance overhead.

Alternatives considered:

New command-specific fetch implementation: rejected due to drift risk.
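
The wording "or the existing sync wrapper" suggests process_and_store_news() may be a coroutine function. As a sketch under that assumption, a thin helper can call the shared pipeline exactly once regardless of which form it takes, without duplicating any fetch logic. The helper name run_pipeline_once and the fake pipeline are hypothetical.

```python
import asyncio
import inspect


def run_pipeline_once(pipeline):
    """Call the shared pipeline a single time, whether it is sync or async."""
    if inspect.iscoroutinefunction(pipeline):
        # asyncio.run creates a fresh event loop, so this must not be
        # called from code that is already running inside a loop.
        return asyncio.run(pipeline())
    return pipeline()


async def fake_async_pipeline() -> int:
    """Hypothetical stand-in for an async process_and_store_news()."""
    await asyncio.sleep(0)
    return 3
```

Because the helper only dispatches to the existing callable, retry/backoff, fallback provider behavior, image handling, and dedup stay wherever the pipeline already implements them.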

Decision: Standardize command exit semantics

Decision: Exit code 0 for successful command execution (including zero new items), non-zero for operational failures (for example unhandled exceptions or fatal setup errors).

Rationale:

Enables CI/cron/operator scripts to react deterministically.
Matches common CLI conventions.

Alternatives considered:

Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
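
The exit rules above can be sketched as a small wrapper: any normal return maps to 0, including a run that inserts nothing, and any exception maps to a non-zero code. The function name, the specific failure code 1, and the message format are illustrative assumptions, not the project's actual output.

```python
import sys
import traceback

EXIT_OK = 0
EXIT_FAILURE = 1  # assumption: any non-zero value would satisfy the design


def force_fetch_exit_code(pipeline, out=sys.stdout, err=sys.stderr) -> int:
    """Run the pipeline and translate the outcome into a process exit code."""
    try:
        inserted = pipeline()
    except Exception:
        # Operational failure: report details on stderr and signal non-zero.
        print("force-fetch failed:", file=err)
        traceback.print_exc(file=err)
        return EXIT_FAILURE
    # Zero inserted items is still a success, since dedup can make that valid.
    print(f"force-fetch ok: inserted={inserted}", file=out)
    return EXIT_OK
```

A cron job or CI step can then branch on the exit status alone and use the printed count to tell a no-op apart from a run that added rows.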

Decision: Keep manual and scheduled paths independent

Decision: The manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.

Rationale:

Avoids race-prone manipulation of scheduler internals.
Reduces complexity and risk in production runtime.

Alternatives considered:

Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.

Risks / Trade-offs

[Risk] Overlapping manual and scheduled runs may happen at boundary times -> Mitigation: document operational guidance and keep dedup checks as safety net.
[Risk] External API failures still occur during forced runs -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
[Trade-off] Command success does not guarantee new rows -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.
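
To illustrate why dedup acts as a safety net and why a successful run can insert zero rows, here is a toy model. It is not ClawFort's actual dedup logic: the in-memory dict store and the use of url as the dedup key are assumptions made purely for the example.

```python
def store_new_items(store: dict, fetched: list) -> int:
    """Insert only items whose key is not already stored; return the inserted count."""
    inserted = 0
    for item in fetched:
        key = item["url"]  # hypothetical dedup key
        if key not in store:
            store[key] = item
            inserted += 1
    return inserted


store = {}
batch = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/b", "title": "B"},
]
first_run = store_new_items(store, batch)   # inserts both items
second_run = store_new_items(store, batch)  # inserts nothing, still a valid run
```

This models back-to-back runs only. Truly simultaneous runs could still race between the check and the insert unless the storage layer itself enforces uniqueness.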

Migration Plan

Add CLI module and force-fetch subcommand wired to existing pipeline.
Add command result reporting and exit code behavior.
Document usage in README for bootstrap and recovery flows.
Validate command in local runtime and container runtime.

Rollback:

Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.

Open Questions

Should force-fetch support an optional --max-attempts override, or stay fixed to pipeline defaults for v1?
Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?
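
If a process lock is chosen for the second question, one possible shape is an exclusive lock file created with O_CREAT | O_EXCL, sketched below. The names single_run_lock and AlreadyRunning are hypothetical, and the approach has a known weakness: a crashed process leaves a stale file that must be cleaned up manually.

```python
import contextlib
import os


class AlreadyRunning(RuntimeError):
    """Raised when another run already holds the lock file."""


@contextlib.contextmanager
def single_run_lock(path: str):
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(f"lock file exists: {path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.unlink(path)
```

The command would wrap its pipeline call in with single_run_lock(...): and exit non-zero on AlreadyRunning. This only prevents overlap with scheduled runs if the scheduler job takes the same lock, which would be a change to the scheduled path.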
