clawfort/openspec/changes/archive/2026-02-13-p02-force-fetch-command/design.md

## Context

ClawFort currently ingests news automatically every hour through APScheduler (`scheduled_news_fetch()` in `backend/news_service.py`). The same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.

This change adds an explicit manual trigger path for operational use cases:
- first-time bootstrap (populate content immediately after setup)
- recovery after failed external API calls
- ad-hoc refresh without waiting for the next scheduled run

Constraints:
- Reuse existing fetch pipeline to avoid logic drift
- Keep behavior idempotent with existing duplicate detection
- Preserve scheduler behavior; manual runs must not mutate scheduler configuration

## Goals / Non-Goals

**Goals:**
- Provide a Python command to force an immediate news fetch.
- Reuse existing retry, dedup, and storage logic.
- Return clear terminal output and process exit status for automation.
- Keep command safe to run repeatedly.

**Non-Goals:**
- Replacing APScheduler-based hourly fetch.
- Introducing new API endpoints for manual triggering.
- Changing data schema or retention policy.
- Building a full operator dashboard.

## Decisions

### Decision: Add a dedicated CLI entrypoint module
**Decision:** Add a small CLI entrypoint under backend (for example `backend/cli.py`) with a subcommand that invokes the fetch pipeline.

**Rationale:**
- Keeps operational workflow explicit and scriptable.
- Avoids coupling manual trigger behavior to HTTP routes.
- Works in local dev and containerized runtime.

**Alternatives considered:**
- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
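
As a rough illustration of the entrypoint shape, the following sketch uses the standard-library `argparse` module to expose a single subcommand. The program name `clawfort-cli` and the helper `build_parser` are hypothetical; only the `force-fetch` subcommand name comes from this document, and the real module may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand (names are illustrative assumptions)."""
    parser = argparse.ArgumentParser(
        prog="clawfort-cli",  # hypothetical program name
        description="Operational commands for the news pipeline.",
    )
    # required=True makes argparse reject an invocation with no subcommand.
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser(
        "force-fetch",
        help="Run the news fetch pipeline once, immediately.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"selected command: {args.command}")
```

Keeping the parser in its own function makes the command surface testable without running the pipeline.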

### Decision: Invoke the existing news pipeline directly
**Decision:** The command should call `process_and_store_news()` (or the existing sync wrapper) instead of implementing parallel fetch logic.

**Rationale:**
- Guarantees parity with scheduled runs.
- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
- Minimizes maintenance overhead.

**Alternatives considered:**
- New command-specific fetch implementation: rejected due to drift risk.
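
A minimal sketch of the delegation idea follows. It assumes the pipeline is exposed as a zero-argument callable that may be either a coroutine function or a plain function, and that it returns the number of inserted items; neither detail is confirmed by this document. `run_pipeline_once` and `fake_process_and_store_news` are hypothetical names, with the latter standing in for the real `process_and_store_news()`.

```python
import asyncio
import inspect
from typing import Any, Callable


def run_pipeline_once(pipeline: Callable[[], Any]) -> Any:
    """Call the existing pipeline once, awaiting it if it is asynchronous."""
    result = pipeline()
    if inspect.isawaitable(result):
        async def _await_result() -> Any:
            return await result
        # asyncio.run creates a fresh event loop for this one-off run.
        return asyncio.run(_await_result())
    return result


async def fake_process_and_store_news() -> int:
    """Stand-in for the real pipeline; assumed to return an inserted count."""
    return 3


if __name__ == "__main__":
    print(run_pipeline_once(fake_process_and_store_news))
```

The command contains no fetch logic of its own, so any change to retry, fallback, or dedup behavior in the pipeline applies to manual runs automatically.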

### Decision: Standardize command exit semantics
**Decision:** Exit with code `0` when the command completes successfully (including runs that insert zero new items) and with a non-zero code on operational failures (for example, unhandled exceptions or fatal setup errors).

**Rationale:**
- Enables CI/cron/operator scripts to react deterministically.
- Matches common CLI conventions.

**Alternatives considered:**
- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
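
The exit semantics above could be sketched as follows. The function `force_fetch`, the constants, and the message wording are hypothetical, and the sketch assumes the pipeline callable returns an inserted-item count.

```python
import sys
from typing import Callable

EXIT_OK = 0
EXIT_FAILURE = 1  # any non-zero value would satisfy the decision


def force_fetch(run_pipeline: Callable[[], int]) -> int:
    """Run the pipeline once and translate the outcome into an exit code."""
    try:
        inserted = run_pipeline()
    except Exception as exc:
        # Operational failure: report on stderr and signal non-zero.
        print(f"force-fetch failed: {exc}", file=sys.stderr)
        return EXIT_FAILURE
    # Zero inserted items is still a success, since dedup may skip everything.
    print(f"force-fetch completed: {inserted} new item(s) inserted")
    return EXIT_OK
```

A caller would typically end with `sys.exit(force_fetch(...))` so that cron jobs and CI steps see the status directly.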

### Decision: Keep manual and scheduled paths independent
**Decision:** The manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.

**Rationale:**
- Avoids race-prone manipulation of scheduler internals.
- Reduces complexity and risk in production runtime.

**Alternatives considered:**
- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.

## Risks / Trade-offs

- **[Risk] A manual run may overlap with a scheduled run when started close to the hourly trigger** -> Mitigation: document operational guidance and keep dedup checks as safety net.
- **[Risk] External API failures still occur during forced runs** -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
- **[Trade-off] Command success does not guarantee new rows** -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.

## Migration Plan

1. Add CLI module and force-fetch subcommand wired to existing pipeline.
2. Add command result reporting and exit code behavior.
3. Document usage in README for bootstrap and recovery flows.
4. Validate command in local runtime and container runtime.

Rollback:
- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.

## Open Questions

- Should force-fetch support an optional `--max-attempts` override, or stay fixed to pipeline defaults for v1?
- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?
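
If the process-lock option is chosen, one possible shape is an exclusive lock file. This is only a sketch of that option, not a decided design: `single_run_lock`, `AlreadyRunning`, and the lock path are hypothetical, and the lock would only guard against scheduled runs if the scheduler path acquired it too.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


class AlreadyRunning(RuntimeError):
    """Raised when another run already holds the lock file."""


@contextmanager
def single_run_lock(path: str) -> Iterator[None]:
    """Hold an exclusive lock file for the duration of one run."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(f"lock file already exists: {path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "force-fetch.lock")
    with single_run_lock(lock_path):
        print("lock held; the pipeline would run here")
```

A crashed process would leave a stale lock file behind, so this approach needs a cleanup story before it is safer than the documented operational constraint.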