Initial Commit

2026-02-12 16:50:29 -05:00
commit a1da041f14
74 changed files with 6140 additions and 0 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12


@@ -0,0 +1,93 @@
## Context
ClawFort currently performs automated hourly news ingestion through APScheduler (`scheduled_news_fetch()` in `backend/news_service.py`); the same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.
This change adds an explicit manual trigger path for operational use cases:
- first-time bootstrap (populate content immediately after setup)
- recovery after failed external API calls
- ad-hoc operational refresh without waiting for scheduler cadence
Constraints:
- Reuse existing fetch pipeline to avoid logic drift
- Keep behavior idempotent with existing duplicate detection
- Preserve scheduler behavior; manual runs must not mutate scheduler configuration
## Goals / Non-Goals
**Goals:**
- Provide a Python command to force an immediate news fetch.
- Reuse existing retry, dedup, and storage logic.
- Return clear terminal output and process exit status for automation.
- Keep command safe to run repeatedly.
**Non-Goals:**
- Replacing APScheduler-based hourly fetch.
- Introducing new API endpoints for manual triggering.
- Changing data schema or retention policy.
- Building a full operator dashboard.
## Decisions
### Decision: Add a dedicated CLI entrypoint module
**Decision:** Add a small CLI entrypoint under backend (for example `backend/cli.py`) with a subcommand that invokes the fetch pipeline.
**Rationale:**
- Keeps operational workflow explicit and scriptable.
- Avoids coupling manual trigger behavior to HTTP routes.
- Works in local dev and containerized runtime.
**Alternatives considered:**
- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
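A minimal sketch of what such an entrypoint could look like. The module layout, subcommand name, and the `fetch_news_once` stand-in are all assumptions for illustration; in the real module the call would go to `process_and_store_news()` or its existing sync wrapper:

```python
# Hypothetical sketch of backend/cli.py; fetch_news_once is a stand-in
# for the real pipeline call (process_and_store_news or its sync wrapper).
import argparse
import sys


def fetch_news_once():
    # Placeholder: the real implementation would invoke the existing
    # aggregation pipeline and return the number of items stored.
    return 0


def build_parser():
    parser = argparse.ArgumentParser(prog="clawfort")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("force-fetch", help="run one immediate news fetch cycle")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "force-fetch":
        stored = fetch_news_once()
        print(f"force-fetch complete: {stored} items stored")
        return 0
    return 2


if __name__ == "__main__":
    sys.exit(main())
```

Invocation would then be something like `python -m backend.cli force-fetch`, keeping the workflow scriptable without touching HTTP routes.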
### Decision: Invoke the existing news pipeline directly
**Decision:** The command should call `process_and_store_news()` (or the existing sync wrapper) instead of implementing parallel fetch logic.
**Rationale:**
- Guarantees parity with scheduled runs.
- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
- Minimizes maintenance overhead.
**Alternatives considered:**
- New command-specific fetch implementation: rejected due to drift risk.
### Decision: Standardize command exit semantics
**Decision:** Exit code `0` for successful command execution (including zero new items), non-zero for operational failures (for example unhandled exceptions or fatal setup errors).
**Rationale:**
- Enables CI/cron/operator scripts to react deterministically.
- Matches common CLI conventions.
**Alternatives considered:**
- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
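The convention above can be pinned down in a few lines. `RunResult` is a hypothetical container used only to make the rule concrete; the real command may track state differently:

```python
# Sketch of the exit-code rule: fatal error -> non-zero, everything else
# (including zero inserted items) -> 0. RunResult is illustrative only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    stored_count: int = 0
    fatal_error: Optional[str] = None


def exit_code_for(result: RunResult) -> int:
    # Zero new items still exits 0: dedup can make a zero-insert run valid.
    return 0 if result.fatal_error is None else 1
```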
### Decision: Keep manual and scheduled paths independent
**Decision:** Manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.
**Rationale:**
- Avoids race-prone manipulation of scheduler internals.
- Reduces complexity and risk in production runtime.
**Alternatives considered:**
- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.
## Risks / Trade-offs
- **[Risk] Overlapping manual and scheduled runs may happen at boundary times** -> Mitigation: document operational guidance and keep dedup checks as safety net.
- **[Risk] External API failures still occur during forced runs** -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
- **[Trade-off] Command success does not guarantee new rows** -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.
## Migration Plan
1. Add CLI module and force-fetch subcommand wired to existing pipeline.
2. Add command result reporting and exit code behavior.
3. Document usage in README for bootstrap and recovery flows.
4. Validate command in local runtime and container runtime.
Rollback:
- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.
## Open Questions
- Should force-fetch support an optional `--max-attempts` override, or stay fixed to pipeline defaults for v1?
- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?
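If the second question is answered with a lock, one lightweight option is an advisory lock file created with `O_EXCL`, so a second concurrent run fails fast instead of overlapping. The path and behavior below are illustrative only, not a committed design:

```python
# One possible concurrent-run guard: an advisory lock file. A second
# invocation sees the existing file and can abort cleanly.
import os

LOCK_PATH = "/tmp/clawfort-force-fetch.lock"


def acquire_lock(path=LOCK_PATH):
    try:
        # O_EXCL makes creation atomic: fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.close(fd)
    return path


def release_lock(path):
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A stale lock after a crash would need manual cleanup, which is one reason the spec leaves this as an open question rather than a decision.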



@@ -0,0 +1,35 @@
## Why
ClawFort currently fetches news on a fixed hourly schedule, which is too slow for first-time setup and leaves no recovery path after a failed API cycle. Operators need a reliable way to force an immediate news pull so they can bootstrap content quickly and recover without waiting for the next scheduled run.
## What Changes
- **New Capabilities:**
- Add a manual Python command to trigger an immediate news fetch on demand.
- Add command output that clearly reports success/failure, number of fetched/stored items, and error details.
- Add safe invocation behavior so manual runs reuse existing fetch/retry/dedup logic.
- **Backend:**
- Add a CLI entrypoint/script for force-fetch execution.
- Wire the command to existing news aggregation pipeline used by scheduled jobs.
- Return non-zero exit codes on command failure for operational automation.
- **Operations:**
- Document how and when to run the force-fetch command (initial setup and recovery scenarios).
## Capabilities
### New Capabilities
- `force-fetch-command`: Provide a Python command that triggers immediate news aggregation outside the hourly scheduler.
- `fetch-run-reporting`: Provide operator-facing command output and exit semantics for successful runs and failures.
- `manual-fetch-recovery`: Support manual recovery workflow after failed or partial API fetch cycles.
### Modified Capabilities
- None.
## Impact
- **Code:** New CLI command module/entrypoint plus minimal integration with existing `news_service` execution path.
- **APIs:** No external API contract changes.
- **Dependencies:** No required new runtime dependencies expected.
- **Infrastructure:** No deployment topology change; command runs in the same container/runtime.
- **Environment:** Reuses existing env vars (`PERPLEXITY_API_KEY`, `OPENROUTER_API_KEY`, `IMAGE_QUALITY`).
- **Data:** No schema changes; command writes through existing dedup + persistence flow.


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Command reports run outcome to operator
The system SHALL present operator-facing output that describes whether the forced run succeeded or failed.
#### Scenario: Successful run reporting
- **WHEN** a forced fetch command completes without fatal errors
- **THEN** the command output includes a success indication
- **AND** includes the number of items stored in that run
#### Scenario: Failed run reporting
- **WHEN** a forced fetch command encounters a fatal execution error
- **THEN** the command output includes a failure indication
- **AND** includes actionable error details for operator diagnosis
### Requirement: Command exposes automation-friendly exit semantics
The system SHALL return deterministic process exit codes for command success and failure.
#### Scenario: Exit code on success
- **WHEN** the force-fetch command execution completes successfully
- **THEN** the process exits with code 0
- **AND** automation tooling can treat the run as successful
#### Scenario: Exit code on fatal failure
- **WHEN** the force-fetch command execution fails fatally
- **THEN** the process exits with a non-zero code
- **AND** automation tooling can detect the failure state
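The two reporting scenarios constrain what information the output must carry, not its exact wording. A hypothetical formatter satisfying both requirements (names and message shape are assumptions):

```python
# Illustrative report formatter: success output carries the stored count,
# failure output carries actionable error detail, per the scenarios above.
def format_report(success: bool, stored: int = 0, error: str = "") -> str:
    if success:
        return f"SUCCESS: forced fetch stored {stored} item(s)"
    return f"FAILURE: forced fetch aborted ({error})"
```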


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Operator can trigger immediate news fetch via Python command
The system SHALL provide a Python command that triggers one immediate news aggregation run outside of the hourly scheduler.
#### Scenario: Successful forced fetch invocation
- **WHEN** an operator runs the documented force-fetch command with valid runtime configuration
- **THEN** the system executes one full fetch cycle using the existing aggregation pipeline
- **AND** the command terminates after the run completes
#### Scenario: Command does not reconfigure scheduler
- **WHEN** an operator runs the force-fetch command while the service scheduler exists
- **THEN** the command performs a one-off run only
- **AND** scheduler job definitions and cadence remain unchanged
### Requirement: Forced fetch reuses existing aggregation behavior
The system SHALL use the same retry, fallback, deduplication, image processing, and persistence logic as scheduled fetch runs.
#### Scenario: Retry and fallback parity
- **WHEN** the primary news provider request fails during a forced run
- **THEN** the system applies the configured retry behavior
- **AND** uses the configured fallback provider path if available
#### Scenario: Deduplication parity
- **WHEN** fetched headlines match existing duplicate rules
- **THEN** duplicate items are skipped according to existing deduplication policy
- **AND** only eligible items are persisted
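The dedup parity scenario can be illustrated with a toy filter. This is not the real policy (which lives in the existing pipeline and may match on different fields); it only shows the skip-duplicates-keep-eligible shape the requirement demands:

```python
# Hypothetical dedup illustration: items whose URL is already persisted
# (or repeated within the same batch) are skipped; the rest are eligible.
def filter_new_items(fetched, existing_urls):
    seen = set(existing_urls)
    eligible = []
    for item in fetched:
        if item["url"] in seen:
            continue  # duplicate: skipped per existing policy
        seen.add(item["url"])
        eligible.append(item)
    return eligible
```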


@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: Manual command supports bootstrap and recovery workflows
The system SHALL allow operators to run the forced fetch command during first-time setup and after failed scheduled cycles.
#### Scenario: Bootstrap content population
- **WHEN** the system is newly deployed and contains no current news items
- **THEN** an operator can run the force-fetch command immediately
- **AND** the command attempts to populate the dataset without waiting for the next hourly schedule
#### Scenario: Recovery after failed scheduled fetch
- **WHEN** a prior scheduled fetch cycle failed or produced incomplete results
- **THEN** an operator can run the force-fetch command on demand
- **AND** the system performs a fresh one-off fetch attempt
### Requirement: Repeated manual runs remain operationally safe
The system SHALL support repeated operator-triggered runs without corrupting data integrity.
#### Scenario: Repeated invocation in same day
- **WHEN** an operator runs the force-fetch command multiple times within the same day
- **THEN** existing deduplication behavior prevents duplicate persistence for matching items
- **AND** each command run completes with explicit run status output


@@ -0,0 +1,28 @@
## 1. CLI Command Foundation
- [x] 1.1 Create `backend/cli.py` with command parsing for force-fetch execution
- [x] 1.2 Add a force-fetch command entrypoint that can be invoked via Python module execution
- [x] 1.3 Ensure command initializes required runtime context (env + database readiness)
## 2. Force-Fetch Execution Path
- [x] 2.1 Wire command to existing news aggregation execution path (`process_and_store_news` or sync wrapper)
- [x] 2.2 Ensure command runs as a one-off operation without changing scheduler job configuration
- [x] 2.3 Preserve existing deduplication, retry, fallback, and image processing behavior during manual runs
## 3. Operator Reporting and Exit Semantics
- [x] 3.1 Add success output that includes stored item count for the forced run
- [x] 3.2 Add failure output with actionable error details when fatal execution errors occur
- [x] 3.3 Return exit code `0` on success and non-zero on fatal failures
## 4. Recovery Workflow and Validation
- [x] 4.1 Validate bootstrap workflow: force-fetch on a fresh deployment with no current items
- [x] 4.2 Validate recovery workflow: force-fetch after simulated failed scheduled cycle
- [x] 4.3 Validate repeated same-day manual runs do not create duplicate records under dedup policy
## 5. Documentation
- [x] 5.1 Update `README.md` with force-fetch command usage for first-time setup
- [x] 5.2 Document recovery-run usage and expected command output/exit behavior