Initial Commit

2026-02-12 16:50:29 -05:00
commit a1da041f14
74 changed files with 6140 additions and 0 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12


@@ -0,0 +1,93 @@
## Context
ClawFort currently performs automated hourly news ingestion through APScheduler (`scheduled_news_fetch()` in `backend/news_service.py`); the same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.
This change adds an explicit manual trigger path for operational use cases:
- first-time bootstrap (populate content immediately after setup)
- recovery after failed external API calls
- ad-hoc operational refresh without waiting for scheduler cadence
Constraints:
- Reuse existing fetch pipeline to avoid logic drift
- Keep behavior idempotent with existing duplicate detection
- Preserve scheduler behavior; manual runs must not mutate scheduler configuration
## Goals / Non-Goals
**Goals:**
- Provide a Python command to force an immediate news fetch.
- Reuse existing retry, dedup, and storage logic.
- Return clear terminal output and process exit status for automation.
- Keep command safe to run repeatedly.
**Non-Goals:**
- Replacing APScheduler-based hourly fetch.
- Introducing new API endpoints for manual triggering.
- Changing data schema or retention policy.
- Building a full operator dashboard.
## Decisions
### Decision: Add a dedicated CLI entrypoint module
**Decision:** Add a small CLI entrypoint under backend (for example `backend/cli.py`) with a subcommand that invokes the fetch pipeline.
**Rationale:**
- Keeps operational workflow explicit and scriptable.
- Avoids coupling manual trigger behavior to HTTP routes.
- Works in local dev and containerized runtime.
**Alternatives considered:**
- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
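A minimal sketch of what such an entrypoint could look like. The module layout, subcommand name, and the `fetch_news_once` stand-in are all assumptions for illustration; in the real module the call would go to `process_and_store_news()` or its existing sync wrapper:

```python
# Hypothetical sketch of backend/cli.py; fetch_news_once is a stand-in
# for the real pipeline call (process_and_store_news or its sync wrapper).
import argparse
import sys


def fetch_news_once():
    # Placeholder: the real implementation would invoke the existing
    # aggregation pipeline and return the number of items stored.
    return 0


def build_parser():
    parser = argparse.ArgumentParser(prog="clawfort")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("force-fetch", help="run one immediate news fetch cycle")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "force-fetch":
        stored = fetch_news_once()
        print(f"force-fetch complete: {stored} items stored")
        return 0
    return 2


if __name__ == "__main__":
    sys.exit(main())
```

Invocation would then be something like `python -m backend.cli force-fetch`, keeping the workflow scriptable without touching HTTP routes.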
### Decision: Invoke the existing news pipeline directly
**Decision:** The command should call `process_and_store_news()` (or the existing sync wrapper) instead of implementing parallel fetch logic.
**Rationale:**
- Guarantees parity with scheduled runs.
- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
- Minimizes maintenance overhead.
**Alternatives considered:**
- New command-specific fetch implementation: rejected due to drift risk.
### Decision: Standardize command exit semantics
**Decision:** Exit code `0` for successful command execution (including zero new items), non-zero for operational failures (for example unhandled exceptions or fatal setup errors).
**Rationale:**
- Enables CI/cron/operator scripts to react deterministically.
- Matches common CLI conventions.
**Alternatives considered:**
- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
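The convention above can be pinned down in a few lines. `RunResult` is a hypothetical container used only to make the rule concrete; the real command may track state differently:

```python
# Sketch of the exit-code rule: fatal error -> non-zero, everything else
# (including zero inserted items) -> 0. RunResult is illustrative only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    stored_count: int = 0
    fatal_error: Optional[str] = None


def exit_code_for(result: RunResult) -> int:
    # Zero new items still exits 0: dedup can make a zero-insert run valid.
    return 0 if result.fatal_error is None else 1
```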
### Decision: Keep manual and scheduled paths independent
**Decision:** Manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.
**Rationale:**
- Avoids race-prone manipulation of scheduler internals.
- Reduces complexity and risk in production runtime.
**Alternatives considered:**
- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.
## Risks / Trade-offs
- **[Risk] Overlapping manual and scheduled runs may happen at boundary times** -> Mitigation: document operational guidance and keep dedup checks as safety net.
- **[Risk] External API failures still occur during forced runs** -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
- **[Trade-off] Command success does not guarantee new rows** -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.
## Migration Plan
1. Add CLI module and force-fetch subcommand wired to existing pipeline.
2. Add command result reporting and exit code behavior.
3. Document usage in README for bootstrap and recovery flows.
4. Validate command in local runtime and container runtime.
Rollback:
- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.
## Open Questions
- Should force-fetch support an optional `--max-attempts` override, or stay fixed to pipeline defaults for v1?
- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?
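If the second question is answered with a lock, one lightweight option is an advisory lock file created with `O_EXCL`, so a second concurrent run fails fast instead of overlapping. The path and behavior below are illustrative only, not a committed design:

```python
# One possible concurrent-run guard: an advisory lock file. A second
# invocation sees the existing file and can abort cleanly.
import os

LOCK_PATH = "/tmp/clawfort-force-fetch.lock"


def acquire_lock(path=LOCK_PATH):
    try:
        # O_EXCL makes creation atomic: fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.close(fd)
    return path


def release_lock(path):
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A stale lock after a crash would need manual cleanup, which is one reason the spec leaves this as an open question rather than a decision.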



@@ -0,0 +1,35 @@
## Why
ClawFort currently fetches news on a fixed hourly schedule, which is too slow for first-time setup and leaves no recovery path after a failed API cycle. Operators need a reliable way to force an immediate news pull so they can bootstrap content quickly and recover without waiting for the next scheduled run.
## What Changes
- **New Capabilities:**
- Add a manual Python command to trigger an immediate news fetch on demand.
- Add command output that clearly reports success/failure, number of fetched/stored items, and error details.
- Add safe invocation behavior so manual runs reuse existing fetch/retry/dedup logic.
- **Backend:**
- Add a CLI entrypoint/script for force-fetch execution.
- Wire the command to existing news aggregation pipeline used by scheduled jobs.
- Return non-zero exit codes on command failure for operational automation.
- **Operations:**
- Document how and when to run the force-fetch command (initial setup and recovery scenarios).
## Capabilities
### New Capabilities
- `force-fetch-command`: Provide a Python command that triggers immediate news aggregation outside the hourly scheduler.
- `fetch-run-reporting`: Provide operator-facing command output and exit semantics for successful runs and failures.
- `manual-fetch-recovery`: Support manual recovery workflow after failed or partial API fetch cycles.
### Modified Capabilities
- None.
## Impact
- **Code:** New CLI command module/entrypoint plus minimal integration with existing `news_service` execution path.
- **APIs:** No external API contract changes.
- **Dependencies:** No required new runtime dependencies expected.
- **Infrastructure:** No deployment topology change; command runs in the same container/runtime.
- **Environment:** Reuses existing env vars (`PERPLEXITY_API_KEY`, `OPENROUTER_API_KEY`, `IMAGE_QUALITY`).
- **Data:** No schema changes; command writes through existing dedup + persistence flow.


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Command reports run outcome to operator
The system SHALL present operator-facing output that describes whether the forced run succeeded or failed.
#### Scenario: Successful run reporting
- **WHEN** a forced fetch command completes without fatal errors
- **THEN** the command output includes a success indication
- **AND** includes the number of items stored in that run
#### Scenario: Failed run reporting
- **WHEN** a forced fetch command encounters a fatal execution error
- **THEN** the command output includes a failure indication
- **AND** includes actionable error details for operator diagnosis
### Requirement: Command exposes automation-friendly exit semantics
The system SHALL return deterministic process exit codes for command success and failure.
#### Scenario: Exit code on success
- **WHEN** the force-fetch command execution completes successfully
- **THEN** the process exits with code 0
- **AND** automation tooling can treat the run as successful
#### Scenario: Exit code on fatal failure
- **WHEN** the force-fetch command execution fails fatally
- **THEN** the process exits with a non-zero code
- **AND** automation tooling can detect the failure state
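The two reporting scenarios constrain what information the output must carry, not its exact wording. A hypothetical formatter satisfying both requirements (names and message shape are assumptions):

```python
# Illustrative report formatter: success output carries the stored count,
# failure output carries actionable error detail, per the scenarios above.
def format_report(success: bool, stored: int = 0, error: str = "") -> str:
    if success:
        return f"SUCCESS: forced fetch stored {stored} item(s)"
    return f"FAILURE: forced fetch aborted ({error})"
```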


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Operator can trigger immediate news fetch via Python command
The system SHALL provide a Python command that triggers one immediate news aggregation run outside of the hourly scheduler.
#### Scenario: Successful forced fetch invocation
- **WHEN** an operator runs the documented force-fetch command with valid runtime configuration
- **THEN** the system executes one full fetch cycle using the existing aggregation pipeline
- **AND** the command terminates after the run completes
#### Scenario: Command does not reconfigure scheduler
- **WHEN** an operator runs the force-fetch command while the service scheduler exists
- **THEN** the command performs a one-off run only
- **AND** scheduler job definitions and cadence remain unchanged
### Requirement: Forced fetch reuses existing aggregation behavior
The system SHALL use the same retry, fallback, deduplication, image processing, and persistence logic as scheduled fetch runs.
#### Scenario: Retry and fallback parity
- **WHEN** the primary news provider request fails during a forced run
- **THEN** the system applies the configured retry behavior
- **AND** uses the configured fallback provider path if available
#### Scenario: Deduplication parity
- **WHEN** fetched headlines match existing duplicate rules
- **THEN** duplicate items are skipped according to existing deduplication policy
- **AND** only eligible items are persisted
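The dedup parity scenario can be illustrated with a toy filter. This is not the real policy (which lives in the existing pipeline and may match on different fields); it only shows the skip-duplicates-keep-eligible shape the requirement demands:

```python
# Hypothetical dedup illustration: items whose URL is already persisted
# (or repeated within the same batch) are skipped; the rest are eligible.
def filter_new_items(fetched, existing_urls):
    seen = set(existing_urls)
    eligible = []
    for item in fetched:
        if item["url"] in seen:
            continue  # duplicate: skipped per existing policy
        seen.add(item["url"])
        eligible.append(item)
    return eligible
```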


@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: Manual command supports bootstrap and recovery workflows
The system SHALL allow operators to run the forced fetch command during first-time setup and after failed scheduled cycles.
#### Scenario: Bootstrap content population
- **WHEN** the system is newly deployed and contains no current news items
- **THEN** an operator can run the force-fetch command immediately
- **AND** the command attempts to populate the dataset without waiting for the next hourly schedule
#### Scenario: Recovery after failed scheduled fetch
- **WHEN** a prior scheduled fetch cycle failed or produced incomplete results
- **THEN** an operator can run the force-fetch command on demand
- **AND** the system performs a fresh one-off fetch attempt
### Requirement: Repeated manual runs remain operationally safe
The system SHALL support repeated operator-triggered runs without corrupting data integrity.
#### Scenario: Repeated invocation in same day
- **WHEN** an operator runs the force-fetch command multiple times within the same day
- **THEN** existing deduplication behavior prevents duplicate persistence for matching items
- **AND** each command run completes with explicit run status output


@@ -0,0 +1,28 @@
## 1. CLI Command Foundation
- [x] 1.1 Create `backend/cli.py` with command parsing for force-fetch execution
- [x] 1.2 Add a force-fetch command entrypoint that can be invoked via Python module execution
- [x] 1.3 Ensure command initializes required runtime context (env + database readiness)
## 2. Force-Fetch Execution Path
- [x] 2.1 Wire command to existing news aggregation execution path (`process_and_store_news` or sync wrapper)
- [x] 2.2 Ensure command runs as a one-off operation without changing scheduler job configuration
- [x] 2.3 Preserve existing deduplication, retry, fallback, and image processing behavior during manual runs
## 3. Operator Reporting and Exit Semantics
- [x] 3.1 Add success output that includes stored item count for the forced run
- [x] 3.2 Add failure output with actionable error details when fatal execution errors occur
- [x] 3.3 Return exit code `0` on success and non-zero on fatal failures
## 4. Recovery Workflow and Validation
- [x] 4.1 Validate bootstrap workflow: force-fetch on a fresh deployment with no current items
- [x] 4.2 Validate recovery workflow: force-fetch after simulated failed scheduled cycle
- [x] 4.3 Validate repeated same-day manual runs do not create duplicate records under dedup policy
## 5. Documentation
- [x] 5.1 Update `README.md` with force-fetch command usage for first-time setup
- [x] 5.2 Document recovery-run usage and expected command output/exit behavior