bulk commit changes!

2026-02-13 02:32:06 -05:00
parent c8f98c54c9
commit bf4a40f533
152 changed files with 2210 additions and 19 deletions

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-13

@@ -0,0 +1,85 @@
## Context
Current operations are concentrated in `backend/cli.py` with a single `force-fetch` command and no unified admin maintenance suite. Operational actions such as archive cleanup, translation regeneration, image refresh, and cache/news reset require manual code/DB operations. Existing backend services already contain reusable primitives: ingestion (`process_and_store_news`), archival helpers (`archive_old_news`, `delete_archived_news`), and translation generation pipelines in `backend/news_service.py`.
## Goals / Non-Goals
**Goals:**
- Introduce an admin command suite that consolidates common maintenance and recovery actions.
- Implement queued image refetch for the latest 30 items, processed sequentially with exponential backoff.
- Improve image refresh relevance by combining keyword and mood/sentiment cues with deterministic fallback behavior.
- Provide safe destructive operations (`clear-news`, `clean-archive`, cache clear) with operator guardrails.
- Add a translation regeneration command and a fetch command with a configurable article count to reduce manual intervention.
**Non-Goals:**
- Replacing the scheduled ingestion model.
- Introducing external queue infrastructure (RabbitMQ/Redis workers) for this phase.
- Redesigning storage models or adding new DB tables unless strictly necessary.
- Building a web-based admin dashboard in this change.
## Decisions
### Decision: Extend existing CLI with subcommands
**Decision:** Expand `backend/cli.py` into a multi-subcommand admin command suite.
**Rationale:**
- Reuses existing deployment/runtime assumptions.
- Keeps operations scriptable via terminal/cron and avoids UI scope expansion.
**Alternatives considered:**
- New standalone admin binary: rejected due to duplicated bootstrapping/runtime checks.
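
As an illustration of this decision, the subcommand layout could be expressed with `argparse` subparsers. This is a minimal sketch, not the actual contents of `backend/cli.py`: the subcommand names come from the spec, `--confirm`, `--dry-run`, and `--count` come from the tasks and migration plan, while `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subparser per maintenance action (illustrative)."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Admin maintenance commands")
    sub = parser.add_subparsers(dest="command", required=True)

    # The existing command stays registered so current scripts keep working.
    sub.add_parser("force-fetch", help="Run the existing fetch path")

    # Non-destructive maintenance commands.
    sub.add_parser("refetch-images", help="Refetch images for the latest items")
    sub.add_parser("clear-cache", help="Invalidate application caches")
    sub.add_parser("rebuild-site", help="Run the defined rebuild workflow")
    sub.add_parser("regenerate-translations", help="Regenerate translations")

    # Destructive commands carry the guardrail flags.
    for name in ("clear-news", "clean-archive"):
        cmd = sub.add_parser(name, help=f"Destructive: {name}")
        cmd.add_argument("--confirm", action="store_true", help="Apply changes")
        cmd.add_argument("--dry-run", action="store_true", help="Preview only")

    fetch = sub.add_parser("fetch", help="Fetch a given number of articles")
    fetch.add_argument("--count", type=int, required=True)
    return parser
```

Calling `build_parser().parse_args(["fetch", "--count", "25"])` yields a namespace with `command == "fetch"` and `count == 25`, which a dispatcher could then route to the matching handler.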
### Decision: Queue image refetch in-process with sequential workers
**Decision:** Build a bounded in-memory queue for the latest 30 items and process them one at a time.
**Rationale:**
- Meets rate-limit resilience requirement without new infrastructure.
- Deterministic and easy to monitor in command output.
**Alternatives considered:**
- Parallel refetch workers: rejected due to higher provider throttling risk.
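
The queue described here could look roughly like the following. It is a sketch under stated assumptions: items are plain dicts with an `id` key, and `build_queue`, `drain`, and the `handler` callback are hypothetical names rather than functions from the codebase.

```python
from collections import deque
from typing import Callable, Dict, Iterable

MAX_QUEUE_SIZE = 30  # "latest 30 items" from the decision above


def build_queue(items_newest_first: Iterable[dict]) -> deque:
    """Enqueue at most MAX_QUEUE_SIZE items, preserving newest-first order."""
    queue: deque = deque()
    for item in items_newest_first:
        if len(queue) >= MAX_QUEUE_SIZE:
            break
        queue.append(item)
    return queue


def drain(queue: deque, handler: Callable[[dict], bool]) -> Dict[str, int]:
    """Process items one at a time; the next starts only after the current returns."""
    totals = {"processed": 0, "succeeded": 0, "failed": 0}
    while queue:
        item = queue.popleft()
        try:
            ok = handler(item)
        except Exception:
            # A failing item is counted and skipped so the run can continue.
            ok = False
        totals["processed"] += 1
        totals["succeeded" if ok else "failed"] += 1
        status = "ok" if ok else "failed"
        print(f"[{totals['processed']}] item {item.get('id')}: {status}")
    return totals
```

The loop is deliberately single-threaded: there is never more than one provider call in flight, which is the property the rationale relies on.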
### Decision: Exponential backoff for external image calls
**Decision:** Apply exponential backoff with capped retries for rate-limited or transient failures.
**Rationale:**
- Reduces burst retry amplification.
- Improves success rate under API pressure.
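
A capped exponential backoff could be sketched as below. The names `TransientError`, `backoff_delays`, and `call_with_backoff`, as well as the default base delay, cap, and attempt count, are assumptions for illustration; the design only specifies exponential delays with capped retries.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for rate-limit or transient provider failures."""


def backoff_delays(max_attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delay before each retry: base * 2**i, never more than `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(max_attempts - 1)]


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on TransientError until max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; surface the last error
            sleep(delays[attempt])
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Non-transient exceptions propagate immediately, so only rate-limit style failures consume retry attempts.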
### Decision: Safety-first destructive command ergonomics
**Decision:** Destructive operations require explicit confirmation/flags and support dry-run where meaningful.
**Rationale:**
- Prevents accidental data loss.
- Makes admin actions auditable and predictable.
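
One way to express this guard is a small wrapper that every destructive handler passes through. This is a sketch: `run_destructive` and `GuardResult` are hypothetical names, and letting dry-run take precedence over confirmation is an assumption the design does not spell out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardResult:
    executed: bool
    message: str


def run_destructive(
    name: str,
    affected: int,
    action: Callable[[], None],
    confirm: bool = False,
    dry_run: bool = False,
) -> GuardResult:
    """Run `action` only when explicitly confirmed and not in dry-run mode."""
    if dry_run:
        # Preview only: report the affected count and change nothing.
        return GuardResult(False, f"[dry-run] {name} would affect {affected} item(s); nothing changed.")
    if not confirm:
        # Safe default: refuse and tell the operator how to proceed.
        return GuardResult(False, f"{name} is destructive; re-run with --confirm to apply.")
    action()
    return GuardResult(True, f"{name} applied to {affected} item(s).")
```

Because the default for both flags is `False`, a bare invocation never mutates data.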
### Decision: Fetch-N command reuses ingestion pipeline
**Decision:** Add a fetch-count option that drives existing ingestion/fetch flow rather than building a second implementation.
**Rationale:**
- Preserves deduplication/retry logic and minimizes divergence.
## Risks / Trade-offs
- **[Risk] Operator misuse of destructive commands** -> Mitigation: confirmation gate + explicit flags + dry-run.
- **[Risk] Backoff can increase command runtime** -> Mitigation: cap retries and print ETA-style progress output.
- **[Risk] Queue processing interruption mid-run** -> Mitigation: idempotent per-item updates and resumable reruns.
- **[Trade-off] In-process queue is simpler but non-distributed** -> Mitigation: acceptable for admin-invoked maintenance scope.
## Migration Plan
1. Extend CLI parser with admin subcommands and argument validation.
2. Add reusable maintenance handlers (archive clean, cache clear, clear news, rebuild, regenerate translations, fetch-n).
3. Implement queued image-refetch handler with exponential backoff and per-item progress logs.
4. Add safeguards (`--confirm`, optional `--dry-run`) for destructive operations.
5. Document command usage and examples in README.
Rollback:
- Keep existing `force-fetch` path intact.
- Revert new subcommands while preserving unaffected ingestion pipeline.
## Open Questions
- What cache layers are considered in-scope for `clear-cache` (in-memory only vs additional filesystem cache)?
- Should `rebuild-site` chain all maintenance actions or remain a defined subset with explicit steps?
- Should `fetch n` enforce an upper bound to avoid accidental high-cost runs?

@@ -0,0 +1,34 @@
## Why
Operational recovery and maintenance flows are currently fragmented, manual, and risky for site admins during outages or data-quality incidents. We need a reliable admin command surface that supports safe reset/rebuild workflows without requiring ad-hoc scripts.
## What Changes
- Add a unified admin CLI command with maintenance subcommands for common operational tasks.
- Add `refetch-images` mode that processes the latest 30 news items through a queue, one-by-one, with exponential backoff to reduce provider/API rate-limit failures.
- Make image refetch context-aware using article keywords plus mood/sentiment signals to improve image relevance.
- Add archive cleanup command for archived news maintenance.
- Add cache clear command for application cache invalidation.
- Add clear-news command for wiping existing news items.
- Add rebuild-site command to re-run the full rebuild workflow.
- Add regenerate-translations command for all supported languages.
- Add fetch command supporting user-provided `n` article count.
- Add guardrails and operator UX improvements (dry-run where applicable, progress output, failure summaries, and safe defaults).
## Capabilities
### New Capabilities
- `admin-maintenance-command-suite`: Defines a single admin command surface with subcommands for refetch images, archive cleanup, cache clear, news clear, rebuild, translation regeneration, and fetch-n workflows.
- `queued-image-refetch-with-backoff`: Defines queue-based image refetch behavior for the latest 30 items with sequential processing and exponential backoff for rate-limit resilience.
- `context-aware-image-selection-recovery`: Defines keyword + sentiment/mood-informed image query rules and generic AI fallback behavior for refetch operations.
- `site-admin-safety-and-ergonomics`: Defines operational safeguards and usability requirements (dry-run, confirmation for destructive actions, progress reporting, and actionable error summaries).
### Modified Capabilities
- None.
## Impact
- **Backend/CLI:** new admin command entrypoints and orchestration logic for maintenance workflows.
- **News/Image Pipeline:** image re-fetch and optimization logic, retry/backoff strategy, and relevance heuristics.
- **Data Layer:** archive cleanup, cache invalidation, news-clear, translation regeneration, and controlled fetch-count ingestion operations.
- **Operations:** faster incident recovery, reduced manual intervention, and safer reset/rebuild procedures for admins.

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Unified admin command surface
The system SHALL provide a single admin CLI command family exposing maintenance subcommands.
#### Scenario: Subcommand discovery
- **WHEN** an operator runs the admin command help output
- **THEN** available subcommands include refetch-images, clean-archive, clear-cache, clear-news, rebuild-site, regenerate-translations, and fetch
### Requirement: Fetch command supports configurable article count
The admin fetch command SHALL support an operator-provided article count parameter.
#### Scenario: Fetch with explicit count
- **WHEN** an operator invokes fetch with a count of 25 (for example, `fetch --count 25`)
- **THEN** the command executes ingestion targeting the requested count
- **AND** prints completion summary including processed/stored counts
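
Validation of the count parameter could be handled by an `argparse` type function. A hedged sketch: `positive_int` is a hypothetical helper, and rejecting zero or negative counts is an assumption, since the requirement does not define bounds.

```python
import argparse


def positive_int(raw: str) -> int:
    """argparse type function: accept only integers greater than zero."""
    try:
        value = int(raw)
    except ValueError:
        raise argparse.ArgumentTypeError(f"count must be an integer, got {raw!r}")
    if value < 1:
        raise argparse.ArgumentTypeError(f"count must be >= 1, got {value}")
    return value


# Stand-alone parser for the fetch subcommand only (illustrative).
parser = argparse.ArgumentParser(prog="fetch")
parser.add_argument("--count", type=positive_int, required=True)
```

With this in place, an invalid value makes `argparse` print a usage error and exit before any ingestion work starts.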
### Requirement: Translation regeneration command
The system SHALL provide a command to regenerate translations for existing articles.
#### Scenario: Regenerate translations run
- **WHEN** an operator runs regenerate-translations
- **THEN** the system attempts translation regeneration for supported languages
- **AND** outputs success/failure totals
### Requirement: Rebuild site command
The system SHALL provide a rebuild-site command that executes the defined rebuild workflow.
#### Scenario: Rebuild execution
- **WHEN** an operator runs rebuild-site
- **THEN** the system executes the documented rebuild steps in deterministic order
- **AND** prints a final success/failure summary
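
Deterministic ordering can be obtained by keeping the rebuild steps in an explicit list and running them in list order. The sketch below is illustrative only: `run_rebuild` is a hypothetical name, the step list is supplied by the caller, and stopping at the first failed step is an assumption, because the spec leaves the exact rebuild sequence open.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_rebuild(steps: List[Step]) -> bool:
    """Run steps in list order; stop at the first failure and report it."""
    completed: List[str] = []
    for name, step in steps:
        print(f"-> {name}")
        try:
            step()
        except Exception as exc:
            print(f"rebuild-site FAILED at '{name}': {exc} (completed: {completed})")
            return False
        completed.append(name)
    print(f"rebuild-site OK: {len(completed)} step(s) completed")
    return True
```

The boolean result can be mapped to the process exit code so that cron jobs and scripts can detect a failed rebuild.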

@@ -0,0 +1,23 @@
## ADDED Requirements
### Requirement: Context-aware image query generation
Image refetch SHALL construct provider queries from article context including keywords and mood/sentiment cues.
#### Scenario: Context-enriched query
- **WHEN** a queued article is processed for image refetch
- **THEN** the system derives query terms from article headline/summary content
- **AND** includes mood/sentiment-informed cues to improve relevance
### Requirement: AI-domain fallback keywords
When context extraction is insufficient, the system SHALL use AI-domain fallback keywords.
#### Scenario: Empty or weak context extraction
- **WHEN** extracted context terms are empty or below quality threshold
- **THEN** the system applies fallback terms such as `ai`, `machine learning`, `deep learning`
### Requirement: Generic AI fallback image on terminal failure
If no usable provider image is returned, the system SHALL assign a generic AI fallback image.
#### Scenario: Provider chain exhaustion
- **WHEN** all provider attempts fail or return unusable images
- **THEN** the system stores a generic AI fallback image for the article
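
The three requirements above could fit together as in the following sketch. Only the fallback terms `ai`, `machine learning`, and `deep learning` come from the spec; the stopword list, the mood-to-cue mapping, the `min_terms` quality threshold, the fallback image path, and all function names are assumptions made for illustration.

```python
import re
from typing import List, Optional

FALLBACK_KEYWORDS = ["ai", "machine learning", "deep learning"]  # from the spec
STOPWORDS = {"the", "and", "for", "with", "new", "from", "that", "this"}  # assumed
MOOD_CUES = {"positive": "bright", "negative": "dark", "neutral": ""}  # assumed mapping
GENERIC_FALLBACK_IMAGE = "/static/images/ai-fallback.jpg"  # hypothetical path


def extract_keywords(text: str, limit: int = 4) -> List[str]:
    """Return up to `limit` distinct non-stopword terms in order of appearance."""
    words = re.findall(r"[a-z][a-z0-9-]{2,}", text.lower())
    seen: List[str] = []
    for word in words:
        if word not in STOPWORDS and word not in seen:
            seen.append(word)
    return seen[:limit]


def build_image_query(headline: str, summary: str, mood: str = "neutral", min_terms: int = 2) -> str:
    """Combine article keywords with a mood cue; fall back to AI-domain terms."""
    terms = extract_keywords(f"{headline} {summary}")
    if len(terms) < min_terms:
        # Context is empty or too weak: use the AI-domain fallback keywords.
        terms = list(FALLBACK_KEYWORDS)
    cue = MOOD_CUES.get(mood, "")
    if cue:
        terms.append(cue)
    return " ".join(terms)


def choose_image(provider_results: List[Optional[str]]) -> str:
    """Return the first usable provider URL, else the generic fallback image."""
    for url in provider_results:
        if url:
            return url
    return GENERIC_FALLBACK_IMAGE
```

Keeping the fallback path deterministic means a rerun with the same inputs produces the same query and the same stored image.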

@@ -0,0 +1,33 @@
## ADDED Requirements
### Requirement: Latest-30 queue construction
The refetch-images command SHALL enqueue up to the latest 30 news items for processing.
#### Scenario: Queue population
- **WHEN** refetch-images is started
- **THEN** the command loads recent news items
- **AND** enqueues at most 30 items ordered from newest to oldest
### Requirement: Sequential processing
The image refetch queue SHALL be processed one item at a time.
#### Scenario: Single-item worker behavior
- **WHEN** queue processing runs
- **THEN** only one queued item is processed at a time
- **AND** the next item starts only after the current item completes or fails
### Requirement: Exponential backoff on transient failures
The queue processor SHALL retry transient image-provider failures using exponential backoff.
#### Scenario: Rate-limited provider response
- **WHEN** provider call returns rate-limit or transient error
- **THEN** command retries with exponential delay between attempts
- **AND** stops retrying after configured max attempts
### Requirement: Progress and completion reporting
The command SHALL emit operator-readable progress and final summary output.
#### Scenario: Queue progress output
- **WHEN** queue processing is in progress
- **THEN** the command prints per-item progress (processed/succeeded/failed)
- **AND** prints final totals on completion

@@ -0,0 +1,25 @@
## ADDED Requirements
### Requirement: Confirmation guard for destructive commands
Destructive admin commands SHALL require explicit confirmation before execution.
#### Scenario: Missing confirmation flag
- **WHEN** an operator runs clear-news or clean-archive without required confirmation
- **THEN** the command exits without applying destructive changes
- **AND** prints guidance for explicit confirmation usage
### Requirement: Dry-run support where applicable
Maintenance commands SHALL provide dry-run mode for previewing effects where feasible.
#### Scenario: Dry-run preview
- **WHEN** an operator invokes a command with dry-run mode
- **THEN** the command reports intended actions and affected counts
- **AND** persists no data changes
### Requirement: Actionable failure summaries
Admin commands SHALL output actionable errors and final status summaries.
#### Scenario: Partial failure reporting
- **WHEN** a maintenance command partially fails
- **THEN** output includes succeeded/failed counts
- **AND** includes actionable next-step guidance
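
A partial-failure summary might be produced by a helper like the one below. It is a sketch, not the standardized output format: `summarize` is a hypothetical name, the wording of the hint is invented, and returning a non-zero exit code on any failure is an assumption.

```python
from typing import Dict, Tuple


def summarize(command: str, succeeded: int, failures: Dict[str, str]) -> Tuple[str, int]:
    """Build a summary with counts, per-item reasons, and a next-step hint."""
    lines = [f"{command}: {succeeded} succeeded, {len(failures)} failed"]
    for item_id, reason in failures.items():
        lines.append(f"  - {item_id}: {reason}")
    if failures:
        lines.append(f"Next step: re-run `{command}` to retry the failed items.")
    exit_code = 1 if failures else 0
    return "\n".join(lines), exit_code
```

For example, `summarize("refetch-images", 28, {"news-17": "rate limited"})` returns a three-line report and exit code `1`, while a run without failures returns a single line and exit code `0`.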

@@ -0,0 +1,41 @@
## 1. Admin CLI Foundation
- [x] 1.1 Extend `backend/cli.py` parser with an admin maintenance command group and subcommands.
- [x] 1.2 Add argument validation for subcommands including `fetch --count n`.
- [x] 1.3 Keep existing `force-fetch` command behavior intact.
## 2. Queue-Based Image Refetch
- [x] 2.1 Implement latest-30 article selection query for refetch queue.
- [x] 2.2 Implement in-process sequential queue worker for refetch-images.
- [x] 2.3 Add exponential backoff retry logic for transient/rate-limit provider failures.
- [x] 2.4 Add per-item progress logging and final queue summary output.
## 3. Context-Aware Image Recovery
- [x] 3.1 Add context-aware query generation using article keywords plus mood/sentiment cues.
- [x] 3.2 Add AI-domain fallback keyword set when extracted context is weak.
- [x] 3.3 Add explicit generic AI fallback image assignment for terminal provider failure.
- [x] 3.4 Ensure refetched images are optimized and persisted using existing image pipeline contracts.
## 4. Maintenance Operations
- [x] 4.1 Implement clean-archive command using existing archival repository helpers.
- [x] 4.2 Implement clear-cache command for configured cache layers in scope.
- [x] 4.3 Implement clear-news command for non-archived items and/or items in the configured scope.
- [x] 4.4 Implement rebuild-site command to execute defined rebuild sequence.
- [x] 4.5 Implement regenerate-translations command across supported languages.
- [x] 4.6 Implement fetch command with configurable article count.
## 5. Safety and Operator UX
- [x] 5.1 Add explicit confirmation requirement for destructive commands.
- [x] 5.2 Add dry-run support for commands where preview is feasible.
- [x] 5.3 Standardize command output format for success/failure totals and next-step hints.
## 6. Documentation and Validation
- [x] 6.1 Update README command documentation with examples for each new subcommand.
- [x] 6.2 Add operational guardrail notes (confirmation, dry-run, backoff behavior).
- [x] 6.3 Validate command help output and argument handling.
- [x] 6.4 Run end-to-end manual checks for refetch-images queue behavior and failure recovery output.