bulk commit changes!

2026-02-13 02:32:06 -05:00
parent c8f98c54c9
commit bf4a40f533
152 changed files with 2210 additions and 19 deletions

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12

View File

@@ -0,0 +1,111 @@
## Context
ClawFort needs a stunning one-page placeholder website that automatically aggregates and displays AI news hourly. The site must be containerized, use Perplexity API for news generation, and feature infinite scroll with 30-day retention.
**Current State:** Greenfield project - no existing codebase.
**Constraints:**
- Must use Perplexity API (API key via environment variable)
- Containerized deployment (Docker)
- Lean JavaScript framework for frontend
- 30-day news retention with archiving
- Hourly automated updates
## Goals / Non-Goals
**Goals:**
- Stunning one-page website with ClawFort branding
- Hourly AI news aggregation via Perplexity API
- Dynamic hero block with featured news and image
- Infinite scroll news feed (10 initial items)
- 30-day retention with automatic archiving
- Source attribution for news and images
- Fully containerized deployment
- Responsive design
**Non-Goals:**
- User authentication/accounts
- Manual news curation interface
- Real-time updates (polling only)
- Multi-language support
- CMS integration
- SEO optimization beyond basic meta tags
## Decisions
### Architecture: Monolithic Container
**Decision:** Single container with frontend + backend + SQLite
**Rationale:** Simplicity for a placeholder site, easy deployment, no external database dependencies
**Alternative:** Microservices with separate DB container - rejected as overkill for this scope
### Frontend Framework: Alpine.js + Tailwind CSS
**Decision:** Alpine.js for lean reactivity, Tailwind for styling
**Rationale:** Minimal bundle size (~15kb), no build step complexity, perfect for one-page sites
**Alternative:** React/Vue - rejected as too heavy for simple infinite scroll and hero display
### Backend: Python (FastAPI) + APScheduler
**Decision:** FastAPI for REST API, APScheduler for cron-like jobs
**Rationale:** Fast to develop, excellent async support, built-in OpenAPI docs, simple scheduling
**Alternative:** Node.js/Express - rejected; Python better for data processing and Perplexity integration
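The hourly cadence APScheduler provides can be sketched with the standard library alone; `run_hourly` and `seconds_until_next_hour` are illustrative names, not functions from the codebase:

```python
import asyncio
import datetime as dt

def seconds_until_next_hour(now: dt.datetime) -> float:
    """Seconds from `now` until the top of the next hour."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + dt.timedelta(hours=1)
    return (next_hour - now).total_seconds()

async def run_hourly(job) -> None:
    """Await `job()` at the top of every hour, forever."""
    while True:
        await asyncio.sleep(seconds_until_next_hour(dt.datetime.now()))
        await job()
```

In the actual service this reduces to a single APScheduler registration such as `scheduler.add_job(fetch_news, CronTrigger(minute=0), max_instances=1)`, which also prevents overlapping runs.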
### Database: SQLite with SQLAlchemy
**Decision:** SQLite for zero-config persistence
**Rationale:** No separate DB container needed, sufficient for 30-day news retention (~1000-2000 records)
**Alternative:** PostgreSQL - rejected as adds deployment complexity
### News Aggregation Strategy
**Decision:** Hourly cron job queries Perplexity for "latest AI news" with image generation
**Rationale:** Simple, reliable, cost-effective
**Implementation:**
- Perplexity API call: "What are the latest AI news from the last hour?"
- Store: headline, summary, source URL, image URL, timestamp
- Attribution: Display source name and image credit
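A minimal sketch of that call, assuming Perplexity's OpenAI-compatible `chat/completions` endpoint and the `sonar` model (both are assumptions about the provider API; the production client would use httpx and parse the stored fields from the response):

```python
import json
import urllib.request

PERPLEXITY_URL = "https://api.perplexity.ai/chat/completions"  # assumed endpoint

def build_news_request(query: str, model: str = "sonar") -> dict:
    """Build the chat-completions payload for the hourly news query."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": ("Return recent AI news as JSON objects with "
                         "headline, summary, source_url, and image_url.")},
            {"role": "user", "content": query},
        ],
    }

def fetch_news(api_key: str) -> dict:
    """POST the query with bearer auth and return the decoded response."""
    payload = build_news_request("What are the latest AI news from the last hour?")
    req = urllib.request.Request(
        PERPLEXITY_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```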
### Image Strategy
**Decision:** Use Perplexity to suggest relevant images (or generate them via DALL-E when available), with local image optimization
**Rationale:** Consistent with AI theme, no copyright concerns, plus configurable compression
**Implementation:**
- Download and optimize images locally using Pillow
- Configurable quality setting via `IMAGE_QUALITY` env var (1-100, default 85)
- Store optimized images in `/app/static/images/`
- Serve optimized versions, fallback to original URL if optimization fails
**Alternative:** Unsplash API - rejected to keep dependencies minimal
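A sketch of the optimization path under the decisions above; function names are illustrative, the 1200px width ceiling matches the image spec, and `image_quality()` applies the documented `IMAGE_QUALITY` clamping:

```python
import os

def image_quality() -> int:
    """IMAGE_QUALITY from the environment, clamped to 1-100 (default 85)."""
    try:
        quality = int(os.environ.get("IMAGE_QUALITY", "85"))
    except ValueError:
        return 85
    return max(1, min(100, quality))

def optimize_image(src_path: str, dst_path: str, max_width: int = 1200) -> None:
    """Recompress (and downscale if oversized) a downloaded image with Pillow."""
    from PIL import Image  # lazy import keeps the env helper dependency-free
    with Image.open(src_path) as im:
        if im.width > max_width:
            ratio = max_width / im.width
            im = im.resize((max_width, round(im.height * ratio)))
        im.convert("RGB").save(dst_path, "JPEG", quality=image_quality())
```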
### Infinite Scroll Implementation
**Decision:** Cursor-based pagination with Intersection Observer API
**Rationale:** Efficient for large datasets, simple Alpine.js integration
**Page Size:** 10 items per request
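The cursor scheme can be shown in plain Python; in the real endpoint the filter becomes SQL along the lines of `WHERE id < :cursor ORDER BY id DESC LIMIT 10`:

```python
def paginate(items, cursor=None, limit=10):
    """One page of `items` (sorted newest-first by id), starting after `cursor`.

    `cursor` is the id of the last item the client has seen; `next_cursor`
    is None once the final, short page has been served.
    """
    if cursor is not None:
        items = [item for item in items if item["id"] < cursor]
    page = items[:limit]
    next_cursor = page[-1]["id"] if len(page) == limit else None
    return {"items": page, "next_cursor": next_cursor}
```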
### Archive Strategy
**Decision:** Soft delete (archived flag) + nightly cleanup job
**Rationale:** Easy to implement, data recoverable if needed
**Cleanup:** Flag items older than 30 days as archived; permanently delete archived items after a further 30 days
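The nightly soft-delete pass, sketched against stdlib `sqlite3` for self-containment (the project itself goes through SQLAlchemy; table and column names follow the storage spec):

```python
import datetime as dt
import sqlite3

def archive_old_news(conn: sqlite3.Connection, retention_days: int = 30) -> int:
    """Soft-delete: flag items past the retention window; returns rows flagged."""
    cutoff = (dt.datetime.utcnow() - dt.timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "UPDATE news SET archived = 1 WHERE archived = 0 AND created_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

A second statement of the same shape (`DELETE FROM news WHERE archived = 1 AND created_at < :older_cutoff`) handles the permanent-deletion pass.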
## Risks / Trade-offs
**[Risk] Perplexity API rate limits or downtime** → Mitigation: Implement exponential backoff, cache last successful fetch, display cached content with "last updated" timestamp, fallback to OpenRouter API if configured
**[Risk] Container storage grows unbounded** → Mitigation: SQLite WAL mode, volume mounts for persistence, 30-day hard limit on retention
**[Risk] News quality varies** → Mitigation: Basic filtering (require title + summary), manual blacklist capability in config
**[Risk] Cold start performance** → Mitigation: SQLite connection pooling, frontend CDN-ready static assets
**[Trade-off] SQLite vs PostgreSQL** → SQLite limits concurrent writes but acceptable for read-heavy news site
**[Trade-off] Single container vs microservices** → Easier deployment but less scalable; acceptable for placeholder site
## Migration Plan
1. **Development:** Local Docker Compose setup
2. **Environment:** Configure `PERPLEXITY_API_KEY` in `.env`
3. **Build:** `docker build -t clawfort-site .`
4. **Run:** `docker run -e PERPLEXITY_API_KEY=xxx -p 8000:8000 clawfort-site`
5. **Data:** SQLite volume mount for persistence across restarts
## Open Questions (Resolved)
1. **Admin panel?** → Deferred to future
2. **Image optimization?** → Yes, local optimization with Pillow, configurable quality via `IMAGE_QUALITY` env var
3. **Analytics?** → Umami integration with `UMAMI_SCRIPT_URL` and `UMAMI_WEBSITE_ID` env vars, track page views, scroll events, and CTA clicks
4. **API cost monitoring?** → Log Perplexity usage, fallback to OpenRouter API if `OPENROUTER_API_KEY` configured

View File

@@ -0,0 +1,54 @@
## Why
ClawFort needs a stunning one-page placeholder website that automatically generates and displays AI news hourly, creating a dynamic, always-fresh brand presence without manual content curation. The site will serve as a living showcase of AI capabilities while building brand recognition.
## What Changes
- **New Capabilities:**
- Automated AI news aggregation via Perplexity API (hourly updates)
- Dynamic hero section with featured news and images
- Infinite scroll news feed with 1-month retention
- Archive system for older news items
- Containerized deployment (Docker)
- Responsive single-page design with lean JavaScript framework
- **Frontend:**
- One-page website with hero block
- Infinite scroll news feed (latest 10 on load)
- News attribution to sources
- Image credit display
- Responsive design
- **Backend:**
- News aggregation service (hourly cron job)
- Perplexity API integration
- News storage with 30-day retention
- Archive management
- REST API for frontend
- **Infrastructure:**
- Docker containerization
- Environment-based configuration
- Perplexity API key management
## Capabilities
### New Capabilities
- `news-aggregator`: Automated AI news collection via Perplexity API with hourly scheduling
- `news-storage`: Database storage with 30-day retention and archive management
- `hero-display`: Dynamic hero block with featured news and image attribution
- `infinite-scroll`: Frontend infinite scroll with lazy loading (10 initial, paginated)
- `containerized-deployment`: Docker-based deployment with environment configuration
- `responsive-frontend`: Single-page application with lean JavaScript framework
### Modified Capabilities
- None (new project)
## Impact
- **Code:** New full-stack application (frontend + backend)
- **APIs:** Perplexity API integration required
- **Dependencies:** Docker, Node.js/Python runtime, database (SQLite/PostgreSQL)
- **Infrastructure:** Container orchestration support
- **Environment:** `PERPLEXITY_API_KEY` required
- **Data:** 30-day rolling news archive with automatic cleanup

View File

@@ -0,0 +1,45 @@
## ADDED Requirements
### Requirement: Containerized deployment
The system SHALL run entirely within Docker containers with all dependencies included.
#### Scenario: Single container build
- **WHEN** building the Docker image
- **THEN** the Dockerfile SHALL include Python runtime, Node.js (for Tailwind if needed), and all application code
- **AND** expose port 8000 for web traffic
#### Scenario: Environment configuration
- **WHEN** running the container
- **THEN** the system SHALL read PERPLEXITY_API_KEY from environment variables
- **AND** fail to start if the key is missing or invalid
- **AND** support optional configuration for retention days (default: 30)
- **AND** support optional IMAGE_QUALITY for image compression (default: 85)
- **AND** support optional OPENROUTER_API_KEY for fallback LLM provider
- **AND** support optional UMAMI_SCRIPT_URL and UMAMI_WEBSITE_ID for analytics
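Fail-fast startup validation for these variables could look like the following sketch; `RETENTION_DAYS` is an assumed name, since the scenario fixes only the behavior (optional, default 30):

```python
import os

REQUIRED = ("PERPLEXITY_API_KEY",)
DEFAULTS = {"RETENTION_DAYS": "30", "IMAGE_QUALITY": "85"}
OPTIONAL = ("OPENROUTER_API_KEY", "UMAMI_SCRIPT_URL", "UMAMI_WEBSITE_ID")

def load_config(env=os.environ) -> dict:
    """Validate required variables and apply the documented defaults."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        # Refuse to start, per the environment-configuration scenario.
        raise SystemExit(f"missing required environment variables: {missing}")
    config = {key: env[key] for key in REQUIRED}
    for key, default in DEFAULTS.items():
        config[key] = env.get(key, default)
    for key in OPTIONAL:
        if env.get(key):
            config[key] = env[key]
    return config
```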
#### Scenario: Data persistence
- **WHEN** the container restarts
- **THEN** the SQLite database SHALL persist via Docker volume mount
- **AND** news data SHALL remain intact across restarts
### Requirement: Responsive single-page design
The system SHALL provide a stunning, responsive one-page website with ClawFort branding.
#### Scenario: Brand consistency
- **WHEN** viewing the website
- **THEN** the design SHALL feature ClawFort branding (logo, colors, typography)
- **AND** maintain visual consistency across all sections
#### Scenario: Responsive layout
- **WHEN** viewing on mobile, tablet, or desktop
- **THEN** the layout SHALL adapt appropriately
- **AND** the hero block SHALL resize proportionally
- **AND** the news feed SHALL use appropriate column layouts
#### Scenario: Performance
- **WHEN** loading the page
- **THEN** initial page load SHALL complete within 2 seconds
- **AND** images SHALL lazy load outside viewport
- **AND** JavaScript bundle SHALL be under 100KB gzipped

View File

@@ -0,0 +1,55 @@
## ADDED Requirements
### Requirement: Hero block display
The system SHALL display the most recent news item as a featured hero block with full attribution.
#### Scenario: Hero rendering
- **WHEN** the page loads
- **THEN** the hero block SHALL display the latest news headline, summary, and featured image
- **AND** show source attribution (e.g., "Via: TechCrunch")
- **AND** show image credit (e.g., "Image: DALL-E")
#### Scenario: Hero update
- **WHEN** new news is fetched hourly
- **THEN** the hero block SHALL automatically update to show the newest item
- **AND** the previous hero item SHALL move to the news feed
### Requirement: Infinite scroll news feed
The system SHALL display news items in reverse chronological order with infinite scroll pagination.
#### Scenario: Initial load
- **WHEN** the page first loads
- **THEN** the system SHALL display the 10 most recent non-archived news items
- **AND** exclude the hero item from the feed
#### Scenario: Infinite scroll
- **WHEN** the user scrolls to the bottom of the feed
- **THEN** the system SHALL fetch the next 10 news items via API
- **AND** append them to the feed without page reload
- **AND** show a loading indicator during fetch
#### Scenario: End of feed
- **WHEN** all non-archived news items have been loaded
- **THEN** the system SHALL display a "No more news" message
- **AND** disable further scroll triggers
### Requirement: News attribution display
The system SHALL clearly attribute all news content and images to their sources.
#### Scenario: Source attribution
- **WHEN** displaying any news item
- **THEN** the system SHALL show the original source name and link
- **AND** display image credit if available
#### Scenario: Perplexity attribution
- **WHEN** displaying aggregated content
- **THEN** the system SHALL include "Powered by Perplexity" in the footer
#### Scenario: Analytics tracking
- **WHEN** Umami analytics is configured via `UMAMI_SCRIPT_URL` and `UMAMI_WEBSITE_ID`
- **THEN** the system SHALL inject Umami tracking script into page head
- **AND** track page view events on initial load
- **AND** track scroll depth events (25%, 50%, 75%, 100%)
- **AND** track CTA click events (news item clicks, source link clicks)

View File

@@ -0,0 +1,56 @@
## ADDED Requirements
### Requirement: News aggregation via Perplexity API
The system SHALL fetch AI news hourly from Perplexity API and store it with full attribution.
#### Scenario: Hourly news fetch
- **WHEN** the scheduled job runs every hour
- **THEN** the system SHALL call Perplexity API with query "latest AI news"
- **AND** store the response with headline, summary, source URL, and timestamp
#### Scenario: API error handling
- **WHEN** Perplexity API returns an error or timeout
- **THEN** the system SHALL log the error with cost tracking
- **AND** retry with exponential backoff up to 3 times
- **AND** fall back to OpenRouter API if `OPENROUTER_API_KEY` is configured
- **AND** continue serving cached content if all retries and the fallback fail
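The retry-then-fallback chain reads as a small wrapper; `sleep` is injectable purely so the backoff is testable, and the provider callables stand in for the Perplexity and OpenRouter clients:

```python
import time

def fetch_with_backoff(fetch, fallback=None, attempts=3, base_delay=1.0,
                       sleep=time.sleep):
    """Try `fetch` with exponential backoff, then `fallback`, else None.

    Returning None signals the caller to keep serving cached content.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 1s, 2s, ...
    if fallback is not None:
        try:
            return fallback()
        except Exception:
            pass
    return None
```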
### Requirement: Featured image generation
The system SHALL generate or fetch a relevant featured image for each news item.
#### Scenario: Image acquisition
- **WHEN** a new news item is fetched
- **THEN** the system SHALL request a relevant image URL from Perplexity
- **AND** download and optimize the image locally using Pillow
- **AND** apply quality compression based on `IMAGE_QUALITY` env var (1-100, default 85)
- **AND** store the optimized image path and original image credit/source information
#### Scenario: Image optimization configuration
- **WHEN** the system processes an image
- **THEN** it SHALL read `IMAGE_QUALITY` from environment (default: 85)
- **AND** apply JPEG compression at specified quality level
- **AND** resize images exceeding 1200px width while maintaining aspect ratio
- **AND** store optimized images in `/app/static/images/` directory
#### Scenario: Image fallback
- **WHEN** image generation fails or returns no result
- **THEN** the system SHALL use a default ClawFort branded placeholder image
### Requirement: News data persistence with retention
The system SHALL retain news items for 30 days, then archive them automatically and permanently delete archived items 30 days later.
#### Scenario: News storage
- **WHEN** a news item is fetched from Perplexity
- **THEN** the system SHALL store it in SQLite with fields: id, headline, summary, source_url, image_url, image_credit, published_at, created_at
- **AND** set archived=false by default
#### Scenario: Automatic archiving
- **WHEN** a nightly cleanup job runs
- **THEN** the system SHALL mark all news items older than 30 days as archived=true
- **AND** delete archived items older than 60 days permanently
#### Scenario: Duplicate prevention
- **WHEN** fetching news that matches an existing headline (within 24 hours)
- **THEN** the system SHALL skip insertion to prevent duplicates
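The duplicate check is a single lookup before insertion; sketched with stdlib `sqlite3` (the project uses SQLAlchemy, and ISO-8601 timestamps compare correctly as strings):

```python
import datetime as dt
import sqlite3

def is_duplicate(conn: sqlite3.Connection, headline: str,
                 now: dt.datetime) -> bool:
    """True if an identical headline was stored within the last 24 hours."""
    cutoff = (now - dt.timedelta(hours=24)).isoformat()
    row = conn.execute(
        "SELECT 1 FROM news WHERE headline = ? AND created_at >= ? LIMIT 1",
        (headline, cutoff),
    ).fetchone()
    return row is not None
```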

View File

@@ -0,0 +1,82 @@
## 1. Project Setup
- [x] 1.1 Create project directory structure (backend/, frontend/, docker/)
- [x] 1.2 Initialize Python project with pyproject.toml (FastAPI, SQLAlchemy, APScheduler, httpx)
- [x] 1.3 Create requirements.txt for Docker build
- [x] 1.4 Set up Tailwind CSS configuration
- [x] 1.5 Create .env.example with all environment variables (PERPLEXITY_API_KEY, IMAGE_QUALITY, OPENROUTER_API_KEY, UMAMI_SCRIPT_URL, UMAMI_WEBSITE_ID)
## 2. Database Layer
- [x] 2.1 Create SQLAlchemy models (NewsItem with fields: id, headline, summary, source_url, image_url, image_credit, published_at, created_at, archived)
- [x] 2.2 Create database initialization and migration scripts
- [x] 2.3 Implement database connection management with SQLite
- [x] 2.4 Create repository functions (create_news, get_recent_news, get_news_paginated, archive_old_news, delete_archived_news)
## 3. News Aggregation Service
- [x] 3.1 Implement Perplexity API client with httpx and cost logging
- [x] 3.2 Create news fetch function with query "latest AI news"
- [x] 3.3 Implement exponential backoff retry logic (3 attempts)
- [x] 3.4 Add duplicate detection (headline match within 24h)
- [x] 3.5 Create hourly scheduled job with APScheduler
- [x] 3.6 Implement image URL fetching from Perplexity
- [x] 3.7 Add image download and optimization with Pillow (configurable quality)
- [x] 3.8 Implement OpenRouter API fallback for news fetching
- [x] 3.9 Add default placeholder image fallback
## 4. Backend API
- [x] 4.1 Create FastAPI application structure
- [x] 4.2 Implement GET /api/news endpoint with pagination (cursor-based)
- [x] 4.3 Implement GET /api/news/latest endpoint for hero block
- [x] 4.4 Add CORS middleware for frontend access
- [x] 4.5 Create Pydantic schemas for API responses
- [x] 4.6 Implement health check endpoint
- [x] 4.7 Add API error handling and logging
## 5. Frontend Implementation
- [x] 5.1 Create HTML structure with ClawFort branding
- [x] 5.2 Implement hero block with Alpine.js (latest news display)
- [x] 5.3 Create news feed component with Alpine.js
- [x] 5.4 Implement infinite scroll with Intersection Observer API
- [x] 5.5 Add loading indicators and "No more news" message
- [x] 5.6 Implement source attribution display
- [x] 5.7 Add image lazy loading
- [x] 5.8 Style with Tailwind CSS (responsive design)
- [x] 5.9 Add "Powered by Perplexity" footer attribution
- [x] 5.10 Implement Umami analytics integration (conditional on env vars)
- [x] 5.11 Add analytics events: page view, scroll depth (25/50/75/100%), CTA clicks
## 6. Archive Management
- [x] 6.1 Implement nightly cleanup job (archive >30 days)
- [x] 6.2 Create permanent deletion job (>60 days archived)
- [x] 6.3 Add retention configuration (default 30 days)
## 7. Docker Containerization
- [x] 7.1 Create Dockerfile with multi-stage build (Python + static assets)
- [x] 7.2 Create docker-compose.yml for local development
- [x] 7.3 Add volume mount for SQLite persistence
- [x] 7.4 Configure environment variable handling
- [x] 7.5 Optimize image size (slim Python base)
- [x] 7.6 Add .dockerignore file
## 8. Testing & Validation
- [x] 8.1 Test Perplexity API integration manually
- [x] 8.2 Verify hourly news fetching works
- [x] 8.3 Test infinite scroll pagination
- [x] 8.4 Verify responsive design on mobile/desktop
- [x] 8.5 Test container build and run
- [x] 8.6 Verify data persistence across container restarts
- [x] 8.7 Test archive cleanup functionality
## 9. Documentation
- [x] 9.1 Create README.md with setup instructions
- [x] 9.2 Document environment variables
- [x] 9.3 Add deployment instructions
- [x] 9.4 Document API endpoints

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12

View File

@@ -0,0 +1,93 @@
## Context
ClawFort currently performs automated hourly news ingestion through APScheduler (`scheduled_news_fetch()` in `backend/news_service.py`) and the same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.
The change adds an explicit manual trigger path for operations use cases:
- first-time bootstrap (populate content immediately after setup)
- recovery after failed external API calls
- ad-hoc operational refresh without waiting for scheduler cadence
Constraints:
- Reuse existing fetch pipeline to avoid logic drift
- Keep behavior idempotent with existing duplicate detection
- Preserve scheduler behavior; manual runs must not mutate scheduler configuration
## Goals / Non-Goals
**Goals:**
- Provide a Python command to force an immediate news fetch.
- Reuse existing retry, dedup, and storage logic.
- Return clear terminal output and process exit status for automation.
- Keep command safe to run repeatedly.
**Non-Goals:**
- Replacing APScheduler-based hourly fetch.
- Introducing new API endpoints for manual triggering.
- Changing data schema or retention policy.
- Building a full operator dashboard.
## Decisions
### Decision: Add a dedicated CLI entrypoint module
**Decision:** Add a small CLI entrypoint under backend (for example `backend/cli.py`) with a subcommand that invokes the fetch pipeline.
**Rationale:**
- Keeps operational workflow explicit and scriptable.
- Avoids coupling manual trigger behavior to HTTP routes.
- Works in local dev and containerized runtime.
**Alternatives considered:**
- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
### Decision: Invoke the existing news pipeline directly
**Decision:** The command should call `process_and_store_news()` (or the existing sync wrapper) instead of implementing parallel fetch logic.
**Rationale:**
- Guarantees parity with scheduled runs.
- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
- Minimizes maintenance overhead.
**Alternatives considered:**
- New command-specific fetch implementation: rejected due to drift risk.
### Decision: Standardize command exit semantics
**Decision:** Exit code `0` for successful command execution (including zero new items), non-zero for operational failures (for example unhandled exceptions or fatal setup errors).
**Rationale:**
- Enables CI/cron/operator scripts to react deterministically.
- Matches common CLI conventions.
**Alternatives considered:**
- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
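Combining the decisions above, the entrypoint skeleton stays small; `run_pipeline` stands in for `process_and_store_news()` so the exit semantics are visible in isolation (this is a sketch, not the shipped `backend/cli.py`):

```python
import argparse
import sys

def main(argv=None, run_pipeline=None) -> int:
    """Force-fetch CLI: exit 0 on success (even zero new items), 1 on fatal error."""
    parser = argparse.ArgumentParser(prog="clawfort")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("force-fetch", help="run one news fetch cycle now")
    args = parser.parse_args(argv)
    if args.command == "force-fetch":
        try:
            inserted = run_pipeline()  # one-off run; scheduler untouched
        except Exception as exc:
            print(f"force-fetch failed: {exc}", file=sys.stderr)
            return 1
        print(f"force-fetch ok: {inserted} new item(s) stored")
        return 0
    return 2  # defensive; unreachable with required=True

if __name__ == "__main__":
    sys.exit(main())
```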
### Decision: Keep manual and scheduled paths independent
**Decision:** Manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.
**Rationale:**
- Avoids race-prone manipulation of scheduler internals.
- Reduces complexity and risk in production runtime.
**Alternatives considered:**
- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.
## Risks / Trade-offs
- **[Risk] Overlapping manual and scheduled runs may happen at boundary times** → Mitigation: document operational guidance and keep dedup checks as a safety net.
- **[Risk] External API failures still occur during forced runs** → Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
- **[Trade-off] Command success does not guarantee new rows** → Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.
## Migration Plan
1. Add CLI module and force-fetch subcommand wired to existing pipeline.
2. Add command result reporting and exit code behavior.
3. Document usage in README for bootstrap and recovery flows.
4. Validate command in local runtime and container runtime.
Rollback:
- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.
## Open Questions
- Should force-fetch support an optional `--max-attempts` override, or stay fixed to pipeline defaults for v1?
- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?

View File

@@ -0,0 +1,35 @@
## Why
ClawFort currently fetches news on a fixed hourly schedule, which is too slow for first-time setup or recovery after a failed API cycle. Operators need a reliable way to force an immediate news pull so they can bootstrap content quickly and recover without waiting for the next scheduled run.
## What Changes
- **New Capabilities:**
- Add a manual Python command to trigger an immediate news fetch on demand.
- Add command output that clearly reports success/failure, number of fetched/stored items, and error details.
- Add safe invocation behavior so manual runs reuse existing fetch/retry/dedup logic.
- **Backend:**
- Add a CLI entrypoint/script for force-fetch execution.
- Wire the command to existing news aggregation pipeline used by scheduled jobs.
- Return non-zero exit codes on command failure for operational automation.
- **Operations:**
- Document how and when to run the force-fetch command (initial setup and recovery scenarios).
## Capabilities
### New Capabilities
- `force-fetch-command`: Provide a Python command that triggers immediate news aggregation outside the hourly scheduler.
- `fetch-run-reporting`: Provide operator-facing command output and exit semantics for successful runs and failures.
- `manual-fetch-recovery`: Support manual recovery workflow after failed or partial API fetch cycles.
### Modified Capabilities
- None.
## Impact
- **Code:** New CLI command module/entrypoint plus minimal integration with existing `news_service` execution path.
- **APIs:** No external API contract changes.
- **Dependencies:** No required new runtime dependencies expected.
- **Infrastructure:** No deployment topology change; command runs in the same container/runtime.
- **Environment:** Reuses existing env vars (`PERPLEXITY_API_KEY`, `OPENROUTER_API_KEY`, `IMAGE_QUALITY`).
- **Data:** No schema changes; command writes through existing dedup + persistence flow.

View File

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Command reports run outcome to operator
The system SHALL present operator-facing output that describes whether the forced run succeeded or failed.
#### Scenario: Successful run reporting
- **WHEN** a forced fetch command completes without fatal errors
- **THEN** the command output includes a success indication
- **AND** includes the number of items stored in that run
#### Scenario: Failed run reporting
- **WHEN** a forced fetch command encounters a fatal execution error
- **THEN** the command output includes a failure indication
- **AND** includes actionable error details for operator diagnosis
### Requirement: Command exposes automation-friendly exit semantics
The system SHALL return deterministic process exit codes for command success and failure.
#### Scenario: Exit code on success
- **WHEN** the force-fetch command execution completes successfully
- **THEN** the process exits with code 0
- **AND** automation tooling can treat the run as successful
#### Scenario: Exit code on fatal failure
- **WHEN** the force-fetch command execution fails fatally
- **THEN** the process exits with a non-zero code
- **AND** automation tooling can detect the failure state

View File

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Operator can trigger immediate news fetch via Python command
The system SHALL provide a Python command that triggers one immediate news aggregation run outside of the hourly scheduler.
#### Scenario: Successful forced fetch invocation
- **WHEN** an operator runs the documented force-fetch command with valid runtime configuration
- **THEN** the system executes one full fetch cycle using the existing aggregation pipeline
- **AND** the command terminates after the run completes
#### Scenario: Command does not reconfigure scheduler
- **WHEN** an operator runs the force-fetch command while the service scheduler exists
- **THEN** the command performs a one-off run only
- **AND** scheduler job definitions and cadence remain unchanged
### Requirement: Forced fetch reuses existing aggregation behavior
The system SHALL use the same retry, fallback, deduplication, image processing, and persistence logic as scheduled fetch runs.
#### Scenario: Retry and fallback parity
- **WHEN** the primary news provider request fails during a forced run
- **THEN** the system applies the configured retry behavior
- **AND** uses the configured fallback provider path if available
#### Scenario: Deduplication parity
- **WHEN** fetched headlines match existing duplicate rules
- **THEN** duplicate items are skipped according to existing deduplication policy
- **AND** only eligible items are persisted

View File

@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: Manual command supports bootstrap and recovery workflows
The system SHALL allow operators to run the forced fetch command during first-time setup and after failed scheduled cycles.
#### Scenario: Bootstrap content population
- **WHEN** the system is newly deployed and contains no current news items
- **THEN** an operator can run the force-fetch command immediately
- **AND** the command attempts to populate the dataset without waiting for the next hourly schedule
#### Scenario: Recovery after failed scheduled fetch
- **WHEN** a prior scheduled fetch cycle failed or produced incomplete results
- **THEN** an operator can run the force-fetch command on demand
- **AND** the system performs a fresh one-off fetch attempt
### Requirement: Repeated manual runs remain operationally safe
The system SHALL support repeated operator-triggered runs without corrupting data integrity.
#### Scenario: Repeated invocation in same day
- **WHEN** an operator runs the force-fetch command multiple times within the same day
- **THEN** existing deduplication behavior prevents duplicate persistence for matching items
- **AND** each command run completes with explicit run status output

View File

@@ -0,0 +1,28 @@
## 1. CLI Command Foundation
- [x] 1.1 Create `backend/cli.py` with command parsing for force-fetch execution
- [x] 1.2 Add a force-fetch command entrypoint that can be invoked via Python module execution
- [x] 1.3 Ensure command initializes required runtime context (env + database readiness)
## 2. Force-Fetch Execution Path
- [x] 2.1 Wire command to existing news aggregation execution path (`process_and_store_news` or sync wrapper)
- [x] 2.2 Ensure command runs as a one-off operation without changing scheduler job configuration
- [x] 2.3 Preserve existing deduplication, retry, fallback, and image processing behavior during manual runs
## 3. Operator Reporting and Exit Semantics
- [x] 3.1 Add success output that includes stored item count for the forced run
- [x] 3.2 Add failure output with actionable error details when fatal execution errors occur
- [x] 3.3 Return exit code `0` on success and non-zero on fatal failures
## 4. Recovery Workflow and Validation
- [x] 4.1 Validate bootstrap workflow: force-fetch on a fresh deployment with no current items
- [x] 4.2 Validate recovery workflow: force-fetch after simulated failed scheduled cycle
- [x] 4.3 Validate repeated same-day manual runs do not create duplicate records under dedup policy
## 5. Documentation
- [x] 5.1 Update `README.md` with force-fetch command usage for first-time setup
- [x] 5.2 Document recovery-run usage and expected command output/exit behavior

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12

View File

@@ -0,0 +1,94 @@
## Context
ClawFort currently stores and serves article content in a single language flow. The news creation path fetches English content via Perplexity and persists one record per article, while frontend hero/feed rendering consumes that single-language payload.
This change introduces multilingual support for Tamil and Malayalam with language-aware rendering and persistent user preference.
Constraints:
- Keep existing English behavior as default and fallback.
- Reuse current Perplexity integration for translation generation.
- Keep API and frontend changes minimal and backward-compatible where possible.
- Persist user language preference client-side so returning users keep their choice.
## Goals / Non-Goals
**Goals:**
- Generate Tamil and Malayalam translations at article creation time.
- Persist translation variants linked to the base article.
- Serve language-specific content in hero/feed API responses.
- Add landing-page language selector and persist preference across sessions.
**Non-Goals:**
- Supporting arbitrary language expansion in this phase.
- Introducing user accounts/server-side profile preferences.
- Building editorial translation workflows or manual override UI.
- Replacing Perplexity as translation provider.
## Decisions
### Decision: Model translations as child records linked to a base article
**Decision:** Keep one source article and store translation rows keyed by article ID + language code.
**Rationale:**
- Avoids duplicating non-language metadata (source URL, image attribution, timestamps).
- Supports language lookup with deterministic fallback to English.
- Eases future language additions without schema redesign.
**Alternatives considered:**
- Inline columns on article table (`headline_ta`, `headline_ml`): rejected as rigid and harder to extend.
- Fully duplicated article rows per language: rejected due to dedup and feed-order complexity.
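The child-record shape and its English fallback, sketched with stdlib `sqlite3` (the project would express the same thing as SQLAlchemy models; column names are illustrative):

```python
import sqlite3

SCHEMA = """
CREATE TABLE news (id INTEGER PRIMARY KEY, headline TEXT, summary TEXT);
CREATE TABLE news_translation (
    news_id  INTEGER NOT NULL REFERENCES news(id),
    lang     TEXT NOT NULL CHECK (lang IN ('ta', 'ml')),
    headline TEXT NOT NULL,
    summary  TEXT NOT NULL,
    PRIMARY KEY (news_id, lang)
);
"""

def localized(conn: sqlite3.Connection, news_id: int, lang: str) -> dict:
    """Translated text for `lang`, falling back to the English base row."""
    if lang in ("ta", "ml"):
        row = conn.execute(
            "SELECT headline, summary FROM news_translation "
            "WHERE news_id = ? AND lang = ?", (news_id, lang)).fetchone()
        if row:
            return {"headline": row[0], "summary": row[1], "lang": lang}
    row = conn.execute(
        "SELECT headline, summary FROM news WHERE id = ?", (news_id,)).fetchone()
    return {"headline": row[0], "summary": row[1], "lang": "en"}
```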
### Decision: Translate immediately after article creation in ingestion pipeline
**Decision:** For each newly accepted article, request Tamil and Malayalam translations and persist them before the ingestion cycle completes.
**Rationale:**
- Keeps article and translations synchronized.
- Avoids delayed jobs and partial language availability in normal flow.
- Fits existing per-article processing loop.
**Alternatives considered:**
- Asynchronous background translation queue: rejected for higher complexity in this phase.
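The per-article step can be sketched as follows; `translate` stands in for the Perplexity call and `store_translation` for the repository write (both names are assumptions), with the graceful per-language fallback the spec requires:

```python
TARGET_LANGS = ("ta", "ml")

def translate(text: str, lang: str) -> str:
    # Placeholder for the Perplexity translation call (assumption).
    return f"[{lang}] {text}"

def ingest_article(article: dict, store_translation) -> dict:
    """Translate a newly accepted article in the same ingestion cycle.
    A failed language is marked unavailable instead of failing the cycle;
    English fallback then applies at read time."""
    status = {}
    for lang in TARGET_LANGS:
        try:
            headline = translate(article["headline"], lang)
            summary = translate(article["summary"], lang)
            store_translation(article["id"], lang, headline, summary)
            status[lang] = "stored"
        except Exception:
            status[lang] = "unavailable"
    return status
```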
### Decision: Add optional language input to read APIs with English fallback
**Decision:** Add language selection input (query param) on existing read endpoints; if translation missing, return English source text.
**Rationale:**
- Preserves endpoint footprint and frontend integration simplicity.
- Guarantees response completeness even when translation fails.
- Supports progressive rollout without breaking existing consumers.
**Alternatives considered:**
- New language-specific endpoints: rejected as unnecessary API surface growth.
### Decision: Persist frontend language preference in localStorage with cookie fallback
**Decision:** Primary persistence in `localStorage`; optional cookie fallback for constrained browsers.
**Rationale:**
- Simple client-only persistence without backend session dependencies.
- Matches one-page app architecture and current no-auth model.
**Alternatives considered:**
- Cookie-only preference: rejected as less ergonomic for JS state hydration.
## Risks / Trade-offs
- **[Risk] Translation generation increases API cost/latency per ingestion cycle** -> Mitigation: bounded retries, fallback to English when translation unavailable.
- **[Risk] Partial translation failures create mixed-language feed** -> Mitigation: deterministic fallback to English for missing translation rows.
- **[Trade-off] Translation-at-ingest adds synchronous processing time** -> Mitigation: keep language set fixed to two targets in this phase.
- **[Risk] Language preference desynchronization between tabs/devices** -> Mitigation: accept per-browser persistence scope in current architecture.
## Migration Plan
1. Add translation persistence model and migration path.
2. Extend ingestion pipeline to request/store Tamil and Malayalam translations.
3. Add language-aware API response behavior with fallback.
4. Implement frontend language selector + preference persistence.
5. Validate language switching, fallback, and returning-user preference behavior.
Rollback:
- Disable language selection in frontend and return English-only payload while retaining translation data safely.
## Open Questions
- Should translation failures be retried independently per language within the same cycle, or skipped after one failed language call?
- Should unsupported language requests return 400 or silently fall back to English in v1?

View File

@@ -0,0 +1,37 @@
## Why
ClawFort currently publishes content in a single language, which limits accessibility for regional audiences. Adding multilingual delivery now improves usability for Tamil and Malayalam readers while keeping the current English workflow intact.
## What Changes
- **New Capabilities:**
- Persist the fetched articles locally in database.
- Generate Tamil and Malayalam translations for each newly created article using Perplexity.
- Store translated variants as language-specific content items linked to the same base article.
- Add a language selector on the landing page to switch article rendering language.
- Persist user language preference in browser storage (local storage or cookie) and restore it for returning users.
- **Frontend:**
- Add visible language switcher UI on the one-page experience.
- Render hero and feed content in selected language when translation exists.
- **Backend:**
- Extend content generation flow to request and save multilingual outputs.
- Serve language-specific content for existing API reads.
## Capabilities
### New Capabilities
- `article-translations-ta-ml`: Create and store Tamil and Malayalam translated content variants for each article at creation time.
- `language-aware-content-delivery`: Return and render language-specific article fields based on selected language.
- `language-preference-persistence`: Persist and restore user-selected language across sessions for returning users.
### Modified Capabilities
- None.
## Impact
- **Code:** Backend aggregation/storage flow, API response handling, and frontend rendering/state management will be updated.
- **APIs:** Existing read endpoints will need language-aware response behavior or language selection input handling.
- **Dependencies:** Reuses Perplexity integration; no mandatory new external provider expected.
- **Infrastructure:** No deployment topology changes.
- **Environment:** Uses existing Perplexity configuration; may introduce optional translation toggles/settings later.
- **Data:** Adds translation data model/fields linked to each source article.

View File

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: System generates Tamil and Malayalam translations at article creation time
The system SHALL generate Tamil (`ta`) and Malayalam (`ml`) translations for each newly created article during ingestion.
#### Scenario: Translation generation for new article
- **WHEN** a new source article is accepted for storage
- **THEN** the system requests Tamil and Malayalam translations for headline and summary
- **AND** translation generation occurs in the same ingestion flow for that article
#### Scenario: Translation failure fallback
- **WHEN** translation generation fails for one or both target languages
- **THEN** the system stores the base article in English
- **AND** marks missing translations as unavailable without failing the whole ingestion cycle
### Requirement: System stores translation variants linked to the same article
The system SHALL persist language-specific translated content as translation items associated with the base article.
#### Scenario: Persist linked translations
- **WHEN** Tamil and Malayalam translations are generated successfully
- **THEN** the system stores them as language-specific content variants linked to the base article identifier
- **AND** translation records remain queryable by language code
#### Scenario: No duplicate translation variants per language
- **WHEN** translation storage is attempted for an article-language pair that already exists
- **THEN** the system avoids creating duplicate translation items for the same language
- **AND** preserves one authoritative translation variant per article per language in this phase

View File

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: API supports language-aware content retrieval
The system SHALL support language-aware content delivery for hero and feed reads using selected language input.
#### Scenario: Language-specific latest article response
- **WHEN** a client requests latest article data with a supported language selection
- **THEN** the system returns headline and summary in the selected language when available
- **AND** includes the corresponding base article metadata and media attribution
#### Scenario: Language-specific paginated feed response
- **WHEN** a client requests paginated feed data with a supported language selection
- **THEN** the system returns each feed item's headline and summary in the selected language when available
- **AND** preserves existing pagination behavior and ordering semantics
### Requirement: Language fallback to English is deterministic
The system SHALL return English source content when the requested translation is unavailable.
#### Scenario: Missing translation fallback
- **WHEN** a client requests Tamil or Malayalam content for an article lacking that translation
- **THEN** the system returns the English headline and summary for that article
- **AND** response shape remains consistent with language-aware responses
#### Scenario: Unsupported language handling
- **WHEN** a client requests a language outside supported values (`en`, `ta`, `ml`)
- **THEN** the system applies the defined default language behavior for this phase
- **AND** avoids breaking existing consumers of news endpoints

View File

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Landing page provides language selector
The system SHALL display a language selector on the landing page that allows switching between English, Tamil, and Malayalam content views.
#### Scenario: User selects language from landing page
- **WHEN** a user chooses Tamil or Malayalam from the language selector
- **THEN** hero and feed content update to requested language-aware rendering
- **AND** subsequent API requests use the selected language context
#### Scenario: User switches back to English
- **WHEN** a user selects English in the language selector
- **THEN** content renders in English
- **AND** language state updates immediately in the frontend view
### Requirement: User language preference is persisted and restored
The system SHALL persist selected language preference in client-side storage and restore it for returning users.
#### Scenario: Persist language selection
- **WHEN** a user selects a supported language on the landing page
- **THEN** the selected language code is stored in local storage or a client cookie
- **AND** the persisted value is used as preferred language for future visits on the same browser
#### Scenario: Restore preference on return visit
- **WHEN** a returning user opens the landing page
- **THEN** the system reads persisted language preference from client storage
- **AND** initializes the UI and content requests with that language by default

View File

@@ -0,0 +1,40 @@
## 1. Translation Data Model and Persistence
- [x] 1.1 Add translation persistence model linked to base article with language code (`en`, `ta`, `ml`)
- [x] 1.2 Update database initialization/migration path to create translation storage structures
- [x] 1.3 Add repository operations to create/read translation variants by article and language
- [x] 1.4 Enforce no duplicate translation variant for the same article-language pair
## 2. Ingestion Pipeline Translation Generation
- [x] 2.1 Extend ingestion flow to trigger Tamil and Malayalam translation generation for each new article
- [x] 2.2 Reuse Perplexity integration for translation calls with language-specific prompts
- [x] 2.3 Persist generated translations as linked variants during the same ingestion cycle
- [x] 2.4 Implement graceful fallback when translation generation fails (store English base, continue cycle)
## 3. Language-Aware API Delivery
- [x] 3.1 Add language selection input handling to latest-news endpoint
- [x] 3.2 Add language selection input handling to paginated feed endpoint
- [x] 3.3 Return translated headline/summary when available and fall back to English when missing
- [x] 3.4 Define and implement behavior for unsupported language requests in this phase
## 4. Frontend Language Selector and Rendering
- [x] 4.1 Add landing-page language selector UI with English, Tamil, and Malayalam options
- [x] 4.2 Update hero data fetch/render flow to request and display selected language content
- [x] 4.3 Update feed pagination fetch/render flow to request and display selected language content
- [x] 4.4 Keep existing attribution/media rendering behavior intact across language switches
## 5. Preference Persistence and Returning User Behavior
- [x] 5.1 Persist user-selected language in localStorage with cookie fallback
- [x] 5.2 Restore persisted language on page load before initial content fetch
- [x] 5.3 Initialize selector state and API language requests from restored preference
## 6. Validation and Documentation
- [x] 6.1 Validate translation creation and retrieval for Tamil and Malayalam on new articles
- [x] 6.2 Validate fallback behavior for missing translation variants and unsupported language input
- [x] 6.3 Validate returning-user language persistence across browser sessions
- [x] 6.4 Update README with multilingual behavior, language selector usage, and persistence details

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12

View File

@@ -0,0 +1,95 @@
## Context
ClawFort currently stores and displays full headline/summary text from the ingestion pipeline and renders feed content directly in cards/hero. There is no dedicated concise summary format, modal reading experience, or summary-specific analytics lifecycle.
This change introduces a structured summary artifact per fetched article, with template-driven rendering and event instrumentation.
Constraints:
- Reuse existing Perplexity integration for generation.
- Keep source attribution visible and preserved.
- Prefer royalty-free image retrieval via MCP integration when available, with deterministic fallback path.
- Ensure modal interactions are fully tagged in Umami.
## Goals / Non-Goals
**Goals:**
- Generate and persist concise summary content at ingestion time.
- Persist and return template-compatible summary fields and image metadata.
- Present summary in a modal dialog with required visual structure.
- Track modal open/close/link-out analytics events consistently.
**Non-Goals:**
- Replacing the existing core feed API model end-to-end.
- Building a full long-form article reader.
- Introducing user-authored summary editing workflows.
- Supporting arbitrary analytics providers beyond current Umami hooks.
## Decisions
### Decision: Persist structured summary fields alongside article records
**Decision:** Store summary artifacts as explicit fields (TL;DR bullets, summary body, citation/source, summary image URL/credit) linked to each article.
**Rationale:**
- Enables deterministic API response shape for modal rendering.
- Keeps summary retrieval simple at read time.
- Avoids dynamic prompt regeneration during page interactions.
**Alternatives considered:**
- Generate summary on-demand at modal open: rejected due to latency and cost spikes.
- Store a single blob markdown string only: rejected due to weaker field-level control and analytics granularity.
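One possible shape for the persisted artifact, with field names as assumptions mapped to the template sections (image, TL;DR, summary, source/citation, attribution line):

```python
from dataclasses import dataclass, field

@dataclass
class SummaryArtifact:
    """Illustrative sketch of the structured summary fields; names are
    assumptions, not the confirmed schema."""
    article_id: int
    tldr_bullets: list[str] = field(default_factory=list)
    summary_body: str = ""
    source_url: str = ""
    citation: str = ""
    image_url: str = ""
    image_credit: str = ""

    def to_modal_payload(self) -> dict:
        # Maps fields to the modal template sections in display order.
        return {
            "image": {"url": self.image_url, "credit": self.image_credit},
            "tldr": self.tldr_bullets,
            "summary": self.summary_body,
            "source": {"url": self.source_url, "citation": self.citation},
            "powered_by": "Perplexity",
        }
```

Keeping the fields explicit means the API can emit a modal-ready payload without re-parsing any stored blob at read time.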
### Decision: Use Perplexity for summary generation with strict output schema
**Decision:** Prompt Perplexity to return machine-parseable JSON fields that map directly to the template sections.
**Rationale:**
- Existing Perplexity integration and operational familiarity.
- Structured output reduces frontend parsing fragility.
**Alternatives considered:**
- Free-form text generation then regex parsing: rejected as brittle.
### Decision: Prefer MCP royalty-free image sourcing, fallback to deterministic non-MCP source path
**Decision:** When MCP image retrieval integration is configured, use it first; otherwise use a configured royalty-free provider path and fallback placeholder.
**Rationale:**
- Satisfies preference for MCP leverage while preserving reliability.
- Maintains legal/licensing constraints and avoids blocked ingestion.
**Alternatives considered:**
- Hard dependency on MCP only: rejected due to availability/runtime coupling risk.
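The preference-ordered retrieval can be sketched as a provider chain; the provider callables below are placeholders for the real MCP and non-MCP integrations, which are not specified here:

```python
def fetch_image(query: str, providers: list, placeholder: dict) -> dict:
    """Try image providers in preference order (MCP first when configured).
    Any provider error or empty result moves to the next provider; when all
    fail, the configured placeholder metadata is returned so ingestion is
    never blocked on image sourcing."""
    for provider in providers:
        try:
            result = provider(query)
            if result:
                return result
        except Exception:
            continue
    return placeholder
```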
### Decision: Add modal-specific analytics event contract
**Decision:** Define and emit explicit Umami events for summary modal open, close, and source link-out clicks.
**Rationale:**
- Makes summary engagement measurable independently of feed interactions.
- Prevents implicit/ambiguous event interpretation.
**Alternatives considered:**
- Reusing existing generic card click events only: rejected due to insufficient modal-level observability.
## Risks / Trade-offs
- **[Risk] Summary generation adds ingest latency** -> Mitigation: bounded retries and skip/fallback behavior.
- **[Risk] Provider output schema drift breaks parser** -> Mitigation: strict validation + fallback summary text behavior.
- **[Risk] Royalty-free image selection may be semantically weak** -> Mitigation: relevance prompt constraints and placeholder fallback.
- **[Trade-off] Additional stored fields increase row size** -> Mitigation: concise field limits and optional archival policy alignment.
- **[Risk] Event overcount from repeated modal toggles** -> Mitigation: standardize open/close trigger boundaries and dedupe rules in frontend logic.
## Migration Plan
1. Add summary/image metadata fields or related model for persisted summary artifacts.
2. Extend ingestion flow to generate structured summary + citation via Perplexity.
3. Integrate royalty-free image retrieval with MCP-preferred flow and fallback.
4. Extend API payloads to return summary-modal-ready data.
5. Implement frontend modal rendering with exact template and analytics tags.
6. Validate event tagging correctness and rendering fallback behavior.
Rollback:
- Disable modal entrypoint and return existing feed behavior while retaining stored summary data.
## Open Questions
- Should TL;DR bullet count be fixed (for example 3) or provider-adaptive within a bounded range?
- Should summary modal open be card-click only or have an explicit "Read Summary" CTA in each card?
- Which royalty-free provider is preferred default when MCP is unavailable?

View File

@@ -0,0 +1,37 @@
## Why
Users need a quick, concise view of fetched news without reading long article text. Adding a TL;DR summary flow improves scan speed, reading experience, and engagement while keeping source transparency.
## What Changes
- **New Capabilities:**
- Generate concise article summaries via Perplexity when news is fetched.
- Store structured summary content in the database using the required display template.
- Fetch and attach an appropriate royalty-free image for each summarized article.
- Render summary content in a modal dialog from the landing page.
- Add Umami event tagging for modal opens, closes, and source link-outs.
- **Backend:**
- Extend ingestion to call Perplexity summary generation and persist summary output.
- Integrate royalty-free image sourcing (prefer MCP path when available).
- **Frontend:**
- Add summary modal UI and interaction flow.
- Add event tracking for all required user actions in the modal.
## Capabilities
### New Capabilities
- `article-tldr-summary`: Generate and persist concise TL;DR + summary content per fetched article using Perplexity.
- `summary-modal-experience`: Display summary content in a modal dialog using the standardized template format.
- `royalty-free-image-enrichment`: Attach appropriate royalty-free images for summarized articles, leveraging MCP integration when available.
- `summary-analytics-tagging`: Track summary modal opens, closes, and external source link-outs via Umami event tags.
### Modified Capabilities
- None.
## Impact
- **Code:** Ingestion pipeline, storage model/schema, API response payloads, and frontend modal interactions will be updated.
- **APIs:** Existing news payloads will include summary/template-ready fields and image metadata.
- **Dependencies:** Reuses Perplexity; may add/enable royalty-free image provider integration (including MCP route if available).
- **Infrastructure:** No major topology change expected.
- **Data:** Database will store summary artifacts and associated image/source metadata.

View File

@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: System generates structured TL;DR summary for each fetched article
The system SHALL generate a concise summary artifact for each newly fetched article using Perplexity during ingestion.
#### Scenario: Successful summary generation
- **WHEN** a new article is accepted in ingestion
- **THEN** the system generates TL;DR bullet points and a concise summary body
- **AND** output is persisted in a structured, template-compatible format
#### Scenario: Summary generation fallback
- **WHEN** summary generation fails for an article
- **THEN** ingestion continues without failing the entire cycle
- **AND** the article remains available with existing non-summary content
### Requirement: Summary storage includes citation and source context
The system SHALL persist source/citation information needed to render summary provenance.
#### Scenario: Persist source and citation metadata
- **WHEN** summary content is stored
- **THEN** associated source/citation fields are stored with the article summary artifact
- **AND** response payloads can render a "Source and Citation" section

View File

@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: System enriches summaries with appropriate royalty-free images
The system SHALL attach an appropriate royalty-free image to each summarized article.
#### Scenario: Successful royalty-free image retrieval
- **WHEN** summary generation succeeds for an article
- **THEN** the system retrieves an appropriate royalty-free image for that article context
- **AND** stores image URL and attribution metadata for rendering
#### Scenario: MCP-preferred retrieval path
- **WHEN** MCP image integration is available in runtime
- **THEN** the system uses MCP-based retrieval as the preferred image sourcing path
- **AND** falls back to configured non-MCP royalty-free source path when MCP retrieval fails or is unavailable
### Requirement: Image retrieval failures do not block article availability
The system SHALL remain resilient when image sourcing fails.
#### Scenario: Image fallback behavior
- **WHEN** no suitable royalty-free image can be retrieved
- **THEN** the article summary remains available for modal display
- **AND** UI uses configured placeholder/fallback image behavior

View File

@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: Modal interactions are tagged for analytics
The system SHALL emit Umami analytics events for summary modal open and close actions.
#### Scenario: Modal open event tagging
- **WHEN** a user opens the summary modal
- **THEN** the system emits a modal-open Umami event
- **AND** event payload includes article context identifier
#### Scenario: Modal close event tagging
- **WHEN** a user closes the summary modal
- **THEN** the system emits a modal-close Umami event
- **AND** event payload includes article context identifier when available
### Requirement: Source link-out interactions are tagged for analytics
The system SHALL emit Umami analytics events for source/citation link-outs from summary modal.
#### Scenario: Source link-out event tagging
- **WHEN** a user clicks source/citation link in summary modal
- **THEN** the system emits a link-out Umami event before or at navigation trigger
- **AND** event includes source URL or source identifier metadata

View File

@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: Summary is rendered in a modal dialog using standard template
The system SHALL render article summary content in a modal dialog using the required structure.
#### Scenario: Open summary modal
- **WHEN** a user triggers summary view for an article
- **THEN** a modal dialog opens and displays content in this order: relevant image, TL;DR bullets, summary body, source and citation, and "Powered by Perplexity"
- **AND** modal content corresponds to the selected article
#### Scenario: Close summary modal
- **WHEN** a user closes the modal via close control or backdrop interaction
- **THEN** the modal is dismissed cleanly
- **AND** user returns to previous feed context without page navigation
### Requirement: Modal content preserves source link-out behavior
The system SHALL provide source link-outs from the summary modal.
#### Scenario: Source link-out from modal
- **WHEN** a user clicks source/citation link in the modal
- **THEN** the original source opens in a new tab/window
- **AND** modal behavior remains stable for continued browsing

View File

@@ -0,0 +1,46 @@
## 1. Summary Data Model and Persistence
- [x] 1.1 Add persisted summary fields for TL;DR bullets, summary body, source/citation, and summary image metadata
- [x] 1.2 Update database initialization/migration path for summary-related storage changes
- [x] 1.3 Add repository read/write helpers for structured summary artifact fields
## 2. Ingestion-Time Summary Generation
- [x] 2.1 Extend ingestion flow to request structured TL;DR + summary output from Perplexity for each fetched article
- [x] 2.2 Implement strict parser/validator for summary output schema used by the frontend template
- [x] 2.3 Persist generated summary artifacts with article records during the same ingestion cycle
- [x] 2.4 Add graceful fallback behavior when summary generation fails without blocking article availability
## 3. Royalty-Free Image Enrichment
- [x] 3.1 Implement royalty-free image retrieval for summarized articles with relevance constraints
- [x] 3.2 Prefer MCP-based image retrieval path when available
- [x] 3.3 Implement deterministic non-MCP fallback image path and placeholder behavior
- [x] 3.4 Persist image attribution/licensing metadata required for compliant display
## 4. API Delivery for Summary Modal
- [x] 4.1 Extend API response payloads to include summary-modal-ready fields
- [x] 4.2 Ensure API payload contract maps directly to required template sections
- [x] 4.3 Preserve source and citation link-out data in API responses
## 5. Frontend Summary Modal Experience
- [x] 5.1 Implement summary modal dialog component in landing page flow
- [x] 5.2 Render modal using required order: image, TL;DR bullets, summary, source/citation, Powered by Perplexity
- [x] 5.3 Add article-level trigger to open summary modal from feed interactions
- [x] 5.4 Implement robust modal close behavior (close control + backdrop interaction)
## 6. Analytics Event Tagging
- [x] 6.1 Emit Umami event on summary modal open with article context
- [x] 6.2 Emit Umami event on summary modal close with article context when available
- [x] 6.3 Emit Umami event on source/citation link-out from modal with source metadata
- [x] 6.4 Validate event naming and payload consistency across repeated interactions
## 7. Validation and Documentation
- [x] 7.1 Validate end-to-end summary generation, persistence, and modal rendering for new articles
- [x] 7.2 Validate fallback behavior for summary/image retrieval failures
- [x] 7.3 Validate source/citation visibility and external link behavior in modal
- [x] 7.4 Update README with summary modal feature behavior and analytics event contract

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12

View File

@@ -0,0 +1,65 @@
## Context
ClawFort currently has a single-page news experience and no dedicated policy/disclaimer documents accessible from primary navigation. This creates ambiguity around authorship, verification, and acceptable use expectations.
This change introduces lightweight policy pages and footer navigation updates without changing core data flows or APIs.
## Goals / Non-Goals
**Goals:**
- Add visible footer links for Terms of Use and Attribution.
- Add dedicated pages with explicit non-ownership and AI-generation disclosures.
- Add clear risk language that content is unverified and users act at their own risk.
**Non-Goals:**
- Implementing full legal-policy versioning workflows.
- User-specific policy acceptance tracking.
- Backend auth/session changes.
## Decisions
### Decision: Serve policy pages as static frontend documents
**Decision:** Implement `terms.html` and `attribution.html` as static pages in the frontend directory.
**Rationale:**
- Lowest complexity for current architecture.
- Policy content is mostly static and does not require dynamic API data.
**Alternatives considered:**
- Backend-rendered templates: rejected due to unnecessary server complexity.
### Decision: Add persistent footer links on main page and policy pages
**Decision:** Footer includes links on landing page and reciprocal navigation from policy pages back to home.
**Rationale:**
- Improves discoverability and prevents navigation dead ends.
**Alternatives considered:**
- Header-only links: rejected due to crowded header and lower policy discoverability.
### Decision: Keep disclaimer wording explicit and prominent
**Decision:** Use direct language in page body and heading hierarchy emphasizing AI generation, non-ownership, and use-at-own-risk boundaries.
**Rationale:**
- Meets intent of legal disclosure and user expectation setting.
**Alternatives considered:**
- Compact single-line disclaimers: rejected as insufficiently clear.
## Risks / Trade-offs
- **[Risk] Disclaimer copy may still be interpreted differently by jurisdictions** -> Mitigation: keep language clear and easily editable in static pages.
- **[Trade-off] Static pages require redeploy for copy updates** -> Mitigation: isolate content in dedicated files for quick revision.
## Migration Plan
1. Add static policy pages under frontend.
2. Add footer links in the main page and cross-links in policy pages.
3. Validate page serving and navigation in local runtime.
Rollback:
- Remove policy pages and footer links; no data migration required.
## Open Questions
- Should policy pages include effective-date/version metadata in this phase?

View File

@@ -0,0 +1,26 @@
## Why
The site needs explicit legal/disclaimer pages so users understand content ownership boundaries and reliability limits. Adding these now reduces misuse risk and sets clear expectations for AI-generated, unverified information.
## What Changes
- Add two footer links: **Terms of Use** and **Attribution**.
- Create an **Attribution** page with clear disclosure that content is AI-generated and not authored/verified by the site owner.
- Create a **Terms of Use** page stating users must use the information at their own risk because it is not independently verified.
- Ensure footer links are visible and route correctly from the landing page.
## Capabilities
### New Capabilities
- `footer-policy-links`: Add footer navigation entries for Terms of Use and Attribution pages.
- `attribution-disclaimer-page`: Provide a dedicated attribution/disclaimer page with explicit AI-generation and non-ownership statements.
- `terms-of-use-risk-disclosure`: Provide a terms page that clearly states unverified information and user-at-own-risk usage.
### Modified Capabilities
- None.
## Impact
- **Frontend/UI:** Footer layout and navigation updated; two new legal/disclaimer pages added.
- **Routing/Serving:** Backend/static serving may need routes for new pages if not purely static-linked.
- **Content/Policy:** Adds formal disclaimer language for authorship, verification, and usage risk.

View File

@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Attribution page discloses AI generation and non-ownership
The system SHALL provide an Attribution page with explicit statements that content is AI-generated and not personally authored by the site owner.
#### Scenario: Attribution page title and disclosure content
- **WHEN** a user opens the Attribution page
- **THEN** the page title clearly indicates attribution/disclaimer purpose
- **AND** the body states that content is AI-generated and not personally authored by the site owner
#### Scenario: Attribution page includes non-involvement statement
- **WHEN** a user reads the Attribution page
- **THEN** the page explicitly states owner non-involvement in generated content claims
- **AND** wording is presented in the primary readable content area

View File

@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Footer exposes policy navigation links
The system SHALL display footer links for Terms of Use and Attribution on the landing page.
#### Scenario: Footer links visible on landing page
- **WHEN** a user loads the main page
- **THEN** the footer includes links labeled "Terms of Use" and "Attribution"
- **AND** links are visually distinguishable and keyboard focusable
#### Scenario: Footer links navigate correctly
- **WHEN** a user activates either policy link
- **THEN** the browser navigates to the corresponding policy page
- **AND** navigation succeeds without API dependency

View File

@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Terms page states unverified-content risk
The system SHALL provide a Terms of Use page that states information is unverified and use is at the user's own risk.
#### Scenario: Terms page risk statement visible
- **WHEN** a user opens the Terms of Use page
- **THEN** the page includes clear at-own-risk usage language
- **AND** the page states information is not independently verified
#### Scenario: Terms page references source uncertainty
- **WHEN** a user reads terms details
- **THEN** the page explains content is surfaced from external/AI-generated sources
- **AND** users are informed that responsibility for decisions based on the content remains their own


@@ -0,0 +1,24 @@
## 1. Footer Policy Navigation
- [x] 1.1 Add Terms of Use and Attribution links to primary footer
- [x] 1.2 Ensure policy links are keyboard focusable and readable on all breakpoints
## 2. Attribution Page Content
- [x] 2.1 Create attribution page with explicit AI-generated and non-ownership disclosure title/content
- [x] 2.2 Add statement clarifying owner non-involvement in generated content claims
## 3. Terms of Use Risk Disclosure
- [x] 3.1 Create Terms of Use page with unverified-content and use-at-own-risk statements
- [x] 3.2 Add language that users remain responsible for downstream use decisions
## 4. Routing and Page Serving
- [x] 4.1 Wire policy page routes/serving behavior from current frontend/backend structure
- [x] 4.2 Add return navigation between home and policy pages
## 5. Validation and Documentation
- [x] 5.1 Validate footer link navigation and policy page rendering on desktop/mobile
- [x] 5.2 Update README or docs with policy page locations and purpose


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12


@@ -0,0 +1,68 @@
## Context
The current UI defaults to dark presentation and lacks a global theme control. Users with different preferences and accessibility needs cannot choose light/system/high-contrast alternatives.
This change introduces an icon-based theme switcher and persistent client-side preference restoration.
## Goals / Non-Goals
**Goals:**
- Add a header theme switcher with system, light, dark, and high-contrast options.
- Apply theme choice globally with minimal visual regression.
- Persist preference in localStorage with cookie fallback.
- Restore returning-user choice; default to system when unset.
**Non-Goals:**
- Server-side profile theme persistence.
- Theme-specific content changes.
- Full design-system rewrite.
## Decisions
### Decision: Centralize theme state on `document.documentElement`
**Decision:** Set a root theme attribute/class and drive color tokens from CSS variables.
**Rationale:**
- Single source of truth for whole page styling.
- Works with existing Tailwind utility classes via custom CSS variable bridge.
**Alternatives considered:**
- Component-level theming flags: rejected due to drift and maintenance overhead.
### Decision: Keep system mode dynamic via `prefers-color-scheme`
**Decision:** For `system`, listen to media query changes and update resolved theme automatically.
**Rationale:**
- Matches user OS preference behavior.
**Alternatives considered:**
- One-time system snapshot: rejected as surprising for users changing OS theme at runtime.
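The two decisions above reduce to a small resolver. The following is a minimal sketch, not the implemented API: the function name `resolveTheme` and the `data-theme` attribute mentioned in the comments are illustrative assumptions.

```javascript
// Hypothetical sketch: resolve the effective theme from the saved mode.
// "system" defers to the OS preference (prefers-color-scheme); any
// unknown/missing value falls back to system behavior.
const THEMES = ["system", "light", "dark", "high-contrast"];

function resolveTheme(savedMode, systemPrefersDark) {
  const mode = THEMES.includes(savedMode) ? savedMode : "system";
  if (mode === "system") return systemPrefersDark ? "dark" : "light";
  return mode;
}

// In the browser, the resolved value would be applied at the root, e.g.:
//   document.documentElement.setAttribute("data-theme", resolveTheme(saved, mq.matches));
// and a matchMedia("(prefers-color-scheme: dark)") "change" listener would
// re-run this whenever the OS preference flips while mode === "system".
```

Keeping the resolver pure makes the media-query listener trivial: it only re-invokes the function with a fresh `systemPrefersDark` value.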
### Decision: Use icon-only options with accessible labels
**Decision:** Theme controls are icon buttons with ARIA labels and visible selected state.
**Rationale:**
- Meets UX requirement while preserving accessibility.
**Alternatives considered:**
- Text dropdown: rejected due to explicit icon requirement.
## Risks / Trade-offs
- **[Risk] Existing hardcoded color classes may not adapt perfectly** -> Mitigation: prioritize core surfaces/text and progressively map remaining variants.
- **[Risk] High-contrast mode may expose layout artifacts** -> Mitigation: audit focus outlines, borders, and semantic contrast first.
- **[Trade-off] Additional JS for persistence and media listeners** -> Mitigation: keep logic modular and lightweight.
## Migration Plan
1. Add theme tokens and root theme resolver.
2. Implement icon switcher in header and state persistence.
3. Wire system preference listener and fallback behavior.
4. Validate across refresh/returning sessions and responsive breakpoints.
Rollback:
- Remove theme switcher and resolver, revert to existing dark-default classes.
## Open Questions
- Should high-contrast mode align with OS `prefers-contrast` in a later phase?


@@ -0,0 +1,29 @@
## Why
The current UI is locked to dark presentation, which does not match all user preferences or accessibility needs. Adding multi-theme support now improves usability and lets returning users keep a consistent visual experience.
## What Changes
- Add a theme switcher in the top-right header area.
- Support four theme modes: **system**, **light**, **dark**, and **high-contrast**.
- Render theme options as icons (not text-only controls).
- Persist selected theme in client storage with **localStorage as primary** and **cookie fallback**.
- Restore persisted theme for returning users.
- Use **system** as default when no prior selection exists.
## Capabilities
### New Capabilities
- `theme-switcher-control`: Provide an icon-based theme switcher in the header with system/light/dark/high-contrast options.
- `theme-preference-persistence`: Persist and restore user-selected theme using localStorage first, with cookie fallback.
- `theme-default-system`: Apply system theme automatically when no saved preference exists.
### Modified Capabilities
- None.
## Impact
- **Frontend/UI:** Header controls and global styling system updated for four theme modes.
- **State management:** Client-side preference state handling added for theme selection and restoration.
- **Accessibility:** High-contrast option improves readability for users needing stronger contrast.
- **APIs/Backend:** No required backend API changes expected.


@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: System theme is default when no preference exists
The system SHALL default to system theme behavior if no persisted theme preference is found.
#### Scenario: No saved preference on first visit
- **WHEN** a user visits the site with no stored theme value
- **THEN** the UI resolves theme from system color-scheme preference
- **AND** switcher indicates system mode as active
#### Scenario: Persisted preference overrides system default
- **WHEN** a user has an existing stored theme preference
- **THEN** stored preference is applied instead of system mode
- **AND** user-selected theme remains stable across reloads


@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Theme choice persists across sessions
The system SHALL persist user-selected theme with localStorage as primary storage and cookie fallback when localStorage is unavailable.
#### Scenario: Persist theme in localStorage
- **WHEN** localStorage is available and user selects a theme
- **THEN** selected theme is saved in localStorage
- **AND** stored value is used on next visit in same browser
#### Scenario: Cookie fallback persistence
- **WHEN** localStorage is unavailable or blocked
- **THEN** selected theme is saved in cookie storage
- **AND** cookie value is used to restore theme on return visit
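The two scenarios above amount to a storage chain that can be sketched as follows. Function names, the `theme` key, and the cookie format are assumptions for illustration; storage backends are passed in so the chain stays testable.

```javascript
// Hypothetical sketch of the persistence chain: localStorage first,
// cookie fallback. Storage access is wrapped in try/catch because
// localStorage can throw when blocked (e.g. private browsing).
const KEY = "theme";

function saveTheme(mode, store, cookieJar) {
  try {
    store.setItem(KEY, mode);
    return "localStorage";
  } catch (_) {
    // cookie fallback; max-age value is illustrative (one year)
    cookieJar.value = `${KEY}=${mode}; path=/; max-age=31536000`;
    return "cookie";
  }
}

function loadTheme(store, cookieJar) {
  try {
    const v = store.getItem(KEY);
    if (v) return v;
  } catch (_) { /* fall through to cookie */ }
  // regex mirrors KEY ("theme"); returns null when nothing is stored
  const m = /(?:^|;\s*)theme=([^;]+)/.exec(cookieJar.value || "");
  return m ? m[1] : null;
}
```

In the page, `store` would be `window.localStorage` and `cookieJar` an adapter over `document.cookie`; a `null` result triggers the system-default behavior specified above.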


@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Header provides icon-based theme switcher
The system SHALL display a theme switcher in the top-right header area with icon controls for system, light, dark, and high-contrast modes.
#### Scenario: Theme options visible as icons
- **WHEN** a user views the header
- **THEN** all four theme options are represented by distinct icons
- **AND** each option is keyboard-accessible and screen-reader labeled
#### Scenario: Theme selection applies immediately
- **WHEN** a user selects a theme option
- **THEN** the page updates visual theme without full page reload
- **AND** selected option has a visible active state


@@ -0,0 +1,26 @@
## 1. Theme Foundation
- [x] 1.1 Define root-level theme state model for system, light, dark, and high-contrast
- [x] 1.2 Add CSS token/variable mapping so all theme modes can be resolved consistently
## 2. Theme Switcher UI
- [x] 2.1 Add icon-based theme switcher control to top-right header area
- [x] 2.2 Provide accessible labels and active-state indication for each icon option
## 3. Theme Preference Persistence
- [x] 3.1 Persist selected theme in localStorage when available
- [x] 3.2 Implement cookie fallback persistence when localStorage is unavailable
- [x] 3.3 Restore persisted preference for returning users
## 4. System Default Behavior
- [x] 4.1 Apply system mode when no persisted preference exists
- [x] 4.2 Ensure saved user preference overrides system default on subsequent visits
## 5. Validation and Documentation
- [x] 5.1 Validate theme switching and persistence across refreshes and browser restarts
- [x] 5.2 Validate icon controls with keyboard navigation and screen reader labels
- [x] 5.3 Update README/docs with theme options and persistence behavior


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12


@@ -0,0 +1,71 @@
## Context
The current one-page application has strong feature velocity but uneven robustness across viewport sizes and accessibility states. It also initializes analytics without an explicit user consent gate, which creates compliance and trust risks in stricter jurisdictions.
This change introduces responsive hardening, WCAG 2.2 AA baseline conformance, and explicit cookie-consent-controlled tracking behavior.
## Goals / Non-Goals
**Goals:**
- Ensure key UI flows work consistently across mobile, tablet, and desktop.
- Bring critical interactions and content presentation to WCAG 2.2 AA expectations.
- Add consent UI and persist consent state in cookies (with local state sync).
- Gate analytics script/event execution behind consent.
**Non-Goals:**
- Full legal-policy framework beyond consent capture basics.
- Rebuilding the full visual system from scratch.
- Country-specific geo-personalized consent variants in this phase.
## Decisions
### Decision: Define responsive guarantees around existing breakpoints and interaction surfaces
**Decision:** Formalize requirements for hero, feed cards, modal/dialog interactions, and header controls across common breakpoints and orientations.
**Rationale:**
- Targets user-visible breakage points first.
- Reduces regression risk while keeping implementation incremental.
**Alternatives considered:**
- Pixel-perfect per-device tailoring: rejected due to maintenance cost.
### Decision: Prioritize WCAG 2.2 AA for core paths and controls
**Decision:** Apply compliance to keyboard navigation, focus indicators, contrast, semantic labels, and non-text alternatives in primary user journeys.
**Rationale:**
- Maximizes accessibility impact where users spend time.
- Keeps scope realistic for immediate hardening.
**Alternatives considered:**
- Attempting broad AAA alignment: rejected as out-of-scope for this phase.
### Decision: Gate analytics on explicit consent and persist choice in cookie
**Decision:** Tracking initializes only after user consent; cookie stores consent state and optional local state mirrors for fast frontend read.
**Rationale:**
- Aligns with safer consent posture and user transparency.
- Supports returning-user behavior without backend session coupling.
**Alternatives considered:**
- Always-on tracking with notice-only banner: rejected for compliance risk.
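The gating decision above can be sketched as a small stateful wrapper. This is an illustrative sketch only: the `consent=accepted` cookie format and the factory name are assumptions, not the implemented contract.

```javascript
// Hypothetical consent gate: the analytics initializer runs at most once,
// and only after the cookie records acceptance.
function createConsentGate(initAnalytics) {
  let started = false;
  return {
    hasConsent(cookieString) {
      // cookie name/value are illustrative assumptions
      return /(?:^|;\s*)consent=accepted(?:;|$)/.test(cookieString || "");
    },
    apply(cookieString) {
      if (!started && this.hasConsent(cookieString)) {
        started = true;
        initAnalytics(); // tracking scripts/events start only here
      }
      return started;
    },
  };
}
```

In the page, `apply(document.cookie)` would run on load (restoring returning-user consent) and again when the banner's accept action writes the cookie.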
## Risks / Trade-offs
- **[Risk] Responsive fixes can introduce visual drift across existing sections** -> Mitigation: validate at target breakpoints and keep changes token-based.
- **[Risk] Accessibility remediations may require widespread class/markup changes** -> Mitigation: focus first on critical interactions and reuse shared utility patterns.
- **[Trade-off] Consent gating can reduce analytics volume** -> Mitigation: explicit consent messaging and friction-minimized accept flow.
## Migration Plan
1. Add responsive and accessibility acceptance criteria for key components.
2. Implement consent banner, persistence logic, and analytics gating.
3. Refine UI semantics/focus/contrast and test keyboard-only navigation.
4. Validate across viewport matrix and accessibility checklist.
Rollback:
- Disable consent gate logic and revert to prior analytics init path while retaining non-breaking responsive/accessibility improvements.
## Open Questions
- Should consent expiration/renewal interval be introduced in this phase or follow-up?
- Should consent state include analytics-only granularity now, or remain a single accepted state?


@@ -0,0 +1,27 @@
## Why
The product needs stronger technical quality and trust signals across devices and accessibility contexts. Improving responsiveness, WCAG 2.2 AA conformance, and compliant consent handling reduces usability risk and supports broader adoption.
## What Changes
- Make the site fully device-agnostic and responsive across mobile, tablet, and desktop breakpoints.
- Bring key user flows to WCAG 2.2 AA standards (contrast, focus visibility, keyboard navigation, semantics, and non-text content).
- Add a cookie consent banner with clear consent messaging and persistence.
- Record consent in browser cookies (with local state sync where applicable) and apply analytics only after consent is given.
## Capabilities
### New Capabilities
- `responsive-device-agnostic-layout`: Ensure core pages/components adapt reliably across viewport sizes and input modes.
- `wcag-2-2-aa-accessibility`: Enforce WCAG 2.2 AA requirements for interactive and content-rendering paths.
- `cookie-consent-tracking-gate`: Provide consent capture and persistence for analytics/tracking behavior.
### Modified Capabilities
- None.
## Impact
- **Frontend/UI:** Layout, spacing, typography, and interaction behavior updated for responsive and accessible presentation.
- **Accessibility:** ARIA semantics, keyboard focus flow, and contrast/focus treatments refined.
- **Analytics/Consent:** Consent banner and tracking gate logic added; cookie persistence introduced.
- **QA/Validation:** Accessibility and responsiveness verification scope expands (manual + automated checks where available).


@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Consent banner captures and persists tracking consent
The system SHALL display a cookie consent banner and persist user consent decision in cookies before enabling analytics tracking.
#### Scenario: Consent capture and persistence
- **WHEN** a user interacts with the consent banner and accepts
- **THEN** consent state is stored in a cookie
- **AND** stored consent is honored on subsequent visits
#### Scenario: Tracking gated by consent
- **WHEN** consent has not been granted
- **THEN** analytics/tracking scripts and events do not execute
- **AND** tracking begins only after consent state indicates acceptance


@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Core layout is device-agnostic and responsive
The system SHALL render key surfaces (header, hero, feed, modal, footer) responsively across mobile, tablet, and desktop viewports.
#### Scenario: Mobile layout behavior
- **WHEN** a user opens the site on a mobile viewport
- **THEN** content remains readable without horizontal overflow
- **AND** interactive controls remain reachable and usable
#### Scenario: Desktop and tablet adaptation
- **WHEN** a user opens the site on tablet or desktop viewports
- **THEN** layout reflows according to breakpoint design rules
- **AND** no key content or controls are clipped


@@ -0,0 +1,14 @@
## ADDED Requirements
### Requirement: Core user flows comply with WCAG 2.2 AA baseline
The system SHALL meet WCAG 2.2 AA accessibility requirements for primary interactions and content presentation.
#### Scenario: Keyboard-only interaction flow
- **WHEN** a keyboard-only user navigates the page
- **THEN** all primary interactive elements are reachable and operable
- **AND** visible focus indication is present at each step
#### Scenario: Contrast and non-text alternatives
- **WHEN** users consume text and non-text UI content
- **THEN** color contrast meets AA thresholds for relevant text and controls
- **AND** meaningful images and controls include accessible labels/alternatives


@@ -0,0 +1,28 @@
## 1. Responsive Hardening
- [x] 1.1 Audit and fix layout breakpoints for header, hero, feed cards, modal, and footer
- [x] 1.2 Ensure no horizontal overflow or clipped controls on supported viewport sizes
## 2. WCAG 2.2 AA Accessibility Baseline
- [x] 2.1 Implement/verify keyboard operability for primary controls and dialogs
- [x] 2.2 Add/verify visible focus indicators and semantic labels for interactive elements
- [x] 2.3 Improve contrast and non-text alternatives to meet AA expectations on core flows
## 3. Cookie Consent and Tracking Gate
- [x] 3.1 Implement consent banner UI with explicit analytics consent action
- [x] 3.2 Persist consent state in cookies and synchronize frontend state
- [x] 3.3 Gate analytics script/event initialization until consent is granted
## 4. Returning User Consent Behavior
- [x] 4.1 Restore prior consent state from cookie on returning visits
- [x] 4.2 Ensure tracking remains disabled without stored accepted consent
## 5. Verification and Documentation
- [x] 5.1 Validate responsive behavior on mobile/tablet/desktop matrices
- [x] 5.2 Run accessibility checks and manual keyboard-only walkthrough for critical journeys
- [x] 5.3 Validate consent gating and analytics behavior before/after acceptance
- [x] 5.4 Update README/docs with accessibility and consent behavior notes


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-13


@@ -0,0 +1,91 @@
## Context
ClawFort serves a static, client-rendered news experience backed by FastAPI endpoints and scheduled content refresh. The change introduces explicit technical requirements for crawlability, structured data quality, and delivery speed so SEO behavior is reliable across homepage, article cards, and static policy pages.
Current implementation already includes foundational metadata and partial performance behavior, but requirements are not yet codified in change specs. This design defines an implementation approach that keeps existing architecture (FastAPI + static frontend) while formalizing output guarantees required by search engines and validators.
## Goals / Non-Goals
**Goals:**
- Define a deterministic metadata contract for homepage and static pages (description, canonical, robots, Open Graph, Twitter card fields).
- Define structured-data output for homepage (`Newspaper`) and every rendered news item (`NewsArticle`) with stable required properties.
- Define response-delivery expectations for compression and cache policy plus front-end media loading behavior.
- Keep requirements implementable in the current stack without introducing heavyweight infrastructure.
**Non-Goals:**
- Full SSR migration or framework replacement.
- Introduction of external CDN, edge workers, or managed caching tiers.
- Reworking editorial/news-fetch business logic.
- Rich-result optimization for types outside this scope (e.g., FAQ, VideoObject, LiveBlogPosting).
## Decisions
### Decision: Keep JSON-LD generation in the existing page runtime contract
**Decision:** Define structured data as JSON-LD embedded in `index.html`, populated from the same article data model used by hero/feed rendering.
**Rationale:**
- Avoids duplication between UI content and structured data.
- Preserves current architecture and deployment flow.
- Supports homepage-wide `@graph` output containing `Newspaper` and multiple `NewsArticle` nodes.
**Alternatives considered:**
- Server-side rendered JSON-LD via template engine: rejected due to architectural drift and migration overhead.
- Microdata-only tagging: rejected because JSON-LD is simpler to maintain and validate for this use case.
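The `@graph` decision above can be sketched as a builder over the shared article model. Field mapping and the choice of `Organization` for publisher/author are illustrative assumptions; the design only requires that both contexts are present.

```javascript
// Hypothetical sketch of the homepage @graph payload: one Newspaper node
// plus one NewsArticle node per unique article URL (hero/feed overlap
// collapses to a single semantic entity).
function buildNewsGraph(site, articles) {
  const seen = new Set();
  const nodes = [{
    "@type": "Newspaper",
    name: site.name,
    url: site.url,
    inLanguage: site.lang,
  }];
  for (const a of articles) {
    if (seen.has(a.url)) continue; // dedupe repeated hero/feed references
    seen.add(a.url);
    nodes.push({
      "@type": "NewsArticle",
      headline: a.headline,
      description: a.description,
      image: a.image,
      datePublished: a.datePublished,
      dateModified: a.dateModified,
      mainEntityOfPage: a.url,
      inLanguage: site.lang,
      publisher: { "@type": "Organization", name: site.name },
      author: { "@type": "Organization", name: site.name },
    });
  }
  return { "@context": "https://schema.org", "@graph": nodes };
}
// The serialized result would be injected into a
// <script type="application/ld+json"> block in index.html.
```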
### Decision: Use standards-aligned required field baseline for `NewsArticle`
**Decision:** Require each `NewsArticle` node to include stable core fields: headline, description, image, datePublished, dateModified, url/mainEntityOfPage, inLanguage, publisher, and author.
**Rationale:**
- Produces predictable, testable output.
- Reduces schema validation regressions from partial payloads.
- Aligns with common crawler expectations for article entities.
**Alternatives considered:**
- Minimal schema with only headline/url: rejected due to weak semantic value and poorer validation confidence.
### Decision: Enforce lightweight HTTP performance controls in-app
**Decision:** Treat transport optimization as explicit requirements using in-app compression middleware and response cache headers by route class (static assets, APIs, HTML pages).
**Rationale:**
- High impact with minimal infrastructure changes.
- Testable directly in integration checks.
- Works in current deployment topology.
**Alternatives considered:**
- Delegate entirely to reverse proxy/CDN: rejected because this repository currently controls delivery behavior directly.
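The route-class policy can be made concrete as a small mapper. The header values below are assumptions consistent with the mitigation noted under Risks (short API max-age with stale-while-revalidate, longer static policy), not values mandated by this design; the backend would emit them via FastAPI response headers, and JavaScript is used here only for consistency with the other sketches.

```javascript
// Hypothetical route-class -> Cache-Control mapping (values illustrative).
function cachePolicy(routeClass) {
  switch (routeClass) {
    case "static": return "public, max-age=86400, immutable";              // long-lived assets
    case "api":    return "public, max-age=300, stale-while-revalidate=600"; // fresh-ish news
    case "html":   return "no-cache";                                       // revalidate pages
    default:       return "no-store";                                       // safe default
  }
}
```

Keeping the mapping in one place makes the "explicit and testable from response headers" scenario a simple header assertion per route class.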
### Decision: Standardize lazy media loading behavior with shimmer placeholders
**Decision:** Define lazy-loading requirements for non-critical images and require shimmer placeholder states until image load/error resolution.
**Rationale:**
- Improves perceived performance and consistency.
- Helps reduce layout instability when paired with explicit image dimensions.
- Fits existing UI loading pattern.
**Alternatives considered:**
- Skeleton-only page-level placeholders: rejected because item-level shimmer provides better visual continuity.
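The image decision above implies a per-image attribute contract, sketched here. The attribute set and the `shimmer` class name are assumptions for illustration.

```javascript
// Hypothetical sketch of the article-image attribute contract: lazy
// loading, async decoding, explicit dimensions, and a shimmer class
// that the load and error handlers both remove.
function articleImageAttrs(src, width, height) {
  return {
    src,
    width,                // explicit dimensions reduce layout shift
    height,
    loading: "lazy",      // defer non-critical images
    decoding: "async",    // keep decoding off the critical path
    class: "shimmer",     // placeholder state until load/error resolution
  };
}
// In the DOM, onload/onerror handlers would strip the "shimmer" class
// so the placeholder disappears on load or fallback completion.
```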
## Risks / Trade-offs
- **[Risk] Dynamic metadata timing for client-rendered content** -> Mitigation: require baseline static metadata defaults and deterministic runtime replacement after hero/article payload availability.
- **[Risk] Overly aggressive cache behavior could stale fresh news** -> Mitigation: short API max-age with stale-while-revalidate; separate longer static asset policy.
- **[Trade-off] Strict validation vs. framework directives in markup** -> Mitigation: define standards-compatible output goals and track exceptions where framework attributes are unavoidable.
- **[Trade-off] More metadata fields increase maintenance** -> Mitigation: centralize field mapping helpers and require parity with article model fields.
## Migration Plan
1. Implement and verify metadata/structured-data contracts on homepage and news-card rendering paths.
2. Add/verify response compression and route-level cache directives in backend delivery layer.
3. Align image loading UX requirements (lazy + shimmer + explicit dimensions) across hero/feed/modal contexts.
4. Validate output with schema and HTML validation tooling, then fix conformance gaps.
5. Document acceptance checks and rollback approach.
Rollback:
- Revert SEO/performance-specific frontend/backend changes to prior baseline while retaining unaffected feature behavior.
- Remove schema additions and route cache directives if they introduce regressions.
## Open Questions
- Should policy pages (`/terms`, `/attribution`) share a stricter noindex strategy or remain indexable by default?
- Should canonical URLs include hash anchors for in-page article cards or stay route-level canonical only?
- Do we require locale-specific `og:locale`/alternate tags in this phase or defer to a follow-up i18n SEO change?


@@ -0,0 +1,28 @@
## Why
ClawFort currently lacks a formal SEO and structured-data specification, which limits discoverability and consistency for search crawlers. Defining this now ensures the news experience is indexable, standards-oriented, and performance-focused as the content footprint grows.
## What Changes
- Add search-focused metadata requirements for the main page and policy pages (description, canonical, robots, social preview tags).
- Define structured data requirements so the home page is represented as `Newspaper` and each news item is represented as `NewsArticle`.
- Establish performance requirements for transport and caching behavior (HTTP compression and cache directives) plus front-end loading behavior.
- Define UX and rendering requirements for image lazy loading with shimmer placeholders and smooth scrolling.
- Require markup and interaction patterns that are compatible with strict standards validation goals.
## Capabilities
### New Capabilities
- `seo-meta-and-social-tags`: Standardize meta, canonical, robots, and social preview tags for key public pages.
- `news-structured-data`: Provide machine-readable `Newspaper` and `NewsArticle` structured data for homepage and article entries.
- `delivery-and-rendering-performance`: Define response compression/caching and client-side loading behavior for faster page delivery.
### Modified Capabilities
- None.
## Impact
- **Frontend/UI:** `frontend/index.html` and static policy pages gain SEO metadata, structured-data hooks, and loading-state behavior requirements.
- **Backend/API Delivery:** `backend/main.py` response middleware/headers are affected by compression and cache policy expectations.
- **Quality/Validation:** Standards conformance and SEO validation become explicit acceptance criteria for this change.
- **Operations:** Performance posture depends on HTTP behavior and deploy/runtime configuration alignment.


@@ -0,0 +1,35 @@
## ADDED Requirements
### Requirement: HTTP delivery applies compression and cache policy
The system SHALL apply transport-level compression and explicit cache directives for static assets, API responses, and public HTML routes.
#### Scenario: Compressed responses are available for eligible payloads
- **WHEN** a client requests compressible content that exceeds the compression threshold
- **THEN** the response is served with gzip compression
- **AND** response headers advertise the selected content encoding
#### Scenario: Route classes receive deterministic cache-control directives
- **WHEN** clients request static assets, API responses, or HTML page routes
- **THEN** each route class returns a cache policy aligned to its freshness requirements
- **AND** cache directives are explicit and testable from response headers
### Requirement: Media rendering optimizes perceived loading performance
The system SHALL lazy-load non-critical images and render shimmer placeholders until image load completion or fallback resolution.
#### Scenario: Feed and modal images lazy-load with placeholders
- **WHEN** feed or modal images have not completed loading
- **THEN** a shimmer placeholder is visible for the pending image region
- **AND** the placeholder is removed after load or fallback error handling completes
#### Scenario: Image rendering reduces layout shift risk
- **WHEN** article images are rendered in hero, feed, or modal contexts
- **THEN** image elements include explicit dimensions and async decoding hints
- **AND** layout remains stable while content loads
### Requirement: Smooth scrolling behavior is consistently enabled
The system SHALL provide smooth scrolling behavior for in-page navigation and user-initiated scroll interactions.
#### Scenario: In-page navigation uses smooth scrolling
- **WHEN** users navigate to in-page anchors or equivalent interactions
- **THEN** scrolling transitions occur smoothly rather than jumping abruptly
- **AND** behavior is consistent across supported breakpoints


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Homepage publishes Newspaper structured data
The system SHALL expose a valid JSON-LD entity of type `Newspaper` on the homepage.
#### Scenario: Newspaper entity is emitted on homepage
- **WHEN** the homepage HTML is rendered
- **THEN** a JSON-LD script block includes an entity with `@type` set to `Newspaper`
- **AND** the entity includes stable publisher and site identity fields
#### Scenario: Newspaper entity remains language-aware
- **WHEN** homepage content is rendered in a selected language
- **THEN** the structured data includes language context for the active locale
- **AND** language output stays consistent with visible content language
### Requirement: Each rendered news item publishes NewsArticle structured data
The system SHALL expose a valid JSON-LD entity of type `NewsArticle` for each rendered news item in hero and feed contexts.
#### Scenario: NewsArticle entities include required semantic fields
- **WHEN** news items are present on the homepage
- **THEN** each `NewsArticle` entity includes headline, description, image, publication dates, and URL fields
- **AND** publisher and author context are present for each item
#### Scenario: Structured data avoids duplicate article entities
- **WHEN** article data appears across hero and feed sections
- **THEN** structured-data output deduplicates entities for the same article URL
- **AND** only one canonical semantic entry remains for each unique article


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Core SEO metadata is present on public pages
The system SHALL expose standards-compliant SEO metadata on the homepage and policy pages, including description, robots, canonical URL, and social preview metadata.
#### Scenario: Homepage metadata baseline exists
- **WHEN** a crawler or browser loads the homepage
- **THEN** the document includes `description`, `robots`, and canonical metadata
- **AND** Open Graph and Twitter card metadata fields are present with non-empty values
#### Scenario: Policy pages include indexable metadata
- **WHEN** a crawler loads `/terms` or `/attribution`
- **THEN** the page includes page-specific `title` and `description` metadata
- **AND** Open Graph and Twitter card metadata are present for link previews
### Requirement: Canonical and preview metadata remain deterministic
The system SHALL keep canonical and preview metadata deterministic for each route to avoid conflicting crawler signals.
#### Scenario: Canonical URL reflects active route
- **WHEN** metadata is rendered for a public route
- **THEN** exactly one canonical link is emitted for that route
- **AND** canonical metadata does not point to unrelated routes
#### Scenario: Social preview tags map to current page context
- **WHEN** the page metadata is generated or updated
- **THEN** `og:title`, `og:description`, and corresponding Twitter fields reflect the current page context
- **AND** preview image fields resolve to a valid absolute URL
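The determinism scenarios above are naturally expressed as a verification check. The shape of the `meta` argument and the field list are assumptions for illustration; the point is that each rule in the scenarios maps to one assertion.

```javascript
// Hypothetical metadata validator for a public route: exactly one
// canonical link, non-empty social preview fields, absolute image URL.
function validateMeta(meta) {
  const errors = [];
  if (meta.canonical.length !== 1) {
    errors.push("exactly one canonical link required");
  }
  for (const k of ["og:title", "og:description", "twitter:card"]) {
    if (!meta.tags[k]) errors.push(`missing ${k}`);
  }
  if (!/^https?:\/\//.test(meta.tags["og:image"] || "")) {
    errors.push("og:image must be an absolute URL");
  }
  return errors; // empty array means the route's metadata is conformant
}
```

A check like this can run in CI against each public route so canonical/preview regressions surface before crawlers see them.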


@@ -0,0 +1,34 @@
## 1. SEO Metadata and Social Tags
- [x] 1.1 Ensure homepage and policy pages expose required `title`, `description`, `robots`, and canonical metadata.
- [x] 1.2 Ensure Open Graph and Twitter metadata fields are present and mapped to current page context.
- [x] 1.3 Add verification checks for deterministic canonical URLs and valid absolute social image URLs.
## 2. Structured Data (Newspaper and NewsArticle)
- [x] 2.1 Implement and verify homepage `Newspaper` JSON-LD output with publisher/site identity fields.
- [x] 2.2 Implement and verify `NewsArticle` JSON-LD output for hero and feed items using required semantic fields.
- [x] 2.3 Add deduplication logic so repeated hero/feed references emit one semantic entity per article URL.
## 3. Delivery and Caching Performance
- [x] 3.1 Apply and validate gzip compression for eligible responses.
- [x] 3.2 Apply and validate explicit cache-control policies for static assets, APIs, and HTML routes.
- [x] 3.3 Verify route-level header behavior with repeatable checks and document expected header values.
## 4. Rendering Performance and UX
- [x] 4.1 Ensure non-critical images use lazy loading with explicit dimensions and async decoding hints.
- [x] 4.2 Ensure shimmer placeholders are visible until image load or fallback completion in feed and modal contexts.
- [x] 4.3 Ensure smooth scrolling behavior remains consistent for in-page navigation interactions.
## 5. Validation and Acceptance
- [x] 5.1 Validate structured data output for `Newspaper` and `NewsArticle` entities against schema expectations.
- [x] 5.2 Validate HTML/metadata output against project validation goals and resolve conformance gaps.
- [x] 5.3 Execute regression checks for homepage rendering, article card behavior, and policy page metadata.
## 6. Documentation
- [x] 6.1 Document SEO/structured-data contracts and performance header expectations in project docs.
- [x] 6.2 Document verification steps so future changes can re-run SEO and performance acceptance checks.


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-13


@@ -0,0 +1,144 @@
## Context
ClawFort enriches news articles with royalty-free images via `fetch_royalty_free_image()` in `backend/news_service.py` (lines 297-344). The current implementation follows a priority chain:
1. **MCP Endpoint** (if configured) — POST `{"query": headline}` to custom endpoint
2. **Wikimedia Commons** — Search API with headline, returns first result
3. **Picsum** (default) — Deterministic URL from MD5 hash, returns random unrelated images
The problem: Picsum is the effective default, producing visually random images with no relevance to article content. Wikimedia search quality is inconsistent. The MCP endpoint infrastructure exists but requires external service configuration.
Three MCP-based image services are available and verified:
- **Pixabay MCP** (`zym9863/pixabay-mcp`) — npm package, `search_pixabay_images` tool
- **Unsplash MCP** (`cevatkerim/unsplash-mcp`) — Python/FastMCP, `search` tool
- **Pexels MCP** (`garylab/pexels-mcp-server`) — PyPI package, `photos_search` tool
## Goals / Non-Goals
**Goals:**
- Integrate Pixabay, Unsplash, and Pexels as first-class image providers via direct API calls
- Implement configurable provider priority chain with automatic fallback
- Improve query construction for better image relevance (keyword extraction)
- Maintain backward compatibility with existing MCP endpoint and Wikimedia/Picsum fallbacks
- Handle provider-specific attribution requirements
**Non-Goals:**
- Replacing the existing MCP endpoint mechanism (it remains as override option)
- Caching image search results (can be added later)
- User-facing image selection UI (future enhancement)
- Video retrieval from Pixabay/Pexels (images only)
## Decisions
### Decision 1: Direct API Integration vs MCP Server Dependency
**Choice:** Direct HTTP API calls to Pixabay/Unsplash/Pexels APIs
**Alternatives Considered:**
- *MCP Server Sidecar:* Run MCP servers as separate processes, call via MCP protocol
- Rejected: Adds deployment complexity (multiple processes, stdio communication)
- *Existing MCP Endpoint Override:* Require users to run their own MCP bridge
- Rejected: High friction, existing behavior preserved as optional override
**Rationale:** The provider APIs are simple REST endpoints. Direct `httpx` calls match the existing Wikimedia pattern, require no additional infrastructure, and keep the codebase consistent.
### Decision 2: Provider Priority Chain Architecture
**Choice:** Ordered provider list with per-provider enable/disable via environment variables
```
ROYALTY_IMAGE_PROVIDERS=pixabay,unsplash,pexels,wikimedia,picsum
PIXABAY_API_KEY=xxx
UNSPLASH_ACCESS_KEY=xxx
PEXELS_API_KEY=xxx
```
**Alternatives Considered:**
- *Single Provider Selection:* `ROYALTY_IMAGE_PROVIDER=pixabay` (current pattern)
- Rejected: No fallback when provider fails or returns empty results
- *Hardcoded Priority:* Always try Pixabay → Unsplash → Pexels
- Rejected: Users may prefer different ordering or want to disable specific providers
**Rationale:** Flexible ordering lets users prioritize by image quality preference, rate limits, or API cost. Fallback chain ensures images are found even when primary provider fails.
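A minimal sketch of the chain described above. The function names and registry wiring are illustrative assumptions, not the shipped `backend/news_service.py` implementation:

```python
# Hypothetical sketch of the provider priority chain; names are illustrative.
import os

DEFAULT_CHAIN = "pixabay,unsplash,pexels,wikimedia,picsum"

def parse_provider_chain(raw=None):
    """Parse ROYALTY_IMAGE_PROVIDERS into an ordered, de-duplicated list."""
    if raw is None:
        raw = os.environ.get("ROYALTY_IMAGE_PROVIDERS", "")
    value = raw.strip() or DEFAULT_CHAIN
    chain = []
    for name in value.split(","):
        name = name.strip().lower()
        if name and name not in chain:
            chain.append(name)
    return chain

def fetch_with_fallback(query, registry, chain):
    """Try each provider in order; return the first (url, credit) hit."""
    for name in chain:
        fetch = registry.get(name)
        if fetch is None:   # unknown or unconfigured provider: skip silently
            continue
        url, credit = fetch(query)
        if url:             # first success wins; later providers are never called
            return url, credit
    return None, None
```

An unset or empty variable falls back to the default ordering, and duplicates in the user's list are collapsed, so misconfiguration degrades gracefully rather than failing.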
### Decision 3: Query Refinement Strategy
**Choice:** Remove stop words from the headline, then join the first 3-5 significant keywords with spaces
**Alternatives Considered:**
- *Full Headline:* Pass entire headline to image APIs
- Rejected: Long queries often return poor results; APIs work better with keywords
- *LLM Keyword Extraction:* Use Perplexity/OpenRouter to extract search terms
- Rejected: Adds latency and API cost for marginal improvement
- *NLP Library:* Use spaCy or NLTK for keyword extraction
- Rejected: Heavy dependency for simple task
**Rationale:** Simple regex-based extraction (remove common stop words, take first N significant words) is fast, dependency-free, and effective for image search.
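One way the regex-based extraction could look. The stop-word list here is a small illustrative subset, and the function name mirrors the task list rather than any confirmed implementation:

```python
# Illustrative keyword extraction; the real stop-word list may differ.
import re

STOP_WORDS = {
    "the", "a", "an", "of", "in", "on", "at", "to", "for", "from",
    "and", "or", "but", "with", "how", "why", "is", "are", "its",
}

def extract_image_keywords(headline, max_words=5):
    """Drop stop words, keep the first few significant tokens."""
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", headline or "")
    keywords = [t for t in tokens if t.lower() not in STOP_WORDS]
    if not keywords:
        return "news technology"  # generic fallback for degenerate headlines
    return " ".join(keywords[:max_words])
```

The degenerate cases (empty headline, all stop words) collapse to a generic query instead of an empty string, so downstream provider calls never receive an empty `q` parameter.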
### Decision 4: Attribution Handling
**Choice:** Provider-specific credit format stored in `summary_image_credit`
| Provider | Credit Format |
|----------|---------------|
| Pixabay | `"Photo by {user} on Pixabay"` |
| Unsplash | `"Photo by {user} on Unsplash"` (required by TOS) |
| Pexels | `"Photo by {photographer} on Pexels"` |
| Wikimedia | `"Wikimedia Commons"` (existing) |
| Picsum | `"Picsum Photos"` (existing) |
**Rationale:** Each provider has specific attribution requirements. Unsplash TOS requires photographer credit. Format is consistent and human-readable.
## Risks / Trade-offs
| Risk | Mitigation |
|------|------------|
| API rate limits exhausted | Fallback chain continues to next provider; Picsum always succeeds as final fallback |
| API keys exposed in logs | Never log API keys; use `***` masking if debugging request URLs |
| Provider API changes | Isolate each provider in separate function; easy to update one without affecting others |
| Slow image search adds latency | Existing 15s timeout per provider; if the three new providers all time out, the chain alone could add 45s or more before falling through |
| Empty search results | Fallback to next provider; if all fail, use existing article image or placeholder |
## Provider API Contracts
### Pixabay API
- **Endpoint:** `https://pixabay.com/api/`
- **Auth:** Query param `key={PIXABAY_API_KEY}`
- **Search:** `GET /?key={key}&q={query}&image_type=photo&per_page=3&safesearch=true`
- **Response:** `{"hits": [{"webformatURL": "...", "user": "..."}]}`
### Unsplash API
- **Endpoint:** `https://api.unsplash.com/search/photos`
- **Auth:** Header `Authorization: Client-ID {UNSPLASH_ACCESS_KEY}`
- **Search:** `GET ?query={query}&per_page=3`
- **Response:** `{"results": [{"urls": {"regular": "..."}, "user": {"name": "..."}}]}`
### Pexels API
- **Endpoint:** `https://api.pexels.com/v1/search`
- **Auth:** Header `Authorization: {PEXELS_API_KEY}`
- **Search:** `GET ?query={query}&per_page=3`
- **Response:** `{"photos": [{"src": {"large": "..."}, "photographer": "..."}]}`
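Given those response shapes, the parsing step for each provider reduces to a few lines. A sketch of pure parsing only, with the HTTP call, timeouts, and error handling left out:

```python
# Illustrative parsing of each provider's documented JSON response shape.

def parse_pixabay(data):
    hits = data.get("hits") or []
    if not hits:
        return None, None
    return hits[0].get("webformatURL"), f"Photo by {hits[0].get('user')} on Pixabay"

def parse_unsplash(data):
    results = data.get("results") or []
    if not results:
        return None, None
    first = results[0]
    url = first.get("urls", {}).get("regular")
    return url, f"Photo by {first.get('user', {}).get('name')} on Unsplash"

def parse_pexels(data):
    photos = data.get("photos") or []
    if not photos:
        return None, None
    url = photos[0].get("src", {}).get("large")
    return url, f"Photo by {photos[0].get('photographer')} on Pexels"
```

Keeping parsing separate from transport also makes the "malformed response" error path easy to unit-test with plain dicts.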
## Migration Plan
1. **Phase 1:** Add new provider functions (non-breaking)
- Add `fetch_pixabay_image()`, `fetch_unsplash_image()`, `fetch_pexels_image()`
- Add new config variables with empty defaults
- Existing behavior unchanged
2. **Phase 2:** Integrate provider chain
- Modify `fetch_royalty_free_image()` to iterate configured providers
- Preserve existing MCP endpoint override as highest priority
- Preserve Wikimedia and Picsum as fallback options
3. **Phase 3:** Documentation and rollout
- Update README with new environment variables
- Update `.env.example` with provider key placeholders
**Rollback:** Set `ROYALTY_IMAGE_PROVIDERS=picsum` to restore original behavior.
## Open Questions
- Should we add per-provider retry logic (currently a single attempt each)?
- Should provider timeouts be configurable, or remain fixed at 15 seconds?
- Should we cache search results by query hash to reduce API calls?

View File

@@ -0,0 +1,28 @@
## Why
ClawFort currently defaults to Picsum for article imagery, which returns random photos unrelated to article content. This undermines user trust and content quality. The existing MCP endpoint infrastructure supports pluggable image providers but lacks configuration for relevance-focused services. Connecting to curated royalty-free image APIs (Pixabay, Unsplash, Pexels) will deliver contextually relevant images that match article topics.
## What Changes
- Integrate MCP-based image providers (Pixabay, Unsplash, Pexels) that support keyword-based search for relevant royalty-free images.
- Implement a provider priority chain with automatic fallback when primary providers fail or return no results.
- Refine search query construction to improve image relevance (keyword extraction, query normalization).
- Add provider-specific attribution handling to comply with license requirements.
- Document configuration for each supported MCP image provider.
## Capabilities
### New Capabilities
- `mcp-image-provider-integration`: Configure and connect to MCP-based image services (Pixabay, Unsplash, Pexels) for keyword-driven image retrieval.
- `image-provider-fallback-chain`: Define provider priority and fallback behavior when primary sources fail or return empty results.
- `image-query-refinement`: Extract and normalize search keywords from article content to improve image relevance.
### Modified Capabilities
- `fetch_royalty_free_image`: Extend existing function to support multiple MCP providers with fallback logic and refined query handling.
## Impact
- **Backend/Image Service:** `backend/news_service.py` image retrieval logic gains multi-provider support and query refinement.
- **Configuration:** `backend/config.py` adds provider priority list and per-provider API key variables.
- **Documentation:** `README.md` environment variable table expands with new provider configuration options.
- **Operations:** Image quality and relevance improve without frontend changes; existing shimmer/lazy-load behavior remains intact.

View File

@@ -0,0 +1,59 @@
## ADDED Requirements
### Requirement: Configurable provider priority
The system SHALL support configuring image provider order via `ROYALTY_IMAGE_PROVIDERS` environment variable.
#### Scenario: Custom provider order
- **WHEN** `ROYALTY_IMAGE_PROVIDERS=unsplash,pexels,pixabay,wikimedia,picsum`
- **THEN** system tries providers in order: Unsplash → Pexels → Pixabay → Wikimedia → Picsum
#### Scenario: Default provider order
- **WHEN** `ROYALTY_IMAGE_PROVIDERS` is not set or empty
- **THEN** system uses default order: `pixabay,unsplash,pexels,wikimedia,picsum`
#### Scenario: Single provider configured
- **WHEN** `ROYALTY_IMAGE_PROVIDERS=pexels`
- **THEN** system only tries Pexels provider
- **AND** returns `(None, None)` if Pexels fails or is not configured
### Requirement: Sequential fallback execution
The system SHALL try providers sequentially until one returns a valid image.
#### Scenario: First provider succeeds
- **WHEN** provider chain is `pixabay,unsplash,pexels` AND Pixabay returns valid image
- **THEN** system returns Pixabay image immediately
- **AND** does NOT call Unsplash or Pexels APIs
#### Scenario: First provider fails, second succeeds
- **WHEN** provider chain is `pixabay,unsplash,pexels` AND Pixabay returns no results AND Unsplash returns valid image
- **THEN** system returns Unsplash image
- **AND** does NOT call Pexels API
#### Scenario: All providers fail
- **WHEN** all configured providers return `(None, None)`
- **THEN** system returns `(None, None)` as final result
- **AND** caller handles fallback to article image or placeholder
### Requirement: MCP endpoint override
The existing `ROYALTY_IMAGE_MCP_ENDPOINT` SHALL take priority over the provider chain when configured.
#### Scenario: MCP endpoint configured
- **WHEN** `ROYALTY_IMAGE_MCP_ENDPOINT` is set to valid URL
- **THEN** system tries MCP endpoint first before provider chain
- **AND** falls back to provider chain only if MCP endpoint fails
#### Scenario: MCP endpoint not configured
- **WHEN** `ROYALTY_IMAGE_MCP_ENDPOINT` is empty or unset
- **THEN** system skips MCP endpoint and proceeds directly to provider chain
### Requirement: Provider skip on missing credentials
Providers without required API keys SHALL be skipped silently.
#### Scenario: Skip unconfigured provider
- **WHEN** provider chain includes `pixabay` AND `PIXABAY_API_KEY` is not set
- **THEN** Pixabay is skipped without error
- **AND** chain continues to next provider
#### Scenario: All providers skipped
- **WHEN** no providers in chain have valid API keys configured
- **THEN** system falls back to Wikimedia (no key required) or Picsum (always available)
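The skip-on-missing-credentials behavior above can be a small filter over the configured chain. The env-var names follow the design doc; the helper name and mapping are assumptions:

```python
# Hypothetical credential filter; keyless providers always pass through.
import os

PROVIDER_KEY_ENV = {
    "pixabay": "PIXABAY_API_KEY",
    "unsplash": "UNSPLASH_ACCESS_KEY",
    "pexels": "PEXELS_API_KEY",
    # wikimedia and picsum need no key, so they have no entry here
}

def get_enabled_providers(chain, environ=None):
    """Keep providers whose required key is set; keyless ones always qualify."""
    environ = os.environ if environ is None else environ
    enabled = []
    for name in chain:
        env_var = PROVIDER_KEY_ENV.get(name)
        if env_var is None or environ.get(env_var):
            enabled.append(name)
    return enabled
```

Passing `environ` explicitly keeps the filter testable without mutating process environment variables.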

View File

@@ -0,0 +1,71 @@
## ADDED Requirements
### Requirement: Keyword extraction from headline
The system SHALL extract relevant keywords from article headlines for image search.
#### Scenario: Extract keywords from standard headline
- **WHEN** headline is "OpenAI Announces GPT-5 with Revolutionary Reasoning Capabilities"
- **THEN** extracted query is "OpenAI GPT-5 Revolutionary Reasoning"
- **AND** stop words like "Announces", "with", "Capabilities" are removed
#### Scenario: Handle short headline
- **WHEN** headline is "AI Breakthrough"
- **THEN** extracted query is "AI Breakthrough"
- **AND** no keywords are removed (headline too short)
#### Scenario: Handle headline with special characters
- **WHEN** headline is "Tesla's Self-Driving AI: 99.9% Accuracy Achieved!"
- **THEN** extracted query is "Tesla Self-Driving AI Accuracy"
- **AND** special characters like apostrophes, colons, and punctuation are normalized
### Requirement: Stop word removal
The system SHALL remove common English stop words from search queries.
#### Scenario: Remove articles and prepositions
- **WHEN** headline is "The Future of AI in the Healthcare Industry"
- **THEN** extracted query is "Future AI Healthcare Industry"
- **AND** "The", "of", "in", "the" are removed
#### Scenario: Preserve technical terms
- **WHEN** headline is "How Machine Learning Models Learn from Data"
- **THEN** extracted query is "Machine Learning Models Learn Data"
- **AND** technical terms "Machine", "Learning", "Models" are preserved
### Requirement: Query length limit
The system SHALL limit search query length to optimize API results.
#### Scenario: Truncate long query
- **WHEN** extracted keywords exceed 5 words
- **THEN** query is limited to first 5 most significant keywords
- **AND** remaining keywords are dropped
#### Scenario: Preserve short query
- **WHEN** extracted keywords are 5 words or fewer
- **THEN** all keywords are included in query
- **AND** no truncation occurs
### Requirement: URL-safe query encoding
The system SHALL URL-encode queries before sending to provider APIs.
#### Scenario: Encode spaces and special characters
- **WHEN** query is "AI Machine Learning"
- **THEN** encoded query is "AI+Machine+Learning" or "AI%20Machine%20Learning"
- **AND** query is safe for HTTP GET parameters
#### Scenario: Handle Unicode characters
- **WHEN** query contains Unicode like "AI für Deutschland"
- **THEN** Unicode characters are properly percent-encoded
- **AND** API request succeeds without encoding errors
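In Python both encoding scenarios fall out of the standard library; a quick stdlib-only check, independent of the service code:

```python
# urlencode percent-encodes UTF-8 bytes and turns spaces into '+'.
from urllib.parse import urlencode

params = urlencode({"query": "AI für Deutschland", "per_page": 3})
# -> "query=AI+f%C3%BCr+Deutschland&per_page=3"
```

Because `httpx` applies the same encoding when given a `params` dict, passing raw keyword strings through the client (rather than string-concatenating URLs) satisfies both scenarios without extra code.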
### Requirement: Empty query handling
The system SHALL handle edge cases where no keywords can be extracted.
#### Scenario: Headline with only stop words
- **WHEN** headline is "The and a or but"
- **THEN** system uses fallback query "news technology"
- **AND** image search proceeds with generic query
#### Scenario: Empty headline
- **WHEN** headline is empty string or whitespace only
- **THEN** system uses fallback query "news technology"
- **AND** image search proceeds with generic query

View File

@@ -0,0 +1,79 @@
## ADDED Requirements
### Requirement: Pixabay image retrieval
The system SHALL support retrieving images from Pixabay API when `PIXABAY_API_KEY` is configured.
#### Scenario: Successful Pixabay image search
- **WHEN** Pixabay is enabled in provider chain AND query is "artificial intelligence breakthrough"
- **THEN** system sends GET request to `https://pixabay.com/api/?key={key}&q=artificial+intelligence+breakthrough&image_type=photo&per_page=3&safesearch=true`
- **AND** returns first hit's `webformatURL` as image URL
- **AND** returns credit as `"Photo by {user} on Pixabay"`
#### Scenario: Pixabay returns no results
- **WHEN** Pixabay search returns empty `hits` array
- **THEN** system returns `(None, None)` for this provider
- **AND** fallback chain continues to next provider
#### Scenario: Pixabay API key not configured
- **WHEN** `PIXABAY_API_KEY` environment variable is empty or unset
- **THEN** Pixabay provider is skipped in the fallback chain
### Requirement: Unsplash image retrieval
The system SHALL support retrieving images from Unsplash API when `UNSPLASH_ACCESS_KEY` is configured.
#### Scenario: Successful Unsplash image search
- **WHEN** Unsplash is enabled in provider chain AND query is "machine learning robot"
- **THEN** system sends GET request to `https://api.unsplash.com/search/photos?query=machine+learning+robot&per_page=3` with header `Authorization: Client-ID {key}`
- **AND** returns first result's `urls.regular` as image URL
- **AND** returns credit as `"Photo by {user.name} on Unsplash"`
#### Scenario: Unsplash returns no results
- **WHEN** Unsplash search returns empty `results` array
- **THEN** system returns `(None, None)` for this provider
- **AND** fallback chain continues to next provider
#### Scenario: Unsplash API key not configured
- **WHEN** `UNSPLASH_ACCESS_KEY` environment variable is empty or unset
- **THEN** Unsplash provider is skipped in the fallback chain
### Requirement: Pexels image retrieval
The system SHALL support retrieving images from Pexels API when `PEXELS_API_KEY` is configured.
#### Scenario: Successful Pexels image search
- **WHEN** Pexels is enabled in provider chain AND query is "tech startup office"
- **THEN** system sends GET request to `https://api.pexels.com/v1/search?query=tech+startup+office&per_page=3` with header `Authorization: {key}`
- **AND** returns first photo's `src.large` as image URL
- **AND** returns credit as `"Photo by {photographer} on Pexels"`
#### Scenario: Pexels returns no results
- **WHEN** Pexels search returns empty `photos` array
- **THEN** system returns `(None, None)` for this provider
- **AND** fallback chain continues to next provider
#### Scenario: Pexels API key not configured
- **WHEN** `PEXELS_API_KEY` environment variable is empty or unset
- **THEN** Pexels provider is skipped in the fallback chain
### Requirement: Provider timeout handling
Each provider API call SHALL timeout after 15 seconds.
#### Scenario: Provider request timeout
- **WHEN** provider API does not respond within 15 seconds
- **THEN** system logs timeout warning
- **AND** returns `(None, None)` for this provider
- **AND** fallback chain continues to next provider
### Requirement: Provider error handling
The system SHALL gracefully handle provider API errors without crashing.
#### Scenario: Provider returns HTTP error
- **WHEN** provider API returns 4xx or 5xx status code
- **THEN** system logs error with status code
- **AND** returns `(None, None)` for this provider
- **AND** fallback chain continues to next provider
#### Scenario: Provider returns malformed response
- **WHEN** provider API returns invalid JSON or unexpected schema
- **THEN** system logs parsing error
- **AND** returns `(None, None)` for this provider
- **AND** fallback chain continues to next provider
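All three failure modes above (timeout, HTTP error, malformed response) can share a single guard. A sketch in which the wrapper name is an assumption:

```python
# Illustrative guard: any provider failure degrades to (None, None).
import logging

logger = logging.getLogger("clawfort.images")

def safe_provider_fetch(name, fetch, query):
    """Run one provider; log and swallow errors so the chain can continue."""
    try:
        return fetch(query)
    except Exception as exc:  # e.g. timeouts, HTTP errors, JSON decode errors
        logger.warning("image provider %s failed: %s", name, exc)
        return None, None
    return None, None
```

Catching at the wrapper keeps each `fetch_*_image()` function free to raise naturally, while the chain itself never crashes on a single bad provider.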

View File

@@ -0,0 +1,52 @@
## 1. Configuration Setup
- [x] 1.1 Add `ROYALTY_IMAGE_PROVIDERS` config variable to `backend/config.py` with default `pixabay,unsplash,pexels,wikimedia,picsum`
- [x] 1.2 Add `PIXABAY_API_KEY` config variable to `backend/config.py`
- [x] 1.3 Add `UNSPLASH_ACCESS_KEY` config variable to `backend/config.py`
- [x] 1.4 Add `PEXELS_API_KEY` config variable to `backend/config.py`
- [x] 1.5 Update `.env.example` with new provider API key placeholders
## 2. Query Refinement
- [x] 2.1 Create `extract_image_keywords(headline: str) -> str` function in `backend/news_service.py`
- [x] 2.2 Implement stop word removal (articles, prepositions, common verbs)
- [x] 2.3 Implement keyword limit (max 5 significant words)
- [x] 2.4 Handle edge cases: empty headline, only stop words, special characters
## 3. Provider Implementations
- [x] 3.1 Create `fetch_pixabay_image(query: str) -> tuple[str | None, str | None]` function
- [x] 3.2 Implement Pixabay API call with `webformatURL` extraction and `"Photo by {user} on Pixabay"` credit
- [x] 3.3 Create `fetch_unsplash_image(query: str) -> tuple[str | None, str | None]` function
- [x] 3.4 Implement Unsplash API call with `urls.regular` extraction and `"Photo by {user.name} on Unsplash"` credit
- [x] 3.5 Create `fetch_pexels_image(query: str) -> tuple[str | None, str | None]` function
- [x] 3.6 Implement Pexels API call with `src.large` extraction and `"Photo by {photographer} on Pexels"` credit
## 4. Provider Fallback Chain
- [x] 4.1 Create provider registry mapping provider names to fetch functions
- [x] 4.2 Parse `ROYALTY_IMAGE_PROVIDERS` into ordered list at startup
- [x] 4.3 Implement `get_enabled_providers()` that filters by configured API keys
- [x] 4.4 Modify `fetch_royalty_free_image()` to iterate provider chain with fallback
## 5. Integration
- [x] 5.1 Wire refined query extraction into `fetch_royalty_free_image()` call
- [x] 5.2 Preserve MCP endpoint as highest priority (existing behavior)
- [x] 5.3 Preserve Wikimedia and Picsum as fallback providers in chain
- [x] 5.4 Add error logging for each provider failure with provider name
## 6. Documentation
- [x] 6.1 Update README environment variables table with new provider keys
- [x] 6.2 Add provider configuration section to README explaining priority chain
- [x] 6.3 Document attribution requirements for each provider
## 7. Verification
- [x] 7.1 Test Pixabay provider with sample query (requires API key)
- [x] 7.2 Test Unsplash provider with sample query (requires API key)
- [x] 7.3 Test Pexels provider with sample query (requires API key)
- [x] 7.4 Test fallback chain when primary provider fails
- [x] 7.5 Test fallback to Picsum when no API keys configured
- [x] 7.6 Verify attribution format matches provider requirements

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-13

View File

@@ -0,0 +1,88 @@
## Context
`frontend/index.html` currently renders the hero CTA as an external redirect (`window.open(item.source_url)`), and modal sizing is constrained by `max-w-2xl` with `max-h-[90vh]`. TL;DR content does not expose a dedicated loading placeholder, and hero badges can lose readability over bright images. On the backend, `backend/news_service.py` already performs keyword extraction and provider fallback, but defaults remain too generic (`"news technology"`) and do not explicitly prioritize AI-topic fallback behavior.
## Goals / Non-Goals
**Goals:**
- Keep users on-site by making the hero primary CTA open the existing TL;DR modal flow.
- Ensure `LATEST` and relative timestamp remain legible over all hero images in light/dark themes.
- Increase modal usable area (width and near-full-height scrolling behavior) without breaking mobile usability.
- Add a small horizontal shimmer placeholder for TL;DR bullets while modal content initializes.
- Improve image relevance with stronger AI-focused keyword fallback and deterministic generic AI-image fallback when lookup fails.
**Non-Goals:**
- Rebuilding the full feed card architecture.
- Replacing existing provider integrations or adding new third-party image providers.
- Introducing backend sentiment ML models.
- Reworking scheduler or ingestion cadence.
## Decisions
### Decision: Reuse existing modal interaction path for hero CTA
**Decision:** Wire hero CTA to the same `openSummary(item)` behavior used by feed cards.
**Rationale:**
- Reuses existing event tracking and modal rendering logic.
- Avoids duplicate interaction models and reduces regression risk.
**Alternatives considered:**
- Add a second hero-only modal implementation: rejected due to duplicate UI state and maintenance cost.
### Decision: Enforce readability with layered overlay + contrast-safe tokens
**Decision:** Strengthen hero overlay and badge/text color tokens so metadata remains visible independent of image luminance.
**Rationale:**
- Solves visibility issues without image preprocessing.
- Keeps responsive behavior in CSS instead of JS image analysis.
**Alternatives considered:**
- Dynamic luminance detection per image: rejected as unnecessary complexity for current scope.
### Decision: Expand modal dimensions with responsive constraints
**Decision:** Use a wider desktop container (targeting at least half the viewport width) while preserving mobile full-width behavior and near-full-height scrolling.
**Rationale:**
- Improves readability for summary blocks and TL;DR bullets.
- Keeps accessibility of close controls and keyboard escape path.
**Alternatives considered:**
- Full-screen modal only: rejected due to excessive visual disruption on desktop.
### Decision: Treat TL;DR loading as explicit skeleton state
**Decision:** Add a low-height horizontal shimmer placeholder visible when TL;DR is not yet available.
**Rationale:**
- Reduces perceived latency ambiguity.
- Matches existing skeleton design language already used for images/cards.
### Decision: Improve fallback query semantics for AI-news image retrieval
**Decision:** Enhance keyword fallback to AI-focused defaults (`ai machine learning deep learning`) and add explicit generic AI image fallback contract.
**Rationale:**
- Reduces irrelevant imagery when topic extraction is weak or providers return noisy results.
- Keeps behavior deterministic and testable.
## Risks / Trade-offs
- **[Risk] Wider modal may crowd smaller laptops** -> Mitigation: use responsive width caps with mobile-first breakpoints and overflow handling.
- **[Risk] Hero overlay could darken images too much** -> Mitigation: tune gradient opacity and preserve theme-specific token overrides.
- **[Risk] Fallback image monotony if providers fail frequently** -> Mitigation: keep provider chain first; generic AI fallback only as terminal fallback.
- **[Trade-off] Stronger AI default keywords may reduce non-AI niche relevance** -> Mitigation: apply defaults only when extracted keywords are insufficient.
## Migration Plan
1. Update hero CTA and hero readability styles in `frontend/index.html`.
2. Update modal sizing and TL;DR shimmer loading state in `frontend/index.html`.
3. Update backend keyword fallback and generic AI image fallback behavior in `backend/news_service.py`.
4. Verify behavior manually on desktop/mobile and run relevant checks.
Rollback:
- Revert hero CTA to external link behavior.
- Revert modal class and shimmer additions.
- Revert keyword/default fallback updates in image pipeline.
## Open Questions
- Should generic AI fallback be local static asset only, or deterministic remote URL with local optimization?
- Do we need separate fallback keyword sets per language now, or keep English-focused defaults in this change?

View File

@@ -0,0 +1,27 @@
## Why
The homepage hero and summary modal still exhibit high-friction behavior and readability issues, and image relevance remains inconsistent for AI news topics. Fixing these now improves retention, trust, and content quality without changing the core product flow.
## What Changes
- Change hero primary CTA behavior to open the in-site TL;DR summary flow instead of immediately sending users off-site.
- Improve hero readability over images so `LATEST` and relative time metadata remain visible across themes and screen sizes.
- Increase modal width and adjust modal height behavior so long content can use near full-height viewport scrolling.
- Add a short horizontal shimmer placeholder for TL;DR bullet content while summary details are loading.
- Strengthen image relevance by extracting better keywords from news text, adding AI-topic default keywords, and using a generic AI fallback image when providers fail.
## Capabilities
### New Capabilities
- `hero-summary-entry-and-readability`: Define hero CTA in-site summary entry behavior and image-overlay readability requirements for badges, timestamps, headline, and summary text.
- `modal-layout-and-loading-feedback`: Define modal sizing/overflow behavior and TL;DR loading placeholders for clearer perceived loading state.
- `news-image-relevance-and-fallbacks`: Define keyword extraction quality, default AI keyword fallback rules, and generic AI image fallback behavior when no relevant image is found.
### Modified Capabilities
- None.
## Impact
- **Frontend/UI:** `frontend/index.html` (hero CTA wiring, hero overlay/readability styles, modal width/height classes, TL;DR loading skeleton state).
- **Backend/Image Pipeline:** `backend/news_service.py` (keyword extraction and provider fallback behavior for summary images).
- **Assets/Config:** generic AI fallback image path/asset contract and related docs may be updated.

View File

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Hero primary action opens in-site TL;DR summary
The homepage hero primary CTA SHALL open the in-site summary modal for the hero article instead of navigating off-site.
#### Scenario: Hero CTA opens summary modal
- **WHEN** a user clicks the hero primary CTA
- **THEN** the system opens the summary modal for the current hero article
- **AND** no external navigation is triggered by that CTA
### Requirement: Hero source link remains available as secondary action
The hero section SHALL keep an explicit secondary source-link action for external navigation.
#### Scenario: Source link navigates externally
- **WHEN** a user clicks the hero source link
- **THEN** the system opens the article source URL in a new tab
### Requirement: Hero metadata readability over images
Hero metadata (`LATEST`, relative time, headline, and summary) SHALL remain visually legible across bright and dark images on desktop and mobile.
#### Scenario: Bright image background
- **WHEN** the hero image contains bright regions under metadata text
- **THEN** overlay and text styles preserve readable contrast for metadata and headline blocks
#### Scenario: Mobile viewport readability
- **WHEN** the hero renders on a mobile viewport
- **THEN** metadata and title remain readable without overlapping controls or clipping

View File

@@ -0,0 +1,33 @@
## ADDED Requirements
### Requirement: Modal width supports comfortable desktop reading
The summary modal SHALL render with a desktop width that is approximately half of viewport width or larger when space allows, while remaining responsive on small screens.
#### Scenario: Desktop width expansion
- **WHEN** the modal opens on desktop viewport widths
- **THEN** the modal content area renders wider than the previous narrow baseline
- **AND** text blocks are readable without excessive line wrapping
#### Scenario: Mobile responsiveness
- **WHEN** the modal opens on small mobile viewport widths
- **THEN** modal width remains fully usable without horizontal overflow
### Requirement: Modal height supports near-full viewport scrolling
The summary modal SHALL use near full-height viewport behavior when content overflows.
#### Scenario: Overflowing summary content
- **WHEN** summary content exceeds modal viewport height
- **THEN** modal body remains scrollable with close controls accessible
- **AND** modal container uses near full viewport height constraints
### Requirement: TL;DR loading placeholder is explicit
The modal SHALL show a horizontal shimmer placeholder for TL;DR content while TL;DR bullets are not yet available.
#### Scenario: TL;DR pending state
- **WHEN** the summary modal is open and TL;DR bullet data is pending
- **THEN** the system displays a low-height horizontal shimmer placeholder
#### Scenario: TL;DR loaded state
- **WHEN** TL;DR bullet data becomes available
- **THEN** shimmer placeholder is removed
- **AND** TL;DR bullet list is rendered

View File

@@ -0,0 +1,23 @@
## ADDED Requirements
### Requirement: Image query fallback uses AI-focused defaults
When extracted image keywords are insufficient, the system SHALL use AI-focused default fallback terms.
#### Scenario: Empty keyword extraction
- **WHEN** keyword extraction yields no usable topic keywords
- **THEN** the system uses default fallback terms including AI-domain keywords (for example `ai`, `machine learning`, `deep learning`)
### Requirement: Generic AI image fallback is guaranteed
If provider lookups fail to return a usable summary image, the system SHALL use a generic AI-themed fallback image.
#### Scenario: Provider chain failure
- **WHEN** all configured image providers return no usable image
- **THEN** the system assigns a generic AI fallback image URL/path for summary image
### Requirement: Fallback behavior remains context-aware first
The system SHALL attempt context-aware keyword retrieval before any generic fallback image is selected.
#### Scenario: Context-aware attempt precedes fallback
- **WHEN** summary image selection runs for a news item
- **THEN** the system first attempts provider queries from extracted context-aware keywords
- **AND** only falls back to generic AI image if these attempts fail
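The three requirements above compose into a short selection routine. A sketch where the fallback query string follows the design doc, but the asset path and function names are hypothetical:

```python
# Hypothetical ordering: context keywords -> AI default query -> static asset.
AI_FALLBACK_QUERY = "ai machine learning deep learning"
GENERIC_AI_IMAGE = "/static/img/ai-fallback.jpg"  # hypothetical asset path

def choose_summary_image(keywords, fetch_chain):
    """Context-aware query first; generic AI image only as terminal fallback."""
    query = keywords.strip() if keywords else ""
    url, credit = fetch_chain(query or AI_FALLBACK_QUERY)
    if url:
        return url, credit
    return GENERIC_AI_IMAGE, None
```

Substituting the AI default only when extraction is empty keeps the context-aware path first, per the third requirement, while the static asset guarantees the second requirement's terminal fallback.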

View File

@@ -0,0 +1,29 @@
## 1. Hero UX Fixes
- [x] 1.1 Update hero primary CTA in `frontend/index.html` to open the in-site summary modal for the hero item.
- [x] 1.2 Keep hero source link as a secondary external action and preserve tracking events.
- [x] 1.3 Strengthen hero overlay and metadata styles so `LATEST` and relative time remain readable across image brightness levels.
## 2. Modal Layout Improvements
- [x] 2.1 Increase modal width for desktop while keeping mobile-safe responsive behavior.
- [x] 2.2 Update modal height/overflow behavior to support near full-height scrolling for long content.
- [x] 2.3 Verify close controls and keyboard escape behavior remain intact after sizing changes.
## 3. TL;DR Loading Feedback
- [x] 3.1 Add a dedicated horizontal shimmer placeholder for TL;DR content while modal summary data is initializing.
- [x] 3.2 Hide the TL;DR shimmer placeholder when TL;DR bullets are available and render the bullet list.
## 4. Image Relevance and Fallback
- [x] 4.1 Update keyword fallback logic in `backend/news_service.py` to use AI-focused default terms when extracted keywords are insufficient.
- [x] 4.2 Add explicit generic AI summary-image fallback behavior when provider chain returns no usable image.
- [x] 4.3 Ensure context-aware keyword/provider attempts always run before generic AI fallback selection.
## 5. Validation
- [x] 5.1 Verify hero CTA opens summary modal instead of navigating away.
- [x] 5.2 Verify modal sizing on desktop/mobile and long-content scrolling behavior.
- [x] 5.3 Verify TL;DR shimmer appears during pending state and disappears after load.
- [x] 5.4 Verify generic AI fallback image is used when provider chain fails.

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-13

View File

@@ -0,0 +1,85 @@
## Context
Current operations are concentrated in `backend/cli.py` with a single `force-fetch` command and no unified admin maintenance suite. Operational actions such as archive cleanup, translation regeneration, image refresh, and cache/news reset require manual code/DB operations. Existing backend services already contain reusable primitives: ingestion (`process_and_store_news`), archival helpers (`archive_old_news`, `delete_archived_news`), and translation generation pipelines in `backend/news_service.py`.
## Goals / Non-Goals
**Goals:**
- Introduce an admin command suite that consolidates common maintenance and recovery actions.
- Implement queued image refetch for latest 30 items, sequentially processed with exponential backoff.
- Improve image refresh relevance by combining keyword and mood/sentiment cues with deterministic fallback behavior.
- Provide safe destructive operations (`clear-news`, `clean-archive`, cache clear) with operator guardrails.
- Add translation regeneration and parameterized fetch count command to reduce manual intervention.
**Non-Goals:**
- Replacing the scheduled ingestion model.
- Introducing external queue infrastructure (RabbitMQ/Redis workers) for this phase.
- Redesigning storage models or adding new DB tables unless strictly necessary.
- Building a web-based admin dashboard in this change.
## Decisions
### Decision: Extend existing CLI with subcommands
**Decision:** Expand `backend/cli.py` into a multi-subcommand admin command suite.
**Rationale:**
- Reuses existing deployment/runtime assumptions.
- Keeps operations scriptable via terminal/cron and avoids UI scope expansion.
**Alternatives considered:**
- New standalone admin binary: rejected due to duplicated bootstrapping/runtime checks.
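A minimal `argparse` sketch of this decision follows. Command and flag names mirror the subcommands listed in this change, but the wiring is illustrative, not the final `backend/cli.py` surface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the multi-subcommand admin parser.
    parser = argparse.ArgumentParser(prog="clawfort-admin")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("force-fetch")             # existing command, kept intact
    sub.add_parser("refetch-images")
    sub.add_parser("regenerate-translations")
    sub.add_parser("rebuild-site")

    fetch = sub.add_parser("fetch")
    fetch.add_argument("--count", type=int, default=10,
                       help="number of articles to ingest")

    for name in ("clear-news", "clean-archive", "clear-cache"):
        destructive = sub.add_parser(name)
        destructive.add_argument("--confirm", action="store_true",
                                 help="required for destructive commands")
        destructive.add_argument("--dry-run", action="store_true")
    return parser
```

Keeping everything under one parser is what makes the suite scriptable from cron or a shell without duplicating bootstrapping.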
### Decision: Queue image refetch in-process with sequential workers
**Decision:** Build a bounded in-memory queue for latest 30 items and process one-by-one.
**Rationale:**
- Meets rate-limit resilience requirement without new infrastructure.
- Deterministic and easy to monitor in command output.
**Alternatives considered:**
- Parallel refetch workers: rejected due to higher provider throttling risk.
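The sequential bounded-queue behavior can be sketched in a few lines of Python. Item shape, handler contract, and summary keys are assumptions for illustration:

```python
from collections import deque
from typing import Callable, Iterable

def process_queue(items: Iterable[dict],
                  handler: Callable[[dict], bool],
                  limit: int = 30) -> dict:
    """Process at most `limit` items strictly one at a time.

    `handler` is an assumed callable returning True on success.
    """
    queue = deque(list(items)[:limit])  # newest-first ordering done upstream
    summary = {"processed": 0, "succeeded": 0, "failed": 0}
    while queue:
        item = queue.popleft()          # only one item in flight at a time
        ok = handler(item)
        summary["processed"] += 1
        summary["succeeded" if ok else "failed"] += 1
    return summary
```

Because only one provider call is ever in flight, throughput is traded away for predictable, easily monitored rate-limit behavior.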
### Decision: Exponential backoff for external image calls
**Decision:** Apply exponential backoff with capped retries for rate-limited or transient failures.
**Rationale:**
- Reduces burst retry amplification.
- Improves success rate under API pressure.
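A capped exponential-backoff wrapper along these lines would implement this decision. The exception type, delay constants, and injectable `sleep` are illustrative assumptions:

```python
import time

class TransientProviderError(Exception):
    """Stand-in for rate-limit or transient provider failures."""

def call_with_backoff(fn, max_attempts: int = 5,
                      base_delay: float = 1.0, cap: float = 30.0,
                      sleep=time.sleep):
    """Retry `fn` with exponentially growing, capped delays between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientProviderError:
            if attempt == max_attempts - 1:
                raise                          # retries exhausted: surface failure
            sleep(min(cap, base_delay * (2 ** attempt)))
```

Injecting `sleep` keeps the wrapper testable without real delays; the cap prevents the last retries from stretching a maintenance run indefinitely.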
### Decision: Safety-first destructive command ergonomics
**Decision:** Destructive operations require explicit confirmation/flags and support dry-run where meaningful.
**Rationale:**
- Prevents accidental data loss.
- Makes admin actions auditable and predictable.
### Decision: Fetch-N command reuses ingestion pipeline
**Decision:** Add a fetch-count option that drives existing ingestion/fetch flow rather than building a second implementation.
**Rationale:**
- Preserves deduplication/retry logic and minimizes divergence.
## Risks / Trade-offs
- **[Risk] Operator misuse of destructive commands** -> Mitigation: confirmation gate + explicit flags + dry-run.
- **[Risk] Backoff can increase command runtime** -> Mitigation: cap retries and print ETA-style progress output.
- **[Risk] Queue processing interruption mid-run** -> Mitigation: idempotent per-item updates and resumable reruns.
- **[Trade-off] In-process queue is simpler but non-distributed** -> Accepted: sufficient for admin-invoked maintenance scope.
## Migration Plan
1. Extend CLI parser with admin subcommands and argument validation.
2. Add reusable maintenance handlers (archive clean, cache clear, clear news, rebuild, regenerate translations, fetch-n).
3. Implement queued image-refetch handler with exponential backoff and per-item progress logs.
4. Add safe guards (`--confirm`, optional `--dry-run`) for destructive operations.
5. Document command usage and examples in README.
Rollback:
- Keep existing `force-fetch` path intact.
- Revert new subcommands while preserving unaffected ingestion pipeline.
## Open Questions
- What cache layers are considered in-scope for `clear-cache` (in-memory only vs additional filesystem cache)?
- Should `rebuild-site` chain all maintenance actions or remain a defined subset with explicit steps?
- Should `fetch --count n` enforce an upper bound to avoid accidental high-cost runs?

View File

@@ -0,0 +1,34 @@
## Why
Operational recovery and maintenance flows are currently fragmented, manual, and risky for site admins during outages or data-quality incidents. We need a reliable admin command surface that supports safe reset/rebuild workflows without requiring ad-hoc scripts.
## What Changes
- Add a unified admin CLI command with maintenance subcommands for common operational tasks.
- Add `refetch-images` mode that processes the latest 30 news items through a queue, one-by-one, with exponential backoff to reduce provider/API rate-limit failures.
- Make image refetch context-aware using article keywords plus mood/sentiment signals to improve image relevance.
- Add archive cleanup command for archived news maintenance.
- Add cache clear command for application cache invalidation.
- Add clear-news command for wiping existing news items.
- Add rebuild-site command to re-run full rebuild workflow.
- Add regenerate-translations command for all supported languages.
- Add fetch command supporting user-provided `n` article count.
- Add guardrails and operator UX improvements (dry-run where applicable, progress output, failure summaries, and safe defaults).
## Capabilities
### New Capabilities
- `admin-maintenance-command-suite`: Defines a single admin command surface with subcommands for refetch images, archive cleanup, cache clear, news clear, rebuild, translation regeneration, and fetch-n workflows.
- `queued-image-refetch-with-backoff`: Defines queue-based image refetch behavior for latest 30 items with sequential processing and exponential backoff for rate-limit resilience.
- `context-aware-image-selection-recovery`: Defines keyword + sentiment/mood-informed image query rules and generic AI fallback behavior for refetch operations.
- `site-admin-safety-and-ergonomics`: Defines operational safeguards and usability requirements (dry-run, confirmation for destructive actions, progress reporting, and actionable error summaries).
### Modified Capabilities
- None.
## Impact
- **Backend/CLI:** new admin command entrypoints and orchestration logic for maintenance workflows.
- **News/Image Pipeline:** image re-fetch and optimization logic, retry/backoff strategy, and relevance heuristics.
- **Data Layer:** archive cleanup, cache invalidation, news-clear, translation regeneration, and controlled fetch-count ingestion operations.
- **Operations:** faster incident recovery, reduced manual intervention, and safer reset/rebuild procedures for admins.

View File

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Unified admin command surface
The system SHALL provide a single admin CLI command family exposing maintenance subcommands.
#### Scenario: Subcommand discovery
- **WHEN** an operator runs the admin command help output
- **THEN** available subcommands include refetch-images, clean-archive, clear-cache, clear-news, rebuild-site, regenerate-translations, and fetch
### Requirement: Fetch command supports configurable article count
The admin fetch command SHALL support an operator-provided article count parameter.
#### Scenario: Fetch with explicit count
- **WHEN** an operator invokes fetch with `n=25`
- **THEN** the command executes ingestion targeting the requested count
- **AND** prints completion summary including processed/stored counts
### Requirement: Translation regeneration command
The system SHALL provide a command to regenerate translations for existing articles.
#### Scenario: Regenerate translations run
- **WHEN** an operator runs regenerate-translations
- **THEN** the system attempts translation regeneration for supported languages
- **AND** outputs success/failure totals
### Requirement: Rebuild site command
The system SHALL provide a rebuild-site command that executes the defined rebuild workflow.
#### Scenario: Rebuild execution
- **WHEN** an operator runs rebuild-site
- **THEN** the system executes the documented rebuild steps in deterministic order
- **AND** prints a final success/failure summary
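The deterministic-order-plus-summary behavior required here can be sketched as a small step runner. Step names and the summary shape are hypothetical; the real sequence is whatever the documented rebuild workflow defines:

```python
from typing import Callable, List, Tuple

def run_rebuild(steps: List[Tuple[str, Callable[[], None]]]) -> dict:
    """Execute rebuild steps in the given order, collecting a final summary.

    A failing step is recorded but does not abort later steps in this sketch;
    the real command might choose to stop on first failure instead.
    """
    summary = {"succeeded": [], "failed": []}
    for name, step in steps:
        try:
            step()
            summary["succeeded"].append(name)
        except Exception as exc:
            summary["failed"].append((name, str(exc)))
    return summary
```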

View File

@@ -0,0 +1,23 @@
## ADDED Requirements
### Requirement: Context-aware image query generation
Image refetch SHALL construct provider queries from article context including keywords and mood/sentiment cues.
#### Scenario: Context-enriched query
- **WHEN** a queued article is processed for image refetch
- **THEN** the system derives query terms from article headline/summary content
- **AND** includes mood/sentiment-informed cues to improve relevance
### Requirement: AI-domain fallback keywords
When context extraction is insufficient, the system SHALL use AI-domain fallback keywords.
#### Scenario: Empty or weak context extraction
- **WHEN** extracted context terms are empty or below quality threshold
- **THEN** the system applies fallback terms such as `ai`, `machine learning`, `deep learning`
### Requirement: Generic AI fallback image on terminal failure
If no usable provider image is returned, the system SHALL assign a generic AI fallback image.
#### Scenario: Provider chain exhaustion
- **WHEN** all provider attempts fail or return unusable images
- **THEN** the system stores a generic AI fallback image for the article
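One way to combine headline keywords with mood cues, as the first requirement above describes, is sketched below. The mood-to-cue mapping, stopword list, and term limits are illustrative assumptions, not the heuristics in `backend/news_service.py`:

```python
import re

# Hypothetical mood-to-cue mapping for image provider queries.
MOOD_CUES = {
    "positive": "bright futuristic technology",
    "negative": "dramatic circuit board",
    "neutral":  "abstract neural network",
}
STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "on"}

def build_image_query(headline: str, mood: str) -> str:
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", headline)]
    keywords = [w for w in words if w not in STOPWORDS and len(w) > 3][:4]
    cue = MOOD_CUES.get(mood, MOOD_CUES["neutral"])  # unknown moods fall back
    return " ".join(keywords + [cue])
```

Appending a mood cue biases provider results toward images whose tone matches the article, rather than matching keywords alone.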

View File

@@ -0,0 +1,33 @@
## ADDED Requirements
### Requirement: Latest-30 queue construction
The refetch-images command SHALL enqueue up to the latest 30 news items for processing.
#### Scenario: Queue population
- **WHEN** refetch-images is started
- **THEN** the command loads recent news items
- **AND** enqueues at most 30 items ordered from newest to oldest
### Requirement: Sequential processing
The image refetch queue SHALL be processed one item at a time.
#### Scenario: Single-item worker behavior
- **WHEN** queue processing runs
- **THEN** only one queued item is processed concurrently
- **AND** the next item starts only after the current item completes or fails
### Requirement: Exponential backoff on transient failures
The queue processor SHALL retry transient image-provider failures using exponential backoff.
#### Scenario: Rate-limited provider response
- **WHEN** provider call returns rate-limit or transient error
- **THEN** command retries with exponential delay between attempts
- **AND** stops retrying after configured max attempts
### Requirement: Progress and completion reporting
The command SHALL emit operator-readable progress and final summary output.
#### Scenario: Queue progress output
- **WHEN** queue processing is in progress
- **THEN** the command prints per-item progress (processed/succeeded/failed)
- **AND** prints final totals on completion

View File

@@ -0,0 +1,25 @@
## ADDED Requirements
### Requirement: Confirmation guard for destructive commands
Destructive admin commands SHALL require explicit confirmation before execution.
#### Scenario: Missing confirmation flag
- **WHEN** an operator runs clear-news or clean-archive without required confirmation
- **THEN** the command exits without applying destructive changes
- **AND** prints guidance for explicit confirmation usage
### Requirement: Dry-run support where applicable
Maintenance commands SHALL provide dry-run mode for previewing effects where feasible.
#### Scenario: Dry-run preview
- **WHEN** an operator invokes a command with dry-run mode
- **THEN** the command reports intended actions and affected counts
- **AND** persists no data changes
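The confirmation and dry-run requirements above can be combined into one guard helper, sketched here in Python. The flag semantics and message wording are assumptions for illustration:

```python
def guard_destructive(confirm: bool, dry_run: bool,
                      affected: int, apply) -> str:
    """Apply a destructive action only when confirmed and not a dry run.

    Returns the operator-facing message; `apply` is an assumed callable
    that performs the actual deletion.
    """
    if dry_run:
        return f"[dry-run] would delete {affected} items; no changes made"
    if not confirm:
        return "refusing to run: pass --confirm to apply destructive changes"
    apply()
    return f"deleted {affected} items"
```

Checking `dry_run` before `confirm` lets operators preview effects without also having to pass the confirmation flag.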
### Requirement: Actionable failure summaries
Admin commands SHALL output actionable errors and final status summaries.
#### Scenario: Partial failure reporting
- **WHEN** a maintenance command partially fails
- **THEN** output includes succeeded/failed counts
- **AND** includes actionable next-step guidance

View File

@@ -0,0 +1,41 @@
## 1. Admin CLI Foundation
- [x] 1.1 Extend `backend/cli.py` parser with an admin maintenance command group and subcommands.
- [x] 1.2 Add argument validation for subcommands including `fetch --count n`.
- [x] 1.3 Keep existing `force-fetch` command behavior intact.
## 2. Queue-Based Image Refetch
- [x] 2.1 Implement latest-30 article selection query for refetch queue.
- [x] 2.2 Implement in-process sequential queue worker for refetch-images.
- [x] 2.3 Add exponential backoff retry logic for transient/rate-limit provider failures.
- [x] 2.4 Add per-item progress logging and final queue summary output.
## 3. Context-Aware Image Recovery
- [x] 3.1 Add context-aware query generation using article keywords plus mood/sentiment cues.
- [x] 3.2 Add AI-domain fallback keyword set when extracted context is weak.
- [x] 3.3 Add explicit generic AI fallback image assignment for terminal provider failure.
- [x] 3.4 Ensure refetched images are optimized and persisted using existing image pipeline contracts.
## 4. Maintenance Operations
- [x] 4.1 Implement clean-archive command using existing archival repository helpers.
- [x] 4.2 Implement clear-cache command for configured cache layers in scope.
- [x] 4.3 Implement clear-news command for non-archived items and/or items within the configured scope.
- [x] 4.4 Implement rebuild-site command to execute defined rebuild sequence.
- [x] 4.5 Implement regenerate-translations command across supported languages.
- [x] 4.6 Implement fetch command with configurable article count.
## 5. Safety and Operator UX
- [x] 5.1 Add explicit confirmation requirement for destructive commands.
- [x] 5.2 Add dry-run support for commands where preview is feasible.
- [x] 5.3 Standardize command output format for success/failure totals and next-step hints.
## 6. Documentation and Validation
- [x] 6.1 Update README command documentation with examples for each new subcommand.
- [x] 6.2 Add operational guardrail notes (confirmation, dry-run, backoff behavior).
- [x] 6.3 Validate command help output and argument handling.
- [x] 6.4 Run end-to-end manual checks for refetch-images queue behavior and failure recovery output.