Initial Commit

2026-02-12 16:50:29 -05:00
commit a1da041f14
74 changed files with 6140 additions and 0 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12


@@ -0,0 +1,111 @@
## Context
ClawFort needs a stunning one-page placeholder website that automatically aggregates and displays AI news hourly. The site must be containerized, use Perplexity API for news generation, and feature infinite scroll with 30-day retention.
**Current State:** Greenfield project - no existing codebase.
**Constraints:**
- Must use Perplexity API (API key via environment variable)
- Containerized deployment (Docker)
- Lean JavaScript framework for frontend
- 30-day news retention with archiving
- Hourly automated updates
## Goals / Non-Goals
**Goals:**
- Stunning one-page website with ClawFort branding
- Hourly AI news aggregation via Perplexity API
- Dynamic hero block with featured news and image
- Infinite scroll news feed (10 initial items)
- 30-day retention with automatic archiving
- Source attribution for news and images
- Fully containerized deployment
- Responsive design
**Non-Goals:**
- User authentication/accounts
- Manual news curation interface
- Real-time updates (polling only)
- Multi-language support
- CMS integration
- SEO optimization beyond basic meta tags
## Decisions
### Architecture: Monolithic Container
**Decision:** Single container with frontend + backend + SQLite
**Rationale:** Simplicity for a placeholder site, easy deployment, no external database dependencies
**Alternative:** Microservices with separate DB container - rejected as overkill for this scope
### Frontend Framework: Alpine.js + Tailwind CSS
**Decision:** Alpine.js for lean reactivity, Tailwind for styling
**Rationale:** Minimal bundle size (~15 KB), no build-step complexity, well suited to one-page sites
**Alternative:** React/Vue - rejected as too heavy for simple infinite scroll and hero display
### Backend: Python (FastAPI) + APScheduler
**Decision:** FastAPI for REST API, APScheduler for cron-like jobs
**Rationale:** Fast to develop, excellent async support, built-in OpenAPI docs, simple scheduling
**Alternative:** Node.js/Express - rejected; Python better for data processing and Perplexity integration
### Database: SQLite with SQLAlchemy
**Decision:** SQLite for zero-config persistence
**Rationale:** No separate DB container needed, sufficient for 30-day news retention (~1000-2000 records)
**Alternative:** PostgreSQL - rejected as adds deployment complexity
### News Aggregation Strategy
**Decision:** Hourly cron job queries Perplexity for "latest AI news" with image generation
**Rationale:** Simple, reliable, cost-effective
**Implementation:**
- Perplexity API call: "What is the latest AI news from the last hour?"
- Store: headline, summary, source URL, image URL, timestamp
- Attribution: Display source name and image credit
### Image Strategy
**Decision:** Use Perplexity to suggest relevant images or generate via DALL-E if available, with local image optimization
**Rationale:** Consistent with AI theme, no copyright concerns, plus configurable compression
**Implementation:**
- Download and optimize images locally using Pillow
- Configurable quality setting via `IMAGE_QUALITY` env var (1-100, default 85)
- Store optimized images in `/app/static/images/`
- Serve optimized versions, fallback to original URL if optimization fails
**Alternative:** Unsplash API - rejected to keep dependencies minimal
### Infinite Scroll Implementation
**Decision:** Cursor-based pagination with Intersection Observer API
**Rationale:** Efficient for large datasets, simple Alpine.js integration
**Page Size:** 10 items per request
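The server side of cursor-based pagination can be sketched as follows. This is a minimal illustration, not the project's actual repository code: the `news_items` table name, the `fetch_page` function, and the use of an auto-incrementing `id` as the cursor are all assumptions made for the example.

```python
import sqlite3
from typing import Optional

PAGE_SIZE = 10  # page size stated in the design


def fetch_page(conn: sqlite3.Connection, cursor: Optional[int] = None,
               limit: int = PAGE_SIZE) -> dict:
    """Return one page of non-archived items, newest first.

    `cursor` is the id of the last item the client already holds;
    None means "start from the newest item". (Hypothetical helper.)
    """
    sql = "SELECT id, headline FROM news_items WHERE archived = 0"
    params: list = []
    if cursor is not None:
        sql += " AND id < ?"
        params.append(cursor)
    # Ask for one extra row to learn whether another page exists.
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(limit + 1)
    rows = conn.execute(sql, params).fetchall()

    items = [{"id": r[0], "headline": r[1]} for r in rows[:limit]]
    has_more = len(rows) > limit
    next_cursor = items[-1]["id"] if has_more and items else None
    return {"items": items, "next_cursor": next_cursor}
```

A client would pass the returned `next_cursor` back on the next request and stop observing the scroll sentinel once it is `None`. Unlike offset-based paging, rows inserted by the hourly job between requests do not shift the pages the client has already seen.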
### Archive Strategy
**Decision:** Soft delete (archived flag) + nightly cleanup job
**Rationale:** Easy to implement, data recoverable if needed
**Cleanup:** Move items >30 days to archive table or delete
## Risks / Trade-offs
**[Risk] Perplexity API rate limits or downtime** → Mitigation: Implement exponential backoff, cache last successful fetch, display cached content with "last updated" timestamp, fallback to OpenRouter API if configured
**[Risk] Container storage grows unbounded** → Mitigation: SQLite WAL mode, volume mounts for persistence, 30-day hard limit on retention
**[Risk] News quality varies** → Mitigation: Basic filtering (require title + summary), manual blacklist capability in config
**[Risk] Cold start performance** → Mitigation: SQLite connection pooling, frontend CDN-ready static assets
**[Trade-off] SQLite vs PostgreSQL** → SQLite limits concurrent writes but acceptable for read-heavy news site
**[Trade-off] Single container vs microservices** → Easier deployment but less scalable; acceptable for placeholder site
## Migration Plan
1. **Development:** Local Docker Compose setup
2. **Environment:** Configure `PERPLEXITY_API_KEY` in `.env`
3. **Build:** `docker build -t clawfort-site .`
4. **Run:** `docker run -e PERPLEXITY_API_KEY=xxx -p 8000:8000 clawfort-site`
5. **Data:** SQLite volume mount for persistence across restarts
## Open Questions (Resolved)
1. **Admin panel?** → Deferred to future
2. **Image optimization?** → Yes, local optimization with Pillow, configurable quality via `IMAGE_QUALITY` env var
3. **Analytics?** → Umami integration with `UMAMI_SCRIPT_URL` and `UMAMI_WEBSITE_ID` env vars, track page views, scroll events, and CTA clicks
4. **API cost monitoring?** → Log Perplexity usage, fallback to OpenRouter API if `OPENROUTER_API_KEY` configured


@@ -0,0 +1,54 @@
## Why
ClawFort needs a stunning one-page placeholder website that automatically generates and displays AI news hourly, creating a dynamic, always-fresh brand presence without manual content curation. The site will serve as a living showcase of AI capabilities while building brand recognition.
## What Changes
- **New Capabilities:**
- Automated AI news aggregation via Perplexity API (hourly updates)
- Dynamic hero section with featured news and images
- Infinite scroll news feed with 1-month retention
- Archive system for older news items
- Containerized deployment (Docker)
- Responsive single-page design with lean JavaScript framework
- **Frontend:**
- One-page website with hero block
- Infinite scroll news feed (latest 10 on load)
- News attribution to sources
- Image credit display
- Responsive design
- **Backend:**
- News aggregation service (hourly cron job)
- Perplexity API integration
- News storage with 30-day retention
- Archive management
- REST API for frontend
- **Infrastructure:**
- Docker containerization
- Environment-based configuration
- Perplexity API key management
## Capabilities
### New Capabilities
- `news-aggregator`: Automated AI news collection via Perplexity API with hourly scheduling
- `news-storage`: Database storage with 30-day retention and archive management
- `hero-display`: Dynamic hero block with featured news and image attribution
- `infinite-scroll`: Frontend infinite scroll with lazy loading (10 initial, paginated)
- `containerized-deployment`: Docker-based deployment with environment configuration
- `responsive-frontend`: Single-page application with lean JavaScript framework
### Modified Capabilities
- None (new project)
## Impact
- **Code:** New full-stack application (frontend + backend)
- **APIs:** Perplexity API integration required
- **Dependencies:** Docker, Node.js/Python runtime, database (SQLite/PostgreSQL)
- **Infrastructure:** Container orchestration support
- **Environment:** `PERPLEXITY_API_KEY` required
- **Data:** 30-day rolling news archive with automatic cleanup


@@ -0,0 +1,45 @@
## ADDED Requirements
### Requirement: Containerized deployment
The system SHALL run entirely within Docker containers with all dependencies included.
#### Scenario: Single container build
- **WHEN** building the Docker image
- **THEN** the Dockerfile SHALL include Python runtime, Node.js (for Tailwind if needed), and all application code
- **AND** expose port 8000 for web traffic
#### Scenario: Environment configuration
- **WHEN** running the container
- **THEN** the system SHALL read PERPLEXITY_API_KEY from environment variables
- **AND** fail to start if the key is missing or invalid
- **AND** support optional configuration for retention days (default: 30)
- **AND** support optional IMAGE_QUALITY for image compression (default: 85)
- **AND** support optional OPENROUTER_API_KEY for fallback LLM provider
- **AND** support optional UMAMI_SCRIPT_URL and UMAMI_WEBSITE_ID for analytics
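The environment-configuration scenario above could be implemented along these lines. This is a sketch under stated assumptions: `Settings` and `load_settings` are hypothetical names, and `RETENTION_DAYS` is an assumed variable name, since the requirement mentions a retention setting without naming it. The check only detects a missing key; detecting an invalid key would require a call to the provider.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    perplexity_api_key: str
    retention_days: int
    image_quality: int
    openrouter_api_key: Optional[str]
    umami_script_url: Optional[str]
    umami_website_id: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, failing fast on bad input."""
    env = os.environ if env is None else env

    key = env.get("PERPLEXITY_API_KEY", "").strip()
    if not key:
        # Required: the service should refuse to start without it.
        raise RuntimeError("PERPLEXITY_API_KEY is required")

    quality = int(env.get("IMAGE_QUALITY", "85"))
    if not 1 <= quality <= 100:
        raise RuntimeError("IMAGE_QUALITY must be between 1 and 100")

    return Settings(
        perplexity_api_key=key,
        # RETENTION_DAYS is an assumed name for the optional retention setting.
        retention_days=int(env.get("RETENTION_DAYS", "30")),
        image_quality=quality,
        openrouter_api_key=env.get("OPENROUTER_API_KEY") or None,
        umami_script_url=env.get("UMAMI_SCRIPT_URL") or None,
        umami_website_id=env.get("UMAMI_WEBSITE_ID") or None,
    )
```

Calling `load_settings()` once at startup and letting the exception propagate gives the "fail to start" behavior without scattering environment lookups through the code.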
#### Scenario: Data persistence
- **WHEN** the container restarts
- **THEN** the SQLite database SHALL persist via Docker volume mount
- **AND** news data SHALL remain intact across restarts
## ADDED Requirements
### Requirement: Responsive single-page design
The system SHALL provide a stunning, responsive one-page website with ClawFort branding.
#### Scenario: Brand consistency
- **WHEN** viewing the website
- **THEN** the design SHALL feature ClawFort branding (logo, colors, typography)
- **AND** maintain visual consistency across all sections
#### Scenario: Responsive layout
- **WHEN** viewing on mobile, tablet, or desktop
- **THEN** the layout SHALL adapt appropriately
- **AND** the hero block SHALL resize proportionally
- **AND** the news feed SHALL use appropriate column layouts
#### Scenario: Performance
- **WHEN** loading the page
- **THEN** initial page load SHALL complete within 2 seconds
- **AND** images SHALL lazy load outside viewport
- **AND** JavaScript bundle SHALL be under 100KB gzipped


@@ -0,0 +1,55 @@
## ADDED Requirements
### Requirement: Hero block display
The system SHALL display the most recent news item as a featured hero block with full attribution.
#### Scenario: Hero rendering
- **WHEN** the page loads
- **THEN** the hero block SHALL display the latest news headline, summary, and featured image
- **AND** show source attribution (e.g., "Via: TechCrunch")
- **AND** show image credit (e.g., "Image: DALL-E")
#### Scenario: Hero update
- **WHEN** new news is fetched hourly
- **THEN** the hero block SHALL automatically update to show the newest item
- **AND** the previous hero item SHALL move to the news feed
## ADDED Requirements
### Requirement: Infinite scroll news feed
The system SHALL display news items in reverse chronological order with infinite scroll pagination.
#### Scenario: Initial load
- **WHEN** the page first loads
- **THEN** the system SHALL display the 10 most recent non-archived news items
- **AND** exclude the hero item from the feed
#### Scenario: Infinite scroll
- **WHEN** the user scrolls to the bottom of the feed
- **THEN** the system SHALL fetch the next 10 news items via API
- **AND** append them to the feed without page reload
- **AND** show a loading indicator during fetch
#### Scenario: End of feed
- **WHEN** all non-archived news items have been loaded
- **THEN** the system SHALL display "No more news" message
- **AND** disable further scroll triggers
### Requirement: News attribution display
The system SHALL clearly attribute all news content and images to their sources.
#### Scenario: Source attribution
- **WHEN** displaying any news item
- **THEN** the system SHALL show the original source name and link
- **AND** display image credit if available
#### Scenario: Perplexity attribution
- **WHEN** displaying aggregated content
- **THEN** the system SHALL include "Powered by Perplexity" in the footer
#### Scenario: Analytics tracking
- **WHEN** Umami analytics is configured via `UMAMI_SCRIPT_URL` and `UMAMI_WEBSITE_ID`
- **THEN** the system SHALL inject Umami tracking script into page head
- **AND** track page view events on initial load
- **AND** track scroll depth events (25%, 50%, 75%, 100%)
- **AND** track CTA click events (news item clicks, source link clicks)


@@ -0,0 +1,56 @@
## ADDED Requirements
### Requirement: News aggregation via Perplexity API
The system SHALL fetch AI news hourly from Perplexity API and store it with full attribution.
#### Scenario: Hourly news fetch
- **WHEN** the scheduled job runs every hour
- **THEN** the system calls Perplexity API with query "latest AI news"
- **AND** stores the response with headline, summary, source URL, and timestamp
#### Scenario: API error handling
- **WHEN** Perplexity API returns an error or timeout
- **THEN** the system logs the error with cost tracking
- **AND** retries with exponential backoff up to 3 times
- **AND** falls back to OpenRouter API if `OPENROUTER_API_KEY` is configured
- **AND** continues using cached content if all retries and fallback fail
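The error-handling sequence in this scenario (retry, then fallback provider, then cached content) might look like the sketch below. The provider calls are passed in as plain callables so the control flow can be shown without a network; `fetch_with_fallback` and its parameters are hypothetical names, and "up to 3 times" is read here as three attempts in total, matching task 3.3.

```python
import time
from typing import Callable, List, Optional

Fetcher = Callable[[], List[dict]]


def fetch_with_fallback(
    primary: Fetcher,
    fallback: Optional[Fetcher] = None,
    cached: Optional[List[dict]] = None,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Try the primary provider with backoff, then fallback, then cache."""
    for attempt in range(max_attempts):
        try:
            return primary()
        except Exception as exc:
            print(f"primary fetch failed (attempt {attempt + 1}): {exc}")
            if attempt < max_attempts - 1:
                # Delays double each time: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** attempt)

    if fallback is not None:
        try:
            return fallback()
        except Exception as exc:
            print(f"fallback fetch failed: {exc}")

    # Last resort: keep serving whatever was fetched previously.
    return list(cached or [])
```

Injecting `sleep` keeps the function testable without real delays; in production the default `time.sleep` applies.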
### Requirement: Featured image generation
The system SHALL generate or fetch a relevant featured image for each news item.
#### Scenario: Image acquisition
- **WHEN** a new news item is fetched
- **THEN** the system SHALL request a relevant image URL from Perplexity
- **AND** download and optimize the image locally using Pillow
- **AND** apply quality compression based on `IMAGE_QUALITY` env var (1-100, default 85)
- **AND** store the optimized image path and original image credit/source information
#### Scenario: Image optimization configuration
- **WHEN** the system processes an image
- **THEN** it SHALL read `IMAGE_QUALITY` from environment (default: 85)
- **AND** apply JPEG compression at specified quality level
- **AND** resize images exceeding 1200px width while maintaining aspect ratio
- **AND** store optimized images in `/app/static/images/` directory
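The resize rule in this scenario reduces to a small piece of arithmetic, sketched here without any imaging library. `target_size` is a hypothetical helper; only the 1200px width limit comes from the requirement.

```python
from typing import Tuple

MAX_WIDTH = 1200  # width limit from the requirement


def target_size(width: int, height: int,
                max_width: int = MAX_WIDTH) -> Tuple[int, int]:
    """Return the dimensions an image should be stored at.

    Images at or below the limit are left alone; wider images are
    scaled down so the width equals the limit and the aspect ratio
    is preserved.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    # Guard against a zero height for extremely wide, short images.
    return max_width, max(1, round(height * scale))
```

With Pillow, the resulting tuple could be passed to `Image.resize()` and the configured quality to `Image.save(..., quality=...)` when writing the JPEG.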
#### Scenario: Image fallback
- **WHEN** image generation fails or returns no result
- **THEN** the system SHALL use a default ClawFort branded placeholder image
## ADDED Requirements
### Requirement: News data persistence with retention
The system SHALL keep news items active for 30 days and then archive them automatically.
#### Scenario: News storage
- **WHEN** a news item is fetched from Perplexity
- **THEN** the system SHALL store it in SQLite with fields: id, headline, summary, source_url, image_url, image_credit, published_at, created_at
- **AND** set archived=false by default
#### Scenario: Automatic archiving
- **WHEN** a nightly cleanup job runs
- **THEN** the system SHALL mark all news items older than 30 days as archived=true
- **AND** delete archived items older than 60 days permanently
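The two cleanup steps can be sketched with the standard-library `sqlite3` module. Assumptions in this example: the table is called `news_items`, timestamps are stored as UTC ISO-8601 strings at whole-second precision (so string comparison matches chronological order), and both age thresholds are measured from `published_at`. The requirement does not say whether the 60 days count from publication or from archiving.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Tuple


def run_cleanup(conn: sqlite3.Connection, now: datetime,
                retention_days: int = 30,
                purge_days: int = 60) -> Tuple[int, int]:
    """Archive items past retention, then purge old archived items.

    Returns (rows_archived, rows_deleted). Hypothetical helper.
    """
    archive_before = (now - timedelta(days=retention_days)).isoformat(timespec="seconds")
    purge_before = (now - timedelta(days=purge_days)).isoformat(timespec="seconds")

    # Step 1: soft delete by flipping the archived flag.
    archived = conn.execute(
        "UPDATE news_items SET archived = 1 "
        "WHERE archived = 0 AND published_at < ?",
        (archive_before,),
    ).rowcount

    # Step 2: permanently remove archived rows past the purge threshold.
    deleted = conn.execute(
        "DELETE FROM news_items WHERE archived = 1 AND published_at < ?",
        (purge_before,),
    ).rowcount

    conn.commit()
    return archived, deleted
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)` from the nightly job) keeps the function deterministic under test.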
#### Scenario: Duplicate prevention
- **WHEN** fetching news that matches an existing headline (within 24 hours)
- **THEN** the system SHALL skip insertion to prevent duplicates
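One way to express the duplicate rule is shown below. The requirement only says a headline "matches"; treating a match as case-insensitive and whitespace-insensitive equality is an assumption of this sketch, as are the names `normalize` and `is_duplicate`.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def normalize(headline: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(headline.lower().split())


def is_duplicate(headline: str, published_at: datetime,
                 existing: Iterable[Tuple[str, datetime]],
                 window: timedelta = timedelta(hours=24)) -> bool:
    """Return True if a matching headline exists within the time window.

    `existing` yields (headline, published_at) pairs for stored items.
    """
    key = normalize(headline)
    return any(
        normalize(other) == key and abs(published_at - ts) <= window
        for other, ts in existing
    )
```

The ingestion loop would call `is_duplicate` before each insert and skip the item when it returns `True`.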


@@ -0,0 +1,82 @@
## 1. Project Setup
- [x] 1.1 Create project directory structure (backend/, frontend/, docker/)
- [x] 1.2 Initialize Python project with pyproject.toml (FastAPI, SQLAlchemy, APScheduler, httpx)
- [x] 1.3 Create requirements.txt for Docker build
- [x] 1.4 Set up Tailwind CSS configuration
- [x] 1.5 Create .env.example with all environment variables (PERPLEXITY_API_KEY, IMAGE_QUALITY, OPENROUTER_API_KEY, UMAMI_SCRIPT_URL, UMAMI_WEBSITE_ID)
## 2. Database Layer
- [x] 2.1 Create SQLAlchemy models (NewsItem with fields: id, headline, summary, source_url, image_url, image_credit, published_at, created_at, archived)
- [x] 2.2 Create database initialization and migration scripts
- [x] 2.3 Implement database connection management with SQLite
- [x] 2.4 Create repository functions (create_news, get_recent_news, get_news_paginated, archive_old_news, delete_archived_news)
## 3. News Aggregation Service
- [x] 3.1 Implement Perplexity API client with httpx and cost logging
- [x] 3.2 Create news fetch function with query "latest AI news"
- [x] 3.3 Implement exponential backoff retry logic (3 attempts)
- [x] 3.4 Add duplicate detection (headline match within 24h)
- [x] 3.5 Create hourly scheduled job with APScheduler
- [x] 3.6 Implement image URL fetching from Perplexity
- [x] 3.7 Add image download and optimization with Pillow (configurable quality)
- [x] 3.8 Implement OpenRouter API fallback for news fetching
- [x] 3.9 Add default placeholder image fallback
## 4. Backend API
- [x] 4.1 Create FastAPI application structure
- [x] 4.2 Implement GET /api/news endpoint with pagination (cursor-based)
- [x] 4.3 Implement GET /api/news/latest endpoint for hero block
- [x] 4.4 Add CORS middleware for frontend access
- [x] 4.5 Create Pydantic schemas for API responses
- [x] 4.6 Implement health check endpoint
- [x] 4.7 Add API error handling and logging
## 5. Frontend Implementation
- [x] 5.1 Create HTML structure with ClawFort branding
- [x] 5.2 Implement hero block with Alpine.js (latest news display)
- [x] 5.3 Create news feed component with Alpine.js
- [x] 5.4 Implement infinite scroll with Intersection Observer API
- [x] 5.5 Add loading indicators and "No more news" message
- [x] 5.6 Implement source attribution display
- [x] 5.7 Add image lazy loading
- [x] 5.8 Style with Tailwind CSS (responsive design)
- [x] 5.9 Add "Powered by Perplexity" footer attribution
- [x] 5.10 Implement Umami analytics integration (conditional on env vars)
- [x] 5.11 Add analytics events: page view, scroll depth (25/50/75/100%), CTA clicks
## 6. Archive Management
- [x] 6.1 Implement nightly cleanup job (archive >30 days)
- [x] 6.2 Create permanent deletion job (>60 days archived)
- [x] 6.3 Add retention configuration (default 30 days)
## 7. Docker Containerization
- [x] 7.1 Create Dockerfile with multi-stage build (Python + static assets)
- [x] 7.2 Create docker-compose.yml for local development
- [x] 7.3 Add volume mount for SQLite persistence
- [x] 7.4 Configure environment variable handling
- [x] 7.5 Optimize image size (slim Python base)
- [x] 7.6 Add .dockerignore file
## 8. Testing & Validation
- [x] 8.1 Test Perplexity API integration manually
- [x] 8.2 Verify hourly news fetching works
- [x] 8.3 Test infinite scroll pagination
- [x] 8.4 Verify responsive design on mobile/desktop
- [x] 8.5 Test container build and run
- [x] 8.6 Verify data persistence across container restarts
- [x] 8.7 Test archive cleanup functionality
## 9. Documentation
- [x] 9.1 Create README.md with setup instructions
- [x] 9.2 Document environment variables
- [x] 9.3 Add deployment instructions
- [x] 9.4 Document API endpoints


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12


@@ -0,0 +1,93 @@
## Context
ClawFort currently performs automated hourly news ingestion through APScheduler (`scheduled_news_fetch()` in `backend/news_service.py`) and the same pipeline handles retries, deduplication, image optimization, and persistence. There is no operator-facing command to run this pipeline on demand.
The change adds an explicit manual trigger path for operations use cases:
- first-time bootstrap (populate content immediately after setup)
- recovery after failed external API calls
- ad-hoc operational refresh without waiting for scheduler cadence
Constraints:
- Reuse existing fetch pipeline to avoid logic drift
- Keep behavior idempotent with existing duplicate detection
- Preserve scheduler behavior; manual runs must not mutate scheduler configuration
## Goals / Non-Goals
**Goals:**
- Provide a Python command to force an immediate news fetch.
- Reuse existing retry, dedup, and storage logic.
- Return clear terminal output and process exit status for automation.
- Keep command safe to run repeatedly.
**Non-Goals:**
- Replacing APScheduler-based hourly fetch.
- Introducing new API endpoints for manual triggering.
- Changing data schema or retention policy.
- Building a full operator dashboard.
## Decisions
### Decision: Add a dedicated CLI entrypoint module
**Decision:** Add a small CLI entrypoint under backend (for example `backend/cli.py`) with a subcommand that invokes the fetch pipeline.
**Rationale:**
- Keeps operational workflow explicit and scriptable.
- Avoids coupling manual trigger behavior to HTTP routes.
- Works in local dev and containerized runtime.
**Alternatives considered:**
- Add an admin HTTP endpoint: rejected due to unnecessary security exposure.
- Trigger APScheduler internals directly: rejected to avoid scheduler-state side effects.
### Decision: Invoke the existing news pipeline directly
**Decision:** The command should call `process_and_store_news()` (or the existing sync wrapper) instead of implementing parallel fetch logic.
**Rationale:**
- Guarantees parity with scheduled runs.
- Reuses retry/backoff, fallback provider behavior, image handling, and dedup checks.
- Minimizes maintenance overhead.
**Alternatives considered:**
- New command-specific fetch implementation: rejected due to drift risk.
### Decision: Standardize command exit semantics
**Decision:** Exit code `0` for successful command execution (including zero new items), non-zero for operational failures (for example unhandled exceptions or fatal setup errors).
**Rationale:**
- Enables CI/cron/operator scripts to react deterministically.
- Matches common CLI conventions.
**Alternatives considered:**
- Exit non-zero when zero new items were inserted: rejected because dedup can make zero-item runs valid.
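The exit semantics above can be illustrated with a small `argparse` sketch. It is not the actual `backend/cli.py`: the `run_fetch` parameter stands in for the existing pipeline call (`process_and_store_news()` or its sync wrapper), and the assumption that it returns the number of stored items is made only for this example.

```python
import argparse
import sys
from typing import Callable, List, Optional


def main(run_fetch: Callable[[], int],
         argv: Optional[List[str]] = None) -> int:
    """Parse the command line and return a process exit code."""
    parser = argparse.ArgumentParser(prog="backend.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("force-fetch", help="run one immediate news fetch")
    args = parser.parse_args(argv)

    if args.command == "force-fetch":
        try:
            stored = run_fetch()  # assumed to return the stored item count
        except Exception as exc:
            # Fatal failure: report on stderr and exit non-zero.
            print(f"force-fetch failed: {exc}", file=sys.stderr)
            return 1
        # Zero new items is still a success because dedup may skip everything.
        print(f"force-fetch succeeded: {stored} item(s) stored")
        return 0
    return 2  # unreachable while "force-fetch" is the only subcommand
```

A module entrypoint would end with `sys.exit(main(run_fetch))`, so that shell scripts and cron wrappers can branch on the exit status.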
### Decision: Keep manual and scheduled paths independent
**Decision:** Manual command does not reconfigure or trigger scheduler jobs; it performs a one-off run only.
**Rationale:**
- Avoids race-prone manipulation of scheduler internals.
- Reduces complexity and risk in production runtime.
**Alternatives considered:**
- Temporarily altering scheduler trigger times: rejected as brittle and harder to reason about.
## Risks / Trade-offs
- **[Risk] Overlapping manual and scheduled runs may happen at boundary times** -> Mitigation: document operational guidance and keep dedup checks as safety net.
- **[Risk] External API failures still occur during forced runs** -> Mitigation: existing retry/backoff plus fallback provider path and explicit error output.
- **[Trade-off] Command success does not guarantee new rows** -> Mitigation: command output reports inserted count so operators can distinguish no-op vs failure.
## Migration Plan
1. Add CLI module and force-fetch subcommand wired to existing pipeline.
2. Add command result reporting and exit code behavior.
3. Document usage in README for bootstrap and recovery flows.
4. Validate command in local runtime and container runtime.
Rollback:
- Remove CLI entrypoint and related docs; scheduler-based hourly behavior remains unchanged.
## Open Questions
- Should force-fetch support an optional `--max-attempts` override, or stay fixed to pipeline defaults for v1?
- Should concurrent-run prevention use a process lock in this phase, or remain a documented operational constraint?


@@ -0,0 +1,35 @@
## Why
ClawFort currently fetches news on a fixed hourly schedule, which is not enough during first-time setup or after a failed API cycle. Operators need a reliable way to force an immediate news pull so they can bootstrap content quickly and recover without waiting for the next scheduled run.
## What Changes
- **New Capabilities:**
- Add a manual Python command to trigger an immediate news fetch on demand.
- Add command output that clearly reports success/failure, number of fetched/stored items, and error details.
- Add safe invocation behavior so manual runs reuse existing fetch/retry/dedup logic.
- **Backend:**
- Add a CLI entrypoint/script for force-fetch execution.
- Wire the command to existing news aggregation pipeline used by scheduled jobs.
- Return non-zero exit codes on command failure for operational automation.
- **Operations:**
- Document how and when to run the force-fetch command (initial setup and recovery scenarios).
## Capabilities
### New Capabilities
- `force-fetch-command`: Provide a Python command that triggers immediate news aggregation outside the hourly scheduler.
- `fetch-run-reporting`: Provide operator-facing command output and exit semantics for successful runs and failures.
- `manual-fetch-recovery`: Support manual recovery workflow after failed or partial API fetch cycles.
### Modified Capabilities
- None.
## Impact
- **Code:** New CLI command module/entrypoint plus minimal integration with existing `news_service` execution path.
- **APIs:** No external API contract changes.
- **Dependencies:** No required new runtime dependencies expected.
- **Infrastructure:** No deployment topology change; command runs in the same container/runtime.
- **Environment:** Reuses existing env vars (`PERPLEXITY_API_KEY`, `OPENROUTER_API_KEY`, `IMAGE_QUALITY`).
- **Data:** No schema changes; command writes through existing dedup + persistence flow.


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Command reports run outcome to operator
The system SHALL present operator-facing output that describes whether the forced run succeeded or failed.
#### Scenario: Successful run reporting
- **WHEN** a forced fetch command completes without fatal errors
- **THEN** the command output includes a success indication
- **AND** includes the number of items stored in that run
#### Scenario: Failed run reporting
- **WHEN** a forced fetch command encounters a fatal execution error
- **THEN** the command output includes a failure indication
- **AND** includes actionable error details for operator diagnosis
### Requirement: Command exposes automation-friendly exit semantics
The system SHALL return deterministic process exit codes for command success and failure.
#### Scenario: Exit code on success
- **WHEN** the force-fetch command execution completes successfully
- **THEN** the process exits with code 0
- **AND** automation tooling can treat the run as successful
#### Scenario: Exit code on fatal failure
- **WHEN** the force-fetch command execution fails fatally
- **THEN** the process exits with a non-zero code
- **AND** automation tooling can detect the failure state


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Operator can trigger immediate news fetch via Python command
The system SHALL provide a Python command that triggers one immediate news aggregation run outside of the hourly scheduler.
#### Scenario: Successful forced fetch invocation
- **WHEN** an operator runs the documented force-fetch command with valid runtime configuration
- **THEN** the system executes one full fetch cycle using the existing aggregation pipeline
- **AND** the command terminates after the run completes
#### Scenario: Command does not reconfigure scheduler
- **WHEN** an operator runs the force-fetch command while the service scheduler exists
- **THEN** the command performs a one-off run only
- **AND** scheduler job definitions and cadence remain unchanged
### Requirement: Forced fetch reuses existing aggregation behavior
The system SHALL use the same retry, fallback, deduplication, image processing, and persistence logic as scheduled fetch runs.
#### Scenario: Retry and fallback parity
- **WHEN** the primary news provider request fails during a forced run
- **THEN** the system applies the configured retry behavior
- **AND** uses the configured fallback provider path if available
#### Scenario: Deduplication parity
- **WHEN** fetched headlines match existing duplicate rules
- **THEN** duplicate items are skipped according to existing deduplication policy
- **AND** only eligible items are persisted


@@ -0,0 +1,22 @@
## ADDED Requirements
### Requirement: Manual command supports bootstrap and recovery workflows
The system SHALL allow operators to run the forced fetch command during first-time setup and after failed scheduled cycles.
#### Scenario: Bootstrap content population
- **WHEN** the system is newly deployed and contains no current news items
- **THEN** an operator can run the force-fetch command immediately
- **AND** the command attempts to populate the dataset without waiting for the next hourly schedule
#### Scenario: Recovery after failed scheduled fetch
- **WHEN** a prior scheduled fetch cycle failed or produced incomplete results
- **THEN** an operator can run the force-fetch command on demand
- **AND** the system performs a fresh one-off fetch attempt
### Requirement: Repeated manual runs remain operationally safe
The system SHALL support repeated operator-triggered runs without corrupting data integrity.
#### Scenario: Repeated invocation in same day
- **WHEN** an operator runs the force-fetch command multiple times within the same day
- **THEN** existing deduplication behavior prevents duplicate persistence for matching items
- **AND** each command run completes with explicit run status output


@@ -0,0 +1,28 @@
## 1. CLI Command Foundation
- [x] 1.1 Create `backend/cli.py` with command parsing for force-fetch execution
- [x] 1.2 Add a force-fetch command entrypoint that can be invoked via Python module execution
- [x] 1.3 Ensure command initializes required runtime context (env + database readiness)
## 2. Force-Fetch Execution Path
- [x] 2.1 Wire command to existing news aggregation execution path (`process_and_store_news` or sync wrapper)
- [x] 2.2 Ensure command runs as a one-off operation without changing scheduler job configuration
- [x] 2.3 Preserve existing deduplication, retry, fallback, and image processing behavior during manual runs
## 3. Operator Reporting and Exit Semantics
- [x] 3.1 Add success output that includes stored item count for the forced run
- [x] 3.2 Add failure output with actionable error details when fatal execution errors occur
- [x] 3.3 Return exit code `0` on success and non-zero on fatal failures
## 4. Recovery Workflow and Validation
- [x] 4.1 Validate bootstrap workflow: force-fetch on a fresh deployment with no current items
- [x] 4.2 Validate recovery workflow: force-fetch after simulated failed scheduled cycle
- [x] 4.3 Validate repeated same-day manual runs do not create duplicate records under dedup policy
## 5. Documentation
- [x] 5.1 Update `README.md` with force-fetch command usage for first-time setup
- [x] 5.2 Document recovery-run usage and expected command output/exit behavior


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-12


@@ -0,0 +1,94 @@
## Context
ClawFort currently stores and serves article content in a single language flow. The news creation path fetches English content via Perplexity and persists one record per article, while frontend hero/feed rendering consumes that single-language payload.
This change introduces multilingual support for Tamil and Malayalam with language-aware rendering and persistent user preference.
Constraints:
- Keep existing English behavior as default and fallback.
- Reuse current Perplexity integration for translation generation.
- Keep API and frontend changes minimal and backward-compatible where possible.
- Persist user language preference client-side so returning users keep their choice.
## Goals / Non-Goals
**Goals:**
- Generate Tamil and Malayalam translations at article creation time.
- Persist translation variants linked to the base article.
- Serve language-specific content in hero/feed API responses.
- Add landing-page language selector and persist preference across sessions.
**Non-Goals:**
- Supporting arbitrary language expansion in this phase.
- Introducing user accounts/server-side profile preferences.
- Building editorial translation workflows or manual override UI.
- Replacing Perplexity as translation provider.
## Decisions
### Decision: Model translations as child records linked to a base article
**Decision:** Keep one source article and store translation rows keyed by article ID + language code.
**Rationale:**
- Avoids duplicating non-language metadata (source URL, image attribution, timestamps).
- Supports language lookup with deterministic fallback to English.
- Eases future language additions without schema redesign.
**Alternatives considered:**
- Inline columns on article table (`headline_ta`, `headline_ml`): rejected as rigid and harder to extend.
- Fully duplicated article rows per language: rejected due to dedup and feed-order complexity.
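A child-table layout for this decision could look like the following sketch, written against the standard-library `sqlite3` module rather than SQLAlchemy to keep it self-contained. The table and column names (`news_items`, `news_translations`, `news_item_id`, `language`) and the `get_localized` helper are illustrative assumptions, not the project's actual schema.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE news_items (
    id INTEGER PRIMARY KEY,
    headline TEXT NOT NULL,
    summary TEXT NOT NULL
);
CREATE TABLE news_translations (
    id INTEGER PRIMARY KEY,
    news_item_id INTEGER NOT NULL REFERENCES news_items(id),
    language TEXT NOT NULL,
    headline TEXT NOT NULL,
    summary TEXT NOT NULL,
    UNIQUE (news_item_id, language)  -- one row per article and language
);
"""


def get_localized(conn: sqlite3.Connection, item_id: int,
                  language: str) -> Optional[dict]:
    """Return the article text in `language`, or the English base text."""
    row = conn.execute(
        """
        SELECT COALESCE(t.headline, n.headline),
               COALESCE(t.summary, n.summary),
               CASE WHEN t.id IS NULL THEN 'en' ELSE t.language END
        FROM news_items AS n
        LEFT JOIN news_translations AS t
          ON t.news_item_id = n.id AND t.language = ?
        WHERE n.id = ?
        """,
        (language, item_id),
    ).fetchone()
    if row is None:
        return None
    return {"headline": row[0], "summary": row[1], "language": row[2]}
```

The `LEFT JOIN` keeps the base article in the result even when no translation row exists, which gives the deterministic English fallback, and the `UNIQUE` constraint lets a further language be added as new rows rather than new columns.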
### Decision: Translate immediately after article creation in ingestion pipeline
**Decision:** For each newly accepted article, request Tamil and Malayalam translations and persist before ingestion cycle completes.
**Rationale:**
- Keeps article and translations synchronized.
- Avoids delayed jobs and partial language availability in normal flow.
- Fits existing per-article processing loop.
**Alternatives considered:**
- Asynchronous background translation queue: rejected for higher complexity in this phase.
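
A minimal sketch of the synchronous translate-at-ingest step might look like the following. The `translate` callable is a hypothetical stand-in for the Perplexity-backed translation call, and the per-language bounded retry is only one possible answer to the retry question left open later in this document; the function name and return shape are likewise assumptions.

```python
from typing import Callable, Dict, Optional

TARGET_LANGUAGES = ("ta", "ml")  # fixed language set for this phase


def translate_article(
    article: Dict[str, str],
    translate: Callable[[str, str], str],  # hypothetical: (text, language) -> translated text
    max_attempts: int = 2,
) -> Dict[str, Optional[Dict[str, str]]]:
    """Request translations for each target language within the same flow.

    A language maps to None when every attempt fails, so the caller can
    store the English base article and mark that translation unavailable
    without failing the whole ingestion cycle.
    """
    results: Dict[str, Optional[Dict[str, str]]] = {}
    for language in TARGET_LANGUAGES:
        results[language] = None
        for _ in range(max_attempts):
            try:
                results[language] = {
                    "headline": translate(article["headline"], language),
                    "summary": translate(article["summary"], language),
                }
                break  # success: stop retrying this language
            except Exception:
                continue  # bounded retry, then give up on this language only
    return results


def fake_translate(text: str, language: str) -> str:
    """Test double that simulates a provider failure for Malayalam."""
    if language == "ml":
        raise RuntimeError("provider unavailable")
    return f"[{language}] {text}"


outcome = translate_article({"headline": "H", "summary": "S"}, fake_translate)
print(outcome)
```

Here a failure in one language does not discard the other language's result, which matches the fallback scenario in the translation spec.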
### Decision: Add optional language input to read APIs with English fallback
**Decision:** Add a language selection input (query param) to the existing read endpoints; if the requested translation is missing, return the English source text.
**Rationale:**
- Preserves endpoint footprint and frontend integration simplicity.
- Guarantees response completeness even when translation fails.
- Supports progressive rollout without breaking existing consumers.
**Alternatives considered:**
- New language-specific endpoints: rejected as unnecessary API surface growth.
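
The read-side behavior can be sketched in plain Python, independent of the web framework. The function names, the `language` field added to the payload, and the choice to silently fall back to English for unsupported codes are all assumptions for illustration; the unsupported-language behavior in particular is still listed as an open question.

```python
from typing import Dict, Optional

SUPPORTED_LANGUAGES = {"en", "ta", "ml"}
DEFAULT_LANGUAGE = "en"


def resolve_language(requested: Optional[str]) -> str:
    """Normalize the optional language input; unknown values become English.

    Assumption: silent fallback rather than a 400 response.
    """
    if not requested:
        return DEFAULT_LANGUAGE
    code = requested.strip().lower()
    return code if code in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


def localize(
    article: Dict[str, str],
    translations: Dict[str, Dict[str, str]],
    requested: Optional[str],
) -> Dict[str, str]:
    """Build a response item with a constant shape regardless of language."""
    language = resolve_language(requested)
    variant = translations.get(language) if language != DEFAULT_LANGUAGE else None
    payload = dict(article)  # base metadata and attribution are kept as-is
    if variant:
        payload["headline"] = variant["headline"]
        payload["summary"] = variant["summary"]
    # Report the language actually served (hypothetical extra field).
    payload["language"] = language if variant else DEFAULT_LANGUAGE
    return payload


article = {"headline": "Hello", "summary": "English summary", "source_url": "https://example.com/a"}
translations = {"ta": {"headline": "வணக்கம்", "summary": "Tamil summary"}}

print(localize(article, translations, "ta"))  # translated headline and summary
print(localize(article, translations, "ml"))  # missing translation: English text
print(localize(article, translations, None))  # no input: existing English behavior
```

Because the language input is optional and defaults to English, existing consumers that never send it see the same payload fields as before, apart from the illustrative `language` key.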
### Decision: Persist frontend language preference in localStorage with cookie fallback
**Decision:** Primary persistence in `localStorage`; optional cookie fallback for constrained browsers.
**Rationale:**
- Simple client-only persistence without backend session dependencies.
- Matches one-page app architecture and current no-auth model.
**Alternatives considered:**
- Cookie-only preference: rejected as less ergonomic for JS state hydration.
## Risks / Trade-offs
- **[Risk] Translation generation increases API cost/latency per ingestion cycle** -> Mitigation: bounded retries, fallback to English when translation unavailable.
- **[Risk] Partial translation failures create mixed-language feed** -> Mitigation: deterministic fallback to English for missing translation rows.
- **[Trade-off] Translation-at-ingest adds synchronous processing time** -> Mitigation: keep language set fixed to two targets in this phase.
- **[Risk] Language preference desynchronization between tabs/devices** -> Mitigation: accept per-browser persistence scope in current architecture.
## Migration Plan
1. Add translation persistence model and migration path.
2. Extend ingestion pipeline to request/store Tamil and Malayalam translations.
3. Add language-aware API response behavior with fallback.
4. Implement frontend language selector + preference persistence.
5. Validate language switching, fallback, and returning-user preference behavior.
Rollback:
- Disable language selection in frontend and return English-only payload while retaining translation data safely.
## Open Questions
- Should translation failures be retried independently per language within the same cycle, or skipped after one failed language call?
- Should unsupported language requests return 400 or silently fall back to English in v1?

@@ -0,0 +1,37 @@
## Why
ClawFort currently publishes content in a single language, which limits accessibility for regional audiences. Adding multilingual delivery now improves usability for Tamil and Malayalam readers while keeping the current English workflow intact.
## What Changes
- **New Capabilities:**
- Persist fetched articles locally in the database.
- Generate Tamil and Malayalam translations for each newly created article using Perplexity.
- Store translated variants as language-specific content items linked to the same base article.
- Add a language selector on the landing page to switch article rendering language.
- Persist user language preference in browser storage (local storage or cookie) and restore it for returning users.
- **Frontend:**
- Add visible language switcher UI on the one-page experience.
- Render hero and feed content in selected language when translation exists.
- **Backend:**
- Extend content generation flow to request and save multilingual outputs.
- Serve language-specific content for existing API reads.
## Capabilities
### New Capabilities
- `article-translations-ml-tm`: Create and store Tamil and Malayalam translated content variants for each article at creation time.
- `language-aware-content-delivery`: Return and render language-specific article fields based on selected language.
- `language-preference-persistence`: Persist and restore user-selected language across sessions for returning users.
### Modified Capabilities
- None.
## Impact
- **Code:** Backend aggregation/storage flow, API response handling, and frontend rendering/state management will be updated.
- **APIs:** Existing read endpoints will need language-aware response behavior or language selection input handling.
- **Dependencies:** Reuses Perplexity integration; no mandatory new external provider expected.
- **Infrastructure:** No deployment topology changes.
- **Environment:** Uses existing Perplexity configuration; may introduce optional translation toggles/settings later.
- **Data:** Adds translation data model/fields linked to each source article.

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: System generates Tamil and Malayalam translations at article creation time
The system SHALL generate Tamil (`ta`) and Malayalam (`ml`) translations for each newly created article during ingestion.
#### Scenario: Translation generation for new article
- **WHEN** a new source article is accepted for storage
- **THEN** the system requests Tamil and Malayalam translations for headline and summary
- **AND** translation generation occurs in the same ingestion flow for that article
#### Scenario: Translation failure fallback
- **WHEN** translation generation fails for one or both target languages
- **THEN** the system stores the base article in English
- **AND** marks missing translations as unavailable without failing the whole ingestion cycle
### Requirement: System stores translation variants linked to the same article
The system SHALL persist language-specific translated content as translation items associated with the base article.
#### Scenario: Persist linked translations
- **WHEN** Tamil and Malayalam translations are generated successfully
- **THEN** the system stores them as language-specific content variants linked to the base article identifier
- **AND** translation records remain queryable by language code
#### Scenario: No duplicate translation variants per language
- **WHEN** translation storage is attempted for an article-language pair that already exists
- **THEN** the system avoids creating duplicate translation items for the same language
- **AND** preserves one authoritative translation variant per article per language in this phase
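
One way to satisfy the no-duplicate scenario is to let the database enforce it. The sketch below assumes SQLite (per the project design), a hypothetical `article_translations` table with a unique constraint on the article-language pair, and a hypothetical `save_translation` helper; `INSERT OR IGNORE` then makes a repeated store attempt a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the UNIQUE constraint is what prevents duplicates.
conn.execute(
    """
    CREATE TABLE article_translations (
        article_id INTEGER NOT NULL,
        language   TEXT NOT NULL,
        headline   TEXT NOT NULL,
        summary    TEXT NOT NULL,
        UNIQUE (article_id, language)
    )
    """
)


def save_translation(conn, article_id, language, headline, summary):
    """Store a translation once; return True only if a new row was written."""
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO article_translations "
        "(article_id, language, headline, summary) VALUES (?, ?, ?, ?)",
        (article_id, language, headline, summary),
    )
    # total_changes does not advance when the insert is ignored.
    return conn.total_changes > before


first = save_translation(conn, 1, "ml", "First headline", "First summary")
second = save_translation(conn, 1, "ml", "Second headline", "Second summary")
print(first, second)
```

In this sketch the first stored variant stays authoritative; replacing it on a later attempt would be a different policy and is not implied by the requirement.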

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: API supports language-aware content retrieval
The system SHALL support language-aware content delivery for hero and feed reads using selected language input.
#### Scenario: Language-specific latest article response
- **WHEN** a client requests latest article data with a supported language selection
- **THEN** the system returns headline and summary in the selected language when available
- **AND** includes the corresponding base article metadata and media attribution
#### Scenario: Language-specific paginated feed response
- **WHEN** a client requests paginated feed data with a supported language selection
- **THEN** the system returns each feed item's headline and summary in the selected language when available
- **AND** preserves existing pagination behavior and ordering semantics
### Requirement: Language fallback to English is deterministic
The system SHALL return English source content when the requested translation is unavailable.
#### Scenario: Missing translation fallback
- **WHEN** a client requests Tamil or Malayalam content for an article lacking that translation
- **THEN** the system returns the English headline and summary for that article
- **AND** response shape remains consistent with language-aware responses
#### Scenario: Unsupported language handling
- **WHEN** a client requests a language outside supported values (`en`, `ta`, `ml`)
- **THEN** the system applies the defined default language behavior for this phase
- **AND** avoids breaking existing consumers of news endpoints

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Landing page provides language selector
The system SHALL display a language selector on the landing page that allows switching between English, Tamil, and Malayalam content views.
#### Scenario: User selects language from landing page
- **WHEN** a user chooses Tamil or Malayalam from the language selector
- **THEN** hero and feed content update to render in the selected language
- **AND** subsequent API requests use the selected language context
#### Scenario: User switches back to English
- **WHEN** a user selects English in the language selector
- **THEN** content renders in English
- **AND** language state updates immediately in the frontend view
### Requirement: User language preference is persisted and restored
The system SHALL persist selected language preference in client-side storage and restore it for returning users.
#### Scenario: Persist language selection
- **WHEN** a user selects a supported language on the landing page
- **THEN** the selected language code is stored in local storage or a client cookie
- **AND** the persisted value is used as preferred language for future visits on the same browser
#### Scenario: Restore preference on return visit
- **WHEN** a returning user opens the landing page
- **THEN** the system reads persisted language preference from client storage
- **AND** initializes the UI and content requests with that language by default

@@ -0,0 +1,40 @@
## 1. Translation Data Model and Persistence
- [x] 1.1 Add translation persistence model linked to base article with language code (`en`, `ta`, `ml`)
- [x] 1.2 Update database initialization/migration path to create translation storage structures
- [x] 1.3 Add repository operations to create/read translation variants by article and language
- [x] 1.4 Enforce no duplicate translation variant for the same article-language pair
## 2. Ingestion Pipeline Translation Generation
- [x] 2.1 Extend ingestion flow to trigger Tamil and Malayalam translation generation for each new article
- [x] 2.2 Reuse Perplexity integration for translation calls with language-specific prompts
- [x] 2.3 Persist generated translations as linked variants during the same ingestion cycle
- [x] 2.4 Implement graceful fallback when translation generation fails (store English base, continue cycle)
## 3. Language-Aware API Delivery
- [x] 3.1 Add language selection input handling to latest-news endpoint
- [x] 3.2 Add language selection input handling to paginated feed endpoint
- [x] 3.3 Return translated headline/summary when available and fall back to English when missing
- [x] 3.4 Define and implement behavior for unsupported language requests in this phase
## 4. Frontend Language Selector and Rendering
- [x] 4.1 Add landing-page language selector UI with English, Tamil, and Malayalam options
- [x] 4.2 Update hero data fetch/render flow to request and display selected language content
- [x] 4.3 Update feed pagination fetch/render flow to request and display selected language content
- [x] 4.4 Keep existing attribution/media rendering behavior intact across language switches
## 5. Preference Persistence and Returning User Behavior
- [x] 5.1 Persist user-selected language in localStorage with cookie fallback
- [x] 5.2 Restore persisted language on page load before initial content fetch
- [x] 5.3 Initialize selector state and API language requests from restored preference
## 6. Validation and Documentation
- [x] 6.1 Validate translation creation and retrieval for Tamil and Malayalam on new articles
- [x] 6.2 Validate fallback behavior for missing translation variants and unsupported language input
- [x] 6.3 Validate returning-user language persistence across browser sessions
- [x] 6.4 Update README with multilingual behavior, language selector usage, and persistence details