# ClawFort — AI News Aggregation One-Pager

A stunning single-page website that automatically aggregates and displays AI news hourly using the Perplexity API.

## Quick Start

### Docker (Recommended)

```bash
cp .env.example .env
# Edit .env and set PERPLEXITY_API_KEY
docker compose up --build
```

Open http://localhost:8000

### Local Development

```bash
pip install -r requirements.txt
cp .env.example .env
# Edit .env and set PERPLEXITY_API_KEY
python -m uvicorn backend.main:app --reload --port 8000
```

## Force Fetch Command

Use the force-fetch command to run one immediate news ingestion cycle outside the hourly scheduler.

```bash
python -m backend.cli force-fetch
```

Common use cases:

- **Bootstrap**: Populate initial content right after first deployment.
- **Recovery**: Re-run ingestion after a failed provider/API cycle.

Command behavior:

- Reuses the existing retry, fallback, dedup, image optimization, and persistence logic.
- Prints success output with the stored item count, for example: `force-fetch succeeded: stored=3 elapsed=5.1s`
- Prints actionable error output on fatal failures and exits non-zero.

Exit codes:

- `0`: Command completed successfully (including runs that store zero new rows)
- `1`: Fatal command failure (for example, missing API keys or an unrecoverable runtime error)

## Multilingual Support

ClawFort supports English (`en`), Tamil (`ta`), and Malayalam (`ml`) content delivery.

- New articles are stored in English and translated to Tamil and Malayalam during ingestion.
- Translations are linked to the same base article and served by the existing news endpoints.
- If a requested translation is unavailable, the API falls back to English.

Language-aware API usage:

```bash
# Latest hero item in Tamil
curl "http://localhost:8000/api/news/latest?language=ta"

# Feed page in Malayalam
curl "http://localhost:8000/api/news?limit=10&language=ml"
```

Unsupported language codes default to English.
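The two fallback rules above (unsupported code means English; missing translation means English) can be sketched as follows. This is an illustration, not the backend's actual code: `resolve_language`, `select_translation`, and the dict-shaped article records are hypothetical, and treating an omitted `language` parameter as English is an assumption.

```python
from typing import Optional

# Language codes listed in this README; English is the fallback.
SUPPORTED_LANGUAGES = frozenset({"en", "ta", "ml"})
FALLBACK_LANGUAGE = "en"


def resolve_language(requested: Optional[str]) -> str:
    """Return the requested code if supported, otherwise English (hypothetical helper)."""
    if requested is not None and requested in SUPPORTED_LANGUAGES:
        return requested
    return FALLBACK_LANGUAGE


def select_translation(base_article: dict, translations: dict, requested: Optional[str]) -> dict:
    """Return the translation for the resolved language, or the English base article.

    `translations` maps a language code to a translated article dict (assumed shape).
    """
    language = resolve_language(requested)
    if language == FALLBACK_LANGUAGE:
        return base_article
    # No stored translation for this language: fall back to the English base article.
    return translations.get(language, base_article)


if __name__ == "__main__":
    base = {"id": 1, "headline": "Model released"}
    translations = {"ta": {"id": 1, "headline": "(Tamil headline)"}}
    print(select_translation(base, translations, "ta")["headline"])  # (Tamil headline)
    print(select_translation(base, translations, "ml")["headline"])  # Model released
    print(select_translation(base, translations, "fr")["headline"])  # Model released
```

Keeping code validation separate from translation lookup means both fallback paths end at the same English record.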
Frontend language selector behavior:

- Landing page includes a language selector (`English`, `Tamil`, `Malayalam`).
- Selected language is persisted in `localStorage` and mirrored in a client cookie.
- Returning users see content in their previously selected language.

## Summary Modal

Each fetched article includes a concise summary artifact that can be opened in a modal from the feed.

Modal structure:

```text
[Relevant Image]

## TL;DR
[bullet points]

## Summary
[Summarized article]

## Source and Citation
[source of the news]

Powered by Perplexity
```

Backend behavior:

- Summary artifacts are generated during ingestion using Perplexity and persisted with article records.
- If summary generation fails, ingestion still succeeds and a fallback summary artifact is stored.
- Summary image enrichment prefers MCP image retrieval when configured, with deterministic fallback behavior.

Umami events for the summary modal:

- `summary-modal-open` with `article_id`
- `summary-modal-close` with `article_id`
- `summary-modal-link-out` with `article_id` and `source_url`

## Policy Pages

The footer includes:

- `Terms of Use` (`/terms`)
- `Attribution` (`/attribution`)

Both pages are served as static frontend documents through FastAPI routes.

## Theme Switcher

The header includes an icon-based theme switcher with four modes:

- `system` (default when unset)
- `light`
- `dark`
- `contrast` (high contrast)

Theme persistence behavior:

- Primary: `localStorage` (`clawfort_theme`)
- Fallback: cookie (`clawfort_theme`)

Returning users get their previously selected theme.

## Cookie Consent and Analytics Gate

Analytics loading is consent-gated:

- Consent banner appears when no consent is stored.
- Clicking `Accept` stores consent in `localStorage` and a cookie (`clawfort_cookie_consent=accepted`).
- Umami analytics script loads only after consent.
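The theme precedence and the consent gate reduce to a few small checks. The real logic runs in the browser (Alpine.js); the Python below only restates the rules for illustration. The function names and the idea of passing raw stored values in as arguments are hypothetical, and treating an unrecognized stored theme as "unset" is an assumption.

```python
from typing import Optional

THEMES = ("system", "light", "dark", "contrast")
CONSENT_ACCEPTED = "accepted"  # value stored as clawfort_cookie_consent=accepted


def resolve_theme(local_storage_value: Optional[str], cookie_value: Optional[str]) -> str:
    """localStorage wins, the cookie is the fallback, and 'system' applies when unset."""
    for candidate in (local_storage_value, cookie_value):
        # Assumption: unrecognized values are ignored as if nothing were stored.
        if candidate is not None and candidate in THEMES:
            return candidate
    return "system"


def should_show_consent_banner(stored_consent: Optional[str]) -> bool:
    """The banner appears only when no consent value has been stored yet."""
    return stored_consent is None


def should_load_umami(stored_consent: Optional[str]) -> bool:
    """The Umami script is injected only after the visitor has accepted."""
    return stored_consent == CONSENT_ACCEPTED
```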
## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `PERPLEXITY_API_KEY` | Yes | — | Perplexity API key for news fetching |
| `IMAGE_QUALITY` | No | `85` | JPEG compression quality (1-100) |
| `OPENROUTER_API_KEY` | No | — | Fallback LLM provider API key |
| `RETENTION_DAYS` | No | `30` | Days to keep news before archiving |
| `UMAMI_SCRIPT_URL` | No | — | Umami analytics script URL |
| `UMAMI_WEBSITE_ID` | No | — | Umami website tracking ID |
| `ROYALTY_IMAGE_PROVIDER` | No | `picsum` | Royalty-free image source (`picsum`, `wikimedia`, or MCP) |
| `ROYALTY_IMAGE_MCP_ENDPOINT` | No | — | MCP endpoint for image retrieval (preferred when set) |
| `ROYALTY_IMAGE_API_KEY` | No | — | Optional API key for image provider integrations |
| `SUMMARY_LENGTH_SCALE` | No | `3` | Summary detail level from `1` (short) to `5` (long) |

## Architecture

- **Backend**: Python (FastAPI) + SQLAlchemy + APScheduler
- **Frontend**: Alpine.js + Tailwind CSS (CDN, no build step)
- **Database**: SQLite with 30-day retention
- **Container**: Single Docker container

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | Serve frontend |
| `GET` | `/api/news/latest` | Latest news item for hero block |
| `GET` | `/api/news` | Paginated news feed |
| `GET` | `/api/health` | Health check with news count |
| `GET` | `/config` | Frontend config (analytics) |

## SEO and Structured Data Contract

Homepage (`/`) contract:

- Core metadata: `title`, `description`, `robots`, canonical link.
- Social metadata: Open Graph (`og:type`, `og:site_name`, `og:title`, `og:description`, `og:url`, `og:image`) and Twitter (`twitter:card`, `twitter:title`, `twitter:description`, `twitter:image`).
- JSON-LD graph includes:
  - `Newspaper` entity for site-level identity.
  - `NewsArticle` entities for hero and feed articles.
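A minimal sketch of how such a graph could be assembled: one `Newspaper` node plus one `NewsArticle` node per unique article, using the `NewsArticle` field baseline from this contract. Assumptions for illustration: input records use the `/api/news` response keys, `source_url` serves as both the article URL and the deduplication key, `updated_at` is a hypothetical optional key, and the publisher/author is modeled as an `Organization` named after the site.

```python
import json
from typing import Iterable


def build_homepage_graph(site_name: str, site_url: str, articles: Iterable[dict],
                         language: str = "en") -> dict:
    """Return a JSON-LD document with one Newspaper and deduplicated NewsArticle nodes."""
    publisher = {"@type": "Organization", "name": site_name}  # assumed publisher/author shape
    graph = [{"@type": "Newspaper", "name": site_name, "url": site_url}]
    seen_urls = set()
    for article in articles:
        url = article["source_url"]
        if url in seen_urls:  # the hero item may also appear in the feed
            continue
        seen_urls.add(url)
        graph.append({
            "@type": "NewsArticle",
            "headline": article["headline"],
            "description": article["summary"],
            "image": article["image_url"],
            "datePublished": article["published_at"],
            # Hypothetical `updated_at`; falls back to the publish time when absent.
            "dateModified": article.get("updated_at", article["published_at"]),
            "url": url,
            "mainEntityOfPage": url,
            "inLanguage": language,
            "publisher": publisher,
            "author": publisher,
        })
    return {"@context": "https://schema.org", "@graph": graph}


if __name__ == "__main__":
    hero = {"source_url": "https://example.com/a", "headline": "A", "summary": "...",
            "image_url": "/static/images/a.jpg", "published_at": "2024-01-01T00:00:00Z"}
    # The hero is passed twice (hero block + feed) but is emitted once.
    print(json.dumps(build_homepage_graph("ClawFort", "http://localhost:8000/", [hero, hero]), indent=2))
```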
Policy page contract (`/terms`, `/attribution`):

- Page-specific `title` and `description`.
- `robots` metadata.
- Canonical link for the route.
- Open Graph and Twitter preview metadata.

Structured data field baseline for each `NewsArticle`:

- `headline`, `description`, `image`, `datePublished`, `dateModified`, `url`, `mainEntityOfPage`, `inLanguage`, `publisher`, `author`.

## Delivery Performance Header Contract

FastAPI middleware applies route-specific cache and compression behavior:

| Route Class | Cache-Control | Notes |
|-------------|---------------|-------|
| `/static/*` | `public, max-age=604800, immutable` | Long-lived static assets |
| `/api/*` | `public, max-age=60, stale-while-revalidate=120` | Short-lived feed data |
| `/`, `/terms`, `/attribution` | `public, max-age=300, stale-while-revalidate=600` | HTML routes |

Additional headers:

- `Vary: Accept-Encoding` for API responses.
- `X-Content-Type-Options: nosniff` for all responses.
- Gzip compression enabled via `GZipMiddleware` for eligible payloads.

## SEO and Performance Verification Checklist

Run after local startup (`python -m uvicorn backend.main:app --reload --port 8000`):

```bash
# HTML route cache checks
curl -I http://localhost:8000/
curl -I http://localhost:8000/terms
curl -I http://localhost:8000/attribution

# API cache + vary checks
curl -I "http://localhost:8000/api/news/latest?language=en"
curl -I "http://localhost:8000/api/news?limit=5&language=en"

# Compression check (expect Content-Encoding: gzip for eligible payloads)
curl -s -H "Accept-Encoding: gzip" -D - "http://localhost:8000/api/news?limit=10&language=en" -o /dev/null
```

Manual acceptance checks:

1. Homepage source contains one canonical link and Open Graph/Twitter metadata fields.
2. Homepage JSON-LD contains one `Newspaper` entity and deduplicated `NewsArticle` entries.
3. Hero/feed/modal images show shimmer placeholders until load/fallback completion.
4. Feed and modal images use `loading="lazy"`, explicit `width`/`height`, and `decoding="async"`.
5. Smooth scrolling behavior is enabled for in-page navigation interactions.

Structured-data validation:

- Validate JSON-LD output using schema-aware validators (e.g., Schema.org validator or equivalent tooling) and confirm `Newspaper` + `NewsArticle` entities pass required field checks.

Regression checks:

- Verify homepage rendering (hero, feed, modal).
- Verify policy-page metadata output.
- Verify cache/compression headers remain unchanged after SEO-related edits.

### GET /api/news

Query parameters:

- `cursor` (int, optional): Last item ID for pagination
- `limit` (int, default 10): Items per page (max 50)
- `exclude_hero` (int, optional): Hero item ID to exclude
- `language` (string, optional): `en`, `ta`, or `ml`; unsupported codes fall back to English

Response:

```json
{
  "items": [
    {
      "id": 1,
      "headline": "...",
      "summary": "...",
      "source_url": "...",
      "image_url": "...",
      "image_credit": "...",
      "published_at": "...",
      "created_at": "..."
    }
  ],
  "next_cursor": 5,
  "has_more": true
}
```

## Deployment

```bash
docker build -t clawfort .
docker run -d \
  -e PERPLEXITY_API_KEY=pplx-xxx \
  -v clawfort-data:/app/data \
  -v clawfort-images:/app/backend/static/images \
  -p 8000:8000 \
  clawfort
```

Data persists across restarts via Docker volumes.

## Scheduled Jobs

- **Hourly**: Fetch latest AI news from Perplexity API
- **Nightly (3 AM)**: Archive news older than 30 days, delete archived items older than 60 days
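The nightly retention rule amounts to two age cutoffs. A minimal sketch, assuming age is measured from each item's `published_at` and that "archived items older than 60 days" means already-archived items whose age exceeds 60 days. `retention_action` is a hypothetical helper, and the 30-day default mirrors `RETENTION_DAYS`.

```python
from datetime import datetime, timedelta, timezone
from typing import Literal, Optional

Action = Literal["keep", "archive", "delete"]


def retention_action(published_at: datetime, archived: bool,
                     now: Optional[datetime] = None,
                     archive_after_days: int = 30,   # mirrors the RETENTION_DAYS default
                     delete_after_days: int = 60) -> Action:
    """Decide what the nightly job should do with one news item."""
    now = now or datetime.now(timezone.utc)
    age = now - published_at
    if archived and age > timedelta(days=delete_after_days):
        return "delete"
    if not archived and age > timedelta(days=archive_after_days):
        # An unarchived item past 60 days is archived first and deleted on a later run.
        return "archive"
    return "keep"
```

Under this sketch an item moves through the states in order (kept, then archived, then deleted), so one nightly run never deletes an item it has not archived before.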