ClawFort — AI News Aggregation One-Pager
A single-page website that automatically aggregates AI news every hour using the Perplexity API and displays it.
Quick Start
Docker (Recommended)
cp .env.example .env
# Edit .env and set PERPLEXITY_API_KEY
docker compose up --build
Local Development
pip install -r requirements.txt
cp .env.example .env
# Edit .env and set PERPLEXITY_API_KEY
python -m uvicorn backend.main:app --reload --port 8000
Force Fetch Command
Use the force-fetch command to run one immediate news ingestion cycle outside the hourly scheduler.
python -m backend.cli force-fetch
Common use cases:
- Bootstrap: Populate initial content right after first deployment.
- Recovery: Re-run ingestion after a failed provider/API cycle.
Command behavior:
- Reuses existing retry, fallback, dedup, image optimization, and persistence logic.
- Prints success output with stored item count, for example:
force-fetch succeeded: stored=3 elapsed=5.1s
- Prints actionable error output on fatal failures and exits non-zero.
Exit codes:
- 0: Command completed successfully (including runs that store zero new rows)
- 1: Fatal command failure (for example, missing API keys or an unrecoverable runtime error)
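For scripts that wrap the command, the success line and exit codes above could be interpreted roughly as follows. This is a minimal sketch: the function name `parse_force_fetch_output` and the regular expression are assumptions derived only from the example line shown in this README, and the real output may carry additional fields.

```python
import re
from typing import Optional

# Assumed pattern, based solely on the example success line in this README.
_SUCCESS_RE = re.compile(r"force-fetch succeeded: stored=(\d+) elapsed=([\d.]+)s")


def parse_force_fetch_output(exit_code: int, stdout: str) -> Optional[dict]:
    """Return parsed stats for a successful run, or None for a fatal failure."""
    if exit_code != 0:
        # Non-zero exit means a fatal failure (for example missing API keys).
        return None
    match = _SUCCESS_RE.search(stdout)
    if match is None:
        # Exit code 0 without a recognizable success line: no stats available.
        return {"stored": None, "elapsed_seconds": None}
    return {
        "stored": int(match.group(1)),
        "elapsed_seconds": float(match.group(2)),
    }
```

A run that stores zero new rows still counts as success, so callers should branch on the exit code rather than on the stored count.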
Quality and Test Suite
Run local quality gates:
pip install -e .[dev]
pytest
ruff check backend tests
CI quality gates are defined in .github/workflows/quality-gates.yml.
Monitoring baseline, thresholds, and alert runbook are documented in docs/quality-and-monitoring.md.
Admin Maintenance Commands
ClawFort includes an admin command suite to simplify operational recovery and maintenance.
# List admin subcommands
python -m backend.cli admin --help
# Fetch n articles on demand
python -m backend.cli admin fetch --count 10
# Refetch images for latest 30 articles (sequential queue + exponential backoff)
python -m backend.cli admin refetch-images --limit 30
# Clean archived records older than N days
python -m backend.cli admin clean-archive --days 60 --confirm
# Clear optimized image cache files
python -m backend.cli admin clear-cache --confirm
# Clear existing news items (includes archived when requested)
python -m backend.cli admin clear-news --include-archived --confirm
# Rebuild content from scratch (clear + fetch)
python -m backend.cli admin rebuild-site --count 10 --confirm
# Regenerate translations for existing articles
python -m backend.cli admin regenerate-translations --limit 100
Safety guardrails:
- Destructive commands require --confirm.
- Dry-run previews are available for applicable commands via --dry-run.
- Admin output follows a structured format like: admin:<command> status=<ok|error|blocked> ...
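Because admin output is structured, a wrapper script could split it into fields. The sketch below assumes that anything after the status is a space-separated list of `key=value` tokens; that detail is not stated in this README, and `parse_admin_line` is a hypothetical helper name.

```python
import re

# Matches "admin:<command> status=<ok|error|blocked>" plus optional trailing text.
_ADMIN_RE = re.compile(
    r"^admin:(?P<command>\S+)\s+status=(?P<status>ok|error|blocked)(?:\s+(?P<rest>.*))?$"
)


def parse_admin_line(line: str) -> dict:
    """Parse one admin output line into command, status, and extra fields."""
    match = _ADMIN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an admin output line: {line!r}")
    fields = {}
    # Assumption: trailing tokens look like key=value; other tokens are ignored.
    for token in (match.group("rest") or "").split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return {
        "command": match.group("command"),
        "status": match.group("status"),
        "fields": fields,
    }
```

A `blocked` status is what a caller would expect to see when a destructive command is run without --confirm, so it is worth handling separately from `error`.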
Multilingual Support
ClawFort supports English (en), Tamil (ta), and Malayalam (ml) content delivery.
- New articles are stored in English and translated to Tamil and Malayalam during ingestion.
- Translations are linked to the same base article and served by the existing news endpoints.
- If a requested translation is unavailable, the API falls back to English.
Language-aware API usage:
# Latest hero item in Tamil
curl "http://localhost:8000/api/news/latest?language=ta"
# Feed page in Malayalam
curl "http://localhost:8000/api/news?limit=10&language=ml"
Unsupported language codes default to English.
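The fallback rules above can be sketched as two small functions. The names `resolve_language` and `pick_translation` are illustrative, and the case-insensitive matching is an assumption rather than documented behavior.

```python
from typing import Mapping, Optional

SUPPORTED_LANGUAGES = ("en", "ta", "ml")
DEFAULT_LANGUAGE = "en"


def resolve_language(requested: Optional[str]) -> str:
    """Map a requested language code to a supported one, defaulting to English."""
    if not requested:
        return DEFAULT_LANGUAGE
    code = requested.strip().lower()  # assumption: codes are matched case-insensitively
    return code if code in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


def pick_translation(translations: Mapping[str, str], requested: Optional[str]) -> str:
    """Return the requested translation, falling back to the English text."""
    language = resolve_language(requested)
    return translations.get(language) or translations[DEFAULT_LANGUAGE]
```

Here `translations` stands for a hypothetical mapping from language code to article text; the English entry is assumed to always exist because articles are stored in English first.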
Frontend language selector behavior:
- Landing page includes a language selector (English, Tamil, Malayalam).
- Selected language is persisted in localStorage and mirrored in a client cookie.
- Returning users see content in their previously selected language.
Summary Modal
Each fetched article now includes a concise summary artifact and can be opened in a modal from the feed.
Modal structure:
[Relevant Image]
## TL;DR
[bullet points]
## Summary
[Summarized article]
## Source and Citation
[source of the news]
Powered by Perplexity
Backend behavior:
- Summary artifacts are generated during ingestion using Perplexity and persisted with article records.
- If summary generation fails, ingestion still succeeds and a fallback summary artifact is stored.
- Summary image enrichment prefers MCP image retrieval when configured, with deterministic fallback behavior.
Umami events for summary modal:
- summary-modal-open with article_id
- summary-modal-close with article_id
- summary-modal-link-out with article_id and source_url
Policy Pages
The footer now includes:
- Terms of Use (/terms)
- Attribution (/attribution)
Both pages are served as static frontend documents through FastAPI routes.
Theme Switcher
The header includes an icon-based theme switcher with four modes:
- system (default when unset)
- light
- dark
- contrast (high contrast)
Theme persistence behavior:
- Primary: localStorage (clawfort_theme)
- Fallback: cookie (clawfort_theme)
Returning users get their previously selected theme.
Cookie Consent and Analytics Gate
Analytics loading is consent-gated:
- Consent banner appears when no consent is stored.
- Clicking Accept stores consent in localStorage and cookie (clawfort_cookie_consent=accepted).
- Umami analytics script loads only after consent.
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| PERPLEXITY_API_KEY | Yes | — | Perplexity API key for news fetching |
| IMAGE_QUALITY | No | 85 | JPEG compression quality (1-100) |
| OPENROUTER_API_KEY | No | — | Fallback LLM provider API key |
| RETENTION_DAYS | No | 30 | Days to keep news before archiving |
| UMAMI_SCRIPT_URL | No | — | Umami analytics script URL |
| UMAMI_WEBSITE_ID | No | — | Umami website tracking ID |
| ROYALTY_IMAGE_PROVIDER | No | picsum | Legacy: single provider selection (deprecated, use ROYALTY_IMAGE_PROVIDERS) |
| ROYALTY_IMAGE_PROVIDERS | No | pixabay,unsplash,pexels,wikimedia,picsum | Comma-separated provider priority chain |
| ROYALTY_IMAGE_MCP_ENDPOINT | No | — | MCP endpoint for image retrieval (highest priority when set) |
| ROYALTY_IMAGE_API_KEY | No | — | Optional API key for image provider integrations |
| PIXABAY_API_KEY | No | — | Pixabay API key (get from pixabay.com/api/docs) |
| UNSPLASH_ACCESS_KEY | No | — | Unsplash API access key (get from unsplash.com/developers) |
| PEXELS_API_KEY | No | — | Pexels API key (get from pexels.com/api) |
| SUMMARY_LENGTH_SCALE | No | 3 | Summary detail level from 1 (short) to 5 (long) |
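The numeric settings in the table have documented ranges. One way to read them defensively is sketched below; clamping out-of-range values and falling back to the default on unparsable input are assumptions, not confirmed ClawFort behavior, and `read_int_setting` and `load_settings` are hypothetical names.

```python
from typing import Mapping


def read_int_setting(env: Mapping[str, str], name: str, default: int, low: int, high: int) -> int:
    """Read an integer setting, using the default if missing or unparsable."""
    try:
        value = int(env.get(name, ""))
    except (TypeError, ValueError):
        return default
    # Assumption: out-of-range values are clamped rather than rejected.
    return max(low, min(high, value))


def load_settings(env: Mapping[str, str]) -> dict:
    """Collect the range-limited settings documented in the table above."""
    return {
        "image_quality": read_int_setting(env, "IMAGE_QUALITY", 85, 1, 100),
        "summary_length_scale": read_int_setting(env, "SUMMARY_LENGTH_SCALE", 3, 1, 5),
    }
```

Passing `os.environ` as `env` would read the live process environment; a plain dict works equally well for tests.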
Image Provider Configuration
ClawFort retrieves royalty-free images for news articles using a configurable provider chain.
Provider Priority
Providers are tried in order until one returns a valid image:
- MCP Endpoint (if ROYALTY_IMAGE_MCP_ENDPOINT is set): custom endpoint, highest priority
- Provider Chain (from ROYALTY_IMAGE_PROVIDERS): comma-separated list, tried in order
Default chain: pixabay,unsplash,pexels,wikimedia,picsum
Supported Providers
| Provider | API Key Variable | Attribution Format |
|---|---|---|
| Pixabay | PIXABAY_API_KEY | "Photo by {user} on Pixabay" |
| Unsplash | UNSPLASH_ACCESS_KEY | "Photo by {name} on Unsplash" (required by TOS) |
| Pexels | PEXELS_API_KEY | "Photo by {photographer} on Pexels" |
| Wikimedia | None (no key needed) | "Wikimedia Commons" |
| Picsum | None (always available) | "Picsum Photos" |
Configuration Examples
# Use Unsplash as primary, fall back to Pexels, then Picsum
ROYALTY_IMAGE_PROVIDERS=unsplash,pexels,picsum
UNSPLASH_ACCESS_KEY=your-unsplash-key
PEXELS_API_KEY=your-pexels-key
# Use only Pixabay
ROYALTY_IMAGE_PROVIDERS=pixabay
PIXABAY_API_KEY=your-pixabay-key
# Disable premium providers, use only free sources
ROYALTY_IMAGE_PROVIDERS=wikimedia,picsum
Providers without configured API keys are automatically skipped.
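The resolution order described in this section can be sketched as a pure function over environment variables. This is illustrative only: `resolve_provider_chain` and the `"mcp"` label are hypothetical, and silently ignoring unknown provider names is an assumption.

```python
from typing import List, Mapping, Optional

# API key variable per provider, taken from the Supported Providers table.
# None means the provider needs no key.
PROVIDER_KEY_VARS: Mapping[str, Optional[str]] = {
    "pixabay": "PIXABAY_API_KEY",
    "unsplash": "UNSPLASH_ACCESS_KEY",
    "pexels": "PEXELS_API_KEY",
    "wikimedia": None,
    "picsum": None,
}
DEFAULT_CHAIN = "pixabay,unsplash,pexels,wikimedia,picsum"


def resolve_provider_chain(env: Mapping[str, str]) -> List[str]:
    """Return the providers that would be tried, in priority order."""
    chain: List[str] = []
    if env.get("ROYALTY_IMAGE_MCP_ENDPOINT"):
        chain.append("mcp")  # hypothetical label for the MCP endpoint
    raw = env.get("ROYALTY_IMAGE_PROVIDERS") or DEFAULT_CHAIN
    for name in (part.strip().lower() for part in raw.split(",")):
        if name not in PROVIDER_KEY_VARS or name in chain:
            continue  # assumption: unknown or duplicate names are ignored
        key_var = PROVIDER_KEY_VARS[name]
        if key_var is None or env.get(key_var):
            chain.append(name)  # providers without a configured key are skipped
    return chain
```

With no variables set, this sketch yields only the key-free providers from the default chain, which matches the statement that providers without configured API keys are skipped.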
Architecture
- Backend: Python (FastAPI) + SQLAlchemy + APScheduler
- Frontend: Alpine.js + Tailwind CSS (CDN, no build step)
- Database: SQLite with 30-day retention
- Container: Single Docker container
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | / | Serve frontend |
| GET | /api/news/latest | Latest news item for hero block |
| GET | /api/news | Paginated news feed |
| GET | /api/health | Health check with news count |
| GET | /config | Frontend config (analytics) |
SEO and Structured Data Contract
Homepage (/) contract:
- Core metadata: title, description, robots, canonical link.
- Social metadata: Open Graph (og:type, og:site_name, og:title, og:description, og:url, og:image) and Twitter (twitter:card, twitter:title, twitter:description, twitter:image).
- JSON-LD graph includes:
  - Newspaper entity for site-level identity.
  - NewsArticle entities for hero and feed articles.
Policy page contract (/terms, /attribution):
- Page-specific title and description.
- robots metadata.
- Canonical link for the route.
- Open Graph and Twitter preview metadata.
Structured data field baseline for each NewsArticle:
- headline, description, image, datePublished, dateModified, url, mainEntityOfPage, inLanguage, publisher, author.
Delivery Performance Header Contract
FastAPI middleware applies route-specific cache and compression behavior:
| Route Class | Cache-Control | Notes |
|---|---|---|
| /static/* | public, max-age=604800, immutable | Long-lived static assets |
| /api/* | public, max-age=60, stale-while-revalidate=120 | Short-lived feed data |
| /, /terms, /attribution | public, max-age=300, stale-while-revalidate=600 | HTML routes |
Additional headers:
- Vary: Accept-Encoding for API responses.
- X-Content-Type-Options: nosniff for all responses.
- Gzip compression enabled via GZipMiddleware for eligible payloads.
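The header contract can be expressed as a small path-to-headers mapping, which is also handy for asserting on curl output. This is a framework-free sketch, not the actual middleware: the function names are hypothetical, and returning no Cache-Control for routes outside the table (such as /config) is an assumption.

```python
from typing import Dict, Optional

HTML_ROUTES = ("/", "/terms", "/attribution")


def cache_control_for(path: str) -> Optional[str]:
    """Return the Cache-Control value for a route class, or None if unlisted."""
    if path.startswith("/static/"):
        return "public, max-age=604800, immutable"
    if path.startswith("/api/"):
        return "public, max-age=60, stale-while-revalidate=120"
    if path in HTML_ROUTES:
        return "public, max-age=300, stale-while-revalidate=600"
    return None  # assumption: routes not in the table get no cache header


def expected_headers_for(path: str) -> Dict[str, str]:
    """Build the response headers this contract implies for a request path."""
    headers = {"X-Content-Type-Options": "nosniff"}  # applies to all responses
    cache_control = cache_control_for(path)
    if cache_control is not None:
        headers["Cache-Control"] = cache_control
    if path.startswith("/api/"):
        headers["Vary"] = "Accept-Encoding"  # API responses only
    return headers
```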
SEO and Performance Verification Checklist
Run after local startup (python -m uvicorn backend.main:app --reload --port 8000):
# HTML route cache checks
curl -I http://localhost:8000/
curl -I http://localhost:8000/terms
curl -I http://localhost:8000/attribution
# API cache + vary checks
curl -I "http://localhost:8000/api/news/latest?language=en"
curl -I "http://localhost:8000/api/news?limit=5&language=en"
# Compression check (expect Content-Encoding: gzip for eligible payloads)
curl -s -H "Accept-Encoding: gzip" -D - "http://localhost:8000/api/news?limit=10&language=en" -o /dev/null
Manual acceptance checks:
- Homepage source contains one canonical link and Open Graph/Twitter metadata fields.
- Homepage JSON-LD contains one Newspaper entity and deduplicated NewsArticle entries.
- Hero/feed/modal images show shimmer placeholders until load/fallback completion.
- Feed and modal images use loading="lazy", explicit width/height, and decoding="async".
- Smooth scrolling behavior is enabled for in-page navigation interactions.
Structured-data validation:
- Validate JSON-LD output using schema-aware validators (e.g., Schema.org validator or equivalent tooling) and confirm Newspaper + NewsArticle entities pass required field checks.
Regression checks:
- Verify homepage rendering (hero, feed, modal).
- Verify policy-page metadata output.
- Verify cache/compression headers remain unchanged after SEO-related edits.
GET /api/news
Query parameters:
- cursor (int, optional): Last item ID for pagination
- limit (int, default 10): Items per page (max 50)
- exclude_hero (int, optional): Hero item ID to exclude
Response:
{
"items": [{ "id": 1, "headline": "...", "summary": "...", "source_url": "...", "image_url": "...", "image_credit": "...", "published_at": "...", "created_at": "..." }],
"next_cursor": 5,
"has_more": true
}
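A client can walk the feed by passing each response's next_cursor back as cursor until has_more is false. The generator below is a minimal sketch of that loop; `iter_news` and the injected `fetch_page` callable are hypothetical, and the HTTP call itself is left to the caller so the logic stays testable offline.

```python
from typing import Callable, Iterator, Optional


def iter_news(
    fetch_page: Callable[[dict], dict],
    limit: int = 10,
    exclude_hero: Optional[int] = None,
) -> Iterator[dict]:
    """Yield every feed item by following next_cursor across pages."""
    limit = max(1, min(limit, 50))  # the endpoint caps limit at 50
    cursor: Optional[int] = None
    while True:
        params = {"limit": limit}
        if cursor is not None:
            params["cursor"] = cursor
        if exclude_hero is not None:
            params["exclude_hero"] = exclude_hero
        page = fetch_page(params)  # expected to return the decoded JSON response
        yield from page["items"]
        if not page.get("has_more") or page.get("next_cursor") is None:
            break
        cursor = page["next_cursor"]
```

In practice `fetch_page` would issue a GET to /api/news with the given query parameters and return the parsed JSON body.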
Deployment
docker build -t clawfort .
docker run -d \
-e PERPLEXITY_API_KEY=pplx-xxx \
-v clawfort-data:/app/data \
-v clawfort-images:/app/backend/static/images \
-p 8000:8000 \
clawfort
Data persists across restarts via Docker volumes.
Scheduled Jobs
- Hourly: Fetch latest AI news from Perplexity API
- Nightly (3 AM): Archive news older than 30 days, delete archived items older than 60 days
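The nightly job's two thresholds can be illustrated with a small decision function. This is a sketch under stated assumptions: `retention_action` is a hypothetical name, and measuring both the 30-day and 60-day ages from the item's creation time is a guess, since this README does not say which timestamp the job uses.

```python
from datetime import datetime, timedelta


def retention_action(
    created_at: datetime,
    now: datetime,
    is_archived: bool,
    retention_days: int = 30,
    purge_days: int = 60,
) -> str:
    """Return "archive", "delete", or "keep" for one news item."""
    age = now - created_at  # assumption: age is measured from creation time
    if is_archived and age > timedelta(days=purge_days):
        return "delete"
    if not is_archived and age > timedelta(days=retention_days):
        return "archive"
    return "keep"
```

For example, with `now` set to 1 March 2024, an unarchived item created on 15 January 2024 would be archived, while an archived item of the same age would be kept until it passes the 60-day mark.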