Santhosh Janardhanan 679561bcdb

ClawFort — AI News Aggregation One-Pager

A single-page website that automatically aggregates AI news every hour using the Perplexity API and displays it.

Quick Start

cp .env.example .env
# Edit .env and set PERPLEXITY_API_KEY

docker compose up --build

Open http://localhost:8000

Local Development

pip install -r requirements.txt
cp .env.example .env
# Edit .env and set PERPLEXITY_API_KEY

python -m uvicorn backend.main:app --reload --port 8000

Force Fetch Command

Use the force-fetch command to run one immediate news ingestion cycle outside the hourly scheduler.

python -m backend.cli force-fetch

Common use cases:

  • Bootstrap: Populate initial content right after first deployment.
  • Recovery: Re-run ingestion after a failed provider/API cycle.

Command behavior:

  • Reuses existing retry, fallback, dedup, image optimization, and persistence logic.
  • Prints success output with stored item count, for example: force-fetch succeeded: stored=3 elapsed=5.1s
  • Prints actionable error output on fatal failures and exits non-zero.

Exit codes:

  • 0: Command completed successfully (including runs that store zero new rows)
  • 1: Fatal command failure (for example missing API keys or unrecoverable runtime error)
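For scripts that wrap this command, the success line can be parsed along the following lines. This is a minimal sketch, not part of ClawFort: the helper name `parse_force_fetch_output` is hypothetical, and the pattern assumes the success line always has the shape of the single example above (`stored=<int> elapsed=<seconds>s`), which the source does not guarantee.

```python
import re
from typing import Optional

# Assumed shape of the success line, inferred from the one example in this README.
_SUCCESS_RE = re.compile(
    r"force-fetch succeeded: stored=(\d+) elapsed=(\d+(?:\.\d+)?)s"
)


def parse_force_fetch_output(line: str) -> Optional[dict]:
    """Return the stored count and elapsed seconds, or None if the line does not match."""
    match = _SUCCESS_RE.search(line)
    if match is None:
        return None
    return {"stored": int(match.group(1)), "elapsed_s": float(match.group(2))}


result = parse_force_fetch_output("force-fetch succeeded: stored=3 elapsed=5.1s")
print(result)
```

A wrapper would typically check the process exit code first (0 or 1, as listed above) and only then look at the output line, since a run that stores zero rows still exits with 0.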

Quality and Test Suite

Run local quality gates:

pip install -e ".[dev]"
pytest
ruff check backend tests

CI quality gates are defined in .github/workflows/quality-gates.yml.

Monitoring baseline, thresholds, and alert runbook are documented in docs/quality-and-monitoring.md.

Admin Maintenance Commands

ClawFort includes an admin command suite to simplify operational recovery and maintenance.

# List admin subcommands
python -m backend.cli admin --help

# Fetch n articles on demand
python -m backend.cli admin fetch --count 10

# Refetch images for latest 30 articles (sequential queue + exponential backoff)
python -m backend.cli admin refetch-images --limit 30

# Clean archived records older than N days
python -m backend.cli admin clean-archive --days 60 --confirm

# Clear optimized image cache files
python -m backend.cli admin clear-cache --confirm

# Clear existing news items (includes archived when requested)
python -m backend.cli admin clear-news --include-archived --confirm

# Rebuild content from scratch (clear + fetch)
python -m backend.cli admin rebuild-site --count 10 --confirm

# Regenerate translations for existing articles
python -m backend.cli admin regenerate-translations --limit 100

Safety guardrails:

  • Destructive commands require --confirm.
  • Dry-run previews are available for applicable commands via --dry-run.
  • Admin output follows a structured format like: admin:<command> status=<ok|error|blocked> ....
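Because admin output is structured, it can be consumed by monitoring or wrapper scripts. The sketch below is an illustration under assumptions: the helper name `parse_admin_line` is hypothetical, the `removed=12` field in the example is invented for demonstration, and the parser assumes fields are space-separated `key=value` pairs whose values contain no spaces.

```python
def parse_admin_line(line: str) -> dict:
    """Split 'admin:<command> key=value ...' into a dict of strings.

    Raises ValueError when the line does not start with 'admin:'.
    """
    head, _, rest = line.strip().partition(" ")
    if not head.startswith("admin:"):
        raise ValueError(f"not an admin output line: {line!r}")
    fields = {"command": head[len("admin:"):]}
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


# 'removed=12' is a made-up field used only for this example.
parsed = parse_admin_line("admin:clear-cache status=ok removed=12")
print(parsed["command"], parsed["status"])
```

All values are kept as strings; a caller that needs counts or durations would convert them explicitly.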

Multilingual Support

ClawFort supports English (en), Tamil (ta), and Malayalam (ml) content delivery.

  • New articles are stored in English and translated to Tamil and Malayalam during ingestion.
  • Translations are linked to the same base article and served by the existing news endpoints.
  • If a requested translation is unavailable, the API falls back to English.

Language-aware API usage:

# Latest hero item in Tamil
curl "http://localhost:8000/api/news/latest?language=ta"

# Feed page in Malayalam
curl "http://localhost:8000/api/news?limit=10&language=ml"

Unsupported language codes default to English.
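The fallback rules above can be sketched as follows. This is an illustration, not the backend's actual code: `resolve_language` and `pick_translation` are hypothetical names, and treating codes case-insensitively is an assumption.

```python
from typing import Optional

SUPPORTED_LANGUAGES = ("en", "ta", "ml")
DEFAULT_LANGUAGE = "en"


def resolve_language(requested: Optional[str]) -> str:
    """Map a requested code to a supported one, defaulting to English."""
    if not requested:
        return DEFAULT_LANGUAGE
    code = requested.strip().lower()  # case-insensitivity is an assumption
    return code if code in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


def pick_translation(translations: dict, requested: Optional[str]) -> str:
    """Return the requested translation, or the English text if it is unavailable."""
    language = resolve_language(requested)
    return translations.get(language, translations[DEFAULT_LANGUAGE])


headlines = {"en": "New model released", "ta": "புதிய மாதிரி வெளியீடு"}
print(resolve_language("fr"))             # unsupported code
print(pick_translation(headlines, "ml"))  # Malayalam text missing for this item
```

The two steps are kept separate because they cover different cases: an unsupported code is normalized before lookup, while a supported code with a missing translation falls back at lookup time.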

Frontend language selector behavior:

  • Landing page includes a language selector (English, Tamil, Malayalam).
  • Selected language is persisted in localStorage and mirrored in a client cookie.
  • Returning users see content in their previously selected language.

Summary Modal

Each fetched article now includes a concise summary artifact and can be opened in a modal from the feed.

Modal structure:

[Relevant Image]

## TL;DR
[bullet points]

## Summary
[Summarized article]

## Source and Citation
[source of the news]

Powered by Perplexity

Backend behavior:

  • Summary artifacts are generated during ingestion using Perplexity and persisted with article records.
  • If summary generation fails, ingestion still succeeds and a fallback summary artifact is stored.
  • Summary image enrichment prefers MCP image retrieval when configured, with deterministic fallback behavior.

Umami events for summary modal:

  • summary-modal-open with article_id
  • summary-modal-close with article_id
  • summary-modal-link-out with article_id and source_url

Policy Pages

The footer now includes:

  • Terms of Use (/terms)
  • Attribution (/attribution)

Both pages are served as static frontend documents through FastAPI routes.

Theme Switcher

The header includes an icon-based theme switcher with four modes:

  • system (default when unset)
  • light
  • dark
  • contrast (high contrast)

Theme persistence behavior:

  • Primary: localStorage (clawfort_theme)
  • Fallback: cookie (clawfort_theme)

Returning users get their previously selected theme.

Analytics loading is consent-gated:

  • Consent banner appears when no consent is stored.
  • Clicking Accept stores consent in localStorage and cookie (clawfort_cookie_consent=accepted).
  • Umami analytics script loads only after consent.

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| PERPLEXITY_API_KEY | Yes | | Perplexity API key for news fetching |
| IMAGE_QUALITY | No | 85 | JPEG compression quality (1-100) |
| OPENROUTER_API_KEY | No | | Fallback LLM provider API key |
| RETENTION_DAYS | No | 30 | Days to keep news before archiving |
| UMAMI_SCRIPT_URL | No | | Umami analytics script URL |
| UMAMI_WEBSITE_ID | No | | Umami website tracking ID |
| ROYALTY_IMAGE_PROVIDER | No | picsum | Legacy: single provider selection (deprecated, use ROYALTY_IMAGE_PROVIDERS) |
| ROYALTY_IMAGE_PROVIDERS | No | pixabay,unsplash,pexels,wikimedia,picsum | Comma-separated provider priority chain |
| ROYALTY_IMAGE_MCP_ENDPOINT | No | | MCP endpoint for image retrieval (highest priority when set) |
| ROYALTY_IMAGE_API_KEY | No | | Optional API key for image provider integrations |
| PIXABAY_API_KEY | No | | Pixabay API key (get from pixabay.com/api/docs) |
| UNSPLASH_ACCESS_KEY | No | | Unsplash API access key (get from unsplash.com/developers) |
| PEXELS_API_KEY | No | | Pexels API key (get from pexels.com/api) |
| SUMMARY_LENGTH_SCALE | No | 3 | Summary detail level from 1 (short) to 5 (long) |
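The numeric settings (IMAGE_QUALITY, RETENTION_DAYS, SUMMARY_LENGTH_SCALE) come with defaults and, for two of them, documented ranges. The sketch below shows one way to read such values. The helper `read_int_env` is hypothetical, and rejecting out-of-range values with an error is an assumption; the application may clamp or ignore them instead.

```python
import os
from typing import Mapping, Optional


def read_int_env(
    name: str,
    default: int,
    low: int,
    high: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Read an integer setting, applying the default when unset or empty."""
    source = os.environ if env is None else env
    raw = source.get(name, "").strip()
    if raw == "":
        return default
    value = int(raw)  # raises ValueError for non-numeric input
    if not low <= value <= high:
        # Rejecting is an assumption; the real backend might clamp instead.
        raise ValueError(f"{name}={value} is outside {low}-{high}")
    return value


image_quality = read_int_env("IMAGE_QUALITY", default=85, low=1, high=100)
summary_scale = read_int_env("SUMMARY_LENGTH_SCALE", default=3, low=1, high=5)
print(image_quality, summary_scale)
```

Passing an explicit `env` mapping makes the helper easy to exercise in tests without touching the process environment.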

Image Provider Configuration

ClawFort retrieves royalty-free images for news articles using a configurable provider chain.

Provider Priority

Providers are tried in order until one returns a valid image:

  1. MCP Endpoint (if ROYALTY_IMAGE_MCP_ENDPOINT is set) — Custom endpoint, highest priority
  2. Provider Chain (from ROYALTY_IMAGE_PROVIDERS) — Comma-separated list, tried in order

Default chain: pixabay,unsplash,pexels,wikimedia,picsum

Supported Providers

| Provider | API Key Variable | Attribution Format |
| --- | --- | --- |
| Pixabay | PIXABAY_API_KEY | "Photo by {user} on Pixabay" |
| Unsplash | UNSPLASH_ACCESS_KEY | "Photo by {name} on Unsplash" (required by TOS) |
| Pexels | PEXELS_API_KEY | "Photo by {photographer} on Pexels" |
| Wikimedia | None (no key needed) | "Wikimedia Commons" |
| Picsum | None (always available) | "Picsum Photos" |

Configuration Examples

# Use Unsplash as primary, fall back to Pexels, then Picsum
ROYALTY_IMAGE_PROVIDERS=unsplash,pexels,picsum
UNSPLASH_ACCESS_KEY=your-unsplash-key
PEXELS_API_KEY=your-pexels-key

# Use only Pixabay
ROYALTY_IMAGE_PROVIDERS=pixabay
PIXABAY_API_KEY=your-pixabay-key

# Skip providers that require API keys; use only keyless sources
ROYALTY_IMAGE_PROVIDERS=wikimedia,picsum

Providers without configured API keys are automatically skipped.
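The resolution rules described in this section (MCP endpoint first, then the configured chain, skipping providers whose key is missing) can be sketched as below. This is an illustration only: `effective_chain` and the `"mcp"` label are hypothetical names, and silently ignoring unknown provider names is an assumption.

```python
from typing import List, Mapping

# Key variable per provider, as listed under Supported Providers; None means no key needed.
PROVIDER_KEY_VARS = {
    "pixabay": "PIXABAY_API_KEY",
    "unsplash": "UNSPLASH_ACCESS_KEY",
    "pexels": "PEXELS_API_KEY",
    "wikimedia": None,
    "picsum": None,
}
DEFAULT_CHAIN = "pixabay,unsplash,pexels,wikimedia,picsum"


def effective_chain(env: Mapping[str, str]) -> List[str]:
    """Return the providers that would be tried, in order."""
    chain = []
    if env.get("ROYALTY_IMAGE_MCP_ENDPOINT"):
        chain.append("mcp")  # hypothetical label for the custom endpoint
    raw = env.get("ROYALTY_IMAGE_PROVIDERS") or DEFAULT_CHAIN
    for name in (part.strip().lower() for part in raw.split(",")):
        if name not in PROVIDER_KEY_VARS:
            continue  # unknown or empty names are ignored (assumption)
        key_var = PROVIDER_KEY_VARS[name]
        if key_var is None or env.get(key_var):
            chain.append(name)
    return chain


print(effective_chain({
    "ROYALTY_IMAGE_PROVIDERS": "unsplash,pexels,picsum",
    "PEXELS_API_KEY": "your-pexels-key",
}))
```

In this example Unsplash is dropped because no UNSPLASH_ACCESS_KEY is set, so Pexels becomes the first provider tried and Picsum the last resort.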

Architecture

  • Backend: Python (FastAPI) + SQLAlchemy + APScheduler
  • Frontend: Alpine.js + Tailwind CSS (CDN, no build step)
  • Database: SQLite with 30-day retention
  • Container: Single Docker container

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | / | Serve frontend |
| GET | /api/news/latest | Latest news item for hero block |
| GET | /api/news | Paginated news feed |
| GET | /api/health | Health check with news count |
| GET | /config | Frontend config (analytics) |

SEO and Structured Data Contract

Homepage (/) contract:

  • Core metadata: title, description, robots, canonical link.
  • Social metadata: Open Graph (og:type, og:site_name, og:title, og:description, og:url, og:image) and Twitter (twitter:card, twitter:title, twitter:description, twitter:image).
  • JSON-LD graph includes:
    • Newspaper entity for site-level identity.
    • NewsArticle entities for hero and feed articles.

Policy page contract (/terms, /attribution):

  • Page-specific title and description.
  • robots metadata.
  • Canonical link for the route.
  • Open Graph and Twitter preview metadata.

Structured data field baseline for each NewsArticle:

  • headline, description, image, datePublished, dateModified, url, mainEntityOfPage, inLanguage, publisher, author.
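As an illustration of the field baseline, the sketch below builds a NewsArticle object from one feed item. It is not the project's generator: the function name `news_article_jsonld` is hypothetical, the mapping from API fields (`summary`, `image_url`, `source_url`, `published_at`) to schema fields is an assumption, and so are the reuse of `published_at` for `dateModified` and the Organization type for publisher and author.

```python
import json


def news_article_jsonld(item: dict, language: str, site_name: str, page_url: str) -> dict:
    """Build a NewsArticle dict covering the baseline fields (assumed mapping)."""
    organization = {"@type": "Organization", "name": site_name}
    return {
        "@type": "NewsArticle",
        "headline": item["headline"],
        "description": item["summary"],
        "image": item["image_url"],
        "datePublished": item["published_at"],
        "dateModified": item["published_at"],  # assumption: no separate modified time
        "url": item["source_url"],
        "mainEntityOfPage": page_url,
        "inLanguage": language,
        "publisher": organization,
        "author": organization,
    }


example_item = {
    "headline": "Example headline",
    "summary": "Example summary",
    "image_url": "http://localhost:8000/static/images/example.jpg",
    "source_url": "https://example.com/story",
    "published_at": "2026-02-13T09:00:00Z",
}
article = news_article_jsonld(example_item, "en", "ClawFort", "http://localhost:8000/")
print(json.dumps(article, indent=2))
```

A check like `set(article) - {"@type"}` against the baseline list is a cheap way to catch a dropped field before running a full schema validator.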

Delivery Performance Header Contract

FastAPI middleware applies route-specific cache and compression behavior:

| Route Class | Cache-Control | Notes |
| --- | --- | --- |
| /static/* | public, max-age=604800, immutable | Long-lived static assets |
| /api/* | public, max-age=60, stale-while-revalidate=120 | Short-lived feed data |
| /, /terms, /attribution | public, max-age=300, stale-while-revalidate=600 | HTML routes |

Additional headers:

  • Vary: Accept-Encoding for API responses.
  • X-Content-Type-Options: nosniff for all responses.
  • Gzip compression enabled via GZipMiddleware for eligible payloads.
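The route-class rules above amount to a small lookup from request path to Cache-Control value. The sketch below shows that lookup in isolation; `cache_control_for` is a hypothetical helper, not the middleware itself, and returning `None` for routes outside the three classes (for example `/config`) is an assumption, since the contract does not cover them.

```python
from typing import Optional

HTML_ROUTES = ("/", "/terms", "/attribution")


def cache_control_for(path: str) -> Optional[str]:
    """Return the Cache-Control value for a path, or None if no rule applies."""
    if path.startswith("/static/"):
        return "public, max-age=604800, immutable"
    if path.startswith("/api/"):
        return "public, max-age=60, stale-while-revalidate=120"
    if path in HTML_ROUTES:
        return "public, max-age=300, stale-while-revalidate=600"
    return None  # assumption: other routes get no explicit header


for path in ("/static/app.css", "/api/news", "/terms", "/config"):
    print(path, "->", cache_control_for(path))
```

Keeping the mapping in a pure function like this makes the header contract testable without starting the server.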

SEO and Performance Verification Checklist

Run after local startup (python -m uvicorn backend.main:app --reload --port 8000):

# HTML route cache checks
curl -I http://localhost:8000/
curl -I http://localhost:8000/terms
curl -I http://localhost:8000/attribution

# API cache + vary checks
curl -I "http://localhost:8000/api/news/latest?language=en"
curl -I "http://localhost:8000/api/news?limit=5&language=en"

# Compression check (expect Content-Encoding: gzip for eligible payloads)
curl -s -H "Accept-Encoding: gzip" -D - "http://localhost:8000/api/news?limit=10&language=en" -o /dev/null

Manual acceptance checks:

  1. Homepage source contains one canonical link and Open Graph/Twitter metadata fields.
  2. Homepage JSON-LD contains one Newspaper entity and deduplicated NewsArticle entries.
  3. Hero/feed/modal images show shimmer placeholders until load/fallback completion.
  4. Feed and modal images use loading="lazy", explicit width/height, and decoding="async".
  5. Smooth scrolling behavior is enabled for in-page navigation interactions.

Structured-data validation:

  • Validate JSON-LD output using schema-aware validators (e.g., Schema.org validator or equivalent tooling) and confirm Newspaper + NewsArticle entities pass required field checks.

Regression checks:

  • Verify homepage rendering (hero, feed, modal).
  • Verify policy-page metadata output.
  • Verify cache/compression headers remain unchanged after SEO-related edits.

GET /api/news

Query parameters:

  • cursor (int, optional): Last item ID for pagination
  • limit (int, default 10): Items per page (max 50)
  • exclude_hero (int, optional): Hero item ID to exclude
  • language (string, optional): Content language, one of en, ta, ml; unsupported codes default to English

Response:

{
  "items": [{ "id": 1, "headline": "...", "summary": "...", "source_url": "...", "image_url": "...", "image_credit": "...", "published_at": "...", "created_at": "..." }],
  "next_cursor": 5,
  "has_more": true
}
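A client can walk the whole feed by passing each response's `next_cursor` back as `cursor` until `has_more` is false. The sketch below is a minimal example under assumptions: `iter_news` and `fetch_page_http` are hypothetical names, the base URL is the local development address used elsewhere in this README, and stopping when `next_cursor` is missing is a defensive choice rather than documented behavior.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page_http(cursor: Optional[int], base_url: str = "http://localhost:8000",
                    limit: int = 10) -> dict:
    """Fetch one page of /api/news; requires a running server."""
    params = {"limit": limit}
    if cursor is not None:
        params["cursor"] = cursor
    with urlopen(f"{base_url}/api/news?{urlencode(params)}") as response:
        return json.load(response)


def iter_news(fetch_page: Callable[[Optional[int]], dict]) -> Iterator[dict]:
    """Yield every item, following next_cursor until has_more is false."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        if not page.get("has_more") or page.get("next_cursor") is None:
            break
        cursor = page["next_cursor"]


# Offline demonstration with canned pages instead of HTTP.
_pages = {
    None: {"items": [{"id": 9}, {"id": 7}], "next_cursor": 7, "has_more": True},
    7: {"items": [{"id": 5}], "next_cursor": None, "has_more": False},
}
print([item["id"] for item in iter_news(lambda cursor: _pages[cursor])])
```

Against a live instance, `iter_news(fetch_page_http)` would issue one request per page; the fetch function is injected so the loop can be tested without a server.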

Deployment

docker build -t clawfort .
docker run -d \
  -e PERPLEXITY_API_KEY=pplx-xxx \
  -v clawfort-data:/app/data \
  -v clawfort-images:/app/backend/static/images \
  -p 8000:8000 \
  clawfort

Data persists across restarts via Docker volumes.

Scheduled Jobs

  • Hourly: Fetch latest AI news from Perplexity API
  • Nightly (3 AM): Archive news older than 30 days, delete archived items older than 60 days
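The nightly rules can be expressed as a small decision function. This is a sketch, not the scheduler's code: `retention_action` is a hypothetical name, and measuring both thresholds from the publication time is an assumption, because the source does not say whether the 60-day limit counts from publication or from archival.

```python
from datetime import datetime, timedelta, timezone


def retention_action(published_at: datetime, now: datetime,
                     archive_after_days: int = 30,
                     delete_after_days: int = 60) -> str:
    """Return 'keep', 'archive', or 'delete' based on item age (assumed from publication)."""
    age = now - published_at
    if age > timedelta(days=delete_after_days):
        return "delete"
    if age > timedelta(days=archive_after_days):
        return "archive"
    return "keep"


now = datetime(2026, 2, 13, 3, 0, tzinfo=timezone.utc)
for days_old in (10, 45, 90):
    print(days_old, retention_action(now - timedelta(days=days_old), now))
```

The 30-day default mirrors RETENTION_DAYS; passing the thresholds as parameters keeps the function usable if that setting is changed.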