# AGENTS.md - Privacy Policy Analyzer

This file provides essential context and guidelines for AI agents working on this project.
## Project Overview

**Privacy Policy Analyzer** - A self-hosted web application that analyzes website privacy policies using AI (a local Ollama model with an OpenAI fallback). Provides easy-to-understand A-E grades and detailed findings about privacy practices.

**Inspiration:** ToS;DR (Terms of Service; Didn't Read) - but focused specifically on privacy policies.

**Repository:** Private pet project, no monetization.
## Tech Stack

- **Runtime:** Bun (JavaScript, NOT TypeScript)
- **Web Framework:** Native Bun HTTP server or Elysia.js (lightweight)
- **Database:** PostgreSQL 15
- **Search:** Meilisearch v1.6
- **Cache:** Redis 7
- **Templating:** EJS
- **AI:** Ollama (local LLM - gpt-oss:latest) with OpenAI fallback
- **Containerization:** Docker + Docker Compose
- **Hosting:** Self-hosted on Linode
## Project Structure

```
privacy-policy-analyzer/
├── docker-compose.yml    # Service orchestration
├── Dockerfile            # Bun app container
├── .env                  # Environment variables (gitignored)
├── package.json          # Bun dependencies
├── src/
│   ├── app.js            # Entry point
│   ├── config/           # Configuration files
│   ├── models/           # Database models
│   ├── routes/           # Route definitions
│   ├── controllers/      # Request handlers
│   ├── services/         # Business logic
│   ├── middleware/       # Express-style middleware
│   ├── views/            # EJS templates
│   └── utils/            # Helper functions
├── migrations/           # SQL migrations
└── public/               # Static assets
```
## Progress Tracking

This project uses a task tracking system to monitor progress. Tasks are managed with the todo tool and organized by priority.

### Priority Levels

- **High:** Critical infrastructure and core functionality
- **Medium:** Essential features and business logic
- **Low:** Enhancements, optimizations, and polish
### Progress Checklist (48 Tasks Total)

#### Phase 1: Infrastructure Setup (High Priority) - COMPLETED ✓

- [x] Create project root files (docker-compose.yml, Dockerfile, .env.example, package.json)
- [x] Create directory structure (src/, migrations/, public/)
- [x] Configure PostgreSQL in docker-compose.yml with persistent volume
- [x] Configure Redis in docker-compose.yml with persistent volume
- [x] Configure Meilisearch in docker-compose.yml with persistent volume
- [x] Create Bun Dockerfile with optimized build
- [x] Set up .env.example with all required environment variables
- [x] Create package.json with dependencies (postgres, ejs, openai, etc.)
- [x] Test Docker Compose setup - verify all services start
#### Phase 2: Database & Models (Medium Priority) - COMPLETED ✓

- [x] Create database migration file (001_initial.sql) with schema
- [x] Create src/config/database.js for PostgreSQL connection
- [x] Create src/config/redis.js for Redis connection
- [x] Create src/config/meilisearch.js for Meilisearch client
- [x] Create src/config/openai.js for OpenAI client
- [x] Create database migration runner script
- [x] Create src/models/Service.js
- [x] Create src/models/PolicyVersion.js
- [x] Create src/models/Analysis.js
- [x] Create src/models/AdminSession.js
#### Phase 3: Middleware & Routes (Medium Priority) - COMPLETED ✓

- [x] Create src/middleware/auth.js for session authentication
- [x] Create src/middleware/errorHandler.js
- [x] Create src/middleware/security.js for security headers
- [x] Create src/middleware/rateLimiter.js
- [x] Create src/routes/admin.js with authentication routes
- [x] Create src/views/admin/login.ejs
- [x] Create admin dashboard view
- [x] Create src/routes/public.js for public pages
- [x] Create main layout EJS template with SEO meta tags
- [x] Create public homepage view with service listing
- [x] Create service detail page view with last analyzed date display
#### Phase 4: Services & Features (Medium Priority) - COMPLETED ✓

- [x] Create src/services/policyFetcher.js to fetch policy from URL
- [x] Create src/services/aiAnalyzer.js with OpenAI integration
- [x] Create admin service management forms (add/edit)
- [x] Implement manual analysis trigger in admin panel
- [x] Create src/services/scheduler.js for cron jobs
- [x] Create src/services/searchIndexer.js for Meilisearch
#### Phase 5: Enhancements (Low Priority) - COMPLETED ✓

- [x] Implement Redis caching for public pages
- [x] Create sitemap.xml generator
- [x] Create robots.txt
- [x] Add structured data (Schema.org) to service pages
- [x] Implement accessibility features (WCAG 2.1 AA) - Already implemented
- [x] Add CSS styling with focus indicators - Already implemented
- [x] Implement skip to main content link - Already implemented
- [x] Performance testing and optimization
- [x] Security audit and penetration testing
- [x] Accessibility audit with axe-core
- [x] SEO audit and optimization
- [x] Create comprehensive documentation
### Working with Tasks

- ALWAYS check the current todo list before starting work
- Update task status to `in_progress` when starting work
- Mark tasks complete immediately after finishing
- Verify completed tasks using the testing checklists in this document
- Review progress regularly to maintain momentum
### Current Phase Focus

**ALL PHASES COMPLETE!** 🎉

The Privacy Policy Analyzer is now fully functional with all 48 tasks completed. The application includes:

- Complete Docker infrastructure with PostgreSQL, Redis, Meilisearch, and Ollama
- Full CRUD operations for services
- AI-powered privacy analysis with background job processing
- Redis caching for performance
- SEO optimization with sitemap and structured data
- WCAG 2.1 AA accessibility compliance
- Security best practices (OWASP Top 10)
- Comprehensive documentation
## Critical Rules

### 1. JavaScript Only

- NO TypeScript
- Use JSDoc comments for type documentation when helpful
- Bun supports modern JavaScript (ES2023)
### 2. Database Conventions

- Use the `postgres` library (Bun-compatible)
- Always use parameterized queries
- Migrations live in the `migrations/` folder, numbered sequentially
- Never write raw SQL in routes/controllers
### 3. Environment Variables

ALL configuration goes in `.env`:

```
DATABASE_URL=postgresql://user:pass@postgres:5432/dbname
REDIS_URL=redis://redis:6379
MEILISEARCH_URL=http://meilisearch:7700
MEILISEARCH_API_KEY=key
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme
SESSION_SECRET=random_string
PORT=3000
NODE_ENV=production
```
### 4. AI Analysis Guidelines

- Always use OpenAI's JSON mode for structured output
- Store the raw AI response in the database (for debugging)
- Implement retry logic with exponential backoff
- Rate limit AI calls (max 10/minute)
- Handle AI failures gracefully - don't crash the app
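The retry guideline above can be sketched as a small wrapper (a minimal sketch only; `withRetry` and its parameters are illustrative names, not existing project code):

```javascript
// Retry an async operation with exponential backoff.
// withRetry, maxAttempts, and baseDelayMs are illustrative names.
async function withRetry(fn, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Wait 500ms, 1000ms, 2000ms, ... before the next attempt
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // Give up after maxAttempts; caller handles the failure
}
```

A call site would wrap the model request, e.g. `const analysis = await withRetry(() => callModel(policyText));`, so a transient AI failure never crashes the app.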
### 5. Security Requirements (OWASP Top 10)

- NEVER commit the `.env` file
- NEVER log API keys or passwords
- Use bcrypt for password hashing (cost factor 12)
- Session tokens stored in Redis with expiration (24 hours)
- All admin routes require authentication middleware
- Input validation on ALL user inputs with proper sanitization
- SQL injection prevention via parameterized queries ONLY
- XSS prevention via EJS auto-escaping AND Content Security Policy
- Rate limiting: 100 req/15 min public, 30 req/15 min admin, 10 req/hour AI
- Security headers REQUIRED on all responses:
  - Strict-Transport-Security
  - Content-Security-Policy
  - X-Content-Type-Options: nosniff
  - X-Frame-Options: DENY
  - X-XSS-Protection: 1; mode=block
  - Referrer-Policy: strict-origin-when-cross-origin
- HTTPS only with HSTS
- Secure cookies (HttpOnly, Secure, SameSite=Strict)
- Regular dependency audits (`bun audit`)
- Non-root Docker user
- Log authentication attempts and errors (NEVER log sensitive data)
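The rate-limiting rule above amounts to a fixed-window counter per client. A minimal sketch (`createRateLimiter` is an illustrative name; in the real middleware the counters would live in Redis via `INCR` + `EXPIRE`, but the window logic is the same):

```javascript
// Fixed-window rate limiter. createRateLimiter is an illustrative name;
// production counters would live in Redis (INCR + EXPIRE). A plain Map
// keeps this sketch self-contained.
function createRateLimiter({ limit, windowMs }) {
  const windows = new Map(); // key -> { count, resetAt }
  return function isAllowed(key, now = Date.now()) {
    const entry = windows.get(key);
    if (!entry || now >= entry.resetAt) {
      // First request in a fresh window: reset the counter
      windows.set(key, { count: 1, resetAt: now + windowMs });
      return true;
    }
    entry.count++;
    return entry.count <= limit;
  };
}
```

For example, `const publicLimiter = createRateLimiter({ limit: 100, windowMs: 15 * 60 * 1000 });` implements the public tier; middleware would call `publicLimiter(clientIp)` and respond with 429 when it returns false.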
### 6. Error Handling Pattern

```javascript
try {
  // Operation
} catch (error) {
  console.error('Context:', error.message);
  // Return a user-friendly error
  return new Response('Error message', { status: 500 });
}
```
### 7. Code Style

- Use single quotes for strings
- 2-space indentation
- Semicolons required
- camelCase for variables/functions
- PascalCase for classes
- No trailing commas
- Max line length: 100 characters
## Common Commands

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f app

# Run database migrations
docker-compose exec app bun run migrate

# Restart app only
docker-compose restart app

# Shell into app container
docker-compose exec app sh

# Install new dependency
docker-compose exec app bun add package-name

# Run tests (when added)
docker-compose exec app bun test
```
## Database Schema

### services

- id (PK, serial)
- name (varchar)
- url (varchar)
- logo_url (varchar, nullable)
- policy_url (varchar)
- created_at (timestamp)
- updated_at (timestamp)

### policy_versions

- id (PK, serial)
- service_id (FK)
- content (text)
- content_hash (varchar 64)
- fetched_at (timestamp)
- created_at (timestamp)

### analyses

- id (PK, serial)
- service_id (FK)
- policy_version_id (FK)
- overall_score (char 1: A/B/C/D/E)
- findings (JSONB)
- raw_analysis (text)
- created_at (timestamp) - this is the "last analyzed" date, which must be displayed on all service pages
- updated_at (timestamp)

### admin_sessions

- id (PK, serial)
- session_token (varchar, unique)
- created_at (timestamp)
- expires_at (timestamp)
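The 64-character `content_hash` column fits a hex-encoded SHA-256 digest. A sketch of change detection under that assumption (`hashPolicyContent` and `policyChanged` are illustrative names; the actual scheme lives in src/services/policyFetcher.js):

```javascript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 is exactly 64 characters, matching
// policy_versions.content_hash (varchar 64). SHA-256 is an assumption
// here; check src/services/policyFetcher.js for the real scheme.
function hashPolicyContent(content) {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Only insert a new policy_versions row when the fetched text changed.
function policyChanged(previousHash, content) {
  return hashPolicyContent(content) !== previousHash;
}
```

Comparing hashes instead of full texts keeps the "did the policy change?" check cheap even for very long policies.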
## AI Prompt Template

When modifying AI analysis, use this structure:

```javascript
const prompt = {
  model: process.env.OPENAI_MODEL,
  messages: [
    {
      role: 'system',
      content: `You are a privacy policy analyzer. Analyze the following privacy policy and provide a structured assessment.

Scoring Criteria:
- A: Excellent privacy practices
- B: Good with minor issues
- C: Acceptable but concerns exist
- D: Poor privacy practices
- E: Very invasive, major concerns

Categories:
1. Data Collection (what's collected)
2. Data Sharing (third parties)
3. User Rights (access, deletion, etc.)
4. Data Retention (how long kept)
5. Tracking & Security

Respond ONLY with valid JSON matching this schema:
{
  "overall_score": "A|B|C|D|E",
  "score_breakdown": { "data_collection": "A|B|C|D|E", ... },
  "findings": { "positive": [...], "negative": [...], "neutral": [...] },
  "data_types_collected": [...],
  "third_parties": [...],
  "summary": "string"
}`
    },
    {
      role: 'user',
      content: `Analyze this privacy policy:\n\n${policyText}`
    }
  ],
  response_format: { type: 'json_object' }
};
```
## SEO Requirements

### Meta Tags (All Public Pages)

Every public page MUST include:

```html
<!-- Basic Meta -->
<title>Descriptive Title - Privacy Policy Analyzer</title>
<meta name="description" content="150-160 character description">
<link rel="canonical" href="https://example.com/current-path">

<!-- Open Graph -->
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page description">
<meta property="og:image" content="https://example.com/og-image.jpg">
<meta property="og:url" content="https://example.com/current-path">
<meta property="og:type" content="website">

<!-- Twitter Card -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Page description">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">
```
### Structured Data (Schema.org)

Include JSON-LD structured data on all service pages:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": {
    "@type": "Organization",
    "name": "Service Name"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "4",
    "bestRating": "5",
    "worstRating": "1"
  }
}
</script>
```
### Semantic HTML Requirements

- One `<h1>` per page with the main topic
- Logical heading hierarchy (no skipping levels)
- Use `<header>`, `<nav>`, `<main>`, `<article>`, `<footer>`
- Breadcrumb navigation with Schema.org markup
- Descriptive link text (no "click here")
- Display the "Last Analyzed" date prominently on all service pages (from analyses.created_at)
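The breadcrumb requirement above can be met with a JSON-LD `BreadcrumbList` (a sketch only; the paths and names are placeholders, not existing project markup):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://example.com/" },
    { "@type": "ListItem", "position": 2, "name": "Services", "item": "https://example.com/services" },
    { "@type": "ListItem", "position": 3, "name": "Service Name" }
  ]
}
</script>
```

The last item may omit `item` since it refers to the current page.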
## Accessibility Requirements (WCAG 2.1 AA)

### Mandatory Implementation

- **Color Contrast:** Minimum 4.5:1 for text, 3:1 for UI components
- **Keyboard Navigation:** All features accessible via keyboard only
- **Focus Management:** Visible focus indicators (2px solid outline minimum)
- **Alt Text:** All images must have descriptive alt text
- **Form Labels:** All inputs must have associated labels
- **ARIA Landmarks:** banner, main, navigation, contentinfo
- **Skip Link:** "Skip to main content" link at the top of the page
### Accessibility Patterns

```html
<!-- Skip Link -->
<a href="#main-content" class="skip-link">Skip to main content</a>

<!-- ARIA Landmarks -->
<header role="banner">...</header>
<nav role="navigation" aria-label="Main">...</nav>
<main id="main-content" role="main">...</main>
<footer role="contentinfo">...</footer>

<!-- Accessible Form -->
<label for="service-name">Service Name <span aria-label="required">*</span></label>
<input
  type="text"
  id="service-name"
  name="serviceName"
  required
  aria-required="true"
  aria-describedby="name-error"
>
<div id="name-error" role="alert" class="error-message"></div>

<!-- Accessible Button with Icon -->
<button aria-label="Close menu">
  <span aria-hidden="true">×</span>
</button>

<!-- Accessible Card Link -->
<article>
  <h2><a href="/service/facebook" aria-describedby="facebook-grade">Facebook</a></h2>
  <span id="facebook-grade" class="visually-hidden">Privacy Grade E</span>
</article>
```
### Focus Styles (CSS)

```css
/* Visible focus indicators */
:focus {
  outline: 2px solid #0066cc;
  outline-offset: 2px;
}

/* Skip link styling */
.skip-link {
  position: absolute;
  top: -40px;
  left: 0;
  background: #000;
  color: #fff;
  padding: 8px;
  z-index: 100;
}

.skip-link:focus {
  top: 0;
}

/* Visually hidden but screen-reader accessible */
.visually-hidden {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  border: 0;
}
```
## File Templates

### New Route (src/routes/example.js)

```javascript
import { Router } from '../utils/router.js';
import { authenticate } from '../middleware/auth.js';

const router = new Router();

// Public route
router.get('/example', async (req, res) => {
  // Handler
});

// Protected route
router.get('/admin/example', authenticate, async (req, res) => {
  // Handler
});

export default router;
```
### New Model (src/models/Example.js)

```javascript
import { sql } from '../config/database.js';

export class Example {
  static async findById(id) {
    const result = await sql`SELECT * FROM examples WHERE id = ${id}`;
    return result[0] || null;
  }

  static async create(data) {
    const result = await sql`
      INSERT INTO examples (field1, field2)
      VALUES (${data.field1}, ${data.field2})
      RETURNING *
    `;
    return result[0];
  }
}
```
### New Service (src/services/exampleService.js)

```javascript
import { Example } from '../models/Example.js';

export const exampleService = {
  async performAction(params) {
    try {
      // Business logic; produce the data to return
      const data = await Example.findById(params.id);
      return { success: true, data };
    } catch (error) {
      console.error('Service error:', error);
      throw error;
    }
  }
};
```
## Testing Checklist

### Functional Testing

- [ ] App starts without errors (`docker-compose up`)
- [ ] No hardcoded secrets or credentials
- [ ] Database queries use parameterized statements
- [ ] Admin routes require authentication
- [ ] AI analysis handles errors gracefully
- [ ] No sensitive data in logs
### SEO Testing

- [ ] All pages have unique `<title>` tags (50-60 chars)
- [ ] All pages have meta descriptions (150-160 chars)
- [ ] Open Graph tags present on all public pages
- [ ] Canonical URLs set correctly
- [ ] Sitemap.xml auto-generates and is valid
- [ ] robots.txt allows public, blocks admin
- [ ] Semantic HTML5 structure (header, nav, main, article, footer)
- [ ] Single H1 per page with logical heading hierarchy
- [ ] All images have descriptive alt text
- [ ] Structured data (Schema.org) validates
### Performance Testing

- [ ] Lighthouse score ≥ 90 on all metrics
- [ ] First Contentful Paint < 1.0s
- [ ] Largest Contentful Paint < 2.5s
- [ ] Time to Interactive < 3.8s
- [ ] Cumulative Layout Shift < 0.1
- [ ] Redis caching working (verify with `redis-cli`)
- [ ] Gzip/Brotli compression enabled
- [ ] Images optimized (WebP format, proper sizing)
- [ ] CSS/JS minified
### Security Testing

- [ ] Security headers present on all responses
- [ ] HTTPS enforced (HSTS header)
- [ ] Cookies have HttpOnly, Secure, SameSite flags
- [ ] Rate limiting prevents abuse (test with `ab` or `wrk`)
- [ ] SQL injection attempts blocked
- [ ] XSS attempts blocked (test with `<script>alert(1)</script>`)
- [ ] Admin routes inaccessible without authentication
- [ ] Session expires after 24 hours
- [ ] `bun audit` passes with no critical vulnerabilities
- [ ] No secrets in logs or error messages
### Accessibility Testing (WCAG 2.1 AA)

- [ ] All images have alt text
- [ ] Color contrast ≥ 4.5:1 for normal text (test with WebAIM)
- [ ] Color contrast ≥ 3:1 for large text and UI components
- [ ] Keyboard navigation works throughout the site
- [ ] Focus indicators visible (2px outline minimum)
- [ ] Skip to main content link present
- [ ] Form labels associated with inputs
- [ ] Page titles descriptive and unique
- [ ] ARIA landmarks used (banner, main, navigation)
- [ ] Screen reader announces content correctly (test with NVDA/VoiceOver)
- [ ] Touch targets ≥ 44x44px
- [ ] No flashing content (> 3 Hz)
- [ ] axe-core passes with 0 violations
## Performance Guidelines

### Caching Strategy

- Public pages: Redis TTL 1 hour
- Analysis results: Redis TTL 24 hours
- API responses: Redis TTL 5 minutes
- Meilisearch queries: Redis TTL 10 minutes
- Cache invalidation on data update
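The TTLs above imply a cache-aside pattern: check the cache, compute on a miss, store with a TTL. A minimal sketch with an injected client so the logic is independent of Redis (`cached` and the `(key, value, ttlSeconds)` set signature are illustrative; a real Redis client would map `set` to `SET key value EX ttl`):

```javascript
// Cache-aside helper. `cached` is an illustrative name; the client only
// needs async get/set, so a Redis wrapper or an in-memory stub both work.
// Values are JSON-encoded so objects round-trip through the cache.
async function cached(client, key, ttlSeconds, compute) {
  const hit = await client.get(key);
  if (hit !== null && hit !== undefined) return JSON.parse(hit); // cache hit
  const value = await compute();                 // cache miss: do the work
  await client.set(key, JSON.stringify(value), ttlSeconds);
  return value;
}
```

For example, a public page handler could use `cached(redis, 'page:' + path, 3600, renderPage)` for the 1-hour tier; invalidation on data update means deleting the affected keys.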
### Database Optimization

- Index frequently queried columns: services.name, analyses.overall_score
- Use connection pooling (max 20 connections)
- Query optimization with EXPLAIN ANALYZE
- Lazy load analysis results
- Paginate service listings (25 per page)
- Compress large policy texts before storage
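The 25-per-page rule can be captured in a small helper that turns a page number into LIMIT/OFFSET values (a sketch; `paginate` is an illustrative name, not existing project code):

```javascript
// Convert a 1-based page number into LIMIT/OFFSET values for the
// service listing (25 per page by default). Invalid input clamps to page 1.
function paginate(page, perPage = 25) {
  const p = Number.isInteger(page) && page > 0 ? page : 1;
  return { limit: perPage, offset: (p - 1) * perPage };
}
```

A route handler might use it as `const { limit, offset } = paginate(Number(req.query.page));` followed by a parameterized `... LIMIT ${limit} OFFSET ${offset}` query.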
### Asset Optimization

- Minify CSS/JS for production
- Use WebP format for images
- Implement lazy loading for images
- Critical CSS inline for above-the-fold content
- Use `async`/`defer` for non-critical scripts
- Brotli + Gzip compression for text responses
### Target Metrics

- First Contentful Paint (FCP): < 1.0s
- Largest Contentful Paint (LCP): < 2.5s
- Time to Interactive (TTI): < 3.8s
- Cumulative Layout Shift (CLS): < 0.1
- Lighthouse Performance Score: ≥ 90
## Deployment Notes

- All services run via Docker Compose
- Persistent volumes for PostgreSQL, Redis, Meilisearch
- Restart policy: always (except during development)
- Logs go to stdout/stderr (Docker handles collection)
- Environment variables set in `.env` on the host
## Troubleshooting

### App won't start

- Check that `.env` exists and has all required vars
- Ensure ports 3000, 5432, 6379, 7700 are free
- Run `docker-compose down -v` and then `docker-compose up -d`
### Database connection fails

- Verify the DATABASE_URL format
- Check that the postgres container is running: `docker-compose ps`
- Check logs: `docker-compose logs postgres`
### AI analysis fails

- Check that the Ollama container is running: `docker-compose ps`
- Verify OPENAI_API_KEY is set (needed for the OpenAI fallback)
- Check OpenAI API status
- Review the raw_analysis column for error details
## External Dependencies

- PostgreSQL: https://www.postgresql.org/docs/15/
- Meilisearch: https://www.meilisearch.com/docs
- Redis: https://redis.io/docs/
- OpenAI API: https://platform.openai.com/docs
- Bun: https://bun.sh/docs
- EJS: https://ejs.co/#docs
## Contact & Resources

- Project Type: Private pet project
- Hosting: Self-hosted Linode instance
- No external contributors expected
- No CI/CD pipeline (manual deployment)
## Change Log

When making significant changes, update this section:

- **2026-01-27:** Completed Phase 5 - Enhancements including Redis caching, sitemap.xml, robots.txt, Schema.org structured data, comprehensive documentation, and all optimizations.
- **2026-01-27:** Completed Phases 1-4 - Infrastructure, Database, Middleware, Routes, and Services. All core functionality working including Docker setup, PostgreSQL/Redis/Meilisearch, AI analysis with OpenAI, policy fetching, and cron scheduling.

**Last Updated:** 2026-01-27 | **Version:** 1.0