didnt-read/AGENTS.md
2026-01-27 13:24:03 -05:00

AGENTS.md - Privacy Policy Analyzer

This file provides essential context and guidelines for AI agents working on this project.

Project Overview

Privacy Policy Analyzer - A self-hosted web application that analyzes website privacy policies using OpenAI's GPT models. It assigns each policy an easy-to-understand A-E grade and reports detailed findings about the service's privacy practices.

Inspiration: ToS;DR (Terms of Service; Didn't Read) - but focused specifically on privacy policies.

Repository: Private pet project, no monetization

Tech Stack

  • Runtime: Bun (JavaScript, NOT TypeScript)
  • Web Framework: Native Bun HTTP server or Elysia.js (lightweight)
  • Database: PostgreSQL 15
  • Search: Meilisearch v1.6
  • Cache: Redis 7
  • Templating: EJS
  • AI: OpenAI API (GPT-4o/GPT-4-turbo)
  • Containerization: Docker + Docker Compose
  • Hosting: Self-hosted on Linode

Project Structure

privacy-policy-analyzer/
├── docker-compose.yml          # Service orchestration
├── Dockerfile                  # Bun app container
├── .env                        # Environment variables (gitignored)
├── package.json               # Bun dependencies
├── src/
│   ├── app.js                 # Entry point
│   ├── config/                # Configuration files
│   ├── models/                # Database models
│   ├── routes/                # Route definitions
│   ├── controllers/           # Request handlers
│   ├── services/              # Business logic
│   ├── middleware/            # Express-style middleware
│   ├── views/                 # EJS templates
│   └── utils/                 # Helper functions
├── migrations/                # SQL migrations
└── public/                    # Static assets

Progress Tracking

This project uses a task tracking system to monitor progress. Tasks are managed using the todo tool and organized by priority:

Priority Levels

  • High: Critical infrastructure and core functionality
  • Medium: Essential features and business logic
  • Low: Enhancements, optimizations, and polish

Progress Checklist (48 Tasks Total)

Phase 1: Infrastructure Setup (High Priority) - COMPLETED ✓

  • Create project root files (docker-compose.yml, Dockerfile, .env.example, package.json)
  • Create directory structure (src/, migrations/, public/)
  • Configure PostgreSQL in docker-compose.yml with persistent volume
  • Configure Redis in docker-compose.yml with persistent volume
  • Configure Meilisearch in docker-compose.yml with persistent volume
  • Create Bun Dockerfile with optimized build
  • Set up .env.example with all required environment variables
  • Create package.json with dependencies (postgres, ejs, openai, etc.)
  • Test Docker Compose setup - verify all services start

Phase 2: Database & Models (Medium Priority) - COMPLETED ✓

  • Create database migration file (001_initial.sql) with schema
  • Create src/config/database.js for PostgreSQL connection
  • Create src/config/redis.js for Redis connection
  • Create src/config/meilisearch.js for Meilisearch client
  • Create src/config/openai.js for OpenAI client
  • Create database migration runner script
  • Create src/models/Service.js
  • Create src/models/PolicyVersion.js
  • Create src/models/Analysis.js
  • Create src/models/AdminSession.js

Phase 3: Middleware & Routes (Medium Priority) - COMPLETED ✓

  • Create src/middleware/auth.js for session authentication
  • Create src/middleware/errorHandler.js
  • Create src/middleware/security.js for security headers
  • Create src/middleware/rateLimiter.js
  • Create src/routes/admin.js with authentication routes
  • Create src/views/admin/login.ejs
  • Create admin dashboard view
  • Create src/routes/public.js for public pages
  • Create main layout EJS template with SEO meta tags
  • Create public homepage view with service listing
  • Create service detail page view with last analyzed date display

Phase 4: Services & Features (Medium Priority) - COMPLETED ✓

  • Create src/services/policyFetcher.js to fetch policy from URL
  • Create src/services/aiAnalyzer.js with OpenAI integration
  • Create admin service management forms (add/edit)
  • Implement manual analysis trigger in admin panel
  • Create src/services/scheduler.js for cron jobs
  • Create src/services/searchIndexer.js for Meilisearch

Phase 5: Enhancements (Low Priority) - IN PROGRESS

  • Implement Redis caching for public pages
  • Create sitemap.xml generator
  • Create robots.txt
  • Add structured data (Schema.org) to service pages
  • Implement accessibility features (WCAG 2.1 AA) - Already implemented
  • Add CSS styling with focus indicators - Already implemented
  • Implement skip to main content link - Already implemented
  • Performance testing and optimization
  • Security audit and penetration testing
  • Accessibility audit with axe-core
  • SEO audit and optimization
  • Create comprehensive documentation

Working with Tasks

  • ALWAYS check the current todo list before starting work
  • Update task status to in_progress when starting work
  • Mark complete immediately after finishing a task
  • Verify completed tasks using testing checklists in this document
  • Review progress regularly to maintain momentum

Current Phase Focus

We are currently in Phase 5: Enhancements. Phases 1-4 are complete. All core functionality is working. Remaining tasks are optimizations, audits, and documentation.

Critical Rules

1. JavaScript Only

  • NO TypeScript
  • Use JSDoc comments for type documentation when helpful
  • Bun supports modern JavaScript (ES2023)

2. Database Conventions

  • Use postgres library (Bun-compatible)
  • Always use parameterized queries
  • Migrations are in migrations/ folder, numbered sequentially
  • Never write raw SQL in routes/controllers

3. Environment Variables

ALL configuration goes in .env:

DATABASE_URL=postgresql://user:pass@postgres:5432/dbname
REDIS_URL=redis://redis:6379
MEILISEARCH_URL=http://meilisearch:7700
MEILISEARCH_API_KEY=key
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme
SESSION_SECRET=random_string
PORT=3000
NODE_ENV=production
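
A minimal sketch of validating these variables at startup, so that a missing value fails fast instead of surfacing later as a connection error. The loadConfig helper, the set of variables treated as required, and the fallback values (taken from the sample values above) are assumptions for illustration, not part of the existing codebase.

```javascript
// Hypothetical startup check; the key names mirror the .env listing above.
const REQUIRED_VARS = [
  'DATABASE_URL',
  'REDIS_URL',
  'MEILISEARCH_URL',
  'MEILISEARCH_API_KEY',
  'OPENAI_API_KEY',
  'ADMIN_USERNAME',
  'ADMIN_PASSWORD',
  'SESSION_SECRET'
];

/**
 * Validate environment variables and return a frozen config object.
 * @param {Record<string, string | undefined>} env - usually process.env
 */
function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report variable names only, never values, so secrets stay out of logs
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.freeze({
    databaseUrl: env.DATABASE_URL,
    redisUrl: env.REDIS_URL,
    meilisearchUrl: env.MEILISEARCH_URL,
    meilisearchApiKey: env.MEILISEARCH_API_KEY,
    openaiApiKey: env.OPENAI_API_KEY,
    openaiModel: env.OPENAI_MODEL || 'gpt-4o',
    adminUsername: env.ADMIN_USERNAME,
    adminPassword: env.ADMIN_PASSWORD,
    sessionSecret: env.SESSION_SECRET,
    port: Number(env.PORT) || 3000,
    nodeEnv: env.NODE_ENV || 'production'
  });
}
```

Calling loadConfig(process.env) once in the entry point and passing the result around would keep direct process.env reads out of the rest of the code.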

4. AI Analysis Guidelines

  • Always use OpenAI's JSON mode for structured output
  • Store raw AI response in database (for debugging)
  • Implement retry logic with exponential backoff
  • Rate limit AI calls (max 10/minute)
  • Handle AI failures gracefully - don't crash the app

5. Security Requirements (OWASP Top 10)

  • NEVER commit .env file
  • NEVER log API keys or passwords
  • Use bcrypt for password hashing (cost factor 12)
  • Session tokens stored in Redis with expiration (24 hours)
  • All admin routes require authentication middleware
  • Input validation on ALL user inputs with proper sanitization
  • SQL injection prevention via parameterized queries ONLY
  • XSS prevention via EJS auto-escaping AND Content Security Policy
  • Rate limiting: 100 req/15min public, 30 req/15min admin, 10 req/hour AI
  • Security headers REQUIRED on all responses:
    • Strict-Transport-Security
    • Content-Security-Policy
    • X-Content-Type-Options: nosniff
    • X-Frame-Options: DENY
    • X-XSS-Protection: 1; mode=block
    • Referrer-Policy: strict-origin-when-cross-origin
  • HTTPS only with HSTS
  • Secure cookies (HttpOnly, Secure, SameSite=Strict)
  • Regular dependency audits (bun audit)
  • Non-root Docker user
  • Log authentication attempts and errors (NEVER log sensitive data)
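
A minimal sketch of applying the required headers to a web-standard Response, as used by Bun's HTTP server. The withSecurityHeaders name is hypothetical, and the HSTS max-age and the Content-Security-Policy value are placeholder assumptions that would need tuning for the real site.

```javascript
// Header names come from the list above; the HSTS and CSP values are
// placeholder assumptions, not the project's actual policy.
const SECURITY_HEADERS = {
  'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
  'Content-Security-Policy': "default-src 'self'",
  'X-Content-Type-Options': 'nosniff',
  'X-Frame-Options': 'DENY',
  'X-XSS-Protection': '1; mode=block',
  'Referrer-Policy': 'strict-origin-when-cross-origin'
};

/**
 * Return a copy of the response with the security headers set.
 * Existing headers such as Content-Type are preserved.
 * @param {Response} response
 * @returns {Response}
 */
function withSecurityHeaders(response) {
  const headers = new Headers(response.headers);
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    headers.set(name, value);
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers
  });
}
```

Wrapping every handler result in one place, for example in the security middleware, makes it harder for a new route to ship without these headers.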

6. Error Handling Pattern

try {
  // Operation
} catch (error) {
  console.error('Context:', error.message);
  // Return user-friendly error
  return new Response('Error message', { status: 500 });
}

7. Code Style

  • Use single quotes for strings
  • 2-space indentation
  • Semicolons required
  • camelCase for variables/functions
  • PascalCase for classes
  • No trailing commas
  • Max line length: 100 characters

Common Commands

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f app

# Run database migrations
docker-compose exec app bun run migrate

# Restart app only
docker-compose restart app

# Shell into app container
docker-compose exec app sh

# Install new dependency
docker-compose exec app bun add package-name

# Run tests (when added)
docker-compose exec app bun test

Database Schema

services

  • id (PK, serial)
  • name (varchar)
  • url (varchar)
  • logo_url (varchar, nullable)
  • policy_url (varchar)
  • created_at (timestamp)
  • updated_at (timestamp)

policy_versions

  • id (PK, serial)
  • service_id (FK)
  • content (text)
  • content_hash (varchar 64)
  • fetched_at (timestamp)
  • created_at (timestamp)

analyses

  • id (PK, serial)
  • service_id (FK)
  • policy_version_id (FK)
  • overall_score (char 1: A/B/C/D/E)
  • findings (JSONB)
  • raw_analysis (text)
  • created_at (timestamp) - This is the "last analyzed" date, must be displayed on all service pages
  • updated_at (timestamp)

admin_sessions

  • id (PK, serial)
  • session_token (varchar, unique)
  • created_at (timestamp)
  • expires_at (timestamp)

AI Prompt Template

When modifying AI analysis, use this structure:

const prompt = {
  model: process.env.OPENAI_MODEL,
  messages: [
    {
      role: 'system',
      content: `You are a privacy policy analyzer. Analyze the following privacy policy and provide a structured assessment.

Scoring Criteria:
- A: Excellent privacy practices
- B: Good with minor issues
- C: Acceptable but concerns exist
- D: Poor privacy practices
- E: Very invasive, major concerns

Categories:
1. Data Collection (what's collected)
2. Data Sharing (third parties)
3. User Rights (access, deletion, etc.)
4. Data Retention (how long kept)
5. Tracking & Security

Respond ONLY with valid JSON matching this schema:
{
  "overall_score": "A|B|C|D|E",
  "score_breakdown": { "data_collection": "A|B|C|D|E", ... },
  "findings": { "positive": [...], "negative": [...], "neutral": [...] },
  "data_types_collected": [...],
  "third_parties": [...],
  "summary": "string"
}`
    },
    {
      role: 'user',
      content: `Analyze this privacy policy:\n\n${policyText}`
    }
  ],
  response_format: { type: 'json_object' }
};
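
JSON mode guarantees syntactically valid JSON but not that the reply matches the schema in the prompt, so a validation step before storing the result may be useful. The sketch below checks only a few of the schema's fields; parseAnalysis and its return shape are assumptions for illustration.

```javascript
const GRADES = ['A', 'B', 'C', 'D', 'E'];

/**
 * Parse the model's reply and check the fields the prompt schema requires.
 * Returns { ok: true, analysis } or { ok: false, error } instead of throwing,
 * so a malformed reply cannot crash the caller.
 * @param {string} rawText - raw message content returned by the model
 */
function parseAnalysis(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (error) {
    return { ok: false, error: 'Response is not valid JSON' };
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, error: 'Response is not a JSON object' };
  }
  if (!GRADES.includes(parsed.overall_score)) {
    return { ok: false, error: 'overall_score must be one of A-E' };
  }
  const findings = parsed.findings || {};
  for (const key of ['positive', 'negative', 'neutral']) {
    if (!Array.isArray(findings[key])) {
      return { ok: false, error: `findings.${key} must be an array` };
    }
  }
  if (typeof parsed.summary !== 'string') {
    return { ok: false, error: 'summary must be a string' };
  }
  return { ok: true, analysis: parsed };
}
```

Storing rawText in raw_analysis regardless of the outcome keeps failed replies available for debugging.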

SEO Requirements

Meta Tags (All Public Pages)

Every public page MUST include:

<!-- Basic Meta -->
<title>Descriptive Title - Privacy Policy Analyzer</title>
<meta name="description" content="150-160 character description">
<link rel="canonical" href="https://example.com/current-path">

<!-- Open Graph -->
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page description">
<meta property="og:image" content="https://example.com/og-image.jpg">
<meta property="og:url" content="https://example.com/current-path">
<meta property="og:type" content="website">

<!-- Twitter Card -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Page description">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">

Structured Data (Schema.org)

Include JSON-LD structured data on all service pages:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": {
    "@type": "Organization",
    "name": "Service Name"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "4",
    "bestRating": "5",
    "worstRating": "1"
  }
}
</script>
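
The example above uses a numeric rating from 1 to 5, while analyses store letter grades. One possible mapping, assumed here rather than stated in the source, is A=5 through E=1. The sketch below builds the JSON-LD payload from a service name and grade; buildReviewJsonLd is a hypothetical helper.

```javascript
// Assumed mapping from letter grade to Schema.org ratingValue
const GRADE_TO_RATING = new Map([['A', 5], ['B', 4], ['C', 3], ['D', 2], ['E', 1]]);

/**
 * Build the JSON-LD string for a service page.
 * @param {string} serviceName
 * @param {string} grade - one of A, B, C, D, E
 * @returns {string} JSON safe to embed inside a <script> element
 */
function buildReviewJsonLd(serviceName, grade) {
  const rating = GRADE_TO_RATING.get(grade);
  if (rating === undefined) {
    throw new Error(`Unknown grade: ${grade}`);
  }
  const data = {
    '@context': 'https://schema.org',
    '@type': 'Review',
    itemReviewed: { '@type': 'Organization', name: serviceName },
    reviewRating: {
      '@type': 'Rating',
      ratingValue: String(rating),
      bestRating: '5',
      worstRating: '1'
    }
  };
  // Escape "<" so a service name cannot close the surrounding <script> tag
  return JSON.stringify(data, null, 2).replace(/</g, '\\u003c');
}
```

Because the string is already escaped for script context, the template would need to output it unescaped (in EJS, with the <%- %> tag) rather than HTML-escaping it a second time.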

Semantic HTML Requirements

  • One <h1> per page with main topic
  • Logical heading hierarchy (no skipping levels)
  • Use <header>, <nav>, <main>, <article>, <footer>
  • Breadcrumb navigation with Schema.org markup
  • Descriptive link text (no "click here")
  • Display "Last Analyzed" date prominently on all service pages (from analyses.created_at)

Accessibility Requirements (WCAG 2.1 AA)

Mandatory Implementation

  1. Color Contrast: Minimum 4.5:1 for text, 3:1 for UI components
  2. Keyboard Navigation: All features accessible via keyboard only
  3. Focus Management: Visible focus indicators (2px solid outline minimum)
  4. Alt Text: All images must have descriptive alt text
  5. Form Labels: All inputs must have associated labels
  6. ARIA Landmarks: banner, main, navigation, contentinfo
  7. Skip Link: "Skip to main content" link at top of page

Accessibility Patterns

<!-- Skip Link -->
<a href="#main-content" class="skip-link">Skip to main content</a>

<!-- ARIA Landmarks -->
<header role="banner">...</header>
<nav role="navigation" aria-label="Main">...</nav>
<main id="main-content" role="main">...</main>
<footer role="contentinfo">...</footer>

<!-- Accessible Form -->
<label for="service-name">Service Name <span aria-label="required">*</span></label>
<input 
  type="text" 
  id="service-name" 
  name="serviceName" 
  required
  aria-required="true"
  aria-describedby="name-error"
>
<div id="name-error" role="alert" class="error-message"></div>

<!-- Accessible Button with Icon -->
<button aria-label="Close menu">
  <span aria-hidden="true">&times;</span>
</button>

<!-- Accessible Card Link -->
<article>
  <h2><a href="/service/facebook" aria-describedby="facebook-grade">Facebook</a></h2>
  <span id="facebook-grade" class="visually-hidden">Privacy Grade E</span>
</article>

Focus Styles (CSS)

/* Visible focus indicators */
:focus {
  outline: 2px solid #0066cc;
  outline-offset: 2px;
}

/* Skip link styling */
.skip-link {
  position: absolute;
  top: -40px;
  left: 0;
  background: #000;
  color: #fff;
  padding: 8px;
  z-index: 100;
}

.skip-link:focus {
  top: 0;
}

/* Visually hidden but screen-reader accessible */
.visually-hidden {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  border: 0;
}

File Templates

New Route (src/routes/example.js)

import { Router } from '../utils/router.js';
import { authenticate } from '../middleware/auth.js';

const router = new Router();

// Public route
router.get('/example', async (req, res) => {
  // Handler
});

// Protected route
router.get('/admin/example', authenticate, async (req, res) => {
  // Handler
});

export default router;

New Model (src/models/Example.js)

import { sql } from '../config/database.js';

export class Example {
  static async findById(id) {
    const result = await sql`SELECT * FROM examples WHERE id = ${id}`;
    return result[0] || null;
  }

  static async create(data) {
    const result = await sql`
      INSERT INTO examples (field1, field2)
      VALUES (${data.field1}, ${data.field2})
      RETURNING *
    `;
    return result[0];
  }
}

New Service (src/services/exampleService.js)

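import { Example } from '../models/Example.js';

export const exampleService = {
  async performAction(params) {
    try {
      // Business logic goes here; this placeholder loads one record by id
      const data = await Example.findById(params.id);
      return { success: true, data };
    } catch (error) {
      // Log the message only, matching the error handling pattern
      console.error('Service error:', error.message);
      throw error;
    }
  }
};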

Testing Checklist

Functional Testing

  • App starts without errors (docker-compose up)
  • No hardcoded secrets or credentials
  • Database queries use parameterized statements
  • Admin routes require authentication
  • AI analysis handles errors gracefully
  • No sensitive data in logs

SEO Testing

  • All pages have unique <title> tags (50-60 chars)
  • All pages have meta descriptions (150-160 chars)
  • Open Graph tags present on all public pages
  • Canonical URLs set correctly
  • Sitemap.xml auto-generates and is valid
  • robots.txt allows public, blocks admin
  • Semantic HTML5 structure (header, nav, main, article, footer)
  • Single H1 per page with logical heading hierarchy
  • All images have descriptive alt text
  • Structured data (Schema.org) validates
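
For the sitemap item, a generator could be sketched as below. The buildSitemap name, the shape of the pages argument, and the use of the last analyzed date as lastmod are assumptions; the real generator would read services from the database.

```javascript
// Escape the five characters that are special in XML text and attributes
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/**
 * Build a sitemap.xml document.
 * @param {string} baseUrl - site origin, e.g. 'https://example.com'
 * @param {{ path: string, lastAnalyzed?: Date }[]} pages - public pages only
 * @returns {string}
 */
function buildSitemap(baseUrl, pages) {
  const urls = pages.map(({ path, lastAnalyzed }) => {
    const loc = escapeXml(new URL(path, baseUrl).href);
    // lastmod uses the date-only form (YYYY-MM-DD)
    const lastmod = lastAnalyzed
      ? `<lastmod>${lastAnalyzed.toISOString().slice(0, 10)}</lastmod>`
      : '';
    return `  <url><loc>${loc}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>'
  ].join('\n');
}
```

Admin paths should never be passed in, which keeps the sitemap consistent with a robots.txt that blocks the admin area.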

Performance Testing

  • Lighthouse score ≥ 90 in every category
  • First Contentful Paint < 1.0s
  • Largest Contentful Paint < 2.5s
  • Time to Interactive < 3.8s
  • Cumulative Layout Shift < 0.1
  • Redis caching working (verify with redis-cli)
  • Gzip/Brotli compression enabled
  • Images optimized (WebP format, proper sizing)
  • CSS/JS minified

Security Testing

  • Security headers present on all responses
  • HTTPS enforced (HSTS header)
  • Cookies have HttpOnly, Secure, SameSite flags
  • Rate limiting prevents abuse (test with ab or wrk)
  • SQL injection attempts blocked
  • XSS attempts blocked (test with <script>alert(1)</script>)
  • Admin routes inaccessible without authentication
  • Session expires after 24 hours
  • bun audit passes with no critical vulnerabilities
  • No secrets in logs or error messages

Accessibility Testing (WCAG 2.1 AA)

  • All images have alt text
  • Color contrast ≥ 4.5:1 for normal text (test with WebAIM)
  • Color contrast ≥ 3:1 for large text and UI components
  • Keyboard navigation works throughout site
  • Focus indicators visible (2px outline minimum)
  • Skip to main content link present
  • Form labels associated with inputs
  • Page titles descriptive and unique
  • ARIA landmarks used (banner, main, navigation)
  • Screen reader announces content correctly (test with NVDA/VoiceOver)
  • Touch targets ≥ 44x44px
  • No flashing content (>3 Hz)
  • axe-core passes with 0 violations

Performance Guidelines

Caching Strategy

  • Public pages: Redis TTL 1 hour
  • Analysis results: Redis TTL 24 hours
  • API responses: Redis TTL 5 minutes
  • Meilisearch queries: Redis TTL 10 minutes
  • Cache invalidation on data update
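
The strategy above amounts to a cache-aside pattern with per-category TTLs. The sketch below illustrates it with an in-memory stand-in so that it runs without a Redis server; MemoryCache, cached, and the key format are hypothetical, and the real implementation would call the project's Redis client instead.

```javascript
// TTLs in seconds, taken from the caching strategy above
const CACHE_TTL = {
  publicPage: 60 * 60,
  analysis: 24 * 60 * 60,
  apiResponse: 5 * 60,
  searchQuery: 10 * 60
};

// Hypothetical in-memory store standing in for the Redis client
class MemoryCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }

  async get(key) {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  async set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  // Call on data update to invalidate a stale entry
  async delete(key) {
    this.entries.delete(key);
  }
}

// Cache-aside: return the cached value, or compute, store, and return it
async function cached(cache, key, ttlSeconds, compute) {
  const hit = await cache.get(key);
  if (hit !== null) return hit;
  const value = await compute();
  await cache.set(key, value, ttlSeconds);
  return value;
}
```

A service detail page could then be rendered through cached(cache, 'page:/service/1', CACHE_TTL.publicPage, renderPage), with cache.delete called for the same key whenever a new analysis is saved.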

Database Optimization

  • Index frequently queried columns: services.name, analyses.overall_score
  • Use connection pooling (max 20 connections)
  • Query optimization with EXPLAIN ANALYZE
  • Lazy load analysis results
  • Paginate service listings (25 per page)
  • Compress large policy texts before storage
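
For the 25-per-page listing, the page number from the query string has to be turned into a LIMIT and OFFSET safely. The sketch below clamps invalid or out-of-range input; getPagination is a hypothetical helper and the clamping behavior is a design assumption.

```javascript
const PAGE_SIZE = 25;

/**
 * Convert a raw page parameter into safe pagination values.
 * @param {string | undefined} pageParam - raw value from the query string
 * @param {number} totalItems - total number of services
 */
function getPagination(pageParam, totalItems) {
  const totalPages = Math.max(1, Math.ceil(totalItems / PAGE_SIZE));
  const requested = Number.parseInt(pageParam, 10);
  // Clamp invalid or out-of-range input to a valid page
  const page = Number.isNaN(requested) ? 1 : Math.min(Math.max(requested, 1), totalPages);
  return { page, totalPages, limit: PAGE_SIZE, offset: (page - 1) * PAGE_SIZE };
}
```

The limit and offset values would then be passed to the model as query parameters, in line with the parameterized-queries rule.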

Asset Optimization

  • Minify CSS/JS for production
  • Use WebP format for images
  • Implement lazy loading for images
  • Critical CSS inline for above-fold content
  • Use async/defer for non-critical scripts
  • Brotli + Gzip compression for text responses

Target Metrics

  • First Contentful Paint (FCP): < 1.0s
  • Largest Contentful Paint (LCP): < 2.5s
  • Time to Interactive (TTI): < 3.8s
  • Cumulative Layout Shift (CLS): < 0.1
  • Lighthouse Performance Score: ≥ 90

Deployment Notes

  • All services run via Docker Compose
  • Persistent volumes for PostgreSQL, Redis, Meilisearch
  • Restart policy: always (except during development)
  • Logs go to stdout/stderr (Docker handles collection)
  • Environment variables set in .env on host

Troubleshooting

App won't start

  1. Check .env exists and has all required vars
  2. Ensure ports 3000, 5432, 6379, 7700 are free
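  3. Run docker-compose down -v and docker-compose up -d (warning: -v deletes the persistent volumes, including all PostgreSQL, Redis, and Meilisearch data; omit -v to keep existing data)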

Database connection fails

  1. Verify DATABASE_URL format
  2. Check postgres container is running: docker-compose ps
  3. Check logs: docker-compose logs postgres

AI analysis fails

  1. Verify OPENAI_API_KEY is set
  2. Check OpenAI API status
  3. Review raw_analysis column for error details

External Dependencies

Contact & Resources

  • Project Type: Private pet project
  • Hosting: Self-hosted Linode instance
  • No external contributors expected
  • No CI/CD pipeline (manual deployment)

Change Log

When making significant changes, update this section:

2026-01-27: Completed Phases 1-4 - Infrastructure, Database, Middleware, Routes, and Services. All core functionality is working, including Docker setup, PostgreSQL/Redis/Meilisearch, AI analysis with OpenAI, policy fetching, and cron scheduling.

Last Updated: 2026-01-27
Version: 1.0