# AGENTS.md - Privacy Policy Analyzer

This file provides essential context and guidelines for AI agents working on this project.

## Project Overview

**Privacy Policy Analyzer** - A self-hosted web application that analyzes website privacy policies with an LLM (a local Ollama model, with OpenAI's GPT models as a fallback). It assigns each policy an easy-to-understand A-E grade and lists detailed findings about its privacy practices.

**Inspiration**: ToS;DR (Terms of Service; Didn't Read) - but focused specifically on privacy policies.

**Repository**: Private pet project, no monetization

## Tech Stack

- **Runtime**: Bun (JavaScript, NOT TypeScript)
- **Web Framework**: Native Bun HTTP server or Elysia.js (lightweight)
- **Database**: PostgreSQL 15
- **Search**: Meilisearch v1.6
- **Cache**: Redis 7
- **Templating**: EJS
- **AI**: Ollama (local LLM - gpt-oss:latest) with OpenAI fallback
- **Containerization**: Docker + Docker Compose
- **Hosting**: Self-hosted on Linode

## Project Structure

```
privacy-policy-analyzer/
├── docker-compose.yml          # Service orchestration
├── Dockerfile                  # Bun app container
├── .env                        # Environment variables (gitignored)
├── package.json               # Bun dependencies
├── src/
│   ├── app.js                 # Entry point
│   ├── config/                # Configuration files
│   ├── models/                # Database models
│   ├── routes/                # Route definitions
│   ├── controllers/           # Request handlers
│   ├── services/              # Business logic
│   ├── middleware/            # Express-style middleware
│   ├── views/                 # EJS templates
│   └── utils/                 # Helper functions
├── migrations/                # SQL migrations
└── public/                    # Static assets
```

## Progress Tracking

This project uses a task tracking system to monitor progress. Tasks are managed using the todo tool and organized by priority:

### Priority Levels
- **High**: Critical infrastructure and core functionality
- **Medium**: Essential features and business logic
- **Low**: Enhancements, optimizations, and polish

### Progress Checklist (48 Tasks Total)

#### Phase 1: Infrastructure Setup (High Priority) - COMPLETED ✓
- [x] Create project root files (docker-compose.yml, Dockerfile, .env.example, package.json)
- [x] Create directory structure (src/, migrations/, public/)
- [x] Configure PostgreSQL in docker-compose.yml with persistent volume
- [x] Configure Redis in docker-compose.yml with persistent volume
- [x] Configure Meilisearch in docker-compose.yml with persistent volume
- [x] Create Bun Dockerfile with optimized build
- [x] Set up .env.example with all required environment variables
- [x] Create package.json with dependencies (postgres, ejs, openai, etc.)
- [x] Test Docker Compose setup - verify all services start

#### Phase 2: Database & Models (Medium Priority) - COMPLETED ✓
- [x] Create database migration file (001_initial.sql) with schema
- [x] Create src/config/database.js for PostgreSQL connection
- [x] Create src/config/redis.js for Redis connection
- [x] Create src/config/meilisearch.js for Meilisearch client
- [x] Create src/config/openai.js for OpenAI client
- [x] Create database migration runner script
- [x] Create src/models/Service.js
- [x] Create src/models/PolicyVersion.js
- [x] Create src/models/Analysis.js
- [x] Create src/models/AdminSession.js

#### Phase 3: Middleware & Routes (Medium Priority) - COMPLETED ✓
- [x] Create src/middleware/auth.js for session authentication
- [x] Create src/middleware/errorHandler.js
- [x] Create src/middleware/security.js for security headers
- [x] Create src/middleware/rateLimiter.js
- [x] Create src/routes/admin.js with authentication routes
- [x] Create src/views/admin/login.ejs
- [x] Create admin dashboard view
- [x] Create src/routes/public.js for public pages
- [x] Create main layout EJS template with SEO meta tags
- [x] Create public homepage view with service listing
- [x] Create service detail page view with last analyzed date display

#### Phase 4: Services & Features (Medium Priority) - COMPLETED ✓
- [x] Create src/services/policyFetcher.js to fetch policy from URL
- [x] Create src/services/aiAnalyzer.js with OpenAI integration
- [x] Create admin service management forms (add/edit)
- [x] Implement manual analysis trigger in admin panel
- [x] Create src/services/scheduler.js for cron jobs
- [x] Create src/services/searchIndexer.js for Meilisearch

#### Phase 5: Enhancements (Low Priority) - IN PROGRESS
- [ ] Implement Redis caching for public pages
- [ ] Create sitemap.xml generator
- [ ] Create robots.txt
- [ ] Add structured data (Schema.org) to service pages
- [x] Implement accessibility features (WCAG 2.1 AA) - Already implemented
- [x] Add CSS styling with focus indicators - Already implemented
- [x] Implement skip to main content link - Already implemented
- [ ] Performance testing and optimization
- [ ] Security audit and penetration testing
- [ ] Accessibility audit with axe-core
- [ ] SEO audit and optimization
- [ ] Create comprehensive documentation

### Working with Tasks
- **ALWAYS** check the current todo list before starting work
- **Update** task status to `in_progress` when starting work
- **Mark complete** immediately after finishing a task
- **Verify** completed tasks using testing checklists in this document
- **Review** progress regularly to maintain momentum

### Current Phase Focus
We are currently in **Phase 5: Enhancements**. Phases 1-4 are complete. All core functionality is working. Remaining tasks are optimizations, audits, and documentation.

## Critical Rules

### 1. JavaScript Only
- NO TypeScript
- Use JSDoc comments for type documentation when helpful
- Bun supports modern JavaScript (ES2023)

### 2. Database Conventions
- Use `postgres` library (Bun-compatible)
- Always use parameterized queries
- Migrations are in `migrations/` folder, numbered sequentially
- Never write raw SQL in routes/controllers

### 3. Environment Variables
ALL configuration goes in `.env`:
```bash
DATABASE_URL=postgresql://user:pass@postgres:5432/dbname
REDIS_URL=redis://redis:6379
MEILISEARCH_URL=http://meilisearch:7700
MEILISEARCH_API_KEY=key
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme
SESSION_SECRET=random_string
PORT=3000
NODE_ENV=production
```
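
A minimal sketch of validating these variables at startup, so a missing value fails fast instead of surfacing later as a connection error. The `loadConfig` helper and the choice of which variables count as required are assumptions for illustration, not part of the existing codebase.

```javascript
// Hypothetical helper: the required names mirror the .env example above.
const REQUIRED_VARS = [
  'DATABASE_URL',
  'REDIS_URL',
  'MEILISEARCH_URL',
  'MEILISEARCH_API_KEY',
  'OPENAI_API_KEY',
  'OPENAI_MODEL',
  'ADMIN_USERNAME',
  'ADMIN_PASSWORD',
  'SESSION_SECRET'
];

/**
 * Read configuration from an env-like object and fail fast on missing values.
 * @param {Record<string, string | undefined>} env
 */
function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only, never values, so secrets stay out of the logs.
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (Number.isNaN(port)) {
    throw new Error('PORT must be an integer');
  }
  const values = Object.fromEntries(REQUIRED_VARS.map((name) => [name, env[name]]));
  return { ...values, port };
}
```

At startup this would be called once, for example `const config = loadConfig(process.env);`, with the rest of the app reading from `config` rather than from `process.env` directly.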

### 4. AI Analysis Guidelines
- Always use OpenAI's JSON mode for structured output
- Store raw AI response in database (for debugging)
- Implement retry logic with exponential backoff
- Rate limit AI calls (max 10/minute)
- Handle AI failures gracefully - don't crash the app
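
The retry guideline above can be sketched as a small wrapper around any async call. This is an illustration only: `withRetry`, its option names, and the default delays are assumptions, not the project's existing implementation.

```javascript
// Hypothetical helper; retry count and delays are illustrative defaults.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Call `fn` until it succeeds or `retries` extra attempts are used up.
 * The delay doubles after each failure: base, 2x base, 4x base, ...
 */
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

The caller would wrap the model request in `withRetry` and catch the final error, recording a failed analysis rather than letting the exception take down the app.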

### 5. Security Requirements (OWASP Top 10)
- NEVER commit `.env` file
- NEVER log API keys or passwords
- Use bcrypt for password hashing (cost factor 12)
- Session tokens stored in Redis with expiration (24 hours)
- All admin routes require authentication middleware
- Input validation on ALL user inputs with proper sanitization
- SQL injection prevention via parameterized queries ONLY
- XSS prevention via EJS auto-escaping AND Content Security Policy
- Rate limiting: 100 req/15min public, 30 req/15min admin, 10 req/hour AI
- Security headers REQUIRED on all responses:
  - Strict-Transport-Security
  - Content-Security-Policy
  - X-Content-Type-Options: nosniff
  - X-Frame-Options: DENY
  - X-XSS-Protection: 1; mode=block
  - Referrer-Policy: strict-origin-when-cross-origin
- HTTPS only with HSTS
- Secure cookies (HttpOnly, Secure, SameSite=Strict)
- Regular dependency audits (`bun audit`)
- Non-root Docker user
- Log authentication attempts and errors (NEVER log sensitive data)
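
One way to apply the required headers is a small wrapper around every outgoing `Response`. The sketch below is illustrative: `withSecurityHeaders` is a hypothetical name, and the HSTS and CSP values are assumed placeholders rather than the project's actual policy.

```javascript
// Values marked "assumed" are placeholders to tune for the real deployment.
const SECURITY_HEADERS = {
  'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', // assumed
  'Content-Security-Policy': "default-src 'self'", // assumed
  'X-Content-Type-Options': 'nosniff',
  'X-Frame-Options': 'DENY',
  'X-XSS-Protection': '1; mode=block',
  'Referrer-Policy': 'strict-origin-when-cross-origin'
};

/** Return a copy of `response` with the security headers applied. */
function withSecurityHeaders(response) {
  const headers = new Headers(response.headers);
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    headers.set(name, value);
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers
  });
}
```

Applying the wrapper in one place (for example, around the top-level request handler) makes it harder for an individual route to forget a header.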

### 6. Error Handling Pattern
```javascript
try {
  // Operation
} catch (error) {
  console.error('Context:', error.message);
  // Return user-friendly error
  return new Response('Error message', { status: 500 });
}
```

### 7. Code Style
- Use single quotes for strings
- 2-space indentation
- Semicolons required
- camelCase for variables/functions
- PascalCase for classes
- No trailing commas
- Max line length: 100 characters

## Common Commands

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f app

# Run database migrations
docker-compose exec app bun run migrate

# Restart app only
docker-compose restart app

# Shell into app container
docker-compose exec app sh

# Install new dependency
docker-compose exec app bun add package-name

# Run tests (when added)
docker-compose exec app bun test
```

## Database Schema

### services
- id (PK, serial)
- name (varchar)
- url (varchar)
- logo_url (varchar, nullable)
- policy_url (varchar)
- created_at (timestamp)
- updated_at (timestamp)

### policy_versions
- id (PK, serial)
- service_id (FK)
- content (text)
- content_hash (varchar 64)
- fetched_at (timestamp)
- created_at (timestamp)

### analyses
- id (PK, serial)
- service_id (FK)
- policy_version_id (FK)
- overall_score (char 1: A/B/C/D/E)
- findings (JSONB)
- raw_analysis (text)
- created_at (timestamp) - **This is the "last analyzed" date, must be displayed on all service pages**
- updated_at (timestamp)

### admin_sessions
- id (PK, serial)
- session_token (varchar, unique)
- created_at (timestamp)
- expires_at (timestamp)

## AI Prompt Template

When modifying AI analysis, use this structure:

```javascript
const prompt = {
  model: process.env.OPENAI_MODEL,
  messages: [
    {
      role: 'system',
      content: `You are a privacy policy analyzer. Analyze the following privacy policy and provide a structured assessment.

Scoring Criteria:
- A: Excellent privacy practices
- B: Good with minor issues
- C: Acceptable but concerns exist
- D: Poor privacy practices
- E: Very invasive, major concerns

Categories:
1. Data Collection (what's collected)
2. Data Sharing (third parties)
3. User Rights (access, deletion, etc.)
4. Data Retention (how long kept)
5. Tracking & Security

Respond ONLY with valid JSON matching this schema:
{
  "overall_score": "A|B|C|D|E",
  "score_breakdown": { "data_collection": "A|B|C|D|E", ... },
  "findings": { "positive": [...], "negative": [...], "neutral": [...] },
  "data_types_collected": [...],
  "third_parties": [...],
  "summary": "string"
}`
    },
    {
      role: 'user',
      content: `Analyze this privacy policy:\n\n${policyText}`
    }
  ],
  response_format: { type: 'json_object' }
};
```
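
JSON mode makes the reply parseable but does not guarantee it matches the schema in the prompt, so the result is worth checking before it is stored. The validator below is a sketch under that assumption; `parseAnalysis` and the exact set of checked fields are hypothetical choices.

```javascript
const GRADES = ['A', 'B', 'C', 'D', 'E'];

/**
 * Parse the model's raw text and check the fields the app depends on.
 * Returns { ok: true, analysis } or { ok: false, error } instead of throwing,
 * so a malformed response cannot crash the caller.
 */
function parseAnalysis(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (error) {
    return { ok: false, error: 'Response is not valid JSON' };
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, error: 'Response is not a JSON object' };
  }
  if (!GRADES.includes(parsed.overall_score)) {
    return { ok: false, error: 'overall_score must be one of A-E' };
  }
  const findings = parsed.findings ?? {};
  for (const key of ['positive', 'negative', 'neutral']) {
    if (!Array.isArray(findings[key])) {
      return { ok: false, error: `findings.${key} must be an array` };
    }
  }
  if (typeof parsed.summary !== 'string') {
    return { ok: false, error: 'summary must be a string' };
  }
  return { ok: true, analysis: parsed };
}
```

The raw text can still be written to `raw_analysis` in either case, which keeps failed responses available for debugging.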

## SEO Requirements

### Meta Tags (All Public Pages)
Every public page MUST include:
```html
<!-- Basic Meta -->
<title>Descriptive Title - Privacy Policy Analyzer</title>
<meta name="description" content="150-160 character description">
<link rel="canonical" href="https://example.com/current-path">

<!-- Open Graph -->
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page description">
<meta property="og:image" content="https://example.com/og-image.jpg">
<meta property="og:url" content="https://example.com/current-path">
<meta property="og:type" content="website">

<!-- Twitter Card -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Page description">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">
```

### Structured Data (Schema.org)
Include JSON-LD structured data on all service pages:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": {
    "@type": "Organization",
    "name": "Service Name"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "4",
    "bestRating": "5",
    "worstRating": "1"
  }
}
</script>
```
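
To fill this template from an analysis, the letter grade has to be mapped onto the 1-5 rating scale. The sketch below assumes A maps to 5 and E to 1, which the document does not specify; `buildReviewJsonLd` is a hypothetical helper name.

```javascript
// Assumed mapping from letter grade to the 1-5 scale used in the JSON-LD above.
const GRADE_TO_RATING = { A: 5, B: 4, C: 3, D: 2, E: 1 };

/** Build the JSON-LD string for a service page from its name and grade. */
function buildReviewJsonLd(serviceName, grade) {
  if (!Object.hasOwn(GRADE_TO_RATING, grade)) {
    throw new Error(`Unknown grade: ${grade}`);
  }
  const data = {
    '@context': 'https://schema.org',
    '@type': 'Review',
    itemReviewed: { '@type': 'Organization', name: serviceName },
    reviewRating: {
      '@type': 'Rating',
      ratingValue: String(GRADE_TO_RATING[grade]),
      bestRating: '5',
      worstRating: '1'
    }
  };
  // Escape "<" so a service name can never close the surrounding script tag.
  return JSON.stringify(data).replace(/</g, '\\u003c');
}
```

In an EJS view the result would be emitted with the unescaped output tag (`<%- ... %>`) inside the `application/ld+json` script element, since HTML-escaping would corrupt the JSON.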

### Semantic HTML Requirements
- One `<h1>` per page with main topic
- Logical heading hierarchy (no skipping levels)
- Use `<header>`, `<nav>`, `<main>`, `<article>`, `<footer>`
- Breadcrumb navigation with Schema.org markup
- Descriptive link text (no "click here")
- **Display "Last Analyzed" date prominently on all service pages** (from analyses.created_at)

## Accessibility Requirements (WCAG 2.1 AA)

### Mandatory Implementation
1. **Color Contrast**: Minimum 4.5:1 for text, 3:1 for UI components
2. **Keyboard Navigation**: All features accessible via keyboard only
3. **Focus Management**: Visible focus indicators (2px solid outline minimum)
4. **Alt Text**: All images must have descriptive alt text
5. **Form Labels**: All inputs must have associated labels
6. **ARIA Landmarks**: banner, main, navigation, contentinfo
7. **Skip Link**: "Skip to main content" link at top of page

### Accessibility Patterns
```html
<!-- Skip Link -->
<a href="#main-content" class="skip-link">Skip to main content</a>

<!-- ARIA Landmarks -->
<header role="banner">...</header>
<nav role="navigation" aria-label="Main">...</nav>
<main id="main-content" role="main">...</main>
<footer role="contentinfo">...</footer>

<!-- Accessible Form -->
<label for="service-name">Service Name <span aria-hidden="true">*</span></label>
<input
  type="text"
  id="service-name"
  name="serviceName"
  required
  aria-required="true"
  aria-describedby="name-error"
>
<div id="name-error" role="alert" class="error-message"></div>

<!-- Accessible Button with Icon -->
<button aria-label="Close menu">
  <span aria-hidden="true">&times;</span>
</button>

<!-- Accessible Card Link -->
<article>
  <h2><a href="/service/facebook" aria-describedby="facebook-grade">Facebook</a></h2>
  <span id="facebook-grade" class="visually-hidden">Privacy Grade E</span>
</article>
```

### Focus Styles (CSS)
```css
/* Visible focus indicators */
:focus {
  outline: 2px solid #0066cc;
  outline-offset: 2px;
}

/* Skip link styling */
.skip-link {
  position: absolute;
  top: -40px;
  left: 0;
  background: #000;
  color: #fff;
  padding: 8px;
  z-index: 100;
}

.skip-link:focus {
  top: 0;
}

/* Visually hidden but screen-reader accessible */
.visually-hidden {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  border: 0;
}
```

## File Templates

### New Route (src/routes/example.js)
```javascript
import { Router } from '../utils/router.js';
import { authenticate } from '../middleware/auth.js';

const router = new Router();

// Public route
router.get('/example', async (req, res) => {
  // Handler
});

// Protected route
router.get('/admin/example', authenticate, async (req, res) => {
  // Handler
});

export default router;
```

### New Model (src/models/Example.js)
```javascript
import { sql } from '../config/database.js';

export class Example {
  static async findById(id) {
    const result = await sql`SELECT * FROM examples WHERE id = ${id}`;
    return result[0] || null;
  }

  static async create(data) {
    const result = await sql`
      INSERT INTO examples (field1, field2)
      VALUES (${data.field1}, ${data.field2})
      RETURNING *
    `;
    return result[0];
  }
}
```

### New Service (src/services/exampleService.js)
```javascript
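import { Example } from '../models/Example.js';

export const exampleService = {
  /**
   * @param {{ id: number }} params
   */
  async performAction(params) {
    try {
      // Business logic (placeholder: load one record via the model)
      const data = await Example.findById(params.id);
      return { success: true, data };
    } catch (error) {
      // Log the message only, then let the caller decide how to respond
      console.error('Service error:', error.message);
      throw error;
    }
  }
};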
```

## Testing Checklist

### Functional Testing
- [ ] App starts without errors (`docker-compose up`)
- [ ] No hardcoded secrets or credentials
- [ ] Database queries use parameterized statements
- [ ] Admin routes require authentication
- [ ] AI analysis handles errors gracefully
- [ ] No sensitive data in logs

### SEO Testing
- [ ] All pages have unique `<title>` tags (50-60 chars)
- [ ] All pages have meta descriptions (150-160 chars)
- [ ] Open Graph tags present on all public pages
- [ ] Canonical URLs set correctly
- [ ] Sitemap.xml auto-generates and is valid
- [ ] robots.txt allows public, blocks admin
- [ ] Semantic HTML5 structure (header, nav, main, article, footer)
- [ ] Single H1 per page with logical heading hierarchy
- [ ] All images have descriptive alt text
- [ ] Structured data (Schema.org) validates

### Performance Testing
- [ ] Lighthouse score ≥ 90 on all metrics
- [ ] First Contentful Paint < 1.0s
- [ ] Largest Contentful Paint < 2.5s
- [ ] Time to Interactive < 3.8s
- [ ] Cumulative Layout Shift < 0.1
- [ ] Redis caching working (verify with `redis-cli`)
- [ ] Gzip/Brotli compression enabled
- [ ] Images optimized (WebP format, proper sizing)
- [ ] CSS/JS minified

### Security Testing
- [ ] Security headers present on all responses
- [ ] HTTPS enforced (HSTS header)
- [ ] Cookies have HttpOnly, Secure, SameSite flags
- [ ] Rate limiting prevents abuse (test with `ab` or `wrk`)
- [ ] SQL injection attempts blocked
- [ ] XSS attempts blocked (test with `<script>alert(1)</script>`)
- [ ] Admin routes inaccessible without authentication
- [ ] Session expires after 24 hours
- [ ] `bun audit` passes with no critical vulnerabilities
- [ ] No secrets in logs or error messages

### Accessibility Testing (WCAG 2.1 AA)
- [ ] All images have alt text
- [ ] Color contrast ≥ 4.5:1 for normal text (test with WebAIM)
- [ ] Color contrast ≥ 3:1 for large text and UI components
- [ ] Keyboard navigation works throughout site
- [ ] Focus indicators visible (2px outline minimum)
- [ ] Skip to main content link present
- [ ] Form labels associated with inputs
- [ ] Page titles descriptive and unique
- [ ] ARIA landmarks used (banner, main, navigation)
- [ ] Screen reader announces content correctly (test with NVDA/VoiceOver)
- [ ] Touch targets ≥ 44x44px
- [ ] No content flashes more than 3 times per second
- [ ] axe-core passes with 0 violations

## Performance Guidelines

### Caching Strategy
- Public pages: Redis TTL 1 hour
- Analysis results: Redis TTL 24 hours
- API responses: Redis TTL 5 minutes
- Meilisearch queries: Redis TTL 10 minutes
- Cache invalidation on data update

### Database Optimization
- Index frequently queried columns: `services.name`, `analyses.overall_score`
- Use connection pooling (max 20 connections)
- Query optimization with EXPLAIN ANALYZE
- Lazy load analysis results
- Paginate service listings (25 per page)
- Compress large policy texts before storage
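
For the 25-per-page listing, the page number from the query string has to be turned into safe `LIMIT`/`OFFSET` values. A minimal sketch, with `toLimitOffset` as a hypothetical helper name:

```javascript
const PAGE_SIZE = 25;

/** Convert a 1-based page number from the query string into LIMIT/OFFSET values. */
function toLimitOffset(pageParam, pageSize = PAGE_SIZE) {
  const parsed = Number.parseInt(pageParam, 10);
  // Fall back to the first page for missing, non-numeric, or non-positive input.
  const page = Number.isNaN(parsed) || parsed < 1 ? 1 : parsed;
  return { page, limit: pageSize, offset: (page - 1) * pageSize };
}
```

The resulting `limit` and `offset` would be passed as query parameters in the model layer, in the same tagged-template style as the model template, so they are never concatenated into SQL.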

### Asset Optimization
- Minify CSS/JS for production
- Use WebP format for images
- Implement lazy loading for images
- Critical CSS inline for above-fold content
- Use `async`/`defer` for non-critical scripts
- Brotli + Gzip compression for text responses

### Target Metrics
- First Contentful Paint (FCP): < 1.0s
- Largest Contentful Paint (LCP): < 2.5s
- Time to Interactive (TTI): < 3.8s
- Cumulative Layout Shift (CLS): < 0.1
- Lighthouse Performance Score: ≥ 90

## Deployment Notes

- All services run via Docker Compose
- Persistent volumes for PostgreSQL, Redis, Meilisearch
- Restart policy: always (except during development)
- Logs go to stdout/stderr (Docker handles collection)
- Environment variables set in `.env` on host

## Troubleshooting

### App won't start
1. Check `.env` exists and has all required vars
2. Ensure ports 3000, 5432, 6379, 7700 are free
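3. Run `docker-compose down` and then `docker-compose up -d`. Add `-v` to `down` only as a last resort: it deletes the persistent volumes, including all PostgreSQL data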

### Database connection fails
1. Verify DATABASE_URL format
2. Check postgres container is running: `docker-compose ps`
3. Check logs: `docker-compose logs postgres`

### AI analysis fails
1. Verify OPENAI_API_KEY is set
2. Check OpenAI API status
3. Review raw_analysis column for error details

## External Dependencies

- **PostgreSQL**: https://www.postgresql.org/docs/15/
- **Meilisearch**: https://www.meilisearch.com/docs
- **Redis**: https://redis.io/docs/
- **OpenAI API**: https://platform.openai.com/docs
- **Bun**: https://bun.sh/docs
- **EJS**: https://ejs.co/#docs

## Contact & Resources

- **Project Type**: Private pet project
- **Hosting**: Self-hosted Linode instance
- **No external contributors expected**
- **No CI/CD pipeline** (manual deployment)

## Change Log

When making significant changes, update this section:

```
2026-01-27: Completed Phases 1-4 - Infrastructure, Database, Middleware, Routes, and Services. All core functionality working, including Docker setup, PostgreSQL/Redis/Meilisearch, AI analysis with OpenAI, policy fetching, and cron scheduling.
```

---

**Last Updated**: 2026-01-27
**Version**: 1.0