# AGENTS.md - Privacy Policy Analyzer
This file provides essential context and guidelines for AI agents working on this project.
## Project Overview
**Privacy Policy Analyzer** - A self-hosted web application that analyzes website privacy policies using large language models (a local Ollama model, with OpenAI as a fallback). It assigns easy-to-understand A-E grades and reports detailed findings about privacy practices.
**Inspiration**: ToS;DR (Terms of Service; Didn't Read) - but focused specifically on privacy policies.
**Repository**: Private pet project, no monetization
## Tech Stack
- **Runtime**: Bun (JavaScript, NOT TypeScript)
- **Web Framework**: Native Bun HTTP server or Elysia.js (lightweight)
- **Database**: PostgreSQL 15
- **Search**: Meilisearch v1.6
- **Cache**: Redis 7
- **Templating**: EJS
- **AI**: Ollama (local LLM - gpt-oss:latest) with OpenAI fallback
- **Containerization**: Docker + Docker Compose
- **Hosting**: Self-hosted on Linode
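The Ollama-first, OpenAI-fallback order can be kept out of the analyzer logic by trying a list of providers in sequence. A minimal sketch, assuming each provider is wrapped in an object with a `name` and an async `call(messages)` that returns the model's message text (for example, one wrapper around Ollama's `/api/chat` endpoint and one around OpenAI's Chat Completions API). `analyzeWithFallback` and the provider objects are hypothetical:

```javascript
/**
 * Tries each AI provider in order and returns the first successful reply.
 * @param {{ name: string, call: (messages: object[]) => Promise<string> }[]} providers
 * @param {object[]} messages - chat messages (system prompt + policy text)
 * @returns {Promise<{ provider: string, content: string }>}
 */
async function analyzeWithFallback(providers, messages) {
  const failures = [];
  for (const provider of providers) {
    try {
      const content = await provider.call(messages);
      return { provider: provider.name, content };
    } catch (error) {
      // Log the provider name and error message only, never prompts or API keys.
      console.error(`AI provider ${provider.name} failed:`, error.message);
      failures.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All AI providers failed (${failures.join('; ')})`);
}
```

Ordering the array as `[ollama, openai]` keeps the local model as the default; only a combined failure reaches the caller, which can then handle it without crashing the app.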
## Project Structure
```
privacy-policy-analyzer/
├── docker-compose.yml # Service orchestration
├── Dockerfile # Bun app container
├── .env # Environment variables (gitignored)
├── package.json # Bun dependencies
├── src/
│ ├── app.js # Entry point
│ ├── config/ # Configuration files
│ ├── models/ # Database models
│ ├── routes/ # Route definitions
│ ├── controllers/ # Request handlers
│ ├── services/ # Business logic
│ ├── middleware/ # Express-style middleware
│ ├── views/ # EJS templates
│ └── utils/ # Helper functions
├── migrations/ # SQL migrations
└── public/ # Static assets
```
## Progress Tracking
This project uses a task tracking system to monitor progress. Tasks are managed using the todo tool and organized by priority:
### Priority Levels
- **High**: Critical infrastructure and core functionality
- **Medium**: Essential features and business logic
- **Low**: Enhancements, optimizations, and polish
### Progress Checklist (48 Tasks Total)
#### Phase 1: Infrastructure Setup (High Priority) - COMPLETED ✓
- [x] Create project root files (docker-compose.yml, Dockerfile, .env.example, package.json)
- [x] Create directory structure (src/, migrations/, public/)
- [x] Configure PostgreSQL in docker-compose.yml with persistent volume
- [x] Configure Redis in docker-compose.yml with persistent volume
- [x] Configure Meilisearch in docker-compose.yml with persistent volume
- [x] Create Bun Dockerfile with optimized build
- [x] Set up .env.example with all required environment variables
- [x] Create package.json with dependencies (postgres, ejs, openai, etc.)
- [x] Test Docker Compose setup - verify all services start
#### Phase 2: Database & Models (Medium Priority) - COMPLETED ✓
- [x] Create database migration file (001_initial.sql) with schema
- [x] Create src/config/database.js for PostgreSQL connection
- [x] Create src/config/redis.js for Redis connection
- [x] Create src/config/meilisearch.js for Meilisearch client
- [x] Create src/config/openai.js for OpenAI client
- [x] Create database migration runner script
- [x] Create src/models/Service.js
- [x] Create src/models/PolicyVersion.js
- [x] Create src/models/Analysis.js
- [x] Create src/models/AdminSession.js
#### Phase 3: Middleware & Routes (Medium Priority) - COMPLETED ✓
- [x] Create src/middleware/auth.js for session authentication
- [x] Create src/middleware/errorHandler.js
- [x] Create src/middleware/security.js for security headers
- [x] Create src/middleware/rateLimiter.js
- [x] Create src/routes/admin.js with authentication routes
- [x] Create src/views/admin/login.ejs
- [x] Create admin dashboard view
- [x] Create src/routes/public.js for public pages
- [x] Create main layout EJS template with SEO meta tags
- [x] Create public homepage view with service listing
- [x] Create service detail page view with last analyzed date display
#### Phase 4: Services & Features (Medium Priority) - COMPLETED ✓
- [x] Create src/services/policyFetcher.js to fetch policy from URL
- [x] Create src/services/aiAnalyzer.js with OpenAI integration
- [x] Create admin service management forms (add/edit)
- [x] Implement manual analysis trigger in admin panel
- [x] Create src/services/scheduler.js for cron jobs
- [x] Create src/services/searchIndexer.js for Meilisearch
#### Phase 5: Enhancements (Low Priority) - IN PROGRESS
- [ ] Implement Redis caching for public pages
- [ ] Create sitemap.xml generator
- [ ] Create robots.txt
- [ ] Add structured data (Schema.org) to service pages
- [x] Implement accessibility features (WCAG 2.1 AA) - Already implemented
- [x] Add CSS styling with focus indicators - Already implemented
- [x] Implement skip to main content link - Already implemented
- [ ] Performance testing and optimization
- [ ] Security audit and penetration testing
- [ ] Accessibility audit with axe-core
- [ ] SEO audit and optimization
- [ ] Create comprehensive documentation
### Working with Tasks
- **ALWAYS** check the current todo list before starting work
- **Update** task status to `in_progress` when starting work
- **Mark complete** immediately after finishing a task
- **Verify** completed tasks using testing checklists in this document
- **Review** progress regularly to maintain momentum
### Current Phase Focus
We are currently in **Phase 5: Enhancements**. Phases 1-4 are complete. All core functionality is working. Remaining tasks are optimizations, audits, and documentation.
## Critical Rules
### 1. JavaScript Only
- NO TypeScript
- Use JSDoc comments for type documentation when helpful
- Bun supports modern JavaScript (ES2023)
### 2. Database Conventions
- Use `postgres` library (Bun-compatible)
- Always use parameterized queries
- Migrations are in `migrations/` folder, numbered sequentially
- Never write raw SQL in routes/controllers
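With sequential numbering, a migration runner only has to sort the files and skip those already applied. A minimal sketch, assuming the `NNN_name.sql` pattern of `001_initial.sql` and that the runner already knows which files were applied (for example, from a hypothetical `schema_migrations` table). `pendingMigrations` is a hypothetical name:

```javascript
// Matches names such as "001_initial.sql"; the numeric-prefix pattern is assumed
// from the existing 001_initial.sql file.
const MIGRATION_PATTERN = /^(\d+)_[\w-]+\.sql$/;

const migrationNumber = (name) => Number(name.match(MIGRATION_PATTERN)[1]);

/**
 * Returns the migration files that still need to run, in numeric order.
 * @param {string[]} files - filenames found in migrations/
 * @param {Set<string>} applied - filenames already recorded as applied
 * @returns {string[]}
 */
function pendingMigrations(files, applied) {
  return files
    .filter((name) => MIGRATION_PATTERN.test(name))
    .filter((name) => !applied.has(name))
    .sort((a, b) => migrationNumber(a) - migrationNumber(b));
}

// Example: returns ['002_add_indexes.sql', '010_sessions.sql'] (README.md is ignored).
pendingMigrations(
  ['002_add_indexes.sql', 'README.md', '001_initial.sql', '010_sessions.sql'],
  new Set(['001_initial.sql'])
);
```

Each pending file would then be executed inside a transaction and recorded as applied, so that a failed migration leaves no partial state behind.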
### 3. Environment Variables
ALL configuration goes in `.env`:
```bash
DATABASE_URL=postgresql://user:pass@postgres:5432/dbname
REDIS_URL=redis://redis:6379
MEILISEARCH_URL=http://meilisearch:7700
MEILISEARCH_API_KEY=key
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme
SESSION_SECRET=random_string
PORT=3000
NODE_ENV=production
```
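Failing fast at startup makes a missing variable obvious instead of surfacing later as a confusing connection error. A minimal sketch, assuming the required list mirrors the example above (drop any variable that is optional in your setup). `findMissingEnv` and `assertEnv` are hypothetical helper names:

```javascript
// Required variables, taken from the example above. PORT and NODE_ENV are left out
// on the assumption that the app has defaults for them.
const REQUIRED_ENV = [
  'DATABASE_URL',
  'REDIS_URL',
  'MEILISEARCH_URL',
  'MEILISEARCH_API_KEY',
  'OPENAI_API_KEY',
  'OPENAI_MODEL',
  'ADMIN_USERNAME',
  'ADMIN_PASSWORD',
  'SESSION_SECRET'
];

/**
 * Lists required variables that are unset or blank.
 * Only names are returned, never values, so the result is safe to log.
 * @param {Record<string, string | undefined>} env
 * @param {string[]} required
 * @returns {string[]}
 */
function findMissingEnv(env, required = REQUIRED_ENV) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

/** Throws if anything is missing; intended to be called once at startup. */
function assertEnv(env) {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

Calling `assertEnv(process.env)` at the top of `src/app.js`, before connecting to PostgreSQL or Redis, turns a missing variable into one clear error message.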
### 4. AI Analysis Guidelines
- Always request JSON output: OpenAI's JSON mode (`response_format: { type: 'json_object' }`) or Ollama's `format: 'json'` option
- Store raw AI response in database (for debugging)
- Implement retry logic with exponential backoff
- Rate limit AI calls (max 10/minute)
- Handle AI failures gracefully - don't crash the app
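A minimal sketch of the retry rule, assuming "full jitter" backoff, a 1 s base delay, a 30 s cap, and 4 attempts (none of these values are specified above). `withRetry` and `backoffDelay` are hypothetical names; `fn` stands in for whatever function sends one analysis request:

```javascript
/**
 * "Full jitter" exponential backoff: a random delay in [0, min(cap, base * 2^attempt)).
 * @param {number} attempt - zero-based retry attempt
 * @param {number} baseMs
 * @param {number} capMs
 * @param {() => number} random - injectable for tests
 * @returns {number} delay in milliseconds
 */
function backoffDelay(attempt, baseMs = 1000, capMs = 30000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Calls fn until it resolves or maxAttempts is reached, then rethrows the last error.
 * @template T
 * @param {() => Promise<T>} fn
 * @param {{ maxAttempts?: number, baseMs?: number, capMs?: number,
 *   wait?: (ms: number) => Promise<void> }} [options]
 * @returns {Promise<T>}
 */
async function withRetry(fn, options = {}) {
  const { maxAttempts = 4, baseMs = 1000, capMs = 30000, wait = sleep } = options;
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      console.error(`AI call failed (attempt ${attempt + 1}/${maxAttempts}):`, error.message);
      if (attempt < maxAttempts - 1) {
        await wait(backoffDelay(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}
```

This sketch retries every error. A real wrapper would likely retry only transient failures (timeouts, HTTP 429 and 5xx) and give up immediately on errors such as an invalid API key.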
### 5. Security Requirements (OWASP Top 10)
- NEVER commit `.env` file
- NEVER log API keys or passwords
- Use bcrypt for password hashing (cost factor 12)
- Session tokens stored in Redis with expiration (24 hours)
- All admin routes require authentication middleware
- Input validation on ALL user inputs with proper sanitization
- SQL injection prevention via parameterized queries ONLY
- XSS prevention via EJS auto-escaping AND Content Security Policy
- Rate limiting: 100 req/15min public, 30 req/15min admin, 10 req/hour for requests that trigger AI analysis
- Security headers REQUIRED on all responses:
  - Strict-Transport-Security
  - Content-Security-Policy
  - X-Content-Type-Options: nosniff
  - X-Frame-Options: DENY
  - X-XSS-Protection: 0 (the legacy browser XSS filter is deprecated; rely on CSP instead)
  - Referrer-Policy: strict-origin-when-cross-origin
- HTTPS only with HSTS
- Secure cookies (HttpOnly, Secure, SameSite=Strict)
- Regular dependency audits (`bun audit`)
- Non-root Docker user
- Log authentication attempts and errors (NEVER log sensitive data)
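A minimal sketch of a helper that stamps the required headers onto a Bun/Fetch API `Response`, plus a session cookie string with the required flags. The CSP value, the HSTS `max-age`, and the cookie name `session` are assumptions, not values taken from the existing `security.js`. `X-XSS-Protection` is set to `0`, following OWASP's current recommendation to disable the deprecated browser XSS filter and rely on CSP:

```javascript
// Header values are assumptions; tune the CSP to what the EJS views actually load.
const SECURITY_HEADERS = {
  'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
  'Content-Security-Policy': "default-src 'self'; frame-ancestors 'none'; object-src 'none'",
  'X-Content-Type-Options': 'nosniff',
  'X-Frame-Options': 'DENY',
  // Disables the legacy XSS auditor; CSP provides the actual XSS defense.
  'X-XSS-Protection': '0',
  'Referrer-Policy': 'strict-origin-when-cross-origin'
};

/**
 * Adds the security headers to a Response (mutates and returns it).
 * @param {Response} response
 * @returns {Response}
 */
function withSecurityHeaders(response) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    response.headers.set(name, value);
  }
  return response;
}

/**
 * Builds a Set-Cookie value for the admin session token.
 * @param {string} token
 * @returns {string}
 */
function sessionCookie(token) {
  return [
    `session=${encodeURIComponent(token)}`,
    'Path=/',
    `Max-Age=${24 * 60 * 60}`, // 24 hours, matching the session expiry in Redis
    'HttpOnly',
    'Secure',
    'SameSite=Strict'
  ].join('; ');
}
```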
### 6. Error Handling Pattern
```javascript
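async function handleRequest(req) {
  try {
    // Operation (call a service, render a view, ...)
    return new Response('OK');
  } catch (error) {
    // Log context plus the message only; never secrets, tokens, or request bodies
    console.error('Context:', error.message);
    // Return a user-friendly error without internal details
    return new Response('Something went wrong. Please try again later.', { status: 500 });
  }
}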
```
### 7. Code Style
- Use single quotes for strings
- 2-space indentation
- Semicolons required
- camelCase for variables/functions
- PascalCase for classes
- No trailing commas
- Max line length: 100 characters
## Common Commands
```bash
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f app
# Run database migrations
docker-compose exec app bun run migrate
# Restart app only
docker-compose restart app
# Shell into app container
docker-compose exec app sh
# Install new dependency
docker-compose exec app bun add package-name
# Run tests (when added)
docker-compose exec app bun test
```
## Database Schema
### services
- id (PK, serial)
- name (varchar)
- url (varchar)
- logo_url (varchar, nullable)
- policy_url (varchar)
- created_at (timestamp)
- updated_at (timestamp)
### policy_versions
- id (PK, serial)
- service_id (FK)
- content (text)
- content_hash (varchar 64)
- fetched_at (timestamp)
- created_at (timestamp)
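`content_hash` is sized for a 64-character hex digest, which matches SHA-256. Assuming that is the hash in use, the fetcher can skip storing a new version (and re-running analysis) when the digest of a freshly fetched policy equals the latest stored one. A minimal sketch; `contentHash` and `hasPolicyChanged` are hypothetical helper names:

```javascript
import { createHash } from 'node:crypto';

/**
 * Hex-encoded SHA-256 of the policy text (64 characters, fits varchar 64).
 * @param {string} content
 * @returns {string}
 */
function contentHash(content) {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

/**
 * True when the fetched text differs from the latest stored version.
 * @param {string} fetchedContent
 * @param {string | null} latestHash - content_hash of the newest policy_versions row, or null
 * @returns {boolean}
 */
function hasPolicyChanged(fetchedContent, latestHash) {
  return latestHash === null || contentHash(fetchedContent) !== latestHash;
}
```

If trivial markup or whitespace changes cause false positives, normalizing the extracted text before hashing is one option.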
### analyses
- id (PK, serial)
- service_id (FK)
- policy_version_id (FK)
- overall_score (char 1: A/B/C/D/E)
- findings (JSONB)
- raw_analysis (text)
- created_at (timestamp) - **This is the "last analyzed" date, must be displayed on all service pages**
- updated_at (timestamp)
### admin_sessions
- id (PK, serial)
- session_token (varchar, unique)
- created_at (timestamp)
- expires_at (timestamp)
## AI Prompt Template
When modifying AI analysis, use this structure:
```javascript
// Assumes `policyText` holds the fetched policy text (e.g. policy_versions.content).
const prompt = {
  model: process.env.OPENAI_MODEL,
  messages: [
    {
      role: 'system',
      content: `You are a privacy policy analyzer. Analyze the following privacy policy and provide a structured assessment.
Scoring Criteria:
- A: Excellent privacy practices
- B: Good with minor issues
- C: Acceptable but concerns exist
- D: Poor privacy practices
- E: Very invasive, major concerns
Categories:
1. Data Collection (what's collected)
2. Data Sharing (third parties)
3. User Rights (access, deletion, etc.)
4. Data Retention (how long kept)
5. Tracking & Security
Respond ONLY with valid JSON matching this schema:
{
  "overall_score": "A|B|C|D|E",
  "score_breakdown": { "data_collection": "A|B|C|D|E", ... },
  "findings": { "positive": [...], "negative": [...], "neutral": [...] },
  "data_types_collected": [...],
  "third_parties": [...],
  "summary": "string"
}`
    },
    {
      role: 'user',
      content: `Analyze this privacy policy:\n\n${policyText}`
    }
  ],
  response_format: { type: 'json_object' }
};
```
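JSON mode is meant to return parseable JSON, but it does not enforce the schema, and output can still be cut off at the token limit, so the reply is worth validating before it is saved. A minimal sketch that checks only the fields the schema above names; `parseAnalysis` is a hypothetical helper, and rejecting (rather than repairing) a bad reply is a design assumption:

```javascript
const GRADES = new Set(['A', 'B', 'C', 'D', 'E']);

/**
 * Parses and minimally validates the model's JSON reply.
 * @param {string} raw - message content returned by the model
 * @returns {{ overallScore: string, findings: object, summary: string }}
 */
function parseAnalysis(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error('AI response is not valid JSON');
  }
  if (!data || typeof data !== 'object') {
    throw new Error('AI response is not a JSON object');
  }
  if (!GRADES.has(data.overall_score)) {
    throw new Error(`Invalid overall_score: ${data.overall_score}`);
  }
  const findings = data.findings ?? {};
  for (const key of ['positive', 'negative', 'neutral']) {
    if (findings[key] !== undefined && !Array.isArray(findings[key])) {
      throw new Error(`findings.${key} must be an array`);
    }
  }
  return {
    overallScore: data.overall_score,
    findings: {
      positive: findings.positive ?? [],
      negative: findings.negative ?? [],
      neutral: findings.neutral ?? []
    },
    summary: typeof data.summary === 'string' ? data.summary : ''
  };
}
```

The caller can store the untouched `raw` string in `raw_analysis` before parsing, so replies that fail validation remain available for debugging.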
## SEO Requirements
### Meta Tags (All Public Pages)
Every public page MUST include:
```html
<!-- Basic Meta -->
<title>Descriptive Title - Privacy Policy Analyzer</title>
<meta name="description" content="150-160 character description">
<link rel="canonical" href="https://example.com/current-path">
<!-- Open Graph -->
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page description">
<meta property="og:image" content="https://example.com/og-image.jpg">
<meta property="og:url" content="https://example.com/current-path">
<meta property="og:type" content="website">
<!-- Twitter Card -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Page description">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">
```
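Descriptions stay within the 150-160 character range more easily if they are generated from the service summary and trimmed at a word boundary. A minimal sketch; `metaDescription` is a hypothetical helper, and the 160-character limit comes from the guideline above. EJS's `<%= %>` tag still handles HTML escaping when the value is rendered into the `content` attribute:

```javascript
/**
 * Collapses whitespace and trims text to at most maxLength characters,
 * cutting at the last word boundary and appending an ellipsis when shortened.
 * @param {string} text
 * @param {number} maxLength
 * @returns {string}
 */
function metaDescription(text, maxLength = 160) {
  const clean = text.replace(/\s+/g, ' ').trim();
  if (clean.length <= maxLength) {
    return clean;
  }
  // Reserve one character for the ellipsis.
  const cut = clean.slice(0, maxLength - 1);
  const lastSpace = cut.lastIndexOf(' ');
  const base = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return `${base.replace(/[\s.,;:]+$/, '')}…`;
}
```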
### Structured Data (Schema.org)
Include JSON-LD structured data on all service pages:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": {
    "@type": "Organization",
    "name": "Service Name"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "4",
    "bestRating": "5",
    "worstRating": "1"
  }
}
</script>
```
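The example uses a 1-5 `ratingValue`, so the A-E grade has to be mapped onto numbers. A minimal sketch assuming A maps to 5 and E to 1 (the mapping is not specified above). It also escapes `<` in the serialized JSON, because `JSON.stringify` alone does not stop a value containing `</script>` from closing the script element early. `serviceJsonLd` is a hypothetical helper name:

```javascript
// Assumed mapping from letter grade to the 1-5 scale used in the Rating example.
const GRADE_TO_RATING = { A: 5, B: 4, C: 3, D: 2, E: 1 };

/**
 * Builds the JSON-LD string for a service page.
 * @param {{ name: string, overallScore: 'A'|'B'|'C'|'D'|'E' }} service
 * @returns {string} safe to embed inside <script type="application/ld+json">
 */
function serviceJsonLd(service) {
  const data = {
    '@context': 'https://schema.org',
    '@type': 'Review',
    itemReviewed: { '@type': 'Organization', name: service.name },
    reviewRating: {
      '@type': 'Rating',
      ratingValue: String(GRADE_TO_RATING[service.overallScore]),
      bestRating: '5',
      worstRating: '1'
    }
  };
  // Escape "<" so a value containing "</script>" cannot terminate the script element.
  return JSON.stringify(data).replace(/</g, '\\u003c');
}
```

Because the output is already safe JSON, it would be rendered with EJS's unescaped `<%- %>` tag; HTML-escaping it with `<%= %>` would turn the quotes into entities and break the JSON.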
### Semantic HTML Requirements
- One `<h1>` per page with main topic
- Logical heading hierarchy (no skipping levels)
- Use `<header>`, `<nav>`, `<main>`, `<article>`, `<footer>`
- Breadcrumb navigation with Schema.org markup
- Descriptive link text (no "click here")
- **Display "Last Analyzed" date prominently on all service pages** (from analyses.created_at)
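Rendering the date in a `<time>` element gives both a human-readable label and a machine-readable `datetime` value, e.g. `<time datetime="<%= date.iso %>"><%= date.label %></time>`. A minimal sketch, assuming dates are displayed in UTC with US English formatting; `lastAnalyzed` is a hypothetical helper fed with `analyses.created_at`:

```javascript
/**
 * Formats analyses.created_at for the "Last Analyzed" line.
 * @param {Date} createdAt
 * @returns {{ iso: string, label: string }}
 */
function lastAnalyzed(createdAt) {
  // UTC keeps the displayed day stable regardless of the server's time zone.
  const label = new Intl.DateTimeFormat('en-US', { dateStyle: 'long', timeZone: 'UTC' })
    .format(createdAt);
  return { iso: createdAt.toISOString().slice(0, 10), label };
}
```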
## Accessibility Requirements (WCAG 2.1 AA)
### Mandatory Implementation
1. **Color Contrast**: Minimum 4.5:1 for text, 3:1 for UI components
2. **Keyboard Navigation**: All features accessible via keyboard only
3. **Focus Management**: Visible focus indicators (2px solid outline minimum)
4. **Alt Text**: All images must have descriptive alt text
5. **Form Labels**: All inputs must have associated labels
6. **ARIA Landmarks**: banner, main, navigation, contentinfo
7. **Skip Link**: "Skip to main content" link at top of page
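The 4.5:1 and 3:1 thresholds refer to the WCAG contrast ratio, which is computed from the relative luminance of the two colors. A sketch of that formula for `#rrggbb` colors; the helper names are hypothetical, and a checker such as WebAIM's remains the reference tool:

```javascript
/**
 * Relative luminance of a "#rrggbb" color, per the WCAG 2.x definition.
 * @param {string} hex
 * @returns {number} 0 (black) to 1 (white)
 */
function relativeLuminance(hex) {
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  // Linearize each sRGB channel before weighting.
  const [r, g, b] = channels.map((c) =>
    c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

/**
 * WCAG contrast ratio, from 1 (identical colors) to 21 (black on white).
 * @param {string} foreground
 * @param {string} background
 * @returns {number}
 */
function contrastRatio(foreground, background) {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const [lighter, darker] = a > b ? [a, b] : [b, a];
  return (lighter + 0.05) / (darker + 0.05);
}

// The #0066cc focus color from the focus styles against white is about 5.6:1,
// which clears the 4.5:1 text threshold.
contrastRatio('#0066cc', '#ffffff');
```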
### Accessibility Patterns
```html
<!-- Skip Link -->
<a href="#main-content" class="skip-link">Skip to main content</a>
<!-- ARIA Landmarks -->
<header role="banner">...</header>
<nav role="navigation" aria-label="Main">...</nav>
<main id="main-content" role="main">...</main>
<footer role="contentinfo">...</footer>
<!-- Accessible Form -->
<!-- The asterisk is decorative; the required attribute conveys the state to assistive tech -->
<label for="service-name">Service Name <span aria-hidden="true">*</span></label>
<input
  type="text"
  id="service-name"
  name="serviceName"
  required
  aria-required="true"
  aria-describedby="name-error"
>
<div id="name-error" role="alert" class="error-message"></div>
<!-- Accessible Button with Icon -->
<button aria-label="Close menu">
  <span aria-hidden="true">&times;</span>
</button>
<!-- Accessible Card Link -->
<article>
  <h2><a href="/service/facebook" aria-describedby="facebook-grade">Facebook</a></h2>
  <span id="facebook-grade" class="visually-hidden">Privacy Grade E</span>
</article>
```
### Focus Styles (CSS)
```css
/* Visible focus indicators */
:focus {
  outline: 2px solid #0066cc;
  outline-offset: 2px;
}

/* Skip link styling */
.skip-link {
  position: absolute;
  top: -40px;
  left: 0;
  background: #000;
  color: #fff;
  padding: 8px;
  z-index: 100;
}

.skip-link:focus {
  top: 0;
}

/* Visually hidden but screen-reader accessible */
.visually-hidden {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  border: 0;
}
```
## File Templates
### New Route (src/routes/example.js)
```javascript
import { Router } from '../utils/router.js';
import { authenticate } from '../middleware/auth.js';

const router = new Router();

// Public route
router.get('/example', async (req, res) => {
  // Handler
});

// Protected route
router.get('/admin/example', authenticate, async (req, res) => {
  // Handler
});

export default router;
```
### New Model (src/models/Example.js)
```javascript
import { sql } from '../config/database.js';

export class Example {
  static async findById(id) {
    const result = await sql`SELECT * FROM examples WHERE id = ${id}`;
    return result[0] || null;
  }

  static async create(data) {
    const result = await sql`
      INSERT INTO examples (field1, field2)
      VALUES (${data.field1}, ${data.field2})
      RETURNING *
    `;
    return result[0];
  }
}
```
### New Service (src/services/exampleService.js)
```javascript
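import { Example } from '../models/Example.js';

export const exampleService = {
  async performAction(params) {
    try {
      // Business logic (validation, model calls, cache invalidation, ...)
      const data = await Example.create(params);
      return { success: true, data };
    } catch (error) {
      // Log the message only; the full error object may include query parameters
      console.error('Service error:', error.message);
      throw error;
    }
  }
};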
```
## Testing Checklist
### Functional Testing
- [ ] App starts without errors (`docker-compose up`)
- [ ] No hardcoded secrets or credentials
- [ ] Database queries use parameterized statements
- [ ] Admin routes require authentication
- [ ] AI analysis handles errors gracefully
- [ ] No sensitive data in logs
### SEO Testing
- [ ] All pages have unique `<title>` tags (50-60 chars)
- [ ] All pages have meta descriptions (150-160 chars)
- [ ] Open Graph tags present on all public pages
- [ ] Canonical URLs set correctly
- [ ] Sitemap.xml auto-generates and is valid
- [ ] robots.txt allows public, blocks admin
- [ ] Semantic HTML5 structure (header, nav, main, article, footer)
- [ ] Single H1 per page with logical heading hierarchy
- [ ] All images have descriptive alt text
- [ ] Structured data (Schema.org) validates
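Since sitemap.xml and robots.txt are still open Phase 5 tasks, here is a minimal sketch of both. It assumes service pages live at `/service/<slug>` (as in the card-link example) and the admin area under `/admin`. The `services` table has no slug column, so how the slug is derived is left open; `baseUrl`, `slug`, `lastAnalyzed`, and all helper names are hypothetical. URLs are XML-escaped because sitemap entries must be valid XML:

```javascript
/** Escapes the five XML special characters. */
function xmlEscape(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/** One <url> entry; lastmod (YYYY-MM-DD) is optional. */
function sitemapEntry(loc, lastmod) {
  const lastmodTag = lastmod ? `<lastmod>${lastmod}</lastmod>` : '';
  return `  <url><loc>${xmlEscape(loc)}</loc>${lastmodTag}</url>`;
}

/**
 * @param {string} baseUrl - site origin without a trailing slash, e.g. 'https://example.com'
 * @param {{ slug: string, lastAnalyzed: Date }[]} services
 * @returns {string} sitemap.xml body
 */
function buildSitemap(baseUrl, services) {
  const entries = services.map((s) =>
    sitemapEntry(
      `${baseUrl}/service/${encodeURIComponent(s.slug)}`,
      s.lastAnalyzed.toISOString().slice(0, 10)
    )
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    sitemapEntry(`${baseUrl}/`),
    ...entries,
    '</urlset>'
  ].join('\n');
}

/** Allows public pages, blocks the admin area, and advertises the sitemap. */
function buildRobotsTxt(baseUrl) {
  return [
    'User-agent: *',
    'Disallow: /admin',
    `Sitemap: ${baseUrl}/sitemap.xml`,
    ''
  ].join('\n');
}
```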
### Performance Testing
- [ ] Lighthouse score ≥ 90 on all metrics
- [ ] First Contentful Paint < 1.0s
- [ ] Largest Contentful Paint < 2.5s
- [ ] Time to Interactive < 3.8s
- [ ] Cumulative Layout Shift < 0.1
- [ ] Redis caching working (verify with `redis-cli`)
- [ ] Gzip/Brotli compression enabled
- [ ] Images optimized (WebP format, proper sizing)
- [ ] CSS/JS minified
### Security Testing
- [ ] Security headers present on all responses
- [ ] HTTPS enforced (HSTS header)
- [ ] Cookies have HttpOnly, Secure, SameSite flags
- [ ] Rate limiting prevents abuse (test with `ab` or `wrk`)
- [ ] SQL injection attempts blocked
- [ ] XSS attempts blocked (test with `<script>alert(1)</script>`)
- [ ] Admin routes inaccessible without authentication
- [ ] Session expires after 24 hours
- [ ] `bun audit` passes with no critical vulnerabilities
- [ ] No secrets in logs or error messages
### Accessibility Testing (WCAG 2.1 AA)
- [ ] All images have alt text
- [ ] Color contrast ≥ 4.5:1 for normal text (test with WebAIM)
- [ ] Color contrast ≥ 3:1 for large text and UI components
- [ ] Keyboard navigation works throughout site
- [ ] Focus indicators visible (2px outline minimum)
- [ ] Skip to main content link present
- [ ] Form labels associated with inputs
- [ ] Page titles descriptive and unique
- [ ] ARIA landmarks used (banner, main, navigation, contentinfo)
- [ ] Screen reader announces content correctly (test with NVDA/VoiceOver)
- [ ] Touch targets ≥ 44x44px
- [ ] No flashing content (>3 Hz)
- [ ] axe-core passes with 0 violations
## Performance Guidelines
### Caching Strategy
- Public pages: Redis TTL 1 hour
- Analysis results: Redis TTL 24 hours
- API responses: Redis TTL 5 minutes
- Meilisearch queries: Redis TTL 10 minutes
- Cache invalidation on data update
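These TTLs map naturally onto a cache-aside helper: read from the cache, fall back to the loader on a miss, store the result with the category's TTL, and delete the key when the underlying data changes. A minimal sketch in which an in-memory `Map` with an injectable clock stands in for Redis (in production the same logic would use Redis `GET`, `SET key value EX seconds`, and `DEL`); `TtlCache`, `getOrLoad`, and the key names are hypothetical:

```javascript
// TTLs in seconds, from the caching strategy above.
const TTL = {
  publicPage: 60 * 60,
  analysis: 24 * 60 * 60,
  apiResponse: 5 * 60,
  searchQuery: 10 * 60
};

/** In-memory stand-in for Redis, with an injectable clock for testing. */
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  delete(key) {
    this.entries.delete(key);
  }
}

/** Cache-aside: return the cached value, or load, store, and return it. */
async function getOrLoad(cache, key, ttlSeconds, loader) {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const value = await loader();
  cache.set(key, value, ttlSeconds);
  return value;
}
```

Invalidation then amounts to deleting the affected keys (for example, a service page and its analysis) right after a new analysis is saved.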
### Database Optimization
- Index frequently queried columns: services.name, analyses.overall_score
- Use connection pooling (max 20 connections)
- Query optimization with EXPLAIN ANALYZE
- Lazy load analysis results
- Paginate service listings (25 per page)
- Compress large policy texts before storage
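Paginating 25 services per page translates into `LIMIT`/`OFFSET` values. A minimal sketch that clamps untrusted page numbers from the query string; the helper name and the clamping behavior are assumptions, and the resulting numbers are passed to the `postgres` tagged template as parameters like any other input:

```javascript
const PER_PAGE = 25;

/**
 * Converts a 1-based page number (possibly an untrusted query-string value)
 * into LIMIT/OFFSET values, clamped to the valid range.
 * @param {string | number | undefined} page
 * @param {number} totalCount - total number of services
 * @param {number} perPage
 */
function pagination(page, totalCount, perPage = PER_PAGE) {
  const totalPages = Math.max(1, Math.ceil(totalCount / perPage));
  const requested = Number.parseInt(String(page ?? '1'), 10);
  const current = Number.isNaN(requested) ? 1 : Math.min(Math.max(requested, 1), totalPages);
  return { limit: perPage, offset: (current - 1) * perPage, current, totalPages };
}

// Usage with the postgres library (values are sent as query parameters):
// const rows = await sql`SELECT * FROM services ORDER BY name LIMIT ${p.limit} OFFSET ${p.offset}`;
```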
### Asset Optimization
- Minify CSS/JS for production
- Use WebP format for images
- Implement lazy loading for images
- Critical CSS inline for above-fold content
- Use `async`/`defer` for non-critical scripts
- Brotli + Gzip compression for text responses
### Target Metrics
- First Contentful Paint (FCP): < 1.0s
- Largest Contentful Paint (LCP): < 2.5s
- Time to Interactive (TTI): < 3.8s
- Cumulative Layout Shift (CLS): < 0.1
- Lighthouse Performance Score: ≥ 90
## Deployment Notes
- All services run via Docker Compose
- Persistent volumes for PostgreSQL, Redis, Meilisearch
- Restart policy: always (except during development)
- Logs go to stdout/stderr (Docker handles collection)
- Environment variables set in `.env` on host
## Troubleshooting
### App won't start
1. Check `.env` exists and has all required vars
2. Ensure ports 3000, 5432, 6379, 7700 are free
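3. Run `docker-compose down` and `docker-compose up -d` (only add `-v` if you intend to delete the PostgreSQL, Redis, and Meilisearch volumes along with all their data)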
### Database connection fails
1. Verify DATABASE_URL format
2. Check postgres container is running: `docker-compose ps`
3. Check logs: `docker-compose logs postgres`
### AI analysis fails
1. Verify OPENAI_API_KEY is set
2. Check OpenAI API status
3. Review raw_analysis column for error details
## External Dependencies
- **PostgreSQL**: https://www.postgresql.org/docs/15/
- **Meilisearch**: https://www.meilisearch.com/docs
- **Redis**: https://redis.io/docs/
- **OpenAI API**: https://platform.openai.com/docs
- **Bun**: https://bun.sh/docs
- **EJS**: https://ejs.co/#docs
## Contact & Resources
- **Project Type**: Private pet project
- **Hosting**: Self-hosted Linode instance
- **No external contributors expected**
- **No CI/CD pipeline** (manual deployment)
## Change Log
When making significant changes, update this section:
```
2026-01-27: Completed Phase 1-4 - Infrastructure, Database, Middleware, Routes, and Services. All core functionality working including Docker setup, PostgreSQL/Redis/Meilisearch, AI analysis with OpenAI, policy fetching, and cron scheduling.
```
---
**Last Updated**: 2026-01-27
**Version**: 1.0