santhoshj/didnt-read

Santhosh Janardhanan c85b877dc0 Initial Commit

2026-01-27 13:24:03 -05:00

AGENTS.md - Privacy Policy Analyzer

This file provides essential context and guidelines for AI agents working on this project.

Project Overview

Privacy Policy Analyzer - A self-hosted web application that analyzes website privacy policies using OpenAI's GPT models. It provides easy-to-understand A-E grades and detailed findings about privacy practices.

Inspiration: ToS;DR (Terms of Service; Didn't Read) - but focused specifically on privacy policies.

Repository: Private pet project, no monetization

Tech Stack

Runtime: Bun (JavaScript, NOT TypeScript)
Web Framework: Native Bun HTTP server or Elysia.js (lightweight)
Database: PostgreSQL 15
Search: Meilisearch v1.6
Cache: Redis 7
Templating: EJS
AI: OpenAI API (GPT-4o/GPT-4-turbo)
Containerization: Docker + Docker Compose
Hosting: Self-hosted on Linode

Project Structure

privacy-policy-analyzer/
├── docker-compose.yml          # Service orchestration
├── Dockerfile                  # Bun app container
├── .env                        # Environment variables (gitignored)
├── package.json               # Bun dependencies
├── src/
│   ├── app.js                 # Entry point
│   ├── config/                # Configuration files
│   ├── models/                # Database models
│   ├── routes/                # Route definitions
│   ├── controllers/           # Request handlers
│   ├── services/              # Business logic
│   ├── middleware/            # Express-style middleware
│   ├── views/                 # EJS templates
│   └── utils/                 # Helper functions
├── migrations/                # SQL migrations
└── public/                    # Static assets

Progress Tracking

This project uses a task tracking system to monitor progress. Tasks are managed using the todo tool and organized by priority:

Priority Levels

High: Critical infrastructure and core functionality
Medium: Essential features and business logic
Low: Enhancements, optimizations, and polish

Progress Checklist (48 Tasks Total)

Phase 1: Infrastructure Setup (High Priority) - COMPLETED ✓

Create project root files (docker-compose.yml, Dockerfile, .env.example, package.json)
Create directory structure (src/, migrations/, public/)
Configure PostgreSQL in docker-compose.yml with persistent volume
Configure Redis in docker-compose.yml with persistent volume
Configure Meilisearch in docker-compose.yml with persistent volume
Create Bun Dockerfile with optimized build
Set up .env.example with all required environment variables
Create package.json with dependencies (postgres, ejs, openai, etc.)
Test Docker Compose setup - verify all services start

Phase 2: Database & Models (Medium Priority) - COMPLETED ✓

Create database migration file (001_initial.sql) with schema
Create src/config/database.js for PostgreSQL connection
Create src/config/redis.js for Redis connection
Create src/config/meilisearch.js for Meilisearch client
Create src/config/openai.js for OpenAI client
Create database migration runner script
Create src/models/Service.js
Create src/models/PolicyVersion.js
Create src/models/Analysis.js
Create src/models/AdminSession.js

Phase 3: Middleware & Routes (Medium Priority) - COMPLETED ✓

Create src/middleware/auth.js for session authentication
Create src/middleware/errorHandler.js
Create src/middleware/security.js for security headers
Create src/middleware/rateLimiter.js
Create src/routes/admin.js with authentication routes
Create src/views/admin/login.ejs
Create admin dashboard view
Create src/routes/public.js for public pages
Create main layout EJS template with SEO meta tags
Create public homepage view with service listing
Create service detail page view with last analyzed date display

Phase 4: Services & Features (Medium Priority) - COMPLETED ✓

Create src/services/policyFetcher.js to fetch policy from URL
Create src/services/aiAnalyzer.js with OpenAI integration
Create admin service management forms (add/edit)
Implement manual analysis trigger in admin panel
Create src/services/scheduler.js for cron jobs
Create src/services/searchIndexer.js for Meilisearch

Phase 5: Enhancements (Low Priority) - IN PROGRESS

Implement Redis caching for public pages
Create sitemap.xml generator
Create robots.txt
Add structured data (Schema.org) to service pages
Implement accessibility features (WCAG 2.1 AA) - Already implemented
Add CSS styling with focus indicators - Already implemented
Implement skip to main content link - Already implemented
Performance testing and optimization
Security audit and penetration testing
Accessibility audit with axe-core
SEO audit and optimization
Create comprehensive documentation

Working with Tasks

ALWAYS check the current todo list before starting work
Update task status to in_progress when starting work
Mark complete immediately after finishing a task
Verify completed tasks using testing checklists in this document
Review progress regularly to maintain momentum

Current Phase Focus

We are currently in Phase 5: Enhancements. Phases 1-4 are complete. All core functionality is working. Remaining tasks are optimizations, audits, and documentation.

Critical Rules

1. JavaScript Only

NO TypeScript
Use JSDoc comments for type documentation when helpful
Bun supports modern JavaScript (ES2023)

2. Database Conventions

Use postgres library (Bun-compatible)
Always use parameterized queries
Migrations are in migrations/ folder, numbered sequentially
Never write raw SQL in routes/controllers

3. Environment Variables

ALL configuration goes in .env:

DATABASE_URL=postgresql://user:pass@postgres:5432/dbname
REDIS_URL=redis://redis:6379
MEILISEARCH_URL=http://meilisearch:7700
MEILISEARCH_API_KEY=key
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme
SESSION_SECRET=random_string
PORT=3000
NODE_ENV=production
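
A startup check for these variables could look like the sketch below. The variable names come from the listing above; the `findMissingEnv` helper is hypothetical and not part of the project.

```javascript
// Names taken from the .env listing above.
const REQUIRED_ENV = [
  'DATABASE_URL', 'REDIS_URL', 'MEILISEARCH_URL', 'MEILISEARCH_API_KEY',
  'OPENAI_API_KEY', 'OPENAI_MODEL', 'ADMIN_USERNAME', 'ADMIN_PASSWORD',
  'SESSION_SECRET', 'PORT', 'NODE_ENV'
];

/**
 * Hypothetical helper: returns the names of required variables that are
 * missing or blank in the given environment object.
 * @param {Record<string, string | undefined>} env
 * @returns {string[]}
 */
function findMissingEnv(env) {
  return REQUIRED_ENV.filter((name) => !env[name] || env[name].trim() === '');
}

// Possible usage at startup (only names are logged, never values):
// const missing = findMissingEnv(process.env);
// if (missing.length > 0) {
//   console.error('Missing environment variables:', missing.join(', '));
//   process.exit(1);
// }
```

Failing fast at startup tends to be easier to debug than a connection error surfacing later in a request handler.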

4. AI Analysis Guidelines

Always use OpenAI's JSON mode for structured output
Store raw AI response in database (for debugging)
Implement retry logic with exponential backoff
Rate limit AI calls (max 10/minute)
Handle AI failures gracefully - don't crash the app
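
The retry guideline above can be sketched as follows. The helper names (`backoffDelay`, `withRetry`) and the default delays are assumptions for illustration, not the project's actual implementation.

```javascript
/**
 * Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
 */
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/**
 * Hypothetical wrapper: runs `fn`, retrying on failure with exponential backoff.
 * Rethrows the last error once all retries are used up, so the caller can
 * record the failure instead of crashing.
 */
async function withRetry(fn, { retries = 3, baseMs = 500, maxMs = 8000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      const delay = backoffDelay(attempt, baseMs, maxMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A caller would wrap the OpenAI request in `withRetry` and catch the final error to store a failed-analysis record.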

5. Security Requirements (OWASP Top 10)

NEVER commit .env file
NEVER log API keys or passwords
Use bcrypt for password hashing (cost factor 12)
Session tokens stored in Redis with expiration (24 hours)
All admin routes require authentication middleware
Input validation on ALL user inputs with proper sanitization
SQL injection prevention via parameterized queries ONLY
XSS prevention via EJS auto-escaping AND Content Security Policy
Rate limiting: 100 req/15min public, 30 req/15min admin, 10 req/hour AI
Security headers REQUIRED on all responses:
- Strict-Transport-Security
- Content-Security-Policy
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
- X-XSS-Protection: 1; mode=block
- Referrer-Policy: strict-origin-when-cross-origin
HTTPS only with HSTS
Secure cookies (HttpOnly, Secure, SameSite=Strict)
Regular dependency audits (bun audit)
Non-root Docker user
Log authentication attempts and errors (NEVER log sensitive data)
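
One way to attach the required headers to every response is sketched below, using the standard Fetch API `Response` and `Headers` classes that Bun provides. The header names come from the list above; the HSTS and CSP values, and the `withSecurityHeaders` helper itself, are placeholder assumptions to be tuned for the real deployment.

```javascript
// Header names from the list above; HSTS and CSP values are assumptions.
const SECURITY_HEADERS = {
  'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
  'Content-Security-Policy': "default-src 'self'",
  'X-Content-Type-Options': 'nosniff',
  'X-Frame-Options': 'DENY',
  'X-XSS-Protection': '1; mode=block',
  'Referrer-Policy': 'strict-origin-when-cross-origin'
};

/**
 * Hypothetical helper: returns a copy of `response` with the security
 * headers set, preserving status and body.
 * @param {Response} response
 * @returns {Response}
 */
function withSecurityHeaders(response) {
  const headers = new Headers(response.headers);
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    headers.set(name, value);
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers
  });
}
```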

6. Error Handling Pattern

try {
  // Operation
} catch (error) {
  console.error('Context:', error.message);
  // Return user-friendly error
  return new Response('Error message', { status: 500 });
}
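
To avoid repeating this pattern in every handler, it could be factored into a wrapper. The sketch below assumes handlers return a `Response`; `withErrorHandling` is a hypothetical name, not an existing project utility.

```javascript
/**
 * Hypothetical wrapper: applies the try/catch pattern above to any async handler.
 * @param {string} context Label used in the log line
 * @param {(...args: any[]) => Promise<Response>} handler
 */
function withErrorHandling(context, handler) {
  return async (...args) => {
    try {
      return await handler(...args);
    } catch (error) {
      // Log only the message; stack traces and payloads may hold sensitive data.
      console.error(`${context}:`, error.message);
      return new Response('Something went wrong. Please try again later.', { status: 500 });
    }
  };
}
```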

7. Code Style

Use single quotes for strings
2-space indentation
Semicolons required
camelCase for variables/functions
PascalCase for classes
No trailing commas
Max line length: 100 characters

Common Commands

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f app

# Run database migrations
docker-compose exec app bun run migrate

# Restart app only
docker-compose restart app

# Shell into app container
docker-compose exec app sh

# Install new dependency
docker-compose exec app bun add package-name

# Run tests (when added)
docker-compose exec app bun test

Database Schema

services

id (PK, serial)
name (varchar)
url (varchar)
logo_url (varchar, nullable)
policy_url (varchar)
created_at (timestamp)
updated_at (timestamp)

policy_versions

id (PK, serial)
service_id (FK)
content (text)
content_hash (varchar 64)
fetched_at (timestamp)
created_at (timestamp)

analyses

id (PK, serial)
service_id (FK)
policy_version_id (FK)
overall_score (char 1: A/B/C/D/E)
findings (JSONB)
raw_analysis (text)
created_at (timestamp) - This is the "last analyzed" date, must be displayed on all service pages
updated_at (timestamp)

admin_sessions

id (PK, serial)
session_token (varchar, unique)
created_at (timestamp)
expires_at (timestamp)

AI Prompt Template

When modifying AI analysis, use this structure:

const prompt = {
  model: process.env.OPENAI_MODEL,
  messages: [
    {
      role: 'system',
      content: `You are a privacy policy analyzer. Analyze the following privacy policy and provide a structured assessment.

Scoring Criteria:
- A: Excellent privacy practices
- B: Good with minor issues
- C: Acceptable but concerns exist
- D: Poor privacy practices
- E: Very invasive, major concerns

Categories:
1. Data Collection (what's collected)
2. Data Sharing (third parties)
3. User Rights (access, deletion, etc.)
4. Data Retention (how long kept)
5. Tracking & Security

Respond ONLY with valid JSON matching this schema:
{
  "overall_score": "A|B|C|D|E",
  "score_breakdown": { "data_collection": "A|B|C|D|E", ... },
  "findings": { "positive": [...], "negative": [...], "neutral": [...] },
  "data_types_collected": [...],
  "third_parties": [...],
  "summary": "string"
}`
    },
    {
      role: 'user',
      content: `Analyze this privacy policy:\n\n${policyText}`
    }
  ],
  response_format: { type: 'json_object' }
};
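
JSON mode guarantees syntactically valid JSON but not that the fields match the schema in the prompt, so the response is worth validating before it is stored. The following is a minimal sketch of such a check; `parseAnalysis` is a hypothetical helper and covers only a few of the fields.

```javascript
const GRADES = ['A', 'B', 'C', 'D', 'E'];

/**
 * Hypothetical validator for the JSON described in the system prompt.
 * Returns { ok: true, analysis } or { ok: false, error } instead of throwing,
 * so a malformed AI response cannot crash the caller.
 * @param {string} raw Raw message content returned by the model
 */
function parseAnalysis(raw) {
  let analysis;
  try {
    analysis = JSON.parse(raw);
  } catch (error) {
    return { ok: false, error: 'Response is not valid JSON' };
  }
  if (analysis === null || typeof analysis !== 'object') {
    return { ok: false, error: 'Response is not a JSON object' };
  }
  if (!GRADES.includes(analysis.overall_score)) {
    return { ok: false, error: 'overall_score must be one of A-E' };
  }
  const findings = analysis.findings || {};
  for (const key of ['positive', 'negative', 'neutral']) {
    if (!Array.isArray(findings[key])) {
      return { ok: false, error: `findings.${key} must be an array` };
    }
  }
  if (typeof analysis.summary !== 'string') {
    return { ok: false, error: 'summary must be a string' };
  }
  return { ok: true, analysis };
}
```

The raw string can still be saved to `raw_analysis` regardless of the validation result, which keeps failed responses available for debugging.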

SEO Requirements

Meta Tags (All Public Pages)

Every public page MUST include:

<!-- Basic Meta -->
<title>Descriptive Title - Privacy Policy Analyzer</title>
<meta name="description" content="150-160 character description">
<link rel="canonical" href="https://example.com/current-path">

<!-- Open Graph -->
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page description">
<meta property="og:image" content="https://example.com/og-image.jpg">
<meta property="og:url" content="https://example.com/current-path">
<meta property="og:type" content="website">

<!-- Twitter Card -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Page description">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">

Structured Data (Schema.org)

Include JSON-LD structured data on all service pages:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": {
    "@type": "Organization",
    "name": "Service Name"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "4",
    "bestRating": "5",
    "worstRating": "1"
  }
}
</script>
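
The example uses a numeric rating, so the letter grade has to be converted. The sketch below assumes a linear mapping (A = 5 down to E = 1), which the document does not specify; `buildReviewJsonLd` and `serializeJsonLd` are hypothetical helper names.

```javascript
const GRADE_ORDER = ['A', 'B', 'C', 'D', 'E'];

/**
 * Builds the Review object shown above. Assumed mapping: A -> 5 ... E -> 1.
 * @param {string} serviceName
 * @param {string} grade One of A-E
 */
function buildReviewJsonLd(serviceName, grade) {
  const index = GRADE_ORDER.indexOf(grade);
  if (index === -1) {
    throw new Error(`Unknown grade: ${grade}`);
  }
  return {
    '@context': 'https://schema.org',
    '@type': 'Review',
    itemReviewed: { '@type': 'Organization', name: serviceName },
    reviewRating: {
      '@type': 'Rating',
      ratingValue: String(5 - index),
      bestRating: '5',
      worstRating: '1'
    }
  };
}

/**
 * Serializes for embedding inside a <script> tag. Escaping "<" prevents a
 * service name containing "</script>" from closing the tag early.
 */
function serializeJsonLd(data) {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}
```

Because the serialized string is already escaped for a script context, it would be emitted in EJS with the unescaped output tag rather than the auto-escaping one.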

Semantic HTML Requirements

One <h1> per page with main topic
Logical heading hierarchy (no skipping levels)
Use <header>, <nav>, <main>, <article>, <footer>
Breadcrumb navigation with Schema.org markup
Descriptive link text (no "click here")
Display "Last Analyzed" date prominently on all service pages (from analyses.created_at)

Accessibility Requirements (WCAG 2.1 AA)

Mandatory Implementation

Color Contrast: Minimum 4.5:1 for text, 3:1 for UI components
Keyboard Navigation: All features accessible via keyboard only
Focus Management: Visible focus indicators (2px solid outline minimum)
Alt Text: All images must have descriptive alt text
Form Labels: All inputs must have associated labels
ARIA Landmarks: banner, main, navigation, contentinfo
Skip Link: "Skip to main content" link at top of page
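
The contrast thresholds above can be checked programmatically. This sketch implements the WCAG 2.1 relative luminance and contrast ratio formulas for `#rrggbb` colors; the function names are illustrative, and shorthand or named colors are not handled.

```javascript
/** Relative luminance of a '#rrggbb' color, per the WCAG 2.1 definition. */
function relativeLuminance(hex) {
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  const [r, g, b] = channels.map((c) =>
    (c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4)
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

/** Contrast ratio between two '#rrggbb' colors, ranging from 1 to 21. */
function contrastRatio(hexA, hexB) {
  const a = relativeLuminance(hexA);
  const b = relativeLuminance(hexB);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// Example check against the 4.5:1 text threshold:
// contrastRatio('#0066cc', '#ffffff') >= 4.5
```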

Accessibility Patterns

<!-- Skip Link -->
<a href="#main-content" class="skip-link">Skip to main content</a>

<!-- ARIA Landmarks -->
<header role="banner">...</header>
<nav role="navigation" aria-label="Main">...</nav>
<main id="main-content" role="main">...</main>
<footer role="contentinfo">...</footer>

<!-- Accessible Form -->
<label for="service-name">Service Name <span aria-label="required">*</span></label>
<input 
  type="text" 
  id="service-name" 
  name="serviceName" 
  required
  aria-required="true"
  aria-describedby="name-error"
>
<div id="name-error" role="alert" class="error-message"></div>

<!-- Accessible Button with Icon -->
<button aria-label="Close menu">
  <span aria-hidden="true">&times;</span>
</button>

<!-- Accessible Card Link -->
<article>
  <h2><a href="/service/facebook" aria-describedby="facebook-grade">Facebook</a></h2>
  <span id="facebook-grade" class="visually-hidden">Privacy Grade E</span>
</article>

Focus Styles (CSS)

/* Visible focus indicators */
:focus {
  outline: 2px solid #0066cc;
  outline-offset: 2px;
}

/* Skip link styling */
.skip-link {
  position: absolute;
  top: -40px;
  left: 0;
  background: #000;
  color: #fff;
  padding: 8px;
  z-index: 100;
}

.skip-link:focus {
  top: 0;
}

/* Visually hidden but screen-reader accessible */
.visually-hidden {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  border: 0;
}

File Templates

New Route (src/routes/example.js)

import { Router } from '../utils/router.js';
import { authenticate } from '../middleware/auth.js';

const router = new Router();

// Public route
router.get('/example', async (req, res) => {
  // Handler
});

// Protected route
router.get('/admin/example', authenticate, async (req, res) => {
  // Handler
});

export default router;

New Model (src/models/Example.js)

import { sql } from '../config/database.js';

export class Example {
  static async findById(id) {
    const result = await sql`SELECT * FROM examples WHERE id = ${id}`;
    return result[0] || null;
  }

  static async create(data) {
    const result = await sql`
      INSERT INTO examples (field1, field2)
      VALUES (${data.field1}, ${data.field2})
      RETURNING *
    `;
    return result[0];
  }
}

New Service (src/services/exampleService.js)

import { Example } from '../models/Example.js';

export const exampleService = {
  async performAction(params) {
    try {
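      // Business logic (placeholder lookup; replace with the real operation)
      const data = await Example.findById(params.id);
      return { success: true, data };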
    } catch (error) {
      console.error('Service error:', error);
      throw error;
    }
  }
};

Testing Checklist

Functional Testing

App starts without errors (docker-compose up)
No hardcoded secrets or credentials
Database queries use parameterized statements
Admin routes require authentication
AI analysis handles errors gracefully
No sensitive data in logs

SEO Testing

All pages have unique <title> tags (50-60 chars)
All pages have meta descriptions (150-160 chars)
Open Graph tags present on all public pages
Canonical URLs set correctly
Sitemap.xml auto-generates and is valid
robots.txt allows public, blocks admin
Semantic HTML5 structure (header, nav, main, article, footer)
Single H1 per page with logical heading hierarchy
All images have descriptive alt text
Structured data (Schema.org) validates

Performance Testing

Lighthouse score ≥ 90 in all categories
First Contentful Paint < 1.0s
Largest Contentful Paint < 2.5s
Time to Interactive < 3.8s
Cumulative Layout Shift < 0.1
Redis caching working (verify with redis-cli)
Gzip/Brotli compression enabled
Images optimized (WebP format, proper sizing)
CSS/JS minified

Security Testing

Security headers present on all responses
HTTPS enforced (HSTS header)
Cookies have HttpOnly, Secure, SameSite flags
Rate limiting prevents abuse (test with ab or wrk)
SQL injection attempts blocked
XSS attempts blocked (test with <script>alert(1)</script>)
Admin routes inaccessible without authentication
Session expires after 24 hours
bun audit passes with no critical vulnerabilities
No secrets in logs or error messages

Accessibility Testing (WCAG 2.1 AA)

All images have alt text
Color contrast ≥ 4.5:1 for normal text (test with WebAIM)
Color contrast ≥ 3:1 for large text and UI components
Keyboard navigation works throughout site
Focus indicators visible (2px outline minimum)
Skip to main content link present
Form labels associated with inputs
Page titles descriptive and unique
ARIA landmarks used (banner, main, navigation)
Screen reader announces content correctly (test with NVDA/VoiceOver)
Touch targets ≥ 44x44px
No content that flashes more than 3 times per second
axe-core passes with 0 violations

Performance Guidelines

Caching Strategy

Public pages: Redis TTL 1 hour
Analysis results: Redis TTL 24 hours
API responses: Redis TTL 5 minutes
Meilisearch queries: Redis TTL 10 minutes
Cache invalidation on data update
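
The strategy above follows a cache-aside pattern, sketched below. The TTL values come from the list; the `cached` helper and the `store` interface (async `get`, `set`, `del`) are assumptions, with an in-memory stand-in so the example runs without Redis.

```javascript
// TTLs in seconds, taken from the list above.
const CACHE_TTL = {
  publicPage: 60 * 60,
  analysis: 24 * 60 * 60,
  apiResponse: 5 * 60,
  searchQuery: 10 * 60
};

/**
 * Hypothetical cache-aside helper. `store` is any object exposing async
 * get(key) and set(key, value, ttlSeconds); a thin adapter around the
 * Redis client would satisfy it.
 */
async function cached(store, key, ttlSeconds, loader) {
  const hit = await store.get(key);
  if (hit !== null && hit !== undefined) {
    return JSON.parse(hit);
  }
  const value = await loader();
  await store.set(key, JSON.stringify(value), ttlSeconds);
  return value;
}

/** In-memory stand-in for Redis, for local experiments and tests only. */
function createMemoryStore(now = () => Date.now()) {
  const entries = new Map();
  return {
    async get(key) {
      const entry = entries.get(key);
      if (!entry || entry.expiresAt <= now()) {
        entries.delete(key);
        return null;
      }
      return entry.value;
    },
    async set(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
    async del(key) {
      entries.delete(key);
    }
  };
}
```

Invalidation on update then amounts to calling `store.del(key)` after the database write succeeds.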

Database Optimization

Index frequently queried columns: services.name, analyses.overall_score
Use connection pooling (max 20 connections)
Query optimization with EXPLAIN ANALYZE
Lazy load analysis results
Paginate service listings (25 per page)
Compress large policy texts before storage
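
For the 25-per-page listing, the LIMIT/OFFSET values can be derived from a page number as in this sketch. `pageWindow` is a hypothetical helper; it clamps out-of-range page numbers instead of returning an empty page.

```javascript
const PAGE_SIZE = 25;

/**
 * Computes the query window for a 1-based page number.
 * @param {number} page Requested page (values outside the range are clamped)
 * @param {number} totalItems Total number of rows
 * @param {number} [pageSize]
 */
function pageWindow(page, totalItems, pageSize = PAGE_SIZE) {
  const totalPages = Math.max(1, Math.ceil(totalItems / pageSize));
  const requested = Number.isInteger(page) ? page : 1;
  const current = Math.min(Math.max(1, requested), totalPages);
  return { page: current, totalPages, limit: pageSize, offset: (current - 1) * pageSize };
}

// Possible use in a model method with the postgres library's tagged template:
// const { limit, offset } = pageWindow(page, total);
// await sql`SELECT * FROM services ORDER BY name LIMIT ${limit} OFFSET ${offset}`;
```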

Asset Optimization

Minify CSS/JS for production
Use WebP format for images
Implement lazy loading for images
Critical CSS inline for above-fold content
Use async/defer for non-critical scripts
Brotli + Gzip compression for text responses

Target Metrics

First Contentful Paint (FCP): < 1.0s
Largest Contentful Paint (LCP): < 2.5s
Time to Interactive (TTI): < 3.8s
Cumulative Layout Shift (CLS): < 0.1
Lighthouse Performance Score: ≥ 90

Deployment Notes

All services run via Docker Compose
Persistent volumes for PostgreSQL, Redis, Meilisearch
Restart policy: always (except during development)
Logs go to stdout/stderr (Docker handles collection)
Environment variables set in .env on host

Troubleshooting

App won't start

Check .env exists and has all required vars
Ensure ports 3000, 5432, 6379, 7700 are free
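Run docker-compose down -v and docker-compose up -d (warning: -v also deletes the named volumes, so all PostgreSQL, Redis, and Meilisearch data is lost; omit -v to keep existing data)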

Database connection fails

Verify DATABASE_URL format
Check postgres container is running: docker-compose ps
Check logs: docker-compose logs postgres

AI analysis fails

Verify OPENAI_API_KEY is set
Check OpenAI API status
Review raw_analysis column for error details

External Dependencies

PostgreSQL: https://www.postgresql.org/docs/15/
Meilisearch: https://www.meilisearch.com/docs
Redis: https://redis.io/docs/
OpenAI API: https://platform.openai.com/docs
Bun: https://bun.sh/docs
EJS: https://ejs.co/#docs

Contact & Resources

Project Type: Private pet project
Hosting: Self-hosted Linode instance
No external contributors expected
No CI/CD pipeline (manual deployment)

Change Log

When making significant changes, update this section:

2026-01-27: Completed Phases 1-4 - Infrastructure, Database, Middleware, Routes, and Services. All core functionality is working, including Docker setup, PostgreSQL/Redis/Meilisearch, AI analysis with OpenAI, policy fetching, and cron scheduling.

Last Updated: 2026-01-27
Version: 1.0
