# Privacy Policy Analyzer - Implementation Plan

## Overview

A self-hosted web application that analyzes privacy policies using AI (ChatGPT) and provides easy-to-understand ratings and summaries.

## Tech Stack

- **Runtime**: Bun (JavaScript)
- **Database**: PostgreSQL
- **Search**: Meilisearch
- **Cache**: Redis
- **Templating**: EJS
- **AI**: OpenAI API (GPT-4o/GPT-4-turbo)
- **Containerization**: Docker Compose

## Project Structure

```
privacy-policy-analyzer/
├── docker-compose.yml          # Multi-service orchestration
├── Dockerfile                  # Bun app container
├── .env.example                # Environment variables template
├── .env                        # Actual environment variables (gitignored)
├── package.json                # Bun dependencies
├── src/
│   ├── app.js                  # Main application entry
│   ├── config/
│   │   ├── database.js         # PostgreSQL connection
│   │   ├── redis.js            # Redis connection
│   │   ├── meilisearch.js      # Meilisearch client
│   │   └── openai.js           # OpenAI client
│   ├── models/
│   │   ├── Service.js          # Service/site model
│   │   ├── PolicyVersion.js    # Policy version model
│   │   └── Analysis.js         # Analysis results model
│   ├── routes/
│   │   ├── public.js           # Public-facing routes
│   │   └── admin.js            # Admin panel routes
│   ├── controllers/
│   │   ├── publicController.js
│   │   └── adminController.js
│   ├── services/
│   │   ├── aiAnalyzer.js       # OpenAI analysis logic
│   │   ├── policyFetcher.js    # Fetch policy from URL
│   │   ├── scheduler.js        # Cron jobs
│   │   └── searchIndexer.js    # Meilisearch indexing
│   ├── middleware/
│   │   ├── auth.js             # Admin authentication
│   │   └── errorHandler.js     # Global error handling
│   ├── views/
│   │   ├── layouts/
│   │   │   └── main.ejs
│   │   ├── public/
│   │   │   ├── index.ejs       # Service listing
│   │   │   └── service.ejs     # Service detail page
│   │   └── admin/
│   │       ├── login.ejs
│   │       ├── dashboard.ejs
│   │       ├── add-service.ejs
│   │       └── edit-service.ejs
│   └── utils/
│       ├── logger.js
│       └── validators.js
└── migrations/
    └── 001_initial.sql         # Database schema
```

## Database Schema

### Services Table

```sql
CREATE TABLE services (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    url VARCHAR(500) NOT NULL,
    logo_url VARCHAR(500),
    policy_url VARCHAR(500),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Policy Versions Table

```sql
CREATE TABLE policy_versions (
    id SERIAL PRIMARY KEY,
    service_id INTEGER REFERENCES services(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    content_hash VARCHAR(64) NOT NULL, -- SHA-256 hash for change detection
    fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Analyses Table

```sql
CREATE TABLE analyses (
    id SERIAL PRIMARY KEY,
    service_id INTEGER REFERENCES services(id) ON DELETE CASCADE,
    policy_version_id INTEGER REFERENCES policy_versions(id) ON DELETE CASCADE,
    overall_score VARCHAR(1) NOT NULL, -- A, B, C, D, or E
    findings JSONB NOT NULL, -- Structured analysis results
    raw_analysis TEXT, -- Full AI response for debugging
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- When this analysis was created (used as "last analyzed" date)
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

**Note**: The `created_at` field in the analyses table represents when the policy was last analyzed. This date must be displayed prominently on every service page so users know the freshness of the analysis.

### Admin Sessions Table

```sql
CREATE TABLE admin_sessions (
    id SERIAL PRIMARY KEY,
    session_token VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
```

## AI Analysis Structure

### Scoring Parameters

The AI will analyze privacy policies based on these weighted categories:

1. **Data Collection (25%)**
   - What personal data is collected
   - Scope of collection (minimal vs excessive)
   - Collection methods (active vs passive)

2. **Data Sharing (25%)**
   - Third-party sharing practices
   - Purposes for sharing
   - Sale of personal data

3. **User Rights (20%)**
   - Data access rights
   - Deletion rights
   - Portability rights
   - Opt-out mechanisms

4. **Data Retention (15%)**
   - Retention periods
   - Deletion policies
   - Post-account deletion handling

5. **Tracking & Security (15%)**
   - Tracking technologies used
   - Security measures mentioned
   - Encryption practices

### AI Output Schema (JSON)

```json
{
  "overall_score": "A|B|C|D|E",
  "score_breakdown": {
    "data_collection": "A|B|C|D|E",
    "data_sharing": "A|B|C|D|E",
    "user_rights": "A|B|C|D|E",
    "data_retention": "A|B|C|D|E",
    "tracking_security": "A|B|C|D|E"
  },
  "findings": {
    "positive": [
      {
        "category": "user_rights",
        "title": "Clear deletion process",
        "description": "Users can delete their account and data easily",
        "severity": "good"
      }
    ],
    "negative": [
      {
        "category": "data_sharing",
        "title": "Data sold to third parties",
        "description": "Personal data is sold to advertisers and partners",
        "severity": "blocker"
      }
    ],
    "neutral": [
      {
        "category": "general",
        "title": "Policy updated regularly",
        "description": "Privacy policy is reviewed and updated annually",
        "severity": "neutral"
      }
    ]
  },
  "data_types_collected": [
    "name",
    "email",
    "location",
    "device_info"
  ],
  "third_parties": [
    {
      "name": "Google Analytics",
      "purpose": "analytics",
      "data_shared": ["usage_data", "device_info"]
    }
  ],
  "summary": "Brief 2-3 sentence summary of the privacy policy"
}
```

### Severity Levels

- **blocker**: Critical privacy concerns (red icon)
- **bad**: Significant issues (orange icon)
- **neutral**: Informational (gray icon)
- **good**: Positive privacy practices (green icon)

## Features

### Phase 1: Foundation

1. **Docker Setup**
   - Bun application container
   - PostgreSQL container with persistent volume
   - Meilisearch container
   - Redis container
   - Docker network for inter-service communication

2. **Database Layer**
   - Migration system
   - Connection pooling
   - Basic CRUD operations for all models

3. **Basic Web Server**
   - Bun HTTP server or lightweight framework
   - EJS templating engine setup
   - Static file serving
   - Request logging

### Phase 2: Core Features

1. **Admin Authentication**
   - Simple login form
   - Session-based authentication (stored in Redis)
   - Single admin user (credentials in .env)
   - Protected admin routes

2. **Service Management**
   - Add new service (name, URL, policy URL)
   - Edit service details
   - Delete service
   - List all services in admin panel

3. **Policy Fetching**
   - Fetch policy from URL (with timeout and error handling)
   - Support for pasting policy text directly
   - Content hash generation for change detection
   - Store full policy text in database

4. **AI Analysis**
   - Manual trigger from admin panel
   - Structured prompt engineering
   - JSON mode for consistent output
   - Error handling and retry logic
   - Store analysis results in database

5. **Public Pages**
   - Homepage with service listing (A-E grades displayed, last analyzed dates shown)
   - Search functionality via Meilisearch
   - Individual service detail page with prominent "last analyzed" date display
   - Filter by grade

### Phase 3: Enhancements

1. **Automated Policy Updates**
   - Daily cron job to check all policy URLs
   - Compare content hash with latest version
   - Flag services with changed policies
   - Admin notification of pending re-analysis

2. **Re-analysis Workflow**
   - Bulk re-analysis of updated policies
   - Historical analysis comparison
   - Show policy change history on service page

3. **Search & Discovery**
   - Full-text search via Meilisearch
   - Filter by data types collected
   - Filter by third parties
   - Sort by grade, name, last analyzed

4. **Caching**
   - Redis caching for public pages
   - Cache analysis results
   - Cache search results
   - TTL-based cache invalidation

### Phase 4: Polish

1. **Error Handling**
   - Global error handler middleware
   - User-friendly error pages
   - Graceful degradation when AI is unavailable

2. **Rate Limiting**
   - Rate limit on AI analysis endpoint
   - Rate limit on policy fetching
   - Prevent abuse

3. **UI/UX**
   - Clean, simple design
   - Responsive layout
   - Grade badges with colors
   - Expandable finding details

4. **Monitoring**
   - Basic logging
   - Health check endpoint
   - Analysis success/failure metrics

## API Endpoints

### Public Routes

- `GET /` - Homepage with service listing
- `GET /search?q=query` - Search services
- `GET /service/:id` - Service detail page
- `GET /api/health` - Health check

### Admin Routes

- `GET /admin/login` - Login page
- `POST /admin/login` - Authenticate
- `GET /admin/logout` - Logout
- `GET /admin/dashboard` - Admin dashboard
- `GET /admin/services/new` - Add service form
- `POST /admin/services` - Create service
- `GET /admin/services/:id/edit` - Edit service form
- `POST /admin/services/:id` - Update service
- `POST /admin/services/:id/delete` - Delete service
- `POST /admin/services/:id/analyze` - Trigger analysis
- `GET /admin/pending-updates` - Services with policy changes

## Environment Variables

```env
# Database
DATABASE_URL=postgresql://user:password@postgres:5432/privacy_analyzer

# Redis
REDIS_URL=redis://redis:6379

# Meilisearch
MEILISEARCH_URL=http://meilisearch:7700
MEILISEARCH_API_KEY=your_master_key

# OpenAI
OPENAI_API_KEY=sk-your-api-key
OPENAI_MODEL=gpt-4o

# Admin
ADMIN_USERNAME=admin
ADMIN_PASSWORD=secure_password_here
SESSION_SECRET=random_session_secret

# App
PORT=3000
NODE_ENV=production
```

## Docker Compose Configuration

```yaml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/privacy_analyzer
      - REDIS_URL=redis://redis:6379
      - MEILISEARCH_URL=http://meilisearch:7700
    depends_on:
      - postgres
      - redis
      - meilisearch
    volumes:
      - ./src:/app/src

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=privacy_analyzer
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    ports:
      - "6379:6379"

  meilisearch:
    image: getmeili/meilisearch:v1.6
    environment:
      - MEILI_MASTER_KEY=your_master_key
    volumes:
      - meilisearch_data:/meili_data
    ports:
      - "7700:7700"

volumes:
  postgres_data:
  redis_data:
  meilisearch_data:
```

## Non-Functional Requirements

### 1. Search Engine Optimization (SEO)

#### In-Page SEO

- **Title Tags**: Dynamic `
Last analyzed: January 27, 2026
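
The "last analyzed" label shown above is derived from `analyses.created_at`. A minimal sketch of how that string might be produced before it is passed to an EJS view follows. The helper name `formatLastAnalyzed` is hypothetical, and the sketch assumes timestamps are stored and displayed in UTC with US English formatting, neither of which the plan specifies.

```javascript
// Hypothetical helper (not part of the plan): turns analyses.created_at
// into the "Last analyzed: ..." label for service pages and listings.
// Assumption: timestamps are treated as UTC and rendered in en-US style.
function formatLastAnalyzed(createdAt) {
  // Accept either a Date (as a database driver might return) or an ISO string.
  const date = createdAt instanceof Date ? createdAt : new Date(createdAt);
  if (Number.isNaN(date.getTime())) {
    throw new TypeError(`Invalid analysis timestamp: ${createdAt}`);
  }

  // Intl.DateTimeFormat is built into both Bun and Node.js.
  const formatter = new Intl.DateTimeFormat("en-US", {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC", // fixed zone so the label does not depend on the server locale
  });

  return `Last analyzed: ${formatter.format(date)}`;
}

console.log(formatLastAnalyzed("2026-01-27T10:30:00Z"));
// prints "Last analyzed: January 27, 2026"
```

Pinning the time zone keeps the label stable across containers and hosts. A deployment that serves a single region might prefer that region's zone instead.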