2026-01-27 14:10:13 -05:00
parent 7d60504e1d
commit 2bc01c90de
7 changed files with 745 additions and 11 deletions

README.md Normal file

@@ -0,0 +1,355 @@
# Privacy Policy Analyzer
A self-hosted web application that analyzes privacy policies using AI and provides easy-to-understand A-E grades.
## Features
- **AI-Powered Analysis**: Uses Ollama (local LLM) with OpenAI fallback to analyze privacy policies
- **Background Processing**: Analysis jobs run asynchronously to prevent timeouts
- **A-E Grading System**: Clear letter grades based on privacy practices
- **Service Management**: Add, edit, and manage services through admin panel
- **Search**: Full-text search powered by Meilisearch
- **Caching**: Redis caching for fast page loads
- **SEO Optimized**: Sitemap.xml, robots.txt, and Schema.org structured data
- **Accessibility**: WCAG 2.1 AA compliant
- **Security**: OWASP-compliant security headers and best practices
## Tech Stack
- **Runtime**: Bun (JavaScript)
- **Database**: PostgreSQL 15
- **Cache**: Redis 7
- **Search**: Meilisearch 1.6
- **AI**: Ollama (gpt-oss:latest) with OpenAI fallback
- **Templating**: EJS
- **Containerization**: Docker + Docker Compose
## Quick Start
1. **Clone the repository**:
```bash
git clone <repository-url>
cd privacy-policy-analyzer
```
2. **Set up environment variables**:
```bash
cp .env.example .env
# Edit .env with your settings
```
3. **Start all services**:
```bash
docker-compose up -d
```
4. **Run database migrations**:
```bash
docker-compose exec app bun run migrate
```
5. **Access the application**:
- Public site: http://localhost:3000
- Admin panel: http://localhost:3000/admin/login
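- Default credentials: admin / secure_password_here (the `ADMIN_USERNAME` and `ADMIN_PASSWORD` values set in `.env`; change them before exposing the app)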
## Configuration
### Environment Variables
```bash
# Database
DATABASE_URL=postgresql://postgres:changeme@postgres:5432/privacy_analyzer
# Redis
REDIS_URL=redis://redis:6379
# Meilisearch
MEILISEARCH_URL=http://meilisearch:7700
MEILISEARCH_API_KEY=your_secure_master_key
# AI Provider (Ollama - default, no API costs)
USE_OLLAMA=true
OLLAMA_URL=http://ollama:11434
OLLAMA_MODEL=gpt-oss:latest
# AI Provider (OpenAI - optional fallback)
OPENAI_API_KEY=sk-your-openai-api-key
OPENAI_MODEL=gpt-4o
# Admin Credentials
ADMIN_USERNAME=admin
ADMIN_PASSWORD=secure_password_here
SESSION_SECRET=your_random_session_secret
# Base URL for sitemap
BASE_URL=https://yourdomain.com
```
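The variables above can be checked once at startup so that a missing value fails fast instead of surfacing later as a connection error. The following is a minimal sketch, not the project's actual loader: the `loadConfig` name, the choice of which keys are required, and the shape of the returned object are all assumptions made for illustration. The defaults mirror the sample values shown above.

```javascript
// Hypothetical startup check for the variables listed above.
// Which keys count as "required" is an assumption, not taken from the project.
const REQUIRED_KEYS = ["DATABASE_URL", "REDIS_URL", "MEILISEARCH_URL", "SESSION_SECRET"];

function loadConfig(env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // Environment values are strings; only the literal "true" enables Ollama.
  const useOllama = env.USE_OLLAMA === "true";
  if (!useOllama && !env.OPENAI_API_KEY) {
    throw new Error("Set USE_OLLAMA=true or provide OPENAI_API_KEY");
  }

  return {
    databaseUrl: env.DATABASE_URL,
    redisUrl: env.REDIS_URL,
    meilisearch: { url: env.MEILISEARCH_URL, apiKey: env.MEILISEARCH_API_KEY ?? null },
    ai: {
      useOllama,
      ollamaUrl: env.OLLAMA_URL ?? "http://ollama:11434",
      ollamaModel: env.OLLAMA_MODEL ?? "gpt-oss:latest",
      openaiApiKey: env.OPENAI_API_KEY ?? null,
      openaiModel: env.OPENAI_MODEL ?? "gpt-4o",
    },
    sessionSecret: env.SESSION_SECRET,
    baseUrl: env.BASE_URL ?? "http://localhost:3000",
  };
}
```

In the app this would be called once, for example `const config = loadConfig(process.env);`, before any database or Redis connection is opened.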
## Usage
### Adding a Service
1. Log in to admin panel
2. Click "Add New Service"
3. Enter service details:
- **Name**: Service name (e.g., "Facebook")
- **Service URL**: Main website URL
- **Privacy Policy URL**: Direct link to privacy policy
- **Logo URL**: (Optional) Service logo
4. Click "Add Service"
5. Click "Analyze" to queue analysis
### Viewing Analysis Results
- **Public site**: Browse all analyzed services with grades
- **Service detail**: Click any service for full analysis
- **Filter by grade**: Use grade filters on homepage
- **Search**: Use search bar to find services
### Admin Features
- **Dashboard**: Overview of all services and statistics
- **Background Analysis**: Analysis runs asynchronously
- **Queue Status**: Real-time view of analysis queue
- **Edit/Delete**: Manage existing services
## API Endpoints
### Public Endpoints
```
GET / # Homepage with service listing
GET /service/:id # Service detail page
GET /search?q=query # Search services
GET /sitemap.xml # XML sitemap
GET /robots.txt # Robots.txt
GET /api/health # Health check
GET /api/analysis/status/:jobId # Check analysis job status
```
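A client can poll the job status endpoint until an analysis finishes. The sketch below shows one way to do that; the README does not document the response body, so the `status` field and its `"completed"` / `"failed"` values are assumptions, as are the `pollJobStatus` name and its options.

```javascript
// Hypothetical poller for GET /api/analysis/status/:jobId.
// The response shape ({ status: "completed" | "failed" | ... }) is assumed.
async function pollJobStatus(
  jobId,
  { baseUrl = "http://localhost:3000", fetchFn = fetch, intervalMs = 5000, maxAttempts = 120 } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const url = `${baseUrl}/api/analysis/status/${encodeURIComponent(jobId)}`;
    const response = await fetchFn(url);
    if (!response.ok) {
      throw new Error(`Status request failed with HTTP ${response.status}`);
    }

    const job = await response.json();
    // Stop as soon as the job reaches a terminal state.
    if (job.status === "completed" || job.status === "failed") {
      return job;
    }

    // Wait before asking again so the server is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

With the defaults (5 seconds between requests, 120 attempts) the poller gives up after about 10 minutes, which leaves headroom over the 2-5 minutes a local Ollama analysis can take.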
### Admin Endpoints (require authentication, except login)
```
GET/POST /admin/login # Login
GET /admin/logout # Logout
GET /admin/dashboard # Admin dashboard
GET/POST /admin/services/new # Add service
GET/POST /admin/services/:id # Edit service
POST /admin/services/:id/delete # Delete service
POST /admin/services/:id/analyze # Queue analysis
GET /api/analysis/queue # Queue status
```
## Background Analysis
The system uses a background worker for privacy policy analysis:
1. **Queue Job**: When you click "Analyze", a job is added to the Redis queue
2. **Process**: The worker picks up the job and fetches the policy
3. **Analyze**: AI analyzes the policy (Ollama or OpenAI)
4. **Store**: Results saved to database
5. **Notify**: Dashboard auto-refreshes with results
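
The five steps above can be sketched as a single worker iteration. This is an illustration only: a plain array stands in for the Redis queue, and `fetchPolicy`, `analyzePolicy`, and `saveResult` are hypothetical injected helpers rather than the project's real functions.

```javascript
// Sketch of one worker iteration. An array stands in for the Redis queue;
// fetchPolicy / analyzePolicy / saveResult are hypothetical helpers.
async function processNextJob(queue, { fetchPolicy, analyzePolicy, saveResult }) {
  const job = queue.shift(); // take the oldest queued job
  if (!job) return null; // queue is empty, nothing to do

  try {
    const policyText = await fetchPolicy(job.policyUrl); // step 2: fetch the policy
    const analysis = await analyzePolicy(policyText); // step 3: AI analysis
    await saveResult(job.serviceId, analysis); // step 4: store the result
    return { jobId: job.id, status: "completed" }; // step 5: status the dashboard can show
  } catch (error) {
    // A failed fetch or analysis is recorded instead of crashing the worker.
    return { jobId: job.id, status: "failed", error: error.message };
  }
}
```

Catching errors per job keeps one unreachable policy URL from stopping the rest of the queue.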
### Analysis Timing
- Local Ollama: 2-5 minutes per policy
- OpenAI API: 10-30 seconds per policy
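
The "Ollama or OpenAI" choice in step 3 can be pictured as a try-then-fall-back wrapper. The README does not say what triggers the fallback, so this sketch assumes it happens whenever the Ollama call throws. `analyzeWithOllama` and `analyzeWithOpenAI` are hypothetical functions that take policy text and return an analysis.

```javascript
// Sketch: try the local model first, fall back to OpenAI on any error.
// Both provider functions are hypothetical and injected by the caller.
async function analyzeWithFallback(policyText, { analyzeWithOllama, analyzeWithOpenAI }) {
  try {
    const result = await analyzeWithOllama(policyText);
    return { provider: "ollama", result };
  } catch (ollamaError) {
    // Without a configured fallback, surface the original error.
    if (!analyzeWithOpenAI) throw ollamaError;
    const result = await analyzeWithOpenAI(policyText);
    return { provider: "openai", result };
  }
}
```

Returning the provider name alongside the result makes it possible to see which backend produced a given grade.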
## Caching
Redis caching improves performance:
- **Homepage**: 1 hour cache
- **Service detail**: 2 hour cache
- **Search results**: 5 minute cache
- **API responses**: 1 minute cache
Cache is automatically invalidated when services are created, updated, or deleted.
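
The TTLs and invalidation rule above can be modelled with a small in-memory cache. This is a sketch under stated assumptions: a `Map` stands in for Redis, the `kind:key` naming is invented for illustration, and the choice to also drop homepage and search entries on a service change is a guess at what "automatically invalidated" covers.

```javascript
// TTLs in seconds, taken from the list above. Key naming is an assumption.
const CACHE_TTL_SECONDS = { home: 3600, service: 7200, search: 300, api: 60 };

class PageCache {
  // `now` is injectable so expiry can be exercised without real waiting.
  constructor(now = () => Date.now()) {
    this.entries = new Map();
    this.now = now;
  }

  set(kind, key, value) {
    const ttl = CACHE_TTL_SECONDS[kind];
    if (ttl === undefined) throw new Error(`Unknown cache kind: ${kind}`);
    this.entries.set(`${kind}:${key}`, { value, expiresAt: this.now() + ttl * 1000 });
  }

  get(kind, key) {
    const cacheKey = `${kind}:${key}`;
    const entry = this.entries.get(cacheKey);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(cacheKey); // lazily drop expired entries
      return undefined;
    }
    return entry.value;
  }

  // Call after a service is created, updated, or deleted.
  invalidateService(serviceId) {
    this.entries.delete(`service:${serviceId}`);
    for (const cacheKey of [...this.entries.keys()]) {
      // Listings and search results may include the changed service.
      if (cacheKey.startsWith("home:") || cacheKey.startsWith("search:")) {
        this.entries.delete(cacheKey);
      }
    }
  }
}
```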
## Security
### Implemented Security Features
- Security headers (CSP, HSTS, X-Frame-Options, etc.)
- Session-based authentication with Redis storage
- CSRF protection
- Rate limiting
- Input validation and sanitization
- SQL injection prevention (parameterized queries)
- XSS prevention (EJS auto-escaping)
- Non-root Docker containers
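
The README does not say which rate limiting algorithm or limits the app uses. As an illustration of the idea only, here is a fixed-window limiter; the `createRateLimiter` name, the in-memory `Map`, and the example limits are all assumptions.

```javascript
// Fixed-window rate limiter sketch (not necessarily what the app implements).
// `now` is injectable so the window logic can be tested deterministically.
function createRateLimiter({ limit, windowMs, now = () => Date.now() }) {
  const windows = new Map(); // clientId -> { windowStart, count }

  return function isAllowed(clientId) {
    const currentTime = now();
    const entry = windows.get(clientId);

    // Start a new window on first sight or once the old window has elapsed.
    if (!entry || currentTime - entry.windowStart >= windowMs) {
      windows.set(clientId, { windowStart: currentTime, count: 1 });
      return true;
    }

    entry.count += 1;
    return entry.count <= limit;
  };
}
```

For example, `createRateLimiter({ limit: 5, windowMs: 60000 })` would allow five login attempts per client per minute. A production setup with several app instances would need shared state, such as Redis, instead of a local `Map`.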
### Security Headers
```
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Referrer-Policy: strict-origin-when-cross-origin
```
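One way to attach these headers in a Bun app is to wrap each outgoing `Response`. This is a sketch, not the project's middleware: the `withSecurityHeaders` name is hypothetical, and it relies only on the standard `Headers` and `Response` Web APIs that Bun provides.

```javascript
// Header values copied from the list above.
const SECURITY_HEADERS = {
  "Strict-Transport-Security": "max-age=31536000",
  "Content-Security-Policy": "default-src 'self'",
  "X-Frame-Options": "DENY",
  "X-Content-Type-Options": "nosniff",
  "X-XSS-Protection": "1; mode=block",
  "Referrer-Policy": "strict-origin-when-cross-origin",
};

// Hypothetical helper: returns a copy of `response` with the headers added.
function withSecurityHeaders(response) {
  const headers = new Headers(response.headers);
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    headers.set(name, value);
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

Note that browsers only honor `Strict-Transport-Security` on responses served over HTTPS, so it takes effect once the app sits behind a TLS-terminating proxy.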
## Performance
### Optimizations
- HTML is converted to plain text before AI analysis, cutting input size by roughly 75%
- Smart content truncation (keeps important sections)
- Redis page caching
- Database indexes on frequently queried columns
- Connection pooling (PostgreSQL, Redis)
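
The first two optimizations can be illustrated with a short sketch. It is a simplification under stated assumptions: the regex-based tag stripping ignores HTML entities and edge cases that a real HTML parser would handle, and the `PRIORITY_KEYWORDS` list and both function names are hypothetical, not taken from the project.

```javascript
// Strip markup so the model only sees the policy's text (simplified, regex-based).
function htmlToText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style blocks
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Hypothetical keywords marking sections worth keeping when space is tight.
const PRIORITY_KEYWORDS = ["share", "third part", "retention", "sell", "delete"];

// Keep as many sections as fit in maxChars, preferring keyword matches,
// and return them in their original order.
function truncateSections(sections, maxChars) {
  const ranked = sections
    .map((text, index) => ({
      text,
      index,
      important: PRIORITY_KEYWORDS.some((keyword) => text.toLowerCase().includes(keyword)),
    }))
    .sort((a, b) => Number(b.important) - Number(a.important) || a.index - b.index);

  const kept = [];
  let usedChars = 0;
  for (const section of ranked) {
    if (usedChars + section.text.length > maxChars) continue; // does not fit, skip it
    kept.push(section);
    usedChars += section.text.length;
  }
  return kept.sort((a, b) => a.index - b.index).map((section) => section.text);
}
```

Ranking before filling the budget means a long boilerplate introduction is dropped before a short data-sharing clause is.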
### Target Metrics
- First Contentful Paint: < 1.0s
- Largest Contentful Paint: < 2.5s
- Time to Interactive: < 3.8s
## Deployment
### Production Deployment
1. **Set production environment**:
```bash
NODE_ENV=production
BASE_URL=https://yourdomain.com
```
2. **Update admin credentials**:
```bash
ADMIN_USERNAME=your_username
ADMIN_PASSWORD=strong_password_hash
SESSION_SECRET=random_32_char_string
```
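3. **Enable HTTPS** (use a reverse proxy such as Nginx):
```nginx
# Example Nginx config
server {
    listen 443 ssl;
    server_name yourdomain.com;

    # Placeholder certificate paths; point these at your own files
    ssl_certificate     /etc/ssl/certs/yourdomain.com.pem;
    ssl_certificate_key /etc/ssl/private/yourdomain.com.key;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```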
4. **Backup strategy**:
```bash
# Backup database (-T disables TTY allocation so the redirected dump is not altered)
docker-compose exec -T postgres pg_dump -U postgres privacy_analyzer > backup.sql
# Backup Redis
docker-compose exec redis redis-cli SAVE
```
## Troubleshooting
### Common Issues
**1. Analysis times out**
- Analysis runs in the background, so the request itself should not time out
- Solution: Check the dashboard queue status; with local Ollama, expect 2-5 minutes per policy
**2. 429 Rate Limit Error**
- You're using OpenAI without sufficient quota
- Solution: Switch to Ollama (default) or add billing to OpenAI account
**3. Service won't start**
```bash
# Check logs
docker-compose logs app
# Verify environment
docker-compose config
# Restart all services
docker-compose restart
```
**4. Database connection fails**
```bash
# Check PostgreSQL status
docker-compose ps postgres
# Run migrations
docker-compose exec app bun run migrate
```
### Logs
```bash
# App logs
docker-compose logs -f app
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f ollama
```
## Development
### Project Structure
```
privacy-policy-analyzer/
├── docker-compose.yml # Service orchestration
├── Dockerfile # Bun app container
├── .env # Environment variables
├── src/
│ ├── app.js # Main application
│ ├── config/ # Database, Redis, etc.
│ ├── models/ # Data models
│ ├── services/ # Business logic
│ ├── middleware/ # Auth, security, cache
│ ├── views/ # EJS templates
│ └── scripts/ # Utility scripts
├── migrations/ # SQL migrations
└── public/ # Static assets
```
### Useful Commands
```bash
# Start services
docker-compose up -d
# View logs
docker-compose logs -f app
# Run migrations
docker-compose exec app bun run migrate
# Database shell
docker-compose exec postgres psql -U postgres -d privacy_analyzer
# Redis shell
docker-compose exec redis redis-cli
# Test AI integration
docker-compose exec app bun run src/scripts/test-ollama.js
```
## License
MIT License - Private pet project
## Contributing
This is a private project. No external contributions expected.
## Support
For issues or questions, check the logs and ensure all services are healthy:
```bash
docker-compose ps
docker-compose logs
```