## Context

Headroom is a greenfield web application to replace manual spreadsheet-based capacity planning for engineering teams. The current spreadsheet approach is error-prone, lacks validation, provides no audit trail, and wastes manager time (2+ hours monthly on allocation work).

**Current State:**
- No existing system to migrate from
- Team uses spreadsheets for capacity planning and resource allocation
- No automation, no validation, no visibility into team headroom

**Constraints:**
- MVP must be production-ready with >70% test coverage
- Must run containerized (Docker Compose) from day 1
- Existing Nginx Proxy Manager in environment (no Caddy/Traefik)
- Must support 10-15 developers across 10-12 concurrent projects
- Monthly capacity planning cycle (not real-time)

**Stakeholders:**
- Engineering managers (primary users)
- Team members (log hours, view allocations)
- Top brass (view reports only)
- Superuser/admin (system configuration)

---

## Goals / Non-Goals

**Goals:**
- Automate capacity calculations (holidays, PTO, availability)
- Validate allocations against capacity and approved estimates
- Prevent billing errors (over/under-allocation detection)
- Provide clear visibility into team headroom
- Track planned vs actual hours for utilization analysis
- Generate 5 core reports with customizable filters
- Reduce manager allocation time from 2+ hours to <30 minutes per month
- Enforce role-based access control (4 personas)
- Maintain >70% test coverage with comprehensive E2E tests

**Non-Goals:**
- Real-time notifications (deferred to Phase 2, polling is acceptable for MVP)
- PDF/CSV report exports (deferred to Phase 2, on-screen only for MVP)
- Time-tracking tool integration (manual entry only for MVP)
- Multi-tenancy (single-tenant MVP, add tenant_id later)
- Mobile app (desktop web only)
- AI-powered forecasting (rule-based validation sufficient)

---

## Decisions

### Decision 1: Two-Container Architecture (Laravel API + SvelteKit Frontend)

**Choice:**
Separate Laravel API backend and SvelteKit frontend in different containers.

**Rationale:**
- Clean separation of concerns (API vs UI)
- Easier to scale independently in the future
- SvelteKit is modern and well suited to dashboards, worth the learning curve
- Laravel provides robust API development (owner has PHP background)

**Alternatives Considered:**
- Laravel + Blade templates: Rejected (less interactive UI, harder for dashboards)
- Laravel + Vue (Inertia): Rejected (owner preferred Svelte over Vue)
- SvelteKit full-stack: Rejected (owner has PHP background, prefers Laravel for API)

**Implementation:**
- Frontend: SvelteKit (port 5173), Tailwind CSS + DaisyUI, Recharts, TanStack Table
- Backend: Laravel 12 (latest) (port 3000), PostgreSQL (latest), Redis (latest)
- Communication: REST API with Laravel API Resources for consistent JSON
- Reverse proxy: Existing Nginx Proxy Manager routes `/api/*` → Laravel, `/*` → SvelteKit

---

### Decision 2: PostgreSQL from Day 1

**Choice:** Use PostgreSQL in production and development (no SQLite).

**Rationale:**
- Avoid migration pain later (SQLite → PostgreSQL is error-prone)
- Production-grade features (JSON operators, full-text search, advanced indexing)
- Better for reporting queries (complex aggregations, window functions)
- Docker volume mount preserves portability

**Alternatives Considered:**
- SQLite for local dev, PostgreSQL for prod: Rejected (migration pain, feature parity issues)

**Implementation:**
- PostgreSQL (latest, Alpine container)
- Volume-mounted to `./data/postgres` for backup/portability
- Migrations from day 1 (Laravel migrations)
- UUIDs for primary keys (prevents ID enumeration, eases a later move to distributed systems)

---

### Decision 3: Redis Caching from Day 1

**Choice:** Implement query and response caching with Redis from the start.
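
The caching choice above follows the familiar cache-aside pattern: read from the cache, compute on a miss, and drop the entry when the underlying data changes. The sketch below is illustrative only. It is written in TypeScript with an in-memory `Map` standing in for Redis, whereas the backend itself would go through the Laravel cache facade. `TtlCache`, `allocationsKey`, and the injectable clock are hypothetical names; the key shape and the 1-hour TTL are taken from the Implementation notes for this decision.

```typescript
// Hypothetical in-memory stand-in for Redis, used only to illustrate
// cache-aside with a TTL. Not the Laravel implementation.
type CacheEntry = { value: unknown; expiresAt: number };

class TtlCache {
  private store = new Map<string, CacheEntry>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Return the cached value while it is fresh; otherwise compute and store it.
  remember<T>(key: string, ttlSeconds: number, compute: () => T): T {
    const hit = this.store.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value as T;
    }
    const value = compute();
    this.store.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
    return value;
  }

  // Intended to be called after any create/update/delete on the cached data.
  forget(key: string): void {
    this.store.delete(key);
  }
}

// Key shape from this document's pattern `allocations:month:{YYYY-MM}`.
function allocationsKey(month: string): string {
  return `allocations:month:${month}`;
}

// 1 hour, the TTL this document lists for allocations.
const ALLOCATIONS_TTL_SECONDS = 60 * 60;
```

Pairing a TTL with explicit invalidation means that a missed `forget` call degrades to bounded staleness rather than permanently wrong data.
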

**Rationale:**
- Owner insisted: "No need to sweat on refactoring it everywhere"
- Prevents technical debt accumulation
- Expensive queries (capacity calculations, reports) benefit immediately
- Easy automatic cache invalidation with Laravel

**Alternatives Considered:**
- Defer caching to Phase 2: Rejected (owner's preference for avoiding future refactoring)

**Implementation:**
- Redis (latest, Alpine container)
- Cache key patterns: `allocations:month:{YYYY-MM}`, `reports:forecast:{from}:{to}:{hash}`
- TTL: 1 hour (allocations), 15 min (reports), 24 hours (master data)
- Automatic invalidation on mutations (create/update/delete triggers cache flush)
- Laravel cache facade for consistency

---

### Decision 4: JWT Authentication (Token-Based)

**Choice:** JWT tokens instead of session-based authentication.

**Rationale:**
- Stateless (better for API-first architecture)
- Suitable for SPA frontend
- Easier to add mobile app later (future-proofing)
- Industry standard for REST APIs

**Alternatives Considered:**
- Laravel sessions: Rejected (owner preferred JWT for future mobile support)

**Implementation:**
- tymon/jwt-auth package
- Access token: 60-minute TTL
- Refresh token: 7-day TTL (stored in Redis, one-time use with rotation)
- Token claims: user UUID, role, permissions array
- Refresh endpoint rotates tokens on each use

---

### Decision 5: SvelteKit Frontend Stack

**Choice:** SvelteKit + Tailwind CSS + DaisyUI + Recharts + TanStack Table + Superforms + Zod

**Rationale:**
- **DaisyUI**: Fast development, opinionated but speeds up dashboard creation
- **Recharts**: Good balance of power and simplicity for charts
- **TanStack Table**: Industry standard for data grids, powerful filtering/sorting
- **Superforms + Zod**: Type-safe validation, seamless SvelteKit Form Actions integration

**Alternatives Considered:**
- Shadcn/ui: Rejected (DaisyUI faster for MVP)
- Chart.js: Rejected (Recharts more powerful)
- Custom table component: Rejected (TanStack is proven, owner
unfamiliar but trusts recommendation)

**Implementation:**
- Svelte stores for minimal UI state only (filters, modals)
- Fetch API for HTTP (no Axios, native is sufficient)
- Vitest for unit tests, Playwright for E2E tests

---

### Decision 6: Allocation Validation Strategy

**Choice:** Soft validation with visual indicators (GREEN/YELLOW/RED), not hard blocks.

**Rationale:**
- Managers sometimes need flexibility to over-allocate temporarily
- Hard blocks would frustrate workflow
- Visual warnings catch errors while allowing override
- "This money is my salary!": both over- and under-allocation must be flagged

**Validation Rules:**
- GREEN: Allocation = Approved estimate (100%, within ±5% tolerance)
- YELLOW: Under-allocation (<95% of approved estimate), will undercharge
- RED: Over-allocation (>105% of approved estimate), will overcharge
- Person capacity: YELLOW warning at >100%, RED alert at >120%

**Implementation:**
- API returns validation status with each allocation response
- Frontend displays color-coded indicators in allocation matrix
- Tooltip shows exact variance ("Over by 20 hours, will overcharge client")

---

### Decision 7: Monthly Aggregate Actuals (Not Daily)

**Choice:** Track actual hours as monthly totals, allowing incremental weekly updates.
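
To make the monthly-total idea concrete, here is a minimal sketch of how weekly entries could roll up into one row per project, person, and month. It is an illustration under assumptions, not the planned implementation. The `MonthlyActual` type, its camelCase field names, and `applyActualUpdate` are hypothetical, and the replace/increment modes mirror the two UI behaviours listed in the Implementation notes for this decision.

```typescript
// Hypothetical in-memory shape for one monthly actuals row.
type MonthlyActual = {
  projectId: string;
  teamMemberId: string;
  month: string; // "YYYY-MM"
  hoursLogged: number;
};

type UpdateMode = "replace" | "increment";

// Accepts months 01 through 12 in the YYYY-MM format used by this document.
const ACTUALS_MONTH = /^\d{4}-(0[1-9]|1[0-2])$/;

// Returns a new row; the input row is left untouched.
function applyActualUpdate(
  current: MonthlyActual,
  hours: number,
  mode: UpdateMode,
): MonthlyActual {
  if (!ACTUALS_MONTH.test(current.month)) {
    throw new Error(`invalid month: ${current.month}`);
  }
  if (!Number.isFinite(hours) || hours < 0) {
    throw new Error("hours must be a non-negative number");
  }
  const hoursLogged =
    mode === "replace" ? hours : current.hoursLogged + hours;
  return { ...current, hoursLogged };
}
```

With this shape, three weekly entries of 10, 12, and 8 hours in `"increment"` mode accumulate to a monthly total of 30, and a later `"replace"` call can correct the total in one step.
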

**Rationale:**
- Monthly planning cycle doesn't require daily granularity
- Simplifies data model and UI
- Team members can update weekly and the system accumulates the total
- No time-tracking integration for MVP (manual entry)

**Alternatives Considered:**
- Daily time logging: Rejected (over-engineering for MVP, adds complexity)
- Weekly buckets: Rejected (monthly is sufficient given monthly planning cycle)

**Implementation:**
- Actuals table: project_id, team_member_id, month (YYYY-MM), hours_logged
- UI allows replacing or incrementing monthly total
- Utilization calculated as: (Actual hours / Capacity) × 100%

---

### Decision 8: Defer Real-Time Notifications to Phase 2

**Choice:** No WebSocket notifications in MVP, users refresh to see changes.

**Rationale:**
- Allocations are planned monthly, not time-critical
- WebSocket setup adds 6 hours of dev time
- Polling every 30s is an acceptable alternative but is also deferred
- Focus MVP on core allocation/reporting functionality

**Alternatives Considered:**
- WebSocket + 1 notification PoC (6 hours): Rejected (not critical for monthly planning)
- Polling-based notifications (2 hours): Rejected (also deferred, users can refresh)

**Implementation (Phase 2):**
- Laravel Broadcasting with Redis adapter
- SvelteKit WebSocket client
- Events: AllocationCreated, AllocationUpdated, EstimateApproved

---

### Decision 9: Database Schema Design

**Choice:** Normalized schema with master data tables, JSON for forecasted effort, UUIDs for primary keys.
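
As an illustration of the JSON column for forecasted effort, the sketch below parses and validates a month-keyed map such as the `{"2026-02": 40, "2026-03": 60}` example given in the Design Rationale for this decision. It is a sketch under assumptions: `ForecastedEffort`, `parseForecastedEffort`, and `totalForecastedHours` are hypothetical names, and the rule that hours must be non-negative is an assumption rather than something this document specifies.

```typescript
// Hypothetical type for the month-keyed forecasted effort map.
type ForecastedEffort = Record<string, number>;

// Accepts months 01 through 12 in the YYYY-MM format used by this document.
const FORECAST_MONTH = /^\d{4}-(0[1-9]|1[0-2])$/;

// Parse a JSON string and reject anything that is not a month -> hours map.
function parseForecastedEffort(json: string): ForecastedEffort {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("forecasted effort must be a JSON object");
  }
  const result: ForecastedEffort = {};
  for (const [month, hours] of Object.entries(raw)) {
    if (!FORECAST_MONTH.test(month)) {
      throw new Error(`invalid month key: ${month}`);
    }
    // Assumption: hours are finite and non-negative.
    if (typeof hours !== "number" || !Number.isFinite(hours) || hours < 0) {
      throw new Error(`invalid hours for ${month}`);
    }
    result[month] = hours;
  }
  return result;
}

// Sum the forecast across all months in the map.
function totalForecastedHours(effort: ForecastedEffort): number {
  return Object.values(effort).reduce((sum, hours) => sum + hours, 0);
}
```

Validating keys at the boundary keeps the flexibility of a JSON column while still catching malformed months before they reach reporting queries.
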

**Key Tables:**
- `team_members`: id (UUID), name, role_id (FK), hourly_rate, active
- `projects`: id (UUID), code (unique), title, status_id (FK), type_id (FK), approved_estimate, forecasted_effort (JSON)
- `allocations`: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), allocated_hours
- `actuals`: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), hours_logged
- `roles`, `project_statuses`, `project_types`: Master data tables
- `holidays`, `ptos`: Calendar data

**Design Rationale:**
- **UUIDs**: Prevent ID enumeration attacks, ease a later move to distributed systems
- **Normalized master data**: Roles/statuses/types in separate tables for dynamic configuration
- **Month as string (YYYY-MM)**: Simplifies queries, index-friendly, human-readable
- **JSON for forecasted effort**: Flexible structure `{"2026-02": 40, "2026-03": 60}`, easy to extend
- **Soft deletes for projects**: `deleted_at` timestamp for audit trail
- **Active flag for team members**: Preserve historical allocations when a person leaves

**Indexes:**
- `allocations`: composite indexes on (project_id, month) and (team_member_id, month)
- `actuals`: composite indexes on (project_id, month) and (team_member_id, month)
- `team_members`: index on (role_id, active)
- `projects`: index on (status_id, type_id), unique on (code)

---

### Decision 10: API Design Pattern

**Choice:** REST API with Laravel API Resources for consistent JSON responses.
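
On the frontend side, a consistent envelope means one small helper can interpret every response. The sketch below shows one way this might look, assuming the Response Format and Error Format given for this decision. `ApiSuccess`, `ApiFailure`, `ApiResult`, and `parseEnvelope` are hypothetical names, and the helper takes an already-parsed body so it stays independent of how the request is made.

```typescript
// Shapes follow the Response Format and Error Format in this decision.
type ApiSuccess<T> = {
  data: T;
  meta?: Record<string, unknown>;
  links?: Record<string, unknown>;
};

type ApiFailure = {
  message: string;
  errors?: Record<string, string[]>;
};

// Hypothetical result type so callers branch on `ok` instead of catching.
type ApiResult<T> =
  | { ok: true; data: T }
  | { ok: false; message: string; fieldErrors: Record<string, string[]> };

// `httpOk` would typically be the `ok` flag of a fetch response.
// The body is trusted to match the envelope; no runtime schema check here.
function parseEnvelope<T>(httpOk: boolean, body: unknown): ApiResult<T> {
  if (httpOk) {
    return { ok: true, data: (body as ApiSuccess<T>).data };
  }
  const failure = body as ApiFailure;
  return {
    ok: false,
    message: failure.message,
    fieldErrors: failure.errors ?? {},
  };
}
```

Since the stack already includes Zod, the type casts here could later be replaced by schema validation; that step is left out to keep the sketch dependency-free.
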

**Rationale:**
- REST is simpler than GraphQL for this use case
- Laravel API Resources provide consistent transformation layer
- Standard HTTP verbs (GET, POST, PUT, DELETE)
- Easy to document with Laravel Scribe (SwaggerUI)

**Endpoint Structure:**
```
/api/auth/login (POST)
/api/auth/logout (POST)
/api/auth/refresh (POST)
/api/team-members (GET, POST)
/api/team-members/:id (GET, PUT, DELETE)
/api/projects (GET, POST)
/api/projects/:id (GET, PUT, DELETE)
/api/allocations?month=YYYY-MM (GET, POST)
/api/allocations/bulk (POST)
/api/allocations/:id (PUT, DELETE)
/api/actuals?month=YYYY-MM (GET, POST)
/api/actuals/bulk (POST)
/api/actuals/:id (PUT)
/api/reports/forecast?from=YYYY-MM&to=YYYY-MM (GET)
/api/reports/utilization?month=YYYY-MM (GET)
/api/reports/cost?month=YYYY-MM (GET)
/api/reports/allocation?month=YYYY-MM (GET)
/api/reports/variance?month=YYYY-MM (GET)
/api/master-data/roles (GET)
/api/master-data/statuses (GET)
/api/master-data/types (GET)
```

**Response Format (Laravel API Resources):**
```json
{
  "data": { /* resource */ },
  "meta": { /* pagination, counts */ },
  "links": { /* HATEOAS links */ }
}
```

**Error Format:**
```json
{
  "message": "Validation failed",
  "errors": {
    "allocated_hours": ["Must be greater than 0"]
  }
}
```

---

### Decision 11: Testing Strategy

**Choice:** >70% code coverage with unit + E2E + regression tests on every change.

**Test Layers:**
- **Backend Unit (PHPUnit)**: Model methods, service classes, utilities
- **Backend Feature (Pest)**: API endpoints, authentication, authorization
- **Frontend Unit (Vitest)**: Svelte components, stores, utilities
- **E2E (Playwright)**: Critical user flows (login → allocate → view reports)

**Coverage Targets:**
- Backend: >80% (easier to test server-side logic)
- Frontend: >70% (UI testing is harder)
- Overall: >70% (enforced in `/opsx-verify`)

**Test Data Strategy:**
- Database seeders for test data (Laravel seeders)
- Factories for model generation (Laravel factories)
- Test fixtures for E2E tests (Playwright fixtures)

**Regression Test Approach (MVP):**
- Run full test suite on every change
- E2E tests cover happy paths + critical error cases
- Phase 2: Issue-driven loop (E2E failure → create GitHub issue → fix → retest)

**Implementation:**
- Pre-commit hooks: Run linters + unit tests
- CI/CD: Run full suite (unit + E2E) before merge
- `openspec verify` command: Check coverage, run tests, lint

---

## Risks / Trade-offs

### Risk: SvelteKit Learning Curve

**Impact:** Owner and associate unfamiliar with Svelte, may slow initial development.

**Mitigation:**
- SvelteKit has excellent documentation
- Simpler than React/Vue (less boilerplate)
- TanStack Table is framework-agnostic (owner unfamiliar but AI will guide)
- Start with simple components, iterate

---

### Risk: Two-Container Complexity

**Impact:** More moving parts than single monolith, deployment overhead.

**Mitigation:**
- Docker Compose handles orchestration
- Code-mounted volumes for hot reload (no rebuild needed)
- Owner comfortable with Docker
- Cleaner architecture worth the overhead

---

### Risk: Over-Allocation Soft Validation

**Impact:** Managers could ignore RED flags and over-allocate anyway.
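
For reference, the thresholds behind these flags (the Validation Rules in Decision 6) can be expressed as two small pure functions. This is a sketch under assumptions rather than the planned backend code: `projectStatus`, `personStatus`, and the constant names are hypothetical, and the comparisons multiply by whole percentages so that boundary values such as exactly 95% or 105% are not affected by floating-point rounding when hours are whole numbers.

```typescript
type ValidationStatus = "GREEN" | "YELLOW" | "RED";

// Percent thresholds from Decision 6 (the ±5% tolerance is the current assumption).
const UNDER_PERCENT = 95;
const OVER_PERCENT = 105;
const CAPACITY_RED_PERCENT = 120;

// Compare a project's total allocation with its approved estimate.
function projectStatus(
  allocatedHours: number,
  approvedEstimate: number,
): ValidationStatus {
  if (approvedEstimate <= 0) {
    throw new Error("approvedEstimate must be positive");
  }
  if (allocatedHours * 100 > approvedEstimate * OVER_PERCENT) return "RED";
  if (allocatedHours * 100 < approvedEstimate * UNDER_PERCENT) return "YELLOW";
  return "GREEN";
}

// Compare a person's total allocation with their monthly capacity.
function personStatus(
  allocatedHours: number,
  capacityHours: number,
): ValidationStatus {
  if (capacityHours <= 0) {
    throw new Error("capacityHours must be positive");
  }
  if (allocatedHours * 100 > capacityHours * CAPACITY_RED_PERCENT) return "RED";
  if (allocatedHours > capacityHours) return "YELLOW";
  return "GREEN";
}
```

For a 100-hour estimate, allocations of 95 to 105 hours classify as GREEN, 80 hours as YELLOW, and 120 hours as RED. Because the functions only classify and never reject, they match the soft-validation choice: the caller decides how prominently to surface the result.
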

**Mitigation:**
- Visual warnings are prominent (RED color, tooltip with exact impact)
- Reports show over-allocation clearly
- Phase 2: Add email notifications when allocations exceed threshold
- Manager discipline assumed (this is their job)

---

### Risk: Manual Time Entry Accuracy

**Impact:** Team members may forget to log hours or log inaccurately.

**Mitigation:**
- Utilization reports highlight under-logging (planned > actual consistently)
- Manager can follow up with team members showing low actuals
- Phase 2: Integrate with time-tracking tools (Jira, Harvest, Toggl)
- Incremental weekly updates reduce forgetting

---

### Risk: Cache Invalidation Bugs

**Impact:** Stale data shown if cache invalidation logic fails.

**Mitigation:**
- Comprehensive test coverage for cache invalidation logic
- Redis TTL ensures eventual consistency (max 1 hour stale)
- Cache keys are scoped (project, month, person)
- Invalidation triggered on all mutations (create/update/delete)
- Manual cache flush available for admins

---

### Risk: JWT Token Security

**Impact:** Stolen tokens could allow unauthorized access.

**Mitigation:**
- Refresh token rotation (one-time use)
- Short access token TTL (60 minutes)
- Refresh tokens stored in Redis (revocable)
- HTTPS enforced via Nginx Proxy Manager
- Logout invalidates refresh token
- Token includes user role for authorization checks

---

### Trade-off: No Real-Time Notifications

**Benefit:** Saves 6 hours of dev time, keeps MVP scope tight.

**Cost:** Users must manually refresh to see allocation changes.

**Justification:** Allocations are a monthly planning activity, not time-critical. Acceptable for MVP.

---

### Trade-off: No PDF/CSV Exports

**Benefit:** Faster MVP, avoids report formatting complexity.

**Cost:** Users cannot export reports for offline viewing or stakeholder sharing.

**Justification:** On-screen reports are the primary value; exports are nice-to-have for Phase 2.

---

### Trade-off: Manual Time Entry

**Benefit:** Avoids vendor lock-in, no integration complexity.

**Cost:** Team members must manually enter hours monthly.

**Justification:** Monthly aggregate is low overhead (~5 minutes per person per month).

---

## Migration Plan

**Deployment Steps:**

1. **Initial Setup:**
   - Run `docker-compose up` (creates 4 containers)
   - Laravel migrations create database schema
   - Database seeders populate master data (roles, statuses, types)
   - Create superuser account via Laravel seeder

2. **Data Import (Optional):**
   - If team has historical spreadsheet data, create import script
   - Import team members (name, role, hourly rate)
   - Import active projects (code, title, approved estimate)
   - Do NOT import historical allocations (start fresh)

3. **User Onboarding:**
   - Train managers on allocation workflow (1 hour session)
   - Demo: capacity planning → project setup → allocation → reports
   - Provide Quick Start guide (Markdown doc)

4. **Go-Live:**
   - Managers create February 2026 capacity plan (holidays, PTO, availability)
   - Managers allocate resources for February
   - Team members log February actuals mid-month (incremental updates)
   - Month-end: Review utilization reports, adjust March allocations

**Rollback Strategy:**
- MVP is greenfield (no data migration to revert)
- If a critical bug is discovered, roll back to the previous container image
- Docker Compose down/up with previous image tag
- PostgreSQL data persisted in volume (safe across container restarts)
- Zero-downtime rollback: Blue/green deployment (Phase 2, not needed for MVP)

**Monitoring (Phase 2):**
- Application logs (Laravel log files)
- Database performance (PostgreSQL slow query log)
- Cache hit rate (Redis INFO stats)
- API response times (Laravel Telescope or custom middleware)

---

## Open Questions

### Question 1: Hourly Rate Visibility

Should developers see their own hourly rate, or only managers/top brass?

**Options:**
- A) Developers can see their own rate (transparency)
- B) Developers cannot see rates (only allocations)

**Recommendation:** A (transparency fosters trust, rate is not secret in most orgs)

**Decision:** To be finalized with owner before implementation.

---

### Question 2: Hours Per Day Configuration

Is "1.0 availability = 8 hours" globally configured, or per-project?

**Options:**
- A) Global setting (e.g., 1.0 = 8 hours for everyone)
- B) Per-team member (some people work 6-hour days)
- C) Per-project (different billing rates for different project types)

**Recommendation:** A (global setting, simplest for MVP)

**Decision:** Owner mentioned "configurable per project" but likely meant per team. Clarify.

---

### Question 3: PTO Approval Workflow

Is PTO auto-approved, or does it require manager approval?

**Options:**
- A) Auto-approved (capacity reduced immediately)
- B) Requires approval (pending state until manager approves)

**Recommendation:** B (manager approval, prevents abuse)

**Decision:** Owner likely expects approval workflow. Confirm.

---

### Question 4: Support Projects in Revenue Forecast

Should "Support" type projects appear in revenue forecasts?

**Options:**
- A) Exclude from revenue (they're ongoing ops, not billable)
- B) Include in revenue (still billable internally)

**Recommendation:** Ask owner's preference (may vary by org)

**Decision:** To be confirmed during implementation.

---

### Question 5: Allocation Tolerance Threshold

What's the tolerance for "within estimate" (GREEN indicator)?

**Current assumption:** ±5% (e.g., 100-hour project allocated 95-105 hours is GREEN)

**Confirm:** Is 5% the right threshold, or should it be configurable?

**Decision:** Start with 5%, make configurable in Phase 2 if needed.

---

**End of Design Document**

**Next Steps:**
1. Review open questions with owner
2. Finalize database schema (ERD diagram)
3. Create tasks.md (implementation checklist)
4. Begin Sprint 1: Docker Compose setup + database migrations