Context
Headroom is a greenfield web application to replace manual spreadsheet-based capacity planning for engineering teams. The current spreadsheet approach is error-prone, lacks validation, provides no audit trail, and wastes manager time (2+ hours monthly on allocation work).
Current State:
- No existing system to migrate from
- Team uses spreadsheets for capacity planning and resource allocation
- No automation, no validation, no visibility into team headroom
Constraints:
- MVP must be production-ready with >70% test coverage
- Must run containerized (Docker Compose) from day 1
- Existing Nginx Proxy Manager in environment (no Caddy/Traefik)
- Must support 10-15 developers across 10-12 concurrent projects
- Monthly capacity planning cycle (not real-time)
Stakeholders:
- Engineering managers (primary users)
- Team members (log hours, view allocations)
- Top brass (view reports only)
- Superuser/admin (system configuration)
Goals / Non-Goals
Goals:
- Automate capacity calculations (holidays, PTO, availability)
- Validate allocations against capacity and approved estimates
- Prevent billing errors (over/under-allocation detection)
- Provide clear visibility into team headroom
- Track planned vs actual hours for utilization analysis
- Generate 5 core reports with customizable filters
- Reduce manager allocation time from 2+ hours to <30 minutes per month
- Enforce role-based access control (4 personas)
- Maintain >70% test coverage with comprehensive E2E tests
Non-Goals:
- Real-time notifications (deferred to Phase 2; users refresh manually in the MVP)
- PDF/CSV report exports (deferred to Phase 2, on-screen only for MVP)
- Time-tracking tool integration (manual entry only for MVP)
- Multi-tenancy (single-tenant MVP, add tenant_id later)
- Mobile app (desktop web only)
- AI-powered forecasting (rule-based validation sufficient)
Decisions
Decision 1: Two-Container Architecture (Laravel API + SvelteKit Frontend)
Choice: Separate Laravel API backend and SvelteKit frontend in different containers.
Rationale:
- Clean separation of concerns (API vs UI)
- Easier to scale independently in future
- SvelteKit is modern and well suited to dashboards, worth the learning curve
- Laravel provides robust API development (owner has PHP background)
Alternatives Considered:
- Laravel + Blade templates: Rejected (less interactive UI, harder for dashboards)
- Laravel + Vue (Inertia): Rejected (owner preferred Svelte over Vue)
- SvelteKit full-stack: Rejected (owner has PHP background and prefers Laravel for the API)
Implementation:
- Frontend: SvelteKit (port 5173), Tailwind CSS + DaisyUI, Recharts, TanStack Table
- Backend: Laravel 12 (latest) (port 3000), PostgreSQL (latest), Redis (latest)
- Communication: REST API with Laravel API Resources for consistent JSON
- Reverse proxy: Existing Nginx Proxy Manager routes /api/* → Laravel, /* → SvelteKit
Decision 2: PostgreSQL from Day 1
Choice: Use PostgreSQL in production and development (no SQLite).
Rationale:
- Avoid migration pain later (SQLite → PostgreSQL is error-prone)
- Production-grade features (JSON operators, full-text search, advanced indexing)
- Better for reporting queries (complex aggregations, window functions)
- Docker volume mount preserves portability
Alternatives Considered:
- SQLite for local dev, PostgreSQL for prod: Rejected (migration pain, feature parity issues)
Implementation:
- PostgreSQL (latest, Alpine container)
- Volume-mounted to ./data/postgres for backup/portability
- Migrations from day 1 (Laravel migrations)
- UUIDs for primary keys (prevents ID enumeration, easier distributed systems later)
Decision 3: Redis Caching from Day 1
Choice: Implement query and response caching with Redis from the start.
Rationale:
- Owner insisted: "No need to sweat on refactoring it everywhere"
- Prevents technical debt accumulation
- Expensive queries (capacity calculations, reports) benefit immediately
- Easy automatic cache invalidation with Laravel
Alternatives Considered:
- Defer caching to Phase 2: Rejected (owner's preference for avoiding future refactoring)
Implementation:
- Redis (latest, Alpine container)
- Cache keys pattern: allocations:month:{YYYY-MM}, reports:forecast:{from}:{to}:{hash}
- TTL: 1 hour (allocations), 15 min (reports), 24 hours (master data)
- Automatic invalidation on mutations (create/update/delete triggers cache flush)
- Laravel cache facade for consistency
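The key patterns and TTLs above could be centralized in one small helper so that every caller builds keys the same way. The sketch below is written in TypeScript purely for illustration; the production version would sit behind the Laravel cache facade. The helper names and the FNV-1a style digest used for the {hash} segment are assumptions, since the design does not say how the hash is computed.

```typescript
// Only the key patterns and TTLs come from the design; helper names are hypothetical.
const CACHE_TTL_SECONDS = {
  allocations: 60 * 60, // 1 hour
  reports: 15 * 60, // 15 minutes
  masterData: 24 * 60 * 60, // 24 hours
} as const;

const MONTH_PATTERN = /^\d{4}-(0[1-9]|1[0-2])$/;

function assertMonth(month: string): void {
  if (!MONTH_PATTERN.test(month)) {
    throw new Error(`Expected YYYY-MM, got "${month}"`);
  }
}

// Assumption: {hash} is a 32-bit FNV-1a style digest (over UTF-16 code units)
// of the filter pairs, sorted by key so that filter order does not matter.
function hashFilters(filters: Record<string, string>): string {
  const canonical = Object.keys(filters)
    .sort()
    .map((key) => `${key}=${filters[key]}`)
    .join("&");
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function allocationsKey(month: string): string {
  assertMonth(month);
  return `allocations:month:${month}`;
}

function forecastReportKey(
  from: string,
  to: string,
  filters: Record<string, string>
): string {
  assertMonth(from);
  assertMonth(to);
  return `reports:forecast:${from}:${to}:${hashFilters(filters)}`;
}
```

Sorting the filter keys before hashing means two requests with the same filters in a different order share one cache entry.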
Decision 4: JWT Authentication (Token-Based)
Choice: JWT tokens instead of session-based authentication.
Rationale:
- Stateless (better for API-first architecture)
- Suitable for SPA frontend
- Easier to add mobile app later (future-proofing)
- Industry standard for REST APIs
Alternatives Considered:
- Laravel sessions: Rejected (owner preferred JWT for future mobile support)
Implementation:
- tymon/jwt-auth package
- Access token: 60 minute TTL
- Refresh token: 7 day TTL (stored in Redis, one-time use with rotation)
- Token claims: user UUID, role, permissions array
- Refresh endpoint rotates tokens on each use
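One-time-use rotation can be illustrated with a small in-memory model. This is a sketch of the idea only: a Map stands in for Redis, the class and method names are hypothetical, and the real implementation would live in the Laravel backend alongside tymon/jwt-auth.

```typescript
interface StoredRefreshToken {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, per the design

// Hypothetical store; a Map stands in for Redis in this sketch.
class RefreshTokenStore {
  private tokens = new Map<string, StoredRefreshToken>();
  private generateToken: () => string;

  constructor(generateToken: () => string) {
    this.generateToken = generateToken;
  }

  issue(userId: string, nowMs: number): string {
    const token = this.generateToken();
    this.tokens.set(token, { userId, expiresAt: nowMs + REFRESH_TTL_MS });
    return token;
  }

  // Consumes the presented token and returns a replacement. Returns null
  // when the token is unknown, already used, or expired.
  rotate(
    presented: string,
    nowMs: number
  ): { userId: string; refreshToken: string } | null {
    const entry = this.tokens.get(presented);
    if (!entry) return null;
    this.tokens.delete(presented); // one-time use: gone after the first attempt
    if (entry.expiresAt <= nowMs) return null;
    return {
      userId: entry.userId,
      refreshToken: this.issue(entry.userId, nowMs),
    };
  }

  // Logout path: the refresh token can no longer be exchanged.
  revoke(token: string): void {
    this.tokens.delete(token);
  }
}
```

Because the presented token is deleted before the replacement is issued, replaying a captured refresh token fails once the legitimate client has used it.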
Decision 5: SvelteKit Frontend Stack
Choice: SvelteKit + Tailwind CSS + DaisyUI + Recharts + TanStack Table + Superforms + Zod
Rationale:
- DaisyUI: Fast development, opinionated but speeds up dashboard creation
- Recharts: Good balance of power and simplicity for charts
- TanStack Table: Industry standard for data grids, powerful filtering/sorting
- Superforms + Zod: Type-safe validation, seamless SvelteKit Form Actions integration
Alternatives Considered:
- Shadcn/ui: Rejected (DaisyUI faster for MVP)
- Chart.js: Rejected (Recharts more powerful)
- Custom table component: Rejected (TanStack is proven, owner unfamiliar but trusts recommendation)
Implementation:
- Svelte stores for minimal UI state only (filters, modals)
- Fetch API for HTTP (no Axios, native is sufficient)
- Vitest for unit tests, Playwright for E2E tests
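With native fetch in place of Axios, a thin request builder is usually enough to keep the /api prefix, JSON headers, and bearer token in one place. The sketch below is an assumption about how that could look; buildApiRequest and its option names are hypothetical. It is a pure function, so it can be unit-tested with Vitest without any network access.

```typescript
interface ApiRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body?: string;
}

interface ApiRequestOptions {
  token?: string; // JWT access token, if the user is logged in
  query?: Record<string, string>;
  body?: unknown;
}

// Hypothetical helper: builds the arguments for a fetch() call.
function buildApiRequest(
  method: "GET" | "POST" | "PUT" | "DELETE",
  path: string,
  options: ApiRequestOptions = {}
): ApiRequest {
  const query = options.query ?? {};
  const queryString = Object.keys(query)
    .sort()
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(query[key] ?? "")}`)
    .join("&");
  // The reverse proxy routes everything under /api to Laravel.
  const url = `/api${path}${queryString ? `?${queryString}` : ""}`;

  const headers: Record<string, string> = { Accept: "application/json" };
  if (options.token) {
    headers["Authorization"] = `Bearer ${options.token}`;
  }

  const result: ApiRequest = { url, method, headers };
  if (options.body !== undefined) {
    headers["Content-Type"] = "application/json";
    result.body = JSON.stringify(options.body);
  }
  return result;
}
```

The result can be handed to the Fetch API as fetch(req.url, req), since method, headers, and body match the fields fetch expects.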
Decision 6: Allocation Validation Strategy
Choice: Soft validation with visual indicators (GREEN/YELLOW/RED), not hard blocks.
Rationale:
- Managers sometimes need flexibility to over-allocate temporarily
- Hard blocks would frustrate workflow
- Visual warnings catch errors while allowing override
- "This money is my salary!" — both over and under-allocation must be flagged
Validation Rules:
- GREEN: Allocation within ±5% of approved estimate (95-105%)
- YELLOW: Under-allocation (<95% of approved estimate) — will undercharge
- RED: Over-allocation (>105% of approved estimate) — will overcharge
- Person capacity: YELLOW warning at >100%, RED alert at >120%
Implementation:
- API returns validation status with each allocation response
- Frontend displays color-coded indicators in allocation matrix
- Tooltip shows exact variance ("Over by 20 hours, will overcharge client")
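The validation rules reduce to two small classification functions. The following TypeScript sketch assumes the thresholds listed above. The function names and the wording of the under-allocation message are illustrative, not part of the design. Comparisons are done on scaled integers to avoid floating-point surprises exactly at the 95% and 105% boundaries.

```typescript
type Indicator = "GREEN" | "YELLOW" | "RED";

const ESTIMATE_TOLERANCE_PERCENT = 5; // GREEN band is 95-105% of the estimate
const PERSON_WARN_PERCENT = 100; // YELLOW above this
const PERSON_ALERT_PERCENT = 120; // RED above this

// Project-level check: allocation versus approved estimate.
function projectIndicator(allocatedHours: number, approvedEstimate: number): Indicator {
  if (approvedEstimate <= 0) {
    throw new Error("approvedEstimate must be positive");
  }
  const scaled = allocatedHours * 100;
  if (scaled > approvedEstimate * (100 + ESTIMATE_TOLERANCE_PERCENT)) return "RED";
  if (scaled < approvedEstimate * (100 - ESTIMATE_TOLERANCE_PERCENT)) return "YELLOW";
  return "GREEN";
}

// Person-level check: total allocation versus monthly capacity.
function personIndicator(allocatedHours: number, capacityHours: number): Indicator {
  if (capacityHours <= 0) {
    throw new Error("capacityHours must be positive");
  }
  const scaled = allocatedHours * 100;
  if (scaled > capacityHours * PERSON_ALERT_PERCENT) return "RED";
  if (scaled > capacityHours * PERSON_WARN_PERCENT) return "YELLOW";
  return "GREEN";
}

// Tooltip text. The "Under by" wording is an assumption mirroring the
// "Over by" example in the design.
function varianceMessage(allocatedHours: number, approvedEstimate: number): string {
  const delta = allocatedHours - approvedEstimate;
  const indicator = projectIndicator(allocatedHours, approvedEstimate);
  if (indicator === "RED") return `Over by ${delta} hours, will overcharge client`;
  if (indicator === "YELLOW") return `Under by ${-delta} hours, will undercharge client`;
  return "Within tolerance";
}
```

For a 100-hour estimate, 95 and 105 hours are both GREEN, 94 is YELLOW, and 106 is RED.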
Decision 7: Monthly Aggregate Actuals (Not Daily)
Choice: Track actual hours as monthly totals, allowing incremental weekly updates.
Rationale:
- Monthly planning cycle doesn't require daily granularity
- Simplifies data model and UI
- Team members can update weekly and system accumulates
- No time-tracking integration for MVP (manual entry)
Alternatives Considered:
- Daily time logging: Rejected (over-engineering for MVP, adds complexity)
- Weekly buckets: Rejected (monthly is sufficient given monthly planning cycle)
Implementation:
- Actuals table: project_id, team_member_id, month (YYYY-MM), hours_logged
- UI allows replacing or incrementing monthly total
- Utilization calculated as: (Actual hours / Capacity) × 100%
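The replace-or-increment behaviour and the utilization formula can be sketched as follows. The function names are hypothetical, and two choices are assumptions not stated in the design: negative hours are rejected, and zero capacity yields 0% utilization.

```typescript
type UpdateMode = "replace" | "increment";

// Returns the new monthly total after one manual entry.
function applyActualUpdate(
  currentTotal: number,
  hours: number,
  mode: UpdateMode
): number {
  if (hours < 0) {
    // Assumption: corrections are made by replacing, not by negative increments.
    throw new Error("hours must not be negative");
  }
  return mode === "replace" ? hours : currentTotal + hours;
}

// Utilization = (actual hours / capacity) x 100.
function utilizationPercent(actualHours: number, capacityHours: number): number {
  if (capacityHours <= 0) {
    return 0; // assumption: no capacity means nothing to utilize
  }
  return (actualHours / capacityHours) * 100;
}
```

A team member logging 10, 12, 8, and 10 hours in four weekly increments ends the month at 40 hours; against a 160-hour capacity that is 25% utilization on that project.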
Decision 8: Defer Real-Time Notifications to Phase 2
Choice: No WebSocket notifications in MVP, users refresh to see changes.
Rationale:
- Allocations are planned monthly, not time-critical
- WebSocket setup adds 6 hours of dev time
- Polling every 30s is an acceptable alternative but is also deferred
- Focus MVP on core allocation/reporting functionality
Alternatives Considered:
- WebSocket + 1 notification PoC (6 hours): Rejected (not critical for monthly planning)
- Polling-based notifications (2 hours): Rejected (also deferred, users can refresh)
Implementation (Phase 2):
- Laravel Broadcasting with Redis adapter
- SvelteKit WebSocket client
- Events: AllocationCreated, AllocationUpdated, EstimateApproved
Decision 9: Database Schema Design
Choice: Normalized schema with master data tables, JSON for forecasted effort, UUIDs for primary keys.
Key Tables:
- team_members: id (UUID), name, role_id (FK), hourly_rate, active
- projects: id (UUID), code (unique), title, status_id (FK), type_id (FK), approved_estimate, forecasted_effort (JSON)
- allocations: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), allocated_hours
- actuals: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), hours_logged
- roles, project_statuses, project_types: Master data tables
- holidays, ptos: Calendar data
Design Rationale:
- UUIDs: Prevent ID enumeration attacks, easier distributed systems later
- Normalized master data: Roles/statuses/types in separate tables for dynamic configuration
- Month as string (YYYY-MM): Simplifies queries, index-friendly, human-readable
- JSON for forecasted effort: Flexible structure {"2026-02": 40, "2026-03": 60}, easy to extend
- Soft deletes for projects: deleted_at timestamp for audit trail
- Active flag for team members: Preserve historical allocations when person leaves
Indexes:
- allocations: composite index on (project_id, month), (team_member_id, month)
- actuals: composite index on (project_id, month), (team_member_id, month)
- team_members: index on (role_id, active)
- projects: index on (status_id, type_id), unique on (code)
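On the frontend, the projects and allocations tables might be mirrored by types like the ones below. This is a sketch: the field names follow the columns listed under Key Tables, while the type and function names are assumptions. It also shows why a zero-padded YYYY-MM string works well: plain string comparison orders months chronologically, so range filters need no date parsing.

```typescript
type Month = string; // "YYYY-MM", zero-padded

interface Project {
  id: string; // UUID
  code: string;
  title: string;
  status_id: string;
  type_id: string;
  approved_estimate: number;
  forecasted_effort: Record<Month, number>; // e.g. {"2026-02": 40, "2026-03": 60}
}

interface Allocation {
  id: string; // UUID
  project_id: string;
  team_member_id: string;
  month: Month;
  allocated_hours: number;
}

// Sums forecasted hours for months in [from, to], inclusive.
// String comparison is enough because YYYY-MM sorts chronologically.
function forecastedHoursBetween(
  effort: Record<Month, number>,
  from: Month,
  to: Month
): number {
  let total = 0;
  for (const month of Object.keys(effort)) {
    if (month >= from && month <= to) {
      total += effort[month] ?? 0;
    }
  }
  return total;
}

// Total hours allocated to one project in one month.
function allocatedHoursFor(
  allocations: Allocation[],
  projectId: string,
  month: Month
): number {
  return allocations
    .filter((a) => a.project_id === projectId && a.month === month)
    .reduce((sum, a) => sum + a.allocated_hours, 0);
}
```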
Decision 10: API Design Pattern
Choice: REST API with Laravel API Resources for consistent JSON responses.
Rationale:
- REST is simpler than GraphQL for this use case
- Laravel API Resources provide consistent transformation layer
- Standard HTTP verbs (GET, POST, PUT, DELETE)
- Easy to document with Scribe (generates HTML docs plus an OpenAPI spec usable with Swagger UI)
Endpoint Structure:
/api/auth/login (POST)
/api/auth/logout (POST)
/api/auth/refresh (POST)
/api/team-members (GET, POST)
/api/team-members/:id (GET, PUT, DELETE)
/api/projects (GET, POST)
/api/projects/:id (GET, PUT, DELETE)
/api/allocations?month=YYYY-MM (GET, POST)
/api/allocations/bulk (POST)
/api/allocations/:id (PUT, DELETE)
/api/actuals?month=YYYY-MM (GET, POST)
/api/actuals/bulk (POST)
/api/actuals/:id (PUT)
/api/reports/forecast?from=YYYY-MM&to=YYYY-MM (GET)
/api/reports/utilization?month=YYYY-MM (GET)
/api/reports/cost?month=YYYY-MM (GET)
/api/reports/allocation?month=YYYY-MM (GET)
/api/reports/variance?month=YYYY-MM (GET)
/api/master-data/roles (GET)
/api/master-data/statuses (GET)
/api/master-data/types (GET)
Response Format (Laravel API Resources):
{
  "data": { /* resource */ },
  "meta": { /* pagination, counts */ },
  "links": { /* HATEOAS links */ }
}
Error Format:
{
  "message": "Validation failed",
  "errors": {
    "allocated_hours": ["Must be greater than 0"]
  }
}
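On the SvelteKit side, the two payload shapes above could be typed and checked at runtime roughly as follows. The type and function names are hypothetical; only the JSON field names come from the formats shown.

```typescript
interface ApiEnvelope<T> {
  data: T;
  meta?: Record<string, unknown>; // pagination, counts
  links?: Record<string, string>; // HATEOAS links
}

interface ApiValidationError {
  message: string;
  errors: Record<string, string[]>;
}

// Runtime type guard for the error format.
function isApiValidationError(payload: unknown): payload is ApiValidationError {
  if (typeof payload !== "object" || payload === null) return false;
  const candidate = payload as { message?: unknown; errors?: unknown };
  if (typeof candidate.message !== "string") return false;
  if (typeof candidate.errors !== "object" || candidate.errors === null) return false;
  const errors = candidate.errors as Record<string, unknown>;
  return Object.keys(errors).every((field) => {
    const messages = errors[field];
    return Array.isArray(messages) && messages.every((m) => typeof m === "string");
  });
}

// Picks the first message per field, e.g. for inline form errors.
function firstErrorPerField(error: ApiValidationError): Record<string, string> {
  const result: Record<string, string> = {};
  for (const field of Object.keys(error.errors)) {
    const messages = error.errors[field];
    const first = messages ? messages[0] : undefined;
    if (first !== undefined) {
      result[field] = first;
    }
  }
  return result;
}
```

A form handler can call isApiValidationError on a non-2xx response body and fall back to a generic message when the guard fails.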
Decision 11: Testing Strategy
Choice: >70% code coverage with unit + E2E + regression tests on every change.
Test Layers:
- Backend Unit (PHPUnit): Model methods, service classes, utilities
- Backend Feature (Pest): API endpoints, authentication, authorization
- Frontend Unit (Vitest): Svelte components, stores, utilities
- E2E (Playwright): Critical user flows (login → allocate → view reports)
Coverage Targets:
- Backend: >80% (easier to test server-side logic)
- Frontend: >70% (UI testing is harder)
- Overall: >70% (enforced in /opsx-verify)
Test Data Strategy:
- Database seeders for test data (Laravel seeders)
- Factories for model generation (Laravel factories)
- Test fixtures for E2E tests (Playwright fixtures)
Regression Test Approach (MVP):
- Run full test suite on every change
- E2E tests cover happy paths + critical error cases
- Phase 2: Issue-driven loop (E2E failure → create GitHub issue → fix → retest)
Implementation:
- Pre-commit hooks: Run linters + unit tests
- CI/CD: Run full suite (unit + E2E) before merge
- openspec verify command: Check coverage, run tests, lint
Risks / Trade-offs
Risk: SvelteKit Learning Curve
Impact: Owner and associate unfamiliar with Svelte, may slow initial development.
Mitigation:
- SvelteKit has excellent documentation
- Simpler than React/Vue (less boilerplate)
- TanStack Table is framework-agnostic (owner unfamiliar but AI will guide)
- Start with simple components, iterate
Risk: Two-Container Complexity
Impact: More moving parts than single monolith, deployment overhead.
Mitigation:
- Docker Compose handles orchestration
- Code-mounted volumes for hot reload (no rebuild needed)
- Owner comfortable with Docker
- Cleaner architecture worth the overhead
Risk: Over-Allocation Soft Validation
Impact: Managers could ignore RED flags and over-allocate anyway.
Mitigation:
- Visual warnings are prominent (RED color, tooltip with exact impact)
- Reports show over-allocation clearly
- Phase 2: Add email notifications when allocations exceed threshold
- Manager discipline assumed (this is their job)
Risk: Manual Time Entry Accuracy
Impact: Team members may forget to log hours or log inaccurately.
Mitigation:
- Utilization reports highlight under-logging (planned > actual consistently)
- Manager can follow up with team members showing low actuals
- Phase 2: Integrate with time-tracking tools (Jira, Harvest, Toggl)
- Incremental weekly updates reduce forgetting
Risk: Cache Invalidation Bugs
Impact: Stale data shown if cache invalidation logic fails.
Mitigation:
- Comprehensive test coverage for cache invalidation logic
- Redis TTL bounds staleness (at most 1 hour for allocations, 15 minutes for reports, 24 hours for master data)
- Cache keys are scoped (project, month, person)
- Invalidation triggered on all mutations (create/update/delete)
- Manual cache flush available for admins
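The two safety nets described here, TTL expiry and mutation-triggered invalidation, can be modelled in a few lines. The sketch below uses an in-memory Map in place of Redis. The class and function names are hypothetical, and flushing every reports: key on an allocation change is an assumed policy, not something the design specifies.

```typescript
// In-memory stand-in for Redis, for illustration only.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  set(key: string, value: V, ttlSeconds: number, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + ttlSeconds * 1000 });
  }

  // Expired entries behave as misses, which bounds how stale data can get.
  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Removes every key that starts with the prefix; returns how many were removed.
  invalidatePrefix(prefix: string): number {
    const doomed: string[] = [];
    this.entries.forEach((_entry, key) => {
      if (key.indexOf(prefix) === 0) doomed.push(key);
    });
    doomed.forEach((key) => this.entries.delete(key));
    return doomed.length;
  }
}

// Hook to call from every allocation create/update/delete.
// Assumed policy: drop that month's allocations and all cached reports.
function onAllocationMutated<V>(cache: TtlCache<V>, month: string): number {
  return (
    cache.invalidatePrefix(`allocations:month:${month}`) +
    cache.invalidatePrefix("reports:")
  );
}
```

Even if a mutation path forgets to call the hook, the TTL check in get still limits how long stale data can be served.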
Risk: JWT Token Security
Impact: Stolen tokens could allow unauthorized access.
Mitigation:
- Refresh token rotation (one-time use)
- Short access token TTL (60 minutes)
- Refresh tokens stored in Redis (revocable)
- HTTPS enforced via Nginx Proxy Manager
- Logout invalidates refresh token
- Token includes user role for authorization checks
Trade-off: No Real-Time Notifications
Benefit: Saves 6 hours of dev time, keeps MVP scope tight.
Cost: Users must manually refresh to see allocation changes.
Justification: Allocations are monthly planning activity, not time-critical. Acceptable for MVP.
Trade-off: No PDF/CSV Exports
Benefit: Faster MVP, avoids report formatting complexity.
Cost: Users cannot export reports for offline viewing or stakeholder sharing.
Justification: On-screen reports are primary value, exports are nice-to-have for Phase 2.
Trade-off: Manual Time Entry
Benefit: Avoids vendor lock-in, no integration complexity.
Cost: Team members must manually enter hours monthly.
Justification: Monthly aggregate is low overhead (~5 minutes per person per month).
Migration Plan
Deployment Steps:
- Initial Setup:
  - Run docker-compose up (creates 4 containers)
  - Laravel migrations create database schema
  - Database seeders populate master data (roles, statuses, types)
  - Create superuser account via Laravel seeder
- Data Import (Optional):
  - If team has historical spreadsheet data, create import script
  - Import team members (name, role, hourly rate)
  - Import active projects (code, title, approved estimate)
  - Do NOT import historical allocations (start fresh)
- User Onboarding:
  - Train managers on allocation workflow (1 hour session)
  - Demo: capacity planning → project setup → allocation → reports
  - Provide Quick Start guide (Markdown doc)
- Go-Live:
  - Managers create February 2026 capacity plan (holidays, PTO, availability)
  - Managers allocate resources for February
  - Team members log February actuals mid-month (incremental updates)
  - Month-end: Review utilization reports, adjust March allocations
Rollback Strategy:
- MVP is greenfield (no data migration to revert)
- If critical bug discovered, roll back to previous container image
- Docker Compose down/up with previous image tag
- PostgreSQL data persisted in volume (safe across container restarts)
- Zero-downtime rollback: Blue/green deployment (Phase 2, not needed for MVP)
Monitoring (Phase 2):
- Application logs (Laravel log files)
- Database performance (PostgreSQL slow query log)
- Cache hit rate (Redis INFO stats)
- API response times (Laravel Telescope or custom middleware)
Open Questions
Question 1: Hourly Rate Visibility
Should developers see their own hourly rate, or only managers/top brass?
Options:
- A) Developers can see their own rate (transparency)
- B) Developers cannot see rates (only allocations)
Recommendation: A (transparency fosters trust, rate is not secret in most orgs)
Decision: To be finalized with owner before implementation.
Question 2: Hours Per Day Configuration
Is "1.0 availability = 8 hours" globally configured, or per-project?
Options:
- A) Global setting (e.g., 1.0 = 8 hours for everyone)
- B) Per-team member (some people work 6-hour days)
- C) Per-project (different billing rates for different project types)
Recommendation: A (global setting, simplest for MVP)
Decision: Owner mentioned "configurable per project" but likely meant per team. Clarify.
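To make the options concrete, here is how monthly capacity might be computed under option A (one global hours-per-day setting). This is a sketch with several assumptions that the design leaves open: a Monday to Friday working week, full-day PTO, and holidays and PTO supplied as YYYY-MM-DD strings. The function names are hypothetical.

```typescript
const DEFAULT_HOURS_PER_DAY = 8; // option A: 1.0 availability = 8 hours

// Counts Monday-Friday dates in the month that are not in daysOff.
function workingDaysInMonth(month: string, daysOff: string[]): number {
  const parts = month.split("-");
  const year = Number(parts[0]);
  const monthIndex = Number(parts[1]) - 1;
  // Day 0 of the next month is the last day of this month.
  const daysInMonth = new Date(Date.UTC(year, monthIndex + 1, 0)).getUTCDate();
  let count = 0;
  for (let day = 1; day <= daysInMonth; day++) {
    const date = new Date(Date.UTC(year, monthIndex, day));
    const weekday = date.getUTCDay(); // 0 = Sunday, 6 = Saturday
    const iso = date.toISOString().slice(0, 10);
    if (weekday !== 0 && weekday !== 6 && daysOff.indexOf(iso) === -1) {
      count++;
    }
  }
  return count;
}

// Capacity = working days x hours per day x availability.
function capacityHours(
  month: string,
  holidays: string[],
  ptoDays: string[],
  availability: number,
  hoursPerDay: number = DEFAULT_HOURS_PER_DAY
): number {
  const workingDays = workingDaysInMonth(month, holidays.concat(ptoDays));
  return workingDays * hoursPerDay * availability;
}
```

February 2026 has 20 weekdays, so a full-time person with no time off has 160 hours. One holiday and two PTO days bring that to 136, and the same calendar at 0.5 availability gives 68. Moving to option B would only change where hoursPerDay comes from.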
Question 3: PTO Approval Workflow
Is PTO auto-approved, or does it require manager approval?
Options:
- A) Auto-approved (capacity reduced immediately)
- B) Requires approval (pending state until manager approves)
Recommendation: B (manager approval, prevents abuse)
Decision: Owner likely expects approval workflow. Confirm.
Question 4: Support Projects in Revenue Forecast
Should "Support" type projects appear in revenue forecasts?
Options:
- A) Exclude from revenue (they're ongoing ops, not billable)
- B) Include in revenue (still billable internally)
Recommendation: Ask owner's preference (may vary by org)
Decision: To be confirmed during implementation.
Question 5: Allocation Tolerance Threshold
What's the tolerance for "within estimate" (GREEN indicator)?
Current assumption: ±5% (e.g., 100-hour project allocated 95-105 hours is GREEN)
Confirm: Is 5% the right threshold, or should it be configurable?
Decision: Start with 5%, make configurable in Phase 2 if needed.
End of Design Document
Next Steps:
- Review open questions with owner
- Finalize database schema (ERD diagram)
- Create tasks.md (implementation checklist)
- Begin Sprint 1: Docker Compose setup + database migrations