Headroom - Foundation

2026-02-17 02:10:23 -05:00
commit 04022b7e0b
46 changed files with 10488 additions and 0 deletions
--- a/openspec/changes/headroom-foundation/design.md
+++ b/openspec/changes/headroom-foundation/design.md
@@ -0,0 +1,559 @@
+## Context
+
+Headroom is a greenfield web application to replace manual spreadsheet-based capacity planning for engineering teams. The current spreadsheet approach is error-prone, lacks validation, provides no audit trail, and wastes manager time (2+ hours monthly on allocation work).
+
+**Current State:**
+- No existing system to migrate from
+- Team uses spreadsheets for capacity planning and resource allocation
+- No automation, no validation, no visibility into team headroom
+
+**Constraints:**
+- MVP must be production-ready with >70% test coverage
+- Must run containerized (Docker Compose) from day 1
+- Existing Nginx Proxy Manager in environment (no Caddy/Traefik)
+- Must support 10-15 developers across 10-12 concurrent projects
+- Monthly capacity planning cycle (not real-time)
+
+**Stakeholders:**
+- Engineering managers (primary users)
+- Team members (log hours, view allocations)
+- Top brass (view reports only)
+- Superuser/admin (system configuration)
+
+---
+
+## Goals / Non-Goals
+
+**Goals:**
+- Automate capacity calculations (holidays, PTO, availability)
+- Validate allocations against capacity and approved estimates
+- Prevent billing errors (over/under-allocation detection)
+- Provide clear visibility into team headroom
+- Track planned vs actual hours for utilization analysis
+- Generate 5 core reports with customizable filters
+- Reduce manager allocation time from 2+ hours to <30 minutes per month
+- Enforce role-based access control (4 personas)
+- Maintain >70% test coverage with comprehensive E2E tests
+
+**Non-Goals:**
+- Real-time notifications (deferred to Phase 2; users refresh manually in the MVP)
+- PDF/CSV report exports (deferred to Phase 2, on-screen only for MVP)
+- Time-tracking tool integration (manual entry only for MVP)
+- Multi-tenancy (single-tenant MVP, add tenant_id later)
+- Mobile app (desktop web only)
+- AI-powered forecasting (rule-based validation sufficient)
+
+---
+
+## Decisions
+
+### Decision 1: Two-Container Architecture (Laravel API + SvelteKit Frontend)
+
+**Choice:** Separate Laravel API backend and SvelteKit frontend in different containers.
+
+**Rationale:**
+- Clean separation of concerns (API vs UI)
+- Easier to scale independently in the future
+- SvelteKit is modern and well suited to dashboards, which makes it worth the learning curve
+- Laravel provides robust API development (owner has PHP background)
+
+**Alternatives Considered:**
+- Laravel + Blade templates: Rejected (less interactive UI, harder for dashboards)
+- Laravel + Vue (Inertia): Rejected (owner preferred Svelte over Vue)
+- SvelteKit full-stack: Rejected (owner has a PHP background and prefers Laravel for the API)
+
+**Implementation:**
+- Frontend: SvelteKit (port 5173), Tailwind CSS + DaisyUI, Recharts, TanStack Table
+- Backend: Laravel 12 (latest, port 3000), PostgreSQL (latest), Redis (latest)
+- Communication: REST API with Laravel API Resources for consistent JSON
+- Reverse proxy: Existing Nginx Proxy Manager routes `/api/*` → Laravel, `/*` → SvelteKit
+
+---
+
+### Decision 2: PostgreSQL from Day 1
+
+**Choice:** Use PostgreSQL in production and development (no SQLite).
+
+**Rationale:**
+- Avoid migration pain later (SQLite → PostgreSQL is error-prone)
+- Production-grade features (JSON operators, full-text search, advanced indexing)
+- Better for reporting queries (complex aggregations, window functions)
+- Docker volume mount preserves portability
+
+**Alternatives Considered:**
+- SQLite for local dev, PostgreSQL for prod: Rejected (migration pain, feature parity issues)
+
+**Implementation:**
+- PostgreSQL (latest, Alpine container)
+- Volume-mounted to `./data/postgres` for backup/portability
+- Migrations from day 1 (Laravel migrations)
+- UUIDs for primary keys (prevents ID enumeration, eases a later move to distributed systems)
+
+---
+
+### Decision 3: Redis Caching from Day 1
+
+**Choice:** Implement query and response caching with Redis from the start.
+
+**Rationale:**
+- Owner insisted: "No need to sweat on refactoring it everywhere"
+- Prevents technical debt accumulation
+- Expensive queries (capacity calculations, reports) benefit immediately
+- Easy automatic cache invalidation with Laravel
+
+**Alternatives Considered:**
+- Defer caching to Phase 2: Rejected (owner's preference for avoiding future refactoring)
+
+**Implementation:**
+- Redis (latest, Alpine container)
+- Cache keys pattern: `allocations:month:{YYYY-MM}`, `reports:forecast:{from}:{to}:{hash}`
+- TTL: 1 hour (allocations), 15 min (reports), 24 hours (master data)
+- Automatic invalidation on mutations (create/update/delete triggers cache flush)
+- Laravel cache facade for consistency
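The key patterns and TTLs above can be illustrated with a minimal sketch. It is written in TypeScript for illustration only; the real implementation would go through the Laravel cache facade. The function names, the `filters` argument, and the small djb2-style hash are assumptions, since the document does not say how `{hash}` is derived.

```typescript
// TTLs in seconds, taken from the list above.
const TTL_SECONDS = {
  allocations: 60 * 60,      // 1 hour
  reports: 15 * 60,          // 15 minutes
  masterData: 24 * 60 * 60,  // 24 hours
} as const;

const MONTH_RE = /^\d{4}-(0[1-9]|1[0-2])$/;

// Reject anything that is not a YYYY-MM string before it reaches a cache key.
function assertMonth(month: string): void {
  if (!MONTH_RE.test(month)) throw new RangeError("expected YYYY-MM, got " + month);
}

function allocationsKey(month: string): string {
  assertMonth(month);
  return `allocations:month:${month}`;
}

// Assumption: a djb2-style 32-bit hash, chosen only to keep the sketch dependency-free.
function shortHash(input: string): string {
  let h = 5381;
  for (let i = 0; i < input.length; i++) {
    h = ((h * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Assumption: {hash} summarizes the report filters. Filter names are sorted
// so that the same filter set always produces the same key.
function forecastKey(from: string, to: string, filters: Record<string, string>): string {
  assertMonth(from);
  assertMonth(to);
  const canonical = Object.keys(filters)
    .sort()
    .map((name) => `${name}=${filters[name]}`)
    .join("&");
  return `reports:forecast:${from}:${to}:${shortHash(canonical)}`;
}
```

Sorting the filter names before hashing means two requests with the same filters in a different order share one cache entry.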
+
+---
+
+### Decision 4: JWT Authentication (Token-Based)
+
+**Choice:** JWT tokens instead of session-based authentication.
+
+**Rationale:**
+- Stateless (better for API-first architecture)
+- Suitable for SPA frontend
+- Easier to add mobile app later (future-proofing)
+- Industry standard for REST APIs
+
+**Alternatives Considered:**
+- Laravel sessions: Rejected (owner preferred JWT for future mobile support)
+
+**Implementation:**
+- tymon/jwt-auth package
+- Access token: 60 minute TTL
+- Refresh token: 7 day TTL (stored in Redis, one-time use with rotation)
+- Token claims: user UUID, role, permissions array
+- Refresh endpoint rotates tokens on each use
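One-time-use refresh rotation can be sketched as follows. This is an in-memory TypeScript illustration, not the tymon/jwt-auth API: the `RefreshTokenStore` class and its methods are hypothetical, a plain object stands in for Redis, and the counter-based token IDs exist only to keep the example deterministic (real tokens would need to be unguessable).

```typescript
interface RefreshRecord {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, per the list above

class RefreshTokenStore {
  // Stand-in for Redis: token -> record.
  private records: Record<string, RefreshRecord> = {};
  private counter = 0;

  // Issue a new refresh token for a user.
  issue(userId: string, now: number): string {
    this.counter += 1;
    const token = `rt-${this.counter}`; // placeholder ID, not secure
    this.records[token] = { userId, expiresAt: now + REFRESH_TTL_MS };
    return token;
  }

  // Exchange a refresh token for a new one. The old token is deleted
  // before any other check, so a replayed token is always rejected.
  rotate(token: string, now: number): string | null {
    if (!Object.prototype.hasOwnProperty.call(this.records, token)) return null;
    const record = this.records[token];
    delete this.records[token];
    if (record.expiresAt <= now) return null;
    return this.issue(record.userId, now);
  }

  // Logout: drop the token so it can no longer be rotated.
  revoke(token: string): void {
    delete this.records[token];
  }
}
```

Deleting the old token before checking expiry keeps the store clean even when the presented token has already expired.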
+
+---
+
+### Decision 5: SvelteKit Frontend Stack
+
+**Choice:** SvelteKit + Tailwind CSS + DaisyUI + Recharts + TanStack Table + Superforms + Zod
+
+**Rationale:**
+- **DaisyUI**: Fast development, opinionated but speeds up dashboard creation
+- **Recharts**: Good balance of power and simplicity for charts (Recharts is a React library, so Svelte compatibility needs confirming before implementation)
+- **TanStack Table**: Industry standard for data grids, powerful filtering/sorting
+- **Superforms + Zod**: Type-safe validation, seamless SvelteKit Form Actions integration
+
+**Alternatives Considered:**
+- Shadcn/ui: Rejected (DaisyUI faster for MVP)
+- Chart.js: Rejected (Recharts more powerful)
+- Custom table component: Rejected (TanStack is proven, owner unfamiliar but trusts recommendation)
+
+**Implementation:**
+- Svelte stores for minimal UI state only (filters, modals)
+- Fetch API for HTTP (no Axios, native is sufficient)
+- Vitest for unit tests, Playwright for E2E tests
+
+---
+
+### Decision 6: Allocation Validation Strategy
+
+**Choice:** Soft validation with visual indicators (GREEN/YELLOW/RED), not hard blocks.
+
+**Rationale:**
+- Managers sometimes need flexibility to over-allocate temporarily
+- Hard blocks would frustrate workflow
+- Visual warnings catch errors while allowing override
+- "This money is my salary!" — both over and under-allocation must be flagged
+
+**Validation Rules:**
+- GREEN: Allocation = Approved estimate (100%, within ±5% tolerance)
+- YELLOW: Under-allocation (<95% of approved estimate) — will undercharge
+- RED: Over-allocation (>105% of approved estimate) — will overcharge
+- Person capacity: YELLOW warning at >100%, RED alert at >120%
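The rules above can be expressed as a small classifier. This is a TypeScript sketch under stated assumptions: the function names are hypothetical, the API would compute the status server-side in Laravel, and the 95/105/120 thresholds are copied from the list above.

```typescript
type Status = "GREEN" | "YELLOW" | "RED";

const LOWER_PCT = 95;            // below this share of the estimate: under-allocation
const UPPER_PCT = 105;           // above this share of the estimate: over-allocation
const CAPACITY_ALERT_PCT = 120;  // above this share of capacity: RED

// Classify a project's total allocated hours against its approved estimate.
// Cross-multiplying avoids floating-point division at the boundaries.
function projectStatus(allocatedHours: number, approvedEstimate: number): Status {
  if (approvedEstimate <= 0) throw new RangeError("approvedEstimate must be positive");
  if (allocatedHours * 100 > approvedEstimate * UPPER_PCT) return "RED";
  if (allocatedHours * 100 < approvedEstimate * LOWER_PCT) return "YELLOW";
  return "GREEN";
}

// Classify a person's allocated hours against their capacity for the month.
function personStatus(allocatedHours: number, capacityHours: number): Status {
  if (capacityHours <= 0) throw new RangeError("capacityHours must be positive");
  if (allocatedHours * 100 > capacityHours * CAPACITY_ALERT_PCT) return "RED";
  if (allocatedHours > capacityHours) return "YELLOW";
  return "GREEN";
}

// Text for the tooltip: signed variance against the approved estimate.
function varianceMessage(allocatedHours: number, approvedEstimate: number): string {
  const diff = allocatedHours - approvedEstimate;
  if (diff > 0) return `Over by ${diff} hours`;
  if (diff < 0) return `Under by ${-diff} hours`;
  return "On estimate";
}
```

For a 100-hour estimate, `projectStatus(95, 100)` and `projectStatus(105, 100)` both return `"GREEN"`, while 94 and 106 hours fall into `"YELLOW"` and `"RED"` respectively.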
+
+**Implementation:**
+- API returns validation status with each allocation response
+- Frontend displays color-coded indicators in allocation matrix
+- Tooltip shows exact variance ("Over by 20 hours, will overcharge client")
+
+---
+
+### Decision 7: Monthly Aggregate Actuals (Not Daily)
+
+**Choice:** Track actual hours as monthly totals, allowing incremental weekly updates.
+
+**Rationale:**
+- Monthly planning cycle doesn't require daily granularity
+- Simplifies data model and UI
+- Team members can update weekly and system accumulates
+- No time-tracking integration for MVP (manual entry)
+
+**Alternatives Considered:**
+- Daily time logging: Rejected (over-engineering for MVP, adds complexity)
+- Weekly buckets: Rejected (monthly is sufficient given monthly planning cycle)
+
+**Implementation:**
+- Actuals table: project_id, team_member_id, month (YYYY-MM), hours_logged
+- UI allows replacing or incrementing monthly total
+- Utilization calculated as: (Actual hours / Capacity) × 100%
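The replace-or-increment behaviour and the utilization formula can be sketched as below. The function names and the `mode` parameter are assumptions made for illustration; the document specifies only the behaviour, not an interface.

```typescript
type UpdateMode = "replace" | "increment";

// Return the new monthly total after a team member submits hours.
// "increment" adds a weekly update to the running total;
// "replace" overwrites the monthly total.
function applyHours(currentTotal: number, submitted: number, mode: UpdateMode): number {
  if (submitted < 0) throw new RangeError("submitted hours must not be negative");
  return mode === "increment" ? currentTotal + submitted : submitted;
}

// Utilization = (actual hours / capacity) x 100.
function utilizationPercent(actualHours: number, capacityHours: number): number {
  if (capacityHours <= 0) throw new RangeError("capacityHours must be positive");
  return (actualHours / capacityHours) * 100;
}
```

For example, a member with 160 hours of capacity who has logged 120 hours is at 75% utilization.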
+
+---
+
+### Decision 8: Defer Real-Time Notifications to Phase 2
+
+**Choice:** No WebSocket notifications in the MVP; users refresh to see changes.
+
+**Rationale:**
+- Allocations are planned monthly, not time-critical
+- WebSocket setup adds 6 hours of dev time
+- Polling every 30s is an acceptable alternative, but it is also deferred
+- Focus MVP on core allocation/reporting functionality
+
+**Alternatives Considered:**
+- WebSocket + 1 notification PoC (6 hours): Rejected (not critical for monthly planning)
+- Polling-based notifications (2 hours): Rejected (also deferred, users can refresh)
+
+**Implementation (Phase 2):**
+- Laravel Broadcasting with Redis adapter
+- SvelteKit WebSocket client
+- Events: AllocationCreated, AllocationUpdated, EstimateApproved
+
+---
+
+### Decision 9: Database Schema Design
+
+**Choice:** Normalized schema with master data tables, JSON for forecasted effort, UUIDs for primary keys.
+
+**Key Tables:**
+- `team_members`: id (UUID), name, role_id (FK), hourly_rate, active
+- `projects`: id (UUID), code (unique), title, status_id (FK), type_id (FK), approved_estimate, forecasted_effort (JSON)
+- `allocations`: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), allocated_hours
+- `actuals`: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), hours_logged
+- `roles`, `project_statuses`, `project_types`: Master data tables
+- `holidays`, `ptos`: Calendar data
+
+**Design Rationale:**
+- **UUIDs**: Prevent ID enumeration attacks and ease a later move to distributed systems
+- **Normalized master data**: Roles/statuses/types in separate tables for dynamic configuration
+- **Month as string (YYYY-MM)**: Simplifies queries, index-friendly, human-readable
+- **JSON for forecasted effort**: Flexible structure `{"2026-02": 40, "2026-03": 60}`, easy to extend
+- **Soft deletes for projects**: `deleted_at` timestamp for audit trail
+- **Active flag for team members**: Preserve historical allocations when person leaves
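A short sketch shows why the `YYYY-MM` string format pairs well with the JSON forecast structure: zero-padded month strings sort lexicographically in calendar order, so a range filter is a plain string comparison. The `forecastTotal` helper is hypothetical and written in TypeScript for illustration.

```typescript
// Shape of projects.forecasted_effort, e.g. {"2026-02": 40, "2026-03": 60}.
type ForecastedEffort = Record<string, number>;

// Sum forecasted hours for months in the inclusive range [from, to].
// String comparison is safe because YYYY-MM is fixed-width and zero-padded.
function forecastTotal(effort: ForecastedEffort, from: string, to: string): number {
  let total = 0;
  for (const month of Object.keys(effort)) {
    if (month >= from && month <= to) {
      total += effort[month];
    }
  }
  return total;
}
```

The same comparison works across year boundaries, since `"2025-12" < "2026-01"` holds as a string comparison.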
+
+**Indexes:**
+- `allocations`: composite index on (project_id, month), (team_member_id, month)
+- `actuals`: composite index on (project_id, month), (team_member_id, month)
+- `team_members`: index on (role_id, active)
+- `projects`: index on (status_id, type_id), unique on (code)
+
+---
+
+### Decision 10: API Design Pattern
+
+**Choice:** REST API with Laravel API Resources for consistent JSON responses.
+
+**Rationale:**
+- REST is simpler than GraphQL for this use case
+- Laravel API Resources provide consistent transformation layer
+- Standard HTTP verbs (GET, POST, PUT, DELETE)
+- Easy to document with Laravel Scribe (generates HTML docs and an OpenAPI spec)
+
+**Endpoint Structure:**
+```
+/api/auth/login (POST)
+/api/auth/logout (POST)
+/api/auth/refresh (POST)
+
+/api/team-members (GET, POST)
+/api/team-members/:id (GET, PUT, DELETE)
+
+/api/projects (GET, POST)
+/api/projects/:id (GET, PUT, DELETE)
+
+/api/allocations?month=YYYY-MM (GET, POST)
+/api/allocations/bulk (POST)
+/api/allocations/:id (PUT, DELETE)
+
+/api/actuals?month=YYYY-MM (GET, POST)
+/api/actuals/bulk (POST)
+/api/actuals/:id (PUT)
+
+/api/reports/forecast?from=YYYY-MM&to=YYYY-MM (GET)
+/api/reports/utilization?month=YYYY-MM (GET)
+/api/reports/cost?month=YYYY-MM (GET)
+/api/reports/allocation?month=YYYY-MM (GET)
+/api/reports/variance?month=YYYY-MM (GET)
+
+/api/master-data/roles (GET)
+/api/master-data/statuses (GET)
+/api/master-data/types (GET)
+```
+
+**Response Format (Laravel API Resources):**
+```json
+{
+  "data": { /* resource */ },
+  "meta": { /* pagination, counts */ },
+  "links": { /* HATEOAS links */ }
+}
+```
+
+**Error Format:**
+```json
+{
+  "message": "Validation failed",
+  "errors": {
+    "allocated_hours": ["Must be greater than 0"]
+  }
+}
+```
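On the frontend, the two formats above could be modelled with types along these lines. This is a sketch only: the type names, the `isApiError` guard, and the `fieldErrors` helper are assumptions, and the guard relies on success bodies always carrying a `data` key as shown in the response format.

```typescript
// Success envelope produced by Laravel API Resources.
interface ApiSuccess<T> {
  data: T;
  meta?: Record<string, unknown>;
  links?: Record<string, unknown>;
}

// Validation or other error body.
interface ApiError {
  message: string;
  errors?: Record<string, string[]>;
}

// Treat a body as an error when it has a string `message` and no `data` key.
function isApiError(body: unknown): body is ApiError {
  return (
    typeof body === "object" &&
    body !== null &&
    !("data" in body) &&
    typeof (body as { message?: unknown }).message === "string"
  );
}

// Flatten per-field messages into "field: message" lines for display.
function fieldErrors(err: ApiError): string[] {
  const out: string[] = [];
  const errors = err.errors || {};
  for (const field of Object.keys(errors)) {
    for (const message of errors[field]) {
      out.push(`${field}: ${message}`);
    }
  }
  return out;
}
```

A caller would typically check `isApiError(body)` after parsing the JSON and only then read `data` from the success envelope.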
+
+---
+
+### Decision 11: Testing Strategy
+
+**Choice:** >70% code coverage with unit + E2E + regression tests on every change.
+
+**Test Layers:**
+- **Backend Unit (PHPUnit)**: Model methods, service classes, utilities
+- **Backend Feature (Pest)**: API endpoints, authentication, authorization
+- **Frontend Unit (Vitest)**: Svelte components, stores, utilities
+- **E2E (Playwright)**: Critical user flows (login → allocate → view reports)
+
+**Coverage Targets:**
+- Backend: >80% (easier to test server-side logic)
+- Frontend: >70% (UI testing is harder)
+- Overall: >70% (enforced in `/opsx-verify`)
+
+**Test Data Strategy:**
+- Database seeders for test data (Laravel seeders)
+- Factories for model generation (Laravel factories)
+- Test fixtures for E2E tests (Playwright fixtures)
+
+**Regression Test Approach (MVP):**
+- Run full test suite on every change
+- E2E tests cover happy paths + critical error cases
+- Phase 2: Issue-driven loop (E2E failure → create GitHub issue → fix → retest)
+
+**Implementation:**
+- Pre-commit hooks: Run linters + unit tests
+- CI/CD: Run full suite (unit + E2E) before merge
+- `openspec verify` command: Check coverage, run tests, lint
+
+---
+
+## Risks / Trade-offs
+
+### Risk: SvelteKit Learning Curve
+**Impact:** The owner and associate are unfamiliar with Svelte, which may slow initial development.
+
+**Mitigation:**
+- SvelteKit has excellent documentation
+- Simpler than React/Vue (less boilerplate)
+- TanStack Table is framework-agnostic (owner unfamiliar but AI will guide)
+- Start with simple components, iterate
+
+---
+
+### Risk: Two-Container Complexity
+**Impact:** More moving parts than single monolith, deployment overhead.
+
+**Mitigation:**
+- Docker Compose handles orchestration
+- Code-mounted volumes for hot reload (no rebuild needed)
+- Owner comfortable with Docker
+- Cleaner architecture worth the overhead
+
+---
+
+### Risk: Over-Allocation Soft Validation
+**Impact:** Managers could ignore RED flags and over-allocate anyway.
+
+**Mitigation:**
+- Visual warnings are prominent (RED color, tooltip with exact impact)
+- Reports show over-allocation clearly
+- Phase 2: Add email notifications when allocations exceed threshold
+- Manager discipline assumed (this is their job)
+
+---
+
+### Risk: Manual Time Entry Accuracy
+**Impact:** Team members may forget to log hours or log inaccurately.
+
+**Mitigation:**
+- Utilization reports highlight under-logging (planned > actual consistently)
+- Manager can follow up with team members showing low actuals
+- Phase 2: Integrate with time-tracking tools (Jira, Harvest, Toggl)
+- Incremental weekly updates reduce forgetting
+
+---
+
+### Risk: Cache Invalidation Bugs
+**Impact:** Stale data shown if cache invalidation logic fails.
+
+**Mitigation:**
+- Comprehensive test coverage for cache invalidation logic
+- Redis TTLs bound staleness (at most 1 hour for allocations, 15 minutes for reports, 24 hours for master data)
+- Cache keys are scoped (project, month, person)
+- Invalidation triggered on all mutations (create/update/delete)
+- Manual cache flush available for admins
+
+---
+
+### Risk: JWT Token Security
+**Impact:** Stolen tokens could allow unauthorized access.
+
+**Mitigation:**
+- Refresh token rotation (one-time use)
+- Short access token TTL (60 minutes)
+- Refresh tokens stored in Redis (revocable)
+- HTTPS enforced via Nginx Proxy Manager
+- Logout invalidates refresh token
+- Token includes user role for authorization checks
+
+---
+
+### Trade-off: No Real-Time Notifications
+**Benefit:** Saves 6 hours of dev time, keeps MVP scope tight.
+
+**Cost:** Users must manually refresh to see allocation changes.
+
+**Justification:** Allocations are monthly planning activity, not time-critical. Acceptable for MVP.
+
+---
+
+### Trade-off: No PDF/CSV Exports
+**Benefit:** Faster MVP, avoids report formatting complexity.
+
+**Cost:** Users cannot export reports for offline viewing or stakeholder sharing.
+
+**Justification:** On-screen reports are primary value, exports are nice-to-have for Phase 2.
+
+---
+
+### Trade-off: Manual Time Entry
+**Benefit:** Avoids vendor lock-in, no integration complexity.
+
+**Cost:** Team members must manually enter hours monthly.
+
+**Justification:** Monthly aggregate is low overhead (~5 minutes per person per month).
+
+---
+
+## Migration Plan
+
+**Deployment Steps:**
+
+1. **Initial Setup:**
+   - Run `docker-compose up` (creates 4 containers: SvelteKit, Laravel, PostgreSQL, Redis)
+   - Laravel migrations create database schema
+   - Database seeders populate master data (roles, statuses, types)
+   - Create superuser account via Laravel seeder
+
+2. **Data Import (Optional):**
+   - If team has historical spreadsheet data, create import script
+   - Import team members (name, role, hourly rate)
+   - Import active projects (code, title, approved estimate)
+   - Do NOT import historical allocations (start fresh)
+
+3. **User Onboarding:**
+   - Train managers on allocation workflow (1 hour session)
+   - Demo: capacity planning → project setup → allocation → reports
+   - Provide Quick Start guide (Markdown doc)
+
+4. **Go-Live:**
+   - Managers create February 2026 capacity plan (holidays, PTO, availability)
+   - Managers allocate resources for February
+   - Team members log February actuals mid-month (incremental updates)
+   - Month-end: Review utilization reports, adjust March allocations
+
+**Rollback Strategy:**
+- MVP is greenfield (no data migration to revert)
+- If critical bug discovered, roll back to previous container image
+- Docker Compose down/up with previous image tag
+- PostgreSQL data persisted in volume (safe across container restarts)
+- Zero-downtime rollback: Blue/green deployment (Phase 2, not needed for MVP)
+
+**Monitoring (Phase 2):**
+- Application logs (Laravel log files)
+- Database performance (PostgreSQL slow query log)
+- Cache hit rate (Redis INFO stats)
+- API response times (Laravel Telescope or custom middleware)
+
+---
+
+## Open Questions
+
+### Question 1: Hourly Rate Visibility
+Should developers see their own hourly rate, or only managers/top brass?
+
+**Options:**
+- A) Developers can see their own rate (transparency)
+- B) Developers cannot see rates (only allocations)
+
+**Recommendation:** A (transparency fosters trust, rate is not secret in most orgs)
+
+**Decision:** To be finalized with owner before implementation.
+
+---
+
+### Question 2: Hours Per Day Configuration
+Is "1.0 availability = 8 hours" configured globally, per team member, or per project?
+
+**Options:**
+- A) Global setting (e.g., 1.0 = 8 hours for everyone)
+- B) Per-team member (some people work 6-hour days)
+- C) Per-project (different billing rates for different project types)
+
+**Recommendation:** A (global setting, simplest for MVP)
+
+**Decision:** Owner mentioned "configurable per project" but likely meant per team. Clarify.
+
+---
+
+### Question 3: PTO Approval Workflow
+Is PTO auto-approved, or does it require manager approval?
+
+**Options:**
+- A) Auto-approved (capacity reduced immediately)
+- B) Requires approval (pending state until manager approves)
+
+**Recommendation:** B (manager approval, prevents abuse)
+
+**Decision:** Owner likely expects approval workflow. Confirm.
+
+---
+
+### Question 4: Support Projects in Revenue Forecast
+Should "Support" type projects appear in revenue forecasts?
+
+**Options:**
+- A) Exclude from revenue (they're ongoing ops, not billable)
+- B) Include in revenue (still billable internally)
+
+**Recommendation:** Ask owner's preference (may vary by org)
+
+**Decision:** To be confirmed during implementation.
+
+---
+
+### Question 5: Allocation Tolerance Threshold
+What's the tolerance for "within estimate" (GREEN indicator)?
+
+**Current assumption:** ±5% (e.g., 100-hour project allocated 95-105 hours is GREEN)
+
+**Confirm:** Is 5% the right threshold, or should it be configurable?
+
+**Decision:** Start with 5%, make configurable in Phase 2 if needed.
+
+---
+
+**End of Design Document**
+
+**Next Steps:**
+1. Review open questions with owner
+2. Finalize database schema (ERD diagram)
+3. Create tasks.md (implementation checklist)
+4. Begin Sprint 1: Docker Compose setup + database migrations