headroom/openspec/changes/headroom-foundation/design.md at main

Files

Santhosh Janardhanan 04022b7e0b Headroom - Foundation

2026-02-17 02:10:23 -05:00

19 KiB

Raw Permalink Blame History

Context

Headroom is a greenfield web application to replace manual spreadsheet-based capacity planning for engineering teams. The current spreadsheet approach is error-prone, lacks validation, provides no audit trail, and wastes manager time (2+ hours monthly on allocation work).

Current State:

No existing system to migrate from
Team uses spreadsheets for capacity planning and resource allocation
No automation, no validation, no visibility into team headroom

Constraints:

MVP must be production-ready with >70% test coverage
Must run containerized (Docker Compose) from day 1
Existing Nginx Proxy Manager in environment (no Caddy/Traefik)
Must support 10-15 developers across 10-12 concurrent projects
Monthly capacity planning cycle (not real-time)

Stakeholders:

Engineering managers (primary users)
Team members (log hours, view allocations)
Top brass (view reports only)
Superuser/admin (system configuration)

Goals / Non-Goals

Goals:

Automate capacity calculations (holidays, PTO, availability)
Validate allocations against capacity and approved estimates
Prevent billing errors (over/under-allocation detection)
Provide clear visibility into team headroom
Track planned vs actual hours for utilization analysis
Generate 5 core reports with customizable filters
Reduce manager allocation time from 2+ hours to <30 minutes per month
Enforce role-based access control (4 personas)
Maintain >70% test coverage with comprehensive E2E tests

Non-Goals:

Real-time notifications (deferred to Phase 2, polling is acceptable for MVP)
PDF/CSV report exports (deferred to Phase 2, on-screen only for MVP)
Time-tracking tool integration (manual entry only for MVP)
Multi-tenancy (single-tenant MVP, add tenant_id later)
Mobile app (desktop web only)
AI-powered forecasting (rule-based validation sufficient)

Decisions

Decision 1: Two-Container Architecture (Laravel API + SvelteKit Frontend)

Choice: Separate Laravel API backend and SvelteKit frontend in different containers.

Rationale:

Clean separation of concerns (API vs UI)
Easier to scale independently in future
SvelteKit is modern and great for dashboards, worth learning curve
Laravel provides robust API development (owner has PHP background)

Alternatives Considered:

Laravel + Blade templates: Rejected (less interactive UI, harder for dashboards)
Laravel + Vue (Inertia): Rejected (owner preferred Svelte over Vue)
SvelteKit full-stack: Rejected (owner has PHP background, prefer Laravel for API)

Implementation:

Frontend: SvelteKit (port 5173), Tailwind CSS + DaisyUI, Recharts, TanStack Table
Backend: Laravel 12 (latest) (port 3000), PostgreSQL (latest), Redis (latest)
Communication: REST API with Laravel API Resources for consistent JSON
Reverse proxy: Existing Nginx Proxy Manager routes /api/* → Laravel, /* → SvelteKit

Decision 2: PostgreSQL from Day 1

Choice: Use PostgreSQL in production and development (no SQLite).

Rationale:

Avoid migration pain later (SQLite → PostgreSQL is error-prone)
Production-grade features (JSON operators, full-text search, advanced indexing)
Better for reporting queries (complex aggregations, window functions)
Docker volume mount preserves portability

Alternatives Considered:

SQLite for local dev, PostgreSQL for prod: Rejected (migration pain, feature parity issues)

Implementation:

PostgreSQL (latest, Alpine container)
Volume-mounted to ./data/postgres for backup/portability
Migrations from day 1 (Laravel migrations)
UUIDs for primary keys (prevents ID enumeration, easier distributed systems later)

Decision 3: Redis Caching from Day 1

Choice: Implement query and response caching with Redis from the start.

Rationale:

Owner insisted: "No need to sweat on refactoring it everywhere"
Prevents technical debt accumulation
Expensive queries (capacity calculations, reports) benefit immediately
Easy automatic cache invalidation with Laravel

Alternatives Considered:

Defer caching to Phase 2: Rejected (owner's preference for avoiding future refactoring)

Implementation:

Redis (latest, Alpine container)
Cache keys pattern: allocations:month:{YYYY-MM}, reports:forecast:{from}:{to}:{hash}
TTL: 1 hour (allocations), 15 min (reports), 24 hours (master data)
Automatic invalidation on mutations (create/update/delete triggers cache flush)
Laravel cache facade for consistency

Decision 4: JWT Authentication (Token-Based)

Choice: JWT tokens instead of session-based authentication.

Rationale:

Stateless (better for API-first architecture)
Suitable for SPA frontend
Easier to add mobile app later (future-proofing)
Industry standard for REST APIs

Alternatives Considered:

Laravel sessions: Rejected (owner preferred JWT for future mobile support)

Implementation:

tymon/jwt-auth package
Access token: 60 minute TTL
Refresh token: 7 day TTL (stored in Redis, one-time use with rotation)
Token claims: user UUID, role, permissions array
Refresh endpoint rotates tokens on each use

Decision 5: SvelteKit Frontend Stack

Choice: SvelteKit + Tailwind CSS + DaisyUI + Recharts + TanStack Table + Superforms + Zod

Rationale:

DaisyUI: Fast development, opinionated but speeds up dashboard creation
Recharts: Good balance of power and simplicity for charts
TanStack Table: Industry standard for data grids, powerful filtering/sorting
Superforms + Zod: Type-safe validation, seamless SvelteKit Form Actions integration

Alternatives Considered:

Shadcn/ui: Rejected (DaisyUI faster for MVP)
Chart.js: Rejected (Recharts more powerful)
Custom table component: Rejected (TanStack is proven, owner unfamiliar but trusts recommendation)

Implementation:

Svelte stores for minimal UI state only (filters, modals)
Fetch API for HTTP (no Axios, native is sufficient)
Vitest for unit tests, Playwright for E2E tests

Decision 6: Allocation Validation Strategy

Choice: Soft validation with visual indicators (GREEN/YELLOW/RED), not hard blocks.

Rationale:

Managers sometimes need flexibility to over-allocate temporarily
Hard blocks would frustrate workflow
Visual warnings catch errors while allowing override
"This money is my salary!" — both over and under-allocation must be flagged

Validation Rules:

GREEN: Allocation = Approved estimate (100%, within ±5% tolerance)
YELLOW: Under-allocation (<95% of approved estimate) — will undercharge
RED: Over-allocation (>105% of approved estimate) — will overcharge
Person capacity: YELLOW warning at >100%, RED alert at >120%

Implementation:

API returns validation status with each allocation response
Frontend displays color-coded indicators in allocation matrix
Tooltip shows exact variance ("Over by 20 hours, will overcharge client")

Decision 7: Monthly Aggregate Actuals (Not Daily)

Choice: Track actual hours as monthly totals, allowing incremental weekly updates.

Rationale:

Monthly planning cycle doesn't require daily granularity
Simplifies data model and UI
Team members can update weekly and system accumulates
No time-tracking integration for MVP (manual entry)

Alternatives Considered:

Daily time logging: Rejected (over-engineering for MVP, adds complexity)
Weekly buckets: Rejected (monthly is sufficient given monthly planning cycle)

Implementation:

Actuals table: project_id, team_member_id, month (YYYY-MM), hours_logged
UI allows replacing or incrementing monthly total
Utilization calculated as: (Actual hours / Capacity) × 100%

Decision 8: Defer Real-Time Notifications to Phase 2

Choice: No WebSocket notifications in MVP, users refresh to see changes.

Rationale:

Allocations are planned monthly, not time-critical
WebSocket setup adds 6 hours of dev time
Polling every 30s is acceptable alternative but also deferred
Focus MVP on core allocation/reporting functionality

Alternatives Considered:

WebSocket + 1 notification PoC (6 hours): Rejected (not critical for monthly planning)
Polling-based notifications (2 hours): Rejected (also deferred, users can refresh)

Implementation (Phase 2):

Laravel Broadcasting with Redis adapter
SvelteKit WebSocket client
Events: AllocationCreated, AllocationUpdated, EstimateApproved

Decision 9: Database Schema Design

Choice: Normalized schema with master data tables, JSON for forecasted effort, UUIDs for primary keys.

Key Tables:

team_members: id (UUID), name, role_id (FK), hourly_rate, active
projects: id (UUID), code (unique), title, status_id (FK), type_id (FK), approved_estimate, forecasted_effort (JSON)
allocations: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), allocated_hours
actuals: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), hours_logged
roles, project_statuses, project_types: Master data tables
holidays, ptos: Calendar data

Design Rationale:

UUIDs: Prevent ID enumeration attacks, easier distributed systems later
Normalized master data: Roles/statuses/types in separate tables for dynamic configuration
Month as string (YYYY-MM): Simplifies queries, index-friendly, human-readable
JSON for forecasted effort: Flexible structure {"2026-02": 40, "2026-03": 60}, easy to extend
Soft deletes for projects: deleted_at timestamp for audit trail
Active flag for team members: Preserve historical allocations when person leaves

Indexes:

allocations: composite index on (project_id, month), (team_member_id, month)
actuals: composite index on (project_id, month), (team_member_id, month)
team_members: index on (role_id, active)
projects: index on (status_id, type_id), unique on (code)

Decision 10: API Design Pattern

Choice: REST API with Laravel API Resources for consistent JSON responses.

Rationale:

REST is simpler than GraphQL for this use case
Laravel API Resources provide consistent transformation layer
Standard HTTP verbs (GET, POST, PUT, DELETE)
Easy to document with Laravel Scribe (SwaggerUI)

Endpoint Structure:

/api/auth/login (POST)
/api/auth/logout (POST)
/api/auth/refresh (POST)

/api/team-members (GET, POST)
/api/team-members/:id (GET, PUT, DELETE)

/api/projects (GET, POST)
/api/projects/:id (GET, PUT, DELETE)

/api/allocations?month=YYYY-MM (GET, POST)
/api/allocations/bulk (POST)
/api/allocations/:id (PUT, DELETE)

/api/actuals?month=YYYY-MM (GET, POST)
/api/actuals/bulk (POST)
/api/actuals/:id (PUT)

/api/reports/forecast?from=YYYY-MM&to=YYYY-MM (GET)
/api/reports/utilization?month=YYYY-MM (GET)
/api/reports/cost?month=YYYY-MM (GET)
/api/reports/allocation?month=YYYY-MM (GET)
/api/reports/variance?month=YYYY-MM (GET)

/api/master-data/roles (GET)
/api/master-data/statuses (GET)
/api/master-data/types (GET)

Response Format (Laravel API Resources):

{
  "data": { /* resource */ },
  "meta": { /* pagination, counts */ },
  "links": { /* HATEOAS links */ }
}

Error Format:

{
  "message": "Validation failed",
  "errors": {
    "allocated_hours": ["Must be greater than 0"]
  }
}

Decision 11: Testing Strategy

Choice: >70% code coverage with unit + E2E + regression tests on every change.

Test Layers:

Backend Unit (PHPUnit): Model methods, service classes, utilities
Backend Feature (Pest): API endpoints, authentication, authorization
Frontend Unit (Vitest): Svelte components, stores, utilities
E2E (Playwright): Critical user flows (login → allocate → view reports)

Coverage Targets:

Backend: >80% (easier to test server-side logic)
Frontend: >70% (UI testing is harder)
Overall: >70% (enforced in /opsx-verify)

Test Data Strategy:

Database seeders for test data (Laravel seeders)
Factories for model generation (Laravel factories)
Test fixtures for E2E tests (Playwright fixtures)

Regression Test Approach (MVP):

Run full test suite on every change
E2E tests cover happy paths + critical error cases
Phase 2: Issue-driven loop (E2E failure → create GitHub issue → fix → retest)

Implementation:

Pre-commit hooks: Run linters + unit tests
CI/CD: Run full suite (unit + E2E) before merge
openspec verify command: Check coverage, run tests, lint

Risks / Trade-offs

Risk: SvelteKit Learning Curve

Impact: Owner and associate unfamiliar with Svelte, may slow initial development.

Mitigation:

SvelteKit has excellent documentation
Simpler than React/Vue (less boilerplate)
TanStack Table is framework-agnostic (owner unfamiliar but AI will guide)
Start with simple components, iterate

Risk: Two-Container Complexity

Impact: More moving parts than single monolith, deployment overhead.

Mitigation:

Docker Compose handles orchestration
Code-mounted volumes for hot reload (no rebuild needed)
Owner comfortable with Docker
Cleaner architecture worth the overhead

Risk: Over-Allocation Soft Validation

Impact: Managers could ignore RED flags and over-allocate anyway.

Mitigation:

Visual warnings are prominent (RED color, tooltip with exact impact)
Reports show over-allocation clearly
Phase 2: Add email notifications when allocations exceed threshold
Manager discipline assumed (this is their job)

Risk: Manual Time Entry Accuracy

Impact: Team members may forget to log hours or log inaccurately.

Mitigation:

Utilization reports highlight under-logging (planned > actual consistently)
Manager can follow up with team members showing low actuals
Phase 2: Integrate with time-tracking tools (Jira, Harvest, Toggl)
Incremental weekly updates reduce forgetting

Risk: Cache Invalidation Bugs

Impact: Stale data shown if cache invalidation logic fails.

Mitigation:

Comprehensive test coverage for cache invalidation logic
Redis TTL ensures eventual consistency (max 1 hour stale)
Cache keys are scoped (project, month, person)
Invalidation triggered on all mutations (create/update/delete)
Manual cache flush available for admins

Risk: JWT Token Security

Impact: Stolen tokens could allow unauthorized access.

Mitigation:

Refresh token rotation (one-time use)
Short access token TTL (60 minutes)
Refresh tokens stored in Redis (revocable)
HTTPS enforced via Nginx Proxy Manager
Logout invalidates refresh token
Token includes user role for authorization checks

Trade-off: No Real-Time Notifications

Benefit: Saves 6 hours of dev time, keeps MVP scope tight.

Cost: Users must manually refresh to see allocation changes.

Justification: Allocations are monthly planning activity, not time-critical. Acceptable for MVP.

Trade-off: No PDF/CSV Exports

Benefit: Faster MVP, avoids report formatting complexity.

Cost: Users cannot export reports for offline viewing or stakeholder sharing.

Justification: On-screen reports are primary value, exports are nice-to-have for Phase 2.

Trade-off: Manual Time Entry

Benefit: Avoids vendor lock-in, no integration complexity.

Cost: Team members must manually enter hours monthly.

Justification: Monthly aggregate is low overhead (~5 minutes per person per month).

Migration Plan

Deployment Steps:

Initial Setup:
- Run docker-compose up (creates 4 containers)
- Laravel migrations create database schema
- Database seeders populate master data (roles, statuses, types)
- Create superuser account via Laravel seeder
Data Import (Optional):
- If team has historical spreadsheet data, create import script
- Import team members (name, role, hourly rate)
- Import active projects (code, title, approved estimate)
- Do NOT import historical allocations (start fresh)
User Onboarding:
- Train managers on allocation workflow (1 hour session)
- Demo: capacity planning → project setup → allocation → reports
- Provide Quick Start guide (Markdown doc)
Go-Live:
- Managers create February 2026 capacity plan (holidays, PTO, availability)
- Managers allocate resources for February
- Team members log February actuals mid-month (incremental updates)
- Month-end: Review utilization reports, adjust March allocations

Rollback Strategy:

MVP is greenfield (no data migration to revert)
If critical bug discovered, roll back to previous container image
Docker Compose down/up with previous image tag
PostgreSQL data persisted in volume (safe across container restarts)
Zero-downtime rollback: Blue/green deployment (Phase 2, not needed for MVP)

Monitoring (Phase 2):

Application logs (Laravel log files)
Database performance (PostgreSQL slow query log)
Cache hit rate (Redis INFO stats)
API response times (Laravel Telescope or custom middleware)

Open Questions

Question 1: Hourly Rate Visibility

Should developers see their own hourly rate, or only managers/top brass?

Options:

A) Developers can see their own rate (transparency)
B) Developers cannot see rates (only allocations)

Recommendation: A (transparency fosters trust, rate is not secret in most orgs)

Decision: To be finalized with owner before implementation.

Question 2: Hours Per Day Configuration

Is "1.0 availability = 8 hours" globally configured, or per-project?

Options:

A) Global setting (e.g., 1.0 = 8 hours for everyone)
B) Per-team member (some people work 6-hour days)
C) Per-project (different billing rates for different project types)

Recommendation: A (global setting, simplest for MVP)

Decision: Owner mentioned "configurable per project" but likely meant per team. Clarify.

Question 3: PTO Approval Workflow

Is PTO auto-approved, or does it require manager approval?

Options:

A) Auto-approved (capacity reduced immediately)
B) Requires approval (pending state until manager approves)

Recommendation: B (manager approval, prevents abuse)

Decision: Owner likely expects approval workflow. Confirm.

Question 4: Support Projects in Revenue Forecast

Should "Support" type projects appear in revenue forecasts?

Options:

A) Exclude from revenue (they're ongoing ops, not billable)
B) Include in revenue (still billable internally)

Recommendation: Ask owner's preference (may vary by org)

Decision: To be confirmed during implementation.

Question 5: Allocation Tolerance Threshold

What's the tolerance for "within estimate" (GREEN indicator)?

Current assumption: ±5% (e.g., 100-hour project allocated 95-105 hours is GREEN)

Confirm: Is 5% the right threshold, or should it be configurable?

Decision: Start with 5%, make configurable in Phase 2 if needed.

End of Design Document

Next Steps:

Review open questions with owner
Finalize database schema (ERD diagram)
Create tasks.md (implementation checklist)
Begin Sprint 1: Docker Compose setup + database migrations

19 KiB Raw Permalink Blame History Unescape Escape

Context

Goals / Non-Goals

Decisions

Decision 1: Two-Container Architecture (Laravel API + SvelteKit Frontend)

Decision 2: PostgreSQL from Day 1

Decision 3: Redis Caching from Day 1

Decision 4: JWT Authentication (Token-Based)

Decision 5: SvelteKit Frontend Stack

Decision 6: Allocation Validation Strategy

Decision 7: Monthly Aggregate Actuals (Not Daily)

Decision 8: Defer Real-Time Notifications to Phase 2

Decision 9: Database Schema Design

Decision 10: API Design Pattern

Decision 11: Testing Strategy

Risks / Trade-offs

Risk: SvelteKit Learning Curve

Risk: Two-Container Complexity

Risk: Over-Allocation Soft Validation

Risk: Manual Time Entry Accuracy

Risk: Cache Invalidation Bugs

Risk: JWT Token Security

Trade-off: No Real-Time Notifications

Trade-off: No PDF/CSV Exports

Trade-off: Manual Time Entry

Migration Plan

Open Questions

Question 1: Hourly Rate Visibility

Question 2: Hours Per Day Configuration

Question 3: PTO Approval Workflow

Question 4: Support Projects in Revenue Forecast

Question 5: Allocation Tolerance Threshold

19 KiB

Raw Permalink Blame History