Files

19 KiB
Raw Permalink Blame History

Context

Headroom is a greenfield web application to replace manual spreadsheet-based capacity planning for engineering teams. The current spreadsheet approach is error-prone, lacks validation, provides no audit trail, and wastes manager time (2+ hours monthly on allocation work).

Current State:

  • No existing system to migrate from
  • Team uses spreadsheets for capacity planning and resource allocation
  • No automation, no validation, no visibility into team headroom

Constraints:

  • MVP must be production-ready with >70% test coverage
  • Must run containerized (Docker Compose) from day 1
  • Existing Nginx Proxy Manager in environment (no Caddy/Traefik)
  • Must support 10-15 developers across 10-12 concurrent projects
  • Monthly capacity planning cycle (not real-time)

Stakeholders:

  • Engineering managers (primary users)
  • Team members (log hours, view allocations)
  • Top brass (view reports only)
  • Superuser/admin (system configuration)

Goals / Non-Goals

Goals:

  • Automate capacity calculations (holidays, PTO, availability)
  • Validate allocations against capacity and approved estimates
  • Prevent billing errors (over/under-allocation detection)
  • Provide clear visibility into team headroom
  • Track planned vs actual hours for utilization analysis
  • Generate 5 core reports with customizable filters
  • Reduce manager allocation time from 2+ hours to <30 minutes per month
  • Enforce role-based access control (4 personas)
  • Maintain >70% test coverage with comprehensive E2E tests

Non-Goals:

  • Real-time notifications (deferred to Phase 2, polling is acceptable for MVP)
  • PDF/CSV report exports (deferred to Phase 2, on-screen only for MVP)
  • Time-tracking tool integration (manual entry only for MVP)
  • Multi-tenancy (single-tenant MVP, add tenant_id later)
  • Mobile app (desktop web only)
  • AI-powered forecasting (rule-based validation sufficient)

Decisions

Decision 1: Two-Container Architecture (Laravel API + SvelteKit Frontend)

Choice: Separate Laravel API backend and SvelteKit frontend in different containers.

Rationale:

  • Clean separation of concerns (API vs UI)
  • Easier to scale independently in future
  • SvelteKit is modern and great for dashboards, worth learning curve
  • Laravel provides robust API development (owner has PHP background)

Alternatives Considered:

  • Laravel + Blade templates: Rejected (less interactive UI, harder for dashboards)
  • Laravel + Vue (Inertia): Rejected (owner preferred Svelte over Vue)
  • SvelteKit full-stack: Rejected (owner has PHP background, prefer Laravel for API)

Implementation:

  • Frontend: SvelteKit (port 5173), Tailwind CSS + DaisyUI, Recharts, TanStack Table
  • Backend: Laravel 12 (latest) (port 3000), PostgreSQL (latest), Redis (latest)
  • Communication: REST API with Laravel API Resources for consistent JSON
  • Reverse proxy: Existing Nginx Proxy Manager routes /api/* → Laravel, /* → SvelteKit

Decision 2: PostgreSQL from Day 1

Choice: Use PostgreSQL in production and development (no SQLite).

Rationale:

  • Avoid migration pain later (SQLite → PostgreSQL is error-prone)
  • Production-grade features (JSON operators, full-text search, advanced indexing)
  • Better for reporting queries (complex aggregations, window functions)
  • Docker volume mount preserves portability

Alternatives Considered:

  • SQLite for local dev, PostgreSQL for prod: Rejected (migration pain, feature parity issues)

Implementation:

  • PostgreSQL (latest, Alpine container)
  • Volume-mounted to ./data/postgres for backup/portability
  • Migrations from day 1 (Laravel migrations)
  • UUIDs for primary keys (prevents ID enumeration, easier distributed systems later)

Decision 3: Redis Caching from Day 1

Choice: Implement query and response caching with Redis from the start.

Rationale:

  • Owner insisted: "No need to sweat on refactoring it everywhere"
  • Prevents technical debt accumulation
  • Expensive queries (capacity calculations, reports) benefit immediately
  • Easy automatic cache invalidation with Laravel

Alternatives Considered:

  • Defer caching to Phase 2: Rejected (owner's preference for avoiding future refactoring)

Implementation:

  • Redis (latest, Alpine container)
  • Cache keys pattern: allocations:month:{YYYY-MM}, reports:forecast:{from}:{to}:{hash}
  • TTL: 1 hour (allocations), 15 min (reports), 24 hours (master data)
  • Automatic invalidation on mutations (create/update/delete triggers cache flush)
  • Laravel cache facade for consistency

Decision 4: JWT Authentication (Token-Based)

Choice: JWT tokens instead of session-based authentication.

Rationale:

  • Stateless (better for API-first architecture)
  • Suitable for SPA frontend
  • Easier to add mobile app later (future-proofing)
  • Industry standard for REST APIs

Alternatives Considered:

  • Laravel sessions: Rejected (owner preferred JWT for future mobile support)

Implementation:

  • tymon/jwt-auth package
  • Access token: 60 minute TTL
  • Refresh token: 7 day TTL (stored in Redis, one-time use with rotation)
  • Token claims: user UUID, role, permissions array
  • Refresh endpoint rotates tokens on each use

Decision 5: SvelteKit Frontend Stack

Choice: SvelteKit + Tailwind CSS + DaisyUI + Recharts + TanStack Table + Superforms + Zod

Rationale:

  • DaisyUI: Fast development, opinionated but speeds up dashboard creation
  • Recharts: Good balance of power and simplicity for charts
  • TanStack Table: Industry standard for data grids, powerful filtering/sorting
  • Superforms + Zod: Type-safe validation, seamless SvelteKit Form Actions integration

Alternatives Considered:

  • Shadcn/ui: Rejected (DaisyUI faster for MVP)
  • Chart.js: Rejected (Recharts more powerful)
  • Custom table component: Rejected (TanStack is proven, owner unfamiliar but trusts recommendation)

Implementation:

  • Svelte stores for minimal UI state only (filters, modals)
  • Fetch API for HTTP (no Axios, native is sufficient)
  • Vitest for unit tests, Playwright for E2E tests

Decision 6: Allocation Validation Strategy

Choice: Soft validation with visual indicators (GREEN/YELLOW/RED), not hard blocks.

Rationale:

  • Managers sometimes need flexibility to over-allocate temporarily
  • Hard blocks would frustrate workflow
  • Visual warnings catch errors while allowing override
  • "This money is my salary!" — both over and under-allocation must be flagged

Validation Rules:

  • GREEN: Allocation = Approved estimate (100%, within ±5% tolerance)
  • YELLOW: Under-allocation (<95% of approved estimate) — will undercharge
  • RED: Over-allocation (>105% of approved estimate) — will overcharge
  • Person capacity: YELLOW warning at >100%, RED alert at >120%

Implementation:

  • API returns validation status with each allocation response
  • Frontend displays color-coded indicators in allocation matrix
  • Tooltip shows exact variance ("Over by 20 hours, will overcharge client")

Decision 7: Monthly Aggregate Actuals (Not Daily)

Choice: Track actual hours as monthly totals, allowing incremental weekly updates.

Rationale:

  • Monthly planning cycle doesn't require daily granularity
  • Simplifies data model and UI
  • Team members can update weekly and system accumulates
  • No time-tracking integration for MVP (manual entry)

Alternatives Considered:

  • Daily time logging: Rejected (over-engineering for MVP, adds complexity)
  • Weekly buckets: Rejected (monthly is sufficient given monthly planning cycle)

Implementation:

  • Actuals table: project_id, team_member_id, month (YYYY-MM), hours_logged
  • UI allows replacing or incrementing monthly total
  • Utilization calculated as: (Actual hours / Capacity) × 100%

Decision 8: Defer Real-Time Notifications to Phase 2

Choice: No WebSocket notifications in MVP, users refresh to see changes.

Rationale:

  • Allocations are planned monthly, not time-critical
  • WebSocket setup adds 6 hours of dev time
  • Polling every 30s is acceptable alternative but also deferred
  • Focus MVP on core allocation/reporting functionality

Alternatives Considered:

  • WebSocket + 1 notification PoC (6 hours): Rejected (not critical for monthly planning)
  • Polling-based notifications (2 hours): Rejected (also deferred, users can refresh)

Implementation (Phase 2):

  • Laravel Broadcasting with Redis adapter
  • SvelteKit WebSocket client
  • Events: AllocationCreated, AllocationUpdated, EstimateApproved

Decision 9: Database Schema Design

Choice: Normalized schema with master data tables, JSON for forecasted effort, UUIDs for primary keys.

Key Tables:

  • team_members: id (UUID), name, role_id (FK), hourly_rate, active
  • projects: id (UUID), code (unique), title, status_id (FK), type_id (FK), approved_estimate, forecasted_effort (JSON)
  • allocations: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), allocated_hours
  • actuals: id (UUID), project_id (FK), team_member_id (FK), month (YYYY-MM), hours_logged
  • roles, project_statuses, project_types: Master data tables
  • holidays, ptos: Calendar data

Design Rationale:

  • UUIDs: Prevent ID enumeration attacks, easier distributed systems later
  • Normalized master data: Roles/statuses/types in separate tables for dynamic configuration
  • Month as string (YYYY-MM): Simplifies queries, index-friendly, human-readable
  • JSON for forecasted effort: Flexible structure {"2026-02": 40, "2026-03": 60}, easy to extend
  • Soft deletes for projects: deleted_at timestamp for audit trail
  • Active flag for team members: Preserve historical allocations when person leaves

Indexes:

  • allocations: composite index on (project_id, month), (team_member_id, month)
  • actuals: composite index on (project_id, month), (team_member_id, month)
  • team_members: index on (role_id, active)
  • projects: index on (status_id, type_id), unique on (code)

Decision 10: API Design Pattern

Choice: REST API with Laravel API Resources for consistent JSON responses.

Rationale:

  • REST is simpler than GraphQL for this use case
  • Laravel API Resources provide consistent transformation layer
  • Standard HTTP verbs (GET, POST, PUT, DELETE)
  • Easy to document with Laravel Scribe (SwaggerUI)

Endpoint Structure:

/api/auth/login (POST)
/api/auth/logout (POST)
/api/auth/refresh (POST)

/api/team-members (GET, POST)
/api/team-members/:id (GET, PUT, DELETE)

/api/projects (GET, POST)
/api/projects/:id (GET, PUT, DELETE)

/api/allocations?month=YYYY-MM (GET, POST)
/api/allocations/bulk (POST)
/api/allocations/:id (PUT, DELETE)

/api/actuals?month=YYYY-MM (GET, POST)
/api/actuals/bulk (POST)
/api/actuals/:id (PUT)

/api/reports/forecast?from=YYYY-MM&to=YYYY-MM (GET)
/api/reports/utilization?month=YYYY-MM (GET)
/api/reports/cost?month=YYYY-MM (GET)
/api/reports/allocation?month=YYYY-MM (GET)
/api/reports/variance?month=YYYY-MM (GET)

/api/master-data/roles (GET)
/api/master-data/statuses (GET)
/api/master-data/types (GET)

Response Format (Laravel API Resources):

{
  "data": { /* resource */ },
  "meta": { /* pagination, counts */ },
  "links": { /* HATEOAS links */ }
}

Error Format:

{
  "message": "Validation failed",
  "errors": {
    "allocated_hours": ["Must be greater than 0"]
  }
}

Decision 11: Testing Strategy

Choice: >70% code coverage with unit + E2E + regression tests on every change.

Test Layers:

  • Backend Unit (PHPUnit): Model methods, service classes, utilities
  • Backend Feature (Pest): API endpoints, authentication, authorization
  • Frontend Unit (Vitest): Svelte components, stores, utilities
  • E2E (Playwright): Critical user flows (login → allocate → view reports)

Coverage Targets:

  • Backend: >80% (easier to test server-side logic)
  • Frontend: >70% (UI testing is harder)
  • Overall: >70% (enforced in /opsx-verify)

Test Data Strategy:

  • Database seeders for test data (Laravel seeders)
  • Factories for model generation (Laravel factories)
  • Test fixtures for E2E tests (Playwright fixtures)

Regression Test Approach (MVP):

  • Run full test suite on every change
  • E2E tests cover happy paths + critical error cases
  • Phase 2: Issue-driven loop (E2E failure → create GitHub issue → fix → retest)

Implementation:

  • Pre-commit hooks: Run linters + unit tests
  • CI/CD: Run full suite (unit + E2E) before merge
  • openspec verify command: Check coverage, run tests, lint

Risks / Trade-offs

Risk: SvelteKit Learning Curve

Impact: Owner and associate unfamiliar with Svelte, may slow initial development.

Mitigation:

  • SvelteKit has excellent documentation
  • Simpler than React/Vue (less boilerplate)
  • TanStack Table is framework-agnostic (owner unfamiliar but AI will guide)
  • Start with simple components, iterate

Risk: Two-Container Complexity

Impact: More moving parts than single monolith, deployment overhead.

Mitigation:

  • Docker Compose handles orchestration
  • Code-mounted volumes for hot reload (no rebuild needed)
  • Owner comfortable with Docker
  • Cleaner architecture worth the overhead

Risk: Over-Allocation Soft Validation

Impact: Managers could ignore RED flags and over-allocate anyway.

Mitigation:

  • Visual warnings are prominent (RED color, tooltip with exact impact)
  • Reports show over-allocation clearly
  • Phase 2: Add email notifications when allocations exceed threshold
  • Manager discipline assumed (this is their job)

Risk: Manual Time Entry Accuracy

Impact: Team members may forget to log hours or log inaccurately.

Mitigation:

  • Utilization reports highlight under-logging (planned > actual consistently)
  • Manager can follow up with team members showing low actuals
  • Phase 2: Integrate with time-tracking tools (Jira, Harvest, Toggl)
  • Incremental weekly updates reduce forgetting

Risk: Cache Invalidation Bugs

Impact: Stale data shown if cache invalidation logic fails.

Mitigation:

  • Comprehensive test coverage for cache invalidation logic
  • Redis TTL ensures eventual consistency (max 1 hour stale)
  • Cache keys are scoped (project, month, person)
  • Invalidation triggered on all mutations (create/update/delete)
  • Manual cache flush available for admins

Risk: JWT Token Security

Impact: Stolen tokens could allow unauthorized access.

Mitigation:

  • Refresh token rotation (one-time use)
  • Short access token TTL (60 minutes)
  • Refresh tokens stored in Redis (revocable)
  • HTTPS enforced via Nginx Proxy Manager
  • Logout invalidates refresh token
  • Token includes user role for authorization checks

Trade-off: No Real-Time Notifications

Benefit: Saves 6 hours of dev time, keeps MVP scope tight.

Cost: Users must manually refresh to see allocation changes.

Justification: Allocations are monthly planning activity, not time-critical. Acceptable for MVP.


Trade-off: No PDF/CSV Exports

Benefit: Faster MVP, avoids report formatting complexity.

Cost: Users cannot export reports for offline viewing or stakeholder sharing.

Justification: On-screen reports are primary value, exports are nice-to-have for Phase 2.


Trade-off: Manual Time Entry

Benefit: Avoids vendor lock-in, no integration complexity.

Cost: Team members must manually enter hours monthly.

Justification: Monthly aggregate is low overhead (~5 minutes per person per month).


Migration Plan

Deployment Steps:

  1. Initial Setup:

    • Run docker-compose up (creates 4 containers)
    • Laravel migrations create database schema
    • Database seeders populate master data (roles, statuses, types)
    • Create superuser account via Laravel seeder
  2. Data Import (Optional):

    • If team has historical spreadsheet data, create import script
    • Import team members (name, role, hourly rate)
    • Import active projects (code, title, approved estimate)
    • Do NOT import historical allocations (start fresh)
  3. User Onboarding:

    • Train managers on allocation workflow (1 hour session)
    • Demo: capacity planning → project setup → allocation → reports
    • Provide Quick Start guide (Markdown doc)
  4. Go-Live:

    • Managers create February 2026 capacity plan (holidays, PTO, availability)
    • Managers allocate resources for February
    • Team members log February actuals mid-month (incremental updates)
    • Month-end: Review utilization reports, adjust March allocations

Rollback Strategy:

  • MVP is greenfield (no data migration to revert)
  • If critical bug discovered, roll back to previous container image
  • Docker Compose down/up with previous image tag
  • PostgreSQL data persisted in volume (safe across container restarts)
  • Zero-downtime rollback: Blue/green deployment (Phase 2, not needed for MVP)

Monitoring (Phase 2):

  • Application logs (Laravel log files)
  • Database performance (PostgreSQL slow query log)
  • Cache hit rate (Redis INFO stats)
  • API response times (Laravel Telescope or custom middleware)

Open Questions

Question 1: Hourly Rate Visibility

Should developers see their own hourly rate, or only managers/top brass?

Options:

  • A) Developers can see their own rate (transparency)
  • B) Developers cannot see rates (only allocations)

Recommendation: A (transparency fosters trust, rate is not secret in most orgs)

Decision: To be finalized with owner before implementation.


Question 2: Hours Per Day Configuration

Is "1.0 availability = 8 hours" globally configured, or per-project?

Options:

  • A) Global setting (e.g., 1.0 = 8 hours for everyone)
  • B) Per-team member (some people work 6-hour days)
  • C) Per-project (different billing rates for different project types)

Recommendation: A (global setting, simplest for MVP)

Decision: Owner mentioned "configurable per project" but likely meant per team. Clarify.


Question 3: PTO Approval Workflow

Is PTO auto-approved, or does it require manager approval?

Options:

  • A) Auto-approved (capacity reduced immediately)
  • B) Requires approval (pending state until manager approves)

Recommendation: B (manager approval, prevents abuse)

Decision: Owner likely expects approval workflow. Confirm.


Question 4: Support Projects in Revenue Forecast

Should "Support" type projects appear in revenue forecasts?

Options:

  • A) Exclude from revenue (they're ongoing ops, not billable)
  • B) Include in revenue (still billable internally)

Recommendation: Ask owner's preference (may vary by org)

Decision: To be confirmed during implementation.


Question 5: Allocation Tolerance Threshold

What's the tolerance for "within estimate" (GREEN indicator)?

Current assumption: ±5% (e.g., 100-hour project allocated 95-105 hours is GREEN)

Confirm: Is 5% the right threshold, or should it be configurable?

Decision: Start with 5%, make configurable in Phase 2 if needed.


End of Design Document

Next Steps:

  1. Review open questions with owner
  2. Finalize database schema (ERD diagram)
  3. Create tasks.md (implementation checklist)
  4. Begin Sprint 1: Docker Compose setup + database migrations