Add offline merchant category suggestions

2026-03-23 13:28:00 -04:00
parent 12c72ddcad
commit 696d393fca
11 changed files with 352 additions and 31 deletions
--- a/openspec/changes/monthly-expense-tracker-v1/design.md
+++ b/openspec/changes/monthly-expense-tracker-v1/design.md
@@ -1,6 +1,6 @@
 ## Context

-The repository starts with a product plan and OpenSpec configuration but no application code. The first version needs a complete local-first implementation using `Next.js`, `Prisma`, `SQLite`, and `OpenAI`, while keeping scope intentionally narrow: one user, manual data entry, fixed categories, and dashboard-only insights. Month boundaries are based on the local machine timezone, which affects date parsing, monthly aggregation, and paycheck coverage calculations.
+The repository starts with a product plan and OpenSpec configuration but no application code. The first version needs a complete local-first implementation using `Next.js`, `Prisma`, `SQLite`, and a fully offline local LLM runtime, while keeping scope intentionally narrow: one user, manual data entry, fixed categories, merchant-assisted categorization, and dashboard-only insights. Month boundaries are based on the local machine timezone, which affects date parsing, monthly aggregation, and paycheck coverage calculations.

 ## Goals / Non-Goals

@@ -8,12 +8,13 @@ The repository starts with a product plan and OpenSpec configuration but no appl
 - Build a single deployable `Next.js` app with UI views and server routes in one codebase.
 - Persist expenses, paychecks, and generated monthly insights in a local SQLite database managed by Prisma.
 - Centralize monthly aggregation logic so dashboard reads and AI generation use the same numbers.
-- Keep AI integration isolated behind a small service layer that prepares structured monthly context and calls `OpenAI`.
+- Keep AI integration isolated behind a small service layer that prepares structured monthly context and calls a fully offline local inference runtime.
 - Make v1 testable with deterministic validation, aggregation, and safe fallback behavior for sparse data.
+- Add privacy-preserving merchant category suggestion with deterministic merchant mappings before model inference.

 **Non-Goals:**
 - Authentication, multi-user support, bank sync, receipt scanning, background jobs, or email delivery.
-- Automatic categorization, editing data through AI, or free-form custom categories in v1.
+- Fully automatic categorization that skips user review for ambiguous merchants, editing data through AI, or free-form custom categories in v1.
 - Complex financial forecasting beyond simple next-month guidance derived from recent activity.

 ## Decisions
@@ -34,16 +35,25 @@ The repository starts with a product plan and OpenSpec configuration but no appl
 - Rationale: dashboard totals, paycheck coverage, category breakdowns, and AI snapshots must stay consistent across endpoints.
 - Alternative considered: separate logic per route. Rejected because it risks drift between dashboard and insight generation.

+### Use `Ollama` with a local Qwen-class instruct model
+- Rationale: privacy is a primary product requirement, and the target machine can comfortably run a recent local model for lightweight categorization and summary generation.
+- Alternative considered: hosted `OpenAI`. Rejected because it violates the privacy-first goal for personal financial data.
+
 ### Add an AI service boundary with structured prompt input and fallback responses
-- Rationale: the app needs provider isolation, predictable prompt shape, and safe messaging when data is too sparse for useful advice.
-- Alternative considered: calling `OpenAI` directly from a route handler with raw records. Rejected because it couples prompting, aggregation, and transport too tightly.
+- Rationale: the app needs runtime isolation, predictable prompt shape, and safe messaging when local inference is unavailable or data is too sparse for useful advice.
+- Alternative considered: calling the local model directly from a route handler with raw records. Rejected because it couples prompting, aggregation, and transport too tightly.
+
+### Use merchant rules first and local-model fallback second for category suggestion
+- Rationale: most repeated merchants can be categorized deterministically, and more quickly than through model inference, while unknown merchants still benefit from local AI assistance.
+- Alternative considered: model-only categorization. Rejected because it is slower, less predictable, and unnecessary for common merchants.

 ## Risks / Trade-offs

 - [Local timezone handling differs by machine] -> Normalize month calculations around stored local-date strings and test month edges explicitly.
 - [SQLite limits concurrency] -> Acceptable for single-user local-first v1; no mitigation beyond keeping writes simple.
 - [AI output quality varies with sparse or noisy data] -> Add minimum-data fallback logic and keep prompts grounded in structured aggregates.
-- [OpenAI dependency requires API key management] -> Read configuration from environment variables and keep failure messages explicit in the UI/API.
+- [Local model may be unavailable or not yet pulled] -> Detect runtime/model readiness and return explicit offline setup guidance in the UI/API.
+- [Merchant names can be ambiguous] -> Use auto-fill only for known deterministic mappings and require user confirmation for fallback suggestions.

 ## Migration Plan

@@ -51,13 +61,13 @@ The repository starts with a product plan and OpenSpec configuration but no appl
 2. Add the Prisma schema, create the initial SQLite migration, and generate the client.
 3. Implement CRUD routes and UI forms for expenses and paychecks.
 4. Implement dashboard aggregation and month filtering.
-5. Add the AI insight service and persistence for generated monthly insights.
+5. Add the offline AI service, merchant-category suggestion flow, and persistence for generated monthly insights.
 6. Run automated tests, then exercise the main flows in the browser.

 Rollback is straightforward in early development: revert the code change and reset the local SQLite database if schema changes become invalid.

 ## Open Questions

-- Which `OpenAI` model should be the initial default for monthly insight generation?
+- Which exact local Qwen model tag should be the initial default in `Ollama`?
 - Should generated monthly insights overwrite prior insights for the same month or create a historical trail of regenerated summaries?
 - Do we want soft confirmation in the UI before deleting expenses or paychecks, or is immediate deletion acceptable for v1?
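The timezone risk in design.md is mitigated by keying month calculations on stored local-date strings. A minimal sketch of that idea follows, assuming dates are stored as `YYYY-MM-DD` strings; the storage format and the helper names `toLocalDateString`, `monthKeyOf`, and `isInMonth` are illustrative assumptions, not part of this commit.

```typescript
// Assumption: expenses and paychecks store their date as a local "YYYY-MM-DD" string.
const LOCAL_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Format a Date using the machine's local calendar fields (never UTC),
// so an entry made at 23:59 local time stays in the local day and month.
function toLocalDateString(d: Date): string {
  const year = String(d.getFullYear()).padStart(4, "0");
  const month = String(d.getMonth() + 1).padStart(2, "0");
  const day = String(d.getDate()).padStart(2, "0");
  return `${year}-${month}-${day}`;
}

// Derive the "YYYY-MM" month key directly from the stored string.
// No Date parsing happens here, so no timezone shift can occur.
function monthKeyOf(localDate: string): string {
  if (!LOCAL_DATE.test(localDate)) {
    throw new Error(`Expected YYYY-MM-DD, got "${localDate}"`);
  }
  return localDate.slice(0, 7);
}

// Month filter that dashboard aggregation and insight generation could share.
function isInMonth(localDate: string, monthKey: string): boolean {
  return monthKeyOf(localDate) === monthKey;
}
```

Because the month key is a plain string prefix, month-edge tests reduce to string comparisons, for example checking that `2026-03-31` and `2026-04-01` land in different months.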
--- a/openspec/changes/monthly-expense-tracker-v1/proposal.md
+++ b/openspec/changes/monthly-expense-tracker-v1/proposal.md
@@ -1,12 +1,13 @@
 ## Why

-The project currently has a product plan but no runnable application, spec artifacts, or implementation scaffold. Formalizing the first version now creates a clear contract for building a local-first expense tracker with reliable monthly summaries and AI-generated guidance.
+The project currently has a product plan but no runnable application, spec artifacts, or implementation scaffold. Formalizing the first version now creates a clear contract for building a local-first expense tracker with reliable monthly summaries, private offline AI assistance, and no dependency on hosted model providers.

 ## What Changes

 - Add a local-first web app for tracking expenses and biweekly paychecks without authentication.
 - Add dashboard capabilities for month-to-date totals, category breakdowns, cash flow, and spending comparisons.
-- Add manual AI insight generation for a selected month using structured aggregates and transaction samples.
+- Add fully offline AI insight generation for a selected month using structured aggregates and transaction samples.
+- Add merchant-name-based category suggestion using deterministic rules plus local-model fallback.
 - Add local persistence, validation, and API routes for expenses, paychecks, dashboard data, and insight generation.

 ## Capabilities
@@ -15,7 +16,8 @@ The project currently has a product plan but no runnable application, spec artif
 - `expense-tracking`: Record, list, and delete categorized expenses for a given date.
 - `paycheck-tracking`: Record, list, and delete paycheck entries based on actual pay dates.
 - `monthly-dashboard`: View month-specific spending, income, and derived financial summaries.
-- `monthly-insights`: Generate read-only AI insights from monthly financial activity.
+- `monthly-insights`: Generate private offline AI insights from monthly financial activity.
+- `category-suggestion`: Suggest expense categories from merchant/shop names without cloud calls.

 ### Modified Capabilities
 - None.
@@ -24,5 +26,5 @@ The project currently has a product plan but no runnable application, spec artif

 - Affected code: new `Next.js` application, server routes, UI views, Prisma schema, and AI integration service.
 - APIs: `POST/GET/DELETE` routes for expenses and paychecks, `GET /dashboard`, and `POST /insights/generate`.
-- Dependencies: `Next.js`, `Prisma`, `SQLite`, and `OpenAI` SDK.
+- Dependencies: `Next.js`, `Prisma`, `SQLite`, `Ollama`, and a local Qwen-class instruct model.
 - Systems: local machine timezone handling for month boundaries and persisted local database storage.
--- a/openspec/changes/monthly-expense-tracker-v1/specs/category-suggestion/spec.md
+++ b/openspec/changes/monthly-expense-tracker-v1/specs/category-suggestion/spec.md
@@ -0,0 +1,23 @@
+## ADDED Requirements
+
+### Requirement: System suggests categories from merchant names
+The system SHALL support merchant-name-based category suggestion for expense entry while keeping all suggestion logic fully offline.
+
+#### Scenario: Known merchant resolves from deterministic rules
+- **WHEN** the user enters a merchant or shop name that matches a known merchant rule
+- **THEN** the system assigns the mapped category without needing model inference
+
+#### Scenario: Unknown merchant falls back to local model
+- **WHEN** the user enters a merchant or shop name that does not match a known merchant rule
+- **THEN** the system asks the local AI service for a category suggestion and returns the suggested category
+
+### Requirement: Ambiguous suggestions remain user-controlled
+The system SHALL keep the final saved category under user control for ambiguous or model-generated suggestions.
+
+#### Scenario: User confirms model suggestion before save
+- **WHEN** the category suggestion comes from model inference instead of a deterministic rule
+- **THEN** the user can review and confirm or change the category before the expense is saved
+
+#### Scenario: No cloud fallback is used
+- **WHEN** the local suggestion service is unavailable
+- **THEN** the system continues to allow manual category selection and does not send merchant data to a hosted provider
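The category-suggestion scenarios above can be sketched as a rules-first lookup with an injected local-model fallback. This is an illustration under stated assumptions, not the implementation in this commit: the category names, the merchant rules, and the names `matchMerchantRule`, `suggestCategory`, and `LocalModel` are all hypothetical, and the local model is passed in as a function so that no particular runtime API is implied.

```typescript
// Hypothetical fixed category list; the spec fixes categories but does not name them here.
type Category = "Groceries" | "Transport" | "Dining" | "Other";

type Suggestion =
  | { source: "rule"; category: Category; requiresConfirmation: false }
  | { source: "model"; category: Category; requiresConfirmation: true }
  | { source: "none"; category: null; requiresConfirmation: true };

// Hypothetical deterministic rules, keyed by normalized merchant name.
const MERCHANT_RULES: ReadonlyMap<string, Category> = new Map<string, Category>([
  ["corner grocer", "Groceries"],
  ["city transit", "Transport"],
]);

// Trim, lowercase, and collapse whitespace so that casing and spacing do not defeat a rule.
function normalizeMerchant(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

// Scenario 1: a known merchant resolves without any model inference.
function matchMerchantRule(name: string): Category | undefined {
  return MERCHANT_RULES.get(normalizeMerchant(name));
}

// Injected local inference call (assumed shape); it rejects when the runtime is unavailable.
type LocalModel = (merchant: string) => Promise<Category>;

async function suggestCategory(name: string, askLocalModel: LocalModel): Promise<Suggestion> {
  const ruled = matchMerchantRule(name);
  if (ruled !== undefined) {
    return { source: "rule", category: ruled, requiresConfirmation: false };
  }
  try {
    // Scenario 2: an unknown merchant falls back to the local model, and the user must confirm.
    const category = await askLocalModel(normalizeMerchant(name));
    return { source: "model", category, requiresConfirmation: true };
  } catch {
    // Scenario 4: no cloud fallback; the UI keeps manual category selection available.
    return { source: "none", category: null, requiresConfirmation: true };
  }
}
```

Only the `rule` branch sets `requiresConfirmation` to `false`, which mirrors the requirement that model-generated suggestions stay under user control before the expense is saved.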
--- a/openspec/changes/monthly-expense-tracker-v1/specs/monthly-insights/spec.md
+++ b/openspec/changes/monthly-expense-tracker-v1/specs/monthly-insights/spec.md
@@ -1,11 +1,11 @@
 ## ADDED Requirements

 ### Requirement: User can generate monthly AI insights on demand
-The system SHALL allow the user to manually generate AI insights for any month with existing or sparse data by sending structured monthly context to the configured `OpenAI` provider.
+The system SHALL allow the user to manually generate monthly AI insights for any month with existing or sparse data by sending structured monthly context to a fully offline local inference runtime.

 #### Scenario: Insights are generated for a month with data
 - **WHEN** the user requests insight generation for a month with recorded activity
-- **THEN** the system sends monthly aggregates plus transaction samples to the AI service and returns a rendered narrative summary with structured supporting totals
+- **THEN** the system sends monthly aggregates plus transaction samples to the local AI service and returns a rendered narrative summary with structured supporting totals

 #### Scenario: Prior month insights can be generated
 - **WHEN** the user requests insight generation for a previous month that has recorded data
@@ -21,3 +21,10 @@ The system SHALL keep AI insight generation read-only and return a safe fallback
 #### Scenario: AI does not mutate financial records
 - **WHEN** the system generates or stores monthly insights
 - **THEN** no expense or paycheck records are created, updated, or deleted as part of that request
+
+### Requirement: Insight generation remains private and resilient offline
+The system SHALL keep monthly insight generation fully offline and provide a clear fallback response when the local model runtime or selected model is unavailable.
+
+#### Scenario: Local runtime is unavailable
+- **WHEN** the user requests monthly insights while the local AI runtime is not running or the configured model is unavailable
+- **THEN** the system returns a clear setup or availability message instead of attempting a cloud fallback
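The fallback requirements for monthly insights can be sketched as a pure decision step that runs before any inference call. Everything named here is an assumption for illustration: the `RuntimeStatus` values, the `MonthlySnapshot` fields, the `MIN_TRANSACTIONS` threshold, and the message wording are not taken from this commit.

```typescript
// Assumed readiness states reported by a separate check of the local runtime.
type RuntimeStatus = "ready" | "runtime_down" | "model_missing";

// Assumed shape of the structured monthly context handed to the AI service.
interface MonthlySnapshot {
  month: string; // e.g. "2026-03"
  totalSpent: number;
  totalIncome: number;
  transactionCount: number;
}

type InsightPlan =
  | { kind: "fallback"; reason: "sparse_data" | "runtime_down" | "model_missing"; message: string }
  | { kind: "generate"; prompt: string };

// Hypothetical minimum amount of activity before model output is considered useful.
const MIN_TRANSACTIONS = 3;

function planInsight(status: RuntimeStatus, snapshot: MonthlySnapshot): InsightPlan {
  // A sparse month needs no model at all, so it is checked before runtime readiness.
  if (snapshot.transactionCount < MIN_TRANSACTIONS) {
    return {
      kind: "fallback",
      reason: "sparse_data",
      message: `Not enough activity recorded in ${snapshot.month} to generate insights yet.`,
    };
  }
  if (status === "runtime_down") {
    return {
      kind: "fallback",
      reason: status,
      message: "The local AI runtime is not running. Start it and try again.",
    };
  }
  if (status === "model_missing") {
    return {
      kind: "fallback",
      reason: status,
      message: "The configured local model is not available. Pull it and try again.",
    };
  }
  // Only structured aggregates enter the prompt; nothing is sent to a hosted provider.
  const prompt = `Summarize this month's finances from the JSON below.\n${JSON.stringify(snapshot)}`;
  return { kind: "generate", prompt };
}
```

Keeping this step free of I/O means the fallback behavior can be covered by deterministic tests without a running model, and the function never touches expense or paycheck records.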
--- a/openspec/changes/monthly-expense-tracker-v1/tasks.md
+++ b/openspec/changes/monthly-expense-tracker-v1/tasks.md
@@ -22,10 +22,17 @@

 - [x] 4.1 Implement monthly dashboard aggregation services for totals, category breakdowns, and derived comparisons.
 - [x] 4.2 Implement the dashboard API route and render dashboard sections for month-to-date metrics and comparisons.
-- [ ] 4.3 Implement the `OpenAI` insight service with structured monthly snapshot input and sparse-month fallback logic.
-- [ ] 4.4 Implement insight generation and display in the dashboard, including persisted monthly insight records.
+- [ ] 4.3 Implement the offline `Ollama` insight service with structured monthly snapshot input and sparse-month fallback logic.
+- [ ] 4.4 Implement insight generation and display in the dashboard, including persisted monthly insight records and offline-runtime fallback messaging.

-## 5. Verification
+## 5. Offline categorization

-- [ ] 5.1 Add automated tests for validation, persistence, dashboard aggregates, and insight fallback behavior.
-- [ ] 5.2 Verify the primary user flows in the browser, including expense entry, paycheck entry, dashboard updates, and insight generation.
+- [x] 5.1 Implement deterministic merchant-to-category mapping for known merchants.
+- [x] 5.2 Implement a local-model category suggestion endpoint for unknown merchants.
+- [x] 5.3 Update the expense entry flow to auto-fill known merchants and require confirmation for model-generated suggestions.
+- [x] 5.4 Add local runtime availability handling so category suggestion falls back to manual selection without cloud calls.
+
+## 6. Verification
+
+- [ ] 6.1 Add automated tests for validation, persistence, dashboard aggregates, offline insight fallback behavior, and category suggestion rules.
+- [ ] 6.2 Verify the primary user flows in the browser, including expense entry, paycheck entry, dashboard updates, category suggestion, and insight generation.