WIP: Phase 4 forge extract module with tests
# Phase 4: Fine-Tuning Pipeline Implementation Plan

## Goal

Build a pipeline to extract training examples from the Obsidian vault and fine-tune a local 8B model using QLoRA on the RTX 5070 (12GB VRAM).

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Training Data Pipeline                                  │
│ ──────────────────────                                  │
│ 1. Extract reflections from vault                       │
│ 2. Curate into conversation format                      │
│ 3. Split train/validation                               │
│ 4. Export to HuggingFace datasets format                │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ QLoRA Fine-Tuning (Unsloth)                             │
│ ───────────────────────────                             │
│ - Base: Llama 3.1 8B Instruct (4-bit)                   │
│ - LoRA rank: 16, alpha: 32                              │
│ - Target modules: q_proj, k_proj, v_proj, o_proj        │
│ - Learning rate: 2e-4                                   │
│ - Epochs: 3                                             │
│ - Batch: 4, Gradient accumulation: 4                    │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ Model Export & Serving                                  │
│ ──────────────────────                                  │
│ - Export to GGUF (Q4_K_M quantization)                  │
│ - Serve via llama.cpp or vLLM                           │
│ - Hot-swap in FastAPI backend                           │
└─────────────────────────────────────────────────────────┘
```

## Tasks

### Task 1: Training Data Extractor

**Files:**

- `src/companion/forge/extract.py` - Extract reflection examples from the vault
- `tests/test_forge_extract.py` - Tests for the extraction logic

**Spec:**

- Parse the vault for "reflection" patterns (journal entries with insights, decision analyses)
- Look for tags: #reflection, #decision, #learning, etc.
- Extract entries where you reflect on situations, weigh options, or analyze outcomes
- Format each as a conversation: user prompt + assistant response (your reflection)
- Output: a JSONL file of `{"messages": [{"role": "...", "content": "..."}]}` records

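A minimal sketch of the extraction core, assuming notes arrive as plain markdown strings; `has_reflection_tag`, `to_training_example`, and `write_jsonl` are illustrative names, not fixed by the spec:

```python
import json
import re

# Tags that mark a note as a reflection candidate (from the spec).
REFLECTION_TAGS = {"#reflection", "#decision", "#learning"}

def has_reflection_tag(note_text: str) -> bool:
    """Return True if the note carries any of the reflection tags."""
    tags = set(re.findall(r"#\w+", note_text))
    return bool(tags & REFLECTION_TAGS)

def to_training_example(prompt: str, reflection: str) -> dict:
    """Format one reflection as a chat-style training example."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reflection},
        ]
    }

def write_jsonl(examples: list[dict], path: str) -> None:
    """Write one JSON object per line, as HuggingFace datasets expects."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

The prompt half of each pair still needs a source; it could be the journal entry's question or a synthesized instruction, which is exactly what the curation step reviews.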
### Task 2: Training Data Curator

**Files:**

- `src/companion/forge/curate.py` - Human-in-the-loop curation
- `src/companion/forge/cli.py` - CLI for the curation workflow

**Spec:**

- Load the extracted examples
- Interactive review: show each example; allow approve/reject/edit
- Track curation decisions in SQLite
- Export approved examples to the final training set
- Deduplicate near-duplicate examples (embedding similarity)

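The deduplication step can be a greedy filter over precomputed embeddings: keep an example only if its cosine similarity to every already-kept example is below a threshold. A sketch, assuming embeddings are plain float lists (the embedding model itself is out of scope here):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def deduplicate(examples: list, embeddings: list, threshold: float = 0.95) -> list:
    """Greedily keep an example only if it is not too similar to any kept one."""
    kept, kept_vecs = [], []
    for ex, vec in zip(examples, embeddings):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(ex)
            kept_vecs.append(vec)
    return kept
```

The threshold is a judgment call: too low and genuinely different reflections on the same theme get collapsed; too high and near-copies slip through to training.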
### Task 3: Training Configuration

**Files:**

- `src/companion/forge/config.py` - Training hyperparameters
- `config.json` updates for the `fine_tuning` section

**Spec:**

- Pydantic models for the training config
- Hyperparameters tuned for the RTX 5070 (12GB VRAM)
- Output paths and logging config

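The shape of the config, with defaults taken from the architecture box above. The spec calls for Pydantic models; stdlib dataclasses are used here only to sketch the structure without the dependency:

```python
from dataclasses import dataclass, field

@dataclass
class LoraSettings:
    # LoRA rank/alpha and target modules from the architecture diagram.
    r: int = 16
    alpha: int = 32
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

@dataclass
class TrainingConfig:
    base_model: str = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
    learning_rate: float = 2e-4
    epochs: int = 3
    batch_size: int = 4
    gradient_accumulation: int = 4   # effective batch = 4 * 4 = 16
    save_steps: int = 500            # checkpoint cadence from Task 4
    lora: LoraSettings = field(default_factory=LoraSettings)
```

Porting this to Pydantic is mechanical (`BaseModel` plus validators for VRAM-sensitive fields like `batch_size`).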
### Task 4: QLoRA Training Script

**Files:**

- `src/companion/forge/train.py` - Unsloth training script
- `scripts/train.sh` - Convenience launcher

**Spec:**

- Load the base model: `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit`
- Apply the LoRA config (r=16, alpha=32, target modules)
- Load and tokenize the dataset
- Training loop with wandb logging (optional)
- Save checkpoints every 500 steps
- Validate on the holdout set

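The holdout set mentioned above comes from pipeline step 3 (split train/validation). A seeded split keeps it reproducible across runs; `train_val_split` is an illustrative helper, not a name from the spec:

```python
import random

def train_val_split(examples: list, val_fraction: float = 0.1, seed: int = 42):
    """Deterministically shuffle and split so the holdout set is
    identical across training runs (same seed -> same validation set)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

With only 50-100 curated examples, even a 10% holdout is 5-10 examples, so validation loss will be noisy; human review (Task 7) carries most of the evaluation weight.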
### Task 5: Model Export

**Files:**

- `src/companion/forge/export.py` - Export to GGUF
- `src/companion/forge/merge.py` - Merge LoRA weights into the base model

**Spec:**

- Merge LoRA weights into the base model
- Export to GGUF with Q4_K_M quantization
- Save to `~/.companion/models/`
- Update `config.json` with the new model path

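The merge and GGUF conversion themselves go through Unsloth/llama.cpp tooling, but the last bullet is plain bookkeeping. A sketch of the config update; `register_model` and the `fine_tuning.model_path` key are assumptions about `config.json`'s layout, not established names:

```python
import json
from pathlib import Path

def register_model(config_path: Path, gguf_path: Path) -> None:
    """Point config.json's fine_tuning section at the freshly exported GGUF."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("fine_tuning", {})["model_path"] = str(gguf_path)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Writing the path only after the GGUF file exists on disk keeps the API from ever pointing at a half-written export.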
### Task 6: Model Hot-Swap

**Files:**

- Update `src/companion/api.py` - Add an endpoint to reload the model
- `src/companion/forge/reload.py` - Model reloader utility

**Spec:**

- `/admin/reload-model` endpoint (requires auth / local-only access)
- Gracefully unload the old model, then load the new GGUF
- Return status: success or error

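The core of the reloader is a lock-guarded swap so concurrent requests never see a half-loaded model; the FastAPI endpoint would just wrap `reload()` behind the auth check. `ModelHolder` and the injected `loader` callable are illustrative names:

```python
import threading

class ModelHolder:
    """Owns the active model; swaps it atomically under a lock."""

    def __init__(self, loader):
        self._loader = loader          # callable: gguf_path -> model object
        self._lock = threading.Lock()
        self._model = None

    def reload(self, path: str) -> dict:
        with self._lock:
            self._model = None         # unload first: 12GB VRAM can't hold two copies
            try:
                self._model = self._loader(path)
                return {"status": "success", "model_path": path}
            except Exception as exc:
                # Failure leaves no model loaded; the caller must retry or
                # fall back to the previous GGUF path.
                return {"status": "error", "detail": str(exc)}

    def get(self):
        with self._lock:
            return self._model
```

Unload-before-load matches the spec's ordering and the VRAM budget, at the cost of a serving gap during the swap; loading the new model first would avoid the gap but briefly need both models resident.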
### Task 7: Evaluation Framework

**Files:**

- `src/companion/forge/eval.py` - Evaluate the model on test prompts
- `tests/test_forge_eval.py` - Evaluation tests

**Spec:**

- Load test prompts (decision scenarios, relationship questions)
- Generate responses from both the base and fine-tuned models
- Store outputs for human comparison
- Track metrics: response time, token count

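A sketch of the comparison harness, with models injected as plain `prompt -> text` callables so base and fine-tuned backends are interchangeable; the whitespace token count is a crude stand-in for real tokenizer counts:

```python
import time

def evaluate(prompts: list[str], models: dict) -> list[dict]:
    """Run each prompt through each model; record output, latency, token count.

    `models` maps a label (e.g. "base", "tuned") to a generate callable.
    The results list is what gets stored for side-by-side human review.
    """
    results = []
    for prompt in prompts:
        for name, generate in models.items():
            start = time.perf_counter()
            response = generate(prompt)
            results.append({
                "model": name,
                "prompt": prompt,
                "response": response,
                "seconds": time.perf_counter() - start,
                "tokens": len(response.split()),  # proxy; swap in tokenizer counts
            })
    return results
```

Keeping metrics and raw outputs in one record makes the "distinguishable style" check a simple paired read-through of base vs. tuned responses per prompt.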
## Success Criteria

- [ ] Extract 100+ reflection examples from the vault
- [ ] Curate down to 50-100 high-quality training examples
- [ ] Complete a training run in <6 hours on the RTX 5070
- [ ] Export produces a valid GGUF file
- [ ] Hot-swap endpoint successfully reloads the model
- [ ] Evaluation shows a distinguishable "Santhosh style" in model outputs

## Dependencies

```
unsloth>=2024.1.0
torch>=2.1.0
transformers>=4.36.0
datasets>=2.14.0
peft>=0.7.0
accelerate>=0.25.0
bitsandbytes>=0.41.0
sentencepiece>=0.1.99
protobuf>=3.20.0
```

## Commands

```bash
# Extract training data
python -m companion.forge.cli extract

# Curate examples
python -m companion.forge.cli curate

# Train
python -m companion.forge.train

# Export
python -m companion.forge.export

# Reload the model in the API
python -m companion.forge.reload
```