# Phase 4: Fine-Tuning Pipeline Implementation Plan

## Goal

Build a pipeline to extract training examples from the Obsidian vault and fine-tune a local 8B model using QLoRA on the RTX 5070.

## Architecture

```
┌────────────────────────────────────────────────────┐
│               Training Data Pipeline               │
│               ──────────────────────               │
│  1. Extract reflections from vault                 │
│  2. Curate into conversation format                │
│  3. Split train/validation                         │
│  4. Export to HuggingFace datasets format          │
└────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────┐
│            QLoRA Fine-Tuning (Unsloth)             │
│            ───────────────────────────             │
│  - Base: Llama 3.1 8B Instruct (4-bit)             │
│  - LoRA rank: 16, alpha: 32                        │
│  - Target modules: q_proj, k_proj, v_proj, o_proj  │
│  - Learning rate: 2e-4                             │
│  - Epochs: 3                                       │
│  - Batch: 4, Gradient accumulation: 4              │
└────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────┐
│               Model Export & Serving               │
│               ──────────────────────               │
│  - Export to GGUF (Q4_K_M quantization)            │
│  - Serve via llama.cpp or vLLM                     │
│  - Hot-swap in FastAPI backend                     │
└────────────────────────────────────────────────────┘
```

## Tasks

### Task 1: Training Data Extractor

**Files:**

- `src/companion/forge/extract.py` - Extract reflection examples from vault
- `tests/test_forge_extract.py` - Test extraction logic

**Spec:**

- Parse vault for "reflection" patterns (journal entries with insights, decision analyses)
- Look for tags: #reflection, #decision, #learning, etc.
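One way the tag-based extraction could work, as a sketch (the `REFLECTION_TAGS` set, the regex, and `extract_example` are illustrative names, not part of the spec):

```python
import json
import re
from typing import Optional

# Tags that mark a note as a candidate training example (per the spec above).
REFLECTION_TAGS = {"#reflection", "#decision", "#learning"}

TAG_RE = re.compile(r"#[\w/-]+")

def extract_example(note_text: str, prompt: str) -> Optional[dict]:
    """Return a chat-format example if the note carries a reflection tag."""
    tags = set(TAG_RE.findall(note_text))
    if not tags & REFLECTION_TAGS:
        return None
    # Strip tag-only lines so they don't leak into the training text.
    body = "\n".join(
        line for line in note_text.splitlines()
        if not TAG_RE.fullmatch(line.strip())
    ).strip()
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": body},
        ]
    }

note = "#reflection\nI chose the smaller scope because shipping beats polish."
example = extract_example(note, "How do you decide project scope?")
print(json.dumps(example))  # one line of the output JSONL file
```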
- Extract entries where you reflect on situations, weigh options, or analyze outcomes
- Format as a conversation: user prompt + assistant response (your reflection)
- Output: JSONL file with `{"messages": [{"role": "...", "content": "..."}]}`

### Task 2: Training Data Curator

**Files:**

- `src/companion/forge/curate.py` - Human-in-the-loop curation
- `src/companion/forge/cli.py` - CLI for the curation workflow

**Spec:**

- Load extracted examples
- Interactive review: show each example, allow approve/reject/edit
- Track curation decisions in SQLite
- Export approved examples to the final training set
- Deduplicate similar examples (using embedding similarity)

### Task 3: Training Configuration

**Files:**

- `src/companion/forge/config.py` - Training hyperparameters
- `config.json` updates for the `fine_tuning` section

**Spec:**

- Pydantic models for the training config
- Hyperparameters tuned for the RTX 5070 (12GB VRAM)
- Output paths, logging config

### Task 4: QLoRA Training Script

**Files:**

- `src/companion/forge/train.py` - Unsloth training script
- `scripts/train.sh` - Convenience launcher

**Spec:**

- Load base model: `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit`
- Apply LoRA config (r=16, alpha=32, target_modules)
- Load and tokenize the dataset
- Training loop with wandb logging (optional)
- Save checkpoints every 500 steps
- Validate on a holdout set

### Task 5: Model Export

**Files:**

- `src/companion/forge/export.py` - Export to GGUF
- `src/companion/forge/merge.py` - Merge LoRA weights into base

**Spec:**

- Merge LoRA weights into the base model
- Export to GGUF with Q4_K_M quantization
- Save to `~/.companion/models/`
- Update `config.json` with the new model path

### Task 6: Model Hot-Swap

**Files:**

- Update `src/companion/api.py` - Add endpoint to reload model
- `src/companion/forge/reload.py` - Model reloader utility

**Spec:**

- `/admin/reload-model` endpoint (requires auth / local-only)
- Gracefully unload the old model, load the new GGUF
- Return status: success or error

### Task 7: Evaluation Framework
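A minimal sketch of what such an evaluation harness could look like (the `generate` callables are stubs standing in for the base and fine-tuned model endpoints; all names here are illustrative):

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    prompt: str
    model: str
    response: str
    seconds: float
    tokens: int  # rough whitespace-split token count, not the tokenizer's

def run_eval(prompts: list, models: dict) -> list:
    """Generate one response per (prompt, model) pair and record timing metrics."""
    results = []
    for prompt in prompts:
        for name, generate in models.items():
            start = time.perf_counter()
            text = generate(prompt)
            elapsed = time.perf_counter() - start
            results.append(EvalResult(prompt, name, text, elapsed, len(text.split())))
    return results

# Usage with stubs in place of the base and fine-tuned models:
stubs = {"base": lambda p: "A cautious answer.", "tuned": lambda p: "A more personal answer."}
for r in run_eval(["Should I take the new role?"], stubs):
    print(f"{r.model}: {r.tokens} tokens in {r.seconds:.3f}s")
```

Storing the `EvalResult` rows side by side is what enables the human comparison called for below.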
**Files:**

- `src/companion/forge/eval.py` - Evaluate model on test prompts
- `tests/test_forge_eval.py` - Evaluation tests

**Spec:**

- Load test prompts (decision scenarios, relationship questions)
- Generate responses from both the base and the fine-tuned model
- Store outputs for human comparison
- Track metrics: response time, token count

## Success Criteria

- [ ] Extract 100+ reflection examples from vault
- [ ] Curate down to 50-100 high-quality training examples
- [ ] Complete training run in <6 hours on RTX 5070
- [ ] Export produces valid GGUF file
- [ ] Hot-swap endpoint successfully reloads model
- [ ] Evaluation shows distinguishable "Santhosh style" in outputs

## Dependencies

```
unsloth>=2024.1.0
torch>=2.1.0
transformers>=4.36.0
datasets>=2.14.0
peft>=0.7.0
accelerate>=0.25.0
bitsandbytes>=0.41.0
sentencepiece>=0.1.99
protobuf>=3.20.0
```

## Commands

```bash
# Extract training data
python -m companion.forge.cli extract

# Curate examples
python -m companion.forge.cli curate

# Train
python -m companion.forge.train

# Export
python -m companion.forge.export

# Reload model in API
python -m companion.forge.reload
```
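The hot-swap flow behind the reload command could be sketched like this (load-then-swap so a failed load keeps the old model serving; `ModelRegistry` and the loader callback are illustrative, and llama.cpp binding details are omitted):

```python
import threading
from typing import Callable, Optional

class ModelRegistry:
    """Holds the currently served model and swaps it atomically."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._model: Optional[object] = None
        self._path: Optional[str] = None

    def reload(self, path: str, loader: Callable[[str], object]) -> dict:
        """Load the new GGUF first, then swap, so a failed load is harmless."""
        try:
            new_model = loader(path)
        except Exception as exc:
            # Old model stays in place; report the failure to the caller.
            return {"status": "error", "detail": str(exc)}
        with self._lock:
            old, self._model, self._path = self._model, new_model, path
        del old  # drop the last reference so the old weights can be freed
        return {"status": "success", "model_path": path}

# Usage: the /admin/reload-model endpoint would call reload() with the
# real GGUF loader in place of this stub.
registry = ModelRegistry()
registry.reload("~/.companion/models/model.gguf", lambda p: object())
```

Returning a plain status dict matches the success/error contract in Task 6's spec.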