Phase 4: Fine-Tuning Pipeline Implementation Plan
Goal
Build a pipeline to extract training examples from the Obsidian vault and fine-tune a local 8B model (Llama 3.1 8B Instruct) using QLoRA on the RTX 5070.
Architecture
┌─────────────────────────────────────────────────────────┐
│ Training Data Pipeline │
│ ───────────────────── │
│ 1. Extract reflections from vault │
│ 2. Curate into conversation format │
│ 3. Split train/validation │
│ 4. Export to HuggingFace datasets format │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ QLoRA Fine-Tuning (Unsloth) │
│ ─────────────────────────── │
│ - Base: Llama 3.1 8B Instruct (4-bit) │
│ - LoRA rank: 16, alpha: 32 │
│ - Target modules: q_proj, k_proj, v_proj, o_proj │
│ - Learning rate: 2e-4 │
│ - Epochs: 3 │
│ - Batch: 4, Gradient accumulation: 4 │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Model Export & Serving │
│ ───────────────────── │
│ - Export to GGUF (Q4_K_M quantization) │
│ - Serve via llama.cpp or vLLM │
│ - Hot-swap in FastAPI backend │
└─────────────────────────────────────────────────────────┘
Tasks
Task 1: Training Data Extractor
Files:
- src/companion/forge/extract.py - Extract reflection examples from vault
- tests/test_forge_extract.py - Test extraction logic
Spec:
- Parse vault for "reflection" patterns (journal entries with insights, decision analyses)
- Look for tags: #reflection, #decision, #learning, etc.
- Extract entries where you reflect on situations, weigh options, or analyze outcomes
- Format as conversation: user prompt + assistant response (your reflection)
- Output: JSONL file with {"messages": [{"role": "...", "content": "..."}]}
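The extraction step above could be sketched as follows; the tag regex and the user-prompt template are illustrative placeholders, not the final extraction heuristics:

```python
# Sketch of vault extraction, assuming notes are markdown files with
# inline tags like #reflection / #decision / #learning. The prompt
# template and function names are illustrative, not a fixed API.
import json
import re
from pathlib import Path

REFLECTION_TAGS = re.compile(r"#(reflection|decision|learning)\b")

def extract_examples(vault_dir: str) -> list[dict]:
    """Collect tagged notes and wrap them in chat-format records."""
    examples = []
    for path in Path(vault_dir).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        if not REFLECTION_TAGS.search(text):
            continue
        examples.append({
            "messages": [
                {"role": "user", "content": f"Reflect on: {path.stem}"},
                {"role": "assistant", "content": text.strip()},
            ]
        })
    return examples

def write_jsonl(examples: list[dict], out_path: str) -> None:
    """One JSON object per line, ready for HuggingFace datasets."""
    with open(out_path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

In practice the user prompt would be reconstructed from the note's context (the situation being analyzed) rather than the filename, but the record shape matches the JSONL format in the spec.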
Task 2: Training Data Curator
Files:
- src/companion/forge/curate.py - Human-in-the-loop curation
- src/companion/forge/cli.py - CLI for curation workflow
Spec:
- Load extracted examples
- Interactive review: show each example, allow approve/reject/edit
- Track curation decisions in SQLite
- Export approved examples to final training set
- Deduplicate similar examples (use embeddings similarity)
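The deduplication step could be a greedy near-duplicate filter over normalized embeddings; `embed` stands in for whatever embedding model the project uses (e.g. a sentence-transformers model), and the 0.92 threshold is a starting point to tune, not a recommendation:

```python
# Greedy near-duplicate filter over cosine similarity. `embed` is a
# placeholder callable mapping text -> vector; threshold is illustrative.
import numpy as np

def dedupe(examples: list[dict], embed, threshold: float = 0.92) -> list[dict]:
    kept, kept_vecs = [], []
    for ex in examples:
        text = ex["messages"][-1]["content"]
        v = np.asarray(embed(text), dtype=float)
        v = v / (np.linalg.norm(v) + 1e-12)  # unit-normalize
        # Cosine similarity of unit vectors is just the dot product.
        if any(float(v @ kv) >= threshold for kv in kept_vecs):
            continue  # too similar to an already-kept example
        kept.append(ex)
        kept_vecs.append(v)
    return kept
```

Greedy filtering is O(n²) in the worst case, which is fine at the 100-example scale this pipeline targets.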
Task 3: Training Configuration
Files:
- src/companion/forge/config.py - Training hyperparameters
- config.json - Updates for the fine_tuning section
Spec:
- Pydantic models for training config
- Hyperparameters tuned for RTX 5070 (12GB VRAM)
- Output paths, logging config
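One possible shape for the Pydantic config, mirroring the hyperparameters in the architecture diagram; the field names and defaults here are suggestions, not an existing schema:

```python
# Suggested config shape; field names are illustrative, defaults mirror
# the QLoRA hyperparameters listed in the plan above.
from pydantic import BaseModel, Field

class TrainingConfig(BaseModel):
    base_model: str = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
    lora_rank: int = 16
    lora_alpha: int = 32
    target_modules: list[str] = Field(
        default_factory=lambda: ["q_proj", "k_proj", "v_proj", "o_proj"]
    )
    learning_rate: float = 2e-4
    epochs: int = 3
    per_device_batch_size: int = 4       # sized for 12 GB VRAM with a 4-bit base
    gradient_accumulation_steps: int = 4  # effective batch size 16
    output_dir: str = "~/.companion/models/checkpoints"
```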
Task 4: QLoRA Training Script
Files:
- src/companion/forge/train.py - Unsloth training script
- scripts/train.sh - Convenience launcher
Spec:
- Load base model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
- Apply LoRA config (r=16, alpha=32, target_modules)
- Load and tokenize dataset
- Training loop with wandb logging (optional)
- Save checkpoints every 500 steps
- Validate on holdout set
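The training script could be sketched with Unsloth plus TRL's SFTTrainer (TRL is pulled in by unsloth but is not pinned in the dependency list below). This assumes the curated JSONL has already been rendered into a plain `text` column via the tokenizer's chat template; exact Unsloth/TRL signatures drift between releases, so treat this as a sketch, not a drop-in script:

```python
# Sketch only: verify argument names against the installed Unsloth/TRL.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,          # kept modest to fit 12 GB VRAM
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",    # assumes chat template already applied
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,   # effective batch size 16
        num_train_epochs=3,
        learning_rate=2e-4,
        save_steps=500,                  # checkpoint cadence from the spec
        output_dir="outputs",
        report_to="none",                # set to "wandb" for optional logging
    ),
)
trainer.train()
```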
Task 5: Model Export
Files:
- src/companion/forge/export.py - Export to GGUF
- src/companion/forge/merge.py - Merge LoRA weights into base
Spec:
- Merge LoRA weights into base model
- Export to GGUF with Q4_K_M quantization
- Save to ~/.companion/models/
- Update config.json with new model path
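The merge step could be done with peft directly, as sketched below; the adapter path is a placeholder, and the merged directory would then be converted to GGUF with llama.cpp's convert_hf_to_gguf.py followed by llama-quantize with the Q4_K_M preset (Unsloth's save_pretrained_gguf is an alternative that wraps both steps):

```python
# Sketch of merging LoRA weights into the base model with peft.
# "./outputs/adapter" is a placeholder for the trained adapter path.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3.1-8B-Instruct"

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "./outputs/adapter").merge_and_unload()
merged.save_pretrained("./merged")                       # full-precision HF checkpoint
AutoTokenizer.from_pretrained(BASE).save_pretrained("./merged")
# GGUF conversion then happens outside Python, e.g. with llama.cpp's
# convert_hf_to_gguf.py on ./merged, followed by llama-quantize (Q4_K_M).
```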
Task 6: Model Hot-Swap
Files:
- src/companion/api.py - Update: add endpoint to reload model
- src/companion/forge/reload.py - Model reloader utility
Spec:
- /admin/reload-model endpoint (requires auth / local-only access)
- Gracefully unload old model, load new GGUF
- Return status: success or error
Task 7: Evaluation Framework
Files:
- src/companion/forge/eval.py - Evaluate model on test prompts
- tests/test_forge_eval.py - Evaluation tests
Spec:
- Load test prompts (decision scenarios, relationship questions)
- Generate responses from both base and fine-tuned model
- Store outputs for human comparison
- Track metrics: response time, token count
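The side-by-side evaluation could look like the sketch below, assuming each model is exposed as a `generate(prompt) -> str` callable; the whitespace token count is a rough proxy, since the real token count would come from the tokenizer:

```python
# Sketch of the base-vs-fine-tuned comparison. Models are passed as
# name -> generate(prompt) callables; metric names follow the spec.
import time

def evaluate(prompts: list[str], models: dict) -> list[dict]:
    """Run every prompt through every model, recording simple metrics."""
    rows = []
    for prompt in prompts:
        for name, generate in models.items():
            start = time.perf_counter()
            reply = generate(prompt)
            rows.append({
                "model": name,
                "prompt": prompt,
                "response": reply,
                "seconds": round(time.perf_counter() - start, 3),
                "tokens": len(reply.split()),  # rough proxy, not true tokens
            })
    return rows
```

The rows can then be dumped to JSONL for the human side-by-side comparison.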
Success Criteria
- Extract 100+ reflection examples from vault
- Curate down to 50-100 high-quality training examples
- Complete training run in <6 hours on RTX 5070
- Export produces valid GGUF file
- Hot-swap endpoint successfully reloads model
- Evaluation shows a distinguishable "Santhosh-style" voice in fine-tuned outputs
Dependencies
unsloth>=2024.1.0
torch>=2.1.0
transformers>=4.36.0
datasets>=2.14.0
peft>=0.7.0
accelerate>=0.25.0
bitsandbytes>=0.41.0
sentencepiece>=0.1.99
protobuf>=3.20.0
Commands
# Extract training data
python -m companion.forge.cli extract
# Curate examples
python -m companion.forge.cli curate
# Train
python -m companion.forge.train
# Export
python -m companion.forge.export
# Reload model in API
python -m companion.forge.reload