WIP: Phase 4 forge extract module with tests
# Phase 4: Fine-Tuning Pipeline Implementation Plan

## Goal

Build a pipeline to extract training examples from the Obsidian vault and fine-tune a local 8B model using QLoRA on the RTX 5070 (12GB VRAM).

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Training Data Pipeline                                  │
│ ──────────────────────                                  │
│ 1. Extract reflections from vault                       │
│ 2. Curate into conversation format                      │
│ 3. Split train/validation                               │
│ 4. Export to HuggingFace datasets format                │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ QLoRA Fine-Tuning (Unsloth)                             │
│ ───────────────────────────                             │
│ - Base: Llama 3.1 8B Instruct (4-bit)                   │
│ - LoRA rank: 16, alpha: 32                              │
│ - Target modules: q_proj, k_proj, v_proj, o_proj        │
│ - Learning rate: 2e-4                                   │
│ - Epochs: 3                                             │
│ - Batch: 4, Gradient accumulation: 4                    │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ Model Export & Serving                                  │
│ ──────────────────────                                  │
│ - Export to GGUF (Q4_K_M quantization)                  │
│ - Serve via llama.cpp or vLLM                           │
│ - Hot-swap in FastAPI backend                           │
└─────────────────────────────────────────────────────────┘
```

## Tasks

### Task 1: Training Data Extractor

**Files:**

- `src/companion/forge/extract.py` - Extract reflection examples from the vault
- `tests/test_forge_extract.py` - Tests for the extraction logic

**Spec:**

- Parse the vault for "reflection" patterns (journal entries with insights, decision analyses)
- Look for tags: #reflection, #decision, #learning, etc.
- Extract entries where you reflect on situations, weigh options, or analyze outcomes
- Format each as a conversation: user prompt + assistant response (your reflection)
- Output: a JSONL file of `{"messages": [{"role": "...", "content": "..."}]}` records

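A minimal sketch of the extraction core, assuming notes arrive as plain markdown strings; `has_reflection_tag`, `to_training_example`, and `write_jsonl` are illustrative names, not fixed by the spec:

```python
import json
import re

# Tags that mark a note as a reflection candidate (from the spec).
REFLECTION_TAGS = {"#reflection", "#decision", "#learning"}

def has_reflection_tag(note_text: str) -> bool:
    """Return True if the note carries any of the reflection tags."""
    tags = set(re.findall(r"#\w+", note_text))
    return bool(tags & REFLECTION_TAGS)

def to_training_example(prompt: str, reflection: str) -> dict:
    """Format one reflection as a chat-style training example."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reflection},
        ]
    }

def write_jsonl(examples: list[dict], path: str) -> None:
    """Write one JSON object per line, as HuggingFace datasets expects."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

The prompt half of each pair still needs a source; it could be the journal entry's question or a synthesized instruction, which is exactly what the curation step reviews.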
### Task 2: Training Data Curator

**Files:**

- `src/companion/forge/curate.py` - Human-in-the-loop curation
- `src/companion/forge/cli.py` - CLI for the curation workflow

**Spec:**

- Load the extracted examples
- Interactive review: show each example; allow approve/reject/edit
- Track curation decisions in SQLite
- Export approved examples to the final training set
- Deduplicate near-duplicate examples (embedding similarity)

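The deduplication step can be a greedy filter over precomputed embeddings: keep an example only if its cosine similarity to every already-kept example is below a threshold. A sketch, assuming embeddings are plain float lists (the embedding model itself is out of scope here):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def deduplicate(examples: list, embeddings: list, threshold: float = 0.95) -> list:
    """Greedily keep an example only if it is not too similar to any kept one."""
    kept, kept_vecs = [], []
    for ex, vec in zip(examples, embeddings):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(ex)
            kept_vecs.append(vec)
    return kept
```

The threshold is a judgment call: too low and genuinely different reflections on the same theme get collapsed; too high and near-copies slip through to training.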
### Task 3: Training Configuration

**Files:**

- `src/companion/forge/config.py` - Training hyperparameters
- `config.json` updates for the `fine_tuning` section

**Spec:**

- Pydantic models for the training config
- Hyperparameters tuned for the RTX 5070 (12GB VRAM)
- Output paths and logging config

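The shape of the config, with defaults taken from the architecture box above. The spec calls for Pydantic models; stdlib dataclasses are used here only to sketch the structure without the dependency:

```python
from dataclasses import dataclass, field

@dataclass
class LoraSettings:
    # LoRA rank/alpha and target modules from the architecture diagram.
    r: int = 16
    alpha: int = 32
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

@dataclass
class TrainingConfig:
    base_model: str = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
    learning_rate: float = 2e-4
    epochs: int = 3
    batch_size: int = 4
    gradient_accumulation: int = 4   # effective batch = 4 * 4 = 16
    save_steps: int = 500            # checkpoint cadence from Task 4
    lora: LoraSettings = field(default_factory=LoraSettings)
```

Porting this to Pydantic is mechanical (`BaseModel` plus validators for VRAM-sensitive fields like `batch_size`).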
### Task 4: QLoRA Training Script

**Files:**

- `src/companion/forge/train.py` - Unsloth training script
- `scripts/train.sh` - Convenience launcher

**Spec:**

- Load the base model: `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit`
- Apply the LoRA config (r=16, alpha=32, target modules)
- Load and tokenize the dataset
- Training loop with wandb logging (optional)
- Save checkpoints every 500 steps
- Validate on the holdout set

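The holdout set mentioned above comes from pipeline step 3 (split train/validation). A seeded split keeps it reproducible across runs; `train_val_split` is an illustrative helper, not a name from the spec:

```python
import random

def train_val_split(examples: list, val_fraction: float = 0.1, seed: int = 42):
    """Deterministically shuffle and split so the holdout set is
    identical across training runs (same seed -> same validation set)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

With only 50-100 curated examples, even a 10% holdout is 5-10 examples, so validation loss will be noisy; human review (Task 7) carries most of the evaluation weight.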
### Task 5: Model Export

**Files:**

- `src/companion/forge/export.py` - Export to GGUF
- `src/companion/forge/merge.py` - Merge LoRA weights into the base model

**Spec:**

- Merge LoRA weights into the base model
- Export to GGUF with Q4_K_M quantization
- Save to `~/.companion/models/`
- Update `config.json` with the new model path

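The merge and GGUF conversion themselves go through Unsloth/llama.cpp tooling, but the last bullet is plain bookkeeping. A sketch of the config update; `register_model` and the `fine_tuning.model_path` key are assumptions about `config.json`'s layout, not established names:

```python
import json
from pathlib import Path

def register_model(config_path: Path, gguf_path: Path) -> None:
    """Point config.json's fine_tuning section at the freshly exported GGUF."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("fine_tuning", {})["model_path"] = str(gguf_path)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Writing the path only after the GGUF file exists on disk keeps the API from ever pointing at a half-written export.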
### Task 6: Model Hot-Swap

**Files:**

- Update `src/companion/api.py` - Add an endpoint to reload the model
- `src/companion/forge/reload.py` - Model reloader utility

**Spec:**

- `/admin/reload-model` endpoint (requires auth / local-only access)
- Gracefully unload the old model, then load the new GGUF
- Return status: success or error

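The core of the reloader is a lock-guarded swap so concurrent requests never see a half-loaded model; the FastAPI endpoint would just wrap `reload()` behind the auth check. `ModelHolder` and the injected `loader` callable are illustrative names:

```python
import threading

class ModelHolder:
    """Owns the active model; swaps it atomically under a lock."""

    def __init__(self, loader):
        self._loader = loader          # callable: gguf_path -> model object
        self._lock = threading.Lock()
        self._model = None

    def reload(self, path: str) -> dict:
        with self._lock:
            self._model = None         # unload first: 12GB VRAM can't hold two copies
            try:
                self._model = self._loader(path)
                return {"status": "success", "model_path": path}
            except Exception as exc:
                # Failure leaves no model loaded; the caller must retry or
                # fall back to the previous GGUF path.
                return {"status": "error", "detail": str(exc)}

    def get(self):
        with self._lock:
            return self._model
```

Unload-before-load matches the spec's ordering and the VRAM budget, at the cost of a serving gap during the swap; loading the new model first would avoid the gap but briefly need both models resident.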
### Task 7: Evaluation Framework

**Files:**

- `src/companion/forge/eval.py` - Evaluate the model on test prompts
- `tests/test_forge_eval.py` - Evaluation tests

**Spec:**

- Load test prompts (decision scenarios, relationship questions)
- Generate responses from both the base and fine-tuned models
- Store outputs for human comparison
- Track metrics: response time, token count

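A sketch of the comparison harness, with models injected as plain `prompt -> text` callables so base and fine-tuned backends are interchangeable; the whitespace token count is a crude stand-in for real tokenizer counts:

```python
import time

def evaluate(prompts: list[str], models: dict) -> list[dict]:
    """Run each prompt through each model; record output, latency, token count.

    `models` maps a label (e.g. "base", "tuned") to a generate callable.
    The results list is what gets stored for side-by-side human review.
    """
    results = []
    for prompt in prompts:
        for name, generate in models.items():
            start = time.perf_counter()
            response = generate(prompt)
            results.append({
                "model": name,
                "prompt": prompt,
                "response": response,
                "seconds": time.perf_counter() - start,
                "tokens": len(response.split()),  # proxy; swap in tokenizer counts
            })
    return results
```

Keeping metrics and raw outputs in one record makes the "distinguishable style" check a simple paired read-through of base vs. tuned responses per prompt.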
## Success Criteria

- [ ] Extract 100+ reflection examples from the vault
- [ ] Curate down to 50-100 high-quality training examples
- [ ] Complete a training run in <6 hours on the RTX 5070
- [ ] Export produces a valid GGUF file
- [ ] Hot-swap endpoint successfully reloads the model
- [ ] Evaluation shows a distinguishable "Santhosh style" in model outputs

## Dependencies

```
unsloth>=2024.1.0
torch>=2.1.0
transformers>=4.36.0
datasets>=2.14.0
peft>=0.7.0
accelerate>=0.25.0
bitsandbytes>=0.41.0
sentencepiece>=0.1.99
protobuf>=3.20.0
```

## Commands

```bash
# Extract training data
python -m companion.forge.cli extract

# Curate examples
python -m companion.forge.cli curate

# Train
python -m companion.forge.train

# Export
python -m companion.forge.export

# Reload the model in the API
python -m companion.forge.reload
```