kv-ai/docs/superpowers/plans/2026-04-14-personal-companion-ai-phase4.md

Phase 4: Fine-Tuning Pipeline Implementation Plan

Goal

Build a pipeline to extract training examples from the Obsidian vault and fine-tune a local 8B model using QLoRA on the RTX 5070.

Architecture

┌─────────────────────────────────────────────────────────┐
│  Training Data Pipeline                                  │
│  ─────────────────────                                   │
│  1. Extract reflections from vault                      │
│  2. Curate into conversation format                     │
│  3. Split train/validation                              │
│  4. Export to HuggingFace datasets format               │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  QLoRA Fine-Tuning (Unsloth)                             │
│  ───────────────────────────                            │
│  - Base: Llama 3.1 8B Instruct (4-bit)                  │
│  - LoRA rank: 16, alpha: 32                            │
│  - Target modules: q_proj, k_proj, v_proj, o_proj      │
│  - Learning rate: 2e-4                                  │
│  - Epochs: 3                                            │
│  - Batch: 4, Gradient accumulation: 4                   │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  Model Export & Serving                                  │
│  ─────────────────────                                   │
│  - Export to GGUF (Q4_K_M quantization)                 │
│  - Serve via llama.cpp or vLLM                          │
│  - Hot-swap in FastAPI backend                          │
└─────────────────────────────────────────────────────────┘

Tasks

Task 1: Training Data Extractor

Files:

  • src/companion/forge/extract.py - Extract reflection examples from vault
  • tests/test_forge_extract.py - Test extraction logic

Spec:

  • Parse vault for "reflection" patterns (journal entries with insights, decision analyses)
  • Look for tags: #reflection, #decision, #learning, etc.
  • Extract entries where you reflect on situations, weigh options, or analyze outcomes
  • Format as conversation: user prompt + assistant response (your reflection)
  • Output: JSONL file with {"messages": [{"role": "...", "content": "..."}]}
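
The extraction step above can be sketched as follows. Function names, the tag set, and the "first line is the implied prompt" heuristic are illustrative assumptions, not the final extract.py API:

```python
# Sketch of the extraction step (names and heuristics are assumptions,
# not the final extract.py API). Scans note text for tagged reflection
# entries and emits chat-format training examples.
import json
import re

REFLECTION_TAGS = {"#reflection", "#decision", "#learning"}

def extract_examples(note_text: str) -> list[dict]:
    """Split a note into paragraphs and keep those carrying a reflection tag."""
    examples = []
    for block in re.split(r"\n\s*\n", note_text):
        tags = set(re.findall(r"#\w+", block))
        if tags & REFLECTION_TAGS:
            # Treat the first line as the implied prompt, the rest as the reflection.
            lines = block.strip().splitlines()
            prompt, reflection = lines[0], "\n".join(lines[1:]).strip()
            if reflection:
                examples.append({
                    "messages": [
                        {"role": "user", "content": prompt},
                        {"role": "assistant", "content": reflection},
                    ]
                })
    return examples

def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples in the JSONL layout the spec calls for."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)
```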

Task 2: Training Data Curator

Files:

  • src/companion/forge/curate.py - Human-in-the-loop curation
  • src/companion/forge/cli.py - CLI for curation workflow

Spec:

  • Load extracted examples
  • Interactive review: show each example, allow approve/reject/edit
  • Track curation decisions in SQLite
  • Export approved examples to final training set
  • Deduplicate near-identical examples via embedding similarity
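
A minimal sketch of the dedup logic. The embed() callable is a stand-in for a real embedding model (e.g. a sentence-transformers encoder); the 0.92 threshold is a starting guess to tune during curation:

```python
# Near-duplicate filtering sketch for the curation step. embed() is
# injected so the dedup logic itself stays dependency-free and testable.
import math
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(texts: list[str], embed: Callable[[str], list[float]],
           threshold: float = 0.92) -> list[str]:
    """Greedily keep each text unless it is too similar to one already kept."""
    kept: list[str] = []
    kept_vecs: list[list[float]] = []
    for text in texts:
        vec = embed(text)
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept
```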

Task 3: Training Configuration

Files:

  • src/companion/forge/config.py - Training hyperparameters
  • config.json updates for fine_tuning section

Spec:

  • Pydantic models for training config
  • Hyperparameters tuned for RTX 5070 (12GB VRAM)
  • Output paths, logging config
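
The config shape, with the hyperparameters from the architecture diagram as defaults. The real config.py is planned as Pydantic models; this dependency-free dataclass sketch shows the same fields:

```python
# Training config sketch mirroring the plan's hyperparameters
# (the real config.py would use Pydantic models instead).
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    base_model: str = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
    lora_rank: int = 16
    lora_alpha: int = 32
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    learning_rate: float = 2e-4
    epochs: int = 3
    batch_size: int = 4
    gradient_accumulation: int = 4
    output_dir: str = "checkpoints/"

    @property
    def effective_batch_size(self) -> int:
        # Examples per optimizer step: micro-batch * accumulation steps.
        return self.batch_size * self.gradient_accumulation
```

A micro-batch of 4 with 4 accumulation steps gives an effective batch of 16 while keeping peak activations within the 12GB VRAM budget.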

Task 4: QLoRA Training Script

Files:

  • src/companion/forge/train.py - Unsloth training script
  • scripts/train.sh - Convenience launcher

Spec:

  • Load base model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
  • Apply LoRA config (r=16, alpha=32, target_modules)
  • Load and tokenize dataset
  • Training loop with wandb logging (optional)
  • Save checkpoints every 500 steps
  • Validate on holdout set
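
A rough shape for train.py, assuming Unsloth's FastLanguageModel API and TRL's SFTTrainer; treat every call as a sketch to verify against the installed versions, not a confirmed signature. Heavy imports live inside main() so the module stays importable without a GPU:

```python
# train.py sketch (Unsloth + TRL API names are assumptions to verify).
HYPERPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "learning_rate": 2e-4,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "save_steps": 500,
}

def main() -> None:
    # GPU-only imports deferred so the module imports cleanly anywhere.
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=HYPERPARAMS["r"],
        lora_alpha=HYPERPARAMS["lora_alpha"],
        target_modules=HYPERPARAMS["target_modules"],
    )
    dataset = load_dataset(
        "json", data_files={"train": "train.jsonl", "validation": "val.jsonl"}
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
        args=TrainingArguments(
            output_dir="checkpoints/",
            learning_rate=HYPERPARAMS["learning_rate"],
            num_train_epochs=HYPERPARAMS["num_train_epochs"],
            per_device_train_batch_size=HYPERPARAMS["per_device_train_batch_size"],
            gradient_accumulation_steps=HYPERPARAMS["gradient_accumulation_steps"],
            save_steps=HYPERPARAMS["save_steps"],
        ),
    )
    trainer.train()
```

The real script would be invoked via `python -m companion.forge.train` (or scripts/train.sh), which calls main().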

Task 5: Model Export

Files:

  • src/companion/forge/export.py - Export to GGUF
  • src/companion/forge/merge.py - Merge LoRA weights into base

Spec:

  • Merge LoRA weights into base model
  • Export to GGUF with Q4_K_M quantization
  • Save to ~/.companion/models/
  • Update config.json with new model path
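
The last bullet can be sketched as below. The "fine_tuning"/"model_path" key names are assumptions about the config.json layout, not confirmed fields:

```python
# Final export step: point config.json at the freshly quantized GGUF.
# Key names ("fine_tuning", "model_path") are assumed, not confirmed.
import json
from pathlib import Path

def register_model(config_path: Path, gguf_path: Path) -> dict:
    """Rewrite config.json so the backend picks up the new model file."""
    config = json.loads(config_path.read_text())
    config.setdefault("fine_tuning", {})["model_path"] = str(gguf_path)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```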

Task 6: Model Hot-Swap

Files:

  • Update src/companion/api.py - Add endpoint to reload model
  • src/companion/forge/reload.py - Model reloader utility

Spec:

  • /admin/reload-model endpoint (requires auth/local-only)
  • Gracefully unload old model, load new GGUF
  • Return status: success or error
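
The graceful-swap requirement boils down to: build the new model fully before dropping the old one, so a failed load leaves the serving model untouched. A sketch, with the loader callable standing in for llama.cpp/vLLM initialization:

```python
# Hot-swap sketch for reload.py: load-then-swap so a bad GGUF never
# takes down the currently served model.
from typing import Any, Callable

class ModelHolder:
    def __init__(self, loader: Callable[[str], Any], initial_path: str):
        self._loader = loader
        self.model = loader(initial_path)
        self.path = initial_path

    def reload(self, new_path: str) -> dict:
        """Swap in a new model; on failure keep serving the old one."""
        try:
            new_model = self._loader(new_path)  # may raise on a bad file
        except Exception as exc:
            return {"status": "error", "detail": str(exc), "model": self.path}
        # Old model is released for GC only after the new one is ready.
        self.model, self.path = new_model, new_path
        return {"status": "success", "model": new_path}
```

The /admin/reload-model endpoint would hold one ModelHolder and return its reload() dict as the response body.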

Task 7: Evaluation Framework

Files:

  • src/companion/forge/eval.py - Evaluate model on test prompts
  • tests/test_forge_eval.py - Evaluation tests

Spec:

  • Load test prompts (decision scenarios, relationship questions)
  • Generate responses from both base and fine-tuned model
  • Store outputs for human comparison
  • Track metrics: response time, token count
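
A side-by-side harness for the spec above: run each prompt through both models, record latency and a token count, and keep the paired outputs for human review. The generate() callables stand in for real inference backends, and the whitespace token count is a deliberate simplification:

```python
# Eval sketch: paired base-vs-tuned generations with simple metrics.
import time
from typing import Callable

def compare(prompts: list[str],
            base: Callable[[str], str],
            tuned: Callable[[str], str]) -> list[dict]:
    """Return one record per prompt with both outputs and per-model metrics."""
    records = []
    for prompt in prompts:
        row: dict = {"prompt": prompt}
        for name, generate in (("base", base), ("tuned", tuned)):
            start = time.perf_counter()
            text = generate(prompt)
            row[name] = {
                "response": text,
                "seconds": time.perf_counter() - start,
                "tokens": len(text.split()),  # crude whitespace proxy
            }
        records.append(row)
    return records
```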

Success Criteria

  • Extract 100+ reflection examples from vault
  • Curate down to 50-100 high-quality training examples
  • Complete training run in <6 hours on RTX 5070
  • Export produces valid GGUF file
  • Hot-swap endpoint successfully reloads model
  • Evaluation shows a distinguishable "Santhosh-style" voice in fine-tuned outputs

Dependencies

unsloth>=2024.1.0
torch>=2.1.0
transformers>=4.36.0
datasets>=2.14.0
peft>=0.7.0
accelerate>=0.25.0
bitsandbytes>=0.41.0
sentencepiece>=0.1.99
protobuf>=3.20.0

Commands

# Extract training data
python -m companion.forge.cli extract

# Curate examples
python -m companion.forge.cli curate

# Train
python -m companion.forge.train

# Export
python -m companion.forge.export

# Reload model in API
python -m companion.forge.reload