# FORGE Module Documentation

The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.

## Architecture

```
Vault Reflections
        ↓
┌─────────────────┐
│     Extract     │  - Scan for #reflection, #insight tags
│  (extract.py)   │  - Parse reflection patterns
└────────┬────────┘
         ↓
┌─────────────────┐
│     Curate      │  - Manual review (optional)
│   (curate.py)   │  - Deduplication
└────────┬────────┘
         ↓
┌─────────────────┐
│      Train      │  - QLoRA fine-tuning
│   (train.py)    │  - Unsloth + transformers
└────────┬────────┘
         ↓
┌─────────────────┐
│     Export      │  - Merge LoRA weights
│   (export.py)   │  - Convert to GGUF
└────────┬────────┘
         ↓
┌─────────────────┐
│     Reload      │  - Hot-swap in API
│   (reload.py)   │  - No restart needed
└─────────────────┘
```

## Requirements

- **GPU**: RTX 5070 or equivalent (12GB+ VRAM)
- **Dependencies**: Install with `pip install -e ".[train]"`
- **Time**: 4-6 hours for a full training run

## Workflow

### 1. Extract Training Data

Scan your vault for reflection patterns:

```bash
python -m companion.forge.cli extract
```

This scans for:

- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries

Output: `~/.companion/training_data/extracted.jsonl`

**Example extracted data:**

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful, reflective companion."},
    {"role": "user", "content": "I'm facing a decision. How should I think through this?"},
    {"role": "assistant", "content": "#reflection I think I need to slow down..."}
  ],
  "source_file": "Journal/2026/04/2026-04-12.md",
  "tags": ["#reflection", "#DayInShort"],
  "date": "2026-04-12"
}
```
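The tag-and-pattern scan can be sketched as a simple filter. This is a minimal illustration only, not the actual `extract.py` logic; the tag set and phrases are taken from the lists above:

```python
import re

# Illustrative subset of the documented reflection tags and phrase patterns.
REFLECTION_TAGS = {"#reflection", "#insight", "#learning", "#decision"}
REFLECTION_PATTERNS = [
    r"\bI think\b", r"\bI realize\b", r"\bLooking back\b", r"\bWhat if\b",
]

def is_reflection(line: str) -> bool:
    """Return True if a vault line carries a reflection tag or phrase."""
    if any(tag in line for tag in REFLECTION_TAGS):
        return True
    return any(re.search(p, line) for p in REFLECTION_PATTERNS)

notes = [
    "#reflection I think I need to slow down.",
    "Bought groceries today.",
    "Looking back, that decision paid off.",
]
hits = [n for n in notes if is_reflection(n)]  # matches the 1st and 3rd notes
```

The real extractor also groups matched lines into the chat-message JSONL format shown above before writing `extracted.jsonl`.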
### 2. Train Model

Run QLoRA fine-tuning:

```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```

**Hyperparameters (from config):**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch size |
| `gradient_accumulation_steps` | 4 | Steps before each optimizer update |

**Training Output:**

- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: training loss, eval metrics

### 3. Reload Model

Hot-swap without restarting the API:

```bash
python -m companion.forge.cli reload ~/.companion/training/final
```

Or via the API:

```bash
curl -X POST http://localhost:7373/admin/reload-model \
  -H "Content-Type: application/json" \
  -d '{"model_path": "~/.companion/training/final"}'
```

## Components

### Extractor (`companion.forge.extract`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.extract import TrainingDataExtractor, extract_training_data

config = load_config()

# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```

**Reflection Detection:**

- **Tags**: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- **Patterns**: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"

### Trainer (`companion.forge.train`)

```python
from pathlib import Path

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training"),
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)
```

**Base Models:**

- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - Recommended
- `unsloth/llama-3-8b-bnb-4bit` - Alternative

**Target Modules:** LoRA is applied to `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`.

### Exporter (`companion.forge.export`)

```python
from pathlib import Path

from companion.forge.export import merge_only

# Merge LoRA into base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500"),
    output_path=Path("~/.companion/models/merged"),
)
```

### Reloader (`companion.forge.reload`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.reload import reload_model, get_model_status

config = load_config()

# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with new model
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final"),
    backup=True,
)
```

## CLI Reference

```bash
# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.train \
    --data PATH \
    --output-dir PATH \
    --epochs N \
    --lr FLOAT

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```

**Note:** Use `--output-dir` (or `--output`) to specify the training output directory.
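The `size_mb` field reported by `get_model_status` can be understood as a recursive sum of file sizes under the model directory. The helper below is a hypothetical sketch of that computation, not the actual `companion.forge.reload` implementation:

```python
import tempfile
from pathlib import Path

def model_size_mb(model_dir: Path) -> float:
    """Sum the sizes of all files under model_dir, in MiB (assumed behavior)."""
    total = sum(f.stat().st_size for f in model_dir.rglob("*") if f.is_file())
    return total / (1024 * 1024)

# Demonstrate against a throwaway directory with one dummy 2 MiB weight file.
with tempfile.TemporaryDirectory() as d:
    dummy = Path(d) / "adapter.bin"
    dummy.write_bytes(b"\0" * (2 * 1024 * 1024))
    size = model_size_mb(Path(d))  # 2.0
```

A check like this is useful before a reload: a merged model directory that is only a few MB usually means the export step failed partway through.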
## Training Tips

**Dataset Size:**
- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better - quality over quantity

**Epochs:**
- Start with 3 epochs
- Increase if underfitting (loss stays high)
- Decrease if overfitting (eval loss increases)

**LoRA Rank:**
- `8` - Quick experiments
- `16` - Balanced (recommended)
- `32-64` - High capacity, more VRAM

**Overfitting Signs:**
- Training loss decreasing while eval loss increases
- Model repeats exact phrases from training data
- Responses feel "memorized", not "learned"

## VRAM Usage (RTX 5070, 12GB)

| Config | VRAM | Batch Size |
|--------|------|------------|
| Rank 16, 8-bit Adam | ~10GB | 4 |
| Rank 32, 8-bit Adam | ~11GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |

Use `gradient_accumulation_steps` to increase the effective batch size.

## Troubleshooting

**GPU Not Detected / CUDA Not Available**
- See [GPU Compatibility Guide](gpu-compatibility.md)
- Common on RTX 50-series: install CUDA-enabled PyTorch: `pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`
- Verify: `python -c "import torch; print(torch.cuda.is_available())"`

**CUDA Out of Memory**
- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`

**Training Loss Not Decreasing**
- Check data quality (reflections present?)
- Increase learning rate to 5e-4
- Check for data formatting issues

**Model Not Loading After Reload**
- Check the path exists: `ls -la ~/.companion/models/`
- Verify model format (GGUF vs HF)
- Check API logs for errors

**Slow Training**
- Expected: ~6 hours for 3 epochs on RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications

## Advanced: Custom Training Script

```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config

config = load_config()

final_path = train(
    data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
    output_dir=config.model.fine_tuning.output_dir,
    base_model=config.model.fine_tuning.base_model,
    lora_rank=32,                    # Higher capacity
    lora_alpha=64,
    learning_rate=3e-4,              # Slightly higher
    num_epochs=5,                    # More epochs
    batch_size=2,                    # Smaller batches
    gradient_accumulation_steps=8,   # Effective batch = 16
)

print(f"Model saved to: {final_path}")
```
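Many "training loss not decreasing" cases trace back to malformed JSONL, so it can pay to sanity-check the curated file before a multi-hour run. The validator below is a hypothetical helper (not part of the FORGE API) that checks each line against the message schema shown in the extraction example:

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_example(raw: str) -> bool:
    """Check one JSONL line: parseable, non-empty messages, known roles, non-empty content."""
    try:
        ex = json.loads(raw)
    except json.JSONDecodeError:
        return False
    msgs = ex.get("messages", [])
    if not msgs:
        return False
    return all(m.get("role") in VALID_ROLES and m.get("content") for m in msgs)

good = '{"messages": [{"role": "user", "content": "hi"}], "tags": []}'
bad = '{"messages": [{"role": "narrator", "content": "hi"}]}'
# validate_example(good) -> True, validate_example(bad) -> False
```

Running a pass like this over `curated.jsonl` and dropping failing lines costs seconds and can save an entire wasted training run.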