# FORGE Module Documentation
The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.
## Architecture

```
Vault Reflections
        ↓
┌─────────────────┐
│     Extract     │  - Scan for #reflection, #insight tags
│  (extract.py)   │  - Parse reflection patterns
└────────┬────────┘
         ↓
┌─────────────────┐
│     Curate      │  - Manual review (optional)
│  (curate.py)    │  - Deduplication
└────────┬────────┘
         ↓
┌─────────────────┐
│     Train       │  - QLoRA fine-tuning
│   (train.py)    │  - Unsloth + transformers
└────────┬────────┘
         ↓
┌─────────────────┐
│     Export      │  - Merge LoRA weights
│  (export.py)    │  - Convert to GGUF
└────────┬────────┘
         ↓
┌─────────────────┐
│     Reload      │  - Hot-swap in API
│  (reload.py)    │  - No restart needed
└─────────────────┘
```
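The Curate stage's deduplication could be sketched as hashing normalized assistant text and keeping the first occurrence (an illustration only; `curate.py`'s actual logic may differ):

```python
import hashlib


def dedupe_examples(examples):
    """Drop examples whose normalized assistant text hashes identically."""
    seen = set()
    unique = []
    for ex in examples:
        text = " ".join(
            m["content"].strip().lower()
            for m in ex["messages"]
            if m["role"] == "assistant"
        )
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(ex)
    return unique
```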
## Requirements

- GPU: RTX 5070 or equivalent (12 GB+ VRAM)
- Dependencies: install with `pip install -e ".[train]"`
- Time: 4-6 hours for a full training run
## Workflow

### 1. Extract Training Data

Scan your vault for reflection patterns:

```bash
python -m companion.forge.cli extract
```

This scans for:

- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries

Output: `~/.companion/training_data/extracted.jsonl`
Example extracted data:

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful, reflective companion."},
    {"role": "user", "content": "I'm facing a decision. How should I think through this?"},
    {"role": "assistant", "content": "#reflection I think I need to slow down..."}
  ],
  "source_file": "Journal/2026/04/2026-04-12.md",
  "tags": ["#reflection", "#DayInShort"],
  "date": "2026-04-12"
}
```
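Once extracted, the JSONL file can be inspected with a few lines of standard-library Python (a sketch assuming the record shape shown above):

```python
import json
from collections import Counter
from pathlib import Path


def tag_counts(jsonl_path):
    """Count how often each tag appears across extracted examples."""
    counts = Counter()
    with Path(jsonl_path).expanduser().open() as f:
        for line in f:
            record = json.loads(line)
            counts.update(record.get("tags", []))
    return counts


# e.g. tag_counts("~/.companion/training_data/extracted.jsonl")
```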
### 2. Train Model

Run QLoRA fine-tuning:

```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```

Hyperparameters (from config):

| Parameter | Default | Description |
|---|---|---|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch size |
| `gradient_accumulation_steps` | 4 | Steps accumulated before each optimizer update |

Training output:

- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: training loss, eval metrics
### 3. Reload Model

Hot-swap without restarting the API:

```bash
python -m companion.forge.cli reload ~/.companion/training/final
```

Or via the API:

```bash
curl -X POST http://localhost:7373/admin/reload-model \
  -H "Content-Type: application/json" \
  -d '{"model_path": "~/.companion/training/final"}'
```
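The same reload call can be issued from Python with the standard library (a sketch against the `/admin/reload-model` endpoint shown above; error handling omitted):

```python
import json
import urllib.request


def build_reload_request(model_path, base_url="http://localhost:7373"):
    """Build the POST request for the admin reload endpoint."""
    body = json.dumps({"model_path": model_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/admin/reload-model",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# req = build_reload_request("~/.companion/training/final")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```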
## Components

### Extractor (`companion.forge.extract`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.extract import TrainingDataExtractor, extract_training_data

config = load_config()

# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```

Reflection detection:

- Tags: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- Patterns: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"
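The tag and phrase detection above could be approximated with two regular expressions (an illustrative sketch, not the extractor's actual implementation):

```python
import re

REFLECTION_TAGS = re.compile(
    r"#(reflection|learning|insight|decision|analysis|takeaway|realization)\b"
)
REFLECTION_PHRASES = re.compile(
    r"\b(I think|I feel|I realize|I wonder|Looking back|On one hand|Ultimately decided)\b",
    re.IGNORECASE,
)


def looks_reflective(text):
    """True if the text carries a reflection tag or a reflective phrase."""
    return bool(REFLECTION_TAGS.search(text) or REFLECTION_PHRASES.search(text))
```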
### Trainer (`companion.forge.train`)

```python
from pathlib import Path

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training"),
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)
```

Base models:

- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - recommended
- `unsloth/llama-3-8b-bnb-4bit` - alternative

Target modules:

LoRA is applied to: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
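For intuition about what a given rank costs, the trainable adapter size for these target modules can be computed by hand (a sketch assuming standard Llama-3.1-8B dimensions: hidden size 4096, MLP size 14336, 32 layers, and 1024-dim k/v projections under grouped-query attention):

```python
def lora_param_count(rank, num_layers=32, hidden=4096, mlp=14336, kv=1024):
    """Trainable LoRA parameters: each adapted weight W (d_out x d_in)
    gains A (rank x d_in) and B (d_out x rank), i.e. rank * (d_in + d_out)."""
    per_layer = (
        rank * (hidden + hidden)    # q_proj
        + rank * (hidden + kv)      # k_proj
        + rank * (hidden + kv)      # v_proj
        + rank * (hidden + hidden)  # o_proj
        + rank * (hidden + mlp)     # gate_proj
        + rank * (hidden + mlp)     # up_proj
        + rank * (mlp + hidden)     # down_proj
    )
    return num_layers * per_layer


# Rank 16 across all seven modules is roughly 42M trainable parameters;
# doubling the rank doubles the count.
```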
### Exporter (`companion.forge.export`)

```python
from pathlib import Path

from companion.forge.export import merge_only

# Merge LoRA into base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500"),
    output_path=Path("~/.companion/models/merged"),
)
```
### Reloader (`companion.forge.reload`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.reload import reload_model, get_model_status

config = load_config()

# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with new model
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final"),
    backup=True,
)
```
## CLI Reference

```bash
# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.cli train \
    [--data PATH] \
    [--output PATH] \
    [--epochs N] \
    [--lr FLOAT]

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```
## Training Tips

Dataset Size:

- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better: quality over quantity

Epochs:

- Start with 3 epochs
- Increase if underfitting (training loss stays high)
- Decrease if overfitting (eval loss rises while training loss falls)

LoRA Rank:

- `8` - quick experiments
- `16` - balanced (recommended)
- `32`-`64` - higher capacity, more VRAM

Overfitting Signs:

- Training loss decreasing while eval loss increases
- Model repeats exact phrases from the training data
- Responses feel "memorized", not "learned"
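The first sign, eval loss climbing while training loss keeps falling, can be checked numerically; a hypothetical helper (not part of the FORGE API):

```python
def shows_overfitting(train_losses, eval_losses, window=3):
    """True if training loss is still falling while eval loss has
    risen over the last `window` evaluations."""
    if len(train_losses) < window + 1 or len(eval_losses) < window + 1:
        return False
    train_falling = train_losses[-1] < train_losses[-1 - window]
    eval_rising = eval_losses[-1] > eval_losses[-1 - window]
    return train_falling and eval_rising
```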
## VRAM Usage (RTX 5070, 12 GB)

| Config | VRAM | Batch Size |
|---|---|---|
| Rank 16, 8-bit Adam | ~10 GB | 4 |
| Rank 32, 8-bit Adam | ~11 GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |

Use `gradient_accumulation_steps` to increase the effective batch size.
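The effective batch size is simply the per-device batch times the accumulation steps (times the GPU count); a one-line sketch:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Samples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_gpus


# With the config defaults (batch_size=4, gradient_accumulation_steps=4):
# effective_batch_size(4, 4) == 16
```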
## Troubleshooting

### CUDA Out of Memory

- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`

### Training Loss Not Decreasing

- Check data quality (are reflections actually present?)
- Increase the learning rate to 5e-4
- Check for data formatting issues

### Model Not Loading After Reload

- Check that the path exists: `ls -la ~/.companion/models/`
- Verify the model format (GGUF vs. HF)
- Check the API logs for errors

### Slow Training

- Expected: ~6 hours for 3 epochs on an RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications
## Advanced: Custom Training Script

```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config

config = load_config()

final_path = train(
    data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
    output_dir=config.model.fine_tuning.output_dir,
    base_model=config.model.fine_tuning.base_model,
    lora_rank=32,                   # Higher capacity
    lora_alpha=64,
    learning_rate=3e-4,             # Slightly higher
    num_epochs=5,                   # More epochs
    batch_size=2,                   # Smaller batches
    gradient_accumulation_steps=8,  # Effective batch = 16
)
print(f"Model saved to: {final_path}")
```