# FORGE Module Documentation

The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.

## Architecture

```
Vault Reflections
         ↓
┌─────────────────┐
│    Extract      │  - Scan for #reflection, #insight tags
│  (extract.py)   │  - Parse reflection patterns
└────────┬────────┘
         ↓
┌─────────────────┐
│    Curate       │  - Manual review (optional)
│  (curate.py)    │  - Deduplication
└────────┬────────┘
         ↓
┌─────────────────┐
│    Train        │  - QLoRA fine-tuning
│  (train.py)     │  - Unsloth + transformers
└────────┬────────┘
         ↓
┌─────────────────┐
│    Export       │  - Merge LoRA weights
│  (export.py)    │  - Convert to GGUF
└────────┬────────┘
         ↓
┌─────────────────┐
│    Reload       │  - Hot-swap in API
│  (reload.py)    │  - No restart needed
└─────────────────┘
```

## Requirements

- **GPU**: RTX 5070 or equivalent (12GB+ VRAM)
- **Dependencies**: Install with `pip install -e ".[train]"`
- **Time**: 4-6 hours for a full training run

## Workflow

### 1. Extract Training Data

Scan your vault for reflection patterns:

```bash
python -m companion.forge.cli extract
```

This scans for:

- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries

Output: `~/.companion/training_data/extracted.jsonl`

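The tag-and-pattern matching can be sketched roughly like this. This is a simplified stand-in, not the actual `extract.py` logic; only the tag names and phrases are taken from the list above:

```python
import re

# Subset of the tags and opening phrases listed above (for illustration)
REFLECTION_TAGS = {"#reflection", "#insight", "#learning", "#decision"}
REFLECTION_PATTERNS = [r"\bI think\b", r"\bI realize\b", r"\bLooking back\b", r"\bWhat if\b"]

def looks_like_reflection(text: str) -> bool:
    """Return True if a note paragraph carries a reflection tag or phrase."""
    if set(text.split()) & REFLECTION_TAGS:
        return True
    return any(re.search(p, text) for p in REFLECTION_PATTERNS)
```

For example, `looks_like_reflection("#reflection I think I need to slow down")` matches on both the tag and the phrase.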
**Example extracted data:**

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful, reflective companion."},
    {"role": "user", "content": "I'm facing a decision. How should I think through this?"},
    {"role": "assistant", "content": "#reflection I think I need to slow down..."}
  ],
  "source_file": "Journal/2026/04/2026-04-12.md",
  "tags": ["#reflection", "#DayInShort"],
  "date": "2026-04-12"
}
```

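Each line of the JSONL file is one standalone JSON object in the shape shown above, so loading it is a line-by-line parse. A minimal sketch (the module's actual loader may differ):

```python
import json
from pathlib import Path

def load_examples(path: Path) -> list[dict]:
    """Parse a JSONL training file: one chat example per non-blank line."""
    examples = []
    with path.open() as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                examples.append(json.loads(line))
    return examples
```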
### 2. Train Model

Run QLoRA fine-tuning:

```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```

**Hyperparameters (from config):**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch size |
| `gradient_accumulation_steps` | 4 | Gradient steps accumulated before each optimizer update |

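With these defaults, the optimizer sees an effective batch of `batch_size × gradient_accumulation_steps` sequences per update. A quick sanity check (the dataset size here is illustrative, taken from the Training Tips range below):

```python
import math

# Defaults from the table above
batch_size = 4                     # per-device batch
gradient_accumulation_steps = 4    # gradients summed before each update

effective_batch = batch_size * gradient_accumulation_steps
print(effective_batch)  # 16 sequences per optimizer step

# Illustrative: optimizer steps per epoch for a 300-example dataset
num_examples = 300
steps_per_epoch = math.ceil(num_examples / effective_batch)
print(steps_per_epoch)  # 19
```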
**Training Output:**

- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: Training loss, eval metrics

### 3. Reload Model

Hot-swap the model without restarting the API:

```bash
python -m companion.forge.cli reload ~/.companion/training/final
```

Or via the API:

```bash
curl -X POST http://localhost:7373/admin/reload-model \
  -H "Content-Type: application/json" \
  -d '{"model_path": "~/.companion/training/final"}'
```

## Components

### Extractor (`companion.forge.extract`)

```python
from pathlib import Path

from companion.forge.extract import TrainingDataExtractor, extract_training_data

# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```

**Reflection Detection:**

- **Tags**: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- **Patterns**: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"

### Trainer (`companion.forge.train`)

```python
from pathlib import Path

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training"),
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)
```

**Base Models:**

- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - Recommended
- `unsloth/llama-3-8b-bnb-4bit` - Alternative

**Target Modules:**

LoRA is applied to: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`

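Given those target modules, you can estimate how many trainable parameters the adapter adds. This is back-of-envelope arithmetic (not output of the training code), assuming the published Llama-3.1-8B dimensions: hidden size 4096, MLP size 14336, 32 layers, and a KV projection width of 1024 under grouped-query attention. Each adapted linear layer (d_in → d_out) contributes r × (d_in + d_out) parameters (an A matrix of d_in × r plus a B matrix of r × d_out):

```python
# Assumed Llama-3.1-8B dimensions (see lead-in)
HIDDEN, MLP, KV, LAYERS = 4096, 14336, 1024, 32
r = 16  # default lora_rank

# (d_in, d_out) for each target module
modules = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

per_layer = sum(r * (d_in + d_out) for d_in, d_out in modules.values())
total = per_layer * LAYERS
print(f"{total / 1e6:.1f}M trainable parameters")
```

At rank 16 this works out to roughly 42M trainable parameters, a small fraction of the 8B base model, which is why the adapter fits comfortably alongside the 4-bit base weights.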
### Exporter (`companion.forge.export`)

```python
from pathlib import Path

from companion.forge.export import merge_only

# Merge LoRA into base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500"),
    output_path=Path("~/.companion/models/merged"),
)
```

### Reloader (`companion.forge.reload`)

```python
from pathlib import Path

from companion.forge.reload import reload_model, get_model_status

# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with new model
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final"),
    backup=True,
)
```

## CLI Reference

```bash
# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.cli train \
    [--data PATH] \
    [--output PATH] \
    [--epochs N] \
    [--lr FLOAT]

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```

## Training Tips

**Dataset Size:**

- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better - quality over quantity

**Epochs:**

- Start with 3 epochs
- Increase if underfitting (training loss stays high)
- Decrease if overfitting (eval loss starts increasing)

**LoRA Rank:**

- `8` - Quick experiments
- `16` - Balanced (recommended)
- `32`-`64` - Higher capacity, more VRAM

**Overfitting Signs:**

- Training loss decreasing while eval loss increases
- Model repeats exact phrases from the training data
- Responses feel "memorized" rather than "learned"

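The first overfitting sign (training loss falling while eval loss rises) can be checked mechanically. A minimal sketch, assuming you log per-epoch losses yourself; this helper is not part of the FORGE module:

```python
def looks_overfit(train_losses: list[float], eval_losses: list[float]) -> bool:
    """Flag the classic divergence: train loss still falling, eval loss rising."""
    if len(train_losses) < 2 or len(eval_losses) < 2:
        return False  # not enough history to compare
    train_falling = train_losses[-1] < train_losses[-2]
    eval_rising = eval_losses[-1] > eval_losses[-2]
    return train_falling and eval_rising
```

For example, `looks_overfit([1.2, 0.8, 0.5], [1.1, 1.0, 1.15])` flags divergence, while a run where both losses keep falling does not.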
## VRAM Usage (RTX 5070, 12GB)

| Config | VRAM | Batch Size |
|--------|------|------------|
| Rank 16, 8-bit Adam | ~10GB | 4 |
| Rank 32, 8-bit Adam | ~11GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |

Use `gradient_accumulation_steps` to increase the effective batch size.

## Troubleshooting

**CUDA Out of Memory**

- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`

**Training Loss Not Decreasing**

- Check data quality (are reflections actually present?)
- Increase the learning rate to 5e-4
- Check for data formatting issues

**Model Not Loading After Reload**

- Check that the path exists: `ls -la ~/.companion/models/`
- Verify the model format (GGUF vs HF)
- Check the API logs for errors

**Slow Training**

- Expected: ~6 hours for 3 epochs on an RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications

## Advanced: Custom Training Script

```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config

config = load_config()

final_path = train(
    data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
    output_dir=config.model.fine_tuning.output_dir,
    base_model=config.model.fine_tuning.base_model,
    lora_rank=32,                   # Higher capacity
    lora_alpha=64,
    learning_rate=3e-4,             # Slightly higher
    num_epochs=5,                   # More epochs
    batch_size=2,                   # Smaller batches
    gradient_accumulation_steps=8,  # Effective batch = 16
)

print(f"Model saved to: {final_path}")
```