docs: add comprehensive README and module documentation

Commit e77fa69b31 (parent 47ac2f36e0), 2026-04-13 15:35:22 -04:00
6 changed files with 2117 additions and 0 deletions

docs/forge.md (new file, 288 lines)

# FORGE Module Documentation
The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.
## Architecture
```
  Vault Reflections
          │
          ▼
 ┌─────────────────┐
 │     Extract     │  - Scan for #reflection, #insight tags
 │   (extract.py)  │  - Parse reflection patterns
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │     Curate      │  - Manual review (optional)
 │   (curate.py)   │  - Deduplication
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │      Train      │  - QLoRA fine-tuning
 │   (train.py)    │  - Unsloth + transformers
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │     Export      │  - Merge LoRA weights
 │   (export.py)   │  - Convert to GGUF
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │     Reload      │  - Hot-swap in API
 │   (reload.py)   │  - No restart needed
 └─────────────────┘
```
## Requirements
- **GPU**: RTX 5070 or equivalent (12GB+ VRAM)
- **Dependencies**: Install with `pip install -e ".[train]"`
- **Time**: 4-6 hours for full training run
## Workflow
### 1. Extract Training Data
Scan your vault for reflection patterns:
```bash
python -m companion.forge.cli extract
```
This scans for:
- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries
Output: `~/.companion/training_data/extracted.jsonl`
**Example extracted data:**
```json
{
"messages": [
{"role": "system", "content": "You are a thoughtful, reflective companion."},
{"role": "user", "content": "I'm facing a decision. How should I think through this?"},
{"role": "assistant", "content": "#reflection I think I need to slow down..."}
],
"source_file": "Journal/2026/04/2026-04-12.md",
"tags": ["#reflection", "#DayInShort"],
"date": "2026-04-12"
}
```
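Each example above is one line of `extracted.jsonl`. A minimal loader sketch (a hypothetical helper, not part of the module):

```python
import json
from pathlib import Path

def load_examples(path: Path) -> list[dict]:
    """Read a JSONL file: one training example per non-blank line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```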
### 2. Train Model
Run QLoRA fine-tuning:
```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```
**Hyperparameters (from config):**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch |
| `gradient_accumulation_steps` | 4 | Steps before update |
**Training Output:**
- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: Training loss, eval metrics
### 3. Reload Model
Hot-swap without restarting API:
```bash
python -m companion.forge.cli reload ~/.companion/training/final
```
Or via API:
```bash
curl -X POST http://localhost:7373/admin/reload-model \
-H "Content-Type: application/json" \
-d '{"model_path": "~/.companion/training/final"}'
```
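The same request from Python's standard library (a sketch; it assumes the API above is listening on port 7373):

```python
import json
import urllib.request

# Build the same request as the curl example above.
payload = json.dumps({"model_path": "~/.companion/training/final"}).encode()
req = urllib.request.Request(
    "http://localhost:7373/admin/reload-model",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment once the API is running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```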
## Components
### Extractor (`companion.forge.extract`)
```python
from pathlib import Path

from companion.forge.extract import TrainingDataExtractor, extract_training_data

# Extract from vault (config comes from companion.config.load_config)
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```
**Reflection Detection:**
- **Tags**: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- **Patterns**: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"
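A minimal sketch of how detection like this can be implemented (illustrative only; the module's actual rules live in `companion.forge.extract`):

```python
import re

# Illustrative rule sets; the real extractor may use different ones.
REFLECTION_TAGS = {"#reflection", "#learning", "#insight", "#decision",
                   "#analysis", "#takeaway", "#realization"}
REFLECTION_PATTERNS = re.compile(
    r"\b(I think|I feel|I realize|I wonder|Looking back|On one hand|Ultimately decided)\b",
    re.IGNORECASE,
)

def is_reflective(text: str) -> bool:
    """Return True if a vault paragraph looks like a reflection."""
    if set(text.split()) & REFLECTION_TAGS:
        return True
    return bool(REFLECTION_PATTERNS.search(text))
```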
### Trainer (`companion.forge.train`)
```python
from pathlib import Path

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training").expanduser(),
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)
```
**Base Models:**
- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - Recommended
- `unsloth/llama-3-8b-bnb-4bit` - Alternative
**Target Modules:**
LoRA is applied to: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
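As a plain-data sketch, the settings above combine into a config like this (names mirror the hyperparameter table; the trainer builds the real configuration internally via Unsloth):

```python
# Illustrative adapter settings, written out as documentation-as-code;
# the trainer constructs the actual config itself.
lora_config = {
    "r": 16,            # lora_rank
    "lora_alpha": 32,   # scaling factor (here 2x the rank)
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
}
```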
### Exporter (`companion.forge.export`)
```python
from pathlib import Path

from companion.forge.export import merge_only

# Merge LoRA weights into the base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500").expanduser(),
    output_path=Path("~/.companion/models/merged").expanduser(),
)
```
### Reloader (`companion.forge.reload`)
```python
from pathlib import Path

from companion.forge.reload import reload_model, get_model_status

# Check the currently loaded model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with the newly trained model (backs up the old one first)
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final").expanduser(),
    backup=True,
)
```
## CLI Reference
```bash
# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.cli train \
    [--data PATH] \
    [--output PATH] \
    [--epochs N] \
    [--lr FLOAT]

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```
## Training Tips
**Dataset Size:**
- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better - quality over quantity
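Quality over quantity is largely what the Curate stage enforces. A minimal dedup sketch (hypothetical helper, keyed on the assistant reply):

```python
import hashlib

def dedupe(examples: list[dict]) -> list[dict]:
    """Drop examples whose final (assistant) message is byte-identical."""
    seen: set[str] = set()
    unique = []
    for ex in examples:
        text = ex["messages"][-1]["content"]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(ex)
    return unique
```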
**Epochs:**
- Start with 3 epochs
- Increase if underfitting (high loss)
- Decrease if overfitting (loss increases on eval)
**LoRA Rank:**
- `8` - Quick experiments
- `16` - Balanced (recommended)
- `32-64` - High capacity, more VRAM
**Overfitting Signs:**
- Training loss decreasing, eval loss increasing
- Model repeats exact phrases from training data
- Responses feel "memorized" not "learned"
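The first sign can be checked mechanically from the logged losses (hypothetical helper, not part of the module):

```python
def looks_overfit(train_losses: list[float], eval_losses: list[float]) -> bool:
    """True when train loss is still falling but eval loss has turned upward."""
    train_falling = train_losses[-1] < train_losses[0]
    eval_rising = eval_losses[-1] > min(eval_losses)
    return train_falling and eval_rising
```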
## VRAM Usage (RTX 5070, 12GB)
| Config | VRAM | Batch Size |
|--------|------|------------|
| Rank 16, 8-bit adam | ~10GB | 4 |
| Rank 32, 8-bit adam | ~11GB | 4 |
| Rank 64, 8-bit adam | OOM | - |
Use `gradient_accumulation_steps` to increase effective batch size.
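The arithmetic behind that knob, as a sketch using the defaults from the hyperparameter table:

```python
# Effective batch = per-device batch x gradient accumulation steps.
batch_size = 4
gradient_accumulation_steps = 4
effective_batch = batch_size * gradient_accumulation_steps

# Halving batch_size and doubling accumulation keeps the same effective
# batch while cutting activation memory roughly in half.
print(effective_batch)  # 16
```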
## Troubleshooting
**CUDA Out of Memory**
- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`
**Training Loss Not Decreasing**
- Check data quality (reflections present?)
- Increase learning rate to 5e-4
- Check for data formatting issues
**Model Not Loading After Reload**
- Check path exists: `ls -la ~/.companion/models/`
- Verify model format (GGUF vs HF)
- Check API logs for errors
**Slow Training**
- Expected: ~6 hours for 3 epochs on RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications
## Advanced: Custom Training Script
```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config
config = load_config()
final_path = train(
data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
output_dir=config.model.fine_tuning.output_dir,
base_model=config.model.fine_tuning.base_model,
lora_rank=32, # Higher capacity
lora_alpha=64,
learning_rate=3e-4, # Slightly higher
num_epochs=5, # More epochs
batch_size=2, # Smaller batches
gradient_accumulation_steps=8, # Effective batch = 16
)
print(f"Model saved to: {final_path}")
```