# FORGE Module Documentation

The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.

## Architecture

```
Vault Reflections
        ↓
┌─────────────────┐
│     Extract     │   - Scan for #reflection, #insight tags
│  (extract.py)   │   - Parse reflection patterns
└────────┬────────┘
         ↓
┌─────────────────┐
│     Curate      │   - Manual review (optional)
│   (curate.py)   │   - Deduplication
└────────┬────────┘
         ↓
┌─────────────────┐
│      Train      │   - QLoRA fine-tuning
│   (train.py)    │   - Unsloth + transformers
└────────┬────────┘
         ↓
┌─────────────────┐
│     Export      │   - Merge LoRA weights
│   (export.py)   │   - Convert to GGUF
└────────┬────────┘
         ↓
┌─────────────────┐
│     Reload      │   - Hot-swap in API
│   (reload.py)   │   - No restart needed
└─────────────────┘
```

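The stages above map one-to-one onto CLI subcommands, so a full run can be scripted. A minimal sketch, assuming only the documented `python -m companion.forge.cli <subcommand>` invocation (the runner itself is hypothetical, not part of the module):

```python
import subprocess
import sys

# Each pipeline stage is one documented CLI subcommand plus its arguments.
STAGES = [
    ["extract"],
    ["train", "--epochs", "3", "--lr", "2e-4"],
]

def build_commands(stages=STAGES):
    """Build the full command line for each pipeline stage."""
    return [[sys.executable, "-m", "companion.forge.cli", *stage] for stage in stages]

def run_pipeline(stages=STAGES):
    """Run the stages in order, stopping at the first failure."""
    for cmd in build_commands(stages):
        subprocess.run(cmd, check=True)
```

`check=True` makes a failed stage raise `CalledProcessError`, so a broken extract step never feeds bad data into training.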
## Requirements

- **GPU**: RTX 5070 or equivalent (12GB+ VRAM)
- **Dependencies**: Install with `pip install -e ".[train]"`
- **Time**: 4-6 hours for a full training run

## Workflow

### 1. Extract Training Data

Scan your vault for reflection patterns:

```bash
python -m companion.forge.cli extract
```

This scans for:

- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries

Output: `~/.companion/training_data/extracted.jsonl`

**Example extracted data:**

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful, reflective companion."},
    {"role": "user", "content": "I'm facing a decision. How should I think through this?"},
    {"role": "assistant", "content": "#reflection I think I need to slow down..."}
  ],
  "source_file": "Journal/2026/04/2026-04-12.md",
  "tags": ["#reflection", "#DayInShort"],
  "date": "2026-04-12"
}
```
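
Each line of the output file is one standalone JSON object in the shape above, so the dataset can be inspected with a few lines of stdlib Python. A sketch (the `load_examples` helper is illustrative, not part of the module):

```python
import json
from pathlib import Path

def load_examples(path):
    """Read one chat-format training example per line from a JSONL file."""
    examples = []
    with Path(path).expanduser().open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate blank lines
                examples.append(json.loads(line))
    return examples

# Example: count examples and peek at the roles in the first one.
# examples = load_examples("~/.companion/training_data/extracted.jsonl")
# roles = [m["role"] for m in examples[0]["messages"]]
```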

### 2. Train Model

Run QLoRA fine-tuning:

```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```

**Hyperparameters (from config):**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch size |
| `gradient_accumulation_steps` | 4 | Steps before each optimizer update |

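With the defaults above, gradients are accumulated over 4 steps before each optimizer update, so the effective batch size is `batch_size * gradient_accumulation_steps`. A quick sanity check:

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return batch_size * gradient_accumulation_steps

# Defaults from the table: 4 * 4 = 16 examples per update.
```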

**Training Output:**
- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: training loss, eval metrics

### 3. Reload Model

Hot-swap the model without restarting the API:

```bash
python -m companion.forge.cli reload ~/.companion/training/final
```

Or via the API:

```bash
curl -X POST http://localhost:7373/admin/reload-model \
  -H "Content-Type: application/json" \
  -d '{"model_path": "~/.companion/training/final"}'
```

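The same endpoint can be called from Python. One caveat worth handling: `~` inside a JSON payload may not be expanded server-side, so expand it client-side first. A stdlib sketch (the endpoint and port are the ones shown above; the helper itself is hypothetical):

```python
import json
from pathlib import Path
from urllib import request

def build_reload_request(model_path, base_url="http://localhost:7373"):
    """Build a POST request for /admin/reload-model with ~ expanded client-side."""
    payload = {"model_path": str(Path(model_path).expanduser())}
    return request.Request(
        f"{base_url}/admin/reload-model",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it (requires the API to be running):
# with request.urlopen(build_reload_request("~/.companion/training/final")) as resp:
#     print(resp.status)
```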
## Components

### Extractor (`companion.forge.extract`)

```python
from pathlib import Path

from companion.forge.extract import TrainingDataExtractor, extract_training_data

# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```

**Reflection Detection:**

- **Tags**: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- **Patterns**: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"

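The detection rules above amount to tag lookup plus phrase matching. A simplified, self-contained sketch of the idea (the real extractor's logic may differ):

```python
import re

REFLECTION_TAGS = {"#reflection", "#learning", "#insight", "#decision",
                   "#analysis", "#takeaway", "#realization"}
REFLECTION_PATTERNS = [r"\bI think\b", r"\bI feel\b", r"\bI realize\b",
                       r"\bI wonder\b", r"\bLooking back\b",
                       r"\bOn one hand\b", r"\bUltimately decided\b"]
_PATTERN_RE = re.compile("|".join(REFLECTION_PATTERNS), re.IGNORECASE)

def looks_reflective(text: str) -> bool:
    """True if the text carries a reflection tag or a reflective phrase."""
    tags = set(re.findall(r"#\w+", text))
    return bool(tags & REFLECTION_TAGS) or bool(_PATTERN_RE.search(text))
```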
### Trainer (`companion.forge.train`)

```python
from pathlib import Path

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training").expanduser(),  # expand ~ explicitly
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)
```

**Base Models:**

- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - Recommended
- `unsloth/llama-3-8b-bnb-4bit` - Alternative

**Target Modules:**

LoRA is applied to: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`

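With the Hugging Face `peft` library, that target-module list corresponds to a `LoraConfig` roughly like the following. This is illustrative only: Unsloth configures LoRA through its own `FastLanguageModel.get_peft_model` wrapper, and the exact arguments used by `train()` may differ.

```python
from peft import LoraConfig

# Illustrative config fragment matching the documented rank/alpha/targets.
lora_config = LoraConfig(
    r=16,                       # lora_rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```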
### Exporter (`companion.forge.export`)

```python
from pathlib import Path

from companion.forge.export import merge_only

# Merge LoRA into base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500").expanduser(),
    output_path=Path("~/.companion/models/merged").expanduser(),
)
```

### Reloader (`companion.forge.reload`)

```python
from pathlib import Path

from companion.forge.reload import reload_model, get_model_status

# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with new model
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final").expanduser(),
    backup=True,
)
```

## CLI Reference

```bash
# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.cli train \
  [--data PATH] \
  [--output PATH] \
  [--epochs N] \
  [--lr FLOAT]

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```

## Training Tips

**Dataset Size:**
- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better - quality over quantity

**Epochs:**
- Start with 3 epochs
- Increase if underfitting (training loss stays high)
- Decrease if overfitting (eval loss increases)

**LoRA Rank:**
- `8` - Quick experiments
- `16` - Balanced (recommended)
- `32-64` - High capacity, more VRAM

**Overfitting Signs:**
- Training loss decreasing while eval loss increases
- Model repeats exact phrases from training data
- Responses feel "memorized", not "learned"

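The first sign can be checked mechanically from the logged loss curves. A small sketch (a hypothetical helper, not part of the module):

```python
def eval_loss_rising(eval_losses, patience: int = 2) -> bool:
    """True if eval loss has increased for `patience` consecutive evaluations,
    a common trigger for early stopping."""
    if len(eval_losses) <= patience:
        return False
    recent = eval_losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```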
## VRAM Usage (RTX 5070, 12GB)

| Config | VRAM | Batch Size |
|--------|------|------------|
| Rank 16, 8-bit Adam | ~10GB | 4 |
| Rank 32, 8-bit Adam | ~11GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |

Use `gradient_accumulation_steps` to increase the effective batch size.

## Troubleshooting

**CUDA Out of Memory**
- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`

**Training Loss Not Decreasing**
- Check data quality (are reflections actually present?)
- Increase the learning rate to 5e-4
- Check for data formatting issues

**Model Not Loading After Reload**
- Check that the path exists: `ls -la ~/.companion/models/`
- Verify the model format (GGUF vs HF)
- Check the API logs for errors

**Slow Training**
- Expected: ~6 hours for 3 epochs on an RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications

## Advanced: Custom Training Script

```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config

config = load_config()

final_path = train(
    data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
    output_dir=config.model.fine_tuning.output_dir,
    base_model=config.model.fine_tuning.base_model,
    lora_rank=32,                    # Higher capacity
    lora_alpha=64,
    learning_rate=3e-4,              # Slightly higher
    num_epochs=5,                    # More epochs
    batch_size=2,                    # Smaller batches
    gradient_accumulation_steps=8,   # Effective batch = 16
)

print(f"Model saved to: {final_path}")
```