# FORGE Module Documentation

The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.

## Architecture

```
Vault Reflections
        ↓
┌─────────────────┐
│     Extract     │  - Scan for #reflection, #insight tags
│  (extract.py)   │  - Parse reflection patterns
└────────┬────────┘
         ↓
┌─────────────────┐
│     Curate      │  - Manual review (optional)
│   (curate.py)   │  - Deduplication
└────────┬────────┘
         ↓
┌─────────────────┐
│      Train      │  - QLoRA fine-tuning
│   (train.py)    │  - Unsloth + transformers
└────────┬────────┘
         ↓
┌─────────────────┐
│     Export      │  - Merge LoRA weights
│   (export.py)   │  - Convert to GGUF
└────────┬────────┘
         ↓
┌─────────────────┐
│     Reload      │  - Hot-swap in API
│   (reload.py)   │  - No restart needed
└─────────────────┘
```

## Requirements

- **GPU**: RTX 5070 or equivalent (12GB+ VRAM)
- **Dependencies**: Install with `pip install -e ".[train]"`
- **Time**: 4-6 hours for a full training run

## Workflow

### 1. Extract Training Data

Scan your vault for reflection patterns:

```bash
python -m companion.forge.cli extract
```

This scans for:

- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries

Output: `~/.companion/training_data/extracted.jsonl`

**Example extracted data:**

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful, reflective companion."},
    {"role": "user", "content": "I'm facing a decision. How should I think through this?"},
    {"role": "assistant", "content": "#reflection I think I need to slow down..."}
  ],
  "source_file": "Journal/2026/04/2026-04-12.md",
  "tags": ["#reflection", "#DayInShort"],
  "date": "2026-04-12"
}
```
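The tag-and-pattern scan can be sketched as a simple filter. This is a minimal illustration only, not the actual `extract.py` logic; the tag set and phrases are taken from the lists above:

```python
import re

# Illustrative subset of the documented reflection tags and phrase patterns.
REFLECTION_TAGS = {"#reflection", "#insight", "#learning", "#decision"}
REFLECTION_PATTERNS = [
    r"\bI think\b", r"\bI realize\b", r"\bLooking back\b", r"\bWhat if\b",
]

def is_reflection(line: str) -> bool:
    """Return True if a vault line carries a reflection tag or phrase."""
    if any(tag in line for tag in REFLECTION_TAGS):
        return True
    return any(re.search(p, line) for p in REFLECTION_PATTERNS)

notes = [
    "#reflection I think I need to slow down.",
    "Bought groceries today.",
    "Looking back, that decision paid off.",
]
hits = [n for n in notes if is_reflection(n)]  # matches the 1st and 3rd notes
```

The real extractor also groups matched lines into the chat-message JSONL format shown above before writing `extracted.jsonl`.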
### 2. Train Model

Run QLoRA fine-tuning:

```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```

**Hyperparameters (from config):**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch size |
| `gradient_accumulation_steps` | 4 | Steps before each optimizer update |

**Training Output:**

- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: training loss, eval metrics

### 3. Reload Model

Hot-swap without restarting the API:

```bash
python -m companion.forge.cli reload ~/.companion/training/final
```

Or via the API:

```bash
curl -X POST http://localhost:7373/admin/reload-model \
  -H "Content-Type: application/json" \
  -d '{"model_path": "~/.companion/training/final"}'
```

## Components

### Extractor (`companion.forge.extract`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.extract import TrainingDataExtractor, extract_training_data

config = load_config()

# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```

**Reflection Detection:**

- **Tags**: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- **Patterns**: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"

### Trainer (`companion.forge.train`)

```python
from pathlib import Path

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training"),
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)
```

**Base Models:**

- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - Recommended
- `unsloth/llama-3-8b-bnb-4bit` - Alternative

**Target Modules:** LoRA is applied to `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`.

### Exporter (`companion.forge.export`)

```python
from pathlib import Path

from companion.forge.export import merge_only

# Merge LoRA into base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500"),
    output_path=Path("~/.companion/models/merged"),
)
```

### Reloader (`companion.forge.reload`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.reload import reload_model, get_model_status

config = load_config()

# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with new model
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final"),
    backup=True,
)
```

## CLI Reference

```bash
# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.train \
    --data PATH \
    --output-dir PATH \
    --epochs N \
    --lr FLOAT

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```

**Note:** Use `--output-dir` (or `--output`) to specify the training output directory.
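The `size_mb` field reported by `get_model_status` can be understood as a recursive sum of file sizes under the model directory. The helper below is a hypothetical sketch of that computation, not the actual `companion.forge.reload` implementation:

```python
import tempfile
from pathlib import Path

def model_size_mb(model_dir: Path) -> float:
    """Sum the sizes of all files under model_dir, in MiB (assumed behavior)."""
    total = sum(f.stat().st_size for f in model_dir.rglob("*") if f.is_file())
    return total / (1024 * 1024)

# Demonstrate against a throwaway directory with one dummy 2 MiB weight file.
with tempfile.TemporaryDirectory() as d:
    dummy = Path(d) / "adapter.bin"
    dummy.write_bytes(b"\0" * (2 * 1024 * 1024))
    size = model_size_mb(Path(d))  # 2.0
```

A check like this is useful before a reload: a merged model directory that is only a few MB usually means the export step failed partway through.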
## Training Tips

**Dataset Size:**
- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better - quality over quantity

**Epochs:**
- Start with 3 epochs
- Increase if underfitting (loss stays high)
- Decrease if overfitting (eval loss increases)

**LoRA Rank:**
- `8` - Quick experiments
- `16` - Balanced (recommended)
- `32-64` - High capacity, more VRAM

**Overfitting Signs:**
- Training loss decreasing while eval loss increases
- Model repeats exact phrases from training data
- Responses feel "memorized", not "learned"

## VRAM Usage (RTX 5070, 12GB)

| Config | VRAM | Batch Size |
|--------|------|------------|
| Rank 16, 8-bit Adam | ~10GB | 4 |
| Rank 32, 8-bit Adam | ~11GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |

Use `gradient_accumulation_steps` to increase the effective batch size.

## Troubleshooting

**GPU Not Detected / CUDA Not Available**
- See [GPU Compatibility Guide](gpu-compatibility.md)
- Common on RTX 50-series: install CUDA-enabled PyTorch: `pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`
- Verify: `python -c "import torch; print(torch.cuda.is_available())"`

**CUDA Out of Memory**
- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`

**Training Loss Not Decreasing**
- Check data quality (reflections present?)
- Increase learning rate to 5e-4
- Check for data formatting issues

**Model Not Loading After Reload**
- Check the path exists: `ls -la ~/.companion/models/`
- Verify model format (GGUF vs HF)
- Check API logs for errors

**Slow Training**
- Expected: ~6 hours for 3 epochs on RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications

## Advanced: Custom Training Script

```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config

config = load_config()

final_path = train(
    data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
    output_dir=config.model.fine_tuning.output_dir,
    base_model=config.model.fine_tuning.base_model,
    lora_rank=32,                    # Higher capacity
    lora_alpha=64,
    learning_rate=3e-4,              # Slightly higher
    num_epochs=5,                    # More epochs
    batch_size=2,                    # Smaller batches
    gradient_accumulation_steps=8,   # Effective batch = 16
)

print(f"Model saved to: {final_path}")
```
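Many "training loss not decreasing" cases trace back to malformed JSONL, so it can pay to sanity-check the curated file before a multi-hour run. The validator below is a hypothetical helper (not part of the FORGE API) that checks each line against the message schema shown in the extraction example:

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_example(raw: str) -> bool:
    """Check one JSONL line: parseable, non-empty messages, known roles, non-empty content."""
    try:
        ex = json.loads(raw)
    except json.JSONDecodeError:
        return False
    msgs = ex.get("messages", [])
    if not msgs:
        return False
    return all(m.get("role") in VALID_ROLES and m.get("content") for m in msgs)

good = '{"messages": [{"role": "user", "content": "hi"}], "tags": []}'
bad = '{"messages": [{"role": "narrator", "content": "hi"}]}'
# validate_example(good) -> True, validate_example(bad) -> False
```

Running a pass like this over `curated.jsonl` and dropping failing lines costs seconds and can save an entire wasted training run.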