
FORGE Module Documentation

The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.

Architecture

Vault Reflections
         ↓
┌─────────────────┐
│    Extract      │  - Scan for #reflection, #insight tags
│  (extract.py)   │  - Parse reflection patterns
└────────┬────────┘
         ↓
┌─────────────────┐
│     Curate      │  - Manual review (optional)
│  (curate.py)    │  - Deduplication
└────────┬────────┘
         ↓
┌─────────────────┐
│     Train       │  - QLoRA fine-tuning
│  (train.py)     │  - Unsloth + transformers
└────────┬────────┘
         ↓
┌─────────────────┐
│     Export      │  - Merge LoRA weights
│  (export.py)    │  - Convert to GGUF
└────────┬────────┘
         ↓
┌─────────────────┐
│     Reload      │  - Hot-swap in API
│  (reload.py)    │  - No restart needed
└─────────────────┘

Requirements

  • GPU: RTX 5070 or equivalent (12GB+ VRAM)
  • Dependencies: Install with pip install -e ".[train]"
  • Time: 4-6 hours for a full training run

Workflow

1. Extract Training Data

Scan your vault for reflection patterns:

python -m companion.forge.cli extract

This scans for:

  • Tags: #reflection, #insight, #learning, #decision, etc.
  • Patterns: "I think", "I realize", "Looking back", "What if"
  • Section headers in journal entries

Output: ~/.companion/training_data/extracted.jsonl

Example extracted data:

{
  "messages": [
    {"role": "system", "content": "You are a thoughtful, reflective companion."},
    {"role": "user", "content": "I'm facing a decision. How should I think through this?"},
    {"role": "assistant", "content": "#reflection I think I need to slow down..."}
  ],
  "source_file": "Journal/2026/04/2026-04-12.md",
  "tags": ["#reflection", "#DayInShort"],
  "date": "2026-04-12"
}
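Each line of `extracted.jsonl` is one such record (the JSONL convention: one JSON object per line). A minimal sketch for inspecting the file with only the standard library; the two sample records below are abbreviated stand-ins, not real extractor output:

```python
import json

# Two abbreviated records in the extracted.jsonl format shown above
sample_jsonl = "\n".join([
    json.dumps({"messages": [{"role": "user", "content": "..."}],
                "source_file": "Journal/2026/04/2026-04-12.md",
                "tags": ["#reflection", "#DayInShort"], "date": "2026-04-12"}),
    json.dumps({"messages": [{"role": "user", "content": "..."}],
                "source_file": "Journal/2026/04/2026-04-13.md",
                "tags": ["#insight"], "date": "2026-04-13"}),
])

def load_examples(jsonl_text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

examples = load_examples(sample_jsonl)
all_tags = sorted({tag for ex in examples for tag in ex["tags"]})
print(len(examples), all_tags)  # 2 ['#DayInShort', '#insight', '#reflection']
```

The same loop works for the real file: replace `sample_jsonl` with `Path("extracted.jsonl").read_text()`.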

2. Train Model

Run QLoRA fine-tuning:

python -m companion.forge.cli train --epochs 3 --lr 2e-4

Hyperparameters (from config):

| Parameter | Default | Description |
| --- | --- | --- |
| lora_rank | 16 | LoRA rank (8-64) |
| lora_alpha | 32 | LoRA scaling factor |
| learning_rate | 2e-4 | Optimizer learning rate |
| num_epochs | 3 | Training epochs |
| batch_size | 4 | Per-device batch |
| gradient_accumulation_steps | 4 | Steps before update |

Training Output:

  • Checkpoints: ~/.companion/training/checkpoint-*/
  • Final model: ~/.companion/training/final/
  • Logs: Training loss, eval metrics

3. Reload Model

Hot-swap the model without restarting the API:

python -m companion.forge.cli reload ~/.companion/training/final

Or via API:

curl -X POST http://localhost:7373/admin/reload-model \
  -H "Content-Type: application/json" \
  -d '{"model_path": "~/.companion/training/final"}'
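The same request can be issued from Python. A standard-library sketch (the endpoint and payload are as documented above; the helper name is hypothetical):

```python
import json
import urllib.request

def build_reload_request(model_path: str,
                         base_url: str = "http://localhost:7373") -> urllib.request.Request:
    """Build the POST request the /admin/reload-model endpoint expects."""
    body = json.dumps({"model_path": model_path}).encode()
    return urllib.request.Request(
        url=f"{base_url}/admin/reload-model",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_reload_request("~/.companion/training/final")
# Send with urllib.request.urlopen(req) once the API is running.
```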

Components

Extractor (companion.forge.extract)

from companion.forge.extract import TrainingDataExtractor, extract_training_data

# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))

Reflection Detection:

  • Tags: #reflection, #learning, #insight, #decision, #analysis, #takeaway, #realization
  • Patterns: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"
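The detection rules above can be approximated with a small matcher. A hypothetical sketch (the real logic lives in `companion.forge.extract` and may differ, e.g. in how tags adjacent to punctuation are handled):

```python
import re

# Tags and phrases from the detection rules documented above
REFLECTION_TAGS = {"#reflection", "#learning", "#insight", "#decision",
                   "#analysis", "#takeaway", "#realization"}
PATTERN = re.compile(
    r"\b(I think|I feel|I realize|I wonder|Looking back|On one hand|Ultimately decided)\b",
    re.IGNORECASE,
)

def looks_like_reflection(text: str) -> bool:
    """True if the note carries a reflection tag or an introspective phrase."""
    # Simplification: tags are matched as whitespace-separated tokens
    return bool(set(text.split()) & REFLECTION_TAGS) or bool(PATTERN.search(text))

print(looks_like_reflection("Looking back, that tradeoff was wrong."))  # True
print(looks_like_reflection("Meeting notes: shipped v2."))              # False
```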

Trainer (companion.forge.train)

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training").expanduser(),  # "~" is not auto-expanded by Path
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)

Base Models:

  • unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit - Recommended
  • unsloth/llama-3-8b-bnb-4bit - Alternative

Target Modules:

LoRA is applied to: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
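Each adapted module of shape (d_in, d_out) adds r * (d_in + d_out) trainable parameters (the low-rank A and B matrices). A back-of-envelope sketch; the dimensions are assumed Llama-3.1-8B shapes (hidden 4096, MLP 14336, 8 KV heads, 32 layers), so check the actual model config:

```python
# Assumed Llama-3.1-8B shapes (verify against the model config)
HIDDEN, INTERMEDIATE, N_LAYERS = 4096, 14336, 32
KV_DIM = 1024  # 8 KV heads * 128 head dim (grouped-query attention)

TARGET_MODULES = {  # name: (d_in, d_out)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank: int) -> int:
    """Total trainable parameters across all layers for a given LoRA rank."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_MODULES.values())
    return per_layer * N_LAYERS

print(f"rank 16: {lora_param_count(16) / 1e6:.1f}M params")  # rank 16: 41.9M params
```

Under these assumptions, rank 16 trains roughly 42M parameters, about 0.5% of the 8B base model, and the count scales linearly with rank.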

Exporter (companion.forge.export)

from companion.forge.export import merge_only

# Merge LoRA into base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500").expanduser(),
    output_path=Path("~/.companion/models/merged").expanduser(),
)

Reloader (companion.forge.reload)

from companion.forge.reload import reload_model, get_model_status

# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with new model
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final").expanduser(),
    backup=True,
)

CLI Reference

# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.cli train \
  [--data PATH] \
  [--output PATH] \
  [--epochs N] \
  [--lr FLOAT]

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]

Training Tips

Dataset Size:

  • Minimum: 50 examples
  • Optimal: 100-500 examples
  • More is not always better - quality over quantity

Epochs:

  • Start with 3 epochs
  • Increase if underfitting (high loss)
  • Decrease if overfitting (loss increases on eval)

LoRA Rank:

  • 8 - Quick experiments
  • 16 - Balanced (recommended)
  • 32-64 - High capacity, more VRAM

Overfitting Signs:

  • Training loss decreasing, eval loss increasing
  • Model repeats exact phrases from training data
  • Responses feel "memorized" not "learned"
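A common guard against this pattern is early stopping on the eval loss: stop once it has failed to improve for a few consecutive evaluations. A generic sketch, not part of the FORGE API:

```python
def should_stop_early(eval_losses: list[float], patience: int = 2) -> bool:
    """Stop when eval loss has not improved for `patience` evaluations."""
    best_index = eval_losses.index(min(eval_losses))
    evals_since_best = len(eval_losses) - 1 - best_index
    return evals_since_best >= patience

print(should_stop_early([1.0, 0.8, 0.85, 0.9]))  # True: best was 2 evals ago
print(should_stop_early([1.0, 0.8, 0.7]))        # False: still improving
```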

VRAM Usage (RTX 5070, 12GB)

| Config | VRAM | Batch Size |
| --- | --- | --- |
| Rank 16, 8-bit Adam | ~10 GB | 4 |
| Rank 32, 8-bit Adam | ~11 GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |

Use gradient_accumulation_steps to increase the effective batch size.
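With the defaults (batch_size 4, gradient_accumulation_steps 4), the optimizer updates on an effective batch of 4 * 4 = 16; halving batch_size and doubling the accumulation steps keeps that product while lowering activation memory:

```python
def effective_batch(batch_size: int, grad_accum_steps: int) -> int:
    """Gradients from grad_accum_steps micro-batches are summed before each optimizer step."""
    return batch_size * grad_accum_steps

print(effective_batch(4, 4))  # 16 (default config)
print(effective_batch(2, 8))  # 16, with smaller micro-batches in VRAM
```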

Troubleshooting

CUDA Out of Memory

  • Reduce lora_rank to 8
  • Reduce batch_size to 2
  • Increase gradient_accumulation_steps

Training Loss Not Decreasing

  • Check data quality (reflections present?)
  • Increase learning rate to 5e-4
  • Check for data formatting issues

Model Not Loading After Reload

  • Check path exists: ls -la ~/.companion/models/
  • Verify model format (GGUF vs HF)
  • Check API logs for errors

Slow Training

  • Expected: ~6 hours for 3 epochs on RTX 5070
  • Enable gradient checkpointing (enabled by default)
  • Close other GPU applications

Advanced: Custom Training Script

# custom_train.py
from companion.forge.train import train
from companion.config import load_config

config = load_config()

final_path = train(
    data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
    output_dir=config.model.fine_tuning.output_dir,
    base_model=config.model.fine_tuning.base_model,
    lora_rank=32,  # Higher capacity
    lora_alpha=64,
    learning_rate=3e-4,  # Slightly higher
    num_epochs=5,  # More epochs
    batch_size=2,  # Smaller batches
    gradient_accumulation_steps=8,  # Effective batch = 16
)

print(f"Model saved to: {final_path}")