# FORGE Module Documentation
The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.
## Architecture

```
Vault Reflections
        ↓
┌─────────────────┐
│     Extract     │  - Scan for #reflection, #insight tags
│  (extract.py)   │  - Parse reflection patterns
└────────┬────────┘
         ↓
┌─────────────────┐
│     Curate      │  - Manual review (optional)
│  (curate.py)    │  - Deduplication
└────────┬────────┘
         ↓
┌─────────────────┐
│     Train       │  - QLoRA fine-tuning
│   (train.py)    │  - Unsloth + transformers
└────────┬────────┘
         ↓
┌─────────────────┐
│     Export      │  - Merge LoRA weights
│  (export.py)    │  - Convert to GGUF
└────────┬────────┘
         ↓
┌─────────────────┐
│     Reload      │  - Hot-swap in API
│  (reload.py)    │  - No restart needed
└─────────────────┘
```
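The Curate stage's deduplication could be sketched as hashing normalized assistant text and keeping the first occurrence (an illustration only; `curate.py`'s actual logic may differ):

```python
import hashlib


def dedupe_examples(examples):
    """Drop examples whose normalized assistant text hashes identically."""
    seen = set()
    unique = []
    for ex in examples:
        text = " ".join(
            m["content"].strip().lower()
            for m in ex["messages"]
            if m["role"] == "assistant"
        )
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(ex)
    return unique
```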
## Requirements

- GPU: RTX 5070 or equivalent (12 GB+ VRAM)
- Dependencies: install with `pip install -e ".[train]"`
- Time: 4-6 hours for a full training run
## Workflow

### 1. Extract Training Data

Scan your vault for reflection patterns:

```bash
python -m companion.forge.cli extract
```

This scans for:

- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries

Output: `~/.companion/training_data/extracted.jsonl`
Example extracted data:

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful, reflective companion."},
    {"role": "user", "content": "I'm facing a decision. How should I think through this?"},
    {"role": "assistant", "content": "#reflection I think I need to slow down..."}
  ],
  "source_file": "Journal/2026/04/2026-04-12.md",
  "tags": ["#reflection", "#DayInShort"],
  "date": "2026-04-12"
}
```
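Once extracted, the JSONL file can be inspected with a few lines of standard-library Python (a sketch assuming the record shape shown above):

```python
import json
from collections import Counter
from pathlib import Path


def tag_counts(jsonl_path):
    """Count how often each tag appears across extracted examples."""
    counts = Counter()
    with Path(jsonl_path).expanduser().open() as f:
        for line in f:
            record = json.loads(line)
            counts.update(record.get("tags", []))
    return counts


# e.g. tag_counts("~/.companion/training_data/extracted.jsonl")
```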
### 2. Train Model

Run QLoRA fine-tuning:

```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```

Hyperparameters (from config):

| Parameter | Default | Description |
|---|---|---|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch size |
| `gradient_accumulation_steps` | 4 | Steps accumulated before each optimizer update |

Training output:

- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: training loss, eval metrics
### 3. Reload Model

Hot-swap without restarting the API:

```bash
python -m companion.forge.cli reload ~/.companion/training/final
```

Or via the API:

```bash
curl -X POST http://localhost:7373/admin/reload-model \
  -H "Content-Type: application/json" \
  -d '{"model_path": "~/.companion/training/final"}'
```
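The same reload call can be issued from Python with the standard library (a sketch against the `/admin/reload-model` endpoint shown above; error handling omitted):

```python
import json
import urllib.request


def build_reload_request(model_path, base_url="http://localhost:7373"):
    """Build the POST request for the admin reload endpoint."""
    body = json.dumps({"model_path": model_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/admin/reload-model",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# req = build_reload_request("~/.companion/training/final")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```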
## Components

### Extractor (`companion.forge.extract`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.extract import TrainingDataExtractor, extract_training_data

config = load_config()

# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()

# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")

# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```

Reflection detection:

- Tags: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- Patterns: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"
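The tag and phrase detection above could be approximated with two regular expressions (an illustrative sketch, not the extractor's actual implementation):

```python
import re

REFLECTION_TAGS = re.compile(
    r"#(reflection|learning|insight|decision|analysis|takeaway|realization)\b"
)
REFLECTION_PHRASES = re.compile(
    r"\b(I think|I feel|I realize|I wonder|Looking back|On one hand|Ultimately decided)\b",
    re.IGNORECASE,
)


def looks_reflective(text):
    """True if the text carries a reflection tag or a reflective phrase."""
    return bool(REFLECTION_TAGS.search(text) or REFLECTION_PHRASES.search(text))
```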
### Trainer (`companion.forge.train`)

```python
from pathlib import Path

from companion.forge.train import train

final_path = train(
    data_path=Path("training.jsonl"),
    output_dir=Path("~/.companion/training"),
    base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    lora_rank=16,
    lora_alpha=32,
    learning_rate=2e-4,
    num_epochs=3,
)
```

Base models:

- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - recommended
- `unsloth/llama-3-8b-bnb-4bit` - alternative

Target modules:

LoRA is applied to: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
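For intuition about what a given rank costs, the trainable adapter size for these target modules can be computed by hand (a sketch assuming standard Llama-3.1-8B dimensions: hidden size 4096, MLP size 14336, 32 layers, and 1024-dim k/v projections under grouped-query attention):

```python
def lora_param_count(rank, num_layers=32, hidden=4096, mlp=14336, kv=1024):
    """Trainable LoRA parameters: each adapted weight W (d_out x d_in)
    gains A (rank x d_in) and B (d_out x rank), i.e. rank * (d_in + d_out)."""
    per_layer = (
        rank * (hidden + hidden)    # q_proj
        + rank * (hidden + kv)      # k_proj
        + rank * (hidden + kv)      # v_proj
        + rank * (hidden + hidden)  # o_proj
        + rank * (hidden + mlp)     # gate_proj
        + rank * (hidden + mlp)     # up_proj
        + rank * (mlp + hidden)     # down_proj
    )
    return num_layers * per_layer


# Rank 16 across all seven modules is roughly 42M trainable parameters;
# doubling the rank doubles the count.
```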
### Exporter (`companion.forge.export`)

```python
from pathlib import Path

from companion.forge.export import merge_only

# Merge LoRA into base model
merged_path = merge_only(
    checkpoint_path=Path("~/.companion/training/checkpoint-500"),
    output_path=Path("~/.companion/models/merged"),
)
```
### Reloader (`companion.forge.reload`)

```python
from pathlib import Path

from companion.config import load_config
from companion.forge.reload import reload_model, get_model_status

config = load_config()

# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")

# Reload with new model
new_path = reload_model(
    config=config,
    new_model_path=Path("~/.companion/training/final"),
    backup=True,
)
```
## CLI Reference

```bash
# Extract training data
python -m companion.forge.cli extract [--output PATH]

# Train model
python -m companion.forge.cli train \
    [--data PATH] \
    [--output PATH] \
    [--epochs N] \
    [--lr FLOAT]

# Check model status
python -m companion.forge.cli status

# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```
## Training Tips

Dataset Size:

- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better: quality over quantity

Epochs:

- Start with 3 epochs
- Increase if underfitting (training loss stays high)
- Decrease if overfitting (eval loss rises while training loss falls)

LoRA Rank:

- `8` - quick experiments
- `16` - balanced (recommended)
- `32`-`64` - higher capacity, more VRAM

Overfitting Signs:

- Training loss decreasing while eval loss increases
- Model repeats exact phrases from the training data
- Responses feel "memorized", not "learned"
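The first sign, eval loss climbing while training loss keeps falling, can be checked numerically; a hypothetical helper (not part of the FORGE API):

```python
def shows_overfitting(train_losses, eval_losses, window=3):
    """True if training loss is still falling while eval loss has
    risen over the last `window` evaluations."""
    if len(train_losses) < window + 1 or len(eval_losses) < window + 1:
        return False
    train_falling = train_losses[-1] < train_losses[-1 - window]
    eval_rising = eval_losses[-1] > eval_losses[-1 - window]
    return train_falling and eval_rising
```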
## VRAM Usage (RTX 5070, 12 GB)

| Config | VRAM | Batch Size |
|---|---|---|
| Rank 16, 8-bit Adam | ~10 GB | 4 |
| Rank 32, 8-bit Adam | ~11 GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |

Use `gradient_accumulation_steps` to increase the effective batch size.
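The effective batch size is simply the per-device batch times the accumulation steps (times the GPU count); a one-line sketch:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Samples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_gpus


# With the config defaults (batch_size=4, gradient_accumulation_steps=4):
# effective_batch_size(4, 4) == 16
```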
## Troubleshooting

### CUDA Out of Memory

- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`

### Training Loss Not Decreasing

- Check data quality (are reflections actually present?)
- Increase the learning rate to 5e-4
- Check for data formatting issues

### Model Not Loading After Reload

- Check that the path exists: `ls -la ~/.companion/models/`
- Verify the model format (GGUF vs. HF)
- Check the API logs for errors

### Slow Training

- Expected: ~6 hours for 3 epochs on an RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications
## Advanced: Custom Training Script

```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config

config = load_config()

final_path = train(
    data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
    output_dir=config.model.fine_tuning.output_dir,
    base_model=config.model.fine_tuning.base_model,
    lora_rank=32,                   # Higher capacity
    lora_alpha=64,
    learning_rate=3e-4,             # Slightly higher
    num_epochs=5,                   # More epochs
    batch_size=2,                   # Smaller batches
    gradient_accumulation_steps=8,  # Effective batch = 16
)
print(f"Model saved to: {final_path}")
```