docs: add comprehensive README and module documentation

2026-04-13 15:35:22 -04:00
parent 47ac2f36e0
commit e77fa69b31
6 changed files with 2117 additions and 0 deletions

README.md (new file, 207 lines)

@@ -0,0 +1,207 @@
# Personal Companion AI
A fully local, privacy-first AI companion trained on your Obsidian vault. Combines fine-tuned reasoning with RAG-powered memory to answer questions about your life, relationships, and experiences.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ Personal Companion AI │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌─────────────────┐ ┌──────────┐ │
│ │ React UI │◄──►│ FastAPI │◄──►│ Ollama │ │
│ │ (Vite) │ │ Backend │ │ Models │ │
│ └──────────────┘ └─────────────────┘ └──────────┘ │
│ │ │
│ ┌─────────────────────┼─────────────────────┐ │
│ ↓ ↓ ↓ │
│ ┌──────────────┐ ┌─────────────────┐ ┌──────────┐ │
│ │ Fine-tuned │ │ RAG Engine │ │ Vault │ │
│ │ 7B Model │ │ (LanceDB) │ │ Indexer │ │
│ │ │ │ │ │ │ │
│ │ Quarterly │ │ • semantic │ │ • watch │ │
│ │ retrain │ │ search │ │ • chunk │ │
│ │ │ │ • hybrid │ │ • embed │ │
│ │ │ │ filters │ │ │ │
│ │ │ │ • relationship │ │ Daily │ │
│ │ │ │ graph │ │ auto-sync│ │
│ └──────────────┘ └─────────────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
## Quick Start
### Prerequisites
- Python 3.11+
- Node.js 18+ (for UI)
- Ollama running locally
- NVIDIA GPU with 12GB+ VRAM, e.g. RTX 5070 (needed only for fine-tuning)
### Installation
```bash
# Clone and setup
cd kv-rag
pip install -e ".[dev]"
# Install UI dependencies
cd ui && npm install && cd ..
# Pull required Ollama models
ollama pull mxbai-embed-large
ollama pull llama3.1:8b
```
### Configuration
Copy `config.json` and customize:
```json
{
"vault": {
"path": "/path/to/your/obsidian/vault"
},
"companion": {
"name": "SAN"
}
}
```
See [docs/config.md](docs/config.md) for full configuration reference.
### Running
**Terminal 1 - Backend:**
```bash
python -m uvicorn companion.api:app --host 0.0.0.0 --port 7373
```
**Terminal 2 - Frontend:**
```bash
cd ui && npm run dev
```
**Terminal 3 - Indexer (optional):**
```bash
# One-time full index
python -m companion.indexer_daemon.cli index
# Or continuous file watching
python -m companion.indexer_daemon.watcher
```
Open http://localhost:5173
## Usage
### Chat Interface
Type messages naturally. The companion will:
- Retrieve relevant context from your vault
- Reference past events, relationships, decisions
- Provide reflective, companion-style responses
### Indexing Your Vault
```bash
# Full reindex
python -m companion.indexer_daemon.cli index
# Incremental sync
python -m companion.indexer_daemon.cli sync
# Check status
python -m companion.indexer_daemon.cli status
```
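The `sync` command reindexes only files that changed since the last run. The mtime comparison at its core might look like this sketch (illustrative only; the real bookkeeping, including deny-lists, lives in the indexer daemon):

```python
from pathlib import Path

def files_needing_reindex(vault_root: Path, last_indexed: dict[str, float]) -> list[Path]:
    """Return markdown files that are new or modified since the last sync.

    `last_indexed` maps vault-relative paths to the mtime recorded at index time.
    """
    stale = []
    for md_file in vault_root.rglob("*.md"):
        rel = str(md_file.relative_to(vault_root))
        # unknown files default to 0.0, so they always count as stale
        if last_indexed.get(rel, 0.0) < md_file.stat().st_mtime:
            stale.append(md_file)
    return stale
```

Files whose recorded mtime matches the on-disk mtime are skipped, which is what makes incremental sync cheap on large vaults.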
### Fine-Tuning (Optional)
Train a custom model that reasons like you:
```bash
# Extract training examples from vault reflections
python -m companion.forge.cli extract
# Train with QLoRA (4-6 hours on RTX 5070)
python -m companion.forge.cli train --epochs 3
# Reload the fine-tuned model
python -m companion.forge.cli reload ~/.companion/training/final
```
## Modules
| Module | Purpose | Documentation |
|--------|---------|---------------|
| `companion.config` | Configuration management | [docs/config.md](docs/config.md) |
| `companion.rag` | RAG engine (chunk, embed, search) | [docs/rag.md](docs/rag.md) |
| `companion.forge` | Fine-tuning pipeline | [docs/forge.md](docs/forge.md) |
| `companion.api` | FastAPI backend | [docs/api.md](docs/api.md) |
| `ui/` | React frontend | [docs/ui.md](docs/ui.md) |
## Project Structure
```
kv-rag/
├── companion/ # Python backend
│ ├── __init__.py
│ ├── api.py # FastAPI app
│ ├── config.py # Configuration
│ ├── memory.py # Session memory (SQLite)
│ ├── orchestrator.py # Chat orchestration
│ ├── prompts.py # Prompt templates
│ ├── rag/ # RAG modules
│ │ ├── chunker.py
│ │ ├── embedder.py
│ │ ├── indexer.py
│ │ ├── search.py
│ │ └── vector_store.py
│ ├── forge/ # Fine-tuning
│ │ ├── extract.py
│ │ ├── train.py
│ │ ├── export.py
│ │ └── reload.py
│ └── indexer_daemon/ # File watching
│ ├── cli.py
│ └── watcher.py
├── ui/ # React frontend
│ ├── src/
│ │ ├── App.tsx
│ │ ├── components/
│ │ └── hooks/
│ └── package.json
├── tests/ # Test suite
├── config.json # Configuration file
├── docs/ # Documentation
└── README.md
```
## Testing
```bash
# Run all tests
pytest tests/ -v
# Run specific module
pytest tests/test_chunker.py -v
```
## Privacy & Security
- **Fully Local**: No data leaves your machine
- **Vault Data**: Never sent to external APIs for training
- **Config**: `local_only: true` blocks external API calls
- **Sensitive Tags**: Configurable patterns for health, finance, etc.
## License
MIT License - See LICENSE file
## Acknowledgments
- Built with [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning
- Uses [LanceDB](https://lancedb.github.io/) for vector storage
- UI inspired by [Obsidian](https://obsidian.md/) aesthetics

config-schema.json (new file, 667 lines)

@@ -0,0 +1,667 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://companion.ai/config-schema.json",
"title": "Companion AI Configuration",
"description": "Configuration schema for Personal Companion AI",
"type": "object",
"required": ["companion", "vault", "rag", "model", "api", "ui", "logging", "security"],
"properties": {
"companion": {
"type": "object",
"title": "Companion Settings",
"required": ["name", "persona", "memory", "chat"],
"properties": {
"name": {
"type": "string",
"description": "Display name for the companion",
"default": "SAN"
},
"persona": {
"type": "object",
"required": ["role", "tone", "style", "boundaries"],
"properties": {
"role": {
"type": "string",
"description": "Role of the companion",
"enum": ["companion", "advisor", "reflector"],
"default": "companion"
},
"tone": {
"type": "string",
"description": "Communication tone",
"enum": ["reflective", "supportive", "analytical", "mixed"],
"default": "reflective"
},
"style": {
"type": "string",
"description": "Interaction style",
"enum": ["questioning", "supportive", "direct", "mixed"],
"default": "questioning"
},
"boundaries": {
"type": "array",
"description": "Behavioral guardrails",
"items": {
"type": "string",
"enum": [
"does_not_impersonate_user",
"no_future_predictions",
"no_medical_or_legal_advice"
]
},
"default": ["does_not_impersonate_user", "no_future_predictions", "no_medical_or_legal_advice"]
}
}
},
"memory": {
"type": "object",
"required": ["session_turns", "persistent_store", "summarize_after"],
"properties": {
"session_turns": {
"type": "integer",
"description": "Messages to keep in context",
"minimum": 1,
"maximum": 100,
"default": 20
},
"persistent_store": {
"type": "string",
"description": "SQLite database path",
"default": "~/.companion/memory.db"
},
"summarize_after": {
"type": "integer",
"description": "Summarize history after N turns",
"minimum": 5,
"maximum": 50,
"default": 10
}
}
},
"chat": {
"type": "object",
"required": ["streaming", "max_response_tokens", "default_temperature", "allow_temperature_override"],
"properties": {
"streaming": {
"type": "boolean",
"description": "Stream responses in real-time",
"default": true
},
"max_response_tokens": {
"type": "integer",
"description": "Max tokens per response",
"minimum": 256,
"maximum": 8192,
"default": 2048
},
"default_temperature": {
"type": "number",
"description": "Creativity level (0.0=deterministic, 2.0=creative)",
"minimum": 0.0,
"maximum": 2.0,
"default": 0.7
},
"allow_temperature_override": {
"type": "boolean",
"description": "Let users adjust temperature",
"default": true
}
}
}
}
},
"vault": {
"type": "object",
"title": "Vault Settings",
"required": ["path", "indexing", "chunking_rules"],
"properties": {
"path": {
"type": "string",
"description": "Absolute path to Obsidian vault root"
},
"indexing": {
"type": "object",
"required": ["auto_sync", "auto_sync_interval_minutes", "watch_fs_events", "file_patterns", "deny_dirs", "deny_patterns"],
"properties": {
"auto_sync": {
"type": "boolean",
"description": "Enable automatic syncing",
"default": true
},
"auto_sync_interval_minutes": {
"type": "integer",
"description": "Minutes between full syncs",
"minimum": 60,
"maximum": 10080,
"default": 1440
},
"watch_fs_events": {
"type": "boolean",
"description": "Watch for file system changes",
"default": true
},
"file_patterns": {
"type": "array",
"description": "File patterns to index",
"items": { "type": "string" },
"default": ["*.md"]
},
"deny_dirs": {
"type": "array",
"description": "Directories to skip",
"items": { "type": "string" },
"default": [".obsidian", ".trash", "zzz-Archive", ".git", ".logseq"]
},
"deny_patterns": {
"type": "array",
"description": "File patterns to ignore",
"items": { "type": "string" },
"default": ["*.tmp", "*.bak", "*conflict*", ".*"]
}
}
},
"chunking_rules": {
"type": "object",
"description": "Per-directory chunking rules (key: glob pattern, value: rule)",
"additionalProperties": {
"type": "object",
"required": ["strategy", "chunk_size", "chunk_overlap"],
"properties": {
"strategy": {
"type": "string",
"enum": ["sliding_window", "section"],
"description": "Chunking strategy"
},
"chunk_size": {
"type": "integer",
"description": "Target chunk size in words",
"minimum": 50,
"maximum": 2000
},
"chunk_overlap": {
"type": "integer",
"description": "Overlap between chunks in words",
"minimum": 0,
"maximum": 500
},
"section_tags": {
"type": "array",
"description": "Tags that mark sections (for section strategy)",
"items": { "type": "string" }
}
}
}
}
}
},
"rag": {
"type": "object",
"title": "RAG Settings",
"required": ["embedding", "vector_store", "search"],
"properties": {
"embedding": {
"type": "object",
"required": ["provider", "model", "base_url", "dimensions", "batch_size"],
"properties": {
"provider": {
"type": "string",
"description": "Embedding service provider",
"enum": ["ollama"],
"default": "ollama"
},
"model": {
"type": "string",
"description": "Model name for embeddings",
"enum": ["mxbai-embed-large", "nomic-embed-text", "all-minilm"],
"default": "mxbai-embed-large"
},
"base_url": {
"type": "string",
"description": "Provider API endpoint",
"format": "uri",
"default": "http://localhost:11434"
},
"dimensions": {
"type": "integer",
"description": "Embedding vector size",
"enum": [384, 768, 1024],
"default": 1024
},
"batch_size": {
"type": "integer",
"description": "Texts per embedding batch",
"minimum": 1,
"maximum": 256,
"default": 32
}
}
},
"vector_store": {
"type": "object",
"required": ["type", "path"],
"properties": {
"type": {
"type": "string",
"description": "Vector database type",
"enum": ["lancedb"],
"default": "lancedb"
},
"path": {
"type": "string",
"description": "Storage path",
"default": "~/.companion/vectors.lance"
}
}
},
"search": {
"type": "object",
"required": ["default_top_k", "max_top_k", "similarity_threshold", "hybrid_search", "filters"],
"properties": {
"default_top_k": {
"type": "integer",
"description": "Default results to retrieve",
"minimum": 1,
"maximum": 100,
"default": 8
},
"max_top_k": {
"type": "integer",
"description": "Maximum allowed results",
"minimum": 1,
"maximum": 100,
"default": 20
},
"similarity_threshold": {
"type": "number",
"description": "Minimum relevance score (0-1)",
"minimum": 0.0,
"maximum": 1.0,
"default": 0.75
},
"hybrid_search": {
"type": "object",
"required": ["enabled", "keyword_weight", "semantic_weight"],
"properties": {
"enabled": {
"type": "boolean",
"description": "Combine keyword + semantic search",
"default": true
},
"keyword_weight": {
"type": "number",
"description": "Keyword search weight",
"minimum": 0.0,
"maximum": 1.0,
"default": 0.3
},
"semantic_weight": {
"type": "number",
"description": "Semantic search weight",
"minimum": 0.0,
"maximum": 1.0,
"default": 0.7
}
}
},
"filters": {
"type": "object",
"required": ["date_range_enabled", "tag_filter_enabled", "directory_filter_enabled"],
"properties": {
"date_range_enabled": {
"type": "boolean",
"description": "Enable date range filtering",
"default": true
},
"tag_filter_enabled": {
"type": "boolean",
"description": "Enable tag filtering",
"default": true
},
"directory_filter_enabled": {
"type": "boolean",
"description": "Enable directory filtering",
"default": true
}
}
}
}
}
}
},
"model": {
"type": "object",
"title": "Model Settings",
"required": ["inference", "fine_tuning", "retrain_schedule"],
"properties": {
"inference": {
"type": "object",
"required": ["backend", "model_path", "context_length", "gpu_layers", "batch_size", "threads"],
"properties": {
"backend": {
"type": "string",
"description": "Inference engine",
"enum": ["llama.cpp", "vllm"],
"default": "llama.cpp"
},
"model_path": {
"type": "string",
"description": "Path to GGUF or HF model",
"default": "~/.companion/models/companion-7b-q4.gguf"
},
"context_length": {
"type": "integer",
"description": "Max context tokens",
"minimum": 2048,
"maximum": 32768,
"default": 8192
},
"gpu_layers": {
"type": "integer",
"description": "Layers to offload to GPU (0 for CPU-only)",
"minimum": 0,
"maximum": 100,
"default": 35
},
"batch_size": {
"type": "integer",
"description": "Inference batch size",
"minimum": 1,
"maximum": 2048,
"default": 512
},
"threads": {
"type": "integer",
"description": "CPU threads for inference",
"minimum": 1,
"maximum": 64,
"default": 8
}
}
},
"fine_tuning": {
"type": "object",
"required": ["base_model", "output_dir", "lora_rank", "lora_alpha", "learning_rate", "batch_size", "gradient_accumulation_steps", "num_epochs", "warmup_steps", "save_steps", "eval_steps", "training_data_path", "validation_split"],
"properties": {
"base_model": {
"type": "string",
"description": "Base model for fine-tuning",
"default": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
},
"output_dir": {
"type": "string",
"description": "Training outputs directory",
"default": "~/.companion/training"
},
"lora_rank": {
"type": "integer",
"description": "LoRA rank (higher = more capacity, more VRAM)",
"minimum": 4,
"maximum": 128,
"default": 16
},
"lora_alpha": {
"type": "integer",
"description": "LoRA alpha (scaling factor, typically 2x rank)",
"minimum": 8,
"maximum": 256,
"default": 32
},
"learning_rate": {
"type": "number",
"description": "Training learning rate",
"minimum": 1e-6,
"maximum": 1e-3,
"default": 0.0002
},
"batch_size": {
"type": "integer",
"description": "Per-device batch size",
"minimum": 1,
"maximum": 32,
"default": 4
},
"gradient_accumulation_steps": {
"type": "integer",
"description": "Steps to accumulate before update",
"minimum": 1,
"maximum": 64,
"default": 4
},
"num_epochs": {
"type": "integer",
"description": "Training epochs",
"minimum": 1,
"maximum": 20,
"default": 3
},
"warmup_steps": {
"type": "integer",
"description": "Learning rate warmup steps",
"minimum": 0,
"maximum": 10000,
"default": 100
},
"save_steps": {
"type": "integer",
"description": "Checkpoint frequency",
"minimum": 10,
"maximum": 10000,
"default": 500
},
"eval_steps": {
"type": "integer",
"description": "Evaluation frequency",
"minimum": 10,
"maximum": 10000,
"default": 250
},
"training_data_path": {
"type": "string",
"description": "Training data directory",
"default": "~/.companion/training_data/"
},
"validation_split": {
"type": "number",
"description": "Fraction of data for validation",
"minimum": 0.0,
"maximum": 0.5,
"default": 0.1
}
}
},
"retrain_schedule": {
"type": "object",
"required": ["auto_reminder", "default_interval_days", "reminder_channels"],
"properties": {
"auto_reminder": {
"type": "boolean",
"description": "Enable retrain reminders",
"default": true
},
"default_interval_days": {
"type": "integer",
"description": "Days between retrain reminders",
"minimum": 30,
"maximum": 365,
"default": 90
},
"reminder_channels": {
"type": "array",
"description": "Where to show reminders",
"items": {
"type": "string",
"enum": ["chat_stream", "log", "ui"]
},
"default": ["chat_stream", "log"]
}
}
}
}
},
"api": {
"type": "object",
"title": "API Settings",
"required": ["host", "port", "cors_origins", "auth"],
"properties": {
"host": {
"type": "string",
"description": "Bind address (use 0.0.0.0 for LAN access)",
"default": "127.0.0.1"
},
"port": {
"type": "integer",
"description": "HTTP port",
"minimum": 1,
"maximum": 65535,
"default": 7373
},
"cors_origins": {
"type": "array",
"description": "Allowed CORS origins",
"items": {
"type": "string",
"format": "uri"
},
"default": ["http://localhost:5173"]
},
"auth": {
"type": "object",
"required": ["enabled"],
"properties": {
"enabled": {
"type": "boolean",
"description": "Enable API key authentication",
"default": false
}
}
}
}
},
"ui": {
"type": "object",
"title": "UI Settings",
"required": ["web", "cli"],
"properties": {
"web": {
"type": "object",
"required": ["enabled", "theme", "features"],
"properties": {
"enabled": {
"type": "boolean",
"description": "Enable web interface",
"default": true
},
"theme": {
"type": "string",
"description": "UI theme",
"enum": ["obsidian"],
"default": "obsidian"
},
"features": {
"type": "object",
"required": ["streaming", "citations", "source_preview"],
"properties": {
"streaming": {
"type": "boolean",
"description": "Real-time response streaming",
"default": true
},
"citations": {
"type": "boolean",
"description": "Show source citations",
"default": true
},
"source_preview": {
"type": "boolean",
"description": "Preview source snippets",
"default": true
}
}
}
}
},
"cli": {
"type": "object",
"required": ["enabled", "rich_output"],
"properties": {
"enabled": {
"type": "boolean",
"description": "Enable CLI interface",
"default": true
},
"rich_output": {
"type": "boolean",
"description": "Rich terminal formatting",
"default": true
}
}
}
}
},
"logging": {
"type": "object",
"title": "Logging Settings",
"required": ["level", "file", "max_size_mb", "backup_count"],
"properties": {
"level": {
"type": "string",
"description": "Log level",
"enum": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
"default": "INFO"
},
"file": {
"type": "string",
"description": "Log file path",
"default": "~/.companion/logs/companion.log"
},
"max_size_mb": {
"type": "integer",
"description": "Max log file size in MB",
"minimum": 10,
"maximum": 1000,
"default": 100
},
"backup_count": {
"type": "integer",
"description": "Number of rotated backups",
"minimum": 1,
"maximum": 20,
"default": 5
}
}
},
"security": {
"type": "object",
"title": "Security Settings",
"required": ["local_only", "vault_path_traversal_check", "sensitive_content_detection", "sensitive_patterns", "require_confirmation_for_external_apis"],
"properties": {
"local_only": {
"type": "boolean",
"description": "Block external API calls",
"default": true
},
"vault_path_traversal_check": {
"type": "boolean",
"description": "Prevent path traversal attacks",
"default": true
},
"sensitive_content_detection": {
"type": "boolean",
"description": "Tag sensitive content",
"default": true
},
"sensitive_patterns": {
"type": "array",
"description": "Tags considered sensitive",
"items": { "type": "string" },
"default": ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"]
},
"require_confirmation_for_external_apis": {
"type": "boolean",
"description": "Confirm before external API calls",
"default": true
}
}
}
}
}

docs/config.md (new file, 278 lines)

@@ -0,0 +1,278 @@
# Configuration Reference
Complete reference for `config.json` configuration options.
## Overview
The configuration file uses JSON format with support for:
- Path expansion (`~` expands to home directory)
- Type validation via Pydantic models
- Environment-specific overrides
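Path expansion means a configured value like `~/.companion/memory.db` is resolved to an absolute path at load time, roughly equivalent to:

```python
from pathlib import Path

# "~" in configured paths expands to the current user's home directory
raw = "~/.companion/memory.db"
db_path = Path(raw).expanduser()
```

This is why configs stay portable between machines: the same `~`-relative value works for any user.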
## Schema Validation
Validate your config against the schema:
```bash
python -c "from companion.config import load_config; load_config('config.json')"
```
Or use the JSON Schema directly: [config-schema.json](../config-schema.json)
## Configuration Sections
### companion
Core companion personality and behavior settings.
```json
{
"companion": {
"name": "SAN",
"persona": {
"role": "companion",
"tone": "reflective",
"style": "questioning",
"boundaries": [
"does_not_impersonate_user",
"no_future_predictions",
"no_medical_or_legal_advice"
]
},
"memory": {
"session_turns": 20,
"persistent_store": "~/.companion/memory.db",
"summarize_after": 10
},
"chat": {
"streaming": true,
"max_response_tokens": 2048,
"default_temperature": 0.7,
"allow_temperature_override": true
}
}
}
```
#### Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | string | "SAN" | Display name for the companion |
| `persona.role` | string | "companion" | Role description (companion/advisor/reflector) |
| `persona.tone` | string | "reflective" | Communication tone (reflective/supportive/analytical) |
| `persona.style` | string | "questioning" | Interaction style (questioning/supportive/direct) |
| `persona.boundaries` | string[] | [...] | Behavioral guardrails |
| `memory.session_turns` | int | 20 | Messages to keep in context |
| `memory.persistent_store` | string | "~/.companion/memory.db" | SQLite database path |
| `memory.summarize_after` | int | 10 | Summarize history after N turns |
| `chat.streaming` | bool | true | Stream responses in real-time |
| `chat.max_response_tokens` | int | 2048 | Max tokens per response |
| `chat.default_temperature` | float | 0.7 | Creativity (0.0=deterministic, 2.0=creative) |
| `chat.allow_temperature_override` | bool | true | Let users adjust temperature |
---
### vault
Obsidian vault indexing configuration.
```json
{
"vault": {
"path": "~/KnowledgeVault/Default",
"indexing": {
"auto_sync": true,
"auto_sync_interval_minutes": 1440,
"watch_fs_events": true,
"file_patterns": ["*.md"],
"deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git"],
"deny_patterns": ["*.tmp", "*.bak", "*conflict*"]
},
"chunking_rules": {
"default": {
"strategy": "sliding_window",
"chunk_size": 500,
"chunk_overlap": 100
},
"Journal/**": {
"strategy": "section",
"section_tags": ["#DayInShort", "#mentalhealth", "#work"],
"chunk_size": 300,
"chunk_overlap": 50
}
}
}
}
```
---
### rag
RAG (Retrieval-Augmented Generation) engine configuration.
```json
{
"rag": {
"embedding": {
"provider": "ollama",
"model": "mxbai-embed-large",
"base_url": "http://localhost:11434",
"dimensions": 1024,
"batch_size": 32
},
"vector_store": {
"type": "lancedb",
"path": "~/.companion/vectors.lance"
},
"search": {
"default_top_k": 8,
"max_top_k": 20,
"similarity_threshold": 0.75,
"hybrid_search": {
"enabled": true,
"keyword_weight": 0.3,
"semantic_weight": 0.7
},
"filters": {
"date_range_enabled": true,
"tag_filter_enabled": true,
"directory_filter_enabled": true
}
}
}
}
```
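With the defaults above, each result's final relevance is a weighted blend of its keyword and semantic scores. Conceptually (the actual fusion in `companion.rag.search` may normalize differently):

```python
def hybrid_score(keyword_score: float, semantic_score: float,
                 keyword_weight: float = 0.3, semantic_weight: float = 0.7) -> float:
    """Blend normalized (0-1) keyword and semantic relevance scores."""
    return keyword_weight * keyword_score + semantic_weight * semantic_score

# A chunk that matches semantically but shares few exact terms still ranks well:
score = hybrid_score(keyword_score=0.2, semantic_score=0.9)  # ~0.69
```

Raising `keyword_weight` favors exact-term matches (names, tags); raising `semantic_weight` favors paraphrases and related concepts.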
---
### model
LLM configuration for inference and fine-tuning.
```json
{
"model": {
"inference": {
"backend": "llama.cpp",
"model_path": "~/.companion/models/companion-7b-q4.gguf",
"context_length": 8192,
"gpu_layers": 35,
"batch_size": 512,
"threads": 8
},
"fine_tuning": {
"base_model": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
"output_dir": "~/.companion/training",
"lora_rank": 16,
"lora_alpha": 32,
"learning_rate": 0.0002,
"batch_size": 4,
"gradient_accumulation_steps": 4,
"num_epochs": 3,
"warmup_steps": 100,
"save_steps": 500,
"eval_steps": 250,
"training_data_path": "~/.companion/training_data/",
"validation_split": 0.1
},
"retrain_schedule": {
"auto_reminder": true,
"default_interval_days": 90,
"reminder_channels": ["chat_stream", "log"]
}
}
}
```
---
### api
FastAPI backend configuration.
```json
{
"api": {
"host": "127.0.0.1",
"port": 7373,
"cors_origins": ["http://localhost:5173"],
"auth": {
"enabled": false
}
}
}
```
---
### ui
Web UI configuration.
```json
{
"ui": {
"web": {
"enabled": true,
"theme": "obsidian",
"features": {
"streaming": true,
"citations": true,
"source_preview": true
}
},
"cli": {
"enabled": true,
"rich_output": true
}
}
}
```
---
### logging
Logging configuration.
```json
{
"logging": {
"level": "INFO",
"file": "~/.companion/logs/companion.log",
"max_size_mb": 100,
"backup_count": 5
}
}
```
---
### security
Security and privacy settings.
```json
{
"security": {
"local_only": true,
"vault_path_traversal_check": true,
"sensitive_content_detection": true,
"sensitive_patterns": [
"#mentalhealth",
"#physicalhealth",
"#finance",
"#Relations"
],
"require_confirmation_for_external_apis": true
}
}
```
---
## Full Example
See [config.json](../config.json) for a complete working configuration.

docs/forge.md (new file, 288 lines)

@@ -0,0 +1,288 @@
# FORGE Module Documentation
The FORGE module handles fine-tuning of the companion model. It extracts training examples from your vault reflections and trains a custom LoRA adapter using QLoRA on your local GPU.
## Architecture
```
Vault Reflections
┌─────────────────┐
│ Extract │ - Scan for #reflection, #insight tags
│ (extract.py) │ - Parse reflection patterns
└────────┬────────┘
┌─────────────────┐
│ Curate │ - Manual review (optional)
│ (curate.py) │ - Deduplication
└────────┬────────┘
┌─────────────────┐
│ Train │ - QLoRA fine-tuning
│ (train.py) │ - Unsloth + transformers
└────────┬────────┘
┌─────────────────┐
│ Export │ - Merge LoRA weights
│ (export.py) │ - Convert to GGUF
└────────┬────────┘
┌─────────────────┐
│ Reload │ - Hot-swap in API
│ (reload.py) │ - No restart needed
└─────────────────┘
```
## Requirements
- **GPU**: RTX 5070 or equivalent (12GB+ VRAM)
- **Dependencies**: Install with `pip install -e ".[train]"`
- **Time**: 4-6 hours for full training run
## Workflow
### 1. Extract Training Data
Scan your vault for reflection patterns:
```bash
python -m companion.forge.cli extract
```
This scans for:
- Tags: `#reflection`, `#insight`, `#learning`, `#decision`, etc.
- Patterns: "I think", "I realize", "Looking back", "What if"
- Section headers in journal entries
Output: `~/.companion/training_data/extracted.jsonl`
**Example extracted data:**
```json
{
"messages": [
{"role": "system", "content": "You are a thoughtful, reflective companion."},
{"role": "user", "content": "I'm facing a decision. How should I think through this?"},
{"role": "assistant", "content": "#reflection I think I need to slow down..."}
],
"source_file": "Journal/2026/04/2026-04-12.md",
"tags": ["#reflection", "#DayInShort"],
"date": "2026-04-12"
}
```
### 2. Train Model
Run QLoRA fine-tuning:
```bash
python -m companion.forge.cli train --epochs 3 --lr 2e-4
```
**Hyperparameters (from config):**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `lora_rank` | 16 | LoRA rank (8-64) |
| `lora_alpha` | 32 | LoRA scaling factor |
| `learning_rate` | 2e-4 | Optimizer learning rate |
| `num_epochs` | 3 | Training epochs |
| `batch_size` | 4 | Per-device batch |
| `gradient_accumulation_steps` | 4 | Steps before update |
**Training Output:**
- Checkpoints: `~/.companion/training/checkpoint-*/`
- Final model: `~/.companion/training/final/`
- Logs: Training loss, eval metrics
### 3. Reload Model
Hot-swap without restarting API:
```bash
python -m companion.forge.cli reload ~/.companion/training/final
```
Or via API:
```bash
curl -X POST http://localhost:7373/admin/reload-model \
-H "Content-Type: application/json" \
-d '{"model_path": "~/.companion/training/final"}'
```
## Components
### Extractor (`companion.forge.extract`)
```python
from companion.forge.extract import TrainingDataExtractor, extract_training_data
# Extract from vault
extractor = TrainingDataExtractor(config)
examples = extractor.extract()
# Get statistics
stats = extractor.get_stats()
print(f"Extracted {stats['total']} examples")
# Save to JSONL
extractor.save_to_jsonl(Path("training.jsonl"))
```
**Reflection Detection:**
- **Tags**: `#reflection`, `#learning`, `#insight`, `#decision`, `#analysis`, `#takeaway`, `#realization`
- **Patterns**: "I think", "I feel", "I realize", "I wonder", "Looking back", "On one hand...", "Ultimately decided"
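A hedged sketch of how the tag and phrase heuristics might be combined (the real extractor's rules may be more elaborate):

```python
import re

REFLECTION_TAGS = {"#reflection", "#learning", "#insight", "#decision",
                   "#analysis", "#takeaway", "#realization"}
REFLECTION_PATTERNS = re.compile(
    r"\b(I think|I feel|I realize|I wonder|Looking back|Ultimately decided)\b",
    re.IGNORECASE,
)

def looks_like_reflection(text: str) -> bool:
    """True if the text carries a reflection tag or a reflective phrase."""
    tags = set(re.findall(r"#\w+", text))
    return bool(tags & REFLECTION_TAGS) or bool(REFLECTION_PATTERNS.search(text))
```

Either signal alone is enough to mark a passage as a candidate; curation then filters false positives.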
### Trainer (`companion.forge.train`)
```python
from companion.forge.train import train
final_path = train(
data_path=Path("training.jsonl"),
output_dir=Path("~/.companion/training"),
base_model="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
lora_rank=16,
lora_alpha=32,
learning_rate=2e-4,
num_epochs=3,
)
```
**Base Models:**
- `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` - Recommended
- `unsloth/llama-3-8b-bnb-4bit` - Alternative
**Target Modules:**
LoRA is applied to: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
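For intuition: LoRA learns a low-rank update `delta_W = (alpha / rank) * B @ A` that is added to each frozen target weight, so only the small factors `B` and `A` train. A toy numeric sketch with tiny dimensions (real layers are thousands wide):

```python
# Toy LoRA update: W' = W + (alpha / r) * (B @ A), in plain Python.
d, r, alpha = 4, 1, 2  # layer dim, LoRA rank, LoRA alpha

W = [[0.0] * d for _ in range(d)]   # frozen base weight (zeros for clarity)
B = [[1.0] for _ in range(d)]       # d x r trained factor
A = [[0.5] * d]                     # r x d trained factor

scale = alpha / r
delta = [[scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
         for i in range(d)]
W_adapted = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

trainable = d * r + r * d  # 8 numbers here, vs d*d = 16 for the full weight
```

The savings grow with dimension: at `d=4096, r=16`, the adapter trains roughly 0.8% of the parameters of each full projection.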
### Exporter (`companion.forge.export`)
```python
from companion.forge.export import merge_only
# Merge LoRA into base model
merged_path = merge_only(
checkpoint_path=Path("~/.companion/training/checkpoint-500"),
output_path=Path("~/.companion/models/merged"),
)
```
### Reloader (`companion.forge.reload`)
```python
from companion.forge.reload import reload_model, get_model_status
# Check current model
status = get_model_status(config)
print(f"Model size: {status['size_mb']} MB")
# Reload with new model
new_path = reload_model(
config=config,
new_model_path=Path("~/.companion/training/final"),
backup=True,
)
```
## CLI Reference
```bash
# Extract training data
# Extract training data
python -m companion.forge.cli extract [--output PATH]
# Train model
python -m companion.forge.cli train \
  [--data PATH] \
  [--output PATH] \
  [--epochs N] \
  [--lr FLOAT]
# Check model status
python -m companion.forge.cli status
# Reload model
python -m companion.forge.cli reload MODEL_PATH [--no-backup]
```
## Training Tips
**Dataset Size:**
- Minimum: 50 examples
- Optimal: 100-500 examples
- More is not always better; prioritize quality over quantity
**Epochs:**
- Start with 3 epochs
- Increase if underfitting (high loss)
- Decrease if overfitting (loss increases on eval)
**LoRA Rank:**
- `8` - Quick experiments
- `16` - Balanced (recommended)
- `32-64` - High capacity, more VRAM
**Overfitting Signs:**
- Training loss decreasing, eval loss increasing
- Model repeats exact phrases from training data
- Responses feel "memorized" not "learned"
## VRAM Usage (RTX 5070, 12GB)
| Config | VRAM | Batch Size |
|--------|------|------------|
| Rank 16, 8-bit Adam | ~10GB | 4 |
| Rank 32, 8-bit Adam | ~11GB | 4 |
| Rank 64, 8-bit Adam | OOM | - |
Use `gradient_accumulation_steps` to increase effective batch size.
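The effective batch size is the per-device batch times the accumulation steps: gradients from several small forward passes are summed before each optimizer update, trading wall-clock time for VRAM:

```python
batch_size = 4                   # fits in 12GB VRAM at rank 16
gradient_accumulation_steps = 4  # accumulate gradients before each optimizer step
effective_batch = batch_size * gradient_accumulation_steps  # 16 examples per update
```

Halving `batch_size` and doubling `gradient_accumulation_steps` keeps training dynamics roughly the same while using less memory.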
## Troubleshooting
**CUDA Out of Memory**
- Reduce `lora_rank` to 8
- Reduce `batch_size` to 2
- Increase `gradient_accumulation_steps`
**Training Loss Not Decreasing**
- Check data quality (reflections present?)
- Increase learning rate to 5e-4
- Check for data formatting issues
**Model Not Loading After Reload**
- Check path exists: `ls -la ~/.companion/models/`
- Verify model format (GGUF vs HF)
- Check API logs for errors
**Slow Training**
- Expected: ~6 hours for 3 epochs on RTX 5070
- Enable gradient checkpointing (enabled by default)
- Close other GPU applications
## Advanced: Custom Training Script
```python
# custom_train.py
from companion.forge.train import train
from companion.config import load_config
config = load_config()
final_path = train(
data_path=config.model.fine_tuning.training_data_path / "curated.jsonl",
output_dir=config.model.fine_tuning.output_dir,
base_model=config.model.fine_tuning.base_model,
lora_rank=32, # Higher capacity
lora_alpha=64,
learning_rate=3e-4, # Slightly higher
num_epochs=5, # More epochs
batch_size=2, # Smaller batches
gradient_accumulation_steps=8, # Effective batch = 16
)
print(f"Model saved to: {final_path}")
```

docs/rag.md (new file, 269 lines)

@@ -0,0 +1,269 @@
# RAG Module Documentation
The RAG (Retrieval-Augmented Generation) module provides semantic search over your Obsidian vault. It handles document chunking, embedding generation, and vector similarity search.
## Architecture
```
Vault Markdown Files
┌─────────────────┐
│ Chunker │ - Split by strategy (sliding window / section)
│ (chunker.py) │ - Extract metadata (tags, dates, sections)
└────────┬────────┘
┌─────────────────┐
│ Embedder │ - HTTP client for Ollama API
│ (embedder.py) │ - Batch processing with retries
└────────┬────────┘
┌─────────────────┐
│ Vector Store │ - LanceDB persistence
│(vector_store.py)│ - Upsert, delete, search
└────────┬────────┘
┌─────────────────┐
│ Indexer │ - Full/incremental sync
│ (indexer.py) │ - File watching
└─────────────────┘
```
## Components
### Chunker (`companion.rag.chunker`)
Splits markdown files into searchable chunks.
```python
from pathlib import Path

from companion.rag.chunker import chunk_file, ChunkingRule

rules = {
    "default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100),
    "Journal/**": ChunkingRule(strategy="section", section_tags=["#DayInShort"], chunk_size=300, chunk_overlap=50),
}
chunks = chunk_file(
    file_path=Path("journal/2026-04-12.md"),
    vault_root=Path("~/vault").expanduser(),  # expand ~ explicitly; Path does not
    rules=rules,
    modified_at=1234567890.0,
)
for chunk in chunks:
print(f"{chunk.source_file}:{chunk.chunk_index}")
print(f"Text: {chunk.text[:100]}...")
print(f"Tags: {chunk.tags}")
print(f"Date: {chunk.date}")
```
#### Chunking Strategies
**Sliding Window**
- Fixed-size chunks with overlap
- Best for: Longform text, articles
```python
ChunkingRule(
strategy="sliding_window",
chunk_size=500, # words per chunk
chunk_overlap=100, # words overlap between chunks
)
```
**Section-Based**
- Split on section headers (tags)
- Best for: Structured journals, daily notes
```python
ChunkingRule(
strategy="section",
section_tags=["#DayInShort", "#mentalhealth", "#work"],
chunk_size=300,
chunk_overlap=50,
)
```
#### Metadata Extraction
Each chunk includes:
- `source_file` - Relative path from vault root
- `source_directory` - Top-level directory
- `section` - Section header (for section strategy)
- `date` - Parsed from filename
- `tags` - Hashtags and wikilinks
- `chunk_index` - Position in document
- `modified_at` - File mtime for sync
### Embedder (`companion.rag.embedder`)
Generates embeddings via Ollama API.
```python
from companion.rag.embedder import OllamaEmbedder
embedder = OllamaEmbedder(
base_url="http://localhost:11434",
model="mxbai-embed-large",
batch_size=32,
)
# Single embedding
embeddings = embedder.embed(["Hello world"])
print(len(embeddings[0])) # 1024 dimensions
# Batch embedding (with automatic batching)
texts = ["text 1", "text 2", "text 3", ...] # 100 texts
embeddings = embedder.embed(texts) # Automatically batches
```
#### Features
- **Batching**: Automatically splits large requests
- **Retries**: Exponential backoff on failures
- **Context Manager**: Proper resource cleanup
```python
with OllamaEmbedder(...) as embedder:
embeddings = embedder.embed(texts)
```
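The retry-with-backoff behavior can be sketched as follows (a simplified stand-in, not the embedder's actual internals; `embed_with_retries` and its parameters are illustrative):

```python
import time


def embed_with_retries(embed_fn, texts, max_retries=3, base_delay=1.0):
    """Call embed_fn(texts), retrying with exponential backoff (1s, 2s, 4s ...)."""
    for attempt in range(max_retries):
        try:
            return embed_fn(texts)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2**attempt)
```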
### Vector Store (`companion.rag.vector_store`)
LanceDB wrapper for vector storage.
```python
from companion.rag.vector_store import VectorStore
store = VectorStore(
uri="~/.companion/vectors.lance",
dimensions=1024,
)
# Upsert chunks
store.upsert(
ids=["file.md::0", "file.md::1"],
texts=["chunk 1", "chunk 2"],
embeddings=[[0.1, ...], [0.2, ...]],
metadatas=[
{"source_file": "file.md", "source_directory": "docs"},
{"source_file": "file.md", "source_directory": "docs"},
],
)
# Search
results = store.search(
query_vector=[0.1, ...],
top_k=8,
filters={"source_directory": "Journal"},
)
```
#### Schema
| Field | Type | Nullable |
|-------|------|----------|
| id | string | No |
| text | string | No |
| vector | list[float32] | No |
| source_file | string | No |
| source_directory | string | No |
| section | string | Yes |
| date | string | Yes |
| tags | list[string] | Yes |
| chunk_index | int32 | No |
| total_chunks | int32 | No |
| modified_at | float64 | Yes |
| rule_applied | string | No |
### Indexer (`companion.rag.indexer`)
Orchestrates vault indexing.
```python
from companion.config import load_config
from companion.rag.indexer import Indexer
from companion.rag.vector_store import VectorStore
config = load_config()
store = VectorStore(
uri=config.rag.vector_store.path,
dimensions=config.rag.embedding.dimensions,
)
indexer = Indexer(config, store)
# Full reindex (clear + rebuild)
indexer.full_index()
# Incremental sync (only changed files)
indexer.sync()
# Get status
status = indexer.status()
print(f"Total chunks: {status['total_chunks']}")
print(f"Unindexed files: {status['unindexed_files']}")
```
### Search (`companion.rag.search`)
High-level search interface.
```python
from companion.rag.search import SearchEngine
engine = SearchEngine(
vector_store=store,
embedder_base_url="http://localhost:11434",
embedder_model="mxbai-embed-large",
default_top_k=8,
similarity_threshold=0.75,
hybrid_search_enabled=False,
)
results = engine.search(
query="What did I learn about friendships?",
top_k=8,
filters={"source_directory": "Journal"},
)
for result in results:
print(f"Source: {result['source_file']}")
print(f"Relevance: {1 - result['_distance']:.2f}")
```
## CLI Commands
```bash
# Full index
python -m companion.indexer_daemon.cli index
# Incremental sync
python -m companion.indexer_daemon.cli sync
# Check status
python -m companion.indexer_daemon.cli status
# Reindex (same as index)
python -m companion.indexer_daemon.cli reindex
```
## Performance Tips
1. **Chunk Size**: Smaller chunks improve retrieval precision; larger chunks give each result more context
2. **Batch Size**: 32 is a good default for Ollama embeddings
3. **Filters**: Use directory filters to narrow the search scope
4. **Sync vs Index**: Use `sync` for daily updates, `index` for full rebuilds
## Troubleshooting
**Slow indexing**
- Check Ollama is running: `ollama ps`
- Reduce batch size if OOM
**No results**
- Verify vault path in config
- Check `indexer.status()` for unindexed files
**Duplicate chunks**
- Each chunk ID is `{source_file}::{chunk_index}`
- Use `full_index()` to clear and rebuild
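This ID scheme is what makes re-indexing idempotent; sketched:

```python
def chunk_id(source_file: str, chunk_index: int) -> str:
    """Build the deterministic upsert key for a chunk."""
    return f"{source_file}::{chunk_index}"


# Re-indexing the same file yields the same IDs, so upsert overwrites
# the old rows instead of duplicating them.
print(chunk_id("journal/2026-04-12.md", 0))  # journal/2026-04-12.md::0
```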

---

`docs/ui.md`
# UI Module Documentation
The UI is a React + Vite frontend for the companion chat interface. It provides real-time streaming chat with a clean, Obsidian-inspired dark theme.
## Architecture
```
HTTP/SSE
┌─────────────────┐
│ App.tsx │ - State management
│ Message state │ - User/assistant messages
└────────┬────────┘
┌─────────────────┐
│ MessageList │ - Render messages
│ (components/) │ - User/assistant styling
└─────────────────┘
┌─────────────────┐
│ ChatInput │ - Textarea + send
│ (components/) │ - Auto-resize, hotkeys
└─────────────────┘
┌─────────────────┐
│ useChatStream │ - SSE streaming
│ (hooks/) │ - Session management
└─────────────────┘
```
## Project Structure
```
ui/
├── src/
│ ├── main.tsx # React entry point
│ ├── App.tsx # Main app component
│ ├── App.css # App layout styles
│ ├── index.css # Global styles
│ ├── components/
│ │ ├── MessageList.tsx # Message display
│ │ ├── MessageList.css # Message styling
│ │ ├── ChatInput.tsx # Input textarea
│ │ └── ChatInput.css # Input styling
│ └── hooks/
│ └── useChatStream.ts # SSE streaming hook
├── index.html # HTML template
├── vite.config.ts # Vite configuration
├── tsconfig.json # TypeScript config
└── package.json # Dependencies
```
## Components
### App.tsx
Main application state management:
```typescript
interface Message {
role: 'user' | 'assistant'
content: string
}
// State
const [messages, setMessages] = useState<Message[]>([])
const [input, setInput] = useState('')
const [isLoading, setIsLoading] = useState(false)
// Handlers
const handleSend = async () => { /* ... */ }
const handleKeyDown = (e: React.KeyboardEvent) => { /* Enter to send, Shift+Enter for newline */ }
```
**Features:**
- Auto-scroll to bottom on new messages
- Keyboard shortcuts (Enter to send, Shift+Enter for newline)
- Loading state with animation
- Message streaming in real-time
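The Enter / Shift+Enter behavior boils down to a single predicate; a minimal sketch (the `shouldSend` helper is illustrative, not the app's actual code):

```typescript
// Decide whether a keypress should send the message:
// Enter sends; Shift+Enter falls through and inserts a newline.
function shouldSend(key: string, shiftKey: boolean): boolean {
  return key === 'Enter' && !shiftKey
}

// Wiring it into the component might look like:
// const handleKeyDown = (e: React.KeyboardEvent) => {
//   if (shouldSend(e.key, e.shiftKey)) {
//     e.preventDefault()
//     handleSend()
//   }
// }
```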
### MessageList.tsx
Renders the chat history:
```typescript
interface MessageListProps {
messages: Message[]
isLoading: boolean
}
```
**Layout:**
- User messages: Right-aligned, blue background
- Assistant messages: Left-aligned, gray background with border
- Loading indicator: Three animated dots
- Empty state: Prompt text when no messages
**Styling:**
- Max-width 800px, centered
- Smooth scroll behavior
- Avatar-less design (clean, text-focused)
### ChatInput.tsx
Textarea input with send button:
```typescript
interface ChatInputProps {
value: string
onChange: (value: string) => void
onSend: () => void
onKeyDown: (e: KeyboardEvent) => void
disabled: boolean
}
```
**Features:**
- Auto-resizing textarea
- Send button with loading state
- Placeholder text
- Disabled during streaming
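Auto-resizing typically means syncing the element's height to its `scrollHeight`, clamped to a maximum. A sketch (the helper name and the 200px cap are assumptions, not the component's actual values):

```typescript
// Clamp the textarea's natural content height to a maximum.
function clampHeight(scrollHeight: number, maxHeight = 200): number {
  return Math.min(scrollHeight, maxHeight)
}

// In the component, on each input event:
//   textarea.style.height = 'auto'
//   textarea.style.height = `${clampHeight(textarea.scrollHeight)}px`
```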
## Hooks
### useChatStream.ts
Manages SSE streaming connection:
```typescript
interface UseChatStreamReturn {
sendMessage: (
message: string,
onChunk: (chunk: string) => void
) => Promise<void>
sessionId: string | null
}
const { sendMessage, sessionId } = useChatStream()
```
**Usage:**
```typescript
await sendMessage("Hello", (chunk) => {
  // Append chunk to the in-progress assistant message.
  // Copy the last message instead of mutating it: React state
  // must be updated immutably.
  setMessages(prev => {
    const last = prev[prev.length - 1]
    if (last?.role === 'assistant') {
      return [...prev.slice(0, -1), { ...last, content: last.content + chunk }]
    }
    return [...prev, { role: 'assistant', content: chunk }]
  })
})
```
**SSE Protocol:**
The API streams events in this format:
```
data: {"type": "chunk", "content": "Hello"}
data: {"type": "chunk", "content": " world"}
data: {"type": "sources", "sources": [{"file": "journal.md"}]}
data: {"type": "done", "session_id": "uuid"}
```
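Each `data:` line can be decoded into a discriminated union. A sketch of a parser for the protocol above (`parseSSELine` is illustrative, not the app's actual code):

```typescript
// Typed events matching the SSE protocol above.
type StreamEvent =
  | { type: 'chunk'; content: string }
  | { type: 'sources'; sources: Array<{ file: string }> }
  | { type: 'done'; session_id: string }

// Parse one SSE line of the form `data: {...}` into a typed event.
// Returns null for blank lines and non-data lines (comments, field names).
function parseSSELine(line: string): StreamEvent | null {
  if (!line.startsWith('data: ')) return null
  return JSON.parse(line.slice('data: '.length)) as StreamEvent
}
```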
## Styling
### Design System
Based on Obsidian's dark theme:
```css
:root {
--bg-primary: #0d1117; /* App background */
--bg-secondary: #161b22; /* Header/footer */
--bg-tertiary: #21262d; /* Input background */
--text-primary: #c9d1d9; /* Main text */
--text-secondary: #8b949e; /* Placeholder */
--accent-primary: #58a6ff; /* Primary blue */
  --accent-secondary: #79c0ff; /* Lighter blue */
--border: #30363d; /* Borders */
--user-bg: #1f6feb; /* User message */
--assistant-bg: #21262d; /* Assistant message */
}
```
### Message Styling
**User Message:**
- Blue background (`--user-bg`)
- White text
- Border radius: 12px (12px 12px 4px 12px)
- Max-width: 80%
**Assistant Message:**
- Gray background (`--assistant-bg`)
- Light text (`--text-primary`)
- Border: 1px solid `--border`
- Border radius: 12px (12px 12px 12px 4px)
### Loading Animation
Three bouncing dots using CSS keyframes:
```css
@keyframes bounce {
0%, 80%, 100% { transform: scale(0.6); }
40% { transform: scale(1); }
}
```
## Development
### Setup
```bash
cd ui
npm install
```
### Dev Server
```bash
npm run dev
# Opens http://localhost:5173
```
### Build
```bash
npm run build
# Output: ui/dist/
```
### Preview Production Build
```bash
npm run preview
```
## Configuration
### Vite Config
`vite.config.ts`:
```typescript
export default defineConfig({
plugins: [react()],
server: {
port: 5173,
proxy: {
'/api': {
target: 'http://localhost:7373',
changeOrigin: true,
},
},
},
})
```
**Proxy Setup:**
- Frontend: `http://localhost:5173`
- API: `http://localhost:7373`
- `/api/*` → `http://localhost:7373/api/*`
This allows using relative API paths in the code:
```typescript
const API_BASE = '/api' // Not http://localhost:7373/api
```
## TypeScript
### Types
```typescript
// Message role
type Role = 'user' | 'assistant'
// Message object
interface Message {
role: Role
content: string
}
// Chat request
type ChatRequest = {
message: string
session_id?: string
temperature?: number
}
// SSE chunk
type ChunkEvent = {
type: 'chunk'
content: string
}
type SourcesEvent = {
type: 'sources'
sources: Array<{
file: string
section?: string
date?: string
}>
}
type DoneEvent = {
type: 'done'
session_id: string
}
```
## API Integration
### Chat Endpoint
```typescript
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: userInput,
session_id: sessionId, // null for new session
stream: true,
}),
})
// Read SSE stream (guard against a missing body so the reader is never undefined)
if (!response.body) throw new Error('No response body')
const reader = response.body.getReader()
const decoder = new TextDecoder()
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  const chunk = decoder.decode(value, { stream: true })
  // Parse SSE lines from `chunk`
}
```
### Session Persistence
The backend maintains conversation history via `session_id`:
1. First message: `session_id: null` → backend creates UUID
2. Response header: `X-Session-ID: <uuid>`
3. Subsequent messages: include `session_id: <uuid>`
4. History retrieved automatically
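The flow above can be sketched as a small piece of client state (names here are illustrative, not the app's actual code):

```typescript
// Track the session across requests: start with null, then reuse the
// UUID the backend returns (via the X-Session-ID header or the `done` event).
let sessionId: string | null = null

function rememberSession(done: { type: 'done'; session_id: string }): void {
  sessionId = done.session_id
}

function buildChatRequest(message: string) {
  // First request carries session_id: null; later ones reuse the UUID.
  return { message, session_id: sessionId, stream: true }
}
```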
## Customization
### Themes
Modify `App.css` and `index.css`:
```css
/* Custom accent color */
--accent-primary: #ff6b6b;
--user-bg: #ff6b6b;
```
### Fonts
Update `index.css`:
```css
body {
font-family: 'Inter', -apple-system, sans-serif;
}
```
### Message Layout
Modify `MessageList.css`:
```css
.message-content {
max-width: 90%; /* Wider messages */
font-size: 16px; /* Larger text */
}
```
## Troubleshooting
**CORS errors**
- Check `vite.config.ts` proxy configuration
- Verify backend CORS origins include `http://localhost:5173`
**Stream not updating**
- Check browser network tab for SSE events
- Verify `EventSourceResponse` from backend
**Messages not appearing**
- Check React DevTools for state updates
- Verify `messages` state is updated immutably (copy the last message, never mutate it in place)
**Build fails**
- Check TypeScript errors: `npx tsc --noEmit`
- Update dependencies: `npm update`