# Personal Companion AI — Design Spec

**Date:** 2026-04-13
**Status:** Approved
**Author:** Santhosh Janardhanan

## 1. Overview

A fully local, privacy-first AI companion trained on Santhosh's Obsidian vault. The companion is a reflective confidante — not a digital twin or advisor — that can answer questions about past events, summarize relationships, and explore life patterns alongside him.

The system combines a **fine-tuned local LLM** (for reasoning style and reflective voice) with a **RAG layer** (for factual retrieval from 677+ vault notes).

## 2. Core Philosophy

- **Companion, not clone**: The AI does not speak as Santhosh. It speaks *to* him.
- **Fully local**: No vault data leaves the machine. Ollama, LanceDB, and inference all run locally.
- **Evolving self**: Quarterly model retraining and a daily RAG sync keep the companion aligned with his changing life.
- **Minimal noise**: Notifications are quiet (streaming text + logs). No pop-ups.

## 3. Architecture

### Approach: Decoupled Services

Three independent processes:

```
┌─────────────────────────────────────────────────────────────┐
│                Companion Chat (Web UI / CLI)                │
│      React + Vite  ←───→  FastAPI backend (orchestrator)    │
└─────────────────────────────────────────────────────────────┘
                           │
       ┌───────────────────┼────────────────────┐
       ↓                   ↓                    ↓
┌──────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Fine-tuned  │  │   RAG Engine    │  │  Vault Indexer  │
│   8B Model   │  │   (LanceDB)     │  │  (daemon/CLI)   │
│ (llama.cpp)  │  │                 │  │                 │
│              │  │ • semantic      │  │ • watches vault │
│  Quarterly   │  │   search        │  │ • chunks/embeds │
│   retrain    │  │ • hybrid        │  │ • daily auto    │
│              │  │   filters       │  │   sync          │
│              │  │ • relationship  │  │ • manual        │
│              │  │   graph         │  │   trigger       │
└──────────────┘  └─────────────────┘  └─────────────────┘
```

### Service Responsibilities

| Service | Language | Role |
|---------|----------|------|
| **Companion Engine** | Python (FastAPI) + TS (React) | Chat UI, session memory, prompt orchestration |
| **RAG Engine** | Python | LanceDB queries, embedding cache, hybrid search |
| **Vault Indexer** | Python (CLI + daemon) | File watching, chunking, embedding via Ollama |
| **Model Forge** | Python (on-demand) | QLoRA fine-tuning, GGUF export |

## 4. Data Flow: A Chat Turn

1. User asks: *"I've been feeling off about my friendship with Vinay. What do you think?"*
2. **Orchestrator** detects this needs relationship context + reflective reasoning.
3. **RAG Engine** queries LanceDB for `Vinay`, `friendship`, `#Relations` — returns top 8 chunks.
4. **Prompt construction**:
   - System: Companion persona + reasoning instructions
   - Retrieved context: Relevant vault entries
   - Conversation history: Last 20 turns
   - User message
5. **Local LLM** streams a reflective response.
6. **Optional**: Companion asks a gentle follow-up to deepen reflection.
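
Step 4 can be sketched as a pure function. The names (`build_prompt`, `SYSTEM_PROMPT`, the chunk dict shape) are illustrative assumptions, not the orchestrator's actual API:

```python
# Hypothetical sketch of prompt construction for one chat turn.
SYSTEM_PROMPT = "You are a thoughtful companion..."  # persona + reasoning instructions

def build_prompt(retrieved_chunks, history, user_message, max_turns=20):
    """Assemble the message list sent to the local LLM."""
    # Fold retrieved vault entries into the system message as labeled context.
    context = "\n\n".join(
        f"[{c['source_file']}] {c['text']}" for c in retrieved_chunks
    )
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nRelevant vault entries:\n{context}"},
        *history[-max_turns:],  # keep only the last N conversation turns
        {"role": "user", "content": user_message},
    ]

msgs = build_prompt(
    [{"source_file": "Journal/2026-04-10.md", "text": "Vinay dropped by..."}],
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}],
    "I've been feeling off about my friendship with Vinay.",
)
```

The key design point the sketch captures: retrieved context lives in the system message, while session memory is a bounded tail of prior turns.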
## 5. Technology Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Base model | Meta-Llama-3.1-8B-Instruct | Strong reasoning, fits 12GB VRAM quantized |
| Fine-tuning | Unsloth + QLoRA (4-bit) | Fast, memory-efficient, runs on RTX 5070 |
| Inference | llama.cpp (GGUF) | Mature, fast local inference, easy GPU layer tuning |
| Embedding | `mxbai-embed-large` via Ollama | 1024-dim, local, high quality |
| Vector store | LanceDB (embedded) | File-based, no server, Rust-backed |
| Backend | FastAPI + WebSockets | Streaming chat, simple Python API |
| Frontend | React + Vite | Lightweight, fast dev loop |
| File watcher | `watchdog` (Python) | Reliable cross-platform vault monitoring |

## 6. Fine-Tuning Strategy

### What the model learns

- **Reflective reasoning style**: How Santhosh thinks through situations
- **Values and priorities**: What he tends to weigh in decisions
- **Communication patterns**: His tone in journal entries (direct, questioning, humorous)
- **Relationship dynamics**: Patterns in how he describes people over time

### What stays in RAG

- Specific dates, events, amounts
- Exact quotes and conversations
- Recent updates (between retrainings)
- Granular facts

### Training data format

Curated "reflection examples" from the vault, formatted as conversation turns:

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful companion..."},
    {"role": "user", "content": "Journal entry about Vinay visit... What do you notice?"},
    {"role": "assistant", "content": "It seems like you value these drop-ins..."}
  ]
}
```
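
As a sketch of how Forge might emit one such example as a JSONL line — the `extract_example` helper is hypothetical; only the message format comes from this spec:

```python
# Hypothetical helper pairing a journal excerpt with a curated reflection.
import json

SYSTEM = "You are a thoughtful companion..."

def extract_example(journal_text, reflection_text):
    """Build one training example in the messages format."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"{journal_text.strip()} What do you notice?"},
            {"role": "assistant", "content": reflection_text.strip()},
        ]
    }

example = extract_example(
    "Journal entry about Vinay visit...",
    "It seems like you value these drop-ins...",
)
line = json.dumps(example)  # one JSON object per line (JSONL) for the trainer
```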
### Training schedule

- **Quarterly retrain**: Automatic reminder (log + chat stream) every 90 days.
- **Manual trigger**: User can initiate a retrain anytime via CLI/UI.
- **Pipeline**: `vault → extract reflections → curate → train → export GGUF → swap model file`
## 7. RAG Engine Design

### Indexing Modes

- **`index`**: Build the vector store from scratch.
- **`sync`**: Incremental — only process files modified since last sync.
- **`reindex`**: Force a full rebuild, discarding the existing store.
- **`status`**: Show doc count, last sync, unindexed files.
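
A minimal `argparse` sketch of these four modes, assuming a subcommand-style CLI (the real `cli.py` may be organized differently):

```python
# Hypothetical subcommand layout for the indexer CLI.
import argparse

def make_parser():
    parser = argparse.ArgumentParser(prog="companion-index")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("index", help="build the vector store from scratch")
    sub.add_parser("sync", help="incremental: only files modified since last sync")
    sub.add_parser("reindex", help="force a full rebuild")
    sub.add_parser("status", help="doc count, last sync, unindexed files")
    return parser

args = make_parser().parse_args(["sync"])
```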
### Auto-Sync Strategy

- **File system watcher**: `watchdog` monitors the vault root and triggers an incremental sync on any `.md` change.
- **Daily full sync**: At 3:00 AM, run a full sync to catch any missed events.
- **Manual trigger**: `POST /index/trigger` from chat or CLI.
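
The daily full-sync schedule reduces to computing the next 3:00 AM after the current time; a minimal sketch of that calculation (the daemon loop and `watchdog` wiring are omitted):

```python
# Pure datetime logic for scheduling the 3:00 AM full sync.
from datetime import datetime, timedelta

def next_full_sync(now, hour=3):
    """Return the next occurrence of `hour`:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # already past today's slot
    return candidate

run_at = next_full_sync(datetime(2026, 4, 13, 14, 30))
```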
### Per-Directory Chunking Rules

Different vault directories need different granularity:

```json
"chunking_rules": {
  "default": {
    "strategy": "sliding_window",
    "chunk_size": 500,
    "chunk_overlap": 100
  },
  "Journal/**": {
    "strategy": "section",
    "section_tags": ["#DayInShort", "#mentalhealth", "#physicalhealth", "#work", "#finance", "#Relations"],
    "chunk_size": 300,
    "chunk_overlap": 50
  },
  "zzz-Archive/**": {
    "strategy": "sliding_window",
    "chunk_size": 800,
    "chunk_overlap": 150
  }
}
```

**Rationale**: Journal entries contain dense emotional/factual tags and benefit from section-based chunking with smaller chunks. Archives are reference material and can be chunked more coarsely.
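
Rule lookup can be sketched with `fnmatch`, whose `*` also crosses `/`, so `Journal/**` matches arbitrarily nested files; the lookup order (first matching non-default rule, else `default`) is an assumption:

```python
# Resolve which chunking rule applies to a vault-relative path.
from fnmatch import fnmatch

CHUNKING_RULES = {
    "default": {"strategy": "sliding_window", "chunk_size": 500, "chunk_overlap": 100},
    "Journal/**": {"strategy": "section", "chunk_size": 300, "chunk_overlap": 50},
    "zzz-Archive/**": {"strategy": "sliding_window", "chunk_size": 800, "chunk_overlap": 150},
}

def rule_for(path):
    """Return (rule name, rule) for the first matching glob, else the default."""
    for pattern, rule in CHUNKING_RULES.items():
        if pattern != "default" and fnmatch(path, pattern):
            return pattern, rule
    return "default", CHUNKING_RULES["default"]

name, rule = rule_for("Journal/2026/2026-04-13.md")
```

The resolved rule name is what gets recorded as `rule_applied` in the chunk metadata.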
### Metadata per Chunk

- `source_file`: Relative path from vault root
- `source_directory`: Top-level directory
- `section`: Section heading (for structured notes)
- `date`: Parsed from filename or frontmatter
- `tags`: All hashtags and wikilinks found in chunk
- `chunk_index`: Position in document
- `modified_at`: File mtime for incremental sync
- `rule_applied`: Which chunking rule was used
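
A sketch of how some of this metadata might be derived for one chunk; the regexes are illustrative approximations of Obsidian tag and wikilink syntax, not the indexer's actual patterns:

```python
# Derive per-chunk metadata: date from a YYYY-MM-DD filename, tags via regex.
import re
from pathlib import PurePosixPath

DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")
TAG_RE = re.compile(r"#[\w/-]+")            # hashtags like #Relations
WIKILINK_RE = re.compile(r"\[\[([^\]]+)\]\]")  # wikilinks like [[Vinay]]

def chunk_metadata(source_file, text, chunk_index):
    path = PurePosixPath(source_file)
    m = DATE_RE.search(path.name)
    return {
        "source_file": source_file,
        "source_directory": path.parts[0],
        "date": m.group(1) if m else None,
        "tags": TAG_RE.findall(text) + WIKILINK_RE.findall(text),
        "chunk_index": chunk_index,
    }

meta = chunk_metadata("Journal/2026-04-13.md", "Saw [[Vinay]] today #Relations", 0)
```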
### Search

- **Default top-k**: 8 chunks
- **Max top-k**: 20
- **Similarity threshold**: 0.75
- **Hybrid search**: Enabled by default (30% keyword, 70% semantic)
- **Filters**: date range, tag list, directory glob
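
The hybrid scoring reduces to a weighted sum followed by the threshold cut; a sketch assuming both score inputs are pre-normalized to [0, 1]:

```python
# Combine keyword and semantic scores (30/70) and apply the 0.75 threshold.
KEYWORD_WEIGHT, SEMANTIC_WEIGHT, THRESHOLD = 0.3, 0.7, 0.75

def hybrid_rank(candidates, top_k=8):
    """candidates: list of (chunk_id, keyword_score, semantic_score)."""
    scored = [
        (cid, KEYWORD_WEIGHT * kw + SEMANTIC_WEIGHT * sem)
        for cid, kw, sem in candidates
    ]
    kept = [(cid, s) for cid, s in scored if s >= THRESHOLD]
    return sorted(kept, key=lambda p: p[1], reverse=True)[:top_k]

# "b" scores 0.3*0.2 + 0.7*0.95 = 0.725, under the threshold, so it is dropped.
results = hybrid_rank([("a", 0.9, 0.9), ("b", 0.2, 0.95), ("c", 0.1, 0.5)])
```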
## 8. Configuration Schema

```json
{
  "companion": {
    "name": "SAN",
    "persona": {
      "role": "companion",
      "tone": "reflective",
      "style": "questioning",
      "boundaries": [
        "does_not_impersonate_user",
        "no_future_predictions",
        "no_medical_or_legal_advice"
      ]
    },
    "memory": {
      "session_turns": 20,
      "persistent_store": "~/.companion/memory.db",
      "summarize_after": 10
    },
    "chat": {
      "streaming": true,
      "max_response_tokens": 2048,
      "default_temperature": 0.7,
      "allow_temperature_override": true
    }
  },
  "vault": {
    "path": "/home/san/KnowledgeVault/Default",
    "indexing": {
      "auto_sync": true,
      "auto_sync_interval_minutes": 1440,
      "watch_fs_events": true,
      "file_patterns": ["*.md"],
      "deny_dirs": [".obsidian", ".trash", ".git", ".logseq"],
      "deny_patterns": ["*.tmp", "*.bak", "*conflict*", ".*"]
    },
    "chunking_rules": {
      "default": { "strategy": "sliding_window", "chunk_size": 500, "chunk_overlap": 100 },
      "Journal/**": { "strategy": "section", "chunk_size": 300, "chunk_overlap": 50 },
      "zzz-Archive/**": { "strategy": "sliding_window", "chunk_size": 800, "chunk_overlap": 150 }
    }
  },
  "rag": {
    "embedding": {
      "provider": "ollama",
      "model": "mxbai-embed-large",
      "base_url": "http://localhost:11434",
      "dimensions": 1024,
      "batch_size": 32
    },
    "vector_store": {
      "type": "lancedb",
      "path": "~/.companion/vectors.lance"
    },
    "search": {
      "default_top_k": 8,
      "max_top_k": 20,
      "similarity_threshold": 0.75,
      "hybrid_search": { "enabled": true, "keyword_weight": 0.3, "semantic_weight": 0.7 },
      "filters": { "date_range_enabled": true, "tag_filter_enabled": true, "directory_filter_enabled": true }
    }
  },
  "model": {
    "inference": {
      "backend": "llama.cpp",
      "model_path": "~/.companion/models/companion-8b-q4.gguf",
      "context_length": 8192,
      "gpu_layers": 35,
      "batch_size": 512,
      "threads": 8
    },
    "fine_tuning": {
      "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
      "output_dir": "~/.companion/training",
      "lora_rank": 16,
      "lora_alpha": 32,
      "learning_rate": 0.0002,
      "batch_size": 4,
      "gradient_accumulation_steps": 4,
      "num_epochs": 3,
      "warmup_steps": 100,
      "save_steps": 500,
      "eval_steps": 250,
      "training_data_path": "~/.companion/training_data/",
      "validation_split": 0.1
    },
    "retrain_schedule": {
      "auto_reminder": true,
      "default_interval_days": 90,
      "reminder_channels": ["chat_stream", "log"]
    }
  },
  "api": {
    "host": "127.0.0.1",
    "port": 7373,
    "cors_origins": ["http://localhost:5173"],
    "auth": { "enabled": false }
  },
  "ui": {
    "web": {
      "enabled": true,
      "theme": "obsidian",
      "features": { "streaming": true, "citations": true, "source_preview": true }
    },
    "cli": { "enabled": true, "rich_output": true }
  },
  "logging": {
    "level": "INFO",
    "file": "~/.companion/logs/companion.log",
    "max_size_mb": 100,
    "backup_count": 5
  },
  "security": {
    "local_only": true,
    "vault_path_traversal_check": true,
    "sensitive_content_detection": true,
    "sensitive_patterns": ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"],
    "require_confirmation_for_external_apis": true
  }
}
```
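
A brief sketch of loading this config and expanding the `~` paths before use; the loader itself is hypothetical, but the key names follow the schema above:

```python
# Load config JSON and expand user-relative paths for the vector store.
import json
from pathlib import Path

def load_config(text):
    cfg = json.loads(text)
    store = cfg["rag"]["vector_store"]
    store["path"] = str(Path(store["path"]).expanduser())  # "~" -> home dir
    return cfg

cfg = load_config("""{
  "rag": {
    "vector_store": { "type": "lancedb", "path": "~/.companion/vectors.lance" }
  }
}""")
```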
## 9. Project Structure

```
companion/
├── README.md
├── pyproject.toml
├── config.json
├── src/
│   ├── companion/          # FastAPI backend + orchestrator
│   │   ├── main.py
│   │   ├── api/
│   │   │   ├── chat.py
│   │   │   ├── index.py
│   │   │   └── status.py
│   │   ├── core/
│   │   │   ├── orchestrator.py
│   │   │   ├── memory.py
│   │   │   └── prompts.py
│   │   └── config.py
│   ├── rag/                # RAG engine
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   └── search.py
│   ├── indexer_daemon/     # Vault watcher + indexer CLI
│   │   ├── daemon.py
│   │   ├── cli.py
│   │   └── watcher.py
│   └── forge/              # Fine-tuning pipeline
│       ├── extract.py
│       ├── train.py
│       ├── export.py
│       └── evaluate.py
├── ui/                     # React frontend
│   ├── src/
│   │   ├── App.tsx
│   │   ├── components/
│   │   │   ├── Chat.tsx
│   │   │   ├── Message.tsx
│   │   │   └── Settings.tsx
│   │   └── api.ts
│   └── package.json
├── tests/
│   ├── companion/
│   ├── rag/
│   └── forge/
└── docs/
    └── superpowers/
        └── specs/
            └── 2026-04-13-personal-companion-ai-design.md
```
## 10. Testing Strategy

| Layer | Tests |
|-------|-------|
| **RAG** | Chunking correctness, search relevance, incremental sync accuracy |
| **API** | Chat streaming, parameter validation, error handling |
| **Security** | Path traversal, sensitive content detection, local-only enforcement |
| **Forge** | Training convergence, eval loss trends, output GGUF validity |
| **E2E** | Full chat turn with RAG retrieval, citation rendering |
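
For the RAG layer, chunking correctness can be pinned down with invariants on size and overlap. A reference sliding-window chunker over token lists (an assumption — the real chunker may operate on characters or sections):

```python
# Reference sliding-window chunker: fixed-size windows advancing by
# (chunk_size - chunk_overlap), so consecutive chunks share `chunk_overlap` items.
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap
    return [
        tokens[i:i + chunk_size]
        for i in range(0, max(len(tokens) - chunk_overlap, 1), step)
    ]

# Invariants a chunking test would assert: full coverage, bounded size,
# and exact overlap between neighbors.
tokens = list(range(1200))
chunks = sliding_window_chunks(tokens, 500, 100)
```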
## 11. Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| **Overfitting** on small dataset | LoRA rank 16, strong regularization, 10% validation split, human eval |
| **Temporal drift** | Quarterly retrain + daily RAG sync |
| **Privacy leak in training data** | Manual curation of training examples; exclude others' private details |
| **Emotional weight / uncanny valley** | Persona is companion, not predictor; framed as reflection |
| **Hardware limits** | QLoRA 4-bit, 8B model, ~35 GPU layers; fallback to CPU offloading if needed |
| **Maintenance fatigue** | Auto-sync removes daily work; retrain is one script + reminder |

## 12. Success Criteria

- [ ] Chat interface streams responses locally with sub-second first-token latency
- [ ] RAG retrieves relevant vault context for >80% of personal questions
- [ ] Fine-tuned model produces responses that "feel" recognizably aligned with Santhosh's reflective style
- [ ] Quarterly retrain completes successfully on the RTX 5070 in <6 hours
- [ ] Daily auto-sync and manual trigger both work reliably
- [ ] No vault data leaves the local machine

## 13. Next Steps

1. Write an implementation plan using the `writing-plans` skill
2. Scaffold the repository structure
3. Build the Vault Indexer + RAG engine first (Weeks 1-2)
4. Integrate the chat UI with the base model (Week 3)
5. Curate training data and begin fine-tuning experiments (Weeks 4-6)
6. Polish, evaluate, and integrate (Weeks 7-8)