# Personal Companion AI — Design Spec

**Date:** 2026-04-13
**Status:** Approved
**Author:** Santhosh Janardhanan

## 1. Overview

A fully local, privacy-first AI companion trained on Santhosh's Obsidian vault. The companion is a reflective confidante—not a digital twin or advisor—that can answer questions about past events, summarize relationships, and explore life patterns alongside him. The system combines a **fine-tuned local LLM** (for reasoning style and reflective voice) with a **RAG layer** (for factual retrieval from 677+ vault notes).

## 2. Core Philosophy

- **Companion, not clone**: The AI does not speak as Santhosh. It speaks *to* him.
- **Fully local**: No vault data leaves the machine. Ollama, LanceDB, and inference all run locally.
- **Evolving self**: Quarterly model retraining + daily RAG sync keeps the companion aligned with his changing life.
- **Minimal noise**: Notifications are quiet (streaming text + logs). No pop-ups.

## 3. Architecture

### Approach: Decoupled Services

Three independent processes:

```
┌─────────────────────────────────────────────────────────────┐
│               Companion Chat (Web UI / CLI)                 │
│      React + Vite ←───→ FastAPI backend (orchestrator)      │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ↓                     ↓                     ↓
┌──────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Fine-tuned  │    │   RAG Engine    │    │  Vault Indexer  │
│   8B Model   │    │   (LanceDB)     │    │  (daemon/CLI)   │
│ (llama.cpp)  │    │                 │    │                 │
│              │    │ • semantic      │    │ • watches vault │
│  Quarterly   │    │   search        │    │ • chunks/embeds │
│  retrain     │    │ • hybrid        │    │ • daily auto    │
│              │    │   filters       │    │   sync          │
│              │    │ • relationship  │    │ • manual        │
│              │    │   graph         │    │   trigger       │
└──────────────┘    └─────────────────┘    └─────────────────┘
```

### Service Responsibilities

| Service | Language | Role |
|---------|----------|------|
| **Companion Engine** | Python (FastAPI) + TS (React) | Chat UI, session memory, prompt orchestration |
| **RAG Engine** | Python | LanceDB queries, embedding cache, hybrid search |
| **Vault Indexer** | Python (CLI + daemon) | File watching, chunking, embedding via Ollama |
| **Model Forge** | Python (on-demand) | QLoRA fine-tuning, GGUF export |

## 4. Data Flow: A Chat Turn

1. User asks: *"I've been feeling off about my friendship with Vinay. What do you think?"*
2. **Orchestrator** detects this needs relationship context + reflective reasoning.
3. **RAG Engine** queries LanceDB for `Vinay`, `friendship`, `#Relations` — returns top 8 chunks.
4. **Prompt construction**:
   - System: Companion persona + reasoning instructions
   - Retrieved context: Relevant vault entries
   - Conversation history: Last 20 turns
   - User message
5. **Local LLM** streams a reflective response.
6. **Optional**: Companion asks a gentle follow-up to deepen reflection.

## 5. Technology Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Base model | Meta-Llama-3.1-8B-Instruct | Strong reasoning, fits 12GB VRAM quantized |
| Fine-tuning | Unsloth + QLoRA (4-bit) | Fast, memory-efficient, runs on RTX 5070 |
| Inference | llama.cpp (GGUF) | Mature, fast local inference, easy GPU layer tuning |
| Embedding | `mxbai-embed-large` via Ollama | 1024-dim, local, high quality |
| Vector store | LanceDB (embedded) | File-based, no server, Rust-backed |
| Backend | FastAPI + WebSockets | Streaming chat, simple Python API |
| Frontend | React + Vite | Lightweight, fast dev loop |
| File watcher | `watchdog` (Python) | Reliable cross-platform vault monitoring |
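The prompt-construction step in the chat-turn flow (section 4) can be sketched against this stack. This is a minimal sketch; the function and field names below are illustrative assumptions, not the actual orchestrator API.

```python
# Sketch of assembling one chat turn's prompt (section 4, step 4).
# All names here are illustrative assumptions, not the real orchestrator API.

SYSTEM_PERSONA = "You are a thoughtful companion..."  # abbreviated persona prompt

def build_prompt(user_message, retrieved_chunks, history, max_history_turns=20):
    """Assemble the message list sent to the local LLM for one turn."""
    # Retrieved vault context, tagged with its source file for citations.
    context = "\n\n".join(
        f"[{c['source_file']}] {c['text']}" for c in retrieved_chunks
    )
    messages = [{"role": "system", "content": SYSTEM_PERSONA}]
    if context:
        messages.append({"role": "system",
                         "content": f"Relevant vault context:\n{context}"})
    messages.extend(history[-max_history_turns:])   # last N turns only
    messages.append({"role": "user", "content": user_message})
    return messages

# Example turn (hypothetical chunk and history)
chunks = [{"source_file": "Journal/2026-04-01.md", "text": "Vinay dropped by..."}]
history = [{"role": "user", "content": "Hi"},
           {"role": "assistant", "content": "Hello."}]
msgs = build_prompt("What do you notice about my friendship with Vinay?",
                    chunks, history)
print(len(msgs))  # 1 persona + 1 context + 2 history + 1 user = 5
```

The list is ordered so the persona always precedes retrieved context, matching the prompt-construction order in section 4.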
## 6. Fine-Tuning Strategy

### What the model learns

- **Reflective reasoning style**: How Santhosh thinks through situations
- **Values and priorities**: What he tends to weigh in decisions
- **Communication patterns**: His tone in journal entries (direct, questioning, humorous)
- **Relationship dynamics**: Patterns in how he describes people over time

### What stays in RAG

- Specific dates, events, amounts
- Exact quotes and conversations
- Recent updates (between retrainings)
- Granular facts

### Training data format

Curated "reflection examples" from the vault, formatted as conversation turns:

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful companion..."},
    {"role": "user", "content": "Journal entry about Vinay visit... What do you notice?"},
    {"role": "assistant", "content": "It seems like you value these drop-ins..."}
  ]
}
```

### Training schedule

- **Quarterly retrain**: Automatic reminder (log + chat stream) every 90 days.
- **Manual trigger**: User can initiate a retrain anytime via CLI/UI.
- **Pipeline**: `vault → extract reflections → curate → train → export GGUF → swap model file`

## 7. RAG Engine Design

### Indexing Modes

- **`index`**: Initial full build of the vector store.
- **`sync`**: Incremental — only process files modified since the last sync.
- **`reindex`**: Force a full rebuild.
- **`status`**: Show doc count, last sync, unindexed files.

### Auto-Sync Strategy

- **File system watcher**: `watchdog` monitors the vault root and triggers an incremental sync on any `.md` change.
- **Daily full sync**: At 3:00 AM, run a full sync to catch any missed events.
- **Manual trigger**: `POST /index/trigger` from chat or CLI.
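The incremental `sync` mode reduces to a mtime comparison against the last recorded sync. A minimal sketch, assuming a JSON state file; deny-list filtering and the actual embedding step are omitted, and all names are illustrative:

```python
# Sketch of incremental "sync": re-embed only .md files whose mtime is newer
# than the last recorded sync. State-file layout is an assumption for this
# sketch; deny_dirs/deny_patterns filtering is omitted for brevity.
import json
import time
from pathlib import Path

def files_to_sync(vault_root, state_file, pattern="*.md"):
    """Return vault files modified since the last sync, oldest path first."""
    state_path = Path(state_file)
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    last_sync = state.get("last_sync", 0.0)
    return sorted(
        p for p in Path(vault_root).rglob(pattern)
        if p.stat().st_mtime > last_sync
    )

def mark_synced(state_file):
    """Record the current time as the last successful sync."""
    Path(state_file).write_text(json.dumps({"last_sync": time.time()}))
```

The daily 3:00 AM full sync then exists as a safety net for events the watcher misses (e.g. changes made while the daemon was down).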
### Per-Directory Chunking Rules

Different vault directories need different granularity:

```json
"chunking_rules": {
  "default": {
    "strategy": "sliding_window",
    "chunk_size": 500,
    "chunk_overlap": 100
  },
  "Journal/**": {
    "strategy": "section",
    "section_tags": ["#DayInShort", "#mentalhealth", "#physicalhealth", "#work", "#finance", "#Relations"],
    "chunk_size": 300,
    "chunk_overlap": 50
  },
  "zzz-Archive/**": {
    "strategy": "sliding_window",
    "chunk_size": 800,
    "chunk_overlap": 150
  }
}
```

**Rationale**: Journal entries contain dense emotional/factual tags and benefit from section-based chunking with smaller chunks. Archives are reference material and can be chunked more coarsely.

### Metadata per Chunk

- `source_file`: Relative path from vault root
- `source_directory`: Top-level directory
- `section`: Section heading (for structured notes)
- `date`: Parsed from filename or frontmatter
- `tags`: All hashtags and wikilinks found in chunk
- `chunk_index`: Position in document
- `modified_at`: File mtime for incremental sync
- `rule_applied`: Which chunking rule was used

### Search

- **Default top-k**: 8 chunks
- **Max top-k**: 20
- **Similarity threshold**: 0.75
- **Hybrid search**: Enabled by default (30% keyword, 70% semantic)
- **Filters**: date range, tag list, directory glob
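Resolving which chunking rule applies to a file (and hence the `rule_applied` metadata value) can be done with simple glob matching. A sketch under one assumption: `fnmatch` semantics, where `*` also crosses `/`, which may differ from the real indexer's matcher:

```python
# Sketch of per-directory chunking-rule resolution (section 7).
# Uses fnmatch glob semantics ("*" matches "/" too) — an assumption,
# not necessarily the real indexer's matcher.
from fnmatch import fnmatch

CHUNKING_RULES = {
    "default":        {"strategy": "sliding_window", "chunk_size": 500, "chunk_overlap": 100},
    "Journal/**":     {"strategy": "section",        "chunk_size": 300, "chunk_overlap": 50},
    "zzz-Archive/**": {"strategy": "sliding_window", "chunk_size": 800, "chunk_overlap": 150},
}

def rule_for(rel_path, rules=CHUNKING_RULES):
    """Return (rule_applied, rule) for a vault-relative path; fall back to default."""
    for pattern, rule in rules.items():
        if pattern != "default" and fnmatch(rel_path, pattern):
            return pattern, rule
    return "default", rules["default"]

print(rule_for("Journal/2026-04-13.md")[0])   # Journal/**
print(rule_for("Projects/notes.md")[0])       # default
```

The returned pattern doubles as the `rule_applied` field stored in each chunk's metadata.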
## 8. Configuration Schema

```json
{
  "companion": {
    "name": "SAN",
    "persona": {
      "role": "companion",
      "tone": "reflective",
      "style": "questioning",
      "boundaries": [
        "does_not_impersonate_user",
        "no_future_predictions",
        "no_medical_or_legal_advice"
      ]
    },
    "memory": {
      "session_turns": 20,
      "persistent_store": "~/.companion/memory.db",
      "summarize_after": 10
    },
    "chat": {
      "streaming": true,
      "max_response_tokens": 2048,
      "default_temperature": 0.7,
      "allow_temperature_override": true
    }
  },
  "vault": {
    "path": "/home/san/KnowledgeVault/Default",
    "indexing": {
      "auto_sync": true,
      "auto_sync_interval_minutes": 1440,
      "watch_fs_events": true,
      "file_patterns": ["*.md"],
      "deny_dirs": [".obsidian", ".trash", ".git", ".logseq"],
      "deny_patterns": ["*.tmp", "*.bak", "*conflict*", ".*"]
    },
    "chunking_rules": {
      "default": {
        "strategy": "sliding_window",
        "chunk_size": 500,
        "chunk_overlap": 100
      },
      "Journal/**": {
        "strategy": "section",
        "section_tags": ["#DayInShort", "#mentalhealth", "#physicalhealth", "#work", "#finance", "#Relations"],
        "chunk_size": 300,
        "chunk_overlap": 50
      },
      "zzz-Archive/**": {
        "strategy": "sliding_window",
        "chunk_size": 800,
        "chunk_overlap": 150
      }
    }
  },
  "rag": {
    "embedding": {
      "provider": "ollama",
      "model": "mxbai-embed-large",
      "base_url": "http://localhost:11434",
      "dimensions": 1024,
      "batch_size": 32
    },
    "vector_store": {
      "type": "lancedb",
      "path": "~/.companion/vectors.lance"
    },
    "search": {
      "default_top_k": 8,
      "max_top_k": 20,
      "similarity_threshold": 0.75,
      "hybrid_search": {
        "enabled": true,
        "keyword_weight": 0.3,
        "semantic_weight": 0.7
      },
      "filters": {
        "date_range_enabled": true,
        "tag_filter_enabled": true,
        "directory_filter_enabled": true
      }
    }
  },
  "model": {
    "inference": {
      "backend": "llama.cpp",
      "model_path": "~/.companion/models/companion-8b-q4.gguf",
      "context_length": 8192,
      "gpu_layers": 35,
      "batch_size": 512,
      "threads": 8
    },
    "fine_tuning": {
      "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
      "output_dir": "~/.companion/training",
      "lora_rank": 16,
      "lora_alpha": 32,
      "learning_rate": 0.0002,
      "batch_size": 4,
      "gradient_accumulation_steps": 4,
      "num_epochs": 3,
      "warmup_steps": 100,
      "save_steps": 500,
      "eval_steps": 250,
      "training_data_path": "~/.companion/training_data/",
      "validation_split": 0.1
    },
    "retrain_schedule": {
      "auto_reminder": true,
      "default_interval_days": 90,
      "reminder_channels": ["chat_stream", "log"]
    }
  },
  "api": {
    "host": "127.0.0.1",
    "port": 7373,
    "cors_origins": ["http://localhost:5173"],
    "auth": { "enabled": false }
  },
  "ui": {
    "web": {
      "enabled": true,
      "theme": "obsidian",
      "features": {
        "streaming": true,
        "citations": true,
        "source_preview": true
      }
    },
    "cli": {
      "enabled": true,
      "rich_output": true
    }
  },
  "logging": {
    "level": "INFO",
    "file": "~/.companion/logs/companion.log",
    "max_size_mb": 100,
    "backup_count": 5
  },
  "security": {
    "local_only": true,
    "vault_path_traversal_check": true,
    "sensitive_content_detection": true,
    "sensitive_patterns": ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"],
    "require_confirmation_for_external_apis": true
  }
}
```

## 9. Project Structure

```
companion/
├── README.md
├── pyproject.toml
├── config.json
├── src/
│   ├── companion/            # FastAPI backend + orchestrator
│   │   ├── main.py
│   │   ├── api/
│   │   │   ├── chat.py
│   │   │   ├── index.py
│   │   │   └── status.py
│   │   ├── core/
│   │   │   ├── orchestrator.py
│   │   │   ├── memory.py
│   │   │   └── prompts.py
│   │   └── config.py
│   ├── rag/                  # RAG engine
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   └── search.py
│   ├── indexer_daemon/       # Vault watcher + indexer CLI
│   │   ├── daemon.py
│   │   ├── cli.py
│   │   └── watcher.py
│   └── forge/                # Fine-tuning pipeline
│       ├── extract.py
│       ├── train.py
│       ├── export.py
│       └── evaluate.py
├── ui/                       # React frontend
│   ├── src/
│   │   ├── App.tsx
│   │   ├── components/
│   │   │   ├── Chat.tsx
│   │   │   ├── Message.tsx
│   │   │   └── Settings.tsx
│   │   └── api.ts
│   └── package.json
├── tests/
│   ├── companion/
│   ├── rag/
│   └── forge/
└── docs/
    └── superpowers/
        └── specs/
            └── 2026-04-13-personal-companion-ai-design.md
```
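The `hybrid_search` weights in the configuration (30% keyword, 70% semantic) amount to a simple score blend. A sketch with illustrative scoring inputs; whether the 0.75 similarity threshold applies to the blended score or to the raw semantic score alone is not specified in this spec, so applying it to the blend here is an assumption:

```python
# Sketch of the hybrid-search blend from the config: keyword_weight 0.3,
# semantic_weight 0.7, default_top_k 8, similarity_threshold 0.75.
# The per-chunk scores are illustrative inputs; the real engine would get
# them from the vector store and a keyword scorer.

def hybrid_score(semantic, keyword, semantic_weight=0.7, keyword_weight=0.3):
    """Blend normalized semantic and keyword scores into one ranking score."""
    return semantic_weight * semantic + keyword_weight * keyword

def rank(chunks, top_k=8, threshold=0.75):
    """Rank chunks by blended score, applying the threshold and top-k cut."""
    scored = sorted(
        ((hybrid_score(c["semantic"], c["keyword"]), c) for c in chunks),
        key=lambda t: t[0], reverse=True,
    )
    return [c for s, c in scored if s >= threshold][:top_k]

chunks = [
    {"id": 1, "semantic": 0.9, "keyword": 0.8},   # 0.7*0.9 + 0.3*0.8 = 0.87 → kept
    {"id": 2, "semantic": 0.7, "keyword": 0.2},   # 0.7*0.7 + 0.3*0.2 = 0.55 → filtered
]
print([c["id"] for c in rank(chunks)])  # [1]
```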
## 10. Testing Strategy

| Layer | Tests |
|-------|-------|
| **RAG** | Chunking correctness, search relevance, incremental sync accuracy |
| **API** | Chat streaming, parameter validation, error handling |
| **Security** | Path traversal, sensitive content detection, local-only enforcement |
| **Forge** | Training convergence, eval loss trends, output GGUF validity |
| **E2E** | Full chat turn with RAG retrieval, citation rendering |

## 11. Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| **Overfitting** on a small dataset | LoRA rank 16, strong regularization, 10% validation split, human eval |
| **Temporal drift** | Quarterly retrain + daily RAG sync |
| **Privacy leak in training data** | Manual curation of training examples; exclude others' private details |
| **Emotional weight / uncanny valley** | Persona is a companion, not a predictor; framed as reflection |
| **Hardware limits** | QLoRA 4-bit, 8B model, ~35 GPU layers; fall back to CPU offloading if needed |
| **Maintenance fatigue** | Auto-sync removes daily work; retrain is one script + a reminder |

## 12. Success Criteria

- [ ] Chat interface streams responses locally with sub-second first-token latency
- [ ] RAG retrieves relevant vault context for >80% of personal questions
- [ ] Fine-tuned model produces responses that "feel" recognizably aligned with Santhosh's reflective style
- [ ] Quarterly retrain completes successfully on the RTX 5070 in <6 hours
- [ ] Daily auto-sync and manual trigger both work reliably
- [ ] No vault data leaves the local machine

## 13. Next Steps

1. Write the implementation plan using the `writing-plans` skill
2. Scaffold the repository structure
3. Build the Vault Indexer + RAG engine first (Weeks 1-2)
4. Integrate the chat UI with the base model (Week 3)
5. Curate training data and begin fine-tuning experiments (Weeks 4-6)
6. Polish, evaluate, and integrate (Weeks 7-8)