# Personal Companion AI — Design Spec

**Date:** 2026-04-13
**Status:** Approved
**Author:** Santhosh Janardhanan

## 1. Overview

A fully local, privacy-first AI companion trained on Santhosh's Obsidian vault. The companion is a reflective confidante—not a digital twin or advisor—that can answer questions about past events, summarize relationships, and explore life patterns alongside him.

The system combines a **fine-tuned local LLM** (for reasoning style and reflective voice) with a **RAG layer** (for factual retrieval from 677+ vault notes).

## 2. Core Philosophy

- **Companion, not clone**: The AI does not speak as Santhosh. It speaks *to* him.
- **Fully local**: No vault data leaves the machine. Ollama, LanceDB, and inference all run locally.
- **Evolving self**: Quarterly model retraining + daily RAG sync keep the companion aligned with his changing life.
- **Minimal noise**: Notifications are quiet (streaming text + logs). No pop-ups.

## 3. Architecture

### Approach: Decoupled Services

Three independent processes:

```
┌─────────────────────────────────────────────────────────────┐
│                Companion Chat (Web UI / CLI)                │
│     React + Vite  ←───→  FastAPI backend (orchestrator)     │
└─────────────────────────────────────────────────────────────┘
                             │
       ┌─────────────────────┼──────────────────────┐
       ↓                     ↓                      ↓
┌──────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Fine-tuned  │    │   RAG Engine    │    │  Vault Indexer  │
│   8B Model   │    │   (LanceDB)     │    │  (daemon/CLI)   │
│  (llama.cpp) │    │                 │    │                 │
│              │    │ • semantic      │    │ • watches vault │
│  Quarterly   │    │   search        │    │ • chunks/embeds │
│  retrain     │    │ • hybrid        │    │ • daily auto    │
│              │    │   filters       │    │   sync          │
│              │    │ • relationship  │    │ • manual        │
│              │    │   graph         │    │   trigger       │
└──────────────┘    └─────────────────┘    └─────────────────┘
```

### Service Responsibilities

| Service | Language | Role |
|---------|----------|------|
| **Companion Engine** | Python (FastAPI) + TS (React) | Chat UI, session memory, prompt orchestration |
| **RAG Engine** | Python | LanceDB queries, embedding cache, hybrid search |
| **Vault Indexer** | Python (CLI + daemon) | File watching, chunking, embedding via Ollama |
| **Model Forge** | Python (on-demand) | QLoRA fine-tuning, GGUF export |

## 4. Data Flow: A Chat Turn

1. User asks: *"I've been feeling off about my friendship with Vinay. What do you think?"*
2. **Orchestrator** detects this needs relationship context + reflective reasoning.
3. **RAG Engine** queries LanceDB for `Vinay`, `friendship`, `#Relations` — returns top 8 chunks.
4. **Prompt construction**:
   - System: Companion persona + reasoning instructions
   - Retrieved context: Relevant vault entries
   - Conversation history: Last 20 turns
   - User message
5. **Local LLM** streams a reflective response.
6. **Optional**: Companion asks a gentle follow-up to deepen reflection.

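The prompt-construction step (4) can be sketched as below. This is a minimal illustration: `build_prompt`, the `Chunk` fields, and the citation formatting are assumptions, not the actual orchestrator API.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """Minimal stand-in for a retrieved vault chunk (illustrative fields)."""
    text: str
    source_file: str  # relative vault path, used for citations

def build_prompt(persona: str, chunks: list[Chunk],
                 history: list[dict], user_msg: str,
                 max_turns: int = 20) -> list[dict]:
    """Assemble the chat-format prompt for one turn (step 4 above)."""
    # Retrieved context, labeled by source file so the UI can cite it.
    context = "\n\n".join(f"[{c.source_file}]\n{c.text}" for c in chunks)
    messages = [{
        "role": "system",
        "content": f"{persona}\n\nRelevant vault entries:\n{context}",
    }]
    messages += history[-max_turns:]  # keep only the last 20 turns
    messages.append({"role": "user", "content": user_msg})
    return messages
```

The resulting message list goes straight to the llama.cpp chat endpoint, so session memory stays bounded at `max_turns` regardless of conversation length.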
## 5. Technology Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Base model | Meta-Llama-3.1-8B-Instruct | Strong reasoning, fits 12GB VRAM quantized |
| Fine-tuning | Unsloth + QLoRA (4-bit) | Fast, memory-efficient, runs on RTX 5070 |
| Inference | llama.cpp (GGUF) | Mature, fast local inference, easy GPU layer tuning |
| Embedding | `mxbai-embed-large` via Ollama | 1024-dim, local, high quality |
| Vector store | LanceDB (embedded) | File-based, no server, Rust-backed |
| Backend | FastAPI + WebSockets | Streaming chat, simple Python API |
| Frontend | React + Vite | Lightweight, fast dev loop |
| File watcher | `watchdog` (Python) | Reliable cross-platform vault monitoring |

## 6. Fine-Tuning Strategy

### What the model learns

- **Reflective reasoning style**: How Santhosh thinks through situations
- **Values and priorities**: What he tends to weigh in decisions
- **Communication patterns**: His tone in journal entries (direct, questioning, humorous)
- **Relationship dynamics**: Patterns in how he describes people over time

### What stays in RAG

- Specific dates, events, amounts
- Exact quotes and conversations
- Recent updates (between retrainings)
- Granular facts

### Training data format

Curated "reflection examples" from the vault, formatted as conversation turns:

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful companion..."},
    {"role": "user", "content": "Journal entry about Vinay visit... What do you notice?"},
    {"role": "assistant", "content": "It seems like you value these drop-ins..."}
  ]
}
```

### Training schedule

- **Quarterly retrain**: Automatic reminder (log + chat stream) every 90 days.
- **Manual trigger**: User can initiate retrain anytime via CLI/UI.
- **Pipeline**: `vault → extract reflections → curate → train → export GGUF → swap model file`

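The "extract reflections" step of the pipeline can be sketched as a formatter that turns one curated journal excerpt plus reflection into a JSONL record in the format shown above; the function name and prompt suffix are illustrative assumptions.

```python
import json

def to_training_example(system: str, journal_excerpt: str,
                        reflection: str) -> str:
    """Format one curated reflection as a chat-style JSONL line
    (matching the training data format above)."""
    record = {"messages": [
        {"role": "system", "content": system},
        # The user turn pairs the vault excerpt with a reflection prompt.
        {"role": "user", "content": f"{journal_excerpt}\n\nWhat do you notice?"},
        {"role": "assistant", "content": reflection},
    ]}
    return json.dumps(record, ensure_ascii=False)
```

One line per example, appended to a `.jsonl` file under `training_data_path`, is the shape Unsloth-style chat fine-tuning pipelines typically consume.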
## 7. RAG Engine Design

### Indexing Modes

- **`index`**: Full build of the vector store (first run).
- **`sync`**: Incremental — only process files modified since the last sync.
- **`reindex`**: Force a full rebuild from scratch.
- **`status`**: Show document count, last sync time, and unindexed files.

### Auto-Sync Strategy

- **File system watcher**: `watchdog` monitors the vault root and triggers an incremental sync on any `.md` change.
- **Daily full sync**: At 3:00 AM, run a full sync to catch any missed events.
- **Manual trigger**: `POST /index/trigger` from chat or CLI.

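The incremental `sync` mode reduces to an mtime comparison against the recorded last-sync timestamp. A stdlib-only sketch (the function name and defaults are illustrative; the real indexer also reacts to `watchdog` events rather than only polling):

```python
from pathlib import Path

def files_to_sync(vault_root: str, last_sync: float,
                  patterns: tuple[str, ...] = ("*.md",),
                  deny_dirs: frozenset = frozenset({".obsidian", ".trash",
                                                    ".git", ".logseq"})) -> list[Path]:
    """Return vault files modified since the last sync timestamp.

    This is the incremental `sync` mode: compare file mtimes against the
    recorded last-sync time instead of re-embedding everything."""
    changed = []
    for pattern in patterns:
        for path in Path(vault_root).rglob(pattern):
            if any(part in deny_dirs for part in path.parts):
                continue  # skip Obsidian internals, trash, etc.
            if path.stat().st_mtime > last_sync:
                changed.append(path)
    return sorted(changed)
```

After a successful sync, the new timestamp would be persisted so the next run only sees files touched since then.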
### Per-Directory Chunking Rules

Different vault directories need different granularity:

```json
"chunking_rules": {
  "default": {
    "strategy": "sliding_window",
    "chunk_size": 500,
    "chunk_overlap": 100
  },
  "Journal/**": {
    "strategy": "section",
    "section_tags": ["#DayInShort", "#mentalhealth", "#physicalhealth", "#work", "#finance", "#Relations"],
    "chunk_size": 300,
    "chunk_overlap": 50
  },
  "zzz-Archive/**": {
    "strategy": "sliding_window",
    "chunk_size": 800,
    "chunk_overlap": 150
  }
}
```

**Rationale**: Journal entries contain dense emotional/factual tags and benefit from section-based chunking with smaller chunks. Archives are reference material and can be chunked more coarsely.

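A minimal `sliding_window` chunker matching the `chunk_size`/`chunk_overlap` parameters above might look like this. It counts words for simplicity; the real implementation may count tokens or characters instead.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500,
                          chunk_overlap: int = 100) -> list[str]:
    """Split text into overlapping windows (the `sliding_window` strategy).

    Sizes are in words here for illustration. Each window starts
    `chunk_size - chunk_overlap` words after the previous one, so
    consecutive chunks share `chunk_overlap` words of context."""
    words = text.split()
    if not words:
        return []
    step = max(chunk_size - chunk_overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already reached the end of the document
    return chunks
```

The overlap is what keeps a sentence that straddles a window boundary retrievable from at least one chunk.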
### Metadata per Chunk

- `source_file`: Relative path from vault root
- `source_directory`: Top-level directory
- `section`: Section heading (for structured notes)
- `date`: Parsed from filename or frontmatter
- `tags`: All hashtags and wikilinks found in chunk
- `chunk_index`: Position in document
- `modified_at`: File mtime for incremental sync
- `rule_applied`: Which chunking rule was used

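Extraction of the `tags` and `date` fields can be sketched as follows. The regexes and function name are illustrative assumptions, not the indexer's actual parsing rules.

```python
import re
from datetime import date

# Illustrative patterns: hashtags, [[wikilinks]] (with optional alias),
# and an ISO date embedded in the filename.
TAG_RE = re.compile(r"#[\w/-]+")
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def chunk_metadata(chunk_text: str, filename: str) -> dict:
    """Build the `tags` and `date` metadata fields for one chunk."""
    tags = TAG_RE.findall(chunk_text)
    links = WIKILINK_RE.findall(chunk_text)
    m = DATE_RE.search(filename)
    parsed = date(int(m.group(1)), int(m.group(2)), int(m.group(3))) if m else None
    return {"tags": tags + links, "date": parsed}
```

Storing wikilink targets alongside hashtags is what lets the relationship graph resolve person mentions like `[[Vinay]]` at search time.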
### Search

- **Default top-k**: 8 chunks
- **Max top-k**: 20
- **Similarity threshold**: 0.75
- **Hybrid search**: Enabled by default (30% keyword, 70% semantic)
- **Filters**: date range, tag list, directory glob

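The hybrid ranking above reduces to a weighted blend of the two relevance signals, followed by a threshold cutoff and top-k truncation. Function names are illustrative; LanceDB's own hybrid search may combine scores differently.

```python
def hybrid_score(semantic: float, keyword: float,
                 semantic_weight: float = 0.7,
                 keyword_weight: float = 0.3) -> float:
    """Blend semantic and keyword relevance (default 70/30 split)."""
    return semantic_weight * semantic + keyword_weight * keyword

def top_k(scored: list, k: int = 8, threshold: float = 0.75) -> list:
    """Keep (chunk_id, score) pairs above the similarity threshold,
    best first, at most k results."""
    kept = [(cid, s) for cid, s in scored if s >= threshold]
    return sorted(kept, key=lambda x: x[1], reverse=True)[:k]
```

With the defaults above, a chunk needs a blended score of at least 0.75 to appear at all, and only the best 8 survivors reach the prompt.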
## 8. Configuration Schema

```json
{
  "companion": {
    "name": "SAN",
    "persona": {
      "role": "companion",
      "tone": "reflective",
      "style": "questioning",
      "boundaries": [
        "does_not_impersonate_user",
        "no_future_predictions",
        "no_medical_or_legal_advice"
      ]
    },
    "memory": {
      "session_turns": 20,
      "persistent_store": "~/.companion/memory.db",
      "summarize_after": 10
    },
    "chat": {
      "streaming": true,
      "max_response_tokens": 2048,
      "default_temperature": 0.7,
      "allow_temperature_override": true
    }
  },
  "vault": {
    "path": "/home/san/KnowledgeVault/Default",
    "indexing": {
      "auto_sync": true,
      "auto_sync_interval_minutes": 1440,
      "watch_fs_events": true,
      "file_patterns": ["*.md"],
      "deny_dirs": [".obsidian", ".trash", ".git", ".logseq"],
      "deny_patterns": ["*.tmp", "*.bak", "*conflict*", ".*"]
    },
    "chunking_rules": {
      "default": { "strategy": "sliding_window", "chunk_size": 500, "chunk_overlap": 100 },
      "Journal/**": { "strategy": "section", "chunk_size": 300, "chunk_overlap": 50 },
      "zzz-Archive/**": { "strategy": "sliding_window", "chunk_size": 800, "chunk_overlap": 150 }
    }
  },
  "rag": {
    "embedding": {
      "provider": "ollama",
      "model": "mxbai-embed-large",
      "base_url": "http://localhost:11434",
      "dimensions": 1024,
      "batch_size": 32
    },
    "vector_store": {
      "type": "lancedb",
      "path": "~/.companion/vectors.lance"
    },
    "search": {
      "default_top_k": 8,
      "max_top_k": 20,
      "similarity_threshold": 0.75,
      "hybrid_search": { "enabled": true, "keyword_weight": 0.3, "semantic_weight": 0.7 },
      "filters": { "date_range_enabled": true, "tag_filter_enabled": true, "directory_filter_enabled": true }
    }
  },
  "model": {
    "inference": {
      "backend": "llama.cpp",
      "model_path": "~/.companion/models/companion-8b-q4.gguf",
      "context_length": 8192,
      "gpu_layers": 35,
      "batch_size": 512,
      "threads": 8
    },
    "fine_tuning": {
      "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
      "output_dir": "~/.companion/training",
      "lora_rank": 16,
      "lora_alpha": 32,
      "learning_rate": 0.0002,
      "batch_size": 4,
      "gradient_accumulation_steps": 4,
      "num_epochs": 3,
      "warmup_steps": 100,
      "save_steps": 500,
      "eval_steps": 250,
      "training_data_path": "~/.companion/training_data/",
      "validation_split": 0.1
    },
    "retrain_schedule": {
      "auto_reminder": true,
      "default_interval_days": 90,
      "reminder_channels": ["chat_stream", "log"]
    }
  },
  "api": {
    "host": "127.0.0.1",
    "port": 7373,
    "cors_origins": ["http://localhost:5173"],
    "auth": { "enabled": false }
  },
  "ui": {
    "web": {
      "enabled": true,
      "theme": "obsidian",
      "features": { "streaming": true, "citations": true, "source_preview": true }
    },
    "cli": { "enabled": true, "rich_output": true }
  },
  "logging": {
    "level": "INFO",
    "file": "~/.companion/logs/companion.log",
    "max_size_mb": 100,
    "backup_count": 5
  },
  "security": {
    "local_only": true,
    "vault_path_traversal_check": true,
    "sensitive_content_detection": true,
    "sensitive_patterns": ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"],
    "require_confirmation_for_external_apis": true
  }
}
```

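Since several path fields in this schema are home-relative (`~/.companion/...`), a loader needs to expand them before services open the files. A minimal sketch, assuming the function name and which fields get expanded (the real config module may validate far more):

```python
import json
from pathlib import Path

def load_config(path: str) -> dict:
    """Load config.json and expand `~` in known path fields
    so services can open them directly."""
    cfg = json.loads(Path(path).read_text())
    vs = cfg["rag"]["vector_store"]
    vs["path"] = str(Path(vs["path"]).expanduser())
    mem = cfg["companion"]["memory"]
    mem["persistent_store"] = str(Path(mem["persistent_store"]).expanduser())
    return cfg
```

Keeping the raw file home-relative while expanding at load time means the same `config.json` works unchanged across machines and user accounts.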
## 9. Project Structure

```
companion/
├── README.md
├── pyproject.toml
├── config.json
├── src/
│   ├── companion/          # FastAPI backend + orchestrator
│   │   ├── main.py
│   │   ├── api/
│   │   │   ├── chat.py
│   │   │   ├── index.py
│   │   │   └── status.py
│   │   ├── core/
│   │   │   ├── orchestrator.py
│   │   │   ├── memory.py
│   │   │   └── prompts.py
│   │   └── config.py
│   ├── rag/                # RAG engine
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   └── search.py
│   ├── indexer_daemon/     # Vault watcher + indexer CLI
│   │   ├── daemon.py
│   │   ├── cli.py
│   │   └── watcher.py
│   └── forge/              # Fine-tuning pipeline
│       ├── extract.py
│       ├── train.py
│       ├── export.py
│       └── evaluate.py
├── ui/                     # React frontend
│   ├── src/
│   │   ├── App.tsx
│   │   ├── components/
│   │   │   ├── Chat.tsx
│   │   │   ├── Message.tsx
│   │   │   └── Settings.tsx
│   │   └── api.ts
│   └── package.json
├── tests/
│   ├── companion/
│   ├── rag/
│   └── forge/
└── docs/
    └── superpowers/
        └── specs/
            └── 2026-04-13-personal-companion-ai-design.md
```

## 10. Testing Strategy

| Layer | Tests |
|-------|-------|
| **RAG** | Chunking correctness, search relevance, incremental sync accuracy |
| **API** | Chat streaming, parameter validation, error handling |
| **Security** | Path traversal, sensitive content detection, local-only enforcement |
| **Forge** | Training convergence, eval loss trends, output GGUF validity |
| **E2E** | Full chat turn with RAG retrieval, citation rendering |

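As one concrete instance of the security row, a minimal sensitive-content check against the configured `sensitive_patterns`, with pytest-style tests (function and test names are illustrative, not the actual test suite):

```python
SENSITIVE_PATTERNS = ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"]

def contains_sensitive(text: str, patterns=SENSITIVE_PATTERNS) -> bool:
    """Flag chunks carrying sensitive tags so they can be handled
    specially before leaving the RAG layer."""
    return any(p in text for p in patterns)

def test_detects_sensitive_tag():
    assert contains_sensitive("Rough week #mentalhealth")

def test_ignores_plain_text():
    assert not contains_sensitive("Grocery list: eggs, milk")
```

The real detector would likely be tag-aware rather than a substring scan, but even this simple form gives the security suite a deterministic target to assert against.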
## 11. Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| **Overfitting** on small dataset | LoRA rank 16, strong regularization, 10% validation split, human eval |
| **Temporal drift** | Quarterly retrain + daily RAG sync |
| **Privacy leak in training data** | Manual curation of training examples; exclude others' private details |
| **Emotional weight / uncanny valley** | Persona is companion, not predictor; framed as reflection |
| **Hardware limits** | QLoRA 4-bit, 8B model, ~35 GPU layers; fallback to CPU offloading if needed |
| **Maintenance fatigue** | Auto-sync removes daily work; retrain is one script + reminder |

## 12. Success Criteria

- [ ] Chat interface streams responses locally with sub-second first-token latency
- [ ] RAG retrieves relevant vault context for >80% of personal questions
- [ ] Fine-tuned model produces responses that "feel" recognizably aligned with Santhosh's reflective style
- [ ] Quarterly retrain completes successfully on RTX 5070 in <6 hours
- [ ] Daily auto-sync and manual trigger both work reliably
- [ ] No vault data leaves the local machine

## 13. Next Steps

1. Write implementation plan using `writing-plans` skill
2. Scaffold the repository structure
3. Build Vault Indexer + RAG engine first (Week 1-2)
4. Integrate chat UI with base model (Week 3)
5. Curate training data and begin fine-tuning experiments (Week 4-6)
6. Polish, evaluate, and integrate (Week 7-8)