# Personal Companion AI — Design Spec

**Date:** 2026-04-13
**Status:** Approved
**Author:** Santhosh Janardhanan

## 1. Overview

A fully local, privacy-first AI companion trained on Santhosh's Obsidian vault. The companion is a reflective confidante—not a digital twin or advisor—that can answer questions about past events, summarize relationships, and explore life patterns alongside him.

The system combines a **fine-tuned local LLM** (for reasoning style and reflective voice) with a **RAG layer** (for factual retrieval from 677+ vault notes).

## 2. Core Philosophy

- **Companion, not clone**: The AI does not speak as Santhosh. It speaks *to* him.
- **Fully local**: No vault data leaves the machine. Ollama, LanceDB, and inference all run locally.
- **Evolving self**: Quarterly model retraining + daily RAG sync keep the companion aligned with his changing life.
- **Minimal noise**: Notifications are quiet (streaming text + logs). No pop-ups.

## 3. Architecture

### Approach: Decoupled Services

Three independent processes:

```
┌─────────────────────────────────────────────────────────────┐
│                Companion Chat (Web UI / CLI)                │
│     React + Vite  ←───→  FastAPI backend (orchestrator)     │
└─────────────────────────────────────────────────────────────┘
                             │
       ┌─────────────────────┼──────────────────────┐
       ↓                     ↓                      ↓
┌──────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Fine-tuned  │    │   RAG Engine    │    │  Vault Indexer  │
│   8B Model   │    │   (LanceDB)     │    │  (daemon/CLI)   │
│  (llama.cpp) │    │                 │    │                 │
│              │    │ • semantic      │    │ • watches vault │
│  Quarterly   │    │   search        │    │ • chunks/embeds │
│  retrain     │    │ • hybrid        │    │ • daily auto    │
│              │    │   filters       │    │   sync          │
│              │    │ • relationship  │    │ • manual        │
│              │    │   graph         │    │   trigger       │
└──────────────┘    └─────────────────┘    └─────────────────┘
```

### Service Responsibilities

| Service | Language | Role |
|---------|----------|------|
| **Companion Engine** | Python (FastAPI) + TS (React) | Chat UI, session memory, prompt orchestration |
| **RAG Engine** | Python | LanceDB queries, embedding cache, hybrid search |
| **Vault Indexer** | Python (CLI + daemon) | File watching, chunking, embedding via Ollama |
| **Model Forge** | Python (on-demand) | QLoRA fine-tuning, GGUF export |

## 4. Data Flow: A Chat Turn

1. User asks: *"I've been feeling off about my friendship with Vinay. What do you think?"*
2. **Orchestrator** detects this needs relationship context + reflective reasoning.
3. **RAG Engine** queries LanceDB for `Vinay`, `friendship`, `#Relations` — returns top 8 chunks.
4. **Prompt construction**:
   - System: Companion persona + reasoning instructions
   - Retrieved context: Relevant vault entries
   - Conversation history: Last 20 turns
   - User message
5. **Local LLM** streams a reflective response.
6. **Optional**: Companion asks a gentle follow-up to deepen reflection.

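The prompt-construction step (4) can be sketched as below. This is a minimal illustration: `build_prompt`, the `Chunk` fields, and the citation formatting are assumptions, not the actual orchestrator API.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """Minimal stand-in for a retrieved vault chunk (illustrative fields)."""
    text: str
    source_file: str  # relative vault path, used for citations

def build_prompt(persona: str, chunks: list[Chunk],
                 history: list[dict], user_msg: str,
                 max_turns: int = 20) -> list[dict]:
    """Assemble the chat-format prompt for one turn (step 4 above)."""
    # Retrieved context, labeled by source file so the UI can cite it.
    context = "\n\n".join(f"[{c.source_file}]\n{c.text}" for c in chunks)
    messages = [{
        "role": "system",
        "content": f"{persona}\n\nRelevant vault entries:\n{context}",
    }]
    messages += history[-max_turns:]  # keep only the last 20 turns
    messages.append({"role": "user", "content": user_msg})
    return messages
```

The resulting message list goes straight to the llama.cpp chat endpoint, so session memory stays bounded at `max_turns` regardless of conversation length.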
## 5. Technology Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Base model | Meta-Llama-3.1-8B-Instruct | Strong reasoning, fits 12GB VRAM quantized |
| Fine-tuning | Unsloth + QLoRA (4-bit) | Fast, memory-efficient, runs on RTX 5070 |
| Inference | llama.cpp (GGUF) | Mature, fast local inference, easy GPU layer tuning |
| Embedding | `mxbai-embed-large` via Ollama | 1024-dim, local, high quality |
| Vector store | LanceDB (embedded) | File-based, no server, Rust-backed |
| Backend | FastAPI + WebSockets | Streaming chat, simple Python API |
| Frontend | React + Vite | Lightweight, fast dev loop |
| File watcher | `watchdog` (Python) | Reliable cross-platform vault monitoring |

## 6. Fine-Tuning Strategy

### What the model learns

- **Reflective reasoning style**: How Santhosh thinks through situations
- **Values and priorities**: What he tends to weigh in decisions
- **Communication patterns**: His tone in journal entries (direct, questioning, humorous)
- **Relationship dynamics**: Patterns in how he describes people over time

### What stays in RAG

- Specific dates, events, amounts
- Exact quotes and conversations
- Recent updates (between retrainings)
- Granular facts

### Training data format

Curated "reflection examples" from the vault, formatted as conversation turns:

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful companion..."},
    {"role": "user", "content": "Journal entry about Vinay visit... What do you notice?"},
    {"role": "assistant", "content": "It seems like you value these drop-ins..."}
  ]
}
```

### Training schedule

- **Quarterly retrain**: Automatic reminder (log + chat stream) every 90 days.
- **Manual trigger**: User can initiate retrain anytime via CLI/UI.
- **Pipeline**: `vault → extract reflections → curate → train → export GGUF → swap model file`

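The "extract reflections" step of the pipeline can be sketched as a formatter that turns one curated journal excerpt plus reflection into a JSONL record in the format shown above; the function name and prompt suffix are illustrative assumptions.

```python
import json

def to_training_example(system: str, journal_excerpt: str,
                        reflection: str) -> str:
    """Format one curated reflection as a chat-style JSONL line
    (matching the training data format above)."""
    record = {"messages": [
        {"role": "system", "content": system},
        # The user turn pairs the vault excerpt with a reflection prompt.
        {"role": "user", "content": f"{journal_excerpt}\n\nWhat do you notice?"},
        {"role": "assistant", "content": reflection},
    ]}
    return json.dumps(record, ensure_ascii=False)
```

One line per example, appended to a `.jsonl` file under `training_data_path`, is the shape Unsloth-style chat fine-tuning pipelines typically consume.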
## 7. RAG Engine Design

### Indexing Modes

- **`index`**: Full build of the vector store (first run).
- **`sync`**: Incremental — only process files modified since the last sync.
- **`reindex`**: Force a full rebuild from scratch.
- **`status`**: Show document count, last sync time, and unindexed files.

### Auto-Sync Strategy

- **File system watcher**: `watchdog` monitors the vault root and triggers an incremental sync on any `.md` change.
- **Daily full sync**: At 3:00 AM, run a full sync to catch any missed events.
- **Manual trigger**: `POST /index/trigger` from chat or CLI.

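The incremental `sync` mode reduces to an mtime comparison against the recorded last-sync timestamp. A stdlib-only sketch (the function name and defaults are illustrative; the real indexer also reacts to `watchdog` events rather than only polling):

```python
from pathlib import Path

def files_to_sync(vault_root: str, last_sync: float,
                  patterns: tuple[str, ...] = ("*.md",),
                  deny_dirs: frozenset = frozenset({".obsidian", ".trash",
                                                    ".git", ".logseq"})) -> list[Path]:
    """Return vault files modified since the last sync timestamp.

    This is the incremental `sync` mode: compare file mtimes against the
    recorded last-sync time instead of re-embedding everything."""
    changed = []
    for pattern in patterns:
        for path in Path(vault_root).rglob(pattern):
            if any(part in deny_dirs for part in path.parts):
                continue  # skip Obsidian internals, trash, etc.
            if path.stat().st_mtime > last_sync:
                changed.append(path)
    return sorted(changed)
```

After a successful sync, the new timestamp would be persisted so the next run only sees files touched since then.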
### Per-Directory Chunking Rules

Different vault directories need different granularity:

```json
"chunking_rules": {
  "default": {
    "strategy": "sliding_window",
    "chunk_size": 500,
    "chunk_overlap": 100
  },
  "Journal/**": {
    "strategy": "section",
    "section_tags": ["#DayInShort", "#mentalhealth", "#physicalhealth", "#work", "#finance", "#Relations"],
    "chunk_size": 300,
    "chunk_overlap": 50
  },
  "zzz-Archive/**": {
    "strategy": "sliding_window",
    "chunk_size": 800,
    "chunk_overlap": 150
  }
}
```

**Rationale**: Journal entries contain dense emotional/factual tags and benefit from section-based chunking with smaller chunks. Archives are reference material and can be chunked more coarsely.

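A minimal `sliding_window` chunker matching the `chunk_size`/`chunk_overlap` parameters above might look like this. It counts words for simplicity; the real implementation may count tokens or characters instead.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500,
                          chunk_overlap: int = 100) -> list[str]:
    """Split text into overlapping windows (the `sliding_window` strategy).

    Sizes are in words here for illustration. Each window starts
    `chunk_size - chunk_overlap` words after the previous one, so
    consecutive chunks share `chunk_overlap` words of context."""
    words = text.split()
    if not words:
        return []
    step = max(chunk_size - chunk_overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already reached the end of the document
    return chunks
```

The overlap is what keeps a sentence that straddles a window boundary retrievable from at least one chunk.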
### Metadata per Chunk

- `source_file`: Relative path from vault root
- `source_directory`: Top-level directory
- `section`: Section heading (for structured notes)
- `date`: Parsed from filename or frontmatter
- `tags`: All hashtags and wikilinks found in chunk
- `chunk_index`: Position in document
- `modified_at`: File mtime for incremental sync
- `rule_applied`: Which chunking rule was used

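Extraction of the `tags` and `date` fields can be sketched as follows. The regexes and function name are illustrative assumptions, not the indexer's actual parsing rules.

```python
import re
from datetime import date

# Illustrative patterns: hashtags, [[wikilinks]] (with optional alias),
# and an ISO date embedded in the filename.
TAG_RE = re.compile(r"#[\w/-]+")
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def chunk_metadata(chunk_text: str, filename: str) -> dict:
    """Build the `tags` and `date` metadata fields for one chunk."""
    tags = TAG_RE.findall(chunk_text)
    links = WIKILINK_RE.findall(chunk_text)
    m = DATE_RE.search(filename)
    parsed = date(int(m.group(1)), int(m.group(2)), int(m.group(3))) if m else None
    return {"tags": tags + links, "date": parsed}
```

Storing wikilink targets alongside hashtags is what lets the relationship graph resolve person mentions like `[[Vinay]]` at search time.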
### Search

- **Default top-k**: 8 chunks
- **Max top-k**: 20
- **Similarity threshold**: 0.75
- **Hybrid search**: Enabled by default (30% keyword, 70% semantic)
- **Filters**: date range, tag list, directory glob

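The hybrid ranking above reduces to a weighted blend of the two relevance signals, followed by a threshold cutoff and top-k truncation. Function names are illustrative; LanceDB's own hybrid search may combine scores differently.

```python
def hybrid_score(semantic: float, keyword: float,
                 semantic_weight: float = 0.7,
                 keyword_weight: float = 0.3) -> float:
    """Blend semantic and keyword relevance (default 70/30 split)."""
    return semantic_weight * semantic + keyword_weight * keyword

def top_k(scored: list, k: int = 8, threshold: float = 0.75) -> list:
    """Keep (chunk_id, score) pairs above the similarity threshold,
    best first, at most k results."""
    kept = [(cid, s) for cid, s in scored if s >= threshold]
    return sorted(kept, key=lambda x: x[1], reverse=True)[:k]
```

With the defaults above, a chunk needs a blended score of at least 0.75 to appear at all, and only the best 8 survivors reach the prompt.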
## 8. Configuration Schema

```json
{
  "companion": {
    "name": "SAN",
    "persona": {
      "role": "companion",
      "tone": "reflective",
      "style": "questioning",
      "boundaries": [
        "does_not_impersonate_user",
        "no_future_predictions",
        "no_medical_or_legal_advice"
      ]
    },
    "memory": {
      "session_turns": 20,
      "persistent_store": "~/.companion/memory.db",
      "summarize_after": 10
    },
    "chat": {
      "streaming": true,
      "max_response_tokens": 2048,
      "default_temperature": 0.7,
      "allow_temperature_override": true
    }
  },
  "vault": {
    "path": "/home/san/KnowledgeVault/Default",
    "indexing": {
      "auto_sync": true,
      "auto_sync_interval_minutes": 1440,
      "watch_fs_events": true,
      "file_patterns": ["*.md"],
      "deny_dirs": [".obsidian", ".trash", ".git", ".logseq"],
      "deny_patterns": ["*.tmp", "*.bak", "*conflict*", ".*"]
    },
    "chunking_rules": {
      "default": { "strategy": "sliding_window", "chunk_size": 500, "chunk_overlap": 100 },
      "Journal/**": { "strategy": "section", "chunk_size": 300, "chunk_overlap": 50 },
      "zzz-Archive/**": { "strategy": "sliding_window", "chunk_size": 800, "chunk_overlap": 150 }
    }
  },
  "rag": {
    "embedding": {
      "provider": "ollama",
      "model": "mxbai-embed-large",
      "base_url": "http://localhost:11434",
      "dimensions": 1024,
      "batch_size": 32
    },
    "vector_store": {
      "type": "lancedb",
      "path": "~/.companion/vectors.lance"
    },
    "search": {
      "default_top_k": 8,
      "max_top_k": 20,
      "similarity_threshold": 0.75,
      "hybrid_search": { "enabled": true, "keyword_weight": 0.3, "semantic_weight": 0.7 },
      "filters": { "date_range_enabled": true, "tag_filter_enabled": true, "directory_filter_enabled": true }
    }
  },
  "model": {
    "inference": {
      "backend": "llama.cpp",
      "model_path": "~/.companion/models/companion-8b-q4.gguf",
      "context_length": 8192,
      "gpu_layers": 35,
      "batch_size": 512,
      "threads": 8
    },
    "fine_tuning": {
      "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
      "output_dir": "~/.companion/training",
      "lora_rank": 16,
      "lora_alpha": 32,
      "learning_rate": 0.0002,
      "batch_size": 4,
      "gradient_accumulation_steps": 4,
      "num_epochs": 3,
      "warmup_steps": 100,
      "save_steps": 500,
      "eval_steps": 250,
      "training_data_path": "~/.companion/training_data/",
      "validation_split": 0.1
    },
    "retrain_schedule": {
      "auto_reminder": true,
      "default_interval_days": 90,
      "reminder_channels": ["chat_stream", "log"]
    }
  },
  "api": {
    "host": "127.0.0.1",
    "port": 7373,
    "cors_origins": ["http://localhost:5173"],
    "auth": { "enabled": false }
  },
  "ui": {
    "web": {
      "enabled": true,
      "theme": "obsidian",
      "features": { "streaming": true, "citations": true, "source_preview": true }
    },
    "cli": { "enabled": true, "rich_output": true }
  },
  "logging": {
    "level": "INFO",
    "file": "~/.companion/logs/companion.log",
    "max_size_mb": 100,
    "backup_count": 5
  },
  "security": {
    "local_only": true,
    "vault_path_traversal_check": true,
    "sensitive_content_detection": true,
    "sensitive_patterns": ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"],
    "require_confirmation_for_external_apis": true
  }
}
```

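Since several path fields in this schema are home-relative (`~/.companion/...`), a loader needs to expand them before services open the files. A minimal sketch, assuming the function name and which fields get expanded (the real config module may validate far more):

```python
import json
from pathlib import Path

def load_config(path: str) -> dict:
    """Load config.json and expand `~` in known path fields
    so services can open them directly."""
    cfg = json.loads(Path(path).read_text())
    vs = cfg["rag"]["vector_store"]
    vs["path"] = str(Path(vs["path"]).expanduser())
    mem = cfg["companion"]["memory"]
    mem["persistent_store"] = str(Path(mem["persistent_store"]).expanduser())
    return cfg
```

Keeping the raw file home-relative while expanding at load time means the same `config.json` works unchanged across machines and user accounts.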
## 9. Project Structure

```
companion/
├── README.md
├── pyproject.toml
├── config.json
├── src/
│   ├── companion/          # FastAPI backend + orchestrator
│   │   ├── main.py
│   │   ├── api/
│   │   │   ├── chat.py
│   │   │   ├── index.py
│   │   │   └── status.py
│   │   ├── core/
│   │   │   ├── orchestrator.py
│   │   │   ├── memory.py
│   │   │   └── prompts.py
│   │   └── config.py
│   ├── rag/                # RAG engine
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   └── search.py
│   ├── indexer_daemon/     # Vault watcher + indexer CLI
│   │   ├── daemon.py
│   │   ├── cli.py
│   │   └── watcher.py
│   └── forge/              # Fine-tuning pipeline
│       ├── extract.py
│       ├── train.py
│       ├── export.py
│       └── evaluate.py
├── ui/                     # React frontend
│   ├── src/
│   │   ├── App.tsx
│   │   ├── components/
│   │   │   ├── Chat.tsx
│   │   │   ├── Message.tsx
│   │   │   └── Settings.tsx
│   │   └── api.ts
│   └── package.json
├── tests/
│   ├── companion/
│   ├── rag/
│   └── forge/
└── docs/
    └── superpowers/
        └── specs/
            └── 2026-04-13-personal-companion-ai-design.md
```

## 10. Testing Strategy

| Layer | Tests |
|-------|-------|
| **RAG** | Chunking correctness, search relevance, incremental sync accuracy |
| **API** | Chat streaming, parameter validation, error handling |
| **Security** | Path traversal, sensitive content detection, local-only enforcement |
| **Forge** | Training convergence, eval loss trends, output GGUF validity |
| **E2E** | Full chat turn with RAG retrieval, citation rendering |

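As one concrete instance of the security row, a minimal sensitive-content check against the configured `sensitive_patterns`, with pytest-style tests (function and test names are illustrative, not the actual test suite):

```python
SENSITIVE_PATTERNS = ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"]

def contains_sensitive(text: str, patterns=SENSITIVE_PATTERNS) -> bool:
    """Flag chunks carrying sensitive tags so they can be handled
    specially before leaving the RAG layer."""
    return any(p in text for p in patterns)

def test_detects_sensitive_tag():
    assert contains_sensitive("Rough week #mentalhealth")

def test_ignores_plain_text():
    assert not contains_sensitive("Grocery list: eggs, milk")
```

The real detector would likely be tag-aware rather than a substring scan, but even this simple form gives the security suite a deterministic target to assert against.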
## 11. Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| **Overfitting** on small dataset | LoRA rank 16, strong regularization, 10% validation split, human eval |
| **Temporal drift** | Quarterly retrain + daily RAG sync |
| **Privacy leak in training data** | Manual curation of training examples; exclude others' private details |
| **Emotional weight / uncanny valley** | Persona is companion, not predictor; framed as reflection |
| **Hardware limits** | QLoRA 4-bit, 8B model, ~35 GPU layers; fallback to CPU offloading if needed |
| **Maintenance fatigue** | Auto-sync removes daily work; retrain is one script + reminder |

## 12. Success Criteria

- [ ] Chat interface streams responses locally with sub-second first-token latency
- [ ] RAG retrieves relevant vault context for >80% of personal questions
- [ ] Fine-tuned model produces responses that "feel" recognizably aligned with Santhosh's reflective style
- [ ] Quarterly retrain completes successfully on RTX 5070 in <6 hours
- [ ] Daily auto-sync and manual trigger both work reliably
- [ ] No vault data leaves the local machine

## 13. Next Steps

1. Write implementation plan using `writing-plans` skill
2. Scaffold the repository structure
3. Build Vault Indexer + RAG engine first (Week 1-2)
4. Integrate chat UI with base model (Week 3)
5. Curate training data and begin fine-tuning experiments (Week 4-6)
6. Polish, evaluate, and integrate (Week 7-8)