# Personal Companion AI — Design Spec

**Date:** 2026-04-13
**Status:** Approved
**Author:** Santhosh Janardhanan

## 1. Overview

A fully local, privacy-first AI companion trained on Santhosh's Obsidian vault. The companion is a reflective confidante — not a digital twin or advisor — that can answer questions about past events, summarize relationships, and explore life patterns alongside him.

The system combines a **fine-tuned local LLM** (for reasoning style and reflective voice) with a **RAG layer** (for factual retrieval from 677+ vault notes).

## 2. Core Philosophy

- **Companion, not clone**: The AI does not speak as Santhosh. It speaks *to* him.
- **Fully local**: No vault data leaves the machine. Ollama, LanceDB, and inference all run locally.
- **Evolving self**: Quarterly model retraining and a daily RAG sync keep the companion aligned with his changing life.
- **Minimal noise**: Notifications are quiet (streaming text + logs). No pop-ups.

## 3. Architecture

### Approach: Decoupled Services

Three independent processes:

```
┌─────────────────────────────────────────────────────────────┐
│                Companion Chat (Web UI / CLI)                │
│      React + Vite  ←───→  FastAPI backend (orchestrator)    │
└─────────────────────────────────────────────────────────────┘
                           │
       ┌───────────────────┼────────────────────┐
       ↓                   ↓                    ↓
┌──────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Fine-tuned  │  │   RAG Engine    │  │  Vault Indexer  │
│   8B Model   │  │   (LanceDB)     │  │  (daemon/CLI)   │
│ (llama.cpp)  │  │                 │  │                 │
│              │  │ • semantic      │  │ • watches vault │
│  Quarterly   │  │   search        │  │ • chunks/embeds │
│   retrain    │  │ • hybrid        │  │ • daily auto    │
│              │  │   filters       │  │   sync          │
│              │  │ • relationship  │  │ • manual        │
│              │  │   graph         │  │   trigger       │
└──────────────┘  └─────────────────┘  └─────────────────┘
```

### Service Responsibilities

| Service | Language | Role |
|---------|----------|------|
| **Companion Engine** | Python (FastAPI) + TS (React) | Chat UI, session memory, prompt orchestration |
| **RAG Engine** | Python | LanceDB queries, embedding cache, hybrid search |
| **Vault Indexer** | Python (CLI + daemon) | File watching, chunking, embedding via Ollama |
| **Model Forge** | Python (on-demand) | QLoRA fine-tuning, GGUF export |

## 4. Data Flow: A Chat Turn

1. User asks: *"I've been feeling off about my friendship with Vinay. What do you think?"*
2. **Orchestrator** detects this needs relationship context + reflective reasoning.
3. **RAG Engine** queries LanceDB for `Vinay`, `friendship`, `#Relations` — returns top 8 chunks.
4. **Prompt construction**:
   - System: Companion persona + reasoning instructions
   - Retrieved context: Relevant vault entries
   - Conversation history: Last 20 turns
   - User message
5. **Local LLM** streams a reflective response.
6. **Optional**: Companion asks a gentle follow-up to deepen reflection.
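
Step 4 can be sketched as a pure function. The names (`build_prompt`, `SYSTEM_PROMPT`, the chunk dict shape) are illustrative assumptions, not the orchestrator's actual API:

```python
# Hypothetical sketch of prompt construction for one chat turn.
SYSTEM_PROMPT = "You are a thoughtful companion..."  # persona + reasoning instructions

def build_prompt(retrieved_chunks, history, user_message, max_turns=20):
    """Assemble the message list sent to the local LLM."""
    # Fold retrieved vault entries into the system message as labeled context.
    context = "\n\n".join(
        f"[{c['source_file']}] {c['text']}" for c in retrieved_chunks
    )
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nRelevant vault entries:\n{context}"},
        *history[-max_turns:],  # keep only the last N conversation turns
        {"role": "user", "content": user_message},
    ]

msgs = build_prompt(
    [{"source_file": "Journal/2026-04-10.md", "text": "Vinay dropped by..."}],
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}],
    "I've been feeling off about my friendship with Vinay.",
)
```

The key design point the sketch captures: retrieved context lives in the system message, while session memory is a bounded tail of prior turns.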
## 5. Technology Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Base model | Meta-Llama-3.1-8B-Instruct | Strong reasoning, fits 12GB VRAM quantized |
| Fine-tuning | Unsloth + QLoRA (4-bit) | Fast, memory-efficient, runs on RTX 5070 |
| Inference | llama.cpp (GGUF) | Mature, fast local inference, easy GPU layer tuning |
| Embedding | `mxbai-embed-large` via Ollama | 1024-dim, local, high quality |
| Vector store | LanceDB (embedded) | File-based, no server, Rust-backed |
| Backend | FastAPI + WebSockets | Streaming chat, simple Python API |
| Frontend | React + Vite | Lightweight, fast dev loop |
| File watcher | `watchdog` (Python) | Reliable cross-platform vault monitoring |

## 6. Fine-Tuning Strategy

### What the model learns

- **Reflective reasoning style**: How Santhosh thinks through situations
- **Values and priorities**: What he tends to weigh in decisions
- **Communication patterns**: His tone in journal entries (direct, questioning, humorous)
- **Relationship dynamics**: Patterns in how he describes people over time

### What stays in RAG

- Specific dates, events, amounts
- Exact quotes and conversations
- Recent updates (between retrainings)
- Granular facts

### Training data format

Curated "reflection examples" from the vault, formatted as conversation turns:

```json
{
  "messages": [
    {"role": "system", "content": "You are a thoughtful companion..."},
    {"role": "user", "content": "Journal entry about Vinay visit... What do you notice?"},
    {"role": "assistant", "content": "It seems like you value these drop-ins..."}
  ]
}
```
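
As a sketch of how Forge might emit one such example as a JSONL line — the `extract_example` helper is hypothetical; only the message format comes from this spec:

```python
# Hypothetical helper pairing a journal excerpt with a curated reflection.
import json

SYSTEM = "You are a thoughtful companion..."

def extract_example(journal_text, reflection_text):
    """Build one training example in the messages format."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"{journal_text.strip()} What do you notice?"},
            {"role": "assistant", "content": reflection_text.strip()},
        ]
    }

example = extract_example(
    "Journal entry about Vinay visit...",
    "It seems like you value these drop-ins...",
)
line = json.dumps(example)  # one JSON object per line (JSONL) for the trainer
```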
### Training schedule

- **Quarterly retrain**: Automatic reminder (log + chat stream) every 90 days.
- **Manual trigger**: User can initiate a retrain anytime via CLI/UI.
- **Pipeline**: `vault → extract reflections → curate → train → export GGUF → swap model file`
## 7. RAG Engine Design

### Indexing Modes

- **`index`**: Build the vector store from scratch.
- **`sync`**: Incremental — only process files modified since last sync.
- **`reindex`**: Force a full rebuild, discarding the existing store.
- **`status`**: Show doc count, last sync, unindexed files.
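
A minimal `argparse` sketch of these four modes, assuming a subcommand-style CLI (the real `cli.py` may be organized differently):

```python
# Hypothetical subcommand layout for the indexer CLI.
import argparse

def make_parser():
    parser = argparse.ArgumentParser(prog="companion-index")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("index", help="build the vector store from scratch")
    sub.add_parser("sync", help="incremental: only files modified since last sync")
    sub.add_parser("reindex", help="force a full rebuild")
    sub.add_parser("status", help="doc count, last sync, unindexed files")
    return parser

args = make_parser().parse_args(["sync"])
```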
### Auto-Sync Strategy

- **File system watcher**: `watchdog` monitors the vault root and triggers an incremental sync on any `.md` change.
- **Daily full sync**: At 3:00 AM, run a full sync to catch any missed events.
- **Manual trigger**: `POST /index/trigger` from chat or CLI.
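
The daily full-sync schedule reduces to computing the next 3:00 AM after the current time; a minimal sketch of that calculation (the daemon loop and `watchdog` wiring are omitted):

```python
# Pure datetime logic for scheduling the 3:00 AM full sync.
from datetime import datetime, timedelta

def next_full_sync(now, hour=3):
    """Return the next occurrence of `hour`:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # already past today's slot
    return candidate

run_at = next_full_sync(datetime(2026, 4, 13, 14, 30))
```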
### Per-Directory Chunking Rules

Different vault directories need different granularity:

```json
"chunking_rules": {
  "default": {
    "strategy": "sliding_window",
    "chunk_size": 500,
    "chunk_overlap": 100
  },
  "Journal/**": {
    "strategy": "section",
    "section_tags": ["#DayInShort", "#mentalhealth", "#physicalhealth", "#work", "#finance", "#Relations"],
    "chunk_size": 300,
    "chunk_overlap": 50
  },
  "zzz-Archive/**": {
    "strategy": "sliding_window",
    "chunk_size": 800,
    "chunk_overlap": 150
  }
}
```

**Rationale**: Journal entries contain dense emotional/factual tags and benefit from section-based chunking with smaller chunks. Archives are reference material and can be chunked more coarsely.
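
Rule lookup can be sketched with `fnmatch`, whose `*` also crosses `/`, so `Journal/**` matches arbitrarily nested files; the lookup order (first matching non-default rule, else `default`) is an assumption:

```python
# Resolve which chunking rule applies to a vault-relative path.
from fnmatch import fnmatch

CHUNKING_RULES = {
    "default": {"strategy": "sliding_window", "chunk_size": 500, "chunk_overlap": 100},
    "Journal/**": {"strategy": "section", "chunk_size": 300, "chunk_overlap": 50},
    "zzz-Archive/**": {"strategy": "sliding_window", "chunk_size": 800, "chunk_overlap": 150},
}

def rule_for(path):
    """Return (rule name, rule) for the first matching glob, else the default."""
    for pattern, rule in CHUNKING_RULES.items():
        if pattern != "default" and fnmatch(path, pattern):
            return pattern, rule
    return "default", CHUNKING_RULES["default"]

name, rule = rule_for("Journal/2026/2026-04-13.md")
```

The resolved rule name is what gets recorded as `rule_applied` in the chunk metadata.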
### Metadata per Chunk

- `source_file`: Relative path from vault root
- `source_directory`: Top-level directory
- `section`: Section heading (for structured notes)
- `date`: Parsed from filename or frontmatter
- `tags`: All hashtags and wikilinks found in chunk
- `chunk_index`: Position in document
- `modified_at`: File mtime for incremental sync
- `rule_applied`: Which chunking rule was used
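
A sketch of how some of this metadata might be derived for one chunk; the regexes are illustrative approximations of Obsidian tag and wikilink syntax, not the indexer's actual patterns:

```python
# Derive per-chunk metadata: date from a YYYY-MM-DD filename, tags via regex.
import re
from pathlib import PurePosixPath

DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")
TAG_RE = re.compile(r"#[\w/-]+")            # hashtags like #Relations
WIKILINK_RE = re.compile(r"\[\[([^\]]+)\]\]")  # wikilinks like [[Vinay]]

def chunk_metadata(source_file, text, chunk_index):
    path = PurePosixPath(source_file)
    m = DATE_RE.search(path.name)
    return {
        "source_file": source_file,
        "source_directory": path.parts[0],
        "date": m.group(1) if m else None,
        "tags": TAG_RE.findall(text) + WIKILINK_RE.findall(text),
        "chunk_index": chunk_index,
    }

meta = chunk_metadata("Journal/2026-04-13.md", "Saw [[Vinay]] today #Relations", 0)
```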
### Search

- **Default top-k**: 8 chunks
- **Max top-k**: 20
- **Similarity threshold**: 0.75
- **Hybrid search**: Enabled by default (30% keyword, 70% semantic)
- **Filters**: date range, tag list, directory glob
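
The hybrid scoring reduces to a weighted sum followed by the threshold cut; a sketch assuming both score inputs are pre-normalized to [0, 1]:

```python
# Combine keyword and semantic scores (30/70) and apply the 0.75 threshold.
KEYWORD_WEIGHT, SEMANTIC_WEIGHT, THRESHOLD = 0.3, 0.7, 0.75

def hybrid_rank(candidates, top_k=8):
    """candidates: list of (chunk_id, keyword_score, semantic_score)."""
    scored = [
        (cid, KEYWORD_WEIGHT * kw + SEMANTIC_WEIGHT * sem)
        for cid, kw, sem in candidates
    ]
    kept = [(cid, s) for cid, s in scored if s >= THRESHOLD]
    return sorted(kept, key=lambda p: p[1], reverse=True)[:top_k]

# "b" scores 0.3*0.2 + 0.7*0.95 = 0.725, under the threshold, so it is dropped.
results = hybrid_rank([("a", 0.9, 0.9), ("b", 0.2, 0.95), ("c", 0.1, 0.5)])
```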
## 8. Configuration Schema

```json
{
  "companion": {
    "name": "SAN",
    "persona": {
      "role": "companion",
      "tone": "reflective",
      "style": "questioning",
      "boundaries": [
        "does_not_impersonate_user",
        "no_future_predictions",
        "no_medical_or_legal_advice"
      ]
    },
    "memory": {
      "session_turns": 20,
      "persistent_store": "~/.companion/memory.db",
      "summarize_after": 10
    },
    "chat": {
      "streaming": true,
      "max_response_tokens": 2048,
      "default_temperature": 0.7,
      "allow_temperature_override": true
    }
  },
  "vault": {
    "path": "/home/san/KnowledgeVault/Default",
    "indexing": {
      "auto_sync": true,
      "auto_sync_interval_minutes": 1440,
      "watch_fs_events": true,
      "file_patterns": ["*.md"],
      "deny_dirs": [".obsidian", ".trash", ".git", ".logseq"],
      "deny_patterns": ["*.tmp", "*.bak", "*conflict*", ".*"]
    },
    "chunking_rules": {
      "default": { "strategy": "sliding_window", "chunk_size": 500, "chunk_overlap": 100 },
      "Journal/**": { "strategy": "section", "chunk_size": 300, "chunk_overlap": 50 },
      "zzz-Archive/**": { "strategy": "sliding_window", "chunk_size": 800, "chunk_overlap": 150 }
    }
  },
  "rag": {
    "embedding": {
      "provider": "ollama",
      "model": "mxbai-embed-large",
      "base_url": "http://localhost:11434",
      "dimensions": 1024,
      "batch_size": 32
    },
    "vector_store": {
      "type": "lancedb",
      "path": "~/.companion/vectors.lance"
    },
    "search": {
      "default_top_k": 8,
      "max_top_k": 20,
      "similarity_threshold": 0.75,
      "hybrid_search": { "enabled": true, "keyword_weight": 0.3, "semantic_weight": 0.7 },
      "filters": { "date_range_enabled": true, "tag_filter_enabled": true, "directory_filter_enabled": true }
    }
  },
  "model": {
    "inference": {
      "backend": "llama.cpp",
      "model_path": "~/.companion/models/companion-8b-q4.gguf",
      "context_length": 8192,
      "gpu_layers": 35,
      "batch_size": 512,
      "threads": 8
    },
    "fine_tuning": {
      "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
      "output_dir": "~/.companion/training",
      "lora_rank": 16,
      "lora_alpha": 32,
      "learning_rate": 0.0002,
      "batch_size": 4,
      "gradient_accumulation_steps": 4,
      "num_epochs": 3,
      "warmup_steps": 100,
      "save_steps": 500,
      "eval_steps": 250,
      "training_data_path": "~/.companion/training_data/",
      "validation_split": 0.1
    },
    "retrain_schedule": {
      "auto_reminder": true,
      "default_interval_days": 90,
      "reminder_channels": ["chat_stream", "log"]
    }
  },
  "api": {
    "host": "127.0.0.1",
    "port": 7373,
    "cors_origins": ["http://localhost:5173"],
    "auth": { "enabled": false }
  },
  "ui": {
    "web": {
      "enabled": true,
      "theme": "obsidian",
      "features": { "streaming": true, "citations": true, "source_preview": true }
    },
    "cli": { "enabled": true, "rich_output": true }
  },
  "logging": {
    "level": "INFO",
    "file": "~/.companion/logs/companion.log",
    "max_size_mb": 100,
    "backup_count": 5
  },
  "security": {
    "local_only": true,
    "vault_path_traversal_check": true,
    "sensitive_content_detection": true,
    "sensitive_patterns": ["#mentalhealth", "#physicalhealth", "#finance", "#Relations"],
    "require_confirmation_for_external_apis": true
  }
}
```
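
A brief sketch of loading this config and expanding the `~` paths before use; the loader itself is hypothetical, but the key names follow the schema above:

```python
# Load config JSON and expand user-relative paths for the vector store.
import json
from pathlib import Path

def load_config(text):
    cfg = json.loads(text)
    store = cfg["rag"]["vector_store"]
    store["path"] = str(Path(store["path"]).expanduser())  # "~" -> home dir
    return cfg

cfg = load_config("""{
  "rag": {
    "vector_store": { "type": "lancedb", "path": "~/.companion/vectors.lance" }
  }
}""")
```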
## 9. Project Structure

```
companion/
├── README.md
├── pyproject.toml
├── config.json
├── src/
│   ├── companion/          # FastAPI backend + orchestrator
│   │   ├── main.py
│   │   ├── api/
│   │   │   ├── chat.py
│   │   │   ├── index.py
│   │   │   └── status.py
│   │   ├── core/
│   │   │   ├── orchestrator.py
│   │   │   ├── memory.py
│   │   │   └── prompts.py
│   │   └── config.py
│   ├── rag/                # RAG engine
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   └── search.py
│   ├── indexer_daemon/     # Vault watcher + indexer CLI
│   │   ├── daemon.py
│   │   ├── cli.py
│   │   └── watcher.py
│   └── forge/              # Fine-tuning pipeline
│       ├── extract.py
│       ├── train.py
│       ├── export.py
│       └── evaluate.py
├── ui/                     # React frontend
│   ├── src/
│   │   ├── App.tsx
│   │   ├── components/
│   │   │   ├── Chat.tsx
│   │   │   ├── Message.tsx
│   │   │   └── Settings.tsx
│   │   └── api.ts
│   └── package.json
├── tests/
│   ├── companion/
│   ├── rag/
│   └── forge/
└── docs/
    └── superpowers/
        └── specs/
            └── 2026-04-13-personal-companion-ai-design.md
```
## 10. Testing Strategy

| Layer | Tests |
|-------|-------|
| **RAG** | Chunking correctness, search relevance, incremental sync accuracy |
| **API** | Chat streaming, parameter validation, error handling |
| **Security** | Path traversal, sensitive content detection, local-only enforcement |
| **Forge** | Training convergence, eval loss trends, output GGUF validity |
| **E2E** | Full chat turn with RAG retrieval, citation rendering |
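
For the RAG layer, chunking correctness can be pinned down with invariants on size and overlap. A reference sliding-window chunker over token lists (an assumption — the real chunker may operate on characters or sections):

```python
# Reference sliding-window chunker: fixed-size windows advancing by
# (chunk_size - chunk_overlap), so consecutive chunks share `chunk_overlap` items.
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap
    return [
        tokens[i:i + chunk_size]
        for i in range(0, max(len(tokens) - chunk_overlap, 1), step)
    ]

# Invariants a chunking test would assert: full coverage, bounded size,
# and exact overlap between neighbors.
tokens = list(range(1200))
chunks = sliding_window_chunks(tokens, 500, 100)
```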
## 11. Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| **Overfitting** on small dataset | LoRA rank 16, strong regularization, 10% validation split, human eval |
| **Temporal drift** | Quarterly retrain + daily RAG sync |
| **Privacy leak in training data** | Manual curation of training examples; exclude others' private details |
| **Emotional weight / uncanny valley** | Persona is companion, not predictor; framed as reflection |
| **Hardware limits** | QLoRA 4-bit, 8B model, ~35 GPU layers; fallback to CPU offloading if needed |
| **Maintenance fatigue** | Auto-sync removes daily work; retrain is one script + reminder |

## 12. Success Criteria

- [ ] Chat interface streams responses locally with sub-second first-token latency
- [ ] RAG retrieves relevant vault context for >80% of personal questions
- [ ] Fine-tuned model produces responses that "feel" recognizably aligned with Santhosh's reflective style
- [ ] Quarterly retrain completes successfully on the RTX 5070 in <6 hours
- [ ] Daily auto-sync and manual trigger both work reliably
- [ ] No vault data leaves the local machine

## 13. Next Steps

1. Write an implementation plan using the `writing-plans` skill
2. Scaffold the repository structure
3. Build the Vault Indexer + RAG engine first (Weeks 1-2)
4. Integrate the chat UI with the base model (Week 3)
5. Curate training data and begin fine-tuning experiments (Weeks 4-6)
6. Polish, evaluate, and integrate (Weeks 7-8)