# RAG Module Documentation

The RAG (Retrieval-Augmented Generation) module provides semantic search over your Obsidian vault. It handles document chunking, embedding generation, and vector similarity search.

## Architecture

```
Vault Markdown Files
         ↓
┌─────────────────┐
│     Chunker     │  - Split by strategy (sliding window / section)
│  (chunker.py)   │  - Extract metadata (tags, dates, sections)
└────────┬────────┘
         ↓
┌─────────────────┐
│    Embedder     │  - HTTP client for Ollama API
│  (embedder.py)  │  - Batch processing with retries
└────────┬────────┘
         ↓
┌─────────────────┐
│  Vector Store   │  - LanceDB persistence
│(vector_store.py)│  - Upsert, delete, search
└────────┬────────┘
         ↓
┌─────────────────┐
│     Indexer     │  - Full/incremental sync
│  (indexer.py)   │  - File watching
└─────────────────┘
```

## Components

### Chunker (`companion.rag.chunker`)

Splits markdown files into searchable chunks.

```python
from pathlib import Path

from companion.rag.chunker import chunk_file, ChunkingRule

rules = {
    "default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100),
    "Journal/**": ChunkingRule(strategy="section", section_tags=["#DayInShort"], chunk_size=300, chunk_overlap=50),
}

chunks = chunk_file(
    file_path=Path("journal/2026-04-12.md"),
    vault_root=Path("~/vault"),
    rules=rules,
    modified_at=1234567890.0,
)

for chunk in chunks:
    print(f"{chunk.source_file}:{chunk.chunk_index}")
    print(f"Text: {chunk.text[:100]}...")
    print(f"Tags: {chunk.tags}")
    print(f"Date: {chunk.date}")
```

#### Chunking Strategies

**Sliding Window**

- Fixed-size chunks with overlap
- Best for: longform text, articles

```python
ChunkingRule(
    strategy="sliding_window",
    chunk_size=500,     # words per chunk
    chunk_overlap=100,  # words of overlap between chunks
)
```

**Section-Based**

- Split on section headers (tags)
- Best for: structured journals, daily notes

```python
ChunkingRule(
    strategy="section",
    section_tags=["#DayInShort", "#mentalhealth", "#work"],
    chunk_size=300,
    chunk_overlap=50,
)
```

#### Metadata Extraction

Each chunk includes:

- `source_file` - Relative path from the vault root
- `source_directory` - Top-level directory
- `section` - Section header (for the section strategy)
- `date` - Parsed from the filename
- `tags` - Hashtags and wikilinks
- `chunk_index` - Position within the document
- `modified_at` - File mtime, used for incremental sync

### Embedder (`companion.rag.embedder`)

Generates embeddings via the Ollama API.

```python
from companion.rag.embedder import OllamaEmbedder

embedder = OllamaEmbedder(
    base_url="http://localhost:11434",
    model="mxbai-embed-large",
    batch_size=32,
)

# Single embedding
embeddings = embedder.embed(["Hello world"])
print(len(embeddings[0]))  # 1024 dimensions

# Batch embedding (with automatic batching)
texts = ["text 1", "text 2", "text 3", ...]  # 100 texts
embeddings = embedder.embed(texts)  # automatically batches
```

#### Features

- **Batching**: Automatically splits large requests
- **Retries**: Exponential backoff on failures
- **Context Manager**: Proper resource cleanup

```python
with OllamaEmbedder(...) as embedder:
    embeddings = embedder.embed(texts)
```
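To make the hand-off between the chunker and the embedder concrete, here is a minimal sketch that chunks one file and embeds the resulting texts, using only the `chunk_file` and `OllamaEmbedder` APIs shown above. The vault path and rule set are placeholders; substitute your own.

```python
from pathlib import Path

from companion.rag.chunker import ChunkingRule, chunk_file
from companion.rag.embedder import OllamaEmbedder

# Placeholder vault layout and rules -- adjust to your vault.
vault_root = Path("~/vault").expanduser()
file_path = vault_root / "journal" / "2026-04-12.md"
rules = {
    "default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100),
}

chunks = chunk_file(
    file_path=file_path,
    vault_root=vault_root,
    rules=rules,
    modified_at=file_path.stat().st_mtime,
)

# One embed() call; the embedder splits it into batches of 32 internally.
with OllamaEmbedder(
    base_url="http://localhost:11434",
    model="mxbai-embed-large",
    batch_size=32,
) as embedder:
    embeddings = embedder.embed([chunk.text for chunk in chunks])

assert len(embeddings) == len(chunks)  # one 1024-dim vector per chunk
```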
### Vector Store (`companion.rag.vector_store`)

LanceDB wrapper for vector storage.

```python
from companion.rag.vector_store import VectorStore

store = VectorStore(
    uri="~/.companion/vectors.lance",
    dimensions=1024,
)

# Upsert chunks
store.upsert(
    ids=["file.md::0", "file.md::1"],
    texts=["chunk 1", "chunk 2"],
    embeddings=[[0.1, ...], [0.2, ...]],
    metadatas=[
        {"source_file": "file.md", "source_directory": "docs"},
        {"source_file": "file.md", "source_directory": "docs"},
    ],
)

# Search
results = store.search(
    query_vector=[0.1, ...],
    top_k=8,
    filters={"source_directory": "Journal"},
)
```

#### Schema

| Field | Type | Nullable |
|-------|------|----------|
| id | string | No |
| text | string | No |
| vector | list[float32] | No |
| source_file | string | No |
| source_directory | string | No |
| section | string | Yes |
| date | string | Yes |
| tags | list[string] | Yes |
| chunk_index | int32 | No |
| total_chunks | int32 | No |
| modified_at | float64 | Yes |
| rule_applied | string | No |

### Indexer (`companion.rag.indexer`)

Orchestrates vault indexing.

```python
from companion.config import load_config
from companion.rag.indexer import Indexer
from companion.rag.vector_store import VectorStore

config = load_config()
store = VectorStore(
    uri=config.rag.vector_store.path,
    dimensions=config.rag.embedding.dimensions,
)
indexer = Indexer(config, store)

# Full reindex (clear + rebuild)
indexer.full_index()

# Incremental sync (only changed files)
indexer.sync()

# Get status
status = indexer.status()
print(f"Total chunks: {status['total_chunks']}")
print(f"Unindexed files: {status['unindexed_files']}")
```

### Search (`companion.rag.search`)

High-level search interface.

```python
from companion.rag.search import SearchEngine

engine = SearchEngine(
    vector_store=store,
    embedder_base_url="http://localhost:11434",
    embedder_model="mxbai-embed-large",
    default_top_k=8,
    similarity_threshold=0.75,
    hybrid_search_enabled=False,
)

results = engine.search(
    query="What did I learn about friendships?",
    top_k=8,
    filters={"source_directory": "Journal"},
)

for result in results:
    print(f"Source: {result['source_file']}")
    print(f"Relevance: {1 - result['_distance']:.2f}")
```

## CLI Commands

```bash
# Full index
python -m companion.indexer_daemon.cli index

# Incremental sync
python -m companion.indexer_daemon.cli sync

# Check status
python -m companion.indexer_daemon.cli status

# Reindex (same as index)
python -m companion.indexer_daemon.cli reindex
```

## Performance Tips

1. **Chunk size**: Smaller chunks give more precise retrieval; larger chunks carry more context per result
2. **Batch size**: 32 is optimal for Ollama embeddings
3. **Filters**: Use directory filters to narrow the search scope
4. **Sync vs index**: Use `sync` for daily updates, `index` for full rebuilds

## Troubleshooting

**Slow indexing**

- Check that Ollama is running: `ollama ps`
- Reduce the batch size if you hit out-of-memory errors

**No results**

- Verify the vault path in the config
- Check `indexer.status()` for unindexed files

**Duplicate chunks**

- Each chunk ID is `{source_file}::{chunk_index}`
- Use `full_index()` to clear and rebuild
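## End-to-End Example

As a closing illustration, here is a minimal sketch that wires the documented pieces together: build the store from config, run an incremental sync, then issue a filtered query. It is assembled only from the constructors and methods shown in the component sections; the Ollama URL, model, and thresholds mirror the earlier examples and should be replaced with your own config values.

```python
from companion.config import load_config
from companion.rag.indexer import Indexer
from companion.rag.search import SearchEngine
from companion.rag.vector_store import VectorStore

config = load_config()
store = VectorStore(
    uri=config.rag.vector_store.path,
    dimensions=config.rag.embedding.dimensions,
)

# Bring the index up to date; only changed files are re-processed.
Indexer(config, store).sync()

# Search the synced index, scoped to the Journal directory.
engine = SearchEngine(
    vector_store=store,
    embedder_base_url="http://localhost:11434",
    embedder_model="mxbai-embed-large",
    default_top_k=8,
    similarity_threshold=0.75,
    hybrid_search_enabled=False,
)
for result in engine.search(
    query="What did I learn about friendships?",
    filters={"source_directory": "Journal"},
):
    # _distance comes from the vector store; lower means more similar.
    print(f"{result['source_file']}  (relevance {1 - result['_distance']:.2f})")
```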