RAG Module Documentation

The RAG (Retrieval-Augmented Generation) module provides semantic search over your Obsidian vault. It handles document chunking, embedding generation, and vector similarity search.

Architecture

Vault Markdown Files
         ↓
┌─────────────────┐
│    Chunker      │  - Split by strategy (sliding window / section)
│  (chunker.py)   │  - Extract metadata (tags, dates, sections)
└────────┬────────┘
         ↓
┌─────────────────┐
│    Embedder     │  - HTTP client for Ollama API
│  (embedder.py)  │  - Batch processing with retries
└────────┬────────┘
         ↓
┌─────────────────┐
│  Vector Store   │  - LanceDB persistence
│(vector_store.py)│  - Upsert, delete, search
└────────┬────────┘
         ↓
┌─────────────────┐
│  Indexer        │  - Full/incremental sync
│  (indexer.py)   │  - File watching
└─────────────────┘
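
Putting the components together, a minimal end-to-end indexing pass looks roughly like this (a sketch assembled from the APIs documented below; in practice the Indexer handles this orchestration plus change tracking):

from pathlib import Path

from companion.rag.chunker import ChunkingRule, chunk_file
from companion.rag.embedder import OllamaEmbedder
from companion.rag.vector_store import VectorStore

vault = Path("~/vault").expanduser()
rules = {"default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100)}
store = VectorStore(uri="~/.companion/vectors.lance", dimensions=1024)

with OllamaEmbedder(base_url="http://localhost:11434", model="mxbai-embed-large", batch_size=32) as embedder:
    for md_file in vault.rglob("*.md"):
        # Chunk each note, embed the chunk texts, and upsert into the store.
        chunks = chunk_file(file_path=md_file, vault_root=vault, rules=rules, modified_at=md_file.stat().st_mtime)
        if not chunks:
            continue
        store.upsert(
            ids=[f"{c.source_file}::{c.chunk_index}" for c in chunks],
            texts=[c.text for c in chunks],
            embeddings=embedder.embed([c.text for c in chunks]),
            metadatas=[{"source_file": c.source_file, "source_directory": c.source_directory} for c in chunks],
        )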

Components

Chunker (companion.rag.chunker)

Splits markdown files into searchable chunks.

from pathlib import Path

from companion.rag.chunker import chunk_file, ChunkingRule

rules = {
    "default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100),
    "Journal/**": ChunkingRule(strategy="section", section_tags=["#DayInShort"], chunk_size=300, chunk_overlap=50),
}

vault_root = Path("~/vault").expanduser()
chunks = chunk_file(
    file_path=vault_root / "Journal/2026-04-12.md",
    vault_root=vault_root,
    rules=rules,
    modified_at=1234567890.0,
)

for chunk in chunks:
    print(f"{chunk.source_file}:{chunk.chunk_index}")
    print(f"Text: {chunk.text[:100]}...")
    print(f"Tags: {chunk.tags}")
    print(f"Date: {chunk.date}")

Chunking Strategies

Sliding Window

  • Fixed-size chunks with overlap
  • Best for: Longform text, articles

ChunkingRule(
    strategy="sliding_window",
    chunk_size=500,    # words per chunk
    chunk_overlap=100, # words overlap between chunks
)

Section-Based

  • Split on section headers (tags)
  • Best for: Structured journals, daily notes

ChunkingRule(
    strategy="section",
    section_tags=["#DayInShort", "#mentalhealth", "#work"],
    chunk_size=300,
    chunk_overlap=50,
)
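
To make this concrete, here is a hypothetical journal note and the split it would produce (exact boundaries depend on the chunker's implementation):

note = """\
#DayInShort
Slept well, took a long walk, finished the report.

#work
Shipped the indexing fix; review feedback was positive.
"""
# With strategy="section" and section_tags=["#DayInShort", "#work"],
# the note splits into two chunks, one per tagged section, with
# chunk.section set to the matching tag.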

Metadata Extraction

Each chunk includes:

  • source_file - Relative path from vault root
  • source_directory - Top-level directory
  • section - Section header (for section strategy)
  • date - Parsed from filename
  • tags - Hashtags and wikilinks
  • chunk_index - Position in document
  • modified_at - File mtime for sync
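
As a rough illustration of how the date and tags fields are derived (simplified, hypothetical regexes; the chunker's actual patterns may differ):

import re

filename = "2026-04-12.md"
text = "Caught up with #friends, see [[Trip Planning]]."

# date: ISO date parsed from the filename, if one is present
match = re.search(r"\d{4}-\d{2}-\d{2}", filename)
date = match.group(0) if match else None  # "2026-04-12"

# tags: hashtags plus [[wikilink]] targets found in the chunk text
tags = re.findall(r"#[\w/-]+", text) + re.findall(r"\[\[([^\]]+)\]\]", text)
# -> ["#friends", "Trip Planning"]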

Embedder (companion.rag.embedder)

Generates embeddings via Ollama API.

from companion.rag.embedder import OllamaEmbedder

embedder = OllamaEmbedder(
    base_url="http://localhost:11434",
    model="mxbai-embed-large",
    batch_size=32,
)

# Single embedding
embeddings = embedder.embed(["Hello world"])
print(len(embeddings[0]))  # 1024 dimensions

# Batch embedding (with automatic batching)
texts = ["text 1", "text 2", "text 3", ...]  # 100 texts
embeddings = embedder.embed(texts)  # Automatically batches

Features

  • Batching: Automatically splits large requests
  • Retries: Exponential backoff on failures (see the sketch below)
  • Context Manager: Proper resource cleanup

with OllamaEmbedder(...) as embedder:
    embeddings = embedder.embed(texts)
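
The retry behavior is internal to the embedder, but for intuition, exponential backoff typically looks like this (a standalone sketch, not the embedder's actual code; names and parameters are hypothetical):

import time

import httpx  # any HTTP client works; httpx is shown here as an example

def post_with_retries(url: str, payload: dict, max_retries: int = 3) -> dict:
    # Doubles the delay between attempts (1s, 2s, ...); re-raises on final failure.
    for attempt in range(max_retries):
        try:
            response = httpx.post(url, json=payload, timeout=30.0)
            response.raise_for_status()
            return response.json()
        except httpx.HTTPError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)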

Vector Store (companion.rag.vector_store)

LanceDB wrapper for vector storage.

from companion.rag.vector_store import VectorStore

store = VectorStore(
    uri="~/.companion/vectors.lance",
    dimensions=1024,
)

# Upsert chunks
store.upsert(
    ids=["file.md::0", "file.md::1"],
    texts=["chunk 1", "chunk 2"],
    embeddings=[[0.1, ...], [0.2, ...]],
    metadatas=[
        {"source_file": "file.md", "source_directory": "docs"},
        {"source_file": "file.md", "source_directory": "docs"},
    ],
)

# Search
results = store.search(
    query_vector=[0.1, ...],
    top_k=8,
    filters={"source_directory": "Journal"},
)
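
Each result row carries the stored schema fields plus a `_distance` score added by the search (illustrative values):

for row in results:
    # e.g. {"id": "Journal/2026-04-12.md::0", "source_file": "Journal/2026-04-12.md",
    #       "_distance": 0.31, ...}
    print(row["id"], row["_distance"])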

Schema

Field             Type           Nullable
id                string         No
text              string         No
vector            list[float32]  No
source_file       string         No
source_directory  string         No
section           string         Yes
date              string         Yes
tags              list[string]   Yes
chunk_index       int32          No
total_chunks      int32          No
modified_at       float64        Yes
rule_applied      string         No
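
For reference, the table above maps roughly onto this PyArrow schema, using the 1024-dimension vectors from the embedding examples (a sketch; the store builds its own schema internally):

import pyarrow as pa

schema = pa.schema([
    pa.field("id", pa.string(), nullable=False),
    pa.field("text", pa.string(), nullable=False),
    pa.field("vector", pa.list_(pa.float32(), 1024), nullable=False),  # fixed-size list
    pa.field("source_file", pa.string(), nullable=False),
    pa.field("source_directory", pa.string(), nullable=False),
    pa.field("section", pa.string()),
    pa.field("date", pa.string()),
    pa.field("tags", pa.list_(pa.string())),
    pa.field("chunk_index", pa.int32(), nullable=False),
    pa.field("total_chunks", pa.int32(), nullable=False),
    pa.field("modified_at", pa.float64()),
    pa.field("rule_applied", pa.string(), nullable=False),
])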

Indexer (companion.rag.indexer)

Orchestrates vault indexing.

from companion.config import load_config
from companion.rag.indexer import Indexer
from companion.rag.vector_store import VectorStore

config = load_config()
store = VectorStore(
    uri=config.rag.vector_store.path,
    dimensions=config.rag.embedding.dimensions,
)

indexer = Indexer(config, store)

# Full reindex (clear + rebuild)
indexer.full_index()

# Incremental sync (only changed files)
indexer.sync()

# Get status
status = indexer.status()
print(f"Total chunks: {status['total_chunks']}")
print(f"Unindexed files: {status['unindexed_files']}")

Search (companion.rag.search)

High-level search interface.

from companion.rag.search import SearchEngine

engine = SearchEngine(
    vector_store=store,
    embedder_base_url="http://localhost:11434",
    embedder_model="mxbai-embed-large",
    default_top_k=8,
    similarity_threshold=0.75,
    hybrid_search_enabled=False,
)

results = engine.search(
    query="What did I learn about friendships?",
    top_k=8,
    filters={"source_directory": "Journal"},
)

for result in results:
    print(f"Source: {result['source_file']}")
    print(f"Relevance: {1 - result['_distance']:.2f}")

CLI Commands

# Full index
python -m companion.indexer_daemon.cli index

# Incremental sync
python -m companion.indexer_daemon.cli sync

# Check status
python -m companion.indexer_daemon.cli status

# Reindex (same as index)
python -m companion.indexer_daemon.cli reindex

Performance Tips

  1. Chunk Size: Smaller chunks improve retrieval precision; larger chunks preserve more context per result
  2. Batch Size: 32 is a good default for Ollama embeddings
  3. Filters: Use directory filters to narrow search scope
  4. Sync vs Index: Use sync for daily updates, index for full rebuilds

Troubleshooting

Slow indexing

  • Check Ollama is running: ollama ps
  • Reduce batch_size if you hit out-of-memory errors

No results

  • Verify vault path in config
  • Check indexer.status() for unindexed files

Duplicate chunks

  • Each chunk ID is {source_file}::{chunk_index}
  • Use full_index() to clear and rebuild