# RAG Module Documentation
The RAG (Retrieval-Augmented Generation) module provides semantic search over your Obsidian vault. It handles document chunking, embedding generation, and vector similarity search.
## Architecture
```
Vault Markdown Files
        ↓
┌─────────────────┐
│     Chunker     │ - Split by strategy (sliding window / section)
│  (chunker.py)   │ - Extract metadata (tags, dates, sections)
└────────┬────────┘
         ↓
┌─────────────────┐
│    Embedder     │ - HTTP client for Ollama API
│  (embedder.py)  │ - Batch processing with retries
└────────┬────────┘
         ↓
┌─────────────────┐
│  Vector Store   │ - LanceDB persistence
│(vector_store.py)│ - Upsert, delete, search
└────────┬────────┘
         ↓
┌─────────────────┐
│     Indexer     │ - Full/incremental sync
│  (indexer.py)   │ - File watching
└─────────────────┘
```
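End to end, indexing one file runs through those three stages. Here is a sketch assembled from the APIs documented below (in practice the `Indexer` wires these steps together for you; the chunk attributes used here follow the metadata list in the Chunker section):

```python
from pathlib import Path

from companion.rag.chunker import ChunkingRule, chunk_file
from companion.rag.embedder import OllamaEmbedder
from companion.rag.vector_store import VectorStore

# Chunk one markdown file
rules = {"default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100)}
chunks = chunk_file(
    file_path=Path("notes/example.md"),
    vault_root=Path("~/vault"),
    rules=rules,
    modified_at=0.0,
)

# Embed the chunk texts and persist them
store = VectorStore(uri="~/.companion/vectors.lance", dimensions=1024)
with OllamaEmbedder(base_url="http://localhost:11434", model="mxbai-embed-large") as embedder:
    store.upsert(
        ids=[f"{c.source_file}::{c.chunk_index}" for c in chunks],
        texts=[c.text for c in chunks],
        embeddings=embedder.embed([c.text for c in chunks]),
        metadatas=[
            {"source_file": c.source_file, "source_directory": c.source_directory}
            for c in chunks
        ],
    )
```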
## Components
### Chunker (`companion.rag.chunker`)
Splits markdown files into searchable chunks.
```python
from pathlib import Path

from companion.rag.chunker import chunk_file, ChunkingRule

rules = {
    "default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100),
    "Journal/**": ChunkingRule(strategy="section", section_tags=["#DayInShort"], chunk_size=300, chunk_overlap=50),
}

chunks = chunk_file(
    file_path=Path("journal/2026-04-12.md"),
    vault_root=Path("~/vault"),
    rules=rules,
    modified_at=1234567890.0,
)

for chunk in chunks:
    print(f"{chunk.source_file}:{chunk.chunk_index}")
    print(f"Text: {chunk.text[:100]}...")
    print(f"Tags: {chunk.tags}")
    print(f"Date: {chunk.date}")
```
#### Chunking Strategies
##### Sliding Window
- Fixed-size chunks with overlap
- Best for: Longform text, articles
```python
ChunkingRule(
    strategy="sliding_window",
    chunk_size=500,     # words per chunk
    chunk_overlap=100,  # words of overlap between consecutive chunks
)
```
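Conceptually, the window advances by `chunk_size - chunk_overlap` words each step. A minimal sketch of the idea (not the module's actual implementation):

```python
def sliding_window(words: list[str], size: int, overlap: int) -> list[str]:
    """Illustrative only: fixed-size word windows that advance by size - overlap."""
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```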
##### Section-Based
- Split on section headers (tags)
- Best for: Structured journals, daily notes
```python
ChunkingRule(
    strategy="section",
    section_tags=["#DayInShort", "#mentalhealth", "#work"],
    chunk_size=300,
    chunk_overlap=50,
)
```
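The split itself amounts to starting a new chunk whenever a line opens with one of the configured tags. Roughly (again a sketch, not the module's code):

```python
def split_on_tags(lines: list[str], section_tags: list[str]) -> list[str]:
    """Illustrative: start a new section whenever a line opens with a configured tag."""
    sections: list[list[str]] = [[]]
    for line in lines:
        if any(line.strip().startswith(tag) for tag in section_tags):
            sections.append([line])
        else:
            sections[-1].append(line)
    return ["\n".join(s) for s in sections if s]
```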
#### Metadata Extraction
Each chunk includes:
- `source_file` - Relative path from vault root
- `source_directory` - Top-level directory
- `section` - Section header (for section strategy)
- `date` - Parsed from filename
- `tags` - Hashtags and wikilinks
- `chunk_index` - Position in document
- `modified_at` - File mtime for sync
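These surface as attributes on each chunk. As a rough picture of the shape (the exact class in `companion.rag.chunker` is an assumption; field names follow the list above):

```python
from dataclasses import dataclass

@dataclass
class Chunk:  # hypothetical shape, for illustration only
    text: str
    source_file: str        # relative path from vault root
    source_directory: str   # top-level directory
    section: str | None     # section header (section strategy only)
    date: str | None        # parsed from filename
    tags: list[str]         # hashtags and wikilinks
    chunk_index: int        # position in document
    modified_at: float      # file mtime for sync
```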
### Embedder (`companion.rag.embedder`)
Generates embeddings via Ollama API.
```python
from companion.rag.embedder import OllamaEmbedder

embedder = OllamaEmbedder(
    base_url="http://localhost:11434",
    model="mxbai-embed-large",
    batch_size=32,
)

# Single text (embed always takes and returns a list)
embeddings = embedder.embed(["Hello world"])
print(len(embeddings[0]))  # 1024 dimensions

# Batch embedding (with automatic batching)
texts = [f"text {i}" for i in range(100)]  # 100 texts
embeddings = embedder.embed(texts)  # Automatically splits into batches
```
#### Features
- Batching: Automatically splits large requests
- Retries: Exponential backoff on failures
- Context Manager: Proper resource cleanup
```python
with OllamaEmbedder(...) as embedder:
    embeddings = embedder.embed(texts)
```
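The retry behavior follows the usual exponential-backoff pattern. A minimal sketch (the `httpx` client and the helper here are assumptions for illustration, not the module's actual internals):

```python
import time

import httpx

def post_with_retries(client: httpx.Client, url: str, payload: dict, max_retries: int = 3) -> dict:
    """Illustrative: retry a POST, waiting 2**attempt seconds between attempts."""
    for attempt in range(max_retries):
        try:
            response = client.post(url, json=payload)
            response.raise_for_status()
            return response.json()
        except httpx.HTTPError:
            if attempt == max_retries - 1:
                raise  # exhausted retries; surface the error
            time.sleep(2 ** attempt)
```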
### Vector Store (`companion.rag.vector_store`)
LanceDB wrapper for vector storage.
```python
from companion.rag.vector_store import VectorStore

store = VectorStore(
    uri="~/.companion/vectors.lance",
    dimensions=1024,
)

# Upsert chunks
store.upsert(
    ids=["file.md::0", "file.md::1"],
    texts=["chunk 1", "chunk 2"],
    embeddings=[[0.1, ...], [0.2, ...]],
    metadatas=[
        {"source_file": "file.md", "source_directory": "docs"},
        {"source_file": "file.md", "source_directory": "docs"},
    ],
)

# Search
results = store.search(
    query_vector=[0.1, ...],
    top_k=8,
    filters={"source_directory": "Journal"},
)
```
#### Schema
| Field | Type | Nullable |
|---|---|---|
| id | string | No |
| text | string | No |
| vector | list[float32] | No |
| source_file | string | No |
| source_directory | string | No |
| section | string | Yes |
| date | string | Yes |
| tags | list[string] | Yes |
| chunk_index | int32 | No |
| total_chunks | int32 | No |
| modified_at | float64 | Yes |
| rule_applied | string | No |
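LanceDB tables are Arrow-backed, so the schema above corresponds to something like the following (an illustrative `pyarrow` construction; how the module actually declares it is an assumption):

```python
import pyarrow as pa

# Mirrors the schema table above; vector dimension matches dimensions=1024.
schema = pa.schema([
    pa.field("id", pa.string(), nullable=False),
    pa.field("text", pa.string(), nullable=False),
    pa.field("vector", pa.list_(pa.float32(), 1024), nullable=False),
    pa.field("source_file", pa.string(), nullable=False),
    pa.field("source_directory", pa.string(), nullable=False),
    pa.field("section", pa.string()),           # nullable
    pa.field("date", pa.string()),              # nullable
    pa.field("tags", pa.list_(pa.string())),    # nullable
    pa.field("chunk_index", pa.int32(), nullable=False),
    pa.field("total_chunks", pa.int32(), nullable=False),
    pa.field("modified_at", pa.float64()),      # nullable
    pa.field("rule_applied", pa.string(), nullable=False),
])
```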
### Indexer (`companion.rag.indexer`)
Orchestrates vault indexing.
```python
from companion.config import load_config
from companion.rag.indexer import Indexer
from companion.rag.vector_store import VectorStore

config = load_config()
store = VectorStore(
    uri=config.rag.vector_store.path,
    dimensions=config.rag.embedding.dimensions,
)
indexer = Indexer(config, store)

# Full reindex (clear + rebuild)
indexer.full_index()

# Incremental sync (only changed files)
indexer.sync()

# Get status
status = indexer.status()
print(f"Total chunks: {status['total_chunks']}")
print(f"Unindexed files: {status['unindexed_files']}")
```
### Search (`companion.rag.search`)
High-level search interface.
```python
from companion.rag.search import SearchEngine

engine = SearchEngine(
    vector_store=store,
    embedder_base_url="http://localhost:11434",
    embedder_model="mxbai-embed-large",
    default_top_k=8,
    similarity_threshold=0.75,
    hybrid_search_enabled=False,
)

results = engine.search(
    query="What did I learn about friendships?",
    top_k=8,
    filters={"source_directory": "Journal"},
)

for result in results:
    print(f"Source: {result['source_file']}")
    print(f"Relevance: {1 - result['_distance']:.2f}")
```
## CLI Commands
```bash
# Full index
python -m companion.indexer_daemon.cli index

# Incremental sync
python -m companion.indexer_daemon.cli sync

# Check status
python -m companion.indexer_daemon.cli status

# Reindex (same as index)
python -m companion.indexer_daemon.cli reindex
```
## Performance Tips
- Chunk Size: Smaller chunks retrieve more precisely; larger chunks give each result more context
- Batch Size: 32 is a good default for Ollama embeddings
- Filters: Use directory filters to narrow the search scope
- Sync vs Index: Use `sync` for daily updates, `index` for full rebuilds
## Troubleshooting
### Slow indexing
- Check that Ollama is running: `ollama ps`
- Reduce `batch_size` if you run out of memory
### No results
- Verify vault path in config
- Check `indexer.status()` for unindexed files
### Duplicate chunks
- Each chunk ID is `{source_file}::{chunk_index}`
- Use `full_index()` to clear and rebuild
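For reference, the ID passed to `upsert` is just those two fields joined, matching the `ids` shown in the Vector Store example:

```python
chunk_id = f"{chunk.source_file}::{chunk.chunk_index}"  # e.g. "file.md::0"
```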