# RAG Module Documentation
The RAG (Retrieval-Augmented Generation) module provides semantic search over your Obsidian vault. It handles document chunking, embedding generation, and vector similarity search.
## Architecture
```
Vault Markdown Files
         │
         ▼
┌─────────────────┐
│     Chunker     │  - Split by strategy (sliding window / section)
│  (chunker.py)   │  - Extract metadata (tags, dates, sections)
└────────┬────────┘
         ▼
┌─────────────────┐
│    Embedder     │  - HTTP client for the Ollama API
│  (embedder.py)  │  - Batch processing with retries
└────────┬────────┘
         ▼
┌─────────────────┐
│  Vector Store   │  - LanceDB persistence
│(vector_store.py)│  - Upsert, delete, search
└────────┬────────┘
         ▼
┌─────────────────┐
│     Indexer     │  - Full/incremental sync
│  (indexer.py)   │  - File watching
└─────────────────┘
```
## Components
### Chunker (`companion.rag.chunker`)
Splits markdown files into searchable chunks.
```python
from pathlib import Path

from companion.rag.chunker import chunk_file, ChunkingRule

rules = {
    "default": ChunkingRule(strategy="sliding_window", chunk_size=500, chunk_overlap=100),
    "Journal/**": ChunkingRule(strategy="section", section_tags=["#DayInShort"], chunk_size=300, chunk_overlap=50),
}

chunks = chunk_file(
    file_path=Path("journal/2026-04-12.md"),
    vault_root=Path("~/vault").expanduser(),
    rules=rules,
    modified_at=1234567890.0,
)

for chunk in chunks:
    print(f"{chunk.source_file}:{chunk.chunk_index}")
    print(f"Text: {chunk.text[:100]}...")
    print(f"Tags: {chunk.tags}")
    print(f"Date: {chunk.date}")
```
#### Chunking Strategies
**Sliding Window**
- Fixed-size chunks with overlap
- Best for: Longform text, articles
```python
ChunkingRule(
    strategy="sliding_window",
    chunk_size=500,     # words per chunk
    chunk_overlap=100,  # words of overlap between consecutive chunks
)
```
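The sliding-window split can be sketched in a few lines (an illustrative implementation, not the actual `chunker.py` code; sizes are in words, as above):

```python
def sliding_window(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share chunk_overlap words."""
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```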
**Section-Based**
- Split on section headers (tags)
- Best for: Structured journals, daily notes
```python
ChunkingRule(
    strategy="section",
    section_tags=["#DayInShort", "#mentalhealth", "#work"],
    chunk_size=300,
    chunk_overlap=50,
)
```
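Section-based splitting can be sketched like this (a simplified illustration that splits on lines consisting of exactly one of the section tags; the real chunker may match tags more loosely and still enforce `chunk_size` inside long sections):

```python
def split_by_sections(text: str, section_tags: list[str]) -> list[tuple[str, str]]:
    """Split text into (section_tag, body) pairs at lines that are a section tag."""
    sections: list[tuple[str, str]] = []
    current_tag, current_lines = None, []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in section_tags:
            if current_tag is not None:
                sections.append((current_tag, "\n".join(current_lines).strip()))
            current_tag, current_lines = stripped, []
        elif current_tag is not None:
            current_lines.append(line)
    if current_tag is not None:
        sections.append((current_tag, "\n".join(current_lines).strip()))
    return sections
```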
#### Metadata Extraction
Each chunk includes:
- `source_file` - Relative path from vault root
- `source_directory` - Top-level directory
- `section` - Section header (for section strategy)
- `date` - Parsed from filename
- `tags` - Hashtags and wikilinks
- `chunk_index` - Position in document
- `modified_at` - File mtime for sync
### Embedder (`companion.rag.embedder`)
Generates embeddings via Ollama API.
```python
from companion.rag.embedder import OllamaEmbedder

embedder = OllamaEmbedder(
    base_url="http://localhost:11434",
    model="mxbai-embed-large",
    batch_size=32,
)

# Single embedding
embeddings = embedder.embed(["Hello world"])
print(len(embeddings[0]))  # 1024 dimensions

# Batch embedding (with automatic batching)
texts = ["text 1", "text 2", "text 3", ...]  # 100 texts
embeddings = embedder.embed(texts)  # automatically split into batches
```
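The automatic batching amounts to slicing the input list into groups of `batch_size` before each HTTP call; a minimal sketch (illustrative, not the embedder's internals):

```python
def batched(texts: list[str], batch_size: int) -> list[list[str]]:
    """Split texts into consecutive batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
```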
#### Features
- **Batching**: Automatically splits large requests
- **Retries**: Exponential backoff on failures
- **Context Manager**: Proper resource cleanup
```python
with OllamaEmbedder(...) as embedder:
    embeddings = embedder.embed(texts)
```
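Retry with exponential backoff can be sketched as follows (a hypothetical helper; the embedder's actual attempt count and delays may differ):

```python
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```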
### Vector Store (`companion.rag.vector_store`)
LanceDB wrapper for vector storage.
```python
from companion.rag.vector_store import VectorStore

store = VectorStore(
    uri="~/.companion/vectors.lance",
    dimensions=1024,
)

# Upsert chunks
store.upsert(
    ids=["file.md::0", "file.md::1"],
    texts=["chunk 1", "chunk 2"],
    embeddings=[[0.1, ...], [0.2, ...]],
    metadatas=[
        {"source_file": "file.md", "source_directory": "docs"},
        {"source_file": "file.md", "source_directory": "docs"},
    ],
)

# Search
results = store.search(
    query_vector=[0.1, ...],
    top_k=8,
    filters={"source_directory": "Journal"},
)
```
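A metadata filter dict like the one above typically ends up as a SQL-style where clause in LanceDB; one plausible translation (hypothetical helper, assuming string-valued equality filters only):

```python
def build_where_clause(filters: dict[str, str]) -> str:
    """Render {"column": "value"} pairs as a conjunctive SQL-style predicate."""
    return " AND ".join(f"{key} = '{value}'" for key, value in filters.items())
```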
#### Schema
| Field | Type | Nullable |
|-------|------|----------|
| id | string | No |
| text | string | No |
| vector | list[float32] | No |
| source_file | string | No |
| source_directory | string | No |
| section | string | Yes |
| date | string | Yes |
| tags | list[string] | Yes |
| chunk_index | int32 | No |
| total_chunks | int32 | No |
| modified_at | float64 | Yes |
| rule_applied | string | No |
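The schema can be mirrored on the Python side as a plain record type (illustrative only, not the store's internal representation; nullable columns default to `None`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChunkRow:
    # Required (non-nullable) columns
    id: str
    text: str
    vector: list[float]
    source_file: str
    source_directory: str
    chunk_index: int
    total_chunks: int
    rule_applied: str
    # Nullable columns
    section: Optional[str] = None
    date: Optional[str] = None
    tags: Optional[list[str]] = None
    modified_at: Optional[float] = None
```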
### Indexer (`companion.rag.indexer`)
Orchestrates vault indexing.
```python
from companion.config import load_config
from companion.rag.indexer import Indexer
from companion.rag.vector_store import VectorStore

config = load_config()
store = VectorStore(
    uri=config.rag.vector_store.path,
    dimensions=config.rag.embedding.dimensions,
)
indexer = Indexer(config, store)

# Full reindex (clear + rebuild)
indexer.full_index()

# Incremental sync (only changed files)
indexer.sync()

# Get status
status = indexer.status()
print(f"Total chunks: {status['total_chunks']}")
print(f"Unindexed files: {status['unindexed_files']}")
```
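Incremental sync boils down to comparing file mtimes against the `modified_at` values already stored per chunk; a simplified sketch (a hypothetical helper, not the real `indexer.py`):

```python
from pathlib import Path

def files_needing_sync(vault_root: Path, indexed_mtimes: dict[str, float]) -> list[Path]:
    """Return markdown files that are new or modified since the last index."""
    stale = []
    for path in vault_root.rglob("*.md"):
        rel = str(path.relative_to(vault_root))
        mtime = path.stat().st_mtime
        # Unknown files get a stored mtime of 0.0, so they always count as stale.
        if indexed_mtimes.get(rel, 0.0) < mtime:
            stale.append(path)
    return stale
```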
### Search (`companion.rag.search`)
High-level search interface.
```python
from companion.rag.search import SearchEngine

engine = SearchEngine(
    vector_store=store,
    embedder_base_url="http://localhost:11434",
    embedder_model="mxbai-embed-large",
    default_top_k=8,
    similarity_threshold=0.75,
    hybrid_search_enabled=False,
)

results = engine.search(
    query="What did I learn about friendships?",
    top_k=8,
    filters={"source_directory": "Journal"},
)

for result in results:
    print(f"Source: {result['source_file']}")
    print(f"Relevance: {1 - result['_distance']:.2f}")
```
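The `similarity_threshold` presumably acts as a post-filter on the vector hits; a sketch of that step (hypothetical, assuming relevance is `1 - _distance` as in the loop above):

```python
def filter_by_threshold(results: list[dict], similarity_threshold: float) -> list[dict]:
    """Keep only results whose relevance (1 - distance) meets the threshold."""
    return [r for r in results if 1 - r["_distance"] >= similarity_threshold]
```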
## CLI Commands
```bash
# Full index
python -m companion.indexer_daemon.cli index
# Incremental sync
python -m companion.indexer_daemon.cli sync
# Check status
python -m companion.indexer_daemon.cli status
# Reindex (same as index)
python -m companion.indexer_daemon.cli reindex
```
## Performance Tips
1. **Chunk Size**: Smaller chunks retrieve more precisely; larger chunks give each hit more surrounding context
2. **Batch Size**: 32 is a good default for Ollama embeddings
3. **Filters**: Use directory filters to narrow search scope
4. **Sync vs Index**: Use `sync` for daily updates, `index` for full rebuilds
## Troubleshooting
**Slow indexing**
- Check that Ollama is running: `ollama ps`
- Reduce `batch_size` if the embedding model runs out of memory
**No results**
- Verify vault path in config
- Check `indexer.status()` for unindexed files
**Duplicate chunks**
- Each chunk ID is `{source_file}::{chunk_index}`
- Use `full_index()` to clear and rebuild
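The ID construction implied above is simply (a trivial helper for illustration):

```python
def chunk_id(source_file: str, chunk_index: int) -> str:
    """Stable chunk ID: re-indexing a file upserts over the same IDs."""
    return f"{source_file}::{chunk_index}"
```

Because the IDs are deterministic, re-upserting an unchanged file overwrites its existing rows; `full_index()` is only needed when stale chunks linger (e.g. a file shrank and its higher-index chunks were never deleted).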