Previously, total_chunks was computed from the process_file return value (num_chunks), which could differ from the actual stored count if an upsert silently failed. It now uses the stored count returned by upsert_chunks. This change also fixes cli._index to skip progress yields when building the result.
Obsidian RAG — Manual Testing Guide
What it does: Indexes an Obsidian vault → LanceDB → semantic search via Ollama embeddings. Powers OpenClaw agent tools for natural-language queries over 677+ personal notes.
Stack: Python indexer (CLI) → LanceDB → TypeScript plugin (OpenClaw)
Prerequisites
| Component | Version | Verify |
|---|---|---|
| Python | ≥3.11 | python --version |
| Node.js | ≥18 | node --version |
| Ollama | running | curl http://localhost:11434/api/tags |
| Ollama model | mxbai-embed-large:335m | ollama list |
Install Ollama + model (if needed):
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Pull embedding model
ollama pull mxbai-embed-large:335m
Installation
1. Python CLI (indexer)
cd /Users/santhoshj/dev/obsidian-rag
# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .\.venv\Scripts\Activate.ps1 # Windows PowerShell
# .venv\Scripts\activate.bat # Windows CMD
# Install in editable mode
pip install -e python/
Verify:
obsidian-rag --help
# → obsidian-rag index | sync | reindex | status
2. TypeScript Plugin (for OpenClaw integration)
npm install
npm run build # → dist/index.js (131kb)
3. (Optional) Ollama running
ollama serve &
curl http://localhost:11434/api/tags
Configuration
Edit obsidian-rag/config.json at the project root:
{
"vault_path": "./KnowledgeVault/Default",
"embedding": {
"provider": "ollama",
"model": "mxbai-embed-large:335m",
"base_url": "http://localhost:11434",
"dimensions": 1024,
"batch_size": 64
},
"vector_store": {
"type": "lancedb",
"path": "./obsidian-rag/vectors.lance"
},
"indexing": {
"chunk_size": 500,
"chunk_overlap": 100,
"file_patterns": ["*.md"],
"deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git", ".logseq"],
"allow_dirs": []
},
"security": {
"require_confirmation_for": ["health", "financial_debt"],
"sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
"local_only": true
}
}
| Field | What it does |
|---|---|
| vault_path | Root of the Obsidian vault (relative or absolute path) |
| embedding.model | Ollama embedding model (here, mxbai-embed-large:335m) |
| vector_store.path | Where LanceDB data lives |
| deny_dirs | Directories that are always skipped |
| allow_dirs | If non-empty, only these directories are indexed |
Windows users: Use ".\\KnowledgeVault\\Default" or an absolute path like "C:\\Users\\you\\KnowledgeVault\\Default".
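The interaction between deny_dirs and allow_dirs can be sketched as below. This is an illustration under assumptions, not the indexer's actual code: the function name is_indexable is hypothetical, deny_dirs is assumed to match a directory name at any depth, and allow_dirs is assumed to match the top-level directory only.

```python
from pathlib import PurePosixPath


def is_indexable(rel_path: str, deny_dirs: list[str], allow_dirs: list[str]) -> bool:
    """Return True if a vault-relative path passes the deny/allow filters."""
    # Directory components only; the file name itself is not checked.
    parts = PurePosixPath(rel_path).parts[:-1]
    # Assumption: deny_dirs always wins and matches at any depth.
    if any(part in deny_dirs for part in parts):
        return False
    # An empty allow_dirs means "no restriction".
    if not allow_dirs:
        return True
    # Assumption: allow_dirs names top-level directories of the vault.
    return bool(parts) and parts[0] in allow_dirs
```

Under these assumptions, a note at the vault root is indexed only while allow_dirs is empty, since it belongs to no directory.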
CLI Commands
All commands run from the project root (/Users/santhoshj/dev/obsidian-rag).
obsidian-rag index — Full Index
First-time indexing. Scans all .md files → chunks → embeds → stores in LanceDB.
obsidian-rag index
Output:
{
"type": "complete",
"indexed_files": 627,
"total_chunks": 3764,
"duration_ms": 45230,
"errors": []
}
What happens:
- Walk vault (respects deny_dirs/allow_dirs)
- Parse markdown: frontmatter, headings, tags, dates
- Chunk: structured notes (journal) split by # heading; unstructured notes use a 500-token sliding window
- Embed: batches of 64 chunks → Ollama /api/embeddings
- Upsert: write to LanceDB
- Write obsidian-rag/sync-result.json atomically
Time: ~45s for 627 files on first run.
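The chunk and embed steps above can be sketched as follows. This is a minimal illustration, not the code in chunker.py or embedder.py: the names sliding_window and batched are hypothetical, and whitespace splitting stands in for a real tokenizer, which this guide does not specify.

```python
from typing import Iterator


def sliding_window(tokens: list[str], chunk_size: int = 500, overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # 400 with the defaults from config.json
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the note
    return chunks


def batched(items: list, batch_size: int = 64) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items (one embedding request each)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical note: whitespace splitting stands in for a real tokenizer.
tokens = ("word " * 1200).split()
chunks = sliding_window(tokens)
print(len(chunks), [len(c) for c in chunks])
print(sum(1 for _ in batched(list(range(3764)))))
```

With the defaults, a 1200-token note yields three windows (500, 500 and 400 tokens), and 3764 chunks at a batch size of 64 need 59 embedding requests.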
obsidian-rag sync — Incremental Sync
Only re-indexes files changed since last sync (by mtime).
obsidian-rag sync
Output:
{
"type": "complete",
"indexed_files": 3,
"total_chunks": 12,
"duration_ms": 1200,
"errors": []
}
Use when: You edited/added a few notes and want to update the index without a full rebuild.
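A rough sketch of mtime-based change detection, assuming the last sync time is available as a Unix timestamp. The function name changed_since is hypothetical, and the real sync command may track changes differently (for example, per-file records).

```python
from pathlib import Path


def changed_since(vault: Path, last_sync_epoch: float) -> list[Path]:
    """Return .md files under vault whose mtime is newer than last_sync_epoch (seconds)."""
    return sorted(
        p for p in vault.rglob("*.md")
        if p.stat().st_mtime > last_sync_epoch
    )
```

Only the files returned here would need re-chunking and re-embedding, which is why a sync after a few edits finishes in about a second instead of the ~45s of a full index.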
obsidian-rag reindex — Force Rebuild
Nukes the existing LanceDB table and rebuilds from scratch.
obsidian-rag reindex
Use when:
- LanceDB schema changed
- Chunking strategy changed
- Index corrupted
- First run after upgrading (to pick up FTS index)
obsidian-rag status — Index Health
obsidian-rag status
Output:
{
"total_docs": 627,
"total_chunks": 3764,
"last_sync": "2026-04-11T00:30:00Z"
}
Re-index after schema upgrade (important!)
If you pulled a new version that changed the FTS index setup, you must reindex:
obsidian-rag reindex
This drops and recreates the LanceDB table, rebuilding the FTS index on chunk_text.
Manual Testing Walkthrough
Step 1 — Verify prerequisites
# Ollama up?
curl http://localhost:11434/api/tags
# Python CLI working?
obsidian-rag --help
# Vault accessible?
ls ./KnowledgeVault/Default | head -5
Step 2 — Do a full index
obsidian-rag index
Expected: ~30-60s. JSON output with indexed_files and total_chunks.
Step 3 — Check status
obsidian-rag status
Step 4 — Test search via Python
The Python indexer doesn't have an interactive search CLI, but you can test via the LanceDB Python API directly:
python3 -c "
import sys
sys.path.insert(0, 'python')
from obsidian_rag.vector_store import get_db, search_chunks
from obsidian_rag.embedder import embed_texts
from obsidian_rag.config import load_config
config = load_config()
db = get_db(config)
table = db.open_table('obsidian_chunks')
# Embed a query
query_vec = embed_texts(['how was my mental health in 2024'], config)[0]
# Search
results = search_chunks(table, query_vec, limit=3)
for r in results:
print(f'[{r.score:.3f}] {r.source_file} | {r.section or \"(no section)\"}')
print(f' {r.chunk_text[:200]}...')
print()
"
Step 5 — Test TypeScript search (via Node)
node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';
const config = loadConfig();
const results = await searchVectorDb(config, 'how was my mental health in 2024', { max_results: 3 });
for (const r of results) {
console.log(\`[\${r.score}] \${r.source_file} | \${r.section || '(no section)'}\`);
console.log(\` \${r.chunk_text.slice(0, 180)}...\`);
console.log();
}
"
Step 6 — Test DEGRADED mode (Ollama down)
Stop Ollama, then run the same search:
# Stop Ollama
pkill -f ollama # macOS/Linux
# Now run search — should fall back to FTS
node --input-type=module -e "
...same as above...
"
Expected: results come back using BM25 full-text search instead of vector similarity. You'll see lower _score values (BM25 scores are smaller floats).
Step 7 — Test sync
# Edit a note
echo "# Test edit
This is a test note about Ollama being down." >> ./KnowledgeVault/Default/test-note.md
# Sync
obsidian-rag sync
# Check it was indexed
obsidian-rag status
Step 8 — Test indexer health check
# Stop Ollama
pkill -f ollama
# Check status — will report Ollama as down but still show index stats
obsidian-rag status
# Restart Ollama
ollama serve &
Directory Filtering
Test searching only within Journal:
node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';
const config = loadConfig();
const results = await searchVectorDb(config, 'my mood and feelings', {
max_results: 3,
directory_filter: ['Journal']
});
results.forEach(r => console.log(\`[\${r.score}] \${r.source_file}\`));
"
File Paths Reference
| File | Purpose |
|---|---|
| obsidian-rag/vectors.lance/ | LanceDB data directory |
| obsidian-rag/sync-result.json | Last sync timestamp + stats |
| python/obsidian_rag/ | Python package source |
| src/ | TypeScript plugin source |
| dist/index.js | Built plugin bundle |
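Since sync-result.json is written atomically, readers such as the status command never see a half-written file. One common way to get that guarantee is to write a temporary file and rename it over the target. The helper below is a sketch under that assumption; write_json_atomic is a hypothetical name, not the indexer's actual function.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers never observe a partially written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # replaces the target in a single step
    finally:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
```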
Troubleshooting
FileNotFoundError: config.json
The CLI could not find config.json. It looks in:
- ./obsidian-rag/config.json (relative to project root)
- ~/.obsidian-rag/config.json (home directory)
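The lookup can be pictured roughly as follows. This sketch assumes the project-root location is tried before the home directory, which is the order listed here but is not confirmed against config.py. The name find_config is hypothetical.

```python
from pathlib import Path


def find_config(project_root: Path, home: Path) -> Path:
    """Return the first existing config.json, or raise FileNotFoundError."""
    candidates = [
        project_root / "obsidian-rag" / "config.json",  # assumed to be tried first
        home / ".obsidian-rag" / "config.json",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"config.json not found; looked in: {searched}")
```

Calling find_config(Path.cwd(), Path.home()) from the project root mirrors the two locations above, and the error message lists exactly which paths were checked.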
# Verify config is found
python3 -c "
import sys; sys.path.insert(0,'python')
from obsidian_rag.config import load_config
c = load_config()
print('vault_path:', c.vault_path)
"
ERROR: Index not found. Run 'obsidian-rag index' first.
LanceDB table doesn't exist yet. Run obsidian-rag index.
Ollama connection refused
curl http://localhost:11434/api/tags
If this fails, Ollama isn't running:
ollama serve &
ollama pull mxbai-embed-large:335m
Vector search returns 0 results
- Check index exists: obsidian-rag status
- Rebuild index: obsidian-rag reindex
- Check Ollama is up and model is available: ollama list
FTS (DEGRADED mode) not working after upgrade
The FTS index on chunk_text was added in a recent change. Reindex to rebuild with FTS:
obsidian-rag reindex
Permission errors on Windows
Run terminal as Administrator, or install Python/Ollama to user-writable directories.
Very slow embedding
Reduce batch size in config.json:
"batch_size": 32
Project Structure
obsidian-rag/
├── obsidian-rag/
│ ├── config.json # Dev configuration
│ ├── vectors.lance/ # LanceDB data (created on first index)
│ └── sync-result.json # Last sync metadata
├── python/
│ ├── obsidian_rag/
│ │ ├── cli.py # obsidian-rag CLI entry point
│ │ ├── config.py # Config loader
│ │ ├── indexer.py # Full pipeline (scan → chunk → embed → store)
│ │ ├── chunker.py # Structured + sliding-window chunking
│ │ ├── embedder.py # Ollama /api/embeddings client
│ │ ├── vector_store.py # LanceDB CRUD
│ │ └── security.py # Path traversal, HTML strip, sensitive detection
│ └── tests/unit/ # 64 pytest tests
├── src/
│ ├── index.ts # OpenClaw plugin entry (definePluginEntry)
│ ├── tools/
│ │ ├── index.ts # 4× api.registerTool() calls
│ │ ├── index-tool.ts # obsidian_rag_index implementation
│ │ ├── search.ts # obsidian_rag_search implementation
│ │ ├── status.ts # obsidian_rag_status implementation
│ │ └── memory.ts # obsidian_rag_memory_store implementation
│ ├── services/
│ │ ├── health.ts # HEALTHY / DEGRADED / UNAVAILABLE state machine
│ │ ├── vault-watcher.ts # chokidar watcher + auto-sync
│ │ └── indexer-bridge.ts # Spawns Python CLI subprocess
│ └── utils/
│ ├── config.ts # TS config loader
│ ├── lancedb.ts # TS LanceDB query + FTS fallback
│ ├── types.ts # Shared types (SearchResult, ResponseEnvelope)
│ └── response.ts # makeEnvelope() factory
├── dist/index.js # Built plugin (do not edit)
├── openclaw.plugin.json # Plugin manifest
├── package.json
└── tsconfig.json
Health States
| State | Meaning | Search |
|---|---|---|
| HEALTHY | Ollama up + index exists | Vector similarity (semantic) |
| DEGRADED | Ollama down + index exists | FTS on chunk_text (BM25) |
| UNAVAILABLE | No index / corrupted | Error: run obsidian-rag index first |
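The table reduces to a small decision function. The real state machine lives in src/services/health.ts; the Python sketch below only mirrors the table, and the names Health, resolve_health and SEARCH_MODE are hypothetical.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    UNAVAILABLE = "UNAVAILABLE"


def resolve_health(ollama_up: bool, index_ok: bool) -> Health:
    """Map the two probes (Ollama reachable, index present and readable) to a state."""
    if not index_ok:
        # Without a usable index nothing can be searched, whatever Ollama's status.
        return Health.UNAVAILABLE
    return Health.HEALTHY if ollama_up else Health.DEGRADED


# Search strategy per state; None means the caller should report an error.
SEARCH_MODE: dict[Health, Optional[str]] = {
    Health.HEALTHY: "vector",
    Health.DEGRADED: "fts",
    Health.UNAVAILABLE: None,
}
```

The index check comes first because a missing or corrupted index makes search impossible even when Ollama is reachable.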