# Obsidian RAG — Manual Testing Guide

**What it does:** Indexes an Obsidian vault → LanceDB → semantic search via Ollama embeddings. Powers OpenClaw agent tools for natural-language queries over 677+ personal notes.

**Stack:** Python indexer (CLI) → LanceDB → TypeScript plugin (OpenClaw)

---

## Prerequisites

| Component | Version | Verify |
|---|---|---|
| Python | ≥3.11 | `python --version` |
| Node.js | ≥18 | `node --version` |
| Ollama | running | `curl http://localhost:11434/api/tags` |
| Ollama model | `mxbai-embed-large:335m` | `ollama list` |

**Install Ollama + model (if needed):**

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model
ollama pull mxbai-embed-large:335m
```

---

## Installation

### 1. Python CLI (indexer)

```bash
cd /Users/santhoshj/dev/obsidian-rag

# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate            # macOS/Linux
# .\.venv\Scripts\Activate.ps1       # Windows PowerShell
# .venv\Scripts\activate.bat         # Windows CMD

# Install in editable mode
pip install -e python/
```

**Verify:**

```bash
obsidian-rag --help
# → obsidian-rag index | sync | reindex | status
```

### 2. TypeScript Plugin (for OpenClaw integration)

```bash
npm install
npm run build
# → dist/index.js (131kb)
```

### 3. (Optional) Ensure Ollama is running

```bash
ollama serve &
curl http://localhost:11434/api/tags
```

---

## Configuration

Edit `obsidian-rag/config.json` at the project root:

```json
{
  "vault_path": "./KnowledgeVault/Default",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large:335m",
    "base_url": "http://localhost:11434",
    "dimensions": 1024,
    "batch_size": 64
  },
  "vector_store": {
    "type": "lancedb",
    "path": "./obsidian-rag/vectors.lance"
  },
  "indexing": {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "file_patterns": ["*.md"],
    "deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git", ".logseq"],
    "allow_dirs": []
  },
  "security": {
    "require_confirmation_for": ["health", "financial_debt"],
    "sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
    "local_only": true
  }
}
```

| Field | What it does |
|---|---|
| `vault_path` | Root of Obsidian vault (relative or absolute) |
| `embedding.model` | Ollama model used for embeddings (here `mxbai-embed-large:335m`) |
| `vector_store.path` | Where LanceDB data lives |
| `deny_dirs` | Always-skipped directories |
| `allow_dirs` | If non-empty, **only** these directories are indexed |

**Windows users:** Use `".\\KnowledgeVault\\Default"` or an absolute path like `"C:\\Users\\you\\KnowledgeVault\\Default"`.

---

## CLI Commands

All commands run from the project root (`/Users/santhoshj/dev/obsidian-rag`).

### `obsidian-rag index` — Full Index

First-time indexing. Scans all `.md` files → chunks → embeds → stores in LanceDB.

```bash
obsidian-rag index
```

**Output:**

```json
{
  "type": "complete",
  "indexed_files": 627,
  "total_chunks": 3764,
  "duration_ms": 45230,
  "errors": []
}
```

**What happens:**

1. Walk vault (respects `deny_dirs` / `allow_dirs`)
2. Parse markdown: frontmatter, headings, tags, dates
3. Chunk: structured notes (journal) are split by `# heading`; unstructured notes use a 500-token sliding window
4. Embed: batch of 64 chunks → Ollama `/api/embeddings`
5. Upsert: write to LanceDB
6. Write `obsidian-rag/sync-result.json` atomically

**Time:** ~45s for 627 files on first run.

### `obsidian-rag sync` — Incremental Sync

Only re-indexes files changed since the last sync (by `mtime`).

```bash
obsidian-rag sync
```

**Output:**

```json
{
  "type": "complete",
  "indexed_files": 3,
  "total_chunks": 12,
  "duration_ms": 1200,
  "errors": []
}
```

**Use when:** You edited or added a few notes and want to update the index without a full rebuild.

### `obsidian-rag reindex` — Force Rebuild

Drops the existing LanceDB table and rebuilds it from scratch.

```bash
obsidian-rag reindex
```

**Use when:**

- LanceDB schema changed
- Chunking strategy changed
- Index corrupted
- First run after upgrading (to pick up FTS index)

### `obsidian-rag status` — Index Health

```bash
obsidian-rag status
```

**Output:**

```json
{
  "total_docs": 627,
  "total_chunks": 3764,
  "last_sync": "2026-04-11T00:30:00Z"
}
```

### Re-index after schema upgrade (important!)

If you pulled a new version that changed the FTS index setup, you **must** reindex:

```bash
obsidian-rag reindex
```

This drops and recreates the LanceDB table, rebuilding the FTS index on `chunk_text`.

---

## Manual Testing Walkthrough

### Step 1 — Verify prerequisites

```bash
# Ollama up?
curl http://localhost:11434/api/tags

# Python CLI working?
obsidian-rag --help

# Vault accessible?
ls ./KnowledgeVault/Default | head -5
```

### Step 2 — Do a full index

```bash
obsidian-rag index
```

Expected: ~30-60s. JSON output with `indexed_files` and `total_chunks`.
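To check the Step 2 result without eyeballing it, the following is a minimal Python sketch that validates a full-index result object. It assumes the result is a single JSON object with the fields shown in the `obsidian-rag index` output above; `check_index_result` and `REQUIRED_FIELDS` are hypothetical names for illustration, not part of the `obsidian_rag` package. The "no files were indexed" check only makes sense for a full index, since an incremental sync with no changes may legitimately report zero files.

```python
import json

# Fields taken from the sample `obsidian-rag index` output in this guide.
REQUIRED_FIELDS = ("type", "indexed_files", "total_chunks", "duration_ms", "errors")


def check_index_result(raw: str) -> list[str]:
    """Return a list of problems in a full-index result; an empty list means it looks healthy."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(result, dict):
        return ["expected a JSON object"]

    # Stop early if the shape is wrong; later checks rely on these keys.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in result]
    if problems:
        return problems

    if result["type"] != "complete":
        problems.append(f"unexpected type: {result['type']!r}")
    if result["indexed_files"] <= 0:
        problems.append("no files were indexed")
    if result["errors"]:
        problems.append(f"{len(result['errors'])} error(s) reported")
    return problems


# Sample payload copied from the guide's example output.
sample = (
    '{"type": "complete", "indexed_files": 627, "total_chunks": 3764, '
    '"duration_ms": 45230, "errors": []}'
)
print(check_index_result(sample))  # → []
```

An empty list means the run completed, indexed at least one file, and reported no errors. Any other return value lists what to investigate before moving on to Step 3.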
### Step 3 — Check status

```bash
obsidian-rag status
```

### Step 4 — Test search via Python

The Python indexer doesn't have an interactive search CLI, but you can test via the LanceDB Python API directly:

```bash
python3 -c "
import sys
sys.path.insert(0, 'python')

from obsidian_rag.vector_store import get_db, search_chunks
from obsidian_rag.embedder import embed_texts
from obsidian_rag.config import load_config

config = load_config()
db = get_db(config)
table = db.open_table('obsidian_chunks')

# Embed a query
query_vec = embed_texts(['how was my mental health in 2024'], config)[0]

# Search
results = search_chunks(table, query_vec, limit=3)
for r in results:
    print(f'[{r.score:.3f}] {r.source_file} | {r.section or \"(no section)\"}')
    print(f'  {r.chunk_text[:200]}...')
    print()
"
```

### Step 5 — Test TypeScript search (via Node)

```bash
node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';

const config = loadConfig();
const results = await searchVectorDb(config, 'how was my mental health in 2024', { max_results: 3 });
for (const r of results) {
  console.log(\`[\${r.score}] \${r.source_file} | \${r.section || '(no section)'}\`);
  console.log(\`  \${r.chunk_text.slice(0, 180)}...\`);
  console.log();
}
"
```

### Step 6 — Test DEGRADED mode (Ollama down)

Stop Ollama, then run the same search:

```bash
# Stop Ollama
pkill -f ollama   # macOS/Linux

# Now run search — should fall back to FTS
node --input-type=module -e "
...same as above...
"
```

Expected: results come back using BM25 full-text search instead of vector similarity. You'll see lower `_score` values (BM25 scores are smaller floats).

### Step 7 — Test sync

```bash
# Edit a note
echo "# Test edit
This is a test note about Ollama being down." >> ./KnowledgeVault/Default/test-note.md

# Sync
obsidian-rag sync

# Check it was indexed
obsidian-rag status
```

### Step 8 — Test indexer health check

```bash
# Stop Ollama
pkill -f ollama

# Check status — will report Ollama as down but still show index stats
obsidian-rag status

# Restart Ollama
ollama serve
```

---

## Directory Filtering

Test searching only within `Journal`:

```bash
node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';

const config = loadConfig();
const results = await searchVectorDb(config, 'my mood and feelings', {
  max_results: 3,
  directory_filter: ['Journal']
});
results.forEach(r => console.log(\`[\${r.score}] \${r.source_file}\`));
"
```

---

## File Paths Reference

| File | Purpose |
|---|---|
| `obsidian-rag/vectors.lance/` | LanceDB data directory |
| `obsidian-rag/sync-result.json` | Last sync timestamp + stats |
| `python/obsidian_rag/` | Python package source |
| `src/` | TypeScript plugin source |
| `dist/index.js` | Built plugin bundle |

---

## Troubleshooting

### `FileNotFoundError: config.json`

The CLI could not find a config file. It looks in:

1. `./obsidian-rag/config.json` (relative to project root)
2. `~/.obsidian-rag/config.json` (home directory)

```bash
# Verify config is found
python3 -c "
import sys; sys.path.insert(0,'python')
from obsidian_rag.config import load_config
c = load_config()
print('vault_path:', c.vault_path)
"
```

### `ERROR: Index not found. Run 'obsidian-rag index' first.`

The LanceDB table doesn't exist yet. Run `obsidian-rag index`.

### Ollama connection refused

```bash
curl http://localhost:11434/api/tags
```

If this fails, Ollama isn't running:

```bash
ollama serve &
ollama pull mxbai-embed-large:335m
```

### Vector search returns 0 results

1. Check index exists: `obsidian-rag status`
2. Rebuild index: `obsidian-rag reindex`
3. Check Ollama is up and model is available: `ollama list`

### FTS (DEGRADED mode) not working after upgrade

The FTS index on `chunk_text` was added in a recent change. **Reindex to rebuild with FTS:**

```bash
obsidian-rag reindex
```

### Permission errors on Windows

Run the terminal as Administrator, or install Python/Ollama to user-writable directories.

### Very slow embedding

Reduce `embedding.batch_size` in `config.json`:

```json
"batch_size": 32
```

---

## Project Structure

```
obsidian-rag/
├── obsidian-rag/
│   ├── config.json            # Dev configuration
│   ├── vectors.lance/         # LanceDB data (created on first index)
│   └── sync-result.json       # Last sync metadata
├── python/
│   ├── obsidian_rag/
│   │   ├── cli.py             # obsidian-rag CLI entry point
│   │   ├── config.py          # Config loader
│   │   ├── indexer.py         # Full pipeline (scan → chunk → embed → store)
│   │   ├── chunker.py         # Structured + sliding-window chunking
│   │   ├── embedder.py        # Ollama /api/embeddings client
│   │   ├── vector_store.py    # LanceDB CRUD
│   │   └── security.py        # Path traversal, HTML strip, sensitive detection
│   └── tests/unit/            # 64 pytest tests
├── src/
│   ├── index.ts               # OpenClaw plugin entry (definePluginEntry)
│   ├── tools/
│   │   ├── index.ts           # 4× api.registerTool() calls
│   │   ├── index-tool.ts      # obsidian_rag_index implementation
│   │   ├── search.ts          # obsidian_rag_search implementation
│   │   ├── status.ts          # obsidian_rag_status implementation
│   │   └── memory.ts          # obsidian_rag_memory_store implementation
│   ├── services/
│   │   ├── health.ts          # HEALTHY / DEGRADED / UNAVAILABLE state machine
│   │   ├── vault-watcher.ts   # chokidar watcher + auto-sync
│   │   └── indexer-bridge.ts  # Spawns Python CLI subprocess
│   └── utils/
│       ├── config.ts          # TS config loader
│       ├── lancedb.ts         # TS LanceDB query + FTS fallback
│       ├── types.ts           # Shared types (SearchResult, ResponseEnvelope)
│       └── response.ts        # makeEnvelope() factory
├── dist/index.js              # Built plugin (do not edit)
├── openclaw.plugin.json       # Plugin manifest
├── package.json
└── tsconfig.json
```

---

## Health States

| State | Meaning | Search |
|---|---|---|
| `HEALTHY` | Ollama up + index exists | Vector similarity (semantic) |
| `DEGRADED` | Ollama down + index exists | FTS on `chunk_text` (BM25) |
| `UNAVAILABLE` | No index / corrupted | Error — run `obsidian-rag index` first |
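
The table above can be read as a small decision function. The sketch below restates it in Python purely for illustration; the plugin's actual state machine lives in `src/services/health.ts` and may differ. `HealthState`, `resolve_state`, `SEARCH_MODE`, and `ollama_is_up` are hypothetical names, and the probe assumes the same `/api/tags` endpoint used for verification elsewhere in this guide.

```python
import urllib.request
from enum import Enum


class HealthState(Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    UNAVAILABLE = "UNAVAILABLE"


# Search strategy per state, mirroring the "Search" column of the table.
SEARCH_MODE = {
    HealthState.HEALTHY: "vector",
    HealthState.DEGRADED: "fts",
    HealthState.UNAVAILABLE: None,  # no search possible; run `obsidian-rag index` first
}


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Probe Ollama's /api/tags endpoint; any network error counts as down."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, connection refused, and timeouts
        return False


def resolve_state(ollama_up: bool, index_exists: bool) -> HealthState:
    """Map the two inputs from the table to a health state."""
    if not index_exists:
        # Without an index nothing can be searched, regardless of Ollama.
        return HealthState.UNAVAILABLE
    return HealthState.HEALTHY if ollama_up else HealthState.DEGRADED


state = resolve_state(ollama_up=False, index_exists=True)
print(state.value, SEARCH_MODE[state])  # → DEGRADED fts
```

The index check comes first because a missing or corrupted index makes the Ollama status irrelevant: both rows with no index collapse into `UNAVAILABLE`.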