# Obsidian RAG — Manual Testing Guide

**What it does:** Indexes an Obsidian vault → LanceDB → semantic search via Ollama embeddings. Powers OpenClaw agent tools for natural-language queries over 677+ personal notes.

**Stack:** Python indexer (CLI) → LanceDB → TypeScript plugin (OpenClaw)

---

## Prerequisites

| Component | Version | Verify |
|---|---|---|
| Python | ≥3.11 | `python --version` |
| Node.js | ≥18 | `node --version` |
| Ollama | running | `curl http://localhost:11434/api/tags` |
| Ollama model | `mxbai-embed-large:335m` | `ollama list` |

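
The model check in the table can also be scripted against the JSON that `/api/tags` returns. The sketch below is illustrative and not part of the project: `model_available` is a hypothetical helper, and the payload shape (a top-level `models` list whose entries carry a `name`) is an assumption about the Ollama response, shown here with a trimmed sample instead of a live request.

```python
import json


def model_available(tags_response: str, model: str) -> bool:
    """Return True if `model` is listed in an Ollama /api/tags JSON payload."""
    payload = json.loads(tags_response)
    names = {entry.get("name") for entry in payload.get("models", [])}
    return model in names


# Trimmed, illustrative payload (assumed shape, not captured output).
sample = '{"models": [{"name": "mxbai-embed-large:335m"}, {"name": "llama3:8b"}]}'

has_embedder = model_available(sample, "mxbai-embed-large:335m")
has_other = model_available(sample, "nomic-embed-text:latest")
print(has_embedder, has_other)
```
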

**Install Ollama + model (if needed):**

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model
ollama pull mxbai-embed-large:335m
```

---

## Installation

### 1. Python CLI (indexer)

```bash
cd /Users/santhoshj/dev/obsidian-rag

# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate          # macOS/Linux
# .\.venv\Scripts\Activate.ps1     # Windows PowerShell
# .venv\Scripts\activate.bat       # Windows CMD

# Install in editable mode
pip install -e python/
```

**Verify:**

```bash
obsidian-rag --help
# → obsidian-rag index | sync | reindex | status
```

### 2. TypeScript Plugin (for OpenClaw integration)

```bash
npm install
npm run build   # → dist/index.js (131kb)
```

### 3. (Optional) Start Ollama

Skip this step if Ollama is already running.

```bash
ollama serve &
curl http://localhost:11434/api/tags
```

---

## Configuration

Edit `obsidian-rag/config.json` (path relative to the project root):

```json
{
  "vault_path": "./KnowledgeVault/Default",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large:335m",
    "base_url": "http://localhost:11434",
    "dimensions": 1024,
    "batch_size": 64
  },
  "vector_store": {
    "type": "lancedb",
    "path": "./obsidian-rag/vectors.lance"
  },
  "indexing": {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "file_patterns": ["*.md"],
    "deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git", ".logseq"],
    "allow_dirs": []
  },
  "security": {
    "require_confirmation_for": ["health", "financial_debt"],
    "sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
    "local_only": true
  }
}
```

| Field | What it does |
|---|---|
| `vault_path` | Root of Obsidian vault (relative or absolute) |
| `embedding.model` | Ollama embedding model (here `mxbai-embed-large:335m`) |
| `vector_store.path` | Where LanceDB data lives |
| `deny_dirs` | Always-skipped directories |
| `allow_dirs` | If non-empty, **only** these directories are indexed |

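
To make the interaction between `deny_dirs` and `allow_dirs` concrete, here is a minimal sketch of one way such a filter could work. It is not the indexer's actual code: `should_index` is a hypothetical helper, and two details are assumptions, namely that a deny entry matches a directory at any depth and that an allow entry matches the top-level directory only.

```python
from pathlib import PurePosixPath


def should_index(rel_path: str, deny_dirs: list[str], allow_dirs: list[str]) -> bool:
    """Decide whether a vault-relative path passes the directory filters."""
    # Directory components only; the file name itself is ignored.
    dirs = PurePosixPath(rel_path).parts[:-1]
    # Assumption: a denied directory anywhere in the path excludes the file.
    if any(part in deny_dirs for part in dirs):
        return False
    # Assumption: a non-empty allow list restricts indexing to those top-level dirs.
    if allow_dirs:
        return bool(dirs) and dirs[0] in allow_dirs
    return True


deny = [".obsidian", ".trash", "zzz-Archive", ".git", ".logseq"]
print(should_index("Journal/2024/01-05.md", deny, []))
print(should_index(".obsidian/plugins/readme.md", deny, []))
print(should_index("Work/plan.md", deny, ["Journal"]))
```

Under these assumptions the three calls evaluate to `True`, `False`, and `False` respectively; a note at the vault root is skipped whenever `allow_dirs` is non-empty.
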

**Windows users:** Use `".\\KnowledgeVault\\Default"` or an absolute path like `"C:\\Users\\you\\KnowledgeVault\\Default"`.

---

## CLI Commands

All commands run from the project root (`/Users/santhoshj/dev/obsidian-rag`).

### `obsidian-rag index` — Full Index

First-time indexing. Scans all `.md` files → chunks → embeds → stores in LanceDB.

```bash
obsidian-rag index
```

**Output:**

```json
{
  "type": "complete",
  "indexed_files": 627,
  "total_chunks": 3764,
  "duration_ms": 45230,
  "errors": []
}
```

**What happens:**

1. Walk vault (respects `deny_dirs` / `allow_dirs`)
2. Parse markdown: frontmatter, headings, tags, dates
3. Chunk: structured notes (journal) are split by `# heading`; unstructured notes use a 500-token sliding window
4. Embed: batches of 64 chunks → Ollama `/api/embeddings`
5. Upsert: write to LanceDB
6. Write `obsidian-rag/sync-result.json` atomically

**Time:** ~45s for 627 files on first run.

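
The sliding-window part of step 3 can be sketched as follows. This is an illustration under stated assumptions, not the code in `chunker.py`: `sliding_window` is a hypothetical name, tokens are plain list items (the real tokenizer is not described here), and the window and overlap are taken from `chunk_size` and `chunk_overlap` in the config.

```python
def sliding_window(tokens: list, chunk_size: int = 500, overlap: int = 100) -> list[list]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = list(range(1200))          # stand-in for 1200 tokens of note text
chunks = sliding_window(tokens)
print([len(c) for c in chunks])     # [500, 500, 400]
```

Each window starts 400 tokens after the previous one, so the last 100 tokens of a chunk reappear at the start of the next.
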
### `obsidian-rag sync` — Incremental Sync

Re-indexes only the files changed since the last sync (detected by `mtime`).

```bash
obsidian-rag sync
```

**Output:**

```json
{
  "type": "complete",
  "indexed_files": 3,
  "total_chunks": 12,
  "duration_ms": 1200,
  "errors": []
}
```

**Use when:** You edited/added a few notes and want to update the index without a full rebuild.

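
As a rough picture of `mtime`-based change detection, the sketch below compares each note's modification time with a stored timestamp. It is a simplified assumption of how such a check might look, not the indexer's implementation; `changed_since` is a hypothetical helper and the demo vault is a temporary directory with hand-set timestamps.

```python
import os
import tempfile
from pathlib import Path


def changed_since(vault: Path, last_sync: float) -> list[Path]:
    """Return markdown files whose mtime is newer than last_sync (epoch seconds)."""
    return sorted(p for p in vault.rglob("*.md") if p.stat().st_mtime > last_sync)


# Demo on a throwaway vault with explicit mtimes.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    old_note = vault / "old.md"
    new_note = vault / "Journal" / "new.md"
    new_note.parent.mkdir()
    old_note.write_text("unchanged")
    new_note.write_text("edited")
    os.utime(old_note, (1_000, 1_000))   # modified before the last sync
    os.utime(new_note, (2_000, 2_000))   # modified after the last sync
    changed = [p.name for p in changed_since(vault, last_sync=1_500)]

print(changed)  # ['new.md']
```
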
### `obsidian-rag reindex` — Force Rebuild

Drops the existing LanceDB table and rebuilds it from scratch.

```bash
obsidian-rag reindex
```

**Use when:**

- LanceDB schema changed
- Chunking strategy changed
- Index corrupted
- First run after upgrading (to pick up FTS index)

### `obsidian-rag status` — Index Health

```bash
obsidian-rag status
```

**Output:**

```json
{
  "total_docs": 627,
  "total_chunks": 3764,
  "last_sync": "2026-04-11T00:30:00Z"
}
```

### Re-index after schema upgrade (important!)

If you pulled a new version that changed the FTS index setup, you **must** reindex:

```bash
obsidian-rag reindex
```

This drops and recreates the LanceDB table, rebuilding the FTS index on `chunk_text`.

---

## Manual Testing Walkthrough

### Step 1 — Verify prerequisites

```bash
# Ollama up?
curl http://localhost:11434/api/tags

# Python CLI working?
obsidian-rag --help

# Vault accessible?
ls ./KnowledgeVault/Default | head -5
```

### Step 2 — Do a full index

```bash
obsidian-rag index
```

Expected: ~30-60s. JSON output with `indexed_files` and `total_chunks`.

### Step 3 — Check status

```bash
obsidian-rag status
```

### Step 4 — Test search via Python

The Python indexer doesn't have an interactive search CLI, but you can exercise search through the package's own LanceDB helpers:

```bash
python3 -c "
import sys
sys.path.insert(0, 'python')
from obsidian_rag.vector_store import get_db, search_chunks
from obsidian_rag.embedder import embed_texts
from obsidian_rag.config import load_config

config = load_config()
db = get_db(config)
table = db.open_table('obsidian_chunks')

# Embed a query
query_vec = embed_texts(['how was my mental health in 2024'], config)[0]

# Search
results = search_chunks(table, query_vec, limit=3)
for r in results:
    print(f'[{r.score:.3f}] {r.source_file} | {r.section or \"(no section)\"}')
    print(f' {r.chunk_text[:200]}...')
    print()
"
```

### Step 5 — Test TypeScript search (via Node)

```bash
node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';

const config = loadConfig();
const results = await searchVectorDb(config, 'how was my mental health in 2024', { max_results: 3 });
for (const r of results) {
  console.log(\`[\${r.score}] \${r.source_file} | \${r.section || '(no section)'}\`);
  console.log(\` \${r.chunk_text.slice(0, 180)}...\`);
  console.log();
}
"
```

### Step 6 — Test DEGRADED mode (Ollama down)

Stop Ollama, then run the same search:

```bash
# Stop Ollama
pkill -f ollama   # macOS/Linux

# Now run search — should fall back to FTS
node --input-type=module -e "
...same as above...
"
```

Expected: results come back using BM25 full-text search instead of vector similarity. You'll see lower `_score` values (BM25 scores are smaller floats).

### Step 7 — Test sync

```bash
# Edit a note
echo "# Test edit
This is a test note about Ollama being down." >> ./KnowledgeVault/Default/test-note.md

# Sync
obsidian-rag sync

# Check it was indexed
obsidian-rag status
```

### Step 8 — Test indexer health check

```bash
# Stop Ollama
pkill -f ollama

# Check status — will report Ollama as down but still show index stats
obsidian-rag status

# Restart Ollama
ollama serve
```

---

## Directory Filtering

Test searching only within `Journal`:

```bash
node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';
const config = loadConfig();
const results = await searchVectorDb(config, 'my mood and feelings', {
  max_results: 3,
  directory_filter: ['Journal']
});
results.forEach(r => console.log(\`[\${r.score}] \${r.source_file}\`));
"
```

---

## File Paths Reference

| File | Purpose |
|---|---|
| `obsidian-rag/vectors.lance/` | LanceDB data directory |
| `obsidian-rag/sync-result.json` | Last sync timestamp + stats |
| `python/obsidian_rag/` | Python package source |
| `src/` | TypeScript plugin source |
| `dist/index.js` | Built plugin bundle |

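
The guide notes that `sync-result.json` is written atomically. A common way to get that property is to write to a temporary file in the same directory and then rename it over the target; the sketch below shows that pattern. It is an assumption about the technique, not a copy of the indexer's code, and `write_json_atomic` is a hypothetical name.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file lives in the target directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)     # clean up the partial temp file
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "obsidian-rag" / "sync-result.json"
    write_json_atomic(target, {"type": "complete", "indexed_files": 3})
    loaded = json.loads(target.read_text(encoding="utf-8"))
    leftovers = [p.name for p in target.parent.iterdir() if p.name != target.name]

print(loaded, leftovers)
```
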

---

## Troubleshooting

### `FileNotFoundError: config.json`

The CLI could not find its config file. It looks in:

1. `./obsidian-rag/config.json` (relative to project root)
2. `~/.obsidian-rag/config.json` (home directory)

```bash
# Verify config is found
python3 -c "
import sys; sys.path.insert(0,'python')
from obsidian_rag.config import load_config
c = load_config()
print('vault_path:', c.vault_path)
"
```

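
The lookup can be pictured as a first-match search over the two locations listed above. The sketch below is a hedged illustration only: `find_config` is a hypothetical helper, and treating the list order as priority order (project config wins over the home-directory one) is an assumption.

```python
import tempfile
from pathlib import Path


def find_config(project_root: Path, home: Path) -> Path:
    """Return the first existing config.json among the documented locations."""
    candidates = [
        project_root / "obsidian-rag" / "config.json",
        home / ".obsidian-rag" / "config.json",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    looked = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"config.json not found; looked in: {looked}")


# Demo with temporary stand-ins for the project root and home directory.
with tempfile.TemporaryDirectory() as tmp:
    root, home = Path(tmp) / "project", Path(tmp) / "home"
    root.mkdir()
    (home / ".obsidian-rag").mkdir(parents=True)
    (home / ".obsidian-rag" / "config.json").write_text("{}")
    found_home = find_config(root, home).relative_to(tmp).as_posix()

    (root / "obsidian-rag").mkdir()
    (root / "obsidian-rag" / "config.json").write_text("{}")
    found_project = find_config(root, home).relative_to(tmp).as_posix()

print(found_home)     # home/.obsidian-rag/config.json
print(found_project)  # project/obsidian-rag/config.json
```
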
### `ERROR: Index not found. Run 'obsidian-rag index' first.`

The LanceDB table doesn't exist yet. Run `obsidian-rag index`.

### Ollama connection refused

```bash
curl http://localhost:11434/api/tags
```

If this fails, Ollama isn't running. Start it and make sure the model is present:

```bash
ollama serve &
ollama pull mxbai-embed-large:335m
```

### Vector search returns 0 results

1. Check index exists: `obsidian-rag status`
2. Rebuild index: `obsidian-rag reindex`
3. Check Ollama is up and model is available: `ollama list`

### FTS (DEGRADED mode) not working after upgrade

The FTS index on `chunk_text` was added in a recent change. **Reindex to rebuild with FTS:**

```bash
obsidian-rag reindex
```

### Permission errors on Windows

Run terminal as Administrator, or install Python/Ollama to user-writable directories.

### Very slow embedding

Reduce `embedding.batch_size` in `config.json`:

```json
"batch_size": 32
```

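
Halving the batch size roughly doubles the number of embedding requests while making each one lighter. The arithmetic below uses the chunk count from the sample `index` output; `batched` and `request_count` are hypothetical helpers written for this illustration, not functions from the project.

```python
import math
from typing import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def request_count(total_chunks: int, batch_size: int) -> int:
    """Number of embedding requests needed for total_chunks."""
    return math.ceil(total_chunks / batch_size)


# 3764 chunks is the figure from the sample index output above.
counts = {size: request_count(3764, size) for size in (64, 32)}
print(counts)  # {64: 59, 32: 118}
```
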

---

## Project Structure

```
obsidian-rag/
├── obsidian-rag/
│   ├── config.json            # Dev configuration
│   ├── vectors.lance/         # LanceDB data (created on first index)
│   └── sync-result.json       # Last sync metadata
├── python/
│   ├── obsidian_rag/
│   │   ├── cli.py             # obsidian-rag CLI entry point
│   │   ├── config.py          # Config loader
│   │   ├── indexer.py         # Full pipeline (scan → chunk → embed → store)
│   │   ├── chunker.py         # Structured + sliding-window chunking
│   │   ├── embedder.py        # Ollama /api/embeddings client
│   │   ├── vector_store.py    # LanceDB CRUD
│   │   └── security.py        # Path traversal, HTML strip, sensitive detection
│   └── tests/unit/            # 64 pytest tests
├── src/
│   ├── index.ts               # OpenClaw plugin entry (definePluginEntry)
│   ├── tools/
│   │   ├── index.ts           # 4× api.registerTool() calls
│   │   ├── index-tool.ts      # obsidian_rag_index implementation
│   │   ├── search.ts          # obsidian_rag_search implementation
│   │   ├── status.ts          # obsidian_rag_status implementation
│   │   └── memory.ts          # obsidian_rag_memory_store implementation
│   ├── services/
│   │   ├── health.ts          # HEALTHY / DEGRADED / UNAVAILABLE state machine
│   │   ├── vault-watcher.ts   # chokidar watcher + auto-sync
│   │   └── indexer-bridge.ts  # Spawns Python CLI subprocess
│   └── utils/
│       ├── config.ts          # TS config loader
│       ├── lancedb.ts         # TS LanceDB query + FTS fallback
│       ├── types.ts           # Shared types (SearchResult, ResponseEnvelope)
│       └── response.ts        # makeEnvelope() factory
├── dist/index.js              # Built plugin (do not edit)
├── openclaw.plugin.json       # Plugin manifest
├── package.json
└── tsconfig.json
```

---

## Health States

| State | Meaning | Search |
|---|---|---|
| `HEALTHY` | Ollama up + index exists | Vector similarity (semantic) |
| `DEGRADED` | Ollama down + index exists | FTS on `chunk_text` (BM25) |
| `UNAVAILABLE` | No index / corrupted | Error — run `obsidian-rag index` first |

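
The table reduces to a small decision function over two facts: whether Ollama responds and whether a usable index exists. The sketch below mirrors that logic in Python for illustration only; the plugin's real state machine lives in `src/services/health.ts`, and the names `Health` and `health_state` are hypothetical.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"          # vector similarity search
    DEGRADED = "DEGRADED"        # FTS (BM25) fallback on chunk_text
    UNAVAILABLE = "UNAVAILABLE"  # no usable index


def health_state(ollama_up: bool, index_ok: bool) -> Health:
    """Map the two probes onto the states in the table above."""
    if not index_ok:
        # Without an index nothing can be searched, whether or not Ollama is up.
        return Health.UNAVAILABLE
    return Health.HEALTHY if ollama_up else Health.DEGRADED


print(health_state(ollama_up=True, index_ok=True).value)    # HEALTHY
print(health_state(ollama_up=False, index_ok=True).value)   # DEGRADED
print(health_state(ollama_up=True, index_ok=False).value)   # UNAVAILABLE
```
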