Santhosh Janardhanan 21b9704e21 fix(indexer): use upsert_chunks return value for chunk count

Previously total_chunks counted from process_file return (num_chunks)
which could differ from actual stored count if upsert silently failed.
Now using stored count returned by upsert_chunks.

Also fixes cli._index to skip progress yields when building result.

2026-04-12 02:16:19 -04:00

File	Last commit	Date
`.superpowers/brainstorm/27-1775849590`	all in	2026-04-10 19:00:38 -04:00
`docs`	docs: add troubleshooting guide for misleading openclaw.hooks error	2026-04-11 22:50:04 -04:00
`python`	fix(indexer): use upsert_chunks return value for chunk count	2026-04-12 02:16:19 -04:00
`src`	fix(config): use 'obsidian-rag' not '.obsidian-rag' for dev config path	2026-04-12 01:03:00 -04:00
`tests/unit`	Sprint 0-1: Python indexer, TS plugin scaffolding, and test suite	2026-04-10 22:56:50 -04:00
`.gitignore`	Initial commit: Obsidian RAG Plugin design spec and TDD	2026-04-10 16:40:46 -04:00
`AGENTS.md`	feat(indexer): hierarchical chunking for large sections	2026-04-11 23:58:05 -04:00
`INSTALL.md`	updates to install procedures	2026-04-11 16:58:46 -04:00
`openclaw.plugin.json`	feat(indexer): hierarchical chunking for large sections	2026-04-11 23:58:05 -04:00
`package-lock.json`	Sprint 0-2: TS plugin scaffolding, LanceDB utils, tooling updates	2026-04-11 13:24:26 -04:00
`package.json`	docs: add troubleshooting guide for misleading openclaw.hooks error	2026-04-11 22:50:04 -04:00
`README.md`	Sprint 0-2: TS plugin scaffolding, LanceDB utils, tooling updates	2026-04-11 13:24:26 -04:00
`tsconfig.json`	Sprint 0-1: Python indexer, TS plugin scaffolding, and test suite	2026-04-10 22:56:50 -04:00
`vitest.config.ts`	Sprint 0-1: Python indexer, TS plugin scaffolding, and test suite	2026-04-10 22:56:50 -04:00

README.md

Obsidian RAG — Manual Testing Guide

What it does: Indexes an Obsidian vault → LanceDB → semantic search via Ollama embeddings. Powers OpenClaw agent tools for natural-language queries over 677+ personal notes.

Stack: Python indexer (CLI) → LanceDB → TypeScript plugin (OpenClaw)

Prerequisites

Component	Version	Verify
Python	≥3.11	`python --version`
Node.js	≥18	`node --version`
Ollama	running	`curl http://localhost:11434/api/tags`
Ollama model	`mxbai-embed-large:335m`	`ollama list`

Install Ollama + model (if needed):

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model
ollama pull mxbai-embed-large:335m

Installation

1. Python CLI (indexer)

cd /Users/santhoshj/dev/obsidian-rag

# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# .\.venv\Scripts\Activate.ps1   # Windows PowerShell
# .venv\Scripts\activate.bat      # Windows CMD

# Install in editable mode
pip install -e python/

Verify:

obsidian-rag --help
# → obsidian-rag index | sync | reindex | status

2. TypeScript Plugin (for OpenClaw integration)

npm install
npm run build          # → dist/index.js (131kb)

3. (Optional) Ollama running

ollama serve &
curl http://localhost:11434/api/tags
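
If you would rather script this check than eyeball the `curl` output, the sketch below does the same GET against `/api/tags` from Python using only the standard library. The function name `ollama_is_up` and the timeout default are illustrative assumptions, not part of the `obsidian_rag` package.

```python
import json
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/api/tags answers with HTTP 200 and a JSON body.

    Hypothetical helper for manual testing; the real plugin may check health differently.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # Connection refused, timeouts, and HTTP errors are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```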

Configuration

Edit obsidian-rag/config.json at the project root:

{
  "vault_path": "./KnowledgeVault/Default",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large:335m",
    "base_url": "http://localhost:11434",
    "dimensions": 1024,
    "batch_size": 64
  },
  "vector_store": {
    "type": "lancedb",
    "path": "./obsidian-rag/vectors.lance"
  },
  "indexing": {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "file_patterns": ["*.md"],
    "deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git", ".logseq"],
    "allow_dirs": []
  },
  "security": {
    "require_confirmation_for": ["health", "financial_debt"],
    "sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
    "local_only": true
  }
}

Field	What it does
`vault_path`	Root of Obsidian vault (relative or absolute)
`embedding.model`	Ollama embedding model name (here `mxbai-embed-large:335m`)
`vector_store.path`	Where LanceDB data lives
`deny_dirs`	Always-skipped directories
`allow_dirs`	If non-empty, only these directories are indexed

Windows users: Use ".\\KnowledgeVault\\Default" or an absolute path like "C:\\Users\\you\\KnowledgeVault\\Default".
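
To make the `deny_dirs` / `allow_dirs` rows above concrete, here is a minimal sketch of how such a filter could behave. `is_indexable` is a hypothetical name, and matching against any directory component of the path (rather than only the top-level folder) is an assumption; check `indexer.py` for the actual rule.

```python
from pathlib import PurePosixPath


def is_indexable(rel_path: str, deny_dirs: list[str], allow_dirs: list[str]) -> bool:
    """Decide whether a vault-relative path should be indexed (illustrative only)."""
    dirs = PurePosixPath(rel_path).parts[:-1]  # directory components, file name dropped
    if any(d in deny_dirs for d in dirs):
        return False  # denied directories are always skipped
    if allow_dirs:
        # Non-empty allow list: the file must sit under one of the allowed directories.
        return any(d in allow_dirs for d in dirs)
    return True


deny = [".obsidian", ".trash", "zzz-Archive", ".git", ".logseq"]
print(is_indexable("Journal/2024/01-01.md", deny, []))        # allowed: no deny match
print(is_indexable(".obsidian/plugins/readme.md", deny, []))  # skipped: denied directory
print(is_indexable("Work/plan.md", deny, ["Journal"]))        # skipped: outside allow list
```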

CLI Commands

All commands run from the project root (/Users/santhoshj/dev/obsidian-rag).

`obsidian-rag index` — Full Index

First-time indexing. Scans all .md files → chunks → embeds → stores in LanceDB.

obsidian-rag index

Output:

{
  "type": "complete",
  "indexed_files": 627,
  "total_chunks": 3764,
  "duration_ms": 45230,
  "errors": []
}

What happens:

1. Walk the vault (respects `deny_dirs` / `allow_dirs`)
2. Parse markdown: frontmatter, headings, tags, dates
3. Chunk: structured notes (e.g. journal entries) are split by `#` heading; unstructured notes use a 500-token sliding window
4. Embed: batches of 64 chunks → Ollama `/api/embeddings`
5. Upsert: write to LanceDB
6. Write `obsidian-rag/sync-result.json` atomically

Time: ~45s for 627 files on first run.
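
The chunking and batching steps above can be sketched as follows. This is not the code in `chunker.py` or `embedder.py`: the function names are hypothetical, and plain list items stand in for whatever tokenizer the indexer really uses. The defaults mirror `chunk_size: 500`, `chunk_overlap: 100`, and `batch_size: 64` from the config.

```python
from collections.abc import Iterator, Sequence


def sliding_window(tokens: list[str], chunk_size: int = 500, overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the note
    return chunks


def batched(items: Sequence[str], batch_size: int = 64) -> Iterator[list[str]]:
    """Yield consecutive batches, e.g. to send 64 chunks per embedding request."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# A 1200-token note with the default settings gives windows starting at 0, 400, and 800.
windows = sliding_window(["tok"] * 1200)
print([len(w) for w in windows])
```

With a step of `chunk_size - overlap`, each window repeats the last 100 tokens of the previous one, which keeps sentences that straddle a boundary retrievable from either chunk.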

`obsidian-rag sync` — Incremental Sync

Only re-indexes files changed since last sync (by mtime).

obsidian-rag sync

Output:

{
  "type": "complete",
  "indexed_files": 3,
  "total_chunks": 12,
  "duration_ms": 1200,
  "errors": []
}

Use when: You edited/added a few notes and want to update the index without a full rebuild.
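
As a rough illustration of mtime-based change detection, the sketch below lists markdown files modified after a given timestamp. `changed_since` is a hypothetical helper, and treating the last sync time as a Unix epoch float is an assumption; the real CLI may store and compare it differently.

```python
import os
from pathlib import Path


def changed_since(vault: Path, last_sync_epoch: float) -> list[Path]:
    """Return .md files under `vault` whose modification time is newer than the last sync."""
    return sorted(
        p
        for p in vault.rglob("*.md")
        if p.is_file() and p.stat().st_mtime > last_sync_epoch
    )


if __name__ == "__main__":
    vault = Path("./KnowledgeVault/Default")
    if vault.is_dir():
        # Example cutoff of 0.0 lists every note; pass the real last-sync time instead.
        for path in changed_since(vault, last_sync_epoch=0.0):
            print(path)
```

Note that mtime comparison will miss a file whose content changed but whose timestamp was preserved (for example by some sync tools); `obsidian-rag reindex` covers that case.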

`obsidian-rag reindex` — Force Rebuild

Nukes the existing LanceDB table and rebuilds from scratch.

obsidian-rag reindex

Use when:

- LanceDB schema changed
- Chunking strategy changed
- Index corrupted
- First run after upgrading (to pick up FTS index)

`obsidian-rag status` — Index Health

obsidian-rag status

Output:

{
  "total_docs": 627,
  "total_chunks": 3764,
  "last_sync": "2026-04-11T00:30:00Z"
}

Re-index after schema upgrade (important!)

If you pulled a new version that changed the FTS index setup, you must reindex:

obsidian-rag reindex

This drops and recreates the LanceDB table, rebuilding the FTS index on chunk_text.

Manual Testing Walkthrough

Step 1 — Verify prerequisites

# Ollama up?
curl http://localhost:11434/api/tags

# Python CLI working?
obsidian-rag --help

# Vault accessible?
ls ./KnowledgeVault/Default | head -5

Step 2 — Do a full index

obsidian-rag index

Expected: ~30-60s. JSON output with indexed_files and total_chunks.

Step 3 — Check status

obsidian-rag status

Step 4 — Test search via Python

The Python indexer doesn't have an interactive search CLI, but you can test search from Python by calling the package's `vector_store` and `embedder` modules directly:

python3 -c "
import sys
sys.path.insert(0, 'python')
from obsidian_rag.vector_store import get_db, search_chunks
from obsidian_rag.embedder import embed_texts
from obsidian_rag.config import load_config

config = load_config()
db = get_db(config)
table = db.open_table('obsidian_chunks')

# Embed a query
query_vec = embed_texts(['how was my mental health in 2024'], config)[0]

# Search
results = search_chunks(table, query_vec, limit=3)
for r in results:
    print(f'[{r.score:.3f}] {r.source_file} | {r.section or \"(no section)\"}')
    print(f'  {r.chunk_text[:200]}...')
    print()
"

Step 5 — Test TypeScript search (via Node)

node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';

const config = loadConfig();
const results = await searchVectorDb(config, 'how was my mental health in 2024', { max_results: 3 });
for (const r of results) {
  console.log(\`[\${r.score}] \${r.source_file} | \${r.section || '(no section)'}\`);
  console.log(\`  \${r.chunk_text.slice(0, 180)}...\`);
  console.log();
}
"

Step 6 — Test DEGRADED mode (Ollama down)

Stop Ollama, then run the same search:

# Stop Ollama
pkill -f ollama   # macOS/Linux

# Now run search — should fall back to FTS
node --input-type=module -e "
...same as above...
"

Expected: results come back using BM25 full-text search instead of vector similarity. The `_score` values are BM25 relevance scores, which are on a different scale from vector-similarity scores, so compare rankings rather than raw numbers.
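
The fallback exercised in this step can be pictured with the sketch below. The real logic lives in the TypeScript plugin (`src/utils/lancedb.ts`); this Python version is only an illustration, and the three callables are hypothetical stand-ins for the embedding client, the vector query, and the FTS query.

```python
from typing import Callable, Sequence


def search_with_fallback(
    query: str,
    embed: Callable[[str], Sequence[float]],
    vector_search: Callable[[Sequence[float]], list[dict]],
    fts_search: Callable[[str], list[dict]],
) -> tuple[str, list[dict]]:
    """Try semantic search first; fall back to full-text search if embedding fails."""
    try:
        vector = embed(query)
    except OSError:
        # Connection refused / timeout talking to Ollama: degrade to BM25 over chunk_text.
        return "DEGRADED", fts_search(query)
    return "HEALTHY", vector_search(vector)


def _ollama_down(_query: str) -> Sequence[float]:
    raise ConnectionRefusedError("Ollama is not running")


mode, hits = search_with_fallback(
    "how was my mental health in 2024",
    embed=_ollama_down,
    vector_search=lambda vec: [{"source_file": "vector-hit.md"}],
    fts_search=lambda q: [{"source_file": "fts-hit.md"}],
)
print(mode, hits)
```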

Step 7 — Test sync

# Edit a note
echo "# Test edit
This is a test note about Ollama being down." >> ./KnowledgeVault/Default/test-note.md

# Sync
obsidian-rag sync

# Check it was indexed
obsidian-rag status

Step 8 — Test indexer health check

# Stop Ollama
pkill -f ollama

# Check status — will report Ollama as down but still show index stats
obsidian-rag status

# Restart Ollama
ollama serve

Directory Filtering

Test searching only within Journal:

node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';
const config = loadConfig();
const results = await searchVectorDb(config, 'my mood and feelings', {
  max_results: 3,
  directory_filter: ['Journal']
});
results.forEach(r => console.log(\`[\${r.score}] \${r.source_file}\`));
"

File Paths Reference

File	Purpose
`obsidian-rag/vectors.lance/`	LanceDB data directory
`obsidian-rag/sync-result.json`	Last sync timestamp + stats
`python/obsidian_rag/`	Python package source
`src/`	TypeScript plugin source
`dist/index.js`	Built plugin bundle
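
The indexing steps say `sync-result.json` is written atomically. One common way to do that in Python is to write a temporary file in the same directory and rename it over the target, as sketched below. `write_json_atomically` is a hypothetical name, and this is an assumption about the technique, not a copy of what `indexer.py` does.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write payload as JSON so readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic swap on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave stray temp files behind
        raise
```

A reader such as the TypeScript plugin then sees either the previous file or the complete new one, never a truncated JSON document.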

Troubleshooting

`FileNotFoundError: config.json`

The CLI could not find a config file. It looks in these locations:

./obsidian-rag/config.json (relative to project root)
~/.obsidian-rag/config.json (home directory)
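
For orientation, a lookup over the two locations listed above might look like the sketch below. `find_config` is a hypothetical function, and checking the project-root path before the home-directory path is an assumption about precedence; `config.py` is the authority.

```python
from pathlib import Path


def find_config(project_root: Path, home: Path) -> Path:
    """Return the first existing config.json among the documented locations."""
    candidates = [
        project_root / "obsidian-rag" / "config.json",  # dev config in the repo
        home / ".obsidian-rag" / "config.json",         # per-user config
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"config.json not found; looked in: {searched}")


if __name__ == "__main__":
    try:
        print(find_config(Path.cwd(), Path.home()))
    except FileNotFoundError as exc:
        print(exc)
```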

# Verify config is found
python3 -c "
import sys; sys.path.insert(0,'python')
from obsidian_rag.config import load_config
c = load_config()
print('vault_path:', c.vault_path)
"

`ERROR: Index not found. Run 'obsidian-rag index' first.`

LanceDB table doesn't exist yet. Run obsidian-rag index.

Ollama connection refused

curl http://localhost:11434/api/tags

If this fails, Ollama isn't running:

ollama serve &
ollama pull mxbai-embed-large:335m

Vector search returns 0 results

1. Check index exists: `obsidian-rag status`
2. Rebuild index: `obsidian-rag reindex`
3. Check Ollama is up and model is available: `ollama list`

FTS (DEGRADED mode) not working after upgrade

The FTS index on chunk_text was added in a recent change. Reindex to rebuild with FTS:

obsidian-rag reindex

Permission errors on Windows

Run terminal as Administrator, or install Python/Ollama to user-writable directories.

Very slow embedding

Reduce batch size in config.json:

"batch_size": 32

Project Structure

obsidian-rag/
├── obsidian-rag/
│   ├── config.json           # Dev configuration
│   ├── vectors.lance/        # LanceDB data (created on first index)
│   └── sync-result.json      # Last sync metadata
├── python/
│   ├── obsidian_rag/
│   │   ├── cli.py            # obsidian-rag CLI entry point
│   │   ├── config.py         # Config loader
│   │   ├── indexer.py        # Full pipeline (scan → chunk → embed → store)
│   │   ├── chunker.py        # Structured + sliding-window chunking
│   │   ├── embedder.py       # Ollama /api/embeddings client
│   │   ├── vector_store.py   # LanceDB CRUD
│   │   └── security.py       # Path traversal, HTML strip, sensitive detection
│   └── tests/unit/           # 64 pytest tests
├── src/
│   ├── index.ts              # OpenClaw plugin entry (definePluginEntry)
│   ├── tools/
│   │   ├── index.ts         # 4× api.registerTool() calls
│   │   ├── index-tool.ts     # obsidian_rag_index implementation
│   │   ├── search.ts        # obsidian_rag_search implementation
│   │   ├── status.ts        # obsidian_rag_status implementation
│   │   └── memory.ts        # obsidian_rag_memory_store implementation
│   ├── services/
│   │   ├── health.ts        # HEALTHY / DEGRADED / UNAVAILABLE state machine
│   │   ├── vault-watcher.ts  # chokidar watcher + auto-sync
│   │   └── indexer-bridge.ts # Spawns Python CLI subprocess
│   └── utils/
│       ├── config.ts         # TS config loader
│       ├── lancedb.ts        # TS LanceDB query + FTS fallback
│       ├── types.ts          # Shared types (SearchResult, ResponseEnvelope)
│       └── response.ts       # makeEnvelope() factory
├── dist/index.js             # Built plugin (do not edit)
├── openclaw.plugin.json      # Plugin manifest
├── package.json
└── tsconfig.json

Health States

State	Meaning	Search
`HEALTHY`	Ollama up + index exists	Vector similarity (semantic)
`DEGRADED`	Ollama down + index exists	FTS on `chunk_text` (BM25)
`UNAVAILABLE`	No index / corrupted	Error — run `obsidian-rag index` first
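
The table reduces to two inputs, whether Ollama responds and whether the index exists. The sketch below encodes it in Python for quick reference; the actual state machine is in `src/services/health.ts`, and the names `Health` and `health_state` here are illustrative.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"          # vector similarity search
    DEGRADED = "DEGRADED"        # FTS (BM25) on chunk_text
    UNAVAILABLE = "UNAVAILABLE"  # no usable index


def health_state(ollama_up: bool, index_exists: bool) -> Health:
    """Map the two checks from the table onto a health state."""
    if not index_exists:
        # Without an index nothing can be searched, whatever Ollama's status is.
        return Health.UNAVAILABLE
    return Health.HEALTHY if ollama_up else Health.DEGRADED


print(health_state(ollama_up=False, index_exists=True).value)
```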
