Santhosh Janardhanan de3b9c1c12 updates to install procedures

2026-04-11 16:58:46 -04:00

Obsidian-RAG — Installation Guide for OpenClaw

What this plugin does: Indexes an Obsidian vault into LanceDB using Ollama embeddings, then powers four OpenClaw tools — obsidian_rag_search, obsidian_rag_index, obsidian_rag_status, and obsidian_rag_memory_store — so OpenClaw can answer natural-language questions over your personal notes (journal, finance, health, relationships, etc.).

Stack:

Python 3.11+ CLI → LanceDB vector store + Ollama embeddings
TypeScript/OpenClaw plugin → OpenClaw agent tools
Ollama (local) → embedding inference

1. Prerequisites
2. Clone the Repository
3. Install Ollama + Embedding Model
4. Install Python CLI (Indexer)
5. Install Node.js / TypeScript Plugin
6. Configure the Plugin
7. Run the Initial Index
8. Register the Plugin with OpenClaw
9. Verify Everything Works
10. Keeping the Index Fresh
11. Troubleshooting

1. Prerequisites

| Component | Required Version | Why |
| --- | --- | --- |
| Python | ≥ 3.11 | Async I/O, modern type hints |
| Node.js | ≥ 18 | ESM modules, `node:` imports |
| npm | any recent | installs TypeScript deps |
| Ollama | running on `localhost:11434` | local embedding inference |
| Disk space | ~500 MB free | LanceDB store grows with vault |

Verify your environment:

python --version    # → Python 3.11.x or higher
node --version      # → v18.x.x or higher
npm --version       # → 9.x.x or higher
curl http://localhost:11434/api/tags  # → {"models": [...]} if Ollama is running

If Ollama is not installed or running yet, complete §3 first, then return here.
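
The version checks above can also be scripted. The following is a minimal sketch and not part of the repository: the helper names `parse_version` and `meets_minimum` are hypothetical, and the minimums are simply the ones listed in the prerequisites table.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from a tool's --version output."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_output: str, minimum: tuple[int, ...]) -> bool:
    """Return True if the reported version is at least `minimum`."""
    return parse_version(version_output) >= minimum


# Minimums taken from the prerequisites table.
MINIMUMS = {"python": (3, 11), "node": (18,)}
```

For example, `meets_minimum("v18.19.0", MINIMUMS["node"])` is true, while `meets_minimum("Python 3.10.12", MINIMUMS["python"])` is false.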

2. Clone the Repository

# Replace DESTINATION with where you want the project to live.
# The project root must be writable (not inside /System or a read-only mount).
DESTINATION="$HOME/dev/obsidian-rag"
mkdir -p "$HOME/dev"
git clone https://git.phostrich.com/santhoshj/obsidian-rag.git "$DESTINATION"
cd "$DESTINATION"

Important: The files obsidian-rag/config.json and obsidian-rag/sync-result.json and the directory obsidian-rag/vectors.lance/ are created at runtime below the project root. Choose a destination where you have write permission.

Note for existing clones: If you are re-running this guide on an already-cloned copy, pull the latest changes first:
git pull origin model/minimax

3. Install Ollama + Embedding Model

The plugin requires Ollama running locally with the mxbai-embed-large:335m embedding model.

3.1 Install Ollama

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download the installer from https://ollama.com/download

Verify:

ollama --version

3.2 Start Ollama

ollama serve &
# Give it 2 seconds to bind to port 11434
sleep 2
curl http://localhost:11434/api/tags
# → {"models": []}

Auto-start tip: On macOS, consider installing Ollama as a LaunchAgent so it starts again after a reboot. On Linux with systemd, run: sudo systemctl enable ollama

3.3 Pull the Embedding Model

ollama pull mxbai-embed-large:335m

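The 335m in the tag is the model's parameter count (about 335 million), not its size on disk, so the download may be larger than 335 MB. Expected output: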

pulling manifest
pulling 4a5b...  100%
verifying sha256 digest
writing manifest
success

Verify the model is available:

ollama list
# → NAME                      ID           SIZE      MODIFIED
# → mxbai-embed-large:335m    7c6d...      335 MB    2026-04-...

Model note: The config (obsidian-rag/config.json) defaults to mxbai-embed-large:335m. If you use a different model, update embedding.model and embedding.dimensions in the config file (see §6).
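
If you switch models, you need the new model's vector length for embedding.dimensions. Below is a minimal sketch of how one might measure it, assuming Ollama's /api/embeddings endpoint (the one named in §7) returns a JSON object with an "embedding" array. The function names are hypothetical and not part of this project.

```python
import json
import urllib.request


def embedding_dimension(response: dict) -> int:
    """Return the vector length from an /api/embeddings response body."""
    return len(response["embedding"])


def fetch_embedding(model: str, base_url: str = "http://localhost:11434") -> dict:
    """POST a short prompt to Ollama and return the decoded JSON response."""
    payload = json.dumps({"model": model, "prompt": "dimension probe"}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.load(reply)
```

With Ollama running, `embedding_dimension(fetch_embedding("mxbai-embed-large:335m"))` gives the value to put in embedding.dimensions.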

4. Install Python CLI (Indexer)

The Python CLI (obsidian-rag) handles all vault scanning, chunking, embedding, and LanceDB storage.

4.1 Create a Virtual Environment

Using a virtual environment isolates this project's dependencies from your system Python.

macOS / Linux:

cd "$DESTINATION"
python -m venv .venv
source .venv/bin/activate

Windows (PowerShell):

cd "$DESTINATION"
python -m venv .venv
.venv\Scripts\Activate.ps1

Windows (CMD):

cd %DESTINATION%
python -m venv .venv
.venv\Scripts\activate.bat

You should now see (.venv) prepended to your shell prompt.

4.2 Install the Package in Editable Mode

pip install -e python/

This installs all runtime dependencies:

lancedb — vector database
httpx — HTTP client for Ollama
pyyaml — config file parsing
python-frontmatter — YAML frontmatter extraction

Verify the CLI is accessible:

obsidian-rag --help

Expected output:

usage: obsidian-rag [-h] {index,sync,reindex,status}

positional arguments:
  {index,sync,reindex,status}
    index       Full vault index (scan → chunk → embed → store)
    sync        Incremental sync (only changed files)
    reindex     Force clean rebuild (deletes existing index)
    status      Show index health and statistics

Python path tip: The CLI entry point (obsidian-rag) is installed into .venv/bin/ (.venv\Scripts\ on Windows). Always activate the venv before running CLI commands:
source .venv/bin/activate   # macOS/Linux
.venv\Scripts\activate       # Windows PowerShell

Without venv: If you prefer a system-wide install instead of a venv, skip step 4.1 and run pip install -e python/ directly. Not recommended if you have other Python projects with conflicting dependencies.

5. Install Node.js / TypeScript Plugin

The TypeScript plugin registers the OpenClaw tools (obsidian_rag_search, obsidian_rag_index, obsidian_rag_status, obsidian_rag_memory_store).

5.1 Install npm Dependencies

cd "$DESTINATION"
npm install

This installs into node_modules/ and writes package-lock.json. Packages include:

openclaw — plugin framework
@lancedb/lancedb — vector DB client (Node.js bindings)
chokidar — file system watcher for auto-sync
yaml — config file parsing

5.2 Build the Plugin

npm run build

This compiles src/index.ts → dist/index.js (a single ESM bundle, ~131 KB).

Expected output:

dist/index.js  131.2kb

Done in ~1s

Watch mode (development): Run npm run dev to rebuild automatically on file changes.

Type checking (optional but recommended):
npm run typecheck
Should produce no errors.

6. Configure the Plugin

All configuration lives in obsidian-rag/config.json relative to the project root.

6.1 Inspect the Default Config

cat "$DESTINATION/obsidian-rag/config.json"

6.2 Key Fields to Customize

| Field | Default | Change if… |
| --- | --- | --- |
| `vault_path` | `"./KnowledgeVault/Default"` | Your vault is in a different location |
| `embedding.model` | `"mxbai-embed-large:335m"` | You pulled a different Ollama model |
| `embedding.base_url` | `"http://localhost:11434"` | Ollama runs on a different host/port |
| `vector_store.path` | `"./obsidian-rag/vectors.lance"` | You want data in a different directory |
| `deny_dirs` | `[".obsidian", ".trash", ...]` | You want to skip or allow additional directories |
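
The default vault_path is a relative path. The sketch below shows how such a value could be resolved, assuming relative paths are interpreted against the project root. `resolve_vault_path` is a hypothetical helper for illustration, not the CLI's actual code, and it handles POSIX-style paths only.

```python
import posixpath


def resolve_vault_path(project_root: str, vault_path: str) -> str:
    """Resolve vault_path against project_root unless it is already absolute."""
    if not posixpath.isabs(vault_path):
        vault_path = posixpath.join(project_root, vault_path)
    # Collapse "." and ".." segments without touching the filesystem.
    return posixpath.normpath(vault_path)


default_vault = resolve_vault_path("/home/me/dev/obsidian-rag", "./KnowledgeVault/Default")
sibling_vault = resolve_vault_path("/home/me/dev/obsidian-rag", "../obsidian-vault")
```

Here `default_vault` is `/home/me/dev/obsidian-rag/KnowledgeVault/Default` and `sibling_vault` is `/home/me/dev/obsidian-vault`; an absolute path is returned unchanged.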

6.3 Set Your Vault Path

Option A — Relative to the project root (recommended): Symlink or place your vault relative to the project:

# Example: the project is at ~/dev/obsidian-rag and your vault is at ~/dev/obsidian-vault
# In config.json:
"vault_path": "../obsidian-vault"

Option B — Absolute path:

"vault_path": "/Users/yourusername/obsidian-vault"

Option C — Windows absolute path:

"vault_path": "C:\\Users\\YourUsername\\obsidian-vault"

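Path validation: The CLI validates that vault_path exists on the filesystem before indexing. You can verify this manually. Run the check from the project root so that a relative vault_path is tested against the same directory:

cd "$DESTINATION"
ls obsidian-rag/config.json
python3 - <<'EOF'
import json
import os

# Load the plugin config from the project root.
with open("obsidian-rag/config.json") as f:
    cfg = json.load(f)

# Fail loudly if the configured vault directory is missing.
assert os.path.isdir(cfg["vault_path"]), "vault_path does not exist"
print("Vault path OK:", cfg["vault_path"])
EOF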

7. Run the Initial Index

This is a one-time step that scans every .md file in your vault, chunks them, embeds them via Ollama, and stores them in LanceDB.

# Make sure the venv is active
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

obsidian-rag index

Expected output (truncated):

{
  "type": "complete",
  "indexed_files": 627,
  "total_chunks": 3764,
  "duration_ms": 45230,
  "errors": []
}

What happens during `index`:

Vault walk — traverses all subdirectories, skipping deny_dirs (.obsidian, .trash, zzz-Archive, etc.)
Frontmatter parse — extracts YAML frontmatter, headings, tags, and dates from each .md file
Chunking — structured notes (journal entries) split by # heading; unstructured notes use a 500-token sliding window with 100-token overlap
Embedding — batches of 64 chunks sent to Ollama /api/embeddings endpoint
Storage — vectors upserted into LanceDB at obsidian-rag/vectors.lance/
Sync record — writes obsidian-rag/sync-result.json with timestamp and stats
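
The chunking step for unstructured notes can be sketched as follows. This illustrates a 500-token sliding window with 100-token overlap and is not the project's chunker.py. Whitespace-separated words stand in for tokens, which is an assumption, since the tokenizer is not specified here.

```python
def sliding_window(tokens: list[str], size: int = 500, overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the note
    return chunks


# Stand-in tokenization: split on whitespace.
words = ("lorem " * 1200).split()
chunks = sliding_window(words)
```

A 1,200-token note yields three chunks of 500, 500, and 400 tokens, starting at offsets 0, 400, and 800.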

Time estimate: ~30–60 seconds for 500–700 files on a modern machine. The embedding step is the bottleneck; Ollama must process each batch sequentially.

Batch size tuning: If embedding is slow, reduce embedding.batch_size in config.json (e.g., "batch_size": 32).
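
To see what batch_size changes, here is a sketch of the batching arithmetic. `batches` is a hypothetical helper, and the chunk count is the one from the sample output above.

```python
from typing import Iterator, Sequence


def batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


all_chunks = list(range(3764))  # chunk count from the sample output
batches_at_64 = sum(1 for _ in batches(all_chunks, 64))
batches_at_32 = sum(1 for _ in batches(all_chunks, 32))
```

With 3,764 chunks, a batch size of 64 gives 59 batches and a batch size of 32 gives 118. Smaller batches make each one lighter but increase how many are sent.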

8. Register the Plugin with OpenClaw

OpenClaw discovers plugins from these locations:

~/.openclaw/extensions/ (global, recommended for most users)
<workspace>/.openclaw/extensions/ (workspace-specific)
Bundled plugins in OpenClaw's install directory

8.1 Link Plugin to Global Extensions (Recommended)

mkdir -p ~/.openclaw/extensions
ln -s "$DESTINATION" ~/.openclaw/extensions/obsidian-rag

8.2 Link Plugin to Workspace Extensions (Alternative)

# From your OpenClaw workspace root
mkdir -p ./.openclaw/extensions
ln -s "$DESTINATION" ./.openclaw/extensions/obsidian-rag

8.3 Using openclaw plugins install --link

openclaw plugins install --link "$DESTINATION"

8.4 Confirm the Plugin Loaded

openclaw plugins list | grep obsidian-rag
# or
openclaw plugins list --verbose | grep obsidian-rag

9. Verify Everything Works

9.1 Check Index Health

source .venv/bin/activate   # macOS/Linux
obsidian-rag status

Expected:

{
  "total_docs": 627,
  "total_chunks": 3764,
  "last_sync": "2026-04-11T00:30:00Z"
}
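
The status output is plain JSON, so it is easy to check from a script. Below is a minimal sketch that assumes the field names shown above; `hours_since_sync` is a hypothetical helper.

```python
from datetime import datetime, timezone


def hours_since_sync(status: dict, now: datetime) -> float:
    """Return hours elapsed between status['last_sync'] and `now`."""
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    last_sync = datetime.fromisoformat(status["last_sync"].replace("Z", "+00:00"))
    return (now - last_sync).total_seconds() / 3600


status = {"total_docs": 627, "total_chunks": 3764, "last_sync": "2026-04-11T00:30:00Z"}
age = hours_since_sync(status, datetime(2026, 4, 11, 6, 30, tzinfo=timezone.utc))
```

In this example `age` is 6.0 hours. A scheduled job could compare that value with a threshold and run obsidian-rag sync when the index is stale.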

9.2 Test Semantic Search (via Node)

node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';

const config = loadConfig();
console.log('Searching for: how was my mental health in 2024');
const results = await searchVectorDb(config, 'how was my mental health in 2024', { max_results: 3 });
for (const r of results) {
  console.log('---');
  console.log('[' + r.score.toFixed(3) + '] ' + r.source_file + ' | ' + (r.section || '(no section)'));
  console.log('  ' + r.chunk_text.slice(0, 180) + '...');
}
"

Expected: ranked list of relevant note chunks with cosine similarity scores.
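
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The plain-Python sketch below only illustrates the formula; the real scoring happens inside the vector store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0 and orthogonal vectors score 0.0, so higher scores in the search output mean closer matches.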

9.3 Test DEGRADED Mode (Ollama Down)

If Ollama is unavailable, the plugin falls back to BM25 full-text search on chunk_text. Verify this:

# Stop Ollama
pkill -f ollama   # macOS/Linux
# taskkill /F /IM ollama.exe  # Windows

# Run the same search — should still return results via FTS
node --input-type=module -e "
import { searchVectorDb } from './src/utils/lancedb.js';
import { loadConfig } from './src/utils/config.js';
const config = loadConfig();
const results = await searchVectorDb(config, 'mental health', { max_results: 3 });
results.forEach(r => console.log('[' + r.score.toFixed(4) + '] ' + r.source_file));
"

# Restart Ollama
ollama serve &
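
The fallback described in this section follows a common pattern. Here is a sketch of that pattern in Python, with both search functions passed in as stand-ins. The names, the "ok"/"degraded" labels, and the sample file name are illustrative assumptions, not the plugin's TypeScript implementation.

```python
from typing import Callable


def search_with_fallback(
    query: str,
    vector_search: Callable[[str], list],
    text_search: Callable[[str], list],
) -> tuple[str, list]:
    """Try vector search first; fall back to full-text search if embedding fails."""
    try:
        return "ok", vector_search(query)
    except ConnectionError:
        # Ollama is unreachable, so the query cannot be embedded.
        return "degraded", text_search(query)


def _ollama_down(query: str) -> list:
    """Stand-in for a vector search whose embedding call is refused."""
    raise ConnectionRefusedError("Ollama connection refused")


mode, results = search_with_fallback(
    "mental health", _ollama_down, lambda q: ["journal/2024-03.md"]
)
```

Because `ConnectionRefusedError` is a subclass of `ConnectionError`, the call above ends up in the degraded branch and returns the full-text results.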

9.4 Test OpenClaw Tools Directly

Ask OpenClaw to use the plugin:

Ask OpenClaw: "How was my mental health in 2024?"

OpenClaw should invoke obsidian_rag_search with your query and return ranked results from your journal.

Ask OpenClaw: "Run obsidian_rag_status"

OpenClaw should invoke obsidian_rag_status and display index stats.

10. Keeping the Index Fresh

10.1 Manual Incremental Sync

After editing or adding notes, run:

source .venv/bin/activate   # macOS/Linux
obsidian-rag sync

This only re-indexes files whose mtime changed since the last sync. Typically <5 seconds for a handful of changed files.
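
An mtime-based incremental scan can be sketched as follows. This illustrates the idea and is not the project's sync code; `changed_since` and the demo file names are hypothetical.

```python
import os
import tempfile


def changed_since(vault: str, last_sync: float, deny_dirs: set[str]) -> list[str]:
    """Return .md paths (relative to vault) modified after last_sync (epoch seconds)."""
    changed = []
    for root, dirs, files in os.walk(vault):
        # Prune denied directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in deny_dirs]
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(".md") and os.path.getmtime(path) > last_sync:
                changed.append(os.path.relpath(path, vault))
    return sorted(changed)


# Demo on a throwaway vault: one stale note, one fresh note, one denied note.
with tempfile.TemporaryDirectory() as vault:
    os.mkdir(os.path.join(vault, ".trash"))
    notes = [("old.md", 1_000), ("new.md", 3_000), (os.path.join(".trash", "gone.md"), 3_000)]
    for rel, mtime in notes:
        full = os.path.join(vault, rel)
        with open(full, "w") as f:
            f.write("# note\n")
        os.utime(full, (mtime, mtime))  # set access and modification times
    demo = changed_since(vault, last_sync=2_000, deny_dirs={".obsidian", ".trash"})
```

In the demo only `new.md` is reported: `old.md` predates the last sync and `.trash/gone.md` sits in a denied directory.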

10.2 Automatic Sync via File Watcher

The TypeScript plugin includes a VaultWatcher service (using chokidar) that monitors the vault directory and auto-triggers incremental syncs on file changes.

To enable the watcher, call the watcher initialization in your OpenClaw setup or run:

node --input-type=module -e "
import { startVaultWatcher } from './src/services/vault-watcher.js';
import { loadConfig } from './src/utils/config.js';
const config = loadConfig();
const watcher = startVaultWatcher(config);
console.log('Watching vault for changes...');
// Keep process alive
setInterval(() => {}, 10000);
"

Note: The watcher runs as a long-lived background process. Terminate it when shutting down.

10.3 Force Rebuild

If the index becomes corrupted or you change the chunking strategy:

obsidian-rag reindex

This drops the LanceDB table and rebuilds it from scratch, which is equivalent to running obsidian-rag index against an empty store.

10.4 After Upgrading the Plugin

If you pull a new version of this plugin that changed the LanceDB schema or added new indexes (e.g., the FTS index on chunk_text), always reindex:

obsidian-rag reindex

11. Troubleshooting

`FileNotFoundError: config.json`

The CLI searches for config at:

./obsidian-rag/config.json (relative to project root, where you run obsidian-rag)
~/.obsidian-rag/config.json (home directory fallback)
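
The search order above can be sketched as a first-match lookup. `find_config` is a hypothetical helper that mirrors the two documented locations; it is not the CLI's config loader.

```python
import os
import tempfile
from typing import Optional


def find_config(cwd: str, home: str) -> Optional[str]:
    """Return the first existing config path in the documented search order."""
    candidates = [
        os.path.join(cwd, "obsidian-rag", "config.json"),
        os.path.join(home, ".obsidian-rag", "config.json"),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo: only the home-directory fallback exists.
with tempfile.TemporaryDirectory() as tmp:
    cwd, home = os.path.join(tmp, "project"), os.path.join(tmp, "home")
    os.makedirs(cwd)
    os.makedirs(os.path.join(home, ".obsidian-rag"))
    with open(os.path.join(home, ".obsidian-rag", "config.json"), "w") as f:
        f.write("{}")
    found_rel = os.path.relpath(find_config(cwd, home), tmp)
    missing = find_config(cwd, os.path.join(tmp, "nowhere"))
```

When neither file exists the helper returns `None`, which corresponds to the FileNotFoundError case described here.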

Fix: Ensure you run obsidian-rag from the project root ($DESTINATION), or verify the config file exists:

ls "$DESTINATION/obsidian-rag/config.json"

`ERROR: Index not found. Run 'obsidian-rag index' first.`

LanceDB table doesn't exist. This is normal on first install.

Fix:

source .venv/bin/activate
obsidian-rag index

`ConnectionRefusedError` / `Ollama connection refused`

Ollama is not running.

Fix:

ollama serve &
sleep 2
curl http://localhost:11434/api/tags   # must return JSON

If on a remote machine, update embedding.base_url in config.json:

"base_url": "http://192.168.1.100:11434"

Vector search returns 0 results

Check the index exists: obsidian-rag status
Check Ollama model is available: ollama list
Rebuild the index: obsidian-rag reindex

FTS (DEGRADED mode) not working after upgrade

The FTS index on chunk_text was added in a recent change. Reindex to rebuild with FTS:

obsidian-rag reindex

`npm run build` fails with TypeScript errors

npm run typecheck

Fix any type errors in src/, then rebuild. Common causes: missing type declarations, outdated openclaw package.

Permission errors (Windows)

Run your terminal as Administrator, or install Python/Ollama to user-writable directories (not C:\Program Files).

Very slow embedding (~minutes for 500 files)

Reduce batch_size in config.json to 32 or 16
Ensure no other heavy processes are competing for CPU
Without a supported GPU, Ollama runs embeddings on the CPU, and this is slowest on machines without AVX2/AVX-512

Vault path contains spaces or special characters

Use an absolute path. Inside a JSON string, spaces need no escaping; only backslashes and double quotes do.

macOS/Linux:

# In config.json, keep the path in double quotes; spaces are fine as-is:
"vault_path": "/Users/your name/Documents/My Vault"

Windows:

"vault_path": "C:\\Users\\yourname\\Documents\\My Vault"

Plugin not appearing in `openclaw plugins list`

Confirm dist/index.js exists:

ls -la ~/.openclaw/extensions/obsidian-rag/dist/

Confirm openclaw.plugin.json exists:

ls ~/.openclaw/extensions/obsidian-rag/openclaw.plugin.json

Check that the symlink is valid (not broken):

ls -la ~/.openclaw/extensions/obsidian-rag
# Should point to your DESTINATION, not show as "red" (broken)
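
The same symlink check can be done programmatically. This is a small sketch using only the standard library; `is_broken_symlink` is a hypothetical helper and the demo paths are throwaway examples.

```python
import os
import tempfile


def is_broken_symlink(path: str) -> bool:
    """True if path is a symlink whose target does not exist."""
    # os.path.exists follows symlinks, so it is False for a dangling link.
    return os.path.islink(path) and not os.path.exists(path)


# Demo: one valid link and one dangling link in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "obsidian-rag")
    os.mkdir(target)
    good = os.path.join(tmp, "good-link")
    bad = os.path.join(tmp, "bad-link")
    os.symlink(target, good)
    os.symlink(os.path.join(tmp, "missing"), bad)
    good_is_broken = is_broken_symlink(good)
    bad_is_broken = is_broken_symlink(bad)
```

Calling `is_broken_symlink(os.path.expanduser("~/.openclaw/extensions/obsidian-rag"))` would report whether the registration link from §8 still points at an existing directory.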

Verify the manifest has configSchema (required since v0.1.1):

grep configSchema ~/.openclaw/extensions/obsidian-rag/openclaw.plugin.json

Try bypassing discovery cache:

OPENCLAW_DISABLE_PLUGIN_DISCOVERY_CACHE=1 openclaw plugins list

Quick Reference — All Commands in Order

# 1. Clone
git clone https://git.phostrich.com/santhoshj/obsidian-rag.git ~/dev/obsidian-rag
cd ~/dev/obsidian-rag

# 2. Install Ollama (if not installed)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
sleep 2   # give the server time to bind to port 11434
ollama pull mxbai-embed-large:335m

# 3. Python venv + CLI
python -m venv .venv
source .venv/bin/activate
pip install -e python/

# 4. Node.js plugin
npm install
npm run build

# 5. Edit config: set vault_path in obsidian-rag/config.json

# 6. First-time index
obsidian-rag index

# 7. Register with OpenClaw
mkdir -p ~/.openclaw/extensions
ln -s ~/dev/obsidian-rag ~/.openclaw/extensions/obsidian-rag

# 8. Verify
obsidian-rag status
openclaw plugins list

Project Layout Reference

obsidian-rag/                          # Project root (git-cloned)
├── .git/                              # Git history
├── .venv/                             # Python virtual environment (created in step 4)
├── dist/
│   └── index.js                       # Built plugin bundle (created by npm run build)
├── node_modules/                      # npm packages (created by npm install)
├── obsidian-rag/                      # Runtime data directory (created on first index)
│   ├── config.json                    # Plugin configuration
│   ├── vectors.lance/                 # LanceDB vector store (created on first index)
│   └── sync-result.json               # Last sync metadata
├── openclaw.plugin.json               # Plugin manifest (do not edit — auto-generated)
├── python/
│   ├── obsidian_rag/                  # Python package source
│   │   ├── cli.py                     # CLI entry point
│   │   ├── config.py                  # Config loader
│   │   ├── indexer.py                 # Full indexing pipeline
│   │   ├── chunker.py                 # Text chunking
│   │   ├── embedder.py                # Ollama client
│   │   ├── vector_store.py            # LanceDB CRUD
│   │   └── security.py                # Path traversal, HTML strip
│   └── tests/                         # 64 pytest tests
├── src/
│   ├── index.ts                       # OpenClaw plugin entry (definePluginEntry)
│   ├── tools/                         # Tool registrations + implementations
│   ├── services/                      # Health, watcher, indexer bridge
│   └── utils/                         # Config, LanceDB, types, response
├── package.json
├── tsconfig.json
└── vitest.config.ts

Last updated: 2026-04-11 — obsidian-rag v0.1.0
