Obsidian-RAG — Installation Guide for OpenClaw

What this plugin does: Indexes an Obsidian vault into LanceDB using Ollama embeddings, then powers four OpenClaw tools — obsidian_rag_search, obsidian_rag_index, obsidian_rag_status, and obsidian_rag_memory_store — so OpenClaw can answer natural-language questions over your personal notes (journal, finance, health, relationships, etc.).

Stack:

  • Python 3.11+ CLI → LanceDB vector store + Ollama embeddings
  • TypeScript/OpenClaw plugin → OpenClaw agent tools
  • Ollama (local) → embedding inference

Table of Contents

  1. Prerequisites
  2. Clone the Repository
  3. Install Ollama + Embedding Model
  4. Install Python CLI (Indexer)
  5. Install Node.js / TypeScript Plugin
  6. Configure the Plugin
  7. Run the Initial Index
  8. Register the Plugin with OpenClaw
  9. Verify Everything Works
  10. Keeping the Index Fresh
  11. Troubleshooting

1. Prerequisites

| Component | Required Version | Why |
|---|---|---|
| Python | ≥ 3.11 | Async I/O, modern type hints |
| Node.js | ≥ 18 | ESM modules, node: imports |
| npm | any recent | installs TypeScript deps |
| Ollama | running on localhost:11434 | local embedding inference |
| Disk space | ~500 MB free | LanceDB store grows with vault |

Verify your environment:

python --version    # → Python 3.11.x or higher
node --version      # → v18.x.x or higher
npm --version       # → 9.x.x or higher
curl http://localhost:11434/api/tags  # → {"models": [...]} if Ollama is running
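
The version checks above can also be scripted. The following is a minimal sketch, assuming version strings shaped like the ones printed by the commands above; `parse_version` and `meets_minimum` are hypothetical helpers and not part of the obsidian-rag CLI.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number from output such as 'Python 3.11.4' or 'v18.17.0'."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text: str, minimum: tuple[int, ...]) -> bool:
    """True when the parsed version is at least `minimum` (plain tuple comparison)."""
    return parse_version(text) >= minimum


# Sample strings are illustrative, not captured from a real machine.
print(meets_minimum("Python 3.11.4", (3, 11)))
print(meets_minimum("v16.20.0", (18,)))
```

Feeding it the real output of `python --version` and `node --version` (for example via `subprocess.run`) would turn the manual check into a single pass/fail step.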

If Ollama is not running yet, skip to §3 before continuing.


2. Clone the Repository

# Replace DESTINATION with where you want the project to live.
# The project root must be writable (not inside /System or a read-only mount).
DESTINATION="$HOME/dev/obsidian-rag"
mkdir -p "$HOME/dev"
git clone https://git.phostrich.com/santhoshj/obsidian-rag.git "$DESTINATION"
cd "$DESTINATION"

Important: The obsidian-rag/config.json and obsidian-rag/sync-result.json files and the obsidian-rag/vectors.lance/ directory are created at runtime below the project root. Choose a destination with adequate write permissions.

Note for existing clones: If you are re-running this guide on an already-cloned copy, pull the latest changes first:

git pull origin model/minimax

3. Install Ollama + Embedding Model

The plugin requires Ollama running locally with the mxbai-embed-large:335m embedding model.

3.1 Install Ollama

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download the installer from https://ollama.com/download

Verify:

ollama --version

3.2 Start Ollama

ollama serve &
# Give it 2 seconds to bind to port 11434
sleep 2
curl http://localhost:11434/api/tags
# → {"models": []}
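
A fixed two-second sleep may be too short on a slow machine. As a hedged alternative, the sketch below polls the same /api/tags endpoint until it answers; `ollama_ready` and `wait_for_ollama` are hypothetical names, and the retry count and delay are arbitrary assumptions.

```python
import json
import time
import urllib.request


def ollama_ready(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True when GET /api/tags answers with a JSON object that has a 'models' key."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return False
    return isinstance(data, dict) and "models" in data


def wait_for_ollama(base_url: str = "http://localhost:11434",
                    attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll until Ollama responds or `attempts` is exhausted."""
    for _ in range(attempts):
        if ollama_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_ollama()` after `ollama serve &` returns as soon as the server is listening, instead of relying on a guessed delay.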

Auto-start tip: On macOS, consider installing Ollama as a LaunchAgent so it survives reboots. On Linux systemd: sudo systemctl enable ollama

3.3 Pull the Embedding Model

ollama pull mxbai-embed-large:335m

This downloads the model weights (several hundred MB; the 335m tag is the model's parameter count, not its download size). Expected output:

pulling manifest
pulling 4a5b...  100%
verifying sha256 digest
writing manifest
success

Verify the model is available:

ollama list
# → NAME                      ID           SIZE      MODIFIED
# → mxbai-embed-large:335m    7c6d...      335 MB    2026-04-...

Model note: The config (obsidian-rag/config.json) defaults to mxbai-embed-large:335m. If you use a different model, update embedding.model and embedding.dimensions in the config file (see §6).
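
Switching models means changing both fields together, since the vector dimension depends on the model. A minimal sketch, assuming the config nests these fields as `embedding.model` and `embedding.dimensions` as the dotted names suggest; `with_embedding_model` is a hypothetical helper, and the dimension for any given model should be confirmed on its model card.

```python
import copy


def with_embedding_model(config: dict, model: str, dimensions: int) -> dict:
    """Return a copy of `config` with embedding.model and embedding.dimensions replaced."""
    updated = copy.deepcopy(config)
    embedding = updated.setdefault("embedding", {})
    embedding["model"] = model
    embedding["dimensions"] = dimensions
    return updated


# Illustrative config fragment; the real file may contain more fields.
current = {
    "embedding": {
        "model": "mxbai-embed-large:335m",
        "dimensions": 1024,
        "base_url": "http://localhost:11434",
    }
}
changed = with_embedding_model(current, "nomic-embed-text", 768)
print(changed["embedding"])
```

Vectors of different dimensions cannot be mixed in one table, so a model change most likely calls for `obsidian-rag reindex` afterwards.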


4. Install Python CLI (Indexer)

The Python CLI (obsidian-rag) handles all vault scanning, chunking, embedding, and LanceDB storage.

4.1 Create a Virtual Environment

Using a virtual environment isolates this project's dependencies from your system Python.

macOS / Linux:

cd "$DESTINATION"
python -m venv .venv
source .venv/bin/activate

Windows (PowerShell):

cd "$DESTINATION"
python -m venv .venv
.venv\Scripts\Activate.ps1

Windows (CMD):

cd %DESTINATION%
python -m venv .venv
.venv\Scripts\activate.bat

You should now see (.venv) prepended to your shell prompt.

4.2 Install the Package in Editable Mode

pip install -e python/

This installs all runtime dependencies:

  • lancedb — vector database
  • httpx — HTTP client for Ollama
  • pyyaml — config file parsing
  • python-frontmatter — YAML frontmatter extraction

Verify the CLI is accessible:

obsidian-rag --help

Expected output:

usage: obsidian-rag [-h] {index,sync,reindex,status}

positional arguments:
  {index,sync,reindex,status}
    index       Full vault index (scan → chunk → embed → store)
    sync        Incremental sync (only changed files)
    reindex     Force clean rebuild (deletes existing index)
    status      Show index health and statistics

Python path tip: The CLI entry point (obsidian-rag) is installed into .venv/bin/. Always activate the venv before running CLI commands:

source .venv/bin/activate   # macOS/Linux
.venv\Scripts\activate       # Windows PowerShell

Without venv: If you prefer a system-wide install instead of a venv, skip step 4.1 and run pip install -e python/ directly. Not recommended if you have other Python projects with conflicting dependencies.


5. Install Node.js / TypeScript Plugin

The TypeScript plugin registers the OpenClaw tools (obsidian_rag_search, obsidian_rag_index, obsidian_rag_status, obsidian_rag_memory_store).

5.1 Install npm Dependencies

cd "$DESTINATION"
npm install

This installs into node_modules/ and writes package-lock.json. Packages include:

  • openclaw — plugin framework
  • @lancedb/lancedb — vector DB client (Node.js bindings)
  • chokidar — file system watcher for auto-sync
  • yaml — config file parsing

5.2 Build the Plugin

npm run build

This compiles src/index.ts → dist/index.js (a single ESM bundle, ~131 KB).

Expected output:

dist/index.js  131.2kb

Done in ~1s

Watch mode (development): Run npm run dev to rebuild automatically on file changes.

Type checking (optional but recommended):

npm run typecheck

Should produce no errors.


6. Configure the Plugin

All configuration lives in obsidian-rag/config.json relative to the project root.

6.1 Inspect the Default Config

cat "$DESTINATION/obsidian-rag/config.json"

6.2 Key Fields to Customize

| Field | Default | Change if… |
|---|---|---|
| vault_path | "./KnowledgeVault/Default" | Your vault is in a different location |
| embedding.model | "mxbai-embed-large:335m" | You pulled a different Ollama model |
| embedding.base_url | "http://localhost:11434" | Ollama runs on a different host/port |
| vector_store.path | "./obsidian-rag/vectors.lance" | You want data in a different directory |
| deny_dirs | [".obsidian", ".trash", ...] | You want to skip or allow additional directories |
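
To illustrate what deny_dirs controls, here is a rough sketch of a vault walk that prunes denied directories. It assumes entries are matched by directory name at any depth, which this guide does not confirm; `iter_markdown_files` is a hypothetical function, not the indexer's actual code.

```python
import os
from collections.abc import Iterator
from pathlib import Path


def iter_markdown_files(vault: Path, deny_dirs: set[str]) -> Iterator[Path]:
    """Yield .md files under `vault`, never descending into directories named in `deny_dirs`."""
    for root, dirs, files in os.walk(vault):
        # Assigning to dirs[:] prunes in place, so os.walk skips denied subtrees entirely.
        dirs[:] = sorted(d for d in dirs if d not in deny_dirs)
        for name in sorted(files):
            if name.endswith(".md"):
                yield Path(root) / name
```

Pruning during the walk, rather than filtering the results afterwards, avoids reading large excluded folders such as .obsidian at all.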

6.3 Set Your Vault Path

Option A — Relative to the project root (recommended): Symlink or place your vault relative to the project:

# Example: your vault sits next to the project directory
# (project at ~/dev/obsidian-rag, vault at ~/dev/obsidian-vault)
# In config.json:
"vault_path": "../obsidian-vault"

Option B — Absolute path:

"vault_path": "/Users/yourusername/obsidian-vault"

Option C — Windows absolute path:

"vault_path": "C:\\Users\\YourUsername\\obsidian-vault"

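Path validation: The CLI validates that vault_path exists on the filesystem before indexing. You can verify manually; run the check from the project root so that a relative vault_path resolves the same way it does for the CLI:

cd "$DESTINATION"
ls obsidian-rag/config.json
python -c "
import json
import os

# Load the plugin config and confirm vault_path points at a real directory.
with open('obsidian-rag/config.json') as f:
    cfg = json.load(f)
assert os.path.isdir(cfg['vault_path']), 'vault_path does not exist'
print('Vault path OK:', os.path.abspath(cfg['vault_path']))
"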

7. Run the Initial Index

This is a one-time step that scans every .md file in your vault, chunks them, embeds them via Ollama, and stores them in LanceDB.

# Make sure the venv is active
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

obsidian-rag index

Expected output (truncated):

{
  "type": "complete",
  "indexed_files": 627,
  "total_chunks": 3764,
  "duration_ms": 45230,
  "errors": []
}

What happens during index:

  1. Vault walk — traverses all subdirectories, skipping deny_dirs (.obsidian, .trash, zzz-Archive, etc.)
  2. Frontmatter parse — extracts YAML frontmatter, headings, tags, and dates from each .md file
  3. Chunking — structured notes (journal entries) split by # heading; unstructured notes use a 500-token sliding window with 100-token overlap
  4. Embedding — batches of 64 chunks sent to Ollama /api/embeddings endpoint
  5. Storage — vectors upserted into LanceDB at obsidian-rag/vectors.lance/
  6. Sync record — writes obsidian-rag/sync-result.json with timestamp and stats
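
The chunking step (step 3) can be sketched as follows. This is an approximation under stated assumptions: tokens are whatever the caller passes in (whitespace-separated words in the usage below, not the indexer's actual tokenizer), and `split_by_heading` and `sliding_window` are hypothetical names.

```python
import re


def split_by_heading(text: str) -> list[str]:
    """Split a note into sections, each starting at a line that begins with '# '."""
    sections = re.split(r"(?m)^(?=# )", text)
    return [s.strip() for s in sections if s.strip()]


def sliding_window(tokens: list[str], size: int = 500, overlap: int = 100) -> list[list[str]]:
    """Cut `tokens` into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the note
    return windows


note = "# Morning\nSlept well.\n# Evening\nLong walk."
print(split_by_heading(note))
print([len(w) for w in sliding_window(("word " * 1200).split())])
```

With a 500-token window and 100-token overlap, each window starts 400 tokens after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.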

Time estimate: ~30–60 seconds for 500–700 files on a modern machine. The embedding step is the bottleneck; Ollama must process each batch sequentially.

Batch size tuning: If embedding is slow, reduce embedding.batch_size in config.json (e.g., "batch_size": 32).
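
To see what batch_size changes, the sketch below groups chunks the way a batching loop typically would. `batched` is a hypothetical helper; the chunk count is taken from the sample output above.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 64) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunks = [f"chunk-{i}" for i in range(3764)]
sizes = [len(batch) for batch in batched(chunks, 64)]
print(len(sizes), sizes[-1])  # → 59 52
```

Halving the batch size roughly doubles the number of requests while shrinking each one, which can help when a single large request stalls or times out.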


8. Register the Plugin with OpenClaw

OpenClaw discovers plugins from these locations:

  • ~/.openclaw/extensions/ (global, recommended for most users)
  • <workspace>/.openclaw/extensions/ (workspace-specific)
  • Bundled plugins in OpenClaw's install directory
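
Use one of the following options.

8.1 Option A: Symlink into the Global Extensions Directory

mkdir -p ~/.openclaw/extensions
ln -s "$DESTINATION" ~/.openclaw/extensions/obsidian-rag

8.2 Option B: Symlink into a Workspace

# From your OpenClaw workspace root
mkdir -p ./.openclaw/extensions
ln -s "$DESTINATION" ./.openclaw/extensions/obsidian-rag

8.3 Option C: Install via the OpenClaw CLI

openclaw plugins install --link "$DESTINATION"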

8.4 Confirm the Plugin Loaded

openclaw plugins list | grep obsidian-rag
# or
openclaw plugins list --verbose | grep obsidian-rag

9. Verify Everything Works

9.1 Check Index Health

source .venv/bin/activate   # macOS/Linux
obsidian-rag status

Expected:

{
  "total_docs": 627,
  "total_chunks": 3764,
  "last_sync": "2026-04-11T00:30:00Z"
}

9.2 Test Semantic Search (via Node)

node --input-type=module -e "
import { loadConfig } from './src/utils/config.js';
import { searchVectorDb } from './src/utils/lancedb.js';

const config = loadConfig();
console.log('Searching for: how was my mental health in 2024');
const results = await searchVectorDb(config, 'how was my mental health in 2024', { max_results: 3 });
for (const r of results) {
  console.log('---');
  console.log('[' + r.score.toFixed(3) + '] ' + r.source_file + ' | ' + (r.section || '(no section)'));
  console.log('  ' + r.chunk_text.slice(0, 180) + '...');
}
"

Expected: ranked list of relevant note chunks with cosine similarity scores.
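
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is the textbook definition only; how LanceDB computes and reports scores internally is not shown in this guide.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between `a` and `b`; 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
candidates = {"close": [0.9, 0.1], "far": [0.1, 0.9]}
ranked = sorted(candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True)
print(ranked)
```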

9.3 Test DEGRADED Mode (Ollama Down)

If Ollama is unavailable, the plugin falls back to BM25 full-text search on chunk_text. Verify this:

# Stop Ollama
pkill -f ollama   # macOS/Linux
# taskkill /F /IM ollama.exe  # Windows

# Run the same search — should still return results via FTS
node --input-type=module -e "
import { searchVectorDb } from './src/utils/lancedb.js';
import { loadConfig } from './src/utils/config.js';
const config = loadConfig();
const results = await searchVectorDb(config, 'mental health', { max_results: 3 });
results.forEach(r => console.log('[' + r.score.toFixed(4) + '] ' + r.source_file));
"

# Restart Ollama
ollama serve &
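
The fallback described in this section can be pictured as the control flow below. It is a sketch, not the plugin's TypeScript implementation: `embed`, `vector_search`, and `text_search` are hypothetical callables injected by the caller, and the mode labels are assumptions.

```python
from collections.abc import Callable


def search_with_fallback(
    query: str,
    embed: Callable[[str], list[float]],
    vector_search: Callable[[list[float], int], list[str]],
    text_search: Callable[[str, int], list[str]],
    limit: int = 3,
) -> tuple[str, list[str]]:
    """Use vector search when the embedder is reachable, otherwise full-text search."""
    try:
        vector = embed(query)
    except (ConnectionError, TimeoutError):
        # Embedder unreachable: answer from the text index instead of failing.
        return "degraded", text_search(query, limit)
    return "ok", vector_search(vector, limit)


def offline_embed(_query: str) -> list[float]:
    raise ConnectionRefusedError("Ollama is not running")


mode, hits = search_with_fallback(
    "mental health",
    embed=offline_embed,
    vector_search=lambda vec, n: ["vector-hit"][:n],
    text_search=lambda q, n: ["journal/2024-03.md"][:n],
)
print(mode, hits)
```

Only connection-level failures trigger the fallback here; other exceptions propagate so that genuine bugs are not masked as degraded results.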

9.4 Test OpenClaw Tools Directly

Ask OpenClaw to use the plugin:

Ask OpenClaw: "How was my mental health in 2024?"

OpenClaw should invoke obsidian_rag_search with your query and return ranked results from your journal.

Ask OpenClaw: "Run obsidian_rag_status"

OpenClaw should invoke obsidian_rag_status and display index stats.


10. Keeping the Index Fresh

10.1 Manual Incremental Sync

After editing or adding notes, run:

source .venv/bin/activate   # macOS/Linux
obsidian-rag sync

This only re-indexes files whose mtime changed since the last sync. Typically <5 seconds for a handful of changed files.
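
A minimal sketch of mtime-based change detection follows. It assumes a single last-sync timestamp; the CLI's real bookkeeping (for example per-file records, or handling of deleted notes) is not documented here, and `changed_since` is a hypothetical name.

```python
from collections.abc import Iterable
from pathlib import Path


def changed_since(paths: Iterable[Path], last_sync_epoch: float) -> list[Path]:
    """Return the files whose modification time is newer than the last sync."""
    return [p for p in paths if p.stat().st_mtime > last_sync_epoch]
```

Because only the returned files need to be re-chunked and re-embedded, the cost of a sync scales with the number of edits rather than the size of the vault.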

10.2 Automatic Sync via File Watcher

The TypeScript plugin includes a VaultWatcher service (using chokidar) that monitors the vault directory and auto-triggers incremental syncs on file changes.

To enable the watcher, call the watcher initialization in your OpenClaw setup or run:

node --input-type=module -e "
import { startVaultWatcher } from './src/services/vault-watcher.js';
import { loadConfig } from './src/utils/config.js';
const config = loadConfig();
const watcher = startVaultWatcher(config);
console.log('Watching vault for changes...');
// Keep process alive
setInterval(() => {}, 10000);
"

Note: The watcher runs as a long-lived background process. Terminate it when shutting down.

10.3 Force Rebuild

If the index becomes corrupted or you change the chunking strategy:

obsidian-rag reindex

This drops the LanceDB table and rebuilds it from scratch; the result is the same as running obsidian-rag index against an empty store.

10.4 After Upgrading the Plugin

If you pull a new version of this plugin that changed the LanceDB schema or added new indexes (e.g., the FTS index on chunk_text), always reindex:

obsidian-rag reindex

11. Troubleshooting

FileNotFoundError: config.json

The CLI searches for config at:

  1. ./obsidian-rag/config.json (relative to project root, where you run obsidian-rag)
  2. ~/.obsidian-rag/config.json (home directory fallback)

Fix: Ensure you run obsidian-rag from the project root ($DESTINATION), or verify the config file exists:

ls "$DESTINATION/obsidian-rag/config.json"
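
The lookup order above can be sketched as follows, which also shows why the working directory matters. `find_config` is a hypothetical function written for illustration, not the CLI's config loader.

```python
from pathlib import Path


def find_config(cwd: Path, home: Path) -> Path:
    """Return the first existing config file, checking the project-relative path first."""
    candidates = [
        cwd / "obsidian-rag" / "config.json",
        home / ".obsidian-rag" / "config.json",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    looked = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"config.json not found; looked in: {looked}")
```

Calling `find_config(Path.cwd(), Path.home())` from outside the project root skips the first candidate, so the error appears unless the home-directory fallback exists.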

ERROR: Index not found. Run 'obsidian-rag index' first.

LanceDB table doesn't exist. This is normal on first install.

Fix:

source .venv/bin/activate
obsidian-rag index

ConnectionRefusedError / Ollama connection refused

Ollama is not running.

Fix:

ollama serve &
sleep 2
curl http://localhost:11434/api/tags   # must return JSON

If on a remote machine, update embedding.base_url in config.json:

"base_url": "http://192.168.1.100:11434"

Vector search returns 0 results

  1. Check the index exists: obsidian-rag status
  2. Check Ollama model is available: ollama list
  3. Rebuild the index: obsidian-rag reindex

FTS (DEGRADED mode) not working after upgrade

The FTS index on chunk_text was added in a recent change. Reindex to rebuild with FTS:

obsidian-rag reindex

npm run build fails with TypeScript errors

npm run typecheck

Fix any type errors in src/, then rebuild. Common causes: missing type declarations, outdated openclaw package.

Permission errors (Windows)

Run your terminal as Administrator, or install Python/Ollama to user-writable directories (not C:\Program Files).

Very slow embedding (~minutes for 500 files)

  • Reduce batch_size in config.json to 32 or 16
  • Ensure no other heavy processes are competing for CPU
  • Ollama embedding is CPU-bound on machines without AVX2/AVX512

Vault path contains spaces or special characters

Use an absolute path. Spaces need no escaping inside a JSON string; only backslashes and double quotes must be escaped.

macOS/Linux:

"vault_path": "/Users/your name/Documents/My Vault"

Windows:

"vault_path": "C:\\Users\\yourname\\Documents\\My Vault"

Plugin not appearing in openclaw plugins list

  1. Confirm dist/index.js exists:
    ls -la ~/.openclaw/extensions/obsidian-rag/dist/
    
  2. Confirm openclaw.plugin.json exists:
    ls ~/.openclaw/extensions/obsidian-rag/openclaw.plugin.json
    
  3. Check that the symlink is valid (not broken):
    ls -la ~/.openclaw/extensions/obsidian-rag
    # Should point to your DESTINATION, not show as "red" (broken)
    
  4. Verify the manifest has configSchema (required since v0.1.1):
    grep configSchema ~/.openclaw/extensions/obsidian-rag/openclaw.plugin.json
    
  5. Try bypassing discovery cache:
    OPENCLAW_DISABLE_PLUGIN_DISCOVERY_CACHE=1 openclaw plugins list
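
Checks 1 to 4 above can be combined into one script. The sketch below only inspects the files named in those steps; `diagnose_plugin` is a hypothetical helper and does not call OpenClaw itself.

```python
import json
from pathlib import Path


def diagnose_plugin(link: Path) -> list[str]:
    """Return a list of problems found in the plugin directory (empty when all checks pass)."""
    if link.is_symlink() and not link.exists():
        return ["symlink is broken: its target does not exist"]
    if not link.is_dir():
        return ["plugin directory not found"]

    problems = []
    if not (link / "dist" / "index.js").is_file():
        problems.append("dist/index.js missing: run npm run build")

    manifest = link / "openclaw.plugin.json"
    if not manifest.is_file():
        problems.append("openclaw.plugin.json missing")
    else:
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except ValueError:
            problems.append("openclaw.plugin.json is not valid JSON")
        else:
            if not (isinstance(data, dict) and "configSchema" in data):
                problems.append("openclaw.plugin.json has no configSchema")
    return problems
```

Running `diagnose_plugin(Path.home() / ".openclaw" / "extensions" / "obsidian-rag")` prints nothing alarming when the list is empty; otherwise each entry names the step to revisit.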
    

Quick Reference — All Commands in Order

# 1. Clone
git clone https://git.phostrich.com/santhoshj/obsidian-rag.git ~/dev/obsidian-rag
cd ~/dev/obsidian-rag

# 2. Install Ollama (if not installed)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull mxbai-embed-large:335m

# 3. Python venv + CLI
python -m venv .venv
source .venv/bin/activate
pip install -e python/

# 4. Node.js plugin
npm install
npm run build

# 5. Edit config: set vault_path in obsidian-rag/config.json

# 6. First-time index
obsidian-rag index

# 7. Register with OpenClaw
mkdir -p ~/.openclaw/extensions
ln -s ~/dev/obsidian-rag ~/.openclaw/extensions/obsidian-rag

# 8. Verify
obsidian-rag status
openclaw plugins list

Project Layout Reference

obsidian-rag/                          # Project root (git-cloned)
├── .git/                              # Git history
├── .venv/                             # Python virtual environment (created in step 4)
├── dist/
│   └── index.js                       # Built plugin bundle (created by npm run build)
├── node_modules/                      # npm packages (created by npm install)
├── obsidian-rag/                      # Runtime data directory (created on first index)
│   ├── config.json                    # Plugin configuration
│   ├── vectors.lance/                 # LanceDB vector store (created on first index)
│   └── sync-result.json               # Last sync metadata
├── openclaw.plugin.json               # Plugin manifest (do not edit — auto-generated)
├── python/
│   ├── obsidian_rag/                  # Python package source
│   │   ├── cli.py                     # CLI entry point
│   │   ├── config.py                  # Config loader
│   │   ├── indexer.py                 # Full indexing pipeline
│   │   ├── chunker.py                 # Text chunking
│   │   ├── embedder.py                # Ollama client
│   │   ├── vector_store.py            # LanceDB CRUD
│   │   └── security.py                # Path traversal, HTML strip
│   └── tests/                         # 64 pytest tests
├── src/
│   ├── index.ts                       # OpenClaw plugin entry (definePluginEntry)
│   ├── tools/                         # Tool registrations + implementations
│   ├── services/                      # Health, watcher, indexer bridge
│   └── utils/                         # Config, LanceDB, types, response
├── package.json
├── tsconfig.json
└── vitest.config.ts

Last updated: 2026-04-11 — obsidian-rag v0.1.0