From 90d6f83937e7d86a613804c1e2363fd967a14eee Mon Sep 17 00:00:00 2001
From: Santhosh Janardhanan
Date: Sat, 11 Apr 2026 15:17:44 -0400
Subject: [PATCH] Openclaw install instructions added

---
 INSTALL.md | 671 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 671 insertions(+)
 create mode 100644 INSTALL.md

diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000..fa490e7
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,671 @@
+# Obsidian-RAG — Installation Guide for OpenClaw
+
+**What this plugin does:** Indexes an Obsidian vault into LanceDB using Ollama embeddings, then powers four OpenClaw tools — `obsidian_rag_search`, `obsidian_rag_index`, `obsidian_rag_status`, and `obsidian_rag_memory_store` — so OpenClaw can answer natural-language questions over your personal notes (journal, finance, health, relationships, etc.).
+
+**Stack:**
+- Python 3.11+ CLI → LanceDB vector store + Ollama embeddings
+- TypeScript/OpenClaw plugin → OpenClaw agent tools
+- Ollama (local) → embedding inference
+
+---
+
+## Table of Contents
+
+1. [Prerequisites](#1-prerequisites)
+2. [Clone the Repository](#2-clone-the-repository)
+3. [Install Ollama + Embedding Model](#3-install-ollama--embedding-model)
+4. [Install Python CLI (Indexer)](#4-install-python-cli-indexer)
+5. [Install Node.js / TypeScript Plugin](#5-install-nodejs--typescript-plugin)
+6. [Configure the Plugin](#6-configure-the-plugin)
+7. [Run the Initial Index](#7-run-the-initial-index)
+8. [Register the Plugin with OpenClaw](#8-register-the-plugin-with-openclaw)
+9. [Verify Everything Works](#9-verify-everything-works)
+10. [Keeping the Index Fresh](#10-keeping-the-index-fresh)
+11. [Troubleshooting](#11-troubleshooting)
+
+---
+
+## 1. Prerequisites
+
+| Component | Required Version | Why |
+|---|---|---|
+| Python | ≥ 3.11 | Async I/O, modern type hints |
+| Node.js | ≥ 18 | ESM modules, `node:` imports |
+| npm | any recent | installs TypeScript deps |
+| Ollama | running on `localhost:11434` | local embedding inference |
+| Disk space | ~500 MB free | LanceDB store grows with vault |
+
+**Verify your environment:**
+
+```bash
+python --version   # → Python 3.11.x or higher
+node --version     # → v18.x.x or higher
+npm --version      # → 9.x.x or higher
+curl http://localhost:11434/api/tags   # → {"models": [...]} if Ollama is running
+```
+
+If Ollama is not running yet, skip to [§3](#3-install-ollama--embedding-model) before continuing.
+
+---
+
+## 2. Clone the Repository
+
+```bash
+# Replace DESTINATION with where you want the project to live.
+# The project root must be writable (not inside /System or a read-only mount).
+DESTINATION="$HOME/dev/obsidian-rag"
+mkdir -p "$HOME/dev"
+git clone https://github.com/YOUR_GITHUB_USER/obsidian-rag.git "$DESTINATION"
+cd "$DESTINATION"
+```
+
+> **Important:** The `obsidian-rag/config.json` and `obsidian-rag/sync-result.json` files and the `obsidian-rag/vectors.lance/` directory are created at runtime below the project root. Choose a destination where you have write permission.
+
+> **Note for existing clones:** If you are re-running this guide on an already-cloned copy, pull the latest changes first:
+> ```bash
+> git pull origin main
+> ```
+
+---
+
+## 3. Install Ollama + Embedding Model
+
+The plugin requires Ollama running locally with the `mxbai-embed-large:335m` embedding model.
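Once Ollama is installed (steps 3.1 to 3.3), the requirement above can be checked from Python as well as with `curl`. The following is a minimal sketch, not part of the plugin: it assumes the `/api/tags` response has the `{"models": [{"name": ...}]}` shape shown in this guide, and the helper names `model_names` and `ollama_has_model` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"   # default address used throughout this guide
MODEL = "mxbai-embed-large:335m"        # embedding model named in this guide


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response body."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def ollama_has_model(base_url: str = OLLAMA_URL, model: str = MODEL) -> bool:
    """Return True if Ollama answers and lists `model`; False if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures (URLError); ValueError covers bad JSON.
        return False
    return model in model_names(payload)


if __name__ == "__main__":
    print("ready" if ollama_has_model() else "Ollama not reachable or model not pulled")
```

Keeping the parsing in `model_names` separate from the network call makes the check easy to exercise without a running Ollama instance.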
+
+### 3.1 Install Ollama
+
+**macOS / Linux:**
+```bash
+curl -fsSL https://ollama.com/install.sh | sh
+```
+
+**Windows:** Download the installer from https://ollama.com/download
+
+**Verify:**
+```bash
+ollama --version
+```
+
+### 3.2 Start Ollama
+
+```bash
+ollama serve &
+# Give it 2 seconds to bind to port 11434
+sleep 2
+curl http://localhost:11434/api/tags
+# → {"models": []}
+```
+
+> **Auto-start tip:** On macOS, consider installing Ollama as a LaunchAgent so it survives reboots.
+> On Linux systemd: `sudo systemctl enable ollama`
+
+### 3.3 Pull the Embedding Model
+
+```bash
+ollama pull mxbai-embed-large:335m
+```
+
+This downloads ~335 MB. Expected output:
+```
+pulling manifest
+pulling 4a5b... 100%
+verifying sha256 digest
+writing manifest
+success
+```
+
+**Verify the model is available:**
+```bash
+ollama list
+# → NAME                      ID         SIZE      MODIFIED
+# → mxbai-embed-large:335m    7c6d...    335 MB    2026-04-...
+```
+
+> **Model note:** The config (`obsidian-rag/config.json`) defaults to `mxbai-embed-large:335m`. If you use a different model, update `embedding.model` and `embedding.dimensions` in the config file (see [§6](#6-configure-the-plugin)).
+
+---
+
+## 4. Install Python CLI (Indexer)
+
+The Python CLI (`obsidian-rag`) handles all vault scanning, chunking, embedding, and LanceDB storage.
+
+### 4.1 Create a Virtual Environment
+
+Using a virtual environment isolates this project's dependencies from your system Python.
+
+**macOS / Linux:**
+```bash
+cd "$DESTINATION"
+python -m venv .venv
+source .venv/bin/activate
+```
+
+**Windows (PowerShell):**
+```powershell
+cd "$DESTINATION"
+python -m venv .venv
+.venv\Scripts\Activate.ps1
+```
+
+**Windows (CMD):**
+```cmd
+cd %DESTINATION%
+python -m venv .venv
+.venv\Scripts\activate.bat
+```
+
+You should now see `(.venv)` prepended to your shell prompt.
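If the prompt prefix is hidden by your shell theme, the interpreter itself can tell you whether the venv is active. This is a small sketch using only the standard library; the function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    state = "venv active" if in_virtualenv() else "system interpreter"
    print(f"{state}: {sys.prefix}")
```

Run it with the same `python` you intend to use for `pip install -e python/`; if it reports the system interpreter, activate the venv first.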
+
+### 4.2 Install the Package in Editable Mode
+
+```bash
+pip install -e python/
+```
+
+This installs all runtime dependencies:
+- `lancedb` — vector database
+- `httpx` — HTTP client for Ollama
+- `pyyaml` — config file parsing
+- `python-frontmatter` — YAML frontmatter extraction
+
+**Verify the CLI is accessible:**
+```bash
+obsidian-rag --help
+```
+
+Expected output:
+```
+usage: obsidian-rag [-h] {index,sync,reindex,status}
+
+positional arguments:
+  {index,sync,reindex,status}
+    index      Full vault index (scan → chunk → embed → store)
+    sync       Incremental sync (only changed files)
+    reindex    Force clean rebuild (deletes existing index)
+    status     Show index health and statistics
+```
+
+> **Python path tip:** The CLI entry point (`obsidian-rag`) is installed into `.venv/bin/`. Always activate the venv before running CLI commands:
+> ```bash
+> source .venv/bin/activate   # macOS/Linux
+> .venv\Scripts\activate      # Windows PowerShell
+> ```
+
+> **Without venv:** If you prefer a system-wide install instead of a venv, skip step 4.1 and run `pip install -e python/` directly. Not recommended if you have other Python projects with conflicting dependencies.
+
+---
+
+## 5. Install Node.js / TypeScript Plugin
+
+The TypeScript plugin registers the OpenClaw tools (`obsidian_rag_search`, `obsidian_rag_index`, `obsidian_rag_status`, `obsidian_rag_memory_store`).
+
+### 5.1 Install npm Dependencies
+
+```bash
+cd "$DESTINATION"
+npm install
+```
+
+This installs into `node_modules/` and writes `package-lock.json`. Packages include:
+- `openclaw` — plugin framework
+- `@lancedb/lancedb` — vector DB client (Node.js bindings)
+- `chokidar` — file system watcher for auto-sync
+- `yaml` — config file parsing
+
+### 5.2 Build the Plugin
+
+```bash
+npm run build
+```
+
+This compiles `src/index.ts` → `dist/index.js` (a single ESM bundle, ~131 KB).
+
+Expected output:
+```
+dist/index.js  131.2kb
+
+Done in ~1s
+```
+
+> **Watch mode (development):** Run `npm run dev` to rebuild automatically on file changes.
+
+> **Type checking (optional but recommended):**
+> ```bash
+> npm run typecheck
+> ```
+> Should produce no errors.
+
+---
+
+## 6. Configure the Plugin
+
+All configuration lives in `obsidian-rag/config.json` relative to the project root.
+
+### 6.1 Inspect the Default Config
+
+```bash
+cat "$DESTINATION/obsidian-rag/config.json"
+```
+
+### 6.2 Key Fields to Customize
+
+| Field | Default | Change if… |
+|---|---|---|
+| `vault_path` | `"./KnowledgeVault/Default"` | Your vault is in a different location |
+| `embedding.model` | `"mxbai-embed-large:335m"` | You pulled a different Ollama model |
+| `embedding.base_url` | `"http://localhost:11434"` | Ollama runs on a different host/port |
+| `vector_store.path` | `"./obsidian-rag/vectors.lance"` | You want data in a different directory |
+| `deny_dirs` | `[".obsidian", ".trash", ...]` | You want to skip or allow additional directories |
+
+### 6.3 Set Your Vault Path
+
+**Option A — Relative to the project root (recommended):**
+Symlink or place your vault relative to the project:
+```bash
+# Example: your vault is at ~/obsidian-vault
+# In config.json:
+"vault_path": "../obsidian-vault"
+```
+
+**Option B — Absolute path:**
+```json
+"vault_path": "/Users/yourusername/obsidian-vault"
+```
+
+**Option C — Windows absolute path:**
+```json
+"vault_path": "C:\\Users\\YourUsername\\obsidian-vault"
+```
+
+> **Path validation:** The CLI validates that `vault_path` exists on the filesystem before indexing. You can verify manually:
+> ```bash
+> ls "$DESTINATION/obsidian-rag/config.json"
+> python3 -c "
+> import json
+> import os
+> with open('$DESTINATION/obsidian-rag/config.json') as f:
+>     cfg = json.load(f)
+> assert os.path.isdir(cfg['vault_path']), 'vault_path does not exist'
+> print('Vault path OK:', cfg['vault_path'])
+> "
+> ```
+
+---
+
+## 7. Run the Initial Index
+
+This is a one-time step that scans every `.md` file in your vault, chunks them, embeds them via Ollama, and stores them in LanceDB.
+
+```bash
+# Make sure the venv is active
+source .venv/bin/activate   # macOS/Linux
+# .venv\Scripts\activate    # Windows
+
+obsidian-rag index
+```
+
+**Expected output (truncated):**
+```json
+{
+  "type": "complete",
+  "indexed_files": 627,
+  "total_chunks": 3764,
+  "duration_ms": 45230,
+  "errors": []
+}
+```
+
+### What happens during `index`:
+
+1. **Vault walk** — traverses all subdirectories, skipping `deny_dirs` (`.obsidian`, `.trash`, `zzz-Archive`, etc.)
+2. **Frontmatter parse** — extracts YAML frontmatter, headings, tags, and dates from each `.md` file
+3. **Chunking** — structured notes (journal entries) split by `# heading`; unstructured notes use a 500-token sliding window with 100-token overlap
+4. **Embedding** — batches of 64 chunks sent to Ollama `/api/embeddings` endpoint
+5. **Storage** — vectors upserted into LanceDB at `obsidian-rag/vectors.lance/`
+6. **Sync record** — writes `obsidian-rag/sync-result.json` with timestamp and stats
+
+> **Time estimate:** ~30–60 seconds for 500–700 files on a modern machine. The embedding step is the bottleneck; Ollama must process each batch sequentially.
+>
+> **Batch size tuning:** If embedding is slow, reduce `embedding.batch_size` in `config.json` (e.g., `"batch_size": 32`).
+
+---
+
+## 8. Register the Plugin with OpenClaw
+
+OpenClaw auto-discovers plugins by reading the `openclaw.plugin.json` manifest in the project root and loading `dist/index.js`.
+
+### 8.1 Register via OpenClaw's Plugin Manager
+
+```bash
+# OpenClaw CLI — register the local plugin
+openclaw plugin add "$DESTINATION"
+# or, if OpenClaw has a specific register command:
+openclaw plugins register --path "$DESTINATION/dist/index.js" --name obsidian-rag
+```
+
+> **Note:** The exact command depends on your OpenClaw version. Check `openclaw --help` or `openclaw plugin --help` for the correct syntax. The plugin manifest (`openclaw.plugin.json`) in this project already declares all four tools.
+
+### 8.2 Alternative — Register by Path in OpenClaw Config
+
+If your OpenClaw installation uses a config file (e.g., `~/.openclaw/config.json` or `~/.openclaw/plugins.json`), add this project's built bundle:
+
+```json
+{
+  "plugins": [
+    {
+      "name": "obsidian-rag",
+      "path": "/full/path/to/obsidian-rag/dist/index.js"
+    }
+  ]
+}
+```
+
+### 8.3 Confirm the Plugin Loaded
+
+```bash
+openclaw plugins list
+# → obsidian-rag 0.1.0 (loaded)
+```
+
+Or, if OpenClaw has a status command:
+```bash
+openclaw status
+# → Plugin: obsidian-rag ✓ loaded
+```
+
+---
+
+## 9. Verify Everything Works
+
+### 9.1 Check Index Health
+
+```bash
+source .venv/bin/activate   # macOS/Linux
+obsidian-rag status
+```
+
+Expected:
+```json
+{
+  "total_docs": 627,
+  "total_chunks": 3764,
+  "last_sync": "2026-04-11T00:30:00Z"
+}
+```
+
+### 9.2 Test Semantic Search (via Node)
+
+```bash
+node --input-type=module -e "
+import { loadConfig } from './src/utils/config.js';
+import { searchVectorDb } from './src/utils/lancedb.js';
+
+const config = loadConfig();
+console.log('Searching for: how was my mental health in 2024');
+const results = await searchVectorDb(config, 'how was my mental health in 2024', { max_results: 3 });
+for (const r of results) {
+  console.log('---');
+  console.log('[' + r.score.toFixed(3) + '] ' + r.source_file + ' | ' + (r.section || '(no section)'));
+  console.log(' ' + r.chunk_text.slice(0, 180) + '...');
+}
+"
+```
+
+Expected: ranked list of relevant note chunks with cosine similarity scores.
+
+### 9.3 Test DEGRADED Mode (Ollama Down)
+
+If Ollama is unavailable, the plugin falls back to BM25 full-text search on `chunk_text`. Verify this:
+
+```bash
+# Stop Ollama
+pkill -f ollama   # macOS/Linux
+# taskkill /F /IM ollama.exe   # Windows
+
+# Run the same search — should still return results via FTS
+node --input-type=module -e "
+import { searchVectorDb } from './src/utils/lancedb.js';
+import { loadConfig } from './src/utils/config.js';
+const config = loadConfig();
+const results = await searchVectorDb(config, 'mental health', { max_results: 3 });
+results.forEach(r => console.log('[' + r.score.toFixed(4) + '] ' + r.source_file));
+"
+
+# Restart Ollama in the background
+ollama serve &
+```
+
+### 9.4 Test OpenClaw Tools Directly
+
+Ask OpenClaw to use the plugin:
+
+```
+Ask OpenClaw: "How was my mental health in 2024?"
+```
+
+OpenClaw should invoke `obsidian_rag_search` with your query and return ranked results from your journal.
+
+```
+Ask OpenClaw: "Run obsidian_rag_status"
+```
+
+OpenClaw should invoke `obsidian_rag_status` and display index stats.
+
+---
+
+## 10. Keeping the Index Fresh
+
+### 10.1 Manual Incremental Sync
+
+After editing or adding notes, run:
+```bash
+source .venv/bin/activate   # macOS/Linux
+obsidian-rag sync
+```
+
+This only re-indexes files whose `mtime` changed since the last sync. Typically <5 seconds for a handful of changed files.
+
+### 10.2 Automatic Sync via File Watcher
+
+The TypeScript plugin includes a `VaultWatcher` service (using `chokidar`) that monitors the vault directory and auto-triggers incremental syncs on file changes.
+
+To enable the watcher, call the watcher initialization in your OpenClaw setup or run:
+```bash
+node --input-type=module -e "
+import { startVaultWatcher } from './src/services/vault-watcher.js';
+import { loadConfig } from './src/utils/config.js';
+const config = loadConfig();
+const watcher = startVaultWatcher(config);
+console.log('Watching vault for changes...');
+// Keep process alive
+setInterval(() => {}, 10000);
+"
+```
+
+> **Note:** The watcher runs as a long-lived background process. Terminate it when shutting down.
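To make the `mtime` rule from 10.1 concrete, the sketch below shows one way such a change scan could work. It is an illustration under stated assumptions, not the code in `python/obsidian_rag/`: the function name `changed_markdown_files`, the epoch-seconds `last_sync` argument, and the two-entry default deny list are all hypothetical.

```python
import os
from pathlib import Path


def changed_markdown_files(
    vault: Path,
    last_sync: float,
    deny_dirs: frozenset[str] = frozenset({".obsidian", ".trash"}),
) -> list[Path]:
    """Return .md files under `vault` modified after `last_sync` (epoch seconds)."""
    changed = []
    for root, dirs, files in os.walk(vault):
        # Prune denied directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in deny_dirs]
        for name in files:
            path = Path(root) / name
            if path.suffix == ".md" and path.stat().st_mtime > last_sync:
                changed.append(path)
    return sorted(changed)
```

Only the files this returns would need re-chunking and re-embedding, which is why a sync after a few edits is so much cheaper than a full index. A scan like this cannot see deleted notes; detecting those requires comparing against the set of files recorded at the previous sync.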
+
+### 10.3 Force Rebuild
+
+If the index becomes corrupted or you change the chunking strategy:
+```bash
+obsidian-rag reindex
+```
+
+This drops the LanceDB table and rebuilds from scratch (equivalent to `obsidian-rag index`).
+
+### 10.4 After Upgrading the Plugin
+
+If you pull a new version of this plugin that changed the LanceDB schema or added new indexes (e.g., the FTS index on `chunk_text`), always reindex:
+```bash
+obsidian-rag reindex
+```
+
+---
+
+## 11. Troubleshooting
+
+### `FileNotFoundError: config.json`
+
+The CLI searches for config at:
+1. `./obsidian-rag/config.json` (relative to project root, where you run `obsidian-rag`)
+2. `~/.obsidian-rag/config.json` (home directory fallback)
+
+**Fix:** Ensure you run `obsidian-rag` from the project root (`$DESTINATION`), or verify the config file exists:
+```bash
+ls "$DESTINATION/obsidian-rag/config.json"
+```
+
+### `ERROR: Index not found. Run 'obsidian-rag index' first.`
+
+LanceDB table doesn't exist. This is normal on first install.
+
+**Fix:**
+```bash
+source .venv/bin/activate
+obsidian-rag index
+```
+
+### `ConnectionRefusedError` / `Ollama connection refused`
+
+Ollama is not running.
+
+**Fix:**
+```bash
+ollama serve &
+sleep 2
+curl http://localhost:11434/api/tags   # must return JSON
+```
+
+If on a remote machine, update `embedding.base_url` in `config.json`:
+```json
+"base_url": "http://192.168.1.100:11434"
+```
+
+### Vector search returns 0 results
+
+1. Check the index exists: `obsidian-rag status`
+2. Check Ollama model is available: `ollama list`
+3. Rebuild the index: `obsidian-rag reindex`
+
+### FTS (DEGRADED mode) not working after upgrade
+
+The FTS index on `chunk_text` was added in a recent change. **Reindex to rebuild with FTS:**
+
+```bash
+obsidian-rag reindex
+```
+
+### `npm run build` fails with TypeScript errors
+
+```bash
+npm run typecheck
+```
+
+Fix any type errors in `src/`, then rebuild. Common causes: missing type declarations, outdated `openclaw` package.
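The two-step config lookup described under `FileNotFoundError: config.json` can be reproduced in a few lines when you want to see which file the CLI would pick up. This is a sketch of that search order only; `find_config` is a hypothetical helper, and the real loader in `python/obsidian_rag/config.py` may differ.

```python
from pathlib import Path


def find_config(cwd: Path, home: Path) -> Path:
    """Return the first existing config.json, checking the project before the home dir."""
    candidates = [
        cwd / "obsidian-rag" / "config.json",    # 1. relative to where the CLI runs
        home / ".obsidian-rag" / "config.json",  # 2. home directory fallback
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"config.json not found; searched: {searched}")


if __name__ == "__main__":
    try:
        print("Using config:", find_config(Path.cwd(), Path.home()))
    except FileNotFoundError as exc:
        print(exc)
```

Running it from a directory other than the project root shows why the error appears: the first candidate is resolved against the current working directory, not against the location of the clone.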
+
+### Permission errors (Windows)
+
+Run your terminal as Administrator, or install Python/Ollama to user-writable directories (not `C:\Program Files`).
+
+### Very slow embedding (~minutes for 500 files)
+
+- Reduce `batch_size` in `config.json` to `32` or `16`
+- Ensure no other heavy processes are competing for CPU
+- Ollama embedding is CPU-bound on machines without AVX2/AVX512
+
+### Vault path contains spaces or special characters
+
+Use an absolute path. Inside a JSON string, spaces need no escaping; on Windows, each backslash must be doubled.
+
+**macOS/Linux:**
+```bash
+# In config.json, wrap the path in double quotes; spaces are written as-is:
+"vault_path": "/Users/your name/Documents/My Vault"
+```
+
+**Windows:**
+```json
+"vault_path": "C:\\Users\\yourname\\Documents\\My Vault"
+```
+
+### Plugin not appearing in `openclaw plugins list`
+
+1. Confirm `dist/index.js` exists (`ls -la dist/`)
+2. Check that `openclaw.plugin.json` is valid JSON
+3. Try re-registering: `openclaw plugin add "$DESTINATION"`
+4. Check OpenClaw's plugin discovery path — it may need to be in a specific directory like `~/.openclaw/plugins/`
+
+---
+
+## Quick Reference — All Commands in Order
+
+```bash
+# 1. Clone
+git clone https://github.com/YOUR_GITHUB_USER/obsidian-rag.git ~/dev/obsidian-rag
+cd ~/dev/obsidian-rag
+
+# 2. Install Ollama (if not installed)
+curl -fsSL https://ollama.com/install.sh | sh
+ollama serve &
+ollama pull mxbai-embed-large:335m
+
+# 3. Python venv + CLI
+python -m venv .venv
+source .venv/bin/activate
+pip install -e python/
+
+# 4. Node.js plugin
+npm install
+npm run build
+
+# 5. Edit config: set vault_path in obsidian-rag/config.json
+
+# 6. First-time index
+obsidian-rag index
+
+# 7. Register with OpenClaw
+openclaw plugin add ~/dev/obsidian-rag
+
+# 8. Verify
+obsidian-rag status
+openclaw plugins list
+```
+
+---
+
+## Project Layout Reference
+
+```
+obsidian-rag/                  # Project root (git-cloned)
+├── .git/                      # Git history
+├── .venv/                     # Python virtual environment (created in step 4)
+├── dist/
+│   └── index.js               # Built plugin bundle (created by npm run build)
+├── node_modules/              # npm packages (created by npm install)
+├── obsidian-rag/              # Runtime data directory (created on first index)
+│   ├── config.json            # Plugin configuration
+│   ├── vectors.lance/         # LanceDB vector store (created on first index)
+│   └── sync-result.json       # Last sync metadata
+├── openclaw.plugin.json       # Plugin manifest (do not edit — auto-generated)
+├── python/
+│   ├── obsidian_rag/          # Python package source
+│   │   ├── cli.py             # CLI entry point
+│   │   ├── config.py          # Config loader
+│   │   ├── indexer.py         # Full indexing pipeline
+│   │   ├── chunker.py         # Text chunking
+│   │   ├── embedder.py        # Ollama client
+│   │   ├── vector_store.py    # LanceDB CRUD
+│   │   └── security.py        # Path traversal, HTML strip
+│   └── tests/                 # 64 pytest tests
+├── src/
+│   ├── index.ts               # OpenClaw plugin entry (definePluginEntry)
+│   ├── tools/                 # Tool registrations + implementations
+│   ├── services/              # Health, watcher, indexer bridge
+│   └── utils/                 # Config, LanceDB, types, response
+├── package.json
+├── tsconfig.json
+└── vitest.config.ts
+```
+
+---
+
+*Last updated: 2026-04-11 — obsidian-rag v0.1.0*