Obsidian RAG Plugin for OpenClaw — Design Spec
Date: 2026-04-10 Status: Approved Author: Santhosh Janardhanan
Overview
An OpenClaw plugin that enables semantic search through Obsidian vault notes using RAG (Retrieval-Augmented Generation). The plugin allows OpenClaw to respond to natural language queries about personal journal entries, shopping lists, financial records, health data, podcast notes, and project ideas stored in an Obsidian vault.
Problem Statement
Personal knowledge is fragmented across 677+ markdown files in an Obsidian vault, organized by topic but not searchable by meaning. Questions like "How was my mental health in 2024?" or "How much do I owe Sreenivas?" require reading multiple files across directories and synthesizing the answer. The plugin provides semantic search to surface relevant context.
Architecture
Approach: Separate Indexer Service + Thin Plugin
KnowledgeVault → Python Indexer (CLI) → LanceDB (filesystem)
                                           ↑ query
OpenClaw → TS Plugin (tools) ──────────────┘
- Python Indexer: Handles vault scanning, markdown parsing, chunking, embedding generation via Ollama, and LanceDB storage. Runs as a CLI tool.
- TypeScript Plugin: Registers OpenClaw tools that query the pre-built LanceDB index. Thin wrapper that provides the agent interface.
- LanceDB: Embedded vector database stored on the local filesystem at ~/.obsidian-rag/vectors.lance. No server required.
Technology Choices
| Component | Choice | Rationale |
|---|---|---|
| Embedding model | mxbai-embed-large (1024-dim) via Ollama | Local, free, meets 1024+ dimension requirement, SOTA accuracy |
| Vector store | LanceDB (embedded) | No server, file-based, Rust-based efficiency, zero-copy versioning for incremental updates |
| Indexer language | Python | Richer embedding/ML ecosystem, better markdown parsing libraries |
| Plugin language | TypeScript | Native OpenClaw ecosystem, type safety, SDK examples |
| Config | Separate .obsidian-rag/config.json | Keeps plugin config separate from OpenClaw config |
CLI Commands (Python Indexer)
| Command | Purpose |
|---|---|
| obsidian-rag index | Initial full index of the vault (first-time setup) |
| obsidian-rag sync | Incremental: only process files modified since last sync |
| obsidian-rag reindex | Force full reindex (nuke existing, start fresh) |
| obsidian-rag status | Show index health: total docs, last sync time, unindexed files |
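The four subcommands above could be wired up with the standard library's argparse. This is a minimal sketch, not the actual cli.py: the `COMMANDS` mapping, `build_parser`, and `main` are hypothetical names, and the dispatch step only echoes the chosen command.

```python
import argparse

# Subcommand names and help text come from the table above;
# the parser wiring and function names are assumptions for illustration.
COMMANDS = {
    "index": "Initial full index of the vault (first-time setup)",
    "sync": "Incremental: only process files modified since last sync",
    "reindex": "Force full reindex (drop the existing index, start fresh)",
    "status": "Show index health: total docs, last sync time, unindexed files",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per CLI command."""
    parser = argparse.ArgumentParser(prog="obsidian-rag")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        subparsers.add_parser(name, help=help_text)
    return parser


def main(argv=None) -> str:
    """Parse argv and return the selected command name."""
    args = build_parser().parse_args(argv)
    # A real CLI would call into the indexer here; this sketch only reports.
    print(f"would run: {args.command}")
    return args.command
```

Calling `main(["sync"])` selects the incremental path. An unknown subcommand makes argparse exit with a usage message.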
Plugin Tools (TypeScript)
obsidian_rag_search
Primary search tool for OpenClaw agent.
Parameters:
- query (required, string): Natural language question
- max_results (optional, default 5): Max chunks to return
- directory_filter (optional, string or string[]): Limit to subdirectories (e.g., ["Journal", "Entertainment Index"])
- date_range (optional, object): { from: "2025-01-01", to: "2025-12-31" }
- tags (optional, string[]): Filter by hashtags
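To make the filter semantics concrete, here is a small Python sketch of how these parameters might be applied to chunk metadata. The plugin itself is TypeScript, so this is illustrative only. `SearchParams`, `matches`, and `apply_filters` are hypothetical names, and two behaviours are assumptions the spec does not state: a chunk passes the tag filter if it carries any of the requested tags, and chunks without a date are excluded whenever a date range is given.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Union


@dataclass
class SearchParams:
    query: str
    max_results: int = 5
    directory_filter: Union[str, List[str]] = field(default_factory=list)
    date_from: Optional[str] = None  # ISO date, e.g. "2025-01-01"
    date_to: Optional[str] = None    # ISO date, e.g. "2025-12-31"
    tags: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The spec allows a single string or a list; normalise to a list.
        if isinstance(self.directory_filter, str):
            self.directory_filter = [self.directory_filter]


def matches(chunk: dict, params: SearchParams) -> bool:
    """Return True if the chunk's metadata passes every filter that is set."""
    if params.directory_filter and chunk.get("source_directory") not in params.directory_filter:
        return False
    if params.date_from or params.date_to:
        raw = chunk.get("date")
        if not raw:
            return False  # assumption: undated chunks drop out of dated searches
        chunk_date = date.fromisoformat(raw)
        if params.date_from and chunk_date < date.fromisoformat(params.date_from):
            return False
        if params.date_to and chunk_date > date.fromisoformat(params.date_to):
            return False
    # Assumption: "any of the requested tags" rather than "all of them".
    if params.tags and not set(params.tags) & set(chunk.get("tags", [])):
        return False
    return True


def apply_filters(chunks: List[dict], params: SearchParams) -> List[dict]:
    """Keep matching chunks, in their given order, up to max_results."""
    return [c for c in chunks if matches(c, params)][: params.max_results]
```

In practice the vector store would rank chunks by similarity first. The filters could then run either as a post-filter like this or be pushed down into the store's own query.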
obsidian_rag_index
Trigger indexing from within OpenClaw.
Parameters:
- mode (required, enum): "full" | "sync" | "reindex"
obsidian_rag_status
Check index health — doc count, last sync, unindexed files.
obsidian_rag_memory_store
Commit important facts to OpenClaw's memory for faster future retrieval.
Parameters:
- key (string): Identifier
- value (string): The fact to remember
- source (string): Source file path
Auto-suggest logic: When search results contain financial, health, or commitment patterns, the plugin suggests the agent use obsidian_rag_memory_store. The agent decides whether to commit.
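One way the auto-suggest check could work is plain keyword matching against the pattern lists from the configuration. The sketch below is an assumption about the matching rule, which the spec does not define. Terms that begin and end with a letter or digit are matched on word boundaries, so "owe" does not fire on "however". Symbol terms such as "$" are matched as substrings. `suggest_categories` is a hypothetical name.

```python
import re
from typing import Dict, List

# Pattern lists copied from the memory.patterns block of the config.
PATTERNS: Dict[str, List[str]] = {
    "financial": ["owe", "owed", "debt", "paid", "$", "spent", "spend"],
    "health": ["#mentalhealth", "#physicalhealth", "medication", "therapy"],
    "commitments": ["shopping list", "costco", "amazon", "grocery"],
}


def _term_regex(term: str) -> "re.Pattern[str]":
    """Compile a case-insensitive matcher for one term (assumed rule)."""
    prefix = r"(?<!\w)" if term[0].isalnum() else ""
    suffix = r"(?!\w)" if term[-1].isalnum() else ""
    return re.compile(prefix + re.escape(term) + suffix, re.IGNORECASE)


def suggest_categories(text: str, patterns: Dict[str, List[str]] = PATTERNS) -> List[str]:
    """Return the pattern categories whose terms appear in the text."""
    return [
        category
        for category, terms in patterns.items()
        if any(_term_regex(term).search(text) for term in terms)
    ]
```

A non-empty result would be the cue to suggest obsidian_rag_memory_store to the agent. The decision to commit stays with the agent.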
Chunking Strategy
Structured notes (Journal entries)
Chunk by section headers (#mentalhealth, #finance, etc.). Each section becomes its own chunk with metadata: source_file, section_name, date, tags.
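A minimal sketch of header-based chunking follows. It assumes a section starts at any line that begins with one to six `#` characters, which covers both a Markdown heading (`## Finance`) and a bare hashtag line (`#mentalhealth`). The spec does not say which form the journal uses, so that rule and the `chunk_by_sections` name are assumptions. Date and tag metadata would be attached separately.

```python
import re
from typing import List, Optional

# Assumed header rule: 1-6 '#' characters, optional space, then the section name.
HEADER_RE = re.compile(r"^#{1,6}\s*(\S.*?)\s*$")


def chunk_by_sections(text: str, source_file: str) -> List[dict]:
    """Split a note into one chunk per section header."""
    chunks: List[dict] = []
    section_name: Optional[str] = None  # None for text before the first header
    buffer: List[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:  # skip empty sections
            chunks.append(
                {"source_file": source_file, "section_name": section_name, "text": body}
            )

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            section_name = match.group(1)
            buffer = []
        else:
            buffer.append(line)
    flush()
    return chunks
```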
Unstructured notes (shopping lists, project ideas, entertainment index)
Sliding window chunking (500 tokens, 100 token overlap). Each chunk gets metadata: source_file, chunk_index, total_chunks, headings.
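The window arithmetic can be sketched as follows. With a 500-token window and a 100-token overlap, each window starts 400 tokens after the previous one. Whitespace splitting stands in for a real tokenizer here, which is an assumption: the spec does not name the tokenizer. `sliding_window` and `chunk_text` are hypothetical names.

```python
from typing import List


def sliding_window(tokens: List[str], size: int = 500, overlap: int = 100) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows: List[List[str]] = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
        start += step
    return windows


def chunk_text(text: str, source_file: str, size: int = 500, overlap: int = 100) -> List[dict]:
    """Chunk a note and attach positional metadata to each chunk."""
    # Assumption: whitespace-separated words approximate tokens.
    windows = sliding_window(text.split(), size, overlap)
    return [
        {
            "source_file": source_file,
            "chunk_index": index,
            "total_chunks": len(windows),
            "text": " ".join(window),
        }
        for index, window in enumerate(windows)
    ]
```

For a 1000-token note this yields three windows covering tokens 0-499, 400-899, and 800-999.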
Metadata per chunk
- source_file: Relative path from vault root
- source_directory: Top-level directory (enables directory filtering)
- section: Section heading (for structured notes)
- date: Parsed from filename (journal entries)
- tags: All hashtags found in the chunk
- chunk_index: Position within the document
- modified_at: File mtime for incremental sync
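The fields above could be assembled roughly like this. The sketch assumes journal filenames contain a `YYYY-MM-DD` date and that a hashtag is `#` followed by a letter and then word characters, `/`, or `-`. Neither format is stated in the spec, and `build_metadata` is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")   # assumed filename date format
TAG_RE = re.compile(r"(?<!\w)#[A-Za-z][\w/-]*")    # assumed hashtag syntax


def build_metadata(
    relative_path: str,
    chunk_text: str,
    chunk_index: int,
    modified_at: float,
    section: Optional[str] = None,
) -> dict:
    """Build the per-chunk metadata record described above."""
    path = PurePosixPath(relative_path)

    parsed_date = None
    match = DATE_RE.search(path.stem)
    if match:
        try:
            parsed_date = date(*map(int, match.groups())).isoformat()
        except ValueError:
            parsed_date = None  # digits matched the shape but are not a real date

    return {
        "source_file": relative_path,
        # Files directly in the vault root have no top-level directory.
        "source_directory": path.parts[0] if len(path.parts) > 1 else "",
        "section": section,
        "date": parsed_date,
        # dict.fromkeys removes duplicates while keeping first-seen order.
        "tags": list(dict.fromkeys(TAG_RE.findall(chunk_text))),
        "chunk_index": chunk_index,
        "modified_at": modified_at,
    }
```

During a sync, comparing a file's current mtime with the stored `modified_at` would be enough to decide whether its chunks need re-embedding.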
Security & Privacy
- Path traversal prevention: All file reads restricted to configured vault path. No ../, symlinks outside vault, or absolute paths.
- Input sanitization: Strip HTML tags, remove executable code blocks, normalize whitespace. All vault content treated as untrusted.
- Local-only enforcement: Ollama on localhost, LanceDB on filesystem. Network audit test verifies no outbound requests.
- Directory allow/deny lists: Config supports deny_dirs (default: .obsidian, .trash, zzz-Archive, .git) and allow_dirs.
- Sensitive content guard: Detects health (#mentalhealth, #physicalhealth), financial debt, and personal relationship content. Blocks external API transmission of sensitive content. Requires user confirmation if an external embedding endpoint is configured.
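The path-traversal and deny-list rules can be sketched with pathlib. Resolving the candidate path collapses `../` segments and follows symlinks, so a single containment check against the resolved vault root covers both cases. `resolve_in_vault` and `is_allowed` are hypothetical names. The sketch rejects a path if any of its components is in the deny list, which is an assumption about how deny_dirs is meant to apply.

```python
from pathlib import Path
from typing import Iterable

DEFAULT_DENY_DIRS = (".obsidian", ".trash", "zzz-Archive", ".git")


def resolve_in_vault(
    vault_root: str, requested: str, deny_dirs: Iterable[str] = DEFAULT_DENY_DIRS
) -> Path:
    """Return the resolved path, or raise PermissionError if it is not allowed."""
    root = Path(vault_root).resolve()
    candidate = Path(requested)
    if candidate.is_absolute():
        raise PermissionError("absolute paths are not allowed")

    # resolve() collapses '..' and follows symlinks before the containment check.
    resolved = (root / candidate).resolve()
    try:
        relative = resolved.relative_to(root)
    except ValueError:
        raise PermissionError("path escapes the vault") from None

    denied = set(deny_dirs)
    if any(part in denied for part in relative.parts):
        raise PermissionError("path is inside a denied directory")
    return resolved


def is_allowed(vault_root: str, requested: str) -> bool:
    """Boolean convenience wrapper around resolve_in_vault."""
    try:
        resolve_in_vault(vault_root, requested)
        return True
    except PermissionError:
        return False
```

For example, `is_allowed(vault, "Journal/2024-03-15.md")` passes, while `"../secrets.md"`, `"/etc/passwd"`, and `".obsidian/app.json"` are all rejected.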
Configuration
Config file at ~/.obsidian-rag/config.json:
{
  "vault_path": "/home/san/KnowledgeVault/Default",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "base_url": "http://localhost:11434",
    "dimensions": 1024
  },
  "vector_store": {
    "type": "lancedb",
    "path": "~/.obsidian-rag/vectors.lance"
  },
  "indexing": {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "file_patterns": ["*.md"],
    "deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git"],
    "allow_dirs": []
  },
  "security": {
    "require_confirmation_for": ["health", "financial_debt"],
    "sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
    "local_only": true
  },
  "memory": {
    "auto_suggest": true,
    "patterns": {
      "financial": ["owe", "owed", "debt", "paid", "$", "spent", "spend"],
      "health": ["#mentalhealth", "#physicalhealth", "medication", "therapy"],
      "commitments": ["shopping list", "costco", "amazon", "grocery"]
    }
  }
}
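A loader for this file might expand `~` in paths and check the values that other components rely on. The sketch below is one possible shape, not the actual config.py. `load_config` is a hypothetical name, and the two checks (overlap smaller than chunk size, and a loopback embedding host when local_only is true) are assumptions about what validation would be useful.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Hosts treated as local when security.local_only is true (assumed list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def load_config(raw_json: str) -> dict:
    """Parse the config text, expand '~' in paths, and validate key values."""
    config = json.loads(raw_json)

    config["vault_path"] = str(Path(config["vault_path"]).expanduser())
    store = config["vector_store"]
    store["path"] = str(Path(store["path"]).expanduser())

    indexing = config["indexing"]
    if not 0 <= indexing["chunk_overlap"] < indexing["chunk_size"]:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    host = urlparse(config["embedding"]["base_url"]).hostname
    if config["security"]["local_only"] and host not in LOCAL_HOSTS:
        raise ValueError(f"local_only is set but the embedding host is {host!r}")

    return config
```

Typical use would be `load_config(Path("~/.obsidian-rag/config.json").expanduser().read_text())`.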
Project Structure
obsidian-rag-skill/
├── README.md
├── LICENSE
├── .gitignore
├── openclaw.plugin.json
├── package.json
├── tsconfig.json
├── src/
│   ├── index.ts
│   ├── tools/
│   │   ├── search.ts
│   │   ├── index.ts
│   │   ├── status.ts
│   │   └── memory.ts
│   ├── services/
│   │   ├── vault-watcher.ts
│   │   ├── indexer-bridge.ts
│   │   └── security-guard.ts
│   └── utils/
│       ├── config.ts
│       └── lancedb.ts
├── python/
│   ├── pyproject.toml
│   ├── obsidian_rag/
│   │   ├── __init__.py
│   │   ├── cli.py
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   ├── security.py
│   │   └── config.py
│   └── tests/
│       ├── test_chunker.py
│       ├── test_security.py
│       ├── test_embedder.py
│       ├── test_vector_store.py
│       └── test_indexer.py
├── tests/
│   ├── tools/
│   │   ├── search.test.ts
│   │   ├── index.test.ts
│   │   └── memory.test.ts
│   └── services/
│       ├── vault-watcher.test.ts
│       └── security-guard.test.ts
└── docs/
    └── superpowers/
        └── specs/
Testing Strategy
- Python: pytest with mocked Ollama, path traversal tests, input sanitization, LanceDB CRUD
- TypeScript: vitest with tool parameter validation, security guard, search filter logic
- Security: Dedicated test suites for path traversal, XSS, prompt injection, network audit, sensitive content detection
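Mocking Ollama is easiest when the embedder takes its HTTP call as a parameter. The sketch below shows one way to structure that. `embed`, `_http_post`, and `fake_post` are hypothetical names. The request shape (POST to `/api/embeddings` with `model` and `prompt`, returning an `embedding` array) reflects my understanding of Ollama's HTTP API and should be checked against the installed version.

```python
import json
import urllib.request
from typing import Callable, List


def _http_post(url: str, payload: dict) -> dict:
    """Send a JSON POST request and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def embed(
    text: str,
    model: str = "mxbai-embed-large",
    base_url: str = "http://localhost:11434",
    dimensions: int = 1024,
    post: Callable[[str, dict], dict] = _http_post,
) -> List[float]:
    """Return the embedding for `text`, checking its dimensionality."""
    # Assumed endpoint and payload shape; verify against your Ollama version.
    body = post(f"{base_url}/api/embeddings", {"model": model, "prompt": text})
    vector = body["embedding"]
    if len(vector) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(vector)}")
    return vector


def fake_post(url: str, payload: dict) -> dict:
    """Test double: returns a fixed-size zero vector without any network call."""
    return {"embedding": [0.0] * 1024}
```

A pytest case can then call `embed("hello", post=fake_post)` and assert on the result. No running Ollama instance is needed, and no request leaves the machine.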
Publishing
Published to ClawHub as both a skill (SKILL.md) and a plugin package:
clawhub skill publish ./skill --slug obsidian-rag --version 1.0.0
clawhub package publish santhosh/obsidian-rag
Install: openclaw plugins install clawhub:obsidian-rag