Obsidian RAG Plugin for OpenClaw — Design Spec

Date: 2026-04-10
Status: Approved
Author: Santhosh Janardhanan

Overview

An OpenClaw plugin that enables semantic search through Obsidian vault notes using RAG (Retrieval-Augmented Generation). The plugin allows OpenClaw to respond to natural language queries about personal journal entries, shopping lists, financial records, health data, podcast notes, and project ideas stored in an Obsidian vault.

Problem Statement

Personal knowledge is fragmented across 677+ markdown files in an Obsidian vault, organized by topic but not searchable by meaning. Questions like "How was my mental health in 2024?" or "How much do I owe Sreenivas?" require reading multiple files across directories and synthesizing the answer. The plugin provides semantic search to surface relevant context.

Architecture

Approach: Separate Indexer Service + Thin Plugin

KnowledgeVault → Python Indexer (CLI) → LanceDB (filesystem)
                                           ↑ query
OpenClaw → TS Plugin (tools) ─────────────┘
  • Python Indexer: Handles vault scanning, markdown parsing, chunking, embedding generation via Ollama, and LanceDB storage. Runs as a CLI tool.
  • TypeScript Plugin: Registers OpenClaw tools that query the pre-built LanceDB index. Thin wrapper that provides the agent interface.
  • LanceDB: Embedded vector database stored on local filesystem at ~/.obsidian-rag/vectors.lance. No server required.
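
As a rough illustration of the embedding step in the indexer, the sketch below builds a request for Ollama's /api/embed endpoint using only the standard library. The function names (build_embed_request, embed) are hypothetical, and the choice of /api/embed is an assumption; the spec only says embeddings are generated via Ollama.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # value from the spec's config
EMBED_MODEL = "mxbai-embed-large"           # value from the spec's config


def build_embed_request(texts, model=EMBED_MODEL, base_url=OLLAMA_BASE_URL):
    """Build a POST request for Ollama's /api/embed endpoint (hypothetical helper)."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, timeout=60):
    """Return one vector per input text. Requires a running Ollama server."""
    with urllib.request.urlopen(build_embed_request(texts), timeout=timeout) as resp:
        body = json.load(resp)
    return body["embeddings"]
```

Keeping request construction separate from the network call makes the payload easy to check in tests without a live Ollama instance.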

Technology Choices

| Component | Choice | Rationale |
| --- | --- | --- |
| Embedding model | mxbai-embed-large (1024-dim) via Ollama | Local, free, meets 1024+ dimension requirement, SOTA accuracy |
| Vector store | LanceDB (embedded) | No server, file-based, Rust-based efficiency, zero-copy versioning for incremental updates |
| Indexer language | Python | Richer embedding/ML ecosystem, better markdown parsing libraries |
| Plugin language | TypeScript | Native OpenClaw ecosystem, type safety, SDK examples |
| Config | Separate .obsidian-rag/config.json | Keeps plugin config separate from OpenClaw config |

CLI Commands (Python Indexer)

| Command | Purpose |
| --- | --- |
| obsidian-rag index | Initial full index of the vault (first-time setup) |
| obsidian-rag sync | Incremental: only process files modified since last sync |
| obsidian-rag reindex | Force full reindex (nuke existing, start fresh) |
| obsidian-rag status | Show index health: total docs, last sync time, unindexed files |
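
One way the four commands might be wired up is with argparse subcommands. This is only a sketch: the dispatch is a placeholder, and the names build_parser and main are assumptions, not the actual contents of cli.py.

```python
import argparse

# Command names and purposes are taken from the table above.
COMMANDS = {
    "index": "Initial full index of the vault",
    "sync": "Process only files modified since the last sync",
    "reindex": "Drop the existing index and rebuild it",
    "status": "Show doc count, last sync time, unindexed files",
}


def build_parser():
    parser = argparse.ArgumentParser(prog="obsidian-rag")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        sub.add_parser(name, help=help_text)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder dispatch: a real CLI would call into the indexer here.
    print(f"would run: {args.command}")
    return args.command
```

Calling main(["sync"]) parses the subcommand and returns its name; an unknown or missing subcommand makes argparse exit with a usage message.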

Plugin Tools (TypeScript)

Primary search tool for OpenClaw agent.

Parameters:

  • query (required, string): Natural language question
  • max_results (optional, default 5): Max chunks to return
  • directory_filter (optional, string or string[]): Limit to subdirectories (e.g., ["Journal", "Entertainment Index"])
  • date_range (optional, object): { from: "2025-01-01", to: "2025-12-31" }
  • tags (optional, string[]): Filter by hashtags
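
The parameters above could be modeled and applied roughly as follows. This is a plain-Python sketch with hypothetical names (SearchParams, matches); a real implementation would more likely push these filters into the LanceDB query, and the "any tag matches" semantics is an assumption since the spec does not say whether tags combine with AND or OR.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SearchParams:
    query: str
    max_results: int = 5
    directory_filter: list = field(default_factory=list)
    date_from: Optional[date] = None
    date_to: Optional[date] = None
    tags: list = field(default_factory=list)

    def __post_init__(self):
        if not self.query.strip():
            raise ValueError("query is required")
        # Accept a single directory string as well as a list of strings.
        if isinstance(self.directory_filter, str):
            self.directory_filter = [self.directory_filter]


def matches(meta: dict, p: SearchParams) -> bool:
    """Return True if a chunk's metadata passes every filter that is set."""
    if p.directory_filter and meta.get("source_directory") not in p.directory_filter:
        return False
    chunk_date = meta.get("date")
    if p.date_from and (chunk_date is None or chunk_date < p.date_from):
        return False
    if p.date_to and (chunk_date is None or chunk_date > p.date_to):
        return False
    # Assumption: a chunk matches if it carries at least one requested tag.
    if p.tags and not set(p.tags) & set(meta.get("tags", [])):
        return False
    return True
```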

obsidian_rag_index

Trigger indexing from within OpenClaw.

Parameters:

  • mode (required, enum): "full" | "sync" | "reindex"

obsidian_rag_status

Check index health — doc count, last sync, unindexed files.

obsidian_rag_memory_store

Commit important facts to OpenClaw's memory for faster future retrieval.

Parameters:

  • key (string): Identifier
  • value (string): The fact to remember
  • source (string): Source file path

Auto-suggest logic: When search results contain financial, health, or commitment patterns, the plugin suggests the agent use obsidian_rag_memory_store. The agent decides whether to commit.
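
The auto-suggest check might look something like the sketch below, which scans result text for the keyword patterns listed in the configuration. The function names are hypothetical, and matching alphabetic keywords on word boundaries is an assumption made so that, for example, "owe" does not fire on "however".

```python
import re

# Keyword lists copied from the memory.patterns section of the config.
PATTERNS = {
    "financial": ["owe", "owed", "debt", "paid", "$", "spent", "spend"],
    "health": ["#mentalhealth", "#physicalhealth", "medication", "therapy"],
    "commitments": ["shopping list", "costco", "amazon", "grocery"],
}


def _keyword_found(keyword, lowered_text):
    # Plain words match on word boundaries; symbols, hashtags, and phrases
    # fall back to substring matching.
    if keyword.isalpha():
        return re.search(rf"\b{re.escape(keyword)}\b", lowered_text) is not None
    return keyword in lowered_text


def matched_categories(text, patterns=PATTERNS):
    lowered = text.lower()
    return [
        category
        for category, keywords in patterns.items()
        if any(_keyword_found(k.lower(), lowered) for k in keywords)
    ]


def should_suggest_memory_store(result_texts, patterns=PATTERNS):
    """Return the sorted categories seen across all search results."""
    categories = set()
    for text in result_texts:
        categories.update(matched_categories(text, patterns))
    return sorted(categories)
```

A non-empty return value would be the cue to suggest obsidian_rag_memory_store; the agent still decides whether to commit.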

Chunking Strategy

Structured notes (Journal entries)

Chunk by section headers (#mentalhealth, #finance, etc.). Each section becomes its own chunk with metadata: source_file, section_name, date, tags.
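
A minimal sketch of section-based chunking follows. It assumes a section header is any line that starts with one to six "#" characters followed by text (covering both "#mentalhealth" and "## Finance"); the actual header convention in the vault may differ, and chunk_by_sections is a hypothetical name.

```python
import re

# Assumption: a header line is 1-6 '#' characters followed by the section name.
HEADER_RE = re.compile(r"^(#{1,6})\s*(\S.*?)\s*$")


def _make_chunk(source_file, section_name, lines):
    body = "\n".join(lines).strip()
    if not body:
        return None
    return {"source_file": source_file, "section_name": section_name, "text": body}


def chunk_by_sections(text, source_file):
    """Split a note into one chunk per section header."""
    chunks = []
    section_name, lines = None, []
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            chunk = _make_chunk(source_file, section_name, lines)
            if chunk:
                chunks.append(chunk)
            section_name, lines = match.group(2), []
        else:
            lines.append(line)
    chunk = _make_chunk(source_file, section_name, lines)
    if chunk:
        chunks.append(chunk)
    return chunks
```

Text that appears before the first header becomes a chunk with section_name set to None, and empty sections are skipped.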

Unstructured notes (shopping lists, project ideas, entertainment index)

Sliding window chunking (500 tokens, 100 token overlap). Each chunk gets metadata: source_file, chunk_index, total_chunks, headings.
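
The sliding window can be sketched as below. Whitespace splitting stands in for a real tokenizer here, which is an assumption; token counts from the embedding model's tokenizer would differ. The function names are illustrative.

```python
def sliding_window(tokens, size=500, overlap=100):
    """Return overlapping windows; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return windows


def chunk_text(text, source_file, size=500, overlap=100):
    tokens = text.split()  # assumption: whitespace tokens approximate model tokens
    windows = sliding_window(tokens, size, overlap)
    return [
        {
            "source_file": source_file,
            "chunk_index": i,
            "total_chunks": len(windows),
            "text": " ".join(window),
        }
        for i, window in enumerate(windows)
    ]
```

With the default settings each window starts 400 tokens after the previous one, so a 1200-token note yields three chunks.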

Metadata per chunk

  • source_file: Relative path from vault root
  • source_directory: Top-level directory (enables directory filtering)
  • section: Section heading (for structured notes)
  • date: Parsed from filename (journal entries)
  • tags: All hashtags found in the chunk
  • chunk_index: Position within the document
  • modified_at: File mtime for incremental sync
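
The metadata fields above could be assembled along these lines. The sketch assumes journal filenames contain a YYYY-MM-DD date and that hashtags start with a letter; both are assumptions about the vault, and build_metadata is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import PurePosixPath

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")   # assumed filename date format
TAG_RE = re.compile(r"(?<!\S)#([A-Za-z][\w/-]*)")  # '#tag' but not '# Heading'


def parse_date(stem):
    match = DATE_RE.search(stem)
    if not match:
        return None
    try:
        return date(*map(int, match.groups())).isoformat()
    except ValueError:  # e.g. month 13
        return None


def build_metadata(rel_path, chunk_text, chunk_index, mtime, section=None):
    path = PurePosixPath(rel_path)
    return {
        "source_file": str(path),
        "source_directory": path.parts[0] if len(path.parts) > 1 else "",
        "section": section,
        "date": parse_date(path.stem),
        "tags": sorted({"#" + tag for tag in TAG_RE.findall(chunk_text)}),
        "chunk_index": chunk_index,
        "modified_at": mtime,
    }
```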

Security & Privacy

  1. Path traversal prevention: All file reads are restricted to the configured vault path. Paths containing ../, symlinks that resolve outside the vault, and absolute paths are rejected.
  2. Input sanitization: Strip HTML tags, remove executable code blocks, normalize whitespace. All vault content is treated as untrusted.
  3. Local-only enforcement: Ollama on localhost, LanceDB on filesystem. A network audit test verifies that no outbound requests are made.
  4. Directory allow/deny lists: Config supports deny_dirs (default: .obsidian, .trash, zzz-Archive, .git) and allow_dirs.
  5. Sensitive content guard: Detects health (#mentalhealth, #physicalhealth), financial debt, and personal relationship content. Blocks external API transmission of sensitive content. Requires user confirmation if an external embedding endpoint is configured.
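
Items 1, 2, and 4 might be approached as in the sketch below. The helper names are hypothetical, Path.is_relative_to needs Python 3.9+, and the sanitizer is deliberately crude: it drops every fenced code block rather than only executable ones, and regex-based HTML stripping is not a substitute for a real parser.

```python
import re
from pathlib import Path

DENY_DIRS = {".obsidian", ".trash", "zzz-Archive", ".git"}  # defaults from the spec


def resolve_in_vault(vault, candidate):
    """Resolve a vault-relative path, rejecting anything that escapes the vault."""
    vault_root = Path(vault).resolve()
    raw = Path(candidate)
    if raw.is_absolute():
        raise ValueError("absolute paths are not allowed")
    # resolve() collapses '../' segments and follows symlinks.
    resolved = (vault_root / raw).resolve()
    if not resolved.is_relative_to(vault_root):
        raise ValueError("path escapes the vault")
    if DENY_DIRS & set(resolved.relative_to(vault_root).parts):
        raise ValueError("path is inside a denied directory")
    return resolved


FENCE_RE = re.compile(r"```.*?```", re.DOTALL)
HTML_TAG_RE = re.compile(r"<[^>]+>")


def sanitize(text):
    """Drop fenced code blocks and HTML tags, then normalize whitespace."""
    text = FENCE_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()
```

Checking containment after resolution, rather than scanning the string for "..", is what catches symlinks that point outside the vault.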

Configuration

Config file at ~/.obsidian-rag/config.json:

{
  "vault_path": "/home/san/KnowledgeVault/Default",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "base_url": "http://localhost:11434",
    "dimensions": 1024
  },
  "vector_store": {
    "type": "lancedb",
    "path": "~/.obsidian-rag/vectors.lance"
  },
  "indexing": {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "file_patterns": ["*.md"],
    "deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git"],
    "allow_dirs": []
  },
  "security": {
    "require_confirmation_for": ["health", "financial_debt"],
    "sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
    "local_only": true
  },
  "memory": {
    "auto_suggest": true,
    "patterns": {
      "financial": ["owe", "owed", "debt", "paid", "$", "spent", "spend"],
      "health": ["#mentalhealth", "#physicalhealth", "medication", "therapy"],
      "commitments": ["shopping list", "costco", "amazon", "grocery"]
    }
  }
}
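
A loader for this file could look roughly like the following sketch. The validation rules shown (required keys, overlap smaller than chunk size, localhost check when local_only is set) are assumptions about what config.py might enforce, not behavior stated in the spec. Note that JSON does not expand "~", so the vector store path needs an explicit expanduser.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

REQUIRED_KEYS = ("vault_path", "embedding", "vector_store", "indexing")
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def validate_config(cfg):
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    indexing = cfg["indexing"]
    if not 0 <= indexing["chunk_overlap"] < indexing["chunk_size"]:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # JSON does not expand '~', so do it here.
    store = cfg["vector_store"]
    store["path"] = str(Path(store["path"]).expanduser())
    if cfg.get("security", {}).get("local_only", False):
        host = urlparse(cfg["embedding"]["base_url"]).hostname
        if host not in LOCAL_HOSTS:
            raise ValueError(f"local_only is set but embedding host is {host!r}")
    return cfg


def load_config(path="~/.obsidian-rag/config.json"):
    raw = Path(path).expanduser().read_text(encoding="utf-8")
    return validate_config(json.loads(raw))
```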

Project Structure

obsidian-rag-skill/
├── README.md
├── LICENSE
├── .gitignore
├── openclaw.plugin.json
├── package.json
├── tsconfig.json
├── src/
│   ├── index.ts
│   ├── tools/
│   │   ├── search.ts
│   │   ├── index.ts
│   │   ├── status.ts
│   │   └── memory.ts
│   ├── services/
│   │   ├── vault-watcher.ts
│   │   ├── indexer-bridge.ts
│   │   └── security-guard.ts
│   └── utils/
│       ├── config.ts
│       └── lancedb.ts
├── python/
│   ├── pyproject.toml
│   ├── obsidian_rag/
│   │   ├── __init__.py
│   │   ├── cli.py
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   ├── security.py
│   │   └── config.py
│   └── tests/
│       ├── test_chunker.py
│       ├── test_security.py
│       ├── test_embedder.py
│       ├── test_vector_store.py
│       └── test_indexer.py
├── tests/
│   ├── tools/
│   │   ├── search.test.ts
│   │   ├── index.test.ts
│   │   └── memory.test.ts
│   └── services/
│       ├── vault-watcher.test.ts
│       └── security-guard.test.ts
└── docs/
    └── superpowers/
        └── specs/

Testing Strategy

  • Python: pytest with mocked Ollama, path traversal tests, input sanitization, LanceDB CRUD
  • TypeScript: vitest with tool parameter validation, security guard, search filter logic
  • Security: Dedicated test suites for path traversal, XSS, prompt injection, network audit, sensitive content detection
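
As an illustration of the "mocked Ollama" approach, the sketch below injects the embedder as a function so a test can replace it with a mock. The index_chunks helper is hypothetical and stands in for whatever the real indexer does; the test function uses plain asserts, so pytest would collect it as-is.

```python
from unittest.mock import Mock


def index_chunks(chunks, embed_fn):
    """Hypothetical indexer step: attach one embedding to each chunk."""
    vectors = embed_fn([chunk["text"] for chunk in chunks])
    if len(vectors) != len(chunks):
        raise RuntimeError("embedder returned the wrong number of vectors")
    return [{**chunk, "vector": vector} for chunk, vector in zip(chunks, vectors)]


def test_index_chunks_with_mocked_ollama():
    # The mock replaces the Ollama call, so no server is needed.
    fake_embed = Mock(return_value=[[0.0] * 1024, [1.0] * 1024])
    chunks = [{"text": "a"}, {"text": "b"}]

    rows = index_chunks(chunks, fake_embed)

    fake_embed.assert_called_once_with(["a", "b"])
    assert [len(row["vector"]) for row in rows] == [1024, 1024]
    assert rows[0]["text"] == "a"
```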

Publishing

Published to ClawHub as both a skill (SKILL.md) and a plugin package:

clawhub skill publish ./skill --slug obsidian-rag --version 1.0.0
clawhub package publish santhosh/obsidian-rag

Install: openclaw plugins install clawhub:obsidian-rag