Todo list
docs/superpowers/specs/obsidian-rag-design.md
@@ -0,0 +1,213 @@
# Obsidian RAG Plugin for OpenClaw — Design Spec

**Date:** 2026-04-10
**Status:** Approved
**Author:** Santhosh Janardhanan

## Overview

An OpenClaw plugin that enables semantic search through Obsidian vault notes using RAG (Retrieval-Augmented Generation). The plugin allows OpenClaw to respond to natural language queries about personal journal entries, shopping lists, financial records, health data, podcast notes, and project ideas stored in an Obsidian vault.

## Problem Statement

Personal knowledge is fragmented across 677+ markdown files in an Obsidian vault, organized by topic but not searchable by meaning. Questions like "How was my mental health in 2024?" or "How much do I owe Sreenivas?" require reading multiple files across directories and synthesizing the answer. The plugin provides semantic search to surface relevant context.

## Architecture

### Approach: Separate Indexer Service + Thin Plugin

```
KnowledgeVault → Python Indexer (CLI) → LanceDB (filesystem)
                                          ↑ query
OpenClaw → TS Plugin (tools) ─────────────┘
```

- **Python Indexer**: Handles vault scanning, markdown parsing, chunking, embedding generation via Ollama, and LanceDB storage. Runs as a CLI tool.
- **TypeScript Plugin**: Registers OpenClaw tools that query the pre-built LanceDB index. Thin wrapper that provides the agent interface.
- **LanceDB**: Embedded vector database stored on local filesystem at `~/.obsidian-rag/vectors.lance`. No server required.

## Technology Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Embedding model | `mxbai-embed-large` (1024-dim) via Ollama | Local, free, meets 1024+ dimension requirement, SOTA accuracy |
| Vector store | LanceDB (embedded) | No server, file-based, Rust-based efficiency, zero-copy versioning for incremental updates |
| Indexer language | Python | Richer embedding/ML ecosystem, better markdown parsing libraries |
| Plugin language | TypeScript | Native OpenClaw ecosystem, type safety, SDK examples |
| Config | Separate `~/.obsidian-rag/config.json` | Keeps plugin config separate from OpenClaw config |

## CLI Commands (Python Indexer)

| Command | Purpose |
|---------|---------|
| `obsidian-rag index` | Initial full index of the vault (first-time setup) |
| `obsidian-rag sync` | Incremental: only process files modified since the last sync |
| `obsidian-rag reindex` | Force a full reindex (delete the existing index and rebuild from scratch) |
| `obsidian-rag status` | Show index health: total docs, last sync time, unindexed files |

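The command table above could map onto `argparse` subcommands roughly as follows. This is a minimal sketch, not the project's `cli.py`: the function names `build_parser` and `main` are hypothetical, and the dispatch to real indexing code is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four subcommands listed in the spec."""
    parser = argparse.ArgumentParser(prog="obsidian-rag")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("index", help="Initial full index of the vault")
    sub.add_parser("sync", help="Process only files modified since the last sync")
    sub.add_parser("reindex", help="Delete the existing index and rebuild it")
    sub.add_parser("status", help="Show index health")
    return parser


def main(argv=None) -> str:
    """Parse arguments and return the selected command name.

    A real CLI would dispatch to the indexer here (assumption: that code
    lives elsewhere); this sketch only shows the command surface.
    """
    args = build_parser().parse_args(argv)
    return args.command
```

Keeping one subcommand per table row means an unknown command fails at parse time instead of reaching the indexer.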
## Plugin Tools (TypeScript)

### `obsidian_rag_search`
Primary search tool for the OpenClaw agent.

**Parameters:**
- `query` (required, string): Natural language question
- `max_results` (optional, default 5): Max chunks to return
- `directory_filter` (optional, string or string[]): Limit to subdirectories (e.g., `["Journal", "Entertainment Index"]`)
- `date_range` (optional, object): `{ from: "2025-01-01", to: "2025-12-31" }`
- `tags` (optional, string[]): Filter by hashtags

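The optional filters can be read as a predicate over each chunk's metadata. The sketch below expresses that predicate in Python for illustration only (the tool itself lives in the TypeScript plugin). The `Chunk` class and `matches_filters` are hypothetical names, and three behaviours are assumptions the spec does not state: the date range is inclusive, `tags` matches if any tag overlaps, and undated chunks are excluded when a `date_range` is given.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical subset of the per-chunk metadata used for filtering."""
    source_directory: str
    date: Optional[dt.date] = None
    tags: list = field(default_factory=list)


def matches_filters(chunk, directory_filter=None, date_range=None, tags=None) -> bool:
    """Return True if the chunk passes every filter that was supplied."""
    if directory_filter is not None:
        # Accept either a single directory name or a list of names.
        if isinstance(directory_filter, str):
            allowed = [directory_filter]
        else:
            allowed = list(directory_filter)
        if chunk.source_directory not in allowed:
            return False
    if date_range is not None:
        if chunk.date is None:
            return False  # assumption: undated chunks never match a date range
        start = dt.date.fromisoformat(date_range["from"])
        end = dt.date.fromisoformat(date_range["to"])
        if not (start <= chunk.date <= end):  # assumption: inclusive bounds
            return False
    if tags:
        # Assumption: any overlapping tag is enough to match.
        if not set(tags) & set(chunk.tags):
            return False
    return True
```

In practice this filtering would more likely be pushed down into the vector store query; the predicate form is only meant to pin down the intended semantics.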
### `obsidian_rag_index`
Trigger indexing from within OpenClaw.

**Parameters:**
- `mode` (required, enum): `"full"` | `"sync"` | `"reindex"`

### `obsidian_rag_status`
Check index health: document count, last sync time, unindexed files.

### `obsidian_rag_memory_store`
Commit important facts to OpenClaw's memory for faster future retrieval.

**Parameters:**
- `key` (string): Identifier
- `value` (string): The fact to remember
- `source` (string): Source file path

**Auto-suggest logic:** When search results contain financial, health, or commitment patterns, the plugin suggests the agent use `obsidian_rag_memory_store`. The agent decides whether to commit.

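One way to read the auto-suggest logic is as keyword matching against the `memory.patterns` lists from the configuration. The sketch below assumes plain case-insensitive substring matching, which the spec does not specify; `suggest_categories` is a hypothetical name.

```python
# Pattern lists copied from the "memory.patterns" block of the config file.
PATTERNS = {
    "financial": ["owe", "owed", "debt", "paid", "$", "spent", "spend"],
    "health": ["#mentalhealth", "#physicalhealth", "medication", "therapy"],
    "commitments": ["shopping list", "costco", "amazon", "grocery"],
}


def suggest_categories(text: str, patterns: dict = PATTERNS) -> list:
    """Return the pattern categories whose keywords occur in the text.

    Assumption: naive case-insensitive substring matching. This can produce
    false positives (for example "owe" inside "however"), so a real
    implementation might prefer word-boundary matching.
    """
    lowered = text.lower()
    return [
        category
        for category, needles in patterns.items()
        if any(needle.lower() in lowered for needle in needles)
    ]
```

A non-empty result would only be a hint attached to the search response; as stated above, the agent still decides whether to call `obsidian_rag_memory_store`.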
## Chunking Strategy

### Structured notes (Journal entries)
Chunk by section headers (`#mentalhealth`, `#finance`, etc.). Each section becomes its own chunk with metadata: `source_file`, `section_name`, `date`, `tags`.

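A minimal sketch of section-based chunking follows. It assumes that a section starts at a line containing only a hashtag such as `#mentalhealth`; the spec does not define the exact header syntax, so that rule, the regular expressions, and the name `chunk_by_sections` are all illustrative.

```python
import re

# Assumption: a section header is a line holding a single hashtag and nothing else.
SECTION_RE = re.compile(r"^(#[A-Za-z][\w-]*)\s*$")
TAG_RE = re.compile(r"#[A-Za-z][\w-]*")


def chunk_by_sections(text: str, source_file: str, date=None) -> list:
    """Split a journal entry into one chunk per hashtag-delimited section."""
    sections = []  # list of (section_name, [body lines])
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)
        # Lines before the first section header are ignored in this sketch.

    chunks = []
    for name, lines in sections:
        body = "\n".join(lines).strip()
        chunks.append({
            "source_file": source_file,
            "section_name": name,
            "date": date,
            # The section's own hashtag plus any hashtags inside the body.
            "tags": sorted({name, *TAG_RE.findall(body)}),
            "text": body,
        })
    return chunks
```

Because the header pattern requires a letter directly after `#`, ordinary Markdown headings such as `# Title` are not treated as section markers.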
### Unstructured notes (shopping lists, project ideas, entertainment index)
Sliding window chunking (500 tokens, 100-token overlap). Each chunk gets metadata: `source_file`, `chunk_index`, `total_chunks`, `headings`.

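The sliding window can be sketched as below. Whitespace-separated words stand in for tokens here, which is an assumption: the spec does not name a tokenizer, and real token counts for the embedding model will differ. `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, overlap: int = 100) -> list:
    """Split text into overlapping windows of at most `chunk_size` tokens.

    Assumption: tokens are whitespace-separated words, not model tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    tokens = text.split()
    step = chunk_size - overlap  # each window starts this many tokens after the last
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document

    total = len(windows)
    return [
        {"chunk_index": i, "total_chunks": total, "text": " ".join(window)}
        for i, window in enumerate(windows)
    ]
```

With the default settings each window starts 400 tokens after the previous one, so a 1,000-token note yields three chunks, and the last 100 tokens of one chunk reappear at the start of the next.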
### Metadata per chunk
- `source_file`: Relative path from vault root
- `source_directory`: Top-level directory (enables directory filtering)
- `section`: Section heading (for structured notes)
- `date`: Parsed from filename (journal entries)
- `tags`: All hashtags found in the chunk
- `chunk_index`: Position within the document
- `modified_at`: File mtime for incremental sync

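The metadata fields above could be modelled as a small record, together with the two derivations the list implies: parsing `date` from the filename and comparing `modified_at` during incremental sync. This sketch assumes journal filenames contain a `YYYY-MM-DD` date, which the spec does not confirm; `ChunkMetadata`, `build_metadata`, `parse_date_from_filename`, and `needs_sync` are hypothetical names.

```python
import datetime as dt
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional

# Assumption: journal filenames embed an ISO date such as 2025-03-14.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


@dataclass
class ChunkMetadata:
    source_file: str            # relative path from the vault root
    source_directory: str       # top-level directory, "" for files at the root
    chunk_index: int
    modified_at: float          # file mtime, used for incremental sync
    section: Optional[str] = None
    date: Optional[dt.date] = None
    tags: list = field(default_factory=list)


def parse_date_from_filename(name: str) -> Optional[dt.date]:
    """Return the first valid YYYY-MM-DD date in the filename, if any."""
    match = DATE_RE.search(name)
    if not match:
        return None
    try:
        return dt.date(int(match[1]), int(match[2]), int(match[3]))
    except ValueError:
        return None  # digits matched but do not form a real calendar date


def build_metadata(relative_path, chunk_index, modified_at, section=None, tags=()):
    path = PurePosixPath(relative_path)
    top_level = path.parts[0] if len(path.parts) > 1 else ""
    return ChunkMetadata(
        source_file=str(path),
        source_directory=top_level,
        chunk_index=chunk_index,
        modified_at=modified_at,
        section=section,
        date=parse_date_from_filename(path.name),
        tags=list(tags),
    )


def needs_sync(file_mtime: float, indexed_modified_at: Optional[float]) -> bool:
    """A file needs processing if it is new or newer than its indexed copy."""
    return indexed_modified_at is None or file_mtime > indexed_modified_at
```

Storing `modified_at` on every chunk lets a sync pass decide per file whether to re-chunk, without keeping a separate manifest.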
## Security & Privacy

1. **Path traversal prevention**: All file reads restricted to configured vault path. No `../`, symlinks outside vault, or absolute paths.
2. **Input sanitization**: Strip HTML tags, remove executable code blocks, normalize whitespace. All vault content treated as untrusted.
3. **Local-only enforcement**: Ollama on localhost, LanceDB on filesystem. Network audit test verifies no outbound requests.
4. **Directory allow/deny lists**: Config supports `deny_dirs` (default: `.obsidian`, `.trash`, `zzz-Archive`, `.git`) and `allow_dirs`.
5. **Sensitive content guard**: Detects health (`#mentalhealth`, `#physicalhealth`), financial debt, and personal relationship content. Blocks external API transmission of sensitive content. Requires user confirmation if an external embedding endpoint is configured.

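Items 1 and 4 can be combined into a single path check, sketched below under stated assumptions: paths arrive relative to the vault root, and a denied directory name anywhere in the path blocks access. `resolve_in_vault` is a hypothetical helper, not the project's `security.py`, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

# Default deny list taken from the spec.
DEFAULT_DENY_DIRS = (".obsidian", ".trash", "zzz-Archive", ".git")


def resolve_in_vault(vault_root, candidate, deny_dirs=DEFAULT_DENY_DIRS) -> Path:
    """Resolve a vault-relative path, rejecting anything outside the vault."""
    root = Path(vault_root).resolve()
    if Path(candidate).is_absolute():
        raise PermissionError("absolute paths are not allowed")

    # resolve() collapses "../" segments and follows symlinks, so a link
    # pointing outside the vault ends up outside `root` and is rejected below.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):
        raise PermissionError(f"path escapes the vault: {candidate}")

    # Assumption: a denied name at any depth blocks the path.
    if any(part in deny_dirs for part in resolved.relative_to(root).parts):
        raise PermissionError(f"path is inside a denied directory: {candidate}")
    return resolved
```

Checking the resolved path rather than the raw string is the important design choice: string checks for `../` miss symlinks and encoded variants, while containment of the resolved path covers both.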
## Configuration

Config file at `~/.obsidian-rag/config.json`:

```json
{
  "vault_path": "/home/san/KnowledgeVault/Default",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "base_url": "http://localhost:11434",
    "dimensions": 1024
  },
  "vector_store": {
    "type": "lancedb",
    "path": "~/.obsidian-rag/vectors.lance"
  },
  "indexing": {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "file_patterns": ["*.md"],
    "deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git"],
    "allow_dirs": []
  },
  "security": {
    "require_confirmation_for": ["health", "financial_debt"],
    "sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
    "local_only": true
  },
  "memory": {
    "auto_suggest": true,
    "patterns": {
      "financial": ["owe", "owed", "debt", "paid", "$", "spent", "spend"],
      "health": ["#mentalhealth", "#physicalhealth", "medication", "therapy"],
      "commitments": ["shopping list", "costco", "amazon", "grocery"]
    }
  }
}
```

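Because JSON does not expand `~`, a loader has to do so itself before the paths are used. The sketch below shows one possible loader with two sanity checks. Both checks are assumptions about how the fields are meant to interact (overlap smaller than chunk size, and `local_only` requiring a loopback `base_url`), and `load_config` and `normalize_config` are hypothetical names.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

DEFAULT_CONFIG_PATH = Path("~/.obsidian-rag/config.json")
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def normalize_config(config: dict) -> dict:
    """Validate a parsed config and expand "~" in its filesystem paths."""
    indexing = config["indexing"]
    if not 0 <= indexing["chunk_overlap"] < indexing["chunk_size"]:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Assumption: local_only means the embedding endpoint must be loopback.
    host = urlparse(config["embedding"]["base_url"]).hostname
    if config["security"]["local_only"] and host not in LOCAL_HOSTS:
        raise ValueError(f"local_only is set but base_url points at {host!r}")

    config["vault_path"] = str(Path(config["vault_path"]).expanduser())
    config["vector_store"]["path"] = str(
        Path(config["vector_store"]["path"]).expanduser()
    )
    return config


def load_config(path: Path = DEFAULT_CONFIG_PATH) -> dict:
    """Read the JSON config from disk and normalize it."""
    raw = Path(path).expanduser().read_text(encoding="utf-8")
    return normalize_config(json.loads(raw))
```

Failing fast at load time keeps a misconfigured remote endpoint from ever receiving vault content.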
## Project Structure

```
obsidian-rag-skill/
├── README.md
├── LICENSE
├── .gitignore
├── openclaw.plugin.json
├── package.json
├── tsconfig.json
├── src/
│   ├── index.ts
│   ├── tools/
│   │   ├── search.ts
│   │   ├── index.ts
│   │   ├── status.ts
│   │   └── memory.ts
│   ├── services/
│   │   ├── vault-watcher.ts
│   │   ├── indexer-bridge.ts
│   │   └── security-guard.ts
│   └── utils/
│       ├── config.ts
│       └── lancedb.ts
├── python/
│   ├── pyproject.toml
│   ├── obsidian_rag/
│   │   ├── __init__.py
│   │   ├── cli.py
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   ├── security.py
│   │   └── config.py
│   └── tests/
│       ├── test_chunker.py
│       ├── test_security.py
│       ├── test_embedder.py
│       ├── test_vector_store.py
│       └── test_indexer.py
├── tests/
│   ├── tools/
│   │   ├── search.test.ts
│   │   ├── index.test.ts
│   │   └── memory.test.ts
│   └── services/
│       ├── vault-watcher.test.ts
│       └── security-guard.test.ts
└── docs/
    └── superpowers/
        └── specs/
```

## Testing Strategy

- **Python**: pytest with mocked Ollama, path traversal tests, input sanitization, LanceDB CRUD
- **TypeScript**: vitest with tool parameter validation, security guard, search filter logic
- **Security**: Dedicated test suites for path traversal, XSS, prompt injection, network audit, sensitive content detection

## Publishing

Published to ClawHub as both a skill (SKILL.md) and a plugin package:

```bash
clawhub skill publish ./skill --slug obsidian-rag --version 1.0.0
clawhub package publish santhosh/obsidian-rag
```

Install: `openclaw plugins install clawhub:obsidian-rag`