Todo list

2026-04-10 19:15:38 -04:00
parent 2c976bb75b
commit 18ad47e100
2 changed files with 181 additions and 0 deletions

@@ -0,0 +1,181 @@
# Obsidian RAG Plugin - Work Queue
**Date:** 2026-04-10
**Based on:** Work Breakdown Structure v1.0
**Last Updated:** 2026-04-10
## Legend
- `[ ]` = Pending
- `[x]` = Done
- `[~]` = In Progress
- `[!]` = Error / Blocked
---
## Phase 0: Project Scaffolding & Environment
### 0.1 Repository & Build Setup
- [ ] **0.1.1** Initialize TypeScript project structure (S) - Create package.json, tsconfig.json, src/ directory
- [ ] **0.1.2** Initialize Python package structure (S) - Create pyproject.toml, obsidian_rag/ module skeleton
- [ ] **0.1.3** Create development config file (S) - Depends on 0.1.1 - Create ./obsidian-rag/config.json
- [ ] **0.1.4** Set up OpenClaw plugin manifest (S) - Depends on 0.1.1 - Create openclaw.plugin.json
- [ ] **0.1.5** Configure test runners (S) - Depends on 0.1.1, 0.1.2 - Setup vitest and pytest configs
### 0.2 Environment Validation
- [ ] **0.2.1** Verify Ollama + mxbai-embed-large (S) - Test embedding API
- [ ] **0.2.2** Verify LanceDB Python package (S) - Test table creation and queries
- [ ] **0.2.3** Verify sample vault accessibility (S) - Count .md files in KnowledgeVault
---
## Phase 1: Data Layer (Python Indexer)
### 1.1 Configuration (Python)
- [ ] **1.1.1** Implement config loader (S) - Depends on 0.1.2 - Read JSON, resolve paths, validate schema
- [ ] **1.1.2** Write config tests (S) - Depends on 1.1.1 - Test validation and path resolution
### 1.2 Security (Python) - Can start after 1.1.1, parallel with other components
- [ ] **1.2.1** Implement path traversal prevention (S) - Depends on 1.1.1 - Validate paths, reject ../ and symlinks
- [ ] **1.2.2** Implement input sanitization (S) - Depends on 1.1.1 - Strip HTML, normalize whitespace
- [ ] **1.2.3** Implement sensitive content detection (S) - Depends on 1.1.1 - Detect health/financial/relations content
- [ ] **1.2.4** Implement directory access control (S) - Depends on 1.1.1 - Apply deny/allow lists
- [ ] **1.2.5** Write security tests (M) - Depends on 1.2.1-1.2.4 - Test all security functions
### 1.3 Chunking - Can start after 1.1.1, parallel with security
- [ ] **1.3.1** Implement markdown parser (S) - Depends on 0.1.2 - Parse frontmatter, headings, tags
- [ ] **1.3.2** Implement structured chunker (M) - Depends on 1.3.1 - Split by section headers
- [ ] **1.3.3** Implement sliding window chunker (S) - Depends on 1.3.1 - 500 token window with overlap
- [ ] **1.3.4** Implement chunk router (S) - Depends on 1.3.2, 1.3.3 - Route structured vs unstructured
- [ ] **1.3.5** Write chunker tests (M) - Depends on 1.3.4 - Test all chunking scenarios
### 1.4 Embedding - Can start after 1.1.1, parallel with chunking/security
- [ ] **1.4.1** Implement Ollama embedder (M) - Depends on 1.1.1 - Batch 64 chunks, error handling
- [ ] **1.4.2** Implement embedding cache (S) - Depends on 1.4.1 - File-based cache
- [ ] **1.4.3** Write embedder tests (S) - Depends on 1.4.1, 1.4.2 - Test batching and cache
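Task 1.4.1 calls for sending chunks to the embedder in batches of 64. The following is a minimal sketch of the batching step only. It assumes a hypothetical `embed_fn` callable that accepts a list of strings and returns one vector per string; the actual Ollama request is deliberately left out, and the function names are illustrative.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    chunks: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[Vector]],  # hypothetical embedder hook
    batch_size: int = 64,
) -> List[Vector]:
    """Embed chunks batch by batch, preserving input order."""
    vectors: List[Vector] = []
    for batch in batched(chunks, batch_size):
        result = embed_fn(batch)
        # Fail loudly if the embedder drops or duplicates items.
        if len(result) != len(batch):
            raise RuntimeError(
                f"embedder returned {len(result)} vectors for {len(batch)} chunks"
            )
        vectors.extend(result)
    return vectors
```

Keeping the embedder behind a callable also makes the "mocked Ollama" tests in 1.4.3 straightforward, since a fake function can stand in for the network call.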
### 1.5 Vector Store - Can start after 0.2.2, parallel with other components
- [ ] **1.5.1** Implement LanceDB table creation (S) - Depends on 0.2.2 - Create obsidian_chunks table
- [ ] **1.5.2** Implement vector upsert (S) - Depends on 1.5.1 - Add/update chunks
- [ ] **1.5.3** Implement vector delete (S) - Depends on 1.5.1 - Remove by source_file
- [ ] **1.5.4** Implement vector search (M) - Depends on 1.5.1 - Query with filters
- [ ] **1.5.5** Write vector store tests (M) - Depends on 1.5.2-1.5.4 - Test CRUD operations
### 1.6 Indexer Pipeline & CLI - Depends on multiple components
- [ ] **1.6.1** Implement full index pipeline (M) - Depends on 1.2.4, 1.3.4, 1.4.1, 1.5.2 - Scan → parse → chunk → embed → store
- [ ] **1.6.2** Implement incremental sync (M) - Depends on 1.6.1, 1.5.3 - Compare mtime, process changes
- [ ] **1.6.3** Implement reindex (S) - Depends on 1.6.1 - Drop table + rebuild
- [ ] **1.6.4** Implement sync-result.json writer (S) - Depends on 1.6.1 - Atomic file writing
- [ ] **1.6.5** Implement CLI entry point (M) - Depends on 1.6.1, 1.6.2, 1.6.3 - index/sync/reindex commands
- [ ] **1.6.6** Write indexer tests (M) - Depends on 1.6.5 - Test full pipeline and CLI
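Task 1.6.4 asks for atomic writing of `sync-result.json`, so the TypeScript reader never sees a half-written file. One common way to do this is to write to a temporary file in the same directory and rename it into place. The sketch below assumes that approach; the function name and the payload fields are hypothetical, since the file's schema is not defined here.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_json_atomic(path: Path, payload: Dict[str, Any]) -> None:
    """Write JSON via a temp file plus rename so readers see old or new content, never a mix."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)  # replaces any existing file in one step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```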
---
## Phase 2: Data Layer (TypeScript Client)
### 2.1 Configuration (TypeScript) - Can start after 0.1.1, parallel with Phase 1
- [ ] **2.1.1** Implement config loader (S) - Depends on 0.1.1 - Read JSON, validate schema
- [ ] **2.1.2** Implement config types (S) - Depends on 2.1.1 - TypeScript interfaces
### 2.2 LanceDB Client - Depends on Phase 1 completion
- [ ] **2.2.1** Implement LanceDB query client (M) - Depends on 0.1.1 - Connect and search
- [ ] **2.2.2** Implement full-text search fallback (S) - Depends on 2.2.1 - Degraded mode
### 2.3 Indexer Bridge - Depends on Phase 1 completion
- [ ] **2.3.1** Implement subprocess spawner (M) - Depends on 0.1.1 - Spawn Python CLI
- [ ] **2.3.2** Implement sync-result reader (S) - Depends on 2.3.1 - Read sync results
- [ ] **2.3.3** Implement job tracking (S) - Depends on 2.3.1 - Track progress
---
## Phase 3: Session & Transport Layers
### 3.1 Health State Machine - Depends on Phase 2
- [ ] **3.1.1** Implement health prober (S) - Depends on 2.1.1, 2.2.1 - Probe dependencies
- [ ] **3.1.2** Implement state machine (S) - Depends on 3.1.1 - HEALTHY/DEGRADED/UNAVAILABLE
- [ ] **3.1.3** Implement staleness detector (S) - Depends on 3.1.2, 2.3.2 - Detect stale syncs
### 3.2 Vault Watcher - Depends on Phase 2
- [ ] **3.2.1** Implement file watcher (S) - Depends on 2.1.1 - Watch vault directory
- [ ] **3.2.2** Implement debounce & batching (M) - Depends on 3.2.1 - Batch changes
- [ ] **3.2.3** Implement auto-sync trigger (M) - Depends on 3.2.2, 2.3.1, 3.1.2 - Trigger sync
- [ ] **3.2.4** Write vault watcher tests (M) - Depends on 3.2.3 - Test watcher behavior
### 3.3 Response Envelope & Error Normalization - Can start after 0.1.1, parallel
- [ ] **3.3.1** Implement response envelope factory (S) - Depends on 0.1.1 - Build response structure
- [ ] **3.3.2** Implement error normalizer (S) - Depends on 3.3.1 - Map exceptions to codes
### 3.4 Security Guard (TypeScript) - Can start after 2.1.1, parallel with 3.1-3.2
- [ ] **3.4.1** Implement directory filter validator (S) - Depends on 2.1.1 - Validate filters
- [ ] **3.4.2** Implement sensitive content flag (S) - Depends on 3.4.1 - Flag sensitive content
- [ ] **3.4.3** Write security guard tests (S) - Depends on 3.4.2 - Test security functions
---
## Phase 4: Tool Layer
### 4.1 Tool Implementations - Depends on Phase 3
- [ ] **4.1.1** Implement obsidian_rag_search tool (M) - Depends on 2.2.1, 3.3.1, 3.4.2 - Search with filters
- [ ] **4.1.2** Implement obsidian_rag_index tool (M) - Depends on 2.3.1, 2.3.3, 3.3.1 - Spawn indexer
- [ ] **4.1.3** Implement obsidian_rag_status tool (S) - Depends on 3.1.2, 2.3.2, 3.3.1 - Return health status
- [ ] **4.1.4** Implement obsidian_rag_memory_store tool (S) - Depends on 3.3.1 - Persist to memory
- [ ] **4.1.5** Write tool unit tests (M) - Depends on 4.1.1-4.1.4 - Test all tools
### 4.2 Plugin Registration - Depends on tools
- [ ] **4.2.1** Implement plugin entry point (M) - Depends on 4.1.1-4.1.4, 3.2.3, 3.1.2 - Plugin lifecycle
- [ ] **4.2.2** Verify OpenClaw plugin lifecycle (S) - Depends on 4.2.1 - Manual test
---
## Phase 5: Integration & Hardening
### 5.1 Integration Tests - Depends on Phase 4
- [ ] **5.1.1** Full pipeline integration test (M) - Depends on 1.6.5, 4.2.1 - Index → search
- [ ] **5.1.2** Sync cycle integration test (M) - Depends on 3.2.3, 5.1.1 - Modify → auto-sync → search
- [ ] **5.1.3** Health state integration test (S) - Depends on 3.1.2, 5.1.1 - Test state transitions
- [ ] **5.1.4** OpenClaw protocol integration test (M) - Depends on 4.2.1 - Test all tools
### 5.2 Security Test Suite - Depends on relevant components
- [ ] **5.2.1** Path traversal tests (S) - Depends on 1.2.1, 3.4.1 - Test ../, symlinks, Windows paths
- [ ] **5.2.2** XSS prevention tests (S) - Depends on 1.2.2 - Test HTML injection
- [ ] **5.2.3** Prompt injection tests (S) - Depends on 4.1.1 - Test malicious content
- [ ] **5.2.4** Network audit test (S) - Depends on 1.4.1 - Verify no outbound requests
- [ ] **5.2.5** Sensitive content tests (S) - Depends on 1.2.3, 3.4.2 - Test detection and flagging
### 5.3 Documentation & Publishing - Depends on integration tests
- [ ] **5.3.1** Write README (S) - Depends on 4.2.1 - Usage and setup docs
- [ ] **5.3.2** Create SKILL.md (S) - Depends on 4.2.1 - Skill manifest
- [ ] **5.3.3** Publish to ClawHub (S) - Depends on 5.1.1-5.2.5 - Publish skill
---
## Progress Summary
| Phase | Tasks | Done | Pending | In Progress | Blocked |
|-------|-------|------|---------|-------------|---------|
| Phase 0: Scaffolding | 8 | 0 | 8 | 0 | 0 |
| Phase 1: Python Indexer | 26 | 0 | 26 | 0 | 0 |
| Phase 2: TS Client | 7 | 0 | 7 | 0 | 0 |
| Phase 3: Session/Transport | 12 | 0 | 12 | 0 | 0 |
| Phase 4: Tool Layer | 7 | 0 | 7 | 0 | 0 |
| Phase 5: Integration | 12 | 0 | 12 | 0 | 0 |
| **Total** | **72** | **0** | **72** | **0** | **0** |
---
## Critical Path
1. Phase 0 → Phase 1 → Phase 2 → Phase 3 → Phase 4 → Phase 5
2. 0.1.1-0.1.5 → 1.1.1 → 1.3.1 → 1.6.1 → 2.2.1 → 3.1.1 → 3.2.1 → 4.1.1 → 4.2.1 → 5.1.1
## Parallel Work Opportunities
- **After 1.1.1**: Security (1.2), Chunking (1.3), Embedding (1.4) can work in parallel
- **After 0.2.2**: Vector Store (1.5) can work in parallel with other components
- **After 0.1.1**: TypeScript Config (2.1) can start early
- **Phase 3**: Response Envelope (3.3) and Security Guard (3.4) can work in parallel with Health (3.1) and Watcher (3.2)
## Effort Estimates
- **Small tasks (S)**: 50 tasks (~1-2 sessions each)
- **Medium tasks (M)**: 22 tasks (~3-5 sessions each)
- **Total**: 116-210 sessions across all phases
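The task counts and session totals in this file can be recomputed from the checklist itself rather than maintained by hand. The sketch below is one possible tally script, assuming task lines keep the `- [ ] **N.N.N** Title (S|M) - ...` shape used above; the function name and return shape are illustrative.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Matches lines such as "- [ ] **1.2.5** Write security tests (M) - ..."
TASK_RE = re.compile(
    r"^- \[(?P<state>[ x~!])\] \*\*(?P<id>\d+\.\d+\.\d+)\*\* .*?\((?P<size>[SM])\)"
)

# Session ranges per size, taken from the estimates in this section.
SESSIONS = {"S": (1, 2), "M": (3, 5)}


def tally(lines: Iterable[str]) -> Tuple[Dict[str, int], Counter, Tuple[int, int]]:
    """Return (tasks per phase, tasks per size, (min, max) total sessions)."""
    per_phase: Counter = Counter()
    per_size: Counter = Counter()
    for line in lines:
        match = TASK_RE.match(line.strip())
        if not match:
            continue  # headings, legend entries, and table rows are skipped
        per_phase[match["id"].split(".")[0]] += 1
        per_size[match["size"]] += 1
    low = sum(SESSIONS[size][0] * count for size, count in per_size.items())
    high = sum(SESSIONS[size][1] * count for size, count in per_size.items())
    return dict(per_phase), per_size, (low, high)
```

Running `tally(open("WORK_QUEUE.md", encoding="utf-8"))` against this file (the filename is a placeholder) would yield the figures for the Progress Summary and Effort Estimates sections.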

@@ -0,0 +1,213 @@
# Obsidian RAG Plugin for OpenClaw — Design Spec
**Date:** 2026-04-10
**Status:** Approved
**Author:** Santhosh Janardhanan
## Overview
An OpenClaw plugin that enables semantic search through Obsidian vault notes using RAG (Retrieval-Augmented Generation). The plugin allows OpenClaw to respond to natural language queries about personal journal entries, shopping lists, financial records, health data, podcast notes, and project ideas stored in an Obsidian vault.
## Problem Statement
Personal knowledge is fragmented across 677+ markdown files in an Obsidian vault, organized by topic but not searchable by meaning. Questions like "How was my mental health in 2024?" or "How much do I owe Sreenivas?" require reading multiple files across directories and synthesizing the answer. The plugin provides semantic search to surface relevant context.
## Architecture
### Approach: Separate Indexer Service + Thin Plugin
```
KnowledgeVault → Python Indexer (CLI) → LanceDB (filesystem)
                                          ↑ query
OpenClaw → TS Plugin (tools) ─────────────┘
```
- **Python Indexer**: Handles vault scanning, markdown parsing, chunking, embedding generation via Ollama, and LanceDB storage. Runs as a CLI tool.
- **TypeScript Plugin**: Registers OpenClaw tools that query the pre-built LanceDB index. Thin wrapper that provides the agent interface.
- **LanceDB**: Embedded vector database stored on local filesystem at `~/.obsidian-rag/vectors.lance`. No server required.
## Technology Choices
| Component | Choice | Rationale |
|-----------|--------|-----------|
| Embedding model | `mxbai-embed-large` (1024-dim) via Ollama | Local, free, meets the 1024+ dimension requirement, strong retrieval accuracy for a locally hosted model |
| Vector store | LanceDB (embedded) | No server, file-based, Rust-based efficiency, zero-copy versioning for incremental updates |
| Indexer language | Python | Richer embedding/ML ecosystem, better markdown parsing libraries |
| Plugin language | TypeScript | Native OpenClaw ecosystem, type safety, SDK examples |
| Config | Separate `.obsidian-rag/config.json` | Keeps plugin config separate from OpenClaw config |
## CLI Commands (Python Indexer)
| Command | Purpose |
|---------|---------|
| `obsidian-rag index` | Initial full index of the vault (first-time setup) |
| `obsidian-rag sync` | Incremental — only process files modified since last sync |
| `obsidian-rag reindex` | Force full reindex (nuke existing, start fresh) |
| `obsidian-rag status` | Show index health: total docs, last sync time, unindexed files |
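The four commands map naturally onto subcommands of a single entry point. A minimal sketch using the standard library's `argparse` follows. The `--config` flag and the `main` return value are assumptions for illustration; the spec does not define CLI options or dispatch behavior.

```python
import argparse
from typing import List, Optional

# Subcommand names and help strings follow the table above.
COMMANDS = {
    "index": "Initial full index of the vault",
    "sync": "Process only files modified since the last sync",
    "reindex": "Drop the existing index and rebuild it",
    "status": "Show total docs, last sync time, unindexed files",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="obsidian-rag")
    # --config is an assumed flag, not part of the spec.
    parser.add_argument("--config", default="~/.obsidian-rag/config.json")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        subparsers.add_parser(name, help=help_text)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # A real entry point would dispatch to the indexer here; this sketch returns the command name.
    return args.command
```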
## Plugin Tools (TypeScript)
### `obsidian_rag_search`
Primary search tool for OpenClaw agent.
**Parameters:**
- `query` (required, string): Natural language question
- `max_results` (optional, default 5): Max chunks to return
- `directory_filter` (optional, string or string[]): Limit to subdirectories (e.g., `["Journal", "Entertainment Index"]`)
- `date_range` (optional, object): `{ from: "2025-01-01", to: "2025-12-31" }`
- `tags` (optional, string[]): Filter by hashtags
### `obsidian_rag_index`
Trigger indexing from within OpenClaw.
**Parameters:**
- `mode` (required, enum): `"full"` | `"sync"` | `"reindex"`
### `obsidian_rag_status`
Check index health — doc count, last sync, unindexed files.
### `obsidian_rag_memory_store`
Commit important facts to OpenClaw's memory for faster future retrieval.
**Parameters:**
- `key` (string): Identifier
- `value` (string): The fact to remember
- `source` (string): Source file path
**Auto-suggest logic:** When search results contain financial, health, or commitment patterns, the plugin suggests the agent use `obsidian_rag_memory_store`. The agent decides whether to commit.
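The matching step behind auto-suggest can be sketched as a keyword scan over the result text. The plugin itself is TypeScript; the Python below only illustrates the logic, assuming case-insensitive substring matching against the `memory.patterns` lists from the config. The function name is hypothetical.

```python
from typing import Dict, List


def suggest_memory_categories(text: str, patterns: Dict[str, List[str]]) -> List[str]:
    """Return the pattern categories whose keywords occur in `text` (case-insensitive)."""
    lowered = text.lower()
    return [
        category
        for category, keywords in patterns.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]
```

Plain substring matching is deliberately naive: `owe` also matches `however`, and `$` matches any price. A word-boundary match would likely reduce false suggestions, at the cost of a little more code.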
## Chunking Strategy
### Structured notes (Journal entries)
Chunk by section headers (`#mentalhealth`, `#finance`, etc.). Each section becomes its own chunk with metadata: `source_file`, `section`, `date`, `tags`.
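A minimal sketch of section-based chunking, under the assumption that a section starts at any line beginning with `#` followed by text (which covers both `#mentalhealth` and `## Finance`). The function name and the dictionary shape are illustrative, and `date` and `tags` extraction is omitted.

```python
import re
from typing import Dict, List

# One to six '#' characters, optional whitespace, then the section name.
HEADER_RE = re.compile(r"^#{1,6}\s*(?P<name>[^#\s].*)$")


def chunk_by_sections(text: str, source_file: str) -> List[Dict[str, str]]:
    """Split a note into one chunk per section; text before the first header gets an empty section name."""
    chunks: List[Dict[str, str]] = []
    section = ""
    buffer: List[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:  # sections without any body text are dropped
            chunks.append({"source_file": source_file, "section": section, "text": body})

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            section, buffer = match["name"].strip(), []
        else:
            buffer.append(line)
    flush()
    return chunks
```

One caveat of this assumption: a body line that happens to start with a hashtag, such as `#todo buy milk`, would be treated as a new section.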
### Unstructured notes (shopping lists, project ideas, entertainment index)
Sliding window chunking (500 tokens, 100 token overlap). Each chunk gets metadata: `source_file`, `chunk_index`, `total_chunks`, `headings`.
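A minimal sketch of the sliding window, assuming the note has already been split into a list of tokens. How tokens are produced (the embedding model's tokenizer, or a whitespace split as a rough stand-in) is left open here, and the function name is illustrative.

```python
from typing import List


def sliding_window_chunks(
    tokens: List[str], size: int = 500, overlap: int = 100
) -> List[List[str]]:
    """Return windows of up to `size` tokens; consecutive windows share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each window starts this many tokens after the previous one
    windows: List[List[str]] = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return windows
```

With the defaults, each window starts 400 tokens after the previous one, so a 1000-token note produces windows covering tokens 0-499, 400-899, and 800-999.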
### Metadata per chunk
- `source_file`: Relative path from vault root
- `source_directory`: Top-level directory (enables directory filtering)
- `section`: Section heading (for structured notes)
- `date`: Parsed from filename (journal entries)
- `tags`: All hashtags found in the chunk
- `chunk_index`: Position within the document
- `modified_at`: File mtime for incremental sync
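The `modified_at` field is what makes incremental sync possible: comparing it with the current file mtime tells the indexer which files to re-embed and which to drop. A minimal sketch of that comparison follows, assuming both sides are available as mappings from `source_file` to mtime; the names `SyncPlan` and `plan_sync` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    to_index: List[str]   # new or modified files
    to_delete: List[str]  # files indexed earlier but no longer in the vault


def plan_sync(on_disk: Dict[str, float], indexed: Dict[str, float]) -> SyncPlan:
    """Compare current mtimes with the `modified_at` values stored in the index."""
    to_index = sorted(
        path
        for path, mtime in on_disk.items()
        if path not in indexed or mtime > indexed[path]
    )
    to_delete = sorted(path for path in indexed if path not in on_disk)
    return SyncPlan(to_index, to_delete)
```

For a modified file, the existing chunks would need to be removed by `source_file` before the new ones are stored, otherwise stale chunks remain searchable.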
## Security & Privacy
1. **Path traversal prevention** — All file reads restricted to configured vault path. No `../`, symlinks outside vault, or absolute paths.
2. **Input sanitization** — Strip HTML tags, remove executable code blocks, normalize whitespace. All vault content treated as untrusted.
3. **Local-only enforcement** — Ollama on localhost, LanceDB on filesystem. Network audit test verifies no outbound requests.
4. **Directory allow/deny lists** — Config supports `deny_dirs` (default: `.obsidian`, `.trash`, `zzz-Archive`, `.git`) and `allow_dirs`.
5. **Sensitive content guard** — Detects health (`#mentalhealth`, `#physicalhealth`), financial debt, and personal relationship content. Blocks external API transmission of sensitive content. Requires user confirmation if an external embedding endpoint is configured.
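Items 1 and 4 can be combined into a single path check. The sketch below is one way to do it with `pathlib`, under two assumptions not stated in the spec: denied directory names are matched against every path component, and `allow_dirs` handling is omitted. The function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable

DEFAULT_DENY_DIRS = (".obsidian", ".trash", "zzz-Archive", ".git")


def resolve_in_vault(
    vault_root: Path, relative: str, deny_dirs: Iterable[str] = DEFAULT_DENY_DIRS
) -> Path:
    """Return the resolved path, or raise ValueError if it escapes the vault or is denied."""
    if Path(relative).is_absolute():
        raise ValueError(f"absolute paths are not allowed: {relative}")
    root = vault_root.resolve()
    # resolve() collapses ".." and follows symlinks, so the check sees the real target.
    candidate = (root / relative).resolve()
    try:
        inside = candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes the vault: {relative}") from None
    denied = set(deny_dirs)
    if any(part in denied for part in inside.parts):
        raise ValueError(f"path is in a denied directory: {relative}")
    return candidate
```

Because the check runs on the resolved path, a symlink inside the vault that points outside it is rejected the same way as a `../` sequence.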
## Configuration
Config file at `~/.obsidian-rag/config.json`:
```json
{
  "vault_path": "/home/san/KnowledgeVault/Default",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "base_url": "http://localhost:11434",
    "dimensions": 1024
  },
  "vector_store": {
    "type": "lancedb",
    "path": "~/.obsidian-rag/vectors.lance"
  },
  "indexing": {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "file_patterns": ["*.md"],
    "deny_dirs": [".obsidian", ".trash", "zzz-Archive", ".git"],
    "allow_dirs": []
  },
  "security": {
    "require_confirmation_for": ["health", "financial_debt"],
    "sensitive_sections": ["#mentalhealth", "#physicalhealth", "#Relations"],
    "local_only": true
  },
  "memory": {
    "auto_suggest": true,
    "patterns": {
      "financial": ["owe", "owed", "debt", "paid", "$", "spent", "spend"],
      "health": ["#mentalhealth", "#physicalhealth", "medication", "therapy"],
      "commitments": ["shopping list", "costco", "amazon", "grocery"]
    }
  }
}
```
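A minimal sketch of a loader for this file, assuming it should expand `~` in path fields and reject obviously inconsistent chunk settings. The set of required keys and the function name are assumptions for illustration, not a schema defined by this spec.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Assumed minimum set of top-level sections; the spec does not mark any as required.
REQUIRED_KEYS = ("vault_path", "embedding", "vector_store", "indexing")


def load_config(path: str = "~/.obsidian-rag/config.json") -> Dict[str, Any]:
    """Load the config, check required sections, and expand `~` in path fields."""
    config_path = Path(path).expanduser()
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing required keys: {missing}")
    indexing = config["indexing"]
    if not 0 <= indexing["chunk_overlap"] < indexing["chunk_size"]:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    config["vault_path"] = str(Path(config["vault_path"]).expanduser())
    config["vector_store"]["path"] = str(Path(config["vector_store"]["path"]).expanduser())
    return config
```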
## Project Structure
```
obsidian-rag-skill/
├── README.md
├── LICENSE
├── .gitignore
├── openclaw.plugin.json
├── package.json
├── tsconfig.json
├── src/
│   ├── index.ts
│   ├── tools/
│   │   ├── search.ts
│   │   ├── index.ts
│   │   ├── status.ts
│   │   └── memory.ts
│   ├── services/
│   │   ├── vault-watcher.ts
│   │   ├── indexer-bridge.ts
│   │   └── security-guard.ts
│   └── utils/
│       ├── config.ts
│       └── lancedb.ts
├── python/
│   ├── pyproject.toml
│   ├── obsidian_rag/
│   │   ├── __init__.py
│   │   ├── cli.py
│   │   ├── indexer.py
│   │   ├── chunker.py
│   │   ├── embedder.py
│   │   ├── vector_store.py
│   │   ├── security.py
│   │   └── config.py
│   └── tests/
│       ├── test_chunker.py
│       ├── test_security.py
│       ├── test_embedder.py
│       ├── test_vector_store.py
│       └── test_indexer.py
├── tests/
│   ├── tools/
│   │   ├── search.test.ts
│   │   ├── index.test.ts
│   │   └── memory.test.ts
│   └── services/
│       ├── vault-watcher.test.ts
│       └── security-guard.test.ts
└── docs/
    └── superpowers/
        └── specs/
```
## Testing Strategy
- **Python**: pytest with mocked Ollama, path traversal tests, input sanitization, LanceDB CRUD
- **TypeScript**: vitest with tool parameter validation, security guard, search filter logic
- **Security**: Dedicated test suites for path traversal, XSS, prompt injection, network audit, sensitive content detection
## Publishing
Published to ClawHub as both a skill (SKILL.md) and a plugin package:
```bash
clawhub skill publish ./skill --slug obsidian-rag --version 1.0.0
clawhub package publish santhosh/obsidian-rag
```
Install: `openclaw plugins install clawhub:obsidian-rag`