obsidian-rag/docs/superpowers/specs/2026-04-10-obsidian-rag-task-list.md
Santhosh Janardhanan 5c281165c7 Sprint 0-1: Python indexer, TS plugin scaffolding, and test suite
## What's new

**Python indexer (`python/obsidian_rag/`)** — full pipeline from scan to LanceDB:
- `config.py` — JSON config loader with cross-platform path resolution
- `security.py` — path traversal prevention, HTML stripping, sensitive content detection, dir allow/deny lists
- `chunker.py` — section-split for journal entries (date-named files), sliding-window for unstructured notes
- `embedder.py` — Ollama `/api/embeddings` client with batched requests and timeout/error handling
- `vector_store.py` — LanceDB schema, upsert (merge_insert), delete, search with filters, stats
- `indexer.py` — full/sync/reindex pipeline orchestrator with progress yields
- `cli.py` — `index | sync | reindex | status` CLI commands

**TypeScript plugin (`src/`)** — OpenClaw plugin scaffold:
- `utils/` — config loader, TypeScript types, response envelope factory, LanceDB client
- `services/` — health state machine (HEALTHY/DEGRADED/UNAVAILABLE), vault watcher with debounce/batching, indexer bridge (subprocess spawner)
- `tools/` — 4 tool stubs: search, index, status, memory_store (OpenClaw wiring pending)
- `index.ts` — plugin entry point with health probe + vault watcher startup

**Config** (`obsidian-rag/config.json`, `openclaw.plugin.json`):
- 627 files / 3764 chunks indexed in dev vault

**Tests: 76 passing**
- Python: 64 pytest tests (chunker, security, vector_store, config)
- TypeScript: 12 vitest tests (lancedb client, response envelope)

## Bugs fixed

- LanceDB `tags` column filter: `LIKE '%tag%'` → `list_contains(tags, 'tag')` (List<String> column)
- LanceDB JS `db.list_tables()` returns `ListTablesResponse` object, not plain array
- LanceDB JS result score field: `_score` → `_distance`
- TypeScript regex literal with unescaped `/` in path-resolve regex
- Python: `create_table_if_not_exists` identity check → name comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 22:56:50 -04:00

Obsidian RAG Plugin - Work Queue

Date: 2026-04-10
Based on: Work Breakdown Structure v1.0
Last Updated: 2026-04-10 21:30

Legend

  • [ ] = Pending
  • [x] = Done
  • [~] = In Progress
  • [!] = Error / Blocked

Phase 0: Project Scaffolding & Environment

0.1 Repository & Build Setup

  • 0.1.1 Initialize TypeScript project structure (S) - Create package.json, tsconfig.json, src/ directory
  • 0.1.2 Initialize Python package structure (S) - Create pyproject.toml, obsidian_rag/ module skeleton
  • 0.1.3 Create development config file (S) - Depends on 0.1.1 - Create ./obsidian-rag/config.json
  • 0.1.4 Set up OpenClaw plugin manifest (S) - Depends on 0.1.1 - Create openclaw.plugin.json
  • 0.1.5 Configure test runners (S) - Depends on 0.1.1, 0.1.2 - Setup vitest and pytest configs

0.2 Environment Validation

  • 0.2.1 Verify Ollama + mxbai-embed-large (S) - Test embedding API
  • 0.2.2 Verify LanceDB Python package (S) - Test table creation and queries
  • 0.2.3 Verify sample vault accessibility (S) - Count .md files in KnowledgeVault

Phase 1: Data Layer (Python Indexer)

1.1 Configuration (Python)

  • 1.1.1 Implement config loader (S) - Depends on 0.1.2 - Read JSON, resolve paths, validate schema
  • 1.1.2 Write config tests (S) - Depends on 1.1.1 - Test validation and path resolution

1.2 Security (Python) - Can start after 1.1.1, parallel with other components

  • 1.2.1 Implement path traversal prevention (S) - Depends on 1.1.1 - Validate paths, reject ../ and symlinks
  • 1.2.2 Implement input sanitization (S) - Depends on 1.1.1 - Strip HTML, normalize whitespace
  • 1.2.3 Implement sensitive content detection (S) - Depends on 1.1.1 - Detect health/financial/relations content
  • 1.2.4 Implement directory access control (S) - Depends on 1.1.1 - Apply deny/allow lists
  • 1.2.5 Write security tests (M) - Depends on 1.2.1-1.2.4 - Test all security functions
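
Task 1.2.1 can be illustrated with a containment check on resolved paths. This is a minimal sketch, not the code in `security.py`: the names `resolve_in_vault` and `PathTraversalError` are hypothetical, and it rejects only symlinks that lead outside the vault, whereas the task as written may reject all symlinks.

```python
from pathlib import Path


class PathTraversalError(ValueError):
    """Raised when a requested path would escape the vault root."""


def resolve_in_vault(vault_root: Path, relative: str) -> Path:
    """Return the absolute path for `relative`, or raise if it leaves the vault."""
    root = vault_root.resolve()
    # Reject explicit parent-directory segments before touching the filesystem.
    if ".." in Path(relative).parts:
        raise PathTraversalError(f"parent segment in {relative!r}")
    # resolve() follows symlinks, so a link pointing outside the vault ends up
    # outside `root` and fails the containment check below.
    resolved = (root / relative).resolve()
    if not resolved.is_relative_to(root):  # Python 3.9+
        raise PathTraversalError(f"{relative!r} escapes the vault")
    return resolved
```

Checking the resolved path, rather than only scanning the string for `../`, also covers absolute paths and symlinked directories.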

1.3 Chunking - Can start after 1.1.1, parallel with security

  • 1.3.1 Implement markdown parser (S) - Depends on 0.1.2 - Parse frontmatter, headings, tags
  • 1.3.2 Implement structured chunker (M) - Depends on 1.3.1 - Split by section headers
  • 1.3.3 Implement sliding window chunker (S) - Depends on 1.3.1 - 500 token window with overlap
  • 1.3.4 Implement chunk router (S) - Depends on 1.3.2, 1.3.3 - Route structured vs unstructured
  • 1.3.5 Write chunker tests (M) - Depends on 1.3.4 - Test all chunking scenarios
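
The sliding-window chunker in 1.3.3 can be sketched as follows. The 500-token window comes from the task description; the overlap of 50 and the use of whitespace-separated words as "tokens" are assumptions made for illustration, and `sliding_window` is a hypothetical name.

```python
from typing import List, Sequence


def sliding_window(tokens: Sequence[str], window: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split `tokens` into windows of at most `window` items.

    Consecutive windows share `overlap` items. The overlap default is an
    assumption; the task list only says "with overlap".
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + window]))
        # Stop once a window reaches the end, so no tail-only duplicate is emitted.
        if start + window >= len(tokens):
            break
    return chunks


# Whitespace splitting stands in for a real tokenizer here.
words = "one two three four five six seven".split()
print(sliding_window(words, window=4, overlap=2))
```

With `window=4` and `overlap=2` the step is 2, so the seven words above yield three chunks, the last one shorter than the window.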

1.4 Embedding - Can start after 1.1.1, parallel with chunking/security

  • 1.4.1 Implement Ollama embedder (M) - Depends on 1.1.1 - Batch 64 chunks, error handling
  • 1.4.2 Implement embedding cache (S) - Depends on 1.4.1 - File-based cache
  • 1.4.3 Write embedder tests (S) - Depends on 1.4.1, 1.4.2 - Test batching and cache
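
The batching in 1.4.1 can be sketched independently of the HTTP client. In this sketch the embedding call is passed in as a function, so no Ollama request is made; `batches`, `embed_all`, and the `embed_batch` parameter are hypothetical names, and only the batch size of 64 comes from the task list.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 64) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    size: int = 64,
) -> List[List[float]]:
    """Embed `texts` batch by batch, keeping vectors aligned with inputs."""
    vectors: List[List[float]] = []
    for batch in batches(texts, size):
        result = embed_batch(batch)
        # A short or long response would silently misalign chunks and vectors.
        if len(result) != len(batch):
            raise RuntimeError(f"expected {len(batch)} vectors, got {len(result)}")
        vectors.extend(result)
    return vectors
```

Injecting `embed_batch` keeps the batching logic testable with a fake embedder, which fits the "Test batching and cache" goal of 1.4.3.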

1.5 Vector Store - Can start after 0.2.2, parallel with other components

  • 1.5.1 Implement LanceDB table creation (S) - Depends on 0.2.2 - Create obsidian_chunks table
  • 1.5.2 Implement vector upsert (S) - Depends on 1.5.1 - Add/update chunks
  • 1.5.3 Implement vector delete (S) - Depends on 1.5.1 - Remove by source_file
  • 1.5.4 Implement vector search (M) - Depends on 1.5.1 - Query with filters
  • 1.5.5 Write vector store tests (M) - Depends on 1.5.2-1.5.4 - Test CRUD operations
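
For the filtered search in 1.5.4, the commit notes above record that the `tags` column is a List<String> and is filtered with `list_contains(tags, 'tag')` rather than `LIKE`. A minimal sketch of building that predicate string follows; the helper name `tag_filter` and the quote-escaping policy are assumptions, and the snippet does not call LanceDB.

```python
def tag_filter(tag: str, column: str = "tags") -> str:
    """Build a list-membership predicate for a List<String> column.

    Hypothetical helper: only the list_contains(...) form is taken from the
    commit notes; the name and escaping policy are assumptions.
    """
    # Double any single quotes so the tag cannot terminate the SQL literal.
    escaped = tag.replace("'", "''")
    return f"list_contains({column}, '{escaped}')"


print(tag_filter("project"))
```

A membership predicate matches whole tags only, so a filter on `work` would not also select notes tagged `homework`.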

1.6 Indexer Pipeline & CLI - Depends on multiple components

  • 1.6.1 Implement full index pipeline (M) - Depends on 1.2.4, 1.3.4, 1.4.1, 1.5.2 - Scan → parse → chunk → embed → store
  • 1.6.2 Implement incremental sync (M) - Depends on 1.6.1, 1.5.3 - Compare mtime, process changes
  • 1.6.3 Implement reindex (S) - Depends on 1.6.1 - Drop table + rebuild
  • 1.6.4 Implement sync-result.json writer (S) - Depends on 1.6.1 - Atomic file writing
  • 1.6.5 Implement CLI entry point (M) - Depends on 1.6.1, 1.6.2, 1.6.3 - index/sync/reindex commands
  • 1.6.6 Write indexer tests (M) - Depends on 1.6.5 - Test full pipeline and CLI
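
Two of the tasks above, the mtime comparison in 1.6.2 and the atomic writer in 1.6.4, can be sketched together. The function names and the shape of the mtime maps are hypothetical; the atomic write uses the common write-to-temp-then-rename approach, which may differ from what `indexer.py` does.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def plan_sync(indexed: Dict[str, float], on_disk: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Compare stored mtimes with current ones.

    Returns (changed, deleted): files that are new or newer on disk, and
    files that were indexed but no longer exist.
    """
    changed = sorted(p for p, m in on_disk.items() if p not in indexed or m > indexed[p])
    deleted = sorted(p for p in indexed if p not in on_disk)
    return changed, deleted


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename over the destination
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

A reader such as the TypeScript sync-result reader (2.3.2) then never observes a half-written `sync-result.json`.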

Phase 2: Data Layer (TypeScript Client)

2.1 Configuration (TypeScript) - Can start after 0.1.1, parallel with Phase 1

  • 2.1.1 Implement config loader (S) - Depends on 0.1.1 - Read JSON, validate schema
  • 2.1.2 Implement config types (S) - Depends on 2.1.1 - TypeScript interfaces

2.2 LanceDB Client - Depends on Phase 1 completion

  • 2.2.1 Implement LanceDB query client (M) - Depends on 0.1.1 - Connect and search
  • [~] 2.2.2 Implement full-text search fallback (S) - Depends on 2.2.1 - Degraded mode

2.3 Indexer Bridge - Depends on Phase 1 completion

  • 2.3.1 Implement subprocess spawner (M) - Depends on 0.1.1 - Spawn Python CLI
  • 2.3.2 Implement sync-result reader (S) - Depends on 2.3.1 - Read sync results
  • 2.3.3 Implement job tracking (S) - Depends on 2.3.1 - Track progress

Phase 3: Session & Transport Layers

3.1 Health State Machine - Depends on Phase 2

  • 3.1.1 Implement health prober (S) - Depends on 2.1.1, 2.2.1 - Probe dependencies
  • 3.1.2 Implement state machine (S) - Depends on 3.1.1 - HEALTHY/DEGRADED/UNAVAILABLE
  • 3.1.3 Implement staleness detector (S) - Depends on 3.1.2, 2.3.2 - Detect stale syncs

3.2 Vault Watcher - Depends on Phase 2

  • 3.2.1 Implement file watcher (S) - Depends on 2.1.1 - Watch vault directory
  • 3.2.2 Implement debounce & batching (M) - Depends on 3.2.1 - Batch changes
  • 3.2.3 Implement auto-sync trigger (M) - Depends on 3.2.2, 2.3.1, 3.1.2 - Trigger sync
  • 3.2.4 Write vault watcher tests (M) - Depends on 3.2.3 - Test watcher behavior
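
The debounce and batching step in 3.2.2 can be sketched without timers by passing timestamps explicitly. The plugin's watcher is TypeScript; this Python version only illustrates the idea, and the class name, method names, and 2-second quiet period are hypothetical.

```python
from typing import List, Optional, Set


class ChangeBatcher:
    """Collect changed paths and release them once events stop arriving."""

    def __init__(self, quiet_seconds: float = 2.0) -> None:
        self.quiet_seconds = quiet_seconds
        self._pending: Set[str] = set()
        self._last_event: Optional[float] = None

    def record(self, path: str, now: float) -> None:
        """Register a change; repeated edits to one file collapse into one entry."""
        self._pending.add(path)
        self._last_event = now

    def take_if_quiet(self, now: float) -> List[str]:
        """Return the batch if the quiet period has elapsed, else an empty list."""
        if not self._pending or self._last_event is None:
            return []
        if now - self._last_event < self.quiet_seconds:
            return []
        batch = sorted(self._pending)
        self._pending.clear()
        return batch
```

A caller would invoke `take_if_quiet` periodically and hand any non-empty batch to the sync trigger (3.2.3). Explicit timestamps make the behavior deterministic in tests.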

3.3 Response Envelope & Error Normalization - Can start after 0.1.1, parallel

  • 3.3.1 Implement response envelope factory (S) - Depends on 0.1.1 - Build response structure
  • 3.3.2 Implement error normalizer (S) - Depends on 3.3.1 - Map exceptions to codes
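
Tasks 3.3.1 and 3.3.2 can be pictured as one small factory plus a lookup from exception types to codes. The envelope fields (`ok`, `data`, `error`) and the error codes below are assumptions for illustration; the task list does not specify them, and the real implementation is in TypeScript.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from exception types to stable error codes.
ERROR_CODES = {
    FileNotFoundError: "NOT_FOUND",
    PermissionError: "FORBIDDEN",
    TimeoutError: "TIMEOUT",
}


def normalize_error(exc: BaseException) -> Dict[str, str]:
    """Map an exception to a {code, message} pair, defaulting to INTERNAL."""
    for exc_type, code in ERROR_CODES.items():
        if isinstance(exc, exc_type):
            return {"code": code, "message": str(exc)}
    return {"code": "INTERNAL", "message": str(exc)}


def envelope(data: Any = None, error: Optional[BaseException] = None) -> Dict[str, Any]:
    """Wrap a tool result or failure in one uniform structure."""
    return {
        "ok": error is None,
        "data": data,
        "error": normalize_error(error) if error is not None else None,
    }
```

Every tool returning the same shape lets callers branch on `ok` without inspecting tool-specific fields.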

3.4 Security Guard (TypeScript) - Can start after 2.1.1, parallel with 3.1-3.2

  • 3.4.1 Implement directory filter validator (S) - Depends on 2.1.1 - Validate filters
  • 3.4.2 Implement sensitive content flag (S) - Depends on 3.4.1 - Flag sensitive content
  • 3.4.3 Write security guard tests (S) - Depends on 3.4.2 - Test security functions

Phase 4: Tool Layer

4.1 Tool Implementations - Depends on Phase 3

  • [~] 4.1.1 Implement obsidian_rag_search tool (M) - Depends on 2.2.1, 3.3.1, 3.4.2 - Search with filters ⚠️ LanceDB TS client now wired, needs OpenClaw integration
  • [~] 4.1.2 Implement obsidian_rag_index tool (M) - Depends on 2.3.1, 2.3.3, 3.3.1 - Spawn indexer ⚠️ stub — tool registration not wired to OpenClaw
  • [~] 4.1.3 Implement obsidian_rag_status tool (S) - Depends on 3.1.2, 2.3.2, 3.3.1 - Return health status ⚠️ stub — reads sync-result not LanceDB stats
  • [~] 4.1.4 Implement obsidian_rag_memory_store tool (S) - Depends on 3.3.1 - Persist to memory ⚠️ stub — no-op
  • 4.1.5 Write tool unit tests (M) - Depends on 4.1.1-4.1.4 - Test all tools

4.2 Plugin Registration - Depends on tools

  • [~] 4.2.1 Implement plugin entry point (M) - Depends on 4.1.1-4.1.4, 3.2.3, 3.1.2 - Plugin lifecycle ⚠️ stub — tools registration is a TODO
  • 4.2.2 Verify OpenClaw plugin lifecycle (S) - Depends on 4.2.1 - Manual test

Phase 5: Integration & Hardening

5.1 Integration Tests - Depends on Phase 4

  • 5.1.1 Full pipeline integration test (M) - Depends on 1.6.5, 4.2.1 - Index → search
  • 5.1.2 Sync cycle integration test (M) - Depends on 3.2.3, 5.1.1 - Modify → auto-sync → search
  • 5.1.3 Health state integration test (S) - Depends on 3.1.2, 5.1.1 - Test state transitions
  • 5.1.4 OpenClaw protocol integration test (M) - Depends on 4.2.1 - Test all tools

5.2 Security Test Suite - Depends on relevant components

  • 5.2.1 Path traversal tests (S) - Depends on 1.2.1, 3.4.1 - Test ../, symlinks, Windows paths
  • 5.2.2 XSS prevention tests (S) - Depends on 1.2.2 - Test HTML injection
  • 5.2.3 Prompt injection tests (S) - Depends on 4.1.1 - Test malicious content
  • 5.2.4 Network audit test (S) - Depends on 1.4.1 - Verify no outbound requests
  • 5.2.5 Sensitive content tests (S) - Depends on 1.2.3, 3.4.2 - Test detection and flagging

5.3 Documentation & Publishing - Depends on integration tests

  • 5.3.1 Write README (S) - Depends on 4.2.1 - Usage and setup docs
  • 5.3.2 Create SKILL.md (S) - Depends on 4.2.1 - Skill manifest
  • 5.3.3 Publish to ClawHub (S) - Depends on 5.1.1-5.2.5 - Publish skill

Progress Summary

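| Phase | Tasks | Done | Pending | In Progress | Blocked |
|---|---|---|---|---|---|
| Phase 0: Scaffolding | 8 | 8 | 0 | 0 | 0 |
| Phase 1: Python Indexer | 20 | 16 | 2 | 2 | 0 |
| Phase 2: TS Client | 7 | 6 | 0 | 1 | 0 |
| Phase 3: Session/Transport | 10 | 8 | 1 | 1 | 0 |
| Phase 4: Tool Layer | 7 | 1 | 5 | 1 | 0 |
| Phase 5: Integration | 12 | 0 | 12 | 0 | 0 |
| Total | 64 | 39 | 20 | 5 | 0 |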

Critical Path

  1. Phase 0 → Phase 1 → Phase 2 → Phase 3 → Phase 4 → Phase 5
  2. 0.1.1-0.1.5 → 1.1.1 → 1.3.1 → 1.6.1 → 2.2.1 → 3.1.1 → 3.2.1 → 4.1.1 → 4.2.1 → 5.1.1
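
The "Depends on" annotations in the task list form a dependency graph, and a valid execution order can be derived from it with a topological sort. The sketch below uses the standard-library `graphlib` module on the subset of tasks leading up to the full index pipeline (1.6.1); the dependency entries are copied from the task list, while the variable names are illustrative.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the tasks it depends on, as stated in the list above.
DEPENDS_ON = {
    "0.1.2": [],
    "0.2.2": [],
    "1.1.1": ["0.1.2"],
    "1.2.4": ["1.1.1"],
    "1.3.1": ["0.1.2"],
    "1.3.2": ["1.3.1"],
    "1.3.3": ["1.3.1"],
    "1.3.4": ["1.3.2", "1.3.3"],
    "1.4.1": ["1.1.1"],
    "1.5.1": ["0.2.2"],
    "1.5.2": ["1.5.1"],
    "1.6.1": ["1.2.4", "1.3.4", "1.4.1", "1.5.2"],
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Several orders are valid because independent branches (security, chunking, embedding, vector store) can be interleaved, which is exactly the parallelism the task list describes. `TopologicalSorter` raises `CycleError` if the annotations ever become circular.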

Parallel Work Opportunities

  • After 1.1.1: Security (1.2), Chunking (1.3), Embedding (1.4) can work in parallel
  • After 0.2.2: Vector Store (1.5) can work in parallel with other components
  • After 0.1.1: TypeScript Config (2.1) can start early
  • Phase 3: Response Envelope (3.3) and Security Guard (3.4) can work in parallel with Health (3.1) and Watcher (3.2)

Effort Estimates

  • Small tasks (S): 31 tasks (~1-2 sessions each)
  • Medium tasks (M): 27 tasks (~3-5 sessions each)
  • Total: 76-123 sessions across all phases