obsidian-rag/docs/superpowers/specs/2026-04-10-obsidian-rag-wbs.md
Santhosh Janardhanan b8996d2ecb Add Work Breakdown Structure for Obsidian RAG Plugin
64 work packages across 5 phases and 15 work areas, organized
bottom-up through the layered protocol architecture. Includes
dependency map, critical path, parallelizable work, effort
estimates, and risk items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 18:49:58 -04:00

# Obsidian RAG Plugin — Work Breakdown Structure
**Date:** 2026-04-10
**Based on:** Technical Design Document v1.0
## WBS Overview
The work is decomposed into **6 phases** (Phase 0 through Phase 5), **20 work areas**, and **72 work packages**. Phases are sequenced by dependency: foundation first, then bottom-up through the protocol layers, then integration and hardening.
Each work package follows the format:
- **ID**: Hierarchical code (e.g., 1.1.2)
- **Name**: Imperative, action-oriented title
- **Delivers**: Concrete artifact or behavior
- **Depends on**: Prerequisite WBS IDs
- **Effort**: S/M/L relative sizing (S=1-2 sessions, M=3-5 sessions, L=6+ sessions)
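
The work package format above maps naturally onto a small record type. The following is only an illustrative sketch: the class name `WorkPackage`, the `EFFORT_SESSIONS` table, and the `session_range` helper are assumptions made for this example, not part of the design document.

```python
from dataclasses import dataclass
from typing import Optional

# Session ranges per effort size, taken from the legend above.
# "L" is open-ended (6+), so its upper bound is None.
EFFORT_SESSIONS: dict[str, tuple[int, Optional[int]]] = {
    "S": (1, 2),
    "M": (3, 5),
    "L": (6, None),
}


@dataclass(frozen=True)
class WorkPackage:
    """One WBS row (hypothetical record type for illustration)."""
    id: str
    name: str
    delivers: str
    depends_on: tuple[str, ...] = ()
    effort: str = "S"

    def session_range(self) -> tuple[int, Optional[int]]:
        """Return the (low, high) session estimate for this package's size."""
        return EFFORT_SESSIONS[self.effort]


# Example row, copied from work package 1.1.2 in Phase 1.
example = WorkPackage(
    id="1.1.2",
    name="Write config tests",
    delivers="test_config.py",
    depends_on=("1.1.1",),
    effort="S",
)
```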
---
## Phase 0: Project Scaffolding & Environment
### 0.1 Repository & Build Setup
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 0.1.1 | Initialize TypeScript project structure | package.json, tsconfig.json, src/ directory skeleton | — | S |
| 0.1.2 | Initialize Python package structure | pyproject.toml, obsidian_rag/ module skeleton | — | S |
| 0.1.3 | Create development config file | ./obsidian-rag/config.json pointing at the sample vault ./KnowledgeVault/Default | 0.1.1 | S |
| 0.1.4 | Set up OpenClaw plugin manifest | openclaw.plugin.json with tool declarations | 0.1.1 | S |
| 0.1.5 | Configure test runners | vitest config (TS), pytest config (Python) | 0.1.1, 0.1.2 | S |
### 0.2 Environment Validation
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 0.2.1 | Verify Ollama + mxbai-embed-large | Script that calls /api/embed and returns 1024-dim vector | — | S |
| 0.2.2 | Verify LanceDB Python package | Script that creates a table, inserts, queries | — | S |
| 0.2.3 | Verify sample vault accessibility | Script that walks ./KnowledgeVault/Default and counts .md files | — | S |
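
As a rough idea of what the 0.2.3 check could look like, here is a minimal sketch that walks a vault directory and counts Markdown files. The function name `count_markdown_files` is hypothetical, and the sketch assumes a recursive walk that matches only the lowercase `.md` extension.

```python
from pathlib import Path


def count_markdown_files(vault_root: Path) -> int:
    """Count .md files under vault_root, recursively."""
    if not vault_root.is_dir():
        raise FileNotFoundError(f"Vault directory not found: {vault_root}")
    # rglob descends into subdirectories; is_file() skips directories named *.md.
    return sum(1 for path in vault_root.rglob("*.md") if path.is_file())


# Example call against the sample vault named in the table:
#   count_markdown_files(Path("./KnowledgeVault/Default"))
```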
---
## Phase 1: Data Layer (Python Indexer)
The data layer is the foundation: every later phase depends on its ability to index notes and store vectors.
### 1.1 Configuration (Python)
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.1.1 | Implement config loader | config.py — reads JSON config, resolves paths cross-platform, validates schema | 0.1.2 | S |
| 1.1.2 | Write config tests | test_config.py — valid/invalid config, path resolution, defaults | 1.1.1 | S |
### 1.2 Security (Python)
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.2.1 | Implement path traversal prevention | security.py — validate_path() rejects ../ segments, absolute paths, and symlinks that resolve outside the vault | 1.1.1 | S |
| 1.2.2 | Implement input sanitization | security.py — sanitize_text() strips HTML, code blocks, normalizes whitespace, caps length | 1.1.1 | S |
| 1.2.3 | Implement sensitive content detection | security.py — detect_sensitive() returns categories matched (health/financial/relations) | 1.1.1 | S |
| 1.2.4 | Implement directory access control | security.py — should_index_dir() applies deny/allow lists | 1.1.1 | S |
| 1.2.5 | Write security tests | test_security.py — path traversal vectors (incl. Windows), sanitization, sensitive detection, dir control | 1.2.1-1.2.4 | M |
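
To make the intent of 1.2.1 concrete, the sketch below shows one possible shape for `validate_path()`. The signature, the error type, and the exact rejection rules are assumptions for illustration and are not taken from the design document. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def validate_path(candidate: str, vault_root: Path) -> Path:
    """Return the resolved path if it stays inside vault_root, else raise ValueError."""
    # Reject absolute paths in POSIX notation and in Windows drive or UNC notation.
    if PurePosixPath(candidate).is_absolute() or PureWindowsPath(candidate).is_absolute():
        raise ValueError(f"absolute path not allowed: {candidate!r}")

    # PureWindowsPath splits on both separators, so one check covers "/" and "\".
    if ".." in PureWindowsPath(candidate).parts:
        raise ValueError(f"parent-directory segment not allowed: {candidate!r}")

    root = vault_root.resolve()
    # resolve() follows symlinks, so a link pointing outside the vault is caught below.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):
        raise ValueError(f"path escapes the vault: {candidate!r}")
    return resolved
```

Checking both path flavours up front means Windows-style inputs are rejected even when the indexer runs on a POSIX host. This is a conservative choice and not a complete treatment of the Windows edge cases listed under 5.2.1.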
### 1.3 Chunking
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.3.1 | Implement markdown parser | chunker.py — parse frontmatter, headings, tags, date from filename | 0.1.2 | S |
| 1.3.2 | Implement structured chunker | chunker.py — split by section headers, each section = chunk with metadata | 1.3.1 | M |
| 1.3.3 | Implement sliding window chunker | chunker.py — 500 token window, 100 overlap, for unstructured notes | 1.3.1 | S |
| 1.3.4 | Implement chunk router | chunker.py — detect structured vs unstructured, route to correct chunker | 1.3.2, 1.3.3 | S |
| 1.3.5 | Write chunker tests | test_chunker.py — section splitting, sliding window, metadata, edge cases (empty, single-line) | 1.3.4 | M |
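
The sliding window in 1.3.3 (500-token window, 100-token overlap) can be sketched as follows. The tokenizer is not specified in this document, so the example works on any pre-tokenized list; the function name `sliding_window_chunks` and its parameters are hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sliding_window_chunks(
    tokens: Sequence[T], window: int = 500, overlap: int = 100
) -> list[Sequence[T]]:
    """Split tokens into windows of `window` items that share `overlap` items."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    step = window - overlap  # each new chunk starts this many tokens later
    chunks: list[Sequence[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        # Stop once a chunk reaches the end, to avoid a trailing pure-overlap chunk.
        if start + window >= len(tokens):
            break
    return chunks
```

With the defaults, a 1200-token note yields three chunks of 500, 500, and 400 tokens, and the last 100 tokens of each chunk repeat as the first 100 of the next.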
### 1.4 Embedding
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.4.1 | Implement Ollama embedder | embedder.py — call /api/embed, batch 64 chunks, handle errors/retries | 1.1.1 | M |
| 1.4.2 | Implement embedding cache | embedder.py — optional file-based cache to avoid re-embedding unchanged chunks | 1.4.1 | S |
| 1.4.3 | Write embedder tests | test_embedder.py — mocked Ollama, batch handling, error recovery, cache hit/miss | 1.4.1, 1.4.2 | S |
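
The batching and retry behaviour in 1.4.1 could look roughly like the sketch below. To keep the example runnable without Ollama, the HTTP call to /api/embed is replaced by an injected callable; `embed_in_batches`, `embed_batch`, and `max_retries` are hypothetical names, and a real implementation would likely add a backoff delay and catch narrower exceptions.

```python
from typing import Callable, Sequence

Vector = list[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[Vector]],
    batch_size: int = 64,
    max_retries: int = 2,
) -> list[Vector]:
    """Embed texts in batches of batch_size, retrying each failed batch."""
    vectors: list[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for attempt in range(max_retries + 1):
            try:
                result = embed_batch(batch)
                break
            except Exception:
                # Broad catch for the sketch only; re-raise after the last attempt.
                if attempt == max_retries:
                    raise
        if len(result) != len(batch):
            raise ValueError("embedder returned a different number of vectors")
        vectors.extend(result)
    return vectors
```

Injecting the callable also fits the mocked-Ollama approach described for 1.4.3, since tests can pass a fake embedder and assert on the batch sizes it receives.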
### 1.5 Vector Store
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.5.1 | Implement LanceDB table creation | vector_store.py — create obsidian_chunks table with schema | 0.2.2 | S |
| 1.5.2 | Implement vector upsert | vector_store.py — add/update chunks by chunk_id | 1.5.1 | S |
| 1.5.3 | Implement vector delete | vector_store.py — remove chunks by source_file (for deleted files) | 1.5.1 | S |
| 1.5.4 | Implement vector search | vector_store.py — query by embedding vector with filters (directory, date, tags) | 1.5.1 | M |
| 1.5.5 | Write vector store tests | test_vector_store.py — CRUD, upsert idempotency, search with filters, temp directory cleanup | 1.5.2-1.5.4 | M |
### 1.6 Indexer Pipeline & CLI
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.6.1 | Implement full index pipeline | indexer.py — scan → parse → chunk → enrich → embed → store, for all vault files | 1.2.4, 1.3.4, 1.4.1, 1.5.2 | M |
| 1.6.2 | Implement incremental sync | indexer.py — compare mtime, process only changed/deleted files | 1.6.1, 1.5.3 | M |
| 1.6.3 | Implement reindex (nuke + rebuild) | indexer.py — drop table, run full index | 1.6.1 | S |
| 1.6.4 | Implement sync-result.json writer | indexer.py — write atomic .tmp + rename with index stats | 1.6.1 | S |
| 1.6.5 | Implement CLI entry point | cli.py — obsidian-rag index/sync/reindex/status commands, NDJSON progress on stdout | 1.6.1, 1.6.2, 1.6.3 | M |
| 1.6.6 | Write indexer tests | test_indexer.py — full pipeline with mock embedder, incremental sync, reindex, CLI arg parsing | 1.6.5 | M |
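
Two of the packages above, the mtime comparison in 1.6.2 and the atomic .tmp + rename write in 1.6.4, are small enough to sketch. The function names, the shape of the mtime maps, and the JSON payload are assumptions made for this example; the document does not define the contents of sync-result.json.

```python
import json
import os
from pathlib import Path


def diff_by_mtime(
    previous: dict[str, float], current: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Compare {path: mtime} maps and return (changed_or_new, deleted) paths."""
    changed = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to a sibling .tmp file, then rename it over path."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
    # os.replace overwrites the target in one step, so readers never see a partial file.
    os.replace(tmp, path)


# Example (hypothetical payload fields):
#   write_json_atomic(Path("sync-result.json"), {"files_indexed": 3, "files_deleted": 1})
```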
---
## Phase 2: Data Layer (TypeScript Client)
### 2.1 Configuration (TypeScript)
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 2.1.1 | Implement config loader | config.ts — read JSON config, validate schema, resolve relative paths | 0.1.1 | S |
| 2.1.2 | Implement config types | config.ts — TypeScript interfaces for all config sections | 2.1.1 | S |
### 2.2 LanceDB Client
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 2.2.1 | Implement LanceDB query client | lancedb.ts — connect to existing table, perform vector search with filters | 0.1.1 | M |
| 2.2.2 | Implement full-text search fallback | lancedb.ts — LanceDB scalar query when Ollama is down (degraded mode) | 2.2.1 | S |
### 2.3 Indexer Bridge
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 2.3.1 | Implement subprocess spawner | indexer-bridge.ts — spawn python -m obsidian_rag.cli, parse NDJSON progress | 0.1.1 | M |
| 2.3.2 | Implement sync-result reader | indexer-bridge.ts — read sync-result.json, parse and return | 2.3.1 | S |
| 2.3.3 | Implement job tracking | indexer-bridge.ts — track active job (job_id, mode, progress), detect completion | 2.3.1 | S |
---
## Phase 3: Session & Transport Layers (TypeScript)
### 3.1 Health State Machine
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.1.1 | Implement health prober | services/health.ts — probe Ollama (/api/tags), probe LanceDB (table exists), probe vault (dir exists) | 2.1.1, 2.2.1 | S |
| 3.1.2 | Implement state machine | services/health.ts — HEALTHY/DEGRADED/UNAVAILABLE transitions, 30s re-probe timer | 3.1.1 | S |
| 3.1.3 | Implement staleness detector | services/health.ts — if last sync >1h and vault changed, set degraded | 3.1.2, 2.3.2 | S |
### 3.2 Vault Watcher
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.2.1 | Implement file watcher | vault-watcher.ts — chokidar watch on vault_path, respect deny/allow dirs | 2.1.1 | S |
| 3.2.2 | Implement debounce & batching | vault-watcher.ts — 2s debounce, 5s collect window, group into changeset | 3.2.1 | M |
| 3.2.3 | Implement auto-sync trigger | vault-watcher.ts — after batch, spawn indexer sync, update health on result | 3.2.2, 2.3.1, 3.1.2 | M |
| 3.2.4 | Write vault watcher tests | vault-watcher.test.ts — mock chokidar events, debounce timing, batch grouping | 3.2.3 | M |
### 3.3 Response Envelope & Error Normalization
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.3.1 | Implement response envelope factory | utils/response.ts — build {status, data, error, meta} from tool results | 0.1.1 | S |
| 3.3.2 | Implement error normalizer | utils/response.ts — map exceptions/codes to error codes, status, recoverable flag | 3.3.1 | S |
### 3.4 Security Guard (TypeScript)
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.4.1 | Implement directory filter validator | security-guard.ts — validate directory_filter against known vault dirs | 2.1.1 | S |
| 3.4.2 | Implement sensitive content flag | security-guard.ts — set sensitive_detected, generate memory_suggestion | 3.4.1 | S |
| 3.4.3 | Write security guard tests | security-guard.test.ts — invalid dirs, sensitive patterns, suggestion generation | 3.4.2 | S |
---
## Phase 4: Tool Layer (TypeScript)
### 4.1 Tool Implementations
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 4.1.1 | Implement obsidian_rag_search tool | tools/search.ts — validate params, call LanceDB search, apply filters, flag sensitive, return envelope | 2.2.1, 3.3.1, 3.4.2 | M |
| 4.1.2 | Implement obsidian_rag_index tool | tools/index.ts — validate mode, spawn indexer, return job_id, track progress | 2.3.1, 2.3.3, 3.3.1 | M |
| 4.1.3 | Implement obsidian_rag_status tool | tools/status.ts — return health state, index stats, active job, ollama status | 3.1.2, 2.3.2, 3.3.1 | S |
| 4.1.4 | Implement obsidian_rag_memory_store tool | tools/memory.ts — validate key/value/source, persist to OpenClaw memory | 3.3.1 | S |
| 4.1.5 | Write tool unit tests | search.test.ts, index.test.ts, memory.test.ts — param validation, filter logic, response shape | 4.1.1-4.1.4 | M |
### 4.2 Plugin Registration
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 4.2.1 | Implement plugin entry point | index.ts — Plugin.onLoad (probe deps, start watcher), register tools, Plugin.onUnload | 4.1.1-4.1.4, 3.2.3, 3.1.2 | M |
| 4.2.2 | Verify OpenClaw plugin lifecycle | Manual test: install → register → call tools → shutdown | 4.2.1 | S |
---
## Phase 5: Integration & Hardening
### 5.1 Integration Tests
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 5.1.1 | Full pipeline integration test | Index KnowledgeVault → search → verify results | 1.6.5, 4.2.1 | M |
| 5.1.2 | Sync cycle integration test | Modify vault file → auto-sync → search returns updated content | 3.2.3, 5.1.1 | M |
| 5.1.3 | Health state integration test | Stop Ollama → verify degraded → restart → verify healthy | 3.1.2, 5.1.1 | S |
| 5.1.4 | OpenClaw protocol integration test | Agent calls all 4 tools, validates envelope, error paths | 4.2.1 | M |
### 5.2 Security Test Suite
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 5.2.1 | Path traversal tests | ../, symlinks, absolute paths, encoded paths, Windows-specific (C:\, UNC) | 1.2.1, 3.4.1 | S |
| 5.2.2 | XSS prevention tests | HTML/script injection in chunk_text, response rendering | 1.2.2 | S |
| 5.2.3 | Prompt injection tests | Malicious vault note content attempting agent manipulation | 4.1.1 | S |
| 5.2.4 | Network audit test | Verify zero outbound requests when local_only=true | 1.4.1 | S |
| 5.2.5 | Sensitive content tests | Pattern detection, flagging in search results, blocking on external API | 1.2.3, 3.4.2 | S |
### 5.3 Documentation & Publishing
| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 5.3.1 | Write README | Usage, setup, config reference, CLI commands, OpenClaw integration | 4.2.1 | S |
| 5.3.2 | Create SKILL.md | Skill manifest for ClawHub publishing | 4.2.1 | S |
| 5.3.3 | Publish to ClawHub | clawhub skill publish + clawhub package publish | 5.1.1-5.2.5 | S |
---
## Dependency Map (Critical Path)
```
Phase 0 (scaffolding)
  ├─→ Phase 1 (Python Data Layer) ── critical path
  │     │
  │     └─→ Phase 2 (TS Data Client)
  │           │
  │           ├─→ Phase 3 (Session & Transport)
  │           │     │
  │           │     └─→ Phase 4 (Tools)
  │           │           │
  │           │           └─→ Phase 5 (Integration)
  │           │
  │           └─→ (3.4 Security Guard can start in parallel with 3.1-3.2)
  └─→ (1.4 Embedder can start after 1.1 Config, parallel with 1.2-1.3)
```
**Critical path:** 0.1 → 1.1 → 1.3 → 1.6 → 2.2 → 3.1 → 3.2 → 4.1 → 4.2 → 5.1
**Parallelizable work:**
- 1.2 (Python security) can run in parallel with 1.3 (chunker) after 1.1
- 1.4 (embedder) can run in parallel with 1.3 after 1.1
- 1.5 (vector store) can run in parallel with 1.3-1.4 after 0.2.2
- 2.1 (TS config) can run in parallel with Phase 1 after 0.1.1
- 3.3 (response envelope) can run in parallel with 3.1-3.2 after 0.1.1
- 3.4 (security guard) can run in parallel with 3.1-3.2 after 2.1.1
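
Parallelizable groups like these can also be derived mechanically from the "Depends on" columns. The sketch below does this for Phases 0 and 1 only, with dependencies collapsed to work-area level as read from the tables; that simplification, the `AREA_DEPS` mapping, and the `parallel_waves` helper are all assumptions made for this example. It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Work area -> work areas it depends on (area-level reading of the Phase 0/1 tables).
AREA_DEPS: dict[str, set[str]] = {
    "0.1": set(),
    "0.2": set(),
    "1.1": {"0.1"},
    "1.2": {"1.1"},
    "1.3": {"0.1"},
    "1.4": {"1.1"},
    "1.5": {"0.2"},
    "1.6": {"1.2", "1.3", "1.4", "1.5"},
}


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; every node in a wave can start at the same time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(AREA_DEPS)
# waves == [["0.1", "0.2"], ["1.1", "1.3", "1.5"], ["1.2", "1.4"], ["1.6"]]
```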
---
## Effort Summary
| Phase | Work Packages | S | M | L | Estimated Sessions |
|-------|---------------|---|---|---|-------------------|
| 0: Scaffolding | 8 | 8 | 0 | 0 | 4-8 |
| 1: Python Data Layer | 26 | 16 | 10 | 0 | 25-40 |
| 2: TS Data Client | 7 | 5 | 2 | 0 | 9-15 |
| 3: Session & Transport | 12 | 9 | 3 | 0 | 13-20 |
| 4: Tool Layer | 7 | 3 | 4 | 0 | 10-18 |
| 5: Integration & Hardening | 12 | 9 | 3 | 0 | 15-22 |
| **Total** | **72** | **50** | **22** | **0** | **76-123 sessions** |
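
As a quick arithmetic check on the totals row, the per-phase session estimates can be summed directly. This sketch reads each Estimated Sessions entry as a low-high range (4 to 8 for Phase 0, 25 to 40 for Phase 1, and so on); the variable names are illustrative.

```python
# Per-phase (low, high) session estimates, read from the Effort Summary table.
PHASE_SESSIONS: dict[str, tuple[int, int]] = {
    "0: Scaffolding": (4, 8),
    "1: Python Data Layer": (25, 40),
    "2: TS Data Client": (9, 15),
    "3: Session & Transport": (13, 20),
    "4: Tool Layer": (10, 18),
    "5: Integration & Hardening": (15, 22),
}

total_low = sum(low for low, _ in PHASE_SESSIONS.values())
total_high = sum(high for _, high in PHASE_SESSIONS.values())
# (total_low, total_high) == (76, 123), matching the totals row.
```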
---
## Risk Items
| Risk | Impact | Mitigation |
|------|--------|------------|
| Ollama mxbai-embed-large model not pulled | Blocks embedding pipeline | WBS 0.2.1 validates early; pull model before Phase 1 |
| LanceDB Python API breaking changes | Schema/query code breaks | Pin lancedb version in pyproject.toml |
| OpenClaw plugin SDK not available/stable | Plugin registration fails | Stub plugin interfaces for development; defer 4.2.2 until SDK confirmed |
| Windows path handling edge cases | Security bypass or crashes | Dedicated Windows test vectors in 5.2.1 |
| chokidar unreliable on Windows | Auto-sync misses changes | Integration test 5.1.2 validates on actual Windows FS; fallback to polling if needed |
| Embedding 677 files takes too long | Poor UX on first index | Batch embedding (64 chunks per batch) + NDJSON progress; measure actual time in 1.6.1 |