obsidian-rag/docs/superpowers/specs/2026-04-10-obsidian-rag-wbs.md

# Obsidian RAG Plugin — Work Breakdown Structure

**Date:** 2026-04-10
**Based on:** Technical Design Document v1.0

## WBS Overview

The work is decomposed into **6 phases** (0–5), **20 work areas**, and **72 work packages**. Phases are sequenced by dependency: foundation first, then bottom-up through the protocol layers, then integration and hardening.

Each work package follows the format:
- **ID**: Hierarchical code (e.g., 1.1.2)
- **Name**: Imperative, action-oriented title
- **Delivers**: Concrete artifact or behavior
- **Depends on**: Prerequisite WBS IDs
- **Effort**: S/M/L relative sizing (S=1-2 sessions, M=3-5 sessions, L=6+ sessions)

---

## Phase 0: Project Scaffolding & Environment

### 0.1 Repository & Build Setup

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 0.1.1 | Initialize TypeScript project structure | package.json, tsconfig.json, src/ directory skeleton | — | S |
| 0.1.2 | Initialize Python package structure | pyproject.toml, obsidian_rag/ module skeleton | — | S |
| 0.1.3 | Create development config file | ./obsidian-rag/config.json pointing at the sample vault ./KnowledgeVault/Default | 0.1.1 | S |
| 0.1.4 | Set up OpenClaw plugin manifest | openclaw.plugin.json with tool declarations | 0.1.1 | S |
| 0.1.5 | Configure test runners | vitest config (TS), pytest config (Python) | 0.1.1, 0.1.2 | S |

### 0.2 Environment Validation

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 0.2.1 | Verify Ollama + mxbai-embed-large | Script that calls /api/embed and returns 1024-dim vector | — | S |
| 0.2.2 | Verify LanceDB Python package | Script that creates a table, inserts, queries | — | S |
| 0.2.3 | Verify sample vault accessibility | Script that walks ./KnowledgeVault/Default and counts .md files | — | S |
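
The vault check in 0.2.3 can be as small as the sketch below. The function name `count_markdown_files` is hypothetical, and the sketch assumes that "accessible" means the directory exists and can be walked recursively.

```python
from pathlib import Path


def count_markdown_files(vault_root: Path) -> int:
    """Walk vault_root recursively and count regular files ending in .md."""
    if not vault_root.is_dir():
        raise FileNotFoundError(f"vault not found: {vault_root}")
    return sum(1 for p in vault_root.rglob("*.md") if p.is_file())
```

Calling `count_markdown_files(Path("./KnowledgeVault/Default"))` from the repository root would print-ready the file count that later phases can compare against index stats.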

---

## Phase 1: Data Layer (Python Indexer)

The data layer is the foundation: every later phase depends on its ability to index the vault and store vectors.

### 1.1 Configuration (Python)

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.1.1 | Implement config loader | config.py — reads JSON config, resolves paths cross-platform, validates schema | 0.1.2 | S |
| 1.1.2 | Write config tests | test_config.py — valid/invalid config, path resolution, defaults | 1.1.1 | S |
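
A minimal sketch of the loader in 1.1.1, under stated assumptions: the only required key shown is `vault_path` (the design document may define more), and relative paths are resolved against a caller-supplied `base_dir` that defaults to the current working directory. Both choices are illustrative, not taken from the design.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical required keys; the real schema lives in the design document.
REQUIRED_KEYS = ("vault_path",)


def load_config(config_file: Path, base_dir: Optional[Path] = None) -> dict:
    """Read a JSON config, validate required keys, and resolve vault_path."""
    raw = json.loads(config_file.read_text(encoding="utf-8"))
    if not isinstance(raw, dict):
        raise ValueError("config root must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))

    # pathlib accepts forward slashes on every platform, so the same
    # config file works on Windows and POSIX.
    vault = Path(raw["vault_path"]).expanduser()
    if not vault.is_absolute():
        vault = (base_dir or Path.cwd()) / vault
    return {**raw, "vault_path": vault.resolve()}
```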

### 1.2 Security (Python)

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.2.1 | Implement path traversal prevention | security.py — validate_path() rejects ../, absolute, symlinks outside vault | 1.1.1 | S |
| 1.2.2 | Implement input sanitization | security.py — sanitize_text() strips HTML, code blocks, normalizes whitespace, caps length | 1.1.1 | S |
| 1.2.3 | Implement sensitive content detection | security.py — detect_sensitive() returns categories matched (health/financial/relations) | 1.1.1 | S |
| 1.2.4 | Implement directory access control | security.py — should_index_dir() applies deny/allow lists | 1.1.1 | S |
| 1.2.5 | Write security tests | test_security.py — path traversal vectors (incl. Windows), sanitization, sensitive detection, dir control | 1.2.1–1.2.4 | M |
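
One way `validate_path()` from 1.2.1 could work is sketched below. The signature and the choice to raise `ValueError` are assumptions. Parsing the input with `PureWindowsPath` is a deliberately conservative trick: it recognizes drive letters, UNC prefixes, and both slash styles even when the code runs on POSIX.

```python
from pathlib import Path, PureWindowsPath


def validate_path(vault_root: Path, user_path: str) -> Path:
    """Return the resolved path if it stays inside vault_root, else raise."""
    if not user_path or "\x00" in user_path:
        raise ValueError("empty or malformed path")

    # PureWindowsPath splits on both '/' and '\\' and detects C:\ and UNC roots.
    win = PureWindowsPath(user_path)
    if Path(user_path).is_absolute() or win.drive or win.root:
        raise ValueError("absolute paths are not allowed")
    if ".." in win.parts:
        raise ValueError("parent-directory segments are not allowed")

    # resolve() follows symlinks, so a link pointing outside the vault
    # produces a path that is no longer under the resolved root.
    root = vault_root.resolve()
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError("path escapes the vault")
    return candidate
```

The sketch rejects rather than normalizes suspicious input, which keeps the test vectors in 1.2.5 and 5.2.1 simple: each vector either returns a path under the vault or raises.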

### 1.3 Chunking

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.3.1 | Implement markdown parser | chunker.py — parse frontmatter, headings, tags, date from filename | 0.1.2 | S |
| 1.3.2 | Implement structured chunker | chunker.py — split by section headers, each section = chunk with metadata | 1.3.1 | M |
| 1.3.3 | Implement sliding window chunker | chunker.py — 500 token window, 100 overlap, for unstructured notes | 1.3.1 | S |
| 1.3.4 | Implement chunk router | chunker.py — detect structured vs unstructured, route to correct chunker | 1.3.2, 1.3.3 | S |
| 1.3.5 | Write chunker tests | test_chunker.py — section splitting, sliding window, metadata, edge cases (empty, single-line) | 1.3.4 | M |
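
The sliding window in 1.3.3 (500-token window, 100-token overlap) advances by 400 tokens per step. The sketch below takes an already tokenized list; how tokens are produced (whitespace split, model tokenizer) is left open, and the function name is hypothetical.

```python
from typing import List, Sequence


def sliding_window(tokens: Sequence[str], window: int = 500,
                   overlap: int = 100) -> List[List[str]]:
    """Split tokens into windows that share `overlap` tokens with the previous one."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + window]))
        # Stop once a window reaches the end, so no chunk is a pure
        # subset of the overlap already covered by the previous one.
        if start + window >= len(tokens):
            break
    return chunks
```

For a 1000-token note this yields three chunks starting at tokens 0, 400, and 800; an empty note yields no chunks, which is one of the edge cases listed for 1.3.5.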

### 1.4 Embedding

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.4.1 | Implement Ollama embedder | embedder.py — call /api/embed, batch 64 chunks, handle errors/retries | 1.1.1 | M |
| 1.4.2 | Implement embedding cache | embedder.py — optional file-based cache to avoid re-embedding unchanged chunks | 1.4.1 | S |
| 1.4.3 | Write embedder tests | test_embedder.py — mocked Ollama, batch handling, error recovery, cache hit/miss | 1.4.1, 1.4.2 | S |
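
The batching and retry behavior in 1.4.1 can be separated from the HTTP call, which also makes 1.4.3 easy to mock. In this sketch `embed_fn` is a hypothetical callable that would wrap the request to /api/embed; the retry count and exponential backoff values are assumptions.

```python
import time
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(texts: Sequence[str],
                     embed_fn: Callable[[Sequence[str]], List[Vector]],
                     batch_size: int = 64,
                     max_retries: int = 3,
                     backoff_s: float = 0.5) -> List[Vector]:
    """Embed texts in batches, retrying each failed batch with backoff."""
    if batch_size < 1 or max_retries < 1:
        raise ValueError("batch_size and max_retries must be >= 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(1, max_retries + 1):
            try:
                result = embed_fn(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up on this batch after the last attempt
                time.sleep(backoff_s * 2 ** (attempt - 1))
        if len(result) != len(batch):
            raise ValueError("embedder returned a different number of vectors")
        vectors.extend(result)
    return vectors
```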

### 1.5 Vector Store

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.5.1 | Implement LanceDB table creation | vector_store.py — create obsidian_chunks table with schema | 0.2.2 | S |
| 1.5.2 | Implement vector upsert | vector_store.py — add/update chunks by chunk_id | 1.5.1 | S |
| 1.5.3 | Implement vector delete | vector_store.py — remove chunks by source_file (for deleted files) | 1.5.1 | S |
| 1.5.4 | Implement vector search | vector_store.py — query by embedding vector with filters (directory, date, tags) | 1.5.1 | M |
| 1.5.5 | Write vector store tests | test_vector_store.py — CRUD, upsert idempotency, search with filters, temp directory cleanup | 1.5.2–1.5.4 | M |

### 1.6 Indexer Pipeline & CLI

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 1.6.1 | Implement full index pipeline | indexer.py — scan → parse → chunk → enrich → embed → store, for all vault files | 1.2.4, 1.3.4, 1.4.1, 1.5.2 | M |
| 1.6.2 | Implement incremental sync | indexer.py — compare mtime, process only changed/deleted files | 1.6.1, 1.5.3 | M |
| 1.6.3 | Implement reindex (nuke + rebuild) | indexer.py — drop table, run full index | 1.6.1 | S |
| 1.6.4 | Implement sync-result.json writer | indexer.py — write atomic .tmp + rename with index stats | 1.6.1 | S |
| 1.6.5 | Implement CLI entry point | cli.py — obsidian-rag index/sync/reindex/status commands, NDJSON progress on stdout | 1.6.1, 1.6.2, 1.6.3 | M |
| 1.6.6 | Write indexer tests | test_indexer.py — full pipeline with mock embedder, incremental sync, reindex, CLI arg parsing | 1.6.5 | M |
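
The incremental sync in 1.6.2 and the atomic writer in 1.6.4 are sketched below. The function names, the `indexed_mtimes` mapping (relative path to the mtime recorded at last index), and the stats payload are assumptions; only the mtime comparison and the .tmp-plus-rename pattern come from the table above.

```python
import json
import os
from pathlib import Path
from typing import Dict, List, Tuple


def plan_sync(vault_root: Path,
              indexed_mtimes: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (changed, deleted) relative paths by comparing mtimes."""
    current = {
        p.relative_to(vault_root).as_posix(): p.stat().st_mtime
        for p in vault_root.rglob("*.md") if p.is_file()
    }
    # "!=" rather than ">" so that files restored from an older backup
    # are also re-indexed; new files have no recorded mtime at all.
    changed = sorted(rel for rel, mtime in current.items()
                     if indexed_mtimes.get(rel) != mtime)
    deleted = sorted(rel for rel in indexed_mtimes if rel not in current)
    return changed, deleted


def write_sync_result(path: Path, stats: dict) -> None:
    """Write stats to a sibling .tmp file, then rename it over the target."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(stats, fh)
        fh.flush()
        os.fsync(fh.fileno())
    # os.replace overwrites the destination in one step, so a reader
    # sees either the old file or the complete new one.
    os.replace(tmp, path)
```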

---

## Phase 2: Data Layer (TypeScript Client)

### 2.1 Configuration (TypeScript)

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 2.1.1 | Implement config loader | config.ts — read JSON config, validate schema, resolve relative paths | 0.1.1 | S |
| 2.1.2 | Implement config types | config.ts — TypeScript interfaces for all config sections | 2.1.1 | S |

### 2.2 LanceDB Client

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 2.2.1 | Implement LanceDB query client | lancedb.ts — connect to existing table, perform vector search with filters | 0.1.1 | M |
| 2.2.2 | Implement full-text search fallback | lancedb.ts — LanceDB scalar query when Ollama is down (degraded mode) | 2.2.1 | S |

### 2.3 Indexer Bridge

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 2.3.1 | Implement subprocess spawner | indexer-bridge.ts — spawn python -m obsidian_rag.cli, parse NDJSON progress | 0.1.1 | M |
| 2.3.2 | Implement sync-result reader | indexer-bridge.ts — read sync-result.json, parse and return | 2.3.1 | S |
| 2.3.3 | Implement job tracking | indexer-bridge.ts — track active job (job_id, mode, progress), detect completion | 2.3.1 | S |
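
The NDJSON contract between the Python CLI (1.6.5) and the bridge (2.3.1) is one JSON object per line on stdout. The sketch below shows the emitting side and a matching reader in Python for illustration only; the bridge itself is TypeScript, and the event fields used in the example are hypothetical.

```python
import json
import sys
from typing import Iterator, Optional, TextIO


def emit_progress(event: dict, stream: Optional[TextIO] = None) -> None:
    """Write one event as a single JSON line and flush immediately."""
    out = stream if stream is not None else sys.stdout
    # json.dumps escapes newlines inside strings, so each event stays on one line.
    out.write(json.dumps(event, separators=(",", ":")) + "\n")
    out.flush()  # the parent process should see progress without buffering delay


def read_progress(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed event per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)
```

A call such as `emit_progress({"type": "progress", "done": 64, "total": 677})` produces exactly one line, which lets the bridge parse events as they arrive instead of waiting for the process to exit.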

---

## Phase 3: Session & Transport Layers (TypeScript)

### 3.1 Health State Machine

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.1.1 | Implement health prober | services/health.ts — probe Ollama (/api/tags), probe LanceDB (table exists), probe vault (dir exists) | 2.1.1, 2.2.1 | S |
| 3.1.2 | Implement state machine | services/health.ts — HEALTHY/DEGRADED/UNAVAILABLE transitions, 30s re-probe timer | 3.1.1 | S |
| 3.1.3 | Implement staleness detector | services/health.ts — if last sync >1h and vault changed, set degraded | 3.1.2, 2.3.2 | S |
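
The state derivation behind 3.1.2 and 3.1.3 is sketched here in Python as a language-neutral illustration; the real module is services/health.ts. Two rules come from this document: Ollama down means degraded, and a sync older than one hour with vault changes means degraded. Treating a missing LanceDB table or vault directory as UNAVAILABLE is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class Probes:
    ollama_ok: bool
    lancedb_ok: bool
    vault_ok: bool
    seconds_since_sync: float
    vault_changed: bool


STALE_AFTER_S = 3600  # "last sync > 1h" from 3.1.3


def derive_health(p: Probes) -> Health:
    """Map the latest probe results to a single health state."""
    if not (p.lancedb_ok and p.vault_ok):
        return Health.UNAVAILABLE  # assumption: nothing to search without these
    if not p.ollama_ok:
        return Health.DEGRADED  # fallback search (2.2.2) still works
    if p.seconds_since_sync > STALE_AFTER_S and p.vault_changed:
        return Health.DEGRADED  # index is stale relative to the vault
    return Health.HEALTHY
```

Keeping the derivation a pure function of probe results means the 30s re-probe timer only has to collect fresh `Probes` and call it again.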

### 3.2 Vault Watcher

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.2.1 | Implement file watcher | vault-watcher.ts — chokidar watch on vault_path, respect deny/allow dirs | 2.1.1 | S |
| 3.2.2 | Implement debounce & batching | vault-watcher.ts — 2s debounce, 5s collect window, group into changeset | 3.2.1 | M |
| 3.2.3 | Implement auto-sync trigger | vault-watcher.ts — after batch, spawn indexer sync, update health on result | 3.2.2, 2.3.1, 3.1.2 | M |
| 3.2.4 | Write vault watcher tests | vault-watcher.test.ts — mock chokidar events, debounce timing, batch grouping | 3.2.3 | M |

### 3.3 Response Envelope & Error Normalization

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.3.1 | Implement response envelope factory | utils/response.ts — build {status, data, error, meta} from tool results | 0.1.1 | S |
| 3.3.2 | Implement error normalizer | utils/response.ts — map exceptions/codes to error codes, status, recoverable flag | 3.3.1 | S |
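
The envelope shape `{status, data, error, meta}` and the error normalizer can be illustrated as follows. This is a Python sketch of logic that will live in utils/response.ts; the status strings, error codes, and exception mapping are all hypothetical placeholders.

```python
from typing import Any, Dict, Tuple, Type

# Hypothetical mapping from exception type to (error code, recoverable flag).
ERROR_MAP: Dict[Type[BaseException], Tuple[str, bool]] = {
    FileNotFoundError: ("VAULT_NOT_FOUND", False),
    ConnectionError: ("OLLAMA_UNREACHABLE", True),
    TimeoutError: ("TIMEOUT", True),
    ValueError: ("INVALID_PARAMS", False),
}


def ok(data: Any, **meta: Any) -> dict:
    """Build a success envelope."""
    return {"status": "ok", "data": data, "error": None, "meta": meta}


def fail(exc: BaseException, **meta: Any) -> dict:
    """Normalize an exception into an error envelope."""
    code, recoverable = "INTERNAL_ERROR", False  # default for unmapped errors
    for exc_type, mapped in ERROR_MAP.items():
        if isinstance(exc, exc_type):
            code, recoverable = mapped
            break
    error = {"code": code, "message": str(exc), "recoverable": recoverable}
    return {"status": "error", "data": None, "error": error, "meta": meta}
```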

### 3.4 Security Guard (TypeScript)

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 3.4.1 | Implement directory filter validator | security-guard.ts — validate directory_filter against known vault dirs | 2.1.1 | S |
| 3.4.2 | Implement sensitive content flag | security-guard.ts — set sensitive_detected, generate memory_suggestion | 3.4.1 | S |
| 3.4.3 | Write security guard tests | security-guard.test.ts — invalid dirs, sensitive patterns, suggestion generation | 3.4.2 | S |

---

## Phase 4: Tool Layer (TypeScript)

### 4.1 Tool Implementations

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 4.1.1 | Implement obsidian_rag_search tool | tools/search.ts — validate params, call LanceDB search, apply filters, flag sensitive, return envelope | 2.2.1, 3.3.1, 3.4.2 | M |
| 4.1.2 | Implement obsidian_rag_index tool | tools/index.ts — validate mode, spawn indexer, return job_id, track progress | 2.3.1, 2.3.3, 3.3.1 | M |
| 4.1.3 | Implement obsidian_rag_status tool | tools/status.ts — return health state, index stats, active job, ollama status | 3.1.2, 2.3.2, 3.3.1 | S |
| 4.1.4 | Implement obsidian_rag_memory_store tool | tools/memory.ts — validate key/value/source, persist to OpenClaw memory | 3.3.1 | S |
| 4.1.5 | Write tool unit tests | search.test.ts, index.test.ts, memory.test.ts — param validation, filter logic, response shape | 4.1.1–4.1.4 | M |

### 4.2 Plugin Registration

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 4.2.1 | Implement plugin entry point | index.ts — Plugin.onLoad (probe deps, start watcher), register tools, Plugin.onUnload | 4.1.1–4.1.4, 3.2.3, 3.1.2 | M |
| 4.2.2 | Verify OpenClaw plugin lifecycle | Manual test: install → register → call tools → shutdown | 4.2.1 | S |

---

## Phase 5: Integration & Hardening

### 5.1 Integration Tests

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 5.1.1 | Full pipeline integration test | Index KnowledgeVault → search → verify results | 1.6.5, 4.2.1 | M |
| 5.1.2 | Sync cycle integration test | Modify vault file → auto-sync → search returns updated content | 3.2.3, 5.1.1 | M |
| 5.1.3 | Health state integration test | Stop Ollama → verify degraded → restart → verify healthy | 3.1.2, 5.1.1 | S |
| 5.1.4 | OpenClaw protocol integration test | Agent calls all 4 tools, validates envelope, error paths | 4.2.1 | M |

### 5.2 Security Test Suite

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 5.2.1 | Path traversal tests | ../, symlinks, absolute paths, encoded paths, Windows-specific (C:\, UNC) | 1.2.1, 3.4.1 | S |
| 5.2.2 | XSS prevention tests | HTML/script injection in chunk_text, response rendering | 1.2.2 | S |
| 5.2.3 | Prompt injection tests | Malicious vault note content attempting agent manipulation | 4.1.1 | S |
| 5.2.4 | Network audit test | Verify zero outbound requests when local_only=true | 1.4.1 | S |
| 5.2.5 | Sensitive content tests | Pattern detection, flagging in search results, blocking on external API | 1.2.3, 3.4.2 | S |

### 5.3 Documentation & Publishing

| ID | Name | Delivers | Depends on | Effort |
|----|------|----------|------------|--------|
| 5.3.1 | Write README | Usage, setup, config reference, CLI commands, OpenClaw integration | 4.2.1 | S |
| 5.3.2 | Create SKILL.md | Skill manifest for ClawHub publishing | 4.2.1 | S |
| 5.3.3 | Publish to ClawHub | clawhub skill publish + clawhub package publish | 5.1.1–5.2.5 | S |

---

## Dependency Map (Critical Path)

```
Phase 0 (scaffolding)
  │
  ├─→ Phase 1 (Python Data Layer) ── critical path
  │     │
  │     └─→ Phase 2 (TS Data Client)
  │           │
  │           ├─→ Phase 3 (Session & Transport)
  │           │     │
  │           │     └─→ Phase 4 (Tools)
  │           │           │
  │           │           └─→ Phase 5 (Integration)
  │           │
  │           └─→ (3.4 Security Guard can start in parallel with 3.1–3.2)
  │
  └─→ (1.4 Embedder can start after 1.1 Config, parallel with 1.2–1.3)
```

**Critical path:** 0.1 → 1.1 → 1.3 → 1.6 → 2.2 → 3.1 → 3.2 → 4.1 → 4.2 → 5.1

**Parallelizable work:**
- 1.2 (Python security) can run parallel with 1.3 (chunker) after 1.1
- 1.4 (embedder) can run parallel with 1.3 after 1.1
- 1.5 (vector store) can run parallel with 1.3–1.4 after 0.2.2
- 2.1 (TS config) can run parallel with Phase 1 after 0.1.1
- 3.3 (response envelope) can run parallel with 3.1–3.2 after 0.1.1
- 3.4 (security guard) can run parallel with 3.1–3.2 after 2.1.1
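
The parallelizable groups can be derived mechanically from the "Depends on" columns. The sketch below uses the standard-library `graphlib` module on a subset of Phase 0 and Phase 1 packages copied from the tables above; the helper name `parallel_waves` is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List

# Package -> prerequisites, copied from the Phase 0 and Phase 1 tables.
DEPENDS_ON: Dict[str, List[str]] = {
    "0.1.2": [],
    "0.2.2": [],
    "1.1.1": ["0.1.2"],
    "1.2.4": ["1.1.1"],
    "1.3.1": ["0.1.2"],
    "1.3.2": ["1.3.1"],
    "1.3.3": ["1.3.1"],
    "1.3.4": ["1.3.2", "1.3.3"],
    "1.4.1": ["1.1.1"],
    "1.5.1": ["0.2.2"],
    "1.5.2": ["1.5.1"],
    "1.6.1": ["1.2.4", "1.3.4", "1.4.1", "1.5.2"],
}


def parallel_waves(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group packages into waves; everything in a wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For this subset, 1.6.1 lands in the fifth wave, and the third wave contains 1.2.4, 1.3.2, 1.3.3, 1.4.1, and 1.5.2 together, which matches the parallel work listed above.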

---

## Effort Summary

| Phase | Work Packages | S | M | L | Estimated Sessions |
|-------|---------------|---|---|---|-------------------|
| 0: Scaffolding | 8 | 8 | 0 | 0 | 4–8 |
| 1: Python Data Layer | 26 | 16 | 10 | 0 | 25–40 |
| 2: TS Data Client | 7 | 5 | 2 | 0 | 9–15 |
| 3: Session & Transport | 12 | 9 | 3 | 0 | 13–20 |
| 4: Tool Layer | 7 | 3 | 4 | 0 | 10–18 |
| 5: Integration & Hardening | 12 | 9 | 3 | 0 | 15–22 |
| **Total** | **72** | **50** | **22** | **0** | **76–123 sessions** |

---

## Risk Items

| Risk | Impact | Mitigation |
|------|--------|------------|
| Ollama mxbai-embed-large model not pulled | Blocks embedding pipeline | WBS 0.2.1 validates early; pull model before Phase 1 |
| LanceDB Python API breaking changes | Schema/query code breaks | Pin lancedb version in pyproject.toml |
| OpenClaw plugin SDK not available/stable | Plugin registration fails | Stub plugin interfaces for development; defer 4.2.2 until SDK confirmed |
| Windows path handling edge cases | Security bypass or crashes | Dedicated Windows test vectors in 5.2.1 |
| chokidar unreliable on Windows | Auto-sync misses changes | Integration test 5.1.2 validates on actual Windows FS; fallback to polling if needed |
| Embedding 677 files takes too long | Poor UX on first index | Batch embedding (64 chunks per batch) + NDJSON progress; measure actual time in 1.6.1 |