obsidian-rag/docs/superpowers/specs/2026-04-10-obsidian-rag-wbs.md
Santhosh Janardhanan b8996d2ecb Add Work Breakdown Structure for Obsidian RAG Plugin
64 work packages across 5 phases and 15 work areas, organized
bottom-up through the layered protocol architecture. Includes
dependency map, critical path, parallelizable work, effort
estimates, and risk items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 18:49:58 -04:00

Obsidian RAG Plugin — Work Breakdown Structure

Date: 2026-04-10
Based on: Technical Design Document v1.0

WBS Overview

The work is decomposed into 6 phases (0–5), 20 work areas, and 72 work packages. Phases are sequenced by dependency: foundation first, then bottom-up through the protocol layers, then integration and hardening.

Each work package follows the format:

  • ID: Hierarchical code (e.g., 1.1.2)
  • Name: Imperative, action-oriented title
  • Delivers: Concrete artifact or behavior
  • Depends on: Prerequisite WBS IDs
  • Effort: S/M/L relative sizing (S=1-2 sessions, M=3-5 sessions, L=6+ sessions)

Phase 0: Project Scaffolding & Environment

0.1 Repository & Build Setup

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 0.1.1 | Initialize TypeScript project structure | package.json, tsconfig.json, src/ directory skeleton | | S |
| 0.1.2 | Initialize Python package structure | pyproject.toml, obsidian_rag/ module skeleton | | S |
| 0.1.3 | Create development config file | ./obsidian-rag/config.json with ./KnowledgeVault/Default | 0.1.1 | S |
| 0.1.4 | Set up OpenClaw plugin manifest | openclaw.plugin.json with tool declarations | 0.1.1 | S |
| 0.1.5 | Configure test runners | vitest config (TS), pytest config (Python) | 0.1.1, 0.1.2 | S |

0.2 Environment Validation

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 0.2.1 | Verify Ollama + mxbai-embed-large | Script that calls /api/embed and returns 1024-dim vector | | S |
| 0.2.2 | Verify LanceDB Python package | Script that creates a table, inserts, queries | | S |
| 0.2.3 | Verify sample vault accessibility | Script that walks ./KnowledgeVault/Default and counts .md files | | S |
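
The vault check in 0.2.3 could be as small as the sketch below. The helper name count_markdown_files is hypothetical; only the vault path and the ".md" filter come from the WBS.

```python
from pathlib import Path


def count_markdown_files(vault_root: Path) -> int:
    """Recursively count .md files under vault_root (hypothetical helper)."""
    if not vault_root.is_dir():
        raise FileNotFoundError(f"Vault directory not found: {vault_root}")
    # rglob walks all subdirectories; is_file() skips directories named "*.md".
    return sum(1 for path in vault_root.rglob("*.md") if path.is_file())
```

A validation script would call count_markdown_files(Path("./KnowledgeVault/Default")) and print the result.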

Phase 1: Data Layer (Python Indexer)

The data layer is the foundation: every later phase depends on its ability to index and store vectors.

1.1 Configuration (Python)

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 1.1.1 | Implement config loader | config.py: reads JSON config, resolves paths cross-platform, validates schema | 0.1.2 | S |
| 1.1.2 | Write config tests | test_config.py: valid/invalid config, path resolution, defaults | 1.1.1 | S |
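
A minimal sketch of what the 1.1.1 loader might look like. The vault_path key appears elsewhere in this WBS, but treating it as the only required key is an assumption, as is resolving relative paths against the config file's directory; the real schema belongs to the Technical Design Document.

```python
import json
from pathlib import Path

# Assumption: "vault_path" is the only required key in this sketch.
REQUIRED_KEYS = ("vault_path",)


def load_config(config_path: Path) -> dict:
    """Read a JSON config, check required keys, and resolve vault_path."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError("config root must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")

    vault = Path(config["vault_path"]).expanduser()
    if not vault.is_absolute():
        # Assumption: relative paths are relative to the config file, not the CWD.
        vault = config_path.resolve().parent / vault
    config["vault_path"] = vault.resolve()
    return config
```

Using pathlib for the join and resolve steps keeps the separator handling platform-neutral, which is the "cross-platform" part of the deliverable.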

1.2 Security (Python)

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 1.2.1 | Implement path traversal prevention | security.py: validate_path() rejects ../, absolute paths, symlinks outside vault | 1.1.1 | S |
| 1.2.2 | Implement input sanitization | security.py: sanitize_text() strips HTML, code blocks, normalizes whitespace, caps length | 1.1.1 | S |
| 1.2.3 | Implement sensitive content detection | security.py: detect_sensitive() returns categories matched (health/financial/relations) | 1.1.1 | S |
| 1.2.4 | Implement directory access control | security.py: should_index_dir() applies deny/allow lists | 1.1.1 | S |
| 1.2.5 | Write security tests | test_security.py: path traversal vectors (incl. Windows), sanitization, sensitive detection, dir control | 1.2.1–1.2.4 | M |
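
One possible shape for validate_path() from 1.2.1 is sketched below. The signature (vault root plus a vault-relative string) and the choice to raise ValueError are assumptions; the WBS only names the function and the three rejection cases. Parsing the candidate with PureWindowsPath on every platform is a design choice of this sketch, so that drive letters, UNC prefixes, and backslash-separated ".." segments are caught even when the indexer runs on POSIX.

```python
from pathlib import Path, PureWindowsPath


def validate_path(vault_root: Path, candidate: str) -> Path:
    """Return the resolved path if it stays inside the vault, else raise ValueError."""
    win = PureWindowsPath(candidate)  # splits on both "/" and "\"
    if win.anchor:
        # Covers "/etc/passwd", "C:\\x", "C:x", and "\\\\server\\share\\x".
        raise ValueError("absolute or drive-qualified paths are not allowed")
    if ".." in win.parts:
        raise ValueError("parent-directory segments are not allowed")

    root = vault_root.resolve()
    resolved = (root / candidate).resolve()  # follows symlinks
    if not resolved.is_relative_to(root):  # requires Python 3.9+
        raise ValueError("path escapes the vault (for example via a symlink)")
    return resolved
```

Rejecting every ".." segment is stricter than necessary (a path like "a/../b" stays inside the vault), but it keeps the rule easy to test.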

1.3 Chunking

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 1.3.1 | Implement markdown parser | chunker.py: parse frontmatter, headings, tags, date from filename | 0.1.2 | S |
| 1.3.2 | Implement structured chunker | chunker.py: split by section headers, each section = chunk with metadata | 1.3.1 | M |
| 1.3.3 | Implement sliding window chunker | chunker.py: 500 token window, 100 overlap, for unstructured notes | 1.3.1 | S |
| 1.3.4 | Implement chunk router | chunker.py: detect structured vs unstructured, route to correct chunker | 1.3.2, 1.3.3 | S |
| 1.3.5 | Write chunker tests | test_chunker.py: section splitting, sliding window, metadata, edge cases (empty, single-line) | 1.3.4 | M |
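
The sliding window in 1.3.3 can be illustrated as follows. The WBS does not say how tokens are counted, so this sketch assumes whitespace-separated words stand in for tokens; a real implementation might use the embedding model's tokenizer instead. The function name is hypothetical.

```python
def sliding_window_chunks(text: str, window: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of at most `window` tokens."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")

    tokens = text.split()  # assumption: whitespace words approximate tokens
    step = window - overlap  # each chunk starts 400 tokens after the previous one
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this chunk already reaches the end of the note
    return chunks
```

With the defaults, a 1,200-token note yields three chunks starting at tokens 0, 400, and 800, and an empty note yields no chunks, which covers one of the edge cases listed for 1.3.5.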

1.4 Embedding

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 1.4.1 | Implement Ollama embedder | embedder.py: call /api/embed, batch 64 chunks, handle errors/retries | 1.1.1 | M |
| 1.4.2 | Implement embedding cache | embedder.py: optional file-based cache to avoid re-embedding unchanged chunks | 1.4.1 | S |
| 1.4.3 | Write embedder tests | test_embedder.py: mocked Ollama, batch handling, error recovery, cache hit/miss | 1.4.1, 1.4.2 | S |
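
The batching and retry behavior in 1.4.1 might be structured like this sketch. The function names, the retry count, and the linear backoff are all assumptions. The request shape (a list under "input", vectors under "embeddings") and the base URL http://localhost:11434 reflect Ollama's usual defaults as far as I know and should be verified against the installed version. Passing the embedding call in as a parameter is what makes the "mocked Ollama" tests in 1.4.3 straightforward.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ollama_embed(texts: list[str], model: str = "mxbai-embed-large",
                 base_url: str = "http://localhost:11434") -> list[list[float]]:
    """POST one batch to /api/embed (payload and response keys are assumptions)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["embeddings"]


def embed_all(texts: Sequence[str],
              embed_fn: Callable[[list[str]], list[list[float]]] = ollama_embed,
              batch_size: int = 64, max_retries: int = 3,
              backoff_s: float = 0.5) -> list[list[float]]:
    """Embed texts in batches, retrying each batch on network-level errors."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        for attempt in range(1, max_retries + 1):
            try:
                vectors.extend(embed_fn(batch))
                break
            except OSError:  # urllib's URLError is a subclass of OSError
                if attempt == max_retries:
                    raise
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    return vectors
```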

1.5 Vector Store

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 1.5.1 | Implement LanceDB table creation | vector_store.py: create obsidian_chunks table with schema | 0.2.2 | S |
| 1.5.2 | Implement vector upsert | vector_store.py: add/update chunks by chunk_id | 1.5.1 | S |
| 1.5.3 | Implement vector delete | vector_store.py: remove chunks by source_file (for deleted files) | 1.5.1 | S |
| 1.5.4 | Implement vector search | vector_store.py: query by embedding vector with filters (directory, date, tags) | 1.5.1 | M |
| 1.5.5 | Write vector store tests | test_vector_store.py: CRUD, upsert idempotency, search with filters, temp directory cleanup | 1.5.2–1.5.4 | M |

1.6 Indexer Pipeline & CLI

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 1.6.1 | Implement full index pipeline | indexer.py: scan → parse → chunk → enrich → embed → store, for all vault files | 1.2.4, 1.3.4, 1.4.1, 1.5.2 | M |
| 1.6.2 | Implement incremental sync | indexer.py: compare mtime, process only changed/deleted files | 1.6.1, 1.5.3 | M |
| 1.6.3 | Implement reindex (nuke + rebuild) | indexer.py: drop table, run full index | 1.6.1 | S |
| 1.6.4 | Implement sync-result.json writer | indexer.py: write atomic .tmp + rename with index stats | 1.6.1 | S |
| 1.6.5 | Implement CLI entry point | cli.py: obsidian-rag index/sync/reindex/status commands, NDJSON progress on stdout | 1.6.1, 1.6.2, 1.6.3 | M |
| 1.6.6 | Write indexer tests | test_indexer.py: full pipeline with mock embedder, incremental sync, reindex, CLI arg parsing | 1.6.5 | M |
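
Two of the smaller pieces here, the mtime comparison in 1.6.2 and the atomic writer in 1.6.4, can be sketched together. Both function names are hypothetical, and the sketch assumes the previous run's modification times are available as a mapping from vault-relative path to mtime; where that mapping is stored is not specified in the WBS.

```python
import json
import os
from pathlib import Path


def diff_vault(vault_root: Path, indexed_mtimes: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (changed_or_new, deleted) vault-relative paths since the last index."""
    current = {
        path.relative_to(vault_root).as_posix(): path.stat().st_mtime
        for path in vault_root.rglob("*.md")
        if path.is_file()
    }
    # Any mtime difference counts as a change; unseen files have no recorded mtime.
    changed = sorted(rel for rel, mtime in current.items()
                     if indexed_mtimes.get(rel) != mtime)
    deleted = sorted(rel for rel in indexed_mtimes if rel not in current)
    return changed, deleted


def write_sync_result(result_path: Path, stats: dict) -> None:
    """Write stats to a sibling .tmp file, then rename it over the target."""
    tmp_path = result_path.with_name(result_path.name + ".tmp")
    tmp_path.write_text(json.dumps(stats, indent=2), encoding="utf-8")
    # os.replace is atomic when both paths are on the same filesystem,
    # so a reader never sees a half-written sync-result.json.
    os.replace(tmp_path, result_path)
```

The deleted list is what would feed the delete-by-source_file operation from 1.5.3, which is why 1.6.2 depends on it.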

Phase 2: Data Layer (TypeScript Client)

2.1 Configuration (TypeScript)

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 2.1.1 | Implement config loader | config.ts: read JSON config, validate schema, resolve relative paths | 0.1.1 | S |
| 2.1.2 | Implement config types | config.ts: TypeScript interfaces for all config sections | 2.1.1 | S |

2.2 LanceDB Client

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 2.2.1 | Implement LanceDB query client | lancedb.ts: connect to existing table, perform vector search with filters | 0.1.1 | M |
| 2.2.2 | Implement full-text search fallback | lancedb.ts: LanceDB scalar query when Ollama is down (degraded mode) | 2.2.1 | S |

2.3 Indexer Bridge

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 2.3.1 | Implement subprocess spawner | indexer-bridge.ts: spawn python -m obsidian_rag.cli, parse NDJSON progress | 0.1.1 | M |
| 2.3.2 | Implement sync-result reader | indexer-bridge.ts: read sync-result.json, parse and return | 2.3.1 | S |
| 2.3.3 | Implement job tracking | indexer-bridge.ts: track active job (job_id, mode, progress), detect completion | 2.3.1 | S |

Phase 3: Session & Transport Layers (TypeScript)

3.1 Health State Machine

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 3.1.1 | Implement health prober | services/health.ts: probe Ollama (/api/tags), probe LanceDB (table exists), probe vault (dir exists) | 2.1.1, 2.2.1 | S |
| 3.1.2 | Implement state machine | services/health.ts: HEALTHY/DEGRADED/UNAVAILABLE transitions, 30s re-probe timer | 3.1.1 | S |
| 3.1.3 | Implement staleness detector | services/health.ts: if last sync >1h and vault changed, set degraded | 3.1.2, 2.3.2 | S |
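
The state derivation behind 3.1.2 and 3.1.3 is sketched below in Python for illustration only; the WBS places the real implementation in services/health.ts. The Ollama-down and staleness rules follow this document (2.2.2, 3.1.3, 5.1.3). Mapping a missing LanceDB table or vault directory to UNAVAILABLE is an assumption, since the WBS does not define that transition. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class ProbeResult:
    ollama_up: bool
    table_exists: bool
    vault_exists: bool
    seconds_since_sync: float
    vault_changed: bool


STALE_AFTER_S = 3600  # the ">1h" threshold from 3.1.3


def derive_health(probe: ProbeResult) -> Health:
    """Map one round of probe results to a health state."""
    if not (probe.table_exists and probe.vault_exists):
        return Health.UNAVAILABLE  # assumption: nothing to search without these
    if not probe.ollama_up:
        return Health.DEGRADED  # fallback search still works without embeddings
    if probe.seconds_since_sync > STALE_AFTER_S and probe.vault_changed:
        return Health.DEGRADED  # index is stale relative to the vault
    return Health.HEALTHY
```

Keeping the derivation a pure function of the probe results means the 30s re-probe timer only has to call it and compare against the previous state.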

3.2 Vault Watcher

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 3.2.1 | Implement file watcher | vault-watcher.ts: chokidar watch on vault_path, respect deny/allow dirs | 2.1.1 | S |
| 3.2.2 | Implement debounce & batching | vault-watcher.ts: 2s debounce, 5s collect window, group into changeset | 3.2.1 | M |
| 3.2.3 | Implement auto-sync trigger | vault-watcher.ts: after batch, spawn indexer sync, update health on result | 3.2.2, 2.3.1, 3.1.2 | M |
| 3.2.4 | Write vault watcher tests | vault-watcher.test.ts: mock chokidar events, debounce timing, batch grouping | 3.2.3 | M |

3.3 Response Envelope & Error Normalization

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 3.3.1 | Implement response envelope factory | utils/response.ts: build {status, data, error, meta} from tool results | 0.1.1 | S |
| 3.3.2 | Implement error normalizer | utils/response.ts: map exceptions/codes to error codes, status, recoverable flag | 3.3.1 | S |

3.4 Security Guard (TypeScript)

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 3.4.1 | Implement directory filter validator | security-guard.ts: validate directory_filter against known vault dirs | 2.1.1 | S |
| 3.4.2 | Implement sensitive content flag | security-guard.ts: set sensitive_detected, generate memory_suggestion | 3.4.1 | S |
| 3.4.3 | Write security guard tests | security-guard.test.ts: invalid dirs, sensitive patterns, suggestion generation | 3.4.2 | S |

Phase 4: Tool Layer (TypeScript)

4.1 Tool Implementations

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 4.1.1 | Implement obsidian_rag_search tool | tools/search.ts: validate params, call LanceDB search, apply filters, flag sensitive, return envelope | 2.2.1, 3.3.1, 3.4.2 | M |
| 4.1.2 | Implement obsidian_rag_index tool | tools/index.ts: validate mode, spawn indexer, return job_id, track progress | 2.3.1, 2.3.3, 3.3.1 | M |
| 4.1.3 | Implement obsidian_rag_status tool | tools/status.ts: return health state, index stats, active job, ollama status | 3.1.2, 2.3.2, 3.3.1 | S |
| 4.1.4 | Implement obsidian_rag_memory_store tool | tools/memory.ts: validate key/value/source, persist to OpenClaw memory | 3.3.1 | S |
| 4.1.5 | Write tool unit tests | search.test.ts, index.test.ts, memory.test.ts: param validation, filter logic, response shape | 4.1.1–4.1.4 | M |

4.2 Plugin Registration

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 4.2.1 | Implement plugin entry point | index.ts: Plugin.onLoad (probe deps, start watcher), register tools, Plugin.onUnload | 4.1.1–4.1.4, 3.2.3, 3.1.2 | M |
| 4.2.2 | Verify OpenClaw plugin lifecycle | Manual test: install → register → call tools → shutdown | 4.2.1 | S |

Phase 5: Integration & Hardening

5.1 Integration Tests

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 5.1.1 | Full pipeline integration test | Index KnowledgeVault → search → verify results | 1.6.5, 4.2.1 | M |
| 5.1.2 | Sync cycle integration test | Modify vault file → auto-sync → search returns updated content | 3.2.3, 5.1.1 | M |
| 5.1.3 | Health state integration test | Stop Ollama → verify degraded → restart → verify healthy | 3.1.2, 5.1.1 | S |
| 5.1.4 | OpenClaw protocol integration test | Agent calls all 4 tools, validates envelope, error paths | 4.2.1 | M |

5.2 Security Test Suite

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 5.2.1 | Path traversal tests | ../, symlinks, absolute paths, encoded paths, Windows-specific (C:, UNC) | 1.2.1, 3.4.1 | S |
| 5.2.2 | XSS prevention tests | HTML/script injection in chunk_text, response rendering | 1.2.2 | S |
| 5.2.3 | Prompt injection tests | Malicious vault note content attempting agent manipulation | 4.1.1 | S |
| 5.2.4 | Network audit test | Verify zero outbound requests when local_only=true | 1.4.1 | S |
| 5.2.5 | Sensitive content tests | Pattern detection, flagging in search results, blocking on external API | 1.2.3, 3.4.2 | S |

5.3 Documentation & Publishing

| ID | Name | Delivers | Depends on | Effort |
|---|---|---|---|---|
| 5.3.1 | Write README | Usage, setup, config reference, CLI commands, OpenClaw integration | 4.2.1 | S |
| 5.3.2 | Create SKILL.md | Skill manifest for ClawHub publishing | 4.2.1 | S |
| 5.3.3 | Publish to ClawHub | clawhub skill publish + clawhub package publish | 5.1.1–5.2.5 | S |

Dependency Map (Critical Path)

Phase 0 (scaffolding)
  │
  ├─→ Phase 1 (Python Data Layer) ── critical path
  │     │
  │     └─→ Phase 2 (TS Data Client)
  │           │
  │           ├─→ Phase 3 (Session & Transport)
  │           │     │
  │           │     └─→ Phase 4 (Tools)
  │           │           │
  │           │           └─→ Phase 5 (Integration)
  │           │
  │           └─→ (3.4 Security Guard can start in parallel with 3.1–3.2)
  │
  └─→ (1.4 Embedder can start after 1.1 Config, parallel with 1.2–1.3)

Critical path: 0.1 → 1.1 → 1.3 → 1.6 → 2.2 → 3.1 → 3.2 → 4.1 → 4.2 → 5.1

Parallelizable work:

  • 1.2 (Python security) can run in parallel with 1.3 (chunker) after 1.1
  • 1.4 (embedder) can run in parallel with 1.3 after 1.1
  • 1.5 (vector store) can run in parallel with 1.3–1.4 after 0.2.2
  • 2.1 (TS config) can run in parallel with Phase 1 after 0.1.1
  • 3.3 (response envelope) can run in parallel with 3.1–3.2 after 0.1.1
  • 3.4 (security guard) can run in parallel with 3.1–3.2 after 2.1.1
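
The parallelism described above can be derived mechanically from the "Depends on" columns. The sketch below uses Python's standard graphlib module on a subset of the Phase 0 and Phase 1 dependencies copied from the tables; the function name parallel_waves is hypothetical, and a full check would load every work package rather than this excerpt.

```python
from graphlib import TopologicalSorter

# Work package -> prerequisites, excerpted from the Phase 1 tables.
DEPENDS_ON = {
    "1.1.1": {"0.1.2"},
    "1.2.4": {"1.1.1"},
    "1.3.1": {"0.1.2"},
    "1.3.2": {"1.3.1"},
    "1.3.3": {"1.3.1"},
    "1.3.4": {"1.3.2", "1.3.3"},
    "1.4.1": {"1.1.1"},
    "1.5.1": {"0.2.2"},
    "1.5.2": {"1.5.1"},
    "1.6.1": {"1.2.4", "1.3.4", "1.4.1", "1.5.2"},
}


def parallel_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group work packages into waves; everything in a wave can run concurrently."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(DEPENDS_ON), start=1):
    print(number, wave)
```

For this excerpt, 1.2.4, 1.3.2, 1.3.3, 1.4.1, and 1.5.2 land in the same wave, which matches the claim that security, chunking, embedding, and vector store work can overlap once 1.1 is done.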

Effort Summary

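| Phase | Work Packages | S | M | L | Estimated Sessions |
|---|---|---|---|---|---|
| 0: Scaffolding | 8 | 8 | 0 | 0 | 4–8 |
| 1: Python Data Layer | 26 | 16 | 10 | 0 | 25–40 |
| 2: TS Data Client | 7 | 5 | 2 | 0 | 9–15 |
| 3: Session & Transport | 12 | 9 | 3 | 0 | 13–20 |
| 4: Tool Layer | 7 | 3 | 4 | 0 | 10–18 |
| 5: Integration & Hardening | 12 | 9 | 3 | 0 | 15–22 |
| Total | 72 | 50 | 22 | 0 | 76–123 sessions |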
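
As a quick arithmetic check on the total, the per-phase session estimates can be summed as low and high bounds. This sketch reads the Estimated Sessions column as low–high ranges; the variable and function names are hypothetical.

```python
# (low, high) session estimates per phase, read from the Effort Summary.
SESSION_RANGES = {
    "0: Scaffolding": (4, 8),
    "1: Python Data Layer": (25, 40),
    "2: TS Data Client": (9, 15),
    "3: Session & Transport": (13, 20),
    "4: Tool Layer": (10, 18),
    "5: Integration & Hardening": (15, 22),
}


def total_range(ranges: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low bounds and the high bounds separately."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


print(total_range(SESSION_RANGES))  # (76, 123)
```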

Risk Items

| Risk | Impact | Mitigation |
|---|---|---|
| Ollama mxbai-embed-large model not pulled | Blocks embedding pipeline | WBS 0.2.1 validates early; pull model before Phase 1 |
| LanceDB Python API breaking changes | Schema/query code breaks | Pin lancedb version in pyproject.toml |
| OpenClaw plugin SDK not available/stable | Plugin registration fails | Stub plugin interfaces for development; defer 4.2.2 until SDK confirmed |
| Windows path handling edge cases | Security bypass or crashes | Dedicated Windows test vectors in 5.2.1 |
| chokidar unreliable on Windows | Auto-sync misses changes | Integration test 5.1.2 validates on actual Windows FS; fall back to polling if needed |
| 677 files take too long to embed | Poor UX on first index | Batch embedding (64 chunks per batch) + NDJSON progress; measure actual time in 1.6.1 |