Santhosh Janardhanan b8996d2ecb Add Work Breakdown Structure for Obsidian RAG Plugin

64 work packages across 5 phases and 15 work areas, organized
bottom-up through the layered protocol architecture. Includes
dependency map, critical path, parallelizable work, effort
estimates, and risk items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-10 18:49:58 -04:00

Obsidian RAG Plugin — Work Breakdown Structure

Date: 2026-04-10
Based on: Technical Design Document v1.0

WBS Overview

The work is decomposed into 6 phases (0–5), 20 work areas, and 72 work packages. Phases are sequenced by dependency: foundation first, then bottom-up through the protocol layers, then integration and hardening.

Each work package follows the format:

ID: Hierarchical code (e.g., 1.1.2)
Name: Imperative, action-oriented title
Delivers: Concrete artifact or behavior
Depends on: Prerequisite WBS IDs
Effort: S/M/L relative sizing (S=1-2 sessions, M=3-5 sessions, L=6+ sessions)
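
The row format above maps onto a small record type. The following is a minimal sketch rather than part of the plan: `WorkPackage`, `expand_dependencies`, and `parse_row` are hypothetical names. It assumes rows are tab-separated, and that dependencies are written as a comma-separated list, a range within one work area such as `1.2.1–1.2.4`, or `—` for none. Ranges that span work areas (such as `5.1.1–5.2.5`) cannot be expanded without the full table, so the sketch rejects them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkPackage:
    id: str
    name: str
    delivers: str
    depends_on: tuple[str, ...]
    effort: str  # "S", "M", or "L"


def expand_dependencies(cell: str) -> tuple[str, ...]:
    """Expand a 'Depends on' cell into individual WBS IDs."""
    cell = cell.strip()
    if cell in ("", "—"):
        return ()
    ids: list[str] = []
    for part in cell.split(","):
        part = part.strip()
        if "–" in part:
            start, end = (p.strip() for p in part.split("–"))
            prefix, first = start.rsplit(".", 1)
            end_prefix, last = end.rsplit(".", 1)
            if prefix != end_prefix:
                # Assumption: only ranges inside one work area are expandable here.
                raise ValueError(f"range spans work areas: {part}")
            ids.extend(f"{prefix}.{n}" for n in range(int(first), int(last) + 1))
        else:
            ids.append(part)
    return tuple(ids)


def parse_row(row: str) -> WorkPackage:
    """Parse one tab-separated table row into a WorkPackage."""
    wbs_id, name, delivers, depends, effort = (c.strip() for c in row.split("\t"))
    if effort not in {"S", "M", "L"}:
        raise ValueError(f"unknown effort size: {effort}")
    return WorkPackage(wbs_id, name, delivers, expand_dependencies(depends), effort)
```

Parsing rows this way would make it possible to check dependency references and per-phase counts mechanically instead of by hand.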

Phase 0: Project Scaffolding & Environment

0.1 Repository & Build Setup

ID	Name	Delivers	Depends on	Effort
0.1.1	Initialize TypeScript project structure	package.json, tsconfig.json, src/ directory skeleton	—	S
0.1.2	Initialize Python package structure	pyproject.toml, obsidian_rag/ module skeleton	—	S
0.1.3	Create development config file	./obsidian-rag/config.json pointing at the sample vault ./KnowledgeVault/Default	0.1.1	S
0.1.4	Set up OpenClaw plugin manifest	openclaw.plugin.json with tool declarations	0.1.1	S
0.1.5	Configure test runners	vitest config (TS), pytest config (Python)	0.1.1, 0.1.2	S

0.2 Environment Validation

ID	Name	Delivers	Depends on	Effort
0.2.1	Verify Ollama + mxbai-embed-large	Script that calls /api/embed and returns 1024-dim vector	—	S
0.2.2	Verify LanceDB Python package	Script that creates a table, inserts, queries	—	S
0.2.3	Verify sample vault accessibility	Script that walks ./KnowledgeVault/Default and counts .md files	—	S
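
As an illustration of 0.2.3, the vault check could be as small as the sketch below. The function name `count_markdown_files` is hypothetical, and the sketch assumes every `.md` file under the root counts, including files in hidden directories.

```python
from pathlib import Path


def count_markdown_files(vault_root: str) -> int:
    """Walk the vault recursively and count .md files."""
    root = Path(vault_root)
    if not root.is_dir():
        raise FileNotFoundError(f"vault not found: {root}")
    # rglob descends into every subdirectory; is_file() skips directories named *.md.
    return sum(1 for p in root.rglob("*.md") if p.is_file())
```

Calling `count_markdown_files("./KnowledgeVault/Default")` either returns the file count or fails fast with a clear error if the sample vault is missing.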

Phase 1: Data Layer (Python Indexer)

The data layer is the foundation — everything else depends on it being able to index and store vectors.

1.1 Configuration (Python)

ID	Name	Delivers	Depends on	Effort
1.1.1	Implement config loader	config.py — reads JSON config, resolves paths cross-platform, validates schema	0.1.2	S
1.1.2	Write config tests	test_config.py — valid/invalid config, path resolution, defaults	1.1.1	S

1.2 Security (Python)

ID	Name	Delivers	Depends on	Effort
1.2.1	Implement path traversal prevention	security.py — validate_path() rejects ../, absolute, symlinks outside vault	1.1.1	S
1.2.2	Implement input sanitization	security.py — sanitize_text() strips HTML, code blocks, normalizes whitespace, caps length	1.1.1	S
1.2.3	Implement sensitive content detection	security.py — detect_sensitive() returns categories matched (health/financial/relations)	1.1.1	S
1.2.4	Implement directory access control	security.py — should_index_dir() applies deny/allow lists	1.1.1	S
1.2.5	Write security tests	test_security.py — path traversal vectors (incl. Windows), sanitization, sensitive detection, dir control	1.2.1–1.2.4	M
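
To make 1.2.1 and 1.2.2 concrete, here is one possible shape for `validate_path()` and `sanitize_text()`. It is a sketch under assumptions, not the design's actual implementation: the signatures, the `max_chars` default, and the choice to raise `ValueError` are all hypothetical, and the regex-based HTML stripping is deliberately naive.

```python
import re
from pathlib import Path, PureWindowsPath


def validate_path(vault_root: Path, candidate: str) -> Path:
    """Return the resolved path if it stays inside the vault, else raise ValueError."""
    # Reject absolute paths in POSIX, Windows drive, and UNC form on any OS.
    if candidate.startswith(("/", "\\")) or PureWindowsPath(candidate).drive:
        raise ValueError(f"absolute path not allowed: {candidate!r}")
    # Reject explicit parent-directory segments with either separator.
    if ".." in re.split(r"[\\/]", candidate):
        raise ValueError(f"parent traversal not allowed: {candidate!r}")
    root = vault_root.resolve()
    resolved = (root / candidate).resolve()  # resolve() follows symlinks
    if resolved != root and root not in resolved.parents:
        raise ValueError(f"path escapes vault: {candidate!r}")
    return resolved


def sanitize_text(text: str, max_chars: int = 4000) -> str:
    """Strip fenced code blocks and HTML tags, normalize whitespace, cap length."""
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"<[^>]+>", " ", text)  # naive HTML tag removal
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_chars]
```

Checking the resolved path against the resolved root is what catches symlinks that point outside the vault; the string checks alone would miss them.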

1.3 Chunking

ID	Name	Delivers	Depends on	Effort
1.3.1	Implement markdown parser	chunker.py — parse frontmatter, headings, tags, date from filename	0.1.2	S
1.3.2	Implement structured chunker	chunker.py — split by section headers, each section = chunk with metadata	1.3.1	M
1.3.3	Implement sliding window chunker	chunker.py — 500 token window, 100 overlap, for unstructured notes	1.3.1	S
1.3.4	Implement chunk router	chunker.py — detect structured vs unstructured, route to correct chunker	1.3.2, 1.3.3	S
1.3.5	Write chunker tests	test_chunker.py — section splitting, sliding window, metadata, edge cases (empty, single-line)	1.3.4	M
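
The sliding window chunker in 1.3.3 can be sketched as follows. This assumes whitespace-separated words stand in for tokens; a real implementation would presumably use the embedding model's tokenizer. The function name and signature are hypothetical.

```python
def sliding_window_chunks(text: str, window: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows of `window` tokens."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    tokens = text.split()  # assumption: whitespace tokens approximate model tokens
    if not tokens:
        return []
    step = window - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window]))
        # Stop once this window has reached the end, to avoid a redundant tail chunk.
        if start + window >= len(tokens):
            break
    return chunks
```

With the default sizes each window starts 400 tokens after the previous one, so consecutive chunks share 100 tokens of context.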

1.4 Embedding

ID	Name	Delivers	Depends on	Effort
1.4.1	Implement Ollama embedder	embedder.py — call /api/embed, batch 64 chunks, handle errors/retries	1.1.1	M
1.4.2	Implement embedding cache	embedder.py — optional file-based cache to avoid re-embedding unchanged chunks	1.4.1	S
1.4.3	Write embedder tests	test_embedder.py — mocked Ollama, batch handling, error recovery, cache hit/miss	1.4.1, 1.4.2	S
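
A rough sketch of the batching and retry behavior in 1.4.1 follows. `embed_in_batches` and `ollama_embed` are hypothetical names, and the retry count and backoff are assumptions. The HTTP call assumes Ollama is listening on its default local port and that `/api/embed` accepts `model` and `input` and returns an `embeddings` list. The embedding function is injected so that tests can substitute a mock, as 1.4.3 suggests.

```python
import json
import time
import urllib.request
from typing import Callable, Sequence

EmbedFn = Callable[[Sequence[str]], list[list[float]]]


def ollama_embed(texts: Sequence[str],
                 base_url: str = "http://localhost:11434",
                 model: str = "mxbai-embed-large") -> list[list[float]]:
    """Send one batch to Ollama's /api/embed endpoint (assumed local default URL)."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["embeddings"]


def embed_in_batches(chunks: Sequence[str], embed_fn: EmbedFn,
                     batch_size: int = 64, max_retries: int = 3,
                     backoff_s: float = 0.5) -> list[list[float]]:
    """Embed chunks in batches, retrying each batch on connection errors."""
    vectors: list[list[float]] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        for attempt in range(1, max_retries + 1):
            try:
                result = embed_fn(batch)
                break
            except OSError:  # URLError and ConnectionError are OSError subclasses
                if attempt == max_retries:
                    raise
                time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
        if len(result) != len(batch):
            raise ValueError("embedding count does not match batch size")
        vectors.extend(result)
    return vectors
```

In production code the call would be `embed_in_batches(chunks, ollama_embed)`; in tests a stub function takes the place of `ollama_embed`.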

1.5 Vector Store

ID	Name	Delivers	Depends on	Effort
1.5.1	Implement LanceDB table creation	vector_store.py — create obsidian_chunks table with schema	0.2.2	S
1.5.2	Implement vector upsert	vector_store.py — add/update chunks by chunk_id	1.5.1	S
1.5.3	Implement vector delete	vector_store.py — remove chunks by source_file (for deleted files)	1.5.1	S
1.5.4	Implement vector search	vector_store.py — query by embedding vector with filters (directory, date, tags)	1.5.1	M
1.5.5	Write vector store tests	test_vector_store.py — CRUD, upsert idempotency, search with filters, temp directory cleanup	1.5.2–1.5.4	M

1.6 Indexer Pipeline & CLI

ID	Name	Delivers	Depends on	Effort
1.6.1	Implement full index pipeline	indexer.py — scan → parse → chunk → enrich → embed → store, for all vault files	1.2.4, 1.3.4, 1.4.1, 1.5.2	M
1.6.2	Implement incremental sync	indexer.py — compare mtime, process only changed/deleted files	1.6.1, 1.5.3	M
1.6.3	Implement reindex (nuke + rebuild)	indexer.py — drop table, run full index	1.6.1	S
1.6.4	Implement sync-result.json writer	indexer.py — write atomic .tmp + rename with index stats	1.6.1	S
1.6.5	Implement CLI entry point	cli.py — obsidian-rag index/sync/reindex/status commands, NDJSON progress on stdout	1.6.1, 1.6.2, 1.6.3	M
1.6.6	Write indexer tests	test_indexer.py — full pipeline with mock embedder, incremental sync, reindex, CLI arg parsing	1.6.5	M
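
The incremental sync (1.6.2) and atomic result writer (1.6.4) could look roughly like the sketch below. `plan_sync` and `write_sync_result` are hypothetical names. The sketch assumes the previous run's modification times are available as a dict keyed by vault-relative path, and it treats any mtime difference as a change.

```python
import json
import os
from pathlib import Path


def plan_sync(vault_root: Path, indexed_mtimes: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (changed, deleted) vault-relative paths since the last index run."""
    current = {
        p.relative_to(vault_root).as_posix(): p.stat().st_mtime
        for p in vault_root.rglob("*.md")
        if p.is_file()
    }
    # New files have no recorded mtime, so they count as changed too.
    changed = sorted(rel for rel, mtime in current.items() if indexed_mtimes.get(rel) != mtime)
    deleted = sorted(rel for rel in indexed_mtimes if rel not in current)
    return changed, deleted


def write_sync_result(path: Path, stats: dict) -> None:
    """Write stats to a .tmp file, then rename it over the target in one step."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(stats, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before renaming
    os.replace(tmp, path)  # readers see either the old file or the new one
```

Because the rename replaces the file in a single step, a reader such as the sync-result reader in 2.3.2 should never observe a half-written JSON file.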

Phase 2: Data Layer (TypeScript Client)

2.1 Configuration (TypeScript)

ID	Name	Delivers	Depends on	Effort
2.1.1	Implement config loader	config.ts — read JSON config, validate schema, resolve relative paths	0.1.1	S
2.1.2	Implement config types	config.ts — TypeScript interfaces for all config sections	2.1.1	S

2.2 LanceDB Client

ID	Name	Delivers	Depends on	Effort
2.2.1	Implement LanceDB query client	lancedb.ts — connect to existing table, perform vector search with filters	0.1.1	M
2.2.2	Implement full-text search fallback	lancedb.ts — LanceDB scalar query when Ollama is down (degraded mode)	2.2.1	S

2.3 Indexer Bridge

ID	Name	Delivers	Depends on	Effort
2.3.1	Implement subprocess spawner	indexer-bridge.ts — spawn python -m obsidian_rag.cli, parse NDJSON progress	0.1.1	M
2.3.2	Implement sync-result reader	indexer-bridge.ts — read sync-result.json, parse and return	2.3.1	S
2.3.3	Implement job tracking	indexer-bridge.ts — track active job (job_id, mode, progress), detect completion	2.3.1	S
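
The bridge itself is a TypeScript component, but the NDJSON line protocol it parses is easiest to show from the Python indexer's side. The sketch below is illustrative only: `emit_progress` and `parse_ndjson` are hypothetical names, and the event fields (`type`, `processed`, `total`) are assumptions rather than a defined schema.

```python
import json
from typing import Iterable, Iterator, TextIO


def emit_progress(stream: TextIO, **event) -> None:
    """Write one JSON object per line and flush so the parent sees it immediately."""
    stream.write(json.dumps(event) + "\n")
    stream.flush()


def parse_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON line, skipping blanks and non-JSON noise."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray log line on stdout
        if isinstance(event, dict):
            yield event
```

A TypeScript reader would apply the same rule: split the subprocess's stdout on newlines and parse each line independently, so one malformed line does not abort the whole job.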

Phase 3: Session & Transport Layers (TypeScript)

3.1 Health State Machine

ID	Name	Delivers	Depends on	Effort
3.1.1	Implement health prober	services/health.ts — probe Ollama (/api/tags), probe LanceDB (table exists), probe vault (dir exists)	2.1.1, 2.2.1	S
3.1.2	Implement state machine	services/health.ts — HEALTHY/DEGRADED/UNAVAILABLE transitions, 30s re-probe timer	3.1.1	S
3.1.3	Implement staleness detector	services/health.ts — if last sync >1h and vault changed, set degraded	3.1.2, 2.3.2	S
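
The state evaluation behind 3.1.2 and 3.1.3 can be sketched in a language-neutral way; Python is used here only for brevity, since the real module is `services/health.ts`. The mapping from probe results to states is partly an assumption: the plan says Ollama being down means degraded mode and a stale index sets degraded, while treating a missing LanceDB table or vault directory as UNAVAILABLE is a guess. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class ProbeResult:
    ollama_ok: bool
    lancedb_ok: bool
    vault_ok: bool
    seconds_since_sync: float
    vault_changed: bool


def evaluate(probe: ProbeResult, stale_after_s: float = 3600.0) -> Health:
    """Map one round of probe results to a health state."""
    if not (probe.lancedb_ok and probe.vault_ok):
        return Health.UNAVAILABLE  # assumption: no index or no vault means no service
    if not probe.ollama_ok:
        return Health.DEGRADED  # search can fall back without embeddings
    if probe.seconds_since_sync > stale_after_s and probe.vault_changed:
        return Health.DEGRADED  # index is stale relative to the vault
    return Health.HEALTHY
```

The 30s re-probe timer would call `evaluate` on fresh probe results and record a transition whenever the returned state differs from the current one.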

3.2 Vault Watcher

ID	Name	Delivers	Depends on	Effort
3.2.1	Implement file watcher	vault-watcher.ts — chokidar watch on vault_path, respect deny/allow dirs	2.1.1	S
3.2.2	Implement debounce & batching	vault-watcher.ts — 2s debounce, 5s collect window, group into changeset	3.2.1	M
3.2.3	Implement auto-sync trigger	vault-watcher.ts — after batch, spawn indexer sync, update health on result	3.2.2, 2.3.1, 3.1.2	M
3.2.4	Write vault watcher tests	vault-watcher.test.ts — mock chokidar events, debounce timing, batch grouping	3.2.3	M

3.3 Response Envelope & Error Normalization

ID	Name	Delivers	Depends on	Effort
3.3.1	Implement response envelope factory	utils/response.ts — build {status, data, error, meta} from tool results	0.1.1	S
3.3.2	Implement error normalizer	utils/response.ts — map exceptions/codes to error codes, status, recoverable flag	3.3.1	S

3.4 Security Guard (TypeScript)

ID	Name	Delivers	Depends on	Effort
3.4.1	Implement directory filter validator	security-guard.ts — validate directory_filter against known vault dirs	2.1.1	S
3.4.2	Implement sensitive content flag	security-guard.ts — set sensitive_detected, generate memory_suggestion	3.4.1	S
3.4.3	Write security guard tests	security-guard.test.ts — invalid dirs, sensitive patterns, suggestion generation	3.4.2	S

Phase 4: Tool Layer (TypeScript)

4.1 Tool Implementations

ID	Name	Delivers	Depends on	Effort
4.1.1	Implement obsidian_rag_search tool	tools/search.ts — validate params, call LanceDB search, apply filters, flag sensitive, return envelope	2.2.1, 3.3.1, 3.4.2	M
4.1.2	Implement obsidian_rag_index tool	tools/index.ts — validate mode, spawn indexer, return job_id, track progress	2.3.1, 2.3.3, 3.3.1	M
4.1.3	Implement obsidian_rag_status tool	tools/status.ts — return health state, index stats, active job, ollama status	3.1.2, 2.3.2, 3.3.1	S
4.1.4	Implement obsidian_rag_memory_store tool	tools/memory.ts — validate key/value/source, persist to OpenClaw memory	3.3.1	S
4.1.5	Write tool unit tests	search.test.ts, index.test.ts, memory.test.ts — param validation, filter logic, response shape	4.1.1–4.1.4	M

4.2 Plugin Registration

ID	Name	Delivers	Depends on	Effort
4.2.1	Implement plugin entry point	index.ts — Plugin.onLoad (probe deps, start watcher), register tools, Plugin.onUnload	4.1.1–4.1.4, 3.2.3, 3.1.2	M
4.2.2	Verify OpenClaw plugin lifecycle	Manual test: install → register → call tools → shutdown	4.2.1	S

Phase 5: Integration & Hardening

5.1 Integration Tests

ID	Name	Delivers	Depends on	Effort
5.1.1	Full pipeline integration test	Index KnowledgeVault → search → verify results	1.6.5, 4.2.1	M
5.1.2	Sync cycle integration test	Modify vault file → auto-sync → search returns updated content	3.2.3, 5.1.1	M
5.1.3	Health state integration test	Stop Ollama → verify degraded → restart → verify healthy	3.1.2, 5.1.1	S
5.1.4	OpenClaw protocol integration test	Agent calls all 4 tools, validates envelope, error paths	4.2.1	M

5.2 Security Test Suite

ID	Name	Delivers	Depends on	Effort
5.2.1	Path traversal tests	../, symlinks, absolute paths, encoded paths, Windows-specific (C:, UNC)	1.2.1, 3.4.1	S
5.2.2	XSS prevention tests	HTML/script injection in chunk_text, response rendering	1.2.2	S
5.2.3	Prompt injection tests	Malicious vault note content attempting agent manipulation	4.1.1	S
5.2.4	Network audit test	Verify zero outbound requests when local_only=true	1.4.1	S
5.2.5	Sensitive content tests	Pattern detection, flagging in search results, blocking on external API	1.2.3, 3.4.2	S

5.3 Documentation & Publishing

ID	Name	Delivers	Depends on	Effort
5.3.1	Write README	Usage, setup, config reference, CLI commands, OpenClaw integration	4.2.1	S
5.3.2	Create SKILL.md	Skill manifest for ClawHub publishing	4.2.1	S
5.3.3	Publish to ClawHub	clawhub skill publish + clawhub package publish	5.1.1–5.2.5	S

Dependency Map (Critical Path)

Phase 0 (scaffolding)
  │
  ├─→ Phase 1 (Python Data Layer) ── critical path
  │     │
  │     └─→ Phase 2 (TS Data Client)
  │           │
  │           ├─→ Phase 3 (Session & Transport)
  │           │     │
  │           │     └─→ Phase 4 (Tools)
  │           │           │
  │           │           └─→ Phase 5 (Integration)
  │           │
  │           └─→ (3.4 Security Guard can start in parallel with 3.1–3.2)
  │
  └─→ (1.4 Embedder can start after 1.1 Config, parallel with 1.2–1.3)

Critical path: 0.1 → 1.1 → 1.3 → 1.6 → 2.2 → 3.1 → 3.2 → 4.1 → 4.2 → 5.1

Parallelizable work:

1.2 (Python security) can run in parallel with 1.3 (chunker) after 1.1
1.4 (embedder) can run in parallel with 1.3 after 1.1
1.5 (vector store) can run in parallel with 1.3–1.4 after 0.2.2
2.1 (TS config) can run in parallel with Phase 1 after 0.1.1
3.3 (response envelope) can run in parallel with 3.1–3.2 after 0.1.1
3.4 (security guard) can run in parallel with 3.1–3.2 after 2.1.1
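
Parallelizable groups like these can also be derived mechanically from the "Depends on" columns. The sketch below uses the standard library's `graphlib` on a subset of the Phase 0 and Phase 1 dependencies leading to 1.6.1, copied from the tables above. The helper name `parallel_batches` is hypothetical, and the batches ignore effort sizes, so they show what may start together rather than a schedule.

```python
from graphlib import TopologicalSorter

# Work package -> prerequisites, taken from the tables above (subset only).
DEPENDS_ON = {
    "0.1.2": [],
    "0.2.2": [],
    "1.1.1": ["0.1.2"],
    "1.2.4": ["1.1.1"],
    "1.3.1": ["0.1.2"],
    "1.3.2": ["1.3.1"],
    "1.3.3": ["1.3.1"],
    "1.3.4": ["1.3.2", "1.3.3"],
    "1.4.1": ["1.1.1"],
    "1.5.1": ["0.2.2"],
    "1.5.2": ["1.5.1"],
    "1.6.1": ["1.2.4", "1.3.4", "1.4.1", "1.5.2"],
}


def parallel_batches(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group work packages into batches whose members can start at the same time."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(DEPENDS_ON), start=1):
    print(number, batch)
```

For this subset the third batch contains 1.2.4, 1.3.2, 1.3.3, 1.4.1, and 1.5.2 together, which matches the claim that security, chunking, embedding, and vector store work can proceed side by side.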

Effort Summary

Phase	Work Packages	S	M	Estimated Sessions
0: Scaffolding	8	8	0	4–8
1: Python Data Layer	26	16	10	25–40
2: TS Data Client	7	5	2	9–15
3: Session & Transport	12	9	3	13–20
4: Tool Layer	7	3	4	10–18
5: Integration & Hardening	12	9	3	15–22
Total	72	50	22	76–123 sessions

Risk Items

Risk	Impact	Mitigation
Ollama mxbai-embed-large model not pulled	Blocks embedding pipeline	WBS 0.2.1 validates early; pull model before Phase 1
LanceDB Python API breaking changes	Schema/query code breaks	Pin lancedb version in pyproject.toml
OpenClaw plugin SDK not available/stable	Plugin registration fails	Stub plugin interfaces for development; defer 4.2.2 until SDK confirmed
Windows path handling edge cases	Security bypass or crashes	Dedicated Windows test vectors in 5.2.1
chokidar unreliable on Windows	Auto-sync misses changes	Integration test 5.1.2 validates on actual Windows FS; fallback to polling if needed
Embedding 677 files takes too long	Poor UX on first index	Batch embedding (64 chunks per batch) + NDJSON progress; measure actual time in 1.6.1
