obsidian-rag/docs/superpowers/specs/2026-04-10-obsidian-rag-task-list.md at 208531d28d256ab042db8c9a3dc64b372e4ca420

Santhosh Janardhanan 5c281165c7 Sprint 0-1: Python indexer, TS plugin scaffolding, and test suite

## What's new

**Python indexer (`python/obsidian_rag/`)** — full pipeline from scan to LanceDB:
- `config.py` — JSON config loader with cross-platform path resolution
- `security.py` — path traversal prevention, HTML stripping, sensitive content detection, dir allow/deny lists
- `chunker.py` — section-split for journal entries (date-named files), sliding-window for unstructured notes
- `embedder.py` — Ollama `/api/embeddings` client with batched requests and timeout/error handling
- `vector_store.py` — LanceDB schema, upsert (merge_insert), delete, search with filters, stats
- `indexer.py` — full/sync/reindex pipeline orchestrator with progress yields
- `cli.py` — `index | sync | reindex | status` CLI commands
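
The two chunking strategies named for `chunker.py` can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the `YYYY-MM-DD` filename pattern, the heading-based section rule, and the word-based window sizes are all assumptions made for the example.

```python
import re
from pathlib import Path

# Assumption: journal entries are files named like 2026-04-10.md
DATE_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# Assumption: a "section" starts at any markdown heading line
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_sections(text: str) -> list[str]:
    """Split text so that each markdown heading starts a new chunk."""
    chunks, current = [], []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def sliding_window(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Word-based windows; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_note(filename: str, text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Hypothetical dispatcher: date-named files are section-split, others windowed."""
    if DATE_NAME.match(Path(filename).stem):
        return split_sections(text)
    return sliding_window(text, size, overlap)
```

Under these assumptions, `chunk_note("2026-04-10.md", ...)` yields one chunk per heading, while any other filename falls through to overlapping word windows.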

**TypeScript plugin (`src/`)** — OpenClaw plugin scaffold:
- `utils/` — config loader, TypeScript types, response envelope factory, LanceDB client
- `services/` — health state machine (HEALTHY/DEGRADED/UNAVAILABLE), vault watcher with debounce/batching, indexer bridge (subprocess spawner)
- `tools/` — 4 tool stubs: search, index, status, memory_store (OpenClaw wiring pending)
- `index.ts` — plugin entry point with health probe + vault watcher startup

**Config** (`obsidian-rag/config.json`, `openclaw.plugin.json`):
- 627 files / 3764 chunks indexed in dev vault

**Tests: 76 passing**
- Python: 64 pytest tests (chunker, security, vector_store, config)
- TypeScript: 12 vitest tests (lancedb client, response envelope)

## Bugs fixed

- LanceDB `tags` column filter: `LIKE '%tag%'` → `list_contains(tags, 'tag')` (List<String> column)
- LanceDB JS `db.list_tables()` returns `ListTablesResponse` object, not plain array
- LanceDB JS result score field: `_score` → `_distance`
- TypeScript regex literal with unescaped `/` in path-resolve regex
- Python: `create_table_if_not_exists` identity check → name comparison
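
The last fix belongs to a common class of Python bug: comparing strings with `is` instead of `==`. The sketch below shows one plausible shape of that bug, not the actual `create_table_if_not_exists` code; both helper names are hypothetical and no LanceDB calls are used.

```python
def exists_by_identity(existing: list[str], name: str) -> bool:
    # Buggy pattern: `is` compares object identity, not string contents.
    return any(t is name for t in existing)


def exists_by_name(existing: list[str], name: str) -> bool:
    # Fixed pattern: compare the table names themselves.
    return any(t == name for t in existing)


# Simulate a table name returned by a database call as a fresh str object.
existing = ["".join(["no", "tes"])]
wanted = "notes"

print(exists_by_identity(existing, wanted))  # typically False in CPython
print(exists_by_name(existing, wanted))      # True
```

With the identity check, an existing table can look missing because two equal strings need not be the same object, so the caller may try to create the table again.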

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

| Phase | Tasks | Done | Pending | In Progress |
|---|---|---|---|---|
| Phase 0: Scaffolding | 8 | 8 | 0 | 0 |
| Phase 1: Python Indexer | 20 | 16 | 2 | 2 |
| Phase 2: TS Client | 7 | 6 | 0 | 1 |
| Phase 3: Session/Transport | 10 | 8 | 1 | 1 |
| Phase 4: Tool Layer | 7 | 1 | 5 | 1 |
| Phase 5: Integration | 12 | 0 | 12 | 0 |
| Total | 64 | 39 | 20 | 5 |

Obsidian RAG Plugin - Work Queue

Legend

Phase 0: Project Scaffolding & Environment

0.1 Repository & Build Setup

0.2 Environment Validation

Phase 1: Data Layer (Python Indexer)

1.1 Configuration (Python)

1.2 Security (Python) - Can start after 1.1.1, parallel with other components

1.3 Chunking - Can start after 1.1.1, parallel with security

1.4 Embedding - Can start after 1.1.1, parallel with chunking/security

1.5 Vector Store - Can start after 0.2.2, parallel with other components

1.6 Indexer Pipeline & CLI - Depends on multiple components

Phase 2: Data Layer (TypeScript Client)

2.1 Configuration (TypeScript) - Can start after 0.1.1, parallel with Phase 1

2.2 LanceDB Client - Depends on Phase 1 completion

2.3 Indexer Bridge - Depends on Phase 1 completion

Phase 3: Session & Transport Layers

3.1 Health State Machine - Depends on Phase 2

3.2 Vault Watcher - Depends on Phase 2

3.3 Response Envelope & Error Normalization - Can start after 0.1.1, parallel

3.4 Security Guard (TypeScript) - Can start after 2.1.1, parallel with 3.1-3.2

Phase 4: Tool Layer

4.1 Tool Implementations - Depends on Phase 3

4.2 Plugin Registration - Depends on tools

Phase 5: Integration & Hardening

5.1 Integration Tests - Depends on Phase 4

5.2 Security Test Suite - Depends on relevant components

5.3 Documentation & Publishing - Depends on integration tests

Progress Summary

Critical Path

Parallel Work Opportunities

Effort Estimates
