Sprint 0-1: Python indexer, TS plugin scaffolding, and test suite
## What's new

**Python indexer (`python/obsidian_rag/`)**: full pipeline from scan to LanceDB.

- `config.py`: JSON config loader with cross-platform path resolution
- `security.py`: path traversal prevention, HTML stripping, sensitive content detection, directory allow/deny lists
- `chunker.py`: section splitting for journal entries (date-named files), sliding window for unstructured notes
- `embedder.py`: Ollama `/api/embeddings` client with batched requests and timeout/error handling
- `vector_store.py`: LanceDB schema, upsert (`merge_insert`), delete, filtered search, stats
- `indexer.py`: full/sync/reindex pipeline orchestrator that yields progress updates
- `cli.py`: `index | sync | reindex | status` CLI commands

**TypeScript plugin (`src/`)**: OpenClaw plugin scaffold.

- `utils/`: config loader, TypeScript types, response envelope factory, LanceDB client
- `services/`: health state machine (HEALTHY/DEGRADED/UNAVAILABLE), vault watcher with debounce/batching, indexer bridge (subprocess spawner)
- `tools/`: 4 tool stubs (search, index, status, memory_store); OpenClaw wiring pending
- `index.ts`: plugin entry point with health probe and vault watcher startup

**Config** (`obsidian-rag/config.json`, `openclaw.plugin.json`):

- 627 files / 3764 chunks indexed in dev vault

**Tests: 76 passing**

- Python: 64 pytest tests (chunker, security, vector_store, config)
- TypeScript: 12 vitest tests (lancedb client, response envelope)

## Bugs fixed

- LanceDB `tags` column filter: `LIKE '%tag%'` → `list_contains(tags, 'tag')` (`tags` is a `List<String>` column)
- LanceDB JS `db.list_tables()` returns a `ListTablesResponse` object, not a plain array
- LanceDB JS result score field: `_score` → `_distance`
- TypeScript regex literal with an unescaped `/` in the path-resolution regex
- Python: `create_table_if_not_exists` identity check → name comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
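The tag-filter fix replaces substring matching with a list-membership test. LanceDB filters are SQL-style predicate strings. The following is a minimal sketch of how one might assemble such a predicate safely. `sql_quote`, `build_tag_filter` and `rank_by_distance` are hypothetical helpers, not code from `vector_store.py`. `list_contains` is used only because the fix names it. Vector-search hits carry `_distance`, and smaller values are closer, so ranking sorts in ascending order.

```python
from typing import Iterable, List, Optional


def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling embedded quotes (standard SQL escaping)."""
    return "'" + value.replace("'", "''") + "'"


def build_tag_filter(tags: Optional[Iterable[str]], match_all: bool = True) -> Optional[str]:
    """Build a membership predicate over the List<String> `tags` column (hypothetical helper).

    Returns None when there is nothing to filter, so the caller can skip the
    filter entirely instead of passing an empty predicate.
    """
    clauses = [f"list_contains(tags, {sql_quote(tag)})" for tag in (tags or [])]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    joiner = " AND " if match_all else " OR "
    # Parenthesise so the result can be ANDed with other predicates safely.
    return "(" + joiner.join(clauses) + ")"


def rank_by_distance(rows: List[dict]) -> List[dict]:
    """Order vector-search hits by `_distance`: smaller means closer."""
    return sorted(rows, key=lambda row: row["_distance"])
```

Returning `None` for an empty tag list lets the caller skip the filter call altogether rather than send an empty predicate.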
@@ -2,7 +2,7 @@

**Date:** 2026-04-10
**Based on:** Work Breakdown Structure v1.0
**Last Updated:** 2026-04-10 21:30

## Legend
- `[ ]` = Pending

@@ -15,57 +15,57 @@
## Phase 0: Project Scaffolding & Environment

### 0.1 Repository & Build Setup
- [x] **0.1.1** Initialize TypeScript project structure (S) - Create package.json, tsconfig.json, src/ directory
- [x] **0.1.2** Initialize Python package structure (S) - Create pyproject.toml, obsidian_rag/ module skeleton
- [x] **0.1.3** Create development config file (S) - Depends on 0.1.1 - Create ./obsidian-rag/config.json
- [x] **0.1.4** Set up OpenClaw plugin manifest (S) - Depends on 0.1.1 - Create openclaw.plugin.json
- [x] **0.1.5** Configure test runners (S) - Depends on 0.1.1, 0.1.2 - Set up vitest and pytest configs

### 0.2 Environment Validation
- [x] **0.2.1** Verify Ollama + mxbai-embed-large (S) - Test embedding API
- [x] **0.2.2** Verify LanceDB Python package (S) - Test table creation and queries
- [x] **0.2.3** Verify sample vault accessibility (S) - Count .md files in KnowledgeVault

---

## Phase 1: Data Layer (Python Indexer)

### 1.1 Configuration (Python)
- [x] **1.1.1** Implement config loader (S) - Depends on 0.1.2 - Read JSON, resolve paths, validate schema
- [ ] **1.1.2** Write config tests (S) - Depends on 1.1.1 - Test validation and path resolution
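Item 1.1.1 reads JSON, resolves paths and validates the schema. The following is a minimal sketch of that flow. It assumes hypothetical keys `vault_path` and `index_dir`, and it assumes relative paths resolve against the config file's own directory. The real `config.py` may use different keys and rules.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict

# Hypothetical schema: the real config.json keys may differ.
PATH_KEYS = ("vault_path", "index_dir")


def resolve_path(raw: str, base_dir: Path) -> Path:
    """Expand ~ and environment variables, then anchor relative paths to base_dir."""
    expanded = Path(os.path.expandvars(os.path.expanduser(raw)))
    if not expanded.is_absolute():
        expanded = base_dir / expanded
    return expanded.resolve()


def load_config(config_file: str) -> Dict[str, Any]:
    """Read the JSON config, check required keys, and return it with absolute paths."""
    path = Path(config_file).resolve()
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError("config root must be a JSON object")
    missing = [key for key in PATH_KEYS if key not in data]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    for key in PATH_KEYS:
        if not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string path")
        # Relative paths resolve against the config file's directory, not the CWD.
        data[key] = resolve_path(data[key], path.parent)
    return data
```

Anchoring relative paths to the config file rather than the working directory gives the same result from the repo root or from a process spawned by the plugin.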

### 1.2 Security (Python) - Can start after 1.1.1, parallel with other components
- [x] **1.2.1** Implement path traversal prevention (S) - Depends on 1.1.1 - Validate paths, reject ../ and symlinks
- [x] **1.2.2** Implement input sanitization (S) - Depends on 1.1.1 - Strip HTML, normalize whitespace
- [x] **1.2.3** Implement sensitive content detection (S) - Depends on 1.1.1 - Detect health/financial/relations content
- [x] **1.2.4** Implement directory access control (S) - Depends on 1.1.1 - Apply deny/allow lists
- [x] **1.2.5** Write security tests (M) - Depends on 1.2.1-1.2.4 - Test all security functions
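The path, directory and sanitization items together form a small gatekeeping layer. The following sketch shows one way to express them, but it is not the project's `security.py`. `safe_vault_path`, `dir_allowed` and `sanitize` are hypothetical names. The sketch also assumes that deny entries win over allow entries, and that an empty allow list means "allow everything not denied". Stripping tags with a regex is a simplification that also removes stray `<...>` text.

```python
import re
from pathlib import Path, PurePosixPath
from typing import Optional, Sequence, Tuple

_TAG_RE = re.compile(r"<[^>]+>")
_DRIVE_RE = re.compile(r"^[A-Za-z]:")


def _parts(relative: str) -> Tuple[str, ...]:
    """Split a vault-relative path on either separator; '.' segments are dropped."""
    return PurePosixPath(relative.replace("\\", "/")).parts


def safe_vault_path(vault_root: Path, relative: str) -> Optional[Path]:
    """Return the absolute path of `relative` inside the vault, or None if it could escape."""
    parts = _parts(relative)
    if not parts or relative.startswith(("/", "\\")) or _DRIVE_RE.match(relative) or ".." in parts:
        return None
    root = vault_root.resolve()
    current = root
    for part in parts:
        current = current / part
        if current.is_symlink():  # refuse any symlinked component
            return None
    resolved = current.resolve()
    # Defence in depth: the final path must still sit under the vault root.
    if resolved != root and root not in resolved.parents:
        return None
    return resolved


def dir_allowed(relative: str, allow: Sequence[str], deny: Sequence[str]) -> bool:
    """Deny rules win; an empty allow list means 'everything not denied' (assumed policy)."""
    parts = _parts(relative)

    def under(rule: str) -> bool:
        rule_parts = _parts(rule)
        return parts[: len(rule_parts)] == rule_parts

    if any(under(rule) for rule in deny):
        return False
    return not allow or any(under(rule) for rule in allow)


def sanitize(text: str) -> str:
    """Strip HTML tags and normalise whitespace while keeping paragraph breaks."""
    text = _TAG_RE.sub("", text)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()
```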

### 1.3 Chunking - Can start after 1.1.1, parallel with security
- [x] **1.3.1** Implement markdown parser (S) - Depends on 0.1.2 - Parse frontmatter, headings, tags
- [x] **1.3.2** Implement structured chunker (M) - Depends on 1.3.1 - Split by section headers
- [x] **1.3.3** Implement sliding window chunker (S) - Depends on 1.3.1 - 500-token window with overlap
- [x] **1.3.4** Implement chunk router (S) - Depends on 1.3.2, 1.3.3 - Route structured vs unstructured
- [x] **1.3.5** Write chunker tests (M) - Depends on 1.3.4 - Test all chunking scenarios
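The chunk router sends date-named journal files to section splitting and all other notes to a sliding window. The following is a minimal sketch under these assumptions:

- A `YYYY-MM-DD` file stem marks a journal entry.
- Tokens are approximated by whitespace-separated words.
- The overlap is a hypothetical 100 words, since the tracker only says "with overlap".

A production chunker would also need to ignore `#` lines inside code fences.

```python
import re
from pathlib import PurePath
from typing import List

DATE_NAME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed journal naming, e.g. 2026-04-10.md
HEADING_RE = re.compile(r"^#{1,6}\s", re.MULTILINE)


def split_sections(text: str) -> List[str]:
    """Structured path: one chunk per heading-delimited section (text before the first heading kept)."""
    starts = [m.start() for m in HEADING_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    sections = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def sliding_window(text: str, window: int = 500, overlap: int = 100) -> List[str]:
    """Unstructured path: fixed-size windows sharing `overlap` words with the previous window."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    words = text.split()
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):  # last window already reaches the end
            break
    return chunks


def chunk_note(filename: str, text: str) -> List[str]:
    """Route date-named journal files to section splitting, everything else to the window."""
    if DATE_NAME_RE.match(PurePath(filename).stem):
        return split_sections(text)
    return sliding_window(text)
```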

### 1.4 Embedding - Can start after 1.1.1, parallel with chunking/security
- [x] **1.4.1** Implement Ollama embedder (M) - Depends on 1.1.1 - Batch 64 chunks, error handling
- [ ] **1.4.2** Implement embedding cache (S) - Depends on 1.4.1 - File-based cache
- [ ] **1.4.3** Write embedder tests (S) - Depends on 1.4.1, 1.4.2 - Test batching and cache
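Ollama's `/api/embeddings` endpoint takes one `prompt` per request and returns `{"embedding": [...]}`. In this sketch, "batching" therefore just groups chunks into slices of 64 and reports progress once per slice. `embed_all`, `EmbeddingError` and the injectable `post` parameter are hypothetical. The 1024-dimension check reflects the output size of mxbai-embed-large and doubles as a smoke test in the spirit of 0.2.1.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional, Sequence

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default local address
MODEL = "mxbai-embed-large"
EXPECTED_DIM = 1024  # mxbai-embed-large returns 1024-dimensional vectors
BATCH_SIZE = 64

PostFn = Callable[[str, Dict[str, object]], Dict[str, object]]


class EmbeddingError(RuntimeError):
    """Ollama was unreachable, timed out, or returned an unusable body."""


def http_post(url: str, payload: Dict[str, object], timeout: float = 30.0) -> Dict[str, object]:
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:  # URLError and timeouts are OSError; bad JSON is ValueError
        raise EmbeddingError(f"Ollama request failed: {exc}") from exc


def embed_all(
    texts: Sequence[str],
    post: PostFn = http_post,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[List[float]]:
    vectors: List[List[float]] = []
    for start in range(0, len(texts), BATCH_SIZE):
        for text in texts[start:start + BATCH_SIZE]:
            body = post(OLLAMA_URL, {"model": MODEL, "prompt": text})
            vector = body.get("embedding")
            if not isinstance(vector, list) or len(vector) != EXPECTED_DIM:
                raise EmbeddingError(f"unexpected embedding response: {str(body)[:200]}")
            vectors.append(vector)
        if on_progress is not None:
            on_progress(len(vectors), len(texts))  # report once per batch
    return vectors
```

Passing `post` as a parameter keeps the batching and validation logic testable without a running Ollama server.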

### 1.5 Vector Store - Can start after 0.2.2, parallel with other components
- [x] **1.5.1** Implement LanceDB table creation (S) - Depends on 0.2.2 - Create obsidian_chunks table
- [x] **1.5.2** Implement vector upsert (S) - Depends on 1.5.1 - Add/update chunks
- [x] **1.5.3** Implement vector delete (S) - Depends on 1.5.1 - Remove by source_file
- [x] **1.5.4** Implement vector search (M) - Depends on 1.5.1 - Query with filters
- [x] **1.5.5** Write vector store tests (M) - Depends on 1.5.2-1.5.4 - Test CRUD operations

### 1.6 Indexer Pipeline & CLI - Depends on multiple components
- [x] **1.6.1** Implement full index pipeline (M) - Depends on 1.2.4, 1.3.4, 1.4.1, 1.5.2 - Scan → parse → chunk → embed → store
- [x] **1.6.2** Implement incremental sync (M) - Depends on 1.6.1, 1.5.3 - Compare mtime, process changes
- [x] **1.6.3** Implement reindex (S) - Depends on 1.6.1 - Drop table + rebuild
- [x] **1.6.4** Implement sync-result.json writer (S) - Depends on 1.6.1 - Atomic file writing
- [x] **1.6.5** Implement CLI entry point (M) - Depends on 1.6.1, 1.6.2, 1.6.3 - index/sync/reindex commands
- [ ] **1.6.6** Write indexer tests (M) - Depends on 1.6.5 - Test full pipeline and CLI
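Incremental sync compares file modification times against what the index last recorded, and the sync-result file is written atomically. The following is a minimal sketch that assumes the indexer keeps a path → mtime map for indexed files. `plan_sync`, `scan_mtimes` and `write_json_atomic` are hypothetical names, and any result payload is illustrative. The pattern writes a temp file first and then calls `os.replace`. On POSIX filesystems, this means a reader sees either the previous file or the complete new one, never a partial write.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def plan_sync(on_disk: Dict[str, float], indexed: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (files to (re)index, files whose chunks should be deleted).

    Both maps go from a vault-relative path to its modification time.
    """
    changed = sorted(p for p, mtime in on_disk.items() if indexed.get(p) != mtime)
    removed = sorted(p for p in indexed if p not in on_disk)
    return changed, removed


def scan_mtimes(vault: Path) -> Dict[str, float]:
    """Map every markdown file under the vault to its mtime, keyed by POSIX-style relative path."""
    return {p.relative_to(vault).as_posix(): p.stat().st_mtime for p in vault.rglob("*.md") if p.is_file()}


def write_json_atomic(path: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then os.replace() it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Changed files would then go through delete-by-`source_file` followed by re-insert, which matches the vector delete and upsert operations in 1.5.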

---
@@ -73,40 +73,40 @@
## Phase 2: Data Layer (TypeScript Client)

### 2.1 Configuration (TypeScript) - Can start after 0.1.1, parallel with Phase 1
- [x] **2.1.1** Implement config loader (S) - Depends on 0.1.1 - Read JSON, validate schema
- [x] **2.1.2** Implement config types (S) - Depends on 2.1.1 - TypeScript interfaces

### 2.2 LanceDB Client - Depends on Phase 1 completion
- [x] **2.2.1** Implement LanceDB query client (M) - Depends on 0.1.1 - Connect and search
- [~] **2.2.2** Implement full-text search fallback (S) - Depends on 2.2.1 - Degraded mode

### 2.3 Indexer Bridge - Depends on Phase 1 completion
- [x] **2.3.1** Implement subprocess spawner (M) - Depends on 0.1.1 - Spawn Python CLI
- [x] **2.3.2** Implement sync-result reader (S) - Depends on 2.3.1 - Read sync results
- [x] **2.3.3** Implement job tracking (S) - Depends on 2.3.1 - Track progress
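The indexer bridge spawns the Python CLI, tracks the job, and reads back the sync result. The real bridge is written in TypeScript. This language-neutral sketch uses Python's `subprocess.run` to show the same control flow. `IndexJob`, `run_indexer` and `read_sync_result` are hypothetical, and so is the CLI invocation in the usage note.

```python
import json
import subprocess
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class IndexJob:
    """Hypothetical job record: what was run, when, and how it ended."""
    argv: List[str]
    started_at: float = field(default_factory=time.time)
    status: str = "running"  # running | succeeded | failed
    exit_code: Optional[int] = None
    stderr_tail: str = ""


def run_indexer(argv: List[str], timeout_s: float = 600.0) -> IndexJob:
    """Run the indexer CLI to completion and record the outcome on the job."""
    job = IndexJob(argv=argv)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired) as exc:  # missing binary or hung process
        job.status, job.stderr_tail = "failed", str(exc)
        return job
    job.exit_code = proc.returncode
    job.status = "succeeded" if proc.returncode == 0 else "failed"
    job.stderr_tail = proc.stderr[-2000:]  # keep only the end for diagnostics
    return job


def read_sync_result(path: Path) -> Optional[dict]:
    """Return the parsed sync result, or None if it is missing or not valid JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

For example, one might call `run_indexer([sys.executable, "-m", "obsidian_rag.cli", "sync"])`, where the module path is an assumption, and then call `read_sync_result(...)` once the job reports success. Treating a missing or half-written result file as `None` keeps a staleness check from crashing mid-sync.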

---

## Phase 3: Session & Transport Layers

### 3.1 Health State Machine - Depends on Phase 2
- [x] **3.1.1** Implement health prober (S) - Depends on 2.1.1, 2.2.1 - Probe dependencies
- [x] **3.1.2** Implement state machine (S) - Depends on 3.1.1 - HEALTHY/DEGRADED/UNAVAILABLE
- [x] **3.1.3** Implement staleness detector (S) - Depends on 3.1.2, 2.3.2 - Detect stale syncs
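The health state machine collapses probe results into HEALTHY, DEGRADED or UNAVAILABLE. The project's version is TypeScript; this sketch uses Python for brevity and relies on assumed rules:

- An unreachable vector index means UNAVAILABLE.
- A down embedding service means DEGRADED, since only the full-text fallback from 2.2.2 remains.
- A sync older than a hypothetical 24-hour threshold also means DEGRADED.

`Probe`, `evaluate` and `transition` are illustrative names.

```python
from enum import Enum
from typing import List, NamedTuple, Optional, Tuple


class Health(Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    UNAVAILABLE = "UNAVAILABLE"


class Probe(NamedTuple):
    lancedb_ok: bool               # index table opens and answers a query
    ollama_ok: bool                # embedding endpoint responds
    last_sync_at: Optional[float]  # epoch seconds from the last sync result, if any


def evaluate(probe: Probe, now: float, max_sync_age_s: float = 24 * 3600) -> Tuple[Health, List[str]]:
    """Map probe results to a health state plus human-readable reasons."""
    if not probe.lancedb_ok:
        return Health.UNAVAILABLE, ["vector index unreachable"]
    reasons: List[str] = []
    if not probe.ollama_ok:
        reasons.append("embedding service down: only full-text fallback available")
    if probe.last_sync_at is None or now - probe.last_sync_at > max_sync_age_s:
        reasons.append("index is stale")
    return (Health.DEGRADED if reasons else Health.HEALTHY), reasons


def transition(previous: Health, probe: Probe, now: float) -> Tuple[Health, bool]:
    """Re-evaluate and report whether the state changed (e.g. to log or notify only once)."""
    state, _ = evaluate(probe, now)
    return state, state is not previous
```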

### 3.2 Vault Watcher - Depends on Phase 2
- [x] **3.2.1** Implement file watcher (S) - Depends on 2.1.1 - Watch vault directory
- [x] **3.2.2** Implement debounce & batching (M) - Depends on 3.2.1 - Batch changes
- [x] **3.2.3** Implement auto-sync trigger (M) - Depends on 3.2.2, 2.3.1, 3.1.2 - Trigger sync
- [ ] **3.2.4** Write vault watcher tests (M) - Depends on 3.2.3 - Test watcher behavior
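Debouncing and batching turn a burst of file events into a single sync. The following is a minimal sketch of the idea, written in Python with time passed in explicitly so that any timer can drive it. The `Debouncer` class is an assumption, not the vault watcher's actual settings. So are the 2-second quiet period and the 30-second `max_wait_s` cap, which stops a constantly edited file from postponing sync indefinitely.

```python
from typing import Dict, List, Optional, Tuple


class Debouncer:
    """Collect changed paths and release them as one batch once the vault goes quiet."""

    def __init__(self, quiet_s: float = 2.0, max_wait_s: float = 30.0) -> None:
        self.quiet_s = quiet_s
        self.max_wait_s = max_wait_s
        self._pending: Dict[str, str] = {}  # path -> latest event kind
        self._first_at: Optional[float] = None
        self._last_at: Optional[float] = None

    def add(self, path: str, kind: str, now: float) -> None:
        # A later event for the same path replaces the earlier one (e.g. delete after modify).
        self._pending[path] = kind
        if self._first_at is None:
            self._first_at = now
        self._last_at = now

    def flush_due(self, now: float) -> Optional[List[Tuple[str, str]]]:
        """Return the batch if the vault has been quiet long enough or the cap is hit."""
        if not self._pending:
            return None
        quiet = now - self._last_at >= self.quiet_s
        overdue = now - self._first_at >= self.max_wait_s
        if not (quiet or overdue):
            return None
        batch = sorted(self._pending.items())
        self._pending.clear()
        self._first_at = self._last_at = None
        return batch
```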

### 3.3 Response Envelope & Error Normalization - Can start after 0.1.1, parallel
- [x] **3.3.1** Implement response envelope factory (S) - Depends on 0.1.1 - Build response structure
- [x] **3.3.2** Implement error normalizer (S) - Depends on 3.3.1 - Map exceptions to codes
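The envelope factory and error normalizer give every tool the same response shape and map exceptions to stable codes. This Python sketch shows the pattern. The envelope fields (`ok`, `data`, `error`, `meta.health`) and the code table are hypothetical, not the plugin's actual schema. Because `isinstance` also matches subclasses, the table is checked in order. A broad type such as `OSError` would therefore have to come after the narrower types listed here.

```python
from typing import Any, Dict, Optional, Tuple, Type

# Hypothetical code table, checked top to bottom.
ERROR_CODES: Tuple[Tuple[Type[BaseException], str], ...] = (
    (FileNotFoundError, "NOT_FOUND"),
    (PermissionError, "ACCESS_DENIED"),
    (TimeoutError, "TIMEOUT"),
    (ConnectionError, "DEPENDENCY_UNAVAILABLE"),
    (ValueError, "INVALID_INPUT"),
)


def normalize_error(exc: BaseException) -> Dict[str, str]:
    """Map an exception to a stable code and message."""
    for exc_type, code in ERROR_CODES:
        if isinstance(exc, exc_type):
            return {"code": code, "message": str(exc) or exc_type.__name__}
    # Unknown failures get a generic code and no internal details.
    return {"code": "INTERNAL", "message": "unexpected error"}


def envelope(data: Any = None, error: Optional[BaseException] = None, health: str = "HEALTHY") -> Dict[str, Any]:
    """Wrap a tool result (or failure) in one uniform response structure."""
    if error is not None:
        return {"ok": False, "data": None, "error": normalize_error(error), "meta": {"health": health}}
    return {"ok": True, "data": data, "error": None, "meta": {"health": health}}
```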

### 3.4 Security Guard (TypeScript) - Can start after 2.1.1, parallel with 3.1-3.2
- [x] **3.4.1** Implement directory filter validator (S) - Depends on 2.1.1 - Validate filters
- [x] **3.4.2** Implement sensitive content flag (S) - Depends on 3.4.1 - Flag sensitive content
- [ ] **3.4.3** Write security guard tests (S) - Depends on 3.4.2 - Test security functions

---
@@ -114,14 +114,14 @@
## Phase 4: Tool Layer

### 4.1 Tool Implementations - Depends on Phase 3
- [~] **4.1.1** Implement obsidian_rag_search tool (M) - Depends on 2.2.1, 3.3.1, 3.4.2 - Search with filters ⚠️ LanceDB TS client is wired; OpenClaw integration still pending
- [~] **4.1.2** Implement obsidian_rag_index tool (M) - Depends on 2.3.1, 2.3.3, 3.3.1 - Spawn indexer ⚠️ stub: tool registration not yet wired to OpenClaw
- [~] **4.1.3** Implement obsidian_rag_status tool (S) - Depends on 3.1.2, 2.3.2, 3.3.1 - Return health status ⚠️ stub: reads sync-result.json, not LanceDB stats
- [~] **4.1.4** Implement obsidian_rag_memory_store tool (S) - Depends on 3.3.1 - Persist to memory ⚠️ stub: currently a no-op
- [ ] **4.1.5** Write tool unit tests (M) - Depends on 4.1.1-4.1.4 - Test all tools

### 4.2 Plugin Registration - Depends on tools
- [~] **4.2.1** Implement plugin entry point (M) - Depends on 4.1.1-4.1.4, 3.2.3, 3.1.2 - Plugin lifecycle ⚠️ stub: tool registration is still a TODO
- [ ] **4.2.2** Verify OpenClaw plugin lifecycle (S) - Depends on 4.2.1 - Manual test

---
@@ -152,13 +152,13 @@

| Phase | Tasks | Done | Pending | In Progress | Blocked |
|-------|-------|------|---------|-------------|---------|
| Phase 0: Scaffolding | 8 | 8 | 0 | 0 | 0 |
| Phase 1: Python Indexer | 20 | 16 | 2 | 2 | 0 |
| Phase 2: TS Client | 7 | 6 | 0 | 1 | 0 |
| Phase 3: Session/Transport | 10 | 8 | 1 | 1 | 0 |
| Phase 4: Tool Layer | 7 | 1 | 5 | 1 | 0 |
| Phase 5: Integration | 12 | 0 | 12 | 0 | 0 |
| **Total** | **64** | **39** | **20** | **5** | **0** |

---