Commit Graph

5 Commits

Author SHA1 Message Date
34f3ce97f7 feat(indexer): hierarchical chunking for large sections
- Section-split first for structured notes
- Large sections (>max_section_chars) broken via sliding-window
- Small sections stay intact with heading preserved
- Adds max_section_chars config (default 4000)
- 2 new TDD tests for hierarchical chunking
2026-04-11 23:58:05 -04:00
e15e4ff856 fix: add configSchema to openclaw.plugin.json, add search CLI command, fix total_docs stat
- Add required configSchema to openclaw.plugin.json for OpenClaw plugin discovery
- Add search command to CLI with --limit, --dir, --from-date, --to-date, --tags filters
- Fix get_stats() to properly count unique docs (was returning 0 for non-null values)
- Remove hardcoded max_results default of 5; search now returns all results by default
- Update INSTALL.md and design docs with correct OpenClaw extension path instructions
2026-04-11 20:01:09 -04:00
de3b9c1c12 updates to install procedures 2026-04-11 16:58:46 -04:00
da1cf8bb60 feat: wire all 4 obsidian_rag tools to OpenClaw via AnyAgentTool factory
- Replace TypeBox + AgentToolResult with native OpenClaw AnyAgentTool pattern
- Add id, openclaw, main fields to openclaw.plugin.json manifest
- registerTools() now uses factory helpers returning typed AnyAgentTool objects
- toAgentResult() adapter bridges search/index/status/memory results to AgentToolResult shape
- Build clean — pi-agent-core peer dep not needed, openclaw exports all types
- Task list updated: Phase 4 tools + plugin registration marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 14:12:22 -04:00
5c281165c7 Sprint 0-1: Python indexer, TS plugin scaffolding, and test suite
## What's new

**Python indexer (`python/obsidian_rag/`)** — full pipeline from scan to LanceDB:
- `config.py` — JSON config loader with cross-platform path resolution
- `security.py` — path traversal prevention, HTML stripping, sensitive content detection, dir allow/deny lists
- `chunker.py` — section-split for journal entries (date-named files), sliding-window for unstructured notes
- `embedder.py` — Ollama `/api/embeddings` client with batched requests and timeout/error handling
- `vector_store.py` — LanceDB schema, upsert (merge_insert), delete, search with filters, stats
- `indexer.py` — full/sync/reindex pipeline orchestrator with progress yields
- `cli.py` — `index | sync | reindex | status` CLI commands

**TypeScript plugin (`src/`)** — OpenClaw plugin scaffold:
- `utils/` — config loader, TypeScript types, response envelope factory, LanceDB client
- `services/` — health state machine (HEALTHY/DEGRADED/UNAVAILABLE), vault watcher with debounce/batching, indexer bridge (subprocess spawner)
- `tools/` — 4 tool stubs: search, index, status, memory_store (OpenClaw wiring pending)
- `index.ts` — plugin entry point with health probe + vault watcher startup

**Config** (`obsidian-rag/config.json`, `openclaw.plugin.json`):
- 627 files / 3764 chunks indexed in dev vault

**Tests: 76 passing**
- Python: 64 pytest tests (chunker, security, vector_store, config)
- TypeScript: 12 vitest tests (lancedb client, response envelope)

## Bugs fixed

- LanceDB `tags` column filter: `LIKE '%tag%'` → `list_contains(tags, 'tag')` (List<String> column)
- LanceDB JS `db.list_tables()` returns `ListTablesResponse` object, not plain array
- LanceDB JS result score field: `_score` → `_distance`
- TypeScript regex literal with unescaped `/` in path-resolve regex
- Python: `create_table_if_not_exists` identity check → name comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 22:56:50 -04:00