obsidian-rag/AGENTS.md
Santhosh Janardhanan 34f3ce97f7 feat(indexer): hierarchical chunking for large sections
- Section-split first for structured notes
- Large sections (>max_section_chars) broken via sliding-window
- Small sections stay intact with heading preserved
- Adds max_section_chars config (default 4000)
- 2 new TDD tests for hierarchical chunking
2026-04-11 23:58:05 -04:00

# AGENTS.md
## Stack
Two independent packages in one repo:
| Directory | Role | Entry | Build |
|-----------|------|-------|-------|
| `src/` | TypeScript OpenClaw plugin | `src/index.ts` | esbuild → `dist/index.js` |
| `python/` | Python CLI indexer | `obsidian_rag/cli.py` | pip install -e |
## Commands
**TypeScript (OpenClaw plugin):**
```bash
npm run build # esbuild → dist/index.js
npm run typecheck # tsc --noEmit
npm run test # vitest run
```
**Python (RAG indexer):**
```bash
pip install -e python/ # editable install
obsidian-rag index|sync|reindex|status # CLI
pytest python/ # tests
ruff check python/ # lint
```
## OpenClaw Plugin Install
Plugin `package.json` MUST have:
```json
"openclaw": {
  "extensions": ["./dist/index.js"],
  "hook": []
}
```
- `extensions` = array, string path
- `hook` = singular, not `hooks`
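The two rules above can be checked mechanically. A minimal sketch, assuming the check runs against the raw text of the plugin's `package.json`; `check_openclaw_manifest` is a hypothetical helper, not part of OpenClaw or this repo:

```python
import json


def check_openclaw_manifest(package_json_text: str) -> list[str]:
    """Return problems found in the "openclaw" block; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    data = json.loads(package_json_text)
    manifest = data.get("openclaw") if isinstance(data, dict) else None
    if not isinstance(manifest, dict):
        return ['missing "openclaw" object']

    problems = []
    extensions = manifest.get("extensions")
    # "extensions" must be an array whose items are string paths.
    if not (isinstance(extensions, list) and all(isinstance(p, str) for p in extensions)):
        problems.append('"extensions" must be an array of string paths')
    # The key is "hook" (singular); "hooks" is the common mistake.
    if "hooks" in manifest:
        problems.append('use "hook" (singular), not "hooks"')
    if "hook" not in manifest:
        problems.append('missing "hook" key')
    return problems
```

Usage: `check_openclaw_manifest(open("package.json").read())` returns `[]` for the block shown above.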
## Config
User config lives at `~/.obsidian-rag/config.json`; a dev config can be placed under `./obsidian-rag/`.
Key indexing fields:
- `indexing.chunk_size` — sliding window chunk size (default 500)
- `indexing.chunk_overlap` — overlap between chunks (default 100)
- `indexing.max_section_chars` — max chars per section before hierarchical split (default 4000)
Key security fields:
- `security.require_confirmation_for` — list of categories (e.g. `["health", "financial_debt"]`). Empty list disables guard.
- `security.auto_approve_sensitive`: `true` bypasses sensitive content prompts.
- `security.local_only`: `true` blocks non-localhost Ollama.
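One way the fields above might be combined with defaults is sketched below. The indexing defaults are the ones listed above; the security defaults and the `load_config` helper are assumptions for illustration, not the repo's actual loader:

```python
import json
from pathlib import Path

# Indexing defaults are quoted from the field list; security defaults are assumed.
DEFAULTS = {
    "indexing": {"chunk_size": 500, "chunk_overlap": 100, "max_section_chars": 4000},
    "security": {
        "require_confirmation_for": [],   # assumed default
        "auto_approve_sensitive": False,  # assumed default
        "local_only": True,               # assumed default
    },
}


def load_config(path: Path) -> dict:
    """Overlay the user's JSON config on DEFAULTS, one section at a time.

    Hypothetical helper: keys missing from the file fall back to DEFAULTS.
    """
    user = json.loads(path.read_text()) if path.exists() else {}
    return {
        section: {**values, **user.get(section, {})}
        for section, values in DEFAULTS.items()
    }
```

Usage: `load_config(Path.home() / ".obsidian-rag" / "config.json")`.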
## Ollama Context Length
`python/obsidian_rag/embedder.py` truncates chunks at `MAX_CHUNK_CHARS = 8000` before embedding. If Ollama returns a 500 error, lower `max_section_chars` (so large sections are split into smaller chunks) or reduce `chunk_size` in config.
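The truncation step amounts to a character slice. A minimal sketch, assuming the limit is counted in characters as the constant name suggests; both function names are hypothetical and not taken from `embedder.py`:

```python
MAX_CHUNK_CHARS = 8000  # limit quoted above


def truncate_for_embedding(chunk: str, limit: int = MAX_CHUNK_CHARS) -> str:
    """Keep at most `limit` characters of a chunk (hypothetical helper)."""
    return chunk[:limit]


def oversized(chunks: list[str], limit: int = MAX_CHUNK_CHARS) -> list[int]:
    """Indexes of chunks that would lose text to truncation (hypothetical helper)."""
    return [i for i, chunk in enumerate(chunks) if len(chunk) > limit]
```

Running something like `oversized` over the chunks of a problem note is one way to see whether the chunking config needs tuning.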
## Hierarchical Chunking
Structured notes (date-named files) use section-split first, then sliding-window within sections that exceed `max_section_chars`. Small sections stay intact; large sections are broken into sub-chunks with the parent section heading preserved.
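The two-level strategy can be sketched as follows. This is an illustration under stated assumptions, not the indexer's code: sections are delimited by Markdown `#` headings, `chunk_size` and `chunk_overlap` are counted in characters, and every function name is hypothetical.

```python
import re

HEADING = re.compile(r"^#{1,6} .*$", re.MULTILINE)


def sliding_window(text: str, size: int, overlap: int) -> list[str]:
    """Windows of `size` chars; each shares `overlap` chars with the previous one."""
    step = size - overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reached the end
            break
    return chunks


def split_sections(note: str) -> list[tuple[str, str]]:
    """(heading, body) pairs; text before the first heading gets an empty heading."""
    matches = list(HEADING.finditer(note))
    first = matches[0].start() if matches else len(note)
    sections = [("", note[:first].strip())]
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(note)
        sections.append((m.group().strip(), note[m.end():end].strip()))
    return [(h, b) for h, b in sections if h or b]


def hierarchical_chunks(note: str, chunk_size: int = 500, chunk_overlap: int = 100,
                        max_section_chars: int = 4000) -> list[str]:
    """Section-split first; sliding-window only inside oversized sections."""
    chunks = []
    for heading, body in split_sections(note):
        if len(body) <= max_section_chars:
            chunks.append(f"{heading}\n{body}".strip())  # small section stays intact
        else:
            for piece in sliding_window(body, chunk_size, chunk_overlap):
                chunks.append(f"{heading}\n{piece}".strip())  # heading repeated per sub-chunk
    return chunks
```

Repeating the heading on every sub-chunk keeps each embedded piece tied to its parent section, which is the point of preserving it.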
## Sensitive Content Guard
Triggered by categories in `require_confirmation_for`. Raises `SensitiveContentError` from `obsidian_rag/indexer.py`.
To disable: set `require_confirmation_for: []` or `auto_approve_sensitive: true` in config.
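The interaction between the two settings can be sketched as below. How the indexer detects categories is not described here, so `detected` is simply passed in; the class is a stand-in for the real `SensitiveContentError`, and `guard_sensitive` is a hypothetical name:

```python
class SensitiveContentError(Exception):
    """Stand-in for the error raised from obsidian_rag/indexer.py."""


def guard_sensitive(detected: set[str], security: dict) -> None:
    """Raise if a detected category requires confirmation (hypothetical helper)."""
    # auto_approve_sensitive: true bypasses the guard entirely.
    if security.get("auto_approve_sensitive", False):
        return
    # An empty require_confirmation_for list makes the intersection empty.
    blocked = detected & set(security.get("require_confirmation_for", []))
    if blocked:
        raise SensitiveContentError(f"confirmation required for: {sorted(blocked)}")
```

Both documented ways of disabling the guard fall out of this shape: an empty list blocks nothing, and the auto-approve flag returns before any check.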
## Architecture
```
User query → OpenClaw (TypeScript plugin src/index.ts)
→ obsidian_rag_* tools (python/obsidian_rag/)
→ Ollama embeddings (http://localhost:11434)
→ LanceDB vector store
```
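The Ollama hop in the diagram is where `security.local_only` would apply. A minimal sketch of such a check, assuming "local" means a loopback hostname; `is_local_ollama` and `LOCAL_HOSTS` are hypothetical names, not taken from the repo:

```python
from urllib.parse import urlparse

# Assumed definition of "localhost" for this sketch.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_ollama(url: str) -> bool:
    """True when the Ollama URL points at a loopback host (hypothetical helper)."""
    return urlparse(url).hostname in LOCAL_HOSTS
```

With `local_only` enabled, a caller could refuse to embed whenever `is_local_ollama(url)` is false.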