obsidian-rag/AGENTS.md
Santhosh Janardhanan 34f3ce97f7 feat(indexer): hierarchical chunking for large sections
- Section-split first for structured notes
- Large sections (>max_section_chars) broken via sliding-window
- Small sections stay intact with heading preserved
- Adds max_section_chars config (default 4000)
- 2 new TDD tests for hierarchical chunking
2026-04-11 23:58:05 -04:00

AGENTS.md

Stack

Two independent packages in one repo:

Directory  Role                        Entry                Build
src/       TypeScript OpenClaw plugin  src/index.ts         esbuild → dist/index.js
python/    Python CLI indexer          obsidian_rag/cli.py  pip install -e

Commands

TypeScript (OpenClaw plugin):

npm run build     # esbuild → dist/index.js
npm run typecheck # tsc --noEmit
npm run test      # vitest run

Python (RAG indexer):

pip install -e python/          # editable install
obsidian-rag index|sync|reindex|status   # CLI
pytest python/                 # tests
ruff check python/              # lint

OpenClaw Plugin Install

Plugin package.json MUST have:

"openclaw": {
  "extensions": ["./dist/index.js"],
  "hook": []
}
  • extensions = array, string path
  • hook = singular, not hooks
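
The two rules above are easy to get wrong by hand, so a quick pre-install check can help. The following is a minimal sketch, not OpenClaw's own validation: `check_openclaw_manifest` is a hypothetical helper that only encodes the constraints listed in this section.

```python
import json


def check_openclaw_manifest(package_json_text: str) -> list[str]:
    """Return a list of problems in the "openclaw" block (empty list if none).

    Hypothetical helper: it checks only the rules stated in AGENTS.md,
    not whatever OpenClaw itself enforces at install time.
    """
    manifest = json.loads(package_json_text)
    block = manifest.get("openclaw")
    if not isinstance(block, dict):
        return ['missing "openclaw" object']

    problems = []
    extensions = block.get("extensions")
    # "extensions" must be an array whose entries are string paths.
    if not (isinstance(extensions, list) and all(isinstance(p, str) for p in extensions)):
        problems.append('"extensions" must be an array of string paths')
    # The key is "hook" (singular); "hooks" is the common mistake.
    if "hooks" in block:
        problems.append('use "hook" (singular), not "hooks"')
    if "hook" not in block:
        problems.append('missing "hook" key')
    return problems
```

Calling it with the text of package.json returns an empty list when the block matches the shape shown above.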

Config

User config is read from ~/.obsidian-rag/config.json, or from ./obsidian-rag/ for a dev config.

Key indexing fields:

  • indexing.chunk_size — sliding window chunk size (default 500)
  • indexing.chunk_overlap — overlap between chunks (default 100)
  • indexing.max_section_chars — max chars per section before hierarchical split (default 4000)
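
To illustrate how these defaults could combine with a partial user config, here is a minimal sketch. `merge_config` and `INDEXING_DEFAULTS` are hypothetical names, not the indexer's actual loader. The check that chunk_overlap is smaller than chunk_size is also an assumption, added because a sliding window cannot advance otherwise.

```python
import copy

# Defaults as documented above; the constant name is hypothetical.
INDEXING_DEFAULTS = {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "max_section_chars": 4000,
}


def merge_config(user: dict) -> dict:
    """Overlay the user's indexing settings on the documented defaults.

    Other top-level sections (e.g. "security") are passed through unchanged.
    """
    merged = copy.deepcopy(user)
    merged["indexing"] = {**INDEXING_DEFAULTS, **user.get("indexing", {})}

    indexing = merged["indexing"]
    # Assumption: overlap must be smaller than the window, or it never advances.
    if indexing["chunk_overlap"] >= indexing["chunk_size"]:
        raise ValueError("indexing.chunk_overlap must be smaller than indexing.chunk_size")
    return merged
```

For example, a config that sets only indexing.chunk_size keeps the default overlap of 100 and the default max_section_chars of 4000.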

Key security fields:

  • security.require_confirmation_for — list of categories (e.g. ["health", "financial_debt"]). Empty list disables guard.
  • security.auto_approve_sensitive: true bypasses sensitive content prompts.
  • security.local_only: true blocks non-localhost Ollama.
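
As an illustration of what a local_only check might look like, here is a sketch that treats "localhost" and loopback IP addresses as local. How the project actually decides this is not stated here, so `is_local_url` and `check_ollama_url` are hypothetical names and the loopback rule is an assumption.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_url(url: str) -> bool:
    """True if the URL's host is "localhost" or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a public hostname): treat as remote.
        return False


def check_ollama_url(url: str, security: dict) -> None:
    """Raise if security.local_only is set and the Ollama URL is not local."""
    if security.get("local_only", False) and not is_local_url(url):
        raise ValueError(f"security.local_only is set; refusing non-local Ollama URL: {url}")
```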

Ollama Context Length

python/obsidian_rag/embedder.py truncates chunks at MAX_CHUNK_CHARS = 8000 before embedding. If Ollama returns a 500 error, reduce max_section_chars (so large sections are split sooner) or reduce chunk_size in config.

Hierarchical Chunking

Structured notes (date-named files) use section-split first, then sliding-window within sections that exceed max_section_chars. Small sections stay intact; large sections are broken into sub-chunks with the parent section heading preserved.
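
The strategy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in python/obsidian_rag/. It assumes sections are delimited by Markdown `#` headings and that chunk_size and chunk_overlap are measured in characters. All function names are hypothetical.

```python
import re

# Assumption: a section starts at a Markdown ATX heading ("# " to "###### ").
HEADING = re.compile(r"^#{1,6} .*$", re.MULTILINE)


def sliding_window(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    chunks, start = [], 0
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


def split_sections(note: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs; text before the first heading gets heading ''."""
    matches = list(HEADING.finditer(note))
    if not matches:
        return [("", note.strip())] if note.strip() else []
    sections = []
    preamble = note[:matches[0].start()].strip()
    if preamble:
        sections.append(("", preamble))
    for match, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(note)
        sections.append((match.group(0).strip(), note[match.end():end].strip()))
    return sections


def hierarchical_chunks(note: str, max_section_chars: int = 4000,
                        chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Section-split first; sliding-window only inside sections over max_section_chars."""
    chunks = []
    for heading, body in split_sections(note):
        if len(body) <= max_section_chars:
            pieces = [body]  # small section stays intact
        else:
            pieces = sliding_window(body, chunk_size, chunk_overlap)
        # Every chunk carries its parent section heading.
        chunks.extend(f"{heading}\n{piece}".strip() for piece in pieces)
    return chunks
```

Prefixing each sub-chunk with its heading keeps the section context attached to every embedded piece, which matches the "heading preserved" behaviour described above.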

Sensitive Content Guard

Triggered by categories in require_confirmation_for. Raises SensitiveContentError from obsidian_rag/indexer.py.

To disable: set require_confirmation_for: [] or auto_approve_sensitive: true in config.
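
The enable and disable rules can be summarized in a short sketch. It assumes each chunk already carries a set of detected category labels; how categories are detected is not covered here. The `SensitiveContentError` class below is a local stand-in for the one in obsidian_rag/indexer.py, and `guard` is a hypothetical name.

```python
class SensitiveContentError(Exception):
    """Local stand-in for the exception raised by obsidian_rag/indexer.py."""


def guard(categories: set[str], security: dict) -> None:
    """Raise if any detected category requires confirmation.

    `categories` is assumed to be the set of labels already detected
    for a chunk; `security` is the "security" section of the config.
    """
    required = set(security.get("require_confirmation_for", []))
    # Either setting disables the guard: empty list or auto-approve.
    if not required or security.get("auto_approve_sensitive", False):
        return
    flagged = sorted(categories & required)
    if flagged:
        raise SensitiveContentError(f"confirmation required for: {', '.join(flagged)}")
```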

Architecture

User query → OpenClaw (TypeScript plugin src/index.ts)
           → obsidian_rag_* tools (python/obsidian_rag/)
           → Ollama embeddings (http://localhost:11434)
           → LanceDB vector store