feat(indexer): hierarchical chunking for large sections
- Section-split first for structured notes
- Large sections (>max_section_chars) broken via sliding-window
- Small sections stay intact with heading preserved
- Adds max_section_chars config (default 4000)
- 2 new TDD tests for hierarchical chunking
11
AGENTS.md
@@ -42,6 +42,11 @@ Plugin `package.json` MUST have:
User config at `~/.obsidian-rag/config.json` or `./obsidian-rag/` dev config.
Key indexing fields:
- `indexing.chunk_size` — sliding window chunk size (default 500)
- `indexing.chunk_overlap` — overlap between chunks (default 100)
- `indexing.max_section_chars` — max chars per section before hierarchical split (default 4000)
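
As a rough illustration of how the indexing defaults listed above could be combined with a user config, here is a minimal sketch. It assumes the dotted names map to a nested `"indexing"` object in the JSON file; `DEFAULT_INDEXING` and `load_indexing_config` are hypothetical names, not the project's actual loader.

```python
import json
from pathlib import Path

# Defaults taken from the field list above.
DEFAULT_INDEXING = {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "max_section_chars": 4000,
}


def load_indexing_config(path):
    """Return indexing settings, with user values overriding the defaults.

    Assumption: the config file is JSON with an optional top-level
    "indexing" object. A missing file yields the defaults unchanged.
    """
    user_values = {}
    config_path = Path(path).expanduser()
    if config_path.is_file():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        user_values = data.get("indexing", {})
    return {**DEFAULT_INDEXING, **user_values}
```

For example, `load_indexing_config("~/.obsidian-rag/config.json")` would return the three defaults when the file does not set any of them.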
Key security fields:
- `security.require_confirmation_for` — list of categories (e.g. `["health", "financial_debt"]`). Empty list disables guard.
- `security.auto_approve_sensitive` — `true` bypasses sensitive content prompts.
@@ -49,7 +54,11 @@ Key security fields:
## Ollama Context Length
`python/obsidian_rag/embedder.py` truncates chunks at `MAX_CHUNK_CHARS = 8000` before embedding. If Ollama returns a 500 error, increase this value or reduce `indexing.chunk_size` in config.
`python/obsidian_rag/embedder.py` truncates chunks at `MAX_CHUNK_CHARS = 8000` before embedding. If Ollama returns a 500 error, reduce `indexing.max_section_chars` (so that large sections are split sooner) or reduce `indexing.chunk_size` in config.
## Hierarchical Chunking
Structured notes (date-named files) are split by section first, then by a sliding window within any section that exceeds `indexing.max_section_chars`. Small sections stay intact; large sections are broken into sub-chunks, each keeping the parent section heading.
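
The two-level strategy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the indexer's actual code: it assumes sections are delimited by markdown headings, that `chunk_size` and `chunk_overlap` are measured in characters, and the function names (`split_sections`, `sliding_window`, `hierarchical_chunks`) are hypothetical.

```python
import re


def sliding_window(text, chunk_size=500, chunk_overlap=100):
    """Cut text into overlapping windows (sizes assumed to be characters)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def split_sections(markdown):
    """Split on markdown headings; returns a list of (heading, body) pairs."""
    sections = []
    heading, body = "", []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            if heading or any(l.strip() for l in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.strip(), []
        else:
            body.append(line)
    if heading or any(l.strip() for l in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


def hierarchical_chunks(markdown, max_section_chars=4000,
                        chunk_size=500, chunk_overlap=100):
    """Keep small sections whole; window large ones, repeating the heading."""
    chunks = []
    for heading, body in split_sections(markdown):
        if len(body) <= max_section_chars:
            chunks.append({"heading": heading, "text": body})
        else:
            for piece in sliding_window(body, chunk_size, chunk_overlap):
                chunks.append({"heading": heading, "text": piece})
    return chunks
```

In this sketch every sub-chunk of an oversized section carries the same `heading` value, which is one simple way to keep the parent heading attached to each piece.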
## Sensitive Content Guard