# AGENTS.md
## Stack

Two independent packages in one repo:

| Directory | Role | Entry | Build |
|-----------|------|-------|-------|
| `src/` | TypeScript OpenClaw plugin | `src/index.ts` | esbuild → `dist/index.js` |
| `python/` | Python CLI indexer | `obsidian_rag/cli.py` | `pip install -e python/` |

## Commands

**TypeScript (OpenClaw plugin):**

```bash
npm run build      # esbuild → dist/index.js
npm run typecheck  # tsc --noEmit
npm run test       # vitest run
```

**Python (RAG indexer):**

```bash
pip install -e python/                     # editable install
obsidian-rag <index|sync|reindex|status>   # CLI subcommands
pytest python/                             # tests
ruff check python/                         # lint
```

## OpenClaw Plugin Install

Plugin `package.json` MUST have:

```json
"openclaw": {
  "extensions": ["./dist/index.js"],
  "hook": []
}
```

- `extensions` = array of string paths
- `hook` = singular, not `hooks`

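Both rules are easy to break by hand, so a small lint pass before installing can help. The sketch below is a hypothetical helper (`check_openclaw_manifest` is not part of OpenClaw or this repo); it only encodes the two rules above plus the two keys shown in the required snippet.

```python
import json


def check_openclaw_manifest(package_json_text: str) -> list[str]:
    """Hypothetical linter for the `openclaw` block of a plugin package.json.

    Returns human-readable problems; an empty list means the block matches
    the shape documented in AGENTS.md.
    """
    block = json.loads(package_json_text).get("openclaw")
    if not isinstance(block, dict):
        return ["missing `openclaw` object"]

    problems = []
    extensions = block.get("extensions")
    # `extensions` must be an array whose items are string paths, not a bare string.
    if not (isinstance(extensions, list) and all(isinstance(p, str) for p in extensions)):
        problems.append("`extensions` must be an array of string paths")
    # The key is singular `hook`; a plural `hooks` key is the typo to catch.
    if "hooks" in block:
        problems.append("use `hook` (singular), not `hooks`")
    if "hook" not in block:
        problems.append("missing `hook` key")
    return problems
```

Feed it the file contents, e.g. `check_openclaw_manifest(open("package.json").read())`; an empty list means the block has the documented shape.
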
## Config

User config lives at `~/.obsidian-rag/config.json`; a dev config can live under `./obsidian-rag/`.

Key indexing fields:

- `indexing.chunk_size`: sliding-window chunk size (default 500)
- `indexing.chunk_overlap`: overlap between consecutive chunks (default 100)
- `indexing.max_section_chars`: max characters per section before the hierarchical split kicks in (default 4000)

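A minimal sketch of resolving these fields against their defaults. `merge_indexing` and `load_indexing_config` are hypothetical names; the sketch assumes the config is plain JSON with a top-level `indexing` object, ignores the dev-config location, and adds an overlap-smaller-than-chunk check that is an assumption rather than documented behavior.

```python
import json
from pathlib import Path

# Defaults as documented above.
INDEXING_DEFAULTS = {
    "chunk_size": 500,
    "chunk_overlap": 100,
    "max_section_chars": 4000,
}


def merge_indexing(config: dict) -> dict:
    """Hypothetical: overlay the `indexing` object of a parsed config on the defaults."""
    merged = {**INDEXING_DEFAULTS, **config.get("indexing", {})}
    # Assumed sanity check: a sliding window only advances if overlap < chunk size.
    if merged["chunk_overlap"] >= merged["chunk_size"]:
        raise ValueError("indexing.chunk_overlap must be smaller than indexing.chunk_size")
    return merged


def load_indexing_config(path: Path = Path("~/.obsidian-rag/config.json")) -> dict:
    """Hypothetical: read the user config if present, else return the defaults."""
    path = path.expanduser()
    if not path.exists():
        return dict(INDEXING_DEFAULTS)
    return merge_indexing(json.loads(path.read_text(encoding="utf-8")))
```

Merging field by field means a config that only sets `chunk_size` still picks up the default overlap and section limit.
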
Key security fields:

- `security.require_confirmation_for`: list of sensitive categories that need confirmation (e.g. `["health", "financial_debt"]`). An empty list disables the guard.
- `security.auto_approve_sensitive`: `true` bypasses sensitive-content prompts.
- `security.local_only`: `true` blocks any non-localhost Ollama endpoint.

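What `local_only` amounts to can be sketched as a host check on the configured Ollama URL. The function name and the set of hosts treated as local (`localhost`, `127.0.0.1`, `::1`) are assumptions; the project may match hosts differently.

```python
from urllib.parse import urlparse

# Assumption: the host names treated as "localhost".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def ollama_url_allowed(url: str, local_only: bool) -> bool:
    """Hypothetical check: False when local_only is set and the URL is off-machine."""
    if not local_only:
        return True
    # .hostname is lower-cased and has IPv6 brackets stripped ("[::1]" -> "::1").
    return urlparse(url).hostname in LOCAL_HOSTS
```
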
## Ollama Context Length

`python/obsidian_rag/embedder.py` truncates chunks to `MAX_CHUNK_CHARS = 8000` characters before embedding. If Ollama returns a 500 error, lower `max_section_chars` (so intact sections are smaller) or reduce `chunk_size` in config.

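The arithmetic behind that advice, as a sketch: for structured notes, sections at or below `max_section_chars` are embedded whole and longer ones are cut into `chunk_size` windows, so the longest chunk sent is bounded by `max(chunk_size, max_section_chars)` before the `MAX_CHUNK_CHARS` cap applies. The function names are hypothetical, and the sketch assumes `chunk_size` is counted in characters and the preserved heading is not.

```python
MAX_CHUNK_CHARS = 8000  # cap stated for python/obsidian_rag/embedder.py


def longest_chunk_chars(chunk_size: int, max_section_chars: int) -> int:
    """Hypothetical: upper bound on chunk length before the embedder's cap."""
    # Small sections stay whole (up to max_section_chars); large ones are
    # split into sliding windows of chunk_size characters.
    return max(chunk_size, max_section_chars)


def truncate_for_embedding(text: str, limit: int = MAX_CHUNK_CHARS) -> str:
    """Hypothetical: hard character cap applied just before embedding."""
    return text[:limit]
```

With the defaults, `longest_chunk_chars(500, 4000)` returns 4000.
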
## Hierarchical Chunking

Structured notes (date-named files) use section-split first, then sliding-window within sections that exceed `max_section_chars`. Small sections stay intact; large sections are broken into sub-chunks with the parent section heading preserved.

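A minimal sketch of the two-level strategy, not the indexer's actual code. It assumes sections start at Markdown `#` heading lines, that `chunk_size`/`chunk_overlap` are counted in characters, and that the heading is kept as metadata on each chunk; `split_sections`, `sliding_window`, and `hierarchical_chunks` are hypothetical names, and detecting date-named files is left out.

```python
import re

# Assumption: a section starts at any Markdown ATX heading ("# ", "## ", ...).
HEADING_RE = re.compile(r"^#{1,6}\s")


def split_sections(text: str) -> list[tuple[str, str]]:
    """Hypothetical: split a note into (heading, body) pairs; preamble gets ""."""
    sections = []
    heading, body_lines = "", []
    for line in text.splitlines():
        if HEADING_RE.match(line):
            sections.append((heading, "\n".join(body_lines).strip()))
            heading, body_lines = line.strip(), []
        else:
            body_lines.append(line)
    sections.append((heading, "\n".join(body_lines).strip()))
    return sections


def sliding_window(text: str, size: int, overlap: int) -> list[str]:
    """Fixed-size character windows; consecutive windows share `overlap` chars."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def hierarchical_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100,
                        max_section_chars: int = 4000) -> list[dict]:
    """Hypothetical: section-split first, then window only the oversized sections."""
    chunks = []
    for heading, body in split_sections(text):
        if not body:
            continue  # heading-only sections carry nothing to embed
        if len(body) <= max_section_chars:
            chunks.append({"heading": heading, "text": body})  # small: keep intact
        else:
            for piece in sliding_window(body, chunk_size, chunk_overlap):
                chunks.append({"heading": heading, "text": piece})  # large: split, keep heading
    return chunks
```

Carrying the heading on every sub-chunk means a retrieved fragment of a long section still says which part of the note it came from.
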
## Sensitive Content Guard

Triggered for content in the categories listed in `security.require_confirmation_for`. Raises `SensitiveContentError` from `obsidian_rag/indexer.py`.

To disable: set `security.require_confirmation_for` to `[]` or `security.auto_approve_sensitive` to `true` in config.

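The guard's decision logic can be sketched as below. How the indexer detects a note's categories is not described here, so the sketch takes the detected categories as input; `guard_sensitive` is a hypothetical name, and the local `SensitiveContentError` class only stands in for the one in `obsidian_rag/indexer.py`.

```python
class SensitiveContentError(Exception):
    """Stand-in for the exception raised by obsidian_rag/indexer.py."""


def guard_sensitive(detected: set[str], require_confirmation_for: list[str],
                    auto_approve_sensitive: bool) -> None:
    """Hypothetical: raise if content hits a category that needs confirmation."""
    # Either documented "disable" setting switches the guard off.
    if auto_approve_sensitive or not require_confirmation_for:
        return
    hits = sorted(detected & set(require_confirmation_for))
    if hits:
        raise SensitiveContentError("confirmation required for: " + ", ".join(hits))
```
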
## Architecture

```
User query → OpenClaw (TypeScript plugin, src/index.ts)
  → obsidian_rag_* tools (python/obsidian_rag/)
  → Ollama embeddings (http://localhost:11434)
  → LanceDB vector store
```