# Performance Notes

Markitect is designed to remain useful without persistent services. The current
performance posture is therefore local-first:

- parse individual Markdown files directly for one-off use
- build a local SQLite index for repeated corpus operations
- use refresh planning to avoid unnecessary parse/index work
- keep policy filtering and context package creation deterministic

## Current Smoke Coverage

`tests/test_practical_usecases_e2e.py` includes a large-corpus smoke test that
creates 120 synthetic Markdown files, indexes them locally, and runs an FTS
search.

The thresholds are intentionally generous:

- local cache/index build: under 30 seconds
- local indexed search: under 5 seconds

These are not benchmark claims. They are regression guards meant to catch
accidental algorithmic or I/O mistakes, and they are loose enough to keep the
test portable across machines.
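
As an illustration of this kind of guard, here is a minimal sketch and not the
actual test in `tests/test_practical_usecases_e2e.py`. It assumes a Python
build whose SQLite includes FTS5, uses an in-memory database for brevity, and
every name in it (`make_corpus`, `build_index`, `search`, `run_smoke`) is
hypothetical. Only the file count and the two time budgets come from the text
above.

```python
import sqlite3
import tempfile
import time
from pathlib import Path

# Budgets taken from the thresholds listed above.
BUILD_BUDGET_S = 30.0
SEARCH_BUDGET_S = 5.0


def make_corpus(root: Path, count: int = 120) -> None:
    """Write `count` small synthetic Markdown files into `root`."""
    for i in range(count):
        text = f"# Document {i}\n\n## Decision\n\nBody mentions keyword{i % 7}.\n"
        (root / f"doc_{i:03d}.md").write_text(text, encoding="utf-8")


def build_index(root: Path) -> sqlite3.Connection:
    """Index every Markdown file under `root` into an FTS5 table (assumed available)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
    rows = [(p.name, p.read_text(encoding="utf-8")) for p in sorted(root.glob("*.md"))]
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return the names of matching files in a stable order."""
    cur = conn.execute("SELECT path FROM docs WHERE docs MATCH ? ORDER BY path", (term,))
    return [row[0] for row in cur]


def run_smoke() -> tuple[float, float, list[str]]:
    """Build and search a synthetic corpus, failing if either step blows its budget."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_corpus(root)

        start = time.perf_counter()
        conn = build_index(root)
        build_s = time.perf_counter() - start

        start = time.perf_counter()
        hits = search(conn, "keyword3")
        search_s = time.perf_counter() - start
        conn.close()

    # Generous ceilings: these catch gross regressions, not small slowdowns.
    assert build_s < BUILD_BUDGET_S
    assert search_s < SEARCH_BUDGET_S
    return build_s, search_s, hits
```

The budgets sit far above what a healthy run should need, so the assertions
only fire when something has gone badly wrong, such as an accidental
quadratic loop.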

## Practical Guidance

For one file:

```bash
mkt parse file.md
mkt query file.md 'sections[heading=Decision]'
```

For a directory you will query repeatedly:

```bash
mkt cache index docs --root .
mkt search keyword --root .
mkt cache query 'sections[heading=Decision]' --root .
```

Before refreshing derived work, generate a refresh plan so that only files
that actually need it are re-parsed or re-indexed:

```bash
mkt backend refresh-plan docs --root . --verify-hashes
```
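
The general idea behind hash-verified refresh planning can be sketched as
follows. This is not Markitect's implementation: `content_hash`,
`plan_refresh`, and the `recorded` mapping are hypothetical names, and the
sketch simply assumes that a previous run stored one SHA-256 digest per
relative path.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash file contents so that a touched-but-identical file counts as unchanged."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_refresh(root: Path, recorded: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under `root` with previously recorded digests.

    `recorded` maps relative POSIX paths to the digest stored at the last
    index run (a hypothetical storage format, used only for illustration).
    """
    current = {
        p.relative_to(root).as_posix(): content_hash(p)
        for p in sorted(root.rglob("*.md"))
    }
    plan: dict[str, list[str]] = {"added": [], "changed": [], "unchanged": [], "removed": []}
    for rel, digest in current.items():
        if rel not in recorded:
            plan["added"].append(rel)
        elif recorded[rel] != digest:
            plan["changed"].append(rel)
        else:
            plan["unchanged"].append(rel)
    # Anything recorded but no longer on disk should be dropped from the index.
    plan["removed"] = sorted(set(recorded) - set(current))
    return plan
```

Only the `added` and `changed` entries would need parsing, which is where the
savings over a full rebuild come from.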

## Future Measurement

If performance becomes a release gate, add a separate benchmark suite instead
of tightening normal E2E tests. Good benchmark dimensions would be:

- number of documents
- total bytes
- heading/section density
- frontmatter size
- number of reference/include relationships
- policy labels per document
- index rebuild versus incremental refresh
- context package item and token budgets
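
One possible way to turn a few of these dimensions into a benchmark matrix is
sketched below. Nothing here exists in Markitect today: `BenchCase`,
`bench_matrix`, `synth_doc`, and the default sizes are all hypothetical, and
only four of the dimensions above are covered.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BenchCase:
    """One point in the benchmark matrix (illustrative fields only)."""
    documents: int
    sections_per_doc: int
    frontmatter_keys: int
    incremental: bool  # False = full index rebuild, True = incremental refresh


def bench_matrix(
    documents=(100, 1_000),
    sections=(5, 50),
    frontmatter=(0, 20),
) -> list[BenchCase]:
    """Cross every dimension so each combination is measured exactly once."""
    return [
        BenchCase(d, s, f, inc)
        for d, s, f, inc in product(documents, sections, frontmatter, (False, True))
    ]


def synth_doc(case: BenchCase, index: int) -> str:
    """Generate one synthetic Markdown document matching the case's densities."""
    front = "".join(f"key{k}: value{k}\n" for k in range(case.frontmatter_keys))
    header = f"---\n{front}---\n\n" if front else ""
    body = "".join(
        f"## Section {s}\n\nText for document {index}.\n\n"
        for s in range(case.sections_per_doc)
    )
    return f"{header}# Document {index}\n\n{body}"
```

Keeping the matrix explicit makes it easy to report results per dimension
and to leave the normal E2E thresholds untouched.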