Contract robustness and bottleneck test

2026-05-05 20:26:56 +02:00
parent ef8391e6a7
commit fcd50bdfe8
14 changed files with 654 additions and 79 deletions
--- a/docs/markitect-tool-capacity-risks.md
+++ b/docs/markitect-tool-capacity-risks.md
@@ -0,0 +1,82 @@
+# markitect-tool Capacity Risk Sentinels
+
+Date: 2026-05-05
+
+Status: opt-in bottleneck tests for the `kontextual-engine` to
+`markitect-tool` integration boundary.
+
+## Purpose
+
+The example-backed contract tests check that the Markitect interface behaves
+correctly for representative documents. Capacity sentinels add one more layer:
+they exercise larger generated examples so we can notice scaling problems,
+such as superlinear slowdowns, before engine workplans depend on the interface.
+
+These tests are not microbenchmarks. They are deliberately coarse, generous,
+and opt-in. A failure should trigger investigation, profiling, or an upstream
+`markitect-tool` improvement before the engine builds more assumptions on top.
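+
+The opt-in gate and a coarse wall-clock budget can be sketched with the
+standard library alone. pytest also collects `unittest.TestCase` classes and
+honors `skipUnless`. The names `capacity_enabled`, `wall_clock_budget`, and
+`GeneratedParseSentinel`, the exact `"1"` check, and the 30-second budget are
+illustrative assumptions, not the contents of
+`tests/test_markitect_tool_capacity.py`. The workload is a stand-in string
+build rather than a real Markitect parse.
+
+```python
+import os
+import time
+import unittest
+from contextlib import contextmanager
+
+# Assumed opt-in switch, matching the KONTEXTUAL_RUN_CAPACITY=1 commands.
+CAPACITY_ENV_VAR = "KONTEXTUAL_RUN_CAPACITY"
+
+
+def capacity_enabled(environ=None) -> bool:
+    """Return True only when the opt-in variable is exactly "1"."""
+    environ = os.environ if environ is None else environ
+    return environ.get(CAPACITY_ENV_VAR) == "1"
+
+
+@contextmanager
+def wall_clock_budget(seconds: float):
+    """Raise AssertionError if the wrapped block runs longer than `seconds`."""
+    start = time.perf_counter()
+    yield
+    elapsed = time.perf_counter() - start
+    if elapsed > seconds:
+        raise AssertionError(f"took {elapsed:.2f}s, budget was {seconds:.2f}s")
+
+
+@unittest.skipUnless(capacity_enabled(), f"set {CAPACITY_ENV_VAR}=1 to run")
+class GeneratedParseSentinel(unittest.TestCase):
+    """Hypothetical sentinel shape; the workload is only a stand-in."""
+
+    def test_section_heavy_document_within_budget(self):
+        with wall_clock_budget(30.0):  # deliberately coarse and generous
+            doc = "".join(f"## Decision {i}\n\nBody {i}.\n\n" for i in range(500))
+        self.assertEqual(doc.count("## Decision"), 500)
+```
+
+A blown budget raises `AssertionError`, so pytest reports it as an ordinary
+test failure that can then be profiled.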
+
+## Suspected Bottleneck Areas
+
+| Area | Risk | Sentinel |
+| --- | --- | --- |
+| Large Markdown parsing | Section-heavy documents may create many headings, blocks, tokens, and sections. | Parse a generated document with hundreds of sections and verify document shape under a generous wall-clock budget. |
+| Selector extraction | Repeated selectors over large documents can cost `queries x document-size` if each query rescans the parsed document. | Run multiple heading, section, frontmatter, and block selectors over one parsed large document. |
+| Include resolution and composition | Fan-out includes with selectors may repeatedly parse included files and expand output size. | Resolve a generated include fan-out bundle and compose many Markdown files. |
+| Context package creation | Packing many source files can parse and query each file, then filter by policy. | Create and activate a context package from many generated public/internal Markdown sources. |
+| Snapshot identity | Hashing cost should grow predictably with file count and size, and identities must stay content-addressed. | Generate many Markdown files and compute stable snapshot identities. |
+
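+Snapshot identity is computed by `markitect-tool`, and the engine should not
+own that algorithm. To make the two properties the snapshot sentinel checks
+concrete (stability and content addressing), here is a toy identity over a
+directory of Markdown files. It is an illustration only: `snapshot_identity`
+and `write_corpus` are hypothetical helpers, and SHA-256 over sorted relative
+paths plus per-file content hashes is an assumed scheme.
+
+```python
+import hashlib
+import tempfile
+from pathlib import Path
+
+
+def snapshot_identity(root: Path) -> str:
+    """Toy content-addressed identity over every *.md file under root.
+
+    Illustration only; this is not markitect-tool's snapshot algorithm.
+    """
+    digest = hashlib.sha256()
+    # Sort by relative POSIX path so filesystem listing order cannot leak in.
+    files = sorted(root.rglob("*.md"), key=lambda p: p.relative_to(root).as_posix())
+    for path in files:
+        rel = path.relative_to(root).as_posix().encode("utf-8")
+        # Length-prefix the path so the byte stream has unambiguous boundaries.
+        digest.update(len(rel).to_bytes(4, "big"))
+        digest.update(rel)
+        # Mix in a per-file content hash, so identity depends only on bytes.
+        digest.update(hashlib.sha256(path.read_bytes()).digest())
+    return digest.hexdigest()
+
+
+def write_corpus(root: Path, count: int) -> None:
+    """Hypothetical generator for a flat corpus of small Markdown files."""
+    for i in range(count):
+        text = f"# Note {i}\n\nGenerated body {i}.\n"
+        (root / f"note-{i:04d}.md").write_text(text, encoding="utf-8")
+
+
+if __name__ == "__main__":
+    with tempfile.TemporaryDirectory() as tmp:
+        root = Path(tmp)
+        write_corpus(root, 500)
+        first = snapshot_identity(root)
+        assert first == snapshot_identity(root)  # stable across runs
+        (root / "note-0007.md").write_text("# Note 7\n\nEdited.\n", encoding="utf-8")
+        assert first != snapshot_identity(root)  # changed bytes, new identity
+```
+
+Sorting by relative path keeps the identity independent of directory listing
+order, and editing any file's bytes changes the result.
+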
+## Running The Sentinels
+
+Normal test runs skip these tests unless `KONTEXTUAL_RUN_CAPACITY=1` is set.
+Run them against the sibling `markitect-tool` checkout with:
+
+```bash
+KONTEXTUAL_RUN_CAPACITY=1 \
+PYTHONPATH=/home/worsch/kontextual-engine/src:/home/worsch/markitect-tool/src \
+  python3 -m pytest tests/test_markitect_tool_capacity.py -q
+```
+
+Run all Markitect interface checks with:
+
+```bash
+KONTEXTUAL_RUN_CAPACITY=1 \
+PYTHONPATH=/home/worsch/kontextual-engine/src:/home/worsch/markitect-tool/src \
+  python3 -m pytest -m "markitect_tool" -q
+```
+
+## Interpretation
+
+- Passing sentinels indicate that, at the generated sizes, the current
+  integration boundary is healthy enough for the planned engine work.
+- Failing sentinels should be treated as interface risk, not as proof of engine
+  failure.
+- If a sentinel is too noisy, prefer improving its generated scenario or
+  threshold over deleting it.
+- If a real use case exceeds the current generated sizes, add a new sentinel
+  before relying on the behavior in an engine workplan.
+
+## Current Generated Sizes
+
+The tests currently generate:
+
+- one section-heavy document with hundreds of decision sections,
+- dozens of repeated selector queries over a large parsed document,
+- a fan-out include bundle over many partial files,
+- a context package over many public/internal source files,
+- many snapshot identities over generated Markdown files.
+
+The generated data lives in temporary pytest directories so the repository
+does not carry bulky synthetic corpora.
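+
+A section-heavy document of the kind listed above can be generated on the fly
+into a throwaway directory. In pytest the built-in `tmp_path` fixture provides
+such a directory; this sketch uses `tempfile.TemporaryDirectory` so it runs on
+its own. The generator, its frontmatter fields, and the regex shape check are
+hypothetical; the real sentinel asserts document shape through
+`markitect-tool`.
+
+```python
+import re
+import tempfile
+from pathlib import Path
+
+
+def section_heavy_document(sections: int) -> str:
+    """Hypothetical generator: frontmatter plus `sections` decision sections."""
+    lines = ["---", "title: Generated decisions", "labels: [public]", "---", ""]
+    lines.append("# Decisions")
+    for i in range(sections):
+        lines += [
+            "",
+            f"## Decision {i:04d}",
+            "",
+            f"Context for decision {i}.",
+            "",
+            f"- option-a-{i}",
+            f"- option-b-{i}",
+        ]
+    return "\n".join(lines) + "\n"
+
+
+def write_generated(tmp_dir: Path, sections: int) -> Path:
+    """Write the document under a throwaway directory, never the repository."""
+    path = tmp_dir / "decisions.md"
+    path.write_text(section_heavy_document(sections), encoding="utf-8")
+    return path
+
+
+# Crude stand-in shape check: count level-two headings with a regex.
+LEVEL2 = re.compile(r"^## ", re.MULTILINE)
+
+if __name__ == "__main__":
+    with tempfile.TemporaryDirectory() as tmp:
+        doc = write_generated(Path(tmp), 300)
+        assert len(LEVEL2.findall(doc.read_text(encoding="utf-8"))) == 300
+```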
+
+## Initial Local Baseline
+
+On 2026-05-05, running against `/home/worsch/markitect-tool/src` on the local
+WSL workspace, all sentinels passed. The slowest observed sentinel was repeated
+selector queries over a large parsed document, followed by the large
+parse/query and context-package creation sentinels. This suggests selectors are
+the first area to watch as engine retrieval workloads grow.
+
+The baseline is observational, not a committed performance guarantee. The
+budgets in `tests/test_markitect_tool_capacity.py` are intentionally wider than
+the observed timings to avoid false failures from normal workstation variance.
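+
+The `queries x document-size` risk behind the selector observation can be made
+concrete with a toy cost model that counts how many sections are examined. This
+is not `markitect-tool`'s selector engine, and whether its selectors rescan the
+document per query is what profiling would have to confirm. A selector that
+rescans examines every section for each query, while a one-pass heading index
+(one possible upstream improvement, assumed here) examines each section once.
+
+```python
+from collections import defaultdict
+
+
+def naive_select(sections, heading, stats):
+    """Toy heading selector that rescans every section on each query."""
+    hits = []
+    for title, body in sections:
+        stats["scanned"] += 1
+        if title == heading:
+            hits.append(body)
+    return hits
+
+
+def build_heading_index(sections, stats):
+    """Toy alternative: a single pass builds a heading -> bodies lookup."""
+    index = defaultdict(list)
+    for title, body in sections:
+        stats["scanned"] += 1
+        index[title].append(body)
+    return index
+
+
+# Synthetic document: 400 level-two sections, queried 50 times.
+sections = [(f"Decision {i:04d}", f"body {i}") for i in range(400)]
+queries = [f"Decision {i:04d}" for i in range(0, 400, 8)]
+
+naive_stats = {"scanned": 0}
+naive_hits = [naive_select(sections, q, naive_stats) for q in queries]
+
+index_stats = {"scanned": 0}
+index = build_heading_index(sections, index_stats)
+indexed_hits = [index.get(q, []) for q in queries]  # .get avoids inserting keys
+
+print(naive_stats["scanned"], index_stats["scanned"])  # prints: 20000 400
+```
+
+With 400 sections and 50 queries the rescanning version examines 20,000
+sections against 400 for the index, and both return the same hits.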
--- a/docs/markitect-tool-integration-usecases.md
+++ b/docs/markitect-tool-integration-usecases.md
@@ -14,7 +14,11 @@ Instead, it should wrap them as adapters and persist engine-owned assets,
 lineage, policy decisions, audit events, and service contracts around them.

 The executable companion for this document is
-`tests/test_markitect_tool_contract.py`.
+`tests/test_markitect_tool_contract.py`. The reusable fixture corpus lives in
+`examples/markitect-tool-contract/`.
+Opt-in bottleneck sentinels are described in
+`docs/markitect-tool-capacity-risks.md` and implemented in
+`tests/test_markitect_tool_capacity.py`.

 ## Expected Dependency Shape

@@ -26,6 +30,22 @@ The executable companion for this document is
 - Persistence posture: store serializable Markitect results and provenance as
  adapter metadata, not as canonical domain objects.

+Run the examples against the sibling source checkout during integration
+development with:
+
+```bash
+PYTHONPATH=/home/worsch/kontextual-engine/src:/home/worsch/markitect-tool/src \
+  python3 -m pytest tests/test_markitect_tool_contract.py -q
+```
+
+Run the larger capacity sentinels with:
+
+```bash
+KONTEXTUAL_RUN_CAPACITY=1 \
+PYTHONPATH=/home/worsch/kontextual-engine/src:/home/worsch/markitect-tool/src \
+  python3 -m pytest tests/test_markitect_tool_capacity.py -q
+```
+
 ## Use Case 1: Markdown Normalization

 Intent: convert Markdown source content into structured frontmatter, headings,
@@ -207,7 +227,10 @@ Engine expectation:
 | Transform and include provenance | Markdown ops retain Markitect provenance. |
 | Snapshot identity | Engine stores Markitect snapshot metadata without owning the algorithm. |
 | Context package policy filtering | Agent context can reuse Markitect packages and local label policy. |
+| Document contracts | Markdown validation can call Markitect contracts without moving contract semantics into the engine. |
+| Capacity sentinels | Larger generated examples expose likely parser, selector, include, context-package, and snapshot bottlenecks. |

-These tests are intentionally small. They are not a replacement for
-`markitect-tool`'s own test suite; they assert only the behaviors this engine
-depends on.
+The contract tests are intentionally small but example-backed, and the
+capacity sentinels are larger but opt-in. Neither replaces `markitect-tool`'s
+own test suite; they assert only the behaviors this engine depends on and
+provide concrete data for diagnosing interface drift.