Test-based performance monitor

2026-05-05 20:58:25 +02:00
parent fcd50bdfe8
commit f6f3116ae7
3 changed files with 811 additions and 0 deletions
--- a/docs/test-performance-monitoring.md
+++ b/docs/test-performance-monitoring.md
@@ -0,0 +1,109 @@
+# Test Performance Monitoring
+
+Date: 2026-05-05
+
+Status: lightweight pytest performance history for local situational awareness.
+
+## Purpose
+
+The test suite records a compact performance history on every pytest run. The
+goal is not detailed profiling but a small scorekeeping loop that helps us
+notice gradual slowdowns (negative drift) as the engine grows.
+
+The monitor captures:
+
+- run start and finish timestamps,
+- total test run duration,
+- per-test duration and outcome,
+- Python and platform identity,
+- logical CPU count,
+- load averages and load-per-CPU where available,
+- memory total, available memory, and available ratio from `/proc/meminfo`
+  where available,
+- process user/system CPU deltas and peak resident memory.
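Annotation: a minimal sketch of how the host metrics in the list above could be gathered with the standard library alone. The function names (`parse_meminfo`, `load_per_cpu`, `host_snapshot`) and the returned dictionary keys are hypothetical and not taken from this commit. The sketch only illustrates the "where available" behaviour by returning `None` when a platform lacks a metric.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into total, available, and ratio (kB)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("MemTotal")
    available = fields.get("MemAvailable")
    if not total or available is None:
        return None  # metric not available on this host
    return {
        "total_kb": total,
        "available_kb": available,
        "available_ratio": available / total,
    }


def load_per_cpu(load_1m, cpu_count):
    """Normalize a load average by the logical CPU count."""
    if load_1m is None or not cpu_count:
        return None
    return load_1m / cpu_count


def host_snapshot(meminfo_path="/proc/meminfo"):
    """Collect host metrics, using None where the platform lacks them."""
    cpus = os.cpu_count()
    try:
        load_1m = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1m = None  # e.g. Windows has no os.getloadavg
    try:
        with open(meminfo_path, encoding="utf-8") as handle:
            memory = parse_meminfo(handle.read())
    except OSError:
        memory = None  # e.g. macOS has no /proc/meminfo
    return {
        "logical_cpus": cpus,
        "load_1m": load_1m,
        "load_per_cpu": load_per_cpu(load_1m, cpus),
        "memory": memory,
    }
```

Keeping the parser separate from the file read makes it testable with a literal string, so the sketch can be exercised on hosts without `/proc`.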
+
+## Storage
+
+Default history path:
+
+```text
+.pytest_cache/kontextual/performance-history.json
+```
+
+`.pytest_cache/` is ignored by git, so regular test runs do not dirty the
+repository. A different path can be supplied with `--perf-history-path` or
+`KONTEXTUAL_PERF_HISTORY`.
+
+## Retention Model
+
+The JSON file keeps a bounded, compact record:
+
+- the last `N` raw runs,
+- the last `N` rolling averages, each computed over the runs retained at the
+  time,
+- the average of those `N` rolling averages (the "average-of-averages"),
+- one compact daily average record per day, updated on every run,
+- daily records retained for a configurable number of days.
+
+Defaults:
+
+- `N = 20`,
+- daily retention = `730` days,
+- drift warning ratio = `0.35` (35%),
+- minimum duration delta before warning = `0.05s`.
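Annotation: the bounded retention described above can be sketched with two fixed-size deques. This is an illustration under assumptions, not the code in this commit. The class name `RollingHistory` is hypothetical, only total run durations are tracked (the real history also stores per-test and host data), and daily records are left out.

```python
from collections import deque


class RollingHistory:
    """Keep the last N runs, the last N rolling averages, and their mean."""

    def __init__(self, window=20):
        self.runs = deque(maxlen=window)      # raw total durations
        self.averages = deque(maxlen=window)  # rolling average after each run

    def baseline(self):
        """Average of the retained rolling averages, or None with no history."""
        if not self.averages:
            return None
        return sum(self.averages) / len(self.averages)

    def record(self, duration):
        """Add one run and return the baseline that existed before it."""
        previous = self.baseline()
        self.runs.append(duration)
        self.averages.append(sum(self.runs) / len(self.runs))
        return previous
```

Returning the baseline from before the current run was added mirrors the idea of comparing a run against the previous average-of-averages, so a slow run does not dilute the value it is compared with.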
+
+Skipped tests are recorded in raw runs and aggregate counts, but they are not
+used as per-test duration baselines. This keeps optional Markitect and capacity
+tests from producing false regressions when they switch from skipped to
+executed.
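Annotation: one way to keep skipped tests out of the per-test baselines, as the paragraph above describes. The record shape (`tests`, `nodeid`, `outcome`, `duration`) and the function name `baseline_durations` are assumptions made for this sketch. The actual JSON layout is not shown in this document.

```python
def baseline_durations(runs):
    """Average per-test duration over runs, ignoring skipped outcomes."""
    totals = {}
    counts = {}
    for run in runs:
        for test in run["tests"]:
            if test["outcome"] == "skipped":
                continue  # still counted elsewhere, never a duration baseline
            name = test["nodeid"]
            totals[name] = totals.get(name, 0.0) + test["duration"]
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}
```

If a test was skipped in one run (recorded with a near-zero duration) and took 2.0 s when it later executed, its baseline under this scheme is 2.0 s rather than roughly 1.0 s, so the first executed run is not flagged as a regression.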
+
+## Warnings
+
+At the end of the pytest run, the monitor compares the current run with the
+previous average-of-averages. It prints warnings for:
+
+- total run duration drift, when the executed test count is comparable,
+- individual test duration drift,
+- materially higher normalized start load,
+- materially lower available-memory ratio.
+
+Warnings do not fail the test run. They are meant to create attention, not gate
+development.
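Annotation: a hedged sketch of the duration drift check, using the documented defaults. It assumes a warning requires both the absolute delta and the relative ratio to be exceeded. That reading is suggested by "minimum duration delta before warning", but the commit's exact rule is not shown here. The function name `drift_warning` and the message format are hypothetical.

```python
def drift_warning(name, current, baseline, ratio=0.35, min_delta=0.05):
    """Return a warning string when current exceeds baseline materially."""
    if baseline is None or baseline <= 0:
        return None  # no usable history yet
    delta = current - baseline
    if delta < min_delta:
        return None  # too small in absolute terms to be worth reporting
    relative = delta / baseline
    if relative < ratio:
        return None  # within the tolerated relative drift
    return f"{name}: {current:.2f}s vs baseline {baseline:.2f}s (+{relative:.0%})"
```

The absolute floor matters for very fast tests: a test going from 4 ms to 12 ms has tripled, yet the 8 ms delta is below `0.05s` and is treated as noise. The function only returns text, which fits warnings that are printed without failing the run.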
+
+## Configuration
+
+Disable monitoring:
+
+```bash
+python3 -m pytest --perf-history-disable
+```
+
+or:
+
+```bash
+KONTEXTUAL_PERF_MONITOR=0 python3 -m pytest
+```
+
+Override retention and warning thresholds:
+
+```bash
+python3 -m pytest \
+  --perf-history-window 30 \
+  --perf-history-drift-ratio 0.50 \
+  --perf-history-min-delta 0.10
+```
+
+Environment equivalents:
+
+- `KONTEXTUAL_PERF_HISTORY`,
+- `KONTEXTUAL_PERF_WINDOW`,
+- `KONTEXTUAL_PERF_DAILY_RETENTION_DAYS`,
+- `KONTEXTUAL_PERF_DRIFT_RATIO`,
+- `KONTEXTUAL_PERF_MIN_DELTA_SECONDS`.
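Annotation: a small sketch of how the environment variables listed above might be resolved against the documented defaults. The helper names (`settings_from_env`, `monitoring_enabled`) and the setting keys are hypothetical. The drift ratio default is assumed to be the fraction `0.35`, and the sketch assumes that only the value `0` disables monitoring. How command-line options take precedence over the environment is not stated in this document and is not modelled.

```python
import os

# Setting name -> (environment variable, parser, documented default).
SETTINGS = {
    "history_path": ("KONTEXTUAL_PERF_HISTORY", str,
                     ".pytest_cache/kontextual/performance-history.json"),
    "window": ("KONTEXTUAL_PERF_WINDOW", int, 20),
    "daily_retention_days": ("KONTEXTUAL_PERF_DAILY_RETENTION_DAYS", int, 730),
    "drift_ratio": ("KONTEXTUAL_PERF_DRIFT_RATIO", float, 0.35),
    "min_delta_seconds": ("KONTEXTUAL_PERF_MIN_DELTA_SECONDS", float, 0.05),
}


def settings_from_env(environ=None):
    """Resolve each setting from the environment, else use its default."""
    env = os.environ if environ is None else environ
    resolved = {}
    for name, (variable, parse, default) in SETTINGS.items():
        raw = env.get(variable)
        resolved[name] = default if raw in (None, "") else parse(raw)
    return resolved


def monitoring_enabled(environ=None):
    """Assumption: only KONTEXTUAL_PERF_MONITOR=0 turns the monitor off."""
    env = os.environ if environ is None else environ
    return env.get("KONTEXTUAL_PERF_MONITOR", "1") != "0"
```

Passing a plain dictionary instead of `os.environ` keeps the resolution logic easy to test without touching the real process environment.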
+
+## When To Profile Instead
+
+Use this monitor to spot drift and identify candidate tests or areas. If a
+warning points to a real bottleneck, create a focused profiling experiment or a
+capacity sentinel. Do not add large traces or per-function profiling data to
+the rolling history.
+