# Test Performance Monitoring
Date: 2026-05-05
Status: lightweight pytest performance history for local situational awareness.
## Purpose
The test suite records a compact performance history on every pytest run. The
goal is not detailed profiling. It is a small scorekeeping loop that helps us
notice slowdowns and resource drift while the engine grows.
The monitor captures:
- run start and finish timestamps,
- total test run duration,
- per-test duration and outcome,
- Python and platform identity,
- logical CPU count,
- load averages and load-per-CPU where available,
- memory total, available memory, and available ratio from `/proc/meminfo`
where available,
- process user/system CPU deltas and peak resident memory.
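As a rough illustration of the host and process fields listed above, the following sketch gathers them with the standard library only. The function names `read_meminfo` and `system_snapshot` and the dictionary keys are hypothetical and are not taken from the monitor itself; the real plugin may collect or name these values differently.

```python
import os
import platform
import time

try:
    import resource  # Unix-only; absent on Windows
except ImportError:
    resource = None


def read_meminfo(path="/proc/meminfo"):
    """Return (total_kib, available_kib), or (None, None) where unavailable."""
    fields = {}
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    fields[key] = int(parts[0])
    except OSError:
        return None, None
    return fields.get("MemTotal"), fields.get("MemAvailable")


def system_snapshot():
    """Collect one compact snapshot of host and process state (hypothetical layout)."""
    cpus = os.cpu_count() or 1
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = None  # load averages are not available on every platform
    total, available = read_meminfo()
    ratio = available / total if total and available is not None else None
    snapshot = {
        "timestamp": time.time(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "cpu_count": cpus,
        "load_1m": load1,
        "load_per_cpu": None if load1 is None else load1 / cpus,
        "mem_total_kib": total,
        "mem_available_kib": available,
        "mem_available_ratio": ratio,
    }
    if resource is not None:
        usage = resource.getrusage(resource.RUSAGE_SELF)
        snapshot["user_cpu_s"] = usage.ru_utime
        snapshot["system_cpu_s"] = usage.ru_stime
        snapshot["peak_rss"] = usage.ru_maxrss  # KiB on Linux, bytes on macOS
    return snapshot
```

Taking one snapshot at run start and one at run finish, then subtracting the CPU fields, would give the user/system CPU deltas.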
## Storage
Default history path:
```text
.pytest_cache/kontextual/performance-history.json
```
`.pytest_cache/` is ignored by git, so regular test runs do not dirty the
repository. A different path can be supplied with the `--perf-history-path`
option or the `KONTEXTUAL_PERF_HISTORY` environment variable.
## Retention Model
The JSON file keeps a bounded, compact record:
- the last `N` raw runs,
- the last `N` rolling averages over the retained runs,
- the average of the last `N` rolling averages,
- one compact daily average record per day, updated on every run,
- daily records retained for a configurable number of days.
Defaults:
- `N = 20`,
- daily retention = `730` days,
- drift warning ratio = `35%` (passed as a fraction, for example `0.35`),
- minimum duration delta before warning = `0.05s`.
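The retention model can be pictured with a minimal sketch like the one below. It is an assumption about how such a bounded history could work, not the monitor's actual code: the `History` class, `update_daily` function, and record fields are hypothetical, and only total run duration is tracked to keep the example short.

```python
from collections import deque
from datetime import date, timedelta
from statistics import fmean


class History:
    """Bounded history: last N raw runs and last N rolling averages."""

    def __init__(self, window=20):
        self.raw_runs = deque(maxlen=window)
        self.rolling_averages = deque(maxlen=window)

    def average_of_averages(self):
        """Mean of the retained rolling averages, or None with no history."""
        if not self.rolling_averages:
            return None
        return fmean(self.rolling_averages)

    def record(self, duration_s):
        """Store one run; return the baseline that existed before this run."""
        baseline = self.average_of_averages()
        self.raw_runs.append(duration_s)
        # Each rolling average covers the raw runs retained at that moment.
        self.rolling_averages.append(fmean(self.raw_runs))
        return baseline


def update_daily(daily, day, duration_s, retention_days=730):
    """Fold one run into the day's running mean and drop expired days."""
    entry = daily.setdefault(day.isoformat(), {"runs": 0, "mean_s": 0.0})
    entry["runs"] += 1
    entry["mean_s"] += (duration_s - entry["mean_s"]) / entry["runs"]
    # ISO dates sort lexicographically, so string comparison is enough here.
    cutoff = (day - timedelta(days=retention_days)).isoformat()
    for key in [k for k in daily if k < cutoff]:
        del daily[key]
    return daily
```

Because both deques are capped at `window` entries and each day holds a single running mean, the file size stays bounded no matter how often the suite runs.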
Skipped tests are recorded in raw runs and aggregate counts, but they are not
used as per-test duration baselines. This keeps optional Markitect and capacity
tests from producing false regressions when they switch from skipped to
executed.
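A sketch of that rule, assuming a hypothetical record layout in which each run holds a `tests` list with `nodeid`, `outcome`, and `duration_s` fields (these names are illustrative, not the monitor's real schema):

```python
def per_test_baselines(runs):
    """Average per-test duration over retained runs, ignoring skipped outcomes."""
    totals = {}
    for run in runs:
        for test in run["tests"]:
            if test["outcome"] == "skipped":
                continue  # a skip takes almost no time and would drag the baseline down
            total, count = totals.get(test["nodeid"], (0.0, 0))
            totals[test["nodeid"]] = (total + test["duration_s"], count + 1)
    return {nodeid: total / count for nodeid, (total, count) in totals.items()}
```

A test that has only ever been skipped gets no baseline at all under this scheme, so its first real execution has nothing to be compared against.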
## Warnings
At the end of the pytest run, the monitor compares the current run with the
previous average-of-averages. It prints warnings for:
- total run duration drift, when the executed test count is comparable,
- individual test duration drift,
- materially higher normalized start load,
- materially lower available-memory ratio.
Warnings do not fail the test run. They are meant to create attention, not gate
development.
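The duration checks could look roughly like the sketch below. It assumes that a warning needs both thresholds to be exceeded (the relative drift ratio and the minimum absolute delta); the document does not spell out how the two combine, and the function name `duration_drift` is hypothetical.

```python
def duration_drift(current_s, baseline_s, ratio=0.35, min_delta_s=0.05):
    """Return a warning string when current_s drifts above baseline_s, else None."""
    if baseline_s is None or baseline_s <= 0:
        return None  # no usable history yet
    delta = current_s - baseline_s
    # Assumption: the slowdown must be large in both relative and absolute terms.
    if delta >= min_delta_s and delta > baseline_s * ratio:
        percent = 100.0 * delta / baseline_s
        return (
            f"duration drift: {current_s:.3f}s vs baseline "
            f"{baseline_s:.3f}s (+{percent:.0f}%)"
        )
    return None
```

The absolute floor keeps very fast tests from triggering warnings over a few milliseconds of noise, while the ratio keeps long tests from warning on small relative changes.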
## Configuration
Disable monitoring:
```bash
python3 -m pytest --perf-history-disable
```
or:
```bash
KONTEXTUAL_PERF_MONITOR=0 python3 -m pytest
```
Override retention and warning thresholds:
```bash
python3 -m pytest \
  --perf-history-window 30 \
  --perf-history-drift-ratio 0.50 \
  --perf-history-min-delta 0.10
```
Environment equivalents:
- `KONTEXTUAL_PERF_HISTORY`,
- `KONTEXTUAL_PERF_WINDOW`,
- `KONTEXTUAL_PERF_DAILY_RETENTION_DAYS`,
- `KONTEXTUAL_PERF_DRIFT_RATIO`,
- `KONTEXTUAL_PERF_MIN_DELTA_SECONDS`.
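One way such settings could be resolved is sketched below. The precedence used here (command-line value, then environment variable, then default) is an assumption, as are the `resolve_settings` function and the setting keys; only the environment variable names and default values come from this document.

```python
import os

# Defaults as documented above; the ratio is written as a fraction (35%).
DEFAULTS = {
    "window": 20,
    "daily_retention_days": 730,
    "drift_ratio": 0.35,
    "min_delta_seconds": 0.05,
}

ENV_NAMES = {
    "window": "KONTEXTUAL_PERF_WINDOW",
    "daily_retention_days": "KONTEXTUAL_PERF_DAILY_RETENTION_DAYS",
    "drift_ratio": "KONTEXTUAL_PERF_DRIFT_RATIO",
    "min_delta_seconds": "KONTEXTUAL_PERF_MIN_DELTA_SECONDS",
}


def resolve_settings(cli=None, environ=None):
    """Merge CLI values, environment variables, and defaults (assumed order)."""
    cli = cli or {}
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        if cli.get(name) is not None:
            value = cli[name]
        elif environ.get(ENV_NAMES[name]):
            value = environ[ENV_NAMES[name]]
        else:
            value = default
        # Coerce to the default's type so "30" becomes 30 and "0.50" becomes 0.5.
        settings[name] = type(default)(value)
    return settings
```

For example, `resolve_settings(environ={"KONTEXTUAL_PERF_WINDOW": "30"})` would widen the window to 30 runs while leaving the other thresholds at their defaults.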
## When To Profile Instead
Use this monitor to spot drift and identify candidate tests or areas. If a
warning points to a real bottleneck, create a focused profiling experiment or a
capacity sentinel. Do not add large traces or per-function profiling data to
the rolling history.