# Test Performance Monitoring

Date: 2026-05-05

Status: lightweight pytest performance history for local situational awareness.

## Purpose

The test suite records a compact performance history on every pytest run. The
goal is not detailed profiling. It is a small scorekeeping loop that helps us
notice negative drift while the engine grows.

The monitor captures:

- run start and finish timestamps,
- total test run duration,
- per-test duration and outcome,
- Python and platform identity,
- logical CPU count,
- load averages and load-per-CPU where available,
- memory total, available memory, and available ratio from `/proc/meminfo`
  where available,
- process user/system CPU deltas and peak resident memory.

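As a rough illustration of the host-side fields listed above, the sketch below gathers them with the Python standard library. It assumes a POSIX host. The names `read_meminfo` and `host_snapshot` and the dictionary keys are hypothetical and are not taken from the monitor's actual schema. The CPU deltas would come from subtracting a snapshot taken at run start from one taken at run finish.

```python
import os
import platform
import resource  # Unix-only; this sketch assumes a POSIX host


def read_meminfo(path="/proc/meminfo"):
    """Return (total_kib, available_kib), or (None, None) if unreadable."""
    fields = {}
    try:
        with open(path, encoding="ascii") as handle:
            for line in handle:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    fields[key] = int(parts[0])  # values are reported in kiB
    except OSError:
        return None, None
    return fields.get("MemTotal"), fields.get("MemAvailable")


def host_snapshot():
    """Collect an illustrative host snapshot; key names are assumptions."""
    cpus = os.cpu_count()
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = None  # load averages are not available on every platform
    total, available = read_meminfo()
    ratio = None
    if total and available is not None:
        ratio = available / total
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "cpu_count": cpus,
        "load_1m": load1,
        "load_per_cpu": load1 / cpus if load1 is not None and cpus else None,
        "mem_total_kib": total,
        "mem_available_kib": available,
        "mem_available_ratio": ratio,
        "cpu_user_s": usage.ru_utime,
        "cpu_system_s": usage.ru_stime,
        "peak_rss": usage.ru_maxrss,  # kiB on Linux, bytes on macOS
    }
```

Every optional field falls back to `None`, which mirrors the "where available" wording in the list.
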
## Storage

Default history path:

```text
.pytest_cache/kontextual/performance-history.json
```

`.pytest_cache/` is ignored by git, so regular test runs do not dirty the
repository. A different path can be supplied with `--perf-history-path` or
`KONTEXTUAL_PERF_HISTORY`.

## Retention Model

The JSON file keeps a bounded, compact record:

- the last `N` raw runs,
- the last `N` rolling averages over the retained runs,
- the average of the last `N` rolling averages,
- one compact daily average record per day, updated on every run,
- daily records retained for a configurable number of days.

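One way to picture the bounded record is the sketch below. It is an illustration under stated assumptions, not the monitor's implementation: the `History` class, its attribute names, and the choice to track a single total duration per run are all hypothetical, and the real JSON layout may differ.

```python
from collections import deque
from datetime import date, timedelta
from statistics import fmean


class History:
    """Illustrative bounded history; names and layout are assumptions."""

    def __init__(self, window=20, retention_days=730):
        self.retention_days = retention_days
        self.raw_runs = deque(maxlen=window)          # last N run durations
        self.rolling_averages = deque(maxlen=window)  # last N rolling averages
        self.daily = {}                               # date -> (run count, mean)

    def record(self, day, duration):
        """Add one run and refresh every derived record."""
        self.raw_runs.append(duration)
        self.rolling_averages.append(fmean(self.raw_runs))
        # Incremental mean keeps a single compact record per day.
        count, mean = self.daily.get(day, (0, 0.0))
        count += 1
        self.daily[day] = (count, mean + (duration - mean) / count)
        # Drop daily records that are older than the retention period.
        cutoff = day - timedelta(days=self.retention_days)
        self.daily = {d: rec for d, rec in self.daily.items() if d >= cutoff}

    def average_of_averages(self):
        """Baseline derived from the retained rolling averages."""
        if not self.rolling_averages:
            return None
        return fmean(self.rolling_averages)
```

Because both deques use `maxlen`, the oldest entries fall off automatically, so the file size stays bounded no matter how many runs accumulate.
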
Defaults:

- `N = 20`,
- daily retention = `730` days,
- drift warning ratio = `35%`,
- minimum duration delta before warning = `0.05s`.

Skipped tests are recorded in raw runs and aggregate counts, but they are not
used as per-test duration baselines. This keeps optional Markitect and capacity
tests from producing false regressions when they switch from skipped to
executed.

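The skipped-test rule can be sketched as follows. The function name `per_test_baselines`, the tuple layout, and the outcome string `"skipped"` are assumptions made for illustration. A skipped test typically reports a near-zero duration, so including it would pull the baseline down and make the first real execution look like a regression.

```python
from collections import defaultdict
from statistics import fmean


def per_test_baselines(runs):
    """Average duration per test id, ignoring skipped outcomes.

    `runs` is a hypothetical list of runs, each a list of
    (test_id, outcome, duration_seconds) tuples.
    """
    samples = defaultdict(list)
    for run in runs:
        for test_id, outcome, duration in run:
            if outcome != "skipped":  # skipped entries never feed a baseline
                samples[test_id].append(duration)
    return {test_id: fmean(values) for test_id, values in samples.items()}
```

A test that has only ever been skipped gets no baseline at all in this sketch, so there is nothing to compare against on its first execution.
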
## Warnings

At the end of the pytest run, the monitor compares the current run with the
previous average-of-averages. It prints warnings for:

- total run duration drift, when the executed test count is comparable,
- individual test duration drift,
- materially higher normalized start load,
- materially lower available-memory ratio.

Warnings do not fail the test run. They are meant to create attention, not gate
development.

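A duration drift check along these lines might look like the sketch below. This document does not say how the drift ratio and the minimum delta are combined, so the sketch assumes a warning requires both thresholds to be met. The function name `duration_drift` and the use of `>=` are likewise assumptions.

```python
def duration_drift(current, baseline, ratio=0.35, min_delta=0.05):
    """Return True when `current` is slower than `baseline` on both thresholds.

    Assumption: the relative ratio and the absolute minimum delta must both
    be met. The real monitor may combine them differently.
    """
    if baseline is None or baseline <= 0:
        return False  # no usable history yet, so nothing to compare against
    delta = current - baseline
    return delta >= min_delta and delta / baseline >= ratio
```

Under this assumption the absolute floor keeps very fast tests quiet: a test that goes from 0.01s to 0.02s doubles in relative terms, yet the 0.01s delta stays below `min_delta`, so no warning is produced.
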
## Configuration

Disable monitoring:

```bash
python3 -m pytest --perf-history-disable
```

or:

```bash
KONTEXTUAL_PERF_MONITOR=0 python3 -m pytest
```

Override retention and warning thresholds:

```bash
python3 -m pytest \
  --perf-history-window 30 \
  --perf-history-drift-ratio 0.50 \
  --perf-history-min-delta 0.10
```

Environment equivalents:

- `KONTEXTUAL_PERF_HISTORY`,
- `KONTEXTUAL_PERF_WINDOW`,
- `KONTEXTUAL_PERF_DAILY_RETENTION_DAYS`,
- `KONTEXTUAL_PERF_DRIFT_RATIO`,
- `KONTEXTUAL_PERF_MIN_DELTA_SECONDS`.

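For orientation, the environment variables could map onto settings roughly as in the sketch below. The `PerfConfig` dataclass and `config_from_env` are hypothetical names, and the parsing rules are assumptions: the drift ratio is read as a fraction, and only the value `0` disables monitoring. How command-line options and environment variables take precedence over each other is not stated here, so the sketch leaves that out.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfConfig:
    """Illustrative settings holder; defaults mirror the documented ones."""

    enabled: bool = True
    history_path: str = ".pytest_cache/kontextual/performance-history.json"
    window: int = 20
    daily_retention_days: int = 730
    drift_ratio: float = 0.35
    min_delta_seconds: float = 0.05


def config_from_env(env=None):
    """Build a PerfConfig from environment variables (parsing is assumed)."""
    env = os.environ if env is None else env
    d = PerfConfig()
    return PerfConfig(
        # Assumption: only the literal value "0" disables monitoring.
        enabled=env.get("KONTEXTUAL_PERF_MONITOR", "1") != "0",
        history_path=env.get("KONTEXTUAL_PERF_HISTORY", d.history_path),
        window=int(env.get("KONTEXTUAL_PERF_WINDOW", d.window)),
        daily_retention_days=int(
            env.get("KONTEXTUAL_PERF_DAILY_RETENTION_DAYS", d.daily_retention_days)
        ),
        drift_ratio=float(env.get("KONTEXTUAL_PERF_DRIFT_RATIO", d.drift_ratio)),
        min_delta_seconds=float(
            env.get("KONTEXTUAL_PERF_MIN_DELTA_SECONDS", d.min_delta_seconds)
        ),
    )
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real process environment.
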
## When To Profile Instead

Use this monitor to spot drift and identify candidate tests or areas. If a
warning points to a real bottleneck, create a focused profiling experiment or a
capacity sentinel. Do not add large traces or per-function profiling data to
the rolling history.