# Test Performance Monitoring

Date: 2026-05-05

Status: lightweight pytest performance history for local situational awareness.

## Purpose

The test suite records a compact performance history on every pytest run. The
goal is not detailed profiling. It is a small scorekeeping loop that helps us
notice negative drift while the engine grows.

The monitor captures:

- run start and finish timestamps,
- total test run duration,
- per-test duration and outcome,
- Python and platform identity,
- logical CPU count,
- load averages and load-per-CPU where available,
- memory total, available memory, and available ratio from `/proc/meminfo`
  where available,
- process user/system CPU deltas and peak resident memory.
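
As a rough illustration of how these signals can be gathered with the standard library, here is a minimal sketch. The function names (`read_meminfo`, `capture_snapshot`, `cpu_delta`) and the dictionary keys are hypothetical and are not taken from the monitor's actual implementation. It assumes a Unix-like host for `resource`, and degrades to `None` where a signal is unavailable.

```python
import os
import platform
import time

try:
    import resource  # Unix only; absent on Windows
except ImportError:
    resource = None


def read_meminfo(path="/proc/meminfo"):
    """Return (total_kb, available_kb), or (None, None) if unreadable."""
    fields = {}
    try:
        with open(path, encoding="ascii", errors="replace") as handle:
            for line in handle:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    fields[key.strip()] = int(parts[0])
    except OSError:
        return None, None
    return fields.get("MemTotal"), fields.get("MemAvailable")


def capture_snapshot():
    """Collect one point-in-time view of the host and this process."""
    cpus = os.cpu_count() or 1
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = None  # load averages are not available everywhere
    total_kb, available_kb = read_meminfo()
    ratio = available_kb / total_kb if total_kb and available_kb is not None else None
    usage = resource.getrusage(resource.RUSAGE_SELF) if resource else None
    return {
        "timestamp": time.time(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "logical_cpus": cpus,
        "load1": load1,
        "load1_per_cpu": None if load1 is None else load1 / cpus,
        "mem_total_kb": total_kb,
        "mem_available_kb": available_kb,
        "mem_available_ratio": ratio,
        "cpu_user_s": usage.ru_utime if usage else None,
        "cpu_system_s": usage.ru_stime if usage else None,
        # ru_maxrss is kilobytes on Linux and bytes on macOS.
        "peak_rss": usage.ru_maxrss if usage else None,
    }


def cpu_delta(start, finish, key):
    """Difference of a CPU counter between two snapshots, if both exist."""
    if start[key] is None or finish[key] is None:
        return None
    return finish[key] - start[key]
```

Taking one snapshot at session start and one at session finish, then calling `cpu_delta(start, finish, "cpu_user_s")`, gives the process CPU delta for the run.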

## Storage

Default history path:

```text
.pytest_cache/kontextual/performance-history.json
```

`.pytest_cache/` is ignored by git, so regular test runs do not dirty the
repository. To store the history elsewhere, pass `--perf-history-path` on the
command line or set the `KONTEXTUAL_PERF_HISTORY` environment variable.

## Retention Model

The JSON file keeps a bounded, compact record:

- the last `N` raw runs,
- the last `N` rolling averages over the retained runs,
- the average of the last `N` rolling averages,
- one compact daily average record per day, updated on every run,
- daily records retained for a configurable number of days.

Defaults:

- `N = 20`,
- daily retention = `730` days,
- drift warning ratio = `0.35` (35%),
- minimum duration delta before warning = `0.05s`.
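
The retention scheme above can be sketched roughly as follows. This is an illustration only: the function names and the history keys (`raw_durations`, `rolling_averages`, `average_of_averages`, `runs`, `mean_duration`) are assumptions, not the actual JSON schema, and the sketch tracks only total run duration.

```python
from datetime import date, timedelta
from statistics import fmean


def update_history(history, run_duration, window=20):
    """Append one run, then trim both bounded lists to `window` entries."""
    raw = history.setdefault("raw_durations", [])
    rolling = history.setdefault("rolling_averages", [])

    raw.append(run_duration)
    del raw[:-window]  # keep only the last N raw runs

    rolling.append(fmean(raw))  # rolling average over the retained runs
    del rolling[:-window]  # keep only the last N rolling averages

    history["average_of_averages"] = fmean(rolling)
    return history


def update_daily(daily, today, run_duration, retention_days=730):
    """Fold one run into today's compact record and drop expired days."""
    key = today.isoformat()
    record = daily.get(key, {"runs": 0, "mean_duration": 0.0})
    runs = record["runs"] + 1
    # Incremental mean: no need to keep the day's individual runs.
    mean = record["mean_duration"] + (run_duration - record["mean_duration"]) / runs
    daily[key] = {"runs": runs, "mean_duration": mean}

    # ISO dates sort lexicographically, so string comparison is enough.
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    for old_key in [k for k in daily if k < cutoff]:
        del daily[old_key]
    return daily
```

Because both lists are trimmed on every update and each day collapses to a single record, the file size stays bounded by `N` and the daily retention period, however often the suite runs.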

Skipped tests are recorded in raw runs and aggregate counts, but they are not
used as per-test duration baselines. This keeps optional Markitect and capacity
tests from producing false regressions when they switch from skipped to
executed.
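
One way to picture this rule is a baseline builder that simply ignores skipped outcomes. The record shape (`tests`, `nodeid`, `outcome`, `duration`) and the function name are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from statistics import fmean


def per_test_baselines(runs):
    """Average each test's duration over the runs where it actually executed."""
    samples = defaultdict(list)
    for run in runs:
        for test in run["tests"]:
            if test["outcome"] == "skipped":
                continue  # a near-zero skip duration would distort the baseline
            samples[test["nodeid"]].append(test["duration"])
    return {nodeid: fmean(values) for nodeid, values in samples.items()}
```

A test that was skipped in every retained run has no baseline at all under this sketch, so its first real execution has nothing to be compared against.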

## Warnings

At the end of the pytest run, the monitor compares the current run with the
average-of-averages stored before this run. It prints warnings for:

- total run duration drift, when the executed test count is comparable,
- individual test duration drift,
- materially higher normalized start load,
- materially lower available-memory ratio.

Warnings do not fail the test run. They are meant to create attention, not gate
development.
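
The duration checks can be thought of as a two-part test, sketched below. This assumes that both the relative threshold (drift ratio) and the absolute threshold (minimum delta) must be exceeded before a warning is printed. That combination and the function name `has_drifted` are assumptions made for illustration.

```python
def has_drifted(current, baseline, ratio=0.35, min_delta=0.05):
    """Return True when `current` is materially slower than `baseline`.

    Both conditions must hold: the slowdown exceeds `min_delta` seconds
    and it exceeds `ratio` as a fraction of the baseline.
    """
    if baseline is None or baseline <= 0:
        return False  # no usable baseline, so nothing to compare against
    delta = current - baseline
    return delta > min_delta and delta / baseline > ratio
```

The absolute floor matters for very fast tests: a test moving from 8 ms to 12 ms is 50% slower, but the 4 ms difference is well inside normal timing noise, so `has_drifted(0.012, 0.008)` is false.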

## Configuration

Disable monitoring:

```bash
python3 -m pytest --perf-history-disable
```

or:

```bash
KONTEXTUAL_PERF_MONITOR=0 python3 -m pytest
```

Override retention and warning thresholds:

```bash
python3 -m pytest \
  --perf-history-window 30 \
  --perf-history-drift-ratio 0.50 \
  --perf-history-min-delta 0.10
```

Environment equivalents:

- `KONTEXTUAL_PERF_HISTORY`,
- `KONTEXTUAL_PERF_WINDOW`,
- `KONTEXTUAL_PERF_DAILY_RETENTION_DAYS`,
- `KONTEXTUAL_PERF_DRIFT_RATIO`,
- `KONTEXTUAL_PERF_MIN_DELTA_SECONDS`.
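
For orientation, the sketch below shows one plausible way such settings could be resolved. The precedence (command line first, then environment, then default) is an assumption and is not confirmed by this document. The function name and the setting keys are likewise hypothetical. Only the environment variable names and default values come from the text above.

```python
import os

# Defaults as documented in the Retention Model section.
DEFAULTS = {
    "window": 20,
    "daily_retention_days": 730,
    "drift_ratio": 0.35,
    "min_delta_seconds": 0.05,
}

ENV_NAMES = {
    "window": "KONTEXTUAL_PERF_WINDOW",
    "daily_retention_days": "KONTEXTUAL_PERF_DAILY_RETENTION_DAYS",
    "drift_ratio": "KONTEXTUAL_PERF_DRIFT_RATIO",
    "min_delta_seconds": "KONTEXTUAL_PERF_MIN_DELTA_SECONDS",
}


def resolve_settings(cli_overrides=None, environ=None):
    """Merge CLI values, environment variables, and defaults (assumed order)."""
    environ = os.environ if environ is None else environ
    cli_overrides = cli_overrides or {}
    settings = {}
    for key, default in DEFAULTS.items():
        cast = type(default)  # int for counts, float for thresholds
        if cli_overrides.get(key) is not None:
            settings[key] = cast(cli_overrides[key])
        elif environ.get(ENV_NAMES[key]):
            settings[key] = cast(environ[ENV_NAMES[key]])
        else:
            settings[key] = default
    # Only the documented value "0" is treated as "disabled" here.
    settings["enabled"] = environ.get("KONTEXTUAL_PERF_MONITOR", "1") != "0"
    return settings
```

Passing `environ` explicitly keeps the function easy to exercise in tests without touching the real process environment.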

## When To Profile Instead

Use this monitor to spot drift and identify candidate tests or areas. If a
warning points to a real bottleneck, create a focused profiling experiment or a
capacity sentinel. Do not add large traces or per-function profiling data to
the rolling history.