Test Performance Monitoring

Date: 2026-05-05

Status: lightweight pytest performance history for local situational awareness.

Purpose

The test suite records a compact performance history on every pytest run. The goal is not detailed profiling; it is a small scorekeeping loop that helps us notice gradual slowdowns (negative drift) as the engine grows.

The monitor captures:

  • run start and finish timestamps,
  • total test run duration,
  • per-test duration and outcome,
  • Python and platform identity,
  • logical CPU count,
  • load averages and load-per-CPU where available,
  • total memory, available memory, and the available-to-total ratio, read from /proc/meminfo where that file exists (Linux),
  • process user/system CPU deltas and peak resident memory.
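Most of these fields come straight from the Python standard library. A minimal sketch of one snapshot, assuming hypothetical helper names (read_meminfo, process_usage, environment_snapshot) and a dictionary layout chosen only for illustration; every source that is missing on the current platform becomes None instead of an error:

```python
import os
import platform
import sys
import time
from typing import Optional

try:
    import resource  # POSIX only
except ImportError:  # e.g. Windows
    resource = None


def read_meminfo(path: str = "/proc/meminfo") -> Optional[dict]:
    """Return total and available memory (kB) plus their ratio, or None if unavailable."""
    fields = {}
    try:
        with open(path, encoding="ascii") as handle:
            for line in handle:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts:
                    fields[key] = int(parts[0])  # /proc/meminfo reports sizes in kB
    except (OSError, ValueError):
        return None
    total, available = fields.get("MemTotal"), fields.get("MemAvailable")
    if not total or available is None:
        return None
    return {"total_kb": total, "available_kb": available, "available_ratio": available / total}


def process_usage() -> Optional[dict]:
    """Cumulative CPU seconds and peak RSS (ru_maxrss is kB on Linux, bytes on macOS)."""
    if resource is None:
        return None
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {"user_s": usage.ru_utime, "system_s": usage.ru_stime, "max_rss": usage.ru_maxrss}


def environment_snapshot() -> dict:
    """Collect the per-run environment fields for one point in time."""
    cpus = os.cpu_count() or 1
    snapshot = {
        "timestamp": time.time(),
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "platform": platform.platform(),
        "cpu_count": cpus,
        "load": None,
        "memory": read_meminfo(),
        "process": process_usage(),
    }
    try:
        one, five, fifteen = os.getloadavg()
    except (AttributeError, OSError):  # getloadavg is absent on Windows
        return snapshot
    snapshot["load"] = {"1m": one, "5m": five, "15m": fifteen, "1m_per_cpu": one / cpus}
    return snapshot
```

Taking one snapshot at session start and one at finish (pytest_sessionstart and pytest_sessionfinish are natural hook points) and subtracting the user_s and system_s values gives the process CPU deltas.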

Storage

Default history path:

.pytest_cache/kontextual/performance-history.json

.pytest_cache/ is ignored by git, so regular test runs do not dirty the repository. A different path can be supplied with --perf-history-path or KONTEXTUAL_PERF_HISTORY.
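The file format and write strategy are not specified here. One plausible sketch, assuming a hypothetical top-level layout and helper names (empty_history, load_history, save_history), starts fresh when the file is missing or corrupt and writes through a temporary file so an interrupted run cannot leave truncated JSON behind:

```python
import json
import os
import tempfile
from pathlib import Path

DEFAULT_HISTORY_PATH = Path(".pytest_cache") / "kontextual" / "performance-history.json"


def empty_history() -> dict:
    # Hypothetical top-level layout, chosen for this sketch only.
    return {"runs": [], "rolling_averages": [], "average_of_averages": None, "daily": {}}


def load_history(path: Path = DEFAULT_HISTORY_PATH) -> dict:
    """Read the history, starting fresh if the file is missing, unreadable or not a JSON object."""
    try:
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return empty_history()
    return data if isinstance(data, dict) else empty_history()


def save_history(history: dict, path: Path = DEFAULT_HISTORY_PATH) -> None:
    """Write compact JSON to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(history, handle, separators=(",", ":"))
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_name)
        raise
```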

Retention Model

The JSON file keeps a bounded, compact record:

  • the last N raw runs,
  • the last N rolling averages over the retained runs,
  • the average of the last N rolling averages,
  • one compact daily average record per day, updated on every run,
  • daily records retained for a configurable number of days.
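A minimal sketch of one update step under this model, assuming each run is reduced to its total duration (the real file presumably also tracks per-test and environment fields) and that a daily record holds a run count plus an incremental mean. record_run and all dictionary keys are hypothetical:

```python
from datetime import date, timedelta
from statistics import fmean


def record_run(history: dict, run: dict, today: date, window: int = 20, retention_days: int = 730) -> dict:
    """Fold one finished run into the bounded history (sketch: total duration only)."""
    # Last N raw runs, then one rolling average over exactly those runs.
    runs = (history.get("runs", []) + [run])[-window:]
    rolling_averages = (history.get("rolling_averages", []) + [fmean(r["duration_s"] for r in runs)])[-window:]

    # One compact record per day, updated with an incremental mean on every run.
    daily = dict(history.get("daily", {}))
    key = today.isoformat()
    day = daily.get(key, {"runs": 0, "mean_duration_s": 0.0})
    count = day["runs"] + 1
    mean = day["mean_duration_s"] + (run["duration_s"] - day["mean_duration_s"]) / count
    daily[key] = {"runs": count, "mean_duration_s": mean}

    # ISO dates sort lexically, so a string comparison drops days past retention.
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    daily = {k: v for k, v in daily.items() if k > cutoff}

    return {
        "runs": runs,
        "rolling_averages": rolling_averages,
        "average_of_averages": fmean(rolling_averages),
        "daily": daily,
    }
```

With N = 2 and run durations 1.0, 2.0 and 3.0 s, the rolling averages are 1.0, 1.5 and 2.5; only the last two are kept, so the average of averages is 2.0. The warning step would read average_of_averages before calling record_run, so the current run is compared against earlier history only.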

Defaults:

  • N = 20,
  • daily retention = 730 days,
  • drift warning ratio = 0.35 (35%),
  • minimum duration delta before warning = 0.05s.

Skipped tests are recorded in raw runs and aggregate counts, but they are not used as per-test duration baselines. This keeps optional Markitect and capacity tests from triggering false regression warnings when they switch from skipped to executed.
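The skip rule can be sketched as a small pytest plugin plus a baseline helper. TimingCollector and per_test_baselines are hypothetical names, and summing setup, call and teardown time per test is an assumption; pytest does call pytest_runtest_logreport once per phase, and each report carries nodeid, duration, failed and skipped:

```python
from __future__ import annotations

from collections import defaultdict
from statistics import fmean


class TimingCollector:
    """Hypothetical plugin: register via config.pluginmanager.register(TimingCollector())."""

    def __init__(self) -> None:
        self.tests: dict[str, dict] = {}

    def pytest_runtest_logreport(self, report) -> None:
        # Called for the setup, call and teardown phase of every test.
        entry = self.tests.setdefault(report.nodeid, {"outcome": "passed", "duration_s": 0.0})
        entry["duration_s"] += report.duration
        if report.failed:
            entry["outcome"] = "failed"
        elif report.skipped and entry["outcome"] != "failed":
            entry["outcome"] = "skipped"


def per_test_baselines(runs: list[dict]) -> dict[str, float]:
    """Mean duration per test across retained runs, ignoring skipped outcomes."""
    samples: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        for nodeid, test in run["tests"].items():
            if test["outcome"] == "skipped":
                continue  # a near-zero skip would make the first real execution look like a regression
            samples[nodeid].append(test["duration_s"])
    return {nodeid: fmean(values) for nodeid, values in samples.items()}
```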

Warnings

At the end of the pytest run, the monitor compares the current run with the previous average-of-averages. It prints warnings for:

  • total run duration drift, when the executed test count is comparable,
  • individual test duration drift,
  • materially higher start load, normalized per logical CPU,
  • materially lower available-memory ratio.

Warnings do not fail the test run. They are meant to create attention, not gate development.
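How the drift ratio and the minimum delta combine is not spelled out above. This sketch assumes a duration counts as drift only when it exceeds the baseline by more than the ratio and also by at least the minimum delta, so very short tests cannot warn on noise. The count_tolerance rule for "comparable" test counts, reusing the drift ratio for load and memory, and all names are hypothetical:

```python
from __future__ import annotations


def is_duration_drift(current: float, baseline: float, ratio: float = 0.35, min_delta: float = 0.05) -> bool:
    """Both gates must trip: relative growth above ratio and absolute growth of at least min_delta seconds."""
    delta = current - baseline
    return baseline > 0 and delta >= min_delta and delta > baseline * ratio


def drift_warnings(current: dict, baseline: dict, ratio: float = 0.35,
                   min_delta: float = 0.05, count_tolerance: float = 0.1) -> list[str]:
    warnings = []
    # Hypothetical "comparable" rule: executed counts within count_tolerance of each other.
    executed, base_executed = current["executed"], baseline["executed"]
    if base_executed and abs(executed - base_executed) <= count_tolerance * base_executed:
        if is_duration_drift(current["duration_s"], baseline["duration_s"], ratio, min_delta):
            warnings.append(f"total duration {current['duration_s']:.2f}s vs baseline {baseline['duration_s']:.2f}s")
    for nodeid, seconds in current["tests"].items():
        base = baseline["tests"].get(nodeid)
        if base is not None and is_duration_drift(seconds, base, ratio, min_delta):
            warnings.append(f"{nodeid}: {seconds:.2f}s vs baseline {base:.2f}s")
    # Environment signals: the drift ratio stands in for "materially" here.
    load, base_load = current.get("load_per_cpu"), baseline.get("load_per_cpu")
    if load is not None and base_load and load > base_load * (1 + ratio):
        warnings.append(f"start load per CPU {load:.2f} vs baseline {base_load:.2f}")
    mem, base_mem = current.get("available_ratio"), baseline.get("available_ratio")
    if mem is not None and base_mem and mem < base_mem * (1 - ratio):
        warnings.append(f"available memory ratio {mem:.0%} vs baseline {base_mem:.0%}")
    return warnings
```

Because the monitor only warns, a conftest would print these strings, for example from a pytest_terminal_summary(terminalreporter) hook via terminalreporter.write_line(...), and leave the exit status untouched.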

Configuration

Disable monitoring:

python3 -m pytest --perf-history-disable

or:

KONTEXTUAL_PERF_MONITOR=0 python3 -m pytest

Override retention and warning thresholds:

python3 -m pytest \
  --perf-history-window 30 \
  --perf-history-drift-ratio 0.50 \
  --perf-history-min-delta 0.10
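These flags could be registered from a conftest.py through pytest's pytest_addoption hook. This is a hypothetical sketch, not the project's actual conftest: the group name and help texts are made up, and the defaults are None so that unset flags can fall back to the environment variables listed next.

```python
def pytest_addoption(parser):
    """Register the monitor's command-line flags (hypothetical conftest.py sketch)."""
    group = parser.getgroup("kontextual-perf", "test performance history")
    group.addoption("--perf-history-disable", action="store_true", default=False,
                    help="do not record or compare performance history")
    group.addoption("--perf-history-path", default=None,
                    help="path of the history JSON file")
    group.addoption("--perf-history-window", type=int, default=None,
                    help="number of raw runs and rolling averages to keep (N)")
    group.addoption("--perf-history-drift-ratio", type=float, default=None,
                    help="relative slowdown that triggers a warning, e.g. 0.35")
    group.addoption("--perf-history-min-delta", type=float, default=None,
                    help="minimum slowdown in seconds before a warning")
```

Values are then read with config.getoption("perf_history_window") and similar calls; argparse derives those destination names from the flag spellings.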

Environment equivalents:

  • KONTEXTUAL_PERF_HISTORY,
  • KONTEXTUAL_PERF_WINDOW,
  • KONTEXTUAL_PERF_DAILY_RETENTION_DAYS,
  • KONTEXTUAL_PERF_DRIFT_RATIO,
  • KONTEXTUAL_PERF_MIN_DELTA_SECONDS.
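A sketch of merging the sources into one settings object, assuming the precedence command-line flag, then environment variable, then built-in default (the precedence is not stated above). PerfSettings and resolve_settings are hypothetical names; the cli mapping uses argparse destination names, and only the documented KONTEXTUAL_PERF_MONITOR=0 value disables monitoring.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class PerfSettings:
    enabled: bool = True
    history_path: str = ".pytest_cache/kontextual/performance-history.json"
    window: int = 20
    daily_retention_days: int = 730
    drift_ratio: float = 0.35
    min_delta_seconds: float = 0.05


def resolve_settings(cli: Mapping[str, object], env: Mapping[str, str] = os.environ) -> PerfSettings:
    """Merge CLI values (None means "not given"), then environment variables, then defaults."""
    defaults = PerfSettings()

    def pick(cli_key: Optional[str], env_key: str, cast: Callable, default):
        value = cli.get(cli_key) if cli_key else None
        if value is not None:
            return value
        raw = env.get(env_key, "").strip()
        return cast(raw) if raw else default

    disabled = bool(cli.get("perf_history_disable")) or env.get("KONTEXTUAL_PERF_MONITOR") == "0"
    return PerfSettings(
        enabled=not disabled,
        history_path=pick("perf_history_path", "KONTEXTUAL_PERF_HISTORY", str, defaults.history_path),
        window=pick("perf_history_window", "KONTEXTUAL_PERF_WINDOW", int, defaults.window),
        # No command-line flag is documented for daily retention, so only the env var applies.
        daily_retention_days=pick(None, "KONTEXTUAL_PERF_DAILY_RETENTION_DAYS", int,
                                  defaults.daily_retention_days),
        drift_ratio=pick("perf_history_drift_ratio", "KONTEXTUAL_PERF_DRIFT_RATIO", float,
                         defaults.drift_ratio),
        min_delta_seconds=pick("perf_history_min_delta", "KONTEXTUAL_PERF_MIN_DELTA_SECONDS", float,
                               defaults.min_delta_seconds),
    )
```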

When To Profile Instead

Use this monitor to spot drift and identify candidate tests or areas. If a warning points to a real bottleneck, create a focused profiling experiment or a capacity sentinel. Do not add large traces or per-function profiling data to the rolling history.