infospace-bench/docs/evaluation-history-and-metrics.md
2026-05-14 15:35:04 +02:00

Evaluation History And Metrics

infospace-bench keeps evaluation history as committed, inspectable files under each infospace root. This replaces the legacy markitect-project history workflow while retaining the useful behaviors: Markdown evaluation files, append-only snapshot history, metric merging, and viability checks.

Files

  • output/evaluations/*.md: per-artifact evaluation files with YAML frontmatter and a human-readable Markdown body.
  • output/metrics/metrics.yaml: the latest merged metrics. Merging preserves collection metrics, evaluation-derived metrics, and structured non-numeric values.
  • output/metrics/history.yaml: append-only list of evaluation snapshots.
  • output/metrics/snapshots/<snapshot-id>.yaml: named snapshot copies for reproducible diffs.
  • output/metrics/viability.yaml: structured viability report generated from infospace.yaml thresholds and the current metrics file.
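
As a rough illustration of the layout listed above, the following sketch builds the documented paths from an infospace root. The helper name `metrics_layout` and the dictionary keys are hypothetical and are not part of infospace-bench; only the relative paths come from the list.

```python
from pathlib import Path


def metrics_layout(root, snapshot_id=None):
    """Return the documented output paths under an infospace root.

    Hypothetical helper for illustration; only the relative paths are
    taken from the file list above.
    """
    root = Path(root)
    metrics_dir = root / "output" / "metrics"
    layout = {
        "evaluations_dir": root / "output" / "evaluations",
        "metrics": metrics_dir / "metrics.yaml",
        "history": metrics_dir / "history.yaml",
        "snapshots_dir": metrics_dir / "snapshots",
        "viability": metrics_dir / "viability.yaml",
    }
    if snapshot_id is not None:
        # Named snapshot copies live at snapshots/<snapshot-id>.yaml.
        layout["snapshot"] = metrics_dir / "snapshots" / f"{snapshot_id}.yaml"
    return layout


layout = metrics_layout("infospaces/bootstrap-pilot", snapshot_id="snap-a")
for name, path in layout.items():
    print(f"{name}: {path.as_posix()}")
```

Keeping the paths in one place like this makes it easy to check which of the expected files exist before reading them.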

Replacement Mapping

The old infospace history code used entity-oriented names such as entity_count, entity_evaluations, and entity_slug. The successor model uses artifact-oriented names:

  • artifact_count replaces entity_count
  • artifact_evaluations replaces entity_evaluations
  • artifact_id replaces entity_slug

Readers accept the old snapshot aliases where practical so legacy fixtures can be inspected, but new files should use the artifact-oriented vocabulary.
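
A reader that tolerates the legacy aliases could normalize keys on load, roughly as sketched below. This is not the actual infospace-bench reader: the function name `normalize_keys`, the `score` field, and the rule that a new-style key wins when both spellings are present are all assumptions made for the example. Only the three alias pairs come from the mapping above.

```python
# Alias pairs taken from the replacement mapping above.
LEGACY_ALIASES = {
    "entity_count": "artifact_count",
    "entity_evaluations": "artifact_evaluations",
    "entity_slug": "artifact_id",
}


def normalize_keys(value):
    """Recursively rename legacy keys to the artifact-oriented vocabulary.

    Assumption: when both the legacy and the new key are present,
    the new key's value is kept.
    """
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    if not isinstance(value, dict):
        return value
    result = {}
    # First pass: keep keys that are not legacy aliases.
    for key, item in value.items():
        if key not in LEGACY_ALIASES:
            result[key] = normalize_keys(item)
    # Second pass: map legacy keys only where the new key is absent.
    for key, item in value.items():
        new_key = LEGACY_ALIASES.get(key)
        if new_key is not None and new_key not in result:
            result[new_key] = normalize_keys(item)
    return result


# Hypothetical legacy snapshot fragment; "score" is an illustrative field.
legacy = {
    "entity_count": 2,
    "entity_evaluations": [
        {"entity_slug": "intro", "score": 0.8},
        {"entity_slug": "glossary", "score": 0.6},
    ],
}
normalized = normalize_keys(legacy)
print(normalized)
```

Normalizing at read time keeps the rest of the code on a single vocabulary, while writers only ever emit the artifact-oriented names.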

CLI

python3 -m infospace_bench metrics infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot --metric coverage_ratio
python3 -m infospace_bench history-diff infospaces/bootstrap-pilot snap-a snap-b

Snapshot references may be exact snapshot IDs or ISO-like dates such as 2026-05-14. Date references resolve to the nearest snapshot in the history.
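
The resolution rule can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the record fields `id` and `timestamp` are hypothetical, timestamps are assumed to be naive ISO 8601 strings, and "nearest" is read as the smallest absolute time difference from midnight of the given date.

```python
from datetime import date, datetime


def resolve_snapshot(reference, snapshots):
    """Resolve a reference to a snapshot record.

    Exact ID matches are tried first; otherwise the reference is parsed
    as an ISO date and the snapshot closest in time is returned.
    The "id" and "timestamp" fields are assumptions for this sketch.
    """
    if not snapshots:
        raise KeyError("history contains no snapshots")
    for snap in snapshots:
        if snap["id"] == reference:
            return snap
    try:
        target = date.fromisoformat(reference)
    except ValueError:
        raise KeyError(f"unknown snapshot reference: {reference!r}") from None
    # Compare against midnight at the start of the referenced day.
    target_dt = datetime.combine(target, datetime.min.time())
    return min(
        snapshots,
        key=lambda s: abs(datetime.fromisoformat(s["timestamp"]) - target_dt),
    )


# Hypothetical history entries for illustration only.
snapshots = [
    {"id": "snap-a", "timestamp": "2026-05-10T09:00:00"},
    {"id": "snap-b", "timestamp": "2026-05-14T15:35:04"},
]
print(resolve_snapshot("snap-a", snapshots)["id"])
print(resolve_snapshot("2026-05-14", snapshots)["id"])
```

Checking exact IDs before attempting date parsing means a snapshot whose ID happens to look like a date still resolves to itself.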