Evaluation History And Metrics

infospace-bench keeps evaluation history as committed, inspectable files under each infospace root. This replaces the legacy markitect-project history workflow while retaining the useful behaviors: Markdown evaluation files, append-only snapshot history, metric merging, and viability checks.

Files

output/evaluations/*.md: per-artifact evaluation files with YAML frontmatter and a human-readable Markdown body.
output/metrics/metrics.yaml: latest merged metrics. Collection metrics, evaluation-derived metrics, and structured non-numeric values are preserved.
output/metrics/history.yaml: append-only list of evaluation snapshots.
output/metrics/snapshots/<snapshot-id>.yaml: named snapshot copies for reproducible diffs.
output/metrics/viability.yaml: structured viability report generated from infospace.yaml thresholds and the current metrics file.
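
The relationship between metrics.yaml (latest merged values) and history.yaml (append-only snapshots) can be illustrated with a minimal in-memory sketch. The function names `merge_metrics` and `append_snapshot` and the `id` and `metrics` entry keys are hypothetical, chosen for illustration; the real file schema and the infospace-bench internals may differ.

```python
from copy import deepcopy


def merge_metrics(current, *sources):
    """Return a new mapping with each source layered over `current`.

    Values are copied as-is, so structured non-numeric values
    (strings, lists, nested mappings) survive the merge unchanged.
    """
    merged = deepcopy(current)
    for source in sources:
        for name, value in source.items():
            merged[name] = deepcopy(value)
    return merged


def append_snapshot(history, snapshot_id, metrics):
    """Return a new history list with one snapshot appended.

    Earlier entries are never modified, which is the append-only property.
    The "id" and "metrics" keys are assumptions made for this sketch.
    """
    if any(entry["id"] == snapshot_id for entry in history):
        raise ValueError(f"snapshot {snapshot_id!r} already exists")
    return history + [{"id": snapshot_id, "metrics": deepcopy(metrics)}]


# Hypothetical inputs: one collection metric and two evaluation-derived ones.
collection = {"artifact_count": 12}
derived = {"coverage_ratio": 0.75, "status": {"label": "partial"}}

metrics = merge_metrics({}, collection, derived)
history = append_snapshot([], "snap-a", metrics)
history_next = append_snapshot(
    history, "snap-b", merge_metrics(metrics, {"coverage_ratio": 0.8})
)
```

Returning new objects instead of mutating in place keeps earlier snapshots stable, which is what makes later diffs between two snapshots meaningful.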

Replacement Mapping

The old infospace history code used entity-oriented names such as entity_count, entity_evaluations, and entity_slug. The successor model uses artifact-oriented names:

artifact_count replaces entity_count
artifact_evaluations replaces entity_evaluations
artifact_id replaces entity_slug

Readers accept the old snapshot aliases where practical so legacy fixtures can be inspected, but new files should use the artifact-oriented vocabulary.
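
As a rough illustration of how a reader might accept the legacy aliases, the sketch below renames the three entity-oriented keys wherever they appear in a loaded snapshot. The helper name `normalize_keys` and the `id` and `score` fields in the example are hypothetical; only the three name pairs come from the mapping above.

```python
# Legacy name -> successor name, taken from the replacement mapping.
LEGACY_ALIASES = {
    "entity_count": "artifact_count",
    "entity_evaluations": "artifact_evaluations",
    "entity_slug": "artifact_id",
}


def normalize_keys(value):
    """Recursively rename legacy keys to the artifact-oriented vocabulary."""
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            new_key = LEGACY_ALIASES.get(key, key)
            # If both spellings are present, keep the new-style value.
            if key != new_key and new_key in out:
                continue
            out[new_key] = normalize_keys(item)
        return out
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value


# A hypothetical legacy snapshot as it might look after YAML loading.
legacy = {
    "id": "snap-a",
    "entity_count": 2,
    "entity_evaluations": [
        {"entity_slug": "intro", "score": 0.5},
        {"entity_slug": "setup", "score": 0.9},
    ],
}
normalized = normalize_keys(legacy)
```

Preferring the new-style key when both spellings appear is a design choice of this sketch, not documented behavior.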

CLI

python3 -m infospace_bench metrics infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot --metric coverage_ratio
python3 -m infospace_bench history-diff infospaces/bootstrap-pilot snap-a snap-b

Snapshot references may be exact snapshot IDs or ISO-like dates such as 2026-05-14. Date references resolve to the nearest snapshot in the history.
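
One plausible reading of this resolution rule is sketched below: an exact ID match wins, otherwise the reference is parsed as a date and the snapshot with the smallest day difference is chosen. The function name `resolve_snapshot`, the `id` and `timestamp` keys, and the tie-breaking rule (earliest entry in the list wins) are assumptions for illustration, not the documented behavior of infospace-bench.

```python
from datetime import date, datetime


def resolve_snapshot(ref, snapshots):
    """Resolve `ref` to a snapshot ID: exact ID first, then nearest date."""
    if not snapshots:
        raise LookupError("history is empty")

    # 1. An exact snapshot ID always takes precedence.
    for snap in snapshots:
        if snap["id"] == ref:
            return snap["id"]

    # 2. Otherwise treat the reference as a date such as 2026-05-14.
    try:
        target = date.fromisoformat(ref)
    except ValueError:
        raise LookupError(f"unknown snapshot reference: {ref!r}") from None

    def distance_in_days(snap):
        taken = datetime.fromisoformat(snap["timestamp"]).date()
        return abs((taken - target).days)

    # min() returns the first of several equally near snapshots.
    return min(snapshots, key=distance_in_days)["id"]


# Hypothetical history entries.
snapshots = [
    {"id": "snap-a", "timestamp": "2026-05-10T09:00:00"},
    {"id": "snap-b", "timestamp": "2026-05-15T18:30:00"},
]
by_id = resolve_snapshot("snap-a", snapshots)
by_date = resolve_snapshot("2026-05-14", snapshots)
```

Here `2026-05-14` is one day from snap-b and four days from snap-a, so under these assumptions it resolves to snap-b.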
