Evaluation History And Metrics
infospace-bench keeps evaluation history as committed, inspectable files under
each infospace root. This replaces the legacy markitect-project history
workflow while retaining the useful behaviors: Markdown evaluation files,
append-only snapshot history, metric merging, and viability checks.
Files
- output/evaluations/*.md: per-artifact evaluation files with YAML frontmatter and a human-readable Markdown body.
- output/metrics/metrics.yaml: latest merged metrics. Collection metrics, evaluation-derived metrics, and structured non-numeric values are preserved.
- output/metrics/history.yaml: append-only list of evaluation snapshots.
- output/metrics/snapshots/<snapshot-id>.yaml: named snapshot copies for reproducible diffs.
- output/metrics/viability.yaml: structured viability report generated from infospace.yaml thresholds and the current metrics file.
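The merging and append-only behaviors described for the metrics files can be sketched as follows. This is a minimal illustration on plain dictionaries, not the infospace-bench implementation: the function names `merge_metrics` and `append_snapshot`, the `id` and `metrics` entry keys, and the rule that evaluation-derived values win on a key clash are all assumptions made for the example.

```python
from copy import deepcopy


def merge_metrics(collection_metrics, evaluation_metrics):
    """Merge two metric mappings into one.

    Assumption: evaluation-derived values override collection values
    when both define the same key. Non-numeric values (strings, lists,
    nested mappings) are copied through unchanged.
    """
    merged = deepcopy(collection_metrics)
    merged.update(deepcopy(evaluation_metrics))
    return merged


def append_snapshot(history, snapshot_id, metrics):
    """Return a new history list with one snapshot appended.

    Existing entries are never edited or reordered, which is what
    "append-only" means here. The entry shape is hypothetical.
    """
    if any(entry["id"] == snapshot_id for entry in history):
        raise ValueError(f"snapshot id already recorded: {snapshot_id!r}")
    return history + [{"id": snapshot_id, "metrics": deepcopy(metrics)}]


collection = {"artifact_count": 12, "sources": ["a.md", "b.md"]}
evaluation = {"coverage_ratio": 0.75, "grade": "B"}

merged = merge_metrics(collection, evaluation)
history = append_snapshot([], "snap-a", merged)
history = append_snapshot(history, "snap-b", merged)
```

Returning a new list rather than mutating the old one keeps earlier snapshots safe from accidental edits, which suits a history file that is committed and diffed.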
Replacement Mapping
The old infospace history code used entity-oriented names such as
entity_count, entity_evaluations, and entity_slug. The successor model
uses artifact-oriented names:
- artifact_count replaces entity_count
- artifact_evaluations replaces entity_evaluations
- artifact_id replaces entity_slug
Readers accept the old snapshot aliases where practical so legacy fixtures can be inspected, but new files should use the artifact-oriented vocabulary.
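One way a reader could accept the legacy aliases is to rename keys while loading a snapshot. The sketch below is illustrative only: `LEGACY_ALIASES` and `normalize_keys` are hypothetical names, the nested snapshot shape is assumed, and the choice to let a new-style key win when both spellings appear is a design assumption rather than documented behavior.

```python
# Mapping from legacy entity-oriented names to artifact-oriented names,
# taken from the replacement list above.
LEGACY_ALIASES = {
    "entity_count": "artifact_count",
    "entity_evaluations": "artifact_evaluations",
    "entity_slug": "artifact_id",
}


def normalize_keys(value):
    """Recursively rename legacy keys in nested dicts and lists."""
    if isinstance(value, dict):
        result = {}
        for key, item in value.items():
            new_key = LEGACY_ALIASES.get(key, key)
            # Assumption: if a mapping carries both spellings,
            # the artifact-oriented key is kept and the alias dropped.
            if new_key != key and new_key in value:
                continue
            result[new_key] = normalize_keys(item)
        return result
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value


legacy_snapshot = {
    "entity_count": 3,
    "entity_evaluations": [{"entity_slug": "intro", "score": 0.5}],
}
normalized = normalize_keys(legacy_snapshot)
```

Normalizing at read time means the rest of the code only ever sees the artifact-oriented vocabulary, and legacy fixtures stay untouched on disk.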
CLI
python3 -m infospace_bench metrics infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot --metric coverage_ratio
python3 -m infospace_bench history-diff infospaces/bootstrap-pilot snap-a snap-b
Snapshot references may be exact snapshot IDs or ISO-like dates such as
2026-05-14. Date references resolve to the nearest snapshot in the history.
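The reference resolution described above might look roughly like the following. It is a sketch under stated assumptions, not the tool's actual logic: `resolve_snapshot` is a hypothetical name, each history entry is assumed to carry an `id` and an ISO `date` string, only strict `YYYY-MM-DD` dates are handled, and ties between equally distant snapshots go to the earlier entry in the list.

```python
from datetime import date


def resolve_snapshot(history, ref):
    """Resolve a snapshot reference to a history entry.

    An exact ID match is tried first. Otherwise the reference is parsed
    as a YYYY-MM-DD date and the entry with the closest date is returned.
    """
    for entry in history:
        if entry["id"] == ref:
            return entry
    try:
        target = date.fromisoformat(ref)
    except ValueError:
        raise KeyError(f"unknown snapshot reference: {ref!r}") from None
    if not history:
        raise KeyError("history is empty")
    # min() returns the first minimal element, so ties go to the
    # earlier entry in the list (an assumption of this sketch).
    return min(
        history,
        key=lambda entry: abs((date.fromisoformat(entry["date"]) - target).days),
    )


history = [
    {"id": "snap-a", "date": "2026-05-10"},
    {"id": "snap-b", "date": "2026-05-20"},
]
by_id = resolve_snapshot(history, "snap-b")
by_date = resolve_snapshot(history, "2026-05-14")
```

Here `2026-05-14` is four days from `snap-a` and six days from `snap-b`, so the date reference resolves to `snap-a`.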