infospace-bench/docs/evaluation-history-and-metrics.md
2026-05-14 15:35:04 +02:00

# Evaluation History And Metrics
`infospace-bench` keeps evaluation history as committed, inspectable files under
each infospace root. This replaces the legacy `markitect-project` history
workflow while retaining the useful behaviors: Markdown evaluation files,
append-only snapshot history, metric merging, and viability checks.
## Files
- `output/evaluations/*.md`: per-artifact evaluation files with YAML
frontmatter and a human-readable Markdown body.
- `output/metrics/metrics.yaml`: latest merged metrics. Collection metrics,
evaluation-derived metrics, and structured non-numeric values are preserved.
- `output/metrics/history.yaml`: append-only list of evaluation snapshots.
- `output/metrics/snapshots/<snapshot-id>.yaml`: named snapshot copies for
reproducible diffs.
- `output/metrics/viability.yaml`: structured viability report generated from
`infospace.yaml` thresholds and the current metrics file.
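
The relationship between the merged metrics file and the viability report can be sketched in a few lines of Python. This is a minimal illustration, not the `infospace-bench` implementation: the function names `merge_metrics` and `check_viability`, the "later source wins" merge rule, and the flat `{metric: minimum}` threshold shape are all assumptions made for the example. It works on plain dictionaries, as they might look after the YAML files are loaded.

```python
from typing import Any


def merge_metrics(*sources: dict[str, Any]) -> dict[str, Any]:
    """Merge metric mappings left to right.

    Assumption: on a key clash the later source wins. Non-numeric values
    (lists, strings, nested mappings) are carried over unchanged.
    """
    merged: dict[str, Any] = {}
    for source in sources:
        merged.update(source)
    return merged


def check_viability(
    metrics: dict[str, Any], thresholds: dict[str, float]
) -> dict[str, dict[str, Any]]:
    """Compare metrics against hypothetical minimum thresholds.

    A metric that is missing or non-numeric is reported as not ok.
    """
    report: dict[str, dict[str, Any]] = {}
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        # bool is a subclass of int, so exclude it explicitly.
        numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        report[name] = {
            "value": value,
            "min": minimum,
            "ok": numeric and value >= minimum,
        }
    return report


collection = {"artifact_count": 12, "languages": ["en", "de"]}
evaluation = {"coverage_ratio": 0.75}
merged = merge_metrics(collection, evaluation)
report = check_viability(merged, {"coverage_ratio": 0.8, "artifact_count": 10})
```

Here `report["coverage_ratio"]["ok"]` is `False` because 0.75 is below the assumed minimum of 0.8, while the structured `languages` value passes through the merge untouched.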
## Replacement Mapping
The old infospace history code used entity-oriented names such as
`entity_count`, `entity_evaluations`, and `entity_slug`. The successor model
uses artifact-oriented names:
- `artifact_count` replaces `entity_count`
- `artifact_evaluations` replaces `entity_evaluations`
- `artifact_id` replaces `entity_slug`
Readers accept the old snapshot aliases where practical so legacy fixtures can
be inspected, but new files should use the artifact-oriented vocabulary.
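
One way a reader could accept the legacy aliases is to rename keys while loading a snapshot. The sketch below is illustrative only: the helper name `normalize_snapshot` and the rule that the artifact-oriented key wins when both spellings are present are assumptions, and only the three name pairs listed above come from this document.

```python
from typing import Any

# Legacy entity-oriented key -> artifact-oriented replacement.
LEGACY_ALIASES = {
    "entity_count": "artifact_count",
    "entity_evaluations": "artifact_evaluations",
    "entity_slug": "artifact_id",
}


def normalize_snapshot(value: Any) -> Any:
    """Recursively rename legacy keys in a loaded snapshot structure."""
    if isinstance(value, dict):
        result: dict[Any, Any] = {}
        for key, item in value.items():
            new_key = LEGACY_ALIASES.get(key, key)
            # Assumption: if both spellings appear, keep the new-style value.
            if new_key != key and new_key in result:
                continue
            result[new_key] = normalize_snapshot(item)
        return result
    if isinstance(value, list):
        return [normalize_snapshot(item) for item in value]
    return value


legacy = {
    "entity_count": 2,
    "entity_evaluations": [
        {"entity_slug": "a", "score": 1},
        {"entity_slug": "b", "score": 2},
    ],
}
normalized = normalize_snapshot(legacy)
```

After normalization, `normalized` uses only `artifact_count`, `artifact_evaluations`, and `artifact_id`, so downstream code needs to know a single vocabulary.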
## CLI
```bash
python3 -m infospace_bench metrics infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot --metric coverage_ratio
python3 -m infospace_bench history-diff infospaces/bootstrap-pilot snap-a snap-b
```
Snapshot references may be exact snapshot IDs or ISO-like dates such as
`2026-05-14`. Date references resolve to the nearest snapshot in the history.
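
The reference resolution described above might look roughly like the following. This is a sketch under stated assumptions, not the actual resolver: the entry fields `id` and `timestamp`, the function name `resolve_snapshot`, the reading of "nearest" as smallest absolute time difference, and the treatment of timestamps without an offset as UTC are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Any


def _parse(text: str) -> datetime:
    """Parse an ISO 8601 date or datetime; treat naive values as UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def resolve_snapshot(history: list[dict[str, Any]], ref: str) -> dict[str, Any]:
    """Resolve a reference to a history entry.

    Exact snapshot IDs are tried first; otherwise the reference is parsed
    as a date and the entry closest in time is returned.
    """
    for entry in history:
        if entry["id"] == ref:
            return entry
    try:
        target = _parse(ref)
    except ValueError:
        raise KeyError(f"unknown snapshot reference: {ref!r}") from None
    if not history:
        raise KeyError("history is empty")
    return min(history, key=lambda entry: abs(_parse(entry["timestamp"]) - target))


history = [
    {"id": "snap-a", "timestamp": "2026-05-10T09:00:00+00:00"},
    {"id": "snap-b", "timestamp": "2026-05-14T15:35:04+02:00"},
]
by_id = resolve_snapshot(history, "snap-a")
by_date = resolve_snapshot(history, "2026-05-14")
```

Trying the exact ID first keeps ID lookups unambiguous even if a snapshot ID happens to look like a date.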