generated from coulomb/repo-seed
eval history and metrics
43
docs/evaluation-history-and-metrics.md
Normal file
@@ -0,0 +1,43 @@
# Evaluation History And Metrics

`infospace-bench` keeps evaluation history as committed, inspectable files under
each infospace root. This replaces the legacy `markitect-project` history
workflow while retaining the useful behaviors: Markdown evaluation files,
append-only snapshot history, metric merging, and viability checks.

## Files

- `output/evaluations/*.md`: per-artifact evaluation files with YAML
  frontmatter and a human-readable Markdown body.
- `output/metrics/metrics.yaml`: latest merged metrics. Collection metrics,
  evaluation-derived metrics, and structured non-numeric values are preserved.
- `output/metrics/history.yaml`: append-only list of evaluation snapshots.
- `output/metrics/snapshots/<snapshot-id>.yaml`: named snapshot copies for
  reproducible diffs.
- `output/metrics/viability.yaml`: structured viability report generated from
  `infospace.yaml` thresholds and the current metrics file.
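The merge and viability behaviors described for `metrics.yaml` and `viability.yaml` can be illustrated with a minimal sketch. It is not the `infospace-bench` implementation: the function names `merge_metrics` and `check_viability`, the assumption that each threshold is a plain minimum value, and the report shape are all hypothetical, and the example works on in-memory dictionaries rather than reading YAML.

```python
from typing import Any


def merge_metrics(current: dict[str, Any], incoming: dict[str, Any]) -> dict[str, Any]:
    """Return a new mapping in which incoming values override current ones.

    Nested mappings are merged recursively, so structured non-numeric
    values are kept instead of being replaced wholesale.
    """
    merged = dict(current)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metrics(merged[key], value)
        else:
            merged[key] = value
    return merged


def check_viability(metrics: dict[str, Any], thresholds: dict[str, float]) -> dict[str, Any]:
    """Compare metrics against minimum thresholds (assumed threshold format)."""
    checks = []
    for name, minimum in sorted(thresholds.items()):
        value = metrics.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        checks.append({
            "metric": name,
            "minimum": minimum,
            "value": value,
            # A missing or non-numeric metric counts as a failed check.
            "passed": is_number and value >= minimum,
        })
    return {"viable": all(check["passed"] for check in checks), "checks": checks}


# Hypothetical inputs: collection metrics plus evaluation-derived metrics.
collection = {"artifact_count": 12, "sources": {"kind": "markdown"}}
evaluation = {"coverage_ratio": 0.75, "sources": {"evaluated": 9}}

metrics = merge_metrics(collection, evaluation)
report = check_viability(metrics, {"coverage_ratio": 0.5, "artifact_count": 20})

print(metrics)
print(report["viable"])
```

In this sketch the nested `sources` mapping keeps both keys after the merge, and the report is not viable because `artifact_count` falls below its assumed minimum of 20.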

## Replacement Mapping

The old infospace history code used entity-oriented names such as
`entity_count`, `entity_evaluations`, and `entity_slug`. The successor model
uses artifact-oriented names:

- `artifact_count` replaces `entity_count`
- `artifact_evaluations` replaces `entity_evaluations`
- `artifact_id` replaces `entity_slug`

Readers accept the old snapshot aliases where practical so legacy fixtures can
be inspected, but new files should use the artifact-oriented vocabulary.
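One way a reader could accept the legacy aliases is to rename keys while loading a snapshot. The sketch below is an assumption about how that might look, not the project's actual reader: `LEGACY_ALIASES` and `normalize_keys` are hypothetical names, and the `snapshot_id` and `score` fields in the sample data are invented for illustration. Only the three alias pairs come from the mapping above.

```python
from typing import Any

# Alias pairs taken from the replacement mapping; the constant name is hypothetical.
LEGACY_ALIASES = {
    "entity_count": "artifact_count",
    "entity_evaluations": "artifact_evaluations",
    "entity_slug": "artifact_id",
}


def normalize_keys(node: Any) -> Any:
    """Recursively rename legacy entity-oriented keys to artifact-oriented ones."""
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            new_key = LEGACY_ALIASES.get(key, key)
            if new_key != key and new_key in node:
                # Both spellings are present: the artifact-oriented key wins.
                continue
            renamed[new_key] = normalize_keys(value)
        return renamed
    if isinstance(node, list):
        return [normalize_keys(item) for item in node]
    return node


# Hypothetical legacy fixture.
legacy_snapshot = {
    "snapshot_id": "snap-a",
    "entity_count": 2,
    "entity_evaluations": [
        {"entity_slug": "intro", "score": 0.8},
        {"entity_slug": "setup", "score": 0.6},
    ],
}

normalized = normalize_keys(legacy_snapshot)
print(normalized)
```

Normalizing on read keeps the rest of the code on a single vocabulary, while legacy fixtures stay untouched on disk.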

## CLI

```bash
python3 -m infospace_bench metrics infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot
python3 -m infospace_bench history infospaces/bootstrap-pilot --metric coverage_ratio
python3 -m infospace_bench history-diff infospaces/bootstrap-pilot snap-a snap-b
```

Snapshot references may be exact snapshot IDs or ISO-like dates such as
`2026-05-14`. Date references resolve to the nearest snapshot in the history.
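The reference resolution described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function name `resolve_snapshot`, the `id` and `timestamp` field names, and the reading of "nearest" as the smallest absolute difference in calendar days (ties going to the earlier entry in the list) are all guesses.

```python
from datetime import date, datetime
from typing import Any


def resolve_snapshot(ref: str, snapshots: list[dict[str, Any]]) -> dict[str, Any]:
    """Resolve a reference to a snapshot: exact ID first, then nearest date."""
    for snapshot in snapshots:
        if snapshot["id"] == ref:
            return snapshot

    try:
        target = date.fromisoformat(ref)
    except ValueError:
        raise KeyError(f"unknown snapshot reference: {ref}") from None

    if not snapshots:
        raise KeyError(f"no snapshots available to resolve: {ref}")

    def distance_in_days(snapshot: dict[str, Any]) -> int:
        taken_on = datetime.fromisoformat(snapshot["timestamp"]).date()
        return abs((taken_on - target).days)

    # min() returns the first minimal element, so ties favor earlier entries.
    return min(snapshots, key=distance_in_days)


# Hypothetical history entries.
history = [
    {"id": "snap-a", "timestamp": "2026-05-10T09:00:00"},
    {"id": "snap-b", "timestamp": "2026-05-15T18:30:00"},
]

print(resolve_snapshot("snap-a", history)["id"])
print(resolve_snapshot("2026-05-14", history)["id"])
```

With these sample entries, the date `2026-05-14` is one day from `snap-b` and four days from `snap-a`, so the sketch picks `snap-b`.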