---
id: IB-WP-0008
type: workplan
title: "Evaluation History And Metrics Parity"
domain: markitect
repo: infospace-bench
status: completed
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0008-evaluation-history-metrics-parity"
state_hub_workstream_id: "f00ba036-dc97-4370-a4a5-9ac2bce7ce6f"
---

# IB-WP-0008 — Evaluation History And Metrics Parity

## Goal

Bring the current evaluation dataclasses up to practical parity with the old
infospace evaluation history and metrics behavior.

## Tasks

### T01 — Evaluation file I/O

```task
id: IB-WP-0008-T01
status: done
priority: high
state_hub_task_id: "95b48ad3-c4d1-442c-9bc0-7591d948d23e"
```

- Write and read per-entity/per-artifact evaluation files
- Preserve human-readable Markdown body plus structured metadata
- Add round-trip tests

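As an illustration of the round-trip requirement in T01, here is a minimal sketch rather than the repository's actual implementation. The `EvaluationFile` dataclass, the `write_evaluation` and `read_evaluation` helpers, the `<entity_id>/<artifact_id>.md` path layout, and the JSON-valued header lines are all assumptions made for this example.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

FENCE = "---"


@dataclass
class EvaluationFile:
    """Hypothetical record: structured metadata plus a Markdown body."""

    entity_id: str
    artifact_id: str
    metadata: dict = field(default_factory=dict)
    body: str = ""


def write_evaluation(root: Path, record: EvaluationFile) -> Path:
    """Write one file per entity/artifact pair (assumed path layout)."""
    path = root / record.entity_id / f"{record.artifact_id}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    header = {
        "entity_id": record.entity_id,
        "artifact_id": record.artifact_id,
        **record.metadata,
    }
    # One JSON-encoded value per line: readable, diffable, no YAML dependency.
    lines = [f"{key}: {json.dumps(value)}" for key, value in sorted(header.items())]
    path.write_text("\n".join([FENCE, *lines, FENCE, "", record.body]), encoding="utf-8")
    return path


def read_evaluation(path: Path) -> EvaluationFile:
    """Parse the header back into metadata and return the body unchanged."""
    text = path.read_text(encoding="utf-8")
    # maxsplit=2 keeps any later "---" lines inside the Markdown body intact.
    _, header, body = text.split(f"{FENCE}\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, raw = line.partition(": ")
        meta[key] = json.loads(raw)
    return EvaluationFile(
        entity_id=meta.pop("entity_id"),
        artifact_id=meta.pop("artifact_id"),
        metadata=meta,
        body=body.removeprefix("\n"),  # drop the separator line after the header
    )


# Round-trip check in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = EvaluationFile(
        entity_id="entity-a",
        artifact_id="summary",
        metadata={"score": 0.75, "tags": ["reviewed"]},
        body="## Findings\n\nClear and complete.\n",
    )
    written = write_evaluation(Path(tmp), original)
    restored = read_evaluation(written)

print(restored == original)
```

Comparing the restored dataclass with the original is the shape a round-trip test would likely take; a real implementation may prefer YAML front matter to match the workplan files themselves.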
### T02 — Snapshot history

```task
id: IB-WP-0008-T02
status: done
priority: high
state_hub_task_id: "b4800ba8-5b86-44bb-bf47-e893bae36b22"
```

- Write, append, and read evaluation snapshot history
- Support diffing named snapshots from disk
- Add CLI support for history and history-diff behavior

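The append, read, and diff behavior listed for T02 could look roughly like the sketch below. It assumes a JSON Lines history file with one named snapshot per line; the file name `history.jsonl`, the `scores` field, and the helper names are hypothetical and not taken from the repository.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def append_snapshot(history_path: Path, name: str, scores: dict[str, float]) -> None:
    """Append one named snapshot as a single JSON line."""
    history_path.parent.mkdir(parents=True, exist_ok=True)
    with history_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"name": name, "scores": scores}, sort_keys=True) + "\n")


def read_history(history_path: Path) -> list[dict]:
    """Return all snapshots in the order they were appended."""
    if not history_path.exists():
        return []
    with history_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def diff_snapshots(
    history_path: Path, old_name: str, new_name: str
) -> dict[str, tuple[float | None, float | None]]:
    """Map each changed key to (old, new); None marks a key missing on one side."""
    by_name = {snap["name"]: snap["scores"] for snap in read_history(history_path)}
    old, new = by_name[old_name], by_name[new_name]
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Usage: two snapshots written to disk, then diffed by name.
with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "history.jsonl"
    append_snapshot(history, "baseline", {"clarity": 0.5})
    append_snapshot(history, "after-review", {"clarity": 0.75, "coverage": 1.0})
    names = [snap["name"] for snap in read_history(history)]
    changes = diff_snapshots(history, "baseline", "after-review")

print(names)
print(changes)
```

An append-only file keeps earlier snapshots untouched, which suits history that should stay diffable from committed files. A history or history-diff CLI command could be a thin wrapper around functions of this kind.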
### T03 — Metrics merge and viability reports

```task
id: IB-WP-0008-T03
status: done
priority: high
state_hub_task_id: "7abcbd63-0147-4ae8-85f0-4af51882f476"
```

- Persist latest metrics under `output/metrics`
- Merge collection metrics with evaluation-derived metrics
- Emit structured viability reports

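To make the T03 items concrete, the following sketch merges two metric sources, persists the result under `output/metrics`, and derives a structured viability report. The file name `latest.json`, the rule that evaluation-derived values win on a name clash, the minimum-threshold check, and the metric names are assumptions for illustration only.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def merge_metrics(collection: dict[str, float], evaluation: dict[str, float]) -> dict[str, float]:
    """Combine both sources; evaluation-derived values win on a clash (assumed policy)."""
    return {**collection, **evaluation}


def persist_latest(root: Path, metrics: dict[str, float]) -> Path:
    """Write the merged metrics to output/metrics/latest.json (file name assumed)."""
    path = root / "output" / "metrics" / "latest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path


def viability_report(metrics: dict[str, float], thresholds: dict[str, float]) -> dict:
    """Check each metric against a minimum; a missing metric counts as a failure."""
    checks = []
    for name, minimum in sorted(thresholds.items()):
        value = metrics.get(name)
        checks.append(
            {
                "metric": name,
                "value": value,
                "minimum": minimum,
                "passed": value is not None and value >= minimum,
            }
        )
    return {"viable": all(check["passed"] for check in checks), "checks": checks}


# Usage with hypothetical metric names and thresholds.
with tempfile.TemporaryDirectory() as tmp:
    merged = merge_metrics({"entity_count": 12, "mean_score": 0.4}, {"mean_score": 0.8})
    latest = persist_latest(Path(tmp), merged)
    reloaded = json.loads(latest.read_text(encoding="utf-8"))
    report = viability_report(reloaded, {"entity_count": 10, "mean_score": 0.7})

print(json.dumps(report, indent=2))
```

Building the report from the reloaded file rather than the in-memory dictionary mirrors the acceptance criterion that viability reports can be generated from committed files.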
### T04 — Legacy metric compatibility notes

```task
id: IB-WP-0008-T04
status: done
priority: medium
state_hub_task_id: "675d1d45-39d9-4ddd-9ab7-5d7de8a0f601"
```

- Compare old metric names and meanings to new baseline
- Document changed semantics and accepted differences
- Add fixtures that preserve critical old behavior

## Acceptance

- Evaluation and metrics history are inspectable, diffable, and reproducible
- Viability reports can be generated from committed files
- Old evaluation-history workflows have a clear replacement path