infospace-bench/workplans/IB-WP-0008-evaluation-history-metrics-parity.md
2026-05-14 15:35:04 +02:00

---
id: IB-WP-0008
type: workplan
title: "Evaluation History And Metrics Parity"
domain: markitect
repo: infospace-bench
status: completed
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0008-evaluation-history-metrics-parity"
state_hub_workstream_id: "f00ba036-dc97-4370-a4a5-9ac2bce7ce6f"
---
# IB-WP-0008 — Evaluation History And Metrics Parity
## Goal
Bring the current evaluation dataclasses up to practical parity with the old
infospace evaluation history and metrics behavior.
## Tasks
### T01 — Evaluation file I/O
```task
id: IB-WP-0008-T01
status: done
priority: high
state_hub_task_id: "95b48ad3-c4d1-442c-9bc0-7591d948d23e"
```
- Write and read per-entity/per-artifact evaluation files
- Preserve human-readable Markdown body plus structured metadata
- Add round-trip tests
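The round-trip requirement above can be sketched as follows. This is a minimal illustration only: the `Evaluation` fields and the `dumps`/`loads` helpers are hypothetical and not taken from the repository, and metadata values are JSON-encoded so the example needs only the standard library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    # Hypothetical shape; the real evaluation dataclasses may differ.
    entity_id: str
    score: float
    tags: list[str] = field(default_factory=list)
    body: str = ""


def dumps(ev: Evaluation) -> str:
    """Serialize to a front-matter header followed by the Markdown body."""
    meta = {"entity_id": ev.entity_id, "score": ev.score, "tags": ev.tags}
    # One "key: <json value>" line per field keeps the header human-readable.
    header = "\n".join(f"{key}: {json.dumps(value)}" for key, value in meta.items())
    return f"---\n{header}\n---\n{ev.body}"


def loads(text: str) -> Evaluation:
    """Parse the format written by dumps()."""
    # maxsplit=2 leaves any later "---" lines inside the body untouched.
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, raw = line.partition(": ")
        meta[key] = json.loads(raw)
    return Evaluation(body=body, **meta)
```

A round-trip test then reduces to `assert loads(dumps(ev)) == ev`, with the text written to and read from disk via `Path.write_text` and `Path.read_text` as needed.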
### T02 — Snapshot history
```task
id: IB-WP-0008-T02
status: done
priority: high
state_hub_task_id: "b4800ba8-5b86-44bb-bf47-e893bae36b22"
```
- Write, append, and read evaluation snapshot history
- Support diffing named snapshots from disk
- Add CLI support for history and history-diff behavior
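One way to picture append-only snapshot history and diffing by name is the sketch below. It assumes a JSON Lines history file with one snapshot per line; the record layout and the function names (`append_snapshot`, `read_history`, `diff_snapshots`) are illustrative assumptions, not the project's actual format or API.

```python
import json
from pathlib import Path


def append_snapshot(path: Path, name: str, scores: dict[str, float]) -> None:
    """Append one named snapshot as a single JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"name": name, "scores": scores}, sort_keys=True) + "\n")


def read_history(path: Path) -> dict[str, dict[str, float]]:
    """Read all snapshots from disk, keyed by snapshot name."""
    history = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                record = json.loads(line)
                history[record["name"]] = record["scores"]
    return history


def diff_snapshots(history: dict, old: str, new: str) -> dict:
    """Return {metric: (old_value, new_value)} for every metric that changed."""
    before, after = history[old], history[new]
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        if before.get(key) != after.get(key):
            # None marks a metric that is absent from one of the snapshots.
            changes[key] = (before.get(key), after.get(key))
    return changes
```

Because the file is only ever appended to, earlier snapshots stay byte-stable under version control, which keeps diffs between named snapshots reproducible.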
### T03 — Metrics merge and viability reports
```task
id: IB-WP-0008-T03
status: done
priority: high
state_hub_task_id: "7abcbd63-0147-4ae8-85f0-4af51882f476"
```
- Persist latest metrics under `output/metrics`
- Merge collection metrics with evaluation-derived metrics
- Emit structured viability reports
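The merge and report steps above might look roughly like this. Everything here is an assumption made for illustration: the `collection.`/`evaluation.` key prefixes, the minimum-threshold notion of viability, and the `latest.json` filename are not confirmed by the workplan, which only names the `output/metrics` directory.

```python
import json
from pathlib import Path


def merge_metrics(collection: dict, evaluation: dict) -> dict:
    """Merge both metric sources; prefixes avoid silent key collisions."""
    merged = {f"collection.{key}": value for key, value in collection.items()}
    merged.update({f"evaluation.{key}": value for key, value in evaluation.items()})
    return merged


def viability_report(metrics: dict, minimums: dict) -> dict:
    """Build a structured report: one check per required minimum."""
    checks = []
    for name, minimum in sorted(minimums.items()):
        value = metrics.get(name)
        checks.append({
            "metric": name,
            "value": value,
            "minimum": minimum,
            # A missing metric fails its check rather than being skipped.
            "passed": value is not None and value >= minimum,
        })
    return {"viable": all(check["passed"] for check in checks), "checks": checks}


def persist_latest(metrics: dict, root: Path) -> Path:
    """Write metrics under output/metrics (the filename is hypothetical)."""
    target = root / "output" / "metrics" / "latest.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return target
```

Sorting keys on write keeps the persisted file stable across runs, so a report regenerated from committed files should produce no spurious diff.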
### T04 — Legacy metric compatibility notes
```task
id: IB-WP-0008-T04
status: done
priority: medium
state_hub_task_id: "675d1d45-39d9-4ddd-9ab7-5d7de8a0f601"
```
- Compare old metric names and meanings to the new baseline
- Document changed semantics and accepted differences
- Add fixtures that preserve critical old behavior
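A compatibility table of this kind could be kept as data and exercised by fixtures, as in the sketch below. All metric names and notes in it are invented placeholders rather than the real legacy or current names, and `translate_legacy` is a hypothetical helper.

```python
# Placeholder mapping: none of these names come from the real codebase.
LEGACY_METRICS = {
    "old_coverage": {"current": "coverage_ratio", "note": None},
    "old_avg_score": {"current": "mean_score", "note": "placeholder: semantics changed"},
    "old_raw_count": {"current": None, "note": "placeholder: dropped, accepted difference"},
}


def translate_legacy(metrics: dict) -> tuple[dict, list[str]]:
    """Rename legacy metrics and collect notes about anything that changed."""
    translated, notes = {}, []
    for name, value in metrics.items():
        entry = LEGACY_METRICS.get(name)
        if entry is None:
            # Unknown names are reported instead of being passed through silently.
            notes.append(f"{name}: unmapped")
            continue
        if entry["note"]:
            notes.append(f"{name}: {entry['note']}")
        if entry["current"] is not None:
            translated[entry["current"]] = value
    return translated, notes
```

A fixture that feeds a recorded legacy metrics file through such a helper and asserts on both the translated values and the notes would pin down which old behavior is preserved and which differences are accepted.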
## Acceptance
- Evaluation and metrics history are inspectable, diffable, and reproducible
- Viability reports can be generated from committed files
- Old evaluation-history workflows have a clear replacement path