---
id: IB-WP-0008
type: workplan
title: "Evaluation History And Metrics Parity"
domain: markitect
repo: infospace-bench
status: completed
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0008-evaluation-history-metrics-parity"
state_hub_workstream_id: "f00ba036-dc97-4370-a4a5-9ac2bce7ce6f"
---

# IB-WP-0008 — Evaluation History And Metrics Parity

## Goal

Bring the current evaluation dataclasses up to practical parity with the old
infospace evaluation history and metrics behavior.

## Tasks

### T01 — Evaluation file I/O

```task
id: IB-WP-0008-T01
status: done
priority: high
state_hub_task_id: "95b48ad3-c4d1-442c-9bc0-7591d948d23e"
```

- Write and read per-entity/per-artifact evaluation files
- Preserve human-readable Markdown body plus structured metadata
- Add round-trip tests
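
A minimal sketch of the round trip these bullets describe, assuming a delimited metadata header followed by the Markdown body. The `EvaluationFile` dataclass, the function names, the one-file-per-entity layout, and the use of JSON for the header are illustrative assumptions, not the repo's actual API or on-disk format.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path

FENCE = "---"  # assumed delimiter around the structured metadata header


@dataclass
class EvaluationFile:
    """Hypothetical stand-in for one per-entity evaluation record."""
    entity_id: str
    metadata: dict = field(default_factory=dict)
    body: str = ""


def render_evaluation(ev: EvaluationFile) -> str:
    """Serialize metadata as a JSON header, then the Markdown body."""
    meta = {**ev.metadata, "entity_id": ev.entity_id}
    header = json.dumps(meta, indent=2, sort_keys=True)
    return f"{FENCE}\n{header}\n{FENCE}\n\n{ev.body.rstrip()}\n"


def parse_evaluation(text: str) -> EvaluationFile:
    """Inverse of render_evaluation; raises ValueError on a malformed header."""
    lines = text.splitlines()
    if not lines or lines[0] != FENCE:
        raise ValueError("missing metadata header")
    # The first fence after line 0 closes the header, so a later '---'
    # inside the Markdown body is left alone.
    end = lines.index(FENCE, 1)
    meta = json.loads("\n".join(lines[1:end]))
    body = "\n".join(lines[end + 1:]).strip("\n")
    return EvaluationFile(meta.pop("entity_id"), meta, body)


def write_evaluation(root: Path, ev: EvaluationFile) -> Path:
    """Write one file per entity under root (assumed layout)."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{ev.entity_id}.md"
    path.write_text(render_evaluation(ev), encoding="utf-8")
    return path


def read_evaluation(path: Path) -> EvaluationFile:
    return parse_evaluation(path.read_text(encoding="utf-8"))
```

A round-trip test in this shape would assert that `parse_evaluation(render_evaluation(ev)) == ev` for a body without leading or trailing blank lines, since the sketch normalizes those.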

### T02 — Snapshot history

```task
id: IB-WP-0008-T02
status: done
priority: high
state_hub_task_id: "b4800ba8-5b86-44bb-bf47-e893bae36b22"
```

- Write, append, and read evaluation snapshot history
- Support diffing named snapshots from disk
- Add CLI support for history and history-diff behavior
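
The append, read, and diff behavior listed above could look roughly like the following sketch. It assumes an append-only JSON Lines history file and flat `name -> score` snapshots; the function names and record shape are hypothetical, and the CLI layer is not shown.

```python
import json
from pathlib import Path


def append_snapshot(history_path, name, scores):
    """Append one named snapshot as a single JSON line."""
    record = {"name": name, "scores": scores}
    with Path(history_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_history(history_path):
    """Return all snapshots in write order; a missing file is an empty history."""
    path = Path(history_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def diff_snapshots(history, old_name, new_name):
    """Map each changed key to (old_value, new_value); None means absent.

    If a name was written more than once, the latest record wins.
    Raises KeyError when either name is not in the history.
    """
    by_name = {snap["name"]: snap["scores"] for snap in history}
    old, new = by_name[old_name], by_name[new_name]
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

Keeping the diff a pure function over already-loaded snapshots means the same code can serve both a history-diff command reading from disk and in-memory tests.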

### T03 — Metrics merge and viability reports

```task
id: IB-WP-0008-T03
status: done
priority: high
state_hub_task_id: "7abcbd63-0147-4ae8-85f0-4af51882f476"
```

- Persist latest metrics under `output/metrics`
- Merge collection metrics with evaluation-derived metrics
- Emit structured viability reports
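
One way to picture the merge and report steps, as a sketch under stated assumptions: metrics are flat dictionaries, a key present in both sources with different values is treated as an error, and viability is a set of minimum thresholds. The function names, the conflict policy, the threshold model, and the `latest.json` file name are all hypothetical; only the `output/metrics` directory comes from the task list.

```python
import json
from pathlib import Path


def merge_metrics(collection, evaluation):
    """Combine both metric sources; disagreeing duplicates raise ValueError."""
    merged = dict(collection)
    for key, value in evaluation.items():
        if key in merged and merged[key] != value:
            raise ValueError(f"conflicting values for metric {key!r}")
        merged[key] = value
    return merged


def viability_report(metrics, thresholds):
    """Check each threshold; a missing metric counts as a failed check."""
    checks = {}
    for name, minimum in sorted(thresholds.items()):
        value = metrics.get(name)
        checks[name] = {
            "value": value,
            "minimum": minimum,
            "ok": value is not None and value >= minimum,
        }
    return {"viable": all(c["ok"] for c in checks.values()), "checks": checks}


def write_latest(metrics, root="output/metrics"):
    """Persist the merged metrics as sorted, indented JSON (assumed file name)."""
    path = Path(root) / "latest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```

Sorted keys and a trailing newline keep the persisted file stable across runs, which makes committed metrics easier to diff.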

### T04 — Legacy metric compatibility notes

```task
id: IB-WP-0008-T04
status: done
priority: medium
state_hub_task_id: "675d1d45-39d9-4ddd-9ab7-5d7de8a0f601"
```

- Compare old metric names and meanings to new baseline
- Document changed semantics and accepted differences
- Add fixtures that preserve critical old behavior

## Acceptance

- Evaluation and metrics history are inspectable, diffable, and reproducible
- Viability reports can be generated from committed files
- Old evaluation-history workflows have a clear replacement path