| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_slug | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IB-WP-0008 | workplan | Evaluation History And Metrics Parity | markitect | infospace-bench | completed | markitect | markitect | 2026-05-14 | 2026-05-14 | ib-wp-0008-evaluation-history-metrics-parity | f00ba036-dc97-4370-a4a5-9ac2bce7ce6f |
IB-WP-0008 — Evaluation History And Metrics Parity
Goal
Bring the current evaluation dataclasses up to practical parity with the old infospace evaluation history and metrics behavior.
Tasks
T01 — Evaluation file I/O
id: IB-WP-0008-T01
status: done
priority: high
state_hub_task_id: "95b48ad3-c4d1-442c-9bc0-7591d948d23e"
- Write and read per-entity/per-artifact evaluation files
- Preserve human-readable Markdown body plus structured metadata
- Add round-trip tests
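
The round-trip requirement above can be illustrated with a minimal sketch. It assumes a hypothetical `EvaluationRecord` dataclass and a file layout of JSON metadata between `---` delimiters followed by the Markdown body; the real infospace-bench dataclasses and on-disk format may differ.

```python
import json
from dataclasses import asdict, dataclass, field

DELIM = "---\n"  # assumed delimiter around the metadata block


@dataclass
class EvaluationRecord:
    """Hypothetical stand-in for an evaluation dataclass."""
    entity_id: str
    artifact: str
    scores: dict = field(default_factory=dict)
    body: str = ""  # human-readable Markdown body


def dumps(record: EvaluationRecord) -> str:
    """Serialize metadata as JSON front matter, then the Markdown body."""
    meta = asdict(record)
    body = meta.pop("body")
    return DELIM + json.dumps(meta, indent=2, sort_keys=True) + "\n" + DELIM + body


def loads(text: str) -> EvaluationRecord:
    """Parse text produced by dumps() back into a record."""
    if not text.startswith(DELIM):
        raise ValueError("missing metadata block")
    # Split at the first closing delimiter; the body may itself contain '---'.
    meta_text, sep, body = text[len(DELIM):].partition("\n" + DELIM)
    if not sep:
        raise ValueError("unterminated metadata block")
    return EvaluationRecord(body=body, **json.loads(meta_text))
```

A round-trip test then reduces to checking that `loads(dumps(record)) == record`, including for bodies that contain a `---` rule of their own.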
T02 — Snapshot history
id: IB-WP-0008-T02
status: done
priority: high
state_hub_task_id: "b4800ba8-5b86-44bb-bf47-e893bae36b22"
- Write, append, and read evaluation snapshot history
- Support diffing named snapshots from disk
- Add CLI support for history and history-diff behavior
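
As a rough sketch of the append, read, and diff behavior listed above, the following assumes history is stored as one JSON object per line (JSONL) with hypothetical `name` and `scores` fields. The function names are illustrative and are not the project's actual API or CLI.

```python
import json
from pathlib import Path


def append_snapshot(path, name, scores):
    """Append one named snapshot as a JSON line, creating parent dirs."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"name": name, "scores": scores}, sort_keys=True) + "\n")


def read_history(path):
    """Return all snapshots in write order; a missing file is empty history."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def diff_snapshots(path, old_name, new_name):
    """Map each changed metric to an (old, new) pair; None means absent."""
    # If a name was appended more than once, the latest entry wins.
    by_name = {snap["name"]: snap["scores"] for snap in read_history(path)}
    old, new = by_name[old_name], by_name[new_name]
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }
```

An append-only file keeps earlier snapshots untouched, which is one simple way to make history diffable from disk without loading any in-memory state.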
T03 — Metrics merge and viability reports
id: IB-WP-0008-T03
status: done
priority: high
state_hub_task_id: "7abcbd63-0147-4ae8-85f0-4af51882f476"
- Persist latest metrics under `output/metrics`
- Merge collection metrics with evaluation-derived metrics
- Emit structured viability reports
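
One possible shape for the merge and report steps is sketched below. The precedence rule (evaluation-derived values override collection values on a name clash), the threshold-based notion of viability, and the metric names in the usage note are all assumptions made for illustration, not the documented behavior.

```python
def merge_metrics(collection, evaluation):
    """Merge two metric dicts; evaluation-derived values win on conflicts.

    The precedence is an assumption of this sketch.
    """
    merged = dict(collection)
    merged.update(evaluation)
    return merged


def viability_report(metrics, thresholds):
    """Build a structured report: one check per threshold plus a verdict."""
    checks = []
    for name, minimum in sorted(thresholds.items()):
        value = metrics.get(name)
        checks.append({
            "metric": name,
            "value": value,
            "minimum": minimum,
            # A metric that is missing entirely fails its check.
            "passed": value is not None and value >= minimum,
        })
    return {"viable": all(check["passed"] for check in checks), "checks": checks}
```

For example, merging `{"documents": 120, "coverage": 0.5}` with `{"coverage": 0.82}` keeps `documents` and takes the evaluation-derived `coverage`. Because the report is a plain dict, it can be written with `json.dump` next to the persisted metrics.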
T04 — Legacy metric compatibility notes
id: IB-WP-0008-T04
status: done
priority: medium
state_hub_task_id: "675d1d45-39d9-4ddd-9ab7-5d7de8a0f601"
- Compare old metric names and meanings to new baseline
- Document changed semantics and accepted differences
- Add fixtures that preserve critical old behavior
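
A compatibility table of the kind described above could be kept as data, so renames and changed semantics are explicit and testable. The sketch below is illustrative only: every metric name and conversion in it is invented for the example and does not reflect the actual old or new metric sets.

```python
# Hypothetical mapping: legacy name -> (current name, value conversion).
LEGACY_METRICS = {
    # Assumed semantic change: old value was a percentage, new is a fraction.
    "legacy_score_pct": ("mean_score", lambda value: value / 100),
    # Assumed pure rename with unchanged meaning.
    "legacy_doc_count": ("documents", lambda value: value),
}


def translate_legacy(metrics):
    """Translate legacy metrics; return (translated, unmapped legacy names)."""
    translated, unmapped = {}, []
    for name, value in metrics.items():
        rule = LEGACY_METRICS.get(name)
        if rule is None:
            # Surface names with no documented replacement instead of dropping them.
            unmapped.append(name)
            continue
        current_name, convert = rule
        translated[current_name] = convert(value)
    return translated, sorted(unmapped)
```

A fixture can then pin a known legacy input to its expected translated output, so any accepted difference has to show up as an explicit change to the table.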
Acceptance
- Evaluation and metrics history are inspectable, diffable, and reproducible
- Viability reports can be generated from committed files
- Old evaluation-history workflows have a clear replacement path