infospace-bench/workplans/IB-WP-0008-evaluation-history-metrics-parity.md
2026-05-14 15:35:04 +02:00

id: IB-WP-0008
type: workplan
title: Evaluation History And Metrics Parity
domain: markitect
repo: infospace-bench
status: completed
owner: markitect
topic_slug: markitect
created: 2026-05-14
updated: 2026-05-14
state_hub_workstream_slug: ib-wp-0008-evaluation-history-metrics-parity
state_hub_workstream_id: f00ba036-dc97-4370-a4a5-9ac2bce7ce6f

IB-WP-0008 — Evaluation History And Metrics Parity

Goal

Bring the current evaluation dataclasses up to practical parity with the old infospace evaluation history and metrics behavior.

Tasks

T01 — Evaluation file I/O

id: IB-WP-0008-T01
status: done
priority: high
state_hub_task_id: "95b48ad3-c4d1-442c-9bc0-7591d948d23e"
  • Write and read per-entity/per-artifact evaluation files
  • Preserve human-readable Markdown body plus structured metadata
  • Add round-trip tests
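
As an illustration of the round-trip requirement in T01, the following is a minimal sketch and not the repository's actual implementation. The `Evaluation` dataclass, its fields, the `dumps`/`loads` helpers, and the `---`-delimited header with JSON-encoded values are all assumptions made for this example; the real file format may differ.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Evaluation:
    # Hypothetical fields; the real evaluation dataclasses may differ.
    entity_id: str
    scores: dict = field(default_factory=dict)
    body: str = ""  # human-readable Markdown body


def dumps(ev: Evaluation) -> str:
    """Serialize to a metadata header followed by the Markdown body."""
    meta = {"entity_id": ev.entity_id, "scores": ev.scores}
    # JSON-encode each value so newlines and quotes cannot break the header.
    lines = [f"{key}: {json.dumps(value, sort_keys=True)}" for key, value in meta.items()]
    return "---\n" + "\n".join(lines) + "\n---\n" + ev.body


def loads(text: str) -> Evaluation:
    """Parse text produced by dumps() back into an Evaluation."""
    # maxsplit=2 leaves any later '---' lines inside the body untouched.
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, raw = line.partition(": ")
        meta[key] = json.loads(raw)
    return Evaluation(entity_id=meta["entity_id"], scores=meta["scores"], body=body)


def write_evaluation(path: Path, ev: Evaluation) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(dumps(ev), encoding="utf-8")


def read_evaluation(path: Path) -> Evaluation:
    return loads(path.read_text(encoding="utf-8"))
```

A round-trip test under these assumptions only needs to assert that `read_evaluation` returns an object equal to the one passed to `write_evaluation`, including bodies that themselves contain `---` lines.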

T02 — Snapshot history

id: IB-WP-0008-T02
status: done
priority: high
state_hub_task_id: "b4800ba8-5b86-44bb-bf47-e893bae36b22"
  • Write, append, and read evaluation snapshot history
  • Support diffing named snapshots from disk
  • Add CLI support for history and history-diff behavior
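
The append, read, and diff behavior listed under T02 could be sketched as follows. This is a sketch under stated assumptions: the JSON Lines layout, the `name` and `metrics` record keys, and the function names are hypothetical and are not taken from the infospace-bench code. CLI wiring is left out.

```python
import json
from pathlib import Path


def append_snapshot(path: Path, name: str, metrics: dict) -> None:
    """Append one named snapshot as a single JSON line (hypothetical layout)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"name": name, "metrics": metrics}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_history(path: Path) -> list:
    """Return all snapshots in the order they were appended."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def diff_snapshots(path: Path, old: str, new: str) -> dict:
    """Map each changed metric to an (old_value, new_value) pair.

    A metric missing from one snapshot is reported as None on that side.
    """
    by_name = {rec["name"]: rec["metrics"] for rec in read_history(path)}
    before, after = by_name[old], by_name[new]
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }
```

An append-only, line-oriented file keeps the history diffable in version control, which fits the acceptance criterion that history be inspectable and reproducible from committed files.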

T03 — Metrics merge and viability reports

id: IB-WP-0008-T03
status: done
priority: high
state_hub_task_id: "7abcbd63-0147-4ae8-85f0-4af51882f476"
  • Persist latest metrics under output/metrics
  • Merge collection metrics with evaluation-derived metrics
  • Emit structured viability reports
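
One possible shape for the T03 merge and report steps is sketched below. Only the `output/metrics` directory comes from this workplan. The `latest.json` file name, the rule that evaluation-derived values win on key conflicts, and the minimum-threshold form of the viability check are assumptions made for illustration.

```python
import json
from pathlib import Path


def merge_metrics(collection: dict, evaluation: dict) -> dict:
    """Merge both sources; evaluation-derived values win on conflicts (assumed rule)."""
    return {**collection, **evaluation}


def viability_report(metrics: dict, thresholds: dict) -> dict:
    """Build a structured report from hypothetical per-metric minimums."""
    checks = []
    for name, minimum in sorted(thresholds.items()):
        value = metrics.get(name)
        checks.append({
            "metric": name,
            "value": value,
            "minimum": minimum,
            # A missing metric counts as a failed check.
            "passed": value is not None and value >= minimum,
        })
    return {"viable": all(check["passed"] for check in checks), "checks": checks}


def persist_latest(root: Path, metrics: dict) -> Path:
    """Write the merged metrics under output/metrics (file name is hypothetical)."""
    target = root / "output" / "metrics" / "latest.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return target
```

Sorting keys on write keeps the persisted file stable across runs, so a regenerated report can be compared against the committed one.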

T04 — Legacy metric compatibility notes

id: IB-WP-0008-T04
status: done
priority: medium
state_hub_task_id: "675d1d45-39d9-4ddd-9ab7-5d7de8a0f601"
  • Compare old metric names and meanings to new baseline
  • Document changed semantics and accepted differences
  • Add fixtures that preserve critical old behavior

Acceptance

  • Evaluation and metrics history are inspectable, diffable, and reproducible
  • Viability reports can be generated from committed files
  • Old evaluation-history workflows have a clear replacement path