infospace-bench/workplans/IB-WP-0003-evaluation-and-inspection.md

---
id: IB-WP-0003
type: workplan
title: "Evaluation And Inspection Framework"
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: "2026-05-03"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0003-evaluation-and-inspection"
state_hub_workstream_id: "bc368ba0-9fd7-4821-a5d7-e5c301faa80a"
---

# IB-WP-0003 — Evaluation And Inspection Framework

## Goal

Reestablish infospace quality evaluation and inspection as first-class
application behavior.

## FRS Coverage

- FR-030 to FR-032: evaluation and quality assessment
- FR-040 to FR-042: inspection, exploration, and visualization
- FR-080 to FR-081: optional AI-assisted operations and context provision

## Tasks

### T01 — Port evaluation result concepts

```task
id: IB-WP-0003-T01
status: done
priority: high
state_hub_task_id: "9bab4b20-3fef-469e-9ce2-f0db3e05e26a"
```

- Reimplement score entries, entity evaluations, metric values, snapshots, and
  diffs from `markitect/infospace/evaluation.py`
- Keep serialization simple and inspectable
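
A minimal sketch of what "simple and inspectable" serialization could look like for these concepts. The class names (`MetricValue`, `Snapshot`) and the `diff_snapshots` helper are illustrative assumptions, not the actual names in `markitect/infospace/evaluation.py`, and only metric values and snapshots are modelled here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class MetricValue:
    # Hypothetical shape: one named numeric measurement.
    name: str
    value: float


@dataclass
class Snapshot:
    # Hypothetical shape: a labelled set of metric values at one point in time.
    label: str
    metrics: list[MetricValue] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys and fixed indentation keep the output stable across runs,
        # so two snapshots can be compared with an ordinary text diff.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, float]:
    """Return new minus old for every metric present in both snapshots."""
    before = {m.name: m.value for m in old.metrics}
    after = {m.name: m.value for m in new.metrics}
    shared = sorted(before.keys() & after.keys())
    return {name: after[name] - before[name] for name in shared}
```

Plain dataclasses plus `json` keep the stored form readable without custom tooling; metrics that appear in only one snapshot are ignored by this sketch and would need their own handling in a real diff.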

### T02 — Rebuild collection checks

```task
id: IB-WP-0003-T02
status: done
priority: high
state_hub_task_id: "ee335d74-5be3-4b94-91e3-509486909f93"
```

- Recreate redundancy, coverage, coherence, consistency, and granularity checks
- Keep dependencies explicit for embeddings and relationship graphs
- Write results to reusable structured outputs
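
One way to keep a dependency explicit is to pass it in as an argument rather than loading it inside the check. The sketch below does this for a redundancy check. The function name, the cosine-similarity definition of redundancy, the default threshold of 0.95, and the result keys are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def redundancy_check(
    embeddings: dict[str, list[float]], threshold: float = 0.95
) -> dict:
    # Embeddings are supplied by the caller, so the dependency is visible
    # in the signature instead of being fetched from hidden state.
    pairs = [
        [a, b]
        for a, b in combinations(sorted(embeddings), 2)
        if cosine(embeddings[a], embeddings[b]) >= threshold
    ]
    total = len(embeddings) * (len(embeddings) - 1) // 2
    # A plain dict result can be serialized and reused by later steps.
    return {
        "check": "redundancy",
        "threshold": threshold,
        "redundant_pairs": pairs,
        "score": len(pairs) / total if total else 0.0,
    }
```

The other checks (coverage, coherence, consistency, granularity) could follow the same pattern: explicit inputs, a plain structured result.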

### T03 — Add viability evaluation

```task
id: IB-WP-0003-T03
status: done
priority: high
state_hub_task_id: "d46b3429-37ef-4375-96e1-304eabf2cc13"
```

- Compare latest metrics to `infospace.yaml` thresholds
- Report pass/fail per threshold and overall viability
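
The comparison can be sketched as a pure function over two mappings. This assumes each threshold is a minimum acceptable value keyed by metric name, and that the thresholds have already been read from `infospace.yaml` into a dict; the actual schema of that file is not specified here, and `evaluate_viability` is a hypothetical name.

```python
def evaluate_viability(
    metrics: dict[str, float], thresholds: dict[str, float]
) -> dict:
    """Report pass/fail per threshold and an overall viability flag."""
    results = {}
    for name, minimum in sorted(thresholds.items()):
        value = metrics.get(name)
        results[name] = {
            "value": value,
            "threshold": minimum,
            # Assumption: a metric that was never computed counts as a failure.
            "passed": value is not None and value >= minimum,
        }
    # Overall viability requires every declared threshold to pass.
    viable = all(entry["passed"] for entry in results.values())
    return {"thresholds": results, "viable": viable}
```

For example, `evaluate_viability({"coverage": 0.8}, {"coverage": 0.7, "coherence": 0.5})` reports `coverage` as passed, `coherence` as failed because no value is present, and the infospace as not viable. With no thresholds declared, this sketch reports the infospace as viable.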

### T04 — Add relationship inspection output

```task
id: IB-WP-0003-T04
status: done
priority: medium
state_hub_task_id: "de4f45e4-81a1-4ddb-98de-15e99ed5605a"
```

- Export relationship summaries and graph representations
- Support at least one textual output and one graph-friendly output
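
As an illustration of the two output kinds, the sketch below produces a textual summary and a Graphviz DOT graph from the same relationship list. It assumes relationships are `(source, relation, target)` string triples; that shape, the function names, and the choice of DOT as the graph-friendly format are assumptions.

```python
from collections import Counter

Relationship = tuple[str, str, str]  # (source, relation, target), assumed shape


def summarize(relationships: list[Relationship]) -> str:
    """Textual output: one line per relation type with its count."""
    counts = Counter(relation for _, relation, _ in relationships)
    return "\n".join(f"{rel}: {n}" for rel, n in sorted(counts.items()))


def to_dot(relationships: list[Relationship], name: str = "infospace") -> str:
    """Graph-friendly output: a Graphviz DOT digraph with labelled edges."""
    lines = [f"digraph {name} {{"]
    # Sorting the edges keeps the output deterministic and diffable.
    for src, rel, dst in sorted(relationships):
        # Identifiers containing double quotes would need escaping first.
        lines.append(f'  "{src}" -> "{dst}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)
```

Both functions depend only on the list passed in, so the same data yields the same text and the same graph on every run.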

## Acceptance

- Evaluation outputs are structured and diffable
- Collection-level metrics can be produced for a sample infospace
- Viability can be computed from declared thresholds
- Relationship structure is inspectable without hidden state