infospace-bench/docs/evaluation-and-inspection.md
2026-05-14 11:32:25 +02:00


# Evaluation And Inspection
`infospace-bench` now has a deterministic baseline for evaluation and
inspection. It is intentionally small: the repo can produce structured quality
objects and relationship summaries before any LLM or engine integration is
introduced.
## Evaluation Objects
- `ScoreEntry`
- `EntityEvaluation`
- `MetricValue`
- `EvaluationSnapshot`
- `SnapshotDiff`
Snapshots are serializable through `to_dict()` / `from_dict()` and can be
compared with `diff_snapshots()`.
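
The round trip and comparison described above can be sketched as follows. This is a minimal stand-in, not the repo's actual definitions: the `label` and `metrics` fields and the `added` / `removed` / `changed` shape returned by `diff_snapshots()` are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvaluationSnapshot:
    # Hypothetical fields; the real class may carry more structure.
    label: str
    metrics: dict[str, float] = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Return a plain, JSON-compatible dictionary."""
        return {"label": self.label, "metrics": dict(self.metrics)}

    @classmethod
    def from_dict(cls, data: dict) -> "EvaluationSnapshot":
        """Rebuild a snapshot from the output of to_dict()."""
        return cls(label=data["label"], metrics=dict(data["metrics"]))


def diff_snapshots(before: EvaluationSnapshot, after: EvaluationSnapshot) -> dict:
    """Compare two snapshots by metric name (assumed result shape)."""
    old, new = before.metrics, after.metrics
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Delta is after minus before, only for metrics whose value changed.
        "changed": {
            name: new[name] - old[name]
            for name in sorted(old.keys() & new.keys())
            if new[name] != old[name]
        },
    }


before = EvaluationSnapshot("before", {"coverage_ratio": 0.5, "redundancy_ratio": 0.25})
after = EvaluationSnapshot("after", {"coverage_ratio": 0.75, "granularity_entropy": 1.0})

# Serialize to JSON text and back to check the round trip.
restored = EvaluationSnapshot.from_dict(json.loads(json.dumps(before.to_dict())))
report = diff_snapshots(before, after)
```

Sorting the metric names keeps the diff output stable across runs, which fits the deterministic intent of the baseline.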
## Collection Checks
`run_collection_checks()` produces five baseline metrics:
- `redundancy_ratio`
- `coverage_ratio`
- `coherence_components`
- `consistency_cycles`
- `granularity_entropy`
These metrics are deliberately deterministic and file-backed. Later work can
replace or extend their internals with embeddings, richer graph analysis, or
agent-assisted evaluation without changing the result contract.
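
To illustrate what "deterministic" can mean here, the sketch below computes three of the five metrics from plain in-memory inputs. The formulas are assumptions chosen for illustration (duplicate share after whitespace and case normalization, undirected connected components, Shannon entropy over artifact kinds); the repo's actual definitions may differ, and `coverage_ratio` and `consistency_cycles` are left out.

```python
import math
from collections import Counter


def redundancy_ratio(texts: list[str]) -> float:
    """Share of entries that duplicate another entry after normalization."""
    if not texts:
        return 0.0
    normalized = [" ".join(text.lower().split()) for text in texts]
    return 1.0 - len(set(normalized)) / len(normalized)


def coherence_components(nodes: list[str], edges: list[tuple[str, str]]) -> int:
    """Count connected components, treating edges as undirected.

    Assumes every edge endpoint is listed in `nodes`.
    """
    parent = {node: node for node in nodes}

    def find(node: str) -> str:
        # Follow parent links to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(node) for node in nodes})


def granularity_entropy(kinds: list[str]) -> float:
    """Shannon entropy, in bits, of the distribution of artifact kinds."""
    if not kinds:
        return 0.0
    total = len(kinds)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(kinds).values()
    )


metrics = {
    "redundancy_ratio": redundancy_ratio(["Alpha", "alpha ", "beta", "gamma"]),
    "coherence_components": coherence_components(
        ["a", "b", "c", "d"], [("a", "b"), ("c", "d")]
    ),
    "granularity_entropy": granularity_entropy(["doc", "doc", "note", "note"]),
}
```

Because each function depends only on its inputs, the same collection always yields the same values, and an internal implementation could later be swapped out while the name-to-value mapping stays the same.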
## Viability
`evaluate_viability()` compares metric values against declared
`ViabilityThreshold` values. A metric that a threshold refers to but that is
absent from the results fails visibly rather than being skipped.
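
A threshold check of this kind might look like the sketch below. The `metric`, `minimum`, and `maximum` fields, the status strings, and the returned tuple are all hypothetical; only the function and class names come from this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViabilityThreshold:
    # Hypothetical fields: an inclusive lower and/or upper bound per metric.
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def evaluate_viability(
    metrics: dict[str, float], thresholds: list[ViabilityThreshold]
) -> tuple[bool, dict[str, str]]:
    """Return (viable, per-metric status) under the assumed threshold shape."""
    statuses: dict[str, str] = {}
    for threshold in thresholds:
        if threshold.metric not in metrics:
            # A missing metric is surfaced explicitly and counts as a failure.
            statuses[threshold.metric] = "missing"
            continue
        value = metrics[threshold.metric]
        above_min = threshold.minimum is None or value >= threshold.minimum
        below_max = threshold.maximum is None or value <= threshold.maximum
        statuses[threshold.metric] = "pass" if above_min and below_max else "fail"
    viable = all(status == "pass" for status in statuses.values())
    return viable, statuses


viable, statuses = evaluate_viability(
    {"coverage_ratio": 0.8, "redundancy_ratio": 0.4},
    [
        ViabilityThreshold("coverage_ratio", minimum=0.7),
        ViabilityThreshold("redundancy_ratio", maximum=0.3),
        ViabilityThreshold("granularity_entropy", minimum=1.0),
    ],
)
```

Giving missing metrics their own status, instead of folding them into `"fail"`, is one way to make the gap visible in a report while still blocking a viable verdict.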
## Relationship Inspection
`relationship_summary()` extracts nodes, edges, and relationship type counts
from artifact manifests. `export_mermaid()` provides the first graph-friendly
representation.
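
The sketch below shows one way such a summary and export could fit together. The manifest layout (an `id` plus a `relationships` list of `type` / `target` entries) and the summary keys are assumptions for illustration, and node ids are assumed to be plain identifiers that need no Mermaid escaping.

```python
from collections import Counter


def relationship_summary(manifests: list[dict]) -> dict:
    """Collect nodes, edges, and per-type counts from manifest dictionaries."""
    nodes: set[str] = set()
    edges: list[tuple[str, str, str]] = []
    for manifest in manifests:
        nodes.add(manifest["id"])
        for relation in manifest.get("relationships", []):
            # Targets count as nodes even if they have no manifest of their own.
            nodes.add(relation["target"])
            edges.append((manifest["id"], relation["type"], relation["target"]))
    type_counts = Counter(rel_type for _, rel_type, _ in edges)
    return {
        "nodes": sorted(nodes),
        "edges": sorted(edges),
        "type_counts": dict(sorted(type_counts.items())),
    }


def export_mermaid(summary: dict) -> str:
    """Render a summary as a top-down Mermaid flowchart with labeled edges."""
    lines = ["graph TD"]
    lines += [f"    {node}" for node in summary["nodes"]]
    lines += [
        f"    {source} -->|{rel_type}| {target}"
        for source, rel_type, target in summary["edges"]
    ]
    return "\n".join(lines)


manifests = [
    {"id": "report", "relationships": [{"type": "derives_from", "target": "notes"}]},
    {"id": "notes", "relationships": [{"type": "cites", "target": "source"}]},
    {"id": "source"},
]
summary = relationship_summary(manifests)
diagram = export_mermaid(summary)
```

Sorting nodes, edges, and type counts keeps the exported text identical across runs, so the diagram can be diffed like any other file-backed artifact.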