infospace-bench/docs/evaluation-and-inspection.md
2026-05-14 11:32:25 +02:00

Evaluation And Inspection

infospace-bench now has a deterministic baseline for evaluation and inspection. It is intentionally small: the repo can produce structured quality objects and relationship summaries before any LLM or engine integration is introduced.

Evaluation Objects

  • ScoreEntry
  • EntityEvaluation
  • MetricValue
  • EvaluationSnapshot
  • SnapshotDiff

Snapshots are serializable through to_dict() / from_dict() and can be compared with diff_snapshots().
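
The round trip and comparison described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: the field names (`label`, `metrics`, `name`, `value`) are assumptions, and the diff is simplified to a plain dict of per-metric deltas rather than a `SnapshotDiff` object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricValue:
    # Hypothetical fields; the real MetricValue may carry more.
    name: str
    value: float


@dataclass
class EvaluationSnapshot:
    # Hypothetical fields chosen for illustration only.
    label: str
    metrics: list[MetricValue] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Convert the snapshot into plain, JSON-friendly data."""
        return {
            "label": self.label,
            "metrics": [{"name": m.name, "value": m.value} for m in self.metrics],
        }

    @classmethod
    def from_dict(cls, data: dict) -> "EvaluationSnapshot":
        """Rebuild a snapshot from the structure produced by to_dict()."""
        metrics = [MetricValue(m["name"], m["value"]) for m in data.get("metrics", [])]
        return cls(label=data["label"], metrics=metrics)


def diff_snapshots(before: EvaluationSnapshot, after: EvaluationSnapshot) -> dict[str, float]:
    """Return value deltas for metrics present in both snapshots (simplified)."""
    old = {m.name: m.value for m in before.metrics}
    new = {m.name: m.value for m in after.metrics}
    return {name: new[name] - old[name] for name in sorted(old.keys() & new.keys())}
```

Because `to_dict()` emits only plain dicts, lists, strings, and numbers, the result can be written to a file with `json.dumps()` and read back without custom encoders.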

Collection Checks

run_collection_checks() produces five baseline metrics:

  • redundancy_ratio
  • coverage_ratio
  • coherence_components
  • consistency_cycles
  • granularity_entropy

These metrics are deliberately deterministic and file-backed. Later work can replace or extend their internals with embeddings, richer graph analysis, or agent-assisted evaluation without changing the result contract.
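
To illustrate what a stable result contract with replaceable internals might look like, here is a small sketch covering two of the five metric names. The metric definitions are assumptions made for the example (exact-duplicate share and Shannon entropy over word counts), and `run_checks_sketch` is a hypothetical stand-in, not the signature of `run_collection_checks()`.

```python
import math
from collections import Counter


def redundancy_ratio(texts: list[str]) -> float:
    """Share of entries that exactly duplicate another entry (assumed definition)."""
    if not texts:
        return 0.0
    return 1.0 - len(set(texts)) / len(texts)


def granularity_entropy(texts: list[str]) -> float:
    """Shannon entropy, in bits, of the word-count distribution (assumed definition)."""
    if not texts:
        return 0.0
    counts = Counter(len(text.split()) for text in texts)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def run_checks_sketch(texts: list[str]) -> dict[str, float]:
    """Hypothetical contract: metric name mapped to a float, same input gives same output."""
    return {
        "redundancy_ratio": redundancy_ratio(texts),
        "granularity_entropy": granularity_entropy(texts),
    }
```

Swapping `redundancy_ratio` for an embedding-based similarity measure later would change how the number is computed but not the shape of the returned mapping, which is the property the paragraph above relies on.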

Viability

evaluate_viability() compares metric values against declared ViabilityThreshold values. A metric that has a threshold but no value is reported as a failure rather than skipped silently.
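
The comparison can be sketched as below, including the handling of missing metrics. The `ViabilityThreshold` fields (`metric`, `minimum`, `maximum`), the function name, and the result strings are all assumptions for illustration; the real types may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViabilityThreshold:
    # Hypothetical shape: an optional lower and upper bound per metric.
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def evaluate_viability_sketch(
    metrics: dict[str, float],
    thresholds: list[ViabilityThreshold],
) -> dict[str, str]:
    """Return a pass/fail verdict for every declared threshold."""
    results: dict[str, str] = {}
    for threshold in thresholds:
        if threshold.metric not in metrics:
            # A missing metric is an explicit failure, never a silent skip.
            results[threshold.metric] = "fail: metric missing"
            continue
        value = metrics[threshold.metric]
        too_low = threshold.minimum is not None and value < threshold.minimum
        too_high = threshold.maximum is not None and value > threshold.maximum
        results[threshold.metric] = "fail: out of range" if (too_low or too_high) else "pass"
    return results
```

Iterating over thresholds rather than over metrics is what makes a missing metric show up in the output: every declared threshold produces exactly one entry.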

Relationship Inspection

relationship_summary() extracts nodes, edges, and relationship type counts from artifact manifests. export_mermaid() provides the first graph-friendly representation.
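
A rough sketch of both steps follows. The manifest layout (`id`, `relationships`, `type`, `target`) and the `_sketch` function names are assumptions, since the actual manifest schema is not described here; the output uses standard Mermaid flowchart syntax with labelled edges.

```python
from collections import Counter


def relationship_summary_sketch(manifests: list[dict]) -> dict:
    """Collect nodes, edges, and per-type edge counts from manifest dicts."""
    nodes: set[str] = set()
    edges: list[tuple[str, str, str]] = []
    for manifest in manifests:
        source = manifest["id"]
        nodes.add(source)
        for rel in manifest.get("relationships", []):
            nodes.add(rel["target"])
            edges.append((source, rel["type"], rel["target"]))
    return {
        "nodes": sorted(nodes),  # sorted so the output is deterministic
        "edges": sorted(edges),
        "type_counts": dict(Counter(kind for _, kind, _ in edges)),
    }


def export_mermaid_sketch(summary: dict) -> str:
    """Render the edges as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for source, kind, target in summary["edges"]:
        lines.append(f"    {source} -->|{kind}| {target}")
    return "\n".join(lines)
```

The sketch assumes node identifiers are already safe to use as Mermaid node IDs (plain alphanumerics and underscores); identifiers containing spaces or punctuation would need escaping or aliasing first.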