generated from coulomb/repo-seed
# Evaluation And Inspection

`infospace-bench` now has a deterministic baseline for evaluation and
inspection. It is intentionally small: the repo can produce structured quality
objects and relationship summaries before any LLM or engine integration is
introduced.

## Evaluation Objects

- `ScoreEntry`
- `EntityEvaluation`
- `MetricValue`
- `EvaluationSnapshot`
- `SnapshotDiff`

Snapshots are serializable through `to_dict()` / `from_dict()` and can be
compared with `diff_snapshots()`.

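
The round trip and comparison described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: the field names (`label`, `metrics`, `name`, `value`) are assumptions, and the stand-in `diff_snapshots()` returns a plain dict of deltas where the real function presumably returns a `SnapshotDiff`.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class MetricValue:
    # Hypothetical shape: the real class may carry more fields.
    name: str
    value: float


@dataclass
class EvaluationSnapshot:
    # Hypothetical shape: a label plus metrics keyed by metric name.
    label: str
    metrics: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # Sort keys so the serialized form is stable across runs.
        return {
            "label": self.label,
            "metrics": {k: asdict(m) for k, m in sorted(self.metrics.items())},
        }

    @classmethod
    def from_dict(cls, data: dict) -> "EvaluationSnapshot":
        metrics = {k: MetricValue(**m) for k, m in data["metrics"].items()}
        return cls(label=data["label"], metrics=metrics)


def diff_snapshots(before: EvaluationSnapshot, after: EvaluationSnapshot) -> dict:
    """Illustrative stand-in: value deltas for metrics present in both snapshots."""
    shared = sorted(before.metrics.keys() & after.metrics.keys())
    return {n: after.metrics[n].value - before.metrics[n].value for n in shared}


before = EvaluationSnapshot("run-1", {"coverage_ratio": MetricValue("coverage_ratio", 0.5)})
after = EvaluationSnapshot("run-2", {"coverage_ratio": MetricValue("coverage_ratio", 0.75)})

# Serialize, restore, then compare against the later snapshot.
restored = EvaluationSnapshot.from_dict(before.to_dict())
delta = diff_snapshots(restored, after)
```

Because `to_dict()` emits only plain dicts, strings, and floats, the result can be written to disk with `json.dump` and reloaded without custom encoders.
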
## Collection Checks

`run_collection_checks()` produces five baseline metrics:

- `redundancy_ratio`
- `coverage_ratio`
- `coherence_components`
- `consistency_cycles`
- `granularity_entropy`

These metrics are deliberately deterministic and file-backed. Later work can
replace or extend their internals with embeddings, richer graph analysis, or
agent-assisted evaluation without changing the result contract.

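
To make "deterministic" concrete, here is a rough sketch of how two of these metrics could be computed from plain inputs. The definitions are assumptions chosen for illustration (exact-duplicate share for `redundancy_ratio`, Shannon entropy over fixed-width size buckets for `granularity_entropy`); the repo's actual formulas may differ, and the `bucket` parameter is hypothetical.

```python
import math
from collections import Counter


def redundancy_ratio(texts):
    """Share of entries that exactly duplicate another entry (assumed definition)."""
    if not texts:
        return 0.0
    return 1.0 - len(set(texts)) / len(texts)


def granularity_entropy(sizes, bucket=100):
    """Shannon entropy in bits of sizes grouped into fixed-width buckets (assumed definition)."""
    if not sizes:
        return 0.0
    counts = Counter(size // bucket for size in sizes)
    total = len(sizes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical inputs standing in for file-backed artifacts.
texts = ["alpha", "beta", "alpha", "gamma"]
sizes = [40, 60, 150, 170]

metrics = {
    "redundancy_ratio": redundancy_ratio(texts),
    "granularity_entropy": granularity_entropy(sizes),
}
```

Neither function depends on randomness, wall-clock time, or external services, so the same inputs always yield the same values. That property is what would let the internals be swapped later while callers keep reading the same metric names.
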
## Viability

`evaluate_viability()` compares metric values against declared
`ViabilityThreshold` values. Missing metrics fail visibly.

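
The threshold comparison might look roughly like this. It is a sketch under stated assumptions: the `ViabilityThreshold` fields (`metric`, `minimum`, `maximum`) and the `(passed, reason)` result shape are hypothetical, not taken from the repo. The point it illustrates is that a threshold whose metric is absent produces an explicit failure instead of being skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViabilityThreshold:
    # Hypothetical fields: inclusive lower and upper bounds, each optional.
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def evaluate_viability(metrics, thresholds):
    """Return {metric: (passed, reason)} for every declared threshold."""
    results = {}
    for t in thresholds:
        if t.metric not in metrics:
            # A declared threshold with no value is a failure, never a silent pass.
            results[t.metric] = (False, "missing metric")
            continue
        value = metrics[t.metric]
        if t.minimum is not None and value < t.minimum:
            results[t.metric] = (False, f"{value} < minimum {t.minimum}")
        elif t.maximum is not None and value > t.maximum:
            results[t.metric] = (False, f"{value} > maximum {t.maximum}")
        else:
            results[t.metric] = (True, "ok")
    return results


metrics = {"coverage_ratio": 0.8, "redundancy_ratio": 0.4}
thresholds = [
    ViabilityThreshold("coverage_ratio", minimum=0.7),
    ViabilityThreshold("redundancy_ratio", maximum=0.3),
    ViabilityThreshold("coherence_components", maximum=1),  # no value supplied
]
report = evaluate_viability(metrics, thresholds)
```
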
## Relationship Inspection

`relationship_summary()` extracts nodes, edges, and relationship type counts
from artifact manifests. `export_mermaid()` provides the first graph-friendly
representation.
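
A minimal sketch of the two steps, assuming each manifest is a dict with an `id` and an optional `relationships` list of `{"type", "target"}` entries. That manifest shape and the returned keys are assumptions for illustration; only the function names come from this document.

```python
from collections import Counter


def relationship_summary(manifests):
    """Collect nodes, edges, and per-type counts from manifest dicts (assumed shape)."""
    nodes = set()
    edges = []
    for manifest in manifests:
        nodes.add(manifest["id"])
        for rel in manifest.get("relationships", []):
            nodes.add(rel["target"])
            edges.append((manifest["id"], rel["type"], rel["target"]))
    edges.sort()  # stable ordering keeps the output deterministic
    return {
        "nodes": sorted(nodes),
        "edges": edges,
        "type_counts": dict(Counter(kind for _, kind, _ in edges)),
    }


def export_mermaid(summary):
    """Render the summary as a Mermaid flowchart with labeled edges."""
    lines = ["graph TD"]
    lines += [f"    {node}" for node in summary["nodes"]]
    lines += [f"    {src} -->|{kind}| {dst}" for src, kind, dst in summary["edges"]]
    return "\n".join(lines)


# Hypothetical manifests: a cites b, b refines c.
manifests = [
    {"id": "a", "relationships": [{"type": "cites", "target": "b"}]},
    {"id": "b", "relationships": [{"type": "refines", "target": "c"}]},
    {"id": "c"},
]
summary = relationship_summary(manifests)
diagram = export_mermaid(summary)
```

The sketch emits node ids verbatim, so it only works for ids that are already valid Mermaid identifiers; real artifact ids containing spaces or punctuation would need escaping or aliasing first.
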