generated from coulomb/repo-seed
Initial implementation
42
docs/evaluation-and-inspection.md
Normal file
@@ -0,0 +1,42 @@
# Evaluation And Inspection

`infospace-bench` now has a deterministic baseline for evaluation and
inspection. It is intentionally small: the repo can produce structured quality
objects and relationship summaries before any LLM or engine integration is
introduced.

## Evaluation Objects

- `ScoreEntry`
- `EntityEvaluation`
- `MetricValue`
- `EvaluationSnapshot`
- `SnapshotDiff`

Snapshots are serializable through `to_dict()` / `from_dict()` and can be
compared with `diff_snapshots()`.
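The round trip and diff described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the field names (`label`, `metrics`, `name`, `value`) and the shape of `SnapshotDiff` (`added`, `removed`, `changed`) are assumptions made for the example, and only two of the five listed objects are modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class MetricValue:
    # Hypothetical fields; the real class may carry more detail.
    name: str
    value: float


@dataclass
class EvaluationSnapshot:
    # Hypothetical shape: a label plus metrics keyed by metric name.
    label: str
    metrics: dict[str, MetricValue] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        # Sort keys so the serialized form is deterministic.
        return {
            "label": self.label,
            "metrics": {
                key: {"name": m.name, "value": m.value}
                for key, m in sorted(self.metrics.items())
            },
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> EvaluationSnapshot:
        metrics = {
            key: MetricValue(name=m["name"], value=m["value"])
            for key, m in data["metrics"].items()
        }
        return cls(label=data["label"], metrics=metrics)


@dataclass(frozen=True)
class SnapshotDiff:
    # Hypothetical result shape for the comparison.
    added: list[str]
    removed: list[str]
    changed: dict[str, tuple[float, float]]


def diff_snapshots(before: EvaluationSnapshot, after: EvaluationSnapshot) -> SnapshotDiff:
    old, new = before.metrics, after.metrics
    return SnapshotDiff(
        added=sorted(new.keys() - old.keys()),
        removed=sorted(old.keys() - new.keys()),
        changed={
            name: (old[name].value, new[name].value)
            for name in sorted(old.keys() & new.keys())
            if old[name].value != new[name].value
        },
    )
```

Under these assumptions, `EvaluationSnapshot.from_dict(snapshot.to_dict())` compares equal to the original snapshot, which makes snapshots safe to store as plain JSON files and diff later.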

## Collection Checks

`run_collection_checks()` produces five baseline metrics:

- `redundancy_ratio`
- `coverage_ratio`
- `coherence_components`
- `consistency_cycles`
- `granularity_entropy`

These metrics are deliberately deterministic and file-backed. Later work can
replace or extend their internals with embeddings, richer graph analysis, or
agent-assisted evaluation without changing the result contract.
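To show what "deterministic" can mean in practice, here is a sketch of two of the metrics under one plausible reading of their names. The definitions are assumptions, not taken from the repository: `coherence_components` is treated as the number of connected components in an undirected artifact graph, and `granularity_entropy` as the Shannon entropy (in bits) of a list of category labels. The helper names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def count_components(nodes: list[str], edges: list[tuple[str, str]]) -> int:
    """Count connected components with a small union-find.

    Assumes every edge endpoint appears in `nodes`.
    """
    parent = {node: node for node in nodes}

    def find(node: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in edges:
        parent[find(left)] = find(right)
    return len({find(node) for node in nodes})


def shannon_entropy(labels: list[str]) -> float:
    """Entropy in bits of the label distribution; 0.0 for an empty list."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Both functions depend only on their inputs, so the same files always yield the same values. Swapping in embeddings or richer graph analysis later would change the bodies while the returned numbers keep the same meaning and type.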

## Viability

`evaluate_viability()` compares metric values against declared
`ViabilityThreshold` values. A missing metric is reported as a failure rather
than silently skipped.
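A minimal sketch of that comparison, assuming a threshold carries an optional lower and upper bound. The fields `metric`, `minimum`, and `maximum`, the `ViabilityResult` type, and the function signature are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ViabilityThreshold:
    # Hypothetical fields: either bound may be omitted.
    metric: str
    minimum: float | None = None
    maximum: float | None = None


@dataclass(frozen=True)
class ViabilityResult:
    # Hypothetical result record, one per declared threshold.
    metric: str
    passed: bool
    reason: str


def evaluate_viability(
    metrics: dict[str, float],
    thresholds: list[ViabilityThreshold],
) -> list[ViabilityResult]:
    results = []
    for threshold in thresholds:
        name = threshold.metric
        if name not in metrics:
            # A missing metric is an explicit failure, never a silent pass.
            results.append(ViabilityResult(name, False, "missing metric"))
            continue
        value = metrics[name]
        if threshold.minimum is not None and value < threshold.minimum:
            reason = f"{value} is below minimum {threshold.minimum}"
            results.append(ViabilityResult(name, False, reason))
        elif threshold.maximum is not None and value > threshold.maximum:
            reason = f"{value} is above maximum {threshold.maximum}"
            results.append(ViabilityResult(name, False, reason))
        else:
            results.append(ViabilityResult(name, True, "ok"))
    return results
```

Returning one result per threshold, including the failed and missing ones, keeps the outcome inspectable: a caller can decide overall viability with `all(r.passed for r in results)` and still report which checks failed and why.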

## Relationship Inspection

`relationship_summary()` extracts nodes, edges, and relationship type counts
from artifact manifests. `export_mermaid()` provides the first graph-friendly
representation.
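The extraction and export steps could look roughly like this. The manifest shape used here (a dict with an `id` and an optional `relationships` list of `{"type", "target"}` entries) is an assumption for the example, as are the keys of the returned summary. The real manifest format may differ.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def relationship_summary(manifests: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect nodes, typed edges, and per-type counts from manifests."""
    nodes: set[str] = set()
    edges: list[tuple[str, str, str]] = []
    for manifest in manifests:
        source = manifest["id"]
        nodes.add(source)
        for rel in manifest.get("relationships", []):
            nodes.add(rel["target"])
            edges.append((source, rel["type"], rel["target"]))
    edges.sort()  # stable ordering keeps the output deterministic
    counts = Counter(rel_type for _, rel_type, _ in edges)
    return {
        "nodes": sorted(nodes),
        "edges": edges,
        "type_counts": dict(sorted(counts.items())),
    }


def export_mermaid(summary: dict[str, Any]) -> str:
    """Render the edges as a Mermaid flowchart with labelled arrows."""
    lines = ["graph TD"]
    for source, rel_type, target in summary["edges"]:
        lines.append(f"    {source} -->|{rel_type}| {target}")
    return "\n".join(lines)
```

This sketch assumes node ids are plain alphanumeric identifiers. Ids containing spaces or punctuation would need quoting or escaping before they are valid Mermaid node names.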