feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)
Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on first
pass; 3 failed (network errors). eval-summary --update-metrics written with
per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061 (max 0.10)
  coverage_ratio       0.6190 (min 0.40)
  coherence_comps      0.0000 (max 3)
  consistency_cycles   0.0000 (max 0)
  granularity_entropy  2.6748 (min 1.0)
  per_entity_mean      3.9556 (min 3.5)

Dimension breakdown (mean across 985 entities):
  definition_precision 3.62
  source_grounding     4.36
  domain_placement     4.56
  vsm_relevance        3.31
  explanatory_value    3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
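The dashboard figures in the commit message can be re-checked with a short script. This is only a sketch under stated assumptions: the values and bounds are copied from the message, while the `Check` type, the `passes` helper, and the inclusive comparison are hypothetical and are not the project's actual implementation. Inclusive bounds are inferred from `consistency_cycles` passing at 0.0000 with a maximum of 0.

```python
from typing import NamedTuple


class Check(NamedTuple):
    """One dashboard row (hypothetical structure, for illustration only)."""
    value: float
    bound: float
    kind: str  # "max": value must not exceed bound; "min": value must reach bound


# Values and bounds as reported in the commit message.
DASHBOARD = {
    "redundancy_ratio": Check(0.0061, 0.10, "max"),
    "coverage_ratio": Check(0.6190, 0.40, "min"),
    "coherence_comps": Check(0.0, 3, "max"),
    "consistency_cycles": Check(0.0, 0, "max"),
    "granularity_entropy": Check(2.6748, 1.0, "min"),
    "per_entity_mean": Check(3.9556, 3.5, "min"),
}


def passes(check: Check) -> bool:
    """Return True when the value is within its bound (bounds assumed inclusive)."""
    if check.kind == "max":
        return check.value <= check.bound
    return check.value >= check.bound


results = {name: passes(check) for name, check in DASHBOARD.items()}
passed = sum(results.values())
print(f"{passed}/{len(results)} PASS")  # prints "6/6 PASS"
```

With the reported numbers every row is within its bound, which agrees with the 6/6 PASS line.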
@@ -0,0 +1,59 @@
---
entity_slug: warehouse_system
evaluator: null
evaluated_at: '2026-02-23T06:38:26.974499'
overall_score: 1.2
scores:
- name: definition_precision
  value: 1.0
  max_value: 5.0
  rationale: There is no definition provided at all, making it impossible to assess
    precision or conceptual distinctness. Without any definitional content, this entity
    fails to establish what concept it represents.
- name: source_grounding
  value: 1.0
  max_value: 5.0
  rationale: With no definition, context, or source chapter specified, there is no
    evidence this entity is grounded in Smith's actual text. The term "warehouse system"
    does not appear to be a central concept in The Wealth of Nations.
- name: domain_placement
  value: 1.0
  max_value: 5.0
  rationale: The domain is listed as "unspecified," providing no basis for evaluating
    whether the entity belongs in the correct conceptual category. Without context
    or definition, proper domain placement cannot be assessed.
- name: vsm_relevance
  value: 2.0
  max_value: 5.0
  rationale: A warehouse system could theoretically map to S1 (operational storage/distribution)
    or S3 (resource management), but without definition or context, any VSM placement
    would be purely speculative. The potential exists but cannot be evaluated meaningfully.
- name: explanatory_value
  value: 1.0
  max_value: 5.0
  rationale: An entity with no definition, context, or source grounding provides zero
    explanatory value about Smith's economic theory. It neither illuminates mechanisms
    nor clarifies structural relations in the source material.
---

# Evaluation: Warehouse System

## definition_precision — 1.0 / 5.0

There is no definition provided at all, making it impossible to assess precision or conceptual distinctness. Without any definitional content, this entity fails to establish what concept it represents.

## source_grounding — 1.0 / 5.0

With no definition, context, or source chapter specified, there is no evidence this entity is grounded in Smith's actual text. The term "warehouse system" does not appear to be a central concept in The Wealth of Nations.

## domain_placement — 1.0 / 5.0

The domain is listed as "unspecified," providing no basis for evaluating whether the entity belongs in the correct conceptual category. Without context or definition, proper domain placement cannot be assessed.

## vsm_relevance — 2.0 / 5.0

A warehouse system could theoretically map to S1 (operational storage/distribution) or S3 (resource management), but without definition or context, any VSM placement would be purely speculative. The potential exists but cannot be evaluated meaningfully.

## explanatory_value — 1.0 / 5.0

An entity with no definition, context, or source grounding provides zero explanatory value about Smith's economic theory. It neither illuminates mechanisms nor clarifies structural relations in the source material.
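The `overall_score` of 1.2 in the frontmatter is consistent with an unweighted mean of the five dimension values. The sketch below illustrates that reading; the aggregation rule, the one-decimal rounding, and the `overall_score` function name are assumptions, since the diff does not show how the evaluator actually derives the figure.

```python
from statistics import mean

# Dimension values copied from the frontmatter of this evaluation file.
scores = {
    "definition_precision": 1.0,
    "source_grounding": 1.0,
    "domain_placement": 1.0,
    "vsm_relevance": 2.0,
    "explanatory_value": 1.0,
}


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Unweighted mean of the dimension values, rounded to one decimal (assumed rule)."""
    return round(mean(dimension_scores.values()), 1)


print(overall_score(scores))  # prints 1.2
```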