feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
first pass; 3 failed (network errors). Ran eval-summary --update-metrics,
which wrote per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
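
A minimal sketch of how the 6/6 PASS tally could be reproduced from the
values above. The metric names, values, and bounds are copied from the
dashboard; the METRICS table, the within_bound helper, and the choice of
inclusive comparisons are assumptions for illustration, not the project's
actual implementation.

```python
# Hypothetical re-check of the viability dashboard.
# Each entry: metric name -> (observed value, bound kind, bound).
METRICS = {
    "redundancy_ratio": (0.0061, "max", 0.10),
    "coverage_ratio": (0.6190, "min", 0.40),
    "coherence_comps": (0.0, "max", 3),
    "consistency_cycles": (0.0, "max", 0),
    "granularity_entropy": (2.6748, "min", 1.0),
    "per_entity_mean": (3.9556, "min", 3.5),
}


def within_bound(value, kind, bound):
    """Return True if value satisfies the bound (assumed inclusive)."""
    if kind == "max":
        return value <= bound
    return value >= bound


results = {name: within_bound(*spec) for name, spec in METRICS.items()}
passed = sum(results.values())
summary = f"{passed}/{len(results)} PASS"
print(summary)
```

With inclusive bounds, consistency_cycles (0 against max 0) counts as a
pass, which is what the reported tally requires.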

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletions

@@ -0,0 +1,64 @@
---
entity_slug: economic_system_implementation
evaluator: null
evaluated_at: '2026-02-23T05:16:47.168866'
overall_score: 2.4
scores:
- name: definition_precision
  value: 2.0
  max_value: 5.0
  rationale: The definition is overly broad and vague, essentially describing "how
    economic systems work in practice" without identifying any specific mechanisms
    or distinct conceptual boundaries. It reads more like a general description of
    policy implementation than a precise economic concept.
- name: source_grounding
  value: 1.0
  max_value: 5.0
  rationale: The entity explicitly admits Smith does not discuss this concept in the
    referenced chapter, stating it is "implied" in his discussion - this represents
    a significant inferential leap rather than grounding in actual source text. Book
    IV, Chapter 0 would also be unusual as chapters typically start with Chapter 1.
- name: domain_placement
  value: 3.0
  max_value: 5.0
  rationale: While "Regulation" is a reasonable domain for implementation processes,
    this concept is so broad it could equally belong in institutional economics, political
    economy, or policy studies. The domain assignment captures one aspect but doesn't
    reflect the entity's expansive scope.
- name: vsm_relevance
  value: 4.0
  max_value: 5.0
  rationale: This entity maps well to multiple VSM systems - S1 (operational implementation),
    S3 (internal regulation and audit of implementation), and S4 (adaptation of implementation
    to environment). The implementation focus gives it clear VSM applicability across
    several systems.
- name: explanatory_value
  value: 2.0
  max_value: 5.0
  rationale: The entity merely labels the general phenomenon of "putting economic
    ideas into practice" without illuminating any specific mechanisms, structural
    relationships, or causal processes. It adds descriptive terminology but little
    analytical insight into how economic systems actually function.
---
# Evaluation: Economic System Implementation
## definition_precision — 2.0 / 5.0
The definition is overly broad and vague, essentially describing "how economic systems work in practice" without identifying any specific mechanisms or distinct conceptual boundaries. It reads more like a general description of policy implementation than a precise economic concept.
## source_grounding — 1.0 / 5.0
The entity explicitly admits Smith does not discuss this concept in the referenced chapter, stating it is "implied" in his discussion - this represents a significant inferential leap rather than grounding in actual source text. Book IV, Chapter 0 would also be unusual as chapters typically start with Chapter 1.
## domain_placement — 3.0 / 5.0
While "Regulation" is a reasonable domain for implementation processes, this concept is so broad it could equally belong in institutional economics, political economy, or policy studies. The domain assignment captures one aspect but doesn't reflect the entity's expansive scope.
## vsm_relevance — 4.0 / 5.0
This entity maps well to multiple VSM systems - S1 (operational implementation), S3 (internal regulation and audit of implementation), and S4 (adaptation of implementation to environment). The implementation focus gives it clear VSM applicability across several systems.
## explanatory_value — 2.0 / 5.0
The entity merely labels the general phenomenon of "putting economic ideas into practice" without illuminating any specific mechanisms, structural relationships, or causal processes. It adds descriptive terminology but little analytical insight into how economic systems actually function.
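
The `overall_score` of 2.4 in the front matter is consistent with an unweighted mean of the five dimension values. The sketch below checks that arithmetic; the aggregation rule is inferred from the numbers in this file, not confirmed by the source, and the `scores` dictionary is a hand-copied stand-in for the parsed front matter.

```python
from statistics import mean

# Dimension values copied from the front matter of this evaluation file.
scores = {
    "definition_precision": 2.0,
    "source_grounding": 1.0,
    "domain_placement": 3.0,
    "vsm_relevance": 4.0,
    "explanatory_value": 2.0,
}

# Assumption: overall_score is the plain (unweighted) mean of the dimensions.
overall = round(mean(list(scores.values())), 2)
print(overall)
```

If the evaluator weights dimensions differently, the agreement here would be a coincidence of this particular entity, so the check is worth repeating on other files before relying on it.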