feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
the first pass; 3 failed (network errors). Ran eval-summary
--update-metrics, which recorded per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
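
The PASS count above can be reproduced with a small check. This is only a
sketch: the dictionary layout and the `check` helper are hypothetical and are
not taken from the eval-summary tool. It assumes bounds are inclusive, which
is the only reading under which consistency_cycles at 0 passes a max of 0.

```python
# Hypothetical threshold table; names and bounds copied from the dashboard.
THRESHOLDS = {
    "redundancy_ratio": ("max", 0.10),
    "coverage_ratio": ("min", 0.40),
    "coherence_comps": ("max", 3),
    "consistency_cycles": ("max", 0),
    "granularity_entropy": ("min", 1.0),
    "per_entity_mean": ("min", 3.5),
}

# Metric values as reported in this commit message.
METRICS = {
    "redundancy_ratio": 0.0061,
    "coverage_ratio": 0.6190,
    "coherence_comps": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.6748,
    "per_entity_mean": 3.9556,
}


def check(metrics, thresholds):
    """Return {metric_name: passed}, treating every bound as inclusive (assumption)."""
    results = {}
    for name, (kind, bound) in thresholds.items():
        value = metrics[name]
        results[name] = value <= bound if kind == "max" else value >= bound
    return results


results = check(METRICS, THRESHOLDS)
passed = sum(results.values())
print(f"{passed}/{len(results)} PASS")
```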

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletions

@@ -0,0 +1,63 @@
---
entity_slug: economic_system_efficiency
evaluator: null
evaluated_at: '2026-02-23T05:15:10.930901'
overall_score: 2.6
scores:
- name: definition_precision
  value: 3.0
  max_value: 5.0
  rationale: The definition captures a coherent concept about resource allocation
    and productivity, but it's somewhat broad and could apply to almost any economic
    arrangement. The phrase "achieve their objectives" is vague since different systems
    may have different objectives.
- name: source_grounding
  value: 2.0
  max_value: 5.0
  rationale: The entity acknowledges that efficiency is "not explicitly discussed
    by Smith in this chapter" and is only "implied," which suggests weak grounding
    in the actual source text. This appears to be imposing a modern economic concept
    onto Smith's work rather than extracting what he actually wrote.
- name: domain_placement
  value: 4.0
  max_value: 5.0
  rationale: '"General Theory" is an appropriate domain for a broad concept about
    how economic systems function overall. The concept does relate to fundamental
    questions about economic arrangements that Smith addresses throughout his work.'
- name: vsm_relevance
  value: 2.0
  max_value: 5.0
  rationale: This concept is too abstract and general to map naturally to specific
    VSM systems - it could theoretically apply to any system (S1-S5) since all systems
    should operate efficiently. It lacks the structural specificity needed for meaningful
    VSM placement.
- name: explanatory_value
  value: 2.0
  max_value: 5.0
  rationale: While efficiency is an important economic concept, this entity doesn't
    illuminate specific mechanisms or structural relations from Smith's analysis.
    It's more of a general evaluative criterion than an explanatory concept that reveals
    how economic systems actually work.
---
# Evaluation: Economic System Efficiency
## definition_precision — 3.0 / 5.0
The definition captures a coherent concept about resource allocation and productivity, but it's somewhat broad and could apply to almost any economic arrangement. The phrase "achieve their objectives" is vague since different systems may have different objectives.
## source_grounding — 2.0 / 5.0
The entity acknowledges that efficiency is "not explicitly discussed by Smith in this chapter" and is only "implied," which suggests weak grounding in the actual source text. This appears to be imposing a modern economic concept onto Smith's work rather than extracting what he actually wrote.
## domain_placement — 4.0 / 5.0
"General Theory" is an appropriate domain for a broad concept about how economic systems function overall. The concept does relate to fundamental questions about economic arrangements that Smith addresses throughout his work.
## vsm_relevance — 2.0 / 5.0
This concept is too abstract and general to map naturally to specific VSM systems - it could theoretically apply to any system (S1-S5) since all systems should operate efficiently. It lacks the structural specificity needed for meaningful VSM placement.
## explanatory_value — 2.0 / 5.0
While efficiency is an important economic concept, this entity doesn't illuminate specific mechanisms or structural relations from Smith's analysis. It's more of a general evaluative criterion than an explanatory concept that reveals how economic systems actually work.
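
The `overall_score` of 2.6 in the front matter is consistent with an unweighted mean of the five dimension values. The sketch below assumes that is how the score is derived; the source does not state the formula, and the variable names are illustrative only.

```python
from statistics import mean

# Dimension values copied from this evaluation's front matter.
scores = {
    "definition_precision": 3.0,
    "source_grounding": 2.0,
    "domain_placement": 4.0,
    "vsm_relevance": 2.0,
    "explanatory_value": 2.0,
}

# Assumption: the overall score is the unweighted mean, rounded to one decimal.
overall = round(mean(scores.values()), 1)
print(overall)  # 2.6
```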