feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
first pass; 3 failed (network errors). eval-summary --update-metrics
recorded per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
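
The PASS count above can be reproduced with a minimal sketch. It assumes each bound is inclusive (consistency_cycles passes at 0 with max 0) and that a metric passes when it sits on the allowed side of its limit. The names THRESHOLDS, METRICS, and check are hypothetical and not taken from the project's tooling.

```python
# Hypothetical threshold table mirroring the dashboard: (bound kind, limit).
THRESHOLDS = {
    "redundancy_ratio": ("max", 0.10),
    "coverage_ratio": ("min", 0.40),
    "coherence_comps": ("max", 3),
    "consistency_cycles": ("max", 0),
    "granularity_entropy": ("min", 1.0),
    "per_entity_mean": ("min", 3.5),
}

# Metric values as reported in the dashboard.
METRICS = {
    "redundancy_ratio": 0.0061,
    "coverage_ratio": 0.6190,
    "coherence_comps": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.6748,
    "per_entity_mean": 3.9556,
}


def check(metrics, thresholds):
    """Return {metric name: bool}, treating every bound as inclusive (an assumption)."""
    results = {}
    for name, (kind, limit) in thresholds.items():
        value = metrics[name]
        results[name] = value <= limit if kind == "max" else value >= limit
    return results


results = check(METRICS, THRESHOLDS)
passed = sum(results.values())
print(f"{passed}/{len(results)} PASS")
```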

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletions

@@ -0,0 +1,65 @@
---
entity_slug: economic_system_structure
evaluator: null
evaluated_at: '2026-02-23T05:21:03.439317'
overall_score: 2.0
scores:
- name: definition_precision
value: 2.0
max_value: 5.0
rationale: The definition is overly broad and umbrella-like, essentially describing
"how economies are organized" in general terms. It lacks precision and could apply
to virtually any economic arrangement, making it more of a meta-category than
a distinct analytical concept.
- name: source_grounding
value: 2.0
max_value: 5.0
rationale: While Smith does analyze different economic systems, this entity appears
to impose a modern organizational framework vocabulary ("institutional arrangements,"
"decision-making authority") that doesn't clearly emerge from Smith's actual text.
The reference to "Book IV, Chapter 0" is also problematic as Book IV doesn't have
a Chapter 0.
- name: domain_placement
value: 3.0
max_value: 5.0
rationale: '"General Theory" is appropriate given the broad, structural nature of
this concept. However, the entity is so abstract that it might be better placed
in a methodological or meta-theoretical category rather than substantive economic
theory.'
- name: vsm_relevance
value: 1.0
max_value: 5.0
rationale: This entity is far too abstract and general to map meaningfully to any
specific VSM system. It essentially describes the entire organizational structure
rather than identifying particular cybernetic functions, making it VSM-irrelevant
rather than VSM-mappable.
- name: explanatory_value
value: 2.0
max_value: 5.0
rationale: The entity provides minimal explanatory power, functioning more as a
general label than as an analytical tool that illuminates specific mechanisms
or relationships. It names a broad phenomenon without offering insight into how
economic systems actually function or what makes them viable.
---
# Evaluation: Economic System Structure
## definition_precision — 2.0 / 5.0
The definition is overly broad and umbrella-like, essentially describing "how economies are organized" in general terms. It lacks precision and could apply to virtually any economic arrangement, making it more of a meta-category than a distinct analytical concept.
## source_grounding — 2.0 / 5.0
While Smith does analyze different economic systems, this entity appears to impose a modern organizational framework vocabulary ("institutional arrangements," "decision-making authority") that doesn't clearly emerge from Smith's actual text. The reference to "Book IV, Chapter 0" is also problematic as Book IV doesn't have a Chapter 0.
## domain_placement — 3.0 / 5.0
"General Theory" is appropriate given the broad, structural nature of this concept. However, the entity is so abstract that it might be better placed in a methodological or meta-theoretical category rather than substantive economic theory.
## vsm_relevance — 1.0 / 5.0
This entity is far too abstract and general to map meaningfully to any specific VSM system. It essentially describes the entire organizational structure rather than identifying particular cybernetic functions, making it VSM-irrelevant rather than VSM-mappable.
## explanatory_value — 2.0 / 5.0
The entity provides minimal explanatory power, functioning more as a general label than as an analytical tool that illuminates specific mechanisms or relationships. It names a broad phenomenon without offering insight into how economic systems actually function or what makes them viable.
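
In this file, `overall_score` equals the unweighted mean of the five dimension values. The sketch below checks that relationship. It assumes the overall score is a plain arithmetic mean, which matches this entity but is not confirmed by the source as the general rule; the `scores` dictionary is a hand-copied illustration, not output from the evaluation tool.

```python
from statistics import mean

# Dimension values copied from the front matter of this evaluation.
scores = {
    "definition_precision": 2.0,
    "source_grounding": 2.0,
    "domain_placement": 3.0,
    "vsm_relevance": 1.0,
    "explanatory_value": 2.0,
}

# Assumption: overall_score is the unweighted mean of the dimension values.
overall = mean(scores.values())
print(overall)
```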