feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
first pass; 3 failed (network errors). Ran eval-summary --update-metrics,
which wrote per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
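A minimal Python sketch of how the six dashboard checks could be evaluated. The `Check` class and `dashboard` function are hypothetical and are not the project's actual code. The values and bounds are copied from the table above. Bounds are treated as inclusive, because that is the only reading under which consistency_cycles = 0 against a max of 0 counts as a PASS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One dashboard row (hypothetical model): a metric value and its bound."""
    name: str
    value: float
    bound: float
    kind: str  # "max": value must be <= bound; "min": value must be >= bound

    def passed(self) -> bool:
        # Bounds are assumed inclusive (needed for consistency_cycles 0 vs max 0).
        if self.kind == "max":
            return self.value <= self.bound
        if self.kind == "min":
            return self.value >= self.bound
        raise ValueError(f"unknown bound kind: {self.kind!r}")


# Values and bounds copied from the viability dashboard in this commit message.
CHECKS = [
    Check("redundancy_ratio", 0.0061, 0.10, "max"),
    Check("coverage_ratio", 0.6190, 0.40, "min"),
    Check("coherence_comps", 0.0, 3, "max"),
    Check("consistency_cycles", 0.0, 0, "max"),
    Check("granularity_entropy", 2.6748, 1.0, "min"),
    Check("per_entity_mean", 3.9556, 3.5, "min"),
]


def dashboard(checks):
    """Return (number passed, total, per-check pass/fail map)."""
    results = {c.name: c.passed() for c in checks}
    return sum(results.values()), len(checks), results


if __name__ == "__main__":
    passed, total, results = dashboard(CHECKS)
    for name, ok in results.items():
        print(f"{name:<20} {'PASS' if ok else 'FAIL'}")
    print(f"{passed}/{total} PASS")
```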

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94
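Suppose each entity's overall score is the unweighted mean of its five dimension values, and every entity is scored on all five dimensions. Then, by linearity of the mean, per_entity_mean should equal the mean of the five dimension means. Here is a quick Python consistency check under that assumption; the function name is hypothetical:

```python
# Dimension means as reported in this commit message (rounded to 2 decimals).
DIMENSION_MEANS = {
    "definition_precision": 3.62,
    "source_grounding": 4.36,
    "domain_placement": 4.56,
    "vsm_relevance": 3.31,
    "explanatory_value": 3.94,
}
REPORTED_PER_ENTITY_MEAN = 3.9556


def mean_of_dimension_means(means: dict[str, float]) -> float:
    """Equals the mean of per-entity overall scores if each overall score is
    the unweighted mean of all five dimensions (an assumption)."""
    return sum(means.values()) / len(means)


if __name__ == "__main__":
    implied = mean_of_dimension_means(DIMENSION_MEANS)
    # Each input was rounded to 2 decimals, so their mean can drift by at most 0.005.
    gap = abs(implied - REPORTED_PER_ENTITY_MEAN)
    print(f"implied={implied:.4f} reported={REPORTED_PER_ENTITY_MEAN} gap={gap:.4f}")
    assert gap <= 0.005
```

The implied value is 3.9580 against the reported 3.9556. The 0.0024 gap is within the 0.005 that two-decimal rounding of the inputs allows.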

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletions


@@ -0,0 +1,64 @@
---
entity_slug: economic_system_application
evaluator: null
evaluated_at: '2026-02-23T05:12:57.618791'
overall_score: 2.6
scores:
- name: definition_precision
  value: 2.0
  max_value: 5.0
  rationale: The definition is overly broad and vague, essentially describing any
    implementation of economic ideas without capturing a distinct concept. It reads
    more like a general description of applied economics rather than a precise, bounded
    entity that Smith would have recognized.
- name: source_grounding
  value: 2.0
  max_value: 5.0
  rationale: While Smith does discuss different economic systems in Book IV, there's
    no evidence he conceptualized "economic system application" as a distinct analytical
    category. The entity appears to be a modern abstraction imposed on Smith's more
    specific discussions of particular policies and their effects.
- name: domain_placement
  value: 3.0
  max_value: 5.0
  rationale: '"Regulation" is a reasonable domain since implementation involves regulatory
    mechanisms, but this entity is so broad it could equally belong in policy, institutions,
    or theory domains. The placement isn''t wrong but reflects the entity''s lack
    of conceptual specificity.'
- name: vsm_relevance
  value: 4.0
  max_value: 5.0
  rationale: This entity maps well to S3 (internal regulation) as it concerns translating
    policy into operational practice, with potential connections to S4 (adaptation
    to environmental contexts). The VSM framing actually helps clarify what this otherwise
    vague concept might mean.
- name: explanatory_value
  value: 2.0
  max_value: 5.0
  rationale: The entity adds little explanatory power beyond stating that economic
    theories must be implemented in practice. It doesn't illuminate any specific mechanism
    or structural relationship that Smith identified, functioning more as a meta-category
    than an analytical tool.
---
# Evaluation: Economic System Application

## definition_precision — 2.0 / 5.0

The definition is overly broad and vague, essentially describing any implementation of economic ideas without capturing a distinct concept. It reads more like a general description of applied economics rather than a precise, bounded entity that Smith would have recognized.

## source_grounding — 2.0 / 5.0

While Smith does discuss different economic systems in Book IV, there's no evidence he conceptualized "economic system application" as a distinct analytical category. The entity appears to be a modern abstraction imposed on Smith's more specific discussions of particular policies and their effects.

## domain_placement — 3.0 / 5.0

"Regulation" is a reasonable domain since implementation involves regulatory mechanisms, but this entity is so broad it could equally belong in policy, institutions, or theory domains. The placement isn't wrong but reflects the entity's lack of conceptual specificity.

## vsm_relevance — 4.0 / 5.0

This entity maps well to S3 (internal regulation) as it concerns translating policy into operational practice, with potential connections to S4 (adaptation to environmental contexts). The VSM framing actually helps clarify what this otherwise vague concept might mean.

## explanatory_value — 2.0 / 5.0

The entity adds little explanatory power beyond stating that economic theories must be implemented in practice. It doesn't illuminate any specific mechanism or structural relationship that Smith identified, functioning more as a meta-category than an analytical tool.
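The overall_score of 2.6 in this file's frontmatter matches the unweighted mean of its five dimension values: 2.0 + 2.0 + 3.0 + 4.0 + 2.0 = 13.0, and 13.0 / 5 = 2.6. The Python sketch below recovers the scores from the Markdown body headings and recomputes that mean. The heading regex, `parse_scores`, and `overall_score` are illustrative assumptions about the file layout, not the evaluator's real parser.

```python
import re

# Markdown headings from the evaluation file (rationale paragraphs omitted).
# \u2014 is the U+2014 dash character used in the real headings.
BODY = """\
# Evaluation: Economic System Application
## definition_precision \u2014 2.0 / 5.0
## source_grounding \u2014 2.0 / 5.0
## domain_placement \u2014 3.0 / 5.0
## vsm_relevance \u2014 4.0 / 5.0
## explanatory_value \u2014 2.0 / 5.0
"""

# Hypothetical pattern for "## <dimension> <U+2014> <value> / <max_value>".
HEADING = re.compile(r"^## (\w+) \u2014 ([\d.]+) / ([\d.]+)$", re.MULTILINE)


def parse_scores(body: str) -> dict[str, tuple[float, float]]:
    """Map each dimension name to (value, max_value)."""
    return {
        m.group(1): (float(m.group(2)), float(m.group(3)))
        for m in HEADING.finditer(body)
    }


def overall_score(scores: dict[str, tuple[float, float]]) -> float:
    """Assumed aggregation: unweighted mean of the dimension values."""
    values = [value for value, _max in scores.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    scores = parse_scores(BODY)
    print(scores)
    print(f"overall_score = {overall_score(scores):.1f}")
```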