feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch-evaluated all 988 entities via OpenRouter: 985 succeeded on the
first pass and 3 failed with network errors. Metrics were then updated
with eval-summary --update-metrics (per_entity_mean=3.9556).

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
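
A minimal sketch of the pass/fail logic behind this dashboard, assuming every bound is inclusive (consistency_cycles at 0.0000 can only pass "max 0" under an inclusive comparison). THRESHOLDS and check_viability are hypothetical names, not the project's actual implementation.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    """A viability bound: kind "max" means value <= bound, "min" means value >= bound."""
    kind: str
    bound: float


# Bounds copied from the dashboard; the table layout itself is hypothetical.
THRESHOLDS = {
    "redundancy_ratio": Threshold("max", 0.10),
    "coverage_ratio": Threshold("min", 0.40),
    "coherence_comps": Threshold("max", 3),
    "consistency_cycles": Threshold("max", 0),
    "granularity_entropy": Threshold("min", 1.0),
    "per_entity_mean": Threshold("min", 3.5),
}


def check_viability(metrics: dict[str, float]) -> dict[str, bool]:
    """Return True (PASS) or False (FAIL) for each thresholded metric."""
    verdicts = {}
    for name, (kind, bound) in THRESHOLDS.items():
        value = metrics[name]
        # Assumption: both kinds of bound are inclusive.
        verdicts[name] = value <= bound if kind == "max" else value >= bound
    return verdicts


metrics = {
    "redundancy_ratio": 0.0061,
    "coverage_ratio": 0.6190,
    "coherence_comps": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.6748,
    "per_entity_mean": 3.9556,
}
verdicts = check_viability(metrics)
passed = sum(verdicts.values())
print(f"Viability dashboard: {passed}/{len(verdicts)} PASS")
```

With an exclusive "max" comparison, consistency_cycles would fail at 0, so the inclusive reading is the one consistent with the reported 6/6.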

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94
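
One plausible way to produce per_entity_mean and this breakdown is to average the parsed front matter of every successful evaluation file. The sketch assumes per_entity_mean is the plain mean of overall_score and that each file carries all five dimensions; summarize() and its input shape are hypothetical.

```python
from statistics import mean

# The five rubric dimensions named in the breakdown.
DIMENSIONS = (
    "definition_precision",
    "source_grounding",
    "domain_placement",
    "vsm_relevance",
    "explanatory_value",
)


def summarize(evaluations: list[dict]) -> tuple[float, dict[str, float]]:
    """Return (per_entity_mean, {dimension: mean score}) over successful evaluations.

    Hypothetical input shape: one dict per parsed front-matter block, e.g.
    {"overall_score": 4.0, "scores": [{"name": "...", "value": 3.0}, ...]}.
    Failed evaluations are assumed to be excluded before this is called.
    """
    # Assumption: per_entity_mean is the unweighted mean of overall_score.
    per_entity_mean = mean(ev["overall_score"] for ev in evaluations)

    # Collect every score for each dimension, then average per dimension.
    by_dimension: dict[str, list[float]] = {d: [] for d in DIMENSIONS}
    for ev in evaluations:
        for score in ev["scores"]:
            by_dimension[score["name"]].append(score["value"])
    return per_entity_mean, {d: mean(vals) for d, vals in by_dimension.items()}


# Scores from the single evaluation file shown in this commit.
colonial = {
    "overall_score": 4.0,
    "scores": [
        {"name": "definition_precision", "value": 3.0},
        {"name": "source_grounding", "value": 4.0},
        {"name": "domain_placement", "value": 5.0},
        {"name": "vsm_relevance", "value": 4.0},
        {"name": "explanatory_value", "value": 4.0},
    ],
}
per_entity_mean, dimension_means = summarize([colonial])
print(f"per_entity_mean={per_entity_mean:.4f}")
for name, value in dimension_means.items():
    print(f"  {name:<21} {value:.2f}")
```

Under these assumptions the mean of the five dimension means (3.958 from the rounded figures) should match per_entity_mean up to rounding, which it does here (3.9556).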

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletion


@@ -0,0 +1,66 @@
---
entity_slug: colonial_economic_system_implementation
evaluator: null
evaluated_at: '2026-02-23T04:49:04.542833'
overall_score: 4.0
scores:
- name: definition_precision
  value: 3.0
  max_value: 5.0
  rationale: The definition captures a distinct concept about practical application
    of colonial policies, but uses somewhat vague terms like "different approaches"
    and "implementation quality" without specifying what constitutes good vs. poor
    implementation. It avoids circularity but could be more precise about the mechanisms
    involved.
- name: source_grounding
  value: 4.0
  max_value: 5.0
  rationale: This entity appears well-grounded in Smith's actual analysis of colonial
    policy implementation in Book IV, Chapter 7, where he does examine how monopoly
    systems fail in practice and argues for more open arrangements. The contrast between
    monopoly and open implementation approaches reflects Smith's documented arguments.
- name: domain_placement
  value: 5.0
  max_value: 5.0
  rationale: '"Regulation" is the correct domain placement, as this entity specifically
    concerns the practical mechanisms of implementing and enforcing colonial trade
    regulations and administrative structures. This is fundamentally about regulatory
    systems rather than production, exchange, or distribution.'
- name: vsm_relevance
  value: 4.0
  max_value: 5.0
  rationale: This entity maps naturally to S3 (internal regulation/audit) as it concerns
    the implementation and enforcement of colonial policies, and potentially S2 (coordination)
    regarding administrative structures. The focus on "enforcement mechanisms" and
    "administrative structures" aligns well with VSM regulatory functions.
- name: explanatory_value
  value: 4.0
  max_value: 5.0
  rationale: This entity provides genuine explanatory power by highlighting that policy
    effectiveness depends not just on design but on implementation quality, and that
    different policy approaches (monopoly vs. open) have different implementation
    challenges. It illuminates a key mechanism in Smith's colonial analysis rather
    than just naming a surface phenomenon.
---
# Evaluation: Colonial Economic System Implementation
## definition_precision — 3.0 / 5.0
The definition captures a distinct concept about practical application of colonial policies, but uses somewhat vague terms like "different approaches" and "implementation quality" without specifying what constitutes good vs. poor implementation. It avoids circularity but could be more precise about the mechanisms involved.
## source_grounding — 4.0 / 5.0
This entity appears well-grounded in Smith's actual analysis of colonial policy implementation in Book IV, Chapter 7, where he does examine how monopoly systems fail in practice and argues for more open arrangements. The contrast between monopoly and open implementation approaches reflects Smith's documented arguments.
## domain_placement — 5.0 / 5.0
"Regulation" is the correct domain placement, as this entity specifically concerns the practical mechanisms of implementing and enforcing colonial trade regulations and administrative structures. This is fundamentally about regulatory systems rather than production, exchange, or distribution.
## vsm_relevance — 4.0 / 5.0
This entity maps naturally to S3 (internal regulation/audit) as it concerns the implementation and enforcement of colonial policies, and potentially S2 (coordination) regarding administrative structures. The focus on "enforcement mechanisms" and "administrative structures" aligns well with VSM regulatory functions.
## explanatory_value — 4.0 / 5.0
This entity provides genuine explanatory power by highlighting that policy effectiveness depends not just on design but on implementation quality, and that different policy approaches (monopoly vs. open) have different implementation challenges. It illuminates a key mechanism in Smith's colonial analysis rather than just naming a surface phenomenon.
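
In this file, overall_score equals the unweighted mean of the five dimension scores: (3 + 4 + 5 + 4 + 4) / 5 = 4.0, and the Markdown body repeats each front-matter score under its own heading. A minimal consistency check, assuming both relationships hold for every generated file, might look like the sketch below. overall_matches, render_headings, and the slug-to-title rule are hypothetical, not the project's generator.

```python
from statistics import mean

# Hand-copied subset of the front matter (rationales omitted); a real tool
# would parse the YAML between the "---" fences instead.
evaluation = {
    "entity_slug": "colonial_economic_system_implementation",
    "overall_score": 4.0,
    "scores": [
        {"name": "definition_precision", "value": 3.0, "max_value": 5.0},
        {"name": "source_grounding", "value": 4.0, "max_value": 5.0},
        {"name": "domain_placement", "value": 5.0, "max_value": 5.0},
        {"name": "vsm_relevance", "value": 4.0, "max_value": 5.0},
        {"name": "explanatory_value", "value": 4.0, "max_value": 5.0},
    ],
}


def overall_matches(ev: dict, tol: float = 1e-9) -> bool:
    """Check the assumption that overall_score is the unweighted dimension mean."""
    return abs(mean(s["value"] for s in ev["scores"]) - ev["overall_score"]) <= tol


def render_headings(ev: dict) -> list[str]:
    """Rebuild the body headings; deriving the title from the slug is a guess."""
    title = ev["entity_slug"].replace("_", " ").title()
    lines = [f"# Evaluation: {title}"]
    for s in ev["scores"]:
        # \u2014 is the separator character used in the generated headings.
        lines.append(f"## {s['name']} \u2014 {s['value']} / {s['max_value']}")
    return lines


print(overall_matches(evaluation))
for line in render_headings(evaluation):
    print(line)
```

Running a check like this over all 985 files would flag any evaluation whose body drifted from its front matter or whose overall_score was computed differently.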