feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on the
first pass; 3 failed (network errors). Metrics were updated with
eval-summary --update-metrics (per_entity_mean=3.9556).

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
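
The dashboard reads as six one-sided threshold checks. The following is a minimal sketch and not the project's implementation. THRESHOLDS and check_viability are hypothetical names. The bounds are treated as inclusive because consistency_cycles passes at 0.0000 against a max of 0.

```python
# Hypothetical sketch: bounds copied from the dashboard in this commit.
# Each entry is (kind, bound); "max" means value <= bound, "min" means
# value >= bound. Treating both as inclusive is an assumption.
THRESHOLDS: dict[str, tuple[str, float]] = {
    "redundancy_ratio": ("max", 0.10),
    "coverage_ratio": ("min", 0.40),
    "coherence_comps": ("max", 3),
    "consistency_cycles": ("max", 0),
    "granularity_entropy": ("min", 1.0),
    "per_entity_mean": ("min", 3.5),
}


def check_viability(metrics: dict[str, float]) -> dict[str, bool]:
    """Map each dashboard metric to True (PASS) or False (FAIL)."""
    results = {}
    for name, (kind, bound) in THRESHOLDS.items():
        value = metrics[name]
        results[name] = value <= bound if kind == "max" else value >= bound
    return results


if __name__ == "__main__":
    # Values reported in this commit's dashboard.
    reported = {
        "redundancy_ratio": 0.0061,
        "coverage_ratio": 0.6190,
        "coherence_comps": 0.0,
        "consistency_cycles": 0.0,
        "granularity_entropy": 2.6748,
        "per_entity_mean": 3.9556,
    }
    results = check_viability(reported)
    for name, ok in results.items():
        print(f"{name:<20} {reported[name]:.4f}  {'PASS' if ok else 'FAIL'}")
    print(f"{sum(results.values())}/{len(results)} PASS")
```

With the reported values every check passes, which matches the 6/6 PASS line.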

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94
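
The following is a minimal sketch of how the breakdown and per_entity_mean could be aggregated. It assumes each entity's evaluation has been loaded into a dict shaped like the YAML frontmatter in this commit. It also assumes per_entity_mean is the mean of overall_score across entities. summarize and DIMENSIONS are hypothetical names, and the project's eval-summary command may compute these figures differently.

```python
from statistics import mean

# Dimension names as they appear in the evaluation files.
DIMENSIONS = (
    "definition_precision",
    "source_grounding",
    "domain_placement",
    "vsm_relevance",
    "explanatory_value",
)


def summarize(evaluations: list[dict]) -> dict[str, float]:
    """Aggregate per-dimension means and the per-entity mean.

    Each evaluation is assumed to look like the frontmatter in this commit:
    {"overall_score": float, "scores": [{"name": str, "value": float}, ...]}.
    statistics.mean raises StatisticsError if a dimension has no scores.
    """
    summary: dict[str, float] = {}
    for dim in DIMENSIONS:
        values = [
            score["value"]
            for ev in evaluations
            for score in ev["scores"]
            if score["name"] == dim
        ]
        summary[dim] = round(mean(values), 2)
    # Hypothetical definition: mean of each entity's overall_score.
    summary["per_entity_mean"] = round(
        mean(ev["overall_score"] for ev in evaluations), 4
    )
    return summary
```

Two conditions make the per-entity mean equal to the mean of the five dimension means. Every entity must be scored on all five dimensions, and overall_score must be their unweighted mean. The example file satisfies the second condition, since (4 + 5 + 5 + 4 + 4) / 5 = 4.4. Under those conditions, the rounded figures above give 19.79 / 5 = 3.958, which agrees with 3.9556 to within rounding.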

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletion

@@ -0,0 +1,65 @@
---
entity_slug: old_subsidy_drawback_rules
evaluator: null
evaluated_at: '2026-02-23T06:03:01.815879'
overall_score: 4.4
scores:
- name: definition_precision
  value: 4.0
  max_value: 5.0
  rationale: The definition is quite precise, specifying exact timeframes (twelve
    months for English merchants, nine months for aliens) and clearly describing the
    mechanism of duty recovery upon exportation. It captures a distinct regulatory
    framework rather than a vague concept, though it could be slightly more concise.
- name: source_grounding
  value: 5.0
  max_value: 5.0
  rationale: This entity is clearly grounded in the actual source text from Book IV,
    Chapter 4, which explicitly discusses the original subsidy act and its drawback
    provisions. The specific details about timeframes and different treatment for
    various goods directly reflect Smith's analysis of these historical regulations.
- name: domain_placement
  value: 5.0
  max_value: 5.0
  rationale: The "Regulation" domain assignment is perfectly appropriate, as this
    entity describes specific administrative rules and procedures governing trade
    policy. These drawback rules represent a clear example of governmental regulation
    of commercial activity.
- name: vsm_relevance
  value: 4.0
  max_value: 5.0
  rationale: This entity maps well to S3 (internal regulation/audit) as it represents
    the operational rules and monitoring mechanisms for duty recovery, and potentially
    to S2 (coordination) as it standardizes procedures across different merchant categories.
    The regulatory framework has clear VSM relevance for understanding systemic control
    mechanisms.
- name: explanatory_value
  value: 4.0
  max_value: 5.0
  rationale: The entity provides genuine explanatory value by illuminating the specific
    mechanisms through which drawback policies operated and evolved, showing how different
    commodities and merchant types received differentiated treatment. It reveals structural
    relations in trade policy administration rather than merely naming a surface phenomenon.
---

# Evaluation: Old Subsidy Drawback Rules

## definition_precision — 4.0 / 5.0

The definition is quite precise, specifying exact timeframes (twelve months for English merchants, nine months for aliens) and clearly describing the mechanism of duty recovery upon exportation. It captures a distinct regulatory framework rather than a vague concept, though it could be slightly more concise.

## source_grounding — 5.0 / 5.0

This entity is clearly grounded in the actual source text from Book IV, Chapter 4, which explicitly discusses the original subsidy act and its drawback provisions. The specific details about timeframes and different treatment for various goods directly reflect Smith's analysis of these historical regulations.

## domain_placement — 5.0 / 5.0

The "Regulation" domain assignment is perfectly appropriate, as this entity describes specific administrative rules and procedures governing trade policy. These drawback rules represent a clear example of governmental regulation of commercial activity.

## vsm_relevance — 4.0 / 5.0

This entity maps well to S3 (internal regulation/audit) as it represents the operational rules and monitoring mechanisms for duty recovery, and potentially to S2 (coordination) as it standardizes procedures across different merchant categories. The regulatory framework has clear VSM relevance for understanding systemic control mechanisms.

## explanatory_value — 4.0 / 5.0

The entity provides genuine explanatory value by illuminating the specific mechanisms through which drawback policies operated and evolved, showing how different commodities and merchant types received differentiated treatment. It reveals structural relations in trade policy administration rather than merely naming a surface phenomenon.
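
Each evaluation file repeats its scores in the Markdown headings, so a reviewer can spot-check a file like the one above without a YAML parser. The sketch rests on two assumptions. First, the headings keep the form shown above: a dimension name, an em dash, then value / max. Second, overall_score is the unweighted mean of the dimension values, which holds for this file: (4.0 + 5.0 + 5.0 + 4.0 + 4.0) / 5 = 4.4. check_overall, HEADING_RE and OVERALL_RE are hypothetical helpers, not part of the project.

```python
import re
from statistics import mean

# Body headings look like "## source_grounding <em dash> 5.0 / 5.0";
# the em dash is written as \u2014 so the pattern source stays ASCII.
HEADING_RE = re.compile(r"^## (\w+) \u2014 ([\d.]+) / ([\d.]+)$", re.MULTILINE)
# Frontmatter line such as "overall_score: 4.4".
OVERALL_RE = re.compile(r"^overall_score: ([\d.]+)$", re.MULTILINE)


def check_overall(text: str, tol: float = 0.05) -> bool:
    """Return True if overall_score agrees with the mean of the headed scores.

    Hypothetical rule: overall_score is the unweighted mean of the dimension
    values, possibly rounded to one decimal place (hence the tolerance).
    """
    match = OVERALL_RE.search(text)
    if match is None:
        raise ValueError("no overall_score line found")
    values = [float(value) for _, value, _ in HEADING_RE.findall(text)]
    if not values:
        raise ValueError("no score headings found")
    return abs(mean(values) - float(match.group(1))) <= tol
```

Reading the values back from the headings rather than from the frontmatter also catches the two copies drifting apart. In that case, the heading mean would no longer match overall_score.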