feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)
Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on the first pass; 3 failed (network errors).
Ran eval-summary --update-metrics, which wrote per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio      0.0061  (max 0.10)
  coverage_ratio        0.6190  (min 0.40)
  coherence_comps       0.0000  (max 3)
  consistency_cycles    0.0000  (max 0)
  granularity_entropy   2.6748  (min 1.0)
  per_entity_mean       3.9556  (min 3.5)

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
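The viability dashboard reads as a set of one-sided threshold gates: a metric passes when it stays at or below its max, or at or above its min. The sketch below is a minimal illustration of that gate, assuming inclusive bounds (consistency_cycles = 0.0000 can only pass "max 0" if equality counts). The metric values and thresholds are copied from this commit message; `Threshold` and `check_viability` are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One dashboard gate: a bound plus whether it is a ceiling ("max") or a floor ("min")."""
    kind: str  # "max" or "min"
    bound: float

    def passes(self, value: float) -> bool:
        # Bounds are treated as inclusive (assumption): consistency_cycles = 0 passes "max 0".
        if self.kind == "max":
            return value <= self.bound
        if self.kind == "min":
            return value >= self.bound
        raise ValueError(f"unknown threshold kind: {self.kind!r}")


# Thresholds and values copied from the viability dashboard in this commit message.
THRESHOLDS = {
    "redundancy_ratio": Threshold("max", 0.10),
    "coverage_ratio": Threshold("min", 0.40),
    "coherence_comps": Threshold("max", 3),
    "consistency_cycles": Threshold("max", 0),
    "granularity_entropy": Threshold("min", 1.0),
    "per_entity_mean": Threshold("min", 3.5),
}

METRICS = {
    "redundancy_ratio": 0.0061,
    "coverage_ratio": 0.6190,
    "coherence_comps": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.6748,
    "per_entity_mean": 3.9556,
}


def check_viability(metrics: dict[str, float], thresholds: dict[str, Threshold]) -> dict[str, bool]:
    """Return a pass/fail flag per gated metric; a metric absent from `metrics` fails."""
    return {
        name: name in metrics and gate.passes(metrics[name])
        for name, gate in thresholds.items()
    }


if __name__ == "__main__":
    results = check_viability(METRICS, THRESHOLDS)
    print(f"Viability dashboard: {sum(results.values())}/{len(results)} PASS")
```

Treating a missing metric as a failure keeps the gate conservative: a pipeline run that silently skips a metric cannot report 6/6.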
examples/infospace-with-history/output/evaluations/taille.md (normal file, 64 additions)
@@ -0,0 +1,64 @@
---
entity_slug: taille
evaluator: null
evaluated_at: '2026-02-23T06:28:57.228034'
overall_score: 4.6
scores:
- name: definition_precision
  value: 4.0
  max_value: 5.0
  rationale: The definition clearly specifies the taille as a French land tax based
    on supposed profits from farm stock, with well-articulated perverse incentives.
    It's precise and non-circular, though could benefit from slightly more detail
    about the assessment mechanism.
- name: source_grounding
  value: 5.0
  max_value: 5.0
  rationale: This is directly grounded in Smith's text from Book III, Chapter 2, where
    he explicitly discusses the taille as a problematic French tax system. The description
    of its effects on cultivation and investment aligns closely with Smith's analysis.
- name: domain_placement
  value: 5.0
  max_value: 5.0
  rationale: '"Regulation" is the correct domain placement, as the taille represents
    a specific regulatory mechanism (taxation) that creates systematic economic distortions.
    It fits perfectly within Smith''s broader analysis of how regulatory frameworks
    affect economic behavior.'
- name: vsm_relevance
  value: 4.0
  max_value: 5.0
  rationale: This maps well to S3 (internal regulation/audit) as a regulatory mechanism
    that's supposed to monitor and extract value from economic activity, though it's
    a dysfunctional one. It also touches on S4 concerns about how regulatory systems
    adapt (or fail to adapt) to economic realities.
- name: explanatory_value
  value: 5.0
  max_value: 5.0
  rationale: This entity illuminates a crucial mechanism showing how tax design creates
    systematic incentives that distort economic behavior, serving as a concrete example
    of Smith's broader principles about the relationship between institutional design
    and economic outcomes. It demonstrates structural causation rather than just naming
    a phenomenon.
---

# Evaluation: Taille

## definition_precision — 4.0 / 5.0

The definition clearly specifies the taille as a French land tax based on supposed profits from farm stock, with well-articulated perverse incentives. It's precise and non-circular, though could benefit from slightly more detail about the assessment mechanism.

## source_grounding — 5.0 / 5.0

This is directly grounded in Smith's text from Book III, Chapter 2, where he explicitly discusses the taille as a problematic French tax system. The description of its effects on cultivation and investment aligns closely with Smith's analysis.

## domain_placement — 5.0 / 5.0

"Regulation" is the correct domain placement, as the taille represents a specific regulatory mechanism (taxation) that creates systematic economic distortions. It fits perfectly within Smith's broader analysis of how regulatory frameworks affect economic behavior.

## vsm_relevance — 4.0 / 5.0

This maps well to S3 (internal regulation/audit) as a regulatory mechanism that's supposed to monitor and extract value from economic activity, though it's a dysfunctional one. It also touches on S4 concerns about how regulatory systems adapt (or fail to adapt) to economic realities.

## explanatory_value — 5.0 / 5.0

This entity illuminates a crucial mechanism showing how tax design creates systematic incentives that distort economic behavior, serving as a concrete example of Smith's broader principles about the relationship between institutional design and economic outcomes. It demonstrates structural causation rather than just naming a phenomenon.
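In taille.md the Markdown body repeats the frontmatter: each score becomes a level-2 heading carrying the dimension name and "value / max_value", followed by its rationale. The file's overall_score of 4.6 also equals the unweighted mean of the five dimension values (23.0 / 5). The sketch below reproduces both derivations under the assumption that the same rules apply to every entity; the commit itself does not state them. `overall_score` and `render_body` are hypothetical helper names. To stay standard-library only, the frontmatter is taken as already-parsed dicts, and rationale text is replaced by a placeholder.

```python
from statistics import fmean

# Dimension scores from taille.md, taken as already-parsed frontmatter.
# Rationale text is omitted here to keep the sketch short.
SCORES = [
    {"name": "definition_precision", "value": 4.0, "max_value": 5.0},
    {"name": "source_grounding", "value": 5.0, "max_value": 5.0},
    {"name": "domain_placement", "value": 5.0, "max_value": 5.0},
    {"name": "vsm_relevance", "value": 4.0, "max_value": 5.0},
    {"name": "explanatory_value", "value": 5.0, "max_value": 5.0},
]


def overall_score(scores: list[dict]) -> float:
    """Assumed rule: the unweighted mean of the dimension values."""
    return fmean(s["value"] for s in scores)


def render_body(title: str, scores: list[dict]) -> str:
    """Rebuild the Markdown body: a title heading, then one section per dimension."""
    lines = [f"# Evaluation: {title}", ""]
    for s in scores:
        # \u2014 is the dash character the file uses between dimension name and score.
        lines.append(f"## {s['name']} \u2014 {s['value']} / {s['max_value']}")
        lines.append("")
        lines.append(s.get("rationale", "(rationale omitted)"))
        lines.append("")
    # Drop the trailing blank line and end the text with a single newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(overall_score(SCORES))
    print(render_body("Taille", SCORES))
```

Generating the body from the frontmatter, rather than writing both by hand, keeps the two copies of each rationale from drifting apart.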