
tegwick a9ca0adfcf feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on
first pass; 3 failed (network errors). eval-summary --update-metrics
written with per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-23 09:36:46 +01:00
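
The viability dashboard in the commit message pairs each metric with a `max` or `min` bound. A minimal sketch of how such a gate check could be tallied follows. The `Gate` class and its field names are hypothetical and not taken from the repository, and the bounds are assumed to be inclusive, since `consistency_cycles` is reported as passing at 0.0000 against `max 0`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One dashboard row (hypothetical structure, for illustration only)."""
    name: str
    value: float
    bound: float
    kind: str  # "max": value must not exceed bound; "min": value must reach it

    def passes(self) -> bool:
        # Assumption: bounds are inclusive on both sides.
        if self.kind == "max":
            return self.value <= self.bound
        return self.value >= self.bound


# Values and bounds copied from the commit message.
GATES = [
    Gate("redundancy_ratio", 0.0061, 0.10, "max"),
    Gate("coverage_ratio", 0.6190, 0.40, "min"),
    Gate("coherence_comps", 0.0000, 3, "max"),
    Gate("consistency_cycles", 0.0000, 0, "max"),
    Gate("granularity_entropy", 2.6748, 1.0, "min"),
    Gate("per_entity_mean", 3.9556, 3.5, "min"),
]

passed = sum(g.passes() for g in GATES)
summary = f"{passed}/{len(GATES)} {'PASS' if passed == len(GATES) else 'FAIL'}"
print(summary)  # 6/6 PASS
```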

3.9 KiB


entity_slug	evaluator	evaluated_at	overall_score
economic_system_outcomes	null	2026-02-23T05:19:26.313408	3.0

scores

name	value	max_value	rationale
definition_precision	3.0	5.0	The definition captures a meaningful concept about evaluating economic system results, but it's somewhat broad and could be more precise about what constitutes "outcomes" versus other related concepts like performance metrics or effectiveness measures. The inclusion of both wealth generation and distribution effects adds useful specificity.
source_grounding	2.0	5.0	While Smith does compare different economic systems throughout Book IV, this entity appears to abstract beyond what the source explicitly discusses - Smith focuses more on the mechanisms and principles of systems rather than systematically analyzing "outcomes" as a distinct analytical category. The concept feels more like a modern analytical framework imposed on Smith's work.
domain_placement	4.0	5.0	"General Theory" is appropriate since this concept would span across Smith's analysis of different economic arrangements rather than belonging to a specific system like mercantilism or physiocracy. The cross-cutting nature of evaluating system results fits well in this domain.
vsm_relevance	4.0	5.0	This entity maps well to S3 (internal regulation/audit) as it concerns monitoring and evaluating system performance, and potentially S4 (intelligence) as outcomes inform adaptation and policy decisions. The evaluative nature of outcomes assessment is central to viable system functioning.
explanatory_value	2.0	5.0	This entity primarily names a category of analysis rather than illuminating specific mechanisms or structural relations that Smith identifies. It's more of a meta-analytical concept that could apply to any economic system evaluation rather than revealing particular insights about how wealth creation actually works.

Evaluation: Economic System Outcomes

definition_precision — 3.0 / 5.0

The definition captures a meaningful concept about evaluating economic system results, but it's somewhat broad and could be more precise about what constitutes "outcomes" versus other related concepts like performance metrics or effectiveness measures. The inclusion of both wealth generation and distribution effects adds useful specificity.

source_grounding — 2.0 / 5.0

While Smith does compare different economic systems throughout Book IV, this entity appears to abstract beyond what the source explicitly discusses - Smith focuses more on the mechanisms and principles of systems rather than systematically analyzing "outcomes" as a distinct analytical category. The concept feels more like a modern analytical framework imposed on Smith's work.

domain_placement — 4.0 / 5.0

"General Theory" is appropriate since this concept would span across Smith's analysis of different economic arrangements rather than belonging to a specific system like mercantilism or physiocracy. The cross-cutting nature of evaluating system results fits well in this domain.

vsm_relevance — 4.0 / 5.0

This entity maps well to S3 (internal regulation/audit) as it concerns monitoring and evaluating system performance, and potentially S4 (intelligence) as outcomes inform adaptation and policy decisions. The evaluative nature of outcomes assessment is central to viable system functioning.

explanatory_value — 2.0 / 5.0

This entity primarily names a category of analysis rather than illuminating specific mechanisms or structural relations that Smith identifies. It's more of a meta-analytical concept that could apply to any economic system evaluation rather than revealing particular insights about how wealth creation actually works.
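
The five dimension scores for this entity (3, 2, 4, 4, 2) average to the recorded `overall_score` of 3.0. The sketch below assumes the overall score is the unweighted mean of the dimensions. The source does not state the formula, so this is only consistent with this one record, not confirmed. It also checks that the mean of the five dimension means in the commit message lands near the reported `per_entity_mean`. The variable names are illustrative.

```python
from statistics import mean

# Dimension values for economic_system_outcomes, copied from the record.
scores = {
    "definition_precision": 3.0,
    "source_grounding": 2.0,
    "domain_placement": 4.0,
    "vsm_relevance": 4.0,
    "explanatory_value": 2.0,
}

# Assumption: overall_score is the unweighted mean of the five dimensions.
overall = mean(scores.values())
print(overall)  # 3.0

# Dimension means from the commit message, already rounded to two decimals.
dimension_means = [3.62, 4.36, 4.56, 3.31, 3.94]
reported_per_entity_mean = 3.9556

# Under the same assumption, the mean of the dimension means should land
# close to per_entity_mean; a small gap is expected because the inputs
# are rounded.
gap = abs(mean(dimension_means) - reported_per_entity_mean)
print(gap < 0.005)  # True
```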
