tegwick a9ca0adfcf feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
the first pass; 3 failed (network errors). eval-summary --update-metrics
recorded per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio    0.0061  (max 0.10)
  coverage_ratio      0.6190  (min 0.40)
  coherence_comps     0.0000  (max 3)
  consistency_cycles  0.0000  (max 0)
  granularity_entropy 2.6748  (min 1.0)
  per_entity_mean     3.9556  (min 3.5)
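
The pass/fail logic behind the dashboard can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `METRICS`, `passes`, and `viability_summary` are hypothetical, and the bounds are assumed to be inclusive, which is what a PASS for consistency_cycles at 0 against a max of 0 implies.

```python
# Values and bounds copied from the dashboard above.
# The data layout and function names are illustrative assumptions.
METRICS = {
    "redundancy_ratio": (0.0061, "max", 0.10),
    "coverage_ratio": (0.6190, "min", 0.40),
    "coherence_comps": (0.0, "max", 3),
    "consistency_cycles": (0.0, "max", 0),
    "granularity_entropy": (2.6748, "min", 1.0),
    "per_entity_mean": (3.9556, "min", 3.5),
}


def passes(value, kind, bound):
    """Return True if value respects the bound (inclusive, by assumption)."""
    if kind == "max":
        return value <= bound
    return value >= bound


def viability_summary(metrics):
    """Check every metric and build a label such as '6/6 PASS'."""
    results = {name: passes(*spec) for name, spec in metrics.items()}
    passed = sum(results.values())
    verdict = "PASS" if passed == len(results) else "FAIL"
    return results, f"{passed}/{len(results)} {verdict}"


results, label = viability_summary(METRICS)
print(label)
```

With the values listed in the dashboard, every check holds, which agrees with the reported 6/6 PASS.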

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-23 09:36:46 +01:00

3.5 KiB


entity_slug	evaluator	evaluated_at	overall_score
colony_economic_system_outcomes	null	2026-02-23T04:55:04.529955	2.6

scores

name	value	max_value	rationale
definition_precision	2.0	5.0	The definition is vague and circular, essentially saying "outcomes are results that fell short due to flaws" without specifying what constitutes these outcomes or providing measurable criteria. It functions more as a general evaluative statement than a precise conceptual definition.
source_grounding	4.0	5.0	This is well-grounded in Smith's actual analysis in Book V, Chapter 3, where he systematically evaluates the disappointing results of colonial policies and traces them to mercantilist misconceptions. The entity accurately reflects Smith's empirical assessment of colonial economic performance.
domain_placement	3.0	5.0	While "Regulation" captures the policy evaluation aspect, this entity spans multiple domains since it encompasses trade outcomes, fiscal results, and broader economic effects. It might be better placed in a more comprehensive domain like "Colonial Economics" or "Policy Analysis."
vsm_relevance	2.0	5.0	This entity represents evaluative outcomes rather than operational mechanisms, making it difficult to map to specific VSM systems. It's more of a performance assessment that could theoretically relate to S3 (audit) but lacks the structural specificity that VSM analysis requires.
explanatory_value	2.0	5.0	The entity merely labels disappointing results without illuminating the underlying mechanisms that produced these outcomes or the structural relationships between colonial policies and their effects. It describes a surface phenomenon rather than explaining causal processes.

Evaluation: Colony Economic System Outcomes

definition_precision — 2.0 / 5.0

The definition is vague and circular, essentially saying "outcomes are results that fell short due to flaws" without specifying what constitutes these outcomes or providing measurable criteria. It functions more as a general evaluative statement than a precise conceptual definition.

source_grounding — 4.0 / 5.0

This is well-grounded in Smith's actual analysis in Book V, Chapter 3, where he systematically evaluates the disappointing results of colonial policies and traces them to mercantilist misconceptions. The entity accurately reflects Smith's empirical assessment of colonial economic performance.

domain_placement — 3.0 / 5.0

While "Regulation" captures the policy evaluation aspect, this entity spans multiple domains since it encompasses trade outcomes, fiscal results, and broader economic effects. It might be better placed in a more comprehensive domain like "Colonial Economics" or "Policy Analysis."

vsm_relevance — 2.0 / 5.0

This entity represents evaluative outcomes rather than operational mechanisms, making it difficult to map to specific VSM systems. It's more of a performance assessment that could theoretically relate to S3 (audit) but lacks the structural specificity that VSM analysis requires.

explanatory_value — 2.0 / 5.0

The entity merely labels disappointing results without illuminating the underlying mechanisms that produced these outcomes or the structural relationships between colonial policies and their effects. It describes a surface phenomenon rather than explaining causal processes.
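
The overall_score of 2.6 for this entity is consistent with an unweighted mean of the five dimension values. Whether the evaluation pipeline actually computes it this way is an assumption. The sketch below only shows that the arithmetic matches, and the names `SCORES` and `overall_score` are hypothetical.

```python
from statistics import mean

# Dimension values copied from the evaluation above.
SCORES = {
    "definition_precision": 2.0,
    "source_grounding": 4.0,
    "domain_placement": 3.0,
    "vsm_relevance": 2.0,
    "explanatory_value": 2.0,
}


def overall_score(scores):
    """Unweighted mean of the dimension values (assumed aggregation rule)."""
    return round(mean(scores.values()), 2)


print(overall_score(SCORES))
```

The sum of the five values is 13.0, and 13.0 / 5 = 2.6, matching the recorded overall_score.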
