
tegwick a9ca0adfcf feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
first pass; 3 failed (network errors). Ran eval-summary --update-metrics;
metrics written with per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
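Below is a minimal sketch of the pass/fail logic a dashboard like this could use. It assumes each threshold is an inclusive lower ("min") or upper ("max") bound. That reading is what lets consistency_cycles = 0 pass against max 0. The Threshold type and the passes/dashboard functions are hypothetical illustrations, not the project's code. Only the metric values and bounds come from the dashboard.

```python
# Hypothetical sketch of a viability-dashboard check (not the project's code).
# Assumption: every bound is inclusive, and each metric has exactly one bound.
from typing import NamedTuple


class Threshold(NamedTuple):
    kind: str      # "min" (lower bound) or "max" (upper bound)
    bound: float


# Values and bounds copied from the dashboard in the commit message.
METRICS = {
    "redundancy_ratio": (0.0061, Threshold("max", 0.10)),
    "coverage_ratio": (0.6190, Threshold("min", 0.40)),
    "coherence_comps": (0.0, Threshold("max", 3)),
    "consistency_cycles": (0.0, Threshold("max", 0)),
    "granularity_entropy": (2.6748, Threshold("min", 1.0)),
    "per_entity_mean": (3.9556, Threshold("min", 3.5)),
}


def passes(value: float, t: Threshold) -> bool:
    """Return True if value satisfies its inclusive bound."""
    if t.kind == "min":
        return value >= t.bound
    if t.kind == "max":
        return value <= t.bound
    raise ValueError(f"unknown threshold kind: {t.kind!r}")


def dashboard(metrics: dict) -> dict:
    """Map each metric name to 'PASS' or 'FAIL'."""
    return {
        name: "PASS" if passes(value, t) else "FAIL"
        for name, (value, t) in metrics.items()
    }


if __name__ == "__main__":
    results = dashboard(METRICS)
    for name, status in results.items():
        print(f"{name:<20} {status}")
    n_pass = sum(s == "PASS" for s in results.values())
    print(f"{n_pass}/{len(results)} PASS")
```

With strict inequalities instead, consistency_cycles would fail and the summary would read 5/6. The inclusive choice is the one consistent with the reported 6/6.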

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94
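The breakdown can be cross-checked against per_entity_mean. Suppose each entity's overall_score is the unweighted mean of its five dimension scores, which matches the 2.6 recorded for economic_system_benchmark. Suppose also that every entity was scored on all five dimensions. Then the mean of overall scores equals the mean of the five dimension means: (3.62 + 4.36 + 4.56 + 3.31 + 3.94) / 5 = 19.79 / 5 = 3.958. Each two-decimal mean is within 0.005 of its true value, so 3.958 should lie within 0.005 of 3.9556. The sketch below checks exactly that. Both assumptions are inferred, not stated in the commit.

```python
# Sanity check: does per_entity_mean agree with the dimension breakdown?
# Assumptions (inferred): overall_score is the unweighted mean of the five
# dimensions, and every entity has all five, so the two means commute.
DIMENSION_MEANS = {  # rounded to 2 decimals in the commit message
    "definition_precision": 3.62,
    "source_grounding": 4.36,
    "domain_placement": 4.56,
    "vsm_relevance": 3.31,
    "explanatory_value": 3.94,
}
REPORTED_PER_ENTITY_MEAN = 3.9556

# Each rounded mean is within 0.005 of its true value, so their average is too.
ROUNDING_TOLERANCE = 0.005


def implied_per_entity_mean(dim_means: dict) -> float:
    """Mean of the dimension means (equals the mean overall score under the assumptions)."""
    return sum(dim_means.values()) / len(dim_means)


implied = implied_per_entity_mean(DIMENSION_MEANS)
print(f"implied {implied:.4f} vs reported {REPORTED_PER_ENTITY_MEAN}")
assert abs(implied - REPORTED_PER_ENTITY_MEAN) <= ROUNDING_TOLERANCE
```

This prints `implied 3.9580 vs reported 3.9556`. The 0.0024 gap is within the rounding tolerance.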

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-23 09:36:46 +01:00

3.6 KiB


entity_slug                 evaluator  evaluated_at                 overall_score
economic_system_benchmark   null       2026-02-23T05:13:07.382052   2.6

scores
  name                  value  max_value
  definition_precision  2.0    5.0
  source_grounding      2.0    5.0
  domain_placement      3.0    5.0
  vsm_relevance         4.0    5.0
  explanatory_value     2.0    5.0

Evaluation: Economic System Benchmark

definition_precision — 2.0 / 5.0

The definition is quite vague and circular, essentially defining benchmarks as "reference points and measures used to evaluate" without specifying what constitutes these benchmarks or how they function. It reads more like a generic description of evaluation criteria than a precise economic concept.

source_grounding — 2.0 / 5.0

While Smith does compare different economic systems in Book IV, the text does not explicitly discuss "benchmarks" as a distinct analytical framework or methodology. This appears to impose modern evaluation terminology onto Smith's comparative analysis rather than extracting a concept he actually articulated.

domain_placement — 3.0 / 5.0

"General Theory" is appropriate given the broad evaluative nature described, though the entity is so abstract it could arguably fit in multiple domains. The placement isn't wrong but reflects the entity's lack of specificity.

vsm_relevance — 4.0 / 5.0

This entity maps well to S3 (internal regulation/audit) and S4 (intelligence/environmental adaptation) functions, as benchmarking involves both performance monitoring and comparative intelligence gathering. The evaluative and comparative aspects align naturally with VSM control and adaptation mechanisms.

explanatory_value — 2.0 / 5.0

The entity provides minimal explanatory power, functioning more as a meta-label for evaluation processes rather than illuminating specific mechanisms or structural relations in Smith's economic theory. It names a general phenomenon without revealing how economic comparison actually works in Smith's framework.
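The stored overall_score of 2.6 is consistent with an unweighted mean of the five dimension values: (2.0 + 2.0 + 3.0 + 4.0 + 2.0) / 5 = 13.0 / 5 = 2.6. The sketch below recomputes it under that assumption. The Score dataclass, the overall_score function, and its lower bound of 0 are hypothetical. The record itself only shows value and max_value.

```python
# Recompute overall_score for economic_system_benchmark from its dimension scores.
# Assumption: overall_score is the unweighted mean of the dimension values;
# max_value (5.0 for every dimension here) is only used as a range check.
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    name: str
    value: float
    max_value: float


SCORES = [
    Score("definition_precision", 2.0, 5.0),
    Score("source_grounding", 2.0, 5.0),
    Score("domain_placement", 3.0, 5.0),
    Score("vsm_relevance", 4.0, 5.0),
    Score("explanatory_value", 2.0, 5.0),
]


def overall_score(scores: list[Score], ndigits: int = 2) -> float:
    """Unweighted mean of the dimension values, rounded for display."""
    if not scores:
        raise ValueError("no scores to average")
    for s in scores:
        # Assumed valid range [0, max_value]; the real scale minimum is not shown.
        if not 0.0 <= s.value <= s.max_value:
            raise ValueError(f"{s.name}: {s.value} outside [0, {s.max_value}]")
    return round(sum(s.value for s in scores) / len(scores), ndigits)


print(overall_score(SCORES))  # 2.6
```

A weighted mean would be a one-line change to the return expression. The equal-weight version is simply the one that reproduces the recorded 2.6.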
