markitect-main/examples/infospace-with-history/output/evaluations/economic_system_benchmark.md
tegwick a9ca0adfcf feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)
Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
first pass; 3 failed (network errors). eval-summary --update-metrics
written with per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio   0.0061  (max 0.10)
  coverage_ratio     0.6190  (min 0.40)
  coherence_comps    0.0000  (max 3)
  consistency_cycles 0.0000  (max 0)
  granularity_entropy 2.6748 (min 1.0)
  per_entity_mean    3.9556  (min 3.5)

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
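
The "6/6 PASS" line in the commit message can be re-derived from the six metric values and their bounds. The following is a minimal sketch, not the project's actual checker: the `DASHBOARD` table and the `passes` helper are hypothetical names, and reading "(max …)" as "value must not exceed" and "(min …)" as "value must reach at least" is an assumption taken from the labels.

```python
# Values and bounds copied from the viability dashboard in the commit message.
# Assumption: "max" means value <= bound, "min" means value >= bound.
DASHBOARD = {
    "redundancy_ratio": (0.0061, "max", 0.10),
    "coverage_ratio": (0.6190, "min", 0.40),
    "coherence_comps": (0.0000, "max", 3),
    "consistency_cycles": (0.0000, "max", 0),
    "granularity_entropy": (2.6748, "min", 1.0),
    "per_entity_mean": (3.9556, "min", 3.5),
}


def passes(value, kind, bound):
    """Return True if the value satisfies its bound (hypothetical helper)."""
    return value <= bound if kind == "max" else value >= bound


# Evaluate every metric and count how many satisfy their bound.
results = {name: passes(*spec) for name, spec in DASHBOARD.items()}
passed = sum(results.values())
print(f"{passed}/{len(results)} PASS")
```

Under those assumptions all six metrics satisfy their bounds, which agrees with the dashboard summary.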

3.6 KiB

| entity_slug | evaluator | evaluated_at | overall_score |
| --- | --- | --- | --- |
| economic_system_benchmark | null | 2026-02-23T05:13:07.382052 | 2.6 |

scores

| name | value | max_value | rationale |
| --- | --- | --- | --- |
| definition_precision | 2.0 | 5.0 | The definition is quite vague and circular, essentially defining benchmarks as "reference points and measures used to evaluate" without specifying what constitutes these benchmarks or how they function. It reads more like a generic description of evaluation criteria than a precise economic concept. |
| source_grounding | 2.0 | 5.0 | While Smith does compare different economic systems in Book IV, the text does not explicitly discuss "benchmarks" as a distinct analytical framework or methodology. This appears to impose modern evaluation terminology onto Smith's comparative analysis rather than extracting a concept he actually articulated. |
| domain_placement | 3.0 | 5.0 | "General Theory" is appropriate given the broad evaluative nature described, though the entity is so abstract it could arguably fit in multiple domains. The placement isn't wrong but reflects the entity's lack of specificity. |
| vsm_relevance | 4.0 | 5.0 | This entity maps well to S3 (internal regulation/audit) and S4 (intelligence/environmental adaptation) functions, as benchmarking involves both performance monitoring and comparative intelligence gathering. The evaluative and comparative aspects align naturally with VSM control and adaptation mechanisms. |
| explanatory_value | 2.0 | 5.0 | The entity provides minimal explanatory power, functioning more as a meta-label for evaluation processes rather than illuminating specific mechanisms or structural relations in Smith's economic theory. It names a general phenomenon without revealing how economic comparison actually works in Smith's framework. |

Evaluation: Economic System Benchmark

definition_precision — 2.0 / 5.0

The definition is quite vague and circular, essentially defining benchmarks as "reference points and measures used to evaluate" without specifying what constitutes these benchmarks or how they function. It reads more like a generic description of evaluation criteria than a precise economic concept.

source_grounding — 2.0 / 5.0

While Smith does compare different economic systems in Book IV, the text does not explicitly discuss "benchmarks" as a distinct analytical framework or methodology. This appears to impose modern evaluation terminology onto Smith's comparative analysis rather than extracting a concept he actually articulated.

domain_placement — 3.0 / 5.0

"General Theory" is appropriate given the broad evaluative nature described, though the entity is so abstract it could arguably fit in multiple domains. The placement isn't wrong but reflects the entity's lack of specificity.

vsm_relevance — 4.0 / 5.0

This entity maps well to S3 (internal regulation/audit) and S4 (intelligence/environmental adaptation) functions, as benchmarking involves both performance monitoring and comparative intelligence gathering. The evaluative and comparative aspects align naturally with VSM control and adaptation mechanisms.

explanatory_value — 2.0 / 5.0

The entity provides minimal explanatory power, functioning more as a meta-label for evaluation processes rather than illuminating specific mechanisms or structural relations in Smith's economic theory. It names a general phenomenon without revealing how economic comparison actually works in Smith's framework.
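
The recorded `overall_score` of 2.6 is consistent with an unweighted mean of the five dimension scores above. The source does not state how the overall score is computed, so the aggregation below is an assumption that happens to match the numbers; the variable names are illustrative only.

```python
from statistics import mean

# Dimension scores copied from this evaluation (each out of 5.0).
scores = {
    "definition_precision": 2.0,
    "source_grounding": 2.0,
    "domain_placement": 3.0,
    "vsm_relevance": 4.0,
    "explanatory_value": 2.0,
}

# Assumption: overall_score is the plain (unweighted) mean, rounded to one decimal.
overall = round(mean(scores.values()), 1)
print(overall)
```

If the evaluator weighted the dimensions differently, the result would generally differ from 2.6, so the match is suggestive rather than conclusive.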