feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch-evaluated all 988 entities via OpenRouter: 984 succeeded on the
first pass and 3 failed (network errors). Metrics updated with
eval-summary --update-metrics (per_entity_mean=3.9556).

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
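The dashboard reduces to six threshold comparisons. The sketch below shows one way to express them. The metric names, values and limits come from this commit. The `Check` class and `dashboard` function are hypothetical illustrations, not the project's eval-summary code. `consistency_cycles` passes at 0.0000 against `max 0`, which implies that `max` bounds are inclusive. Treating `min` bounds as inclusive too is an assumption.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Check:
    """Hypothetical dashboard row: one metric compared against one limit."""
    metric: str
    value: float
    bound: Literal["min", "max"]
    limit: float

    def passed(self) -> bool:
        # Inclusive bounds: "max 0" must accept 0.0 (as consistency_cycles does);
        # inclusive "min" bounds are an assumption.
        if self.bound == "max":
            return self.value <= self.limit
        return self.value >= self.limit


# Values and limits copied from the commit message.
CHECKS = [
    Check("redundancy_ratio", 0.0061, "max", 0.10),
    Check("coverage_ratio", 0.6190, "min", 0.40),
    Check("coherence_comps", 0.0, "max", 3),
    Check("consistency_cycles", 0.0, "max", 0),
    Check("granularity_entropy", 2.6748, "min", 1.0),
    Check("per_entity_mean", 3.9556, "min", 3.5),
]


def dashboard(checks: list[Check]) -> str:
    """Render a summary line followed by one row per metric."""
    n_passed = sum(c.passed() for c in checks)
    verdict = "PASS" if n_passed == len(checks) else "FAIL"
    lines = [f"Viability dashboard: {n_passed}/{len(checks)} {verdict}"]
    for c in checks:
        lines.append(f"  {c.metric:<20} {c.value:.4f}  ({c.bound} {c.limit})")
    return "\n".join(lines)


print(dashboard(CHECKS))  # first line: Viability dashboard: 6/6 PASS
```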

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94
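The dimension breakdown and `per_entity_mean` aggregate the per-entity files added in this commit. The sketch below assumes the following, and this commit confirms none of it:

- Each file's front matter has already been loaded into a dict with the same keys as the YAML in the diff (`overall_score`, `scores[].name`, `scores[].value`).
- `per_entity_mean` is the plain mean of `overall_score` across entities.
- `dimension_means` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def dimension_means(evaluations):
    """Aggregate parsed evaluation front matter across entities.

    Each item is assumed to look like
    {"overall_score": float, "scores": [{"name": str, "value": float}, ...]}.
    Returns (per-dimension means rounded to 2 places,
             mean overall_score rounded to 4 places).
    """
    by_dimension = defaultdict(list)
    overall_scores = []
    for ev in evaluations:
        overall_scores.append(ev["overall_score"])
        for score in ev["scores"]:
            by_dimension[score["name"]].append(score["value"])
    per_dim = {name: round(mean(vals), 2) for name, vals in by_dimension.items()}
    # Assumption: per_entity_mean is the plain mean of overall_score values.
    return per_dim, round(mean(overall_scores), 4)


# Usage with the one evaluation included in this diff (rationales omitted).
stability = {
    "overall_score": 2.6,
    "scores": [
        {"name": "definition_precision", "value": 2.0},
        {"name": "source_grounding", "value": 2.0},
        {"name": "domain_placement", "value": 4.0},
        {"name": "vsm_relevance", "value": 3.0},
        {"name": "explanatory_value", "value": 2.0},
    ],
}
per_dim, overall = dimension_means([stability])
print(per_dim["domain_placement"], overall)  # 4.0 2.6
```

The rounding to two and four decimals only mirrors how the commit message displays these figures.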

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletion


@@ -0,0 +1,64 @@
---
entity_slug: systemic_stability_analysis
evaluator: null
evaluated_at: '2026-02-23T06:28:40.185979'
overall_score: 2.6
scores:
- name: definition_precision
  value: 2.0
  max_value: 5.0
  rationale: The definition is quite vague and circular, using terms like "stability
    and resilience" without clear operational meaning. It functions more as an umbrella
    term for various economic considerations rather than capturing a distinct analytical
    concept.
- name: source_grounding
  value: 2.0
  max_value: 5.0
  rationale: While Smith does discuss economic stability in Book IV, Chapter 6, he
    doesn't present "systemic stability analysis" as a formal analytical framework
    or methodology. This appears to impose modern systems thinking terminology onto
    Smith's more contextual discussions of policy effects.
- name: domain_placement
  value: 4.0
  max_value: 5.0
  rationale: '"General Theory" is appropriate since this concept, if it exists in
    Smith, would span multiple economic domains rather than belonging to a specific
    area like trade or production. The broad theoretical nature fits this classification
    well.'
- name: vsm_relevance
  value: 3.0
  max_value: 5.0
  rationale: This concept could potentially map to S3 (internal regulation) or S4
    (intelligence/adaptation) functions, as it involves monitoring system health and
    adapting to maintain stability. However, the vague definition makes precise VSM
    mapping difficult.
- name: explanatory_value
  value: 2.0
  max_value: 5.0
  rationale: As currently defined, this entity provides little explanatory power beyond
    naming the general concern for economic stability. It doesn't illuminate specific
    mechanisms, trade-offs, or structural relationships that Smith actually analyzes
    in his work.
---
# Evaluation: Systemic Stability Analysis
## definition_precision — 2.0 / 5.0
The definition is quite vague and circular, using terms like "stability and resilience" without clear operational meaning. It functions more as an umbrella term for various economic considerations rather than capturing a distinct analytical concept.
## source_grounding — 2.0 / 5.0
While Smith does discuss economic stability in Book IV, Chapter 6, he doesn't present "systemic stability analysis" as a formal analytical framework or methodology. This appears to impose modern systems thinking terminology onto Smith's more contextual discussions of policy effects.
## domain_placement — 4.0 / 5.0
"General Theory" is appropriate since this concept, if it exists in Smith, would span multiple economic domains rather than belonging to a specific area like trade or production. The broad theoretical nature fits this classification well.
## vsm_relevance — 3.0 / 5.0
This concept could potentially map to S3 (internal regulation) or S4 (intelligence/adaptation) functions, as it involves monitoring system health and adapting to maintain stability. However, the vague definition makes precise VSM mapping difficult.
## explanatory_value — 2.0 / 5.0
As currently defined, this entity provides little explanatory power beyond naming the general concern for economic stability. It doesn't illuminate specific mechanisms, trade-offs, or structural relationships that Smith actually analyzes in his work.
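Two properties of this file can be checked mechanically:

- The `overall_score` of 2.6 equals the unweighted mean of the five dimension values, (2.0 + 2.0 + 4.0 + 3.0 + 2.0) / 5.
- The Markdown body repeats each front-matter score as a heading followed by its rationale.

The sketch below assumes both properties hold for every generated file. This single example is consistent with that but does not prove it. `recompute_overall`, `render_body` and the slug-to-title rule are hypothetical. The code works on an already-parsed dict, so it needs only the standard library.

```python
from statistics import mean

# Parsed front matter of this file; rationales omitted for brevity.
EVALUATION = {
    "entity_slug": "systemic_stability_analysis",
    "overall_score": 2.6,
    "scores": [
        {"name": "definition_precision", "value": 2.0, "max_value": 5.0},
        {"name": "source_grounding", "value": 2.0, "max_value": 5.0},
        {"name": "domain_placement", "value": 4.0, "max_value": 5.0},
        {"name": "vsm_relevance", "value": 3.0, "max_value": 5.0},
        {"name": "explanatory_value", "value": 2.0, "max_value": 5.0},
    ],
}


def recompute_overall(ev: dict, ndigits: int = 4) -> float:
    """Assumed rule: overall_score is the unweighted mean of all dimension values."""
    return round(mean(s["value"] for s in ev["scores"]), ndigits)


def slug_to_title(slug: str) -> str:
    """Assumed rule: 'systemic_stability_analysis' -> 'Systemic Stability Analysis'."""
    return slug.replace("_", " ").title()


def render_body(ev: dict) -> str:
    """Rebuild the Markdown body: a title, then one heading per score."""
    lines = [f"# Evaluation: {slug_to_title(ev['entity_slug'])}"]
    for s in ev["scores"]:
        # U+2014 is the separator character used in the generated headings.
        lines.append(f"## {s['name']} \u2014 {s['value']} / {s['max_value']}")
        if s.get("rationale"):
            lines.append(s["rationale"])
    return "\n".join(lines)


assert recompute_overall(EVALUATION) == EVALUATION["overall_score"]
print(render_body(EVALUATION))
```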