feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 985 succeeded on
the first pass; 3 failed (network errors). Metrics were written with
eval-summary --update-metrics, giving per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
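
The PASS count above can be reproduced with a minimal sketch like the one below. It is illustrative only, not the project's code: the THRESHOLDS table, the check_viability helper, and the inclusive comparisons are assumptions (inclusive bounds are at least consistent with consistency_cycles passing at 0 against max 0).

```python
# Hypothetical threshold table mirroring the dashboard: metric -> (kind, limit).
THRESHOLDS = {
    "redundancy_ratio": ("max", 0.10),
    "coverage_ratio": ("min", 0.40),
    "coherence_comps": ("max", 3),
    "consistency_cycles": ("max", 0),
    "granularity_entropy": ("min", 1.0),
    "per_entity_mean": ("min", 3.5),
}

# Values copied from the dashboard in this commit message.
METRICS = {
    "redundancy_ratio": 0.0061,
    "coverage_ratio": 0.6190,
    "coherence_comps": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.6748,
    "per_entity_mean": 3.9556,
}


def check_viability(metrics, thresholds):
    """Return {metric: passed}, treating every bound as inclusive (assumption)."""
    results = {}
    for name, (kind, limit) in thresholds.items():
        value = metrics[name]
        results[name] = value <= limit if kind == "max" else value >= limit
    return results


results = check_viability(METRICS, THRESHOLDS)
passed = sum(results.values())
print(f"{passed}/{len(results)} PASS")  # prints "6/6 PASS"
```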

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletions


@@ -0,0 +1,61 @@
---
entity_slug: colony_economic_system_implementation
evaluator: null
evaluated_at: '2026-02-23T04:54:27.728479'
overall_score: 1.2
scores:
- name: definition_precision
  value: 1.0
  max_value: 5.0
  rationale: There is no definition provided at all, making this entity completely
    imprecise. Without any definitional content, it's impossible to assess whether
    this captures a distinct concept or represents a vague umbrella term.
- name: source_grounding
  value: 1.0
  max_value: 5.0
  rationale: With no definition, context, or source chapter specified, there's no
    evidence this entity is grounded in Smith's actual text. The generic title suggests
    it may be an artificial construct rather than a concept Smith explicitly developed.
- name: domain_placement
  value: 2.0
  max_value: 5.0
  rationale: While "Colony Economic System Implementation" sounds economically relevant
    to Smith's work on colonial trade, the unspecified domain and lack of definition
    make it impossible to verify correct thematic placement. The title alone suggests
    economic relevance but provides no substantive content to evaluate.
- name: vsm_relevance
  value: 1.0
  max_value: 5.0
  rationale: Without any definition or context, it's impossible to determine which
    VSM system this entity might map to or whether it has any VSM relevance at all.
    The term "implementation" could theoretically relate to S1 operations, but this
    is pure speculation given the lack of content.
- name: explanatory_value
  value: 1.0
  max_value: 5.0
  rationale: An entity with no definition, context, or source grounding provides zero
    explanatory power. It neither illuminates mechanisms nor structural relations,
    functioning merely as an empty label that names nothing substantive.
---
# Evaluation: Colony Economic System Implementation
## definition_precision — 1.0 / 5.0
There is no definition provided at all, making this entity completely imprecise. Without any definitional content, it's impossible to assess whether this captures a distinct concept or represents a vague umbrella term.
## source_grounding — 1.0 / 5.0
With no definition, context, or source chapter specified, there's no evidence this entity is grounded in Smith's actual text. The generic title suggests it may be an artificial construct rather than a concept Smith explicitly developed.
## domain_placement — 2.0 / 5.0
While "Colony Economic System Implementation" sounds economically relevant to Smith's work on colonial trade, the unspecified domain and lack of definition make it impossible to verify correct thematic placement. The title alone suggests economic relevance but provides no substantive content to evaluate.
## vsm_relevance — 1.0 / 5.0
Without any definition or context, it's impossible to determine which VSM system this entity might map to or whether it has any VSM relevance at all. The term "implementation" could theoretically relate to S1 operations, but this is pure speculation given the lack of content.
## explanatory_value — 1.0 / 5.0
An entity with no definition, context, or source grounding provides zero explanatory power. It neither illuminates mechanisms nor structural relations, functioning merely as an empty label that names nothing substantive.
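
The `overall_score` of 1.2 in the front matter is consistent with an unweighted mean of the five dimension values. The sketch below checks that arithmetic. Treating the aggregation as a plain mean is an assumption, since the evaluator's actual formula is not shown here, and the `scores` dictionary is a hand-copied illustration rather than parsed output.

```python
from statistics import mean

# Dimension values copied from this evaluation's front matter.
scores = {
    "definition_precision": 1.0,
    "source_grounding": 1.0,
    "domain_placement": 2.0,
    "vsm_relevance": 1.0,
    "explanatory_value": 1.0,
}

# Assumption: overall_score is the unweighted mean of the dimension values.
overall = round(mean(scores.values()), 1)
print(overall)  # prints 1.2
```

If the evaluator weights dimensions differently, the same check would need the weights in place of the plain mean.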