feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)

Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on
first pass; 3 failed (network errors). Ran eval-summary --update-metrics,
which wrote per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
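
The pass/fail reading of the dashboard above can be reproduced with a short sketch. It is not the project's actual checker: the `METRICS` table, the `passes` helper, and the choice of inclusive bounds are assumptions made for illustration (inclusive bounds are what let consistency_cycles pass at 0 against max 0).

```python
# Hypothetical re-creation of the viability check; structure and names are assumptions.
# Each entry: metric name -> (observed value, bound kind, bound), copied from the dashboard.
METRICS = {
    "redundancy_ratio": (0.0061, "max", 0.10),
    "coverage_ratio": (0.6190, "min", 0.40),
    "coherence_comps": (0.0, "max", 3),
    "consistency_cycles": (0.0, "max", 0),
    "granularity_entropy": (2.6748, "min", 1.0),
    "per_entity_mean": (3.9556, "min", 3.5),
}


def passes(value: float, kind: str, bound: float) -> bool:
    """Return True when value respects the bound (bounds assumed inclusive)."""
    return value <= bound if kind == "max" else value >= bound


results = {name: passes(*spec) for name, spec in METRICS.items()}
passed = sum(results.values())
print(f"{passed}/{len(results)} PASS")
```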

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00
parent 81a4c8796a
commit a9ca0adfcf
986 changed files with 63216 additions and 1 deletions


@@ -0,0 +1,62 @@
---
entity_slug: non_enumerated_commodities
evaluator: null
evaluated_at: '2026-02-23T06:02:36.867347'
overall_score: 1.4
scores:
- name: definition_precision
value: 1.0
max_value: 5.0
rationale: There is no definition provided at all, making this entity completely
imprecise. Without any definitional content, it's impossible to assess whether
this captures a distinct concept or represents a vague umbrella term.
- name: source_grounding
value: 2.0
max_value: 5.0
rationale: While "non-enumerated commodities" appears to reference a legitimate
historical concept from colonial trade policy (goods not specifically listed in
navigation acts), there's no evidence this entity is actually grounded in Smith's
text. The complete absence of source chapter information and context suggests
poor connection to the actual corpus.
- name: domain_placement
value: 1.0
max_value: 5.0
rationale: The domain is listed as "unspecified," making it impossible to assess
whether the economic/thematic categorization is appropriate. This represents a
fundamental failure in conceptual organization within the infospace.
- name: vsm_relevance
value: 2.0
max_value: 5.0
rationale: If properly defined, this concept might relate to S1 (operational trade
categories) or S4 (environmental adaptation to regulatory frameworks), but the
complete lack of definition makes VSM mapping speculative at best. The entity
is too underdeveloped to meaningfully place within the VSM framework.
- name: explanatory_value
value: 1.0
max_value: 5.0
rationale: With no definition, context, or source grounding, this entity provides
zero explanatory power about economic mechanisms or structural relations. It currently
functions as nothing more than an empty label that illuminates no phenomena whatsoever.
---
# Evaluation: Non Enumerated Commodities
## definition_precision — 1.0 / 5.0
There is no definition provided at all, making this entity completely imprecise. Without any definitional content, it's impossible to assess whether this captures a distinct concept or represents a vague umbrella term.
## source_grounding — 2.0 / 5.0
While "non-enumerated commodities" appears to reference a legitimate historical concept from colonial trade policy (goods not specifically listed in navigation acts), there's no evidence this entity is actually grounded in Smith's text. The complete absence of source chapter information and context suggests poor connection to the actual corpus.
## domain_placement — 1.0 / 5.0
The domain is listed as "unspecified," making it impossible to assess whether the economic/thematic categorization is appropriate. This represents a fundamental failure in conceptual organization within the infospace.
## vsm_relevance — 2.0 / 5.0
If properly defined, this concept might relate to S1 (operational trade categories) or S4 (environmental adaptation to regulatory frameworks), but the complete lack of definition makes VSM mapping speculative at best. The entity is too underdeveloped to meaningfully place within the VSM framework.
## explanatory_value — 1.0 / 5.0
With no definition, context, or source grounding, this entity provides zero explanatory power about economic mechanisms or structural relations. It currently functions as nothing more than an empty label that illuminates no phenomena whatsoever.
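
The `overall_score` of 1.4 in the front matter is consistent with an unweighted mean of the five dimension values. A minimal sketch under that assumption follows; the evaluation tool's real aggregation is not shown in this commit, and the `scores` dict and rounding to one decimal are illustrative choices.

```python
from statistics import mean

# Dimension values copied from the front matter of this evaluation.
scores = {
    "definition_precision": 1.0,
    "source_grounding": 2.0,
    "domain_placement": 1.0,
    "vsm_relevance": 2.0,
    "explanatory_value": 1.0,
}

# Assumption: overall_score is the plain (unweighted) mean, rounded to one decimal.
overall = round(mean(scores.values()), 1)
print(overall)
```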