feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)
Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on the first
pass; 3 failed (network errors).

Summary written via eval-summary --update-metrics, with per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
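Each dashboard metric is compared against a single bound. Because consistency_cycles passes at 0.0000 against max 0, the upper bounds appear to be inclusive. Below is a minimal Python sketch of that pass/fail rule. It assumes that lower bounds are inclusive too. `Check`, `DASHBOARD` and `dashboard_status` are hypothetical names for illustration, not the actual eval-summary implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Check:
    """One dashboard metric with an optional lower and/or upper bound."""
    name: str
    value: float
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def passes(self) -> bool:
        # Upper bounds must be inclusive: consistency_cycles = 0 against max 0
        # is reported as PASS. Lower bounds are ASSUMED inclusive as well.
        if self.minimum is not None and self.value < self.minimum:
            return False
        if self.maximum is not None and self.value > self.maximum:
            return False
        return True


# Values and bounds copied from the viability dashboard in this commit message.
DASHBOARD = [
    Check("redundancy_ratio", 0.0061, maximum=0.10),
    Check("coverage_ratio", 0.6190, minimum=0.40),
    Check("coherence_comps", 0.0, maximum=3),
    Check("consistency_cycles", 0.0, maximum=0),
    Check("granularity_entropy", 2.6748, minimum=1.0),
    Check("per_entity_mean", 3.9556, minimum=3.5),
]


def dashboard_status(checks: List[Check]) -> str:
    """Return a summary in the 'passed/total VERDICT' style used above."""
    passed = sum(1 for check in checks if check.passes())
    verdict = "PASS" if passed == len(checks) else "FAIL"
    return f"{passed}/{len(checks)} {verdict}"


if __name__ == "__main__":
    print(dashboard_status(DASHBOARD))  # 6/6 PASS
    for check in DASHBOARD:
        print(f"{check.name:<20} {check.value:.4f} {'PASS' if check.passes() else 'FAIL'}")
```

If a single metric fails, the summary drops to something like "5/6 FAIL". Any one failing bound is therefore enough to flag the run.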
@@ -0,0 +1,63 @@
---
entity_slug: fraud_in_drawback_system
evaluator: null
evaluated_at: '2026-02-23T05:31:00.842843'
overall_score: 2.4
scores:
- name: definition_precision
  value: 1.0
  max_value: 5.0
  rationale: There is no definition provided at all, making it impossible to assess
    precision or distinctness. The entity exists only as a title without any conceptual
    content.
- name: source_grounding
  value: 2.0
  max_value: 5.0
  rationale: While Smith does discuss drawbacks (export bounties/refunds) and mentions
    potential for abuse in tax systems, the specific framing as "fraud in drawback
    system" may not be explicitly articulated as a distinct concept in the source
    text. Without seeing the actual definition and context, it's unclear if this represents
    Smith's own conceptualization.
- name: domain_placement
  value: 3.0
  max_value: 5.0
  rationale: The concept would logically belong in public finance or trade policy
    domains, which are central to Smith's work, but without a specified domain or
    definition, proper placement cannot be confirmed. The economic relevance is apparent
    but underspecified.
- name: vsm_relevance
  value: 4.0
  max_value: 5.0
  rationale: This entity would map well to S3 (internal regulation/audit) as it concerns
    detecting and preventing abuse within government financial systems. The concept
    has clear VSM relevance for control and monitoring functions.
- name: explanatory_value
  value: 2.0
  max_value: 5.0
  rationale: While fraud in tax/trade systems could illuminate important structural
    weaknesses in government finance, without any definition or context provided,
    this entity currently offers no explanatory power beyond naming a potential problem
    area. It remains a surface-level label rather than an analytical concept.
---

# Evaluation: Fraud In Drawback System

## definition_precision — 1.0 / 5.0

There is no definition provided at all, making it impossible to assess precision or distinctness. The entity exists only as a title without any conceptual content.

## source_grounding — 2.0 / 5.0

While Smith does discuss drawbacks (export bounties/refunds) and mentions potential for abuse in tax systems, the specific framing as "fraud in drawback system" may not be explicitly articulated as a distinct concept in the source text. Without seeing the actual definition and context, it's unclear if this represents Smith's own conceptualization.

## domain_placement — 3.0 / 5.0

The concept would logically belong in public finance or trade policy domains, which are central to Smith's work, but without a specified domain or definition, proper placement cannot be confirmed. The economic relevance is apparent but underspecified.

## vsm_relevance — 4.0 / 5.0

This entity would map well to S3 (internal regulation/audit) as it concerns detecting and preventing abuse within government financial systems. The concept has clear VSM relevance for control and monitoring functions.

## explanatory_value — 2.0 / 5.0

While fraud in tax/trade systems could illuminate important structural weaknesses in government finance, without any definition or context provided, this entity currently offers no explanatory power beyond naming a potential problem area. It remains a surface-level label rather than an analytical concept.
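In the file added by this commit, `overall_score: 2.4` matches the unweighted mean of the five dimension values: (1.0 + 2.0 + 3.0 + 4.0 + 2.0) / 5 = 2.4. The Python sketch below shows that aggregation. It also shows how a per-dimension mean like the "Dimension breakdown" in the commit message could be computed. It assumes unweighted means throughout. `DIMENSIONS`, `overall_score` and `dimension_breakdown` are hypothetical names, not taken from the project's evaluation code.

```python
from statistics import fmean
from typing import Dict, List

# The five dimensions scored in each evaluation file.
DIMENSIONS = (
    "definition_precision",
    "source_grounding",
    "domain_placement",
    "vsm_relevance",
    "explanatory_value",
)


def overall_score(scores: Dict[str, float]) -> float:
    """Unweighted mean of one entity's five dimension values (ASSUMED formula)."""
    return fmean(scores[dim] for dim in DIMENSIONS)


def dimension_breakdown(entities: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each dimension across all evaluated entities, rounded to 2 places."""
    return {dim: round(fmean(e[dim] for e in entities), 2) for dim in DIMENSIONS}


# Dimension values copied from the front matter of fraud_in_drawback_system.
fraud_in_drawback_system = {
    "definition_precision": 1.0,
    "source_grounding": 2.0,
    "domain_placement": 3.0,
    "vsm_relevance": 4.0,
    "explanatory_value": 2.0,
}

if __name__ == "__main__":
    print(round(overall_score(fraud_in_drawback_system), 1))  # 2.4
```

Under this reading, the high vsm_relevance (4.0) cannot offset the missing definition. With equal weights, the 1.0 for definition_precision pulls this entity well below the 3.5 per_entity_mean threshold.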