markitect-main/examples/infospace-with-history/output/evaluations/assaying.md
tegwick a9ca0adfcf feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)
Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on
first pass; 3 failed (network errors). eval-summary --update-metrics
written with per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
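
The dashboard above reads as a set of threshold checks, which can be sketched as follows. The values and limits are copied from the commit message; the tuple layout, the `check` helper, and the choice of inclusive bounds are assumptions made for illustration (inclusive bounds are the only reading under which `consistency_cycles` at 0 passes a max of 0). This is not the project's actual implementation.

```python
# Metric values and limits copied from the viability dashboard.
# The (value, kind, limit) layout is a hypothetical encoding.
METRICS = {
    "redundancy_ratio": (0.0061, "max", 0.10),
    "coverage_ratio": (0.6190, "min", 0.40),
    "coherence_comps": (0.0, "max", 3),
    "consistency_cycles": (0.0, "max", 0),
    "granularity_entropy": (2.6748, "min", 1.0),
    "per_entity_mean": (3.9556, "min", 3.5),
}


def check(value, kind, limit):
    """Return True when value respects an inclusive min or max bound."""
    if kind == "max":
        return value <= limit
    if kind == "min":
        return value >= limit
    raise ValueError(f"unknown bound kind: {kind!r}")


# Evaluate every metric and summarise in the dashboard's "n/m PASS" style.
results = {name: check(*spec) for name, spec in METRICS.items()}
summary = f"{sum(results.values())}/{len(results)} PASS"
print(summary)
```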

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94
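
As a quick consistency check, the five dimension means can be averaged and compared with the reported `per_entity_mean`. This sketch assumes that each entity's overall score is the unweighted mean of its five dimension scores, which the commit message does not state explicitly. Under that assumption the two figures should agree up to the rounding of the dimension means to two decimals.

```python
from statistics import mean

# Dimension means as listed in the breakdown (rounded to two decimals there).
dimension_means = {
    "definition_precision": 3.62,
    "source_grounding": 4.36,
    "domain_placement": 4.56,
    "vsm_relevance": 3.31,
    "explanatory_value": 3.94,
}

reported_per_entity_mean = 3.9556  # from the viability dashboard

# Unweighted mean of the five dimension means (assumed aggregation rule).
mean_of_means = mean(dimension_means.values())
gap = abs(mean_of_means - reported_per_entity_mean)

# Each input is rounded to 0.01, so the mean can be off by at most 0.005.
consistent = gap <= 0.005
print(round(mean_of_means, 4), round(gap, 4), consistent)
```

The gap stays inside the rounding tolerance, so the breakdown is at least compatible with an unweighted mean; it does not prove that this is how the tool aggregates.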

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00


entity_slug    assaying
evaluator      null
evaluated_at   2026-02-23T00:35:54.499834
overall_score  4.4

scores
  name                  value  max_value
  definition_precision  4.0    5.0
  source_grounding      5.0    5.0
  domain_placement      5.0    5.0
  vsm_relevance         4.0    5.0
  explanatory_value     4.0    5.0

Evaluation: Assaying

definition_precision — 4.0 / 5.0

The definition is precise and captures a distinct technical process - testing metal purity to verify quality in exchange. It clearly distinguishes assaying from other metal-related processes and specifies its role in addressing uncertainty about metal quality.

source_grounding — 5.0 / 5.0

This entity is directly grounded in Smith's text where he explicitly discusses the inconveniences of using unstamped metals, including the difficulty and uncertainty of determining their purity. The concept emerges naturally from Smith's analysis of early exchange mechanisms.

domain_placement — 5.0 / 5.0

The "Exchange" domain placement is correct, as assaying is fundamentally about verifying the quality of exchange media (metals) to facilitate trade. This fits perfectly within Smith's discussion of the evolution of exchange mechanisms.

vsm_relevance — 4.0 / 5.0

Assaying maps well to S3 (internal regulation/audit) as it represents a quality control and verification function within exchange systems. It could also relate to S2 (coordination) by reducing uncertainty and enabling smoother transactions between parties.

explanatory_value — 4.0 / 5.0

This entity illuminates a crucial mechanism in the evolution of money - how societies addressed the problem of verifying exchange media quality. It helps explain why stamped coinage emerged as a solution to the assaying problem, showing structural relations in monetary development.
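
The five scores above reproduce the `overall_score` of 4.4 from the file metadata when averaged without weights. The sketch below shows that arithmetic; the `Score` dataclass is a hypothetical container mirroring the `name`, `value`, and `max_value` fields, and the unweighted mean is an inference from the numbers, not a documented rule of the evaluation tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Score:
    """Hypothetical record mirroring one entry of the `scores` list."""
    name: str
    value: float
    max_value: float


# Values copied from the evaluation of the `assaying` entity.
scores = [
    Score("definition_precision", 4.0, 5.0),
    Score("source_grounding", 5.0, 5.0),
    Score("domain_placement", 5.0, 5.0),
    Score("vsm_relevance", 4.0, 5.0),
    Score("explanatory_value", 4.0, 5.0),
]

# Assumed aggregation: unweighted mean of the dimension values.
overall_score = round(mean(s.value for s in scores), 2)
print(overall_score)
```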