markitect-main/examples/infospace-with-history/output/evaluations/bank_financial_stability_metrics.md
tegwick a9ca0adfcf feat(example): add per-entity LLM evaluations for 985 WoN entities (S3.3)
Batch evaluation of all 988 entities via OpenRouter. 984 succeeded on
first pass; 3 failed (network errors). eval-summary --update-metrics
written with per_entity_mean=3.9556.

Viability dashboard: 6/6 PASS
  redundancy_ratio     0.0061  (max 0.10)
  coverage_ratio       0.6190  (min 0.40)
  coherence_comps      0.0000  (max 3)
  consistency_cycles   0.0000  (max 0)
  granularity_entropy  2.6748  (min 1.0)
  per_entity_mean      3.9556  (min 3.5)
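The dashboard pairs each metric with either a ceiling ("max") or a floor ("min"). The sketch below re-checks the six rows. The values and bounds are copied from the dashboard. The `Check` class and `summarize` function are hypothetical, not the project's own code. It also assumes the bounds are inclusive, which is required for `consistency_cycles` (0 with max 0) to pass.

```python
from typing import NamedTuple


class Check(NamedTuple):
    """One dashboard row (hypothetical structure, not the project's own type)."""
    name: str
    value: float
    bound: float
    kind: str  # "max" = ceiling, "min" = floor

    def passes(self) -> bool:
        # Assumption: bounds are inclusive on both sides.
        if self.kind == "max":
            return self.value <= self.bound
        return self.value >= self.bound


# Values and thresholds copied from the viability dashboard above.
DASHBOARD = [
    Check("redundancy_ratio", 0.0061, 0.10, "max"),
    Check("coverage_ratio", 0.6190, 0.40, "min"),
    Check("coherence_comps", 0.0000, 3, "max"),
    Check("consistency_cycles", 0.0000, 0, "max"),
    Check("granularity_entropy", 2.6748, 1.0, "min"),
    Check("per_entity_mean", 3.9556, 3.5, "min"),
]


def summarize(checks: list) -> str:
    """Return an 'N/M PASS' or 'N/M FAIL' line like the dashboard header."""
    passed = sum(c.passes() for c in checks)
    status = "PASS" if passed == len(checks) else "FAIL"
    return f"{passed}/{len(checks)} {status}"


if __name__ == "__main__":
    for c in DASHBOARD:
        verdict = "PASS" if c.passes() else "FAIL"
        print(f"{c.name:<20} {c.value:.4f}  ({c.kind} {c.bound})  {verdict}")
    print(summarize(DASHBOARD))  # 6/6 PASS
```

With strict comparisons, `consistency_cycles` would fail. The reported 6/6 therefore implies inclusive bounds.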

Dimension breakdown (mean across 985 entities):
  definition_precision  3.62
  source_grounding      4.36
  domain_placement      4.56
  vsm_relevance         3.31
  explanatory_value     3.94
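The commit message does not say how `per_entity_mean` relates to the dimension breakdown. A quick arithmetic check shows the two figures are consistent under one assumption: each entity's overall score is the unweighted mean of its five dimensions, and every mean is taken over the same 985 entities. Averaging is linear, so under that assumption the mean of overall scores equals the mean of the five dimension means, up to rounding. The constant names below are illustrative only.

```python
# Dimension means copied from the breakdown above (reported to 2 decimals).
DIMENSION_MEANS = {
    "definition_precision": 3.62,
    "source_grounding": 4.36,
    "domain_placement": 4.56,
    "vsm_relevance": 3.31,
    "explanatory_value": 3.94,
}
PER_ENTITY_MEAN = 3.9556  # reported to 4 decimals

# Each 2-decimal mean may be off by up to 0.005, so their average may be too;
# add the 4-decimal rounding of PER_ENTITY_MEAN itself.
TOLERANCE = 0.005 + 0.00005

# Assumption: overall_score = unweighted mean of the five dimension scores.
implied = sum(DIMENSION_MEANS.values()) / len(DIMENSION_MEANS)

print(f"implied per_entity_mean: {implied:.4f}")  # 3.9580
print(abs(implied - PER_ENTITY_MEAN) <= TOLERANCE)  # True
```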

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:36:46 +01:00


entity_slug    bank_financial_stability_metrics
evaluator      null
evaluated_at   2026-02-23T00:41:51.431624
overall_score  2.4
scores
  name                  value  max_value
  definition_precision  2.0    5.0
  source_grounding      1.0    5.0
  domain_placement      3.0    5.0
  vsm_relevance         4.0    5.0
  explanatory_value     2.0    5.0
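In the scores record above, the recorded `overall_score` of 2.4 equals the unweighted mean of the five dimension values: (2 + 1 + 3 + 4 + 2) / 5 = 2.4. The sketch below reproduces that number. It assumes equal weights, which the source does not confirm, and `DimensionScore` and `overall_score` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScore:
    """One entry of the scores list (hypothetical type)."""
    name: str
    value: float
    max_value: float = 5.0


def overall_score(scores: list) -> float:
    """Unweighted mean of dimension values (assumed aggregation rule)."""
    if not scores:
        raise ValueError("no dimension scores")
    return sum(s.value for s in scores) / len(scores)


# Values copied from this entity's evaluation record.
SCORES = [
    DimensionScore("definition_precision", 2.0),
    DimensionScore("source_grounding", 1.0),
    DimensionScore("domain_placement", 3.0),
    DimensionScore("vsm_relevance", 4.0),
    DimensionScore("explanatory_value", 2.0),
]

print(overall_score(SCORES))  # 2.4
```

The match holds for this one entity. A weighted scheme that happened to give 2.4 here cannot be ruled out from a single record.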

Evaluation: Bank Financial Stability Metrics

definition_precision — 2.0 / 5.0

The definition lists modern banking metrics (capital adequacy ratios, liquidity coverage ratios, stress tests) that are anachronistic for Smith's era, and it creates a vague umbrella term rather than capturing a distinct concept. Its phrasing is also circular and imprecise, describing "measures used to assess banking stability" that "help evaluate banking stability."

source_grounding — 1.0 / 5.0

This entity introduces modern regulatory banking concepts that did not exist in Smith's time and are not discussed in Book II, Chapter 2, which focuses on the nature and accumulation of stock. Smith's banking analysis lacks the sophisticated quantitative metrics described here.

domain_placement — 3.0 / 5.0

While "Regulation" is a reasonable domain for banking oversight concepts, the entity would be better placed in a "Banking" or "Financial Systems" domain since it focuses on internal bank assessment rather than external regulatory frameworks. The domain assignment is defensible but not optimal.

vsm_relevance — 4.0 / 5.0

This entity maps well to VSM System 3 (internal regulation/audit) as these metrics represent monitoring and control mechanisms for assessing organizational health. The concept of systematic measurement for stability aligns naturally with the regulatory/audit function.

explanatory_value — 2.0 / 5.0

While banking stability assessment is conceptually important, this entity merely names modern measurement categories without illuminating underlying mechanisms or structural relationships that Smith actually explored. It adds little explanatory power to understanding Smith's economic framework.