- Fix evaluate dimensions to match template file: definition_precision, source_grounding, domain_placement, vsm_relevance, explanatory_value (was domain_relevance, discipline_alignment, conceptual_clarity)
- Add VSM background context to evaluation prompt so LLM can score vsm_relevance without macro injection
- Fix model_name bug: was sending literal "default" to API (HTTP 400)
- Refactor run_entity_evaluation to write files incrementally via callback rather than all at once after the batch; long runs are now resumable if interrupted
- Add incremental skip in CLI: entities with existing eval files are skipped automatically on re-run (acts as resume)
- Add eval-summary command: reads all eval files, shows per-dimension means, optionally writes per_entity_mean to metrics.yaml
- Fix record_check_results to merge rather than overwrite metrics.yaml so per_entity_mean survives subsequent check runs
- Add per_entity_mean viability threshold (min: 3.5) to infospace.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
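The resume behavior described in this commit can be sketched as follows. It covers writing each evaluation as soon as it finishes, skipping entities that already have an eval file, and merging metrics rather than overwriting them. This is a minimal sketch, not the repository's code. The function names, the callback signature, and the `<slug>.md` eval-file naming are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical sketch: these names and the "<slug>.md" file layout are
# illustrative assumptions, not the repository's actual API.


def pending_slugs(slugs: Iterable[str], eval_dir: Path) -> list[str]:
    """Return the slugs that do not yet have an eval file (resume on re-run)."""
    return [s for s in slugs if not (eval_dir / f"{s}.md").exists()]


def run_evaluations(
    slugs: Iterable[str],
    eval_dir: Path,
    evaluate: Callable[[str], str],
    on_result: Callable[[str, str], None],
) -> int:
    """Evaluate each pending entity and pass the result to on_result at once,
    so an interrupted run keeps everything finished so far."""
    done = 0
    for slug in pending_slugs(slugs, eval_dir):
        on_result(slug, evaluate(slug))
        done += 1
    return done


def merge_metrics(existing: dict, updates: dict) -> dict:
    """Overlay new metrics on the old ones so unrelated keys
    (e.g. per_entity_mean) survive a later check run."""
    return {**existing, **updates}
```

Because `on_result` persists each result immediately, re-running after an interruption simply finds fewer pending slugs.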
# Evaluate Economic Entity

You are a quality assessor evaluating a single economic entity extracted from Adam Smith's *The Wealth of Nations* and mapped to Stafford Beer's Viable System Model (VSM). Your task is to score the entity on five quality dimensions and produce a structured evaluation.
## Entity Under Evaluation
@{entity_content}
## Source Chapter
@{source_chapter}
## VSM Framework Reference
@{vsm_framework}
## Quality Rubric
@{quality_rubric}
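The four placeholders above are filled in before the prompt is sent. The following is a minimal substitution sketch under stated assumptions: placeholder names are lowercase with underscores, and a missing value should be an error rather than silently left in place. The actual template engine may behave differently, and `render` is a hypothetical name.

```python
import re

# Assumed placeholder syntax: an at-sign, then a lowercase/underscore name in braces.
PLACEHOLDER = re.compile(r"@\{([a-z_]+)\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder with its value; fail loudly on unknown names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)
```

Passing a function to `re.sub` inserts each value literally. Backslashes in chapter text are therefore not interpreted as replacement escapes.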
## Instructions
- Read the entity carefully, including its definition, source chapter, context, economic domain, and any VSM mapping information provided.
- Locate the relevant passage in the source chapter to verify source grounding.
- Consult the VSM framework reference to assess VSM relevance.
- Score each dimension 1–5 using the rubric above. Use the full range: reserve 5 for genuinely excellent entries and 1 for clear failures.
- For each dimension, write exactly one sentence justifying the score.
- Compute the overall score as the mean of the five dimension scores, rounded to two decimal places.
- List any flags for issues that warrant attention (empty list if none).
  Valid flags: `circular-definition`, `missing-citation`, `wrong-domain`, `no-vsm-mapping`, `redundant-with-<slug>` (where `<slug>` names the duplicated entity), `overclaimed-strength`, `underclaimed-strength`.
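The scoring arithmetic and flag vocabulary above can be checked mechanically once the model responds. This is a minimal sketch that assumes integer scores and kebab-case slugs (lowercase letters and digits separated by hyphens). The function names are hypothetical.

```python
import re

# Dimension keys and flag names are copied from the template; the
# kebab-case slug pattern is an assumption.
DIMENSIONS = (
    "definition_precision",
    "source_grounding",
    "domain_placement",
    "vsm_relevance",
    "explanatory_value",
)
FIXED_FLAGS = {
    "circular-definition",
    "missing-citation",
    "wrong-domain",
    "no-vsm-mapping",
    "overclaimed-strength",
    "underclaimed-strength",
}
REDUNDANT_FLAG = re.compile(r"redundant-with-[a-z0-9]+(?:-[a-z0-9]+)*")


def overall_score(scores: dict[str, int]) -> float:
    """Mean of the five dimension scores, rounded to two decimal places."""
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be between 1 and 5, got {scores[dim]}")
    return round(sum(scores[dim] for dim in DIMENSIONS) / len(DIMENSIONS), 2)


def invalid_flags(flags: list[str]) -> list[str]:
    """Return every flag that is neither a fixed flag nor redundant-with-<slug>."""
    return [f for f in flags if f not in FIXED_FLAGS and not REDUNDANT_FLAG.fullmatch(f)]
```

Because the overall score is the mean of five integers, it has at most one decimal place. Rounding to two decimals therefore never hits a halfway case.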
## Output Format
Output YAML front matter (entity slug, scores, and flags) followed by a Markdown section with one justification per dimension. Do not include any text outside this structure.
```markdown
---
entity: <slug of the entity, kebab-case>
scores:
  definition_precision: <1-5>
  source_grounding: <1-5>
  domain_placement: <1-5>
  vsm_relevance: <1-5>
  explanatory_value: <1-5>
  overall: <mean rounded to 2 decimal places>
flags: []
---
## Justifications
**Definition Precision (<score>/5):** <one sentence>
**Source Grounding (<score>/5):** <one sentence>
**Domain Placement (<score>/5):** <one sentence>
**VSM Relevance (<score>/5):** <one sentence>
**Explanatory Value (<score>/5):** <one sentence>
```