feat(infospace): add eval-summary command and improve evaluate pipeline (S3.3)

- Fix evaluate dimensions to match template file:
  definition_precision, source_grounding, domain_placement,
  vsm_relevance, explanatory_value (was domain_relevance,
  discipline_alignment, conceptual_clarity)
- Add VSM background context to evaluation prompt so the LLM can
  score vsm_relevance without macro injection
- Fix model_name bug: was sending literal "default" to API (HTTP 400)
- Refactor run_entity_evaluation to write files incrementally via
  callback rather than all at once after the batch, so long runs
  are resumable if interrupted
- Add incremental skip in CLI: entities with existing eval files
  are skipped automatically on re-run (acts as resume)
- Add eval-summary command: reads all eval files, shows per-dimension
  means, optionally writes per_entity_mean to metrics.yaml
- Fix record_check_results to merge rather than overwrite metrics.yaml
  so per_entity_mean survives subsequent check runs
- Add per_entity_mean viability threshold (min: 3.5) to infospace.yaml
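
A minimal sketch of the incremental write and skip-on-re-run behaviour
described above. This is not the code from this commit: the names
evaluate_incrementally, make_writer, score_entity and on_result are
hypothetical, and one JSON file per entity is an assumed layout.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def evaluate_incrementally(
    entity_ids: Iterable[str],
    eval_dir: Path,
    score_entity: Callable[[str], dict],
    on_result: Callable[[str, dict], None],
) -> list[str]:
    """Evaluate entities one at a time, skipping those already on disk.

    Hypothetical helper: the file naming (<entity_id>.json) is an assumption.
    """
    evaluated = []
    for entity_id in entity_ids:
        # An existing eval file means an earlier run already finished this entity.
        if (eval_dir / f"{entity_id}.json").exists():
            continue
        result = score_entity(entity_id)
        # Persist right away through the callback instead of after the batch,
        # so an interrupted run loses at most the entity in flight.
        on_result(entity_id, result)
        evaluated.append(entity_id)
    return evaluated


def make_writer(eval_dir: Path) -> Callable[[str, dict], None]:
    """Build a callback that writes one result file per entity."""
    def write(entity_id: str, result: dict) -> None:
        (eval_dir / f"{entity_id}.json").write_text(json.dumps(result))
    return write
```

Because the skip check looks only at which files exist, re-running the
same command after an interruption continues where the last run stopped.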

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:26:45 +01:00
parent 574bb11db6
commit 7f1eecbdb2
7 changed files with 242 additions and 42 deletions


@@ -934,3 +934,29 @@
     concern: C1
   metadata:
     source: collection-checks
+- snapshot_id: 090bb961
+  created_at: '2026-02-23T00:22:25.818146+00:00'
+  schema_name: default
+  entity_count: 988
+  entity_evaluations: []
+  collection_metrics:
+  - name: coherence_components
+    value: 0.0
+    concern: C3
+  - name: consistency_cycles
+    value: 0.0
+    concern: C4
+  - name: coverage_ratio
+    value: 0.6190476190476191
+    concern: C2
+  - name: granularity_entropy
+    value: 2.6747519428200657
+    concern: C5
+  - name: modularity
+    value: 0.0
+    concern: C3
+  - name: redundancy_ratio
+    value: 0.006072874493927126
+    concern: C1
+  metadata:
+    source: collection-checks


@@ -1,6 +1,7 @@
 coherence_components: 0.0
 consistency_cycles: 0.0
-coverage_ratio: 0.442424
-granularity_entropy: 2.953326
+coverage_ratio: 0.619048
+granularity_entropy: 2.674752
 modularity: 0.0
-redundancy_ratio: 0.005877
+per_entity_mean: 4.42
+redundancy_ratio: 0.006073
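
The hunk above shows per_entity_mean sitting alongside refreshed check values, which is the merge behaviour the commit message describes. A rough sketch of that flow follows, working on plain dicts rather than YAML files so it stays self-contained. The function names are hypothetical and are not taken from this commit. The way per_entity_mean is computed here (the mean of each entity's mean score across its dimensions) is an assumption, since the commit does not state the definition. Only the 3.5 minimum comes from the commit message.

```python
from statistics import mean


def per_dimension_means(evaluations: list[dict[str, float]]) -> dict[str, float]:
    """Average each dimension over all entities that have a score for it."""
    dimensions = sorted({dim for ev in evaluations for dim in ev})
    return {
        dim: mean(ev[dim] for ev in evaluations if dim in ev)
        for dim in dimensions
    }


def per_entity_mean(evaluations: list[dict[str, float]]) -> float:
    """ASSUMED definition: mean over entities of each entity's mean score."""
    return mean(mean(ev.values()) for ev in evaluations)


def merge_metrics(existing: dict, updates: dict) -> dict:
    """Return a copy of existing with updates applied; untouched keys survive."""
    merged = dict(existing)
    merged.update(updates)
    return merged


def meets_threshold(metrics: dict, name: str = "per_entity_mean",
                    minimum: float = 3.5) -> bool:
    """Check a viability threshold; a missing metric counts as not met."""
    return name in metrics and metrics[name] >= minimum


# Values taken from the metrics diff above.
existing = {"coverage_ratio": 0.442424, "per_entity_mean": 4.42}
check_results = {"coverage_ratio": 0.619048, "redundancy_ratio": 0.006073}
merged = merge_metrics(existing, check_results)
```

With an overwrite instead of a merge, a later check run would drop per_entity_mean and the threshold check would fail for lack of a value, not because of a low score.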