feat(infospace): add eval-summary command and improve evaluate pipeline (S3.3)

- Fix evaluate dimensions to match template file:
  definition_precision, source_grounding, domain_placement,
  vsm_relevance, explanatory_value (was domain_relevance,
  discipline_alignment, conceptual_clarity)
- Add VSM background context to evaluation prompt so LLM can
  score vsm_relevance without macro injection
- Fix model_name bug: was sending literal "default" to API (HTTP 400)
- Refactor run_entity_evaluation to write files incrementally via
  callback rather than all at once after the batch — long runs are
  now resumable if interrupted
- Add incremental skip in CLI: entities with existing eval files
  are skipped automatically on re-run (acts as resume)
- Add eval-summary command: reads all eval files, shows per-dimension
  means, optionally writes per_entity_mean to metrics.yaml
- Fix record_check_results to merge rather than overwrite metrics.yaml
  so per_entity_mean survives subsequent check runs
- Add per_entity_mean viability threshold (min: 3.5) to infospace.yaml
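
Note: the incremental write and skip-on-re-run behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's code. The names run_evaluation, evaluate_with_resume and on_result, and the one-JSON-file-per-entity layout, are assumptions made for the example; the real eval file format is not shown in this commit.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_evaluation(
    entities: Iterable[str],
    score: Callable[[str], dict],
    on_result: Callable[[str, dict], None],
) -> None:
    """Score each entity and pass the result to on_result right away."""
    for entity in entities:
        # Calling the callback per entity means earlier results are
        # already persisted if a later call to score() fails.
        on_result(entity, score(entity))


def evaluate_with_resume(
    entities: Iterable[str],
    score: Callable[[str], dict],
    out_dir: Path,
) -> list[str]:
    """Evaluate entities that have no eval file yet; return their names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: one "<entity>.json" file per evaluated entity.
    pending = [e for e in entities if not (out_dir / f"{e}.json").exists()]

    def write(entity: str, result: dict) -> None:
        (out_dir / f"{entity}.json").write_text(json.dumps(result))

    run_evaluation(pending, score, write)
    return pending
```

Because the skip check looks only at which files exist, re-running the same command after an interruption picks up at the first entity without a file.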

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bernd Worsch

2026-02-23 01:26:45 +01:00

parent 574bb11db6

commit 7f1eecbdb2

7 changed files with 242 additions and 42 deletions

									
examples/infospace-with-history/infospace.yaml
@@ -37,6 +37,8 @@ viability:
     max: 0
   granularity_entropy:
     min: 1.0
+  per_entity_mean:
+    min: 3.5  # LLM quality score across 5 dimensions (1-5 scale)
 pipeline:
   stages:
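
Note: to show how the new threshold might interact with eval-summary and the metrics merge, here is a small sketch. It assumes per_entity_mean is the average, over entities, of each entity's mean across the five dimensions; the commit does not spell out the formula. The function names summarize, merge_metrics and meets_minimum are hypothetical, and plain dicts stand in for metrics.yaml.

```python
from statistics import mean

# Dimension names as listed in the commit message.
DIMENSIONS = (
    "definition_precision",
    "source_grounding",
    "domain_placement",
    "vsm_relevance",
    "explanatory_value",
)


def summarize(evals: dict[str, dict[str, float]]) -> dict:
    """Compute per-dimension means and an overall per-entity mean."""
    per_dimension = {
        d: mean(scores[d] for scores in evals.values()) for d in DIMENSIONS
    }
    # Assumed definition: average each entity over its dimensions first,
    # then average those entity-level values.
    per_entity_mean = mean(
        mean(scores[d] for d in DIMENSIONS) for scores in evals.values()
    )
    return {"per_dimension": per_dimension, "per_entity_mean": per_entity_mean}


def merge_metrics(existing: dict, update: dict) -> dict:
    """Return existing metrics with update applied; untouched keys survive."""
    merged = dict(existing)
    merged.update(update)
    return merged


def meets_minimum(metrics: dict, key: str, minimum: float) -> bool:
    """True if the metric is present and at or above its minimum."""
    return key in metrics and metrics[key] >= minimum
```

With merge semantics, a later check run that only reports granularity_entropy leaves a previously written per_entity_mean in place, so the min: 3.5 threshold can still be evaluated.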
