generated from coulomb/repo-seed
b0d67ae79e9174c1e24c3317242920dbfb5f7814
Add --shadow-baseline <id> and --shadow-rate <float> opt-in flags to
generate run, generate resume, and generate from-source. When
--shadow-baseline names a candidate id from the routing config,
build_routing_policy_from_config wraps every other candidate in an
llm-connect ShadowingAdapter using that baseline plus a
PairedGrader(ExactMatchJudge()) and the workspace-resolved
QualityLedger. The baseline candidate itself is never wrapped, since that
would shadow it against itself. --shadow-rate defaults to 0.1 when
--shadow-baseline is set; passing --shadow-rate without
--shadow-baseline fails fast with shadow_rate_without_baseline.
Setting --shadow-baseline without a ledger_path in the config fails
with missing_routing_ledger_for_shadow so observations have a place to
land before any call goes out.
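A minimal sketch of the fail-fast checks described above, not the repository's actual code. The function name resolve_shadow_options, the ShadowConfigError type, and the unknown_shadow_baseline code are hypothetical. Only shadow_rate_without_baseline, missing_routing_ledger_for_shadow, and the 0.1 default come from the text.

```python
from typing import Optional, Sequence, Tuple

DEFAULT_SHADOW_RATE = 0.1  # default stated in the commit message


class ShadowConfigError(ValueError):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code


def resolve_shadow_options(
    shadow_baseline: Optional[str],
    shadow_rate: Optional[float],
    candidate_ids: Sequence[str],
    ledger_path: Optional[str],
) -> Optional[Tuple[str, float]]:
    """Validate the two flags; return (baseline_id, rate) or None when shadowing is off."""
    if shadow_baseline is None:
        if shadow_rate is not None:
            # A rate without a baseline has nothing to shadow against.
            raise ShadowConfigError(
                "shadow_rate_without_baseline",
                "--shadow-rate requires --shadow-baseline",
            )
        return None

    if shadow_baseline not in candidate_ids:
        # The code name below is an assumption; the text only says this fails fast.
        raise ShadowConfigError(
            "unknown_shadow_baseline",
            f"{shadow_baseline!r} is not a candidate id in the routing config",
        )

    if not ledger_path:
        # Observations need somewhere to land before any call goes out.
        raise ShadowConfigError(
            "missing_routing_ledger_for_shadow",
            "routing config has no ledger_path",
        )

    rate = DEFAULT_SHADOW_RATE if shadow_rate is None else shadow_rate
    return shadow_baseline, rate
```

Running every check before any adapter is built keeps a misconfigured run from issuing calls whose observations could not be recorded.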
run_generation gained shadow_baseline and shadow_rate kwargs, and
_adapter_for("routing", ...) plumbs them into
build_routing_policy_from_config. The wrapped ShadowingAdapter slots
into the policy's prefer/fallback per task type via a
(candidate_id, task_type) reverse lookup, and adapters_by_id on the
adaptive policy gets the string-keyed entries.
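The wrapping rule can be illustrated as follows. This is a sketch under stated assumptions: ShadowWrapper is a hypothetical stand-in for llm-connect's ShadowingAdapter, whose real constructor is not shown in this message, and wrap_non_baseline is an illustrative helper, not build_routing_policy_from_config itself.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ShadowWrapper:
    """Hypothetical stand-in for ShadowingAdapter: pairs a candidate with the baseline."""

    primary: Any
    baseline: Any
    rate: float


def wrap_non_baseline(
    adapters_by_id: Dict[str, Any], baseline_id: str, rate: float
) -> Dict[str, Any]:
    """Return a string-keyed map where every candidate except the baseline is wrapped."""
    if baseline_id not in adapters_by_id:
        raise KeyError(baseline_id)
    baseline = adapters_by_id[baseline_id]
    return {
        # The baseline stays raw; wrapping it would shadow it against itself.
        cid: adapter if cid == baseline_id else ShadowWrapper(adapter, baseline, rate)
        for cid, adapter in adapters_by_id.items()
    }
```

For example, wrap_non_baseline({"fast": fast, "ref": ref}, "ref", 0.1) leaves "ref" untouched and returns a wrapper around fast that holds ref as its baseline.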
Five new tests cover: shadow_rate without baseline fails fast, shadow
mode without a ledger fails fast, unknown shadow baseline id fails
fast, structural assertion that ShadowingAdapter wraps non-baseline
candidates and leaves the baseline raw, and a behavioural check that
shadow_rate=1.0 calls the baseline on every call while shadow_rate=0.0
skips it entirely. The behavioural test forces async_shadow=False so the
call counter is deterministic.
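The behavioural check can be pictured with a small stand-in. This is a hedged sketch: SampledShadow and CountingBackend are hypothetical classes, the complete method name is an assumption, and the per-call random draw is one plausible way to implement a shadow rate, not necessarily how ShadowingAdapter does it. The baseline call is synchronous here, which mirrors the reason the test sets async_shadow=False.

```python
import random
from typing import Optional


class CountingBackend:
    """Hypothetical adapter that counts how often it is called."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"{self.name}:{prompt}"


class SampledShadow:
    """Hypothetical wrapper: always calls primary, calls baseline with probability `rate`."""

    def __init__(
        self,
        primary: CountingBackend,
        baseline: CountingBackend,
        rate: float,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.primary = primary
        self.baseline = baseline
        self.rate = rate
        self._rng = rng or random.Random()

    def complete(self, prompt: str) -> str:
        result = self.primary.complete(prompt)
        # random() returns a float in [0.0, 1.0), so rate=1.0 always
        # shadows and rate=0.0 never does.
        if self._rng.random() < self.rate:
            self.baseline.complete(prompt)  # synchronous, so counts are exact
        return result


def baseline_calls(rate: float, n: int = 20) -> int:
    """Run n calls through the wrapper and report how many reached the baseline."""
    primary, baseline = CountingBackend("primary"), CountingBackend("baseline")
    shadow = SampledShadow(primary, baseline, rate)
    for i in range(n):
        shadow.complete(f"prompt-{i}")
    return baseline.calls
```

Only the two endpoint rates give an exact count without seeding the random generator, which is why they make convenient test cases.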
Closes IB-WP-0020: T01-T05 all done. Workplan status flips from active
to finished. 179 tests pass, 2 skipped (both live OpenRouter smokes).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
infospace-bench
Workspace and service for creating, developing, evaluating, and inspecting structured knowledge spaces.
This repo is the application-layer successor for the infospace work that began
inside markitect-main. It focuses on concrete infospaces and their lifecycle,
while lower-level markdown tooling and runtime orchestration remain in sibling
projects.
Start with:
INTENT.md
wiki/ProductRequirementsDocument.md
wiki/FunctionalRequirementsSpecification.md
SCOPE.md
docs/infospace-layout.md
docs/evaluation-and-inspection.md
docs/reference-pilot-decision.md
docs/markitect-main-scope-assessment.md
docs/markitect-tool-adapter.md
docs/entity-relation-model.md
docs/evaluation-history-and-metrics.md
docs/workflow-generation-pipeline.md
docs/kontextual-engine-boundary.md
docs/orthogonal-successor-roadmap.md
docs/legacy-infospace-feature-inventory.md
docs/successor-boundary-interface-map.md
docs/replacement-acceptance-matrix.md
docs/legacy-command-parity.md
docs/legacy-infospace-migration-guide.md
docs/replacement-readiness-decision.md
docs/wealth-vsm-generation-pipeline.md
docs/generic-source-generator.md
docs/agentic-memory-profile-pilot.md
docs/lefevre-epub3-validation.md
infospaces/bootstrap-pilot/
infospaces/wealth-vsm-legacy-slice/
infospaces/wealth-vsm-generation-pilot/
infospaces/agentic-memory-profile-pilot/
workplans/
Current development command:
python3 -m pytest