The 2026-06-02 daily-triage canary debugging session uncovered five real
bugs (commits 9de0f49, 435da49, cd4551c, 583ab57, 1b01f0e). They were
slow to find mostly because llm-connect has no way to see what payload
the adapter sent or what the provider returned. Capture the six
structural improvements that would cut the next diagnosis of this shape
from half a day to minutes:
T01 — LLM_CONNECT_DEBUG envelope mode for /execute responses
T02 — ThreadingHTTPServer drop-in replacement for stdlib HTTPServer
T03 — Per-call audit log + replay CLI (LLM_CONNECT_AUDIT_DIR)
T04 — Apply param-translation contract to OpenAI and Gemini adapters
T05 — Provider-agnostic structured-output smoke test in CI
T06 — Document the model_params translation contract for adapter authors
All six are registered in the State Hub under workstream
adhoc-llmc-2026-06-02 (1c936c91-79c7-427d-ab37-9052e8a61cda).
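
A minimal sketch of how T01 and T02 might fit together, using only the
Python standard library. The response field names ("result", "debug",
"adapter_payload", "provider_response"), the truthy values accepted for
LLM_CONNECT_DEBUG, and the stubbed adapter call are assumptions for
illustration, not the actual llm-connect implementation.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer


def debug_enabled(environ=os.environ):
    """True when LLM_CONNECT_DEBUG holds a truthy value (assumed semantics)."""
    return environ.get("LLM_CONNECT_DEBUG", "").lower() in {"1", "true", "yes"}


def build_response(result, adapter_payload, provider_response, environ=os.environ):
    """Return the normal /execute body, plus a debug envelope when enabled."""
    body = {"result": result}
    if debug_enabled(environ):
        # Hypothetical envelope shape: what the adapter sent, what came back.
        body["debug"] = {
            "adapter_payload": adapter_payload,
            "provider_response": provider_response,
        }
    return body


class ExecuteHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/execute":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # Stub standing in for the real adapter/provider round trip.
        adapter_payload = {"prompt": request.get("prompt", "")}
        provider_response = {"text": "stub"}
        body = build_response(provider_response["text"], adapter_payload, provider_response)
        data = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host="127.0.0.1", port=0) -> HTTPServer:
    # ThreadingHTTPServer subclasses HTTPServer and takes the same
    # constructor arguments, so the swap is a one-line change. Each
    # request is then handled in its own thread.
    return ThreadingHTTPServer((host, port), ExecuteHandler)
```

Keeping the envelope logic in a pure function means it can be tested
without starting a server.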
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drafted workplan to move two consumer-side concerns into llm-connect:
- ModelRateRegistry: per-model USD-per-1k-token rates with provenance.
  Rates are a property of the base model, not the application.
- ProblemClass token estimators: generic shapes (chunk-summarization,
entity-extraction, relation-extraction, judge-eval, report-synthesis)
with base dimensions + tunable params; consumer supplies the shape
of its problem and gets a TokenEstimate before any call.
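
A minimal sketch of how the two pieces might compose. ModelRateRegistry
and TokenEstimate are named in the workplan; every field, method, and
default below is an assumption for illustration, and the rates in the
usage example are made-up placeholders, not real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    input_usd_per_1k: float   # USD per 1,000 input tokens
    output_usd_per_1k: float  # USD per 1,000 output tokens
    provenance: str           # where the rate came from (URL, date, note)


@dataclass(frozen=True)
class TokenEstimate:
    input_tokens: int
    output_tokens: int
    calls: int


class ModelRateRegistry:
    """Maps a model name to its rate; hypothetical interface."""

    def __init__(self):
        self._rates = {}

    def register(self, model, rate):
        self._rates[model] = rate

    def get(self, model):
        try:
            return self._rates[model]
        except KeyError:
            raise KeyError(f"no rate registered for model {model!r}") from None

    def cost_usd(self, model, estimate):
        rate = self.get(model)
        return (
            estimate.input_tokens * rate.input_usd_per_1k
            + estimate.output_tokens * rate.output_usd_per_1k
        ) / 1000


def estimate_chunk_summarization(num_chunks, tokens_per_chunk,
                                 prompt_overhead=200, summary_tokens=150):
    """Chunk-summarization shape: one call per chunk (assumed model)."""
    return TokenEstimate(
        input_tokens=num_chunks * (tokens_per_chunk + prompt_overhead),
        output_tokens=num_chunks * summary_tokens,
        calls=num_chunks,
    )


# Usage: the consumer supplies only the shape of its problem.
registry = ModelRateRegistry()
registry.register("example-model", ModelRate(0.001, 0.002, "placeholder"))
estimate = estimate_chunk_summarization(num_chunks=32, tokens_per_chunk=600)
planned_usd = registry.cost_usd("example-model", estimate)
```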
Demand signal: the 2026-05-18 infospace-bench Lefevre Chapter-I smoke
ran 32 calls / 28k tokens / 0.009 USD actual against a planned 8.40
USD. The roughly 1000x overestimate was entirely consumer-side because
there is no rate table in llm-connect to delegate to.
The workplan covers three new modules (rates.py, costs.py,
problem_classes.py) and eight tasks, registered as workstream
869196c5-551b-4eef-b8d8-cca6f770a9b0 under the custodian topic. A
follow-on consumer workplan in infospace-bench will migrate
plan_generation_summary to delegate once T01-T04 land here.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Draft the workplan that extends the static RoutingPolicy (WP-0003) with
a quality observation ledger, a BaselineGrader (ClaudeCodeAdapter as the
default oracle), an AdaptiveRoutingPolicy that picks the cheapest
adapter clearing a per-task quality floor, and a sampled
ShadowingAdapter for production observation collection.
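
A minimal sketch of the selection rule described above: pick the
cheapest adapter whose observed quality clears the floor for a task
type. The ledger interface, the use of a plain mean as the quality
statistic, the cost units, and the fallback behaviour are assumptions
for illustration, not the planned API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Observation:
    adapter: str
    task_type: str
    quality: float  # score in [0, 1], e.g. from a grader against a baseline


class QualityLedger:
    """Append-only store of graded observations; hypothetical interface."""

    def __init__(self):
        self._observations = []

    def record(self, observation):
        self._observations.append(observation)

    def mean_quality(self, adapter, task_type):
        scores = [
            o.quality
            for o in self._observations
            if o.adapter == adapter and o.task_type == task_type
        ]
        return mean(scores) if scores else None


def pick_adapter(ledger, costs, task_type, quality_floor, fallback):
    """Return the cheapest adapter clearing the floor, else the fallback.

    Adapters with no observations for the task type are never selected,
    so an unobserved task type always routes to the fallback.
    """
    candidates = []
    for adapter, cost in costs.items():
        quality = ledger.mean_quality(adapter, task_type)
        if quality is not None and quality >= quality_floor:
            candidates.append((cost, adapter))
    return min(candidates)[1] if candidates else fallback


# Usage with made-up adapters, costs, and scores.
ledger = QualityLedger()
for name, score in [("cheap", 0.6), ("cheap", 0.7),
                    ("mid", 0.9), ("mid", 0.85), ("oracle", 1.0)]:
    ledger.record(Observation(name, "summarize", score))
costs = {"cheap": 1.0, "mid": 5.0, "oracle": 20.0}
choice = pick_adapter(ledger, costs, "summarize", quality_floor=0.8,
                      fallback="oracle")
```

The quality floor and task type are passed in by the caller, which
matches the stated scope: thresholds and taxonomy stay with the
consumer.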
Scope is explicit: ship primitives only. Task-type taxonomy, quality
thresholds, baseline choice, and re-grading cadence stay with the
consumer. infospace-bench is the named first consumer; consumer wiring
is deferred until T01-T03 land.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>