The 2026-06-02 daily-triage canary debugging session uncovered five real
bugs (commits 9de0f49, 435da49, cd4551c, 583ab57, 1b01f0e). Most of the
time went into diagnosis, because llm-connect offers no way to see what
payload the adapter sent or what the provider returned. Capture the six
structural improvements that would cut the next diagnosis of this kind
from half a day to minutes:
T01 — LLM_CONNECT_DEBUG envelope mode for /execute responses
T02 — ThreadingHTTPServer drop-in replacement for stdlib HTTPServer
T03 — Per-call audit log + replay CLI (LLM_CONNECT_AUDIT_DIR)
T04 — Apply param-translation contract to OpenAI and Gemini adapters
T05 — Provider-agnostic structured-output smoke test in CI
T06 — Document the model_params translation contract for adapter authors
All six registered in the State Hub under workstream
adhoc-llmc-2026-06-02 (1c936c91-79c7-427d-ab37-9052e8a61cda).
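To make T03 concrete, here is a minimal sketch of a per-call audit record. It assumes one JSON file per call under LLM_CONNECT_AUDIT_DIR, holding the exact outbound payload and the raw provider reply. The function names write_audit_record and load_audit_record, and the record's field names, are hypothetical and not an existing llm-connect API.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path
from typing import Any, Optional


def write_audit_record(
    request_payload: dict[str, Any],
    provider_response: Any,
    audit_dir: Optional[str] = None,
) -> Optional[Path]:
    """Persist one call's outbound payload and raw provider reply as JSON.

    Auditing is off unless a directory is passed or LLM_CONNECT_AUDIT_DIR is set.
    """
    directory = audit_dir or os.environ.get("LLM_CONNECT_AUDIT_DIR")
    if not directory:
        return None
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    call_id = uuid.uuid4().hex
    record = {
        "call_id": call_id,
        "timestamp": time.time(),
        "request": request_payload,     # exactly what the adapter sent
        "response": provider_response,  # exactly what the provider returned
    }
    target = root / f"{call_id}.json"
    # default=str keeps non-JSON values (e.g. exceptions) from aborting the write
    target.write_text(json.dumps(record, indent=2, default=str))
    return target


def load_audit_record(path: Path) -> dict[str, Any]:
    """Read a record back; a replay CLI would re-send record["request"]."""
    return json.loads(path.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = write_audit_record({"model": "example"}, {"status": 400}, audit_dir=tmp)
        print(load_audit_record(saved)["response"])
```

Writing one file per call keeps records independent, so a replay tool can pick a single failing call without parsing a shared log.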
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The adapter compared self._model to _DEFAULT_MODEL ("anthropic/claude-sonnet-4")
to decide whether to honour the constructor's model. When a caller passes
that exact value via --model, the comparison treats it as "not specified"
and falls through to RunConfig.model_name, which defaults to "gpt-4". So
every llm-connect call started with --provider openrouter --model
anthropic/claude-sonnet-4 actually landed on OpenAI's gpt-4. gpt-4 does
not support the json_schema response_format used for structured output,
so those requests returned 400. The CUST-WP-0045 canary hit this for
hours; the only smoke probes that succeeded were the ones without a
json_schema, which gpt-4 answered normally.
Track _explicit_model separately, so that a model supplied through the
constructor or LLMConfig is honoured even when it equals the default.
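A minimal sketch of the difference between comparing against the default and tracking intent explicitly. RunConfig here is a stand-in with only model_name. ModelResolver and resolve_model_buggy are hypothetical names; the real adapter's structure may differ. The non-explicit path falls through to RunConfig.model_name, as described above.

```python
from dataclasses import dataclass
from typing import Optional

_DEFAULT_MODEL = "anthropic/claude-sonnet-4"


@dataclass
class RunConfig:
    """Minimal stand-in for the real RunConfig; only model_name matters here."""
    model_name: str = "gpt-4"


def resolve_model_buggy(model: str, config: RunConfig) -> str:
    # Old logic: a value equal to the default is indistinguishable from "unset".
    return model if model != _DEFAULT_MODEL else config.model_name


class ModelResolver:
    """Hypothetical helper illustrating the _explicit_model flag."""

    def __init__(self, model: Optional[str] = None) -> None:
        # Record whether the caller supplied a model, separately from its value.
        self._explicit_model = model is not None
        self._model = model if model is not None else _DEFAULT_MODEL

    def resolve(self, config: RunConfig) -> str:
        if self._explicit_model:
            return self._model  # honoured even when it equals _DEFAULT_MODEL
        return config.model_name
```

With the buggy version, passing "anthropic/claude-sonnet-4" resolves to "gpt-4"; with the flag it resolves to the requested model.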
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous strict=True default rejected the activity-core daily-triage
schema (and most real-world application schemas), because OpenAI strict
mode requires additionalProperties:false on every object and requires
every property to appear in that object's required list.
Application-supplied schemas typically do not meet that bar, and adding
additionalProperties recursively in the adapter would be surprising and
could break callers that rely on extra fields. Flipping strict to False
keeps the schema as a soft constraint: the model still produces
structured output, and the 400 that OpenRouter returned to the
activity-core canary goes away.
Callers who need strict enforcement can pass response_format directly
via model_params, where the adapter's pass-through handling preserves
the strict flag they set.
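To see why typical application schemas fail under strict=True, a small checker can report violations of the two rules named above. This is a hypothetical diagnostic, not part of llm-connect. It checks only those two rules and follows only "properties" and array "items", so it is not a full strict-mode validator: $ref, anyOf and similar keywords are ignored.

```python
from typing import Any


def _has_type(schema: dict[str, Any], name: str) -> bool:
    declared = schema.get("type")
    return declared == name or (isinstance(declared, list) and name in declared)


def strict_mode_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """List places where a schema breaks the two strict-mode object rules."""
    problems: list[str] = []
    if _has_type(schema, "object"):
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties is not false")
        properties = schema.get("properties", {})
        required = set(schema.get("required", []))
        for name, sub_schema in properties.items():
            if name not in required:
                problems.append(f"{path}.{name}: not listed in required")
            problems.extend(strict_mode_violations(sub_schema, f"{path}.{name}"))
    if _has_type(schema, "array") and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_violations(schema["items"], f"{path}[]"))
    return problems
```

A schema with one optional field and no additionalProperties already yields several violations, which matches the observation that most application schemas fail strict mode.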
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The adapter previously did a blind payload.update(config.model_params).
When callers such as activity-core passed reasoning_effort, max_depth,
and json_schema (Claude- and llm-connect-specific fields), those keys
leaked into the OpenAI Chat Completions request body, and OpenRouter
rejected the whole call with HTTP 400. The CUST-WP-0045 canary hit this
on 2026-06-02, and a manual repro confirmed it: the same prompt with no
model_params returns a clean 10-recommendation WSJF report in 4.5s; with
model_params included, every call 400s.
Replace the merge with a whitelist + translation step:
- pass-through known OpenAI Chat Completions fields (top_p, stop, seed,
tools, response_format, etc.)
- translate json_schema into the proper response_format wrapper
({type:"json_schema", json_schema:{name,schema,strict}})
- drop documented non-OpenAI fields (reasoning_effort, max_depth) so
the payload stays valid
- silently drop unknown keys rather than risk another 400
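The four steps above can be sketched as one pure function. The allowlist is an assumption: the commit names top_p, stop, seed, tools and response_format "etc.", and the remaining entries are guesses. The response_format wrapper name "structured_output" is also hypothetical. strict is set to False to match the later default change, and an explicit response_format is left untouched so a caller-set strict flag survives.

```python
from typing import Any

# Hypothetical allowlist: top_p, stop, seed, tools and response_format come
# from the commit; the other entries are assumptions.
_PASSTHROUGH_KEYS = frozenset({
    "top_p", "stop", "seed", "tools", "tool_choice",
    "response_format", "presence_penalty", "frequency_penalty",
})


def translate_model_params(model_params: dict[str, Any]) -> dict[str, Any]:
    """Map llm-connect model_params onto a valid Chat Completions fragment."""
    # Whitelist: only known fields are copied. reasoning_effort, max_depth,
    # json_schema itself and any unknown key are dropped rather than risk a 400.
    payload = {k: v for k, v in model_params.items() if k in _PASSTHROUGH_KEYS}
    schema = model_params.get("json_schema")
    # An explicit response_format wins, so its strict flag is preserved.
    if schema is not None and "response_format" not in payload:
        payload["response_format"] = {
            "type": "json_schema",
            "json_schema": {
                "name": "structured_output",  # hypothetical default name
                "schema": schema,
                "strict": False,  # soft constraint
            },
        }
    return payload
```

The adapter would then merge this fragment into a payload that already carries model and messages, instead of merging model_params directly.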
The same pattern will need to apply to the OpenAI and Gemini adapters
when their callers start passing provider-specific keys — left as
follow-up rather than speculative refactoring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first CUST-WP-0045 canary retry after 9de0f49 still failed schema
validation with `Expecting value: line 1 column 1 (char 0)`. The original
allowlist returned envelope.result verbatim, which on longer prompts
carries the model's conversational preamble ("Triage report generated
and returned via structured output. Key signals: ..."), not the
schema-enforced JSON. The actual structured payload lives in a different
envelope field whose name varies across CLI versions.
Make the unwrap follow an explicit order of preference:
1. Scan envelope fields and return the first one whose value parses as
JSON (dict, list, or a string that loads cleanly). Skip well-known
metadata keys (type, usage, total_cost_usd, etc.) so telemetry can
never be mistaken for the model payload.
2. Fall back to the original text-field allowlist only when no field
carries JSON, so callers that reach this code path without a schema
still see the model's prose.
3. Surface the raw envelope as last resort.
This is robust against unknown envelope shapes — as long as the schema-
enforced JSON appears somewhere in a non-metadata field, the adapter
will find it.
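The three-step unwrap can be sketched as follows. The metadata set holds only the keys the commit names (type, usage, total_cost_usd); the real list is longer. The text-field tuple is the original allowlist. One assumption goes beyond the commit: a string counts as JSON only if it decodes to a dict or list, so scalar-looking strings are not mistaken for payloads. The envelope field "structured_output" used in the test is hypothetical.

```python
import json
from typing import Any, Optional

# Metadata keys named in the commit; the real list ("etc.") is longer.
_METADATA_KEYS = frozenset({"type", "usage", "total_cost_usd"})
# Text-bearing fields from the original allowlist.
_TEXT_FIELDS = ("result", "result_text", "content", "text", "output")


def _json_payload(value: Any) -> Optional[Any]:
    """Return value as a dict/list if it is one or is a string decoding to one."""
    if isinstance(value, (dict, list)):
        return value
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError:
            return None
        if isinstance(decoded, (dict, list)):
            return decoded
    return None


def unwrap_envelope(stdout: str) -> Any:
    try:
        envelope = json.loads(stdout)
    except json.JSONDecodeError:
        return stdout  # not an envelope at all
    if not isinstance(envelope, dict):
        return stdout
    # 1. First non-metadata field carrying JSON wins.
    for key, value in envelope.items():
        if key in _METADATA_KEYS:
            continue
        payload = _json_payload(value)
        if payload is not None:
            return payload
    # 2. No JSON anywhere: fall back to prose from a known text field.
    for key in _TEXT_FIELDS:
        if isinstance(envelope.get(key), str):
            return envelope[key]
    # 3. Last resort: surface the raw envelope for the operator.
    return stdout
```

Because step 1 scans every non-metadata field, a preamble in result no longer shadows the structured payload stored under some other key.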
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Claude Code adapter previously passed --json-schema alone. On Claude
CLI 2.1.160, --json-schema on its own still emits the model's
conversational preamble on stdout, while the schema-enforced structured
payload ships on a sidecar channel the adapter cannot read. As a result,
callers requesting structured output got prose that fails JSON parsing
downstream. This is exactly the failure mode the activity-core
CUST-WP-0045 daily-triage canary hit on 2026-06-01 ("Triage report
generated and returned via structured output. Key signals:..." →
json.loads error at column 1).
Fix: when --json-schema is set, also pass --output-format json. The CLI
then writes a JSON envelope on stdout. The adapter unwraps it by
probing a small allowlist of known text-bearing fields (result,
result_text, content, text, output). Unknown envelope shapes fall
through to raw stdout so the operator can introspect the structure and
extend the allowlist.
The unwrap path is only triggered when --json-schema was set, so non-
schema callers keep the existing raw-stdout behavior.
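A minimal sketch of the gating: the extra flags, and the unwrap step, apply only when a schema is present. build_cli_args is a hypothetical helper. The flag spellings follow the commit. Passing the schema inline as a JSON string is an assumption about how the adapter feeds --json-schema.

```python
import json
from typing import Any, Optional


def build_cli_args(
    base_args: list[str], json_schema: Optional[dict[str, Any]]
) -> tuple[list[str], bool]:
    """Return (argv, needs_unwrap) for one CLI invocation (hypothetical helper)."""
    args = list(base_args)  # never mutate the caller's list
    if json_schema is None:
        # Non-schema callers: argv unchanged, stdout is used raw.
        return args, False
    # Assumption: --json-schema takes the schema inline as a JSON string.
    args += ["--json-schema", json.dumps(json_schema), "--output-format", "json"]
    return args, True
```

Returning the needs_unwrap flag alongside argv keeps the decision in one place, so the output handler cannot unwrap stdout for a call that never requested an envelope.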
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drafted workplan to move two consumer-side concerns into llm-connect:
- ModelRateRegistry: per-model USD-per-1k rates with provenance, a
property of the base model, not the application.
- ProblemClass token estimators: generic shapes (chunk-summarization,
entity-extraction, relation-extraction, judge-eval, report-synthesis)
with base dimensions + tunable params; consumer supplies the shape
of its problem and gets a TokenEstimate before any call.
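A minimal sketch of how the two pieces could fit together: a registry of per-model USD-per-1k rates with a provenance string, and one problem-class estimator (chunk-summarization) that returns a TokenEstimate before any call. All names, fields and the estimator's parameters (prompt_overhead, summary_ratio) are hypothetical. The rates in the test are placeholder numbers, not real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    input_usd_per_1k: float
    output_usd_per_1k: float
    provenance: str  # where the rate came from, e.g. a pricing page and date


@dataclass(frozen=True)
class TokenEstimate:
    calls: int
    input_tokens: int
    output_tokens: int


class ModelRateRegistry:
    def __init__(self) -> None:
        self._rates: dict[str, ModelRate] = {}

    def register(self, model: str, rate: ModelRate) -> None:
        self._rates[model] = rate

    def cost_usd(self, model: str, estimate: TokenEstimate) -> float:
        rate = self._rates[model]  # KeyError for unknown models: no silent guessing
        return (estimate.input_tokens / 1000 * rate.input_usd_per_1k
                + estimate.output_tokens / 1000 * rate.output_usd_per_1k)


def estimate_chunk_summarization(
    n_chunks: int,
    tokens_per_chunk: int,
    prompt_overhead: int = 200,  # tunable: instruction tokens added per call
    summary_ratio: float = 0.1,  # tunable: output length relative to the chunk
) -> TokenEstimate:
    """One call per chunk; base dimensions come from the consumer's problem."""
    return TokenEstimate(
        calls=n_chunks,
        input_tokens=n_chunks * (tokens_per_chunk + prompt_overhead),
        output_tokens=n_chunks * round(tokens_per_chunk * summary_ratio),
    )
```

With 32 chunks of 600 tokens, the estimate is 25,600 input and 1,920 output tokens. Pricing that against placeholder rates of 0.001 and 0.002 USD per 1k tokens gives about 0.029 USD. The consumer supplies only the shape of its problem; the rate lookup stays inside llm-connect.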
Demand signal: the 2026-05-18 infospace-bench Lefevre Chapter-I smoke
run made 32 calls / 28k tokens for 0.009 USD actual against a planned
8.40 USD. That roughly 930x overestimate came entirely from the
consumer's own estimate, because llm-connect has no rate table to
delegate to.
Three new modules (rates.py, costs.py, problem_classes.py), eight
tasks, registered as workstream 869196c5-551b-4eef-b8d8-cca6f770a9b0
under the custodian topic. A follow-on consumer workplan in
infospace-bench will migrate plan_generation_summary to delegate once
T01-T04 land here.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Draft the workplan that extends the static RoutingPolicy (WP-0003) with
a quality observation ledger, a BaselineGrader (ClaudeCodeAdapter as the
default oracle), an AdaptiveRoutingPolicy that picks the cheapest
adapter clearing a per-task quality floor, and a sampled
ShadowingAdapter for production observation collection.
Scope is explicit: ship primitives only. Task-type taxonomy, quality
thresholds, baseline choice, and re-grading cadence stay with the
consumer. infospace-bench is the named first consumer; consumer wiring
deferred until T01-T03 land.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove redundant async_execute_prompt overrides from OpenAI/Gemini/OpenRouter
adapters (identical to base class default — asyncio import also removed)
- Cache prompt.split() result in MockLLMAdapter to avoid double evaluation
- Promote deferred LLMBudgetExceededError imports to module level in
models.py and adapter.py (no circular dependency)
- Auto-populate context dict in LLMBudgetExceededError.__init__ so callers
need not pass redundant context= kwarg
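A minimal sketch of what auto-populating context in __init__ can look like. The keyword fields spent_usd and budget_usd are hypothetical; the real exception's signature may differ. Here, caller-supplied context entries are merged last, so they override the auto-filled ones.

```python
from typing import Any, Optional


class LLMError(Exception):
    """Stand-in for llm-connect's inlined base error class."""


class LLMBudgetExceededError(LLMError):
    def __init__(
        self,
        message: str,
        *,
        spent_usd: float,
        budget_usd: float,
        context: Optional[dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.spent_usd = spent_usd
        self.budget_usd = budget_usd
        # Built from the fields the caller already passed; extra context is merged
        # last, so callers only supply what the exception cannot derive itself.
        self.context = {"spent_usd": spent_usd, "budget_usd": budget_usd, **(context or {})}
```

Callers then raise LLMBudgetExceededError("over budget", spent_usd=..., budget_usd=...) without repeating the same numbers in a context= dict.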
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Registers llm-connect with the Custodian agent system:
- CLAUDE.md: thin @-import index pointing to modular rules
- .claude/rules/session-protocol.md: orient with get_domain_summary("custodian")
- .claude/rules/repo-identity.md: domain=custodian, slug=llm-connect
- .claude/rules/first-session.md, workplan-convention.md, stack-and-commands.md,
architecture.md, repo-boundary.md, agents.md, scope.md (stubs to fill in)
- session-protocol notes both local (:8000) and CoulombCore bridge (:18000) URLs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds uv lockfile and .venv. Only runtime dep is toml; pytest added as dev dep.
Service-level dependencies (OpenAI, Gemini, Anthropic, OpenRouter APIs) are
tracked separately via the state-hub capability/service dependency system.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copy markitect.llm module into standalone llm_connect package.
All markitect.* imports replaced with llm_connect.* equivalents.
LLMError base class inlined (no markitect.exceptions dependency).
Verified: from llm_connect import create_adapter works.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>