Commit Graph

34 Commits

Author SHA1 Message Date
24f4c09d42 Implement llm-connect ADHOC diagnostics
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-06-03 11:56:21 +02:00
79c899b694 Capture llm-connect lessons from CUST-WP-0045 canary as ADHOC-2026-06-02
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
The 2026-06-02 daily-triage canary debugging session uncovered five real
bugs (commits 9de0f49, 435da49, cd4551c, 583ab57, 1b01f0e). They were slow
to find mostly because llm-connect has no way to see what payload the
adapter sent or what the provider returned. Capture the six structural
improvements that would collapse the next diagnosis of this shape from
half a day to minutes:

  T01 — LLM_CONNECT_DEBUG envelope mode for /execute responses
  T02 — ThreadingHTTPServer drop-in replacement for stdlib HTTPServer
  T03 — Per-call audit log + replay CLI (LLM_CONNECT_AUDIT_DIR)
  T04 — Apply param-translation contract to OpenAI and Gemini adapters
  T05 — Provider-agnostic structured-output smoke test in CI
  T06 — Document the model_params translation contract for adapter authors

All six registered in the State Hub under workstream
adhoc-llmc-2026-06-02 (1c936c91-79c7-427d-ab37-9052e8a61cda).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 15:55:42 +02:00
1b01f0edf4 Honour explicit OpenRouter --model when it equals the adapter default
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
The adapter compared self._model to _DEFAULT_MODEL ("anthropic/claude-sonnet-4")
to decide whether to honour the constructor's model. When a caller passes
that exact value via --model, the comparison treats it as "not specified"
and falls through to RunConfig.model_name, which defaults to "gpt-4". So
every llm-connect call started with --provider openrouter --model
anthropic/claude-sonnet-4 actually landed on OpenAI's gpt-4. The
structured-output response_format requires schema support that gpt-4
lacks, so those calls returned 400. The CUST-WP-0045 canary hit this for
hours; the only smoke probes that worked were the ones with no
json_schema, where gpt-4 answered normally.

Track _explicit_model separately so a constructor or LLMConfig that
matches the default is still treated as a real intent.
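
The failure mode and the fix can be sketched as follows. This is a minimal
illustration, not the adapter's actual code: the class names, the
resolve_model method and the stand-in RunConfig are hypothetical; only
_DEFAULT_MODEL, _explicit_model and the "gpt-4" default come from the
message above.

```python
from dataclasses import dataclass
from typing import Optional

_DEFAULT_MODEL = "anthropic/claude-sonnet-4"  # value quoted in the commit message


@dataclass
class RunConfig:
    """Stand-in for the real RunConfig; only the field needed here."""
    model_name: str = "gpt-4"


class BuggyAdapter:
    """Hypothetical reduction of the pre-fix behaviour."""

    def __init__(self, model: Optional[str] = None):
        self._model = model or _DEFAULT_MODEL

    def resolve_model(self, config: RunConfig) -> str:
        # Bug: an explicit value equal to the default looks like "not specified".
        if self._model != _DEFAULT_MODEL:
            return self._model
        return config.model_name


class FixedAdapter:
    """Hypothetical reduction of the fix: remember whether a model was passed."""

    def __init__(self, model: Optional[str] = None):
        self._explicit_model = model is not None
        self._model = model or _DEFAULT_MODEL

    def resolve_model(self, config: RunConfig) -> str:
        # An explicit model wins even when it equals the default.
        if self._explicit_model:
            return self._model
        return config.model_name
```

With the buggy variant, passing anthropic/claude-sonnet-4 explicitly
resolves to gpt-4; with the fixed variant it resolves to the value the
caller asked for.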

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 14:50:37 +02:00
583ab57a59 Set response_format json_schema strict=False in OpenRouter adapter
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
The previous strict=True default rejected the activity-core daily-triage
schema (and most real-world application schemas) because OpenAI strict
mode requires additionalProperties:false on every object and requires
every property to be listed in required. Application-supplied schemas
typically do not meet that bar, and adding additionalProperties
recursively in the adapter would be surprising and could break callers
that rely on extra fields. Flipping strict to False keeps the schema as
a soft constraint; the model still produces structured output and the
activity-core canary's 400 from OpenRouter goes away.

Callers who need strict enforcement can pass response_format directly
via model_params, where the adapter's pass-through handling preserves
the strict flag they set.
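
A caller deciding whether to opt back into strict mode could pre-check a
schema against the two requirements named above. The helper below is a
hypothetical sketch (strict_mode_violations is not part of llm-connect)
and covers only those two rules, not every strict-mode restriction.

```python
from typing import Any


def strict_mode_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return human-readable reasons a schema would fail the two strict-mode
    rules from the commit message; an empty list means both rules hold."""
    problems: list[str] = []
    if schema.get("type") == "object":
        # Rule 1: every object must set additionalProperties to false.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Rule 2: every declared property must appear in "required".
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {', '.join(missing)}")
        # Recurse into nested property schemas.
        for name, sub in props.items():
            problems.extend(strict_mode_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_violations(schema["items"], f"{path}[]"))
    return problems
```

An empty result suggests the schema is a candidate for passing
response_format with strict set via model_params; a non-empty one explains
why the soft default is the safer choice for that schema.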

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 14:18:33 +02:00
cd4551c575 Translate json_schema and drop non-OpenAI fields in OpenRouter adapter
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
The adapter previously did a blind payload.update(config.model_params).
For callers like activity-core that pass reasoning_effort, max_depth,
and json_schema (Claude / llm-connect-specific fields), those leaked
into the OpenAI Chat Completions request body and OpenRouter rejected
the whole call with HTTP 400. The CUST-WP-0045 canary hit this on
2026-06-02, and a manual repro confirmed it: the same prompt with no
model_params returns a clean 10-recommendation WSJF report in 4.5s; with
model_params included, every call returns 400.

Replace the merge with a whitelist + translation step:

- pass through known OpenAI Chat Completions fields (top_p, stop, seed,
  tools, response_format, etc.)
- translate json_schema into the proper response_format wrapper
  ({type:"json_schema", json_schema:{name,schema,strict}})
- drop documented non-OpenAI fields (reasoning_effort, max_depth) so
  the payload stays valid
- silently drop unknown keys rather than risk another 400

The same pattern will need to apply to the OpenAI and Gemini adapters
when their callers start passing provider-specific keys — left as
follow-up rather than speculative refactoring.
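
The whitelist + translation step described above might look roughly like
this. It is a sketch under assumptions: the function name, the set names,
the schema name "response" and the strict default are illustrative, and the
pass-through set is only the subset spelled out in the message.

```python
from typing import Any

# Illustrative subset; the commit lists these fields followed by "etc."
_PASS_THROUGH = {"top_p", "stop", "seed", "tools", "response_format"}
# Documented non-OpenAI fields named in the commit message.
_DROPPED = {"reasoning_effort", "max_depth"}


def translate_model_params(model_params: dict[str, Any], strict: bool = False) -> dict[str, Any]:
    """Build the extra request fields from caller-supplied model_params."""
    out: dict[str, Any] = {}
    for key, value in model_params.items():
        if key in _PASS_THROUGH:
            # Known Chat Completions field: copy as-is (an explicit
            # response_format overrides one derived from json_schema).
            out[key] = value
        elif key == "json_schema":
            # Wrap the bare schema; "response" is a placeholder name.
            out.setdefault("response_format", {
                "type": "json_schema",
                "json_schema": {"name": "response", "schema": value, "strict": strict},
            })
        elif key in _DROPPED:
            continue  # documented non-OpenAI field
        # Unknown keys are silently dropped rather than risking a 400.
    return out
```

Replacing a blind payload.update(...) with payload.update(translate_model_params(...))
keeps the request body limited to fields the provider accepts.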

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 14:15:24 +02:00
435da49263 Prefer JSON-bearing envelope fields, skip metadata, in Claude CLI unwrap
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
The first CUST-WP-0045 canary retry after 9de0f49 still failed schema
validation with `Expecting value: line 1 column 1 (char 0)`. The original
allowlist returned envelope.result verbatim, which on longer prompts
carries the model's conversational preamble ("Triage report generated
and returned via structured output. Key signals: ..."), not the
schema-enforced JSON. The actual structured payload lives in a different
envelope field whose name varies across CLI versions.

Make the unwrap order-aware:
  1. Scan envelope fields and return the first one whose value parses as
     JSON (dict, list, or a string that loads cleanly). Skip well-known
     metadata keys (type, usage, total_cost_usd, etc.) so telemetry can
     never be mistaken for the model payload.
  2. Fall back to the original text-field allowlist only when no field
     carries JSON, so non-schema callers via this same code path still
     see the model's prose.
  3. Surface the raw envelope as last resort.

This is robust against unknown envelope shapes — as long as the schema-
enforced JSON appears somewhere in a non-metadata field, the adapter
will find it.
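
The three-step unwrap can be sketched as below. The function and constant
names are hypothetical, the metadata set is only the subset named above,
and restricting string values to those that parse as a JSON object or array
is an assumption of this sketch, not something the message states.

```python
import json
from typing import Any, Optional

_METADATA_KEYS = {"type", "usage", "total_cost_usd"}  # subset; the commit says "etc."
_TEXT_FIELDS = ("result", "result_text", "content", "text", "output")  # allowlist from 9de0f49


def _as_json_text(value: Any) -> Optional[str]:
    """Return value as JSON text if it carries a JSON object or array."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except ValueError:
            return None
        # Assumption: bare scalars such as "42" do not count as a payload.
        return value if isinstance(parsed, (dict, list)) else None
    return None


def unwrap_envelope(stdout: str) -> str:
    try:
        envelope = json.loads(stdout)
    except ValueError:
        return stdout
    if not isinstance(envelope, dict):
        return stdout
    # 1. First non-metadata field whose value parses as JSON.
    for key, value in envelope.items():
        if key in _METADATA_KEYS:
            continue
        text = _as_json_text(value)
        if text is not None:
            return text
    # 2. Fall back to the text-field allowlist for prose responses.
    for key in _TEXT_FIELDS:
        if isinstance(envelope.get(key), str):
            return envelope[key]
    # 3. Last resort: surface the raw envelope.
    return stdout
```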

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 12:44:25 +02:00
9de0f495db Pass --output-format json with --json-schema and unwrap CLI envelope
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
The Claude Code adapter previously passed --json-schema alone. On Claude
CLI 2.1.160 that combination still emits the model's conversational
preamble on stdout while the schema-enforced structured payload ships on
a sidecar channel the adapter cannot read. Result: callers requesting
structured output got prose that fails JSON parsing downstream — exactly
the failure mode the activity-core CUST-WP-0045 daily triage canary hit
on 2026-06-01 ("Triage report generated and returned via structured
output. Key signals:..." → json.loads error at column 1).

Fix: when --json-schema is set, also pass --output-format json. The CLI
then writes a JSON envelope on stdout. The adapter unwraps it by
probing a small allowlist of known text-bearing fields (result,
result_text, content, text, output). Unknown envelope shapes fall
through to raw stdout so the operator can introspect the structure and
extend the allowlist.

The unwrap path is only triggered when --json-schema was set, so non-
schema callers keep the existing raw-stdout behavior.
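
The flag pairing can be sketched as a small argv builder. The helper names
are hypothetical and the base invocation (claude -p <prompt>) is an
assumption; only --json-schema and --output-format json are taken from the
message above.

```python
from typing import List, Optional


def build_claude_argv(prompt: str, json_schema: Optional[str] = None) -> List[str]:
    """Build the CLI argument list; base command is an assumption of this sketch."""
    argv = ["claude", "-p", prompt]
    if json_schema is not None:
        # Always pair the schema flag with JSON output so stdout carries
        # a parseable envelope instead of conversational prose.
        argv += ["--json-schema", json_schema, "--output-format", "json"]
    return argv


def needs_unwrap(argv: List[str]) -> bool:
    """Envelope unwrapping applies only when a schema was requested."""
    return "--json-schema" in argv
```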

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 10:20:24 +02:00
b12d1af8bb Support Claude Code JSON schema execution
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-21 03:19:27 +02:00
82e3c07928 Preserve llm-connect run config in server mode
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-19 20:55:02 +02:00
c11c6afa3f Implement-LLM-WP-0005-cost-model-estimators
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-19 05:02:20 +02:00
0054afe689 plan: WP-0005 — cost model and problem-class token estimators
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Drafted workplan to move two consumer-side concerns into llm-connect:

- ModelRateRegistry: per-model USD-per-1k rates with provenance, a
  property of the base model, not the application.
- ProblemClass token estimators: generic shapes (chunk-summarization,
  entity-extraction, relation-extraction, judge-eval, report-synthesis)
  with base dimensions + tunable params; consumer supplies the shape
  of its problem and gets a TokenEstimate before any call.

Demand signal: the 2026-05-18 infospace-bench Lefevre Chapter-I smoke
ran 32 calls / 28k tokens / 0.009 USD actual against a planned 8.40
USD. The roughly 1000x overestimate was entirely consumer-side because
there is no rate table in llm-connect to delegate to.

Three new modules (rates.py, costs.py, problem_classes.py), eight
tasks, registered as workstream 869196c5-551b-4eef-b8d8-cca6f770a9b0
under the custodian topic. A follow-on consumer workplan in
infospace-bench will migrate plan_generation_summary to delegate once
T01-T04 land here.
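
A rate registry of the kind proposed here could be as small as the sketch
below. Everything except the ModelRateRegistry name is an assumption: the
ModelRate fields, the method names, and above all the rates, which are
placeholders and not real pricing.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelRate:
    """USD per 1k tokens for one model, with provenance (hypothetical shape)."""
    usd_per_1k_input: float
    usd_per_1k_output: float
    source: str  # where the rate came from, e.g. a pricing page and date


class ModelRateRegistry:
    def __init__(self) -> None:
        self._rates: Dict[str, ModelRate] = {}

    def register(self, model: str, rate: ModelRate) -> None:
        self._rates[model] = rate

    def cost_usd(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate the cost of a call before it is made."""
        rate = self._rates[model]  # KeyError for unregistered models
        return (input_tokens / 1000) * rate.usd_per_1k_input + (
            output_tokens / 1000
        ) * rate.usd_per_1k_output


registry = ModelRateRegistry()
# Placeholder numbers for illustration only.
registry.register("example-model", ModelRate(0.0005, 0.0015, source="placeholder"))
estimate = registry.cost_usd("example-model", input_tokens=20_000, output_tokens=8_000)
```

A consumer would then plan against estimate rather than a hand-maintained
figure, which is the delegation the workplan is after.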

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 04:30:52 +02:00
4b685e849c Refresh agent instruction files
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-18 16:55:44 +02:00
a27945101c Adaptive routing initial version
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-18 11:38:12 +02:00
14838ae968 chore(consistency): sync task status from DB [auto]
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for llm-connect
2026-05-17 22:54:25 +02:00
c4ad4bb9f2 Add adaptive cost-quality routing primitives
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-17 21:32:27 +02:00
bf86a03c5d chore(consistency): sync task status from DB [auto]
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for llm-connect
2026-05-17 20:22:06 +02:00
37ace7b99c chore(consistency): sync task status from DB [auto]
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for llm-connect
2026-05-17 19:51:26 +02:00
bd2315cf4c chore(consistency): sync task status from DB [auto]
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for llm-connect
2026-05-17 19:21:57 +02:00
2136fb21d7 chore(consistency): sync task status from DB [auto]
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for llm-connect
2026-05-17 18:47:30 +02:00
deade6ad76 plan: WP-0004 — adaptive cost-quality routing (todo)
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Draft the workplan that extends the static RoutingPolicy (WP-0003) with
a quality observation ledger, a BaselineGrader (ClaudeCodeAdapter as the
default oracle), an AdaptiveRoutingPolicy that picks the cheapest
adapter clearing a per-task quality floor, and a sampled
ShadowingAdapter for production observation collection.

Scope is explicit: ship primitives only. Task-type taxonomy, quality
thresholds, baseline choice, and re-grading cadence stay with the
consumer. infospace-bench is the named first consumer; consumer wiring
deferred until T01-T03 land.
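
The core selection rule, "cheapest adapter clearing a per-task quality
floor", can be sketched in a few lines. The Candidate shape, the
pick_adapter name and the fallback behaviour are assumptions of this
sketch; the workplan leaves thresholds and taxonomy to the consumer.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one adapter's ledger entries for a task type."""
    adapter: str
    cost_per_1k: float
    observed_quality: float  # e.g. mean grader score in [0, 1]


def pick_adapter(candidates: Iterable[Candidate], quality_floor: float, fallback: str) -> str:
    """Return the cheapest adapter whose observed quality meets the floor."""
    eligible = [c for c in candidates if c.observed_quality >= quality_floor]
    if not eligible:
        # Nothing clears the floor: defer to the baseline (assumed behaviour).
        return fallback
    return min(eligible, key=lambda c: c.cost_per_1k).adapter
```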

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 17:17:07 +02:00
66dfc7cf06 Added INTENT.md file and reviewed scope
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-03 17:46:24 +02:00
665e925be6 Scope update from repo-scoping refactor
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
2026-05-01 12:26:51 +02:00
a4b4a770ab chore(consistency): sync task status from DB [auto]
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Updated by fix-consistency on 2026-05-01:
  - update .custodian-brief.md for llm-connect
2026-05-01 12:19:40 +02:00
d51d6303e2 feat: WP-0003 — RoutingPolicy (FR-2) and HTTP serve mode (FR-1)
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
FR-2 RoutingPolicy:
- RoutingPolicy + RoutingRule dataclasses in llm_connect/routing.py
- resolve(task_type, estimated_cost_per_1k=None) with cost-cap fallback
- Exported from llm_connect.__init__; contract doc at contracts/functional/routing-policy.md
- 11 tests covering rule match, cost-cap, fallback, unknown type, no-match
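
A rough sketch of how resolve with a cost-cap fallback might behave. Only
the class names and the resolve(task_type, estimated_cost_per_1k=None)
signature come from the notes above; the fields and the exact fallback
semantics are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class RoutingRule:
    task_type: str
    provider: str
    max_cost_per_1k: Optional[float] = None  # hypothetical cost cap


@dataclass
class RoutingPolicy:
    rules: List[RoutingRule] = field(default_factory=list)
    fallback_provider: str = "mock"  # hypothetical default

    def resolve(self, task_type: str, estimated_cost_per_1k: Optional[float] = None) -> str:
        for rule in self.rules:
            if rule.task_type != task_type:
                continue
            over_cap = (
                rule.max_cost_per_1k is not None
                and estimated_cost_per_1k is not None
                and estimated_cost_per_1k > rule.max_cost_per_1k
            )
            if not over_cap:
                return rule.provider
        # Unknown task type, or every matching rule exceeded its cap.
        return self.fallback_provider
```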

FR-1 HTTP serve mode:
- LLMServer in llm_connect/server.py (stdlib http.server, zero extra deps)
- POST /execute + GET /health; CLI via python -m llm_connect.server
- [server] optional-dep group added to pyproject.toml
- Contract doc at contracts/functional/server.md
- 9 tests: health, round-trip, 400/404/500 errors, config forwarding
- Added "mock" provider to factory for CLI default
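
For orientation, a stdlib-only server with these two routes could be
structured as below. This is a sketch, not LLMServer itself: the handle
function, the request and response field names and the mock reply are
assumptions. Keeping the routing in a plain function makes the 400/404
paths testable without opening a socket.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Tuple


def handle(method: str, path: str, body: bytes) -> Tuple[int, Dict[str, Any]]:
    """Pure routing logic: returns (status, JSON payload)."""
    if method == "GET" and path == "/health":
        return 200, {"status": "ok"}
    if method == "POST" and path == "/execute":
        try:
            prompt = json.loads(body or b"{}")["prompt"]
        except (ValueError, KeyError, TypeError):
            return 400, {"error": "body must be JSON with a 'prompt' field"}
        # Stand-in for a real adapter call.
        return 200, {"content": f"mock: {prompt}"}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status: int, payload: Dict[str, Any]) -> None:
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        self._reply(*handle("GET", self.path, b""))

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self._reply(*handle("POST", self.path, self.rfile.read(length)))


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Blocks until interrupted; host and port are example values."""
    HTTPServer((host, port), Handler).serve_forever()
```

Swapping HTTPServer for http.server.ThreadingHTTPServer in serve is a
one-line change that lets slow /execute calls run without blocking /health.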

All 101 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:34:00 +00:00
f76a58d6e9 refactor: simplify post-WP-0002 cleanup
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
- Remove redundant async_execute_prompt overrides from OpenAI/Gemini/OpenRouter
  adapters (identical to base class default — asyncio import also removed)
- Cache prompt.split() result in MockLLMAdapter to avoid double evaluation
- Promote deferred LLMBudgetExceededError imports to module level in
  models.py and adapter.py (no circular dependency)
- Auto-populate context dict in LLMBudgetExceededError.__init__ so callers
  need not pass redundant context= kwarg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:30:00 +00:00
d71f4114d1 feat: WP-0001 foundation + WP-0002 core extensions
WP-0001 — Foundation & GAAF Baseline
- SCOPE.md, ARCHITECTURE-LAYERS.md, contracts/ tree
- .claude/rules/ stubs filled (architecture, stack, boundary)
- 57 tests (pytest), pyproject.toml with ruff+mypy, CI workflow

WP-0002 — Core Extensions (FR-4 + FR-3)
- FR-4: BudgetTracker (thread-safe) + LLMBudgetExceededError +
  optional RunConfig.budget_tracker + enforcement in all adapters
- FR-3: async_execute_prompt on LLMAdapter ABC (asyncio.to_thread
  fallback) + native asyncio.create_subprocess_exec in ClaudeCodeAdapter
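
The two mechanisms named above can be sketched together. The class and
method names match the notes where given (BudgetTracker,
LLMBudgetExceededError, async_execute_prompt); the charge method, the
limit_usd parameter and EchoAdapter are hypothetical stand-ins.

```python
import asyncio
import threading


class LLMBudgetExceededError(Exception):
    """Raised when a charge would push spending past the limit."""


class BudgetTracker:
    def __init__(self, limit_usd: float) -> None:
        self._limit = limit_usd
        self._spent = 0.0
        self._lock = threading.Lock()

    def charge(self, amount_usd: float) -> None:
        # Check and update under one lock so concurrent calls cannot
        # both pass the check and jointly overshoot the limit.
        with self._lock:
            if self._spent + amount_usd > self._limit:
                raise LLMBudgetExceededError(
                    f"charge of {amount_usd} would exceed limit {self._limit}"
                )
            self._spent += amount_usd

    @property
    def spent(self) -> float:
        with self._lock:
            return self._spent


class EchoAdapter:
    """Hypothetical adapter with only a blocking implementation."""

    def execute_prompt(self, prompt: str) -> str:
        return prompt.upper()

    async def async_execute_prompt(self, prompt: str) -> str:
        # Default async fallback: run the blocking call in a worker thread.
        return await asyncio.to_thread(self.execute_prompt, prompt)
```

An adapter with a natively asynchronous transport would override
async_execute_prompt instead of relying on the thread fallback.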

81 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:24:14 +00:00
tegwick
57b346bb8b chore(custodian): add CLAUDE.md and .claude/rules/ orientation files
Registers llm-connect with the Custodian agent system:
- CLAUDE.md: thin @-import index pointing to modular rules
- .claude/rules/session-protocol.md: orient with get_domain_summary("custodian")
- .claude/rules/repo-identity.md: domain=custodian, slug=llm-connect
- .claude/rules/first-session.md, workplan-convention.md, stack-and-commands.md,
  architecture.md, repo-boundary.md, agents.md, scope.md (stubs to fill in)
- session-protocol notes both local (:8000) and CoulombCore bridge (:18000) URLs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 23:15:29 +02:00
7dfd1054a7 added feature requests
2026-04-01 21:08:15 +00:00
7b36e2f744 Third party services catalog declaration
2026-03-25 00:10:13 +01:00
8ab24899bd chore: set up uv, add uv.lock
Adds uv lockfile and .venv. Only runtime dep is toml; pytest added as dev dep.
Service-level dependencies (OpenAI, Gemini, Anthropic, OpenRouter APIs) are
tracked separately via the state-hub capability/service dependency system.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:17:02 +01:00
0d22eb582a docs: add README with quickstart, provider table, API reference
Covers: installation, all 4 providers, RunConfig/LLMResponse types,
custom adapter pattern, TOML config chain, embeddings, exceptions,
testing with MockLLMAdapter, and origin note.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 08:10:34 +01:00
2355df6589 fix: use setuptools.build_meta backend (legacy path not available)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 08:04:54 +01:00
ad68b4bfef chore: add .gitignore, remove pycache
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 07:54:53 +01:00
e499edba90 feat: initial llm-connect package scaffold
Copy markitect.llm module into standalone llm_connect package.
All markitect.* imports replaced with llm_connect.* equivalents.
LLMError base class inlined (no markitect.exceptions dependency).
Verified: from llm_connect import create_adapter works.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 07:54:42 +01:00