Complete activity-core LLM endpoint handoff (LLM-WP-0006)
Switch the custodian triage default from anthropic/claude-sonnet-4 to google/gemini-2.5-flash, which advertises structured-output support on OpenRouter. Tighten the OpenRouter adapter to send strict JSON schema requests and set provider.require_parameters=true so routing only hits providers that honor the requested response_format. Update Kubernetes deploy docs and config for the verified coulombcore handoff: Containerfile build path, image-pull-policy=Never for smoke pods, credential-routing notes, and live smoke evidence. Mark LLM-WP-0006 finished with closure notes from 2026-06-18.
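The model switch depends on whether a model's OpenRouter metadata lists the structured-output parameters. Below is a minimal sketch of that check. It assumes each entry in OpenRouter's model listing carries an `id` and a `supported_parameters` array. The helper names and the sample catalog entries are hypothetical, and the sample data is illustrative rather than real metadata for any model.

```python
from typing import Any, Iterable, List, Mapping

# Parameter names a model must advertise before it is given a strict
# json_schema request. Assumed to match OpenRouter's supported_parameters values.
STRUCTURED_OUTPUT_PARAMS = frozenset({"response_format", "structured_outputs"})


def advertises_structured_output(model_entry: Mapping[str, Any]) -> bool:
    """True if the entry's supported_parameters include every structured-output flag."""
    supported = model_entry.get("supported_parameters") or []
    return STRUCTURED_OUTPUT_PARAMS.issubset(supported)


def structured_output_models(models: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return ids of catalog entries that advertise structured-output support."""
    return [entry["id"] for entry in models if advertises_structured_output(entry)]


# Illustrative entries only; these are not real OpenRouter metadata.
catalog = [
    {"id": "example/schema-model", "supported_parameters": ["temperature", "response_format", "structured_outputs"]},
    {"id": "example/plain-model", "supported_parameters": ["temperature", "max_tokens"]},
]
print(structured_output_models(catalog))  # ['example/schema-model']
```

Requiring both parameter names is a deliberately conservative choice for this sketch. A looser rule that accepts either one would also be defensible.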
@@ -123,9 +123,9 @@ Useful runtime environment variables:
 LLM_CONNECT_HOST=0.0.0.0
 LLM_CONNECT_PORT=8080
 LLM_CONNECT_PROVIDER=openrouter
-LLM_CONNECT_MODEL=anthropic/claude-sonnet-4
+LLM_CONNECT_MODEL=google/gemini-2.5-flash
 LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER=openrouter
-LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=anthropic/claude-sonnet-4
+LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash
 ```

 For local smoke tests without provider credentials:
@@ -17,10 +17,14 @@ kubectl -n activity-core create secret generic llm-connect-provider-secrets \
 --from-literal=OPENROUTER_API_KEY="$OPENROUTER_API_KEY"
 ```

+Provider API key custody belongs to the operator/OpenBao-to-Kubernetes Secret
+path. ops-warden documents this as outside its issuance scope; do not paste key
+values into Git, State Hub, logs, or chat.
+
 Apply:

 ```bash
-docker build -t docker.io/library/llm-connect:latest .
+docker build -f Containerfile -t docker.io/library/llm-connect:latest .
 docker save docker.io/library/llm-connect:latest | ssh coulombcore sudo k3s ctr -n k8s.io images import -
 kubectl apply -k deploy/k8s/activity-core-llm-connect
 kubectl -n activity-core rollout status deployment/llm-connect
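The custody rule forbids copying key values anywhere, so a presence check should read key names only. One way to sketch this assumes the listing comes from kubectl's go-template output, for example `kubectl -n activity-core get secret llm-connect-provider-secrets -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'`. That command prints one key name per line and no values. The `missing_secret_keys` helper is hypothetical.

```python
from typing import Iterable, List


def missing_secret_keys(key_listing: str, required: Iterable[str] = ("OPENROUTER_API_KEY",)) -> List[str]:
    """Return required Secret keys absent from a names-only key listing.

    key_listing holds one key name per line; no Secret values are read or
    needed, which keeps the check inside the custody rule above.
    """
    present = {line.strip() for line in key_listing.splitlines() if line.strip()}
    return [key for key in required if key not in present]


# Example listings as a names-only kubectl template would print them.
print(missing_secret_keys("OPENROUTER_API_KEY\n"))  # []
print(missing_secret_keys(""))  # ['OPENROUTER_API_KEY']
```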
@@ -33,6 +37,7 @@ fixtures and `scripts/smoke_activity_core_endpoint.py`:
 kubectl -n activity-core run llm-connect-smoke \
 --rm -i --restart=Never \
 --image=llm-connect:latest \
+--image-pull-policy=Never \
 --env=LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 \
 --env=LLM_CONNECT_TIMEOUT_SECONDS=300 \
 -- python scripts/smoke_activity_core_endpoint.py
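The added `--image-pull-policy=Never` flag matters because Kubernetes derives a default pull policy from the image tag. When imagePullPolicy is omitted, it becomes `Always` for a `:latest` or untagged image and `IfNotPresent` for any other tag. A `:latest` smoke pod would therefore try a registry pull instead of using the image imported into the node's k3s store. Below is a small sketch of that defaulting rule. The helper name is hypothetical, and digest-pinned references are deliberately not modeled.

```python
def default_image_pull_policy(image: str) -> str:
    """Mirror Kubernetes' default imagePullPolicy for a tag-based image reference."""
    if "@" in image:
        raise ValueError("digest references are out of scope for this sketch")
    # The tag follows a ':' in the last path component; a ':' earlier in the
    # reference belongs to a registry host:port, not to a tag.
    last_component = image.rsplit("/", 1)[-1]
    tag = last_component.split(":", 1)[1] if ":" in last_component else ""
    return "Always" if tag in ("", "latest") else "IfNotPresent"


print(default_image_pull_policy("llm-connect:latest"))  # Always
print(default_image_pull_policy("llm-connect:v1"))  # IfNotPresent
```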
@@ -10,9 +10,9 @@ data:
 LLM_CONNECT_HOST: "0.0.0.0"
 LLM_CONNECT_PORT: "8080"
 LLM_CONNECT_PROVIDER: "openrouter"
-LLM_CONNECT_MODEL: "anthropic/claude-sonnet-4"
+LLM_CONNECT_MODEL: "google/gemini-2.5-flash"
 LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER: "openrouter"
-LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL: "anthropic/claude-sonnet-4"
+LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL: "google/gemini-2.5-flash"
 LLM_CONNECT_CUSTODIAN_TRIAGE_TEMPERATURE: "0.2"
 LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS: "1800"
 LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_DEPTH: "2"
@@ -27,7 +27,7 @@ Default runtime values:

 ```text
 provider=openrouter
-model=anthropic/claude-sonnet-4
+model=google/gemini-2.5-flash
 temperature=0.2
 max_tokens=1800
 max_depth=2
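The ConfigMap keys and these defaults suggest a simple override scheme: when an `LLM_CONNECT_CUSTODIAN_TRIAGE_*` variable is set, it replaces the matching default. The sketch below assumes that scheme. `TriageSettings` and `load_triage_settings` are hypothetical names, and llm-connect's actual profile loader may resolve values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TriageSettings:
    # Defaults copied from the runtime values listed in the hunk above.
    provider: str = "openrouter"
    model: str = "google/gemini-2.5-flash"
    temperature: float = 0.2
    max_tokens: int = 1800
    max_depth: int = 2


def load_triage_settings(env: Mapping[str, str] = os.environ) -> TriageSettings:
    """Overlay LLM_CONNECT_CUSTODIAN_TRIAGE_* variables onto the defaults."""
    defaults = TriageSettings()
    prefix = "LLM_CONNECT_CUSTODIAN_TRIAGE_"
    return TriageSettings(
        provider=env.get(prefix + "PROVIDER", defaults.provider),
        model=env.get(prefix + "MODEL", defaults.model),
        temperature=float(env.get(prefix + "TEMPERATURE", defaults.temperature)),
        max_tokens=int(env.get(prefix + "MAX_TOKENS", defaults.max_tokens)),
        max_depth=int(env.get(prefix + "MAX_DEPTH", defaults.max_depth)),
    )


settings = load_triage_settings({"LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS": "900"})
print(settings.model, settings.max_tokens)  # google/gemini-2.5-flash 900
```

Passing the environment as an explicit mapping keeps the loader testable without mutating `os.environ`.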
@@ -47,6 +47,12 @@ Provider credentials must be injected at runtime through
 `llm-connect-provider-secrets`; do not store credential values in Git or State
 Hub.

+Credential custody follows the ops-warden routing table: LLM provider API keys
+are an operator/OpenBao-to-Kubernetes Secret action, not an ops-warden issuance
+task. For the default OpenRouter profile, the Secret must provide
+`OPENROUTER_API_KEY` without exposing the value in Git, State Hub, logs, or
+chat.
+
 ## Local Smoke

 Run a mock server that returns known schema-valid daily triage JSON:
@@ -85,6 +91,7 @@ Run the in-namespace smoke:
 kubectl -n activity-core run llm-connect-smoke \
 --rm -i --restart=Never \
 --image=llm-connect:latest \
+--image-pull-policy=Never \
 --env=LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 \
 --env=LLM_CONNECT_TIMEOUT_SECONDS=300 \
 -- python scripts/smoke_activity_core_endpoint.py
@@ -92,13 +99,17 @@ kubectl -n activity-core run llm-connect-smoke \

 ## Handoff Status

-Code-owned artifacts are present in this repo. Live handoff is still pending
-operator action:
+Code-owned artifacts are present in this repo and the live llm-connect
+handoff is verified as of 2026-06-18:

-- Build/publish the `llm-connect` image selected by Railiance.
-- Create the runtime provider Secret outside Git.
-- Apply `deploy/k8s/activity-core-llm-connect`.
-- Smoke from the `activity-core` namespace.
-- Set activity-core `LLM_CONNECT_URL` to the stable URL above.
-- Run or observe one daily WSJF smoke/manual activity run and confirm a
-non-secret State Hub `daily_triage` progress event.
+- `docker.io/library/llm-connect:latest` was rebuilt from `Containerfile`,
+imported into the `coulombcore` k3s image store, and rolled out.
+- `activity-core/llm-connect-provider-secrets` reports `DATA 1`; no Secret
+values were inspected or recorded.
+- The live ConfigMap sets `LLM_CONNECT_MODEL=google/gemini-2.5-flash` and
+`LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash`.
+- The in-namespace smoke passed against the stable Service:
+`smoke: pass health=ok latency_seconds=2.147 recommendations=1`.
+
+Scheduled `daily_triage` evidence collection is activity-core ownership under
+`ACTIVITY-WP-0006`.
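The smoke evidence line has a regular `smoke: <status> key=value ...` shape. Below is a minimal parser for it, assuming that format based on the single recorded line. `parse_smoke_line` and `smoke_passed` are hypothetical helpers. The pass criterion used here (status `pass` plus `health=ok`) is an assumption, not the smoke script's documented rule.

```python
from typing import Dict


def parse_smoke_line(line: str) -> Dict[str, str]:
    """Split 'smoke: <status> key=value ...' into a flat dict of strings."""
    prefix = "smoke: "
    if not line.startswith(prefix):
        raise ValueError(f"not a smoke result line: {line!r}")
    status, *pairs = line[len(prefix):].split()
    fields = {"status": status}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {pair!r}")
        fields[key] = value
    return fields


def smoke_passed(line: str) -> bool:
    """Assumed pass rule: status is 'pass' and the health probe reported 'ok'."""
    fields = parse_smoke_line(line)
    return fields["status"] == "pass" and fields.get("health") == "ok"


evidence = "smoke: pass health=ok latency_seconds=2.147 recommendations=1"
print(parse_smoke_line(evidence)["latency_seconds"])  # 2.147
print(smoke_passed(evidence))  # True
```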
@@ -100,7 +100,7 @@ def merge_openai_chat_model_params(payload: dict[str, Any], model_params: dict[s
 "json_schema": {
 "name": "structured_output",
 "schema": schema,
-"strict": False,
+"strict": True,
 },
 }

@@ -82,6 +82,13 @@ class OpenRouterAdapter(LLMAdapter):
 }
 if config.model_params:
 merge_openai_chat_model_params(payload, config.model_params)
+provider_params = config.model_params.get("provider")
+if isinstance(provider_params, dict):
+payload["provider"] = dict(provider_params)
+if _uses_json_schema_response_format(payload):
+provider = payload.setdefault("provider", {})
+if isinstance(provider, dict):
+provider.setdefault("require_parameters", True)

 headers = {
 "Authorization": f"Bearer {self._api_key}",
@@ -149,3 +156,8 @@ class OpenRouterAdapter(LLMAdapter):
 else:
 raise
 raise last_exc # type: ignore[misc]
+
+
+def _uses_json_schema_response_format(payload: Dict[str, Any]) -> bool:
+response_format = payload.get("response_format")
+return isinstance(response_format, dict) and response_format.get("type") == "json_schema"
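Taken together, the adapter hunks do three things. They build a strict `json_schema` response format, copy caller-supplied `provider` options, and default `provider.require_parameters` to true whenever a schema is requested. The standalone sketch below restates that flow under hypothetical names (`structured_response_format`, `apply_provider_routing`); it is not llm-connect's code. It assumes the outer `response_format` carries `"type": "json_schema"`, which is the condition `_uses_json_schema_response_format` checks.

```python
from typing import Any, Dict, Optional


def structured_response_format(schema: Dict[str, Any], name: str = "structured_output") -> Dict[str, Any]:
    """Build a strict json_schema response_format block (OpenAI-style shape)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


def apply_provider_routing(payload: Dict[str, Any], provider_params: Optional[Any] = None) -> Dict[str, Any]:
    """Copy caller provider options, then require parameter support for schema calls."""
    if isinstance(provider_params, dict):
        # Shallow copy so adding require_parameters never mutates the caller's dict.
        payload["provider"] = dict(provider_params)
    response_format = payload.get("response_format")
    if isinstance(response_format, dict) and response_format.get("type") == "json_schema":
        provider = payload.setdefault("provider", {})
        if isinstance(provider, dict):
            # setdefault keeps an explicit caller choice such as False.
            provider.setdefault("require_parameters", True)
    return payload


schema = {"type": "object", "properties": {"summary": {"type": "string"}}, "required": ["summary"]}
caller_provider = {"order": ["Anthropic"]}
payload = apply_provider_routing(
    {"model": "google/gemini-2.5-flash", "response_format": structured_response_format(schema)},
    caller_provider,
)
print(payload["provider"])  # {'order': ['Anthropic'], 'require_parameters': True}
```

Because the flag is applied with `setdefault`, an explicit `require_parameters: False` from the caller survives. Copying the options with `dict(...)` leaves the caller's original dict untouched.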
@@ -16,7 +16,7 @@ from llm_connect.models import LLMResponse, RunConfig

 CUSTODIAN_TRIAGE_BALANCED = "custodian-triage-balanced"
 DEFAULT_CUSTODIAN_TRIAGE_PROVIDER = "openrouter"
-DEFAULT_CUSTODIAN_TRIAGE_MODEL = "anthropic/claude-sonnet-4"
+DEFAULT_CUSTODIAN_TRIAGE_MODEL = "google/gemini-2.5-flash"
 _RUN_CONFIG_DEFAULTS = RunConfig()
@@ -17,7 +17,7 @@ Usage (programmatic)::

 Usage (CLI)::

-python -m llm_connect.server --port 8080 --provider openrouter --model anthropic/claude-sonnet-4
+python -m llm_connect.server --port 8080 --provider openrouter --model google/gemini-2.5-flash
 """

 import argparse
@@ -34,7 +34,7 @@ def test_openai_chat_model_params_translate_activity_core_shape():
 "json_schema": {
 "name": "structured_output",
 "schema": STRUCTURED_SCHEMA,
-"strict": False,
+"strict": True,
 },
 }
 assert payload["top_p"] == 0.8
@@ -115,6 +115,14 @@ def test_unknown_custodian_profile_fails_without_secret_context():
 assert excinfo.value.context == {"profile": "custodian-missing"}


+def test_default_custodian_profile_uses_structured_output_capable_model():
+profiles = default_runtime_profiles()
+profile = profiles[CUSTODIAN_TRIAGE_BALANCED]
+
+assert profile.provider == "openrouter"
+assert profile.model == "google/gemini-2.5-flash"
+
+
 def test_default_profiles_can_be_overridden_from_json_env(monkeypatch):
 monkeypatch.setenv(
 "LLM_CONNECT_PROFILES_JSON",
@@ -15,6 +15,8 @@ STRUCTURED_SCHEMA = {
 "required": ["summary", "recommendations"],
 }

+OPENROUTER_STRUCTURED_MODEL = "google/gemini-2.5-flash"
+

 SMOKE_CONFIG = RunConfig(
 model_name="gpt-4",
@@ -54,7 +56,7 @@ def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):

 monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
 adapter = OpenRouterAdapter(
-model="anthropic/claude-sonnet-4",
+model=OPENROUTER_STRUCTURED_MODEL,
 api_key="or-test",
 api_base="https://openrouter.example/api/v1",
 )
@@ -62,15 +64,58 @@ def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):
 response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
 payload = captured["payload"]

-assert response.model == "anthropic/claude-sonnet-4"
-assert payload["model"] == "anthropic/claude-sonnet-4"
+assert response.model == OPENROUTER_STRUCTURED_MODEL
+assert payload["model"] == OPENROUTER_STRUCTURED_MODEL
 assert payload["response_format"]["json_schema"]["schema"] == STRUCTURED_SCHEMA
-assert payload["response_format"]["json_schema"]["strict"] is False
+assert payload["response_format"]["json_schema"]["strict"] is True
+assert payload["provider"]["require_parameters"] is True
 assert "reasoning_effort" not in payload
 assert "max_depth" not in payload
 assert "json_schema" not in payload


+def test_openrouter_structured_output_preserves_provider_options(monkeypatch):
+captured: dict[str, object] = {}
+
+def fake_post_json(url, payload, headers=None, timeout=300): # noqa: ANN001
+captured["payload"] = payload
+return {
+"id": "or-response",
+"model": payload["model"],
+"choices": [
+{
+"message": {
+"content": json.dumps({"summary": "ok", "recommendations": []})
+},
+"finish_reason": "stop",
+}
+],
+"usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
+}
+
+config = RunConfig(
+model_name="gpt-4",
+temperature=0.1,
+max_tokens=300,
+model_params={
+"json_schema": STRUCTURED_SCHEMA,
+"provider": {"order": ["Anthropic"]},
+},
+)
+monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
+adapter = OpenRouterAdapter(
+model=OPENROUTER_STRUCTURED_MODEL,
+api_key="or-test",
+api_base="https://openrouter.example/api/v1",
+)
+
+adapter.execute_prompt("Return JSON.", config)
+payload = captured["payload"]
+
+assert payload["provider"]["order"] == ["Anthropic"]
+assert payload["provider"]["require_parameters"] is True
+
+
 def test_openai_structured_output_payload(monkeypatch):
 captured: dict[str, object] = {}

@@ -4,13 +4,13 @@ type: workplan
 title: "Activity-Core Always-On LLM Endpoint"
 domain: custodian
 repo: llm-connect
-status: blocked
+status: finished
 owner: codex
 topic_slug: activity-core-llm-endpoint
 planning_priority: high
 planning_order: 6
 created: "2026-06-07"
-updated: "2026-06-07"
+updated: "2026-06-18"
 depends_on_workplans:
 - LLM-WP-0003
 related_workplans:
@@ -20,7 +20,7 @@ state_hub_workstream_id: "8de71d58-1193-424f-8338-a9aa4e173c5b"

 # LLM-WP-0006 - Activity-Core Always-On LLM Endpoint

-**status:** blocked
+**status:** finished
 **owner:** codex

 ## Purpose
@@ -206,7 +206,7 @@ reported distinctly from provider transport failure.
 id: LLM-WP-0006-T07
 title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
 priority: high
-status: blocked
+status: done
 state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"
 ```

@@ -341,6 +341,74 @@ Remaining blocked live gate:
 `activity-core` with the verified URL and the remaining provider Secret gate
 for schema-valid `POST /execute` and `daily_triage` evidence.

+2026-06-17 recheck:
+
+- Verified the live `coulombcore` Kubernetes path is reachable and the
+`activity-core` namespace `llm-connect` Deployment remains `1/1` available
+with Service `llm-connect` on port `8080`.
+- Confirmed the `llm-connect-provider-secrets` Secret object exists but still
+reports `DATA 0`; no Secret values were inspected.
+- Re-ran the in-namespace fixture smoke with the node-local image. The first
+corrected pod needed `--image-pull-policy=Never` because the `:latest` tag
+otherwise attempted a Docker Hub pull. With the local image, the smoke reached
+`/execute` and failed safely with
+`configuration_error: Adapter rejected RunConfig`.
+- State Hub now also has a 2026-06-16 `daily_triage` event from
+`activity-core` showing `LLM_CONNECT_URL is not configured`, and the local
+activity-core runtime manifest still has `LLM_CONNECT_URL: ""`.
+- `LLM-WP-0006-T07` therefore remains externally blocked until the provider
+Secret is populated outside Git/State Hub, activity-core consumes
+`http://llm-connect.activity-core.svc.cluster.local:8080` with
+`LLM_CONNECT_TIMEOUT_SECONDS=300`, the fixture smoke returns schema-valid
+JSON, and a non-secret `daily_triage` evidence event is recorded.
+
+2026-06-18 recheck:
+
+- activity-core has repo-local work to consume the stable URL:
+`actcore-runtime-config` now sets
+`LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`
+and `LLM_CONNECT_TIMEOUT_SECONDS=300`.
+- The live `activity-core` namespace has not yet been reconciled to that
+activity-core runtime surface; live deployments currently show only
+`deployment.apps/llm-connect`, and live ConfigMaps show only
+`kube-root-ca.crt` and `llm-connect-config`.
+- The live `llm-connect-provider-secrets` Secret still reports `DATA 0`; no
+Secret values were inspected.
+- ops-warden's credential-routing guidance says LLM provider API keys are not
+an ops-warden issuance task. The remaining credential gate belongs to the
+approved operator/OpenBao-to-Kubernetes Secret path for
+`activity-core/llm-connect-provider-secrets`.
+- `LLM-WP-0006-T07` remains blocked until the provider Secret is populated,
+the activity-core runtime is reconciled with the URL/timeout config, the
+fixture smoke returns schema-valid JSON from inside the namespace, and
+activity-core records non-secret `daily_triage` evidence.
+
+2026-06-18 closure:
+
+- Populated-provider state is now live: `activity-core/llm-connect-provider-secrets`
+reports `DATA 1`; no Secret values were inspected or recorded.
+- Updated the OpenRouter structured-output path to request strict JSON schema
+output and to set `provider.require_parameters=true` for schema calls, so
+OpenRouter routes only to providers that support the requested structured
+output parameters.
+- OpenRouter model metadata showed the previous
+`anthropic/claude-sonnet-4` profile model does not advertise
+`response_format`/`structured_outputs`; switched the activity-core profile
+and Kubernetes ConfigMap defaults to `google/gemini-2.5-flash`, which does.
+- Rebuilt `docker.io/library/llm-connect:latest` from `Containerfile`,
+imported it into the `coulombcore` k3s image store, applied the updated
+non-secret `llm-connect-config` ConfigMap, and rolled out
+`deployment/llm-connect`.
+- Verified live ConfigMap values:
+`LLM_CONNECT_MODEL=google/gemini-2.5-flash` and
+`LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash`.
+- Final in-namespace smoke passed against
+`http://llm-connect.activity-core.svc.cluster.local:8080` with:
+`smoke: pass health=ok latency_seconds=2.147 recommendations=1`.
+- Cleaned up the one-shot smoke pod after collecting logs. The llm-connect
+endpoint handoff is complete; collecting scheduled `daily_triage` evidence
+now belongs to activity-core / `ACTIVITY-WP-0006`.
+
 ## Closure Notes

 After this workplan file is added or task statuses change, ask the custodian