Complete activity-core LLM endpoint handoff (LLM-WP-0006)
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled

Switch the custodian triage default from anthropic/claude-sonnet-4 to
google/gemini-2.5-flash, which advertises structured-output support on
OpenRouter. Tighten the OpenRouter adapter to send strict JSON schema
requests and set provider.require_parameters=true so routing only hits
providers that honor the requested response_format.

Update Kubernetes deploy docs and config for the verified coulombcore
handoff: Containerfile build path, image-pull-policy=Never for smoke
pods, credential-routing notes, and live smoke evidence. Mark
LLM-WP-0006 finished with closure notes from 2026-06-18.
This commit is contained in:
2026-06-19 13:51:12 +02:00
parent 6a0319ee86
commit 90eb39c247
12 changed files with 176 additions and 27 deletions

View File

@@ -123,9 +123,9 @@ Useful runtime environment variables:
LLM_CONNECT_HOST=0.0.0.0
LLM_CONNECT_PORT=8080
LLM_CONNECT_PROVIDER=openrouter
LLM_CONNECT_MODEL=anthropic/claude-sonnet-4
LLM_CONNECT_MODEL=google/gemini-2.5-flash
LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER=openrouter
LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=anthropic/claude-sonnet-4
LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash
```
For local smoke tests without provider credentials:

View File

@@ -17,10 +17,14 @@ kubectl -n activity-core create secret generic llm-connect-provider-secrets \
--from-literal=OPENROUTER_API_KEY="$OPENROUTER_API_KEY"
```
Provider API key custody belongs to the operator/OpenBao-to-Kubernetes Secret
path. ops-warden documents this as outside its issuance scope; do not paste key
values into Git, State Hub, logs, or chat.
Apply:
```bash
docker build -t docker.io/library/llm-connect:latest .
docker build -f Containerfile -t docker.io/library/llm-connect:latest .
docker save docker.io/library/llm-connect:latest | ssh coulombcore sudo k3s ctr -n k8s.io images import -
kubectl apply -k deploy/k8s/activity-core-llm-connect
kubectl -n activity-core rollout status deployment/llm-connect
@@ -33,6 +37,7 @@ fixtures and `scripts/smoke_activity_core_endpoint.py`:
kubectl -n activity-core run llm-connect-smoke \
--rm -i --restart=Never \
--image=llm-connect:latest \
--image-pull-policy=Never \
--env=LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 \
--env=LLM_CONNECT_TIMEOUT_SECONDS=300 \
-- python scripts/smoke_activity_core_endpoint.py

View File

@@ -10,9 +10,9 @@ data:
LLM_CONNECT_HOST: "0.0.0.0"
LLM_CONNECT_PORT: "8080"
LLM_CONNECT_PROVIDER: "openrouter"
LLM_CONNECT_MODEL: "anthropic/claude-sonnet-4"
LLM_CONNECT_MODEL: "google/gemini-2.5-flash"
LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER: "openrouter"
LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL: "anthropic/claude-sonnet-4"
LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL: "google/gemini-2.5-flash"
LLM_CONNECT_CUSTODIAN_TRIAGE_TEMPERATURE: "0.2"
LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS: "1800"
LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_DEPTH: "2"

View File

@@ -27,7 +27,7 @@ Default runtime values:
```text
provider=openrouter
model=anthropic/claude-sonnet-4
model=google/gemini-2.5-flash
temperature=0.2
max_tokens=1800
max_depth=2
@@ -47,6 +47,12 @@ Provider credentials must be injected at runtime through
`llm-connect-provider-secrets`; do not store credential values in Git or State
Hub.
Credential custody follows the ops-warden routing table: LLM provider API keys
are an operator/OpenBao-to-Kubernetes Secret action, not an ops-warden issuance
task. For the default OpenRouter profile, the Secret must provide
`OPENROUTER_API_KEY` without exposing the value in Git, State Hub, logs, or
chat.
## Local Smoke
Run a mock server that returns known schema-valid daily triage JSON:
@@ -85,6 +91,7 @@ Run the in-namespace smoke:
kubectl -n activity-core run llm-connect-smoke \
--rm -i --restart=Never \
--image=llm-connect:latest \
--image-pull-policy=Never \
--env=LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 \
--env=LLM_CONNECT_TIMEOUT_SECONDS=300 \
-- python scripts/smoke_activity_core_endpoint.py
@@ -92,13 +99,17 @@ kubectl -n activity-core run llm-connect-smoke \
## Handoff Status
Code-owned artifacts are present in this repo. Live handoff is still pending
operator action:
Code-owned artifacts are present in this repo and the live llm-connect
handoff is verified as of 2026-06-18:
- Build/publish the `llm-connect` image selected by Railiance.
- Create the runtime provider Secret outside Git.
- Apply `deploy/k8s/activity-core-llm-connect`.
- Smoke from the `activity-core` namespace.
- Set activity-core `LLM_CONNECT_URL` to the stable URL above.
- Run or observe one daily WSJF smoke/manual activity run and confirm a
non-secret State Hub `daily_triage` progress event.
- `docker.io/library/llm-connect:latest` was rebuilt from `Containerfile`,
imported into the `coulombcore` k3s image store, and rolled out.
- `activity-core/llm-connect-provider-secrets` reports `DATA 1`; no Secret
values were inspected or recorded.
- The live ConfigMap sets `LLM_CONNECT_MODEL=google/gemini-2.5-flash` and
`LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash`.
- The in-namespace smoke passed against the stable Service:
`smoke: pass health=ok latency_seconds=2.147 recommendations=1`.
Scheduled `daily_triage` evidence collection is activity-core ownership under
`ACTIVITY-WP-0006`.

View File

@@ -100,7 +100,7 @@ def merge_openai_chat_model_params(payload: dict[str, Any], model_params: dict[s
"json_schema": {
"name": "structured_output",
"schema": schema,
"strict": False,
"strict": True,
},
}

View File

@@ -82,6 +82,13 @@ class OpenRouterAdapter(LLMAdapter):
}
if config.model_params:
merge_openai_chat_model_params(payload, config.model_params)
provider_params = config.model_params.get("provider")
if isinstance(provider_params, dict):
payload["provider"] = dict(provider_params)
if _uses_json_schema_response_format(payload):
provider = payload.setdefault("provider", {})
if isinstance(provider, dict):
provider.setdefault("require_parameters", True)
headers = {
"Authorization": f"Bearer {self._api_key}",
@@ -149,3 +156,8 @@ class OpenRouterAdapter(LLMAdapter):
else:
raise
raise last_exc # type: ignore[misc]
def _uses_json_schema_response_format(payload: Dict[str, Any]) -> bool:
response_format = payload.get("response_format")
return isinstance(response_format, dict) and response_format.get("type") == "json_schema"

View File

@@ -16,7 +16,7 @@ from llm_connect.models import LLMResponse, RunConfig
CUSTODIAN_TRIAGE_BALANCED = "custodian-triage-balanced"
DEFAULT_CUSTODIAN_TRIAGE_PROVIDER = "openrouter"
DEFAULT_CUSTODIAN_TRIAGE_MODEL = "anthropic/claude-sonnet-4"
DEFAULT_CUSTODIAN_TRIAGE_MODEL = "google/gemini-2.5-flash"
_RUN_CONFIG_DEFAULTS = RunConfig()

View File

@@ -17,7 +17,7 @@ Usage (programmatic)::
Usage (CLI)::
python -m llm_connect.server --port 8080 --provider openrouter --model anthropic/claude-sonnet-4
python -m llm_connect.server --port 8080 --provider openrouter --model google/gemini-2.5-flash
"""
import argparse

View File

@@ -34,7 +34,7 @@ def test_openai_chat_model_params_translate_activity_core_shape():
"json_schema": {
"name": "structured_output",
"schema": STRUCTURED_SCHEMA,
"strict": False,
"strict": True,
},
}
assert payload["top_p"] == 0.8

View File

@@ -115,6 +115,14 @@ def test_unknown_custodian_profile_fails_without_secret_context():
assert excinfo.value.context == {"profile": "custodian-missing"}
def test_default_custodian_profile_uses_structured_output_capable_model():
profiles = default_runtime_profiles()
profile = profiles[CUSTODIAN_TRIAGE_BALANCED]
assert profile.provider == "openrouter"
assert profile.model == "google/gemini-2.5-flash"
def test_default_profiles_can_be_overridden_from_json_env(monkeypatch):
monkeypatch.setenv(
"LLM_CONNECT_PROFILES_JSON",

View File

@@ -15,6 +15,8 @@ STRUCTURED_SCHEMA = {
"required": ["summary", "recommendations"],
}
OPENROUTER_STRUCTURED_MODEL = "google/gemini-2.5-flash"
SMOKE_CONFIG = RunConfig(
model_name="gpt-4",
@@ -54,7 +56,7 @@ def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):
monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
adapter = OpenRouterAdapter(
model="anthropic/claude-sonnet-4",
model=OPENROUTER_STRUCTURED_MODEL,
api_key="or-test",
api_base="https://openrouter.example/api/v1",
)
@@ -62,15 +64,58 @@ def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):
response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
payload = captured["payload"]
assert response.model == "anthropic/claude-sonnet-4"
assert payload["model"] == "anthropic/claude-sonnet-4"
assert response.model == OPENROUTER_STRUCTURED_MODEL
assert payload["model"] == OPENROUTER_STRUCTURED_MODEL
assert payload["response_format"]["json_schema"]["schema"] == STRUCTURED_SCHEMA
assert payload["response_format"]["json_schema"]["strict"] is False
assert payload["response_format"]["json_schema"]["strict"] is True
assert payload["provider"]["require_parameters"] is True
assert "reasoning_effort" not in payload
assert "max_depth" not in payload
assert "json_schema" not in payload
def test_openrouter_structured_output_preserves_provider_options(monkeypatch):
captured: dict[str, object] = {}
def fake_post_json(url, payload, headers=None, timeout=300): # noqa: ANN001
captured["payload"] = payload
return {
"id": "or-response",
"model": payload["model"],
"choices": [
{
"message": {
"content": json.dumps({"summary": "ok", "recommendations": []})
},
"finish_reason": "stop",
}
],
"usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
}
config = RunConfig(
model_name="gpt-4",
temperature=0.1,
max_tokens=300,
model_params={
"json_schema": STRUCTURED_SCHEMA,
"provider": {"order": ["Anthropic"]},
},
)
monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
adapter = OpenRouterAdapter(
model=OPENROUTER_STRUCTURED_MODEL,
api_key="or-test",
api_base="https://openrouter.example/api/v1",
)
adapter.execute_prompt("Return JSON.", config)
payload = captured["payload"]
assert payload["provider"]["order"] == ["Anthropic"]
assert payload["provider"]["require_parameters"] is True
def test_openai_structured_output_payload(monkeypatch):
captured: dict[str, object] = {}

View File

@@ -4,13 +4,13 @@ type: workplan
title: "Activity-Core Always-On LLM Endpoint"
domain: custodian
repo: llm-connect
status: blocked
status: finished
owner: codex
topic_slug: activity-core-llm-endpoint
planning_priority: high
planning_order: 6
created: "2026-06-07"
updated: "2026-06-07"
updated: "2026-06-18"
depends_on_workplans:
- LLM-WP-0003
related_workplans:
@@ -20,7 +20,7 @@ state_hub_workstream_id: "8de71d58-1193-424f-8338-a9aa4e173c5b"
# LLM-WP-0006 - Activity-Core Always-On LLM Endpoint
**status:** blocked
**status:** finished
**owner:** codex
## Purpose
@@ -206,7 +206,7 @@ reported distinctly from provider transport failure.
id: LLM-WP-0006-T07
title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
priority: high
status: blocked
status: done
state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"
```
@@ -341,6 +341,74 @@ Remaining blocked live gate:
`activity-core` with the verified URL and the remaining provider Secret gate
for schema-valid `POST /execute` and `daily_triage` evidence.
2026-06-17 recheck:
- Verified the live `coulombcore` Kubernetes path is reachable and the
`activity-core` namespace `llm-connect` Deployment remains `1/1` available
with Service `llm-connect` on port `8080`.
- Confirmed the `llm-connect-provider-secrets` Secret object exists but still
reports `DATA 0`; no Secret values were inspected.
- Re-ran the in-namespace fixture smoke with the node-local image. The first
corrected pod needed `--image-pull-policy=Never` because the `:latest` tag
otherwise attempted a Docker Hub pull. With the local image, the smoke reached
`/execute` and failed safely with
`configuration_error: Adapter rejected RunConfig`.
- State Hub now also has a 2026-06-16 `daily_triage` event from
`activity-core` showing `LLM_CONNECT_URL is not configured`, and the local
activity-core runtime manifest still has `LLM_CONNECT_URL: ""`.
- `LLM-WP-0006-T07` therefore remains externally blocked until the provider
Secret is populated outside Git/State Hub, activity-core consumes
`http://llm-connect.activity-core.svc.cluster.local:8080` with
`LLM_CONNECT_TIMEOUT_SECONDS=300`, the fixture smoke returns schema-valid
JSON, and a non-secret `daily_triage` evidence event is recorded.
2026-06-18 recheck:
- activity-core has repo-local work to consume the stable URL:
`actcore-runtime-config` now sets
`LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`
and `LLM_CONNECT_TIMEOUT_SECONDS=300`.
- The live `activity-core` namespace has not yet been reconciled to that
activity-core runtime surface; live deployments currently show only
`deployment.apps/llm-connect`, and live ConfigMaps show only
`kube-root-ca.crt` and `llm-connect-config`.
- The live `llm-connect-provider-secrets` Secret still reports `DATA 0`; no
Secret values were inspected.
- ops-warden's credential-routing guidance says LLM provider API keys are not
an ops-warden issuance task. The remaining credential gate belongs to the
approved operator/OpenBao-to-Kubernetes Secret path for
`activity-core/llm-connect-provider-secrets`.
- `LLM-WP-0006-T07` remains blocked until the provider Secret is populated,
the activity-core runtime is reconciled with the URL/timeout config, the
fixture smoke returns schema-valid JSON from inside the namespace, and
activity-core records non-secret `daily_triage` evidence.
2026-06-18 closure:
- Populated-provider state is now live: `activity-core/llm-connect-provider-secrets`
reports `DATA 1`; no Secret values were inspected or recorded.
- Updated the OpenRouter structured-output path to request strict JSON schema
output and to set `provider.require_parameters=true` for schema calls, so
OpenRouter routes only to providers that support the requested structured
output parameters.
- OpenRouter model metadata showed the previous
`anthropic/claude-sonnet-4` profile model does not advertise
`response_format`/`structured_outputs`; switched the activity-core profile
and Kubernetes ConfigMap defaults to `google/gemini-2.5-flash`, which does.
- Rebuilt `docker.io/library/llm-connect:latest` from `Containerfile`,
imported it into the `coulombcore` k3s image store, applied the updated
non-secret `llm-connect-config` ConfigMap, and rolled out
`deployment/llm-connect`.
- Verified live ConfigMap values:
`LLM_CONNECT_MODEL=google/gemini-2.5-flash` and
`LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash`.
- Final in-namespace smoke passed against
`http://llm-connect.activity-core.svc.cluster.local:8080` with:
`smoke: pass health=ok latency_seconds=2.147 recommendations=1`.
- Cleaned up the one-shot smoke pod after collecting logs. The llm-connect
endpoint handoff is complete; collecting scheduled `daily_triage` evidence
now belongs to activity-core / `ACTIVITY-WP-0006`.
## Closure Notes
After this workplan file is added or task statuses change, ask the custodian