Complete activity-core LLM endpoint handoff (LLM-WP-0006)

Switch the custodian triage default from anthropic/claude-sonnet-4 to google/gemini-2.5-flash, which advertises structured-output support on OpenRouter. Tighten the OpenRouter adapter to send strict JSON schema requests and set provider.require_parameters=true so routing only hits providers that honor the requested response_format. Update Kubernetes deploy docs and config for the verified coulombcore handoff: Containerfile build path, image-pull-policy=Never for smoke pods, credential-routing notes, and live smoke evidence. Mark LLM-WP-0006 finished with closure notes from 2026-06-18.
2026-06-19 13:51:12 +02:00
parent 6a0319ee86
commit 90eb39c247
12 changed files with 176 additions and 27 deletions
--- a/README.md
+++ b/README.md
@@ -123,9 +123,9 @@ Useful runtime environment variables:
 LLM_CONNECT_HOST=0.0.0.0
 LLM_CONNECT_PORT=8080
 LLM_CONNECT_PROVIDER=openrouter
-LLM_CONNECT_MODEL=anthropic/claude-sonnet-4
+LLM_CONNECT_MODEL=google/gemini-2.5-flash
 LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER=openrouter
-LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=anthropic/claude-sonnet-4
+LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash
 ```

 For local smoke tests without provider credentials:
--- a/deploy/k8s/activity-core-llm-connect/README.md
+++ b/deploy/k8s/activity-core-llm-connect/README.md
@@ -17,10 +17,14 @@ kubectl -n activity-core create secret generic llm-connect-provider-secrets \
  --from-literal=OPENROUTER_API_KEY="$OPENROUTER_API_KEY"
 ```

+Provider API key custody belongs to the operator/OpenBao-to-Kubernetes Secret
+path. ops-warden documents this as outside its issuance scope; do not paste key
+values into Git, State Hub, logs, or chat.
+
 Apply:

 ```bash
-docker build -t docker.io/library/llm-connect:latest .
+docker build -f Containerfile -t docker.io/library/llm-connect:latest .
 docker save docker.io/library/llm-connect:latest | ssh coulombcore sudo k3s ctr -n k8s.io images import -
 kubectl apply -k deploy/k8s/activity-core-llm-connect
 kubectl -n activity-core rollout status deployment/llm-connect
@@ -33,6 +37,7 @@ fixtures and `scripts/smoke_activity_core_endpoint.py`:
 kubectl -n activity-core run llm-connect-smoke \
  --rm -i --restart=Never \
  --image=llm-connect:latest \
+  --image-pull-policy=Never \
  --env=LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 \
  --env=LLM_CONNECT_TIMEOUT_SECONDS=300 \
  -- python scripts/smoke_activity_core_endpoint.py
--- a/deploy/k8s/activity-core-llm-connect/configmap.yaml
+++ b/deploy/k8s/activity-core-llm-connect/configmap.yaml
@@ -10,9 +10,9 @@ data:
  LLM_CONNECT_HOST: "0.0.0.0"
  LLM_CONNECT_PORT: "8080"
  LLM_CONNECT_PROVIDER: "openrouter"
-  LLM_CONNECT_MODEL: "anthropic/claude-sonnet-4"
+  LLM_CONNECT_MODEL: "google/gemini-2.5-flash"
  LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER: "openrouter"
-  LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL: "anthropic/claude-sonnet-4"
+  LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL: "google/gemini-2.5-flash"
  LLM_CONNECT_CUSTODIAN_TRIAGE_TEMPERATURE: "0.2"
  LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_TOKENS: "1800"
  LLM_CONNECT_CUSTODIAN_TRIAGE_MAX_DEPTH: "2"
--- a/docs/activity-core-llm-endpoint.md
+++ b/docs/activity-core-llm-endpoint.md
@@ -27,7 +27,7 @@ Default runtime values:

 ```text
 provider=openrouter
-model=anthropic/claude-sonnet-4
+model=google/gemini-2.5-flash
 temperature=0.2
 max_tokens=1800
 max_depth=2
@@ -47,6 +47,12 @@ Provider credentials must be injected at runtime through
 `llm-connect-provider-secrets`; do not store credential values in Git or State
 Hub.

+Credential custody follows the ops-warden routing table: LLM provider API keys
+are an operator/OpenBao-to-Kubernetes Secret action, not an ops-warden issuance
+task. For the default OpenRouter profile, the Secret must provide
+`OPENROUTER_API_KEY` without exposing the value in Git, State Hub, logs, or
+chat.
+
 ## Local Smoke

 Run a mock server that returns known schema-valid daily triage JSON:
@@ -85,6 +91,7 @@ Run the in-namespace smoke:
 kubectl -n activity-core run llm-connect-smoke \
  --rm -i --restart=Never \
  --image=llm-connect:latest \
+  --image-pull-policy=Never \
  --env=LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 \
  --env=LLM_CONNECT_TIMEOUT_SECONDS=300 \
  -- python scripts/smoke_activity_core_endpoint.py
@@ -92,13 +99,17 @@ kubectl -n activity-core run llm-connect-smoke \

 ## Handoff Status

-Code-owned artifacts are present in this repo. Live handoff is still pending
-operator action:
+Code-owned artifacts are present in this repo and the live llm-connect
+handoff is verified as of 2026-06-18:

- Build/publish the `llm-connect` image selected by Railiance.
- Create the runtime provider Secret outside Git.
- Apply `deploy/k8s/activity-core-llm-connect`.
- Smoke from the `activity-core` namespace.
- Set activity-core `LLM_CONNECT_URL` to the stable URL above.
- Run or observe one daily WSJF smoke/manual activity run and confirm a
-  non-secret State Hub `daily_triage` progress event.
+- `docker.io/library/llm-connect:latest` was rebuilt from `Containerfile`,
+  imported into the `coulombcore` k3s image store, and rolled out.
+- `activity-core/llm-connect-provider-secrets` reports `DATA 1`; no Secret
+  values were inspected or recorded.
+- The live ConfigMap sets `LLM_CONNECT_MODEL=google/gemini-2.5-flash` and
+  `LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash`.
+- The in-namespace smoke passed against the stable Service:
+  `smoke: pass health=ok latency_seconds=2.147 recommendations=1`.
+
+Scheduled `daily_triage` evidence collection is activity-core ownership under
+`ACTIVITY-WP-0006`.
--- a/llm_connect/_payload.py
+++ b/llm_connect/_payload.py
@@ -100,7 +100,7 @@ def merge_openai_chat_model_params(payload: dict[str, Any], model_params: dict[s
            "json_schema": {
                "name": "structured_output",
                "schema": schema,
-                "strict": False,
+                "strict": True,
            },
        }

--- a/llm_connect/openrouter.py
+++ b/llm_connect/openrouter.py
@@ -82,6 +82,13 @@ class OpenRouterAdapter(LLMAdapter):
        }
        if config.model_params:
            merge_openai_chat_model_params(payload, config.model_params)
+            provider_params = config.model_params.get("provider")
+            if isinstance(provider_params, dict):
+                payload["provider"] = dict(provider_params)
+            if _uses_json_schema_response_format(payload):
+                provider = payload.setdefault("provider", {})
+                if isinstance(provider, dict):
+                    provider.setdefault("require_parameters", True)

        headers = {
            "Authorization": f"Bearer {self._api_key}",
@@ -149,3 +156,8 @@ class OpenRouterAdapter(LLMAdapter):
                else:
                    raise
        raise last_exc  # type: ignore[misc]
+
+
+def _uses_json_schema_response_format(payload: Dict[str, Any]) -> bool:
+    response_format = payload.get("response_format")
+    return isinstance(response_format, dict) and response_format.get("type") == "json_schema"
--- a/llm_connect/profiles.py
+++ b/llm_connect/profiles.py
@@ -16,7 +16,7 @@ from llm_connect.models import LLMResponse, RunConfig

 CUSTODIAN_TRIAGE_BALANCED = "custodian-triage-balanced"
 DEFAULT_CUSTODIAN_TRIAGE_PROVIDER = "openrouter"
-DEFAULT_CUSTODIAN_TRIAGE_MODEL = "anthropic/claude-sonnet-4"
+DEFAULT_CUSTODIAN_TRIAGE_MODEL = "google/gemini-2.5-flash"
 _RUN_CONFIG_DEFAULTS = RunConfig()


--- a/llm_connect/server.py
+++ b/llm_connect/server.py
@@ -17,7 +17,7 @@ Usage (programmatic)::

 Usage (CLI)::

-    python -m llm_connect.server --port 8080 --provider openrouter --model anthropic/claude-sonnet-4
+    python -m llm_connect.server --port 8080 --provider openrouter --model google/gemini-2.5-flash
 """

 import argparse
--- a/tests/test_payload.py
+++ b/tests/test_payload.py
@@ -34,7 +34,7 @@ def test_openai_chat_model_params_translate_activity_core_shape():
        "json_schema": {
            "name": "structured_output",
            "schema": STRUCTURED_SCHEMA,
-            "strict": False,
+            "strict": True,
        },
    }
    assert payload["top_p"] == 0.8
--- a/tests/test_profiles.py
+++ b/tests/test_profiles.py
@@ -115,6 +115,14 @@ def test_unknown_custodian_profile_fails_without_secret_context():
    assert excinfo.value.context == {"profile": "custodian-missing"}


+def test_default_custodian_profile_uses_structured_output_capable_model():
+    profiles = default_runtime_profiles()
+    profile = profiles[CUSTODIAN_TRIAGE_BALANCED]
+
+    assert profile.provider == "openrouter"
+    assert profile.model == "google/gemini-2.5-flash"
+
+
 def test_default_profiles_can_be_overridden_from_json_env(monkeypatch):
    monkeypatch.setenv(
        "LLM_CONNECT_PROFILES_JSON",
--- a/tests/test_structured_output_smoke.py
+++ b/tests/test_structured_output_smoke.py
@@ -15,6 +15,8 @@ STRUCTURED_SCHEMA = {
    "required": ["summary", "recommendations"],
 }

+OPENROUTER_STRUCTURED_MODEL = "google/gemini-2.5-flash"
+

 SMOKE_CONFIG = RunConfig(
    model_name="gpt-4",
@@ -54,7 +56,7 @@ def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):

    monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
    adapter = OpenRouterAdapter(
-        model="anthropic/claude-sonnet-4",
+        model=OPENROUTER_STRUCTURED_MODEL,
        api_key="or-test",
        api_base="https://openrouter.example/api/v1",
    )
@@ -62,15 +64,58 @@ def test_openrouter_structured_output_payload_and_model_routing(monkeypatch):
    response = adapter.execute_prompt("Return JSON.", SMOKE_CONFIG)
    payload = captured["payload"]

-    assert response.model == "anthropic/claude-sonnet-4"
-    assert payload["model"] == "anthropic/claude-sonnet-4"
+    assert response.model == OPENROUTER_STRUCTURED_MODEL
+    assert payload["model"] == OPENROUTER_STRUCTURED_MODEL
    assert payload["response_format"]["json_schema"]["schema"] == STRUCTURED_SCHEMA
-    assert payload["response_format"]["json_schema"]["strict"] is False
+    assert payload["response_format"]["json_schema"]["strict"] is True
+    assert payload["provider"]["require_parameters"] is True
    assert "reasoning_effort" not in payload
    assert "max_depth" not in payload
    assert "json_schema" not in payload


+def test_openrouter_structured_output_preserves_provider_options(monkeypatch):
+    captured: dict[str, object] = {}
+
+    def fake_post_json(url, payload, headers=None, timeout=300):  # noqa: ANN001
+        captured["payload"] = payload
+        return {
+            "id": "or-response",
+            "model": payload["model"],
+            "choices": [
+                {
+                    "message": {
+                        "content": json.dumps({"summary": "ok", "recommendations": []})
+                    },
+                    "finish_reason": "stop",
+                }
+            ],
+            "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
+        }
+
+    config = RunConfig(
+        model_name="gpt-4",
+        temperature=0.1,
+        max_tokens=300,
+        model_params={
+            "json_schema": STRUCTURED_SCHEMA,
+            "provider": {"order": ["Anthropic"]},
+        },
+    )
+    monkeypatch.setattr("llm_connect.openrouter.post_json", fake_post_json)
+    adapter = OpenRouterAdapter(
+        model=OPENROUTER_STRUCTURED_MODEL,
+        api_key="or-test",
+        api_base="https://openrouter.example/api/v1",
+    )
+
+    adapter.execute_prompt("Return JSON.", config)
+    payload = captured["payload"]
+
+    assert payload["provider"]["order"] == ["Anthropic"]
+    assert payload["provider"]["require_parameters"] is True
+
+
 def test_openai_structured_output_payload(monkeypatch):
    captured: dict[str, object] = {}

--- a/workplans/LLM-WP-0006-activity-core-always-on-endpoint.md
+++ b/workplans/LLM-WP-0006-activity-core-always-on-endpoint.md
@@ -4,13 +4,13 @@ type: workplan
 title: "Activity-Core Always-On LLM Endpoint"
 domain: custodian
 repo: llm-connect
-status: blocked
+status: finished
 owner: codex
 topic_slug: activity-core-llm-endpoint
 planning_priority: high
 planning_order: 6
 created: "2026-06-07"
-updated: "2026-06-07"
+updated: "2026-06-18"
 depends_on_workplans:
  - LLM-WP-0003
 related_workplans:
@@ -20,7 +20,7 @@ state_hub_workstream_id: "8de71d58-1193-424f-8338-a9aa4e173c5b"

 # LLM-WP-0006 - Activity-Core Always-On LLM Endpoint

-**status:** blocked
+**status:** finished
 **owner:** codex

 ## Purpose
@@ -206,7 +206,7 @@ reported distinctly from provider transport failure.
 id: LLM-WP-0006-T07
 title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
 priority: high
-status: blocked
+status: done
 state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"
 ```

@@ -341,6 +341,74 @@ Remaining blocked live gate:
  `activity-core` with the verified URL and the remaining provider Secret gate
  for schema-valid `POST /execute` and `daily_triage` evidence.

+2026-06-17 recheck:
+
+- Verified the live `coulombcore` Kubernetes path is reachable and the
+  `activity-core` namespace `llm-connect` Deployment remains `1/1` available
+  with Service `llm-connect` on port `8080`.
+- Confirmed the `llm-connect-provider-secrets` Secret object exists but still
+  reports `DATA 0`; no Secret values were inspected.
+- Re-ran the in-namespace fixture smoke with the node-local image. The first
+  corrected pod needed `--image-pull-policy=Never` because the `:latest` tag
+  otherwise attempted a Docker Hub pull. With the local image, the smoke reached
+  `/execute` and failed safely with
+  `configuration_error: Adapter rejected RunConfig`.
+- State Hub now also has a 2026-06-16 `daily_triage` event from
+  `activity-core` showing `LLM_CONNECT_URL is not configured`, and the local
+  activity-core runtime manifest still has `LLM_CONNECT_URL: ""`.
+- `LLM-WP-0006-T07` therefore remains externally blocked until the provider
+  Secret is populated outside Git/State Hub, activity-core consumes
+  `http://llm-connect.activity-core.svc.cluster.local:8080` with
+  `LLM_CONNECT_TIMEOUT_SECONDS=300`, the fixture smoke returns schema-valid
+  JSON, and a non-secret `daily_triage` evidence event is recorded.
+
+2026-06-18 recheck:
+
+- activity-core has repo-local work to consume the stable URL:
+  `actcore-runtime-config` now sets
+  `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`
+  and `LLM_CONNECT_TIMEOUT_SECONDS=300`.
+- The live `activity-core` namespace has not yet been reconciled to that
+  activity-core runtime surface; live deployments currently show only
+  `deployment.apps/llm-connect`, and live ConfigMaps show only
+  `kube-root-ca.crt` and `llm-connect-config`.
+- The live `llm-connect-provider-secrets` Secret still reports `DATA 0`; no
+  Secret values were inspected.
+- ops-warden's credential-routing guidance says LLM provider API keys are not
+  an ops-warden issuance task. The remaining credential gate belongs to the
+  approved operator/OpenBao-to-Kubernetes Secret path for
+  `activity-core/llm-connect-provider-secrets`.
+- `LLM-WP-0006-T07` remains blocked until the provider Secret is populated,
+  the activity-core runtime is reconciled with the URL/timeout config, the
+  fixture smoke returns schema-valid JSON from inside the namespace, and
+  activity-core records non-secret `daily_triage` evidence.
+
+2026-06-18 closure:
+
+- Populated-provider state is now live: `activity-core/llm-connect-provider-secrets`
+  reports `DATA 1`; no Secret values were inspected or recorded.
+- Updated the OpenRouter structured-output path to request strict JSON schema
+  output and to set `provider.require_parameters=true` for schema calls, so
+  OpenRouter routes only to providers that support the requested structured
+  output parameters.
+- OpenRouter model metadata showed the previous
+  `anthropic/claude-sonnet-4` profile model does not advertise
+  `response_format`/`structured_outputs`; switched the activity-core profile
+  and Kubernetes ConfigMap defaults to `google/gemini-2.5-flash`, which does.
+- Rebuilt `docker.io/library/llm-connect:latest` from `Containerfile`,
+  imported it into the `coulombcore` k3s image store, applied the updated
+  non-secret `llm-connect-config` ConfigMap, and rolled out
+  `deployment/llm-connect`.
+- Verified live ConfigMap values:
+  `LLM_CONNECT_MODEL=google/gemini-2.5-flash` and
+  `LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash`.
+- Final in-namespace smoke passed against
+  `http://llm-connect.activity-core.svc.cluster.local:8080` with:
+  `smoke: pass health=ok latency_seconds=2.147 recommendations=1`.
+- Cleaned up the one-shot smoke pod after collecting logs. The llm-connect
+  endpoint handoff is complete; collecting scheduled `daily_triage` evidence
+  now belongs to activity-core / `ACTIVITY-WP-0006`.
+
 ## Closure Notes

 After this workplan file is added or task statuses change, ask the custodian