Switch the custodian triage default from anthropic/claude-sonnet-4 to google/gemini-2.5-flash, which advertises structured-output support on OpenRouter. Tighten the OpenRouter adapter to send strict JSON schema requests and set provider.require_parameters=true so routing only hits providers that honor the requested response_format. Update Kubernetes deploy docs and config for the verified coulombcore handoff: Containerfile build path, image-pull-policy=Never for smoke pods, credential-routing notes, and live smoke evidence. Mark LLM-WP-0006 finished with closure notes from 2026-06-18.
18 KiB
id, type, title, domain, repo, status, owner, topic_slug, planning_priority, planning_order, created, updated, depends_on_workplans, related_workplans, state_hub_workstream_id
| id | type | title | domain | repo | status | owner | topic_slug | planning_priority | planning_order | created | updated | depends_on_workplans | related_workplans | state_hub_workstream_id | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLM-WP-0006 | workplan | Activity-Core Always-On LLM Endpoint | custodian | llm-connect | finished | codex | activity-core-llm-endpoint | high | 6 | 2026-06-07 | 2026-06-18 |
|
|
8de71d58-1193-424f-8338-a9aa4e173c5b |
LLM-WP-0006 - Activity-Core Always-On LLM Endpoint
status: finished owner: codex
Purpose
Provide an operator-approved, always-on llm-connect HTTP endpoint for
activity-core daily WSJF triage. The service must be reachable from the
activity-core Kubernetes namespace, expose the existing GET /health and
POST /execute contract, support the custodian-triage-balanced runtime
profile, and return JSON content that satisfies the daily triage schema without
leaking provider credentials or secret material into Git, logs, or State Hub.
This is not a new public API. The current llm_connect.server contract is a
lightweight internal service surface; this workplan turns it into a durable
internal dependency with profile resolution, deployable artifacts, smoke tests,
and activity-core handoff evidence.
Demand Signal
State Hub messages from activity-core on 2026-06-07 requested a stable
llm-connect endpoint before ACTIVITY-WP-0006/T03 can collect clean scheduled
WSJF evidence.
Required behavior from those messages:
GET /healthreturns 200 from inside the activity-core runtime path.POST /executeaccepts activity-coreRunConfigpayloads withmodel_name=custodian-triage-balanced,temperature=0.2,max_tokens=1800,max_depth=2,model_params.reasoning_effort=medium, andmodel_params.json_schemafor the daily triage report.- The response contains a string
contentfield whose value is valid JSON matching the daily triage schema. - Provider credentials stay outside Git and outside State Hub messages/progress.
- The stable service URL can be handed to activity-core as
LLM_CONNECT_URL. - The service fits within
LLM_CONNECT_TIMEOUT_SECONDS=300and surfaces useful provider/transport errors without exposing secrets.
Current Repo State
Already present:
llm_connect/server.pyexposesGET /healthandPOST /executeviaThreadingHTTPServer./executeforwardsRunConfigfields includingmax_depthandmodel_params.- Structured-output helpers translate
model_params.json_schemafor OpenAI, OpenRouter, Gemini, and Claude Code CLI. - Debug and audit modes redact provider request headers and can replay captured adapter transformations.
Missing for this request:
- No named runtime profile resolver for
custodian-triage-balanced. - No container or Kubernetes deployment artifact for an always-on service.
- No documented secret/config injection path for the cluster service.
- No activity-core daily triage fixture or in-cluster smoke job.
- No committed handoff document naming the final stable URL and verification evidence.
T01 - Lock Activity-Core Contract Fixture
id: LLM-WP-0006-T01
title: "Lock activity-core daily WSJF request and schema fixture"
priority: high
status: done
state_hub_task_id: "f1d21c4b-2df3-4da8-8e6e-418fd7998a63"
Capture a non-secret fixture for the exact POST /execute request used by
daily-statehub-wsjf-triage, including the daily triage JSON schema, timeout
budget, expected response shape, and minimum prompt fields. Store only schema
and dummy prompt/evidence values in the repo.
Done when a fixture can be used by tests and smoke scripts without any provider credentials or live State Hub data, and the workplan notes identify the activity-core consumer contract it represents.
T02 - Add Named Runtime Profile Resolution
id: LLM-WP-0006-T02
title: "Resolve custodian-triage-balanced to provider, model, and RunConfig defaults"
priority: high
status: done
state_hub_task_id: "4538bae3-e8cf-4aa6-9056-270fd8d54caa"
Add a small named-profile layer for server mode so activity-core can send
model_name=custodian-triage-balanced while operators configure the underlying
provider/model out of band. The profile should merge request overrides with
profile defaults for temperature, max tokens, max depth, timeout, and portable
model_params, while preserving the existing direct provider/model behavior.
Done when unit tests prove custodian-triage-balanced resolves to the selected
adapter/model without hard-coding provider secrets, unknown profile names fail
with a clear non-secret error, and existing /execute behavior remains
backward compatible.
T03 - Harden Server Responses for Operations
id: LLM-WP-0006-T03
title: "Return useful non-secret provider and transport errors from server mode"
priority: high
status: done
state_hub_task_id: "d4adfe3b-6a57-4184-86fd-2eb11979f075"
Review server error handling for provider configuration failures, timeouts,
HTTP/API failures, invalid profile config, and malformed structured-output
responses. Keep the normal LLMResponse.to_dict() success shape, but make
errors actionable for operators and consumers without echoing API keys, bearer
tokens, request headers, or prompt bodies by default.
Done when tests cover sanitized error responses for configuration, timeout, provider/API, and profile validation failures, and debug/audit mode remains opt-in and redacted.
T04 - Package the Always-On Service
id: LLM-WP-0006-T04
title: "Add container packaging and service entrypoint for llm-connect server"
priority: high
status: done
state_hub_task_id: "38822b17-fa58-4583-939f-26e59b9c93c7"
Create the deployable service artifact: container build definition, non-root runtime, healthcheck, explicit listen host/port, and environment-driven profile configuration. Keep provider keys injected only at runtime through the approved cluster secret path.
Done when the image builds locally, starts with mock and at least one real
provider configuration path, passes GET /health, and can receive a fixture
POST /execute without writing secrets to stdout, image layers, or committed
files.
T05 - Add Kubernetes Deployment Surface
id: LLM-WP-0006-T05
title: "Provide Kubernetes Deployment, Service, probes, and secret references"
priority: high
status: done
state_hub_task_id: "f9743610-b573-41b8-952f-b27319acb3e3"
Add the cluster deployment surface for an internal llm-connect service:
Deployment, Service, readiness/liveness probes, ConfigMap/profile settings,
Secret references for provider credentials, resource requests/limits, and
network access scoped to the activity-core namespace. Use the repository's
current deployment conventions if a shared Railiance chart location is selected
during implementation.
Done when an operator can apply the manifests without editing secret values
into Git, the service exposes stable cluster DNS, and GET /health succeeds
from an activity-core pod or equivalent smoke pod.
T06 - Build Smoke Tests and Validation Scripts
id: LLM-WP-0006-T06
title: "Validate health, fixture execute, JSON schema content, and timeout budget"
priority: high
status: done
state_hub_task_id: "f046d68b-97f3-4471-a1f6-f1ab351ec448"
Add smoke tooling that can run locally against mock/profile mode and in-cluster
against the deployed Service. It should check health, post the daily triage
fixture, parse response.content as JSON, validate it against the daily triage
schema, and report latency relative to the 300 second activity-core timeout.
Done when the smoke path produces a clear pass/fail summary without dumping secret headers or provider credentials, and failed JSON/schema validation is reported distinctly from provider transport failure.
T07 - Coordinate Activity-Core Handoff
id: LLM-WP-0006-T07
title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
priority: high
status: done
state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"
After the service is deployed and smoke-tested, hand the stable URL to the
activity-core/railiance-cluster operator for LLM_CONNECT_URL. Coordinate one
manual or smoke daily WSJF run and record non-secret evidence that a State Hub
daily_triage event was emitted.
Done when the final URL value is documented in the appropriate operator-owned
config handoff, a fixture POST /execute succeeds from the activity-core
namespace, and activity-core has enough evidence to start counting clean 07:20
Europe/Berlin scheduled runs toward ACTIVITY-WP-0006/T03.
Scope Guardrails
In scope:
- Server-mode profile resolution needed by activity-core.
- Internal service packaging and Kubernetes deployment artifacts.
- Redacted diagnostics and operator-safe error responses.
- Health and execute smoke tooling using non-secret fixtures.
- Coordination notes for the final
LLM_CONNECT_URLhandoff.
Out of scope:
- Publishing
llm-connectas a public internet service. - Storing provider credentials, live prompts, or State Hub event payloads in Git.
- Replacing activity-core's scheduler or WSJF triage logic.
- Guaranteeing three scheduled production runs; this plan provides the endpoint and first smoke evidence, while scheduled-run collection remains activity-core ownership.
- Choosing or rotating production provider credentials; that is an operator secret-management action.
Acceptance
python -m llm_connect.serveror the packaged service starts an internal endpoint with a configuredcustodian-triage-balancedprofile.GET /healthreturns 200 locally and from inside the activity-core runtime network path.- A fixture
POST /executewith the daily WSJF schema returns anLLMResponsewhosecontentfield is a string containing schema-valid JSON. - Provider failures, timeouts, and profile/config errors return useful non-secret error bodies.
- The deployed Service has readiness/liveness probes, runtime-only secret injection, and a documented stable URL for activity-core.
- A manual or smoke daily WSJF run emits non-secret evidence of a State Hub
daily_triageevent.
Risks and Open Questions
- The final provider/model behind
custodian-triage-balancedneeds operator approval and runtime secret availability. The profile layer should keep that choice configurable. - If the chosen provider does not reliably honor the supplied JSON schema, the smoke path may need a retry or repair strategy; that should be explicit and bounded if added.
- The repository currently has no deployment directory. Implementation must decide whether Kubernetes artifacts live here, in a Railiance deployment repo, or are split between code-owned defaults here and environment-owned overlays elsewhere.
llm_connect.serveris stdlib HTTP and thread-per-request. That is likely sufficient for daily WSJF traffic, but sustained multi-consumer use may need a later ASGI/worker model.
Implementation Notes
2026-06-07:
- Added non-secret activity-core fixtures under
fixtures/activity_core/using thedaily-triage-reportschema from activity-core's Railiance runtime. - Added
llm_connect.profileswithcustodian-triage-balancedprofile dispatch, env/file profile overrides, and metadata on profiled responses. - Updated
llm_connect.serverso CLI serve mode enables runtime profiles by default, reads host/port/provider/model defaults from env, validates configs before execution, and returns structured sanitized error bodies. - Added
LLM_CONNECT_MOCK_RESPONSEsupport for local mock server smokes. - Added standard-library smoke tooling in
scripts/smoke_activity_core_endpoint.py, plus tests that run the smoke path against an in-process profiled mock HTTP server. - Added
Containerfile,.dockerignore, and a Kubernetes overlay atdeploy/k8s/activity-core-llm-connect/. - Added handoff docs in
docs/activity-core-llm-endpoint.md. - Verification completed locally:
python3 -m pytest tests/test_profiles.py tests/test_server.py tests/test_activity_core_smoke.py tests/test_factory.py tests/test_package_exports.py;docker build --progress=plain -f Containerfile -t llm-connect:wp0006-smoke .; andkubectl kustomize deploy/k8s/activity-core-llm-connect.
Live cluster evidence:
- Imported
docker.io/library/llm-connect:latestinto the actual Railiance k3s node runtime oncoulombcore(92.205.130.254) and updated the overlay to use that normalized image reference withimagePullPolicy: Never. - Applied the
activity-corenamespace deployment surface: ConfigMap, Secret reference, Service, Deployment, readiness/liveness probes, and NetworkPolicy. - Verified the live Deployment is
1/1ready with imagedocker.io/library/llm-connect:latest. - Verified the stable in-cluster URL
http://llm-connect.activity-core.svc.cluster.local:8080returns{"status": "ok"}forGET /healthfrom the activity-core namespace path. - Verified the activity-core fixture smoke reaches
POST /execute; it fails with a structuredconfiguration_erroruntil the provider credential Secret is populated. No Secret values were inspected or recorded.
Remaining blocked live gate:
LLM-WP-0006-T07still needs the runtime provider Secret populated outside Git/State Hub, a successful fixturePOST /executereturning schema-valid JSON, the verified URL written to activity-core runtime config, and a manual/smoke daily WSJF run that emits a non-secret State Hubdaily_triageevent.
2026-06-07 follow-up:
- Submitted State Hub message
8e644cb0-1af4-482c-8da7-7061080d21bctorailiance-clusterrequesting image publication, runtime provider Secret creation outside Git/State Hub, overlay apply or porting, in-namespace/health, and fixture smoke evidence forLLM-WP-0006-T05. - Submitted State Hub message
ff798e7c-b8ef-4a3f-ab92-00bf09410534toactivity-corerequestingLLM_CONNECT_URL/ timeout consumption after the cluster smoke, a manual or smoke daily WSJF run, State Hubdaily_triageevidence, working-memory verification, and continuation of the three clean scheduled 07:20 Europe/Berlin runs forACTIVITY-WP-0006-T03. - Submitted State Hub message
02033d4d-3cb0-41c8-b390-7b9e8471421etorailiance-clusterconfirming the live Deployment, stable URL, and/healthevidence after importing the image into the actualcoulombcorek3s node. - Submitted State Hub message
771afe14-a2d0-46ca-b905-52018bf86c62toactivity-corewith the verified URL and the remaining provider Secret gate for schema-validPOST /executeanddaily_triageevidence.
2026-06-17 recheck:
- Verified the live
coulombcoreKubernetes path is reachable and theactivity-corenamespacellm-connectDeployment remains1/1available with Servicellm-connecton port8080. - Confirmed the
llm-connect-provider-secretsSecret object exists but still reportsDATA 0; no Secret values were inspected. - Re-ran the in-namespace fixture smoke with the node-local image. The first
corrected pod needed
--image-pull-policy=Neverbecause the:latesttag otherwise attempted a Docker Hub pull. With the local image, the smoke reached/executeand failed safely withconfiguration_error: Adapter rejected RunConfig. - State Hub now also has a 2026-06-16
daily_triageevent fromactivity-coreshowingLLM_CONNECT_URL is not configured, and the local activity-core runtime manifest still hasLLM_CONNECT_URL: "". LLM-WP-0006-T07therefore remains externally blocked until the provider Secret is populated outside Git/State Hub, activity-core consumeshttp://llm-connect.activity-core.svc.cluster.local:8080withLLM_CONNECT_TIMEOUT_SECONDS=300, the fixture smoke returns schema-valid JSON, and a non-secretdaily_triageevidence event is recorded.
2026-06-18 recheck:
- activity-core has repo-local work to consume the stable URL:
actcore-runtime-confignow setsLLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080andLLM_CONNECT_TIMEOUT_SECONDS=300. - The live
activity-corenamespace has not yet been reconciled to that activity-core runtime surface; live deployments currently show onlydeployment.apps/llm-connect, and live ConfigMaps show onlykube-root-ca.crtandllm-connect-config. - The live
llm-connect-provider-secretsSecret still reportsDATA 0; no Secret values were inspected. - ops-warden's credential-routing guidance says LLM provider API keys are not
an ops-warden issuance task. The remaining credential gate belongs to the
approved operator/OpenBao-to-Kubernetes Secret path for
activity-core/llm-connect-provider-secrets. LLM-WP-0006-T07remains blocked until the provider Secret is populated, the activity-core runtime is reconciled with the URL/timeout config, the fixture smoke returns schema-valid JSON from inside the namespace, and activity-core records non-secretdaily_triageevidence.
2026-06-18 closure:
- Populated-provider state is now live:
activity-core/llm-connect-provider-secretsreportsDATA 1; no Secret values were inspected or recorded. - Updated the OpenRouter structured-output path to request strict JSON schema
output and to set
provider.require_parameters=truefor schema calls, so OpenRouter routes only to providers that support the requested structured output parameters. - OpenRouter model metadata showed the previous
anthropic/claude-sonnet-4profile model does not advertiseresponse_format/structured_outputs; switched the activity-core profile and Kubernetes ConfigMap defaults togoogle/gemini-2.5-flash, which does. - Rebuilt
docker.io/library/llm-connect:latestfromContainerfile, imported it into thecoulombcorek3s image store, applied the updated non-secretllm-connect-configConfigMap, and rolled outdeployment/llm-connect. - Verified live ConfigMap values:
LLM_CONNECT_MODEL=google/gemini-2.5-flashandLLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash. - Final in-namespace smoke passed against
http://llm-connect.activity-core.svc.cluster.local:8080with:smoke: pass health=ok latency_seconds=2.147 recommendations=1. - Cleaned up the one-shot smoke pod after collecting logs. The llm-connect
endpoint handoff is complete; collecting scheduled
daily_triageevidence now belongs to activity-core /ACTIVITY-WP-0006.
Closure Notes
After this workplan file is added or task statuses change, ask the custodian
operator to run from ~/state-hub:
make fix-consistency REPO=llm-connect
That syncs file-backed workplan state into the State Hub cache.