Files
llm-connect/workplans/LLM-WP-0006-activity-core-always-on-endpoint.md
tegwick 90eb39c247
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Complete activity-core LLM endpoint handoff (LLM-WP-0006)
Switch the custodian triage default from anthropic/claude-sonnet-4 to
google/gemini-2.5-flash, which advertises structured-output support on
OpenRouter. Tighten the OpenRouter adapter to send strict JSON schema
requests and set provider.require_parameters=true so routing only hits
providers that honor the requested response_format.

Update Kubernetes deploy docs and config for the verified coulombcore
handoff: Containerfile build path, image-pull-policy=Never for smoke
pods, credential-routing notes, and live smoke evidence. Mark
LLM-WP-0006 finished with closure notes from 2026-06-18.
2026-06-19 13:51:12 +02:00

18 KiB

id, type, title, domain, repo, status, owner, topic_slug, planning_priority, planning_order, created, updated, depends_on_workplans, related_workplans, state_hub_workstream_id
id type title domain repo status owner topic_slug planning_priority planning_order created updated depends_on_workplans related_workplans state_hub_workstream_id
LLM-WP-0006 workplan Activity-Core Always-On LLM Endpoint custodian llm-connect finished codex activity-core-llm-endpoint high 6 2026-06-07 2026-06-18
LLM-WP-0003
ACTIVITY-WP-0006
8de71d58-1193-424f-8338-a9aa4e173c5b

LLM-WP-0006 - Activity-Core Always-On LLM Endpoint

status: finished owner: codex

Purpose

Provide an operator-approved, always-on llm-connect HTTP endpoint for activity-core daily WSJF triage. The service must be reachable from the activity-core Kubernetes namespace, expose the existing GET /health and POST /execute contract, support the custodian-triage-balanced runtime profile, and return JSON content that satisfies the daily triage schema without leaking provider credentials or secret material into Git, logs, or State Hub.

This is not a new public API. The current llm_connect.server contract is a lightweight internal service surface; this workplan turns it into a durable internal dependency with profile resolution, deployable artifacts, smoke tests, and activity-core handoff evidence.

Demand Signal

State Hub messages from activity-core on 2026-06-07 requested a stable llm-connect endpoint before ACTIVITY-WP-0006/T03 can collect clean scheduled WSJF evidence.

Required behavior from those messages:

  • GET /health returns 200 from inside the activity-core runtime path.
  • POST /execute accepts activity-core RunConfig payloads with model_name=custodian-triage-balanced, temperature=0.2, max_tokens=1800, max_depth=2, model_params.reasoning_effort=medium, and model_params.json_schema for the daily triage report.
  • The response contains a string content field whose value is valid JSON matching the daily triage schema.
  • Provider credentials stay outside Git and outside State Hub messages/progress.
  • The stable service URL can be handed to activity-core as LLM_CONNECT_URL.
  • The service fits within LLM_CONNECT_TIMEOUT_SECONDS=300 and surfaces useful provider/transport errors without exposing secrets.

Current Repo State

Already present:

  • llm_connect/server.py exposes GET /health and POST /execute via ThreadingHTTPServer.
  • /execute forwards RunConfig fields including max_depth and model_params.
  • Structured-output helpers translate model_params.json_schema for OpenAI, OpenRouter, Gemini, and Claude Code CLI.
  • Debug and audit modes redact provider request headers and can replay captured adapter transformations.

Missing for this request:

  • No named runtime profile resolver for custodian-triage-balanced.
  • No container or Kubernetes deployment artifact for an always-on service.
  • No documented secret/config injection path for the cluster service.
  • No activity-core daily triage fixture or in-cluster smoke job.
  • No committed handoff document naming the final stable URL and verification evidence.

T01 - Lock Activity-Core Contract Fixture

id: LLM-WP-0006-T01
title: "Lock activity-core daily WSJF request and schema fixture"
priority: high
status: done
state_hub_task_id: "f1d21c4b-2df3-4da8-8e6e-418fd7998a63"

Capture a non-secret fixture for the exact POST /execute request used by daily-statehub-wsjf-triage, including the daily triage JSON schema, timeout budget, expected response shape, and minimum prompt fields. Store only schema and dummy prompt/evidence values in the repo.

Done when a fixture can be used by tests and smoke scripts without any provider credentials or live State Hub data, and the workplan notes identify the activity-core consumer contract it represents.

T02 - Add Named Runtime Profile Resolution

id: LLM-WP-0006-T02
title: "Resolve custodian-triage-balanced to provider, model, and RunConfig defaults"
priority: high
status: done
state_hub_task_id: "4538bae3-e8cf-4aa6-9056-270fd8d54caa"

Add a small named-profile layer for server mode so activity-core can send model_name=custodian-triage-balanced while operators configure the underlying provider/model out of band. The profile should merge request overrides with profile defaults for temperature, max tokens, max depth, timeout, and portable model_params, while preserving the existing direct provider/model behavior.

Done when unit tests prove custodian-triage-balanced resolves to the selected adapter/model without hard-coding provider secrets, unknown profile names fail with a clear non-secret error, and existing /execute behavior remains backward compatible.

T03 - Harden Server Responses for Operations

id: LLM-WP-0006-T03
title: "Return useful non-secret provider and transport errors from server mode"
priority: high
status: done
state_hub_task_id: "d4adfe3b-6a57-4184-86fd-2eb11979f075"

Review server error handling for provider configuration failures, timeouts, HTTP/API failures, invalid profile config, and malformed structured-output responses. Keep the normal LLMResponse.to_dict() success shape, but make errors actionable for operators and consumers without echoing API keys, bearer tokens, request headers, or prompt bodies by default.

Done when tests cover sanitized error responses for configuration, timeout, provider/API, and profile validation failures, and debug/audit mode remains opt-in and redacted.

T04 - Package the Always-On Service

id: LLM-WP-0006-T04
title: "Add container packaging and service entrypoint for llm-connect server"
priority: high
status: done
state_hub_task_id: "38822b17-fa58-4583-939f-26e59b9c93c7"

Create the deployable service artifact: container build definition, non-root runtime, healthcheck, explicit listen host/port, and environment-driven profile configuration. Keep provider keys injected only at runtime through the approved cluster secret path.

Done when the image builds locally, starts with mock and at least one real provider configuration path, passes GET /health, and can receive a fixture POST /execute without writing secrets to stdout, image layers, or committed files.

T05 - Add Kubernetes Deployment Surface

id: LLM-WP-0006-T05
title: "Provide Kubernetes Deployment, Service, probes, and secret references"
priority: high
status: done
state_hub_task_id: "f9743610-b573-41b8-952f-b27319acb3e3"

Add the cluster deployment surface for an internal llm-connect service: Deployment, Service, readiness/liveness probes, ConfigMap/profile settings, Secret references for provider credentials, resource requests/limits, and network access scoped to the activity-core namespace. Use the repository's current deployment conventions if a shared Railiance chart location is selected during implementation.

Done when an operator can apply the manifests without editing secret values into Git, the service exposes stable cluster DNS, and GET /health succeeds from an activity-core pod or equivalent smoke pod.

T06 - Build Smoke Tests and Validation Scripts

id: LLM-WP-0006-T06
title: "Validate health, fixture execute, JSON schema content, and timeout budget"
priority: high
status: done
state_hub_task_id: "f046d68b-97f3-4471-a1f6-f1ab351ec448"

Add smoke tooling that can run locally against mock/profile mode and in-cluster against the deployed Service. It should check health, post the daily triage fixture, parse response.content as JSON, validate it against the daily triage schema, and report latency relative to the 300 second activity-core timeout.

Done when the smoke path produces a clear pass/fail summary without dumping secret headers or provider credentials, and failed JSON/schema validation is reported distinctly from provider transport failure.

T07 - Coordinate Activity-Core Handoff

id: LLM-WP-0006-T07
title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
priority: high
status: done
state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"

After the service is deployed and smoke-tested, hand the stable URL to the activity-core/railiance-cluster operator for LLM_CONNECT_URL. Coordinate one manual or smoke daily WSJF run and record non-secret evidence that a State Hub daily_triage event was emitted.

Done when the final URL value is documented in the appropriate operator-owned config handoff, a fixture POST /execute succeeds from the activity-core namespace, and activity-core has enough evidence to start counting clean 07:20 Europe/Berlin scheduled runs toward ACTIVITY-WP-0006/T03.

Scope Guardrails

In scope:

  • Server-mode profile resolution needed by activity-core.
  • Internal service packaging and Kubernetes deployment artifacts.
  • Redacted diagnostics and operator-safe error responses.
  • Health and execute smoke tooling using non-secret fixtures.
  • Coordination notes for the final LLM_CONNECT_URL handoff.

Out of scope:

  • Publishing llm-connect as a public internet service.
  • Storing provider credentials, live prompts, or State Hub event payloads in Git.
  • Replacing activity-core's scheduler or WSJF triage logic.
  • Guaranteeing three scheduled production runs; this plan provides the endpoint and first smoke evidence, while scheduled-run collection remains activity-core ownership.
  • Choosing or rotating production provider credentials; that is an operator secret-management action.

Acceptance

  • python -m llm_connect.server or the packaged service starts an internal endpoint with a configured custodian-triage-balanced profile.
  • GET /health returns 200 locally and from inside the activity-core runtime network path.
  • A fixture POST /execute with the daily WSJF schema returns an LLMResponse whose content field is a string containing schema-valid JSON.
  • Provider failures, timeouts, and profile/config errors return useful non-secret error bodies.
  • The deployed Service has readiness/liveness probes, runtime-only secret injection, and a documented stable URL for activity-core.
  • A manual or smoke daily WSJF run emits non-secret evidence of a State Hub daily_triage event.

Risks and Open Questions

  • The final provider/model behind custodian-triage-balanced needs operator approval and runtime secret availability. The profile layer should keep that choice configurable.
  • If the chosen provider does not reliably honor the supplied JSON schema, the smoke path may need a retry or repair strategy; that should be explicit and bounded if added.
  • The repository currently has no deployment directory. Implementation must decide whether Kubernetes artifacts live here, in a Railiance deployment repo, or are split between code-owned defaults here and environment-owned overlays elsewhere.
  • llm_connect.server is stdlib HTTP and thread-per-request. That is likely sufficient for daily WSJF traffic, but sustained multi-consumer use may need a later ASGI/worker model.

Implementation Notes

2026-06-07:

  • Added non-secret activity-core fixtures under fixtures/activity_core/ using the daily-triage-report schema from activity-core's Railiance runtime.
  • Added llm_connect.profiles with custodian-triage-balanced profile dispatch, env/file profile overrides, and metadata on profiled responses.
  • Updated llm_connect.server so CLI serve mode enables runtime profiles by default, reads host/port/provider/model defaults from env, validates configs before execution, and returns structured sanitized error bodies.
  • Added LLM_CONNECT_MOCK_RESPONSE support for local mock server smokes.
  • Added standard-library smoke tooling in scripts/smoke_activity_core_endpoint.py, plus tests that run the smoke path against an in-process profiled mock HTTP server.
  • Added Containerfile, .dockerignore, and a Kubernetes overlay at deploy/k8s/activity-core-llm-connect/.
  • Added handoff docs in docs/activity-core-llm-endpoint.md.
  • Verification completed locally: python3 -m pytest tests/test_profiles.py tests/test_server.py tests/test_activity_core_smoke.py tests/test_factory.py tests/test_package_exports.py; docker build --progress=plain -f Containerfile -t llm-connect:wp0006-smoke .; and kubectl kustomize deploy/k8s/activity-core-llm-connect.

Live cluster evidence:

  • Imported docker.io/library/llm-connect:latest into the actual Railiance k3s node runtime on coulombcore (92.205.130.254) and updated the overlay to use that normalized image reference with imagePullPolicy: Never.
  • Applied the activity-core namespace deployment surface: ConfigMap, Secret reference, Service, Deployment, readiness/liveness probes, and NetworkPolicy.
  • Verified the live Deployment is 1/1 ready with image docker.io/library/llm-connect:latest.
  • Verified the stable in-cluster URL http://llm-connect.activity-core.svc.cluster.local:8080 returns {"status": "ok"} for GET /health from the activity-core namespace path.
  • Verified the activity-core fixture smoke reaches POST /execute; it fails with a structured configuration_error until the provider credential Secret is populated. No Secret values were inspected or recorded.

Remaining blocked live gate:

  • LLM-WP-0006-T07 still needs the runtime provider Secret populated outside Git/State Hub, a successful fixture POST /execute returning schema-valid JSON, the verified URL written to activity-core runtime config, and a manual/smoke daily WSJF run that emits a non-secret State Hub daily_triage event.

2026-06-07 follow-up:

  • Submitted State Hub message 8e644cb0-1af4-482c-8da7-7061080d21bc to railiance-cluster requesting image publication, runtime provider Secret creation outside Git/State Hub, overlay apply or porting, in-namespace /health, and fixture smoke evidence for LLM-WP-0006-T05.
  • Submitted State Hub message ff798e7c-b8ef-4a3f-ab92-00bf09410534 to activity-core requesting LLM_CONNECT_URL / timeout consumption after the cluster smoke, a manual or smoke daily WSJF run, State Hub daily_triage evidence, working-memory verification, and continuation of the three clean scheduled 07:20 Europe/Berlin runs for ACTIVITY-WP-0006-T03.
  • Submitted State Hub message 02033d4d-3cb0-41c8-b390-7b9e8471421e to railiance-cluster confirming the live Deployment, stable URL, and /health evidence after importing the image into the actual coulombcore k3s node.
  • Submitted State Hub message 771afe14-a2d0-46ca-b905-52018bf86c62 to activity-core with the verified URL and the remaining provider Secret gate for schema-valid POST /execute and daily_triage evidence.

2026-06-17 recheck:

  • Verified the live coulombcore Kubernetes path is reachable and the activity-core namespace llm-connect Deployment remains 1/1 available with Service llm-connect on port 8080.
  • Confirmed the llm-connect-provider-secrets Secret object exists but still reports DATA 0; no Secret values were inspected.
  • Re-ran the in-namespace fixture smoke with the node-local image. The first corrected pod needed --image-pull-policy=Never because the :latest tag otherwise attempted a Docker Hub pull. With the local image, the smoke reached /execute and failed safely with configuration_error: Adapter rejected RunConfig.
  • State Hub now also has a 2026-06-16 daily_triage event from activity-core showing LLM_CONNECT_URL is not configured, and the local activity-core runtime manifest still has LLM_CONNECT_URL: "".
  • LLM-WP-0006-T07 therefore remains externally blocked until the provider Secret is populated outside Git/State Hub, activity-core consumes http://llm-connect.activity-core.svc.cluster.local:8080 with LLM_CONNECT_TIMEOUT_SECONDS=300, the fixture smoke returns schema-valid JSON, and a non-secret daily_triage evidence event is recorded.

2026-06-18 recheck:

  • activity-core has repo-local work to consume the stable URL: actcore-runtime-config now sets LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 and LLM_CONNECT_TIMEOUT_SECONDS=300.
  • The live activity-core namespace has not yet been reconciled to that activity-core runtime surface; live deployments currently show only deployment.apps/llm-connect, and live ConfigMaps show only kube-root-ca.crt and llm-connect-config.
  • The live llm-connect-provider-secrets Secret still reports DATA 0; no Secret values were inspected.
  • ops-warden's credential-routing guidance says LLM provider API keys are not an ops-warden issuance task. The remaining credential gate belongs to the approved operator/OpenBao-to-Kubernetes Secret path for activity-core/llm-connect-provider-secrets.
  • LLM-WP-0006-T07 remains blocked until the provider Secret is populated, the activity-core runtime is reconciled with the URL/timeout config, the fixture smoke returns schema-valid JSON from inside the namespace, and activity-core records non-secret daily_triage evidence.

2026-06-18 closure:

  • Populated-provider state is now live: activity-core/llm-connect-provider-secrets reports DATA 1; no Secret values were inspected or recorded.
  • Updated the OpenRouter structured-output path to request strict JSON schema output and to set provider.require_parameters=true for schema calls, so OpenRouter routes only to providers that support the requested structured output parameters.
  • OpenRouter model metadata showed the previous anthropic/claude-sonnet-4 profile model does not advertise response_format/structured_outputs; switched the activity-core profile and Kubernetes ConfigMap defaults to google/gemini-2.5-flash, which does.
  • Rebuilt docker.io/library/llm-connect:latest from Containerfile, imported it into the coulombcore k3s image store, applied the updated non-secret llm-connect-config ConfigMap, and rolled out deployment/llm-connect.
  • Verified live ConfigMap values: LLM_CONNECT_MODEL=google/gemini-2.5-flash and LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash.
  • Final in-namespace smoke passed against http://llm-connect.activity-core.svc.cluster.local:8080 with: smoke: pass health=ok latency_seconds=2.147 recommendations=1.
  • Cleaned up the one-shot smoke pod after collecting logs. The llm-connect endpoint handoff is complete; collecting scheduled daily_triage evidence now belongs to activity-core / ACTIVITY-WP-0006.

Closure Notes

After this workplan file is added or task statuses change, ask the custodian operator to run from ~/state-hub:

make fix-consistency REPO=llm-connect

That syncs file-backed workplan state into the State Hub cache.