---
id: LLM-WP-0006
type: workplan
title: "Activity-Core Always-On LLM Endpoint"
domain: custodian
repo: llm-connect
status: blocked
owner: codex
topic_slug: activity-core-llm-endpoint
planning_priority: high
planning_order: 6
created: "2026-06-07"
updated: "2026-06-07"
depends_on_workplans:
  - LLM-WP-0003
related_workplans:
  - ACTIVITY-WP-0006
state_hub_workstream_id: "8de71d58-1193-424f-8338-a9aa4e173c5b"
---

# LLM-WP-0006 - Activity-Core Always-On LLM Endpoint

**status:** blocked
**owner:** codex

## Purpose

Provide an operator-approved, always-on `llm-connect` HTTP endpoint for `activity-core` daily WSJF triage. The service must be reachable from the `activity-core` Kubernetes namespace, expose the existing `GET /health` and `POST /execute` contract, support the `custodian-triage-balanced` runtime profile, and return JSON content that satisfies the daily triage schema without leaking provider credentials or secret material into Git, logs, or State Hub.

This is not a new public API. The current `llm_connect.server` contract is a lightweight internal service surface; this workplan turns it into a durable internal dependency with profile resolution, deployable artifacts, smoke tests, and activity-core handoff evidence.

## Demand Signal

State Hub messages from `activity-core` on 2026-06-07 requested a stable `llm-connect` endpoint before `ACTIVITY-WP-0006/T03` can collect clean scheduled WSJF evidence.

Required behavior from those messages:

- `GET /health` returns 200 from inside the activity-core runtime path.
- `POST /execute` accepts activity-core `RunConfig` payloads with `model_name=custodian-triage-balanced`, `temperature=0.2`, `max_tokens=1800`, `max_depth=2`, `model_params.reasoning_effort=medium`, and `model_params.json_schema` for the daily triage report.
- The response contains a string `content` field whose value is valid JSON matching the daily triage schema.
- Provider credentials stay outside Git and outside State Hub messages/progress.
- The stable service URL can be handed to activity-core as `LLM_CONNECT_URL`.
- The service fits within `LLM_CONNECT_TIMEOUT_SECONDS=300` and surfaces useful provider/transport errors without exposing secrets.

## Current Repo State

Already present:

- `llm_connect/server.py` exposes `GET /health` and `POST /execute` via `ThreadingHTTPServer`.
- `/execute` forwards `RunConfig` fields including `max_depth` and `model_params`.
- Structured-output helpers translate `model_params.json_schema` for OpenAI, OpenRouter, Gemini, and Claude Code CLI.
- Debug and audit modes redact provider request headers and can replay captured adapter transformations.

Missing for this request:

- No named runtime profile resolver for `custodian-triage-balanced`.
- No container or Kubernetes deployment artifact for an always-on service.
- No documented secret/config injection path for the cluster service.
- No activity-core daily triage fixture or in-cluster smoke job.
- No committed handoff document naming the final stable URL and verification evidence.

## T01 - Lock Activity-Core Contract Fixture

```task
id: LLM-WP-0006-T01
title: "Lock activity-core daily WSJF request and schema fixture"
priority: high
status: done
state_hub_task_id: "f1d21c4b-2df3-4da8-8e6e-418fd7998a63"
```

Capture a non-secret fixture for the exact `POST /execute` request used by `daily-statehub-wsjf-triage`, including the daily triage JSON schema, timeout budget, expected response shape, and minimum prompt fields. Store only schema and dummy prompt/evidence values in the repo.

Done when a fixture can be used by tests and smoke scripts without any provider credentials or live State Hub data, and the workplan notes identify the activity-core consumer contract it represents.
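
As an illustration of the T01 fixture idea, the sketch below builds a non-secret `POST /execute` payload from the `RunConfig` values listed in the demand signal and checks it for secret-like markers. The `prompt` field name, the placeholder schema, the marker list, and both function names are assumptions made for this example; the real fixture and the real daily triage schema come from activity-core.

```python
import json

# Hypothetical placeholder; the real daily triage schema is owned by activity-core.
DUMMY_TRIAGE_SCHEMA = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary"],
}

# Assumed list of substrings that should never appear in a committed fixture.
FORBIDDEN_MARKERS = ("api_key", "authorization", "bearer ")


def build_fixture_request(prompt: str = "dummy prompt") -> dict:
    """Build an /execute payload using the values named in the demand signal."""
    return {
        "prompt": prompt,  # assumed field name for the prompt text
        "model_name": "custodian-triage-balanced",
        "temperature": 0.2,
        "max_tokens": 1800,
        "max_depth": 2,
        "model_params": {
            "reasoning_effort": "medium",
            "json_schema": DUMMY_TRIAGE_SCHEMA,
        },
    }


def assert_no_secret_markers(payload: dict) -> None:
    """Raise ValueError if the serialized payload contains a secret-like marker."""
    text = json.dumps(payload).lower()
    hits = [marker for marker in FORBIDDEN_MARKERS if marker in text]
    if hits:
        raise ValueError(f"fixture contains secret-like markers: {hits}")
```

A substring check like this is only a coarse guard for tests; it does not replace review of what gets committed.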
## T02 - Add Named Runtime Profile Resolution

```task
id: LLM-WP-0006-T02
title: "Resolve custodian-triage-balanced to provider, model, and RunConfig defaults"
priority: high
status: done
state_hub_task_id: "4538bae3-e8cf-4aa6-9056-270fd8d54caa"
```

Add a small named-profile layer for server mode so activity-core can send `model_name=custodian-triage-balanced` while operators configure the underlying provider/model out of band. The profile should merge request overrides with profile defaults for temperature, max tokens, max depth, timeout, and portable `model_params`, while preserving the existing direct provider/model behavior.

Done when unit tests prove `custodian-triage-balanced` resolves to the selected adapter/model without hard-coding provider secrets, unknown profile names fail with a clear non-secret error, and existing `/execute` behavior remains backward compatible.

## T03 - Harden Server Responses for Operations

```task
id: LLM-WP-0006-T03
title: "Return useful non-secret provider and transport errors from server mode"
priority: high
status: done
state_hub_task_id: "d4adfe3b-6a57-4184-86fd-2eb11979f075"
```

Review server error handling for provider configuration failures, timeouts, HTTP/API failures, invalid profile config, and malformed structured-output responses. Keep the normal `LLMResponse.to_dict()` success shape, but make errors actionable for operators and consumers without echoing API keys, bearer tokens, request headers, or prompt bodies by default.

Done when tests cover sanitized error responses for configuration, timeout, provider/API, and profile validation failures, and debug/audit mode remains opt-in and redacted.
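
A minimal sketch of how the profile merge from T02 and the sanitized error shape from T03 could fit together. This is not the `llm_connect.profiles` implementation: `Profile`, `PROFILES`, `resolve`, `error_body`, the `mock` provider values, and the error body layout are all hypothetical names and shapes chosen for the example. For brevity it treats every unknown `model_name` as an error and omits the direct provider/model pass-through that T02 preserves.

```python
from dataclasses import dataclass, field
from typing import Any


class ProfileError(Exception):
    """Unknown or invalid profile; the message must stay free of secrets."""


@dataclass(frozen=True)
class Profile:
    provider: str
    model: str
    defaults: dict[str, Any] = field(default_factory=dict)


# Hypothetical registry; a real deployment would load this from env or a file.
PROFILES = {
    "custodian-triage-balanced": Profile(
        provider="mock",
        model="mock-model",
        defaults={
            "temperature": 0.2,
            "max_tokens": 1800,
            "max_depth": 2,
            "model_params": {"reasoning_effort": "medium"},
        },
    )
}


def resolve(request: dict[str, Any], profiles: dict[str, Profile] = PROFILES) -> dict[str, Any]:
    """Merge profile defaults with request overrides; request values win."""
    name = request.get("model_name")
    profile = profiles.get(name)
    if profile is None:
        raise ProfileError(f"unknown profile: {name!r}")
    overrides = {k: v for k, v in request.items() if v is not None}
    merged = {**profile.defaults, **overrides}
    # Merge model_params key by key so a request schema does not drop profile defaults.
    merged["model_params"] = {
        **profile.defaults.get("model_params", {}),
        **(request.get("model_params") or {}),
    }
    merged["provider"] = profile.provider
    merged["model_name"] = profile.model
    return merged


def error_body(exc: Exception) -> dict[str, Any]:
    """Return an error body that never echoes unexpected exception text."""
    if isinstance(exc, ProfileError):
        return {"error": {"type": "profile_error", "message": str(exc)}}
    return {"error": {"type": "internal_error", "message": "internal error"}}
```

Only exception types whose messages are known to be non-secret are passed through; anything else collapses to a fixed message, which is one way to keep error bodies safe by default.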
## T04 - Package the Always-On Service

```task
id: LLM-WP-0006-T04
title: "Add container packaging and service entrypoint for llm-connect server"
priority: high
status: done
state_hub_task_id: "38822b17-fa58-4583-939f-26e59b9c93c7"
```

Create the deployable service artifact: container build definition, non-root runtime, healthcheck, explicit listen host/port, and environment-driven profile configuration. Keep provider keys injected only at runtime through the approved cluster secret path.

Done when the image builds locally, starts with mock and at least one real provider configuration path, passes `GET /health`, and can receive a fixture `POST /execute` without writing secrets to stdout, image layers, or committed files.

## T05 - Add Kubernetes Deployment Surface

```task
id: LLM-WP-0006-T05
title: "Provide Kubernetes Deployment, Service, probes, and secret references"
priority: high
status: done
state_hub_task_id: "f9743610-b573-41b8-952f-b27319acb3e3"
```

Add the cluster deployment surface for an internal `llm-connect` service: Deployment, Service, readiness/liveness probes, ConfigMap/profile settings, Secret references for provider credentials, resource requests/limits, and network access scoped to the activity-core namespace. Use the repository's current deployment conventions if a shared Railiance chart location is selected during implementation.

Done when an operator can apply the manifests without editing secret values into Git, the service exposes stable cluster DNS, and `GET /health` succeeds from an activity-core pod or equivalent smoke pod.

## T06 - Build Smoke Tests and Validation Scripts

```task
id: LLM-WP-0006-T06
title: "Validate health, fixture execute, JSON schema content, and timeout budget"
priority: high
status: done
state_hub_task_id: "f046d68b-97f3-4471-a1f6-f1ab351ec448"
```

Add smoke tooling that can run locally against mock/profile mode and in-cluster against the deployed Service.
It should check health, post the daily triage fixture, parse `response.content` as JSON, validate it against the daily triage schema, and report latency relative to the 300 second activity-core timeout.

Done when the smoke path produces a clear pass/fail summary without dumping secret headers or provider credentials, and failed JSON/schema validation is reported distinctly from provider transport failure.

## T07 - Coordinate Activity-Core Handoff

```task
id: LLM-WP-0006-T07
title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
priority: high
status: blocked
state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"
```

After the service is deployed and smoke-tested, hand the stable URL to the activity-core/railiance-cluster operator for `LLM_CONNECT_URL`. Coordinate one manual or smoke daily WSJF run and record non-secret evidence that a State Hub `daily_triage` event was emitted.

Done when the final URL value is documented in the appropriate operator-owned config handoff, a fixture `POST /execute` succeeds from the activity-core namespace, and activity-core has enough evidence to start counting clean 07:20 Europe/Berlin scheduled runs toward `ACTIVITY-WP-0006/T03`.

## Scope Guardrails

In scope:

- Server-mode profile resolution needed by activity-core.
- Internal service packaging and Kubernetes deployment artifacts.
- Redacted diagnostics and operator-safe error responses.
- Health and execute smoke tooling using non-secret fixtures.
- Coordination notes for the final `LLM_CONNECT_URL` handoff.

Out of scope:

- Publishing `llm-connect` as a public internet service.
- Storing provider credentials, live prompts, or State Hub event payloads in Git.
- Replacing activity-core's scheduler or WSJF triage logic.
- Guaranteeing three scheduled production runs; this plan provides the endpoint and first smoke evidence, while scheduled-run collection remains activity-core ownership.
- Choosing or rotating production provider credentials; that is an operator secret-management action.

## Acceptance

- `python -m llm_connect.server` or the packaged service starts an internal endpoint with a configured `custodian-triage-balanced` profile.
- `GET /health` returns 200 locally and from inside the activity-core runtime network path.
- A fixture `POST /execute` with the daily WSJF schema returns an `LLMResponse` whose `content` field is a string containing schema-valid JSON.
- Provider failures, timeouts, and profile/config errors return useful non-secret error bodies.
- The deployed Service has readiness/liveness probes, runtime-only secret injection, and a documented stable URL for activity-core.
- A manual or smoke daily WSJF run emits non-secret evidence of a State Hub `daily_triage` event.

## Risks and Open Questions

- The final provider/model behind `custodian-triage-balanced` needs operator approval and runtime secret availability. The profile layer should keep that choice configurable.
- If the chosen provider does not reliably honor the supplied JSON schema, the smoke path may need a retry or repair strategy; that should be explicit and bounded if added.
- The repository currently has no deployment directory. Implementation must decide whether Kubernetes artifacts live here, in a Railiance deployment repo, or are split between code-owned defaults here and environment-owned overlays elsewhere.
- `llm_connect.server` is stdlib HTTP and thread-per-request. That is likely sufficient for daily WSJF traffic, but sustained multi-consumer use may need a later ASGI/worker model.

## Implementation Notes

2026-06-07:

- Added non-secret activity-core fixtures under `fixtures/activity_core/` using the `daily-triage-report` schema from activity-core's Railiance runtime.
- Added `llm_connect.profiles` with `custodian-triage-balanced` profile dispatch, env/file profile overrides, and metadata on profiled responses.
- Updated `llm_connect.server` so CLI serve mode enables runtime profiles by default, reads host/port/provider/model defaults from env, validates configs before execution, and returns structured sanitized error bodies.
- Added `LLM_CONNECT_MOCK_RESPONSE` support for local mock server smokes.
- Added standard-library smoke tooling in `scripts/smoke_activity_core_endpoint.py`, plus tests that run the smoke path against an in-process profiled mock HTTP server.
- Added `Containerfile`, `.dockerignore`, and a Kubernetes overlay at `deploy/k8s/activity-core-llm-connect/`.
- Added handoff docs in `docs/activity-core-llm-endpoint.md`.
- Verification completed locally: `python3 -m pytest tests/test_profiles.py tests/test_server.py tests/test_activity_core_smoke.py tests/test_factory.py tests/test_package_exports.py`; `docker build --progress=plain -f Containerfile -t llm-connect:wp0006-smoke .`; and `kubectl kustomize deploy/k8s/activity-core-llm-connect`.

Live cluster evidence:

- Imported `docker.io/library/llm-connect:latest` into the actual Railiance k3s node runtime on `coulombcore` (`92.205.130.254`) and updated the overlay to use that normalized image reference with `imagePullPolicy: Never`.
- Applied the `activity-core` namespace deployment surface: ConfigMap, Secret reference, Service, Deployment, readiness/liveness probes, and NetworkPolicy.
- Verified the live Deployment is `1/1` ready with image `docker.io/library/llm-connect:latest`.
- Verified the stable in-cluster URL `http://llm-connect.activity-core.svc.cluster.local:8080` returns `{"status": "ok"}` for `GET /health` from the activity-core namespace path.
- Verified the activity-core fixture smoke reaches `POST /execute`; it fails with a structured `configuration_error` until the provider credential Secret is populated. No Secret values were inspected or recorded.

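
The evidence above separates a reachable endpoint from a successful execute. A hedged sketch of how a smoke step might classify one `POST /execute` result into distinct outcomes follows. It is not the code in `scripts/smoke_activity_core_endpoint.py`: the function name, the outcome labels, the assumption that error bodies carry a top-level `error` key, and the required-key check standing in for full JSON Schema validation are all illustrative.

```python
import json


def classify_smoke_result(
    status: int,
    body: dict,
    required_keys: tuple[str, ...],
    elapsed_s: float,
    timeout_s: float = 300.0,
) -> str:
    """Classify one /execute result so schema failures differ from transport failures."""
    if elapsed_s > timeout_s:
        return "fail:timeout_budget"
    # Assumed error shape: non-200 status or a top-level "error" key.
    if status != 200 or "error" in body:
        return "fail:provider_or_config"
    content = body.get("content")
    if not isinstance(content, str):
        return "fail:content_not_string"
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return "fail:content_not_json"
    # Simplified stand-in for schema validation: only top-level required keys.
    if not isinstance(parsed, dict) or any(key not in parsed for key in required_keys):
        return "fail:schema"
    return "pass"
```

Under these assumptions, the current live state (a structured `configuration_error` from `POST /execute`) would be reported as `fail:provider_or_config` rather than as a schema failure.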

Remaining blocked live gate:

- `LLM-WP-0006-T07` still needs the runtime provider Secret populated outside Git/State Hub, a successful fixture `POST /execute` returning schema-valid JSON, the verified URL written to activity-core runtime config, and a manual/smoke daily WSJF run that emits a non-secret State Hub `daily_triage` event.

2026-06-07 follow-up:

- Submitted State Hub message `8e644cb0-1af4-482c-8da7-7061080d21bc` to `railiance-cluster` requesting image publication, runtime provider Secret creation outside Git/State Hub, overlay apply or porting, in-namespace `/health`, and fixture smoke evidence for `LLM-WP-0006-T05`.
- Submitted State Hub message `ff798e7c-b8ef-4a3f-ab92-00bf09410534` to `activity-core` requesting `LLM_CONNECT_URL` / timeout consumption after the cluster smoke, a manual or smoke daily WSJF run, State Hub `daily_triage` evidence, working-memory verification, and continuation of the three clean scheduled 07:20 Europe/Berlin runs for `ACTIVITY-WP-0006-T03`.
- Submitted State Hub message `02033d4d-3cb0-41c8-b390-7b9e8471421e` to `railiance-cluster` confirming the live Deployment, stable URL, and `/health` evidence after importing the image into the actual `coulombcore` k3s node.
- Submitted State Hub message `771afe14-a2d0-46ca-b905-52018bf86c62` to `activity-core` with the verified URL and the remaining provider Secret gate for schema-valid `POST /execute` and `daily_triage` evidence.

## Closure Notes

After this workplan file is added or task statuses change, ask the custodian operator to run from `~/state-hub`:

```bash
make fix-consistency REPO=llm-connect
```

That syncs file-backed workplan state into the State Hub cache.