llm-connect/workplans/LLM-WP-0006-activity-core-always-on-endpoint.md
tegwick 14ba47c129
Add activity-core LLM endpoint support
2026-06-07 19:24:45 +02:00


---
id: LLM-WP-0006
type: workplan
title: "Activity-Core Always-On LLM Endpoint"
domain: custodian
repo: llm-connect
status: blocked
owner: codex
topic_slug: activity-core-llm-endpoint
planning_priority: high
planning_order: 6
created: "2026-06-07"
updated: "2026-06-07"
depends_on_workplans:
- LLM-WP-0003
related_workplans:
- ACTIVITY-WP-0006
state_hub_workstream_id: "8de71d58-1193-424f-8338-a9aa4e173c5b"
---
# LLM-WP-0006 - Activity-Core Always-On LLM Endpoint
**status:** blocked
**owner:** codex
## Purpose
Provide an operator-approved, always-on `llm-connect` HTTP endpoint for
`activity-core` daily WSJF triage. The service must be reachable from the
`activity-core` Kubernetes namespace, expose the existing `GET /health` and
`POST /execute` contract, support the `custodian-triage-balanced` runtime
profile, and return JSON content that satisfies the daily triage schema without
leaking provider credentials or secret material into Git, logs, or State Hub.
This is not a new public API. The current `llm_connect.server` contract is a
lightweight internal service surface; this workplan turns it into a durable
internal dependency with profile resolution, deployable artifacts, smoke tests,
and activity-core handoff evidence.
## Demand Signal
State Hub messages from `activity-core` on 2026-06-07 requested a stable
`llm-connect` endpoint before `ACTIVITY-WP-0006/T03` can collect clean scheduled
WSJF evidence.
Required behavior from those messages:
- `GET /health` returns 200 from inside the activity-core runtime path.
- `POST /execute` accepts activity-core `RunConfig` payloads with
`model_name=custodian-triage-balanced`, `temperature=0.2`,
`max_tokens=1800`, `max_depth=2`, `model_params.reasoning_effort=medium`,
and `model_params.json_schema` for the daily triage report.
- The response contains a string `content` field whose value is valid JSON
matching the daily triage schema.
- Provider credentials stay outside Git and outside State Hub
messages/progress.
- The stable service URL can be handed to activity-core as `LLM_CONNECT_URL`.
- The service fits within `LLM_CONNECT_TIMEOUT_SECONDS=300` and surfaces useful
provider/transport errors without exposing secrets.
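
The request and response shape listed above can be sketched as follows. This is a minimal illustration rather than the activity-core client: the `prompt` key and both helper names are assumptions, while the remaining field values are copied from the list above.

```python
import json


def build_execute_payload(prompt: str, triage_schema: dict) -> dict:
    """Build a RunConfig-style body for POST /execute (illustrative only)."""
    return {
        "prompt": prompt,  # assumption: the real prompt field name may differ
        "model_name": "custodian-triage-balanced",
        "temperature": 0.2,
        "max_tokens": 1800,
        "max_depth": 2,
        "model_params": {
            "reasoning_effort": "medium",
            "json_schema": triage_schema,
        },
    }


def parse_triage_content(response_body: dict) -> dict:
    """Decode the triage report carried in the response's string `content`."""
    content = response_body.get("content")
    if not isinstance(content, str):
        raise ValueError("response 'content' must be a string")
    # json.JSONDecodeError (a ValueError subclass) is raised on invalid JSON.
    return json.loads(content)
```

Checking that `content` is a string before decoding keeps "wrong response shape" separate from "invalid JSON", which mirrors the distinction the consumer contract draws.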
## Current Repo State
Already present:
- `llm_connect/server.py` exposes `GET /health` and `POST /execute` via
`ThreadingHTTPServer`.
- `/execute` forwards `RunConfig` fields including `max_depth` and
`model_params`.
- Structured-output helpers translate `model_params.json_schema` for OpenAI,
OpenRouter, Gemini, and Claude Code CLI.
- Debug and audit modes redact provider request headers and can replay captured
adapter transformations.

Missing for this request:
- No named runtime profile resolver for `custodian-triage-balanced`.
- No container or Kubernetes deployment artifact for an always-on service.
- No documented secret/config injection path for the cluster service.
- No activity-core daily triage fixture or in-cluster smoke job.
- No committed handoff document naming the final stable URL and verification
evidence.
## T01 - Lock Activity-Core Contract Fixture
```task
id: LLM-WP-0006-T01
title: "Lock activity-core daily WSJF request and schema fixture"
priority: high
status: done
state_hub_task_id: "f1d21c4b-2df3-4da8-8e6e-418fd7998a63"
```
Capture a non-secret fixture for the exact `POST /execute` request used by
`daily-statehub-wsjf-triage`, including the daily triage JSON schema, timeout
budget, expected response shape, and minimum prompt fields. Store only schema
and dummy prompt/evidence values in the repo.
Done when a fixture can be used by tests and smoke scripts without any provider
credentials or live State Hub data, and the workplan notes identify the
activity-core consumer contract it represents.
## T02 - Add Named Runtime Profile Resolution
```task
id: LLM-WP-0006-T02
title: "Resolve custodian-triage-balanced to provider, model, and RunConfig defaults"
priority: high
status: done
state_hub_task_id: "4538bae3-e8cf-4aa6-9056-270fd8d54caa"
```
Add a small named-profile layer for server mode so activity-core can send
`model_name=custodian-triage-balanced` while operators configure the underlying
provider/model out of band. The profile should merge request overrides with
profile defaults for temperature, max tokens, max depth, timeout, and portable
`model_params`, while preserving the existing direct provider/model behavior.
Done when unit tests prove `custodian-triage-balanced` resolves to the selected
adapter/model without hard-coding provider secrets, unknown profile names fail
with a clear non-secret error, and existing `/execute` behavior remains
backward compatible.
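
The merge rule described above could look roughly like the sketch below. It is not the `llm_connect.profiles` implementation: the `PROFILES` table, the `example-provider` and `example-model` placeholders, and the names `resolve_run_config` and `UnknownProfileError` are all hypothetical, and the pass-through for direct provider/model requests is left out for brevity.

```python
import copy


class UnknownProfileError(ValueError):
    """Raised when a requested profile name is not configured."""


# Hypothetical profile table. Provider and model are placeholders that an
# operator would configure out of band; no secrets belong here.
PROFILES = {
    "custodian-triage-balanced": {
        "provider": "example-provider",
        "model": "example-model",
        "temperature": 0.2,
        "max_tokens": 1800,
        "max_depth": 2,
        "model_params": {"reasoning_effort": "medium"},
    }
}


def resolve_run_config(request: dict, profiles: dict = PROFILES) -> dict:
    """Merge request overrides onto the defaults of the named profile."""
    name = request.get("model_name")
    if name not in profiles:
        # The message carries only the profile name, never config values.
        raise UnknownProfileError(f"unknown runtime profile: {name!r}")
    resolved = copy.deepcopy(profiles[name])  # never mutate shared defaults
    for key, value in request.items():
        if key == "model_name" or value is None:
            continue
        if key == "model_params":
            # Merge key by key so profile defaults survive partial overrides.
            resolved.setdefault("model_params", {}).update(value)
        else:
            resolved[key] = value
    return resolved
```

Merging `model_params` key by key, rather than replacing the whole mapping, lets a request add `json_schema` without losing the profile's `reasoning_effort` default.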
## T03 - Harden Server Responses for Operations
```task
id: LLM-WP-0006-T03
title: "Return useful non-secret provider and transport errors from server mode"
priority: high
status: done
state_hub_task_id: "d4adfe3b-6a57-4184-86fd-2eb11979f075"
```
Review server error handling for provider configuration failures, timeouts,
HTTP/API failures, invalid profile config, and malformed structured-output
responses. Keep the normal `LLMResponse.to_dict()` success shape, but make
errors actionable for operators and consumers without echoing API keys, bearer
tokens, request headers, or prompt bodies by default.
Done when tests cover sanitized error responses for configuration, timeout,
provider/API, and profile validation failures, and debug/audit mode remains
opt-in and redacted.
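
One way to keep error bodies useful but non-secret is to redact credential-like substrings before a message leaves the server. The sketch below is an assumption-laden illustration: the regular expressions are not exhaustive, and the `{"error": {"type", "message"}}` shape and the helper names are hypothetical rather than the actual server contract.

```python
import re

# Illustrative patterns only; a real deployment would need a reviewed list.
_SECRET_PATTERNS = [
    # "Bearer <token>" in any letter case.
    re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"),
    # key=value or key: value pairs with credential-like key names.
    re.compile(r"(?i)((?:api[_-]?key|authorization|token)\s*[=:]\s*)\S+"),
]


def redact(text: str) -> str:
    """Replace credential-like values with a fixed marker."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(r"\1[REDACTED]", text)
    return text


def error_body(error_type: str, message: str) -> dict:
    """Build a structured error body with a redacted message."""
    return {"error": {"type": error_type, "message": redact(message)}}
```

Pattern-based redaction is a safety net, not a guarantee. Building error messages from known-safe fields in the first place remains the stronger control.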
## T04 - Package the Always-On Service
```task
id: LLM-WP-0006-T04
title: "Add container packaging and service entrypoint for llm-connect server"
priority: high
status: done
state_hub_task_id: "38822b17-fa58-4583-939f-26e59b9c93c7"
```
Create the deployable service artifact: container build definition, non-root
runtime, healthcheck, explicit listen host/port, and environment-driven profile
configuration. Keep provider keys injected only at runtime through the approved
cluster secret path.
Done when the image builds locally, starts both with the mock provider and with
at least one real provider configuration path, passes `GET /health`, and can
receive a fixture `POST /execute` without writing secrets to stdout, image
layers, or committed files.
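
An explicit, environment-driven listen address might be read as in the sketch below. The variable names `LLM_CONNECT_HOST` and `LLM_CONNECT_PORT` and the `0.0.0.0` default are assumptions for illustration; the `8080` default is chosen only to match the port in the in-cluster URL recorded in the Implementation Notes.

```python
import os
from typing import Mapping, Optional, Tuple


def load_listen_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Read the listen host and port from the environment, with validation."""
    env = os.environ if env is None else env
    # Hypothetical variable names; the real server may use different ones.
    host = env.get("LLM_CONNECT_HOST", "0.0.0.0")
    raw_port = env.get("LLM_CONNECT_PORT", "8080")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"invalid port value: {raw_port!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

Accepting an injected mapping keeps the function testable without touching the process environment.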
## T05 - Add Kubernetes Deployment Surface
```task
id: LLM-WP-0006-T05
title: "Provide Kubernetes Deployment, Service, probes, and secret references"
priority: high
status: done
state_hub_task_id: "f9743610-b573-41b8-952f-b27319acb3e3"
```
Add the cluster deployment surface for an internal `llm-connect` service:
Deployment, Service, readiness/liveness probes, ConfigMap/profile settings,
Secret references for provider credentials, resource requests/limits, and
network access scoped to the activity-core namespace. Use the repository's
current deployment conventions if a shared Railiance chart location is selected
during implementation.
Done when an operator can apply the manifests without editing secret values
into Git, the service exposes stable cluster DNS, and `GET /health` succeeds
from an activity-core pod or equivalent smoke pod.
## T06 - Build Smoke Tests and Validation Scripts
```task
id: LLM-WP-0006-T06
title: "Validate health, fixture execute, JSON schema content, and timeout budget"
priority: high
status: done
state_hub_task_id: "f046d68b-97f3-4471-a1f6-f1ab351ec448"
```
Add smoke tooling that can run locally against mock/profile mode and in-cluster
against the deployed Service. It should check health, post the daily triage
fixture, parse `response.content` as JSON, validate it against the daily triage
schema, and report latency relative to the 300 second activity-core timeout.
Done when the smoke path produces a clear pass/fail summary without dumping
secret headers or provider credentials, and failed JSON/schema validation is
reported distinctly from provider transport failure.
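
The pass/fail classification described above can be sketched with the standard library alone. This is not the code in `scripts/smoke_activity_core_endpoint.py`: the function name and status labels are hypothetical, and the required-key check is a deliberately simplified stand-in for full validation against the daily triage schema.

```python
import json

TIMEOUT_BUDGET_SECONDS = 300  # the activity-core timeout named in this plan


def classify_smoke_result(response_body, required_keys, elapsed_seconds):
    """Return a (status, detail) pair for one smoke run.

    `response_body` is the decoded /execute response, or None when the
    request failed at the transport or provider level.
    """
    if response_body is None:
        return "transport_failure", "no response body received"
    if elapsed_seconds > TIMEOUT_BUDGET_SECONDS:
        return "timeout_exceeded", f"{elapsed_seconds:.1f}s > {TIMEOUT_BUDGET_SECONDS}s"
    content = response_body.get("content")
    if not isinstance(content, str):
        return "invalid_content", "content is not a string"
    try:
        report = json.loads(content)
    except json.JSONDecodeError as exc:
        return "invalid_json", f"content is not valid JSON: {exc.msg}"
    if not isinstance(report, dict):
        return "schema_failure", "report is not a JSON object"
    # Simplified stand-in for schema validation: top-level keys only.
    missing = sorted(key for key in required_keys if key not in report)
    if missing:
        return "schema_failure", "missing keys: " + ", ".join(missing)
    return "pass", f"{elapsed_seconds:.1f}s of {TIMEOUT_BUDGET_SECONDS}s budget"
```

Returning a distinct status for each failure class lets the summary line report JSON or schema problems separately from transport failures, and neither path echoes headers or credentials.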
## T07 - Coordinate Activity-Core Handoff
```task
id: LLM-WP-0006-T07
title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
priority: high
status: blocked
state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"
```
After the service is deployed and smoke-tested, hand the stable URL to the
activity-core/railiance-cluster operator for `LLM_CONNECT_URL`. Coordinate one
manual or smoke daily WSJF run and record non-secret evidence that a State Hub
`daily_triage` event was emitted.
Done when the final URL value is documented in the appropriate operator-owned
config handoff, a fixture `POST /execute` succeeds from the activity-core
namespace, and activity-core has enough evidence to start counting clean 07:20
Europe/Berlin scheduled runs toward `ACTIVITY-WP-0006/T03`.
## Scope Guardrails
In scope:
- Server-mode profile resolution needed by activity-core.
- Internal service packaging and Kubernetes deployment artifacts.
- Redacted diagnostics and operator-safe error responses.
- Health and execute smoke tooling using non-secret fixtures.
- Coordination notes for the final `LLM_CONNECT_URL` handoff.

Out of scope:
- Publishing `llm-connect` as a public internet service.
- Storing provider credentials, live prompts, or State Hub event payloads in
Git.
- Replacing activity-core's scheduler or WSJF triage logic.
- Guaranteeing three scheduled production runs; this plan provides the
endpoint and first smoke evidence, while scheduled-run collection remains
owned by activity-core.
- Choosing or rotating production provider credentials; that is an operator
secret-management action.
## Acceptance
- `python -m llm_connect.server` or the packaged service starts an internal
endpoint with a configured `custodian-triage-balanced` profile.
- `GET /health` returns 200 locally and from inside the activity-core runtime
network path.
- A fixture `POST /execute` with the daily WSJF schema returns an
`LLMResponse` whose `content` field is a string containing schema-valid JSON.
- Provider failures, timeouts, and profile/config errors return useful
non-secret error bodies.
- The deployed Service has readiness/liveness probes, runtime-only secret
injection, and a documented stable URL for activity-core.
- A manual or smoke daily WSJF run emits non-secret evidence of a State Hub
`daily_triage` event.
## Risks and Open Questions
- The final provider/model behind `custodian-triage-balanced` needs operator
approval and runtime secret availability. The profile layer should keep that
choice configurable.
- If the chosen provider does not reliably honor the supplied JSON schema, the
smoke path may need a retry or repair strategy; that should be explicit and
bounded if added.
- The repository currently has no deployment directory. Implementation must
decide whether Kubernetes artifacts live here, in a Railiance deployment repo,
or are split between code-owned defaults here and environment-owned overlays
elsewhere.
- `llm_connect.server` uses the standard-library HTTP server with one thread per request. That is likely
sufficient for daily WSJF traffic, but sustained multi-consumer use may need
a later ASGI/worker model.
## Implementation Notes
2026-06-07:
- Added non-secret activity-core fixtures under `fixtures/activity_core/` using
the `daily-triage-report` schema from activity-core's Railiance runtime.
- Added `llm_connect.profiles` with `custodian-triage-balanced` profile
dispatch, env/file profile overrides, and metadata on profiled responses.
- Updated `llm_connect.server` so CLI serve mode enables runtime profiles by
default, reads host/port/provider/model defaults from env, validates configs
before execution, and returns structured sanitized error bodies.
- Added `LLM_CONNECT_MOCK_RESPONSE` support for local mock-server smoke tests.
- Added standard-library smoke tooling in
`scripts/smoke_activity_core_endpoint.py`, plus tests that run the smoke path
against an in-process profiled mock HTTP server.
- Added `Containerfile`, `.dockerignore`, and a Kubernetes overlay at
`deploy/k8s/activity-core-llm-connect/`.
- Added handoff docs in `docs/activity-core-llm-endpoint.md`.
- Verification completed locally:
`python3 -m pytest tests/test_profiles.py tests/test_server.py
tests/test_activity_core_smoke.py tests/test_factory.py
tests/test_package_exports.py`;
`docker build --progress=plain -f Containerfile -t
llm-connect:wp0006-smoke .`; and `kubectl kustomize
deploy/k8s/activity-core-llm-connect`.

Live cluster evidence:
- Imported `docker.io/library/llm-connect:latest` into the actual Railiance k3s
node runtime on `coulombcore` (`92.205.130.254`) and updated the overlay to
use that normalized image reference with `imagePullPolicy: Never`.
- Applied the `activity-core` namespace deployment surface: ConfigMap, Secret
reference, Service, Deployment, readiness/liveness probes, and NetworkPolicy.
- Verified the live Deployment is `1/1` ready with image
`docker.io/library/llm-connect:latest`.
- Verified the stable in-cluster URL
`http://llm-connect.activity-core.svc.cluster.local:8080` returns
`{"status": "ok"}` for `GET /health` from the activity-core namespace path.
- Verified the activity-core fixture smoke reaches `POST /execute`; it fails
with a structured `configuration_error` until the provider credential Secret
is populated. No Secret values were inspected or recorded.

Remaining blocked live gate:
- `LLM-WP-0006-T07` still needs the runtime provider Secret populated outside
Git/State Hub, a successful fixture `POST /execute` returning schema-valid
JSON, the verified URL written to activity-core runtime config, and a
manual/smoke daily WSJF run that emits a non-secret State Hub `daily_triage`
event.

2026-06-07 follow-up:
- Submitted State Hub message `8e644cb0-1af4-482c-8da7-7061080d21bc` to
`railiance-cluster` requesting image publication, runtime provider Secret
creation outside Git/State Hub, overlay apply or porting, in-namespace
`/health`, and fixture smoke evidence for `LLM-WP-0006-T05`.
- Submitted State Hub message `ff798e7c-b8ef-4a3f-ab92-00bf09410534` to
`activity-core` requesting `LLM_CONNECT_URL` / timeout consumption after the
cluster smoke, a manual or smoke daily WSJF run, State Hub `daily_triage`
evidence, working-memory verification, and continuation of the three clean
scheduled 07:20 Europe/Berlin runs for `ACTIVITY-WP-0006/T03`.
- Submitted State Hub message `02033d4d-3cb0-41c8-b390-7b9e8471421e` to
`railiance-cluster` confirming the live Deployment, stable URL, and `/health`
evidence after importing the image into the actual `coulombcore` k3s node.
- Submitted State Hub message `771afe14-a2d0-46ca-b905-52018bf86c62` to
`activity-core` with the verified URL and the remaining provider Secret gate
for schema-valid `POST /execute` and `daily_triage` evidence.
## Closure Notes
After this workplan file is added or task statuses change, ask the custodian
operator to run from `~/state-hub`:
```bash
make fix-consistency REPO=llm-connect
```
That syncs file-backed workplan state into the State Hub cache.