---
id: LLM-WP-0006
type: workplan
title: "Activity-Core Always-On LLM Endpoint"
domain: custodian
repo: llm-connect
status: blocked
owner: codex
topic_slug: activity-core-llm-endpoint
planning_priority: high
planning_order: 6
created: "2026-06-07"
updated: "2026-06-07"
depends_on_workplans:
  - LLM-WP-0003
related_workplans:
  - ACTIVITY-WP-0006
state_hub_workstream_id: "8de71d58-1193-424f-8338-a9aa4e173c5b"
---

# LLM-WP-0006 - Activity-Core Always-On LLM Endpoint

**status:** blocked
**owner:** codex

## Purpose

Provide an operator-approved, always-on `llm-connect` HTTP endpoint for
`activity-core` daily WSJF triage. The service must be reachable from the
`activity-core` Kubernetes namespace, expose the existing `GET /health` and
`POST /execute` contract, support the `custodian-triage-balanced` runtime
profile, and return JSON content that satisfies the daily triage schema without
leaking provider credentials or secret material into Git, logs, or State Hub.

This is not a new public API. The current `llm_connect.server` contract is a
lightweight internal service surface; this workplan turns it into a durable
internal dependency with profile resolution, deployable artifacts, smoke tests,
and activity-core handoff evidence.

## Demand Signal

State Hub messages from `activity-core` on 2026-06-07 requested a stable
`llm-connect` endpoint before `ACTIVITY-WP-0006/T03` can collect clean scheduled
WSJF evidence.

Required behavior from those messages:

- `GET /health` returns 200 from inside the activity-core runtime path.
- `POST /execute` accepts activity-core `RunConfig` payloads with
  `model_name=custodian-triage-balanced`, `temperature=0.2`,
  `max_tokens=1800`, `max_depth=2`, `model_params.reasoning_effort=medium`,
  and `model_params.json_schema` for the daily triage report.
- The response contains a string `content` field whose value is valid JSON
  matching the daily triage schema.
- Provider credentials stay outside Git and outside State Hub
  messages/progress.
- The stable service URL can be handed to activity-core as `LLM_CONNECT_URL`.
- The service fits within `LLM_CONNECT_TIMEOUT_SECONDS=300` and surfaces useful
  provider/transport errors without exposing secrets.

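The request shape listed above can be sketched in a few lines of standard-library Python. This is an illustration only: the flat body layout and the `prompt` field name are assumptions, since the authoritative `RunConfig` serialization lives in `llm_connect` and the activity-core fixture. `build_triage_request` and `post_execute` are hypothetical helper names.

```python
import json
import urllib.request


def build_triage_request(prompt: str, schema: dict) -> dict:
    """Assemble a RunConfig-style body with the values activity-core requested.

    Assumption: the body is flat and the prompt travels in a "prompt" field.
    """
    return {
        "prompt": prompt,
        "model_name": "custodian-triage-balanced",
        "temperature": 0.2,
        "max_tokens": 1800,
        "max_depth": 2,
        "model_params": {
            "reasoning_effort": "medium",
            "json_schema": schema,
        },
    }


def post_execute(base_url: str, body: dict, timeout: float = 300.0) -> dict:
    """POST the body to /execute and decode the JSON response.

    The default timeout mirrors LLM_CONNECT_TIMEOUT_SECONDS=300.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A consumer would call `post_execute(os.environ["LLM_CONNECT_URL"], body)` and then parse the returned `content` string as JSON.
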
## Current Repo State

Already present:

- `llm_connect/server.py` exposes `GET /health` and `POST /execute` via
  `ThreadingHTTPServer`.
- `/execute` forwards `RunConfig` fields including `max_depth` and
  `model_params`.
- Structured-output helpers translate `model_params.json_schema` for OpenAI,
  OpenRouter, Gemini, and Claude Code CLI.
- Debug and audit modes redact provider request headers and can replay captured
  adapter transformations.

Missing for this request:

- No named runtime profile resolver for `custodian-triage-balanced`.
- No container or Kubernetes deployment artifact for an always-on service.
- No documented secret/config injection path for the cluster service.
- No activity-core daily triage fixture or in-cluster smoke job.
- No committed handoff document naming the final stable URL and verification
  evidence.

## T01 - Lock Activity-Core Contract Fixture

```task
id: LLM-WP-0006-T01
title: "Lock activity-core daily WSJF request and schema fixture"
priority: high
status: done
state_hub_task_id: "f1d21c4b-2df3-4da8-8e6e-418fd7998a63"
```

Capture a non-secret fixture for the exact `POST /execute` request used by
`daily-statehub-wsjf-triage`, including the daily triage JSON schema, timeout
budget, expected response shape, and minimum prompt fields. Store only schema
and dummy prompt/evidence values in the repo.

Done when a fixture can be used by tests and smoke scripts without any provider
credentials or live State Hub data, and the workplan notes identify the
activity-core consumer contract it represents.

## T02 - Add Named Runtime Profile Resolution

```task
id: LLM-WP-0006-T02
title: "Resolve custodian-triage-balanced to provider, model, and RunConfig defaults"
priority: high
status: done
state_hub_task_id: "4538bae3-e8cf-4aa6-9056-270fd8d54caa"
```

Add a small named-profile layer for server mode so activity-core can send
`model_name=custodian-triage-balanced` while operators configure the underlying
provider/model out of band. The profile should merge request overrides with
profile defaults for temperature, max tokens, max depth, timeout, and portable
`model_params`, while preserving the existing direct provider/model behavior.

Done when unit tests prove `custodian-triage-balanced` resolves to the selected
adapter/model without hard-coding provider secrets, unknown profile names fail
with a clear non-secret error, and existing `/execute` behavior remains
backward compatible.

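The merge rule described in this task can be illustrated with a minimal sketch. It is not the `llm_connect.profiles` implementation: the `PROFILES` table, the `mock` provider values, the `timeout_seconds` key, and the names `resolve_profile` and `UnknownProfileError` are all hypothetical. The sketch assumes that requests may override only tuning fields, never the operator-selected provider or model.

```python
from copy import deepcopy

# Hypothetical profile table; real provider/model values are operator-configured.
PROFILES = {
    "custodian-triage-balanced": {
        "provider": "mock",
        "model": "mock-model",
        "temperature": 0.2,
        "max_tokens": 1800,
        "max_depth": 2,
        "timeout_seconds": 300,
        "model_params": {"reasoning_effort": "medium"},
    }
}

# Fields a request is allowed to override (an assumption of this sketch).
OVERRIDABLE = ("temperature", "max_tokens", "max_depth", "timeout_seconds")


class UnknownProfileError(ValueError):
    """Raised for profile names that are not configured."""


def resolve_profile(request: dict, profiles: dict = PROFILES) -> dict:
    """Merge request overrides onto the named profile's defaults."""
    name = request.get("model_name")
    if name not in profiles:
        # The message names the profile only; it carries no secret material.
        raise UnknownProfileError(f"unknown runtime profile: {name!r}")
    resolved = deepcopy(profiles[name])
    for key in OVERRIDABLE:
        if request.get(key) is not None:
            resolved[key] = request[key]
    # model_params merge key by key, with request values winning.
    resolved["model_params"] = {
        **resolved.get("model_params", {}),
        **(request.get("model_params") or {}),
    }
    return resolved
```

Keeping the provider and model out of the overridable set is one way to let activity-core send a stable profile name while the underlying choice stays an operator decision.
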
## T03 - Harden Server Responses for Operations

```task
id: LLM-WP-0006-T03
title: "Return useful non-secret provider and transport errors from server mode"
priority: high
status: done
state_hub_task_id: "d4adfe3b-6a57-4184-86fd-2eb11979f075"
```

Review server error handling for provider configuration failures, timeouts,
HTTP/API failures, invalid profile config, and malformed structured-output
responses. Keep the normal `LLMResponse.to_dict()` success shape, but make
errors actionable for operators and consumers without echoing API keys, bearer
tokens, request headers, or prompt bodies by default.

Done when tests cover sanitized error responses for configuration, timeout,
provider/API, and profile validation failures, and debug/audit mode remains
opt-in and redacted.

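A sanitized error body of the kind this task describes might look like the sketch below. The `{"error": {"type", "message"}}` layout, the redaction patterns, and the helper names `redact` and `error_body` are assumptions for illustration; only the `configuration_error` type name appears in this workplan's notes.

```python
import re

# Patterns for common secret-bearing fragments (illustrative, not exhaustive).
_SECRET_PATTERNS = [
    re.compile(r"(?i)\b(authorization\s*[:=]\s*)(?:bearer\s+)?\S+"),
    re.compile(r"(?i)\b((?:api[_-]?key|token|secret)\s*[:=]\s*)\S+"),
    re.compile(r"(?i)\b(bearer\s+)\S+"),
]


def redact(text: str) -> str:
    """Replace values that look like credentials with a fixed marker."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(lambda m: m.group(1) + "[REDACTED]", text)
    return text


def error_body(kind: str, exc: Exception) -> dict:
    """Build a structured error response without echoing secrets."""
    return {"error": {"type": kind, "message": redact(str(exc))}}
```

Pattern-based redaction is a safety net rather than a guarantee, so the safer default remains never placing headers or prompt bodies into exception messages in the first place.
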
## T04 - Package the Always-On Service

```task
id: LLM-WP-0006-T04
title: "Add container packaging and service entrypoint for llm-connect server"
priority: high
status: done
state_hub_task_id: "38822b17-fa58-4583-939f-26e59b9c93c7"
```

Create the deployable service artifact: container build definition, non-root
runtime, healthcheck, explicit listen host/port, and environment-driven profile
configuration. Keep provider keys injected only at runtime through the approved
cluster secret path.

Done when the image builds locally, starts with mock and at least one real
provider configuration path, passes `GET /health`, and can receive a fixture
`POST /execute` without writing secrets to stdout, image layers, or committed
files.

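Environment-driven listen settings, as called for above, could be read along these lines. The variable names `LLM_CONNECT_HOST` and `LLM_CONNECT_PORT` are hypothetical and may differ from what `llm_connect.server` actually reads; the default port 8080 matches the in-cluster URL recorded later in this workplan, and the `0.0.0.0` default host is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int


def load_settings(env=None) -> ServerSettings:
    """Read listen host/port from the environment with explicit defaults.

    LLM_CONNECT_HOST and LLM_CONNECT_PORT are hypothetical variable names.
    """
    env = os.environ if env is None else env
    host = env.get("LLM_CONNECT_HOST", "0.0.0.0")
    raw_port = env.get("LLM_CONNECT_PORT", "8080")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError("LLM_CONNECT_PORT must be an integer") from None
    if not 1 <= port <= 65535:
        raise ValueError("LLM_CONNECT_PORT must be between 1 and 65535")
    return ServerSettings(host=host, port=port)
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test and makes the container's configuration surface explicit.
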
## T05 - Add Kubernetes Deployment Surface

```task
id: LLM-WP-0006-T05
title: "Provide Kubernetes Deployment, Service, probes, and secret references"
priority: high
status: done
state_hub_task_id: "f9743610-b573-41b8-952f-b27319acb3e3"
```

Add the cluster deployment surface for an internal `llm-connect` service:
Deployment, Service, readiness/liveness probes, ConfigMap/profile settings,
Secret references for provider credentials, resource requests/limits, and
network access scoped to the activity-core namespace. Use the repository's
current deployment conventions if a shared Railiance chart location is selected
during implementation.

Done when an operator can apply the manifests without editing secret values
into Git, the service exposes stable cluster DNS, and `GET /health` succeeds
from an activity-core pod or equivalent smoke pod.

## T06 - Build Smoke Tests and Validation Scripts

```task
id: LLM-WP-0006-T06
title: "Validate health, fixture execute, JSON schema content, and timeout budget"
priority: high
status: done
state_hub_task_id: "f046d68b-97f3-4471-a1f6-f1ab351ec448"
```

Add smoke tooling that can run locally against mock/profile mode and in-cluster
against the deployed Service. It should check health, post the daily triage
fixture, parse `response.content` as JSON, validate it against the daily triage
schema, and report latency relative to the 300 second activity-core timeout.

Done when the smoke path produces a clear pass/fail summary without dumping
secret headers or provider credentials, and failed JSON/schema validation is
reported distinctly from provider transport failure.

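The distinction between failure classes can be made concrete with a small classifier. This is a sketch, not the logic in `scripts/smoke_activity_core_endpoint.py`: the status labels, the assumed `error` key on failed responses, and the required-keys check (a stand-in for full JSON Schema validation against the fixture schema) are all assumptions.

```python
import json

TIMEOUT_BUDGET_SECONDS = 300  # mirrors LLM_CONNECT_TIMEOUT_SECONDS=300


def classify_smoke_result(response: dict, required_keys, elapsed_seconds: float):
    """Return a (status, detail) pair for one fixture POST /execute.

    required_keys is a simplified stand-in for schema validation.
    """
    if elapsed_seconds > TIMEOUT_BUDGET_SECONDS:
        return "fail:timeout_budget", f"{elapsed_seconds:.1f}s exceeds budget"
    if "error" in response:  # assumed shape of a structured error body
        return "fail:provider", "server returned a structured error"
    content = response.get("content")
    if not isinstance(content, str):
        return "fail:response_shape", "content is not a string"
    try:
        report = json.loads(content)
    except json.JSONDecodeError as exc:
        # Report the position only; never echo the content itself.
        return "fail:invalid_json", f"line {exc.lineno} column {exc.colno}"
    if not isinstance(report, dict):
        return "fail:schema", "content is not a JSON object"
    missing = sorted(key for key in required_keys if key not in report)
    if missing:
        return "fail:schema", "missing keys: " + ", ".join(missing)
    return "pass", f"{elapsed_seconds:.1f}s of {TIMEOUT_BUDGET_SECONDS}s budget"
```

Separate status labels let an operator tell at a glance whether a failed run points at the provider path or at the model's structured output.
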
## T07 - Coordinate Activity-Core Handoff

```task
id: LLM-WP-0006-T07
title: "Publish verified LLM_CONNECT_URL handoff and activity-core smoke evidence"
priority: high
status: blocked
state_hub_task_id: "92e043f0-5ca8-4c2d-b8f6-dd5fbf8ccb62"
```

After the service is deployed and smoke-tested, hand the stable URL to the
activity-core/railiance-cluster operator for `LLM_CONNECT_URL`. Coordinate one
manual or smoke daily WSJF run and record non-secret evidence that a State Hub
`daily_triage` event was emitted.

Done when the final URL value is documented in the appropriate operator-owned
config handoff, a fixture `POST /execute` succeeds from the activity-core
namespace, and activity-core has enough evidence to start counting clean 07:20
Europe/Berlin scheduled runs toward `ACTIVITY-WP-0006/T03`.

## Scope Guardrails

In scope:

- Server-mode profile resolution needed by activity-core.
- Internal service packaging and Kubernetes deployment artifacts.
- Redacted diagnostics and operator-safe error responses.
- Health and execute smoke tooling using non-secret fixtures.
- Coordination notes for the final `LLM_CONNECT_URL` handoff.

Out of scope:

- Publishing `llm-connect` as a public internet service.
- Storing provider credentials, live prompts, or State Hub event payloads in
  Git.
- Replacing activity-core's scheduler or WSJF triage logic.
- Guaranteeing three scheduled production runs; this plan provides the
  endpoint and first smoke evidence, while scheduled-run collection remains
  activity-core ownership.
- Choosing or rotating production provider credentials; that is an operator
  secret-management action.

## Acceptance

- `python -m llm_connect.server` or the packaged service starts an internal
  endpoint with a configured `custodian-triage-balanced` profile.
- `GET /health` returns 200 locally and from inside the activity-core runtime
  network path.
- A fixture `POST /execute` with the daily WSJF schema returns an
  `LLMResponse` whose `content` field is a string containing schema-valid JSON.
- Provider failures, timeouts, and profile/config errors return useful
  non-secret error bodies.
- The deployed Service has readiness/liveness probes, runtime-only secret
  injection, and a documented stable URL for activity-core.
- A manual or smoke daily WSJF run emits non-secret evidence of a State Hub
  `daily_triage` event.

## Risks and Open Questions

- The final provider/model behind `custodian-triage-balanced` needs operator
  approval and runtime secret availability. The profile layer should keep that
  choice configurable.
- If the chosen provider does not reliably honor the supplied JSON schema, the
  smoke path may need a retry or repair strategy; that should be explicit and
  bounded if added.
- The repository currently has no deployment directory. Implementation must
  decide whether Kubernetes artifacts live here, in a Railiance deployment repo,
  or are split between code-owned defaults here and environment-owned overlays
  elsewhere.
- `llm_connect.server` is stdlib HTTP and thread-per-request. That is likely
  sufficient for daily WSJF traffic, but sustained multi-consumer use may need
  a later ASGI/worker model.

## Implementation Notes

2026-06-07:

- Added non-secret activity-core fixtures under `fixtures/activity_core/` using
  the `daily-triage-report` schema from activity-core's Railiance runtime.
- Added `llm_connect.profiles` with `custodian-triage-balanced` profile
  dispatch, env/file profile overrides, and metadata on profiled responses.
- Updated `llm_connect.server` so CLI serve mode enables runtime profiles by
  default, reads host/port/provider/model defaults from env, validates configs
  before execution, and returns structured sanitized error bodies.
- Added `LLM_CONNECT_MOCK_RESPONSE` support for local mock server smokes.
- Added standard-library smoke tooling in
  `scripts/smoke_activity_core_endpoint.py`, plus tests that run the smoke path
  against an in-process profiled mock HTTP server.
- Added `Containerfile`, `.dockerignore`, and a Kubernetes overlay at
  `deploy/k8s/activity-core-llm-connect/`.
- Added handoff docs in `docs/activity-core-llm-endpoint.md`.
- Verification completed locally:
  `python3 -m pytest tests/test_profiles.py tests/test_server.py
  tests/test_activity_core_smoke.py tests/test_factory.py
  tests/test_package_exports.py`;
  `docker build --progress=plain -f Containerfile -t
  llm-connect:wp0006-smoke .`; and `kubectl kustomize
  deploy/k8s/activity-core-llm-connect`.

Live cluster evidence:

- Imported `docker.io/library/llm-connect:latest` into the actual Railiance k3s
  node runtime on `coulombcore` (`92.205.130.254`) and updated the overlay to
  use that normalized image reference with `imagePullPolicy: Never`.
- Applied the `activity-core` namespace deployment surface: ConfigMap, Secret
  reference, Service, Deployment, readiness/liveness probes, and NetworkPolicy.
- Verified the live Deployment is `1/1` ready with image
  `docker.io/library/llm-connect:latest`.
- Verified the stable in-cluster URL
  `http://llm-connect.activity-core.svc.cluster.local:8080` returns
  `{"status": "ok"}` for `GET /health` from the activity-core namespace path.
- Verified the activity-core fixture smoke reaches `POST /execute`; it fails
  with a structured `configuration_error` until the provider credential Secret
  is populated. No Secret values were inspected or recorded.

Remaining blocked live gate:

- `LLM-WP-0006-T07` still needs the runtime provider Secret populated outside
  Git/State Hub, a successful fixture `POST /execute` returning schema-valid
  JSON, the verified URL written to activity-core runtime config, and a
  manual/smoke daily WSJF run that emits a non-secret State Hub `daily_triage`
  event.

2026-06-07 follow-up:

- Submitted State Hub message `8e644cb0-1af4-482c-8da7-7061080d21bc` to
  `railiance-cluster` requesting image publication, runtime provider Secret
  creation outside Git/State Hub, overlay apply or porting, in-namespace
  `/health`, and fixture smoke evidence for `LLM-WP-0006-T05`.
- Submitted State Hub message `ff798e7c-b8ef-4a3f-ab92-00bf09410534` to
  `activity-core` requesting `LLM_CONNECT_URL` / timeout consumption after the
  cluster smoke, a manual or smoke daily WSJF run, State Hub `daily_triage`
  evidence, working-memory verification, and continuation of the three clean
  scheduled 07:20 Europe/Berlin runs for `ACTIVITY-WP-0006-T03`.
- Submitted State Hub message `02033d4d-3cb0-41c8-b390-7b9e8471421e` to
  `railiance-cluster` confirming the live Deployment, stable URL, and `/health`
  evidence after importing the image into the actual `coulombcore` k3s node.
- Submitted State Hub message `771afe14-a2d0-46ca-b905-52018bf86c62` to
  `activity-core` with the verified URL and the remaining provider Secret gate
  for schema-valid `POST /execute` and `daily_triage` evidence.

## Closure Notes

After this workplan file is added or task statuses change, ask the custodian
operator to run from `~/state-hub`:

```bash
make fix-consistency REPO=llm-connect
```

That syncs file-backed workplan state into the State Hub cache.