generated from coulomb/repo-seed
Switch the custodian triage default from anthropic/claude-sonnet-4 to google/gemini-2.5-flash, which advertises structured-output support on OpenRouter. Tighten the OpenRouter adapter to send strict JSON schema requests and set provider.require_parameters=true so routing only hits providers that honor the requested response_format. Update Kubernetes deploy docs and config for the verified coulombcore handoff: Containerfile build path, image-pull-policy=Never for smoke pods, credential-routing notes, and live smoke evidence. Mark LLM-WP-0006 finished with closure notes from 2026-06-18.
116 lines
3.3 KiB
Markdown
116 lines
3.3 KiB
Markdown
# Activity-Core LLM Endpoint Handoff
|
|
|
|
This document records the `llm-connect` endpoint contract for activity-core
|
|
daily WSJF triage.
|
|
|
|
## Service URL
|
|
|
|
Proposed stable in-cluster URL:
|
|
|
|
```text
|
|
http://llm-connect.activity-core.svc.cluster.local:8080
|
|
```
|
|
|
|
Use this value for activity-core `LLM_CONNECT_URL` after the Kubernetes overlay
|
|
has been applied and smoked from the `activity-core` namespace. Keep
|
|
`LLM_CONNECT_TIMEOUT_SECONDS=300`.
|
|
|
|
## Runtime Profile
|
|
|
|
The service supports the activity-core profile name:
|
|
|
|
```text
|
|
custodian-triage-balanced
|
|
```
|
|
|
|
Default runtime values:
|
|
|
|
```text
|
|
provider=openrouter
|
|
model=google/gemini-2.5-flash
|
|
temperature=0.2
|
|
max_tokens=1800
|
|
max_depth=2
|
|
timeout_seconds=300
|
|
model_params.reasoning_effort=medium
|
|
```
|
|
|
|
Operators can override provider/model through the Deployment ConfigMap or
|
|
runtime env:
|
|
|
|
```text
|
|
LLM_CONNECT_CUSTODIAN_TRIAGE_PROVIDER
|
|
LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL
|
|
```
|
|
|
|
Provider credentials must be injected at runtime through
|
|
`llm-connect-provider-secrets`; do not store credential values in Git or State
|
|
Hub.
|
|
|
|
Credential custody follows the ops-warden routing table: LLM provider API keys
|
|
are an operator/OpenBao-to-Kubernetes Secret action, not an ops-warden issuance
|
|
task. For the default OpenRouter profile, the Secret must provide
|
|
`OPENROUTER_API_KEY` without exposing the value in Git, State Hub, logs, or
|
|
chat.
|
|
|
|
## Local Smoke
|
|
|
|
Run a mock server that returns known schema-valid daily triage JSON:
|
|
|
|
```bash
|
|
export LLM_CONNECT_MOCK_RESPONSE="$(python -c 'import json; print(json.dumps(json.load(open("fixtures/activity_core/daily-triage-valid-content.json"))))')"
|
|
python -m llm_connect.server --host 127.0.0.1 --port 8080 --provider mock
|
|
```
|
|
|
|
In another shell:
|
|
|
|
```bash
|
|
python scripts/smoke_activity_core_endpoint.py --url http://127.0.0.1:8080
|
|
```
|
|
|
|
The smoke script checks:
|
|
|
|
- `GET /health`
|
|
- fixture `POST /execute`
|
|
- response has a string `content` field
|
|
- `content` parses as JSON
|
|
- parsed JSON matches `fixtures/activity_core/daily-triage-report.schema.json`
|
|
|
|
## Cluster Smoke
|
|
|
|
Apply the overlay from the repo root after creating the provider Secret:
|
|
|
|
```bash
|
|
kubectl apply -k deploy/k8s/activity-core-llm-connect
|
|
kubectl -n activity-core rollout status deployment/llm-connect
|
|
```
|
|
|
|
Run the in-namespace smoke:
|
|
|
|
```bash
|
|
kubectl -n activity-core run llm-connect-smoke \
|
|
--rm -i --restart=Never \
|
|
--image=llm-connect:latest \
|
|
--image-pull-policy=Never \
|
|
--env=LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 \
|
|
--env=LLM_CONNECT_TIMEOUT_SECONDS=300 \
|
|
-- python scripts/smoke_activity_core_endpoint.py
|
|
```
|
|
|
|
## Handoff Status
|
|
|
|
Code-owned artifacts are present in this repo and the live llm-connect
|
|
handoff is verified as of 2026-06-18:
|
|
|
|
- `docker.io/library/llm-connect:latest` was rebuilt from `Containerfile`,
|
|
imported into the `coulombcore` k3s image store, and rolled out.
|
|
- `activity-core/llm-connect-provider-secrets` reports `DATA 1`; no Secret
|
|
values were inspected or recorded.
|
|
- The live ConfigMap sets `LLM_CONNECT_MODEL=google/gemini-2.5-flash` and
|
|
`LLM_CONNECT_CUSTODIAN_TRIAGE_MODEL=google/gemini-2.5-flash`.
|
|
- The in-namespace smoke passed against the stable Service:
|
|
`smoke: pass health=ok latency_seconds=2.147 recommendations=1`.
|
|
|
|
Scheduled `daily_triage` evidence collection is activity-core ownership under
|
|
`ACTIVITY-WP-0006`.
|