# Railiance01 Kubernetes Deployment

This bundle establishes activity-core as an internal production service on the
railiance01 K3s cluster. It exposes the unauthenticated API only as a ClusterIP
service, which is reachable from inside the cluster but not from outside it;
publish it through an authenticated ingress only after choosing the final host
name and access policy.

## Layout

- `00-namespace.yaml`: namespace and shared labels
- `10-infrastructure.yaml`: PostgreSQL for app data, PostgreSQL for Temporal,
  NATS JetStream, Temporal, and Temporal UI
- `20-runtime.yaml`: migrate/sync jobs plus API, worker, and event-router
- `bootstrap-secrets.sh`: idempotently creates generated Kubernetes secrets

The runtime image tag is `activity-core:railiance01-prod` and is expected to be
loaded into the railiance01 K3s containerd image store.

`20-runtime.yaml` also projects the disabled Custodian-owned
`ops-service-inventory-probes.md` ActivityDefinition and a non-secret
`actcore-ops-service-inventory` ConfigMap snapshot. The source of truth for the
inventory remains `/home/worsch/the-custodian/ops/service-inventory.yml`; update
the ConfigMap projection from that file before enabling the probe schedule.
`OPS_HUB_KEY` is created only as an empty Secret placeholder until the operator
provisions the Inter-Hub ops-hub key.
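
One way to refresh the inventory projection is to render a ConfigMap from the source file and compare the result with the copy embedded in `20-runtime.yaml`. The following is a minimal sketch, not the repository's own tooling: the helper name `render_inventory_configmap` and the data key `service-inventory.yml` are assumptions, and `--dry-run=client` only prints YAML without changing the cluster.

```bash
# Hypothetical helper: print a ConfigMap manifest built from the inventory file.
# Assumption: the ConfigMap stores the file under the key "service-inventory.yml";
# check the key actually used in 20-runtime.yaml before copying anything over.
render_inventory_configmap() {
  src="${1:-/home/worsch/the-custodian/ops/service-inventory.yml}"
  kubectl -n activity-core create configmap actcore-ops-service-inventory \
    --from-file="service-inventory.yml=${src}" \
    --dry-run=client -o yaml
}
```

Calling `render_inventory_configmap > /tmp/inventory-configmap.yaml` leaves a file that can be diffed against the manifest by hand.
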

The same runtime projection now includes the active
`daily-statehub-wsjf-triage.md` ActivityDefinition plus its JSON output schema
and a persistent working-memory volume mounted at
`/home/worsch/the-custodian/memory/working`. Before trusting the daily 07:20
Europe/Berlin schedule, verify both runtime dependencies:

- `actcore-state-hub-bridge` can reach the State Hub API through the node-local
  tunnel expected at `127.0.0.1:18000`.
- `LLM_CONNECT_URL` is set to an operator-approved llm-connect endpoint that can
  serve the `custodian-triage-balanced` profile.

If `LLM_CONNECT_URL` is missing or broken, report-sink instructions write a
visible `execution_failed` diagnostic instead of silently producing no report.
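
The two dependency checks above could be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the TCP probe only shows that something is listening on `127.0.0.1:18000` (not that the State Hub API answers correctly), and `LLM_CONNECT_URL` is read from the current shell, so the value has to be taken from wherever the runtime is actually configured, which this README does not specify.

```bash
# Succeeds when a TCP connection to host ($1) and port ($2) opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
tcp_reachable() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Succeeds when the value ($2) is non-empty; otherwise names the setting ($1).
require_value() {
  [ -n "$2" ] || { echo "missing: $1"; return 1; }
}

# Hypothetical pre-flight: returns non-zero if either dependency looks wrong.
preflight_triage() {
  rc=0
  tcp_reachable 127.0.0.1 18000 || { echo "unreachable: 127.0.0.1:18000"; rc=1; }
  require_value LLM_CONNECT_URL "${LLM_CONNECT_URL:-}" || rc=1
  return "$rc"
}
```

Run `preflight_triage` on railiance01 for the tunnel check. It does not verify that the endpoint can serve the `custodian-triage-balanced` profile.
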
## Deploy
```bash
# Build the image locally and load it into the K3s containerd image store.
docker build -t activity-core:railiance01-prod .
docker save -o /tmp/activity-core-railiance01-prod.tar activity-core:railiance01-prod
scp /tmp/activity-core-railiance01-prod.tar railiance01:/tmp/
ssh railiance01 sudo k3s ctr images import /tmp/activity-core-railiance01-prod.tar

# Copy the manifests, then continue in a shell on railiance01.
rsync -a k8s/railiance/ railiance01:activity-core/k8s/railiance/
ssh railiance01
cd ~/activity-core

# Create the secrets, bring up the infrastructure, and wait until it is ready.
bash k8s/railiance/bootstrap-secrets.sh
kubectl apply -f k8s/railiance/10-infrastructure.yaml
kubectl -n activity-core wait --for=condition=ready pod -l app.kubernetes.io/name=actcore-app-db --timeout=180s
kubectl -n activity-core wait --for=condition=ready pod -l app.kubernetes.io/name=actcore-temporal-db --timeout=180s
kubectl -n activity-core wait --for=condition=ready pod -l app.kubernetes.io/name=actcore-nats --timeout=180s
kubectl -n activity-core rollout status deploy/actcore-temporal --timeout=300s

# Run migrations and start the runtime. Delete any previous job first: an
# existing Job's pod template is immutable and cannot be updated in place.
kubectl -n activity-core delete job actcore-migrate --ignore-not-found
kubectl apply -f k8s/railiance/20-runtime.yaml
kubectl -n activity-core wait --for=condition=complete job/actcore-migrate --timeout=180s
kubectl -n activity-core rollout status deploy/actcore-api --timeout=180s
kubectl -n activity-core rollout status deploy/actcore-worker --timeout=180s
kubectl -n activity-core rollout status deploy/actcore-event-router --timeout=180s

# Re-run the sync job: delete the previous instance and re-apply to recreate it.
kubectl -n activity-core delete job actcore-sync --ignore-not-found
kubectl apply -f k8s/railiance/20-runtime.yaml
kubectl -n activity-core wait --for=condition=complete job/actcore-sync --timeout=180s
```
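
The delete, apply, and wait sequence appears twice in the steps above, once for `actcore-migrate` and once for `actcore-sync`. A plausible reason is that a Job that has already run is not started again by `kubectl apply`, and its pod template cannot be changed in place. The pattern could be wrapped in a small helper. This is only a sketch: `rerun_job` is a hypothetical name, not something the repository provides.

```bash
# Hypothetical helper wrapping the delete / apply / wait sequence.
# $1: job name (for example actcore-sync), $2: manifest that defines the job.
rerun_job() {
  kubectl -n activity-core delete job "$1" --ignore-not-found &&
  kubectl apply -f "$2" &&
  kubectl -n activity-core wait --for=condition=complete "job/$1" --timeout=180s
}
```

With it, the last three deploy commands would collapse to `rerun_job actcore-sync k8s/railiance/20-runtime.yaml`.
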
## Verify
```bash
kubectl -n activity-core exec deploy/actcore-api -- \
  python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:8010/health').read().decode())"
kubectl -n activity-core get pods
kubectl -n activity-core get svc
```
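
Reading the pod list by eye gets tedious on repeated deploys. A small filter can flag anything that is not in a healthy state. This is a sketch, not part of the bundle: `unhealthy_pods` is a hypothetical name, and it only inspects the STATUS column, so a pod that is `Running` but not yet ready would still pass.

```bash
# Reads `kubectl get pods --no-headers` output on stdin and prints
# "<name> <status>" for every pod whose STATUS is neither Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Usage: `kubectl -n activity-core get pods --no-headers | unhealthy_pods`. Empty output means every pod is either running or a finished job pod.
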