railiance-forge/workplans/FORGE-WP-0003-actions-runner-substrate.md
tegwick de6178764c
Record haskelseed runner smoke state
2026-06-08 00:51:50 +02:00


---
id: FORGE-WP-0003
type: workplan
title: "Gitea Actions runner substrate for Railiance workloads"
domain: railiance
repo: railiance-forge
status: active
owner: codex
topic_slug: railiance
planning_priority: high
created: "2026-06-07"
updated: "2026-06-08"
state_hub_workstream_id: "149a0316-64d1-4664-96d0-274577c32e63"
---
# Gitea Actions runner substrate for Railiance workloads
## Context
Inter-Hub reported that its production deployment is blocked on a forge-owned
Actions runner substrate. The inter-hub workflow currently targets the runner
labels `self-hosted` and `haskelseed`, but production stayed on the older API
surface after the deployment-trigger commits were pushed. The current forge migration notes explicitly
excluded an Actions runner deployment, while the forge operating contract says
`railiance-forge` owns runner deployment, registration, labels, credential
boundaries, and health evidence.
This workplan turns that ownership contract into an actionable runner substrate
without weakening repo or app boundaries. It should unblock inter-hub only after
the runner is registered, visible, and has passed a non-production sample job.
## T01 - Register blocker and dependency evidence
```task
id: FORGE-WP-0003-T01
status: done
priority: high
state_hub_task_id: "b5a42f74-7792-4fbc-8e1f-16c1082ea194"
```
Capture the immediate dependency chain:
- inter-hub `R7` waits on a self-hosted runner that advertises the labels its
workflow currently requests, `self-hosted` and `haskelseed`;
- `hub.coulomb.social` still serves the older API surface after pushed
deployment-trigger commits;
- `docs/first-migration-plan.md` made runner deployment a non-goal for the first
forge migration;
- `docs/ci-runner-actions-gitops-ownership.md` assigns runner substrate
ownership to `railiance-forge`.
Done when this workplan is registered in State Hub and the unread forge inbox
messages that created the blocker are marked read.

---
## T02 - Inventory current Gitea Actions state
```task
id: FORGE-WP-0003-T02
status: done
priority: high
state_hub_task_id: "87181d63-049e-4a2b-a5e3-bf16763246d7"
```
Inspect the current Gitea Actions configuration without printing secrets.
Check:
- whether Actions are enabled for the current Gitea instance;
- whether any `act_runner` service is already registered and online;
- whether a haskelseed runner exists, and which labels it advertises;
- runner logs around the inter-hub Build and Deploy attempts;
- registry tags for the blocked inter-hub commits, including the commit tag and
`latest` where applicable.
Done when the actual current runner/registry state is recorded as non-secret
evidence in the repo and State Hub.
**2026-06-07:** Added `docs/gitea-actions-runner-evidence.md` and
`make runner-status` to capture non-secret inventory. Current session evidence:
public inter-hub `/api/v2/hubs` still returns `404`, the direct `haskelseed`
SSH alias timed out, and `skopeo` is unavailable for registry tag inspection.
After ops-bridge was updated, haskelseed is reachable at `root@192.168.178.135`
with `/home/worsch/.ssh/id_ops`. Haskelseed has `act_runner
v0.6.1-1-g8e6b3be9` and `/root/.runner` registered as `haskelseed` with labels
`haskelseed:host`, `linux:host`, and `x86_64:host`, but no OpenRC service or
live runner process was observed. This task still waits on Gitea runner admin
visibility and registry tag inspection.
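
The inventory above reads `/root/.runner` without printing secrets. A minimal sketch of that kind of summary follows. It assumes the registration file is JSON with `name`, `address`, `token`, `uuid`, and `labels` fields; the layout and the choice of fields treated as sensitive are assumptions, not something this workplan confirms. The function `summarize_runner_state` is a hypothetical helper.

```python
import json

# Fields this sketch treats as sensitive (an assumption; adjust per policy).
SECRET_FIELDS = {"token", "uuid"}


def summarize_runner_state(raw: str) -> dict:
    """Return non-secret registration details, with labels split into parts."""
    state = json.loads(raw)
    labels = []
    for label in state.get("labels", []):
        # A label such as "linux:host" is split at the first colon: the left
        # part is the name workflows request, the right part is the scheme.
        name, _, scheme = label.partition(":")
        labels.append({"name": name, "scheme": scheme or None})
    return {
        "name": state.get("name"),
        "address": state.get("address"),
        "labels": labels,
        # Report which sensitive fields were present, never their values.
        "redacted": sorted(key for key in state if key in SECRET_FIELDS),
    }
```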
**2026-06-07:** Activated the existing haskelseed runner registration through
ops-bridge. Backed up `/root/.runner` to
`/root/.runner.bak-20260607225905`, updated labels to include `self-hosted`,
`linux_amd64`, `container-build`, and `registry-publish`, installed the OpenRC
service from `runner/act-runner-haskelseed.openrc.example`, and started
`act_runner` as PID `5911`. The daemon log reports that runner `haskelseed`
declared successfully with labels `self-hosted`, `haskelseed`, `linux`,
`linux_amd64`, `x86_64`, `container-build`, and `registry-publish`.
**2026-06-08:** Completed current-state inventory. Gitea created
`forge-runner-smoke.yaml #1` for commit `19ee47fe82`, but the run remains
`Waiting` with duration `0s`. The haskelseed login shell has the deploy tools
needed by inter-hub (`skopeo`, `helm`, `kubectl`, `nix`, `git`, `curl`). Registry
inspection from haskelseed returns `manifest unknown` for the inter-hub tags
`91037a4`, `ae9e497`, `fa96fb8`, `7cc3173`, and `latest`, confirming that the
blocked inter-hub workflow did not publish those images.
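
The registry check can be repeated per tag. The sketch below wraps `skopeo inspect` and separates "tag not published" from other failures by looking for `manifest unknown` in the error output. The helper names are hypothetical, and the image reference is left as a parameter because this workplan does not name the registry path.

```python
import subprocess


def classify_inspect(returncode: int, stderr: str) -> str:
    """Map a `skopeo inspect` outcome to published, missing, or error."""
    if returncode == 0:
        return "published"
    if "manifest unknown" in stderr.lower():
        # The registry answered, but no manifest exists for this tag.
        return "missing"
    # Anything else (auth, network, TLS) needs a human to look at it.
    return "error"


def inspect_tag(image: str, tag: str) -> str:
    """Inspect one tag; `image` is a registry/repository path supplied by the caller."""
    result = subprocess.run(
        ["skopeo", "inspect", f"docker://{image}:{tag}"],
        capture_output=True,
        text=True,
        check=False,  # a missing tag is an expected outcome, not an exception
    )
    return classify_inspect(result.returncode, result.stderr)
```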

---

## T03 - Decide runner placement, labels, and capacity rules
```task
id: FORGE-WP-0003-T03
status: done
priority: high
state_hub_task_id: "eecde550-43a5-4d77-8e19-c991c5456b42"
```
Choose the first supported runner model.
Decisions:
- place the runner on haskelseed or on a separate approved runner host;
- publish semantic labels such as `linux`, `container-build`, and
`registry-publish`;
- decide whether to keep compatibility labels like `self-hosted` and
`haskelseed` during the first unblock;
- use concurrency `1` or an explicit build lock if haskelseed remains shared
infrastructure;
- treat cluster-deploy or cluster-access labels as separate approvals, not as
implicit side effects of the build runner.
Done when the label and placement contract is documented with any required
human approvals called out.
**2026-06-07:** Documented the first supported runner model in
`docs/gitea-actions-runner-substrate.md`: one haskelseed compatibility runner
named `railiance-haskelseed-build-01`, capacity `1`, compatibility labels
`self-hosted` and `haskelseed`, semantic labels `linux`, `linux_amd64`,
`container-build`, and `registry-publish`, and no implicit cluster-deploy label.
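
The label contract can be checked mechanically. The sketch below assumes a job is schedulable only when every label in its `runs-on` list is advertised by the runner; that matching rule is an assumption about the scheduler, and the helper names are hypothetical. The label set is the one documented above.

```python
# Labels from the documented first runner model.
CONTRACT_LABELS = frozenset({
    "self-hosted",
    "haskelseed",
    "linux",
    "linux_amd64",
    "container-build",
    "registry-publish",
})


def missing_labels(runs_on, runner_labels=CONTRACT_LABELS):
    """Return the requested labels that the runner does not advertise."""
    return sorted(set(runs_on) - set(runner_labels))


def runner_accepts(runs_on, runner_labels=CONTRACT_LABELS) -> bool:
    """True when every requested label is covered by the runner."""
    return not missing_labels(runs_on, runner_labels)


# The current inter-hub labels are covered; a deploy label is not, which
# keeps cluster access a separate approval rather than a side effect.
print(runner_accepts(["self-hosted", "haskelseed"]))
print(missing_labels(["container-build", "cluster-deploy"]))
```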

---

## T04 - Build the runner deployment and recovery runbook
```task
id: FORGE-WP-0003-T04
status: done
priority: high
state_hub_task_id: "a3d0adfb-d1f9-4a5f-8e05-c4a8fbb160b1"
```
Create the forge-owned runner operating surface.
Include:
- installation or service definition for the selected runner host;
- registration-token custody path, referenced by name only;
- start, stop, restart, drain, replacement, and token-rotation steps;
- log inspection commands that avoid secret output;
- health and label inspection commands;
- rollback or disable path for a bad runner registration.
Done when an operator can register and operate the runner from the forge repo
without committing decrypted secrets or machine-local assumptions.
**2026-06-07:** Added the attended install/recovery runbook, non-secret
`runner/` templates, systemd and OpenRC service examples, `make runner-docs`,
`make runner-status`, and `make check-runner-tools`. Registration tokens are
referenced by file path only and are never committed.
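
Referencing the token by file path only works if the file itself is protected. A minimal sketch of a custody check follows: it refuses a token file that group or other users can access and never logs the value. The function name and the permission policy are assumptions, not part of the committed runbook.

```python
import os
import stat
from pathlib import Path


def read_registration_token(path: str) -> str:
    """Read a token from `path`, rejecting files open to group or others."""
    mode = os.stat(path).st_mode
    if not stat.S_ISREG(mode):
        raise ValueError(f"{path} is not a regular file")
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Only the path appears in the error, never the file content.
        raise PermissionError(f"{path} must not be accessible by group or others")
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"{path} is empty")
    return token
```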

---

## T05 - Prove a non-production sample job
```task
id: FORGE-WP-0003-T05
status: wait
priority: high
state_hub_task_id: "9ada5b3e-2ddb-4a55-b9f4-5a6e00fef8b2"
```
Run a tiny non-production workflow against the runner before using it for
inter-hub deployment.
The proof should show:
- job scheduling reaches the expected runner;
- labels match the published contract;
- build tooling required by the first supported workload is present;
- no cluster deployment authority is granted unless separately approved;
- logs and State Hub evidence identify the runner and commit without exposing
tokens.
Done when the sample job result is recorded and consumers can cite the runner
label as available.
**2026-06-07:** Added `.gitea/workflows/forge-runner-smoke.yaml`. It cannot pass
until an approved runner is registered and visible to Gitea.
**2026-06-07:** Haskelseed now has a running runner with matching labels. Smoke
execution is still pending until the workflow exists in the remote Gitea repo
and is dispatched or triggered.
**2026-06-08:** The workflow exists in Gitea and run `#1` was created from the
push, but it is still `Waiting`. This task now waits on authenticated Gitea
Actions inspection to approve, rerun, or diagnose runner assignment.
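
One of the proof points is that the required build tooling is present on the runner. A minimal sketch of that check follows, using the tool list recorded in T02. It is hypothetical and not the content of `forge-runner-smoke.yaml`. The check has to run inside the job, because a service's `PATH` can differ from a login shell's.

```python
import shutil

# Tools recorded in T02 as needed by inter-hub.
REQUIRED_TOOLS = ("skopeo", "helm", "kubectl", "nix", "git", "curl")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be resolved on the current PATH."""
    # `which` is injectable so the check can be exercised without the tools.
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        raise SystemExit("missing tools: " + ", ".join(absent))
    print("all required tools found")
```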

---

## T06 - Unblock the inter-hub deployment path
```task
id: FORGE-WP-0003-T06
status: wait
priority: high
state_hub_task_id: "53929202-40aa-4470-a249-9d0ee02d3213"
```
Coordinate the first real consumer unblock with inter-hub after T05 passes.
Steps:
- confirm the inter-hub workflow can target the approved runner labels;
- rerun or inspect the Build and Deploy workflow for the blocked commits;
- verify the expected inter-hub image tag exists in the registry;
- hand off runner evidence and any workflow adjustment recommendation to
inter-hub;
- avoid repeated production push probes until the runner is visible and ready.
Done when inter-hub has a clear deployment result or a narrower non-runner
blocker.
**2026-06-07:** Inter-hub unblock remains gated on T05. Do not rerun production
push probes until the forge smoke workflow passes.

---
## T07 - Publish runner evidence and ongoing health checks
```task
id: FORGE-WP-0003-T07
status: done
priority: medium
state_hub_task_id: "c959a553-ec48-4e98-a752-168a2b067a81"
```
Update forge evidence docs and read-only operator targets so the runner is not a
one-off fix.
Include:
- runner inventory by label, placement, and trust level;
- last successful sample job and any publish job evidence;
- expected logs, dashboards, or status commands;
- documented alert or escalation condition for stuck jobs and offline runners;
- Forgejo migration notes so the same semantic labels can survive the future
Gitea-to-Forgejo cutover.
Done when forge can continuously explain whether the runner substrate is healthy
and what labels downstream workflows may depend on.
**2026-06-07:** Published runner evidence docs and Makefile probes. Current
health is explicitly `not proven`: no runner registration has been observed from
this session, and live host/Gitea inspection requires attended access.
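
The escalation condition for stuck jobs can be stated as a small rule. The sketch below flags a run that has stayed `Waiting` beyond a threshold. The 15-minute value and the helper name are placeholders, not an agreed policy, and the status string is the one seen in the T02 and T05 evidence.

```python
from datetime import datetime, timedelta

# Placeholder threshold; the real value belongs in the evidence docs.
STUCK_AFTER = timedelta(minutes=15)


def is_stuck(status: str, created_at: datetime, now: datetime,
             threshold: timedelta = STUCK_AFTER) -> bool:
    """True when a run is still waiting for a runner after `threshold`."""
    return status.lower() == "waiting" and now - created_at >= threshold
```

A run that is `Waiting` with duration `0s` long after its push, like smoke run `#1`, would meet this condition and should be escalated rather than retried with further pushes.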