diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..03adb7b
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,214 @@
+# railiance-infra — Codex Instructions
+
+## Custodian State Hub Integration
+
+This project is tracked as the **railiance** domain in the Custodian State Hub.
+Hub topic ID: `ca369340-a64e-442e-98f1-a4fa7dc74a38`
+
+The State Hub runs locally at http://127.0.0.1:8000. The MCP server (`state-hub`)
+exposes tools for reading and writing state without touching the API directly.
+
+---
+
+### Session Protocol
+
+**On receiving your first message — before writing any response text — execute
+this orientation sequence. Do not greet, do not ask what to do first.**
+
+**Step 1 — Call the State Hub**
+```
+get_domain_summary("railiance")  # workstreams, blocking decisions, recent progress, SBOM status
+```
+If the call fails, the API is offline: `cd ~/the-custodian/state-hub && make api`
+
+**Step 2 — Scan local workplans**
+
+Read every `.md` file under `workplans/`. Use `Glob(pattern="**/*.md", path="workplans/")`
+or Bash `ls workplans/` to discover them. For each file with `status: active`,
+extract and note:
+- The workplan title and ID
+- All tasks whose `status` is `todo` or `in_progress`
+
+**Step 3 — Present orientation to the user**
+
+Output a concise brief covering:
+1. **Active workstreams** (from state hub) for the `railiance` domain — title,
+   task counts, any blocking decisions
+2. **Pending tasks for this repo** — from local `workplans/` files (Step 2)
+   plus any state hub tasks with `[repo:railiance-infra]` in their title
+3. **Goal guidance** — if the summary contains a `goal_guidance` key, act on it:
+   - **`needs_workplan`** entries: for each active repo goal with no linked workstream,
+     surface it as the top suggested action — *"Repo goal '{title}' has no workplan yet.
+     Suggest: create workplans/RAIL-HO-WP-NNNN-.md and register a workstream
+     with repo_goal_id='{goal_id}'"*. Treat this as higher priority than continuing
+     existing work unless Bernd says otherwise.
+   - **`alignment_warnings`** entries: if active workstreams exist but are not linked
+     to the current repo goal, name the most recently active one and note:
+     *"Current work on '{recent_workstream_title}' may not be aligned with the active
+     goal '{active_goal_title}'. Continue unless you hear otherwise — but flag it."*
+4. **Suggested next action** — the highest-priority open item across all sources,
+   with goal alignment taken into account
+5. **SBOM status** — is `last_sbom_at` set for this repo? If not, note it as a gap
+
+If there are no workstreams at all: follow the First Session Protocol below.
+
+**During work:**
+- Use `record_decision()` for any decision that affects direction or dependencies.
+- Use `add_progress_event()` for notable events (milestones, blockers, insights).
+- Use `resolve_decision()` to close a decision once the choice is made.
+
+> **Design boundary:** The State Hub is a *read model*. Two write operations are
+> permanently sanctioned: **Resolving Decisions** and **Suggesting Next Steps**.
+> The bootstrap tools (`create_workstream`, `create_task`, `update_task_status`)
+> are only for First Session Protocol. Formal work structure — workplans, tasks —
+> belongs in the domain repo as files (ADR-001), not managed through the hub alone.
+
+**At the end of every session:**
+- Call `add_progress_event()` with a summary of what was accomplished or decided.
+  Include `topic_id: ca369340-a64e-442e-98f1-a4fa7dc74a38` and the relevant `workstream_id`.
+
+---
+
+### Repo Boundary Rule
+
+This agent is responsible for files **in this repo only**.
+
+- **Do not** write files or make commits in any other repository
+- **Do not** create workplan files in other repos on their behalf
+- When you identify work for another registered repo (**ecosystem todo**):
+  create a state hub task with `[repo:]` in the title — the other repo's
+  agent will see it at session start and create its own workplan
+- When you identify work for an upstream repo (**third-party todo**):
+  create a contribution artifact in `contrib/` and register it
+
+Terminology and workflows: `http://localhost:3000/docs/inter-repo-communication`
+
+---
+
+### First Session Protocol
+
+Triggered when `get_domain_summary("railiance")` shows **no workstreams** for the `railiance`
+topic. The project is registered but work has not yet been structured.
+
+**Step 1 — Understand the project (read, don't write)**
+- `~/the-custodian/canon/projects/railiance/project_charter_v0.1.md` — purpose, scope
+- `~/the-custodian/canon/projects/railiance/roadmap_v0.1.md` — planned phases
+- Scan the repo root: README, directory structure, existing code or docs
+
+**Step 2 — Survey in-progress work**
+- Look for TODOs, open branches, half-finished files, notes
+- Note what is already done vs. what is clearly started but incomplete
+
+**Step 3 — Propose workstreams to Bernd**
+Propose 1–3 workstreams — each a coherent strand of work lasting weeks to months,
+named clearly, anchored to a roadmap phase. **Wait for approval before creating.**
+
+**Step 4 — Create workplan file first, then DB record**
+Per ADR-001, work items originate as files in the repo:
+```
+workplans/RAIL-HO-WP-NNNN-.md ← write this first
+```
+Then register in the hub:
+```
+create_workstream(topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", title="...", owner="...", description="...")
+create_task(workstream_id="", title="...", priority="high|medium|low")
+```
+
+**Step 5 — Record the setup**
+```
+add_progress_event(
+    summary="First session: structured railiance work into N workstreams, M tasks",
+    event_type="milestone",
+    topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38",
+    detail={"workstreams": [...], "tasks_created": M}
+)
+```
+
+---
+
+### Workplan Convention (ADR-001)
+
+Work items MUST originate as files in this repo before being registered in the hub.
+
+**File location:** `workplans/-.md`
+**Frontmatter required:** `id`, `type: workplan`, `domain`, `repo`, `status`,
+`state_hub_workstream_id`, `state_hub_task_id` (per task)
+
+When another domain's agent identifies work for this repo, it creates a state hub
+task with `[repo:railiance-infra]` in the title (an **ecosystem todo**). You will
+see it at session start via `get_domain_summary("railiance")`. When you pick it up, create
+the corresponding workplan file in `workplans/` (ADR-001) and begin work.
+
+---
+
+### Contribution Tracking
+
+Track upstream contributions in `contrib/` — bug reports (BR), feature requests
+(FR), extension-point proposals (EP), upstream PRs (UPR).
+
+```
+contrib/
+  bug-reports/       # br-YYYY-MM-DD--org--repo--slug.md
+  feature-requests/  # fr-YYYY-MM-DD--org--repo--slug.md
+  extension-points/  # EP-RAIL-NNN--org--repo--slug.md
+  upstream-prs/      # upr-YYYY-MM-DD--org--repo--slug.md
+```
+
+Templates: `~/the-custodian/canon/standards/contrib-templates/`
+Convention: `~/the-custodian/canon/standards/contribution-convention_v0.1.md`
+
+```
+register_contribution(type="br|fr|ep|upr", title="...", target_org="...",
+                      target_repo="...", body_path="contrib/...", related_workstream_id="")
+update_contribution_status(contribution_id="", status="submitted")
+```
+
+---
+
+### SBOM
+
+After updating dependencies, re-ingest the SBOM:
+```bash
+cd ~/the-custodian/state-hub
+make ingest-sbom REPO=railiance-infra SCAN=1 REPO_PATH=$(pwd)
+```
+
+Check compliance: `http://localhost:3000/repos`
+Standard: `~/the-custodian/canon/standards/sbom-convention_v0.1.md`
+
+---
+
+### Remote Execution & State Hub Tunnel
+
+This repo is designed to be worked on **from the HostEurope server** (or any
+remote Linux box with access to the managed hosts). The State Hub runs locally
+on Bernd's workstation at `127.0.0.1:8000` and is not publicly reachable.
+
+**Before SSHing to the remote server, start a reverse tunnel on your local machine:**
+
+```bash
+ssh -R 8000:127.0.0.1:8000 @
+```
+
+This forwards the remote's `localhost:8000` back to your local State Hub.
+Codex on the remote host then reaches the MCP server and `get_domain_summary`
+works as normal.
+
+**Verify the tunnel is live from the remote:**
+
+```bash
+curl http://127.0.0.1:8000/state/health
+# expected: {"status":"ok"}
+```
+
+**If the tunnel is not up (degraded mode):**
+The State Hub call in Step 1 will fail. In that case:
+- Skip Step 1 — proceed from local workplans only (Step 2)
+- Note that goal guidance and progress logging will be unavailable
+- Log any progress events manually from your local machine after the session
+
+---
+
+### Quick Reference
+
+`~/the-custodian/state-hub/mcp_server/TOOLS.md` — compact MCP tool reference
diff --git a/workplans/RAIL-HO-WP-0004-production-readiness.md b/workplans/RAIL-HO-WP-0004-production-readiness.md
index 5d55896..b861fbc 100644
--- a/workplans/RAIL-HO-WP-0004-production-readiness.md
+++ b/workplans/RAIL-HO-WP-0004-production-readiness.md
@@ -8,7 +8,7 @@ status: active
 owner: worsch
 topic_slug: railiance
 created: "2026-03-26"
-updated: "2026-03-27"
+updated: "2026-05-02"
 supersedes: RAIL-PL-WP-0001
 state_hub_workstream_id: "cee078e9-b18c-4f84-8a8a-6f27c2f9f407"
 ---
@@ -432,7 +432,7 @@ context.
 
 ---
 
-### T09 — Deploy state-hub to cluster (S5)
+### T09 — Deploy state-hub to railiance01 as cluster primary (S5)
 
 ```task
 id: RAIL-HO-WP-0004-T09
@@ -440,12 +440,16 @@ status: todo
 priority: medium
 state_hub_task_id: "d2afe78a-eb51-4ce9-b332-f181323d2370"
 needs_human: true
-intervention_note: "Requires decisions: final hostname/domain for state-hub, whether to use Gitea container registry or ghcr.io, and approval before data migration from workstation postgres."
+intervention_note: "Requires decisions: final hostname/domain or tunnel-only endpoint, registry choice, private exposure model, and approval before freezing workstation writes and migrating production State Hub data."
 ```
 
 **Pre-condition:** T04 done (cnpg Gitea DB working); T08 done (deploy sequence
-documented). State-hub needs a PostgreSQL database — use a cnpg cluster in
-`databases` namespace.
+documented). Custodian-side safety gate `CUST-WP-0011-T01` must have passed:
+a fresh WSL2 State Hub backup restore drill with matching row counts.
+
+State-hub needs a PostgreSQL database — use a cnpg cluster in `databases`
+namespace. This is the pragmatic railiance01 migration path; full multi-node
+ThreePhoenix HA remains a separate Custodian follow-up (`CUST-WP-0038`).
 
 Steps:
 1. Define `state-hub-db` cnpg Cluster in `railiance-platform` (same pattern as T03).
@@ -456,13 +460,18 @@ Steps:
    - Service + Ingress (https://state-hub.)
    - ConfigMap for environment (DB URL, etc.)
    - Secret for DB credentials (SOPS-managed)
-5. Migrate data: `pg_dump` from workstation postgres → `pg_restore` into
-   cnpg cluster.
-6. Update ops-bridge tunnel targets if the state-hub URL changes.
-7. Update `~/.claude/CLAUDE.md` global instructions to point to cluster URL.
+5. Deploy empty State Hub and run Alembic migrations in-cluster.
+6. Restore a copy of WSL2 data into the cnpg cluster and compare table counts
+   while the workstation remains the source of truth.
+7. With explicit human approval, freeze workstation writes, take a final dump,
+   restore it to the cluster, and make railiance01 the primary endpoint.
+8. Update ops-bridge tunnel targets or MCP `API_BASE` if the State Hub URL changes.
+9. Update operator instructions to describe cluster primary plus WSL2 fallback.
 
-**Done when:** `curl https://state-hub./state/health` returns healthy;
-all MCP tools functional; workstation state-hub can be decommissioned.
+**Done when:** the private State Hub endpoint returns healthy, MCP tools work
+against the cluster-backed API, and WSL2 is retained as documented fallback.
+Permanent WSL2 retirement is out of scope here and requires a later explicit
+approval after stabilisation.
 
 ---
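
Review note on T09 step 6: the table-count comparison between the WSL2 source and the restored cnpg cluster could be scripted rather than checked by eye. The following is a minimal sketch, assuming the per-table row counts have already been collected into plain dictionaries; the function name, table names, and numbers are hypothetical and do not come from the workplan or the State Hub schema.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[int], Optional[int]]


def compare_table_counts(source: Dict[str, int], target: Dict[str, int]) -> List[Mismatch]:
    """Return (table, source_count, target_count) for every table that differs.

    A count of None means the table is missing on that side.
    """
    mismatches: List[Mismatch] = []
    # Walk the union of table names so a table missing on either side is reported.
    for table in sorted(set(source) | set(target)):
        src = source.get(table)
        dst = target.get(table)
        if src != dst:
            mismatches.append((table, src, dst))
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    wsl2_counts = {"workstreams": 12, "tasks": 140, "decisions": 9}
    cluster_counts = {"workstreams": 12, "tasks": 138}

    for table, src, dst in compare_table_counts(wsl2_counts, cluster_counts):
        print(f"MISMATCH {table}: source={src} target={dst}")
```

An empty result would be the condition for moving on to step 7; any mismatch is a reason to keep the workstation as the source of truth and repeat the restore.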