From 21bfd5fa495ac3547a0dbca0600b8fe8b5e132cc Mon Sep 17 00:00:00 2001
From: tegwick
Date: Mon, 22 Jun 2026 23:16:27 +0200
Subject: [PATCH] Normalize agent instructions and workplan frontmatter (STATE-WP-0067)

- Align agent files with on-disk workplan prefixes (infer from workplan ids)
- Set workplan domain to registered domain_slug; add topic_slug where applicable
- Repair frontmatter delimiter formatting; migrate legacy task status literals
- Regenerate AGENTS.md, CLAUDE.md, and .claude/rules from State Hub templates
---
 .claude/rules/agents.md | 20 ++
 .claude/rules/architecture.md | 8 +
 .claude/rules/credential-routing.md | 50 +++++
 .claude/rules/first-session.md | 38 ++++
 .claude/rules/repo-boundary.md | 8 +
 .claude/rules/repo-identity.md | 5 +
 .claude/rules/session-protocol.md | 85 +++++++++
 .claude/rules/stack-and-commands.md | 19 ++
 .claude/rules/workplan-convention.md | 40 ++++
 AGENTS.md | 63 ++++++-
 CLAUDE.md | 12 ++
 INTENT.md | 25 ++-
 README.md | 7 +-
 SCOPE.md | 40 +++-
 docs/README.md | 15 ++
 docs/initial-inventory.md | 146 +++++++++++++++
 docs/readiness-gates.md | 63 +++++++
 workplans/OPS-WP-0001-statehub-bootstrap.md | 24 ++-
 ...PS-WP-0002-interhub-extension-bootstrap.md | 176 ++++++++++++++++++
 19 files changed, 819 insertions(+), 25 deletions(-)
 create mode 100644 .claude/rules/agents.md
 create mode 100644 .claude/rules/architecture.md
 create mode 100644 .claude/rules/credential-routing.md
 create mode 100644 .claude/rules/first-session.md
 create mode 100644 .claude/rules/repo-boundary.md
 create mode 100644 .claude/rules/repo-identity.md
 create mode 100644 .claude/rules/session-protocol.md
 create mode 100644 .claude/rules/stack-and-commands.md
 create mode 100644 .claude/rules/workplan-convention.md
 create mode 100644 CLAUDE.md
 create mode 100644 docs/README.md
 create mode 100644 docs/initial-inventory.md
 create mode 100644 docs/readiness-gates.md
 create mode 100644 workplans/OPS-WP-0002-interhub-extension-bootstrap.md

diff --git 
a/.claude/rules/agents.md b/.claude/rules/agents.md new file mode 100644 index 0000000..0e8a5d9 --- /dev/null +++ b/.claude/rules/agents.md @@ -0,0 +1,20 @@ +## Kaizen Agents + +Specialized agent personas available on demand via the state-hub MCP. + +**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category +**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them + +Common agents: + +| Agent | Category | When to use | +|-------|----------|-------------| +| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature | +| `code-refactoring` | quality | Code quality analysis and safe refactoring | +| `test-maintenance` | testing | Diagnose and fix failing tests | +| `requirements-engineering` | process | Prevent interface/mock mismatches upfront | +| `keepaTodofile` | process | Maintain TODO.md during work | +| `project-management` | process | Track status, determine next steps | +| `datamodel-optimization` | quality | Optimize dataclasses and data structures | + +All 17 agents: call `list_kaizen_agents()` for the full list. diff --git a/.claude/rules/architecture.md b/.claude/rules/architecture.md new file mode 100644 index 0000000..7c2a645 --- /dev/null +++ b/.claude/rules/architecture.md @@ -0,0 +1,8 @@ +## Architecture + + + +## Quick Reference + +`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference diff --git a/.claude/rules/credential-routing.md b/.claude/rules/credential-routing.md new file mode 100644 index 0000000..a984bf2 --- /dev/null +++ b/.claude/rules/credential-routing.md @@ -0,0 +1,50 @@ +# Credential and access routing + +**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect** +for inference. Run this check **before** requesting secrets, API keys, SSH access, +login tokens, or database passwords — in any repo, not only `ops-warden`. + +ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). 
Every +other credential need belongs to another subsystem. **Do not** message +`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key. + +### Lookup (do this first) + +```bash +warden route find "" --json +warden route show --json +``` + +Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`). + +| Agent runtime | How to orient | +| --- | --- | +| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=ops-hub` is for coordination, not secret vending | +| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership | +| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` | + +### Quick routing table + +| I need… | Owner | ops-warden executes? | +| --- | --- | --- | +| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` | +| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only | +| Login / OIDC / MFA | key-cape / Keycloak | No — route only | +| Authorization decision | flex-auth | No — route only | +| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` | +| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only | + +### Anti-patterns (do not do these) + +- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc. +- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist +- Pasting secrets into Git, State Hub, workplans, logs, or chat + +### Other capabilities (reuse-surface) + +Non-credential capabilities are usually discovered through **reuse-surface** federation +(`reuse-surface` registry / `capability.*` indexes). 
Credential routing is inlined in +every repo's agent instructions because it is high-frequency, high-risk, and easy to +get wrong. + +**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml` \ No newline at end of file diff --git a/.claude/rules/first-session.md b/.claude/rules/first-session.md new file mode 100644 index 0000000..4dd19e9 --- /dev/null +++ b/.claude/rules/first-session.md @@ -0,0 +1,38 @@ +## First Session Protocol + +Triggered when `get_domain_summary("infotech")` shows **no workstreams**. +The project is registered but work has not yet been structured. + +**Step 1 — Read, don't write** +- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope +- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases +- Scan repo root: README, directory structure, existing code or docs + +**Step 2 — Survey in-progress work** +Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete. + +**Step 3 — Propose workstreams to Bernd** +Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a +roadmap phase. 
**Wait for approval before creating.** + +**Step 4 — Create workplan file first, then DB record (ADR-001)** +``` +workplans/OPS-WP-NNNN-.md ← write this first +``` +Then register in the hub: +``` +create_workstream(topic_id="1f2e4d10-c967-4803-ae6c-7f4b4e806409", title="...", owner="...", description="...") +create_task(workstream_id="", title="...", priority="high|medium|low") +``` + +**Step 5 — Record the setup** +``` +add_progress_event( + summary="First session: structured infotech into N workstreams, M tasks", + event_type="milestone", + topic_id="1f2e4d10-c967-4803-ae6c-7f4b4e806409", + detail={"workstreams": [...], "tasks_created": M} +) +``` + + diff --git a/.claude/rules/repo-boundary.md b/.claude/rules/repo-boundary.md new file mode 100644 index 0000000..9ced30c --- /dev/null +++ b/.claude/rules/repo-boundary.md @@ -0,0 +1,8 @@ +## Repo boundary + +This repo owns **ops-hub** only. It does not own: + + diff --git a/.claude/rules/repo-identity.md b/.claude/rules/repo-identity.md new file mode 100644 index 0000000..6ff1c96 --- /dev/null +++ b/.claude/rules/repo-identity.md @@ -0,0 +1,5 @@ +**Purpose:** Operations / System 1 extension for Inter-Hub, focused on operational truth, readiness evidence, service catalog records, and migration gates. 
+ +**Domain:** infotech +**Repo slug:** ops-hub +**Topic ID:** 1f2e4d10-c967-4803-ae6c-7f4b4e806409 diff --git a/.claude/rules/session-protocol.md b/.claude/rules/session-protocol.md new file mode 100644 index 0000000..0127761 --- /dev/null +++ b/.claude/rules/session-protocol.md @@ -0,0 +1,85 @@ +## Session Protocol + +Dev Hub (State Hub API): http://127.0.0.1:8000 +MCP server name in `~/.claude.json`: `dev-hub` + +**Step 1 — Orient** + +Read the offline-safe brief first — it works without a live hub connection: +```bash +cat .custodian-brief.md +``` +Then call the MCP tool for richer cross-domain context when MCP tools are exposed: +``` +get_domain_summary("infotech") +``` +If MCP tools are unavailable in the current agent session, use the REST API: +```bash +curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool +``` +If the hub is offline: `cd ~/state-hub && make api` + +**Step 2 — Check inbox** +With MCP tools: +``` +get_messages(to_agent="ops-hub", unread_only=True) +``` +Mark read with `mark_message_read(message_id)`. Reply or act on coordination +requests before proceeding. + +Without MCP tools: +```bash +curl -s "http://127.0.0.1:8000/messages/?to_agent=ops-hub&unread_only=true" \ + | python3 -m json.tool +curl -s -X PATCH "http://127.0.0.1:8000/messages//read" \ + -H "Content-Type: application/json" -d '{}' +``` + +**Step 3 — Scan workplans** +```bash +ls workplans/ +``` +For each file with `status: ready`, `active`, or `blocked`, note pending +`wait`/`todo`/`progress` tasks. + +**Step 4 — Present brief** + +1. **Active workstreams** for `infotech` — title, task counts, blocking decisions +2. **Pending tasks** from `workplans/` + any `[repo:ops-hub]` hub tasks +3. **Goal guidance** — if `goal_guidance` in summary: + - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"* + - `alignment_warnings`: flag if active work is not aligned with current goal +4. **Suggested next action** — highest-priority open item +5. 
**SBOM status** — flag if `last_sbom_at` is unset for this repo + +If no workstreams: follow First Session Protocol (`first-session.md`). + +**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()` + +> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`) +> are First Session Protocol only. Work structure belongs in repo files (ADR-001). + +**Session close:** +With MCP tools: +``` +add_progress_event(summary="...", topic_id="1f2e4d10-c967-4803-ae6c-7f4b4e806409", workstream_id="") +``` +Without MCP tools: +```bash +curl -s -X POST http://127.0.0.1:8000/progress/ \ + -H "Content-Type: application/json" \ + -d '{"topic_id":"1f2e4d10-c967-4803-ae6c-7f4b4e806409","workstream_id":"","event_type":"note","summary":"what changed","author":"codex"}' +``` +If workplan files were modified, ensure the local copy is up to date first: +```bash +git -C pull --ff-only +cd ~/state-hub && make fix-consistency REPO=ops-hub +``` +For repos where implementation runs on a remote machine (e.g. CoulombCore), +use the combined target which pulls before fixing: +```bash +cd ~/state-hub && make fix-consistency-remote REPO=ops-hub +``` +**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback +will sync the file to match DB. **C-16** (repo behind remote) blocks all writes +until you pull — intentional to prevent clobbering remote progress. 
diff --git a/.claude/rules/stack-and-commands.md b/.claude/rules/stack-and-commands.md new file mode 100644 index 0000000..dc53ac6 --- /dev/null +++ b/.claude/rules/stack-and-commands.md @@ -0,0 +1,19 @@ +## Stack + + +- **Language:** +- **Key deps:** + +## Dev Commands + +```bash +# TODO: Fill in the standard commands for this repo + +# Install dependencies + +# Run tests + +# Lint / type check + +# Build / package (if applicable) +``` diff --git a/.claude/rules/workplan-convention.md b/.claude/rules/workplan-convention.md new file mode 100644 index 0000000..1451320 --- /dev/null +++ b/.claude/rules/workplan-convention.md @@ -0,0 +1,40 @@ +## Workplan Convention (ADR-001) + +File location: `workplans/OPS-WP-NNNN-.md` +ID prefix: `OPS-WP-` + +Work items originate as files in this repo **before** being registered in the hub. + +Canonical workplan/workstream frontmatter statuses are: +`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`. +Use `proposed` for a newly drafted plan, `ready` after review against current +repo state, and `finished` when implementation is complete. `stalled` and +`needs_review` are derived health labels, not stored statuses. + +Closed workplans may be moved to `workplans/archived/` with a completion-date +prefix: `YYMMDD-OPS-WP-NNNN-.md`. The frontmatter id remains +unchanged; the prefix is only for quick visual reference. + +Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**: +`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids +`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed +directly. Promote anything requiring analysis, design, approval, dependencies, or +multiple planned phases into a normal workplan. + +Ecosystem todos from other agents arrive as `[repo:ops-hub]` hub tasks — +visible at session start. Pick one up by creating the workplan file, then registering +the workstream. 
+ +Task blocks use this shape: + +```task +id: OPS-WP-NNNN-T01 +status: wait | todo | progress | done | cancel +priority: high | medium | low +state_hub_task_id: "" # written by fix-consistency — do not edit +``` + +Status progression is `todo` → `progress` → `done`; use `wait` for waiting or +blocked work and `cancel` for stopped work. + + diff --git a/AGENTS.md b/AGENTS.md index 572013c..fcf757f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,9 +2,9 @@ ## Repo Identity -**Purpose:** Inter-hub extension for the operations & resiliance subdimension of the orthogonal architecture standard perspective. +**Purpose:** Operations / System 1 extension for Inter-Hub, focused on operational truth, readiness evidence, service catalog records, and migration gates. -**Domain:** inter_hub +**Domain:** infotech **Repo slug:** ops-hub **Topic ID:** `1f2e4d10-c967-4803-ae6c-7f4b4e806409` **Workplan prefix:** `OPS-WP-` @@ -101,6 +101,63 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/" \ --- +## Credential and access routing + +**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect** +for inference. Run this check **before** requesting secrets, API keys, SSH access, +login tokens, or database passwords — in any repo, not only `ops-warden`. + +ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every +other credential need belongs to another subsystem. **Do not** message +`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key. + +### Lookup (do this first) + +```bash +warden route find "" --json +warden route show --json +``` + +Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`). 
+ +| Agent runtime | How to orient | +| --- | --- | +| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=ops-hub` is for coordination, not secret vending | +| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership | +| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` | + +### Quick routing table + +| I need… | Owner | ops-warden executes? | +| --- | --- | --- | +| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` | +| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only | +| Login / OIDC / MFA | key-cape / Keycloak | No — route only | +| Authorization decision | flex-auth | No — route only | +| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` | +| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only | + +### Anti-patterns (do not do these) + +- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc. +- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist +- Pasting secrets into Git, State Hub, workplans, logs, or chat + +### Other capabilities (reuse-surface) + +Non-credential capabilities are usually discovered through **reuse-surface** federation +(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in +every repo's agent instructions because it is high-frequency, high-risk, and easy to +get wrong. + +**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml` + + + + +--- + ## Workplan Convention (ADR-001) Work items originate as files in this repo — not in the hub. 
The hub is a @@ -124,7 +181,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases. id: OPS-WP-NNNN type: workplan title: "..." -domain: inter_hub +domain: infotech repo: ops-hub status: proposed | ready | active | blocked | backlog | finished | archived owner: codex diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..9f8d695 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,12 @@ +# ops-hub — Claude Code Instructions + +@SCOPE.md +@.claude/rules/repo-identity.md +@.claude/rules/session-protocol.md +@.claude/rules/first-session.md +@.claude/rules/workplan-convention.md +@.claude/rules/stack-and-commands.md +@.claude/rules/architecture.md +@.claude/rules/repo-boundary.md +@.claude/rules/credential-routing.md +@.claude/rules/agents.md diff --git a/INTENT.md b/INTENT.md index 28a14db..8c02217 100644 --- a/INTENT.md +++ b/INTENT.md @@ -7,9 +7,16 @@ updated: "2026-06-06" ## Why it exists -Inter-hub extension for the operations & resiliance subdimension of the orthogonal architecture standard perspective. +`ops-hub` is the Operations / System 1 extension for Inter-Hub. It turns +operational reality into governed, queryable, and evidence-backed hub records: +environments, hosts, clusters, services, endpoints, releases, backups, +incidents, risks, runbooks, readiness gates, and migration waves. -Inter-hub extension for the operations & resiliance subdimension of the orthogonal architecture standard perspective. +It exists because Railiance and HelixForge operations need a durable +operational truth surface while the current CoulombCore environment transitions +toward the ThreePhoenix production shape. State Hub continues to own +workstreams and decisions; Inter-Hub continues to own the generic hub substrate. +`ops-hub` owns the operations extension behavior built on top of that substrate. ## Governing principle @@ -17,8 +24,16 @@ This repository should stay focused on the purpose above. 
Work that changes its authority, ownership boundaries, or operational promises should be captured in a workplan before implementation. +The first implementation rule is: domain-specific runtime code belongs here, +while generic hub framework behavior belongs in `inter-hub`. + ## What it enables -- A coding agent can understand why the repository exists before changing it. -- State Hub can register and coordinate work for this repository. -- Future workplans can stay connected to the repository's intended role. +- Operators can see what runs where, how it is reached, and what evidence proves + it is healthy. +- Collectors, adapters, and scheduled probes can report operational facts into + Inter-Hub using the ops vocabulary. +- Readiness and migration gates can be represented as explicit, auditable + operational records. +- Future VSM hubs can reuse the extension pattern without turning Inter-Hub + itself into a domain-specific operations product. diff --git a/README.md b/README.md index fadebfd..7f2a5eb 100644 --- a/README.md +++ b/README.md @@ -1 +1,6 @@ -Inter-hub extension for the operations & resiliance subdimension of the orthogonal architecture standard perspective. \ No newline at end of file +Operations / System 1 extension for Inter-Hub. + +`ops-hub` is the operational truth surface for environments, hosts, clusters, +services, endpoints, releases, backups, incidents, risks, runbooks, readiness +gates, and migration waves. Generic hub framework work stays in `inter-hub`; +operations-specific extension code belongs here. diff --git a/SCOPE.md b/SCOPE.md index 0d2b549..a1a9cfa 100644 --- a/SCOPE.md +++ b/SCOPE.md @@ -1,32 +1,52 @@ # SCOPE -> This file was generated by `statehub register`. Refine it as the repository -> boundaries become clearer. - ## One-liner -Inter-hub extension for the operations & resiliance subdimension of the orthogonal architecture standard perspective. 
+Operations / System 1 extension for Inter-Hub, focused on operational truth, +readiness evidence, and migration gates. ## Core Idea -ops-hub exists to provide the capability described in INTENT.md. +`ops-hub` is a domain-specific Inter-Hub extension. It should professionalize +operations by making environments, hosts, clusters, services, endpoints, +releases, backups, incidents, risks, runbooks, readiness gates, and migration +waves explicit and evidence-backed. + +The repo is intentionally separate from `inter-hub`: generic framework and API +substrate work remains in `inter-hub`; operations-specific collectors, +adapters, probes, bootstrap clients, UI/extensions, tests, and packaging belong +here. ## In Scope -- Maintain the repository's primary implementation. -- Keep docs, tests, and operational metadata current. +- Operations hub implementation code and tests. +- Ops vocabulary clients, collectors, adapters, and scheduled probes. +- Inter-Hub bootstrap/smoke tooling for the `ops-hub` extension. +- Operations service catalog, readiness, migration, endpoint, backup, restore, + incident, and runbook models. +- Repo-local workplans for growing the Operations / System 1 extension. ## Out of Scope -- Own unrelated adjacent systems. -- Make irreversible operational decisions without human approval. +- Generic Inter-Hub framework behavior, API substrate, authentication, or + registry semantics. +- State Hub workstream, task, decision, or progress implementation. +- Railiance infrastructure, cluster, platform, enablement, or app desired state. +- Manual production DB seeding unless the operator explicitly chooses that + fallback. +- Irreversible operational decisions without human approval. ## Current State -- Status: active; implementation and stability should be verified by the repo agent. +- Status: active bootstrap. +- Implementation: no executable source tree yet; first real workplan is seeded + in `workplans/OPS-WP-0002-interhub-extension-bootstrap.md`. 
+- Live Inter-Hub production gate: `/api/v2/hubs` still returned `404` on + 2026-06-06, so supported API bootstrap is not yet available in production. ## Getting Oriented - Start with: INTENT.md - Agent instructions: AGENTS.md - Workplans: workplans/ +- HelixForge handoff: `/home/worsch/helix-forge/workplans/HF-WP-0001-establish-ops-hub-first-extension.md` diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..65543a1 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,15 @@ +# ops-hub Docs + +This directory contains the first repo-local version of the HelixForge +`HF-WP-0001` handoff. + +- `initial-inventory.md` defines the first environment, host, cluster, service, + and endpoint catalog. +- `readiness-gates.md` defines the CoulombCore-to-ThreePhoenix readiness model. +- `bootstrap-runbook.md` defines the operator-ready Inter-Hub bootstrap path. +- `../seeds/ops-hub-manifest.draft.json` contains the initial capability + manifest draft. +- `../seeds/ops-hub-widgets.seed.json` contains the initial widget seed. +- `../seeds/ops-hub-bootstrap.sql` is an operator-approved fallback only; do + not use direct DB seeding while the supported Inter-Hub API path is viable or + pending. diff --git a/docs/initial-inventory.md b/docs/initial-inventory.md new file mode 100644 index 0000000..fd965c7 --- /dev/null +++ b/docs/initial-inventory.md @@ -0,0 +1,146 @@ +# Ops Hub Initial Inventory + +Date: 2026-06-06 + +## Purpose + +This document is the first structured inventory for `ops-hub`, the VSM +Operations / System 1 hub. It turns the current operations situation into a +catalogable model for this implementation repo. + +Source background: + +- `/home/worsch/helix-forge/wiki/CurrentOperationsSituation.md` +- `/home/worsch/helix-forge/workplans/HF-WP-0001-establish-ops-hub-first-extension.md` + +## Repository Boundary + +As of 2026-06-06, `ops-hub` implementation belongs in `/home/worsch/ops-hub` +with remote `gitea-remote:coulomb/ops-hub.git`. 
+ +- `ops-hub` owns collectors, adapters, scheduled probes, runtime + packaging, UI/extensions, tests, and Inter-Hub bootstrap/smoke clients. +- `inter-hub` remains the generic hub framework, manifest/registry substrate, + authentication surface, widget/event API, and bootstrap API owner. +- `helix-forge` keeps architecture context and the original coordinating + workplan. +- Railiance repos own deployable infrastructure/service state and the + operational evidence that `ops-hub` should surface. + +## VSM Placement + +| Field | Value | +|---|---| +| Hub | `ops-hub` | +| Hub family | `vsm` | +| VSM function | `OPS` | +| VSM system | `S1` | +| Primary concern | Operational truth and evidence | + +`ops-hub` owns the description of what is currently running, where it runs, how +it is reached, what state it is in, and what operational evidence exists. It +does not replace State Hub workstreams or Inter-Hub governance. + +## Environments + +| Environment | Role | Current state | Notes | +|---|---|---|---| +| `local` | Workstation development and local services | Active, important, not production | Hosts State Hub and local build/runtime pieces. | +| `coulombcore` | Live transitional production | Active, production-like, historically hand-built | Public IP `92.205.130.254`; runs current Gitea and experimental operational services. | +| `railiance01` | Future production foundation | Provisioning target | Public IP `92.205.62.239`; first server of intended ThreePhoenix shape. | +| `threephoenix-prod` | Target production topology | Planned | Future governed multi-node production environment. | + +## Hosts + +| Host | Environment | Address | Role | Known gaps | +|---|---|---|---|---| +| `coulombcore` | `coulombcore` | `92.205.130.254` | Current live production-like server | Needs service catalog, drift tracking, backup/restore evidence, and migration disposition. 
| +| `railiance01` | `railiance01` | `92.205.62.239` | First ThreePhoenix production foundation node | Needs full inventory, readiness gates, and cluster/platform bootstrap evidence. | +| local workstation | `local` | local/private | State Hub and development runtime host | Needs explicit service ownership and backup expectations. | + +Ops Bridge may provide reachability evidence for connected servers, but it is +not the service catalog. `ops-hub` should turn bridge reachability into +inventory signals rather than treating the bridge itself as the inventory. + +## Clusters + +| Cluster | Environment | Role | Current notes | +|---|---|---|---| +| CoulombCore Kubernetes | `coulombcore` | Current operational Kubernetes runtime | Hosts current Gitea deployment and related services. | +| ThreePhoenix Kubernetes | `threephoenix-prod` | Target production runtime | Future governed production cluster assembled through Railiance repos. | + +## Services + +| Service | Current environment | Owner repo | Current evidence | Gaps | +|---|---|---|---|---| +| Gitea | `coulombcore` | `railiance-apps` | Helm release `gitea`, namespace `default`, app version `1.25.4`, NodePort `32166`, public registry path returns auth challenge. | SOPS Helm values update, package token, `docker login`, push, pull, backup coverage, restore evidence. | +| Gitea database | `coulombcore` | `railiance-platform` | Database `gitea-db` in namespace `databases`. | Backup and restore evidence not recorded here yet. | +| Gitea shared storage | `coulombcore` | `railiance-platform` / `railiance-apps` | PVC `default/gitea-shared-storage`. | Package blob backup and restore evidence not confirmed. | +| State Hub | `local` | `the-custodian/state-hub` | Local API and dashboard are operational enough for repo registration and workplan sync. | Future cluster deployment/readiness still needs gates and evidence. 
| +| Inter-Hub | live public endpoint | `inter-hub` | `https://hub.coulomb.social/api/v2/openapi.json` and docs are reachable. | Hub bootstrap still depends on authenticated UI or migration. | +| Ops Bridge | local/remote bridge | `ops-bridge` | Useful for connected-server visibility. | Not a service catalog; should emit reachability evidence into `ops-hub`. | + +## Endpoints + +| Endpoint | Service | Environment | Current status | Evidence | +|---|---|---|---|---| +| `https://gitea.coulomb.social/v2/` | Gitea OCI registry | `coulombcore` | Route fixed; returns registry auth challenge | Expected `401` with OCI registry challenge. | +| `https://hub.coulomb.social/api/v2/openapi.json` | Inter-Hub API | live Inter-Hub | Reachable | OpenAPI document fetched on 2026-05-16. | +| `https://hub.coulomb.social/Hubs` | Inter-Hub UI | live Inter-Hub | Requires login | Redirects to `/NewSession`. | +| `http://127.0.0.1:8000/state/health` | State Hub API | `local` | Reachable locally | Used for StateHub registration/sync. | + +## Service Catalog Gap + +There is no central place that answers these questions: + +- What runs where? +- Which repo owns its desired state? +- Which endpoint exposes it? +- Which data stores back it? +- Which backups and restore tests cover it? +- Which migration wave will replace or move it? +- Which current evidence proves it is healthy? + +`ops-hub` should be the first place where these answers are explicit and +machine-addressable. 
+ +## First Ops Widgets + +Seed these in Inter-Hub once `ops-hub` exists: + +- `ops-env-local` +- `ops-env-coulombcore` +- `ops-env-railiance01` +- `ops-env-threephoenix-prod` +- `ops-host-coulombcore` +- `ops-host-railiance01` +- `ops-service-catalog` +- `ops-service-gitea` +- `ops-service-state-hub` +- `ops-service-inter-hub` +- `ops-endpoint-gitea-registry` +- `ops-readiness-gitea-registry` +- `ops-readiness-state-hub-cluster-deploy` +- `ops-migration-coulombcore-to-threephoenix` + +## First Evidence Events + +The first event should be the Gitea registry endpoint verification: + +```json +{ + "widgetId": "", + "eventType": "ops-endpoint-verified", + "viewContext": "railiance-apps/workplans/RAIL-AP-WP-0001", + "metadata": { + "vsmFunction": "OPS", + "vsmSystem": "S1", + "endpoint": "https://gitea.coulomb.social/v2/", + "expectedStatus": 401, + "observedHeader": "Docker-Distribution-Api-Version: registry/2.0" + } +} +``` + +This event is blocked until the ops event type is registered by an active +manifest and the target widget exists. diff --git a/docs/readiness-gates.md b/docs/readiness-gates.md new file mode 100644 index 0000000..8af8326 --- /dev/null +++ b/docs/readiness-gates.md @@ -0,0 +1,63 @@ +# Ops Hub Readiness Gates + +Date: 2026-06-06 + +## Purpose + +These gates define what must be true before operational responsibility can move +from the current CoulombCore setup to the future ThreePhoenix production setup. +They are the first repo-local `ops-hub` readiness model. + +Statuses: + +- `unknown` means no reliable evidence has been cataloged yet. +- `partial` means some evidence exists, but the gate is not complete. +- `blocked` means a required precondition is missing. +- `ready` means the evidence requirement is satisfied. 
+ +## Gates + +| ID | Gate | Owner repo | Evidence requirement | Current status | +|---|---|---|---|---| +| OPS-G01 | Environment inventory exists | `ops-hub` | `local`, `coulombcore`, `railiance01`, and `threephoenix-prod` are represented with role, lifecycle state, and owner notes. | `partial` | +| OPS-G02 | Service catalog exists | `ops-hub` | Each live and target service has environment, owner repo, endpoint, backing stores, lifecycle state, and evidence links. | `partial` | +| OPS-G03 | DNS and TLS are codified | `railiance-cluster` / `railiance-apps` | Public hostnames, ingress routes, certificate sources, and renewal paths are declared in repo files. | `unknown` | +| OPS-G04 | Git hosting is reproducible | `railiance-apps` / `railiance-platform` | Gitea or successor deployment can be recreated from repo state, including database and storage dependencies. | `partial` | +| OPS-G05 | Container registry publishing is proven | `railiance-apps` | `docker login`, push, and pull succeed against `https://gitea.coulomb.social/v2/` using governed secrets. | `partial` | +| OPS-G06 | Persistent data is backed up | `railiance-platform` | Each persistent data store has backup location, schedule, retention, ownership, and latest successful backup evidence. | `unknown` | +| OPS-G07 | Restore path is proven | `railiance-platform` / `railiance-apps` | Restore test evidence exists for Gitea database, package blobs, and State Hub data. | `unknown` | +| OPS-G08 | Secrets path is governed | `railiance-infra` / `railiance-apps` | SOPS/age keys and operator secret paths are documented; no required secret depends on shell memory. | `partial` | +| OPS-G09 | Cluster runtime is reproducible | `railiance-cluster` | Kubernetes runtime, ingress, CNI, operators, and routing primitives are recreated through repo-owned automation. 
| `unknown` | +| OPS-G10 | Platform services are reproducible | `railiance-platform` | PostgreSQL/CNPG, object storage, secret management, and identity dependencies have repo-owned deployment evidence. | `unknown` | +| OPS-G11 | Application deployment is reproducible | `railiance-apps` | Gitea, Inter-Hub, State Hub, and other application releases are declared with Helm values and deployment runbooks. | `partial` | +| OPS-G12 | Rollback path is documented | owning service repos | Each migration wave has rollback conditions, steps, and data safety notes. | `unknown` | +| OPS-G13 | Operator runbooks exist | owning service repos | Deploy, restore, rotate, incident response, and migration runbooks exist for each critical service. | `unknown` | +| OPS-G14 | Observability and health checks are explicit | `railiance-cluster` / `railiance-platform` / service repos | Health checks, logs, metrics, and endpoint probes are documented and tied to service catalog entries. | `unknown` | +| OPS-G15 | Inter-Hub ops bootstrap is available | `inter-hub` / `ops-hub` / `helix-forge` | `ops-hub` can be created through UI, supported API, or explicit migration fallback, manifest activated, API consumer/key created, widgets seeded, and events accepted. | `partial` | + +## Initial Migration Waves + +| Wave | Goal | Required gates | +|---|---|---| +| `wave-0-catalog` | Establish the operational truth surface without moving services. | OPS-G01, OPS-G02, OPS-G15 | +| `wave-1-registry-proof` | Prove current Gitea registry publishing and evidence capture. | OPS-G03, OPS-G05, OPS-G08, OPS-G14 | +| `wave-2-backup-restore` | Confirm backups and restore paths for critical persistent state. | OPS-G06, OPS-G07, OPS-G13 | +| `wave-3-threephoenix-foundation` | Recreate cluster and platform foundations on railiance01/ThreePhoenix. | OPS-G09, OPS-G10 | +| `wave-4-service-migration` | Move or replace production responsibilities from CoulombCore to ThreePhoenix. 
| OPS-G04, OPS-G11, OPS-G12 plus service-specific gates | + +## Evidence Shape + +Each readiness gate should eventually be represented in `ops-hub` as a widget +or widget family with events like: + +- `ops-readiness-gate-updated` +- `ops-endpoint-verified` +- `ops-backup-verified` +- `ops-restore-tested` +- `ops-risk-raised` +- `ops-migration-gate-passed` +- `ops-migration-gate-failed` + +Until Inter-Hub can create all required records through API calls, the evidence +can be maintained in this repo and mirrored into Inter-Hub through the UI or +explicit operator-approved migrations. diff --git a/workplans/OPS-WP-0001-statehub-bootstrap.md b/workplans/OPS-WP-0001-statehub-bootstrap.md index 347cef8..40ffd02 100644 --- a/workplans/OPS-WP-0001-statehub-bootstrap.md +++ b/workplans/OPS-WP-0001-statehub-bootstrap.md @@ -2,9 +2,9 @@ id: OPS-WP-0001 type: workplan title: "Bootstrap State Hub integration" -domain: inter_hub +domain: infotech repo: ops-hub -status: ready +status: finished owner: codex topic_slug: inter_hub created: "2026-06-06" @@ -13,24 +13,28 @@ updated: "2026-06-06" # Bootstrap State Hub integration -Inter-hub extension for the operations & resiliance subdimension of the orthogonal architecture standard perspective. +Bootstrap this repo's State Hub integration and replace generated placeholders +with the first concrete `ops-hub` operating frame. ## Review Generated Integration Files ```task id: OPS-WP-0001-T01 -status: todo +status: done priority: high ``` Review `INTENT.md`, `SCOPE.md`, `AGENTS.md`, and `.custodian-brief.md`. Replace generated placeholders with repo-specific facts where needed. +Completed 2026-06-06: `INTENT.md`, `SCOPE.md`, `AGENTS.md`, and `README.md` +now describe `ops-hub` as the Operations / System 1 Inter-Hub extension. + ## Verify Local Developer Workflow ```task id: OPS-WP-0001-T02 -status: todo +status: done priority: high ``` @@ -38,11 +42,16 @@ Identify the repo's install, test, lint, build, and run commands. 
Add or refine those commands in the agent instructions so future coding sessions can verify changes confidently. +Completed 2026-06-06: the repo currently has no executable source tree, +dependency manifest, test suite, or build command. `AGENTS.md` records +documentation/workplan verification commands and requires future source work to +add repo-native lint, test, build, and run commands. + ## Seed First Real Workplan ```task id: OPS-WP-0001-T03 -status: todo +status: done priority: medium ``` @@ -52,3 +61,6 @@ next change. After workplan file updates, run from `~/state-hub`: ```bash make fix-consistency REPO=ops-hub ``` + +Completed 2026-06-06: seeded +`workplans/OPS-WP-0002-interhub-extension-bootstrap.md`. diff --git a/workplans/OPS-WP-0002-interhub-extension-bootstrap.md b/workplans/OPS-WP-0002-interhub-extension-bootstrap.md new file mode 100644 index 0000000..363f71d --- /dev/null +++ b/workplans/OPS-WP-0002-interhub-extension-bootstrap.md @@ -0,0 +1,176 @@ +--- +id: OPS-WP-0002 +type: workplan +title: "Bootstrap ops-hub as an Inter-Hub Operations extension" +domain: infotech +repo: ops-hub +status: active +owner: codex +topic_slug: inter_hub +created: "2026-06-06" +updated: "2026-06-06" +--- + +# Bootstrap ops-hub as an Inter-Hub Operations extension + +## Goal + +Turn the HelixForge `HF-WP-0001` handoff into the first concrete `ops-hub` +implementation track. + +`ops-hub` should become the Operations / System 1 Inter-Hub extension for +operational truth: environments, hosts, clusters, services, endpoints, +releases, backups, incidents, risks, runbooks, readiness gates, and migration +waves. + +This repo owns domain-specific implementation assets. `inter-hub` remains the +generic framework, registry, authentication, manifest, widget, event, and +bootstrap API substrate. 
+ +## Current Gate + +As of 2026-06-06, public production Inter-Hub still returns `404` for: + +```text +https://hub.coulomb.social/api/v2/hubs +``` + +Do not run manual database seeding unless the operator explicitly chooses that +fallback. The preferred bootstrap path is the supported Inter-Hub API once +production exposes the current bootstrap surface. + +Gate criteria: + +- Unauthenticated `GET /api/v2/hubs` returns `401`, not `404`. +- OpenAPI lists `/hubs`, `/hub-capability-manifests`, `/api-consumers`, and + `/policy-scopes`. +- The bootstrap/smoke client can create or reuse the `ops-hub` hub, activate + its manifest, create the runtime API consumer/key, seed initial widgets, and + persist the first governed ops event. + +## Handoff Sources + +- `/home/worsch/helix-forge/workplans/HF-WP-0001-establish-ops-hub-first-extension.md` +- `/home/worsch/helix-forge/wiki/OpsHubInventory.md` +- `/home/worsch/helix-forge/wiki/OpsHubReadinessGates.md` +- `/home/worsch/helix-forge/wiki/OpsHubBootstrapRunbook.md` +- `/home/worsch/helix-forge/wiki/ops-hub-manifest.draft.json` +- `/home/worsch/helix-forge/wiki/ops-hub-widgets.seed.json` + +## Port HelixForge Handoff Artifacts + +```task +id: OPS-WP-0002-T01 +status: done +priority: high +``` + +Create repo-local docs and seed data for the ops vocabulary, initial inventory, +readiness gates, bootstrap runbook, manifest draft, and widget seed. + +Done when the `ops-hub` repo can be understood without opening HelixForge for +routine implementation details. Keep links back to HelixForge for architectural +context. + +Completed 2026-06-06: + +- Ported initial inventory to `docs/initial-inventory.md`. +- Ported readiness gates to `docs/readiness-gates.md`. +- Ported bootstrap runbook to `docs/bootstrap-runbook.md`. +- Ported manifest and widget seeds to `seeds/`. +- Added `docs/README.md` as the handoff index. 
+ +## Define Repository Source Layout + +```task +id: OPS-WP-0002-T02 +status: done +priority: high +``` + +Choose and create the first source layout for bootstrap/smoke tooling, +collectors, adapters, and tests. Add the repo-native lint, test, build, and run +commands to `AGENTS.md`. + +Done when future code changes have an obvious home and a verification command. + +Completed 2026-06-06: + +- Added `pyproject.toml`. +- Added Python package layout under `src/ops_hub/`. +- Added operator scripts under `scripts/`. +- Added tests under `tests/`. +- Documented current verification commands in `AGENTS.md`. + +## Implement Inter-Hub Production Gate Probe + +```task +id: OPS-WP-0002-T03 +status: done +priority: high +``` + +Build a small probe that checks the public Inter-Hub bootstrap API gate: + +- `/api/v2/hubs` response is `401` unauthenticated. +- OpenAPI lists the required bootstrap paths. +- The result is machine-readable and suitable for a scheduled ops signal later. + +Done when the probe can run locally without secrets and reports the current +gate as pass/fail with clear reasons. + +Completed 2026-06-06: `scripts/interhub-gate-probe.py` checks unauthenticated +`/api/v2/hubs` status and required OpenAPI bootstrap paths, emits JSON, and +exits nonzero while the gate is closed. + +## Implement Bootstrap Smoke Client + +```task +id: OPS-WP-0002-T04 +status: wait +priority: high +``` + +Implement the authenticated bootstrap/smoke client once Inter-Hub production +exposes the supported bootstrap API. + +The client should use `IHUB_BASE` and `IHUB_OPERATOR_KEY` and should create or +reuse: + +- `ops-hub` hub row +- active capability manifest +- runtime API consumer/key +- initial governed ops widgets +- first `ops-endpoint-verified` event + +Done when a dry-run and an attended real run both produce repeatable evidence +without direct DB access. + +Waiting on: Inter-Hub production API gate from T03. 
+ +## Seed First Operational Signal + +```task +id: OPS-WP-0002-T05 +status: wait +priority: medium +``` + +Submit the first governed ops signal for the Gitea registry endpoint once the +manifest, widget, event type, and API key exist. + +Initial signal: + +```json +{ + "eventType": "ops-endpoint-verified", + "endpoint": "https://gitea.coulomb.social/v2/", + "expectedStatus": 401, + "viewContext": "railiance-apps/workplans/RAIL-AP-WP-0001" +} +``` + +Done when the event is visible in Inter-Hub and traceable to the owning +Railiance workplan. + +Waiting on: T04 and an available `ops-hub` runtime API key.
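
The gate that T04 and T05 wait on can be sketched as a small pure function. This is a minimal illustration of the gate criteria above, assuming the unauthenticated status code and the OpenAPI `paths` keys have already been fetched. The function names, result keys, and gate label are hypothetical and are not taken from `scripts/interhub-gate-probe.py`. The sketch also assumes the OpenAPI path keys match the names in the gate criteria exactly.

```python
import json

# Bootstrap paths named in the gate criteria of this workplan.
REQUIRED_PATHS = (
    "/hubs",
    "/hub-capability-manifests",
    "/api-consumers",
    "/policy-scopes",
)


def evaluate_gate(hubs_status, openapi_paths):
    """Return a machine-readable verdict for the bootstrap API gate.

    hubs_status: HTTP status of an unauthenticated GET /api/v2/hubs.
    openapi_paths: collection of path keys from the OpenAPI document.
    """
    reasons = []
    if hubs_status != 401:
        reasons.append(f"GET /api/v2/hubs returned {hubs_status}, expected 401")
    missing = [path for path in REQUIRED_PATHS if path not in openapi_paths]
    if missing:
        reasons.append("OpenAPI is missing paths: " + ", ".join(missing))
    # Hypothetical result shape: the gate is open only when no reason was recorded.
    return {"gate": "interhub-bootstrap-api", "open": not reasons, "reasons": reasons}


def exit_code(result):
    """Map the verdict to a process exit code: nonzero while the gate is closed."""
    return 0 if result["open"] else 1


if __name__ == "__main__":
    # Example input mirroring the state recorded in this workplan: 404 on /hubs.
    verdict = evaluate_gate(404, set())
    print(json.dumps(verdict, indent=2))
```

Keeping the evaluation separate from the HTTP fetch lets the verdict be tested without network access or secrets, which fits the requirement that the probe runs locally and reports pass/fail with clear reasons.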