docs: establish ops-hub bootstrap artifacts

2026-05-16 02:47:06 +02:00
parent 479abeb781
commit dfb6c1fc47
3 changed files with 246 additions and 3 deletions

wiki/OpsHubInventory.md Normal file

@@ -0,0 +1,132 @@
# Ops Hub Initial Inventory
Date: 2026-05-16
## Purpose
This document is the first structured inventory for `ops-hub`, the VSM
Operations / System 1 hub. It turns the current operations situation into a
catalogable model before `ops-hub` has its own repository, collectors, or UI.
Source background:
- `wiki/CurrentOperationsSituation.md`
- `workplans/HF-WP-0001-establish-ops-hub-first-extension.md`
## VSM Placement
| Field | Value |
|---|---|
| Hub | `ops-hub` |
| Hub family | `vsm` |
| VSM function | `OPS` |
| VSM system | `S1` |
| Primary concern | Operational truth and evidence |
`ops-hub` owns the description of what is currently running, where it runs, how
it is reached, what state it is in, and what operational evidence exists. It
does not replace State Hub workstreams or Inter-Hub governance.
## Environments
| Environment | Role | Current state | Notes |
|---|---|---|---|
| `local` | Workstation development and local services | Active, important, not production | Hosts State Hub and local build/runtime pieces. |
| `coulombcore` | Live transitional production | Active, production-like, historically hand-built | Public IP `92.205.130.254`; runs current Gitea and experimental operational services. |
| `railiance01` | Future production foundation | Provisioning target | Public IP `92.205.62.239`; first server of the intended ThreePhoenix shape. |
| `threephoenix-prod` | Target production topology | Planned | Future governed multi-node production environment. |
## Hosts
| Host | Environment | Address | Role | Known gaps |
|---|---|---|---|---|
| `coulombcore` | `coulombcore` | `92.205.130.254` | Current live production-like server | Needs service catalog, drift tracking, backup/restore evidence, and migration disposition. |
| `railiance01` | `railiance01` | `92.205.62.239` | First ThreePhoenix production foundation node | Needs full inventory, readiness gates, and cluster/platform bootstrap evidence. |
| local workstation | `local` | local/private | State Hub and development runtime host | Needs explicit service ownership and backup expectations. |
Ops Bridge may provide reachability evidence for connected servers, but it is
not the service catalog. `ops-hub` should turn bridge reachability into
inventory signals rather than treating the bridge itself as the inventory.
## Clusters
| Cluster | Environment | Role | Current notes |
|---|---|---|---|
| CoulombCore Kubernetes | `coulombcore` | Current operational Kubernetes runtime | Hosts current Gitea deployment and related services. |
| ThreePhoenix Kubernetes | `threephoenix-prod` | Target production runtime | Future governed production cluster assembled through Railiance repos. |
## Services
| Service | Current environment | Owner repo | Current evidence | Gaps |
|---|---|---|---|---|
| Gitea | `coulombcore` | `railiance-apps` | Helm release `gitea`, namespace `default`, app version `1.25.4`, NodePort `32166`, public registry path returns auth challenge. | SOPS Helm values update, package token, `docker login`, push, pull, backup coverage, restore evidence. |
| Gitea database | `coulombcore` | `railiance-platform` | Database `gitea-db` in namespace `databases`. | Backup and restore evidence not recorded here yet. |
| Gitea shared storage | `coulombcore` | `railiance-platform` / `railiance-apps` | PVC `default/gitea-shared-storage`. | Package blob backup and restore evidence not confirmed. |
| State Hub | `local` | `the-custodian/state-hub` | Local API and dashboard are operational enough for repo registration and workplan sync. | Future cluster deployment/readiness still needs gates and evidence. |
| Inter-Hub | live public endpoint | `inter-hub` | `https://hub.coulomb.social/api/v2/openapi.json` and docs are reachable. | Hub bootstrap still depends on authenticated UI or migration. |
| Ops Bridge | local/remote bridge | `ops-bridge` | Useful for connected-server visibility. | Not a service catalog; should emit reachability evidence into `ops-hub`. |
## Endpoints
| Endpoint | Service | Environment | Current status | Evidence |
|---|---|---|---|---|
| `https://gitea.coulomb.social/v2/` | Gitea OCI registry | `coulombcore` | Route fixed; returns registry auth challenge | Expected `401` with OCI registry challenge. |
| `https://hub.coulomb.social/api/v2/openapi.json` | Inter-Hub API | live Inter-Hub | Reachable | OpenAPI document fetched on 2026-05-16. |
| `https://hub.coulomb.social/Hubs` | Inter-Hub UI | live Inter-Hub | Requires login | Redirects to `/NewSession`. |
| `http://127.0.0.1:8000/state/health` | State Hub API | `local` | Reachable locally | Used for State Hub registration/sync. |
## Service Catalog Gap
There is no central place that answers these questions:
- What runs where?
- Which repo owns its desired state?
- Which endpoint exposes it?
- Which data stores back it?
- Which backups and restore tests cover it?
- Which migration wave will replace or move it?
- Which current evidence proves it is healthy?
`ops-hub` should be the first place where these answers are explicit and
machine-addressable.
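A minimal sketch of what a machine-addressable catalog entry could look like, with one field per question above. The class name `ServiceCatalogEntry`, the field names, and the `namespace/name` notation for data stores are illustrative assumptions, not an existing `ops-hub` schema; the Gitea values are taken from the tables in this document.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ServiceCatalogEntry:
    """Hypothetical catalog row; each field answers one question above."""
    service: str
    environment: str = ""                                      # What runs where?
    owner_repo: str = ""                                       # Which repo owns its desired state?
    endpoints: list[str] = field(default_factory=list)        # Which endpoint exposes it?
    data_stores: list[str] = field(default_factory=list)      # Which data stores back it?
    backup_evidence: list[str] = field(default_factory=list)  # Which backups and restore tests cover it?
    migration_wave: str = ""                                   # Which migration wave moves it?
    health_evidence: list[str] = field(default_factory=list)  # Which evidence proves it is healthy?


def open_questions(entry):
    """Return the names of fields that are still empty for this entry."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Values copied from the Services and Endpoints tables; anything not
# recorded there is deliberately left empty rather than guessed.
gitea = ServiceCatalogEntry(
    service="Gitea",
    environment="coulombcore",
    owner_repo="railiance-apps",
    endpoints=["https://gitea.coulomb.social/v2/"],
    data_stores=["databases/gitea-db", "default/gitea-shared-storage"],
    health_evidence=["registry auth challenge on /v2/"],
)

print(open_questions(gitea))
```

Listing the empty fields per service gives a direct view of the catalog gap: for Gitea, the backup and migration answers are the ones still missing.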
## First Ops Widgets
Seed these in Inter-Hub once `ops-hub` exists:
- `ops-env-local`
- `ops-env-coulombcore`
- `ops-env-railiance01`
- `ops-env-threephoenix-prod`
- `ops-host-coulombcore`
- `ops-host-railiance01`
- `ops-service-catalog`
- `ops-service-gitea`
- `ops-service-state-hub`
- `ops-service-inter-hub`
- `ops-endpoint-gitea-registry`
- `ops-readiness-gitea-registry`
- `ops-readiness-state-hub-cluster-deploy`
- `ops-migration-coulombcore-to-threephoenix`
## First Evidence Events
The first event should be the Gitea registry endpoint verification:
```json
{
  "widgetId": "<ops-endpoint-gitea-registry-widget-id>",
  "eventType": "ops-endpoint-verified",
  "viewContext": "railiance-apps/workplans/RAIL-AP-WP-0001",
  "metadata": {
    "vsmFunction": "OPS",
    "vsmSystem": "S1",
    "endpoint": "https://gitea.coulomb.social/v2/",
    "expectedStatus": 401,
    "observedHeader": "Docker-Distribution-Api-Version: registry/2.0"
  }
}
```
This event is blocked until the ops event type is registered by an active
manifest and the target widget exists.
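As a sketch of how a probe result could be turned into this event, the function below builds the payload only when the observed response matches the expectation. The function name `build_endpoint_verified_event` and its parameters are hypothetical; it does not perform the HTTP request or submit anything to Inter-Hub, and the widget ID remains a placeholder.

```python
def build_endpoint_verified_event(widget_id, endpoint, status, headers,
                                  expected_status=401):
    """Return the evidence event dict, or None if the probe did not match.

    Hypothetical helper: `status` and `headers` are whatever the caller
    observed when probing `endpoint`.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    api_version = normalised.get("docker-distribution-api-version")
    if status != expected_status or api_version is None:
        return None
    return {
        "widgetId": widget_id,
        "eventType": "ops-endpoint-verified",
        "viewContext": "railiance-apps/workplans/RAIL-AP-WP-0001",
        "metadata": {
            "vsmFunction": "OPS",
            "vsmSystem": "S1",
            "endpoint": endpoint,
            "expectedStatus": expected_status,
            "observedHeader": f"Docker-Distribution-Api-Version: {api_version}",
        },
    }


# Example with an observed response matching the expected registry challenge.
event = build_endpoint_verified_event(
    "<ops-endpoint-gitea-registry-widget-id>",  # placeholder, as in the JSON above
    "https://gitea.coulomb.social/v2/",
    401,
    {"Docker-Distribution-Api-Version": "registry/2.0"},
)
```

Returning `None` on a mismatch keeps a failed probe from being recorded as a verification.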


@@ -0,0 +1,63 @@
# Ops Hub Readiness Gates
Date: 2026-05-16
## Purpose
These gates define what must be true before operational responsibility can move
from the current CoulombCore setup to the future ThreePhoenix production setup.
They are intended as the first `ops-hub` readiness model.
Statuses:
- `unknown` means no reliable evidence has been cataloged yet.
- `partial` means some evidence exists, but the gate is not complete.
- `blocked` means a required precondition is missing.
- `ready` means the evidence requirement is satisfied.
## Gates
| ID | Gate | Owner repo | Evidence requirement | Current status |
|---|---|---|---|---|
| OPS-G01 | Environment inventory exists | `helix-forge` | `local`, `coulombcore`, `railiance01`, and `threephoenix-prod` are represented with role, lifecycle state, and owner notes. | `partial` |
| OPS-G02 | Service catalog exists | `helix-forge` then future `ops-hub` | Each live and target service has environment, owner repo, endpoint, backing stores, lifecycle state, and evidence links. | `partial` |
| OPS-G03 | DNS and TLS are codified | `railiance-cluster` / `railiance-apps` | Public hostnames, ingress routes, certificate sources, and renewal paths are declared in repo files. | `unknown` |
| OPS-G04 | Git hosting is reproducible | `railiance-apps` / `railiance-platform` | Gitea or successor deployment can be recreated from repo state, including database and storage dependencies. | `partial` |
| OPS-G05 | Container registry publishing is proven | `railiance-apps` | `docker login`, push, and pull succeed against `https://gitea.coulomb.social/v2/` using governed secrets. | `partial` |
| OPS-G06 | Persistent data is backed up | `railiance-platform` | Each persistent data store has backup location, schedule, retention, ownership, and latest successful backup evidence. | `unknown` |
| OPS-G07 | Restore path is proven | `railiance-platform` / `railiance-apps` | Restore test evidence exists for Gitea database, package blobs, and State Hub data. | `unknown` |
| OPS-G08 | Secrets path is governed | `railiance-infra` / `railiance-apps` | SOPS/age keys and operator secret paths are documented; no required secret depends on shell memory. | `partial` |
| OPS-G09 | Cluster runtime is reproducible | `railiance-cluster` | Kubernetes runtime, ingress, CNI, operators, and routing primitives are recreated through repo-owned automation. | `unknown` |
| OPS-G10 | Platform services are reproducible | `railiance-platform` | PostgreSQL/CNPG, object storage, secret management, and identity dependencies have repo-owned deployment evidence. | `unknown` |
| OPS-G11 | Application deployment is reproducible | `railiance-apps` | Gitea, Inter-Hub, State Hub, and other application releases are declared with Helm values and deployment runbooks. | `partial` |
| OPS-G12 | Rollback path is documented | owning service repos | Each migration wave has rollback conditions, steps, and data safety notes. | `unknown` |
| OPS-G13 | Operator runbooks exist | owning service repos | Deploy, restore, rotate, incident response, and migration runbooks exist for each critical service. | `unknown` |
| OPS-G14 | Observability and health checks are explicit | `railiance-cluster` / `railiance-platform` / service repos | Health checks, logs, metrics, and endpoint probes are documented and tied to service catalog entries. | `unknown` |
| OPS-G15 | Inter-Hub ops bootstrap is available | `inter-hub` / `helix-forge` | `ops-hub` can be created through UI or migration, manifest activated, API consumer/key created, widgets seeded, and events accepted. | `partial` |
## Initial Migration Waves
| Wave | Goal | Required gates |
|---|---|---|
| `wave-0-catalog` | Establish the operational truth surface without moving services. | OPS-G01, OPS-G02, OPS-G15 |
| `wave-1-registry-proof` | Prove current Gitea registry publishing and evidence capture. | OPS-G03, OPS-G05, OPS-G08, OPS-G14 |
| `wave-2-backup-restore` | Confirm backups and restore paths for critical persistent state. | OPS-G06, OPS-G07, OPS-G13 |
| `wave-3-threephoenix-foundation` | Recreate cluster and platform foundations on railiance01/ThreePhoenix. | OPS-G09, OPS-G10 |
| `wave-4-service-migration` | Move or replace production responsibilities from CoulombCore to ThreePhoenix. | OPS-G04, OPS-G11, OPS-G12 plus service-specific gates |
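The gate and wave tables can be combined mechanically. The sketch below assumes that a wave may proceed only when every required gate is `ready`; that rule, and the names `GATE_STATUS`, `WAVE_GATES`, and `wave_blockers`, are illustrative assumptions rather than an existing `ops-hub` model. The statuses are copied from the Gates table, and the service-specific gates of `wave-4-service-migration` are not modeled.

```python
# Current status per gate, copied from the Gates table above.
GATE_STATUS = {
    "OPS-G01": "partial", "OPS-G02": "partial", "OPS-G03": "unknown",
    "OPS-G04": "partial", "OPS-G05": "partial", "OPS-G06": "unknown",
    "OPS-G07": "unknown", "OPS-G08": "partial", "OPS-G09": "unknown",
    "OPS-G10": "unknown", "OPS-G11": "partial", "OPS-G12": "unknown",
    "OPS-G13": "unknown", "OPS-G14": "unknown", "OPS-G15": "partial",
}

# Required gates per wave, copied from the Initial Migration Waves table.
WAVE_GATES = {
    "wave-0-catalog": ["OPS-G01", "OPS-G02", "OPS-G15"],
    "wave-1-registry-proof": ["OPS-G03", "OPS-G05", "OPS-G08", "OPS-G14"],
    "wave-2-backup-restore": ["OPS-G06", "OPS-G07", "OPS-G13"],
    "wave-3-threephoenix-foundation": ["OPS-G09", "OPS-G10"],
    "wave-4-service-migration": ["OPS-G04", "OPS-G11", "OPS-G12"],
}


def wave_blockers(wave):
    """Return {gate: status} for every required gate that is not yet ready."""
    blockers = {}
    for gate in WAVE_GATES[wave]:
        status = GATE_STATUS[gate]
        if status != "ready":  # assumed rule: only `ready` unblocks a wave
            blockers[gate] = status
    return blockers


print(wave_blockers("wave-0-catalog"))
```

With the statuses recorded on 2026-05-16, every wave still has blockers; an empty result would mean the wave's listed gates are all satisfied.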
## Evidence Shape
Each readiness gate should eventually be represented in `ops-hub` as a widget
or widget family with events like:
- `ops-readiness-gate-updated`
- `ops-endpoint-verified`
- `ops-backup-verified`
- `ops-restore-tested`
- `ops-risk-raised`
- `ops-migration-gate-passed`
- `ops-migration-gate-failed`
Until Inter-Hub can create all required records through API calls, the evidence
can be maintained in this repository and mirrored into Inter-Hub through the UI
or migrations.


@@ -142,6 +142,48 @@ Assessment:
the quickstart as aspirational for bootstrap automation until Inter-Hub is
hardened.
## Confirmed Bootstrap Path
Checked against the live API and local Inter-Hub source on 2026-05-16.
Decision:
- Bootstrap `ops-hub` through the authenticated Inter-Hub admin UI where
possible, with a migration-backed fallback when a repeatable bootstrap is
needed before the public API is hardened.
- Treat Pattern A, API Consumer Hub, as the first implementation pattern.
- Store VSM classification in the manifest capability description for now,
because the current `hubs` table only has `slug`, `name`, `domain`, and
`hub_kind`; there are no first-class `hub_family`, `vsm_function`, or
`vsm_system` columns yet.
- Use `hub_kind = domain` for `ops-hub`.
- Record the missing first-class VSM metadata fields and the missing creation
endpoints as Inter-Hub hardening work under T10.
Confirmed current support:
- Live `/Hubs` redirects to `/NewSession`, so hub creation is an authenticated
UI flow.
- Local `HubsController` supports creating and editing hub rows through the UI.
- Local `HubCapabilityManifestsController` supports draft creation, JSON-array
vocabulary editing, activation, and registry upsert.
- Local `ApiConsumersController` and `ApiKeysController` support authenticated
UI creation of consumers and static keys.
- Local `/api/v2/interaction-events` supports `POST` and validates event types.
Confirmed gaps:
- Live OpenAPI has no `POST /api/v2/hubs`.
- Live OpenAPI has no `POST /api/v2/widgets`; widgets are read-only in v2.
- There are no v2 endpoints for manifest draft creation, manifest activation,
API consumer creation, or API key creation.
- There is no `/api/v2/policy-scopes` endpoint.
- Live type registries currently return empty arrays, so the ops vocabulary
still needs manifest activation before events can be accepted.
- Event metadata is exposed in the response schema, but the v2 interaction
event create controller currently does not persist submitted metadata.
- Webhook dispatch still uses the hard-coded `"clicked"` event name.
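The endpoint gaps above can be rechecked against a fetched OpenAPI document with a small helper. This is a sketch under assumptions: `missing_operations` is a hypothetical name, the sample document is invented for illustration and is not the live Inter-Hub spec, and the path keys in the live document may be written relative to a server URL rather than with the `/api/v2` prefix used here.

```python
def missing_operations(openapi_doc, required):
    """Return the (method, path) pairs that the OpenAPI document does not declare.

    In an OpenAPI document, `paths` maps each path to an object keyed by
    lowercase HTTP method names.
    """
    paths = openapi_doc.get("paths", {})
    return [
        (method, path)
        for method, path in required
        if method.lower() not in paths.get(path, {})
    ]


# Invented sample document for illustration only.
sample_doc = {
    "paths": {
        "/api/v2/widgets": {"get": {}},
        "/api/v2/interaction-events": {"post": {}},
    }
}

# Operations a fully API-driven bootstrap would need.
required = [
    ("POST", "/api/v2/hubs"),
    ("POST", "/api/v2/widgets"),
    ("POST", "/api/v2/interaction-events"),
]

print(missing_operations(sample_doc, required))
```

Rerunning such a check after Inter-Hub hardening work would show which bootstrap steps can move from the UI or migrations to API calls.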
## Architectural Decision
Start with **Pattern A: API Consumer Hub** for `ops-hub`, plus a manual or
@@ -291,7 +333,7 @@ The first explicit service-catalog gap:
```task
id: HF-WP-0001-T01
status: todo
status: done
priority: high
state_hub_task_id: "2587a3b8-3b9b-4948-acaf-1547644e4563"
```
@@ -315,6 +357,8 @@ Done when: there is a concrete, repeatable path to create the `ops-hub` row,
manifest, API consumer, and API key, with enough metadata to classify it as the
Operations / System 1 hub.
Output: `Confirmed Bootstrap Path` section in this workplan.
---
### T02 — Register ops-hub in Inter-Hub as the Operations hub
@@ -440,7 +484,7 @@ and annotations.
```task
id: HF-WP-0001-T06
status: todo
status: done
priority: medium
state_hub_task_id: "2a0b2f69-5a3d-433c-9cbd-85fd868b63d8"
```
@@ -465,6 +509,8 @@ Done when: a human can see the CoulombCore, local, railiance01, and
ThreePhoenix relationship, including the current Gitea registry state, without
reading multiple repo workplans or relying on shell history.
Output: `wiki/OpsHubInventory.md`.
---
### T07 — Instrument the current Gitea registry work as the first ops-hub signal
@@ -504,7 +550,7 @@ traceable back to the Railiance workplan.
```task
id: HF-WP-0001-T08
status: todo
status: done
priority: medium
state_hub_task_id: "72a58622-c3ac-4765-8026-5c2489af2058"
```
@@ -523,6 +569,8 @@ responsibility from CoulombCore to ThreePhoenix:
Done when: each gate has an owner repo, evidence requirement, and status.
Output: `wiki/OpsHubReadinessGates.md`.
---
### T09 — Decide whether to create a separate ops-hub repository