the-custodian/docs/ops-hub-service-catalog.md
codex cf4be716e1 CUST-WP-0054 T01-T03: fleet architecture, de-hub runbook, drain plan
Documents the three-machine role model, fleet mesh topology, coulombcore
freeze policy, and ordered drain sequence. Adds railiance01 systemd tunnel
install assets and refreshes ops service inventory to reflect 2026-07-03
production placement (cluster State Hub, fleet mesh, draining coulombcore).
2026-07-04 00:29:55 +02:00

# Ops Hub Service Catalog Now View
<!-- generated by ops/render_service_inventory.py; edit ops/service-inventory.yml instead -->
Source: `ops/service-inventory.yml`
Inventory last reviewed: `2026-07-03`
This is the repo-native first view for `CUST-WP-0047`. It exists so an
operator can answer what is running where before the full standalone
`ops-hub` application is available.
## Summary
| Metric | Count |
|---|---:|
| Environments | 4 |
| Hosts | 3 |
| Clusters | 3 |
| Services | 11 |
| Services: observed_ok | 6 |
| Services: unknown | 5 |
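The summary counts are a tally of the per-service health values in the catalog. The sketch below reproduces that tally. It assumes each service record is a plain dict with hypothetical `id` and `health` keys. The actual schema of `ops/service-inventory.yml` and the logic in `ops/render_service_inventory.py` are not shown here.

```python
from collections import Counter

# Hypothetical service records holding only the fields needed for the tally.
# Health values mirror the Service Catalog table (2026-07-03 review).
SERVICES = [
    {"id": "gitea", "health": "unknown"},
    {"id": "gitea-database", "health": "unknown"},
    {"id": "gitea-shared-storage", "health": "unknown"},
    {"id": "state-hub", "health": "observed_ok"},
    {"id": "issue-core", "health": "observed_ok"},
    {"id": "core-hub", "health": "observed_ok"},
    {"id": "fleet-mesh-railiance01", "health": "observed_ok"},
    {"id": "inter-hub", "health": "unknown"},
    {"id": "activity-core", "health": "observed_ok"},
    {"id": "ops-bridge", "health": "observed_ok"},
    {"id": "haskell-build-agent", "health": "unknown"},
]


def summary_rows(services):
    """Return (metric, count) rows: total services, then one row per health value."""
    health = Counter(s["health"] for s in services)
    rows = [("Services", len(services))]
    # Sort health values so the row order is stable between renders.
    rows += [(f"Services: {value}", count) for value, count in sorted(health.items())]
    return rows


for metric, count in summary_rows(SERVICES):
    print(f"| {metric} | {count} |")
```
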
## Service Catalog
| Service | Where | Owner | Endpoint | Health | Data | Access | Top Gap |
|---|---|---|---|---|---|---|---|
| Gitea (gitea) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-apps | https://gitea.coulomb.social/v2/<br>Expected: status 401, OCI registry auth challenge | unknown<br>2026-05-16: Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge. | database:gitea-db<br>pvc:default/gitea-shared-storage | k8s: unknown (coulombcore-k3s/default) | Package token and push/pull verification need current evidence. |
| Gitea Database (gitea-database) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: databases | railiance-platform | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/databases) | Backup and restore evidence not recorded in ops inventory. |
| Gitea Shared Storage (gitea-shared-storage) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-platform<br>railiance-apps | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/default/pvc/gitea-shared-storage) | Package blob backup and restore evidence not confirmed. |
| State Hub (state-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: state-hub | state-hub<br>the-custodian | http://127.0.0.1:8000/state/health<br>Expected: status 200, health response | observed_ok<br>2026-07-03: Cluster hub healthy; railiance01 reaches via fleet forward tunnel. | postgresql:state-hub-db | http: observed_ok (workstation tunnel state-hub-primary → cluster)<br>tunnel: observed_ok (railiance01 systemd fleet-state-hub-coulombcore → cluster) | Primary home must move to railiance01 per CUST-WP-0054-T05. |
| issue-core (issue-core) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: issue-core | issue-core | http://127.0.0.1:8765/healthz<br>Expected: status 200, version response | observed_ok<br>2026-07-02: REST emission live via cross-machine fleet path. | postgresql:issue-core | tunnel: observed_ok (railiance01 fleet-issue-core-coulombcore → cluster) | Target railiance01 overlay per CUST-WP-0054 drain Wave 4. |
| Core Hub (core-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: core-hub-staging | core-hub | https://hub.coulomb.social/api/v2/hubs<br>Expected: status 200, hub list when authenticated | observed_ok<br>2026-07-02: Staging deployed; production cutover gated on CORE-WP-0005-T04. | postgresql:core-hub | k8s: observed_ok (coulombcore-k3s/core-hub-staging) | Production cutover to railiance01 pending operator approval. |
| Fleet Mesh (railiance01) (fleet-mesh-railiance01) | Railiance01<br>type: systemd; host: railiance01 | the-custodian<br>ops-bridge | http://127.0.0.1:18000/state/health<br>Expected: status 200 | observed_ok<br>2026-07-03: Workstation reverse tunnels stopped; systemd forwards healthy. | - | ssh-tunnel: observed_ok (railiance01 → coulombcore ClusterIPs) | Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN is available. |
| Inter-Hub (inter-hub) | ThreePhoenix Production<br>type: external; public_endpoint: https://hub.coulomb.social | inter-hub | https://hub.coulomb.social/api/v2/openapi.json<br>Expected: status 200, OpenAPI document | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | https: unknown (https://hub.coulomb.social) | ops-hub bootstrap requires an authenticated UI flow or a deployment-side migration. |
| activity-core (activity-core) | Railiance01<br>type: k3s; cluster: railiance01-k3s; namespace: activity-core | activity-core<br>the-custodian | activity-core API health endpoint<br>Expected: status 200, healthy DB and Temporal status | observed_ok<br>2026-05-23: API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified. | postgresql:activity-core<br>temporal:activity-core<br>nats:railiance01 | k8s: observed_ok (railiance01-k3s/activity-core) | Add explicit ops inventory probes and evidence events. |
| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | observed_ok<br>2026-07-03: state-hub-railiance01 and issue-core-railiance01 stopped; not production-critical. | - | ssh-tunnel: observed_ok (interactive dev tunnels only: k3s-api, state-hub-primary) | Install ops-bridge on railiance01 or keep systemd fleet-mesh units. |
| Haskell Build Agent (haskell-build-agent) | Local Workstation<br>type: systemd; host: haskell-build-vm | the-custodian | http://127.0.0.1:18000<br>Expected: VM can reach State Hub through SSH forward | unknown<br>undated: Build agent is a systemd service and registers with State Hub on boot. | - | ssh: unknown (local workstation reverse tunnel port 12222) | Current tunnel and capability registration need live evidence in ops-hub. |
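The Endpoint column pairs each URL with an expected status. Gitea's `/v2/` answering 401 counts as healthy because that response is the OCI registry auth challenge. The sketch below probes endpoints using only the Python standard library. It assumes a hypothetical rule: a matching status maps to `observed_ok` and anything else to `unknown`. The real probe logic for this inventory is not shown here, and the sketch checks only the status code, not the response body.

```python
import socket
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET of url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx reply still proves the endpoint answered (e.g. a 401 auth challenge).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: no status observed.
        return None


def probe_http(url, expected_status, timeout=5.0):
    """Hypothetical classification: observed_ok on the expected status, else unknown."""
    return "observed_ok" if http_status(url, timeout) == expected_status else "unknown"


def probe_tcp(host, port, timeout=2.0):
    """Check only that something accepts TCP connections, e.g. a local SSH forward port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "observed_ok"
    except OSError:
        return "unknown"
```

For example, `probe_http("http://127.0.0.1:8000/state/health", 200)` would check State Hub through the workstation tunnel. `probe_tcp("127.0.0.1", 12222)` would confirm only that the Haskell build agent's reverse tunnel port is listening, not that the agent has registered with State Hub.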
## Open Operating Gaps
### Gitea (`gitea`)
- Package token and push/pull verification need current evidence.
- Backup and restore evidence for database and shared storage not recorded in ops inventory.
### Gitea Database (`gitea-database`)
- Backup and restore evidence not recorded in ops inventory.
### Gitea Shared Storage (`gitea-shared-storage`)
- Package blob backup and restore evidence not confirmed.
### State Hub (`state-hub`)
- Primary home must move to railiance01 per CUST-WP-0054-T05.
- Consistency sweep writebacks still target workstation paths.
### issue-core (`issue-core`)
- Target railiance01 overlay per CUST-WP-0054 drain Wave 4.
### Core Hub (`core-hub`)
- Production cutover to railiance01 pending operator approval.
### Fleet Mesh (railiance01) (`fleet-mesh-railiance01`)
- Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN is available.
- Retire when State Hub and issue-core move to railiance01.
### Inter-Hub (`inter-hub`)
- ops-hub bootstrap requires an authenticated UI flow or a deployment-side migration.
### activity-core (`activity-core`)
- Add explicit ops inventory probes and evidence events.
### Ops Bridge (`ops-bridge`)
- Install ops-bridge on railiance01 or keep systemd fleet-mesh units.
### Haskell Build Agent (`haskell-build-agent`)
- Current tunnel and capability registration need live evidence in ops-hub.
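Each subsection above lists a service's gaps in order, and the Top Gap column of the Service Catalog repeats the first one. The sketch below renders both. It assumes hypothetical `id`, `name`, and `gaps` keys on each record; the actual field names in `ops/service-inventory.yml` and the code in `ops/render_service_inventory.py` may differ.

```python
def top_gap(service):
    """Return the first recorded gap, or '-' as used for empty catalog cells."""
    gaps = service.get("gaps") or []
    return gaps[0] if gaps else "-"


def render_gaps(services):
    """Render '### Name (`id`)' plus one bullet per gap; skip services with no gaps."""
    lines = ["## Open Operating Gaps"]
    for svc in services:
        gaps = svc.get("gaps") or []
        if not gaps:
            continue
        lines.append(f"### {svc['name']} (`{svc['id']}`)")
        lines.extend(f"- {gap}" for gap in gaps)
    return "\n".join(lines)


# Hypothetical record shaped after the State Hub entry above.
state_hub = {
    "id": "state-hub",
    "name": "State Hub",
    "gaps": [
        "Primary home must move to railiance01 per CUST-WP-0054-T05.",
        "Consistency sweep writebacks still target workstation paths.",
    ],
}
print(top_gap(state_hub))
print(render_gaps([state_hub]))
```
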
## Next Evidence Events
- `ops-service-observed` for each runtime object confirmed by a probe.
- `ops-endpoint-verified` for HTTP, HTTPS, tunnel, or cluster endpoints.
- `ops-access-path-checked` for non-secret access path checks.
- `ops-backup-verified` where backup and restore evidence exists.
- `ops-inventory-drift` when observed state differs from this inventory.
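The last rule can be made mechanical: compare the health recorded in this inventory with a fresh probe result, and emit `ops-inventory-drift` when they differ. The sketch below does this. Only the event type strings come from the list above. The field names (`type`, `service`, `recorded`, `observed`, `at`) are hypothetical, since this document does not define an event schema.

```python
from datetime import datetime, timezone


def evidence_events(inventory, observed, now=None):
    """Build evidence events by comparing recorded health with probe results.

    inventory: service id -> health recorded in ops/service-inventory.yml
    observed:  service id -> health from a probe run ("observed_ok" or "unknown")
    """
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    events = []
    # Sorted so repeated runs emit events in a stable order.
    for service_id, seen in sorted(observed.items()):
        if seen == "observed_ok":
            events.append({"type": "ops-service-observed", "service": service_id, "at": stamp})
        recorded = inventory.get(service_id)
        if recorded != seen:
            # Report the mismatch as an event instead of silently rewriting the inventory.
            events.append({
                "type": "ops-inventory-drift",
                "service": service_id,
                "recorded": recorded,
                "observed": seen,
                "at": stamp,
            })
    return events
```

With this sketch, a fresh `observed_ok` probe of `gitea` (recorded as `unknown` above) would yield both an `ops-service-observed` and an `ops-inventory-drift` event. The drift event signals that `ops/service-inventory.yml` should be updated and re-rendered.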