the-custodian/docs/ops-hub-service-catalog.md
codex cf4be716e1 CUST-WP-0054 T01-T03: fleet architecture, de-hub runbook, drain plan
Documents the three-machine role model, fleet mesh topology, coulombcore
freeze policy, and ordered drain sequence. Adds railiance01 systemd tunnel
install assets and refreshes ops service inventory to reflect 2026-07-03
production placement (cluster State Hub, fleet mesh, draining coulombcore).
2026-07-04 00:29:55 +02:00


Ops Hub Service Catalog Now View

Source: ops/service-inventory.yml
Inventory last reviewed: 2026-07-03

This is the repo-native first view for CUST-WP-0047. It lets an operator answer "what is running where?" before the full standalone ops-hub application is available.

Summary

| Metric | Count |
| --- | --- |
| Environments | 4 |
| Hosts | 3 |
| Clusters | 3 |
| Services | 11 |
| Services: observed_ok | 6 |
| Services: unknown | 5 |
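
The service counts above can be re-derived from the catalog rows, which is a cheap consistency check whenever the inventory changes. The following is a minimal sketch, not the generator that produced this view: the `SERVICES` mapping is a hand-copied excerpt of the catalog below, the `summarize` helper is hypothetical, and treating each distinct "Where" label as one environment is an assumption. The real layout of ops/service-inventory.yml is not shown in this document.

```python
from collections import Counter

# Hypothetical excerpt: service id -> ("Where" label, health), copied by hand
# from the Service Catalog table. Not the real inventory schema.
SERVICES = {
    "gitea": ("CoulombCore", "unknown"),
    "gitea-database": ("CoulombCore", "unknown"),
    "gitea-shared-storage": ("CoulombCore", "unknown"),
    "state-hub": ("CoulombCore", "observed_ok"),
    "issue-core": ("CoulombCore", "observed_ok"),
    "core-hub": ("CoulombCore", "observed_ok"),
    "fleet-mesh-railiance01": ("Railiance01", "observed_ok"),
    "inter-hub": ("ThreePhoenix Production", "unknown"),
    "activity-core": ("Railiance01", "observed_ok"),
    "ops-bridge": ("Local Workstation", "observed_ok"),
    "haskell-build-agent": ("Local Workstation", "unknown"),
}


def summarize(services):
    """Count services, distinct environments, and services per health label."""
    health = Counter(label for _, label in services.values())
    # Assumption: one environment per distinct "Where" label.
    environments = {where for where, _ in services.values()}
    return {
        "services": len(services),
        "environments": len(environments),
        "observed_ok": health["observed_ok"],
        "unknown": health["unknown"],
    }


summary = summarize(SERVICES)
print(summary)
```

Host and cluster counts are left out because the excerpt does not carry those fields.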

Service Catalog

| Service | Where | Owner | Endpoint | Health | Data | Access | Top Gap |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Gitea (gitea) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-apps | https://gitea.coulomb.social/v2/<br>Expected: status 401, OCI registry auth challenge | unknown<br>2026-05-16: Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge. | database:gitea-db<br>pvc:default/gitea-shared-storage | k8s: unknown (coulombcore-k3s/default) | Package token and push/pull verification need current evidence. |
| Gitea Database (gitea-database) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: databases | railiance-platform | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/databases) | Backup and restore evidence not recorded in ops inventory. |
| Gitea Shared Storage (gitea-shared-storage) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-platform<br>railiance-apps | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/default/pvc/gitea-shared-storage) | Package blob backup and restore evidence not confirmed. |
| State Hub (state-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: state-hub | state-hub<br>the-custodian | http://127.0.0.1:8000/state/health<br>Expected: status 200, health response | observed_ok<br>2026-07-03: Cluster hub healthy; railiance01 reaches via fleet forward tunnel. | postgresql:state-hub-db | http: observed_ok (workstation tunnel state-hub-primary → cluster)<br>tunnel: observed_ok (railiance01 systemd fleet-state-hub-coulombcore → cluster) | Primary home must move to railiance01 per CUST-WP-0054-T05. |
| issue-core (issue-core) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: issue-core | issue-core | http://127.0.0.1:8765/healthz<br>Expected: status 200, version response | observed_ok<br>2026-07-02: REST emission live via cross-machine fleet path. | postgresql:issue-core | tunnel: observed_ok (railiance01 fleet-issue-core-coulombcore → cluster) | Target railiance01 overlay per CUST-WP-0054 drain Wave 4. |
| Core Hub (core-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: core-hub-staging | core-hub | https://hub.coulomb.social/api/v2/hubs<br>Expected: status 200, hub list when authenticated | observed_ok<br>2026-07-02: Staging deployed; production cutover gated on CORE-WP-0005-T04. | postgresql:core-hub | k8s: observed_ok (coulombcore-k3s/core-hub-staging) | Production cutover to railiance01 pending operator approval. |
| Fleet Mesh (railiance01) (fleet-mesh-railiance01) | Railiance01<br>type: systemd; host: railiance01 | the-custodian<br>ops-bridge | http://127.0.0.1:18000/state/health<br>Expected: status 200 | observed_ok<br>2026-07-03: Workstation reverse tunnels stopped; systemd forwards healthy. | - | ssh-tunnel: observed_ok (railiance01 → coulombcore ClusterIPs) | Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN available. |
| Inter-Hub (inter-hub) | ThreePhoenix Production<br>type: external; public_endpoint: https://hub.coulomb.social | inter-hub | https://hub.coulomb.social/api/v2/openapi.json<br>Expected: status 200, OpenAPI document | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | https: unknown (https://hub.coulomb.social) | ops-hub bootstrap requires authenticated UI flow or deployment-side migration. |
| activity-core (activity-core) | Railiance01<br>type: k3s; cluster: railiance01-k3s; namespace: activity-core | activity-core<br>the-custodian | activity-core API health endpoint<br>Expected: status 200, healthy DB and Temporal status | observed_ok<br>2026-05-23: API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified. | postgresql:activity-core<br>temporal:activity-core<br>nats:railiance01 | k8s: observed_ok (railiance01-k3s/activity-core) | Add explicit ops inventory probes and evidence events. |
| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | observed_ok<br>2026-07-03: state-hub-railiance01 and issue-core-railiance01 stopped; not production-critical. | - | ssh-tunnel: observed_ok (interactive dev tunnels only (k3s-api, state-hub-primary)) | Install ops-bridge on railiance01 or keep systemd fleet-mesh units. |
| Haskell Build Agent (haskell-build-agent) | Local Workstation<br>type: systemd; host: haskell-build-vm | the-custodian | http://127.0.0.1:18000<br>Expected: VM can reach State Hub through SSH forward | unknown<br>undated: Build agent is a systemd service and registers with State Hub on boot. | - | ssh: unknown (local workstation reverse tunnel port 12222) | Current tunnel and capability registration need live evidence in ops-hub. |
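
Several catalog rows pair an endpoint with an expected HTTP status, so a probe only has to compare what it observes against that expectation. The sketch below shows one way to do that with the Python standard library. It is not an existing ops-hub probe: `EXPECTED_STATUS`, `classify`, and `fetch_status` are hypothetical names, and the `drift` label is an assumption, since the catalog itself only uses `observed_ok` and `unknown`.

```python
import urllib.error
import urllib.request

# Endpoints and expected statuses copied from the catalog rows above.
# The 127.0.0.1 URLs only answer while the matching tunnel or forward is up.
EXPECTED_STATUS = {
    "gitea": ("https://gitea.coulomb.social/v2/", 401),
    "state-hub": ("http://127.0.0.1:8000/state/health", 200),
    "issue-core": ("http://127.0.0.1:8765/healthz", 200),
    "fleet-mesh-railiance01": ("http://127.0.0.1:18000/state/health", 200),
}


def classify(expected, observed):
    """Map a probe result onto a health label.

    "drift" is a hypothetical label for a reachable endpoint that answers
    with a status other than the expected one.
    """
    if observed is None:
        return "unknown"
    return "observed_ok" if observed == expected else "drift"


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code, or None if the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers still carry a status, e.g. the 401 expected from Gitea.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def probe(service_id):
    """Probe one catalogued endpoint and return its health label."""
    url, expected = EXPECTED_STATUS[service_id]
    return classify(expected, fetch_status(url))
```

Calling `probe("gitea")` from a machine with outbound HTTPS would return `observed_ok` only if the registry still answers with the 401 challenge. Keeping `classify` separate from the network call lets the comparison logic be tested without reaching any endpoint.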

Open Operating Gaps

Gitea (gitea)

  • Package token and push/pull verification need current evidence.
  • Backup and restore evidence for database and shared storage not recorded in ops inventory.

Gitea Database (gitea-database)

  • Backup and restore evidence not recorded in ops inventory.

Gitea Shared Storage (gitea-shared-storage)

  • Package blob backup and restore evidence not confirmed.

State Hub (state-hub)

  • Primary home must move to railiance01 per CUST-WP-0054-T05.
  • Consistency sweep writebacks still target workstation paths.

issue-core (issue-core)

  • Target railiance01 overlay per CUST-WP-0054 drain Wave 4.

Core Hub (core-hub)

  • Production cutover to railiance01 pending operator approval.

Fleet Mesh (railiance01) (fleet-mesh-railiance01)

  • Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN available.
  • Retire when State Hub and issue-core move to railiance01.

Inter-Hub (inter-hub)

  • ops-hub bootstrap requires authenticated UI flow or deployment-side migration.

activity-core (activity-core)

  • Add explicit ops inventory probes and evidence events.

Ops Bridge (ops-bridge)

  • Install ops-bridge on railiance01 or keep systemd fleet-mesh units.

Haskell Build Agent (haskell-build-agent)

  • Current tunnel and capability registration need live evidence in ops-hub.

Next Evidence Events

  • ops-service-observed for each runtime object confirmed by a probe.
  • ops-endpoint-verified for HTTP, HTTPS, tunnel, or cluster endpoints.
  • ops-access-path-checked for non-secret access path checks.
  • ops-backup-verified where backup and restore evidence exists.
  • ops-inventory-drift when observed state differs from this inventory.
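
To make the event list above concrete, here is a minimal sketch of what one evidence event could look like as a JSON payload. Only the five event type names come from this document. The field names (`type`, `service`, `observed_at`, `detail`), the `make_event` helper, and the timestamp's time of day are assumptions for illustration, not a schema the State Hub is known to accept.

```python
import json
from datetime import datetime, timezone

# Event type names taken from the list above.
EVENT_TYPES = {
    "ops-service-observed",
    "ops-endpoint-verified",
    "ops-access-path-checked",
    "ops-backup-verified",
    "ops-inventory-drift",
}


def make_event(event_type, service_id, detail, observed_at=None):
    """Build a hypothetical evidence event; rejects unknown event types."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    if observed_at is None:
        observed_at = datetime.now(timezone.utc)
    return {
        "type": event_type,
        "service": service_id,
        "observed_at": observed_at.isoformat(),
        "detail": detail,  # non-secret evidence only
    }


# Illustrative event for the State Hub health endpoint from the catalog.
event = make_event(
    "ops-endpoint-verified",
    "state-hub",
    {"url": "http://127.0.0.1:8000/state/health", "status": 200},
    observed_at=datetime(2026, 7, 3, tzinfo=timezone.utc),
)
print(json.dumps(event, indent=2))
```

Rejecting unknown type names at construction time keeps typos from producing events that no consumer would match.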