Documents the three-machine role model, fleet mesh topology, coulombcore freeze policy, and ordered drain sequence. Adds railiance01 systemd tunnel install assets and refreshes ops service inventory to reflect 2026-07-03 production placement (cluster State Hub, fleet mesh, draining coulombcore).
# Ops Hub Service Catalog Now View

<!-- generated by ops/render_service_inventory.py; edit ops/service-inventory.yml instead -->

Source: `ops/service-inventory.yml`
Inventory last reviewed: `2026-07-03`

This is the repo-native first view for `CUST-WP-0047`. It exists so an
operator can answer what is running where before the full standalone
`ops-hub` application is available.

## Summary

| Metric | Count |
|---|---:|
| Environments | 4 |
| Hosts | 3 |
| Clusters | 3 |
| Services | 11 |
| Services: observed_ok | 6 |
| Services: unknown | 5 |

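The service counts in the Summary can be reproduced from the catalog records. The following is a minimal sketch, not the logic of `ops/render_service_inventory.py`: the record keys `id` and `health` are assumptions, since the actual schema of `ops/service-inventory.yml` is not shown in this view.

```python
from collections import Counter

# Hypothetical in-memory form of the services listed in this view.
# The key names ("id", "health") are assumptions, not the confirmed YAML schema.
SERVICES = [
    {"id": "gitea", "health": "unknown"},
    {"id": "gitea-database", "health": "unknown"},
    {"id": "gitea-shared-storage", "health": "unknown"},
    {"id": "state-hub", "health": "observed_ok"},
    {"id": "issue-core", "health": "observed_ok"},
    {"id": "core-hub", "health": "observed_ok"},
    {"id": "fleet-mesh-railiance01", "health": "observed_ok"},
    {"id": "inter-hub", "health": "unknown"},
    {"id": "activity-core", "health": "observed_ok"},
    {"id": "ops-bridge", "health": "observed_ok"},
    {"id": "haskell-build-agent", "health": "unknown"},
]


def summarize(services):
    """Return (metric, count) rows: total services, then one row per health state."""
    by_health = Counter(s["health"] for s in services)
    rows = [("Services", len(services))]
    rows += [(f"Services: {state}", n) for state, n in sorted(by_health.items())]
    return rows


def render_markdown(rows):
    """Render the rows as a two-column Markdown table with a right-aligned count."""
    lines = ["| Metric | Count |", "|---|---:|"]
    lines += [f"| {metric} | {count} |" for metric, count in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown(summarize(SERVICES)))
```

Counting from the records rather than maintaining the totals by hand keeps the Summary consistent with the catalog whenever a service's health state changes.
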
## Service Catalog

| Service | Where | Owner | Endpoint | Health | Data | Access | Top Gap |
|---|---|---|---|---|---|---|---|
| Gitea (gitea) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-apps | https://gitea.coulomb.social/v2/<br>Expected: status 401, OCI registry auth challenge | unknown<br>2026-05-16: Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge. | database:gitea-db<br>pvc:default/gitea-shared-storage | k8s: unknown (coulombcore-k3s/default) | Package token and push/pull verification need current evidence. |
| Gitea Database (gitea-database) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: databases | railiance-platform | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/databases) | Backup and restore evidence not recorded in ops inventory. |
| Gitea Shared Storage (gitea-shared-storage) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-platform<br>railiance-apps | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/default/pvc/gitea-shared-storage) | Package blob backup and restore evidence not confirmed. |
| State Hub (state-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: state-hub | state-hub<br>the-custodian | http://127.0.0.1:8000/state/health<br>Expected: status 200, health response | observed_ok<br>2026-07-03: Cluster hub healthy; railiance01 reaches via fleet forward tunnel. | postgresql:state-hub-db | http: observed_ok (workstation tunnel state-hub-primary → cluster)<br>tunnel: observed_ok (railiance01 systemd fleet-state-hub-coulombcore → cluster) | Primary home must move to railiance01 per CUST-WP-0054-T05. |
| issue-core (issue-core) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: issue-core | issue-core | http://127.0.0.1:8765/healthz<br>Expected: status 200, version response | observed_ok<br>2026-07-02: REST emission live via cross-machine fleet path. | postgresql:issue-core | tunnel: observed_ok (railiance01 fleet-issue-core-coulombcore → cluster) | Target railiance01 overlay per CUST-WP-0054 drain Wave 4. |
| Core Hub (core-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: core-hub-staging | core-hub | https://hub.coulomb.social/api/v2/hubs<br>Expected: status 200, hub list when authenticated | observed_ok<br>2026-07-02: Staging deployed; production cutover gated on CORE-WP-0005-T04. | postgresql:core-hub | k8s: observed_ok (coulombcore-k3s/core-hub-staging) | Production cutover to railiance01 pending operator approval. |
| Fleet Mesh (railiance01) (fleet-mesh-railiance01) | Railiance01<br>type: systemd; host: railiance01 | the-custodian<br>ops-bridge | http://127.0.0.1:18000/state/health<br>Expected: status 200 | observed_ok<br>2026-07-03: Workstation reverse tunnels stopped; systemd forwards healthy. | - | ssh-tunnel: observed_ok (railiance01 → coulombcore ClusterIPs) | Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN available. |
| Inter-Hub (inter-hub) | ThreePhoenix Production<br>type: external; public_endpoint: https://hub.coulomb.social | inter-hub | https://hub.coulomb.social/api/v2/openapi.json<br>Expected: status 200, OpenAPI document | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | https: unknown (https://hub.coulomb.social) | ops-hub bootstrap requires authenticated UI flow or deployment-side migration. |
| activity-core (activity-core) | Railiance01<br>type: k3s; cluster: railiance01-k3s; namespace: activity-core | activity-core<br>the-custodian | activity-core API health endpoint<br>Expected: status 200, healthy DB and Temporal status | observed_ok<br>2026-05-23: API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified. | postgresql:activity-core<br>temporal:activity-core<br>nats:railiance01 | k8s: observed_ok (railiance01-k3s/activity-core) | Add explicit ops inventory probes and evidence events. |
| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | observed_ok<br>2026-07-03: state-hub-railiance01 and issue-core-railiance01 stopped; not production-critical. | - | ssh-tunnel: observed_ok (interactive dev tunnels only (k3s-api, state-hub-primary)) | Install ops-bridge on railiance01 or keep systemd fleet-mesh units. |
| Haskell Build Agent (haskell-build-agent) | Local Workstation<br>type: systemd; host: haskell-build-vm | the-custodian | http://127.0.0.1:18000<br>Expected: VM can reach State Hub through SSH forward | unknown<br>undated: Build agent is a systemd service and registers with State Hub on boot. | - | ssh: unknown (local workstation reverse tunnel port 12222) | Current tunnel and capability registration need live evidence in ops-hub. |

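Each catalog row pairs an endpoint with an expected HTTP status, for example 401 for the Gitea registry auth challenge and 200 for the State Hub health check. A probe could compare the two along the following lines. This is a sketch under stated assumptions: `fetch_status` and `classify` are hypothetical helpers, and `observed_mismatch` is an illustrative state name, since this inventory only records `observed_ok` and `unknown`.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the code is still an observation
        # (the Gitea registry endpoint is expected to answer 401).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, closed tunnel, and so on.
        return None


def classify(expected_status, observed_status):
    """Map an expected/observed status pair to a health state.

    "observed_ok" and "unknown" are the states used in this inventory;
    "observed_mismatch" is a hypothetical name for a reachable endpoint
    that answers with a different status than expected.
    """
    if observed_status is None:
        return "unknown"
    if observed_status == expected_status:
        return "observed_ok"
    return "observed_mismatch"


if __name__ == "__main__":
    # Endpoint and expected status taken from the State Hub row; the URL is only
    # reachable where the tunnel described in that row is active.
    url = "http://127.0.0.1:8000/state/health"
    print(url, classify(200, fetch_status(url)))
```

Treating an unreachable endpoint as `unknown` rather than as a failure matches how this view reports services that lack current evidence.
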
## Open Operating Gaps

### Gitea (`gitea`)

- Package token and push/pull verification need current evidence.
- Backup and restore evidence for database and shared storage not recorded in ops inventory.

### Gitea Database (`gitea-database`)

- Backup and restore evidence not recorded in ops inventory.

### Gitea Shared Storage (`gitea-shared-storage`)

- Package blob backup and restore evidence not confirmed.

### State Hub (`state-hub`)

- Primary home must move to railiance01 per CUST-WP-0054-T05.
- Consistency sweep writebacks still target workstation paths.

### issue-core (`issue-core`)

- Target railiance01 overlay per CUST-WP-0054 drain Wave 4.

### Core Hub (`core-hub`)

- Production cutover to railiance01 pending operator approval.

### Fleet Mesh (railiance01) (`fleet-mesh-railiance01`)

- Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN available.
- Retire when State Hub and issue-core move to railiance01.

### Inter-Hub (`inter-hub`)

- ops-hub bootstrap requires authenticated UI flow or deployment-side migration.

### activity-core (`activity-core`)

- Add explicit ops inventory probes and evidence events.

### Ops Bridge (`ops-bridge`)

- Install ops-bridge on railiance01 or keep systemd fleet-mesh units.

### Haskell Build Agent (`haskell-build-agent`)

- Current tunnel and capability registration need live evidence in ops-hub.

## Next Evidence Events

- `ops-service-observed` for each runtime object confirmed by a probe.
- `ops-endpoint-verified` for HTTP, HTTPS, tunnel, or cluster endpoints.
- `ops-access-path-checked` for non-secret access path checks.
- `ops-backup-verified` where backup and restore evidence exists.
- `ops-inventory-drift` when observed state differs from this inventory.
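
The five evidence event names could be carried in a small structured record. The sketch below is illustrative only: the field names (`event`, `service`, `observed_at`, `detail`) and the `make_event` helper are assumptions, not a confirmed ops-hub or State Hub schema.

```python
import json
from datetime import datetime, timezone

# Event names as listed in this view.
EVENT_TYPES = frozenset({
    "ops-service-observed",
    "ops-endpoint-verified",
    "ops-access-path-checked",
    "ops-backup-verified",
    "ops-inventory-drift",
})


def make_event(event_type, service_id, detail, observed_at=None):
    """Build one evidence event record; the field names are hypothetical."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown evidence event type: {event_type}")
    # Default to the current UTC time when the caller gives no timestamp.
    when = observed_at or datetime.now(timezone.utc)
    return {
        "event": event_type,
        "service": service_id,
        "observed_at": when.isoformat(),
        "detail": detail,
    }


if __name__ == "__main__":
    event = make_event(
        "ops-endpoint-verified",
        "state-hub",
        "status 200 on /state/health",
    )
    print(json.dumps(event, sort_keys=True))
```

Rejecting unknown event names at construction time keeps typos from producing events that no consumer would recognise.
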