feat(ops): add ops-hub service inventory now view (CUST-WP-0047)

Seed a non-secret service inventory (environments, hosts, clusters,
services, endpoints, access paths, evidence, gaps) with a JSON schema,
a renderer, and a generated service-catalog view. Adds the
`make ops-inventory-view` target, probe ActivityDefinition, and docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:12:30 +02:00
parent 4bdfeb1850
commit b1aac08eb2
9 changed files with 1238 additions and 0 deletions

@@ -16,6 +16,10 @@ CUSTODIAN_KEY := $(HOME)/.ssh/id_custodian_agent
RAILIANCE_INFRA := $(HOME)/railiance-infra
AGENT_VARS_FILE := $(RAILIANCE_INFRA)/ansible/inventory/group_vars/all.yaml

.PHONY: ops-inventory-view
ops-inventory-view: ## Render the ops-hub service catalog now view
	python3 ops/render_service_inventory.py

.PHONY: custodian-keygen
custodian-keygen: ## Generate custodian agent SSH keypair (one-time setup)
	@if [ -f "$(CUSTODIAN_KEY)" ]; then \

@@ -0,0 +1,88 @@
---
id: "40d15a87-7ff6-4d8e-992c-37df15f95110"
name: "Ops Service Inventory Probes"
type: activity-definition
version: "0.1"
enabled: false
owner: custodian
governance: custodian
status: proposed
created: "2026-06-05"
trigger:
  type: cron
  cron_expression: "15 * * * *"
  timezone: Europe/Berlin
  misfire_policy: skip
context_sources:
  - type: static
    bind_to: context.inventory_path
    config:
      value: /home/worsch/the-custodian/ops/service-inventory.yml
  - type: static
    bind_to: context.catalog_path
    config:
      value: /home/worsch/the-custodian/docs/ops-hub-service-catalog.md
---
# ActivityDefinition: Ops Service Inventory Probes
## Purpose
This disabled draft is the activity-core handoff point for
`CUST-WP-0047 - Ops Hub Service Inventory Now View`.
The future enabled routine should read the non-secret inventory, run repeatable
probes for declared endpoints and access paths, render the catalog view, and
submit non-secret ops evidence events against stable inventory ids.
## Runner Status
This definition is intentionally `enabled: false`.
Do not enable it until both of these are true:
- activity-core has an inventory probe runner or State Hub resolver that can
execute the checks without embedding secrets in ActivityRun context
- the ops-hub Inter-Hub widget/event sink can accept `ops-service-observed`,
`ops-endpoint-verified`, `ops-access-path-checked`, `ops-backup-verified`,
and `ops-inventory-drift` events
## Trigger
Hourly at minute 15 in `Europe/Berlin`, with `misfire_policy: skip`.
This offset avoids colliding with the hourly RecentlyOnScope run at minute 0.
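
As a rough illustration of what `15 * * * *` means in practice, the sketch below computes the next minute-15 fire time from a given wall-clock time. `next_fire` is a hypothetical helper, not part of activity-core. It assumes the caller passes a datetime already expressed in `Europe/Berlin`, and it does not model DST transitions or misfire handling.

```python
from datetime import datetime, timedelta


def next_fire(now: datetime, minute: int = 15) -> datetime:
    """Return the next time strictly after `now` whose minute equals `minute`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed (or is exactly now), so use the next hour.
        candidate += timedelta(hours=1)
    return candidate
```

A run scheduled this way stays 15 minutes away from anything that fires at minute 0.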
## Probe Candidates
Initial deterministic probes:
- State Hub local health endpoint:
`http://127.0.0.1:8000/state/health`
- Inter-Hub OpenAPI endpoint:
`https://hub.coulomb.social/api/v2/openapi.json`
- Gitea OCI registry auth challenge:
`https://gitea.coulomb.social/v2/`
- activity-core API health and Temporal schedule availability
- ops-bridge tunnel reachability
- Haskell build-agent State Hub registration and tunnel state
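
A deterministic endpoint probe could look roughly like the sketch below, which compares an observed HTTP status with the expected one (for example 401 for the registry auth challenge). `ProbeResult`, `fetch_status`, and `check_endpoint` are hypothetical names, and the endpoint keys (`id`, `url`, `expected_status`) are assumed to follow the inventory. This is not the activity-core runner.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ProbeResult:
    endpoint_id: str
    expected_status: Optional[int]
    observed_status: Optional[int]  # None when the endpoint was unreachable
    ok: bool


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a GET request, or None if unreachable.

    Note: urlopen follows redirects, so an expected 302 would need a
    different opener; that case is out of scope for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status, e.g. a 401 auth challenge.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


def check_endpoint(
    endpoint: dict[str, Any],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> ProbeResult:
    """Compare the observed status with the endpoint's declared expectation."""
    expected = endpoint.get("expected_status")
    observed = fetch(endpoint["url"])
    ok = expected is not None and observed == expected
    return ProbeResult(endpoint["id"], expected, observed, ok)
```

Passing `fetch` as a parameter keeps the comparison testable without network access.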
## Output Contract
Each successful run should produce:
- an updated `docs/ops-hub-service-catalog.md`
- one evidence event per checked service/endpoint/access path
- one ActivityRun with compact non-secret summary metadata
- no credentials, tokens, cookies, private key material, or sensitive command
output in context snapshots, event metadata, reports, or logs
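
The last item of the contract can be partly guarded in code. The sketch below flags metadata keys whose names suggest secret material before an event or report is written. `SECRET_KEY_HINTS` and `secret_like_keys` are hypothetical, the hint list is an assumption rather than project policy, and a key-name check does not inspect values, so it complements review rather than replacing it.

```python
from __future__ import annotations

from typing import Any

# Assumed deny-list of substrings; a real policy would be defined by the project.
SECRET_KEY_HINTS = (
    "password",
    "passwd",
    "secret",
    "token",
    "cookie",
    "private_key",
    "authorization",
)


def secret_like_keys(metadata: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of keys whose names look like secret material."""
    found: list[str] = []
    for key, value in metadata.items():
        path = f"{prefix}{key}"
        if any(hint in str(key).lower() for hint in SECRET_KEY_HINTS):
            found.append(path)
        if isinstance(value, dict):
            # Recurse into nested mappings such as captured response headers.
            found.extend(secret_like_keys(value, prefix=f"{path}."))
    return found
```

A runner could refuse to submit an event whenever this function returns a non-empty list.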
## Event Mapping
| Probe result | Event type |
|---|---|
| Runtime object observed | `ops-service-observed` |
| HTTP/HTTPS/tunnel endpoint matches expected signal | `ops-endpoint-verified` |
| SSH, Kubernetes, or HTTP access path checked | `ops-access-path-checked` |
| Backup and restore evidence found | `ops-backup-verified` |
| Observed runtime differs from inventory | `ops-inventory-drift` |

@@ -0,0 +1,77 @@
# Ops Hub Service Catalog Now View
<!-- generated by ops/render_service_inventory.py; edit ops/service-inventory.yml instead -->
Source: `ops/service-inventory.yml`
Inventory last reviewed: `2026-06-05`
This is the repo-native first view for `CUST-WP-0047`. It exists so an
operator can answer what is running where before the full standalone
`ops-hub` application is available.
## Summary
| Metric | Count |
|---|---:|
| Environments | 4 |
| Hosts | 3 |
| Clusters | 3 |
| Services | 8 |
| Services: observed_ok | 2 |
| Services: unknown | 6 |
## Service Catalog
| Service | Where | Owner | Endpoint | Health | Data | Access | Top Gap |
|---|---|---|---|---|---|---|---|
| Gitea (gitea) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-apps | https://gitea.coulomb.social/v2/<br>Expected: status 401, OCI registry auth challenge | unknown<br>2026-05-16: Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge. | database:gitea-db<br>pvc:default/gitea-shared-storage | k8s: unknown (coulombcore-k3s/default) | Package token and push/pull verification need current evidence. |
| Gitea Database (gitea-database) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: databases | railiance-platform | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/databases) | Backup and restore evidence not recorded in ops inventory. |
| Gitea Shared Storage (gitea-shared-storage) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-platform<br>railiance-apps | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/default/pvc/gitea-shared-storage) | Package blob backup and restore evidence not confirmed. |
| State Hub (state-hub) | Local Workstation<br>type: local-process; host: local-workstation; ports: 8000 | state-hub<br>the-custodian | http://127.0.0.1:8000/state/health<br>Expected: status 200, health response | observed_ok<br>2026-06-05: State Hub accepted inbox, task, and progress API calls. | postgresql:state-hub | http: observed_ok (http://127.0.0.1:8000) | Future cluster deployment readiness still needs ops evidence. |
| Inter-Hub (inter-hub) | ThreePhoenix Production<br>type: external; public_endpoint: https://hub.coulomb.social | inter-hub | https://hub.coulomb.social/api/v2/openapi.json<br>Expected: status 200, OpenAPI document | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | https: unknown (https://hub.coulomb.social) | ops-hub bootstrap requires authenticated UI flow or deployment-side migration. |
| activity-core (activity-core) | Railiance01<br>type: k3s; cluster: railiance01-k3s; namespace: activity-core | activity-core<br>the-custodian | activity-core API health endpoint<br>Expected: status 200, healthy DB and Temporal status | observed_ok<br>2026-05-23: API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified. | postgresql:activity-core<br>temporal:activity-core<br>nats:railiance01 | k8s: observed_ok (railiance01-k3s/activity-core) | Add explicit ops inventory probes and evidence events. |
| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | unknown<br>2026-05-16: Bridge is useful for connected-server visibility but is not itself the service catalog. | - | ssh-tunnel: unknown (connected remote servers) | Emit reachability evidence into ops-hub instead of relying on bridge state as inventory. |
| Haskell Build Agent (haskell-build-agent) | Local Workstation<br>type: systemd; host: haskell-build-vm | the-custodian | http://127.0.0.1:18000<br>Expected: VM can reach State Hub through SSH forward | unknown<br>undated: Build agent is a systemd service and registers with State Hub on boot. | - | ssh: unknown (local workstation reverse tunnel port 12222) | Current tunnel and capability registration need live evidence in ops-hub. |
## Open Operating Gaps
### Gitea (`gitea`)
- Package token and push/pull verification need current evidence.
- Backup and restore evidence for database and shared storage not recorded in ops inventory.
### Gitea Database (`gitea-database`)
- Backup and restore evidence not recorded in ops inventory.
### Gitea Shared Storage (`gitea-shared-storage`)
- Package blob backup and restore evidence not confirmed.
### State Hub (`state-hub`)
- Future cluster deployment readiness still needs ops evidence.
### Inter-Hub (`inter-hub`)
- ops-hub bootstrap requires authenticated UI flow or deployment-side migration.
### activity-core (`activity-core`)
- Add explicit ops inventory probes and evidence events.
### Ops Bridge (`ops-bridge`)
- Emit reachability evidence into ops-hub instead of relying on bridge state as inventory.
### Haskell Build Agent (`haskell-build-agent`)
- Current tunnel and capability registration need live evidence in ops-hub.
## Next Evidence Events
- `ops-service-observed` for each runtime object confirmed by a probe.
- `ops-endpoint-verified` for HTTP, HTTPS, tunnel, or cluster endpoints.
- `ops-access-path-checked` for non-secret access path checks.
- `ops-backup-verified` where backup and restore evidence exists.
- `ops-inventory-drift` when observed state differs from this inventory.

@@ -0,0 +1,94 @@
# Ops Hub Service Inventory
Date: 2026-06-05
## Purpose
The first ops-hub "now view" should answer one practical question:
> What service is running where, who owns it, how is it reached, and what
> evidence says it is alive?
The lowest-effort path is a small read model, not a full new application. The
read model starts as `ops/service-inventory.yml`, can be surfaced through
Inter-Hub ops widgets, and can later be ingested by the standalone `ops-hub`
repo planned in `CUST-WP-0025`.
## Operating Model
- Git owns the declared inventory.
- Inter-Hub widgets expose the visible ops entities.
- Interaction events provide timestamped operational evidence.
- activity-core runs repeatable probes and writes evidence.
- State Hub continues to own workstreams, tasks, decisions, and progress. It is
not the service catalog.
## Minimal Record Shape
Each service record should include:
- `id`: stable lowercase service id, for example `state-hub`.
- `name`: human-readable name.
- `lifecycle_state`: `observed`, `planned`, `target`, or `retired`.
- `health_status`: `unknown`, `observed_ok`, `degraded`, `down`, or `planned`.
- `environment`: environment id where the service currently belongs.
- `owner_repos`: repos that own desired state, runtime code, or runbooks.
- `runtime`: runtime kind and location details, such as `local-process`,
`k3s`, `systemd`, `external`, or `bridge`.
- `endpoints`: public, local, cluster, or tunnel endpoints with expected
non-secret checks.
- `backing_stores`: databases, PVCs, object stores, or external stores that
must be backed up with the service.
- `access_paths`: non-secret descriptions of SSH, Kubernetes, HTTP, or tunnel
paths.
- `evidence`: links to docs, progress events, probe results, or workplans.
- `gaps`: missing evidence or operating controls.
The schema lives at `schemas/ops-service-inventory.schema.json`.
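
A JSON Schema checks the shape of each record but not whether ids point at declared objects. A cross-reference check might be sketched as below. `dangling_references` is a hypothetical helper; it only assumes the `environments`, `clusters`, and `services` lists and the `environment` and `runtime.cluster` fields described above.

```python
from __future__ import annotations

from typing import Any


def dangling_references(inventory: dict[str, Any]) -> list[str]:
    """List service references to environment or cluster ids that are not declared."""
    env_ids = {env["id"] for env in inventory.get("environments", [])}
    cluster_ids = {cluster["id"] for cluster in inventory.get("clusters", [])}
    problems: list[str] = []
    for service in inventory.get("services", []):
        sid = service.get("id", "?")
        env = service.get("environment")
        if env not in env_ids:
            problems.append(f"{sid}: unknown environment {env!r}")
        # Only k3s-style runtimes declare a cluster; others are skipped.
        cluster = (service.get("runtime") or {}).get("cluster")
        if cluster is not None and cluster not in cluster_ids:
            problems.append(f"{sid}: unknown cluster {cluster!r}")
    return problems
```

An empty result means every service points at a declared environment and, where present, a declared cluster.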
## First View
The initial ops-hub view can be a dense table:
| Column | Meaning |
|---|---|
| Service | `name` plus `id` |
| Where | environment, host, cluster, namespace |
| Owner | owner repo and desired state source |
| Endpoint | primary endpoint and expected check |
| Health | latest health status and last evidence timestamp |
| Data | backing stores and backup gap summary |
| Access | access path status |
| Gaps | highest-priority missing operating evidence |
This is enough to make scattered operational reality visible without waiting
for a full incident system, runbook executor, or custom database.
The repo-native version is rendered to `docs/ops-hub-service-catalog.md`:
```bash
make ops-inventory-view
```
## Evidence Events
Use a small event vocabulary first:
- `ops-service-observed`: service/runtime object was observed.
- `ops-endpoint-verified`: endpoint responded as expected.
- `ops-access-path-checked`: access path was checked without storing secrets.
- `ops-backup-verified`: backup and restore evidence exists.
- `ops-inventory-drift`: observed state differs from declared inventory.
Event metadata should reference the stable inventory id and include non-secret
probe output only.
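
To make the vocabulary concrete, the sketch below builds one evidence event keyed by a stable inventory id. The payload field names (`event_type`, `inventory_id`, `observed_at`, `summary`) and the helper `build_event` are assumptions for illustration, not the Inter-Hub event API; the id pattern mirrors the one in `schemas/ops-service-inventory.schema.json`.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from typing import Optional

EVENT_TYPES = frozenset(
    {
        "ops-service-observed",
        "ops-endpoint-verified",
        "ops-access-path-checked",
        "ops-backup-verified",
        "ops-inventory-drift",
    }
)

# Same character rules as the schema's id definition: lowercase, digits, hyphens.
ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]*")


def build_event(
    event_type: str,
    inventory_id: str,
    summary: str,
    observed_at: Optional[datetime] = None,
) -> dict[str, str]:
    """Build a non-secret evidence event (hypothetical payload shape)."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    if not ID_PATTERN.fullmatch(inventory_id):
        raise ValueError(f"not a stable inventory id: {inventory_id}")
    when = observed_at or datetime.now(timezone.utc)
    return {
        "event_type": event_type,
        "inventory_id": inventory_id,
        "observed_at": when.isoformat(),
        "summary": summary,
    }
```

Rejecting unknown event types and malformed ids at construction time keeps the vocabulary small and the events joinable against the inventory.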
## Promotion Path
1. Keep `ops/service-inventory.yml` as the source artifact.
2. Seed or update Inter-Hub widgets from the inventory ids.
3. Let activity-core run probes and submit evidence events.
4. Build the first ops-hub view from inventory plus latest evidence.
5. When the standalone `ops-hub` repo exists, ingest the same inventory and
evidence events into the proper Service, AccessPath, Runbook, and Incident
models from `CUST-WP-0025`.
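
Step 4 amounts to a join between declared services and the newest event per inventory id. A minimal sketch, assuming events are dicts with hypothetical `inventory_id`, `event_type`, and `observed_at` fields, where `observed_at` is an ISO 8601 string in one consistent format so that string comparison orders events by time:

```python
from __future__ import annotations

from typing import Any, Optional


def latest_evidence_by_id(events: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Keep only the most recent event for each inventory id."""
    latest: dict[str, dict[str, Any]] = {}
    for event in events:
        sid = event["inventory_id"]
        if sid not in latest or event["observed_at"] > latest[sid]["observed_at"]:
            latest[sid] = event
    return latest


def now_view_rows(
    inventory: dict[str, Any], events: list[dict[str, Any]]
) -> list[tuple[str, str, Optional[str]]]:
    """Return (service id, declared health, latest event type) per service."""
    latest = latest_evidence_by_id(events)
    rows = []
    for service in inventory.get("services", []):
        event = latest.get(service["id"])
        rows.append(
            (
                service["id"],
                service.get("health_status", "unknown"),
                event["event_type"] if event else None,
            )
        )
    return rows
```

Services without any event still appear in the result, which keeps missing evidence visible instead of hiding it.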

@@ -6,10 +6,24 @@ Operational runbooks and incident reports for the Railiance/Custodian infrastruc
```
ops/
service-inventory.yml — non-secret service/location/evidence seed for ops-hub
runbooks/ — how-to guides for recurring operational tasks and known issues
incidents/ — post-incident reports (append-only, one file per incident)
```
## Inventory
| Artifact | Covers |
|----------|--------|
| [service-inventory.yml](service-inventory.yml) | Initial ops-hub service inventory: environments, hosts, clusters, services, endpoints, access paths, evidence, and gaps |
| [../docs/ops-hub-service-catalog.md](../docs/ops-hub-service-catalog.md) | Rendered service catalog now view generated from the inventory |
Render the first catalog view with:
```bash
make ops-inventory-view
```
## Runbooks
| Runbook | Covers |

@@ -0,0 +1,216 @@
#!/usr/bin/env python3
"""Render the ops service inventory into a compact Markdown now view."""

from __future__ import annotations

import argparse
from collections import Counter
from pathlib import Path
from typing import Any

try:
    import yaml
except ImportError as exc:  # pragma: no cover - environment guard
    raise SystemExit("PyYAML is required to render ops/service-inventory.yml") from exc


DEFAULT_INPUT = Path("ops/service-inventory.yml")
DEFAULT_OUTPUT = Path("docs/ops-hub-service-catalog.md")


def text(value: Any, default: str = "-") -> str:
    """Convert a value to display text, using `default` for None or empty strings."""
    if value is None:
        return default
    if isinstance(value, str):
        return value if value else default
    return str(value)


def md(value: Any) -> str:
    """Escape a value for use inside a Markdown table cell. Apply exactly once."""
    return text(value).replace("|", "\\|").replace("\n", "<br>")


def joined(values: list[Any] | None, limit: int | None = None) -> str:
    """Join values with <br>, optionally truncating to `limit` items."""
    if not values:
        return "-"
    items = [text(v) for v in values]
    if limit is not None and len(items) > limit:
        shown = items[:limit]
        shown.append(f"+{len(items) - limit} more")
        items = shown
    return "<br>".join(md(item) for item in items)


def endpoint_label(endpoint: dict[str, Any]) -> str:
    label = endpoint.get("url") or endpoint.get("id") or "-"
    checks: list[str] = []
    if endpoint.get("expected_status") is not None:
        checks.append(f"status {endpoint['expected_status']}")
    if endpoint.get("expected_signal"):
        checks.append(endpoint["expected_signal"])
    if checks:
        label = f"{label}<br>Expected: {', '.join(checks)}"
    return md(label)


def primary_endpoint(service: dict[str, Any]) -> str:
    """Describe the first declared endpoint, which is treated as the primary one."""
    endpoints = service.get("endpoints") or []
    if not endpoints:
        return "-"
    return endpoint_label(endpoints[0])


def runtime_label(service: dict[str, Any], envs: dict[str, dict[str, Any]]) -> str:
    env_id = service.get("environment")
    env = envs.get(env_id, {})
    parts = [env.get("name") or env_id or "-"]
    runtime = service.get("runtime") or {}
    details: list[str] = []
    for key in ("type", "cluster", "namespace", "host", "public_endpoint"):
        if runtime.get(key):
            details.append(f"{key}: {runtime[key]}")
    if runtime.get("ports"):
        details.append("ports: " + ", ".join(str(p) for p in runtime["ports"]))
    if details:
        parts.append("; ".join(details))
    return "<br>".join(md(part) for part in parts)


def access_label(service: dict[str, Any]) -> str:
    """Summarize at most two access paths as `type: status (target)`."""
    paths = service.get("access_paths") or []
    if not paths:
        return "-"
    labels = []
    for path in paths[:2]:
        labels.append(
            f"{path.get('type', '-')}: {path.get('status', 'unknown')} "
            f"({path.get('target', '-')})"
        )
    if len(paths) > 2:
        labels.append(f"+{len(paths) - 2} more")
    return "<br>".join(md(label) for label in labels)


def latest_evidence(service: dict[str, Any]) -> str:
    """Return the newest dated evidence entry, already escaped for a table cell."""
    evidence = service.get("evidence") or []
    if not evidence:
        return "-"
    dated = [item for item in evidence if item.get("observed_at")]
    # Compare as strings so quoted and unquoted YAML dates can be mixed safely.
    latest = max(dated, key=lambda item: str(item["observed_at"])) if dated else evidence[-1]
    when = latest.get("observed_at") or "undated"
    summary = latest.get("summary") or latest.get("source") or "-"
    return md(f"{when}: {summary}")


def service_table(inventory: dict[str, Any]) -> str:
    envs = {env["id"]: env for env in inventory.get("environments", [])}
    rows = [
        "| Service | Where | Owner | Endpoint | Health | Data | Access | Top Gap |",
        "|---|---|---|---|---|---|---|---|",
    ]
    for service in inventory.get("services", []):
        gaps = service.get("gaps") or []
        # latest_evidence() is already escaped; escaping it again would turn
        # "\|" into "\\|" and break the table row.
        health = md(service.get("health_status", "unknown")) + "<br>" + latest_evidence(service)
        rows.append(
            "| "
            + " | ".join(
                [
                    md(f"{service.get('name')} ({service.get('id')})"),
                    runtime_label(service, envs),
                    joined(service.get("owner_repos"), limit=3),
                    primary_endpoint(service),
                    health,
                    joined(service.get("backing_stores"), limit=3),
                    access_label(service),
                    md(gaps[0] if gaps else "-"),
                ]
            )
            + " |"
        )
    return "\n".join(rows)


def summary_table(inventory: dict[str, Any]) -> str:
    services = inventory.get("services", [])
    health = Counter(service.get("health_status", "unknown") for service in services)
    rows = [
        "| Metric | Count |",
        "|---|---:|",
        f"| Environments | {len(inventory.get('environments', []))} |",
        f"| Hosts | {len(inventory.get('hosts', []))} |",
        f"| Clusters | {len(inventory.get('clusters', []))} |",
        f"| Services | {len(services)} |",
    ]
    for status, count in sorted(health.items()):
        rows.append(f"| Services: {md(status)} | {count} |")
    return "\n".join(rows)


def gaps_section(inventory: dict[str, Any]) -> str:
    lines = ["## Open Operating Gaps", ""]
    for service in inventory.get("services", []):
        gaps = service.get("gaps") or []
        if not gaps:
            continue
        lines.append(f"### {service.get('name')} (`{service.get('id')}`)")
        lines.append("")
        for gap in gaps:
            lines.append(f"- {gap}")
        lines.append("")
    return "\n".join(lines).rstrip()


def render(inventory: dict[str, Any]) -> str:
    source = "ops/service-inventory.yml"
    reviewed = inventory.get("last_reviewed", "unknown")
    lines = [
        "# Ops Hub Service Catalog Now View",
        "",
        "<!-- generated by ops/render_service_inventory.py; edit ops/service-inventory.yml instead -->",
        "",
        f"Source: `{source}`",
        f"Inventory last reviewed: `{reviewed}`",
        "",
        "This is the repo-native first view for `CUST-WP-0047`. It exists so an",
        "operator can answer what is running where before the full standalone",
        "`ops-hub` application is available.",
        "",
        "## Summary",
        "",
        summary_table(inventory),
        "",
        "## Service Catalog",
        "",
        service_table(inventory),
        "",
        gaps_section(inventory),
        "",
        "## Next Evidence Events",
        "",
        "- `ops-service-observed` for each runtime object confirmed by a probe.",
        "- `ops-endpoint-verified` for HTTP, HTTPS, tunnel, or cluster endpoints.",
        "- `ops-access-path-checked` for non-secret access path checks.",
        "- `ops-backup-verified` where backup and restore evidence exists.",
        "- `ops-inventory-drift` when observed state differs from this inventory.",
        "",
    ]
    return "\n".join(lines)


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--input", type=Path, default=DEFAULT_INPUT)
    parser.add_argument("--output", type=Path, default=DEFAULT_OUTPUT)
    args = parser.parse_args()

    # An empty YAML file loads as None; treat it as an empty inventory.
    inventory = yaml.safe_load(args.input.read_text(encoding="utf-8")) or {}
    rendered = render(inventory)
    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(rendered, encoding="utf-8")
    print(f"rendered {args.output} from {args.input}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

ops/service-inventory.yml

@@ -0,0 +1,342 @@
version: 1
last_reviewed: "2026-06-05"

policy:
  non_secret_inventory: true
  secrets_rule: "Do not store credentials, tokens, private addresses that are not already operationally documented, or command output containing secrets."

sources:
  - path: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"
    summary: "Initial ops-hub inventory draft with environments, hosts, services, endpoints, gaps, and first widget ids."
  - path: "/home/worsch/the-custodian/workplans/CUST-WP-0025-fos-hub-bootstrap.md"
    summary: "Long-term ops-hub scaffold, models, health probes, access paths, and now-view work."
  - path: "/home/worsch/the-custodian/workplans/CUST-WP-0046-hourly-recently-on-scope-activity-core.md"
    summary: "Evidence that activity-core runs on Railiance01 and can reach State Hub through the in-cluster bridge."
  - path: "/home/worsch/the-custodian/infra/build-machines/README.md"
    summary: "Local workstation and build VM tunnel pattern."

environments:
  - id: local
    name: "Local Workstation"
    role: "Workstation development and local operations"
    lifecycle_state: observed
  - id: coulombcore
    name: "CoulombCore"
    role: "Transitional production-like runtime"
    lifecycle_state: observed
  - id: railiance01
    name: "Railiance01"
    role: "First ThreePhoenix foundation node"
    lifecycle_state: observed
  - id: threephoenix-prod
    name: "ThreePhoenix Production"
    role: "Target governed production topology"
    lifecycle_state: planned

hosts:
  - id: local-workstation
    environment: local
    address: "local/private"
    role: "State Hub and operator workstation runtime"
    evidence:
      - type: document
        source: "/home/worsch/the-custodian/infra/build-machines/README.md"
  - id: coulombcore
    environment: coulombcore
    address: "92.205.130.254"
    role: "Current live production-like server"
    evidence:
      - type: document
        source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"
  - id: railiance01
    environment: railiance01
    address: "92.205.62.239"
    role: "First ThreePhoenix foundation node"
    evidence:
      - type: document
        source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"

clusters:
  - id: coulombcore-k3s
    environment: coulombcore
    host: coulombcore
    kind: k3s
    lifecycle_state: observed
    notes: "Current operational Kubernetes runtime for Gitea and related services."
  - id: railiance01-k3s
    environment: railiance01
    host: railiance01
    kind: k3s
    lifecycle_state: observed
    notes: "Runtime substrate for activity-core production service evidence."
  - id: threephoenix-k3s
    environment: threephoenix-prod
    kind: k3s
    lifecycle_state: planned
    notes: "Target governed production cluster shape."

services:
  - id: gitea
    name: "Gitea"
    kind: application
    lifecycle_state: observed
    health_status: unknown
    environment: coulombcore
    owner_repos:
      - railiance-apps
    desired_state_sources:
      - "/home/worsch/railiance-forge/docs/gitea-package-registry.md"
      - "/home/worsch/the-custodian/ops/runbooks/gitea-coulombcore.md"
    runtime:
      type: k3s
      cluster: coulombcore-k3s
      namespace: default
      workload_refs:
        - "helm:gitea"
        - "nodePort:32166"
    endpoints:
      - id: gitea-oci-registry
        type: https
        url: "https://gitea.coulomb.social/v2/"
        expected_status: 401
        expected_signal: "OCI registry auth challenge"
        widget_ref: "ops:endpoint:gitea-registry"
    backing_stores:
      - "database:gitea-db"
      - "pvc:default/gitea-shared-storage"
    access_paths:
      - type: k8s
        target: "coulombcore-k3s/default"
        status: unknown
    evidence:
      - type: document
        observed_at: "2026-05-16"
        source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"
        summary: "Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge."
    gaps:
      - "Package token and push/pull verification need current evidence."
      - "Backup and restore evidence for database and shared storage not recorded in ops inventory."

  - id: gitea-database
    name: "Gitea Database"
    kind: datastore
    lifecycle_state: observed
    health_status: unknown
    environment: coulombcore
    owner_repos:
      - railiance-platform
    runtime:
      type: k3s
      cluster: coulombcore-k3s
      namespace: databases
      workload_refs:
        - "database:gitea-db"
    endpoints: []
    backing_stores: []
    access_paths:
      - type: k8s
        target: "coulombcore-k3s/databases"
        status: unknown
    evidence:
      - type: document
        observed_at: "2026-05-16"
        source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"
    gaps:
      - "Backup and restore evidence not recorded in ops inventory."

  - id: gitea-shared-storage
    name: "Gitea Shared Storage"
    kind: storage
    lifecycle_state: observed
    health_status: unknown
    environment: coulombcore
    owner_repos:
      - railiance-platform
      - railiance-apps
    runtime:
      type: k3s
      cluster: coulombcore-k3s
      namespace: default
      workload_refs:
        - "pvc:default/gitea-shared-storage"
    endpoints: []
    backing_stores: []
    access_paths:
      - type: k8s
        target: "coulombcore-k3s/default/pvc/gitea-shared-storage"
        status: unknown
    evidence:
      - type: document
        observed_at: "2026-05-16"
        source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"
    gaps:
      - "Package blob backup and restore evidence not confirmed."

  - id: state-hub
    name: "State Hub"
    kind: coordination-service
    lifecycle_state: observed
    health_status: observed_ok
    environment: local
    owner_repos:
      - state-hub
      - the-custodian
    desired_state_sources:
      - "/home/worsch/state-hub"
      - "/home/worsch/the-custodian/state-hub/README.md"
    runtime:
      type: local-process
      host: local-workstation
      ports:
        - 8000
    endpoints:
      - id: state-hub-local-api
        type: http
        url: "http://127.0.0.1:8000/state/health"
        expected_status: 200
        expected_signal: "health response"
    backing_stores:
      - "postgresql:state-hub"
    access_paths:
      - type: http
        target: "http://127.0.0.1:8000"
        status: observed_ok
    evidence:
      - type: session-probe
        observed_at: "2026-06-05"
        source: "Codex session curl to local State Hub"
        summary: "State Hub accepted inbox, task, and progress API calls."
    gaps:
      - "Future cluster deployment readiness still needs ops evidence."

  - id: inter-hub
    name: "Inter-Hub"
    kind: governance-service
    lifecycle_state: observed
    health_status: unknown
    environment: threephoenix-prod
    owner_repos:
      - inter-hub
    runtime:
      type: external
      public_endpoint: "https://hub.coulomb.social"
    endpoints:
      - id: inter-hub-openapi
        type: https
        url: "https://hub.coulomb.social/api/v2/openapi.json"
        expected_status: 200
        expected_signal: "OpenAPI document"
      - id: inter-hub-ui
        type: https
        url: "https://hub.coulomb.social/Hubs"
        expected_status: 302
        expected_signal: "login redirect when unauthenticated"
    backing_stores: []
    access_paths:
      - type: https
        target: "https://hub.coulomb.social"
        status: unknown
    evidence:
      - type: document
        observed_at: "2026-05-16"
        source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"
    gaps:
      - "ops-hub bootstrap requires authenticated UI flow or deployment-side migration."

  - id: activity-core
    name: "activity-core"
    kind: automation-service
    lifecycle_state: observed
    health_status: observed_ok
    environment: railiance01
    owner_repos:
      - activity-core
      - the-custodian
    desired_state_sources:
      - "/home/worsch/activity-core/k8s/railiance"
      - "/home/worsch/the-custodian/activity-definitions"
    runtime:
      type: k3s
      cluster: railiance01-k3s
      namespace: activity-core
      workload_refs:
        - "deployment:activity-core-api"
        - "deployment:activity-core-worker"
        - "temporal:schedules"
    endpoints:
      - id: activity-core-api
        type: cluster-http
        url: "activity-core API health endpoint"
        expected_status: 200
        expected_signal: "healthy DB and Temporal status"
    backing_stores:
      - "postgresql:activity-core"
      - "temporal:activity-core"
      - "nats:railiance01"
    access_paths:
      - type: k8s
        target: "railiance01-k3s/activity-core"
        status: observed_ok
    evidence:
      - type: workplan-note
        observed_at: "2026-05-23"
        source: "/home/worsch/the-custodian/workplans/CUST-WP-0046-hourly-recently-on-scope-activity-core.md"
        summary: "API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified."
    gaps:
      - "Add explicit ops inventory probes and evidence events."

  - id: ops-bridge
    name: "Ops Bridge"
    kind: connectivity-service
    lifecycle_state: observed
    health_status: unknown
    environment: local
    owner_repos:
      - ops-bridge
    runtime:
      type: bridge
      host: local-workstation
    endpoints: []
    backing_stores: []
    access_paths:
      - type: ssh-tunnel
        target: "connected remote servers"
        status: unknown
    evidence:
      - type: document
        observed_at: "2026-05-16"
        source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md"
        summary: "Bridge is useful for connected-server visibility but is not itself the service catalog."
    gaps:
      - "Emit reachability evidence into ops-hub instead of relying on bridge state as inventory."

  - id: haskell-build-agent
    name: "Haskell Build Agent"
    kind: build-service
    lifecycle_state: observed
    health_status: unknown
    environment: local
    owner_repos:
      - the-custodian
    desired_state_sources:
      - "/home/worsch/the-custodian/infra/build-machines/haskell"
    runtime:
      type: systemd
      host: haskell-build-vm
      tunnel:
        reverse_ssh: "12222:localhost:22"
        forward_state_hub: "18000:localhost:8000"
    endpoints:
      - id: haskell-build-agent-state-hub-forward
        type: tunnel
        url: "http://127.0.0.1:18000"
        expected_signal: "VM can reach State Hub through SSH forward"
    backing_stores: []
    access_paths:
      - type: ssh
        target: "local workstation reverse tunnel port 12222"
        status: unknown
    evidence:
      - type: document
        source: "/home/worsch/the-custodian/infra/build-machines/README.md"
        summary: "Build agent is a systemd service and registers with State Hub on boot."
    gaps:
      - "Current tunnel and capability registration need live evidence in ops-hub."

@@ -0,0 +1,174 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://custodian.local/schemas/ops-service-inventory.schema.json",
"title": "Ops Hub Service Inventory",
"type": "object",
"required": ["version", "last_reviewed", "environments", "hosts", "clusters", "services"],
"properties": {
"version": { "type": "integer", "minimum": 1 },
"last_reviewed": { "type": "string", "format": "date" },
"policy": {
"type": "object",
"additionalProperties": true
},
"sources": {
"type": "array",
"items": { "$ref": "#/$defs/source" }
},
"environments": {
"type": "array",
"items": { "$ref": "#/$defs/environment" }
},
"hosts": {
"type": "array",
"items": { "$ref": "#/$defs/host" }
},
"clusters": {
"type": "array",
"items": { "$ref": "#/$defs/cluster" }
},
"services": {
"type": "array",
"items": { "$ref": "#/$defs/service" }
}
},
"$defs": {
"source": {
"type": "object",
"required": ["path", "summary"],
"properties": {
"path": { "type": "string" },
"summary": { "type": "string" }
},
"additionalProperties": false
},
"environment": {
"type": "object",
"required": ["id", "name", "role", "lifecycle_state"],
"properties": {
"id": { "$ref": "#/$defs/id" },
"name": { "type": "string" },
"role": { "type": "string" },
"lifecycle_state": { "$ref": "#/$defs/lifecycle_state" }
},
"additionalProperties": false
},
"host": {
"type": "object",
"required": ["id", "environment", "role"],
"properties": {
"id": { "$ref": "#/$defs/id" },
"environment": { "$ref": "#/$defs/id" },
"address": { "type": "string" },
"role": { "type": "string" },
"evidence": {
"type": "array",
"items": { "$ref": "#/$defs/evidence" }
}
},
"additionalProperties": false
},
"cluster": {
"type": "object",
"required": ["id", "environment", "kind", "lifecycle_state"],
"properties": {
"id": { "$ref": "#/$defs/id" },
"environment": { "$ref": "#/$defs/id" },
"host": { "$ref": "#/$defs/id" },
"kind": { "type": "string" },
"lifecycle_state": { "$ref": "#/$defs/lifecycle_state" },
"notes": { "type": "string" }
},
"additionalProperties": false
},
"service": {
"type": "object",
"required": ["id", "name", "kind", "lifecycle_state", "health_status", "environment", "owner_repos", "runtime", "endpoints", "backing_stores", "access_paths", "evidence", "gaps"],
"properties": {
"id": { "$ref": "#/$defs/id" },
"name": { "type": "string" },
"kind": { "type": "string" },
"lifecycle_state": { "$ref": "#/$defs/lifecycle_state" },
"health_status": {
"enum": ["unknown", "observed_ok", "degraded", "down", "planned"]
},
"environment": { "$ref": "#/$defs/id" },
"owner_repos": {
"type": "array",
"items": { "type": "string" }
},
"desired_state_sources": {
"type": "array",
"items": { "type": "string" }
},
"runtime": {
"type": "object",
"additionalProperties": true
},
"endpoints": {
"type": "array",
"items": { "$ref": "#/$defs/endpoint" }
},
"backing_stores": {
"type": "array",
"items": { "type": "string" }
},
"access_paths": {
"type": "array",
"items": { "$ref": "#/$defs/access_path" }
},
"evidence": {
"type": "array",
"items": { "$ref": "#/$defs/evidence" }
},
"gaps": {
"type": "array",
"items": { "type": "string" }
}
},
"additionalProperties": false
},
"endpoint": {
"type": "object",
"required": ["id", "type"],
"properties": {
"id": { "$ref": "#/$defs/id" },
"type": { "type": "string" },
"url": { "type": "string" },
"expected_status": { "type": "integer" },
"expected_signal": { "type": "string" },
"widget_ref": { "type": "string" }
},
"additionalProperties": false
},
"access_path": {
"type": "object",
"required": ["type", "target", "status"],
"properties": {
"type": { "type": "string" },
"target": { "type": "string" },
"status": { "enum": ["unknown", "observed_ok", "degraded", "down", "planned"] }
},
"additionalProperties": false
},
"evidence": {
"type": "object",
"required": ["type", "source"],
"properties": {
"type": { "type": "string" },
"observed_at": { "type": "string" },
"source": { "type": "string" },
"summary": { "type": "string" }
},
"additionalProperties": false
},
"id": {
"type": "string",
"pattern": "^[a-z0-9][a-z0-9-]*$"
},
"lifecycle_state": {
"enum": ["observed", "planned", "target", "retired"]
}
},
"additionalProperties": false
}

@@ -0,0 +1,229 @@
---
id: CUST-WP-0047
type: workplan
title: "Ops Hub Service Inventory Now View"
domain: custodian
repo: the-custodian
status: active
owner: codex
topic_slug: custodian
planning_priority: high
planning_order: 47
created: "2026-06-05"
updated: "2026-06-05"
state_hub_workstream_id: "656e435d-3a00-4f5e-a38e-114467f9062e"
---

# CUST-WP-0047 - Ops Hub Service Inventory Now View

## Goal

Establish a systematic, low-implementation overview of which services are
running where, then surface that overview as the first ops-hub "now view".

The immediate strategy is inventory-first:

- declare a small service inventory in Git
- map inventory ids to existing ops-hub widget concepts in Inter-Hub
- record evidence as events rather than building a new database first
- let activity-core run repeatable probes later
- leave the full standalone ops-hub scaffold to `CUST-WP-0025`

## Relationship To CUST-WP-0025

This workplan is a narrow implementation slice of the CUST-WP-0025 Ops Hub
phase. It advances the useful parts of:

- T14, by defining the first service/access/evidence record shape
- T16, by preparing the probe/evidence path for runtime observability
- T18, by defining the first service status grid

It intentionally does not require T13, T15, T17, or T19 to be complete first.
When the standalone `ops-hub` repo exists, it should ingest these inventory and
evidence artifacts instead of replacing them.

## Scope

In scope:

- A non-secret service inventory contract.
- An initial service inventory seed covering the currently known local,
  CoulombCore, Railiance01, Inter-Hub, activity-core, bridge, and build-agent
  surfaces.
- A first ops-hub view shape: service, where, owner, endpoint, health, data,
  access, gaps.
- Inter-Hub widget/event handoff for the first visible ops-hub surface.
- activity-core probe handoff for later scheduled evidence.

Out of scope:

- Building the full standalone ops-hub FastAPI/MCP repo.
- Replacing Inter-Hub, State Hub, or activity-core.
- Capturing credentials, secret values, or sensitive command output.
- Treating bridge reachability as the service catalog.

## Task: Carve CUST-WP-0025 Inventory-First Slice

```task
id: CUST-WP-0047-T01
status: done
priority: high
state_hub_task_id: "0f2c504b-833e-4144-8849-4f74e6e6ab57"
```

Update `CUST-WP-0025` so Phase 3 explicitly recognizes this workplan as the
inventory-first implementation slice for the useful parts of T14/T16/T18.

Done when CUST-WP-0025 points to this workplan and still preserves the full
ops-hub scaffold as the long-term target.

## Task: Define Minimal Inventory Contract

```task
id: CUST-WP-0047-T02
status: done
priority: high
state_hub_task_id: "b9040dbf-64e1-46bf-bcca-e72d5a25b951"
```

Define the non-secret service inventory contract and first-view semantics.

Deliverables:

- `docs/ops-hub-service-inventory.md`
- `schemas/ops-service-inventory.schema.json`

Done when the contract explains the record shape, evidence event vocabulary,
first table view, and promotion path into the future ops-hub repo.
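
The constraints the schema places on ids and status fields can be checked without a full JSON Schema validator. The following is a minimal sketch, not the project's validation code: the id pattern and both enums are copied from `schemas/ops-service-inventory.schema.json`, while the function name `check_service_basics` is hypothetical.

```python
import re

# Copied from the schema's `id`, `lifecycle_state`, and `health_status` definitions.
# The schema anchors the pattern with ^ and $; fullmatch gives the same effect here.
ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]*")
LIFECYCLE_STATES = {"observed", "planned", "target", "retired"}
HEALTH_STATUSES = {"unknown", "observed_ok", "degraded", "down", "planned"}


def check_service_basics(service: dict) -> list[str]:
    """Return human-readable problems; an empty list means the basics pass."""
    problems = []
    if not ID_PATTERN.fullmatch(str(service.get("id", ""))):
        problems.append(f"invalid id: {service.get('id')!r}")
    if service.get("lifecycle_state") not in LIFECYCLE_STATES:
        problems.append(f"invalid lifecycle_state: {service.get('lifecycle_state')!r}")
    if service.get("health_status") not in HEALTH_STATUSES:
        problems.append(f"invalid health_status: {service.get('health_status')!r}")
    return problems
```

This covers only three fields; required keys and `additionalProperties: false` would still need a real schema validator.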

## Task: Seed Initial Service Inventory

```task
id: CUST-WP-0047-T03
status: done
priority: high
state_hub_task_id: "cf4404a8-1284-4412-a998-80cc98c617ce"
```

Create the initial inventory artifact from existing evidence in
`helix-forge/wiki/OpsHubInventory.md`, CUST-WP-0025, CUST-WP-0046, and current
Custodian ops docs.

Deliverable:

- `ops/service-inventory.yml`

Done when the seed includes environments, hosts, clusters, services, endpoints,
access paths, evidence links, and gaps for the known operating surface.
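
The schema types every `environment` field as an id, but JSON Schema alone cannot verify that the id refers to a declared environment. A seed check along these lines could catch dangling references. This is a sketch under assumptions: the top-level keys `environments`, `hosts`, and `clusters` are inferred from the commit description (only `services` is visible in this part of the schema), and the function name is hypothetical.

```python
def unresolved_environment_refs(inventory: dict) -> list[str]:
    """List records whose `environment` does not match a declared environment id."""
    declared = {env["id"] for env in inventory.get("environments", [])}
    dangling = []
    # Hosts, clusters, and services each carry an `environment` reference.
    for section in ("hosts", "clusters", "services"):
        for record in inventory.get(section, []):
            if record.get("environment") not in declared:
                dangling.append(
                    f"{section}/{record.get('id')}: {record.get('environment')}"
                )
    return dangling
```

An empty result means every reference resolves; the inventory dict is expected to be the already-parsed content of `ops/service-inventory.yml`.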

## Task: Register Workplan With State Hub

```task
id: CUST-WP-0047-T04
status: done
priority: high
state_hub_task_id: "221a30bc-d1f9-44e6-92db-99ea36c17e87"
```

Run the State Hub consistency sync for `the-custodian` so this workplan and its
task statuses are registered in the hub database.

Done when `make fix-consistency REPO=the-custodian` has completed and the
workstream appears in State Hub.

## Task: Activate Ops-Hub Widgets In Inter-Hub

```task
id: CUST-WP-0047-T05
status: wait
priority: high
state_hub_task_id: "b16c5e15-d44b-481a-abd7-3e059cb70c92"
```

Create or activate the ops-hub Inter-Hub row, capability manifest, API
consumer, and initial widgets from the existing seed material in
`helix-forge/wiki/ops-hub-widgets.seed.json`.

This is a human/operator-gated task because it requires authenticated
Inter-Hub admin access or deployment-side migration execution.

Done when the ops-hub widgets exist and can accept `ops-endpoint-verified` or
equivalent ops evidence events.

## Task: Build First Ops-Hub Service Catalog View

```task
id: CUST-WP-0047-T06
status: done
priority: high
state_hub_task_id: "db97a10d-2b20-4ac8-97a2-0f81e3fca907"
```

Build the first visible service catalog view from `ops/service-inventory.yml`
plus latest evidence events.

The view should show:

- service
- where it runs
- owner repo
- endpoint
- health and last evidence
- data/backing store gaps
- access path status
- highest-priority operating gaps

Done when an operator can open ops-hub and answer "what is running where?"
without reading scattered workplans and runbooks.

Completed 2026-06-05:

- Added `ops/render_service_inventory.py`.
- Added `make ops-inventory-view`.
- Generated `docs/ops-hub-service-catalog.md` from
  `ops/service-inventory.yml`.

This is the repo-native now view until the Inter-Hub/ops-hub widget surface is
activated.
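
To illustrate the view shape named in the scope (service, where, owner, endpoint, health, data, access, gaps), here is a minimal sketch of turning schema-shaped service records into a Markdown table. It is not the content of `ops/render_service_inventory.py`; the function names and the exact column-to-field mapping are assumptions.

```python
COLUMNS = ["Service", "Where", "Owner", "Endpoint", "Health", "Data", "Access", "Gaps"]


def _cell(values) -> str:
    """Join non-empty values into one table cell; escape pipes so rows stay intact."""
    text = ", ".join(str(v) for v in values if v) or "-"
    return text.replace("|", "\\|")


def render_catalog(services: list[dict]) -> str:
    """Render one table row per service, using field names from the schema."""
    lines = ["| " + " | ".join(COLUMNS) + " |", "|" + " --- |" * len(COLUMNS)]
    for svc in services:
        row = [
            _cell([svc.get("name")]),
            _cell([svc.get("environment")]),
            _cell(svc.get("owner_repos", [])),
            # Prefer the URL; fall back to the endpoint id when no URL is declared.
            _cell(ep.get("url") or ep.get("id") for ep in svc.get("endpoints", [])),
            _cell([svc.get("health_status")]),
            _cell(svc.get("backing_stores", [])),
            _cell(f"{p.get('type')}:{p.get('status')}" for p in svc.get("access_paths", [])),
            _cell(svc.get("gaps", [])),
        ]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```

Empty lists render as `-` so that missing backing stores or gaps stay visible in the grid instead of collapsing the column.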

## Task: Schedule Activity-Core Inventory Probes

```task
id: CUST-WP-0047-T07
status: progress
priority: medium
state_hub_task_id: "5a972670-934f-458c-8274-acabc290992f"
```

Add an activity-core handoff for repeatable inventory probes.

Initial probe candidates:

- State Hub local health endpoint.
- Inter-Hub OpenAPI endpoint.
- Gitea OCI registry auth challenge.
- activity-core API health and Temporal schedule availability.
- ops-bridge tunnel reachability.
- build-agent State Hub registration and tunnel state.

Done when activity-core can run the probes on a schedule and submit non-secret
ops evidence events against the inventory ids.

Progress 2026-06-05:

- Added disabled draft handoff definition
  `activity-definitions/ops-service-inventory-probes.md`.
- The definition names the inventory/catalog paths, hourly trigger, first probe
  candidates, and evidence event mapping.

Remaining work: implement the activity-core probe runner/resolver and enable the
definition only after the ops-hub Inter-Hub widget/event sink is active.
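
As a rough illustration of what one HTTP probe could look like, the sketch below compares an endpoint's observed status with its `expected_status` and returns a record shaped like the schema's `evidence` object. It is not the activity-core runner, which does not exist yet. Assumptions: the default expected status of 200, the `activity-core-probe` source label, and the `ops-endpoint-failed` event name are all hypothetical; only `ops-endpoint-verified` appears in this workplan.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An error status (e.g. a 401 auth challenge) is still a valid observation.
        return err.code


def probe_endpoint(
    endpoint: dict, fetch_status: Callable[[str], int] = http_status
) -> tuple[str, dict]:
    """Probe one endpoint that declares a `url`; return (health_status, evidence)."""
    expected = endpoint.get("expected_status", 200)  # default is an assumption
    try:
        actual = fetch_status(endpoint["url"])
        ok = actual == expected
        summary = f"{endpoint['id']}: HTTP {actual}, expected {expected}"
    except OSError as exc:
        # Record only the exception class so no sensitive output reaches the evidence.
        ok = False
        summary = f"{endpoint['id']}: request failed ({type(exc).__name__})"
    evidence = {
        # The failure event name is hypothetical; the workplan names only the verified one.
        "type": "ops-endpoint-verified" if ok else "ops-endpoint-failed",
        "observed_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source": "activity-core-probe",  # hypothetical source label
        "summary": summary,
    }
    return ("observed_ok" if ok else "down"), evidence
```

Passing `fetch_status` as a parameter keeps the probe testable without network access, and treating an `HTTPError` as a status observation lets a probe such as the registry auth challenge expect a 401.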

## Acceptance Criteria

- The service inventory has a stable file and schema in this repo.
- CUST-WP-0025 points to this workplan as the inventory-first slice.
- The workplan is registered in State Hub.
- The remaining blocked work is explicit: Inter-Hub ops-hub activation (T05),
  the widget-backed view that depends on it, and the activity-core probe
  runner (T07).
- No secrets or sensitive command output are stored in the inventory.