From b1aac08eb2a7fd565bacfa638537066d13565688 Mon Sep 17 00:00:00 2001 From: tegwick Date: Sun, 7 Jun 2026 00:12:30 +0200 Subject: [PATCH] feat(ops): add ops-hub service inventory now view (CUST-WP-0047) Seed a non-secret service inventory (environments, hosts, clusters, services, endpoints, access paths, evidence, gaps) with a JSON schema, a renderer, and a generated service-catalog view. Adds the `make ops-inventory-view` target, probe ActivityDefinition, and docs. Co-Authored-By: Claude Opus 4.8 --- Makefile | 4 + .../ops-service-inventory-probes.md | 88 +++++ docs/ops-hub-service-catalog.md | 77 ++++ docs/ops-hub-service-inventory.md | 94 +++++ ops/README.md | 14 + ops/render_service_inventory.py | 216 +++++++++++ ops/service-inventory.yml | 342 ++++++++++++++++++ schemas/ops-service-inventory.schema.json | 174 +++++++++ ...0047-ops-hub-service-inventory-now-view.md | 229 ++++++++++++ 9 files changed, 1238 insertions(+) create mode 100644 activity-definitions/ops-service-inventory-probes.md create mode 100644 docs/ops-hub-service-catalog.md create mode 100644 docs/ops-hub-service-inventory.md create mode 100644 ops/render_service_inventory.py create mode 100644 ops/service-inventory.yml create mode 100644 schemas/ops-service-inventory.schema.json create mode 100644 workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md diff --git a/Makefile b/Makefile index 8e43ead..9386b11 100644 --- a/Makefile +++ b/Makefile @@ -16,6 +16,10 @@ CUSTODIAN_KEY := $(HOME)/.ssh/id_custodian_agent RAILIANCE_INFRA := $(HOME)/railiance-infra AGENT_VARS_FILE := $(RAILIANCE_INFRA)/ansible/inventory/group_vars/all.yaml +.PHONY: ops-inventory-view +ops-inventory-view: ## Render the ops-hub service catalog now view + python3 ops/render_service_inventory.py + .PHONY: custodian-keygen custodian-keygen: ## Generate custodian agent SSH keypair (one-time setup) @if [ -f "$(CUSTODIAN_KEY)" ]; then \ diff --git a/activity-definitions/ops-service-inventory-probes.md 
b/activity-definitions/ops-service-inventory-probes.md new file mode 100644 index 0000000..094aab0 --- /dev/null +++ b/activity-definitions/ops-service-inventory-probes.md @@ -0,0 +1,88 @@ +--- +id: "40d15a87-7ff6-4d8e-992c-37df15f95110" +name: "Ops Service Inventory Probes" +type: activity-definition +version: "0.1" +enabled: false +owner: custodian +governance: custodian +status: proposed +created: "2026-06-05" +trigger: + type: cron + cron_expression: "15 * * * *" + timezone: Europe/Berlin + misfire_policy: skip +context_sources: + - type: static + bind_to: context.inventory_path + config: + value: /home/worsch/the-custodian/ops/service-inventory.yml + - type: static + bind_to: context.catalog_path + config: + value: /home/worsch/the-custodian/docs/ops-hub-service-catalog.md +--- + +# ActivityDefinition: Ops Service Inventory Probes + +## Purpose + +This disabled draft is the activity-core handoff point for +`CUST-WP-0047 - Ops Hub Service Inventory Now View`. + +The future enabled routine should read the non-secret inventory, run repeatable +probes for declared endpoints and access paths, render the catalog view, and +submit non-secret ops evidence events against stable inventory ids. + +## Runner Status + +This definition is intentionally `enabled: false`. + +Do not enable it until both of these are true: + +- activity-core has an inventory probe runner or State Hub resolver that can + execute the checks without embedding secrets in ActivityRun context +- the ops-hub Inter-Hub widget/event sink can accept `ops-service-observed`, + `ops-endpoint-verified`, `ops-access-path-checked`, `ops-backup-verified`, + and `ops-inventory-drift` events + +## Trigger + +Hourly at minute 15 in `Europe/Berlin`, with `misfire_policy: skip`. + +This offset avoids colliding with the hourly RecentlyOnScope run at minute 0. 
+ +## Probe Candidates + +Initial deterministic probes: + +- State Hub local health endpoint: + `http://127.0.0.1:8000/state/health` +- Inter-Hub OpenAPI endpoint: + `https://hub.coulomb.social/api/v2/openapi.json` +- Gitea OCI registry auth challenge: + `https://gitea.coulomb.social/v2/` +- activity-core API health and Temporal schedule availability +- ops-bridge tunnel reachability +- Haskell build-agent State Hub registration and tunnel state + +## Output Contract + +Each successful run should produce: + +- an updated `docs/ops-hub-service-catalog.md` +- one evidence event per checked service/endpoint/access path +- one ActivityRun with compact non-secret summary metadata +- no credentials, tokens, cookies, private key material, or sensitive command + output in context snapshots, event metadata, reports, or logs + +## Event Mapping + +| Probe result | Event type | +|---|---| +| Runtime object observed | `ops-service-observed` | +| HTTP/HTTPS/tunnel endpoint matches expected signal | `ops-endpoint-verified` | +| SSH, Kubernetes, or HTTP access path checked | `ops-access-path-checked` | +| Backup and restore evidence found | `ops-backup-verified` | +| Observed runtime differs from inventory | `ops-inventory-drift` | diff --git a/docs/ops-hub-service-catalog.md b/docs/ops-hub-service-catalog.md new file mode 100644 index 0000000..0a6d186 --- /dev/null +++ b/docs/ops-hub-service-catalog.md @@ -0,0 +1,77 @@ +# Ops Hub Service Catalog Now View + + + +Source: `ops/service-inventory.yml` +Inventory last reviewed: `2026-06-05` + +This is the repo-native first view for `CUST-WP-0047`. It exists so an +operator can answer what is running where before the full standalone +`ops-hub` application is available. 
+
+## Summary
+
+| Metric | Count |
+|---|---:|
+| Environments | 4 |
+| Hosts | 3 |
+| Clusters | 3 |
+| Services | 8 |
+| Services: observed_ok | 2 |
+| Services: unknown | 6 |
+
+## Service Catalog
+
+| Service | Where | Owner | Endpoint | Health | Data | Access | Top Gap |
+|---|---|---|---|---|---|---|---|
+| Gitea (gitea) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-apps | https://gitea.coulomb.social/v2/<br>Expected: status 401, OCI registry auth challenge | unknown<br>2026-05-16: Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge. | database:gitea-db<br>pvc:default/gitea-shared-storage | k8s: unknown (coulombcore-k3s/default) | Package token and push/pull verification need current evidence. |
+| Gitea Database (gitea-database) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: databases | railiance-platform | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/databases) | Backup and restore evidence not recorded in ops inventory. |
+| Gitea Shared Storage (gitea-shared-storage) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-platform<br>railiance-apps | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/default/pvc/gitea-shared-storage) | Package blob backup and restore evidence not confirmed. |
+| State Hub (state-hub) | Local Workstation<br>type: local-process; host: local-workstation; ports: 8000 | state-hub<br>the-custodian | http://127.0.0.1:8000/state/health<br>Expected: status 200, health response | observed_ok<br>2026-06-05: State Hub accepted inbox, task, and progress API calls. | postgresql:state-hub | http: observed_ok (http://127.0.0.1:8000) | Future cluster deployment readiness still needs ops evidence. |
+| Inter-Hub (inter-hub) | ThreePhoenix Production<br>type: external; public_endpoint: https://hub.coulomb.social | inter-hub | https://hub.coulomb.social/api/v2/openapi.json<br>Expected: status 200, OpenAPI document | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | https: unknown (https://hub.coulomb.social) | ops-hub bootstrap requires authenticated UI flow or deployment-side migration. |
+| activity-core (activity-core) | Railiance01<br>type: k3s; cluster: railiance01-k3s; namespace: activity-core | activity-core<br>the-custodian | activity-core API health endpoint<br>Expected: status 200, healthy DB and Temporal status | observed_ok<br>2026-05-23: API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified. | postgresql:activity-core<br>temporal:activity-core<br>nats:railiance01 | k8s: observed_ok (railiance01-k3s/activity-core) | Add explicit ops inventory probes and evidence events. |
+| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | unknown<br>2026-05-16: Bridge is useful for connected-server visibility but is not itself the service catalog. | - | ssh-tunnel: unknown (connected remote servers) | Emit reachability evidence into ops-hub instead of relying on bridge state as inventory. |
+| Haskell Build Agent (haskell-build-agent) | Local Workstation<br>type: systemd; host: haskell-build-vm | the-custodian | http://127.0.0.1:18000<br>Expected: VM can reach State Hub through SSH forward | unknown<br>undated: Build agent is a systemd service and registers with State Hub on boot. | - | ssh: unknown (local workstation reverse tunnel port 12222) | Current tunnel and capability registration need live evidence in ops-hub. |
+
+## Open Operating Gaps
+
+### Gitea (`gitea`)
+
+- Package token and push/pull verification need current evidence.
+- Backup and restore evidence for database and shared storage not recorded in ops inventory.
+
+### Gitea Database (`gitea-database`)
+
+- Backup and restore evidence not recorded in ops inventory.
+
+### Gitea Shared Storage (`gitea-shared-storage`)
+
+- Package blob backup and restore evidence not confirmed.
+
+### State Hub (`state-hub`)
+
+- Future cluster deployment readiness still needs ops evidence.
+
+### Inter-Hub (`inter-hub`)
+
+- ops-hub bootstrap requires authenticated UI flow or deployment-side migration.
+
+### activity-core (`activity-core`)
+
+- Add explicit ops inventory probes and evidence events.
+
+### Ops Bridge (`ops-bridge`)
+
+- Emit reachability evidence into ops-hub instead of relying on bridge state as inventory.
+
+### Haskell Build Agent (`haskell-build-agent`)
+
+- Current tunnel and capability registration need live evidence in ops-hub.
+
+## Next Evidence Events
+
+- `ops-service-observed` for each runtime object confirmed by a probe.
+- `ops-endpoint-verified` for HTTP, HTTPS, tunnel, or cluster endpoints.
+- `ops-access-path-checked` for non-secret access path checks.
+- `ops-backup-verified` where backup and restore evidence exists.
+- `ops-inventory-drift` when observed state differs from this inventory.
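Review note, not part of the patch: the event vocabulary listed for the catalog could be exercised with a small pure function before any probe runner exists. The sketch below is only an illustration under stated assumptions. `EvidenceEvent` and `classify_endpoint_probe` are hypothetical names, and treating a status mismatch as `ops-inventory-drift` is an assumption, since the patch only says drift applies when observed state differs from the inventory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class EvidenceEvent:
    """Hypothetical non-secret evidence event keyed by a stable inventory id."""

    event_type: str
    inventory_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


def classify_endpoint_probe(
    endpoint: dict[str, Any], observed_status: int
) -> Optional[EvidenceEvent]:
    """Map one observed HTTP status onto the event vocabulary.

    Returns None when the inventory declares no expected_status, because
    there is nothing to compare against (assumed behaviour).
    """
    expected = endpoint.get("expected_status")
    if expected is None:
        return None
    matched = observed_status == expected
    # Assumption: a mismatch is reported as drift rather than as a failure event.
    event_type = "ops-endpoint-verified" if matched else "ops-inventory-drift"
    return EvidenceEvent(
        event_type=event_type,
        inventory_id=endpoint["id"],
        metadata={"expected_status": expected, "observed_status": observed_status},
    )


# Endpoint values copied from the gitea entry in ops/service-inventory.yml.
gitea_registry = {"id": "gitea-oci-registry", "expected_status": 401}
event = classify_endpoint_probe(gitea_registry, 401)
print(event.event_type if event else "no expectation declared")
```

Keeping the classification free of network calls means the mapping can be unit-tested on its own, and the metadata carries only status codes, in line with the non-secret output contract.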
diff --git a/docs/ops-hub-service-inventory.md b/docs/ops-hub-service-inventory.md new file mode 100644 index 0000000..86b47fe --- /dev/null +++ b/docs/ops-hub-service-inventory.md @@ -0,0 +1,94 @@ +# Ops Hub Service Inventory + +Date: 2026-06-05 + +## Purpose + +The first ops-hub "now view" should answer one practical question: + +> What service is running where, who owns it, how is it reached, and what +> evidence says it is alive? + +The lowest-effort path is a small read model, not a full new application. The +read model starts as `ops/service-inventory.yml`, can be surfaced through +Inter-Hub ops widgets, and can later be ingested by the standalone `ops-hub` +repo planned in `CUST-WP-0025`. + +## Operating Model + +- Git owns the declared inventory. +- Inter-Hub widgets expose the visible ops entities. +- Interaction events provide timestamped operational evidence. +- activity-core runs repeatable probes and writes evidence. +- State Hub continues to own workstreams, tasks, decisions, and progress. It is + not the service catalog. + +## Minimal Record Shape + +Each service record should include: + +- `id`: stable lowercase service id, for example `state-hub`. +- `name`: human-readable name. +- `lifecycle_state`: `observed`, `planned`, `target`, or `retired`. +- `health_status`: `unknown`, `observed_ok`, `degraded`, `down`, or `planned`. +- `environment`: environment id where the service currently belongs. +- `owner_repos`: repos that own desired state, runtime code, or runbooks. +- `runtime`: runtime kind and location details, such as `local-process`, + `k3s`, `systemd`, `external`, or `bridge`. +- `endpoints`: public, local, cluster, or tunnel endpoints with expected + non-secret checks. +- `backing_stores`: databases, PVCs, object stores, or external stores that + must be backed up with the service. +- `access_paths`: non-secret descriptions of SSH, Kubernetes, HTTP, or tunnel + paths. 
+- `evidence`: links to docs, progress events, probe results, or workplans. +- `gaps`: missing evidence or operating controls. + +The schema lives at `schemas/ops-service-inventory.schema.json`. + +## First View + +The initial ops-hub view can be a dense table: + +| Column | Meaning | +|---|---| +| Service | `name` plus `id` | +| Where | environment, host, cluster, namespace | +| Owner | owner repo and desired state source | +| Endpoint | primary endpoint and expected check | +| Health | latest health status and last evidence timestamp | +| Data | backing stores and backup gap summary | +| Access | access path status | +| Gaps | highest-priority missing operating evidence | + +This is enough to make scattered operational reality visible without waiting +for a full incident system, runbook executor, or custom database. + +The repo-native version is rendered to `docs/ops-hub-service-catalog.md`: + +```bash +make ops-inventory-view +``` + +## Evidence Events + +Use a small event vocabulary first: + +- `ops-service-observed`: service/runtime object was observed. +- `ops-endpoint-verified`: endpoint responded as expected. +- `ops-access-path-checked`: access path was checked without storing secrets. +- `ops-backup-verified`: backup and restore evidence exists. +- `ops-inventory-drift`: observed state differs from declared inventory. + +Event metadata should reference the stable inventory id and include non-secret +probe output only. + +## Promotion Path + +1. Keep `ops/service-inventory.yml` as the source artifact. +2. Seed or update Inter-Hub widgets from the inventory ids. +3. Let activity-core run probes and submit evidence events. +4. Build the first ops-hub view from inventory plus latest evidence. +5. When the standalone `ops-hub` repo exists, ingest the same inventory and + evidence events into the proper Service, AccessPath, Runbook, and Incident + models from `CUST-WP-0025`. 
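Review note, not part of the patch: the record shape described in `docs/ops-hub-service-inventory.md` includes one rule that a JSON Schema alone cannot express, namely that each service's `environment` must name a declared environment. A minimal sketch of such a cross-reference check follows. `check_services` is a hypothetical helper, and the allowed values are copied from the enums listed in the document.

```python
from __future__ import annotations

from typing import Any

# Allowed values as listed under "Minimal Record Shape".
HEALTH_STATUSES = {"unknown", "observed_ok", "degraded", "down", "planned"}
LIFECYCLE_STATES = {"observed", "planned", "target", "retired"}


def check_services(inventory: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means no findings."""
    env_ids = {env["id"] for env in inventory.get("environments", [])}
    problems: list[str] = []
    for service in inventory.get("services", []):
        sid = service.get("id", "<missing id>")
        environment = service.get("environment")
        if environment not in env_ids:
            problems.append(f"{sid}: unknown environment {environment!r}")
        if service.get("health_status") not in HEALTH_STATUSES:
            problems.append(f"{sid}: invalid health_status {service.get('health_status')!r}")
        if service.get("lifecycle_state") not in LIFECYCLE_STATES:
            problems.append(f"{sid}: invalid lifecycle_state {service.get('lifecycle_state')!r}")
    return problems


# Tiny in-memory inventory for illustration; a real run would load the YAML file.
sample = {
    "environments": [{"id": "local"}],
    "services": [
        {"id": "state-hub", "environment": "local",
         "health_status": "observed_ok", "lifecycle_state": "observed"},
        {"id": "orphan", "environment": "nowhere",
         "health_status": "unknown", "lifecycle_state": "observed"},
    ],
}
for problem in check_services(sample):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets a caller report every finding in a single pass.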
diff --git a/ops/README.md b/ops/README.md index f09f824..7f836d9 100644 --- a/ops/README.md +++ b/ops/README.md @@ -6,10 +6,24 @@ Operational runbooks and incident reports for the Railiance/Custodian infrastruc ``` ops/ + service-inventory.yml — non-secret service/location/evidence seed for ops-hub runbooks/ — how-to guides for recurring operational tasks and known issues incidents/ — post-incident reports (append-only, one file per incident) ``` +## Inventory + +| Artifact | Covers | +|----------|--------| +| [service-inventory.yml](service-inventory.yml) | Initial ops-hub service inventory: environments, hosts, clusters, services, endpoints, access paths, evidence, and gaps | +| [../docs/ops-hub-service-catalog.md](../docs/ops-hub-service-catalog.md) | Rendered service catalog now view generated from the inventory | + +Render the first catalog view with: + +```bash +make ops-inventory-view +``` + ## Runbooks | Runbook | Covers |
diff --git a/ops/render_service_inventory.py b/ops/render_service_inventory.py
new file mode 100644
index 0000000..9f67832
--- /dev/null
+++ b/ops/render_service_inventory.py
@@ -0,0 +1,216 @@
+#!/usr/bin/env python3
+"""Render the ops service inventory into a compact Markdown now view."""
+
+from __future__ import annotations
+
+import argparse
+from collections import Counter
+from pathlib import Path
+from typing import Any
+
+try:
+    import yaml
+except ImportError as exc:  # pragma: no cover - environment guard
+    raise SystemExit("PyYAML is required to render ops/service-inventory.yml") from exc
+
+
+DEFAULT_INPUT = Path("ops/service-inventory.yml")
+DEFAULT_OUTPUT = Path("docs/ops-hub-service-catalog.md")
+
+
+def text(value: Any, default: str = "-") -> str:
+    if value is None:
+        return default
+    if isinstance(value, str):
+        return value if value else default
+    return str(value)
+
+
+def md(value: Any) -> str:
+    return text(value).replace("|", "\\|").replace("\n", "<br>")
+
+
+def joined(values: list[Any] | None, limit: int | None = None) -> str:
+    if not values:
+        return "-"
+    items = [text(v) for v in values]
+    if limit is not None and len(items) > limit:
+        shown = items[:limit]
+        shown.append(f"+{len(items) - limit} more")
+        items = shown
+    return "<br>".join(md(item) for item in items)
+
+
+def endpoint_label(endpoint: dict[str, Any]) -> str:
+    label = endpoint.get("url") or endpoint.get("id") or "-"
+    checks: list[str] = []
+    if endpoint.get("expected_status") is not None:
+        checks.append(f"status {endpoint['expected_status']}")
+    if endpoint.get("expected_signal"):
+        checks.append(endpoint["expected_signal"])
+    if checks:
+        label = f"{label}<br>Expected: {', '.join(checks)}"
+    return md(label)
+
+
+def primary_endpoint(service: dict[str, Any]) -> str:
+    endpoints = service.get("endpoints") or []
+    if not endpoints:
+        return "-"
+    return endpoint_label(endpoints[0])
+
+
+def runtime_label(service: dict[str, Any], envs: dict[str, dict[str, Any]]) -> str:
+    env_id = service.get("environment")
+    env = envs.get(env_id, {})
+    parts = [env.get("name") or env_id or "-"]
+
+    runtime = service.get("runtime") or {}
+    details: list[str] = []
+    for key in ("type", "cluster", "namespace", "host", "public_endpoint"):
+        if runtime.get(key):
+            details.append(f"{key}: {runtime[key]}")
+    if runtime.get("ports"):
+        details.append("ports: " + ", ".join(str(p) for p in runtime["ports"]))
+    if details:
+        parts.append("; ".join(details))
+
+    return "<br>".join(md(part) for part in parts)
+
+
+def access_label(service: dict[str, Any]) -> str:
+    paths = service.get("access_paths") or []
+    if not paths:
+        return "-"
+    labels = []
+    for path in paths[:2]:
+        labels.append(
+            f"{path.get('type', '-')}: {path.get('status', 'unknown')} "
+            f"({path.get('target', '-')})"
+        )
+    if len(paths) > 2:
+        labels.append(f"+{len(paths) - 2} more")
+    return "<br>".join(md(label) for label in labels)
+
+
+def latest_evidence(service: dict[str, Any]) -> str:
+    evidence = service.get("evidence") or []
+    if not evidence:
+        return "-"
+    dated = [item for item in evidence if item.get("observed_at")]
+    latest = max(dated, key=lambda item: item["observed_at"]) if dated else evidence[-1]
+    when = latest.get("observed_at") or "undated"
+    summary = latest.get("summary") or latest.get("source") or "-"
+    return md(f"{when}: {summary}")
+
+
+def service_table(inventory: dict[str, Any]) -> str:
+    envs = {env["id"]: env for env in inventory.get("environments", [])}
+    rows = [
+        "| Service | Where | Owner | Endpoint | Health | Data | Access | Top Gap |",
+        "|---|---|---|---|---|---|---|---|",
+    ]
+    for service in inventory.get("services", []):
+        gaps = service.get("gaps") or []
+        rows.append(
+            "| "
+            + " | ".join(
+                [
+                    md(f"{service.get('name')} ({service.get('id')})"),
+                    runtime_label(service, envs),
+                    joined(service.get("owner_repos"), limit=3),
+                    primary_endpoint(service),
+                    md(f"{service.get('health_status', 'unknown')}<br>{latest_evidence(service)}"),
+                    joined(service.get("backing_stores"), limit=3),
+                    access_label(service),
+                    md(gaps[0] if gaps else "-"),
+                ]
+            )
+            + " |"
+        )
+    return "\n".join(rows)
+
+
+def summary_table(inventory: dict[str, Any]) -> str:
+    services = inventory.get("services", [])
+    health = Counter(service.get("health_status", "unknown") for service in services)
+    rows = [
+        "| Metric | Count |",
+        "|---|---:|",
+        f"| Environments | {len(inventory.get('environments', []))} |",
+        f"| Hosts | {len(inventory.get('hosts', []))} |",
+        f"| Clusters | {len(inventory.get('clusters', []))} |",
+        f"| Services | {len(services)} |",
+    ]
+    for status, count in sorted(health.items()):
+        rows.append(f"| Services: {md(status)} | {count} |")
+    return "\n".join(rows)
+
+
+def gaps_section(inventory: dict[str, Any]) -> str:
+    lines = ["## Open Operating Gaps", ""]
+    for service in inventory.get("services", []):
+        gaps = service.get("gaps") or []
+        if not gaps:
+            continue
+        lines.append(f"### {service.get('name')} (`{service.get('id')}`)")
+        lines.append("")
+        for gap in gaps:
+            lines.append(f"- {gap}")
+        lines.append("")
+    return "\n".join(lines).rstrip()
+
+
+def render(inventory: dict[str, Any]) -> str:
+    source = "ops/service-inventory.yml"
+    reviewed = inventory.get("last_reviewed", "unknown")
+    lines = [
+        "# Ops Hub Service Catalog Now View",
+        "",
+        "",
+        "",
+        f"Source: `{source}`",
+        f"Inventory last reviewed: `{reviewed}`",
+        "",
+        "This is the repo-native first view for `CUST-WP-0047`. It exists so an",
+        "operator can answer what is running where before the full standalone",
+        "`ops-hub` application is available.",
+        "",
+        "## Summary",
+        "",
+        summary_table(inventory),
+        "",
+        "## Service Catalog",
+        "",
+        service_table(inventory),
+        "",
+        gaps_section(inventory),
+        "",
+        "## Next Evidence Events",
+        "",
+        "- `ops-service-observed` for each runtime object confirmed by a probe.",
+        "- `ops-endpoint-verified` for HTTP, HTTPS, tunnel, or cluster endpoints.",
+        "- `ops-access-path-checked` for non-secret access path checks.",
+        "- `ops-backup-verified` where backup and restore evidence exists.",
+        "- `ops-inventory-drift` when observed state differs from this inventory.",
+        "",
+    ]
+    return "\n".join(lines)
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--input", type=Path, default=DEFAULT_INPUT)
+    parser.add_argument("--output", type=Path, default=DEFAULT_OUTPUT)
+    args = parser.parse_args()
+
+    inventory = yaml.safe_load(args.input.read_text(encoding="utf-8"))
+    rendered = render(inventory)
+    args.output.parent.mkdir(parents=True, exist_ok=True)
+    args.output.write_text(rendered, encoding="utf-8")
+    print(f"rendered {args.output} from {args.input}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/ops/service-inventory.yml b/ops/service-inventory.yml new file mode 100644 index 0000000..5a4b30f --- /dev/null +++ b/ops/service-inventory.yml @@ -0,0 +1,342 @@ +version: 1 +last_reviewed: "2026-06-05" +policy: + non_secret_inventory: true + secrets_rule: "Do not store credentials, tokens, private addresses that are not already operationally documented, or command output containing secrets." +sources: + - path: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + summary: "Initial ops-hub inventory draft with environments, hosts, services, endpoints, gaps, and first widget ids."
+ - path: "/home/worsch/the-custodian/workplans/CUST-WP-0025-fos-hub-bootstrap.md" + summary: "Long-term ops-hub scaffold, models, health probes, access paths, and now-view work." + - path: "/home/worsch/the-custodian/workplans/CUST-WP-0046-hourly-recently-on-scope-activity-core.md" + summary: "Evidence that activity-core runs on Railiance01 and can reach State Hub through the in-cluster bridge." + - path: "/home/worsch/the-custodian/infra/build-machines/README.md" + summary: "Local workstation and build VM tunnel pattern." + +environments: + - id: local + name: "Local Workstation" + role: "Workstation development and local operations" + lifecycle_state: observed + - id: coulombcore + name: "CoulombCore" + role: "Transitional production-like runtime" + lifecycle_state: observed + - id: railiance01 + name: "Railiance01" + role: "First ThreePhoenix foundation node" + lifecycle_state: observed + - id: threephoenix-prod + name: "ThreePhoenix Production" + role: "Target governed production topology" + lifecycle_state: planned + +hosts: + - id: local-workstation + environment: local + address: "local/private" + role: "State Hub and operator workstation runtime" + evidence: + - type: document + source: "/home/worsch/the-custodian/infra/build-machines/README.md" + - id: coulombcore + environment: coulombcore + address: "92.205.130.254" + role: "Current live production-like server" + evidence: + - type: document + source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + - id: railiance01 + environment: railiance01 + address: "92.205.62.239" + role: "First ThreePhoenix foundation node" + evidence: + - type: document + source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + +clusters: + - id: coulombcore-k3s + environment: coulombcore + host: coulombcore + kind: k3s + lifecycle_state: observed + notes: "Current operational Kubernetes runtime for Gitea and related services." 
+ - id: railiance01-k3s + environment: railiance01 + host: railiance01 + kind: k3s + lifecycle_state: observed + notes: "Runtime substrate for activity-core production service evidence." + - id: threephoenix-k3s + environment: threephoenix-prod + kind: k3s + lifecycle_state: planned + notes: "Target governed production cluster shape." + +services: + - id: gitea + name: "Gitea" + kind: application + lifecycle_state: observed + health_status: unknown + environment: coulombcore + owner_repos: + - railiance-apps + desired_state_sources: + - "/home/worsch/railiance-forge/docs/gitea-package-registry.md" + - "/home/worsch/the-custodian/ops/runbooks/gitea-coulombcore.md" + runtime: + type: k3s + cluster: coulombcore-k3s + namespace: default + workload_refs: + - "helm:gitea" + - "nodePort:32166" + endpoints: + - id: gitea-oci-registry + type: https + url: "https://gitea.coulomb.social/v2/" + expected_status: 401 + expected_signal: "OCI registry auth challenge" + widget_ref: "ops:endpoint:gitea-registry" + backing_stores: + - "database:gitea-db" + - "pvc:default/gitea-shared-storage" + access_paths: + - type: k8s + target: "coulombcore-k3s/default" + status: unknown + evidence: + - type: document + observed_at: "2026-05-16" + source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + summary: "Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge." + gaps: + - "Package token and push/pull verification need current evidence." + - "Backup and restore evidence for database and shared storage not recorded in ops inventory." 
+ + - id: gitea-database + name: "Gitea Database" + kind: datastore + lifecycle_state: observed + health_status: unknown + environment: coulombcore + owner_repos: + - railiance-platform + runtime: + type: k3s + cluster: coulombcore-k3s + namespace: databases + workload_refs: + - "database:gitea-db" + endpoints: [] + backing_stores: [] + access_paths: + - type: k8s + target: "coulombcore-k3s/databases" + status: unknown + evidence: + - type: document + observed_at: "2026-05-16" + source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + gaps: + - "Backup and restore evidence not recorded in ops inventory." + + - id: gitea-shared-storage + name: "Gitea Shared Storage" + kind: storage + lifecycle_state: observed + health_status: unknown + environment: coulombcore + owner_repos: + - railiance-platform + - railiance-apps + runtime: + type: k3s + cluster: coulombcore-k3s + namespace: default + workload_refs: + - "pvc:default/gitea-shared-storage" + endpoints: [] + backing_stores: [] + access_paths: + - type: k8s + target: "coulombcore-k3s/default/pvc/gitea-shared-storage" + status: unknown + evidence: + - type: document + observed_at: "2026-05-16" + source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + gaps: + - "Package blob backup and restore evidence not confirmed." 
+ + - id: state-hub + name: "State Hub" + kind: coordination-service + lifecycle_state: observed + health_status: observed_ok + environment: local + owner_repos: + - state-hub + - the-custodian + desired_state_sources: + - "/home/worsch/state-hub" + - "/home/worsch/the-custodian/state-hub/README.md" + runtime: + type: local-process + host: local-workstation + ports: + - 8000 + endpoints: + - id: state-hub-local-api + type: http + url: "http://127.0.0.1:8000/state/health" + expected_status: 200 + expected_signal: "health response" + backing_stores: + - "postgresql:state-hub" + access_paths: + - type: http + target: "http://127.0.0.1:8000" + status: observed_ok + evidence: + - type: session-probe + observed_at: "2026-06-05" + source: "Codex session curl to local State Hub" + summary: "State Hub accepted inbox, task, and progress API calls." + gaps: + - "Future cluster deployment readiness still needs ops evidence." + + - id: inter-hub + name: "Inter-Hub" + kind: governance-service + lifecycle_state: observed + health_status: unknown + environment: threephoenix-prod + owner_repos: + - inter-hub + runtime: + type: external + public_endpoint: "https://hub.coulomb.social" + endpoints: + - id: inter-hub-openapi + type: https + url: "https://hub.coulomb.social/api/v2/openapi.json" + expected_status: 200 + expected_signal: "OpenAPI document" + - id: inter-hub-ui + type: https + url: "https://hub.coulomb.social/Hubs" + expected_status: 302 + expected_signal: "login redirect when unauthenticated" + backing_stores: [] + access_paths: + - type: https + target: "https://hub.coulomb.social" + status: unknown + evidence: + - type: document + observed_at: "2026-05-16" + source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + gaps: + - "ops-hub bootstrap requires authenticated UI flow or deployment-side migration." 
+ + - id: activity-core + name: "activity-core" + kind: automation-service + lifecycle_state: observed + health_status: observed_ok + environment: railiance01 + owner_repos: + - activity-core + - the-custodian + desired_state_sources: + - "/home/worsch/activity-core/k8s/railiance" + - "/home/worsch/the-custodian/activity-definitions" + runtime: + type: k3s + cluster: railiance01-k3s + namespace: activity-core + workload_refs: + - "deployment:activity-core-api" + - "deployment:activity-core-worker" + - "temporal:schedules" + endpoints: + - id: activity-core-api + type: cluster-http + url: "activity-core API health endpoint" + expected_status: 200 + expected_signal: "healthy DB and Temporal status" + backing_stores: + - "postgresql:activity-core" + - "temporal:activity-core" + - "nats:railiance01" + access_paths: + - type: k8s + target: "railiance01-k3s/activity-core" + status: observed_ok + evidence: + - type: workplan-note + observed_at: "2026-05-23" + source: "/home/worsch/the-custodian/workplans/CUST-WP-0046-hourly-recently-on-scope-activity-core.md" + summary: "API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified." + gaps: + - "Add explicit ops inventory probes and evidence events." + + - id: ops-bridge + name: "Ops Bridge" + kind: connectivity-service + lifecycle_state: observed + health_status: unknown + environment: local + owner_repos: + - ops-bridge + runtime: + type: bridge + host: local-workstation + endpoints: [] + backing_stores: [] + access_paths: + - type: ssh-tunnel + target: "connected remote servers" + status: unknown + evidence: + - type: document + observed_at: "2026-05-16" + source: "/home/worsch/helix-forge/wiki/OpsHubInventory.md" + summary: "Bridge is useful for connected-server visibility but is not itself the service catalog." + gaps: + - "Emit reachability evidence into ops-hub instead of relying on bridge state as inventory." 
+ + - id: haskell-build-agent + name: "Haskell Build Agent" + kind: build-service + lifecycle_state: observed + health_status: unknown + environment: local + owner_repos: + - the-custodian + desired_state_sources: + - "/home/worsch/the-custodian/infra/build-machines/haskell" + runtime: + type: systemd + host: haskell-build-vm + tunnel: + reverse_ssh: "12222:localhost:22" + forward_state_hub: "18000:localhost:8000" + endpoints: + - id: haskell-build-agent-state-hub-forward + type: tunnel + url: "http://127.0.0.1:18000" + expected_signal: "VM can reach State Hub through SSH forward" + backing_stores: [] + access_paths: + - type: ssh + target: "local workstation reverse tunnel port 12222" + status: unknown + evidence: + - type: document + source: "/home/worsch/the-custodian/infra/build-machines/README.md" + summary: "Build agent is a systemd service and registers with State Hub on boot." + gaps: + - "Current tunnel and capability registration need live evidence in ops-hub." diff --git a/schemas/ops-service-inventory.schema.json b/schemas/ops-service-inventory.schema.json new file mode 100644 index 0000000..8149d4b --- /dev/null +++ b/schemas/ops-service-inventory.schema.json @@ -0,0 +1,174 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://custodian.local/schemas/ops-service-inventory.schema.json", + "title": "Ops Hub Service Inventory", + "type": "object", + "required": ["version", "last_reviewed", "environments", "hosts", "clusters", "services"], + "properties": { + "version": { "type": "integer", "minimum": 1 }, + "last_reviewed": { "type": "string", "format": "date" }, + "policy": { + "type": "object", + "additionalProperties": true + }, + "sources": { + "type": "array", + "items": { "$ref": "#/$defs/source" } + }, + "environments": { + "type": "array", + "items": { "$ref": "#/$defs/environment" } + }, + "hosts": { + "type": "array", + "items": { "$ref": "#/$defs/host" } + }, + "clusters": { + "type": "array", + "items": { "$ref": 
"#/$defs/cluster" } + }, + "services": { + "type": "array", + "items": { "$ref": "#/$defs/service" } + } + }, + "$defs": { + "source": { + "type": "object", + "required": ["path", "summary"], + "properties": { + "path": { "type": "string" }, + "summary": { "type": "string" } + }, + "additionalProperties": false + }, + "environment": { + "type": "object", + "required": ["id", "name", "role", "lifecycle_state"], + "properties": { + "id": { "$ref": "#/$defs/id" }, + "name": { "type": "string" }, + "role": { "type": "string" }, + "lifecycle_state": { "$ref": "#/$defs/lifecycle_state" } + }, + "additionalProperties": false + }, + "host": { + "type": "object", + "required": ["id", "environment", "role"], + "properties": { + "id": { "$ref": "#/$defs/id" }, + "environment": { "$ref": "#/$defs/id" }, + "address": { "type": "string" }, + "role": { "type": "string" }, + "evidence": { + "type": "array", + "items": { "$ref": "#/$defs/evidence" } + } + }, + "additionalProperties": false + }, + "cluster": { + "type": "object", + "required": ["id", "environment", "kind", "lifecycle_state"], + "properties": { + "id": { "$ref": "#/$defs/id" }, + "environment": { "$ref": "#/$defs/id" }, + "host": { "$ref": "#/$defs/id" }, + "kind": { "type": "string" }, + "lifecycle_state": { "$ref": "#/$defs/lifecycle_state" }, + "notes": { "type": "string" } + }, + "additionalProperties": false + }, + "service": { + "type": "object", + "required": ["id", "name", "kind", "lifecycle_state", "health_status", "environment", "owner_repos", "runtime", "endpoints", "backing_stores", "access_paths", "evidence", "gaps"], + "properties": { + "id": { "$ref": "#/$defs/id" }, + "name": { "type": "string" }, + "kind": { "type": "string" }, + "lifecycle_state": { "$ref": "#/$defs/lifecycle_state" }, + "health_status": { + "enum": ["unknown", "observed_ok", "degraded", "down", "planned"] + }, + "environment": { "$ref": "#/$defs/id" }, + "owner_repos": { + "type": "array", + "items": { "type": "string" } + }, + 
"desired_state_sources": { + "type": "array", + "items": { "type": "string" } + }, + "runtime": { + "type": "object", + "additionalProperties": true + }, + "endpoints": { + "type": "array", + "items": { "$ref": "#/$defs/endpoint" } + }, + "backing_stores": { + "type": "array", + "items": { "type": "string" } + }, + "access_paths": { + "type": "array", + "items": { "$ref": "#/$defs/access_path" } + }, + "evidence": { + "type": "array", + "items": { "$ref": "#/$defs/evidence" } + }, + "gaps": { + "type": "array", + "items": { "type": "string" } + } + }, + "additionalProperties": false + }, + "endpoint": { + "type": "object", + "required": ["id", "type"], + "properties": { + "id": { "$ref": "#/$defs/id" }, + "type": { "type": "string" }, + "url": { "type": "string" }, + "expected_status": { "type": "integer" }, + "expected_signal": { "type": "string" }, + "widget_ref": { "type": "string" } + }, + "additionalProperties": false + }, + "access_path": { + "type": "object", + "required": ["type", "target", "status"], + "properties": { + "type": { "type": "string" }, + "target": { "type": "string" }, + "status": { "enum": ["unknown", "observed_ok", "degraded", "down", "planned"] } + }, + "additionalProperties": false + }, + "evidence": { + "type": "object", + "required": ["type", "source"], + "properties": { + "type": { "type": "string" }, + "observed_at": { "type": "string" }, + "source": { "type": "string" }, + "summary": { "type": "string" } + }, + "additionalProperties": false + }, + "id": { + "type": "string", + "pattern": "^[a-z0-9][a-z0-9-]*$" + }, + "lifecycle_state": { + "enum": ["observed", "planned", "target", "retired"] + } + }, + "additionalProperties": false +} diff --git a/workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md b/workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md new file mode 100644 index 0000000..f60d73c --- /dev/null +++ b/workplans/CUST-WP-0047-ops-hub-service-inventory-now-view.md @@ -0,0 +1,229 @@ +--- +id: CUST-WP-0047 
+type: workplan +title: "Ops Hub Service Inventory Now View" +domain: custodian +repo: the-custodian +status: active +owner: codex +topic_slug: custodian +planning_priority: high +planning_order: 47 +created: "2026-06-05" +updated: "2026-06-05" +state_hub_workstream_id: "656e435d-3a00-4f5e-a38e-114467f9062e" +--- + +# CUST-WP-0047 - Ops Hub Service Inventory Now View + +## Goal + +Establish a systematic, low-implementation overview of which services are +running where, then surface that overview as the first ops-hub "now view". + +The immediate strategy is inventory-first: + +- declare a small service inventory in Git +- map inventory ids to existing ops-hub widget concepts in Inter-Hub +- record evidence as events rather than building a new database first +- let activity-core run repeatable probes later +- leave the full standalone ops-hub scaffold to `CUST-WP-0025` + +## Relationship To CUST-WP-0025 + +This workplan is a narrow implementation slice of the CUST-WP-0025 Ops Hub +phase. It advances the useful parts of: + +- T14, by defining the first service/access/evidence record shape +- T16, by preparing the probe/evidence path for runtime observability +- T18, by defining the first service status grid + +It intentionally does not require T13, T15, T17, or T19 to be complete first. +When the standalone `ops-hub` repo exists, it should ingest these inventory and +evidence artifacts instead of replacing them. + +## Scope + +In scope: + +- A non-secret service inventory contract. +- An initial service inventory seed covering the currently known local, + CoulombCore, Railiance01, Inter-Hub, activity-core, bridge, and build-agent + surfaces. +- A first ops-hub view shape: service, where, owner, endpoint, health, data, + access, gaps. +- Inter-Hub widget/event handoff for the first visible ops-hub surface. +- activity-core probe handoff for later scheduled evidence. + +Out of scope: + +- Building the full standalone ops-hub FastAPI/MCP repo. 
+- Replacing Inter-Hub, State Hub, or activity-core. +- Capturing credentials, secret values, or sensitive command output. +- Treating bridge reachability as the service catalog. + +## Task: Carve CUST-WP-0025 Inventory-First Slice + +```task +id: CUST-WP-0047-T01 +status: done +priority: high +state_hub_task_id: "0f2c504b-833e-4144-8849-4f74e6e6ab57" +``` + +Update `CUST-WP-0025` so Phase 3 explicitly recognizes this workplan as the +inventory-first implementation slice for the useful parts of T14/T16/T18. + +Done when CUST-WP-0025 points to this workplan and still preserves the full +ops-hub scaffold as the long-term target. + +## Task: Define Minimal Inventory Contract + +```task +id: CUST-WP-0047-T02 +status: done +priority: high +state_hub_task_id: "b9040dbf-64e1-46bf-bcca-e72d5a25b951" +``` + +Define the non-secret service inventory contract and first-view semantics. + +Deliverables: + +- `docs/ops-hub-service-inventory.md` +- `schemas/ops-service-inventory.schema.json` + +Done when the contract explains the record shape, evidence event vocabulary, +first table view, and promotion path into the future ops-hub repo. + +## Task: Seed Initial Service Inventory + +```task +id: CUST-WP-0047-T03 +status: done +priority: high +state_hub_task_id: "cf4404a8-1284-4412-a998-80cc98c617ce" +``` + +Create the initial inventory artifact from existing evidence in +`helix-forge/wiki/OpsHubInventory.md`, CUST-WP-0025, CUST-WP-0046, and current +Custodian ops docs. + +Deliverable: + +- `ops/service-inventory.yml` + +Done when the seed includes environments, hosts, clusters, services, endpoints, +access paths, evidence links, and gaps for the known operating surface. + +## Task: Register Workplan With State Hub + +```task +id: CUST-WP-0047-T04 +status: done +priority: high +state_hub_task_id: "221a30bc-d1f9-44e6-92db-99ea36c17e87" +``` + +Run the State Hub consistency sync for `the-custodian` so this workplan and its +task statuses are registered in the hub database. 
+ +Done when `make fix-consistency REPO=the-custodian` has completed and the +workstream appears in State Hub. + +## Task: Activate Ops-Hub Widgets In Inter-Hub + +```task +id: CUST-WP-0047-T05 +status: wait +priority: high +state_hub_task_id: "b16c5e15-d44b-481a-abd7-3e059cb70c92" +``` + +Create or activate the ops-hub Inter-Hub row, capability manifest, API +consumer, and initial widgets from the existing seed material in +`helix-forge/wiki/ops-hub-widgets.seed.json`. + +This is a human/operator-gated task because it requires authenticated +Inter-Hub admin access or deployment-side migration execution. + +Done when the ops-hub widgets exist and can accept `ops-endpoint-verified` or +equivalent ops evidence events. + +## Task: Build First Ops-Hub Service Catalog View + +```task +id: CUST-WP-0047-T06 +status: done +priority: high +state_hub_task_id: "db97a10d-2b20-4ac8-97a2-0f81e3fca907" +``` + +Build the first visible service catalog view from `ops/service-inventory.yml` +plus latest evidence events. + +The view should show: + +- service +- where it runs +- owner repo +- endpoint +- health and last evidence +- data/backing store gaps +- access path status +- highest-priority operating gaps + +Done when an operator can open ops-hub and answer "what is running where?" +without reading scattered workplans and runbooks. + +Completed 2026-06-05: + +- Added `ops/render_service_inventory.py`. +- Added `make ops-inventory-view`. +- Generated `docs/ops-hub-service-catalog.md` from + `ops/service-inventory.yml`. + +This is the repo-native now view until the Inter-Hub/ops-hub widget surface is +activated. + +## Task: Schedule Activity-Core Inventory Probes + +```task +id: CUST-WP-0047-T07 +status: progress +priority: medium +state_hub_task_id: "5a972670-934f-458c-8274-acabc290992f" +``` + +Add an activity-core handoff for repeatable inventory probes. + +Initial probe candidates: + +- State Hub local health endpoint. +- Inter-Hub OpenAPI endpoint. 
+- Gitea OCI registry auth challenge. +- activity-core API health and Temporal schedule availability. +- ops-bridge tunnel reachability. +- build-agent State Hub registration and tunnel state. + +Done when activity-core can run the probes on a schedule and submit non-secret +ops evidence events against the inventory ids. + +Progress 2026-06-05: + +- Added disabled draft handoff definition + `activity-definitions/ops-service-inventory-probes.md`. +- The definition names the inventory/catalog paths, hourly trigger, first probe + candidates, and evidence event mapping. + +Remaining work: implement the activity-core probe runner/resolver and enable the +definition only after the ops-hub Inter-Hub widget/event sink is active. + +## Acceptance Criteria + +- The service inventory has a stable file and schema in this repo. +- CUST-WP-0025 points to this workplan as the inventory-first slice. +- The workplan is registered in State Hub. +- The remaining blocked work is explicit: Inter-Hub ops-hub activation and + actual view/probe implementation. +- No secrets or sensitive command output are stored in the inventory.
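The acceptance criteria above rely on the inventory staying consistent with `schemas/ops-service-inventory.schema.json`. As a minimal sketch of what such a check could look like, the snippet below re-implements a few of the schema's rules (required service keys, the shared id pattern, and the lifecycle and health enums) with the Python standard library only. The helper name `check_service` and the abbreviated sample record are assumptions for illustration; they are not part of this patch, and a full JSON Schema validator would cover more than this sketch does.

```python
import re

# Values copied from schemas/ops-service-inventory.schema.json in this patch.
ID_PATTERN = re.compile(r"^[a-z0-9][a-z0-9-]*$")
LIFECYCLE_STATES = {"observed", "planned", "target", "retired"}
HEALTH_STATUSES = {"unknown", "observed_ok", "degraded", "down", "planned"}
REQUIRED_SERVICE_KEYS = [
    "id", "name", "kind", "lifecycle_state", "health_status", "environment",
    "owner_repos", "runtime", "endpoints", "backing_stores", "access_paths",
    "evidence", "gaps",
]


def check_service(service):
    """Return a list of problems for one service record (hypothetical helper)."""
    problems = []
    for key in REQUIRED_SERVICE_KEYS:
        if key not in service:
            problems.append(f"missing required key: {key}")

    # Service ids, environment refs, and endpoint ids share one id pattern.
    ids = [service[k] for k in ("id", "environment") if k in service]
    ids += [ep.get("id") for ep in service.get("endpoints", [])]
    for value in ids:
        if not isinstance(value, str) or not ID_PATTERN.fullmatch(value):
            problems.append(f"invalid id: {value!r}")

    state = service.get("lifecycle_state")
    if state is not None and state not in LIFECYCLE_STATES:
        problems.append(f"invalid lifecycle_state: {state!r}")

    health = service.get("health_status")
    if health is not None and health not in HEALTH_STATUSES:
        problems.append(f"invalid health_status: {health!r}")

    # Access path status uses the same enum as health_status in the schema.
    for path in service.get("access_paths", []):
        if path.get("status") not in HEALTH_STATUSES:
            problems.append(f"invalid access path status: {path.get('status')!r}")
    return problems


# Abbreviated copy of the haskell-build-agent seed record, for illustration only.
haskell_build_agent = {
    "id": "haskell-build-agent",
    "name": "Haskell Build Agent",
    "kind": "build-service",
    "lifecycle_state": "observed",
    "health_status": "unknown",
    "environment": "local",
    "owner_repos": ["the-custodian"],
    "runtime": {"type": "systemd", "host": "haskell-build-vm"},
    "endpoints": [{"id": "haskell-build-agent-state-hub-forward", "type": "tunnel"}],
    "backing_stores": [],
    "access_paths": [
        {"type": "ssh", "target": "local workstation reverse tunnel port 12222",
         "status": "unknown"},
    ],
    "evidence": [],
    "gaps": [],
}

if __name__ == "__main__":
    for problem in check_service(haskell_build_agent):
        print(problem)
```

A check of this shape could run before `make ops-inventory-view`, so that a malformed id or an unknown status is caught before the catalog is rendered. Whether the renderer already performs equivalent validation is not visible in this section.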