diff --git a/wiki/CurrentOperationsSituation.md b/wiki/CurrentOperationsSituation.md new file mode 100644 index 0000000..2929587 --- /dev/null +++ b/wiki/CurrentOperationsSituation.md @@ -0,0 +1,111 @@ +Here is a 2026-05-15 assessment of the operations situation. The clean mental model is: + +`92.205.130.254` / CoulombCore is the current real production-ish server, but it is also historically hand-built and experimental. The Railiance tooling is now trying to pull that reality back into governed repos and workplans without pretending it was born clean. + +`92.205.62.239` / `railiance01` is the first server of the intended future production shape: ThreePhoenix. That future stack belongs mostly to the lower repos first: `railiance-infra`, `railiance-cluster`, `railiance-platform`, then finally `railiance-apps`. + +Current Gitea state: + +- Gitea is live on the CoulombCore Kubernetes cluster at `92.205.130.254`. +- It is deployed by Helm as release `gitea`, but currently in namespace `default`, not `gitea`. +- The live chart label says `gitea-12.5.0`, app version `1.25.4`. +- The running image is `docker.gitea.com/gitea:1.25.4-rootless`. +- The main service is `default/gitea`, exposed as NodePort `32166`. +- The database is `gitea-db` in namespace `databases`, owned by the platform layer. +- Gitea data lives on PVC `default/gitea-shared-storage`. + +For the registry specifically: + +- Before my change, Gitea itself already responded correctly on `/v2/` through the pod and NodePort. +- But `https://gitea.coulomb.social/v2/` returned a generic `404`, meaning public hostname routing was missing. +- I added and applied a narrow Traefik ingress for only `/v2` on `gitea.coulomb.social`. +- Now `https://gitea.coulomb.social/v2/` returns the expected OCI registry `401` auth challenge. + +So the current registry situation is: routing is fixed, but publishing is not fully proven yet. 
We still need the encrypted Helm values updated via SOPS and a Gitea package token to run `docker login`, `push`, and `pull`. + +How the tooling handles the split: + +- `railiance-apps` owns the Gitea app deployment shape: Helm values, application ingress, app-level docs. +- `railiance-platform` owns the database/cache/storage services Gitea depends on. +- `railiance-cluster` owns Kubernetes runtime and routing primitives. +- `railiance-infra` owns server provisioning and the future Forgejo/ThreePhoenix direction. +- State Hub tracks the workstreams and decisions, but the source of truth is still repo files like workplans and manifests. + +In short: CoulombCore is the current operational platform. `railiance01` is the future production foundation. The work we just did makes the current Gitea usable as a container registry for near-term State Hub image publishing, while the longer-term Forgejo/ThreePhoenix migration remains the strategic production path. + +There is a lot of detail about what's running locally, and it might be helpful to look at the ops-bridge state to see which servers are connected. +As one notices, there is no central service catalog or registry, so service discovery remains to be solved. + + +There is another aspect. I plan to transition from State Hub to the inter-hub framework with specific hubs for +- Operations (ops-hub), +- Synchronization (syn-hub), +- Internal Control (ctl-hub), +- Audit (aud-hub), +- Intelligence aka Adaptation (int-hub), +- Policy aka Identity (pol-hub), +- Environment (env-hub). + +The framework is available at hub.coulomb.social/Hubs, but I have not set up any specific hub extension yet. +Maybe it is a good time to establish ops-hub and provide a better framework to professionalize the +operations infrastructure while working our way to the ThreePhoenix setup? + +Yes, I think this is exactly the right moment to establish `ops-hub`, but with one caveat: make it a thin operational layer first, not a big-bang replacement for State Hub. 
+ +I could not fetch `hub.coulomb.social/Hubs` from here, so I’m reasoning from the framework names and your current setup. + +The current problem is no longer just “deploy Gitea” or “move to railiance01.” You now have an operations transition: + +- CoulombCore `92.205.130.254`: live, useful, but hand-built and somewhat experimental. +- Local machine: still hosts important services like State Hub and local build/runtime pieces. +- railiance01 `92.205.62.239`: first node of the intended ThreePhoenix production architecture. +- Future goal: a governed, repeatable, multi-node production system. + +That is very much an `ops-hub` concern. + +I’d position `ops-hub` as the operational truth layer for: + +- environments: `local`, `coulombcore`, `railiance01`, future `threephoenix-prod` +- hosts/nodes: IPs, roles, lifecycle state, ownership +- clusters: current CoulombCore K8s vs future ThreePhoenix K8s +- services: Gitea, State Hub, databases, ingress, registry, identity +- endpoints: DNS names, TLS status, public/private exposure +- releases: what is deployed where, by which repo/workplan +- backups: what data exists, where, and whether it is covered +- operational risks: hand-built drift, missing secrets, manual steps, unverified restore +- runbooks: deploy, rollback, restore, rotate, migrate + +State Hub can remain the workstream/task/decision tracker for now. `ops-hub` should not immediately replace it. Instead, `ops-hub` should professionalize the operational model around the work State Hub is already tracking. 
+ +For Gitea specifically, `ops-hub` would make the current state explicit: + +```text +Service: Gitea +Current environment: coulombcore +Host/IP: 92.205.130.254 +Cluster namespace: default +Helm release: gitea +Chart/app: gitea-12.5.0 / 1.25.4 +Public host: gitea.coulomb.social +Registry endpoint: /v2 live, auth challenge working +Known gaps: + - SOPS Helm values not yet updated in this session + - Docker push/pull not proven with PAT + - backup coverage for package blobs not confirmed +Future disposition: + - migrate/replace under Forgejo or governed Gitea on ThreePhoenix +``` + +That is much better than having this knowledge scattered across workplans, shell history, and memory. + +My recommendation: create `ops-hub` now with a very small first milestone: + +1. Define the environment model: `local`, `coulombcore`, `railiance01`, `threephoenix-prod`. +2. Seed an operations inventory for Gitea and State Hub. +3. Mark CoulombCore as `live-but-transitional`. +4. Mark railiance01 as `production-foundation`. +5. Track readiness gates for migration: DNS, TLS, registry, backups, secrets, restore test, deployment reproducibility. +6. Keep State Hub as the coordination source until `coo-hub` exists. + +In short: yes, establish `ops-hub` now. It gives you the missing professional operations spine while you work toward ThreePhoenix, without forcing a premature migration away from State Hub. 
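The readiness gates from step 5 of the milestone list above can be pictured as a small checklist that blocks migration until every gate is green. The following is only a minimal Python sketch under stated assumptions: the names `GateStatus`, `GATES`, `blocking_gates`, and `migration_ready` are hypothetical, and the example statuses are illustrative rather than observed from any live system.

```python
from enum import Enum


class GateStatus(Enum):
    GREEN = "green"      # evidence exists and has been verified
    RED = "red"          # a known gap
    UNKNOWN = "unknown"  # nobody has checked yet


# Gate names follow the list in step 5; the identifiers are assumptions.
GATES = [
    "dns",
    "tls",
    "registry",
    "backups",
    "secrets",
    "restore-test",
    "deployment-reproducibility",
]


def blocking_gates(statuses: dict) -> list:
    """Return every gate that is not green; a missing gate counts as unknown."""
    return [
        gate
        for gate in GATES
        if statuses.get(gate, GateStatus.UNKNOWN) is not GateStatus.GREEN
    ]


def migration_ready(statuses: dict) -> bool:
    """Migration is only ready when no gate is blocking."""
    return not blocking_gates(statuses)


# Illustrative statuses only, not taken from the live environment.
example = {
    "dns": GateStatus.GREEN,
    "tls": GateStatus.GREEN,
    "registry": GateStatus.RED,
}
print(blocking_gates(example))
```

Treating an unreported gate as blocking is a deliberate choice in this sketch: a gate nobody has checked should not silently count as passed.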
+ diff --git a/workplans/HF-WP-0001-establish-ops-hub-first-extension.md b/workplans/HF-WP-0001-establish-ops-hub-first-extension.md index 9e35b9d..29ffe31 100644 --- a/workplans/HF-WP-0001-establish-ops-hub-first-extension.md +++ b/workplans/HF-WP-0001-establish-ops-hub-first-extension.md @@ -1,7 +1,7 @@ --- id: HF-WP-0001 type: workplan -title: "Establish ops-hub as the First Inter-Hub Extension" +title: "Establish ops-hub as the First VSM Inter-Hub Extension" domain: helix_forge repo: helix-forge status: active @@ -19,23 +19,59 @@ related_repos: state_hub_workstream_id: "48d91935-197e-4ad4-be07-7bbcd535847c" --- -# Establish ops-hub as the First Inter-Hub Extension +# Establish ops-hub as the First VSM Inter-Hub Extension ## Goal -Create `ops-hub` as the first practical domain extension of the Interaction Hub -Framework, focused on professionalizing Railiance operations while the current +Use Inter-Hub as the generic hub framework and establish `ops-hub` as the +first VSM-oriented domain hub extension: the Operations / System 1 hub. + +`ops-hub` should professionalize Railiance operations while the current CoulombCore environment transitions toward the future ThreePhoenix production -setup. +setup. Just as importantly, it should prove the repeatable extension pattern +for the later VSM hubs: + +- `ops-hub` — Operations and Activities / System 1 +- `syn-hub` — Synchronization and Coordination / System 2 +- `ctl-hub` — Internal Control and Regulation / System 3 +- `aud-hub` — Audit and Monitoring / System 3* +- `int-hub` — Intelligence and Adaptation / System 4 +- `pol-hub` — Policy and Identity / System 5 +- `env-hub` — Boundary and Environment The first increment should not replace State Hub or require a separate `ops-hub` repository immediately. It should establish the operational model, -the hub vocabulary, and the smallest governed integration with Inter-Hub. 
A -separate implementation repository can be created once the shape of the hub is -stable and the Inter-Hub extension bootstrap API is less manual. +the VSM hub vocabulary, and the smallest governed integration with Inter-Hub. +A separate implementation repository can be created once the shape of the hub +is stable and the Inter-Hub extension bootstrap API is less manual. + +## VSM Hub Extension Strategy + +Inter-Hub is the framework. HelixForge should extend it with specific hubs that +map to the viable-system functions already named in `INTENT.md`. + +| Hub | VSM function | First responsibility | +|---|---|---| +| `ops-hub` | Operations and Activities / System 1 | Operational truth surface for environments, hosts, clusters, services, endpoints, releases, backups, incidents, risks, runbooks, and migration waves. | +| `syn-hub` | Synchronization and Coordination / System 2 | Coordination between operational units, repos, workstreams, and service handoffs so local actions do not conflict. | +| `ctl-hub` | Internal Control and Regulation / System 3 | Current-state control, resource constraints, readiness gates, priorities, and operational governance. | +| `aud-hub` | Audit and Monitoring / System 3* | Independent evidence, checks, observations, drift detection, and verification trails. | +| `int-hub` | Intelligence and Adaptation / System 4 | Future sensing, migration analysis, forecasting, recommendations, and adaptation planning. | +| `pol-hub` | Policy and Identity / System 5 | Identity, values, ultimate constraints, policy decisions, and acceptable operating posture. | +| `env-hub` | Boundary and Environment | External actors, surfaces, endpoints, users, markets, partner systems, and environmental signals. | + +This workplan starts only with `ops-hub`, but every bootstrap choice should be +judged by whether it can become the template for the next hub. ## Context +`wiki/CurrentOperationsSituation.md` captures the immediate operational +background as of 2026-05-15. 
The short version: the operational platform is +real, useful, and already carrying production-like responsibilities, but its +state is spread across live systems, repo workplans, shell knowledge, and +operator memory. There is no central service catalog or operational registry +yet. + Current operational reality: - `coulombcore` / `92.205.130.254` is the live production-like server. It runs @@ -47,10 +83,19 @@ Current operational reality: - The Railiance repo stack already separates operational responsibility: `railiance-infra` (S1), `railiance-cluster` (S2), `railiance-platform` (S3), `railiance-enablement` (S4), and `railiance-apps` (S5). +- Gitea is live on the CoulombCore Kubernetes cluster as Helm release `gitea` + in namespace `default`, exposed through NodePort `32166`, with its database + in namespace `databases` and shared data on PVC `default/gitea-shared-storage`. +- The Gitea OCI registry route at `https://gitea.coulomb.social/v2/` now + returns the expected registry auth challenge, but publishing still needs to + be proven with encrypted Helm values, a package token, `docker login`, push, + and pull. +- Ops Bridge can help reveal which servers are connected and reachable, but it + is not itself a full operational service catalog. `ops-hub` should become the operational truth surface across those realities: environments, hosts, clusters, services, releases, endpoints, backups, -readiness gates, incidents, risks, and migration waves. +readiness gates, incidents, risks, service discovery, and migration waves. ## Inter-Hub API Findings @@ -99,21 +144,43 @@ Assessment: ## Architectural Decision -Start with **Pattern A: API Consumer Hub**, plus a manual or migration-backed -Inter-Hub registration: +Start with **Pattern A: API Consumer Hub** for `ops-hub`, plus a manual or +migration-backed Inter-Hub registration. Treat `ops-hub` as the first VSM hub +instance rather than a one-off operational dashboard: 1. Register `ops-hub` as a domain hub in Inter-Hub. -2. 
Activate a `HubCapabilityManifest` for its operational vocabulary. -3. Create an `ApiConsumer` and API key for `ops-hub`. -4. Seed a small set of governed widgets representing operational surfaces. -5. Emit interaction events and annotations from lightweight scripts or a +2. Classify it as the Operations / System 1 hub in hub metadata or manifest + metadata, depending on what Inter-Hub currently supports. +3. Activate a `HubCapabilityManifest` for its operational vocabulary. +4. Create an `ApiConsumer` and API key for `ops-hub`. +5. Seed a small set of governed widgets representing operational surfaces. +6. Emit interaction events and annotations from lightweight scripts or a prototype UI. +The first reusable contract to prove is: + +```text +Hub identity + VSM function + manifest vocabulary + API consumer + seed widgets + evidence events +``` + +The next hubs should be able to follow the same shape with their own +vocabularies: + +```text +syn-hub / ctl-hub / aud-hub / int-hub / pol-hub / env-hub +``` + Do not create a separate `ops-hub` repository until the first inventory, -readiness, and migration workflows have proven their data model. +readiness, service catalog, and migration workflows have proven their data +model. ## Initial ops-hub Vocabulary +This vocabulary is deliberately scoped to Operations / System 1. Coordination, +control, audit, intelligence, policy, and environment concerns should be +represented only where they touch operational evidence; their own hubs will +own the broader semantics later. 
+ Suggested manifest values: ### Widget Types @@ -124,9 +191,11 @@ Suggested manifest values: "ops-host", "ops-cluster", "ops-service", + "ops-service-catalog", "ops-endpoint", "ops-release", "ops-backup-set", + "ops-secret-set", "ops-runbook", "ops-incident", "ops-readiness-gate", @@ -140,14 +209,18 @@ Suggested manifest values: ```json [ "ops-inventory-registered", + "ops-inventory-updated", + "ops-service-discovered", "ops-health-checked", "ops-release-observed", "ops-endpoint-verified", "ops-backup-verified", "ops-restore-tested", "ops-runbook-executed", + "ops-drift-detected", "ops-risk-raised", "ops-risk-accepted", + "ops-readiness-gate-updated", "ops-migration-gate-passed", "ops-migration-gate-failed" ] @@ -158,6 +231,7 @@ Suggested manifest values: ```json [ "ops-drift", + "ops-service-catalog-gap", "ops-backup-gap", "ops-security-gap", "ops-routing-gap", @@ -202,10 +276,18 @@ The first services to model: - PostgreSQL/CNPG services used by Gitea and State Hub - Ingress/DNS/TLS endpoints for the above - Backup and restore coverage for each persistent data store +- Ops Bridge connectivity as reachability evidence, not as the catalog itself + +The first explicit service-catalog gap: + +- There is no central place that answers "what runs where, why, who owns it, + how it is reached, and what evidence proves it is healthy." `ops-hub` should + make that question answerable before the ThreePhoenix migration becomes more + complicated. ## Tasks -### T01 — Confirm Inter-Hub extension bootstrap path +### T01 — Confirm the VSM hub extension bootstrap path ```task id: HF-WP-0001-T01 @@ -215,7 +297,9 @@ state_hub_task_id: "2587a3b8-3b9b-4948-acaf-1547644e4563" ``` Confirm whether `ops-hub` should be registered through the Inter-Hub UI, -through a migration, or through new API endpoints. +through a migration, or through new API endpoints. Capture the result as the +first repeatable VSM hub bootstrap path, not just as a local workaround for +Operations. 
Checks: @@ -223,14 +307,17 @@ Checks: - Confirm whether `/Hubs/new`, `/HubCapabilityManifests`, `/ApiConsumers`, and `/ApiKeys` are accessible to the operator. - Confirm whether direct DB migration is acceptable for initial bootstrap. -- Record the chosen bootstrap path in this workplan. +- Confirm where hub metadata can carry the VSM function (`OPS`, `SYN`, `CTL`, + `AUD`, `INT`, `POL`, `ENV`) and VSM system mapping. +- Record the chosen bootstrap path in this workplan so `syn-hub` can reuse it. Done when: there is a concrete, repeatable path to create the `ops-hub` row, -manifest, API consumer, and API key. +manifest, API consumer, and API key, with enough metadata to classify it as the +Operations / System 1 hub. --- -### T02 — Register ops-hub in Inter-Hub +### T02 — Register ops-hub in Inter-Hub as the Operations hub ```task id: HF-WP-0001-T02 @@ -246,9 +333,16 @@ Create the Hub row: - `domain`: `ops.coulomb.social` or another explicit domain chosen by the operator - `hub_kind`: `domain` +- VSM function metadata: `OPS` +- VSM system metadata: `S1` +- Hub family metadata: `vsm` + +If Inter-Hub does not yet have explicit fields for VSM function, system, or hub +family, store them in manifest metadata and record the missing first-class +fields as an Inter-Hub API/model gap. Done when: `ops-hub` appears in `/Hubs` and `/api/v2/hub-registry` after -authentication. +authentication, and a human can tell that it is the VSM Operations hub. --- @@ -262,7 +356,14 @@ state_hub_task_id: "55f5aeed-21c3-4a83-bc78-f90f92c7d597" ``` Create and activate a `HubCapabilityManifest` for `ops-hub` using the -vocabulary in this workplan. +vocabulary in this workplan. 
The manifest should make the VSM classification +explicit: + +- `hub_family`: `vsm` +- `vsm_function`: `OPS` +- `vsm_system`: `S1` +- `scope`: operational truth and evidence, not coordination/control/audit + ownership Validation: @@ -271,6 +372,8 @@ Validation: - Declared annotation categories appear in `/api/v2/annotation-categories`. - Policy scopes are visible in the Inter-Hub registry UI or DB, even though the public v2 API currently lacks `/policy-scopes`. +- Future VSM hub values can be added by changing manifest vocabulary, not by + inventing a different bootstrap mechanism. Done when: the manifest status is `active` and no type conflicts remain. @@ -314,9 +417,13 @@ Create initial widgets for the operational surfaces: - `ops-env-coulombcore` - `ops-env-railiance01` - `ops-env-threephoenix-prod` +- `ops-host-coulombcore` +- `ops-host-railiance01` +- `ops-service-catalog` - `ops-service-gitea` - `ops-service-state-hub` - `ops-service-inter-hub` +- `ops-endpoint-gitea-registry` - `ops-readiness-gitea-registry` - `ops-readiness-state-hub-cluster-deploy` - `ops-migration-coulombcore-to-threephoenix` @@ -346,13 +453,17 @@ state of: - clusters - services - endpoints +- service discovery and service-catalog gaps - storage and backup coverage - migration readiness gates -Use this as the working model before creating a separate `ops-hub` repository. +Use `wiki/CurrentOperationsSituation.md` as the seed background, then turn it +into a more structured inventory artifact. Use this as the working model before +creating a separate `ops-hub` repository. Done when: a human can see the CoulombCore, local, railiance01, and -ThreePhoenix relationship without reading multiple repo workplans. +ThreePhoenix relationship, including the current Gitea registry state, without +reading multiple repo workplans or relying on shell history. 
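One possible shape for the structured inventory artifact described in this task is sketched below in Python. It is an assumption-laden illustration, not the agreed data model: `ServiceEntry`, `Inventory`, and their fields are hypothetical names, the `unclassified` labels are placeholders, and only the Gitea values and the two lifecycle labels are taken from `wiki/CurrentOperationsSituation.md`.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """One service as seen by the inventory (hypothetical field set)."""
    name: str
    environment: str
    namespace: str
    helm_release: str
    public_host: str
    known_gaps: list = field(default_factory=list)


@dataclass
class Inventory:
    # environment name -> lifecycle label
    environments: dict
    services: list = field(default_factory=list)

    def services_in(self, environment: str) -> list:
        """Names of all services recorded for one environment."""
        return [s.name for s in self.services if s.environment == environment]

    def gap_report(self) -> dict:
        """Map each service with open gaps to its list of gaps."""
        return {s.name: s.known_gaps for s in self.services if s.known_gaps}


inventory = Inventory(
    environments={
        "local": "unclassified",               # placeholder label (assumption)
        "coulombcore": "live-but-transitional",
        "railiance01": "production-foundation",
        "threephoenix-prod": "unclassified",   # placeholder label (assumption)
    },
    services=[
        ServiceEntry(
            name="gitea",
            environment="coulombcore",
            namespace="default",
            helm_release="gitea",
            public_host="gitea.coulomb.social",
            known_gaps=[
                "registry push/pull not yet proven",
                "backup coverage for package blobs not confirmed",
            ],
        )
    ],
)
print(inventory.gap_report())
```

A flat record like this is enough to answer "what runs where" for the first services; hosts, clusters, and endpoints could be added as further record types once the model has proven itself.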
--- @@ -375,6 +486,8 @@ Suggested event: "eventType": "ops-endpoint-verified", "viewContext": "railiance-apps/workplans/RAIL-AP-WP-0001", "metadata": { + "vsmFunction": "OPS", + "vsmSystem": "S1", "endpoint": "https://gitea.coulomb.social/v2/", "expectedStatus": 401, "observedHeader": "Docker-Distribution-Api-Version: registry/2.0" @@ -400,6 +513,7 @@ Define readiness gates that must be green before moving production responsibility from CoulombCore to ThreePhoenix: - DNS and TLS are codified. +- Service catalog entries exist for the live and target production services. - Git hosting and container registry are reproducible. - Persistent data stores have backup and restore evidence. - Secrets and SOPS/age keys are available through governed operator paths. @@ -428,6 +542,8 @@ Create a separate repo when at least one of these is true: - `ops-hub` needs collectors, adapters, or scheduled probes. - `ops-hub` needs its own release lifecycle. - The ops vocabulary stabilizes enough to deserve reusable code. +- The VSM hub extension template needs shared scaffolding that should not live + inside `inter-hub` itself. Until then, keep the model in `helix-forge` and register state in Inter-Hub. @@ -436,7 +552,7 @@ needed. --- -### T10 — Inter-Hub API hardening for extension bootstrap +### T10 — Inter-Hub API hardening for VSM hub bootstrap ```task id: HF-WP-0001-T10 @@ -446,7 +562,7 @@ target_repo: inter-hub state_hub_task_id: "7fa54508-7add-4885-8913-12edaadc4d92" ``` -Create or link an `inter-hub` workplan to make domain hub bootstrapping +Create or link an `inter-hub` workplan to make VSM domain hub bootstrapping machine-repeatable. Recommended Inter-Hub improvements: @@ -455,29 +571,33 @@ Recommended Inter-Hub improvements: 2. Add `POST /api/v2/widgets` and include it in OpenAPI. 3. Add API endpoints for `HubCapabilityManifest` draft creation, update, and activation. -4. Add API endpoints for `ApiConsumer` and API key creation, or a clearly +4. 
Add a documented place for hub-family metadata such as `hub_family`, + `vsm_function`, and `vsm_system`. +5. Add API endpoints for `ApiConsumer` and API key creation, or a clearly documented admin-only bootstrap command if API key creation remains UI-only. -5. Add `/api/v2/policy-scopes` to match the policy scope registry already used +6. Add `/api/v2/policy-scopes` to match the policy scope registry already used by manifests. -6. Add distinct OpenAPI request schemas for create requests instead of reusing +7. Add distinct OpenAPI request schemas for create requests instead of reusing response schemas. -7. Align `docs/new-hub-quickstart.md` with the actual live API until the create +8. Align `docs/new-hub-quickstart.md` with the actual live API until the create endpoints exist. -8. Fix `Web.Controller.Api.V2.InteractionEvents` so manifest-declared event +9. Fix `Web.Controller.Api.V2.InteractionEvents` so manifest-declared event types are actually decoded and enforced. -9. Fix webhook dispatch so it uses the submitted event type instead of the +10. Fix webhook dispatch so it uses the submitted event type instead of the hard-coded `"clicked"` event name. -10. Decide whether event `metadata` is part of the v2 create contract; if yes, +11. Decide whether event `metadata` is part of the v2 create contract; if yes, persist it in the controller and test it. +12. Document the bootstrap recipe as a template for `syn-hub`, `ctl-hub`, + `aud-hub`, `int-hub`, `pol-hub`, and `env-hub`. -Done when: the next domain hub can be created from a script using documented -API calls and without direct DB access. +Done when: the next VSM hub can be created from a script using documented API +calls and without direct DB access. ## Initial Acceptance Criteria This workplan is complete when: -1. `ops-hub` is registered in Inter-Hub. +1. `ops-hub` is registered in Inter-Hub as the VSM Operations / System 1 hub. 2. Its capability manifest is active. 3. It has an API consumer and key. 4. 
Initial ops widgets exist for environments, services, readiness gates, and @@ -487,6 +607,8 @@ This workplan is complete when: 7. A decision has been made whether to create a separate `ops-hub` repository. 8. Inter-Hub bootstrap API gaps are either fixed or tracked in an Inter-Hub workplan. +9. The bootstrap path is reusable enough that `syn-hub` can be created next + without rediscovering the whole process. ## Notes @@ -494,6 +616,6 @@ This workplan is complete when: - State Hub continues to track workstreams, decisions, and progress events. - `ops-hub` tracks operational reality and readiness evidence. -- `coo-hub` can later become the coordination/workstream successor once the - broader hub constellation is established. - +- `syn-hub`, `ctl-hub`, and `aud-hub` can later absorb coordination, control, + and evidence responsibilities once the broader hub constellation is + established.
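As an illustration of how a lightweight script could produce registry evidence while respecting the manifest vocabulary, the Python sketch below checks an event type against the declared list and builds part of an `ops-endpoint-verified` event. It is a client-side sketch under stated assumptions, not Inter-Hub's implementation: the function names are hypothetical, no HTTP request is made, and any request fields that fall outside the event hunk shown in this diff are deliberately left out rather than guessed.

```python
# Event types copied from the manifest vocabulary in this workplan.
DECLARED_EVENT_TYPES = frozenset({
    "ops-inventory-registered", "ops-inventory-updated",
    "ops-service-discovered", "ops-health-checked",
    "ops-release-observed", "ops-endpoint-verified",
    "ops-backup-verified", "ops-restore-tested",
    "ops-runbook-executed", "ops-drift-detected",
    "ops-risk-raised", "ops-risk-accepted",
    "ops-readiness-gate-updated",
    "ops-migration-gate-passed", "ops-migration-gate-failed",
})

EXPECTED_STATUS = 401
API_VERSION_HEADER = "docker-distribution-api-version"


def validate_event_type(event_type: str) -> str:
    """Reject event types that the manifest does not declare."""
    if event_type not in DECLARED_EVENT_TYPES:
        raise ValueError(f"undeclared event type: {event_type!r}")
    return event_type


def registry_challenge_ok(status: int, headers: dict) -> bool:
    """True if a response looks like the expected registry auth challenge."""
    normalized = {key.lower(): value for key, value in headers.items()}
    return (
        status == EXPECTED_STATUS
        and normalized.get(API_VERSION_HEADER) == "registry/2.0"
    )


def endpoint_verified_event(endpoint: str, status: int, headers: dict) -> dict:
    """Build a partial event body from an already observed response."""
    if not registry_challenge_ok(status, headers):
        raise ValueError(f"{endpoint} did not return the expected challenge")
    return {
        "eventType": validate_event_type("ops-endpoint-verified"),
        "metadata": {
            "vsmFunction": "OPS",
            "vsmSystem": "S1",
            "endpoint": endpoint,
            "expectedStatus": EXPECTED_STATUS,
            "observedHeader": "Docker-Distribution-Api-Version: registry/2.0",
        },
    }


event = endpoint_verified_event(
    "https://gitea.coulomb.social/v2/",
    401,
    {"Docker-Distribution-Api-Version": "registry/2.0"},
)
print(event["eventType"])
```

Keeping the vocabulary check in the emitting script does not replace server-side enforcement (item 9 under T10); it only prevents the script from sending types the manifest never declared.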