Here is a 26-05-15 assessment of the operations situation. The clean mental model is:
`92.205.130.254` / CoulombCore is the current real production-ish server, but it is also historically hand-built and experimental. The Railiance tooling is now trying to pull that reality back into governed repos and workplans without pretending it was born clean.
`92.205.62.239` / `railiance01` is the first server of the intended future production shape: ThreePhoenix. That future stack belongs mostly to the lower repos first: `railiance-infra`, `railiance-cluster`, `railiance-platform`, then finally `railiance-apps`.
Current Gitea state:
- Gitea is live on the CoulombCore Kubernetes cluster at `92.205.130.254`.
- It is deployed by Helm as release `gitea`, but currently in namespace `default`, not `gitea`.
- The live chart label says `gitea-12.5.0`, app version `1.25.4`.
- The running image is `docker.gitea.com/gitea:1.25.4-rootless`.
- The main service is `default/gitea`, exposed as NodePort `32166`.
- The database is `gitea-db` in namespace `databases`, owned by the platform layer.
- Gitea data lives on PVC `default/gitea-shared-storage`.
For the registry specifically:
- Before my change, Gitea itself already responded correctly on `/v2/` through the pod and NodePort.
- But `https://gitea.coulomb.social/v2/` returned a generic `404`, meaning public hostname routing was missing.
- I added and applied a narrow Traefik ingress for only `/v2` on `gitea.coulomb.social`.
- Now `https://gitea.coulomb.social/v2/` returns the expected OCI registry `401` auth challenge.
So the current registry situation is: routing is fixed, but publishing is not fully proven yet. We still need to update the encrypted Helm values via SOPS, and we need a Gitea package token to run `docker login`, `docker push`, and `docker pull` end to end.
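The difference between the earlier `404` and the current `401` on `/v2/` can be checked mechanically. The following is a minimal Python sketch, not the tooling used in this session: the function names `classify_v2_status` and `probe_registry` and the returned labels are hypothetical, and it only interprets the status code of an unauthenticated `GET /v2/`.

```python
from typing import Optional
from urllib import error, request


def classify_v2_status(status: int, www_authenticate: Optional[str] = None) -> str:
    """Map the HTTP status of GET /v2/ to a coarse, hypothetical label."""
    if status == 200:
        # Registry answered without asking for credentials.
        return "open"
    if status == 401:
        # A registry that requires auth answers 401 with a WWW-Authenticate header.
        return "auth-challenge" if www_authenticate else "unauthorized-no-challenge"
    if status == 404:
        # Nothing behind this hostname/path speaks the registry API,
        # for example because ingress routing for /v2 is missing.
        return "routing-missing"
    return "unexpected"


def probe_registry(base_url: str, timeout: float = 5.0) -> str:
    """Send an unauthenticated GET to <base_url>/v2/ and classify the answer."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return classify_v2_status(resp.status, resp.headers.get("WWW-Authenticate"))
    except error.HTTPError as exc:
        # urllib raises HTTPError for 4xx/5xx; the status and headers are still available.
        return classify_v2_status(exc.code, exc.headers.get("WWW-Authenticate"))
```

Calling `probe_registry("https://gitea.coulomb.social")` would be expected to yield `"auth-challenge"` if the routing described above is in place. This does not prove that push and pull work; that still requires a real login with a package token.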
How the tooling handles the split:
- `railiance-apps` owns the Gitea app deployment shape: Helm values, application ingress, app-level docs.
- `railiance-platform` owns the database/cache/storage services Gitea depends on.
- `railiance-cluster` owns Kubernetes runtime and routing primitives.
- `railiance-infra` owns server provisioning and the future Forgejo/ThreePhoenix direction.
- State Hub tracks the workstreams and decisions, but the source of truth is still repo files like workplans and manifests.
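The ownership split above can also be written down as a small lookup table. This is only an illustrative Python sketch: the repo names come from the list, while the concern labels and the `owning_repo` helper are hypothetical and not part of any existing Railiance tooling.

```python
from typing import Dict, Set

# Repo names are from the list above; the concern labels are hypothetical shorthand.
OWNERSHIP: Dict[str, Set[str]] = {
    "railiance-apps": {"helm-values", "application-ingress", "app-docs"},
    "railiance-platform": {"database", "cache", "storage"},
    "railiance-cluster": {"kubernetes-runtime", "routing-primitives"},
    "railiance-infra": {"server-provisioning", "forgejo-threephoenix-direction"},
}


def owning_repo(concern: str) -> str:
    """Return the repo that owns a concern, or raise KeyError if none does."""
    for repo, concerns in OWNERSHIP.items():
        if concern in concerns:
            return repo
    raise KeyError(f"no repo owns concern {concern!r}")
```

For example, `owning_repo("database")` resolves to `railiance-platform`, which matches the statement that the `gitea-db` database belongs to the platform layer.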
In short: CoulombCore is the current operational platform. `railiance01` is the future production foundation. The work we just did makes the current Gitea usable as a container registry for near-term State Hub image publishing, while the longer-term Forgejo/ThreePhoenix migration remains the strategic production path.
There is a lot of detail about what is running locally, and it might be helpful to look at the ops-bridge state to see the connected servers.
As you can see, there is no central service catalog or registry, so service discovery remains to be solved.
There is another aspect. I plan to transition from State Hub to the inter-hub framework, with specific hubs for:
- Operations (ops-hub),
- Synchronization (syn-hub),
- Internal Control (ctl-hub),
- Audit (aud-hub),
- Intelligence aka Adaption (int-hub),
- Policy aka Identity (pol-hub),
- Environment (env-hub).
The framework is available at hub.coulomb.social/Hubs, but I have not set up any specific hub extension yet.
Maybe this is a good time to establish ops-hub and provide a better framework to professionalize the operations infrastructure while working our way to the ThreePhoenix setup?
Yes, I think this is exactly the right moment to establish `ops-hub`, but with one caveat: make it a thin operational layer first, not a big-bang replacement for State Hub.
I could not fetch `hub.coulomb.social/Hubs` from here, so I'm reasoning from the framework names and your current setup.
The current problem is no longer just “deploy Gitea” or “move to railiance01.” You now have an operations transition:
- CoulombCore `92.205.130.254`: live, useful, but hand-built and somewhat experimental.
- Local machine: still hosts important services like State Hub and local build/runtime pieces.
- railiance01 `92.205.62.239`: first node of the intended ThreePhoenix production architecture.
- Future goal: a governed, repeatable, multi-node production system.
That is very much an `ops-hub` concern.
I'd position `ops-hub` as the operational truth layer for:
- environments: `local`, `coulombcore`, `railiance01`, future `threephoenix-prod`
- hosts/nodes: IPs, roles, lifecycle state, ownership
- clusters: current CoulombCore K8s vs future ThreePhoenix K8s
- services: Gitea, State Hub, databases, ingress, registry, identity
- endpoints: DNS names, TLS status, public/private exposure
- releases: what is deployed where, by which repo/workplan
- backups: what data exists, where, and whether it is covered
- operational risks: hand-built drift, missing secrets, manual steps, unverified restore
- runbooks: deploy, rollback, restore, rotate, migrate
State Hub can remain the workstream/task/decision tracker for now. `ops-hub` should not immediately replace it. Instead, `ops-hub` should professionalize the operational model around the work State Hub is already tracking.
For Gitea specifically, `ops-hub` would make the current state explicit:
```text
Service: Gitea
Current environment: coulombcore
Host/IP: 92.205.130.254
Cluster namespace: default
Helm release: gitea
Chart/app: gitea-12.5.0 / 1.25.4
Public host: gitea.coulomb.social
Registry endpoint: /v2 live, auth challenge working
Known gaps:
- SOPS Helm values not yet updated in this session
- Docker push/pull not proven with PAT
- backup coverage for package blobs not confirmed
Future disposition:
- migrate/replace under Forgejo or governed Gitea on ThreePhoenix
```
That is much better than having this knowledge scattered across workplans, shell history, and memory.
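To make the record above machine-readable, it could be held as a typed structure. The following is a minimal Python sketch under stated assumptions: the `ServiceRecord` class, its field names, and the `is_fully_proven` helper are hypothetical and do not reflect an existing `ops-hub` schema. The values are copied from the Gitea record above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceRecord:
    """Hypothetical ops-hub inventory entry for one deployed service."""
    name: str
    environment: str
    host_ip: str
    namespace: str
    helm_release: str
    chart: str
    app_version: str
    public_host: str
    known_gaps: List[str] = field(default_factory=list)
    future_disposition: str = ""

    def is_fully_proven(self) -> bool:
        # A service counts as fully proven only when no known gaps remain.
        return not self.known_gaps


# Values taken from the Gitea record in the text.
gitea = ServiceRecord(
    name="Gitea",
    environment="coulombcore",
    host_ip="92.205.130.254",
    namespace="default",
    helm_release="gitea",
    chart="gitea-12.5.0",
    app_version="1.25.4",
    public_host="gitea.coulomb.social",
    known_gaps=[
        "SOPS Helm values not yet updated in this session",
        "Docker push/pull not proven with PAT",
        "backup coverage for package blobs not confirmed",
    ],
    future_disposition="migrate/replace under Forgejo or governed Gitea on ThreePhoenix",
)
```

With three open gaps, `gitea.is_fully_proven()` is `False`. Closing a gap then becomes an explicit edit to the record rather than something remembered from shell history.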
My recommendation: create `ops-hub` now with a very small first milestone:
1. Define the environment model: `local`, `coulombcore`, `railiance01`, `threephoenix-prod`.
2. Seed an operations inventory for Gitea and State Hub.
3. Mark CoulombCore as `live-but-transitional`.
4. Mark railiance01 as `production-foundation`.
5. Track readiness gates for migration: DNS, TLS, registry, backups, secrets, restore test, deployment reproducibility.
6. Keep State Hub as the coordination source until `coo-hub` exists.
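The environment model and the readiness gates from the milestone list can be sketched as plain data plus one check. This is an assumption-laden Python illustration, not an existing `ops-hub` API: the names `ENVIRONMENTS`, `MIGRATION_GATES`, and `migration_ready` are hypothetical, and environments without a lifecycle label in the text are left as `None` rather than guessed.

```python
from typing import Dict, List, Optional, Tuple

# Lifecycle labels come from steps 3 and 4; the other environments have none yet.
ENVIRONMENTS: Dict[str, Optional[str]] = {
    "local": None,
    "coulombcore": "live-but-transitional",
    "railiance01": "production-foundation",
    "threephoenix-prod": None,
}

# Gate names follow step 5; the identifiers themselves are hypothetical.
MIGRATION_GATES: Tuple[str, ...] = (
    "dns",
    "tls",
    "registry",
    "backups",
    "secrets",
    "restore-test",
    "deployment-reproducibility",
)


def migration_ready(gate_status: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (ready, open_gates); a gate that is absent counts as not passed."""
    open_gates = [g for g in MIGRATION_GATES if not gate_status.get(g, False)]
    return (len(open_gates) == 0, open_gates)
```

Treating a missing gate as failed is a deliberate choice in this sketch: a migration is only reported as ready when every gate has been explicitly marked as passed.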
In short: yes, establish `ops-hub` now. It gives you the missing professional operations spine while you work toward ThreePhoenix, without forcing a premature migration away from State Hub.