Archive closed workplans to workplans/archived/ (ADR-001)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -0,0 +1,229 @@
---
id: RAILIANCE-WP-0005
type: workplan
title: "S5 app release readiness and scope alignment"
domain: financials
repo: railiance-apps
status: finished
owner: codex
topic_slug: railiance
planning_priority: medium
created: "2026-06-04"
updated: "2026-06-05"
state_hub_workstream_id: "685f1c18-33c0-400d-a2b1-e1dae0f27c3e"
---

# S5 app release readiness and scope alignment

## Context

The 2026-06-04 review of `SCOPE.md` against the actual repository
implementation found that `railiance-apps` has moved beyond "Gitea Helm values"
and now owns the repeatable S5 application release surface: forge-backed
artifact consumption, the `vergabe-teilnahme` Helm release, operator
guardrails, and deployment runbooks.

The 2026-06-05 `railiance-forge` extraction moved canonical registry operating
docs and registry-retention policy into the new forge layer. This workplan now
keeps only app-release readiness items in S5.

The same review found several planning-time gaps that are closed by this
workplan:

- `INTENT.md` was missing, so purpose and scope were collapsed into one
document.
- The `vergabe-teilnahme` runbook contained stale image-promotion guidance tied
to the old local `issue-core` build context.
- First-app lessons were documented, but not yet turned into a reusable
checklist for the next S5 app release.
- Forge package storage and app database backup responsibilities needed clearer
contracts with platform-layer work.
- The server-side dry-run workflow did not state its live-cluster/CRD
prerequisites clearly enough for a future runner.

This workplan turns those scope gaps into the next improvement strand.

## T01 - Write `INTENT.md`

```task
id: RAILIANCE-WP-0005-T01
status: done
priority: high
state_hub_task_id: "af15d202-c013-48b1-8522-671477e31381"
```

Create `INTENT.md` for `railiance-apps`.

It should explain:

- why this repo exists as the S5 application deployment surface;
- what problem it solves for operators and source app repos;
- what it intentionally does not own across S1-S4 boundaries;
- how forge-owned registry capabilities, `vergabe-teilnahme`, and future S5
apps fit together;
- how workplan files relate to State Hub workstreams and task rows.

Done when `INTENT.md` stands on its own and `SCOPE.md` can reference it instead
of carrying all purpose language itself.
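
The sketch below shows one way a workplan's `task` fences could be read as State Hub task rows. It assumes the flat `key: value` layout used in this file and treats `state_hub_task_id` as the join key. `TaskRow`, `parse_task_blocks`, and `open_tasks` are hypothetical helpers, not State Hub's sync code.

```python
import re
from dataclasses import dataclass

# Build the fence string programmatically so this snippet can live in Markdown.
FENCE = "`" * 3
TASK_BLOCK = re.compile(
    "^" + FENCE + r"task\n(.*?)^" + FENCE, re.MULTILINE | re.DOTALL
)


@dataclass(frozen=True)
class TaskRow:
    id: str
    status: str
    priority: str
    state_hub_task_id: str


def parse_task_blocks(markdown: str) -> list[TaskRow]:
    """Return one TaskRow per task fence, reading flat `key: value` lines."""
    rows = []
    for body in TASK_BLOCK.findall(markdown):
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                # Strip surrounding quotes, e.g. around state_hub_task_id.
                fields[key.strip()] = value.strip().strip('"')
        rows.append(
            TaskRow(
                id=fields["id"],
                status=fields.get("status", ""),
                priority=fields.get("priority", ""),
                state_hub_task_id=fields.get("state_hub_task_id", ""),
            )
        )
    return rows


def open_tasks(rows: list[TaskRow]) -> list[str]:
    """Task IDs that would block marking the workplan `finished`."""
    return [row.id for row in rows if row.status != "done"]
```

For a `finished` workplan, `open_tasks` should return an empty list. A non-empty result lists the task rows to reconcile before the front-matter status changes.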

---

## T02 - Normalize app image promotion and package registry docs

```task
id: RAILIANCE-WP-0005-T02
status: done
priority: high
state_hub_task_id: "f8f63edb-a7ef-4692-8b01-66402d296cbb"
```

Update app promotion docs after the `issue-core` package handoff is complete.

Focus areas:

- remove stale references to the old `--build-context issue-core=...` path from
`docs/vergabe-teilnahme.md`;
- document the package-registry credential path for private Python package
installs without committing tokenized index URLs;
- state when `vergabe-teilnahme/uv.lock` must be regenerated in the source repo;
- link to forge-owned registry endpoint docs and source-repo release docs
instead of duplicating them in S5.

Done when the Railiance operator docs describe the portable image promotion path
and no active runbook tells an operator to rely on a sibling repo checkout.

Completed on 2026-06-05 by updating `docs/vergabe-teilnahme.md` and replacing
local registry ownership with direct links to `railiance-forge`. The temporary
compatibility pointers were removed later by `RAILIANCE-WP-0006-T10`.
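
The "no tokenized index URLs" rule can also be enforced mechanically. The sketch below assumes credentials are supplied at install time from the environment or a credential helper. It flags any URL with embedded userinfo (`https://user:token@host/...`), so such a URL is caught before it lands in a committed file. `find_tokenized_urls` is a hypothetical helper, and the regex is a deliberately loose URL matcher.

```python
import re
from urllib.parse import urlsplit

# Loose matcher: an http(s) URL runs until whitespace, a quote, or an angle bracket.
URL_CANDIDATE = re.compile(r"https?://[^\s\"'<>]+")


def find_tokenized_urls(text: str) -> list[str]:
    """Return every URL whose authority carries a username or password."""
    hits = []
    for match in URL_CANDIDATE.finditer(text):
        url = match.group(0)
        try:
            parts = urlsplit(url)
        except ValueError:
            # Malformed authority (e.g. a broken IPv6 literal): not a credential leak.
            continue
        if parts.username or parts.password:
            hits.append(url)
    return hits
```

Run from CI or a pre-commit hook over files such as `pyproject.toml` and runbooks, this would reject `https://ci:token@...` and allow the plain index URL.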

---

## T03 - Create a reusable S5 app onboarding checklist

```task
id: RAILIANCE-WP-0005-T03
status: done
priority: medium
state_hub_task_id: "4eab93a9-ad1b-46ca-97ef-18a059f64ab5"
```

Turn the `vergabe-teilnahme` deployment lessons into a reusable checklist for
the next S5 app.

The checklist should cover:

- chart and values layout;
- ingress and TLS ownership;
- health probe Host headers for framework apps;
- database secret handoff and URL-encoding guidance;
- image registry naming and pull verification;
- runbook sections every S5 app should ship;
- smoke-test pattern using persistent pods plus `kubectl exec`;
- State Hub workplan and task-sync expectations.

Done when a new app can start from the checklist without reading all historical
workplans first.

Completed on 2026-06-05 by adding
`docs/s5-app-onboarding-checklist.md` and cross-linking it from `SCOPE.md` and
the `vergabe-teilnahme` runbook. The checklist covers app scope, chart/values
layout, forge artifact consumption, database and secret handoff, ingress/TLS,
probe Host headers, smoke tests, runbook baseline, and State Hub sync.
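
A short sketch of the URL-encoding guidance follows. Generated database passwords can contain `@`, `:`, `/`, or `#`, and any of these breaks a naively concatenated connection URL. The sketch assumes the app consumes a single `postgresql://` URL. The user, password, and host values are illustrative, and `apps-pg-rw.databases.svc` is a hypothetical service name, not a verified one.

```python
from urllib.parse import quote, unquote, urlsplit


def postgres_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Compose a postgresql:// URL with every credential character escaped.

    safe="" also escapes "/", which quote() leaves alone by default.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


def password_from_url(url: str) -> str:
    """Recover the original password, e.g. for a connectivity self-check."""
    return unquote(urlsplit(url).password or "")
```

The round trip through `password_from_url` is a cheap check that the secret written into the Kubernetes Secret decodes back to the generated password.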

---

## T04 - Define app data backup and restore handoffs

```task
id: RAILIANCE-WP-0005-T04
status: done
priority: high
state_hub_task_id: "299d9623-3a54-4e85-9a70-016e8356c3d9"
```

Clarify where S5 app data durability begins and ends.

Cover at least:

- `vergabe_db` on the shared `apps-pg` CNPG cluster;
- responsibility split between S5 runbooks and `railiance-platform` backup
controllers;
- minimum restore-drill evidence S5 app operators need before promoting an app
beyond smoke-test usage;
- how to file or link platform-layer workplans when the durability gap is not
local to this repo;
- how S5 app runbooks cite forge-owned package/blob restore evidence without
owning Gitea package backup procedures.

Done when `SCOPE.md` and app runbooks clearly separate S5 release ownership from
S3 backup implementation while still giving operators an actionable restore
readiness gate.

Completed on 2026-06-05 by adding
`docs/app-data-backup-restore-handoff.md`, updating
`docs/vergabe-teilnahme.md`, and refreshing `SCOPE.md`. The handoff states that
S5 owns app release evidence and post-restore app checks, while
`railiance-platform` owns `apps-pg` backup/restore mechanisms and
`railiance-forge` owns artifact restore evidence.
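
The restore readiness gate can be expressed as a small check over recorded evidence. This is a sketch under stated assumptions: the evidence kinds, the owning repo for each kind, and the 90-day freshness window are hypothetical placeholders. The authoritative values are whatever `docs/app-data-backup-restore-handoff.md` specifies.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical evidence kinds and owners; replace with the handoff doc's list.
REQUIRED_EVIDENCE = {
    "apps-pg restore drill": "railiance-platform",
    "artifact restore evidence": "railiance-forge",
    "post-restore app checks": "railiance-apps",
}
MAX_AGE = timedelta(days=90)  # assumed freshness window


@dataclass(frozen=True)
class Evidence:
    kind: str
    owner: str
    recorded_on: date


def restore_gate_gaps(evidence: list[Evidence], today: date) -> list[str]:
    """Return human-readable gaps; an empty list means the gate passes."""
    latest: dict[str, Evidence] = {}
    for item in evidence:
        current = latest.get(item.kind)
        if current is None or item.recorded_on > current.recorded_on:
            latest[item.kind] = item
    gaps = []
    for kind, owner in REQUIRED_EVIDENCE.items():
        item = latest.get(kind)
        if item is None:
            gaps.append(f"missing: {kind} (owner: {owner})")
        elif item.owner != owner:
            gaps.append(f"wrong owner for {kind}: {item.owner} != {owner}")
        elif today - item.recorded_on > MAX_AGE:
            gaps.append(f"stale: {kind} recorded {item.recorded_on.isoformat()}")
    return gaps
```

Keeping the owner next to each evidence kind mirrors the handoff. S5 checks that platform and forge evidence exists and is recent, but does not produce that evidence itself.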

---

## T05 - Make manifest dry-run workflow prerequisites explicit

```task
id: RAILIANCE-WP-0005-T05
status: done
priority: medium
state_hub_task_id: "6cf0e662-d7e2-48b1-b1f2-c7636240dd81"
```

Document and, where useful, encode the assumptions behind
`.gitea/workflows/manifest-server-dry-run.yaml` and
`make k8s-server-dry-run`.

Questions to answer:

- Does the workflow expect a live representative cluster?
- Which CRDs must already exist for server-side dry-run to be meaningful?
- What credentials or runner placement are required?
- What should happen when a runner has no cluster access?
- Which failures are release-blocking versus local operator setup issues?

Done when a future operator can tell whether the workflow is ready to enforce
PR checks or still needs runner/cluster preparation.

Completed on 2026-06-05 by adding
`docs/manifest-server-dry-run.md`, updating `docs/operator-recipes.md`, and
making `tools/k8s-server-dry-run.sh` print an explicit representative-cluster
preflight error. The docs classify runner/cluster prerequisite gaps separately
from release-blocking manifest failures and note that
`DRY_RUN_CREATE_NAMESPACES=true` creates the namespace as a real side effect.
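
The sketch below illustrates the split between setup failures and release-blocking failures, assuming the dry-run's stderr is available as text. The patterns match common kubectl and API server messages: a missing CRD, an unreachable API server, a missing namespace, and an RBAC denial. The exact set used by `tools/k8s-server-dry-run.sh` is not shown here, so treat these patterns as assumptions. Unrecognized failures fail closed as release-blocking.

```python
import re

# Assumed stderr markers for runner/cluster prerequisite gaps.
SETUP_PATTERNS = (
    re.compile(r"Unable to connect to the server"),  # API server unreachable
    re.compile(r"The connection to the server .+ was refused"),
    re.compile(r"no matches for kind"),  # CRD not installed on the cluster
    re.compile(r'namespaces "[^"]+" not found'),  # namespace not pre-created
    re.compile(r"cannot \w+ resource"),  # RBAC denial for the runner identity
)


def classify_dry_run_failure(stderr: str) -> str:
    """Return "setup" for prerequisite gaps, otherwise "release-blocking"."""
    if any(pattern.search(stderr) for pattern in SETUP_PATTERNS):
        return "setup"
    # Fail closed: anything unrecognized is treated as a manifest problem.
    return "release-blocking"
```

RBAC denials are matched on `cannot <verb> resource` rather than on `is forbidden`. PodSecurity rejections also contain `is forbidden`, but they usually point at a manifest problem, so matching on `is forbidden` would misclassify them.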

---

## T06 - Hand off Gitea package registry storage and retention posture

```task
id: RAILIANCE-WP-0005-T06
status: done
priority: medium
state_hub_task_id: "382ba252-0f54-45fa-8e33-e656f4472341"
```

Document the forge-owned operating posture for Gitea package storage while
Gitea remains the active forge.

Include:

- current PVC, size, and package blob location;
- expected retention approach for smoke-test images and Python wheels;
- cleanup procedure for superseded test tags;
- alert or inspection command for package storage growth;
- handoff to platform backup/restore work when package data becomes production
critical.

Done when registry growth is no longer only a note in S5 app docs.

Completed on 2026-06-05 by moving the canonical storage and retention posture
to `/home/worsch/railiance-forge/docs/initial-operating-contracts.md`; S5 app
runbooks now cite forge docs directly.
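
For the storage-growth item, an inspection helper could look like the sketch below. The 70% and 85% thresholds are assumptions; the canonical values belong in the forge operating contracts. The hypothetical `inspect_mount` only gives a meaningful answer where the package PVC is mounted, for example inside the Gitea pod.

```python
import shutil

# Assumed thresholds; the canonical values belong to railiance-forge docs.
WARN_RATIO = 0.70
CRITICAL_RATIO = 0.85


def classify_usage(used_bytes: int, total_bytes: int) -> str:
    """Map a used/total byte count to ok, warn, or critical."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    if ratio >= CRITICAL_RATIO:
        return "critical"
    if ratio >= WARN_RATIO:
        return "warn"
    return "ok"


def inspect_mount(path: str) -> tuple[str, float]:
    """Classify the filesystem holding `path`, e.g. a mounted package PVC."""
    usage = shutil.disk_usage(path)
    return classify_usage(usage.used, usage.total), usage.used / usage.total
```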
@@ -0,0 +1,414 @@
---
id: RAILIANCE-WP-0006
type: workplan
title: "Extract railiance-forge and formalize forge-layer contracts"
domain: financials
repo: railiance-apps
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
created: "2026-06-04"
updated: "2026-06-05"
state_hub_workstream_id: "1960b22b-82ae-4b67-9fff-397bd38bdd65"
---

# Extract railiance-forge and formalize forge-layer contracts

## Context

The 2026-06-04 intent review found the S1-S5 Railiance framework mostly sound:

```text
railiance-infra -> railiance-cluster -> railiance-platform ->
railiance-enablement -> railiance-apps
```

The main architecture smell is not the vertical layering. It is the set of
cross-cutting forge concerns currently straddling S4 and S5:

- Gitea/Forgejo runtime ownership;
- container and Python package registries;
- Actions runners and CI/CD substrate;
- source-hosting operational runbooks;
- artifact lifecycle, retention, and restore posture;
- package/image promotion evidence used by application releases.

The decision is to create a dedicated `railiance-forge` repository rather than
leaving forge responsibilities split between `railiance-apps` and
`railiance-enablement`.

This workplan coordinates the extraction and the accompanying framework
contracts so the new repo becomes a clean abstraction rather than another place
for cross-layer drift.

## Boundary Target

`railiance-forge` should own:

- current Gitea operation and future Forgejo migration/cutover;
- source forge deployment configuration and runbooks;
- container, package, and release artifact registries;
- forge-backed Actions runner substrate and repository automation hooks;
- artifact retention, registry storage, and restore readiness;
- forge observability and operational evidence;
- Fabric declarations for forge capabilities and interfaces.

`railiance-forge` should not own:

- server provisioning and OS hardening (`railiance-infra`);
- Kubernetes runtime primitives, ingress controllers, and cluster addons
(`railiance-cluster`);
- shared databases, object storage, caches, and runtime secret custody
(`railiance-platform`);
- generic workload templates, SDKs, and developer portal surfaces
(`railiance-enablement`);
- user-facing application release charts and app runbooks
(`railiance-apps`);
- application source code, package metadata, and image build definitions in
source repos.
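
The two ownership lists above amount to a lookup table from concern to owning repo. The sketch below encodes a paraphrased subset of them. The concern keys are illustrative labels, not a canonical taxonomy, and unknown concerns raise an error instead of defaulting to any repo.

```python
# Paraphrased from the Boundary Target lists; keys are illustrative labels.
OWNER_BY_CONCERN = {
    "gitea/forgejo operation": "railiance-forge",
    "container, package, and release artifact registries": "railiance-forge",
    "actions runner substrate": "railiance-forge",
    "artifact retention and restore readiness": "railiance-forge",
    "server provisioning and os hardening": "railiance-infra",
    "ingress controllers and cluster addons": "railiance-cluster",
    "shared databases and runtime secret custody": "railiance-platform",
    "workload templates and developer portal": "railiance-enablement",
    "application release charts and app runbooks": "railiance-apps",
    "application source and image build definitions": "source app repo",
}


def owner_of(concern: str) -> str:
    """Return the owning repo; unmapped concerns must be triaged, not guessed."""
    key = concern.strip().lower()
    if key not in OWNER_BY_CONCERN:
        raise LookupError(f"unmapped concern {concern!r}: extend the scope docs first")
    return OWNER_BY_CONCERN[key]
```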

## T01 - Define the `railiance-forge` repository contract

```task
id: RAILIANCE-WP-0006-T01
status: done
priority: high
state_hub_task_id: "129c2920-8868-4118-8772-ff5d925588cd"
```

Write the initial repository contract for `railiance-forge`.

Deliverables:

- `INTENT.md` explaining why the forge layer exists;
- `SCOPE.md` defining in-scope and out-of-scope responsibilities;
- `AGENTS.md` with State Hub integration, workplan prefix, repo identity, and
task-status vocabulary aligned with current State Hub canon;
- initial workplan convention using a new prefix, for example `FORGE-WP-`;
- explicit relationship to `railiance-apps`, `railiance-enablement`,
`railiance-platform`, and `railiance-cluster`.

Done when the repo can be created without ambiguity about whether forge work
belongs there.
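
If the new prefix follows the ID shapes already used in these workplans (`RAILIANCE-WP-0006`, `RAIL-AP-WP-0001`, `RAILIANCE-WP-0006-T10`), a validator could look like the sketch below. The four-digit workplan width and two-digit task width are inferred from existing IDs, not taken from State Hub canon.

```python
import re

# Shapes inferred from existing IDs; prefixes may contain hyphenated segments.
WORKPLAN_ID = re.compile(r"^(?P<prefix>[A-Z]+(?:-[A-Z]+)*)-WP-(?P<number>\d{4})$")
TASK_ID = re.compile(r"^(?P<workplan>[A-Z]+(?:-[A-Z]+)*-WP-\d{4})-T(?P<task>\d{2})$")


def task_belongs_to(task_id: str, workplan_id: str) -> bool:
    """True when both IDs are well formed and the task carries the workplan ID."""
    if not WORKPLAN_ID.match(workplan_id):
        return False
    match = TASK_ID.match(task_id)
    return bool(match) and match.group("workplan") == workplan_id
```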

---

## T02 - Create and register the `railiance-forge` repository

```task
id: RAILIANCE-WP-0006-T02
status: done
priority: high
state_hub_task_id: "0e71e5a6-ce3c-4334-abf3-750aa8bef4df"
```

Create the sibling repository and connect it to the work coordination model.

Steps:

- create `/home/worsch/railiance-forge`;
- initialize Git metadata and baseline files;
- add `INTENT.md`, `SCOPE.md`, `AGENTS.md`, `README.md`, and a first workplan;
- create or register the State Hub topic for the new repo;
- run `make fix-consistency REPO=railiance-forge` once State Hub knows the repo;
- verify that the new workplan appears as a State Hub workstream.

Done when `railiance-forge` is a first-class sibling repo with State Hub
indexing and no reliance on `railiance-apps` as its planning home.

---

## T03 - Move current forge deployment surface out of `railiance-apps`

```task
id: RAILIANCE-WP-0006-T03
status: done
priority: high
state_hub_task_id: "57162f50-d1a4-4fb3-b4fa-503939b22450"
```

Move the existing Gitea/registry operational surface into `railiance-forge`.

Candidate source files in `railiance-apps`:

- `helm/gitea-values.sops.yaml`;
- `helm/gitea-registry-values.yaml`;
- `manifests/gitea-ingress.yaml`;
- `releases/gitea/values.yaml`;
- Gitea-related `Makefile` targets;
- `docs/gitea-container-registry.md`;
- `docs/gitea-package-registry.md`;
- Gitea/package-registry sections in `SCOPE.md` and workplans.

Migration rules:

- preserve secret boundaries and do not expose decrypted values;
- keep the existing deployed service stable during the repo move;
- leave compatibility pointers in `railiance-apps` until operators have the
new paths;
- do not mix this move with the future Forgejo production cutover unless an
explicit cutover workplan says to.

Done when `railiance-apps` no longer owns source-forge deployment config, and
current Gitea operation has an equivalent or better home in `railiance-forge`.

Completed 2026-06-05: canonical registry docs moved to `railiance-forge`, with
compatibility pointers left in `railiance-apps`. Deploy-capable Gitea
Helm/SOPS/manifests and Makefile deploy/status targets also moved to
`railiance-forge` after the review in
`/home/worsch/railiance-forge/docs/deploy-capable-gitea-move-review.md`.
`railiance-apps` temporarily kept compatibility wrappers until
`RAILIANCE-WP-0006-T10`. No live deploy, SOPS decryption, or Kubernetes apply
was run.

---

## T04 - Re-scope `railiance-apps` and `railiance-enablement`

```task
id: RAILIANCE-WP-0006-T04
status: done
priority: high
state_hub_task_id: "64ac73e7-abe5-47df-abbc-51aed25a7ce4"
```

Update the adjacent repos so the new abstraction is visible in their own
contracts.

In `railiance-apps`:

- remove Gitea/Forgejo as an owned S5 workload after migration;
- describe forge services as upstream release infrastructure consumed by app
releases;
- keep application charts, app-level runbooks, smoke tests, and deployment
guardrails in S5;
- update `RAILIANCE-WP-0005` tasks that mention Gitea package storage so they
point to the new forge ownership.

In `railiance-enablement`:

- clarify that reusable CI/CD templates and developer portal paths live in S4;
- clarify that the forge runtime, package registries, and Actions runner
substrate live in `railiance-forge`;
- define the handoff from S4 templates to forge-hosted automation.

Done when a new operator can answer "is this S4, S5, or forge?" from the scope
documents without inspecting historical workplans.

Completed 2026-06-05: `railiance-apps/SCOPE.md` now treats forge as an upstream
service and keeps app release ownership separate from moved Gitea operations.
`railiance-enablement/SCOPE.md` and `railiance-enablement/INTENT.md` now
separate reusable S4 workflow templates and paved paths from forge-owned
runtime, registries, runner substrate, labels, credentials, and artifact
evidence.

---

## T05 - Define artifact lifecycle, retention, and provenance policy

```task
id: RAILIANCE-WP-0006-T05
status: done
priority: high
state_hub_task_id: "a99520c3-91dc-4af0-9f9b-0f0b53137be5"
```

Make artifact ownership explicit for images, packages, and source-hosted
release assets.

Cover:

- container image retention for smoke, development, and production tags;
- Python package retention for internal libraries such as `issue-core`;
- package/blob storage location and capacity inspection;
- cleanup commands and operator guardrails;
- artifact provenance expectations, including commit SHA tagging;
- future SBOM, vulnerability scanning, signing, and attestations;
- what evidence S5 needs before consuming a new artifact in an app release.

Done when package and image growth, cleanup, and release evidence are not
implicit tribal knowledge.

Completed on 2026-06-05 in
`/home/worsch/railiance-forge/docs/initial-operating-contracts.md`.
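
The sketch below shows smoke-tag cleanup selection under hypothetical conventions. It assumes smoke images are tagged `smoke-<sha>`, release candidates carry a bare commit SHA tag (like `cb7a6e4` in RAILIANCE-WP-0007), and only the newest three smoke tags are kept. It only selects candidates. Deletion should go through whatever guarded command the forge contract defines.

```python
import re
from datetime import datetime

# Hypothetical naming scheme: smoke tags look like "smoke-<7-40 hex chars>".
SMOKE_TAG = re.compile(r"^smoke-[0-9a-f]{7,40}$")
COMMIT_SHA_TAG = re.compile(r"^[0-9a-f]{7,40}$")


def superseded_smoke_tags(tags: dict[str, datetime], keep: int = 3) -> list[str]:
    """Return smoke tags older than the newest `keep`, oldest first.

    Only tags matching SMOKE_TAG are candidates, so development and
    production tags are never selected by this helper.
    """
    smoke = sorted(
        (tag for tag in tags if SMOKE_TAG.match(tag)),
        key=lambda tag: tags[tag],
        reverse=True,
    )
    return sorted(smoke[keep:], key=lambda tag: tags[tag])


def has_provenance_tag(tags: dict[str, datetime]) -> bool:
    """True when at least one tag is a bare commit SHA (assumed convention)."""
    return any(COMMIT_SHA_TAG.match(tag) for tag in tags)
```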

---

## T06 - Define CI runner, Actions, and GitOps ownership

```task
id: RAILIANCE-WP-0006-T06
status: done
priority: high
state_hub_task_id: "37945939-e7c5-4717-83d9-294873810fb3"
```

Set the boundary between forge runtime, S4 reusable automation, and S5 release
checks.

Questions to answer:

- Which repo owns Gitea/Forgejo Actions runner deployment and credentials?
- Which repo owns workflow templates?
- Which repo owns app-specific workflows?
- How are runner labels, placement, and secret access controlled?
- What is the cutover path from current Gitea Actions to future Forgejo?
- Which server-side manifest dry-run checks belong in S5, and which runner
prerequisites belong in forge?

Done when CI runner substrate and CI/CD templates no longer blur together.

Completed 2026-06-05: the detailed ownership contract now lives in
`/home/worsch/railiance-forge/docs/ci-runner-actions-gitops-ownership.md`.
It defines runner deployment and credential ownership, reusable template
ownership, app-specific workflow ownership, label and placement rules, secret
access constraints, the Gitea-to-Forgejo automation cutover path, GitOps
controller boundaries, and the split between S5 server-side dry-run checks and
forge-owned runner prerequisites. `railiance-apps` and `railiance-enablement`
now point at that contract from their scope/intent docs.

---

## T07 - Define backup, restore, and secret handoff contracts

```task
id: RAILIANCE-WP-0006-T07
status: done
priority: high
state_hub_task_id: "da8bfbab-4bc3-48f0-9837-acf43fec9f0c"
```

Document how forge data and credentials interact with platform services.

Cover:

- forge database and package blob backup responsibilities;
- restore drills for source repos, packages, runners, and registry data;
- handoff to `railiance-platform` for CNPG/object-storage backup mechanisms;
- secret custody boundaries with SOPS/age bootstrap and OpenBao runtime
delivery;
- what operators may reference in docs without storing live secret values;
- how S5 apps verify forge artifacts without owning registry credentials.

Done when the forge repo can state exactly what it owns, what S3 implements,
and what evidence consumers can rely on.

Completed 2026-06-05: the detailed handoff contract now lives in
`/home/worsch/railiance-forge/docs/backup-restore-secret-handoff.md`. It
defines the forge asset inventory; database/package blob restore gates;
`railiance-platform` handoffs for CNPG, object storage, OpenBao, and runtime
secret delivery; allowed versus forbidden operator references; SOPS/age
bootstrap boundaries; and how S5 apps cite forge artifact restore evidence
without owning registry credentials or package backup procedures.

---

## T08 - Add forge observability and operating evidence requirements

```task
id: RAILIANCE-WP-0006-T08
status: done
priority: medium
state_hub_task_id: "ce2ebe48-052a-4e3e-86aa-441274940078"
```

Close the framework gap around observability for the forge layer.

Define at least:

- health checks for web, SSH, registry, package, and Actions endpoints;
- log inspection and dashboard expectations;
- storage growth checks for package/blob data;
- alert thresholds or manual inspection commands;
- release-readiness evidence that downstream app deployments can cite;
- where future centralized observability should live if a dedicated repo or
S3/S4 surface is created.

Done when forge availability and artifact health can be inspected without
manually reconstructing the system.

Completed 2026-06-05: the detailed observability and operating evidence
contract now lives in
`/home/worsch/railiance-forge/docs/observability-operating-evidence.md`. It
defines read-only health checks for web/API, Git SSH, OCI registry, Python
package registry, Actions/runner, database, logs, and package storage; manual
storage thresholds and intervention rules; release-readiness evidence that S5
app deployments can cite; and the future split between forge-owned signal
definitions, platform-owned durable metrics/log storage, enablement templates,
and a possible future observability repo.
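
A read-only probe plan for the forge endpoints might look like the sketch below. The paths follow public conventions: Gitea's `/api/healthz`, the OCI distribution `/v2/` base endpoint, and Gitea's PyPI simple index under `/api/packages/<owner>/pypi/simple/`. The host, owner, package, and SSH port are assumptions. Accepting `401` only proves the endpoint is reachable behind auth.

```python
import socket
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpProbe:
    name: str
    url: str
    ok_statuses: frozenset[int]


def forge_http_probes(host: str, owner: str, package: str) -> list[HttpProbe]:
    """Build read-only GET probes for the forge web, OCI, and PyPI endpoints."""
    base = f"https://{host}"
    return [
        HttpProbe("web/api health", f"{base}/api/healthz", frozenset({200})),
        # The OCI distribution base endpoint returns 401 when auth is required.
        HttpProbe("oci registry", f"{base}/v2/", frozenset({200, 401})),
        HttpProbe(
            "pypi simple index",
            f"{base}/api/packages/{owner}/pypi/simple/{package}/",
            frozenset({200, 401}),
        ),
    ]


def run_http_probe(probe: HttpProbe, timeout: float = 5.0) -> bool:
    """True when the endpoint answers with one of the accepted statuses."""
    try:
        with urllib.request.urlopen(probe.url, timeout=timeout) as response:
            return response.status in probe.ok_statuses
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; an expected 401 still proves reachability.
        return error.code in probe.ok_statuses
    except OSError:
        # DNS, TLS, connection, and timeout failures (URLError is an OSError).
        return False


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """TCP reachability only; this does not complete an SSH handshake."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```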

---

## T09 - Add Railiance Fabric declarations for forge capabilities

```task
id: RAILIANCE-WP-0006-T09
status: done
priority: medium
state_hub_task_id: "fd231acc-fe55-417f-a27b-797e1f520e1d"
```

Use `railiance-fabric` to make the new layer visible as a graph contract.

Declare forge capabilities such as:

- source hosting;
- Git SSH endpoint;
- container registry;
- Python package registry;
- workflow runner substrate;
- artifact promotion evidence.

Declare dependencies such as:

- Kubernetes runtime from `railiance-cluster`;
- database, object storage, and secret services from `railiance-platform`;
- CI/CD templates from `railiance-enablement`;
- app release consumers from `railiance-apps`.

Done when State Hub or local Fabric tooling can show the new forge layer's
provider and consumer edges without relying only on prose docs.

Completed 2026-06-05: `railiance-fabric` now declares the forge layer as a
graph contract with source hosting, Git SSH, OCI registry, Python package
registry, workflow runner substrate, and artifact promotion evidence
capabilities. It also declares forge dependencies on Railiance Kubernetes,
CNPG PostgreSQL, OpenBao runtime secrets, and planned object storage, plus S5
and S4 consumer edges from `railiance-apps` and `railiance-enablement` to forge
capabilities. The S4 template relationship is modeled as enablement consuming
forge runner substrate, not forge depending on templates, to avoid a false
cycle. Validation passed with `0 error(s), 0 warning(s)`.
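
A toy graph shows why the edge direction matters. In the sketch below, each repo maps to the repos it depends on; this is not `railiance-fabric`'s declaration format. With enablement consuming the forge runner substrate, the graph is acyclic. Adding a forge-to-enablement dependency introduces exactly the cycle the completion note avoids.

```python
from typing import Optional

# Illustrative edges only: "A": {"B"} means A depends on B.
GRAPH = {
    "railiance-cluster": set(),
    "railiance-platform": {"railiance-cluster"},
    "railiance-forge": {"railiance-cluster", "railiance-platform"},
    "railiance-enablement": {"railiance-forge"},  # consumes runner substrate
    "railiance-apps": {"railiance-forge", "railiance-platform"},
}


def find_cycle(graph: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a node path, or None if acyclic (DFS)."""
    visiting: set[str] = set()
    done: set[str] = set()
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        if node in done:
            return None
        if node in visiting:
            # Back edge: the cycle is the path suffix starting at `node`.
            return path[path.index(node):] + [node]
        visiting.add(node)
        path.append(node)
        for dep in sorted(graph.get(node, ())):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None
```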

---

## T10 - Decommission compatibility pointers after migration

```task
id: RAILIANCE-WP-0006-T10
status: done
priority: medium
state_hub_task_id: "5bca2a11-453c-4cd0-b86a-7ed269564dfb"
```

After the new repo is operational, clean up transitional references.

Steps:

- remove stale Gitea/registry ownership language from `railiance-apps`;
- update docs that point to old file locations;
- archive or supersede workplans that were only about forge ownership in S5;
- ensure State Hub workstreams reference the new repo for forge work;
- run consistency sync for affected repos;
- record a final decision that `railiance-forge` is the source of truth for
forge runtime and artifact-registry operations.

Done when no active railiance workplan asks operators to modify forge runtime
state from the wrong repo.

Completed 2026-06-05: removed the app-side Gitea deploy/status compatibility
wrappers, deleted local registry pointer docs, changed `make check-sops` to
require an explicit `SOPS_SENTINEL`, refreshed active workplans so package
registry and token ownership point to `railiance-forge` and source repos,
archived `RAIL-AP-WP-0001`, and recorded the final source-of-truth decision in
`docs/forge-source-of-truth-decision.md` and State Hub.
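
The "update docs that point to old file locations" step can be checked with a simple scan. The sketch below assumes the moved paths are the T03 candidate files. It also treats anything under an `archived` directory (as in this commit's `workplans/archived/`) as historical and exempt. A plain substring match will also flag legitimate links that point into `railiance-forge`, so hits need review rather than automatic failure.

```python
from pathlib import Path

# Paths that moved to railiance-forge (from the T03 candidate list).
MOVED_PATHS = (
    "helm/gitea-values.sops.yaml",
    "helm/gitea-registry-values.yaml",
    "manifests/gitea-ingress.yaml",
    "releases/gitea/values.yaml",
    "docs/gitea-container-registry.md",
    "docs/gitea-package-registry.md",
)


def find_stale(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (file, line number, moved path) hits, skipping archived files."""
    hits = []
    for name in sorted(files):
        if "archived" in Path(name).parts:
            continue
        for lineno, line in enumerate(files[name].splitlines(), start=1):
            for moved in MOVED_PATHS:
                if moved in line:
                    hits.append((name, lineno, moved))
    return hits


def stale_references(repo_root: Path) -> list[tuple[str, int, str]]:
    """Scan every Markdown file under repo_root."""
    files = {
        path.relative_to(repo_root).as_posix(): path.read_text(encoding="utf-8")
        for path in repo_root.rglob("*.md")
    }
    return find_stale(files)
```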
@@ -0,0 +1,125 @@
---
id: RAILIANCE-WP-0007
type: workplan
title: "Deploy reuse-surface federation service on railiance01"
domain: financials
repo: railiance-apps
status: finished
owner: codex
topic_slug: railiance
created: "2026-06-15"
updated: "2026-06-15"
state_hub_workstream_id: "7da18dd8-76b9-4a70-b9d7-de541afc65c0"
---

# Deploy reuse-surface federation service on railiance01

Companion to **`reuse-surface` REUSE-WP-0011**. Own the S5 Helm release,
ingress, and operator targets for the federation service on production cluster
node `railiance01` (`92.205.62.239`).

## Goal

Expose the helix_forge federation API at **`https://reuse.coulomb.social`** so
repos can register capability index URLs via `reuse-surface hub` without
per-machine `sources.yaml` maintenance.

- Gitea repo: `coulomb/reuse-surface`
- OCI image: `gitea.coulomb.social/coulomb/reuse-surface:<tag>`

## DNS evidence

The `reuse.coulomb.social` A record points to **`92.205.62.239`** (operator
confirmed 2026-06-15). The ingress host is configured in
`charts/reuse-surface/values.yaml`.

## Upstream dependency

| Upstream | Workplan | Required artifact |
|---|---|---|
| Service + image | `reuse-surface` REUSE-WP-0011 | Image `gitea.coulomb.social/coulomb/reuse-surface:<tag>`, `reuse-surface serve`, `/health` |

Do not deploy until REUSE-WP-0011-T04 publishes a buildable image.

## Placement

Follow the `inter-hub` pattern:

- `charts/reuse-surface/` — Helm chart (Deployment, Service, Ingress, PVC)
- `helm/reuse-surface-values.yaml` — non-secret overrides (image tag)
- Secret `reuse-surface-env` with `REUSE_SURFACE_TOKEN`
- `Makefile` targets: `reuse-dry-run`, `reuse-deploy`, `reuse-status`, `reuse-logs`
- Namespace: `reuse`

## Safety contract

- Do not commit decrypted SOPS values or `REUSE_SURFACE_TOKEN`.
- Pin image tags in `helm/reuse-surface-values.yaml`.
- Mount the PVC at `/data` for the SQLite database (`reuse.db`) and the fetch
cache.

---

## Scaffold Helm Chart For reuse-surface

```task
id: RAILIANCE-WP-0007-T01
status: done
priority: high
state_hub_task_id: "d296f037-67a3-4b49-a773-6ebc2b252f3d"
```

Create `charts/reuse-surface/` with a Deployment (`reuse-surface serve`),
Service, PVC, Ingress, and probes on `/health`.

## Add Values, Secret Template, And Makefile Targets

```task
id: RAILIANCE-WP-0007-T02
status: done
priority: high
state_hub_task_id: "5050e2fb-b60c-4519-9168-81a6073fb4a2"
```

Add `helm/reuse-surface-values.yaml`, document Secret `reuse-surface-env`, and
add the Makefile `reuse-*` targets.

## Configure Ingress For reuse.coulomb.social

```task
id: RAILIANCE-WP-0007-T03
status: done
priority: medium
state_hub_task_id: "80dc308a-02e8-453c-a20a-d6f634b7ce12"
```

Ingress is enabled in the chart values:

- `ingress.host: reuse.coulomb.social`
- `cert-manager.io/cluster-issuer: letsencrypt-prod`
- Traefik annotations matching `inter-hub`

DNS A record live: `reuse.coulomb.social → 92.205.62.239`.

## Deploy Release To railiance01

```task
id: RAILIANCE-WP-0007-T04
status: done
priority: medium
state_hub_task_id: "14049fd1-3319-4a76-8b48-c4228a7939f7"
```

Deployed as Helm revision 3 (image `cb7a6e4`). The pod is Running, and `/health`
and `/v1/federated` were verified. TLS became Ready once the DNS A record
pointed to `92.205.62.239`.

## Post-Deploy Verification And Runbook

```task
id: RAILIANCE-WP-0007-T05
status: done
priority: low
state_hub_task_id: "30b08789-4eb7-4182-87d1-8e464fc968d1"
```

Runbook `docs/reuse-surface-on-railiance01.md` was updated with deploy evidence,
token retrieval, and a TLS/DNS operator note. Smoke checks pass through the
ingress over public TLS, with the DNS A record pointing to `92.205.62.239`.
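
The smoke checks recorded in T04 and T05 can be scripted as a small, read-only client, sketched below. The sketch assumes `/health` and `/v1/federated` should both answer HTTP 200 without a token. That expectation is not stated in this workplan, so adjust it if `/v1/federated` requires `REUSE_SURFACE_TOKEN`. The fetch function is injectable so the check logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "https://reuse.coulomb.social"
# Paths taken from T04; expecting HTTP 200 without a token is an assumption.
SMOKE_PATHS = ("/health", "/v1/federated")


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """GET url with default TLS verification and return the HTTP status code.

    Non-HTTP failures (DNS, TLS, connection refused) propagate as URLError,
    so a broken handshake is never mistaken for a status code.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def smoke_check(
    base_url: str = BASE_URL, fetch: Callable[[str], int] = fetch_status
) -> dict[str, bool]:
    """Map each smoke path to whether it answered HTTP 200."""
    return {path: fetch(base_url.rstrip("/") + path) == 200 for path in SMOKE_PATHS}
```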
@@ -0,0 +1,169 @@
---
id: RAILIANCE-WP-0008
type: workplan
title: "Add friendly landing pages for S5 service endpoints"
domain: financials
repo: railiance-apps
status: finished
owner: codex
topic_slug: railiance
created: "2026-06-15"
updated: "2026-06-15"
state_hub_workstream_id: "41b9deea-a935-4372-8916-b3981336f597"
---

# Add friendly landing pages for S5 service endpoints

Create a small, reusable landing-page pattern for public S5 application
endpoints. The page should give humans a clear next step without changing the
machine-facing API contract:

- UI-backed applications may show a friendly page with a short redirect notice
  and a visible **Continue** button to the correct login or application route.
- API-only services should show a proper no-login informational page at `/`,
  with links to health/status/runbook documentation where appropriate.

First rollout target: **`https://reuse.coulomb.social`**, which hosts the
`reuse-surface` federation API and intentionally has no browser login UI.

## Current Context

`RAILIANCE-WP-0007` deployed the `reuse-surface` Helm release and runbook for
`reuse.coulomb.social`. The release is useful to machines and operators, but a
human visiting `/` should not be left with an API-shaped response or a dead end.

The implementation must preserve the existing CLI/API surface:

- `/health` remains suitable for Kubernetes probes and external smoke checks.
- `/v1/*` API routes continue to reach the `reuse-surface` service.
- Authenticated federation operations continue to use `REUSE_SURFACE_TOKEN`;
  token material must not be exposed in the landing page, chart values, or logs.

## Design Constraints

- Keep the page static and non-secret.
- Prefer explicit Ingress path routing over changing API semantics when a
  service has no upstream web UI.
- Use exact `/` routing for the landing page when possible, so API prefixes are
  unaffected.
- Add `noindex` metadata for operator/service landing pages that are not meant
  to be public marketing surfaces.
- Keep the pattern reusable for later services with login destinations.

---

## Define Landing Page Contract

```task
id: RAILIANCE-WP-0008-T01
status: done
priority: high
state_hub_task_id: "ca47efdb-ed0e-47bf-90f3-5a78c6861ebf"
```

Document the endpoint contract in `docs/s5-app-onboarding-checklist.md` or a
small companion doc:

- API-only service: static informational page at `/`, no login claim, no token
  hints, and links to the health/status endpoints and the operator runbook.
- UI-backed service: the landing page may auto-forward after a short delay and
  must include a visible button to the canonical login/application route.
- All variants must preserve health, API, OAuth callback, and asset paths.

Done when future S5 app workplans can cite a single landing-page rule instead
of rediscovering the behavior per service.

## Add Reusable Helm Support

```task
id: RAILIANCE-WP-0008-T02
status: done
priority: high
state_hub_task_id: "44dfe49f-a3f4-4381-a4ca-eb9218099868"
```

Add a reusable Helm pattern for a static landing page, either as shared snippets
or as chart-local templates following one documented convention. The pattern
should support:

- an enable/disable flag in values;
- static HTML content from a ConfigMap;
- a lightweight static HTTP container or equivalent service;
- an exact `/` Ingress path to the landing service;
- prefix routes for API and app paths to the existing backend;
- an optional redirect target and button label for UI-backed applications.

Done when `helm template` can render both disabled and enabled landing-page
variants without changing non-landing deployments.

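One way to check the "Done when" condition is to render both variants and compare their resource lists. This is only a sketch with several assumptions:

- the chart exposes a `landing.enabled` toggle (a hypothetical key);
- landing resources carry `landing` in their names.

The check compares resource identities, not their contents, so Ingress path changes still need a manual diff.

```bash
#!/usr/bin/env bash
# Hypothetical check: enabling the landing page may only ADD landing resources.
CHART="${CHART:-charts/reuse-surface}"
VALUES="${VALUES:-helm/reuse-surface-values.yaml}"

# Render the chart with the (assumed) landing toggle set to true or false.
render() {
  helm template reuse-surface "$CHART" -f "$VALUES" --set "landing.enabled=$1"
}

# Print "Kind/name" for every rendered resource, sorted, using the first
# two-space-indented `name:` after each top-level `kind:` (metadata.name).
resource_ids() {
  awk '/^kind:/ { kind = $2 }
       /^  name:/ && kind != "" { print kind "/" $2; kind = "" }' | sort
}

landing_diff_is_isolated() {
  local off on
  off="$(render false | resource_ids)"
  on="$(render true | resource_ids)"
  # Anything present only in the disabled render would mean a removal.
  [[ -z "$(comm -23 <(printf '%s\n' "$off") <(printf '%s\n' "$on"))" ]] || return 1
  # Every resource added by the enabled render must be a landing resource.
  ! comm -13 <(printf '%s\n' "$off") <(printf '%s\n' "$on") | grep -qv landing
}
```

Usage: `landing_diff_is_isolated && echo "landing variant is isolated"`.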
## Implement API-Only Landing Page For reuse.coulomb.social

```task
id: RAILIANCE-WP-0008-T03
status: done
priority: high
state_hub_task_id: "4e90315d-f0f0-49ca-b44f-533c23a44e96"
```

Enable the landing-page pattern in `charts/reuse-surface` and
`helm/reuse-surface-values.yaml` for `reuse.coulomb.social`.

The content should make clear that this endpoint is the Railiance REUSE
capability federation service, not a login application. Include non-secret
operator links or text for:

- the service purpose;
- `/health`;
- `/v1/federated`;
- `docs/reuse-surface-on-railiance01.md`.

Mark the page with `noindex` metadata.

Done when a browser request to `/` receives the landing page while `/health`
and `/v1/federated` still reach the API service.

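The routing half of this criterion can be checked by Content-Type. The following is a minimal sketch that assumes `curl` is installed. The expected types come from the completion evidence in T05 of this workplan: `text/html` for `/` and `application/json` for the API paths. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical routing check: landing page at `/`, API behind the API paths.
BASE_URL="${BASE_URL:-https://reuse.coulomb.social}"

# Print "<status> <content-type>" for a URL, e.g. "200 text/html; charset=utf-8".
probe() {
  curl -sS -o /dev/null -w '%{http_code} %{content_type}' --max-time 10 "$1" || true
}

# Succeed if PATH answers 200 with a Content-Type that starts with WANT.
route_serves() {
  local path="$1" want="$2" status ctype
  read -r status ctype <<<"$(probe "${BASE_URL}${path}")"
  [[ "$status" == "200" && "$ctype" == "$want"* ]]
}

check_landing_routes() {
  route_serves / text/html &&
    route_serves /health application/json &&
    route_serves /v1/federated application/json
}
```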
## Add Login-Forward Variant For UI Applications

```task
id: RAILIANCE-WP-0008-T04
status: done
priority: medium
state_hub_task_id: "80c843b2-68d0-4f8a-aba6-39db1a2a6f70"
```

Prepare the values contract and at least one documented example for services
that do have a browser UI. The page should explain that the user is being sent
to the application and include a visible button to the configured login or
application route.

Done when a future UI-backed app can set only values such as `redirectTarget`,
`buttonLabel`, and explanatory copy, without editing templates.

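To illustrate the values contract, here is a sketch of the HTML such a page could render from `redirectTarget` and `buttonLabel`. In practice a Helm template would produce this page. The following are assumptions made for illustration:

- the Bash function and its name;
- the 5-second default delay;
- the page wording.

Values are HTML-escaped so that a target URL containing `&` stays valid inside attributes.

```bash
#!/usr/bin/env bash
# Hypothetical renderer for a login-forward landing page.

# Escape &, <, >, and " so values cannot break out of HTML text or attributes.
html_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Args: redirectTarget, buttonLabel, optional delay in seconds (default 5).
render_forward_page() {
  local target label delay="${3:-5}"
  target="$(html_escape "$1")"
  label="$(html_escape "$2")"
  cat <<EOF
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="robots" content="noindex">
  <meta http-equiv="refresh" content="${delay}; url=${target}">
  <title>${label}</title>
</head>
<body>
  <p>You are being sent to the application in ${delay} seconds.</p>
  <p><a href="${target}">${label}</a></p>
</body>
</html>
EOF
}
```

The visible link duplicates the refresh target on purpose. Users with automatic refresh disabled still have a working **Continue** path.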
## Verify, Deploy, And Update Runbook

```task
id: RAILIANCE-WP-0008-T05
status: done
priority: medium
state_hub_task_id: "0fc250a8-8fea-4c88-bd81-ad7ecf6143a1"
```

Render, deploy, and smoke-test the reuse landing page on railiance01:

- run `make reuse-dry-run`;
- deploy with a pinned image and non-secret landing values;
- verify `/`, `/health`, and `/v1/federated` over the public hostname;
- confirm no secret values appear in manifests or rendered HTML;
- update `docs/reuse-surface-on-railiance01.md` with the landing-page behavior
  and rollback notes.

Done when the public browser experience is friendly and the existing
machine-facing API checks still pass.

Completed 2026-06-15: deployed Helm revision 5 to namespace `reuse`.
`reuse-surface-landing` is Running; HTTP `/` redirects to HTTPS; HTTPS `/`
returns `200 text/html`; `/health` returns `200 application/json`;
`/v1/federated` returns `200 application/json` with 12 capabilities; and the
fetched landing page contains no token references. A follow-up fix separated
the API and landing Service selectors with `app.kubernetes.io/component`
labels and changed ingress routing to explicit API paths (`/health`, `/v1`)
plus a landing fallback at `/`.

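The "no secret values" check can be scripted as two filters over stdin. This sketch relies on two assumptions:

- The landing HTML should not mention tokens at all, matching the contract's "no token hints".
- Manifests are checked only for the literal secret value, because chart templates may legitimately name the `REUSE_SURFACE_TOKEN` variable.

The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical secret-leak filters; both read the text to check on stdin.

# Landing-page HTML must not mention tokens at all (no token hints).
landing_has_no_token_refs() {
  ! grep -qi 'token'
}

# Rendered manifests must not contain the literal secret value. The pattern
# is read from a process substitution, so the secret never appears in grep's
# argv (and therefore not in `ps` output).
manifests_lack_value() {
  local value="$1"
  [[ -n "$value" ]] || { echo "no secret value given" >&2; return 2; }
  ! grep -qF -f <(printf '%s\n' "$value")
}

# Usage (assumes curl, helm, and REUSE_SURFACE_TOKEN exported locally):
#   curl -sS https://reuse.coulomb.social/ | landing_has_no_token_refs && echo "landing page clean"
#   helm template reuse-surface charts/reuse-surface -f helm/reuse-surface-values.yaml \
#     | manifests_lack_value "$REUSE_SURFACE_TOKEN" && echo "manifests clean"
```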
@@ -0,0 +1,29 @@
---
id: RAILIANCE-WP-0009
type: workplan
title: "Inter-hub chart handoff"
domain: financials
repo: railiance-apps
status: finished
owner: codex
topic_slug: railiance
created: "2026-06-14"
updated: "2026-06-14"
state_hub_workstream_id: "21c3fcdc-f014-4858-8028-0ac3038212b7"
---

# Inter-hub chart handoff

## Add inter-hub chart handoff

```task
id: RAILIANCE-WP-0009-T01
status: done
priority: medium
state_hub_task_id: "8d32b24c-652f-41b4-9aa9-062677b09039"
```

Added `charts/inter-hub` and `helm/inter-hub-values.yaml` so IHUB-WP-0018 has
a concrete Railiance app handoff chart matching the existing railiance-apps
layout. The source Inter-Hub repo remains the owner of the runtime app code,
workflow, and SOPS secret handoff.
@@ -0,0 +1,55 @@
---
id: RAILIANCE-WP-0010
type: workplan
title: "Inbox suggestion guardrails"
domain: financials
repo: railiance-apps
status: finished
owner: codex
topic_slug: railiance
created: "2026-06-15"
updated: "2026-06-15"
state_hub_workstream_id: "a4f301bb-3f49-450a-8dbd-5cf6a16b8e0f"
---

# Inbox suggestion guardrails

## Apply reuse-surface production guardrails

```task
id: RAILIANCE-WP-0010-T01
status: done
priority: high
state_hub_task_id: "658bde0e-708f-42d9-9b8f-b096bb0b07d6"
```

Added Railiance01 kubeconfig guardrails to the reuse-surface Makefile targets,
documented the production kubeconfig restore path, added `make reuse-smoke`,
and corrected stale `RAILIANCE-WP-0007` host/TLS notes to `92.205.62.239`.

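A kubeconfig guardrail of this kind usually compares the active kubectl context with the production one before any Helm call. The following sketch assumes the production context is named `railiance01`. That name is a placeholder: use the name from the restored kubeconfig. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical guardrail: refuse production targets on the wrong cluster.
EXPECTED_CONTEXT="${EXPECTED_CONTEXT:-railiance01}"

require_prod_context() {
  local current
  current="$(kubectl config current-context 2>/dev/null)" || {
    echo "no current kubectl context; restore the railiance01 kubeconfig first" >&2
    return 1
  }
  if [[ "$current" != "$EXPECTED_CONTEXT" ]]; then
    echo "refusing: kubectl context is '$current', expected '$EXPECTED_CONTEXT'" >&2
    return 1
  fi
}

# Usage: require_prod_context && make reuse-smoke
```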
## Document inter-hub rollout verification

```task
id: RAILIANCE-WP-0010-T02
status: done
priority: high
state_hub_task_id: "857a4d75-cc09-4bd1-970d-5db866354ef3"
```

Added Makefile support for `INTER_HUB_IMAGE_TAG=<sha>` deploy overrides, aligned
server dry-run handling, and documented the release inspection, migration
boundary, immutable legacy selector guardrail, and v2 API smoke gate in
`docs/inter-hub-on-railiance01.md`. The production deploy target now requires an
explicit image tag to avoid silently deploying a stale checked-in value.

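Here is a sketch of the explicit-tag gate in shell. It assumes tags are abbreviated or full git SHAs (7 to 40 hex characters), matching tags such as `cb7a6e4` used elsewhere in these workplans. The function name is hypothetical; the real Makefile target may express the check differently.

```bash
#!/usr/bin/env bash
# Hypothetical gate: reject empty or mutable-looking tags before Helm runs.
require_image_tag() {
  local tag="${1:-}"
  if [[ -z "$tag" ]]; then
    echo "INTER_HUB_IMAGE_TAG is required, e.g. INTER_HUB_IMAGE_TAG=<sha>" >&2
    return 1
  fi
  # Only SHA-shaped tags pass; names like `latest` or `main` are refused.
  if [[ ! "$tag" =~ ^[[:xdigit:]]{7,40}$ ]]; then
    echo "refusing tag '$tag': expected a git SHA (7-40 hex characters)" >&2
    return 1
  fi
}

# Usage: require_image_tag "${INTER_HUB_IMAGE_TAG:-}" || exit 1
```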
## Adopt State Hub task status canon

```task
id: RAILIANCE-WP-0010-T03
status: done
priority: medium
state_hub_task_id: "0192f41a-d267-4cb7-ac2d-d03de0341c18"
```

Updated `AGENTS.md` task status examples from the legacy aliases to the
canonical `wait`, `todo`, `progress`, `done`, and `cancel` values.
@@ -0,0 +1,87 @@
---
id: RAILIANCE-WP-0011
type: workplan
title: "Inter-Hub production trigger hardening"
domain: financials
repo: railiance-apps
status: finished
owner: codex
topic_slug: railiance
created: "2026-06-15"
updated: "2026-06-15"
state_hub_workstream_id: "98cf42ae-9b64-4736-97e1-bae325ded1f9"
---

# Inter-Hub production trigger hardening

## Goal

Turn the local Inter-Hub deploy surface into a safe production trigger for
Railiance01. The trigger must reject a missing image before invoking Helm, use
the current Inter-Hub v2 API smoke contract, and expose a manual workflow path
with the same gates as an attended local operator deploy.

## Add OCI Image Preflight

```task
id: RAILIANCE-WP-0011-T01
status: done
priority: high
state_hub_task_id: "10e27372-fb8b-40ac-b1f8-1c2c78fea0da"
```

Add a reusable image manifest preflight for
`gitea.coulomb.social/coulomb/inter-hub:<tag>` and wire production deploys to
fail before Helm when the requested tag is absent or inaccessible.

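A preflight like this can be expressed with `docker manifest inspect`, which queries the registry for the manifest without pulling layers. This is a sketch only:

- `skopeo inspect` or `crane manifest` would be equally valid choices.
- The operator is assumed to be logged in to `gitea.coulomb.social` if the package is private.
- `image_manifest_exists` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical OCI preflight, run before any Helm command.
IMAGE_REPO="${IMAGE_REPO:-gitea.coulomb.social/coulomb/inter-hub}"

# Succeed only if the registry serves a manifest for IMAGE_REPO:TAG.
image_manifest_exists() {
  local ref="${IMAGE_REPO}:$1"
  if docker manifest inspect "$ref" >/dev/null 2>&1; then
    echo "preflight ok: $ref"
  else
    echo "preflight failed: $ref is missing or not accessible; not running Helm" >&2
    return 1
  fi
}

# Usage: image_manifest_exists "$INTER_HUB_IMAGE_TAG" || exit 1
```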
## Split Baseline Render From Production Dry-Run

```task
id: RAILIANCE-WP-0011-T02
status: done
priority: high
state_hub_task_id: "c48320db-9ed7-4792-89a6-f55691919891"
```

Keep a baseline render target for chart validation with checked-in values, but
make production-facing Inter-Hub dry-runs require an explicit
`INTER_HUB_IMAGE_TAG`.

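The split can be sketched as two functions:

- a pure `helm template` render that needs no cluster;
- a `helm upgrade --install --dry-run=server` run that needs both a live cluster and an explicit tag.

The release name, the `image.tag` key, and the `inter-hub` namespace default are assumptions about the chart. The `--dry-run=server` option requires Helm 3.13 or newer.

```bash
#!/usr/bin/env bash
# Baseline render: checked-in values only; needs neither a cluster nor a tag.
inter_hub_render() {
  helm template inter-hub charts/inter-hub -f helm/inter-hub-values.yaml
}

# Production-facing dry-run: refuses to run without an explicit tag, then asks
# the live API server to validate the release (Helm >= 3.13).
inter_hub_server_dry_run() {
  local tag="${INTER_HUB_IMAGE_TAG:-}"
  if [[ -z "$tag" ]]; then
    echo "INTER_HUB_IMAGE_TAG is required for production dry-runs" >&2
    return 1
  fi
  helm upgrade --install inter-hub charts/inter-hub \
    -f helm/inter-hub-values.yaml \
    --namespace "${INTER_HUB_NAMESPACE:-inter-hub}" \
    --set "image.tag=${tag}" \
    --dry-run=server
}
```

Keeping the baseline render tag-free lets CI validate chart changes without cluster credentials. The production path always states which image it would ship.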
## Update Inter-Hub Smoke Contract

```task
id: RAILIANCE-WP-0011-T03
status: done
priority: high
state_hub_task_id: "b3260f7a-6dcb-4bb4-ae53-bf81c0081e86"
```

Update `inter-hub-smoke` to match the current public-read/authenticated-write
contract: `/api/v2/hubs` serves public discovery data, protected resources
reject anonymous access, and the OpenAPI document is served from
`/api/v2/openapi.json`.

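The contract translates into three anonymous probes. This sketch assumes `curl` is installed and relies on these assumptions:

- `BASE_URL` and `PROTECTED_PATH` are placeholders. The protected path is hypothetical; replace it with a real authenticated endpoint listed in `/api/v2/openapi.json`.
- Accepting either 401 or 403 for anonymous access is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical public-read/authenticated-write smoke check.
BASE_URL="${BASE_URL:-https://inter-hub.example}"
PROTECTED_PATH="${PROTECTED_PATH:-/api/v2/protected-example}"

# Print only the HTTP status for an anonymous request to a path.
anon_status() {
  curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "${BASE_URL}$1" || true
}

inter_hub_smoke() {
  local code
  code="$(anon_status /api/v2/hubs)"
  [[ "$code" == 200 ]] || { echo "hubs discovery: got $code, want 200" >&2; return 1; }
  code="$(anon_status /api/v2/openapi.json)"
  [[ "$code" == 200 ]] || { echo "openapi: got $code, want 200" >&2; return 1; }
  code="$(anon_status "$PROTECTED_PATH")"
  [[ "$code" == 401 || "$code" == 403 ]] || {
    echo "protected resource: got $code, want 401/403 for anonymous access" >&2
    return 1
  }
}
```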
## Add Manual Production Deploy Workflow

```task
id: RAILIANCE-WP-0011-T04
status: done
priority: high
state_hub_task_id: "32ca0b17-fb7c-4cd5-a846-ff92933daf89"
```

Add a `workflow_dispatch` Gitea Actions workflow that requires an immutable
image tag and confirmation text, verifies the image manifest, runs a Helm
server-side dry-run, deploys, shows the release status, and runs smoke checks.

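A `workflow_dispatch` run step can enforce the confirmation text before it touches the registry or the cluster. This sketch assumes the operator must type `deploy inter-hub <tag>`. That phrase format is a hypothetical convention, not taken from the actual workflow.

```bash
#!/usr/bin/env bash
# Hypothetical confirmation gate for a manual production deploy. The run
# aborts unless the operator typed the exact phrase for this tag, so a
# mistyped or copy-pasted tag never reaches the registry check or Helm.
require_confirmation() {
  local tag="$1" confirm="$2"
  local expected="deploy inter-hub ${tag}"
  if [[ -z "$tag" || "$confirm" != "$expected" ]]; then
    echo "confirmation mismatch: expected '${expected}'" >&2
    return 1
  fi
}

# In a workflow step the inputs would typically arrive as environment
# variables, e.g.:
#   require_confirmation "$IMAGE_TAG" "$CONFIRM" || exit 1
```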
## Update Runbook And Closure Evidence

```task
id: RAILIANCE-WP-0011-T05
status: done
priority: medium
state_hub_task_id: "0369b47a-09f0-4780-9c91-556049a0d505"
```

Document the local and workflow production paths, the failure classification
for a missing image tag, the current smoke expectations, and validation
evidence for the implemented deploy surface.
@@ -0,0 +1,622 @@
---
id: RAILIANCE-WP-0002
type: workplan
title: "Establish vergabe-teilnahme as an Application on railiance01"
domain: financials
repo: railiance-apps
status: finished
owner: railiance
topic_slug: railiance
created: "2026-05-18"
updated: "2026-05-18"
planning_priority: high
planning_order: 2
state_hub_workstream_id: "94522a85-80d5-4f2c-8eb0-8d0bcb15f3b0"
---

# Establish vergabe-teilnahme as an Application on railiance01

## Goal

Deploy the `vergabe-teilnahme` Django application as a governed Helm release on
the production cluster node `railiance01` (`92.205.130.254`), reachable at
`https://vergabe-teilnahme.whywhynot.de`, with its container image published
through the Railiance Gitea OCI registry and its PostgreSQL data living in the
shared cnpg cluster.

This establishes vergabe-teilnahme as the second application (after Gitea)
running on the S5 layer of the Railiance OAS Stack and exercises the freshly
enabled container registry from `RAILIANCE-WP-0001` end-to-end.

## Placement in the Railiance Tooling Set

This workplan lives in `railiance-apps` because vergabe-teilnahme is an S5
application workload. The deployment surface added by this workplan is:

- `helm/vergabe-teilnahme-values.sops.yaml`: SOPS-encrypted Helm values
  (`DJANGO_SECRET_KEY`, DB DSN, etc.).
- `releases/vergabe-teilnahme/`: chart selection / overlay (Bitnami generic
  chart or hand-rolled chart, decided in T05).
- `manifests/vergabe-teilnahme-ingress.yaml`: Traefik ingress + cert-manager
  TLS for `vergabe-teilnahme.whywhynot.de`.
- `Makefile` targets: `vergabe-deploy`, `vergabe-status`, `vergabe-migrate`.

Cross-repo coordination required:

| Concern | Owner repo | Coordination |
|---------|------------|--------------|
| Application Helm release | `railiance-apps` | This workplan |
| Containerization (Dockerfile, entrypoint, asset build) | `vergabe-teilnahme` | This workplan opens a task in that repo |
| PostgreSQL role + database in shared cnpg cluster | `railiance-platform` | Capability request via hub |
| DNS A record for `vergabe-teilnahme.whywhynot.de` | DNS owner of `whywhynot.de` | Out-of-band, captured in T06 |
| Ingress controller / cluster routing primitives | `railiance-cluster` | Reuse; no changes expected |
| cert-manager ClusterIssuer `letsencrypt-prod` | `railiance-platform` | Reuse; no changes expected |

## Current Evidence

- `vergabe-teilnahme/CLAUDE.md`: Django 6.x · uv · Tailwind CSS v4 (Vite) ·
  HTMX 2.x · Alpine.js 3.x · PostgreSQL 16+ (psycopg3). German UI.
- `vergabe-teilnahme/` currently has no `Dockerfile`. `docker-compose.dev.yml`
  documents the local Postgres, but it isn't started when the shared
  `infra-postgres-1` is up.
- `railiance-apps/Makefile` deploys Gitea via `helm/gitea-values.sops.yaml`;
  the same SOPS + Helm pattern is the template for this workplan.
- `RAILIANCE-WP-0001` confirmed that `https://gitea.coulomb.social/v2/` returns
  the OCI registry auth challenge and established the image naming convention
  `gitea.coulomb.social/<org>/<image>:<tag>`.
- `manifests/gitea-ingress.yaml` confirms the ingress recipe:
  `ingressClassName: traefik` plus the annotation
  `cert-manager.io/cluster-issuer: letsencrypt-prod`.
- Domain `whywhynot.de` has no prior references in any railiance repo, so DNS
  and a fresh Let's Encrypt certificate need to be set up.
- Live cnpg state: `gitea-db` runs in the `databases` namespace. T01
  confirms whether a single shared cluster exists for app DBs or whether
  one must be designated.

## Safety Contract

- Do not commit decrypted Helm values, the Django `SECRET_KEY`, DB
  credentials, or any other secret material.
- Use a dedicated PostgreSQL role with privileges scoped to a single
  database; never reuse the gitea role or grant superuser.
- No public exposure until cert-manager has successfully issued a TLS
  certificate for `vergabe-teilnahme.whywhynot.de`.
- Never run with `DEBUG=True`; use production settings only.
- Preserve Gitea behavior: the new ingress must not conflict with the
  `gitea` ingress on the cluster's default ingress controller.
- If DB role provisioning requires changes to the shared cnpg cluster's
  resource limits, pause and create a `railiance-platform` task.
- If DNS for `whywhynot.de` is owned outside this operator's control,
  pause T06 until DNS ownership is confirmed.
- Start fresh: this workplan does not migrate data from any existing dev
  database.

## Target State

- The `vergabe-teilnahme:<tag>` image is built and pushed to
  `gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>`.
- A `vergabe` PostgreSQL role and a `vergabe_db` database exist in the
  shared cnpg cluster (single role, single DB, no cross-app grants).
- A Helm release `vergabe-teilnahme` is deployed in a dedicated
  namespace with a single replica, a Service, a PVC for media (if any),
  and the necessary secrets sourced from SOPS values.
- Django `migrate` and `make seed` have run successfully against the
  shared cnpg database.
- `https://vergabe-teilnahme.whywhynot.de` serves the Django app behind
  a valid Let's Encrypt certificate.
- Login as a superuser succeeds, the dashboard renders, and static assets
  are served correctly (Tailwind/Vite build pipeline integrated into the
  image).
- The runbook, registry naming, DB credential handling, and rollback steps
  are documented for the next operator.

## Tasks

### T01 — Inventory the deployment substrate

```task
id: RAILIANCE-WP-0002-T01
status: done
priority: high
state_hub_task_id: "49aa7d85-96bd-4d97-952c-80dcfff06610"
```

Confirm the pre-conditions before any code is written.

Checks:

- Identify the shared cnpg cluster intended for app databases (name,
  namespace, version, current databases/roles, available capacity).
- Verify `gitea.coulomb.social/v2/` still returns an OCI registry auth
  challenge and that the operator has a package-capable token.
- Verify cert-manager `letsencrypt-prod` ClusterIssuer is healthy and
  has successfully issued at least one production certificate recently
  (`gitea-tls` is a known good example).
- Confirm DNS ownership and the change path for `whywhynot.de`: record
  who owns the zone and how an A record is added.
- Confirm Traefik is the active ingress controller and note the public
  IP/hostname an A record must point at.

**Done when:** the workplan records (a) the cnpg cluster to use,
(b) the operator's path to a Gitea package token, (c) the DNS change
path for `whywhynot.de`, and (d) any pre-condition gaps.

**Findings (2026-05-18):**

- **cnpg landscape: no shared cluster yet.** `kubectl get clusters.postgresql.cnpg.io -A` returns two
  app-dedicated clusters in the `databases` namespace:
  - `gitea-db`: `ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie`, 1 instance, 10Gi
  - `net-kingdom-pg`: `ghcr.io/cloudnative-pg/postgresql:16`, 1 instance, 10Gi

  Neither was provisioned as a shared cluster. The user's earlier choice
  ("shared cnpg cluster, new database role") therefore requires a sub-decision;
  see **Decision D-01** below.
- **Gitea registry reachable.** `curl --resolve gitea.coulomb.social:443:92.205.130.254
  https://gitea.coulomb.social/v2/` returns `HTTP 405` for `HEAD` with a valid TLS chain
  (cert: `default/gitea-tls`, ready 3d). The OCI endpoint is up; the 405 is expected
  because the probe used `HEAD` rather than `GET`.
- **Gitea package token: still required.** No package-capable PAT is currently held
  by the operator in this session (carryover blocker from `RAILIANCE-WP-0001-T04`).
  A user with `write:package` scope must mint the token via the Gitea web UI before T03.
- **Public DNS for `whywhynot.de`:** A record currently `217.160.0.212` (IONOS web hosting).
  Authoritative NS = `ns1126.ui-dns.{de,biz,com,org}` (IONOS / 1&1). The zone is
  administered through the operator's IONOS web console, so the DNS change is a manual
  out-of-band step.
- **Traefik LB public IP:** `92.205.130.254` (`kube-system/traefik` LoadBalancer service,
  ports 80/443). This is the target the new A record must point at.
- **cert-manager:** `ClusterIssuer/letsencrypt-prod` is `Ready=True` (59d). Most recent
  successful issuance: `default/gitea-tls`, 3d4h ago. Several stale failing certs in the
  `mfa` and `sso` namespaces are unrelated to this workplan.
- **Pre-condition gaps before downstream tasks unblock:**
  1. D-01 below (cnpg target cluster): blocks T04.
  2. Gitea package-capable PAT: blocks T03.
  3. DNS A record for `vergabe-teilnahme.whywhynot.de → 92.205.130.254`:
     blocks T06.

**Decision D-01: cnpg target for `vergabe_db`** (required before T04; resolved below):

| Option | Pros | Cons |
|--------|------|------|
| A. New dedicated cluster `vergabe-pg` | Matches the existing one-cluster-per-app pattern; clean blast radius | Resource cost grows linearly with apps; no actual "shared" cluster emerges |
| B. Add role+db to existing `net-kingdom-pg` (PG 16) | Reuses a healthy PG 16 cluster matching vergabe-teilnahme's minimum; lowest cost | Cluster name no longer reflects its content; coupling with netkingdom domain |
| C. Add role+db to existing `gitea-db` (PG 18) | Newest cluster image; same operator | Couples gitea ops with vergabe ops; name no longer reflects content |
| D. Provision a new general-purpose cluster `apps-pg` (PG 16+) | Establishes a real shared cluster that future apps adopt | Net-new infra; needs a `railiance-platform` task to own the cluster |

Recommendation: **D** (creates the "shared cluster" the user asked for as a real
artifact rather than retrofitting an existing name). Recorded as a pending hub decision.

**Resolution (2026-05-18, bernd):** Option D. Provision a new shared cnpg cluster
`apps-pg` (PG 16, 1 instance, 10Gi initial) in namespace `databases`. cnpg `Cluster`
CRs live in `railiance-platform` per ADR-003 (confirmed: `helm/gitea-db-cluster.yaml`).
A coordination message has been sent to `railiance-platform` requesting the cluster.
T04 below is now sequenced **after** that cluster reports healthy.

---

### T02 — Add Dockerfile and asset build to vergabe-teilnahme

```task
id: RAILIANCE-WP-0002-T02
status: done
priority: high
repo: vergabe-teilnahme
state_hub_task_id: "43ce85c4-0bdb-43c4-b0a5-81fa366800a6"
```

Open a companion task in the `vergabe-teilnahme` repo to add a
production-ready container image definition.

Scope:

- Multi-stage `Dockerfile` at the repo root:
  - Stage 1: Node + Vite + Tailwind asset build (`npm ci` →
    `npm run build` → emits to `static/dist/`).
  - Stage 2: Python image, `uv sync --frozen`, copy app and built
    assets, run `manage.py collectstatic --noinput`.
- Production WSGI/ASGI server (gunicorn) listening on `:8000`.
- Whitenoise-style static asset serving (or document an alternative).
- Non-root container user, sensible `WORKDIR`, healthcheck endpoint.
- `.dockerignore` excluding `node_modules`, `media/`, `__pycache__`,
  `.venv`, `static/dist` source, etc.
- Document the build command and the chosen image tag scheme in the
  vergabe-teilnahme README.

Coordination: this task crosses into `vergabe-teilnahme`. Track via a
hub task and reference the PR/commit when complete.

**Done when:** `docker build` against the vergabe-teilnahme repo produces
a runnable image that responds to a smoke request locally.

**Resolution (2026-05-18):** issue-facade was renamed to issue-core
upstream. Rewired vergabe-teilnahme to depend on issue-core
(`@ file:///home/worsch/issue-core`); the three Python imports were
updated (`issue_tracker.*` → `issue_core.*`). All 20 aufgaben tests
pass after the rewire. See vergabe-teilnahme commit `17f511f`.

Dockerfile delivered in the vergabe-teilnahme repo:

- Three stages (assets / python-deps / runtime) with whitenoise
  static serving and `collectstatic` at build time.
- A BuildKit named context resolves the `../issue-core` path dep:
  `docker build --build-context issue-core=/home/worsch/issue-core .`
- Non-root `app` user, `/health/` HEALTHCHECK, gunicorn on :8000.
- Smoke test: container reports `(healthy)`, `/health/` → 200.

**Original blocker (now resolved):** vergabe-teilnahme couldn't
`uv sync` because `pyproject.toml` referenced
`universal-issue-tracker @ file:///home/worsch/issue-facade`, but
that directory was effectively empty (only `.claude/` remained).

```
error: Failed to generate package metadata for
`universal-issue-tracker==0.1.0 @ directory+../issue-facade`
  Caused by: /home/worsch/issue-facade does not appear to be a Python
  project, as neither `pyproject.toml` nor `setup.py` are present.
```

Related candidate sources investigated:

- `/home/worsch/issue-core/`: a separate package (`issue-core`), not
  the missing `universal-issue-tracker` facade.
- `/home/worsch/markitect-main/_issue-tracking/issue-facade/`: does
  not exist.

This must be resolved upstream in `vergabe-teilnahme` (or by restoring
`issue-facade`) before T02 can produce a buildable image. Options:

1. **Restore `issue-facade`**: recover the missing source (git
   reflog, backup, or recreate from `issue-core`'s public surface).
2. **Repoint** `vergabe-teilnahme`'s dep to `issue-core` directly if
   that's the intended replacement, then update `uv.lock`.
3. **Vendor** a minimal stub interface in `vergabe-teilnahme/vendor/`
   to unblock the container build; restore the real dep later.

Recommendation: route to whoever owns `issue-facade` (likely a
`markitect` or `personhood` domain task) and pause T02 until the dep
resolves cleanly outside Docker.

---

### T03 — Build and publish image to Gitea container registry

```task
id: RAILIANCE-WP-0002-T03
status: done
priority: high
state_hub_task_id: "d0f8db8c-fad9-4e0b-a404-9e3a04cffb05"
```

Push the first production image of vergabe-teilnahme through the
registry enabled in `RAILIANCE-WP-0001`.

Steps:

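```bash
# Tag scheme: short git SHA of the vergabe-teilnahme checkout.
TAG="$(git -C /home/worsch/vergabe-teilnahme rev-parse --short HEAD)"
IMAGE="gitea.coulomb.social/coulomb/vergabe-teilnahme:${TAG}"

docker login gitea.coulomb.social
# The named build context resolves the ../issue-core path dependency (see T02).
docker build --build-context issue-core=/home/worsch/issue-core \
  -t "${IMAGE}" /home/worsch/vergabe-teilnahme
docker push "${IMAGE}"
```
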
Choose a deterministic tag scheme (`<git-sha>` recommended). Record the
exact image reference and digest used for the first deployment.

**Done when:** the image is fetchable from a disposable Kubernetes pod
on the cluster.

**Done (2026-05-19):**

- Pushed `gitea.coulomb.social/coulomb/vergabe-teilnahme:483a4df`
  and `:latest` from the `vergabe-teilnahme:t02-smoke` build.
- Image digest: `sha256:e9bbceb35b0239c835d339295a0ae1d2d8b6d08c02a7b4e992c0ecd37de86d7a`.
- Token owner: `tegwick` (Bernd Worsch). Push namespace: `coulomb` org.
- Cluster-side pull verified via `kubectl run vt-pull-test`: the
  pod reached `Running` in ~7s with no `imagePullSecret`. **The package
  is public by default**, so T05 does not need to wire an imagePullSecret
  unless the package is later made private in the Gitea web UI.

---

### T04 — Provision PostgreSQL role and database

```task
id: RAILIANCE-WP-0002-T04
status: done
priority: high
state_hub_task_id: "925ace1c-f9bf-4644-bd0b-637705d72ea6"
```

Create a `vergabe` PostgreSQL role and `vergabe_db` database inside the
new shared `apps-pg` cnpg cluster being provisioned by `railiance-platform`
(per resolved decision D-01).

Blocked on: `apps-pg` cluster reaching `Cluster in healthy state` in
namespace `databases`. Tracked by `railiance-platform`
**`RAILIANCE-WP-0003`** (workstream
`665b3b9b-608a-4be4-84b6-dcb8261ff57b`), proposed 2026-05-19 in
response to the coordination thread.

Consumer recipe (from RAILIANCE-WP-0003 T06):

1. Label the `vergabe-teilnahme` namespace
   `railiance.io/postgres-client=apps-pg` so the platform's ingress
   NetworkPolicy permits the connection.
2. Create a credential Secret in that namespace for the `vergabe` role.
3. Create a cnpg `Database` CR pointing at cluster `apps-pg` with
   `ownerName: vergabe` and the credential Secret.
4. DSN: `postgresql://vergabe:...@apps-pg-rw.databases:5432/vergabe_db`,
   wired into the SOPS Helm values in T05.

Approach:

- Use declarative cnpg resources (a `Database` CR plus a managed role; CNPG
  1.28 has no standalone Role CR); never make an out-of-band `psql` change
  without recording it.
- The role owns only `vergabe_db`: no `CREATEDB`, no superuser, no grants
  on other databases.
- Capture the database DSN in the SOPS values file (T05).
- If the cluster needs to grow (more instances, more storage, backup
  inclusion), pause and add a follow-up `railiance-platform` task; do
  not edit cluster-level resources from this repo.

**Done when:** the new role can connect to `vergabe_db` from inside the
cluster (`kubectl run --rm -it psql ...`) and is recorded in the SOPS
values used by T05.

**Done (2026-05-19):**

Platform side (in `railiance-platform`, commit `017934d`):

- `helm/apps-pg-cluster.yaml` adds `spec.managed.roles[vergabe]`
  (CNPG 1.28 role lifecycle is cluster-scoped; there is no standalone Role CR).
- `helm/apps-pg-databases.yaml` (new) declares `Database/vergabe-db`
  with `name: vergabe_db`, `owner: vergabe`.
- Bootstrap credential `databases/vergabe-app-credentials`
  (`kubernetes.io/basic-auth`, `username: vergabe`, generated password).

Apps side (this workplan):

- Namespace `vergabe-teilnahme` created and labeled
  `railiance.io/postgres-client=apps-pg` (per the `docs/apps-pg.md`
  opt-in contract).
- Credential Secret mirrored to
  `vergabe-teilnahme/vergabe-app-credentials` so the application pod
  can mount it. T05 will reference this Secret via `envFrom` or
  individual `valueFrom.secretKeyRef`.

DSN for T05's SOPS Helm values:
|
||||
|
||||
```
|
||||
postgresql://vergabe:${PASSWORD}@apps-pg-rw.databases:5432/vergabe_db
|
||||
```
|
||||
|
||||
End-to-end verification: `kubectl exec` into a pod in the
`vergabe-teilnahme` namespace and run psql with the mirrored
credentials; it returns `vergabe | vergabe_db | PostgreSQL 16.13`.

---

### T05 — Author Helm release for vergabe-teilnahme

```task
id: RAILIANCE-WP-0002-T05
status: done
priority: high
state_hub_task_id: "29ba6add-6f23-4053-acb9-9d7efa0b3881"
```

Add the chart selection (or bespoke chart) and SOPS-encrypted values
that turn the published image into a Kubernetes Deployment.

Deliverables:

- Decide chart approach: Bitnami `common` template, a thin in-repo
  chart under `charts/vergabe-teilnahme/`, or raw manifests. Record the
  rationale in the workplan log.
- `helm/vergabe-teilnahme-values.sops.yaml` containing:
  - image repo + tag from T03,
  - env (`DJANGO_SETTINGS_MODULE=vergabe_teilnahme.settings.prod`,
    `ALLOWED_HOSTS`, `CSRF_TRUSTED_ORIGINS`),
  - `SECRET_KEY` (generated, never committed in clear),
  - DB DSN from T04,
  - resource requests/limits, single replica, readiness/liveness probes
    targeting the healthcheck endpoint introduced in T02.
- A dedicated namespace (`vergabe-teilnahme`).
- Optional: PVC for media uploads if Django `MEDIA_ROOT` is needed in
  v1; otherwise omit and document deferral.
- `Makefile` targets: `vergabe-deploy`, `vergabe-status`,
  `vergabe-migrate`.

**Done when:** `make vergabe-deploy` renders cleanly with `--dry-run`
and produces no plaintext secrets in the rendered manifest source.

**Done (2026-05-19):**

- Chart approach: a thin in-repo chart `charts/vergabe-teilnahme/` with
  plain (non-SOPS) values, because the only sensitive material
  (`SECRET_KEY`, `DATABASE_URL`) lives in K8s Secrets (cnpg's
  `vergabe-app-credentials` + the assembled `vergabe-teilnahme-env`),
  not in Helm values. `helm/vergabe-teilnahme-values.yaml` is therefore
  plain YAML: image tag, hostnames, no secrets.
- `make vergabe-dry-run` renders 2 objects (Deployment + Service);
  `grep -iE 'SECRET_KEY=|DATABASE_URL=|password'` returns empty.
- Deploy revision 2 is live: pod Running 1/1, probes green. The
  HTTP probe's `httpGet.httpHeaders[Host]` is set to the public hostname
  so Django's `ALLOWED_HOSTS` check passes for kube-probe. Getting this
  right took one extra revision: the first revision failed liveness with
  HTTP 400 because the probe sent `Host: 10.42.x.x:8000`.
- `Makefile` targets added: `vergabe-dry-run`, `vergabe-deploy`,
  `vergabe-ingress-deploy`, `vergabe-status`, `vergabe-migrate`,
  `vergabe-seed`, `vergabe-superuser`, `vergabe-logs`.

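The plaintext-secret check in the list above can also run as a small filter over rendered output. This is a minimal Python sketch, not part of the repo. It assumes the manifests arrive on stdin from something like `helm template`, reuses the exact alternation from the grep, and reports a non-zero status on any hit so that a Makefile target could fail on it.

```python
import re
import sys

# Same alternation as the grep above; re.IGNORECASE mirrors grep -i.
SECRET_PATTERN = re.compile(r"SECRET_KEY=|DATABASE_URL=|password", re.IGNORECASE)


def find_plaintext_secrets(rendered: str) -> list:
    """Return (line_number, line) pairs from rendered manifests that look like secrets."""
    return [
        (lineno, line)
        for lineno, line in enumerate(rendered.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]


def main() -> int:
    hits = find_plaintext_secrets(sys.stdin.read())
    for lineno, line in hits:
        print(f"{lineno}: {line}")
    # Non-zero status lets a Makefile target fail the dry run on any hit.
    return 1 if hits else 0
```

A script whose entry point does `raise SystemExit(main())` can sit at the end of a pipe such as `helm template vergabe-teilnahme charts/vergabe-teilnahme -f helm/vergabe-teilnahme-values.yaml | python3 scan_rendered.py` (the script name is hypothetical). Like the grep, the check is deliberately blunt. A `secretKeyRef` whose `key` is `password` would also be flagged, which errs on the safe side.
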
**Lesson recorded:** the base64-generated bootstrap password contains
`=`, `+`, and `/`. Embedded raw in `DATABASE_URL`, these characters confuse
`dj-database-url`: it parses `:password@host:5432/db`, and the reserved
characters threw off the parse, yielding an 80-character DB name. The
Secret now stores a URL-encoded password inside `DATABASE_URL`, while the
raw password remains in `vergabe-app-credentials.password`. Future apps
should either URL-encode at Secret-build time or use individual env vars.

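The failure mode is easy to reproduce with the standard library. The sketch below uses `urllib.parse.urlsplit` as a stand-in for a URL-style DSN parser; dj-database-url's own parsing may differ in detail. The password is a made-up value with the same character classes. A raw `/` ends the URL's authority section early, so the rest of the password and the host land in the path that becomes the database name. Percent-encoding with `quote(..., safe='')` keeps them apart.

```python
from urllib.parse import quote, unquote, urlsplit

HOST = "apps-pg-rw.databases:5432"


def naive_dsn(user: str, password: str, db: str) -> str:
    # Password pasted in verbatim: reserved characters are not escaped.
    return f"postgresql://{user}:{password}@{HOST}/{db}"


def encoded_dsn(user: str, password: str, db: str) -> str:
    # safe="" makes quote() escape "/" too, which it leaves alone by default.
    return f"postgresql://{user}:{quote(password, safe='')}@{HOST}/{db}"


def db_name(dsn: str) -> str:
    # URL-style DSN parsers take the database name from the path after the host.
    return urlsplit(dsn).path[1:]


if __name__ == "__main__":
    password = "ab+c/d3f="  # made-up value, not a real credential
    print(db_name(naive_dsn("vergabe", password, "vergabe_db")))    # → d3f=@apps-pg-rw.databases:5432/vergabe_db
    print(db_name(encoded_dsn("vergabe", password, "vergabe_db")))  # → vergabe_db
```

Decoding the password on the consumer side with `unquote` restores the original value, so only the string stored in `DATABASE_URL` changes.
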
---

### T06 — DNS, ingress, and TLS for vergabe-teilnahme.whywhynot.de

```task
id: RAILIANCE-WP-0002-T06
status: done
priority: high
state_hub_task_id: "8e673ee6-5338-4eb5-8973-a1818b4dc7f5"
```

Make the application reachable behind a valid Let's Encrypt certificate.

Steps:

- Add an A record `vergabe-teilnahme.whywhynot.de` →
  cluster public IP (per T01). Use the DNS change path captured in T01.
- Add `manifests/vergabe-teilnahme-ingress.yaml` modeled on
  `gitea-ingress.yaml`:
  - `ingressClassName: traefik`,
  - annotation `cert-manager.io/cluster-issuer: letsencrypt-prod`,
  - `tls.secretName: vergabe-teilnahme-tls`,
  - host `vergabe-teilnahme.whywhynot.de`, backend the Service from T05.
- Wait for cert-manager to issue the cert.
- Validate `https://vergabe-teilnahme.whywhynot.de/healthz` (or
  equivalent) returns 200 with a trusted cert chain.

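The Ingress described in the steps above might look like the following, built as a Python dict and emitted as JSON for `kubectl apply -f -`. This is a sketch under assumptions. The Service name `vergabe-teilnahme` and port `8000` are guesses at what the T05 chart exposes, and the single `/` prefix path routes everything to that Service. The real file is `manifests/vergabe-teilnahme-ingress.yaml`, modeled on `gitea-ingress.yaml`.

```python
import json


def ingress_manifest(host: str, service: str, service_port: int,
                     tls_secret: str, namespace: str,
                     issuer: str = "letsencrypt-prod") -> dict:
    """networking.k8s.io/v1 Ingress carrying the fields listed in the steps above."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": service,
            "namespace": namespace,
            # cert-manager watches this annotation and requests a certificate
            # for the hosts under spec.tls, stored in tls_secret.
            "annotations": {"cert-manager.io/cluster-issuer": issuer},
        },
        "spec": {
            "ingressClassName": "traefik",
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service,
                                            "port": {"number": service_port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    manifest = ingress_manifest(
        host="vergabe-teilnahme.whywhynot.de",
        service="vergabe-teilnahme",  # assumed Service name from T05
        service_port=8000,            # assumed Service port
        tls_secret="vergabe-teilnahme-tls",
        namespace="vergabe-teilnahme",
    )
    print(json.dumps(manifest, indent=2))
```

The Ingress itself only references `vergabe-teilnahme-tls`. Issuing the certificate is left to cert-manager's HTTP-01 flow against the existing Traefik controller, which fits the boundary between this repo and `railiance-cluster`.
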
Boundary note: ingress controller and cluster networking changes
belong in `railiance-cluster`. This task only adds an `Ingress`
resource that consumes the existing controller.

**Done when:** the public hostname serves the app over HTTPS and the
certificate chain validates from outside the cluster.

**Progress (2026-05-18):**

- ✅ DNS A record live: `vergabe-teilnahme.whywhynot.de → 92.205.130.254`
  (TTL 3600; served authoritatively by `ns1126.ui-dns.*`).
- ✅ Traefik routing reaches the cluster: the HTTP probe returns 404, the
  expected pre-state because no Ingress rule matches the host yet.
- ✅ `manifests/vergabe-teilnahme-ingress.yaml` committed (Traefik +
  cert-manager `letsencrypt-prod`).
- ✅ `vergabe-teilnahme-tls` issued by cert-manager in ~35s (HTTP-01).
- ✅ External HTTPS probes: `/health/` returns 200 `{"status":"ok"}`;
  `/` redirects (302) to `/ausschreibungen/dashboard/`, which renders
  `<title>Übersicht</title>` (German UI); `/admin/login/` shows the
  German Django admin login page. `curl` reports
  `SSL verify_result: 0` (trusted chain).

---

### T07 — Initial migration, seed, and smoke test

```task
id: RAILIANCE-WP-0002-T07
status: done
priority: high
state_hub_task_id: "be1decb5-b734-4312-b98d-20ed5299d02c"
```

Bring the app to a usable state in production.

Steps:

- Run `manage.py migrate` as a Kubernetes `Job` or one-shot
  `kubectl exec` against the running Deployment (record which).
- Run `manage.py seed` (the `make seed` target), vergabe-teilnahme's
  idempotent seed.
- Create the first superuser via `manage.py createsuperuser`.
- Smoke checklist:
  - Login at `/admin/` succeeds.
  - The dashboard at `/` renders without errors.
  - Static assets (Tailwind build output) are served with correct
    content-type and 200 status.
  - HTMX partial requests succeed on at least one page.
  - A new `Ausschreibung` can be created and saved.

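The unauthenticated part of the smoke checklist lends itself to a scripted check. The sketch below is hypothetical, not a repo tool. The paths are the ones this deployment exposes (`/health/`, `/`, `/admin/login/`, `/static/dist/main.css`). Expecting `text/css` for the Tailwind bundle is an assumption about how static files are served. The fetch function is injectable, so the logic can run without network access.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (status, content-type) for a URL; injectable so the logic can be tested offline.
Fetch = Callable[[str], Tuple[int, str]]


@dataclass
class Check:
    path: str
    status: int = 200
    content_type: Optional[str] = None  # prefix match, e.g. "text/css"


def urllib_fetch(url: str) -> Tuple[int, str]:
    """GET a URL. urllib follows redirects, so "/" is judged by the page it lands on.
    Connection and TLS errors (URLError) propagate and fail the run loudly."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status code
        return err.code, err.headers.get("Content-Type", "")


# Unauthenticated items only; login, HTMX partials and form saves need a session.
NO_AUTH_CHECKS = [
    Check("/health/"),
    Check("/", content_type="text/html"),
    Check("/admin/login/", content_type="text/html"),
    Check("/static/dist/main.css", content_type="text/css"),
]


def run_smoke(base_url: str, checks: List[Check], fetch: Fetch = urllib_fetch) -> List[str]:
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for check in checks:
        status, ctype = fetch(base_url.rstrip("/") + check.path)
        if status != check.status:
            failures.append(f"{check.path}: expected {check.status}, got {status}")
        elif check.content_type and not ctype.startswith(check.content_type):
            failures.append(f"{check.path}: expected {check.content_type}, got {ctype!r}")
    return failures


def main() -> int:
    problems = run_smoke("https://vergabe-teilnahme.whywhynot.de", NO_AUTH_CHECKS)
    print("\n".join(problems) or "smoke OK")
    return 1 if problems else 0
```

Login, HTMX partials, and creating an `Ausschreibung` stay manual in this sketch because they need an authenticated session.
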
**Done when:** the smoke checklist passes and `kubectl logs` shows no
unexpected errors.

**Done (2026-05-19, with deliberate deferrals):**

- ✅ `manage.py migrate` ran via `make vergabe-migrate` against the
  live deployment. All Django apps migrated (`accounts`, `core`,
  `ausschreibungen`, `lose`, `aufgaben`, `dokumente`, `preise`,
  `partner`, `bibliothek`, `marktbegleiter`, `nachbetrachtung`,
  `feedback`, plus framework apps).
- ❌ `make seed` (= `seed_dev`) deliberately **skipped**: it creates a
  hardcoded dev user `max.muster / testpass123`. Not prod-safe.
- ❌ `createsuperuser` deferred to the operator (interactive
  credential should not be minted through this session). Recipe in
  `docs/vergabe-teilnahme.md`.
- ✅ Smoke (no-auth surface):
  - `/health/` → `200 {"status":"ok"}`
  - `/` → `302 → /ausschreibungen/dashboard/` → `200`, page title
    `Übersicht`.
  - `/admin/login/` → `200`, title
    `Anmelden | Django-Systemverwaltung` (German Django admin).
  - Static assets: `/static/dist/main.css` 200 (Tailwind),
    `/static/admin/css/base.css` 200 (Django admin),
    `/static/vendor/{alpinejs,htmx}/...` referenced from the
    rendered HTML.
- ❌ Auth-required smoke (login, create Ausschreibung) deferred to the
  operator after `createsuperuser`.
- ✅ `kubectl logs` clean: only gunicorn boot + kube-probe 200s.

---

### T08 — Document handoff, runbook, and backup posture

```task
id: RAILIANCE-WP-0002-T08
status: done
priority: medium
state_hub_task_id: "594d3591-b61f-40c4-850c-efaa02c859ed"
```

Capture everything an on-call operator needs.

Deliverables in `docs/vergabe-teilnahme.md`:

- Registry image naming and tag scheme.
- Namespace, Deployment, Service, Ingress names.
- DB DSN handling (where secrets live, how to rotate).
- Restart, rollback (`helm rollback`), and migration commands.
- Backup posture: confirm whether the shared cnpg cluster's backup job
  includes `vergabe_db`; if not, open a `railiance-platform` follow-up.
- Pointer to the vergabe-teilnahme repo for app-level changes vs.
  `railiance-apps` for Helm/ingress changes.

**Done when:** a new operator can find vergabe-teilnahme, deploy a new
image tag, and recover from a pod crash without reading this workplan.

**Done (2026-05-19):** `docs/vergabe-teilnahme.md` covers identity,
secrets + rotation recipes (DB password and `SECRET_KEY`), day-to-day
make targets, image promotion + rollback, troubleshooting
(kube-probe Host header, DSN URL-encoding, cert-manager failure
modes), open backup posture, and cross-references to the improvements
backlog (`RAILIANCE-WP-0004`), the shared DB cluster doc, and the
container registry doc.

## Completion Criteria

This workplan is complete when:

1. The vergabe-teilnahme image is published to
   `gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>`.
2. A dedicated PostgreSQL role and database serve the app from the
   shared cnpg cluster.
3. `helm/vergabe-teilnahme-values.sops.yaml` and the ingress manifest
   are committed; `make vergabe-deploy` is the single command to deploy.
4. `https://vergabe-teilnahme.whywhynot.de` serves the app over HTTPS
   with a valid Let's Encrypt cert.
5. Migrations + seed have run; the smoke checklist passes.
6. Runbook is in `docs/vergabe-teilnahme.md`.

## Notes

- This is the second application on `railiance01` (after Gitea). It
  intentionally adopts the same SOPS + Helm + Traefik + cert-manager
  pattern so the operator workflow stays consistent.
- v1 deliberately defers: 3-stage canary (Staged Promotion Lifecycle is
  still 0/7), SSO/Keycloak integration, S3-backed media storage,
  Celery + Redis workers (optional in the architecture blueprint), and
  multi-replica HA. Each can become its own follow-up workplan once the
  baseline runs.
- The `whywhynot.de` domain enters the Railiance stack for the first
  time here. Treat the DNS path established in T01/T06 as the reference
  for any future `*.whywhynot.de` workloads.

@@ -0,0 +1,259 @@

---
id: RAILIANCE-WP-0004
type: workplan
title: "App deployment improvements (lessons from RAILIANCE-WP-0002)"
domain: financials
repo: railiance-apps
status: finished
owner: railiance
topic_slug: railiance
planning_priority: medium
created: "2026-05-19"
updated: "2026-06-05"
state_hub_workstream_id: "b61a9aca-4e43-4b3d-a48b-999e0fa842cf"
---

# App deployment improvements

This workplan collects concrete follow-ups surfaced while shipping
`vergabe-teilnahme` under `RAILIANCE-WP-0002`. Each item is small,
independent, and can be picked up in isolation when the next S5 app
lands or when the next operator onboards. The workplan was activated on
2026-05-22. The local `railiance-apps` guardrails are implemented, and the
package publication item was completed through the forge-owned Gitea
package registry.

## I01 — URL-encode DB passwords at Secret-build time

```task
id: RAILIANCE-WP-0004-I01
status: done
priority: medium
state_hub_task_id: "a05a855a-00a0-4e0e-ba82-27e0a072f777"
```

**Problem.** cnpg-generated bootstrap passwords come from
`openssl rand -base64 N` and contain `=`, `+`, `/`. Embedded raw in
`DATABASE_URL`, those characters confuse `dj-database-url` (it parsed
`vergabe:<pw>@apps-pg-rw:5432/vergabe_db` as having an 80-character
database name). Diagnosing this cost one Helm revision and one pod restart.

**Fix.** Add a tiny helper (shell script or Makefile target) that
takes the raw role password from the cnpg secret and emits the
DSN-ready URL-encoded form into the consumer-namespace env Secret.
Alternative: switch to individual env vars (`POSTGRES_HOST`,
`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`) so no URL
parsing is needed at all.

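The repo's actual helper is the shell script `tools/build-database-url-secret.sh`. The Python below is a hypothetical sketch of the transformation described in the fix, not a copy of that script. It assumes the mirrored Secret carries `username` and `password` keys, as the `kubernetes.io/basic-auth` credential does. It emits a JSON merge patch so that other keys in `vergabe-teilnahme-env`, such as `SECRET_KEY`, stay untouched.

```python
import base64
import json
import sys
from urllib.parse import quote


def database_url_from_secret(secret: dict, host: str, port: int, dbname: str) -> str:
    """Build a DSN from a basic-auth Secret as returned by `kubectl get secret -o json`."""
    data = secret["data"]  # Secret values arrive base64-encoded
    user = base64.b64decode(data["username"]).decode()
    password = base64.b64decode(data["password"]).decode()
    # safe="" percent-encodes "/", "+", "=" and every other reserved character.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{dbname}"


def merge_patch(database_url: str) -> dict:
    """JSON merge patch that sets only DATABASE_URL, leaving other Secret keys intact."""
    return {"data": {"DATABASE_URL": base64.b64encode(database_url.encode()).decode()}}


def main() -> int:
    source = json.load(sys.stdin)  # e.g. output of `kubectl get secret ... -o json`
    url = database_url_from_secret(source, "apps-pg-rw.databases", 5432, "vergabe_db")
    json.dump(merge_patch(url), sys.stdout)
    return 0
```

Used as a script, it would read `kubectl -n vergabe-teilnahme get secret vergabe-app-credentials -o json` on stdin. Its output would go to `kubectl -n vergabe-teilnahme patch secret vergabe-teilnahme-env --type merge -p "..."`, so the encoded password never touches disk.
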
**Where it lives:** new `tools/` script + Makefile target, or chart
helper template.

**Implemented 2026-05-22.** Added `tools/build-database-url-secret.sh`
and `make vergabe-db-url-secret`; updated the app runbook to use the
helper during DB password rotation.

---

## I02 — Document the Django + kube-probe Host-header pattern

```task
id: RAILIANCE-WP-0004-I02
status: done
priority: low
state_hub_task_id: "22a212e6-31b1-490a-8d1c-0a33ddc62501"
```

**Problem.** The kube-probe sends `Host: <pod-ip>:8000`. With
production Django settings (`DEBUG=False`, narrow `ALLOWED_HOSTS`),
that fails the Host validation and returns `HTTP 400 Bad Request`,
which the kubelet treats as Unhealthy. The first deploy revision
restarted on liveness failure for ~5 minutes before diagnosis.

**Fix.** The `charts/vergabe-teilnahme` chart already sets
`httpGet.httpHeaders[Host]` from `probes.hostHeader`. Promote this
pattern into a documented "Django-on-Railiance" recipe (short doc in
`docs/`) so the next Django app starts there rather than rediscovering
the gotcha. A "common chart values" sketch is also worthwhile if a second
Django app justifies the abstraction.

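The pattern comes down to two pieces, sketched here in Python as a hypothetical illustration rather than the chart's template. The first is the probe stanza that sets the `Host` header. The second is a simplified version of Django's `ALLOWED_HOSTS` matching that shows why the default header fails. Port `8000` and the `/health/` path come from this deployment. The simplified matcher ignores IPv6 literals and other edge cases that Django handles.

```python
def http_probe(path: str, port: int, host_header: str, period_seconds: int = 10) -> dict:
    """Kubernetes httpGet probe that presents the public hostname to Django."""
    return {
        "httpGet": {
            "path": path,
            "port": port,
            # Without this header the kubelet sends Host: <pod-ip>:<port>.
            "httpHeaders": [{"name": "Host", "value": host_header}],
        },
        "periodSeconds": period_seconds,
    }


def host_allowed(host: str, allowed_hosts: list) -> bool:
    """Simplified ALLOWED_HOSTS matching: "*", exact match, or leading-dot subdomain."""
    domain = host.lower().rsplit(":", 1)[0]  # drop an optional ":port"
    for pattern in (p.lower() for p in allowed_hosts):
        if pattern == "*" or domain == pattern:
            return True
        # A leading dot matches the domain itself and any subdomain of it.
        if pattern.startswith(".") and (domain == pattern[1:] or domain.endswith(pattern)):
            return True
    return False


if __name__ == "__main__":
    allowed = ["vergabe-teilnahme.whywhynot.de"]
    print(host_allowed("10.42.0.17:8000", allowed))  # → False (default probe Host)
    probe = http_probe("/health/", 8000, "vergabe-teilnahme.whywhynot.de")
    print(host_allowed(probe["httpGet"]["httpHeaders"][0]["value"], allowed))  # → True
```

Adding the pod IP range to `ALLOWED_HOSTS` would be the other obvious fix, but setting the probe header keeps production host validation narrow.
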
**Implemented 2026-05-22.** Added `docs/django-on-railiance.md` and
cross-linked it from the `vergabe-teilnahme` runbook.

---

## I03 — Publish `issue-core` to a Gitea Python package registry

```task
id: RAILIANCE-WP-0004-I03
status: done
priority: medium
state_hub_task_id: "f412b874-0670-4a4a-89fc-575fe4994646"
```

**Problem.** `vergabe-teilnahme/pyproject.toml` has a path dependency
on `../issue-core`. Building the container image therefore requires
the `--build-context issue-core=/home/worsch/issue-core` BuildKit
flag, which is specific to one operator's machine and breaks CI builds,
remote builds, and builds on other workstations.

**Fix.** Enable the Gitea Python package registry (analogous to the
container registry from `RAIL-AP-WP-0001`), publish `issue-core` as a
proper versioned wheel, and switch the dependency to
`issue-core>=0.2,<0.3` with a normal index URL. The Dockerfile then
drops the `--build-context` and the build becomes portable.

**Coordination:** depends on the forge-owned Gitea PyPI endpoint and package
token posture in `railiance-forge`, plus a release pipeline for `issue-core`
in its source repo.

**Local progress 2026-05-22.** `helm/gitea-registry-values.yaml` set
`packages.LIMIT_SIZE_PYPI: -1` while Gitea was still operated from this repo.
That registry operating surface has since moved to `railiance-forge`; current
PyPI endpoint docs live at
`/home/worsch/railiance-forge/docs/gitea-package-registry.md`. The remaining
release and dependency change must happen in the `issue-core` and
`vergabe-teilnahme` repos.

**Cross-repo progress 2026-05-23.** `issue-core` now has a validated
`make package-check` build and Gitea Actions publish workflow for the
`0.2.x` package series. `vergabe-teilnahme` has been switched in
`pyproject.toml` to `issue-core>=0.2,<0.3`, with the Docker named
`issue-core` build context removed in favor of the Gitea PyPI index.
The final unblock still requires a Gitea package username/token to
publish `issue-core==0.2.0`; once published, regenerate
`vergabe-teilnahme/uv.lock` from the registry and mark this task done.

**Completed 2026-06-05.** Published `issue-core==0.2.0` to the Coulomb Gitea
PyPI registry using an operator token read from `/tmp/gat.tmp` without
recording the secret value. `railiance-forge` exposed the approved
`/api/packages` ingress path, the public package-specific simple index returned
`200`, a clean temporary environment installed `issue-core==0.2.0` from Gitea,
and `vergabe-teilnahme/uv.lock` was regenerated so it uses the Gitea registry
instead of `../issue-core`.

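For reference, Gitea documents its PyPI registry per owner under `/api/packages/{owner}/pypi`, with a PEP 503 simple index below it. The sketch derives the URLs involved. The owner `coulomb` and host `gitea.coulomb.social` are assumptions carried over from the container image path. The authoritative endpoint docs are the `railiance-forge` ones referenced in this item. Credentials are deliberately kept out of the URLs.

```python
import re


def normalize(project: str) -> str:
    """PEP 503 project-name normalization used in simple-index URLs."""
    return re.sub(r"[-_.]+", "-", project).lower()


def gitea_pypi_urls(base_url: str, owner: str, project: str) -> dict:
    """Endpoints of a Gitea PyPI package registry for one owner and project."""
    root = f"{base_url.rstrip('/')}/api/packages/{owner}/pypi"
    return {
        "upload": root,                                     # twine --repository-url
        "index": f"{root}/simple/",                         # pip --index-url
        "project": f"{root}/simple/{normalize(project)}/",  # package-specific page
    }


if __name__ == "__main__":
    urls = gitea_pypi_urls("https://gitea.coulomb.social", "coulomb", "issue-core")
    for name, url in urls.items():
        print(f"{name}: {url}")
```

An install against that index would then take the form `pip install --index-url <index> 'issue-core>=0.2,<0.3'`. Any token is supplied out of band rather than embedded in the URL.
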
---

## I04 — Operator onboarding: install the `kubectl cnpg` plugin

```task
id: RAILIANCE-WP-0004-I04
status: done
priority: low
state_hub_task_id: "2f44cad1-b70c-4406-91a9-0c0fa9c75583"
```

**Problem.** `make vergabe-status`, `apps-pg-status`, and `db-shell` try
`kubectl cnpg ...` first and fall back to bare `kubectl` when the
plugin is missing. The fallback works, but the cnpg plugin gives much
better cluster diagnostics (`status` table, primary/replica health,
backup state).

**Fix.** Add the plugin install command to operator onboarding (one
line: `kubectl krew install cnpg` or a direct binary download). Add
a `make check-tools` target that warns when `kubectl cnpg` or `helm`
is missing.

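The tool check relies on how kubectl discovers plugins: any executable named `kubectl-<name>` on `PATH` becomes `kubectl <name>`, so looking up `kubectl-cnpg` is enough. The sketch below is a hypothetical Python variant, not the repo's Makefile target, and it is slightly stricter than the warning-only target described above. It fails on a missing `kubectl` or `helm` but only warns about the plugin, since the make targets fall back without it.

```python
import shutil
import sys

REQUIRED = ("kubectl", "helm")
# `kubectl cnpg` works exactly when an executable named kubectl-cnpg is on PATH
# (krew installs plugins into ~/.krew/bin, which must itself be on PATH).
RECOMMENDED = ("kubectl-cnpg",)


def check_tools(which=shutil.which):
    """Return (missing required tools, missing recommended tools)."""
    missing_required = [tool for tool in REQUIRED if which(tool) is None]
    missing_recommended = [tool for tool in RECOMMENDED if which(tool) is None]
    return missing_required, missing_recommended


def main() -> int:
    required, recommended = check_tools()
    for tool in recommended:
        print(f"warning: {tool} not on PATH; try `kubectl krew install cnpg`", file=sys.stderr)
    for tool in required:
        print(f"error: {tool} not on PATH", file=sys.stderr)
    return 1 if required else 0
```

Passing `which` as a parameter keeps the lookup replaceable, so the logic can be exercised without touching the real `PATH`.
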
**Implemented 2026-05-22.** Added `make check-tools`,
`docs/operator-setup.md`, and cnpg fallback status output for Gitea and
the shared `apps-pg` cluster.

---

## I05 — Operator onboarding: SOPS / age key bootstrap

```task
id: RAILIANCE-WP-0004-I05
status: done
priority: low
state_hub_task_id: "741d8a73-8cb0-40ac-a218-f1d3a74ebef3"
```

**Problem.** Several Makefile targets read `helm/*.sops.yaml` via
`sops -d`. A new operator with no `~/.config/sops/age/keys.txt`
sees a confusing decryption failure rather than a clear "you need
the age key" message. The session that produced this workplan had to
skip the SOPS template step for `apps-pg-secret.sops.yaml.template`.

**Fix.** Add `docs/operator-setup.md` with the age key handoff
procedure (where to put the key, how to verify, how to rotate). A
`make check-sops` target that asserts the key file exists and that sops
can decrypt a known sentinel would catch this at the first deploy attempt
rather than at the failing apply.

**Implemented 2026-05-22.** Added `docs/operator-setup.md`,
`tools/check-sops.sh`, and `make check-sops`. After the forge extraction,
`make check-sops` requires an explicit `SOPS_SENTINEL=<encrypted-file>` so this
repo does not depend on forge-owned Gitea SOPS files.

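`tools/check-sops.sh` is the repo's implementation. The Python below is a hypothetical sketch of the same two checks. It assumes sops's Linux key lookup (`SOPS_AGE_KEY_FILE`, else `$XDG_CONFIG_HOME/sops/age/keys.txt`, else `~/.config/sops/age/keys.txt`) and the `SOPS_SENTINEL` convention described above. It ignores other ways sops can receive keys, and it discards the decrypted output so nothing sensitive is printed.

```python
import os
import subprocess
import sys
from pathlib import Path


def age_key_file(env=None) -> Path:
    """Where sops looks for age keys on Linux, honouring SOPS_AGE_KEY_FILE first."""
    env = os.environ if env is None else env
    if env.get("SOPS_AGE_KEY_FILE"):
        return Path(env["SOPS_AGE_KEY_FILE"])
    config_home = env.get("XDG_CONFIG_HOME") or os.path.join(env.get("HOME", ""), ".config")
    return Path(config_home) / "sops" / "age" / "keys.txt"


def check_sops(sentinel: str, env=None, run=subprocess.run):
    """Return None if the sentinel decrypts, else a message naming the problem."""
    key_file = age_key_file(env)
    if not key_file.is_file():
        return f"age key not found at {key_file}; see docs/operator-setup.md"
    # capture_output keeps the decrypted sentinel off the terminal.
    result = run(["sops", "--decrypt", sentinel], capture_output=True, text=True)
    if result.returncode != 0:
        return f"sops could not decrypt {sentinel}: {result.stderr.strip()}"
    return None


def main() -> int:
    sentinel = os.environ.get("SOPS_SENTINEL")
    if not sentinel:
        print("set SOPS_SENTINEL=<encrypted-file>", file=sys.stderr)
        return 2
    problem = check_sops(sentinel)
    if problem:
        print(problem, file=sys.stderr)
        return 1
    print("sops OK")
    return 0
```

Separating "no key file" from "key present but cannot decrypt" is what turns the confusing sops error into an actionable message.
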
---

## I06 — CI guard against stale committed manifests vs live CRD drift

```task
id: RAILIANCE-WP-0004-I06
status: done
priority: medium
state_hub_task_id: "a319c20b-993c-46b7-889a-f0ac738056c4"
```

**Problem.** `helm/gitea-db-cluster.yaml` (in `railiance-platform`)
had `spec.postgresql.version: "16"`, a field that has never
existed in the CNPG v1 schema. The committed manifest had silently
diverged from the live cluster for months and would have been rejected
on the next `make db-deploy`. The drift was caught only when a new file
that copied the same stale shape failed to apply.

**Fix.** Add a per-PR CI job that runs
`kubectl apply --dry-run=server -f <changed-yaml>` against a
representative cluster (or a kind cluster seeded with the same CRDs).
The cnpg / cert-manager / Traefik CRDs change between operator
releases; strict server-side decoding catches drift that
`yamllint` and Helm template rendering miss.

**Note.** Primarily a `railiance-platform` and `railiance-cluster`
concern, but mirrored here because every S5 manifest in
`charts/` and `manifests/` carries the same risk.

**Implemented 2026-05-22.** Added `tools/k8s-server-dry-run.sh`,
`make k8s-server-dry-run`, and a `.gitea/workflows/` PR workflow that
runs the guard when charts, Helm values, manifests, or the dry-run tool
change.

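The guard's main decision is which changed files map to which server-side check. Raw manifests can be dry-run directly. Chart templates and Helm values only mean something once rendered, so they go through `helm template` piped into `kubectl apply --dry-run=server -f -`. The sketch below is a hypothetical Python illustration of that mapping, not the contents of `tools/k8s-server-dry-run.sh`. The release, chart, values, and namespace constants are assumptions covering only the `vergabe-teilnahme` chart, and running it needs credentials for a cluster that has the relevant CRDs installed.

```python
import subprocess
import sys
from pathlib import PurePosixPath

# Wiring for the one chart in this repo (assumed); other charts would need entries too.
RELEASE = "vergabe-teilnahme"
CHART = "charts/vergabe-teilnahme"
VALUES = "helm/vergabe-teilnahme-values.yaml"
NAMESPACE = "vergabe-teilnahme"


def dry_run_pipelines(changed_files):
    """Map changed paths to pipelines: [apply] for raw manifests, [render, apply] for charts."""
    apply_cmd = ["kubectl", "apply", "--dry-run=server", "-f"]
    pipelines, render_chart = [], False
    for name in changed_files:
        if name.startswith("manifests/") and PurePosixPath(name).suffix in {".yaml", ".yml"}:
            pipelines.append([apply_cmd + [name]])
        elif name.startswith(("charts/", "helm/")):
            # Templates (.yaml, .tpl) and values must be rendered before the API server can judge them.
            render_chart = True
    if render_chart:
        render = ["helm", "template", RELEASE, CHART, "-f", VALUES, "--namespace", NAMESPACE]
        pipelines.append([render, apply_cmd + ["-"]])
    return pipelines


def run_pipeline(pipeline) -> int:
    """Run one pipeline; for render + apply, feed helm's output to kubectl on stdin."""
    if len(pipeline) == 1:
        return subprocess.run(pipeline[0]).returncode
    rendered = subprocess.run(pipeline[0], capture_output=True, check=True).stdout
    return subprocess.run(pipeline[1], input=rendered).returncode


def main() -> int:
    # Changed paths on stdin, e.g. from `git diff --name-only origin/main...HEAD`.
    changed = [line.strip() for line in sys.stdin if line.strip()]
    failed = [p for p in dry_run_pipelines(changed) if run_pipeline(p) != 0]
    return 1 if failed else 0
```

Rendering once per run, rather than once per changed chart file, keeps the check fast when a PR touches several templates.
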
---

## I07 — `kubectl run --rm -i` smoke pattern is unreliable

```task
id: RAILIANCE-WP-0004-I07
status: done
priority: low
state_hub_task_id: "e3f59b3d-95c8-4cf9-9943-b1597954fd77"
```

**Problem.** Testing service-IP connectivity with `kubectl run --rm -i …`
repeatedly produced false negatives: the smoke pod exited before the
connection completed, printing "Connection refused" even though the
destination service was fully healthy. This wasted significant debugging
time during apps-pg verification before we switched to a persistent pod +
`kubectl exec`.

**Fix.** Add a `docs/operator-recipes.md` note (or inline in the
runbook) recommending the persistent-pod-plus-exec pattern for any
service-IP smoke check. Optional: ship `tools/smoke.sh` that
wraps the pattern.

**Implemented 2026-05-22.** Added `docs/operator-recipes.md` and
`tools/smoke-service.sh`.

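The persistent-pod-plus-exec pattern reduces to four kubectl calls. First, start a pod that only sleeps. Next, wait until it is Ready. Then run the probe through `kubectl exec`. Finally, always delete the pod. The sketch below is a hypothetical Python equivalent, not the contents of `tools/smoke-service.sh`. The pod name `smoke` and the `postgres:16` image (chosen because it ships `pg_isready`) are assumptions.

```python
import subprocess


def smoke_commands(namespace, probe, pod="smoke", image="busybox:1.36"):
    """kubectl argv lists: start a long-lived pod, wait for it, exec the probe, clean up."""
    kubectl = ["kubectl", "-n", namespace]
    return [
        # --command makes "sleep 600" the container command, keeping the pod alive.
        kubectl + ["run", pod, f"--image={image}", "--restart=Never", "--command", "--", "sleep", "600"],
        kubectl + ["wait", "--for=condition=Ready", f"pod/{pod}", "--timeout=60s"],
        kubectl + ["exec", pod, "--", *probe],
        kubectl + ["delete", "pod", pod, "--wait=false"],
    ]


def run_pod_smoke(commands) -> int:
    """Run the setup steps, return the probe's exit code, and always delete the pod."""
    *setup, probe, cleanup = commands
    try:
        for cmd in setup:
            subprocess.run(cmd, check=True)
        return subprocess.run(probe).returncode
    finally:
        subprocess.run(cleanup)


def main() -> int:
    cmds = smoke_commands(
        "vergabe-teilnahme",
        ["pg_isready", "-h", "apps-pg-rw.databases", "-p", "5432"],
        image="postgres:16",  # assumed image that ships pg_isready
    )
    return run_pod_smoke(cmds)
```

The pod runs in the `vergabe-teilnahme` namespace on purpose. That namespace carries the `railiance.io/postgres-client=apps-pg` label, so the platform's NetworkPolicy treats the probe the same way it treats the app.
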
---

## Notes

- Items were activated on 2026-05-22 and completed on 2026-06-05. I03 closed
  after `issue-core==0.2.0` was published to the Gitea PyPI registry, the
  package API route was exposed by `railiance-forge`, and the
  `vergabe-teilnahme` source lock moved off the sibling checkout.
- I06 is genuinely cross-repo; the others are local to
  `railiance-apps` or its operator workflow.
- The first three items (I01, I02, I03) are the highest-leverage items
  for onboarding the second S5 app.