generated from coulomb/repo-seed
docs(workplan): update NK-WP-0001 with resolved decisions D1/D2/D3
- Add Decisions table summarising D1 (KeePassXC→Vault), D2 (Keycloak-internal hybrid + file-based bootstrap), D3 (plain Helm, AI-first philosophy)
- Split T01 into Phase 0a (pre-cluster KeePassXC) and Phase 0b (in-cluster Vault transition) per D1
- Update T05 to explicitly reference D3 (plain Helm first)
- Update T06 to state the D2 identity decision rather than re-opening it
- Update T07: remove "decide" language, implement decided approach, add D2 bootstrap user management scope note
- Update T08: add Vault unseal key backup to the backup list
- Replace Open Questions with remaining unresolved items (5 items)
- Add DECISIONS.md (decision log auto-generated by State Hub)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
37
DECISIONS.md
Normal file
@@ -0,0 +1,37 @@
# Decision Log

_Auto-generated by the Custodian State Hub._

## D1 — Vault backend: KeePassXC vs HashiCorp Vault in-cluster

**Date:** 2026-03-01
**Decided by:** Tegwick

We will follow the suggested path with the following intent: the build-up of the security infrastructure needs to be secure by design from the start. We thus need to bootstrap secure infrastructure before a Kubernetes cluster is even available. This should also enable the setup and operation of dev, test and sandbox systems that are separate from production. We will base this on KeePassXC. Secure access to KeePassXC is the responsibility of the user setting up the pre-cluster system, and comes with the recommendation to keep the initial master password in a personal password manager. Once a cluster is available, credential management will transition to HashiCorp Vault in-cluster.

---

## D2 — Identity source of truth: Keycloak-internal vs LDAP/AD/Entra

**Date:** 2026-03-01
**Decided by:** Tegwick

We will go for a hybrid approach that builds on Keycloak-internal users and extends to LDAP/Entra federation where appropriate. Here is some background information that went into the decision:
1) There is no existing user base we need to respect at this point in time.
2) There will be a need to integrate LDAP/Entra users, but this is enterprise customer functionality and it will be some time before we offer enterprise plans. This type of integration might well be one of the main drivers for enterprises to purchase an enterprise account when using the hosted turnkey solution.
3) As an extension we also need to consider the bootstrapping use case: before an HA Keycloak infrastructure for user management is available, and for systems that will not connect to it (dev, test, sandbox, ...), some minimal user management without Keycloak should be available for ease of use and lightweight spin-up of instances. In this case let's come up with a very simple single-user-plus-test-users infrastructure where users are represented as files in a secure subdirectory of the user's home directory, with presets taken from the user's Linux user settings plus an email address, and automatically generate 2 (optionally more) test users from this base user by adding "N" and "+testN" suffixes to the username and email address. For me, for example, the default would be to have username=tegwick, fullname="Bernd Worsch", email="bernd.worsch@gmail.com" and generate the test users "tegwick1, Bernd Worsch+test1, bernd.worsch+test1@gmail.com" and "tegwick2, Bernd Worsch+test2, bernd.worsch+test2@gmail.com". I think this plays nicely with the Gmail convention of forwarding "+xxx" emails to the main mailbox.
4) Generated test users should not spill over into other systems.

For the users generated from the local user it might be helpful to have a mapping to a specific user in the production environment, and to allow for mechanisms that transfer entities from isolated systems to production based on this mapping, so that something owned by the user in a sandboxed infrastructure can be transferred to production with the owning user mapped correctly.
As user management is a complicated topic at the foundation of the infrastructure, this decision should be challenged and, if necessary, followed up with a clarification or further decisions.
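
The test-user convention from point 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BootstrapUser` type, the `derive_test_users` function and the `alice@example.com` sample data are hypothetical names introduced here, not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootstrapUser:
    """Hypothetical record for a file-based bootstrap user."""
    username: str
    fullname: str
    email: str


def derive_test_users(base: BootstrapUser, count: int = 2) -> list[BootstrapUser]:
    """Derive `count` test users by appending N / +testN suffixes to the base user."""
    # Split at the last "@" so the suffix lands on the local part of the address.
    local, _, domain = base.email.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not a usable email address: {base.email!r}")
    return [
        BootstrapUser(
            username=f"{base.username}{n}",
            fullname=f"{base.fullname}+test{n}",
            email=f"{local}+test{n}@{domain}",
        )
        for n in range(1, count + 1)
    ]


# Hypothetical sample data, used only to show the naming scheme.
base_user = BootstrapUser("alice", "Alice Example", "alice@example.com")
test_users = derive_test_users(base_user)
```

With the default `count=2`, the base user `alice` yields `alice1` and `alice2`, each carrying the `+testN` suffix on the full name and on the local part of the email address. Isolation of these users from other systems and the sandbox-to-production mapping are not covered by this sketch.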

---

## D3 — GitOps tooling: ArgoCD vs Flux vs plain Helm

**Date:** 2026-03-01
**Decided by:** Tegwick

Net-Kingdom and Railiance should both be optimized for AI-first development. As best practices for AI-first development are not yet established and will probably evolve rapidly, we will start from the following interpretation of AI-first:
1) Implement in a TDD, API-first, headless infrastructure approach; testing needs to ensure data integrity of entities.
2) Include efficient access to help/manual/description information as part of the API.
3) Build an MCP layer on top of the headless approach.
4) Provide comprehensive command-line tooling where use cases demand it.
5) UI is low priority and can be prototypical, to keep overhead and complications low while allowing quick dev/user feedback loops. Professional production-grade user interfaces will as a rule be out of scope for a repository; instead they become a repository in their own right, dedicated to providing and evolving that user interface.
For this decision it follows that plain Helm is fine for starting projects, keeping them lightweight as long as this is helpful, with a later upgrade to or integration with Flux.

---
@@ -8,7 +8,7 @@ owner: worsch
topic_slug: netkingdom
state_hub_workstream_id: 39263c4b-ef70-4053-b782-350834b7e1be
created: "2026-02-28"
updated: "2026-02-28"
updated: "2026-03-01"
---

# SSO & MFA Platform — Keycloak + privacyIDEA on Kubernetes
@@ -36,6 +36,17 @@ this plan picks the most concrete and production-aligned choices from each:
- **Custom Keycloak image** (both) — JAR baked into image via `kc.sh build`
rather than `kubectl cp`; clean GitOps pattern.

## Decisions

All three pending decisions from the first session have been resolved
(2026-03-01, decided by Tegwick). Full rationale in `DECISIONS.md`.

| ID | Decision | Outcome |
|----|----------|---------|
| D1 | Vault backend | **KeePassXC pre-cluster → HashiCorp Vault in-cluster.** Bootstrap on KeePassXC before a cluster is available; transition to Vault once K3s is operational. |
| D2 | Identity source of truth | **Hybrid: Keycloak-internal + LDAP/Entra federation** for enterprise tier. Plus a **file-based bootstrap** user store for pre-Keycloak dev/test/sandbox systems. |
| D3 | GitOps tooling | **Plain Helm to start, upgrade to Flux when warranted.** Development philosophy: AI-first (TDD, API-first/headless, MCP layer, CLI tooling; UI is low-priority and lives in separate repos). |

## Architecture

```
@@ -55,7 +66,8 @@ this plan picks the most concrete and production-aligned choices from each:
▼
[PostgreSQL] (CloudNativePG, namespace: databases)
│
[Vault / K8s Secrets] ← single credential unlocks
[HashiCorp Vault] ← single credential unlocks (in-cluster)
[KeePassXC] ← pre-cluster bootstrap / dev/test/sandbox
```

**Namespaces:** `sso` (Keycloak), `mfa` (privacyIDEA), `databases`
@@ -82,9 +94,13 @@ status: todo
priority: critical
```

Create the vault (KeePassXC .kdbx or self-hosted Bitwarden; HashiCorp Vault
for later production hardening). Generate and store all secrets inside the
vault — never typed again:
**Decision D1 applies:** Two-phase vault strategy.

**Phase 0a — Pre-cluster KeePassXC bootstrap (do this first, before K8s):**

Create a KeePassXC `.kdbx` database as the initial secret store. Keep the
KeePassXC master password in a personal password manager. Generate and store
all bootstrap secrets inside KeePassXC:

- privacyIDEA: `SECRET_KEY` (64+ chars), `PI_PEPPER` (32+ chars),
`PI_ENCFILE` content (`pi-manage create_enckey`).
@@ -94,11 +110,20 @@ vault — never typed again:
- Break-glass: admin credentials + offline recovery OTP seed.
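
As a rough sketch of how random values meeting the length requirements stated for `SECRET_KEY` and `PI_PEPPER` could be produced before they are stored in KeePassXC, Python's standard `secrets` module is sufficient. The helper name `generate_secret` and the dictionary layout are assumptions for illustration; `PI_ENCFILE` is not covered because it comes from `pi-manage create_enckey`.

```python
import secrets


def generate_secret(min_chars: int) -> str:
    """Return a URL-safe random string with at least `min_chars` characters."""
    # token_urlsafe(n) Base64-encodes n random bytes, which gives about 1.3
    # characters per byte, so asking for min_chars bytes is always long enough.
    return secrets.token_urlsafe(min_chars)


# Hypothetical layout: one entry per secret that will be pasted into KeePassXC.
bootstrap_secrets = {
    "SECRET_KEY": generate_secret(64),  # 64+ chars
    "PI_PEPPER": generate_secret(32),   # 32+ chars
}
```

The generated values are meant to go straight into the `.kdbx` database; printing them to a shared terminal or writing them to a log would defeat the purpose.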

Export an age-encrypted ops bundle (encrypted tar of all secret YAML
manifests). Enable K8s encryption-at-rest. Confirm secret injection
strategy: External Secrets Operator + Vault backend, or sops/age for GitOps.
manifests). Store offsite.

**Done when:** vault created, all secrets generated, encrypted ops bundle
exported and stored offsite. Secret injection strategy decided.
**Phase 0b — HashiCorp Vault in-cluster (after T02, once K3s is running):**

Deploy HashiCorp Vault in the cluster (Helm chart). Migrate secrets from
KeePassXC into Vault. Enable K8s encryption-at-rest. Choose and implement
secret injection strategy: External Secrets Operator + Vault backend, or
Vault Agent Injector (ESO preferred for GitOps alignment). KeePassXC
remains the source of truth for dev/test/sandbox systems that do not connect
to the cluster Vault.

**Done when:** KeePassXC created and all secrets generated (0a). Vault
deployed in-cluster, secrets migrated, injection strategy operational (0b).
Encrypted ops bundle exported and stored offsite.

---

@@ -142,7 +167,8 @@ ThreePhoenix HA posture) or Bitnami Helm chart as fallback. Create:
- Database `keycloak_db`, user `keycloak`
- Database `privacyidea_db`, user `privacyidea`

Store DB credentials as K8s Secrets (or ExternalSecrets from vault).
Store DB credentials as K8s Secrets injected from Vault (T01 Phase 0b must
be complete, or use placeholder K8s Secrets until Vault is live).
Configure automated DB backups to object storage (S3 or MinIO).
**Run a restore drill before proceeding** — a failed restore later is a
critical blocker.
@@ -216,10 +242,11 @@ COPY PrivacyIDEA-Provider.jar /opt/keycloak/providers/
RUN /opt/keycloak/bin/kc.sh build
```

Deploy via official Keycloak Operator (CRD-based) or codecentric KeycloakX
Helm chart. Configure:
Deploy via plain Helm chart (official Keycloak Operator CRD-based or
codecentric KeycloakX Helm chart; **decision D3: plain Helm first, Flux
later**). Configure:

- DB: `keycloak_db` (credentials from K8s Secret)
- DB: `keycloak_db` (credentials from Vault / K8s Secret)
- Ingress + TLS: `keycloak.yourdomain.com` (Traefik + cert-manager)
- Hostname strictness + proxy mode (Traefik forward headers)
- Metrics/logging (Prometheus annotations)
@@ -242,8 +269,9 @@ priority: medium

In Keycloak:

1. Create/configure realm; set identity source of truth (Keycloak internal
users recommended for initial deployment; LDAP/AD or Entra as extension).
1. Create/configure realm. **Decision D2 applies:** identity source of truth
is Keycloak-internal users. LDAP/AD and Entra federation is deferred to
the enterprise tier (not in scope for this workplan phase).
2. Create Authentication Flow "privacyIDEA Browser":
- Add privacyIDEA execution step (REQUIRED)
- Config: privacyIDEA URL = `https://pi.yourdomain.com`, service account
@@ -272,9 +300,13 @@ status: todo
priority: medium
```

Decide and implement identity source of truth (Keycloak internal →
privacyIDEA Keycloak resolver, or LDAP/AD shared). The privacyIDEA 3.12+
Keycloak user resolver simplifies alignment.
**Decision D2 applies:** identity source of truth is Keycloak-internal with
the privacyIDEA Keycloak resolver. Implement (not decide):

- Configure privacyIDEA 3.12+ Keycloak user resolver to align Keycloak
users with privacyIDEA token ownership.
- LDAP/Entra federation: explicitly out of scope for this phase; tracked as
an enterprise-tier extension point.

Define policies in privacyIDEA:
- Allowed token types: TOTP, hardware (YubiKey), passkey
@@ -288,8 +320,17 @@ Configure auditing and log shipping: privacyIDEA audit logs + Keycloak
events → centralized logging (ELK/Loki or equivalent). Token lifecycle
policies: enrollment, revocation, re-enrollment on device loss.

**Bootstrap user management (D2 extension — scope TBD):**
D2 also specifies a file-based lightweight user store for pre-Keycloak
systems (dev/test/sandbox that do not connect to the cluster). Users stored
as files in a secure subdirectory of the Linux home directory; auto-generates
two test users with `N` / `+testN` username and email suffixes. Test users
must not spill over into other systems; a mapping mechanism from sandbox
identities to production should be provided. This scope is not yet captured
in a task — see Open Questions.

**Done when:** policies documented and applied, self-service portal live,
audit logs flowing.
audit logs flowing, Keycloak resolver configured.

---

@@ -307,6 +348,7 @@ priority: medium
backup to S3/MinIO). Test restore.
- privacyIDEA encryption/audit key Secrets: encrypted export, versioned.
- Keycloak realm exports: stored as JSON in git (GitOps-friendly).
- Vault unseal keys and root token: offline copy in KeePassXC.

**Disaster recovery drill** (mandatory before production):
1. Restore DB + keys into a fresh namespace.
@@ -335,7 +377,9 @@ documented and tested, HSTS and NetworkPolicies verified.

## Deliverables Checklist

- [ ] Vault created; all secrets generated and encrypted ops bundle exported
- [ ] KeePassXC vault created; all secrets generated and encrypted ops bundle exported
- [ ] HashiCorp Vault deployed in-cluster; secrets migrated from KeePassXC
- [ ] Secret injection strategy chosen and operational (ESO + Vault or Vault Agent)
- [ ] `sso`, `mfa`, `databases` namespaces + NetworkPolicies deployed
- [ ] TLS everywhere via cert-manager (Traefik ingress)
- [ ] PostgreSQL live; both DBs created; backup + restore tested
@@ -347,15 +391,37 @@ documented and tested, HSTS and NetworkPolicies verified.
- [ ] Self-service portal live; token lifecycle policies defined
- [ ] DR drill passed; monitoring live; break-glass documented and tested

## Open Questions / Extension Points
## Open Questions

- **Vault backend**: KeePassXC (simple) vs HashiCorp Vault in-cluster
(rotation, audit trail). Start with KeePassXC; upgrade to Vault when
ThreePhoenix cluster is stable.
- **Identity source of truth**: Keycloak-internal vs LDAP/AD/Entra.
Decision needed before T07.
- **GitOps tooling**: ArgoCD or Flux for declarative Helm management?
Aligns with Railiance staged-promotion-lifecycle workstream.
- **Cluster target**: Development on single-node k3s; production on
ThreePhoenix (3-node HA). Workplan covers both; HA-specific steps noted
where they diverge.
The three original pending decisions (D1 vault backend, D2 identity source
of truth, D3 GitOps tooling) have all been resolved. See `DECISIONS.md`.

Remaining open items:

1. **Secret injection strategy** — D1 resolves the vault backend (Vault
in-cluster) but the concrete injection mechanism is still open: External
Secrets Operator vs Vault Agent Injector. Should be decided and closed
in T01 Phase 0b.

2. **File-based bootstrap user management (D2 extension)** — D2 specifies
a lightweight file-based user store for pre-Keycloak environments. This
is non-trivial scope (file format, test-user generation, isolation
controls, production-mapping mechanism) and is not captured in any
current task. Needs a decision: is this a task within this workplan, or
a separate workplan/repo?

3. **AI-first / MCP layer (D3 extension)** — D3 establishes an AI-first
development philosophy (TDD, API-first/headless, MCP layer, CLI
tooling). This workplan currently covers only infrastructure deployment.
Should Keycloak/privacyIDEA operations (user management, policy CRUD,
token lifecycle) be wrapped in an MCP server or CLI? If so, this needs
a new task or workplan.

4. **LDAP/Entra federation** — Explicitly deferred to the enterprise tier
(D2). Track as an extension point when the time comes.

5. **Cluster target for dev/test** — D1 implies KeePassXC-based systems
run independently of the cluster. The plan assumes single-node k3s for
dev and ThreePhoenix for production. The sequencing between T01 Phase 0a
(pre-cluster) and Phase 0b (in-cluster) should be confirmed once the
Railiance cluster timeline is clearer.