feat(NET-WP-0018-T02): Document The Runtime Architecture

- Created docs/NetkingdomRuntimeArchitecture.md (comprehensive, specific-as-deployed):
  - Planes (bootstrap/control/tenant + recursive trust)
  - Identity/MFA/OIDC (lightweight key-cape: LLDAP/Authelia/privacyIDEA + KeyCape https://kc.coulomb.social; clients, claims, groups)
  - Authelia handoff, OpenBao OIDC+secrets path (SOPS/age -> runtime leases/K8s/audit)
  - Bootstrap console/UI state (S6, gates, 0019 dry-run additions, web-ui, evidence)
  - State Hub relation, k8s/DNS/routes/ingress/trust (concrete hosts/ns)
  - Pragmatic audit paths (local-identity TSV, PVC+mock, State Hub/console)
  - UE integration points + 7 gaps (per assessment + boundary contract refs)
  - Operational assumptions + rebuild notes
- Updated NET-WP-0018 workplan: T02 status done + detailed 2026-06-03 completion note
- Used pragmatic tracking throughout (progress events with task_id 121ee797..., file notes, this commit)
- Follows the T02 spec + Coordination Notes guidance on pragmatic tracking for implementation/retrospective

This doc is now the baseline for T03 retrospective/gap matrix (incl. audit), T05 guide, T06/T08 control surface/validations, T09 risk assessment.
2026-06-03 16:27:22 +02:00
parent 6e05946163
commit d09843c17e
2 changed files with 210 additions and 1 deletions


@@ -0,0 +1,205 @@
# NetKingdom Runtime Architecture
**Status:** draft (initial capture for NET-WP-0018-T02)
**Date:** 2026-06-03
**Context:** Documents the *as-deployed* runtime after the first successful bootstrap (0015-0017) + T06-adjacent polish (0019). Not an idealized future architecture. Specific enough to guide a scratch rebuild or rehearsal without rediscovering integration details. Incorporates pragmatic audit paths and known UE integration points/gaps per the persisted assessment.
This is the working system that survived the first bootstrap ceremony and is now the target for automation, validation, guide, and risk assessment in NET-WP-0018.
## Planes Model
(From platform-identity-security-architecture.md baseline)
- **Bootstrap plane**: Establishes initial trust before full platform services. Minimal authority for cluster access, initial identity/secret injection, break-glass recovery, transition to managed runtime. Owned by railiance-infra/cluster + net-kingdom credential bootstrap. Uses SOPS/age for at-rest + offline packets.
- **Platform control plane**: Shared security services (identity, MFA, secrets, policy, audit, authorization). net-kingdom owns canonical architecture/IAM Profile/SSO/MFA/bootstrap decisions; deployed via Railiance stack.
- **Tenant planes**: Workloads (Coulomb as tenant zero/reference). Must not alter platform root trust.
Recursive trust rule: Normal tenant admin (even Coulomb) must never suffice to alter platform root of trust (IAM Profile semantics, break-glass, global MFA, OpenBao root/unseal, flex-auth policy pipelines, audit retention, etc.).
## Identity Stores, MFA Realms, and OIDC Flows
**Lightweight mode (key-cape, current primary for bootstrap/internal):**
- Directory: LLDAP (https://lldap.coulomb.social for admin; internal for Authelia).
- SSO/Proxy: Authelia (over LLDAP).
- MFA/Token: privacyIDEA (self-service enrollment for TOTP; pi-admin for setup/repair; used for assurance on privileged actions).
- OIDC Provider: KeyCape (issuer https://kc.coulomb.social; conforms to NetKingdom IAM Profile v0.2).
- KeyCape issues tokens with required claims: tenant, principal_type, groups, roles, scope/scp, assurance.
- Registered clients include: netkingdom-bootstrap-console (for console OIDC login), openbao-admin (for OpenBao OIDC auth).
- Redirects: http://localhost:8250/oidc/callback, http://127.0.0.1:8250/oidc/callback.
- Groups/roles for bootstrap: net-kingdom-admins (for platform-admin OpenBao policy), net-kingdom-users (for scoped non-root).
- platform-root / king credential: dedicated LLDAP user (separate from personal accounts like tegwick). Password in operator password safe; TOTP via privacyIDEA; roles include platform-root-custodian, openbao-admin, identity-admin.
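The required-claims list above can be checked mechanically once a token has been verified. The following is a minimal sketch, assuming the payload is already signature-verified and decoded into a dict; the helper name `missing_claims` and the sample values are hypothetical, and only the claim names come from this document.

```python
# Hypothetical helper: claim names are taken from the list above,
# everything else (function name, sample values) is illustrative.
REQUIRED_CLAIMS = ("tenant", "principal_type", "groups", "roles", "assurance")
SCOPE_CLAIMS = ("scope", "scp")  # either spelling satisfies the scope requirement


def missing_claims(payload: dict) -> list[str]:
    """Return required claim names absent from an already-verified payload."""
    missing = [name for name in REQUIRED_CLAIMS if name not in payload]
    if not any(name in payload for name in SCOPE_CLAIMS):
        missing.append("scope/scp")
    return missing


# Sample payload with placeholder values; "assurance" is deliberately left out.
example = {
    "tenant": "example-tenant",
    "principal_type": "human",
    "groups": ["net-kingdom-users"],
    "roles": [],
    "scp": ["openid"],
}
print(missing_claims(example))  # prints ['assurance']
```

A check like this only confirms presence; it says nothing about whether the claim values are acceptable for a given policy.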
**Expanded mode:** Keycloak (for enterprise federation/SAML/Entra, complex realms, delegated admin). Not yet primary for bootstrap.
**Capability progression (C1 lightweight -> C2 MFA/token):**
- C1: Single-factor OIDC SSO over internal directory (key-cape: Authelia + LLDAP).
- C2a (light 2FA): Authelia built-in TOTP/WebAuthn.
- C2b (token authority): privacyIDEA for hardware tokens, many types, self-service, lifecycle.
Applications target the IAM Profile v0.2 contract (`canon/standards/iam-profile_v0.2.md`), not concrete providers.
**Token flows (high level):**
- Human/service -> Authelia/LLDAP or Keycloak -> KeyCape/Keycloak issues IAM Profile token -> claims to flex-auth (for authz) or directly to protected services / OpenBao OIDC.
- For bootstrap console: OIDC login verified to obtain platform-admin via KeyCape -> OpenBao.
## Authelia Handoff
Authelia acts as the SSO proxy/authenticator in lightweight mode, fronting the LLDAP directory + (where enabled) privacyIDEA MFA. It hands off the normalized identity to KeyCape for OIDC issuance. Used for day-to-day logins; email (e.g. bernd.worsch@gmail.com) is notification-only, not an auth source for privileged/root.
## OpenBao OIDC Admin Path and Secrets/Credential Path
**OpenBao as runtime secrets authority (post-bootstrap):**
- KV v2 for platform config.
- Dynamic DB creds, K8s auth/workload identity.
- Future object storage STS brokering.
- Audit devices, lease/revocation.
- Delivery: direct clients, External Secrets Operator -> K8s Secrets, CSI mounts.
- Auth: OIDC/JWT against KeyCape (maps claims/groups to policies, e.g. platform-admin for net-kingdom-admins group).
- platform-root can obtain platform-admin policy via KeyCape/MFA (proven in 0015/0017).
- Root token: revoked/dispositioned after init; used only for bootstrap/break-glass. Unseal keys in custody (age/SOPS protected, offline packets, king credential).
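The group-to-policy mapping described above can be pictured as a simple lookup. This is an illustrative sketch only: OpenBao resolves the mapping itself from its OIDC auth configuration, and the second policy name below is a placeholder because this document names only `platform-admin`.

```python
# Illustrative lookup; not how OpenBao stores or evaluates the mapping.
GROUP_POLICIES = {
    "net-kingdom-admins": "platform-admin",   # mapping stated in this document
    "net-kingdom-users": "scoped-non-root",   # hypothetical policy name
}


def policies_for(groups: list[str]) -> list[str]:
    """Return the sorted, de-duplicated policies granted by the given groups."""
    return sorted({GROUP_POLICIES[g] for g in groups if g in GROUP_POLICIES})


# Unknown groups grant nothing.
print(policies_for(["net-kingdom-admins", "some-other-group"]))  # prints ['platform-admin']
```

Keeping unknown groups silent (rather than raising) mirrors the idea that only explicitly mapped groups carry authority.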
**Bootstrap to runtime transition:**
- SOPS/age for initial cluster secrets, emergency bundles, Git at-rest.
- Once OpenBao alive + configured (auth, mounts, policies, audit): switch to it as long-lived authority.
- Bootstrap-era creds/databases/access paths reviewed/rotated/cleaned before production reliance (see cleanup_complete, T03/T04 in 0017).
**Platform root custody (see docs/platform-root-custody.md):**
- Initial setup operator: tegwick / bernd.worsch@gmail.com (notification contact).
- King credential: dedicated, rarely used platform-root identity (break-glass only). Not day-to-day Gitea/email account.
- Temporary single-king custody (with MFA, encrypted offline, password-safe refs) allowed pre-prod; target two-of-three escrow.
- Never store unseal/root/OTP/private keys in Git, State Hub, email, shell history, etc.
## Bootstrap UI / Console State (Control Surface)
Implemented in `tools/security-bootstrap-console/security_bootstrap_console.py` (non-secret only; refuses live OpenBao init or secret collection).
**Current stage (post 0017/0019):** S6 - Reopen under custody.
**Key gates / posture (from metadata + console):**
- King credential kit prepared.
- Custody strategy approved (temporary-single-king).
- OpenBao preflight, init ceremony (attended only), initial config, KeyCape client, OIDC auth, admin login via KeyCape/MFA, root token disposition (revoked), restore drill, cleanup/rotation, platform reopened.
- audit_core_posture: bootstrap risk accepted (production sink not ready); owner, review date (2026-07-02), note recorded. See audit_core_posture_ready() / reason().
- Other: custodian age keys confirmed, mfa enrolled (TOTP via privacyIDEA), oidc_login_verified, no_secret_capture_confirmed, etc.
- .local/security-bootstrap.json holds non-secret flags (updated via console approve/validate flows).
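As a rough illustration of how a gate can be derived from non-secret metadata flags, the sketch below evaluates an audit-core posture from a dict. The key layout (`audit_core`, `review_date`, and so on) and the function name are assumptions for illustration; the console's actual `audit_core_posture_ready()` is not reproduced here.

```python
def posture_ready_sketch(meta: dict) -> bool:
    """Ready if a production sink exists, or if bootstrap risk is accepted
    together with an owner, a review date and a note (assumed key names)."""
    audit = meta.get("audit_core", {})
    if audit.get("production_sink_ready"):
        return True
    accepted = bool(audit.get("bootstrap_risk_accepted"))
    documented = all(audit.get(key) for key in ("owner", "review_date", "note"))
    return accepted and documented


# Sample metadata with placeholder owner and note; the review date is the one
# recorded in this document.
sample = {
    "audit_core": {
        "production_sink_ready": False,
        "bootstrap_risk_accepted": True,
        "owner": "example-owner",
        "review_date": "2026-07-02",
        "note": "production sink not ready",
    }
}
print(posture_ready_sketch(sample))  # prints True
```

The point of the sketch is the shape of the rule: accepting risk without an owner, review date and note does not open the gate.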
**Available actions (status output + parser):** king-kit, custody-packet, openbao-preflight, handover-checklist, validate-* (t02, cleanup, lifecycle-flow, onboarding-dry-run), custody-roster-template, lifecycle-flow-template, lifecycle-guide, onboarding-dry-run-template, onboarding-dry-run (delegates to orchestrator), onboarding-dry-run-claims, lifecycle-cleanup-dryrun-users, validate-custody-roster, metadata-template, approve-custody-mode, web-ui, etc.
**Web UI:** Served locally (default :8765 or similar); forms for custody approval, responsibility, audit_core flags (production_sink_ready, bootstrap_risk_accepted + owner/review/note), cleanup_complete, platform_reopened. Uses JS to compute gates from metadata.
**Runbooks / payloads:** privacyIDEA realm repair, Key material compromised (taint), generate new unseal keys, emergency lock-down, restore drill, OpenBao token revocation, **User lifecycle dry-run (T06)** (from 0019: references dry-run-nonroot-user.sh, make security-bootstrap-onboarding-dry-run, console subcmds, NET-WP-0019).
**0019 polish additions (T06-adjacent):**
- dry-run-nonroot-user.sh orchestrator: /tmp workspace + EXIT trap cleanup; k8s fallback for LLDAP_ADMIN_PASS, so persistent bootstrap/secrets are never written for test users; create --test non-root; verifications (MFA, KeyCape); optional GraphQL lock/offboard; populates + validates evidence.json.
- Console subcommands + make targets for repeatable dry-run, claims verification (infers from LLDAP groups + T01 role binding; warns on platform-root/admins), cleanup by pattern.
- Evidence templates/validators for onboarding dry-run (12+ bools: effective access preview, no secret material recorded, actor_class != king, groups limited to net-kingdom-users, lldap/keycape verified, etc.).
- Integrated into lifecycle-guide (T06 DRY-RUN section) and runbook_payloads for web-ui exposure.
- Safer secret handling in create-user.sh (k8s extract fallback).
**Evidence discipline:** /tmp/netkingdom-*-evidence.json (exact strings + bools); validated by console; non-secret only (refuses secret markers).
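A refusal of secret markers could look roughly like the sketch below, which scans the serialized evidence for known secret-looking substrings. The marker list is hypothetical (the console's real markers are not shown in this document), and the function name is illustrative.

```python
import json

# Hypothetical marker list, chosen for illustration only.
SECRET_MARKERS = ("BEGIN PRIVATE KEY", "AGE-SECRET-KEY-", "otpauth://")


def find_secret_markers(evidence: dict) -> list[str]:
    """Return the markers found anywhere in the serialized evidence."""
    text = json.dumps(evidence)
    return [marker for marker in SECRET_MARKERS if marker in text]


clean = {"oidc_login_verified": True, "note": "login checked"}
print(find_secret_markers(clean))  # prints []
```

Scanning the serialized form catches markers in nested values as well as top-level ones; a validator would reject the evidence file whenever the returned list is non-empty.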
## State Hub Relation
- Tracks domain netkingdom (topic a6c6e745-bf54-4465-9340-1534a2be493e).
- Workstreams/tasks (e.g. this NET-WP-0018 id 800f9f16-..., 0019, 0017).
- Progress events (POST /progress/ with workstream/task for what was done; used for tracking impl + feeding retrospectives).
- Decisions (POST /decisions/ for key choices).
- Inbox for cross-agent coordination.
- .custodian-brief.md generated by fix-consistency (reflects file + DB).
- Used for audit correlation in pragmatic layer (events link to actors/decisions).
## k8s / Deployment, DNS, Routes, Ingress, Trust Boundaries
**Namespaces/components (from manifests + usage):**
- sso: LLDAP, privacyIDEA, KeyCape (keycape-config Secret), Authelia?
- openbao: OpenBao (0 pod; bao status via kubectl exec).
- Railiance platform services (DBs, etc.) for stateful backing.
**Ingress / DNS (internal .coulomb.social):**
- LLDAP admin: lldap.coulomb.social
- KeyCape: kc.coulomb.social (OIDC issuer)
- Console OIDC callbacks: localhost/127.0.0.1:8250
- Other platform services via railiance-cluster ingress + cert-manager + NetworkPolicies.
**Trust boundaries / token flows (high level):**
- Bootstrap: local files (SOPS/age), operator password safe, k8s secrets (lldap-credentials etc.), direct kubectl for dry-run safety.
- Runtime: OIDC tokens (IAM Profile claims) -> KeyCape/Keycloak -> flex-auth decisions or OpenBao OIDC (group->policy e.g. net-kingdom-admins -> platform-admin).
- Workload identity: K8s auth to OpenBao for dynamic creds.
- No tenant can reach platform-root paths without explicit platform control plane authority + custody safeguards.
- Break-glass: king credential + unseal shares (custody roster, age protected).
**Operational assumptions:**
- Live k3s/Railiance cluster with ingress/cert-manager/NetworkPolicy.
- Operator has kubectl access to sso/openbao ns + password safe for bootstrap material.
- Non-secret evidence + State Hub for progress; secrets never in Git/console.
- Capability-driven identity mode (lightweight key-cape sufficient for many cases).
- Audit is currently pragmatic/separate during bootstrap (see below).
## Pragmatic Audit Paths (Current Bootstrap)
Per Coordination Notes and assessment gap 7 (audit/event correlation):
- **local-identity/audit.py**: Append-only TSV log at ~/.local-identity/audit.log (columns: timestamp, command, username, outcome; mode 600; silent on I/O failure). Covers local-identity CLI + OIDC server events during the bootstrap phase.
- **OpenBao audit**: Retained on audit PVC + Audit Core mock wiring only (production tenant-aware durable sink not ready; risk accepted with owner/review note).
- **State Hub + console evidence**: /progress/ events (with workstream/task/decision correlation), /decisions/, non-secret evidence.json (from templates + live data + validators), metadata flags (audit_core_*), runbook payloads. Used for impl tracking and bootstrap ceremony records.
- **Separate from UE**: Current bootstrap (direct LLDAP/KeyCape for platform-root/break-glass/test users in 0015-0019) does not yet route through user-engine AuditRecord/OutboxEvent or claims_enrichment. UE emits its own audit for domain facts when used.
- **Contract requirement**: Must link request/actor/decision/user_engine_audit/outbox_event (audit correlation bundle in boundary contract). Current is pragmatic/separate for bootstrap.
Documented here for rebuild guidance. Proper integration (adapters + sinks) is follow-up (see assessment recs + T03/T09).
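For rebuild purposes, the append-only TSV behaviour described above (one row per event, mode 600, silent on I/O failure) can be sketched as follows. This is not the code in local-identity/audit.py; the function name, timestamp format and sanitizing step are assumptions.

```python
import os
import time
from pathlib import Path


def append_audit(path: Path, command: str, username: str, outcome: str) -> None:
    """Append one TSV row (timestamp, command, username, outcome).

    Never raises on I/O failure, matching the 'silent' behaviour described above.
    """
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())  # assumed format
    fields = [timestamp, command, username, outcome]
    # Replace tabs and newlines so each event stays on exactly one row.
    row = "\t".join(f.replace("\t", " ").replace("\n", " ") for f in fields) + "\n"
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        # O_APPEND keeps writes at the end of the file; 0o600 applies on creation.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        try:
            os.write(fd, row.encode("utf-8"))
        finally:
            os.close(fd)
    except OSError:
        pass  # audit must not break the calling command
```

Swallowing `OSError` means a full disk or a bad path loses audit rows without warning, which is one reason this path is treated as pragmatic rather than production-grade.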
## UE Integration Points and Known Gaps (from docs/user-engine-netkingdom-integration-assessment.md)
**Fit (no intent conflicts):**
- net-kingdom: IAM orchestration + authn/coarse claims (IAM Profile) + bootstrap + secrets (OpenBao) + contract governance + meta-orchestration of user-domain facts.
- user-engine: headless user-domain backend (User/Account/ExternalIdentity/Membership with owning_system/source/freshness/delete_semantics, Application+Binding+Catalog, ProfileValue layered, EffectiveProfile+projections (incl. claims_enrichment, audit), ports (IdentityClaimsAdapter, AuthorizationCheckPort, SecretProvider, EventOutbox, AuditWriter), OutboxEvent/AuditRecord). In-mem MVP finished (USER-WPs); local standalone only currently.
- Integration via claims/contracts/adapters (no shared code). UE consumes verified Actor from claims; delegates authz to PDP (flex-auth); exports projections. NK orchestrates boundaries ("owner wins" for membership sync, app onboarding 8-step bindings as separate records, etc.).
**Current points in runtime:**
- Bootstrap/T06 dry-runs + platform users use direct LLDAP/KeyCape (IAM side).
- KeyCape OIDC claims (groups + email) feed OpenBao policy and console verification (0019 helper).
- Claims enrichment not yet via UE projection + cache (direct LLDAP resolution in paths).
- Memberships (net-kingdom-* groups) treated as IAM facts; not yet synced as UE Membership with semantics.
- Audit separate (see above).
**7 Gaps (biggest first; see full assessment for details/recs):**
1. Missing Platform Integration Adapters (UE side or symmetric): IdentityClaimsAdapter (KeyCape claims -> Actor), AuthorizationCheckPort (to flex-auth), SecretProvider (OpenBao), EventOutbox, AuditWriter, etc. Only mocks today. Blocks UE as canonical for user facts.
2. Bootstrap/Platform Users vs. Governed UE Lifecycle (direct LLDAP creates for root/break-glass/"net-kingdom-*" vs. UE Membership + externally_provisioned).
3. Application Onboarding "Application" concept (KeyCape OIDC client/secret vs. UE Application + Binding records; must stay separate).
4. Membership/Group Overlap (LLDAP groups vs. UE Membership scopes + owner/source).
5. Governance/Workplan/Brief Split (UE brief stale May22/domain=netkingdom; 0018/0019 as NK orchestration correct but line must stay crisp).
6. Claims Enrichment Path drift (current direct LLDAP in OIDC/bootstrap paths; must switch to adapter-owned when UE deployed; UE never in token critical path).
7. Other: Audit/event correlation (shared IDs across UE/flex/platform; current bootstrap separate/split to audit-core); tenant platform:root special case; no prod UE deployment in NK flows yet.
**Recommendations (for 0018 context):** Use T02 to document current paths + gaps; T07/T08 as testbed for integration once adapters exist (e.g. update dry-run to exercise UE); T03/T09 to classify UE integration risk. NK keeps boundary/contracts/orchestration; UE owns domain impl.
See canon/standards/user-engine-boundary-contract_v0.1.md, docs/user-engine-netkingdom-integration-assessment.md, responsibility-map.md, SCOPE.md, user-engine/INTENT.md + SCOPE.md for full details.
## Dry-Run / User Lifecycle Tooling (0019 Polish Additions)
See NET-WP-0019 and sso-mfa/k8s/lldap/dry-run-nonroot-user.sh:
- Safe repeatable non-root onboarding dry-run (create/verify/lock/offboard) with /tmp hygiene, k8s secret fallback, evidence (lldap_identity_verified, keycape_oidc_claims_verified, effective_access_summary, no_secret_material_recorded, actor_class, groups, lock_offboard_result, 12+ bools), validate via make/console.
- Console: onboarding-dry-run*, claims verification (T05 helper), lifecycle-cleanup-dryrun-users.
- Make targets + runbook entry for web-ui.
- Integrated into lifecycle-guide.
- Proves IAM-lifecycle contract; foundation for future UE-backed version (per assessment).
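The evidence rules named above (verified flags, no secret material, actor class not king, groups limited to net-kingdom-users) can be expressed as a small validator. The sketch below uses the field names listed in this section; the function name and the exact rule set are assumptions, and the real validators check more fields than shown.

```python
REQUIRED_TRUE = (
    "lldap_identity_verified",
    "keycape_oidc_claims_verified",
    "no_secret_material_recorded",
)
ALLOWED_GROUPS = {"net-kingdom-users"}


def evidence_problems(evidence: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems = []
    for flag in REQUIRED_TRUE:
        if evidence.get(flag) is not True:
            problems.append(f"{flag} must be true")
    if evidence.get("actor_class") == "king":
        problems.append("actor_class must not be king")
    extra = set(evidence.get("groups", [])) - ALLOWED_GROUPS
    if extra:
        problems.append(f"unexpected groups: {sorted(extra)}")
    return problems


# Placeholder evidence; "operator" is an illustrative actor_class value.
good = {
    "lldap_identity_verified": True,
    "keycape_oidc_claims_verified": True,
    "no_secret_material_recorded": True,
    "actor_class": "operator",
    "groups": ["net-kingdom-users"],
}
print(evidence_problems(good))  # prints []
```

Returning a list of problems instead of a single boolean keeps the dry-run output useful to the operator, who can see every failed check at once.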
## Operational Assumptions and Rebuild Notes
- Requires live Railiance/k3s cluster with required addons (ingress, cert-manager, etc.).
- Operator kubectl to target ns + access to password safe for bootstrap material + age keys.
- All privileged flows show effective preview; MFA for privileged; no platform-root grants to non-king.
- Evidence always non-secret; secrets in safe/k8s only.
- For scratch rebuild: follow T05 guide (once complete) + evidence per step + T09 risk assessment. Use 0019 dry-run tooling as the model for safe user lifecycle tests. Rehearse first in an isolated namespace or scripted environment (destructive live runs are a non-goal).
- Audit currently pragmatic for bootstrap (documented above); production correlation is follow-up.
**References:**
- platform-identity-security-architecture.md, responsibility-map.md, SCOPE.md
- docs/user-engine-netkingdom-integration-assessment.md + canon/standards/user-engine-boundary-contract_v0.1.md
- security-bootstrap-*.md family (operator-journey, openbao-ceremony-ux, user-lifecycle, handover-cleanup, king-credential-kit, age-custody, etc.)
- tools/security-bootstrap-console/security_bootstrap_console.py (and Makefile targets)
- sso-mfa/k8s/lldap/{create-user.sh,dry-run-nonroot-user.sh}, keycape/verify-*, privacyidea/check-*
- local-identity/ (audit.py, etc.)
- .local/security-bootstrap.json (current gates)
- NET-WP-0017, 0019 workplans + their evidence
- DECISIONS.md, ADRs (e.g. 0007, 0010), canon/standards/iam-profile_v0.2.md
This document will be updated as T03 retrospective, T05 guide, T06/T08 work, and T09 risk assessment proceed. It is the single source for "what the running system actually is" for rebuild guidance.


@@ -117,7 +117,7 @@ guide, UI automation, validations, and rebuild-risk assessment.
```task
id: NET-WP-0018-T02
-status: todo
+status: done
priority: high
state_hub_task_id: "121ee797-e3f5-4d3e-9baa-cfa8c92f8a66"
```
@@ -140,6 +140,10 @@ vs. UE Membership (owning_system etc.), bootstrap local-identity vs. UE local
mode, and the boundary contract as the governance layer. Include refs to
canon/standards/user-engine-boundary-contract_v0.1.md and the assessment.
**2026-06-03:** Started T02. Using pragmatic tracking (this note + will POST /progress/ with task). Gathering deployed components from existing docs, code, and configs to produce specific-as-deployed doc (not idealized). Will cover all listed items + pragmatic audit paths, dry-run 0019 additions, UE integration points/gaps per assessment.
**2026-06-03:** T02 complete. Created docs/NetkingdomRuntimeArchitecture.md with comprehensive sections on: planes model; identity stores/MFA/OIDC flows (lightweight key-cape: LLDAP at lldap.coulomb.social + Authelia + privacyIDEA + KeyCape issuer https://kc.coulomb.social with bootstrap clients); Authelia handoff; OpenBao OIDC admin + secrets/credential path (SOPS/age bootstrap -> runtime with K8s auth, ESO, leases); bootstrap console/UI state (S6 Reopen, full gates incl. audit_core_posture, 0019 dry-run orchestrator/console subcmds/make targets/evidence/validators/runbook entry); State Hub relation (progress/decisions for tracking); k8s/DNS/routes/ingress/trust boundaries (sso/openbao ns, recursive rule, concrete hosts); operational assumptions + rebuild notes. Explicitly includes current pragmatic audit paths (local-identity/audit.py TSV, OpenBao PVC+mock, State Hub/console evidence) and UE integration points + 7 gaps (from assessment + contract refs). Specific as-deployed for rebuild guidance. This doc now feeds T03 retrospective, T05 guide, T09 risk, and T06/T08 validation targets.
### T03 - Produce A Bootstrap Retrospective And Automation Gap Matrix
```task