net-kingdom/docs/NetkingdomRuntimeArchitecture.md
tegwick d09843c17e feat(NET-WP-0018-T02): Document The Runtime Architecture
- Created docs/NetkingdomRuntimeArchitecture.md (comprehensive, specific-as-deployed):
  - Planes (bootstrap/control/tenant + recursive trust)
  - Identity/MFA/OIDC (lightweight key-cape: LLDAP/Authelia/privacyIDEA + KeyCape https://kc.coulomb.social; clients, claims, groups)
  - Authelia handoff, OpenBao OIDC+secrets path (SOPS/age -> runtime leases/K8s/audit)
  - Bootstrap console/UI state (S6, gates, 0019 dry-run additions, web-ui, evidence)
  - State Hub relation, k8s/DNS/routes/ingress/trust (concrete hosts/ns)
  - Pragmatic audit paths (local-identity TSV, PVC+mock, State Hub/console)
  - UE integration points + 7 gaps (per assessment + boundary contract refs)
  - Operational assumptions + rebuild notes
- Updated NET-WP-0018 workplan: T02 status done + detailed 2026-06-03 completion note
- Used pragmatic tracking throughout (progress events with task_id 121ee797..., file notes, this commit)
- Per T02 spec + Coordination Notes guidance on pragmatic for impl/retrospect

This doc is now the baseline for T03 retrospective/gap matrix (incl. audit), T05 guide, T06/T08 control surface/validations, T09 risk assessment.
2026-06-03 16:27:22 +02:00


NetKingdom Runtime Architecture

Status: draft (initial capture for NET-WP-0018-T02)
Date: 2026-06-03
Context: Documents the as-deployed runtime after the first successful bootstrap (0015-0017) + T06-adjacent polish (0019). Not an idealized future architecture. Specific enough to guide a scratch rebuild or rehearsal without rediscovering integration details. Incorporates pragmatic audit paths and known UE integration points/gaps per the persisted assessment.

This is the working system that survived the first bootstrap ceremony and is now the target for automation, validation, guide, and risk assessment in NET-WP-0018.

Planes Model

(From platform-identity-security-architecture.md baseline)

  • Bootstrap plane: Establishes initial trust before full platform services. Minimal authority for cluster access, initial identity/secret injection, break-glass recovery, transition to managed runtime. Owned by railiance-infra/cluster + net-kingdom credential bootstrap. Uses SOPS/age for at-rest + offline packets.
  • Platform control plane: Shared security services (identity, MFA, secrets, policy, audit, authorization). net-kingdom owns canonical architecture/IAM Profile/SSO/MFA/bootstrap decisions; deployed via Railiance stack.
  • Tenant planes: Workloads (Coulomb as tenant zero/reference). Must not alter platform root trust.

Recursive trust rule: Normal tenant admin (even Coulomb) must never suffice to alter platform root of trust (IAM Profile semantics, break-glass, global MFA, OpenBao root/unseal, flex-auth policy pipelines, audit retention, etc.).
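The recursive trust rule can be illustrated as a small guard. This is a minimal sketch, not the deployed policy: the resource labels, the `tenant-admin` role name, and the `may_alter` helper are hypothetical; only `platform-root-custodian` appears elsewhere in this document.

```python
from __future__ import annotations

# Hypothetical labels for the root-of-trust items listed in the rule above.
PLATFORM_ROOT_RESOURCES = frozenset({
    "iam-profile-semantics",
    "break-glass",
    "global-mfa",
    "openbao-root-unseal",
    "flex-auth-policy-pipeline",
    "audit-retention",
})

# Assumption: only this platform role may touch root-of-trust resources.
PLATFORM_ROOT_ROLES = frozenset({"platform-root-custodian"})


def may_alter(resource: str, roles: set[str]) -> bool:
    """Return True if the given roles are allowed to alter the resource."""
    has_platform_root = bool(PLATFORM_ROOT_ROLES & set(roles))
    if resource in PLATFORM_ROOT_RESOURCES:
        # Tenant admin roles are deliberately ignored on this branch.
        return has_platform_root
    return has_platform_root or "tenant-admin" in roles
```

With this shape, `may_alter("global-mfa", {"tenant-admin"})` is rejected while the same role can still change tenant-scoped resources.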

Identity Stores, MFA Realms, and OIDC Flows

Lightweight mode (key-cape, current primary for bootstrap/internal):

  • Directory: LLDAP (https://lldap.coulomb.social for admin; internal for Authelia).
  • SSO/Proxy: Authelia (over LLDAP).
  • MFA/Token: privacyIDEA (self-service enrollment for TOTP; pi-admin for setup/repair; used for assurance on privileged actions).
  • OIDC Provider: KeyCape (issuer https://kc.coulomb.social; conforms to NetKingdom IAM Profile v0.2).
  • Groups/roles for bootstrap: net-kingdom-admins (for platform-admin OpenBao policy), net-kingdom-users (for scoped non-root).
  • platform-root / king credential: dedicated LLDAP user (separate from personal accounts like tegwick). Password in operator password safe; TOTP via privacyIDEA; roles include platform-root-custodian, openbao-admin, identity-admin.

Expanded mode: Keycloak (for enterprise federation/SAML/Entra, complex realms, delegated admin). Not yet primary for bootstrap.

Capability progression (C1 lightweight -> C2 MFA/token):

  • C1: Single-factor OIDC SSO over internal directory (key-cape: Authelia + LLDAP).
  • C2a (light 2FA): Authelia built-in TOTP/WebAuthn.
  • C2b (token authority): privacyIDEA for hardware tokens, many types, self-service, lifecycle.

Applications target the IAM Profile v0.2 contract (canon/standards/iam-profile_v0.2.md), not concrete providers.

Token flows (high level):

  • Human/service -> Authelia/LLDAP or Keycloak -> KeyCape/Keycloak issues IAM Profile token -> claims to flex-auth (for authz) or directly to protected services / OpenBao OIDC.
  • For bootstrap console: OIDC login verified to obtain platform-admin via KeyCape -> OpenBao.

Authelia Handoff

Authelia acts as the SSO proxy/authenticator in lightweight mode, fronting the LLDAP directory and (where enabled) privacyIDEA MFA. It hands off the normalized identity to KeyCape for OIDC issuance. It is used for day-to-day logins; email (e.g. bernd.worsch@gmail.com) is notification-only, not an authentication source for privileged/root access.

OpenBao OIDC Admin Path and Secrets/Credential Path

OpenBao as runtime secrets authority (post-bootstrap):

  • KV v2 for platform config.
  • Dynamic DB creds, K8s auth/workload identity.
  • Future object storage STS brokering.
  • Audit devices, lease/revocation.
  • Delivery: direct clients, External Secrets Operator -> K8s Secrets, CSI mounts.
  • Auth: OIDC/JWT against KeyCape (maps claims/groups to policies, e.g. platform-admin for net-kingdom-admins group).
  • platform-root can obtain platform-admin policy via KeyCape/MFA (proven in 0015/0017).
  • Root token: revoked/dispositioned after init; used only for bootstrap/break-glass. Unseal keys in custody (age/SOPS protected, offline packets, king credential).
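The group-to-policy mapping on the OIDC auth path can be sketched as a lookup over the token's group claim. This is an illustration only: the claim key `groups`, the helper name, and the `scoped-non-root` policy name are assumptions; only the net-kingdom-admins -> platform-admin binding is stated in this document, and the real mapping lives in the OpenBao auth configuration, not in Python.

```python
from __future__ import annotations

# Only the first binding is documented; the second policy name is hypothetical.
GROUP_POLICIES = {
    "net-kingdom-admins": ["platform-admin"],
    "net-kingdom-users": ["scoped-non-root"],  # hypothetical policy name
}


def policies_for_claims(claims: dict) -> list[str]:
    """Resolve policies from the 'groups' claim, preserving first-seen order."""
    policies: list[str] = []
    for group in claims.get("groups") or []:
        for policy in GROUP_POLICIES.get(group, []):
            if policy not in policies:
                policies.append(policy)
    return policies
```

Unknown groups resolve to no policy, so a token without a recognized group grants nothing by default.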

Bootstrap to runtime transition:

  • SOPS/age for initial cluster secrets, emergency bundles, Git at-rest.
  • Once OpenBao alive + configured (auth, mounts, policies, audit): switch to it as long-lived authority.
  • Bootstrap-era creds/databases/access paths reviewed/rotated/cleaned before production reliance (see cleanup_complete, T03/T04 in 0017).

Platform root custody (see docs/platform-root-custody.md):

  • Initial setup operator: tegwick / bernd.worsch@gmail.com (notification contact).
  • King credential: dedicated, rarely used platform-root identity (break-glass only). Not day-to-day Gitea/email account.
  • Temporary single-king custody (with MFA, encrypted offline, password-safe refs) allowed pre-prod; target two-of-three escrow.
  • Never store unseal/root/OTP/private keys in Git, State Hub, email, shell history, etc.

Bootstrap UI / Console State (Control Surface)

Implemented in tools/security-bootstrap-console/security_bootstrap_console.py (non-secret only; refuses live OpenBao init or secret collection).

Current stage (post 0017/0019): S6 - Reopen under custody.

Key gates / posture (from metadata + console):

  • King credential kit prepared.
  • Custody strategy approved (temporary-single-king).
  • OpenBao preflight, init ceremony (attended only), initial config, KeyCape client, OIDC auth, admin login via KeyCape/MFA, root token disposition (revoked), restore drill, cleanup/rotation, platform reopened.
  • audit_core_posture: bootstrap risk accepted (production sink not ready); owner, review date (2026-07-02), note recorded. See audit_core_posture_ready() / reason().
  • Other: custodian age keys confirmed, mfa enrolled (TOTP via privacyIDEA), oidc_login_verified, no_secret_capture_confirmed, etc.
  • .local/security-bootstrap.json holds non-secret flags (updated via console approve/validate flows).
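The audit_core_posture gate described above could be computed along the following lines. This is a hedged sketch, not the console's actual `audit_core_posture_ready()` implementation: the nested `audit_core` key, the field names `owner`, `review_date`, and `note`, and the function name `audit_posture` are assumptions about how the non-secret metadata might be laid out.

```python
from __future__ import annotations


def audit_posture(metadata: dict) -> tuple[bool, str]:
    """Return (ready, reason) for the audit core gate from non-secret flags."""
    audit = metadata.get("audit_core", {})  # assumed key layout
    if audit.get("production_sink_ready"):
        return True, "production sink ready"
    if not audit.get("bootstrap_risk_accepted"):
        return False, "production sink not ready and bootstrap risk not accepted"
    # An accepted risk only counts when it is attributable and reviewable.
    missing = [key for key in ("owner", "review_date", "note") if not audit.get(key)]
    if missing:
        return False, "risk acceptance incomplete: missing " + ", ".join(missing)
    return True, "bootstrap risk accepted (production sink not ready)"


example_metadata = {
    "audit_core": {
        "production_sink_ready": False,
        "bootstrap_risk_accepted": True,
        "owner": "example-owner",  # placeholder value
        "review_date": "2026-07-02",
        "note": "example note",  # placeholder value
    }
}
example_ready, example_reason = audit_posture(example_metadata)
```

Returning the reason alongside the boolean keeps the gate explainable in both the CLI status output and the web UI.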

Available actions (status output + parser): king-kit, custody-packet, openbao-preflight, handover-checklist, validate-* (t02, cleanup, lifecycle-flow, onboarding-dry-run), custody-roster-template, lifecycle-flow-template, lifecycle-guide, onboarding-dry-run-template, onboarding-dry-run (delegates to orchestrator), onboarding-dry-run-claims, lifecycle-cleanup-dryrun-users, validate-custody-roster, metadata-template, approve-custody-mode, web-ui, etc.

Web UI: Served locally (default :8765 or similar); forms for custody approval, responsibility, audit_core flags (production_sink_ready, bootstrap_risk_accepted + owner/review/note), cleanup_complete, platform_reopened. Uses JS to compute gates from metadata.

Runbooks / payloads: privacyIDEA realm repair, Key material compromised (taint), generate new unseal keys, emergency lock-down, restore drill, OpenBao token revocation, User lifecycle dry-run (T06) (from 0019: references dry-run-nonroot-user.sh, make security-bootstrap-onboarding-dry-run, console subcmds, NET-WP-0019).

0019 polish additions (T06-adjacent):

  • dry-run-nonroot-user.sh orchestrator: /tmp workspace with EXIT trap cleanup; k8s fallback for LLDAP_ADMIN_PASS that never writes persistent bootstrap/secrets for test users; creates a --test non-root user; runs verifications (MFA, KeyCape); optional GraphQL lock/offboard; populates and validates evidence.json.
  • Console subcommands + make targets for repeatable dry-run, claims verification (infers from LLDAP groups + T01 role binding; warns on platform-root/admins), cleanup by pattern.
  • Evidence templates/validators for onboarding dry-run (12+ bools: effective access preview, no secret material recorded, actor_class != king, groups limited to net-kingdom-users, lldap/keycape verified, etc.).
  • Integrated into lifecycle-guide (T06 DRY-RUN section) and runbook_payloads for web-ui exposure.
  • Safer secret handling in create-user.sh (k8s extract fallback).

Evidence discipline: /tmp/netkingdom-*-evidence.json (exact strings + bools); validated by console; non-secret only (refuses secret markers).
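An evidence validator in this spirit might look like the sketch below. The field names come from the dry-run evidence described in this document; the marker list, the function name, and the exact rules are assumptions, and the real console validator may check more (or different) fields.

```python
from __future__ import annotations

# Hypothetical marker list; the console's real secret markers are not documented here.
SECRET_MARKERS = ("-----BEGIN", "otpauth://", "password=", "unseal")

REQUIRED_TRUE = (
    "lldap_identity_verified",
    "keycape_oidc_claims_verified",
    "no_secret_material_recorded",
)


def validate_evidence(evidence: dict) -> list[str]:
    """Return a list of problems; an empty list means the evidence is acceptable."""
    errors = []
    for key in REQUIRED_TRUE:
        if evidence.get(key) is not True:
            errors.append(f"{key} must be true")
    if evidence.get("actor_class") == "king":
        errors.append("actor_class must not be king")
    extra_groups = sorted(set(evidence.get("groups", [])) - {"net-kingdom-users"})
    if extra_groups:
        errors.append("unexpected groups: " + ", ".join(extra_groups))
    # Refuse any string value that looks like secret material.
    for key, value in evidence.items():
        if isinstance(value, str) and any(m.lower() in value.lower() for m in SECRET_MARKERS):
            errors.append(f"{key} looks like secret material")
    return errors
```

Requiring `is True` rather than truthiness keeps the "exact strings + bools" discipline: a string such as "yes" does not pass as a verified flag.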

State Hub Relation

  • Tracks domain netkingdom (topic a6c6e745-bf54-4465-9340-1534a2be493e).
  • Workstreams/tasks (e.g. this NET-WP-0018 id 800f9f16-..., 0019, 0017).
  • Progress events (POST /progress/ with workstream/task for what was done; used for tracking impl + feeding retrospectives).
  • Decisions (POST /decisions/ for key choices).
  • Inbox for cross-agent coordination.
  • .custodian-brief.md generated by fix-consistency (reflects file + DB).
  • Used for audit correlation in pragmatic layer (events link to actors/decisions).
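Recording a progress event could be sketched as follows. Only the `POST /progress/` path is taken from this document; the base URL, the JSON field names (`workstream_id`, `task_id`, `summary`), and the helper name are hypothetical, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_progress_request(base_url: str, workstream_id: str, task_id: str,
                           summary: str) -> urllib.request.Request:
    """Build (but do not send) a POST /progress/ request with a JSON body."""
    payload = {
        "workstream_id": workstream_id,  # hypothetical field name
        "task_id": task_id,              # hypothetical field name
        "summary": summary,              # hypothetical field name; non-secret text only
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder identifiers; real workstream and task ids come from State Hub.
example_request = build_progress_request(
    "http://localhost:8000", "example-workstream", "example-task", "T02 doc drafted"
)
```

Sending would be a call to `urllib.request.urlopen(example_request)` against a real State Hub endpoint; the summary should stay non-secret, matching the evidence discipline above.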

k8s / Deployment, DNS, Routes, Ingress, Trust Boundaries

Namespaces/components (from manifests + usage):

  • sso: LLDAP, privacyIDEA, KeyCape (keycape-config Secret), Authelia?
  • openbao: OpenBao (0 pod; bao status via kubectl exec).
  • Railiance platform services (DBs, etc.) for stateful backing.

Ingress / DNS (internal .coulomb.social):

  • LLDAP admin: lldap.coulomb.social
  • KeyCape: kc.coulomb.social (OIDC issuer)
  • Console OIDC callbacks: localhost/127.0.0.1:8250
  • Other platform services via railiance-cluster ingress + cert-manager + NetworkPolicies.

Trust boundaries / token flows (high level):

  • Bootstrap: local files (SOPS/age), operator password safe, k8s secrets (lldap-credentials etc.), direct kubectl for dry-run safety.
  • Runtime: OIDC tokens (IAM Profile claims) -> KeyCape/Keycloak -> flex-auth decisions or OpenBao OIDC (group->policy e.g. net-kingdom-admins -> platform-admin).
  • Workload identity: K8s auth to OpenBao for dynamic creds.
  • No tenant can reach platform-root paths without explicit platform control plane authority + custody safeguards.
  • Break-glass: king credential + unseal shares (custody roster, age protected).

Operational assumptions:

  • Live k3s/Railiance cluster with ingress/cert-manager/NetworkPolicy.
  • Operator has kubectl access to sso/openbao ns + password safe for bootstrap material.
  • Non-secret evidence + State Hub for progress; secrets never in Git/console.
  • Capability-driven identity mode (lightweight key-cape sufficient for many cases).
  • Audit is currently pragmatic/separate during bootstrap (see below).

Pragmatic Audit Paths (Current Bootstrap)

Per Coordination Notes and assessment gap 7 (audit/event correlation):

  • local-identity/audit.py: Append-only TSV log at ~/.local-identity/audit.log (columns: TS, command, username, outcome; file mode 600; silent on I/O failure). Covers local-identity CLI + OIDC server events during the bootstrap phase.
  • OpenBao audit: Retained on audit PVC + Audit Core mock wiring only (production tenant-aware durable sink not ready; risk accepted with owner/review note).
  • State Hub + console evidence: /progress/ events (with workstream/task/decision correlation), /decisions/, non-secret evidence.json (from templates + live data + validators), metadata flags (audit_core_*), runbook payloads. Used for impl tracking and bootstrap ceremony records.
  • Separate from UE: Current bootstrap (direct LLDAP/Keycloak for platform-root/break-glass/test users in 0015-0019) does not yet route through user-engine AuditRecord/OutboxEvent or claims_enrichment. UE emits its own audit for domain facts when used.
  • Contract requirement: Must link request/actor/decision/user_engine_audit/outbox_event (audit correlation bundle in boundary contract). Current is pragmatic/separate for bootstrap.
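The append-only TSV behaviour described for the local-identity log can be sketched as below. This is not the contents of local-identity/audit.py: the function name, the timestamp format, and the tab/newline sanitizing are assumptions; only the column order, the 600 mode, and the silent-on-failure behaviour come from this document.

```python
import os
import tempfile
import time
from pathlib import Path


def append_audit(log_path: Path, command: str, username: str, outcome: str) -> None:
    """Append one TSV row (TS, command, username, outcome); never raise on I/O failure."""
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())  # assumed format
    fields = [timestamp, command, username, outcome]
    # Replace tabs and newlines so each event stays on exactly one row.
    row = "\t".join(f.replace("\t", " ").replace("\n", " ") for f in fields) + "\n"
    try:
        log_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
        # O_APPEND keeps writes append-only; 0o600 applies when the file is created.
        fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        with os.fdopen(fd, "a", encoding="utf-8") as handle:
            handle.write(row)
    except OSError:
        # Auditing must not break the calling command during bootstrap.
        pass


# Usage against a throwaway directory rather than the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / ".local-identity" / "audit.log"
    append_audit(demo_log, "login", "example-user", "ok")
    append_audit(demo_log, "login", "example-user", "denied")
    demo_rows = [line.split("\t") for line in demo_log.read_text(encoding="utf-8").splitlines()]
```

Swallowing `OSError` is a trade-off: the CLI never fails because of the log, but a missing row is also invisible, which is one reason this path is described as pragmatic rather than production-grade.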

Documented here for rebuild guidance. Proper integration (adapters + sinks) is follow-up (see assessment recs + T03/T09).

UE Integration Points and Known Gaps (from docs/user-engine-netkingdom-integration-assessment.md)

Fit (no intent conflicts):

  • net-kingdom: IAM orchestration + authn/coarse claims (IAM Profile) + bootstrap + secrets (OpenBao) + contract governance + meta-orchestration of user-domain facts.
  • user-engine: headless user-domain backend (User/Account/ExternalIdentity/Membership with owning_system/source/freshness/delete_semantics, Application+Binding+Catalog, ProfileValue layered, EffectiveProfile+projections (incl. claims_enrichment, audit), ports (IdentityClaimsAdapter, AuthorizationCheckPort, SecretProvider, EventOutbox, AuditWriter), OutboxEvent/AuditRecord). In-mem MVP finished (USER-WPs); local standalone only currently.
  • Integration via claims/contracts/adapters (no shared code). UE consumes verified Actor from claims; delegates authz to PDP (flex-auth); exports projections. NK orchestrates boundaries ("owner wins" for membership sync, app onboarding 8-step bindings as separate records, etc.).

Current points in runtime:

  • Bootstrap/T06 dry-runs + platform users use direct LLDAP/Keycloak (IAM side).
  • KeyCape OIDC claims (groups + email) feed OpenBao policy and console verification (0019 helper).
  • Claims enrichment not yet via UE projection + cache (direct LLDAP resolution in paths).
  • Memberships (net-kingdom-* groups) treated as IAM facts; not yet synced as UE Membership with semantics.
  • Audit separate (see above).

7 Gaps (biggest first; see full assessment for details/recs):

  1. Missing Platform Integration Adapters (UE side or symmetric): IdentityClaimsAdapter (KeyCape claims -> Actor), AuthorizationCheckPort (to flex-auth), SecretProvider (OpenBao), EventOutbox, AuditWriter, etc. Only mocks today. Blocks UE as canonical for user facts.
  2. Bootstrap/Platform Users vs. Governed UE Lifecycle (direct LLDAP creates for root/break-glass/"net-kingdom-*" vs. UE Membership + externally_provisioned).
  3. Application Onboarding "Application" concept (KeyCape OIDC client/secret vs. UE Application + Binding records; must stay separate).
  4. Membership/Group Overlap (LLDAP groups vs. UE Membership scopes + owner/source).
  5. Governance/Workplan/Brief Split (UE brief stale as of May 22, with domain=netkingdom; 0018/0019 as NK orchestration is correct, but the line must stay crisp).
  6. Claims Enrichment Path drift (current direct LLDAP in OIDC/bootstrap paths; must switch to adapter-owned when UE deployed; UE never in token critical path).
  7. Other: Audit/event correlation (shared IDs across UE/flex/platform; current bootstrap separate/split to audit-core); tenant platform:root special case; no prod UE deployment in NK flows yet.
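To make gap 1 concrete, an IdentityClaimsAdapter could map already-verified KeyCape claims to an Actor roughly as follows. This is a sketch under assumptions: the `Actor` fields and the function name are hypothetical (the real shapes are defined by user-engine and the boundary contract), token signature verification is assumed to have happened upstream, and only the issuer URL and the groups + email claims come from this document.

```python
from __future__ import annotations

from dataclasses import dataclass

EXPECTED_ISSUER = "https://kc.coulomb.social"


@dataclass(frozen=True)
class Actor:
    """Hypothetical minimal actor; the real UE Actor may carry more fields."""
    subject: str
    email: str | None
    groups: tuple[str, ...]


def actor_from_claims(claims: dict) -> Actor:
    """Map verified OIDC claims to an Actor; reject a foreign issuer or missing subject."""
    if claims.get("iss") != EXPECTED_ISSUER:
        raise ValueError("unexpected issuer")
    subject = claims.get("sub")
    if not subject:
        raise ValueError("missing sub claim")
    groups = tuple(sorted(claims.get("groups") or []))
    return Actor(subject=subject, email=claims.get("email"), groups=groups)
```

Because the adapter only reads claims already present in the token, it keeps UE out of the token critical path, in line with gap 6.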

Recommendations (for 0018 context): Use T02 to document current paths + gaps; T07/T08 as testbed for integration once adapters exist (e.g. update dry-run to exercise UE); T03/T09 to classify UE integration risk. NK keeps boundary/contracts/orchestration; UE owns domain impl.

See canon/standards/user-engine-boundary-contract_v0.1.md, docs/user-engine-netkingdom-integration-assessment.md, responsibility-map.md, SCOPE.md, user-engine/INTENT.md + SCOPE.md for full.

Dry-Run / User Lifecycle Tooling (0019 Polish Additions)

See NET-WP-0019 and sso-mfa/k8s/lldap/dry-run-nonroot-user.sh:

  • Safe repeatable non-root onboarding dry-run (create/verify/lock/offboard) with /tmp hygiene, k8s secret fallback, evidence (lldap_identity_verified, keycape_oidc_claims_verified, effective_access_summary, no_secret_material_recorded, actor_class, groups, lock_offboard_result, 12+ bools), validate via make/console.
  • Console: onboarding-dry-run*, claims verification (T05 helper), lifecycle-cleanup-dryrun-users.
  • Make targets + runbook entry for web-ui.
  • Integrated into lifecycle-guide.
  • Proves IAM-lifecycle contract; foundation for future UE-backed version (per assessment).
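The workspace hygiene of the dry-run can be mirrored in Python for illustration. This is an analogue of the pattern, not a port of dry-run-nonroot-user.sh: the step names, the callable signature, and the stop-at-first-failure rule are assumptions, and `TemporaryDirectory` stands in for the script's /tmp workspace plus EXIT trap.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def run_dry_run(steps: dict[str, Callable[[Path], bool]]) -> dict[str, bool]:
    """Run steps in order inside a throwaway workspace; stop at the first failure."""
    results: dict[str, bool] = {}
    # The context manager removes the workspace even if a step raises.
    with tempfile.TemporaryDirectory(prefix="netkingdom-dryrun-") as workspace:
        workspace_path = Path(workspace)
        for name, step in steps.items():
            try:
                results[name] = bool(step(workspace_path))
            except Exception:
                results[name] = False
            if not results[name]:
                break
    # Record non-secret proof that nothing was left behind.
    results["workspace_removed"] = not workspace_path.exists()
    return results
```

A caller would pass steps such as create, verify, lock, and offboard as functions that receive the workspace path and return a boolean. A real orchestrator would likely still attempt offboarding after a failed verification so that no test user is left behind; this sketch omits that for brevity.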

Operational Assumptions and Rebuild Notes

  • Requires live Railiance/k3s cluster with required addons (ingress, cert-manager, etc.).
  • Operator has kubectl access to the target namespaces, plus access to the password safe for bootstrap material and to the age keys.
  • All privileged flows show effective preview; MFA for privileged; no platform-root grants to non-king.
  • Evidence always non-secret; secrets in safe/k8s only.
  • For a scratch rebuild: follow the T05 guide (once complete), collect evidence per step, and consult the T09 risk assessment. Use the 0019 dry-run tooling as a model for safe user lifecycle tests. Rehearse first in an isolated namespace or scripted run (non-goal: destructive live operations).
  • Audit currently pragmatic for bootstrap (documented above); production correlation is follow-up.

References:

  • platform-identity-security-architecture.md, responsibility-map.md, SCOPE.md
  • docs/user-engine-netkingdom-integration-assessment.md + canon/standards/user-engine-boundary-contract_v0.1.md
  • security-bootstrap-*.md family (operator-journey, openbao-ceremony-ux, user-lifecycle, handover-cleanup, king-credential-kit, age-custody, etc.)
  • tools/security-bootstrap-console/security_bootstrap_console.py (and Makefile targets)
  • sso-mfa/k8s/lldap/{create-user.sh,dry-run-nonroot-user.sh}, keycape/verify-, privacyidea/check-
  • local-identity/ (audit.py, etc.)
  • .local/security-bootstrap.json (current gates)
  • NET-WP-0017, 0019 workplans + their evidence
  • DECISIONS.md, ADRs (e.g. 0007, 0010), canon/standards/iam-profile_v0.2.md

This document will be updated as T03 retrospective, T05 guide, T06/T08 work, and T09 risk assessment proceed. It is the single source for "what the running system actually is" for rebuild guidance.