diff --git a/docs/NetkingdomRuntimeArchitecture.md b/docs/NetkingdomRuntimeArchitecture.md new file mode 100644 index 0000000..dc30e80 --- /dev/null +++ b/docs/NetkingdomRuntimeArchitecture.md @@ -0,0 +1,205 @@ +# NetKingdom Runtime Architecture + +**Status:** draft (initial capture for NET-WP-0018-T02) +**Date:** 2026-06-03 +**Context:** Documents the *as-deployed* runtime after the first successful bootstrap (0015-0017) + T06-adjacent polish (0019). Not an idealized future architecture. Specific enough to guide a scratch rebuild or rehearsal without rediscovering integration details. Incorporates pragmatic audit paths and known UE integration points/gaps per the persisted assessment. + +This is the working system that survived the first bootstrap ceremony and is now the target for automation, validation, guide, and risk assessment in NET-WP-0018. + +## Planes Model + +(From platform-identity-security-architecture.md baseline) + +- **Bootstrap plane**: Establishes initial trust before full platform services. Minimal authority for cluster access, initial identity/secret injection, break-glass recovery, transition to managed runtime. Owned by railiance-infra/cluster + net-kingdom credential bootstrap. Uses SOPS/age for at-rest + offline packets. +- **Platform control plane**: Shared security services (identity, MFA, secrets, policy, audit, authorization). net-kingdom owns canonical architecture/IAM Profile/SSO/MFA/bootstrap decisions; deployed via Railiance stack. +- **Tenant planes**: Workloads (Coulomb as tenant zero/reference). Must not alter platform root trust. + +Recursive trust rule: Normal tenant admin (even Coulomb) must never suffice to alter platform root of trust (IAM Profile semantics, break-glass, global MFA, OpenBao root/unseal, flex-auth policy pipelines, audit retention, etc.). 
+ +## Identity Stores, MFA Realms, and OIDC Flows + +**Lightweight mode (key-cape, current primary for bootstrap/internal):** +- Directory: LLDAP (https://lldap.coulomb.social for admin; internal for Authelia). +- SSO/Proxy: Authelia (over LLDAP). +- MFA/Token: privacyIDEA (self-service enrollment for TOTP; pi-admin for setup/repair; used for assurance on privileged actions). +- OIDC Provider: KeyCape (issuer https://kc.coulomb.social; conforms to NetKingdom IAM Profile v0.2). + - KeyCape issues tokens with required claims: tenant, principal_type, groups, roles, scope/scp, assurance. + - Registered clients include: netkingdom-bootstrap-console (for console OIDC login), openbao-admin (for OpenBao OIDC auth). + - Redirects: http://localhost:8250/oidc/callback, http://127.0.0.1:8250/oidc/callback. +- Groups/roles for bootstrap: net-kingdom-admins (for platform-admin OpenBao policy), net-kingdom-users (for scoped non-root). +- platform-root / king credential: dedicated LLDAP user (separate from personal accounts like tegwick). Password in operator password safe; TOTP via privacyIDEA; roles include platform-root-custodian, openbao-admin, identity-admin. + +**Expanded mode:** Keycloak (for enterprise federation/SAML/Entra, complex realms, delegated admin). Not yet primary for bootstrap. + +**Capability progression (C1 lightweight -> C2 MFA/token):** +- C1: Single-factor OIDC SSO over internal directory (key-cape: Authelia + LLDAP). +- C2a (light 2FA): Authelia built-in TOTP/WebAuthn. +- C2b (token authority): privacyIDEA for hardware tokens, many types, self-service, lifecycle. + +Applications target the IAM Profile v0.2 contract (`canon/standards/iam-profile_v0.2.md`), not concrete providers. + +**Token flows (high level):** +- Human/service -> Authelia/LLDAP or Keycloak -> KeyCape/Keycloak issues IAM Profile token -> claims to flex-auth (for authz) or directly to protected services / OpenBao OIDC. 
+- For bootstrap console: OIDC login verified to obtain platform-admin via KeyCape -> OpenBao. + +## Authelia Handoff + +Authelia acts as the SSO proxy/authenticator in lightweight mode, fronting the LLDAP directory and (where enabled) privacyIDEA MFA. It hands off the normalized identity to KeyCape for OIDC issuance. It is used for day-to-day logins; email (e.g. bernd.worsch@gmail.com) is notification-only, not an auth source for privileged/root access. + +## OpenBao OIDC Admin Path and Secrets/Credential Path + +**OpenBao as runtime secrets authority (post-bootstrap):** +- KV v2 for platform config. +- Dynamic DB creds, K8s auth/workload identity. +- Future object storage STS brokering. +- Audit devices, lease/revocation. +- Delivery: direct clients, External Secrets Operator -> K8s Secrets, CSI mounts. +- Auth: OIDC/JWT against KeyCape (maps claims/groups to policies, e.g. platform-admin for net-kingdom-admins group). +- platform-root can obtain platform-admin policy via KeyCape/MFA (proven in 0015/0017). +- Root token: revoked/dispositioned after init; used only for bootstrap/break-glass. Unseal keys in custody (age/SOPS protected, offline packets, king credential). + +**Bootstrap to runtime transition:** +- SOPS/age for initial cluster secrets, emergency bundles, Git at-rest. +- Once OpenBao alive + configured (auth, mounts, policies, audit): switch to it as long-lived authority. +- Bootstrap-era creds/databases/access paths reviewed/rotated/cleaned before production reliance (see cleanup_complete, T03/T04 in 0017). + +**Platform root custody (see docs/platform-root-custody.md):** +- Initial setup operator: tegwick / bernd.worsch@gmail.com (notification contact). +- King credential: dedicated, rarely used platform-root identity (break-glass only). Not day-to-day Gitea/email account. +- Temporary single-king custody (with MFA, encrypted offline, password-safe refs) allowed pre-prod; target two-of-three escrow. 
+- Never store unseal/root/OTP/private keys in Git, State Hub, email, shell history, etc. + +## Bootstrap UI / Console State (Control Surface) + +Implemented in `tools/security-bootstrap-console/security_bootstrap_console.py` (non-secret only; refuses live OpenBao init or secret collection). + +**Current stage (post 0017/0019):** S6 - Reopen under custody. + +**Key gates / posture (from metadata + console):** +- King credential kit prepared. +- Custody strategy approved (temporary-single-king). +- OpenBao preflight, init ceremony (attended only), initial config, KeyCape client, OIDC auth, admin login via KeyCape/MFA, root token disposition (revoked), restore drill, cleanup/rotation, platform reopened. +- audit_core_posture: bootstrap risk accepted (production sink not ready); owner, review date (2026-07-02), note recorded. See audit_core_posture_ready() / reason(). +- Other: custodian age keys confirmed, mfa enrolled (TOTP via privacyIDEA), oidc_login_verified, no_secret_capture_confirmed, etc. +- .local/security-bootstrap.json holds non-secret flags (updated via console approve/validate flows). + +**Available actions (status output + parser):** king-kit, custody-packet, openbao-preflight, handover-checklist, validate-* (t02, cleanup, lifecycle-flow, onboarding-dry-run), custody-roster-template, lifecycle-flow-template, lifecycle-guide, onboarding-dry-run-template, onboarding-dry-run (delegates to orchestrator), onboarding-dry-run-claims, lifecycle-cleanup-dryrun-users, validate-custody-roster, metadata-template, approve-custody-mode, web-ui, etc. + +**Web UI:** Served locally (default :8765 or similar); forms for custody approval, responsibility, audit_core flags (production_sink_ready, bootstrap_risk_accepted + owner/review/note), cleanup_complete, platform_reopened. Uses JS to compute gates from metadata. 
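
The audit_core gate described above (production sink ready, or bootstrap risk accepted with owner, review date, and note) can be sketched as follows. This is a minimal illustration, not the console's actual `audit_core_posture_ready()` logic: the `AuditCorePosture` class, the `posture_reason` / `posture_ready` helpers, and the rule that an elapsed review date blocks the gate are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AuditCorePosture:
    """Non-secret audit_core flags (hypothetical container for this sketch)."""
    production_sink_ready: bool = False
    bootstrap_risk_accepted: bool = False
    owner: str = ""
    review_date: Optional[date] = None
    note: str = ""


def posture_reason(p: AuditCorePosture, today: date) -> str:
    """Explain why the gate is open or blocked (hypothetical helper)."""
    if p.production_sink_ready:
        return "production sink ready"
    if not p.bootstrap_risk_accepted:
        return "blocked: sink not ready and bootstrap risk not accepted"
    fields = (("owner", p.owner.strip()),
              ("review_date", p.review_date),
              ("note", p.note.strip()))
    missing = [name for name, value in fields if not value]
    if missing:
        return "blocked: risk acceptance missing " + ", ".join(missing)
    # Assumption: an accepted risk stops counting once its review date passes.
    if p.review_date < today:
        return "blocked: risk acceptance review date has passed"
    return "bootstrap risk accepted (production sink not ready)"


def posture_ready(p: AuditCorePosture, today: date) -> bool:
    """True when the gate is open for the given day."""
    return not posture_reason(p, today).startswith("blocked")
```

Keeping the reason string next to the boolean means a status view and a gate check cannot drift apart, since both derive from the same function.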
+ +**Runbooks / payloads:** privacyIDEA realm repair, Key material compromised (taint), generate new unseal keys, emergency lock-down, restore drill, OpenBao token revocation, **User lifecycle dry-run (T06)** (from 0019: references dry-run-nonroot-user.sh, make security-bootstrap-onboarding-dry-run, console subcmds, NET-WP-0019). + +**0019 polish additions (T06-adjacent):** +- dry-run-nonroot-user.sh orchestrator (/tmp workspace + EXIT trap cleanup; k8s fallback for LLDAP_ADMIN_PASS never writing persistent bootstrap/secrets for test users; create --test non-root; verifications (MFA, KeyCape); optional GraphQL lock/offboard; populate + validate evidence.json). +- Console subcommands + make targets for repeatable dry-run, claims verification (infers from LLDAP groups + T01 role binding; warns on platform-root/admins), cleanup by pattern. +- Evidence templates/validators for onboarding dry-run (12+ bools: effective access preview, no secret material recorded, actor_class != king, groups limited to net-kingdom-users, lldap/keycape verified, etc.). +- Integrated into lifecycle-guide (T06 DRY-RUN section) and runbook_payloads for web-ui exposure. +- Safer secret handling in create-user.sh (k8s extract fallback). + +**Evidence discipline:** /tmp/netkingdom-*-evidence.json (exact strings + bools); validated by console; non-secret only (refuses secret markers). + +## State Hub Relation + +- Tracks domain netkingdom (topic a6c6e745-bf54-4465-9340-1534a2be493e). +- Workstreams/tasks (e.g. this NET-WP-0018 id 800f9f16-..., 0019, 0017). +- Progress events (POST /progress/ with workstream/task for what was done; used for tracking impl + feeding retrospectives). +- Decisions (POST /decisions/ for key choices). +- Inbox for cross-agent coordination. +- .custodian-brief.md generated by fix-consistency (reflects file + DB). +- Used for audit correlation in pragmatic layer (events link to actors/decisions). 
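
The progress events listed above (POST /progress/ with workstream/task) might be assembled roughly as shown here. Only the `/progress/` path and the workstream/task pairing come from this document; the base URL, the JSON field names (`workstream_id`, `task_id`, `summary`), and the `build_progress_request` helper are assumptions for illustration, and the real State Hub schema may differ. The sketch only builds the request and never sends it.

```python
import json
import urllib.request


def build_progress_request(base_url: str, workstream_id: str,
                           task_id: str, summary: str) -> urllib.request.Request:
    """Build (but do not send) a progress event request.

    Field names are hypothetical; keep the summary non-secret, since
    State Hub must never receive unseal keys, tokens, or OTP material.
    """
    payload = {
        "workstream_id": workstream_id,
        "task_id": task_id,
        "summary": summary,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would be a separate `urllib.request.urlopen(request)` call; splitting build from send makes the payload easy to inspect in a dry run before anything leaves the operator machine.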
+ +## k8s / Deployment, DNS, Routes, Ingress, Trust Boundaries + +**Namespaces/components (from manifests + usage):** +- sso: LLDAP, privacyIDEA, KeyCape (keycape-config Secret), Authelia? +- openbao: OpenBao (0 pod; bao status via kubectl exec). +- Railiance platform services (DBs, etc.) for stateful backing. + +**Ingress / DNS (internal .coulomb.social):** +- LLDAP admin: lldap.coulomb.social +- KeyCape: kc.coulomb.social (OIDC issuer) +- Console OIDC callbacks: localhost/127.0.0.1:8250 +- Other platform services via railiance-cluster ingress + cert-manager + NetworkPolicies. + +**Trust boundaries / token flows (high level):** +- Bootstrap: local files (SOPS/age), operator password safe, k8s secrets (lldap-credentials etc.), direct kubectl for dry-run safety. +- Runtime: OIDC tokens (IAM Profile claims) -> KeyCape/Keycloak -> flex-auth decisions or OpenBao OIDC (group->policy e.g. net-kingdom-admins -> platform-admin). +- Workload identity: K8s auth to OpenBao for dynamic creds. +- No tenant can reach platform-root paths without explicit platform control plane authority + custody safeguards. +- Break-glass: king credential + unseal shares (custody roster, age protected). + +**Operational assumptions:** +- Live k3s/Railiance cluster with ingress/cert-manager/NetworkPolicy. +- Operator has kubectl access to sso/openbao ns + password safe for bootstrap material. +- Non-secret evidence + State Hub for progress; secrets never in Git/console. +- Capability-driven identity mode (lightweight key-cape sufficient for many cases). +- Audit is currently pragmatic/separate during bootstrap (see below). + +## Pragmatic Audit Paths (Current Bootstrap) + +Per Coordination Notes and assessment gap 7 (audit/event correlation): +- **local-identity/audit.py**: Append-only TSV ~/.local-identity/audit.log (TSV: TS command username outcome; mode 600; silent on I/O failure). For local-identity CLI + OIDC server events during bootstrap phase. 
+- **OpenBao audit**: Retained on audit PVC + Audit Core mock wiring only (production tenant-aware durable sink not ready; risk accepted with owner/review note). +- **State Hub + console evidence**: /progress/ events (with workstream/task/decision correlation), /decisions/, non-secret evidence.json (from templates + live data + validators), metadata flags (audit_core_*), runbook payloads. Used for impl tracking and bootstrap ceremony records. +- **Separate from UE**: Current bootstrap (direct LLDAP/Keycloak for platform-root/break-glass/test users in 0015-0019) does not yet route through user-engine AuditRecord/OutboxEvent or claims_enrichment. UE emits its own audit for domain facts when used. +- **Contract requirement**: Must link request/actor/decision/user_engine_audit/outbox_event (audit correlation bundle in boundary contract). Current is pragmatic/separate for bootstrap. + +Documented here for rebuild guidance. Proper integration (adapters + sinks) is follow-up (see assessment recs + T03/T09). + +## UE Integration Points and Known Gaps (from docs/user-engine-netkingdom-integration-assessment.md) + +**Fit (no intent conflicts):** +- net-kingdom: IAM orchestration + authn/coarse claims (IAM Profile) + bootstrap + secrets (OpenBao) + contract governance + meta-orchestration of user-domain facts. +- user-engine: headless user-domain backend (User/Account/ExternalIdentity/Membership with owning_system/source/freshness/delete_semantics, Application+Binding+Catalog, ProfileValue layered, EffectiveProfile+projections (incl. claims_enrichment, audit), ports (IdentityClaimsAdapter, AuthorizationCheckPort, SecretProvider, EventOutbox, AuditWriter), OutboxEvent/AuditRecord). In-mem MVP finished (USER-WPs); local standalone only currently. +- Integration via claims/contracts/adapters (no shared code). UE consumes verified Actor from claims; delegates authz to PDP (flex-auth); exports projections. 
NK orchestrates boundaries ("owner wins" for membership sync, app onboarding 8-step bindings as separate records, etc.). + +**Current points in runtime:** +- Bootstrap/T06 dry-runs + platform users use direct LLDAP/Keycloak (IAM side). +- KeyCape OIDC claims (groups + email) feed OpenBao policy and console verification (0019 helper). +- Claims enrichment not yet via UE projection + cache (direct LLDAP resolution in paths). +- Memberships (net-kingdom-* groups) treated as IAM facts; not yet synced as UE Membership with semantics. +- Audit separate (see above). + +**7 Gaps (biggest first; see full assessment for details/recs):** +1. Missing Platform Integration Adapters (UE side or symmetric): IdentityClaimsAdapter (KeyCape claims -> Actor), AuthorizationCheckPort (to flex-auth), SecretProvider (OpenBao), EventOutbox, AuditWriter, etc. Only mocks today. Blocks UE as canonical for user facts. +2. Bootstrap/Platform Users vs. Governed UE Lifecycle (direct LLDAP creates for root/break-glass/"net-kingdom-*" vs. UE Membership + externally_provisioned). +3. Application Onboarding "Application" concept (KeyCape OIDC client/secret vs. UE Application + Binding records; must stay separate). +4. Membership/Group Overlap (LLDAP groups vs. UE Membership scopes + owner/source). +5. Governance/Workplan/Brief Split (UE brief stale May22/domain=netkingdom; 0018/0019 as NK orchestration correct but line must stay crisp). +6. Claims Enrichment Path drift (current direct LLDAP in OIDC/bootstrap paths; must switch to adapter-owned when UE deployed; UE never in token critical path). +7. Other: Audit/event correlation (shared IDs across UE/flex/platform; current bootstrap separate/split to audit-core); tenant platform:root special case; no prod UE deployment in NK flows yet. + +**Recommendations (for 0018 context):** Use T02 to document current paths + gaps; T07/T08 as testbed for integration once adapters exist (e.g. update dry-run to exercise UE); T03/T09 to classify UE integration risk. 
NK keeps boundary/contracts/orchestration; UE owns domain impl. + +See canon/standards/user-engine-boundary-contract_v0.1.md, docs/user-engine-netkingdom-integration-assessment.md, responsibility-map.md, SCOPE.md, user-engine/INTENT.md + SCOPE.md for full. + +## Dry-Run / User Lifecycle Tooling (0019 Polish Additions) + +See NET-WP-0019 and sso-mfa/k8s/lldap/dry-run-nonroot-user.sh: +- Safe repeatable non-root onboarding dry-run (create/verify/lock/offboard) with /tmp hygiene, k8s secret fallback, evidence (lldap_identity_verified, keycape_oidc_claims_verified, effective_access_summary, no_secret_material_recorded, actor_class, groups, lock_offboard_result, 12+ bools), validate via make/console. +- Console: onboarding-dry-run*, claims verification (T05 helper), lifecycle-cleanup-dryrun-users. +- Make targets + runbook entry for web-ui. +- Integrated into lifecycle-guide. +- Proves IAM-lifecycle contract; foundation for future UE-backed version (per assessment). + +## Operational Assumptions and Rebuild Notes + +- Requires live Railiance/k3s cluster with required addons (ingress, cert-manager, etc.). +- Operator kubectl to target ns + access to password safe for bootstrap material + age keys. +- All privileged flows show effective preview; MFA for privileged; no platform-root grants to non-king. +- Evidence always non-secret; secrets in safe/k8s only. +- For scratch rebuild: follow T05 guide (once complete) + evidence per step + T09 risk assessment. Use 0019 dry-run tooling as model for safe user lifecycle tests. Rehearse in isolated/namespace/scripted first (non-goal: destructive live). +- Audit currently pragmatic for bootstrap (documented above); production correlation is follow-up. 
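
The evidence checks named in the dry-run tooling section (required booleans, actor_class not king, groups limited to net-kingdom-users, no secret material) can be sketched as a small validator. This is an illustration under assumptions, not the console's validator: the `validate_evidence` function, the subset of required keys chosen here, and the `SECRET_MARKERS` list are hypothetical, and the real evidence template has 12+ booleans.

```python
import json
from typing import List

# Keys taken from the evidence fields named in this document (subset only).
REQUIRED_TRUE = (
    "lldap_identity_verified",
    "keycape_oidc_claims_verified",
    "no_secret_material_recorded",
)
ALLOWED_GROUPS = {"net-kingdom-users"}
# Hypothetical marker list; the console's actual secret markers are not shown here.
SECRET_MARKERS = ("-----BEGIN", "otpauth://", "password=")


def validate_evidence(evidence: dict) -> List[str]:
    """Return a list of problems; an empty list means the evidence passes."""
    problems = []
    for key in REQUIRED_TRUE:
        if evidence.get(key) is not True:
            problems.append(f"{key} must be true")
    actor_class = evidence.get("actor_class")
    if not actor_class or actor_class == "king":
        problems.append("actor_class must be set and must not be king")
    extra = set(evidence.get("groups", [])) - ALLOWED_GROUPS
    if extra:
        problems.append("unexpected groups: " + ", ".join(sorted(extra)))
    # Scan the serialized evidence so nested values are covered as well.
    blob = json.dumps(evidence)
    for marker in SECRET_MARKERS:
        if marker in blob:
            problems.append(f"secret marker found: {marker}")
    return problems
```

Returning all problems at once, rather than failing on the first, suits a dry-run report where the operator wants the full picture in one pass.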
+ +**References:** +- platform-identity-security-architecture.md, responsibility-map.md, SCOPE.md +- docs/user-engine-netkingdom-integration-assessment.md + canon/standards/user-engine-boundary-contract_v0.1.md +- security-bootstrap-*.md family (operator-journey, openbao-ceremony-ux, user-lifecycle, handover-cleanup, king-credential-kit, age-custody, etc.) +- tools/security-bootstrap-console/security_bootstrap_console.py (and Makefile targets) +- sso-mfa/k8s/lldap/{create-user.sh,dry-run-nonroot-user.sh}, keycape/verify-*, privacyidea/check-* +- local-identity/ (audit.py, etc.) +- .local/security-bootstrap.json (current gates) +- NET-WP-0017, 0019 workplans + their evidence +- DECISIONS.md, ADRs (e.g. 0007, 0010), canon/standards/iam-profile_v0.2.md + +This document will be updated as T03 retrospective, T05 guide, T06/T08 work, and T09 risk assessment proceed. It is the single source for "what the running system actually is" for rebuild guidance. \ No newline at end of file diff --git a/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md b/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md index 1f85cf1..2e816e7 100644 --- a/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md +++ b/workplans/NET-WP-0018-bootstrap-automation-and-rebuild-readiness.md @@ -117,7 +117,7 @@ guide, UI automation, validations, and rebuild-risk assessment. ```task id: NET-WP-0018-T02 -status: todo +status: done priority: high state_hub_task_id: "121ee797-e3f5-4d3e-9baa-cfa8c92f8a66" ``` @@ -140,6 +140,10 @@ vs. UE Membership (owning_system etc.), bootstrap local-identity vs. UE local mode, and the boundary contract as the governance layer. Include refs to canon/standards/user-engine-boundary-contract_v0.1.md and the assessment. +**2026-06-03:** Started T02. Using pragmatic tracking (this note + will POST /progress/ with task). 
Gathering deployed components from existing docs, code, and configs to produce specific-as-deployed doc (not idealized). Will cover all listed items + pragmatic audit paths, dry-run 0019 additions, UE integration points/gaps per assessment. + +**2026-06-03:** T02 complete. Created docs/NetkingdomRuntimeArchitecture.md (comprehensive sections on planes model, identity stores/MFA/OIDC flows (lightweight key-cape: LLDAP at lldap.coulomb.social + Authelia + privacyIDEA + KeyCape issuer https://kc.coulomb.social with bootstrap clients), Authelia handoff, OpenBao OIDC admin + secrets/credential path (SOPS/age bootstrap -> runtime with K8s auth, ESO, leases), bootstrap console/UI state (S6 Reopen, full gates incl. audit_core_posture, 0019 dry-run orchestrator/console subcmds/make targets/evidence/validators/runbook entry), State Hub relation (progress/decisions for tracking), k8s/DNS/routes/ingress/trust boundaries (sso/openbao ns, recursive rule, concrete hosts), operational assumptions + rebuild notes). Explicitly includes current pragmatic audit paths (local-identity/audit.py TSV, OpenBao PVC+mock, State Hub/console evidence) and UE integration points + 7 gaps (from assessment + contract refs). Specific as-deployed for rebuild guidance. This doc now feeds T03 retrospective, T05 guide, T09 risk, and T02/T08 validation targets. + ### T03 - Produce A Bootstrap Retrospective And Automation Gap Matrix ```task