---
id: NET-WP-0017
type: workplan
title: "IT Security Readiness For User Onboarding"
domain: netkingdom
repo: net-kingdom
status: active
owner: codex
topic_slug: netkingdom
created: "2026-05-26"
updated: "2026-06-01"
depends_on:
- NET-WP-0015
- NET-WP-0016
- RAIL-PL-WP-0002
state_hub_workstream_id: "385de708-fd59-4bab-a4f4-28c1c476b3ea"
---
# NET-WP-0017 - IT Security Readiness For User Onboarding
## Goal
Finish the remaining NetKingdom and Railiance security setup needed before
ordinary platform users, tenant admins, or fabric admins are onboarded.
`NET-WP-0015` established the king credential, OpenBao bootstrap ceremony, and
guided control surface. This workplan is the narrower finish-line plan: routine
admin access must use NetKingdom identity, bootstrap-era material must be
retired or explicitly accepted, audit/recovery posture must be credible, and a
first non-root onboarding dry run must prove the lifecycle model.
## Current Evidence
- `platform-root` exists in LLDAP, belongs to `net-kingdom-admins`, has MFA,
and completed KeyCape OIDC login.
- Railiance OpenBao is initialized, unsealed, and post-unseal verified.
- OpenBao initial configuration was applied; `platform/` KV and Kubernetes auth
exist.
- The initial OpenBao root token is recorded as revoked.
- Trial unseal shares were rotated.
- The KeyCape `openbao-admin` client is live and verified, including the public
`https://kc.coulomb.social` route and certificate.
- OpenBao OIDC auth configuration is applied; MFA-backed OpenBao admin login
completed successfully and the resulting token lookup showed the
`platform-admin` policy for `platform-root`.
- Declarative local OpenBao audit and authenticated audit visibility are
complete; enterprise durable tenant-aware audit retention has been split into
the standalone `audit-core` product. Residual taint closeout,
cleanup/rotation, and the first ordinary-user onboarding dry run are still
pending.
## Tasks
### T01 - Finish OIDC-Backed OpenBao Admin Login
```task
id: NET-WP-0017-T01
status: done
priority: high
state_hub_task_id: "9b087bbd-631b-4316-b94d-a8265a05b065"
```
Run the fixed OpenBao OIDC helper, record the non-secret completion flag, then
verify `platform-root` can complete:
```bash
bao login -method=oidc -path=keycape role=platform-admin
```
The verification must prove the resulting OpenBao token has the intended
`platform-admin` policy without relying on the initial root token or a manually
minted temporary operator token.
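The policy proof above can be scripted as a filter over the JSON output of `bao token lookup`. The following is a minimal sketch rather than the Railiance helper: `has_policy` is a hypothetical name, and it assumes the JSON output carries a `policies` array. It is a textual match built from `tr` and `grep`, not a full JSON parse, so the policy name must not contain regular-expression metacharacters.

```bash
# Hypothetical helper: succeed only if the token-lookup JSON on stdin
# lists the given policy inside its "policies" array.
has_policy() {
  local policy="$1"
  # Strip whitespace so the array sits on one line, then look for the quoted
  # policy name before the array's closing bracket.
  tr -d ' \n\t' | grep -q "\"policies\":\[[^]]*\"${policy}\""
}

# Usage against a live session (not run here):
#   bao token lookup -format=json | has_policy platform-admin && echo "policy present"
```

Because the match stops at the first `]`, a `platform-admin` string that appears only in token metadata does not satisfy the check.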
**2026-05-29:** DNS and ACME issuance for `kc.coulomb.social` are healthy:
cert-manager issued `kc-tls`, and `sso-mfa/k8s/keycape/verify-openbao-client.sh`
passes against the live KeyCape route. `configure-openbao-oidc.sh` has applied
the OpenBao `auth/keycape` OIDC configuration and `platform-admin` role. The
remaining T01 gate is the human browser login with MFA and a token lookup that
shows the expected OpenBao `platform-admin` policy.
**2026-06-01:** Added a guided console recovery action for the observed
privacyIDEA state-loss blocker: if the live instance lacks the `coulomb` realm,
LLDAP resolver, or self-service policies, the operator can run **Repair
privacyIDEA realm and self-service** from **Usecases & Runbooks**. The action
does not store secrets; it calls `repair-realm-live.sh`, prompts live, creates
temporary env files for `bootstrap-realm.sh`, removes them on exit, and then
runs `verify-t06.sh`. After repair, `platform-root` TOTP
enrollment/re-enrollment and the MFA-backed `bao login` proof are still
required.
**2026-06-01:** Fixed the follow-up OpenBao OIDC token exchange
`user not found` error caused by live `keycape-config` drift: the Secret had
lost the non-secret LLDAP lookup fields `userOU: ou=people` and
`groupOU: ou=groups`. The KeyCape live patch helper now enforces those fields
alongside the `openbao-admin` client, the live Secret was patched, KeyCape was
restarted, and `verify-openbao-client.sh` passes again.
**2026-06-01:** Deployed a KeyCape runtime lookup fix for the remaining
`user not found` token-exchange failure after config drift was ruled out. The
LDAP adapter now treats provisioning metadata validation failures as runtime
warnings instead of blocking token issuance for an otherwise resolved LLDAP
user. The patched image `main-runtime-lookup-0601` is live and
`verify-openbao-client.sh` passes after rollout.
**2026-06-01:** Deployed the follow-up KeyCape OIDC nonce fix after OpenBao
rejected the exchanged ID token with `invalid id_token nonce`. KeyCape now
persists the original authorization `nonce` through pending state and the
authorization-code session, then emits it in the ID token. The patched image
`main-nonce-0601` is live, reports 1/1 ready, and `verify-openbao-client.sh`
passes after rollout.
**2026-06-01:** Fixed the next OpenBao role configuration failure,
`error converting claim 'groups' to string`. KeyCape correctly emits `groups`
as an array for `groups_claim`; OpenBao only failed because the role also copied
that array through scalar `claim_mappings`. The helper now leaves groups in
`groups_claim`/`bound_claims` and maps only scalar `email` and
`preferred_username` metadata.
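The corrected split between array and scalar claims might look like the role payload below. This is an illustrative sketch, not the contents of `configure-openbao-oidc.sh`: the function name `render_platform_admin_role`, the `user_claim` choice, the bound group value, the `username` metadata key, and the redirect URI are assumptions. Only the use of `groups_claim`/`bound_claims` for groups and scalar `claim_mappings` for `email` and `preferred_username` comes from the note above.

```bash
# Hypothetical renderer for an OIDC role payload. The array-valued "groups"
# claim is referenced only by groups_claim and bound_claims; claim_mappings
# carries scalar claims only.
render_platform_admin_role() {
  cat <<'EOF'
{
  "role_type": "oidc",
  "user_claim": "sub",
  "groups_claim": "groups",
  "bound_claims": { "groups": ["net-kingdom-admins"] },
  "claim_mappings": {
    "email": "email",
    "preferred_username": "username"
  },
  "allowed_redirect_uris": ["http://localhost:8250/oidc/callback"],
  "token_policies": ["platform-admin"]
}
EOF
}

# Usage sketch (not run here; values above are assumptions):
#   render_platform_admin_role > role.json
#   bao write auth/keycape/role/platform-admin @role.json
```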
**2026-06-01:** The operator reached the OpenBao success page, "Signed in via
your OIDC provider", after reapplying the corrected role. The follow-up
terminal proof showed `token_policies`/`policies` containing `platform-admin`,
`token_meta_role: platform-admin`, and `token_meta_username: platform-root`.
T01 is closed. Treat the pasted short-lived token as disclosed: revoke it or
let it expire after the check.
### T02 - Close OpenBao Audit And Recovery Production Gates
```task
id: NET-WP-0017-T02
status: in_progress
priority: high
state_hub_task_id: "909944bd-843a-4a63-8c87-536cea052a88"
```
Resolve the remaining OpenBao production-trust gates:
- configure audit declaratively if API-managed audit remains rejected;
- record the interim Audit Core interface used before enterprise durable audit
retention is implemented;
- hand off durable tenant-aware audit shipping beyond the audit PVC to
`audit-core`;
- retain non-secret restore-drill evidence and repeat the drill if any
material changed;
- record emergency seal/unseal drill evidence; and
- identify the next independent escrow holder for moving beyond temporary
single-king custody.
**2026-06-01:** Started the OpenBao audit/recovery closeout. Railiance source
now has a declarative OpenBao file-audit stanza in
`helm/openbao-values.yaml`, and its initial-config helper now verifies
`bao audit list` instead of trying to create audit devices through the API.
The Railiance post-unseal verifier also warns when
`/openbao/audit/openbao-audit.log` is missing or empty. Live non-secret
checks still show OpenBao healthy and unsealed with Bound data/audit PVCs, but
the live Helm values do not yet include the declarative audit stanza and the
audit directory is empty. Do not move production secrets into OpenBao until a
planned Helm rollout is performed with unseal shares available, `file/` audit
is visible, an audit log is written, durable audit shipping beyond the PVC is
selected, and restore/emergency drill evidence plus a next escrow holder are
recorded.
**2026-06-01:** Completed the attended live rollout of the Railiance
declarative file-audit configuration. The Helm release was upgraded, the
`OnDelete` StatefulSet pod was deliberately recycled, the operator unsealed the
new pod, and `make openbao-verify-post-unseal` now reports OpenBao `2.5.4`,
`Sealed: false`, an audit directory, and a non-empty
`/openbao/audit/openbao-audit.log`. The Railiance source now pins the live
OpenBao image tag to `2.5.4` after the chart upgrade advanced the runtime from
`2.5.3`; a follow-up Helm revision 3 applied the explicit tag while the pod
remained ready. T02 remains open for the authenticated `bao audit list` proof,
durable audit shipping beyond the audit PVC, restore-drill evidence, emergency
seal/unseal drill evidence, and the next independent escrow holder.
**2026-06-01:** Added a Railiance evidence-only helper for the authenticated
OpenBao proof: `make openbao-verify-authenticated` prompts for an approved
OpenBao token without echoing it and verifies `file/` audit visibility,
`platform/` secrets, `kubernetes/` auth, `keycape/` auth, and a non-empty audit
log without mutating OpenBao configuration. The helper can also reuse a
still-valid token from the pod token helper with
`OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper`, which avoids moving the token
through the local shell. It is ready to run with the MFA-backed
`platform-root`/`platform-admin` path. Durable audit shipping remains open: the
audit PVC is not a durable sink, and non-secret evidence hashes or State Hub
notes are not substitutes for retained audit log custody.
**2026-06-01:** Completed the authenticated OpenBao proof through the
MFA-backed KeyCape path without printing token material. A fresh
`bao login -no-print -method=oidc -path=keycape role=platform-admin` browser
flow cached the token in the pod token helper, then `make openbao-verify-authenticated
OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper` passed. Evidence: OpenBao is
unsealed on `2.5.4`, `file/` audit is visible, `platform/` secrets are visible,
`kubernetes/` and `keycape/` auth methods are visible, and the audit log grew
from 7969 bytes to 23330 bytes during the check. The cached verifier token was
then revoked with `bao token revoke -self`. T02 remains open for durable audit
shipping beyond the audit PVC, restore-drill evidence, emergency seal/unseal
drill evidence, and the next independent escrow holder.
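The audit-log growth evidence above (7969 bytes before, 23330 bytes after) can be captured with a before/after size comparison. This is a minimal sketch, assuming the log is readable at a local path; `log_size` and `audit_log_grew` are hypothetical names, and in the live cluster the same reads would presumably run inside the OpenBao pod.

```bash
# Print the size of a file in bytes. The arithmetic expansion trims any
# padding that some wc implementations add.
log_size() {
  echo $(( $(wc -c < "$1") ))
}

# Succeed only if the file is now larger than the previously recorded size.
audit_log_grew() {
  local path="$1" before="$2"
  [ "$(log_size "$path")" -gt "$before" ]
}

# Usage sketch (not run here):
#   before=$(log_size /openbao/audit/openbao-audit.log)
#   ... perform an authenticated request ...
#   audit_log_grew /openbao/audit/openbao-audit.log "$before" && echo "audit log grew"
```

Recording only the two byte counts keeps the evidence non-secret, since no log content leaves the pod.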
**2026-06-01:** Split enterprise audit retention out of this task and into the
new standalone `/home/worsch/audit-core` repo. `audit-core` now has
`INTENT.md`, a product requirements definition, and a minimal replaceable mock
backend that writes JSONL audit events to
`/tmp/audit-core/audit-YYYYMMDDTHH.jsonl` and cleans up files older than seven
days. A smoke event for the OpenBao authenticated readiness proof was written
through the mock interface, and `audit-core` tests pass. This mock backend is
acceptable for bootstrap/development wiring and NetKingdom UI integration, but
it is not durable audit custody and must not be presented as enterprise
retention. NET-WP-0017-T02 now treats the full tenant-aware durable audit
fabric as an `audit-core` follow-up rather than an OpenBao bootstrap subtask.
Remaining T02 gates are restore-drill evidence, emergency seal/unseal drill
evidence, the next independent escrow holder, and an explicit risk note if
ordinary onboarding proceeds before the production Audit Core sink exists.
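The mock backend behavior described above (hourly JSONL files plus seven-day cleanup) can be illustrated in a few lines of shell. This is a sketch of the pattern, not the `audit-core` implementation: `write_audit_event` and the `AUDIT_DIR` variable are hypothetical, UTC hour boundaries are an assumption, and the event is appended verbatim without validation.

```bash
# Hypothetical default location, matching the path named in the note above.
: "${AUDIT_DIR:=/tmp/audit-core}"

# Append one JSON event as a single line to the current hourly file, then
# prune old files.
write_audit_event() {
  local event_json="$1"
  mkdir -p "$AUDIT_DIR"
  # One file per UTC hour, for example audit-20260601T14.jsonl.
  local file="$AUDIT_DIR/audit-$(date -u +%Y%m%dT%H).jsonl"
  printf '%s\n' "$event_json" >> "$file"
  # Remove files last modified more than seven days ago (find rounds file
  # age down to whole days before comparing).
  find "$AUDIT_DIR" -type f -name 'audit-*.jsonl' -mtime +7 -exec rm -f {} +
}

# Usage sketch:
#   write_audit_event '{"event":"openbao-authenticated-readiness-proof"}'
```

The cleanup step is exactly why such a backend cannot count as audit custody: retention is bounded by design and lives on a temporary filesystem.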
**2026-06-01:** Tightened the restore-drill evidence gate. The local bootstrap
metadata currently says `restore_drill_passed: true`, but that checkbox alone
does not preserve enough non-secret evidence for review. Railiance now has a
restore evidence JSON template and a `make openbao-validate-restore-evidence`
validator that checks for snapshot hashes, encrypted-snapshot hash/location,
isolated restore completion, unseal/status/test-secret verification, isolated
environment destruction, and `no_secret_material_recorded`. The NetKingdom
control surface now includes a **Validate restore drill evidence** runbook
card. T02 should not count the restore gate closed until a real non-secret
evidence file from the prior or repeated drill passes that validator.
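The kind of check such a validator performs can be sketched as follows. This is not the Railiance validator or its template: apart from `no_secret_material_recorded`, every field name below is a hypothetical stand-in for the items listed above, SHA-256 hex is an assumed hash format, and the matching is a crude `grep` over a flat JSON file rather than a real JSON parse.

```bash
# Hypothetical check: every listed evidence field must be present and valid.
validate_restore_evidence() {
  local file="$1" key missing=0
  # Boolean gates that must be literally true.
  for key in isolated_restore_completed unseal_verified status_verified \
             test_secret_verified isolated_environment_destroyed \
             no_secret_material_recorded; do
    if ! grep -Eq "\"${key}\"[[:space:]]*:[[:space:]]*true" "$file"; then
      echo "missing or not true: ${key}" >&2
      missing=1
    fi
  done
  # Hash fields must hold 64 lowercase hex digits (assumed SHA-256).
  for key in snapshot_sha256 encrypted_snapshot_sha256; do
    if ! grep -Eq "\"${key}\"[[:space:]]*:[[:space:]]*\"[0-9a-f]{64}\"" "$file"; then
      echo "missing or malformed: ${key}" >&2
      missing=1
    fi
  done
  # The encrypted snapshot location must be a non-empty string.
  if ! grep -Eq "\"encrypted_snapshot_location\"[[:space:]]*:[[:space:]]*\"[^\"]+\"" "$file"; then
    echo "missing: encrypted_snapshot_location" >&2
    missing=1
  fi
  return "$missing"
}
```

A bare `restore_drill_passed: true` flag would fail this check, which is the point of the tightened gate.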
### T03 - Close Trial Taint And Retire Bootstrap Admin Paths
```task
id: NET-WP-0017-T03
status: todo
priority: high
state_hub_task_id: "a6cd4325-8f3b-46bb-b810-ca816c35cb29"
```
Review all access paths created during the trial exposure. Record the
compromise response as complete only after the operator has rotated, revoked,
reset, or explicitly accepted residual risk for each of the following:
- temporary OpenBao `platform-admin` tokens;
- bootstrap/root-token-derived paths;
- early LLDAP/Authelia/KeyCape admin credentials;
- local plaintext secret workspaces;
- bootstrap service tokens; and
- any copied command output or local shell history that may contain secret
values.
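One way to keep the closeout honest is a small non-secret ledger with one line per access path and its disposition, checked mechanically before the response is recorded as complete. The sketch below assumes a hypothetical tab-separated file (`path<TAB>disposition`); `check_taint_ledger` and the four disposition keywords are illustrative names taken from the wording above, not an existing NetKingdom format.

```bash
# Hypothetical ledger check: fail if any access path lacks one of the four
# accepted dispositions. Blank lines are ignored.
check_taint_ledger() {
  awk -F'\t' '
    NF == 0 { next }
    $2 !~ /^(rotated|revoked|reset|accepted)$/ {
      printf "unresolved: %s\n", $1
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Usage sketch:
#   check_taint_ledger taint-ledger.tsv && echo "taint closeout complete"
```

The ledger should name access paths only, never the credential values themselves.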
### T04 - Harden Bootstrap Infrastructure Before User Onboarding
```task
id: NET-WP-0017-T04
status: todo
priority: high
state_hub_task_id: "12c31f76-68f4-4d2b-853a-f3185cfc761c"
```
Complete the minimum hardening before ordinary users are onboarded:
- restrict direct administrative access to LLDAP and privacyIDEA to approved
operator networks or tunnels;
- verify no privileged login path bypasses MFA for platform-admin authority;
- rotate or reset bootstrap-era database, admin, and service credentials that
were created before custody was established;
- confirm host/workload checks and vulnerability scans are run or explicitly
deferred with owner/date; and
- update the bootstrap console state to `cleanup_complete` only when these
checks are recorded.
### T05 - Implement First User Lifecycle Operator Flow
```task
id: NET-WP-0017-T05
status: todo
priority: high
state_hub_task_id: "aec3ac45-18be-4b04-a863-0c8c70693739"
```
Turn the documented user lifecycle UX into the first practical operator flow
for:
- onboarding a scoped non-root user;
- temporarily locking that user;
- permanently offboarding that user;
- reviewing credentials and MFA state; and
- creating a fabric/tenant admin without platform-root authority.
The flow can begin as console/UI action cards, but it must show effective
access before saving and must not expose secrets.
### T06 - Run A Non-Root Onboarding Dry Run
```task
id: NET-WP-0017-T06
status: todo
priority: high
state_hub_task_id: "c149b2f0-c9ee-4c95-a1df-b25ed0d20579"
```
Create a test or first real non-root user using the new lifecycle flow. Verify:
- LLDAP identity and groups;
- MFA enrollment through privacyIDEA;
- KeyCape OIDC claims;
- expected application or platform scope;
- no platform-root or OpenBao root authority;
- lock/offboard path can be exercised or simulated; and
- non-secret audit/progress evidence is recorded.
This is the final gate before declaring the platform ready for normal user
onboarding.
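For the KeyCape claims check and the no-root-authority check, the ID token payload can be decoded and inspected locally. This is a minimal sketch, assuming an ID token is already in hand and that platform-root authority is conveyed by the `net-kingdom-admins` group; `jwt_payload` and `lacks_admin_group` are hypothetical helpers. The decode does not verify the token signature, so it is suitable for inspecting claims in a dry run, not for making trust decisions.

```bash
# Decode the payload (second segment) of a JWT without verifying its signature.
jwt_payload() {
  local payload
  # Translate the base64url alphabet back to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding drops.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Succeed only if the decoded claims do not mention the admin group.
lacks_admin_group() {
  ! jwt_payload "$1" | grep -q '"net-kingdom-admins"'
}

# Usage sketch (the variable name is illustrative):
#   lacks_admin_group "$ID_TOKEN" && echo "no platform admin group in claims"
```

Treat any decoded token as disclosed once it has passed through a shell variable or terminal, in line with the handling of the pasted token in T01.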
### T07 - Review And Retire Superseded Bootstrap Workplans
```task
id: NET-WP-0017-T07
status: todo
priority: medium
state_hub_task_id: "e9ceafb2-14c0-4352-9ac7-e31628feb045"
```
After T01-T06 complete, review `NET-WP-0015`, `NET-WP-0016`,
`RAIL-PL-WP-0002`, and older NetKingdom credential/bootstrap workplans.
Mark completed work as finished or archived, and leave only longer-horizon items
such as multi-custodian upgrade, enterprise federation, dynamic database
credentials, object-storage STS vending, and application onboarding contracts.
## Acceptance Criteria
- Routine OpenBao administration works through NetKingdom/KeyCape OIDC and MFA.
- The initial root token and temporary OpenBao admin tokens are not normal
operating paths.
- Audit, recovery, emergency seal, and restore evidence are recorded without
secret values.
- Bootstrap-era privileged credentials have been rotated, reset, revoked, or
explicitly accepted as residual risk.
- A non-root user onboarding dry run succeeds and proves lock/offboard/review
paths.
- The bootstrap console can honestly move beyond Admin Identity Integration
into cleanup and reopening.