# INTENT

> This file captures **why this repository exists**, the **direction it is
> moving toward**, and the **kind of system it is meant to become**.
> It is intentionally **aspirational and stable**, not a description of
> current implementation. See `SCOPE.md` for what is implemented today.

---

## One-liner

**Operational access steward for the NetKingdom security model — knows the platform
credential lanes, keeps workload posture conformance aligned, and issues short-lived
SSH certificates where that lane belongs to ops-warden.**

---

## Why This Exists

Development workers — human operators, kaizen agents, CI automations, and
custodian tooling — need **safe, attributable access** across an increasingly
complex NetKingdom stack: identity, MFA, authorization, runtime secrets, SSH
reachability, and tunnel transport.

That stack is easy to misuse:

- static SSH keys and pasted API tokens in chat or Git
- wrong subsystem chosen for a credential need (OpenBao vs warden vs key-cape)
- drift between NetKingdom architecture canon and what operators actually run
- ad hoc rediscovery of bootstrap and custody rules every time a worker needs access
- unclear security blockers because dev/test/prod posture and workload maturity are
  not named before someone asks for real credentials

**ops-warden exists so operational access has a custodian-domain home** that
understands NetKingdom security infrastructure, routes workers to the right
subsystem, keeps local guidance current, and **directly operates only the SSH
short-lived certificate lane** it owns.

---

## The Mission

> *Where we are going.*

ops-warden **issues short-lived SSH certificates and routes every other credential
need to the subsystem that owns it.** It is not a desk that wraps the platform; it
owns one lane and points at the rest:

1. **Know** the NetKingdom security model — identity, authorization, secrets,
   SSH access, tunnels, bootstrap custody, and tenant/platform boundaries.
2. **Route, and assist.** Point workers to the correct subsystem for each credential
   type instead of becoming a universal secret vending machine — through the wiki and
   a machine-readable routing catalog that *points at* the owner's docs rather than
   restating them. Beyond pointing, **assist**: the `warden access` front door renders
   the exact auth method, path, and command for any need and — for `exec_capable`
   lanes — proxies the fetch *as the caller* (a transparent, policy-gated, audited
   conduit that holds, caches, and logs **nothing**). For **owner-native exec** lanes
   (secrets-engine `exec`, railiance-platform `credential exec`) ops-warden routes to
   the owner's front door — it does not mint tokens or run the owner's tool itself.
   This is the assist layer, not a universal broker: custody stays in OpenBao /
   secrets-engine / the platform broker; authorization in flex-auth.
3. **Steward workload security posture conformance.** Author the ops-security slice
   for environment posture (`dev/test/prod`) and workload maturity (`M0-M3`), then
   ship descriptors and read-only checks that identify whether a secret-flow blocker
   is real, owner-routed, or removable with a contract double. Runtime enforcement
   remains flex-auth; custody remains OpenBao.
4. **Align** runbooks, wiki, inventory patterns, and scorecard checks with
   NetKingdom canon as the platform evolves (OpenBao-first, flex-auth policy,
   key-cape IAM Profile, railiance deployment layers).
5. **Issue** short-lived SSH certificates for `adm` / `agt` / `atm` actors when
   host or ops reachability requires the SSH lane — via `warden sign`,
   `cert_command`, and `ops-ssh-wrapper`. This is the **only** lane ops-warden
   executes with its own authority.
6. **Audit** every ops-warden action — SSH signs, access proxy handoffs, worker
   coordination ticks — in one metadata-only trail (`warden activity`) so
   gatekeeping is observable, not tribal knowledge.
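
The three-way blocker classification in item 3 can be illustrated with a small read-only check. This is only a sketch under stated assumptions: the descriptor fields, the `classify_blocker` name, and the rule ordering are hypothetical and are not taken from the posture standard itself. It assumes that a dev-posture workload can always substitute a contract double, and that any other workload with a known owner route is owner-routed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Posture(Enum):
    DEV = "dev"
    TEST = "test"
    PROD = "prod"


class Verdict(Enum):
    REAL = "real"
    OWNER_ROUTED = "owner-routed"
    CONTRACT_DOUBLE = "removable-with-contract-double"


@dataclass(frozen=True)
class WorkloadDescriptor:
    """Hypothetical descriptor; field names are illustrative only."""
    name: str
    posture: Posture
    maturity: int                      # 0..3, standing in for M0-M3
    owner_route: Optional[str] = None  # subsystem that owns the secret, if known

    def __post_init__(self) -> None:
        if not 0 <= self.maturity <= 3:
            raise ValueError("maturity must be between 0 (M0) and 3 (M3)")


def classify_blocker(workload: WorkloadDescriptor) -> Verdict:
    """Read-only check: inspects the descriptor and touches no secret."""
    # Assumed rule: dev-tier workloads do not need real credentials.
    if workload.posture is Posture.DEV:
        return Verdict.CONTRACT_DOUBLE
    # Assumed rule: a named owner means the need is routed, not blocked here.
    if workload.owner_route:
        return Verdict.OWNER_ROUTED
    return Verdict.REAL
```

A real check would presumably weigh maturity and non-secret evidence as well; here maturity is only validated, to keep the sketch short.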

---

## NetKingdom Security Literacy

ops-warden should be fluent in the platform architecture documented in
`net-kingdom` — especially:

| Plane / component | Role in access | ops-warden relationship |
| --- | --- | --- |
| **key-cape / Keycloak** | Identity — who is the actor, MFA, IAM Profile claims | Instruct identity path; do not re-implement OIDC |
| **flex-auth + Topaz** | Authorization — may this actor perform this action | Caller-side policy gate shipped (opt-in); production flip is flex-auth's |
| **OpenBao** | Runtime secrets — API keys, dynamic creds, leases, audit | Instruct custody paths; SSH engine is signing backend only; proxy reads as caller when `exec_capable` |
| **secrets-engine** | Owner-native secret-exec (`secrets-engine exec`) | Route provisioned exec lanes (e.g. npm publish); ops-warden does not hold tokens |
| **railiance-platform** (credential broker) | Scoped lease grants (`credential exec`) | Route `warden-sign` token needs; ops-warden does not mint OpenBao tokens |
| **ops-warden** | Operational SSH certificates — short-lived host access | **Own and issue** this lane |
| **ops-bridge** | Tunnel transport — consumes certs via `cert_command` | Primary consumer; document integration |
| **railiance-infra** | Host principals, force-command, SSH hardening | Instruct host-side deployment; do not own Ansible |
| **railiance-platform** (deploy) | OpenBao/K8s/platform service deployment | Instruct production endpoints; do not deploy clusters |

Canonical references:

- `net-kingdom/docs/platform-identity-security-architecture.md`
- `net-kingdom/docs/responsibility-map.md`
- `wiki/AccessManagementDirective.md` (ops SSH actor model)

---

## Responsibility Boundary

### ops-warden owns

- NetKingdom-aligned **operational SSH access** guidance and stewardship
- **SSH certificate issuance** for registered `adm` / `agt` / `atm` actors
- Actor inventory, TTL/principal policy, cert-side scorecard, unified audit trail
- `cert_command` contract and `ops-ssh-wrapper` automation surface
- Keeping ops-warden docs and patterns aligned with NetKingdom security evolution
- Workload Security Posture standard, conformance descriptors/checks, and dev-tier
  contract-double guidance for secret-flow readiness
- Coordination worker stewardship — triage ops-warden's State Hub inbox with
  conservative defaults (draft-only unless `--full-auto`)

### ops-warden instructs but does not own

| Need | Route to |
| --- | --- |
| OIDC login, MFA, human identity claims | key-cape / Keycloak (NetKingdom IAM Profile) |
| Policy decision — may actor X access resource Y | flex-auth |
| API keys, provider secrets, DB creds, object-storage STS | OpenBao (+ flex-auth policy where required) |
| Inter-Hub operator keys, LLM provider credentials | OpenBao or approved operator secret store |
| Tunnel lifecycle, port forwarding | ops-bridge |
| `/etc/ssh/auth_principals/`, host hardening | railiance-infra |
| OpenBao cluster init/unseal, platform deploy | railiance-platform |

**ops-warden is not a general secrets manager.** It may document *how* workers
obtain non-SSH credentials; it must not store long-lived secrets in Git, State
Hub, workplans, logs, or chat.
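
The ownership split above can be sketched as a lookup table that maps a need to its owner and records whether ops-warden executes it. The need keys (`ssh-certificate`, `oidc-login`, and so on), the `Route` type, and the `route` function are hypothetical names chosen for illustration. The actual routing catalog format is not described here, so this should not be read as its schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    owner: str             # subsystem that owns the need
    warden_executes: bool  # True only for the SSH certificate lane


# Hypothetical need keys; owners are taken from the routing table in the text.
ROUTES = {
    "ssh-certificate": Route("ops-warden", True),
    "oidc-login": Route("key-cape / Keycloak", False),
    "policy-decision": Route("flex-auth", False),
    "api-key": Route("OpenBao", False),
    "tunnel": Route("ops-bridge", False),
    "host-hardening": Route("railiance-infra", False),
    "platform-deploy": Route("railiance-platform", False),
}


def route(need: str) -> Route:
    """Return the owner for a need, or fail loudly instead of guessing."""
    try:
        return ROUTES[need]
    except KeyError:
        raise LookupError(f"no route registered for need {need!r}") from None
```

Failing on an unknown need, rather than defaulting to some subsystem, mirrors the goal that workers should not have to guess where a credential lives.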

---

## Design Principles

### 1. Right lane, right subsystem

Every credential request should land in the subsystem NetKingdom designed for it.
ops-warden optimizes for **correct routing** as much as for **fast issuance**.

### 2. Short-lived by default (SSH lane)

Operational SSH access uses CA-signed certificates with TTL and principals —
never unbounded static keys in worker workflows.

### 3. Align with canon, reduce drift

When NetKingdom security architecture changes (e.g. OpenBao standardization,
new bootstrap lanes), ops-warden updates its wiki, SCOPE, and runbooks so dev
workers do not reconstruct decisions from stale chat history.

### 4. Attributable actors

Humans, agents, and automations are distinct actor types (`adm` / `agt` / `atm`)
with naming, TTL, and principal conventions — matching the Access Management
Directive and NetKingdom agent-operating model.
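
As a rough illustration of principles 2 and 4 together, the sketch below derives the actor type from a name prefix and caps the requested certificate lifetime per type. Everything specific here is an assumption: the `<type>-<name>` naming shape, the TTL caps, and the function names are invented for the example, and the real conventions live in the Access Management Directive.

```python
from datetime import timedelta

# Hypothetical TTL caps per actor type; real values are set by policy.
MAX_TTL = {
    "adm": timedelta(hours=8),     # humans
    "agt": timedelta(hours=1),     # agents
    "atm": timedelta(minutes=15),  # automations
}


def actor_type(actor: str) -> str:
    """Extract the type prefix from an assumed '<type>-<name>' actor id."""
    prefix, sep, rest = actor.partition("-")
    if not sep or not rest or prefix not in MAX_TTL:
        raise ValueError(f"actor {actor!r} does not match '<adm|agt|atm>-<name>'")
    return prefix


def effective_ttl(actor: str, requested: timedelta) -> timedelta:
    """Clamp a requested lifetime to the cap for the actor's type."""
    if requested <= timedelta(0):
        raise ValueError("requested TTL must be positive")
    return min(requested, MAX_TTL[actor_type(actor)])
```

Clamping is one possible design; rejecting over-long requests outright would be equally defensible and makes policy violations more visible to the caller.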

### 5. Implement narrowly, guide broadly

**Implement** only what belongs in the SSH certificate lane.
**Guide** across the full NetKingdom security surface through documentation,
scorecard checks, inventory patterns, and future policy-integration hooks.

### 6. Observable gatekeeping

Every ops-warden action appends metadata-only audit events; `warden activity`
answers *what happened recently* in one command. Compliance checks (scorecard) make
cert-side policy violations visible before they become incidents.
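
One way to picture a metadata-only event is a JSON object appended as a single line to a log, with a guard that refuses fields that look like secret material. The sketch below assumes a JSON Lines layout; the field names (`ts`, `action`, `actor`), the forbidden-key list, and both function names are hypothetical and do not describe the shipped audit schema.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, TextIO

# Hypothetical guard list: key fragments that suggest secret material.
FORBIDDEN_KEY_PARTS = ("secret", "token", "password", "private_key")


def build_event(action: str, actor: str, **metadata: Any) -> Dict[str, Any]:
    """Build an audit event, rejecting anything that resembles a secret."""
    for key, value in metadata.items():
        if any(part in key.lower() for part in FORBIDDEN_KEY_PARTS):
            raise ValueError(f"field {key!r} looks like secret material")
        if isinstance(value, str) and "PRIVATE KEY-----" in value:
            raise ValueError(f"field {key!r} contains key material")
    return {
        **metadata,
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "actor": actor,
    }


def append_event(stream: TextIO, event: Dict[str, Any]) -> None:
    """Append one event as a single JSON line."""
    stream.write(json.dumps(event, sort_keys=True) + "\n")
```

For example, `build_event("ssh.sign", "agt-demo", ttl_seconds=900)` would be accepted, while passing `token="..."` raises before anything reaches the log. A key-name guard like this is a coarse heuristic, not proof that no secret can leak.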

---

## Credential flow (target mental model)

```text
Development worker needs access
        |
        v
ops-warden (issue SSH; route / assist the rest)
        |
        +-- SSH host / ops reachability? --------> warden sign / cert_command
        |       (OpenBao SSH engine; scoped token via credential broker)
        |
        +-- Owner-native secret exec? -----------> secrets-engine exec
        |       (e.g. npm publish) or railiance-platform credential exec
        |
        +-- Generic API / DB / provider secret? -> OpenBao path
        |       (warden access proxies as caller when exec_capable)
        |
        +-- Authorization required? -------------> flex-auth decision
        |       (caller-side gate on sign + access when policy.enabled)
        |
        +-- Identity / MFA required? ------------> key-cape / Keycloak path
        |
        +-- Tunnel only? ------------------------> ops-bridge + cert_command
```

The steward role spans documentation, runbooks, the SSH CLI, the machine-readable
routing catalog with `warden route` lookup, policy-gated issuance, workload posture
conformance, the coordination worker, unified audit (`warden activity`), and — since
WARDEN-WP-0014 — the `warden access` assist layer that advises, routes owner-native
exec lanes, and (for generic `exec_capable` lanes) proxies fetches as the caller
without holding the value.
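
The behaviours described for the assist layer (issue, route, proxy, advise) can be summarised as a small decision function. This is a sketch under stated assumptions: the `Lane` fields, the `Mode` names, and the placement of the policy gate in front of every mode are illustrative choices, not the actual `warden access` implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ISSUE = "issue"    # SSH lane: executed with ops-warden's own authority
    ROUTE = "route"    # owner-native exec: hand off to the owner's front door
    PROXY = "proxy"    # exec_capable: fetch as the caller, hold nothing
    ADVISE = "advise"  # render auth method, path, and command only


@dataclass(frozen=True)
class Lane:
    """Hypothetical lane record; field names are illustrative."""
    name: str
    owner: str
    exec_capable: bool = False
    owner_native_exec: bool = False


def decide(lane: Lane, *, policy_enabled: bool = False,
           policy_allows: bool = True) -> Mode:
    """Pick how ops-warden handles a lane, after an optional policy gate."""
    # Assumed: the caller-side gate applies before any mode is chosen.
    if policy_enabled and not policy_allows:
        raise PermissionError(f"policy denied lane {lane.name!r}")
    if lane.owner == "ops-warden":
        return Mode.ISSUE
    if lane.owner_native_exec:
        return Mode.ROUTE
    if lane.exec_capable:
        return Mode.PROXY
    return Mode.ADVISE
```

Checking `owner_native_exec` before `exec_capable` reflects the stated boundary that ops-warden never runs the owner's tool itself, even when a lane could technically be proxied.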

---

## Relationship to NetKingdom

NetKingdom owns the **canonical security architecture** and meta-orchestration
across orchestrated repos. ops-warden is a **custodian-domain execution repo**
for one security lane plus operational guidance.

- NetKingdom defines *what the platform security model is*
- ops-warden keeps *operational SSH access and worker routing* aligned with it
- Railiance repos *deploy* what NetKingdom and component repos specify

ops-warden should appear in NetKingdom responsibility and pattern material as
the **operational SSH credential authority**, not as a replacement for
OpenBao or flex-auth.

---

## Success criteria

ops-warden is succeeding when:

1. A dev worker can determine **which subsystem** to use for a credential need
   without guessing or pasting secrets into agent sessions.
2. SSH access for agents and operators is **short-lived, inventoried, and audited**.
3. ops-bridge and other consumers integrate via **stable cert_command** without
   backend-specific branching.
4. NetKingdom security evolution (OpenBao, IAM Profile, bootstrap lanes) is
   reflected in ops-warden docs within the same maintenance cycle.
5. Non-SSH secrets remain **out of ops-warden storage** — only documented paths.
6. Security blockers can be classified by environment posture, workload maturity,
   owner route, and non-secret evidence instead of by vague credential risk.

---

## Non-goals

- Universal credential broker for all secret types
- Runtime enforcement of the workload secret-flow lattice (flex-auth owns that)
- Replacing OpenBao, flex-auth, key-cape, or railiance deployment ownership
- Storing Inter-Hub, LLM provider, or other long-lived API keys
- Host-side SSH configuration deployment
- **Duplicating or restating another subsystem's procedure** — routing material
  points at the owner's docs; it does not fork them
- SSO / Teleport at scale (trigger per Access Management Directive §6.2)

---

## Evolution notes

The repository shipped the SSH CA CLI first (WARDEN-WP-0001–0003). The
stewardship and NetKingdom-alignment mission is the **next stratum** — docs,
routing canon, inventory standards, production OpenBao SSH engine alignment,
flex-auth integration design, and NetKingdom cross-links — without collapsing
platform boundaries.

See `wiki/CredentialRouting.md` for worker-facing routing,
`wiki/WorkloadSecurityPosture.md` for the posture/maturity conformance model,
`wiki/NetKingdomSecurityMap.md` for component literacy,
`wiki/AuditTrail.md` for the unified activity log,
`history/2026-07-01-intent-scope-gap-analysis.md` for the latest gap analysis,
`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` for the SSH lane
reassessment, and archived workplans WP-0006–0008 for stewardship and production
closeout execution.