# INTENT

> This file captures **why this repository exists**, the **direction it is
> moving toward**, and the **kind of system it is meant to become**.
> It is intentionally **aspirational and stable**, not a description of
> current implementation. See `SCOPE.md` for what is implemented today.

---
## One-liner

**Operational access steward for the NetKingdom security model — knows the platform
credential lanes, keeps them aligned, and issues short-lived SSH certificates where
that lane belongs to ops-warden.**

---
## Why This Exists

Development workers — human operators, kaizen agents, CI automations, and
custodian tooling — need **safe, attributable access** across an increasingly
complex NetKingdom stack: identity, MFA, authorization, runtime secrets, SSH
reachability, and tunnel transport.

That stack is easy to misuse:

- static SSH keys and pasted API tokens in chat or Git
- wrong subsystem chosen for a credential need (OpenBao vs warden vs key-cape)
- drift between NetKingdom architecture canon and what operators actually run
- ad hoc rediscovery of bootstrap and custody rules every time a worker needs access

**ops-warden exists so operational access has a custodian-domain home** that
understands NetKingdom security infrastructure, routes workers to the right
subsystem, keeps local guidance current, and **directly operates only the SSH
short-lived certificate lane** it owns.

---
## The Mission

> *Where we are going.*

ops-warden **issues short-lived SSH certificates and routes every other credential
need to the subsystem that owns it.** It is not a desk that wraps the platform; it
owns one lane and points at the rest:

1. **Know** the NetKingdom security model — identity, authorization, secrets,
   SSH access, tunnels, bootstrap custody, and tenant/platform boundaries.
2. **Route** workers to the correct subsystem for each credential type instead
   of becoming a universal secret vending machine — through the wiki and a
   machine-readable routing catalog that *points at* the owner's docs rather than
   restating them.
3. **Align** runbooks, wiki, inventory patterns, and scorecard checks with
   NetKingdom canon as the platform evolves (OpenBao-first, flex-auth policy,
   key-cape IAM Profile, railiance deployment layers).
4. **Issue** short-lived SSH certificates for `adm` / `agt` / `atm` actors when
   host or ops reachability requires the SSH lane — via `warden sign`,
   `cert_command`, and `ops-ssh-wrapper`. This is the **only** lane ops-warden
   executes.
5. **Audit** SSH signing operations and cert-side compliance so gatekeeping is
   observable, not tribal knowledge.

---
## NetKingdom Security Literacy

ops-warden should be fluent in the platform architecture documented in
`net-kingdom` — especially:

| Plane / component | Role in access | ops-warden relationship |
| --- | --- | --- |
| **key-cape / Keycloak** | Identity — who is the actor, MFA, IAM Profile claims | Instruct identity path; do not re-implement OIDC |
| **flex-auth + Topaz** | Authorization — may this actor perform this action | Future policy gate before SSH issuance; document integration |
| **OpenBao** | Runtime secrets — API keys, dynamic creds, leases, audit | Instruct secret custody paths; SSH engine is signing backend only |
| **ops-warden** | Operational SSH certificates — short-lived host access | **Own and issue** this lane |
| **ops-bridge** | Tunnel transport — consumes certs via `cert_command` | Primary consumer; document integration |
| **railiance-infra** | Host principals, force-command, SSH hardening | Instruct host-side deployment; do not own Ansible |
| **railiance-platform** | OpenBao/K8s/platform service deployment | Instruct production endpoints; do not deploy clusters |

Canonical references:

- `net-kingdom/docs/platform-identity-security-architecture.md`
- `net-kingdom/docs/responsibility-map.md`
- `wiki/AccessManagementDirective.md` (ops SSH actor model)

---
## Responsibility Boundary

### ops-warden owns

- NetKingdom-aligned **operational SSH access** guidance and stewardship
- **SSH certificate issuance** for registered `adm` / `agt` / `atm` actors
- Actor inventory, TTL/principal policy, cert-side scorecard, signatures log
- `cert_command` contract and `ops-ssh-wrapper` automation surface
- Keeping ops-warden docs and patterns aligned with NetKingdom security evolution

### ops-warden instructs but does not own

| Need | Route to |
| --- | --- |
| OIDC login, MFA, human identity claims | key-cape / Keycloak (NetKingdom IAM Profile) |
| Policy decision — may actor X access resource Y | flex-auth |
| API keys, provider secrets, DB creds, object-storage STS | OpenBao (+ flex-auth policy where required) |
| Inter-Hub operator keys, LLM provider credentials | OpenBao or approved operator secret store |
| Tunnel lifecycle, port forwarding | ops-bridge |
| `/etc/ssh/auth_principals/`, host hardening | railiance-infra |
| OpenBao cluster init/unseal, platform deploy | railiance-platform |

**ops-warden is not a general secrets manager.** It may document *how* workers
obtain non-SSH credentials; it must not store long-lived secrets in Git, State
Hub, workplans, logs, or chat.

---
## Design Principles

### 1. Right lane, right subsystem

Every credential request should land in the subsystem NetKingdom designed for it.
ops-warden optimizes for **correct routing** as much as for **fast issuance**.

### 2. Short-lived by default (SSH lane)

Operational SSH access uses CA-signed certificates with TTL and principals —
never unbounded static keys in worker workflows.

### 3. Align with canon, reduce drift

When NetKingdom security architecture changes (e.g. OpenBao standardization,
new bootstrap lanes), ops-warden updates its wiki, SCOPE, and runbooks so dev
workers do not reconstruct decisions from stale chat history.

### 4. Attributable actors

Humans, agents, and automations are distinct actor types (`adm` / `agt` / `atm`)
with naming, TTL, and principal conventions — matching the Access Management
Directive and NetKingdom agent-operating model.
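As an illustration of what "distinct actor types with naming and TTL conventions" could look like in code, here is a minimal Python sketch. The `<type>-<name>` identifier format, the TTL ceilings, and the helper names (`parse_actor`, `clamp_ttl`) are all assumptions made for this example; the authoritative conventions live in the Access Management Directive and the actor inventory.

```python
from dataclasses import dataclass
from datetime import timedelta

# Actor types named in this document, in the order "humans, agents, automations".
ACTOR_TYPES = {"adm": "human operator", "agt": "agent", "atm": "automation"}

# Hypothetical TTL ceilings, for illustration only.
MAX_TTL = {
    "adm": timedelta(hours=8),
    "agt": timedelta(hours=1),
    "atm": timedelta(minutes=15),
}


@dataclass(frozen=True)
class Actor:
    actor_type: str
    name: str


def parse_actor(actor_id: str) -> Actor:
    """Split a hypothetical '<type>-<name>' identifier and validate the type."""
    actor_type, sep, name = actor_id.partition("-")
    if not sep or not name or actor_type not in ACTOR_TYPES:
        raise ValueError(f"not a recognised actor id: {actor_id!r}")
    return Actor(actor_type, name)


def clamp_ttl(actor: Actor, requested: timedelta) -> timedelta:
    """Grant the requested TTL, capped at the ceiling for the actor's type."""
    if requested <= timedelta(0):
        raise ValueError("TTL must be positive")
    return min(requested, MAX_TTL[actor.actor_type])
```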
### 5. Implement narrowly, guide broadly

**Implement** only what belongs in the SSH certificate lane.
**Guide** across the full NetKingdom security surface through documentation,
scorecard checks, inventory patterns, and future policy-integration hooks.
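One way to hold this line structurally is a check that lets only the SSH lane carry executable steps, while every other routing entry carries just a pointer to its owner's docs. The Python sketch below assumes a hypothetical catalog shape: the field names (`lane`, `owner`, `docs`, `steps`) and the lane-to-document pairings are placeholders for illustration, not the shipped catalog format.

```python
# Hypothetical catalog entries. Field names and doc pairings are assumptions.
CATALOG = [
    {
        "lane": "ssh",
        "owner": "ops-warden",
        "docs": "wiki/AccessManagementDirective.md",
        "steps": ["warden sign"],
    },
    {
        "lane": "runtime-secrets",
        "owner": "OpenBao",
        "docs": "net-kingdom/docs/platform-identity-security-architecture.md",
    },
    {
        "lane": "authorization",
        "owner": "flex-auth",
        "docs": "net-kingdom/docs/responsibility-map.md",
    },
]

# The only lane ops-warden executes.
EXECUTED_LANES = {"ssh"}


def catalog_violations(entries):
    """Return a list of human-readable problems; an empty list means the catalog passes."""
    problems = []
    for entry in entries:
        lane = entry.get("lane", "<missing lane>")
        if not entry.get("docs"):
            problems.append(f"{lane}: missing pointer to owner docs")
        if lane not in EXECUTED_LANES and "steps" in entry:
            problems.append(f"{lane}: restates steps for a lane ops-warden does not execute")
    return problems


if __name__ == "__main__":
    # A CI job could fail the build whenever this list is non-empty.
    for problem in catalog_violations(CATALOG):
        print(problem)
```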
### 6. Observable gatekeeping

Every successful SSH sign is auditable (`signatures.log`). Compliance checks
(scorecard) make cert-side policy violations visible before they become incidents.

---
## Credential flow (target mental model)

```text
Development worker needs access
        |
        v
ops-warden (issue SSH; route the rest)
        |
        +-- SSH host / ops reachability? ----> warden sign / cert_command
        |
        +-- Runtime API / platform secret? --> OpenBao path (documented)
        |
        +-- Authorization required? ---------> flex-auth decision (future hook)
        |
        +-- Identity / MFA required? --------> key-cape / Keycloak path
        |
        +-- Tunnel only? --------------------> ops-bridge + cert_command
```
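The branches in the diagram can be read as a lookup from need to owner. The Python sketch below mirrors that mental model only. The names `Need`, `ROUTES`, and `route` are hypothetical and do not describe the `warden route` command or any shipped interface; the destination strings are copied from the diagram.

```python
from enum import Enum


class Need(Enum):
    SSH_REACHABILITY = "ssh"
    RUNTIME_SECRET = "secret"
    AUTHORIZATION = "authz"
    IDENTITY = "identity"
    TUNNEL = "tunnel"


# (owner, pointer) pairs taken from the diagram. Only the SSH entry is a lane
# ops-warden executes; the others point at the owning subsystem.
ROUTES = {
    Need.SSH_REACHABILITY: ("ops-warden", "warden sign / cert_command"),
    Need.RUNTIME_SECRET: ("OpenBao", "OpenBao path (documented)"),
    Need.AUTHORIZATION: ("flex-auth", "flex-auth decision (future hook)"),
    Need.IDENTITY: ("key-cape / Keycloak", "key-cape / Keycloak path"),
    Need.TUNNEL: ("ops-bridge", "ops-bridge + cert_command"),
}


def route(need: Need) -> dict:
    """Return the owner and pointer for a need, flagging the one executed lane."""
    owner, pointer = ROUTES[need]
    return {
        "owner": owner,
        "pointer": pointer,
        "executed_by_warden": need is Need.SSH_REACHABILITY,
    }
```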
Today the steward role is primarily documentation, runbooks, and the implemented
SSH CLI. The machine-readable routing catalog and `warden route` lookup, plus
policy-gated issuance, are intentional follow-ups, not current promises.

---
## Relationship to NetKingdom

NetKingdom owns the **canonical security architecture** and meta-orchestration
across orchestrated repos. ops-warden is a **custodian-domain execution repo**
for one security lane plus operational guidance.

- NetKingdom defines *what the platform security model is*
- ops-warden keeps *operational SSH access and worker routing* aligned with it
- Railiance repos *deploy* what NetKingdom and component repos specify

ops-warden should appear in NetKingdom responsibility and pattern material as
the **operational SSH credential authority**, not as a replacement for
OpenBao or flex-auth.

---
## Success criteria

ops-warden is succeeding when:

1. A dev worker can determine **which subsystem** to use for a credential need
   without guessing or pasting secrets into agent sessions.
2. SSH access for agents and operators is **short-lived, inventoried, and audited**.
3. ops-bridge and other consumers integrate via **stable cert_command** without
   backend-specific branching.
4. NetKingdom security evolution (OpenBao, IAM Profile, bootstrap lanes) is
   reflected in ops-warden docs within the same maintenance cycle.
5. Non-SSH secrets remain **out of ops-warden storage** — only documented paths.

---
## Non-goals

- Universal credential broker for all secret types
- Replacing OpenBao, flex-auth, key-cape, or railiance deployment ownership
- Storing Inter-Hub, LLM provider, or other long-lived API keys
- Host-side SSH configuration deployment
- **Duplicating or restating another subsystem's procedure** — routing material
  points at the owner's docs; it does not fork them
- SSO / Teleport at scale (trigger per Access Management Directive §6.2)

---
## Evolution notes

The repository shipped the SSH CA CLI first (WARDEN-WP-0001–0003). The
stewardship and NetKingdom-alignment mission is the **next stratum** — docs,
routing canon, inventory standards, production OpenBao SSH engine alignment,
flex-auth integration design, and NetKingdom cross-links — without collapsing
platform boundaries.

See `wiki/CredentialRouting.md` for worker-facing routing,
`wiki/NetKingdomSecurityMap.md` for component literacy,
`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` for the latest
gap analysis (production SSH path verified), and archived workplans WP-0006–0008
for stewardship and production closeout execution.