Files
ops-warden/SCOPE.md
tegwick 55c3404741 docs(SCOPE): sync current state — WP-0016 pilot-ready, completeness C4→C5
Update SCOPE.md "Where we are" / INTENT gap / maturity vector / Current State to
reflect the ops-bridge cert_command pilot (WP-0016) shipped to pilot-ready and all
ops-warden workplans finished. Remaining distance is external (flex-auth prod flip,
ops-bridge live cutover, owner-driven WP-0015 canon landing).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 20:33:32 +02:00

325 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# SCOPE
> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> Aspirational direction lives in `INTENT.md`.
---
## One-liner
Operational access steward for the NetKingdom security model — issues short-lived
SSH certificates for `adm`/`agt`/`atm` actors, documents how to obtain other
credential types from the right platform subsystems, stewards workload security
posture conformance, and keeps ops access guidance aligned with NetKingdom canon.
---
## Where we are (2026-06-27)
ops-warden **issues short-lived SSH certificates and routes every other credential
need to the subsystem that owns it.** SSH signing is **production-verified** on
Railiance OpenBao (`warden sign` against `https://bao.coulomb.social`, host CA trust
deployed).
**Access routing** is shipped: `wiki/AccessRouting.md`, credential routing wiki,
NetKingdom security map, machine-readable pointer catalog
(`registry/routing/catalog.yaml`, WP-0010), and `warden route` lookup CLI
(`list`/`show`/`find`, `--json`, WP-0011).
**Operator access assist** is shipped (WP-0014): `warden access` gives advisory
handoffs for every catalog need and can proxy `exec_capable` lanes as the caller,
without taking custody of values.
**Workload security posture** is shipped (WP-0015, all tasks done): dev/test/prod
environment posture, M0-M3 workload maturity, the secret-flow lattice, and blocker
triage language (T1); machine-readable descriptors + `warden policy list|show` (T2);
the read-only conformance checker `scripts/check_secret_posture_conformance.py` (T3);
and the dev-tier contract-double library `warden.doubles` (T4). Canon landing in
net-kingdom / info-tech-canon is owner-driven (tracked via coordination messages, T5).
**Policy gate** is shipped on the caller side (WP-0007) with production registry
and smoke evidence (WP-0009 archived). flex-auth published the `ssh-certificate`
policy package (FLEX-WP-0006). `policy.enabled` remains **false** in production
until flex-auth is deployed to a reachable URL (flex-auth FLEX-WP-0007).
**ops-bridge cert_command pilot** is shipped to pilot-ready (WP-0016): a read-only
readiness gate (`scripts/check_tunnel_cert_readiness.py`) plus an opt-in offline
contract smoke (`--sign-smoke`); the playbook leads with the gate and the pilot
(`agt-state-hub-bridge`) is handed to ops-bridge. The live tunnel cutover is
ops-bridge's to execute.
**INTENT alignment:** SSH issuance mission met in production. All ops-warden workplans
are finished. Remaining distance is in other repos' lanes: ops-bridge running the
cert_command pilot cutover, flex-auth runtime deployment (FLEX-WP-0007, unblocks
`policy.enabled: true`), and the owner-driven WP-0015 canon landing — plus ongoing
operator hygiene.
### Issue vs route
ops-warden executes exactly one lane with its own authority and routes/assists the rest.
| Need | Subsystem | ops-warden role |
| --- | --- | --- |
| SSH cert for host/ops access (`adm`/`agt`/`atm`) | **ops-warden** | **Issue** (`warden sign`) |
| API key / DB cred / dynamic lease | OpenBao | Assist — route; proxy as caller only for `exec_capable` lanes |
| "May I perform action X?" | flex-auth | Route — point at policy; consume decisions where configured |
| Login / OIDC / MFA | key-cape / Keycloak | Assist — route; proxy `login` lane when `exec_capable` |
| SSH tunnel / port forward | ops-bridge | Route — supply `cert_command` |
| Host principal deployment | railiance-infra | Route — point at Ansible |
Full role and boundary: `wiki/AccessRouting.md`. The catalog is a **pointer layer**
it never restates an owner's procedure (authored `steps` exist only for the SSH lane).
Gap analysis: `history/2026-06-24-intent-scope-gap-analysis.md` (current);
`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` (SSH lane);
`history/2026-06-18-access-routing-intent-shift-assessment.md` (routing charter).
---
## INTENT gap snapshot
| INTENT success criterion | Status |
| --- | --- |
| Worker knows which subsystem for each credential type | Met |
| SSH short-lived, inventoried, audited | Met (production) |
| ops-bridge integrates via stable `cert_command` | **Pilot-ready** — contract + readiness gate (`check_tunnel_cert_readiness.py`, WP-0016) shipped; live cutover handed to ops-bridge |
| NetKingdom evolution reflected in docs | Met |
| Non-SSH secrets stay out of ops-warden | Met |
| Workload posture / maturity model for secret-flow blockers | Met — two-axis standard + descriptors + conformance checker + dev doubles (WP-0015) |
**Maturity vector:** `D5 / A5 / C5 / R3` (Discovery / Availability / Completeness / Reliability)
| Dimension | Level | Meaning today |
| --- | --- | --- |
| D5 | Discovery | Routing wiki + security map + pointer catalog + NK canon cross-links |
| A5 | Availability | CLI + `warden route` + `warden access` advisory & proxy front door + `warden policy` + opt-in policy gate + agent `--json` |
| C5 | Completeness | All ops-warden lanes shipped — SSH (prod), routing, access assist, posture conformance, cert_command pilot gate. Open items are external: flex-auth prod flip + ops-bridge live cutover |
| R3 | Reliability | Live OpenBao sign evidence on Railiance |
---
## Core Idea
**Today:** implements the SSH certificate lane from `wiki/AccessManagementDirective.md`
§§15 — CA signing, actor inventory, TTL policy, cert-side scorecard, optional
flex-auth pre-sign gate, and the `cert_command` interface for ops-bridge. Production
path uses OpenBao SSH engine (`backend: vault`).
**Direction (INTENT):** issue short-lived SSH certificates and route dev workers to
key-cape, flex-auth, OpenBao, ops-bridge, and railiance components for everything
else — implementing only the SSH certificate lane directly, pointing at the owner
for the rest.
---
## In Scope
### Implemented (SSH lane)
- Local CA backend (`ssh-keygen -s`)
- OpenBao / Vault-compatible SSH engine backend (**production-verified**)
- Actor identity registry (`inventory.yaml`)
- `cert_command`: `warden sign <actor> --pubkey <path>` → cert on stdout
- TTL enforcement per `ActorType` (`adm` 48 h, `agt` 24 h, `atm` 8 h)
- `warden status`, cleanup, scorecard, signatures log
- Opt-in flex-auth policy gate (`policy.enabled`, `policy_decision_id` in log)
- Production flex-auth registry builder (`scripts/build_flex_auth_registry.py`,
`registry/flex-auth/production_registry_snapshot.json`)
- Policy gate smoke runner (`scripts/policy_gate_production_smoke.sh`)
- `warden route` lookup CLI (`list`/`show`/`find`, `--json`) over the pointer catalog
- `warden access` operator front door (WP-0014): advisory handoff for any need, and a
transparent, policy-gated, audited **proxy** (`--fetch`/`--exec`) for `exec_capable`
lanes (OpenBao secret reads, key-cape login) — caller identity, value never held
- `warden issue` and `ops-ssh-wrapper` (local backend; vault uses sign-only)
- ops-bridge cert_command readiness gate (`scripts/check_tunnel_cert_readiness.py`,
WP-0016) — read-only preflight + opt-in offline contract smoke
- Runbooks for OpenBao config and Inter-Hub bootstrap SSH envelope
### Stewardship (documentation and alignment)
- NetKingdom security routing guidance — which subsystem owns which credential type
- Wiki and config references aligned with OpenBao-first platform standard
- Capability registry entry for SSH certificate issuance
- Routing pointer catalog (`registry/routing/catalog.yaml`)
- Keeping ops access patterns consistent with `net-kingdom` platform architecture
- Workload Security Posture standard (`wiki/WorkloadSecurityPosture.md`),
machine-readable posture descriptors (`registry/policy/security-posture.yaml`),
the read-only conformance checker, and the dev-tier contract-double library
### Shipped workplans (archived)
| WP | Focus |
| --- | --- |
| WP-00010005 | Initial CLI, quality, hygiene, OpenBao docs, hub sync |
| WP-0006 | Credential routing, security map, inventory patterns, OpenBao checklist |
| WP-0007 | Opt-in flex-auth policy gate (`policy.enabled`) |
| WP-0008 | Production sign verification, stewardship closeout, archive hygiene |
| WP-0009 | flex-auth registry + policy smoke; pickup brief for FLEX-WP-0007 |
| WP-0010 | Access routing charter + pointer catalog |
| WP-0011 | `warden route` lookup CLI |
| WP-0012 | Routing scenario playbooks (catalog + wiki expansion) |
| WP-0013 | Production integration closeout — cert_command playbook, token hygiene, principals drift |
| WP-0014 | Operator access assist — `warden access` advisory + proxy front door |
| WP-0015 | Workload security posture — two-axis standard, descriptors, conformance checker, dev doubles |
| WP-0016 | ops-bridge cert_command pilot — readiness gate (`check_tunnel_cert_readiness.py`) + handoff |
### Active / ready
_None open._ All ops-warden workplans are finished; the remaining distance is in other
repos' lanes (see Known gaps).
### Known gaps (not ops-warden workplans)
| Gap | Owner | Notes |
| --- | --- | --- |
| flex-auth production runtime + registry deploy | flex-auth | **FLEX-WP-0007** — unblocks `policy.enabled: true` |
| Vault-backed policy gate joint smoke | flex-auth + operator | Needs valid scoped `VAULT_TOKEN` |
| ops-bridge `cert_command` on live tunnels | ops-bridge | Playbook + readiness gate shipped (WP-0016); pilot cutover handed off, awaiting ops-bridge |
| Principals sync warden ↔ railiance-infra | ops-warden + infra | `scripts/check_principals_drift.py` — operator runs periodically |
| NK-WP-0009 joint SSH tutorial | net-kingdom | Parallel coordination track |
| WP-0015 canon landing (generic `WorkloadMaturityLevel` + M0-M3 requirements) | net-kingdom + info-tech-canon | ops-warden drafted + offered (coordination msgs); owner-driven landing |
---
## Out of Scope
- **Issuing or custodying** non-SSH secrets (API keys, DB creds, S3 STS,
Inter-Hub keys) → OpenBao with flex-auth policy where required; ops-warden
documents paths and may proxy caller-authenticated `exec_capable` lanes only
- Identity / OIDC / MFA → key-cape, Keycloak
- Authorization policy decisions → flex-auth
- flex-auth runtime deployment and secret-flow lattice enforcement → flex-auth
(`FLEX-WP-0007` and follow-ups)
- Tunnel lifecycle → `ops-bridge`
- Host principal deployment → `railiance-infra`
- OpenBao / Vault cluster deployment → `railiance-platform`
- Human admin SSH key generation (self-service `ssh-keygen`)
- Session recording, SIEM, SSO / Teleport at scale
---
## Relevant When
- Issuing or refreshing an **SSH cert** for `adm`/`agt`/`atm`
- A dev worker needs to know **where to get credentials** in the NetKingdom stack
- An agent needs **`warden route find`** instead of re-deriving routing from wiki prose
- `ops-bridge` needs a `cert_command` for a tunnel
- Adding actors to the principals inventory (regenerate flex-auth registry snapshot)
- Inter-Hub or bootstrap tasks need a **short-lived agent SSH envelope**
- Checking cert-side compliance (scorecard)
- Enabling or testing the opt-in flex-auth policy gate
- Classifying whether a credential blocker is a dev/test double, owner-routed prod
gate, or maturity/posture violation
---
## Not Relevant When
- Storing or vending **API keys or runtime secrets** (→ OpenBao)
- Policy decisions on resource access (→ flex-auth)
- Managing tunnels without SSH cert issuance (→ ops-bridge)
- Static-key-only legacy access (ops-bridge static key mode)
---
## Current State
- **SSH CLI:** v0.1.0 — local + OpenBao backends
- **Production sign:** verified 2026-06-18 (`history/2026-06-17-openbao-production-verify.md`)
- **Access routing:** WP-0010 + WP-0011 shipped (`warden route`, pointer catalog)
- **Policy gate:** caller shipped (WP-0007); registry + smoke complete (WP-0009 archived).
`policy.enabled: false` until flex-auth reachable (`FLEX-WP-0007`)
- **Workload posture:** WP-0015 shipped (standard, descriptors, `warden policy`,
conformance checker, dev doubles); canon landing owner-driven
- **ops-bridge cert_command:** WP-0016 shipped to pilot-ready (readiness gate +
offline contract smoke + handoff); live cutover is ops-bridge's
- **Active work:** none open in ops-warden; remaining distance is other repos' lanes
- **Integration docs:** cert_command migration, token hygiene, principals drift (`wiki/playbooks/`)
- **Latest assessment:** `history/2026-06-24-intent-scope-gap-analysis.md`
---
## How It Fits (NetKingdom)
```text
key-cape / Keycloak identity claims
→ flex-auth authorization decisions
→ OpenBao runtime secrets & dynamic credentials
→ ops-warden SSH certs + operational access guidance
→ ops-bridge tunnel transport (cert_command consumer)
→ railiance-* deployment and host enforcement
```
Upstream: OpenBao SSH engine (production) or local CA (labs). Actor inventory in
operator config or Git-tracked patterns. flex-auth registry snapshot derived from
inventory when policy gate is enabled.
Downstream: `ops-bridge` (primary), kaizen agents, CI automations, human operators.
---
## Terminology
- `ActorType`: `adm` | `agt` | `atm`
- `cert_command`: shell command returning a cert on stdout
- `inventory.yaml`: actor → principals + TTL registry
- `LocalCA` / `VaultCA`: signing backends (`backend: local` | `vault`)
- Pointer catalog: `registry/routing/catalog.yaml` — subsystem ownership lookup plus
secret-free `warden access` handoff metadata
- Workload Security Posture: env posture (`dev/test/prod`) plus maturity (`M0-M3`)
used to decide whether a secret may flow to a workload
---
## Related Repositories
| Repo | Relationship |
| --- | --- |
| `net-kingdom` | Canonical security architecture; ops-warden aligns to it |
| `ops-bridge` | Primary cert_command consumer |
| `railiance-infra` | Host-side SSH principals and hardening |
| `railiance-platform` | OpenBao deployment and platform secrets |
| `flex-auth` | Authorization; policy package shipped (FLEX-WP-0006); runtime deploy FLEX-WP-0007 |
| `key-cape` | Identity / IAM Profile lightweight mode |
| `state-hub` | Workstream registry |
---
## Provided Capabilities
```capability
type: security
title: SSH certificate issuance
description: Issues short-lived CA-signed SSH certificates for adm/agt/atm actors via a
pluggable cert_command interface; documents NetKingdom operational access routing;
supports local CA and OpenBao/Vault-compatible SSH engine backends.
keywords: [ssh, certificate, ca, credential, warden, ops-warden, pki, openbao, vault, netkingdom]
```
---
## Getting Oriented
| Read first | Purpose |
| --- | --- |
| `INTENT.md` | Why ops-warden exists and where it is going |
| `SCOPE.md` | What is implemented today (this file) |
| `wiki/AccessRouting.md` | What ops-warden issues vs routes vs assists (role and boundary) |
| `wiki/OperatorAccessAssist.md` | `warden access` front door + conduit-vs-broker boundary + guardrails |
| `wiki/CredentialRouting.md` | Which subsystem for each credential need |
| `wiki/WorkloadSecurityPosture.md` | Secret-store posture, workload maturity, and blocker triage |
| `registry/routing/catalog.yaml` | Machine-readable routing pointer catalog |
| `wiki/NetKingdomSecurityMap.md` | Platform security component map |
| `examples/warden.production.example.yaml` | Production warden.yaml template |
| `wiki/PolicyGatedSigning.md` | flex-auth opt-in gate + registry rollout |
| `wiki/AccessManagementDirective.md` | SSH actor model |
| `wiki/OpsWardenConfig.md` | warden.yaml and OpenBao |
| `wiki/CertCommandInterface.md` | cert_command contract |
| `history/2026-06-24-intent-scope-gap-analysis.md` | Current gap analysis + WP-0013 |
| `history/2026-06-27-workload-security-posture-charter.md` | WP-0015 posture/conformance charter |
| `history/2026-06-18-post-wp0008-intent-scope-reassessment.md` | SSH lane gap analysis |
| `history/2026-06-18-access-routing-intent-shift-assessment.md` | Routing charter decision |
| `history/2026-06-23-flex-auth-policy-gate-production-smoke.md` | Policy gate smoke evidence |
| `net-kingdom/docs/platform-identity-security-architecture.md` | Platform security canon |