ops-warden/SCOPE.md
tegwick 46b340f45f feat(WARDEN-WP-0017): make the access front door discoverable (not SSH-only)
WP-0014 made ops-warden the operator access front door (warden access --fetch/--exec
proxies an exec_capable secret as the caller), but every discovery surface still told
the pre-WP-0014 "SSH certs only, pointer not key" story — so agents like whynot-design
never found the proxy and concluded they had to message ops-warden for a token value.

Messaging/discoverability only; the conduit security model is unchanged (no custody,
no broker).

T1 — CLI: `warden route` table warden column is now three-valued (issue/assist/route);
route + access JSON gain warden_role + exec_capable and a proxy-aware next_action;
`warden access` closing line leads with "ops-warden can fetch this for you as the
caller" for exec_capable lanes (route-only lanes keep "owner vends").

T2 — .claude/rules/credential-routing.md reframed (lead + routing table role column);
SCOPE one-liner + a second capability block for the access front door.

T3 — registered the State Hub capability "Operator access front door (caller-identity
fetch proxy)" (the hub had no ops-warden security capability at all); messaged
whynot-design the corrected `warden access "npm auth token" --fetch/--exec` path.

210 tests pass, lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 21:02:46 +02:00


SCOPE

This file helps you quickly understand what this repository is about, when it is relevant, and when it is not. Aspirational direction lives in INTENT.md.


One-liner

Operational access steward and front door for the NetKingdom security model. ops-warden issues short-lived SSH certificates for adm/agt/atm actors; for every other credential need, warden access is the operator front door: it routes to the owning subsystem and, for exec_capable lanes (OpenBao reads, key-cape login), proxies the fetch as the caller without taking custody. It also stewards workload security posture conformance and keeps ops access guidance aligned with NetKingdom canon.


Where we are (2026-06-27)

ops-warden issues short-lived SSH certificates and routes every other credential need to the subsystem that owns it. SSH signing is production-verified on Railiance OpenBao (warden sign against https://bao.coulomb.social, host CA trust deployed).

Access routing is shipped: wiki/AccessRouting.md, credential routing wiki, NetKingdom security map, machine-readable pointer catalog (registry/routing/catalog.yaml, WP-0010), and warden route lookup CLI (list/show/find, --json, WP-0011).

Operator access assist is shipped (WP-0014): warden access gives advisory handoffs for every catalog need and can proxy exec_capable lanes as the caller, without taking custody of values.
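
As a rough illustration of the advisory-versus-proxy split, the sketch below picks a closing hint for a route entry. The field names `warden_role` and `exec_capable` and the two quoted hints come from the WP-0017 commit notes; the dictionary shape, the `closing_hint` helper, the hint for the issue lane, and the sample entries are assumptions for illustration, not ops-warden's actual output.

```python
def closing_hint(entry: dict) -> str:
    """Pick an advisory closing line for a route entry (hypothetical shape)."""
    if entry.get("warden_role") == "issue":
        # Assumed wording; the SSH lane is the only one ops-warden issues itself.
        return "ops-warden issues this directly (warden sign)"
    if entry.get("exec_capable"):
        # Proxy lane: the fetch runs as the caller and no value is held.
        return "ops-warden can fetch this for you as the caller"
    # Route-only lane: the owning subsystem hands out the credential.
    return "owner vends"


# Illustrative entries only; real catalog entries may look different.
examples = [
    {"need": "ssh cert", "warden_role": "issue", "exec_capable": False},
    {"need": "npm auth token", "warden_role": "assist", "exec_capable": True},
    {"need": "ssh tunnel", "warden_role": "route", "exec_capable": False},
]
for e in examples:
    print(f'{e["need"]}: {closing_hint(e)}')
```

The point of the split is that only the wording and the optional proxy step differ between lanes; in no branch does the helper touch a secret value.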

Workload security posture is shipped (WP-0015, all tasks done): dev/test/prod environment posture, M0-M3 workload maturity, the secret-flow lattice, and blocker triage language (T1); machine-readable descriptors + warden policy list|show (T2); the read-only conformance checker scripts/check_secret_posture_conformance.py (T3); and the dev-tier contract-double library warden.doubles (T4). Canon landing in net-kingdom / info-tech-canon is owner-driven (tracked via coordination messages, T5).

Policy gate is shipped on the caller side (WP-0007) with production registry and smoke evidence (WP-0009 archived). flex-auth published the ssh-certificate policy package (FLEX-WP-0006). policy.enabled remains false in production until flex-auth is deployed to a reachable URL (flex-auth FLEX-WP-0007).

ops-bridge cert_command pilot is shipped to pilot-ready (WP-0016): a read-only readiness gate (scripts/check_tunnel_cert_readiness.py) plus an opt-in offline contract smoke (--sign-smoke); the playbook leads with the gate and the pilot (agt-state-hub-bridge) is handed to ops-bridge. The live tunnel cutover is ops-bridge's to execute.

INTENT alignment: SSH issuance mission met in production. All ops-warden workplans are finished. Remaining distance is in other repos' lanes: ops-bridge running the cert_command pilot cutover, flex-auth runtime deployment (FLEX-WP-0007, unblocks policy.enabled: true), and the owner-driven WP-0015 canon landing — plus ongoing operator hygiene.

Issue vs route

ops-warden executes exactly one lane with its own authority and routes/assists the rest.

| Need | Subsystem | ops-warden role |
| --- | --- | --- |
| SSH cert for host/ops access (adm/agt/atm) | ops-warden | Issue (warden sign) |
| API key / DB cred / dynamic lease | OpenBao | Assist: route; proxy as caller only for exec_capable lanes |
| "May I perform action X?" | flex-auth | Route: point at policy; consume decisions where configured |
| Login / OIDC / MFA | key-cape / Keycloak | Assist: route; proxy login lane when exec_capable |
| SSH tunnel / port forward | ops-bridge | Route: supply cert_command |
| Host principal deployment | railiance-infra | Route: point at Ansible |

Full role and boundary: wiki/AccessRouting.md. The catalog is a pointer layer — it never restates an owner's procedure (authored steps exist only for the SSH lane).
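
The Issue vs route table above can be read as a lookup from need to owning subsystem and ops-warden role. The following is a minimal sketch of that idea; the keys, the `Route` type, and `owner_of` are hypothetical and do not reflect the schema of registry/routing/catalog.yaml or the behaviour of `warden route`.

```python
from typing import NamedTuple


class Route(NamedTuple):
    subsystem: str
    role: str  # "issue", "assist", or "route"


# Rows transcribed from the Issue vs route table; keys are illustrative labels.
ROUTES = {
    "ssh-cert": Route("ops-warden", "issue"),
    "api-key": Route("OpenBao", "assist"),
    "authorization": Route("flex-auth", "route"),
    "login": Route("key-cape / Keycloak", "assist"),
    "ssh-tunnel": Route("ops-bridge", "route"),
    "host-principals": Route("railiance-infra", "route"),
}


def owner_of(need: str) -> Route:
    """Return the owning subsystem and ops-warden's role for a need."""
    try:
        return ROUTES[need]
    except KeyError:
        raise LookupError(f"no route catalogued for {need!r}") from None


print(owner_of("api-key"))
```

Exactly one entry carries the `issue` role, which mirrors the statement that ops-warden executes a single lane with its own authority.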

Gap analysis: history/2026-06-24-intent-scope-gap-analysis.md (current); history/2026-06-18-post-wp0008-intent-scope-reassessment.md (SSH lane); history/2026-06-18-access-routing-intent-shift-assessment.md (routing charter).


INTENT gap snapshot

| INTENT success criterion | Status |
| --- | --- |
| Worker knows which subsystem for each credential type | Met |
| SSH short-lived, inventoried, audited | Met (production) |
| ops-bridge integrates via stable cert_command | Pilot-ready: contract + readiness gate (check_tunnel_cert_readiness.py, WP-0016) shipped; live cutover handed to ops-bridge |
| NetKingdom evolution reflected in docs | Met |
| Non-SSH secrets stay out of ops-warden | Met |
| Workload posture / maturity model for secret-flow blockers | Met: two-axis standard + descriptors + conformance checker + dev doubles (WP-0015) |

Maturity vector: D5 / A5 / C5 / R3 (Discovery / Availability / Completeness / Reliability)

| Level | Dimension | Meaning today |
| --- | --- | --- |
| D5 | Discovery | Routing wiki + security map + pointer catalog + NK canon cross-links |
| A5 | Availability | CLI + warden route + warden access advisory & proxy front door + warden policy + opt-in policy gate + agent --json |
| C5 | Completeness | All ops-warden lanes shipped: SSH (prod), routing, access assist, posture conformance, cert_command pilot gate. Open items are external: flex-auth prod flip + ops-bridge live cutover |
| R3 | Reliability | Live OpenBao sign evidence on Railiance |

Core Idea

Today: implements the SSH certificate lane from wiki/AccessManagementDirective.md §§1–5: CA signing, actor inventory, TTL policy, cert-side scorecard, optional flex-auth pre-sign gate, and the cert_command interface for ops-bridge. Production path uses OpenBao SSH engine (backend: vault).

Direction (INTENT): issue short-lived SSH certificates and route dev workers to key-cape, flex-auth, OpenBao, ops-bridge, and railiance components for everything else — implementing only the SSH certificate lane directly, pointing at the owner for the rest.


In Scope

Implemented (SSH lane)

  • Local CA backend (ssh-keygen -s)
  • OpenBao / Vault-compatible SSH engine backend (production-verified)
  • Actor identity registry (inventory.yaml)
  • cert_command: warden sign <actor> --pubkey <path> → cert on stdout
  • TTL enforcement per ActorType (adm 48 h, agt 24 h, atm 8 h)
  • warden status, cleanup, scorecard, signatures log
  • Opt-in flex-auth policy gate (policy.enabled, policy_decision_id in log)
  • Production flex-auth registry builder (scripts/build_flex_auth_registry.py, registry/flex-auth/production_registry_snapshot.json)
  • Policy gate smoke runner (scripts/policy_gate_production_smoke.sh)
  • warden route lookup CLI (list/show/find, --json) over the pointer catalog
  • warden access operator front door (WP-0014): advisory handoff for any need, and a transparent, policy-gated, audited proxy (--fetch/--exec) for exec_capable lanes (OpenBao secret reads, key-cape login) — caller identity, value never held
  • warden issue and ops-ssh-wrapper (local backend; vault uses sign-only)
  • ops-bridge cert_command readiness gate (scripts/check_tunnel_cert_readiness.py, WP-0016) — read-only preflight + opt-in offline contract smoke
  • Runbooks for OpenBao config and Inter-Hub bootstrap SSH envelope
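
The per-ActorType TTL enforcement in the list above can be sketched as a simple ceiling check. This is only an illustration: it assumes the listed values (adm 48 h, agt 24 h, atm 8 h) are maximum lifetimes and that shorter requests are allowed. Whether ops-warden clamps or rejects an over-long request is not stated here, and `effective_ttl` is a hypothetical helper.

```python
from datetime import timedelta
from typing import Optional

# Certificate lifetimes per ActorType, as listed above (treated as ceilings).
MAX_TTL = {
    "adm": timedelta(hours=48),
    "agt": timedelta(hours=24),
    "atm": timedelta(hours=8),
}


def effective_ttl(actor_type: str, requested: Optional[timedelta] = None) -> timedelta:
    """Return the TTL to sign with, never exceeding the ActorType ceiling."""
    if actor_type not in MAX_TTL:
        raise ValueError(f"unknown ActorType: {actor_type!r}")
    ceiling = MAX_TTL[actor_type]
    if requested is None:
        return ceiling
    if requested <= timedelta(0):
        raise ValueError("requested TTL must be positive")
    # Assumption: over-long requests are clamped rather than rejected.
    return min(requested, ceiling)


print(effective_ttl("adm", timedelta(hours=72)))
```

Clamping keeps callers working when they over-ask; a stricter variant would raise instead, which makes misconfigured callers visible sooner.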

Stewardship (documentation and alignment)

  • NetKingdom security routing guidance — which subsystem owns which credential type
  • Wiki and config references aligned with OpenBao-first platform standard
  • Capability registry entry for SSH certificate issuance
  • Routing pointer catalog (registry/routing/catalog.yaml)
  • Keeping ops access patterns consistent with net-kingdom platform architecture
  • Workload Security Posture standard (wiki/WorkloadSecurityPosture.md), machine-readable posture descriptors (registry/policy/security-posture.yaml), the read-only conformance checker, and the dev-tier contract-double library

Shipped workplans (archived)

| WP | Focus |
| --- | --- |
| WP-0001–0005 | Initial CLI, quality, hygiene, OpenBao docs, hub sync |
| WP-0006 | Credential routing, security map, inventory patterns, OpenBao checklist |
| WP-0007 | Opt-in flex-auth policy gate (policy.enabled) |
| WP-0008 | Production sign verification, stewardship closeout, archive hygiene |
| WP-0009 | flex-auth registry + policy smoke; pickup brief for FLEX-WP-0007 |
| WP-0010 | Access routing charter + pointer catalog |
| WP-0011 | warden route lookup CLI |
| WP-0012 | Routing scenario playbooks (catalog + wiki expansion) |
| WP-0013 | Production integration closeout: cert_command playbook, token hygiene, principals drift |
| WP-0014 | Operator access assist: warden access advisory + proxy front door |
| WP-0015 | Workload security posture: two-axis standard, descriptors, conformance checker, dev doubles |
| WP-0016 | ops-bridge cert_command pilot: readiness gate (check_tunnel_cert_readiness.py) + handoff |

Active / ready

None open. All ops-warden workplans are finished; the remaining distance is in other repos' lanes (see Known gaps).

Known gaps (not ops-warden workplans)

| Gap | Owner | Notes |
| --- | --- | --- |
| flex-auth production runtime + registry deploy | flex-auth | FLEX-WP-0007 (unblocks policy.enabled: true) |
| Vault-backed policy gate joint smoke | flex-auth + operator | Needs valid scoped VAULT_TOKEN |
| ops-bridge cert_command on live tunnels | ops-bridge | Playbook + readiness gate shipped (WP-0016); pilot cutover handed off, awaiting ops-bridge |
| Principals sync warden ↔ railiance-infra | ops-warden + infra | scripts/check_principals_drift.py (operator runs periodically) |
| NK-WP-0009 joint SSH tutorial | net-kingdom | Parallel coordination track |
| WP-0015 canon landing (generic WorkloadMaturityLevel + M0-M3 requirements) | net-kingdom + info-tech-canon | ops-warden drafted + offered (coordination msgs); owner-driven landing |

Out of Scope

  • Issuing or custodying non-SSH secrets (API keys, DB creds, S3 STS, Inter-Hub keys) → OpenBao with flex-auth policy where required; ops-warden documents paths and may proxy caller-authenticated exec_capable lanes only
  • Identity / OIDC / MFA → key-cape, Keycloak
  • Authorization policy decisions → flex-auth
  • flex-auth runtime deployment and secret-flow lattice enforcement → flex-auth (FLEX-WP-0007 and follow-ups)
  • Tunnel lifecycle → ops-bridge
  • Host principal deployment → railiance-infra
  • OpenBao / Vault cluster deployment → railiance-platform
  • Human admin SSH key generation (self-service ssh-keygen)
  • Session recording, SIEM, SSO / Teleport at scale

Relevant When

  • Issuing or refreshing an SSH cert for adm/agt/atm
  • A dev worker needs to know where to get credentials in the NetKingdom stack
  • An agent needs warden route find instead of re-deriving routing from wiki prose
  • ops-bridge needs a cert_command for a tunnel
  • Adding actors to the principals inventory (regenerate flex-auth registry snapshot)
  • Inter-Hub or bootstrap tasks need a short-lived agent SSH envelope
  • Checking cert-side compliance (scorecard)
  • Enabling or testing the opt-in flex-auth policy gate
  • Classifying whether a credential blocker is a dev/test double, owner-routed prod gate, or maturity/posture violation

Not Relevant When

  • Storing or vending API keys or runtime secrets (→ OpenBao)
  • Policy decisions on resource access (→ flex-auth)
  • Managing tunnels without SSH cert issuance (→ ops-bridge)
  • Static-key-only legacy access (ops-bridge static key mode)

Current State

  • SSH CLI: v0.1.0 — local + OpenBao backends
  • Production sign: verified 2026-06-18 (history/2026-06-17-openbao-production-verify.md)
  • Access routing: WP-0010 + WP-0011 shipped (warden route, pointer catalog)
  • Policy gate: caller shipped (WP-0007); registry + smoke complete (WP-0009 archived). policy.enabled: false until flex-auth reachable (FLEX-WP-0007)
  • Workload posture: WP-0015 shipped (standard, descriptors, warden policy, conformance checker, dev doubles); canon landing owner-driven
  • ops-bridge cert_command: WP-0016 shipped to pilot-ready (readiness gate + offline contract smoke + handoff); live cutover is ops-bridge's
  • Active work: none open in ops-warden; remaining distance is other repos' lanes
  • Integration docs: cert_command migration, token hygiene, principals drift (wiki/playbooks/)
  • Latest assessment: history/2026-06-24-intent-scope-gap-analysis.md

How It Fits (NetKingdom)

key-cape / Keycloak     identity claims
        → flex-auth     authorization decisions
        → OpenBao       runtime secrets & dynamic credentials
        → ops-warden    SSH certs + operational access guidance
        → ops-bridge    tunnel transport (cert_command consumer)
        → railiance-*   deployment and host enforcement

Upstream: OpenBao SSH engine (production) or local CA (labs). Actor inventory in operator config or Git-tracked patterns. flex-auth registry snapshot derived from inventory when policy gate is enabled.

Downstream: ops-bridge (primary), kaizen agents, CI automations, human operators.


Terminology

  • ActorType: adm | agt | atm
  • cert_command: shell command returning a cert on stdout
  • inventory.yaml: actor → principals + TTL registry
  • LocalCA / VaultCA: signing backends (backend: local | vault)
  • Pointer catalog: registry/routing/catalog.yaml — subsystem ownership lookup plus secret-free warden access handoff metadata
  • Workload Security Posture: env posture (dev/test/prod) plus maturity (M0-M3) used to decide whether a secret may flow to a workload
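
To make the cert_command term concrete, here is a minimal sketch of a consumer that runs such a command and reads the certificate from stdout. The `warden sign <actor> --pubkey <path>` form is taken from this file; the helper names, the timeout, and the assumption that a non-zero exit status signals failure are illustrative and not part of the documented contract (see wiki/CertCommandInterface.md for that).

```python
import subprocess
import sys
from typing import List


def build_cert_command(actor: str, pubkey_path: str) -> List[str]:
    """Argument vector for the documented `warden sign` form."""
    return ["warden", "sign", actor, "--pubkey", pubkey_path]


def run_cert_command(argv: List[str], timeout: float = 30.0) -> str:
    """Run a cert_command and return the certificate it prints on stdout."""
    # check=True raises CalledProcessError on a non-zero exit (assumed failure signal).
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True
    )
    cert = result.stdout.strip()
    if not cert:
        raise RuntimeError("cert_command produced no certificate on stdout")
    return cert


if __name__ == "__main__":
    # Stand-in command so the sketch runs without warden installed.
    fake = [sys.executable, "-c", "print('placeholder-certificate-text')"]
    print(run_cert_command(fake))
```

Passing the command as an argument vector rather than a shell string avoids quoting problems with actor names and key paths.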

| Repo | Relationship |
| --- | --- |
| net-kingdom | Canonical security architecture; ops-warden aligns to it |
| ops-bridge | Primary cert_command consumer |
| railiance-infra | Host-side SSH principals and hardening |
| railiance-platform | OpenBao deployment and platform secrets |
| flex-auth | Authorization; policy package shipped (FLEX-WP-0006); runtime deploy FLEX-WP-0007 |
| key-cape | Identity / IAM Profile lightweight mode |
| state-hub | Workstream registry |

Provided Capabilities

type: security
title: SSH certificate issuance
description: Issues short-lived CA-signed SSH certificates for adm/agt/atm actors via a
  pluggable cert_command interface; documents NetKingdom operational access routing;
  supports local CA and OpenBao/Vault-compatible SSH engine backends.
keywords: [ssh, certificate, ca, credential, warden, ops-warden, pki, openbao, vault, netkingdom]

type: security
title: Operator access front door (caller-identity fetch proxy)
description: warden access is the operator front door for any NetKingdom credential need.
  It renders the owner, auth method, path, and policy status, and for exec_capable lanes
  (OpenBao secret reads, key-cape OIDC login) proxies the fetch as the caller — running
  the owner's tool with the caller's identity and streaming the value to them. ops-warden
  takes no custody: it holds, caches, and logs no secret value (transparent conduit, not a
  broker). Use this to obtain an API key, DB credential, npm token, or login — not a State
  Hub message.
keywords: [access, credential, secret, npm, token, api-key, openbao, key-cape, login, proxy, fetch, exec, warden-access, front-door, routing]

Getting Oriented

| Read first | Purpose |
| --- | --- |
| INTENT.md | Why ops-warden exists and where it is going |
| SCOPE.md | What is implemented today (this file) |
| wiki/AccessRouting.md | What ops-warden issues vs routes vs assists (role and boundary) |
| wiki/OperatorAccessAssist.md | warden access front door + conduit-vs-broker boundary + guardrails |
| wiki/CredentialRouting.md | Which subsystem for each credential need |
| wiki/WorkloadSecurityPosture.md | Secret-store posture, workload maturity, and blocker triage |
| registry/routing/catalog.yaml | Machine-readable routing pointer catalog |
| wiki/NetKingdomSecurityMap.md | Platform security component map |
| examples/warden.production.example.yaml | Production warden.yaml template |
| wiki/PolicyGatedSigning.md | flex-auth opt-in gate + registry rollout |
| wiki/AccessManagementDirective.md | SSH actor model |
| wiki/OpsWardenConfig.md | warden.yaml and OpenBao |
| wiki/CertCommandInterface.md | cert_command contract |
| history/2026-06-24-intent-scope-gap-analysis.md | Current gap analysis + WP-0013 |
| history/2026-06-27-workload-security-posture-charter.md | WP-0015 posture/conformance charter |
| history/2026-06-18-post-wp0008-intent-scope-reassessment.md | SSH lane gap analysis |
| history/2026-06-18-access-routing-intent-shift-assessment.md | Routing charter decision |
| history/2026-06-23-flex-auth-policy-gate-production-smoke.md | Policy gate smoke evidence |
| net-kingdom/docs/platform-identity-security-architecture.md | Platform security canon |