feat(WARDEN-WP-0016): ops-bridge cert_command readiness gate + handoff

Close ops-warden's side of the last Partial INTENT criterion (ops-bridge integrates via a stable cert_command). The migration playbook and contract already existed; what was missing was an automated readiness gate before touching tunnel config.

T1 — scripts/check_tunnel_cert_readiness.py: read-only preflight that asserts the cert_command path is ready without signing — config/backend, actor inventory + TTL within type max, pubkey exists/parses/not-private, principals present, and optional host-principal deployment (mirrors check_principals_drift). Exit 0/1/2.

T2 — opt-in --sign-smoke: runs the cert_command against the local backend and validates identity/principals/TTL of the emitted cert; refuses a vault backend. Window measured from the cert's own valid_from->valid_before so it's timezone-robust (fixes a CEST off-by-2h artifact). integration-marked test + a vault-refusal unit test.

T3 — playbook now leads with Step 0 readiness gate; ops-bridge handoff message sent.

T4 — SCOPE INTENT row: Partial -> Pilot-ready; known-gaps + SSH-lane list updated.

9 unit + 1 integration test, 209 default passing, lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
6
SCOPE.md
@@ -75,7 +75,7 @@ Gap analysis: `history/2026-06-24-intent-scope-gap-analysis.md` (current);
 | --- | --- |
 | Worker knows which subsystem for each credential type | Met |
 | SSH short-lived, inventoried, audited | Met (production) |
-| ops-bridge integrates via stable `cert_command` | **Partial** — contract yes; tunnels still static-key |
+| ops-bridge integrates via stable `cert_command` | **Pilot-ready** — contract + readiness gate (`check_tunnel_cert_readiness.py`, WP-0016) shipped; live cutover handed to ops-bridge |
 | NetKingdom evolution reflected in docs | Met |
 | Non-SSH secrets stay out of ops-warden | Met |
 | Workload posture / maturity model for secret-flow blockers | Met — two-axis standard + descriptors + conformance checker + dev doubles (WP-0015) |
@@ -124,6 +124,8 @@ for the rest.
   transparent, policy-gated, audited **proxy** (`--fetch`/`--exec`) for `exec_capable`
   lanes (OpenBao secret reads, key-cape login) — caller identity, value never held
 - `warden issue` and `ops-ssh-wrapper` (local backend; vault uses sign-only)
+- ops-bridge cert_command readiness gate (`scripts/check_tunnel_cert_readiness.py`,
+  WP-0016) — read-only preflight + opt-in offline contract smoke
 - Runbooks for OpenBao config and Inter-Hub bootstrap SSH envelope

 ### Stewardship (documentation and alignment)
@@ -164,7 +166,7 @@ repos' lanes (see Known gaps).
 | --- | --- | --- |
 | flex-auth production runtime + registry deploy | flex-auth | **FLEX-WP-0007** — unblocks `policy.enabled: true` |
 | Vault-backed policy gate joint smoke | flex-auth + operator | Needs valid scoped `VAULT_TOKEN` |
-| ops-bridge `cert_command` on live tunnels | ops-bridge | Playbook shipped (`wiki/playbooks/ops-bridge-tunnel-cert.md`); pilot pending |
+| ops-bridge `cert_command` on live tunnels | ops-bridge | Playbook + readiness gate shipped (WP-0016); pilot cutover handed off, awaiting ops-bridge |
 | Principals sync warden ↔ railiance-infra | ops-warden + infra | `scripts/check_principals_drift.py` — operator runs periodically |
 | NK-WP-0009 joint SSH tutorial | net-kingdom | Parallel coordination track |
 | WP-0015 canon landing (generic `WorkloadMaturityLevel` + M0-M3 requirements) | net-kingdom + info-tech-canon | ops-warden drafted + offered (coordination msgs); owner-driven landing |
243
scripts/check_tunnel_cert_readiness.py
Normal file
@@ -0,0 +1,243 @@
#!/usr/bin/env python3
"""Read-only readiness gate for an ops-bridge cert_command pilot (WARDEN-WP-0016 T1).

Before an operator migrates a tunnel from a static SSH key to a warden-signed
certificate (see ``wiki/playbooks/ops-bridge-tunnel-cert.md``), this script asserts the
**ops-warden side is ready** — *without signing anything*:

  * warden.yaml loads and names a known backend (local | vault),
  * the actor exists in the inventory with a valid type and resolvable TTL,
  * the public key file exists and is structurally a public key (no private key),
  * the actor has at least one principal,
  * (optional) the actor's principals are deployed in railiance-infra's
    ``ssh_principals.yaml`` (mirrors ``scripts/check_principals_drift.py``).

Exit 0 = ready, 1 = not ready (a check failed), 2 = bad input (missing/invalid files).
It never signs, reads a private key, or prints a secret. The actual cert_command smoke
is the opt-in ``--sign-smoke`` step (WP-0016 T2), kept separate because it issues a cert.

Usage:
    python scripts/check_tunnel_cert_readiness.py \\
        --actor agt-state-hub-bridge \\
        --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub \\
        --config ~/.config/warden/warden.yaml \\
        [--infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml]
"""
from __future__ import annotations

import argparse
import sys
from pathlib import Path
from typing import Any, List, Optional, Tuple

_SRC = Path(__file__).resolve().parent.parent / "src"
if _SRC.is_dir() and str(_SRC) not in sys.path:
    sys.path.insert(0, str(_SRC))

import yaml  # noqa: E402

from warden.config import ConfigError, WardenConfig, load_config  # noqa: E402
from warden.inventory import ActorEntry, InventoryError, load_inventory  # noqa: E402
from warden.models import MAX_TTL_HOURS, CertSpec  # noqa: E402

# A check result: status in {"ok", "fail", "skip"}, a short label, and a detail line.
Check = Tuple[str, str, str]

# Public-key prefixes we accept for a cert_command pubkey (never a private key).
_PUBKEY_PREFIXES = ("ssh-ed25519 ", "ssh-rsa ", "ecdsa-sha2-", "sk-ssh-", "ssh-dss ")


def build_cert_command(actor: str, pubkey: Path) -> str:
    """The cert_command an ops-bridge tunnel config would carry for this actor."""
    return f"warden sign {actor} --pubkey {pubkey}"


def check_pubkey(pubkey: Path) -> Check:
    if not pubkey.exists():
        return ("fail", "public key", f"{pubkey} does not exist")
    text = pubkey.read_text(errors="replace").strip()
    if "PRIVATE KEY" in text:
        return ("fail", "public key", f"{pubkey} looks like a PRIVATE key — use the .pub")
    if not text.startswith(_PUBKEY_PREFIXES):
        return ("fail", "public key", f"{pubkey} is not a recognized SSH public key")
    return ("ok", "public key", f"{pubkey} ({text.split()[0]})")


def check_actor(inventory_actors: dict, actor: str) -> Tuple[Check, Optional[ActorEntry]]:
    entry = inventory_actors.get(actor)
    if entry is None:
        return (("fail", "inventory", f"actor {actor!r} not in inventory"), None)
    max_ttl = MAX_TTL_HOURS.get(entry.actor_type)
    if not entry.ttl_hours or entry.ttl_hours <= 0:
        return (("fail", "inventory", f"actor {actor!r} has no resolvable TTL"), entry)
    if max_ttl and entry.ttl_hours > max_ttl:
        return (
            ("fail", "inventory", f"actor {actor!r} TTL {entry.ttl_hours}h exceeds "
             f"{entry.actor_type.value} max {max_ttl}h"),
            entry,
        )
    return (
        ("ok", "inventory", f"{actor} type={entry.actor_type.value} ttl={entry.ttl_hours}h"),
        entry,
    )


def check_principals(entry: ActorEntry) -> Check:
    if not entry.principals:
        return ("fail", "principals", f"actor {entry.name!r} has no principals")
    return ("ok", "principals", ", ".join(entry.principals))


def _infra_principals(infra: dict[str, Any]) -> set[str]:
    # Mirrors scripts/check_principals_drift.py._infra_principals.
    principals: set[str] = set()
    for host_data in (infra.get("ssh_principals") or {}).values():
        for user_principals in (host_data.get("users") or {}).values():
            principals.update(user_principals)
    return principals


def check_infra_principal(entry: ActorEntry, infra_path: Optional[Path]) -> Check:
    if infra_path is None:
        return ("skip", "infra principals", "no --infra given (host-side check skipped)")
    if not infra_path.exists():
        return ("fail", "infra principals", f"{infra_path} not found")
    infra = yaml.safe_load(infra_path.read_text()) or {}
    deployed = _infra_principals(infra)
    missing = [p for p in entry.principals if p not in deployed]
    if missing:
        return (
            "fail",
            "infra principals",
            f"not deployed in {infra_path.name}: {', '.join(missing)}",
        )
    return ("ok", "infra principals", f"all deployed in {infra_path.name}")


def run_checks(
    cfg: WardenConfig,
    actor: str,
    pubkey: Path,
    infra_path: Optional[Path],
) -> List[Check]:
    """Run every readiness check and return the result list (pure; no signing)."""
    checks: List[Check] = [
        ("ok", "config", f"backend={cfg.backend}, inventory={cfg.inventory_path}")
    ]
    inventory = load_inventory(cfg.inventory_path)
    actor_check, entry = check_actor(inventory.actors, actor)
    checks.append(actor_check)
    checks.append(check_pubkey(pubkey))
    if entry is not None:
        checks.append(check_principals(entry))
        checks.append(check_infra_principal(entry, infra_path))
    return checks


def sign_smoke(cfg: WardenConfig, actor: str, pubkey: Path) -> List[Check]:
    """Opt-in cert_command contract smoke against the LOCAL backend (WP-0016 T2).

    Actually runs the cert_command (issues a short-lived local cert) and validates the
    emitted certificate: identity matches the actor, principals match inventory, and the
    validity window is within the actor type's max TTL. Requires ``ssh-keygen`` and a
    local backend — it must not touch production OpenBao. Raises on misuse.
    """
    from warden.ca import CAError, LocalCA, parse_cert_metadata

    if cfg.backend != "local":
        raise ValueError(
            f"--sign-smoke runs offline against the local backend, but config backend is "
            f"{cfg.backend!r}. Point --config at a local warden.yaml for the smoke."
        )
    inventory = load_inventory(cfg.inventory_path)
    entry = inventory.actors.get(actor)
    if entry is None:
        return [("fail", "sign smoke", f"actor {actor!r} not in inventory")]

    spec = CertSpec(
        actor_name=actor,
        actor_type=entry.actor_type,
        pubkey_path=pubkey,
        ttl_hours=entry.ttl_hours,
        principals=entry.principals,
        identity=actor,
    )
    try:
        record = LocalCA(cfg.ca_key, cfg.state_dir).sign(spec)
    except CAError as e:
        return [("fail", "sign smoke", f"signing failed: {e}")]

    checks: List[Check] = []
    if record.identity == actor:
        checks.append(("ok", "cert identity", record.identity))
    else:
        checks.append(("fail", "cert identity", f"{record.identity!r} != {actor!r}"))

    if set(record.principals) == set(entry.principals):
        checks.append(("ok", "cert principals", ", ".join(record.principals)))
    else:
        checks.append(
            ("fail", "cert principals", f"{record.principals} != inventory {entry.principals}")
        )

    # Measure the validity window from the cert's own from→to so it is independent of
    # how ssh-keygen renders the timezone (parse_cert_metadata reads both the same way).
    max_ttl = MAX_TTL_HOURS.get(entry.actor_type)
    meta = parse_cert_metadata(record.cert_path)
    valid_from = meta.get("valid_from")
    if valid_from is None:
        window_h = (record.valid_before - record.signed_at).total_seconds() / 3600
    else:
        window_h = (meta["valid_before"] - valid_from).total_seconds() / 3600
    if max_ttl is None or window_h <= max_ttl + 0.1:
        checks.append(("ok", "cert validity", f"~{window_h:.1f}h (max {max_ttl}h)"))
    else:
        checks.append(("fail", "cert validity", f"~{window_h:.1f}h exceeds max {max_ttl}h"))
    return checks


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--actor", required=True)
    parser.add_argument("--pubkey", type=Path, required=True)
    parser.add_argument("--config", type=Path, default=None, help="warden.yaml (or WARDEN_CONFIG)")
    parser.add_argument("--infra", type=Path, default=None, help="railiance-infra ssh_principals.yaml")
    parser.add_argument(
        "--sign-smoke",
        action="store_true",
        help="Also run the cert_command against the local backend and validate the cert (WP-0016 T2)",
    )
    args = parser.parse_args()

    try:
        cfg = load_config(args.config)
    except ConfigError as e:
        print(f"config error: {e}", file=sys.stderr)
        return 2
    pubkey = args.pubkey.expanduser()
    try:
        checks = run_checks(cfg, args.actor, pubkey, args.infra)
        if args.sign_smoke:
            checks += sign_smoke(cfg, args.actor, pubkey)
    except (InventoryError, ValueError, yaml.YAMLError) as e:
        print(f"input error: {e}", file=sys.stderr)
        return 2

    glyph = {"ok": "✓", "fail": "✗", "skip": "·"}
    print(f"cert_command readiness — actor {args.actor!r}\n")
    for status, label, detail in checks:
        print(f" {glyph[status]} {label}: {detail}")
    print(f"\n cert_command: {build_cert_command(args.actor, args.pubkey)}")

    failed = [c for c in checks if c[0] == "fail"]
    if failed:
        print(f"\nNOT READY — {len(failed)} check(s) failed. See "
              "wiki/playbooks/ops-bridge-tunnel-cert.md")
        return 1
    print("\nREADY — ops-warden side is set. Next: cert_command smoke (--sign-smoke), "
          "then hand the cutover to ops-bridge.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
128
tests/test_tunnel_cert_readiness.py
Normal file
@@ -0,0 +1,128 @@
"""Tests for the ops-bridge cert_command readiness gate (WARDEN-WP-0016 T1/T2)."""
from __future__ import annotations

import importlib.util
import shutil
import subprocess
from pathlib import Path

import pytest

from warden.config import WardenConfig

_SCRIPT = Path(__file__).resolve().parent.parent / "scripts" / "check_tunnel_cert_readiness.py"
_spec = importlib.util.spec_from_file_location("check_tunnel_cert_readiness", _SCRIPT)
readiness = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(readiness)

PUBKEY = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFakeKeyMaterialForTestsOnly comment\n"


def _status(checks, label):
    return next(s for s, lab, _ in checks if lab == label)


@pytest.fixture
def setup(tmp_path):
    inv = tmp_path / "inventory.yaml"
    inv.write_text(
        "actors:\n"
        "  agt-state-hub-bridge:\n"
        "    type: agt\n"
        "    principals: [agt-task-bridge]\n"
        "    ttl_hours: 24\n"
    )
    pub = tmp_path / "agt.pub"
    pub.write_text(PUBKEY)
    cfg = WardenConfig(
        backend="local",
        ca_key=tmp_path / "ca",
        inventory_path=inv,
        state_dir=tmp_path / "state",
    )
    return cfg, pub, tmp_path


def test_all_ready(setup):
    cfg, pub, _ = setup
    checks = readiness.run_checks(cfg, "agt-state-hub-bridge", pub, None)
    assert _status(checks, "inventory") == "ok"
    assert _status(checks, "public key") == "ok"
    assert _status(checks, "principals") == "ok"
    assert _status(checks, "infra principals") == "skip"  # no --infra


def test_unknown_actor_fails(setup):
    cfg, pub, _ = setup
    checks = readiness.run_checks(cfg, "agt-ghost", pub, None)
    assert _status(checks, "inventory") == "fail"


def test_missing_pubkey_fails(setup):
    cfg, _, tmp = setup
    checks = readiness.run_checks(cfg, "agt-state-hub-bridge", tmp / "nope.pub", None)
    assert _status(checks, "public key") == "fail"


def test_private_key_rejected(setup):
    cfg, _, tmp = setup
    priv = tmp / "id.pub"
    priv.write_text("-----BEGIN OPENSSH PRIVATE KEY-----\nxxx\n-----END OPENSSH PRIVATE KEY-----\n")
    checks = readiness.run_checks(cfg, "agt-state-hub-bridge", priv, None)
    assert _status(checks, "public key") == "fail"


def test_infra_principal_missing(setup):
    cfg, pub, tmp = setup
    infra = tmp / "ssh_principals.yaml"
    infra.write_text("ssh_principals:\n  host1:\n    users:\n      agt: [some-other-principal]\n")
    checks = readiness.run_checks(cfg, "agt-state-hub-bridge", pub, infra)
    assert _status(checks, "infra principals") == "fail"


def test_infra_principal_present(setup):
    cfg, pub, tmp = setup
    infra = tmp / "ssh_principals.yaml"
    infra.write_text("ssh_principals:\n  host1:\n    users:\n      agt: [agt-task-bridge]\n")
    checks = readiness.run_checks(cfg, "agt-state-hub-bridge", pub, infra)
    assert _status(checks, "infra principals") == "ok"


def test_ttl_over_max_fails(tmp_path):
    inv = tmp_path / "inventory.yaml"
    # agt max TTL is 24h. load_inventory preserves an over-limit value instead of
    # clamping it, so the readiness check is what flags it.
    inv.write_text("actors:\n  agt-x:\n    type: agt\n    principals: [p]\n    ttl_hours: 999\n")
    pub = tmp_path / "k.pub"
    pub.write_text(PUBKEY)
    cfg = WardenConfig(backend="local", ca_key=tmp_path / "ca", inventory_path=inv, state_dir=tmp_path)
    checks = readiness.run_checks(cfg, "agt-x", pub, None)
    assert _status(checks, "inventory") == "fail"


def test_build_cert_command():
    cmd = readiness.build_cert_command("agt-state-hub-bridge", Path("/k.pub"))
    assert cmd == "warden sign agt-state-hub-bridge --pubkey /k.pub"


def test_sign_smoke_rejects_vault_backend(tmp_path):
    cfg = WardenConfig(backend="vault", inventory_path=tmp_path / "i.yaml", state_dir=tmp_path)
    with pytest.raises(ValueError, match="local backend"):
        readiness.sign_smoke(cfg, "agt-x", tmp_path / "k.pub")


@pytest.mark.integration
def test_sign_smoke_validates_real_cert(setup):
    """Opt-in: requires ssh-keygen. Issues a real local cert and validates it."""
    if shutil.which("ssh-keygen") is None:
        pytest.skip("ssh-keygen not available")
    cfg, _, tmp = setup
    # Generate a real CA key and a real actor pubkey.
    ca = tmp / "ca"
    subprocess.run(["ssh-keygen", "-t", "ed25519", "-f", str(ca), "-N", "", "-q"], check=True)
    actor_key = tmp / "actor"
    subprocess.run(["ssh-keygen", "-t", "ed25519", "-f", str(actor_key), "-N", "", "-q"], check=True)
    checks = readiness.sign_smoke(cfg, "agt-state-hub-bridge", actor_key.with_suffix(".pub"))
    statuses = {lab: s for s, lab, _ in checks}
    assert statuses.get("cert identity") == "ok"
    assert statuses.get("cert principals") == "ok"
    assert statuses.get("cert validity") == "ok"
@@ -11,6 +11,28 @@ ops-warden documents the migration; **ops-bridge** owns tunnel config changes.

---

## Step 0 — Readiness gate (run this first)

Before editing any tunnel config, run the read-only readiness gate (WARDEN-WP-0016).
It confirms ops-warden's side is set — actor inventory, TTL, public key, and (optionally)
host principals — **without signing anything**:

```bash
python scripts/check_tunnel_cert_readiness.py \
  --actor agt-state-hub-bridge \
  --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub \
  --config ~/.config/warden/warden.yaml \
  --infra ~/railiance-infra/ansible/inventory/ssh_principals.yaml
```

Exit 0 = ready, 1 = a check failed (fix before proceeding), 2 = bad input. The
Prerequisites and Migration checklist below are the human-readable backing for what the
gate verifies. To additionally prove the `cert_command` contract end to end against a
**local** backend (issues a throwaway cert, validates identity/principals/TTL), add
`--sign-smoke` with a local `warden.yaml`.
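
The exit-code contract above can be consumed by a small wrapper before any cutover automation runs. The following is a minimal sketch, not part of this commit: `gate_verdict`, `may_proceed`, and the `VERDICTS` table are hypothetical names, and the stand-in child process only simulates a gate that exits with 1.

```python
import subprocess
import sys

# Exit-code meanings as documented for the readiness gate.
VERDICTS = {0: "ready", 1: "not ready", 2: "bad input"}


def gate_verdict(argv: list[str]) -> str:
    """Run a command and translate its exit code into a verdict string."""
    completed = subprocess.run(argv, check=False)
    code = completed.returncode
    return VERDICTS.get(code, f"unexpected exit code {code}")


def may_proceed(verdict: str) -> bool:
    """Only a clean 'ready' verdict allows the operator to continue."""
    return verdict == "ready"


# Stand-in for the real gate invocation: a child process that exits with 1.
demo_verdict = gate_verdict([sys.executable, "-c", "raise SystemExit(1)"])
print(demo_verdict, may_proceed(demo_verdict))
```

In real use, `argv` would be the `check_tunnel_cert_readiness.py` invocation shown in the bash block; treating any code other than 0 as a stop keeps unknown failures from being read as success.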

---

## Prerequisites

- [ ] Actor registered in `~/.config/warden/inventory.yaml` (see `wiki/ActorInventoryPatterns.md`)

@@ -9,6 +9,7 @@ owner: claude
topic_slug: custodian
created: "2026-06-27"
updated: "2026-06-27"
state_hub_workstream_id: "142b171b-c34b-4a45-91a5-c77e6d07ec6f"
---

# Ad Hoc Tasks — 2026-06-27
@@ -21,6 +22,7 @@ Low-risk opportunistic fixes completed directly during the consolidation session
id: ADHOC-2026-06-27-T01
status: done
priority: medium
state_hub_task_id: "867c72c9-9904-400f-8542-04264e5856c2"
```

issue-core reported (msg `70bcf238`) that the `warden` CLI on `~/.local/bin` lacked

124
workplans/WARDEN-WP-0016-ops-bridge-tunnel-cert-pilot.md
Normal file
@@ -0,0 +1,124 @@
---
id: WARDEN-WP-0016
type: workplan
title: "ops-bridge cert_command pilot — readiness gate + handoff"
domain: infotech
repo: ops-warden
status: finished
owner: claude
topic_slug: custodian
planning_priority: high
planning_order: 16
created: "2026-06-27"
updated: "2026-06-27"
state_hub_workstream_id: "a56da8db-38bc-4bbe-8671-823360ec9245"
---

# WARDEN-WP-0016 — ops-bridge cert_command pilot (readiness gate + handoff)

**Scope:** Close ops-warden's side of the last **Partial** INTENT criterion — *"ops-bridge
integrates via a stable `cert_command`"*. The migration playbook
(`wiki/playbooks/ops-bridge-tunnel-cert.md`, WP-0013) and the `cert_command` contract
(`wiki/CertCommandInterface.md`) already exist, but the pilot has never been run because
the readiness checks are scattered manual checkboxes across three owners (ops-warden,
ops-bridge, railiance-infra). This WP ships the **automated readiness gate** an operator
runs *before* touching tunnel config, plus an offline `cert_command` contract smoke, and
hands the verified pilot to ops-bridge.

**Boundary (unchanged):** ops-warden issues certs and verifies its own side is ready.
The **live tunnel cutover is ops-bridge's to execute** — this WP does not (cannot) flip a
running tunnel. "Done" here means *pilot-ready and handed off*, not *tunnel migrated*.

**Out of scope:** editing `~/.config/bridge/tunnels.yaml` (ops-bridge owns it); deploying
host principals (railiance-infra); requiring a live OpenBao token for the contract smoke
(use the local backend).

**Depends on:** WP-0013 (playbook + contract), the SSH lane (prod-verified).

---

## Tasks

### T1 — Read-only `cert_command` readiness preflight

```task
id: WARDEN-WP-0016-T01
status: done
priority: high
state_hub_task_id: "fea84495-dbec-480a-b42b-90e39f414b78"
```

- [x] `scripts/check_tunnel_cert_readiness.py` — given `--actor`, `--pubkey`, `--config`
  (warden.yaml) and optional `--infra` (ssh_principals.yaml), asserts the cert_command
  path is ready **without signing anything**: config loads + backend known; actor in
  inventory with a valid type + TTL within the type max; pubkey file exists, parses,
  and is not a private key; actor principals present; (optional) principals deployed
  in the infra file (mirrors `check_principals_drift._infra_principals`). Exit 0/1/2.
- [x] Checklist-style report (✓/✗/·); never prints a private key or token.
- [x] Tests: `tests/test_tunnel_cert_readiness.py` (ready, unknown actor, missing/private
  pubkey, infra present/missing, TTL-over-max, cert_command string). 9 unit cases.
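
The optional host-side check in T1 reduces to flattening every principal in the infra file and reporting which of the actor's principals are absent. A minimal sketch of that idea follows, working on an already-parsed mapping so it needs no YAML library. The `ssh_principals -> host -> users -> user -> [principals]` layout is taken from the test fixtures in this commit; `sample_infra` and `missing_principals` are hypothetical names used only for illustration.

```python
from typing import Any


def infra_principals(infra: dict[str, Any]) -> set[str]:
    """Collect every principal listed under ssh_principals.<host>.users.<user>."""
    principals: set[str] = set()
    for host_data in (infra.get("ssh_principals") or {}).values():
        # Tolerate hosts with no body or no users section.
        for user_principals in ((host_data or {}).get("users") or {}).values():
            principals.update(user_principals or [])
    return principals


def missing_principals(actor_principals: list[str], infra: dict[str, Any]) -> list[str]:
    """Return the actor's principals that no host in the infra data deploys."""
    deployed = infra_principals(infra)
    return [p for p in actor_principals if p not in deployed]


# Hypothetical parsed content of an ssh_principals.yaml file.
sample_infra = {
    "ssh_principals": {
        "host1": {"users": {"agt": ["agt-task-bridge"]}},
        "host2": {"users": {"ops": ["ops-admin"]}},
    }
}

print(missing_principals(["agt-task-bridge", "agt-other"], sample_infra))
```

Note that this treats a principal as deployed if any host lists it; it does not check that the specific tunnel target host carries it.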

### T2 — Offline cert_command contract smoke

```task
id: WARDEN-WP-0016-T02
status: done
priority: medium
state_hub_task_id: "e34ae1a8-2ba9-4324-8d1a-005d61dae478"
```

- [x] Opt-in `--sign-smoke` mode runs the actual `cert_command` against the **local**
  backend and validates the emitted cert: identity matches the actor, principals match
  inventory, validity window within the type's max TTL. Refuses a vault backend (must
  be offline). Proves the contract end to end with no live OpenBao.
- [x] Window measured from the cert's own `valid_from`→`valid_before` (via
  `parse_cert_metadata`) so it is timezone-robust — fixes a CEST off-by-2h artifact
  where local-time ssh-keygen output was read as UTC.
- [x] `integration`-marked test (needs `ssh-keygen`, skipped in the default suite) plus a
  non-integration test that `--sign-smoke` refuses a vault backend.
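
The off-by-2h artifact described in T2 can be reproduced with plain `datetime` arithmetic. This is an illustrative sketch under assumed values, not the project's `parse_cert_metadata`: the timestamps are made up, and the local zone is taken to be CEST (UTC+2). It compares a window measured between the cert's own two timestamps with one measured from a true-UTC signing time to a local-time end that was misread as UTC.

```python
from datetime import datetime, timezone

STAMP = "%Y-%m-%dT%H:%M:%S"


def window_hours(valid_from: datetime, valid_before: datetime) -> float:
    """Length of a validity window in hours."""
    return (valid_before - valid_from).total_seconds() / 3600


# Hypothetical validity timestamps as printed in local time (CEST, UTC+2).
printed_from = datetime.strptime("2026-06-27T10:00:00", STAMP)
printed_before = datetime.strptime("2026-06-28T10:00:00", STAMP)

# Robust: both ends come from the cert and are parsed the same way, so any
# constant timezone offset cancels out in the subtraction.
robust = window_hours(printed_from, printed_before)

# Fragile: the signing time is recorded in real UTC (08:00Z is 10:00 CEST),
# but the cert's local-time end is labeled as UTC without conversion.
signed_at_utc = datetime(2026, 6, 27, 8, 0, tzinfo=timezone.utc)
misread_before = printed_before.replace(tzinfo=timezone.utc)
fragile = window_hours(signed_at_utc, misread_before)

print(robust, fragile)  # prints: 24.0 26.0
```

A 24h certificate would fail a 24h maximum under the fragile measurement even though the cert itself is within limits. The subtraction only cancels a constant offset, so a window that spans a daylight-saving change could still differ by an hour.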

### T3 — Playbook gate + ops-bridge handoff

```task
id: WARDEN-WP-0016-T03
status: done
priority: medium
state_hub_task_id: "330e01f4-4927-4280-b0e0-49d35b4416d6"
```

- [x] `wiki/playbooks/ops-bridge-tunnel-cert.md` now leads with **Step 0 — Readiness gate**
  (the exact `check_tunnel_cert_readiness.py` invocation + `--sign-smoke` note); the
  manual checklist remains as the human-readable backing.
- [x] Sent ops-bridge the coordination handoff (pilot `agt-state-hub-bridge`, the
  readiness-gate command, and the cutover steps ops-bridge owns).

### T4 — INTENT/SCOPE alignment

```task
id: WARDEN-WP-0016-T04
status: done
priority: low
state_hub_task_id: "4726f5bb-4ffd-484f-8674-91ee5658434f"
```

- [x] SCOPE: INTENT gap row moved from "Partial — tunnels still static-key" to
  "Pilot-ready — readiness gate shipped; live cutover handed to ops-bridge"; known-gaps
  row updated; readiness script added to the implemented SSH-lane list.

---

## Acceptance

- `scripts/check_tunnel_cert_readiness.py` gates the pilot read-only and is tested.
- The offline contract smoke validates a real cert against the local backend.
- The playbook leads with the automated gate; ops-bridge has the handoff with exact steps.
- No secret material in any script, test, doc, or log. ops-warden's boundary is intact:
  it verifies and hands off; ops-bridge executes the cutover.

---

## See also

- `wiki/playbooks/ops-bridge-tunnel-cert.md`, `wiki/CertCommandInterface.md`
- `scripts/check_principals_drift.py` (reused helpers)
- `history/2026-06-24-intent-scope-gap-analysis.md`