Implement WP-0022 audit trail and WP-0023 INTENT–SCOPE closeout

Add unified metadata-only audit.jsonl with secret-material guard, instrument
sign/access/worker paths, and expose warden activity CLI. Surface broker hint
when VAULT_TOKEN is unset, refresh INTENT/SCOPE docs, and add production
integration checklists plus catalog lane promotion playbook.
This commit is contained in:
2026-07-01 23:32:38 +02:00
parent f47d632d8e
commit d6088e4e16
18 changed files with 875 additions and 59 deletions

View File

@@ -54,8 +54,11 @@ owns one lane and points at the rest:
restating them. Beyond pointing, **assist**: the `warden access` front door renders
the exact auth method, path, and command for any need and — for `exec_capable`
lanes — proxies the fetch *as the caller* (a transparent, policy-gated, audited
conduit that holds, caches, and logs **nothing**). This is the assist layer, not a
broker: custody stays in OpenBao, authorization in flex-auth.
conduit that holds, caches, and logs **nothing**). For **owner-native exec** lanes
(secrets-engine `exec`, railiance-platform `credential exec`) ops-warden routes to
the owner's front door — it does not mint tokens or run the owner's tool itself.
This is the assist layer, not a universal broker: custody stays in OpenBao /
secrets-engine / the platform broker; authorization in flex-auth.
3. **Steward workload security posture conformance.** Author the ops-security slice
for environment posture (`dev/test/prod`) and workload maturity (`M0-M3`), then
ship descriptors and read-only checks that identify whether a secret-flow blocker
@@ -68,8 +71,9 @@ owns one lane and points at the rest:
host or ops reachability requires the SSH lane — via `warden sign`,
`cert_command`, and `ops-ssh-wrapper`. This is the **only** lane ops-warden
executes with its own authority.
6. **Audit** SSH signing operations and cert-side compliance so gatekeeping is
observable, not tribal knowledge.
6. **Audit** every ops-warden action — SSH signs, access proxy handoffs, worker
coordination ticks — in one metadata-only trail (`warden activity`) so
gatekeeping is observable, not tribal knowledge.
---
@@ -81,12 +85,14 @@ ops-warden should be fluent in the platform architecture documented in
| Plane / component | Role in access | ops-warden relationship |
| --- | --- | --- |
| **key-cape / Keycloak** | Identity — who is the actor, MFA, IAM Profile claims | Instruct identity path; do not re-implement OIDC |
| **flex-auth + Topaz** | Authorization — may this actor perform this action | Future policy gate before SSH issuance; document integration |
| **OpenBao** | Runtime secrets — API keys, dynamic creds, leases, audit | Instruct secret custody paths; SSH engine is signing backend only |
| **flex-auth + Topaz** | Authorization — may this actor perform this action | Caller-side policy gate shipped (opt-in); production flip is flex-auth's |
| **OpenBao** | Runtime secrets — API keys, dynamic creds, leases, audit | Instruct custody paths; SSH engine is signing backend only; proxy reads as caller when `exec_capable` |
| **secrets-engine** | Owner-native secret-exec (`secrets-engine exec`) | Route provisioned exec lanes (e.g. npm publish); ops-warden does not hold tokens |
| **railiance-platform** (credential broker) | Scoped lease grants (`credential exec`) | Route `warden-sign` token needs; ops-warden does not mint OpenBao tokens |
| **ops-warden** | Operational SSH certificates — short-lived host access | **Own and issue** this lane |
| **ops-bridge** | Tunnel transport — consumes certs via `cert_command` | Primary consumer; document integration |
| **railiance-infra** | Host principals, force-command, SSH hardening | Instruct host-side deployment; do not own Ansible |
| **railiance-platform** | OpenBao/K8s/platform service deployment | Instruct production endpoints; do not deploy clusters |
| **railiance-platform** (deploy) | OpenBao/K8s/platform service deployment | Instruct production endpoints; do not deploy clusters |
Canonical references:
@@ -102,11 +108,13 @@ Canonical references:
- NetKingdom-aligned **operational SSH access** guidance and stewardship
- **SSH certificate issuance** for registered `adm` / `agt` / `atm` actors
- Actor inventory, TTL/principal policy, cert-side scorecard, signatures log
- Actor inventory, TTL/principal policy, cert-side scorecard, unified audit trail
- `cert_command` contract and `ops-ssh-wrapper` automation surface
- Keeping ops-warden docs and patterns aligned with NetKingdom security evolution
- Workload Security Posture draft, conformance descriptors/checks, and dev-tier
- Workload Security Posture standard, conformance descriptors/checks, and dev-tier
contract-double guidance for secret-flow readiness
- Coordination worker stewardship — triage ops-warden's State Hub inbox with
conservative defaults (draft-only unless `--full-auto`)
### ops-warden instructs but does not own
@@ -158,8 +166,9 @@ scorecard checks, inventory patterns, and future policy-integration hooks.
### 6. Observable gatekeeping
Every successful SSH sign is auditable (`signatures.log`). Compliance checks
(scorecard) make cert-side policy violations visible before they become incidents.
Every ops-warden action appends metadata-only audit events; `warden activity`
answers *what happened recently* in one command. Compliance checks (scorecard) make
cert-side policy violations visible before they become incidents.
---
@@ -169,23 +178,31 @@ Every successful SSH sign is auditable (`signatures.log`). Compliance checks
Development worker needs access
|
v
ops-warden (issue SSH; route the rest)
ops-warden (issue SSH; route / assist the rest)
|
+-- SSH host / ops reachability? ----> warden sign / cert_command
+-- SSH host / ops reachability? --------> warden sign / cert_command
| (OpenBao SSH engine; scoped token via credential broker)
|
+-- Runtime API / platform secret? --> OpenBao path (documented)
+-- Owner-native secret exec? -----------> secrets-engine exec
| (e.g. npm publish) or railiance-platform credential exec
|
+-- Authorization required? ---------> flex-auth decision (future hook)
+-- Generic API / DB / provider secret? -> OpenBao path
| (warden access proxies as caller when exec_capable)
|
+-- Identity / MFA required? --------> key-cape / Keycloak path
+-- Authorization required? ------------> flex-auth decision
| (caller-side gate on sign + access when policy.enabled)
|
+-- Tunnel only? --------------------> ops-bridge + cert_command
+-- Identity / MFA required? -------------> key-cape / Keycloak path
|
+-- Tunnel only? ------------------------> ops-bridge + cert_command
```
The steward role spans documentation, runbooks, the SSH CLI, the machine-readable
routing catalog with `warden route` lookup, policy-gated issuance, and — since
WARDEN-WP-0014 — the `warden access` assist layer that advises and (for `exec_capable`
lanes) proxies non-SSH fetches as the caller without holding the value.
routing catalog with `warden route` lookup, policy-gated issuance, workload posture
conformance, the coordination worker, unified audit (`warden activity`), and — since
WARDEN-WP-0014 — the `warden access` assist layer that advises, routes owner-native
exec lanes, and (for generic `exec_capable` lanes) proxies fetches as the caller
without holding the value.
---
@@ -246,6 +263,8 @@ platform boundaries.
See `wiki/CredentialRouting.md` for worker-facing routing,
`wiki/WorkloadSecurityPosture.md` for the posture/maturity conformance model,
`wiki/NetKingdomSecurityMap.md` for component literacy,
`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` for the latest
gap analysis (production SSH path verified), and archived workplans WP-00060008
for stewardship and production closeout execution.
`wiki/AuditTrail.md` for the unified activity log,
`history/2026-07-01-intent-scope-gap-analysis.md` for the latest gap analysis,
`history/2026-06-18-post-wp0008-intent-scope-reassessment.md` for the SSH lane
reassessment, and archived workplans WP-00060008 for stewardship and production
closeout execution.

View File

@@ -59,9 +59,10 @@ contract smoke (`--sign-smoke`); the playbook leads with the gate and the pilot
(`agt-state-hub-bridge`) is handed to ops-bridge. The live tunnel cutover is
ops-bridge's to execute.
**INTENT alignment:** SSH issuance mission met in production. All ops-warden workplans
are finished. Remaining distance is in other repos' lanes: ops-bridge running the
cert_command pilot cutover, flex-auth runtime deployment (FLEX-WP-0007, unblocks
**INTENT alignment:** SSH issuance mission met in production. ops-warden workplans
through WP-0021 are finished; WP-0022 (audit) and WP-0023 (INTENTSCOPE closeout)
ship in July 2026. Remaining distance is in other repos' lanes: ops-bridge running
the cert_command pilot cutover, flex-auth runtime deployment (FLEX-WP-0007, unblocks
`policy.enabled: true`), and the owner-driven WP-0015 canon landing — plus ongoing
operator hygiene.
@@ -159,7 +160,11 @@ for the rest.
`ops-warden-warden-sign-token` and playbook
`wiki/playbooks/ops-warden-warden-sign-token.md` — routes `VAULT_TOKEN` needs to
`railiance-platform/scripts/credential.py exec --grant ops-warden/warden-sign`
(preferred over manual `export VAULT_TOKEN`)
(preferred over manual `export VAULT_TOKEN`); `warden sign` emits broker hint when
token env is unset (WP-0023)
- **Unified audit trail** (WP-0022): append-only `audit.jsonl`, secret-material guard,
instrumentation on sign/access/worker paths, `warden activity` CLI merging legacy
logs + optional State Hub notes (`wiki/AuditTrail.md`)
### Stewardship (documentation and alignment)
@@ -189,12 +194,12 @@ for the rest.
| WP-0015 | Workload security posture — two-axis standard, descriptors, conformance checker, dev doubles |
| WP-0016 | ops-bridge cert_command pilot — readiness gate (`check_tunnel_cert_readiness.py`) + handoff |
### Active / ready
### Recently shipped (July 2026)
| WP | Focus | Status |
| --- | --- | --- |
| WP-0022 | Unified audit trail + `warden activity` | `ready` |
| WP-0023 | INTENTSCOPE alignment closeout | `ready` |
| WP | Focus |
| --- | --- |
| WP-0022 | Unified audit trail + `warden activity` |
| WP-0023 | INTENTSCOPE alignment closeout |
Remaining production distance is also in other repos' lanes (see Known gaps).
@@ -276,11 +281,15 @@ Remaining production distance is also in other repos' lanes (see Known gaps).
`wiki/playbooks/ops-warden-warden-sign-token.md` (RAILIANCE-WP-0005 T08) — live
`make credential-exec-ops-warden-smoke` proven 2026-07-01; manual `export VAULT_TOKEN`
documented as fallback only
- **Active work:** none open in ops-warden; remaining distance is other repos' lanes
- **Audit + activity:** WP-0022 shipped — `warden activity`, `wiki/AuditTrail.md`
- **INTENT closeout:** WP-0023 shipped — INTENT refresh, production flip/cutover
checklists, catalog promotion cadence, broker hint on missing `VAULT_TOKEN`
- **Active work:** none open in ops-warden after WP-0022/0023; remaining distance is
other repos' lanes
- **Integration docs:** cert_command migration, token hygiene (broker-first), principals
drift (`wiki/playbooks/`)
- **Latest assessment:** `history/2026-07-01-intent-scope-gap-analysis.md`
- **Active workplans:** WP-0022 (audit), WP-0023 (INTENTSCOPE closeout)
- **Latest workplans:** WP-0022 (audit), WP-0023 (INTENTSCOPE closeout) — shipped July 2026
---
@@ -376,6 +385,8 @@ keywords: [access, credential, secret, npm, token, api-key, openbao, key-cape, l
| `wiki/OpsWardenConfig.md` | warden.yaml and OpenBao |
| `wiki/playbooks/ops-warden-warden-sign-token.md` | Scoped `VAULT_TOKEN` via credential broker (preferred path) |
| `wiki/playbooks/operator-openbao-token-hygiene.md` | Manual token fallback and hygiene rules |
| `wiki/AuditTrail.md` | Unified metadata-only audit + `warden activity` |
| `wiki/playbooks/catalog-lane-promotion.md` | draft → active catalog promotion checklist |
| `wiki/CertCommandInterface.md` | cert_command contract |
| `history/2026-07-01-intent-scope-gap-analysis.md` | Current INTENT↔SCOPE gap analysis |
| `workplans/WARDEN-WP-0023-intent-scope-alignment-closeout.md` | Alignment closeout plan |

281
src/warden/audit.py Normal file
View File

@@ -0,0 +1,281 @@
"""Unified metadata-only audit trail (WARDEN-WP-0022).
Every ops-warden action appends a JSONL event. Secret values are rejected at write time.
"""
from __future__ import annotations
import json
import os
import re
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Iterable, Optional
_AUDIT_FILENAME = "audit.jsonl"
_MAX_BYTES = 5 * 1024 * 1024
_SECRET_PREFIXES = (
"ghp_", "gho_", "ghs_", "github_pat_",
"sk-", "sk_live_", "sk_test_",
"xoxb-", "xoxp-",
"AKIA", "ASIA",
"hvs.", "hvb.", "s.",
"AIza",
"eyJ",
)
_HIGH_ENTROPY_RUN = re.compile(r"[A-Za-z0-9_\-]{32,}")
class AuditError(Exception):
"""Raised when audit metadata looks like a secret value."""
def _assert_metadata_safe(blob: str) -> None:
lowered = blob.lower()
for prefix in _SECRET_PREFIXES:
if prefix.lower() in lowered:
raise AuditError(
f"audit field appears to contain a literal secret (matched {prefix!r})"
)
for run in _HIGH_ENTROPY_RUN.findall(blob):
if "<" in run or ">" in run:
continue
if run.replace("_", "").replace("-", "").isalpha():
continue
raise AuditError(
f"audit field contains high-entropy token ({run[:8]}…) — suspected secret"
)
def _audit_path(state_dir: Path) -> Path:
return state_dir / _AUDIT_FILENAME
def _maybe_rotate(path: Path) -> None:
if path.exists() and path.stat().st_size > _MAX_BYTES:
backup = path.with_suffix(".jsonl.1")
backup.unlink(missing_ok=True)
path.rename(backup)
def record_event(
state_dir: Path,
*,
kind: str,
action: str,
subject: str = "",
target: str = "",
decision_id: Optional[str] = None,
outcome: str = "ok",
source: str = "audit",
**extra: Any,
) -> Path:
"""Append one metadata-only audit event. Never pass secret values in any field."""
event = {
"ts": datetime.now(timezone.utc).isoformat(),
"kind": kind,
"action": action,
"subject": subject,
"target": target,
"decision_id": decision_id,
"outcome": outcome,
"source": source,
}
for key, value in extra.items():
if value is None:
continue
event[key] = value
_assert_metadata_safe(json.dumps(event, default=str))
state_dir.mkdir(parents=True, exist_ok=True)
path = _audit_path(state_dir)
_maybe_rotate(path)
with path.open("a", encoding="utf-8") as handle:
handle.write(json.dumps(event, default=str) + "\n")
return path
def read_events(
state_dir: Path,
*,
since: Optional[datetime] = None,
kinds: Optional[set[str]] = None,
) -> list[dict[str, Any]]:
"""Read unified audit events newer than ``since`` (UTC), optionally filtered by kind."""
path = _audit_path(state_dir)
if not path.exists():
return []
events: list[dict[str, Any]] = []
for line in path.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
try:
event = json.loads(line)
except json.JSONDecodeError:
continue
if kinds and event.get("kind") not in kinds:
continue
if since:
ts_raw = event.get("ts")
if not ts_raw:
continue
try:
ts = datetime.fromisoformat(str(ts_raw).replace("Z", "+00:00"))
except ValueError:
continue
if ts < since:
continue
events.append(event)
return events
def _legacy_sign_events(state_dir: Path, since: Optional[datetime]) -> list[dict[str, Any]]:
path = state_dir / "signatures.log"
if not path.exists():
return []
out: list[dict[str, Any]] = []
for line in path.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
try:
raw = json.loads(line)
except json.JSONDecodeError:
continue
ts_raw = raw.get("timestamp")
if since and ts_raw:
try:
ts = datetime.fromisoformat(str(ts_raw).replace("Z", "+00:00"))
except ValueError:
continue
if ts < since:
continue
out.append(
{
"ts": ts_raw,
"kind": "sign",
"action": "issue",
"subject": raw.get("actor", ""),
"target": raw.get("actor", ""),
"decision_id": raw.get("policy_decision_id"),
"outcome": "ok",
"source": "signatures.log",
"backend": raw.get("backend"),
"actor_type": raw.get("actor_type"),
}
)
return out
def _legacy_access_events(state_dir: Path, since: Optional[datetime]) -> list[dict[str, Any]]:
path = state_dir / "access-audit.log"
if not path.exists():
return []
out: list[dict[str, Any]] = []
for line in path.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line:
continue
try:
raw = json.loads(line)
except json.JSONDecodeError:
continue
ts_raw = raw.get("timestamp")
if since and ts_raw:
try:
ts = datetime.fromisoformat(str(ts_raw).replace("Z", "+00:00"))
except ValueError:
continue
if ts < since:
continue
out.append(
{
"ts": ts_raw,
"kind": "access",
"action": raw.get("action", "fetch"),
"subject": raw.get("subject", ""),
"target": raw.get("need_id", ""),
"decision_id": raw.get("policy_decision_id"),
"outcome": "ok" if raw.get("exit_code", 0) == 0 else "error",
"source": "access-audit.log",
"owner_repo": raw.get("owner_repo"),
}
)
return out
def collect_activity(
state_dir: Path,
*,
days: int = 7,
kinds: Optional[set[str]] = None,
include_legacy: bool = True,
) -> list[dict[str, Any]]:
"""Merge unified audit + legacy logs into one chronological list."""
since = datetime.now(timezone.utc) - timedelta(days=days)
events = read_events(state_dir, since=since, kinds=kinds)
if include_legacy:
legacy_kinds = kinds or {"sign", "access", "worker"}
if not kinds or "sign" in kinds:
events.extend(_legacy_sign_events(state_dir, since))
if not kinds or "access" in kinds:
events.extend(_legacy_access_events(state_dir, since))
# De-dupe unified vs legacy: prefer audit.jsonl when same ts+kind+action+target
seen: set[tuple[str, str, str, str]] = set()
unique: list[dict[str, Any]] = []
for event in events:
key = (
str(event.get("ts", "")),
str(event.get("kind", "")),
str(event.get("action", "")),
str(event.get("target", "")),
)
if key in seen and event.get("source") != "audit":
continue
seen.add(key)
unique.append(event)
unique.sort(key=lambda e: str(e.get("ts", "")))
return unique
def fetch_hub_notes(*, days: int = 7, hub_url: Optional[str] = None) -> list[dict[str, Any]]:
"""Best-effort pull of recent ops-warden-related State Hub progress notes."""
import httpx
base = (hub_url or os.environ.get("STATE_HUB_URL", "http://127.0.0.1:8000")).rstrip("/")
since = datetime.now(timezone.utc) - timedelta(days=days)
try:
resp = httpx.get(f"{base}/progress/", params={"limit": 100}, timeout=5.0)
resp.raise_for_status()
payload = resp.json()
except Exception:
return []
items = payload if isinstance(payload, list) else payload.get("items", [])
notes: list[dict[str, Any]] = []
for item in items:
if not isinstance(item, dict):
continue
summary = str(item.get("summary", ""))
if "ops-warden" not in summary.lower() and "[worker]" not in summary:
continue
created = item.get("created_at")
if created:
try:
ts = datetime.fromisoformat(str(created).replace("Z", "+00:00"))
if ts < since:
continue
except ValueError:
pass
notes.append(
{
"ts": created,
"kind": "hub",
"action": item.get("event_type", "note"),
"subject": item.get("author", ""),
"target": "state-hub",
"outcome": "ok",
"source": "state-hub",
"summary": summary,
}
)
return notes

View File

@@ -61,6 +61,24 @@ def _append_signature_log(
state_dir.mkdir(parents=True, exist_ok=True)
with (state_dir / "signatures.log").open("a") as f:
f.write(json.dumps(entry) + "\n")
try:
from warden.audit import record_event
record_event(
state_dir,
kind="sign",
action="issue",
subject=spec.actor_name,
target=spec.actor_name,
decision_id=spec.policy_decision_id,
outcome="ok",
source="sign",
actor_type=spec.actor_type.value,
backend=backend,
ttl_hours=spec.ttl_hours,
)
except Exception:
pass # audit must not block signing
def parse_cert_metadata(cert_path: Path) -> dict:

View File

@@ -39,6 +39,11 @@ worker_app = typer.Typer(
no_args_is_help=True,
)
app.add_typer(worker_app, name="worker")
activity_app = typer.Typer(
help="Unified metadata-only audit view (WARDEN-WP-0022)",
no_args_is_help=True,
)
app.add_typer(activity_app, name="activity")
console = Console()
err = Console(stderr=True)
@@ -1237,6 +1242,55 @@ def worker_approve(
raise typer.Exit(1)
@activity_app.callback(invoke_without_command=True)
def activity_show(
days: Annotated[int, typer.Option("--days", help="Look back N days")] = 7,
kind: Annotated[
Optional[str],
typer.Option("--kind", help="Filter: sign, access, worker, hub"),
] = None,
output_json: Annotated[bool, typer.Option("--json", help="Output JSON")] = False,
include_hub: Annotated[
bool, typer.Option("--hub", help="Include State Hub progress notes")
] = False,
) -> None:
"""Show what ops-warden did recently (metadata only — no secret values)."""
from warden.audit import collect_activity, fetch_hub_notes
cfg = _load_cfg()
kinds = {kind} if kind else None
events = collect_activity(cfg.state_dir, days=days, kinds=kinds)
if include_hub and (kinds is None or "hub" in kinds):
events.extend(fetch_hub_notes(days=days))
events.sort(key=lambda e: str(e.get("ts", "")))
if output_json:
print(json.dumps(events, indent=2))
return
if not events:
console.print(f"No activity in the last {days} day(s).")
return
table = Table(title=f"ops-warden activity (last {days} days)")
table.add_column("When", style="dim")
table.add_column("Kind")
table.add_column("Action")
table.add_column("Subject")
table.add_column("Target")
table.add_column("Outcome")
for event in events:
table.add_row(
str(event.get("ts", ""))[:19],
str(event.get("kind", "")),
str(event.get("action", "")),
str(event.get("subject", ""))[:24],
str(event.get("target", ""))[:28],
str(event.get("outcome", "")),
)
console.print(table)
@worker_app.command("status")
def worker_status_cmd() -> None:
"""Show worker state: pending drafts, triage count, last digest, timer status."""

View File

@@ -121,6 +121,23 @@ def write_audit(
}
with log_path.open("a") as f:
f.write(json.dumps(record) + "\n")
try:
from warden.audit import record_event
record_event(
state_dir,
kind="access",
action=action,
subject=record["subject"],
target=need_id,
decision_id=decision_id,
outcome="ok" if exit_code in (None, 0) else "error",
source="access",
owner_repo=owner_repo,
domain=domain,
)
except Exception:
pass
return log_path

View File

@@ -11,6 +11,7 @@ import httpx
from warden.ca import CABackend, CAError, _append_signature_log, _enforce_ttl, _evict_cert, parse_cert_metadata
from warden.config import VaultConfig
from warden.models import CertRecord, CertSpec
from warden.vault_hints import missing_vault_token_message
class VaultCA(CABackend):
@@ -23,10 +24,7 @@ class VaultCA(CABackend):
def _token(self) -> str:
token = os.environ.get(self._cfg.token_env, "")
if not token:
raise CAError(
f"Vault token not found. Set the {self._cfg.token_env!r} "
f"environment variable, or run: vault login"
)
raise CAError(missing_vault_token_message(self._cfg.token_env))
return token
def sign(self, spec: CertSpec) -> CertRecord:

22
src/warden/vault_hints.py Normal file
View File

@@ -0,0 +1,22 @@
"""Operator hints for vault-backed signing without manual token paste."""
from __future__ import annotations
BROKER_CATALOG_ID = "ops-warden-warden-sign-token"
BROKER_EXEC_TEMPLATE = (
"cd ~/railiance-platform && scripts/credential.py exec "
"--grant ops-warden/warden-sign --ttl 15m -- "
"warden sign <actor> --pubkey <path>"
)
def missing_vault_token_message(token_env: str) -> str:
"""Structured hint when vault backend lacks a scoped token."""
return (
f"Vault token not found. Set {token_env!r} for the current shell only, "
f"or use the railiance-platform credential broker (preferred):\n"
f" warden route show {BROKER_CATALOG_ID}\n"
f" {BROKER_EXEC_TEMPLATE}\n"
f"See wiki/playbooks/ops-warden-warden-sign-token.md"
)

View File

@@ -329,15 +329,44 @@ def execute_plan(plan: WorkerPlan, hub: HubClient, *, topic_id: Optional[str] =
return out
def _record_worker_audit(
state_dir: Path, *, action: str, target: str, outcome: str = "ok", **extra: object
) -> None:
try:
from warden.audit import record_event
record_event(
state_dir,
kind="worker",
action=action,
subject=WORKER_AGENT,
target=target,
outcome=outcome,
source="worker",
**extra,
)
except Exception:
pass
def execute_plans(plans: List[WorkerPlan], hub: HubClient, *, topic_id: Optional[str] = None) -> str:
"""FULL-AUTO: execute every plan's safe actions and return an audit summary."""
state_dir = default_state_dir()
lines: List[str] = []
for p in plans:
results = execute_plan(p, hub, topic_id=topic_id)
lines.append(f"{p.from_agent}: {p.subject} ({p.message_id})")
for r in results:
lines.append(f" · {r}")
return "\n".join(lines) if lines else "inbox empty — nothing to execute."
summary = "\n".join(lines) if lines else "inbox empty — nothing to execute."
_record_worker_audit(
state_dir,
action="tick_full_auto",
target="state-hub-inbox",
messages=len(plans),
escalated=sum(1 for p in plans if p.escalated),
)
return summary
# --- conservative tier (default for --execute): triage + draft, never auto-send ----------
@@ -429,6 +458,12 @@ def approve_draft(
hub.mark_read(message_id)
drafts.pop(message_id, None)
save_drafts(state_dir, drafts)
_record_worker_audit(
state_dir,
action="approve_send",
target=message_id,
to_agent=d["to_agent"],
)
return f"sent reply to {d['to_agent']} ({d['subject']}) and marked read."
@@ -514,6 +549,13 @@ def run_conservative(
except Exception: # noqa: BLE001 — a note failure must not lose the digest
pass
save_seen(state_dir, seen | {p.message_id for p in new})
_record_worker_audit(
state_dir,
action="tick_conservative",
target="state-hub-inbox",
messages=len(new),
escalated=n_esc,
)
return digest

152
tests/test_audit.py Normal file
View File

@@ -0,0 +1,152 @@
"""Tests for unified audit trail (WARDEN-WP-0022)."""
from __future__ import annotations
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from unittest.mock import patch
import pytest
from typer.testing import CliRunner
from warden.audit import (
AuditError,
collect_activity,
fetch_hub_notes,
read_events,
record_event,
)
from warden.cli import app
runner = CliRunner()
def test_record_and_read_event(tmp_path: Path) -> None:
record_event(
tmp_path,
kind="sign",
action="issue",
subject="agt-test",
target="agt-test",
decision_id="dec-1",
backend="local",
)
events = read_events(tmp_path)
assert len(events) == 1
assert events[0]["kind"] == "sign"
assert events[0]["subject"] == "agt-test"
assert events[0]["decision_id"] == "dec-1"
def test_read_events_filters_by_kind_and_since(tmp_path: Path) -> None:
record_event(tmp_path, kind="sign", action="issue", subject="a", target="a")
record_event(tmp_path, kind="access", action="fetch", subject="op", target="need-1")
since = datetime.now(timezone.utc) - timedelta(hours=1)
sign_only = read_events(tmp_path, since=since, kinds={"sign"})
assert len(sign_only) == 1
assert sign_only[0]["kind"] == "sign"
def test_secret_guard_rejects_token_prefix(tmp_path: Path) -> None:
with pytest.raises(AuditError, match="secret"):
record_event(
tmp_path,
kind="access",
action="fetch",
subject="ghp_abc123456789012345678901234567890",
target="need",
)
def test_secret_guard_rejects_high_entropy(tmp_path: Path) -> None:
with pytest.raises(AuditError, match="high-entropy"):
record_event(
tmp_path,
kind="access",
action="fetch",
subject="operator",
target="need",
note="9f3a8c2d1b0e7f6a5c4d3b2a1f0e9d8c7b6a5948372615049382716059483",
)
def test_rotation_when_log_exceeds_limit(tmp_path: Path, monkeypatch) -> None:
import warden.audit as audit_mod
monkeypatch.setattr(audit_mod, "_MAX_BYTES", 50)
for i in range(5):
record_event(tmp_path, kind="worker", action="tick", subject="worker", target=str(i))
assert (tmp_path / "audit.jsonl").exists()
assert (tmp_path / "audit.jsonl.1").exists()
def test_collect_activity_merges_legacy_logs(tmp_path: Path) -> None:
ts = datetime.now(timezone.utc).isoformat()
(tmp_path / "signatures.log").write_text(
json.dumps(
{
"timestamp": ts,
"actor": "agt-legacy",
"actor_type": "agt",
"backend": "vault",
}
)
+ "\n"
)
(tmp_path / "access-audit.log").write_text(
json.dumps(
{
"timestamp": ts,
"action": "fetch",
"need_id": "openbao-api-key",
"owner_repo": "railiance-platform",
"subject": "operator",
"exit_code": 0,
}
)
+ "\n"
)
events = collect_activity(tmp_path, days=7)
kinds = {e["kind"] for e in events}
assert "sign" in kinds
assert "access" in kinds
assert any(e.get("source") == "signatures.log" for e in events)
def test_fetch_hub_notes_filters_ops_warden(tmp_path: Path) -> None:
payload = [
{
"created_at": datetime.now(timezone.utc).isoformat(),
"summary": "ops-warden: worker tick complete",
"author": "codex",
"event_type": "note",
},
{
"created_at": datetime.now(timezone.utc).isoformat(),
"summary": "unrelated repo change",
"author": "codex",
"event_type": "note",
},
]
with patch("httpx.get") as mock_get:
mock_get.return_value.raise_for_status = lambda: None
mock_get.return_value.json.return_value = payload
notes = fetch_hub_notes(days=7, hub_url="http://127.0.0.1:8000")
assert len(notes) == 1
assert notes[0]["kind"] == "hub"
def test_activity_cli_json(tmp_path: Path, monkeypatch) -> None:
state_dir = tmp_path / "state"
state_dir.mkdir()
cfg = tmp_path / "warden.yaml"
cfg.write_text(f"backend: local\nca_key: {tmp_path / 'ca'}\nstate_dir: {state_dir}\n")
(tmp_path / "ca").write_text("fake")
monkeypatch.setenv("WARDEN_CONFIG", str(cfg))
record_event(state_dir, kind="sign", action="issue", subject="agt-cli", target="agt-cli")
result = runner.invoke(app, ["activity", "--days", "1", "--json"])
assert result.exit_code == 0
data = json.loads(result.stdout)
assert isinstance(data, list)
assert data[0]["kind"] == "sign"

View File

@@ -165,6 +165,21 @@ def test_vault_ca_sign_missing_token(tmp_path, monkeypatch):
ca.sign(spec)
def test_vault_ca_sign_missing_token_shows_broker_hint(tmp_path, monkeypatch):
monkeypatch.delenv("VAULT_TOKEN", raising=False)
spec = _make_spec(tmp_path)
ca = VaultCA(_make_cfg(), tmp_path / "state")
with pytest.raises(CAError) as exc:
ca.sign(spec)
msg = str(exc.value)
assert "ops-warden-warden-sign-token" in msg
assert "credential.py exec" in msg
assert "ops-warden/warden-sign" in msg
assert "hvs." not in msg
def test_vault_ca_sign_missing_role(tmp_path, monkeypatch):
monkeypatch.setenv("VAULT_TOKEN", "fake-token")
cfg = _make_cfg(role_map={}) # no roles mapped

72
wiki/AuditTrail.md Normal file
View File

@@ -0,0 +1,72 @@
# Audit Trail — Unified ops-warden Activity
Date: 2026-07-01
Workplan: WARDEN-WP-0022
ops-warden records **metadata only** for every action it performs. No token, key,
cert body, or other secret value ever lands in the audit log.
---
## What is recorded
| Kind | Source actions | Typical fields |
| --- | --- | --- |
| `sign` | `warden sign`, `warden issue`, `cert_command` | actor, backend, TTL, `policy_decision_id` |
| `access` | `warden access --fetch` / `--exec` | need id, owner repo, subject, decision id, outcome |
| `worker` | `warden worker` tick, approve, full-auto execute | triage counts, draft id, outcome |
| `hub` | State Hub progress notes (`--hub`) | summary, author, event type |
### Storage
- **Primary:** `{state_dir}/audit.jsonl` — append-only JSONL (default
`~/.local/state/warden/audit.jsonl`)
- **Legacy (merged for back-compat):** `signatures.log`, `access-audit.log`
Rotation: when `audit.jsonl` exceeds 5 MiB it is renamed to `audit.jsonl.1` and a
fresh file starts.
### Secret-material guard
`record_event()` rejects fields that look like secret values (known token prefixes,
high-entropy runs). Signing and proxy paths swallow audit failures so gatekeeping
never blocks the primary action — but tests prove values cannot be written.
---
## Query
```bash
# Human table — last 7 days
warden activity
# Filter and JSON for agents
warden activity --days 3 --kind sign --json
warden activity --days 7 --hub --json
```
| Flag | Purpose |
| --- | --- |
| `--days N` | Look back N days (default 7) |
| `--kind sign\|access\|worker\|hub` | Filter by event kind |
| `--json` | Stable JSON array for automation |
| `--hub` | Include recent State Hub progress notes mentioning ops-warden |
---
## Linger and login independence
The coordination worker can run under a `systemd --user` timer with linger enabled
(WARDEN-WP-0021). Audit events from worker ticks appear with `kind: worker`.
Full **logged-out** operational value still depends on State Hub and tunnels being
reachable without an interactive login (State Hub on railiance01, `cust-wp-0011`).
The audit trail is local-first; `--hub` adds narrative context when the hub is up.
---
## See also
- `wiki/OperatorAccessAssist.md` — metadata-only principle for access proxy
- `wiki/PolicyGatedSigning.md``policy_decision_id` on sign events
- `wiki/playbooks/scheduled-worker.md` — worker timer and review loop

View File

@@ -95,6 +95,8 @@ run the owner's tool as the caller and preserve owner custody.
| `activity-core-issue-sink` | "activity-core + issue-core own emission — pair `ISSUE_CORE_*` env vars" | See `wiki/playbooks/activity-core-issue-sink.md` |
| `inter-hub-bootstrap-ssh` | "Inter-Hub bootstrap SSH envelope — attended vs unattended branches" | See `wiki/InterHubBootstrapAccessLane.md` |
Promotion criteria: `wiki/playbooks/catalog-lane-promotion.md`.
**Draft** (hidden from default lookup until owner path ships — `warden route list --all`):
| Catalog `id` | Routing focus | Playbook |

View File

@@ -230,6 +230,19 @@ Cross-repo references:
4. Keep `fail_closed: true` unless an explicit break-glass procedure exists.
5. Smoke allow and deny paths; preserve non-secret evidence only.
### Rollback
If signs are blocked after enabling the gate:
1. Set `policy.enabled: false` in `warden.yaml` (inventory + TTL gate only).
2. Confirm `warden sign` succeeds without flex-auth.
3. File a State Hub note to `flex-auth` with non-secret symptoms (HTTP status,
`fail_closed` behaviour, actor name).
4. Re-enable only after flex-auth runtime and registry are verified.
Evidence fields for the flip: flex-auth health URL, smoke script exit codes,
`warden activity --kind sign --json` showing `policy_decision_id` on allow path.
---
## See also

View File

@@ -0,0 +1,59 @@
# Catalog Lane Promotion — draft → active
Date: 2026-07-01
Workplan: WARDEN-WP-0023 T05
`registry/routing/catalog.yaml` entries start as **`draft`** until an owner-confirmed
concrete path exists. Draft lanes are hidden from default `warden route find` unless
`--all` is passed.
---
## Promotion checklist
Before changing `status: draft``status: active`:
| # | Criterion | Evidence |
| --- | --- | --- |
| 1 | **Owner confirmed** | Owner repo workplan or State Hub note naming the lane ready |
| 2 | **Concrete path** | Real OpenBao path, grant id, or exec command — no unresolved `<placeholders>` in the primary handoff |
| 3 | **Playbook** | `wiki/playbooks/<id>.md` with `#worker-checklist` section |
| 4 | **Exec routing** | `exec_owner` + native command **or** `exec_capable: true` with tested `warden access` proxy |
| 5 | **Resolvable** | `warden route show <id> --json` shows `resolvable: true` when placeholders are documented |
| 6 | **Tests** | Routing test or smoke proving lookup + handoff shape (no secret values in fixtures) |
| 7 | **Review date** | Update `reviewed:` in catalog entry |
Promotion PR touches: `registry/routing/catalog.yaml`, playbook, optional
`tests/test_routing.py`, and a one-line note in `wiki/CredentialRouting.md` draft table.
---
## Worked example (already active)
**`ops-warden-warden-sign-token`** — promoted 2026-07-01 after RAILIANCE-WP-0005:
- Owner: `railiance-platform` credential broker
- Concrete grant: `ops-warden/warden-sign`
- Playbook: `wiki/playbooks/ops-warden-warden-sign-token.md`
- Smoke: `make credential-exec-ops-warden-smoke`
---
## Draft lanes — none ready yet (2026-07-01)
| Catalog `id` | Blocker |
| --- | --- |
| `issue-core-ingestion-api-key` | OpenBao KV path + ESO wiring not owner-signed off |
| `openrouter-llm-connect` | activity-core secret mount path still placeholder |
| `object-storage-sts` | NK-WP-0007 vending path not production-exercised |
| `database-dynamic-credentials` | OpenBao database engine role paths TBD per workload |
Re-run promotion when the owning repo closes the blocker; do not promote on
playbook prose alone.
---
## See also
- `wiki/CredentialRouting.md` — draft table index
- `wiki/playbooks/ops-warden-warden-sign-token.md` — promotion reference

View File

@@ -135,9 +135,35 @@ pilot. Migrate per-tunnel when ops-bridge owner prioritizes them.
---
## Live cutover evidence template
When ops-bridge completes the pilot cutover, record **non-secret** evidence only.
Post a State Hub progress note or save under `history/` with these fields:
| Field | Example / instruction |
| --- | --- |
| Tunnel id | `state-hub-coulombcore` |
| Actor | `agt-state-hub-bridge` |
| Readiness gate | `check_tunnel_cert_readiness.py` exit code + date |
| First `bridge up` success | ISO timestamp (tunnel established) |
| First warden-signed connection | ISO timestamp from `signatures.log` or `warden activity --kind sign` |
| `cert_command` in use | yes / no |
| Rollback tested | yes / no — static-key path still available until verified |
| Operator | human handle or agent id |
| Cross-links | ops-bridge session notes, this playbook |
**Do not** include `VAULT_TOKEN`, private keys, cert bodies, or host passwords in
evidence. Use `warden activity --days 1 --kind sign --json` for sign metadata.
Coordination: message `ops-bridge` on State Hub with pointer to this template when
starting cutover (WARDEN-WP-0023).
---
## See also
- `wiki/CertCommandInterface.md`
- `wiki/OpsWardenConfig.md` — cert_command example
- `wiki/playbooks/operator-openbao-token-hygiene.md`
- `wiki/AuditTrail.md` — query recent signs via `warden activity`
- `warden route show ops-bridge-tunnel --json`

View File

@@ -4,7 +4,7 @@ type: workplan
title: "Audit trail + `warden activity` — one place to see what ops-warden did"
domain: infotech
repo: ops-warden
status: ready
status: finished
owner: claude
topic_slug: custodian
planning_priority: high
@@ -17,7 +17,7 @@ state_hub_workstream_id: "fc8afa28-68a7-4250-a19e-9754829f0cd5"
# WARDEN-WP-0022 — Audit trail + `warden activity`
**Problem:** ops-warden's actions are recorded in scattered places — `signatures.log`
(cert signs), `access-audit.log` (proxy fetches), the systemd journal (worker ticks), and
`access-audit.log`, the systemd journal (worker ticks), and
State Hub progress notes (the narrative). There is **no single, structured audit trail**
and no one command to answer *"what did ops-warden do in the last N days?"*. For a security
steward, a coherent, metadata-only audit record is table stakes.
@@ -46,44 +46,44 @@ needs the State Hub + tunnels to be login-independent (State Hub → railiance01
```task
id: WARDEN-WP-0022-T01
status: todo
status: done
priority: high
state_hub_task_id: "7f8f768a-4c62-4096-bad8-912cea0f35a7"
```
- [ ] `src/warden/audit.py`: append-only JSONL at `state_dir/audit.jsonl`. Common event
- [x] `src/warden/audit.py`: append-only JSONL at `state_dir/audit.jsonl`. Common event
schema — `ts`, `kind` (`sign`|`access`|`worker`), `action`, `subject`, `target`,
`decision_id`, `outcome`, `source`. `record_event(**meta)` with a secret-material
guard (reject token prefixes / high-entropy runs) so no value can ever land here.
`read_events(*, since, kinds)` for the reader.
- [ ] Log rotation / bound (size or age) so it stays manageable.
- [x] Log rotation / bound (size or age) so it stays manageable.
### T2 — Instrument the actions
```task
id: WARDEN-WP-0022-T02
status: todo
status: done
priority: high
state_hub_task_id: "e7ae4037-ca79-4557-81f0-bfb8478ff647"
```
- [ ] Emit an audit event from each ops-warden action: `warden sign` (cert issued —
- [x] Emit an audit event from each ops-warden action: `warden sign` (cert issued —
actor, type, ttl, backend, policy_decision_id), `warden access --fetch/--exec`
(proxy — need id, owner, decision id), and the worker (`approve` → reply sent to X;
tick → triage summary N/drafted/escalated). Fold the existing `signatures.log` /
`access-audit.log` in as sources (keep back-compat; don't drop a record).
- [ ] Assert no secret value reaches the audit in any path (tests).
- [x] Assert no secret value reaches the audit in any path (tests).
### T3 — `warden activity` command
```task
id: WARDEN-WP-0022-T03
status: todo
status: done
priority: high
state_hub_task_id: "4439bdd8-1461-47df-8b0b-048df7384a68"
```
- [ ] `warden activity [--days N] [--kind sign|access|worker] [--json] [--hub]` — a single
- [x] `warden activity [--days N] [--kind sign|access|worker] [--json] [--hub]` — a single
chronological view merging the audit log (and, for back-compat, `signatures.log` /
`access-audit.log`); `--hub` also pulls recent ops-warden State Hub progress notes for
the narrative. Human table by default; stable `--json` for agents.
@@ -92,14 +92,14 @@ state_hub_task_id: "4439bdd8-1461-47df-8b0b-048df7384a68"
```task
id: WARDEN-WP-0022-T04
status: todo
status: done
priority: medium
state_hub_task_id: "bdfb8703-7a79-43e7-913b-19d61722f164"
```
- [ ] Tests: audit append/read/rotation, the secret-material guard rejects values, the
- [x] Tests: audit append/read/rotation, the secret-material guard rejects values, the
instrumented actions emit events, `warden activity` filtering + `--json` shape.
- [ ] `wiki/AuditTrail.md` (what's recorded, the no-secret guarantee, how to query, the
- [x] `wiki/AuditTrail.md` (what's recorded, the no-secret guarantee, how to query, the
linger + login-independence note). SCOPE entry.
---
@@ -116,3 +116,4 @@ state_hub_task_id: "bdfb8703-7a79-43e7-913b-19d61722f164"
- `WARDEN-WP-0014` (`access-audit.log`), `WARDEN-WP-0020`/`0021` (the worker)
- `wiki/OperatorAccessAssist.md` (the metadata-only audit principle)
- `wiki/AuditTrail.md`

View File

@@ -4,7 +4,7 @@ type: workplan
title: "INTENTSCOPE Alignment Closeout"
domain: infotech
repo: ops-warden
status: ready
status: finished
owner: codex
topic_slug: custodian
planning_priority: high
@@ -64,7 +64,7 @@ Acceptance:
```task
id: WARDEN-WP-0023-T02
status: todo
status: done
priority: high
state_hub_task_id: "9a9b3631-8948-45af-ace1-c19ee74ace4d"
```
@@ -85,11 +85,13 @@ Acceptance:
- INTENT still describes direction, not implementation inventory.
- No contradiction with SCOPE 2026-07-01 boundary (ops-warden does not mint tokens).
**2026-07-01:** INTENT.md updated.
### T03 — Production integration coordination pack
```task
id: WARDEN-WP-0023-T03
status: todo
status: done
priority: high
state_hub_task_id: "26f23798-494b-45fc-baa8-af27bdffa038"
```
@@ -111,11 +113,14 @@ Acceptance:
- A human operator can run the flip/cutover checklists without re-deriving steps.
- Evidence fields are defined; completion is recorded via State Hub progress when done.
**2026-07-01:** Rollback section added to `wiki/PolicyGatedSigning.md`; live cutover
evidence template added to `wiki/playbooks/ops-bridge-tunnel-cert.md`.
### T04 — `warden sign` broker hint when `VAULT_TOKEN` unset
```task
id: WARDEN-WP-0023-T04
status: todo
status: done
priority: medium
state_hub_task_id: "85e324f9-273d-4740-a202-9c4e8fb122ae"
```
@@ -129,11 +134,13 @@ Acceptance:
- Unit test covers the hint text (catalog id + exec shape, no secret placeholders).
- Manual `export VAULT_TOKEN` remains documented as fallback in playbooks.
**2026-07-01:** `src/warden/vault_hints.py` + `tests/test_vault.py`.
### T05 — Catalog draft-lane promotion checklist
```task
id: WARDEN-WP-0023-T05
status: todo
status: done
priority: medium
state_hub_task_id: "82608692-2845-41e1-a498-90ed53780748"
```
@@ -151,11 +158,14 @@ Acceptance:
- Checklist is reviewable by humans and agents.
- At least one promotion example or explicit “none ready yet” note in the workplan.
**2026-07-01:** `wiki/playbooks/catalog-lane-promotion.md` — worked example
`ops-warden-warden-sign-token`; four draft lanes explicitly not ready.
### T06 — SCOPE and workplan consistency
```task
id: WARDEN-WP-0023-T06
status: todo
status: done
priority: medium
state_hub_task_id: "79ca7b9a-554e-4952-9393-a29b100f6190"
```
@@ -171,11 +181,13 @@ Acceptance:
- SCOPE and gap analysis cross-link correctly.
- Uncommitted SCOPE edits from 2026-07-01 broker routing are committed with this WP.
**2026-07-01:** SCOPE.md updated.
### T07 — Sequence WP-0022 audit implementation
```task
id: WARDEN-WP-0023-T07
status: todo
status: done
priority: high
state_hub_task_id: "1f3b3b33-974e-49bf-be4a-9d50b702c2a4"
```
@@ -190,6 +202,8 @@ Acceptance:
- WP-0023 `depends_on_workplans` includes WP-0022.
- Hub consistency run syncs both workplans.
**2026-07-01:** WP-0022 implemented and both workplans marked `finished`.
---
## Exit criteria