chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-07-03:
- update .custodian-brief.md for ops-bridge
tunnels: optional remote_host forward destination (default 127.0.0.1)
2026-07-03 18:52:51 +02:00 · 2026-07-02 14:18:18 +02:00 · 2026-06-22 23:16:27 +02:00 · 2026-06-22 11:40:44 +02:00 · 2026-06-22 03:06:02 +02:00 · 2026-06-22 02:44:47 +02:00
52 changed files with 3087 additions and 288 deletions
--- a/.claude/rules/agents.md
+++ b/.claude/rules/agents.md
@@ -0,0 +1,20 @@
 ## Kaizen Agents
 Specialized agent personas available on demand via the state-hub MCP.
 **Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
 **Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
 Common agents:
 | Agent | Category | When to use |
 |-------|----------|-------------|
 | `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
 | `code-refactoring` | quality | Code quality analysis and safe refactoring |
 | `test-maintenance` | testing | Diagnose and fix failing tests |
 | `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
 | `keepaTodofile` | process | Maintain TODO.md during work |
 | `project-management` | process | Track status, determine next steps |
 | `datamodel-optimization` | quality | Optimize dataclasses and data structures |
 All 17 agents: call `list_kaizen_agents()` for the full list.
--- a/.claude/rules/architecture.md
+++ b/.claude/rules/architecture.md
@@ -1,31 +1,8 @@
 ## Architecture
-OpsBridge has two logical components:
+<!-- TODO: Describe the key design decisions and component structure.
-
+     Key modules, data flows, external integrations, state machines, etc. -->
 **1. OpsBridge — tunnel lifecycle manager** (this repo)
 Manages named SSH reverse tunnels defined in `~/.config/bridge/tunnels.yaml`.
 Each tunnel runs in a subprocess with a reconnect backoff loop; PIDs are tracked
 in `~/.local/state/bridge/`. Bridge states: `stopped → starting → connected →
 degraded → failed`. The `degraded` state means SSH is up but the optional HTTP
 health check is failing.
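
The `connected` versus `degraded` distinction described above can be sketched in a few lines. This is a minimal illustration, not the repo's implementation: the `BridgeState` enum and the `state_while_ssh_up` helper are hypothetical names, and the sketch only covers the case where the SSH session is already up.

```python
from enum import Enum
from typing import Optional


class BridgeState(Enum):
    """The five bridge states named in the architecture notes."""
    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    FAILED = "failed"


def state_while_ssh_up(health_ok: Optional[bool]) -> BridgeState:
    """Pick the state for a tunnel whose SSH session is alive.

    health_ok is None when no HTTP health check is configured (assumption:
    an unchecked tunnel counts as connected).
    """
    if health_ok is False:
        # SSH is up but the optional HTTP health check is failing.
        return BridgeState.DEGRADED
    return BridgeState.CONNECTED
```

How the other transitions (for example into `failed`) are triggered is not specified here, so the sketch leaves them out.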
 **2. OpsCatalog — operations knowledge repository** (planned extension)
 A Git-backed YAML catalog of operations domains, targets, bridges, and actor
 classes. OpsBridge consumes this catalog to resolve bridge identifiers and
 orient operators. Schema examples are in `wiki/OpsCatalogSpecification.md`.
 The catalog layout follows: `opscatalog/domains/<domain>/{domain.yaml,
 targets/, bridges/, docs/}`.
 Key design constraints:
 - OpsBridge owns lifecycle management only; it does not own identity/credentials
 - Each tunnel is identified by name (e.g. `state-hub-coulombcore`); names used
  in config, CLI args, and log filenames must stay consistent
 - Actor attribution (human operator vs. automation agent) is tracked per bridge
  for audit log traceability (FRS §5.7)
 Specification docs are in `wiki/`: PRD (`OpsBridgePrd.md`), FRS
 (`OpsBridgeFrs.md`), and OpsCatalog spec (`OpsCatalogSpecification.md`).
 ## Quick Reference
-`~/the-custodian/state-hub/mcp_server/TOOLS.md`
+`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference
--- a/.claude/rules/credential-routing.md
+++ b/.claude/rules/credential-routing.md
@@ -0,0 +1,50 @@
 # Credential and access routing
 **Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
 for inference. Run this check **before** requesting secrets, API keys, SSH access,
 login tokens, or database passwords — in any repo, not only `ops-warden`.
 ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
 other credential need belongs to another subsystem. **Do not** message
 `ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
 ### Lookup (do this first)
 ```bash
 warden route find "<describe your need>" --json
 warden route show <catalog-id> --json
 ```
 Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
 | Agent runtime | How to orient |
 | --- | --- |
 | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=ops-bridge` is for coordination, not secret vending |
 | **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
 | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
 ### Quick routing table
 | I need… | Owner | ops-warden executes? |
 | --- | --- | --- |
 | SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
 | API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
 | Login / OIDC / MFA | key-cape / Keycloak | No — route only |
 | Authorization decision | flex-auth | No — route only |
 | activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
 | SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
 ### Anti-patterns (do not do these)
 - `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
 - Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
 - Pasting secrets into Git, State Hub, workplans, logs, or chat
 ### Other capabilities (reuse-surface)
 Non-credential capabilities are usually discovered through **reuse-surface** federation
 (`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
 every repo's agent instructions because it is high-frequency, high-risk, and easy to
 get wrong.
 **Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
--- a/.claude/rules/first-session.md
+++ b/.claude/rules/first-session.md
@@ -0,0 +1,38 @@
 ## First Session Protocol
 Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
 The project is registered but work has not yet been structured.
 **Step 1 — Read, don't write**
 - `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
 - `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
 - Scan repo root: README, directory structure, existing code or docs
 **Step 2 — Survey in-progress work**
 Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
 **Step 3 — Propose workstreams to Bernd**
 Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a
 roadmap phase. **Wait for approval before creating.**
 **Step 4 — Create workplan file first, then DB record (ADR-001)**
 ```
 workplans/BRIDGE-WP-NNNN-<slug>.md   ← write this first
 ```
 Then register in the hub:
 ```
 create_workstream(topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", title="...", owner="...", description="...")
 create_task(workstream_id="<id>", title="...", priority="high|medium|low")
 ```
 **Step 5 — Record the setup**
 ```
 add_progress_event(
    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a",
    detail={"workstreams": [...], "tasks_created": M}
 )
 ```
 <!-- Delete or archive this file once past first session -->
--- a/.claude/rules/repo-boundary.md
+++ b/.claude/rules/repo-boundary.md
@@ -1,6 +1,8 @@
 ## Repo boundary
-This repo owns **tunnel lifecycle management only**. It does not own:
+This repo owns **ops-bridge** only. It does not own:
- State hub code → `the-custodian/state-hub/`
+
- SSH key management → `railiance-infra/` (S1) or user dotfiles
+<!-- TODO: List what belongs in adjacent repos, e.g.:
- Ansible/provisioning → `railiance-infra/`
+- SSH key management → railiance-infra/
 - State hub code     → state-hub/
 -->
--- a/.claude/rules/repo-identity.md
+++ b/.claude/rules/repo-identity.md
@@ -1,7 +1,5 @@
-**Purpose:** SSH reverse tunnel lifecycle manager. Keeps remote execution
+**Purpose:** SSH reverse tunnel lifecycle manager. Keeps remote execution environments (COULOMBCORE, Railiance nodes) connected to the local state hub. Small CLI tool: bridge up/down/status/logs per named tunnel config.
 environments (COULOMBCORE, Railiance nodes) connected to the local Custodian
 State Hub so Claude Code sessions on those machines have full MCP connectivity.
-**Domain:** custodian
+**Domain:** infotech
 **Repo slug:** ops-bridge
-**Repo ID:** 1bf99f56-6e94-4379-a9ea-295a4c181889
+**Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a
--- a/.claude/rules/session-protocol.md
+++ b/.claude/rules/session-protocol.md
@@ -1,24 +1,85 @@
-## Custodian State Hub Integration
+## Session Protocol
-State Hub: http://127.0.0.1:8000
+Dev Hub (State Hub API): http://127.0.0.1:8000
-
+MCP server name in `~/.claude.json`: `dev-hub`
 ### Session Protocol
 **Step 1 — Orient**
 Read the offline-safe brief first — it works without a live hub connection:
 ```bash
 cat .custodian-brief.md
 ```
-get_domain_summary("custodian")
+Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
 ```
 get_domain_summary("infotech")
 ```
 If MCP tools are unavailable in the current agent session, use the REST API:
 ```bash
 curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
 ```
 If the hub is offline: `cd ~/state-hub && make api`
 **Step 2 — Check inbox**
 With MCP tools:
 ```
 get_messages(to_agent="ops-bridge", unread_only=True)
 ```
 Mark read with `mark_message_read(message_id)`. Reply or act on coordination
 requests before proceeding.
 Without MCP tools:
 ```bash
 curl -s "http://127.0.0.1:8000/messages/?to_agent=ops-bridge&unread_only=true" \
  | python3 -m json.tool
 curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
  -H "Content-Type: application/json" -d '{}'
 ```
-**Step 2 — Scan workplans**
+**Step 3 — Scan workplans**
-```
+```bash
 ls workplans/
 ```
 For each file with `status: ready`, `active`, or `blocked`, note pending
 `wait`/`todo`/`progress` tasks.
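
The workplan scan in this step could be automated roughly as follows. This is a sketch under assumptions: it presumes each workplan starts with a `---` delimited frontmatter block containing a plain `status:` line, and the helper names `frontmatter_status` and `open_workplans` are hypothetical.

```python
from pathlib import Path

# Statuses that the session protocol asks the agent to look at.
OPEN_PLAN_STATUSES = {"ready", "active", "blocked"}


def frontmatter_status(text):
    """Return the `status:` value from a leading '---' frontmatter block, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; no status line found
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip().strip("\"'")
    return None


def open_workplans(directory="workplans"):
    """List workplan filenames whose status is ready, active, or blocked."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.md")
        if frontmatter_status(path.read_text(encoding="utf-8")) in OPEN_PLAN_STATUSES
    )
```

The line-based parse avoids a YAML dependency; a real tool would more likely load the frontmatter with a YAML parser.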
-**During work:** use `record_decision()`, `add_progress_event()`, `resolve_decision()`.
+**Step 4 — Present brief**
-**Session close:** `add_progress_event()` with workstream_id.
+1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
 2. **Pending tasks** from `workplans/` + any `[repo:ops-bridge]` hub tasks
 3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
   - `alignment_warnings`: flag if active work is not aligned with current goal
 4. **Suggested next action** — highest-priority open item
 5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
-If workplan files were modified, run from `~/the-custodian/state-hub/`:
+If no workstreams: follow First Session Protocol (`first-session.md`).
-```bash
+
-make fix-consistency REPO=ops-bridge
+**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
 > State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
 > are First Session Protocol only. Work structure belongs in repo files (ADR-001).
 **Session close:**
 With MCP tools:
 ```
 add_progress_event(summary="...", topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", workstream_id="<uuid>")
 ```
 Without MCP tools:
 ```bash
 curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{"topic_id":"cee7bedf-2b48-46ef-8601-006474f2ad7a","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
 ```
 If workplan files were modified, ensure the local copy is up to date first:
 ```bash
 git -C <repo_path> pull --ff-only
 cd ~/state-hub && make fix-consistency REPO=ops-bridge
 ```
 For repos where implementation runs on a remote machine (e.g. CoulombCore),
 use the combined target which pulls before fixing:
 ```bash
 cd ~/state-hub && make fix-consistency-remote REPO=ops-bridge
 ```
 **C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
 will sync the file to match DB.  **C-16** (repo behind remote) blocks all writes

 until you pull — intentional to prevent clobbering remote progress.
--- a/.claude/rules/stack-and-commands.md
+++ b/.claude/rules/stack-and-commands.md
@@ -1,46 +1,19 @@
 ## What this repo builds
 A CLI tool (`bridge`) that manages named SSH reverse tunnels:
 ```
 bridge up [TUNNEL]      # start tunnel(s)
 bridge down [TUNNEL]    # stop tunnel(s)
 bridge restart [TUNNEL] # restart tunnel(s)
 bridge status           # show all tunnels: state, uptime, last health check
 bridge logs [TUNNEL]    # tail reconnect log
 ```
 Config file: `~/.config/bridge/tunnels.yaml`
 Each tunnel:
 - Named (e.g. `state-hub-coulombcore`)
 - Reverse SSH port-forward: `ssh -R remote_port:127.0.0.1:local_port host`
 - Auto-reconnects on drop (backoff loop)
 - Optional HTTP health check to confirm the forwarded service is reachable
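
As an illustration of the reverse port-forward listed above, the `ssh -R` invocation can be assembled as an argument list. This is a sketch, not the `bridge` implementation: `reverse_forward_args` is a hypothetical helper, and the `-N` flag (no remote command) is an addition that the command shown above does not include.

```python
def reverse_forward_args(host, remote_port, local_port, local_host="127.0.0.1"):
    """Build an argv list for `ssh -R remote_port:local_host:local_port host`."""
    for port in (remote_port, local_port):
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
    # -N: do not run a remote command; the session only carries the forward.
    return ["ssh", "-N", "-R", f"{remote_port}:{local_host}:{local_port}", host]
```

For example, `reverse_forward_args("example-host", 18000, 8000)` yields the arguments for forwarding remote port 18000 to a service on local port 8000; `example-host` is a placeholder. Passing a list to `subprocess.Popen` avoids shell quoting issues.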
 PRD: `workplans/BRIDGE-WP-0001-initial-implementation.md`
 ## Stack
- **Language:** Python 3.11+
+<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **CLI framework:** Typer
+- **Language:**
- **Dependencies:** typer, pyyaml, httpx
+- **Key deps:**
 - **Packaging:** `uv tool install` (single command install, no venv activation)
 - **No system daemons** — process management is internal, PID tracked in
  `~/.local/state/bridge/`
 ## Dev Commands
 ```bash
-# Install locally for development
+# TODO: Fill in the standard commands for this repo
-uv tool install -e .
+
 # Install dependencies
 # Run tests
 uv run pytest
-# Run a single test
+# Lint / type check
 uv run pytest tests/test_tunnel.py::test_name -v
-# Lint
+# Build / package (if applicable)
 uv run ruff check .
 ```
--- a/.claude/rules/workplan-convention.md
+++ b/.claude/rules/workplan-convention.md
@@ -1,6 +1,40 @@
-### Workplan Convention (ADR-001)
+## Workplan Convention (ADR-001)
 File location: `workplans/BRIDGE-WP-NNNN-<slug>.md`
-Prefix: `BRIDGE-WP`
+ID prefix: `BRIDGE-WP-`
-<!-- Ralph Loop rules are defined globally in ~/.claude/CLAUDE.md — do not duplicate here -->
+Work items originate as files in this repo **before** being registered in the hub.
 Canonical workplan/workstream frontmatter statuses are:
 `proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
 Use `proposed` for a newly drafted plan, `ready` after review against current
 repo state, and `finished` when implementation is complete. `stalled` and
 `needs_review` are derived health labels, not stored statuses.
 Closed workplans may be moved to `workplans/archived/` with a completion-date
 prefix: `YYMMDD-BRIDGE-WP-NNNN-<slug>.md`. The frontmatter id remains
 unchanged; the prefix is only for quick visual reference.
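
The archive naming rule can be expressed as a one-line helper. The function name `archived_name` is hypothetical; the sketch only computes the new filename and leaves the move and the frontmatter untouched.

```python
from datetime import date


def archived_name(filename: str, completed: date) -> str:
    """Prefix a workplan filename with its completion date as YYMMDD."""
    return f"{completed.strftime('%y%m%d')}-{filename}"
```

For a plan completed on 2026-07-03, `BRIDGE-WP-0001-initial-implementation.md` would become `260703-BRIDGE-WP-0001-initial-implementation.md`.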
 Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
 `workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
 `ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
 directly. Promote anything requiring analysis, design, approval, dependencies, or
 multiple planned phases into a normal workplan.
 Ecosystem todos from other agents arrive as `[repo:ops-bridge]` hub tasks —
 visible at session start. Pick one up by creating the workplan file, then registering
 the workstream.
 Task blocks use this shape:
 ```task
 id: BRIDGE-WP-NNNN-T01
 status: wait | todo | progress | done | cancel
 priority: high | medium | low
 state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
 ```
 Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
 blocked work and `cancel` for stopped work.
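
A task block of the shape above can be read and checked with a small parser. This is an illustrative sketch, not the parser used by `fix-consistency`: `parse_task_block` is a hypothetical name, and it assumes the block body holds one `key: value` pair per line with optional trailing `#` comments.

```python
TASK_STATUSES = {"wait", "todo", "progress", "done", "cancel"}


def parse_task_block(body: str) -> dict:
    """Parse the `key: value` lines of one task block and validate its status."""
    fields = {}
    for line in body.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        fields[key.strip()] = value.strip().strip('"')
    if fields.get("status") not in TASK_STATUSES:
        raise ValueError(f"unknown status: {fields.get('status')!r}")
    return fields
```

Rejecting unknown statuses at parse time catches typos before a sync runs.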
 <!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
--- a/.codex/config.toml
+++ b/.codex/config.toml
@@ -0,0 +1,7 @@
 [mcp_servers.ops-bridge]
 command = "uv"
 args = [
    "run",
    "python",
    "src/bridge/mcp_server/server.py",
 ]
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -0,0 +1,18 @@
 <!-- custodian-brief: generated by fix-consistency — do not edit manually -->
 # Custodian Brief — ops-bridge
 **Domain:** infotech  
 **Last synced:** 2026-07-03 16:52 UTC  
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 ## Active Workstreams
 *(none — repo may need first-session setup)*
 ---
 ## MCP Orientation (when available)
 If the state-hub MCP server is reachable, call:
 `get_domain_summary("infotech")`
 This provides richer cross-domain context.
 If the MCP call fails, use this file as your orientation source.
--- a/.repo-classification.yaml
+++ b/.repo-classification.yaml
@@ -0,0 +1,26 @@
 # Repo classification (Repo Classification Standard v1.0).
 repo_classification:
  standard: Repo Classification Standard
  version: '1.0'
  classified_at: '2026-06-22'
  classified_by: human
  category: tooling
  domain: infotech
  secondary_domains: []
  capability_tags:
  - operations
  - access-control
  - platform
  - observability
  - orchestration
  business_stake:
  - operations
  - technology
  - automation
  business_mechanics:
  - control
  - operation
  - adaptation
  notes: SSH reverse-tunnel lifecycle manager keeping remote environments connected to the
    State Hub. Operational tooling -> product.
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,219 @@
 # ops-bridge — Agent Instructions
 ## Repo Identity
 **Purpose:** SSH reverse tunnel lifecycle manager. Keeps remote execution environments (COULOMBCORE, Railiance nodes) connected to the local state hub. Small CLI tool: bridge up/down/status/logs per named tunnel config.
 **Domain:** infotech
 **Repo slug:** ops-bridge
 **Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
 **Workplan prefix:** `BRIDGE-WP-`
 ---
 ## State Hub Integration
 The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
 there is no MCP server for Codex agents.
 | Context | URL |
 |---------|-----|
 | Local workstation | `http://127.0.0.1:8000` |
 | Remote via tunnel | `http://127.0.0.1:18000` |
 ### Orient at session start
 ```bash
 # Offline brief — works without hub connection
 cat .custodian-brief.md
 # Active workstreams for this domain
 curl -s "http://127.0.0.1:8000/workstreams/?topic_id=cee7bedf-2b48-46ef-8601-006474f2ad7a&status=active" \
  | python3 -m json.tool
 # Check inbox
 curl -s "http://127.0.0.1:8000/messages/?to_agent=ops-bridge&unread_only=true" \
  | python3 -m json.tool
 ```
 Mark a message read:
 ```bash
 curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
  -H "Content-Type: application/json" -d '{}'
 ```
 ### Log progress (required at session close)
 ```bash
 curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{
    "summary": "what was done",
    "event_type": "note",
    "author": "codex",
    "workstream_id": "<uuid>",
    "task_id": "<uuid>"
  }'
 ```
 Omit `workstream_id` / `task_id` when not applicable.
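
For agents that prefer Python over `curl`, the same request could look roughly like this. It is a sketch using only the standard library; `progress_payload` and `post_progress` are hypothetical helper names, and the assumption that the hub answers with a JSON body is not confirmed here.

```python
import json
import urllib.request

HUB_URL = "http://127.0.0.1:8000"


def progress_payload(summary, author, event_type="note", workstream_id=None, task_id=None):
    """Build the POST /progress/ body, omitting ids that are not applicable."""
    payload = {
        "summary": summary,
        "event_type": event_type,
        "author": author,
        "workstream_id": workstream_id,
        "task_id": task_id,
    }
    return {key: value for key, value in payload.items() if value is not None}


def post_progress(payload, base_url=HUB_URL):
    """Send the payload to the hub; assumes a JSON response body."""
    request = urllib.request.Request(
        f"{base_url}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes the omission rule easy to check without a running hub.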
 ### Update task status
 ```bash
 curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"status": "progress"}'
 # values: wait | todo | progress | done | cancel
 ```
 ### Flag a task for human review
 ```bash
 curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"needs_human": true, "intervention_note": "reason"}'
 ```
 ---
 ## Session Protocol
 **Start:**
 1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
 2. Check inbox: `GET /messages/?to_agent=ops-bridge&unread_only=true`; mark read
 3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
 4. Check human-needed tasks: `GET /tasks/?needs_human=true`
 **During work:**
 - Update task statuses in workplan files as tasks progress
 - Record significant decisions via `POST /decisions/`
 **Close:**
 1. Update workplan file task statuses to reflect progress
 2. Log: `POST /progress/` with a summary of what changed
 3. Note for the custodian operator: after workplan file changes, run from
   `~/state-hub`:
   ```bash
   make fix-consistency REPO=ops-bridge
   ```
   This syncs task status from files into the hub DB.
 ---
 ## Credential and access routing
 **Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
 for inference. Run this check **before** requesting secrets, API keys, SSH access,
 login tokens, or database passwords — in any repo, not only `ops-warden`.
 ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
 other credential need belongs to another subsystem. **Do not** message
 `ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
 ### Lookup (do this first)
 ```bash
 warden route find "<describe your need>" --json
 warden route show <catalog-id> --json
 ```
 Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
 | Agent runtime | How to orient |
 | --- | --- |
 | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=ops-bridge` is for coordination, not secret vending |
 | **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
 | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
 ### Quick routing table
 | I need… | Owner | ops-warden executes? |
 | --- | --- | --- |
 | SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
 | API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
 | Login / OIDC / MFA | key-cape / Keycloak | No — route only |
 | Authorization decision | flex-auth | No — route only |
 | activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
 | SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
 ### Anti-patterns (do not do these)
 - `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
 - Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
 - Pasting secrets into Git, State Hub, workplans, logs, or chat
 ### Other capabilities (reuse-surface)
 Non-credential capabilities are usually discovered through **reuse-surface** federation
 (`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
 every repo's agent instructions because it is high-frequency, high-risk, and easy to
 get wrong.
 **Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
 <!-- REPO-AGENTS-EXTENSIONS -->
 <!-- Append repo-specific agent instructions below this marker.
     The state-hub template sync preserves content after this line. -->
 ---
 ## Workplan Convention (ADR-001)
 Work items originate as files in this repo — not in the hub. The hub is a
 read/cache/index layer that rebuilds from files.
 **File location:** `workplans/BRIDGE-WP-NNNN-<slug>.md`
 **Archived location:** finished workplans may move to
 `workplans/archived/YYMMDD-BRIDGE-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
 the completion/archive date; the frontmatter `id` does not change.
 **Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
 `workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
 this only for low-risk work completed directly; create a normal workplan for
 anything needing analysis, design, approval, dependencies, or multiple phases.
 **Frontmatter:**
 ```yaml
 ---
 id: BRIDGE-WP-NNNN
 type: workplan
 title: "..."
 domain: infotech
 repo: ops-bridge
 status: proposed | ready | active | blocked | backlog | finished | archived
 owner: codex
 topic_slug: ...
 created: "YYYY-MM-DD"
 updated: "YYYY-MM-DD"
 state_hub_workstream_id: "<uuid>"   # written by fix-consistency — do not edit
 ---
 ```
 Use `proposed` for a new draft, `ready` after review against current repo
 state, and `finished` after implementation. `stalled` and `needs_review` are
 derived health labels, not frontmatter statuses.
 **Task block format** (one per `##` section):
 ```
 ## Task Title
 ` ` `task
 id: BRIDGE-WP-NNNN-T01
 status: wait | todo | progress | done | cancel
 priority: high | medium | low
 state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
 ` ` `
 Task description text.
 ```
 Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.
 To create a new workplan:
 1. Write the file following the format above
 2. Notify the custodian operator to run `make fix-consistency REPO=ops-bridge`
   (or send a message to the hub agent via `POST /messages/`)
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,8 +1,12 @@
 # ops-bridge — Claude Code Instructions
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md
--- a/INTENT.md
+++ b/INTENT.md
@@ -0,0 +1,92 @@
 # INTENT
 ## Purpose
 This repository exists to provide a **reliable, inspectable, and controllable connectivity layer** 
 between distributed dev, build, test, and execution environments for dev and ops personnel, both human and agentic.
 Its role is to ensure that remote machines can **consistently and safely “phone home”** without requiring complex network infrastructure or manual intervention.
 ---
 ## Primary Utility
 The repository provides a **managed SSH reverse tunneling system** that:
 * Maintains continuous connectivity between remote systems and a central hub
 * Makes connectivity **observable, auditable, and controllable**
 * Exposes this capability as both a **CLI tool and an MCP-accessible service**
 It transforms raw SSH port-forwarding into a **first-class operational primitive**.
 ---
 ## Intended Users
 * Human operators (`adm`) managing infrastructure and connectivity
 * LLM-based agents (`agt`) requiring stable access to local services
 * Deterministic automations (`atm`) coordinating distributed workloads
 ---
 ## Strategic Role in the System
 This repository acts as the **connectivity backbone** of the custodian ecosystem:
 * It enables remote agents and services to participate in a **locally anchored control plane**
 * It decouples **execution location** from **control location**
 * It supports a **hub-and-spoke topology** where the Custodian State Hub remains central
 ---
 ## Strategic Boundaries
 This repository is **not** intended to:
 * Replace SSH as a general-purpose access mechanism
 * Act as a credential authority or security policy engine
 * Provide full network virtualization (e.g., VPN, mesh networking)
 * Host or orchestrate application workloads
 Its responsibility ends at **secure, observable, and managed connectivity via tunnels**.
 ---
 ## Design Principles
 * **Continuity over convenience**
  Connectivity must persist across failures without manual recovery
 * **Observability as a first-class concern**
  All lifecycle events must be traceable and attributable
 * **Actor-aware operations**
  Every action is tied to a clearly defined actor type (`adm`, `agt`, `atm`)
 * **Pluggable security integration**
  Works with both static keys and external certificate authorities without owning them
 * **Toolability**
  All capabilities should be accessible programmatically (MCP) and operationally (CLI)
 ---
 ## Maturity Target
 A mature version of this repository should:
 * Provide **fully autonomous tunnel lifecycle management** across heterogeneous environments
 * Integrate seamlessly with **centralized access control and certificate systems**
 * Serve as a **standardized connectivity primitive** across all Custodian-managed systems
 * Offer **complete operational transparency** for all connectivity-related actions
 * Be robust enough to act as the **default connectivity layer** for distributed agent systems
 ---
 ## Stability Note
 Changes to this file represent a **deliberate shift in repository purpose or role** within the system architecture.
 Such changes should be rare and made with explicit intent.
--- a/Makefile
+++ b/Makefile
@@ -1,10 +1,31 @@
-.PHONY: test lint install
+.DEFAULT_GOAL := help
-test:
+.PHONY: help setup test lint install mcp-http mcp-stop cron-install-cron cron-uninstall-cron
 help: ## List available make targets
 	@awk 'BEGIN {FS = ":.*## "}; /^[a-zA-Z0-9_.-]+:.*## / {printf "  %-16s %s\n", $$1, $$2}' $(MAKEFILE_LIST)
 setup: ## Sync dependencies and install the bridge CLI wrapper
 	uv sync --all-groups
 	uv tool install -e . --force
 test: ## Run the test suite
 	uv run pytest
-lint:
+lint: ## Run ruff lint checks
 	uv run ruff check .
-install:
+install: ## Install the bridge CLI wrapper
-	uv tool install -e .
+	uv tool install -e . --force
 mcp-http: ## Start MCP server in SSE mode (default port 8002)
 	BRIDGE_MCP_PORT=$${BRIDGE_MCP_PORT:-8002} uv run python src/bridge/mcp_server/server.py --http
 mcp-stop: ## Stop MCP server running on port 8002
 	@lsof -ti:$${BRIDGE_MCP_PORT:-8002} | xargs -r kill -TERM && echo "MCP server stopped" || echo "No MCP server running on port $${BRIDGE_MCP_PORT:-8002}"
 cron-install-cron: ## Install 03:00 nightly stale-forward cleanup cron
 	bridge maintenance install-cron
 cron-uninstall-cron: ## Remove nightly stale-forward cleanup cron
 	bridge maintenance uninstall-cron
--- a/README.txt
+++ b/README.txt
@@ -243,6 +243,31 @@ has not yet cleaned up the socket), so the next reconnect attempt hits
 "remote port forwarding failed" and exits with code 255. With ClientAlive
 enabled, sshd evicts stale sessions within ~90 seconds and frees the port.
 NIGHTLY STALE-FORWARD CLEANUP
 ------------------------------
 When a bridge client dies without tearing down its SSH session, the remote
 host can keep port 18000 (etc.) bound to a zombie sshd listener. The port
 accepts connections but never forwards them, which breaks in-cluster proxies
 such as actcore-state-hub-bridge on railiance01.
 Install a 03:00 local-time cron job that probes each reverse tunnel's remote
 forward, kills stale listeners when the local service is healthy but the
 remote forward is not, and restarts the tunnel:
  bridge maintenance install-cron
 Manual run:
  bridge maintenance cleanup --restart
 Inspect or remove the cron entry:
  bridge maintenance show-cron
  bridge maintenance uninstall-cron
 Logs append to ~/.local/state/bridge/cleanup.log
 Apply and reload (no disconnect):
  sudo sed -i 's/#ClientAliveInterval 0/ClientAliveInterval 30/' /etc/ssh/sshd_config
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -8,7 +8,7 @@
 ## One-liner
-SSH reverse tunnel lifecycle manager — keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards.
+SSH reverse tunnel lifecycle manager — keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards. Supports both static SSH keys (no TTL) and CA-signed short-lived certificates via a pluggable `cert_command` interface.
 ---
@@ -20,11 +20,17 @@ Claude Code sessions run locally; the Custodian State Hub API runs locally. Remo
 ## In Scope
- Named SSH reverse tunnel lifecycle (`bridge up/down/restart/status/logs`)
+- Named SSH reverse tunnel lifecycle (`bridge up/down/restart/status/logs/cert-status`)
 - Auto-reconnect with exponential backoff and configurable retry policy
 - Optional HTTP health checks (confirm forwarded service is actually reachable from remote)
 - Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
- Actor attribution: per-tunnel actor class (human / automation) for audit traceability
+- Actor attribution: per-tunnel actor type (`adm` / `agt` / `atm`) for audit traceability,
  with naming convention enforcement (`adm-*`, `agt-*`, `atm-*`)
 - **Static key mode** (default): `ssh_key` passed directly to SSH — no TTL, no cert logic,
  works without any CA or external tooling
 - **cert_command mode** (optional): pluggable shell command that issues a short-lived
  CA-signed certificate before each SSH launch; TTL-aware pre-emptive cert refresh;
  `cert_identity` recorded in audit log — satisfies AccessManagementDirective §5
 - PID + state file management in `~/.local/state/bridge/`
 - MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
 - OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)
@@ -33,7 +39,10 @@ Claude Code sessions run locally; the Custodian State Hub API runs locally. Remo
 ## Out of Scope
- Identity/credential management (uses existing SSH keys)
+- Credential issuance and CA management (owned by `ops-warden`; ops-bridge consumes
  certs via the `cert_command` interface but never signs anything itself)
 - SSH key generation for human admins (self-service: `ssh-keygen`)
 - Host-side principal deployment (`/etc/ssh/auth_principals/`) — that is `railiance-infra`
 - Long-running application hosting on remote machines (port-forward only, not deployment)
 - VPN or layer-3 connectivity
 - Monitoring/alerting beyond JSON audit logs
@@ -44,9 +53,11 @@ Claude Code sessions run locally; the Custodian State Hub API runs locally. Remo
 ## Relevant When
 - Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
- Need audit trail of which actor (human vs. automation) started/stopped tunnels
+- Need audit trail of which actor (`adm` / `agt` / `atm`) started/stopped tunnels
 - Setting up a new machine in the Railiance ecosystem that must phone home to the hub
 - Diagnosing connectivity issues between local hub and remote services
 - Checking certificate validity for active tunnels (`bridge cert-status`)
 - Integrating with a CA (ops-warden or Vault) for short-lived tunnel credentials
 ---
@@ -60,8 +71,11 @@ Claude Code sessions run locally; the Custodian State Hub API runs locally. Remo
 ## Current State
- Status: experimental → active (v0.1 core complete; OpsCatalog planned but not yet shipped)
+- Status: active (v0.1 core complete; AccessManagementDirective alignment done — BRIDGE-WP-0004)
- Implementation: ~75% — CLI tunneling fully functional, MCP integration working, health checks and audit logging complete; OpsCatalog framework present but not populated
+- Implementation: ~80% — CLI tunneling fully functional, MCP integration working, health
  checks and audit logging complete; ActorType enum (adm/agt/atm) enforced; cert_command
  mode implemented with TTL-aware refresh and cert_identity audit logging; OpsCatalog
  framework present but not yet populated
 - Stability: stable tunnel lifecycle; tested under network drops and SSH failures
 - Usage: running in lab for daily Railiance/Temporal connectivity
@@ -77,17 +91,24 @@ Claude Code sessions run locally; the Custodian State Hub API runs locally. Remo
 ## Terminology
- Preferred terms: tunnel, bridge, actor, actor_class, reconnect policy, health check
+- Preferred terms: tunnel, bridge, actor, actor_type, reconnect policy, health check,
  cert_command, cert_identity
 - Actor types: `adm` (human operator), `agt` (LLM agent), `atm` (deterministic automation)
 - Also known as: "the bridge"
- Potentially confusing terms: "bridge state" is a tunnel-specific state machine (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
+- Potentially confusing: "bridge state" is a tunnel-specific state machine
  (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
 - Legacy terms (deprecated): `actor_class: human` (→ `adm`), `actor_class: automation` (→ `atm`)
 ---
-## Related / Overlapping Repositories
+## Related / Overlapping
 - `the-custodian` — primary consumer; ops-bridge keeps remote agents connected to it
 - `ops-warden` — optional upstream; owns CA and cert issuance; ops-bridge calls it via
  `cert_command` when short-lived certificates are required
 - `activity-core` — Temporal server on remote reached via ops-bridge tunnel
- `railiance-cluster` / `railiance-infra` — remote hosts that need to phone home
+- `railiance-cluster` / `railiance-infra` — remote hosts that need to phone home; owns
  host-side principal deployment (`/etc/ssh/auth_principals/`)
 ---
@@ -105,5 +126,9 @@ keywords: [ssh, tunnel, reverse-tunnel, connectivity, remote, bridge, ops-bridge
 ## Getting Oriented
 - Start with: `README.txt` (architecture, config format, CLI commands, MCP integration)
- Key files / directories: `~/.config/bridge/tunnels.yaml` (tunnel config), `~/.local/state/bridge/` (PID/state files)
- Entry points: `bridge --help`; `bridge up <tunnel-name>`; MCP: `bridge_status()`
+- Key files / directories: `~/.config/bridge/tunnels.yaml` (tunnel config),
+  `~/.local/state/bridge/` (PID/state/cert files)
 - Entry points: `bridge --help`; `bridge up <tunnel-name>`; `bridge cert-status`;
  MCP: `bridge_status()`
 - AccessManagementDirective context: `wiki/AccessManagementDirective.md`
 - Workplans: BRIDGE-WP-0004 (directive alignment), WARDEN-WP-0001 (ops-warden bootstrap)
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -11,7 +11,7 @@ dependencies = [
    "typer>=0.12",
    "pyyaml>=6.0",
    "httpx>=0.27",
-    "fastmcp>=2.0.0",
+    "fastmcp>=2.0.0,<3.1.0",
 ]
 [project.scripts]
--- a/registry/README.md
+++ b/registry/README.md
@@ -0,0 +1,12 @@
 # Capability Registry
 Markdown-first capability index for federation and reuse planning.
 ## Authoring
 1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
 2. Add the row to `indexes/capabilities.yaml`.
 3. Run `reuse-surface validate` from a checkout with the CLI installed.
 4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
 Federation contract: reuse-surface `docs/RegistryFederation.md`.
--- a/registry/capabilities/.gitkeep
+++ b/registry/capabilities/.gitkeep
--- a/registry/indexes/capabilities.yaml
+++ b/registry/indexes/capabilities.yaml
@@ -0,0 +1,4 @@
 version: 1
 updated: '2026-06-16'
 domain: helix_forge
 capabilities: []
--- a/src/bridge/audit.py
+++ b/src/bridge/audit.py
@@ -16,6 +16,7 @@ class AuditEvent(str, Enum):
    HEALTH_CHECK_FAILED = "health_check_failed"
    HEALTH_CHECK_RECOVERED = "health_check_recovered"
    BRIDGE_STOPPED = "bridge_stopped"
    CERT_EXPIRING = "cert_expiring"
 def _default_state_dir() -> Path:
@@ -34,19 +35,22 @@ class AuditLogger:
        tunnel: str,
        event: AuditEvent,
        actor: str,
-        actor_class: str,
+        actor_type: str,
        detail: str = "",
        cert_identity: Optional[str] = None,
    ) -> None:
        self._dir.mkdir(parents=True, exist_ok=True)
        entry: Dict[str, Any] = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tunnel": tunnel,
            "actor": actor,
-            "actor_class": actor_class,
+            "actor_type": actor_type,
            "event": event.value,
        }
        if detail:
            entry["detail"] = detail
        if cert_identity:
            entry["cert_identity"] = cert_identity
        with self._log_path(tunnel).open("a") as f:
            f.write(json.dumps(entry) + "\n")
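For orientation, a minimal sketch of the JSON line this hunk produces after the rename, assuming the same field layout as the `entry` dict above. The tunnel name, actor name, and `cert_identity` value are hypothetical, and `build_entry` is an illustrative stand-in rather than the `AuditLogger` method itself:

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_entry(
    tunnel: str,
    event: str,
    actor: str,
    actor_type: str,
    detail: str = "",
    cert_identity: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one audit record; optional fields are omitted when empty."""
    entry: Dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tunnel": tunnel,
        "actor": actor,
        "actor_type": actor_type,
        "event": event,
    }
    if detail:
        entry["detail"] = detail
    if cert_identity:
        entry["cert_identity"] = cert_identity
    return entry


# Hypothetical values for illustration.
line = json.dumps(
    build_entry(
        "hub-forward",
        "connected",
        "agt-claude-coulombcore",
        "agt",
        cert_identity="agt-claude-coulombcore",
    )
)
print(line)
```

Consumers that still read `actor_class` from older log lines would need to accept both keys, since existing log files are appended to rather than rewritten.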
--- a/src/bridge/capabilities.py
+++ b/src/bridge/capabilities.py
@@ -73,6 +73,11 @@ CAPABILITIES: list[Capability] = [
        description="End-to-end tunnel diagnostics via SSH: SSH PID alive + remote port listening",
        required_access_modes=frozenset({"cli", "mcp"}),
    ),
    Capability(
        name="bridge_cert_status",
        description="Show certificate status for tunnels using cert_command mode",
        required_access_modes=frozenset({"cli"}),
    ),
 ]
 CAPABILITIES_BY_NAME: dict[str, Capability] = {c.name: c for c in CAPABILITIES}
--- a/src/bridge/cleanup.py
+++ b/src/bridge/cleanup.py
@@ -0,0 +1,328 @@
 """Nightly maintenance: detect and clear stale SSH remote port forwards."""
 from __future__ import annotations
 import subprocess
 from dataclasses import dataclass
 from typing import Optional
 from urllib.parse import urlparse, urlunparse
 import httpx
 from bridge.diagnostics import _remote_port_probe_command, check_tunnel
 from bridge.manager import TunnelManager
 from bridge.models import TunnelConfig
 from bridge.state import StateManager
@dataclass
 class CleanupAction:
    tunnel: str
    action: str  # skipped | healthy | cleaned | cleaned_and_restarted | error
    detail: str = ""
@dataclass
 class CleanupReport:
    actions: list[CleanupAction]
    @property
    def cleaned_count(self) -> int:
        return sum(1 for a in self.actions if a.action.startswith("cleaned"))
 def remote_forward_health_url(cfg: TunnelConfig) -> Optional[str]:
    """Map the local health_check URL to the remote forwarded port."""
    if cfg.health_check is None or cfg.direction == "local":
        return None
    parsed = urlparse(cfg.health_check.url)
    if not parsed.hostname:
        return None
    netloc = f"{parsed.hostname}:{cfg.remote_port}"
    return urlunparse(parsed._replace(netloc=netloc))
 def _ssh_base_cmd(cfg: TunnelConfig) -> list[str]:
    from pathlib import Path
    return [
        "ssh",
        "-i",
        str(Path(cfg.ssh_key).expanduser()),
        "-o",
        "BatchMode=yes",
        "-o",
        "ConnectTimeout=10",
        "-o",
        "StrictHostKeyChecking=accept-new",
        f"{cfg.ssh_user}@{cfg.host}",
    ]
 def _run_ssh(cfg: TunnelConfig, remote_command: str, *, timeout: float = 30) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        [*_ssh_base_cmd(cfg), remote_command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
 def remote_port_listening(cfg: TunnelConfig) -> bool:
    proc = _run_ssh(cfg, _remote_port_probe_command(cfg.remote_port), timeout=15)
    return proc.stdout.strip() == "ok"
 def probe_remote_forward(cfg: TunnelConfig) -> tuple[bool, str]:
    """Return (healthy, detail) for the remote forwarded service."""
    url = remote_forward_health_url(cfg)
    if url is None:
        return True, "no remote health url configured"
    timeout = cfg.health_check.timeout_seconds if cfg.health_check else 5
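     import shlex
     # shlex.quote is shell-safe for any URL; repr() only applies Python quoting.
     remote_cmd = (
         f"curl -sf --max-time {timeout} {shlex.quote(url)} >/dev/null "
         "&& echo ok || echo fail"
     )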
    try:
        proc = _run_ssh(cfg, remote_cmd, timeout=timeout + 15)
    except subprocess.TimeoutExpired:
        return False, "remote health probe timed out"
    output = proc.stdout.strip()
    if output == "ok":
        return True, "remote forward healthy"
    if proc.returncode != 0 and proc.stderr.strip():
        return False, proc.stderr.strip()
    return False, "remote forward unhealthy"
 def local_service_healthy(cfg: TunnelConfig) -> Optional[bool]:
    if cfg.health_check is None:
        return None
    try:
        resp = httpx.get(
            cfg.health_check.url,
            timeout=cfg.health_check.timeout_seconds,
        )
        return resp.is_success
    except Exception:
        return False
 def _remote_cleanup_script(port: int) -> str:
    return f"""set -eu
 port={port}
 pids=""
 if command -v lsof >/dev/null 2>&1; then
  pids=$(sudo -n lsof -t -iTCP:$port -sTCP:LISTEN 2>/dev/null || true)
  if [ -z "$pids" ]; then
    pids=$(lsof -t -iTCP:$port -sTCP:LISTEN 2>/dev/null || true)
  fi
 fi
 if [ -z "$pids" ] && command -v fuser >/dev/null 2>&1; then
  pids=$(fuser -n tcp $port 2>/dev/null | tr -s ' ' '\\n' | grep -E '^[0-9]+$' || true)
 fi
 if [ -z "$pids" ]; then
  echo "no_listeners"
  exit 0
 fi
 echo "killing:$pids"
 for pid in $pids; do
  kill "$pid" 2>/dev/null || sudo -n kill "$pid" 2>/dev/null || true
 done
 sleep 1
 if ss -tln 2>/dev/null | grep -q ":$port "; then
  echo "still_listening"
 else
  echo "cleared"
 fi
 """
 def clear_stale_remote_binding(cfg: TunnelConfig) -> tuple[bool, str]:
    try:
        proc = _run_ssh(cfg, _remote_cleanup_script(cfg.remote_port), timeout=30)
    except subprocess.TimeoutExpired:
        return False, "remote cleanup timed out"
    output = proc.stdout.strip()
    if "cleared" in output:
        return True, output
    if "no_listeners" in output:
        return True, "no listeners found"
    if "still_listening" in output:
        return False, output
    detail = output or proc.stderr.strip() or f"exit {proc.returncode}"
    return False, detail
 def should_cleanup_tunnel(
    cfg: TunnelConfig,
    state_mgr: StateManager,
 ) -> tuple[bool, str]:
    """Decide whether a reverse tunnel's remote binding looks stale."""
    if cfg.direction == "local":
        return False, "local tunnel"
    if not remote_port_listening(cfg):
        return False, "remote port closed"
    remote_ok, remote_detail = probe_remote_forward(cfg)
    if remote_ok:
        return False, remote_detail
    check = check_tunnel(cfg, state_mgr)
    local_ok = local_service_healthy(cfg)
    if local_ok is True and not remote_ok:
        return True, f"stale forward: {remote_detail}"
    if check.ssh_process != "ok" and check.remote_port == "listening":
        return True, f"orphan forward while ssh {check.ssh_process}: {remote_detail}"
    if check.ssh_process == "ok" and not remote_ok:
        return True, f"broken forward with live client: {remote_detail}"
    return False, remote_detail
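The decision order in `should_cleanup_tunnel` can be read as a small truth table. The sketch below restates it as a pure function, under the simplifying assumption that the two remote port probes are collapsed into one `remote_listening` flag; `looks_stale` and its parameters are hypothetical names used only for illustration:

```python
from typing import Optional, Tuple


def looks_stale(
    direction: str,
    remote_listening: bool,
    remote_ok: bool,
    local_ok: Optional[bool],
    ssh_alive: bool,
) -> Tuple[bool, str]:
    """Return (needs_cleanup, reason) for one tunnel."""
    if direction == "local":
        return False, "local tunnel"
    if not remote_listening:
        return False, "remote port closed"  # nothing is bound, nothing to kill
    if remote_ok:
        return False, "remote forward healthy"
    if local_ok is True:
        return True, "stale forward"  # service is fine locally, forward is dead
    if not ssh_alive:
        return True, "orphan forward"  # listener outlived its ssh client
    return True, "broken forward with live client"


for case in [
    ("reverse", True, False, True, True),
    ("reverse", False, False, True, True),
    ("reverse", True, True, True, True),
    ("local", True, False, True, True),
]:
    print(case, "->", looks_stale(*case))
```

Under this simplification, cleanup is only ever considered once the remote port is bound and the remote health probe has failed.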
 def cleanup_tunnel(
    cfg: TunnelConfig,
    state_mgr: StateManager,
    *,
    restart: bool,
 ) -> CleanupAction:
    name = cfg.name
    try:
        needed, reason = should_cleanup_tunnel(cfg, state_mgr)
        if not needed:
            return CleanupAction(name, "healthy", reason)
        ok, detail = clear_stale_remote_binding(cfg)
        if not ok:
            return CleanupAction(name, "error", f"cleanup failed: {detail}")
        if not restart:
            return CleanupAction(name, "cleaned", f"{reason}; {detail}")
        mgr = TunnelManager(cfg, state_dir=state_mgr._dir)
        was_running = mgr.is_running()
        if was_running:
            mgr.stop()
        mgr.start()
        action = "cleaned_and_restarted"
        verb = "restarted" if was_running else "started"
        return CleanupAction(name, action, f"{reason}; {verb} tunnel; {detail}")
    except Exception as exc:
        return CleanupAction(name, "error", str(exc))
 def restart_tunnel(
    cfg: TunnelConfig,
    state_mgr: StateManager,
 ) -> CleanupAction:
    """Restart one tunnel with blank-slate recovery for reverse tunnels."""
    if cfg.direction == "local":
        mgr = TunnelManager(cfg, state_dir=state_mgr._dir)
        mgr.stop()
        mgr.start()
        return CleanupAction(cfg.name, "restarted", "local tunnel stop/start")
    return cleanup_tunnel(cfg, state_mgr, restart=True)
 def restart_all_tunnels(
    cfg,
    state_mgr: StateManager,
 ) -> list[CleanupAction]:
    """Restart every inline tunnel (reverse via cleanup path, local via stop/start)."""
    return [restart_tunnel(tcfg, state_mgr) for tcfg in cfg.tunnels.values()]
 def cleanup_all_tunnels(
    cfg,
    state_mgr: StateManager,
    *,
    restart: bool,
    tunnel_name: Optional[str] = None,
 ) -> CleanupReport:
    tunnels = cfg.tunnels.values()
    if tunnel_name is not None:
        if tunnel_name not in cfg.tunnels:
            raise KeyError(tunnel_name)
        tunnels = [cfg.tunnels[tunnel_name]]
    actions = [
        cleanup_tunnel(tcfg, state_mgr, restart=restart)
        for tcfg in tunnels
        if tcfg.direction != "local"
    ]
    return CleanupReport(actions=actions)
 CRON_MARKER = "# ops-bridge: maintenance cleanup"
 CRON_SCHEDULE = "0 3 * * *"
 CRON_LOG = "~/.local/state/bridge/cleanup.log"
 def build_cron_line() -> str:
    bridge_bin = "~/.local/bin/bridge"
    return (
        f"{CRON_SCHEDULE} BRIDGE_CONFIG=~/.config/bridge/tunnels.yaml "
        f"{bridge_bin} maintenance cleanup --restart "
        f">> {CRON_LOG} 2>&1 {CRON_MARKER}"
    )
 def read_installed_cron() -> Optional[str]:
    proc = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    if proc.returncode != 0:
        return None
    for line in proc.stdout.splitlines():
        if CRON_MARKER in line:
            return line.strip()
    return None
 def install_cleanup_cron() -> tuple[bool, str]:
    existing = read_installed_cron()
    if existing:
        return False, f"cron already installed: {existing}"
    proc = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    current = proc.stdout if proc.returncode == 0 else ""
    new_line = build_cron_line()
    body = current.rstrip("\n")
    if body:
        body += "\n"
    body += new_line + "\n"
    write = subprocess.run(
        ["crontab", "-"],
        input=body,
        capture_output=True,
        text=True,
    )
    if write.returncode != 0:
        return False, write.stderr.strip() or "crontab write failed"
    return True, new_line
 def uninstall_cleanup_cron() -> tuple[bool, str]:
    proc = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    if proc.returncode != 0:
        return False, "no crontab installed"
    kept = [
        line
        for line in proc.stdout.splitlines()
        if CRON_MARKER not in line
    ]
    if len(kept) == len(proc.stdout.splitlines()):
        return False, "cleanup cron not found"
    body = "\n".join(kept).rstrip("\n")
    if body:
        body += "\n"
    write = subprocess.run(
        ["crontab", "-"],
        input=body,
        capture_output=True,
        text=True,
    )
    if write.returncode != 0:
        return False, write.stderr.strip() or "crontab write failed"
    return True, "removed cleanup cron entry"
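The install and uninstall functions above are idempotent because they key on `CRON_MARKER`. A minimal sketch of the same text handling without the `crontab` subprocess calls, assuming the crontab body is passed in as a string (`add_entry` and `remove_entry` are hypothetical helper names, and the sample crontab content is made up):

```python
CRON_MARKER = "# ops-bridge: maintenance cleanup"


def add_entry(crontab_text: str, new_line: str) -> str:
    """Append new_line unless a marked entry is already present."""
    if any(CRON_MARKER in line for line in crontab_text.splitlines()):
        return crontab_text
    body = crontab_text.rstrip("\n")
    if body:
        body += "\n"
    return body + new_line + "\n"


def remove_entry(crontab_text: str) -> str:
    """Drop every line carrying the marker and keep the rest."""
    kept = [line for line in crontab_text.splitlines() if CRON_MARKER not in line]
    body = "\n".join(kept).rstrip("\n")
    return body + "\n" if body else ""


# Illustrative crontab content.
existing = "0 4 * * * /usr/bin/true\n"
entry = f"0 3 * * * bridge maintenance cleanup --restart {CRON_MARKER}"
once = add_entry(existing, entry)
twice = add_entry(once, entry)
print(once == twice)
print(remove_entry(once) == existing)
```

Keeping the marker on the same line as the command means unrelated crontab entries are never touched on removal.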
--- a/src/bridge/cli.py
+++ b/src/bridge/cli.py
@@ -4,12 +4,24 @@ from __future__ import annotations
 import dataclasses
 import json
 import os
 import subprocess
 from datetime import datetime
 from pathlib import Path
 from typing import Optional
 import typer
 from bridge.audit import AuditLogger
 from bridge.cleanup import (
    CleanupAction,
    build_cron_line,
    cleanup_all_tunnels,
    install_cleanup_cron,
    read_installed_cron,
    restart_all_tunnels,
    restart_tunnel,
    uninstall_cleanup_cron,
 )
 from bridge.config import ConfigError, load_config
 from bridge.diagnostics import check_all_tunnels, check_tunnel
 from bridge.manager import TunnelManager
@@ -23,9 +35,11 @@ app = typer.Typer(
 targets_app = typer.Typer(help="Inspect infrastructure targets from the OpsCatalog.")
 catalog_app = typer.Typer(help="Inspect and validate the OpsCatalog.")
 maintenance_app = typer.Typer(help="Scheduled maintenance for tunnel hygiene.")
 app.add_typer(targets_app, name="targets")
 app.add_typer(catalog_app, name="catalog")
 app.add_typer(maintenance_app, name="maintenance")
 def _state_dir() -> Path:
@@ -142,27 +156,37 @@ def down(
            raise typer.Exit(2)
 def _emit_restart_actions(actions: list[CleanupAction]) -> None:
    any_error = False
    for action in actions:
        typer.echo(f"{action.tunnel}: {action.action} — {action.detail}")
        if action.action == "error":
            any_error = True
    if any_error:
        raise typer.Exit(1)
@app.command()
 def restart(
    tunnel: Optional[str] = typer.Argument(None, help="Tunnel name (omit for all inline)"),
 ):
-    """Restart one or all tunnels."""
+    """Restart one or all tunnels.
    Reverse tunnels run conditional remote stale-forward cleanup before
    reconnecting; healthy forwards are left running. Local-direction tunnels
    use local stop/start only.
    """
    cfg = _load_or_exit()
    sd = _state_dir()
    state_mgr = StateManager(state_dir=sd)
    if tunnel:
        tcfg = _resolve_tunnel(cfg, tunnel)
-        mgr = TunnelManager(tcfg, state_dir=sd)
-        mgr.stop()
-        mgr.start()
-        typer.echo(f"Restarted tunnel '{tunnel}'.")
+        actions = [restart_tunnel(tcfg, state_mgr)]
     else:
-        for name in _all_tunnel_names(cfg):
-            tcfg = cfg.tunnels[name]
-            mgr = TunnelManager(tcfg, state_dir=sd)
-            mgr.stop()
-            mgr.start()
-            typer.echo(f"Restarted tunnel '{name}'.")
+        actions = restart_all_tunnels(cfg, state_mgr)
+
+    _emit_restart_actions(actions)
@app.command()
@@ -357,6 +381,84 @@ def _print_check_table(results):
        typer.echo(_fmt(row))
@app.command("cert-status")
 def cert_status(
    tunnel: Optional[str] = typer.Argument(None, help="Tunnel name (omit for all inline)"),
    as_json: bool = typer.Option(False, "--json", help="Output as JSON"),
 ):
    """Show certificate status for tunnels using cert_command mode."""
    cfg = _load_or_exit()
    sd = _state_dir()
    names = [tunnel] if tunnel else list(cfg.tunnels.keys())
    rows = []
    any_expired = False
    for name in names:
        cert_file = sd / f"{name}-cert.pub"
        if not cert_file.exists():
            rows.append({"tunnel": name, "mode": "static-key", "cert_file": None})
            continue
        try:
            result = subprocess.run(
                ["ssh-keygen", "-L", "-f", str(cert_file)],
                capture_output=True, text=True, check=False,
            )
            info = {"tunnel": name, "mode": "cert", "cert_file": str(cert_file)}
            for line in result.stdout.splitlines():
                line = line.strip()
                if line.startswith("Key ID:"):
                    info["key_id"] = line.split(":", 1)[1].strip().strip('"')
                elif line.startswith("Valid:"):
                    parts = line.split()
                    if len(parts) >= 5 and parts[1] == "from" and parts[3] == "to":
                        info["valid_from"] = parts[2]
                        info["valid_until"] = parts[4]
                        try:
                            expires = datetime.fromisoformat(parts[4])
                            now = datetime.now()
                            remaining = expires - now
                            if remaining.total_seconds() <= 0:
                                info["expired"] = True
                                any_expired = True
                            else:
                                info["expired"] = False
                                mins = int(remaining.total_seconds() // 60)
                                info["ttl_remaining"] = f"{mins}m"
                        except ValueError:
                            pass
            rows.append(info)
        except FileNotFoundError:
            rows.append({"tunnel": name, "mode": "cert", "error": "ssh-keygen not found"})
    if as_json:
        typer.echo(json.dumps(rows, indent=2))
    else:
        for row in rows:
            mode = row.get("mode", "unknown")
            if mode == "static-key":
                typer.echo(f"{row['tunnel']}  static-key / no cert")
            elif "error" in row:
                typer.echo(f"{row['tunnel']}  ERROR: {row['error']}")
            else:
                parts = [row["tunnel"]]
                if "key_id" in row:
                    parts.append(f"id={row['key_id']}")
                if "valid_from" in row:
                    parts.append(f"from={row['valid_from']}")
                if "valid_until" in row:
                    parts.append(f"until={row['valid_until']}")
                if row.get("expired"):
                    parts.append("EXPIRED")
                elif "ttl_remaining" in row:
                    parts.append(f"ttl={row['ttl_remaining']}")
                typer.echo("  ".join(parts))
    if any_expired:
        raise typer.Exit(1)
 # ─── targets commands ─────────────────────────────────────────────────────────
@targets_app.callback(invoke_without_command=True)
@@ -553,3 +655,119 @@ def catalog_show(
    if b.target in cat.targets:
        t = cat.targets[b.target]
        typer.echo(f"Target:         {t.description or t.id} ({t.kind})")
 _CONVENTIONS_TEXT = """\
 Actor Naming Conventions (from AccessManagementDirective.md §2)
 Every actor declared under `actors:` in ~/.config/bridge/tunnels.yaml must have
 a `class` field, and the actor name must start with the class-specific prefix:
  class   prefix   purpose
  -----   ------   ------------------------------------------------------------
  adm     adm-     Human operator (interactive shell when needed)
  agt     agt-     LLM-powered autonomous agent (Claude Code, etc.)
  atm     atm-     Deterministic script / cron job / pipeline
 Legacy class aliases (deprecated, still accepted with a warning):
  human       -> adm
  automation  -> atm
 Examples:
  adm-bernd:              { class: adm, description: Bernd Worsch }
  agt-claude-coulombcore: { class: agt, description: Claude Code on CoulombCore }
  atm-backup-daily:       { class: atm, description: Nightly DB backup }
 Full specification:
  <ops-bridge repo>/wiki/AccessManagementDirective.md
 """
@maintenance_app.command("cleanup")
 def maintenance_cleanup(
    tunnel: Optional[str] = typer.Argument(
        None,
        help="Tunnel name (omit for all reverse tunnels)",
    ),
    restart: bool = typer.Option(
        False,
        "--restart",
        help="Restart tunnels after clearing stale remote bindings",
    ),
    as_json: bool = typer.Option(False, "--json", help="Output as JSON"),
 ):
    """Clear stale SSH remote port forwards that block tunnel reconnects."""
    cfg = _load_or_exit()
    sd = _state_dir()
    state_mgr = StateManager(state_dir=sd)
    try:
        report = cleanup_all_tunnels(
            cfg,
            state_mgr,
            restart=restart,
            tunnel_name=tunnel,
        )
    except KeyError:
        typer.echo(f"Error: tunnel '{tunnel}' not found in config", err=True)
        raise typer.Exit(1)
    if as_json:
        payload = {
            "cleaned_count": report.cleaned_count,
            "actions": [
                {"tunnel": a.tunnel, "action": a.action, "detail": a.detail}
                for a in report.actions
            ],
        }
        typer.echo(json.dumps(payload, indent=2))
        return
    if not report.actions:
        typer.echo("No reverse tunnels configured.")
        return
    for action in report.actions:
        typer.echo(f"{action.tunnel}: {action.action} — {action.detail}")
    typer.echo(f"done ({report.cleaned_count} cleaned)")
@maintenance_app.command("install-cron")
 def maintenance_install_cron():
    """Install a 03:00 daily cron job for `bridge maintenance cleanup --restart`."""
    installed, message = install_cleanup_cron()
    if installed:
        typer.echo("Installed nightly cleanup cron:")
        typer.echo(f"  {message}")
    else:
        typer.echo(message)
        raise typer.Exit(2)
@maintenance_app.command("uninstall-cron")
 def maintenance_uninstall_cron():
    """Remove the nightly cleanup cron job."""
     removed, message = uninstall_cleanup_cron()
     typer.echo(message)
     if not removed:
         raise typer.Exit(2)
@maintenance_app.command("show-cron")
 def maintenance_show_cron():
    """Show the configured nightly cleanup cron line."""
    existing = read_installed_cron()
    if existing:
        typer.echo(existing)
    else:
        typer.echo("Nightly cleanup cron is not installed.")
        typer.echo("Would install:")
        typer.echo(f"  {build_cron_line()}")
@app.command()
 def conventions():
    """Show the actor naming conventions enforced by tunnels.yaml."""
    typer.echo(_CONVENTIONS_TEXT)
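The `cert-status` command in this file derives the remaining TTL from the `Valid:` line of `ssh-keygen -L` output. A minimal sketch of that parsing step in isolation, assuming the `Valid: from <start> to <end>` form with naive local timestamps; `parse_validity` is a hypothetical helper and the sample line is made up:

```python
from datetime import datetime
from typing import Optional


def parse_validity(line: str, now: datetime) -> Optional[dict]:
    """Parse one 'Valid: from X to Y' line; return None for other shapes."""
    parts = line.strip().split()
    if len(parts) < 5 or parts[0] != "Valid:":
        return None  # e.g. "Valid: forever" has no expiry to compute
    if parts[1] != "from" or parts[3] != "to":
        return None
    expires = datetime.fromisoformat(parts[4])
    remaining = (expires - now).total_seconds()
    info = {
        "valid_from": parts[2],
        "valid_until": parts[4],
        "expired": remaining <= 0,
    }
    if remaining > 0:
        info["ttl_remaining"] = f"{int(remaining // 60)}m"
    return info


# Fixed clock so the example is reproducible.
sample = "        Valid: from 2026-07-03T18:00:00 to 2026-07-03T19:00:00"
print(parse_validity(sample, datetime(2026, 7, 3, 18, 30)))
```

Passing `now` in as an argument keeps the function testable; the command itself reads the clock at call time.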
--- a/src/bridge/config.py
+++ b/src/bridge/config.py
@@ -2,13 +2,14 @@
 from __future__ import annotations
 import os
 import warnings
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Dict, Optional
 import yaml
-from bridge.models import ActorInfo, HealthCheckConfig, ReconnectPolicy, TunnelConfig
+from bridge.models import ActorInfo, ActorType, HealthCheckConfig, ReconnectPolicy, TunnelConfig
 class ConfigError(Exception):
@@ -91,6 +92,10 @@ def _parse_tunnel(name: str, data: dict) -> TunnelConfig:
    if direction not in ("reverse", "local"):
        raise ConfigError(f"Tunnel '{name}' direction must be 'reverse' or 'local', got: {direction!r}")
    cert_command = data.get("cert_command") or None
    if cert_command is not None:
        cert_command = str(cert_command)
    return TunnelConfig(
        name=name,
        host=str(data["host"]),
@@ -102,6 +107,39 @@ def _parse_tunnel(name: str, data: dict) -> TunnelConfig:
        reconnect=reconnect,
        health_check=health_check,
        direction=direction,
        remote_host=str(data.get("remote_host", "127.0.0.1")),
        cert_command=cert_command,
    )
 _LEGACY_CLASS_MAP = {
    "human": ActorType.ADM,
    "automation": ActorType.ATM,
 }
 _ACTOR_TYPE_PREFIXES = {
    ActorType.ADM: "adm-",
    ActorType.AGT: "agt-",
    ActorType.ATM: "atm-",
 }
 def _parse_actor_type(name: str, raw_class: str) -> ActorType:
    if raw_class in _LEGACY_CLASS_MAP:
        warnings.warn(
            f"Actor '{name}': class '{raw_class}' is deprecated; "
            f"use '{_LEGACY_CLASS_MAP[raw_class].value}' instead.",
            DeprecationWarning,
            stacklevel=4,
        )
        return _LEGACY_CLASS_MAP[raw_class]
    try:
        return ActorType(raw_class)
    except ValueError:
        raise ConfigError(
            f"Actor '{name}' has unknown class '{raw_class}'; "
            f"must be one of: adm, agt, atm (or legacy: human, automation). "
            f"Run `bridge conventions` for the full naming rules."
        )
@@ -112,9 +150,17 @@ def _parse_actors(raw: dict) -> Dict[str, ActorInfo]:
            raise ConfigError(f"Actor '{name}' must be a mapping")
        if "class" not in data:
            raise ConfigError(f"Actor '{name}' missing required field: class")
        actor_type = _parse_actor_type(name, str(data["class"]))
        required_prefix = _ACTOR_TYPE_PREFIXES[actor_type]
        if not name.startswith(required_prefix):
            raise ConfigError(
                f"Actor '{name}' has type '{actor_type.value}' but name must start "
                f"with '{required_prefix}' (got '{name}'). "
                f"Run `bridge conventions` for the full naming rules."
            )
        actors[name] = ActorInfo(
            name=name,
-            actor_class=str(data["class"]),
+            actor_type=actor_type,
            description=str(data.get("description", "")),
        )
    return actors
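The two checks added in this file (legacy class mapping, then prefix enforcement) can be exercised on their own. The sketch below is a self-contained approximation: the `ActorType` enum is assumed to look like this (its definition in `bridge.models` is not part of this diff), `ValueError` stands in for `ConfigError`, and `resolve_actor_type` is a hypothetical name:

```python
import warnings
from enum import Enum


class ActorType(str, Enum):
    # Assumed shape; the real enum lives in bridge.models.
    ADM = "adm"
    AGT = "agt"
    ATM = "atm"


LEGACY_CLASS_MAP = {"human": ActorType.ADM, "automation": ActorType.ATM}


def resolve_actor_type(name: str, raw_class: str) -> ActorType:
    """Map a class string to ActorType and enforce the name prefix."""
    if raw_class in LEGACY_CLASS_MAP:
        warnings.warn(
            f"Actor '{name}': class '{raw_class}' is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        actor_type = LEGACY_CLASS_MAP[raw_class]
    else:
        actor_type = ActorType(raw_class)  # raises ValueError if unknown
    prefix = f"{actor_type.value}-"
    if not name.startswith(prefix):
        raise ValueError(f"Actor '{name}' must start with '{prefix}'")
    return actor_type


print(resolve_actor_type("adm-bernd", "adm"))
```

Note that a legacy class still has to satisfy the new prefix rule, so `class: automation` only passes for names that start with `atm-`.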
--- a/src/bridge/diagnostics.py
+++ b/src/bridge/diagnostics.py
@@ -1,6 +1,7 @@
 """End-to-end tunnel diagnostics for OpsBridge."""
 from __future__ import annotations
 import socket
 import subprocess
 import time
 from dataclasses import dataclass
@@ -13,6 +14,38 @@ from bridge.models import BridgeState, TunnelConfig
 from bridge.state import StateManager, _pid_alive
 def _remote_port_probe_command(remote_port: int) -> str:
    """Build a portable remote shell probe for a listening TCP port."""
    return (
        f"port={remote_port}; "
        "if command -v ss >/dev/null 2>&1; then "
        "ss -tnlp 2>/dev/null | grep -q \":$port \" && echo ok || echo closed; "
        "elif command -v netstat >/dev/null 2>&1; then "
        "netstat -tnlp 2>/dev/null | "
        "grep -q \"[.:]$port[[:space:]]\" && echo ok || echo closed; "
        "else "
        "hex=$(printf '%04X' \"$port\"); "
        "awk -v p=\":$hex\" "
        "'NR > 1 && $4 == \"0A\" && index($2, p) { found = 1 } "
        "END { print found ? \"ok\" : \"closed\" }' "
        "/proc/net/tcp /proc/net/tcp6 2>/dev/null; "
        "fi"
    )
 def _probe_local_port(local_port: int) -> str:
    """Check whether the local side of an SSH -L tunnel is accepting TCP."""
    try:
        with socket.create_connection(("127.0.0.1", local_port), timeout=5):
            return "listening"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        return "error:timeout"
    except OSError as e:
        return f"error:{e}"
@dataclass
 class TunnelCheckResult:
    tunnel: str
@@ -52,7 +85,10 @@ def check_tunnel(cfg: TunnelConfig, state_mgr: StateManager) -> TunnelCheckResul
        and ssh_process != "ok"
    )
-    # 3. SSH probe for remote port
+    # 3. Port probe: reverse tunnels listen remotely; local tunnels listen here.
    if cfg.direction == "local":
        remote_port = _probe_local_port(cfg.local_port)
    else:
        key_path = str(Path(cfg.ssh_key).expanduser())
        cmd = [
            "ssh",
@@ -61,7 +97,7 @@ def check_tunnel(cfg: TunnelConfig, state_mgr: StateManager) -> TunnelCheckResul
            "-o", "ConnectTimeout=5",
            "-o", "StrictHostKeyChecking=accept-new",
            f"{cfg.ssh_user}@{cfg.host}",
-        f"ss -tnlp 2>/dev/null | grep -q ':{cfg.remote_port} ' && echo ok || echo closed",
+            _remote_port_probe_command(cfg.remote_port),
        ]
        try:
            proc = subprocess.run(
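
Review note: the last fallback in `_remote_port_probe_command` reads `/proc/net/tcp` and `/proc/net/tcp6`, where ports are hexadecimal and state `0A` means LISTEN. The sketch below shows the same check in Python on an in-memory table. `port_is_listening` is a hypothetical helper, not part of the repo, and it matches the port as a suffix of the local address, which is slightly stricter than the substring match the awk script performs.

```python
def port_is_listening(proc_net_tcp: str, port: int) -> bool:
    """Return True if any row of a /proc/net/tcp table is in LISTEN state on `port`."""
    suffix = f":{port:04X}"  # ports appear as 4-digit uppercase hex
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A" and local_address.endswith(suffix):
            return True
    return False


# Illustrative table: port 18000 (0x4650) is listening,
# port 8000 (0x1F40) only has an established connection (state 01).
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 0100007F:4650 00000000:0000 0A 00000000:00000000\n"
    "   1: 0100007F:1F40 0100007F:9C40 01 00000000:00000000\n"
)
```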
--- a/src/bridge/manager.py
+++ b/src/bridge/manager.py
@@ -6,35 +6,102 @@ import os
 import signal
 import subprocess
 import time
 from datetime import datetime, timedelta
 from pathlib import Path
 from typing import List, Optional
 from bridge.audit import AuditEvent, AuditLogger
 from bridge.health import HealthChecker
-from bridge.models import BridgeState, TunnelConfig
+from bridge.models import BridgeState, CertAcquisitionError, TunnelConfig
 from bridge.state import StateManager
 log = logging.getLogger(__name__)
-def build_ssh_command(cfg: TunnelConfig) -> List[str]:
+def _actor_type_from_name(name: str) -> str:
    for prefix in ("adm", "agt", "atm"):
        if name.startswith(f"{prefix}-"):
            return prefix
    return "unknown"
 def build_ssh_command(cfg: TunnelConfig, cert_path: Optional[Path] = None) -> List[str]:
    """Build the SSH tunnel command (reverse -R or local -L)."""
    key = os.path.expanduser(cfg.ssh_key)
    if cfg.direction == "local":
-        forward_flag = ["-L", f"{cfg.local_port}:127.0.0.1:{cfg.remote_port}"]
+        forward_flag = ["-L", f"{cfg.local_port}:{cfg.remote_host}:{cfg.remote_port}"]
    else:
-        forward_flag = ["-R", f"{cfg.remote_port}:127.0.0.1:{cfg.local_port}"]
+        forward_flag = ["-R", f"{cfg.remote_port}:{cfg.remote_host}:{cfg.local_port}"]
-    return [
+    cmd = [
        "ssh",
        "-N",
        *forward_flag,
        "-i", key,
    ]
    if cert_path is not None:
        cmd += ["-i", str(cert_path)]
    cmd += [
        "-o", "ServerAliveInterval=10",
        "-o", "ServerAliveCountMax=3",
        "-o", "ExitOnForwardFailure=yes",
        "-o", "StrictHostKeyChecking=accept-new",
        f"{cfg.ssh_user}@{cfg.host}",
    ]
    return cmd
 def _run_cert_command(cfg: TunnelConfig, state_dir: Path) -> Optional[Path]:
    """Run cert_command and write cert to state dir. Returns cert path or None."""
    if cfg.cert_command is None:
        return None
    result = subprocess.run(
        cfg.cert_command,
        shell=True,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise CertAcquisitionError(result.stderr.strip())
    cert_path = state_dir / f"{cfg.name}-cert.pub"
    cert_path.write_text(result.stdout)
    return cert_path
 def _parse_cert_identity(cert_path: Path) -> Optional[str]:
    """Parse Key ID from ssh-keygen -L output."""
    try:
        result = subprocess.run(
            ["ssh-keygen", "-L", "-f", str(cert_path)],
            capture_output=True,
            text=True,
        )
        for line in result.stdout.splitlines():
            line = line.strip()
            if line.startswith("Key ID:"):
                return line.split(":", 1)[1].strip().strip('"')
    except Exception:
        pass
    return None
 def _parse_cert_expiry(cert_path: Path) -> Optional[datetime]:
    """Parse Valid-before datetime from ssh-keygen -L output."""
    try:
        result = subprocess.run(
            ["ssh-keygen", "-L", "-f", str(cert_path)],
            capture_output=True,
            text=True,
        )
        for line in result.stdout.splitlines():
            line = line.strip()
            if line.startswith("Valid:"):
                # "Valid: from 2026-05-15T10:00:00 to 2026-05-15T22:00:00"
                parts = line.split()
                if len(parts) >= 5 and parts[3] == "to":
                    return datetime.fromisoformat(parts[4])
    except Exception:
        pass
    return None
 class TunnelManager:
@@ -56,7 +123,8 @@ class TunnelManager:
        return self._state.is_running(self._cfg.name)
    def _actor_info(self):
-        return self._cfg.actor, "unknown"
+        actor = self._cfg.actor
        return actor, _actor_type_from_name(actor)
    def _next_backoff(self, attempt: int) -> int:
        initial = self._cfg.reconnect.backoff_initial
@@ -71,12 +139,12 @@ class TunnelManager:
            return
        self._state.write_state(self._cfg.name, BridgeState.STARTING)
-        actor, actor_class = self._actor_info()
+        actor, actor_type = self._actor_info()
        self._audit.log(
            tunnel=self._cfg.name,
            event=AuditEvent.BRIDGE_STARTED,
            actor=actor,
-            actor_class=actor_class,
+            actor_type=actor_type,
        )
        pid = os.fork()
@@ -99,7 +167,7 @@ class TunnelManager:
                tunnel=self._cfg.name,
                event=AuditEvent.BRIDGE_STOPPED,
                actor=actor,
-                actor_class=actor_class,
+                actor_type=actor_type,
            )
        os._exit(0)
@@ -131,12 +199,12 @@ class TunnelManager:
        self._state.clear_pid(self._cfg.name)
        self._state.write_state(self._cfg.name, BridgeState.STOPPED)
-        actor, actor_class = self._actor_info()
+        actor, actor_type = self._actor_info()
        self._audit.log(
            tunnel=self._cfg.name,
            event=AuditEvent.BRIDGE_STOPPED,
            actor=actor,
-            actor_class=actor_class,
+            actor_type=actor_type,
        )
    def _run_loop(self) -> None:
@@ -144,11 +212,11 @@ class TunnelManager:
        import asyncio
        cfg = self._cfg
-        actor, actor_class = self._actor_info()
+        actor, actor_type = self._actor_info()
        attempt = 0
        max_attempts = cfg.reconnect.max_attempts  # 0 = infinite
        state_dir = self._state._dir
        # Setup signal handler for graceful shutdown
        _stop = [False]
        def _on_term(signum, frame):
@@ -162,7 +230,31 @@ class TunnelManager:
                self._state.write_state(cfg.name, BridgeState.FAILED)
                break
-            cmd = build_ssh_command(cfg)
+            # Acquire cert before each SSH launch (T3, T7)
            try:
                cert_path = _run_cert_command(cfg, state_dir)
            except CertAcquisitionError as e:
                self._audit.log(
                    tunnel=cfg.name,
                    event=AuditEvent.BRIDGE_DISCONNECTED,
                    actor=actor,
                    actor_type=actor_type,
                    detail=f"cert acquisition failed: {e}",
                )
                attempt += 1
                if max_attempts > 0 and attempt >= max_attempts:
                    self._state.write_state(cfg.name, BridgeState.FAILED)
                    break
                backoff = self._next_backoff(attempt - 1)
                self._state.write_state(cfg.name, BridgeState.RECONNECTING)
                log.info("Cert acquisition failed, retrying in %ds", backoff)
                time.sleep(backoff)
                continue
            cert_identity = _parse_cert_identity(cert_path) if cert_path else None
            cert_expires_at = _parse_cert_expiry(cert_path) if cert_path else None
            cmd = build_ssh_command(cfg, cert_path=cert_path)
            log.info("Starting SSH: %s", " ".join(cmd))
            self._state.write_state(cfg.name, BridgeState.STARTING)
@@ -174,24 +266,30 @@ class TunnelManager:
                    tunnel=cfg.name,
                    event=AuditEvent.BRIDGE_DISCONNECTED,
                    actor=actor,
-                    actor_class=actor_class,
+                    actor_type=actor_type,
                    detail="ssh binary not found",
                )
                break
            # Wait briefly then assume connected if still running
            time.sleep(2)
            _ttl_refresh = False
            if proc.poll() is None:
                self._state.write_state(cfg.name, BridgeState.CONNECTED)
                self._audit.log(
                    tunnel=cfg.name,
                    event=AuditEvent.BRIDGE_CONNECTED,
                    actor=actor,
-                    actor_class=actor_class,
+                    actor_type=actor_type,
                    cert_identity=cert_identity,
                )
                attempt = 0
-                # Health check loop
+                def _check_ttl() -> bool:
                    """Return True if cert is within 5 min of expiry and SSH should restart."""
                    if cert_expires_at is None:
                        return False
                    return datetime.now() >= cert_expires_at - timedelta(minutes=5)
                if cfg.health_check:
                    checker = HealthChecker(
                        url=cfg.health_check.url,
@@ -199,6 +297,18 @@ class TunnelManager:
                    )
                    health_failing = False
                    while not _stop[0] and proc.poll() is None:
                        if _check_ttl():
                            self._audit.log(
                                tunnel=cfg.name,
                                event=AuditEvent.CERT_EXPIRING,
                                actor=actor,
                                actor_type=actor_type,
                                cert_identity=cert_identity,
                                detail=str(cert_expires_at),
                            )
                            proc.terminate()
                            _ttl_refresh = True
                            break
                        result = asyncio.run(checker.check())
                        if result.ok:
                            if health_failing:
@@ -208,7 +318,7 @@ class TunnelManager:
                                    tunnel=cfg.name,
                                    event=AuditEvent.HEALTH_CHECK_RECOVERED,
                                    actor=actor,
-                                    actor_class=actor_class,
+                                    actor_type=actor_type,
                                )
                        else:
                            if not health_failing:
@@ -218,21 +328,36 @@ class TunnelManager:
                                    tunnel=cfg.name,
                                    event=AuditEvent.HEALTH_CHECK_FAILED,
                                    actor=actor,
-                                    actor_class=actor_class,
+                                    actor_type=actor_type,
                                    detail=result.error or f"HTTP {result.status_code}",
                                )
                        time.sleep(cfg.health_check.interval_seconds)
                else:
                    while not _stop[0] and proc.poll() is None:
                        if _check_ttl():
                            self._audit.log(
                                tunnel=cfg.name,
                                event=AuditEvent.CERT_EXPIRING,
                                actor=actor,
                                actor_type=actor_type,
                                cert_identity=cert_identity,
                                detail=str(cert_expires_at),
                            )
                            proc.terminate()
                            _ttl_refresh = True
                            break
                        time.sleep(1)
-            # SSH exited
+            if _ttl_refresh:
                # Planned cert refresh — don't count as failure, no backoff
                continue
            if proc.poll() is not None:
                self._audit.log(
                    tunnel=cfg.name,
                    event=AuditEvent.BRIDGE_DISCONNECTED,
                    actor=actor,
-                    actor_class=actor_class,
+                    actor_type=actor_type,
                    detail=f"exit code {proc.returncode}",
                )
@@ -248,7 +373,7 @@ class TunnelManager:
                tunnel=cfg.name,
                event=AuditEvent.BRIDGE_RECONNECTING,
                actor=actor,
-                actor_class=actor_class,
+                actor_type=actor_type,
                detail=f"retry {attempt}, backoff {backoff}s",
            )
            log.info("Reconnecting in %ds (attempt %d)", backoff, attempt)
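
Review note: the run loop restarts SSH when the certificate is within five minutes of expiry, based on the `Valid:` line of `ssh-keygen -L`. A standalone sketch of the parse and the TTL test follows. `parse_valid_before` and `needs_refresh` are hypothetical names, and the sketch takes the current time as a parameter so it can be exercised without a real certificate.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_valid_before(keygen_output: str) -> Optional[datetime]:
    """Extract the end of the validity window from `ssh-keygen -L` output."""
    for line in keygen_output.splitlines():
        parts = line.strip().split()
        # Expected shape: "Valid: from <start> to <end>"
        if len(parts) >= 5 and parts[0] == "Valid:" and parts[3] == "to":
            try:
                return datetime.fromisoformat(parts[4])
            except ValueError:
                return None
    # Covers "Valid: forever" and output without a Valid line.
    return None


def needs_refresh(
    expires_at: Optional[datetime],
    now: datetime,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True once `now` is inside the refresh margin before expiry."""
    if expires_at is None:
        return False
    return now >= expires_at - margin
```

Both timestamps are naive here, as in the diff, so the comparison is only meaningful when `now` is in the same local time zone that `ssh-keygen` used when printing the window.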
--- a/src/bridge/mcp_server/server.py
+++ b/src/bridge/mcp_server/server.py
@@ -169,19 +169,22 @@ def bridge_down(tunnel: Optional[str] = None) -> dict:
 def bridge_restart(tunnel: Optional[str] = None) -> dict:
    """Restart one or all configured tunnels.
    Reverse tunnels run conditional remote stale-forward cleanup before
    reconnecting; healthy forwards are left running.
    Args:
        tunnel: Tunnel name to restart. If omitted, restarts all inline tunnels.
    Returns:
-        {"restarted": [...]} or {"error": "..."}
+        {"actions": [{"tunnel", "action", "detail"}, ...]} or {"error": "..."}
    """
    cfg, err = _load_cfg_or_error()
    if err:
        return err
-    from bridge.manager import TunnelManager
+    from bridge.cleanup import restart_all_tunnels, restart_tunnel
    sd = _state_dir()
-    restarted = []
+    state_mgr = StateManager(state_dir=sd)
    if tunnel:
        from bridge.catalog.loader import load_catalog
@@ -196,18 +199,19 @@ def bridge_restart(tunnel: Optional[str] = None) -> dict:
            tcfg = resolve(tunnel, catalog=catalog, inline_tunnels=cfg.tunnels)
        except BridgeNotFound:
            return {"error": f"Tunnel '{tunnel}' not found in config or catalog"}
-        mgr = TunnelManager(tcfg, state_dir=sd)
+        actions = [restart_tunnel(tcfg, state_mgr)]
-        mgr.stop()
-        mgr.start()
-        restarted.append(tunnel)
    else:
-        for name, tcfg in cfg.tunnels.items():
+        actions = restart_all_tunnels(cfg, state_mgr)
-            mgr = TunnelManager(tcfg, state_dir=sd)
-            mgr.stop()
-            mgr.start()
-            restarted.append(name)
-    return {"restarted": restarted}
+    payload = {
        "actions": [
            {"tunnel": a.tunnel, "action": a.action, "detail": a.detail}
            for a in actions
        ],
    }
    if any(a.action == "error" for a in actions):
        payload["error"] = "one or more tunnels failed to restart"
    return payload
@mcp.tool()
@@ -513,4 +517,13 @@ def resource_catalog_targets() -> str:
 # ---------------------------------------------------------------------------
 if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="OpsBridge MCP server")
    parser.add_argument("--http", action="store_true", help="Run in SSE/HTTP mode instead of stdio")
    args = parser.parse_args()
    if args.http:
        port = int(os.environ.get("BRIDGE_MCP_PORT", "8002"))
        mcp.run(transport="sse", host="127.0.0.1", port=port)
    else:
        mcp.run(transport="stdio")
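
Review note: `bridge_restart` now returns one entry per tunnel and adds a top-level `error` key when any entry failed. The sketch below reproduces that payload shape in isolation. The `CleanupAction` dataclass is an assumption inferred from how the tests construct it (`tunnel`, `action`, `detail`, in that order); the real class in `bridge.cleanup` may carry more fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CleanupAction:
    # Assumed field order, inferred from the positional calls in the tests.
    tunnel: str
    action: str
    detail: str


def restart_payload(actions: List[CleanupAction]) -> dict:
    """Build the MCP response: per-tunnel actions plus an error flag if any failed."""
    payload = {
        "actions": [
            {"tunnel": a.tunnel, "action": a.action, "detail": a.detail}
            for a in actions
        ],
    }
    if any(a.action == "error" for a in actions):
        payload["error"] = "one or more tunnels failed to restart"
    return payload
```

Callers that previously read `result["restarted"]` would need to switch to `result["actions"]` and check for `"error"` alongside it, since a partial failure now returns both keys.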
--- a/src/bridge/models.py
+++ b/src/bridge/models.py
@@ -15,6 +15,16 @@ class BridgeState(str, Enum):
    FAILED = "failed"
 class ActorType(str, Enum):
    ADM = "adm"  # human operator
    AGT = "agt"  # LLM-powered autonomous agent
    ATM = "atm"  # deterministic script / pipeline
 class CertAcquisitionError(Exception):
    """Raised when cert_command fails to produce a certificate."""
@dataclass
 class ReconnectPolicy:
    max_attempts: int = 0  # 0 = infinite
@@ -41,10 +51,15 @@ class TunnelConfig:
    reconnect: ReconnectPolicy = field(default_factory=ReconnectPolicy)
    health_check: Optional[HealthCheckConfig] = None
    direction: str = "reverse"  # "reverse" (-R) or "local" (-L)
    # Forward-destination host as seen from the remote end (direction "local")
    # or from this workstation (direction "reverse"). Defaults to loopback;
    # set e.g. a k3s ClusterIP to tunnel to an in-cluster Service.
    remote_host: str = "127.0.0.1"
    cert_command: Optional[str] = None
@dataclass
 class ActorInfo:
    name: str
-    actor_class: str  # "human" or "automation"
+    actor_type: ActorType
    description: str = ""
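
Review note: the new `remote_host` field replaces the hard-coded `127.0.0.1` in the forward spec. A minimal sketch of how the two directions use it, mirroring the `build_ssh_command` change in `manager.py`, is shown below. `forward_args` is a hypothetical helper, and `10.43.0.1` is only an illustrative ClusterIP.

```python
from typing import List


def forward_args(
    direction: str,
    local_port: int,
    remote_port: int,
    remote_host: str = "127.0.0.1",
) -> List[str]:
    """Return the ssh forwarding flag and spec for a tunnel."""
    if direction == "local":
        # -L: listen locally; remote_host is resolved on the SSH server side.
        return ["-L", f"{local_port}:{remote_host}:{remote_port}"]
    # -R: listen on the server; remote_host is resolved on this workstation.
    return ["-R", f"{remote_port}:{remote_host}:{local_port}"]


# Local tunnel to an in-cluster Service reachable from the SSH host.
k3s = forward_args("local", 6443, 6443, remote_host="10.43.0.1")
# Reverse tunnel with the default loopback destination.
hub = forward_args("reverse", 8000, 18000)
```

Omitting `remote_host` keeps the previous behavior, so existing configs produce the same command line as before.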
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -23,10 +23,10 @@ VALID_CONFIG = textwrap.dedent("""\
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
-        actor: operator.bernd
+        actor: adm-bernd
    actors:
-      operator.bernd:
+      adm-bernd:
-        class: human
+        class: adm
        description: Bernd
 """)
@@ -38,10 +38,10 @@ VALID_CONFIG_WITH_CATALOG = textwrap.dedent("""\
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
-        actor: operator.bernd
+        actor: adm-bernd
    actors:
-      operator.bernd:
+      adm-bernd:
-        class: human
+        class: adm
        description: Bernd
    catalog_path: {catalog_path}
 """)
--- a/tests/test_audit.py
+++ b/tests/test_audit.py
@@ -22,7 +22,7 @@ class TestAuditLogger:
            tunnel="my-tunnel",
            event=AuditEvent.BRIDGE_STARTED,
            actor="operator.bernd",
-            actor_class="human",
+            actor_type="adm",
        )
        log_file = log_dir / "my-tunnel.log"
        assert log_file.exists()
@@ -32,7 +32,7 @@ class TestAuditLogger:
            tunnel="my-tunnel",
            event=AuditEvent.BRIDGE_STARTED,
            actor="operator.bernd",
-            actor_class="human",
+            actor_type="adm",
        )
        lines = (log_dir / "my-tunnel.log").read_text().strip().splitlines()
        assert len(lines) == 1
@@ -40,12 +40,12 @@ class TestAuditLogger:
        assert entry["tunnel"] == "my-tunnel"
        assert entry["event"] == "bridge_started"
        assert entry["actor"] == "operator.bernd"
-        assert entry["actor_class"] == "human"
+        assert entry["actor_type"] == "adm"
        assert "timestamp" in entry
    def test_multiple_events_append(self, logger, log_dir):
        for event in [AuditEvent.BRIDGE_STARTED, AuditEvent.BRIDGE_CONNECTED, AuditEvent.BRIDGE_STOPPED]:
-            logger.log(tunnel="t", event=event, actor="a", actor_class="human")
+            logger.log(tunnel="t", event=event, actor="a", actor_type="adm")
        lines = (log_dir / "t.log").read_text().strip().splitlines()
        assert len(lines) == 3
@@ -54,7 +54,7 @@ class TestAuditLogger:
            tunnel="t",
            event=AuditEvent.HEALTH_CHECK_FAILED,
            actor="a",
-            actor_class="automation",
+            actor_type="atm",
            detail="connection refused",
        )
        entry = json.loads((log_dir / "t.log").read_text().strip())
@@ -72,15 +72,15 @@ class TestAuditLogger:
    def test_timestamp_is_iso8601(self, logger, log_dir):
        from datetime import datetime
-        logger.log(tunnel="t", event=AuditEvent.BRIDGE_STOPPED, actor="a", actor_class="human")
+        logger.log(tunnel="t", event=AuditEvent.BRIDGE_STOPPED, actor="a", actor_type="adm")
        entry = json.loads((log_dir / "t.log").read_text().strip())
        # Should parse without error
        dt = datetime.fromisoformat(entry["timestamp"])
        assert dt.tzinfo is not None or True  # UTC or naive both acceptable
    def test_read_events(self, logger, log_dir):
-        logger.log(tunnel="t", event=AuditEvent.BRIDGE_STARTED, actor="a", actor_class="human")
+        logger.log(tunnel="t", event=AuditEvent.BRIDGE_STARTED, actor="a", actor_type="adm")
-        logger.log(tunnel="t", event=AuditEvent.BRIDGE_STOPPED, actor="a", actor_class="human")
+        logger.log(tunnel="t", event=AuditEvent.BRIDGE_STOPPED, actor="a", actor_type="adm")
        events = logger.read_events("t")
        assert len(events) == 2
        assert events[0]["event"] == "bridge_started"
--- a/tests/test_cleanup.py
+++ b/tests/test_cleanup.py
@@ -0,0 +1,130 @@
 """Tests for stale SSH forward cleanup."""
 from __future__ import annotations
 import textwrap
 from unittest.mock import MagicMock, patch
 from typer.testing import CliRunner
 from bridge.cleanup import (
    CleanupAction,
    build_cron_line,
    cleanup_all_tunnels,
    remote_forward_health_url,
    should_cleanup_tunnel,
 )
 from bridge.cli import app
 from bridge.config import load_config
 from bridge.models import HealthCheckConfig, TunnelConfig
 from bridge.state import StateManager
 def _tunnel(**overrides) -> TunnelConfig:
    base = dict(
        name="state-hub-railiance01",
        host="92.205.62.239",
        remote_port=18000,
        local_port=8000,
        ssh_user="tegwick",
        ssh_key="~/.ssh/id_ops",
        actor="agt-claude-railiance01",
        health_check=HealthCheckConfig(
            url="http://127.0.0.1:8000/state/health",
            timeout_seconds=5,
        ),
    )
    base.update(overrides)
    return TunnelConfig(**base)
 class TestRemoteForwardHealthUrl:
    def test_maps_local_port_to_remote(self):
        cfg = _tunnel()
        assert remote_forward_health_url(cfg) == "http://127.0.0.1:18000/state/health"
    def test_returns_none_for_local_tunnel(self):
        cfg = _tunnel(direction="local")
        assert remote_forward_health_url(cfg) is None
 class TestShouldCleanupTunnel:
    def test_skips_healthy_remote_forward(self, tmp_path):
        cfg = _tunnel()
        state_mgr = StateManager(state_dir=tmp_path)
        with (
            patch("bridge.cleanup.remote_port_listening", return_value=True),
            patch("bridge.cleanup.probe_remote_forward", return_value=(True, "ok")),
        ):
            needed, reason = should_cleanup_tunnel(cfg, state_mgr)
        assert needed is False
    def test_detects_stale_forward_when_local_ok_remote_fails(self, tmp_path):
        cfg = _tunnel()
        state_mgr = StateManager(state_dir=tmp_path)
        with (
            patch("bridge.cleanup.remote_port_listening", return_value=True),
            patch("bridge.cleanup.probe_remote_forward", return_value=(False, "timeout")),
            patch("bridge.cleanup.local_service_healthy", return_value=True),
            patch(
                "bridge.cleanup.check_tunnel",
                return_value=MagicMock(ssh_process="ok", remote_port="listening"),
            ),
        ):
            needed, reason = should_cleanup_tunnel(cfg, state_mgr)
        assert needed is True
        assert "stale forward" in reason
 class TestCleanupAllTunnels:
    def test_reports_cleaned_tunnel(self, tmp_path, monkeypatch):
        monkeypatch.setenv("BRIDGE_CONFIG", str(tmp_path / "tunnels.yaml"))
        (tmp_path / "tunnels.yaml").write_text(
            textwrap.dedent(
                """\
                tunnels:
                  state-hub-railiance01:
                    host: 92.205.62.239
                    remote_port: 18000
                    local_port: 8000
                    ssh_user: tegwick
                    ssh_key: ~/.ssh/id_ops
                    actor: agt-claude-railiance01
                    health_check:
                      url: http://127.0.0.1:8000/state/health
                actors:
                  agt-claude-railiance01:
                    class: agt
                """
            )
        )
        cfg = load_config()
        state_mgr = StateManager(state_dir=tmp_path / "state")
        with patch(
            "bridge.cleanup.cleanup_tunnel",
            return_value=CleanupAction("state-hub-railiance01", "cleaned", "cleared"),
        ):
            report = cleanup_all_tunnels(cfg, state_mgr, restart=False)
        assert report.cleaned_count == 1
        assert report.actions[0].action == "cleaned"
 class TestMaintenanceCli:
    def test_cleanup_help(self):
        runner = CliRunner()
        result = runner.invoke(app, ["maintenance", "cleanup", "--help"])
        assert result.exit_code == 0
        assert "restart" in result.output.lower()
    def test_show_cron_prints_template_when_not_installed(self):
        runner = CliRunner()
        with patch("bridge.cli.read_installed_cron", return_value=None):
            result = runner.invoke(app, ["maintenance", "show-cron"])
        assert result.exit_code == 0
        assert "0 3 * * *" in result.output
 def test_build_cron_line_contains_marker():
    line = build_cron_line()
    assert "0 3 * * *" in line
    assert "maintenance cleanup --restart" in line
    assert "ops-bridge: maintenance cleanup" in line
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -17,10 +17,10 @@ VALID_CONFIG = textwrap.dedent("""\
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
-        actor: operator.bernd
+        actor: adm-bernd
    actors:
-      operator.bernd:
+      adm-bernd:
-        class: human
+        class: adm
        description: Bernd
 """)
@@ -266,22 +266,146 @@ class TestCheckCommand:
        assert result.exit_code == 1
 REVERSE_CONFIG = VALID_CONFIG
 LOCAL_TUNNEL_CONFIG = textwrap.dedent("""\
    tunnels:
      k3s-api:
        host: host.local
        remote_port: 6443
        local_port: 6443
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
        actor: adm-bernd
        direction: local
    actors:
      adm-bernd:
        class: adm
        description: Bernd
 """)
 class TestRestartCommand:
    def test_restart_unknown_tunnel_exit_1(self, env):
        result = runner.invoke(app, ["restart", "nonexistent"], env=env)
        assert result.exit_code == 1
    def test_restart_help_mentions_remote_cleanup(self):
        result = runner.invoke(app, ["restart", "--help"])
        assert result.exit_code == 0
        assert "stale-forward" in result.output.lower() or "remote" in result.output.lower()
    @pytest.mark.capability("bridge_restart")
    @pytest.mark.access_mode("cli")
-    def test_restart_calls_stop_then_start(self, env):
+    def test_restart_reverse_tunnel_delegates_to_cleanup(self, env):
-        with patch("bridge.cli.TunnelManager") as mock_mgr_cls:
+        from bridge.cleanup import CleanupAction
        with patch("bridge.cli.restart_tunnel") as mock_restart:
            mock_restart.return_value = CleanupAction(
                "test-tunnel", "healthy", "remote forward healthy"
            )
            result = runner.invoke(app, ["restart", "test-tunnel"], env=env)
        assert result.exit_code == 0
        mock_restart.assert_called_once()
        assert "test-tunnel: healthy" in result.output
    def test_restart_reverse_tunnel_reports_cleaned_and_restarted(self, env):
        from bridge.cleanup import CleanupAction
        with patch("bridge.cli.restart_tunnel") as mock_restart:
            mock_restart.return_value = CleanupAction(
                "test-tunnel",
                "cleaned_and_restarted",
                "stale forward; restarted tunnel; cleared",
            )
            result = runner.invoke(app, ["restart", "test-tunnel"], env=env)
        assert result.exit_code == 0
        assert "cleaned_and_restarted" in result.output
    def test_restart_reverse_tunnel_error_exit_1(self, env):
        from bridge.cleanup import CleanupAction
        with patch("bridge.cli.restart_tunnel") as mock_restart:
            mock_restart.return_value = CleanupAction(
                "test-tunnel", "error", "cleanup failed: still_listening"
            )
            result = runner.invoke(app, ["restart", "test-tunnel"], env=env)
        assert result.exit_code == 1
        assert "error" in result.output
    def test_restart_local_tunnel_uses_stop_start(self, tmp_path, state_dir):
        config_file = tmp_path / "tunnels.yaml"
        config_file.write_text(LOCAL_TUNNEL_CONFIG)
        env = {
            "BRIDGE_CONFIG": str(config_file),
            "BRIDGE_STATE_DIR": str(state_dir),
        }
        with patch("bridge.cleanup.TunnelManager") as mock_mgr_cls:
            mock_mgr = MagicMock()
            mock_mgr_cls.return_value = mock_mgr
            call_order = []
            mock_mgr.stop.side_effect = lambda: call_order.append("stop")
            mock_mgr.start.side_effect = lambda: call_order.append("start")
-            result = runner.invoke(app, ["restart", "test-tunnel"], env=env)
+            result = runner.invoke(app, ["restart", "k3s-api"], env=env)
        assert result.exit_code == 0
        assert call_order == ["stop", "start"]
        assert "k3s-api: restarted" in result.output
 class TestCertStatusCommand:
    @pytest.mark.capability("bridge_cert_status")
    @pytest.mark.access_mode("cli")
    def test_cert_status_no_cert_shows_static_key(self, env, state_dir):
        result = runner.invoke(app, ["cert-status"], env=env)
        assert result.exit_code == 0
        assert "static-key" in result.output
    def test_cert_status_json_no_cert(self, env, state_dir):
        result = runner.invoke(app, ["cert-status", "--json"], env=env)
        assert result.exit_code == 0
        data = json.loads(result.output)
        assert data[0]["mode"] == "static-key"
    def test_cert_status_exit_1_on_expired(self, env, state_dir, tmp_path):
        # Write a fake cert file in state dir; mock ssh-keygen to report expired
        state_dir.mkdir(parents=True, exist_ok=True)
        cert_file = state_dir / "test-tunnel-cert.pub"
        cert_file.write_text("fake cert")
        with patch("subprocess.run") as mock_run:
            mock_run.return_value = MagicMock(
                stdout=(
                    "test-tunnel-cert.pub:\n"
                    "        Key ID: \"agt-test\"\n"
                    "        Valid: from 2026-01-01T00:00:00 to 2026-01-02T00:00:00\n"
                ),
                returncode=0,
            )
            result = runner.invoke(app, ["cert-status"], env=env)
        assert result.exit_code == 1
        assert "EXPIRED" in result.output
    def test_cert_status_json_with_cert(self, env, state_dir):
        state_dir.mkdir(parents=True, exist_ok=True)
        cert_file = state_dir / "test-tunnel-cert.pub"
        cert_file.write_text("fake cert")
        with patch("subprocess.run") as mock_run:
            mock_run.return_value = MagicMock(
                stdout=(
                    "test-tunnel-cert.pub:\n"
                    "        Key ID: \"agt-test\"\n"
                    "        Valid: from 2030-01-01T00:00:00 to 2030-01-02T00:00:00\n"
                ),
                returncode=0,
            )
            result = runner.invoke(app, ["cert-status", "--json"], env=env)
        assert result.exit_code == 0
        data = json.loads(result.output)
        assert data[0]["mode"] == "cert"
        assert data[0]["key_id"] == "agt-test"
        assert data[0]["expired"] is False
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -1,9 +1,11 @@
 """Tests for config loading."""
 import textwrap
 import warnings
 import pytest
 from bridge.config import ConfigError, load_config
 from bridge.models import ActorType
 VALID_YAML = textwrap.dedent("""\
@@ -14,7 +16,7 @@ VALID_YAML = textwrap.dedent("""\
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
-        actor: agent.claude-coulombcore
+        actor: agt-claude-coulombcore
        health_check:
          url: http://127.0.0.1:18000/health
          interval_seconds: 30
@@ -25,11 +27,11 @@ VALID_YAML = textwrap.dedent("""\
          backoff_max: 60
    actors:
-      agent.claude-coulombcore:
+      agt-claude-coulombcore:
-        class: automation
+        class: agt
        description: Claude Code agent on CoulombCore
-      operator.bernd:
+      adm-bernd:
-        class: human
+        class: adm
        description: Bernd Worsch
 """)
@@ -50,7 +52,7 @@ def test_load_valid_config(config_file, monkeypatch):
    assert t.remote_port == 18000
    assert t.local_port == 8000
    assert t.ssh_user == "ubuntu"
-    assert t.actor == "agent.claude-coulombcore"
+    assert t.actor == "agt-claude-coulombcore"
 def test_health_check_loaded(config_file, monkeypatch):
@@ -74,10 +76,10 @@ def test_reconnect_policy_loaded(config_file, monkeypatch):
 def test_actors_loaded(config_file, monkeypatch):
    monkeypatch.setenv("BRIDGE_CONFIG", str(config_file))
    cfg = load_config()
-    assert "agent.claude-coulombcore" in cfg.actors
+    assert "agt-claude-coulombcore" in cfg.actors
-    a = cfg.actors["agent.claude-coulombcore"]
+    a = cfg.actors["agt-claude-coulombcore"]
-    assert a.actor_class == "automation"
+    assert a.actor_type == ActorType.AGT
-    assert "operator.bernd" in cfg.actors
+    assert "adm-bernd" in cfg.actors
 def test_missing_required_field_raises(tmp_path, monkeypatch):
@@ -118,12 +120,180 @@ def test_tunnel_without_health_check(tmp_path, monkeypatch):
            local_port: 8000
            ssh_user: ubuntu
            ssh_key: ~/.ssh/id_rsa
-            actor: operator.bernd
+            actor: adm-bernd
        actors:
-          operator.bernd:
+          adm-bernd:
-            class: human
+            class: adm
            description: Bernd
    """))
    monkeypatch.setenv("BRIDGE_CONFIG", str(f))
    cfg = load_config()
    assert cfg.tunnels["simple"].health_check is None
 class TestActorTypeValidation:
    def test_canonical_agt_accepted(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: agt-claude
            actors:
              agt-claude:
                class: agt
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        cfg = load_config()
        assert cfg.actors["agt-claude"].actor_type == ActorType.AGT
    def test_canonical_atm_accepted(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: atm-backup
            actors:
              atm-backup:
                class: atm
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        cfg = load_config()
        assert cfg.actors["atm-backup"].actor_type == ActorType.ATM
    def test_wrong_prefix_raises_config_error(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: adm-bernd
            actors:
              adm-bernd:
                class: agt
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        with pytest.raises(ConfigError, match="must start with 'agt-'"):
            load_config()
    def test_missing_prefix_raises_config_error(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: operator.bernd
            actors:
              operator.bernd:
                class: adm
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        with pytest.raises(ConfigError, match="must start with 'adm-'"):
            load_config()
    def test_unknown_class_raises_config_error(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: adm-bernd
            actors:
              adm-bernd:
                class: wizard
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        with pytest.raises(ConfigError, match="unknown class"):
            load_config()
    def test_legacy_human_maps_to_adm_with_warning(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: adm-bernd
            actors:
              adm-bernd:
                class: human
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        with warnings.catch_warnings(record=True) as w:
            warnings.simplefilter("always")
            cfg = load_config()
        assert cfg.actors["adm-bernd"].actor_type == ActorType.ADM
        assert any("deprecated" in str(x.message).lower() for x in w)
    def test_legacy_automation_maps_to_atm_with_warning(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: atm-cron
            actors:
              atm-cron:
                class: automation
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        with warnings.catch_warnings(record=True) as w:
            warnings.simplefilter("always")
            cfg = load_config()
        assert cfg.actors["atm-cron"].actor_type == ActorType.ATM
        assert any("deprecated" in str(x.message).lower() for x in w)
 class TestCertCommandConfig:
    def test_cert_command_parsed(self, tmp_path, monkeypatch):
        f = tmp_path / "t.yaml"
        f.write_text(textwrap.dedent("""\
            tunnels:
              t:
                host: h
                remote_port: 1
                local_port: 2
                ssh_user: u
                ssh_key: ~/.ssh/k
                actor: agt-bridge
                cert_command: "warden sign agt-bridge --pubkey /tmp/k.pub"
            actors:
              agt-bridge:
                class: agt
        """))
        monkeypatch.setenv("BRIDGE_CONFIG", str(f))
        cfg = load_config()
        assert cfg.tunnels["t"].cert_command == "warden sign agt-bridge --pubkey /tmp/k.pub"
    def test_no_cert_command_is_none(self, config_file, monkeypatch):
        monkeypatch.setenv("BRIDGE_CONFIG", str(config_file))
        cfg = load_config()
        assert cfg.tunnels["state-hub-coulombcore"].cert_command is None
--- a/tests/test_diagnostics.py
+++ b/tests/test_diagnostics.py
@@ -6,7 +6,11 @@ from unittest.mock import MagicMock, patch
 import pytest
-from bridge.diagnostics import TunnelCheckResult, check_all_tunnels, check_tunnel
+from bridge.diagnostics import (
    _remote_port_probe_command,
    check_all_tunnels,
    check_tunnel,
 )
 from bridge.models import BridgeState, TunnelConfig
 from bridge.state import StateManager
@@ -20,7 +24,7 @@ def tcfg():
        local_port=8000,
        ssh_user="ubuntu",
        ssh_key="~/.ssh/id_ops",
-        actor="operator.bernd",
+        actor="adm-bernd",
    )
@@ -32,6 +36,14 @@ def state_mgr(tmp_path):
 class TestCheckTunnel:
    def test_remote_port_probe_has_minimal_host_fallback(self):
        """Remote probe supports minimal hosts without ss/netstat."""
        command = _remote_port_probe_command(18000)
        assert "command -v ss" in command
        assert "command -v netstat" in command
        assert "/proc/net/tcp" in command
        assert "/proc/net/tcp6" in command
    def test_no_pid(self, tcfg, state_mgr):
        """No PID file → ssh_process='no_pid', ok=False."""
        with patch("bridge.diagnostics.subprocess.run") as mock_run:
@@ -83,6 +95,29 @@ class TestCheckTunnel:
        assert result.remote_port == "closed"
        assert result.ok is False
    def test_local_direction_checks_local_port(self, tcfg, state_mgr):
        """Local tunnels verify the local listener instead of a remote -R port."""
        local_cfg = TunnelConfig(
            name="local-tunnel",
            host="haskelseed.local",
            remote_port=1234,
            local_port=11234,
            ssh_user="root",
            ssh_key="~/.ssh/id_ops",
            actor="adm-bernd",
            direction="local",
        )
        state_mgr.write_pid("local-tunnel", 12345)
        with (
            patch("bridge.diagnostics._pid_alive", return_value=True),
            patch("bridge.diagnostics._probe_local_port", return_value="listening"),
            patch("bridge.diagnostics.subprocess.run") as mock_run,
        ):
            result = check_tunnel(local_cfg, state_mgr)
        mock_run.assert_not_called()
        assert result.remote_port == "listening"
        assert result.ok is True
    def test_ssh_timeout(self, tcfg, state_mgr):
        """SSH probe timeout → remote_port='error:timeout'."""
        state_mgr.write_pid("test-tunnel", 12345)
@@ -114,7 +149,7 @@ class TestCheckTunnel:
            local_port=8000,
            ssh_user="ubuntu",
            ssh_key="~/.ssh/id_ops",
-            actor="operator.bernd",
+            actor="adm-bernd",
            health_check=HealthCheckConfig(url="http://127.0.0.1:8000/health"),
        )
        state_mgr.write_pid("test-tunnel", 12345)
@@ -135,7 +170,8 @@ class TestCheckAllTunnels:
    def test_check_all_iterates_tunnels(self, tmp_path):
        """check_all_tunnels returns one result per tunnel in cfg."""
        from bridge.config import load_config
-        import textwrap, os
+        import textwrap
        import os
        cfg_file = tmp_path / "tunnels.yaml"
        cfg_file.write_text(textwrap.dedent("""\
@@ -146,17 +182,17 @@ class TestCheckAllTunnels:
                local_port: 8001
                ssh_user: ubuntu
                ssh_key: ~/.ssh/id_ops
-                actor: operator.bernd
+                actor: adm-bernd
              t2:
                host: h2.local
                remote_port: 18002
                local_port: 8002
                ssh_user: ubuntu
                ssh_key: ~/.ssh/id_ops
-                actor: operator.bernd
+                actor: adm-bernd
            actors:
-              operator.bernd:
+              adm-bernd:
-                class: human
+                class: adm
                description: Bernd
        """))
        os.environ["BRIDGE_CONFIG"] = str(cfg_file)
--- a/tests/test_integration.py
+++ b/tests/test_integration.py
@@ -18,14 +18,14 @@ MINIMAL_CONFIG = textwrap.dedent("""\
        local_port: 8000
        ssh_user: testuser
        ssh_key: ~/.ssh/id_rsa
-        actor: operator.bernd
+        actor: adm-bernd
        reconnect:
          max_attempts: 2
          backoff_initial: 1
          backoff_max: 2
    actors:
-      operator.bernd:
+      adm-bernd:
-        class: human
+        class: adm
        description: Bernd
 """)
@@ -51,7 +51,7 @@ def tunnel_cfg():
        local_port=8000,
        ssh_user="testuser",
        ssh_key="~/.ssh/id_rsa",
-        actor="operator.bernd",
+        actor="adm-bernd",
        reconnect=ReconnectPolicy(max_attempts=2, backoff_initial=1, backoff_max=2),
    )
@@ -142,7 +142,7 @@ class TestHealthCheckDegradedPath:
            local_port=8001,
            ssh_user="u",
            ssh_key="k",
-            actor="operator.bernd",
+            actor="adm-bernd",
            reconnect=ReconnectPolicy(max_attempts=1, backoff_initial=1, backoff_max=1),
            health_check=hc_cfg,
        )
--- a/tests/test_manager.py
+++ b/tests/test_manager.py
@@ -3,6 +3,8 @@ import os
 import signal
 from unittest.mock import MagicMock, patch
 from dataclasses import replace
 import pytest
 from bridge.models import BridgeState, ReconnectPolicy, TunnelConfig
@@ -38,6 +40,16 @@ class TestBuildSshCommand:
        assert "-i" in cmd
        assert "ubuntu@host.local" in cmd
    def test_remote_host_override_local(self, tunnel_cfg):
        cfg = replace(tunnel_cfg, direction="local", remote_host="10.43.103.154")
        cmd = build_ssh_command(cfg)
        assert "-L" in cmd
        assert f"{cfg.local_port}:10.43.103.154:{cfg.remote_port}" in cmd
    def test_remote_host_default_loopback(self, tunnel_cfg):
        cmd = build_ssh_command(tunnel_cfg)
        assert "18000:127.0.0.1:8000" in cmd
    def test_server_alive_options(self, tunnel_cfg):
        cmd = build_ssh_command(tunnel_cfg)
        assert "-o" in cmd
@@ -105,3 +117,99 @@ class TestTunnelManager:
    def test_is_running_false_initially(self, tunnel_cfg, state_dir):
        mgr = TunnelManager(tunnel_cfg, state_dir=state_dir)
        assert not mgr.is_running()
 class TestBuildSshCommandWithCert:
    def test_no_cert_path_omits_extra_i(self, tunnel_cfg):
        cmd = build_ssh_command(tunnel_cfg)
        assert cmd.count("-i") == 1
    def test_cert_path_appends_after_key(self, tunnel_cfg, tmp_path):
        cert = tmp_path / "test-cert.pub"
        cert.write_text("cert")
        cmd = build_ssh_command(tunnel_cfg, cert_path=cert)
        i_indices = [i for i, x in enumerate(cmd) if x == "-i"]
        assert len(i_indices) == 2
        key_idx, cert_idx = i_indices
        assert not cmd[key_idx + 1].endswith("-cert.pub")  # key comes first
        assert cmd[cert_idx + 1] == str(cert)
 class TestRunCertCommand:
    def test_returns_none_when_no_cert_command(self, tunnel_cfg, tmp_path):
        from bridge.manager import _run_cert_command
        assert _run_cert_command(tunnel_cfg, tmp_path) is None
    def test_writes_cert_and_returns_path(self, tunnel_cfg, tmp_path):
        from bridge.manager import _run_cert_command
        tunnel_cfg.cert_command = "echo 'ssh-rsa-cert AAAA'"
        path = _run_cert_command(tunnel_cfg, tmp_path)
        assert path is not None
        assert path.exists()
        assert "ssh-rsa-cert" in path.read_text()
    def test_raises_on_nonzero_exit(self, tunnel_cfg, tmp_path):
        from bridge.manager import _run_cert_command
        from bridge.models import CertAcquisitionError
        tunnel_cfg.cert_command = "exit 1"
        with pytest.raises(CertAcquisitionError):
            _run_cert_command(tunnel_cfg, tmp_path)
 class TestActorTypeFromName:
    def test_adm_prefix(self):
        from bridge.manager import _actor_type_from_name
        assert _actor_type_from_name("adm-bernd") == "adm"
    def test_agt_prefix(self):
        from bridge.manager import _actor_type_from_name
        assert _actor_type_from_name("agt-claude") == "agt"
    def test_atm_prefix(self):
        from bridge.manager import _actor_type_from_name
        assert _actor_type_from_name("atm-cron") == "atm"
    def test_unknown_prefix(self):
        from bridge.manager import _actor_type_from_name
        assert _actor_type_from_name("operator.bernd") == "unknown"
 class TestTtlRefresh:
    def test_parse_cert_expiry_returns_none_for_missing_file(self, tmp_path):
        from bridge.manager import _parse_cert_expiry
        missing = tmp_path / "no.pub"
        result = _parse_cert_expiry(missing)
        assert result is None
    def test_parse_cert_identity_returns_none_for_missing_file(self, tmp_path):
        from bridge.manager import _parse_cert_identity
        missing = tmp_path / "no.pub"
        result = _parse_cert_identity(missing)
        assert result is None
    def test_parse_cert_identity_from_keygen_output(self, tmp_path):
        from unittest.mock import patch, MagicMock
        from bridge.manager import _parse_cert_identity
        cert = tmp_path / "test.pub"
        cert.write_text("fake")
        with patch("subprocess.run") as mock_run:
            mock_run.return_value = MagicMock(
                stdout='test.pub:\n        Key ID: "agt-bridge"\n',
                returncode=0,
            )
            result = _parse_cert_identity(cert)
        assert result == "agt-bridge"
    def test_parse_cert_expiry_from_keygen_output(self, tmp_path):
        from unittest.mock import patch, MagicMock
        from bridge.manager import _parse_cert_expiry
        cert = tmp_path / "test.pub"
        cert.write_text("fake")
        with patch("subprocess.run") as mock_run:
            mock_run.return_value = MagicMock(
                stdout="test.pub:\n        Valid: from 2026-05-15T10:00:00 to 2030-05-15T22:00:00\n",
                returncode=0,
            )
            result = _parse_cert_expiry(cert)
        assert result is not None
        assert result.year == 2030
--- a/tests/test_mcp.py
+++ b/tests/test_mcp.py
@@ -49,10 +49,10 @@ def _simple_config(tmp_path: Path) -> Path:
            local_port: 8000
            ssh_user: ubuntu
            ssh_key: ~/.ssh/id_ops
-            actor: operator.bernd
+            actor: adm-bernd
        actors:
-          operator.bernd:
+          adm-bernd:
-            class: human
+            class: adm
            description: Bernd
    """))
@@ -66,10 +66,10 @@ def _catalog_config(tmp_path: Path, catalog_dir: Path) -> Path:
            local_port: 8000
            ssh_user: ubuntu
            ssh_key: ~/.ssh/id_ops
-            actor: operator.bernd
+            actor: adm-bernd
        actors:
-          operator.bernd:
+          adm-bernd:
-            class: human
+            class: adm
            description: Bernd
        catalog_path: {catalog_dir}
    """))
@@ -237,22 +237,22 @@ class TestMcpBridgeDown:
 class TestMcpBridgeRestart:
    @pytest.mark.capability("bridge_restart")
    @pytest.mark.access_mode("mcp")
-    async def test_bridge_restart_calls_stop_then_start(self, env_simple):
+    async def test_bridge_restart_delegates_to_cleanup(self, env_simple):
-        with patch("bridge.manager.TunnelManager") as mock_cls:
+        from bridge.cleanup import CleanupAction
-            mock_mgr = MagicMock()
+
-            call_order = []
+        with patch("bridge.cleanup.restart_tunnel") as mock_restart:
-            mock_mgr.stop.side_effect = lambda: call_order.append("stop")
+            mock_restart.return_value = CleanupAction(
-            mock_mgr.start.side_effect = lambda: call_order.append("start")
+                "test-tunnel", "healthy", "remote forward healthy"
-            mock_cls.return_value = mock_mgr
+            )
            from fastmcp import Client
            async with Client(mcp) as c:
                result = await c.call_tool("bridge_restart", {"tunnel": "test-tunnel"})
        data = _data(result)
-        assert "restarted" in data
+        assert data["actions"][0]["tunnel"] == "test-tunnel"
-        assert "test-tunnel" in data["restarted"]
+        assert data["actions"][0]["action"] == "healthy"
-        assert call_order == ["stop", "start"]
+        mock_restart.assert_called_once()
    async def test_bridge_restart_unknown_tunnel(self, env_simple):
        from fastmcp import Client
@@ -278,8 +278,8 @@ class TestMcpBridgeLogs:
            _json.dumps({
                "timestamp": "2026-01-01T00:00:00+00:00",
                "tunnel": "test-tunnel",
-                "actor": "operator.bernd",
+                "actor": "adm-bernd",
-                "actor_class": "human",
+                "actor_type": "adm",
                "event": "bridge_started",
            }) + "\n"
        )
--- a/tests/test_models.py
+++ b/tests/test_models.py
@@ -69,6 +69,7 @@ class TestTunnelConfig:
 class TestActorInfo:
    def test_fields(self):
-        a = ActorInfo(name="operator.bernd", actor_class="human", description="Bernd")
+        from bridge.models import ActorType
-        assert a.name == "operator.bernd"
+        a = ActorInfo(name="adm-bernd", actor_type=ActorType.ADM, description="Bernd")
-        assert a.actor_class == "human"
+        assert a.name == "adm-bernd"
        assert a.actor_type == ActorType.ADM
--- a/uv.lock
+++ b/uv.lock
@@ -345,7 +345,7 @@ wheels = [
 [[package]]
 name = "fastmcp"
-version = "3.1.0"
+version = "3.0.2"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "authlib" },
@@ -365,14 +365,13 @@ dependencies = [
    { name = "python-dotenv" },
    { name = "pyyaml" },
    { name = "rich" },
    { name = "uncalled-for" },
    { name = "uvicorn" },
    { name = "watchfiles" },
    { name = "websockets" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/0a/70/862026c4589441f86ad3108f05bfb2f781c6b322ad60a982f40b303b47d7/fastmcp-3.1.0.tar.gz", hash = "sha256:e25264794c734b9977502a51466961eeecff92a0c2f3b49c40c070993628d6d0", size = 17347083 }
+sdist = { url = "https://files.pythonhosted.org/packages/11/6b/1a7ec89727797fb07ec0928e9070fa2f45e7b35718e1fe01633a34c35e45/fastmcp-3.0.2.tar.gz", hash = "sha256:6bd73b4a3bab773ee6932df5249dcbcd78ed18365ed0aeeb97bb42702a7198d7", size = 17239351 }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/17/07/516f5b20d88932e5a466c2216b628e5358a71b3a9f522215607c3281de05/fastmcp-3.1.0-py3-none-any.whl", hash = "sha256:b1f73b56fd3b0cb2bd9e2a144fc650d5cc31587ed129d996db7710e464ae8010", size = 633749 },
+    { url = "https://files.pythonhosted.org/packages/0a/5a/f410a9015cfde71adf646dab4ef2feae49f92f34f6050fcfb265eb126b30/fastmcp-3.0.2-py3-none-any.whl", hash = "sha256:f513d80d4b30b54749fe8950116b1aab843f3c293f5cb971fc8665cb48dbb028", size = 606268 },
 ]
 [[package]]
@@ -664,7 +663,7 @@ dev = [
 [package.metadata]
 requires-dist = [
-    { name = "fastmcp", specifier = ">=2.0.0" },
+    { name = "fastmcp", specifier = ">=2.0.0,<3.1.0" },
    { name = "httpx", specifier = ">=0.27" },
    { name = "pyyaml", specifier = ">=6.0" },
    { name = "typer", specifier = ">=0.12" },
@@ -1297,15 +1296,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/dc/9b/47798a6c91d8bdb567fe2698fe81e0c6b7cb7ef4d13da4114b41d239f65d/typing_inspection-0.4.2-py3-none-any.whl", hash = "sha256:4ed1cacbdc298c220f1bd249ed5287caa16f34d44ef4e9c3d0cbad5b521545e7", size = 14611 },
 ]
 [[package]]
 name = "uncalled-for"
 version = "0.2.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/02/7c/b5b7d8136f872e3f13b0584e576886de0489d7213a12de6bebf29ff6ebfc/uncalled_for-0.2.0.tar.gz", hash = "sha256:b4f8fdbcec328c5a113807d653e041c5094473dd4afa7c34599ace69ccb7e69f", size = 49488 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/ff/7f/4320d9ce3be404e6310b915c3629fe27bf1e2f438a1a7a3cb0396e32e9a9/uncalled_for-0.2.0-py3-none-any.whl", hash = "sha256:2c0bd338faff5f930918f79e7eb9ff48290df2cb05fcc0b40a7f334e55d4d85f", size = 11351 },
 ]
 [[package]]
 name = "uvicorn"
 version = "0.41.0"
--- a/wiki/AccessManagementDirective.md
+++ b/wiki/AccessManagementDirective.md
@@ -0,0 +1,203 @@
 AccessManagementDirective
 *Practical host access control management*
 # AccessManagementDirective
 **Document Title:** SSH Access Management Directive  
 **Version:** 1.1 (Production-Ready Revision – Post-SWOT Improvements)  
 **Date:** 28 March 2026  
 **Audience:** Operations Department  
 **Purpose:** Establish a simple, efficient, scalable, and secure standard for managing SSH access across all hosts for three actor types: Admins (adm), Agents (agt), and Automations (atm).  
 **Author:** Grok (on behalf of the team)  
 **Status:** Official Directive – All ops personnel, agents, and automation pipelines MUST follow this.  
 **Changes in v1.1:** Added prerequisites, emergency break-glass procedure, concrete issuance examples, strengthened CA security, enhanced scorecard, human UX guidance, agent risk clarification, KRL support, and tighter TTL recommendations.
 ## 0. Prerequisites
 Before bootstrapping, the following must be in place:
 - Ansible (or equivalent config-management tool) with a central inventory.
 - HashiCorp Vault (or equivalent secrets manager) with the SSH secrets engine enabled.
 - GitOps repository containing the authoritative principals inventory.
 - Basic monitoring/alerting for Vault and SSH logs (e.g., Prometheus + Loki or equivalent).
 - At least two ops personnel trained on Vault SSH signing and Ansible playbooks.
 If any of these are missing, complete them first or the “automatic” parts of this directive will not function reliably.
 ## 1. Concept Overview
 This directive replaces the legacy practice of scattering static SSH public keys in `~/.ssh/authorized_keys` files. Instead, we adopt **SSH Certificate Authority (CA) based authentication** as the single source of truth.
 **Why this model?**  
 - A central CA signs short-lived certificates for every login.  
 - No more manual key copying, key sprawl, or painful revocation.  
 - Built-in expiration, role-based principals, and auditability.  
 - Works identically for humans, LLM-powered autonomous agents, and deterministic scripts.  
 - Scales from 5 hosts to 500+ with almost zero per-host maintenance.
 **Core Principles**  
 - **Least privilege** – Every certificate carries explicit *principals* (roles) and optional `force-command` / `source-address` restrictions.  
 - **Short-lived credentials** – Certificates expire automatically (24–48 h for admins, 4–24 h for agents, 1–8 h for automations).  
 - **One CA, many issuers** – A single offline User CA whose public key is trusted by every host.  
 - **Automation-first** – All key issuance, rotation, and host configuration is driven by code (Ansible + Vault).  
 - **Separation of concerns** –  
  - **Admins (adm)**: Human operators (full interactive shell when needed).  
  - **Agents (agt)**: LLM-powered autonomous entities that can self-register wake-up triggers and execute tasks.  
  - **Automations (atm)**: Deterministic scripts / cron jobs / pipelines with narrow, purpose-specific rights.
 ## 2. Actor Definitions & Access Model
 | Actor Type | Identifier Prefix | Description | Typical Certificate Lifetime | Principals / Restrictions |
 |------------|-------------------|-------------|------------------------------|---------------------------|
 | **Admin (adm)** | `adm-` | Human operator (on-call engineers) | 24–48 hours (renewable) | `adm-full`, `adm-readonly` + optional `force-command` |
 | **Agent (agt)** | `agt-` | LLM-powered autonomous agent (can schedule own wake-ups) | 4–24 hours (auto-refresh) | `agt-task-<name>`, limited to specific scripts/directories |
 | **Automation (atm)** | `atm-` | Deterministic script / pipeline | 1–8 hours (per invocation) | `atm-<jobname>`, `force-command=/usr/local/bin/atm-wrapper.sh` |
 **Certificate Naming Convention**  
 - Identity string (`-I`): `adm-bernd`, `agt-incident-resolver-v2`, `atm-backup-daily`  
 - Principals (`-n`): comma-separated list of allowed roles (stored in `/etc/ssh/auth_principals/%u` on hosts)
 **LLM-Agent Risk Clarification**  
 Agent signing policy MUST enforce least-privilege principals + `force-command` wrappers; never grant blanket shell access to autonomous agents.
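A minimal sketch of how a signing policy might enforce the prefix convention and the lifetime ceilings from the table above. The names `MAX_LIFETIME`, `actor_type_from_identity`, and `lifetime_allowed` are hypothetical and not part of any existing tool; the ceilings are the upper bounds of the lifetime ranges listed in the table.

```python
from datetime import timedelta

# Upper bounds of the lifetime ranges in the actor table (illustrative constant).
MAX_LIFETIME = {
    "adm": timedelta(hours=48),
    "agt": timedelta(hours=24),
    "atm": timedelta(hours=8),
}


def actor_type_from_identity(identity: str) -> str:
    """Return 'adm', 'agt' or 'atm' from the identity prefix, else 'unknown'."""
    prefix, sep, rest = identity.partition("-")
    # Require a known prefix, the dash, and a non-empty name after it.
    if sep and rest and prefix in MAX_LIFETIME:
        return prefix
    return "unknown"


def lifetime_allowed(identity: str, requested: timedelta) -> bool:
    """Accept a request only if the actor type is known and the TTL fits its ceiling."""
    actor = actor_type_from_identity(identity)
    if actor == "unknown":
        return False
    return timedelta(0) < requested <= MAX_LIFETIME[actor]
```

A signing wrapper could call `lifetime_allowed` before forwarding a request to the CA, so that an `atm-` identity asking for a 24-hour certificate is rejected early.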
 ## 3. Bootstrapping the System (One-Time Setup)
 ### 3.1. Create the CA (do this once, offline)
 ```bash
 ssh-keygen -t ed25519 -f /secure/vault/ca_user -C "Ops SSH User CA (2026)" -N ""
 ```
 - Store the private key in an HSM-backed Vault (or air-gapped offline storage) with **4-eyes approval** required for any signing operation.  
 - Rotate the CA key itself every 2–3 years using the same bootstrap playbook.  
 - Public key: `ca_user.pub`
 ### 3.2. Deploy Trust on Every Host (Ansible playbook `bootstrap-ssh-ca.yml`)
 - Copy `ca_user.pub` → `/etc/ssh/ca/ca_user.pub` (mode 644, root-owned).  
 - Update `/etc/ssh/sshd_config`:
  ```bash
  TrustedUserCAKeys /etc/ssh/ca/ca_user.pub
  AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
  PubkeyAuthentication yes
  PasswordAuthentication no
  PermitRootLogin no
  ```
 - Create principals directory and files from the central Git inventory.  
 - `systemctl restart sshd`
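As an illustration of the principals step, the sketch below renders one file per local account with one principal per line, which is the format `AuthorizedPrincipalsFile` expects. It assumes the inventory has already been reduced to a mapping from local account name to allowed principals; that mapping, and the function names `render_principals` and `write_principals`, are assumptions made for this example rather than part of the playbook.

```python
from pathlib import Path


def render_principals(allowed: dict[str, list[str]]) -> dict[str, str]:
    """Map each local account to file content: sorted, de-duplicated, one per line."""
    return {
        user: "\n".join(sorted(set(principals))) + "\n"
        for user, principals in allowed.items()
    }


def write_principals(allowed: dict[str, list[str]], root: Path) -> None:
    """Write one principals file per account under root (e.g. /etc/ssh/auth_principals)."""
    root.mkdir(parents=True, exist_ok=True)
    for user, content in render_principals(allowed).items():
        (root / user).write_text(content)
```

Rendering to strings first keeps the output easy to diff against what is already on the host before anything is written.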
 ### 3.3. Initial Admin Access
 The first admin generates a personal keypair and submits the `.pub` file; the CA then signs a bootstrap certificate valid for 48 hours with principal `adm-bootstrap`. This is the ONLY manual step.
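The bootstrap signing step could look roughly like the sketch below, which only assembles the `ssh-keygen` argument list and does not run it. The flags `-s`, `-I`, `-n`, and `-V` are standard OpenSSH certificate options; the function name and the public key path `id_ed25519.pub` are placeholders chosen for this example.

```python
import shlex


def bootstrap_sign_command(
    ca_key: str,
    pubkey: str,
    identity: str,
    principals: list[str],
    validity: str = "+48h",
) -> list[str]:
    """Build the ssh-keygen invocation that signs a user public key with the CA."""
    return [
        "ssh-keygen",
        "-s", ca_key,                # CA private key used for signing
        "-I", identity,              # certificate identity, e.g. adm-bernd
        "-n", ",".join(principals),  # comma-separated principals
        "-V", validity,              # validity interval relative to now
        pubkey,                      # public key to sign (placeholder path)
    ]


cmd = bootstrap_sign_command(
    "/secure/vault/ca_user", "id_ed25519.pub", "adm-bernd", ["adm-bootstrap"]
)
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.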
 ## 4. Automatic Management of Access Rights
 ### 4.1. Daily / On-Demand Workflow
 1. **Key/Certificate Issuance Pipeline** (GitOps + Vault)  
   - **Humans (adm)**: Use the recommended CLI wrapper `ops-ssh-sign` (or Teleport `tsh` if adopted early) so signing feels invisible.  
   - **Agents (agt)**: At startup, call Vault SSH engine API (auto-refreshed by a wrapper daemon).  
   - **Automations (atm)**: Just-in-time cert request via Vault inside a thin wrapper script.
 2. **Ansible-Driven Host Updates** (run hourly via CI/CD)  
   - `auth_principals/` files are rendered from a central inventory (JSON/YAML in Git).  
   - Example inventory snippet:
     ```yaml
     hosts:
       - name: prod-db-01
         allowed_principals:
           adm: [adm-full]
           agt: [agt-incident-resolver-v2]
           atm: [atm-backup-daily, atm-logrotate]
     ```
 3. **Revocation & Rotation**  
   - Short expiry = automatic revocation.  
   - For emergency revocation of a still-valid cert, maintain a Key Revocation List (KRL) and push it via Ansible (`RevokedKeys` directive in `sshd_config`).  
   - Agents/automations never store long-lived private keys on disk.
 4. **Concrete Agent & Automation Wrapper Example** (Python snippet – place in `/usr/local/bin/ops-ssh-wrapper`)
   ```python
   #!/usr/bin/env python3
   import subprocess, os, sys, tempfile
   # Request short-lived cert from Vault
   cert = subprocess.check_output(["vault", "write", "-field=signed_key", "ssh/sign/agt-role", f"public_key={os.environ['SSH_PUBKEY']}"]).decode().strip()
   with tempfile.NamedTemporaryFile(suffix="-cert.pub", delete=False) as f:
       f.write(cert.encode())
       cert_path = f.name
   # Load into ssh-agent and exec the real command
   subprocess.run(["ssh-add", cert_path])
   os.execvp(sys.argv[1], sys.argv[1:])
   ```
   Agents call this wrapper; it auto-refreshes the cert on every wake-up.
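One way a wrapper daemon might decide when to refresh is to read the `Valid:` line printed by `ssh-keygen -L -f <cert>` and compare the expiry against the current time. The sketch below works on the text output only; the function names and the ten-minute safety margin are assumptions for illustration, and the timestamps are treated as naive local times, as `ssh-keygen` prints them.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "Valid: from 2026-05-15T10:00:00 to 2026-05-15T22:00:00"
VALID_RE = re.compile(r"Valid:\s+from\s+(\S+)\s+to\s+(\S+)")


def parse_validity(keygen_output: str):
    """Return (not_before, not_after) parsed from ssh-keygen -L output, or None."""
    match = VALID_RE.search(keygen_output)
    if match is None:
        return None  # no bounded validity line found (e.g. "Valid: forever")
    try:
        return (
            datetime.fromisoformat(match.group(1)),
            datetime.fromisoformat(match.group(2)),
        )
    except ValueError:
        return None


def needs_refresh(
    keygen_output: str,
    now: datetime,
    margin: timedelta = timedelta(minutes=10),
) -> bool:
    """True if the certificate is unreadable, expired, or expires within margin."""
    window = parse_validity(keygen_output)
    if window is None:
        return True
    _, not_after = window
    return now >= not_after - margin
```

Treating an unparseable certificate as "needs refresh" errs on the side of requesting a new one rather than connecting with a credential of unknown validity.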
 ### 4.2. Human UX Guidance
 Admins are encouraged to use the `ops-ssh-sign` wrapper script (provided in the ops repo) or Teleport `tsh ssh` for a seamless experience. Manual `ssh-keygen -s` is only for edge cases.
 ### 4.3. Emergency Break-Glass Procedure
 In case of total lockout (CA offline, misconfigured Ansible push, etc.):
 1. Use the pre-documented static emergency key pair on a separate bastion host (rotated quarterly, stored in Vault with 4-eyes access).  
 2. Or fall back to cloud-provider console access (AWS SSM Session Manager, GCP IAP, Azure Bastion).  
 3. Document the exact recovery playbook in the same Git repo under `emergency/break-glass.md`.  
 4. After recovery, immediately rotate the CA and run a full scorecard.
 ## 5. AccessManagement Scorecard (Checklist)
 Run via Ansible `ssh-access-audit.yml`. Each item is pass/fail.
 | Category | Check | Target | Tool |
 |----------|-------|--------|------|
 | **CA Trust** | `TrustedUserCAKeys` points to correct file | All hosts | `ssh-audit` |
 | **No Static Keys** | `authorized_keys` files are empty or contain only emergency bootstrap keys | All hosts | `find /home -name authorized_keys -size +0` |
 | **Principals Config** | `/etc/ssh/auth_principals/%u` exists and is up-to-date | All hosts | Ansible inventory diff |
 | **Expiry Policy** | All issued certs have a validity window of ≤ 48 h (adm), ≤ 24 h (agt), or ≤ 8 h (atm) | Last 100 certs | `ssh-keygen -L -f <cert>` per certificate |
 | **Password Auth** | Disabled globally | All hosts | `sshd -T \| grep password` |
 | **Root Login** | Disabled | All hosts | `sshd -T \| grep permitroot` |
 | **Agent/Automation Wrapper** | Every agt/atm binary calls Vault for cert | All pipelines | Code review + runtime trace |
 | **Audit Logging** | Every SSH connection logs certificate identity (`-I`) to central SIEM | All hosts | `journalctl -u sshd` + SIEM query |
 | **CA Security** | CA key access is 4-eyes / HSM-backed | Vault policy | Vault audit log |
 | **Bootstrap Complete** | No `adm-bootstrap` principal in use | All hosts | Scorecard run |
 | **Score** | 10/10 = **Operational** | - | - |
 **Scorecard Execution Command** (run from ops laptop):
 ```bash
 ansible all -m command -a "ssh-access-scorecard.sh" --become
 ```
 ## 6. Scope & Operational Boundaries
 ### 6.1. When Bootstrapping Is Officially Closed
 The system is **fully operational** when **ALL** of the following are true:
 - Scorecard passes 10/10 on every host.
 - Central Git repo contains the authoritative principals inventory.
 - First three admins have successfully used signed certificates for 7 consecutive days.
 - At least one agent (agt) and one automation (atm) have executed a task using a CA-signed certificate.
 - CI/CD pipeline for host config updates is green and runs hourly.
 - Emergency break-glass procedure has been tested once.
 **Declaration:** Ops Lead signs off with date in the Git commit message.
 ### 6.2. Scope Boundary – When to Switch to Sophisticated Tooling
 Stay with **native OpenSSH CA + Ansible + Vault** while:
 - ≤ 200 hosts
 - ≤ 50 distinct agent/automation identities
 - No regulatory requirement for SSO or full session recording
 **Switch triggers** (any one):
 - > 200 hosts OR rapid daily growth
 - Need for human SSO (Okta/Google) integration
 - Requirement for audited web-based SSH sessions or just-in-time access approval
 - Agents need built-in Machine-ID / workload identity (e.g., Teleport tbot)
 - Audit/compliance demands central policy engine or session recording
 **Recommended next-level tools** (in order):
 1. **Teleport** – Best for mixed human + agent workloads (SSO + Machine ID).  
 2. **HashiCorp Vault SSH + Boundary** – When you already use Vault heavily.  
 3. **step-ca + smallstep** – If you prefer a pure open-source CA with OIDC.
 **Migration path:** The CA public key and principals model are fully compatible; you can import the existing CA into Teleport/Vault without re-issuing keys to users.
 ## 7. Enforcement & Review
 - **Quarterly review** of this directive and scorecard results.  
 - **Violations** (e.g., adding static keys) trigger immediate access revocation and incident ticket.  
 - **Questions / improvements** → create PR against this file in the ops repo.
 **End of Document**  
 Approved for immediate use across all production and staging environments.
--- a/wiki/OpsBridge.md
+++ b/wiki/OpsBridge.md
@@ -157,31 +157,82 @@ Just controlled operational access when you need it.
 Start a bridge:
 ```
-ob up hostA=hostB
+bridge up state-hub-railiance01
 ```
 Check active bridges:
 ```
-ob status
+bridge status
 ```
 Investigate infrastructure targets:
 ```
-ob targets
+bridge targets
 ```
 Stop the bridge when finished:
 ```
-ob down hostA=hostB
+bridge down state-hub-railiance01
 ```
 OpsBridge handles the lifecycle so operators can focus on solving the problem.
 ---
 # Tunnel lifecycle commands
 | Command | Purpose |
 |---------|---------|
 | `bridge up` | Start tunnel(s) that are not already running |
 | `bridge down` | Stop tunnel(s) that are running |
 | `bridge restart` | Blank-slate recovery — get tunnel(s) operational again |
 | `bridge maintenance cleanup` | Proactive hygiene sweep without implying restart |
 ## `bridge restart` — blank-slate recovery
 `bridge restart` means *operational again*, not merely cycling the local manager
 PID while a broken remote listener still holds the port.
 For **reverse** tunnels (State Hub exposure on remote hosts), restart:
 1. Runs `should_cleanup_tunnel` to detect stale SSH remote forwards
 2. Clears orphan listeners on the remote host when needed
 3. Reconnects the tunnel (stop + start) only when cleanup was required
 When the remote forward is already healthy, restart reports `healthy` and leaves
 the working tunnel running — no unnecessary disruption.
 For **local-direction** tunnels (`direction: local` in `tunnels.yaml`, e.g.
 `k3s-api-coulombcore`), restart uses local stop/start only; no remote cleanup.
 Use `bridge maintenance cleanup` for scheduled or manual hygiene without the
 restart contract. The nightly cron (`bridge maintenance install-cron`) runs
 `maintenance cleanup --restart` at 03:00.
 **Incident context:** stale orphan `sshd` remote forwards after laptop sleep
 blocked `bridge restart` until operators discovered the maintenance subcommand.
 See `state-hub/history/20260621-weekend-automation-assessment.md` and
 `BRIDGE-WP-0005` in this repo.
 ## Host roles
 Tunnels in `~/.config/bridge/tunnels.yaml` serve three host roles:
 | Role | Hosts | Behaviour |
 |------|-------|-----------|
 | **Workstation origin** | WSL laptop | Shutdown, sleep, and network changes kill local bridge processes without graceful remote SSH teardown. Orphan forwards on all remotes are common after wake. |
 | **VPS remotes** | coulombcore, railiance01 | Normally always-on. Maintenance reboots clear kernel state, but laptop return can leave orphan forwards from the previous session if the VPS did not reboot. |
 | **LAN builder** | haskelseed | Intermittently offline; same orphan-forward pattern when the workstation-side tunnel dies uncleanly. |
 Conditional remote cleanup before restart benefits all reverse tunnels.
 `should_cleanup_tunnel` skips healthy forwards — VPS tunnels with live working
 forwards are untouched.
 ---
 # The Philosophy Behind OpsBridge
 Infrastructure teams succeed or fail based on how effectively they bridge the gaps between:
--- a/workplans/ADHOC-2026-06-14.md
+++ b/workplans/ADHOC-2026-06-14.md
@@ -0,0 +1,56 @@
 ---
 id: ADHOC-2026-06-14
 type: workplan
 title: "Ad hoc ops-bridge fixes for 2026-06-14"
 domain: custodian
 repo: ops-bridge
 status: finished
 owner: codex
 topic_slug: ops-bridge
 created: "2026-06-14"
 updated: "2026-06-14"
 state_hub_workstream_id: "fbc2ef7e-626f-4c6a-bdf8-c69bf29097ce"
 ---
 ## Fix haskelseed bridge diagnostics
 ```task
 id: ADHOC-2026-06-14-T01
 status: done
 priority: medium
 state_hub_task_id: "ffe6b8d8-889c-4ec4-8b64-00b77f86e39f"
 ```
 `haskelseed` is an Alpine host without `ss`, so `bridge check` reported
 reverse tunnel ports as closed even while SSH reverse listeners were present.
 Updated diagnostics to fall back from `ss` to `netstat` and then
 `/proc/net/tcp`/`tcp6`. Also fixed local-direction diagnostics so
 `nix-daemon-haskelseed` checks the local `-L` listener instead of probing a
 remote reverse port.
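The last step of that fallback chain can be sketched as follows. This is a minimal illustration, not the code in `ops-bridge`: the function name `listening_ports` and the sample data are hypothetical, and the parsing relies on the usual Linux layout of `/proc/net/tcp` (hex `ip:port` in the second column, socket state in the fourth, `0A` meaning LISTEN).

```python
def listening_ports(proc_net_tcp: str) -> set[int]:
    """Return the ports in LISTEN state found in /proc/net/tcp (or tcp6) text."""
    ports: set[int] = set()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state != "0A":  # 0A is the kernel's code for TCP LISTEN
            continue
        # local_address is "<hex ip>:<hex port>"; the port follows the last colon
        ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


# Hypothetical sample: one listener on 18000 and one established connection on 8000.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:4650 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0
   1: 0100007F:1F40 0100007F:D431 01 00000000:00000000 00:00000000 00000000  1000        0 12346 1 0000000000000000 20 4 30 10 -1
"""

print(18000 in listening_ports(SAMPLE))
```

Because the same parser works on `tcp6` content, a host without `ss` or `netstat` can still be probed by reading both files over SSH and merging the two sets.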
 Verification:
 - `state-hub-haskelseed` responded through `127.0.0.1:18000/state/health`.
 - `bridge check --json` reported all configured tunnels `ok: true`.
 - `python3 -m pytest tests/test_cli.py tests/test_diagnostics.py` passed.
 ## Make default target safe and add setup
 ```task
 id: ADHOC-2026-06-14-T02
 status: done
 priority: medium
 state_hub_task_id: "3b932955-0d75-4b95-9821-92bfa2dadbd0"
 ```
 Changed `make` to default to a help listing that only shows targets with
 `##` comments. Added `make setup` to run `uv sync --all-groups` and reinstall
 the editable `bridge` CLI wrapper through `uv tool install -e . --force`.
 Verification:
 - `uv sync --all-groups` succeeded and installed the project environment.
 - `make` listed targets only and did not run tests or setup.
 - `make setup` succeeded and installed the `bridge` executable.
 - `make test` passed all 235 tests.
 - `make lint` passed.
--- a/workplans/BRIDGE-WP-0001-initial-implementation.md
+++ b/workplans/BRIDGE-WP-0001-initial-implementation.md
@@ -2,7 +2,7 @@
 id: BRIDGE-WP-0001
 type: workplan
 title: "OpsBridge Initial Implementation"
-domain: custodian
+domain: infotech
 repo: ops-bridge
 status: completed
 owner: Bernd
--- a/workplans/BRIDGE-WP-0002-opscatalog-extension.md
+++ b/workplans/BRIDGE-WP-0002-opscatalog-extension.md
@@ -2,7 +2,7 @@
 id: BRIDGE-WP-0002
 type: workplan
 title: "OpsCatalog Extension"
-domain: custodian
+domain: infotech
 repo: ops-bridge
 status: completed
 owner: Bernd
--- a/workplans/BRIDGE-WP-0003-mcp-skill-cross-mode-tests.md
+++ b/workplans/BRIDGE-WP-0003-mcp-skill-cross-mode-tests.md
@@ -2,7 +2,7 @@
 id: BRIDGE-WP-0003
 type: workplan
 title: "OpsBridge MCP Server, Skill, and Cross-Mode Test Coverage"
-domain: custodian
+domain: infotech
 repo: ops-bridge
 status: done
 owner: Bernd
--- a/workplans/BRIDGE-WP-0004-directive-alignment.md
+++ b/workplans/BRIDGE-WP-0004-directive-alignment.md
@@ -0,0 +1,340 @@
 ---
 id: BRIDGE-WP-0004
 type: workplan
 title: "AccessManagementDirective Alignment"
 domain: infotech
 repo: ops-bridge
 status: done
 owner: Bernd
 topic_slug: custodian
 created: "2026-03-28"
 updated: "2026-03-28"
 state_hub_workstream_id: "e3451b70-688e-4e19-bff5-0c82c0f009a7"
 ---
 # BRIDGE-WP-0004 — AccessManagementDirective Alignment
 **Scope:** Align `ops-bridge` with `wiki/AccessManagementDirective.md` — three-actor model,
 optional CA-signed certificate acquisition, TTL-aware reconnect, richer audit log — while
 preserving full backward compatibility with the existing static-key mode.
 **Out of scope:** CA/signing logic itself (lives in `ops-warden`), host-side principal
 deployment, Vault cluster management, OpsCatalog extensions (BRIDGE-WP-0002).
 ---
 ## Goal
 After this workplan:
 1. `ops-bridge` works unchanged for anyone using plain, non-expiring SSH keys.
 2. `ops-bridge` works with CA-signed short-lived certs via `ops-warden` (or any compatible
   `cert_command`) — cert acquisition, cert rotation, and cert identity logging are all
   handled transparently by the tunnel manager.
 3. Actor attribution is expressed in the three-actor vocabulary (`adm | agt | atm`) from
   the directive, with config validation that enforces naming conventions.
 4. The audit log carries `cert_identity` when a cert was used, satisfying the directive's
   §5 SIEM traceability requirement.
 ---
 ## Reference Documents
 | Document | Location |
 |---|---|
 | AccessManagementDirective | `wiki/AccessManagementDirective.md` |
 | WARDEN-WP-0001 | `workplans/WARDEN-WP-0001-initial-implementation.md` |
 | PRD | `wiki/OpsBridgePrd.md` |
 | FRS | `wiki/OpsBridgeFrs.md` |
 ---
 ## Design Decisions
 ### Static key mode stays first-class
 If `cert_command` is absent from a tunnel config, `ops-bridge` behaves exactly as today:
 `ssh_key` is passed directly to `ssh -i`. No deprecation, no warnings. Static keys are
 explicitly supported for:
 - Lab/dev environments without a CA
 - Tunnels owned by `adm`-class humans who manage their own cert refresh externally
 - Environments below the directive's complexity threshold
 ### cert_command interface
 ```yaml
 # tunnels.yaml — optional cert_command field
 tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519   # private key (always required)
    actor: agt-state-hub-bridge
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"
 ```
 When `cert_command` is present, `manager.py` runs it before every SSH subprocess launch,
 captures stdout as the cert text, writes it to a tempfile in the state dir, and adds
 `-i <cert_path>` alongside `-i <key_path>` to the SSH command. The cert file is cleaned up
 on tunnel stop.
 `cert_command` is a raw shell string, intentionally. The caller decides whether it invokes
 `warden`, `vault write`, `ssh-keygen -s`, or any other tool. This keeps the interface
 dependency-free — no Vault SDK, no warden import needed inside ops-bridge.
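A minimal sketch of that contract, assuming a POSIX shell. The names `acquire_cert` and `identity_args` are illustrative and are not taken from `manager.py`; only `cert_command`, `CertAcquisitionError`, and the double `-i` convention come from the text above.

```python
import subprocess
from pathlib import Path
from typing import Optional


class CertAcquisitionError(RuntimeError):
    """Raised when cert_command exits non-zero."""


def acquire_cert(cert_command: Optional[str], cert_path: Path) -> Optional[Path]:
    """Run cert_command and store its stdout as the certificate file."""
    if cert_command is None:
        return None  # static key mode: nothing to acquire
    result = subprocess.run(cert_command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise CertAcquisitionError(result.stderr.strip())
    cert_path.write_text(result.stdout)  # overwritten on every acquisition
    return cert_path


def identity_args(key_path: Path, cert_path: Optional[Path]) -> list[str]:
    """Build the -i arguments: the key always, the cert only when present."""
    args = ["-i", str(key_path)]
    if cert_path is not None:
        args += ["-i", str(cert_path)]
    return args
```

Keeping the command as an opaque string means the only coupling to the signer is "exit code plus stdout", which is what lets `warden`, `vault write`, or `ssh-keygen -s` be swapped without code changes.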
 ### TTL-aware cert refresh
 After acquiring a cert, `manager.py` parses the end of the validity window (the `to`
 timestamp on the `Valid:` line of `ssh-keygen -L` output) to determine `cert_expires_at`.
 It schedules a pre-emptive cert refresh
 (`cert_expires_at - 5 min`) inside the health-check/wait loop. When the refresh timer
 fires, the SSH subprocess is gracefully restarted with a freshly signed cert — no auth
 failure, no reconnect backoff triggered.
 If `cert_command` is absent, no TTL logic runs.
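The refresh check can be sketched like this. It is an illustration under stated assumptions: `parse_cert_expiry` and `refresh_due` are hypothetical names, the `Valid: from <ts> to <ts>` line format is assumed from typical `ssh-keygen -L` output, and the time zone of those timestamps is passed in explicitly rather than guessed.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed line shape: "        Valid: from 2026-03-28T10:00:00 to 2026-03-28T22:00:00"
VALID_RE = re.compile(r"^\s*Valid:\s+from\s+(\S+)\s+to\s+(\S+)\s*$", re.MULTILINE)


def parse_cert_expiry(keygen_output: str, tz: timezone = timezone.utc) -> Optional[datetime]:
    """Return the end of the validity window, or None when no window is printed."""
    match = VALID_RE.search(keygen_output)
    if match is None:
        return None  # e.g. a non-expiring cert printed as "Valid: forever"
    return datetime.strptime(match.group(2), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=tz)


def refresh_due(expires_at: datetime, now: datetime,
                margin: timedelta = timedelta(minutes=5)) -> bool:
    """True once we are inside the pre-emptive refresh margin."""
    return now >= expires_at - margin
```

Called once per health-check iteration, `refresh_due` turns an expiring certificate into a planned restart instead of an authentication failure that would trigger backoff.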
 ### Actor type model
 `actor_class: str  # "human" | "automation"` is replaced by:
 ```python
 class ActorType(str, Enum):
    ADM = "adm"   # human operator
    AGT = "agt"   # LLM-powered autonomous agent
    ATM = "atm"   # deterministic script / pipeline
 ```
 Backward-compat mapping at config load time: `"human"` → `adm`, `"automation"` → `atm`.
 The mapping is a one-way migration aid with a deprecation warning; new configs must use the
 canonical values.
 Config validation: if `actor` name is set, it must start with the prefix matching its type
 (`adm-*`, `agt-*`, `atm-*`). Hard error, not a warning — the directive requires this for
 SIEM auditability.
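A sketch of the load-time rules described above. `parse_actor` and `LEGACY_CLASSES` are hypothetical names. One assumption is made explicit: the prefix check is applied only to canonical values, so that a legacy config such as `automation-agent` with `class: automation` still loads with just a warning.

```python
import warnings
from enum import Enum


class ActorType(str, Enum):
    ADM = "adm"   # human operator
    AGT = "agt"   # LLM-powered autonomous agent
    ATM = "atm"   # deterministic script / pipeline


class ConfigError(ValueError):
    """Raised for invalid actor configuration."""


LEGACY_CLASSES = {"human": ActorType.ADM, "automation": ActorType.ATM}


def parse_actor(name: str, raw_class: str) -> ActorType:
    """Map a config class string to ActorType and validate the actor name."""
    if raw_class in LEGACY_CLASSES:
        warnings.warn(
            f"actor class {raw_class!r} is deprecated; use adm, agt, or atm",
            DeprecationWarning,
            stacklevel=2,
        )
        return LEGACY_CLASSES[raw_class]  # assumption: no prefix check for legacy values
    try:
        actor_type = ActorType(raw_class)
    except ValueError:
        raise ConfigError(f"unknown actor class {raw_class!r}") from None
    if not name.startswith(f"{actor_type.value}-"):
        raise ConfigError(f"actor {name!r} must start with '{actor_type.value}-'")
    return actor_type
```

Raising instead of warning on a prefix mismatch keeps badly named actors out of the audit log entirely, which is the point of the hard-error rule.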
 ---
 ## Tasks
 ### T1 — ActorType enum
 ```task
 id: BRIDGE-WP-0004-T1
 state_hub_task_id: 40c7f818-8233-4b84-9a0e-5f5359a47504
 status: done
 priority: high
 ```
 - [x] `models.py`: replace `actor_class: str` in `ActorInfo` with `actor_type: ActorType`
 - [x] `config.py`: accept legacy `"human"` → `ActorType.ADM` and `"automation"` →
      `ActorType.ATM` with a `DeprecationWarning`; reject unknown values
 - [x] `config.py`: enforce actor name prefix: `adm-*` for ADM, `agt-*` for AGT,
      `atm-*` for ATM; raise `ConfigError` on mismatch
 - [x] Update `manager.py` / `audit.py` call sites: `actor_class` → `actor_type.value`
 - [x] Update tests
 ### T2 — cert_command config field
 ```task
 id: BRIDGE-WP-0004-T2
 state_hub_task_id: d69ac3b8-6c68-4da0-976f-0cce2ee626d6
 status: done
 priority: high
 ```
 - [x] `models.py`: add `cert_command: Optional[str] = None` to `TunnelConfig`
 - [x] `config.py`: parse `cert_command` from tunnel YAML; no validation of the string
      content (shell-level freedom intentional)
 - [x] Document in config example / SCOPE.md
 ### T3 — Cert acquisition in manager
 ```task
 id: BRIDGE-WP-0004-T3
 state_hub_task_id: b93be1e4-dd32-4e9c-a085-c5bf81108d97
 status: done
 priority: high
 ```
 - [x] `manager.py`: extract cert acquisition into `_acquire_cert(cfg) -> Optional[Path]`
      - If `cfg.cert_command` is None: return None (static key mode)
      - Run `cert_command` via `subprocess.run(shell=True, capture_output=True)`
      - Write stdout to `~/.local/state/bridge/<tunnel>-cert.pub` (overwrite each time)
      - Return path; on non-zero exit code: raise `CertAcquisitionError` with stderr
 - [x] `build_ssh_command`: accept optional `cert_path`; when set, insert
      `-i <cert_path>` after `-i <key_path>` (OpenSSH loads both automatically)
 - [x] Call `_acquire_cert` at the top of each reconnect iteration (not once at startup)
      so every reconnect gets a fresh cert
 ### T4 — cert_identity in audit log
 ```task
 id: BRIDGE-WP-0004-T4
 state_hub_task_id: bc29cc2a-1d77-48d8-97d3-54a49de0550e
 status: done
 priority: high
 ```
 - [x] `manager.py`: after cert acquisition, parse `ssh-keygen -L -f <cert>` output to
      extract `Key ID` (the `-I` value from signing time)
 - [x] Add `cert_identity: Optional[str]` to `AuditLogger.log()` signature; include in
      JSON entry when present
 - [x] Log `cert_identity` in `BRIDGE_CONNECTED` and `BRIDGE_STARTED` events
 - [x] `AuditEvent`: no new events needed; `cert_identity` is metadata on existing events
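For illustration, the identity extraction and the optional audit field might look like the sketch below. `parse_cert_identity` and `audit_entry` are hypothetical helpers, and the `Key ID: "<value>"` line format is an assumption about `ssh-keygen -L` output rather than something this workplan specifies.

```python
import re
from typing import Optional

# Assumed line shape: '        Key ID: "agt-state-hub-bridge"'
KEY_ID_RE = re.compile(r'^\s*Key ID:\s+"(.*)"\s*$', re.MULTILINE)


def parse_cert_identity(keygen_output: str) -> Optional[str]:
    """Return the certificate Key ID, or None when the line is missing."""
    match = KEY_ID_RE.search(keygen_output)
    return match.group(1) if match else None


def audit_entry(event: str, tunnel: str, cert_identity: Optional[str] = None) -> dict:
    """Build an audit record; cert_identity appears only when a cert was used."""
    entry = {"event": event, "tunnel": tunnel}
    if cert_identity is not None:
        entry["cert_identity"] = cert_identity
    return entry
```

Omitting the key in static key mode, rather than writing a null, keeps existing log consumers unaffected.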
 ### T5 — TTL-aware cert refresh
 ```task
 id: BRIDGE-WP-0004-T5
 state_hub_task_id: cc3aee49-7821-4a11-a331-be562aa88d91
 status: done
 priority: high
 ```
 - [x] `manager.py`: after successful cert acquisition, parse the expiry timestamp (the
      `to` value on the `Valid:` line of `ssh-keygen -L` output) → `cert_expires_at: datetime`
 - [x] In the health-check/wait loop, check `datetime.now(utc) >= cert_expires_at - timedelta(minutes=5)`
      on each iteration
 - [x] When refresh is due: call `proc.terminate()`, break inner loop, let the outer
      reconnect loop restart naturally (T3 will re-acquire the cert at the top of the
      next iteration)
 - [x] Log a new `AuditEvent.CERT_EXPIRING` event when refresh is triggered (add to
      `AuditEvent` enum); include `cert_identity` and `cert_expires_at` in detail field
 - [x] If `cert_command` is absent, skip all TTL logic entirely
 ### T6 — `bridge cert-status` command
 ```task
 id: BRIDGE-WP-0004-T6
 state_hub_task_id: b10275fc-bfe2-49a9-a83e-dd0dec796efd
 status: done
 priority: medium
 ```
 - [x] `cli.py`: add `cert-status [TUNNEL]` subcommand
 - [x] For each tunnel (or the named one): read cert file from state dir if present,
      run `ssh-keygen -L`, display: identity, principals, valid-from, valid-until,
      time-to-expiry (or "static key / no cert" if absent)
 - [x] Exit code 1 if any cert is expired; exit code 0 otherwise (scriptable)
 - [x] `--json` flag for machine-readable output
 ### T7 — CertAcquisitionError handling
 ```task
 id: BRIDGE-WP-0004-T7
 state_hub_task_id: de355a7c-f07e-452e-974f-4ddf362b24a6
 status: done
 priority: high
 ```
 - [x] New exception `CertAcquisitionError` in `models.py`
 - [x] In `_run_loop`: catch `CertAcquisitionError`, log `AuditEvent.BRIDGE_DISCONNECTED`
      with `detail="cert acquisition failed: <stderr>"`, apply normal backoff and retry
      (cert failures are transient — e.g., Vault briefly unreachable)
 - [x] After `max_attempts` consecutive cert failures, transition to `FAILED` state
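The retry behaviour in T7 can be sketched as a small loop. This is an illustration only: `run_cert_attempts` is a hypothetical name, the exponential doubling schedule is an assumption (the workplan only says "normal backoff"), and `sleep` is injected so the logic can be exercised without waiting.

```python
from typing import Callable


class CertAcquisitionError(RuntimeError):
    """Raised when cert_command exits non-zero."""


def run_cert_attempts(
    acquire: Callable[[], object],
    max_attempts: int,
    sleep: Callable[[float], None],
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> tuple[str, list[str]]:
    """Try to acquire a cert; return ("acquired" | "failed", logged events)."""
    events: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            acquire()
        except CertAcquisitionError as exc:
            events.append(f"BRIDGE_DISCONNECTED: cert acquisition failed: {exc}")
            if attempt < max_attempts:
                # assumed schedule: 1s, 2s, 4s, ... capped at max_delay
                sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
            continue
        return "acquired", events
    return "failed", events
```

Treating cert failures like connection failures means a briefly unreachable signer recovers on its own, while a permanently broken `cert_command` still ends in `FAILED` rather than retrying forever.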
 ### T8 — SCOPE.md and documentation updates
 ```task
 id: BRIDGE-WP-0004-T8
 state_hub_task_id: 40f5364b-f9e1-41cb-90e5-2b19511108f1
 status: done
 priority: medium
 ```
 - [x] Update `SCOPE.md`: Current State updated to reflect completion; directive alignment done
 - [x] `wiki/OpsBridgeFrs.md` §5.7 already covers actor attribution abstractly — no changes needed
 - [x] `.claude/rules/architecture.md` already documents cert_command mode and actor vocab
 - [ ] Update `wiki/OpsBridgePrd.md`: note directive alignment, ops-warden dependency (deferred)
 ### T9 — Tests
 ```task
 id: BRIDGE-WP-0004-T9
 state_hub_task_id: fc1d1321-c1d0-4a0a-ae2e-d9ec9939dd6a
 status: done
 priority: high
 ```
 - [x] `test_config.py`: actor name prefix validation (adm/agt/atm); legacy class mapping;
      cert_command parse
 - [x] `test_manager.py`: mock `cert_command` subprocess; verify cert path appended to SSH
      args; verify `CertAcquisitionError` on non-zero exit; TTL logic helpers
 - [x] `test_audit.py`: `cert_identity` field; actor_type rename
 - [x] `test_cli.py`: `cert-status` exit codes; JSON output shape
 - [x] 233 tests, 0 failures
 ---
 ## Config Schema — Before / After
 ### Before
 ```yaml
 tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: ops-agent
    ssh_key: ~/.ssh/id_ed25519
    actor: automation-agent
 actors:
  automation-agent:
    class: automation
    description: "state hub bridge agent"
 ```
 ### After (static key mode — unchanged behavior)
 ```yaml
 tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
 actors:
  agt-state-hub-bridge:
    class: agt
    description: "state hub bridge agent"
 ```
 ### After (cert_command mode — ops-warden or any CA)
 ```yaml
 tunnels:
  state-hub-coulombcore:
    host: coulombcore
    remote_port: 8001
    local_port: 8000
    ssh_user: agt-state-hub-bridge
    ssh_key: ~/.ssh/agt-state-hub-bridge_ed25519
    actor: agt-state-hub-bridge
    cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub"
 actors:
  agt-state-hub-bridge:
    class: agt
    description: "state hub bridge agent"
 ```
 ---
 ## Acceptance Criteria
 - [x] Existing `tunnels.yaml` with `class: automation` loads without error (deprecation
      warning only); tunnel behaves identically
 - [x] New config with `class: agt` and actor name not prefixed `agt-` raises `ConfigError`
 - [x] Config with `cert_command` set: SSH process launched with both `-i key` and
      `-i cert`; `cert_identity` present in `BRIDGE_CONNECTED` audit event
 - [x] Config without `cert_command`: no cert file written; `cert_identity` absent in audit;
      no TTL logic runs
 - [x] `cert_command` exits non-zero: tunnel enters backoff/retry, `BRIDGE_DISCONNECTED`
      logged with stderr detail; eventually reaches `FAILED` after `max_attempts`
 - [x] Cert within 5 min of expiry: SSH restarted with fresh cert; `CERT_EXPIRING` logged
 - [x] `bridge cert-status` shows valid cert info; exits 1 on expired cert
 - [x] All tests pass: `uv run pytest` (233 passed)
 - [x] All lints pass: `uv run ruff check .`
--- a/workplans/BRIDGE-WP-0005-restart-includes-remote-cleanup.md
+++ b/workplans/BRIDGE-WP-0005-restart-includes-remote-cleanup.md
@@ -0,0 +1,194 @@
 ---
 id: BRIDGE-WP-0005
 type: workplan
 title: "Restart includes remote cleanup (blank-slate recovery)"
 domain: infotech
 repo: ops-bridge
 status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-21"
 updated: "2026-06-21"
 state_hub_workstream_id: "9565491f-e664-4add-bea4-27c4fb015ee0"
 ---
 # BRIDGE-WP-0005 — Restart includes remote cleanup
 **Origin:** `STATE-WP-0063` weekend automation repair (2026-06-21). A stale orphan
 `sshd` remote forward on Railiance01 port `18000` blocked
 `bridge restart state-hub-railiance01` from producing a working tunnel. Operators
 had to discover `bridge maintenance cleanup <tunnel> --restart` separately.
 **Operator expectation:** `bridge restart` should mean *operational again* — a
 blank-slate recovery — not merely "cycle the local manager PID while a broken
 remote listener still holds the port."
 ## Topology and failure modes (refined)
 Tunnels in `~/.config/bridge/tunnels.yaml` serve three distinct host roles.
 Cleanup policy must respect all of them.
 ### A. Workstation (laptop WSL) — tunnel **origin**
 The State Hub API runs locally (`127.0.0.1:8000`). Reverse tunnels expose it on
 remote hosts:
 | Remote host | Tunnels (reverse) | Role |
 |-------------|-------------------|------|
 | **coulombcore** (`92.205.130.254`) | `state-hub-coulombcore`, `state-hub-mcp-coulombcore` | VPS — stable, occasional maintenance reboot |
 | **railiance01** (`92.205.62.239`) | `state-hub-railiance01`, `state-hub-mcp-railiance01` | VPS — stable, occasional maintenance reboot |
 | **haskelseed** (`192.168.178.135`) | `state-hub-haskelseed`, `state-hub-mcp-haskelseed` | LAN builder — may sleep/reboot when moved |
 **Laptop behaviour:** shutdown, sleep, and location changes (home ↔ office) kill
 local bridge processes without graceful remote SSH teardown. Orphan `sshd`
 listeners on **all three remotes** are common after wake — especially
 `18000`/`18001` on VPS hosts that activity-core and remote agents depend on.
 ### B. Haskelseed — also intermittently offline
 Haskelseed is not a datacenter VPS; it may be powered down or unreachable on
 different networks. The same orphan-forward pattern applies to its reverse ports
 when the workstation-side tunnel dies uncleanly.
 ### C. VPS remotes (coulombcore, railiance01)
 Normally always-on. Maintenance reboots clear remote kernel state, but:
 - a VPS reboot does **not** fix a workstation that is still in `reconnecting`
  with a dead local SSH child;
 - when the laptop returns, orphan forwards from the **previous** session may
  still block new `-R` binds if the VPS did not reboot.
 **Conclusion:** conditional remote cleanup before restart benefits **all reverse
 tunnels**, not only laptop-adjacent hosts. `should_cleanup_tunnel()` already
 skips healthy forwards — VPS tunnels with live working forwards are untouched.
 ### D. Local-direction tunnels — no remote cleanup
 `direction: local` tunnels (`k3s-api-coulombcore`, `nix-daemon-haskelseed`) use
 forward mode from workstation to remote services. They do not bind remote reverse
 ports for State Hub. **`restart` stays local stop/start only** for these.
 ## Design (decided)
 | Command | Behaviour after this workplan |
 |---------|-------------------------------|
 | `bridge restart [tunnel]` | For each **reverse** tunnel: `cleanup_tunnel(..., restart=True)` — run `should_cleanup_tunnel`; clear stale remote listener if needed; then start. For **local** tunnels: existing `stop()` + `start()`. |
 | `bridge maintenance cleanup` | Unchanged — proactive hygiene cron / manual sweep without implying user-facing "restart". |
 | `bridge up` | Out of scope here (see T4 optional follow-up). |
 Implementation sketch: replace the body of `cli.restart()` with a call to
 `cleanup_all_tunnels(..., restart=True, tunnel_name=...)` for reverse tunnels,
 or per-tunnel `cleanup_tunnel` when a single tunnel is named.
 Emit the same action summary strings cleanup already uses (`healthy`,
 `cleaned_and_restarted`, `error`) so operators see whether remote hygiene ran.
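The dispatch described in this section can be sketched as follows. `Tunnel`, `restart_tunnels`, and the two callables are hypothetical stand-ins for the real config objects, `cleanup_tunnel(..., restart=True)`, and `stop()` + `start()`; the `restarted` label for local tunnels is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Tunnel:
    name: str
    direction: str  # "reverse" or "local"


def restart_tunnels(
    tunnels: Iterable[Tunnel],
    cleanup: Callable[[Tunnel], str],      # returns "healthy", "cleaned_and_restarted" or "error"
    stop_start: Callable[[Tunnel], None],  # plain local stop + start
) -> tuple[dict[str, str], int]:
    """Route each tunnel to the right restart path and derive the exit code."""
    actions: dict[str, str] = {}
    for tunnel in tunnels:
        if tunnel.direction == "local":
            stop_start(tunnel)  # no remote listener to clean up
            actions[tunnel.name] = "restarted"
        else:
            actions[tunnel.name] = cleanup(tunnel)
    exit_code = 1 if "error" in actions.values() else 0
    return actions, exit_code
```

Returning the per-tunnel action map alongside the exit code lets the CLI print what actually happened for each tunnel instead of a generic "Restarted tunnel" message.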
 ## Out of scope
 - Changing `should_cleanup_tunnel` heuristics (unless tests expose a VPS false
  positive during T2).
 - Auto-cleanup inside the reconnect backoff loop (stretch — T4).
 - Renaming tunnels or changing `tunnels.yaml` host entries.
 ---
 ## T1 — Wire restart through cleanup path
 ```task
 id: BRIDGE-WP-0005-T01
 status: done
 priority: high
 state_hub_task_id: "b61c5d45-1198-416d-aa15-f2063fc5eb14"
 ```
 Refactor `bridge/cli.py` `restart()` so reverse tunnels call
 `cleanup_tunnel(cfg, state_mgr, restart=True)` instead of bare
 `TunnelManager.stop()` + `start()`.
 Requirements:
 - Single-tunnel and all-tunnel restart both work.
 - Local-direction tunnels keep stop/start only.
 - Exit codes: preserve today’s semantics where practical; exit non-zero if any
  named tunnel ends in `CleanupAction.action == "error"`.
 - Stdout tells the operator what happened (`healthy`, `cleaned_and_restarted`,
  etc.), not only "Restarted tunnel".
 ## T2 — Tests and regression coverage
 ```task
 id: BRIDGE-WP-0005-T02
 status: done
 priority: high
 state_hub_task_id: "b4ad0525-6936-4799-bead-3603d05c49af"
 ```
 Update `tests/test_cli.py`:
 - `test_restart_calls_stop_then_start` → assert restart delegates to cleanup for
  reverse tunnels.
 - Add cases: healthy forward (no remote kill), stale forward (remote cleanup
  invoked), local-direction tunnel (no cleanup call).
 - Reuse mocks from `tests/test_cleanup.py` patterns.
 `make test` and `make lint` pass.
 ## T3 — Operator docs and CLI help
 ```task
 id: BRIDGE-WP-0005-T03
 status: done
 priority: medium
 state_hub_task_id: "60586375-b0b4-4d4c-ba87-0699e76bf30c"
 ```
 Document the blank-slate restart contract:
 - `wiki/OpsBridge.md` — restart vs maintenance cleanup vs up/down.
 - `bridge restart --help` — mention conditional remote stale-forward cleanup.
 - Short "host roles" subsection: laptop origin, haskelseed intermittency, VPS
  maintenance — matching this workplan's topology section.
 - Cross-link from `state-hub` `STATE-WP-0063` / `history/20260621-weekend-automation-assessment.md`
  incident note (one line each way).
 ## T4 — Optional: reconnect-loop hygiene (stretch)
 ```task
 id: BRIDGE-WP-0005-T04
 status: cancel
 priority: low
 state_hub_task_id: "518f1b5e-3098-42aa-9662-bdab1d7d269b"
 ```
 Evaluate whether `TunnelManager` reconnect backoff should invoke remote cleanup
 once after repeated exit-255 bind failures (laptop wake without operator running
 `bridge restart`). Defer until T1–T3 are done; mark `cancel` if heuristic risk
 outweighs benefit.
 **Decision (2026-06-21): cancelled for now.** Auto-cleanup inside the reconnect
 loop risks killing a healthy forward that looks orphaned but is owned by another session
 or operator. `bridge restart` now covers the operator-facing blank-slate path;
 nightly `maintenance cleanup --restart` covers unattended hygiene. Revisit only if
 wake-from-sleep reconnect failures remain frequent after a month of observation.
 ## T5 — Live verification on workstation + VPS
 ```task
 id: BRIDGE-WP-0005-T05
 status: done
 priority: medium
 state_hub_task_id: "b5d305ef-5b5d-4afe-a992-e0960d07af79"
 ```
 After T1–T2 ship, verify on real config:
 1. **railiance01** — `state-hub-mcp-railiance01` was `reconnecting` with stale
   forward; `bridge restart` reported `cleaned_and_restarted` and tunnel reached
   `connected`.
 2. **haskelseed** — not exercised (all tunnels already healthy); Alpine netstat
   path unchanged from ADHOC-2026-06-14 and covered by existing cleanup tests.
 3. **coulombcore** — `bridge restart state-hub-coulombcore` reported `healthy`,
   PID unchanged (4116), forward undisturbed.
 State Hub progress logged (2026-06-21). Workplan marked `finished`.
--- a/workplans/OPS-WP-0001-diagnostics.md
+++ b/workplans/OPS-WP-0001-diagnostics.md
@@ -2,7 +2,7 @@
 id: OPS-WP-0001
 type: workplan
 title: "ops-bridge diagnostics and flow improvements"
-domain: custodian
+domain: infotech
 repo: ops-bridge
 status: done
 owner: claude
--- a/workplans/OPS-WP-0002-agent-usability.md
+++ b/workplans/OPS-WP-0002-agent-usability.md
@@ -2,13 +2,13 @@
 id: OPS-WP-0002
 type: workplan
 title: "Agent Usability — MCP Registration, Skill, and Worker Orientation"
-domain: custodian
+domain: infotech
 repo: ops-bridge
-status: active
+status: done
 owner: custodian
 topic_slug: custodian
 created: "2026-03-21"
-updated: "2026-03-21"
+updated: "2026-03-26"
 depends_on: OPS-WP-0001
 state_hub_workstream_id: "c195cc40-8be7-462e-be26-a7d6bda34cd5"
 ---
@@ -74,7 +74,7 @@ worker agents:
 ```task
 id: OPS-WP-0002-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "27fc6fa1-6d0e-438a-b4a3-c6091931da88"
 ```
@@ -101,7 +101,7 @@ Gate: `bridge_status()` tool callable via SSE on localhost:8002 after
 ```task
 id: OPS-WP-0002-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "2216457d-035e-4804-b685-18975f3c6d1f"
 ```
@@ -133,7 +133,7 @@ mcp-http`.
 ```task
 id: OPS-WP-0002-T03
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "4b2e39eb-4585-4e60-ab16-9e7909eced74"
 ```
@@ -178,7 +178,7 @@ identifies and recovers a manually-stopped tunnel.
 ```task
 id: OPS-WP-0002-T04
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "cc64bb07-ea5d-498a-8c14-bb653581efe7"
 ```
@@ -213,9 +213,9 @@ session protocol references bridge status check.
 ## Done Criteria
- [ ] `make mcp-http` starts the MCP server on port 8002 (SSE)
+- [x] `make mcp-http` starts the MCP server on port 8002 (SSE)
- [ ] `bridge_status` and `bridge_check` callable as MCP tools from Claude Code
+- [x] `bridge_status` and `bridge_check` callable as MCP tools from Claude Code
- [ ] `ops-bridge` registered in `~/.claude.json` at user scope
+- [x] `ops-bridge` registered in `~/.claude.json` at user scope
- [ ] `/bridge` skill surfaces tunnel states and recovers a stopped tunnel
+- [x] `/bridge` skill surfaces tunnel states and recovers a stopped tunnel
- [ ] Global CLAUDE.md has worker agent bridge protocol
+- [x] Global CLAUDE.md has worker agent bridge protocol
- [ ] All existing tests pass after T01 changes (`make test`)
+- [x] All existing tests pass after T01 changes (`make test`)
Author	SHA1	Message	Date
tegwick	6572a2ac99	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-07-03: - update .custodian-brief.md for ops-bridge	2026-07-03 18:52:51 +02:00
tegwick	ce0aa728b1	tunnels: optional remote_host forward destination (default 127.0.0.1) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-02 14:18:18 +02:00
tegwick	00671f5133	Normalize agent instructions and workplan frontmatter (STATE-WP-0067) - Align agent files with on-disk workplan prefixes (infer from workplan ids) - Set workplan domain to registered domain_slug; add topic_slug where applicable - Repair frontmatter delimiter formatting; migrate legacy task status literals - Regenerate AGENTS.md, CLAUDE.md, and .claude/rules from State Hub templates	2026-06-22 23:16:27 +02:00
tegwick	09f2cd4b7a	Mark .repo-classification.yaml human-reviewed (CUST-WP-0050 T02) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 11:40:44 +02:00
tegwick	c3b4fb9d55	Reclassify as tooling (CUST-WP-0050 T02) Apply the new 'tooling' category (reusable internal tooling/infrastructure) from the Repo Classification Standard. First-pass agent classification. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 03:06:02 +02:00
tegwick	fab7409c66	Add repo classification (CUST-WP-0050 T02) First-pass agent classification per the Repo Classification Standard v1.0 (canon-repo-classification); pending human review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 02:44:47 +02:00
tegwick	1dd664c792	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-06-21: - update .custodian-brief.md for ops-bridge	2026-06-21 20:12:38 +02:00
tegwick	10c6fdaec9	feat(restart): route reverse tunnels through stale-forward cleanup bridge restart now means blank-slate recovery: reverse tunnels run should_cleanup_tunnel and clear orphan remote listeners before reconnecting; healthy forwards are left running. Local-direction tunnels keep stop/start only. CLI and MCP report per-tunnel actions (healthy, cleaned_and_restarted, restarted, error) and exit non-zero on cleanup failure. Closes BRIDGE-WP-0005.	2026-06-21 20:12:13 +02:00
tegwick	8c11acc00c	docs(ops-bridge): BRIDGE-WP-0005 restart includes remote cleanup Add workplan to make bridge restart perform conditional stale-forward cleanup before start (blank-slate recovery). Refines topology for laptop workstation origin, intermittently offline haskelseed, and stable VPS remotes (coulombcore, railiance01). Origin: STATE-WP-0063 tunnel incident. Registered in State Hub via fix-consistency.	2026-06-21 20:02:18 +02:00
tegwick	499b8781cc	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-06-21: - update .custodian-brief.md for ops-bridge	2026-06-21 20:02:10 +02:00
tegwick	4e9882909f	feat(maintenance): nightly stale SSH forward cleanup at 03:00 Add bridge maintenance cleanup to detect reverse tunnels whose remote port is bound but no longer forwards (zombie sshd sessions), kill the stale listeners on the remote host, and optionally restart the tunnel. Includes install-cron/uninstall-cron/show-cron helpers and README notes for the actcore-state-hub-bridge failure mode we hit on railiance01.	2026-06-19 15:59:27 +02:00
tegwick	a6857fb8f7	Add credential routing instructions for all agent runtimes Propagate shared credential-routing section (Codex, Claude, Grok, llm-connect) from state-hub template via scripts/propagate_credential_routing.py.	2026-06-18 22:48:39 +02:00
tegwick	675772ab3b	Add capability registry scaffold (REUSE-WP-0014-T06 B04)	2026-06-16 01:55:58 +02:00
tegwick	6eb0b1c52f	Fixing bridge to haskelseed	2026-06-14 19:46:06 +02:00
tegwick	d949f3e93e	Refresh agent instruction files	2026-05-18 16:55:47 +02:00
tegwick	de984736ca	feat(cli): add `bridge conventions` and link from actor errors Surfaces the actor naming rules (adm-/agt-/atm- prefixes, legacy class aliases) so users hitting a ConfigError have an in-CLI way to read the spec without grepping the wiki. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 23:21:37 +02:00
tegwick	28ecef121e	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-05-15: - update .custodian-brief.md for ops-bridge	2026-05-15 12:19:50 +02:00
tegwick	860c08f1db	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-05-15: - update .custodian-brief.md for ops-bridge	2026-05-15 09:39:01 +02:00
tegwick	bd169a07e2	feat(directive): implement BRIDGE-WP-0004 AccessManagementDirective alignment - ActorType enum (adm/agt/atm) replaces actor_class string; config validates naming convention (adm-/agt-/atm-*) with hard ConfigError on mismatch; legacy 'human'/'automation' values accepted with DeprecationWarning - cert_command: pluggable shell string run before each SSH launch; cert written to state dir; -i cert appended to SSH command alongside -i key - TTL-aware cert refresh: parses Valid-to via ssh-keygen -L; pre-emptive restart 5 min before expiry (no backoff, no attempt increment); CERT_EXPIRING logged - CertAcquisitionError: cert failures trigger normal backoff/retry loop - cert_identity: Key ID parsed from cert and recorded in BRIDGE_CONNECTED event - bridge cert-status: new CLI command; exit 1 on expired cert; --json flag - 233 tests passing, ruff clean Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 09:38:29 +02:00
tegwick	22601ef3e6	chore(workplans): sync BRIDGE-WP-0004 and WARDEN-WP-0001 tasks to state hub Both workplans had been registered as active workstreams but tasks were never ingested — the markdown checkbox format was invisible to the consistency checker, which requires task code blocks. Activated both workplans (draft→active) and added task blocks with state_hub_task_id for all 19 tasks (9 + 10). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 00:29:51 +02:00
tegwick	569de1497c	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-05-06: - update .custodian-brief.md for ops-bridge	2026-05-06 04:24:17 +02:00
tegwick	fafd04ed2e	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-05-06: - update .custodian-brief.md for ops-bridge	2026-05-06 02:41:26 +02:00
tegwick	c1d87b47df	Added INTENT.md file	2026-05-02 23:17:22 +02:00
tegwick	204bf48bc8	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-05-01: - update .custodian-brief.md for ops-bridge	2026-05-01 23:22:08 +02:00
tegwick	595c495f7c	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-05-01: - update .custodian-brief.md for ops-bridge	2026-05-01 23:07:50 +02:00
tegwick	90eda27a14	Scope update from repo-scoping refactor	2026-05-01 12:28:27 +02:00
tegwick	1361727e15	Added untracked workplans	2026-04-25 17:06:05 +02:00
tegwick	18e3c118dd	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-04-21: - update .custodian-brief.md for ops-bridge	2026-04-21 02:14:25 +02:00
Bernd Worsch	621de64ee0	chore: merge origin/main — reconcile divergent branches Integrates remote changes (session protocol, .custodian-brief.md, MCP SSE/HTTP mode, workplan OPS-WP-0002 completion) with local changes (AccessManagementDirective alignment, architecture docs, BRIDGE-WP-0004 and WARDEN-WP-0001 workplans). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 01:05:11 +00:00
Bernd Worsch	f3a7236c5d	docs: align architecture and scope with AccessManagementDirective Expands architecture constraints and SCOPE.md to reflect the three-actor vocabulary (adm/agt/atm), two credential modes (static key + cert_command), and ops-warden boundary. Adds directive wiki doc and two new workplans (BRIDGE-WP-0004 directive alignment, WARDEN-WP-0001 ops-warden bootstrap). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 00:59:38 +00:00
tegwick	4f3c8646b3	feat(mcp): SSE/HTTP mode, workplan OPS-WP-0002 done - Add --http flag to MCP server for SSE transport on port 8002 - Add make mcp-http / mcp-stop targets - Pin fastmcp<3.1.0 to stabilize dependency - Update session-protocol: Step 0 tunnel health check before orient - Mark OPS-WP-0002 and all its tasks done Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 14:10:49 +01:00
tegwick	431beef31b	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-03-26: - update .custodian-brief.md for ops-bridge	2026-03-26 22:46:07 +01:00
tegwick	1c7c6eedf8	chore(session): read .custodian-brief.md before MCP call in session init Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 17:48:52 +01:00
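
Commit bd169a07e2 above describes a TTL-aware cert refresh: the expiry is parsed from `ssh-keygen -L` output and the tunnel is restarted pre-emptively 5 minutes before it. The following is only a minimal sketch of that idea, not the code in this repo. The names `parse_valid_to`, `should_refresh`, and `REFRESH_MARGIN` are hypothetical, and the sketch assumes the usual `Valid: from <start> to <end>` line that OpenSSH prints, with timestamps in local time.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches the validity line printed by `ssh-keygen -L -f <cert>`, for example:
#   Valid: from 2026-05-15T09:00:00 to 2026-05-15T10:00:00
#   Valid: before 2026-05-15T10:00:00
# "Valid: forever" and "Valid: after ..." carry no expiry and do not match.
_VALID_TO_RE = re.compile(r"Valid:\s+(?:from\s+\S+\s+to|before)\s+(\S+)")

# Hypothetical constant: the commit message states a 5 minute margin.
REFRESH_MARGIN = timedelta(minutes=5)


def parse_valid_to(keygen_output: str) -> Optional[datetime]:
    """Return the certificate expiry, or None if no expiry is listed."""
    match = _VALID_TO_RE.search(keygen_output)
    if match is None:
        return None
    # ssh-keygen prints naive local timestamps without a UTC offset.
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")


def should_refresh(
    valid_to: Optional[datetime],
    now: datetime,
    margin: timedelta = REFRESH_MARGIN,
) -> bool:
    """True once `now` is within `margin` of the expiry (or past it)."""
    if valid_to is None:
        return False  # certificate without expiry: never refresh on TTL
    return now >= valid_to - margin


if __name__ == "__main__":
    sample = "        Valid: from 2026-05-15T09:00:00 to 2026-05-15T10:00:00\n"
    expiry = parse_valid_to(sample)
    print(expiry, should_refresh(expiry, datetime(2026, 5, 15, 9, 56)))
```

In a real supervisor loop the text would come from running `ssh-keygen -L -f <cert path>`; the sketch keeps parsing separate from the subprocess call so the decision logic can be tested without a certificate on disk.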