the-custodian/workplans/CUST-WP-0021-multi-host-repo-paths.md
tegwick 27eb6b14ad feat(CUST-WP-0021): multi-host repo path hardening — all 5 tasks complete
- T01 (done prior): registered host_paths for bnt-lap001 (14 repos) and
  COULOMBCORE (6 repos) via POST /repos/{slug}/paths/
- T02: validate_repo_adr now accepts repo_slug (not raw path); resolves
  local path via host_paths[hostname] → local_path; clear error for
  unregistered/missing paths
- T03: ingest_sbom_tool lockfile_path is now optional and relative to
  resolved repo root; absolute paths accepted with deprecation warning
- T04: check_repo_consistency pre-flight guard — fetches repo, resolves
  path, returns clear error before spawning subprocess if path missing
- T05: TOOLS.md — updated validate_repo_adr row (slug not path);
  added Multi-Host & Remote Agent Usage section documenting design
  boundary, remote agent workflow, and update_repo_path usage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 22:53:25 +01:00


---
id: CUST-WP-0021
type: workplan
title: "State Hub — Multi-Host Repo Path Hardening"
domain: custodian
status: done
owner: custodian
topic_slug: custodian
created: "2026-03-18"
updated: "2026-03-18"
state_hub_workstream_id: "516ca332-5eac-4d6e-8bf9-b2694ed34276"
---
# State Hub — Multi-Host Repo Path Hardening
## Summary
When a kaizen-agentic worker on COULOMBCORE calls file-system-touching MCP
tools (validate_repo_adr, check_repo_consistency, ingest_sbom), the MCP server
runs those scripts on its own machine (bnt-lap001) against its own copy of the
repo. Two problems emerge:
1. **Wrong path**: `validate_repo_adr` and `ingest_sbom_tool` take raw
filesystem paths as input. A remote agent passes its own local path
(e.g. `/home/tegwick/the-custodian`), which does not exist on the server.
2. **Stale state**: Even when path resolution works, the server runs against
its own checkout. A remote agent whose branch is ahead of the server's
checkout gets misleading results.
## Design Boundary (documented, not fixed)
The MCP server runs as a subprocess on bnt-lap001 and can only read files from
bnt-lap001's filesystem, so file-system tools always operate against the
server's copy. A remote agent working on a branch that is ahead of the server
should either push/sync first or run `consistency_check.py` locally with
`--api-base http://127.0.0.1:18000` instead of calling the MCP tool.
This rule is documented in TOOLS.md.
## Resolves
- validate_repo_adr raw-path input (broken for remote callers)
- ingest_sbom_tool raw lockfile path (same problem)
- host_paths empty for all repos except kaizen-agentic
- No error when the server lacks the repo; results are silently wrong
---
## Tasks
### T01 — Register COULOMBCORE host_paths for all repos it has
```task
id: CUST-WP-0021-T01
status: done
priority: high
state_hub_task_id: "cf2c0449-9250-4425-925b-302482d75a11"
```
COULOMBCORE hostname: `254.130.205.92.host.secureserver.net`
Repos present at `/home/tegwick/<slug>`:
| repo slug | COULOMBCORE path |
|-----------|-----------------|
| the-custodian | /home/tegwick/the-custodian |
| kaizen-agentic | /home/tegwick/kaizen-agentic |
| ops-bridge | /home/tegwick/ops-bridge |
| marki-docx | /home/tegwick/marki-docx |
| railiance-cluster | /home/tegwick/railiance-cluster |
| railiance-infra | /home/tegwick/railiance-infra |
Also register bnt-lap001 paths for all repos currently using only `local_path`
(migrate them into `host_paths` so the map is the canonical source):
| repo slug | bnt-lap001 path |
|-----------|----------------|
| the-custodian | /home/worsch/the-custodian |
| kaizen-agentic | /home/worsch/kaizen-agentic |
| ops-bridge | /home/worsch/ops-bridge |
| activity-core | /home/worsch/activity-core |
| markitect-project | /home/worsch/markitect_project |
| railiance-apps | /home/worsch/railiance-apps |
| railiance-cluster | /home/worsch/railiance-cluster |
| railiance-bootstrap | /home/worsch/railiance-cluster |
| railiance-enablement | /home/worsch/railiance-enablement |
| railiance-hosts | /home/worsch/railiance-infra |
| railiance-infra | /home/worsch/railiance-infra |
| railiance-platform | /home/worsch/railiance-platform |
| key-cape | /home/worsch/key-cape |
| net-kingdom | /home/worsch/net-kingdom |
Use `POST /repos/{slug}/paths/` with `{"host": "<hostname>", "path": "<path>"}`.
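The registration call can be scripted rather than issued by hand. The sketch below assumes the State Hub API is reachable at `http://127.0.0.1:18000` (the `--api-base` value used elsewhere in this workplan), that the endpoint accepts a plain JSON body without authentication, and that `build_path_request` and `register_paths` are hypothetical helper names; only two rows of the COULOMBCORE table are shown.

```python
import json
import urllib.request

# Assumption: State Hub API base URL, taken from the --api-base value in this
# workplan; adjust for the actual deployment.
API_BASE = "http://127.0.0.1:18000"

# Subset of the T01 COULOMBCORE table; extend with the remaining rows.
COULOMBCORE_HOST = "254.130.205.92.host.secureserver.net"
COULOMBCORE_PATHS = {
    "the-custodian": "/home/tegwick/the-custodian",
    "kaizen-agentic": "/home/tegwick/kaizen-agentic",
}


def build_path_request(api_base: str, slug: str, host: str, path: str) -> urllib.request.Request:
    """Hypothetical helper: build POST /repos/{slug}/paths/ with a JSON body."""
    body = json.dumps({"host": host, "path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/repos/{slug}/paths/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_paths(api_base: str, host: str, paths: dict[str, str]) -> None:
    """Hypothetical helper: send one registration request per repo."""
    for slug, path in paths.items():
        req = build_path_request(api_base, slug, host, path)
        with urllib.request.urlopen(req) as resp:
            print(f"{slug}: HTTP {resp.status}")


# Usage (performs real HTTP calls against the hub):
#   register_paths(API_BASE, COULOMBCORE_HOST, COULOMBCORE_PATHS)
```

Separating request construction from sending keeps the payload shape easy to inspect before anything touches the live hub.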
---
### T02 — Fix validate_repo_adr: accept repo_slug, resolve path from DB
```task
id: CUST-WP-0021-T02
status: done
priority: high
state_hub_task_id: "52ed094a-4216-4cd1-a634-a29b82f26ec5"
```
Change `validate_repo_adr(repo_path: str, ...)` to
`validate_repo_adr(repo_slug: str, ...)` in `mcp_server/server.py`.
Resolution logic (same as `_kaizen_agents_dir()`):
1. Fetch repo record via `_get(f"/repos/{repo_slug}")`
2. `hostname = socket.gethostname()`
3. `path = (repo.get("host_paths") or {}).get(hostname) or repo.get("local_path") or ""`
4. If path is empty or not a directory: return a clear error message with
instructions for remote agents to run the script locally.
5. Pass the resolved path to the validate_repo_adr.py subprocess as before.
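Steps 1 to 4 can be sketched as a pure function that either returns a usable directory or an error string. This is an illustration under assumptions, not the server code: `fetch_repo` is a hypothetical stand-in for `_get(f"/repos/{repo_slug}")`, the repo record is assumed to carry `host_paths` as a hostname-to-path dict, and `resolve_repo_path` / `resolve_for_validation` are hypothetical names.

```python
import os
import socket
from typing import Callable, Optional

REMOTE_HINT = "Remote agents: run validate_repo_adr.py locally against your own checkout."


def resolve_repo_path(repo: dict, hostname: str) -> str:
    """Steps 2-3: prefer host_paths[hostname], fall back to local_path."""
    host_paths = repo.get("host_paths") or {}
    return host_paths.get(hostname) or repo.get("local_path") or ""


def resolve_for_validation(
    repo_slug: str,
    fetch_repo: Callable[[str], dict],
    hostname: Optional[str] = None,
) -> tuple[Optional[str], Optional[str]]:
    """Return (path, None) on success or (None, error_message) on failure."""
    repo = fetch_repo(repo_slug)  # stands in for _get(f"/repos/{repo_slug}")
    hostname = hostname or socket.gethostname()
    path = resolve_repo_path(repo, hostname)
    # Step 4: an empty path and a non-directory path are both rejected.
    if not path or not os.path.isdir(path):
        return None, (
            f"No usable path for repo '{repo_slug}' on host '{hostname}' "
            f"(resolved: {path!r}). {REMOTE_HINT}"
        )
    # Step 5: the caller passes `path` to the validate_repo_adr.py subprocess.
    return path, None
```

Injecting `fetch_repo` and `hostname` keeps the resolution testable without a running hub or a particular machine name.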
Update the tool docstring to document the new parameter and the design
boundary (tool always runs against the server's copy).
---
### T03 — Fix ingest_sbom_tool: resolve lockfile via repo_slug + relative path
```task
id: CUST-WP-0021-T03
status: done
priority: medium
state_hub_task_id: "2a67d5e2-f581-490c-b42e-0f7d37979c0a"
```
Change `ingest_sbom_tool(repo_slug, lockfile_path: str)` so that `lockfile_path`
becomes optional and, when provided, is interpreted as **relative to the repo
root** rather than as an absolute path. When omitted, the script auto-detects
the lockfile (existing behaviour with `--repo-path`).
Resolution logic:
1. Fetch repo record, resolve path via `host_paths[hostname]` / `local_path`
2. If path empty/missing: return clear error
3. If `lockfile_path` provided and is relative: join with resolved repo root
4. If `lockfile_path` is absolute: use as-is (backward compat), but emit a
deprecation warning in the result string
5. Pass `--repo-path <resolved>` and optionally `--lockfile <lockfile>` to script
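Steps 3 to 5 reduce to building the script's argument list once the repo root is known. The sketch assumes steps 1 and 2 have already produced a valid `repo_root`; `build_sbom_args` is a hypothetical name, and the warning text is illustrative, not the tool's actual wording.

```python
import os
from typing import Optional


def build_sbom_args(repo_root: str, lockfile_path: Optional[str] = None) -> tuple[list[str], Optional[str]]:
    """Build script arguments for steps 3-5; return (args, deprecation_warning)."""
    args = ["--repo-path", repo_root]
    warning = None
    if lockfile_path:
        if os.path.isabs(lockfile_path):
            # Step 4: absolute paths still work but are flagged as deprecated.
            lockfile = lockfile_path
            warning = (
                f"DEPRECATED: absolute lockfile_path {lockfile_path!r}; "
                "pass a path relative to the repo root instead."
            )
        else:
            # Step 3: relative paths are joined onto the resolved repo root.
            lockfile = os.path.join(repo_root, lockfile_path)
        args += ["--lockfile", lockfile]
    # With no lockfile_path, only --repo-path is passed and the script
    # auto-detects the lockfile.
    return args, warning
```

Returning the warning separately lets the tool append it to its result string without mixing it into the subprocess output.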
---
### T04 — Add host-path guard to check_repo_consistency
```task
id: CUST-WP-0021-T04
status: done
priority: medium
state_hub_task_id: "3885497e-c491-4ddf-811f-0f0d19a0fc42"
```
`check_repo_consistency` already resolves paths correctly via the script, but
when `host_paths` has no entry for this host and `local_path` is absent, the
script silently skips file checks. Add a pre-flight guard in the MCP tool:
1. Fetch the repo record before spawning the subprocess
2. Run the `resolve_repo_path` logic: `host_paths[hostname]` → `local_path`
3. If empty: return an early error message:
   ```
   ⚠ No path registered for this host (bnt-lap001).
   Register with: update_repo_path(repo_slug, "/path/to/repo")
   Remote agents: run consistency_check.py locally with --api-base http://127.0.0.1:18000
   ```
4. If the path is set but the directory does not exist: return the same error (an empty-string check alone is not enough)
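The guard can be sketched as a function that returns `None` when the subprocess may run and the early error message otherwise. This assumes the repo record has already been fetched (step 1) and carries `host_paths` as a dict; `preflight_guard` is a hypothetical name, and the message mirrors the template above with the real hostname and slug substituted.

```python
import os
import socket
from typing import Optional


def preflight_guard(repo_slug: str, repo: dict, hostname: Optional[str] = None) -> Optional[str]:
    """Return an error message if no usable path exists on this host, else None."""
    hostname = hostname or socket.gethostname()
    host_paths = repo.get("host_paths") or {}
    path = host_paths.get(hostname) or repo.get("local_path") or ""
    # Steps 3-4: an empty path and a path whose directory is missing
    # produce the same early error, before any subprocess is spawned.
    if path and os.path.isdir(path):
        return None
    return (
        f"⚠ No path registered for this host ({hostname}).\n"
        f'Register with: update_repo_path("{repo_slug}", "/path/to/repo")\n'
        "Remote agents: run consistency_check.py locally with "
        "--api-base http://127.0.0.1:18000"
    )
```

In the MCP tool, a non-`None` result would be returned to the caller immediately instead of spawning `consistency_check.py`.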
---
### T05 — Document design boundary in TOOLS.md
```task
id: CUST-WP-0021-T05
status: done
priority: low
state_hub_task_id: "b4ba8cd0-1093-43ce-8a5d-03529e3c0588"
```
Add a section to `state-hub/mcp_server/TOOLS.md` under a new heading
"## Multi-Host & Remote Agent Usage" that explains:
- File-system tools (validate_repo_adr, check_repo_consistency, ingest_sbom)
always execute on the MCP server machine against its registered path
- Remote agents on a different branch, or ahead of the server's checkout,
should sync first OR run the scripts locally with `--api-base http://127.0.0.1:18000`
- How to register a new host's path: `update_repo_path(slug, path, host)`
- Pure-API tools (get_state_summary, create_task, etc.) work from any host