chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27: - update .custodian-brief.md for the-custodian
This commit is contained in:
@@ -1,12 +1,15 @@
|
|||||||
<!-- custodian-brief: generated by fix-consistency — do not edit manually -->
|
<!-- custodian-brief: generated by fix-consistency — do not edit manually -->
|
||||||
# Custodian Brief — the-custodian
|
# Custodian Brief — the-custodian
|
||||||
|
|
||||||
**Domain:** custodian
|
**Domain:** railiance
|
||||||
**Last synced:** 2026-03-26 23:51 UTC
|
**Last synced:** 2026-03-26 23:52 UTC
|
||||||
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
|
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
|
||||||
|
|
||||||
## Active Workstreams
|
## Active Workstreams
|
||||||
|
|
||||||
|
### Cross-Repo E2E Sandbox Framework
|
||||||
|
Progress: 5/5 done | workstream_id: `b68de20b-e397-4f97-b1be-ad30711fc2a6`
|
||||||
|
|
||||||
### FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub
|
### FOS Hub Bootstrap — Identity, Hub Extraction, Ops Hub, Fin Hub
|
||||||
Progress: 0/26 done | workstream_id: `293a74fe-a85a-4ad6-8933-23d52a72fe8b`
|
Progress: 0/26 done | workstream_id: `293a74fe-a85a-4ad6-8933-23d52a72fe8b`
|
||||||
|
|
||||||
@@ -61,6 +64,6 @@ Progress: 0/9 done | workstream_id: `9cc32158-2f5c-4ef6-9713-aacce4623d5e`
|
|||||||
## MCP Orientation (when available)
|
## MCP Orientation (when available)
|
||||||
|
|
||||||
If the state-hub MCP server is reachable, call:
|
If the state-hub MCP server is reachable, call:
|
||||||
`get_domain_summary("custodian")`
|
`get_domain_summary("railiance")`
|
||||||
This provides richer cross-domain context.
|
This provides richer cross-domain context.
|
||||||
If the MCP call fails, use this file as your orientation source.
|
If the MCP call fails, use this file as your orientation source.
|
||||||
|
|||||||
58
Makefile
Normal file
58
Makefile
Normal file
@@ -0,0 +1,58 @@
|
|||||||
|
# the-custodian top-level Makefile
|
||||||
|
|
||||||
|
## Run e2e tests for a repo in a remote sandbox
|
||||||
|
## Usage: make e2e REPO=activity-core
|
||||||
|
## Requires: RAILIANCE01_HOST env var (or pass HOST=<ip>)
|
||||||
|
##
|
||||||
|
## Options:
|
||||||
|
## REPO=<slug> repository name under ~/ (required)
|
||||||
|
## HOST=<host> override RAILIANCE01_HOST
|
||||||
|
## USER=root SSH user (default: root)
|
||||||
|
## KEY= path to SSH key (optional)
|
||||||
|
## KEEP= set to 1 to keep sandbox after run
|
||||||
|
## WORKSTREAM_ID= state-hub workstream ID for progress event
|
||||||
|
|
||||||
|
REPO_PATH := $(HOME)/$(REPO)
|
||||||
|
|
||||||
|
ifdef HOST
|
||||||
|
E2E_HOST_FLAG := --host $(HOST)
|
||||||
|
else
|
||||||
|
E2E_HOST_FLAG :=
|
||||||
|
endif
|
||||||
|
|
||||||
|
ifdef USER
|
||||||
|
E2E_USER_FLAG := --user $(USER)
|
||||||
|
else
|
||||||
|
E2E_USER_FLAG :=
|
||||||
|
endif
|
||||||
|
|
||||||
|
ifdef KEY
|
||||||
|
E2E_KEY_FLAG := --key $(KEY)
|
||||||
|
else
|
||||||
|
E2E_KEY_FLAG :=
|
||||||
|
endif
|
||||||
|
|
||||||
|
ifdef KEEP
|
||||||
|
E2E_KEEP_FLAG := --keep
|
||||||
|
else
|
||||||
|
E2E_KEEP_FLAG :=
|
||||||
|
endif
|
||||||
|
|
||||||
|
ifdef WORKSTREAM_ID
|
||||||
|
E2E_WS_FLAG := --workstream-id $(WORKSTREAM_ID)
|
||||||
|
else
|
||||||
|
E2E_WS_FLAG :=
|
||||||
|
endif
|
||||||
|
|
||||||
|
.PHONY: e2e
|
||||||
|
e2e:
|
||||||
|
@test -n "$(REPO)" || (echo "ERROR: REPO is required. Usage: make e2e REPO=activity-core"; exit 1)
|
||||||
|
@test -d "$(REPO_PATH)" || (echo "ERROR: repo path does not exist: $(REPO_PATH)"; exit 1)
|
||||||
|
@test -f "$(REPO_PATH)/e2e/e2e.yml" || (echo "ERROR: no e2e/e2e.yml in $(REPO_PATH)"; exit 1)
|
||||||
|
cd "$(CURDIR)" && python3 -m e2e_framework \
|
||||||
|
$(REPO_PATH) \
|
||||||
|
$(E2E_HOST_FLAG) \
|
||||||
|
$(E2E_USER_FLAG) \
|
||||||
|
$(E2E_KEY_FLAG) \
|
||||||
|
$(E2E_KEEP_FLAG) \
|
||||||
|
$(E2E_WS_FLAG)
|
||||||
97
e2e-framework/RUNBOOK.md
Normal file
97
e2e-framework/RUNBOOK.md
Normal file
@@ -0,0 +1,97 @@
|
|||||||
|
# E2E Sandbox Framework — Runbook
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
**Workstation:**
|
||||||
|
- `ssh` + `rsync` available
|
||||||
|
- `python3` + `pyyaml` available (or `uv run`)
|
||||||
|
- State-hub running on `:8000` (for result reporting)
|
||||||
|
|
||||||
|
**Sandbox host (railiance01):**
|
||||||
|
- SSH key access
|
||||||
|
- Docker + docker compose plugin installed
|
||||||
|
- Sufficient disk for images (~4 GB for activity-core stack)
|
||||||
|
|
||||||
|
## First run
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Set sandbox host (once, or add to ~/.bashrc / .env)
|
||||||
|
export RAILIANCE01_HOST=<ip-or-alias> # e.g. 92.205.130.254
|
||||||
|
export RAILIANCE01_USER=root # optional, default=root
|
||||||
|
export RAILIANCE01_KEY=~/.ssh/id_rsa # optional, uses ssh default otherwise
|
||||||
|
|
||||||
|
# From the-custodian:
|
||||||
|
make e2e REPO=activity-core
|
||||||
|
```
|
||||||
|
|
||||||
|
Output will show each step: rsync → compose up → health wait → tests → compose down.
|
||||||
|
Exit code is 0 (all passed) or 1 (any failure).
|
||||||
|
|
||||||
|
## Options
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Keep sandbox alive after run (for debugging)
|
||||||
|
make e2e REPO=activity-core KEEP=1
|
||||||
|
|
||||||
|
# Override host without env var
|
||||||
|
make e2e REPO=activity-core HOST=192.168.1.50
|
||||||
|
|
||||||
|
# Attach result to a specific state-hub workstream
|
||||||
|
make e2e REPO=activity-core WORKSTREAM_ID=<uuid>
|
||||||
|
|
||||||
|
# Skip posting to state-hub
|
||||||
|
cd the-custodian && python3 -m e2e_framework ~/activity-core --no-report
|
||||||
|
```
|
||||||
|
|
||||||
|
## Adding a new repo
|
||||||
|
|
||||||
|
1. Create `<repo>/e2e/e2e.yml`:
|
||||||
|
```yaml
|
||||||
|
name: <repo-slug>
|
||||||
|
compose_file: docker-compose.dev.yml # or e2e/compose.yml
|
||||||
|
health_checks:
|
||||||
|
- name: <service>
|
||||||
|
url: http://localhost:<port>
|
||||||
|
timeout: 120
|
||||||
|
test_command: uv run python -m pytest e2e/tests/ -v
|
||||||
|
timeout: 300
|
||||||
|
cleanup: always
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Add `<repo>/e2e/tests/test_*.py` — test scripts that exit 0 on success.
|
||||||
|
|
||||||
|
3. Run: `make e2e REPO=<repo>`
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
**Sandbox not cleaned up:**
|
||||||
|
```bash
|
||||||
|
ssh root@$RAILIANCE01_HOST 'ls /tmp/custodian-e2e/'
|
||||||
|
ssh root@$RAILIANCE01_HOST 'docker compose ls'
|
||||||
|
# Manually clean:
|
||||||
|
ssh root@$RAILIANCE01_HOST 'docker compose -p e2e-activity-core-<id> down -v; rm -rf /tmp/custodian-e2e/<id>'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Temporal startup slow (>2 min):**
|
||||||
|
Elasticsearch takes 60–90 seconds. The health check waits up to 180s.
|
||||||
|
If it times out, check:
|
||||||
|
```bash
|
||||||
|
ssh root@$RAILIANCE01_HOST 'docker logs temporal-elasticsearch | tail -20'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Worker fails to start:**
|
||||||
|
Check that `uv` is installed on the sandbox host:
|
||||||
|
```bash
|
||||||
|
ssh root@$RAILIANCE01_HOST 'which uv || curl -LsSf https://astral.sh/uv/install.sh | sh'
|
||||||
|
```
|
||||||
|
|
||||||
|
**rsync excluded paths:**
|
||||||
|
`.git`, `__pycache__`, `*.pyc`, `.venv`, `node_modules` are excluded.
|
||||||
|
This means `uv sync` runs on the remote after rsync (handled by `uv run`).
|
||||||
|
|
||||||
|
## Architecture notes
|
||||||
|
|
||||||
|
- Sandbox isolation: docker compose project name `e2e-{repo}-{sandbox_id}`
|
||||||
|
- Sandbox dir: `/tmp/custodian-e2e/{sandbox_id}/`
|
||||||
|
- No port conflicts: each sandbox uses its own docker network
|
||||||
|
- Parallel runs of the same repo are safe (different sandbox_id)
|
||||||
1
e2e-framework/__init__.py
Normal file
1
e2e-framework/__init__.py
Normal file
@@ -0,0 +1 @@
|
|||||||
|
"""Custodian cross-repo e2e sandbox framework."""
|
||||||
2
e2e-framework/__main__.py
Normal file
2
e2e-framework/__main__.py
Normal file
@@ -0,0 +1,2 @@
|
|||||||
|
from .cli import main
|
||||||
|
main()
|
||||||
77
e2e-framework/cli.py
Normal file
77
e2e-framework/cli.py
Normal file
@@ -0,0 +1,77 @@
|
|||||||
|
"""
|
||||||
|
Entry point: python -m e2e_framework <repo-path> [options]
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python -m e2e_framework ~/activity-core
|
||||||
|
python -m e2e_framework ~/activity-core --host 92.205.130.254
|
||||||
|
python -m e2e_framework ~/activity-core --host railiance01 --keep
|
||||||
|
make e2e REPO=activity-core (from the-custodian/)
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from .runner import run_e2e
|
||||||
|
from .reporter import report
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description="Run e2e tests in a remote sandbox")
|
||||||
|
parser.add_argument("repo_path", help="Path to the repo containing e2e/e2e.yml")
|
||||||
|
parser.add_argument(
|
||||||
|
"--host",
|
||||||
|
default=os.environ.get("RAILIANCE01_HOST", ""),
|
||||||
|
help="Sandbox host (SSH alias or IP). Env: RAILIANCE01_HOST",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--user",
|
||||||
|
default=os.environ.get("RAILIANCE01_USER", "root"),
|
||||||
|
help="SSH user (default: root). Env: RAILIANCE01_USER",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--key",
|
||||||
|
default=os.environ.get("RAILIANCE01_KEY"),
|
||||||
|
help="Path to SSH private key. Env: RAILIANCE01_KEY",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--keep",
|
||||||
|
action="store_true",
|
||||||
|
help="Keep sandbox after run (skip compose down + dir removal)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--workstream-id",
|
||||||
|
default=None,
|
||||||
|
help="State-hub workstream ID to attach the progress event to",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--no-report",
|
||||||
|
action="store_true",
|
||||||
|
help="Skip posting results to state-hub",
|
||||||
|
)
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if not args.host:
|
||||||
|
print("ERROR: sandbox host required. Set RAILIANCE01_HOST or pass --host.")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
repo_path = Path(args.repo_path).expanduser().resolve()
|
||||||
|
if not repo_path.exists():
|
||||||
|
print(f"ERROR: repo path does not exist: {repo_path}")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
result = run_e2e(
|
||||||
|
repo_path=repo_path,
|
||||||
|
host=args.host,
|
||||||
|
ssh_user=args.user,
|
||||||
|
ssh_key=args.key,
|
||||||
|
keep=args.keep,
|
||||||
|
)
|
||||||
|
|
||||||
|
if not args.no_report:
|
||||||
|
report(result, workstream_id=args.workstream_id)
|
||||||
|
|
||||||
|
sys.exit(0 if result.passed else 1)
|
||||||
50
e2e-framework/reporter.py
Normal file
50
e2e-framework/reporter.py
Normal file
@@ -0,0 +1,50 @@
|
|||||||
|
"""
|
||||||
|
Push e2e run results to the state-hub as a progress event.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import urllib.request
|
||||||
|
import urllib.error
|
||||||
|
from .runner import RunResult
|
||||||
|
|
||||||
|
STATE_HUB_URL = "http://127.0.0.1:8000"
|
||||||
|
|
||||||
|
|
||||||
|
def report(result: RunResult, workstream_id: str | None = None) -> bool:
|
||||||
|
"""POST result to state-hub add_progress_event. Returns True on success."""
|
||||||
|
body = {
|
||||||
|
"event_type": "e2e_result",
|
||||||
|
"repo": result.repo,
|
||||||
|
"sandbox_id": result.sandbox_id,
|
||||||
|
"passed": result.passed,
|
||||||
|
"exit_code": result.exit_code,
|
||||||
|
"duration_s": round(result.duration_s, 1),
|
||||||
|
}
|
||||||
|
if result.error:
|
||||||
|
body["error"] = result.error
|
||||||
|
|
||||||
|
payload = {
|
||||||
|
"summary": (
|
||||||
|
f"E2E {'PASSED' if result.passed else 'FAILED'}: {result.repo} "
|
||||||
|
f"(sandbox={result.sandbox_id}, {result.duration_s:.0f}s)"
|
||||||
|
),
|
||||||
|
"details": json.dumps(body),
|
||||||
|
"event_type": "e2e_result",
|
||||||
|
}
|
||||||
|
if workstream_id:
|
||||||
|
payload["workstream_id"] = workstream_id
|
||||||
|
|
||||||
|
try:
|
||||||
|
req = urllib.request.Request(
|
||||||
|
f"{STATE_HUB_URL}/progress/",
|
||||||
|
data=json.dumps(payload).encode(),
|
||||||
|
headers={"Content-Type": "application/json"},
|
||||||
|
method="POST",
|
||||||
|
)
|
||||||
|
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||||
|
print(f"[reporter] progress event recorded (status={resp.status})")
|
||||||
|
return True
|
||||||
|
except urllib.error.URLError as exc:
|
||||||
|
print(f"[reporter] WARNING: could not reach state-hub: {exc}")
|
||||||
|
return False
|
||||||
131
e2e-framework/runner.py
Normal file
131
e2e-framework/runner.py
Normal file
@@ -0,0 +1,131 @@
|
|||||||
|
"""
|
||||||
|
Full e2e lifecycle: up → health-wait → test → down → result.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import time
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from .sandbox import Sandbox
|
||||||
|
from .schema import E2EConfig
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class RunResult:
|
||||||
|
sandbox_id: str
|
||||||
|
repo: str
|
||||||
|
passed: bool
|
||||||
|
exit_code: int
|
||||||
|
duration_s: float
|
||||||
|
output: str
|
||||||
|
error: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
def run_e2e(
|
||||||
|
repo_path: Path,
|
||||||
|
host: str,
|
||||||
|
ssh_user: str = "root",
|
||||||
|
ssh_key: str | None = None,
|
||||||
|
keep: bool = False,
|
||||||
|
) -> RunResult:
|
||||||
|
config = E2EConfig.load(repo_path)
|
||||||
|
sandbox = Sandbox(
|
||||||
|
host=host,
|
||||||
|
repo_path=repo_path,
|
||||||
|
ssh_user=ssh_user,
|
||||||
|
ssh_key=ssh_key,
|
||||||
|
)
|
||||||
|
project_name = f"e2e-{config.name}-{sandbox.sandbox_id}"
|
||||||
|
compose_path = f"{sandbox.remote_dir}/{config.compose_file}"
|
||||||
|
started = time.time()
|
||||||
|
output_lines: list[str] = []
|
||||||
|
|
||||||
|
def log(msg: str) -> None:
|
||||||
|
print(msg)
|
||||||
|
output_lines.append(msg)
|
||||||
|
|
||||||
|
log(f"\n{'='*60}")
|
||||||
|
log(f"E2E run: {config.name} sandbox={sandbox.sandbox_id}")
|
||||||
|
log(f"Host: {host} project: {project_name}")
|
||||||
|
log(f"{'='*60}\n")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# 1. Provision
|
||||||
|
sandbox.provision()
|
||||||
|
|
||||||
|
# 2. docker compose up
|
||||||
|
env_flags = " ".join(f"-e {k}={v}" for k, v in config.env.items())
|
||||||
|
up_cmd = (
|
||||||
|
f"cd {sandbox.remote_dir} && "
|
||||||
|
f"docker compose -p {project_name} -f {compose_path} up -d"
|
||||||
|
)
|
||||||
|
log(f"[runner] docker compose up ({project_name})")
|
||||||
|
rc, out = sandbox.run(up_cmd, timeout=180, stream=False)
|
||||||
|
output_lines.append(out)
|
||||||
|
if rc != 0:
|
||||||
|
raise RuntimeError(f"docker compose up failed (exit {rc}):\n{out}")
|
||||||
|
|
||||||
|
# 3. Health checks
|
||||||
|
for hc in config.health_checks:
|
||||||
|
ok = sandbox.wait_for_url(hc.url, timeout=hc.timeout)
|
||||||
|
if not ok:
|
||||||
|
raise RuntimeError(f"Health check failed: {hc.name} ({hc.url})")
|
||||||
|
|
||||||
|
# 4. Run tests
|
||||||
|
log(f"\n[runner] running: {config.test_command}")
|
||||||
|
test_cmd = f"cd {sandbox.remote_dir} && {config.test_command}"
|
||||||
|
rc, test_out = sandbox.run(test_cmd, timeout=config.timeout, stream=False)
|
||||||
|
output_lines.append(test_out)
|
||||||
|
print(test_out)
|
||||||
|
|
||||||
|
passed = rc == 0
|
||||||
|
duration = time.time() - started
|
||||||
|
log(f"\n[runner] {'PASSED' if passed else 'FAILED'} (exit={rc}, {duration:.1f}s)")
|
||||||
|
|
||||||
|
return RunResult(
|
||||||
|
sandbox_id=sandbox.sandbox_id,
|
||||||
|
repo=config.name,
|
||||||
|
passed=passed,
|
||||||
|
exit_code=rc,
|
||||||
|
duration_s=duration,
|
||||||
|
output="\n".join(output_lines),
|
||||||
|
)
|
||||||
|
|
||||||
|
except Exception as exc:
|
||||||
|
duration = time.time() - started
|
||||||
|
log(f"\n[runner] ERROR: {exc}")
|
||||||
|
return RunResult(
|
||||||
|
sandbox_id=sandbox.sandbox_id,
|
||||||
|
repo=config.name,
|
||||||
|
passed=False,
|
||||||
|
exit_code=-1,
|
||||||
|
duration_s=duration,
|
||||||
|
output="\n".join(output_lines),
|
||||||
|
error=str(exc),
|
||||||
|
)
|
||||||
|
|
||||||
|
finally:
|
||||||
|
_compose_down(sandbox, project_name, compose_path, config, keep)
|
||||||
|
|
||||||
|
|
||||||
|
def _compose_down(
|
||||||
|
sandbox: Sandbox,
|
||||||
|
project_name: str,
|
||||||
|
compose_path: str,
|
||||||
|
config: E2EConfig,
|
||||||
|
keep: bool,
|
||||||
|
) -> None:
|
||||||
|
if keep or config.cleanup == "never":
|
||||||
|
print(f"[runner] skipping cleanup (keep={keep}, cleanup={config.cleanup})")
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"[runner] docker compose down ({project_name})")
|
||||||
|
down_cmd = (
|
||||||
|
f"cd {sandbox.remote_dir} && "
|
||||||
|
f"docker compose -p {project_name} -f {compose_path} down -v --remove-orphans 2>&1 || true"
|
||||||
|
)
|
||||||
|
sandbox.run(down_cmd, timeout=60)
|
||||||
|
|
||||||
|
if not keep:
|
||||||
|
sandbox.teardown()
|
||||||
108
e2e-framework/sandbox.py
Normal file
108
e2e-framework/sandbox.py
Normal file
@@ -0,0 +1,108 @@
|
|||||||
|
"""
|
||||||
|
SSH-based sandbox: provision an isolated directory on the remote host,
|
||||||
|
rsync the repo into it, and run arbitrary commands there.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
import uuid
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Sandbox:
|
||||||
|
host: str
|
||||||
|
repo_path: Path
|
||||||
|
sandbox_id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
|
||||||
|
ssh_user: str = "root"
|
||||||
|
ssh_key: str | None = None # path to private key; None = ssh default
|
||||||
|
base_dir: str = "/tmp/custodian-e2e"
|
||||||
|
|
||||||
|
@property
|
||||||
|
def remote_dir(self) -> str:
|
||||||
|
return f"{self.base_dir}/{self.sandbox_id}"
|
||||||
|
|
||||||
|
@property
|
||||||
|
def ssh_target(self) -> str:
|
||||||
|
return f"{self.ssh_user}@{self.host}"
|
||||||
|
|
||||||
|
# ── low-level helpers ────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def _ssh_args(self) -> list[str]:
|
||||||
|
args = ["ssh", "-o", "StrictHostKeyChecking=no",
|
||||||
|
"-o", "BatchMode=yes",
|
||||||
|
"-o", "ConnectTimeout=15"]
|
||||||
|
if self.ssh_key:
|
||||||
|
args += ["-i", self.ssh_key]
|
||||||
|
args.append(self.ssh_target)
|
||||||
|
return args
|
||||||
|
|
||||||
|
def run(self, cmd: str, *, timeout: int = 60, stream: bool = False) -> tuple[int, str]:
|
||||||
|
"""Run a shell command on the remote host. Returns (exit_code, stdout+stderr)."""
|
||||||
|
full_cmd = self._ssh_args() + [cmd]
|
||||||
|
if stream:
|
||||||
|
proc = subprocess.Popen(full_cmd, text=True,
|
||||||
|
stdout=sys.stdout, stderr=sys.stderr)
|
||||||
|
proc.wait(timeout=timeout)
|
||||||
|
return proc.returncode, ""
|
||||||
|
else:
|
||||||
|
result = subprocess.run(full_cmd, capture_output=True, text=True, timeout=timeout)
|
||||||
|
return result.returncode, result.stdout + result.stderr
|
||||||
|
|
||||||
|
# ── lifecycle ────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def provision(self) -> None:
|
||||||
|
"""Create the remote sandbox directory and rsync the repo into it."""
|
||||||
|
print(f"[sandbox] provisioning {self.remote_dir} on {self.host}")
|
||||||
|
rc, out = self.run(f"mkdir -p {self.remote_dir}")
|
||||||
|
if rc != 0:
|
||||||
|
raise RuntimeError(f"Failed to create remote sandbox dir: {out}")
|
||||||
|
|
||||||
|
rsync_args = [
|
||||||
|
"rsync", "-az", "--delete",
|
||||||
|
"--exclude=.git",
|
||||||
|
"--exclude=__pycache__",
|
||||||
|
"--exclude=*.pyc",
|
||||||
|
"--exclude=.venv",
|
||||||
|
"--exclude=node_modules",
|
||||||
|
]
|
||||||
|
if self.ssh_key:
|
||||||
|
rsync_args += ["-e", f"ssh -i {self.ssh_key} -o StrictHostKeyChecking=no"]
|
||||||
|
else:
|
||||||
|
rsync_args += ["-e", "ssh -o StrictHostKeyChecking=no"]
|
||||||
|
|
||||||
|
rsync_args += [
|
||||||
|
f"{self.repo_path}/",
|
||||||
|
f"{self.ssh_target}:{self.remote_dir}/",
|
||||||
|
]
|
||||||
|
|
||||||
|
print(f"[sandbox] rsyncing {self.repo_path} → remote:{self.remote_dir}")
|
||||||
|
result = subprocess.run(rsync_args, capture_output=True, text=True, timeout=120)
|
||||||
|
if result.returncode != 0:
|
||||||
|
raise RuntimeError(f"rsync failed: {result.stdout + result.stderr}")
|
||||||
|
|
||||||
|
def teardown(self) -> None:
|
||||||
|
"""Remove the remote sandbox directory."""
|
||||||
|
print(f"[sandbox] tearing down {self.remote_dir}")
|
||||||
|
rc, out = self.run(f"rm -rf {self.remote_dir}", timeout=30)
|
||||||
|
if rc != 0:
|
||||||
|
print(f"[sandbox] WARNING: teardown failed (manual cleanup may be needed): {out}")
|
||||||
|
|
||||||
|
def wait_for_url(self, url: str, timeout: int = 120, interval: int = 5) -> bool:
|
||||||
|
"""Poll a URL on the remote host until it returns HTTP 2xx or timeout."""
|
||||||
|
print(f"[sandbox] waiting for {url} (timeout={timeout}s)")
|
||||||
|
deadline = time.time() + timeout
|
||||||
|
while time.time() < deadline:
|
||||||
|
rc, _ = self.run(
|
||||||
|
f"curl -sf --max-time 5 {url} > /dev/null 2>&1",
|
||||||
|
timeout=15,
|
||||||
|
)
|
||||||
|
if rc == 0:
|
||||||
|
print(f"[sandbox] {url} is up")
|
||||||
|
return True
|
||||||
|
time.sleep(interval)
|
||||||
|
print(f"[sandbox] TIMEOUT waiting for {url}")
|
||||||
|
return False
|
||||||
61
e2e-framework/schema.py
Normal file
61
e2e-framework/schema.py
Normal file
@@ -0,0 +1,61 @@
|
|||||||
|
"""
|
||||||
|
Parse and validate e2e.yml — the per-repo test contract.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import yaml
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Literal
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class HealthCheck:
|
||||||
|
name: str
|
||||||
|
url: str
|
||||||
|
timeout: int = 120 # seconds
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class E2EConfig:
|
||||||
|
name: str
|
||||||
|
compose_file: str # relative to repo root
|
||||||
|
test_command: str
|
||||||
|
health_checks: list[HealthCheck] = field(default_factory=list)
|
||||||
|
timeout: int = 300 # hard limit for test_command
|
||||||
|
cleanup: Literal["always", "on_success", "never"] = "always"
|
||||||
|
env: dict[str, str] = field(default_factory=dict)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def load(cls, repo_root: Path) -> "E2EConfig":
|
||||||
|
config_path = repo_root / "e2e" / "e2e.yml"
|
||||||
|
if not config_path.exists():
|
||||||
|
raise FileNotFoundError(f"No e2e.yml found at {config_path}")
|
||||||
|
|
||||||
|
raw = yaml.safe_load(config_path.read_text())
|
||||||
|
|
||||||
|
health_checks = [
|
||||||
|
HealthCheck(
|
||||||
|
name=hc.get("name", hc["url"]),
|
||||||
|
url=hc["url"],
|
||||||
|
timeout=int(hc.get("timeout", 120)),
|
||||||
|
)
|
||||||
|
for hc in raw.get("health_checks", [])
|
||||||
|
]
|
||||||
|
|
||||||
|
env = {}
|
||||||
|
for item in raw.get("env", []):
|
||||||
|
if isinstance(item, dict) and "key" in item:
|
||||||
|
env[item["key"]] = str(item.get("value", ""))
|
||||||
|
elif isinstance(item, dict):
|
||||||
|
env.update(item)
|
||||||
|
|
||||||
|
return cls(
|
||||||
|
name=raw["name"],
|
||||||
|
compose_file=raw["compose_file"],
|
||||||
|
test_command=raw["test_command"],
|
||||||
|
health_checks=health_checks,
|
||||||
|
timeout=int(raw.get("timeout", 300)),
|
||||||
|
cleanup=raw.get("cleanup", "always"),
|
||||||
|
env=env,
|
||||||
|
)
|
||||||
169
workplans/CUST-WP-0028-e2e-sandbox-framework.md
Normal file
169
workplans/CUST-WP-0028-e2e-sandbox-framework.md
Normal file
@@ -0,0 +1,169 @@
|
|||||||
|
---
|
||||||
|
id: CUST-WP-0028
|
||||||
|
type: workplan
|
||||||
|
title: "Cross-Repo E2E Sandbox Framework"
|
||||||
|
domain: railiance
|
||||||
|
repo: the-custodian
|
||||||
|
status: active
|
||||||
|
owner: custodian
|
||||||
|
topic_slug: railiance
|
||||||
|
created: "2026-03-27"
|
||||||
|
updated: "2026-03-27"
|
||||||
|
state_hub_workstream_id: "b68de20b-e397-4f97-b1be-ad30711fc2a6"
|
||||||
|
---
|
||||||
|
|
||||||
|
# Cross-Repo E2E Sandbox Framework
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
End-to-end tests that require a real running stack (Temporal, Postgres, workers)
|
||||||
|
cannot be automated in CI or run locally without significant setup friction.
|
||||||
|
Each repo has to reinvent its own e2e story. activity-core T21 is the immediate
|
||||||
|
trigger: the full RunActivityWorkflow flow can't be exercised without a live
|
||||||
|
Temporal cluster.
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
|
A **convention + runtime** that any repo can opt into by dropping in an `e2e/`
|
||||||
|
folder. The shared framework, living in `the-custodian/e2e-framework/`, handles:
|
||||||
|
|
||||||
|
- Provisioning an isolated sandbox on a remote host (railiance01)
|
||||||
|
- `rsync` + `docker compose up` with a unique project name (no port conflicts)
|
||||||
|
- Health polling until the stack is ready
|
||||||
|
- Running the repo's test command and capturing results
|
||||||
|
- `docker compose down` (even on failure)
|
||||||
|
- Reporting structured results to the state-hub
|
||||||
|
|
||||||
|
Each repo just provides: `e2e/e2e.yml` + `e2e/compose.yml` + `e2e/tests/`.
|
||||||
|
|
||||||
|
The sandbox host defaults to `RAILIANCE01_HOST` env var (SSH alias or IP).
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
the-custodian/
|
||||||
|
e2e-framework/
|
||||||
|
schema.py # parse and validate e2e.yml
|
||||||
|
sandbox.py # provision/teardown remote sandbox dir via SSH
|
||||||
|
runner.py # rsync, compose up, health-wait, run tests, compose down
|
||||||
|
reporter.py # push structured result to state-hub
|
||||||
|
cli.py # entry point: python -m e2e_framework <repo-path>
|
||||||
|
|
||||||
|
Makefile # e2e target: make e2e REPO=activity-core
|
||||||
|
|
||||||
|
<repo>/
|
||||||
|
e2e/
|
||||||
|
e2e.yml # metadata: compose_file, health_checks, test_command, timeout
|
||||||
|
compose.yml # stack definition (may symlink docker-compose.dev.yml)
|
||||||
|
tests/ # test scripts (pytest, shell, etc.)
|
||||||
|
```
|
||||||
|
|
||||||
|
## e2e.yml contract
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
name: <repo-slug>
|
||||||
|
compose_file: e2e/compose.yml # relative to repo root
|
||||||
|
health_checks:
|
||||||
|
- name: <label>
|
||||||
|
url: http://localhost:<port> # checked from remote machine
|
||||||
|
timeout: 120 # seconds to wait for this check
|
||||||
|
test_command: python -m pytest e2e/tests/ -v --tb=short
|
||||||
|
timeout: 300 # hard timeout for test_command
|
||||||
|
cleanup: always # always | on_success | never
|
||||||
|
```
|
||||||
|
|
||||||
|
## Tasks
|
||||||
|
|
||||||
|
### T01 — e2e-framework core: schema, sandbox, runner
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0028-T01
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "61dbb674-5933-4185-a7af-f9274bdd43c1"
|
||||||
|
```
|
||||||
|
|
||||||
|
Write `the-custodian/e2e-framework/`:
|
||||||
|
- `schema.py` — dataclasses + YAML loader for `e2e.yml`
|
||||||
|
- `sandbox.py` — SSH wrapper: `provision()` (mkdir + rsync), `run()`, `teardown()`
|
||||||
|
- `runner.py` — full lifecycle: up → health-wait → test → down
|
||||||
|
|
||||||
|
The SSH transport uses subprocess + system `ssh` (no extra deps). rsync over SSH.
|
||||||
|
Health checks curl from within the remote machine via `sandbox.run()`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T02 — reporter, CLI, Makefile target
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0028-T02
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "cd845d62-0ab7-4180-bc87-59789883258d"
|
||||||
|
```
|
||||||
|
|
||||||
|
Write:
|
||||||
|
- `reporter.py` — POST structured result to state-hub `add_progress_event`
|
||||||
|
- `cli.py` — `python -m e2e_framework <repo-path> [--host HOST] [--keep]`
|
||||||
|
- `the-custodian/Makefile` — `make e2e REPO=<slug>` target
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T03 — activity-core e2e contract
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0028-T03
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "2ed2b805-2245-4f7e-84d8-4345c6c5455a"
|
||||||
|
```
|
||||||
|
|
||||||
|
In `activity-core/`:
|
||||||
|
- `e2e/e2e.yml` — references `docker-compose.dev.yml`, declares Temporal UI health check
|
||||||
|
- `e2e/compose.yml` — symlink to `../docker-compose.dev.yml`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T04 — activity-core test script (closes T21)
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0028-T04
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "0d92ac62-dc04-495c-85fa-13a66ffe611a"
|
||||||
|
```
|
||||||
|
|
||||||
|
Write `activity-core/e2e/tests/test_full_flow.py`:
|
||||||
|
- Seeds one ActivityDefinition
|
||||||
|
- Triggers RunActivityWorkflow via Temporal client
|
||||||
|
- Polls for workflow completion
|
||||||
|
- Asserts run log written to DB
|
||||||
|
- Clear pass/fail output per step
|
||||||
|
|
||||||
|
Updates activity-core WP-0001 T21 status to `done`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T05 — runbook + smoke test instructions
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0028-T05
|
||||||
|
status: done
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "bbb106bd-bd89-4c11-b136-276e4d670097"
|
||||||
|
```
|
||||||
|
|
||||||
|
Write `e2e-framework/RUNBOOK.md`:
|
||||||
|
- Prerequisites (SSH access to sandbox host, Docker installed)
|
||||||
|
- First run: `export RAILIANCE01_HOST=<ip>; make e2e REPO=activity-core`
|
||||||
|
- Troubleshooting: sandbox cleanup, docker compose project list, log locations
|
||||||
|
|
||||||
|
Note: manual validation on railiance01 still needed (first live run).
|
||||||
|
|
||||||
|
## Done Criteria
|
||||||
|
|
||||||
|
- [ ] `make e2e REPO=activity-core` runs the full stack on railiance01, reports pass/fail
|
||||||
|
- [ ] Sandbox is always cleaned up (compose down + dir removed) unless `--keep`
|
||||||
|
- [ ] Results posted to state-hub as progress event
|
||||||
|
- [ ] activity-core T21 closed by the automated test script
|
||||||
|
- [ ] Any repo can opt in by adding `e2e/e2e.yml`
|
||||||
Reference in New Issue
Block a user