Updated by fix-consistency on 2026-03-27: - update .custodian-brief.md for the-custodian
E2E Sandbox Framework — Runbook
Prerequisites
Workstation:
- ssh + rsync available
- python3 + pyyaml available (or uv run)
- State-hub running on :8000 (for result reporting)
Sandbox host (railiance01):
- SSH key access
- Docker + docker compose plugin installed
- Sufficient disk for images (~4 GB for activity-core stack)
First run
# Set sandbox host (once, or add to ~/.bashrc / .env)
export RAILIANCE01_HOST=<ip-or-alias> # e.g. 92.205.130.254
export RAILIANCE01_USER=root # optional, default=root
export RAILIANCE01_KEY=~/.ssh/id_rsa # optional, uses ssh default otherwise
# From the-custodian:
make e2e REPO=activity-core
Output will show each step: rsync → compose up → health wait → tests → compose down. Exit code is 0 (all passed) or 1 (any failure).
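The step sequence and exit-code contract above can be sketched as follows. This is an illustration only, not the framework's actual code: `run_pipeline`, the `Step` alias, and the `keep` flag are hypothetical names, and it assumes a step signals failure by raising an exception.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a step is a label plus a zero-argument callable.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: Sequence[Step], teardown: Callable[[], None], keep: bool = False) -> int:
    """Run steps in order; return 0 if all pass, 1 on the first failure."""
    exit_code = 0
    try:
        for name, action in steps:
            print(f"==> {name}")
            action()  # assumption: a step reports failure by raising
    except Exception as exc:
        print(f"FAILED: {exc}")
        exit_code = 1
    finally:
        # Teardown (compose down) runs on success and on failure,
        # unless the caller asked to keep the sandbox alive.
        if not keep:
            teardown()
    return exit_code
```

In this sketch the steps would be rsync, compose up, health wait, and tests, with compose down passed as `teardown` so that a failing test still leaves the host clean.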
Options
# Keep sandbox alive after run (for debugging)
make e2e REPO=activity-core KEEP=1
# Override host without env var
make e2e REPO=activity-core HOST=192.168.1.50
# Attach result to a specific state-hub workstream
make e2e REPO=activity-core WORKSTREAM_ID=<uuid>
# Skip posting to state-hub
cd the-custodian && python3 -m e2e_framework ~/activity-core --no-report
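How the host settings combine can be sketched like this. The variable names and the `root` default come from the commands above; the functions `resolve_target` and `ssh_base_command` are hypothetical, and the precedence (a `HOST=` override wins over `RAILIANCE01_HOST`) is an assumption based on the "override" wording.

```python
import os
from typing import List, Mapping, Optional


def resolve_target(env: Mapping[str, str], host_override: Optional[str] = None) -> dict:
    """Pick host, user, and key; an explicit override beats the environment."""
    host = host_override or env.get("RAILIANCE01_HOST")
    if not host:
        raise ValueError("set RAILIANCE01_HOST or pass HOST=<ip-or-alias>")
    return {
        "host": host,
        "user": env.get("RAILIANCE01_USER", "root"),  # documented default
        "key": env.get("RAILIANCE01_KEY"),  # None means: let ssh pick its default key
    }


def ssh_base_command(target: dict) -> List[str]:
    """Build the argv prefix for an ssh call to the sandbox host."""
    cmd = ["ssh"]
    if target["key"]:
        cmd += ["-i", os.path.expanduser(target["key"])]
    cmd.append(f"{target['user']}@{target['host']}")
    return cmd
```

Calling `resolve_target(os.environ)` would read the exported values; passing a second argument models `HOST=192.168.1.50` on the make command line.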
Adding a new repo
- Create <repo>/e2e/e2e.yml:
    name: <repo-slug>
    compose_file: docker-compose.dev.yml # or e2e/compose.yml
    health_checks:
      - name: <service>
        url: http://localhost:<port>
        timeout: 120
    test_command: uv run python -m pytest e2e/tests/ -v
    timeout: 300
    cleanup: always
- Add <repo>/e2e/tests/test_*.py: test scripts that exit 0 on success.
- Run: make e2e REPO=<repo>
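Before the first run it can help to sanity-check the config. The sketch below is not part of the framework; it assumes `health_checks` is a list of mappings with `name`, `url`, and a per-check `timeout`, and that `name`, `compose_file`, `health_checks`, and `test_command` are mandatory. `HealthCheck` and `validate_config` are hypothetical names, and the input is the dict you would get from `yaml.safe_load` on `e2e.yml`.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

# Assumption: these four keys are mandatory; the framework may be more lenient.
REQUIRED_KEYS = ("name", "compose_file", "health_checks", "test_command")


@dataclass
class HealthCheck:
    name: str
    url: str
    timeout: int = 120  # assumed default, taken from the template value


def validate_config(raw: Mapping[str, Any]) -> List[HealthCheck]:
    """Check required keys and return the parsed health checks."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"e2e.yml is missing: {', '.join(missing)}")
    # Unknown or missing per-check fields raise TypeError from the dataclass.
    checks = [HealthCheck(**item) for item in raw["health_checks"]]
    if not checks:
        raise ValueError("e2e.yml needs at least one health check")
    return checks
```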
Troubleshooting
Sandbox not cleaned up:
ssh root@$RAILIANCE01_HOST 'ls /tmp/custodian-e2e/'
ssh root@$RAILIANCE01_HOST 'docker compose ls'
# Manually clean:
ssh root@$RAILIANCE01_HOST 'docker compose -p e2e-activity-core-<id> down -v; rm -rf /tmp/custodian-e2e/<id>'
Temporal startup slow (>2 min):
Elasticsearch takes 60–90 seconds to start, and the health check waits up to 180 s. If the check times out, inspect the Elasticsearch logs:
ssh root@$RAILIANCE01_HOST 'docker logs temporal-elasticsearch | tail -20'
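For context, a health wait of this kind is usually a polling loop with a deadline. The following is a minimal sketch, not the framework's implementation: `http_ok` and `wait_until_healthy` are hypothetical names, the 2-second poll interval is an assumption, and only the 180-second ceiling comes from the text above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """True on a 2xx response; False on URL, HTTP-status, or socket errors."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe: Callable[[], bool], timeout: float = 180.0, interval: float = 2.0) -> bool:
    """Poll probe() until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        # Give up if the next poll would land after the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

A call such as `wait_until_healthy(lambda: http_ok("http://localhost:<port>"))` would block for at most roughly 180 seconds, which is why a slow Elasticsearch start normally still passes.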
Worker fails to start:
Check that uv is installed on the sandbox host:
ssh root@$RAILIANCE01_HOST 'which uv || curl -LsSf https://astral.sh/uv/install.sh | sh'
rsync excluded paths:
.git, __pycache__, *.pyc, .venv, and node_modules are excluded from the copy.
Because .venv is not transferred, dependencies are installed on the remote after rsync; uv run performs the equivalent of uv sync automatically.
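The exclusion list maps directly onto rsync's `--exclude` option. A sketch of how such a command line might be assembled is below; `rsync_command` is a hypothetical helper, and the `-az` flags are an assumption about what the framework passes.

```python
from typing import List, Optional

# Patterns listed in this runbook.
EXCLUDES = [".git", "__pycache__", "*.pyc", ".venv", "node_modules"]


def rsync_command(src: str, user: str, host: str, dest: str, key: Optional[str] = None) -> List[str]:
    """Build an rsync argv that copies src to user@host:dest, skipping EXCLUDES."""
    cmd = ["rsync", "-az"]  # assumed flags: archive mode plus compression
    cmd += [f"--exclude={pattern}" for pattern in EXCLUDES]
    if key:
        cmd += ["-e", f"ssh -i {key}"]  # use a specific SSH key for the transfer
    # A trailing slash on src copies the directory contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", f"{user}@{host}:{dest}"]
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` would perform the copy; building the argv as a list avoids shell quoting problems with the `*.pyc` pattern.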
Architecture notes
- Sandbox isolation: docker compose project name e2e-{repo}-{sandbox_id}
- Sandbox dir: /tmp/custodian-e2e/{sandbox_id}/
- No port conflicts: each sandbox uses its own docker network
- Parallel runs of the same repo are safe (different sandbox_id)
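The naming scheme in these notes can be expressed as a small value object. The two format strings follow the notes; the `Sandbox` class, `new_sandbox`, and the 8-character random hex ID are assumptions, since the real `sandbox_id` format is not documented here.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Sandbox:
    repo: str
    sandbox_id: str

    @property
    def project_name(self) -> str:
        # docker compose project name: e2e-{repo}-{sandbox_id}
        return f"e2e-{self.repo}-{self.sandbox_id}"

    @property
    def remote_dir(self) -> str:
        # Sandbox dir on the host: /tmp/custodian-e2e/{sandbox_id}/
        return f"/tmp/custodian-e2e/{self.sandbox_id}/"


def new_sandbox(repo: str) -> Sandbox:
    # Assumption: a short random hex string stands in for the real ID format.
    return Sandbox(repo=repo, sandbox_id=uuid.uuid4().hex[:8])
```

Two calls to `new_sandbox("activity-core")` yield different project names and directories, which is what makes parallel runs of the same repo independent of each other.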