Updated by fix-consistency on 2026-03-27: - update .custodian-brief.md for the-custodian
# E2E Sandbox Framework — Runbook

## Prerequisites

**Workstation:**
- `ssh` and `rsync` available
- `python3` with `pyyaml` available (or use `uv run`)
- State-hub running on `:8000` (for result reporting)

**Sandbox host (railiance01):**
- SSH key access
- Docker with the docker compose plugin installed
- Sufficient disk for images (~4 GB for the activity-core stack)

## First run

```bash
# Set sandbox host (once, or add to ~/.bashrc / .env)
export RAILIANCE01_HOST=<ip-or-alias>   # e.g. 92.205.130.254
export RAILIANCE01_USER=root            # optional, default=root
export RAILIANCE01_KEY=~/.ssh/id_rsa    # optional, uses ssh default otherwise

# From the-custodian:
make e2e REPO=activity-core
```

The output shows each step in order: rsync → compose up → health wait → tests → compose down.
The exit code is 0 if all tests passed and 1 on any failure.

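
The step order and exit-code rule described above can be sketched roughly as follows. This is an illustration only, not the framework's actual code: `run_pipeline`, the `Step` type, and the `teardown` callback are hypothetical names, and the sketch assumes that a failed step skips the remaining steps while teardown still runs.

```python
from typing import Callable, Sequence, Tuple

# A step is a label plus a callable that returns True on success.
# (Hypothetical shape, chosen only for this illustration.)
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(
    steps: Sequence[Step],
    teardown: Callable[[], None],
    keep: bool = False,
) -> int:
    """Run steps in order and return 0 if all passed, 1 otherwise."""
    exit_code = 0
    try:
        for name, action in steps:
            print(f"[e2e] {name}")
            if not action():
                # Assumption: the first failure stops the run.
                exit_code = 1
                break
    except Exception as exc:
        print(f"[e2e] error: {exc}")
        exit_code = 1
    finally:
        # "compose down" equivalent; skipped when the sandbox is kept.
        if not keep:
            teardown()
    return exit_code
```

Putting the teardown in `finally` is one way to make sure a crashed test command does not leave containers behind.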
## Options

```bash
# Keep sandbox alive after run (for debugging)
make e2e REPO=activity-core KEEP=1

# Override host without env var
make e2e REPO=activity-core HOST=192.168.1.50

# Attach result to a specific state-hub workstream
make e2e REPO=activity-core WORKSTREAM_ID=<uuid>

# Skip posting to state-hub
cd the-custodian && python3 -m e2e_framework ~/activity-core --no-report
```

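
For readers who want to see how the host settings might combine, here is a minimal sketch. It assumes that `HOST=` takes precedence over `RAILIANCE01_HOST`, that the user defaults to `root`, and that the key is passed to `ssh -i` only when set. The function name `resolve_ssh` is hypothetical, and the real framework may resolve these differently.

```python
import os
from typing import List, Mapping, Optional


def resolve_ssh(env: Mapping[str, str], host_override: Optional[str] = None) -> List[str]:
    """Build an ssh argv prefix from the RAILIANCE01_* settings."""
    # Assumption: an explicit HOST= override wins over the environment.
    host = host_override or env.get("RAILIANCE01_HOST")
    if not host:
        raise ValueError("set RAILIANCE01_HOST or pass HOST=<ip-or-alias>")

    user = env.get("RAILIANCE01_USER", "root")  # documented default
    argv = ["ssh"]

    key = env.get("RAILIANCE01_KEY")
    if key:
        # Without a key, ssh falls back to its own defaults.
        argv += ["-i", os.path.expanduser(key)]

    argv.append(f"{user}@{host}")
    return argv
```

Calling `resolve_ssh(os.environ)` uses the exported variables. Passing `host_override="192.168.1.50"` mirrors the `HOST=` option shown above.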
## Adding a new repo

1. Create `<repo>/e2e/e2e.yml`:
   ```yaml
   name: <repo-slug>
   compose_file: docker-compose.dev.yml   # or e2e/compose.yml
   health_checks:
     - name: <service>
       url: http://localhost:<port>
       timeout: 120
   test_command: uv run python -m pytest e2e/tests/ -v
   timeout: 300
   cleanup: always
   ```

2. Add `<repo>/e2e/tests/test_*.py`: test scripts that exit 0 on success.

3. Run: `make e2e REPO=<repo>`

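
The `health_checks` entries above pair a URL with a timeout in seconds. A health wait of that shape could be sketched like this, using only the standard library. The names `http_ok` and `wait_healthy`, the 2-second poll interval, and the rule that any 2xx/3xx response counts as healthy are assumptions made for the example, not documented framework behaviour.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if the URL answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP error status, bad URL: not healthy yet.
        return False


def wait_healthy(
    url: str,
    timeout: float,
    probe: Callable[[str], bool] = http_ok,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until the probe succeeds or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `probe`, `clock`, and `sleep` parameters are injectable so the loop can be tested without a running service.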
## Troubleshooting

The commands below assume the default `root` user; substitute your own user if you set `RAILIANCE01_USER`.

**Sandbox not cleaned up:**
```bash
# List leftover sandbox directories and compose projects
ssh root@$RAILIANCE01_HOST 'ls /tmp/custodian-e2e/'
ssh root@$RAILIANCE01_HOST 'docker compose ls'
# Manually clean:
ssh root@$RAILIANCE01_HOST 'docker compose -p e2e-activity-core-<id> down -v; rm -rf /tmp/custodian-e2e/<id>'
```

**Temporal startup slow (>2 min):**
Elasticsearch takes 60–90 seconds to start, and the health check waits up to 180 s.
If it times out, check the Elasticsearch logs:
```bash
ssh root@$RAILIANCE01_HOST 'docker logs --tail 20 temporal-elasticsearch'
```

**Worker fails to start:**
Check that `uv` is installed on the sandbox host; the command below installs it if it is missing:
```bash
ssh root@$RAILIANCE01_HOST 'which uv || curl -LsSf https://astral.sh/uv/install.sh | sh'
```

**rsync excluded paths:**
`.git`, `__pycache__`, `*.pyc`, `.venv`, and `node_modules` are excluded from the copy.
Because `.venv` is not transferred, the environment is synced on the remote after rsync; `uv run` handles this automatically.

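
To make the exclusion list concrete, the sketch below builds an rsync command line that skips those paths. The `-az` flags, the helper name `rsync_argv`, and the argument layout are assumptions for illustration; the framework's real invocation may use different options.

```python
from typing import List

# Paths listed in this runbook as excluded from the copy.
EXCLUDES = [".git", "__pycache__", "*.pyc", ".venv", "node_modules"]


def rsync_argv(src: str, target: str, remote_dir: str) -> List[str]:
    """Return an rsync argv that copies `src` to `target:remote_dir`."""
    argv = ["rsync", "-az"]  # archive mode + compression (assumed flags)
    for pattern in EXCLUDES:
        argv.append(f"--exclude={pattern}")
    # Trailing slash: copy the directory's contents, not the directory itself.
    argv.append(src.rstrip("/") + "/")
    argv.append(f"{target}:{remote_dir}")
    return argv
```

For example, `rsync_argv("/home/me/activity-core", "root@host", "/tmp/custodian-e2e/<id>/")` yields a list suitable for `subprocess.run`.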
## Architecture notes

- Sandbox isolation: docker compose project name `e2e-{repo}-{sandbox_id}`
- Sandbox dir: `/tmp/custodian-e2e/{sandbox_id}/`
- No port conflicts: each sandbox uses its own Docker network
- Parallel runs of the same repo are safe (different `sandbox_id`)

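
The naming scheme in the notes above can be illustrated with a small sketch. The `Sandbox` class and `new_sandbox` helper are hypothetical, and generating the id from a random UUID prefix is an assumption; the runbook does not say how `sandbox_id` is produced.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Sandbox:
    repo: str
    sandbox_id: str

    @property
    def project(self) -> str:
        # docker compose project name: e2e-{repo}-{sandbox_id}
        return f"e2e-{self.repo}-{self.sandbox_id}"

    @property
    def remote_dir(self) -> str:
        # Sandbox directory on the host: /tmp/custodian-e2e/{sandbox_id}/
        return f"/tmp/custodian-e2e/{self.sandbox_id}/"


def new_sandbox(repo: str) -> Sandbox:
    """Create a sandbox with a fresh id (assumed: short random hex)."""
    return Sandbox(repo=repo, sandbox_id=uuid.uuid4().hex[:8])
```

Because each run gets its own id, two runs of the same repo resolve to different project names and directories, which is what makes parallel runs independent.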