Files
the-custodian/e2e-framework/RUNBOOK.md
tegwick d061c777d1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:52:18 +01:00

98 lines
2.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# E2E Sandbox Framework — Runbook
## Prerequisites
**Workstation:**
- `ssh` + `rsync` available
- `python3` + `pyyaml` available (or `uv run`)
- State-hub running on `:8000` (for result reporting)
**Sandbox host (railiance01):**
- SSH key access
- Docker + docker compose plugin installed
- Sufficient disk for images (~4 GB for activity-core stack)
## First run
```bash
# Set sandbox host (once, or add to ~/.bashrc / .env)
export RAILIANCE01_HOST=<ip-or-alias> # e.g. 92.205.130.254
export RAILIANCE01_USER=root # optional, default=root
export RAILIANCE01_KEY=~/.ssh/id_rsa # optional, uses ssh default otherwise
# From the-custodian:
make e2e REPO=activity-core
```
Output will show each step: rsync → compose up → health wait → tests → compose down.
Exit code is 0 (all passed) or 1 (any failure).
## Options
```bash
# Keep sandbox alive after run (for debugging)
make e2e REPO=activity-core KEEP=1
# Override host without env var
make e2e REPO=activity-core HOST=192.168.1.50
# Attach result to a specific state-hub workstream
make e2e REPO=activity-core WORKSTREAM_ID=<uuid>
# Skip posting to state-hub
cd the-custodian && python3 -m e2e_framework ~/activity-core --no-report
```
## Adding a new repo
1. Create `<repo>/e2e/e2e.yml`:
```yaml
name: <repo-slug>
compose_file: docker-compose.dev.yml # or e2e/compose.yml
health_checks:
- name: <service>
url: http://localhost:<port>
timeout: 120
test_command: uv run python -m pytest e2e/tests/ -v
timeout: 300
cleanup: always
```
2. Add `<repo>/e2e/tests/test_*.py` — test scripts that exit 0 on success.
3. Run: `make e2e REPO=<repo>`
## Troubleshooting
**Sandbox not cleaned up:**
```bash
ssh root@$RAILIANCE01_HOST 'ls /tmp/custodian-e2e/'
ssh root@$RAILIANCE01_HOST 'docker compose ls'
# Manually clean:
ssh root@$RAILIANCE01_HOST 'docker compose -p e2e-activity-core-<id> down -v; rm -rf /tmp/custodian-e2e/<id>'
```
**Temporal startup slow (>2 min):**
Elasticsearch takes 6090 seconds. The health check waits up to 180s.
If it times out, check:
```bash
ssh root@$RAILIANCE01_HOST 'docker logs temporal-elasticsearch | tail -20'
```
**Worker fails to start:**
Check that `uv` is installed on the sandbox host:
```bash
ssh root@$RAILIANCE01_HOST 'which uv || curl -LsSf https://astral.sh/uv/install.sh | sh'
```
**rsync excluded paths:**
`.git`, `__pycache__`, `*.pyc`, `.venv`, `node_modules` are excluded.
This means `uv sync` runs on the remote after rsync (handled by `uv run`).
## Architecture notes
- Sandbox isolation: docker compose project name `e2e-{repo}-{sandbox_id}`
- Sandbox dir: `/tmp/custodian-e2e/{sandbox_id}/`
- No port conflicts: each sandbox uses its own docker network
- Parallel runs of the same repo are safe (different sandbox_id)