chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27: - update .custodian-brief.md for the-custodian
e2e-framework/RUNBOOK.md
# E2E Sandbox Framework — Runbook

## Prerequisites

**Workstation:**

- `ssh` + `rsync` available
- `python3` + `pyyaml` available (or `uv run`)
- State-hub running on `:8000` (for result reporting)

**Sandbox host (railiance01):**

- SSH key access
- Docker + `docker compose` plugin installed
- Sufficient disk space for images (~4 GB for the activity-core stack)
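
The workstation prerequisites can be checked before the first run with a short script. The sketch below uses a hypothetical helper that is not part of `e2e_framework`. It checks three things:

- `ssh`, `rsync` and `python3` are on `PATH`.
- PyYAML is importable as `yaml`.
- Something accepts TCP connections on the state-hub port. This assumes state-hub listens on `localhost:8000`, and it only tests reachability, not service health.

```python
import importlib.util
import shutil
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(state_hub_host: str = "localhost", state_hub_port: int = 8000) -> dict[str, bool]:
    """Hypothetical preflight check for the workstation prerequisites."""
    checks = {tool: shutil.which(tool) is not None for tool in ("ssh", "rsync", "python3")}
    # PyYAML is imported as `yaml`; find_spec checks importability without importing it.
    checks["pyyaml"] = importlib.util.find_spec("yaml") is not None
    # Reachability only: state-hub is needed for result reporting, not for the run itself.
    checks["state-hub"] = port_open(state_hub_host, state_hub_port)
    return checks


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok' if ok else 'MISSING':<8} {name}")
```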
## First run

```bash
# Set the sandbox host (once, or add to ~/.bashrc / .env)
export RAILIANCE01_HOST=<ip-or-alias>   # e.g. 92.205.130.254
export RAILIANCE01_USER=root            # optional, default: root
export RAILIANCE01_KEY=~/.ssh/id_rsa    # optional; otherwise ssh uses its default key

# From the-custodian:
make e2e REPO=activity-core
```

The output shows each step in order: rsync → compose up → health wait → tests → compose down.
The exit code is 0 if all tests passed and 1 on any failure.
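
Put together, a run is a fixed sequence of steps plus a teardown that must happen even when an earlier step fails. The sketch below shows that control flow and assumes each step is a callable returning True on success. `run_pipeline`, `steps` and `teardown` are illustrative names, not the framework's API. `keep` stands in for the `KEEP=1` option.

```python
from typing import Callable, Sequence


def run_pipeline(
    steps: Sequence[tuple[str, Callable[[], bool]]],
    teardown: Callable[[], None],
    keep: bool = False,
) -> int:
    """Run named steps in order; return 0 if all succeed, 1 on the first failure.

    teardown() (compose down) runs on every exit path unless keep=True,
    which corresponds to leaving the sandbox up for debugging.
    """
    try:
        for name, step in steps:
            print(f"==> {name}")
            try:
                ok = step()
            except Exception as exc:  # a crashing step counts as a failed step
                print(f"ERROR in {name}: {exc!r}")
                ok = False
            if not ok:
                print(f"FAILED: {name}")
                return 1  # later steps are skipped
        return 0
    finally:
        if keep:
            print("==> sandbox kept (compose down skipped)")
        else:
            teardown()
```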
## Options

```bash
# Keep the sandbox alive after the run (for debugging)
make e2e REPO=activity-core KEEP=1

# Override the host without setting the env var
make e2e REPO=activity-core HOST=192.168.1.50

# Attach the result to a specific state-hub workstream
make e2e REPO=activity-core WORKSTREAM_ID=<uuid>

# Skip posting results to state-hub
cd the-custodian && python3 -m e2e_framework ~/activity-core --no-report
```
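
The host override interacts with the environment variables from the first-run setup. The sketch below shows one plausible way to resolve the ssh target. It assumes:

- `HOST=` takes precedence over `RAILIANCE01_HOST`.
- `RAILIANCE01_USER` defaults to `root`.
- `RAILIANCE01_KEY` is passed to `ssh -i` only when set.

`resolve_ssh_target` is a hypothetical helper, not the framework's code.

```python
import os
from typing import Mapping, Optional


def resolve_ssh_target(env: Mapping[str, str], host_override: Optional[str] = None) -> list[str]:
    """Build the ssh argument prefix for the sandbox host (hypothetical helper)."""
    # Assumption: an explicit HOST= wins over the environment variable.
    host = host_override or env.get("RAILIANCE01_HOST")
    if not host:
        raise ValueError("no sandbox host: set RAILIANCE01_HOST or pass HOST=...")
    user = env.get("RAILIANCE01_USER") or "root"
    cmd = ["ssh"]
    key = env.get("RAILIANCE01_KEY")
    if key:
        # Expand "~" here: a value read from the environment may not have been shell-expanded.
        cmd += ["-i", os.path.expanduser(key)]
    cmd.append(f"{user}@{host}")
    return cmd
```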
## Adding a new repo

1. Create `<repo>/e2e/e2e.yml`:

   ```yaml
   name: <repo-slug>
   compose_file: docker-compose.dev.yml   # or e2e/compose.yml
   health_checks:
     - name: <service>
       url: http://localhost:<port>
       timeout: 120
   test_command: uv run python -m pytest e2e/tests/ -v
   timeout: 300
   cleanup: always
   ```

2. Add `<repo>/e2e/tests/test_*.py`: test scripts that exit 0 on success.

3. Run: `make e2e REPO=<repo>`
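
A malformed `e2e.yml` is easier to catch locally than on the sandbox host. The sketch below validates the fields shown in the template, with these assumptions:

- The required-key set is a guess; this runbook does not document the framework's real schema.
- Timeouts are treated as positive integer seconds.
- Parsing uses PyYAML's `yaml.safe_load`, imported inside `load_config` so the validator itself runs without PyYAML.

```python
from typing import Any

# Assumption: the minimal set of keys the framework needs; adjust to the real schema.
REQUIRED_KEYS = ("name", "compose_file", "test_command")


def _is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_config(cfg: Any) -> list[str]:
    """Return the problems found in a parsed e2e.yml; an empty list means it looks valid."""
    if not isinstance(cfg, dict):
        return ["top level must be a mapping"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if "timeout" in cfg and not _is_positive_int(cfg["timeout"]):
        problems.append("timeout must be a positive integer (seconds)")
    checks = cfg.get("health_checks", [])
    if not isinstance(checks, list):
        return problems + ["health_checks must be a list"]
    for i, check in enumerate(checks):
        if not isinstance(check, dict):
            problems.append(f"health_checks[{i}] must be a mapping")
            continue
        for key in ("name", "url"):
            if key not in check:
                problems.append(f"health_checks[{i}] missing key: {key}")
        if "timeout" in check and not _is_positive_int(check["timeout"]):
            problems.append(f"health_checks[{i}].timeout must be a positive integer (seconds)")
    return problems


def load_config(path: str) -> dict:
    """Parse and validate an e2e.yml file; raise ValueError listing all problems."""
    import yaml  # PyYAML, listed in the workstation prerequisites

    with open(path, encoding="utf-8") as fh:
        cfg = yaml.safe_load(fh)
    problems = validate_config(cfg)
    if problems:
        raise ValueError(f"{path}: " + "; ".join(problems))
    return cfg
```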
## Troubleshooting

**Sandbox not cleaned up:**

```bash
ssh root@$RAILIANCE01_HOST 'ls /tmp/custodian-e2e/'
ssh root@$RAILIANCE01_HOST 'docker compose ls --all'   # --all also lists stopped projects
# Clean up manually:
ssh root@$RAILIANCE01_HOST 'docker compose -p e2e-activity-core-<id> down -v; rm -rf /tmp/custodian-e2e/<id>'
```
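
When several sandboxes are left behind, the manual commands can be generated rather than typed by hand. The input is the JSON printed by `ssh root@$RAILIANCE01_HOST 'docker compose ls --all --format json'`. The helper below is hypothetical and assumes:

- Leftover projects are those whose name starts with `e2e-`.
- Each project's compose file lives under `/tmp/custodian-e2e/<id>/`.

It only returns commands for review; it does not run them.

```python
import json
import shlex

SANDBOX_ROOT = "/tmp/custodian-e2e/"  # sandbox dir prefix from the architecture notes


def plan_cleanup(compose_ls_json: str) -> list[str]:
    """Turn `docker compose ls --all --format json` output into cleanup commands."""
    commands = []
    for project in json.loads(compose_ls_json or "[]"):
        name = project.get("Name", "")
        if not name.startswith("e2e-"):
            continue  # not a sandbox project; leave it alone
        cmd = f"docker compose -p {shlex.quote(name)} down -v"
        # ConfigFiles may list several comma-separated paths; take the sandbox id from the first.
        first_config = project.get("ConfigFiles", "").split(",")[0]
        if first_config.startswith(SANDBOX_ROOT):
            sandbox_id = first_config[len(SANDBOX_ROOT):].split("/")[0]
            # Guard against deleting the root itself or its parent.
            if sandbox_id and sandbox_id not in (".", ".."):
                cmd += f"; rm -rf {shlex.quote(SANDBOX_ROOT + sandbox_id)}"
        commands.append(cmd)
    return commands
```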
**Temporal startup slow (>2 min):**

Elasticsearch takes 60–90 seconds to start, and the health check waits up to 180 s.
If it times out, check the Elasticsearch container logs:

```bash
ssh root@$RAILIANCE01_HOST 'docker logs --tail 20 temporal-elasticsearch'
```
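
The health wait is a poll-until-deadline loop: probe the service, sleep, and give up once the timeout (180 s here) has elapsed. The sketch below assumes a fixed polling interval and treats any 2xx response as healthy. `wait_for_health`, `http_probe` and the injectable `clock`/`sleep` parameters are hypothetical. The injection only exists so the loop can be tested without real waiting.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket errors all derive from OSError
        return False


def wait_for_health(
    probe: Callable[[], bool],
    timeout: float = 180.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```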
**Worker fails to start:**

Check that `uv` is installed on the sandbox host, and install it if it is missing:

```bash
ssh root@$RAILIANCE01_HOST 'which uv || curl -LsSf https://astral.sh/uv/install.sh | sh'
```

**rsync excluded paths:**

`.git`, `__pycache__`, `*.pyc`, `.venv`, and `node_modules` are excluded from the rsync.
Because `.venv` is not copied, the Python environment is recreated on the sandbox host: `uv run` syncs the project environment (the equivalent of `uv sync`) before running the command.
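
The exclusion list maps directly onto rsync `--exclude` patterns. The sketch below shows one way to assemble such a command. It assumes archive mode with compression (`-az`) and copies into the sandbox dir from the architecture notes. The framework's actual flags are not documented here, so `build_rsync_cmd` is hypothetical. It could be run with `subprocess.run(build_rsync_cmd(...), check=True)`.

```python
# Paths excluded from the sandbox copy, as listed in this runbook.
EXCLUDES = (".git", "__pycache__", "*.pyc", ".venv", "node_modules")


def build_rsync_cmd(src_dir: str, host: str, sandbox_id: str, user: str = "root") -> list[str]:
    """Assemble an rsync argv that copies src_dir into the sandbox dir on host."""
    cmd = ["rsync", "-az"]
    for pattern in EXCLUDES:
        cmd += ["--exclude", pattern]
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    src = src_dir.rstrip("/") + "/"
    cmd += [src, f"{user}@{host}:/tmp/custodian-e2e/{sandbox_id}/"]
    return cmd
```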
## Architecture notes

- Sandbox isolation: docker compose project name `e2e-{repo}-{sandbox_id}`
- Sandbox dir: `/tmp/custodian-e2e/{sandbox_id}/`
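- Network isolation: each sandbox uses its own docker network, so service names and container ports do not collide between sandboxes (ports published to the host are still shared)
- Parallel runs of the same repo are safe (each run gets a different `sandbox_id`)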
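
The naming scheme is what keeps parallel runs apart. The sketch below builds both names under two assumptions. First, `sandbox_id` is 8 hex characters of a random UUID; the real id format is not specified here. Second, the repo slug is normalized to the characters Docker Compose accepts in project names: lowercase letters, digits, `-` and `_`. `new_sandbox` is a hypothetical helper.

```python
import re
import uuid


def new_sandbox(repo: str) -> tuple[str, str]:
    """Return (compose project name, sandbox dir) for a fresh sandbox of `repo`."""
    # Assumption: 8 hex chars of a random UUID; enough to keep concurrent runs distinct.
    sandbox_id = uuid.uuid4().hex[:8]
    # Compose project names allow only lowercase letters, digits, '-' and '_'.
    slug = re.sub(r"[^a-z0-9_-]+", "-", repo.lower()).strip("-")
    project = f"e2e-{slug}-{sandbox_id}"
    sandbox_dir = f"/tmp/custodian-e2e/{sandbox_id}/"
    return project, sandbox_dir
```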