the-custodian/e2e-framework/RUNBOOK.md
tegwick d061c777d1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for the-custodian
2026-03-27 00:52:18 +01:00


E2E Sandbox Framework — Runbook

Prerequisites

Workstation:

  • ssh + rsync available
  • python3 with pyyaml installed (or uv, via uv run)
  • state-hub running on port 8000 (for result reporting; not needed with --no-report)

Sandbox host (railiance01):

  • SSH key access
  • Docker + docker compose plugin installed
  • Sufficient disk for images (~4 GB for activity-core stack)
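Before a first run, it can help to check the workstation side of this list automatically. The sketch below is a hypothetical preflight helper, not part of e2e_framework. `missing_tools`, `has_pyyaml` and `port_open` are illustrative names, and the helper assumes state-hub listens on 127.0.0.1:8000. Sandbox-host items (Docker, disk space) still need to be checked over ssh.

```python
import importlib.util
import shutil
import socket


def missing_tools(tools=("ssh", "rsync", "python3"), which=shutil.which):
    """Return the workstation tools that are not on PATH."""
    return [tool for tool in tools if which(tool) is None]


def has_pyyaml():
    """True if the `yaml` module (pyyaml) is importable in this interpreter."""
    return importlib.util.find_spec("yaml") is not None


def port_open(host="127.0.0.1", port=8000, timeout=1.0):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    problems = missing_tools()
    # pyyaml is only needed when not running through uv.
    if not has_pyyaml() and shutil.which("uv") is None:
        problems.append("pyyaml (or uv)")
    # Assumed state-hub address; only matters if results are reported.
    if not port_open():
        problems.append("state-hub on 127.0.0.1:8000")
    print("preflight OK" if not problems else "missing: " + ", ".join(problems))
```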

First run

# Set sandbox host (once, or add to ~/.bashrc / .env)
export RAILIANCE01_HOST=<ip-or-alias>   # e.g. 92.205.130.254
export RAILIANCE01_USER=root             # optional, default=root
export RAILIANCE01_KEY=~/.ssh/id_rsa    # optional, uses ssh default otherwise

# From the-custodian:
make e2e REPO=activity-core

The output shows each step in order: rsync → compose up → health wait → tests → compose down. The exit code is 0 if everything passed and 1 on any failure.
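The step sequence and exit code can be pictured as a small driver loop. This is a minimal sketch, assuming each step is a callable that returns True on success, and that compose down runs even after a failure (as `cleanup: always` suggests) unless KEEP=1 is set. `run_pipeline` and `Step` are hypothetical names, not the framework's API.

```python
from typing import Callable, List, Tuple

# A step is a label plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step], teardown: Step, keep: bool = False,
                 log: Callable[[str], None] = print) -> int:
    """Run steps in order and stop at the first failure.

    The teardown step (compose down) always runs afterwards unless keep is
    True. Returns 0 if every step passed and 1 otherwise.
    """
    try:
        for name, step in steps:
            log(f"-> {name}")
            if not step():
                log(f"FAILED: {name}")
                return 1
        return 0
    except Exception as exc:  # a crashing step counts as a failure
        log(f"ERROR: {exc!r}")
        return 1
    finally:
        if keep:
            log("KEEP=1: sandbox left running for debugging")
        else:
            log(f"-> {teardown[0]}")
            teardown[1]()
```

Stopping at the first failure keeps tests from running against a stack whose health wait failed. The `finally` block guarantees that the teardown still happens.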

Options

# Keep sandbox alive after run (for debugging)
make e2e REPO=activity-core KEEP=1

# Override the host without setting RAILIANCE01_HOST
make e2e REPO=activity-core HOST=192.168.1.50

# Attach result to a specific state-hub workstream
make e2e REPO=activity-core WORKSTREAM_ID=<uuid>

# Skip posting to state-hub
cd the-custodian && python3 -m e2e_framework ~/activity-core --no-report
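The sketch below shows how HOST and the RAILIANCE01_* variables might combine into an ssh target. The precedence (HOST= beats RAILIANCE01_HOST) and the helper name `ssh_base_args` are assumptions for illustration. The defaults come from the First run section: user root, and ssh's own key when RAILIANCE01_KEY is unset.

```python
import os
from typing import List, Mapping, Optional


def ssh_base_args(env: Mapping[str, str] = os.environ,
                  host: Optional[str] = None) -> List[str]:
    """Build the leading ssh argv for the sandbox host."""
    target = host or env.get("RAILIANCE01_HOST")  # HOST= wins (assumed)
    if not target:
        raise ValueError("set RAILIANCE01_HOST or pass HOST=<ip-or-alias>")
    user = env.get("RAILIANCE01_USER") or "root"
    args = ["ssh"]
    key = env.get("RAILIANCE01_KEY")
    if key:  # otherwise ssh falls back to its default identity files
        args += ["-i", os.path.expanduser(key)]
    args.append(f"{user}@{target}")
    return args
```

For example, `subprocess.run(ssh_base_args() + ["docker compose ls"], check=True)` would list compose projects on the sandbox host.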

Adding a new repo

  1. Create <repo>/e2e/e2e.yml:

    name: <repo-slug>
    compose_file: docker-compose.dev.yml   # or e2e/compose.yml
    health_checks:
      - name: <service>
        url: http://localhost:<port>
        timeout: 120
    test_command: uv run python -m pytest e2e/tests/ -v
    timeout: 300
    cleanup: always
    
  2. Add <repo>/e2e/tests/test_*.py: test scripts that exit 0 on success (test_command above runs them).

  3. Run: make e2e REPO=<repo>
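A loader for e2e.yml might turn the YAML into typed objects and fail early on missing keys. In this sketch, `raw` is the dict you would get from `yaml.safe_load` (pyyaml). The class names, the choice of required keys, and the defaults (120 s per health check, 300 s overall, `cleanup: always`) are assumptions borrowed from the example above, not the framework's documented schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class HealthCheck:
    name: str
    url: str
    timeout: int = 120  # seconds to wait for this service (assumed default)


@dataclass
class E2EConfig:
    name: str
    compose_file: str
    test_command: str
    health_checks: List[HealthCheck] = field(default_factory=list)
    timeout: int = 300        # overall test timeout in seconds (assumed default)
    cleanup: str = "always"


def parse_config(raw: Dict[str, Any]) -> E2EConfig:
    """Validate a parsed e2e.yml dict; unknown health-check keys raise TypeError."""
    missing = [k for k in ("name", "compose_file", "test_command") if k not in raw]
    if missing:
        raise ValueError("e2e.yml missing keys: " + ", ".join(missing))
    checks = [HealthCheck(**hc) for hc in raw.get("health_checks", [])]
    return E2EConfig(
        name=raw["name"],
        compose_file=raw["compose_file"],
        test_command=raw["test_command"],
        health_checks=checks,
        timeout=int(raw.get("timeout", 300)),
        cleanup=str(raw.get("cleanup", "always")),
    )
```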

Troubleshooting

Sandbox not cleaned up:

ssh root@$RAILIANCE01_HOST 'ls /tmp/custodian-e2e/'
ssh root@$RAILIANCE01_HOST 'docker compose ls'
# Manually clean:
ssh root@$RAILIANCE01_HOST 'docker compose -p e2e-activity-core-<id> down -v; rm -rf /tmp/custodian-e2e/<id>'
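The manual cleanup above can be scripted. This hypothetical helper assumes the project and directory naming from the Architecture notes. It refuses ids that are not valid Compose project names, so a typo cannot turn the `rm -rf` into a wipe of /tmp/custodian-e2e/ itself.

```python
import re
import shlex
from typing import List

# Compose project names: lowercase letters, digits, dashes and underscores,
# starting with a letter or digit.
_SAFE_ID = re.compile(r"[a-z0-9][a-z0-9_-]*")


def cleanup_command(repo: str, sandbox_id: str, host: str,
                    user: str = "root") -> List[str]:
    """Build the ssh argv that tears down one leftover sandbox."""
    if not (_SAFE_ID.fullmatch(repo) and _SAFE_ID.fullmatch(sandbox_id)):
        # Refuse anything that could widen rm -rf (e.g. "", "..", "a/b").
        raise ValueError(f"unsafe repo or sandbox id: {repo!r}, {sandbox_id!r}")
    project = f"e2e-{repo}-{sandbox_id}"
    sandbox_dir = f"/tmp/custodian-e2e/{sandbox_id}"
    remote = (f"docker compose -p {shlex.quote(project)} down -v; "
              f"rm -rf {shlex.quote(sandbox_dir)}")
    return ["ssh", f"{user}@{host}", remote]
```

You could run it with `subprocess.run(cleanup_command(...), check=True)`. Passing an argv list avoids a local shell, and `shlex.quote` protects the remote one.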

Temporal startup slow (>2 min): Elasticsearch alone takes 60 to 90 seconds to start, and the health check waits up to 180 s. If it times out, check the Elasticsearch logs:

ssh root@$RAILIANCE01_HOST 'docker logs --tail 20 temporal-elasticsearch'
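The health wait amounts to polling each health_checks URL until it answers or its timeout expires. Below is a minimal sketch that uses only the standard library and assumes any 2xx/3xx response means healthy. `wait_for_healthy` and its parameters are illustrative, not the framework's implementation.

```python
import http.client
import time
import urllib.request


def wait_for_healthy(url: str, timeout: float = 180.0, interval: float = 2.0,
                     request_timeout: float = 5.0) -> bool:
    """Poll url until it returns a 2xx/3xx status or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if 200 <= resp.status < 400:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeouts and HTTP 4xx/5xx (HTTPError is
            # an OSError subclass) all mean "not healthy yet".
            pass
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```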

Worker fails to start: check that uv is installed on the sandbox host, and install it if it is missing:

ssh root@$RAILIANCE01_HOST 'which uv || curl -LsSf https://astral.sh/uv/install.sh | sh'

rsync excluded paths: .git, __pycache__, *.pyc, .venv and node_modules are not copied. Because .venv is excluded, the remote has no virtualenv after rsync; uv run creates and syncs it before running the test command.
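For reference, an rsync invocation with these exclusions might be built as follows. The function name and argument layout are hypothetical, while `-a`, `-z`, `-e` and `--exclude=PATTERN` are standard rsync options. Because the argv is passed without a shell, `*.pyc` reaches rsync as a pattern instead of being glob-expanded locally.

```python
import os
import shlex
from typing import List, Optional

# Paths this runbook lists as excluded from the sync.
EXCLUDES = [".git", "__pycache__", "*.pyc", ".venv", "node_modules"]


def rsync_command(src: str, host: str, dest_dir: str, user: str = "root",
                  key: Optional[str] = None) -> List[str]:
    """Build an rsync argv that copies a repo checkout to the sandbox host."""
    ssh = "ssh" if key is None else f"ssh -i {shlex.quote(os.path.expanduser(key))}"
    cmd = ["rsync", "-az", "-e", ssh]
    cmd += [f"--exclude={pattern}" for pattern in EXCLUDES]
    # A trailing slash on the source copies the repo's contents, not the folder.
    cmd += [os.path.expanduser(src).rstrip("/") + "/", f"{user}@{host}:{dest_dir}"]
    return cmd
```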

Architecture notes

  • Sandbox isolation: docker compose project name e2e-{repo}-{sandbox_id}
  • Sandbox dir: /tmp/custodian-e2e/{sandbox_id}/
  • No port conflicts: each sandbox uses its own docker network
  • Parallel runs of the same repo are safe (different sandbox_id)
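The naming scheme above can be expressed as a tiny value type. This sketch assumes an 8-hex-character random sandbox_id, though the real id format may differ. `Sandbox` and `new_sandbox` are hypothetical names. The `network` property follows Compose's default `<project>_default` naming, which is what gives each project its own network.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sandbox:
    repo: str
    sandbox_id: str

    @property
    def project(self) -> str:
        """docker compose project name for this run."""
        return f"e2e-{self.repo}-{self.sandbox_id}"

    @property
    def directory(self) -> str:
        """Remote working directory the repo is synced into."""
        return f"/tmp/custodian-e2e/{self.sandbox_id}/"

    @property
    def network(self) -> str:
        """Compose's default network name for the project."""
        return f"{self.project}_default"


def new_sandbox(repo: str) -> Sandbox:
    # 8 random lowercase hex chars (assumed format), valid in project names.
    return Sandbox(repo=repo, sandbox_id=secrets.token_hex(4))
```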