fix(STATE-WP-0064): parse consistency sweep stdout with skip prefixes

Extract the JSON payload from mixed script output and document Railiance01
kubectl sync steps. Mark T02 done after cluster bridge and resolver canaries.
This commit is contained in:
2026-06-21 20:56:35 +02:00
parent acc5bea15b
commit 821b5d6c89
4 changed files with 62 additions and 14 deletions

View File

@@ -50,11 +50,27 @@ def _parse_stderr(stderr: str) -> dict[str, list[str]]:
}
def _extract_json_payload(text: str) -> Any:
stripped = text.strip()
if not stripped:
return []
decoder = json.JSONDecoder()
for index, char in enumerate(stripped):
if char not in "{[":
continue
try:
payload, _end = decoder.raw_decode(stripped, index)
return payload
except json.JSONDecodeError:
continue
raise json.JSONDecodeError("No JSON payload found", stripped, 0)
def _parse_stdout(stdout: str) -> list[ConsistencySweepRepoResult]:
text = stdout.strip()
if not text:
return []
payload = json.loads(text)
payload = _extract_json_payload(text)
items = payload if isinstance(payload, list) else [payload]
results: list[ConsistencySweepRepoResult] = []
for item in items:

View File

@@ -33,23 +33,31 @@ service target).
From the activity-core host, confirm the definition is synced and the
Temporal schedule exists:
```bash
cd ~/activity-core
ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian make sync-activity-definitions
```
Reconcile Temporal schedules (pick one):
Run on **Railiance01** (the laptop `.env` points at docker-compose hostnames
like `app-db` and will time out from WSL):
```bash
# Preferred when activity-core API is up (no worker restart)
curl -s -X POST 'http://localhost:8010/admin/sync?definitions=true&schedules=true'
export KUBECONFIG=~/.kube/config-hosteurope
# CLI fallback
ACTCORE_DB_URL=... TEMPORAL_HOST=... uv run python -m activity_core.sync_schedules
# 1. Apply runtime manifest when definitions change
kubectl apply -f ~/activity-core/k8s/railiance/20-runtime.yaml
# 2. Sync definitions into Postgres
kubectl -n activity-core delete job actcore-sync --ignore-not-found
kubectl apply -f ~/activity-core/k8s/railiance/20-runtime.yaml
kubectl -n activity-core wait --for=condition=complete job/actcore-sync --timeout=180s
# 3. Reconcile Temporal schedules
kubectl -n activity-core exec deploy/actcore-worker -- python -m activity_core.sync_schedules
```
On Railiance01, use the in-cluster activity-core API URL and env from the
deployment instead of `localhost:8010`.
After changing application code, rebuild and import `activity-core:railiance01-prod`
per `activity-core/k8s/railiance/README.md`, then restart
`actcore-worker`, `actcore-api`, and `actcore-event-router`.
Ensure `state-hub-railiance01` ops-bridge tunnel is `connected` before
cluster-triggered sweeps; the in-cluster bridge proxy allows up to 360s for
POST requests.
Expected definition:

View File

@@ -97,6 +97,18 @@ def test_parse_stderr_extracts_skip_lists():
}
def test_extract_json_payload_skips_human_readable_prefix_lines():
stdout = (
" CLEAN (skipped): quiet-repo\n"
" BUDGET EXHAUSTED after 30s (skipped): other-repo\n"
'{\n "repo_slug": "state-hub",\n "repo_path": "/home/worsch/state-hub",\n'
' "result": "pass",\n "summary": {"fail": 0, "warn": 0, "info": 0},\n'
' "fixes_applied": []\n}\n'
)
payload = sweep_service._extract_json_payload(stdout)
assert payload["repo_slug"] == "state-hub"
def test_parse_stdout_handles_single_and_batch_payloads():
single = json.dumps(
{

View File

@@ -92,7 +92,7 @@ Done 2026-06-21:
```task
id: STATE-WP-0064-T02
status: todo
status: done
priority: high
state_hub_task_id: "2e9b5b66-a7b1-46a5-8e1f-22e6b5caeff6"
```
@@ -104,6 +104,18 @@ Trigger one manual ActivityRun. Confirm:
- progress or activity-core run history shows success
- no duplicate side-effects when local timer also fires (idempotent)
Done 2026-06-21:
- Applied `20-runtime.yaml` on Railiance01; `actcore-sync` upserted definition
`7c4e9a12-8f3b-4d5e-9c6a-1b2d3e4f5a6b` (paused schedule).
- Rebuilt/imported `activity-core:railiance01-prod` with
`consistency_sweep_remote_all` resolver.
- Bridge proxy POST timeout raised to 360s (30s was aborting sweeps).
- Manual canaries: cluster POST via bridge (`exit_code 0`, progress event
`65d0bc12-…`) and worker resolver (`exit_code 0`, 1 repo @ 60s budget).
- Laptop `make sync-activity-definitions` is not valid against Railiance01 DB;
use kubectl `actcore-sync` job instead.
## T3 — Parallel run and observability
```task