the-custodian/workplans/CUST-WP-0039-dashboard-poll-optimization.md
tegwick 41ce4ede2c feat(dashboard): poll optimisation — T4, T5, T6
T4: workstreams.md and dependencies.md now call /state/deps instead of the
    full /state/summary — removes 2 heavy 10-table queries per 60 s cycle.

T5: index.md's 4 independent polling loops (summaryState, sbomSnapState,
    regsState, wsChartState) consolidated into a single pageState generator
    with one Promise.all batch and a shared backoff counter.

T6: config.js gains waitForVisible(ms) — pauses polling entirely while the
    tab is hidden and fires immediately on visibilitychange.  pollDelay()
    simplified (hidden-tab POLL_HIDDEN logic removed).  All 16 polling pages
    migrated from await sleep(pollDelay(...)) to await waitForVisible(pollDelay(...)).

CUST-WP-0039 complete — all 6 tasks done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 17:58:18 +02:00

---
id: CUST-WP-0039
type: workplan
title: "Dashboard Poll Optimization"
domain: custodian
status: done
owner: custodian
topic_slug: custodian
created: "2026-05-11"
updated: "2026-05-11"
state_hub_workstream_id: "d5ffb008-a517-4b8b-86ce-093fcc285fb3"
---
# Dashboard Poll Optimization
## Problem
With `uvicorn --reload` watching `.venv/` now fixed (CUST-WP-0039 precursor), the
remaining sustained load on the API worker comes from the dashboard polling pattern:
- **24 pages**, 14 with active polling loops (POLL_HEAVY = 60 s, POLL = 15 s)
- **`index.md` alone** runs 4 independent polling loops firing 11 API calls per cycle across 8 distinct endpoints:
`/state/summary`, `/sbom/snapshots/`, `/progress/`, `/workstreams/`, `/tasks/?limit=2000`,
`/topics/`, `/repos/`, `/workstreams/workplan-index`
- **`workstreams.md` and `dependencies.md`** each call `/state/summary` (the most
expensive endpoint — queries 10+ tables) every 60 s just to extract dependency
edges from `open_workstreams[].depends_on`
- **Reference data** (`/topics/`, `/repos/`) is fetched independently by 10+ pages
every 60 s with no caching; these datasets change rarely
- **Background tabs** still poll at 120 s (`POLL_HIDDEN`) — they could pause entirely
## Goals
Reduce API request rate and per-request cost when the dashboard is open, without
degrading UX or data freshness for the pages the user is actively viewing.
## Out of scope
- SSE / WebSocket push (would require significant API rework)
- Observable data loaders / static build mode (different deployment model)
- BroadcastChannel cross-tab sharing (nice-to-have, not in this workplan)
---
## Tasks
### T1 — Add Cache-Control headers to reference endpoints
```task
id: CUST-WP-0039-T1
status: done
priority: high
state_hub_task_id: "b36713d8-d1d5-43c5-86c3-e22f72b68d62"
```
Add `Cache-Control: max-age=60, stale-while-revalidate=30` to the list responses
for `/topics/`, `/repos/`, and `/domains/`. These datasets change only when a human
explicitly creates/renames a domain or registers a repo — never on their own.
Browser-level caching means that when 10 pages all fetch `/topics/` within a 60 s
window, only the first request hits the API; the rest are served from cache.
**Implementation:** Add a FastAPI middleware or a response-header dependency in
`api/routers/topics.py`, `repos.py`, and `domains.py` list endpoints. Use
`from fastapi.responses import Response` + `response.headers["Cache-Control"]`, or
a shared `cache_headers` dependency.
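
As a rough illustration of what the header above asks of the browser cache, the sketch below classifies a cached response by its age. The function name `cacheDisposition` is hypothetical, and real browsers differ in how fully they honour `stale-while-revalidate`, so treat this as a model of the intended behaviour rather than a guarantee.

```javascript
// Model of `Cache-Control: max-age=60, stale-while-revalidate=30`.
// cacheDisposition is a hypothetical helper, not part of the dashboard code.
function cacheDisposition(ageSeconds, maxAge = 60, staleWhileRevalidate = 30) {
  // Younger than max-age: served from cache, no request reaches the API.
  if (ageSeconds < maxAge) return "fresh";
  // Within the stale window: served from cache while a background refresh runs.
  if (ageSeconds < maxAge + staleWhileRevalidate) return "stale-revalidate";
  // Past both windows: the next fetch has to go to the API.
  return "expired";
}

console.log(cacheDisposition(10)); // a fetch 10 s after the first one stays in cache
console.log(cacheDisposition(75)); // served stale, revalidated in the background
console.log(cacheDisposition(120)); // goes to the API
```
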
---
### T2 — Add ETag support to high-frequency list endpoints
```task
id: CUST-WP-0039-T2
status: done
priority: high
state_hub_task_id: "75f1c2cd-0baf-4747-8c67-1dbfa81bde41"
```
Add `ETag` (content hash of the response body) and handle `If-None-Match` for
`/workstreams/`, `/tasks/`, and `/state/summary`. When the data hasn't changed the
API returns `304 Not Modified` with no body — roughly 95% smaller than a full
response.
**Implementation:**
- Add a FastAPI middleware (in `api/main.py`) that intercepts JSON list responses,
computes `md5(body)`, sets `ETag: "<hash>"`, and returns 304 if the request
carries a matching `If-None-Match` header.
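- No client changes needed: when the response includes `Cache-Control: no-cache`
(which forces revalidation but still lets the browser store the response), the
browser's HTTP cache sends `If-None-Match` on its own and hands `fetch()` the
cached body whenever the API answers 304. This holds as long as `apiFetch` does
not override the default `cache` mode of `fetch()`.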
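
The middleware itself would live in the FastAPI app, but the decision it makes is language-neutral. The sketch below shows that decision in JavaScript under stated assumptions: `conditionalResponse` is a hypothetical name, a small FNV-1a hash stands in for the `md5(body)` named above so the example has no dependencies, and only a single exact-match `If-None-Match` value is handled (no `*`, lists, or weak validators).

```javascript
// FNV-1a (32-bit) over UTF-16 code units: a dependency-free stand-in for md5(body).
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Decide between a full 200 response and a bodiless 304.
// `body` is the serialized JSON; `ifNoneMatch` is the request header value, if any.
function conditionalResponse(body, ifNoneMatch) {
  const etag = `"${fnv1a(body)}"`;
  if (ifNoneMatch === etag) {
    // Content unchanged since the client last saw it: no body needed.
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return {
    status: 200,
    headers: { ETag: etag, "Cache-Control": "no-cache" },
    body,
  };
}

// First poll gets the full body; an identical second poll collapses to a 304.
const firstPoll = conditionalResponse('[{"id":1}]', undefined);
const secondPoll = conditionalResponse('[{"id":1}]', firstPoll.headers.ETag);
console.log(firstPoll.status, secondPoll.status);
```
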
---
### T3 — Add lightweight `/state/deps` endpoint
```task
id: CUST-WP-0039-T3
status: done
priority: high
state_hub_task_id: "cb7608d3-5dad-4b51-9b91-080539f7aa65"
```
`workstreams.md` and `dependencies.md` call `/state/summary` (a ~10-table query)
only to extract `open_workstreams[].{id, depends_on, blocks}`. Add a dedicated
endpoint that returns just this:
```json
GET /state/deps
[{"id": "...", "title": "...", "status": "...", "depends_on": [...], "blocks": [...]}]
```
Query: `SELECT id, title, status FROM workstreams WHERE status IN ('active','blocked')`
plus the dependency join — roughly 1/10th the work of the full summary.
**Implementation:** New route in `api/routers/state.py` (or a new `deps.py`).
Schema: `WorkstreamDepStub` already exists in `api/schemas/workstream_dependency.py`
— reuse or extend it.
---
### T4 — Replace `/state/summary` in workstreams.md and dependencies.md
```task
id: CUST-WP-0039-T4
status: done
priority: medium
depends_on: [CUST-WP-0039-T3]
state_hub_task_id: "b80dce9c-b1ef-4606-9460-5100d6f58bce"
```
Switch `workstreams.md` and `dependencies.md` to use the new `/state/deps` endpoint
instead of the full `/state/summary`. Both pages construct a dep-edge map from
`open_workstreams[].depends_on`; `/state/deps` provides exactly that.
Changes:
- `dashboard/src/workstreams.md`: replace `apiFetch("/state/summary", ...)` with
`apiFetch("/state/deps")`, update the variable extraction (`openWs = depsData`)
- `dashboard/src/dependencies.md`: same substitution, update edge-building loop
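
A minimal sketch of the edge-building step against the new payload follows. It assumes that `depends_on` holds workstream ids (the plan does not spell out the element type) and that edges pointing outside the payload should be dropped; `buildDepEdges` and the `{from, to}` edge shape are hypothetical names, not the pages' actual code.

```javascript
// Build a lookup map and an edge list from a /state/deps-style payload.
// Assumption: each depends_on entry is the id of another workstream.
function buildDepEdges(depsData) {
  const byId = new Map(depsData.map((ws) => [ws.id, ws]));
  const edges = [];
  for (const ws of depsData) {
    for (const depId of ws.depends_on ?? []) {
      // Skip edges to workstreams that are not in the payload
      // (for example ones that are no longer active or blocked).
      if (byId.has(depId)) edges.push({ from: depId, to: ws.id });
    }
  }
  return { byId, edges };
}

// Illustrative data only; real ids are UUIDs.
const depsData = [
  { id: "ws-a", title: "A", status: "active", depends_on: [], blocks: ["ws-b"] },
  { id: "ws-b", title: "B", status: "blocked", depends_on: ["ws-a", "ws-gone"], blocks: [] },
];
const { edges } = buildDepEdges(depsData);
console.log(edges);
```
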
---
### T5 — Consolidate index.md's 4 polling loops into 1
```task
id: CUST-WP-0039-T5
status: done
priority: medium
state_hub_task_id: "7c2d5e01-9de5-48ad-aa0b-a37cf5332ad9"
```
`index.md` runs 4 independent `while(true)` generators (`summaryState`,
`sbomSnapState`, `regsState`, `wsChartState`) that each sleep 60 s independently.
They were split because different sections needed different data, but they all use
POLL_HEAVY and can be unified into a single loop with one `Promise.all` that fetches
all 8 endpoints together.
Benefits:
- 4 timers → 1: simpler, predictable, backoff applies uniformly
- Fetch batching: all 8 requests fire simultaneously, most finish within the same
server round-trip window
- Simpler failure handling: one `failures` counter, one backoff
Approach: single `pageState` generator that yields a flat object with all fields
(summary, snapshots, milestones, wsAll). Destructure at the use sites.
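
The approach above could look roughly like the sketch below. The endpoint paths come from the Problem section; everything else is an assumption made for illustration: the key names in `ENDPOINTS` (the plan does not map endpoints to fields such as `milestones`), the `backoffDelay` policy, and the injected `fetchJson(path)` and `wait(ms)` parameters, which stand in for the dashboard's own fetch and delay helpers so the loop can run without a network.

```javascript
// Endpoints from the Problem section; the key names are illustrative.
const ENDPOINTS = {
  summary: "/state/summary",
  snapshots: "/sbom/snapshots/",
  progress: "/progress/",
  wsAll: "/workstreams/",
  tasks: "/tasks/?limit=2000",
  topics: "/topics/",
  repos: "/repos/",
  workplanIndex: "/workstreams/workplan-index",
};

const POLL_HEAVY = 60_000; // 60 s, as in the workplan

// Hypothetical backoff policy: double the delay per consecutive failure, capped at 8x.
function backoffDelay(base, failures) {
  return base * Math.min(2 ** failures, 8);
}

// One loop, one Promise.all batch, one shared failure counter.
async function* pageState(fetchJson, wait) {
  let failures = 0;
  const keys = Object.keys(ENDPOINTS);
  while (true) {
    try {
      // All 8 requests start together; a single rejection fails the whole batch.
      const results = await Promise.all(keys.map((k) => fetchJson(ENDPOINTS[k])));
      failures = 0;
      // Flat object, destructured at the use sites: const { summary, wsAll } = state
      yield Object.fromEntries(keys.map((k, i) => [k, results[i]]));
    } catch (err) {
      failures += 1;
      yield { error: String(err) };
    }
    await wait(backoffDelay(POLL_HEAVY, failures));
  }
}
```
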
---
### T6 — Full visibility-based polling pause in config.js
```task
id: CUST-WP-0039-T6
status: done
priority: low
state_hub_task_id: "31b6a353-040a-4f87-b2f1-1deab5cf6191"
```
`pollDelay()` currently extends the interval to `POLL_HIDDEN = 120 s` when the tab
is hidden. Change this to pause polling entirely while hidden and resume immediately
on `visibilitychange`.
**Implementation:**
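```js
// config.js: add alongside the simplified pollDelay().
// Assumes the existing sleep(ms) helper is in scope.
export async function waitForVisible(base) {
  // Non-browser context: fall back to a plain sleep.
  if (typeof document === "undefined") return sleep(base);
  // Visible tab: wait the normal interval first.
  if (document.visibilityState === "visible") await sleep(base);
  // Still visible after the wait: let the caller poll now.
  if (document.visibilityState === "visible") return;
  // Hidden (before or during the wait): block until the tab is visible again.
  return new Promise(resolve => {
    const handler = () => {
      if (document.visibilityState !== "visible") return;
      document.removeEventListener("visibilitychange", handler);
      resolve();
    };
    document.addEventListener("visibilitychange", handler);
  });
}
```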
Pages replace `await sleep(pollDelay(...))` with `await waitForVisible(pollDelay(...))`.
When the user switches back to the tab, the next poll fires immediately rather
than waiting up to 120 s for the hidden-tab interval to elapse.
---
## Expected impact
| Change | Expected effect |
|--------|------------------|
| T1 (cache headers) | ~70% drop in /topics, /repos, /domains hits |
| T2 (ETags) | ~80% payload reduction for unchanged list responses |
| T3+T4 (deps endpoint) | 2 full summary calls removed per 60 s cycle |
| T5 (consolidate index) | 4 loops → 1, reduces timer jitter and staggered load |
| T6 (visibility pause) | Eliminates background-tab polling traffic |