diff --git a/workplans/REUSE-WP-0013-registry-establish-and-llm-assist.md b/workplans/REUSE-WP-0013-registry-establish-and-llm-assist.md
new file mode 100644
index 0000000..cfeee06
--- /dev/null
+++ b/workplans/REUSE-WP-0013-registry-establish-and-llm-assist.md
@@ -0,0 +1,241 @@
+---
+id: REUSE-WP-0013
+type: workplan
+title: "Registry establish, update, and stats with optional llm-connect assist"
+domain: helix_forge
+repo: reuse-surface
+status: ready
+owner: codex
+topic_slug: helix-forge
+created: "2026-06-16"
+updated: "2026-06-16"
+---
+
+# Registry establish, update, and stats with optional llm-connect assist
+
+Follow-up to operator feedback and REUSE-WP-0012-T01 blocks
+(`history/2026-06-16-hub-registration-blocks.md`). Sibling repos cannot federate
+until each publishes `registry/indexes/capabilities.yaml`; today that requires
+manual layout copy and authoring. This workplan adds **deterministic bootstrap
+and analytics CLI** plus **optional llm-connect-backed discovery and refresh**
+so domain repos can establish and maintain registries with low ceremony.
+
+**Baseline vector:** `D5 / A4 / C5 / R3`
+**Target vector after completion:** `D5 / A4 / C5 / R3` (tooling depth; no
+product-vector promotion unless dogfood evidence warrants)
+
+## Problem statement
+
+| Pain | Today | Target |
+|---|---|---|
+| Bootstrap registry in sibling repo | Manual copy from reuse-surface | `reuse-surface establish` |
+| Keep entries aligned with repo reality | Manual edits + validate | `reuse-surface update` |
+| Portfolio / federation readiness view | `report cohorts`, manual curls | `reuse-surface stats` |
+| Draft capability metadata from repo context | Agent improvisation | llm-connect structured extract |
+
+## Design principles
+
+1. **Deterministic first**: `stats`, `establish --scaffold`, and
+   `establish --publish-check` work without llm-connect or API keys.
+2. **LLM optional**: `establish --discover` and `update --suggest` call
+   llm-connect HTTP (`POST /execute`) or library adapter when configured.
+3. **Dry-run default**: LLM and write paths require explicit `--apply`.
+4. **Validate gate**: every `--apply` path ends with `reuse-surface validate`
+   (or fails with schema errors).
+5. **Boundary**: reuse-surface does not embed provider keys; llm-connect owns
+   routing and credentials.
+
+## Suggested execution order
+
+```text
+T01 stats (deterministic, unblocks observability)
+  → T02 establish --scaffold (unblocks sibling publish path)
+  → T03 establish --publish-check (verifies raw URL / federation readiness)
+  → T04 llm-connect bridge + draft JSON schema
+  → T05 establish --discover
+  → T06 update (deterministic signals + optional LLM suggest)
+  → T07 docs, tests, gap-analysis note
+```
+
+## Dependencies
+
+| Dependency | Owner | Notes |
+|---|---|---|
+| llm-connect server or package | llm-connect | `LLM_CONNECT_URL` or editable install |
+| Capability JSON schema | reuse-surface | `schemas/capability.schema.yaml` unchanged |
+| Sibling repo apply | Domain owners | `establish` run in target repo checkout |
+| Hub token | Operator | `hub register` remains separate post-publish |
+
+## Proposed CLI surface
+
+```bash
+# Deterministic
+reuse-surface stats
+reuse-surface stats --format json --federation-ready
+reuse-surface establish --scaffold --domain helix_forge [--path .]
+reuse-surface establish --publish-check --raw-url
+
+# LLM-assisted (optional backend)
+export LLM_CONNECT_URL=http://127.0.0.1:8088
+reuse-surface establish --discover [--path .] --dry-run
+reuse-surface establish --discover --apply
+reuse-surface update --capability --dry-run
+reuse-surface update --all --suggest-maturity --dry-run
+reuse-surface update --from-git-since HEAD~5 --apply
+```
+
+---
+
+## Add Registry Stats Command
+
+```task
+id: REUSE-WP-0013-T01
+status: todo
+priority: high
+```
+
+Deliver `reuse-surface stats` with deterministic aggregates:
+
+- Capability count; maturity histogram (D/A/C/R bands)
+- Entries at R0–R2 vs R3+; consumption mode counts
+- Index vs entry vector drift count
+- Federation readiness: local `sources.yaml` / index presence; optional
+  `--federation-ready` raw URL probe (HTTP status)
+- Hub summary when `REUSE_SURFACE_URL` set (registration count)
+
+Output: Markdown default, `--format json`. Pytest coverage. Document in
+`tools/README.md`.
+
+## Implement establish --scaffold
+
+```task
+id: REUSE-WP-0013-T02
+status: todo
+priority: high
+```
+
+Bootstrap `registry/` in target repo (`--path`, default cwd):
+
+- `registry/README.md` (pointer to reuse-surface schema and validate)
+- `registry/capabilities/.gitkeep`
+- `registry/indexes/capabilities.yaml` with `version`, `domain`, `capabilities: []`
+- Refuse overwrite unless `--force`
+- Print next steps: add entry, validate, merge to main, publish-check
+
+No llm-connect dependency. Pytest with temp directory.
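The scaffold behaviour listed for T02 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the planned implementation: the function name `scaffold_registry`, the README text, and the `version: 1` value are hypothetical (the workplan only says the index carries `version`, `domain`, and `capabilities: []`).

```python
from pathlib import Path
import tempfile

# Assumed template contents; the workplan does not fix the exact text.
README_TEMPLATE = "# Registry\n\nSee reuse-surface for the schema and `reuse-surface validate`.\n"
INDEX_TEMPLATE = "version: 1\ndomain: {domain}\ncapabilities: []\n"  # version value is an assumption


def scaffold_registry(root: Path, domain: str, force: bool = False) -> list[Path]:
    """Create an empty registry layout under root; refuse to overwrite unless force is set."""
    files = {
        root / "registry" / "README.md": README_TEMPLATE,
        root / "registry" / "capabilities" / ".gitkeep": "",
        root / "registry" / "indexes" / "capabilities.yaml": INDEX_TEMPLATE.format(domain=domain),
    }
    existing = [path for path in files if path.exists()]
    if existing and not force:
        names = ", ".join(str(path) for path in existing)
        raise FileExistsError(f"refusing to overwrite without force: {names}")
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    return sorted(files)


if __name__ == "__main__":
    # Exercise the sketch in a temporary directory, as the task suggests for pytest.
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold_registry(Path(tmp), "helix_forge"):
            print(created.relative_to(tmp))
```

Checking for existing files before writing anything keeps a refused run from leaving a half-written layout behind.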
+
+## Implement establish --publish-check
+
+```task
+id: REUSE-WP-0013-T03
+status: todo
+priority: medium
+```
+
+Federation publish helper for sibling repo operators:
+
+- `curl`-equivalent probe of `--raw-url` (status, content-type hint)
+- Validate local index YAML if `--path` has `registry/indexes/capabilities.yaml`
+- Report pass/fail with remediation tied to `docs/RegistryFederation.md`
+- Exit non-zero on hard failures (non-200, invalid YAML)
+
+## Add llm-connect Bridge And Draft Schema
+
+```task
+id: REUSE-WP-0013-T04
+status: todo
+priority: high
+```
+
+Thin client boundary:
+
+- `reuse_surface/llm_bridge.py`: `POST {LLM_CONNECT_URL}/execute`, parse JSON
+  from response content
+- `schemas/registry-draft.schema.json`: structured draft shape (capability list
+  with id, name, summary, vector, tags, consumption_modes, discovery intent)
+- Env: `LLM_CONNECT_URL` (default none); graceful error when missing on LLM paths
+- Optional `pyproject.toml` extra: `llm` → `llm-connect` dependency
+- Pytest with mocked HTTP (no live LLM in CI)
+
+## Implement establish --discover
+
+```task
+id: REUSE-WP-0013-T05
+status: todo
+priority: medium
+```
+
+LLM-assisted bootstrap after `--scaffold` or on empty registry:
+
+- Collect context files: `INTENT.md`, `SCOPE.md`, `README*`, `pyproject.toml`,
+  `AGENTS.md`, top-level package dirs (configurable `--context-max-files`)
+- Prompt template + schema-constrained JSON via llm-connect
+- `--dry-run`: print proposed entries and index rows
+- `--apply`: write `registry/capabilities/*.md` from template merge + index update;
+  run validate before success exit
+- Document prompt assumptions and review checklist in `registry/README.md`
+
+## Implement update Command
+
+```task
+id: REUSE-WP-0013-T06
+status: todo
+priority: medium
+```
+
+Refresh existing entries from repo signals:
+
+**Deterministic (no LLM):**
+
+- Index/entry vector mismatch detection
+- New paths under `tests/`, CLI modules → suggest `evidence.tests` /
+  `availability.current_artifacts` additions
+- `--apply` for safe deterministic patches only (explicit list in code)
+
+**Optional LLM (`--suggest`, `--suggest-maturity`):**
+
+- Git diff or file snapshot → proposed `promotion_history`, evidence notes
+- Always `--dry-run` unless `--apply`; validate after apply
+
+Targets: single `--capability`, `--all`, `--from-git-since `.
+
+## Documentation, Tests, And Gap Note
+
+```task
+id: REUSE-WP-0013-T07
+status: todo
+priority: low
+```
+
+- `tools/README.md`: full command reference
+- `docs/RegistryFederation.md`: link establish/publish-check to sibling onboarding
+- `docs/IntentScopeGapAnalysis.md`: add priority **24** (registry bootstrap
+  tooling); mark open
+- `SCOPE.md`: update "What Is Possible Now" when T01–T03 ship (incremental)
+- CI: `stats` informational step; no llm-connect in CI
+- Total pytest count increases; `reuse-surface validate` unchanged
+
+---
+
+## Acceptance
+
+- [ ] `reuse-surface stats` reports maturity and federation-readiness aggregates
+- [ ] `establish --scaffold` creates valid empty registry layout without overwrite accidents
+- [ ] `establish --publish-check` detects 303 vs 200 raw URL outcomes
+- [ ] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
+- [ ] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
+- [ ] `update --dry-run` reports deterministic drift on sample repo
+- [ ] All new commands documented; gap priority 24 recorded
+
+## Out of scope
+
+- Auto `hub register` (still operator step with token)
+- Auto-merge LLM output without human review path
+- Embedding similarity or overlap ML (keep `overlaps` token heuristic)
+- llm-connect hosting or provider configuration inside reuse-surface
+
+## Dogfood target
+
+Run `establish --scaffold` and `establish --publish-check` against `state-hub`
+checkout when available; optional `establish --discover` to seed
+`capability.statehub.workstream-coordinate` from existing docs.
\ No newline at end of file
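
The deterministic aggregates listed for `reuse-surface stats` (T01) could be computed along the following lines. This is a hedged sketch only: the input shapes (`index_rows` as dicts with `id` and `vector`, `entry_vectors` as an id-to-vector mapping), the function names, and the output keys are assumptions made for illustration. Only the `D5 / A4 / C5 / R3` vector notation and the R0–R2 vs R3+ split come from the workplan.

```python
import json
import re
from collections import Counter

# Matches vectors written like "D5 / A4 / C5 / R3" (single-digit levels assumed).
VECTOR_RE = re.compile(r"^D(\d)\s*/\s*A(\d)\s*/\s*C(\d)\s*/\s*R(\d)$")


def parse_vector(text: str) -> dict[str, int]:
    """Split a vector string into its D/A/C/R levels."""
    match = VECTOR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised vector: {text!r}")
    return dict(zip("DACR", (int(level) for level in match.groups())))


def registry_stats(index_rows: list[dict], entry_vectors: dict[str, str]) -> dict:
    """Aggregate capability count, maturity histogram, R-band split, and index/entry drift."""
    histogram: Counter = Counter()
    low_r = high_r = drift = 0
    for row in index_rows:
        levels = parse_vector(row["vector"])
        for band, level in levels.items():
            histogram[f"{band}{level}"] += 1
        if levels["R"] <= 2:
            low_r += 1
        else:
            high_r += 1
        # Drift: the entry file records a different vector than the index row.
        entry = entry_vectors.get(row["id"])
        if entry is not None and parse_vector(entry) != levels:
            drift += 1
    return {
        "capability_count": len(index_rows),
        "maturity_histogram": dict(sorted(histogram.items())),
        "entries_r0_r2": low_r,
        "entries_r3_plus": high_r,
        "vector_drift_count": drift,
    }


if __name__ == "__main__":
    rows = [
        {"id": "capability.example.one", "vector": "D5 / A4 / C5 / R3"},
        {"id": "capability.example.two", "vector": "D2 / A1 / C1 / R1"},
    ]
    entries = {"capability.example.two": "D2 / A1 / C1 / R2"}
    print(json.dumps(registry_stats(rows, entries), indent=2))
```

Returning a plain dict keeps the `--format json` path a direct `json.dumps` call, with the Markdown default rendered from the same structure.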