Implement REUSE-WP-0013 registry establish, update, and stats

Add stats, establish (scaffold, publish-check, discover), and update CLI
commands with optional llm-connect bridge, validate --root for sibling repos,
pytest coverage, and documentation for sibling registry onboarding.
2026-06-16 01:21:01 +02:00
parent fb712b4b98
commit 70a5003f6e
19 changed files with 1740 additions and 30 deletions


@@ -0,0 +1,256 @@
---
id: REUSE-WP-0013
type: workplan
title: "Registry establish, update, and stats with optional llm-connect assist"
domain: helix_forge
repo: reuse-surface
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-16"
updated: "2026-06-17"
state_hub_workstream_id: "239a0077-8593-4dc7-918d-4c23895275f6"
---
# Registry establish, update, and stats with optional llm-connect assist
Follow-up to operator feedback and REUSE-WP-0012-T01 blocks
(`history/2026-06-16-hub-registration-blocks.md`). Sibling repos cannot federate
until each publishes `registry/indexes/capabilities.yaml`; today that requires
manual layout copy and authoring. This workplan adds **deterministic bootstrap
and analytics CLI** plus **optional llm-connect-backed discovery and refresh**
so domain repos can establish and maintain registries with low ceremony.
**Baseline vector:** `D5 / A4 / C5 / R3`
**Target vector after completion:** `D5 / A4 / C5 / R3` (tooling depth; no
product-vector promotion unless dogfood evidence warrants)
## Problem statement
| Pain | Today | Target |
|---|---|---|
| Bootstrap registry in sibling repo | Manual copy from reuse-surface | `reuse-surface establish` |
| Keep entries aligned with repo reality | Manual edits + validate | `reuse-surface update` |
| Portfolio / federation readiness view | `report cohorts`, manual curls | `reuse-surface stats` |
| Draft capability metadata from repo context | Agent improvisation | llm-connect structured extract |
## Design principles
1. **Deterministic first:** `stats`, `establish --scaffold`, and
`establish --publish-check` work without llm-connect or API keys.
2. **LLM optional:** `establish --discover` and `update --suggest` call
llm-connect over HTTP (`POST /execute`) or through a library adapter when configured.
3. **Dry-run default:** LLM and write paths require explicit `--apply`.
4. **Validate gate:** every `--apply` path ends with `reuse-surface validate`
(or fails with schema errors).
5. **Boundary:** reuse-surface does not embed provider keys; llm-connect owns
routing and credentials.
## Suggested execution order
```text
T01 stats (deterministic, unblocks observability)
→ T02 establish --scaffold (unblocks sibling publish path)
→ T03 establish --publish-check (verifies raw URL / federation readiness)
→ T04 llm-connect bridge + draft JSON schema
→ T05 establish --discover
→ T06 update (deterministic signals + optional LLM suggest)
→ T07 docs, tests, gap-analysis note
```
## Dependencies
| Dependency | Owner | Notes |
|---|---|---|
| llm-connect server or package | llm-connect | `LLM_CONNECT_URL` or editable install |
| Capability JSON schema | reuse-surface | `schemas/capability.schema.yaml` unchanged |
| Sibling repo apply | Domain owners | `establish` run in target repo checkout |
| Hub token | Operator | `hub register` remains separate post-publish |
## Proposed CLI surface
```bash
# Deterministic
reuse-surface stats
reuse-surface stats --format json --federation-ready
reuse-surface establish --scaffold --domain helix_forge [--path .]
reuse-surface establish --publish-check --raw-url <gitea-raw-url>
# LLM-assisted (optional backend)
export LLM_CONNECT_URL=http://127.0.0.1:8088
reuse-surface establish --discover [--path .] --dry-run
reuse-surface establish --discover --apply
reuse-surface update --capability <id> --dry-run
reuse-surface update --all --suggest-maturity --dry-run
reuse-surface update --from-git-since HEAD~5 --apply
```
---
## Add Registry Stats Command
```task
id: REUSE-WP-0013-T01
status: done
priority: high
state_hub_task_id: "98e65330-bfc7-4282-b372-d35542b899ce"
```
Deliver `reuse-surface stats` with deterministic aggregates:
- Capability count; maturity histogram (D/A/C/R bands)
- Entries at R0-R2 vs R3+; consumption mode counts
- Index vs entry vector drift count
- Federation readiness: local `sources.yaml` / index presence; optional
`--federation-ready` raw URL probe (HTTP status)
- Hub summary when `REUSE_SURFACE_URL` set (registration count)
Output: Markdown default, `--format json`. Pytest coverage. Document in
`tools/README.md`.
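
The maturity histogram and the R-band split described above could be computed roughly as follows. This is a minimal sketch, not the contents of `stats.py`: it assumes vectors are strings shaped like the baseline vector (`D5 / A4 / C5 / R3`), and the function names `parse_vector`, `maturity_histogram`, and `r_band_split` are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches one axis token such as "D5" or "R3" (axis letter followed by a level).
_AXIS = re.compile(r"\b([DACR])(\d+)\b")


def parse_vector(vector: str) -> dict[str, int]:
    """Parse 'D5 / A4 / C5 / R3' into {'D': 5, 'A': 4, 'C': 5, 'R': 3}."""
    return {axis: int(level) for axis, level in _AXIS.findall(vector)}


def maturity_histogram(vectors: list[str]) -> dict[str, Counter]:
    """Count how many capabilities sit at each level, per axis."""
    hist = {axis: Counter() for axis in "DACR"}
    for vector in vectors:
        for axis, level in parse_vector(vector).items():
            hist[axis][level] += 1
    return hist


def r_band_split(vectors: list[str]) -> dict[str, int]:
    """Split entries into the R0-R2 and R3+ buckets."""
    split = {"R0-R2": 0, "R3+": 0}
    for vector in vectors:
        r_level = parse_vector(vector).get("R")
        if r_level is None:
            continue  # skip entries without an R axis rather than guessing
        split["R3+" if r_level >= 3 else "R0-R2"] += 1
    return split
```

Keeping the aggregation as pure functions over already-loaded vectors makes it easy to cover with pytest without touching the filesystem.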
## Implement establish --scaffold
```task
id: REUSE-WP-0013-T02
status: done
priority: high
state_hub_task_id: "b8fedd87-d0d3-41b4-9af8-e36d52bfe1c5"
```
Bootstrap `registry/` in target repo (`--path`, default cwd):
- `registry/README.md` (pointer to reuse-surface schema and validate)
- `registry/capabilities/.gitkeep`
- `registry/indexes/capabilities.yaml` with `version`, `domain`, `capabilities: []`
- Refuse overwrite unless `--force`
- Print next steps: add entry, validate, merge to main, publish-check
No llm-connect dependency. Pytest with temp directory.
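
The scaffold layout and the overwrite refusal listed above can be sketched as follows. This is an illustration under assumptions, not the code in `establish.py`: the function name `scaffold`, the README wording, and the `version: 1` value in the index are all hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# ASSUMPTION: the concrete `version` value is illustrative only.
INDEX_TEMPLATE = "version: 1\ndomain: {domain}\ncapabilities: []\n"
README_TEXT = (
    "# Registry\n\n"
    "Entries follow the reuse-surface capability schema.\n"
    "Run `reuse-surface validate` before publishing.\n"
)


def scaffold(root: Path, domain: str, force: bool = False) -> list[Path]:
    """Create the empty registry layout under `root`; refuse to overwrite."""
    registry = root / "registry"
    files = {
        registry / "README.md": README_TEXT,
        registry / "capabilities" / ".gitkeep": "",
        registry / "indexes" / "capabilities.yaml": INDEX_TEMPLATE.format(domain=domain),
    }
    existing = [path for path in files if path.exists()]
    if existing and not force:
        names = ", ".join(str(path) for path in existing)
        raise FileExistsError(f"refusing to overwrite (use --force): {names}")
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    return sorted(files)
```

Checking every target before writing any of them means a refused run leaves the target checkout untouched.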
## Implement establish --publish-check
```task
id: REUSE-WP-0013-T03
status: done
priority: medium
state_hub_task_id: "2924d685-709f-4e28-886f-b363cd9c40b4"
```
Federation publish helper for sibling repo operators:
- `curl`-equivalent probe of `--raw-url` (status, content-type hint)
- Validate local index YAML if `--path` has `registry/indexes/capabilities.yaml`
- Report pass/fail with remediation tied to `docs/RegistryFederation.md`
- Exit non-zero on hard failures (non-200, invalid YAML)
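
A raw URL probe along these lines could look like the sketch below. It is not the shipped implementation: `probe` and `classify` are hypothetical names, and the remediation wording is illustrative. Redirects are deliberately not followed so that a redirect status on the raw URL is reported as such instead of being masked by the page it points to.

```python
from __future__ import annotations

import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def probe(raw_url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (status, content_type) for the raw URL without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(raw_url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        return err.code, (err.headers or {}).get("Content-Type", "")


def classify(status: int, content_type: str) -> tuple[bool, str]:
    """Map a probe result to (passed, message); any non-200 is a hard failure."""
    if 300 <= status < 400:
        return False, f"HTTP {status}: redirected; see docs/RegistryFederation.md"
    if status != 200:
        return False, f"HTTP {status}: index not reachable at raw URL"
    if "html" in content_type.lower():
        # Content type is only a hint, so this passes with a warning.
        return True, "HTTP 200, but content type looks like HTML; check the URL"
    return True, "HTTP 200"
```

A CLI wrapper would call `probe`, pass the result to `classify`, and exit non-zero when the first element is `False`.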
## Add llm-connect Bridge And Draft Schema
```task
id: REUSE-WP-0013-T04
status: done
priority: high
state_hub_task_id: "650ebee5-b34b-4ed8-891d-d93aacebadd7"
```
Thin client boundary:
- `reuse_surface/llm_bridge.py`: `POST {LLM_CONNECT_URL}/execute`, parse JSON
from response content
- `schemas/registry-draft.schema.json`: structured draft shape (capability list
with id, name, summary, vector, tags, consumption_modes, discovery intent)
- Env: `LLM_CONNECT_URL` (default none); graceful error when missing on LLM paths
- Optional `pyproject.toml` extra `llm`, which pulls in the `llm-connect` dependency
- Pytest with mocked HTTP (no live LLM in CI)
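
The thin client boundary above might be sketched as follows. Only the `POST {LLM_CONNECT_URL}/execute` endpoint and the `LLM_CONNECT_URL` variable come from this workplan; the request payload, the `content` field in the response, and the names `execute` and `LLMBridgeError` are assumptions made for illustration and may not match llm-connect's real contract.

```python
from __future__ import annotations

import json
import os
import urllib.request


class LLMBridgeError(RuntimeError):
    """Raised when the llm-connect bridge cannot be used."""


def execute(payload: dict, *, base_url: str | None = None,
            opener=urllib.request.urlopen, timeout: float = 60.0) -> dict:
    """POST `payload` to {LLM_CONNECT_URL}/execute and return the parsed draft."""
    base_url = base_url or os.environ.get("LLM_CONNECT_URL")
    if not base_url:
        raise LLMBridgeError(
            "LLM_CONNECT_URL is not set; LLM-assisted commands need a llm-connect endpoint"
        )
    request = urllib.request.Request(
        base_url.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # ASSUMPTION: the response is a JSON object whose "content" field holds
    # the draft as a JSON string. Adjust to the actual llm-connect response.
    try:
        return json.loads(body["content"])
    except (KeyError, TypeError, json.JSONDecodeError) as err:
        raise LLMBridgeError(f"unexpected llm-connect response: {err}") from err
```

Injecting `opener` keeps the function testable with a fake response object, which matches the "mocked HTTP, no live LLM in CI" constraint.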
## Implement establish --discover
```task
id: REUSE-WP-0013-T05
status: done
priority: medium
state_hub_task_id: "b9154889-f538-4266-9918-b277f9a297be"
```
LLM-assisted bootstrap after `--scaffold` or on empty registry:
- Collect context files: `INTENT.md`, `SCOPE.md`, `README*`, `pyproject.toml`,
`AGENTS.md`, top-level package dirs (configurable `--context-max-files`)
- Prompt template + schema-constrained JSON via llm-connect
- `--dry-run`: print proposed entries and index rows
- `--apply`: write `registry/capabilities/*.md` from template merge + index update;
run validate before success exit
- Document prompt assumptions and review checklist in `registry/README.md`
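
Context collection for `--discover` could be sketched like this. The file patterns come from the list above; the function names, the default of 12 files, and the per-file character cap are assumptions added for illustration (the cap is not mentioned in the workplan).

```python
from __future__ import annotations

from pathlib import Path

# Patterns taken from the workplan, in priority order.
CONTEXT_PATTERNS = ("INTENT.md", "SCOPE.md", "README*", "pyproject.toml", "AGENTS.md")


def collect_context(root: Path, max_files: int = 12, max_chars: int = 8000) -> dict[str, str]:
    """Read up to `max_files` context files, truncating each to `max_chars`."""
    found: dict[str, str] = {}
    for pattern in CONTEXT_PATTERNS:
        for path in sorted(root.glob(pattern)):
            if len(found) >= max_files:
                return found
            if path.is_file():
                text = path.read_text(encoding="utf-8", errors="replace")
                found[path.name] = text[:max_chars]
    return found


def top_level_packages(root: Path) -> list[str]:
    """Names of top-level directories that contain an __init__.py."""
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "__init__.py").is_file()
    )
```

Because patterns are visited in a fixed order and matches are sorted, the same checkout always yields the same prompt context, which helps when reviewing `--dry-run` output.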
## Implement update Command
```task
id: REUSE-WP-0013-T06
status: done
priority: medium
state_hub_task_id: "b79558da-54b2-4712-91d2-b298c7cf2c40"
```
Refresh existing entries from repo signals:
**Deterministic (no LLM):**
- Index/entry vector mismatch detection
- New paths under `tests/`, CLI modules → suggest `evidence.tests` /
`availability.current_artifacts` additions
- `--apply` for safe deterministic patches only (explicit list in code)
**Optional LLM (`--suggest`, `--suggest-maturity`):**
- Git diff or file snapshot → proposed `promotion_history`, evidence notes
- Always `--dry-run` unless `--apply`; validate after apply
Targets: single `--capability`, `--all`, `--from-git-since <ref>`.
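
The index/entry vector mismatch check in the deterministic path can be illustrated with a small sketch. It assumes both sides have already been loaded into `{capability_id: vector_string}` mappings; `normalize_vector` and `vector_drift` are hypothetical names rather than functions from `registry_update.py`.

```python
from __future__ import annotations


def normalize_vector(vector: str) -> str:
    """Canonicalize spacing and case so 'd5 / a4' and 'D5/A4' compare equal."""
    return "/".join(part.strip().upper() for part in vector.split("/"))


def vector_drift(index_rows: dict[str, str],
                 entries: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (capability_id, index_vector, entry_vector) for each mismatch."""
    drift = []
    for cap_id in sorted(index_rows.keys() & entries.keys()):
        index_vector = normalize_vector(index_rows[cap_id])
        entry_vector = normalize_vector(entries[cap_id])
        if index_vector != entry_vector:
            drift.append((cap_id, index_vector, entry_vector))
    return drift


def unindexed_entries(index_rows: dict[str, str], entries: dict[str, str]) -> list[str]:
    """Capability ids that have an entry file but no index row."""
    return sorted(entries.keys() - index_rows.keys())
```

A `--dry-run` report would print these tuples; which of them qualify as safe `--apply` patches is a separate decision held in the explicit list mentioned above.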
## Documentation, Tests, And Gap Note
```task
id: REUSE-WP-0013-T07
status: done
priority: low
state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291"
```
- `tools/README.md`: full command reference
- `docs/RegistryFederation.md`: link establish/publish-check to sibling onboarding
- `docs/IntentScopeGapAnalysis.md`: add priority **24** (registry bootstrap
tooling); mark open
- `SCOPE.md`: "What Is Possible Now" when T01-T03 ship (incremental)
- CI: `stats` informational step; no llm-connect in CI
- Total pytest count increases; existing `reuse-surface validate` behavior unchanged
---
## Acceptance
- [x] `reuse-surface stats` reports maturity and federation-readiness aggregates
- [x] `establish --scaffold` creates valid empty registry layout without overwrite accidents
- [x] `establish --publish-check` detects 303 vs 200 raw URL outcomes
- [x] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
- [x] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
- [x] `update --dry-run` reports deterministic drift on sample repo
- [x] All new commands documented; gap priority 24 recorded
## Completion notes (2026-06-17)
- Modules: `stats.py`, `establish.py`, `registry_update.py`, `llm_bridge.py`
- Schema: `schemas/registry-draft.schema.json`
- `validate --root` for sibling repo validation after establish --apply
- 43 pytest tests; optional `pip install -e ".[llm]"` extra
## Out of scope
- Auto `hub register` (still operator step with token)
- Auto-merge LLM output without human review path
- Embedding similarity or overlap ML (keep `overlaps` token heuristic)
- llm-connect hosting or provider configuration inside reuse-surface
## Dogfood target
Run `establish --scaffold` and `establish --publish-check` against `state-hub`
checkout when available; optional `establish --discover` to seed
`capability.statehub.workstream-coordinate` from existing docs.