Add REUSE-WP-0013 workplan for registry establish, update, and stats

Proposes deterministic bootstrap and analytics CLI plus optional llm-connect
assist for sibling repo capability index publishing and registry maintenance.
2026-06-16 01:12:34 +02:00
parent 9bb8b4a21b
commit c81c18c607


---
id: REUSE-WP-0013
type: workplan
title: "Registry establish, update, and stats with optional llm-connect assist"
domain: helix_forge
repo: reuse-surface
status: ready
owner: codex
topic_slug: helix-forge
created: "2026-06-16"
updated: "2026-06-16"
---
# Registry establish, update, and stats with optional llm-connect assist
Follow-up to operator feedback and to the REUSE-WP-0012-T01 blockers recorded in
`history/2026-06-16-hub-registration-blocks.md`. Sibling repos cannot federate
until each one publishes `registry/indexes/capabilities.yaml`. Today that means
copying the registry layout by hand and authoring every entry manually. This
workplan adds a **deterministic bootstrap and analytics CLI** plus **optional
llm-connect-backed discovery and refresh**, so domain repos can establish and
maintain registries with low ceremony.
**Baseline vector:** `D5 / A4 / C5 / R3`
**Target vector after completion:** `D5 / A4 / C5 / R3` (unchanged; this work adds
tooling depth, with no product-vector promotion unless dogfood evidence warrants it)
## Problem statement
| Pain | Today | Target |
|---|---|---|
| Bootstrap registry in sibling repo | Manual copy from reuse-surface | `reuse-surface establish` |
| Keep entries aligned with repo reality | Manual edits + validate | `reuse-surface update` |
| Portfolio / federation readiness view | `report cohorts`, manual curls | `reuse-surface stats` |
| Draft capability metadata from repo context | Agent improvisation | llm-connect structured extract |
## Design principles
1. **Deterministic first:** `stats`, `establish --scaffold`, and
   `establish --publish-check` work without llm-connect or API keys.
2. **LLM optional:** `establish --discover` and `update --suggest` call
   llm-connect over HTTP (`POST /execute`) or through the library adapter when configured.
3. **Dry-run default:** LLM and write paths require an explicit `--apply`.
4. **Validate gate:** every `--apply` path ends with `reuse-surface validate`
   (or fails with schema errors).
5. **Boundary:** reuse-surface does not embed provider keys; llm-connect owns
   routing and credentials.
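Principles 3 and 4 can be enforced by routing every write path through one wrapper. The following is a minimal sketch. `run_write_path` is a hypothetical helper name, and the sketch assumes `reuse-surface validate` exits non-zero on schema errors.

```python
import subprocess
from typing import Callable, Sequence


def run_write_path(
    planned: Sequence[str],
    apply: bool,
    write: Callable[[], None],
    validate_cmd: Sequence[str] = ("reuse-surface", "validate"),
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Hypothetical wrapper: dry-run by default, validate gate after --apply."""
    if not apply:
        # Dry-run default: only describe what would change, write nothing.
        for line in planned:
            print(f"[dry-run] {line}")
        return 0
    write()
    # Validate gate: every apply path ends with the validator (assumed to
    # exit non-zero on schema errors).
    result = runner(list(validate_cmd), capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout + result.stderr)
    return result.returncode
```

The `runner` parameter is injectable so that pytest can substitute a fake validator instead of shelling out.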
## Suggested execution order
```text
T01 stats (deterministic, unblocks observability)
→ T02 establish --scaffold (unblocks sibling publish path)
→ T03 establish --publish-check (verifies raw URL / federation readiness)
→ T04 llm-connect bridge + draft JSON schema
→ T05 establish --discover
→ T06 update (deterministic signals + optional LLM suggest)
→ T07 docs, tests, gap-analysis note
```
## Dependencies
| Dependency | Owner | Notes |
|---|---|---|
| llm-connect server or package | llm-connect | `LLM_CONNECT_URL` or editable install |
| Capability JSON schema | reuse-surface | `schemas/capability.schema.yaml` unchanged |
| Sibling repo apply | Domain owners | `establish` run in target repo checkout |
| Hub token | Operator | `hub register` remains separate post-publish |
## Proposed CLI surface
```bash
# Deterministic
reuse-surface stats
reuse-surface stats --format json --federation-ready
reuse-surface establish --scaffold --domain helix_forge [--path .]
reuse-surface establish --publish-check --raw-url <gitea-raw-url>
# LLM-assisted (optional backend)
export LLM_CONNECT_URL=http://127.0.0.1:8088
reuse-surface establish --discover [--path .] --dry-run
reuse-surface establish --discover --apply
reuse-surface update --capability <id> --dry-run
reuse-surface update --all --suggest-maturity --dry-run
reuse-surface update --from-git-since HEAD~5 --apply
```
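The proposed surface can be pinned down as an `argparse` parser, assuming a Python entry point. This is a minimal sketch. The flag names are copied from the proposed CLI surface. `build_parser`, the `markdown` format value, and the grouping choices are assumptions: modes are mutually exclusive, and `--dry-run` conflicts with `--apply`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="reuse-surface")
    sub = parser.add_subparsers(dest="command", required=True)

    stats = sub.add_parser("stats", help="deterministic registry aggregates")
    stats.add_argument("--format", choices=["markdown", "json"], default="markdown")
    stats.add_argument("--federation-ready", action="store_true")

    establish = sub.add_parser("establish", help="bootstrap or check a registry")
    mode = establish.add_mutually_exclusive_group(required=True)
    mode.add_argument("--scaffold", action="store_true")
    mode.add_argument("--publish-check", action="store_true")
    mode.add_argument("--discover", action="store_true")
    establish.add_argument("--domain")
    establish.add_argument("--path", default=".")
    establish.add_argument("--raw-url")
    establish.add_argument("--force", action="store_true")

    update = sub.add_parser("update", help="refresh existing entries")
    target = update.add_mutually_exclusive_group(required=True)
    target.add_argument("--capability")
    target.add_argument("--all", action="store_true")
    target.add_argument("--from-git-since", metavar="REF")
    update.add_argument("--suggest", action="store_true")
    update.add_argument("--suggest-maturity", action="store_true")

    # Neither flag means dry-run; --dry-run is accepted for explicitness,
    # and combining it with --apply is rejected.
    for p in (establish, update):
        write_mode = p.add_mutually_exclusive_group()
        write_mode.add_argument("--dry-run", action="store_true")
        write_mode.add_argument("--apply", action="store_true")
    return parser
```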
---
## Add Registry Stats Command
```task
id: REUSE-WP-0013-T01
status: todo
priority: high
```
Deliver `reuse-surface stats` with deterministic aggregates:
- Capability count; maturity histogram (D/A/C/R bands)
- Entries at R0-R2 vs R3+; consumption-mode counts
- Index vs entry vector drift count
- Federation readiness: local `sources.yaml` / index presence; optional
`--federation-ready` raw URL probe (HTTP status)
- Hub summary (registration count) when `REUSE_SURFACE_URL` is set

Output: Markdown by default, JSON with `--format json`. Add pytest coverage and
document the command in `tools/README.md`.
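The core aggregates reduce to parsing maturity vectors and counting. The sketch below makes several assumptions. The vector string format matches the workplan header (`D5 / A4 / C5 / R3`). Index rows and entries have already been loaded into `id -> vector` maps, with YAML loading omitted. "Drift" means the parsed vectors differ. `parse_vector` and `registry_stats` are hypothetical names.

```python
import re
from collections import Counter

# Assumed vector format, taken from the workplan header: "D5 / A4 / C5 / R3".
VECTOR_RE = re.compile(r"^D(\d+)\s*/\s*A(\d+)\s*/\s*C(\d+)\s*/\s*R(\d+)$")


def parse_vector(text: str) -> dict[str, int]:
    match = VECTOR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised maturity vector: {text!r}")
    return dict(zip("DACR", (int(g) for g in match.groups())))


def registry_stats(index_vectors: dict[str, str], entry_vectors: dict[str, str]) -> dict:
    """Both arguments map capability id -> vector string (hypothetical shape)."""
    histogram = {band: Counter() for band in "DACR"}
    r_low = r_high = 0
    for vector in entry_vectors.values():
        parsed = parse_vector(vector)
        for band, level in parsed.items():
            histogram[band][level] += 1
        if parsed["R"] <= 2:
            r_low += 1
        else:
            r_high += 1
    # Drift: capability present in both places with differing vectors.
    drift = sorted(
        cap for cap, vec in index_vectors.items()
        if cap in entry_vectors and parse_vector(vec) != parse_vector(entry_vectors[cap])
    )
    return {
        "capability_count": len(entry_vectors),
        "maturity_histogram": {b: dict(sorted(c.items())) for b, c in histogram.items()},
        "r0_r2": r_low,
        "r3_plus": r_high,
        "vector_drift": drift,
    }
```

Returning a plain dict keeps the Markdown and `--format json` renderers as thin formatting layers over the same data.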
## Implement establish --scaffold
```task
id: REUSE-WP-0013-T02
status: todo
priority: high
```
Bootstrap `registry/` in the target repo (`--path`, default: current working directory):
- `registry/README.md` (pointer to reuse-surface schema and validate)
- `registry/capabilities/.gitkeep`
- `registry/indexes/capabilities.yaml` with `version`, `domain`, `capabilities: []`
- Refuse overwrite unless `--force`
- Print next steps: add entry, validate, merge to main, publish-check
No llm-connect dependency. Pytest coverage runs against a temporary directory.
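The scaffold step can be sketched with `pathlib` alone. This is a minimal version. It assumes `version: 1`, the README wording, and the next-steps text, none of which this workplan fixes. `scaffold` is a hypothetical name. The function checks every target path before writing anything, so a refused run leaves no partial layout behind.

```python
from pathlib import Path

# Hypothetical contents: the index version and README text are assumptions.
INDEX_TEMPLATE = "version: 1\ndomain: {domain}\ncapabilities: []\n"
README_TEXT = (
    "# Registry\n\n"
    "Entries follow the reuse-surface capability schema.\n"
    "Run `reuse-surface validate` before merging.\n"
)


def scaffold(root: Path, domain: str, force: bool = False) -> list[Path]:
    registry = root / "registry"
    files = {
        registry / "README.md": README_TEXT,
        registry / "capabilities" / ".gitkeep": "",
        registry / "indexes" / "capabilities.yaml": INDEX_TEMPLATE.format(domain=domain),
    }
    existing = [path for path in files if path.exists()]
    if existing and not force:
        # Refuse before touching anything, so no partial layout is written.
        raise FileExistsError(f"refusing to overwrite: {', '.join(map(str, existing))}")
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    print("Next: add an entry, run `reuse-surface validate`, merge to main, "
          "then run `reuse-surface establish --publish-check`.")
    return sorted(files)
```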
## Implement establish --publish-check
```task
id: REUSE-WP-0013-T03
status: todo
priority: medium
```
Federation publish helper for sibling repo operators:
- `curl`-equivalent probe of `--raw-url` (status, content-type hint)
- Validate local index YAML if `--path` has `registry/indexes/capabilities.yaml`
- Report pass/fail with remediation tied to `docs/RegistryFederation.md`
- Exit non-zero on hard failures (non-200, invalid YAML)
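The probe must not follow redirects. Otherwise a 303 redirect to a login page would look like a success. The sketch below uses only the standard library. A `HTTPRedirectHandler` whose `redirect_request` returns `None` makes `urllib` raise `HTTPError` carrying the 3xx code. `probe_raw_url`, `classify`, and the remediation wording are hypothetical. A 200 response with an HTML content type is treated as a hint rather than a hard failure, since hard failures are limited to non-200 responses and invalid YAML.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None surfaces 3xx responses as HTTPError instead of following them.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe_raw_url(url: str, timeout: float = 10.0) -> tuple[int, str]:
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        content_type = err.headers.get("Content-Type", "") if err.headers else ""
        return err.code, content_type


def classify(status: int, content_type: str) -> tuple[bool, str]:
    """Return (passed, message); remediation text is illustrative only."""
    if status == 200:
        if "html" in content_type.lower():
            return True, "reachable, but Content-Type looks like HTML; confirm this is the raw URL"
        return True, "raw index reachable"
    if 300 <= status < 400:
        return False, f"{status} redirect: file is not publicly readable; see docs/RegistryFederation.md"
    if status == 404:
        return False, "404: index not found at this URL; check branch and path"
    return False, f"unexpected HTTP {status}"
```

A CLI wrapper would exit with `0 if passed else 1`.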
## Add llm-connect Bridge And Draft Schema
```task
id: REUSE-WP-0013-T04
status: todo
priority: high
```
Thin client boundary:
- `reuse_surface/llm_bridge.py`: `POST {LLM_CONNECT_URL}/execute` and parse JSON
  from the response content
- `schemas/registry-draft.schema.json` — structured draft shape (capability list
with id, name, summary, vector, tags, consumption_modes, discovery intent)
- Env: `LLM_CONNECT_URL` (no default); LLM paths fail with a clear error when it is unset
- Optional `pyproject.toml` extra `llm` that pulls in the `llm-connect` dependency
- Pytest with mocked HTTP (no live LLM in CI)
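The bridge can be sketched with the standard library. This is a minimal version, not llm-connect's documented contract. The request payload shape and the `"content"` field in the response are assumptions to check against llm-connect. `execute`, `parse_json_content`, and `LLMBridgeError` are hypothetical names. Model output often wraps JSON in a fenced block, so the parser strips one if present.

```python
import json
import os
import re
import urllib.request

# Matches an optional fenced block (three backticks, optional "json" tag).
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


class LLMBridgeError(RuntimeError):
    pass


def parse_json_content(content: str):
    """Parse JSON from model output, tolerating a fenced block around it."""
    match = _FENCE_RE.search(content)
    text = match.group(1) if match else content
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError as exc:
        raise LLMBridgeError(f"response content is not valid JSON: {exc}") from exc


def execute(payload: dict, timeout: float = 60.0):
    base = os.environ.get("LLM_CONNECT_URL")
    if not base:
        raise LLMBridgeError("LLM_CONNECT_URL is not set; LLM-assisted paths are disabled")
    request = urllib.request.Request(
        base.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: generated text is returned under a "content" key.
    return parse_json_content(body["content"])
```

In CI, `urllib.request.urlopen` can be patched with `unittest.mock.patch`, so no live LLM is needed.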
## Implement establish --discover
```task
id: REUSE-WP-0013-T05
status: todo
priority: medium
```
LLM-assisted bootstrap after `--scaffold` or on an empty registry:
- Collect context files: `INTENT.md`, `SCOPE.md`, `README*`, `pyproject.toml`,
`AGENTS.md`, top-level package dirs (configurable `--context-max-files`)
- Prompt template + schema-constrained JSON via llm-connect
- `--dry-run`: print proposed entries and index rows
- `--apply`: write `registry/capabilities/*.md` from template merge + index update;
run validate before success exit
- Document prompt assumptions and review checklist in `registry/README.md`
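Context collection is deterministic and can be tested without an LLM. The sketch below treats "top-level package dirs" as directories containing `__init__.py`, which is an assumption. It caps the number of files read and truncates each file. `collect_context` and `max_chars` are hypothetical, and `max_files` stands in for `--context-max-files`.

```python
from pathlib import Path

# File names come from T05; their ordering here is an assumption.
CONTEXT_NAMES = ("INTENT.md", "SCOPE.md", "pyproject.toml", "AGENTS.md")


def collect_context(root: Path, max_files: int = 8, max_chars: int = 20_000) -> dict[str, str]:
    candidates: list[Path] = [root / name for name in CONTEXT_NAMES]
    candidates += sorted(root.glob("README*"))
    files = [p for p in candidates if p.is_file()][:max_files]
    context = {
        p.name: p.read_text(encoding="utf-8", errors="replace")[:max_chars]
        for p in files
    }
    # Assumed definition of a top-level package: a directory with __init__.py.
    packages = sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and (d / "__init__.py").is_file()
    )
    if packages:
        context["<packages>"] = "\n".join(packages)
    return context
```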
## Implement update Command
```task
id: REUSE-WP-0013-T06
status: todo
priority: medium
```
Refresh existing entries from repo signals:
**Deterministic (no LLM):**
- Index/entry vector mismatch detection
- New paths under `tests/`, CLI modules → suggest `evidence.tests` /
`availability.current_artifacts` additions
- `--apply` for safe deterministic patches only (explicit list in code)
**Optional LLM (`--suggest`, `--suggest-maturity`):**
- Git diff or file snapshot → proposed `promotion_history`, evidence notes
- Dry-run unless `--apply` is given; validate after apply
Targets: single `--capability`, `--all`, `--from-git-since <ref>`.
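The `--from-git-since` path and the "new test paths" signal can be combined deterministically. The sketch below makes three assumptions. `evidence.tests` holds repo-relative paths. Test files follow pytest's `test_*.py` naming. Only files added since `<ref>` count as new, which is why it uses `git diff --diff-filter=A`. `added_paths_since` and `suggest_test_evidence` are hypothetical names.

```python
import subprocess
from pathlib import PurePosixPath


def added_paths_since(ref: str, cwd: str = ".") -> list[str]:
    """Files added between <ref> and HEAD (backs --from-git-since)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=A", f"{ref}..HEAD"],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def suggest_test_evidence(changed: list[str], existing_tests: list[str]) -> list[str]:
    """New tests/test_*.py files not yet cited in evidence.tests (assumed shape)."""
    known = set(existing_tests)
    return sorted(
        path for path in changed
        if PurePosixPath(path).parts[:1] == ("tests",)
        and PurePosixPath(path).name.startswith("test_")
        and path.endswith(".py")
        and path not in known
    )
```

Keeping the git call separate from the pure filter lets pytest cover the suggestion logic without a repository fixture.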
## Documentation, Tests, And Gap Note
```task
id: REUSE-WP-0013-T07
status: todo
priority: low
```
- `tools/README.md` — full command reference
- `docs/RegistryFederation.md` — link establish/publish-check to sibling onboarding
- `docs/IntentScopeGapAnalysis.md` — add priority **24** (registry bootstrap
tooling); mark open
- `SCOPE.md` — update "What Is Possible Now" incrementally as T01-T03 ship
- CI: add `stats` as an informational (non-gating) step; no llm-connect in CI
- Net increase in pytest coverage; `reuse-surface validate` behavior unchanged
---
## Acceptance
- [ ] `reuse-surface stats` reports maturity and federation-readiness aggregates
- [ ] `establish --scaffold` creates a valid empty registry layout and refuses to overwrite existing files without `--force`
- [ ] `establish --publish-check` distinguishes 303 (redirect) from 200 raw-URL outcomes
- [ ] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
- [ ] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
- [ ] `update --dry-run` reports deterministic drift on sample repo
- [ ] All new commands documented; gap priority 24 recorded
## Out of scope
- Automatic `hub register` (remains an operator step that requires a token)
- Auto-merge LLM output without human review path
- Embedding similarity or overlap ML (keep `overlaps` token heuristic)
- llm-connect hosting or provider configuration inside reuse-surface
## Dogfood target
Run `establish --scaffold` and `establish --publish-check` against `state-hub`
checkout when available; optional `establish --discover` to seed
`capability.statehub.workstream-coordinate` from existing docs.