Implement REUSE-WP-0013 registry establish, update, and stats

Add `stats`, `establish` (scaffold, publish-check, discover), and `update` CLI commands with an optional llm-connect bridge. Also add `validate --root` for sibling repos, pytest coverage, and documentation for sibling registry onboarding.
2026-06-16 01:21:01 +02:00
parent fb712b4b98
commit 70a5003f6e
19 changed files with 1740 additions and 30 deletions
--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -32,6 +32,9 @@ jobs:
          reuse-surface catalog
          reuse-surface graph --check --fail-on-warnings

+      - name: Registry stats (informational)
+        run: reuse-surface stats || true
+
      - name: Planning cohort report (informational)
        run: reuse-surface report cohorts --planning-min D4 || true

--- a/SCOPE.md
+++ b/SCOPE.md
@@ -60,6 +60,11 @@ The MVP registry foundation, CLI tooling (REUSE-WP-0003), federation stack
  against `https://reuse.coulomb.social`
 - **Sync local federation manifest from hub** with `reuse-surface hub sync`
 - **Export planning cohorts** with `reuse-surface report cohorts`
+- **Bootstrap a sibling registry** with `reuse-surface establish --scaffold`
+- **Verify index publish readiness** with `reuse-surface establish --publish-check`
+- **View registry stats** with `reuse-surface stats`
+- **Draft or refresh entries** with `reuse-surface establish --discover` and
+  `reuse-surface update` (optional llm-connect backend)
 - **Run the hub locally or in a container** with `reuse-surface serve`
 - **Generate relation graphs** with `reuse-surface graph`
 - **Explore relations interactively** at `docs/graph/index.html`
@@ -104,8 +109,8 @@ See `tools/README.md` for command reference.
 - **Federated index:** `registry/indexes/federated.yaml` (local compose).
 - **Relation graph:** `docs/graph/capability-graph.mmd`, `docs/graph/index.html`.
 - **Searchable catalog:** `docs/catalog/search.html`.
- **Workplans:** REUSE-WP-0001 through REUSE-WP-0011 finished; WP-0011 archived;
-  **REUSE-WP-0012** finished (federation scale + intent alignment).
+- **Workplans:** REUSE-WP-0001 through REUSE-WP-0012 finished/archived;
+  **REUSE-WP-0013** finished (registry establish/update/stats).
 - **Assessment history:** `history/2026-06-15-intent-scope-assessment.md`.
 - **Self-assessed vector:** `D5 / A4 / C5 / R3` (see `docs/IntentScopeGapAnalysis.md`).

--- a/docs/IntentScopeGapAnalysis.md
+++ b/docs/IntentScopeGapAnalysis.md
@@ -3,7 +3,7 @@
 **Repository:** `reuse-surface`  
 **Artifact:** `docs/IntentScopeGapAnalysis.md`  
 **Status:** Living analysis  
-**Updated:** 2026-06-16  
+**Updated:** 2026-06-17  
 **Purpose:** Record alignment, drift, and open gaps between declared intent and
 current delivered scope so future workplans can close them deliberately.

@@ -30,6 +30,8 @@ four maturity dimensions, and human/agent consumers.
   standardization tracker still manual.
 3. **Hub automation** — `hub sync` shipped; polling/webhooks still absent.
 4. **Managed platform posture** — A5 container documented; A6/Postgres deferred.
+5. **Registry bootstrap in sibling repos** — `establish`/`update`/`stats` shipped;
+   sibling adoption still operator-driven.

 **Current reuse-surface product vector (self-assessment):** `D5 / A4 / C5 / R3`

@@ -197,8 +199,10 @@ archived workplans under `workplans/archived/`.
 | 21 | INTENT layout sync | Update INTENT.md tree and example entry shape | **Closed** (WP-0012) |
 | 22 | Hub hardening | Postgres option, backup, documented SLO (A5→A6 path) | **Closed** (doc; implementation deferred) |
 | 23 | External evidence program | Raise catalog R levels with consumer_feedback | **Closed** (checklist + 3 entries; telemetry deferred) |
+| 24 | Registry bootstrap tooling | `establish`, `update`, `stats` for sibling repos | **Closed** (WP-0013) |

-**Workplan:** `REUSE-WP-0012` (finished). **Assessment snapshots:**
+**Workplan:** `REUSE-WP-0013` (finished). Prior: `REUSE-WP-0012` (finished).
+**Assessment snapshots:**
 `history/2026-06-15-intent-scope-assessment.md`,
 `history/2026-06-16-hub-registration-blocks.md`.

@@ -227,4 +231,5 @@ archived workplans under `workplans/archived/`.
 | 2026-06-15 | REUSE-WP-0011 closed priority 17; hub live at reuse.coulomb.social |
 | 2026-06-15 | Post-WP-0011 refresh: 20 capabilities, vector D5/A4/C4/R3, priorities 18–23 proposed |
 | 2026-06-15 | REUSE-WP-0012 proposed; assessment archived in `history/2026-06-15-intent-scope-assessment.md` |
-| 2026-06-16 | REUSE-WP-0012 closed priorities 19–23; priority 18 deferred on sibling index blocks; vector C5 |
+| 2026-06-16 | REUSE-WP-0012 closed priorities 19–23; priority 18 deferred on sibling index blocks; vector C5 |
+| 2026-06-17 | REUSE-WP-0013 closed priority 24; establish/update/stats + optional llm-connect assist |
--- a/docs/RegistryFederation.md
+++ b/docs/RegistryFederation.md
@@ -97,6 +97,18 @@ curl -fsS "<raw-url>" | head
  source) to an environment variable holding a Bearer token or full header value.
  The hub stores `auth_env` / `auth_header` names only — never secret values.

+### Sibling onboarding (CLI)
+
+```bash
+cd ../state-hub
+reuse-surface establish --scaffold --domain helix_forge
+# optional: LLM_CONNECT_URL=... reuse-surface establish --discover --dry-run
+reuse-surface validate --root .
+git push origin main
+reuse-surface establish --publish-check \
+  --raw-url https://gitea.coulomb.social/coulomb/state-hub/raw/main/registry/indexes/capabilities.yaml
+```
+
 ### Registration checklist

 1. Merge capability index to the default branch.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -20,6 +20,9 @@ dev = [
  "httpx>=0.27",
  "pytest>=8.0",
 ]
+llm = [
+  "llm-connect",
+]

 [project.scripts]
 reuse-surface = "reuse_surface.cli:main"
--- a/registry/README.md
+++ b/registry/README.md
@@ -35,6 +35,21 @@ registry/

 Missing evidence is acceptable in the MVP when it is explicit rather than hidden.

+## LLM-assisted discover review checklist
+
+When using `reuse-surface establish --discover` (llm-connect backend):
+
+- [ ] Every proposed `id` follows `capability.<domain>.<name>` and is not a duplicate
+- [ ] `summary`, `discovery.intent`, and maturity vectors match repo reality
+- [ ] `owner` reflects the delivering repository or team
+- [ ] Relations are empty or manually added after human review
+- [ ] Run `reuse-surface validate --root <repo>` before merge
+- [ ] Run `reuse-surface establish --publish-check` after pushing to `main`
+
+Discover drafts start at low maturity with explicit auto-draft risks in
+`known_reliability_risks`. Promote only with evidence per
+`specs/CapabilityMaturityStandard.md`.
+
 ## Manual validation checklist

 Use this checklist until an automated CLI validator exists.
--- a/reuse_surface/cli.py
+++ b/reuse_surface/cli.py
@@ -26,21 +26,48 @@ from reuse_surface.reports import (
    format_cohort_markdown,
    select_cohort,
 )
+from reuse_surface.establish import (
+    discover_capabilities,
+    format_publish_check_markdown,
+    publish_check,
+    scaffold_next_steps,
+    scaffold_registry,
+)
+from reuse_surface.registry_update import (
+    apply_deterministic_suggestions,
+    collect_deterministic_suggestions,
+    format_suggestions_json,
+    format_suggestions_markdown,
+    suggest_llm_updates,
+)
+from reuse_surface.stats import collect_stats, format_stats_json, format_stats_markdown
 from reuse_surface.registry import (
    ROOT,
    capability_paths,
    level_at_least,
    load_index,
+    load_index_at,
    load_schema,
    parse_front_matter,
    parse_vector,
+    registry_paths,
 )


-def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[str]:
+def _registry_root(args: argparse.Namespace) -> Path:
+    if getattr(args, "root", None):
+        return Path(args.root).resolve()
+    return ROOT
+
+
+def _check_index_drift(
+    entry_paths: list[Path],
+    index: dict[str, Any],
+    repo_root: Path,
+) -> list[str]:
    warnings: list[str] = []
    indexed_paths = {item["path"] for item in index.get("capabilities", [])}
-    file_paths = {str(path.relative_to(ROOT)) for path in entry_paths}
+    file_paths = {str(path.relative_to(repo_root)) for path in entry_paths}
    for path in sorted(file_paths - indexed_paths):
        warnings.append(f"index drift: entry file not indexed: {path}")
    for path in sorted(indexed_paths - file_paths):
@@ -48,11 +75,22 @@ def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[s
    return warnings


-def cmd_validate(args: argparse.Namespace) -> int:
+def _capability_paths_for(repo_root: Path, target: Path | None) -> list[Path]:
+    if target is not None:
+        return [target]
+    cap_dir = registry_paths(repo_root)["capabilities"]
+    return sorted(path for path in cap_dir.glob("*.md") if path.name != ".gitkeep")
+
+
+def _run_validate(
+    repo_root: Path,
+    *,
+    target: Path | None,
+    relations: bool,
+) -> tuple[list[str], list[str], list[Path]]:
    schema = load_schema()
    validator = Draft202012Validator(schema)
-    target = Path(args.path) if args.path else None
-    paths = capability_paths(target)
+    paths = _capability_paths_for(repo_root, target)
    errors: list[str] = []
    warnings: list[str] = []

@@ -67,10 +105,23 @@ def cmd_validate(args: argparse.Namespace) -> int:
            errors.append(f"{path}: {location}: {error.message}")

    if not target:
-        index = load_index()
-        warnings.extend(_check_index_drift(paths, index))
-        if args.relations:
+        index_path = registry_paths(repo_root)["index"]
+        if index_path.exists():
+            index = load_index_at(index_path)
+            warnings.extend(_check_index_drift(paths, index, repo_root))
+        if relations and repo_root == ROOT:
            warnings.extend(check_relations())
+    return errors, warnings, paths
+
+
+def cmd_validate(args: argparse.Namespace) -> int:
+    repo_root = _registry_root(args)
+    target = Path(args.path) if args.path else None
+    if target and not target.is_absolute():
+        target = repo_root / target
+    errors, warnings, paths = _run_validate(
+        repo_root, target=target, relations=args.relations
+    )

    for warning in warnings:
        print(f"warning: {warning}", file=sys.stderr)
@@ -329,6 +380,117 @@ def cmd_hub_sync(args: argparse.Namespace) -> int:
    return 0


+def cmd_stats(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    stats = collect_stats(
+        repo_root,
+        federation_ready=args.federation_ready,
+        raw_url=args.raw_url,
+        hub_url=getattr(args, "hub_url", None),
+    )
+    if args.format == "json":
+        print(format_stats_json(stats))
+    else:
+        print(format_stats_markdown(stats), end="")
+    return 0
+
+
+def cmd_establish(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    try:
+        if args.scaffold:
+            created = scaffold_registry(
+                repo_root, domain=args.domain, force=args.force
+            )
+            for path in created:
+                print(f"ok: wrote {path.relative_to(repo_root)}")
+            print(scaffold_next_steps(repo_root))
+            return 0
+        if args.publish_check:
+            result = publish_check(repo_root, raw_url=args.raw_url)
+            print(format_publish_check_markdown(result), end="")
+            return 0 if result["ok"] else 1
+        if args.discover:
+            result = discover_capabilities(
+                repo_root,
+                domain=args.domain,
+                dry_run=args.dry_run,
+                apply=args.apply,
+                llm_url=args.llm_url,
+                context_max_files=args.context_max_files,
+            )
+            if result.get("dry_run"):
+                print(yaml.safe_dump(result["draft"], sort_keys=False))
+                return 0
+            for path in result.get("written", []):
+                print(f"ok: wrote {path}")
+            validate_args = argparse.Namespace(
+                path=None,
+                root=str(repo_root),
+                relations=False,
+                fail_on_warnings=True,
+            )
+            return cmd_validate(validate_args)
+    except ValueError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 1
+    print("error: specify --scaffold, --publish-check, or --discover", file=sys.stderr)
+    return 1
+
+
+def cmd_update(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    try:
+        capability_id = None if args.all else args.capability
+        if not args.all and not args.capability:
+            print("error: specify --capability or --all", file=sys.stderr)
+            return 1
+        if args.suggest_maturity:
+            cap_ids = [args.capability] if args.capability else []
+            if args.all:
+                index = load_index_at(registry_paths(repo_root)["index"])
+                cap_ids = [row["id"] for row in index.get("capabilities", [])]
+            payload = {
+                "suggestions": [
+                    suggest_llm_updates(
+                        repo_root,
+                        cap_id,
+                        git_since=args.from_git_since,
+                        llm_url=args.llm_url,
+                    )
+                    for cap_id in cap_ids
+                ]
+            }
+            print(json.dumps(payload, indent=2, sort_keys=True))
+            return 0
+
+        suggestions = collect_deterministic_suggestions(
+            repo_root,
+            capability_id=capability_id,
+            git_since=args.from_git_since,
+        )
+        if args.apply:
+            changed = apply_deterministic_suggestions(repo_root, suggestions)
+            for line in changed:
+                print(f"ok: {line}")
+            validate_args = argparse.Namespace(
+                path=None,
+                root=str(repo_root),
+                relations=False,
+                fail_on_warnings=True,
+            )
+            return cmd_validate(validate_args)
+
+        if args.format == "json":
+            print(format_suggestions_json(suggestions))
+        else:
+            print(format_suggestions_markdown(suggestions), end="")
+        return 0
+    except ValueError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 1
+
+
 def cmd_report_cohorts(args: argparse.Namespace) -> int:
    filters = cohort_filters_from_args(args)
    matches = select_cohort(filters)
@@ -399,6 +561,10 @@ def main(argv: list[str] | None = None) -> int:
        action="store_true",
        help="exit non-zero when warnings are present",
    )
+    validate.add_argument(
+        "--root",
+        help="registry repo root (default: reuse-surface install root)",
+    )
    validate.set_defaults(func=cmd_validate)

    federation = subparsers.add_parser(
@@ -539,6 +705,41 @@ def main(argv: list[str] | None = None) -> int:
    )
    cohorts.set_defaults(func=cmd_report_cohorts)

+    stats = subparsers.add_parser("stats", help="registry maturity and federation stats")
+    stats.add_argument("--path", help="repo root (default: cwd)")
+    stats.add_argument("--federation-ready", action="store_true")
+    stats.add_argument("--raw-url", help="probe federation raw index URL")
+    stats.add_argument("--hub-url", help="hub base URL (or REUSE_SURFACE_URL)")
+    stats.add_argument("--format", choices=["markdown", "json"], default="markdown")
+    stats.set_defaults(func=cmd_stats)
+
+    establish = subparsers.add_parser(
+        "establish", help="bootstrap or discover capability registry"
+    )
+    establish.add_argument("--path", help="target repo root (default: cwd)")
+    establish.add_argument("--domain", default="helix_forge")
+    establish.add_argument("--force", action="store_true")
+    establish.add_argument("--scaffold", action="store_true")
+    establish.add_argument("--publish-check", action="store_true")
+    establish.add_argument("--discover", action="store_true")
+    establish.add_argument("--dry-run", action="store_true", help="discover preview (default)")
+    establish.add_argument("--apply", action="store_true", help="discover write + validate")
+    establish.add_argument("--raw-url", help="raw Gitea index URL for publish-check")
+    establish.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
+    establish.add_argument("--context-max-files", type=int, default=12)
+    establish.set_defaults(func=cmd_establish)
+
+    update = subparsers.add_parser("update", help="refresh registry metadata from repo signals")
+    update.add_argument("--path", help="repo root (default: cwd)")
+    update.add_argument("--capability", help="single capability id")
+    update.add_argument("--all", action="store_true")
+    update.add_argument("--from-git-since", help="git ref for change detection")
+    update.add_argument("--apply", action="store_true")
+    update.add_argument("--suggest-maturity", action="store_true")
+    update.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
+    update.add_argument("--format", choices=["markdown", "json"], default="markdown")
+    update.set_defaults(func=cmd_update)
+
    args = parser.parse_args(argv)
    return args.func(args)
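
Review note: `cmd_establish` acts on the first of `--scaffold`, `--publish-check`, and `--discover` that is set, and only reports an error when none is given. The following is a minimal sketch, assuming the three modes are meant to be exclusive, of how argparse could reject conflicting flags at parse time. The `build_establish_parser` helper is hypothetical and not part of this commit; only a subset of the real flags is mirrored.

```python
import argparse


def build_establish_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors some `establish` flags from this commit,
    # but groups the three modes so argparse rejects conflicting input.
    parser = argparse.ArgumentParser(prog="reuse-surface establish")
    parser.add_argument("--path", help="target repo root (default: cwd)")
    parser.add_argument("--domain", default="helix_forge")

    # Exactly one mode must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--scaffold", action="store_true")
    mode.add_argument("--publish-check", action="store_true")
    mode.add_argument("--discover", action="store_true")

    # --dry-run and --apply conflict with each other, so a second
    # (optional) group covers them.
    write = parser.add_mutually_exclusive_group()
    write.add_argument("--dry-run", action="store_true")
    write.add_argument("--apply", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_establish_parser().parse_args(["--discover", "--apply"])
    print(args.discover, args.apply, args.dry_run)
```

With this shape, `--scaffold --discover` or `--dry-run --apply` would exit with a usage error before any command code runs, which could replace the trailing "specify --scaffold, --publish-check, or --discover" branch.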

--- a/reuse_surface/establish.py
+++ b/reuse_surface/establish.py
@@ -0,0 +1,448 @@
+from __future__ import annotations
+
+import json
+import textwrap
+import urllib.error
+import urllib.request
+from datetime import date
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from reuse_surface.llm_bridge import request_registry_draft
+from reuse_surface.registry import load_index_at, registry_paths
+
+SCAFFOLD_README = """# Capability Registry
+
+Markdown-first capability index for federation and reuse planning.
+
+## Authoring
+
+1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
+2. Add the row to `indexes/capabilities.yaml`.
+3. Run `reuse-surface validate --root .` from this repository's root (CLI installed).
+4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
+
+Federation contract: reuse-surface `docs/RegistryFederation.md`.
+"""
+
+CONTEXT_FILES = (
+    "INTENT.md",
+    "SCOPE.md",
+    "AGENTS.md",
+    "README.md",
+    "pyproject.toml",
+    "Cargo.toml",
+    "go.mod",
+)
+
+
+def scaffold_registry(
+    repo_root: Path,
+    *,
+    domain: str = "helix_forge",
+    force: bool = False,
+) -> list[Path]:
+    paths = registry_paths(repo_root)
+    created: list[Path] = []
+    if paths["registry"].exists() and not force:
+        raise ValueError(
+            f"registry already exists at {paths['registry']}; use --force to overwrite"
+        )
+
+    paths["registry"].mkdir(parents=True, exist_ok=True)
+    paths["capabilities"].mkdir(parents=True, exist_ok=True)
+    paths["index"].parent.mkdir(parents=True, exist_ok=True)
+
+    readme = paths["registry"] / "README.md"
+    if force or not readme.exists():
+        readme.write_text(SCAFFOLD_README, encoding="utf-8")
+        created.append(readme)
+
+    gitkeep = paths["capabilities"] / ".gitkeep"
+    if force or not gitkeep.exists():
+        gitkeep.write_text("", encoding="utf-8")
+        created.append(gitkeep)
+
+    index_data = {
+        "version": 1,
+        "updated": date.today().isoformat(),
+        "domain": domain,
+        "capabilities": [],
+    }
+    if force or not paths["index"].exists():
+        paths["index"].write_text(
+            yaml.safe_dump(index_data, sort_keys=False, allow_unicode=True),
+            encoding="utf-8",
+        )
+        created.append(paths["index"])
+    return created
+
+
+def scaffold_next_steps(repo_root: Path) -> str:
+    return textwrap.dedent(
+        f"""
+        Next steps:
+          1. Add capability entries under {repo_root / 'registry/capabilities'}
+          2. Update {repo_root / 'registry/indexes/capabilities.yaml'}
+          3. reuse-surface validate --root {repo_root}
+          4. git push origin main
+          5. reuse-surface establish --publish-check --raw-url <gitea-raw-url>
+          6. reuse-surface hub register --repo <slug> --url <raw-url>
+        """
+    ).strip()
+
+
+def publish_check(
+    repo_root: Path,
+    *,
+    raw_url: str | None = None,
+) -> dict[str, Any]:
+    paths = registry_paths(repo_root)
+    result: dict[str, Any] = {
+        "repo_root": str(repo_root),
+        "checks": [],
+        "ok": True,
+    }
+
+    if paths["index"].exists():
+        try:
+            data = load_index_at(paths["index"])
+            valid = isinstance(data, dict) and isinstance(data.get("capabilities"), list)
+            result["checks"].append(
+                {
+                    "name": "local_index_yaml",
+                    "ok": valid,
+                    "detail": f"{len(data.get('capabilities', []))} capabilities"
+                    if valid
+                    else "invalid structure",
+                }
+            )
+            if not valid:
+                result["ok"] = False
+        except (OSError, yaml.YAMLError) as exc:
+            result["checks"].append(
+                {"name": "local_index_yaml", "ok": False, "detail": str(exc)}
+            )
+            result["ok"] = False
+    else:
+        result["checks"].append(
+            {
+                "name": "local_index_yaml",
+                "ok": False,
+                "detail": "registry/indexes/capabilities.yaml missing",
+            }
+        )
+        result["ok"] = False
+
+    if raw_url:
+        probe = _probe_raw_url(raw_url)
+        result["checks"].append(
+            {
+                "name": "raw_url_probe",
+                "ok": probe["ok"],
+                "detail": f"HTTP {probe.get('status')} {probe.get('content_type', '')}".strip(),
+                "url": raw_url,
+            }
+        )
+        if probe["ok"]:
+            body_probe = _fetch_yaml_snippet(raw_url)
+            result["checks"].append(body_probe)
+            if not body_probe.get("ok"):
+                result["ok"] = False
+        else:
+            result["ok"] = False
+            result["remediation"] = (
+                "Merge registry/indexes/capabilities.yaml to main and confirm "
+                "Gitea raw URL returns 200 YAML. See docs/RegistryFederation.md."
+            )
+
+    return result
+
+
+def _probe_raw_url(url: str) -> dict[str, Any]:
+    request = urllib.request.Request(
+        url,
+        method="HEAD",
+        headers={"User-Agent": "reuse-surface/0.1"},
+    )
+    try:
+        with urllib.request.urlopen(request, timeout=30) as response:
+            return {
+                "ok": response.status == 200,
+                "status": response.status,
+                "content_type": response.headers.get("Content-Type", ""),
+            }
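+    except urllib.error.HTTPError as exc:
+        content_type = exc.headers.get("Content-Type", "")
+        return {"ok": False, "status": exc.code, "content_type": content_type}
+    except urllib.error.URLError as exc:
+        # No HTTP response (DNS failure, refused connection): report the reason.
+        return {"ok": False, "status": None, "content_type": str(exc.reason)}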
+
+
+def _fetch_yaml_snippet(url: str) -> dict[str, Any]:
+    request = urllib.request.Request(url, headers={"User-Agent": "reuse-surface/0.1"})
+    try:
+        with urllib.request.urlopen(request, timeout=30) as response:
+            body = response.read().decode("utf-8")
+    except urllib.error.HTTPError as exc:
+        return {"name": "raw_url_body", "ok": False, "detail": f"HTTP {exc.code}"}
+    except urllib.error.URLError as exc:
+        return {"name": "raw_url_body", "ok": False, "detail": str(exc.reason)}
+    try:
+        data = yaml.safe_load(body)
+    except yaml.YAMLError as exc:
+        return {"name": "raw_url_body", "ok": False, "detail": str(exc)}
+    ok = isinstance(data, dict) and "capabilities" in data
+    return {
+        "name": "raw_url_body",
+        "ok": ok,
+        "detail": "valid capabilities.yaml shape" if ok else "body is not valid index YAML",
+    }
+
+
+def collect_context(repo_root: Path, *, max_files: int = 12) -> str:
+    chunks: list[str] = []
+    used = 0
+    for name in CONTEXT_FILES:
+        if used >= max_files:
+            break
+        path = repo_root / name
+        if path.is_file():
+            chunks.append(f"### {name}\n{path.read_text(encoding='utf-8')[:8000]}")
+            used += 1
+    pkg_dirs = sorted(
+        [
+            item
+            for item in repo_root.iterdir()
+            if item.is_dir()
+            and not item.name.startswith(".")
+            and item.name not in {"registry", "tests", "docs", "workplans", "node_modules"}
+        ]
+    )
+    for pkg in pkg_dirs[: max(0, max_files - used)]:
+        init = pkg / "__init__.py"
+        if init.exists():
+            chunks.append(f"### {pkg.name}/__init__.py\n{init.read_text(encoding='utf-8')[:2000]}")
+    return "\n\n".join(chunks)
+
+
+def build_discover_prompt(context: str, domain: str) -> str:
+    schema_hint = json.dumps(
+        {
+            "domain": domain,
+            "capabilities": [
+                {
+                    "id": "capability.domain.name",
+                    "name": "Human Name",
+                    "summary": "One sentence.",
+                    "owner": "team",
+                    "vector": "D2 / A0 / C0 / R0",
+                    "tags": ["tag"],
+                    "consumption_modes": ["informational"],
+                    "discovery_intent": "What this enables.",
+                    "discovery_includes": ["included behavior"],
+                    "discovery_excludes": ["excluded behavior"],
+                }
+            ],
+        },
+        indent=2,
+    )
+    return textwrap.dedent(
+        f"""
+        You are drafting a capability registry index for helix_forge reuse-surface.
+
+        Return ONLY a JSON object matching this shape (no markdown fences):
+        {schema_hint}
+
+        Rules:
+        - Propose 1-5 distinct capabilities grounded in the repository context.
+        - Use IDs matching ^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$
+        - Default vector D2 / A0 / C0 / R0 unless strong delivery evidence exists.
+        - domain: {domain}
+
+        Repository context:
+        {context}
+        """
+    ).strip()
+
+
+def discover_capabilities(
+    repo_root: Path,
+    *,
+    domain: str = "helix_forge",
+    dry_run: bool = True,
+    apply: bool = False,
+    llm_url: str | None = None,
+    context_max_files: int = 12,
+) -> dict[str, Any]:
+    if apply and dry_run:
+        raise ValueError("use either --dry-run or --apply, not both")
+    if not apply and not dry_run:
+        dry_run = True
+
+    context = collect_context(repo_root, max_files=context_max_files)
+    if not context.strip():
+        raise ValueError("no context files found for discovery")
+
+    prompt = build_discover_prompt(context, domain)
+    draft = request_registry_draft(
+        prompt,
+        base_url=llm_url,
+        config={"temperature": 0.2, "max_tokens": 4000},
+    )
+
+    result: dict[str, Any] = {"draft": draft, "written": [], "dry_run": dry_run}
+    if dry_run:
+        return result
+
+    paths = registry_paths(repo_root)
+    if not paths["index"].exists():
+        scaffold_registry(repo_root, domain=domain, force=False)
+
+    index = load_index_at(paths["index"]) if paths["index"].exists() else {
+        "version": 1,
+        "domain": domain,
+        "capabilities": [],
+    }
+    existing_ids = {row["id"] for row in index.get("capabilities", [])}
+
+    for item in draft.get("capabilities", []):
+        cap_id = item["id"]
+        if cap_id in existing_ids:
+            continue
+        filename = cap_id.replace(".", "-") + ".md"
+        rel_path = f"registry/capabilities/{filename}"
+        entry_path = repo_root / rel_path
+        entry_body = _render_entry_from_draft(item, domain)
+        entry_path.parent.mkdir(parents=True, exist_ok=True)
+        entry_path.write_text(entry_body, encoding="utf-8")
+        vector = item.get("vector", "D2 / A0 / C0 / R0")
+        index.setdefault("capabilities", []).append(
+            {
+                "id": cap_id,
+                "name": item["name"],
+                "summary": item["summary"],
+                "vector": vector,
+                "domain": domain,
+                "status": "draft",
+                "owner": item.get("owner", repo_root.name),
+                "path": rel_path,
+                "tags": item.get("tags", []),
+                "consumption_modes": item.get("consumption_modes", ["informational"]),
+            }
+        )
+        result["written"].append(rel_path)
+
+    index["updated"] = date.today().isoformat()
+    index["domain"] = draft.get("domain", domain)
+    paths["index"].write_text(
+        yaml.safe_dump(index, sort_keys=False, allow_unicode=True),
+        encoding="utf-8",
+    )
+    result["written"].append(str(paths["index"].relative_to(repo_root)))
+    return result
+
+
+def _render_entry_from_draft(item: dict[str, Any], domain: str) -> str:
+    vector = item.get("vector", "D2 / A0 / C0 / R0")
+    d, a, c, r = [part.strip() for part in vector.split("/")]
+    front_matter = {
+        "id": item["id"],
+        "name": item["name"],
+        "summary": item["summary"],
+        "owner": item.get("owner", domain),
+        "status": "draft",
+        "domain": domain,
+        "tags": item.get("tags") or ["draft"],
+        "maturity": {
+            "discovery": {
+                "current": d,
+                "target": "D5",
+                "confidence": "low",
+                "rationale": "Auto-drafted by reuse-surface establish --discover; review required.",
+            },
+            "availability": {
+                "current": a,
+                "target": "A3",
+                "confidence": "low",
+                "rationale": "Auto-drafted; confirm consumption modes and artifacts.",
+            },
+        },
+        "external_evidence": {
+            "completeness": {
+                "level": c,
+                "confidence": "low",
+                "basis": "scope_vs_intent_and_consumer_expectations",
+                "satisfied_expectations": [],
+                "broken_expectations": [],
+                "out_of_scope_expectations": [],
+            },
+            "reliability": {
+                "level": r,
+                "confidence": "low",
+                "basis": "consumer_quality_signals",
+                "known_reliability_risks": ["auto-drafted entry without consumer evidence"],
+            },
+        },
+        "discovery": {
+            "intent": item.get("discovery_intent", item["summary"]),
+            "includes": item.get("discovery_includes") or [],
+            "excludes": item.get("discovery_excludes") or [],
+            "assumptions": [],
+            "use_cases": [],
+            "research_memos": [],
+        },
+        "availability": {
+            "current_level": a,
+            "target_level": "A3",
+            "current_artifacts": [],
+            "target_artifacts": [],
+            "consumption_modes": item.get("consumption_modes") or ["informational"],
+        },
+        "relations": {"depends_on": [], "supports": [], "related_to": []},
+        "evidence": {
+            "documentation": [],
+            "tests": [],
+            "consumer_feedback": [],
+            "bug_reports": [],
+            "incidents": [],
+        },
+        "consumer_guidance": {
+            "recommended_for": ["planning reuse after human review"],
+            "not_recommended_for": ["implementation reuse before validation"],
+            "known_limitations": ["discover draft — verify maturity claims"],
+        },
+        "promotion_history": [],
+    }
+    markdown = (
+        f"# {item['name']}\n\n"
+        "Auto-drafted capability entry. Review maturity, evidence, and relations "
+        "before promoting.\n"
+    )
+    return (
+        "---\n"
+        + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True)
+        + "---\n\n"
+        + markdown
+    )
+
+
+def format_publish_check_markdown(result: dict[str, Any]) -> str:
+    lines = ["# Federation publish check", ""]
+    lines.append(f"**Repo:** `{result['repo_root']}`")
+    lines.append(f"**Result:** {'PASS' if result['ok'] else 'FAIL'}")
+    lines.append("")
+    for check in result["checks"]:
+        status = "ok" if check["ok"] else "FAIL"
+        detail = check.get("detail", "")
+        name = check["name"]
+        lines.append(f"- **{name}**: {status} — {detail}")
+        if check.get("url"):
+            lines.append(f"  `{check['url']}`")
+    if result.get("remediation"):
+        lines.append("")
+        lines.append(f"**Remediation:** {result['remediation']}")
+    return "\n".join(lines) + "\n"
--- a/reuse_surface/llm_bridge.py
+++ b/reuse_surface/llm_bridge.py
@@ -0,0 +1,102 @@
+from __future__ import annotations
+
+import json
+import os
+import re
+import urllib.error
+import urllib.request
+from pathlib import Path
+from typing import Any
+
+from jsonschema import Draft202012Validator
+
+from reuse_surface.registry import ROOT
+
+DRAFT_SCHEMA_PATH = ROOT / "schemas" / "registry-draft.schema.json"
+
+
+def llm_connect_url(explicit: str | None = None) -> str:
+    base = (explicit or os.environ.get("LLM_CONNECT_URL", "")).rstrip("/")
+    if not base:
+        raise ValueError(
+            "LLM backend not configured; set LLM_CONNECT_URL or pass --llm-url"
+        )
+    return base
+
+
+def load_draft_schema() -> dict[str, Any]:
+    return json.loads(DRAFT_SCHEMA_PATH.read_text(encoding="utf-8"))
+
+
+def execute_prompt(
+    prompt: str,
+    *,
+    base_url: str | None = None,
+    config: dict[str, Any] | None = None,
+) -> str:
+    url = f"{llm_connect_url(base_url)}/execute"
+    body: dict[str, Any] = {"prompt": prompt}
+    if config:
+        body["config"] = config
+    data = json.dumps(body).encode("utf-8")
+    request = urllib.request.Request(
+        url,
+        data=data,
+        headers={
+            "Content-Type": "application/json",
+            "Accept": "application/json",
+            "User-Agent": "reuse-surface/0.1",
+        },
+        method="POST",
+    )
+    try:
+        with urllib.request.urlopen(request, timeout=120) as response:
+            payload = json.loads(response.read().decode("utf-8"))
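+    except urllib.error.HTTPError as exc:
+        raw = exc.read().decode("utf-8", errors="replace")
+        raise ValueError(f"llm-connect returned {exc.code}: {raw}") from exc
+    except urllib.error.URLError as exc:
+        raise ValueError(f"llm-connect unreachable at {url}: {exc.reason}") from exc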
+    content = payload.get("content")
+    if not isinstance(content, str) or not content.strip():
+        raise ValueError("llm-connect response missing content")
+    return content
+
+
+def extract_json_object(text: str) -> dict[str, Any]:
+    stripped = text.strip()
+    if stripped.startswith("```"):
+        stripped = re.sub(r"^```(?:json)?\s*", "", stripped)
+        stripped = re.sub(r"\s*```$", "", stripped)
+    try:
+        data = json.loads(stripped)
+    except json.JSONDecodeError:
+        match = re.search(r"\{.*\}", stripped, re.DOTALL)
+        if not match:
+            raise ValueError("llm response did not contain JSON object") from None
+        data = json.loads(match.group(0))
+    if not isinstance(data, dict):
+        raise ValueError("llm response JSON must be an object")
+    return data
+
+
+def request_registry_draft(
+    prompt: str,
+    *,
+    base_url: str | None = None,
+    config: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    draft = extract_json_object(execute_prompt(prompt, base_url=base_url, config=config))
+    validator = Draft202012Validator(load_draft_schema())
+    errors = sorted(validator.iter_errors(draft), key=lambda err: list(err.path))
+    if errors:
+        messages = "; ".join(error.message for error in errors[:3])
+        raise ValueError(f"draft schema validation failed: {messages}")
+    return draft
+
+
+def request_json_object(
+    prompt: str,
+    *,
+    base_url: str | None = None,
+    config: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    return extract_json_object(execute_prompt(prompt, base_url=base_url, config=config))
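Review note: the fence-stripping and brace-scan fallback used by `extract_json_object` can be exercised in isolation. The sketch below is a standalone restatement for illustration only; `parse_llm_json` and the sample replies are hypothetical names and data, not part of the module, and it assumes a reply is bare JSON, JSON inside a Markdown fence, or JSON surrounded by prose.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # three backticks, built this way to keep the example fence-safe


def parse_llm_json(text: str) -> dict[str, Any]:
    """Hypothetical helper: strip a Markdown fence, then fall back to a brace scan."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        stripped = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", stripped)
        stripped = re.sub(r"\s*" + FENCE + r"$", "", stripped)
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        # Greedy match from the first "{" to the last "}" in the reply.
        match = re.search(r"\{.*\}", stripped, re.DOTALL)
        if not match:
            raise ValueError("no JSON object found") from None
        data = json.loads(match.group(0))
    if not isinstance(data, dict):
        raise ValueError("JSON value is not an object")
    return data


fenced_reply = FENCE + 'json\n{"capabilities": []}\n' + FENCE
chatty_reply = 'Here is the draft: {"capabilities": []} Hope that helps.'
print(parse_llm_json(fenced_reply))
print(parse_llm_json(chatty_reply))
```

Because the brace scan is greedy, a reply containing two separate JSON objects would be captured as one span and fail to parse; that surfaces as a `ValueError`, since `json.JSONDecodeError` subclasses it.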
--- a/reuse_surface/registry.py
+++ b/reuse_surface/registry.py
@@ -60,4 +60,31 @@ def parse_vector(vector: str) -> dict[str, str]:

 def level_at_least(dimension: str, current: str, minimum: str) -> bool:
    order = LEVEL_ORDERS[dimension]
-    return order.index(current) >= order.index(minimum)
+    return order.index(current) >= order.index(minimum)
+
+
+def registry_paths(repo_root: Path) -> dict[str, Path]:
+    registry = repo_root / "registry"
+    return {
+        "registry": registry,
+        "capabilities": registry / "capabilities",
+        "index": registry / "indexes" / "capabilities.yaml",
+        "sources": registry / "federation" / "sources.yaml",
+    }
+
+
+def load_index_at(path: Path) -> dict[str, Any]:
+    with path.open(encoding="utf-8") as handle:
+        return yaml.safe_load(handle) or {}
+
+
+def entry_vector(front_matter: dict[str, Any]) -> str:
+    discovery = front_matter["maturity"]["discovery"]["current"]
+    availability = front_matter["maturity"]["availability"]["current"]
+    completeness = front_matter["external_evidence"]["completeness"]["level"]
+    reliability = front_matter["external_evidence"]["reliability"]["level"]
+    return f"{discovery} / {availability} / {completeness} / {reliability}"
+
+
+def vectors_match(index_vector: str, front_matter: dict[str, Any]) -> bool:
+    return index_vector.replace(" ", "") == entry_vector(front_matter).replace(" ", "")
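Review note: a worked example of the whitespace-insensitive vector comparison may help when reading the drift checks. The sketch restates the two helpers so it runs standalone; `sample_entry` is made-up data that assumes the front matter layout shown in this diff.

```python
from typing import Any


def entry_vector(front_matter: dict[str, Any]) -> str:
    """Assemble 'D / A / C / R' from the four level fields of an entry."""
    maturity = front_matter["maturity"]
    evidence = front_matter["external_evidence"]
    return " / ".join(
        [
            maturity["discovery"]["current"],
            maturity["availability"]["current"],
            evidence["completeness"]["level"],
            evidence["reliability"]["level"],
        ]
    )


def vectors_match(index_vector: str, front_matter: dict[str, Any]) -> bool:
    """Compare with spaces removed, so 'D3/A0/C0/R0' equals 'D3 / A0 / C0 / R0'."""
    return index_vector.replace(" ", "") == entry_vector(front_matter).replace(" ", "")


# Hypothetical entry front matter, reduced to the fields the helpers read.
sample_entry = {
    "maturity": {
        "discovery": {"current": "D3"},
        "availability": {"current": "A0"},
    },
    "external_evidence": {
        "completeness": {"level": "C0"},
        "reliability": {"level": "R0"},
    },
}

print(entry_vector(sample_entry))
print(vectors_match("D3/A0/C0/R0", sample_entry))
print(vectors_match("D2 / A0 / C0 / R0", sample_entry))
```

An entry missing any of these keys raises `KeyError` here, so callers that scan many entries may want to treat that as invalid front matter rather than let it propagate.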
--- a/reuse_surface/registry_update.py
+++ b/reuse_surface/registry_update.py
@@ -0,0 +1,273 @@
+from __future__ import annotations
+
+import json
+import subprocess
+import textwrap
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from reuse_surface.llm_bridge import request_json_object
+from reuse_surface.registry import (
+    entry_vector,
+    load_index_at,
+    parse_front_matter,
+    registry_paths,
+    vectors_match,
+)
+
+SAFE_EVIDENCE_PREFIXES = ("tests/", ".gitea/workflows/")
+
+
+def git_changed_files(repo_root: Path, since_ref: str) -> list[str]:
+    result = subprocess.run(
+        ["git", "-C", str(repo_root), "diff", "--name-only", since_ref, "HEAD"],
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        raise ValueError(result.stderr.strip() or f"git diff failed for {since_ref}")
+    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
+
+
+def collect_deterministic_suggestions(
+    repo_root: Path,
+    *,
+    capability_id: str | None = None,
+    git_since: str | None = None,
+) -> list[dict[str, Any]]:
+    paths = registry_paths(repo_root)
+    if not paths["index"].exists():
+        raise ValueError("registry index missing; run establish --scaffold first")
+
+    index = load_index_at(paths["index"])
+    rows = index.get("capabilities", [])
+    if capability_id:
+        rows = [row for row in rows if row["id"] == capability_id]
+        if not rows:
+            raise ValueError(f"capability not in index: {capability_id}")
+
+    changed_files = git_changed_files(repo_root, git_since) if git_since else []
+    suggestions: list[dict[str, Any]] = []
+
+    for row in rows:
+        entry_path = repo_root / row["path"]
+        if not entry_path.exists():
+            suggestions.append(
+                {
+                    "capability_id": row["id"],
+                    "kind": "missing_entry",
+                    "detail": f"missing file {row['path']}",
+                }
+            )
+            continue
+
+        front_matter = parse_front_matter(entry_path)
+        if not vectors_match(row["vector"], front_matter):
+            suggestions.append(
+                {
+                    "capability_id": row["id"],
+                    "kind": "vector_drift",
+                    "detail": "index vector differs from entry front matter",
+                    "index_vector": row["vector"],
+                    "entry_vector": entry_vector(front_matter),
+                    "apply_patch": {
+                        "field": "index.vector",
+                        "value": entry_vector(front_matter),
+                    },
+                }
+            )
+
+        evidence_tests = front_matter.get("evidence", {}).get("tests", [])
+        for changed in changed_files:
+            if changed.startswith("tests/") and changed not in evidence_tests:
+                suggestions.append(
+                    {
+                        "capability_id": row["id"],
+                        "kind": "evidence_test",
+                        "detail": f"new test file not cited: {changed}",
+                        "apply_patch": {
+                            "field": "evidence.tests",
+                            "append": changed,
+                        },
+                    }
+                )
+
+        artifacts = front_matter.get("availability", {}).get("current_artifacts", [])
+        # Top-level packages (directories with __init__.py), used as path prefixes.
+        package_prefixes = tuple(
+            p.name + "/"
+            for p in repo_root.iterdir()
+            if p.is_dir() and (p / "__init__.py").exists()
+        )
+        for changed in changed_files:
+            if changed.endswith(".py") and changed.startswith(package_prefixes):
+                if changed not in artifacts:
+                    suggestions.append(
+                        {
+                            "capability_id": row["id"],
+                            "kind": "availability_artifact",
+                            "detail": f"changed module not cited: {changed}",
+                            "apply_patch": {
+                                "field": "availability.current_artifacts",
+                                "append": changed,
+                            },
+                        }
+                    )
+
+    return suggestions
+
+
+def apply_deterministic_suggestions(
+    repo_root: Path,
+    suggestions: list[dict[str, Any]],
+) -> list[str]:
+    paths = registry_paths(repo_root)
+    index = load_index_at(paths["index"])
+    index_by_id = {row["id"]: row for row in index.get("capabilities", [])}
+    changed: list[str] = []
+
+    entry_cache: dict[str, dict[str, Any]] = {}
+    entry_paths: dict[str, Path] = {}
+
+    for suggestion in suggestions:
+        patch = suggestion.get("apply_patch")
+        if not patch:
+            continue
+        cap_id = suggestion["capability_id"]
+        if patch["field"] == "index.vector" and cap_id in index_by_id:
+            index_by_id[cap_id]["vector"] = patch["value"]
+            changed.append(f"index vector for {cap_id}")
+
+        row = index_by_id.get(cap_id)
+        if not row:
+            continue
+        entry_path = repo_root / row["path"]
+        if cap_id not in entry_cache:
+            entry_cache[cap_id] = parse_front_matter(entry_path)
+            entry_paths[cap_id] = entry_path
+
+        front_matter = entry_cache[cap_id]
+        if patch["field"] == "evidence.tests":
+            tests = front_matter.setdefault("evidence", {}).setdefault("tests", [])
+            if patch["append"] not in tests:
+                tests.append(patch["append"])
+                changed.append(f"{cap_id} evidence.tests += {patch['append']}")
+        if patch["field"] == "availability.current_artifacts":
+            artifacts = front_matter.setdefault("availability", {}).setdefault(
+                "current_artifacts", []
+            )
+            if patch["append"] not in artifacts:
+                artifacts.append(patch["append"])
+                changed.append(
+                    f"{cap_id} availability.current_artifacts += {patch['append']}"
+                )
+
+    if changed:
+        paths["index"].write_text(
+            yaml.safe_dump(index, sort_keys=False, allow_unicode=True),
+            encoding="utf-8",
+        )
+        for cap_id, front_matter in entry_cache.items():
+            _write_front_matter(entry_paths[cap_id], front_matter)
+    return changed
+
+
+def _write_front_matter(path: Path, front_matter: dict[str, Any]) -> None:
+    text = path.read_text(encoding="utf-8")
+    marker_end = text.find("\n---", 4)
+    body = text[marker_end + 4 :] if marker_end != -1 else "\n"
+    path.write_text(
+        "---\n"
+        + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True)
+        + "---"
+        + body,
+        encoding="utf-8",
+    )
+
+
+def build_update_prompt(
+    repo_root: Path,
+    capability_id: str,
+    *,
+    git_since: str | None = None,
+) -> str:
+    paths = registry_paths(repo_root)
+    index = load_index_at(paths["index"])
+    row = next((item for item in index["capabilities"] if item["id"] == capability_id), None)
+    if not row:
+        raise ValueError(f"capability not in index: {capability_id}")
+    entry = parse_front_matter(repo_root / row["path"])
+    diff = ""
+    if git_since:
+        proc = subprocess.run(
+            [
+                "git",
+                "-C",
+                str(repo_root),
+                "diff",
+                git_since,
+                "HEAD",
+                "--",
+                "registry/",
+                "reuse_surface/",
+                "tests/",
+            ],
+            capture_output=True,
+            text=True,
+            check=False,
+        )
+        diff = proc.stdout[:12000]
+
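+    # Dedent the template before interpolating: multi-line YAML or diff text
+    # inserted first would leave dedent without a common indent to strip.
+    template = textwrap.dedent(
+        """
+        Suggest registry entry updates for capability `{capability_id}`.
+
+        Return ONLY JSON:
+        {{
+          "promotion_history": [
+            {{"date": "YYYY-MM-DD", "dimension": "availability", "from": "A3", "to": "A4", "rationale": "..."}}
+          ],
+          "consumer_feedback": ["optional string notes"],
+          "notes": ["human review items"]
+        }}
+
+        Current entry YAML:
+        {entry_yaml}
+
+        Git diff since {since}:
+        {diff}
+        """
+    )
+    return template.format(
+        capability_id=capability_id,
+        entry_yaml=yaml.safe_dump(entry, sort_keys=False),
+        since=git_since or "N/A",
+        diff=diff or "(none)",
+    ).strip()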
+
+
+def suggest_llm_updates(
+    repo_root: Path,
+    capability_id: str,
+    *,
+    git_since: str | None = None,
+    llm_url: str | None = None,
+) -> dict[str, Any]:
+    prompt = build_update_prompt(repo_root, capability_id, git_since=git_since)
+    return request_json_object(
+        prompt,
+        base_url=llm_url,
+        config={"temperature": 0.2, "max_tokens": 2000},
+    )
+
+
+def format_suggestions_markdown(suggestions: list[dict[str, Any]]) -> str:
+    if not suggestions:
+        return "# Registry update suggestions\n\n_No suggestions._\n"
+    lines = ["# Registry update suggestions", ""]
+    for item in suggestions:
+        lines.append(f"- `{item['capability_id']}` **{item['kind']}**: {item['detail']}")
+    lines.append("")
+    lines.append(f"**{len(suggestions)}** suggestion(s). Use `--apply` to apply safe patches.")
+    return "\n".join(lines) + "\n"
+
+
+def format_suggestions_json(suggestions: list[dict[str, Any]]) -> str:
+    return json.dumps({"count": len(suggestions), "suggestions": suggestions}, indent=2)
--- a/reuse_surface/stats.py
+++ b/reuse_surface/stats.py
@@ -0,0 +1,259 @@
+from __future__ import annotations
+
+import json
+import urllib.error
+import urllib.request
+from collections import Counter
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from reuse_surface import hub_client
+from reuse_surface.registry import (
+    LEVEL_ORDERS,
+    entry_vector,
+    load_index_at,
+    parse_front_matter,
+    parse_vector,
+    registry_paths,
+    vectors_match,
+)
+
+
+def _histogram(values: list[str], order: list[str]) -> dict[str, int]:
+    counts = Counter(values)
+    return {level: counts.get(level, 0) for level in order if counts.get(level, 0)}
+
+
+def _probe_url(url: str) -> dict[str, Any]:
+    request = urllib.request.Request(
+        url,
+        method="HEAD",
+        headers={"User-Agent": "reuse-surface/0.1"},
+    )
+    try:
+        with urllib.request.urlopen(request, timeout=30) as response:
+            return {
+                "url": url,
+                "status": response.status,
+                "content_type": response.headers.get("Content-Type", ""),
+                "ok": response.status == 200,
+            }
+    except urllib.error.HTTPError as exc:
+        return {
+            "url": url,
+            "status": exc.code,
+            "content_type": exc.headers.get("Content-Type", ""),
+            "ok": False,
+        }
+    except urllib.error.URLError as exc:
+        return {"url": url, "status": None, "error": str(exc.reason), "ok": False}
+
+
+def collect_stats(
+    repo_root: Path,
+    *,
+    federation_ready: bool = False,
+    raw_url: str | None = None,
+    hub_url: str | None = None,
+) -> dict[str, Any]:
+    paths = registry_paths(repo_root)
+    stats: dict[str, Any] = {
+        "repo_root": str(repo_root),
+        "registry_present": paths["registry"].exists(),
+        "index_present": paths["index"].exists(),
+        "sources_present": paths["sources"].exists(),
+        "capability_count": 0,
+        "histograms": {},
+        "reliability": {"r0_r2": 0, "r3_plus": 0},
+        "consumption_modes": {},
+        "vector_drift": [],
+        "federation": {},
+        "hub": {},
+    }
+
+    if not paths["index"].exists():
+        if federation_ready and raw_url:
+            stats["federation"]["raw_url_probe"] = _probe_url(raw_url)
+        if hub_url or _hub_configured():
+            stats["hub"] = _hub_summary(hub_url)
+        return stats
+
+    index = load_index_at(paths["index"])
+    capabilities = index.get("capabilities", [])
+    stats["capability_count"] = len(capabilities)
+    stats["domain"] = index.get("domain")
+
+    discovery: list[str] = []
+    availability: list[str] = []
+    completeness: list[str] = []
+    reliability: list[str] = []
+    mode_counts: Counter[str] = Counter()
+
+    for row in capabilities:
+        vector = parse_vector(row["vector"])
+        discovery.append(vector["discovery"])
+        availability.append(vector["availability"])
+        completeness.append(vector["completeness"])
+        reliability.append(vector["reliability"])
+        for mode in row.get("consumption_modes", []):
+            mode_counts[mode] += 1
+
+        entry_path = repo_root / row["path"]
+        if entry_path.exists():
+            try:
+                front_matter = parse_front_matter(entry_path)
+                if not vectors_match(row["vector"], front_matter):
+                    stats["vector_drift"].append(
+                        {
+                            "id": row["id"],
+                            "index_vector": row["vector"],
+                            "entry_vector": entry_vector(front_matter),
+                        }
+                    )
+            except (ValueError, KeyError, TypeError):
+                stats["vector_drift"].append(
+                    {"id": row["id"], "error": "invalid entry front matter"}
+                )
+
+    stats["histograms"] = {
+        "discovery": _histogram(discovery, LEVEL_ORDERS["discovery"]),
+        "availability": _histogram(availability, LEVEL_ORDERS["availability"]),
+        "completeness": _histogram(completeness, LEVEL_ORDERS["completeness"]),
+        "reliability": _histogram(reliability, LEVEL_ORDERS["reliability"]),
+    }
+    stats["reliability"] = {
+        "r0_r2": sum(1 for level in reliability if level in {"R0", "R1", "R2"}),
+        "r3_plus": sum(1 for level in reliability if level_at_least_reliability(level, "R3")),
+    }
+    stats["consumption_modes"] = dict(sorted(mode_counts.items()))
+
+    if federation_ready:
+        probe_url = raw_url
+        if not probe_url and paths["index"].exists():
+            probe_url = _default_raw_url(repo_root)
+        if probe_url:
+            stats["federation"]["raw_url_probe"] = _probe_url(probe_url)
+        stats["federation"]["index_valid_yaml"] = _index_yaml_valid(paths["index"])
+
+    stats["hub"] = _hub_summary(hub_url)
+    return stats
+
+
+def level_at_least_reliability(current: str, minimum: str) -> bool:
+    order = LEVEL_ORDERS["reliability"]
+    return order.index(current) >= order.index(minimum)
+
+
+def _hub_configured() -> bool:
+    import os
+
+    return bool(os.environ.get("REUSE_SURFACE_URL"))
+
+
+def _hub_summary(hub_url: str | None) -> dict[str, Any]:
+    try:
+        status, payload = hub_client.hub_list(hub_url)
+    except (ValueError, urllib.error.URLError, OSError):
+        return {"configured": False}
+    if status != 200:
+        return {"configured": True, "status": status, "error": payload}
+    repos = payload.get("repos", [])
+    return {
+        "configured": True,
+        "registration_count": payload.get("count", len(repos)),
+        "enabled_count": sum(1 for repo in repos if repo.get("enabled", True)),
+    }
+
+
+def _default_raw_url(repo_root: Path) -> str | None:
+    return None
+
+
+def _index_yaml_valid(index_path: Path) -> bool:
+    try:
+        data = load_index_at(index_path)
+        return isinstance(data, dict) and "capabilities" in data
+    except (OSError, yaml.YAMLError):
+        return False
+
+
+def format_stats_markdown(stats: dict[str, Any]) -> str:
+    lines = ["# Registry stats", ""]
+    lines.append(f"**Repo:** `{stats['repo_root']}`")
+    lines.append(f"**Capabilities:** {stats['capability_count']}")
+    if stats.get("domain"):
+        lines.append(f"**Domain:** `{stats['domain']}`")
+    lines.append("")
+
+    lines.append("## Layout")
+    lines.append(f"- registry present: `{stats['registry_present']}`")
+    lines.append(f"- index present: `{stats['index_present']}`")
+    lines.append(f"- federation sources present: `{stats['sources_present']}`")
+    lines.append("")
+
+    rel = stats["reliability"]
+    lines.append("## Reliability bands (index vectors)")
+    lines.append(f"- R0–R2: **{rel['r0_r2']}**")
+    lines.append(f"- R3+: **{rel['r3_plus']}**")
+    lines.append("")
+
+    for dimension, histogram in stats.get("histograms", {}).items():
+        if not histogram:
+            continue
+        lines.append(f"## {dimension.title()} histogram")
+        for level, count in histogram.items():
+            lines.append(f"- `{level}`: {count}")
+        lines.append("")
+
+    if stats.get("consumption_modes"):
+        lines.append("## Consumption modes")
+        for mode, count in stats["consumption_modes"].items():
+            lines.append(f"- `{mode}`: {count}")
+        lines.append("")
+
+    drift = stats.get("vector_drift", [])
+    lines.append(f"## Vector drift: **{len(drift)}**")
+    for item in drift[:10]:
+        if "error" in item:
+            lines.append(f"- `{item['id']}`: {item['error']}")
+        else:
+            lines.append(
+                f"- `{item['id']}`: index `{item['index_vector']}` "
+                f"≠ entry `{item['entry_vector']}`"
+            )
+    if len(drift) > 10:
+        lines.append(f"- … and {len(drift) - 10} more")
+    lines.append("")
+
+    federation = stats.get("federation", {})
+    if federation:
+        lines.append("## Federation readiness")
+        if "index_valid_yaml" in federation:
+            lines.append(f"- index valid YAML: `{federation['index_valid_yaml']}`")
+        probe = federation.get("raw_url_probe")
+        if probe:
+            status = probe.get("status")
+            ok = probe.get("ok")
+            lines.append(f"- raw URL probe: status **{status}** ({'ok' if ok else 'fail'})")
+            lines.append(f"  `{probe.get('url', '')}`")
+        lines.append("")
+
+    hub = stats.get("hub", {})
+    if hub.get("configured"):
+        lines.append("## Hub")
+        if "registration_count" in hub:
+            lines.append(
+                f"- registrations: **{hub['registration_count']}** "
+                f"({hub.get('enabled_count', 0)} enabled)"
+            )
+        elif "error" in hub:
+            lines.append(f"- hub error: {hub['error']}")
+        lines.append("")
+
+    return "\n".join(lines) + "\n"
+
+
+def format_stats_json(stats: dict[str, Any]) -> str:
+    return json.dumps(stats, indent=2, sort_keys=True)
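Review note: the ordered, zero-suppressed histogram that `_histogram` produces can be illustrated with a small standalone sketch. `RELIABILITY_ORDER` is an assumption here (R0 through R6, inferred from the vector pattern in the draft schema); the real ordering lives in `LEVEL_ORDERS`, which this diff does not show.

```python
from collections import Counter

# Assumption: reliability levels run R0..R6, matching the schema's vector pattern.
RELIABILITY_ORDER = [f"R{n}" for n in range(7)]


def histogram(values: list[str], order: list[str]) -> dict[str, int]:
    """Count values, keep the canonical level order, and drop levels with no hits."""
    counts = Counter(values)
    return {level: counts[level] for level in order if counts[level]}


levels = ["R2", "R0", "R2", "R3"]
result = histogram(levels, RELIABILITY_ORDER)
print(result)

# Band split as in the stats summary: R0-R2 versus R3 and above.
low_band = sum(1 for level in levels if RELIABILITY_ORDER.index(level) < 3)
high_band = len(levels) - low_band
print(low_band, high_band)
```

Levels outside `order` are silently ignored by the histogram, while the index-based band split would raise `ValueError` for them, so the two views can disagree on malformed vectors.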
--- a/schemas/registry-draft.schema.json
+++ b/schemas/registry-draft.schema.json
@@ -0,0 +1,69 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://reuse-surface.local/schemas/registry-draft.schema.json",
+  "title": "RegistryDiscoveryDraft",
+  "type": "object",
+  "additionalProperties": false,
+  "required": ["capabilities"],
+  "properties": {
+    "domain": {
+      "type": "string"
+    },
+    "capabilities": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "additionalProperties": false,
+        "required": ["id", "name", "summary"],
+        "properties": {
+          "id": {
+            "type": "string",
+            "pattern": "^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$"
+          },
+          "name": {
+            "type": "string",
+            "minLength": 1
+          },
+          "summary": {
+            "type": "string",
+            "minLength": 1
+          },
+          "owner": {
+            "type": "string"
+          },
+          "vector": {
+            "type": "string",
+            "pattern": "^D[0-7] / A[0-7] / C[0-6] / R[0-6]$"
+          },
+          "tags": {
+            "type": "array",
+            "items": {
+              "type": "string"
+            }
+          },
+          "consumption_modes": {
+            "type": "array",
+            "items": {
+              "type": "string"
+            }
+          },
+          "discovery_intent": {
+            "type": "string"
+          },
+          "discovery_includes": {
+            "type": "array",
+            "items": {
+              "type": "string"
+            }
+          },
+          "discovery_excludes": {
+            "type": "array",
+            "items": {
+              "type": "string"
+            }
+          }
+        }
+      }
+    }
+  }
+}
--- a/tests/test_establish.py
+++ b/tests/test_establish.py
@@ -0,0 +1,77 @@
+from __future__ import annotations
+
+from pathlib import Path
+from unittest.mock import patch
+
+import yaml
+
+from reuse_surface.establish import (
+    discover_capabilities,
+    publish_check,
+    scaffold_registry,
+)
+from reuse_surface.registry import registry_paths
+
+
+def test_scaffold_creates_layout(tmp_path: Path):
+    created = scaffold_registry(tmp_path, domain="helix_forge")
+    paths = registry_paths(tmp_path)
+    assert paths["index"] in created
+    data = yaml.safe_load(paths["index"].read_text(encoding="utf-8"))
+    assert data["capabilities"] == []
+    assert data["domain"] == "helix_forge"
+
+
+def test_scaffold_refuses_existing_without_force(tmp_path: Path):
+    scaffold_registry(tmp_path)
+    try:
+        scaffold_registry(tmp_path)
+        raise AssertionError("expected ValueError")
+    except ValueError as exc:
+        assert "already exists" in str(exc)
+
+
+def test_publish_check_local_index(tmp_path: Path):
+    scaffold_registry(tmp_path)
+    result = publish_check(tmp_path)
+    assert result["ok"] is True
+    assert any(check["name"] == "local_index_yaml" for check in result["checks"])
+
+
+def test_publish_check_raw_url_fail(tmp_path: Path):
+    with patch(
+        "reuse_surface.establish._probe_raw_url",
+        return_value={"ok": False, "status": 303, "content_type": "text/html"},
+    ):
+        result = publish_check(
+            tmp_path,
+            raw_url="https://example.com/capabilities.yaml",
+        )
+    assert result["ok"] is False
+    assert result.get("remediation")
+
+
+def test_discover_dry_run_mock_llm(tmp_path: Path):
+    scaffold_registry(tmp_path)
+    (tmp_path / "README.md").write_text("# Demo service\n", encoding="utf-8")
+    draft = {
+        "domain": "helix_forge",
+        "capabilities": [
+            {
+                "id": "capability.demo.sample",
+                "name": "Sample",
+                "summary": "Sample capability.",
+                "owner": "demo",
+                "vector": "D2 / A0 / C0 / R0",
+                "tags": ["demo"],
+                "consumption_modes": ["informational"],
+                "discovery_intent": "Enable demo planning.",
+            }
+        ],
+    }
+    with patch(
+        "reuse_surface.establish.request_registry_draft",
+        return_value=draft,
+    ):
+        result = discover_capabilities(tmp_path, dry_run=True, apply=False)
+    assert result["draft"]["capabilities"][0]["id"] == "capability.demo.sample"
--- a/tests/test_llm_bridge.py
+++ b/tests/test_llm_bridge.py
@@ -0,0 +1,53 @@
+from __future__ import annotations
+
+import json
+from unittest.mock import patch
+
+import pytest
+
+from reuse_surface.llm_bridge import (
+    extract_json_object,
+    llm_connect_url,
+    request_registry_draft,
+)
+
+
+def test_extract_json_object_from_fenced_block():
+    data = extract_json_object('```json\n{"capabilities": []}\n```')
+    assert data == {"capabilities": []}
+
+
+def test_llm_connect_url_missing_raises():
+    with pytest.raises(ValueError, match="LLM_CONNECT_URL"):
+        llm_connect_url(None)
+
+
+def test_request_registry_draft_mock_http():
+    payload = {
+        "content": json.dumps(
+            {
+                "capabilities": [
+                    {
+                        "id": "capability.demo.sample",
+                        "name": "Sample",
+                        "summary": "Demo capability",
+                    }
+                ]
+            }
+        )
+    }
+
+    class FakeResponse:
+        def __enter__(self):
+            return self
+
+        def __exit__(self, *args):
+            return False
+
+        def read(self):
+            return json.dumps(payload).encode("utf-8")
+
+    with patch.dict("os.environ", {"LLM_CONNECT_URL": "http://llm.test"}):
+        with patch("urllib.request.urlopen", return_value=FakeResponse()):
+            draft = request_registry_draft("test prompt")
+    assert draft["capabilities"][0]["id"] == "capability.demo.sample"
--- a/tests/test_registry_update.py
+++ b/tests/test_registry_update.py
@@ -0,0 +1,87 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+import yaml
+
+from reuse_surface.establish import scaffold_registry
+from reuse_surface.registry import load_index_at, registry_paths
+from reuse_surface.registry_update import (
+    apply_deterministic_suggestions,
+    collect_deterministic_suggestions,
+)
+
+
+def _write_minimal_entry(tmp_path: Path, cap_id: str, vector: str) -> str:
+    rel = "registry/capabilities/capability-demo-sample.md"
+    d, a, c, r = [part.strip() for part in vector.split("/")]
+    front_matter = {
+        "id": cap_id,
+        "name": "Sample",
+        "summary": "Sample",
+        "owner": "demo",
+        "status": "draft",
+        "domain": "helix_forge",
+        "tags": ["demo"],
+        "maturity": {
+            "discovery": {"current": d, "target": "D5", "confidence": "low"},
+            "availability": {"current": a, "target": "A3", "confidence": "low"},
+        },
+        "external_evidence": {
+            "completeness": {"level": c, "confidence": "low"},
+            "reliability": {"level": r, "confidence": "low"},
+        },
+        "discovery": {"intent": "demo", "includes": [], "excludes": []},
+        "availability": {
+            "current_level": a,
+            "target_level": "A3",
+            "current_artifacts": [],
+            "consumption_modes": ["informational"],
+        },
+        "relations": {"depends_on": [], "supports": [], "related_to": []},
+        "evidence": {"documentation": [], "tests": []},
+        "consumer_guidance": {
+            "recommended_for": [],
+            "not_recommended_for": [],
+            "known_limitations": [],
+        },
+    }
+    path = tmp_path / rel
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(
+        "---\n"
+        + yaml.safe_dump(front_matter, sort_keys=False)
+        + "---\n",
+        encoding="utf-8",
+    )
+    return rel
+
+
+def test_vector_drift_suggestion(tmp_path: Path):
+    scaffold_registry(tmp_path)
+    cap_id = "capability.demo.sample"
+    rel = _write_minimal_entry(tmp_path, cap_id, "D3 / A0 / C0 / R0")
+    index_path = registry_paths(tmp_path)["index"]
+    index = load_index_at(index_path)
+    index["capabilities"] = [
+        {
+            "id": cap_id,
+            "name": "Sample",
+            "summary": "Sample",
+            "vector": "D2 / A0 / C0 / R0",
+            "domain": "helix_forge",
+            "status": "draft",
+            "owner": "demo",
+            "path": rel,
+            "tags": ["demo"],
+            "consumption_modes": ["informational"],
+        }
+    ]
+    index_path.write_text(yaml.safe_dump(index, sort_keys=False), encoding="utf-8")
+
+    suggestions = collect_deterministic_suggestions(tmp_path, capability_id=cap_id)
+    assert any(item["kind"] == "vector_drift" for item in suggestions)
+    changed = apply_deterministic_suggestions(tmp_path, suggestions)
+    assert changed
+    updated = load_index_at(index_path)
+    assert updated["capabilities"][0]["vector"] == "D3 / A0 / C0 / R0"
--- a/tests/test_stats.py
+++ b/tests/test_stats.py
@@ -0,0 +1,20 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+from reuse_surface.stats import collect_stats, format_stats_markdown
+
+
+def test_collect_stats_on_repo_root():
+    root = Path(__file__).resolve().parent.parent
+    stats = collect_stats(root)
+    assert stats["capability_count"] == 20  # pinned to the current registry size; update when entries change
+    assert stats["index_present"] is True
+    assert "discovery" in stats["histograms"]
+
+
+def test_format_stats_markdown_contains_count():
+    root = Path(__file__).resolve().parent.parent
+    text = format_stats_markdown(collect_stats(root))
+    assert "Capabilities:" in text
+    assert "20" in text
--- a/tools/README.md
+++ b/tools/README.md
@@ -104,6 +104,45 @@ reuse-surface hub sync --dry-run

 Run the service locally: `REUSE_SURFACE_TOKEN=dev-token reuse-surface serve`

+### stats
+
+Report registry maturity aggregates and federation readiness.
+
+```bash
+reuse-surface stats
+reuse-surface stats --format json
+reuse-surface stats --federation-ready --raw-url https://.../capabilities.yaml
+```
+
+### establish
+
+Bootstrap or discover a capability registry in the current or target repo.
+
+```bash
+reuse-surface establish --scaffold --domain helix_forge
+reuse-surface establish --scaffold --path ../state-hub
+reuse-surface establish --publish-check --raw-url https://.../capabilities.yaml
+export LLM_CONNECT_URL=http://127.0.0.1:8088
+reuse-surface establish --discover --dry-run
+reuse-surface establish --discover --apply
+```
+
+`--scaffold` creates the `registry/` layout. `--publish-check` probes the raw URL
+and the local index YAML. `--discover` drafts capabilities via llm-connect (optional).
+
+### update
+
+Refresh registry metadata from repo drift signals.
+
+```bash
+reuse-surface update --capability capability.registry.register --dry-run
+reuse-surface update --all --from-git-since HEAD~5 --apply
+reuse-surface update --capability capability.registry.register --suggest-maturity
+```
+
+Deterministic patches (`vector_drift`, new `tests/` citations) are applied with
+`--apply`. LLM suggestions are requested with `--suggest-maturity` and remain review-only.
+
 ### report cohorts

 Export capability cohorts for planning or implementation reuse decisions.
@@ -140,6 +179,11 @@ Stable IDs and maturity fields are preserved for agent consumption (UC-RS-019).
 | Publish catalog | `reuse-surface catalog` |
 | Compose federation | `reuse-surface federation compose` |
 | Sync federation manifest from hub | `reuse-surface hub sync` |
+| Registry stats | `reuse-surface stats` |
+| Bootstrap sibling registry | `reuse-surface establish --scaffold` |
+| Verify index publish URL | `reuse-surface establish --publish-check` |
+| Draft capabilities (LLM) | `reuse-surface establish --discover` |
+| Refresh entry metadata | `reuse-surface update` |
 | Planning cohort export | `reuse-surface report cohorts` |
 | Relation graph | `reuse-surface graph` |

--- a/workplans/archived/260617-REUSE-WP-0013-registry-establish-and-llm-assist.md
+++ b/workplans/archived/260617-REUSE-WP-0013-registry-establish-and-llm-assist.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Registry establish, update, and stats with optional llm-connect assist"
 domain: helix_forge
 repo: reuse-surface
-status: ready
+status: finished
 owner: codex
 topic_slug: helix-forge
 created: "2026-06-16"
-updated: "2026-06-16"
+updated: "2026-06-17"
 state_hub_workstream_id: "239a0077-8593-4dc7-918d-4c23895275f6"
 ---

@@ -91,7 +91,7 @@ reuse-surface update --from-git-since HEAD~5 --apply

 ```task
 id: REUSE-WP-0013-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "98e65330-bfc7-4282-b372-d35542b899ce"
 ```
@@ -112,7 +112,7 @@ Output: Markdown default, `--format json`. Pytest coverage. Document in

 ```task
 id: REUSE-WP-0013-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "b8fedd87-d0d3-41b4-9af8-e36d52bfe1c5"
 ```
@@ -131,7 +131,7 @@ No llm-connect dependency. Pytest with temp directory.

 ```task
 id: REUSE-WP-0013-T03
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "2924d685-709f-4e28-886f-b363cd9c40b4"
 ```
@@ -147,7 +147,7 @@ Federation publish helper for sibling repo operators:

 ```task
 id: REUSE-WP-0013-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "650ebee5-b34b-4ed8-891d-d93aacebadd7"
 ```
@@ -166,7 +166,7 @@ Thin client boundary:

 ```task
 id: REUSE-WP-0013-T05
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "b9154889-f538-4266-9918-b277f9a297be"
 ```
@@ -185,7 +185,7 @@ LLM-assisted bootstrap after `--scaffold` or on empty registry:

 ```task
 id: REUSE-WP-0013-T06
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "b79558da-54b2-4712-91d2-b298c7cf2c40"
 ```
@@ -210,7 +210,7 @@ Targets: single `--capability`, `--all`, `--from-git-since <ref>`.

 ```task
 id: REUSE-WP-0013-T07
-status: todo
+status: done
 priority: low
 state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291"
 ```
@@ -227,13 +227,20 @@ state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291"

 ## Acceptance

- [ ] `reuse-surface stats` reports maturity and federation-readiness aggregates
- [ ] `establish --scaffold` creates valid empty registry layout without overwrite accidents
- [ ] `establish --publish-check` detects 303 vs 200 raw URL outcomes
- [ ] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
- [ ] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
- [ ] `update --dry-run` reports deterministic drift on sample repo
- [ ] All new commands documented; gap priority 24 recorded
+- [x] `reuse-surface stats` reports maturity and federation-readiness aggregates
+- [x] `establish --scaffold` creates valid empty registry layout without overwrite accidents
+- [x] `establish --publish-check` detects 303 vs 200 raw URL outcomes
+- [x] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
+- [x] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
+- [x] `update --dry-run` reports deterministic drift on sample repo
+- [x] All new commands documented; gap priority 24 recorded
+
+## Completion notes (2026-06-17)
+
+- Modules: `stats.py`, `establish.py`, `registry_update.py`, `llm_bridge.py`
+- Schema: `schemas/registry-draft.schema.json`
+- `validate --root` for sibling repo validation after `establish --discover --apply`
+- 43 pytest tests; optional `pip install -e ".[llm]"` extra

 ## Out of scope