Implement REUSE-WP-0013 registry establish, update, and stats

Add the stats, establish (scaffold, publish-check, discover), and update CLI commands with an optional llm-connect bridge. Also add validate --root for sibling repos, pytest coverage, and documentation for sibling registry onboarding.
2026-06-16 01:21:01 +02:00
parent fb712b4b98
commit 70a5003f6e
19 changed files with 1740 additions and 30 deletions
--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -32,6 +32,9 @@ jobs:
          reuse-surface catalog
          reuse-surface graph --check --fail-on-warnings
      - name: Registry stats (informational)
        run: reuse-surface stats || true
      - name: Planning cohort report (informational)
        run: reuse-surface report cohorts --planning-min D4 || true
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -60,6 +60,11 @@ The MVP registry foundation, CLI tooling (REUSE-WP-0003), federation stack
  against `https://reuse.coulomb.social`
 - **Sync local federation manifest from hub** with `reuse-surface hub sync`
 - **Export planning cohorts** with `reuse-surface report cohorts`
 - **Bootstrap a sibling registry** with `reuse-surface establish --scaffold`
 - **Verify index publish readiness** with `reuse-surface establish --publish-check`
 - **View registry stats** with `reuse-surface stats`
 - **Draft or refresh entries** with `reuse-surface establish --discover` and
  `reuse-surface update` (optional llm-connect backend)
 - **Run the hub locally or in a container** with `reuse-surface serve`
 - **Generate relation graphs** with `reuse-surface graph`
 - **Explore relations interactively** at `docs/graph/index.html`
@@ -104,8 +109,8 @@ See `tools/README.md` for command reference.
 - **Federated index:** `registry/indexes/federated.yaml` (local compose).
 - **Relation graph:** `docs/graph/capability-graph.mmd`, `docs/graph/index.html`.
 - **Searchable catalog:** `docs/catalog/search.html`.
- **Workplans:** REUSE-WP-0001 through REUSE-WP-0011 finished; WP-0011 archived;
+- **Workplans:** REUSE-WP-0001 through REUSE-WP-0012 finished/archived;
-  **REUSE-WP-0012** finished (federation scale + intent alignment).
+  **REUSE-WP-0013** finished (registry establish/update/stats).
 - **Assessment history:** `history/2026-06-15-intent-scope-assessment.md`.
 - **Self-assessed vector:** `D5 / A4 / C5 / R3` (see `docs/IntentScopeGapAnalysis.md`).
--- a/docs/IntentScopeGapAnalysis.md
+++ b/docs/IntentScopeGapAnalysis.md
@@ -3,7 +3,7 @@
 **Repository:** `reuse-surface`  
 **Artifact:** `docs/IntentScopeGapAnalysis.md`  
 **Status:** Living analysis  
-**Updated:** 2026-06-16  
+**Updated:** 2026-06-17  
 **Purpose:** Record alignment, drift, and open gaps between declared intent and
 current delivered scope so future workplans can close them deliberately.
@@ -30,6 +30,8 @@ four maturity dimensions, and human/agent consumers.
   standardization tracker still manual.
 3. **Hub automation** — `hub sync` shipped; polling/webhooks still absent.
 4. **Managed platform posture** — A5 container documented; A6/Postgres deferred.
 5. **Registry bootstrap in sibling repos** — `establish`/`update`/`stats` shipped;
   sibling adoption still operator-driven.
 **Current reuse-surface product vector (self-assessment):** `D5 / A4 / C5 / R3`
@@ -197,8 +199,10 @@ archived workplans under `workplans/archived/`.
 | 21 | INTENT layout sync | Update INTENT.md tree and example entry shape | **Closed** (WP-0012) |
 | 22 | Hub hardening | Postgres option, backup, documented SLO (A5→A6 path) | **Closed** (doc; implementation deferred) |
 | 23 | External evidence program | Raise catalog R levels with consumer_feedback | **Closed** (checklist + 3 entries; telemetry deferred) |
 | 24 | Registry bootstrap tooling | `establish`, `update`, `stats` for sibling repos | **Closed** (WP-0013) |
-**Workplan:** `REUSE-WP-0012` (finished). **Assessment snapshots:**
+**Workplan:** `REUSE-WP-0013` (finished). Prior: `REUSE-WP-0012` (finished).
 **Assessment snapshots:**
 `history/2026-06-15-intent-scope-assessment.md`,
 `history/2026-06-16-hub-registration-blocks.md`.
@@ -228,3 +232,4 @@ archived workplans under `workplans/archived/`.
 | 2026-06-15 | Post-WP-0011 refresh: 20 capabilities, vector D5/A4/C4/R3, priorities 18–23 proposed |
 | 2026-06-15 | REUSE-WP-0012 proposed; assessment archived in `history/2026-06-15-intent-scope-assessment.md` |
 | 2026-06-16 | REUSE-WP-0012 closed priorities 19–23; priority 18 deferred on sibling index blocks; vector C5 |
 | 2026-06-17 | REUSE-WP-0013 closed priority 24; establish/update/stats + optional llm-connect assist |
--- a/docs/RegistryFederation.md
+++ b/docs/RegistryFederation.md
@@ -97,6 +97,18 @@ curl -fsS "<raw-url>" | head
  source) to an environment variable holding a Bearer token or full header value.
  The hub stores `auth_env` / `auth_header` names only — never secret values.
 ### Sibling onboarding (CLI)
 ```bash
 cd ../state-hub
 reuse-surface establish --scaffold --domain helix_forge
 # optional: LLM_CONNECT_URL=... reuse-surface establish --discover --dry-run
 reuse-surface validate --root .
 git push origin main
 reuse-surface establish --publish-check \
  --raw-url https://gitea.coulomb.social/coulomb/state-hub/raw/main/registry/indexes/capabilities.yaml
 ```
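The `--raw-url` value in the example follows a regular layout, so it can be assembled rather than typed by hand. A small sketch, assuming the `/<owner>/<repo>/raw/<branch>/<path>` layout of the example URL; `raw_index_url` is a hypothetical helper and not part of the CLI:

```python
from urllib.parse import quote

INDEX_PATH = "registry/indexes/capabilities.yaml"


def raw_index_url(base: str, owner: str, repo: str, branch: str = "main") -> str:
    """Build a raw index URL in the layout used by the onboarding example."""
    # Hypothetical helper: percent-encode each segment so unusual names stay valid.
    parts = [quote(owner, safe=""), quote(repo, safe=""), "raw", quote(branch, safe="")]
    return f"{base.rstrip('/')}/{'/'.join(parts)}/{INDEX_PATH}"


url = raw_index_url("https://gitea.coulomb.social", "coulomb", "state-hub")
print(url)
```

The result can be passed to `reuse-surface establish --publish-check --raw-url`.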
 ### Registration checklist
 1. Merge capability index to the default branch.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -20,6 +20,9 @@ dev = [
  "httpx>=0.27",
  "pytest>=8.0",
 ]
 llm = [
  "llm-connect",
 ]
 [project.scripts]
 reuse-surface = "reuse_surface.cli:main"
--- a/registry/README.md
+++ b/registry/README.md
@@ -35,6 +35,21 @@ registry/
 Missing evidence is acceptable in the MVP when it is explicit rather than hidden.
 ## LLM-assisted discover review checklist
 When using `reuse-surface establish --discover` (llm-connect backend):
 - [ ] Every proposed `id` follows `capability.<domain>.<name>` and is not a duplicate
 - [ ] `summary`, `discovery.intent`, and maturity vectors match repo reality
 - [ ] `owner` reflects the delivering repository or team
 - [ ] Relations are empty or manually added after human review
 - [ ] Run `reuse-surface validate --root <repo>` before merge
 - [ ] Run `reuse-surface establish --publish-check` after pushing to `main`
 Discover drafts start at low maturity with explicit auto-draft risks in
 `known_reliability_risks`. Promote only with evidence per
 `specs/CapabilityMaturityStandard.md`.
 ## Manual validation checklist
 Use this checklist until an automated CLI validator exists.
--- a/reuse_surface/cli.py
+++ b/reuse_surface/cli.py
@@ -26,21 +26,48 @@ from reuse_surface.reports import (
    format_cohort_markdown,
    select_cohort,
 )
 from reuse_surface.establish import (
    discover_capabilities,
    format_publish_check_markdown,
    publish_check,
    scaffold_next_steps,
    scaffold_registry,
 )
 from reuse_surface.registry_update import (
    apply_deterministic_suggestions,
    collect_deterministic_suggestions,
    format_suggestions_json,
    format_suggestions_markdown,
    suggest_llm_updates,
 )
 from reuse_surface.stats import collect_stats, format_stats_json, format_stats_markdown
 from reuse_surface.registry import (
    ROOT,
    capability_paths,
    level_at_least,
    load_index,
    load_index_at,
    load_schema,
    parse_front_matter,
    parse_vector,
    registry_paths,
 )
-def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[str]:
+def _registry_root(args: argparse.Namespace) -> Path:
    if getattr(args, "root", None):
        return Path(args.root).resolve()
    return ROOT
 def _check_index_drift(
    entry_paths: list[Path],
    index: dict[str, Any],
    repo_root: Path,
 ) -> list[str]:
    warnings: list[str] = []
    indexed_paths = {item["path"] for item in index.get("capabilities", [])}
-    file_paths = {str(path.relative_to(ROOT)) for path in entry_paths}
+    file_paths = {str(path.relative_to(repo_root)) for path in entry_paths}
    for path in sorted(file_paths - indexed_paths):
        warnings.append(f"index drift: entry file not indexed: {path}")
    for path in sorted(indexed_paths - file_paths):
@@ -48,11 +75,22 @@ def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[s
    return warnings
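The drift check boils down to two set differences between the paths listed in the index and the entry files on disk. A minimal standalone sketch, assuming the index shape shown in this diff (`capabilities` rows with a `path` key). The name `index_drift` and the wording of the second warning are illustrative, since that line falls outside the hunk:

```python
from pathlib import Path
from typing import Any


def index_drift(entry_paths: list[Path], index: dict[str, Any], repo_root: Path) -> list[str]:
    """Report entry files missing from the index and index rows without a file."""
    indexed = {row["path"] for row in index.get("capabilities", [])}
    # as_posix() keeps separators stable across platforms (an assumption; the diff uses str()).
    on_disk = {path.relative_to(repo_root).as_posix() for path in entry_paths}
    warnings = [f"index drift: entry file not indexed: {p}" for p in sorted(on_disk - indexed)]
    # Hypothetical message: the original text of this warning is not visible in the hunk.
    warnings += [f"index drift: indexed path has no entry file: {p}" for p in sorted(indexed - on_disk)]
    return warnings


root = Path("/repo")
index = {
    "capabilities": [
        {"path": "registry/capabilities/a.md"},
        {"path": "registry/capabilities/gone.md"},
    ]
}
files = [root / "registry/capabilities/a.md", root / "registry/capabilities/new.md"]
drift = index_drift(files, index, root)
for line in drift:
    print(line)
```

Passing `repo_root` explicitly, as the change does, is what lets the same comparison run against a sibling checkout instead of the install root.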
-def cmd_validate(args: argparse.Namespace) -> int:
+def _capability_paths_for(repo_root: Path, target: Path | None) -> list[Path]:
    if target is not None:
        return [target]
    cap_dir = registry_paths(repo_root)["capabilities"]
    return sorted(path for path in cap_dir.glob("*.md") if path.name != ".gitkeep")
 def _run_validate(
    repo_root: Path,
    *,
    target: Path | None,
    relations: bool,
 ) -> tuple[list[str], list[str], list[Path]]:
    schema = load_schema()
    validator = Draft202012Validator(schema)
-    target = Path(args.path) if args.path else None
+    paths = _capability_paths_for(repo_root, target)
-    paths = capability_paths(target)
    errors: list[str] = []
    warnings: list[str] = []
@@ -67,10 +105,23 @@ def cmd_validate(args: argparse.Namespace) -> int:
            errors.append(f"{path}: {location}: {error.message}")
    if not target:
-        index = load_index()
+        index_path = registry_paths(repo_root)["index"]
-        warnings.extend(_check_index_drift(paths, index))
+        if index_path.exists():
-        if args.relations:
+            index = load_index_at(index_path)
            warnings.extend(_check_index_drift(paths, index, repo_root))
        if relations and repo_root == ROOT:
            warnings.extend(check_relations())
    return errors, warnings, paths
 def cmd_validate(args: argparse.Namespace) -> int:
    repo_root = _registry_root(args)
    target = Path(args.path) if args.path else None
    if target and not target.is_absolute():
        target = repo_root / target
    errors, warnings, paths = _run_validate(
        repo_root, target=target, relations=args.relations
    )
    for warning in warnings:
        print(f"warning: {warning}", file=sys.stderr)
@@ -329,6 +380,117 @@ def cmd_hub_sync(args: argparse.Namespace) -> int:
    return 0
 def cmd_stats(args: argparse.Namespace) -> int:
    repo_root = Path(args.path or ".").resolve()
    stats = collect_stats(
        repo_root,
        federation_ready=args.federation_ready,
        raw_url=args.raw_url,
        hub_url=getattr(args, "hub_url", None),
    )
    if args.format == "json":
        print(format_stats_json(stats))
    else:
        print(format_stats_markdown(stats), end="")
    return 0
 def cmd_establish(args: argparse.Namespace) -> int:
    repo_root = Path(args.path or ".").resolve()
    try:
        if args.scaffold:
            created = scaffold_registry(
                repo_root, domain=args.domain, force=args.force
            )
            for path in created:
                print(f"ok: wrote {path.relative_to(repo_root)}")
            print(scaffold_next_steps(repo_root))
            return 0
        if args.publish_check:
            result = publish_check(repo_root, raw_url=args.raw_url)
            print(format_publish_check_markdown(result), end="")
            return 0 if result["ok"] else 1
        if args.discover:
            result = discover_capabilities(
                repo_root,
                domain=args.domain,
                dry_run=not args.apply,
                apply=args.apply,
                llm_url=args.llm_url,
                context_max_files=args.context_max_files,
            )
            if result.get("dry_run"):
                print(yaml.safe_dump(result["draft"], sort_keys=False))
                return 0
            for path in result.get("written", []):
                print(f"ok: wrote {path}")
            validate_args = argparse.Namespace(
                path=None,
                root=str(repo_root),
                relations=False,
                fail_on_warnings=True,
            )
            return cmd_validate(validate_args)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print("error: specify --scaffold, --publish-check, or --discover", file=sys.stderr)
    return 1
 def cmd_update(args: argparse.Namespace) -> int:
    repo_root = Path(args.path or ".").resolve()
    try:
        capability_id = None if args.all else args.capability
        if not args.all and not args.capability:
            print("error: specify --capability or --all", file=sys.stderr)
            return 1
        if args.suggest_maturity:
            cap_ids = [args.capability] if args.capability else []
            if args.all:
                index = load_index_at(registry_paths(repo_root)["index"])
                cap_ids = [row["id"] for row in index.get("capabilities", [])]
            payload = {
                "suggestions": [
                    suggest_llm_updates(
                        repo_root,
                        cap_id,
                        git_since=args.from_git_since,
                        llm_url=args.llm_url,
                    )
                    for cap_id in cap_ids
                ]
            }
            print(json.dumps(payload, indent=2, sort_keys=True))
            return 0
        suggestions = collect_deterministic_suggestions(
            repo_root,
            capability_id=capability_id,
            git_since=args.from_git_since,
        )
        if args.apply:
            changed = apply_deterministic_suggestions(repo_root, suggestions)
            for line in changed:
                print(f"ok: {line}")
            validate_args = argparse.Namespace(
                path=None,
                root=str(repo_root),
                relations=False,
                fail_on_warnings=True,
            )
            return cmd_validate(validate_args)
        if args.format == "json":
            print(format_suggestions_json(suggestions))
        else:
            print(format_suggestions_markdown(suggestions), end="")
        return 0
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
 def cmd_report_cohorts(args: argparse.Namespace) -> int:
    filters = cohort_filters_from_args(args)
    matches = select_cohort(filters)
@@ -399,6 +561,10 @@ def main(argv: list[str] | None = None) -> int:
        action="store_true",
        help="exit non-zero when warnings are present",
    )
    validate.add_argument(
        "--root",
        help="registry repo root (default: reuse-surface install root)",
    )
    validate.set_defaults(func=cmd_validate)
    federation = subparsers.add_parser(
@@ -539,6 +705,41 @@ def main(argv: list[str] | None = None) -> int:
    )
    cohorts.set_defaults(func=cmd_report_cohorts)
    stats = subparsers.add_parser("stats", help="registry maturity and federation stats")
    stats.add_argument("--path", help="repo root (default: cwd)")
    stats.add_argument("--federation-ready", action="store_true")
    stats.add_argument("--raw-url", help="probe federation raw index URL")
    stats.add_argument("--hub-url", help="hub base URL (or REUSE_SURFACE_URL)")
    stats.add_argument("--format", choices=["markdown", "json"], default="markdown")
    stats.set_defaults(func=cmd_stats)
    establish = subparsers.add_parser(
        "establish", help="bootstrap or discover capability registry"
    )
    establish.add_argument("--path", help="target repo root (default: cwd)")
    establish.add_argument("--domain", default="helix_forge")
    establish.add_argument("--force", action="store_true")
    establish.add_argument("--scaffold", action="store_true")
    establish.add_argument("--publish-check", action="store_true")
    establish.add_argument("--discover", action="store_true")
    establish.add_argument("--dry-run", action="store_true", help="discover preview (default)")
    establish.add_argument("--apply", action="store_true", help="discover write + validate")
    establish.add_argument("--raw-url", help="raw Gitea index URL for publish-check")
    establish.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
    establish.add_argument("--context-max-files", type=int, default=12)
    establish.set_defaults(func=cmd_establish)
    update = subparsers.add_parser("update", help="refresh registry metadata from repo signals")
    update.add_argument("--path", help="repo root (default: cwd)")
    update.add_argument("--capability", help="single capability id")
    update.add_argument("--all", action="store_true")
    update.add_argument("--from-git-since", help="git ref for change detection")
    update.add_argument("--apply", action="store_true")
    update.add_argument("--suggest-maturity", action="store_true")
    update.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
    update.add_argument("--format", choices=["markdown", "json"], default="markdown")
    update.set_defaults(func=cmd_update)
    args = parser.parse_args(argv)
    return args.func(args)
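As wired in this diff, `cmd_establish` derives `dry_run` from `not args.apply`, so passing both `--dry-run` and `--apply` never reaches the conflict check inside `discover_capabilities`. One possible way to surface the conflict at parse time is an argparse mutually exclusive group. This is a sketch only; the parser below is a stand-in, not the project's parser:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser showing --dry-run and --apply as exclusive flags."""
    parser = argparse.ArgumentParser(prog="establish-sketch")
    parser.add_argument("--discover", action="store_true")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--dry-run", action="store_true", help="discover preview (default)")
    mode.add_argument("--apply", action="store_true", help="discover write + validate")
    return parser


args = build_parser().parse_args(["--discover", "--apply"])
# Preview stays the default whenever --apply is absent.
dry_run = not args.apply
print(args.apply, dry_run)
```

With this grouping, `--dry-run --apply` makes argparse exit with a usage error before any command code runs.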
--- a/reuse_surface/establish.py
+++ b/reuse_surface/establish.py
@@ -0,0 +1,448 @@
 from __future__ import annotations
 import json
 import textwrap
 import urllib.error
 import urllib.request
 from datetime import date
 from pathlib import Path
 from typing import Any
 import yaml
 from reuse_surface.llm_bridge import request_registry_draft
 from reuse_surface.registry import load_index_at, registry_paths
 SCAFFOLD_README = """# Capability Registry
 Markdown-first capability index for federation and reuse planning.
 ## Authoring
 1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
 2. Add the row to `indexes/capabilities.yaml`.
 3. Run `reuse-surface validate --root .` from this repository's root, with the CLI installed.
 4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
 Federation contract: reuse-surface `docs/RegistryFederation.md`.
 """
 CONTEXT_FILES = (
    "INTENT.md",
    "SCOPE.md",
    "AGENTS.md",
    "README.md",
    "pyproject.toml",
    "Cargo.toml",
    "go.mod",
 )
 def scaffold_registry(
    repo_root: Path,
    *,
    domain: str = "helix_forge",
    force: bool = False,
 ) -> list[Path]:
    paths = registry_paths(repo_root)
    created: list[Path] = []
    if paths["registry"].exists() and not force:
        raise ValueError(
            f"registry already exists at {paths['registry']}; use --force to overwrite"
        )
    paths["registry"].mkdir(parents=True, exist_ok=True)
    paths["capabilities"].mkdir(parents=True, exist_ok=True)
    paths["index"].parent.mkdir(parents=True, exist_ok=True)
    readme = paths["registry"] / "README.md"
    if force or not readme.exists():
        readme.write_text(SCAFFOLD_README, encoding="utf-8")
        created.append(readme)
    gitkeep = paths["capabilities"] / ".gitkeep"
    if force or not gitkeep.exists():
        gitkeep.write_text("", encoding="utf-8")
        created.append(gitkeep)
    index_data = {
        "version": 1,
        "updated": date.today().isoformat(),
        "domain": domain,
        "capabilities": [],
    }
    if force or not paths["index"].exists():
        paths["index"].write_text(
            yaml.safe_dump(index_data, sort_keys=False, allow_unicode=True),
            encoding="utf-8",
        )
        created.append(paths["index"])
    return created
 def scaffold_next_steps(repo_root: Path) -> str:
    return textwrap.dedent(
        f"""
        Next steps:
          1. Add capability entries under {repo_root / 'registry/capabilities'}
          2. Update {repo_root / 'registry/indexes/capabilities.yaml'}
          3. reuse-surface validate --root {repo_root}
          4. git push origin main
          5. reuse-surface establish --publish-check --raw-url <gitea-raw-url>
          6. reuse-surface hub register --repo <slug> --url <raw-url>
        """
    ).strip()
 def publish_check(
    repo_root: Path,
    *,
    raw_url: str | None = None,
 ) -> dict[str, Any]:
    paths = registry_paths(repo_root)
    result: dict[str, Any] = {
        "repo_root": str(repo_root),
        "checks": [],
        "ok": True,
    }
    if paths["index"].exists():
        try:
            data = load_index_at(paths["index"])
            valid = isinstance(data, dict) and isinstance(data.get("capabilities"), list)
            result["checks"].append(
                {
                    "name": "local_index_yaml",
                    "ok": valid,
                    "detail": f"{len(data.get('capabilities', []))} capabilities"
                    if valid
                    else "invalid structure",
                }
            )
            if not valid:
                result["ok"] = False
        except (OSError, yaml.YAMLError) as exc:
            result["checks"].append(
                {"name": "local_index_yaml", "ok": False, "detail": str(exc)}
            )
            result["ok"] = False
    else:
        result["checks"].append(
            {
                "name": "local_index_yaml",
                "ok": False,
                "detail": "registry/indexes/capabilities.yaml missing",
            }
        )
        result["ok"] = False
    if raw_url:
        probe = _probe_raw_url(raw_url)
        result["checks"].append(
            {
                "name": "raw_url_probe",
                "ok": probe["ok"],
                "detail": f"HTTP {probe.get('status')} {probe.get('content_type', '')}".strip(),
                "url": raw_url,
            }
        )
        if probe["ok"]:
            body_probe = _fetch_yaml_snippet(raw_url)
            result["checks"].append(body_probe)
            if not body_probe.get("ok"):
                result["ok"] = False
        else:
            result["ok"] = False
            result["remediation"] = (
                "Merge registry/indexes/capabilities.yaml to main and confirm "
                "Gitea raw URL returns 200 YAML. See docs/RegistryFederation.md."
            )
    return result
 def _probe_raw_url(url: str) -> dict[str, Any]:
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "reuse-surface/0.1"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return {
                "ok": response.status == 200,
                "status": response.status,
                "content_type": response.headers.get("Content-Type", ""),
            }
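    except urllib.error.HTTPError as exc:
        return {
            "ok": False,
            "status": exc.code,
            "content_type": exc.headers.get("Content-Type", ""),
        }
    except urllib.error.URLError as exc:
        # DNS or connection failure: report a failed probe instead of raising.
        return {"ok": False, "status": f"error: {exc.reason}", "content_type": ""}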
 def _fetch_yaml_snippet(url: str) -> dict[str, Any]:
    request = urllib.request.Request(url, headers={"User-Agent": "reuse-surface/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        return {"name": "raw_url_body", "ok": False, "detail": f"HTTP {exc.code}"}
    except urllib.error.URLError as exc:
        return {"name": "raw_url_body", "ok": False, "detail": str(exc.reason)}
    try:
        data = yaml.safe_load(body)
    except yaml.YAMLError as exc:
        return {"name": "raw_url_body", "ok": False, "detail": str(exc)}
    ok = isinstance(data, dict) and "capabilities" in data
    return {
        "name": "raw_url_body",
        "ok": ok,
        "detail": "valid capabilities.yaml shape" if ok else "body is not valid index YAML",
    }
 def collect_context(repo_root: Path, *, max_files: int = 12) -> str:
    chunks: list[str] = []
    used = 0
    for name in CONTEXT_FILES:
        if used >= max_files:
            break
        path = repo_root / name
        if path.is_file():
            chunks.append(f"### {name}\n{path.read_text(encoding='utf-8')[:8000]}")
            used += 1
    pkg_dirs = sorted(
        [
            item
            for item in repo_root.iterdir()
            if item.is_dir()
            and not item.name.startswith(".")
            and item.name not in {"registry", "tests", "docs", "workplans", "node_modules"}
        ]
    )
    for pkg in pkg_dirs[: max(0, max_files - used)]:
        init = pkg / "__init__.py"
        if init.exists():
            chunks.append(f"### {pkg.name}/__init__.py\n{init.read_text(encoding='utf-8')[:2000]}")
    return "\n\n".join(chunks)
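The context collector is essentially a budgeted concatenation: take known files in priority order, stop at a file limit, and truncate each file. A simplified sketch of that idea, assuming only the top-level file pass (the package `__init__.py` pass is left out); `gather_context` and its `max_chars` parameter are illustrative names:

```python
import tempfile
from pathlib import Path


def gather_context(root: Path, names: tuple[str, ...], *, max_files: int = 12, max_chars: int = 8000) -> str:
    """Concatenate the first max_files existing files, each truncated to max_chars."""
    chunks: list[str] = []
    for name in names:
        if len(chunks) >= max_files:
            break
        path = root / name
        if path.is_file():
            # errors="replace" is an assumption here; the diff reads strict UTF-8.
            text = path.read_text(encoding="utf-8", errors="replace")[:max_chars]
            chunks.append(f"### {name}\n{text}")
    return "\n\n".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "README.md").write_text("x" * 50, encoding="utf-8")
    (repo / "SCOPE.md").write_text("scope notes", encoding="utf-8")
    context = gather_context(repo, ("INTENT.md", "SCOPE.md", "README.md"), max_files=2, max_chars=10)

print(context)
```

Missing files are skipped without counting against the limit, which matches how `used` is incremented only for files that exist.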
 def build_discover_prompt(context: str, domain: str) -> str:
    schema_hint = json.dumps(
        {
            "domain": domain,
            "capabilities": [
                {
                    "id": "capability.domain.name",
                    "name": "Human Name",
                    "summary": "One sentence.",
                    "owner": "team",
                    "vector": "D2 / A0 / C0 / R0",
                    "tags": ["tag"],
                    "consumption_modes": ["informational"],
                    "discovery_intent": "What this enables.",
                    "discovery_includes": ["included behavior"],
                    "discovery_excludes": ["excluded behavior"],
                }
            ],
        },
        indent=2,
    )
    return textwrap.dedent(
        f"""
        You are drafting a capability registry index for helix_forge reuse-surface.
        Return ONLY a JSON object matching this shape (no markdown fences):
        {schema_hint}
        Rules:
        - Propose 1-5 distinct capabilities grounded in the repository context.
        - Use IDs matching ^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$
        - Default vector D2 / A0 / C0 / R0 unless strong delivery evidence exists.
        - domain: {domain}
        Repository context:
        {context}
        """
    ).strip()
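The ID rule in the prompt can also be checked on the returned draft, since a model may ignore it. A minimal sketch, assuming the draft shape from `schema_hint`; `invalid_ids` is a hypothetical helper. Note that the pattern as written allows no underscore in any segment, so an ID built from the default `helix_forge` domain would not match; the diff does not say whether that is intended:

```python
import re
from typing import Any

# Pattern copied from the prompt rules, with the f-string escaping removed.
CAPABILITY_ID = re.compile(r"^capability\.[a-z0-9]+(\.[a-z0-9-]+)+$")


def invalid_ids(draft: dict[str, Any]) -> list[str]:
    """Return proposed IDs that do not match the pattern."""
    return [
        str(item.get("id"))
        for item in draft.get("capabilities", [])
        if not CAPABILITY_ID.fullmatch(str(item.get("id", "")))
    ]


draft = {
    "capabilities": [
        {"id": "capability.registry.stats"},
        {"id": "capability.helix_forge.stats"},  # underscore is outside [a-z0-9]
        {"id": "Capability.Registry"},  # uppercase and only two segments
    ]
}
rejected = invalid_ids(draft)
print(rejected)
```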
 def discover_capabilities(
    repo_root: Path,
    *,
    domain: str = "helix_forge",
    dry_run: bool = True,
    apply: bool = False,
    llm_url: str | None = None,
    context_max_files: int = 12,
 ) -> dict[str, Any]:
    if apply and dry_run:
        raise ValueError("use either --dry-run or --apply, not both")
    if not apply and not dry_run:
        dry_run = True
    context = collect_context(repo_root, max_files=context_max_files)
    if not context.strip():
        raise ValueError("no context files found for discovery")
    prompt = build_discover_prompt(context, domain)
    draft = request_registry_draft(
        prompt,
        base_url=llm_url,
        config={"temperature": 0.2, "max_tokens": 4000},
    )
    result: dict[str, Any] = {"draft": draft, "written": [], "dry_run": dry_run}
    if dry_run:
        return result
    paths = registry_paths(repo_root)
    if not paths["index"].exists():
        scaffold_registry(repo_root, domain=domain, force=False)
    index = load_index_at(paths["index"]) if paths["index"].exists() else {
        "version": 1,
        "domain": domain,
        "capabilities": [],
    }
    existing_ids = {row["id"] for row in index.get("capabilities", [])}
    for item in draft.get("capabilities", []):
        cap_id = item["id"]
        if cap_id in existing_ids:
            continue
        filename = cap_id.replace(".", "-") + ".md"
        rel_path = f"registry/capabilities/{filename}"
        entry_path = repo_root / rel_path
        entry_body = _render_entry_from_draft(item, domain)
        entry_path.parent.mkdir(parents=True, exist_ok=True)
        entry_path.write_text(entry_body, encoding="utf-8")
        vector = item.get("vector", "D2 / A0 / C0 / R0")
        index.setdefault("capabilities", []).append(
            {
                "id": cap_id,
                "name": item["name"],
                "summary": item["summary"],
                "vector": vector,
                "domain": domain,
                "status": "draft",
                "owner": item.get("owner", repo_root.name),
                "path": rel_path,
                "tags": item.get("tags", []),
                "consumption_modes": item.get("consumption_modes", ["informational"]),
            }
        )
        result["written"].append(rel_path)
    index["updated"] = date.today().isoformat()
    index["domain"] = draft.get("domain", domain)
    paths["index"].write_text(
        yaml.safe_dump(index, sort_keys=False, allow_unicode=True),
        encoding="utf-8",
    )
    result["written"].append(str(paths["index"].relative_to(repo_root)))
    return result
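The apply path maps each new capability ID to an entry file by replacing dots with hyphens and skips IDs that are already indexed. A standalone sketch of that planning step, under the draft shape shown in this diff; `plan_entries` is a hypothetical name. Unlike the loop in the diff, it also remembers IDs seen within the same draft, which is an added assumption about desired behavior:

```python
from typing import Any


def plan_entries(draft: dict[str, Any], existing_ids: set[str]) -> list[tuple[str, str]]:
    """Return (capability id, relative entry path) pairs for IDs not yet indexed."""
    seen = set(existing_ids)
    planned: list[tuple[str, str]] = []
    for item in draft.get("capabilities", []):
        cap_id = item["id"]
        if cap_id in seen:
            continue
        # Deviation from the diff: track IDs as we go so a repeated ID in one draft is skipped.
        seen.add(cap_id)
        filename = cap_id.replace(".", "-") + ".md"
        planned.append((cap_id, f"registry/capabilities/{filename}"))
    return planned


draft = {
    "capabilities": [
        {"id": "capability.registry.stats"},
        {"id": "capability.registry.update"},
        {"id": "capability.registry.update"},
    ]
}
planned = plan_entries(draft, {"capability.registry.stats"})
print(planned)
```

Computing the plan before touching the filesystem also makes a dry run able to show which files an `--apply` would write.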
 def _render_entry_from_draft(item: dict[str, Any], domain: str) -> str:
    vector = item.get("vector", "D2 / A0 / C0 / R0")
    d, a, c, r = [part.strip() for part in vector.split("/")]
    front_matter = {
        "id": item["id"],
        "name": item["name"],
        "summary": item["summary"],
        "owner": item.get("owner", domain),
        "status": "draft",
        "domain": domain,
        "tags": item.get("tags") or ["draft"],
        "maturity": {
            "discovery": {
                "current": d,
                "target": "D5",
                "confidence": "low",
                "rationale": "Auto-drafted by reuse-surface establish --discover; review required.",
            },
            "availability": {
                "current": a,
                "target": "A3",
                "confidence": "low",
                "rationale": "Auto-drafted; confirm consumption modes and artifacts.",
            },
        },
        "external_evidence": {
            "completeness": {
                "level": c,
                "confidence": "low",
                "basis": "scope_vs_intent_and_consumer_expectations",
                "satisfied_expectations": [],
                "broken_expectations": [],
                "out_of_scope_expectations": [],
            },
            "reliability": {
                "level": r,
                "confidence": "low",
                "basis": "consumer_quality_signals",
                "known_reliability_risks": ["auto-drafted entry without consumer evidence"],
            },
        },
        "discovery": {
            "intent": item.get("discovery_intent", item["summary"]),
            "includes": item.get("discovery_includes") or [],
            "excludes": item.get("discovery_excludes") or [],
            "assumptions": [],
            "use_cases": [],
            "research_memos": [],
        },
        "availability": {
            "current_level": a,
            "target_level": "A3",
            "current_artifacts": [],
            "target_artifacts": [],
            "consumption_modes": item.get("consumption_modes") or ["informational"],
        },
        "relations": {"depends_on": [], "supports": [], "related_to": []},
        "evidence": {
            "documentation": [],
            "tests": [],
            "consumer_feedback": [],
            "bug_reports": [],
            "incidents": [],
        },
        "consumer_guidance": {
            "recommended_for": ["planning reuse after human review"],
            "not_recommended_for": ["implementation reuse before validation"],
            "known_limitations": ["discover draft — verify maturity claims"],
        },
        "promotion_history": [],
    }
    markdown = (
        f"# {item['name']}\n\n"
        "Auto-drafted capability entry. Review maturity, evidence, and relations "
        "before promoting.\n"
    )
    return (
        "---\n"
        + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True)
        + "---\n\n"
        + markdown
    )
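`_render_entry_from_draft` unpacks the vector by splitting on `/`, so a draft with a missing or extra component fails with a generic unpacking `ValueError`. A stricter parse could give a clearer message. The sketch below assumes single-digit levels in the `D / A / C / R` order used throughout this diff; `split_vector` is a hypothetical helper, separate from the `parse_vector` imported in `cli.py`:

```python
import re

# Assumption: each level is one letter plus one digit, e.g. "D2 / A0 / C0 / R0".
VECTOR = re.compile(r"^D(\d)\s*/\s*A(\d)\s*/\s*C(\d)\s*/\s*R(\d)$")


def split_vector(vector: str) -> dict[str, str]:
    """Split a maturity vector into its four named levels or raise ValueError."""
    match = VECTOR.fullmatch(vector.strip())
    if match is None:
        raise ValueError(f"malformed maturity vector: {vector!r}")
    d, a, c, r = match.groups()
    return {
        "discovery": f"D{d}",
        "availability": f"A{a}",
        "completeness": f"C{c}",
        "reliability": f"R{r}",
    }


levels = split_vector("D2 / A0 / C0 / R0")
print(levels)
```

Because it still raises `ValueError`, the existing handler in `cmd_establish` would report it as a normal CLI error.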
 def format_publish_check_markdown(result: dict[str, Any]) -> str:
    lines = ["# Federation publish check", ""]
    lines.append(f"**Repo:** `{result['repo_root']}`")
    lines.append(f"**Result:** {'PASS' if result['ok'] else 'FAIL'}")
    lines.append("")
    for check in result["checks"]:
        status = "ok" if check["ok"] else "FAIL"
        detail = check.get("detail", "")
        name = check["name"]
        lines.append(f"- **{name}**: {status} — {detail}")
        if check.get("url"):
            lines.append(f"  `{check['url']}`")
    if result.get("remediation"):
        lines.append("")
        lines.append(f"**Remediation:** {result['remediation']}")
    return "\n".join(lines) + "\n"
--- a/reuse_surface/llm_bridge.py
+++ b/reuse_surface/llm_bridge.py
@@ -0,0 +1,102 @@
 from __future__ import annotations
 import json
 import os
 import re
 import urllib.error
 import urllib.request
 from pathlib import Path
 from typing import Any
 from jsonschema import Draft202012Validator
 from reuse_surface.registry import ROOT
 DRAFT_SCHEMA_PATH = ROOT / "schemas" / "registry-draft.schema.json"
 def llm_connect_url(explicit: str | None = None) -> str:
    base = (explicit or os.environ.get("LLM_CONNECT_URL", "")).rstrip("/")
    if not base:
        raise ValueError(
            "LLM backend not configured; set LLM_CONNECT_URL or pass --llm-url"
        )
    return base
 def load_draft_schema() -> dict[str, Any]:
    return json.loads(DRAFT_SCHEMA_PATH.read_text(encoding="utf-8"))
 def execute_prompt(
    prompt: str,
    *,
    base_url: str | None = None,
    config: dict[str, Any] | None = None,
 ) -> str:
    url = f"{llm_connect_url(base_url)}/execute"
    body: dict[str, Any] = {"prompt": prompt}
    if config:
        body["config"] = config
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": "reuse-surface/0.1",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=120) as response:
            payload = json.loads(response.read().decode("utf-8"))
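    except urllib.error.HTTPError as exc:
        raw = exc.read().decode("utf-8")
        raise ValueError(f"llm-connect returned {exc.code}: {raw}") from exc
    except urllib.error.URLError as exc:
        # Connection-level failures (DNS, refused, timeout) carry no HTTP status.
        raise ValueError(f"llm-connect unreachable: {exc.reason}") from exc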
    content = payload.get("content")
    if not isinstance(content, str) or not content.strip():
        raise ValueError("llm-connect response missing content")
    return content
 def extract_json_object(text: str) -> dict[str, Any]:
    stripped = text.strip()
    if stripped.startswith("```"):
        stripped = re.sub(r"^```(?:json)?\s*", "", stripped)
        stripped = re.sub(r"\s*```$", "", stripped)
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", stripped, re.DOTALL)
        if not match:
            raise ValueError("llm response did not contain JSON object") from None
        data = json.loads(match.group(0))
    if not isinstance(data, dict):
        raise ValueError("llm response JSON must be an object")
    return data
 def request_registry_draft(
    prompt: str,
    *,
    base_url: str | None = None,
    config: dict[str, Any] | None = None,
 ) -> dict[str, Any]:
    draft = extract_json_object(execute_prompt(prompt, base_url=base_url, config=config))
    validator = Draft202012Validator(load_draft_schema())
    errors = sorted(validator.iter_errors(draft), key=lambda err: list(err.path))
    if errors:
        messages = "; ".join(error.message for error in errors[:3])
        raise ValueError(f"draft schema validation failed: {messages}")
    return draft
 def request_json_object(
    prompt: str,
    *,
    base_url: str | None = None,
    config: dict[str, Any] | None = None,
 ) -> dict[str, Any]:
    return extract_json_object(execute_prompt(prompt, base_url=base_url, config=config))
--- a/reuse_surface/registry.py
+++ b/reuse_surface/registry.py
@@ -61,3 +61,30 @@ def parse_vector(vector: str) -> dict[str, str]:
 def level_at_least(dimension: str, current: str, minimum: str) -> bool:
    order = LEVEL_ORDERS[dimension]
    return order.index(current) >= order.index(minimum)
 def registry_paths(repo_root: Path) -> dict[str, Path]:
    registry = repo_root / "registry"
    return {
        "registry": registry,
        "capabilities": registry / "capabilities",
        "index": registry / "indexes" / "capabilities.yaml",
        "sources": registry / "federation" / "sources.yaml",
    }
 def load_index_at(path: Path) -> dict[str, Any]:
    with path.open(encoding="utf-8") as handle:
        return yaml.safe_load(handle)
 def entry_vector(front_matter: dict[str, Any]) -> str:
    discovery = front_matter["maturity"]["discovery"]["current"]
    availability = front_matter["maturity"]["availability"]["current"]
    completeness = front_matter["external_evidence"]["completeness"]["level"]
    reliability = front_matter["external_evidence"]["reliability"]["level"]
    return f"{discovery} / {availability} / {completeness} / {reliability}"
 def vectors_match(index_vector: str, front_matter: dict[str, Any]) -> bool:
    return index_vector.replace(" ", "") == entry_vector(front_matter).replace(" ", "")
--- a/reuse_surface/registry_update.py
+++ b/reuse_surface/registry_update.py
@@ -0,0 +1,273 @@
 from __future__ import annotations
 import json
 import subprocess
 import textwrap
 from pathlib import Path
 from typing import Any
 import yaml
 from reuse_surface.llm_bridge import request_json_object
 from reuse_surface.registry import (
    entry_vector,
    load_index_at,
    parse_front_matter,
    registry_paths,
    vectors_match,
 )
 SAFE_EVIDENCE_PREFIXES = ("tests/", ".gitea/workflows/")
 def git_changed_files(repo_root: Path, since_ref: str) -> list[str]:
    result = subprocess.run(
        ["git", "-C", str(repo_root), "diff", "--name-only", since_ref, "HEAD"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise ValueError(result.stderr.strip() or f"git diff failed for {since_ref}")
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
 def collect_deterministic_suggestions(
    repo_root: Path,
    *,
    capability_id: str | None = None,
    git_since: str | None = None,
 ) -> list[dict[str, Any]]:
    paths = registry_paths(repo_root)
    if not paths["index"].exists():
        raise ValueError("registry index missing; run establish --scaffold first")
    index = load_index_at(paths["index"])
    rows = index.get("capabilities", [])
    if capability_id:
        rows = [row for row in rows if row["id"] == capability_id]
        if not rows:
            raise ValueError(f"capability not in index: {capability_id}")
    changed_files = git_changed_files(repo_root, git_since) if git_since else []
    suggestions: list[dict[str, Any]] = []
    for row in rows:
        entry_path = repo_root / row["path"]
        if not entry_path.exists():
            suggestions.append(
                {
                    "capability_id": row["id"],
                    "kind": "missing_entry",
                    "detail": f"missing file {row['path']}",
                }
            )
            continue
        front_matter = parse_front_matter(entry_path)
        if not vectors_match(row["vector"], front_matter):
            suggestions.append(
                {
                    "capability_id": row["id"],
                    "kind": "vector_drift",
                    "detail": "index vector differs from entry front matter",
                    "index_vector": row["vector"],
                    "entry_vector": entry_vector(front_matter),
                    "apply_patch": {
                        "field": "index.vector",
                        "value": entry_vector(front_matter),
                    },
                }
            )
        evidence_tests = front_matter.get("evidence", {}).get("tests", [])
        for changed in changed_files:
            if changed.startswith("tests/") and changed not in evidence_tests:
                suggestions.append(
                    {
                        "capability_id": row["id"],
                        "kind": "evidence_test",
                        "detail": f"new test file not cited: {changed}",
                        "apply_patch": {
                            "field": "evidence.tests",
                            "append": changed,
                        },
                    }
                )
        artifacts = front_matter.get("availability", {}).get("current_artifacts", [])
        # Top-level Python packages: directories that contain an __init__.py.
        package_prefixes = tuple(
            p.name + "/"
            for p in repo_root.iterdir()
            if p.is_dir() and (p / "__init__.py").exists()
        )
        for changed in changed_files:
            in_package = changed.endswith(".py") and changed.startswith(package_prefixes)
            if in_package and changed not in artifacts:
                suggestions.append(
                    {
                        "capability_id": row["id"],
                        "kind": "availability_artifact",
                        "detail": f"changed module not cited: {changed}",
                        "apply_patch": {
                            "field": "availability.current_artifacts",
                            "append": changed,
                        },
                    }
                )
    return suggestions
 def apply_deterministic_suggestions(
    repo_root: Path,
    suggestions: list[dict[str, Any]],
 ) -> list[str]:
    paths = registry_paths(repo_root)
    index = load_index_at(paths["index"])
    index_by_id = {row["id"]: row for row in index.get("capabilities", [])}
    changed: list[str] = []
    entry_cache: dict[str, dict[str, Any]] = {}
    entry_paths: dict[str, Path] = {}
    for suggestion in suggestions:
        patch = suggestion.get("apply_patch")
        if not patch:
            continue
        cap_id = suggestion["capability_id"]
        if patch["field"] == "index.vector" and cap_id in index_by_id:
            index_by_id[cap_id]["vector"] = patch["value"]
            changed.append(f"index vector for {cap_id}")
            # Index-only patch: do not load (and later re-serialize) the entry file.
            continue
        row = index_by_id.get(cap_id)
        if not row:
            continue
        entry_path = repo_root / row["path"]
        if cap_id not in entry_cache:
            entry_cache[cap_id] = parse_front_matter(entry_path)
            entry_paths[cap_id] = entry_path
        front_matter = entry_cache[cap_id]
        if patch["field"] == "evidence.tests":
            tests = front_matter.setdefault("evidence", {}).setdefault("tests", [])
            if patch["append"] not in tests:
                tests.append(patch["append"])
                changed.append(f"{cap_id} evidence.tests += {patch['append']}")
        if patch["field"] == "availability.current_artifacts":
            artifacts = front_matter.setdefault("availability", {}).setdefault(
                "current_artifacts", []
            )
            if patch["append"] not in artifacts:
                artifacts.append(patch["append"])
                changed.append(
                    f"{cap_id} availability.current_artifacts += {patch['append']}"
                )
    if changed:
        paths["index"].write_text(
            yaml.safe_dump(index, sort_keys=False, allow_unicode=True),
            encoding="utf-8",
        )
        for cap_id, front_matter in entry_cache.items():
            _write_front_matter(entry_paths[cap_id], front_matter)
    return changed
 def _write_front_matter(path: Path, front_matter: dict[str, Any]) -> None:
    text = path.read_text(encoding="utf-8")
    marker_end = text.find("\n---", 4)
    body = text[marker_end + 4 :] if marker_end != -1 else "\n"
    path.write_text(
        "---\n"
        + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True)
        + "---"
        + body,
        encoding="utf-8",
    )
 def build_update_prompt(
    repo_root: Path,
    capability_id: str,
    *,
    git_since: str | None = None,
 ) -> str:
    paths = registry_paths(repo_root)
    index = load_index_at(paths["index"])
    row = next((item for item in index["capabilities"] if item["id"] == capability_id), None)
    if not row:
        raise ValueError(f"capability not in index: {capability_id}")
    entry = parse_front_matter(repo_root / row["path"])
    diff = ""
    if git_since:
        proc = subprocess.run(
            [
                "git",
                "-C",
                str(repo_root),
                "diff",
                git_since,
                "HEAD",
                "--",
                "registry/",
                "reuse_surface/",
                "tests/",
            ],
            capture_output=True,
            text=True,
            check=False,
        )
        diff = proc.stdout[:12000]
    header = textwrap.dedent(
        f"""
        Suggest registry entry updates for capability `{capability_id}`.
        Return ONLY JSON:
        {{
          "promotion_history": [
            {{"date": "YYYY-MM-DD", "dimension": "availability", "from": "A3", "to": "A4", "rationale": "..."}}
          ],
          "consumer_feedback": ["optional string notes"],
          "notes": ["human review items"]
        }}
        """
    ).strip()
    # Multi-line values are appended after dedenting: interpolating them inside
    # the indented template leaves no common margin for textwrap.dedent to strip.
    return (
        f"{header}\n"
        f"Current entry YAML:\n{yaml.safe_dump(entry, sort_keys=False)}\n"
        f"Git diff since {git_since or 'N/A'}:\n{diff or '(none)'}"
    ).strip()
 def suggest_llm_updates(
    repo_root: Path,
    capability_id: str,
    *,
    git_since: str | None = None,
    llm_url: str | None = None,
 ) -> dict[str, Any]:
    prompt = build_update_prompt(repo_root, capability_id, git_since=git_since)
    return request_json_object(
        prompt,
        base_url=llm_url,
        config={"temperature": 0.2, "max_tokens": 2000},
    )
 def format_suggestions_markdown(suggestions: list[dict[str, Any]]) -> str:
    if not suggestions:
        return "# Registry update suggestions\n\n_No suggestions._\n"
    lines = ["# Registry update suggestions", ""]
    for item in suggestions:
        lines.append(f"- `{item['capability_id']}` **{item['kind']}**: {item['detail']}")
    lines.append("")
    lines.append(f"**{len(suggestions)}** suggestion(s). Use `--apply` to apply safe patches.")
    return "\n".join(lines) + "\n"
 def format_suggestions_json(suggestions: list[dict[str, Any]]) -> str:
    return json.dumps({"count": len(suggestions), "suggestions": suggestions}, indent=2)
--- a/reuse_surface/stats.py
+++ b/reuse_surface/stats.py
@@ -0,0 +1,259 @@
 from __future__ import annotations
 import json
 import urllib.error
 import urllib.request
 from collections import Counter
 from pathlib import Path
 from typing import Any
 import yaml
 from reuse_surface import hub_client
 from reuse_surface.registry import (
    LEVEL_ORDERS,
    entry_vector,
    load_index_at,
    parse_front_matter,
    parse_vector,
    registry_paths,
    vectors_match,
 )
 def _histogram(values: list[str], order: list[str]) -> dict[str, int]:
    counts = Counter(values)
    return {level: counts.get(level, 0) for level in order if counts.get(level, 0)}
 def _probe_url(url: str) -> dict[str, Any]:
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "reuse-surface/0.1"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return {
                "url": url,
                "status": response.status,
                "content_type": response.headers.get("Content-Type", ""),
                "ok": response.status == 200,
            }
    except urllib.error.HTTPError as exc:
        return {
            "url": url,
            "status": exc.code,
            "content_type": exc.headers.get("Content-Type", ""),
            "ok": False,
        }
    except urllib.error.URLError as exc:
        return {"url": url, "status": None, "error": str(exc.reason), "ok": False}
 def collect_stats(
    repo_root: Path,
    *,
    federation_ready: bool = False,
    raw_url: str | None = None,
    hub_url: str | None = None,
 ) -> dict[str, Any]:
    paths = registry_paths(repo_root)
    stats: dict[str, Any] = {
        "repo_root": str(repo_root),
        "registry_present": paths["registry"].exists(),
        "index_present": paths["index"].exists(),
        "sources_present": paths["sources"].exists(),
        "capability_count": 0,
        "histograms": {},
        "reliability": {"r0_r2": 0, "r3_plus": 0},
        "consumption_modes": {},
        "vector_drift": [],
        "federation": {},
        "hub": {},
    }
    if not paths["index"].exists():
        if federation_ready and raw_url:
            stats["federation"]["raw_url_probe"] = _probe_url(raw_url)
        if hub_url or _hub_configured():
            stats["hub"] = _hub_summary(hub_url)
        return stats
    index = load_index_at(paths["index"])
    capabilities = index.get("capabilities", [])
    stats["capability_count"] = len(capabilities)
    stats["domain"] = index.get("domain")
    discovery: list[str] = []
    availability: list[str] = []
    completeness: list[str] = []
    reliability: list[str] = []
    mode_counts: Counter[str] = Counter()
    for row in capabilities:
        vector = parse_vector(row["vector"])
        discovery.append(vector["discovery"])
        availability.append(vector["availability"])
        completeness.append(vector["completeness"])
        reliability.append(vector["reliability"])
        for mode in row.get("consumption_modes", []):
            mode_counts[mode] += 1
        entry_path = repo_root / row["path"]
        if entry_path.exists():
            try:
                front_matter = parse_front_matter(entry_path)
                if not vectors_match(row["vector"], front_matter):
                    stats["vector_drift"].append(
                        {
                            "id": row["id"],
                            "index_vector": row["vector"],
                            "entry_vector": entry_vector(front_matter),
                        }
                    )
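            # entry_vector indexes nested keys, so incomplete front matter
            # raises KeyError/TypeError rather than ValueError.
            except (KeyError, TypeError, ValueError):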
                stats["vector_drift"].append(
                    {"id": row["id"], "error": "invalid entry front matter"}
                )
    stats["histograms"] = {
        "discovery": _histogram(discovery, LEVEL_ORDERS["discovery"]),
        "availability": _histogram(availability, LEVEL_ORDERS["availability"]),
        "completeness": _histogram(completeness, LEVEL_ORDERS["completeness"]),
        "reliability": _histogram(reliability, LEVEL_ORDERS["reliability"]),
    }
    stats["reliability"] = {
        "r0_r2": sum(1 for level in reliability if level in {"R0", "R1", "R2"}),
        "r3_plus": sum(1 for level in reliability if level_at_least_reliability(level, "R3")),
    }
    stats["consumption_modes"] = dict(sorted(mode_counts.items()))
    if federation_ready:
        probe_url = raw_url
        if not probe_url and paths["index"].exists():
            probe_url = _default_raw_url(repo_root)
        if probe_url:
            stats["federation"]["raw_url_probe"] = _probe_url(probe_url)
        stats["federation"]["index_valid_yaml"] = _index_yaml_valid(paths["index"])
    stats["hub"] = _hub_summary(hub_url)
    return stats
 def level_at_least_reliability(current: str, minimum: str) -> bool:
    order = LEVEL_ORDERS["reliability"]
    return order.index(current) >= order.index(minimum)
 def _hub_configured() -> bool:
    import os
    return bool(os.environ.get("REUSE_SURFACE_URL"))
 def _hub_summary(hub_url: str | None) -> dict[str, Any]:
    try:
        status, payload = hub_client.hub_list(hub_url)
    except (ValueError, urllib.error.URLError, OSError):
        return {"configured": False}
    if status != 200:
        return {"configured": True, "status": status, "error": payload}
    repos = payload.get("repos", [])
    return {
        "configured": True,
        "registration_count": payload.get("count", len(repos)),
        "enabled_count": sum(1 for repo in repos if repo.get("enabled", True)),
    }
 def _default_raw_url(repo_root: Path) -> str | None:
    return None
 def _index_yaml_valid(index_path: Path) -> bool:
    try:
        data = load_index_at(index_path)
        return isinstance(data, dict) and "capabilities" in data
    except (OSError, yaml.YAMLError):
        return False
 def format_stats_markdown(stats: dict[str, Any]) -> str:
    lines = ["# Registry stats", ""]
    lines.append(f"**Repo:** `{stats['repo_root']}`")
    lines.append(f"**Capabilities:** {stats['capability_count']}")
    if stats.get("domain"):
        lines.append(f"**Domain:** `{stats['domain']}`")
    lines.append("")
    lines.append("## Layout")
    lines.append(f"- registry present: `{stats['registry_present']}`")
    lines.append(f"- index present: `{stats['index_present']}`")
    lines.append(f"- federation sources present: `{stats['sources_present']}`")
    lines.append("")
    rel = stats["reliability"]
    lines.append("## Reliability bands (index vectors)")
    lines.append(f"- R0–R2: **{rel['r0_r2']}**")
    lines.append(f"- R3+: **{rel['r3_plus']}**")
    lines.append("")
    for dimension, histogram in stats.get("histograms", {}).items():
        if not histogram:
            continue
        lines.append(f"## {dimension.title()} histogram")
        for level, count in histogram.items():
            lines.append(f"- `{level}`: {count}")
        lines.append("")
    if stats.get("consumption_modes"):
        lines.append("## Consumption modes")
        for mode, count in stats["consumption_modes"].items():
            lines.append(f"- `{mode}`: {count}")
        lines.append("")
    drift = stats.get("vector_drift", [])
    lines.append(f"## Vector drift: **{len(drift)}**")
    for item in drift[:10]:
        if "error" in item:
            lines.append(f"- `{item['id']}`: {item['error']}")
        else:
            lines.append(
                f"- `{item['id']}`: index `{item['index_vector']}` "
                f"≠ entry `{item['entry_vector']}`"
            )
    if len(drift) > 10:
        lines.append(f"- … and {len(drift) - 10} more")
    lines.append("")
    federation = stats.get("federation", {})
    if federation:
        lines.append("## Federation readiness")
        if "index_valid_yaml" in federation:
            lines.append(f"- index valid YAML: `{federation['index_valid_yaml']}`")
        probe = federation.get("raw_url_probe")
        if probe:
            status = probe.get("status")
            ok = probe.get("ok")
            lines.append(f"- raw URL probe: status **{status}** ({'ok' if ok else 'fail'})")
            lines.append(f"  `{probe.get('url', '')}`")
        lines.append("")
    hub = stats.get("hub", {})
    if hub.get("configured"):
        lines.append("## Hub")
        if "registration_count" in hub:
            lines.append(
                f"- registrations: **{hub['registration_count']}** "
                f"({hub.get('enabled_count', 0)} enabled)"
            )
        elif "error" in hub:
            lines.append(f"- hub error: {hub['error']}")
        lines.append("")
    return "\n".join(lines) + "\n"
 def format_stats_json(stats: dict[str, Any]) -> str:
    return json.dumps(stats, indent=2, sort_keys=True)
--- a/schemas/registry-draft.schema.json
+++ b/schemas/registry-draft.schema.json
@@ -0,0 +1,69 @@
 {
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://reuse-surface.local/schemas/registry-draft.schema.json",
  "title": "RegistryDiscoveryDraft",
  "type": "object",
  "additionalProperties": false,
  "required": ["capabilities"],
  "properties": {
    "domain": {
      "type": "string"
    },
    "capabilities": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["id", "name", "summary"],
        "properties": {
          "id": {
            "type": "string",
            "pattern": "^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$"
          },
          "name": {
            "type": "string",
            "minLength": 1
          },
          "summary": {
            "type": "string",
            "minLength": 1
          },
          "owner": {
            "type": "string"
          },
          "vector": {
            "type": "string",
            "pattern": "^D[0-7] / A[0-7] / C[0-6] / R[0-6]$"
          },
          "tags": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "consumption_modes": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "discovery_intent": {
            "type": "string"
          },
          "discovery_includes": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "discovery_excludes": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        }
      }
    }
  }
 }
--- a/tests/test_establish.py
+++ b/tests/test_establish.py
@@ -0,0 +1,77 @@
 from __future__ import annotations
 from pathlib import Path
 from unittest.mock import patch
 import yaml
 from reuse_surface.establish import (
    discover_capabilities,
    publish_check,
    scaffold_registry,
 )
 from reuse_surface.registry import registry_paths
 def test_scaffold_creates_layout(tmp_path: Path):
    created = scaffold_registry(tmp_path, domain="helix_forge")
    paths = registry_paths(tmp_path)
    assert paths["index"] in created
    data = yaml.safe_load(paths["index"].read_text(encoding="utf-8"))
    assert data["capabilities"] == []
    assert data["domain"] == "helix_forge"
 def test_scaffold_refuses_existing_without_force(tmp_path: Path):
    scaffold_registry(tmp_path)
    try:
        scaffold_registry(tmp_path)
        raise AssertionError("expected ValueError")
    except ValueError as exc:
        assert "already exists" in str(exc)
 def test_publish_check_local_index(tmp_path: Path):
    scaffold_registry(tmp_path)
    result = publish_check(tmp_path)
    assert result["ok"] is True
    assert any(check["name"] == "local_index_yaml" for check in result["checks"])
 def test_publish_check_raw_url_fail(tmp_path: Path):
    with patch(
        "reuse_surface.establish._probe_raw_url",
        return_value={"ok": False, "status": 303, "content_type": "text/html"},
    ):
        result = publish_check(
            tmp_path,
            raw_url="https://example.com/capabilities.yaml",
        )
    assert result["ok"] is False
    assert result.get("remediation")
 def test_discover_dry_run_mock_llm(tmp_path: Path):
    scaffold_registry(tmp_path)
    (tmp_path / "README.md").write_text("# Demo service\n", encoding="utf-8")
    draft = {
        "domain": "helix_forge",
        "capabilities": [
            {
                "id": "capability.demo.sample",
                "name": "Sample",
                "summary": "Sample capability.",
                "owner": "demo",
                "vector": "D2 / A0 / C0 / R0",
                "tags": ["demo"],
                "consumption_modes": ["informational"],
                "discovery_intent": "Enable demo planning.",
            }
        ],
    }
    with patch(
        "reuse_surface.establish.request_registry_draft",
        return_value=draft,
    ):
        result = discover_capabilities(tmp_path, dry_run=True, apply=False)
    assert result["draft"]["capabilities"][0]["id"] == "capability.demo.sample"
--- a/tests/test_llm_bridge.py
+++ b/tests/test_llm_bridge.py
@@ -0,0 +1,53 @@
 from __future__ import annotations
 import json
 from unittest.mock import patch
 import pytest
 from reuse_surface.llm_bridge import (
    extract_json_object,
    llm_connect_url,
    request_registry_draft,
 )
 def test_extract_json_object_from_fenced_block():
    data = extract_json_object('```json\n{"capabilities": []}\n```')
    assert data == {"capabilities": []}
 def test_llm_connect_url_missing_raises():
    with pytest.raises(ValueError, match="LLM_CONNECT_URL"):
        llm_connect_url(None)
 def test_request_registry_draft_mock_http():
    payload = {
        "content": json.dumps(
            {
                "capabilities": [
                    {
                        "id": "capability.demo.sample",
                        "name": "Sample",
                        "summary": "Demo capability",
                    }
                ]
            }
        )
    }
    class FakeResponse:
        def __enter__(self):
            return self
        def __exit__(self, *args):
            return False
        def read(self):
            return json.dumps(payload).encode("utf-8")
    with patch.dict("os.environ", {"LLM_CONNECT_URL": "http://llm.test"}):
        with patch("urllib.request.urlopen", return_value=FakeResponse()):
            draft = request_registry_draft("test prompt")
    assert draft["capabilities"][0]["id"] == "capability.demo.sample"
--- a/tests/test_registry_update.py
+++ b/tests/test_registry_update.py
@@ -0,0 +1,87 @@
 from __future__ import annotations
 from pathlib import Path
 import yaml
 from reuse_surface.establish import scaffold_registry
 from reuse_surface.registry import load_index_at, registry_paths
 from reuse_surface.registry_update import (
    apply_deterministic_suggestions,
    collect_deterministic_suggestions,
 )
 def _write_minimal_entry(tmp_path: Path, cap_id: str, vector: str) -> str:
    rel = "registry/capabilities/capability-demo-sample.md"
    d, a, c, r = [part.strip() for part in vector.split("/")]
    front_matter = {
        "id": cap_id,
        "name": "Sample",
        "summary": "Sample",
        "owner": "demo",
        "status": "draft",
        "domain": "helix_forge",
        "tags": ["demo"],
        "maturity": {
            "discovery": {"current": d, "target": "D5", "confidence": "low"},
            "availability": {"current": a, "target": "A3", "confidence": "low"},
        },
        "external_evidence": {
            "completeness": {"level": c, "confidence": "low"},
            "reliability": {"level": r, "confidence": "low"},
        },
        "discovery": {"intent": "demo", "includes": [], "excludes": []},
        "availability": {
            "current_level": a,
            "target_level": "A3",
            "current_artifacts": [],
            "consumption_modes": ["informational"],
        },
        "relations": {"depends_on": [], "supports": [], "related_to": []},
        "evidence": {"documentation": [], "tests": []},
        "consumer_guidance": {
            "recommended_for": [],
            "not_recommended_for": [],
            "known_limitations": [],
        },
    }
    path = tmp_path / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        "---\n"
        + yaml.safe_dump(front_matter, sort_keys=False)
        + "---\n",
        encoding="utf-8",
    )
    return rel
 def test_vector_drift_suggestion(tmp_path: Path):
    scaffold_registry(tmp_path)
    cap_id = "capability.demo.sample"
    rel = _write_minimal_entry(tmp_path, cap_id, "D3 / A0 / C0 / R0")
    index_path = registry_paths(tmp_path)["index"]
    index = load_index_at(index_path)
    index["capabilities"] = [
        {
            "id": cap_id,
            "name": "Sample",
            "summary": "Sample",
            "vector": "D2 / A0 / C0 / R0",
            "domain": "helix_forge",
            "status": "draft",
            "owner": "demo",
            "path": rel,
            "tags": ["demo"],
            "consumption_modes": ["informational"],
        }
    ]
    index_path.write_text(yaml.safe_dump(index, sort_keys=False), encoding="utf-8")
    suggestions = collect_deterministic_suggestions(tmp_path, capability_id=cap_id)
    assert any(item["kind"] == "vector_drift" for item in suggestions)
    changed = apply_deterministic_suggestions(tmp_path, suggestions)
    assert changed
    updated = load_index_at(index_path)
    assert updated["capabilities"][0]["vector"] == "D3 / A0 / C0 / R0"
--- a/tests/test_stats.py
+++ b/tests/test_stats.py
@@ -0,0 +1,20 @@
 from __future__ import annotations
 from pathlib import Path
 from reuse_surface.stats import collect_stats, format_stats_markdown
 def test_collect_stats_on_repo_root():
    root = Path(__file__).resolve().parent.parent
    stats = collect_stats(root)
    assert stats["capability_count"] == 20
    assert stats["index_present"] is True
    assert "discovery" in stats["histograms"]
 def test_format_stats_markdown_contains_count():
    root = Path(__file__).resolve().parent.parent
    text = format_stats_markdown(collect_stats(root))
    assert "Capabilities:" in text
    assert "20" in text
--- a/tools/README.md
+++ b/tools/README.md
@@ -104,6 +104,45 @@ reuse-surface hub sync --dry-run
 Run the service locally: `REUSE_SURFACE_TOKEN=dev-token reuse-surface serve`
 ### stats
 Registry maturity aggregates and federation readiness.
 ```bash
 reuse-surface stats
 reuse-surface stats --format json
 reuse-surface stats --federation-ready --raw-url https://.../capabilities.yaml
 ```
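As a rough illustration of the kind of aggregate `stats` reports, the sketch below builds per-axis histograms from maturity vectors written as `D3 / A0 / C0 / R0`, the form used in this change's test fixtures. It is not the `collect_stats` implementation: the function name `vector_histograms` and the choice to key histograms by axis letter are assumptions made for this example.

```python
from collections import Counter


def vector_histograms(vectors):
    """Count maturity levels per axis letter (hypothetical helper)."""
    histograms = {}
    for vector in vectors:
        # A vector looks like "D3 / A0 / C0 / R0": split on "/" and trim.
        for level in (part.strip() for part in vector.split("/")):
            if not level:
                continue
            axis = level[0]  # assumption: the leading letter names the axis
            histograms.setdefault(axis, Counter())[level] += 1
    # Convert Counters to plain dicts so the result serializes cleanly.
    return {axis: dict(counts) for axis, counts in histograms.items()}


result = vector_histograms(["D3 / A0 / C0 / R0", "D2 / A0 / C0 / R0"])
print(result["D"])
```

Keying by axis letter keeps the sketch independent of how the real module names its histograms (the tests only confirm that a `discovery` histogram exists).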
 ### establish
 Bootstrap or discover a capability registry in the current or target repo.
 ```bash
 reuse-surface establish --scaffold --domain helix_forge
 reuse-surface establish --scaffold --path ../state-hub
 reuse-surface establish --publish-check --raw-url https://.../capabilities.yaml
 export LLM_CONNECT_URL=http://127.0.0.1:8088
 reuse-surface establish --discover --dry-run
 reuse-surface establish --discover --apply
 ```
 `--scaffold` creates the `registry/` layout. `--publish-check` probes the raw URL
 and the local index YAML. `--discover` drafts capabilities via the optional
 llm-connect backend.
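The workplan's acceptance criteria mention distinguishing 303 from 200 raw URL outcomes. One way to make that distinction with the standard library is sketched below. It assumes nothing about how `establish --publish-check` is actually implemented: `probe_raw_url`, `classify_status`, and the `ok` / `redirect` / `error` labels are hypothetical names for this example.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError with the 3xx code.
        return None


def classify_status(status):
    """Map an HTTP status code to a coarse outcome label (hypothetical)."""
    if status == 200:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    return "error"


def probe_raw_url(url, timeout=10.0):
    """Fetch url without following redirects; return (status, label)."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses, including unfollowed redirects, land here.
        status = exc.code
    return status, classify_status(status)
```

Network failures such as DNS errors raise `urllib.error.URLError` and are left to the caller in this sketch.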
 ### update
 Refresh registry metadata from repo drift signals.
 ```bash
 reuse-surface update --capability capability.registry.register --dry-run
 reuse-surface update --all --from-git-since HEAD~5 --apply
 reuse-surface update --capability capability.registry.register --suggest-maturity
 ```
 Deterministic patches (`vector_drift`, new `tests/` citations) apply with
 `--apply`. LLM suggestions use `--suggest-maturity` and remain review-only.
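The `vector_drift` case can be pictured with the same data as `test_vector_drift_suggestion`: the entry front matter says `D3 / A0 / C0 / R0` while the index still says `D2 / A0 / C0 / R0`, and applying the suggestion brings the index in line with the entry. The sketch below works on plain dictionaries and is not the `registry_update` code. The function names and the `id`, `from`, and `to` keys are assumptions; only the `kind` value comes from the tests.

```python
def find_vector_drift(index_entries, entry_vectors):
    """Suggest updates where the index vector differs from the entry's."""
    suggestions = []
    for item in index_entries:
        current = entry_vectors.get(item["id"])
        if current is not None and current != item["vector"]:
            suggestions.append(
                {
                    "kind": "vector_drift",
                    "id": item["id"],
                    "from": item["vector"],
                    "to": current,
                }
            )
    return suggestions


def apply_vector_drift(index_entries, suggestions):
    """Apply vector_drift suggestions in place; return True if any applied."""
    by_id = {item["id"]: item for item in index_entries}
    changed = False
    for suggestion in suggestions:
        if suggestion["kind"] == "vector_drift" and suggestion["id"] in by_id:
            by_id[suggestion["id"]]["vector"] = suggestion["to"]
            changed = True
    return changed


index_entries = [{"id": "capability.demo.sample", "vector": "D2 / A0 / C0 / R0"}]
entry_vectors = {"capability.demo.sample": "D3 / A0 / C0 / R0"}
suggestions = find_vector_drift(index_entries, entry_vectors)
changed = apply_vector_drift(index_entries, suggestions)
```

Separating detection from application mirrors the `--dry-run` / `--apply` split: a dry run would stop after `find_vector_drift` and only report the suggestions.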
 ### report cohorts
 Export capability cohorts for planning or implementation reuse decisions.
@@ -140,6 +179,11 @@ Stable IDs and maturity fields are preserved for agent consumption (UC-RS-019).
 | Publish catalog | `reuse-surface catalog` |
 | Compose federation | `reuse-surface federation compose` |
 | Sync federation manifest from hub | `reuse-surface hub sync` |
 | Registry stats | `reuse-surface stats` |
 | Bootstrap sibling registry | `reuse-surface establish --scaffold` |
 | Verify index publish URL | `reuse-surface establish --publish-check` |
 | Draft capabilities (LLM) | `reuse-surface establish --discover` |
 | Refresh entry metadata | `reuse-surface update` |
 | Planning cohort export | `reuse-surface report cohorts` |
 | Relation graph | `reuse-surface graph` |
--- a/workplans/archived/260617-REUSE-WP-0013-registry-establish-and-llm-assist.md
+++ b/workplans/archived/260617-REUSE-WP-0013-registry-establish-and-llm-assist.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Registry establish, update, and stats with optional llm-connect assist"
 domain: helix_forge
 repo: reuse-surface
-status: ready
+status: finished
 owner: codex
 topic_slug: helix-forge
 created: "2026-06-16"
-updated: "2026-06-16"
+updated: "2026-06-17"
 state_hub_workstream_id: "239a0077-8593-4dc7-918d-4c23895275f6"
 ---
@@ -91,7 +91,7 @@ reuse-surface update --from-git-since HEAD~5 --apply
 ```task
 id: REUSE-WP-0013-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "98e65330-bfc7-4282-b372-d35542b899ce"
 ```
@@ -112,7 +112,7 @@ Output: Markdown default, `--format json`. Pytest coverage. Document in
 ```task
 id: REUSE-WP-0013-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "b8fedd87-d0d3-41b4-9af8-e36d52bfe1c5"
 ```
@@ -131,7 +131,7 @@ No llm-connect dependency. Pytest with temp directory.
 ```task
 id: REUSE-WP-0013-T03
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "2924d685-709f-4e28-886f-b363cd9c40b4"
 ```
@@ -147,7 +147,7 @@ Federation publish helper for sibling repo operators:
 ```task
 id: REUSE-WP-0013-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "650ebee5-b34b-4ed8-891d-d93aacebadd7"
 ```
@@ -166,7 +166,7 @@ Thin client boundary:
 ```task
 id: REUSE-WP-0013-T05
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "b9154889-f538-4266-9918-b277f9a297be"
 ```
@@ -185,7 +185,7 @@ LLM-assisted bootstrap after `--scaffold` or on empty registry:
 ```task
 id: REUSE-WP-0013-T06
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "b79558da-54b2-4712-91d2-b298c7cf2c40"
 ```
@@ -210,7 +210,7 @@ Targets: single `--capability`, `--all`, `--from-git-since <ref>`.
 ```task
 id: REUSE-WP-0013-T07
-status: todo
+status: done
 priority: low
 state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291"
 ```
@@ -227,13 +227,20 @@ state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291"
 ## Acceptance
- [ ] `reuse-surface stats` reports maturity and federation-readiness aggregates
+- [x] `reuse-surface stats` reports maturity and federation-readiness aggregates
- [ ] `establish --scaffold` creates valid empty registry layout without overwrite accidents
+- [x] `establish --scaffold` creates valid empty registry layout without overwrite accidents
- [ ] `establish --publish-check` detects 303 vs 200 raw URL outcomes
+- [x] `establish --publish-check` detects 303 vs 200 raw URL outcomes
- [ ] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
+- [x] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
- [ ] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
+- [x] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
- [ ] `update --dry-run` reports deterministic drift on sample repo
+- [x] `update --dry-run` reports deterministic drift on sample repo
- [ ] All new commands documented; gap priority 24 recorded
+- [x] All new commands documented; gap priority 24 recorded
 ## Completion notes (2026-06-17)
 - Modules: `stats.py`, `establish.py`, `registry_update.py`, `llm_bridge.py`
 - Schema: `schemas/registry-draft.schema.json`
 - `validate --root` for sibling repo validation after `establish --apply`
 - 43 pytest tests; optional `pip install -e ".[llm]"` extra
 ## Out of scope