Implement REUSE-WP-0013 registry establish, update, and stats

Add stats, establish (scaffold, publish-check, discover), and update CLI
commands with optional llm-connect bridge, validate --root for sibling repos,
pytest coverage, and documentation for sibling registry onboarding.
2026-06-16 01:21:01 +02:00
parent fb712b4b98
commit 70a5003f6e
19 changed files with 1740 additions and 30 deletions


@@ -32,6 +32,9 @@ jobs:
           reuse-surface catalog
           reuse-surface graph --check --fail-on-warnings
+      - name: Registry stats (informational)
+        run: reuse-surface stats || true
       - name: Planning cohort report (informational)
         run: reuse-surface report cohorts --planning-min D4 || true


@@ -60,6 +60,11 @@ The MVP registry foundation, CLI tooling (REUSE-WP-0003), federation stack
   against `https://reuse.coulomb.social`
 - **Sync local federation manifest from hub** with `reuse-surface hub sync`
 - **Export planning cohorts** with `reuse-surface report cohorts`
+- **Bootstrap a sibling registry** with `reuse-surface establish --scaffold`
+- **Verify index publish readiness** with `reuse-surface establish --publish-check`
+- **View registry stats** with `reuse-surface stats`
+- **Draft or refresh entries** with `reuse-surface establish --discover` and
+  `reuse-surface update` (optional llm-connect backend)
 - **Run the hub locally or in a container** with `reuse-surface serve`
 - **Generate relation graphs** with `reuse-surface graph`
 - **Explore relations interactively** at `docs/graph/index.html`
@@ -104,8 +109,8 @@ See `tools/README.md` for command reference.
 - **Federated index:** `registry/indexes/federated.yaml` (local compose).
 - **Relation graph:** `docs/graph/capability-graph.mmd`, `docs/graph/index.html`.
 - **Searchable catalog:** `docs/catalog/search.html`.
-- **Workplans:** REUSE-WP-0001 through REUSE-WP-0011 finished; WP-0011 archived;
-  **REUSE-WP-0012** finished (federation scale + intent alignment).
+- **Workplans:** REUSE-WP-0001 through REUSE-WP-0012 finished/archived;
+  **REUSE-WP-0013** finished (registry establish/update/stats).
 - **Assessment history:** `history/2026-06-15-intent-scope-assessment.md`.
 - **Self-assessed vector:** `D5 / A4 / C5 / R3` (see `docs/IntentScopeGapAnalysis.md`).


@@ -3,7 +3,7 @@
 **Repository:** `reuse-surface`
 **Artifact:** `docs/IntentScopeGapAnalysis.md`
 **Status:** Living analysis
-**Updated:** 2026-06-16
+**Updated:** 2026-06-17
 **Purpose:** Record alignment, drift, and open gaps between declared intent and
 current delivered scope so future workplans can close them deliberately.
@@ -30,6 +30,8 @@ four maturity dimensions, and human/agent consumers.
    standardization tracker still manual.
 3. **Hub automation** — `hub sync` shipped; polling/webhooks still absent.
 4. **Managed platform posture** — A5 container documented; A6/Postgres deferred.
+5. **Registry bootstrap in sibling repos** — `establish`/`update`/`stats` shipped;
+   sibling adoption still operator-driven.

 **Current reuse-surface product vector (self-assessment):** `D5 / A4 / C5 / R3`
@@ -197,8 +199,10 @@ archived workplans under `workplans/archived/`.
 | 21 | INTENT layout sync | Update INTENT.md tree and example entry shape | **Closed** (WP-0012) |
 | 22 | Hub hardening | Postgres option, backup, documented SLO (A5→A6 path) | **Closed** (doc; implementation deferred) |
 | 23 | External evidence program | Raise catalog R levels with consumer_feedback | **Closed** (checklist + 3 entries; telemetry deferred) |
+| 24 | Registry bootstrap tooling | `establish`, `update`, `stats` for sibling repos | **Closed** (WP-0013) |

-**Workplan:** `REUSE-WP-0012` (finished). **Assessment snapshots:**
+**Workplan:** `REUSE-WP-0013` (finished). Prior: `REUSE-WP-0012` (finished).
+**Assessment snapshots:**
 `history/2026-06-15-intent-scope-assessment.md`,
 `history/2026-06-16-hub-registration-blocks.md`.
@@ -228,3 +232,4 @@ archived workplans under `workplans/archived/`.
 | 2026-06-15 | Post-WP-0011 refresh: 20 capabilities, vector D5/A4/C4/R3, priorities 18–23 proposed |
 | 2026-06-15 | REUSE-WP-0012 proposed; assessment archived in `history/2026-06-15-intent-scope-assessment.md` |
 | 2026-06-16 | REUSE-WP-0012 closed priorities 19–23; priority 18 deferred on sibling index blocks; vector C5 |
+| 2026-06-17 | REUSE-WP-0013 closed priority 24; establish/update/stats + optional llm-connect assist |


@@ -97,6 +97,18 @@ curl -fsS "<raw-url>" | head
 source) to an environment variable holding a Bearer token or full header value.
 The hub stores `auth_env` / `auth_header` names only — never secret values.

+### Sibling onboarding (CLI)
+
+```bash
+cd ../state-hub
+reuse-surface establish --scaffold --domain helix_forge
+# optional: LLM_CONNECT_URL=... reuse-surface establish --discover --dry-run
+reuse-surface validate --root .
+git push origin main
+reuse-surface establish --publish-check \
+  --raw-url https://gitea.coulomb.social/coulomb/state-hub/raw/main/registry/indexes/capabilities.yaml
+```
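
The raw index URL passed to `--publish-check` in the onboarding commands follows a regular shape. A minimal sketch, assuming the layout seen in that example (`<base>/<owner>/<repo>/raw/<branch>/registry/indexes/capabilities.yaml`); the helper name `raw_index_url` is hypothetical and is not part of the reuse-surface CLI:

```python
# Registry-relative index path, as used throughout this commit.
INDEX_PATH = "registry/indexes/capabilities.yaml"


def raw_index_url(base: str, owner: str, repo: str, branch: str = "main") -> str:
    """Build the raw capabilities.yaml URL for a sibling repo (illustrative only)."""
    return f"{base.rstrip('/')}/{owner}/{repo}/raw/{branch}/{INDEX_PATH}"


if __name__ == "__main__":
    # Reproduces the URL used in the onboarding example.
    print(raw_index_url("https://gitea.coulomb.social", "coulomb", "state-hub"))
```

The result can be passed directly as the `--raw-url` value; whether a given Gitea instance serves that path is exactly what the publish check is meant to confirm.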
+
 ### Registration checklist

 1. Merge capability index to the default branch.


@@ -20,6 +20,9 @@ dev = [
     "httpx>=0.27",
     "pytest>=8.0",
 ]
+llm = [
+    "llm-connect",
+]

 [project.scripts]
 reuse-surface = "reuse_surface.cli:main"


@@ -35,6 +35,21 @@ registry/
 Missing evidence is acceptable in the MVP when it is explicit rather than hidden.

+## LLM-assisted discover review checklist
+
+When using `reuse-surface establish --discover` (llm-connect backend):
+
+- [ ] Every proposed `id` follows `capability.<domain>.<name>` and is not a duplicate
+- [ ] `summary`, `discovery.intent`, and maturity vectors match repo reality
+- [ ] `owner` reflects the delivering repository or team
+- [ ] Relations are empty or manually added after human review
+- [ ] Run `reuse-surface validate --root <repo>` before merge
+- [ ] Run `reuse-surface establish --publish-check` after pushing to `main`
+
+Discover drafts start at low maturity with explicit auto-draft risks in
+`known_reliability_risks`. Promote only with evidence per
+`specs/CapabilityMaturityStandard.md`.
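
The first checklist item (id shape and duplicates) lends itself to a quick mechanical pre-check before human review. A minimal sketch, assuming the id pattern quoted in the discover prompt in `reuse_surface/establish.py`; the function name `review_ids` and the sample ids are hypothetical, and this is not a substitute for `reuse-surface validate`:

```python
import re

# Pattern copied from the discover prompt in reuse_surface/establish.py.
ID_PATTERN = re.compile(r"^capability\.[a-z0-9]+(\.[a-z0-9-]+)+$")


def review_ids(proposed: list[str], existing: set[str]) -> list[str]:
    """Return human-readable problems for proposed capability ids."""
    problems: list[str] = []
    seen: set[str] = set()
    for cap_id in proposed:
        if not ID_PATTERN.match(cap_id):
            problems.append(f"{cap_id}: does not match capability.<domain>.<name>")
        # A duplicate is either already indexed or repeated within the draft.
        if cap_id in existing or cap_id in seen:
            problems.append(f"{cap_id}: duplicate id")
        seen.add(cap_id)
    return problems


if __name__ == "__main__":
    draft_ids = ["capability.hub.sync", "Capability.Bad", "capability.hub.sync"]
    for problem in review_ids(draft_ids, existing={"capability.hub.register"}):
        print(problem)
```

An empty result only means the ids are well-formed and unique; the remaining checklist items still need a human reader.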
+
 ## Manual validation checklist

 Use this checklist until an automated CLI validator exists.


@@ -26,21 +26,48 @@ from reuse_surface.reports import (
     format_cohort_markdown,
     select_cohort,
 )
+from reuse_surface.establish import (
+    discover_capabilities,
+    format_publish_check_markdown,
+    publish_check,
+    scaffold_next_steps,
+    scaffold_registry,
+)
+from reuse_surface.registry_update import (
+    apply_deterministic_suggestions,
+    collect_deterministic_suggestions,
+    format_suggestions_json,
+    format_suggestions_markdown,
+    suggest_llm_updates,
+)
+from reuse_surface.stats import collect_stats, format_stats_json, format_stats_markdown
 from reuse_surface.registry import (
     ROOT,
     capability_paths,
     level_at_least,
     load_index,
+    load_index_at,
     load_schema,
     parse_front_matter,
     parse_vector,
+    registry_paths,
 )


-def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[str]:
+def _registry_root(args: argparse.Namespace) -> Path:
+    if getattr(args, "root", None):
+        return Path(args.root).resolve()
+    return ROOT
+
+
+def _check_index_drift(
+    entry_paths: list[Path],
+    index: dict[str, Any],
+    repo_root: Path,
+) -> list[str]:
     warnings: list[str] = []
     indexed_paths = {item["path"] for item in index.get("capabilities", [])}
-    file_paths = {str(path.relative_to(ROOT)) for path in entry_paths}
+    file_paths = {str(path.relative_to(repo_root)) for path in entry_paths}
     for path in sorted(file_paths - indexed_paths):
         warnings.append(f"index drift: entry file not indexed: {path}")
     for path in sorted(indexed_paths - file_paths):
@@ -48,11 +75,22 @@ def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[s
     return warnings


-def cmd_validate(args: argparse.Namespace) -> int:
+def _capability_paths_for(repo_root: Path, target: Path | None) -> list[Path]:
+    if target is not None:
+        return [target]
+    cap_dir = registry_paths(repo_root)["capabilities"]
+    return sorted(path for path in cap_dir.glob("*.md") if path.name != ".gitkeep")
+
+
+def _run_validate(
+    repo_root: Path,
+    *,
+    target: Path | None,
+    relations: bool,
+) -> tuple[list[str], list[str], list[Path]]:
     schema = load_schema()
     validator = Draft202012Validator(schema)
-    target = Path(args.path) if args.path else None
-    paths = capability_paths(target)
+    paths = _capability_paths_for(repo_root, target)
     errors: list[str] = []
     warnings: list[str] = []
@@ -67,10 +105,23 @@ def cmd_validate(args: argparse.Namespace) -> int:
             errors.append(f"{path}: {location}: {error.message}")

     if not target:
-        index = load_index()
-        warnings.extend(_check_index_drift(paths, index))
-    if args.relations:
+        index_path = registry_paths(repo_root)["index"]
+        if index_path.exists():
+            index = load_index_at(index_path)
+            warnings.extend(_check_index_drift(paths, index, repo_root))
+    if relations and repo_root == ROOT:
         warnings.extend(check_relations())
+    return errors, warnings, paths
+
+
+def cmd_validate(args: argparse.Namespace) -> int:
+    repo_root = _registry_root(args)
+    target = Path(args.path) if args.path else None
+    if target and not target.is_absolute():
+        target = repo_root / target
+    errors, warnings, paths = _run_validate(
+        repo_root, target=target, relations=args.relations
+    )

     for warning in warnings:
         print(f"warning: {warning}", file=sys.stderr)
@@ -329,6 +380,117 @@ def cmd_hub_sync(args: argparse.Namespace) -> int:
     return 0


+def cmd_stats(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    stats = collect_stats(
+        repo_root,
+        federation_ready=args.federation_ready,
+        raw_url=args.raw_url,
+        hub_url=getattr(args, "hub_url", None),
+    )
+    if args.format == "json":
+        print(format_stats_json(stats))
+    else:
+        print(format_stats_markdown(stats), end="")
+    return 0
+
+
+def cmd_establish(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    try:
+        if args.scaffold:
+            created = scaffold_registry(
+                repo_root, domain=args.domain, force=args.force
+            )
+            for path in created:
+                print(f"ok: wrote {path.relative_to(repo_root)}")
+            print(scaffold_next_steps(repo_root))
+            return 0
+        if args.publish_check:
+            result = publish_check(repo_root, raw_url=args.raw_url)
+            print(format_publish_check_markdown(result), end="")
+            return 0 if result["ok"] else 1
+        if args.discover:
+            result = discover_capabilities(
+                repo_root,
+                domain=args.domain,
+                dry_run=not args.apply,
+                apply=args.apply,
+                llm_url=args.llm_url,
+                context_max_files=args.context_max_files,
+            )
+            if result.get("dry_run"):
+                print(yaml.safe_dump(result["draft"], sort_keys=False))
+                return 0
+            for path in result.get("written", []):
+                print(f"ok: wrote {path}")
+            validate_args = argparse.Namespace(
+                path=None,
+                root=str(repo_root),
+                relations=False,
+                fail_on_warnings=True,
+            )
+            return cmd_validate(validate_args)
+    except ValueError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 1
+    print("error: specify --scaffold, --publish-check, or --discover", file=sys.stderr)
+    return 1
+
+
+def cmd_update(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    try:
+        capability_id = None if args.all else args.capability
+        if not args.all and not args.capability:
+            print("error: specify --capability or --all", file=sys.stderr)
+            return 1
+        if args.suggest_maturity:
+            cap_ids = [args.capability] if args.capability else []
+            if args.all:
+                index = load_index_at(registry_paths(repo_root)["index"])
+                cap_ids = [row["id"] for row in index.get("capabilities", [])]
+            payload = {
+                "suggestions": [
+                    suggest_llm_updates(
+                        repo_root,
+                        cap_id,
+                        git_since=args.from_git_since,
+                        llm_url=args.llm_url,
+                    )
+                    for cap_id in cap_ids
+                ]
+            }
+            print(json.dumps(payload, indent=2, sort_keys=True))
+            return 0
+        suggestions = collect_deterministic_suggestions(
+            repo_root,
+            capability_id=capability_id,
+            git_since=args.from_git_since,
+        )
+        if args.apply:
+            changed = apply_deterministic_suggestions(repo_root, suggestions)
+            for line in changed:
+                print(f"ok: {line}")
+            validate_args = argparse.Namespace(
+                path=None,
+                root=str(repo_root),
+                relations=False,
+                fail_on_warnings=True,
+            )
+            return cmd_validate(validate_args)
+        if args.format == "json":
+            print(format_suggestions_json(suggestions))
+        else:
+            print(format_suggestions_markdown(suggestions), end="")
+        return 0
+    except ValueError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 1
+
+
 def cmd_report_cohorts(args: argparse.Namespace) -> int:
     filters = cohort_filters_from_args(args)
     matches = select_cohort(filters)
@@ -399,6 +561,10 @@ def main(argv: list[str] | None = None) -> int:
         action="store_true",
         help="exit non-zero when warnings are present",
     )
+    validate.add_argument(
+        "--root",
+        help="registry repo root (default: reuse-surface install root)",
+    )
     validate.set_defaults(func=cmd_validate)

     federation = subparsers.add_parser(
@@ -539,6 +705,41 @@ def main(argv: list[str] | None = None) -> int:
     )
     cohorts.set_defaults(func=cmd_report_cohorts)

+    stats = subparsers.add_parser("stats", help="registry maturity and federation stats")
+    stats.add_argument("--path", help="repo root (default: cwd)")
+    stats.add_argument("--federation-ready", action="store_true")
+    stats.add_argument("--raw-url", help="probe federation raw index URL")
+    stats.add_argument("--hub-url", help="hub base URL (or REUSE_SURFACE_URL)")
+    stats.add_argument("--format", choices=["markdown", "json"], default="markdown")
+    stats.set_defaults(func=cmd_stats)
+
+    establish = subparsers.add_parser(
+        "establish", help="bootstrap or discover capability registry"
+    )
+    establish.add_argument("--path", help="target repo root (default: cwd)")
+    establish.add_argument("--domain", default="helix_forge")
+    establish.add_argument("--force", action="store_true")
+    establish.add_argument("--scaffold", action="store_true")
+    establish.add_argument("--publish-check", action="store_true")
+    establish.add_argument("--discover", action="store_true")
+    establish.add_argument("--dry-run", action="store_true", help="discover preview (default)")
+    establish.add_argument("--apply", action="store_true", help="discover write + validate")
+    establish.add_argument("--raw-url", help="raw Gitea index URL for publish-check")
+    establish.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
+    establish.add_argument("--context-max-files", type=int, default=12)
+    establish.set_defaults(func=cmd_establish)
+
+    update = subparsers.add_parser("update", help="refresh registry metadata from repo signals")
+    update.add_argument("--path", help="repo root (default: cwd)")
+    update.add_argument("--capability", help="single capability id")
+    update.add_argument("--all", action="store_true")
+    update.add_argument("--from-git-since", help="git ref for change detection")
+    update.add_argument("--apply", action="store_true")
+    update.add_argument("--suggest-maturity", action="store_true")
+    update.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
+    update.add_argument("--format", choices=["markdown", "json"], default="markdown")
+    update.set_defaults(func=cmd_update)
+
     args = parser.parse_args(argv)
     return args.func(args)

448
reuse_surface/establish.py Normal file

@@ -0,0 +1,448 @@
from __future__ import annotations

import json
import textwrap
import urllib.error
import urllib.request
from datetime import date
from pathlib import Path
from typing import Any

import yaml

from reuse_surface.llm_bridge import request_registry_draft
from reuse_surface.registry import load_index_at, registry_paths

SCAFFOLD_README = """# Capability Registry

Markdown-first capability index for federation and reuse planning.

## Authoring

1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
2. Add the row to `indexes/capabilities.yaml`.
3. Run `reuse-surface validate` from a checkout with the CLI installed.
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.

Federation contract: reuse-surface `docs/RegistryFederation.md`.
"""

CONTEXT_FILES = (
    "INTENT.md",
    "SCOPE.md",
    "AGENTS.md",
    "README.md",
    "pyproject.toml",
    "Cargo.toml",
    "go.mod",
)


def scaffold_registry(
    repo_root: Path,
    *,
    domain: str = "helix_forge",
    force: bool = False,
) -> list[Path]:
    paths = registry_paths(repo_root)
    created: list[Path] = []

    if paths["registry"].exists() and not force:
        raise ValueError(
            f"registry already exists at {paths['registry']}; use --force to overwrite"
        )

    paths["registry"].mkdir(parents=True, exist_ok=True)
    paths["capabilities"].mkdir(parents=True, exist_ok=True)
    paths["index"].parent.mkdir(parents=True, exist_ok=True)

    readme = paths["registry"] / "README.md"
    if force or not readme.exists():
        readme.write_text(SCAFFOLD_README, encoding="utf-8")
        created.append(readme)

    gitkeep = paths["capabilities"] / ".gitkeep"
    if force or not gitkeep.exists():
        gitkeep.write_text("", encoding="utf-8")
        created.append(gitkeep)

    index_data = {
        "version": 1,
        "updated": date.today().isoformat(),
        "domain": domain,
        "capabilities": [],
    }
    if force or not paths["index"].exists():
        paths["index"].write_text(
            yaml.safe_dump(index_data, sort_keys=False, allow_unicode=True),
            encoding="utf-8",
        )
        created.append(paths["index"])

    return created


def scaffold_next_steps(repo_root: Path) -> str:
    return textwrap.dedent(
        f"""
        Next steps:
        1. Add capability entries under {repo_root / 'registry/capabilities'}
        2. Update {repo_root / 'registry/indexes/capabilities.yaml'}
        3. reuse-surface validate
        4. git push origin main
        5. reuse-surface establish --publish-check --raw-url <gitea-raw-url>
        6. reuse-surface hub register --repo <slug> --url <raw-url>
        """
    ).strip()


def publish_check(
    repo_root: Path,
    *,
    raw_url: str | None = None,
) -> dict[str, Any]:
    paths = registry_paths(repo_root)
    result: dict[str, Any] = {
        "repo_root": str(repo_root),
        "checks": [],
        "ok": True,
    }

    if paths["index"].exists():
        try:
            data = load_index_at(paths["index"])
            valid = isinstance(data, dict) and isinstance(data.get("capabilities"), list)
            result["checks"].append(
                {
                    "name": "local_index_yaml",
                    "ok": valid,
                    "detail": f"{len(data.get('capabilities', []))} capabilities"
                    if valid
                    else "invalid structure",
                }
            )
            if not valid:
                result["ok"] = False
        except (OSError, yaml.YAMLError) as exc:
            result["checks"].append(
                {"name": "local_index_yaml", "ok": False, "detail": str(exc)}
            )
            result["ok"] = False
    else:
        result["checks"].append(
            {
                "name": "local_index_yaml",
                "ok": False,
                "detail": "registry/indexes/capabilities.yaml missing",
            }
        )
        result["ok"] = False

    if raw_url:
        probe = _probe_raw_url(raw_url)
        result["checks"].append(
            {
                "name": "raw_url_probe",
                "ok": probe["ok"],
                "detail": f"HTTP {probe.get('status')} {probe.get('content_type', '')}".strip(),
                "url": raw_url,
            }
        )
        if probe["ok"]:
            body_probe = _fetch_yaml_snippet(raw_url)
            result["checks"].append(body_probe)
            if not body_probe.get("ok"):
                result["ok"] = False
        else:
            result["ok"] = False
            result["remediation"] = (
                "Merge registry/indexes/capabilities.yaml to main and confirm "
                "Gitea raw URL returns 200 YAML. See docs/RegistryFederation.md."
            )

    return result


def _probe_raw_url(url: str) -> dict[str, Any]:
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "reuse-surface/0.1"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return {
                "ok": response.status == 200,
                "status": response.status,
                "content_type": response.headers.get("Content-Type", ""),
            }
    except urllib.error.HTTPError as exc:
        return {
            "ok": False,
            "status": exc.code,
            "content_type": exc.headers.get("Content-Type", ""),
        }


def _fetch_yaml_snippet(url: str) -> dict[str, Any]:
    request = urllib.request.Request(url, headers={"User-Agent": "reuse-surface/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        return {"name": "raw_url_body", "ok": False, "detail": f"HTTP {exc.code}"}
    except urllib.error.URLError as exc:
        return {"name": "raw_url_body", "ok": False, "detail": str(exc.reason)}

    try:
        data = yaml.safe_load(body)
    except yaml.YAMLError as exc:
        return {"name": "raw_url_body", "ok": False, "detail": str(exc)}

    ok = isinstance(data, dict) and "capabilities" in data
    return {
        "name": "raw_url_body",
        "ok": ok,
        "detail": "valid capabilities.yaml shape" if ok else "body is not valid index YAML",
    }


def collect_context(repo_root: Path, *, max_files: int = 12) -> str:
    chunks: list[str] = []
    used = 0
    for name in CONTEXT_FILES:
        if used >= max_files:
            break
        path = repo_root / name
        if path.is_file():
            chunks.append(f"### {name}\n{path.read_text(encoding='utf-8')[:8000]}")
            used += 1

    pkg_dirs = sorted(
        [
            item
            for item in repo_root.iterdir()
            if item.is_dir()
            and not item.name.startswith(".")
            and item.name not in {"registry", "tests", "docs", "workplans", "node_modules"}
        ]
    )
    for pkg in pkg_dirs[: max(0, max_files - used)]:
        init = pkg / "__init__.py"
        if init.exists():
            chunks.append(f"### {pkg.name}/__init__.py\n{init.read_text(encoding='utf-8')[:2000]}")

    return "\n\n".join(chunks)


def build_discover_prompt(context: str, domain: str) -> str:
    schema_hint = json.dumps(
        {
            "domain": domain,
            "capabilities": [
                {
                    "id": "capability.domain.name",
                    "name": "Human Name",
                    "summary": "One sentence.",
                    "owner": "team",
                    "vector": "D2 / A0 / C0 / R0",
                    "tags": ["tag"],
                    "consumption_modes": ["informational"],
                    "discovery_intent": "What this enables.",
                    "discovery_includes": ["included behavior"],
                    "discovery_excludes": ["excluded behavior"],
                }
            ],
        },
        indent=2,
    )
    return textwrap.dedent(
        f"""
        You are drafting a capability registry index for helix_forge reuse-surface.
        Return ONLY a JSON object matching this shape (no markdown fences):

        {schema_hint}

        Rules:
        - Propose 1-5 distinct capabilities grounded in the repository context.
        - Use IDs matching ^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$
        - Default vector D2 / A0 / C0 / R0 unless strong delivery evidence exists.
        - domain: {domain}

        Repository context:
        {context}
        """
    ).strip()


def discover_capabilities(
    repo_root: Path,
    *,
    domain: str = "helix_forge",
    dry_run: bool = True,
    apply: bool = False,
    llm_url: str | None = None,
    context_max_files: int = 12,
) -> dict[str, Any]:
    if apply and dry_run:
        raise ValueError("use either --dry-run or --apply, not both")
    if not apply and not dry_run:
        dry_run = True

    context = collect_context(repo_root, max_files=context_max_files)
    if not context.strip():
        raise ValueError("no context files found for discovery")

    prompt = build_discover_prompt(context, domain)
    draft = request_registry_draft(
        prompt,
        base_url=llm_url,
        config={"temperature": 0.2, "max_tokens": 4000},
    )

    result: dict[str, Any] = {"draft": draft, "written": [], "dry_run": dry_run}
    if dry_run:
        return result

    paths = registry_paths(repo_root)
    if not paths["index"].exists():
        scaffold_registry(repo_root, domain=domain, force=False)

    index = load_index_at(paths["index"]) if paths["index"].exists() else {
        "version": 1,
        "domain": domain,
        "capabilities": [],
    }
    existing_ids = {row["id"] for row in index.get("capabilities", [])}

    for item in draft.get("capabilities", []):
        cap_id = item["id"]
        if cap_id in existing_ids:
            continue
        filename = cap_id.replace(".", "-") + ".md"
        rel_path = f"registry/capabilities/{filename}"
        entry_path = repo_root / rel_path
        entry_body = _render_entry_from_draft(item, domain)
        entry_path.parent.mkdir(parents=True, exist_ok=True)
        entry_path.write_text(entry_body, encoding="utf-8")
        vector = item.get("vector", "D2 / A0 / C0 / R0")
        index.setdefault("capabilities", []).append(
            {
                "id": cap_id,
                "name": item["name"],
                "summary": item["summary"],
                "vector": vector,
                "domain": domain,
                "status": "draft",
                "owner": item.get("owner", repo_root.name),
                "path": rel_path,
                "tags": item.get("tags", []),
                "consumption_modes": item.get("consumption_modes", ["informational"]),
            }
        )
        result["written"].append(rel_path)

    index["updated"] = date.today().isoformat()
    index["domain"] = draft.get("domain", domain)
    paths["index"].write_text(
        yaml.safe_dump(index, sort_keys=False, allow_unicode=True),
        encoding="utf-8",
    )
    result["written"].append(str(paths["index"].relative_to(repo_root)))
    return result


def _render_entry_from_draft(item: dict[str, Any], domain: str) -> str:
    vector = item.get("vector", "D2 / A0 / C0 / R0")
    d, a, c, r = [part.strip() for part in vector.split("/")]
    front_matter = {
        "id": item["id"],
        "name": item["name"],
        "summary": item["summary"],
        "owner": item.get("owner", domain),
        "status": "draft",
        "domain": domain,
        "tags": item.get("tags") or ["draft"],
        "maturity": {
            "discovery": {
                "current": d,
                "target": "D5",
                "confidence": "low",
                "rationale": "Auto-drafted by reuse-surface establish --discover; review required.",
            },
            "availability": {
                "current": a,
                "target": "A3",
                "confidence": "low",
                "rationale": "Auto-drafted; confirm consumption modes and artifacts.",
            },
        },
        "external_evidence": {
            "completeness": {
                "level": c,
                "confidence": "low",
                "basis": "scope_vs_intent_and_consumer_expectations",
                "satisfied_expectations": [],
                "broken_expectations": [],
                "out_of_scope_expectations": [],
            },
            "reliability": {
                "level": r,
                "confidence": "low",
                "basis": "consumer_quality_signals",
                "known_reliability_risks": ["auto-drafted entry without consumer evidence"],
            },
        },
        "discovery": {
            "intent": item.get("discovery_intent", item["summary"]),
            "includes": item.get("discovery_includes") or [],
            "excludes": item.get("discovery_excludes") or [],
            "assumptions": [],
            "use_cases": [],
            "research_memos": [],
        },
        "availability": {
            "current_level": a,
            "target_level": "A3",
            "current_artifacts": [],
            "target_artifacts": [],
            "consumption_modes": item.get("consumption_modes") or ["informational"],
        },
        "relations": {"depends_on": [], "supports": [], "related_to": []},
        "evidence": {
            "documentation": [],
            "tests": [],
            "consumer_feedback": [],
            "bug_reports": [],
            "incidents": [],
        },
        "consumer_guidance": {
            "recommended_for": ["planning reuse after human review"],
            "not_recommended_for": ["implementation reuse before validation"],
            "known_limitations": ["discover draft — verify maturity claims"],
        },
        "promotion_history": [],
    }
    markdown = (
        f"# {item['name']}\n\n"
        "Auto-drafted capability entry. Review maturity, evidence, and relations "
        "before promoting.\n"
    )
    return (
        "---\n"
        + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True)
        + "---\n\n"
        + markdown
    )


def format_publish_check_markdown(result: dict[str, Any]) -> str:
    lines = ["# Federation publish check", ""]
    lines.append(f"**Repo:** `{result['repo_root']}`")
    lines.append(f"**Result:** {'PASS' if result['ok'] else 'FAIL'}")
    lines.append("")
    for check in result["checks"]:
        status = "ok" if check["ok"] else "FAIL"
        detail = check.get("detail", "")
        name = check["name"]
        lines.append(f"- **{name}**: {status}{detail}")
        if check.get("url"):
            lines.append(f"  `{check['url']}`")
    if result.get("remediation"):
        lines.append("")
        lines.append(f"**Remediation:** {result['remediation']}")
    return "\n".join(lines) + "\n"

102
reuse_surface/llm_bridge.py Normal file

@@ -0,0 +1,102 @@
from __future__ import annotations

import json
import os
import re
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any

from jsonschema import Draft202012Validator

from reuse_surface.registry import ROOT

DRAFT_SCHEMA_PATH = ROOT / "schemas" / "registry-draft.schema.json"


def llm_connect_url(explicit: str | None = None) -> str:
    base = (explicit or os.environ.get("LLM_CONNECT_URL", "")).rstrip("/")
    if not base:
        raise ValueError(
            "LLM backend not configured; set LLM_CONNECT_URL or pass --llm-url"
        )
    return base


def load_draft_schema() -> dict[str, Any]:
    return json.loads(DRAFT_SCHEMA_PATH.read_text(encoding="utf-8"))


def execute_prompt(
    prompt: str,
    *,
    base_url: str | None = None,
    config: dict[str, Any] | None = None,
) -> str:
    url = f"{llm_connect_url(base_url)}/execute"
    body: dict[str, Any] = {"prompt": prompt}
    if config:
        body["config"] = config
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": "reuse-surface/0.1",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=120) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        raw = exc.read().decode("utf-8")
        raise ValueError(f"llm-connect returned {exc.code}: {raw}") from exc

    content = payload.get("content")
    if not isinstance(content, str) or not content.strip():
        raise ValueError("llm-connect response missing content")
    return content


def extract_json_object(text: str) -> dict[str, Any]:
    stripped = text.strip()
    if stripped.startswith("```"):
        stripped = re.sub(r"^```(?:json)?\s*", "", stripped)
        stripped = re.sub(r"\s*```$", "", stripped)
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", stripped, re.DOTALL)
        if not match:
            raise ValueError("llm response did not contain JSON object") from None
        data = json.loads(match.group(0))
    if not isinstance(data, dict):
        raise ValueError("llm response JSON must be an object")
    return data


def request_registry_draft(
    prompt: str,
    *,
    base_url: str | None = None,
    config: dict[str, Any] | None = None,
) -> dict[str, Any]:
    draft = extract_json_object(execute_prompt(prompt, base_url=base_url, config=config))
    validator = Draft202012Validator(load_draft_schema())
    errors = sorted(validator.iter_errors(draft), key=lambda err: list(err.path))
    if errors:
        messages = "; ".join(error.message for error in errors[:3])
        raise ValueError(f"draft schema validation failed: {messages}")
    return draft


def request_json_object(
    prompt: str,
    *,
    base_url: str | None = None,
    config: dict[str, Any] | None = None,
) -> dict[str, Any]:
    return extract_json_object(execute_prompt(prompt, base_url=base_url, config=config))


@@ -61,3 +61,30 @@ def parse_vector(vector: str) -> dict[str, str]:
def level_at_least(dimension: str, current: str, minimum: str) -> bool:
    order = LEVEL_ORDERS[dimension]
    return order.index(current) >= order.index(minimum)


def registry_paths(repo_root: Path) -> dict[str, Path]:
    registry = repo_root / "registry"
    return {
        "registry": registry,
        "capabilities": registry / "capabilities",
        "index": registry / "indexes" / "capabilities.yaml",
        "sources": registry / "federation" / "sources.yaml",
    }


def load_index_at(path: Path) -> dict[str, Any]:
    with path.open(encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def entry_vector(front_matter: dict[str, Any]) -> str:
    discovery = front_matter["maturity"]["discovery"]["current"]
    availability = front_matter["maturity"]["availability"]["current"]
    completeness = front_matter["external_evidence"]["completeness"]["level"]
    reliability = front_matter["external_evidence"]["reliability"]["level"]
    return f"{discovery} / {availability} / {completeness} / {reliability}"


def vectors_match(index_vector: str, front_matter: dict[str, Any]) -> bool:
    return index_vector.replace(" ", "") == entry_vector(front_matter).replace(" ", "")


@@ -0,0 +1,273 @@
from __future__ import annotations

import json
import subprocess
import textwrap
from pathlib import Path
from typing import Any

import yaml

from reuse_surface.llm_bridge import request_json_object
from reuse_surface.registry import (
    entry_vector,
    load_index_at,
    parse_front_matter,
    registry_paths,
    vectors_match,
)

SAFE_EVIDENCE_PREFIXES = ("tests/", ".gitea/workflows/")


def git_changed_files(repo_root: Path, since_ref: str) -> list[str]:
    result = subprocess.run(
        ["git", "-C", str(repo_root), "diff", "--name-only", since_ref, "HEAD"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise ValueError(result.stderr.strip() or f"git diff failed for {since_ref}")
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
def collect_deterministic_suggestions(
    repo_root: Path,
    *,
    capability_id: str | None = None,
    git_since: str | None = None,
) -> list[dict[str, Any]]:
    paths = registry_paths(repo_root)
    if not paths["index"].exists():
        raise ValueError("registry index missing; run establish --scaffold first")
    index = load_index_at(paths["index"])
    rows = index.get("capabilities", [])
    if capability_id:
        rows = [row for row in rows if row["id"] == capability_id]
        if not rows:
            raise ValueError(f"capability not in index: {capability_id}")
    changed_files = git_changed_files(repo_root, git_since) if git_since else []
    suggestions: list[dict[str, Any]] = []
    for row in rows:
        entry_path = repo_root / row["path"]
        if not entry_path.exists():
            suggestions.append(
                {
                    "capability_id": row["id"],
                    "kind": "missing_entry",
                    "detail": f"missing file {row['path']}",
                }
            )
            continue
        front_matter = parse_front_matter(entry_path)
        if not vectors_match(row["vector"], front_matter):
            suggestions.append(
                {
                    "capability_id": row["id"],
                    "kind": "vector_drift",
                    "detail": "index vector differs from entry front matter",
                    "index_vector": row["vector"],
                    "entry_vector": entry_vector(front_matter),
                    "apply_patch": {
                        "field": "index.vector",
                        "value": entry_vector(front_matter),
                    },
                }
            )
        evidence_tests = front_matter.get("evidence", {}).get("tests", [])
        for changed in changed_files:
            if changed.startswith("tests/") and changed not in evidence_tests:
                suggestions.append(
                    {
                        "capability_id": row["id"],
                        "kind": "evidence_test",
                        "detail": f"new test file not cited: {changed}",
                        "apply_patch": {
                            "field": "evidence.tests",
                            "append": changed,
                        },
                    }
                )
        artifacts = front_matter.get("availability", {}).get("current_artifacts", [])
        for changed in changed_files:
            if changed.endswith(".py") and changed.startswith(
                tuple(
                    p.name + "/"
                    for p in repo_root.iterdir()
                    if p.is_dir() and (p / "__init__.py").exists()
                )
            ):
                if changed not in artifacts:
                    suggestions.append(
                        {
                            "capability_id": row["id"],
                            "kind": "availability_artifact",
                            "detail": f"changed module not cited: {changed}",
                            "apply_patch": {
                                "field": "availability.current_artifacts",
                                "append": changed,
                            },
                        }
                    )
    return suggestions
def apply_deterministic_suggestions(
    repo_root: Path,
    suggestions: list[dict[str, Any]],
) -> list[str]:
    paths = registry_paths(repo_root)
    index = load_index_at(paths["index"])
    index_by_id = {row["id"]: row for row in index.get("capabilities", [])}
    changed: list[str] = []
    entry_cache: dict[str, dict[str, Any]] = {}
    entry_paths: dict[str, Path] = {}
    for suggestion in suggestions:
        patch = suggestion.get("apply_patch")
        if not patch:
            continue
        cap_id = suggestion["capability_id"]
        if patch["field"] == "index.vector" and cap_id in index_by_id:
            index_by_id[cap_id]["vector"] = patch["value"]
            changed.append(f"index vector for {cap_id}")
        row = index_by_id.get(cap_id)
        if not row:
            continue
        entry_path = repo_root / row["path"]
        if cap_id not in entry_cache:
            entry_cache[cap_id] = parse_front_matter(entry_path)
            entry_paths[cap_id] = entry_path
        front_matter = entry_cache[cap_id]
        if patch["field"] == "evidence.tests":
            tests = front_matter.setdefault("evidence", {}).setdefault("tests", [])
            if patch["append"] not in tests:
                tests.append(patch["append"])
                changed.append(f"{cap_id} evidence.tests += {patch['append']}")
        if patch["field"] == "availability.current_artifacts":
            artifacts = front_matter.setdefault("availability", {}).setdefault(
                "current_artifacts", []
            )
            if patch["append"] not in artifacts:
                artifacts.append(patch["append"])
                changed.append(
                    f"{cap_id} availability.current_artifacts += {patch['append']}"
                )
    if changed:
        paths["index"].write_text(
            yaml.safe_dump(index, sort_keys=False, allow_unicode=True),
            encoding="utf-8",
        )
        for cap_id, front_matter in entry_cache.items():
            _write_front_matter(entry_paths[cap_id], front_matter)
    return changed


def _write_front_matter(path: Path, front_matter: dict[str, Any]) -> None:
    text = path.read_text(encoding="utf-8")
    marker_end = text.find("\n---", 4)
    body = text[marker_end + 4 :] if marker_end != -1 else "\n"
    path.write_text(
        "---\n"
        + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True)
        + "---"
        + body,
        encoding="utf-8",
    )
def build_update_prompt(
    repo_root: Path,
    capability_id: str,
    *,
    git_since: str | None = None,
) -> str:
    paths = registry_paths(repo_root)
    index = load_index_at(paths["index"])
    row = next((item for item in index["capabilities"] if item["id"] == capability_id), None)
    if not row:
        raise ValueError(f"capability not in index: {capability_id}")
    entry = parse_front_matter(repo_root / row["path"])
    diff = ""
    if git_since:
        proc = subprocess.run(
            [
                "git",
                "-C",
                str(repo_root),
                "diff",
                git_since,
                "HEAD",
                "--",
                "registry/",
                "reuse_surface/",
                "tests/",
            ],
            capture_output=True,
            text=True,
            check=False,
        )
        diff = proc.stdout[:12000]
    return textwrap.dedent(
        f"""
        Suggest registry entry updates for capability `{capability_id}`.
        Return ONLY JSON:
        {{
          "promotion_history": [
            {{"date": "YYYY-MM-DD", "dimension": "availability", "from": "A3", "to": "A4", "rationale": "..."}}
          ],
          "consumer_feedback": ["optional string notes"],
          "notes": ["human review items"]
        }}
        Current entry YAML:
        {yaml.safe_dump(entry, sort_keys=False)}
        Git diff since {git_since or 'N/A'}:
        {diff or '(none)'}
        """
    ).strip()


def suggest_llm_updates(
    repo_root: Path,
    capability_id: str,
    *,
    git_since: str | None = None,
    llm_url: str | None = None,
) -> dict[str, Any]:
    prompt = build_update_prompt(repo_root, capability_id, git_since=git_since)
    return request_json_object(
        prompt,
        base_url=llm_url,
        config={"temperature": 0.2, "max_tokens": 2000},
    )


def format_suggestions_markdown(suggestions: list[dict[str, Any]]) -> str:
    if not suggestions:
        return "# Registry update suggestions\n\n_No suggestions._\n"
    lines = ["# Registry update suggestions", ""]
    for item in suggestions:
        lines.append(f"- `{item['capability_id']}` **{item['kind']}**: {item['detail']}")
    lines.append("")
    lines.append(f"**{len(suggestions)}** suggestion(s). Use `--apply` to apply safe patches.")
    return "\n".join(lines) + "\n"


def format_suggestions_json(suggestions: list[dict[str, Any]]) -> str:
    return json.dumps({"count": len(suggestions), "suggestions": suggestions}, indent=2)

reuse_surface/stats.py

@@ -0,0 +1,259 @@
from __future__ import annotations

import json
import urllib.error
import urllib.request
from collections import Counter
from pathlib import Path
from typing import Any

import yaml

from reuse_surface import hub_client
from reuse_surface.registry import (
    LEVEL_ORDERS,
    entry_vector,
    load_index_at,
    parse_front_matter,
    parse_vector,
    registry_paths,
    vectors_match,
)


def _histogram(values: list[str], order: list[str]) -> dict[str, int]:
    counts = Counter(values)
    return {level: counts.get(level, 0) for level in order if counts.get(level, 0)}


def _probe_url(url: str) -> dict[str, Any]:
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "reuse-surface/0.1"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return {
                "url": url,
                "status": response.status,
                "content_type": response.headers.get("Content-Type", ""),
                "ok": response.status == 200,
            }
    except urllib.error.HTTPError as exc:
        return {
            "url": url,
            "status": exc.code,
            "content_type": exc.headers.get("Content-Type", ""),
            "ok": False,
        }
    except urllib.error.URLError as exc:
        return {"url": url, "status": None, "error": str(exc.reason), "ok": False}
def collect_stats(
    repo_root: Path,
    *,
    federation_ready: bool = False,
    raw_url: str | None = None,
    hub_url: str | None = None,
) -> dict[str, Any]:
    paths = registry_paths(repo_root)
    stats: dict[str, Any] = {
        "repo_root": str(repo_root),
        "registry_present": paths["registry"].exists(),
        "index_present": paths["index"].exists(),
        "sources_present": paths["sources"].exists(),
        "capability_count": 0,
        "histograms": {},
        "reliability": {"r0_r2": 0, "r3_plus": 0},
        "consumption_modes": {},
        "vector_drift": [],
        "federation": {},
        "hub": {},
    }
    if not paths["index"].exists():
        if federation_ready and raw_url:
            stats["federation"]["raw_url_probe"] = _probe_url(raw_url)
        if hub_url or _hub_configured():
            stats["hub"] = _hub_summary(hub_url)
        return stats
    index = load_index_at(paths["index"])
    capabilities = index.get("capabilities", [])
    stats["capability_count"] = len(capabilities)
    stats["domain"] = index.get("domain")
    discovery: list[str] = []
    availability: list[str] = []
    completeness: list[str] = []
    reliability: list[str] = []
    mode_counts: Counter[str] = Counter()
    for row in capabilities:
        vector = parse_vector(row["vector"])
        discovery.append(vector["discovery"])
        availability.append(vector["availability"])
        completeness.append(vector["completeness"])
        reliability.append(vector["reliability"])
        for mode in row.get("consumption_modes", []):
            mode_counts[mode] += 1
        entry_path = repo_root / row["path"]
        if entry_path.exists():
            try:
                front_matter = parse_front_matter(entry_path)
                if not vectors_match(row["vector"], front_matter):
                    stats["vector_drift"].append(
                        {
                            "id": row["id"],
                            "index_vector": row["vector"],
                            "entry_vector": entry_vector(front_matter),
                        }
                    )
            except ValueError:
                stats["vector_drift"].append(
                    {"id": row["id"], "error": "invalid entry front matter"}
                )
    stats["histograms"] = {
        "discovery": _histogram(discovery, LEVEL_ORDERS["discovery"]),
        "availability": _histogram(availability, LEVEL_ORDERS["availability"]),
        "completeness": _histogram(completeness, LEVEL_ORDERS["completeness"]),
        "reliability": _histogram(reliability, LEVEL_ORDERS["reliability"]),
    }
    stats["reliability"] = {
        "r0_r2": sum(1 for level in reliability if level in {"R0", "R1", "R2"}),
        "r3_plus": sum(1 for level in reliability if level_at_least_reliability(level, "R3")),
    }
    stats["consumption_modes"] = dict(sorted(mode_counts.items()))
    if federation_ready:
        probe_url = raw_url
        if not probe_url and paths["index"].exists():
            probe_url = _default_raw_url(repo_root)
        if probe_url:
            stats["federation"]["raw_url_probe"] = _probe_url(probe_url)
        stats["federation"]["index_valid_yaml"] = _index_yaml_valid(paths["index"])
    stats["hub"] = _hub_summary(hub_url)
    return stats


def level_at_least_reliability(current: str, minimum: str) -> bool:
    order = LEVEL_ORDERS["reliability"]
    return order.index(current) >= order.index(minimum)


def _hub_configured() -> bool:
    import os

    return bool(os.environ.get("REUSE_SURFACE_URL"))


def _hub_summary(hub_url: str | None) -> dict[str, Any]:
    try:
        status, payload = hub_client.hub_list(hub_url)
    except (ValueError, urllib.error.URLError, OSError):
        return {"configured": False}
    if status != 200:
        return {"configured": True, "status": status, "error": payload}
    repos = payload.get("repos", [])
    return {
        "configured": True,
        "registration_count": payload.get("count", len(repos)),
        "enabled_count": sum(1 for repo in repos if repo.get("enabled", True)),
    }


def _default_raw_url(repo_root: Path) -> str | None:
    return None


def _index_yaml_valid(index_path: Path) -> bool:
    try:
        data = load_index_at(index_path)
        return isinstance(data, dict) and "capabilities" in data
    except (OSError, yaml.YAMLError):
        return False
def format_stats_markdown(stats: dict[str, Any]) -> str:
    lines = ["# Registry stats", ""]
    lines.append(f"**Repo:** `{stats['repo_root']}`")
    lines.append(f"**Capabilities:** {stats['capability_count']}")
    if stats.get("domain"):
        lines.append(f"**Domain:** `{stats['domain']}`")
    lines.append("")
    lines.append("## Layout")
    lines.append(f"- registry present: `{stats['registry_present']}`")
    lines.append(f"- index present: `{stats['index_present']}`")
    lines.append(f"- federation sources present: `{stats['sources_present']}`")
    lines.append("")
    rel = stats["reliability"]
    lines.append("## Reliability bands (index vectors)")
    lines.append(f"- R0-R2: **{rel['r0_r2']}**")
    lines.append(f"- R3+: **{rel['r3_plus']}**")
    lines.append("")
    for dimension, histogram in stats.get("histograms", {}).items():
        if not histogram:
            continue
        lines.append(f"## {dimension.title()} histogram")
        for level, count in histogram.items():
            lines.append(f"- `{level}`: {count}")
        lines.append("")
    if stats.get("consumption_modes"):
        lines.append("## Consumption modes")
        for mode, count in stats["consumption_modes"].items():
            lines.append(f"- `{mode}`: {count}")
        lines.append("")
    drift = stats.get("vector_drift", [])
    lines.append(f"## Vector drift: **{len(drift)}**")
    for item in drift[:10]:
        if "error" in item:
            lines.append(f"- `{item['id']}`: {item['error']}")
        else:
            lines.append(
                f"- `{item['id']}`: index `{item['index_vector']}` "
                f"≠ entry `{item['entry_vector']}`"
            )
    if len(drift) > 10:
        lines.append(f"- … and {len(drift) - 10} more")
    lines.append("")
    federation = stats.get("federation", {})
    if federation:
        lines.append("## Federation readiness")
        if "index_valid_yaml" in federation:
            lines.append(f"- index valid YAML: `{federation['index_valid_yaml']}`")
        probe = federation.get("raw_url_probe")
        if probe:
            status = probe.get("status")
            ok = probe.get("ok")
            lines.append(f"- raw URL probe: status **{status}** ({'ok' if ok else 'fail'})")
            lines.append(f" `{probe.get('url', '')}`")
        lines.append("")
    hub = stats.get("hub", {})
    if hub.get("configured"):
        lines.append("## Hub")
        if "registration_count" in hub:
            lines.append(
                f"- registrations: **{hub['registration_count']}** "
                f"({hub.get('enabled_count', 0)} enabled)"
            )
        elif "error" in hub:
            lines.append(f"- hub error: {hub['error']}")
        lines.append("")
    return "\n".join(lines) + "\n"


def format_stats_json(stats: dict[str, Any]) -> str:
    return json.dumps(stats, indent=2, sort_keys=True)


@@ -0,0 +1,69 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://reuse-surface.local/schemas/registry-draft.schema.json",
"title": "RegistryDiscoveryDraft",
"type": "object",
"additionalProperties": false,
"required": ["capabilities"],
"properties": {
"domain": {
"type": "string"
},
"capabilities": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": ["id", "name", "summary"],
"properties": {
"id": {
"type": "string",
"pattern": "^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$"
},
"name": {
"type": "string",
"minLength": 1
},
"summary": {
"type": "string",
"minLength": 1
},
"owner": {
"type": "string"
},
"vector": {
"type": "string",
"pattern": "^D[0-7] / A[0-7] / C[0-6] / R[0-6]$"
},
"tags": {
"type": "array",
"items": {
"type": "string"
}
},
"consumption_modes": {
"type": "array",
"items": {
"type": "string"
}
},
"discovery_intent": {
"type": "string"
},
"discovery_includes": {
"type": "array",
"items": {
"type": "string"
}
},
"discovery_excludes": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
}
}
}

tests/test_establish.py

@@ -0,0 +1,77 @@
from __future__ import annotations

from pathlib import Path
from unittest.mock import patch

import yaml

from reuse_surface.establish import (
    discover_capabilities,
    publish_check,
    scaffold_registry,
)
from reuse_surface.registry import registry_paths


def test_scaffold_creates_layout(tmp_path: Path):
    created = scaffold_registry(tmp_path, domain="helix_forge")
    paths = registry_paths(tmp_path)
    assert paths["index"] in created
    data = yaml.safe_load(paths["index"].read_text(encoding="utf-8"))
    assert data["capabilities"] == []
    assert data["domain"] == "helix_forge"


def test_scaffold_refuses_existing_without_force(tmp_path: Path):
    scaffold_registry(tmp_path)
    try:
        scaffold_registry(tmp_path)
        raise AssertionError("expected ValueError")
    except ValueError as exc:
        assert "already exists" in str(exc)


def test_publish_check_local_index(tmp_path: Path):
    scaffold_registry(tmp_path)
    result = publish_check(tmp_path)
    assert result["ok"] is True
    assert any(check["name"] == "local_index_yaml" for check in result["checks"])


def test_publish_check_raw_url_fail(tmp_path: Path):
    with patch(
        "reuse_surface.establish._probe_raw_url",
        return_value={"ok": False, "status": 303, "content_type": "text/html"},
    ):
        result = publish_check(
            tmp_path,
            raw_url="https://example.com/capabilities.yaml",
        )
    assert result["ok"] is False
    assert result.get("remediation")


def test_discover_dry_run_mock_llm(tmp_path: Path):
    scaffold_registry(tmp_path)
    (tmp_path / "README.md").write_text("# Demo service\n", encoding="utf-8")
    draft = {
        "domain": "helix_forge",
        "capabilities": [
            {
                "id": "capability.demo.sample",
                "name": "Sample",
                "summary": "Sample capability.",
                "owner": "demo",
                "vector": "D2 / A0 / C0 / R0",
                "tags": ["demo"],
                "consumption_modes": ["informational"],
                "discovery_intent": "Enable demo planning.",
            }
        ],
    }
    with patch(
        "reuse_surface.establish.request_registry_draft",
        return_value=draft,
    ):
        result = discover_capabilities(tmp_path, dry_run=True, apply=False)
    assert result["draft"]["capabilities"][0]["id"] == "capability.demo.sample"

tests/test_llm_bridge.py

@@ -0,0 +1,53 @@
from __future__ import annotations

import json
from unittest.mock import patch

import pytest

from reuse_surface.llm_bridge import (
    extract_json_object,
    llm_connect_url,
    request_registry_draft,
)


def test_extract_json_object_from_fenced_block():
    data = extract_json_object('```json\n{"capabilities": []}\n```')
    assert data == {"capabilities": []}


def test_llm_connect_url_missing_raises():
    with pytest.raises(ValueError, match="LLM_CONNECT_URL"):
        llm_connect_url(None)


def test_request_registry_draft_mock_http():
    payload = {
        "content": json.dumps(
            {
                "capabilities": [
                    {
                        "id": "capability.demo.sample",
                        "name": "Sample",
                        "summary": "Demo capability",
                    }
                ]
            }
        )
    }

    class FakeResponse:
        def __enter__(self):
            return self

        def __exit__(self, *args):
            return False

        def read(self):
            return json.dumps(payload).encode("utf-8")

    with patch.dict("os.environ", {"LLM_CONNECT_URL": "http://llm.test"}):
        with patch("urllib.request.urlopen", return_value=FakeResponse()):
            draft = request_registry_draft("test prompt")
    assert draft["capabilities"][0]["id"] == "capability.demo.sample"


@@ -0,0 +1,87 @@
from __future__ import annotations

from pathlib import Path

import yaml

from reuse_surface.establish import scaffold_registry
from reuse_surface.registry import load_index_at, registry_paths
from reuse_surface.registry_update import (
    apply_deterministic_suggestions,
    collect_deterministic_suggestions,
)


def _write_minimal_entry(tmp_path: Path, cap_id: str, vector: str) -> str:
    rel = "registry/capabilities/capability-demo-sample.md"
    d, a, c, r = [part.strip() for part in vector.split("/")]
    front_matter = {
        "id": cap_id,
        "name": "Sample",
        "summary": "Sample",
        "owner": "demo",
        "status": "draft",
        "domain": "helix_forge",
        "tags": ["demo"],
        "maturity": {
            "discovery": {"current": d, "target": "D5", "confidence": "low"},
            "availability": {"current": a, "target": "A3", "confidence": "low"},
        },
        "external_evidence": {
            "completeness": {"level": c, "confidence": "low"},
            "reliability": {"level": r, "confidence": "low"},
        },
        "discovery": {"intent": "demo", "includes": [], "excludes": []},
        "availability": {
            "current_level": a,
            "target_level": "A3",
            "current_artifacts": [],
            "consumption_modes": ["informational"],
        },
        "relations": {"depends_on": [], "supports": [], "related_to": []},
        "evidence": {"documentation": [], "tests": []},
        "consumer_guidance": {
            "recommended_for": [],
            "not_recommended_for": [],
            "known_limitations": [],
        },
    }
    path = tmp_path / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        "---\n"
        + yaml.safe_dump(front_matter, sort_keys=False)
        + "---\n",
        encoding="utf-8",
    )
    return rel


def test_vector_drift_suggestion(tmp_path: Path):
    scaffold_registry(tmp_path)
    cap_id = "capability.demo.sample"
    rel = _write_minimal_entry(tmp_path, cap_id, "D3 / A0 / C0 / R0")
    index_path = registry_paths(tmp_path)["index"]
    index = load_index_at(index_path)
    index["capabilities"] = [
        {
            "id": cap_id,
            "name": "Sample",
            "summary": "Sample",
            "vector": "D2 / A0 / C0 / R0",
            "domain": "helix_forge",
            "status": "draft",
            "owner": "demo",
            "path": rel,
            "tags": ["demo"],
            "consumption_modes": ["informational"],
        }
    ]
    index_path.write_text(yaml.safe_dump(index, sort_keys=False), encoding="utf-8")
    suggestions = collect_deterministic_suggestions(tmp_path, capability_id=cap_id)
    assert any(item["kind"] == "vector_drift" for item in suggestions)
    changed = apply_deterministic_suggestions(tmp_path, suggestions)
    assert changed
    updated = load_index_at(index_path)
    assert updated["capabilities"][0]["vector"] == "D3 / A0 / C0 / R0"

tests/test_stats.py

@@ -0,0 +1,20 @@
from __future__ import annotations

from pathlib import Path

from reuse_surface.stats import collect_stats, format_stats_markdown


def test_collect_stats_on_repo_root():
    root = Path(__file__).resolve().parent.parent
    stats = collect_stats(root)
    assert stats["capability_count"] == 20
    assert stats["index_present"] is True
    assert "discovery" in stats["histograms"]


def test_format_stats_markdown_contains_count():
    root = Path(__file__).resolve().parent.parent
    text = format_stats_markdown(collect_stats(root))
    assert "Capabilities:" in text
    assert "20" in text


@@ -104,6 +104,45 @@ reuse-surface hub sync --dry-run
Run the service locally: `REUSE_SURFACE_TOKEN=dev-token reuse-surface serve`
### stats
Registry maturity aggregates and federation readiness.
```bash
reuse-surface stats
reuse-surface stats --format json
reuse-surface stats --federation-ready --raw-url https://.../capabilities.yaml
```
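The JSON output lends itself to scripted gates. The following is a minimal sketch, assuming the payload carries the keys that `collect_stats` builds in `reuse_surface/stats.py` (`capability_count`, `reliability`, `vector_drift`). The sample payload and the `summarize_stats` helper are hypothetical and not part of the CLI.

```python
import json

# Hypothetical sample shaped like the dict built by collect_stats;
# real output carries more keys (histograms, federation, hub, ...).
SAMPLE = """
{
  "capability_count": 3,
  "reliability": {"r0_r2": 2, "r3_plus": 1},
  "vector_drift": [
    {"id": "capability.demo.sample",
     "index_vector": "D2 / A0 / C0 / R0",
     "entry_vector": "D3 / A0 / C0 / R0"}
  ]
}
"""


def summarize_stats(raw: str) -> dict:
    """Reduce a stats JSON payload to a few numbers a CI gate could check."""
    stats = json.loads(raw)
    drift = stats.get("vector_drift", [])
    return {
        "capabilities": stats.get("capability_count", 0),
        "r3_plus": stats.get("reliability", {}).get("r3_plus", 0),
        "drift_ids": [item["id"] for item in drift],
    }


summary = summarize_stats(SAMPLE)
print(summary)
```

In practice the string would come from `reuse-surface stats --format json` rather than an inline sample.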
### establish
Bootstrap or discover a capability registry in the current or target repo.
```bash
reuse-surface establish --scaffold --domain helix_forge
reuse-surface establish --scaffold --path ../state-hub
reuse-surface establish --publish-check --raw-url https://.../capabilities.yaml
export LLM_CONNECT_URL=http://127.0.0.1:8088
reuse-surface establish --discover --dry-run
reuse-surface establish --discover --apply
```
`--scaffold` creates the `registry/` layout. `--publish-check` probes the raw
URL and checks the local index YAML. `--discover` drafts capabilities via
llm-connect (optional).
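Drafts returned by `--discover` are validated against `schemas/registry-draft.schema.json`. As a rough illustration of what that schema enforces, the sketch below applies its `id` and `vector` patterns with the standard `re` module. The `draft_problems` helper is hypothetical, and the real bridge uses `jsonschema`, which checks more than these two fields.

```python
import re

# Patterns copied from schemas/registry-draft.schema.json.
ID_PATTERN = re.compile(r"^capability\.[a-z0-9]+(\.[a-z0-9-]+)+$")
VECTOR_PATTERN = re.compile(r"^D[0-7] / A[0-7] / C[0-6] / R[0-6]$")


def draft_problems(draft: dict) -> list:
    """Return human-readable problems for ids and vectors in a draft (hypothetical helper)."""
    problems = []
    for cap in draft.get("capabilities", []):
        cap_id = cap.get("id", "")
        if not ID_PATTERN.search(cap_id):
            problems.append(f"bad id: {cap_id!r}")
        vector = cap.get("vector")  # optional in the schema
        if vector is not None and not VECTOR_PATTERN.search(vector):
            problems.append(f"bad vector for {cap_id!r}: {vector!r}")
    return problems


good = {"capabilities": [{"id": "capability.demo.sample", "vector": "D2 / A0 / C0 / R0"}]}
bad = {"capabilities": [{"id": "demo.sample", "vector": "D2/A0/C0/R0"}]}
print(draft_problems(good))
print(draft_problems(bad))
```

The vector pattern requires the spaced form (`D2 / A0 / C0 / R0`), so a compact vector is rejected at the draft stage even though index comparison elsewhere ignores spaces.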
### update
Refresh registry metadata from repo drift signals.
```bash
reuse-surface update --capability capability.registry.register --dry-run
reuse-surface update --all --from-git-since HEAD~5 --apply
reuse-surface update --capability capability.registry.register --suggest-maturity
```
Deterministic patches (`vector_drift`, new `tests/` citations) apply with
`--apply`. LLM suggestions use `--suggest-maturity` and remain review-only.
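The `vector_drift` check can be sketched as follows. It mirrors the whitespace-insensitive comparison in `vectors_match` and the `apply_patch` shape emitted by `collect_deterministic_suggestions`; the `normalize_vector` and `drift_patch` names are illustrative only and do not exist in the package.

```python
from typing import Optional


def normalize_vector(vector: str) -> str:
    """Drop spaces so 'D2 / A0' and 'D2/A0' compare equal."""
    return vector.replace(" ", "")


def drift_patch(index_vector: str, entry_vector: str) -> Optional[dict]:
    """Return a patch that aligns the index with the entry, or None if they agree."""
    if normalize_vector(index_vector) == normalize_vector(entry_vector):
        return None
    # The entry front matter is treated as the source of truth.
    return {"field": "index.vector", "value": entry_vector}


print(drift_patch("D2 / A0 / C0 / R0", "D3 / A0 / C0 / R0"))
print(drift_patch("D2/A0/C0/R0", "D2 / A0 / C0 / R0"))
```

Treating the entry as authoritative keeps the patch deterministic: the index is a derived summary, so rewriting it never changes a maturity claim.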
### report cohorts
Export capability cohorts for planning or implementation reuse decisions.
@@ -140,6 +179,11 @@ Stable IDs and maturity fields are preserved for agent consumption (UC-RS-019).
| Publish catalog | `reuse-surface catalog` |
| Compose federation | `reuse-surface federation compose` |
| Sync federation manifest from hub | `reuse-surface hub sync` |
| Registry stats | `reuse-surface stats` |
| Bootstrap sibling registry | `reuse-surface establish --scaffold` |
| Verify index publish URL | `reuse-surface establish --publish-check` |
| Draft capabilities (LLM) | `reuse-surface establish --discover` |
| Refresh entry metadata | `reuse-surface update` |
| Planning cohort export | `reuse-surface report cohorts` |
| Relation graph | `reuse-surface graph` |


@@ -4,11 +4,11 @@ type: workplan
title: "Registry establish, update, and stats with optional llm-connect assist" title: "Registry establish, update, and stats with optional llm-connect assist"
domain: helix_forge domain: helix_forge
repo: reuse-surface repo: reuse-surface
status: ready status: finished
owner: codex owner: codex
topic_slug: helix-forge topic_slug: helix-forge
created: "2026-06-16" created: "2026-06-16"
updated: "2026-06-16" updated: "2026-06-17"
state_hub_workstream_id: "239a0077-8593-4dc7-918d-4c23895275f6" state_hub_workstream_id: "239a0077-8593-4dc7-918d-4c23895275f6"
--- ---
@@ -91,7 +91,7 @@ reuse-surface update --from-git-since HEAD~5 --apply
```task ```task
id: REUSE-WP-0013-T01 id: REUSE-WP-0013-T01
status: todo status: done
priority: high priority: high
state_hub_task_id: "98e65330-bfc7-4282-b372-d35542b899ce" state_hub_task_id: "98e65330-bfc7-4282-b372-d35542b899ce"
``` ```
@@ -112,7 +112,7 @@ Output: Markdown default, `--format json`. Pytest coverage. Document in
```task ```task
id: REUSE-WP-0013-T02 id: REUSE-WP-0013-T02
status: todo status: done
priority: high priority: high
state_hub_task_id: "b8fedd87-d0d3-41b4-9af8-e36d52bfe1c5" state_hub_task_id: "b8fedd87-d0d3-41b4-9af8-e36d52bfe1c5"
``` ```
@@ -131,7 +131,7 @@ No llm-connect dependency. Pytest with temp directory.
```task ```task
id: REUSE-WP-0013-T03 id: REUSE-WP-0013-T03
status: todo status: done
priority: medium priority: medium
state_hub_task_id: "2924d685-709f-4e28-886f-b363cd9c40b4" state_hub_task_id: "2924d685-709f-4e28-886f-b363cd9c40b4"
``` ```
@@ -147,7 +147,7 @@ Federation publish helper for sibling repo operators:
```task ```task
id: REUSE-WP-0013-T04 id: REUSE-WP-0013-T04
status: todo status: done
priority: high priority: high
state_hub_task_id: "650ebee5-b34b-4ed8-891d-d93aacebadd7" state_hub_task_id: "650ebee5-b34b-4ed8-891d-d93aacebadd7"
``` ```
@@ -166,7 +166,7 @@ Thin client boundary:
```task ```task
id: REUSE-WP-0013-T05 id: REUSE-WP-0013-T05
status: todo status: done
priority: medium priority: medium
state_hub_task_id: "b9154889-f538-4266-9918-b277f9a297be" state_hub_task_id: "b9154889-f538-4266-9918-b277f9a297be"
``` ```
@@ -185,7 +185,7 @@ LLM-assisted bootstrap after `--scaffold` or on empty registry:
```task ```task
id: REUSE-WP-0013-T06 id: REUSE-WP-0013-T06
status: todo status: done
priority: medium priority: medium
state_hub_task_id: "b79558da-54b2-4712-91d2-b298c7cf2c40" state_hub_task_id: "b79558da-54b2-4712-91d2-b298c7cf2c40"
``` ```
@@ -210,7 +210,7 @@ Targets: single `--capability`, `--all`, `--from-git-since <ref>`.
```task ```task
id: REUSE-WP-0013-T07 id: REUSE-WP-0013-T07
status: todo status: done
priority: low priority: low
state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291" state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291"
``` ```
@@ -227,13 +227,20 @@ state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291"
## Acceptance ## Acceptance
- [ ] `reuse-surface stats` reports maturity and federation-readiness aggregates - [x] `reuse-surface stats` reports maturity and federation-readiness aggregates
- [ ] `establish --scaffold` creates valid empty registry layout without overwrite accidents - [x] `establish --scaffold` creates valid empty registry layout without overwrite accidents
- [ ] `establish --publish-check` detects 303 vs 200 raw URL outcomes - [x] `establish --publish-check` detects 303 vs 200 raw URL outcomes
- [ ] llm-connect bridge works with mocked HTTP; fails clearly when URL unset - [x] llm-connect bridge works with mocked HTTP; fails clearly when URL unset
- [ ] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context - [x] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context
- [ ] `update --dry-run` reports deterministic drift on sample repo - [x] `update --dry-run` reports deterministic drift on sample repo
- [ ] All new commands documented; gap priority 24 recorded - [x] All new commands documented; gap priority 24 recorded
## Completion notes (2026-06-17)
- Modules: `stats.py`, `establish.py`, `registry_update.py`, `llm_bridge.py`
- Schema: `schemas/registry-draft.schema.json`
- `validate --root` for sibling repo validation after establish --apply
- 43 pytest tests; optional `pip install -e ".[llm]"` extra
## Out of scope ## Out of scope