diff --git a/.gitea/workflows/ci.yml b/.gitea/workflows/ci.yml
index ea0f94f..2b62dd8 100644
--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -32,6 +32,9 @@ jobs:
           reuse-surface catalog
           reuse-surface graph --check --fail-on-warnings

+      - name: Registry stats (informational)
+        run: reuse-surface stats || true
+
       - name: Planning cohort report (informational)
         run: reuse-surface report cohorts --planning-min D4 || true

diff --git a/SCOPE.md b/SCOPE.md
index e514e20..d543b02 100644
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -60,6 +60,11 @@ The MVP registry foundation, CLI tooling (REUSE-WP-0003), federation stack
   against `https://reuse.coulomb.social`
 - **Sync local federation manifest from hub** with `reuse-surface hub sync`
 - **Export planning cohorts** with `reuse-surface report cohorts`
+- **Bootstrap a sibling registry** with `reuse-surface establish --scaffold`
+- **Verify index publish readiness** with `reuse-surface establish --publish-check`
+- **View registry stats** with `reuse-surface stats`
+- **Draft or refresh entries** with `reuse-surface establish --discover` and
+  `reuse-surface update` (optional llm-connect backend)
 - **Run the hub locally or in a container** with `reuse-surface serve`
 - **Generate relation graphs** with `reuse-surface graph`
 - **Explore relations interactively** at `docs/graph/index.html`
@@ -104,8 +109,8 @@ See `tools/README.md` for command reference.
 - **Federated index:** `registry/indexes/federated.yaml` (local compose).
 - **Relation graph:** `docs/graph/capability-graph.mmd`, `docs/graph/index.html`.
 - **Searchable catalog:** `docs/catalog/search.html`.
-- **Workplans:** REUSE-WP-0001 through REUSE-WP-0011 finished; WP-0011 archived;
-  **REUSE-WP-0012** finished (federation scale + intent alignment).
+- **Workplans:** REUSE-WP-0001 through REUSE-WP-0012 finished/archived;
+  **REUSE-WP-0013** finished (registry establish/update/stats).
 - **Assessment history:** `history/2026-06-15-intent-scope-assessment.md`.
 - **Self-assessed vector:** `D5 / A4 / C5 / R3` (see `docs/IntentScopeGapAnalysis.md`).

diff --git a/docs/IntentScopeGapAnalysis.md b/docs/IntentScopeGapAnalysis.md
index 93446a0..e09ef88 100644
--- a/docs/IntentScopeGapAnalysis.md
+++ b/docs/IntentScopeGapAnalysis.md
@@ -3,7 +3,7 @@
 **Repository:** `reuse-surface`
 **Artifact:** `docs/IntentScopeGapAnalysis.md`
 **Status:** Living analysis
-**Updated:** 2026-06-16
+**Updated:** 2026-06-17
 **Purpose:** Record alignment, drift, and open gaps between declared intent and
 current delivered scope so future workplans can close them deliberately.

@@ -30,6 +30,8 @@ four maturity dimensions, and human/agent consumers.
    standardization tracker still manual.
 3. **Hub automation** — `hub sync` shipped; polling/webhooks still absent.
 4. **Managed platform posture** — A5 container documented; A6/Postgres deferred.
+5. **Registry bootstrap in sibling repos** — `establish`/`update`/`stats` shipped;
+   sibling adoption still operator-driven.

 **Current reuse-surface product vector (self-assessment):** `D5 / A4 / C5 / R3`

@@ -197,8 +199,10 @@ archived workplans under `workplans/archived/`.
 | 21 | INTENT layout sync | Update INTENT.md tree and example entry shape | **Closed** (WP-0012) |
 | 22 | Hub hardening | Postgres option, backup, documented SLO (A5→A6 path) | **Closed** (doc; implementation deferred) |
 | 23 | External evidence program | Raise catalog R levels with consumer_feedback | **Closed** (checklist + 3 entries; telemetry deferred) |
+| 24 | Registry bootstrap tooling | `establish`, `update`, `stats` for sibling repos | **Closed** (WP-0013) |

-**Workplan:** `REUSE-WP-0012` (finished). **Assessment snapshots:**
+**Workplan:** `REUSE-WP-0013` (finished). Prior: `REUSE-WP-0012` (finished).
+**Assessment snapshots:**
 `history/2026-06-15-intent-scope-assessment.md`,
 `history/2026-06-16-hub-registration-blocks.md`.

@@ -227,4 +231,5 @@ archived workplans under `workplans/archived/`.
 | 2026-06-15 | REUSE-WP-0011 closed priority 17; hub live at reuse.coulomb.social |
 | 2026-06-15 | Post-WP-0011 refresh: 20 capabilities, vector D5/A4/C4/R3, priorities 18–23 proposed |
 | 2026-06-15 | REUSE-WP-0012 proposed; assessment archived in `history/2026-06-15-intent-scope-assessment.md` |
-| 2026-06-16 | REUSE-WP-0012 closed priorities 19–23; priority 18 deferred on sibling index blocks; vector C5 |
\ No newline at end of file
+| 2026-06-16 | REUSE-WP-0012 closed priorities 19–23; priority 18 deferred on sibling index blocks; vector C5 |
+| 2026-06-17 | REUSE-WP-0013 closed priority 24; establish/update/stats + optional llm-connect assist |
\ No newline at end of file
diff --git a/docs/RegistryFederation.md b/docs/RegistryFederation.md
index 0169823..a729440 100644
--- a/docs/RegistryFederation.md
+++ b/docs/RegistryFederation.md
@@ -97,6 +97,18 @@ curl -fsS "" | head
 source) to an environment variable holding a Bearer token or full header
 value. The hub stores `auth_env` / `auth_header` names only — never secret values.

+### Sibling onboarding (CLI)
+
+```bash
+cd ../state-hub
+reuse-surface establish --scaffold --domain helix_forge
+# optional: LLM_CONNECT_URL=... reuse-surface establish --discover --dry-run
+reuse-surface validate --root .
+git push origin main
+reuse-surface establish --publish-check \
+  --raw-url https://gitea.coulomb.social/coulomb/state-hub/raw/main/registry/indexes/capabilities.yaml
+```
+
 ### Registration checklist

 1. Merge capability index to the default branch.
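
Reviewer note: the last onboarding step above, `establish --publish-check`, is backed by `publish_check` in `reuse_surface/establish.py` further down this diff. The sketch below illustrates only its pass/fail aggregation, assuming the HTTP status and the parsed response body are already in hand. `summarize_publish_check` is a hypothetical name used for illustration and is not part of the change set.

```python
from typing import Any


def summarize_publish_check(status: int, parsed_body: Any) -> dict[str, Any]:
    """Combine a raw-URL probe status and a parsed body into one result.

    Hypothetical helper: mirrors the check names used in the diff
    (raw_url_probe, raw_url_body) but performs no network access.
    """
    checks: list[dict[str, Any]] = [
        {"name": "raw_url_probe", "ok": status == 200, "detail": f"HTTP {status}"}
    ]
    # The body is only inspected when the probe itself succeeded.
    if status == 200:
        shape_ok = isinstance(parsed_body, dict) and "capabilities" in parsed_body
        checks.append(
            {
                "name": "raw_url_body",
                "ok": shape_ok,
                "detail": "index shape ok" if shape_ok else "not index YAML",
            }
        )
    # The overall result passes only when every individual check passed.
    return {"ok": all(check["ok"] for check in checks), "checks": checks}
```

Keeping the aggregation separate from the HTTP calls makes the pass/fail rule easy to exercise without a live Gitea instance; the shipped function interleaves both.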
diff --git a/pyproject.toml b/pyproject.toml
index 1b19b4b..9cff6bf 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -20,6 +20,9 @@ dev = [
     "httpx>=0.27",
     "pytest>=8.0",
 ]
+llm = [
+    "llm-connect",
+]

 [project.scripts]
 reuse-surface = "reuse_surface.cli:main"
diff --git a/registry/README.md b/registry/README.md
index a4779b0..583d3b5 100644
--- a/registry/README.md
+++ b/registry/README.md
@@ -35,6 +35,21 @@ registry/

 Missing evidence is acceptable in the MVP when it is explicit rather than hidden.

+## LLM-assisted discover review checklist
+
+When using `reuse-surface establish --discover` (llm-connect backend):
+
+- [ ] Every proposed `id` follows `capability..` and is not a duplicate
+- [ ] `summary`, `discovery.intent`, and maturity vectors match repo reality
+- [ ] `owner` reflects the delivering repository or team
+- [ ] Relations are empty or manually added after human review
+- [ ] Run `reuse-surface validate --root ` before merge
+- [ ] Run `reuse-surface establish --publish-check` after pushing to `main`
+
+Discover drafts start at low maturity with explicit auto-draft risks in
+`known_reliability_risks`. Promote only with evidence per
+`specs/CapabilityMaturityStandard.md`.
+
 ## Manual validation checklist

 Use this checklist until an automated CLI validator exists.
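
Reviewer note: the first item of the discover review checklist above (ID shape and no duplicates) can be checked mechanically. The sketch below assumes the ID pattern quoted in the discover prompt in `reuse_surface/establish.py` in this diff; whether the CLI validator enforces the same pattern is not shown here. `review_ids` is a hypothetical helper, not part of the change set.

```python
import re

# Pattern copied from the discover prompt in this diff (assumed to be the
# intended ID shape; the JSON schema itself is not visible in this section).
CAPABILITY_ID = re.compile(r"^capability\.[a-z0-9]+(\.[a-z0-9-]+)+$")


def review_ids(proposed: list[str], existing: set[str]) -> dict[str, list[str]]:
    """Sort proposed capability IDs into malformed and duplicate buckets."""
    problems: dict[str, list[str]] = {"malformed": [], "duplicate": []}
    seen = set(existing)
    for cap_id in proposed:
        if not CAPABILITY_ID.fullmatch(cap_id):
            problems["malformed"].append(cap_id)
        elif cap_id in seen:
            # Duplicate of an indexed ID or of an earlier proposal.
            problems["duplicate"].append(cap_id)
        else:
            seen.add(cap_id)
    return problems
```

A draft with empty `malformed` and `duplicate` lists still needs the remaining checklist items reviewed by a human; this only covers the mechanical part.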
diff --git a/reuse_surface/cli.py b/reuse_surface/cli.py
index 4188516..54fc2fb 100644
--- a/reuse_surface/cli.py
+++ b/reuse_surface/cli.py
@@ -26,21 +26,48 @@ from reuse_surface.reports import (
     format_cohort_markdown,
     select_cohort,
 )
+from reuse_surface.establish import (
+    discover_capabilities,
+    format_publish_check_markdown,
+    publish_check,
+    scaffold_next_steps,
+    scaffold_registry,
+)
+from reuse_surface.registry_update import (
+    apply_deterministic_suggestions,
+    collect_deterministic_suggestions,
+    format_suggestions_json,
+    format_suggestions_markdown,
+    suggest_llm_updates,
+)
+from reuse_surface.stats import collect_stats, format_stats_json, format_stats_markdown
 from reuse_surface.registry import (
     ROOT,
     capability_paths,
     level_at_least,
     load_index,
+    load_index_at,
     load_schema,
     parse_front_matter,
     parse_vector,
+    registry_paths,
 )


-def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[str]:
+def _registry_root(args: argparse.Namespace) -> Path:
+    if getattr(args, "root", None):
+        return Path(args.root).resolve()
+    return ROOT
+
+
+def _check_index_drift(
+    entry_paths: list[Path],
+    index: dict[str, Any],
+    repo_root: Path,
+) -> list[str]:
     warnings: list[str] = []
     indexed_paths = {item["path"] for item in index.get("capabilities", [])}
-    file_paths = {str(path.relative_to(ROOT)) for path in entry_paths}
+    file_paths = {str(path.relative_to(repo_root)) for path in entry_paths}
     for path in sorted(file_paths - indexed_paths):
         warnings.append(f"index drift: entry file not indexed: {path}")
     for path in sorted(indexed_paths - file_paths):
@@ -48,11 +75,22 @@ def _check_index_drift(entry_paths: list[Path], index: dict[str, Any]) -> list[s
     return warnings


-def cmd_validate(args: argparse.Namespace) -> int:
+def _capability_paths_for(repo_root: Path, target: Path | None) -> list[Path]:
+    if target is not None:
+        return [target]
+    cap_dir = registry_paths(repo_root)["capabilities"]
+    return sorted(path for path in
cap_dir.glob("*.md") if path.name != ".gitkeep")
+
+
+def _run_validate(
+    repo_root: Path,
+    *,
+    target: Path | None,
+    relations: bool,
+) -> tuple[list[str], list[str], list[Path]]:
     schema = load_schema()
     validator = Draft202012Validator(schema)
-    target = Path(args.path) if args.path else None
-    paths = capability_paths(target)
+    paths = _capability_paths_for(repo_root, target)
     errors: list[str] = []
     warnings: list[str] = []

@@ -67,10 +105,23 @@ def cmd_validate(args: argparse.Namespace) -> int:
             errors.append(f"{path}: {location}: {error.message}")

     if not target:
-        index = load_index()
-        warnings.extend(_check_index_drift(paths, index))
-    if args.relations:
+        index_path = registry_paths(repo_root)["index"]
+        if index_path.exists():
+            index = load_index_at(index_path)
+            warnings.extend(_check_index_drift(paths, index, repo_root))
+    if relations and repo_root == ROOT:
         warnings.extend(check_relations())
+    return errors, warnings, paths
+
+
+def cmd_validate(args: argparse.Namespace) -> int:
+    repo_root = _registry_root(args)
+    target = Path(args.path) if args.path else None
+    if target and not target.is_absolute():
+        target = repo_root / target
+    errors, warnings, paths = _run_validate(
+        repo_root, target=target, relations=args.relations
+    )

     for warning in warnings:
         print(f"warning: {warning}", file=sys.stderr)
@@ -329,6 +380,117 @@ def cmd_hub_sync(args: argparse.Namespace) -> int:
     return 0


+def cmd_stats(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    stats = collect_stats(
+        repo_root,
+        federation_ready=args.federation_ready,
+        raw_url=args.raw_url,
+        hub_url=getattr(args, "hub_url", None),
+    )
+    if args.format == "json":
+        print(format_stats_json(stats))
+    else:
+        print(format_stats_markdown(stats), end="")
+    return 0
+
+
+def cmd_establish(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    try:
+        if args.scaffold:
+            created = scaffold_registry(
+                repo_root, domain=args.domain,
force=args.force
+            )
+            for path in created:
+                print(f"ok: wrote {path.relative_to(repo_root)}")
+            print(scaffold_next_steps(repo_root))
+            return 0
+        if args.publish_check:
+            result = publish_check(repo_root, raw_url=args.raw_url)
+            print(format_publish_check_markdown(result), end="")
+            return 0 if result["ok"] else 1
+        if args.discover:
+            result = discover_capabilities(
+                repo_root,
+                domain=args.domain,
+                dry_run=not args.apply,
+                apply=args.apply,
+                llm_url=args.llm_url,
+                context_max_files=args.context_max_files,
+            )
+            if result.get("dry_run"):
+                print(yaml.safe_dump(result["draft"], sort_keys=False))
+                return 0
+            for path in result.get("written", []):
+                print(f"ok: wrote {path}")
+            validate_args = argparse.Namespace(
+                path=None,
+                root=str(repo_root),
+                relations=False,
+                fail_on_warnings=True,
+            )
+            return cmd_validate(validate_args)
+    except ValueError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 1
+    print("error: specify --scaffold, --publish-check, or --discover", file=sys.stderr)
+    return 1
+
+
+def cmd_update(args: argparse.Namespace) -> int:
+    repo_root = Path(args.path or ".").resolve()
+    try:
+        capability_id = None if args.all else args.capability
+        if not args.all and not args.capability:
+            print("error: specify --capability or --all", file=sys.stderr)
+            return 1
+        if args.suggest_maturity:
+            cap_ids = [args.capability] if args.capability else []
+            if args.all:
+                index = load_index_at(registry_paths(repo_root)["index"])
+                cap_ids = [row["id"] for row in index.get("capabilities", [])]
+            payload = {
+                "suggestions": [
+                    suggest_llm_updates(
+                        repo_root,
+                        cap_id,
+                        git_since=args.from_git_since,
+                        llm_url=args.llm_url,
+                    )
+                    for cap_id in cap_ids
+                ]
+            }
+            print(json.dumps(payload, indent=2, sort_keys=True))
+            return 0
+
+        suggestions = collect_deterministic_suggestions(
+            repo_root,
+            capability_id=capability_id,
+            git_since=args.from_git_since,
+        )
+        if args.apply:
+            changed = apply_deterministic_suggestions(repo_root, suggestions)
+
for line in changed:
+                print(f"ok: {line}")
+            validate_args = argparse.Namespace(
+                path=None,
+                root=str(repo_root),
+                relations=False,
+                fail_on_warnings=True,
+            )
+            return cmd_validate(validate_args)
+
+        if args.format == "json":
+            print(format_suggestions_json(suggestions))
+        else:
+            print(format_suggestions_markdown(suggestions), end="")
+        return 0
+    except ValueError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 1
+
+
 def cmd_report_cohorts(args: argparse.Namespace) -> int:
     filters = cohort_filters_from_args(args)
     matches = select_cohort(filters)
@@ -399,6 +561,10 @@ def main(argv: list[str] | None = None) -> int:
         action="store_true",
         help="exit non-zero when warnings are present",
     )
+    validate.add_argument(
+        "--root",
+        help="registry repo root (default: reuse-surface install root)",
+    )
     validate.set_defaults(func=cmd_validate)

     federation = subparsers.add_parser(
@@ -539,6 +705,41 @@ def main(argv: list[str] | None = None) -> int:
     )
     cohorts.set_defaults(func=cmd_report_cohorts)

+    stats = subparsers.add_parser("stats", help="registry maturity and federation stats")
+    stats.add_argument("--path", help="repo root (default: cwd)")
+    stats.add_argument("--federation-ready", action="store_true")
+    stats.add_argument("--raw-url", help="probe federation raw index URL")
+    stats.add_argument("--hub-url", help="hub base URL (or REUSE_SURFACE_URL)")
+    stats.add_argument("--format", choices=["markdown", "json"], default="markdown")
+    stats.set_defaults(func=cmd_stats)
+
+    establish = subparsers.add_parser(
+        "establish", help="bootstrap or discover capability registry"
+    )
+    establish.add_argument("--path", help="target repo root (default: cwd)")
+    establish.add_argument("--domain", default="helix_forge")
+    establish.add_argument("--force", action="store_true")
+    establish.add_argument("--scaffold", action="store_true")
+    establish.add_argument("--publish-check", action="store_true")
+    establish.add_argument("--discover", action="store_true")
+
establish.add_argument("--dry-run", action="store_true", help="discover preview (default)")
+    establish.add_argument("--apply", action="store_true", help="discover write + validate")
+    establish.add_argument("--raw-url", help="raw Gitea index URL for publish-check")
+    establish.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
+    establish.add_argument("--context-max-files", type=int, default=12)
+    establish.set_defaults(func=cmd_establish)
+
+    update = subparsers.add_parser("update", help="refresh registry metadata from repo signals")
+    update.add_argument("--path", help="repo root (default: cwd)")
+    update.add_argument("--capability", help="single capability id")
+    update.add_argument("--all", action="store_true")
+    update.add_argument("--from-git-since", help="git ref for change detection")
+    update.add_argument("--apply", action="store_true")
+    update.add_argument("--suggest-maturity", action="store_true")
+    update.add_argument("--llm-url", help="llm-connect base URL (or LLM_CONNECT_URL)")
+    update.add_argument("--format", choices=["markdown", "json"], default="markdown")
+    update.set_defaults(func=cmd_update)
+
     args = parser.parse_args(argv)
     return args.func(args)

diff --git a/reuse_surface/establish.py b/reuse_surface/establish.py
new file mode 100644
index 0000000..b41f933
--- /dev/null
+++ b/reuse_surface/establish.py
@@ -0,0 +1,448 @@
+from __future__ import annotations
+
+import json
+import textwrap
+import urllib.error
+import urllib.request
+from datetime import date
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from reuse_surface.llm_bridge import request_registry_draft
+from reuse_surface.registry import load_index_at, registry_paths
+
+SCAFFOLD_README = """# Capability Registry
+
+Markdown-first capability index for federation and reuse planning.
+
+## Authoring
+
+1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
+2.
Add the row to `indexes/capabilities.yaml`.
+3. Run `reuse-surface validate` from a checkout with the CLI installed.
+4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
+
+Federation contract: reuse-surface `docs/RegistryFederation.md`.
+"""
+
+CONTEXT_FILES = (
+    "INTENT.md",
+    "SCOPE.md",
+    "AGENTS.md",
+    "README.md",
+    "pyproject.toml",
+    "Cargo.toml",
+    "go.mod",
+)
+
+
+def scaffold_registry(
+    repo_root: Path,
+    *,
+    domain: str = "helix_forge",
+    force: bool = False,
+) -> list[Path]:
+    paths = registry_paths(repo_root)
+    created: list[Path] = []
+    if paths["registry"].exists() and not force:
+        raise ValueError(
+            f"registry already exists at {paths['registry']}; use --force to overwrite"
+        )
+
+    paths["registry"].mkdir(parents=True, exist_ok=True)
+    paths["capabilities"].mkdir(parents=True, exist_ok=True)
+    paths["index"].parent.mkdir(parents=True, exist_ok=True)
+
+    readme = paths["registry"] / "README.md"
+    if force or not readme.exists():
+        readme.write_text(SCAFFOLD_README, encoding="utf-8")
+        created.append(readme)
+
+    gitkeep = paths["capabilities"] / ".gitkeep"
+    if force or not gitkeep.exists():
+        gitkeep.write_text("", encoding="utf-8")
+        created.append(gitkeep)
+
+    index_data = {
+        "version": 1,
+        "updated": date.today().isoformat(),
+        "domain": domain,
+        "capabilities": [],
+    }
+    if force or not paths["index"].exists():
+        paths["index"].write_text(
+            yaml.safe_dump(index_data, sort_keys=False, allow_unicode=True),
+            encoding="utf-8",
+        )
+        created.append(paths["index"])
+    return created
+
+
+def scaffold_next_steps(repo_root: Path) -> str:
+    return textwrap.dedent(
+        f"""
+        Next steps:
+        1. Add capability entries under {repo_root / 'registry/capabilities'}
+        2. Update {repo_root / 'registry/indexes/capabilities.yaml'}
+        3. reuse-surface validate
+        4. git push origin main
+        5. reuse-surface establish --publish-check --raw-url
+        6.
reuse-surface hub register --repo --url
+        """
+    ).strip()
+
+
+def publish_check(
+    repo_root: Path,
+    *,
+    raw_url: str | None = None,
+) -> dict[str, Any]:
+    paths = registry_paths(repo_root)
+    result: dict[str, Any] = {
+        "repo_root": str(repo_root),
+        "checks": [],
+        "ok": True,
+    }
+
+    if paths["index"].exists():
+        try:
+            data = load_index_at(paths["index"])
+            valid = isinstance(data, dict) and isinstance(data.get("capabilities"), list)
+            result["checks"].append(
+                {
+                    "name": "local_index_yaml",
+                    "ok": valid,
+                    "detail": f"{len(data.get('capabilities', []))} capabilities"
+                    if valid
+                    else "invalid structure",
+                }
+            )
+            if not valid:
+                result["ok"] = False
+        except (OSError, yaml.YAMLError) as exc:
+            result["checks"].append(
+                {"name": "local_index_yaml", "ok": False, "detail": str(exc)}
+            )
+            result["ok"] = False
+    else:
+        result["checks"].append(
+            {
+                "name": "local_index_yaml",
+                "ok": False,
+                "detail": "registry/indexes/capabilities.yaml missing",
+            }
+        )
+        result["ok"] = False
+
+    if raw_url:
+        probe = _probe_raw_url(raw_url)
+        result["checks"].append(
+            {
+                "name": "raw_url_probe",
+                "ok": probe["ok"],
+                "detail": f"HTTP {probe.get('status')} {probe.get('content_type', '')}".strip(),
+                "url": raw_url,
+            }
+        )
+        if probe["ok"]:
+            body_probe = _fetch_yaml_snippet(raw_url)
+            result["checks"].append(body_probe)
+            if not body_probe.get("ok"):
+                result["ok"] = False
+        else:
+            result["ok"] = False
+            result["remediation"] = (
+                "Merge registry/indexes/capabilities.yaml to main and confirm "
+                "Gitea raw URL returns 200 YAML. See docs/RegistryFederation.md."
+            )
+
+    return result
+
+
+def _probe_raw_url(url: str) -> dict[str, Any]:
+    request = urllib.request.Request(
+        url,
+        method="HEAD",
+        headers={"User-Agent": "reuse-surface/0.1"},
+    )
+    try:
+        with urllib.request.urlopen(request, timeout=30) as response:
+            return {
+                "ok": response.status == 200,
+                "status": response.status,
+                "content_type": response.headers.get("Content-Type", ""),
+            }
+    except urllib.error.HTTPError as exc:
+        return {
+            "ok": False,
+            "status": exc.code,
+            "content_type": exc.headers.get("Content-Type", ""),
+        }
+
+
+def _fetch_yaml_snippet(url: str) -> dict[str, Any]:
+    request = urllib.request.Request(url, headers={"User-Agent": "reuse-surface/0.1"})
+    try:
+        with urllib.request.urlopen(request, timeout=30) as response:
+            body = response.read().decode("utf-8")
+    except urllib.error.HTTPError as exc:
+        return {"name": "raw_url_body", "ok": False, "detail": f"HTTP {exc.code}"}
+    except urllib.error.URLError as exc:
+        return {"name": "raw_url_body", "ok": False, "detail": str(exc.reason)}
+    try:
+        data = yaml.safe_load(body)
+    except yaml.YAMLError as exc:
+        return {"name": "raw_url_body", "ok": False, "detail": str(exc)}
+    ok = isinstance(data, dict) and "capabilities" in data
+    return {
+        "name": "raw_url_body",
+        "ok": ok,
+        "detail": "valid capabilities.yaml shape" if ok else "body is not valid index YAML",
+    }
+
+
+def collect_context(repo_root: Path, *, max_files: int = 12) -> str:
+    chunks: list[str] = []
+    used = 0
+    for name in CONTEXT_FILES:
+        if used >= max_files:
+            break
+        path = repo_root / name
+        if path.is_file():
+            chunks.append(f"### {name}\n{path.read_text(encoding='utf-8')[:8000]}")
+            used += 1
+    pkg_dirs = sorted(
+        [
+            item
+            for item in repo_root.iterdir()
+            if item.is_dir()
+            and not item.name.startswith(".")
+            and item.name not in {"registry", "tests", "docs", "workplans", "node_modules"}
+        ]
+    )
+    for pkg in pkg_dirs[: max(0, max_files - used)]:
+        init = pkg / "__init__.py"
+        if init.exists():
+            chunks.append(f"###
{pkg.name}/__init__.py\n{init.read_text(encoding='utf-8')[:2000]}")
+    return "\n\n".join(chunks)
+
+
+def build_discover_prompt(context: str, domain: str) -> str:
+    schema_hint = json.dumps(
+        {
+            "domain": domain,
+            "capabilities": [
+                {
+                    "id": "capability.domain.name",
+                    "name": "Human Name",
+                    "summary": "One sentence.",
+                    "owner": "team",
+                    "vector": "D2 / A0 / C0 / R0",
+                    "tags": ["tag"],
+                    "consumption_modes": ["informational"],
+                    "discovery_intent": "What this enables.",
+                    "discovery_includes": ["included behavior"],
+                    "discovery_excludes": ["excluded behavior"],
+                }
+            ],
+        },
+        indent=2,
+    )
+    return textwrap.dedent(
+        f"""
+        You are drafting a capability registry index for helix_forge reuse-surface.
+
+        Return ONLY a JSON object matching this shape (no markdown fences):
+        {schema_hint}
+
+        Rules:
+        - Propose 1-5 distinct capabilities grounded in the repository context.
+        - Use IDs matching ^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$
+        - Default vector D2 / A0 / C0 / R0 unless strong delivery evidence exists.
+        - domain: {domain}
+
+        Repository context:
+        {context}
+        """
+    ).strip()
+
+
+def discover_capabilities(
+    repo_root: Path,
+    *,
+    domain: str = "helix_forge",
+    dry_run: bool = True,
+    apply: bool = False,
+    llm_url: str | None = None,
+    context_max_files: int = 12,
+) -> dict[str, Any]:
+    if apply and dry_run:
+        raise ValueError("use either --dry-run or --apply, not both")
+    if not apply and not dry_run:
+        dry_run = True
+
+    context = collect_context(repo_root, max_files=context_max_files)
+    if not context.strip():
+        raise ValueError("no context files found for discovery")
+
+    prompt = build_discover_prompt(context, domain)
+    draft = request_registry_draft(
+        prompt,
+        base_url=llm_url,
+        config={"temperature": 0.2, "max_tokens": 4000},
+    )
+
+    result: dict[str, Any] = {"draft": draft, "written": [], "dry_run": dry_run}
+    if dry_run:
+        return result
+
+    paths = registry_paths(repo_root)
+    if not paths["index"].exists():
+        scaffold_registry(repo_root, domain=domain, force=False)
+
+    index = load_index_at(paths["index"]) if paths["index"].exists() else {
+        "version": 1,
+        "domain": domain,
+        "capabilities": [],
+    }
+    existing_ids = {row["id"] for row in index.get("capabilities", [])}
+
+    for item in draft.get("capabilities", []):
+        cap_id = item["id"]
+        if cap_id in existing_ids:
+            continue
+        filename = cap_id.replace(".", "-") + ".md"
+        rel_path = f"registry/capabilities/{filename}"
+        entry_path = repo_root / rel_path
+        entry_body = _render_entry_from_draft(item, domain)
+        entry_path.parent.mkdir(parents=True, exist_ok=True)
+        entry_path.write_text(entry_body, encoding="utf-8")
+        vector = item.get("vector", "D2 / A0 / C0 / R0")
+        index.setdefault("capabilities", []).append(
+            {
+                "id": cap_id,
+                "name": item["name"],
+                "summary": item["summary"],
+                "vector": vector,
+                "domain": domain,
+                "status": "draft",
+                "owner": item.get("owner", repo_root.name),
+                "path": rel_path,
+                "tags": item.get("tags", []),
+                "consumption_modes": item.get("consumption_modes",
["informational"]),
+            }
+        )
+        result["written"].append(rel_path)
+
+    index["updated"] = date.today().isoformat()
+    index["domain"] = draft.get("domain", domain)
+    paths["index"].write_text(
+        yaml.safe_dump(index, sort_keys=False, allow_unicode=True),
+        encoding="utf-8",
+    )
+    result["written"].append(str(paths["index"].relative_to(repo_root)))
+    return result
+
+
+def _render_entry_from_draft(item: dict[str, Any], domain: str) -> str:
+    vector = item.get("vector", "D2 / A0 / C0 / R0")
+    d, a, c, r = [part.strip() for part in vector.split("/")]
+    front_matter = {
+        "id": item["id"],
+        "name": item["name"],
+        "summary": item["summary"],
+        "owner": item.get("owner", domain),
+        "status": "draft",
+        "domain": domain,
+        "tags": item.get("tags") or ["draft"],
+        "maturity": {
+            "discovery": {
+                "current": d,
+                "target": "D5",
+                "confidence": "low",
+                "rationale": "Auto-drafted by reuse-surface establish --discover; review required.",
+            },
+            "availability": {
+                "current": a,
+                "target": "A3",
+                "confidence": "low",
+                "rationale": "Auto-drafted; confirm consumption modes and artifacts.",
+            },
+        },
+        "external_evidence": {
+            "completeness": {
+                "level": c,
+                "confidence": "low",
+                "basis": "scope_vs_intent_and_consumer_expectations",
+                "satisfied_expectations": [],
+                "broken_expectations": [],
+                "out_of_scope_expectations": [],
+            },
+            "reliability": {
+                "level": r,
+                "confidence": "low",
+                "basis": "consumer_quality_signals",
+                "known_reliability_risks": ["auto-drafted entry without consumer evidence"],
+            },
+        },
+        "discovery": {
+            "intent": item.get("discovery_intent", item["summary"]),
+            "includes": item.get("discovery_includes") or [],
+            "excludes": item.get("discovery_excludes") or [],
+            "assumptions": [],
+            "use_cases": [],
+            "research_memos": [],
+        },
+        "availability": {
+            "current_level": a,
+            "target_level": "A3",
+            "current_artifacts": [],
+            "target_artifacts": [],
+            "consumption_modes": item.get("consumption_modes") or ["informational"],
+        },
+        "relations":
{"depends_on": [], "supports": [], "related_to": []},
+        "evidence": {
+            "documentation": [],
+            "tests": [],
+            "consumer_feedback": [],
+            "bug_reports": [],
+            "incidents": [],
+        },
+        "consumer_guidance": {
+            "recommended_for": ["planning reuse after human review"],
+            "not_recommended_for": ["implementation reuse before validation"],
+            "known_limitations": ["discover draft — verify maturity claims"],
+        },
+        "promotion_history": [],
+    }
+    markdown = (
+        f"# {item['name']}\n\n"
+        "Auto-drafted capability entry. Review maturity, evidence, and relations "
+        "before promoting.\n"
+    )
+    return (
+        "---\n"
+        + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True)
+        + "---\n\n"
+        + markdown
+    )
+
+
+def format_publish_check_markdown(result: dict[str, Any]) -> str:
+    lines = ["# Federation publish check", ""]
+    lines.append(f"**Repo:** `{result['repo_root']}`")
+    lines.append(f"**Result:** {'PASS' if result['ok'] else 'FAIL'}")
+    lines.append("")
+    for check in result["checks"]:
+        status = "ok" if check["ok"] else "FAIL"
+        detail = check.get("detail", "")
+        name = check["name"]
+        lines.append(f"- **{name}**: {status} — {detail}")
+        if check.get("url"):
+            lines.append(f" `{check['url']}`")
+    if result.get("remediation"):
+        lines.append("")
+        lines.append(f"**Remediation:** {result['remediation']}")
+    return "\n".join(lines) + "\n"
\ No newline at end of file
diff --git a/reuse_surface/llm_bridge.py b/reuse_surface/llm_bridge.py
new file mode 100644
index 0000000..212e5bc
--- /dev/null
+++ b/reuse_surface/llm_bridge.py
@@ -0,0 +1,102 @@
+from __future__ import annotations
+
+import json
+import os
+import re
+import urllib.error
+import urllib.request
+from pathlib import Path
+from typing import Any
+
+from jsonschema import Draft202012Validator
+
+from reuse_surface.registry import ROOT
+
+DRAFT_SCHEMA_PATH = ROOT / "schemas" / "registry-draft.schema.json"
+
+
+def llm_connect_url(explicit: str | None = None) -> str:
+    base = (explicit or
os.environ.get("LLM_CONNECT_URL", "")).rstrip("/")
+    if not base:
+        raise ValueError(
+            "LLM backend not configured; set LLM_CONNECT_URL or pass --llm-url"
+        )
+    return base
+
+
+def load_draft_schema() -> dict[str, Any]:
+    return json.loads(DRAFT_SCHEMA_PATH.read_text(encoding="utf-8"))
+
+
+def execute_prompt(
+    prompt: str,
+    *,
+    base_url: str | None = None,
+    config: dict[str, Any] | None = None,
+) -> str:
+    url = f"{llm_connect_url(base_url)}/execute"
+    body: dict[str, Any] = {"prompt": prompt}
+    if config:
+        body["config"] = config
+    data = json.dumps(body).encode("utf-8")
+    request = urllib.request.Request(
+        url,
+        data=data,
+        headers={
+            "Content-Type": "application/json",
+            "Accept": "application/json",
+            "User-Agent": "reuse-surface/0.1",
+        },
+        method="POST",
+    )
+    try:
+        with urllib.request.urlopen(request, timeout=120) as response:
+            payload = json.loads(response.read().decode("utf-8"))
+    except urllib.error.HTTPError as exc:
+        raw = exc.read().decode("utf-8")
+        raise ValueError(f"llm-connect returned {exc.code}: {raw}") from exc
+    content = payload.get("content")
+    if not isinstance(content, str) or not content.strip():
+        raise ValueError("llm-connect response missing content")
+    return content
+
+
+def extract_json_object(text: str) -> dict[str, Any]:
+    stripped = text.strip()
+    if stripped.startswith("```"):
+        stripped = re.sub(r"^```(?:json)?\s*", "", stripped)
+        stripped = re.sub(r"\s*```$", "", stripped)
+    try:
+        data = json.loads(stripped)
+    except json.JSONDecodeError:
+        match = re.search(r"\{.*\}", stripped, re.DOTALL)
+        if not match:
+            raise ValueError("llm response did not contain JSON object") from None
+        data = json.loads(match.group(0))
+    if not isinstance(data, dict):
+        raise ValueError("llm response JSON must be an object")
+    return data
+
+
+def request_registry_draft(
+    prompt: str,
+    *,
+    base_url: str | None = None,
+    config: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    draft = extract_json_object(execute_prompt(prompt,
base_url=base_url, config=config))
+    validator = Draft202012Validator(load_draft_schema())
+    errors = sorted(validator.iter_errors(draft), key=lambda err: list(err.path))
+    if errors:
+        messages = "; ".join(error.message for error in errors[:3])
+        raise ValueError(f"draft schema validation failed: {messages}")
+    return draft
+
+
+def request_json_object(
+    prompt: str,
+    *,
+    base_url: str | None = None,
+    config: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    return extract_json_object(execute_prompt(prompt, base_url=base_url, config=config))
\ No newline at end of file
diff --git a/reuse_surface/registry.py b/reuse_surface/registry.py
index 043499f..e183a5d 100644
--- a/reuse_surface/registry.py
+++ b/reuse_surface/registry.py
@@ -60,4 +60,31 @@ def parse_vector(vector: str) -> dict[str, str]:

 def level_at_least(dimension: str, current: str, minimum: str) -> bool:
     order = LEVEL_ORDERS[dimension]
-    return order.index(current) >= order.index(minimum)
\ No newline at end of file
+    return order.index(current) >= order.index(minimum)
+
+
+def registry_paths(repo_root: Path) -> dict[str, Path]:
+    registry = repo_root / "registry"
+    return {
+        "registry": registry,
+        "capabilities": registry / "capabilities",
+        "index": registry / "indexes" / "capabilities.yaml",
+        "sources": registry / "federation" / "sources.yaml",
+    }
+
+
+def load_index_at(path: Path) -> dict[str, Any]:
+    with path.open(encoding="utf-8") as handle:
+        return yaml.safe_load(handle)
+
+
+def entry_vector(front_matter: dict[str, Any]) -> str:
+    discovery = front_matter["maturity"]["discovery"]["current"]
+    availability = front_matter["maturity"]["availability"]["current"]
+    completeness = front_matter["external_evidence"]["completeness"]["level"]
+    reliability = front_matter["external_evidence"]["reliability"]["level"]
+    return f"{discovery} / {availability} / {completeness} / {reliability}"
+
+
+def vectors_match(index_vector: str, front_matter: dict[str, Any]) -> bool:
+    return
index_vector.replace(" ", "") == entry_vector(front_matter).replace(" ", "") \ No newline at end of file diff --git a/reuse_surface/registry_update.py b/reuse_surface/registry_update.py new file mode 100644 index 0000000..e19b10e --- /dev/null +++ b/reuse_surface/registry_update.py @@ -0,0 +1,273 @@ +from __future__ import annotations + +import json +import subprocess +import textwrap +from pathlib import Path +from typing import Any + +import yaml + +from reuse_surface.llm_bridge import request_json_object +from reuse_surface.registry import ( + entry_vector, + load_index_at, + parse_front_matter, + registry_paths, + vectors_match, +) + +SAFE_EVIDENCE_PREFIXES = ("tests/", ".gitea/workflows/") + + +def git_changed_files(repo_root: Path, since_ref: str) -> list[str]: + result = subprocess.run( + ["git", "-C", str(repo_root), "diff", "--name-only", since_ref, "HEAD"], + capture_output=True, + text=True, + check=False, + ) + if result.returncode != 0: + raise ValueError(result.stderr.strip() or f"git diff failed for {since_ref}") + return [line.strip() for line in result.stdout.splitlines() if line.strip()] + + +def collect_deterministic_suggestions( + repo_root: Path, + *, + capability_id: str | None = None, + git_since: str | None = None, +) -> list[dict[str, Any]]: + paths = registry_paths(repo_root) + if not paths["index"].exists(): + raise ValueError("registry index missing; run establish --scaffold first") + + index = load_index_at(paths["index"]) + rows = index.get("capabilities", []) + if capability_id: + rows = [row for row in rows if row["id"] == capability_id] + if not rows: + raise ValueError(f"capability not in index: {capability_id}") + + changed_files = git_changed_files(repo_root, git_since) if git_since else [] + suggestions: list[dict[str, Any]] = [] + + for row in rows: + entry_path = repo_root / row["path"] + if not entry_path.exists(): + suggestions.append( + { + "capability_id": row["id"], + "kind": "missing_entry", + "detail": f"missing file 
{row['path']}", + } + ) + continue + + front_matter = parse_front_matter(entry_path) + if not vectors_match(row["vector"], front_matter): + suggestions.append( + { + "capability_id": row["id"], + "kind": "vector_drift", + "detail": "index vector differs from entry front matter", + "index_vector": row["vector"], + "entry_vector": entry_vector(front_matter), + "apply_patch": { + "field": "index.vector", + "value": entry_vector(front_matter), + }, + } + ) + + evidence_tests = front_matter.get("evidence", {}).get("tests", []) + for changed in changed_files: + if changed.startswith("tests/") and changed not in evidence_tests: + suggestions.append( + { + "capability_id": row["id"], + "kind": "evidence_test", + "detail": f"new test file not cited: {changed}", + "apply_patch": { + "field": "evidence.tests", + "append": changed, + }, + } + ) + + artifacts = front_matter.get("availability", {}).get("current_artifacts", []) + for changed in changed_files: + if changed.endswith(".py") and changed.startswith( + tuple( + p.name + "/" + for p in repo_root.iterdir() + if p.is_dir() and (p / "__init__.py").exists() + ) + ): + if changed not in artifacts: + suggestions.append( + { + "capability_id": row["id"], + "kind": "availability_artifact", + "detail": f"changed module not cited: {changed}", + "apply_patch": { + "field": "availability.current_artifacts", + "append": changed, + }, + } + ) + + return suggestions + + +def apply_deterministic_suggestions( + repo_root: Path, + suggestions: list[dict[str, Any]], +) -> list[str]: + paths = registry_paths(repo_root) + index = load_index_at(paths["index"]) + index_by_id = {row["id"]: row for row in index.get("capabilities", [])} + changed: list[str] = [] + + entry_cache: dict[str, dict[str, Any]] = {} + entry_paths: dict[str, Path] = {} + + for suggestion in suggestions: + patch = suggestion.get("apply_patch") + if not patch: + continue + cap_id = suggestion["capability_id"] + if patch["field"] == "index.vector" and cap_id in 
index_by_id: + index_by_id[cap_id]["vector"] = patch["value"] + changed.append(f"index vector for {cap_id}") + + row = index_by_id.get(cap_id) + if not row: + continue + entry_path = repo_root / row["path"] + if cap_id not in entry_cache: + entry_cache[cap_id] = parse_front_matter(entry_path) + entry_paths[cap_id] = entry_path + + front_matter = entry_cache[cap_id] + if patch["field"] == "evidence.tests": + tests = front_matter.setdefault("evidence", {}).setdefault("tests", []) + if patch["append"] not in tests: + tests.append(patch["append"]) + changed.append(f"{cap_id} evidence.tests += {patch['append']}") + if patch["field"] == "availability.current_artifacts": + artifacts = front_matter.setdefault("availability", {}).setdefault( + "current_artifacts", [] + ) + if patch["append"] not in artifacts: + artifacts.append(patch["append"]) + changed.append( + f"{cap_id} availability.current_artifacts += {patch['append']}" + ) + + if changed: + paths["index"].write_text( + yaml.safe_dump(index, sort_keys=False, allow_unicode=True), + encoding="utf-8", + ) + for cap_id, front_matter in entry_cache.items(): + _write_front_matter(entry_paths[cap_id], front_matter) + return changed + + +def _write_front_matter(path: Path, front_matter: dict[str, Any]) -> None: + text = path.read_text(encoding="utf-8") + marker_end = text.find("\n---", 4) + body = text[marker_end + 4 :] if marker_end != -1 else "\n" + path.write_text( + "---\n" + + yaml.safe_dump(front_matter, sort_keys=False, allow_unicode=True) + + "---" + + body, + encoding="utf-8", + ) + + +def build_update_prompt( + repo_root: Path, + capability_id: str, + *, + git_since: str | None = None, +) -> str: + paths = registry_paths(repo_root) + index = load_index_at(paths["index"]) + row = next((item for item in index["capabilities"] if item["id"] == capability_id), None) + if not row: + raise ValueError(f"capability not in index: {capability_id}") + entry = parse_front_matter(repo_root / row["path"]) + diff = "" + if 
git_since: + proc = subprocess.run( + [ + "git", + "-C", + str(repo_root), + "diff", + git_since, + "HEAD", + "--", + "registry/", + "reuse_surface/", + "tests/", + ], + capture_output=True, + text=True, + check=False, + ) + diff = proc.stdout[:12000] + + return textwrap.dedent( + f""" + Suggest registry entry updates for capability `{capability_id}`. + + Return ONLY JSON: + {{ + "promotion_history": [ + {{"date": "YYYY-MM-DD", "dimension": "availability", "from": "A3", "to": "A4", "rationale": "..."}} + ], + "consumer_feedback": ["optional string notes"], + "notes": ["human review items"] + }} + + Current entry YAML: + {yaml.safe_dump(entry, sort_keys=False)} + + Git diff since {git_since or 'N/A'}: + {diff or '(none)'} + """ + ).strip() + + +def suggest_llm_updates( + repo_root: Path, + capability_id: str, + *, + git_since: str | None = None, + llm_url: str | None = None, +) -> dict[str, Any]: + prompt = build_update_prompt(repo_root, capability_id, git_since=git_since) + return request_json_object( + prompt, + base_url=llm_url, + config={"temperature": 0.2, "max_tokens": 2000}, + ) + + +def format_suggestions_markdown(suggestions: list[dict[str, Any]]) -> str: + if not suggestions: + return "# Registry update suggestions\n\n_No suggestions._\n" + lines = ["# Registry update suggestions", ""] + for item in suggestions: + lines.append(f"- `{item['capability_id']}` **{item['kind']}**: {item['detail']}") + lines.append("") + lines.append(f"**{len(suggestions)}** suggestion(s). 
Use `--apply` to apply safe patches.") + return "\n".join(lines) + "\n" + + +def format_suggestions_json(suggestions: list[dict[str, Any]]) -> str: + return json.dumps({"count": len(suggestions), "suggestions": suggestions}, indent=2) \ No newline at end of file diff --git a/reuse_surface/stats.py b/reuse_surface/stats.py new file mode 100644 index 0000000..5127be9 --- /dev/null +++ b/reuse_surface/stats.py @@ -0,0 +1,259 @@ +from __future__ import annotations + +import json +import urllib.error +import urllib.request +from collections import Counter +from pathlib import Path +from typing import Any + +import yaml + +from reuse_surface import hub_client +from reuse_surface.registry import ( + LEVEL_ORDERS, + entry_vector, + load_index_at, + parse_front_matter, + parse_vector, + registry_paths, + vectors_match, +) + + +def _histogram(values: list[str], order: list[str]) -> dict[str, int]: + counts = Counter(values) + return {level: counts.get(level, 0) for level in order if counts.get(level, 0)} + + +def _probe_url(url: str) -> dict[str, Any]: + request = urllib.request.Request( + url, + method="HEAD", + headers={"User-Agent": "reuse-surface/0.1"}, + ) + try: + with urllib.request.urlopen(request, timeout=30) as response: + return { + "url": url, + "status": response.status, + "content_type": response.headers.get("Content-Type", ""), + "ok": response.status == 200, + } + except urllib.error.HTTPError as exc: + return { + "url": url, + "status": exc.code, + "content_type": exc.headers.get("Content-Type", ""), + "ok": False, + } + except urllib.error.URLError as exc: + return {"url": url, "status": None, "error": str(exc.reason), "ok": False} + + +def collect_stats( + repo_root: Path, + *, + federation_ready: bool = False, + raw_url: str | None = None, + hub_url: str | None = None, +) -> dict[str, Any]: + paths = registry_paths(repo_root) + stats: dict[str, Any] = { + "repo_root": str(repo_root), + "registry_present": paths["registry"].exists(), + "index_present": 
paths["index"].exists(), + "sources_present": paths["sources"].exists(), + "capability_count": 0, + "histograms": {}, + "reliability": {"r0_r2": 0, "r3_plus": 0}, + "consumption_modes": {}, + "vector_drift": [], + "federation": {}, + "hub": {}, + } + + if not paths["index"].exists(): + if federation_ready and raw_url: + stats["federation"]["raw_url_probe"] = _probe_url(raw_url) + if hub_url or _hub_configured(): + stats["hub"] = _hub_summary(hub_url) + return stats + + index = load_index_at(paths["index"]) + capabilities = index.get("capabilities", []) + stats["capability_count"] = len(capabilities) + stats["domain"] = index.get("domain") + + discovery: list[str] = [] + availability: list[str] = [] + completeness: list[str] = [] + reliability: list[str] = [] + mode_counts: Counter[str] = Counter() + + for row in capabilities: + vector = parse_vector(row["vector"]) + discovery.append(vector["discovery"]) + availability.append(vector["availability"]) + completeness.append(vector["completeness"]) + reliability.append(vector["reliability"]) + for mode in row.get("consumption_modes", []): + mode_counts[mode] += 1 + + entry_path = repo_root / row["path"] + if entry_path.exists(): + try: + front_matter = parse_front_matter(entry_path) + if not vectors_match(row["vector"], front_matter): + stats["vector_drift"].append( + { + "id": row["id"], + "index_vector": row["vector"], + "entry_vector": entry_vector(front_matter), + } + ) + except (KeyError, TypeError, ValueError): + stats["vector_drift"].append( + {"id": row["id"], "error": "invalid entry front matter"} + ) + + stats["histograms"] = { + "discovery": _histogram(discovery, LEVEL_ORDERS["discovery"]), + "availability": _histogram(availability, LEVEL_ORDERS["availability"]), + "completeness": _histogram(completeness, LEVEL_ORDERS["completeness"]), + "reliability": _histogram(reliability, LEVEL_ORDERS["reliability"]), + } + stats["reliability"] = { + "r0_r2": sum(1 for level in reliability if level in {"R0", "R1", "R2"}), + "r3_plus": sum(1 for 
level in reliability if level_at_least_reliability(level, "R3")), + } + stats["consumption_modes"] = dict(sorted(mode_counts.items())) + + if federation_ready: + probe_url = raw_url + if not probe_url and paths["index"].exists(): + probe_url = _default_raw_url(repo_root) + if probe_url: + stats["federation"]["raw_url_probe"] = _probe_url(probe_url) + stats["federation"]["index_valid_yaml"] = _index_yaml_valid(paths["index"]) + + stats["hub"] = _hub_summary(hub_url) + return stats + + +def level_at_least_reliability(current: str, minimum: str) -> bool: + order = LEVEL_ORDERS["reliability"] + return order.index(current) >= order.index(minimum) + + +def _hub_configured() -> bool: + import os + + return bool(os.environ.get("REUSE_SURFACE_URL")) + + +def _hub_summary(hub_url: str | None) -> dict[str, Any]: + try: + status, payload = hub_client.hub_list(hub_url) + except (ValueError, urllib.error.URLError, OSError): + return {"configured": False} + if status != 200: + return {"configured": True, "status": status, "error": payload} + repos = payload.get("repos", []) + return { + "configured": True, + "registration_count": payload.get("count", len(repos)), + "enabled_count": sum(1 for repo in repos if repo.get("enabled", True)), + } + + +def _default_raw_url(repo_root: Path) -> str | None: + return None + + +def _index_yaml_valid(index_path: Path) -> bool: + try: + data = load_index_at(index_path) + return isinstance(data, dict) and "capabilities" in data + except (OSError, yaml.YAMLError): + return False + + +def format_stats_markdown(stats: dict[str, Any]) -> str: + lines = ["# Registry stats", ""] + lines.append(f"**Repo:** `{stats['repo_root']}`") + lines.append(f"**Capabilities:** {stats['capability_count']}") + if stats.get("domain"): + lines.append(f"**Domain:** `{stats['domain']}`") + lines.append("") + + lines.append("## Layout") + lines.append(f"- registry present: `{stats['registry_present']}`") + lines.append(f"- index present: `{stats['index_present']}`") + 
lines.append(f"- federation sources present: `{stats['sources_present']}`") + lines.append("") + + rel = stats["reliability"] + lines.append("## Reliability bands (index vectors)") + lines.append(f"- R0–R2: **{rel['r0_r2']}**") + lines.append(f"- R3+: **{rel['r3_plus']}**") + lines.append("") + + for dimension, histogram in stats.get("histograms", {}).items(): + if not histogram: + continue + lines.append(f"## {dimension.title()} histogram") + for level, count in histogram.items(): + lines.append(f"- `{level}`: {count}") + lines.append("") + + if stats.get("consumption_modes"): + lines.append("## Consumption modes") + for mode, count in stats["consumption_modes"].items(): + lines.append(f"- `{mode}`: {count}") + lines.append("") + + drift = stats.get("vector_drift", []) + lines.append(f"## Vector drift: **{len(drift)}**") + for item in drift[:10]: + if "error" in item: + lines.append(f"- `{item['id']}`: {item['error']}") + else: + lines.append( + f"- `{item['id']}`: index `{item['index_vector']}` " + f"≠ entry `{item['entry_vector']}`" + ) + if len(drift) > 10: + lines.append(f"- … and {len(drift) - 10} more") + lines.append("") + + federation = stats.get("federation", {}) + if federation: + lines.append("## Federation readiness") + if "index_valid_yaml" in federation: + lines.append(f"- index valid YAML: `{federation['index_valid_yaml']}`") + probe = federation.get("raw_url_probe") + if probe: + status = probe.get("status") + ok = probe.get("ok") + lines.append(f"- raw URL probe: status **{status}** ({'ok' if ok else 'fail'})") + lines.append(f" `{probe.get('url', '')}`") + lines.append("") + + hub = stats.get("hub", {}) + if hub.get("configured"): + lines.append("## Hub") + if "registration_count" in hub: + lines.append( + f"- registrations: **{hub['registration_count']}** " + f"({hub.get('enabled_count', 0)} enabled)" + ) + elif "error" in hub: + lines.append(f"- hub error: {hub['error']}") + lines.append("") + + return "\n".join(lines) + "\n" + + +def 
format_stats_json(stats: dict[str, Any]) -> str: + return json.dumps(stats, indent=2, sort_keys=True) \ No newline at end of file diff --git a/schemas/registry-draft.schema.json b/schemas/registry-draft.schema.json new file mode 100644 index 0000000..8d8d75e --- /dev/null +++ b/schemas/registry-draft.schema.json @@ -0,0 +1,69 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://reuse-surface.local/schemas/registry-draft.schema.json", + "title": "RegistryDiscoveryDraft", + "type": "object", + "additionalProperties": false, + "required": ["capabilities"], + "properties": { + "domain": { + "type": "string" + }, + "capabilities": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": false, + "required": ["id", "name", "summary"], + "properties": { + "id": { + "type": "string", + "pattern": "^capability\\.[a-z0-9]+(\\.[a-z0-9-]+)+$" + }, + "name": { + "type": "string", + "minLength": 1 + }, + "summary": { + "type": "string", + "minLength": 1 + }, + "owner": { + "type": "string" + }, + "vector": { + "type": "string", + "pattern": "^D[0-7] / A[0-7] / C[0-6] / R[0-6]$" + }, + "tags": { + "type": "array", + "items": { + "type": "string" + } + }, + "consumption_modes": { + "type": "array", + "items": { + "type": "string" + } + }, + "discovery_intent": { + "type": "string" + }, + "discovery_includes": { + "type": "array", + "items": { + "type": "string" + } + }, + "discovery_excludes": { + "type": "array", + "items": { + "type": "string" + } + } + } + } + } + } +} \ No newline at end of file diff --git a/tests/test_establish.py b/tests/test_establish.py new file mode 100644 index 0000000..cd78d6c --- /dev/null +++ b/tests/test_establish.py @@ -0,0 +1,77 @@ +from __future__ import annotations + +from pathlib import Path +from unittest.mock import patch + +import yaml + +from reuse_surface.establish import ( + discover_capabilities, + publish_check, + scaffold_registry, +) +from reuse_surface.registry import 
registry_paths + + +def test_scaffold_creates_layout(tmp_path: Path): + created = scaffold_registry(tmp_path, domain="helix_forge") + paths = registry_paths(tmp_path) + assert paths["index"] in created + data = yaml.safe_load(paths["index"].read_text(encoding="utf-8")) + assert data["capabilities"] == [] + assert data["domain"] == "helix_forge" + + +def test_scaffold_refuses_existing_without_force(tmp_path: Path): + scaffold_registry(tmp_path) + try: + scaffold_registry(tmp_path) + raise AssertionError("expected ValueError") + except ValueError as exc: + assert "already exists" in str(exc) + + +def test_publish_check_local_index(tmp_path: Path): + scaffold_registry(tmp_path) + result = publish_check(tmp_path) + assert result["ok"] is True + assert any(check["name"] == "local_index_yaml" for check in result["checks"]) + + +def test_publish_check_raw_url_fail(tmp_path: Path): + with patch( + "reuse_surface.establish._probe_raw_url", + return_value={"ok": False, "status": 303, "content_type": "text/html"}, + ): + result = publish_check( + tmp_path, + raw_url="https://example.com/capabilities.yaml", + ) + assert result["ok"] is False + assert result.get("remediation") + + +def test_discover_dry_run_mock_llm(tmp_path: Path): + scaffold_registry(tmp_path) + (tmp_path / "README.md").write_text("# Demo service\n", encoding="utf-8") + draft = { + "domain": "helix_forge", + "capabilities": [ + { + "id": "capability.demo.sample", + "name": "Sample", + "summary": "Sample capability.", + "owner": "demo", + "vector": "D2 / A0 / C0 / R0", + "tags": ["demo"], + "consumption_modes": ["informational"], + "discovery_intent": "Enable demo planning.", + } + ], + } + with patch( + "reuse_surface.establish.request_registry_draft", + return_value=draft, + ): + result = discover_capabilities(tmp_path, dry_run=True, apply=False) + assert result["draft"]["capabilities"][0]["id"] == "capability.demo.sample" \ No newline at end of file diff --git a/tests/test_llm_bridge.py 
b/tests/test_llm_bridge.py new file mode 100644 index 0000000..76a48e5 --- /dev/null +++ b/tests/test_llm_bridge.py @@ -0,0 +1,53 @@ +from __future__ import annotations + +import json +from unittest.mock import patch + +import pytest + +from reuse_surface.llm_bridge import ( + extract_json_object, + llm_connect_url, + request_registry_draft, +) + + +def test_extract_json_object_from_fenced_block(): + data = extract_json_object('```json\n{"capabilities": []}\n```') + assert data == {"capabilities": []} + + +def test_llm_connect_url_missing_raises(): + with pytest.raises(ValueError, match="LLM_CONNECT_URL"): + llm_connect_url(None) + + +def test_request_registry_draft_mock_http(): + payload = { + "content": json.dumps( + { + "capabilities": [ + { + "id": "capability.demo.sample", + "name": "Sample", + "summary": "Demo capability", + } + ] + } + ) + } + + class FakeResponse: + def __enter__(self): + return self + + def __exit__(self, *args): + return False + + def read(self): + return json.dumps(payload).encode("utf-8") + + with patch.dict("os.environ", {"LLM_CONNECT_URL": "http://llm.test"}): + with patch("urllib.request.urlopen", return_value=FakeResponse()): + draft = request_registry_draft("test prompt") + assert draft["capabilities"][0]["id"] == "capability.demo.sample" \ No newline at end of file diff --git a/tests/test_registry_update.py b/tests/test_registry_update.py new file mode 100644 index 0000000..6022acb --- /dev/null +++ b/tests/test_registry_update.py @@ -0,0 +1,87 @@ +from __future__ import annotations + +from pathlib import Path + +import yaml + +from reuse_surface.establish import scaffold_registry +from reuse_surface.registry import load_index_at, registry_paths +from reuse_surface.registry_update import ( + apply_deterministic_suggestions, + collect_deterministic_suggestions, +) + + +def _write_minimal_entry(tmp_path: Path, cap_id: str, vector: str) -> str: + rel = "registry/capabilities/capability-demo-sample.md" + d, a, c, r = [part.strip() for 
part in vector.split("/")] + front_matter = { + "id": cap_id, + "name": "Sample", + "summary": "Sample", + "owner": "demo", + "status": "draft", + "domain": "helix_forge", + "tags": ["demo"], + "maturity": { + "discovery": {"current": d, "target": "D5", "confidence": "low"}, + "availability": {"current": a, "target": "A3", "confidence": "low"}, + }, + "external_evidence": { + "completeness": {"level": c, "confidence": "low"}, + "reliability": {"level": r, "confidence": "low"}, + }, + "discovery": {"intent": "demo", "includes": [], "excludes": []}, + "availability": { + "current_level": a, + "target_level": "A3", + "current_artifacts": [], + "consumption_modes": ["informational"], + }, + "relations": {"depends_on": [], "supports": [], "related_to": []}, + "evidence": {"documentation": [], "tests": []}, + "consumer_guidance": { + "recommended_for": [], + "not_recommended_for": [], + "known_limitations": [], + }, + } + path = tmp_path / rel + path.parent.mkdir(parents=True, exist_ok=True) + path.write_text( + "---\n" + + yaml.safe_dump(front_matter, sort_keys=False) + + "---\n", + encoding="utf-8", + ) + return rel + + +def test_vector_drift_suggestion(tmp_path: Path): + scaffold_registry(tmp_path) + cap_id = "capability.demo.sample" + rel = _write_minimal_entry(tmp_path, cap_id, "D3 / A0 / C0 / R0") + index_path = registry_paths(tmp_path)["index"] + index = load_index_at(index_path) + index["capabilities"] = [ + { + "id": cap_id, + "name": "Sample", + "summary": "Sample", + "vector": "D2 / A0 / C0 / R0", + "domain": "helix_forge", + "status": "draft", + "owner": "demo", + "path": rel, + "tags": ["demo"], + "consumption_modes": ["informational"], + } + ] + index_path.write_text(yaml.safe_dump(index, sort_keys=False), encoding="utf-8") + + suggestions = collect_deterministic_suggestions(tmp_path, capability_id=cap_id) + assert any(item["kind"] == "vector_drift" for item in suggestions) + changed = apply_deterministic_suggestions(tmp_path, suggestions) + assert changed 
+ updated = load_index_at(index_path) + assert updated["capabilities"][0]["vector"] == "D3 / A0 / C0 / R0" \ No newline at end of file diff --git a/tests/test_stats.py b/tests/test_stats.py new file mode 100644 index 0000000..2e5cdcb --- /dev/null +++ b/tests/test_stats.py @@ -0,0 +1,20 @@ +from __future__ import annotations + +from pathlib import Path + +from reuse_surface.stats import collect_stats, format_stats_markdown + + +def test_collect_stats_on_repo_root(): + root = Path(__file__).resolve().parent.parent + stats = collect_stats(root) + assert stats["capability_count"] == 20 + assert stats["index_present"] is True + assert "discovery" in stats["histograms"] + + +def test_format_stats_markdown_contains_count(): + root = Path(__file__).resolve().parent.parent + text = format_stats_markdown(collect_stats(root)) + assert "Capabilities:" in text + assert "20" in text \ No newline at end of file diff --git a/tools/README.md b/tools/README.md index 0da6ce1..83f94bf 100644 --- a/tools/README.md +++ b/tools/README.md @@ -104,6 +104,45 @@ reuse-surface hub sync --dry-run Run the service locally: `REUSE_SURFACE_TOKEN=dev-token reuse-surface serve` +### stats + +Registry maturity aggregates and federation readiness. + +```bash +reuse-surface stats +reuse-surface stats --format json +reuse-surface stats --federation-ready --raw-url https://.../capabilities.yaml +``` + +### establish + +Bootstrap or discover a capability registry in the current or target repo. + +```bash +reuse-surface establish --scaffold --domain helix_forge +reuse-surface establish --scaffold --path ../state-hub +reuse-surface establish --publish-check --raw-url https://.../capabilities.yaml +export LLM_CONNECT_URL=http://127.0.0.1:8088 +reuse-surface establish --discover --dry-run +reuse-surface establish --discover --apply +``` + +`--scaffold` creates the `registry/` layout. `--publish-check` probes the raw URL and +checks the local index YAML. `--discover` drafts capabilities via the optional llm-connect backend. 
+ +### update + +Refresh registry metadata from repo drift signals. + +```bash +reuse-surface update --capability capability.registry.register --dry-run +reuse-surface update --all --from-git-since HEAD~5 --apply +reuse-surface update --capability capability.registry.register --suggest-maturity +``` + +Deterministic patches (`vector_drift`, new `tests/` citations) apply with +`--apply`. LLM suggestions use `--suggest-maturity` and remain review-only. + ### report cohorts Export capability cohorts for planning or implementation reuse decisions. @@ -140,6 +179,11 @@ Stable IDs and maturity fields are preserved for agent consumption (UC-RS-019). | Publish catalog | `reuse-surface catalog` | | Compose federation | `reuse-surface federation compose` | | Sync federation manifest from hub | `reuse-surface hub sync` | +| Registry stats | `reuse-surface stats` | +| Bootstrap sibling registry | `reuse-surface establish --scaffold` | +| Verify index publish URL | `reuse-surface establish --publish-check` | +| Draft capabilities (LLM) | `reuse-surface establish --discover` | +| Refresh entry metadata | `reuse-surface update` | | Planning cohort export | `reuse-surface report cohorts` | | Relation graph | `reuse-surface graph` | diff --git a/workplans/REUSE-WP-0013-registry-establish-and-llm-assist.md b/workplans/archived/260617-REUSE-WP-0013-registry-establish-and-llm-assist.md similarity index 90% rename from workplans/REUSE-WP-0013-registry-establish-and-llm-assist.md rename to workplans/archived/260617-REUSE-WP-0013-registry-establish-and-llm-assist.md index ffb7852..6991900 100644 --- a/workplans/REUSE-WP-0013-registry-establish-and-llm-assist.md +++ b/workplans/archived/260617-REUSE-WP-0013-registry-establish-and-llm-assist.md @@ -4,11 +4,11 @@ type: workplan title: "Registry establish, update, and stats with optional llm-connect assist" domain: helix_forge repo: reuse-surface -status: ready +status: finished owner: codex topic_slug: helix-forge created: "2026-06-16" 
-updated: "2026-06-16" +updated: "2026-06-17" state_hub_workstream_id: "239a0077-8593-4dc7-918d-4c23895275f6" --- @@ -91,7 +91,7 @@ reuse-surface update --from-git-since HEAD~5 --apply ```task id: REUSE-WP-0013-T01 -status: todo +status: done priority: high state_hub_task_id: "98e65330-bfc7-4282-b372-d35542b899ce" ``` @@ -112,7 +112,7 @@ Output: Markdown default, `--format json`. Pytest coverage. Document in ```task id: REUSE-WP-0013-T02 -status: todo +status: done priority: high state_hub_task_id: "b8fedd87-d0d3-41b4-9af8-e36d52bfe1c5" ``` @@ -131,7 +131,7 @@ No llm-connect dependency. Pytest with temp directory. ```task id: REUSE-WP-0013-T03 -status: todo +status: done priority: medium state_hub_task_id: "2924d685-709f-4e28-886f-b363cd9c40b4" ``` @@ -147,7 +147,7 @@ Federation publish helper for sibling repo operators: ```task id: REUSE-WP-0013-T04 -status: todo +status: done priority: high state_hub_task_id: "650ebee5-b34b-4ed8-891d-d93aacebadd7" ``` @@ -166,7 +166,7 @@ Thin client boundary: ```task id: REUSE-WP-0013-T05 -status: todo +status: done priority: medium state_hub_task_id: "b9154889-f538-4266-9918-b277f9a297be" ``` @@ -185,7 +185,7 @@ LLM-assisted bootstrap after `--scaffold` or on empty registry: ```task id: REUSE-WP-0013-T06 -status: todo +status: done priority: medium state_hub_task_id: "b79558da-54b2-4712-91d2-b298c7cf2c40" ``` @@ -210,7 +210,7 @@ Targets: single `--capability`, `--all`, `--from-git-since `. 
```task id: REUSE-WP-0013-T07 -status: todo +status: done priority: low state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291" ``` @@ -227,13 +227,20 @@ state_hub_task_id: "a55a2f26-004e-4c20-90cb-49bed64a1291" ## Acceptance -- [ ] `reuse-surface stats` reports maturity and federation-readiness aggregates -- [ ] `establish --scaffold` creates valid empty registry layout without overwrite accidents -- [ ] `establish --publish-check` detects 303 vs 200 raw URL outcomes -- [ ] llm-connect bridge works with mocked HTTP; fails clearly when URL unset -- [ ] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context -- [ ] `update --dry-run` reports deterministic drift on sample repo -- [ ] All new commands documented; gap priority 24 recorded +- [x] `reuse-surface stats` reports maturity and federation-readiness aggregates +- [x] `establish --scaffold` creates valid empty registry layout without overwrite accidents +- [x] `establish --publish-check` detects 303 vs 200 raw URL outcomes +- [x] llm-connect bridge works with mocked HTTP; fails clearly when URL unset +- [x] `establish --discover --dry-run` produces schema-valid draft JSON from fixture context +- [x] `update --dry-run` reports deterministic drift on sample repo +- [x] All new commands documented; gap priority 24 recorded + +## Completion notes (2026-06-17) + +- Modules: `stats.py`, `establish.py`, `registry_update.py`, `llm_bridge.py` +- Schema: `schemas/registry-draft.schema.json` +- `validate --root` for sibling repo validation after establish --apply +- 43 pytest tests; optional `pip install -e ".[llm]"` extra ## Out of scope