diff --git a/.gitea/workflows/ci.yml b/.gitea/workflows/ci.yml new file mode 100644 index 0000000..29061f8 --- /dev/null +++ b/.gitea/workflows/ci.yml @@ -0,0 +1,25 @@ +name: ci + +on: + push: + branches: [main] + pull_request: + branches: [main] + +jobs: + validate-registry: + runs-on: ubuntu-latest + steps: + - name: Check out source + uses: actions/checkout@v4 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: "3.12" + + - name: Install package + run: python -m pip install -e . + + - name: Validate capability registry + run: reuse-surface validate \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 5dbb89b..d8fd824 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -124,6 +124,10 @@ artifacts. # Registry validation (schema + index drift) .venv/bin/reuse-surface validate +# Overlap and catalog generation +.venv/bin/reuse-surface overlaps +.venv/bin/reuse-surface catalog + # Repository hygiene rg --files git diff --check @@ -149,6 +153,9 @@ The generated instruction in older workplans says `make fix-consistency REPO=reuse-surface`; that is still valid when `uv` is installed and on PATH. On this workstation, the `.venv/bin/python` fallback has been verified. +CI runs `reuse-surface validate` on push and pull requests via +`.gitea/workflows/ci.yml`. + ### Run There is no local service to run from this repository. 
diff --git a/SCOPE.md b/SCOPE.md index e3e977a..12e2fe2 100644 --- a/SCOPE.md +++ b/SCOPE.md @@ -50,7 +50,9 @@ and agents can: `external_evidence.reliability` - **Validate entries automatically** with `reuse-surface validate` - **Export a machine-readable bundle** with `reuse-surface export` -- **Avoid duplicates** by querying the index before creating new entries +- **Detect overlap candidates** with `reuse-surface overlaps` +- **Generate a human-readable catalog** with `reuse-surface catalog` +- **Avoid duplicates** by querying the index and checking overlaps before adding entries Registry tooling availability is **A3** (CLI). The registry product itself is still documentation-first for authoring; consumption combines Markdown entries, @@ -58,11 +60,10 @@ the index, and CLI automation. ## What Is Not Possible Yet -- Generated human-readable catalog site +- Interactive catalog site with live search beyond static HTML export - Capability graph visualization -- Automated duplicate/overlap detection - Federation across repositories or organizations -- CI integration or packaged releases beyond local `pip install -e .` +- Packaged releases beyond local `pip install -e .` and Gitea CI validation See `tools/README.md` for command reference. @@ -74,7 +75,9 @@ See `tools/README.md` for command reference. `pyproject.toml` and `reuse_surface/`. - `docs/CapabilityRegistryConcept.md` and `docs/IntentScopeGapAnalysis.md` document onboarding and intent-scope tracking. -- Finished workplans: `REUSE-WP-0001`, `REUSE-WP-0002`, `REUSE-WP-0003`. +- CI validates the registry on push/PR via `.gitea/workflows/ci.yml`. +- Generated catalog: `docs/CapabilityCatalog.md` and `docs/catalog/index.html`. +- Finished workplans: `REUSE-WP-0001` through `REUSE-WP-0004`. - **Self-assessed vector:** `D5 / A3 / C4 / R2` (see gap analysis). 
## Repository Layout @@ -105,6 +108,7 @@ reuse-surface/ - Maturity standard: specs/CapabilityMaturityStandard.md - Registry index: registry/indexes/capabilities.yaml - Registry guidance: registry/README.md +- Generated catalog: docs/CapabilityCatalog.md - CLI reference: tools/README.md - Agent instructions: AGENTS.md - Workplans: workplans/ \ No newline at end of file diff --git a/docs/CapabilityCatalog.md b/docs/CapabilityCatalog.md new file mode 100644 index 0000000..7186681 --- /dev/null +++ b/docs/CapabilityCatalog.md @@ -0,0 +1,78 @@ +# Capability Catalog + +**Domain:** helix_forge +**Updated:** 2026-06-15 +**Entries:** 6 + +Generated by `reuse-surface catalog`. Do not edit manually. + +## helix_forge + +### Feature Availability Evaluation + +- **ID:** `capability.feature-control.evaluate` +- **Vector:** D5 / A4 / C3 / R3 +- **Owner:** feature-control +- **Path:** `registry/capabilities/capability.feature-control.evaluate.md` +- **Summary:** Evaluate whether a feature is active, hidden, disabled, or unavailable for a subject in context. + +**Known limitations:** +- bulk rule management is not yet covered +- agent-specific simulation remains a known gap + +### Feature Rollout Control + +- **ID:** `capability.feature-control.rollout` +- **Vector:** D4 / A2 / C2 / R1 +- **Owner:** feature-control +- **Path:** `registry/capabilities/capability.feature-control.rollout.md` +- **Summary:** Gradually expose features to subjects across tenants, domains, groups, or cohorts using rollout rules and staged availability. 
+ +**Known limitations:** +- distinguish carefully from capability.feature-control.evaluate + +### Identity Subject Resolution + +- **ID:** `capability.identity.subject-resolution` +- **Vector:** D3 / A0 / C1 / R0 +- **Owner:** identity-canon +- **Path:** `registry/capabilities/capability.identity.subject-resolution.md` +- **Summary:** Resolve who or what is acting in a context by mapping principals, accounts, actors, and identifiers to a stable subject model. + +**Known limitations:** +- resolver artifacts are not yet available + +### Identity Vocabulary Canonicalization + +- **ID:** `capability.identity.vocabulary-canonicalize` +- **Vector:** D4 / A0 / C2 / R0 +- **Owner:** identity-canon +- **Path:** `registry/capabilities/capability.identity.vocabulary-canonicalize.md` +- **Summary:** Define and maintain an implementation-neutral vocabulary for identity-related concepts across overlapping domains. + +**Known limitations:** +- source-note backfill is incomplete +- mappings may remain candidate until evidence review completes + +### Capability Registration + +- **ID:** `capability.registry.register` +- **Vector:** D3 / A3 / C2 / R2 +- **Owner:** reuse-surface +- **Path:** `registry/capabilities/capability.registry.register.md` +- **Summary:** Register a new capability so it becomes visible for planning and implementation reuse. + +**Known limitations:** +- manual index updates are required after adding an entry +- duplicate detection is guidance-only in the MVP + +### Workstream And Task Coordination + +- **ID:** `capability.statehub.workstream-coordinate` +- **Vector:** D4 / A4 / C3 / R2 +- **Owner:** state-hub +- **Path:** `registry/capabilities/capability.statehub.workstream-coordinate.md` +- **Summary:** Track active workstreams, tasks, progress, and consistency across domain repositories through a local-first coordination service. 
+ +**Known limitations:** +- requires running State Hub locally or via tunnel diff --git a/docs/IntentScopeGapAnalysis.md b/docs/IntentScopeGapAnalysis.md index 27e3195..30d9205 100644 --- a/docs/IntentScopeGapAnalysis.md +++ b/docs/IntentScopeGapAnalysis.md @@ -265,12 +265,18 @@ own evidence (e.g. feature-control at R3). ### Next recommended work +| Priority | Gap | Outcome | Status | +|---|---|---|---| +| 9 | Catalog site | `reuse-surface catalog` → MD + HTML | Closed (WP-0004) | +| 10 | Overlap detection | `reuse-surface overlaps` | Closed (WP-0004) | +| 11 | CI validation | `.gitea/workflows/ci.yml` | Closed (WP-0004) | +| 12 | Registry federation | Cross-repo capability index composition | Open | + | Priority | Gap | Suggested outcome | |---|---|---| -| 9 | Catalog site | Static browsable capability catalog (UC-RS-018) | -| 10 | Overlap detection | CLI or report for duplicate/overlapping capabilities | -| 11 | CI validation | Run `reuse-surface validate` in CI on registry changes | -| 12 | Registry federation | Cross-repo capability index composition | +| 13 | Interactive catalog | Searchable catalog UI beyond static HTML | +| 14 | Graph visualization | Capability relation graphs | +| 15 | Federation | Compose indexes across repositories | --- @@ -289,4 +295,5 @@ own evidence (e.g. feature-control at R3). | Date | Change | |---|---| | 2026-06-15 | Initial analysis after REUSE-WP-0002 completion | -| 2026-06-15 | REUSE-WP-0003 closed priority gaps 1–8; vector updated to D5/A3/C4/R2 | \ No newline at end of file +| 2026-06-15 | REUSE-WP-0003 closed priority gaps 1–8; vector updated to D5/A3/C4/R2 | +| 2026-06-15 | REUSE-WP-0004 closed priorities 9–11 (catalog, overlaps, CI) | \ No newline at end of file diff --git a/docs/catalog/index.html b/docs/catalog/index.html new file mode 100644 index 0000000..9a5ca52 --- /dev/null +++ b/docs/catalog/index.html @@ -0,0 +1,57 @@ + + + + + Capability Catalog — helix_forge + + + +

Capability Catalog

+

Updated 2026-06-15 · 6 entries

+

helix_forge

+
+

Feature Availability Evaluation

+

capability.feature-control.evaluate · D5 / A4 / C3 / R3

+

Evaluate whether a feature is active, hidden, disabled, or unavailable for a subject in context.

+

registry/capabilities/capability.feature-control.evaluate.md

+
+
+

Feature Rollout Control

+

capability.feature-control.rollout · D4 / A2 / C2 / R1

+

Gradually expose features to subjects across tenants, domains, groups, or cohorts using rollout rules and staged availability.

+

registry/capabilities/capability.feature-control.rollout.md

+
+
+

Identity Subject Resolution

+

capability.identity.subject-resolution · D3 / A0 / C1 / R0

+

Resolve who or what is acting in a context by mapping principals, accounts, actors, and identifiers to a stable subject model.

+

registry/capabilities/capability.identity.subject-resolution.md

+
+
+

Identity Vocabulary Canonicalization

+

capability.identity.vocabulary-canonicalize · D4 / A0 / C2 / R0

+

Define and maintain an implementation-neutral vocabulary for identity-related concepts across overlapping domains.

+

registry/capabilities/capability.identity.vocabulary-canonicalize.md

+
+
+

Capability Registration

+

capability.registry.register · D3 / A3 / C2 / R2

+

Register a new capability so it becomes visible for planning and implementation reuse.

+

registry/capabilities/capability.registry.register.md

+
+
+

Workstream And Task Coordination

+

capability.statehub.workstream-coordinate · D4 / A4 / C3 / R2

+

Track active workstreams, tasks, progress, and consistency across domain repositories through a local-first coordination service.

+

registry/capabilities/capability.statehub.workstream-coordinate.md

+
+ + diff --git a/registry/README.md b/registry/README.md index 3a10ac1..a9a7e82 100644 --- a/registry/README.md +++ b/registry/README.md @@ -117,6 +117,8 @@ Compare vectors side by side and read: ### UC-RS-015 — Detect duplicate or overlapping capabilities +Run `reuse-surface overlaps` for automated candidate detection, then review: + Check for overlap in: - similar `name` or `summary` diff --git a/reuse_surface/catalog.py b/reuse_surface/catalog.py new file mode 100644 index 0000000..2bb3a35 --- /dev/null +++ b/reuse_surface/catalog.py @@ -0,0 +1,122 @@ +from __future__ import annotations + +import html +from collections import defaultdict +from pathlib import Path +from typing import Any + +ROOT = Path(__file__).resolve().parent.parent +CATALOG_MD = ROOT / "docs" / "CapabilityCatalog.md" +CATALOG_HTML_DIR = ROOT / "docs" / "catalog" +CATALOG_HTML = CATALOG_HTML_DIR / "index.html" + + +def _grouped_capabilities( + indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]], +) -> dict[str, list[tuple[dict[str, Any], dict[str, Any]]]]: + grouped: dict[str, list[tuple[dict[str, Any], dict[str, Any]]]] = defaultdict( + list + ) + for index_item, entry in indexed_entries: + domain = index_item.get("domain", "unknown") + grouped[domain].append((index_item, entry)) + return dict(sorted(grouped.items())) + + +def render_markdown( + index: dict[str, Any], + indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]], +) -> str: + lines = [ + "# Capability Catalog", + "", + f"**Domain:** {index.get('domain', 'unknown')} ", + f"**Updated:** {index.get('updated', 'unknown')} ", + f"**Entries:** {len(indexed_entries)}", + "", + "Generated by `reuse-surface catalog`. 
Do not edit manually.", + "", + ] + for domain, items in _grouped_capabilities(indexed_entries).items(): + lines.extend([f"## {domain}", ""]) + for index_item, entry in sorted(items, key=lambda pair: pair[0]["id"]): + lines.extend( + [ + f"### {index_item['name']}", + "", + f"- **ID:** `{index_item['id']}`", + f"- **Vector:** {index_item['vector']}", + f"- **Owner:** {index_item.get('owner', 'unknown')}", + f"- **Path:** `{index_item['path']}`", + f"- **Summary:** {index_item['summary']}", + "", + ] + ) + guidance = entry.get("consumer_guidance") or {} + limitations = guidance.get("known_limitations") or [] + if limitations: + lines.append("**Known limitations:**") + lines.extend(f"- {item}" for item in limitations) + lines.append("") + return "\n".join(lines).rstrip() + "\n" + + +def render_html( + index: dict[str, Any], + indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]], +) -> str: + sections: list[str] = [] + for domain, items in _grouped_capabilities(indexed_entries).items(): + cards: list[str] = [] + for index_item, entry in sorted(items, key=lambda pair: pair[0]["id"]): + name = html.escape(index_item["name"]) + summary = html.escape(index_item["summary"]) + cap_id = html.escape(index_item["id"]) + vector = html.escape(index_item["vector"]) + path = html.escape(index_item["path"]) + cards.append( + f"""
+

{name}

+

{cap_id} · {vector}

+

{summary}

+

{path}

+
""" + ) + sections.append( + f"

{html.escape(domain)}

\n" + "\n".join(cards) + "
" + ) + + body = "\n".join(sections) + title = html.escape(f"Capability Catalog — {index.get('domain', 'unknown')}") + return f""" + + + + {title} + + + +

Capability Catalog

+

Updated {html.escape(str(index.get('updated', 'unknown')))} · {len(indexed_entries)} entries

+ {body} + + +""" + + +def write_catalog( + index: dict[str, Any], + indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]], +) -> tuple[Path, Path]: + CATALOG_HTML_DIR.mkdir(parents=True, exist_ok=True) + CATALOG_MD.write_text(render_markdown(index, indexed_entries), encoding="utf-8") + CATALOG_HTML.write_text(render_html(index, indexed_entries), encoding="utf-8") + return CATALOG_MD, CATALOG_HTML \ No newline at end of file diff --git a/reuse_surface/cli.py b/reuse_surface/cli.py index 694b5d4..4096d07 100644 --- a/reuse_surface/cli.py +++ b/reuse_surface/cli.py @@ -9,9 +9,9 @@ from typing import Any import yaml from jsonschema import Draft202012Validator +from reuse_surface.catalog import write_catalog +from reuse_surface.overlaps import find_overlaps from reuse_surface.registry import ( - CAPABILITIES_DIR, - INDEX_PATH, ROOT, capability_paths, level_at_least, @@ -115,6 +115,40 @@ def cmd_query(args: argparse.Namespace) -> int: return 0 +def _load_indexed_entries() -> list[tuple[dict[str, Any], dict[str, Any]]]: + index = load_index() + indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]] = [] + for item in index.get("capabilities", []): + path = ROOT / item["path"] + indexed_entries.append((item, parse_front_matter(path))) + return indexed_entries + + +def cmd_overlaps(args: argparse.Namespace) -> int: + indexed_entries = _load_indexed_entries() + candidates = find_overlaps(indexed_entries, threshold=args.threshold) + if not candidates: + print("no overlap candidates") + return 0 + for candidate in candidates: + reasons = "; ".join(candidate.reasons) + print( + f"{candidate.left_id} <> {candidate.right_id} " + f"score={candidate.score:.2f} {reasons}" + ) + print(f"\n{len(candidates)} candidate{'s' if len(candidates) != 1 else ''}") + return 0 + + +def cmd_catalog(args: argparse.Namespace) -> int: + index = load_index() + indexed_entries = _load_indexed_entries() + md_path, html_path = write_catalog(index, indexed_entries) + print(f"ok: wrote 
{md_path.relative_to(ROOT)}") + print(f"ok: wrote {html_path.relative_to(ROOT)}") + return 0 + + def cmd_export(args: argparse.Namespace) -> int: index = load_index() bundle: dict[str, Any] = { @@ -184,6 +218,22 @@ def main(argv: list[str] | None = None) -> int: ) export.set_defaults(func=cmd_export) + overlaps = subparsers.add_parser( + "overlaps", help="detect potential duplicate capabilities" + ) + overlaps.add_argument( + "--threshold", + type=float, + default=0.28, + help="token similarity threshold (0-1)", + ) + overlaps.set_defaults(func=cmd_overlaps) + + catalog = subparsers.add_parser( + "catalog", help="generate human-readable capability catalog" + ) + catalog.set_defaults(func=cmd_catalog) + args = parser.parse_args(argv) return args.func(args) diff --git a/reuse_surface/overlaps.py b/reuse_surface/overlaps.py new file mode 100644 index 0000000..97ad9fd --- /dev/null +++ b/reuse_surface/overlaps.py @@ -0,0 +1,87 @@ +from __future__ import annotations + +import re +from dataclasses import dataclass +from typing import Any + +TOKEN_RE = re.compile(r"[a-z][a-z0-9-]{2,}") + + +@dataclass +class OverlapCandidate: + left_id: str + right_id: str + score: float + reasons: list[str] + + +def _tokens(text: str) -> set[str]: + return set(TOKEN_RE.findall(text.lower())) + + +def _entry_blob(entry: dict[str, Any], index_item: dict[str, Any]) -> str: + discovery = entry.get("discovery") or {} + parts = [ + index_item.get("name", ""), + index_item.get("summary", ""), + entry.get("id", ""), + " ".join(index_item.get("tags", [])), + discovery.get("intent", ""), + " ".join(discovery.get("includes", [])), + ] + return " ".join(str(part) for part in parts if part) + + +def _relation_overlap(left: dict[str, Any], right: dict[str, Any]) -> list[str]: + reasons: list[str] = [] + left_id = left["id"] + right_id = right["id"] + relations = left.get("relations") or {} + for relation_type, targets in relations.items(): + if not isinstance(targets, list): + continue + if right_id 
in targets: + reasons.append(f"relation:{relation_type}") + if left_id.split(".")[1] == right_id.split(".")[1]: + reasons.append("shared domain segment") + return reasons + + +def find_overlaps( + indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]], + *, + threshold: float = 0.28, +) -> list[OverlapCandidate]: + candidates: list[OverlapCandidate] = [] + blobs = [ + (_entry_blob(entry, index_item), index_item["id"], entry) + for index_item, entry in indexed_entries + ] + + for i, (left_blob, left_id, left_entry) in enumerate(blobs): + left_tokens = _tokens(left_blob) + for j in range(i + 1, len(blobs)): + right_blob, right_id, right_entry = blobs[j] + right_tokens = _tokens(right_blob) + if not left_tokens or not right_tokens: + continue + score = len(left_tokens & right_tokens) / len(left_tokens | right_tokens) + reasons: list[str] = [] + if score >= threshold: + reasons.append(f"token similarity {score:.2f}") + shared_tags = set(left_entry.get("tags", [])) & set( + right_entry.get("tags", []) + ) + if shared_tags: + reasons.append(f"shared tags: {', '.join(sorted(shared_tags))}") + reasons.extend(_relation_overlap(left_entry, right_entry)) + if reasons and (score >= threshold or len(reasons) > 1): + candidates.append( + OverlapCandidate( + left_id=left_id, + right_id=right_id, + score=score, + reasons=reasons, + ) + ) + return sorted(candidates, key=lambda item: item.score, reverse=True) \ No newline at end of file diff --git a/tools/README.md b/tools/README.md index a8e2b7f..3b35e3f 100644 --- a/tools/README.md +++ b/tools/README.md @@ -42,6 +42,25 @@ reuse-surface export reuse-surface export --format json ``` +### overlaps + +Detect potential duplicate or overlapping capabilities (UC-RS-015). + +```bash +reuse-surface overlaps +reuse-surface overlaps --threshold 0.35 +``` + +### catalog + +Generate human-readable catalog artifacts (UC-RS-018). + +```bash +reuse-surface catalog +``` + +Writes `docs/CapabilityCatalog.md` and `docs/catalog/index.html`. 
+ ## Export format The export bundle includes: @@ -59,6 +78,8 @@ Stable IDs and maturity fields are preserved for agent consumption (UC-RS-019). | Discover capabilities | `reuse-surface query` or read the index | | Validate entry shape | `reuse-surface validate` | | Export for agents | `reuse-surface export --format json` | +| Detect overlap | `reuse-surface overlaps` | +| Publish catalog | `reuse-surface catalog` | ## Related use cases diff --git a/workplans/REUSE-WP-0004-registry-hardening.md b/workplans/REUSE-WP-0004-registry-hardening.md new file mode 100644 index 0000000..da60f56 --- /dev/null +++ b/workplans/REUSE-WP-0004-registry-hardening.md @@ -0,0 +1,66 @@ +--- +id: REUSE-WP-0004 +type: workplan +title: "Registry hardening: CI, overlap detection, and catalog" +domain: helix_forge +repo: reuse-surface +status: finished +owner: codex +topic_slug: helix-forge +created: "2026-06-15" +updated: "2026-06-15" +--- + +# Registry hardening: CI, overlap detection, and catalog + +Follow-up to `docs/IntentScopeGapAnalysis.md` section 8 next recommended work +(priorities 9–11). Raise registry quality through automated CI validation, overlap +reporting (UC-RS-015), and a generated human-readable catalog (UC-RS-018). + +## Add CI Validation Workflow + +```task +id: REUSE-WP-0004-T01 +status: done +priority: high +``` + +Add `.gitea/workflows/ci.yml` that runs on push and pull requests to `main`. +Install the package and run `reuse-surface validate`. Document the workflow in +`AGENTS.md`. + +## Add Overlap Detection Command + +```task +id: REUSE-WP-0004-T02 +status: done +priority: high +``` + +Add `reuse-surface overlaps` that flags potential duplicate or overlapping +capabilities using summary/tags/includes similarity and relation signals. +Document usage in `registry/README.md` and `tools/README.md`. 
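Reviewer note on the overlap detection task above: the token-similarity signal can be sketched in isolation as a Jaccard score over lowercase tokens. This is a minimal sketch, not the shipped module. The token pattern and the 0.28 default mirror `reuse_surface/overlaps.py` in this diff. The helper names `tokens` and `jaccard` and the sample strings are hypothetical.

```python
import re

# Same token pattern as overlaps.py in this diff: a lowercase letter followed
# by at least two more letters, digits, or hyphens (so tokens have 3+ chars).
TOKEN_RE = re.compile(r"[a-z][a-z0-9-]{2,}")


def tokens(text: str) -> set[str]:
    """Return the set of lowercase tokens found in text."""
    return set(TOKEN_RE.findall(text.lower()))


def jaccard(left: str, right: str) -> float:
    """Jaccard similarity of two token sets; 0.0 if either side has no tokens."""
    a, b = tokens(left), tokens(right)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample summaries, for illustration only.
score = jaccard("feature rollout control", "feature rollout rules")
is_candidate = score >= 0.28  # default threshold used by the CLI in this diff
```

A pair scoring at or above the threshold is only a candidate for review. In the diff, shared tags and relation signals can also surface a pair that scores below the threshold.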
+ +## Add Catalog Generation Command + +```task +id: REUSE-WP-0004-T03 +status: done +priority: medium +``` + +Add `reuse-surface catalog` that generates `docs/CapabilityCatalog.md` and +`docs/catalog/index.html` from the index and entry front matter. Group by domain +and show maturity vectors. + +## Refresh Docs And Gap Analysis + +```task +id: REUSE-WP-0004-T04 +status: done +priority: medium +``` + +Update `SCOPE.md`, `tools/README.md`, and `docs/IntentScopeGapAnalysis.md` to +reflect CI, overlaps, and catalog capabilities. Close gap analysis priorities +9–11. \ No newline at end of file
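
Reviewer note on the catalog generation task: the "group by domain and show maturity vectors" behaviour can be sketched roughly as follows. This is a simplified illustration assuming each index item is a flat dict with `id`, `name`, `vector`, and optionally `domain`. The function name `render_catalog` and the sample items are hypothetical, and the real `render_markdown` in this diff also emits owner, path, summary, and known limitations.

```python
from collections import defaultdict


def render_catalog(items: list[dict[str, str]]) -> str:
    """Render a minimal Markdown catalog grouped by domain, sorted by ID."""
    grouped: dict[str, list[dict[str, str]]] = defaultdict(list)
    for item in items:
        # Items without a domain fall back to "unknown", as in the diff.
        grouped[item.get("domain", "unknown")].append(item)

    lines = ["# Capability Catalog", ""]
    for domain in sorted(grouped):
        lines.extend([f"## {domain}", ""])
        for item in sorted(grouped[domain], key=lambda entry: entry["id"]):
            lines.extend(
                [
                    f"### {item['name']}",
                    "",
                    f"- **ID:** `{item['id']}`",
                    f"- **Vector:** {item['vector']}",
                    "",
                ]
            )
    # Trim trailing blank lines and end with exactly one newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical sample items, for illustration only.
sample = [
    {"id": "capability.b.two", "name": "Two", "vector": "D3 / A0 / C1 / R0",
     "domain": "helix_forge"},
    {"id": "capability.a.one", "name": "One", "vector": "D4 / A2 / C2 / R1",
     "domain": "helix_forge"},
]
catalog_text = render_catalog(sample)
```

Sorting both the domains and the entries keeps the generated file stable between runs, so regenerating the catalog only produces a diff when the registry itself changes.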