generated from coulomb/repo-seed
Complete REUSE-WP-0004: CI, overlap detection, and catalog generation
Some checks failed
ci / validate-registry (push) Has been cancelled
Add Gitea CI workflow for registry validation, reuse-surface overlaps and catalog commands, generated catalog artifacts, and documentation updates closing gap analysis priorities 9-11.
25
.gitea/workflows/ci.yml
Normal file
@@ -0,0 +1,25 @@
name: ci

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  validate-registry:
    runs-on: ubuntu-latest
    steps:
      - name: Check out source
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install package
        run: python -m pip install -e .

      - name: Validate capability registry
        run: reuse-surface validate
@@ -124,6 +124,10 @@ artifacts.
# Registry validation (schema + index drift)
.venv/bin/reuse-surface validate

# Overlap and catalog generation
.venv/bin/reuse-surface overlaps
.venv/bin/reuse-surface catalog

# Repository hygiene
rg --files
git diff --check
@@ -149,6 +153,9 @@ The generated instruction in older workplans says `make fix-consistency
REPO=reuse-surface`; that is still valid when `uv` is installed and on PATH.
On this workstation, the `.venv/bin/python` fallback has been verified.

CI runs `reuse-surface validate` on push and pull requests via
`.gitea/workflows/ci.yml`.

### Run

There is no local service to run from this repository.
14
SCOPE.md
@@ -50,7 +50,9 @@ and agents can:
  `external_evidence.reliability`
- **Validate entries automatically** with `reuse-surface validate`
- **Export a machine-readable bundle** with `reuse-surface export`
- **Avoid duplicates** by querying the index before creating new entries
- **Detect overlap candidates** with `reuse-surface overlaps`
- **Generate a human-readable catalog** with `reuse-surface catalog`
- **Avoid duplicates** by querying the index and checking overlaps before adding entries

Registry tooling availability is **A3** (CLI). The registry product itself is
still documentation-first for authoring; consumption combines Markdown entries,
@@ -58,11 +60,10 @@ the index, and CLI automation.

## What Is Not Possible Yet

- Generated human-readable catalog site
- Interactive catalog site with live search beyond static HTML export
- Capability graph visualization
- Automated duplicate/overlap detection
- Federation across repositories or organizations
- CI integration or packaged releases beyond local `pip install -e .`
- Packaged releases beyond local `pip install -e .` and Gitea CI validation

See `tools/README.md` for command reference.
@@ -74,7 +75,9 @@ See `tools/README.md` for command reference.
  `pyproject.toml` and `reuse_surface/`.
- `docs/CapabilityRegistryConcept.md` and `docs/IntentScopeGapAnalysis.md`
  document onboarding and intent-scope tracking.
- Finished workplans: `REUSE-WP-0001`, `REUSE-WP-0002`, `REUSE-WP-0003`.
- CI validates the registry on push/PR via `.gitea/workflows/ci.yml`.
- Generated catalog: `docs/CapabilityCatalog.md` and `docs/catalog/index.html`.
- Finished workplans: `REUSE-WP-0001` through `REUSE-WP-0004`.
- **Self-assessed vector:** `D5 / A3 / C4 / R2` (see gap analysis).

## Repository Layout
@@ -105,6 +108,7 @@ reuse-surface/
- Maturity standard: specs/CapabilityMaturityStandard.md
- Registry index: registry/indexes/capabilities.yaml
- Registry guidance: registry/README.md
- Generated catalog: docs/CapabilityCatalog.md
- CLI reference: tools/README.md
- Agent instructions: AGENTS.md
- Workplans: workplans/
78
docs/CapabilityCatalog.md
Normal file
@@ -0,0 +1,78 @@
# Capability Catalog

**Domain:** helix_forge
**Updated:** 2026-06-15
**Entries:** 6

Generated by `reuse-surface catalog`. Do not edit manually.

## helix_forge

### Feature Availability Evaluation

- **ID:** `capability.feature-control.evaluate`
- **Vector:** D5 / A4 / C3 / R3
- **Owner:** feature-control
- **Path:** `registry/capabilities/capability.feature-control.evaluate.md`
- **Summary:** Evaluate whether a feature is active, hidden, disabled, or unavailable for a subject in context.

**Known limitations:**
- bulk rule management is not yet covered
- agent-specific simulation remains a known gap

### Feature Rollout Control

- **ID:** `capability.feature-control.rollout`
- **Vector:** D4 / A2 / C2 / R1
- **Owner:** feature-control
- **Path:** `registry/capabilities/capability.feature-control.rollout.md`
- **Summary:** Gradually expose features to subjects across tenants, domains, groups, or cohorts using rollout rules and staged availability.

**Known limitations:**
- distinguish carefully from capability.feature-control.evaluate

### Identity Subject Resolution

- **ID:** `capability.identity.subject-resolution`
- **Vector:** D3 / A0 / C1 / R0
- **Owner:** identity-canon
- **Path:** `registry/capabilities/capability.identity.subject-resolution.md`
- **Summary:** Resolve who or what is acting in a context by mapping principals, accounts, actors, and identifiers to a stable subject model.

**Known limitations:**
- resolver artifacts are not yet available

### Identity Vocabulary Canonicalization

- **ID:** `capability.identity.vocabulary-canonicalize`
- **Vector:** D4 / A0 / C2 / R0
- **Owner:** identity-canon
- **Path:** `registry/capabilities/capability.identity.vocabulary-canonicalize.md`
- **Summary:** Define and maintain an implementation-neutral vocabulary for identity-related concepts across overlapping domains.

**Known limitations:**
- source-note backfill is incomplete
- mappings may remain candidate until evidence review completes

### Capability Registration

- **ID:** `capability.registry.register`
- **Vector:** D3 / A3 / C2 / R2
- **Owner:** reuse-surface
- **Path:** `registry/capabilities/capability.registry.register.md`
- **Summary:** Register a new capability so it becomes visible for planning and implementation reuse.

**Known limitations:**
- manual index updates are required after adding an entry
- duplicate detection is guidance-only in the MVP

### Workstream And Task Coordination

- **ID:** `capability.statehub.workstream-coordinate`
- **Vector:** D4 / A4 / C3 / R2
- **Owner:** state-hub
- **Path:** `registry/capabilities/capability.statehub.workstream-coordinate.md`
- **Summary:** Track active workstreams, tasks, progress, and consistency across domain repositories through a local-first coordination service.

**Known limitations:**
- requires running State Hub locally or via tunnel
@@ -265,12 +265,18 @@ own evidence (e.g. feature-control at R3).

### Next recommended work

| Priority | Gap | Outcome | Status |
|---|---|---|---|
| 9 | Catalog site | `reuse-surface catalog` → MD + HTML | Closed (WP-0004) |
| 10 | Overlap detection | `reuse-surface overlaps` | Closed (WP-0004) |
| 11 | CI validation | `.gitea/workflows/ci.yml` | Closed (WP-0004) |
| 12 | Registry federation | Cross-repo capability index composition | Open |

| Priority | Gap | Suggested outcome |
|---|---|---|
| 9 | Catalog site | Static browsable capability catalog (UC-RS-018) |
| 10 | Overlap detection | CLI or report for duplicate/overlapping capabilities |
| 11 | CI validation | Run `reuse-surface validate` in CI on registry changes |
| 12 | Registry federation | Cross-repo capability index composition |
| 13 | Interactive catalog | Searchable catalog UI beyond static HTML |
| 14 | Graph visualization | Capability relation graphs |
| 15 | Federation | Compose indexes across repositories |

---
@@ -289,4 +295,5 @@ own evidence (e.g. feature-control at R3).
| Date | Change |
|---|---|
| 2026-06-15 | Initial analysis after REUSE-WP-0002 completion |
| 2026-06-15 | REUSE-WP-0003 closed priority gaps 1–8; vector updated to D5/A3/C4/R2 |
| 2026-06-15 | REUSE-WP-0003 closed priority gaps 1–8; vector updated to D5/A3/C4/R2 |
| 2026-06-15 | REUSE-WP-0004 closed priorities 9–11 (catalog, overlaps, CI) |
57
docs/catalog/index.html
Normal file
@@ -0,0 +1,57 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Capability Catalog — helix_forge</title>
<style>
body { font-family: system-ui, sans-serif; margin: 2rem; line-height: 1.5; }
h1 { margin-bottom: 0.2rem; }
.subtitle { color: #555; margin-bottom: 2rem; }
section { margin-bottom: 2rem; }
.card { border: 1px solid #ddd; border-radius: 8px; padding: 1rem; margin: 1rem 0; }
.meta { color: #444; font-size: 0.95rem; }
.path { font-size: 0.85rem; color: #666; }
</style>
</head>
<body>
<h1>Capability Catalog</h1>
<p class="subtitle">Updated 2026-06-15 · 6 entries</p>
<section><h2>helix_forge</h2>
<article class="card">
<h3>Feature Availability Evaluation</h3>
<p class="meta"><code>capability.feature-control.evaluate</code> · D5 / A4 / C3 / R3</p>
<p>Evaluate whether a feature is active, hidden, disabled, or unavailable for a subject in context.</p>
<p class="path">registry/capabilities/capability.feature-control.evaluate.md</p>
</article>
<article class="card">
<h3>Feature Rollout Control</h3>
<p class="meta"><code>capability.feature-control.rollout</code> · D4 / A2 / C2 / R1</p>
<p>Gradually expose features to subjects across tenants, domains, groups, or cohorts using rollout rules and staged availability.</p>
<p class="path">registry/capabilities/capability.feature-control.rollout.md</p>
</article>
<article class="card">
<h3>Identity Subject Resolution</h3>
<p class="meta"><code>capability.identity.subject-resolution</code> · D3 / A0 / C1 / R0</p>
<p>Resolve who or what is acting in a context by mapping principals, accounts, actors, and identifiers to a stable subject model.</p>
<p class="path">registry/capabilities/capability.identity.subject-resolution.md</p>
</article>
<article class="card">
<h3>Identity Vocabulary Canonicalization</h3>
<p class="meta"><code>capability.identity.vocabulary-canonicalize</code> · D4 / A0 / C2 / R0</p>
<p>Define and maintain an implementation-neutral vocabulary for identity-related concepts across overlapping domains.</p>
<p class="path">registry/capabilities/capability.identity.vocabulary-canonicalize.md</p>
</article>
<article class="card">
<h3>Capability Registration</h3>
<p class="meta"><code>capability.registry.register</code> · D3 / A3 / C2 / R2</p>
<p>Register a new capability so it becomes visible for planning and implementation reuse.</p>
<p class="path">registry/capabilities/capability.registry.register.md</p>
</article>
<article class="card">
<h3>Workstream And Task Coordination</h3>
<p class="meta"><code>capability.statehub.workstream-coordinate</code> · D4 / A4 / C3 / R2</p>
<p>Track active workstreams, tasks, progress, and consistency across domain repositories through a local-first coordination service.</p>
<p class="path">registry/capabilities/capability.statehub.workstream-coordinate.md</p>
</article></section>
</body>
</html>
@@ -117,6 +117,8 @@ Compare vectors side by side and read:

### UC-RS-015 — Detect duplicate or overlapping capabilities

Run `reuse-surface overlaps` for automated candidate detection, then review:

Check for overlap in:

- similar `name` or `summary`
122
reuse_surface/catalog.py
Normal file
@@ -0,0 +1,122 @@
from __future__ import annotations

import html
from collections import defaultdict
from pathlib import Path
from typing import Any

ROOT = Path(__file__).resolve().parent.parent
CATALOG_MD = ROOT / "docs" / "CapabilityCatalog.md"
CATALOG_HTML_DIR = ROOT / "docs" / "catalog"
CATALOG_HTML = CATALOG_HTML_DIR / "index.html"


def _grouped_capabilities(
    indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]],
) -> dict[str, list[tuple[dict[str, Any], dict[str, Any]]]]:
    grouped: dict[str, list[tuple[dict[str, Any], dict[str, Any]]]] = defaultdict(
        list
    )
    for index_item, entry in indexed_entries:
        domain = index_item.get("domain", "unknown")
        grouped[domain].append((index_item, entry))
    return dict(sorted(grouped.items()))


def render_markdown(
    index: dict[str, Any],
    indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]],
) -> str:
    lines = [
        "# Capability Catalog",
        "",
        f"**Domain:** {index.get('domain', 'unknown')} ",
        f"**Updated:** {index.get('updated', 'unknown')} ",
        f"**Entries:** {len(indexed_entries)}",
        "",
        "Generated by `reuse-surface catalog`. Do not edit manually.",
        "",
    ]
    for domain, items in _grouped_capabilities(indexed_entries).items():
        lines.extend([f"## {domain}", ""])
        for index_item, entry in sorted(items, key=lambda pair: pair[0]["id"]):
            lines.extend(
                [
                    f"### {index_item['name']}",
                    "",
                    f"- **ID:** `{index_item['id']}`",
                    f"- **Vector:** {index_item['vector']}",
                    f"- **Owner:** {index_item.get('owner', 'unknown')}",
                    f"- **Path:** `{index_item['path']}`",
                    f"- **Summary:** {index_item['summary']}",
                    "",
                ]
            )
            guidance = entry.get("consumer_guidance") or {}
            limitations = guidance.get("known_limitations") or []
            if limitations:
                lines.append("**Known limitations:**")
                lines.extend(f"- {item}" for item in limitations)
                lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def render_html(
    index: dict[str, Any],
    indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]],
) -> str:
    sections: list[str] = []
    for domain, items in _grouped_capabilities(indexed_entries).items():
        cards: list[str] = []
        for index_item, entry in sorted(items, key=lambda pair: pair[0]["id"]):
            name = html.escape(index_item["name"])
            summary = html.escape(index_item["summary"])
            cap_id = html.escape(index_item["id"])
            vector = html.escape(index_item["vector"])
            path = html.escape(index_item["path"])
            cards.append(
                f"""<article class="card">
<h3>{name}</h3>
<p class="meta"><code>{cap_id}</code> · {vector}</p>
<p>{summary}</p>
<p class="path">{path}</p>
</article>"""
            )
        sections.append(
            f"<section><h2>{html.escape(domain)}</h2>\n" + "\n".join(cards) + "</section>"
        )

    body = "\n".join(sections)
    title = html.escape(f"Capability Catalog — {index.get('domain', 'unknown')}")
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{title}</title>
<style>
body {{ font-family: system-ui, sans-serif; margin: 2rem; line-height: 1.5; }}
h1 {{ margin-bottom: 0.2rem; }}
.subtitle {{ color: #555; margin-bottom: 2rem; }}
section {{ margin-bottom: 2rem; }}
.card {{ border: 1px solid #ddd; border-radius: 8px; padding: 1rem; margin: 1rem 0; }}
.meta {{ color: #444; font-size: 0.95rem; }}
.path {{ font-size: 0.85rem; color: #666; }}
</style>
</head>
<body>
<h1>Capability Catalog</h1>
<p class="subtitle">Updated {html.escape(str(index.get('updated', 'unknown')))} · {len(indexed_entries)} entries</p>
{body}
</body>
</html>
"""


def write_catalog(
    index: dict[str, Any],
    indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]],
) -> tuple[Path, Path]:
    CATALOG_HTML_DIR.mkdir(parents=True, exist_ok=True)
    CATALOG_MD.write_text(render_markdown(index, indexed_entries), encoding="utf-8")
    CATALOG_HTML.write_text(render_html(index, indexed_entries), encoding="utf-8")
    return CATALOG_MD, CATALOG_HTML
@@ -9,9 +9,9 @@ from typing import Any
import yaml
from jsonschema import Draft202012Validator

from reuse_surface.catalog import write_catalog
from reuse_surface.overlaps import find_overlaps
from reuse_surface.registry import (
    CAPABILITIES_DIR,
    INDEX_PATH,
    ROOT,
    capability_paths,
    level_at_least,
@@ -115,6 +115,40 @@ def cmd_query(args: argparse.Namespace) -> int:
    return 0


def _load_indexed_entries() -> list[tuple[dict[str, Any], dict[str, Any]]]:
    index = load_index()
    indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]] = []
    for item in index.get("capabilities", []):
        path = ROOT / item["path"]
        indexed_entries.append((item, parse_front_matter(path)))
    return indexed_entries


def cmd_overlaps(args: argparse.Namespace) -> int:
    indexed_entries = _load_indexed_entries()
    candidates = find_overlaps(indexed_entries, threshold=args.threshold)
    if not candidates:
        print("no overlap candidates")
        return 0
    for candidate in candidates:
        reasons = "; ".join(candidate.reasons)
        print(
            f"{candidate.left_id} <> {candidate.right_id} "
            f"score={candidate.score:.2f} {reasons}"
        )
    print(f"\n{len(candidates)} candidate{'s' if len(candidates) != 1 else ''}")
    return 0


def cmd_catalog(args: argparse.Namespace) -> int:
    index = load_index()
    indexed_entries = _load_indexed_entries()
    md_path, html_path = write_catalog(index, indexed_entries)
    print(f"ok: wrote {md_path.relative_to(ROOT)}")
    print(f"ok: wrote {html_path.relative_to(ROOT)}")
    return 0


def cmd_export(args: argparse.Namespace) -> int:
    index = load_index()
    bundle: dict[str, Any] = {
@@ -184,6 +218,22 @@ def main(argv: list[str] | None = None) -> int:
    )
    export.set_defaults(func=cmd_export)

    overlaps = subparsers.add_parser(
        "overlaps", help="detect potential duplicate capabilities"
    )
    overlaps.add_argument(
        "--threshold",
        type=float,
        default=0.28,
        help="token similarity threshold (0-1)",
    )
    overlaps.set_defaults(func=cmd_overlaps)

    catalog = subparsers.add_parser(
        "catalog", help="generate human-readable capability catalog"
    )
    catalog.set_defaults(func=cmd_catalog)

    args = parser.parse_args(argv)
    return args.func(args)
87
reuse_surface/overlaps.py
Normal file
@@ -0,0 +1,87 @@
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any

TOKEN_RE = re.compile(r"[a-z][a-z0-9-]{2,}")


@dataclass
class OverlapCandidate:
    left_id: str
    right_id: str
    score: float
    reasons: list[str]


def _tokens(text: str) -> set[str]:
    return set(TOKEN_RE.findall(text.lower()))


def _entry_blob(entry: dict[str, Any], index_item: dict[str, Any]) -> str:
    discovery = entry.get("discovery") or {}
    parts = [
        index_item.get("name", ""),
        index_item.get("summary", ""),
        entry.get("id", ""),
        " ".join(index_item.get("tags", [])),
        discovery.get("intent", ""),
        " ".join(discovery.get("includes", [])),
    ]
    return " ".join(str(part) for part in parts if part)


def _relation_overlap(left: dict[str, Any], right: dict[str, Any]) -> list[str]:
    reasons: list[str] = []
    left_id = left["id"]
    right_id = right["id"]
    relations = left.get("relations") or {}
    for relation_type, targets in relations.items():
        if not isinstance(targets, list):
            continue
        if right_id in targets:
            reasons.append(f"relation:{relation_type}")
    if left_id.split(".")[1] == right_id.split(".")[1]:
        reasons.append("shared domain segment")
    return reasons


def find_overlaps(
    indexed_entries: list[tuple[dict[str, Any], dict[str, Any]]],
    *,
    threshold: float = 0.28,
) -> list[OverlapCandidate]:
    candidates: list[OverlapCandidate] = []
    blobs = [
        (_entry_blob(entry, index_item), index_item["id"], entry)
        for index_item, entry in indexed_entries
    ]

    for i, (left_blob, left_id, left_entry) in enumerate(blobs):
        left_tokens = _tokens(left_blob)
        for j in range(i + 1, len(blobs)):
            right_blob, right_id, right_entry = blobs[j]
            right_tokens = _tokens(right_blob)
            if not left_tokens or not right_tokens:
                continue
            score = len(left_tokens & right_tokens) / len(left_tokens | right_tokens)
            reasons: list[str] = []
            if score >= threshold:
                reasons.append(f"token similarity {score:.2f}")
            shared_tags = set(left_entry.get("tags", [])) & set(
                right_entry.get("tags", [])
            )
            if shared_tags:
                reasons.append(f"shared tags: {', '.join(sorted(shared_tags))}")
            reasons.extend(_relation_overlap(left_entry, right_entry))
            if reasons and (score >= threshold or len(reasons) > 1):
                candidates.append(
                    OverlapCandidate(
                        left_id=left_id,
                        right_id=right_id,
                        score=score,
                        reasons=reasons,
                    )
                )
    return sorted(candidates, key=lambda item: item.score, reverse=True)
@@ -42,6 +42,25 @@ reuse-surface export
reuse-surface export --format json
```

### overlaps

Detect potential duplicate or overlapping capabilities (UC-RS-015).

```bash
reuse-surface overlaps
reuse-surface overlaps --threshold 0.35
```
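
To make `--threshold` concrete, here is a minimal sketch of the token-similarity score, modeled on `_tokens` and the Jaccard ratio in `reuse_surface/overlaps.py`. The helper names `tokens` and `jaccard`, the two sample summaries, and the 0.50 threshold are assumptions for illustration. The real command also weighs shared tags and relation signals, which this sketch leaves out.

```python
import re

# Same token pattern as TOKEN_RE in reuse_surface/overlaps.py:
# a lowercase letter followed by at least two letters, digits, or hyphens.
TOKEN_RE = re.compile(r"[a-z][a-z0-9-]{2,}")


def tokens(text: str) -> set[str]:
    """Lowercase the text and collect its distinct tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def jaccard(left: str, right: str) -> float:
    """Shared tokens divided by all distinct tokens across both texts."""
    a, b = tokens(left), tokens(right)
    if not a or not b:
        # Assumption for this sketch: return 0.0 for empty input.
        # find_overlaps skips such pairs instead of scoring them.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical summaries, not taken from the registry.
left = "Evaluate feature availability for a subject"
right = "Feature rollout control for a subject cohort"
score = jaccard(left, right)

for threshold in (0.28, 0.35, 0.50):
    verdict = "candidate" if score >= threshold else "ignored"
    print(f"threshold={threshold:.2f} -> {verdict}")
```

The two sample summaries share 3 of 8 distinct tokens, so they score 0.375: the pair is a candidate at the default 0.28 and at 0.35, and would be ignored at 0.50.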

### catalog

Generate human-readable catalog artifacts (UC-RS-018).

```bash
reuse-surface catalog
```

Writes `docs/CapabilityCatalog.md` and `docs/catalog/index.html`.
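
The HTML artifact escapes every index field before it is placed into a card. The sketch below illustrates that step in the manner of `render_html` in `reuse_surface/catalog.py`. The function name `render_card` and the sample entry are hypothetical; they are not part of the package.

```python
import html


def render_card(item: dict[str, str]) -> str:
    """Build one catalog card, escaping each field for safe HTML output."""
    name = html.escape(item["name"])
    cap_id = html.escape(item["id"])
    vector = html.escape(item["vector"])
    summary = html.escape(item["summary"])
    path = html.escape(item["path"])
    return (
        '<article class="card">\n'
        f"<h3>{name}</h3>\n"
        f'<p class="meta"><code>{cap_id}</code> · {vector}</p>\n'
        f"<p>{summary}</p>\n"
        f'<p class="path">{path}</p>\n'
        "</article>"
    )


# Hypothetical entry whose name contains characters that need escaping.
card = render_card(
    {
        "name": "Search & <Replace>",
        "id": "capability.example.search",
        "vector": "D1 / A0 / C0 / R0",
        "summary": "Illustrative entry only.",
        "path": "registry/capabilities/capability.example.search.md",
    }
)
print(card)
```

Escaping at render time means a stray `<` or `&` in a registry summary cannot break the generated page.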

## Export format

The export bundle includes:
@@ -59,6 +78,8 @@ Stable IDs and maturity fields are preserved for agent consumption (UC-RS-019).
| Discover capabilities | `reuse-surface query` or read the index |
| Validate entry shape | `reuse-surface validate` |
| Export for agents | `reuse-surface export --format json` |
| Detect overlap | `reuse-surface overlaps` |
| Publish catalog | `reuse-surface catalog` |

## Related use cases
66
workplans/REUSE-WP-0004-registry-hardening.md
Normal file
@@ -0,0 +1,66 @@
---
id: REUSE-WP-0004
type: workplan
title: "Registry hardening: CI, overlap detection, and catalog"
domain: helix_forge
repo: reuse-surface
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-15"
updated: "2026-06-15"
---

# Registry hardening: CI, overlap detection, and catalog

Follow-up to `docs/IntentScopeGapAnalysis.md` section 8 next recommended work
(priorities 9–11). Raise registry quality through automated CI validation, overlap
reporting (UC-RS-015), and a generated human-readable catalog (UC-RS-018).

## Add CI Validation Workflow

```task
id: REUSE-WP-0004-T01
status: done
priority: high
```

Add `.gitea/workflows/ci.yml` that runs on push and pull requests to `main`.
Install the package and run `reuse-surface validate`. Document the workflow in
`AGENTS.md`.

## Add Overlap Detection Command

```task
id: REUSE-WP-0004-T02
status: done
priority: high
```

Add `reuse-surface overlaps` that flags potential duplicate or overlapping
capabilities using summary/tags/includes similarity and relation signals.
Document usage in `registry/README.md` and `tools/README.md`.

## Add Catalog Generation Command

```task
id: REUSE-WP-0004-T03
status: done
priority: medium
```

Add `reuse-surface catalog` that generates `docs/CapabilityCatalog.md` and
`docs/catalog/index.html` from the index and entry front matter. Group by domain
and show maturity vectors.
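
A minimal sketch of the group-by-domain step, assuming index items are plain dicts with `id`, `vector`, and an optional `domain` key. The helper names `group_by_domain` and `render_lines` and the `capability.example.orphan` entry are hypothetical. The committed code in `reuse_surface/catalog.py` works on `(index_item, entry)` pairs and emits more fields.

```python
from collections import defaultdict
from typing import Any


def group_by_domain(items: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket items by domain; sort domains by name and items by ID."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        # Items without a domain fall into an "unknown" bucket.
        grouped[item.get("domain", "unknown")].append(item)
    return {
        domain: sorted(entries, key=lambda entry: entry["id"])
        for domain, entries in sorted(grouped.items())
    }


def render_lines(items: list[dict[str, Any]]) -> list[str]:
    """Emit one heading per domain and one bullet per capability."""
    lines: list[str] = []
    for domain, entries in group_by_domain(items).items():
        lines.append(f"## {domain}")
        for entry in entries:
            lines.append(f"- `{entry['id']}` ({entry['vector']})")
    return lines


items = [
    {"id": "capability.registry.register", "domain": "helix_forge",
     "vector": "D3 / A3 / C2 / R2"},
    {"id": "capability.feature-control.evaluate", "domain": "helix_forge",
     "vector": "D5 / A4 / C3 / R3"},
    # Hypothetical entry with no domain, to show the fallback bucket.
    {"id": "capability.example.orphan", "vector": "D0 / A0 / C0 / R0"},
]
lines = render_lines(items)
print("\n".join(lines))
```

Sorting both the domains and the IDs keeps the generated files stable between runs, so a regenerated catalog only changes when the registry does.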

## Refresh Docs And Gap Analysis

```task
id: REUSE-WP-0004-T04
status: done
priority: medium
```

Update `SCOPE.md`, `tools/README.md`, and `docs/IntentScopeGapAnalysis.md` to
reflect CI, overlaps, and catalog capabilities. Close gap analysis priorities
9–11.