diff --git a/docs/characteristic-evidence-model.md b/docs/characteristic-evidence-model.md index 0c1b25b..90f65f7 100644 --- a/docs/characteristic-evidence-model.md +++ b/docs/characteristic-evidence-model.md @@ -30,6 +30,33 @@ organized under the wrong capability. Observed facts are deterministic scanner output. They describe what was seen in the repository: files, languages, frameworks, routes, tests, documentation, provider names, configuration variables, and similar source-linked observations. +Facts can carry a source role so generation can separate product evidence from +ambient context. Important roles include: + +- `intent_summary`: `INTENT.md` or equivalent design-intent material describing + why the repository should exist and what utility it is meant to provide. +- `derived_scope`: `SCOPE.md` or equivalent current-scope material. This is a + derived or curated description of what is believed to be true now, not primary + evidence for rebuilding the same characteristic model. +- `product_documentation`: README, docs, specifications, and user-facing guides. +- `implementation_source`: source code owned by the repository. +- `dependency_declaration`: manifests, imports, lockfiles, and package metadata. +- `configuration`, `ci_tooling`, `test_evidence`, and `agent_guidance`. + +`INTENT.md` and `SCOPE.md` deliberately answer different questions. Intent is a +design artifact: what the repository is supposed to become or provide. Scope is +a derived current-state artifact: what the repository is understood to provide +after evidence and review. A good `SCOPE.md` is valuable context, but using it +as ordinary evidence for generated characteristics creates a circular model. +Rebuilds should therefore prefer `INTENT.md`, product documentation, source, and +tests; `SCOPE.md` should be used as comparison material or explicit bootstrap +input only when a curator chooses that mode. 
+ +For repositories that already have a useful `SCOPE.md` but no `INTENT.md`, +repo-scoping can perform a one-time bootstrap by copying the scope text into a +new intent file with a clear provenance note. After that bootstrap, the files +should diverge naturally: `INTENT.md` remains design intent, while `SCOPE.md` +remains generated or curated current scope. Source references point from interpreted claims back to files or facts. diff --git a/docs/terminology.md b/docs/terminology.md index f692e99..a19f1a4 100644 --- a/docs/terminology.md +++ b/docs/terminology.md @@ -42,6 +42,20 @@ normalization. facts or to lower-level characteristics. - Observed fact: deterministic scanner output such as files, manifests, languages, tests, APIs, routes, commands, or documentation references. +- Intent: a design-time statement of expected repository utility. `INTENT.md` + is the preferred file for this. It can guide candidate generation because it + describes why the repository should exist. +- Derived scope: a current-state statement of what the repository is understood + to provide. `SCOPE.md` is the preferred file for this. It is generated or + curated from evidence and approved characteristics, so it should not be used + as ordinary evidence for rebuilding those same characteristics. +- Intent bootstrap: a one-time migration that creates `INTENT.md` from an + existing `SCOPE.md` when no intent file exists. The generated file carries a + provenance note and should be reviewed as design intent. +- Source role: provenance metadata on a fact or content chunk, such as + `intent_summary`, `derived_scope`, `product_documentation`, + `implementation_source`, `dependency_declaration`, `configuration`, + `ci_tooling`, `test_evidence`, or `agent_guidance`. - Candidate: proposed characteristic or evidence from deterministic heuristics or optional LLM assistance. Candidates are review inputs, not registry truth. 
- Approved: curated registry truth that appears in ability maps, search, exports, diff --git a/src/repo_registry/candidate_graph/generator.py b/src/repo_registry/candidate_graph/generator.py index e9978f6..838e6f5 100644 --- a/src/repo_registry/candidate_graph/generator.py +++ b/src/repo_registry/candidate_graph/generator.py @@ -63,8 +63,7 @@ class CandidateGraphGenerator: return [] chunks = chunks or [] - scope_docs = self._facts(facts, "scope") - docs = scope_docs + self._facts(facts, "documentation") + docs = self._facts(facts, "intent") + self._facts(facts, "documentation") tests = self._facts(facts, "test") examples = self._facts(facts, "example") interfaces = self._facts(facts, "interface") @@ -662,7 +661,7 @@ class CandidateGraphGenerator: def _document_purpose_sentence(self, chunks: list[ContentChunk]) -> str: for chunk in self._documentation_chunks(chunks): - if chunk.kind not in {"scope", "documentation"}: + if chunk.kind not in {"intent", "documentation"}: continue lines = [line.strip() for line in chunk.text.splitlines() if line.strip()] paragraph = next((line for line in lines if not line.startswith("#")), "") @@ -745,8 +744,8 @@ class CandidateGraphGenerator: def _documentation_chunks(self, chunks: list[ContentChunk]) -> list[ContentChunk]: return sorted( - [chunk for chunk in chunks if chunk.kind in {"scope", "documentation"}], - key=lambda chunk: (0 if chunk.kind == "scope" else 1, chunk.path, chunk.start_line), + [chunk for chunk in chunks if chunk.kind in {"intent", "documentation"}], + key=lambda chunk: (0 if chunk.kind == "intent" else 1, chunk.path, chunk.start_line), ) def _interface_summary(self, chunks: list[ContentChunk]) -> str: diff --git a/src/repo_registry/content_indexing/extractor.py b/src/repo_registry/content_indexing/extractor.py index ed216ab..96854cc 100644 --- a/src/repo_registry/content_indexing/extractor.py +++ b/src/repo_registry/content_indexing/extractor.py @@ -7,6 +7,7 @@ from repo_registry.core.models import ObservedFact 
INDEXED_FACT_KINDS = { + "intent", "scope", "documentation", "example", diff --git a/src/repo_registry/intent/__init__.py b/src/repo_registry/intent/__init__.py new file mode 100644 index 0000000..033d548 --- /dev/null +++ b/src/repo_registry/intent/__init__.py @@ -0,0 +1 @@ +"""Intent-file helpers for repository scoping.""" diff --git a/src/repo_registry/intent/bootstrap.py b/src/repo_registry/intent/bootstrap.py new file mode 100644 index 0000000..020754c --- /dev/null +++ b/src/repo_registry/intent/bootstrap.py @@ -0,0 +1,130 @@ +from __future__ import annotations + +import argparse +from dataclasses import dataclass +from datetime import date +from pathlib import Path +from typing import Iterable + + +BOOTSTRAP_NOTE = ( + "> Bootstrapped from `SCOPE.md` by repo-scoping.\n" + "> Review and edit this file as design intent. `SCOPE.md` remains the\n" + "> derived current-scope artifact." +) + + +@dataclass(frozen=True) +class IntentBootstrapResult: + repo_path: str + scope_path: str + intent_path: str + status: str + message: str + + +def bootstrap_intent_from_scope( + repo_path: str | Path, + *, + dry_run: bool = False, + overwrite: bool = False, + today: date | None = None, +) -> IntentBootstrapResult: + root = Path(repo_path).expanduser().resolve() + scope_path = root / "SCOPE.md" + intent_path = root / "INTENT.md" + + if not root.is_dir(): + return _result(root, scope_path, intent_path, "missing_repo", "repository path does not exist") + if not scope_path.is_file(): + return _result(root, scope_path, intent_path, "missing_scope", "SCOPE.md is not present") + if intent_path.exists() and not overwrite: + return _result(root, scope_path, intent_path, "exists", "INTENT.md already exists") + + status = "would_overwrite" if intent_path.exists() else "would_create" + if dry_run: + return _result(root, scope_path, intent_path, status, f"{status} INTENT.md from SCOPE.md") + + intent_text = scope_to_intent_text( + scope_path.read_text(encoding="utf-8"), + today=today, + 
) + intent_path.write_text(intent_text, encoding="utf-8") + created_status = "overwritten" if status == "would_overwrite" else "created" + return _result(root, scope_path, intent_path, created_status, f"{created_status} INTENT.md from SCOPE.md") + + +def bootstrap_many( + repo_paths: Iterable[str | Path], + *, + dry_run: bool = False, + overwrite: bool = False, + today: date | None = None, +) -> list[IntentBootstrapResult]: + return [ + bootstrap_intent_from_scope( + repo_path, + dry_run=dry_run, + overwrite=overwrite, + today=today, + ) + for repo_path in repo_paths + ] + + +def scope_to_intent_text(scope_text: str, *, today: date | None = None) -> str: + current_date = today or date.today() + lines = scope_text.splitlines() + while lines and not lines[0].strip(): + lines.pop(0) + + if lines and lines[0].lstrip().lower().startswith("# scope"): + lines[0] = "# INTENT" + elif not lines or not lines[0].startswith("#"): + lines.insert(0, "# INTENT") + + note = f"{BOOTSTRAP_NOTE}\n> Bootstrap date: {current_date.isoformat()}" + insert_at = 1 if lines else 0 + while insert_at < len(lines) and not lines[insert_at].strip(): + insert_at += 1 + lines[insert_at:insert_at] = ["", note, ""] + return "\n".join(lines).rstrip() + "\n" + + +def _result( + root: Path, + scope_path: Path, + intent_path: Path, + status: str, + message: str, +) -> IntentBootstrapResult: + return IntentBootstrapResult( + repo_path=str(root), + scope_path=str(scope_path), + intent_path=str(intent_path), + status=status, + message=message, + ) + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser( + description="Bootstrap INTENT.md from SCOPE.md for repositories that do not have intent files yet." 
+ ) + parser.add_argument("repo_paths", nargs="+", help="Repository checkout path(s) to inspect") + parser.add_argument("--dry-run", action="store_true", help="Report planned writes without writing files") + parser.add_argument("--overwrite", action="store_true", help="Overwrite existing INTENT.md files") + args = parser.parse_args(argv) + + results = bootstrap_many( + args.repo_paths, + dry_run=args.dry_run, + overwrite=args.overwrite, + ) + for result in results: + print(f"{result.status}\t{result.repo_path}\t{result.message}") + return 1 if any(result.status in {"missing_repo", "missing_scope"} for result in results) else 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/repo_registry/repo_scanning/scanner.py b/src/repo_registry/repo_scanning/scanner.py index db77d85..4156a51 100644 --- a/src/repo_registry/repo_scanning/scanner.py +++ b/src/repo_registry/repo_scanning/scanner.py @@ -180,13 +180,22 @@ class DeterministicScanner: name = path.name.lower() source_role = self._source_role(relative) - if name == "scope.md": + if name == "intent.md": + facts.append( + FactCandidate( + "intent", + "INTENT", + relative, + metadata={"source_role": "intent_summary"}, + ) + ) + elif name == "scope.md": facts.append( FactCandidate( "scope", "SCOPE", relative, - metadata={"source_role": "scope_summary"}, + metadata={"source_role": "derived_scope"}, ) ) elif name.startswith("readme"): @@ -429,8 +438,10 @@ class DeterministicScanner: lower = relative_path.lower() parts = lower.split("/") name = parts[-1] + if name == "intent.md": + return "intent_summary" if name == "scope.md": - return "scope_summary" + return "derived_scope" if name in AGENT_GUIDANCE_FILES or any(part in AGENT_GUIDANCE_DIRS for part in parts): return "agent_guidance" if lower.startswith((".github/workflows/", ".gitea/workflows/")): diff --git a/tests/test_candidate_graph.py b/tests/test_candidate_graph.py index 56fd0f4..0d0305b 100644 --- a/tests/test_candidate_graph.py +++ 
b/tests/test_candidate_graph.py @@ -108,6 +108,51 @@ def test_candidate_generator_enriches_descriptions_from_content_chunks(): assert '@app.post("/classify")' in graph[0].capabilities[0].description +def test_candidate_generator_prefers_intent_over_derived_scope_chunks(): + repository = Repository( + id=1, + name="KeyCape", + url="/tmp/key-cape", + description=None, + branch="main", + status="analyzed", + ) + facts = [ + fact(1, "intent", "INTENT", "INTENT.md"), + fact(2, "scope", "SCOPE", "SCOPE.md"), + fact(3, "documentation", "README", "README.md"), + ] + chunks = [ + chunk( + 1, + "scope", + "SCOPE.md", + "# SCOPE\nAlready provides deployed IAM runtime behavior.", + end_line=2, + ), + chunk( + 2, + "intent", + "INTENT.md", + "# INTENT\nDesign a lightweight IAM profile implementation.", + end_line=2, + ), + chunk( + 3, + "documentation", + "README.md", + "# KeyCape\nREADME fallback should not beat intent.", + end_line=2, + ), + ] + + graph = CandidateGraphGenerator().generate(repository, facts, chunks) + + assert graph[0].name == "Design A Lightweight IAM Profile Implementation" + assert "INTENT. 
Design a lightweight IAM profile implementation" in graph[0].description + assert graph[0].source_refs[0].path == "INTENT.md" + + def test_candidate_confidence_scoring_stays_conservative_for_weak_facts(): repository = Repository( id=1, diff --git a/tests/test_content_indexing.py b/tests/test_content_indexing.py index e7cb07e..46c153e 100644 --- a/tests/test_content_indexing.py +++ b/tests/test_content_indexing.py @@ -86,18 +86,18 @@ def test_content_extractor_chunks_provider_related_config(tmp_path): assert "OPENROUTER_API_KEY" in chunks[0].text -def test_content_extractor_preserves_source_role_metadata(tmp_path): +def test_content_extractor_preserves_intent_source_role_metadata(tmp_path): repo = tmp_path / "repo" repo.mkdir() - (repo / "SCOPE.md").write_text("# SCOPE\n\nProvides OIDC.\n", encoding="utf-8") + (repo / "INTENT.md").write_text("# INTENT\n\nProvide OIDC.\n", encoding="utf-8") chunks = ContentExtractor().extract( repo, [ - fact(1, "scope", "SCOPE", "SCOPE.md", source_role="scope_summary"), + fact(1, "intent", "INTENT", "INTENT.md", source_role="intent_summary"), ], ) assert len(chunks) == 1 - assert chunks[0].kind == "scope" - assert chunks[0].metadata["source_role"] == "scope_summary" + assert chunks[0].kind == "intent" + assert chunks[0].metadata["source_role"] == "intent_summary" diff --git a/tests/test_intent_bootstrap.py b/tests/test_intent_bootstrap.py new file mode 100644 index 0000000..eccbcf0 --- /dev/null +++ b/tests/test_intent_bootstrap.py @@ -0,0 +1,51 @@ +from datetime import date + +from repo_registry.intent.bootstrap import bootstrap_intent_from_scope, scope_to_intent_text + + +def test_scope_to_intent_text_replaces_scope_heading_and_marks_bootstrap(): + text = scope_to_intent_text( + "# SCOPE.md - Demo\n\n## One-liner\n\nCurrent utility.\n", + today=date(2026, 5, 2), + ) + + assert text.startswith("# INTENT\n\n") + assert "Bootstrapped from `SCOPE.md`" in text + assert "Bootstrap date: 2026-05-02" in text + assert "## 
One-liner\n\nCurrent utility." in text + + +def test_bootstrap_intent_from_scope_creates_intent_when_missing(tmp_path): + repo = tmp_path / "repo" + repo.mkdir() + (repo / "SCOPE.md").write_text("# SCOPE\n\nProvides search.\n", encoding="utf-8") + + result = bootstrap_intent_from_scope(repo, today=date(2026, 5, 2)) + + assert result.status == "created" + intent_text = (repo / "INTENT.md").read_text(encoding="utf-8") + assert intent_text.startswith("# INTENT") + assert "Provides search." in intent_text + + +def test_bootstrap_intent_from_scope_does_not_overwrite_existing_intent(tmp_path): + repo = tmp_path / "repo" + repo.mkdir() + (repo / "SCOPE.md").write_text("# SCOPE\n", encoding="utf-8") + (repo / "INTENT.md").write_text("# INTENT\n\nKeep me.\n", encoding="utf-8") + + result = bootstrap_intent_from_scope(repo) + + assert result.status == "exists" + assert (repo / "INTENT.md").read_text(encoding="utf-8") == "# INTENT\n\nKeep me.\n" + + +def test_bootstrap_intent_from_scope_dry_run_reports_without_writing(tmp_path): + repo = tmp_path / "repo" + repo.mkdir() + (repo / "SCOPE.md").write_text("# SCOPE\n", encoding="utf-8") + + result = bootstrap_intent_from_scope(repo, dry_run=True) + + assert result.status == "would_create" + assert not (repo / "INTENT.md").exists() diff --git a/tests/test_repository_scanner.py b/tests/test_repository_scanner.py index 1af2993..3b44228 100644 --- a/tests/test_repository_scanner.py +++ b/tests/test_repository_scanner.py @@ -42,20 +42,29 @@ def test_deterministic_scanner_extracts_structural_facts(tmp_path): assert languages == {"Python": 2} -def test_scanner_records_scope_with_source_role(tmp_path): +def test_scanner_records_intent_and_scope_with_distinct_source_roles(tmp_path): repo = tmp_path / "sample" repo.mkdir() + (repo / "INTENT.md").write_text( + "# INTENT\n\nProvides planned OIDC profile enforcement.\n", + encoding="utf-8", + ) (repo / "SCOPE.md").write_text( - "# SCOPE\n\n## One-liner\n\nProvides OIDC profile 
enforcement.\n", + "# SCOPE\n\n## One-liner\n\nCurrently provides OIDC profile enforcement.\n", encoding="utf-8", ) result = DeterministicScanner().scan(repo) + intent_fact = next(fact for fact in result.facts if fact.kind == "intent") + assert intent_fact.name == "INTENT" + assert intent_fact.path == "INTENT.md" + assert intent_fact.metadata["source_role"] == "intent_summary" + scope_fact = next(fact for fact in result.facts if fact.kind == "scope") assert scope_fact.name == "SCOPE" assert scope_fact.path == "SCOPE.md" - assert scope_fact.metadata["source_role"] == "scope_summary" + assert scope_fact.metadata["source_role"] == "derived_scope" def test_scanner_readme_only_fixture_records_docs_without_interfaces(tmp_path): diff --git a/workplans/RREG-WP-0009-provenance-aware-characteristic-rebuild.md b/workplans/RREG-WP-0009-provenance-aware-characteristic-rebuild.md index 0fb4f60..7adcf94 100644 --- a/workplans/RREG-WP-0009-provenance-aware-characteristic-rebuild.md +++ b/workplans/RREG-WP-0009-provenance-aware-characteristic-rebuild.md @@ -23,8 +23,9 @@ dependency, import, or operational convention mentioned in its files. The target behavior is facts-first and provenance-aware: - Deterministic scanning observes facts without over-interpreting them. -- Facts carry source roles such as product documentation, scope summary, - implementation source, dependency declaration, agent guidance, or CI/tooling. +- Facts carry source roles such as intent summary, derived scope, product + documentation, implementation source, dependency declaration, agent guidance, + or CI/tooling. - Characteristic generation promotes only repository-owned utility unless the repository clearly acts as a facade or adapter for another capability. - Rebuild workflows can discard old approved characteristics and regenerate a @@ -44,7 +45,12 @@ generation can distinguish product evidence from ambient context. Initial source roles: -- `scope_summary`: `SCOPE.md` and other canonical scope files. 
+- `intent_summary`: `INTENT.md` and other design-intent files that describe why + the repository should exist and what utility it is meant to provide. +- `derived_scope`: `SCOPE.md` and other generated or curated current-scope + files. These are valuable context, but should not be treated as primary + evidence for regenerating characteristics unless a curator explicitly chooses + a bootstrap/import mode. - `product_documentation`: README, docs, specifications, user-facing guides. - `implementation_source`: code files owned by the repository. - `test_evidence`: test and acceptance files. @@ -59,8 +65,10 @@ Initial source roles: Acceptance criteria: - Observed facts can carry a source role in metadata without breaking existing storage or API consumers. -- `SCOPE.md` is indexed as `scope_summary` and gets high priority during +- `INTENT.md` is indexed as `intent_summary` and gets high priority during candidate generation. +- `SCOPE.md` is indexed as `derived_scope` and remains distinguishable from + source evidence and design intent. - Agent guidance files are classified separately from product documentation. - Content chunks preserve the fact source role used to produce them. @@ -113,19 +121,24 @@ Acceptance criteria: ```task id: RREG-WP-0009-T04 -status: todo +status: in_progress priority: high state_hub_task_id: "4f666cd6-471e-4af9-b53c-4f3d7a1d1973" ``` -Use canonical scope files and product documentation as stronger evidence for +Use explicit intent files and product documentation as stronger evidence for expected repository utility than ambient config, CI files, dependency mentions, -or agent instructions. +agent instructions, or previously derived scope files. Acceptance criteria: -- Candidate ability naming prefers `SCOPE.md` one-liner/core idea when present. -- Candidate capability generation can extract explicit `Provided Capabilities` - blocks from `SCOPE.md`. +- Candidate ability naming prefers `INTENT.md` one-liner/core idea when present. 
+- Candidate capability generation can extract explicit intended capability + blocks from `INTENT.md`. +- `SCOPE.md` is treated as derived current scope, not as ordinary evidence for + rebuilding the characteristic model from scratch. +- Existing `SCOPE.md` files can be explicitly bootstrapped into initial + `INTENT.md` files when no intent file exists; this is a one-time migration + aid, not an ongoing equivalence between scope and intent. - README/docs/spec evidence is weighted above CI/tooling and generic config. - key-cape generates candidates centered on lightweight IAM, OIDC/PKCE profile enforcement, migration tooling, and LDAP/schema validation rather than LLM @@ -226,7 +239,7 @@ Acceptance criteria: ```task id: RREG-WP-0009-T09 -status: todo +status: in_progress priority: medium state_hub_task_id: "071f6d76-c92b-4ac1-825c-edcbef4bdbf6" ```
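
Reviewer note: the `_documentation_chunks` change in `generator.py` is the core behavioral shift in this patch. Intent chunks now sort ahead of documentation, and derived-scope chunks are no longer selected at all. The snippet below is a minimal, self-contained sketch of that ordering rule only. `Chunk` is a hypothetical stand-in for `ContentChunk` with just the fields the sort key reads, and `documentation_chunks` mirrors the patched method as a free function; neither is part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for ContentChunk; only the fields used by the sort key."""

    kind: str
    path: str
    start_line: int


def documentation_chunks(chunks: list[Chunk]) -> list[Chunk]:
    """Keep intent and documentation chunks, intent first; derived scope is dropped."""
    return sorted(
        [c for c in chunks if c.kind in {"intent", "documentation"}],
        key=lambda c: (0 if c.kind == "intent" else 1, c.path, c.start_line),
    )


chunks = [
    Chunk("scope", "SCOPE.md", 1),
    Chunk("documentation", "README.md", 1),
    Chunk("intent", "INTENT.md", 1),
]
ordered = documentation_chunks(chunks)
print([c.path for c in ordered])  # ['INTENT.md', 'README.md']
```

Because `scope` is filtered out rather than merely ranked lower, a repository with only `SCOPE.md` and a README falls back to the README for its purpose sentence until an intent file exists, which is the gap the bootstrap helper in this patch is meant to close.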