one time bootstrap path

2026-05-02 00:36:00 +02:00
parent 911ca45618
commit 76f5ecb1b4
12 changed files with 328 additions and 27 deletions


@@ -30,6 +30,33 @@ organized under the wrong capability.
 Observed facts are deterministic scanner output. They describe what was seen in
 the repository: files, languages, frameworks, routes, tests, documentation,
 provider names, configuration variables, and similar source-linked observations.
+Facts can carry a source role so generation can separate product evidence from
+ambient context. Important roles include:
+
+- `intent_summary`: `INTENT.md` or equivalent design-intent material describing
+  why the repository should exist and what utility it is meant to provide.
+- `derived_scope`: `SCOPE.md` or equivalent current-scope material. This is a
+  derived or curated description of what is believed to be true now, not primary
+  evidence for rebuilding the same characteristic model.
+- `product_documentation`: README, docs, specifications, and user-facing guides.
+- `implementation_source`: source code owned by the repository.
+- `dependency_declaration`: manifests, imports, lockfiles, and package metadata.
+- `configuration`, `ci_tooling`, `test_evidence`, and `agent_guidance`.
+
+`INTENT.md` and `SCOPE.md` deliberately answer different questions. Intent is a
+design artifact: what the repository is supposed to become or provide. Scope is
+a derived current-state artifact: what the repository is understood to provide
+after evidence and review. A good `SCOPE.md` is valuable context, but using it
+as ordinary evidence for generated characteristics creates a circular model.
+Rebuilds should therefore prefer `INTENT.md`, product documentation, source, and
+tests; `SCOPE.md` should be used as comparison material or explicit bootstrap
+input only when a curator chooses that mode.
+
+For repositories that already have a useful `SCOPE.md` but no `INTENT.md`,
+repo-scoping can perform a one-time bootstrap by copying the scope text into a
+new intent file with a clear provenance note. After that bootstrap, the files
+should diverge naturally: `INTENT.md` remains design intent, while `SCOPE.md`
+remains generated or curated current scope.

 Source references point from interpreted claims back to files or facts.
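
The evidence-selection rule described in this hunk can be sketched as a small filter over facts. This is an illustration only, not the repository's implementation: `Fact`, `REBUILD_ROLES`, and `select_rebuild_evidence` are hypothetical names, and the choice of which roles count as rebuild evidence is an assumption read off the paragraph above. Only the role strings themselves come from the text.

```python
from dataclasses import dataclass, field

# Roles the text names as preferred rebuild evidence (assumed set, for illustration).
REBUILD_ROLES = {
    "intent_summary",
    "product_documentation",
    "implementation_source",
    "test_evidence",
}


@dataclass
class Fact:  # hypothetical stand-in for an observed fact
    path: str
    metadata: dict = field(default_factory=dict)


def select_rebuild_evidence(facts, *, bootstrap_from_scope=False):
    """Keep facts usable as rebuild evidence; admit derived scope only on request."""
    allowed = set(REBUILD_ROLES)
    if bootstrap_from_scope:
        # Explicit curator choice: treat SCOPE.md as bootstrap input.
        allowed.add("derived_scope")
    return [f for f in facts if f.metadata.get("source_role") in allowed]
```

With the default arguments a `derived_scope` fact is dropped, which avoids the circular model the paragraph warns about; passing `bootstrap_from_scope=True` models the explicit curator-chosen mode.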


@@ -42,6 +42,20 @@ normalization.
   facts or to lower-level characteristics.
 - Observed fact: deterministic scanner output such as files, manifests,
   languages, tests, APIs, routes, commands, or documentation references.
+- Intent: a design-time statement of expected repository utility. `INTENT.md`
+  is the preferred file for this. It can guide candidate generation because it
+  describes why the repository should exist.
+- Derived scope: a current-state statement of what the repository is understood
+  to provide. `SCOPE.md` is the preferred file for this. It is generated or
+  curated from evidence and approved characteristics, so it should not be used
+  as ordinary evidence for rebuilding those same characteristics.
+- Intent bootstrap: a one-time migration that creates `INTENT.md` from an
+  existing `SCOPE.md` when no intent file exists. The generated file carries a
+  provenance note and should be reviewed as design intent.
+- Source role: provenance metadata on a fact or content chunk, such as
+  `intent_summary`, `derived_scope`, `product_documentation`,
+  `implementation_source`, `dependency_declaration`, `configuration`,
+  `ci_tooling`, `test_evidence`, or `agent_guidance`.
 - Candidate: proposed characteristic or evidence from deterministic heuristics
   or optional LLM assistance. Candidates are review inputs, not registry truth.
 - Approved: curated registry truth that appears in ability maps, search, exports,


@@ -63,8 +63,7 @@ class CandidateGraphGenerator:
             return []

         chunks = chunks or []
-        scope_docs = self._facts(facts, "scope")
-        docs = scope_docs + self._facts(facts, "documentation")
+        docs = self._facts(facts, "intent") + self._facts(facts, "documentation")
         tests = self._facts(facts, "test")
         examples = self._facts(facts, "example")
         interfaces = self._facts(facts, "interface")
@@ -662,7 +661,7 @@ class CandidateGraphGenerator:
     def _document_purpose_sentence(self, chunks: list[ContentChunk]) -> str:
         for chunk in self._documentation_chunks(chunks):
-            if chunk.kind not in {"scope", "documentation"}:
+            if chunk.kind not in {"intent", "documentation"}:
                 continue
             lines = [line.strip() for line in chunk.text.splitlines() if line.strip()]
             paragraph = next((line for line in lines if not line.startswith("#")), "")
@@ -745,8 +744,8 @@ class CandidateGraphGenerator:
     def _documentation_chunks(self, chunks: list[ContentChunk]) -> list[ContentChunk]:
         return sorted(
-            [chunk for chunk in chunks if chunk.kind in {"scope", "documentation"}],
-            key=lambda chunk: (0 if chunk.kind == "scope" else 1, chunk.path, chunk.start_line),
+            [chunk for chunk in chunks if chunk.kind in {"intent", "documentation"}],
+            key=lambda chunk: (0 if chunk.kind == "intent" else 1, chunk.path, chunk.start_line),
         )

     def _interface_summary(self, chunks: list[ContentChunk]) -> str:
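
The effect of the new filter and sort key in `_documentation_chunks` can be checked in isolation. The sketch below is a minimal, self-contained restatement: `Chunk` is a hypothetical stand-in for `ContentChunk` carrying only the three attributes the sort key reads (`kind`, `path`, `start_line`), and `documentation_chunks` is a free-function copy of the method body for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:  # hypothetical stand-in for ContentChunk
    kind: str
    path: str
    start_line: int


def documentation_chunks(chunks):
    """Drop every kind except intent/documentation and sort intent first."""
    kept = [c for c in chunks if c.kind in {"intent", "documentation"}]
    return sorted(
        kept,
        key=lambda c: (0 if c.kind == "intent" else 1, c.path, c.start_line),
    )
```

Under this filter, `scope` chunks no longer reach purpose-sentence extraction at all, and an `INTENT.md` chunk sorts ahead of a README chunk regardless of path order.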


@@ -7,6 +7,7 @@ from repo_registry.core.models import ObservedFact
 INDEXED_FACT_KINDS = {
+    "intent",
     "scope",
     "documentation",
     "example",


@@ -0,0 +1 @@
+"""Intent-file helpers for repository scoping."""


@@ -0,0 +1,130 @@
+from __future__ import annotations
+
+import argparse
+from dataclasses import dataclass
+from datetime import date
+from pathlib import Path
+from typing import Iterable
+
+
+BOOTSTRAP_NOTE = (
+    "> Bootstrapped from `SCOPE.md` by repo-scoping.\n"
+    "> Review and edit this file as design intent. `SCOPE.md` remains the\n"
+    "> derived current-scope artifact."
+)
+
+
+@dataclass(frozen=True)
+class IntentBootstrapResult:
+    repo_path: str
+    scope_path: str
+    intent_path: str
+    status: str
+    message: str
+
+
+def bootstrap_intent_from_scope(
+    repo_path: str | Path,
+    *,
+    dry_run: bool = False,
+    overwrite: bool = False,
+    today: date | None = None,
+) -> IntentBootstrapResult:
+    root = Path(repo_path).expanduser().resolve()
+    scope_path = root / "SCOPE.md"
+    intent_path = root / "INTENT.md"
+
+    if not root.is_dir():
+        return _result(root, scope_path, intent_path, "missing_repo", "repository path does not exist")
+    if not scope_path.is_file():
+        return _result(root, scope_path, intent_path, "missing_scope", "SCOPE.md is not present")
+    if intent_path.exists() and not overwrite:
+        return _result(root, scope_path, intent_path, "exists", "INTENT.md already exists")
+
+    status = "would_overwrite" if intent_path.exists() else "would_create"
+    if dry_run:
+        return _result(root, scope_path, intent_path, status, f"{status} INTENT.md from SCOPE.md")
+
+    intent_text = scope_to_intent_text(
+        scope_path.read_text(encoding="utf-8"),
+        today=today,
+    )
+    intent_path.write_text(intent_text, encoding="utf-8")
+    created_status = "overwritten" if status == "would_overwrite" else "created"
+    return _result(root, scope_path, intent_path, created_status, f"{created_status} INTENT.md from SCOPE.md")
+
+
+def bootstrap_many(
+    repo_paths: Iterable[str | Path],
+    *,
+    dry_run: bool = False,
+    overwrite: bool = False,
+    today: date | None = None,
+) -> list[IntentBootstrapResult]:
+    return [
+        bootstrap_intent_from_scope(
+            repo_path,
+            dry_run=dry_run,
+            overwrite=overwrite,
+            today=today,
+        )
+        for repo_path in repo_paths
+    ]
+
+
+def scope_to_intent_text(scope_text: str, *, today: date | None = None) -> str:
+    current_date = today or date.today()
+    lines = scope_text.splitlines()
+    while lines and not lines[0].strip():
+        lines.pop(0)
+    if lines and lines[0].lstrip().lower().startswith("# scope"):
+        lines[0] = "# INTENT"
+    elif not lines or not lines[0].startswith("#"):
+        lines.insert(0, "# INTENT")
+
+    note = f"{BOOTSTRAP_NOTE}\n> Bootstrap date: {current_date.isoformat()}"
+    insert_at = 1 if lines else 0
+    while insert_at < len(lines) and not lines[insert_at].strip():
+        insert_at += 1
+    lines[insert_at:insert_at] = ["", note, ""]
+    return "\n".join(lines).rstrip() + "\n"
+
+
+def _result(
+    root: Path,
+    scope_path: Path,
+    intent_path: Path,
+    status: str,
+    message: str,
+) -> IntentBootstrapResult:
+    return IntentBootstrapResult(
+        repo_path=str(root),
+        scope_path=str(scope_path),
+        intent_path=str(intent_path),
+        status=status,
+        message=message,
+    )
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(
+        description="Bootstrap INTENT.md from SCOPE.md for repositories that do not have intent files yet."
+    )
+    parser.add_argument("repo_paths", nargs="+", help="Repository checkout path(s) to inspect")
+    parser.add_argument("--dry-run", action="store_true", help="Report planned writes without writing files")
+    parser.add_argument("--overwrite", action="store_true", help="Overwrite existing INTENT.md files")
+    args = parser.parse_args(argv)
+
+    results = bootstrap_many(
+        args.repo_paths,
+        dry_run=args.dry_run,
+        overwrite=args.overwrite,
+    )
+    for result in results:
+        print(f"{result.status}\t{result.repo_path}\t{result.message}")
+    return 1 if any(result.status in {"missing_repo", "missing_scope"} for result in results) else 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
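
The status values returned by `bootstrap_intent_from_scope` follow a fixed precedence. As a reading aid, the sketch below restates that precedence as a pure function over booleans, so it can be followed without touching the filesystem. `plan_status` and its parameter names are hypothetical and not part of the commit; the status strings are the ones used in the module above.

```python
def plan_status(*, repo_exists, scope_exists, intent_exists, overwrite=False, dry_run=False):
    """Mirror the order of checks in bootstrap_intent_from_scope (illustrative only)."""
    if not repo_exists:
        return "missing_repo"
    if not scope_exists:
        return "missing_scope"
    if intent_exists and not overwrite:
        # An existing INTENT.md is never touched unless overwrite is requested.
        return "exists"
    if dry_run:
        return "would_overwrite" if intent_exists else "would_create"
    return "overwritten" if intent_exists else "created"
```

Note that `main` returns exit code 1 only when a result is `missing_repo` or `missing_scope`; an `exists` result counts as success, so rerunning the bootstrap over already-migrated repositories exits with 0.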


@@ -180,13 +180,22 @@ class DeterministicScanner:
             name = path.name.lower()
             source_role = self._source_role(relative)
-            if name == "scope.md":
+            if name == "intent.md":
                 facts.append(
                     FactCandidate(
+                        "intent",
+                        "INTENT",
+                        relative,
+                        metadata={"source_role": "intent_summary"},
+                    )
+                )
+            elif name == "scope.md":
+                facts.append(
+                    FactCandidate(
                         "scope",
                         "SCOPE",
                         relative,
-                        metadata={"source_role": "scope_summary"},
+                        metadata={"source_role": "derived_scope"},
                     )
                 )
             elif name.startswith("readme"):
@@ -429,8 +438,10 @@ class DeterministicScanner:
         lower = relative_path.lower()
         parts = lower.split("/")
         name = parts[-1]
+        if name == "intent.md":
+            return "intent_summary"
         if name == "scope.md":
-            return "scope_summary"
+            return "derived_scope"
         if name in AGENT_GUIDANCE_FILES or any(part in AGENT_GUIDANCE_DIRS for part in parts):
             return "agent_guidance"
         if lower.startswith((".github/workflows/", ".gitea/workflows/")):
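
As a reading aid for the two `_source_role` branches this hunk touches, here is a minimal standalone sketch. `classify_summary_role` is a hypothetical name, and returning `None` for every other path is an assumption made only to keep the example short; the real method continues with further checks that this diff shows only in part.

```python
def classify_summary_role(relative_path):
    """Return the source role for intent/scope files, else None (assumed fallback)."""
    # Same normalization as in the hunk: lowercase, then take the last path component.
    name = relative_path.lower().split("/")[-1]
    if name == "intent.md":
        return "intent_summary"
    if name == "scope.md":
        return "derived_scope"
    return None
```

Because the comparison uses the lowercased final path component, a file such as `docs/Intent.md` is classified the same way as a top-level `INTENT.md`.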


@@ -108,6 +108,51 @@ def test_candidate_generator_enriches_descriptions_from_content_chunks():
     assert '@app.post("/classify")' in graph[0].capabilities[0].description


+def test_candidate_generator_prefers_intent_over_derived_scope_chunks():
+    repository = Repository(
+        id=1,
+        name="KeyCape",
+        url="/tmp/key-cape",
+        description=None,
+        branch="main",
+        status="analyzed",
+    )
+    facts = [
+        fact(1, "intent", "INTENT", "INTENT.md"),
+        fact(2, "scope", "SCOPE", "SCOPE.md"),
+        fact(3, "documentation", "README", "README.md"),
+    ]
+    chunks = [
+        chunk(
+            1,
+            "scope",
+            "SCOPE.md",
+            "# SCOPE\nAlready provides deployed IAM runtime behavior.",
+            end_line=2,
+        ),
+        chunk(
+            2,
+            "intent",
+            "INTENT.md",
+            "# INTENT\nDesign a lightweight IAM profile implementation.",
+            end_line=2,
+        ),
+        chunk(
+            3,
+            "documentation",
+            "README.md",
+            "# KeyCape\nREADME fallback should not beat intent.",
+            end_line=2,
+        ),
+    ]
+
+    graph = CandidateGraphGenerator().generate(repository, facts, chunks)
+
+    assert graph[0].name == "Design A Lightweight IAM Profile Implementation"
+    assert "INTENT. Design a lightweight IAM profile implementation" in graph[0].description
+    assert graph[0].source_refs[0].path == "INTENT.md"
+
+
 def test_candidate_confidence_scoring_stays_conservative_for_weak_facts():
     repository = Repository(
         id=1,


@@ -86,18 +86,18 @@ def test_content_extractor_chunks_provider_related_config(tmp_path):
     assert "OPENROUTER_API_KEY" in chunks[0].text


-def test_content_extractor_preserves_source_role_metadata(tmp_path):
+def test_content_extractor_preserves_intent_source_role_metadata(tmp_path):
     repo = tmp_path / "repo"
     repo.mkdir()
-    (repo / "SCOPE.md").write_text("# SCOPE\n\nProvides OIDC.\n", encoding="utf-8")
+    (repo / "INTENT.md").write_text("# INTENT\n\nProvide OIDC.\n", encoding="utf-8")

     chunks = ContentExtractor().extract(
         repo,
         [
-            fact(1, "scope", "SCOPE", "SCOPE.md", source_role="scope_summary"),
+            fact(1, "intent", "INTENT", "INTENT.md", source_role="intent_summary"),
         ],
     )

     assert len(chunks) == 1
-    assert chunks[0].kind == "scope"
-    assert chunks[0].metadata["source_role"] == "scope_summary"
+    assert chunks[0].kind == "intent"
+    assert chunks[0].metadata["source_role"] == "intent_summary"


@@ -0,0 +1,51 @@
+from datetime import date
+
+from repo_registry.intent.bootstrap import bootstrap_intent_from_scope, scope_to_intent_text
+
+
+def test_scope_to_intent_text_replaces_scope_heading_and_marks_bootstrap():
+    text = scope_to_intent_text(
+        "# SCOPE.md - Demo\n\n## One-liner\n\nCurrent utility.\n",
+        today=date(2026, 5, 2),
+    )
+
+    assert text.startswith("# INTENT\n\n")
+    assert "Bootstrapped from `SCOPE.md`" in text
+    assert "Bootstrap date: 2026-05-02" in text
+    assert "## One-liner\n\nCurrent utility." in text
+
+
+def test_bootstrap_intent_from_scope_creates_intent_when_missing(tmp_path):
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    (repo / "SCOPE.md").write_text("# SCOPE\n\nProvides search.\n", encoding="utf-8")
+
+    result = bootstrap_intent_from_scope(repo, today=date(2026, 5, 2))
+
+    assert result.status == "created"
+    intent_text = (repo / "INTENT.md").read_text(encoding="utf-8")
+    assert intent_text.startswith("# INTENT")
+    assert "Provides search." in intent_text
+
+
+def test_bootstrap_intent_from_scope_does_not_overwrite_existing_intent(tmp_path):
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    (repo / "SCOPE.md").write_text("# SCOPE\n", encoding="utf-8")
+    (repo / "INTENT.md").write_text("# INTENT\n\nKeep me.\n", encoding="utf-8")
+
+    result = bootstrap_intent_from_scope(repo)
+
+    assert result.status == "exists"
+    assert (repo / "INTENT.md").read_text(encoding="utf-8") == "# INTENT\n\nKeep me.\n"
+
+
+def test_bootstrap_intent_from_scope_dry_run_reports_without_writing(tmp_path):
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    (repo / "SCOPE.md").write_text("# SCOPE\n", encoding="utf-8")
+
+    result = bootstrap_intent_from_scope(repo, dry_run=True)
+
+    assert result.status == "would_create"
+    assert not (repo / "INTENT.md").exists()


@@ -42,20 +42,29 @@ def test_deterministic_scanner_extracts_structural_facts(tmp_path):
     assert languages == {"Python": 2}


-def test_scanner_records_scope_with_source_role(tmp_path):
+def test_scanner_records_intent_and_scope_with_distinct_source_roles(tmp_path):
     repo = tmp_path / "sample"
     repo.mkdir()
+    (repo / "INTENT.md").write_text(
+        "# INTENT\n\nProvides planned OIDC profile enforcement.\n",
+        encoding="utf-8",
+    )
     (repo / "SCOPE.md").write_text(
-        "# SCOPE\n\n## One-liner\n\nProvides OIDC profile enforcement.\n",
+        "# SCOPE\n\n## One-liner\n\nCurrently provides OIDC profile enforcement.\n",
         encoding="utf-8",
     )

     result = DeterministicScanner().scan(repo)

+    intent_fact = next(fact for fact in result.facts if fact.kind == "intent")
+    assert intent_fact.name == "INTENT"
+    assert intent_fact.path == "INTENT.md"
+    assert intent_fact.metadata["source_role"] == "intent_summary"
+
     scope_fact = next(fact for fact in result.facts if fact.kind == "scope")
     assert scope_fact.name == "SCOPE"
     assert scope_fact.path == "SCOPE.md"
-    assert scope_fact.metadata["source_role"] == "scope_summary"
+    assert scope_fact.metadata["source_role"] == "derived_scope"


 def test_scanner_readme_only_fixture_records_docs_without_interfaces(tmp_path):


@@ -23,8 +23,9 @@ dependency, import, or operational convention mentioned in its files.
 The target behavior is facts-first and provenance-aware:

 - Deterministic scanning observes facts without over-interpreting them.
-- Facts carry source roles such as product documentation, scope summary,
-  implementation source, dependency declaration, agent guidance, or CI/tooling.
+- Facts carry source roles such as intent summary, derived scope, product
+  documentation, implementation source, dependency declaration, agent guidance,
+  or CI/tooling.
 - Characteristic generation promotes only repository-owned utility unless the
   repository clearly acts as a facade or adapter for another capability.
 - Rebuild workflows can discard old approved characteristics and regenerate a
@@ -44,7 +45,12 @@ generation can distinguish product evidence from ambient context.
 Initial source roles:

-- `scope_summary`: `SCOPE.md` and other canonical scope files.
+- `intent_summary`: `INTENT.md` and other design-intent files that describe why
+  the repository should exist and what utility it is meant to provide.
+- `derived_scope`: `SCOPE.md` and other generated or curated current-scope
+  files. These are valuable context, but should not be treated as primary
+  evidence for regenerating characteristics unless a curator explicitly chooses
+  a bootstrap/import mode.
 - `product_documentation`: README, docs, specifications, user-facing guides.
 - `implementation_source`: code files owned by the repository.
 - `test_evidence`: test and acceptance files.
@@ -59,8 +65,10 @@ Initial source roles:
 Acceptance criteria:

 - Observed facts can carry a source role in metadata without breaking existing
   storage or API consumers.
-- `SCOPE.md` is indexed as `scope_summary` and gets high priority during
+- `INTENT.md` is indexed as `intent_summary` and gets high priority during
   candidate generation.
+- `SCOPE.md` is indexed as `derived_scope` and remains distinguishable from
+  source evidence and design intent.
 - Agent guidance files are classified separately from product documentation.
 - Content chunks preserve the fact source role used to produce them.
@@ -113,19 +121,24 @@ Acceptance criteria:
 ```task
 id: RREG-WP-0009-T04
-status: todo
+status: in_progress
 priority: high
 state_hub_task_id: "4f666cd6-471e-4af9-b53c-4f3d7a1d1973"
 ```

-Use canonical scope files and product documentation as stronger evidence for
+Use explicit intent files and product documentation as stronger evidence for
 expected repository utility than ambient config, CI files, dependency mentions,
-or agent instructions.
+agent instructions, or previously derived scope files.

 Acceptance criteria:

-- Candidate ability naming prefers `SCOPE.md` one-liner/core idea when present.
-- Candidate capability generation can extract explicit `Provided Capabilities`
-  blocks from `SCOPE.md`.
+- Candidate ability naming prefers `INTENT.md` one-liner/core idea when present.
+- Candidate capability generation can extract explicit intended capability
+  blocks from `INTENT.md`.
+- `SCOPE.md` is treated as derived current scope, not as ordinary evidence for
+  rebuilding the characteristic model from scratch.
+- Existing `SCOPE.md` files can be explicitly bootstrapped into initial
+  `INTENT.md` files when no intent file exists; this is a one-time migration
+  aid, not an ongoing equivalence between scope and intent.
 - README/docs/spec evidence is weighted above CI/tooling and generic config.
 - key-cape generates candidates centered on lightweight IAM, OIDC/PKCE profile
   enforcement, migration tooling, and LDAP/schema validation rather than LLM
@@ -226,7 +239,7 @@ Acceptance criteria:
 ```task
 id: RREG-WP-0009-T09
-status: todo
+status: in_progress
 priority: medium
 state_hub_task_id: "071f6d76-c92b-4ac1-825c-edcbef4bdbf6"
 ```