generated from coulomb/repo-seed
utility relationships understanding of INTENT.md vs. SCOPE.md and documentation.
@@ -67,6 +67,28 @@ that show the repository provides the utility directly or intentionally exposes
 it as a facade/adapter. Mentions, dependencies, configuration, and tooling are
 context until a curator promotes them or stronger owned evidence appears.
 
+Trusted auto-approval applies the same rule. A candidate capability must have
+source references and an eligible utility relationship (`owned`, `facade`, or
+`adapter`) before it can be approved automatically. Dependency, tooling,
+configuration, and mention-only candidates remain review material. The review
+decision should explain both sides: why approved candidates were considered safe
+and why skipped candidates need curator review.
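
As a minimal sketch of the rule in the paragraph above (not the registry's actual implementation, and ignoring any confidence thresholds), the check can return both a verdict and a reason, so the review decision can explain approved and skipped candidates alike. The `Candidate` dataclass and the `decide` function are hypothetical names introduced for illustration; only the relationship labels come from the text.

```python
from dataclasses import dataclass, field

# Relationship labels the text names as eligible for automatic approval.
ELIGIBLE = {"owned", "facade", "adapter"}


@dataclass
class Candidate:
    """Hypothetical stand-in for a candidate capability."""

    name: str
    source_refs: list[str] = field(default_factory=list)
    relationships: set[str] = field(default_factory=set)


def decide(candidate: Candidate) -> tuple[bool, str]:
    """Return (approved, reason) so both outcomes can be explained."""
    if not candidate.source_refs:
        return False, "missing source references"
    if not candidate.relationships & ELIGIBLE:
        found = ", ".join(sorted(candidate.relationships)) or "none"
        return False, f"no eligible utility relationship ({found})"
    return True, "eligible utility relationship with source references"


candidates = [
    Candidate("Expose HTTP API", ["app.py"], {"owned"}),
    Candidate("Use Redis client", ["pyproject.toml"], {"dependency"}),
    Candidate("Mention of provider routing", [], {"mention"}),
]
for candidate in candidates:
    approved, reason = decide(candidate)
    print(f"{candidate.name}: {'approved' if approved else 'review'} ({reason})")
```

Returning the reason alongside the boolean keeps the explanation next to the rule that produced it, which is one way to satisfy the "explain both sides" requirement.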
+
+`INTENT.md` may also seed intended capabilities when it contains an explicit
+capability section. These intent-derived candidates are marked as review
+required because intent says what the repository is meant to provide, not what
+has already been proven. `SCOPE.md` sections with the same wording are not
+treated as equivalent input during rebuilds, because scope is derived from the
+registry model being rebuilt.
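
A small sketch of that distinction, under stated assumptions: documents are plain `(path, text)` pairs, only files named `INTENT.md` are read, and a capability section is any Markdown heading containing "capabilit". The function name `seed_intent_candidates` and the shape of the returned dictionaries are hypothetical.

```python
import re


def seed_intent_candidates(documents: list[tuple[str, str]]) -> list[dict]:
    """Collect bullet items under capability headings, from INTENT.md only."""
    candidates = []
    for path, text in documents:
        # SCOPE.md is derived output, so it is never treated as seed input.
        if not path.lower().endswith("intent.md"):
            continue
        in_section = False
        for raw in text.splitlines():
            line = raw.strip()
            if line.startswith("#"):
                # Each heading opens or closes the capability section.
                in_section = "capabilit" in line.lower()
                continue
            match = re.match(r"^[-*]\s+(.*)", line)
            if in_section and match:
                candidates.append(
                    {"name": match.group(1), "source": path, "review_required": True}
                )
    return candidates


docs = [
    ("INTENT.md", "# Intent\n\n## Intended Capabilities\n\n- Validate schema migrations\n"),
    ("SCOPE.md", "# Scope\n\n## Intended Capabilities\n\n- Route provider requests\n"),
]
for candidate in seed_intent_candidates(docs):
    print(candidate)
```

Every candidate produced this way carries `review_required`, reflecting that intent describes what the repository is meant to provide rather than what has been proven.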
+
+The motivating failure mode was a key-cape-like repository whose agent guidance
+and generic backend-adapter vocabulary looked superficially like LLM provider
+routing. That pattern should produce source-linked facts for the files that
+exist, but it should not become an LLM-provider capability unless there is
+provider-specific owned, facade, or adapter evidence. The scanner and generator
+should solve this by provenance and utility relationship rules, not by
+hard-coding product names.
+
 Source references point from interpreted claims back to files or facts.
 
 Evidence is support for a characteristic. It is not the same thing as an observed
@@ -98,3 +120,21 @@ without breaking existing data:
 
 The UI should make this relationship clear by presenting evidence as support
 under the characteristic it supports, not as a peer of features.
+
+## Rebuilds And Supersession
+
+Use a normal analysis rerun when the existing approved map is mostly trustworthy
+and the goal is to compare new evidence against prior candidates. Use a rebuild
+from scratch when approved characteristics are polluted by a bad extraction
+pattern, stale after a major rename, or circularly derived from old scope text.
+
+A dry-run rebuild should be the first step. It scans current source, generates a
+fresh candidate graph, and reports what approved abilities, capabilities,
+features, and evidence would be superseded. A confirmed rebuild preserves audit
+history by recording which approved IDs were superseded, then clears the current
+approved map and leaves the fresh candidate graph for review or trusted
+auto-approval.
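
The dry-run/confirm split described above might look like the following sketch. Everything in it is hypothetical (the `Registry` class, its fields, and the `rebuild` signature); it only illustrates that a dry run reports what would be superseded without touching state, while a confirmed run records the superseded IDs before clearing the approved map.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory registry: approved entries keyed by ID."""

    approved: dict[str, str] = field(default_factory=dict)
    candidates: list[str] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)


def rebuild(registry: Registry, fresh_candidates: list[str], confirm: bool = False) -> dict:
    """Report what a rebuild would supersede; apply it only when confirmed."""
    report = {
        "would_supersede": sorted(registry.approved),
        "fresh_candidates": list(fresh_candidates),
        "applied": False,
    }
    if not confirm:
        return report  # Dry run: registry state is left untouched.
    # Preserve audit history before clearing the approved map.
    registry.audit_log.append({"superseded_ids": report["would_supersede"]})
    registry.approved.clear()
    registry.candidates = list(fresh_candidates)
    report["applied"] = True
    return report


registry = Registry(approved={"cap-1": "Route LLM Providers", "cap-2": "Expose API"})
print(rebuild(registry, ["Expose API"]))                # dry run
print(rebuild(registry, ["Expose API"], confirm=True))  # confirmed rebuild
print(registry.audit_log)
```

Because the dry run and the confirmed run share one code path, the report a curator reviews is built the same way as the change that is later applied.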
+
+Curators should treat superseded characteristics as historical claims, not as
+deleted facts. They explain what the registry used to believe and why a rebuild
+was chosen over incremental correction.

@@ -57,15 +57,40 @@ normalization.
   `implementation_source`, `dependency_declaration`, `configuration`,
   `ci_tooling`, `test_evidence`, or `agent_guidance`.
 - Utility relationship: metadata describing how a fact relates to repository
-  utility, such as `owned`, `facade`, `adapter`, `configure`, `dependency`,
-  `tooling`, or `mention`. Only owned/facade/adapter relationships should be
-  promoted directly into provided capabilities.
+  utility. Only `owned`, `facade`, and explicit `adapter` relationships should
+  be promoted directly into provided capabilities or trusted auto-approval.
+- Owned capability: utility the repository provides through its own product
+  behavior, source, interface, or documented intent.
+- Facade capability: utility intentionally exposed through this repository even
+  though important work is delegated elsewhere. Public wrapper APIs, CLI
+  commands, or product documentation should make the facade role explicit.
+- Adapter capability: utility that connects callers to another implementation
+  through repository-owned adapter code. Generic use of the word adapter is not
+  enough; the adapter needs source-linked evidence for the capability being
+  exposed.
+- Consumer/configuration relationship: evidence that the repository uses or
+  configures something, such as an environment variable or client dependency,
+  without itself providing that utility.
+- Dependency relationship: evidence from manifests, imports, lockfiles, or
+  package metadata. Dependencies belong in evidence or "How It Fits" context
+  unless a curator promotes them.
+- Tooling-context relationship: build, CI, release, or agent-operating context.
+  Tooling can explain how the repository is worked on, but should not define
+  product capabilities by itself.
+- Mention relationship: ambient text that names a provider, framework, sibling
+  repo, or product concept without showing that the repository provides it.
 - Candidate: proposed characteristic or evidence from deterministic heuristics
   or optional LLM assistance. Candidates are review inputs, not registry truth.
 - Approved: curated registry truth that appears in ability maps, search, exports,
   and SCOPE generation.
 - Rejected: a candidate judged false or irrelevant. Rejected entries are hidden
   by default but retained for audit and recovery.
+- Rebuild from scratch: an explicit operation that regenerates candidates from
+  current source after approved characteristics have become polluted or stale.
+  Dry-run first; confirmed rebuilds preserve audit history.
+- Supersede: mark prior approved characteristics as replaced by a rebuild or
+  review correction. Superseded entries explain historical registry state rather
+  than disappearing.
 - Classification: a main type plus optional additional attributes that help
   users filter and orient without forcing every item into a single rigid box.
 

@@ -74,6 +74,7 @@ class CandidateGraphGenerator:
         credential_configs = self._facts(facts, "credential_config")
         provider_registries = self._facts(facts, "provider_registry")
         fallback_policies = self._facts(facts, "fallback_policy")
+        intent_facts = self._facts(facts, "intent")
         ability_primary_class, ability_attributes = self._ability_classification(
             repository,
             facts,
@@ -103,6 +104,9 @@ class CandidateGraphGenerator:
         capabilities.append(
             self._interface_capability(interfaces, tests, examples, docs, chunks)
         )
+        capabilities.extend(
+            self._intent_capabilities(intent_facts, chunks, tests, examples, docs)
+        )
         promotable_llm_providers = self._promotable_llm_facts(llm_providers)
         promotable_provider_registries = self._promotable_llm_facts(provider_registries)
         promotable_fallback_policies = self._promotable_llm_facts(fallback_policies)
@@ -139,11 +143,11 @@ class CandidateGraphGenerator:
                     languages=languages,
                     docs=docs,
                 ),
-                source_refs=self._source_refs(manifests + frameworks + languages),
-                primary_class="repository-structure",
-                attributes=self._structure_attributes(
-                    manifests,
-                    frameworks,
+                source_refs=self._source_refs(manifests + frameworks + languages),
+                primary_class="repository-structure",
+                attributes=self._structure_attributes(
+                    manifests,
+                    frameworks,
                     languages,
                 ),
                 evidence=self._evidence(tests, examples, docs),
@@ -284,6 +288,91 @@ class CandidateGraphGenerator:
             evidence=self._evidence(tests, examples, docs),
         )
 
+    def _intent_capabilities(
+        self,
+        intent_facts: list[ObservedFact],
+        chunks: list[ContentChunk],
+        tests: list[ObservedFact],
+        examples: list[ObservedFact],
+        docs: list[ObservedFact],
+    ) -> list[CandidateCapabilityDraft]:
+        intent_chunks = [
+            chunk
+            for chunk in chunks
+            if chunk.kind == "intent"
+            and (
+                chunk.metadata.get("source_role") == "intent_summary"
+                or chunk.path.lower().endswith("intent.md")
+            )
+        ]
+        if not intent_chunks:
+            return []
+        source_refs = self._source_refs(intent_facts)
+        capabilities: list[CandidateCapabilityDraft] = []
+        seen: set[str] = set()
+        for item in self._intent_capability_items(intent_chunks):
+            name = self._intent_capability_name(item)
+            key = name.lower()
+            if not name or key in seen:
+                continue
+            seen.add(key)
+            capabilities.append(
+                CandidateCapabilityDraft(
+                    name=name,
+                    description=(
+                        "Reviewable intended capability extracted from repository "
+                        f"intent: {item}"
+                    ),
+                    inputs=[],
+                    outputs=[name],
+                    confidence=self._confidence(
+                        0.45,
+                        [
+                            (0.15, bool(source_refs)),
+                            (0.10, bool(tests)),
+                            (0.05, bool(examples)),
+                            (0.05, bool(docs)),
+                        ],
+                    ),
+                    source_refs=source_refs,
+                    primary_class="intent-capability",
+                    attributes=[
+                        "intent-derived",
+                        "utility-owned",
+                        "review-required-intent",
+                    ],
+                    evidence=self._evidence(tests, examples, docs),
+                )
+            )
+        return capabilities
+
+    def _intent_capability_items(self, chunks: list[ContentChunk]) -> list[str]:
+        items: list[str] = []
+        in_capability_section = False
+        for chunk in sorted(chunks, key=lambda item: (item.path, item.start_line)):
+            for raw_line in chunk.text.splitlines():
+                line = raw_line.strip()
+                if not line:
+                    continue
+                if line.startswith("#"):
+                    heading = line.lstrip("#").strip().lower()
+                    in_capability_section = "capabilit" in heading
+                    continue
+                if not in_capability_section:
+                    continue
+                item = re.sub(r"^(?:[-*]|\d+[.)])\s+", "", line).strip()
+                item = re.sub(r"^(?:capability|intended capability)\s*:\s*", "", item, flags=re.I)
+                if item and item != line or raw_line.lstrip().startswith(("-", "*")):
+                    items.append(item)
+        return items
+
+    def _intent_capability_name(self, text: str) -> str:
+        candidate = re.split(r"\s+-\s+|\s*:\s*|[.!?]\s+", text.strip(), maxsplit=1)[0]
+        candidate = candidate.strip(" .:-")
+        if not candidate:
+            return ""
+        return self._title_from_words(candidate.split()[:8])
+
     def _interface_features(
         self,
         interfaces: list[ObservedFact],
@@ -437,7 +526,7 @@ class CandidateGraphGenerator:
     def _interface_attributes(self, interfaces: list[ObservedFact]) -> list[str]:
         feature_types = {self._feature_type(fact) for fact in interfaces}
         attributes = ["api" if item == "API" else "cli" if item == "CLI" else "callable" for item in feature_types]
-        return self._unique(["surface", *attributes])
+        return self._unique(["surface", *attributes, "utility-owned"])
 
     def _feature_attributes(
         self,
@@ -467,6 +556,9 @@ class CandidateGraphGenerator:
                 "manifest" if manifests else "",
                 *[fact.name for fact in frameworks],
                 *[fact.name for fact in languages],
+                "utility-dependency" if manifests or frameworks else "",
+                "utility-tooling" if languages and not (manifests or frameworks) else "",
+                "review-required-structural-context",
             ]
         )
 

@@ -489,6 +489,8 @@ class RegistryService:
         graph = self.store.get_candidate_graph(repository_id, analysis_run_id)
         approved_count = 0
         skipped_count = 0
+        approved_reasons: list[str] = []
+        skipped_reasons: list[str] = []
         for ability in graph.abilities:
             if ability.status != "candidate":
                 continue
@@ -497,11 +499,14 @@ class RegistryService:
                 for capability in ability.capabilities
                 if capability.status == "candidate"
             ]
-            safe_capabilities = [
-                capability
-                for capability in candidate_capabilities
-                if self._trusted_auto_approve_capability_safe(capability)
-            ]
+            safe_capabilities = []
+            for capability in candidate_capabilities:
+                safe, reason = self._trusted_auto_approve_capability_decision(capability)
+                if safe:
+                    safe_capabilities.append(capability)
+                    approved_reasons.append(f"{capability.name}: {reason}")
+                else:
+                    skipped_reasons.append(f"{capability.name}: {reason}")
             skipped_count += len(candidate_capabilities) - len(safe_capabilities)
             if not safe_capabilities:
                 continue
@@ -536,6 +541,7 @@ class RegistryService:
             notes=(
                 f"{notes} Auto-approved {approved_count} safe candidate "
                 f"capability(s); left {skipped_count} for review."
+                f"{self._trusted_auto_approve_notes(approved_reasons, skipped_reasons)}"
             ).strip(),
         )
         return self.store.get_ability_map(repository_id)
@@ -544,23 +550,64 @@ class RegistryService:
         self,
         capability: CandidateCapability,
     ) -> bool:
+        safe, _reason = self._trusted_auto_approve_capability_decision(capability)
+        return safe
+
+    def _trusted_auto_approve_capability_decision(
+        self,
+        capability: CandidateCapability,
+    ) -> tuple[bool, str]:
         has_source_refs = bool(capability.source_refs) or any(
             feature.source_refs for feature in capability.features
         )
         if not has_source_refs:
-            return False
+            return False, "missing source references"
         if capability.primary_class == "repository-structure":
-            return False
+            return False, "structural/dependency context requires curator review"
+        utility_relationships = self._candidate_utility_relationships(capability)
+        eligible_relationships = {"owned", "facade", "adapter"}
+        if not utility_relationships:
+            return False, "missing utility relationship"
+        if not (utility_relationships & eligible_relationships):
+            relationships = ", ".join(sorted(utility_relationships))
+            return False, f"utility relationship is not eligible ({relationships})"
         if capability.primary_class == "llm-integration":
-            return bool(
-                {"utility-owned", "utility-facade", "utility-adapter"}
-                & set(capability.attributes)
-            )
+            return True, "eligible LLM utility relationship with source support"
         if capability.primary_class in {"interface", "API", "CLI", "callable", "api", "cli"}:
-            return capability.confidence >= 0.55
+            if capability.confidence >= 0.55:
+                return True, "owned interface with sufficient confidence"
+            return False, "owned interface confidence below trusted threshold"
         if capability.features:
-            return capability.confidence >= 0.55
-        return capability.confidence >= 0.75
+            if capability.confidence >= 0.55:
+                return True, "eligible utility relationship with feature support"
+            return False, "feature-backed capability confidence below trusted threshold"
+        if capability.confidence >= 0.75:
+            return True, "eligible utility relationship with high confidence"
+        return False, "capability confidence below trusted threshold"
+
+    def _candidate_utility_relationships(
+        self,
+        capability: CandidateCapability,
+    ) -> set[str]:
+        return {
+            attribute.removeprefix("utility-")
+            for attribute in capability.attributes
+            if attribute.startswith("utility-")
+        }
+
+    def _trusted_auto_approve_notes(
+        self,
+        approved_reasons: list[str],
+        skipped_reasons: list[str],
+    ) -> str:
+        details: list[str] = []
+        if approved_reasons:
+            details.append("Approved: " + "; ".join(approved_reasons) + ".")
+        if skipped_reasons:
+            details.append("Skipped: " + "; ".join(skipped_reasons) + ".")
+        if not details:
+            return ""
+        return " " + " ".join(details)
 
     def _approved_counts(self, repository_id: int) -> dict[str, int]:
         ability_map = self.store.get_ability_map(repository_id)

@@ -58,7 +58,7 @@ def test_candidate_generator_builds_purpose_seed_from_observed_facts():
     interface_capability = ability.capabilities[0]
     assert interface_capability.name == "Expose Repository Interface"
     assert interface_capability.primary_class == "interface"
-    assert {"surface", "api"} <= set(interface_capability.attributes)
+    assert {"surface", "api", "utility-owned"} <= set(interface_capability.attributes)
     assert interface_capability.confidence == 0.75
     assert interface_capability.inputs == ["HTTP request"]
     assert interface_capability.outputs == ["HTTP response"]
@@ -68,6 +68,76 @@ def test_candidate_generator_builds_purpose_seed_from_observed_facts():
     assert interface_capability.features[0].name == "POST /classify"
     assert interface_capability.features[0].location == "app.py"
     assert interface_capability.evidence[0].strength == "strong"
+    structure_capability = ability.capabilities[1]
+    assert structure_capability.name == "Describe Repository Structure"
+    assert {
+        "utility-dependency",
+        "review-required-structural-context",
+    } <= set(structure_capability.attributes)
+
+
+def test_candidate_generator_extracts_intended_capability_blocks_from_intent_chunks():
+    repository = Repository(
+        id=1,
+        name="KeyCape",
+        url="/tmp/key-cape",
+        description=None,
+        branch="main",
+        status="analyzed",
+    )
+    facts = [
+        fact(
+            1,
+            "intent",
+            "INTENT",
+            "INTENT.md",
+            metadata={"source_role": "intent_summary"},
+        ),
+        fact(
+            2,
+            "scope",
+            "SCOPE",
+            "SCOPE.md",
+            metadata={"source_role": "derived_scope"},
+        ),
+    ]
+    chunks = [
+        chunk(
+            1,
+            "intent",
+            "INTENT.md",
+            "# INTENT\n\n"
+            "Lightweight IAM for small deployments.\n\n"
+            "## Intended Capabilities\n\n"
+            "- Enforce OIDC PKCE profiles: reject unsafe client profiles.\n"
+            "- Validate LDAP schema migrations.\n",
+        ),
+        chunk(
+            2,
+            "scope",
+            "SCOPE.md",
+            "# SCOPE\n\n## Intended Capabilities\n\n- Route LLM provider requests.\n",
+        ),
+    ]
+
+    graph = CandidateGraphGenerator().generate(repository, facts, chunks)
+
+    capability_names = {capability.name for capability in graph[0].capabilities}
+    assert "Enforce OIDC PKCE Profiles" in capability_names
+    assert "Validate LDAP Schema Migrations" in capability_names
+    assert "Route LLM Provider Requests" not in capability_names
+    intent_capability = next(
+        capability
+        for capability in graph[0].capabilities
+        if capability.name == "Enforce OIDC PKCE Profiles"
+    )
+    assert intent_capability.primary_class == "intent-capability"
+    assert {
+        "intent-derived",
+        "utility-owned",
+        "review-required-intent",
+    } <= set(intent_capability.attributes)
+    assert [ref.path for ref in intent_capability.source_refs] == ["INTENT.md"]
 
 
 def test_candidate_generator_enriches_descriptions_from_content_chunks():

@@ -756,6 +756,15 @@ def test_analyze_repository_can_trusted_auto_approve_candidates(tmp_path):
     assert decisions[0].action == "trusted_auto_approve_candidate_graph"
     assert "deterministic candidate generation" in decisions[0].notes
     assert "Auto-approved 1 safe candidate capability(s); left 1 for review." in decisions[0].notes
+    assert (
+        "Approved: Expose Repository Interface: owned interface with sufficient confidence."
+        in decisions[0].notes
+    )
+    assert (
+        "Skipped: Describe Repository Structure: structural/dependency context "
+        "requires curator review."
+        in decisions[0].notes
+    )
 
 
 def test_rebuild_characteristics_dry_run_preserves_approved_map(tmp_path):

@@ -98,7 +98,7 @@ Acceptance criteria:
 
 ```task
 id: RREG-WP-0009-T03
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "3b8bac53-6a14-43b3-9a59-e15c24c0cd6e"
 ```
@@ -121,7 +121,7 @@ Acceptance criteria:
 
 ```task
 id: RREG-WP-0009-T04
-status: in_progress
+status: done
 priority: high
 state_hub_task_id: "4f666cd6-471e-4af9-b53c-4f3d7a1d1973"
 ```
@@ -148,7 +148,7 @@ Acceptance criteria:
 
 ```task
 id: RREG-WP-0009-T05
-status: in_progress
+status: done
 priority: medium
 state_hub_task_id: "d10d4bd7-4e5e-4efc-a724-b072fc53b8d2"
 ```
@@ -239,7 +239,7 @@ Acceptance criteria:
 
 ```task
 id: RREG-WP-0009-T09
-status: in_progress
+status: done
 priority: medium
 state_hub_task_id: "071f6d76-c92b-4ac1-825c-edcbef4bdbf6"
 ```