generated from coulomb/repo-seed
Add discovery snapshot contract
165
docs/repo-reality-scanner.md
Normal file
@@ -0,0 +1,165 @@
# Repo Reality Scanner

The repo reality scanner discovers Fabric entities from repository evidence and
turns them into candidate graph facts. It is a discovery layer, not a new
authoring surface. Repo-owned declarations remain the highest-trust source for
accepted Fabric graph data.

## Contract

A scanner run emits a `FabricDiscoverySnapshot`. The snapshot is scoped to one
repository, one commit, and one scan profile. It contains:

- replacement scopes, which define the evidence sets that may be replaced on a
  rescan
- candidate nodes, edges, and attributes
- source anchors for every candidate
- extractor provenance for every candidate
- tombstones for candidates that vanished inside a replacement scope
- reconciliation policy metadata

The JSON Schema, written in YAML, lives at
`schemas/discovery-snapshot.schema.yaml`.

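As a rough illustration of the contract, the sketch below builds an empty
snapshot as a Python dict. The field names and the `apiVersion` value follow
the schema and test fixtures in this commit; the helpers `empty_snapshot` and
`missing_required` and the `scan:{repo}:{commit}` run id convention are
assumptions made for the example, not part of the scanner.

```python
# Top-level keys the snapshot schema marks as required.
REQUIRED_TOP_LEVEL = (
    "apiVersion",
    "kind",
    "source",
    "scan",
    "replacement_scopes",
    "candidates",
    "tombstones",
    "reconciliation",
)


def empty_snapshot(repo_slug: str, commit: str, profile: str) -> dict:
    """Hypothetical helper: a snapshot with no candidates or tombstones yet."""
    return {
        "apiVersion": "railiance.fabric/v1alpha1",
        "kind": "FabricDiscoverySnapshot",
        "source": {"repo_slug": repo_slug, "commit": commit},
        "scan": {
            "run_id": f"scan:{repo_slug}:{commit}",  # assumed run id convention
            "profile": profile,
            "deterministic_only": True,
            "llm_enabled": False,
        },
        "replacement_scopes": [],
        "candidates": {"nodes": [], "edges": [], "attributes": []},
        "tombstones": [],
        "reconciliation": {
            "precedence": [
                "repo_declaration", "deterministic", "catalog",
                "registry", "llm", "manual",
            ],
            "duplicate_policy": "stable-key matches merge automatically",
            "retirement_policy": "missing candidates retire only inside their replacement scope",
        },
    }


def missing_required(snapshot: dict) -> list:
    """Return the required top-level keys that the snapshot lacks."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in snapshot]
```

A real run would validate the dict against the schema file rather than rely on
a key check like this one.
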
## Identity

Identity is the main safety boundary. The scanner must not append guesses on
every run. It must produce stable keys that repeat exactly for the same
observed entity.

Candidate node keys use this shape:

```text
discovery:{repo_slug}:{entity_kind}:{normalized_name}[:{source_fingerprint}]
```

Use the optional source fingerprint when a name is too generic or when multiple
entities of the same kind can share a display name. Examples include HTTP
routes, generated clients, deployment manifests, and catalog records.

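The key shape can be sketched in a few lines. This is a simplified stand-in
that follows the normalization and hashing rules of the helpers in
`railiance_fabric.discovery` but omits the length limit; the names
`normalize_part` and `node_key` and the `HttpRoute` inputs are made up for the
example.

```python
import hashlib
import json
import re


def normalize_part(value: str) -> str:
    """Lowercase one key segment and replace unsupported characters with '-'."""
    # Split camelCase boundaries first: "ServiceDeclaration" -> "Service-Declaration".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", value.strip()).lower()
    text = re.sub(r"[^a-z0-9._@+-]+", "-", text)
    return re.sub(r"-+", "-", text).strip("._-+@") or "unknown"


def node_key(repo_slug, entity_kind, name, source_anchor=None):
    """Build a node key, adding a short fingerprint when an anchor is given."""
    parts = ["discovery", normalize_part(repo_slug),
             normalize_part(entity_kind), normalize_part(name)]
    if source_anchor is not None:
        payload = json.dumps(source_anchor, sort_keys=True, separators=(",", ":"))
        parts.append(hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12])
    return ":".join(parts)


# A unique name needs no fingerprint.
plain = node_key("Repo Scoping", "ServiceDeclaration", "Scope Generator")

# Two routes with the same display name stay distinct through their anchors.
route_a = node_key("repo-scoping", "HttpRoute", "GET /health",
                   {"path": "api/a.py", "line_start": 10})
route_b = node_key("repo-scoping", "HttpRoute", "GET /health",
                   {"path": "api/b.py", "line_start": 4})
```

Here `plain` is `discovery:repo-scoping:service-declaration:scope-generator`,
while `route_a` and `route_b` share the same readable prefix and differ only in
the trailing fingerprint.
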
Candidate edge keys use a relationship fingerprint over:

- source stable key
- edge type
- target stable key
- optional evidence scope

Candidate attribute keys use the entity stable key plus the normalized
attribute name and, where needed, a source fingerprint.

Stable-key parts are lowercased and normalized to ASCII identity segments:
camelCase boundaries are split with hyphens, and any run of characters outside
`a-z`, `0-9`, `.`, `_`, `@`, `+`, and `-` collapses to a single hyphen.
The helper functions in `railiance_fabric.discovery` define the initial rules.

## Source Anchors

Every candidate must carry one or more source anchors. A source anchor records
the evidence behind the scanner's belief that the fact exists. Anchors can
point to files, package manifests, lockfiles, API contracts, deployment
manifests, service catalogs, registries, LLM evidence bundles, or manual review
notes.

Source anchors include a fingerprint. The fingerprint should cover stable
location fields such as path, URL, ref, line range, or JSON pointer. Snippets are
useful for review but should not be the only identity anchor because formatting
noise can churn snippets.

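A minimal sketch of a location-only fingerprint, assuming the stable fields
listed above. It mirrors the idea behind `source_fingerprint` in
`railiance_fabric.discovery`; the name `anchor_fingerprint` is hypothetical.

```python
import hashlib
import json

# Location fields that take part in identity; "snippet" is deliberately absent.
STABLE_FIELDS = ("source_kind", "path", "url", "ref",
                 "line_start", "line_end", "json_pointer")


def anchor_fingerprint(anchor: dict) -> str:
    """Hash only the stable location fields of a source anchor."""
    stable = {field: anchor[field] for field in STABLE_FIELDS
              if anchor.get(field) not in (None, "")}
    payload = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


base = {"source_kind": "file", "path": "README.md",
        "line_start": 10, "line_end": 18, "snippet": "first wording"}
reworded = dict(base, snippet="edited wording")   # same place, new text
moved = dict(base, line_start=11)                 # same text, new place
```

Rewording the snippet leaves the fingerprint unchanged, while moving the line
range produces a new one.
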
## Replacement Scopes

A replacement scope says which extractor owns which set of candidates. Rescans
may retire missing candidates only inside the same scope.

Examples:

- `scope:repo-scoping:python-package:package_manifest:<hash>`
- `scope:state-hub:fabric-declarations:declaration`
- `scope:llm-connect:readme-summary:file:<hash>`
- `scope:railiance-fabric:local-registry:fabric_registry`

Scopes have a mode:

- `replacement`: candidates missing from the next run in the same scope become
  tombstones.
- `additive`: candidates are added or updated, but absence does not retire old
  candidates.

LLM extractors should use replacement mode only for tightly bounded evidence
bundles. Broad repo summaries are safer as additive or review-only until the
extraction prompts are proven stable.

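The effect of the two modes can be sketched as follows. The function name
`retire_missing` and its flat dictionaries are assumptions for illustration;
the real snapshot carries richer candidate objects.

```python
def retire_missing(previous: dict, current_keys: set, scope_modes: dict) -> list:
    """Return the stable keys that should become tombstones.

    previous     maps stable key -> replacement scope id from the last run
    current_keys holds the stable keys seen in the new run
    scope_modes  maps scope id -> "replacement" or "additive" for scopes
                 that were actually rescanned
    """
    tombstones = []
    for key, scope in sorted(previous.items()):
        if key in current_keys:
            continue  # still observed, nothing to retire
        # Only a rescanned scope in replacement mode may retire its own candidates.
        if scope_modes.get(scope) == "replacement":
            tombstones.append(key)
    return tombstones


previous = {
    "discovery:demo:service:old-tool": "scope:demo:python-package:package_manifest",
    "discovery:demo:service:summary-guess": "scope:demo:readme-summary:file",
    "discovery:demo:service:registry-entry": "scope:demo:local-registry:fabric_registry",
}
scope_modes = {
    "scope:demo:python-package:package_manifest": "replacement",
    "scope:demo:readme-summary:file": "additive",
    # The registry scope did not run this time, so it is not listed.
}
retired = retire_missing(previous, set(), scope_modes)
```

Only the package manifest candidate is retired. The additive candidate and the
candidate whose scope was not rescanned are left untouched.
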
## Merge Precedence

When multiple sources describe the same entity, reconciliation uses this
precedence, highest first:

1. `repo_declaration`
2. `deterministic`
3. `catalog`
4. `registry`
5. `llm`
6. `manual`

Manual review can override local candidate state, but it should not silently
rewrite repo-owned declarations. If accepted discoveries should become
authoritative, the safer next step is to generate a repo-owned declaration patch
for human review.

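One way to apply the list is to pick the candidate whose origin appears
earliest. This sketch assumes the list is ordered from most to least trusted
and that every candidate carries an `origin` field as in the schema; the
function name `pick_authoritative` is hypothetical.

```python
# Origins in precedence order; an earlier entry wins (assumed ordering).
PRECEDENCE = ("repo_declaration", "deterministic", "catalog",
              "registry", "llm", "manual")


def pick_authoritative(candidates: list) -> dict:
    """Return the candidate whose origin ranks first in PRECEDENCE."""
    return min(candidates, key=lambda c: PRECEDENCE.index(c["origin"]))


claims = [
    {"origin": "llm", "label": "Scope Generator (summary)"},
    {"origin": "deterministic", "label": "Scope Generator"},
    {"origin": "registry", "label": "scope-generator"},
]
winner = pick_authoritative(claims)
```

With these inputs the deterministic claim wins. The other claims would attach
as supporting evidence rather than be discarded.
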
## Duplicate Handling

The reconciler should merge candidates with the same stable key automatically.
It should also look for possible duplicates using:

- alias overlap
- identical source anchors
- identical evidence fingerprints
- normalized label similarity within the same entity kind
- relationship fingerprints with the same endpoints and edge type
- declaration ids that match discovery aliases

Exact stable-key matches can be merged automatically. Alias-only or
similarity-only matches should become `needs_review` conflicts unless an
extractor has a source-specific rule that makes the match deterministic.

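The merge-or-review decision might look like the sketch below. It covers only
stable keys, alias overlap, and label equality; case-insensitive label equality
stands in for "similarity", and the return values `merge` and `distinct` are
labels invented for the example (`needs_review` comes from the schema).

```python
def classify_match(a: dict, b: dict, *, deterministic_rule: bool = False) -> str:
    """Classify a pair of candidates as merge, needs_review, or distinct."""
    if a["stable_key"] == b["stable_key"]:
        return "merge"  # exact identity match is always safe to merge
    shared_alias = set(a.get("aliases", ())) & set(b.get("aliases", ()))
    same_label = (a["kind"] == b["kind"]
                  and a["label"].casefold() == b["label"].casefold())
    if shared_alias or same_label:
        # Weak evidence merges only when an extractor-specific rule vouches for it.
        return "merge" if deterministic_rule else "needs_review"
    return "distinct"


left = {"stable_key": "discovery:demo:service:scope-generator",
        "kind": "ServiceDeclaration", "label": "Scope Generator",
        "aliases": ["scope-generator"]}
right = {"stable_key": "discovery:demo:service:scope-gen",
         "kind": "ServiceDeclaration", "label": "Scope Gen",
         "aliases": ["scope-generator", "sg"]}
```

`classify_match(left, right)` yields `needs_review` because the pair shares
only an alias.
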
## Rescan And Tombstones

On a rescan, the scanner compares the previous accepted discovery snapshot with
the newly produced snapshot for the same repo/profile.

- Same stable key: update in place.
- Same source anchor but changed attributes: update with changed evidence.
- Missing from same replacement scope: create a tombstone.
- Missing from a different scope: leave untouched.
- Reappears after tombstone: reactivate if the stable key and scope match.
- Reappears with a new key but same alias/source anchor: flag as possible
  duplicate resurrection.

Tombstones explain graph drift and prevent immediate re-creation loops. They
should be retained long enough to compare several scan cycles and can later be
compacted by repo, extractor, or entity kind.

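The two reappearance rules can be sketched like this. The sketch assumes each
tombstone keeps its optional `previous_candidate` payload with `aliases` and
`source_anchors`; the function name and the returned labels are hypothetical.

```python
def classify_reappearance(candidate: dict, tombstones: list) -> str:
    """Decide how a newly seen candidate relates to retained tombstones."""
    # Rule 1: same stable key in the same scope means the entity came back.
    for stone in tombstones:
        if (stone["stable_key"] == candidate["stable_key"]
                and stone["replacement_scope"] == candidate["replacement_scope"]):
            return "reactivate"

    # Rule 2: a new key that shares an alias or anchor with a retired
    # candidate is suspicious and goes to review.
    aliases = set(candidate.get("aliases", ()))
    anchors = {a["fingerprint"] for a in candidate.get("source_anchors", ())}
    for stone in tombstones:
        previous = stone.get("previous_candidate", {})
        old_aliases = set(previous.get("aliases", ()))
        old_anchors = {a["fingerprint"] for a in previous.get("source_anchors", ())}
        if aliases & old_aliases or anchors & old_anchors:
            return "possible_duplicate_resurrection"
    return "new"


stones = [{
    "stable_key": "discovery:demo:service:old-scope-tool",
    "replacement_scope": "scope:demo:python-package:package_manifest",
    "previous_candidate": {"aliases": ["scope-tool"],
                           "source_anchors": [{"fingerprint": "0123456789abcdef"}]},
}]
renamed = {"stable_key": "discovery:demo:service:scope-tool-v2",
           "replacement_scope": "scope:demo:python-package:package_manifest",
           "aliases": ["scope-tool"]}
```

`classify_reappearance(renamed, stones)` flags a possible duplicate
resurrection because the key is new but the alias matches a retired candidate.
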
## Mapping To Fabric Graphs

Discovery candidates can project into the existing graph model when accepted:

- candidate service nodes map to `ServiceDeclaration`-like graph nodes
- candidate capabilities and interfaces map to provider surface nodes
- candidate dependencies map to dependency nodes and `consumes` edges
- candidate deployment/runtime entities map to graph explorer infrastructure
  nodes until declarations gain first-class runtime support
- candidate libraries map to library inventory records and graph explorer nodes

If a repo-owned declaration already exists for the same entity, discovery output
should attach as supporting evidence instead of creating another node.

## LLM Boundary

LLM extraction through `llm-connect` is optional and schema-gated. The scanner
should use deterministic preselection to build small evidence bundles, ask for
structured JSON, validate the JSON against the discovery schema, and record:

- extractor id and version
- prompt version
- provider and model
- usage metadata
- confidence and uncertainty
- rationale

Malformed, low-confidence, or conflicting LLM output becomes review material,
not accepted graph data.

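A gate of this kind could be sketched as below. It checks only a handful of
fields by hand, where the scanner would validate against the full discovery
schema. The 0.7 confidence threshold, the field set, and the function name
`gate_llm_output` are all assumptions for illustration.

```python
import json
from typing import Optional, Tuple

# Illustrative subset of fields an LLM candidate must supply.
REQUIRED_FIELDS = {"stable_key", "kind", "label", "confidence"}


def gate_llm_output(raw: str, *, min_confidence: float = 0.7) -> Tuple[str, Optional[dict]]:
    """Return a review state and the parsed payload, never 'accepted'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "needs_review", None  # malformed output
    if not isinstance(data, dict) or not REQUIRED_FIELDS.issubset(data):
        return "needs_review", None  # structurally incomplete
    confidence = data["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "needs_review", data
    if not 0 <= confidence <= 1 or confidence < min_confidence:
        return "needs_review", data  # out of range or too uncertain
    return "candidate", data


good = json.dumps({"stable_key": "discovery:demo:capability:scope-generation",
                   "kind": "CapabilityDeclaration",
                   "label": "Scope Generation", "confidence": 0.9})
state, payload = gate_llm_output(good)
```

Even well-formed, confident output only reaches the `candidate` review state
here. Promotion to accepted graph data stays with reconciliation and review.
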
113
railiance_fabric/discovery.py
Normal file
@@ -0,0 +1,113 @@
from __future__ import annotations

import hashlib
import json
import re
from typing import Any


# Any run of characters outside this set is replaced with a single hyphen.
_IDENTITY_PART_RE = re.compile(r"[^a-z0-9._@+-]+")
_DASH_RE = re.compile(r"-+")


def normalize_identity_part(value: object, *, fallback: str = "unknown") -> str:
    """Normalize one stable-key segment without making it opaque."""

    # Split camelCase boundaries before lowercasing so word breaks survive.
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", str(value or "").strip()).lower()
    text = _IDENTITY_PART_RE.sub("-", text)
    text = _DASH_RE.sub("-", text).strip("._-+@")
    return text or fallback


def short_fingerprint(value: object, *, length: int = 12) -> str:
    """Return a deterministic short SHA-256 fingerprint for identity suffixes."""

    if length < 8:
        raise ValueError("fingerprints shorter than 8 characters are too collision-prone")
    if isinstance(value, str):
        payload = value
    else:
        # Canonical JSON (sorted keys, compact separators) keeps the hash repeatable.
        payload = json.dumps(value, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


def discovery_stable_key(
    repo_slug: str,
    entity_kind: str,
    name: str,
    *,
    source_anchor: object | None = None,
) -> str:
    """Build a canonical discovery key for a repo-scoped candidate entity."""

    key = "discovery:{repo}:{kind}:{name}".format(
        repo=normalize_identity_part(repo_slug),
        kind=normalize_identity_part(entity_kind),
        name=normalize_identity_part(name),
    )
    # The optional fingerprint separates entities that share a display name.
    if source_anchor is not None:
        key = f"{key}:{short_fingerprint(source_anchor)}"
    return _limit_stable_key(key)


def relationship_stable_key(
    source_key: str,
    edge_type: str,
    target_key: str,
    *,
    evidence_scope: object | None = None,
) -> str:
    """Build a stable relationship key from normalized endpoints and edge type."""

    payload = {
        "source": source_key,
        "edge_type": normalize_identity_part(edge_type),
        "target": target_key,
        "evidence_scope": evidence_scope or "",
    }
    return f"edge:{short_fingerprint(payload, length=20)}"


def attribute_stable_key(entity_key: str, attribute_name: str, *, source_anchor: object | None = None) -> str:
    """Build a stable key for a discovered attribute on an entity."""

    key = f"attribute:{str(entity_key).strip()}:{normalize_identity_part(attribute_name)}"
    if source_anchor is not None:
        key = f"{key}:{short_fingerprint(source_anchor)}"
    return _limit_stable_key(key)


def replacement_scope_id(
    repo_slug: str,
    extractor_id: str,
    source_kind: str,
    *,
    source_path: str | None = None,
) -> str:
    """Build the scope id that controls safe replacement on rescans."""

    key = "scope:{repo}:{extractor}:{source_kind}".format(
        repo=normalize_identity_part(repo_slug),
        extractor=normalize_identity_part(extractor_id),
        source_kind=normalize_identity_part(source_kind),
    )
    # A path fingerprint gives each scanned source file its own scope.
    if source_path:
        key = f"{key}:{short_fingerprint(source_path)}"
    return _limit_stable_key(key)


def source_fingerprint(anchor: dict[str, Any]) -> str:
    """Fingerprint the stable parts of a source anchor."""

    # Snippets are left out on purpose so wording changes do not alter identity.
    stable_anchor = {
        key: anchor.get(key)
        for key in ("source_kind", "path", "url", "ref", "line_start", "line_end", "json_pointer")
        if anchor.get(key) not in (None, "")
    }
    return short_fingerprint(stable_anchor, length=16)


def _limit_stable_key(key: str, *, max_length: int = 240) -> str:
    if len(key) <= max_length:
        return key
    # Keep a readable prefix and append a 20-character hash of the full key.
    return f"{key[: max_length - 21].rstrip(':._-')}:{short_fingerprint(key, length=20)}"
487
schemas/discovery-snapshot.schema.yaml
Normal file
@@ -0,0 +1,487 @@
$schema: "https://json-schema.org/draft/2020-12/schema"
$id: "https://railiance.local/fabric/schemas/discovery-snapshot.schema.yaml"
title: "FabricDiscoverySnapshot"
type: object
additionalProperties: false
required:
  - apiVersion
  - kind
  - source
  - scan
  - replacement_scopes
  - candidates
  - tombstones
  - reconciliation
properties:
  apiVersion:
    $ref: "./common.schema.yaml#/$defs/apiVersion"
  kind:
    type: string
    const: FabricDiscoverySnapshot
  generated_at:
    type: string
    format: date-time
  source:
    type: object
    additionalProperties: false
    required:
      - repo_slug
      - commit
    properties:
      repo_slug:
        type: string
        minLength: 1
      repo_name:
        type: string
      domain:
        type: string
      commit:
        type: string
        minLength: 1
      default_branch:
        type: string
      path:
        type: string
  scan:
    type: object
    additionalProperties: false
    required:
      - run_id
      - profile
      - deterministic_only
      - llm_enabled
    properties:
      run_id:
        $ref: "#/$defs/stableKey"
      profile:
        type: string
        minLength: 1
      deterministic_only:
        type: boolean
      llm_enabled:
        type: boolean
      started_at:
        type: string
        format: date-time
      completed_at:
        type: string
        format: date-time
      llm_budget:
        type: object
        additionalProperties: true
  replacement_scopes:
    type: array
    items:
      $ref: "#/$defs/replacementScope"
  candidates:
    type: object
    additionalProperties: false
    required:
      - nodes
      - edges
      - attributes
    properties:
      nodes:
        type: array
        items:
          $ref: "#/$defs/candidateNode"
      edges:
        type: array
        items:
          $ref: "#/$defs/candidateEdge"
      attributes:
        type: array
        items:
          $ref: "#/$defs/candidateAttribute"
  tombstones:
    type: array
    items:
      $ref: "#/$defs/tombstone"
  reconciliation:
    type: object
    additionalProperties: false
    required:
      - precedence
      - duplicate_policy
      - retirement_policy
    properties:
      precedence:
        type: array
        minItems: 1
        uniqueItems: true
        items:
          $ref: "#/$defs/origin"
      duplicate_policy:
        type: string
        minLength: 1
      retirement_policy:
        type: string
        minLength: 1
      conflicts:
        type: array
        items:
          type: object
          additionalProperties: true

$defs:
  stableKey:
    type: string
    minLength: 3
    maxLength: 240
    pattern: "^[A-Za-z0-9][A-Za-z0-9._:/@+-]*$"

  origin:
    type: string
    enum:
      - repo_declaration
      - deterministic
      - catalog
      - registry
      - llm
      - manual

  reviewState:
    type: string
    enum:
      - accepted
      - candidate
      - needs_review
      - rejected

  entityStatus:
    type: string
    enum:
      - active
      - retired
      - duplicate
      - conflicted

  sourceKind:
    type: string
    enum:
      - file
      - declaration
      - package_manifest
      - lockfile
      - api_contract
      - deployment_manifest
      - service_config
      - service_catalog
      - package_registry
      - container_registry
      - fabric_registry
      - llm
      - manual

  extractionMethod:
    type: string
    enum:
      - declaration
      - deterministic
      - connector
      - llm
      - manual

  confidence:
    type: number
    minimum: 0
    maximum: 1

  jsonValue:
    anyOf:
      - type: "null"
      - type: string
      - type: number
      - type: integer
      - type: boolean
      - type: array
        items:
          $ref: "#/$defs/jsonValue"
      - type: object
        additionalProperties:
          $ref: "#/$defs/jsonValue"

  sourceAnchor:
    type: object
    additionalProperties: false
    required:
      - source_kind
      - fingerprint
    properties:
      source_kind:
        $ref: "#/$defs/sourceKind"
      path:
        type: string
        minLength: 1
      url:
        type: string
        format: uri
      ref:
        type: string
        minLength: 1
      line_start:
        type: integer
        minimum: 1
      line_end:
        type: integer
        minimum: 1
      json_pointer:
        type: string
      fingerprint:
        type: string
        minLength: 8
      snippet:
        type: string

  provenance:
    type: object
    additionalProperties: false
    required:
      - extractor_id
      - method
      - origin
    properties:
      extractor_id:
        type: string
        minLength: 1
      extractor_version:
        type: string
      method:
        $ref: "#/$defs/extractionMethod"
      origin:
        $ref: "#/$defs/origin"
      prompt_version:
        type: string
      provider:
        type: string
      model:
        type: string
      usage:
        type: object
        additionalProperties: true
      rationale:
        type: string

  replacementScope:
    type: object
    additionalProperties: false
    required:
      - id
      - extractor_id
      - source_kind
      - mode
    properties:
      id:
        $ref: "#/$defs/stableKey"
      extractor_id:
        type: string
        minLength: 1
      source_kind:
        $ref: "#/$defs/sourceKind"
      source_path:
        type: string
      mode:
        type: string
        enum:
          - replacement
          - additive
      description:
        type: string

  candidateNode:
    type: object
    additionalProperties: false
    required:
      - stable_key
      - kind
      - label
      - repo
      - origin
      - review_state
      - status
      - confidence
      - replacement_scope
      - provenance
      - source_anchors
    properties:
      stable_key:
        $ref: "#/$defs/stableKey"
      graph_id:
        $ref: "./common.schema.yaml#/$defs/graphId"
      kind:
        type: string
        minLength: 1
      label:
        type: string
        minLength: 1
      repo:
        type: string
        minLength: 1
      domain:
        type: string
      lifecycle:
        type: string
      aliases:
        type: array
        uniqueItems: true
        items:
          type: string
          minLength: 1
      attributes:
        type: object
        additionalProperties:
          $ref: "#/$defs/jsonValue"
      origin:
        $ref: "#/$defs/origin"
      review_state:
        $ref: "#/$defs/reviewState"
      status:
        $ref: "#/$defs/entityStatus"
      confidence:
        $ref: "#/$defs/confidence"
      replacement_scope:
        $ref: "#/$defs/stableKey"
      provenance:
        type: array
        minItems: 1
        items:
          $ref: "#/$defs/provenance"
      source_anchors:
        type: array
        minItems: 1
        items:
          $ref: "#/$defs/sourceAnchor"

  candidateEdge:
    type: object
    additionalProperties: false
    required:
      - stable_key
      - edge_type
      - source_key
      - target_key
      - origin
      - review_state
      - status
      - confidence
      - replacement_scope
      - provenance
      - source_anchors
    properties:
      stable_key:
        $ref: "#/$defs/stableKey"
      edge_type:
        type: string
        minLength: 1
      source_key:
        $ref: "#/$defs/stableKey"
      target_key:
        $ref: "#/$defs/stableKey"
      aliases:
        type: array
        uniqueItems: true
        items:
          type: string
          minLength: 1
      attributes:
        type: object
        additionalProperties:
          $ref: "#/$defs/jsonValue"
      origin:
        $ref: "#/$defs/origin"
      review_state:
        $ref: "#/$defs/reviewState"
      status:
        $ref: "#/$defs/entityStatus"
      confidence:
        $ref: "#/$defs/confidence"
      replacement_scope:
        $ref: "#/$defs/stableKey"
      provenance:
        type: array
        minItems: 1
        items:
          $ref: "#/$defs/provenance"
      source_anchors:
        type: array
        minItems: 1
        items:
          $ref: "#/$defs/sourceAnchor"

  candidateAttribute:
    type: object
    additionalProperties: false
    required:
      - stable_key
      - entity_key
      - name
      - value
      - origin
      - review_state
      - confidence
      - replacement_scope
      - provenance
      - source_anchors
    properties:
      stable_key:
        $ref: "#/$defs/stableKey"
      entity_key:
        $ref: "#/$defs/stableKey"
      name:
        type: string
        minLength: 1
      value:
        $ref: "#/$defs/jsonValue"
      origin:
        $ref: "#/$defs/origin"
      review_state:
        $ref: "#/$defs/reviewState"
      confidence:
        $ref: "#/$defs/confidence"
      replacement_scope:
        $ref: "#/$defs/stableKey"
      provenance:
        type: array
        minItems: 1
        items:
          $ref: "#/$defs/provenance"
      source_anchors:
        type: array
        minItems: 1
        items:
          $ref: "#/$defs/sourceAnchor"

  tombstone:
    type: object
    additionalProperties: false
    required:
      - stable_key
      - entity_kind
      - replacement_scope
      - retired_at
      - reason
    properties:
      stable_key:
        $ref: "#/$defs/stableKey"
      entity_kind:
        type: string
        enum:
          - node
          - edge
          - attribute
      replacement_scope:
        $ref: "#/$defs/stableKey"
      retired_at:
        type: string
        format: date-time
      reason:
        type: string
        enum:
          - source_missing
          - scope_replaced
          - duplicate_superseded
          - declaration_override
          - manually_retired
      previous_candidate:
        type: object
        additionalProperties: true
270
tests/test_discovery.py
Normal file
@@ -0,0 +1,270 @@
from __future__ import annotations

from pathlib import Path

import pytest
import jsonschema

from railiance_fabric.discovery import (
    attribute_stable_key,
    discovery_stable_key,
    relationship_stable_key,
    replacement_scope_id,
    source_fingerprint,
)
from railiance_fabric.schema_validation import draft202012_validator


def test_discovery_identity_helpers_are_stable_and_scoped() -> None:
    service_key = discovery_stable_key("Repo Scoping", "ServiceDeclaration", "Scope Generator")
    duplicate_key = discovery_stable_key("repo-scoping", "service declaration", "scope-generator")
    readme_key = discovery_stable_key(
        "repo-scoping",
        "ServiceDeclaration",
        "Scope Generator",
        source_anchor={"path": "README.md", "line_start": 12},
    )
    pyproject_key = discovery_stable_key(
        "repo-scoping",
        "ServiceDeclaration",
        "Scope Generator",
        source_anchor={"path": "pyproject.toml", "line_start": 1},
    )

    assert service_key == duplicate_key
    assert service_key == "discovery:repo-scoping:service-declaration:scope-generator"
    assert readme_key != pyproject_key
    assert readme_key.startswith(service_key)

    edge_key = relationship_stable_key(
        service_key,
        "provides",
        discovery_stable_key("repo-scoping", "CapabilityDeclaration", "Scope Generation"),
        evidence_scope="fabric/declarations",
    )
    assert edge_key.startswith("edge:")
    assert edge_key == relationship_stable_key(
        service_key,
        "provides",
        discovery_stable_key("repo-scoping", "CapabilityDeclaration", "Scope Generation"),
        evidence_scope="fabric/declarations",
    )

    attribute_key = attribute_stable_key(service_key, "runtime")
    assert attribute_key == "attribute:discovery:repo-scoping:service-declaration:scope-generator:runtime"

    assert replacement_scope_id(
        "Repo Scoping",
        "python-package",
        "package_manifest",
        source_path="pyproject.toml",
    ) != replacement_scope_id(
        "Repo Scoping",
        "python-package",
        "package_manifest",
        source_path="package.json",
    )


def test_source_fingerprint_ignores_review_snippet_noise() -> None:
    base = {
        "source_kind": "file",
        "path": "README.md",
        "line_start": 10,
        "line_end": 18,
        "snippet": "first wording",
    }
    changed_snippet = {**base, "snippet": "edited wording"}

    assert source_fingerprint(base) == source_fingerprint(changed_snippet)
    assert source_fingerprint(base) != source_fingerprint({**base, "line_start": 11})


def test_discovery_snapshot_schema_accepts_candidate_graph() -> None:
    service_key = discovery_stable_key("repo-scoping", "ServiceDeclaration", "Scope Generator")
    capability_key = discovery_stable_key("repo-scoping", "CapabilityDeclaration", "Scope Generation")
    edge_key = relationship_stable_key(service_key, "provides", capability_key)
    scope_id = replacement_scope_id(
        "repo-scoping",
        "python-package",
        "package_manifest",
        source_path="pyproject.toml",
    )
    anchor = {
        "source_kind": "package_manifest",
        "path": "pyproject.toml",
        "json_pointer": "/project/name",
        "fingerprint": source_fingerprint(
            {
                "source_kind": "package_manifest",
                "path": "pyproject.toml",
                "json_pointer": "/project/name",
            }
        ),
    }
    provenance = {
        "extractor_id": "python-package",
        "extractor_version": "0.1.0",
        "method": "deterministic",
        "origin": "deterministic",
    }
    payload = {
        "apiVersion": "railiance.fabric/v1alpha1",
        "kind": "FabricDiscoverySnapshot",
        "generated_at": "2026-05-19T00:00:00Z",
        "source": {
            "repo_slug": "repo-scoping",
            "repo_name": "repo-scoping",
            "domain": "capabilities",
            "commit": "abc123",
            "path": "/home/worsch/repo-scoping",
        },
        "scan": {
            "run_id": "scan:repo-scoping:abc123",
            "profile": "deterministic",
            "deterministic_only": True,
            "llm_enabled": False,
            "started_at": "2026-05-19T00:00:00Z",
            "completed_at": "2026-05-19T00:00:01Z",
        },
        "replacement_scopes": [
            {
                "id": scope_id,
                "extractor_id": "python-package",
                "source_kind": "package_manifest",
                "source_path": "pyproject.toml",
                "mode": "replacement",
            }
        ],
        "candidates": {
            "nodes": [
                {
                    "stable_key": service_key,
                    "kind": "ServiceDeclaration",
                    "label": "Scope Generator",
                    "repo": "repo-scoping",
                    "domain": "capabilities",
                    "lifecycle": "active",
                    "aliases": ["repo-scoping", "scope-generator"],
                    "attributes": {"language": "python"},
                    "origin": "deterministic",
                    "review_state": "candidate",
                    "status": "active",
                    "confidence": 0.85,
                    "replacement_scope": scope_id,
                    "provenance": [provenance],
                    "source_anchors": [anchor],
                },
                {
                    "stable_key": capability_key,
                    "kind": "CapabilityDeclaration",
                    "label": "Scope Generation",
                    "repo": "repo-scoping",
                    "domain": "capabilities",
                    "aliases": ["scope-generation"],
                    "attributes": {"capability_type": "scope-generation"},
                    "origin": "llm",
                    "review_state": "needs_review",
                    "status": "active",
                    "confidence": 0.62,
                    "replacement_scope": scope_id,
                    "provenance": [
                        {
                            "extractor_id": "readme-llm",
                            "method": "llm",
                            "origin": "llm",
                            "prompt_version": "repo-summary-v1",
                            "provider": "mock",
                            "model": "mock",
                            "usage": {"total_tokens": 0},
                            "rationale": "Fixture output for schema coverage.",
                        }
                    ],
                    "source_anchors": [anchor],
                },
            ],
            "edges": [
                {
                    "stable_key": edge_key,
                    "edge_type": "provides",
                    "source_key": service_key,
                    "target_key": capability_key,
                    "origin": "deterministic",
                    "review_state": "candidate",
                    "status": "active",
                    "confidence": 0.8,
                    "replacement_scope": scope_id,
                    "provenance": [provenance],
                    "source_anchors": [anchor],
                }
            ],
            "attributes": [
                {
                    "stable_key": attribute_stable_key(service_key, "language"),
                    "entity_key": service_key,
                    "name": "language",
                    "value": "python",
                    "origin": "deterministic",
                    "review_state": "candidate",
                    "confidence": 0.95,
                    "replacement_scope": scope_id,
                    "provenance": [provenance],
                    "source_anchors": [anchor],
                }
            ],
        },
        "tombstones": [
            {
                "stable_key": discovery_stable_key("repo-scoping", "ServiceDeclaration", "Old Scope Tool"),
                "entity_kind": "node",
                "replacement_scope": scope_id,
                "retired_at": "2026-05-19T00:00:01Z",
                "reason": "source_missing",
            }
        ],
        "reconciliation": {
            "precedence": ["repo_declaration", "deterministic", "catalog", "registry", "llm", "manual"],
            "duplicate_policy": "stable-key matches merge automatically; alias-only matches require review",
            "retirement_policy": "missing candidates retire only inside their replacement scope",
        },
    }

    _validate_schema("discovery-snapshot.schema.yaml", payload)


def test_discovery_snapshot_schema_rejects_unscoped_tombstone() -> None:
    payload = {
        "apiVersion": "railiance.fabric/v1alpha1",
        "kind": "FabricDiscoverySnapshot",
        "source": {"repo_slug": "repo-scoping", "commit": "abc123"},
        "scan": {
            "run_id": "scan:repo-scoping:abc123",
            "profile": "deterministic",
            "deterministic_only": True,
            "llm_enabled": False,
        },
        "replacement_scopes": [],
        "candidates": {"nodes": [], "edges": [], "attributes": []},
        "tombstones": [
            {
                "stable_key": "discovery:repo-scoping:service:old",
                "entity_kind": "node",
                "retired_at": "2026-05-19T00:00:01Z",
                "reason": "source_missing",
            }
        ],
        "reconciliation": {
            "precedence": ["repo_declaration", "deterministic", "catalog", "registry", "llm", "manual"],
            "duplicate_policy": "stable-key matches merge automatically",
            "retirement_policy": "missing candidates retire only inside their replacement scope",
        },
    }

    validator = draft202012_validator(Path("schemas") / "discovery-snapshot.schema.yaml")
    with pytest.raises(jsonschema.ValidationError):
        validator.validate(payload)


def _validate_schema(schema_name: str, payload: dict[str, object]) -> None:
    validator = draft202012_validator(Path("schemas") / schema_name)
    validator.validate(payload)
@@ -4,7 +4,7 @@ type: workplan
 title: "Repo Reality Scanner"
 domain: railiance
 repo: railiance-fabric
-status: proposed
+status: active
 owner: codex
 topic_slug: railiance
 planning_priority: high
@@ -119,7 +119,7 @@ On rescan:

 ```task
 id: RAIL-FAB-WP-0010-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "d77423fa-a47f-4246-86bd-ea1ca2d17bc4"
 ```