Self-Scoping Assessment Workflow
Self-scoping is the feedback loop for improving repo-scoping with evidence. The loop is simple: run the current engine against repo-scoping itself, compare the result to a curated golden profile and known bad runs, then record whether the new result is better.
Outcome Terms
- baseline: a result accepted as a reference point for later comparisons.
- challenger: a fresh result from a new engine version or configuration.
- preferred: the reviewer chose this result over the prior baseline.
- tied: the reviewer judged old and new results roughly equivalent.
- rejected: the result is known bad and should not become baseline truth.
- superseded: the result used to be useful but was replaced by a newer preferred assessment.
- needs-human: the result cannot be judged confidently without curator review.
The known 2026-05-15 run 39 artifact is a rejected negative regression seed,
not a baseline to imitate.
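One way to read these terms is as the states an artifact moves through after a review. The sketch below is a minimal model under stated assumptions: the `Outcome` enum and `apply_review` helper are hypothetical (not repo-scoping's API), and it assumes that only a `preferred` decision replaces the active baseline, while `tied`, `rejected`, and `needs-human` leave the current baseline in place.

```python
from enum import Enum


class Outcome(Enum):
    """Outcome labels as named in the Outcome Terms list."""
    BASELINE = "baseline"
    CHALLENGER = "challenger"
    PREFERRED = "preferred"
    TIED = "tied"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"
    NEEDS_HUMAN = "needs-human"


def apply_review(baseline_id: str, challenger_id: str, decision: Outcome):
    """Return (active baseline id, outcome labels) after a reviewer decision.

    Hypothetical helper: assumes only PREFERRED moves the baseline.
    """
    if decision is Outcome.PREFERRED:
        # The challenger becomes the new reference; the old baseline is superseded.
        return challenger_id, {
            baseline_id: Outcome.SUPERSEDED,
            challenger_id: Outcome.PREFERRED,
        }
    if decision in (Outcome.TIED, Outcome.REJECTED, Outcome.NEEDS_HUMAN):
        # Anything short of "preferred" keeps the current baseline active.
        return baseline_id, {challenger_id: decision}
    raise ValueError(f"{decision.value!r} is not a reviewer decision")
```

Under this model, recording the run 39 artifact as `rejected` never changes which artifact is active; it only attaches a label to it.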
Release Binding
Assessment output is only useful if it is bound to the engine that generated it. Comparable challenger artifacts should record:
- repo-scoping package version
- engine git commit
- engine release or tag when available
- engine dirty state
- scanner version
- candidate generator version
- quality criteria version
- prompt version when LLM or agentic review is used
An artifact with release_binding_status=complete can be compared as a real
challenger. An artifact marked historical_incomplete can still be useful as a
negative seed, but it should not become a preferred baseline. An unbound
artifact is diagnostic only.
Dirty state does not automatically make an artifact useless, but it must be visible. A dirty challenger should usually be rerun after the relevant changes are committed.
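The binding rules above can be sketched as a small classifier. This is an illustration, not the engine's implementation: the `ReleaseBinding` field names are hypothetical, and the rule that any partially bound artifact counts as `historical_incomplete` while a missing binding counts as `unbound` is an assumption. Engine release/tag is optional ("when available"), and the prompt version is required only when LLM or agentic review was used. Dirty state is recorded but does not lower the status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReleaseBinding:
    # Hypothetical field names mirroring the Release Binding list.
    package_version: Optional[str] = None
    engine_commit: Optional[str] = None
    engine_dirty: Optional[bool] = None
    scanner_version: Optional[str] = None
    candidate_generator_version: Optional[str] = None
    quality_criteria_version: Optional[str] = None
    engine_release: Optional[str] = None  # recorded "when available", so optional
    prompt_version: Optional[str] = None  # required only for LLM/agentic review
    uses_llm_review: bool = False


def binding_status(binding: Optional[ReleaseBinding]) -> str:
    """Classify an artifact's binding (assumed rule, not the engine's own logic)."""
    if binding is None:
        return "unbound"
    required = [
        binding.package_version,
        binding.engine_commit,
        binding.engine_dirty,  # must be visible, even when False
        binding.scanner_version,
        binding.candidate_generator_version,
        binding.quality_criteria_version,
    ]
    if binding.uses_llm_review:
        required.append(binding.prompt_version)
    if all(value is not None for value in required):
        return "complete"
    return "historical_incomplete"


def can_become_preferred_baseline(binding: Optional[ReleaseBinding]) -> bool:
    # Only fully bound artifacts are real challengers.
    return binding_status(binding) == "complete"


def should_rerun(binding: Optional[ReleaseBinding]) -> bool:
    # A dirty challenger is usually rerun after the changes are committed.
    return binding is not None and binding.engine_dirty is True
```

Note that a dirty but otherwise complete artifact still classifies as `complete`; `should_rerun` is kept separate so the dirty flag stays visible without silently discarding the result.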
Standard Loop
1. Run the self-assessment command:

   repo-scoping self-assess \
     --source-path . \
     --assessment-output docs/self-scoping/assessments/repo-scoping-challenger.json \
     --comparison-output docs/self-scoping/assessments/repo-scoping-challenger.md

2. Read the comparison report.
   - If the report says regression, inspect forbidden capabilities, misplaced features, and known regression patterns first.
   - If the report says needs_review, inspect missing expected capabilities and source evidence before choosing old or new output.
   - If the report says candidate_improvement, still confirm that the hierarchy, source refs, and native-utility boundaries make sense.
3. Record the decision as an assessment outcome before changing the active baseline.
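The triage in the loop above is essentially a lookup from report verdict to the checks to read first. A minimal sketch, assuming the three verdict strings shown in the report; `FIRST_CHECKS` and `review_plan` are hypothetical names used only for illustration:

```python
# Hypothetical summary of the Standard Loop's triage step: which parts of the
# comparison report to inspect first for each verdict.
FIRST_CHECKS = {
    "regression": [
        "forbidden capabilities",
        "misplaced features",
        "known regression patterns",
    ],
    "needs_review": [
        "missing expected capabilities",
        "source evidence",
    ],
    "candidate_improvement": [
        "capability hierarchy",
        "source refs",
        "native-utility boundaries",
    ],
}


def review_plan(verdict: str) -> list[str]:
    """Return the checks to perform, ending with recording the outcome."""
    if verdict not in FIRST_CHECKS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    # Every verdict ends the same way: record an outcome before touching the baseline.
    return FIRST_CHECKS[verdict] + ["record assessment outcome"]
```

Even `candidate_improvement` produces a non-empty plan, which matches the instruction to confirm the result rather than accept it automatically.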
CI Use
Use --fail-on-regression only when regressions should block the command:
repo-scoping self-assess \
--source-path . \
--comparison-output /tmp/repo-scoping-self-assessment.md \
--fail-on-regression
The command should not fail for ordinary needs_review results. Review-needed
output is signal, not a broken build.
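The CI policy reduces to one rule: only a regression, and only with --fail-on-regression set, should fail the command. The sketch below models that rule in isolation; `exit_status` is a hypothetical helper, and mapping "fail" to exit status 1 is an assumption about conventional CLI behavior, not documented repo-scoping behavior.

```python
def exit_status(verdict: str, fail_on_regression: bool) -> int:
    """Hypothetical model of the CI gate described above.

    Assumes a failing run is reported as exit status 1.
    """
    if verdict == "regression" and fail_on_regression:
        return 1
    # needs_review and candidate_improvement are signals, not broken builds.
    return 0
```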
Updating The Golden Profile
Update golden/repo-scoping-golden-profile.v1.json when the repository's real
product utility has changed. Examples:
- repo-scoping adds a genuinely new user-facing capability.
- a capability is renamed after curator agreement.
- a former out-of-scope behavior becomes product intent and has supporting implementation evidence.
Do not update the golden profile just because the engine failed to find an expected capability. That is usually an engine issue.
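The distinction between a product change and an engine miss can be expressed as a small decision function. This is a sketch under assumptions: the `ProfileChange` fields and the `target_of_fix` return values are hypothetical, and a request that matches none of the listed reasons is treated as "no change" rather than a profile edit.

```python
from dataclasses import dataclass


@dataclass
class ProfileChange:
    # Hypothetical flags describing why someone wants to edit the golden profile.
    new_user_facing_capability: bool = False
    rename_agreed_by_curator: bool = False
    out_of_scope_now_intended: bool = False
    has_implementation_evidence: bool = False
    engine_missed_expected_capability: bool = False


def target_of_fix(change: ProfileChange) -> str:
    """Return 'golden_profile', 'engine', or 'no change' per the guidance above."""
    if change.new_user_facing_capability or change.rename_agreed_by_curator:
        return "golden_profile"
    if change.out_of_scope_now_intended and change.has_implementation_evidence:
        return "golden_profile"
    # A missed expected capability is usually an engine issue, not a profile edit.
    if change.engine_missed_expected_capability:
        return "engine"
    return "no change"
```

An out-of-scope behavior that becomes intent but lacks implementation evidence falls through to "no change" here, since the guidance requires both conditions.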
Fixing The Engine
Fix the engine when a challenger:
- repeats a known regression pattern
- promotes dependency, fixture, schema, scanner-rule, or workplan vocabulary as native capability truth
- places features under a capability they do not support
- loses source refs or cites evidence that does not support the abstraction
- relies on generated SCOPE.md as primary proof for rebuilding the same model
The 2026-05-15 run 39 failure is the canonical example: provider vocabulary from
scanner code, tests, fixtures, and schema examples became the false native
capability Route LLM Requests Across Providers. The correct action is to fix
scanner/generator/acceptance behavior, not to teach the golden profile that
repo-scoping is an LLM router.
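Some of these failure patterns can be caught with a cheap evidence check before review. The sketch below is hypothetical: the directory names in `NON_NATIVE_DIRS` are assumed stand-ins for tests, fixtures, schema examples, scanner rules, and workplans (they are not taken from repo-scoping's layout), and it assumes the first source ref is the primary proof. Dependency vocabulary would need a separate check against dependency manifests.

```python
from pathlib import PurePosixPath

# Hypothetical directory names whose contents describe test data, schemas,
# scanner rules, or plans rather than repo-scoping's own product utility.
NON_NATIVE_DIRS = {"tests", "fixtures", "schemas", "rules", "workplans"}


def is_non_native(ref: str) -> bool:
    # A ref is non-native if any path component is one of the markers above.
    return any(part in NON_NATIVE_DIRS for part in PurePosixPath(ref).parts)


def evidence_flags(source_refs: list[str]) -> list[str]:
    """Flag a candidate capability whose evidence matches the patterns above."""
    flags = []
    if not source_refs:
        flags.append("missing source refs")
        return flags
    if all(is_non_native(ref) for ref in source_refs):
        flags.append("only non-native evidence")
    # Assumption: the first ref is treated as the primary proof.
    if PurePosixPath(source_refs[0]).name == "SCOPE.md":
        flags.append("generated SCOPE.md cited as primary proof")
    return flags
```

For a run-39-style candidate backed only by refs such as `src/repo_scoping/scanner/rules/providers.py`, `tests/test_providers.py`, and `fixtures/llm_repo/README.md` (illustrative paths), the check returns `["only non-native evidence"]`, which points at the scanner and generator rather than at the golden profile.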
Relationship To Agentic Acceptance
Deterministic assessment can reject, downgrade, or flag output with transparent
criteria. It should not approve candidate characteristics as registry truth.
When automation stands in for human review, the decision belongs to an agentic
reviewer that inspects evidence, applies versioned criteria, and records a
rationale. That acceptance redesign is tracked in RREG-WP-0014.
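The split of responsibilities can be made explicit in types. In this hypothetical sketch (none of these names come from repo-scoping or RREG-WP-0014), deterministic checks can only choose among narrowing actions, with deliberately no approve option, and an acceptance record cannot be created without a criteria version and a rationale.

```python
from dataclasses import dataclass
from enum import Enum


class DeterministicAction(Enum):
    # Deterministic checks may only narrow or flag; there is deliberately no APPROVE.
    REJECT = "reject"
    DOWNGRADE = "downgrade"
    FLAG = "flag"
    DEFER = "defer"  # hypothetical: hand off to a reviewer without a verdict


@dataclass(frozen=True)
class AgenticDecision:
    # Hypothetical record an agentic reviewer would write.
    accepted: bool
    criteria_version: str
    rationale: str
    evidence_refs: tuple[str, ...]

    def __post_init__(self):
        # Acceptance without versioned criteria or a rationale is not recordable.
        if not self.criteria_version or not self.rationale.strip():
            raise ValueError("acceptance needs a criteria version and a rationale")
```

Keeping approval out of `DeterministicAction` means a deterministic pipeline cannot express "accept as registry truth" even by mistake; that path only exists through `AgenticDecision`.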