repo-scoping/docs/self-scoping/workflow.md


Self-Scoping Assessment Workflow

Self-scoping is the feedback loop for improving repo-scoping with evidence. The loop is simple: run the current engine against repo-scoping itself, compare the result to a curated golden profile and known bad runs, then record whether the new result is better.
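
As a rough illustration of the comparison step, the sketch below reduces a challenger to a set of capability names. It checks that set against the golden profile's expected capabilities and against forbidden capabilities learned from known bad runs. GoldenProfile, compare, and the placeholder capability names are hypothetical, and the verdict rules are a deliberate simplification. The real engine's comparison is richer, and its rules may differ. The verdict strings reuse the report verdicts described under Standard Loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenProfile:
    """Hypothetical, simplified view of a golden profile (not the v1 JSON schema)."""

    expected: frozenset[str]   # capabilities the product genuinely provides
    forbidden: frozenset[str]  # false capabilities seen in known bad runs


def compare(challenger: set[str], golden: GoldenProfile) -> str:
    """Return a coarse verdict for a challenger's capability names."""
    if challenger & golden.forbidden:
        # A capability from a known bad run reappeared.
        return "regression"
    if golden.expected - challenger:
        # An expected capability is missing; a reviewer should inspect it.
        return "needs_review"
    return "candidate_improvement"


golden = GoldenProfile(
    expected=frozenset({"Expected Capability A", "Expected Capability B"}),
    forbidden=frozenset({"Route LLM Requests Across Providers"}),
)
print(compare({"Expected Capability A", "Expected Capability B"}, golden))
# prints candidate_improvement
```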

Outcome Terms

  • baseline: a result accepted as a reference point for later comparisons.
  • challenger: a fresh result from a new engine version or configuration.
  • preferred: the reviewer chose this result over the prior baseline.
  • tied: the reviewer judged old and new results roughly equivalent.
  • rejected: the result is known bad and should not become baseline truth.
  • superseded: the result used to be useful but was replaced by a newer preferred assessment.
  • needs-human: the result cannot be judged confidently without curator review.

The 2026-05-15 run 39 artifact is known bad: it is a rejected negative regression seed, not a baseline to imitate.
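
The outcome terms can be modeled as a small enumeration plus a rule for moving the baseline. This minimal sketch rests on two assumptions that the terms above do not state outright. First, only a preferred challenger replaces the baseline. Second, the replaced baseline is then marked superseded. Outcome, apply_review, and the artifact ids are hypothetical names, not repo-scoping's API.

```python
from enum import Enum


class Outcome(Enum):
    BASELINE = "baseline"
    CHALLENGER = "challenger"
    PREFERRED = "preferred"
    TIED = "tied"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"
    NEEDS_HUMAN = "needs-human"


def apply_review(
    statuses: dict[str, Outcome], baseline: str, challenger: str, outcome: Outcome
) -> str:
    """Record a review outcome and return the artifact id that is now the baseline.

    Assumed rule: only PREFERRED promotes the challenger, and the old baseline
    is then marked SUPERSEDED. Any other outcome keeps the current baseline.
    """
    statuses[challenger] = outcome
    if outcome is Outcome.PREFERRED:
        statuses[baseline] = Outcome.SUPERSEDED
        return challenger
    return baseline


statuses = {"artifact-a": Outcome.BASELINE}
current = apply_review(statuses, "artifact-a", "run-39", Outcome.REJECTED)
# current is still "artifact-a"; run-39 is recorded as rejected.
```

Under this rule, recording a negative seed such as run 39 never disturbs the active baseline. A tied result also leaves the existing baseline in place.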

Release Binding

Assessment output is only useful if it is bound to the engine that generated it. Comparable challenger artifacts should record:

  • repo-scoping package version
  • engine git commit
  • engine release or tag when available
  • engine dirty state
  • scanner version
  • candidate generator version
  • quality criteria version
  • prompt version when LLM or agentic review is used

An artifact with release_binding_status=complete can be compared as a real challenger. An artifact with release_binding_status=historical_incomplete can still be useful as a negative seed, but it should not become a preferred baseline. An unbound artifact is diagnostic only.

Dirty state does not automatically make an artifact useless, but it must be visible. A dirty challenger should usually be rerun after the relevant changes are committed.
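
A release binding check might look like the following sketch. The ReleaseBinding field names mirror the list above but are hypothetical, not the actual artifact schema. How missing fields map to historical_incomplete or unbound is also an assumption. Dirty state is surfaced as a note rather than a rejection, matching the rule that it must be visible but is not automatically disqualifying.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReleaseBinding:
    # Hypothetical field names mirroring the list above; None means "not recorded".
    package_version: Optional[str] = None
    engine_commit: Optional[str] = None
    engine_release: Optional[str] = None  # recorded only when available
    engine_dirty: Optional[bool] = None
    scanner_version: Optional[str] = None
    candidate_generator_version: Optional[str] = None
    quality_criteria_version: Optional[str] = None
    prompt_version: Optional[str] = None  # needed only for LLM or agentic review


REQUIRED = (
    "package_version",
    "engine_commit",
    "engine_dirty",
    "scanner_version",
    "candidate_generator_version",
    "quality_criteria_version",
)


def binding_status(b: ReleaseBinding, uses_llm_review: bool) -> str:
    """Assumed mapping: all required -> complete, some -> historical_incomplete, none -> unbound."""
    required = REQUIRED + (("prompt_version",) if uses_llm_review else ())
    recorded = [name for name in required if getattr(b, name) is not None]
    if len(recorded) == len(required):
        return "complete"
    return "historical_incomplete" if recorded else "unbound"


def review_notes(b: ReleaseBinding, uses_llm_review: bool = False) -> list[str]:
    """Make binding problems visible instead of silently accepting the artifact."""
    notes = []
    status = binding_status(b, uses_llm_review)
    if status == "historical_incomplete":
        notes.append("negative seed only; must not become a preferred baseline")
    elif status == "unbound":
        notes.append("diagnostic only")
    if b.engine_dirty:
        notes.append("engine was dirty; rerun after committing the relevant changes")
    return notes
```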

Standard Loop

  1. Run the self-assessment command:

    repo-scoping self-assess \
      --source-path . \
      --assessment-output docs/self-scoping/assessments/repo-scoping-challenger.json \
      --comparison-output docs/self-scoping/assessments/repo-scoping-challenger.md
    
  2. Read the comparison report.

  3. If the report says regression, inspect forbidden capabilities, misplaced features, and known regression patterns first.

  4. If the report says needs_review, inspect missing expected capabilities and source evidence before choosing old or new output.

  5. If the report says candidate_improvement, still confirm that the hierarchy, source refs, and native-utility boundaries make sense.

  6. Record the decision as an assessment outcome (for example preferred, tied, rejected, or needs-human) before changing the active baseline.
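
Steps 3 through 6 amount to a lookup from report verdict to what to inspect first, followed by a recording step that always applies. The hypothetical review_checklist helper below encodes that lookup. It assumes the report exposes its verdict as one of the three strings named in the steps.

```python
# Hypothetical mapping from report verdict to what to inspect first (steps 3-5).
REVIEW_FOCUS = {
    "regression": [
        "forbidden capabilities",
        "misplaced features",
        "known regression patterns",
    ],
    "needs_review": [
        "missing expected capabilities",
        "source evidence",
    ],
    "candidate_improvement": [
        "capability hierarchy",
        "source refs",
        "native-utility boundaries",
    ],
}


def review_checklist(verdict: str) -> list[str]:
    """Return inspection steps for a verdict, ending with the recording step."""
    try:
        focus = REVIEW_FOCUS[verdict]
    except KeyError:
        raise ValueError(f"unknown verdict: {verdict!r}") from None
    # Step 6 applies to every verdict: record the outcome before touching the baseline.
    return [f"inspect {item}" for item in focus] + [
        "record assessment outcome before changing the active baseline"
    ]
```

Even a candidate_improvement produces a non-empty checklist, which reflects the instruction to confirm improvements rather than accept them on sight.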

CI Use

Use --fail-on-regression only when a regression should make the command fail, for example to block a CI job:

    repo-scoping self-assess \
      --source-path . \
      --comparison-output /tmp/repo-scoping-self-assessment.md \
      --fail-on-regression

The command should not fail for ordinary needs_review results. Review-needed output is signal, not a broken build.
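
The intended exit behavior can be stated as a tiny pure function. This is an assumed model of the semantics described above, not the CLI's code. In particular, the non-zero value 1 is a placeholder; the actual exit code is not documented here.

```python
def exit_code(verdict: str, fail_on_regression: bool) -> int:
    """Assumed CI semantics: only a regression, and only with the flag set, fails.

    needs_review and candidate_improvement always exit 0, because review-needed
    output is a signal for a human, not a broken build.
    """
    if verdict == "regression" and fail_on_regression:
        return 1  # placeholder non-zero code
    return 0
```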

Updating The Golden Profile

Update golden/repo-scoping-golden-profile.v1.json when the repository's real product utility has changed. Examples:

  • repo-scoping adds a genuinely new user-facing capability.
  • a capability is renamed after curator agreement.
  • a former out-of-scope behavior becomes product intent and has supporting implementation evidence.

Do not update the golden profile just because the engine failed to find an expected capability. That is usually an engine issue.
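
The rule above separates product changes from engine misses. A hypothetical classifier makes the split explicit. The ChangeReason values paraphrase the examples in this section, and the "wait for evidence" result for intent without supporting implementation is an assumption.

```python
from enum import Enum, auto


class ChangeReason(Enum):
    # Hypothetical labels paraphrasing the examples in this section.
    NEW_USER_FACING_CAPABILITY = auto()
    CURATOR_AGREED_RENAME = auto()
    OUT_OF_SCOPE_BECAME_INTENT = auto()
    ENGINE_MISSED_EXPECTED_CAPABILITY = auto()


def action_for(reason: ChangeReason, has_implementation_evidence: bool = False) -> str:
    if reason is ChangeReason.ENGINE_MISSED_EXPECTED_CAPABILITY:
        # The product did not change; the engine failed to find it.
        return "fix engine"
    if reason is ChangeReason.OUT_OF_SCOPE_BECAME_INTENT and not has_implementation_evidence:
        # Intent alone is not enough; wait for supporting implementation evidence.
        return "wait for evidence"
    return "update golden profile"
```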

Fixing The Engine

Fix the engine when a challenger:

  • repeats a known regression pattern
  • promotes dependency, fixture, schema, scanner-rule, or workplan vocabulary as native capability truth
  • places features under a capability they do not support
  • loses source refs or cites evidence that does not support the abstraction
  • relies on generated SCOPE.md as primary proof for rebuilding the same model

The 2026-05-15 run 39 failure is the canonical example: provider vocabulary from scanner code, tests, fixtures, and schema examples became the false native capability "Route LLM Requests Across Providers". The correct action is to fix scanner, generator, and acceptance behavior, not to teach the golden profile that repo-scoping is an LLM router.
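
Several of these triggers can be checked mechanically. The sketch below covers four of them: known regression names, missing source refs, evidence drawn only from non-native vocabulary, and SCOPE.md used as primary proof. It then applies them to a capability shaped like the run 39 failure. SourceRef, Capability, the evidence kind labels, and the file paths are all hypothetical. The sketch also assumes a capability's first source ref is its primary proof. Misplaced features are not covered.

```python
from dataclasses import dataclass, field

# Hypothetical evidence kinds treated as non-native vocabulary. "test" is
# included because run 39 also drew provider vocabulary from tests.
NON_NATIVE_KINDS = {"dependency", "fixture", "schema", "scanner-rule", "workplan", "test"}


@dataclass
class SourceRef:
    path: str
    kind: str  # e.g. "product-code", "fixture", "schema"


@dataclass
class Capability:
    name: str
    source_refs: list[SourceRef] = field(default_factory=list)


def engine_fix_reasons(cap: Capability, known_bad_names: set[str]) -> list[str]:
    """Return reasons the engine (not the golden profile) should be fixed."""
    reasons = []
    if cap.name in known_bad_names:
        reasons.append("repeats a known regression pattern")
    if not cap.source_refs:
        reasons.append("has no source refs")
    elif all(ref.kind in NON_NATIVE_KINDS for ref in cap.source_refs):
        reasons.append("supported only by non-native vocabulary")
    # Assumes the first source ref is the primary proof.
    if cap.source_refs and cap.source_refs[0].path.endswith("SCOPE.md"):
        reasons.append("uses generated SCOPE.md as primary proof")
    return reasons


run39_like = Capability(
    "Route LLM Requests Across Providers",
    [
        SourceRef("scanner/providers.py", "scanner-rule"),
        SourceRef("tests/test_providers.py", "test"),
        SourceRef("fixtures/providers.json", "fixture"),
        SourceRef("schema/example.json", "schema"),
    ],
)
print(engine_fix_reasons(run39_like, {"Route LLM Requests Across Providers"}))
# prints ['repeats a known regression pattern', 'supported only by non-native vocabulary']
```

Every reason returned here points at the engine. None of them is resolved by editing the golden profile.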

Relationship To Agentic Acceptance

Deterministic assessment can reject, downgrade, or flag output with transparent criteria. It should not approve candidate characteristics as registry truth. When automation stands in for human review, the decision belongs to an agentic reviewer that inspects evidence, applies versioned criteria, and records a rationale. That acceptance redesign is tracked in RREG-WP-0014.
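
One way to keep this boundary enforceable is to make approval unrepresentable in the deterministic path. In this sketch, deterministic_action can only return reject, downgrade, flag, or defer. Acceptance has to go through an AgenticDecision record that carries a criteria version and a rationale. Both names, the count-based inputs, and the priority order are hypothetical. The actual redesign is whatever RREG-WP-0014 specifies.

```python
from dataclasses import dataclass

# Deterministic checks can narrow or flag a result but never approve it.
DETERMINISTIC_ACTIONS = ("reject", "downgrade", "flag", "defer")


def deterministic_action(forbidden_hits: int, weak_evidence: int, missing_expected: int) -> str:
    """Hypothetical priority order; "approve" is not a possible return value."""
    if forbidden_hits:
        return "reject"
    if weak_evidence:
        return "downgrade"
    if missing_expected:
        return "flag"
    return "defer"  # nothing to object to, but approval belongs to the reviewer


@dataclass(frozen=True)
class AgenticDecision:
    """Hypothetical acceptance record written by an agentic reviewer."""

    accepted: bool
    criteria_version: str           # versioned criteria that were applied
    rationale: str                  # why the reviewer decided this way
    evidence_refs: tuple[str, ...]  # evidence the reviewer inspected

    def __post_init__(self) -> None:
        if not self.criteria_version:
            raise ValueError("criteria_version is required")
        if not self.rationale.strip():
            raise ValueError("a rationale is required")
```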