Capture clean self-assessment regression signal

2026-05-15 17:15:35 +02:00
parent abcb2cebbc
commit 458eb410c4
6 changed files with 2958 additions and 3 deletions
--- a/docs/self-scoping/README.md
+++ b/docs/self-scoping/README.md
@@ -11,6 +11,12 @@ instead of relying on memory or screenshots.
 - `assessments/repo-scoping-known-bad-2026-05-15-run-39.json` captures the
  known-bad self-analysis that promoted LLM-provider vocabulary into native
  repo-scoping capability truth.
+- `assessments/repo-scoping-post-wp0015-clean-2026-05-15.json` captures the
+  first clean, release-bound deterministic challenger after acceptance-boundary
+  and input-hygiene work. It remains a rejected regression because candidate
+  generation still collapses repo-scoping's native surfaces under the forbidden
+  provider-routing capability, but its source set no longer includes
+  `var/checkouts/` contamination.
 - `workflow.md` explains how to run challenger assessments, interpret outcomes,
  and decide whether to update the golden profile or fix the engine.
 - `outcomes/` stores append-only reviewer decisions created from side-by-side
--- a/docs/self-scoping/assessments/repo-scoping-post-wp0015-clean-2026-05-15.json
+++ b/docs/self-scoping/assessments/repo-scoping-post-wp0015-clean-2026-05-15.json
--- a/docs/self-scoping/assessments/repo-scoping-post-wp0015-clean-2026-05-15.md
+++ b/docs/self-scoping/assessments/repo-scoping-post-wp0015-clean-2026-05-15.md
@@ -0,0 +1,36 @@
+# Self-Scoping Comparison: repo-scoping-challenger-run-1
+
+- Status: `regression`
+- Golden profile: `repo-scoping-golden-profile-v1`
+- Target repo: `repo-scoping`
+- Summary: Assessment repeats known or forbidden self-scoping patterns; prefer the golden profile until the engine is corrected.
+
+## Missing Expected Capabilities
+- Explore Dependency And Impact Graphs
+- Generate And Maintain SCOPE.md
+- Generate Reviewable Candidate Characteristics
+- Index Source Content With Provenance
+- Provide Scope Context To Downstream Agents
+- Register And Track Repositories
+- Review And Approve Candidate Characteristics
+- Scan Repositories Into Observed Facts
+- Search Compare And Export Approved Profiles
+
+## Forbidden Native Capabilities Present
+- Route LLM Requests Across Providers
+
+## Known Regression Patterns
+- `RREG-SELF-REG-001` LLM provider vocabulary promoted as native capability: Generated tree contains Route LLM Requests Across Providers as a repo-scoping capability.
+- `RREG-SELF-REG-002` Native API and CLI surfaces attached under false capability: API or CLI surface features are nested below provider routing.
+
+## Misplaced Features
+- `HTTP API surface: possible API surface, GET /health, @app.get(, +49 more` under `Route LLM Requests Across Providers` (API): API/CLI surface is nested below provider-routing capability.
+- `CLI command surface: CLI command build_parser, CLI command make_service` under `Route LLM Requests Across Providers` (CLI): API/CLI surface is nested below provider-routing capability.
+
+## Matched Expected Capabilities
+- None
+
+## Review Hints
+- Do not promote this assessment as a preferred baseline.
+- Inspect forbidden capabilities and misplaced features first.
+- Use the findings as signal for scanner, generator, or acceptance-policy changes.
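Review note: the sections of the comparison report above (missing, forbidden, matched) can be read as set operations over capability names. The sketch below is an illustration only, not the engine's comparison code. The `generated_tree` shape is taken from the test in this commit. The `Comparison` type, the `compare` function, and the `partial` and `match` status labels are hypothetical, and the expected and forbidden names are passed in as plain sets because the golden profile's field names are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Hypothetical container mirroring the report's sections."""

    status: str
    missing: list[str]
    forbidden_present: list[str]
    matched: list[str]


def capability_names(artifact: dict) -> set[str]:
    """Collect capability names from the generated tree (shape as used in the test)."""
    return {
        capability["name"]
        for ability in artifact["generated_tree"]["abilities"]
        for capability in ability["capabilities"]
    }


def compare(artifact: dict, expected: set[str], forbidden: set[str]) -> Comparison:
    """Classify a challenger against expected and forbidden capability names."""
    generated = capability_names(artifact)
    forbidden_present = sorted(generated & forbidden)
    matched = sorted(generated & expected)
    missing = sorted(expected - generated)
    # Assumption: any forbidden capability is enough to mark the run a regression.
    if forbidden_present:
        status = "regression"
    elif missing:
        status = "partial"  # hypothetical label, not taken from the report
    else:
        status = "match"  # hypothetical label, not taken from the report
    return Comparison(status, missing, forbidden_present, matched)


# Minimal artifact shaped like the post-WP0015 challenger's generated tree.
challenger = {
    "generated_tree": {
        "abilities": [
            {"capabilities": [{"name": "Route LLM Requests Across Providers"}]}
        ]
    }
}
result = compare(
    challenger,
    expected={
        "Register And Track Repositories",
        "Scan Repositories Into Observed Facts",
    },
    forbidden={"Route LLM Requests Across Providers"},
)
```

With these inputs `result.status` is `regression`, both expected names are reported missing, and nothing is matched, which is the same shape as the report above.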
--- a/tests/test_self_scoping_artifacts.py
+++ b/tests/test_self_scoping_artifacts.py
@@ -11,6 +11,13 @@ KNOWN_BAD_PATH = (
    / "assessments"
    / "repo-scoping-known-bad-2026-05-15-run-39.json"
 )
+POST_WP0015_PATH = (
+    ROOT
+    / "docs"
+    / "self-scoping"
+    / "assessments"
+    / "repo-scoping-post-wp0015-clean-2026-05-15.json"
+)
 GOLDEN_PROFILE_PATH = (
    ROOT
    / "docs"
@@ -90,6 +97,26 @@ def test_known_bad_self_scoping_artifact_captures_rejected_regression_seed():
    assert artifact["quality_gate_outcomes"] == []


+def test_post_wp0015_self_scoping_artifact_is_cleanly_bound_and_unapproved():
+    artifact = load_json(POST_WP0015_PATH)
+
+    paths = artifact["content_chunk_summary"]["paths"]
+    capability_names = {
+        capability["name"]
+        for ability in artifact["generated_tree"]["abilities"]
+        for capability in ability["capabilities"]
+    }
+    criteria = {outcome["criterion_id"] for outcome in artifact["quality_gate_outcomes"]}
+
+    assert artifact["engine_identity"]["release_binding_status"] == "complete"
+    assert artifact["engine_identity"]["engine_dirty_state"] == "clean"
+    assert artifact["execution"]["mode"] == "deterministic-only"
+    assert not any(path.startswith("var/checkouts/") for path in paths)
+    assert artifact["approved_map"]["abilities"] == []
+    assert capability_names == {"Route LLM Requests Across Providers"}
+    assert criteria == {"RREG-QC-002", "RREG-QC-003"}
+
+
 def test_golden_profile_names_expected_native_capabilities_and_forbidden_false_positive():
    profile = load_json(GOLDEN_PROFILE_PATH)

--- a/workplans/RREG-WP-0015-self-assessment-input-hygiene.md
+++ b/workplans/RREG-WP-0015-self-assessment-input-hygiene.md
@@ -4,7 +4,7 @@ type: workplan
 title: "Self-Assessment Input Hygiene"
 domain: capabilities
 repo: repo-scoping
-status: active
+status: done
 owner: codex
 topic_slug: foerster-capabilities
 created: "2026-05-15"
@@ -52,7 +52,7 @@ contribute documentation, language, or LLM-provider facts to the parent repo.

 ```task
 id: RREG-WP-0015-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "81bb46e7-01dc-4c14-8a32-1d4d456dc209"
 ```
@@ -67,11 +67,18 @@ Acceptance criteria:
  quality issues from approved registry truth.
 - The artifact/report names make their relationship to WP0014/WP0015 clear.

+Implementation note 2026-05-15: captured
+`docs/self-scoping/assessments/repo-scoping-post-wp0015-clean-2026-05-15.json`
+and the paired Markdown comparison report. The artifact is release-bound to a
+clean engine commit, contains zero `var/checkouts/` paths, leaves the approved
+map empty, and records quality-gate outcomes `RREG-QC-002` and `RREG-QC-003`
+against the remaining provider-routing candidate regression.
+
 ## T03: Triage Remaining Generator Quality Gaps

 ```task
 id: RREG-WP-0015-T03
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "20b6f34e-1d92-407b-84dd-6e3ec7e77eb3"
 ```
@@ -84,3 +91,12 @@ Acceptance criteria:
  and quality-gate outcomes.
 - The next workplan is scoped around generator improvements, not deterministic
  acceptance.
+
+Implementation note 2026-05-15: the clean challenger still generates only
+`Route LLM Requests Across Providers`, misses all curated expected
+repo-scoping capabilities, and misplaces API/CLI surfaces under provider
+routing. The approved map remains empty and quality gates flag the candidate
+with `RREG-QC-002` and `RREG-QC-003`, so the next slice is generator quality.
+Created `RREG-WP-0016 Native Candidate Generation Recovery` to focus on
+separating provider vocabulary from native capability seeds and recovering
+repo-owned candidate families.
--- a/workplans/RREG-WP-0016-native-candidate-generation-recovery.md
+++ b/workplans/RREG-WP-0016-native-candidate-generation-recovery.md
@@ -0,0 +1,95 @@
+---
+id: RREG-WP-0016
+type: workplan
+title: "Native Candidate Generation Recovery"
+domain: capabilities
+repo: repo-scoping
+status: active
+owner: codex
+topic_slug: foerster-capabilities
+created: "2026-05-15"
+updated: "2026-05-15"
+---
+
+# Native Candidate Generation Recovery
+
+WP0014 fixed the acceptance boundary: deterministic rules no longer promote bad
+candidates into approved registry truth. WP0015 fixed the self-assessment input
+set so repo-scoping no longer analyzes runtime checkouts as if they were native
+source.
+
+The clean post-WP0015 challenger still fails the golden profile. It produces one
+candidate capability, `Route LLM Requests Across Providers`, and nests native
+API/CLI surfaces below that false provider-routing capability. Quality gates now
+flag this correctly (`RREG-QC-002`, `RREG-QC-003`) and the approved map remains
+empty, but candidate generation still needs to recover the repo's real native
+capabilities.
+
+This workplan is deliberately about generation quality, not acceptance policy.
+
+## T01: Separate Provider Vocabulary From Native Capability Seeds
+
+```task
+id: RREG-WP-0016-T01
+status: todo
+priority: high
+```
+
+Update deterministic candidate generation so that LLM-provider facts and
+credential configuration cannot create a repository's parent capability on
+their own; they need support from repo-owned product intent or source.
+
+Acceptance criteria:
+- Provider vocabulary can remain evidence or a low-level observed fact.
+- Provider vocabulary does not become the dominant parent capability for
+  repo-scoping's own self-assessment.
+- Existing llm-connect-like fixtures that truly model provider adapters remain
+  explainable through source-role metadata.
+
+## T02: Generate Native Repo-Scoping Candidate Families
+
+```task
+id: RREG-WP-0016-T02
+status: todo
+priority: high
+```
+
+Use intent, docs, source, tests, and schema files to generate repo-scoping's
+expected native candidate families instead of a single provider-routing bucket.
+
+Initial expected families:
+- repository registration and metadata import
+- deterministic repository scanning and observed facts
+- provenance-aware content indexing
+- reviewable candidate characteristic generation
+- candidate review/edit/reject/merge/relink/approval workflow
+- approved profile search, comparison, export, and gap exploration
+- SCOPE.md generation, diffing, validation, and write flows
+- dependency graph and impact exploration
+- scope context API for downstream agents
+
+Acceptance criteria:
+- Clean self-assessment matches at least some of the golden profile's expected
+  capabilities; the post-WP0015 challenger matched none.
+- API and CLI features attach under native workflow candidates, not provider
+  routing.
+- Candidate source refs cite repo-owned docs/source/tests instead of schema
+  examples or dependency vocabulary alone.
+
+## T03: Re-Run Clean Self-Assessment And Compare
+
+```task
+id: RREG-WP-0016-T03
+status: todo
+priority: high
+```
+
+After generator improvements, rerun the clean deterministic self-assessment and
+compare it to the golden profile and the post-WP0015 rejected challenger.
+
+Acceptance criteria:
+- The forbidden provider-routing candidate is absent, or isolated as rejected
+  or requires-review evidence rather than promoted as a native capability.
+- The comparison report shows matched expected capabilities.
+- Remaining gaps are captured as generator follow-up, golden-profile update, or
+  human review notes.
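Review note: one way to picture the T01 rule in RREG-WP-0016 is a filter that lets a seed become a parent capability only when at least one piece of its evidence comes from a repo-owned role. This is a minimal sketch under stated assumptions, not the generator's design. The `Seed` type, the `split_seeds` function, and the role labels (`product-intent`, `owned-source`, `provider-vocabulary`) are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical role labels; the real source-role metadata may use other names.
OWNED_ROLES = frozenset({"product-intent", "owned-source"})


@dataclass(frozen=True)
class Seed:
    """Hypothetical capability seed with the roles of its supporting evidence."""

    name: str
    evidence_roles: frozenset[str]


def split_seeds(seeds: list[Seed]) -> tuple[list[Seed], list[Seed]]:
    """Separate seeds that may become parent capabilities from evidence-only seeds."""
    parents: list[Seed] = []
    evidence_only: list[Seed] = []
    for seed in seeds:
        # A seed needs at least one repo-owned role to be promotable.
        if seed.evidence_roles & OWNED_ROLES:
            parents.append(seed)
        else:
            # Provider vocabulary alone stays available as an observed fact.
            evidence_only.append(seed)
    return parents, evidence_only


seeds = [
    Seed("Route LLM Requests Across Providers", frozenset({"provider-vocabulary"})),
    Seed(
        "Scan Repositories Into Observed Facts",
        frozenset({"owned-source", "provider-vocabulary"}),
    ),
]
parents, evidence_only = split_seeds(seeds)
```

Here the provider-routing seed lands in `evidence_only` while the scanning seed remains promotable, which matches the first two T01 acceptance criteria. A fixture that genuinely models provider adapters would carry an owned role and still pass.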