Capture native self-assessment improvement

2026-05-15 18:34:00 +02:00
parent 2c3dad80d6
commit 83c39a7aa6
5 changed files with 9193 additions and 2 deletions

@@ -17,6 +17,11 @@ instead of relying on memory or screenshots.
generation still collapses repo-scoping's native surfaces under the forbidden
provider-routing capability, but its source set no longer includes
`var/checkouts/` contamination.
- `assessments/repo-scoping-post-wp0016-native-2026-05-15.json` captures the
first deterministic challenger after native candidate generation recovery. It
matches every expected capability in the golden profile and has no known
provider-routing regression, while still leaving generated candidates pending
review with quality-gate signals.
- `workflow.md` explains how to run challenger assessments, interpret outcomes,
and decide whether to update the golden profile or fix the engine.
- `outcomes/` stores append-only reviewer decisions created from side-by-side

File diff suppressed because it is too large.

@@ -0,0 +1,33 @@
# Self-Scoping Comparison: repo-scoping-challenger-run-1
- Status: `candidate_improvement`
- Golden profile: `repo-scoping-golden-profile-v1`
- Target repo: `repo-scoping`
- Summary: Assessment covers the golden profile without known regression patterns.
## Missing Expected Capabilities
- None
## Forbidden Native Capabilities Present
- None
## Known Regression Patterns
- None
## Misplaced Features
- None
## Matched Expected Capabilities
- Explore Dependency And Impact Graphs
- Generate And Maintain SCOPE.md
- Generate Reviewable Candidate Characteristics
- Index Source Content With Provenance
- Provide Scope Context To Downstream Agents
- Register And Track Repositories
- Review And Approve Candidate Characteristics
- Scan Repositories Into Observed Facts
- Search Compare And Export Approved Profiles
## Review Hints
- Candidate appears better than the known golden checks.
- Human or agentic review should still confirm source evidence quality.
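
The report sections above read like the result of plain set comparisons between the golden profile and the generated capabilities. The following is a minimal sketch of that idea, not the engine's actual implementation: `ComparisonResult`, `compare_to_golden`, the `needs_review` fallback status, and the status rule itself are all hypothetical names and assumptions. Only `candidate_improvement` appears in the commit.

```python
from dataclasses import dataclass, field


@dataclass
class ComparisonResult:
    """Hypothetical container mirroring the report sections above."""

    matched: list = field(default_factory=list)
    missing: list = field(default_factory=list)
    forbidden_present: list = field(default_factory=list)
    status: str = ""


def compare_to_golden(generated, expected, forbidden):
    """Compare generated capability names against a golden profile."""
    generated, expected, forbidden = set(generated), set(expected), set(forbidden)
    result = ComparisonResult(
        matched=sorted(generated & expected),
        missing=sorted(expected - generated),
        forbidden_present=sorted(generated & forbidden),
    )
    # Assumed rule: nothing missing and nothing forbidden counts as an
    # improvement candidate; anything else falls back to a placeholder status.
    clean = not result.missing and not result.forbidden_present
    result.status = "candidate_improvement" if clean else "needs_review"
    return result
```

Sorting the set results keeps the report order stable between runs, which matches the alphabetical listing under "Matched Expected Capabilities".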

@@ -18,6 +18,13 @@ POST_WP0015_PATH = (
    / "assessments"
    / "repo-scoping-post-wp0015-clean-2026-05-15.json"
)
POST_WP0016_PATH = (
    ROOT
    / "docs"
    / "self-scoping"
    / "assessments"
    / "repo-scoping-post-wp0016-native-2026-05-15.json"
)
GOLDEN_PROFILE_PATH = (
    ROOT
    / "docs"
@@ -117,6 +124,33 @@ def test_post_wp0015_self_scoping_artifact_is_cleanly_bound_and_unapproved():
    assert criteria == {"RREG-QC-002", "RREG-QC-003"}


def test_post_wp0016_self_scoping_artifact_matches_golden_without_regression():
    artifact = load_json(POST_WP0016_PATH)
    capability_names = {
        capability["name"]
        for ability in artifact["generated_tree"]["abilities"]
        for capability in ability["capabilities"]
    }
    expected_names = {
        capability["name"]
        for capability in load_json(GOLDEN_PROFILE_PATH)["ability"][
            "expected_capabilities"
        ]
    }
    regression_ids = {
        item["id"] for item in artifact.get("known_regression_patterns", [])
    }
    criteria = {outcome["criterion_id"] for outcome in artifact["quality_gate_outcomes"]}
    assert artifact["engine_identity"]["release_binding_status"] == "complete"
    assert artifact["engine_identity"]["engine_dirty_state"] == "clean"
    assert capability_names == expected_names
    assert regression_ids == set()
    assert artifact["approved_map"]["abilities"] == []
    assert criteria == {"RREG-QC-001", "RREG-QC-006"}


def test_golden_profile_names_expected_native_capabilities_and_forbidden_false_positive():
    profile = load_json(GOLDEN_PROFILE_PATH)
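
The test in the hunk above relies on a `load_json` helper that the diff does not show, and it flattens capability names out of a nested `generated_tree`. As a rough sketch only: the `load_json` body below is an assumption about what such a helper might do, `capability_names` is a hypothetical function name, and the toy artifact is made-up data that borrows just the key names and two capability names from this commit.

```python
import json
from pathlib import Path


def load_json(path):
    """Assumed helper: read a UTF-8 JSON file and return the parsed object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def capability_names(artifact):
    """Flatten every capability name under generated_tree.abilities into a set."""
    return {
        capability["name"]
        for ability in artifact["generated_tree"]["abilities"]
        for capability in ability["capabilities"]
    }


# Toy artifact for illustration only; the real assessment JSON is far larger.
toy_artifact = {
    "generated_tree": {
        "abilities": [
            {
                "capabilities": [
                    {"name": "Register And Track Repositories"},
                    {"name": "Scan Repositories Into Observed Facts"},
                ]
            }
        ]
    }
}

names = capability_names(toy_artifact)
```

Comparing sets rather than lists means the assertion ignores ordering and duplicate names, so it only fails when a capability is genuinely missing or unexpected.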

@@ -4,7 +4,7 @@ type: workplan
title: "Native Candidate Generation Recovery"
domain: capabilities
repo: repo-scoping
-status: active
+status: done
owner: codex
topic_slug: foerster-capabilities
created: "2026-05-15"
@@ -101,7 +101,7 @@ and no misplaced API/CLI features were reported.
```task
id: RREG-WP-0016-T03
-status: todo
+status: done
priority: high
state_hub_task_id: "9ae662c0-3858-4c7d-b06e-26e1c8da7921"
```
@@ -115,3 +115,12 @@ Acceptance criteria:
- The comparison report shows matched expected capabilities.
- Remaining gaps are captured as generator follow-up, golden-profile update, or
human review notes.
Implementation note 2026-05-15: captured
`docs/self-scoping/assessments/repo-scoping-post-wp0016-native-2026-05-15.json`
and the paired Markdown comparison report from a clean engine commit. The
comparison status is `candidate_improvement`: all nine golden expected
capabilities match, no known provider-routing regression is present, no
misplaced API/CLI features are reported, and generated candidates remain
unapproved with transparent quality-gate outcomes `RREG-QC-001` and
`RREG-QC-006` for review.