generated from coulomb/repo-seed
Capture clean self-assessment regression signal
@@ -11,6 +11,12 @@ instead of relying on memory or screenshots.
- `assessments/repo-scoping-known-bad-2026-05-15-run-39.json` captures the
  known-bad self-analysis that promoted LLM-provider vocabulary into native
  repo-scoping capability truth.
- `assessments/repo-scoping-post-wp0015-clean-2026-05-15.json` captures the
  first clean, release-bound deterministic challenger after acceptance-boundary
  and input-hygiene work. It remains a rejected regression because candidate
  generation still collapses repo-scoping's native surfaces under the forbidden
  provider-routing capability, but its source set no longer includes
  `var/checkouts/` contamination.
- `workflow.md` explains how to run challenger assessments, interpret outcomes,
  and decide whether to update the golden profile or fix the engine.
- `outcomes/` stores append-only reviewer decisions created from side-by-side
File diff suppressed because it is too large
@@ -0,0 +1,36 @@
# Self-Scoping Comparison: repo-scoping-challenger-run-1

- Status: `regression`
- Golden profile: `repo-scoping-golden-profile-v1`
- Target repo: `repo-scoping`
- Summary: Assessment repeats known or forbidden self-scoping patterns; prefer the golden profile until the engine is corrected.

## Missing Expected Capabilities

- Explore Dependency And Impact Graphs
- Generate And Maintain SCOPE.md
- Generate Reviewable Candidate Characteristics
- Index Source Content With Provenance
- Provide Scope Context To Downstream Agents
- Register And Track Repositories
- Review And Approve Candidate Characteristics
- Scan Repositories Into Observed Facts
- Search Compare And Export Approved Profiles

## Forbidden Native Capabilities Present

- Route LLM Requests Across Providers

## Known Regression Patterns

- `RREG-SELF-REG-001` LLM provider vocabulary promoted as native capability: Generated tree contains Route LLM Requests Across Providers as a repo-scoping capability.
- `RREG-SELF-REG-002` Native API and CLI surfaces attached under false capability: API or CLI surface features are nested below provider routing.

## Misplaced Features

- `HTTP API surface: possible API surface, GET /health, @app.get(, +49 more` under `Route LLM Requests Across Providers` (API): API/CLI surface is nested below provider-routing capability.
- `CLI command surface: CLI command build_parser, CLI command make_service` under `Route LLM Requests Across Providers` (CLI): API/CLI surface is nested below provider-routing capability.

## Matched Expected Capabilities

- None

## Review Hints

- Do not promote this assessment as a preferred baseline.
- Inspect forbidden capabilities and misplaced features first.
- Use the findings as signal for scanner, generator, or acceptance-policy changes.

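The report sections above read as set comparisons between the challenger's generated capability names and the golden profile. The following is a minimal sketch of that classification, not the repo's actual comparison code: `Comparison`, `compare_to_golden`, the `"match"` status value, and the rule that any missing or forbidden capability yields `regression` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Illustrative result shape; field names are assumptions."""
    status: str
    matched: list[str]
    missing: list[str]
    forbidden_present: list[str]


def compare_to_golden(
    generated: set[str], expected: set[str], forbidden: set[str]
) -> Comparison:
    """Classify a challenger's capability names against a golden profile."""
    matched = sorted(generated & expected)
    missing = sorted(expected - generated)
    forbidden_present = sorted(generated & forbidden)
    # Assumed rule: a forbidden capability or any missing expected capability
    # marks the challenger as a regression.
    status = "regression" if forbidden_present or missing else "match"
    return Comparison(status, matched, missing, forbidden_present)


# Two expected names taken from the report above, plus the forbidden one.
result = compare_to_golden(
    generated={"Route LLM Requests Across Providers"},
    expected={
        "Register And Track Repositories",
        "Scan Repositories Into Observed Facts",
    },
    forbidden={"Route LLM Requests Across Providers"},
)
print(result.status, result.matched, result.forbidden_present)
```

Keeping the three lists sorted makes the output stable across runs, which matters if a report like the one above is committed and diffed.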
@@ -11,6 +11,13 @@ KNOWN_BAD_PATH = (
    / "assessments"
    / "repo-scoping-known-bad-2026-05-15-run-39.json"
)
POST_WP0015_PATH = (
    ROOT
    / "docs"
    / "self-scoping"
    / "assessments"
    / "repo-scoping-post-wp0015-clean-2026-05-15.json"
)
GOLDEN_PROFILE_PATH = (
    ROOT
    / "docs"
@@ -90,6 +97,26 @@ def test_known_bad_self_scoping_artifact_captures_rejected_regression_seed():
    assert artifact["quality_gate_outcomes"] == []


def test_post_wp0015_self_scoping_artifact_is_cleanly_bound_and_unapproved():
    artifact = load_json(POST_WP0015_PATH)

    paths = artifact["content_chunk_summary"]["paths"]
    capability_names = {
        capability["name"]
        for ability in artifact["generated_tree"]["abilities"]
        for capability in ability["capabilities"]
    }
    criteria = {outcome["criterion_id"] for outcome in artifact["quality_gate_outcomes"]}

    assert artifact["engine_identity"]["release_binding_status"] == "complete"
    assert artifact["engine_identity"]["engine_dirty_state"] == "clean"
    assert artifact["execution"]["mode"] == "deterministic-only"
    assert not any(path.startswith("var/checkouts/") for path in paths)
    assert artifact["approved_map"]["abilities"] == []
    assert capability_names == {"Route LLM Requests Across Providers"}
    assert criteria == {"RREG-QC-002", "RREG-QC-003"}


def test_golden_profile_names_expected_native_capabilities_and_forbidden_false_positive():
    profile = load_json(GOLDEN_PROFILE_PATH)

@@ -4,7 +4,7 @@ type: workplan
title: "Self-Assessment Input Hygiene"
domain: capabilities
repo: repo-scoping
-status: active
+status: done
owner: codex
topic_slug: foerster-capabilities
created: "2026-05-15"
@@ -52,7 +52,7 @@ contribute documentation, language, or LLM-provider facts to the parent repo.

```task
id: RREG-WP-0015-T02
-status: todo
+status: done
priority: high
state_hub_task_id: "81bb46e7-01dc-4c14-8a32-1d4d456dc209"
```
@@ -67,11 +67,18 @@ Acceptance criteria:
  quality issues from approved registry truth.
- The artifact/report names make their relationship to WP0014/WP0015 clear.

Implementation note 2026-05-15: captured
`docs/self-scoping/assessments/repo-scoping-post-wp0015-clean-2026-05-15.json`
and the paired Markdown comparison report. The artifact is release-bound to a
clean engine commit, contains zero `var/checkouts/` paths, leaves the approved
map empty, and records quality-gate outcomes `RREG-QC-002` and `RREG-QC-003`
against the remaining provider-routing candidate regression.

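The hygiene properties listed in the note can be checked in one pass over the artifact. The sketch below reuses the JSON keys asserted in the test for this artifact. The helper name `hygiene_problems`, the problem messages, and the `sample_artifact` values (including its file path) are made up for illustration and are not copied from the real assessment file.

```python
def hygiene_problems(artifact: dict) -> list[str]:
    """Return human-readable hygiene problems; an empty list means clean."""
    problems = []
    identity = artifact["engine_identity"]
    if identity["release_binding_status"] != "complete":
        problems.append("engine is not release-bound")
    if identity["engine_dirty_state"] != "clean":
        problems.append("engine working tree is dirty")
    paths = artifact["content_chunk_summary"]["paths"]
    if any(path.startswith("var/checkouts/") for path in paths):
        problems.append("source set includes var/checkouts/ paths")
    if artifact["approved_map"]["abilities"]:
        problems.append("approved map is not empty")
    return problems


# Minimal stand-in artifact with hypothetical values, not real assessment data.
sample_artifact = {
    "engine_identity": {
        "release_binding_status": "complete",
        "engine_dirty_state": "clean",
    },
    "content_chunk_summary": {"paths": ["docs/example.md"]},
    "approved_map": {"abilities": []},
}

print(hygiene_problems(sample_artifact))
```

Returning a list of problems rather than a single boolean keeps every failed property visible in one run, which is useful when several hygiene issues regress together.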
## T03: Triage Remaining Generator Quality Gaps

```task
id: RREG-WP-0015-T03
-status: todo
+status: done
priority: medium
state_hub_task_id: "20b6f34e-1d92-407b-84dd-6e3ec7e77eb3"
```
@@ -84,3 +91,12 @@ Acceptance criteria:
  and quality-gate outcomes.
- The next workplan is scoped around generator improvements, not deterministic
  acceptance.

Implementation note 2026-05-15: the clean challenger still generates only
`Route LLM Requests Across Providers`, misses all curated expected
repo-scoping capabilities, and misplaces API/CLI surfaces under provider
routing. The approved map remains empty and quality gates flag the candidate
with `RREG-QC-002` and `RREG-QC-003`, so the next slice is generator quality.
Created `RREG-WP-0016 Native Candidate Generation Recovery` to focus on
separating provider vocabulary from native capability seeds and recovering
repo-owned candidate families.
@@ -0,0 +1,95 @@
---
id: RREG-WP-0016
type: workplan
title: "Native Candidate Generation Recovery"
domain: capabilities
repo: repo-scoping
status: active
owner: codex
topic_slug: foerster-capabilities
created: "2026-05-15"
updated: "2026-05-15"
---

# Native Candidate Generation Recovery

WP0014 fixed the acceptance boundary: deterministic rules no longer promote bad
candidates into approved registry truth. WP0015 fixed the self-assessment input
set so repo-scoping no longer analyzes runtime checkouts as if they were native
source.

The clean post-WP0015 challenger still fails the golden profile. It produces one
candidate capability, `Route LLM Requests Across Providers`, and nests native
API/CLI surfaces below that false provider-routing capability. Quality gates now
flag this correctly (`RREG-QC-002`, `RREG-QC-003`) and the approved map remains
empty, but candidate generation still needs to recover the repo's real native
capabilities.

This workplan is deliberately about generation quality, not acceptance policy.

## T01: Separate Provider Vocabulary From Native Capability Seeds

```task
id: RREG-WP-0016-T01
status: todo
priority: high
```

Update deterministic candidate generation so LLM-provider facts and credential
configuration are never enough to create the parent capability for a repository
unless supported by owned product intent/source.

Acceptance criteria:
- Provider vocabulary can remain evidence or a low-level observed fact.
- Provider vocabulary does not become the dominant parent capability for
  repo-scoping's own self-assessment.
- Existing llm-connect-like fixtures that truly model provider adapters remain
  explainable through source-role metadata.

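One way to read the T01 rule is as a gate on which observed facts may seed a parent capability. The sketch below is a hypothetical illustration of that idea only: the `Fact` type, the fact-kind names, and both functions are assumptions and do not describe the engine's real data model.

```python
from dataclasses import dataclass

# Hypothetical fact kinds; the real engine's vocabulary may differ.
PROVIDER_KINDS = {"provider_vocabulary", "credential_config"}
OWNED_KINDS = {"product_intent", "owned_source"}


@dataclass(frozen=True)
class Fact:
    kind: str
    text: str


def partition_facts(facts: list[Fact]) -> tuple[list[Fact], list[Fact]]:
    """Split facts into capability seeds and supporting-only evidence."""
    seeds = [fact for fact in facts if fact.kind in OWNED_KINDS]
    evidence = [fact for fact in facts if fact.kind in PROVIDER_KINDS]
    return seeds, evidence


def can_seed_parent_capability(facts: list[Fact]) -> bool:
    """Provider or credential facts alone never justify a parent capability."""
    seeds, _ = partition_facts(facts)
    return bool(seeds)


provider_only = [Fact("provider_vocabulary", "mentions an LLM provider name")]
with_intent = provider_only + [Fact("product_intent", "documents provider routing")]

print(can_seed_parent_capability(provider_only))  # False
print(can_seed_parent_capability(with_intent))    # True
```

Under this reading, provider facts are kept as evidence rather than discarded, which is consistent with the first acceptance criterion, and a fixture that genuinely models provider adapters still passes the gate through its owned intent or source.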
## T02: Generate Native Repo-Scoping Candidate Families

```task
id: RREG-WP-0016-T02
status: todo
priority: high
```

Use intent, docs, source, tests, and schema files to generate repo-scoping's
expected native candidate families instead of a single provider-routing bucket.

Initial expected families:
- repository registration and metadata import
- deterministic repository scanning and observed facts
- provenance-aware content indexing
- reviewable candidate characteristic generation
- candidate review/edit/reject/merge/relink/approval workflow
- approved profile search, comparison, export, and gap exploration
- SCOPE.md generation, diffing, validation, and write flows
- dependency graph and impact exploration
- scope context API for downstream agents

Acceptance criteria:
- Clean self-assessment matches at least a meaningful subset of the golden
  expected capabilities.
- API and CLI features attach under native workflow candidates, not provider
  routing.
- Candidate source refs cite repo-owned docs/source/tests instead of schema
  examples or dependency vocabulary alone.

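The last criterion above could be expressed as a check over a candidate's source refs. This is a minimal sketch under stated assumptions: the `role` field, its values, and the example paths are hypothetical and are not taken from the engine's schema.

```python
# Hypothetical source roles; the real source-role metadata may use other names.
REPO_OWNED_ROLES = {"doc", "source", "test"}


def cites_repo_owned_source(source_refs: list[dict]) -> bool:
    """True when at least one ref points at repo-owned docs, source, or tests."""
    return any(ref.get("role") in REPO_OWNED_ROLES for ref in source_refs)


# Example refs with made-up paths.
weak_refs = [
    {"path": "schemas/example.json", "role": "schema_example"},
    {"path": "requirements.txt", "role": "dependency"},
]
strong_refs = weak_refs + [{"path": "tests/test_example.py", "role": "test"}]

print(cites_repo_owned_source(weak_refs))    # False
print(cites_repo_owned_source(strong_refs))  # True
```

Schema examples and dependency vocabulary are still allowed as refs here; they just cannot be the only support for a candidate.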
## T03: Re-Run Clean Self-Assessment And Compare

```task
id: RREG-WP-0016-T03
status: todo
priority: high
```

After generator improvements, rerun the clean deterministic self-assessment and
compare it to the golden profile and the post-WP0015 rejected challenger.

Acceptance criteria:
- The forbidden provider-routing candidate is absent or isolated as rejected /
  requires-review evidence rather than a native capability.
- The comparison report shows matched expected capabilities.
- Remaining gaps are captured as generator follow-up, golden-profile update, or
  human review notes.