---
id: RREG-WP-0009
type: workplan
title: "Provenance-Aware Characteristic Rebuild"
domain: capabilities
repo: repo-scoping
status: done
owner: codex
topic_slug: foerster-capabilities
created: "2026-05-01"
updated: "2026-05-02"
state_hub_workstream_id: "d8f304b3-a30c-4172-99de-19ab84bf330e"
---

# Provenance-Aware Characteristic Rebuild

This workplan addresses the key-cape analysis failure where deterministic facts
from agent guidance and generic implementation vocabulary were promoted into
incorrect repository characteristics. The registry should derive expected
utility from what a repository originally provides, not from every tool,
dependency, import, or operational convention mentioned in its files.

The target behavior is facts-first and provenance-aware:

- Deterministic scanning observes facts without over-interpreting them.
- Facts carry source roles such as intent summary, derived scope, product
  documentation, implementation source, dependency declaration, agent guidance,
  or CI/tooling.
- Characteristic generation promotes only repository-owned utility unless the
  repository clearly acts as a facade or adapter for another capability.
- Rebuild workflows can discard old approved characteristics and regenerate a
  fresh candidate graph from current facts when previous approvals are polluted.

## Model Source Provenance

```task
id: RREG-WP-0009-T01
status: done
priority: high
state_hub_task_id: "0c189443-5000-4025-a144-75e5bf1e3be5"
```

Add provenance roles for observed facts and content chunks so downstream
generation can distinguish product evidence from ambient context.

Initial source roles:

- `intent_summary`: `INTENT.md` and other design-intent files that describe why
  the repository should exist and what utility it is meant to provide.
- `derived_scope`: `SCOPE.md` and other generated or curated current-scope
  files. These are valuable context, but should not be treated as primary
  evidence for regenerating characteristics unless a curator explicitly chooses
  a bootstrap/import mode.
- `product_documentation`: README, docs, specifications, user-facing guides.
- `implementation_source`: code files owned by the repository.
- `test_evidence`: test and acceptance files.
- `dependency_declaration`: manifests, lockfiles, imports, module requirements.
- `configuration`: runtime configuration files and examples.
- `ci_tooling`: build, release, and workflow automation.
- `agent_guidance`: `AGENTS.md`, `CLAUDE.md`, `.claude/`, and similar agent
  operating instructions.
- `external_reference`: mentions of sibling repos, upstream services, or tools
  that are not exposed as owned capability.

Acceptance criteria:

- Observed facts can carry a source role in metadata without breaking existing
  storage or API consumers.
- `INTENT.md` is indexed as `intent_summary` and gets high priority during
  candidate generation.
- `SCOPE.md` is indexed as `derived_scope` and remains distinguishable from
  source evidence and design intent.
- Agent guidance files are classified separately from product documentation.
- Content chunks preserve the fact source role used to produce them.
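
As a rough illustration of the source roles above, the following sketch maps a
file path to a role and attaches it to fact metadata as an additive key. The
role names come from this workplan; the function names, the path rules, and the
lists of dependency and agent files are assumptions made for the example, not
the registry's actual scanner.

```python
from enum import Enum
from pathlib import PurePosixPath


class SourceRole(str, Enum):
    INTENT_SUMMARY = "intent_summary"
    DERIVED_SCOPE = "derived_scope"
    PRODUCT_DOCUMENTATION = "product_documentation"
    IMPLEMENTATION_SOURCE = "implementation_source"
    TEST_EVIDENCE = "test_evidence"
    DEPENDENCY_DECLARATION = "dependency_declaration"
    CONFIGURATION = "configuration"
    CI_TOOLING = "ci_tooling"
    AGENT_GUIDANCE = "agent_guidance"
    # Not assignable from a path alone; it depends on what the content mentions.
    EXTERNAL_REFERENCE = "external_reference"


# Hypothetical path rules; a real scanner would likely need a richer table.
AGENT_FILES = {"AGENTS.md", "CLAUDE.md"}
DEPENDENCY_FILES = {"pyproject.toml", "requirements.txt", "package.json", "go.mod"}


def classify_source_role(path: str) -> SourceRole:
    """Pick a source role from the path; the most specific rules run first."""
    p = PurePosixPath(path)
    parts = p.parts
    if p.name in AGENT_FILES or ".claude" in parts:
        return SourceRole.AGENT_GUIDANCE
    if p.name == "INTENT.md":
        return SourceRole.INTENT_SUMMARY
    if p.name == "SCOPE.md":
        return SourceRole.DERIVED_SCOPE
    if ".github" in parts or p.name == "Makefile":
        return SourceRole.CI_TOOLING
    if p.name in DEPENDENCY_FILES:
        return SourceRole.DEPENDENCY_DECLARATION
    if "tests" in parts or p.name.startswith("test_"):
        return SourceRole.TEST_EVIDENCE
    if p.suffix == ".md" or "docs" in parts:
        return SourceRole.PRODUCT_DOCUMENTATION
    if p.suffix in {".yaml", ".yml", ".toml", ".ini", ".env"}:
        return SourceRole.CONFIGURATION
    return SourceRole.IMPLEMENTATION_SOURCE


def with_source_role(metadata: dict, path: str) -> dict:
    """Return a copy of the metadata with an added source_role key."""
    return {**metadata, "source_role": classify_source_role(path).value}
```

Adding the role as an extra metadata key, rather than changing the shape of the
fact record, is one way to keep existing storage and API consumers working.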

## Tighten Provider And Dependency Facts

```task
id: RREG-WP-0009-T02
status: done
priority: high
state_hub_task_id: "3ef728a0-832f-4441-9ece-16888ef68c47"
```

Reduce false provider facts caused by filenames, agent guidance, dependency
mentions, and generic words such as "adapter".

Acceptance criteria:

- `CLAUDE.md` and references to `CLAUDE.md` do not produce `llm_provider`
  facts.
- Generic terms such as `adapter` do not imply an LLM provider registry unless
  they appear with strong provider-specific implementation evidence.
- Provider facts record whether the repository appears to own, wrap, configure,
  consume, or merely mention the provider.
- Tests cover the key-cape false-positive pattern and the llm-connect
  true-positive pattern.
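
The criteria above could be approximated by a small filter in front of
provider-fact creation. In this sketch the `Signal` type, the signal kinds, and
the kind-to-relationship table are all hypothetical; only the `llm_provider`
fact type, the `agent_guidance` role, and the own/wrap/configure/consume/mention
distinction come from this workplan.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProviderRelationship(str, Enum):
    OWN = "own"
    WRAP = "wrap"
    CONFIGURE = "configure"
    CONSUME = "consume"
    MENTION = "mention"


@dataclass(frozen=True)
class Signal:
    source_role: str  # e.g. "implementation_source" or "agent_guidance"
    kind: str         # hypothetical signal kind assigned by the scanner
    text: str         # the matched snippet, kept as evidence


# Hypothetical signal kinds that count as provider-specific evidence.
KIND_TO_RELATIONSHIP = {
    "provider_registry": ProviderRelationship.OWN,
    "provider_client_wrapper": ProviderRelationship.WRAP,
    "provider_config_key": ProviderRelationship.CONFIGURE,
    "provider_sdk_import": ProviderRelationship.CONSUME,
    "provider_name_in_prose": ProviderRelationship.MENTION,
}


def llm_provider_fact(signal: Signal) -> Optional[dict]:
    """Return an llm_provider fact, or None when the signal is too weak."""
    # Agent guidance never yields provider facts, whatever it mentions.
    if signal.source_role == "agent_guidance":
        return None
    # Filenames and generic words such as "adapter" are not in the table.
    relationship = KIND_TO_RELATIONSHIP.get(signal.kind)
    if relationship is None:
        return None
    return {
        "fact_type": "llm_provider",
        "relationship": relationship.value,
        "evidence": signal.text,
    }
```

Under these assumptions, a README line that merely names `CLAUDE.md` yields no
fact, while a provider registry class found in implementation source yields a
fact with relationship `own`.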

## Add Original Utility Gating

```task
id: RREG-WP-0009-T03
status: done
priority: high
state_hub_task_id: "3b8bac53-6a14-43b3-9a59-e15c24c0cd6e"
```

Add a promotion gate between observed facts and candidate characteristics. The
gate should ask whether the repository provides the capability in an original
way or only uses, imports, configures, documents, or depends on it.

Acceptance criteria:

- Candidate capabilities carry a utility relationship such as `owned`,
  `facade`, `adapter`, `consumer`, `dependency`, `tooling`, or `mention`.
- Only `owned`, `facade`, and explicit `adapter` relationships are eligible for
  auto-generated capabilities.
- `consumer`, `dependency`, `tooling`, and `mention` relationships remain as
  evidence or "How It Fits" context unless a curator promotes them.
- The generator explains rejected or downgraded candidate capabilities in a
  reviewable way.
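
A minimal sketch of such a gate, assuming each candidate already carries a
utility relationship. The relationship names are the ones listed above; the
`Candidate` and `GateDecision` types and the `explicit_adapter` flag are
illustrative assumptions about how "explicit adapter" might be represented.

```python
from dataclasses import dataclass
from enum import Enum


class UtilityRelationship(str, Enum):
    OWNED = "owned"
    FACADE = "facade"
    ADAPTER = "adapter"
    CONSUMER = "consumer"
    DEPENDENCY = "dependency"
    TOOLING = "tooling"
    MENTION = "mention"


@dataclass(frozen=True)
class Candidate:
    name: str
    relationship: UtilityRelationship
    explicit_adapter: bool = False  # hypothetical marker for explicit adapters


@dataclass(frozen=True)
class GateDecision:
    candidate: Candidate
    promoted: bool
    reason: str  # kept so rejected candidates stay reviewable


def promotion_gate(candidate: Candidate) -> GateDecision:
    """Promote only owned, facade, and explicit adapter candidates."""
    rel = candidate.relationship
    if rel in (UtilityRelationship.OWNED, UtilityRelationship.FACADE):
        return GateDecision(candidate, True, f"repository provides this as {rel.value}")
    if rel is UtilityRelationship.ADAPTER:
        if candidate.explicit_adapter:
            return GateDecision(candidate, True, "explicit adapter for another capability")
        return GateDecision(candidate, False, "adapter relationship is not explicit")
    return GateDecision(
        candidate,
        False,
        f"{rel.value} relationship stays as evidence until a curator promotes it",
    )
```

Returning a decision object with a reason, rather than a bare boolean, is one
way to make rejected or downgraded candidates explainable during review.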

## Reweight Scope And Documentation Inputs

```task
id: RREG-WP-0009-T04
status: done
priority: high
state_hub_task_id: "4f666cd6-471e-4af9-b53c-4f3d7a1d1973"
```

Use explicit intent files and product documentation as stronger evidence for
expected repository utility than ambient config, CI files, dependency mentions,
agent instructions, or previously derived scope files.

Acceptance criteria:

- Candidate ability naming prefers the `INTENT.md` one-liner/core idea when
  present.
- Candidate capability generation can extract explicit intended capability
  blocks from `INTENT.md`.
- `SCOPE.md` is treated as derived current scope, not as ordinary evidence for
  rebuilding the characteristic model from scratch.
- Existing `SCOPE.md` files can be explicitly bootstrapped into initial
  `INTENT.md` files when no intent file exists; this is a one-time migration
  aid, not an ongoing equivalence between scope and intent.
- README/docs/spec evidence is weighted above CI/tooling and generic config.
- key-cape generates candidates centered on lightweight IAM, OIDC/PKCE profile
  enforcement, migration tooling, and LDAP/schema validation rather than LLM
  provider routing.
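
One way to picture the reweighting is a per-role weight table applied to each
candidate's evidence. The numeric weights below are invented for illustration
and only preserve the ordering this section asks for (intent and product
documentation above config, CI/tooling, dependencies, and agent guidance, with
derived scope excluded unless a bootstrap mode is chosen). The function names
are likewise assumptions.

```python
from __future__ import annotations

# Hypothetical integer weights; only their relative order reflects the workplan.
ROLE_WEIGHTS = {
    "intent_summary": 10,
    "product_documentation": 8,
    "implementation_source": 7,
    "test_evidence": 6,
    "configuration": 3,
    "ci_tooling": 2,
    "dependency_declaration": 2,
    "agent_guidance": 1,
    "external_reference": 1,
    "derived_scope": 0,  # not ordinary evidence when rebuilding from scratch
}
BOOTSTRAP_SCOPE_WEIGHT = 5  # assumed weight when a curator opts into bootstrap


def evidence_score(roles: list[str], bootstrap_from_scope: bool = False) -> int:
    """Sum the weights of the source roles backing one candidate."""
    total = 0
    for role in roles:
        if role == "derived_scope" and bootstrap_from_scope:
            total += BOOTSTRAP_SCOPE_WEIGHT
        else:
            total += ROLE_WEIGHTS.get(role, 0)
    return total


def rank_candidates(candidates: dict[str, list[str]]) -> list[str]:
    """Order candidate names from strongest to weakest evidence."""
    return sorted(candidates, key=lambda name: evidence_score(candidates[name]), reverse=True)


ranked = rank_candidates(
    {
        "LLM provider routing": ["agent_guidance", "dependency_declaration"],
        "Lightweight IAM": ["intent_summary", "product_documentation"],
    }
)
print(ranked)
```

With these weights a candidate backed by `INTENT.md` and README evidence ranks
ahead of one backed only by agent guidance and a dependency mention.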

## Conservative Auto-Approval Policy

```task
id: RREG-WP-0009-T05
status: done
priority: medium
state_hub_task_id: "d10d4bd7-4e5e-4efc-a724-b072fc53b8d2"
```

Prevent trusted auto-approve from approving characteristics whose evidence is
weak, ambiguous, or sourced primarily from dependency/tooling/agent context.

Acceptance criteria:

- Auto-approval requires an eligible utility relationship and sufficient source
  role support.
- Candidates sourced mainly from `agent_guidance`, `ci_tooling`,
  `dependency_declaration`, or `external_reference` remain pending.
- Auto-approval records why each approved candidate was considered safe.
- Auto-approval records why each skipped candidate requires curator review.
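
The policy above can be sketched as a function that returns both a decision and
its reason. The eligible relationships and the weak source roles are taken from
this workplan; the function name is hypothetical, and reading "sourced mainly"
as "half or more of the evidence" is an assumption chosen for the example.

```python
from __future__ import annotations

ELIGIBLE_RELATIONSHIPS = {"owned", "facade", "adapter"}
WEAK_ROLES = {
    "agent_guidance",
    "ci_tooling",
    "dependency_declaration",
    "external_reference",
}


def auto_approval_decision(relationship: str, evidence_roles: list[str]) -> tuple[bool, str]:
    """Return (approved, reason); unapproved candidates stay pending for review."""
    if relationship not in ELIGIBLE_RELATIONSHIPS:
        return False, f"relationship '{relationship}' is not eligible; needs curator review"
    if not evidence_roles:
        return False, "no evidence recorded; needs curator review"
    weak = sum(1 for role in evidence_roles if role in WEAK_ROLES)
    # Assumption: "mainly" means half or more of the evidence has a weak role.
    if weak * 2 >= len(evidence_roles):
        return False, f"{weak} of {len(evidence_roles)} evidence items come from weak roles"
    return True, f"eligible relationship with {len(evidence_roles) - weak} strong evidence items"
```

Because every branch returns a reason string, the same call can feed both the
"why approved" and the "why skipped" records.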

## Rebuild Characteristics From Scratch

```task
id: RREG-WP-0009-T06
status: done
priority: high
state_hub_task_id: "490b7926-9d03-4663-9d4d-9de9bbb9c755"
```

Provide an explicit rebuild option for repositories whose approved
characteristics are polluted or stale. The rebuild should preserve audit history
while allowing a clean candidate graph to replace prior approved
characteristics.

Acceptance criteria:

- API endpoint supports dry-run rebuild for a repository:
  scan current source, regenerate facts/chunks/candidates, and report what would
  be removed, retained, or proposed.
- API endpoint supports destructive rebuild with explicit confirmation:
  archive or supersede existing approved abilities, capabilities, features, and
  evidence, then regenerate candidates from scratch.
- UI exposes "Rebuild characteristics from scratch" with a dry-run preview and
  confirmation step.
- Rebuild records a review decision and progress/audit event naming the prior
  approved characteristic IDs that were superseded.
- Rebuild can run with LLM assistance on or off and records which mode was used.
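
The dry-run report and the confirmation requirement could look roughly like the
sketch below. It assumes approved and regenerated characteristics can be
compared by a stable ID, which this workplan does not state; `RebuildPlan`,
`plan_rebuild`, and the example IDs are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RebuildPlan:
    removed: list[str]     # approved before, absent from the fresh candidates
    retained: list[str]    # approved before and regenerated again
    proposed: list[str]    # new candidates with no prior approval
    superseded: list[str]  # prior approved IDs to name in the audit event
    llm_mode: str          # "on" or "off", recorded with the rebuild
    dry_run: bool


def plan_rebuild(
    approved_ids: set[str],
    regenerated_ids: set[str],
    *,
    use_llm: bool,
    dry_run: bool = True,
    confirm: bool = False,
) -> RebuildPlan:
    """Compare prior approvals with fresh candidates; refuse unconfirmed destruction."""
    if not dry_run and not confirm:
        raise PermissionError("destructive rebuild requires explicit confirmation")
    return RebuildPlan(
        removed=sorted(approved_ids - regenerated_ids),
        retained=sorted(approved_ids & regenerated_ids),
        proposed=sorted(regenerated_ids - approved_ids),
        # A destructive rebuild supersedes every prior approval; a dry run none.
        superseded=[] if dry_run else sorted(approved_ids),
        llm_mode="on" if use_llm else "off",
        dry_run=dry_run,
    )


preview = plan_rebuild(
    {"cap-llm-routing", "cap-iam"},
    {"cap-iam", "cap-oidc-pkce"},
    use_llm=False,
)
print(preview)
```

The same plan object can back both the API dry-run response and the UI preview,
with the destructive path reusing it once confirmation is supplied.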

## Characteristic Rebuild CLI

```task
id: RREG-WP-0009-T07
status: done
priority: medium
state_hub_task_id: "7afd6550-e4a4-4a8a-94bf-d974b0ccb8d2"
```

Add a CLI or maintenance command for batch rebuilds across all repositories or a
selected subset.

Acceptance criteria:

- Command supports `--repo`, `--all`, `--dry-run`, `--no-llm`, and
  `--trusted-auto-approve` options.
- Batch output lists repositories, latest analysis run, candidate source,
  approved items superseded, and remaining review queue.
- Command refuses destructive all-repo rebuild unless an explicit confirmation
  flag is provided.
- Tests cover dry-run and destructive single-repo rebuild behavior.
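
A possible shape for the command-line surface, using Python's standard
`argparse`. The five option names are the ones listed above. The program name
`rebuild-characteristics` and the confirmation flag `--confirm-destructive-all`
are placeholders, since this workplan does not name them.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rebuild-characteristics")  # hypothetical name
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--repo", action="append", help="repository to rebuild; repeatable")
    target.add_argument("--all", action="store_true", help="rebuild every repository")
    parser.add_argument("--dry-run", action="store_true", help="report changes without applying them")
    parser.add_argument("--no-llm", action="store_true", help="run without LLM assistance")
    parser.add_argument("--trusted-auto-approve", action="store_true")
    # Hypothetical flag name for the explicit confirmation requirement.
    parser.add_argument("--confirm-destructive-all", action="store_true")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Refuse a destructive all-repo rebuild unless it is explicitly confirmed.
    if args.all and not args.dry_run and not args.confirm_destructive_all:
        parser.error("destructive --all rebuild requires --confirm-destructive-all")
    return args
```

For example, `parse_args(["--repo", "key-cape", "--dry-run", "--no-llm"])`
selects a single-repository preview without LLM assistance, while
`parse_args(["--all"])` exits with a usage error under these assumptions.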

## Regression Fixtures

```task
id: RREG-WP-0009-T08
status: done
priority: high
state_hub_task_id: "05077f3d-d40d-45fd-865c-0924407beb4f"
```

Add focused fixtures that lock in the distinction between owned capability,
facade capability, dependency use, tooling context, and mere mentions.

Acceptance criteria:

- key-cape-like fixture includes `SCOPE.md`, `CLAUDE.md`, README references,
  backend adapters, and IAM/OIDC capability descriptions.
- llm-connect-like fixture still generates LLM provider routing capability.
- facade fixture proves a repo can legitimately expose an imported capability
  when it has public wrapper/API evidence.
- dependency-only fixture proves imported libraries do not become provided
  capabilities.
- Tests assert both positive generated characteristics and negative rejected or
  downgraded candidates.

## Documentation And Review Guidance

```task
id: RREG-WP-0009-T09
status: done
priority: medium
state_hub_task_id: "071f6d76-c92b-4ac1-825c-edcbef4bdbf6"
```

Document how curators should interpret source provenance, utility
relationships, and rebuild operations.

Acceptance criteria:

- `docs/characteristic-evidence-model.md` explains source roles and original
  utility.
- `docs/terminology.md` defines owned capability, facade, dependency,
  tooling-context, mention, and rebuild/supersede.
- Rebuild documentation explains when to rebuild from scratch versus rerun
  analysis while preserving approved characteristics.
- key-cape is documented as the motivating failure mode without hard-coding
  product-specific behavior into the scanner.

## Cross-Repository Analysis Isolation

```task
id: RREG-WP-0009-T10
status: done
priority: high
```

Validate that analyzing one repository never depends on approved maps,
candidate graphs, facts, chunks, or derived scope data from any other
repository in the registry database.

Acceptance criteria:

- A repository with stale approved characteristics cannot influence fresh
  candidate generation for another repository.
- Candidate graph, observed fact, and content chunk lookups remain scoped by
  repository and analysis run.
- Tests cover a poisoned-repo scenario where repo A contains old LLM/provider
  characteristics and repo B still generates only its own repository-owned
  candidates.
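
The poisoned-repo scenario can be illustrated with an in-memory SQLite table
whose lookups always filter on both repository and analysis run. The table
name, column names, and sample rows are invented for the example and are not
the registry's actual schema.

```python
import sqlite3


def open_registry() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical facts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observed_facts ("
        " repo_id TEXT NOT NULL,"
        " analysis_run_id TEXT NOT NULL,"
        " fact_type TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )
    return conn


def facts_for_run(conn: sqlite3.Connection, repo_id: str, analysis_run_id: str) -> list:
    """Return (fact_type, value) rows for exactly one repository and run."""
    cursor = conn.execute(
        "SELECT fact_type, value FROM observed_facts"
        " WHERE repo_id = ? AND analysis_run_id = ?"
        " ORDER BY fact_type, value",
        (repo_id, analysis_run_id),
    )
    return cursor.fetchall()


conn = open_registry()
conn.executemany(
    "INSERT INTO observed_facts VALUES (?, ?, ?, ?)",
    [
        ("repo-a", "run-1", "llm_provider", "stale provider routing"),  # poisoned repo
        ("repo-b", "run-2", "capability", "lightweight IAM"),
        ("repo-b", "run-2", "capability", "OIDC/PKCE profile enforcement"),
    ],
)
repo_b_facts = facts_for_run(conn, "repo-b", "run-2")
print(repo_b_facts)
```

Because the query never omits the `repo_id` and `analysis_run_id` filters, the
stale provider fact in repo A cannot appear in repo B's results.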