---
id: RREG-WP-0009
type: workplan
title: "Provenance-Aware Characteristic Rebuild"
domain: capabilities
repo: repo-scoping
status: done
owner: codex
topic_slug: foerster-capabilities
created: "2026-05-01"
updated: "2026-05-02"
state_hub_workstream_id: "d8f304b3-a30c-4172-99de-19ab84bf330e"
---

# Provenance-Aware Characteristic Rebuild

This workplan addresses the key-cape analysis failure where deterministic facts from agent guidance and generic implementation vocabulary were promoted into incorrect repository characteristics. The registry should derive expected utility from what a repository originally provides, not from every tool, dependency, import, or operational convention mentioned in its files.

The target behavior is facts-first and provenance-aware:

- Deterministic scanning observes facts without over-interpreting them.
- Facts carry source roles such as intent summary, derived scope, product documentation, implementation source, dependency declaration, agent guidance, or CI/tooling.
- Characteristic generation promotes only repository-owned utility unless the repository clearly acts as a facade or adapter for another capability.
- Rebuild workflows can discard old approved characteristics and regenerate a fresh candidate graph from current facts when previous approvals are polluted.

## Model Source Provenance

```task
id: RREG-WP-0009-T01
status: done
priority: high
state_hub_task_id: "0c189443-5000-4025-a144-75e5bf1e3be5"
```

Add provenance roles for observed facts and content chunks so downstream generation can distinguish product evidence from ambient context.

Initial source roles:

- `intent_summary`: `INTENT.md` and other design-intent files that describe why the repository should exist and what utility it is meant to provide.
- `derived_scope`: `SCOPE.md` and other generated or curated current-scope files.
  These provide valuable context, but should not be treated as primary evidence for regenerating characteristics unless a curator explicitly chooses a bootstrap/import mode.
- `product_documentation`: README, docs, specifications, user-facing guides.
- `implementation_source`: code files owned by the repository.
- `test_evidence`: test and acceptance files.
- `dependency_declaration`: manifests, lockfiles, imports, module requirements.
- `configuration`: runtime configuration files and examples.
- `ci_tooling`: build, release, and workflow automation.
- `agent_guidance`: `AGENTS.md`, `CLAUDE.md`, `.claude/`, and similar agent operating instructions.
- `external_reference`: mentions of sibling repos, upstream services, or tools that are not exposed as owned capability.

Acceptance criteria:

- Observed facts can carry a source role in metadata without breaking existing storage or API consumers.
- `INTENT.md` is indexed as `intent_summary` and gets high priority during candidate generation.
- `SCOPE.md` is indexed as `derived_scope` and remains distinguishable from source evidence and design intent.
- Agent guidance files are classified separately from product documentation.
- Content chunks preserve the fact source role used to produce them.

## Tighten Provider And Dependency Facts

```task
id: RREG-WP-0009-T02
status: done
priority: high
state_hub_task_id: "3ef728a0-832f-4441-9ece-16888ef68c47"
```

Reduce false provider facts caused by filenames, agent guidance, dependency mentions, and generic words such as "adapter".

Acceptance criteria:

- `CLAUDE.md` and references to `CLAUDE.md` do not produce `llm_provider` facts.
- Generic terms such as `adapter` do not imply an LLM provider registry unless they appear with strong provider-specific implementation evidence.
- Provider facts record whether the repository appears to own, wrap, configure, consume, or merely mention the provider.
- Tests cover the key-cape false-positive pattern and the llm-connect true-positive pattern.
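
As an illustration of the source roles and the `CLAUDE.md` rule above, the following is a minimal sketch of path-based role classification. The role names come from this workplan; the function names (`classify_source_role`, `may_emit_provider_fact`), the filename sets, and the matching order are assumptions made for the example, not the scanner's actual rules.

```python
from pathlib import PurePosixPath

# Hypothetical filename sets; the real scanner may recognize more or different files.
AGENT_GUIDANCE_NAMES = {"AGENTS.md", "CLAUDE.md"}
DEPENDENCY_NAMES = {"pyproject.toml", "requirements.txt", "package.json", "go.mod"}
CONFIG_SUFFIXES = {".yaml", ".yml", ".toml", ".ini"}


def classify_source_role(path: str) -> str:
    """Map a repository-relative path to one of the workplan's source roles."""
    p = PurePosixPath(path)
    parts, name = p.parts, p.name

    # Agent guidance is checked first so it can never be mistaken for product docs.
    if name in AGENT_GUIDANCE_NAMES or ".claude" in parts:
        return "agent_guidance"
    if name == "INTENT.md":
        return "intent_summary"
    if name == "SCOPE.md":
        return "derived_scope"
    if ".github" in parts:
        return "ci_tooling"
    if name in DEPENDENCY_NAMES:
        return "dependency_declaration"
    if "tests" in parts or name.startswith("test_"):
        return "test_evidence"
    if name.lower().startswith("readme") or "docs" in parts:
        return "product_documentation"
    if p.suffix in CONFIG_SUFFIXES:
        return "configuration"
    # Assumption: anything left over is treated as repository-owned code.
    return "implementation_source"


def may_emit_provider_fact(path: str) -> bool:
    """Block provider facts that would be inferred from agent guidance files."""
    return classify_source_role(path) != "agent_guidance"
```

Under these assumptions, `CLAUDE.md` is classified as `agent_guidance` and is excluded from provider detection by role rather than by a product-specific special case.
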

## Add Original Utility Gating

```task
id: RREG-WP-0009-T03
status: done
priority: high
state_hub_task_id: "3b8bac53-6a14-43b3-9a59-e15c24c0cd6e"
```

Add a promotion gate between observed facts and candidate characteristics. The gate should ask whether the repository provides the capability in an original way or only uses, imports, configures, documents, or depends on it.

Acceptance criteria:

- Candidate capabilities carry a utility relationship such as `owned`, `facade`, `adapter`, `consumer`, `dependency`, `tooling`, or `mention`.
- Only `owned`, `facade`, and explicit `adapter` relationships are eligible for auto-generated capabilities.
- `consumer`, `dependency`, `tooling`, and `mention` relationships remain as evidence or "How It Fits" context unless a curator promotes them.
- The generator explains rejected or downgraded candidate capabilities in a reviewable way.

## Reweight Scope And Documentation Inputs

```task
id: RREG-WP-0009-T04
status: done
priority: high
state_hub_task_id: "4f666cd6-471e-4af9-b53c-4f3d7a1d1973"
```

Use explicit intent files and product documentation as stronger evidence for expected repository utility than ambient config, CI files, dependency mentions, agent instructions, or previously derived scope files.

Acceptance criteria:

- Candidate ability naming prefers `INTENT.md` one-liner/core idea when present.
- Candidate capability generation can extract explicit intended capability blocks from `INTENT.md`.
- `SCOPE.md` is treated as derived current scope, not as ordinary evidence for rebuilding the characteristic model from scratch.
- Existing `SCOPE.md` files can be explicitly bootstrapped into initial `INTENT.md` files when no intent file exists; this is a one-time migration aid, not an ongoing equivalence between scope and intent.
- README/docs/spec evidence is weighted above CI/tooling and generic config.
- key-cape generates candidates centered on lightweight IAM, OIDC/PKCE profile enforcement, migration tooling, and LDAP/schema validation rather than LLM provider routing.

## Conservative Auto-Approval Policy

```task
id: RREG-WP-0009-T05
status: done
priority: medium
state_hub_task_id: "d10d4bd7-4e5e-4efc-a724-b072fc53b8d2"
```

Prevent trusted auto-approve from approving characteristics whose evidence is weak, ambiguous, or sourced primarily from dependency/tooling/agent context.

Acceptance criteria:

- Auto-approval requires eligible utility relationship and sufficient source role support.
- Candidates sourced mainly from `agent_guidance`, `ci_tooling`, `dependency_declaration`, or `external_reference` remain pending.
- Auto-approval records why each approved candidate was considered safe.
- Auto-approval records why each skipped candidate requires curator review.

## Rebuild Characteristics From Scratch

```task
id: RREG-WP-0009-T06
status: done
priority: high
state_hub_task_id: "490b7926-9d03-4663-9d4d-9de9bbb9c755"
```

Provide an explicit rebuild option for repositories whose approved characteristics are polluted or stale. The rebuild should preserve audit history while allowing a clean candidate graph to replace prior approved characteristics.

Acceptance criteria:

- API endpoint supports dry-run rebuild for a repository: scan current source, regenerate facts/chunks/candidates, and report what would be removed, retained, or proposed.
- API endpoint supports destructive rebuild with explicit confirmation: archive or supersede existing approved abilities, capabilities, features, and evidence, then regenerate candidates from scratch.
- UI exposes "Rebuild characteristics from scratch" with a dry-run preview and confirmation step.
- Rebuild records a review decision and progress/audit event naming the prior approved characteristic IDs that were superseded.
- Rebuild can run with LLM assistance on or off and records which mode was used.
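
The dry-run report and confirmation requirement above could be sketched roughly as follows. This is an illustrative outline only: `RebuildPlan`, `plan_rebuild`, and the idea of comparing characteristics by a stable string ID are assumptions for the example, and the IDs in the usage note are made up rather than taken from the registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RebuildPlan:
    """Hypothetical report shape for a rebuild preview."""
    repo: str
    dry_run: bool
    llm_mode: str          # records whether LLM assistance was on or off
    removed: list[str]     # approved today, absent from the fresh candidates
    retained: list[str]    # approved today and regenerated again
    proposed: list[str]    # new candidates with no prior approval


def plan_rebuild(
    repo: str,
    approved_ids: list[str],
    regenerated_ids: list[str],
    *,
    dry_run: bool = True,
    confirm: bool = False,
    use_llm: bool = False,
) -> RebuildPlan:
    """Compare approved characteristics with freshly regenerated candidates."""
    # A destructive rebuild is refused unless the caller confirms explicitly.
    if not dry_run and not confirm:
        raise PermissionError("destructive rebuild requires explicit confirmation")

    approved, regenerated = set(approved_ids), set(regenerated_ids)
    return RebuildPlan(
        repo=repo,
        dry_run=dry_run,
        llm_mode="llm" if use_llm else "no-llm",
        removed=sorted(approved - regenerated),
        retained=sorted(approved & regenerated),
        proposed=sorted(regenerated - approved),
    )
```

For example, calling `plan_rebuild("key-cape", ["llm-provider-routing", "oidc-pkce-profile"], ["oidc-pkce-profile", "ldap-schema-validation"])` would list the polluted routing characteristic under `removed` without changing anything, since `dry_run` defaults to `True`. In a destructive run, the `removed` list is what an audit event would name as superseded.
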

## Characteristic Rebuild CLI

```task
id: RREG-WP-0009-T07
status: done
priority: medium
state_hub_task_id: "7afd6550-e4a4-4a8a-94bf-d974b0ccb8d2"
```

Add a CLI or maintenance command for batch rebuilds across all repositories or a selected subset.

Acceptance criteria:

- Command supports `--repo`, `--all`, `--dry-run`, `--no-llm`, and `--trusted-auto-approve` options.
- Batch output lists repositories, latest analysis run, candidate source, approved items superseded, and remaining review queue.
- Command refuses destructive all-repo rebuild unless an explicit confirmation flag is provided.
- Tests cover dry-run and destructive single-repo rebuild behavior.

## Regression Fixtures

```task
id: RREG-WP-0009-T08
status: done
priority: high
state_hub_task_id: "05077f3d-d40d-45fd-865c-0924407beb4f"
```

Add focused fixtures that lock in the distinction between owned capability, facade capability, dependency use, tooling context, and mere mentions.

Acceptance criteria:

- key-cape-like fixture includes `SCOPE.md`, `CLAUDE.md`, README references, backend adapters, and IAM/OIDC capability descriptions.
- llm-connect-like fixture still generates LLM provider routing capability.
- facade fixture proves a repo can legitimately expose an imported capability when it has public wrapper/API evidence.
- dependency-only fixture proves imported libraries do not become provided capabilities.
- Tests assert both positive generated characteristics and negative rejected or downgraded candidates.

## Documentation And Review Guidance

```task
id: RREG-WP-0009-T09
status: done
priority: medium
state_hub_task_id: "071f6d76-c92b-4ac1-825c-edcbef4bdbf6"
```

Document how curators should interpret source provenance, utility relationships, and rebuild operations.

Acceptance criteria:

- `docs/characteristic-evidence-model.md` explains source roles and original utility.
- `docs/terminology.md` defines owned capability, facade, dependency, tooling-context, mention, and rebuild/supersede.
- Rebuild documentation explains when to rebuild from scratch versus rerun analysis while preserving approved characteristics.
- key-cape is documented as the motivating failure mode without hard-coding product-specific behavior into the scanner.

## Cross-Repository Analysis Isolation

```task
id: RREG-WP-0009-T10
status: done
priority: high
```

Validate that analyzing one repository never depends on approved maps, candidate graphs, facts, chunks, or derived scope data from any other repository in the registry database.

Acceptance criteria:

- A repository with stale approved characteristics cannot influence fresh candidate generation for another repository.
- Candidate graph, observed fact, and content chunk lookups remain scoped by repository and analysis run.
- Tests cover a poisoned-repo scenario where repo A contains old LLM/provider characteristics and repo B still generates only its own repository-owned candidates.
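
The poisoned-repo scenario can be pictured with a small in-memory sketch. It combines repository and analysis-run scoping with the eligible utility relationships (`owned`, `facade`, `adapter`) named earlier in this workplan. The `Fact` record, the `candidates_for` function, and all identifiers in the sample data are hypothetical; the real registry presumably enforces the same scoping in its database queries.

```python
from dataclasses import dataclass

# Relationships the workplan treats as eligible for auto-generated capabilities.
ELIGIBLE_RELATIONSHIPS = {"owned", "facade", "adapter"}


@dataclass(frozen=True)
class Fact:
    """Hypothetical observed fact, already tagged with a utility relationship."""
    repo_id: str
    run_id: str
    capability: str
    relationship: str


def candidates_for(facts: list[Fact], repo_id: str, run_id: str) -> list[str]:
    """Generate candidates from facts of one repository and one analysis run only."""
    scoped = [f for f in facts if f.repo_id == repo_id and f.run_id == run_id]
    return sorted({f.capability for f in scoped if f.relationship in ELIGIBLE_RELATIONSHIPS})


# Repo A is "poisoned" with an old LLM/provider characteristic.
facts = [
    Fact("repo-a", "run-1", "llm-provider-routing", "owned"),
    Fact("repo-b", "run-6", "stale-capability", "owned"),      # earlier run of repo B
    Fact("repo-b", "run-7", "oidc-pkce-profile", "owned"),
    Fact("repo-b", "run-7", "llm-provider-routing", "mention"),  # mention only, not promoted
]

repo_b_candidates = candidates_for(facts, "repo-b", "run-7")
```

With this sample data, repo B's candidates contain only the capability it owns in the current run; neither repo A's facts nor repo B's earlier run contribute.
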