repo-scoping/workplans/RREG-WP-0009-provenance-aware-characteristic-rebuild.md

---
id: RREG-WP-0009
type: workplan
title: "Provenance-Aware Characteristic Rebuild"
domain: capabilities
repo: repo-scoping
status: done
owner: codex
topic_slug: foerster-capabilities
created: "2026-05-01"
updated: "2026-05-02"
state_hub_workstream_id: "d8f304b3-a30c-4172-99de-19ab84bf330e"
---

# Provenance-Aware Characteristic Rebuild

This workplan addresses the key-cape analysis failure, in which deterministic
facts from agent guidance and generic implementation vocabulary were promoted
into incorrect repository characteristics. The registry should derive expected
utility from what a repository originally provides, not from every tool,
dependency, import, or operational convention mentioned in its files.

The target behavior is facts-first and provenance-aware:

- Deterministic scanning observes facts without over-interpreting them.
- Facts carry source roles such as intent summary, derived scope, product
  documentation, implementation source, dependency declaration, agent guidance,
  or CI/tooling.
- Characteristic generation promotes only repository-owned utility unless the
  repository clearly acts as a facade or adapter for another capability.
- Rebuild workflows can discard old approved characteristics and regenerate a
  fresh candidate graph from current facts when previous approvals are polluted.

## Model Source Provenance

```task
id: RREG-WP-0009-T01
status: done
priority: high
state_hub_task_id: "0c189443-5000-4025-a144-75e5bf1e3be5"
```

Add provenance roles for observed facts and content chunks so downstream
generation can distinguish product evidence from ambient context.

Initial source roles:

- `intent_summary`: `INTENT.md` and other design-intent files that describe why
  the repository should exist and what utility it is meant to provide.
- `derived_scope`: `SCOPE.md` and other generated or curated current-scope
  files. These are valuable context, but should not be treated as primary
  evidence for regenerating characteristics unless a curator explicitly chooses
  a bootstrap/import mode.
- `product_documentation`: README, docs, specifications, user-facing guides.
- `implementation_source`: code files owned by the repository.
- `test_evidence`: test and acceptance files.
- `dependency_declaration`: manifests, lockfiles, imports, module requirements.
- `configuration`: runtime configuration files and examples.
- `ci_tooling`: build, release, and workflow automation.
- `agent_guidance`: `AGENTS.md`, `CLAUDE.md`, `.claude/`, and similar agent
  operating instructions.
- `external_reference`: mentions of sibling repos, upstream services, or tools
  that are not exposed as owned capability.

Acceptance criteria:
- Observed facts can carry a source role in metadata without breaking existing
  storage or API consumers.
- `INTENT.md` is indexed as `intent_summary` and gets high priority during
  candidate generation.
- `SCOPE.md` is indexed as `derived_scope` and remains distinguishable from
  source evidence and design intent.
- Agent guidance files are classified separately from product documentation.
- Content chunks preserve the fact source role used to produce them.
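
A minimal sketch of how the source roles above could be assigned from file paths and attached to fact metadata. The path rules, the manifest list, and the helper names `classify_source_role` and `with_source_role` are assumptions for illustration, not the scanner's actual implementation. `external_reference` is listed but not assigned here, since it depends on content rather than path.

```python
from enum import Enum
from pathlib import PurePosixPath


class SourceRole(str, Enum):
    INTENT_SUMMARY = "intent_summary"
    DERIVED_SCOPE = "derived_scope"
    PRODUCT_DOCUMENTATION = "product_documentation"
    IMPLEMENTATION_SOURCE = "implementation_source"
    TEST_EVIDENCE = "test_evidence"
    DEPENDENCY_DECLARATION = "dependency_declaration"
    CONFIGURATION = "configuration"
    CI_TOOLING = "ci_tooling"
    AGENT_GUIDANCE = "agent_guidance"
    EXTERNAL_REFERENCE = "external_reference"


# Hypothetical path rules; a real scanner would likely need a richer set.
_AGENT_FILES = {"AGENTS.md", "CLAUDE.md"}
_MANIFESTS = {"pyproject.toml", "package.json", "go.mod", "Cargo.toml", "requirements.txt"}


def classify_source_role(path: str) -> SourceRole:
    """Map a repository-relative path to a source role."""
    p = PurePosixPath(path)
    name, parts = p.name, p.parts
    # Agent guidance is checked first so it is never mistaken for product docs.
    if name in _AGENT_FILES or ".claude" in parts:
        return SourceRole.AGENT_GUIDANCE
    if name == "INTENT.md":
        return SourceRole.INTENT_SUMMARY
    if name == "SCOPE.md":
        return SourceRole.DERIVED_SCOPE
    if ".github" in parts:
        return SourceRole.CI_TOOLING
    if name in _MANIFESTS or name.endswith(".lock"):
        return SourceRole.DEPENDENCY_DECLARATION
    if "tests" in parts or name.startswith("test_"):
        return SourceRole.TEST_EVIDENCE
    if name.lower().startswith("readme") or "docs" in parts:
        return SourceRole.PRODUCT_DOCUMENTATION
    if p.suffix in {".yaml", ".yml", ".toml", ".ini"}:
        return SourceRole.CONFIGURATION
    return SourceRole.IMPLEMENTATION_SOURCE


def with_source_role(fact: dict, path: str) -> dict:
    """Return a copy of the fact with the role stored under metadata only."""
    role = classify_source_role(path).value
    metadata = {**fact.get("metadata", {}), "source_role": role}
    return {**fact, "metadata": metadata}
```

Storing the role under `metadata` rather than as a new top-level field is one way to keep existing storage and API consumers unaffected.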

## Tighten Provider And Dependency Facts

```task
id: RREG-WP-0009-T02
status: done
priority: high
state_hub_task_id: "3ef728a0-832f-4441-9ece-16888ef68c47"
```

Reduce false provider facts caused by filenames, agent guidance, dependency
mentions, and generic words such as "adapter".

Acceptance criteria:
- `CLAUDE.md` and references to `CLAUDE.md` do not produce `llm_provider`
  facts.
- Generic terms such as `adapter` do not imply an LLM provider registry unless
  they appear with strong provider-specific implementation evidence.
- Provider facts record whether the repository appears to own, wrap, configure,
  consume, or merely mention the provider.
- Tests cover the key-cape false-positive pattern and the llm-connect
  true-positive pattern.
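
The criteria above could be sketched as a fact extractor that ignores agent guidance and generic vocabulary and records a relationship on every provider fact it does emit. The regexes, the SDK names used as "strong provider-specific evidence", and the function `provider_fact` are assumptions. Telling `own` apart from `wrap` would need richer evidence than this sketch inspects, so `own` is not produced here.

```python
import re
from typing import Optional

# Hypothetical vocabulary: SDK names stand in for strong provider evidence.
PROVIDER_NAME = re.compile(r"\b(openai|anthropic)\b", re.IGNORECASE)
PROVIDER_IMPORT = re.compile(r"^\s*(?:import|from)\s+(openai|anthropic)\b", re.MULTILINE)
ROUTING_CLASS = re.compile(r"\bclass\s+\w*(?:Router|Provider)\w*")


def provider_fact(text: str, source_role: str) -> Optional[dict]:
    """Return an llm_provider fact with a relationship, or None."""
    # Agent guidance (CLAUDE.md and similar) never yields provider facts.
    if source_role == "agent_guidance":
        return None
    imported = PROVIDER_IMPORT.search(text)
    named = PROVIDER_NAME.search(text)
    if source_role == "implementation_source" and imported:
        # An SDK import plus a routing/provider class suggests a wrapper.
        relationship = "wrap" if ROUTING_CLASS.search(text) else "consume"
        provider = imported.group(1)
    elif named and source_role == "configuration":
        relationship, provider = "configure", named.group(1)
    elif named and source_role == "dependency_declaration":
        relationship, provider = "consume", named.group(1)
    elif named:
        relationship, provider = "mention", named.group(1)
    else:
        # Words such as "adapter" or the filename CLAUDE.md fall through here.
        return None
    return {
        "kind": "llm_provider",
        "provider": provider.lower(),
        "relationship": relationship,
    }
```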

## Add Original Utility Gating

```task
id: RREG-WP-0009-T03
status: done
priority: high
state_hub_task_id: "3b8bac53-6a14-43b3-9a59-e15c24c0cd6e"
```

Add a promotion gate between observed facts and candidate characteristics. The
gate should ask whether the repository provides the capability in an original
way or only uses, imports, configures, documents, or depends on it.

Acceptance criteria:
- Candidate capabilities carry a utility relationship such as `owned`,
  `facade`, `adapter`, `consumer`, `dependency`, `tooling`, or `mention`.
- Only `owned`, `facade`, and explicit `adapter` relationships are eligible for
  auto-generated capabilities.
- `consumer`, `dependency`, `tooling`, and `mention` relationships remain as
  evidence or "How It Fits" context unless a curator promotes them.
- The generator explains rejected or downgraded candidate capabilities in a
  reviewable way.
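
The promotion gate can be sketched as a pure function that returns a decision together with a human-readable reason. The names `GateDecision` and `gate_candidate`, and the reading of "explicit adapter" as a boolean flag, are assumptions made for this example.

```python
from dataclasses import dataclass

ELIGIBLE = frozenset({"owned", "facade", "adapter"})
CONTEXT_ONLY = frozenset({"consumer", "dependency", "tooling", "mention"})


@dataclass(frozen=True)
class GateDecision:
    capability: str
    relationship: str
    promoted: bool
    reason: str  # shown to curators when a candidate is rejected or downgraded


def gate_candidate(capability: str, relationship: str, *,
                   explicit_adapter: bool = False,
                   curator_promoted: bool = False) -> GateDecision:
    """Decide whether a candidate capability may be auto-generated."""
    if relationship not in ELIGIBLE | CONTEXT_ONLY:
        raise ValueError(f"unknown utility relationship: {relationship}")
    if curator_promoted:
        return GateDecision(capability, relationship, True, "promoted by curator")
    # Assumption: an adapter only counts when the adapter role is explicit.
    if relationship == "adapter" and not explicit_adapter:
        return GateDecision(capability, relationship, False,
                            "adapter role is not explicit; kept as context")
    if relationship in ELIGIBLE:
        return GateDecision(capability, relationship, True,
                            f"repository provides this capability ({relationship})")
    return GateDecision(capability, relationship, False,
                        f"{relationship} relationship; kept as 'How It Fits' context")
```

Returning the reason alongside the decision keeps rejected and downgraded candidates reviewable without a second lookup.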

## Reweight Scope And Documentation Inputs

```task
id: RREG-WP-0009-T04
status: done
priority: high
state_hub_task_id: "4f666cd6-471e-4af9-b53c-4f3d7a1d1973"
```

Use explicit intent files and product documentation as stronger evidence for
expected repository utility than ambient config, CI files, dependency mentions,
agent instructions, or previously derived scope files.

Acceptance criteria:
- Candidate ability naming prefers the `INTENT.md` one-liner or core idea when
  present.
- Candidate capability generation can extract explicit intended capability
  blocks from `INTENT.md`.
- `SCOPE.md` is treated as derived current scope, not as ordinary evidence for
  rebuilding the characteristic model from scratch.
- Existing `SCOPE.md` files can be explicitly bootstrapped into initial
  `INTENT.md` files when no intent file exists; this is a one-time migration
  aid, not an ongoing equivalence between scope and intent.
- README/docs/spec evidence is weighted above CI/tooling and generic config.
- key-cape generates candidates centered on lightweight IAM, OIDC/PKCE profile
  enforcement, migration tooling, and LDAP/schema validation rather than LLM
  provider routing.
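
One way to express the reweighting is a per-role weight table used to rank candidates by their supporting evidence. The numeric weights, the bootstrap weight, and the function names are illustrative assumptions. Only the relative ordering (intent and product documentation above CI/tooling, config, dependency mentions, agent guidance, and derived scope) comes from this workplan.

```python
from typing import Dict, Iterable, List

# Hypothetical weights; only their relative order reflects the workplan.
ROLE_WEIGHTS = {
    "intent_summary": 10,
    "product_documentation": 8,
    "implementation_source": 7,
    "test_evidence": 6,
    "configuration": 3,
    "dependency_declaration": 2,
    "ci_tooling": 2,
    "derived_scope": 0,
    "agent_guidance": 0,
    "external_reference": 0,
}
BOOTSTRAP_SCOPE_WEIGHT = 5  # applies only when a curator opts into bootstrap mode


def evidence_score(roles: Iterable[str], *, bootstrap_from_scope: bool = False) -> int:
    """Sum role weights; derived scope counts only in bootstrap mode."""
    total = 0
    for role in roles:
        if role == "derived_scope" and bootstrap_from_scope:
            total += BOOTSTRAP_SCOPE_WEIGHT
        else:
            total += ROLE_WEIGHTS.get(role, 0)
    return total


def rank_candidates(candidates: Dict[str, List[str]]) -> List[str]:
    """Order candidate names by descending evidence score, then by name."""
    return sorted(candidates, key=lambda name: (-evidence_score(candidates[name]), name))
```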

## Conservative Auto-Approval Policy

```task
id: RREG-WP-0009-T05
status: done
priority: medium
state_hub_task_id: "d10d4bd7-4e5e-4efc-a724-b072fc53b8d2"
```

Prevent trusted auto-approve from approving characteristics whose evidence is
weak, ambiguous, or sourced primarily from dependency/tooling/agent context.

Acceptance criteria:
- Auto-approval requires an eligible utility relationship and sufficient
  source-role support.
- Candidates sourced mainly from `agent_guidance`, `ci_tooling`,
  `dependency_declaration`, or `external_reference` remain pending.
- Auto-approval records why each approved candidate was considered safe.
- Auto-approval records why each skipped candidate requires curator review.
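
A minimal sketch of this policy, assuming "sourced mainly from" means that at least half of a candidate's evidence carries a weak role. That threshold and the function name `auto_approval_decision` are assumptions.

```python
from typing import List, Tuple

ELIGIBLE_RELATIONSHIPS = frozenset({"owned", "facade", "adapter"})
WEAK_ROLES = frozenset({
    "agent_guidance", "ci_tooling", "dependency_declaration", "external_reference",
})


def auto_approval_decision(relationship: str,
                           evidence_roles: List[str]) -> Tuple[bool, str]:
    """Return (approved, reason); the reason is recorded in both cases."""
    if relationship not in ELIGIBLE_RELATIONSHIPS:
        return False, f"pending: relationship '{relationship}' is not eligible"
    if not evidence_roles:
        return False, "pending: no supporting evidence"
    weak = sum(1 for role in evidence_roles if role in WEAK_ROLES)
    # Assumption: half or more weak-role evidence counts as "mainly" weak.
    if weak * 2 >= len(evidence_roles):
        return False, f"pending: {weak} of {len(evidence_roles)} evidence items are weak-role"
    strong = len(evidence_roles) - weak
    return True, f"approved: {relationship} with {strong} strong-role evidence items"
```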

## Rebuild Characteristics From Scratch

```task
id: RREG-WP-0009-T06
status: done
priority: high
state_hub_task_id: "490b7926-9d03-4663-9d4d-9de9bbb9c755"
```

Provide an explicit rebuild option for repositories whose approved
characteristics are polluted or stale. The rebuild should preserve audit history
while allowing a clean candidate graph to replace prior approved
characteristics.

Acceptance criteria:
- API endpoint supports dry-run rebuild for a repository:
  scan current source, regenerate facts/chunks/candidates, and report what would
  be removed, retained, or proposed.
- API endpoint supports destructive rebuild with explicit confirmation:
  archive or supersede existing approved abilities, capabilities, features, and
  evidence, then regenerate candidates from scratch.
- UI exposes "Rebuild characteristics from scratch" with a dry-run preview and
  confirmation step.
- Rebuild records a review decision and progress/audit event naming the prior
  approved characteristic IDs that were superseded.
- Rebuild can run with LLM assistance on or off and records which mode was used.
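
The dry-run and destructive paths could share one planning function that only reports what would change and refuses the destructive path without confirmation. The report shape, matching approved items to fresh candidates by name, and the names `RebuildReport` and `plan_rebuild` are assumptions. The sketch does not touch storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class RebuildReport:
    repo: str
    dry_run: bool
    llm_assisted: bool          # records which mode was used
    retained: List[str]         # approved IDs whose name reappears in fresh candidates
    removed: List[str]          # approved IDs with no fresh counterpart
    proposed: List[str]         # fresh candidate names not previously approved
    superseded_ids: List[str]   # prior approved IDs named in the audit event


def plan_rebuild(repo: str, approved: Dict[str, str], fresh: Set[str], *,
                 dry_run: bool = True, confirm: bool = False,
                 llm_assisted: bool = False) -> RebuildReport:
    """Compare approved characteristics (id -> name) with fresh candidates."""
    if not dry_run and not confirm:
        raise ValueError("destructive rebuild requires explicit confirmation")
    retained = sorted(i for i, name in approved.items() if name in fresh)
    removed = sorted(i for i, name in approved.items() if name not in fresh)
    proposed = sorted(fresh - set(approved.values()))
    # A destructive rebuild supersedes every prior approved item.
    superseded = [] if dry_run else sorted(approved)
    return RebuildReport(repo, dry_run, llm_assisted,
                         retained, removed, proposed, superseded)
```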

## Characteristic Rebuild CLI

```task
id: RREG-WP-0009-T07
status: done
priority: medium
state_hub_task_id: "7afd6550-e4a4-4a8a-94bf-d974b0ccb8d2"
```

Add a CLI or maintenance command for batch rebuilds across all repositories or a
selected subset.

Acceptance criteria:
- Command supports `--repo`, `--all`, `--dry-run`, `--no-llm`, and
  `--trusted-auto-approve` options.
- Batch output lists repositories, latest analysis run, candidate source,
  approved items superseded, and remaining review queue.
- Command refuses destructive all-repo rebuild unless an explicit confirmation
  flag is provided.
- Tests cover dry-run and destructive single-repo rebuild behavior.
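
The option set could be wired up with `argparse` roughly as follows. The program name, the repeatable `--repo`, and the confirmation flag `--confirm-destructive` are assumptions; the workplan requires an explicit confirmation flag but does not name it.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rebuild-characteristics")
    target = parser.add_mutually_exclusive_group(required=True)
    # Assumption: --repo may be repeated to select a subset.
    target.add_argument("--repo", action="append", metavar="NAME",
                        help="repository to rebuild (repeatable)")
    target.add_argument("--all", dest="all_repos", action="store_true",
                        help="rebuild every repository")
    parser.add_argument("--dry-run", action="store_true",
                        help="report changes without applying them")
    parser.add_argument("--no-llm", action="store_true",
                        help="run without LLM assistance")
    parser.add_argument("--trusted-auto-approve", action="store_true",
                        help="apply the conservative auto-approval policy")
    # Hypothetical flag name for the required explicit confirmation.
    parser.add_argument("--confirm-destructive", action="store_true",
                        help="required for a non-dry-run rebuild of all repositories")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Refuse a destructive all-repo rebuild without explicit confirmation.
    if args.all_repos and not args.dry_run and not args.confirm_destructive:
        parser.error("--all without --dry-run requires --confirm-destructive")
    return args
```

With this sketch, `parse_args(["--all"])` exits with an argparse usage error, while `parse_args(["--all", "--dry-run"])` is accepted.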

## Regression Fixtures

```task
id: RREG-WP-0009-T08
status: done
priority: high
state_hub_task_id: "05077f3d-d40d-45fd-865c-0924407beb4f"
```

Add focused fixtures that lock in the distinction between owned capability,
facade capability, dependency use, tooling context, and mere mentions.

Acceptance criteria:
- key-cape-like fixture includes `SCOPE.md`, `CLAUDE.md`, README references,
  backend adapters, and IAM/OIDC capability descriptions.
- llm-connect-like fixture still generates LLM provider routing capability.
- facade fixture proves a repo can legitimately expose an imported capability
  when it has public wrapper/API evidence.
- dependency-only fixture proves imported libraries do not become provided
  capabilities.
- Tests assert both positive generated characteristics and negative rejected or
  downgraded candidates.
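
Asserting both positive and negative outcomes can be kept uniform with a small expectation record per fixture. The capability labels, the fixture names, and the helper `check_fixture` are hypothetical. A real test would pass in the set of characteristics the generator actually produced.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class FixtureExpectation:
    fixture: str
    must_generate: FrozenSet[str]
    must_not_generate: FrozenSet[str]


# Hypothetical labels, chosen to mirror the fixtures described above.
EXPECTATIONS = (
    FixtureExpectation("key-cape-like",
                       frozenset({"lightweight-iam", "oidc-pkce-profile-enforcement"}),
                       frozenset({"llm-provider-routing"})),
    FixtureExpectation("llm-connect-like",
                       frozenset({"llm-provider-routing"}),
                       frozenset()),
)


def check_fixture(expectation: FixtureExpectation, generated: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the fixture passes."""
    problems = []
    for name in sorted(expectation.must_generate - generated):
        problems.append(f"{expectation.fixture}: missing {name}")
    for name in sorted(expectation.must_not_generate & generated):
        problems.append(f"{expectation.fixture}: unexpected {name}")
    return problems
```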

## Documentation And Review Guidance

```task
id: RREG-WP-0009-T09
status: done
priority: medium
state_hub_task_id: "071f6d76-c92b-4ac1-825c-edcbef4bdbf6"
```

Document how curators should interpret source provenance, utility
relationships, and rebuild operations.

Acceptance criteria:
- `docs/characteristic-evidence-model.md` explains source roles and original
  utility.
- `docs/terminology.md` defines owned capability, facade, dependency,
  tooling-context, mention, and rebuild/supersede.
- Rebuild documentation explains when to rebuild from scratch versus rerun
  analysis while preserving approved characteristics.
- key-cape is documented as the motivating failure mode without hard-coding
  product-specific behavior into the scanner.

## Cross-Repository Analysis Isolation

```task
id: RREG-WP-0009-T10
status: done
priority: high
```

Validate that analyzing one repository never depends on approved maps,
candidate graphs, facts, chunks, or derived scope data from any other
repository in the registry database.

Acceptance criteria:
- A repository with stale approved characteristics cannot influence fresh
  candidate generation for another repository.
- Candidate graph, observed fact, and content chunk lookups remain scoped by
  repository and analysis run.
- Tests cover a poisoned-repo scenario where repo A contains old LLM/provider
  characteristics and repo B still generates only its own repository-owned
  candidates.
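
The poisoned-repo scenario can be illustrated with an in-memory SQLite table in which every lookup is filtered by both repository and analysis run. The table name, column names, and helper functions are assumptions and do not describe the registry's actual schema.

```python
import sqlite3
from typing import List, Tuple


def open_registry() -> sqlite3.Connection:
    """Create an in-memory stand-in for the registry's fact storage."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observed_facts ("
        " repo_id TEXT NOT NULL,"
        " analysis_run_id TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )
    return conn


def add_fact(conn: sqlite3.Connection, repo_id: str, run_id: str,
             kind: str, value: str) -> None:
    conn.execute("INSERT INTO observed_facts VALUES (?, ?, ?, ?)",
                 (repo_id, run_id, kind, value))


def facts_for_run(conn: sqlite3.Connection, repo_id: str,
                  run_id: str) -> List[Tuple[str, str]]:
    """Fetch facts scoped to exactly one repository and one analysis run."""
    return conn.execute(
        "SELECT kind, value FROM observed_facts"
        " WHERE repo_id = ? AND analysis_run_id = ?"
        " ORDER BY kind, value",
        (repo_id, run_id),
    ).fetchall()
```

Requiring both keys in the only query helper means a stale run in repo A cannot leak into candidate generation for repo B, even when both live in the same database.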