Add CUST-WP-0050: repo classification & registration redesign

Proposed workplan to adopt the Repo Classification Standard ecosystem-wide:
per-repo .repo-classification.yaml as source of truth, State Hub domain model
replaced by the standard's 14 market domains, auto-registration tooling, and
reclassification of the 57 existing registrations. Folds in the 2026-06-21
discrepancy findings as reconciliation targets. Blocking design question D1
(topic vs market-domain) flagged for resolution before schema work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Commit 0ba909263b (parent 2c27ac6d2e), 2026-06-22 01:19:38 +02:00


---
id: CUST-WP-0050
type: workplan
title: "Repo Classification & State Hub Registration Redesign"
domain: custodian
repo: the-custodian
status: proposed
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 50
created: "2026-06-22"
updated: "2026-06-22"
---
# CUST-WP-0050 - Repo Classification & State Hub Registration Redesign
## Goal
Adopt the **Repo Classification Standard** (`canon/standards/repo-classification-standard_v1.0.md`,
`id: canon-repo-classification`) as the ecosystem-wide model for organising
repositories, and redesign State Hub registration around it so that:
- every known repository carries a committed `.repo-classification.yaml` that is
the **source of truth** for its classification,
- the State Hub can **automatically register all known repos** by reading and
validating those files (local checkout or Gitea API), and
- all **previously registered repos are reclassified** under the new standard,
replacing the current ad-hoc 14-domain model.

End state: one principled, validated taxonomy (category · domain · capability
tags · business stake · business mechanics) spanning the whole portfolio, with
registration that is reproducible from repo-owned metadata rather than
hand-curated DB rows.
## Context
A 2026-06-21 review compared three views of the portfolio and found them out of
sync:
- **Gitea** hosts ~72 repos (70 under `coulomb/`, plus a fork and a personal repo).
- **State Hub** has 57 `managed_repos` across **14 ad-hoc domains** (custodian,
railiance, markitect, coulomb_social, personhood, capabilities, canon,
citation_evidence, helix_forge, inter_hub, netkingdom, stack,
vergabe_teilnahme, whynot).
- **the-custodian** `canon/projects/` froze at the original **6 founding charters**.

Concrete discrepancies to resolve as part of this work:
- ~18 Gitea repos are **unregistered** (e.g. audit-core, binect-chrome, binect-js,
coordination-engine, direkt-vermittlung-de, human-resources, polycode-sim,
ralph-workplan, repo-seed, tegwick-control, tele-mcp, testdrive-jsui,
timeline-svg, vantage-point, whynot-control, whynot-design).
- **Phantom / renamed** registrations: `markitect-project` (registered) vs
`markitect-main` (Gitea) — likely a rename; `railiance-bootstrap` and
`railiance-hosts` registered but absent from Gitea.
- **Duplicate domain**: `vergabe_teilnahme` looks like a second registration of
`vergabe-teilnahme` (already under coulomb_social).
- **Empty domain**: `personhood` has a charter and topic but no repos.
- **Naming drift**: `coulomb.social`/`coulomb_social`, `foerster-capabilities`/`capabilities`.
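
Several of the discrepancies above differ only in case or separator. The sketch below shows how such collisions could be flagged mechanically before T05. It assumes that collapsing `.`, `_` and `-` is enough. `normalise_slug` and `find_name_collisions` are hypothetical helpers, not existing State Hub code. Semantic pairs such as `foerster-capabilities`/`capabilities` and renames such as `markitect-project`/`markitect-main` are not caught this way and still need human review.

```python
import re
from collections import defaultdict

# Hypothetical helpers (not existing State Hub code): flag names that differ
# only in case or in the separators '.', '_' and '-'.

def normalise_slug(name: str) -> str:
    """Lower-case a name and collapse runs of '.', '_' and '-' into a single '-'."""
    return re.sub(r"[._-]+", "-", name.strip().lower())

def find_name_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group names that normalise to the same slug; keep only groups of two or more."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[normalise_slug(name)].append(name)
    return {slug: sorted(members) for slug, members in groups.items() if len(members) > 1}

if __name__ == "__main__":
    # Names taken from the discrepancy list; the capabilities pair is NOT flagged.
    names = ["vergabe_teilnahme", "vergabe-teilnahme", "coulomb.social",
             "coulomb_social", "capabilities", "foerster-capabilities"]
    for slug, members in find_name_collisions(names).items():
        print(slug, members)
```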

The new standard fixes the root cause: it separates *category* (work mode),
*domain* (intended market/user), *capability tags* (what it does), and *business
stake* (who cares). The current 14 "domains" conflate these four concerns.
### Architecture decision: replace the domain model
The standard's `domain` is a **fixed 14-value market vocabulary** (infotech,
financials, communication, consumer, health, industrials, energy, utilities,
materials, realestate, crypto, agents, space, government) that is *orthogonal* to
the Hub's current 14 coordination domains. Per the steering decision on
2026-06-22, the new market-domain vocabulary **replaces** the Hub's domain model
(rather than augmenting it or running a parallel two-axis model).
This is a **breaking migration**: the current `domains` table is 1:1 with
`topics`, and topics own workstreams, goals, decisions, and progress events. The
new market domains are coarse (most repos are `infotech`), so the old 1:1
domain↔topic assumption cannot survive unchanged. **Decoupling coordination
topics from the market-domain attribute is the central design problem of T04/T05**
(see Open Questions D1).
## Scope
In scope:
- Promote and steward the standard as custodian canon (done: the standard now
lives at `canon/standards/repo-classification-standard_v1.0.md`).
- A single machine-readable allowed-values source derived from the standard,
consumed by both the per-repo files and the Hub validator.
- A committed `.repo-classification.yaml` for every active repo (agent-assisted
first pass, human-reviewed), authored in each repo.
- State Hub schema/model redesign replacing the domain model with the 14 market
domains and storing the full classification on `managed_repos`.
- A reversible data migration re-homing existing topics/workstreams/goals/
decisions/charters and resolving the discrepancies listed above.
- Auto-registration tooling (bulk, idempotent) that reads classification files
from local checkouts or the Gitea API and registers/reclassifies repos.
- Updates to dashboard, consistency checker, MCP/REST surface, and orientation
docs to the new taxonomy.

Out of scope:
- Re-architecting workstream/task semantics beyond what the domain replacement
forces.
- Changing the Gitea hosting model or repo contents beyond adding the
classification file.
- Classifying throwaway/forked/non-ecosystem repos (explicit exclusion list).
## Repo boundary
This is the **custodian driving/coordination workplan** (it owns the canon
standard and the portfolio decision), consistent with how `CUST-WP-0043` drove
State Hub work. Implementation tasks **T04–T08 execute in `/home/worsch/state-hub`**
and should be re-homed as a state-hub-local workplan once this plan is approved;
per-repo classification files (T02/T03) are authored in each target repo. The
hub remains a read/index model fed by repo-owned files (ADR-001).
## Tasks
### Phase 1 — Standard as a validation source
### T01 - Derive machine-readable allowed-values from the standard
```task
id: CUST-WP-0050-T01
status: todo
priority: high
```
Extract the standard's controlled vocabularies (5 categories, 14 domains, the
business_stake and business_mechanics enums, and the recommended capability
families) into a single machine-readable artefact (e.g.
`canon/standards/repo-classification.allowed.yaml`) that both the per-repo
`.repo-classification.yaml` linter and the State Hub validator import.

Done when a single allowed-values file exists, is referenced by the standard, and
a small validator can check a `.repo-classification.yaml` against it.
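
The sketch below shows a minimal form of the T01 validator. It makes several assumptions:

- The allowed-values file has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`.
- A `.repo-classification.yaml` uses top-level keys named after the fields listed in T04 (`category`, `domain`, `secondary_domains`, `business_stake`, `business_mechanics`).
- Only the 14 market domains come from this workplan. The category, stake and mechanics values are placeholders, not the standard's vocabularies.

```python
from typing import Any

# The 14 market domains as listed in this workplan (assumed to match the standard).
MARKET_DOMAINS = frozenset({
    "infotech", "financials", "communication", "consumer", "health",
    "industrials", "energy", "utilities", "materials", "realestate",
    "crypto", "agents", "space", "government",
})

# Hypothetical parsed form of repo-classification.allowed.yaml. The category,
# stake and mechanics values are PLACEHOLDERS, not the standard's vocabularies.
ALLOWED: dict[str, frozenset[str]] = {
    "category": frozenset({"example-category"}),
    "domain": MARKET_DOMAINS,
    "secondary_domains": MARKET_DOMAINS,
    "business_stake": frozenset({"example-stake"}),
    "business_mechanics": frozenset({"example-mechanic"}),
}

SCALAR_FIELDS = ("category", "domain")
LIST_FIELDS = ("secondary_domains", "business_stake", "business_mechanics")

def validate_classification(doc: dict[str, Any], allowed: dict[str, frozenset[str]]) -> list[str]:
    """Return validation errors for a parsed .repo-classification.yaml (empty list = valid)."""
    errors: list[str] = []
    for field in SCALAR_FIELDS:
        value = doc.get(field)
        if not isinstance(value, str) or value not in allowed[field]:
            errors.append(f"{field}: {value!r} is not an allowed value")
    for field in LIST_FIELDS:
        values = doc.get(field, [])
        if not isinstance(values, list):
            errors.append(f"{field}: expected a list, got {type(values).__name__}")
            continue
        for value in values:
            if not isinstance(value, str) or value not in allowed[field]:
                errors.append(f"{field}: {value!r} is not an allowed value")
    # capability_tags are not checked: the standard only recommends capability families.
    return errors

EXAMPLE_DOC = {
    "category": "example-category",
    "domain": "infotech",
    "secondary_domains": ["agents"],
    "business_stake": ["example-stake"],
    "business_mechanics": [],
}

if __name__ == "__main__":
    print(validate_classification(EXAMPLE_DOC, ALLOWED))
    # An old coordination domain is rejected under the new vocabulary.
    print(validate_classification({**EXAMPLE_DOC, "domain": "custodian"}, ALLOWED))
```

Capability tags are left unchecked in this sketch because the standard only *recommends* capability families. Both the repo-side linter and the Hub could call the same function, which keeps a single allowed-values source.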
### Phase 2 — Classify the portfolio (repo-owned source of truth)
### T02 - Classify custodian-owned repos
```task
id: CUST-WP-0050-T02
status: todo
priority: high
```
Author and human-review `.repo-classification.yaml` for the custodian-domain
repos (the-custodian, state-hub, hub-core, inter-hub, activity-core, issue-core,
kaizen-agentic, llm-connect, ops-bridge, ops-warden, email-connect) using the
standard's §16 agent prompt as a first pass.

Done when each custodian repo has a committed file that validates against T01 and
has been reviewed by a human.
### T03 - Classify the full Gitea inventory
```task
id: CUST-WP-0050-T03
status: todo
priority: high
```
Produce proposed `.repo-classification.yaml` for every active repo in the Gitea
`coulomb` org (~70), prioritising the 57 already-registered and the ~18
unregistered repos. Deliver as per-repo PRs for owner/human review. Maintain an
explicit **exclusion list** (forks, `lando_worsch/python-snake`, archived
`test_domain_v2`) recorded in this workplan.

Done when every non-excluded active repo has a committed, validated classification
file (or is on the recorded exclusion list).
### Phase 3 — State Hub redesign (executed in /home/worsch/state-hub)
### T04 - Redesign schema: replace domains, add classification
```task
id: CUST-WP-0050-T04
status: todo
priority: high
```
Replace the `domains` table contents with the 14 fixed market domains and add
classification storage to `managed_repos`: `category`, primary `domain_id`,
`secondary_domains[]`, `capability_tags[]`, `business_stake[]`,
`business_mechanics[]`, plus provenance (`classified_at`, `classified_by`,
`standard_version`). Enforce the allowed-values from T01 at the API boundary.
Decouple `topic` from market-domain (see D1). Provide an Alembic migration and
updated SQLAlchemy models + Pydantic schemas.

Done when the schema/model/API accept and validate the full classification and
reject invalid values, with a forward migration and a tested downgrade path.
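
The sketch below shows a minimal T04 storage shape. It uses Python's standard-library `sqlite3` with an in-memory database as a stand-in for the Hub's real database and Alembic migration. Column names follow the task text. The `topics` table has no domain reference, and the `topic_id` column on `managed_repos` is an assumption that reflects the D1 proposal. List-valued fields are stored as JSON text because SQLite has no array type. `insert_repo` is a hypothetical helper. The foreign key is only a backstop behind the API-boundary validation that T04 calls for.

```python
import json
import sqlite3
from typing import Sequence

# The 14 fixed market domains that replace the old coordination domains.
MARKET_DOMAINS = ["infotech", "financials", "communication", "consumer", "health",
                  "industrials", "energy", "utilities", "materials", "realestate",
                  "crypto", "agents", "space", "government"]

SCHEMA = """
CREATE TABLE domains (
    id   INTEGER PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE
);
-- Assumption (D1 proposal): topics no longer carry a domain reference.
CREATE TABLE topics (
    id   INTEGER PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE
);
CREATE TABLE managed_repos (
    id                 INTEGER PRIMARY KEY,
    name               TEXT NOT NULL UNIQUE,
    topic_id           INTEGER REFERENCES topics(id),  -- hypothetical, independent of domain
    category           TEXT NOT NULL,
    domain_id          INTEGER NOT NULL REFERENCES domains(id),
    secondary_domains  TEXT NOT NULL DEFAULT '[]',     -- JSON arrays stand in for [] columns
    capability_tags    TEXT NOT NULL DEFAULT '[]',
    business_stake     TEXT NOT NULL DEFAULT '[]',
    business_mechanics TEXT NOT NULL DEFAULT '[]',
    classified_at      TEXT,
    classified_by      TEXT,
    standard_version   TEXT
);
"""

def create_schema() -> sqlite3.Connection:
    """Create the sketch schema in memory and seed the fixed market domains."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO domains (slug) VALUES (?)", [(d,) for d in MARKET_DOMAINS])
    return conn

def insert_repo(conn: sqlite3.Connection, name: str, category: str, domain: str,
                secondary_domains: Sequence[str] = ()) -> None:
    """Resolve the primary domain slug to its id, then insert the repo row."""
    row = conn.execute("SELECT id FROM domains WHERE slug = ?", (domain,)).fetchone()
    if row is None:
        raise ValueError(f"unknown market domain: {domain!r}")
    conn.execute(
        "INSERT INTO managed_repos (name, category, domain_id, secondary_domains)"
        " VALUES (?, ?, ?, ?)",
        (name, category, row[0], json.dumps(list(secondary_domains))),
    )
```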
### T05 - Migration mapping + data migration
```task
id: CUST-WP-0050-T05
status: todo
priority: high
```
Define and apply the mapping from the old 14 domains/topics to the new model
(guided by standard §15 Migration Notes), re-pointing existing topics,
workstreams, goals, decisions, progress events, and charter `topic_id`
references with **no orphaned workstreams**. Resolve the 2026-06-21 discrepancies:
reconcile `markitect-project` → `markitect-main`, retire phantom
`railiance-bootstrap`/`railiance-hosts` (or relink), collapse the
`vergabe_teilnahme` duplicate, and decide `personhood`'s disposition (charter-only
vs retire).

Done when a dry-run migration report is reviewed and the applied migration leaves
zero orphaned coordination records; the discrepancy list is resolved or explicitly
deferred with reasons.
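
The sketch below shows a minimal form of the dry-run check behind T05's "no orphaned workstreams" criterion. It assumes the reviewed mapping is a dict from each old domain slug to the topic slug it is re-homed to, or `None` when the domain is retired. It also assumes each workstream carries its current domain slug. `DryRunReport` and `dry_run` are hypothetical names, and the example mapping is illustrative only. The sketch covers workstreams; the same check would apply to goals, decisions, progress events, and charter `topic_id` references.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DryRunReport:
    rehomed: dict[str, str] = field(default_factory=dict)    # workstream id -> new topic slug
    orphaned: list[str] = field(default_factory=list)        # workstreams left without a topic
    unmapped_domains: set[str] = field(default_factory=set)  # old domains absent from the mapping

def dry_run(workstreams: dict[str, str], mapping: dict[str, Optional[str]]) -> DryRunReport:
    """Plan workstream re-homing without writing anything.

    workstreams: workstream id -> old domain slug
    mapping:     old domain slug -> new topic slug, or None if the domain is retired
    """
    report = DryRunReport()
    for ws_id, old_domain in sorted(workstreams.items()):
        if old_domain not in mapping:
            report.unmapped_domains.add(old_domain)
            report.orphaned.append(ws_id)
        elif mapping[old_domain] is None:
            report.orphaned.append(ws_id)  # retired domain still has live workstreams
        else:
            report.rehomed[ws_id] = mapping[old_domain]
    return report

if __name__ == "__main__":
    # Illustrative mapping only; the real one is the reviewed T05 deliverable.
    mapping = {"custodian": "custodian", "vergabe_teilnahme": "coulomb_social", "personhood": None}
    report = dry_run({"ws-1": "custodian", "ws-2": "vergabe_teilnahme", "ws-3": "railiance"}, mapping)
    print(report)
    print("safe to apply:", not report.orphaned)
```

Under this design, the migration would be applied only once `orphaned` is empty or every entry has been explicitly deferred.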
### T06 - Auto-registration tooling
```task
id: CUST-WP-0050-T06
status: todo
priority: high
```
Build an idempotent `register-from-classification` capability (Make target +
script + MCP tool) that, given a repo (local path or Gitea API), reads
`.repo-classification.yaml`, validates against T01, and upserts the
`managed_repo` with full classification. Support a **bulk** run over the Gitea
inventory and reclassification of existing rows. Reuse the k3s/Gitea access path
documented during the 2026-06-21 review (Gitea runs in k3s on coulombcore;
reach it via `kubectl port-forward svc/gitea-http`).

Done when one command registers/reclassifies every repo with a valid file and
emits a report of registered / updated / skipped / invalid.
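
The sketch below shows a minimal form of the idempotent core of `register-from-classification`. It assumes the classification files have already been fetched and parsed into dicts (from a local checkout or the Gitea API). It also assumes a `validate` callable, such as a T01 validator, returns a list of errors. The in-memory `registry` dict stands in for the `managed_repos` table. Counting both "no file" and "already up to date" as `skipped` is an assumption.

```python
from collections import Counter
from typing import Any, Callable, Optional

Classification = dict[str, Any]

def register_from_classification(
    files: dict[str, Optional[Classification]],
    registry: dict[str, Classification],
    validate: Callable[[Classification], list[str]],
) -> Counter:
    """Upsert each repo's classification into `registry` and count outcomes.

    `files` maps repo name -> parsed .repo-classification.yaml, or None if the
    repo has no file. Re-running with the same input changes nothing.
    """
    report: Counter = Counter()
    for repo, doc in sorted(files.items()):
        if doc is None:
            report["skipped"] += 1      # no classification file yet
        elif validate(doc):
            report["invalid"] += 1      # any existing row is left untouched
        elif repo not in registry:
            registry[repo] = dict(doc)
            report["registered"] += 1
        elif registry[repo] != doc:
            registry[repo] = dict(doc)
            report["updated"] += 1      # reclassification of an existing row
        else:
            report["skipped"] += 1      # already up to date
    return report
```

Running the same input twice produces `skipped` instead of duplicate rows. This is what makes a repeated bulk run over the Gitea inventory safe.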
### T07 - Reclassify existing registrations
```task
id: CUST-WP-0050-T07
status: todo
priority: medium
```
Run T06 against the classification files for the 57 previously-registered repos,
reconciling each to the new taxonomy and retiring phantom/duplicate records.

Done when all previously-registered repos reflect their new classification and
the managed-repo set matches the (non-excluded) Gitea inventory.
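
The sketch below shows a minimal form of the T07 completion check. It assumes both sides are available as sets of repo names: managed-repo rows from the Hub, and repo names from the Gitea `coulomb` org listing. It also assumes the T03 exclusion list is kept as another set. `InventoryDiff` and `compare_inventory` are hypothetical names. Names are compared as-is. A rename such as `markitect-project` → `markitect-main` therefore shows up as one phantom and one unregistered repo until it is reconciled. A registered repo on the exclusion list also counts as a phantom.

```python
from typing import NamedTuple

class InventoryDiff(NamedTuple):
    unregistered: set[str]  # on Gitea (not excluded), missing from the Hub
    phantom: set[str]       # in the Hub, but not in the non-excluded Gitea inventory

    @property
    def in_sync(self) -> bool:
        return not self.unregistered and not self.phantom

def compare_inventory(managed: set[str], gitea: set[str], excluded: set[str]) -> InventoryDiff:
    """Compare registered repos with the non-excluded Gitea inventory."""
    expected = gitea - excluded
    return InventoryDiff(unregistered=expected - managed, phantom=managed - expected)
```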
### Phase 4 — Consuming surfaces & cutover
### T08 - Update dashboard, consistency checker, MCP/REST, docs
```task
id: CUST-WP-0050-T08
status: todo
priority: medium
```
Update the dashboard to navigate by category/domain/capability/business-stake;
add a consistency rule flagging registered repos lacking a valid
`.repo-classification.yaml`; expose list/filter-by-classification in MCP/REST; and
update orientation docs (`SCOPE.md`, `README.md`, `.claude/rules/*`) that
reference the old "domains".

Done when the dashboard renders the new taxonomy, the consistency checker has a
classification rule, and docs no longer assume the old domain model.
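
The sketch below shows a minimal form of the T08 consistency rule. It assumes the checker can map each registered repo to its parsed classification file, or `None` when the file is missing. It also assumes the checker can reuse a T01-style validator. The rule id `C-CLASSIFICATION` and the `Finding` shape are hypothetical; the existing C-rules' conventions are not described in this workplan.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class Finding:
    rule: str
    repo: str
    message: str

def check_classification(
    repos: dict[str, Optional[dict[str, Any]]],
    validate: Callable[[dict[str, Any]], list[str]],
    rule_id: str = "C-CLASSIFICATION",  # hypothetical rule id
) -> list[Finding]:
    """Flag registered repos whose .repo-classification.yaml is missing or invalid."""
    findings: list[Finding] = []
    for repo, doc in sorted(repos.items()):
        if doc is None:
            findings.append(Finding(rule_id, repo, "missing .repo-classification.yaml"))
            continue
        for error in validate(doc):
            findings.append(Finding(rule_id, repo, f"invalid classification: {error}"))
    return findings
```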
### T09 - Cutover, verification, retire old model
```task
id: CUST-WP-0050-T09
status: todo
priority: medium
```
Switch orientation/registration tooling to the new model end-to-end, archive the
old domain semantics, and run `make fix-consistency REPO=the-custodian`.

Done when an end-to-end pass (classify → auto-register → dashboard view) is
verified and the old ad-hoc domain model is retired.
## Open Questions / Decisions
- **D1 (blocking T04/T05): topic ↔ market-domain after replacement.** Market
domains are coarse; coordination still needs finer grouping. Proposed: keep
`topic` as the coordination unit, made independent of market domain (market
domain becomes a `managed_repo` attribute; a topic may span repos of different
market domains). Needs confirmation before schema work starts.
- **D2: classification ownership/approval.** Who approves each repo's
`.repo-classification.yaml` — per-repo owner, or central custodian review?
- **D3: exclusion list.** Confirm exclusions (fork `tegwick/the-custodian`,
`lando_worsch/python-snake`, archived `test_domain_v2`, any inactive repos).
- **D4: behavioural vs descriptive.** Do `secondary_domains` / `capability_tags`
/ `business_stake` drive any Hub behaviour initially, or are they descriptive
until a later phase?
## Risks
- **Breaking-migration blast radius** — topics/workstreams/goals/decisions and
charter `topic_id` references all move; mitigate with a reviewed dry-run and a
tested downgrade (T05).
- **Cross-repo coordination** — T03 touches ~70 repos via PRs; sequence behind
T01/T02 so the vocabulary is stable first.
- **Consistency-checker coupling** — existing C-rules assume the current domain
model; update alongside (T08) to avoid mass false positives.
- **Boundary drift** — keep implementation in `state-hub`; this plan coordinates.