Add CUST-WP-0050: repo classification & registration redesign
Proposed workplan to adopt the Repo Classification Standard ecosystem-wide: per-repo .repo-classification.yaml as source of truth, State Hub domain model replaced by the standard's 14 market domains, auto-registration tooling, and reclassification of the 57 existing registrations. Folds in the 2026-06-21 discrepancy findings as reconciliation targets. Blocking design question D1 (topic vs market-domain) flagged for resolution before schema work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -0,0 +1,306 @@
---
id: CUST-WP-0050
type: workplan
title: "Repo Classification & State Hub Registration Redesign"
domain: custodian
repo: the-custodian
status: proposed
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 50
created: "2026-06-22"
updated: "2026-06-22"
---

# CUST-WP-0050 - Repo Classification & State Hub Registration Redesign

## Goal

Adopt the **Repo Classification Standard** (`canon/standards/repo-classification-standard_v1.0.md`,
`id: canon-repo-classification`) as the ecosystem-wide model for organising
repositories, and redesign State Hub registration around it so that:

- every known repository carries a committed `.repo-classification.yaml` that is
  the **source of truth** for its classification,
- the State Hub can **automatically register all known repos** by reading and
  validating those files (local checkout or Gitea API), and
- all **previously registered repos are reclassified** under the new standard,
  replacing the current ad-hoc 14-domain model.

End state: one principled, validated taxonomy (category · domain · capability
tags · business stake · business mechanics) spanning the whole portfolio, with
registration that is reproducible from repo-owned metadata rather than
hand-curated DB rows.

## Context

A 2026-06-21 review compared three views of the portfolio and found them out of
sync:

- **Gitea** hosts ~72 repos (70 under `coulomb/`, plus a fork and a personal repo).
- **State Hub** has 57 `managed_repos` across **14 ad-hoc domains** (custodian,
  railiance, markitect, coulomb_social, personhood, capabilities, canon,
  citation_evidence, helix_forge, inter_hub, netkingdom, stack,
  vergabe_teilnahme, whynot).
- **the-custodian** `canon/projects/` froze at the original **6 founding charters**.

Concrete discrepancies to resolve as part of this work:

- ~18 Gitea repos are **unregistered** (e.g. audit-core, binect-chrome, binect-js,
  coordination-engine, direkt-vermittlung-de, human-resources, polycode-sim,
  ralph-workplan, repo-seed, tegwick-control, tele-mcp, testdrive-jsui,
  timeline-svg, vantage-point, whynot-control, whynot-design).
- **Phantom / renamed** registrations: `markitect-project` (registered) vs
  `markitect-main` (Gitea), likely a rename; `railiance-bootstrap` and
  `railiance-hosts` are registered but absent from Gitea.
- **Duplicate domain**: `vergabe_teilnahme` looks like a second registration of
  `vergabe-teilnahme` (already under coulomb_social).
- **Empty domain**: `personhood` has a charter and topic but no repos.
- **Naming drift**: `coulomb.social`/`coulomb_social`, `foerster-capabilities`/`capabilities`.

The new standard fixes the root cause: it separates *category* (work mode),
*domain* (intended market/user), *capability tags* (what it does), and *business
stake* (who cares). The current 14 "domains" conflate these concerns.

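The inventory comparison behind the discrepancy list above can be sketched as plain set arithmetic. This is a minimal illustration, not the review's actual tooling: the function name `reconcile`, the `renames` parameter, and the three-repo sample inventory are assumptions, and the `markitect-project` to `markitect-main` rename is only the "likely" reading noted above.

```python
def reconcile(gitea_repos, registered_repos, renames=None):
    """Compare the Gitea inventory with State Hub registrations."""
    renames = renames or {}
    gitea = set(gitea_repos)
    # Apply presumed renames to registered names before comparing.
    registered = {renames.get(name, name) for name in registered_repos}
    return {
        "unregistered": sorted(gitea - registered),  # in Gitea, not in the Hub
        "phantom": sorted(registered - gitea),       # in the Hub, not in Gitea
        "matched": sorted(gitea & registered),
    }


# Tiny illustrative sample, not the real inventory.
gitea = ["audit-core", "markitect-main", "state-hub"]
registered = ["markitect-project", "railiance-hosts", "state-hub"]
report = reconcile(
    gitea,
    registered,
    renames={"markitect-project": "markitect-main"},  # assumed rename
)
print(report)
```

With this sample, `audit-core` lands in `unregistered` and `railiance-hosts` in `phantom`, which mirrors two of the discrepancy classes listed above.
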
### Architecture decision: replace the domain model

The standard's `domain` is a **fixed 14-value market vocabulary** (infotech,
financials, communication, consumer, health, industrials, energy, utilities,
materials, realestate, crypto, agents, space, government) that is *orthogonal* to
the Hub's current 14 coordination domains. Per the steering decision on
2026-06-22, the new market-domain vocabulary **replaces** the Hub's domain model
(rather than augmenting it or running a parallel two-axis model).

This is a **breaking migration**: the current `domains` table is 1:1 with
`topics`, and topics own workstreams, goals, decisions, and progress events. The
new market domains are coarse (most repos are `infotech`), so the old 1:1
domain↔topic assumption cannot survive unchanged. **Decoupling coordination
topics from the market-domain attribute is the central design problem of T04/T05**
(see Open Questions D1).

## Scope

In scope:

- Promote and steward the standard as custodian canon (done: the standard now
  lives at `canon/standards/repo-classification-standard_v1.0.md`).
- A single machine-readable allowed-values source derived from the standard,
  consumed by both the per-repo files and the Hub validator.
- A committed `.repo-classification.yaml` for every active repo (agent-assisted
  first pass, human-reviewed), authored in each repo.
- State Hub schema/model redesign replacing the domain model with the 14 market
  domains and storing the full classification on `managed_repos`.
- A reversible data migration re-homing existing topics/workstreams/goals/
  decisions/charters and resolving the discrepancies listed above.
- Auto-registration tooling (bulk, idempotent) that reads classification files
  from local checkouts or the Gitea API and registers/reclassifies repos.
- Updates to dashboard, consistency checker, MCP/REST surface, and orientation
  docs to the new taxonomy.

Out of scope:

- Re-architecting workstream/task semantics beyond what the domain replacement
  forces.
- Changing the Gitea hosting model or repo contents beyond adding the
  classification file.
- Classifying throwaway/forked/non-ecosystem repos (explicit exclusion list).

## Repo boundary

This is the **custodian driving/coordination workplan** (it owns the canon
standard and the portfolio decision), consistent with how `CUST-WP-0043` drove
State Hub work. Implementation tasks **T04–T08 execute in `/home/worsch/state-hub`**
and should be re-homed as a state-hub-local workplan once this plan is approved;
per-repo classification files (T02/T03) are authored in each target repo. The
hub remains a read/index model fed by repo-owned files (ADR-001).

## Tasks

### Phase 1 - Standard as a validation source

### T01 - Derive machine-readable allowed-values from the standard

```task
id: CUST-WP-0050-T01
status: todo
priority: high
```

Extract the standard's controlled vocabularies (5 categories, 14 domains, the
business_stake and business_mechanics enums, and the recommended capability
families) into a single machine-readable artefact (e.g.
`canon/standards/repo-classification.allowed.yaml`) that both the per-repo
`.repo-classification.yaml` linter and the State Hub validator import.

Done when a single allowed-values file exists, is referenced by the standard, and
a small validator can check a `.repo-classification.yaml` against it.

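The "small validator" in T01 could look roughly like the sketch below. It assumes the classification file has already been parsed into a dict and that its keys match the column names proposed in T04. Only the 14 domain values come from this workplan; the `category`, `business_stake`, and `business_mechanics` vocabularies are placeholders, and treating the list-valued fields as optional is also an assumption.

```python
# The 14 market domains named in this workplan.
MARKET_DOMAINS = {
    "infotech", "financials", "communication", "consumer", "health",
    "industrials", "energy", "utilities", "materials", "realestate",
    "crypto", "agents", "space", "government",
}


def validate_classification(doc, allowed):
    """Return a list of error strings; an empty list means the doc is valid."""
    errors = []
    # Single-valued fields must be present and hold one allowed value.
    for field in ("category", "domain"):
        value = doc.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif value not in allowed[field]:
            errors.append(f"{field}: {value!r} is not an allowed value")
    # List-valued fields are treated as optional; every entry must be allowed.
    list_fields = {
        "secondary_domains": "domain",
        "business_stake": "business_stake",
        "business_mechanics": "business_mechanics",
    }
    for field, vocab in list_fields.items():
        for value in doc.get(field, []):
            if value not in allowed[vocab]:
                errors.append(f"{field}: {value!r} is not an allowed value")
    return errors


# Placeholder vocabularies: the real values live in the standard.
allowed = {
    "domain": MARKET_DOMAINS,
    "category": {"example-category"},
    "business_stake": {"example-stake"},
    "business_mechanics": {"example-mechanic"},
}

good = {"category": "example-category", "domain": "infotech",
        "secondary_domains": ["agents"]}
bad = {"category": "example-category", "domain": "custodian"}

good_errors = validate_classification(good, allowed)
bad_errors = validate_classification(bad, allowed)
print(good_errors)
print(bad_errors)
```

Returning a list of errors rather than raising on the first one lets a bulk run report every problem in a file at once. Capability tags are left unchecked here because the workplan describes the capability families as recommended, not enforced.
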
### Phase 2 - Classify the portfolio (repo-owned source of truth)

### T02 - Classify custodian-owned repos

```task
id: CUST-WP-0050-T02
status: todo
priority: high
```

Author and human-review `.repo-classification.yaml` for the custodian-domain
repos (the-custodian, state-hub, hub-core, inter-hub, activity-core, issue-core,
kaizen-agentic, llm-connect, ops-bridge, ops-warden, email-connect) using the
standard's §16 agent prompt as a first pass.

Done when each custodian repo has a committed file that validates against T01 and
has been reviewed by a human.

### T03 - Classify the full Gitea inventory

```task
id: CUST-WP-0050-T03
status: todo
priority: high
```

Produce proposed `.repo-classification.yaml` for every active repo in the Gitea
`coulomb` org (~70), prioritising the 57 already-registered and the ~18
unregistered repos. Deliver as per-repo PRs for owner/human review. Maintain an
explicit **exclusion list** (forks, `lando_worsch/python-snake`, archived
`test_domain_v2`) recorded in this workplan.

Done when every non-excluded active repo has a committed, validated classification
file (or is on the recorded exclusion list).

### Phase 3 - State Hub redesign (executed in /home/worsch/state-hub)

### T04 - Redesign schema: replace domains, add classification

```task
id: CUST-WP-0050-T04
status: todo
priority: high
```

Replace the `domains` table contents with the 14 fixed market domains and add
classification storage to `managed_repos`: `category`, primary `domain_id`,
`secondary_domains[]`, `capability_tags[]`, `business_stake[]`,
`business_mechanics[]`, plus provenance (`classified_at`, `classified_by`,
`standard_version`). Enforce the allowed-values from T01 at the API boundary.
Decouple `topic` from market-domain (see D1). Provide an Alembic migration and
updated SQLAlchemy models + Pydantic schemas.

Done when the schema/model/API accept and validate the full classification and
reject invalid values, with a forward migration and a tested downgrade path.

### T05 - Migration mapping + data migration

```task
id: CUST-WP-0050-T05
status: todo
priority: high
```

Define and apply the mapping from the old 14 domains/topics to the new model
(guided by standard §15 Migration Notes), re-pointing existing topics,
workstreams, goals, decisions, progress events, and charter `topic_id`
references with **no orphaned workstreams**. Resolve the 2026-06-21 discrepancies:
reconcile `markitect-project`↔`markitect-main`, retire phantom
`railiance-bootstrap`/`railiance-hosts` (or relink), collapse the
`vergabe_teilnahme` duplicate, and decide `personhood`'s disposition (charter-only
vs retire).

Done when a dry-run migration report is reviewed and the applied migration leaves
zero orphaned coordination records; the discrepancy list is resolved or explicitly
deferred with reasons.

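The dry-run report that T05 asks for can be illustrated with a small orphan check. The sketch below is hypothetical throughout: `dry_run_report`, the workstream ids, and the sample mapping are invented for illustration, and the real mapping is whatever standard §15 and review produce. It only shows the shape of the "zero orphans" criterion.

```python
def dry_run_report(topic_mapping, workstreams):
    """Report how workstreams would be re-pointed, without changing anything.

    topic_mapping: old topic slug -> new topic slug
    workstreams:   workstream id  -> old topic slug
    """
    moved, orphaned = [], []
    for ws_id, old_topic in workstreams.items():
        new_topic = topic_mapping.get(old_topic)
        if new_topic is None:
            # No target topic defined yet: this record would be orphaned.
            orphaned.append(ws_id)
        else:
            moved.append((ws_id, old_topic, new_topic))
    return {"moved": moved, "orphaned": sorted(orphaned)}


# Hypothetical mapping: the duplicate topic is collapsed, and the
# disposition of `personhood` is still undecided.
topic_mapping = {
    "custodian": "custodian",
    "vergabe_teilnahme": "coulomb_social",
}
workstreams = {
    "ws-1": "custodian",
    "ws-2": "vergabe_teilnahme",
    "ws-3": "personhood",
}
report = dry_run_report(topic_mapping, workstreams)
print(report)
```

Here `ws-3` shows up under `orphaned` until `personhood` gets a decision, so the migration would not yet meet the done criterion.
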
### T06 - Auto-registration tooling

```task
id: CUST-WP-0050-T06
status: todo
priority: high
```

Build an idempotent `register-from-classification` capability (Make target +
script + MCP tool) that, given a repo (local path or Gitea API), reads
`.repo-classification.yaml`, validates against T01, and upserts the
`managed_repo` with full classification. Support a **bulk** run over the Gitea
inventory and reclassification of existing rows. Reuse the k3s/Gitea access path
documented during the 2026-06-21 review (Gitea runs in k3s on coulombcore;
reach it via `kubectl port-forward svc/gitea-http`).

Done when one command registers/reclassifies every repo with a valid file and
emits a report of registered / updated / skipped / invalid.

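The idempotent upsert and its four-way report might be shaped like the sketch below. It uses an in-memory dict in place of the `managed_repos` table and a stand-in `validate` function, so every name and classification here is an assumption. In particular, it interprets "skipped" as "stored classification already matches" and "invalid" as "file missing or failing validation", which the workplan does not spell out.

```python
def register_from_classification(registry, repos, validate):
    """Upsert each repo's classification into `registry` and tally outcomes.

    registry: repo name -> stored classification (stands in for managed_repos)
    repos:    repo name -> parsed classification dict, or None if no file
    validate: callable returning a list of errors (empty list means valid)
    """
    report = {"registered": [], "updated": [], "skipped": [], "invalid": []}
    for name in sorted(repos):
        doc = repos[name]
        if doc is None or validate(doc):
            report["invalid"].append(name)
        elif name not in registry:
            registry[name] = dict(doc)
            report["registered"].append(name)
        elif registry[name] != doc:
            registry[name] = dict(doc)
            report["updated"].append(name)
        else:
            # Nothing to change: this is what makes a re-run a no-op.
            report["skipped"].append(name)
    return report


def validate(doc):
    # Stand-in for the T01 validator; only checks the domain field.
    if doc.get("domain") in {"infotech", "agents"}:
        return []
    return ["domain: not an allowed value"]


# Hypothetical starting state and inventory.
registry = {"state-hub": {"domain": "agents"}}
repos = {
    "state-hub": {"domain": "infotech"},
    "audit-core": {"domain": "infotech"},
    "repo-seed": None,  # no classification file found
}
first = register_from_classification(registry, repos, validate)
second = register_from_classification(registry, repos, validate)
print(first)
print(second)
```

The second run reports only `skipped` and `invalid` entries, which is the property the done criterion relies on when the bulk command is re-run.
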
### T07 - Reclassify existing registrations

```task
id: CUST-WP-0050-T07
status: todo
priority: medium
```

Run T06 against the classification files for the 57 previously-registered repos,
reconciling each to the new taxonomy and retiring phantom/duplicate records.

Done when all previously-registered repos reflect their new classification and
the managed-repo set matches the (non-excluded) Gitea inventory.

### Phase 4 - Consuming surfaces & cutover

### T08 - Update dashboard, consistency checker, MCP/REST, docs

```task
id: CUST-WP-0050-T08
status: todo
priority: medium
```

Update the dashboard to navigate by category/domain/capability/business-stake;
add a consistency rule flagging registered repos lacking a valid
`.repo-classification.yaml`; expose list/filter-by-classification in MCP/REST; and
update orientation docs (`SCOPE.md`, `README.md`, `.claude/rules/*`) that
reference the old "domains".

Done when the dashboard renders the new taxonomy, the consistency checker has a
classification rule, and docs no longer assume the old domain model.

### T09 - Cutover, verification, retire old model

```task
id: CUST-WP-0050-T09
status: todo
priority: medium
```

Switch orientation/registration tooling to the new model end-to-end, archive the
old domain semantics, and run `make fix-consistency REPO=the-custodian`.

Done when an end-to-end pass (classify → auto-register → dashboard view) is
verified and the old ad-hoc domain model is retired.

## Open Questions / Decisions

- **D1 (blocking T04/T05): topic ↔ market-domain after replacement.** Market
  domains are coarse; coordination still needs finer grouping. Proposed: keep
  `topic` as the coordination unit, made independent of market domain (market
  domain becomes a `managed_repo` attribute; a topic may span repos of different
  market domains). Needs confirmation before schema work starts.
- **D2: classification ownership/approval.** Who approves each repo's
  `.repo-classification.yaml`: the per-repo owner, or central custodian review?
- **D3: exclusion list.** Confirm exclusions (fork `tegwick/the-custodian`,
  `lando_worsch/python-snake`, archived `test_domain_v2`, any inactive repos).
- **D4: behavioural vs descriptive.** Do `secondary_domains` / `capability_tags`
  / `business_stake` drive any Hub behaviour initially, or are they descriptive
  until a later phase?

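The D1 proposal above can be pictured with a minimal data model. This is only a sketch of the proposed shape, not the Hub's schema: the class names, fields, and the market domains assigned to the two sample repos are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedRepo:
    name: str
    domain: str  # market domain, stored as an attribute of the repo


@dataclass
class Topic:
    slug: str  # coordination unit, independent of any market domain
    repos: list = field(default_factory=list)

    def market_domains(self):
        """Distinct market domains of the repos this topic coordinates."""
        return sorted({repo.domain for repo in self.repos})


# Hypothetical classifications, for illustration only.
custodian = Topic(
    "custodian",
    [ManagedRepo("state-hub", "infotech"), ManagedRepo("kaizen-agentic", "agents")],
)
domains = custodian.market_domains()
print(domains)
```

In this shape a topic no longer needs a domain of its own; the set of market domains it touches is derived from its repos, so one topic can span several of them.
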
## Risks

- **Breaking-migration blast radius**: topics/workstreams/goals/decisions and
  charter `topic_id` references all move; mitigate with a reviewed dry-run (T05)
  and a tested downgrade (T04).
- **Cross-repo coordination**: T03 touches ~70 repos via PRs; sequence behind
  T01/T02 so the vocabulary is stable first.
- **Consistency-checker coupling**: existing C-rules assume the current domain
  model; update alongside (T08) to avoid mass false positives.
- **Boundary drift**: keep implementation in `state-hub`; this plan coordinates.