tegwick 4099179374 CUST-WP-0050: drop T03, re-home T04-T10 to STATE-WP-0065, add T11
Per 2026-06-22 review: T03 dropped (registering unregistered repos under the
old model = legacy to clean up). Implementation re-homed to state-hub-local
STATE-WP-0065; T04/T05/T10 merged into one spine migration (P1). CUST-WP-0050
stays the coordination driver. T11 (post-cutover inventory) replaces T03.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 11:52:55 +02:00

---
id: CUST-WP-0050
type: workplan
title: "Repo Classification & State Hub Registration Redesign"
domain: custodian
repo: the-custodian
status: active
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 50
created: "2026-06-22"
updated: "2026-06-22"
started: "2026-06-22"
state_hub_workstream_id: "9f031f48-8de8-48b6-8e69-d2d83ad70a7a"
---
# CUST-WP-0050 - Repo Classification & State Hub Registration Redesign
## Goal
Adopt the **Repo Classification Standard** (`canon/standards/repo-classification-standard_v1.0.md`,
`id: canon-repo-classification`) as the ecosystem-wide model for organising
repositories, and redesign State Hub registration around it so that:
- every known repository carries a committed `.repo-classification.yaml` that is
the **source of truth** for its classification,
- the State Hub can **automatically register all known repos** by reading and
validating those files (local checkout or Gitea API), and
- all **previously registered repos are reclassified** under the new standard,
replacing the current ad-hoc 14-domain model.
End state: one principled, validated taxonomy (category · domain · capability
tags · business stake · business mechanics) spanning the whole portfolio, with
registration that is reproducible from repo-owned metadata rather than
hand-curated DB rows.
## Context
A 2026-06-21 review compared three views of the portfolio and found them out of
sync:
- **Gitea** hosts ~72 repos (70 under `coulomb/`, plus a fork and a personal repo).
- **State Hub** has 57 `managed_repos` across **14 ad-hoc domains** (custodian,
railiance, markitect, coulomb_social, personhood, capabilities, canon,
citation_evidence, helix_forge, inter_hub, netkingdom, stack,
vergabe_teilnahme, whynot).
- **the-custodian** `canon/projects/` froze at the original **6 founding charters**.
Concrete discrepancies to resolve as part of this work:
- ~18 Gitea repos are **unregistered** (e.g. audit-core, binect-chrome, binect-js,
coordination-engine, direkt-vermittlung-de, human-resources, polycode-sim,
ralph-workplan, repo-seed, tegwick-control, tele-mcp, testdrive-jsui,
timeline-svg, vantage-point, whynot-control, whynot-design).
- **Phantom / renamed** registrations: `markitect-project` (registered) vs
`markitect-main` (Gitea) — likely a rename; `railiance-bootstrap` and
`railiance-hosts` registered but absent from Gitea.
- **Duplicate domain**: `vergabe_teilnahme` looks like a second registration of
`vergabe-teilnahme` (already under coulomb_social).
- **Empty domain**: `personhood` has a charter and topic but no repos.
- **Naming drift**: `coulomb.social`/`coulomb_social`, `foerster-capabilities`/`capabilities`.
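The "unregistered" and "phantom" items above amount to a set comparison between the two inventories. The sketch below is a minimal version of that comparison. It assumes both inventories are available as sets of repo names and that known renames are passed in as an explicit hub-name → Gitea-name map. `diff_inventories` and `InventoryDiff` are hypothetical names, not existing tooling.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryDiff:
    unregistered: frozenset[str]  # on Gitea but not in managed_repos
    phantom: frozenset[str]       # in managed_repos but not on Gitea


def diff_inventories(gitea: set[str], hub: set[str],
                     renames: dict[str, str] | None = None) -> InventoryDiff:
    """Compare repo-name sets; `renames` maps a hub name to its current Gitea name."""
    renames = renames or {}
    # Resolve known renames first so a renamed repo is not reported on both sides.
    hub_resolved = {renames.get(name, name) for name in hub}
    return InventoryDiff(
        unregistered=frozenset(gitea - hub_resolved),
        phantom=frozenset(hub_resolved - gitea),
    )
```

Passing renames in explicitly keeps `markitect-project` → `markitect-main` a reviewed decision rather than a fuzzy match. The duplicate-domain and naming-drift cases still need human judgement and are not covered here.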
The new standard fixes the root cause: it separates *category* (work mode),
*domain* (intended market/user), *capability tags* (what it does), and *business
stake* (who cares), which are concerns the current 14 "domains" conflate.
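The record below is a hypothetical in-memory shape for one repo's classification, with one field per concern. Field names follow the keys this plan mentions (`category`, `domain`, `secondary_domains`, `capability_tags`, `business_stake`, `business_mechanics`, `classified_by`). The standard itself is authoritative, so treat this as an illustration only.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoClassification:
    """Hypothetical record: each field answers one independent question."""
    category: str                            # work mode, e.g. "research", "tooling"
    domain: str                              # primary market/user domain (one of 14)
    secondary_domains: tuple[str, ...] = ()  # further market domains, if any
    capability_tags: tuple[str, ...] = ()    # what the repo does (kebab-case)
    business_stake: str | None = None        # who cares (enum from the standard)
    business_mechanics: str | None = None    # enum from the standard
    classified_by: str = "agent"             # "agent" first pass, "human" after review


# Two T02 outcomes in this shape. Under the old model both repos sat in the
# ad-hoc "custodian" domain; here work mode and market are separate facets.
STATE_HUB = RepoClassification(category="tooling", domain="infotech", classified_by="human")
KAIZEN_AGENTIC = RepoClassification(
    category="tooling", domain="agents", secondary_domains=("infotech",), classified_by="human"
)
```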
### Architecture decision: repo-anchored model, domain derived from classification
The standard's `domain` is a **fixed 14-value market vocabulary** (infotech,
financials, communication, consumer, health, industrials, energy, utilities,
materials, realestate, crypto, agents, space, government) that is *orthogonal* to
the Hub's current 14 coordination domains. Per the steering decision on
2026-06-22, the new market-domain vocabulary **replaces** the Hub's domain model
(rather than augmenting it or running a parallel two-axis model).
The current spine is `Domain → Topic → Workstream`, where `topics.domain_id` and
`workstreams.topic_id` are both **NOT NULL** and the 14 domains are seeded **1:1
with 14 topics** (a data convention — the schema actually allows many topics per
domain, but that has never been used). `workstreams.repo_id` and
`repo_goals.repo_id` already exist, but the *required* anchor is the soft,
hub-only `topic`, while the stable git-managed `repo` link is optional.
Per the 2026-06-22 steering decision, this redesign **flips the polarity**: the
**repo becomes the primary anchor** for workplans, and market-domain is
**derived** from the repo's `.repo-classification.yaml`, not stored as a separate
`topic`/`domain` parent. Concretely:
- `workstreams.repo_id` becomes the **required** anchor; `topic_id` is demoted to
optional (or `topic` is retired) — see T10.
- Market-domain is computed from `repo → classification.domain`; the standalone
`topics.domain_id` / `managed_repos.domain_id` spine is removed.
- `RepoGoal` (already repo-anchored) becomes the goal primitive; `DomainGoal`
becomes a thin strategic rollup keyed by the 14 market domains.
- **Cross-repo workplans** anchor to a dedicated **project repo** that retires to
archive on completion, with results living on in the modified product repos —
see **ADR-005** and Open Questions D1/D1a.
This is consistent with ADR-001: the spine becomes the git-managed repo plus its
committed classification file, so the Hub stays fully rebuildable from repo-owned
files. It is a **breaking migration** of the coordination spine (originally
T04/T05, now STATE-WP-0065 P1).
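The sketch below shows what "derived, not stored" could look like in practice. It assumes each `managed_repos` row keeps its parsed classification and every workplan carries a required `repo_id`. All class and function names are hypothetical. The rollup mirrors the "thin `DomainGoal`" idea by grouping workplans under the market domain of their anchor repo.

```python
from __future__ import annotations
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ManagedRepo:
    repo_id: str
    name: str
    classification: dict = field(default_factory=dict)  # parsed .repo-classification.yaml


@dataclass(frozen=True)
class Workplan:
    workplan_id: str
    repo_id: str               # required anchor (D1)
    topic: str | None = None   # optional cross-repo tag, or retired entirely


def market_domain(workplan: Workplan, repos: dict[str, ManagedRepo]) -> str:
    """Derive the market domain by following workplan -> repo -> classification."""
    return repos[workplan.repo_id].classification["domain"]


def rollup_by_domain(workplans: list[Workplan],
                     repos: dict[str, ManagedRepo]) -> dict[str, list[str]]:
    """Thin DomainGoal-style rollup: workplan ids grouped by derived market domain."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for wp in workplans:
        grouped[market_domain(wp, repos)].append(wp.workplan_id)
    return dict(grouped)
```

No row below the repo stores a domain. Reclassifying a repo (by editing its committed file) therefore moves all of its workplans to the new market domain without a data migration.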
## Scope
In scope:
- Promote and steward the standard as custodian canon (done: the standard now
lives at `canon/standards/repo-classification-standard_v1.0.md`).
- A single machine-readable allowed-values source derived from the standard,
consumed by both the per-repo files and the Hub validator.
- A committed `.repo-classification.yaml` for every active repo (agent-assisted
first pass, human-reviewed), authored in each repo.
- State Hub schema/model redesign replacing the domain model with the 14 market
domains and storing the full classification on `managed_repos`.
- A reversible data migration re-homing existing topics/workstreams/goals/
decisions/charters and resolving the discrepancies listed above.
- Auto-registration tooling (bulk, idempotent) that reads classification files
from local checkouts or the Gitea API and registers/reclassifies repos.
- Updates to dashboard, consistency checker, MCP/REST surface, and orientation
docs to the new taxonomy.
In scope (added 2026-06-22):
- Re-anchor workplans to repos (`repo_id` required, `topic` optional/retired) and
derive market-domain from classification (T04/T10).
- Rename `workstream → workplan` across schema, API, and MCP so the Hub
vocabulary matches the repo files and current usage (T10).
Out of scope:
- Re-architecting task-level semantics beyond what the re-anchor and rename force.
- Changing the Gitea hosting model or repo contents beyond adding the
classification file (and, for cross-repo efforts, creating project repos per
ADR-005).
- Classifying throwaway/forked/non-ecosystem repos (explicit exclusion list).
## Repo boundary
This is the **custodian driving/coordination workplan** (it owns the canon
standard and the portfolio decision), consistent with how `CUST-WP-0043` drove
State Hub work. Implementation tasks **T04-T10 execute in `/home/worsch/state-hub`**
and have been re-homed to the state-hub-local workplan STATE-WP-0065 (see the
re-homing table under Tasks). Per-repo classification files (T02, and T11 after
cutover) are authored in each target repo. The hub remains a read/index model fed
by repo-owned files (ADR-001).
## Tasks
### Phase 1 — Standard as a validation source
### T01 - Derive machine-readable allowed-values from the standard
```task
id: CUST-WP-0050-T01
status: done
priority: high
state_hub_task_id: "d978b1f3-4eca-4a17-835b-2c25d13cae22"
```
Extract the standard's controlled vocabularies (the categories, 5 at the time and
6 since `tooling` was added under T02; the 14 domains; the business_stake and
business_mechanics enums; and the recommended capability
families) into a single machine-readable artefact (e.g.
`canon/standards/repo-classification.allowed.yaml`) that both the per-repo
`.repo-classification.yaml` linter and the State Hub validator import.
Done when a single allowed-values file exists, is referenced by the standard, and
a small validator can check a `.repo-classification.yaml` against it.
**Delivered (2026-06-22):** `canon/standards/repo-classification.allowed.yaml`
(categories, domains, business_stake, business_mechanics, capability families,
guidance bounds); referenced from the standard §12; validator
`tools/validate_repo_classification.py` (stdlib + PyYAML) with `--self-test`
(PASS) — checks category/domain enums, secondary-domain rules, kebab-case tags,
and stake/mechanics enums.
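The delivered validator is `tools/validate_repo_classification.py`. What follows is an independent, stdlib-only sketch of the same kinds of checks, not that script. It takes an already-parsed record (YAML loading is omitted) plus an allowed-values mapping.

Two parts of it are assumptions:
- The secondary-domain rules encoded here (entries must be valid domains, must not repeat the primary, and must not repeat each other). The standard's actual rules may differ.
- The category set, which lists only the categories named in this plan. It stands in for the full list in `repo-classification.allowed.yaml`.

```python
from __future__ import annotations
import re

# The 14 market domains listed in this plan.
MARKET_DOMAINS = frozenset({
    "infotech", "financials", "communication", "consumer", "health",
    "industrials", "energy", "utilities", "materials", "realestate",
    "crypto", "agents", "space", "government",
})

# Illustrative allowed values; only categories named in this plan are listed.
ALLOWED: dict[str, frozenset[str]] = {
    "category": frozenset({"research", "project", "tooling", "product"}),
    "domain": MARKET_DOMAINS,
}

KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
ENUM_FIELDS = ("category", "domain", "business_stake", "business_mechanics")


def validate(record: dict, allowed: dict[str, frozenset[str]]) -> list[str]:
    """Return human-readable errors; an empty list means the record passed."""
    errors: list[str] = []
    for key in ("category", "domain"):
        if key not in record:
            errors.append(f"missing required field: {key}")
    # Enum fields are checked only when present in the record.
    for key in ENUM_FIELDS:
        value = record.get(key)
        if value is not None and (not isinstance(value, str)
                                  or value not in allowed.get(key, frozenset())):
            errors.append(f"{key}: {value!r} is not an allowed value")
    # Assumed secondary-domain rules: valid, distinct, and not the primary.
    primary = record.get("domain")
    secondary = list(record.get("secondary_domains", []))
    if len(set(secondary)) != len(secondary):
        errors.append("secondary_domains: duplicate entries")
    for dom in secondary:
        if dom == primary:
            errors.append(f"secondary_domains: {dom!r} repeats the primary domain")
        elif dom not in allowed.get("domain", frozenset()):
            errors.append(f"secondary_domains: {dom!r} is not an allowed domain")
    for tag in record.get("capability_tags", []):
        if not isinstance(tag, str) or not KEBAB_CASE.fullmatch(tag):
            errors.append(f"capability_tags: {tag!r} is not kebab-case")
    return errors
```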
### Phase 2 — Classify the portfolio (repo-owned source of truth)
### T02 - Classify custodian-owned repos
```task
id: CUST-WP-0050-T02
status: done
priority: high
state_hub_task_id: "b7edfbb5-483f-4600-9356-8f885c78ce58"
```
Author and human-review `.repo-classification.yaml` for the custodian-domain
repos (the-custodian, state-hub, hub-core, inter-hub, activity-core, issue-core,
kaizen-agentic, llm-connect, ops-bridge, ops-warden, email-connect) using the
standard's §16 agent prompt as a first pass.
Done when each custodian repo has a committed file that validates against T01 and
has been reviewed by a human.
**Progress (2026-06-22):** all 11 custodian-domain repos now carry a committed,
validated `.repo-classification.yaml` (first-pass `classified_by: agent`).
Following the 2026-06-22 decision, a new **`tooling`** category (between `project`
and `product`) was added to the standard for reusable internal
tooling/infrastructure, and the nine tooling repos were reclassified to it. The
resulting category·domain assignments for all 11 repos are: the-custodian
(research·infotech), inter-hub (research·infotech), state-hub (tooling·infotech),
hub-core (tooling·infotech), activity-core (tooling·infotech), issue-core
(tooling·infotech), kaizen-agentic (tooling·agents), llm-connect (tooling·agents),
ops-bridge (tooling·infotech), ops-warden (tooling·infotech), email-connect
(tooling·infotech). Commits are **local-only** in each repo (not yet pushed to
Gitea).
**Done (2026-06-22):** human review complete — Bernd confirmed the
agents-vs-infotech primary-domain choice, keeping both kaizen-agentic and
llm-connect as `agents` primary (`infotech` secondary). All 11 files flipped to
`classified_by: human` and re-validated clean against T01. Task **done**.
### T03 - Classify the full Gitea inventory — DROPPED
```task
id: CUST-WP-0050-T03
status: cancel
priority: high
state_hub_task_id: "81489716-61ef-4207-ab8a-5877843281de"
```
**Dropped 2026-06-22.** T03 conflated *authoring classification files* with
*registering repos*. Doing a ~70-repo PR storm before the vocabulary and schema
are proven is premature and high-blast-radius, and registering the ~18
unregistered repos now would land them in the **old** domain model — creating
legacy only to clean it up. Classifying + registering the remaining inventory is
deferred to **T11**, executed **under the new model after cutover**. The 11
custodian fixtures (T02) plus T01 are sufficient to build and prove the redesign.
### Phases 3-4 — RE-HOMED to STATE-WP-0065 (state-hub)
Per this plan's repo boundary and the 2026-06-22 decision, the implementation of
the State Hub redesign now lives in a **state-hub-local workplan**:
`state-hub/workplans/STATE-WP-0065-repo-anchored-classification-spine.md`
(workstream `8dc7d106-11e2-41df-b512-89ed69d2a65f`). CUST-WP-0050 remains the
**coordination driver** (canon standard, decisions D1/D1a, ADR-005). The original
implementation tasks below are **cancelled here** (re-homed, not abandoned); the
efficient regrouping merges the three spine-rewriting tasks into one migration:
| Was (CUST-WP-0050) | Now (STATE-WP-0065) |
| --- | --- |
| T04 schema + T05 data migration + T10 re-anchor/rename | **P1** single Alembic spine migration |
| T04/T10 API + validation surface | **P2** API / MCP / validation |
| T06 auto-registration | **P3** auto-registration tooling |
| T07 reclassify existing | folded into **P3** (lazy, as committed files appear) |
| T08 surfaces + T09 cutover | **P4** surfaces & cutover |
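P1 folds three changes into one migration window. The real migration is Alembic against the Hub database. The sketch below only illustrates the shape of the re-anchor plus rename on a toy SQLite schema, and its table and column names are simplified assumptions.

It runs inside one transaction:
1. It backfills `repo_id` from a reviewed topic → repo map.
2. It refuses to proceed while any row is still unanchored.
3. It rebuilds `workstreams` as `workplans`, with `repo_id NOT NULL` and `topic_id` optional.

```python
from __future__ import annotations
import sqlite3


def migrate_spine(conn: sqlite3.Connection, topic_to_repo: dict[int, int]) -> None:
    """Toy sketch: re-anchor workstreams on repos and rename them to workplans."""
    conn.isolation_level = None  # manual transaction control (commits anything pending)
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        # 1. Backfill missing repo anchors from an explicit, reviewed mapping.
        for topic_id, repo_id in topic_to_repo.items():
            cur.execute(
                "UPDATE workstreams SET repo_id = ? "
                "WHERE repo_id IS NULL AND topic_id = ?",
                (repo_id, topic_id),
            )
        # 2. Abort if any row would violate the new required anchor.
        (orphans,) = cur.execute(
            "SELECT COUNT(*) FROM workstreams WHERE repo_id IS NULL"
        ).fetchone()
        if orphans:
            raise RuntimeError(f"{orphans} workstream(s) have no repo anchor")
        # 3. Rebuild: repo_id becomes NOT NULL, topic_id becomes optional.
        cur.execute(
            """CREATE TABLE workplans (
                   id       INTEGER PRIMARY KEY,
                   title    TEXT NOT NULL,
                   repo_id  INTEGER NOT NULL REFERENCES managed_repos(id),
                   topic_id INTEGER REFERENCES topics(id)
               )"""
        )
        cur.execute(
            "INSERT INTO workplans (id, title, repo_id, topic_id) "
            "SELECT id, title, repo_id, topic_id FROM workstreams"
        )
        cur.execute("DROP TABLE workstreams")
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")  # undo the backfill and any DDL as one unit
        raise
```

SQLite cannot add `NOT NULL` to an existing column in place, hence the create-copy-drop rebuild. Databases that support `ALTER COLUMN ... SET NOT NULL` (e.g. PostgreSQL) can skip that step. The orphan guard is what makes a dry run meaningful, because it fails before any structural change is committed.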
```task
id: CUST-WP-0050-T04
status: cancel
priority: high
state_hub_task_id: "b61f6267-c2b2-4325-95fa-30ee899ce7d1"
```
Re-homed → STATE-WP-0065 P1 (schema: domains→14 market domains + classification on `managed_repos`).
```task
id: CUST-WP-0050-T05
status: cancel
priority: high
state_hub_task_id: "171fa385-4d78-41ea-b749-ac3f9082fe47"
```
Re-homed → STATE-WP-0065 P1 (data migration + discrepancy resolution, same window as schema).
```task
id: CUST-WP-0050-T06
status: cancel
priority: high
state_hub_task_id: "6ae14007-d6d2-4395-814e-ace91486a953"
```
Re-homed → STATE-WP-0065 P3 (`register-from-classification` bulk/idempotent tooling).
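The sketch below shows the idempotent core of such a tool. It assumes classification files have already been read and validated into a mapping of repo name → record; reading them from checkouts or the Gitea API is omitted. `register_from_classification` and the single-table schema are hypothetical.

The upsert is keyed on repo name. Re-running with unchanged input therefore leaves the table unchanged. Re-running after a file changes reclassifies that repo, which is the lazy behaviour that T07 was folded into.

```python
from __future__ import annotations
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS managed_repos (
    name           TEXT PRIMARY KEY,
    category       TEXT NOT NULL,
    domain         TEXT NOT NULL,
    classification TEXT NOT NULL  -- full record, stored as canonical JSON
)"""

# UPSERT syntax requires SQLite 3.24 or newer.
UPSERT = """
INSERT INTO managed_repos (name, category, domain, classification)
VALUES (?, ?, ?, ?)
ON CONFLICT(name) DO UPDATE SET
    category       = excluded.category,
    domain         = excluded.domain,
    classification = excluded.classification
"""


def register_from_classification(conn: sqlite3.Connection,
                                 records: dict[str, dict]) -> int:
    """Register or reclassify every repo in `records`; return the row count."""
    conn.execute(SCHEMA)
    with conn:  # one transaction: commit on success, roll back on error
        for name, record in sorted(records.items()):
            conn.execute(
                UPSERT,
                (name, record["category"], record["domain"],
                 json.dumps(record, sort_keys=True)),
            )
    return conn.execute("SELECT COUNT(*) FROM managed_repos").fetchone()[0]
```

This sketch never deletes rows. A repo whose file disappears stays registered until someone decides otherwise, which keeps a bulk run from silently dropping registrations.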
```task
id: CUST-WP-0050-T07
status: cancel
priority: medium
state_hub_task_id: "6411bf3f-9de2-4bcd-9ffe-6209cda6ba93"
```
Re-homed → STATE-WP-0065 P3 (lazy reclassification of existing registrations as committed files appear).
```task
id: CUST-WP-0050-T08
status: cancel
priority: medium
state_hub_task_id: "09951aec-2960-4c50-b73d-4e2e7bd285c9"
```
Re-homed → STATE-WP-0065 P4 (dashboard, consistency rule, MCP/REST filters, docs).
```task
id: CUST-WP-0050-T09
status: cancel
priority: medium
state_hub_task_id: "babbb80a-c52d-4ec2-b217-2f6196a2e5f3"
```
Re-homed → STATE-WP-0065 P4 (cutover, verification, retire old model).
```task
id: CUST-WP-0050-T10
status: cancel
priority: high
state_hub_task_id: "bee16416-a67f-4155-93d7-09f278daa04f"
```
Re-homed → STATE-WP-0065 P1 (re-anchor `repo_id` required + `workstream → workplan` rename, merged into the spine migration).
### Phase 4 (custodian) — Post-cutover inventory
### T11 - Classify & register remaining Gitea inventory (post-cutover)
```task
id: CUST-WP-0050-T11
status: todo
priority: medium
```
Replaces dropped T03. **After** STATE-WP-0065 cutover proves the new model,
author `.repo-classification.yaml` for the remaining active Gitea repos (the ~18
unregistered + any not yet migrated) and bulk-register them via the
STATE-WP-0065 P3 tooling — **under the new model**, no legacy detour. Maintain an
explicit **exclusion list** (fork `tegwick/the-custodian`,
`lando_worsch/python-snake`, archived `test_domain_v2`, inactive repos).
Done when every non-excluded active Gitea repo has a committed, validated
classification file and a `managed_repo` row under the new taxonomy (or is on the
recorded exclusion list).
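The done-criterion reduces to two set differences over the non-excluded inventory. The sketch below assumes `owner/name` identifiers, matching how the exclusion list names `tegwick/the-custodian`. The function names are hypothetical.

```python
from __future__ import annotations


def t11_outstanding(active: set[str], classified: set[str],
                    registered: set[str], excluded: set[str]) -> dict[str, set[str]]:
    """Return in-scope repos still lacking a validated file or a managed_repo row."""
    in_scope = active - excluded
    return {
        "missing_classification": in_scope - classified,
        "missing_registration": in_scope - registered,
    }


def t11_done(active: set[str], classified: set[str],
             registered: set[str], excluded: set[str]) -> bool:
    """T11 is done when nothing in scope is outstanding on either axis."""
    outstanding = t11_outstanding(active, classified, registered, excluded)
    return not any(outstanding.values())
```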
## Open Questions / Decisions
- **D1 (RESOLVED 2026-06-22): the repo is the primary anchor.** Workplans bind to
repos (`repo_id` required); market-domain is *derived* from the repo's
classification; `topic`/`domain` stop being the spine (`topic` retires or
becomes an optional cross-repo tag). This supersedes the earlier "keep topic as
an independent coordination unit" proposal. Originally T04/T10; now implemented
by STATE-WP-0065 P1.
- **D1a (open, follows from D1): anchor for cross-repo workplans.** Per **ADR-005**,
a complex cross-repo effort gets its own **project repo** (`category: project`)
as the anchor, retired to archive on completion with results living in the
modified product repos. Open sub-point: the project-repo **naming convention**
(e.g. `proj-<slug>` vs a dedicated grouping) and the archival trigger details.
- **D2: classification ownership/approval.** Who approves each repo's
`.repo-classification.yaml` — per-repo owner, or central custodian review?
- **D3: exclusion list.** Confirm exclusions (fork `tegwick/the-custodian`,
`lando_worsch/python-snake`, archived `test_domain_v2`, any inactive repos).
- **D4: behavioural vs descriptive.** Do `secondary_domains` / `capability_tags`
/ `business_stake` drive any Hub behaviour initially, or are they descriptive
until a later phase?
## Risks
- **Breaking-migration blast radius** — topics/workstreams/goals/decisions and
charter `topic_id` references all move; mitigate with a reviewed dry-run and a
tested downgrade (STATE-WP-0065 P1, formerly T05).
- **Cross-repo coordination** — T11 (which replaces the dropped T03) touches most
of the ~70 Gitea repos via PRs; sequence it behind T01/T02 and the STATE-WP-0065
cutover so the vocabulary and schema are stable first.
- **Consistency-checker coupling** — existing C-rules assume the current domain
model; update them alongside the redesign (STATE-WP-0065 P4, formerly T08) to
avoid mass false positives.
- **Boundary drift** — keep implementation in `state-hub`; this plan coordinates.