---
id: CUST-WP-0050
type: workplan
title: "Repo Classification & State Hub Registration Redesign"
domain: infotech
repo: the-custodian
status: finished
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 50
created: "2026-06-22"
updated: "2026-06-22"
started: "2026-06-22"
finished: "2026-06-22"
state_hub_workstream_id: "9f031f48-8de8-48b6-8e69-d2d83ad70a7a"
---

# CUST-WP-0050 - Repo Classification & State Hub Registration Redesign

## Goal

Adopt the **Repo Classification Standard** (`canon/standards/repo-classification-standard_v1.0.md`, `id: canon-repo-classification`) as the ecosystem-wide model for organising repositories, and redesign State Hub registration around it so that:

- every known repository carries a committed `.repo-classification.yaml` that is the **source of truth** for its classification,
- the State Hub can **automatically register all known repos** by reading and validating those files (local checkout or Gitea API), and
- all **previously registered repos are reclassified** under the new standard, replacing the current ad-hoc 14-domain model.

End state: one principled, validated taxonomy (category · domain · capability tags · business stake · business mechanics) spanning the whole portfolio, with registration that is reproducible from repo-owned metadata rather than hand-curated DB rows.

## Context

A 2026-06-21 review compared three views of the portfolio and found them out of sync:

- **Gitea** hosts ~72 repos (70 under `coulomb/`, plus a fork and a personal repo).
- **State Hub** has 57 `managed_repos` across **14 ad-hoc domains** (custodian, railiance, markitect, coulomb_social, personhood, capabilities, canon, citation_evidence, helix_forge, inter_hub, netkingdom, stack, vergabe_teilnahme, whynot).
- **the-custodian** `canon/projects/` froze at the original **6 founding charters**.

Concrete discrepancies to resolve as part of this work:

- ~18 Gitea repos are **unregistered** (e.g.
  audit-core, binect-chrome, binect-js, coordination-engine, direkt-vermittlung-de, human-resources, polycode-sim, ralph-workplan, repo-seed, tegwick-control, tele-mcp, testdrive-jsui, timeline-svg, vantage-point, whynot-control, whynot-design).
- **Phantom / renamed** registrations: `markitect-project` (registered) vs `markitect-main` (Gitea), likely a rename; `railiance-bootstrap` and `railiance-hosts` registered but absent from Gitea.
- **Duplicate domain**: `vergabe_teilnahme` looks like a second registration of `vergabe-teilnahme` (already under coulomb_social).
- **Empty domain**: `personhood` has a charter and topic but no repos.
- **Naming drift**: `coulomb.social`/`coulomb_social`, `foerster-capabilities`/`capabilities`.

The new standard fixes the root cause: it separates *category* (work mode), *domain* (intended market/user), *capability tags* (what it does), and *business stake* (who cares), concerns that the current 14 "domains" conflate.

### Architecture decision: repo-anchored model, domain derived from classification

The standard's `domain` is a **fixed 14-value market vocabulary** (infotech, financials, communication, consumer, health, industrials, energy, utilities, materials, realestate, crypto, agents, space, government) that is *orthogonal* to the Hub's current 14 coordination domains. Per the steering decision on 2026-06-22, the new market-domain vocabulary **replaces** the Hub's domain model (rather than augmenting it or running a parallel two-axis model).

The current spine is `Domain → Topic → Workstream`, where `topics.domain_id` and `workstreams.topic_id` are both **NOT NULL** and the 14 domains are seeded **1:1 with 14 topics** (a data convention: the schema actually allows many topics per domain, but that has never been used). `workstreams.repo_id` and `repo_goals.repo_id` already exist, but the *required* anchor is the soft, hub-only `topic`, while the stable git-managed `repo` link is optional.
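
To make the shape of the current spine concrete, here is a minimal sketch using plain dataclasses rather than the Hub's real models. The class names, the `domain_of` helper, and the sample ids are hypothetical; only the required/optional split of `domain_id`, `topic_id`, and `repo_id` comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Topic:
    id: str
    domain_id: str  # mirrors topics.domain_id NOT NULL


@dataclass
class Workstream:
    id: str
    topic_id: str                  # mirrors workstreams.topic_id NOT NULL (required anchor)
    repo_id: Optional[str] = None  # the repo link exists but is optional today


def domain_of(workstream: Workstream, topics: dict) -> str:
    """Resolve a workstream's domain the current way: through its topic."""
    return topics[workstream.topic_id].domain_id


# Hypothetical sample data: one topic seeded 1:1 with its domain.
topics = {"topic-custodian": Topic(id="topic-custodian", domain_id="custodian")}
ws = Workstream(id="ws-1", topic_id="topic-custodian")

print(domain_of(ws, topics), ws.repo_id)
```

In this shape a workstream can exist with no repo at all, which is the weakness the rest of this section addresses.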

Per the 2026-06-22 steering decision, this redesign **flips the polarity**: the **repo becomes the primary anchor** for workplans, and market-domain is **derived** from the repo's `.repo-classification.yaml`, not stored as a separate `topic`/`domain` parent. Concretely:

- `workstreams.repo_id` becomes the **required** anchor; `topic_id` is demoted to optional (or `topic` is retired); see T10.
- Market-domain is computed from `repo → classification.domain`; the standalone `topics.domain_id` / `managed_repos.domain_id` spine is removed.
- `RepoGoal` (already repo-anchored) becomes the goal primitive; `DomainGoal` becomes a thin strategic rollup keyed by the 14 market domains.
- **Cross-repo workplans** anchor to a dedicated **project repo** that retires to archive on completion, with results living on in the modified product repos; see **ADR-005** and Open Questions D1/D1a.

This is consistent with ADR-001: the spine becomes the git-managed repo plus its committed classification file, so the Hub stays fully rebuildable from repo-owned files. It is a **breaking migration** of the coordination spine (T04/T05).

## Scope

In scope:

- Promote and steward the standard as custodian canon (done: the standard now lives at `canon/standards/repo-classification-standard_v1.0.md`).
- A single machine-readable allowed-values source derived from the standard, consumed by both the per-repo files and the Hub validator.
- A committed `.repo-classification.yaml` for every active repo (agent-assisted first pass, human-reviewed), authored in each repo.
- State Hub schema/model redesign replacing the domain model with the 14 market domains and storing the full classification on `managed_repos`.
- A reversible data migration re-homing existing topics/workstreams/goals/decisions/charters and resolving the discrepancies listed above.
- Auto-registration tooling (bulk, idempotent) that reads classification files from local checkouts or the Gitea API and registers/reclassifies repos.
- Updates to dashboard, consistency checker, MCP/REST surface, and orientation docs to the new taxonomy.

In scope (added 2026-06-22):

- Re-anchor workplans to repos (`repo_id` required, `topic` optional/retired) and derive market-domain from classification (T04/T10).
- Rename `workstream → workplan` across schema, API, and MCP so the Hub vocabulary matches the repo files and current usage (T10).

Out of scope:

- Re-architecting task-level semantics beyond what the re-anchor and rename force.
- Changing the Gitea hosting model or repo contents beyond adding the classification file (and, for cross-repo efforts, creating project repos per ADR-005).
- Classifying throwaway/forked/non-ecosystem repos (explicit exclusion list).

## Repo boundary

This is the **custodian driving/coordination workplan** (it owns the canon standard and the portfolio decision), consistent with how `CUST-WP-0043` drove State Hub work. Implementation tasks **T04–T08 execute in `/home/worsch/state-hub`** and should be re-homed as a state-hub-local workplan once this plan is approved; per-repo classification files (T02/T03) are authored in each target repo. The hub remains a read/index model fed by repo-owned files (ADR-001).

## Tasks

### Phase 1 - Standard as a validation source

### T01 - Derive machine-readable allowed-values from the standard

```task
id: CUST-WP-0050-T01
status: done
priority: high
state_hub_task_id: "d978b1f3-4eca-4a17-835b-2c25d13cae22"
```

Extract the standard's controlled vocabularies (5 categories, 14 domains, the business_stake and business_mechanics enums, and the recommended capability families) into a single machine-readable artefact (e.g. `canon/standards/repo-classification.allowed.yaml`) that both the per-repo `.repo-classification.yaml` linter and the State Hub validator import.

Done when a single allowed-values file exists, is referenced by the standard, and a small validator can check a `.repo-classification.yaml` against it.
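
As a rough illustration of the kind of check described above, the sketch below validates an already-parsed `.repo-classification.yaml` mapping against an allowed-values table. It is not the delivered validator. The function name `validate_classification`, the assumption that `category`, `domain`, `secondary_domains`, and `capability_tags` are top-level keys, the rule that a secondary domain must not repeat the primary, and the sample tag are all assumptions. The category set lists only the categories named in this workplan, not the standard's full list.

```python
import re

# Assumption: in practice these values would be loaded from
# canon/standards/repo-classification.allowed.yaml rather than inlined.
ALLOWED = {
    # Only the categories named in this workplan; the standard defines the full set.
    "categories": {"research", "project", "tooling", "product"},
    "domains": {
        "infotech", "financials", "communication", "consumer", "health",
        "industrials", "energy", "utilities", "materials", "realestate",
        "crypto", "agents", "space", "government",
    },
}

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_classification(doc: dict, allowed: dict = ALLOWED) -> list:
    """Return a list of problems; an empty list means the mapping is valid."""
    errors = []
    if doc.get("category") not in allowed["categories"]:
        errors.append(f"unknown category: {doc.get('category')!r}")
    primary = doc.get("domain")
    if primary not in allowed["domains"]:
        errors.append(f"unknown domain: {primary!r}")
    for secondary in doc.get("secondary_domains", []):
        if secondary not in allowed["domains"]:
            errors.append(f"unknown secondary domain: {secondary!r}")
        elif secondary == primary:
            # Assumed rule: a secondary domain should not repeat the primary.
            errors.append(f"secondary domain repeats primary: {secondary!r}")
    for tag in doc.get("capability_tags", []):
        if not KEBAB_CASE.match(tag):
            errors.append(f"capability tag is not kebab-case: {tag!r}")
    return errors


# Sample mappings; the tag values are hypothetical.
good = {
    "category": "tooling",
    "domain": "agents",
    "secondary_domains": ["infotech"],
    "capability_tags": ["agent-orchestration"],
}
bad = {"category": "tooling", "domain": "custodian", "capability_tags": ["Bad_Tag"]}

print(validate_classification(good))
print(validate_classification(bad))
```

Returning a list of problems instead of raising on the first one lets a bulk run report every issue in a file at once.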

**Delivered (2026-06-22):** `canon/standards/repo-classification.allowed.yaml` (categories, domains, business_stake, business_mechanics, capability families, guidance bounds); referenced from the standard §12; validator `tools/validate_repo_classification.py` (stdlib + PyYAML) with `--self-test` (PASS), which checks category/domain enums, secondary-domain rules, kebab-case tags, and stake/mechanics enums.

### Phase 2 - Classify the portfolio (repo-owned source of truth)

### T02 - Classify custodian-owned repos

```task
id: CUST-WP-0050-T02
status: done
priority: high
state_hub_task_id: "b7edfbb5-483f-4600-9356-8f885c78ce58"
```

Author and human-review `.repo-classification.yaml` for the custodian-domain repos (the-custodian, state-hub, hub-core, inter-hub, activity-core, issue-core, kaizen-agentic, llm-connect, ops-bridge, ops-warden, email-connect) using the standard's §16 agent prompt as a first pass.

Done when each custodian repo has a committed file that validates against T01 and has been reviewed by a human.

**Progress (2026-06-22):** all 11 custodian-domain repos now carry a committed, validated `.repo-classification.yaml` (first-pass `classified_by: agent`). Following the 2026-06-22 decision, a new **`tooling`** category (between `project` and `product`) was added to the standard for reusable internal tooling/infrastructure, and the nine tooling repos were reclassified to it. The resulting classifications across all 11 repos are: the-custodian (research·infotech), inter-hub (research·infotech), state-hub (tooling·infotech), hub-core (tooling·infotech), activity-core (tooling·infotech), issue-core (tooling·infotech), kaizen-agentic (tooling·agents), llm-connect (tooling·agents), ops-bridge (tooling·infotech), ops-warden (tooling·infotech), email-connect (tooling·infotech). Commits are **local-only** in each repo (not yet pushed to Gitea).

**Done (2026-06-22):** human review complete. Bernd confirmed the agents-vs-infotech primary-domain choice, keeping both kaizen-agentic and llm-connect as `agents` primary (`infotech` secondary). All 11 files flipped to `classified_by: human` and re-validated clean against T01. Task **done**.

### T03 - Classify the full Gitea inventory (DROPPED)

```task
id: CUST-WP-0050-T03
status: cancel
priority: high
state_hub_task_id: "81489716-61ef-4207-ab8a-5877843281de"
```

**Dropped 2026-06-22.** T03 conflated *authoring classification files* with *registering repos*. Doing a ~70-repo PR storm before the vocabulary and schema are proven is premature and high-blast-radius, and registering the ~18 unregistered repos now would land them in the **old** domain model, creating legacy only to clean it up. Classifying + registering the remaining inventory is deferred to **T11**, executed **under the new model after cutover**. The 11 custodian fixtures (T02) plus T01 are sufficient to build and prove the redesign.

### Phase 3–4 - RE-HOMED to STATE-WP-0065 (state-hub)

Per this plan's repo boundary and the 2026-06-22 decision, the implementation of the State Hub redesign now lives in a **state-hub-local workplan**: `state-hub/workplans/STATE-WP-0065-repo-anchored-classification-spine.md` (workstream `8dc7d106-11e2-41df-b512-89ed69d2a65f`). CUST-WP-0050 remains the **coordination driver** (canon standard, decisions D1/D1a, ADR-005).
The original implementation tasks below are **cancelled here** (re-homed, not abandoned); the efficient regrouping merges the three spine-rewriting tasks into one migration:

| Was (CUST-WP-0050) | Now (STATE-WP-0065) |
| --- | --- |
| T04 schema + T05 data migration + T10 re-anchor/rename | **P1** single Alembic spine migration |
| T04/T10 API + validation surface | **P2** API / MCP / validation |
| T06 auto-registration | **P3** auto-registration tooling |
| T07 reclassify existing | folded into **P3** (lazy, as committed files appear) |
| T08 surfaces + T09 cutover | **P4** surfaces & cutover |

```task
id: CUST-WP-0050-T04
status: cancel
priority: high
state_hub_task_id: "b61f6267-c2b2-4325-95fa-30ee899ce7d1"
```

Re-homed → STATE-WP-0065 P1 (schema: domains→14 market domains + classification on `managed_repos`).

```task
id: CUST-WP-0050-T05
status: cancel
priority: high
state_hub_task_id: "171fa385-4d78-41ea-b749-ac3f9082fe47"
```

Re-homed → STATE-WP-0065 P1 (data migration + discrepancy resolution, same window as schema).

```task
id: CUST-WP-0050-T06
status: cancel
priority: high
state_hub_task_id: "6ae14007-d6d2-4395-814e-ace91486a953"
```

Re-homed → STATE-WP-0065 P3 (`register-from-classification` bulk/idempotent tooling).

```task
id: CUST-WP-0050-T07
status: cancel
priority: medium
state_hub_task_id: "6411bf3f-9de2-4bcd-9ffe-6209cda6ba93"
```

Re-homed → STATE-WP-0065 P3 (lazy reclassification of existing registrations as committed files appear).

```task
id: CUST-WP-0050-T08
status: cancel
priority: medium
state_hub_task_id: "09951aec-2960-4c50-b73d-4e2e7bd285c9"
```

Re-homed → STATE-WP-0065 P4 (dashboard, consistency rule, MCP/REST filters, docs).

```task
id: CUST-WP-0050-T09
status: cancel
priority: medium
state_hub_task_id: "babbb80a-c52d-4ec2-b217-2f6196a2e5f3"
```

Re-homed → STATE-WP-0065 P4 (cutover, verification, retire old model).
```task
id: CUST-WP-0050-T10
status: cancel
priority: high
state_hub_task_id: "bee16416-a67f-4155-93d7-09f278daa04f"
```

Re-homed → STATE-WP-0065 P1 (re-anchor `repo_id` required + `workstream → workplan` rename, merged into the spine migration).

### Phase 4 (custodian) - Post-cutover inventory

### T11 - Classify & register remaining Gitea inventory (post-cutover)

```task
id: CUST-WP-0050-T11
status: done
priority: medium
state_hub_task_id: "d8895c58-a930-42aa-8207-9babf9ba572a"
```

Replaces dropped T03. **After** STATE-WP-0065 cutover proves the new model, author `.repo-classification.yaml` for the remaining active Gitea repos (the ~18 unregistered + any not yet migrated) and bulk-register them via the STATE-WP-0065 P3 tooling, **under the new model** with no legacy detour. Maintain an explicit **exclusion list** (fork `tegwick/the-custodian`, `lando_worsch/python-snake`, archived `test_domain_v2`, inactive repos).

Done when every non-excluded active Gitea repo has a committed, validated classification file and a `managed_repo` row under the new taxonomy (or is on the recorded exclusion list).

**Done (2026-06-22):**

- Exclusion list: `canon/standards/repo-classification.exclusions.yaml` (forks, archived phantoms, templates/sandboxes, Gitea repos pending local checkout).
- Batch author: `tools/batch_author_repo_classifications.py`, an agent first pass for 51 local repos (skips 10 human-reviewed custodian fixtures); all validated against T01; committed in each target repo.
- Registration: 7 newly registered (`coordination-engine`, `human-resources`, `markitect-main`, `repo-seed`, `tegwick-control`, `vantage-point`, `whynot-control`); `make register-from-classification-all` updated 43 existing rows from `classified_by: migration` → `agent` (0 invalid).
- **Coverage:** 63 active `managed_repos`: 11 `human`, 51 `agent`, 1 deferred (`marki-docx`, hub-only, on exclusion list pending clone). Excluded locally: `hub-core-seed`, `sand-boxer`. Archived hub rows (4) unchanged.
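
The bulk, idempotent registration summarised above can be pictured with a small in-memory sketch. It is not the STATE-WP-0065 P3 tooling: the function name `register_from_classification`, the dict-based registry, and the report keys are hypothetical, and the `repo-seed` classification values are made up for illustration. The repo slugs and the `migration` → `agent` transition come from this workplan.

```python
def register_from_classification(classifications, registry, exclusions):
    """Upsert repos into `registry` (slug -> row dict) and report what happened.

    Re-running with the same inputs changes nothing, which is what makes
    the bulk run idempotent.
    """
    report = {"registered": [], "updated": [], "unchanged": [], "excluded": []}
    for slug, classification in sorted(classifications.items()):
        if slug in exclusions:
            report["excluded"].append(slug)       # never written to the registry
        elif slug not in registry:
            registry[slug] = dict(classification)  # new managed_repo row
            report["registered"].append(slug)
        elif registry[slug] != classification:
            registry[slug] = dict(classification)  # reclassify existing row
            report["updated"].append(slug)
        else:
            report["unchanged"].append(slug)
    return report


# Hypothetical starting state: one row left over from the migration.
registry = {
    "state-hub": {"category": "tooling", "domain": "infotech", "classified_by": "migration"},
}
classifications = {
    "state-hub": {"category": "tooling", "domain": "infotech", "classified_by": "agent"},
    # Illustrative values only; the real repo-seed classification is not given here.
    "repo-seed": {"category": "tooling", "domain": "infotech", "classified_by": "agent"},
    "sand-boxer": {"category": "tooling", "domain": "infotech", "classified_by": "agent"},
}
exclusions = {"sand-boxer"}

first = register_from_classification(classifications, registry, exclusions)
second = register_from_classification(classifications, registry, exclusions)
print(first)
print(second)
```

The second run reports every non-excluded repo as unchanged, which is a cheap way to verify idempotence before pointing such a tool at a real database.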
## Open Questions / Decisions

- **D1 (RESOLVED 2026-06-22): the repo is the primary anchor.** Workplans bind to repos (`repo_id` required); market-domain is *derived* from the repo's classification; `topic`/`domain` stop being the spine (`topic` retires or becomes an optional cross-repo tag). This supersedes the earlier "keep topic as an independent coordination unit" proposal. Implemented by T04/T10 (re-homed to STATE-WP-0065 P1).
- **D1a (open, follows from D1): anchor for cross-repo workplans.** Per **ADR-005**, a complex cross-repo effort gets its own **project repo** (`category: project`) as the anchor, retired to archive on completion with results living in the modified product repos. Open sub-point: the project-repo **naming convention** (e.g. `proj-` vs a dedicated grouping) and the archival trigger details.
- **D2: classification ownership/approval.** Who approves each repo's `.repo-classification.yaml`: the per-repo owner, or central custodian review?
- **D3 (RESOLVED 2026-06-22): exclusion list.** Recorded at `canon/standards/repo-classification.exclusions.yaml`: forks/personal repos, archived phantoms, template/sandbox checkouts, and Gitea slugs pending local checkout (incl. `marki-docx`).
- **D4: behavioural vs descriptive.** Do `secondary_domains` / `capability_tags` / `business_stake` drive any Hub behaviour initially, or are they descriptive until a later phase?

## Risks

- **Breaking-migration blast radius**: topics/workstreams/goals/decisions and charter `topic_id` references all move; mitigate with a reviewed dry-run and a tested downgrade (T05, re-homed to STATE-WP-0065 P1).
- **Cross-repo coordination**: the full-inventory classification (originally T03, now T11) touches ~70 repos via PRs; sequence it behind T01/T02 so the vocabulary is stable first.
- **Consistency-checker coupling**: existing C-rules assume the current domain model; update alongside (T08, re-homed to STATE-WP-0065 P4) to avoid mass false positives.
- **Boundary drift**: keep implementation in `state-hub`; this plan coordinates.