diff --git a/workplans/CUST-WP-0050-repo-classification-registration-redesign.md b/workplans/CUST-WP-0050-repo-classification-registration-redesign.md
new file mode 100644
index 0000000..98d0ea2
--- /dev/null
+++ b/workplans/CUST-WP-0050-repo-classification-registration-redesign.md
@@ -0,0 +1,306 @@
+---
+id: CUST-WP-0050
+type: workplan
+title: "Repo Classification & State Hub Registration Redesign"
+domain: custodian
+repo: the-custodian
+status: proposed
+owner: custodian
+topic_slug: custodian
+planning_priority: high
+planning_order: 50
+created: "2026-06-22"
+updated: "2026-06-22"
+---
+
+# CUST-WP-0050 - Repo Classification & State Hub Registration Redesign
+
+## Goal
+
+Adopt the **Repo Classification Standard** (`canon/standards/repo-classification-standard_v1.0.md`,
+`id: canon-repo-classification`) as the ecosystem-wide model for organising
+repositories, and redesign State Hub registration around it so that:
+
+- every known repository carries a committed `.repo-classification.yaml` that is
+  the **source of truth** for its classification,
+- the State Hub can **automatically register all known repos** by reading and
+  validating those files (local checkout or Gitea API), and
+- all **previously registered repos are reclassified** under the new standard,
+  replacing the current ad-hoc 14-domain model.
+
+End state: one principled, validated taxonomy (category · domain · capability
+tags · business stake · business mechanics) spanning the whole portfolio, with
+registration that is reproducible from repo-owned metadata rather than
+hand-curated DB rows.
+
+## Context
+
+A 2026-06-21 review compared three views of the portfolio and found them out of
+sync:
+
+- **Gitea** hosts ~72 repos (70 under `coulomb/`, plus a fork and a personal repo).
+- **State Hub** has 57 `managed_repos` across **14 ad-hoc domains** (custodian,
+  railiance, markitect, coulomb_social, personhood, capabilities, canon,
+  citation_evidence, helix_forge, inter_hub, netkingdom, stack,
+  vergabe_teilnahme, whynot).
+- **the-custodian** `canon/projects/` froze at the original **6 founding charters**.
+
+Concrete discrepancies to resolve as part of this work:
+
+- ~18 Gitea repos are **unregistered** (e.g. audit-core, binect-chrome, binect-js,
+  coordination-engine, direkt-vermittlung-de, human-resources, polycode-sim,
+  ralph-workplan, repo-seed, tegwick-control, tele-mcp, testdrive-jsui,
+  timeline-svg, vantage-point, whynot-control, whynot-design).
+- **Phantom / renamed** registrations: `markitect-project` (registered) vs
+  `markitect-main` (Gitea) — likely a rename; `railiance-bootstrap` and
+  `railiance-hosts` registered but absent from Gitea.
+- **Duplicate domain**: `vergabe_teilnahme` looks like a second registration of
+  `vergabe-teilnahme` (already under coulomb_social).
+- **Empty domain**: `personhood` has a charter and topic but no repos.
+- **Naming drift**: `coulomb.social`/`coulomb_social`, `foerster-capabilities`/`capabilities`.
+
+The new standard fixes the root cause: it separates *category* (work mode),
+*domain* (intended market/user), *capability tags* (what it does), and *business
+stake* (who cares) — concerns the current 14 "domains" conflate.
+
+### Architecture decision: replace the domain model
+
+The standard's `domain` is a **fixed 14-value market vocabulary** (infotech,
+financials, communication, consumer, health, industrials, energy, utilities,
+materials, realestate, crypto, agents, space, government) that is *orthogonal* to
+the Hub's current 14 coordination domains. Per the steering decision on
+2026-06-22, the new market-domain vocabulary **replaces** the Hub's domain model
+(rather than augmenting it or running a parallel two-axis model).
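To make the fixed vocabulary concrete, the check it implies could look like the minimal Python sketch below. The 14 values come from the paragraph above; the function name `check_domains` and the key names `domain` and `secondary_domains` are assumptions for illustration, since the file schema is defined by the standard and is not shown in this patch.

```python
# Sketch only: key names and the function name are hypothetical.
# The 14 market domains are copied from the workplan text.
MARKET_DOMAINS = frozenset({
    "infotech", "financials", "communication", "consumer", "health",
    "industrials", "energy", "utilities", "materials", "realestate",
    "crypto", "agents", "space", "government",
})


def check_domains(classification: dict) -> list[str]:
    """Return a list of problems; an empty list means the domains are valid."""
    problems = []
    primary = classification.get("domain")
    if primary is None:
        problems.append("missing primary domain")
    elif primary not in MARKET_DOMAINS:
        problems.append(f"unknown primary domain: {primary!r}")
    # Secondary domains are optional but must use the same vocabulary.
    for extra in classification.get("secondary_domains", []):
        if extra not in MARKET_DOMAINS:
            problems.append(f"unknown secondary domain: {extra!r}")
    return problems
```

Under this sketch, an old coordination domain such as `custodian` would be rejected as a market domain, which is the practical meaning of "replaces" in the decision above.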
+
+This is a **breaking migration**: the current `domains` table is 1:1 with
+`topics`, and topics own workstreams, goals, decisions, and progress events. The
+new market domains are coarse (most repos are `infotech`), so the old 1:1
+domain↔topic assumption cannot survive unchanged. **Decoupling coordination
+topics from the market-domain attribute is the central design problem of T04/T05**
+(see Open Questions D1).
+
+## Scope
+
+In scope:
+
+- Promote and steward the standard as custodian canon (done: the standard now
+  lives at `canon/standards/repo-classification-standard_v1.0.md`).
+- A single machine-readable allowed-values source derived from the standard,
+  consumed by both the per-repo files and the Hub validator.
+- A committed `.repo-classification.yaml` for every active repo (agent-assisted
+  first pass, human-reviewed), authored in each repo.
+- State Hub schema/model redesign replacing the domain model with the 14 market
+  domains and storing the full classification on `managed_repos`.
+- A reversible data migration re-homing existing topics/workstreams/goals/
+  decisions/charters and resolving the discrepancies listed above.
+- Auto-registration tooling (bulk, idempotent) that reads classification files
+  from local checkouts or the Gitea API and registers/reclassifies repos.
+- Updates to dashboard, consistency checker, MCP/REST surface, and orientation
+  docs to the new taxonomy.
+
+Out of scope:
+
+- Re-architecting workstream/task semantics beyond what the domain replacement
+  forces.
+- Changing the Gitea hosting model or repo contents beyond adding the
+  classification file.
+- Classifying throwaway/forked/non-ecosystem repos (explicit exclusion list).
+
+## Repo boundary
+
+This is the **custodian driving/coordination workplan** (it owns the canon
+standard and the portfolio decision), consistent with how `CUST-WP-0043` drove
+State Hub work.
+Implementation tasks **T04–T08 execute in `/home/worsch/state-hub`**
+and should be re-homed as a state-hub-local workplan once this plan is approved;
+per-repo classification files (T02/T03) are authored in each target repo. The
+hub remains a read/index model fed by repo-owned files (ADR-001).
+
+## Tasks
+
+### Phase 1 — Standard as a validation source
+
+### T01 - Derive machine-readable allowed-values from the standard
+
+```task
+id: CUST-WP-0050-T01
+status: todo
+priority: high
+```
+
+Extract the standard's controlled vocabularies (5 categories, 14 domains, the
+business_stake and business_mechanics enums, and the recommended capability
+families) into a single machine-readable artefact (e.g.
+`canon/standards/repo-classification.allowed.yaml`) that both the per-repo
+`.repo-classification.yaml` linter and the State Hub validator import.
+
+Done when a single allowed-values file exists, is referenced by the standard, and
+a small validator can check a `.repo-classification.yaml` against it.
+
+### Phase 2 — Classify the portfolio (repo-owned source of truth)
+
+### T02 - Classify custodian-owned repos
+
+```task
+id: CUST-WP-0050-T02
+status: todo
+priority: high
+```
+
+Author and human-review `.repo-classification.yaml` for the custodian-domain
+repos (the-custodian, state-hub, hub-core, inter-hub, activity-core, issue-core,
+kaizen-agentic, llm-connect, ops-bridge, ops-warden, email-connect) using the
+standard's §16 agent prompt as a first pass.
+
+Done when each custodian repo has a committed file that validates against T01 and
+has been reviewed by a human.
+
+### T03 - Classify the full Gitea inventory
+
+```task
+id: CUST-WP-0050-T03
+status: todo
+priority: high
+```
+
+Produce proposed `.repo-classification.yaml` for every active repo in the Gitea
+`coulomb` org (~70), prioritising the 57 already-registered and the ~18
+unregistered repos. Deliver as per-repo PRs for owner/human review.
+Maintain an explicit **exclusion list** (forks, `lando_worsch/python-snake`, archived
+`test_domain_v2`) recorded in this workplan.
+
+Done when every non-excluded active repo has a committed, validated classification
+file (or is on the recorded exclusion list).
+
+### Phase 3 — State Hub redesign (executed in /home/worsch/state-hub)
+
+### T04 - Redesign schema: replace domains, add classification
+
+```task
+id: CUST-WP-0050-T04
+status: todo
+priority: high
+```
+
+Replace the `domains` table contents with the 14 fixed market domains and add
+classification storage to `managed_repos`: `category`, primary `domain_id`,
+`secondary_domains[]`, `capability_tags[]`, `business_stake[]`,
+`business_mechanics[]`, plus provenance (`classified_at`, `classified_by`,
+`standard_version`). Enforce the allowed-values from T01 at the API boundary.
+Decouple `topic` from market-domain (see D1). Provide an Alembic migration and
+updated SQLAlchemy models + Pydantic schemas.
+
+Done when the schema/model/API accept and validate the full classification and
+reject invalid values, with a forward migration and a tested downgrade path.
+
+### T05 - Migration mapping + data migration
+
+```task
+id: CUST-WP-0050-T05
+status: todo
+priority: high
+```
+
+Define and apply the mapping from the old 14 domains/topics to the new model
+(guided by standard §15 Migration Notes), re-pointing existing topics,
+workstreams, goals, decisions, progress events, and charter `topic_id`
+references with **no orphaned workstreams**. Resolve the 2026-06-21 discrepancies:
+reconcile `markitect-project`↔`markitect-main`, retire phantom
+`railiance-bootstrap`/`railiance-hosts` (or relink), collapse the
+`vergabe_teilnahme` duplicate, and decide `personhood`'s disposition (charter-only
+vs retire).
+
+Done when a dry-run migration report is reviewed and the applied migration leaves
+zero orphaned coordination records; the discrepancy list is resolved or explicitly
+deferred with reasons.
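The "zero orphaned coordination records" criterion for T05 could be checked in a dry run along the lines of the sketch below. This is an illustration under stated assumptions, not the Hub's actual migration code: the function name, the tuple and dict shapes, and the rule that unmapped topics keep their id are all hypothetical.

```python
# Sketch only: record shapes and names are hypothetical, not the Hub schema.
def find_orphaned_workstreams(workstreams, topic_remap, surviving_topics):
    """Dry-run check: list workstreams whose topic would not exist after migration.

    workstreams      -- iterable of (workstream_id, old_topic_id) pairs
    topic_remap      -- dict mapping old_topic_id -> new_topic_id
    surviving_topics -- set of topic ids that exist after the migration
    """
    orphans = []
    for workstream_id, old_topic_id in workstreams:
        # Assumption: topics without an explicit remap keep their id.
        new_topic_id = topic_remap.get(old_topic_id, old_topic_id)
        if new_topic_id not in surviving_topics:
            orphans.append((workstream_id, old_topic_id))
    return orphans
```

A dry-run report would then list the returned pairs for review, and the migration would only be applied once the list is empty.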
+
+### T06 - Auto-registration tooling
+
+```task
+id: CUST-WP-0050-T06
+status: todo
+priority: high
+```
+
+Build an idempotent `register-from-classification` capability (Make target +
+script + MCP tool) that, given a repo (local path or Gitea API), reads
+`.repo-classification.yaml`, validates against T01, and upserts the
+`managed_repo` with full classification. Support a **bulk** run over the Gitea
+inventory and reclassification of existing rows. Reuse the k3s/Gitea access path
+documented during the 2026-06-21 review (Gitea runs in k3s on coulombcore;
+reach it via `kubectl port-forward svc/gitea-http`).
+
+Done when one command registers/reclassifies every repo with a valid file and
+emits a report of registered / updated / skipped / invalid.
+
+### T07 - Reclassify existing registrations
+
+```task
+id: CUST-WP-0050-T07
+status: todo
+priority: medium
+```
+
+Run T06 against the classification files for the 57 previously-registered repos,
+reconciling each to the new taxonomy and retiring phantom/duplicate records.
+
+Done when all previously-registered repos reflect their new classification and
+the managed-repo set matches the (non-excluded) Gitea inventory.
+
+### Phase 4 — Consuming surfaces & cutover
+
+### T08 - Update dashboard, consistency checker, MCP/REST, docs
+
+```task
+id: CUST-WP-0050-T08
+status: todo
+priority: medium
+```
+
+Update the dashboard to navigate by category/domain/capability/business-stake;
+add a consistency rule flagging registered repos lacking a valid
+`.repo-classification.yaml`; expose list/filter-by-classification in MCP/REST; and
+update orientation docs (`SCOPE.md`, `README.md`, `.claude/rules/*`) that
+reference the old "domains".
+
+Done when the dashboard renders the new taxonomy, the consistency checker has a
+classification rule, and docs no longer assume the old domain model.
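The consistency rule named in T08 could be sketched as follows. This is a minimal illustration, assuming the checker already has the registered repo names and the parsed classification files in memory; the function name, its parameters, and the injected `is_valid` callable are hypothetical and do not reflect the existing C-rule interface.

```python
# Sketch only: the rule shape and all names are hypothetical.
def repos_missing_valid_classification(registered, classified, is_valid):
    """Return registered repo names that have no valid classification file.

    registered -- iterable of registered repo names
    classified -- dict mapping repo name -> parsed classification (dict)
    is_valid   -- callable(dict) -> bool, e.g. backed by the T01 allowed values
    """
    flagged = []
    for name in sorted(registered):
        data = classified.get(name)
        # Flag both a missing file and a file that fails validation.
        if data is None or not is_valid(data):
            flagged.append(name)
    return flagged
```

Passing the validator in as a callable keeps the rule independent of how the T01 allowed-values file is loaded.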
+
+### T09 - Cutover, verification, retire old model
+
+```task
+id: CUST-WP-0050-T09
+status: todo
+priority: medium
+```
+
+Switch orientation/registration tooling to the new model end-to-end, archive the
+old domain semantics, and run `make fix-consistency REPO=the-custodian`.
+
+Done when an end-to-end pass (classify → auto-register → dashboard view) is
+verified and the old ad-hoc domain model is retired.
+
+## Open Questions / Decisions
+
+- **D1 (blocking T04/T05): topic ↔ market-domain after replacement.** Market
+  domains are coarse; coordination still needs finer grouping. Proposed: keep
+  `topic` as the coordination unit, made independent of market domain (market
+  domain becomes a `managed_repo` attribute; a topic may span repos of different
+  market domains). Needs confirmation before schema work starts.
+- **D2: classification ownership/approval.** Who approves each repo's
+  `.repo-classification.yaml` — per-repo owner, or central custodian review?
+- **D3: exclusion list.** Confirm exclusions (fork `tegwick/the-custodian`,
+  `lando_worsch/python-snake`, archived `test_domain_v2`, any inactive repos).
+- **D4: behavioural vs descriptive.** Do `secondary_domains` / `capability_tags`
+  / `business_stake` drive any Hub behaviour initially, or are they descriptive
+  until a later phase?
+
+## Risks
+
+- **Breaking-migration blast radius** — topics/workstreams/goals/decisions and
+  charter `topic_id` references all move; mitigate with a reviewed dry-run and a
+  tested downgrade (T05).
+- **Cross-repo coordination** — T03 touches ~70 repos via PRs; sequence behind
+  T01/T02 so the vocabulary is stable first.
+- **Consistency-checker coupling** — existing C-rules assume the current domain
+  model; update alongside (T08) to avoid mass false positives.
+- **Boundary drift** — keep implementation in `state-hub`; this plan coordinates.