the-custodian/workplans/CUST-WP-0050-repo-classification-registration-redesign.md
tegwick 8bdefcd6ba Normalize agent instructions and workplan frontmatter (STATE-WP-0067)
- Align agent files with on-disk workplan prefixes (infer from workplan ids)
- Set workplan domain to registered domain_slug; add topic_slug where applicable
- Repair frontmatter delimiter formatting; migrate legacy task status literals
- Regenerate AGENTS.md, CLAUDE.md, and .claude/rules from State Hub templates
2026-06-22 23:16:28 +02:00

id: CUST-WP-0050
type: workplan
title: Repo Classification & State Hub Registration Redesign
domain: infotech
repo: the-custodian
status: finished
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 50
created: 2026-06-22
updated: 2026-06-22
started: 2026-06-22
finished: 2026-06-22
state_hub_workstream_id: 9f031f48-8de8-48b6-8e69-d2d83ad70a7a

CUST-WP-0050 - Repo Classification & State Hub Registration Redesign

Goal

Adopt the Repo Classification Standard (canon/standards/repo-classification-standard_v1.0.md, id: canon-repo-classification) as the ecosystem-wide model for organising repositories, and redesign State Hub registration around it so that:

  • every known repository carries a committed .repo-classification.yaml that is the source of truth for its classification,
  • the State Hub can automatically register all known repos by reading and validating those files (local checkout or Gitea API), and
  • all previously registered repos are reclassified under the new standard, replacing the current ad-hoc 14-domain model.

End state: one principled, validated taxonomy (category · domain · capability tags · business stake · business mechanics) spanning the whole portfolio, with registration that is reproducible from repo-owned metadata rather than hand-curated DB rows.

Context

A 2026-06-21 review compared three views of the portfolio and found them out of sync:

  • Gitea hosts ~72 repos (70 under coulomb/, plus a fork and a personal repo).
  • State Hub has 57 managed_repos across 14 ad-hoc domains (custodian, railiance, markitect, coulomb_social, personhood, capabilities, canon, citation_evidence, helix_forge, inter_hub, netkingdom, stack, vergabe_teilnahme, whynot).
  • the-custodian canon/projects/ froze at the original 6 founding charters.

Concrete discrepancies to resolve as part of this work:

  • ~18 Gitea repos are unregistered (e.g. audit-core, binect-chrome, binect-js, coordination-engine, direkt-vermittlung-de, human-resources, polycode-sim, ralph-workplan, repo-seed, tegwick-control, tele-mcp, testdrive-jsui, timeline-svg, vantage-point, whynot-control, whynot-design).
  • Phantom / renamed registrations: markitect-project (registered) vs markitect-main (Gitea) — likely a rename; railiance-bootstrap and railiance-hosts registered but absent from Gitea.
  • Duplicate domain: vergabe_teilnahme looks like a second registration of vergabe-teilnahme (already under coulomb_social).
  • Empty domain: personhood has a charter and topic but no repos.
  • Naming drift: coulomb.social/coulomb_social, foerster-capabilities/capabilities.
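
The comparison behind these discrepancies is essentially a set difference between two inventories. A minimal sketch, assuming each inventory is available as a plain set of repo slugs; the helper name `reconcile` and the three-to-four-item samples are illustrative only and not part of any existing tool:

```python
def reconcile(gitea: set[str], hub: set[str]) -> dict[str, list[str]]:
    """Compare Gitea repo slugs against State Hub registrations."""
    return {
        # hosted on Gitea but missing from the Hub
        "unregistered": sorted(gitea - hub),
        # registered in the Hub but absent from Gitea (phantom or renamed)
        "phantom": sorted(hub - gitea),
    }


# Illustrative sample built from slugs named in the list above.
gitea_repos = {"markitect-main", "audit-core", "state-hub"}
hub_repos = {"markitect-project", "railiance-bootstrap", "railiance-hosts", "state-hub"}

report = reconcile(gitea_repos, hub_repos)
for kind, slugs in report.items():
    print(kind, slugs)
```

A likely rename such as markitect-project vs markitect-main shows up once in each bucket, so rename detection still needs a human (or a fuzzy match) on top of the raw difference.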

The new standard fixes the root cause: it separates category (work mode), domain (intended market/user), capability tags (what it does), and business stake (who cares) — concerns the current 14 "domains" conflate.

Architecture decision: repo-anchored model, domain derived from classification

The standard's domain is a fixed 14-value market vocabulary (infotech, financials, communication, consumer, health, industrials, energy, utilities, materials, realestate, crypto, agents, space, government) that is orthogonal to the Hub's current 14 coordination domains. Per the steering decision on 2026-06-22, the new market-domain vocabulary replaces the Hub's domain model (rather than augmenting it or running a parallel two-axis model).

The current spine is Domain → Topic → Workstream, where topics.domain_id and workstreams.topic_id are both NOT NULL and the 14 domains are seeded 1:1 with 14 topics (a data convention — the schema actually allows many topics per domain, but that has never been used). workstreams.repo_id and repo_goals.repo_id already exist, but the required anchor is the soft, hub-only topic, while the stable git-managed repo link is optional.

Per the 2026-06-22 steering decision, this redesign flips the polarity: the repo becomes the primary anchor for workplans, and market-domain is derived from the repo's .repo-classification.yaml, not stored as a separate topic/domain parent. Concretely:

  • workstreams.repo_id becomes the required anchor; topic_id is demoted to optional (or topic is retired) — see T10.
  • Market-domain is computed from repo → classification.domain; the standalone topics.domain_id / managed_repos.domain_id spine is removed.
  • RepoGoal (already repo-anchored) becomes the goal primitive; DomainGoal becomes a thin strategic rollup keyed by the 14 market domains.
  • Cross-repo workplans anchor to a dedicated project repo that retires to archive on completion, with results living on in the modified product repos — see ADR-005 and Open Questions D1/D1a.

This is consistent with ADR-001: the spine becomes the git-managed repo plus its committed classification file, so the Hub stays fully rebuildable from repo-owned files. It is a breaking migration of the coordination spine (T04/T05).
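
The flipped polarity can be sketched in a few lines. This is an illustration under assumptions, not the Hub's actual model: the class names `Repo` and `Workplan`, the attribute names, and the in-memory dict standing in for a parsed .repo-classification.yaml are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    slug: str
    # Stand-in for the parsed .repo-classification.yaml (assumed key names).
    classification: dict


@dataclass(frozen=True)
class Workplan:
    id: str
    repo: Repo                        # required anchor (was optional)
    topic_slug: Optional[str] = None  # demoted to optional

    @property
    def market_domain(self) -> str:
        # Derived on read from the repo's classification, never stored
        # as a separate parent row.
        return self.repo.classification["domain"]


state_hub = Repo("state-hub", {"category": "tooling", "domain": "infotech"})
plan = Workplan("STATE-WP-0065", repo=state_hub)
print(plan.market_domain)
```

Because the domain is computed rather than stored, reclassifying a repo changes the domain of every workplan anchored to it without touching the workplan rows.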

Scope

In scope:

  • Promote and steward the standard as custodian canon (done: the standard now lives at canon/standards/repo-classification-standard_v1.0.md).
  • A single machine-readable allowed-values source derived from the standard, consumed by both the per-repo files and the Hub validator.
  • A committed .repo-classification.yaml for every active repo (agent-assisted first pass, human-reviewed), authored in each repo.
  • State Hub schema/model redesign replacing the domain model with the 14 market domains and storing the full classification on managed_repos.
  • A reversible data migration re-homing existing topics/workstreams/goals/decisions/charters and resolving the discrepancies listed above.
  • Auto-registration tooling (bulk, idempotent) that reads classification files from local checkouts or the Gitea API and registers/reclassifies repos.
  • Updates to dashboard, consistency checker, MCP/REST surface, and orientation docs to the new taxonomy.

In scope (added 2026-06-22):

  • Re-anchor workplans to repos (repo_id required, topic optional/retired) and derive market-domain from classification (T04/T10).
  • Rename workstream → workplan across schema, API, and MCP so the Hub vocabulary matches the repo files and current usage (T10).

Out of scope:

  • Re-architecting task-level semantics beyond what the re-anchor and rename force.
  • Changing the Gitea hosting model or repo contents beyond adding the classification file (and, for cross-repo efforts, creating project repos per ADR-005).
  • Classifying throwaway/forked/non-ecosystem repos (explicit exclusion list).

Repo boundary

This is the custodian driving/coordination workplan (it owns the canon standard and the portfolio decision), consistent with how CUST-WP-0043 drove State Hub work. Implementation tasks T04–T08 execute in /home/worsch/state-hub and should be re-homed as a state-hub-local workplan once this plan is approved; per-repo classification files (T02/T03) are authored in each target repo. The hub remains a read/index model fed by repo-owned files (ADR-001).

Tasks

Phase 1 — Standard as a validation source

T01 - Derive machine-readable allowed-values from the standard

id: CUST-WP-0050-T01
status: done
priority: high
state_hub_task_id: "d978b1f3-4eca-4a17-835b-2c25d13cae22"

Extract the standard's controlled vocabularies (5 categories, 14 domains, the business_stake and business_mechanics enums, and the recommended capability families) into a single machine-readable artefact (e.g. canon/standards/repo-classification.allowed.yaml) that both the per-repo .repo-classification.yaml linter and the State Hub validator import.

Done when a single allowed-values file exists, is referenced by the standard, and a small validator can check a .repo-classification.yaml against it.

Delivered (2026-06-22): canon/standards/repo-classification.allowed.yaml (categories, domains, business_stake, business_mechanics, capability families, guidance bounds); referenced from the standard §12; validator tools/validate_repo_classification.py (stdlib + PyYAML) with --self-test (PASS) — checks category/domain enums, secondary-domain rules, kebab-case tags, and stake/mechanics enums.
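
The kinds of checks listed above can be sketched as follows. This is not the code of tools/validate_repo_classification.py: the function name `validate`, the rule that a secondary domain may not repeat the primary, and the `CATEGORIES` set (limited to the categories this plan happens to name) are assumptions, and the allowed values are inlined here rather than loaded from the allowed-values file.

```python
import re

# The 14 market domains as listed in this plan.
MARKET_DOMAINS = {
    "infotech", "financials", "communication", "consumer", "health",
    "industrials", "energy", "utilities", "materials", "realestate",
    "crypto", "agents", "space", "government",
}
# Assumption: partial set, only the categories named in this plan.
CATEGORIES = {"research", "tooling", "project", "product"}
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate(doc: dict) -> list[str]:
    """Return a list of human-readable errors; empty means valid."""
    errors = []
    if doc.get("category") not in CATEGORIES:
        errors.append(f"unknown category: {doc.get('category')!r}")
    domain = doc.get("domain")
    if domain not in MARKET_DOMAINS:
        errors.append(f"unknown domain: {domain!r}")
    for sec in doc.get("secondary_domains", []):
        if sec not in MARKET_DOMAINS:
            errors.append(f"unknown secondary domain: {sec!r}")
        elif sec == domain:
            errors.append(f"secondary domain repeats primary: {sec!r}")
    for tag in doc.get("capability_tags", []):
        if not KEBAB_CASE.match(tag):
            errors.append(f"capability tag is not kebab-case: {tag!r}")
    return errors


# Hypothetical documents; the tag values are made up for illustration.
good = {"category": "tooling", "domain": "agents",
        "secondary_domains": ["infotech"], "capability_tags": ["llm-gateway"]}
bad = {"category": "tooling", "domain": "custodian",
       "capability_tags": ["LLM_Gateway"]}

print(validate(good))
print(validate(bad))
```

Returning a list of errors instead of raising on the first one lets a bulk run report every problem in a file at once.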

Phase 2 — Classify the portfolio (repo-owned source of truth)

T02 - Classify custodian-owned repos

id: CUST-WP-0050-T02
status: done
priority: high
state_hub_task_id: "b7edfbb5-483f-4600-9356-8f885c78ce58"

Author and human-review .repo-classification.yaml for the custodian-domain repos (the-custodian, state-hub, hub-core, inter-hub, activity-core, issue-core, kaizen-agentic, llm-connect, ops-bridge, ops-warden, email-connect) using the standard's §16 agent prompt as a first pass.

Done when each custodian repo has a committed file that validates against T01 and has been reviewed by a human.

Progress (2026-06-22): all 11 custodian-domain repos now carry a committed, validated .repo-classification.yaml (first-pass classified_by: agent). Following the 2026-06-22 decision, a new tooling category (between project and product) was added to the standard for reusable internal tooling/infrastructure, and the nine tooling repos were reclassified to it. Resulting classifications: the-custodian (research·infotech), inter-hub (research·infotech), state-hub (tooling·infotech), hub-core (tooling·infotech), activity-core (tooling·infotech), issue-core (tooling·infotech), kaizen-agentic (tooling·agents), llm-connect (tooling·agents), ops-bridge (tooling·infotech), ops-warden (tooling·infotech), email-connect (tooling·infotech). Commits are local-only in each repo (not yet pushed to Gitea).

Done (2026-06-22): human review complete — Bernd confirmed the agents-vs-infotech primary-domain choice, keeping both kaizen-agentic and llm-connect as agents primary (infotech secondary). All 11 files flipped to classified_by: human and re-validated clean against T01. Task done.

T03 - Classify the full Gitea inventory — DROPPED

id: CUST-WP-0050-T03
status: cancel
priority: high
state_hub_task_id: "81489716-61ef-4207-ab8a-5877843281de"

Dropped 2026-06-22. T03 conflated authoring classification files with registering repos. Doing a ~70-repo PR storm before the vocabulary and schema are proven is premature and high-blast-radius, and registering the ~18 unregistered repos now would land them in the old domain model — creating legacy only to clean it up. Classifying + registering the remaining inventory is deferred to T11, executed under the new model after cutover. The 11 custodian fixtures (T02) plus T01 are sufficient to build and prove the redesign.

Phase 3–4 — RE-HOMED to STATE-WP-0065 (state-hub)

Per this plan's repo boundary and the 2026-06-22 decision, the implementation of the State Hub redesign now lives in a state-hub-local workplan: state-hub/workplans/STATE-WP-0065-repo-anchored-classification-spine.md (workstream 8dc7d106-11e2-41df-b512-89ed69d2a65f). CUST-WP-0050 remains the coordination driver (canon standard, decisions D1/D1a, ADR-005). The original implementation tasks below are cancelled here (re-homed, not abandoned); the efficient regrouping merges the three spine-rewriting tasks into one migration:

| Was (CUST-WP-0050) | Now (STATE-WP-0065) |
| --- | --- |
| T04 schema + T05 data migration + T10 re-anchor/rename | P1 single Alembic spine migration |
| T04/T10 API + validation surface | P2 API / MCP / validation |
| T06 auto-registration | P3 auto-registration tooling |
| T07 reclassify existing | folded into P3 (lazy, as committed files appear) |
| T08 surfaces + T09 cutover | P4 surfaces & cutover |

id: CUST-WP-0050-T04
status: cancel
priority: high
state_hub_task_id: "b61f6267-c2b2-4325-95fa-30ee899ce7d1"

Re-homed → STATE-WP-0065 P1 (schema: domains→14 market domains + classification on managed_repos).

id: CUST-WP-0050-T05
status: cancel
priority: high
state_hub_task_id: "171fa385-4d78-41ea-b749-ac3f9082fe47"

Re-homed → STATE-WP-0065 P1 (data migration + discrepancy resolution, same window as schema).

id: CUST-WP-0050-T06
status: cancel
priority: high
state_hub_task_id: "6ae14007-d6d2-4395-814e-ace91486a953"

Re-homed → STATE-WP-0065 P3 (register-from-classification bulk/idempotent tooling).

id: CUST-WP-0050-T07
status: cancel
priority: medium
state_hub_task_id: "6411bf3f-9de2-4bcd-9ffe-6209cda6ba93"

Re-homed → STATE-WP-0065 P3 (lazy reclassification of existing registrations as committed files appear).

id: CUST-WP-0050-T08
status: cancel
priority: medium
state_hub_task_id: "09951aec-2960-4c50-b73d-4e2e7bd285c9"

Re-homed → STATE-WP-0065 P4 (dashboard, consistency rule, MCP/REST filters, docs).

id: CUST-WP-0050-T09
status: cancel
priority: medium
state_hub_task_id: "babbb80a-c52d-4ec2-b217-2f6196a2e5f3"

Re-homed → STATE-WP-0065 P4 (cutover, verification, retire old model).

id: CUST-WP-0050-T10
status: cancel
priority: high
state_hub_task_id: "bee16416-a67f-4155-93d7-09f278daa04f"

Re-homed → STATE-WP-0065 P1 (re-anchor repo_id required + workstream → workplan rename, merged into the spine migration).

Phase 4 (custodian) — Post-cutover inventory

T11 - Classify & register remaining Gitea inventory (post-cutover)

id: CUST-WP-0050-T11
status: done
priority: medium
state_hub_task_id: "d8895c58-a930-42aa-8207-9babf9ba572a"

Replaces dropped T03. After STATE-WP-0065 cutover proves the new model, author .repo-classification.yaml for the remaining active Gitea repos (the ~18 unregistered + any not yet migrated) and bulk-register them via the STATE-WP-0065 P3 tooling — under the new model, no legacy detour. Maintain an explicit exclusion list (fork tegwick/the-custodian, lando_worsch/python-snake, archived test_domain_v2, inactive repos).

Done when every non-excluded active Gitea repo has a committed, validated classification file and a managed_repo row under the new taxonomy (or is on the recorded exclusion list).

Done (2026-06-22):

  • Exclusion list: canon/standards/repo-classification.exclusions.yaml (forks, archived phantoms, templates/sandboxes, Gitea repos pending local checkout).
  • Batch author: tools/batch_author_repo_classifications.py — agent first-pass for 51 local repos (skips 10 human-reviewed custodian fixtures); all validated against T01; committed in each target repo.
  • Registration: 7 newly registered (coordination-engine, human-resources, markitect-main, repo-seed, tegwick-control, vantage-point, whynot-control); make register-from-classification-all updated 43 existing rows from classified_by: migration → agent (0 invalid).
  • Coverage: 63 active managed_repos — 11 human, 51 agent, 1 deferred (marki-docx, hub-only, on exclusion list pending clone). Excluded locally: hub-core-seed, sand-boxer. Archived hub rows (4) unchanged.
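
The bulk, idempotent behaviour described for registration can be sketched as an upsert keyed by repo slug with an exclusion check up front. This is an assumption-laden illustration, not the STATE-WP-0065 P3 tooling: `register_all`, the dict standing in for managed_repos, and the classification values given for repo-seed and sand-boxer are hypothetical.

```python
def register_all(classified: dict[str, dict],
                 hub_rows: dict[str, dict],
                 excluded: set[str]) -> dict[str, list[str]]:
    """Upsert classifications into hub_rows; safe to run repeatedly."""
    result = {"created": [], "updated": [], "unchanged": [], "skipped": []}
    for slug in sorted(classified):
        doc = classified[slug]
        if slug in excluded:
            result["skipped"].append(slug)
        elif slug not in hub_rows:
            hub_rows[slug] = dict(doc)
            result["created"].append(slug)
        elif hub_rows[slug] != doc:
            hub_rows[slug] = dict(doc)
            result["updated"].append(slug)
        else:
            result["unchanged"].append(slug)
    return result


# Stand-in for parsed classification files (repo-seed and sand-boxer
# values are placeholders).
classified = {
    "state-hub": {"category": "tooling", "domain": "infotech", "classified_by": "human"},
    "repo-seed": {"category": "tooling", "domain": "infotech", "classified_by": "agent"},
    "sand-boxer": {"category": "tooling", "domain": "infotech", "classified_by": "agent"},
}
# Stand-in for existing managed_repos rows.
hub = {"state-hub": {"category": "tooling", "domain": "infotech", "classified_by": "migration"}}

first = register_all(classified, hub, excluded={"sand-boxer"})
second = register_all(classified, hub, excluded={"sand-boxer"})
print(first)
print(second)
```

The second run reports only unchanged and skipped entries, which is the property that makes a make-style "register everything" target safe to re-run.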

Open Questions / Decisions

  • D1 (RESOLVED 2026-06-22): the repo is the primary anchor. Workplans bind to repos (repo_id required); market-domain is derived from the repo's classification; topic/domain stop being the spine (topic retires or becomes an optional cross-repo tag). This supersedes the earlier "keep topic as an independent coordination unit" proposal. Implemented by T04/T10.
  • D1a (open, follows from D1): anchor for cross-repo workplans. Per ADR-005, a complex cross-repo effort gets its own project repo (category: project) as the anchor, retired to archive on completion with results living in the modified product repos. Open sub-point: the project-repo naming convention (e.g. proj-<slug> vs a dedicated grouping) and the archival trigger details.
  • D2: classification ownership/approval. Who approves each repo's .repo-classification.yaml — per-repo owner, or central custodian review?
  • D3 (RESOLVED 2026-06-22): exclusion list. Recorded at canon/standards/repo-classification.exclusions.yaml — forks/personal repos, archived phantoms, template/sandbox checkouts, and Gitea slugs pending local checkout (incl. marki-docx).
  • D4: behavioural vs descriptive. Do secondary_domains / capability_tags / business_stake drive any Hub behaviour initially, or are they descriptive until a later phase?

Risks

  • Breaking-migration blast radius — topics/workstreams/goals/decisions and charter topic_id references all move; mitigate with a reviewed dry-run and a tested downgrade (T05).
  • Cross-repo coordination — T03 touches ~70 repos via PRs; sequence behind T01/T02 so the vocabulary is stable first.
  • Consistency-checker coupling — existing C-rules assume the current domain model; update alongside (T08) to avoid mass false positives.
  • Boundary drift — keep implementation in state-hub; this plan coordinates.