---
id: CUST-WP-0050
type: workplan
title: "Repo Classification & State Hub Registration Redesign"
domain: infotech
repo: the-custodian
status: finished
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 50
created: "2026-06-22"
updated: "2026-06-22"
started: "2026-06-22"
finished: "2026-06-22"
state_hub_workstream_id: "9f031f48-8de8-48b6-8e69-d2d83ad70a7a"
---

# CUST-WP-0050 - Repo Classification & State Hub Registration Redesign

## Goal

Adopt the **Repo Classification Standard** (`canon/standards/repo-classification-standard_v1.0.md`,
`id: canon-repo-classification`) as the ecosystem-wide model for organising
repositories, and redesign State Hub registration around it so that:

- every known repository carries a committed `.repo-classification.yaml` that is
  the **source of truth** for its classification,
- the State Hub can **automatically register all known repos** by reading and
  validating those files (local checkout or Gitea API), and
- all **previously registered repos are reclassified** under the new standard,
  replacing the current ad-hoc 14-domain model.

End state: one principled, validated taxonomy (category · domain · capability
tags · business stake · business mechanics) spanning the whole portfolio, with
registration that is reproducible from repo-owned metadata rather than
hand-curated DB rows.

## Context

A 2026-06-21 review compared three views of the portfolio and found them out of
sync:

- **Gitea** hosts ~72 repos (70 under `coulomb/`, plus a fork and a personal repo).
- **State Hub** has 57 `managed_repos` across **14 ad-hoc domains** (custodian,
  railiance, markitect, coulomb_social, personhood, capabilities, canon,
  citation_evidence, helix_forge, inter_hub, netkingdom, stack,
  vergabe_teilnahme, whynot).
- **the-custodian** `canon/projects/` froze at the original **6 founding charters**.

Concrete discrepancies to resolve as part of this work:

- ~18 Gitea repos are **unregistered** (e.g. audit-core, binect-chrome, binect-js,
  coordination-engine, direkt-vermittlung-de, human-resources, polycode-sim,
  ralph-workplan, repo-seed, tegwick-control, tele-mcp, testdrive-jsui,
  timeline-svg, vantage-point, whynot-control, whynot-design).
- **Phantom / renamed** registrations: `markitect-project` (registered) vs
  `markitect-main` (Gitea), likely a rename; `railiance-bootstrap` and
  `railiance-hosts` registered but absent from Gitea.
- **Duplicate domain**: `vergabe_teilnahme` looks like a second registration of
  `vergabe-teilnahme` (already under coulomb_social).
- **Empty domain**: `personhood` has a charter and topic but no repos.
- **Naming drift**: `coulomb.social`/`coulomb_social`, `foerster-capabilities`/`capabilities`.

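Most of the discrepancy classes above reduce to set comparisons between the Gitea inventory and the registered set. The following is a minimal sketch of that idea, not the review's actual tooling: the function names `reconcile`, `normalise`, and `naming_drift` are hypothetical, and the sample lists hold only a few names taken from the bullets above rather than the full inventory.

```python
def reconcile(gitea_repos, registered_repos, excluded=frozenset()):
    """Compare hosted and registered repo names as plain sets."""
    hosted = set(gitea_repos) - set(excluded)
    registered = set(registered_repos)
    return {
        # hosted on Gitea but missing from the hub
        "unregistered": sorted(hosted - registered),
        # registered in the hub but absent from Gitea (phantom or renamed)
        "phantom": sorted(registered - hosted),
    }


def normalise(name):
    """Collapse case and underscore/hyphen variants (an assumed drift rule)."""
    return name.lower().replace("_", "-")


def naming_drift(names):
    """Group names that collapse to the same normalised slug."""
    groups = {}
    for name in names:
        groups.setdefault(normalise(name), set()).add(name)
    return {slug: sorted(variants)
            for slug, variants in groups.items() if len(variants) > 1}


# Illustrative sample only, using names mentioned in this section.
gitea = ["markitect-main", "audit-core", "state-hub"]
registered = ["markitect-project", "railiance-hosts", "state-hub"]
report = reconcile(gitea, registered)
drift = naming_drift(["vergabe_teilnahme", "vergabe-teilnahme", "coulomb_social"])
```

A set difference cannot tell a rename from a genuine phantom, so pairs such as `markitect-project` / `markitect-main` still need a human decision.
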
The new standard fixes the root cause: it separates *category* (work mode),
*domain* (intended market/user), *capability tags* (what it does), and *business
stake* (who cares), concerns that the current 14 "domains" conflate.

### Architecture decision: repo-anchored model, domain derived from classification

The standard's `domain` is a **fixed 14-value market vocabulary** (infotech,
financials, communication, consumer, health, industrials, energy, utilities,
materials, realestate, crypto, agents, space, government) that is *orthogonal* to
the Hub's current 14 coordination domains. Per the steering decision on
2026-06-22, the new market-domain vocabulary **replaces** the Hub's domain model
(rather than augmenting it or running a parallel two-axis model).

The current spine is `Domain → Topic → Workstream`, where `topics.domain_id` and
`workstreams.topic_id` are both **NOT NULL** and the 14 domains are seeded **1:1
with 14 topics** (a data convention: the schema actually allows many topics per
domain, but that has never been used). `workstreams.repo_id` and
`repo_goals.repo_id` already exist, but the *required* anchor is the soft,
hub-only `topic`, while the stable git-managed `repo` link is optional.

Per the 2026-06-22 steering decision, this redesign **flips the polarity**: the
**repo becomes the primary anchor** for workplans, and market-domain is
**derived** from the repo's `.repo-classification.yaml`, not stored as a separate
`topic`/`domain` parent. Concretely:

- `workstreams.repo_id` becomes the **required** anchor; `topic_id` is demoted to
  optional (or `topic` is retired); see T10.
- Market-domain is computed from `repo → classification.domain`; the standalone
  `topics.domain_id` / `managed_repos.domain_id` spine is removed.
- `RepoGoal` (already repo-anchored) becomes the goal primitive; `DomainGoal`
  becomes a thin strategic rollup keyed by the 14 market domains.
- **Cross-repo workplans** anchor to a dedicated **project repo** that retires to
  archive on completion, with results living on in the modified product repos;
  see **ADR-005** and Open Questions D1/D1a.

This is consistent with ADR-001: the spine becomes the git-managed repo plus its
committed classification file, so the Hub stays fully rebuildable from repo-owned
files. It is a **breaking migration** of the coordination spine (T04/T05).

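The flipped polarity can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Hub's schema: the `Repo` and `Workplan` classes, the `market_domain` property, and `rollup_by_domain` are hypothetical names, the classification is assumed to be an already-parsed mapping with a `domain` key, and `KAIZEN-WP-0001` is an invented workplan id used only for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    slug: str
    classification: dict  # parsed .repo-classification.yaml (assumed shape)


@dataclass(frozen=True)
class Workplan:
    id: str
    repo: Repo                   # required anchor (was optional before)
    topic: Optional[str] = None  # demoted to an optional tag

    @property
    def market_domain(self) -> str:
        # Derived on read from the repo's classification, never stored.
        return self.repo.classification["domain"]


def rollup_by_domain(workplans):
    """Group workplan ids by derived market domain (a DomainGoal-style rollup)."""
    rollup = {}
    for wp in workplans:
        rollup.setdefault(wp.market_domain, []).append(wp.id)
    return rollup


custodian = Repo("the-custodian", {"category": "research", "domain": "infotech"})
state_hub = Repo("state-hub", {"category": "tooling", "domain": "infotech"})
kaizen = Repo("kaizen-agentic", {"category": "tooling", "domain": "agents"})

plans = [
    Workplan("CUST-WP-0050", custodian, topic="custodian"),
    Workplan("STATE-WP-0065", state_hub),
    Workplan("KAIZEN-WP-0001", kaizen),  # hypothetical id
]
by_domain = rollup_by_domain(plans)
```

Because the domain is read through the repo, reclassifying a repo moves all of its workplans in the rollup without touching any workplan record.
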
## Scope

In scope:

- Promote and steward the standard as custodian canon (done: the standard now
  lives at `canon/standards/repo-classification-standard_v1.0.md`).
- A single machine-readable allowed-values source derived from the standard,
  consumed by both the per-repo files and the Hub validator.
- A committed `.repo-classification.yaml` for every active repo (agent-assisted
  first pass, human-reviewed), authored in each repo.
- State Hub schema/model redesign replacing the domain model with the 14 market
  domains and storing the full classification on `managed_repos`.
- A reversible data migration re-homing existing topics/workstreams/goals/
  decisions/charters and resolving the discrepancies listed above.
- Auto-registration tooling (bulk, idempotent) that reads classification files
  from local checkouts or the Gitea API and registers/reclassifies repos.
- Updates to dashboard, consistency checker, MCP/REST surface, and orientation
  docs to the new taxonomy.

In scope (added 2026-06-22):

- Re-anchor workplans to repos (`repo_id` required, `topic` optional/retired) and
  derive market-domain from classification (T04/T10).
- Rename `workstream → workplan` across schema, API, and MCP so the Hub
  vocabulary matches the repo files and current usage (T10).

Out of scope:

- Re-architecting task-level semantics beyond what the re-anchor and rename force.
- Changing the Gitea hosting model or repo contents beyond adding the
  classification file (and, for cross-repo efforts, creating project repos per
  ADR-005).
- Classifying throwaway/forked/non-ecosystem repos (explicit exclusion list).

## Repo boundary

This is the **custodian driving/coordination workplan** (it owns the canon
standard and the portfolio decision), consistent with how `CUST-WP-0043` drove
State Hub work. Implementation tasks **T04–T08 execute in `/home/worsch/state-hub`**
and should be re-homed as a state-hub-local workplan once this plan is approved;
per-repo classification files (T02/T03) are authored in each target repo. The
hub remains a read/index model fed by repo-owned files (ADR-001).

## Tasks

### Phase 1 - Standard as a validation source

### T01 - Derive machine-readable allowed-values from the standard

```task
id: CUST-WP-0050-T01
status: done
priority: high
state_hub_task_id: "d978b1f3-4eca-4a17-835b-2c25d13cae22"
```

Extract the standard's controlled vocabularies (5 categories, 14 domains, the
business_stake and business_mechanics enums, and the recommended capability
families) into a single machine-readable artefact (e.g.
`canon/standards/repo-classification.allowed.yaml`) that both the per-repo
`.repo-classification.yaml` linter and the State Hub validator import.

Done when a single allowed-values file exists, is referenced by the standard, and
a small validator can check a `.repo-classification.yaml` against it.

**Delivered (2026-06-22):** `canon/standards/repo-classification.allowed.yaml`
(categories, domains, business_stake, business_mechanics, capability families,
guidance bounds); referenced from the standard §12; validator
`tools/validate_repo_classification.py` (stdlib + PyYAML) with `--self-test`
(PASS); it checks category/domain enums, secondary-domain rules, kebab-case tags,
and stake/mechanics enums.

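To make the validation idea concrete, here is a minimal sketch of an allowed-values check. It is not the delivered `tools/validate_repo_classification.py`: the `validate` function and the `ALLOWED` mapping are hypothetical, the category set lists only the categories named in this workplan (so it is incomplete), the secondary-domain rule shown (same vocabulary, must not repeat the primary) is an assumption, and the stake/mechanics enums are left out because their values are not given here. The input is an already-parsed mapping, so no YAML library is needed.

```python
import re

KEBAB = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# Partial, illustrative vocabulary; the real source is the allowed-values file.
ALLOWED = {
    "categories": {"research", "project", "tooling", "product"},
    "domains": {
        "infotech", "financials", "communication", "consumer", "health",
        "industrials", "energy", "utilities", "materials", "realestate",
        "crypto", "agents", "space", "government",
    },
}


def validate(doc, allowed=ALLOWED):
    """Return a list of error strings; an empty list means the doc passed."""
    errors = []
    if doc.get("category") not in allowed["categories"]:
        errors.append(f"category: {doc.get('category')!r} is not allowed")
    primary = doc.get("domain")
    if primary not in allowed["domains"]:
        errors.append(f"domain: {primary!r} is not allowed")
    for secondary in doc.get("secondary_domains", []):
        if secondary not in allowed["domains"]:
            errors.append(f"secondary_domains: {secondary!r} is not allowed")
        elif secondary == primary:  # assumed rule, see lead-in
            errors.append(f"secondary_domains: {secondary!r} repeats the primary")
    for tag in doc.get("capability_tags", []):
        if not KEBAB.fullmatch(tag):
            errors.append(f"capability_tags: {tag!r} is not kebab-case")
    return errors


good = {"category": "tooling", "domain": "agents",
        "secondary_domains": ["infotech"], "capability_tags": ["llm-routing"]}
bad = {"category": "tooling", "domain": "custodian",
       "capability_tags": ["LLM_Routing"]}
good_errors = validate(good)
bad_errors = validate(bad)
```

Returning a list of messages rather than raising on the first failure lets a bulk run report every problem in a file at once.
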
### Phase 2 - Classify the portfolio (repo-owned source of truth)

### T02 - Classify custodian-owned repos

```task
id: CUST-WP-0050-T02
status: done
priority: high
state_hub_task_id: "b7edfbb5-483f-4600-9356-8f885c78ce58"
```

Author and human-review `.repo-classification.yaml` for the custodian-domain
repos (the-custodian, state-hub, hub-core, inter-hub, activity-core, issue-core,
kaizen-agentic, llm-connect, ops-bridge, ops-warden, email-connect) using the
standard's §16 agent prompt as a first pass.

Done when each custodian repo has a committed file that validates against T01 and
has been reviewed by a human.

**Progress (2026-06-22):** all 11 custodian-domain repos now carry a committed,
validated `.repo-classification.yaml` (first-pass `classified_by: agent`).
Following the 2026-06-22 decision, a new **`tooling`** category (between `project`
and `product`) was added to the standard for reusable internal
tooling/infrastructure, and the nine tooling repos were reclassified to it.
Resulting classifications for all 11 repos:
the-custodian (research·infotech), inter-hub (research·infotech), state-hub
(tooling·infotech), hub-core (tooling·infotech), activity-core (tooling·infotech),
issue-core (tooling·infotech), kaizen-agentic (tooling·agents), llm-connect
(tooling·agents), ops-bridge (tooling·infotech), ops-warden (tooling·infotech),
email-connect (tooling·infotech). Commits are **local-only** in each repo (not
yet pushed to Gitea).

**Done (2026-06-22):** human review complete. Bernd confirmed the
agents-vs-infotech primary-domain choice, keeping both kaizen-agentic and
llm-connect as `agents` primary (`infotech` secondary). All 11 files flipped to
`classified_by: human` and re-validated clean against T01. Task **done**.

### T03 - Classify the full Gitea inventory (DROPPED)

```task
id: CUST-WP-0050-T03
status: cancel
priority: high
state_hub_task_id: "81489716-61ef-4207-ab8a-5877843281de"
```

**Dropped 2026-06-22.** T03 conflated *authoring classification files* with
*registering repos*. Doing a ~70-repo PR storm before the vocabulary and schema
are proven is premature and high-blast-radius, and registering the ~18
unregistered repos now would land them in the **old** domain model, creating
legacy only to clean it up. Classifying + registering the remaining inventory is
deferred to **T11**, executed **under the new model after cutover**. The 11
custodian fixtures (T02) plus T01 are sufficient to build and prove the redesign.

### Phase 3–4 - RE-HOMED to STATE-WP-0065 (state-hub)

Per this plan's repo boundary and the 2026-06-22 decision, the implementation of
the State Hub redesign now lives in a **state-hub-local workplan**:
`state-hub/workplans/STATE-WP-0065-repo-anchored-classification-spine.md`
(workstream `8dc7d106-11e2-41df-b512-89ed69d2a65f`). CUST-WP-0050 remains the
**coordination driver** (canon standard, decisions D1/D1a, ADR-005). The original
implementation tasks below are **cancelled here** (re-homed, not abandoned); the
efficient regrouping merges the three spine-rewriting tasks into one migration:

| Was (CUST-WP-0050) | Now (STATE-WP-0065) |
| --- | --- |
| T04 schema + T05 data migration + T10 re-anchor/rename | **P1** single Alembic spine migration |
| T04/T10 API + validation surface | **P2** API / MCP / validation |
| T06 auto-registration | **P3** auto-registration tooling |
| T07 reclassify existing | folded into **P3** (lazy, as committed files appear) |
| T08 surfaces + T09 cutover | **P4** surfaces & cutover |

```task
id: CUST-WP-0050-T04
status: cancel
priority: high
state_hub_task_id: "b61f6267-c2b2-4325-95fa-30ee899ce7d1"
```
Re-homed → STATE-WP-0065 P1 (schema: domains→14 market domains + classification on `managed_repos`).

```task
id: CUST-WP-0050-T05
status: cancel
priority: high
state_hub_task_id: "171fa385-4d78-41ea-b749-ac3f9082fe47"
```
Re-homed → STATE-WP-0065 P1 (data migration + discrepancy resolution, same window as schema).

```task
id: CUST-WP-0050-T06
status: cancel
priority: high
state_hub_task_id: "6ae14007-d6d2-4395-814e-ace91486a953"
```
Re-homed → STATE-WP-0065 P3 (`register-from-classification` bulk/idempotent tooling).

```task
id: CUST-WP-0050-T07
status: cancel
priority: medium
state_hub_task_id: "6411bf3f-9de2-4bcd-9ffe-6209cda6ba93"
```
Re-homed → STATE-WP-0065 P3 (lazy reclassification of existing registrations as committed files appear).

```task
id: CUST-WP-0050-T08
status: cancel
priority: medium
state_hub_task_id: "09951aec-2960-4c50-b73d-4e2e7bd285c9"
```
Re-homed → STATE-WP-0065 P4 (dashboard, consistency rule, MCP/REST filters, docs).

```task
id: CUST-WP-0050-T09
status: cancel
priority: medium
state_hub_task_id: "babbb80a-c52d-4ec2-b217-2f6196a2e5f3"
```
Re-homed → STATE-WP-0065 P4 (cutover, verification, retire old model).

```task
id: CUST-WP-0050-T10
status: cancel
priority: high
state_hub_task_id: "bee16416-a67f-4155-93d7-09f278daa04f"
```
Re-homed → STATE-WP-0065 P1 (re-anchor `repo_id` required + `workstream → workplan` rename, merged into the spine migration).

### Phase 4 (custodian) - Post-cutover inventory

### T11 - Classify & register remaining Gitea inventory (post-cutover)

```task
id: CUST-WP-0050-T11
status: done
priority: medium
state_hub_task_id: "d8895c58-a930-42aa-8207-9babf9ba572a"
```

Replaces dropped T03. **After** STATE-WP-0065 cutover proves the new model,
author `.repo-classification.yaml` for the remaining active Gitea repos (the ~18
unregistered + any not yet migrated) and bulk-register them via the
STATE-WP-0065 P3 tooling, **under the new model** with no legacy detour. Maintain an
explicit **exclusion list** (fork `tegwick/the-custodian`,
`lando_worsch/python-snake`, archived `test_domain_v2`, inactive repos).

Done when every non-excluded active Gitea repo has a committed, validated
classification file and a `managed_repo` row under the new taxonomy (or is on the
recorded exclusion list).

**Done (2026-06-22):**

- Exclusion list: `canon/standards/repo-classification.exclusions.yaml` (forks,
  archived phantoms, templates/sandboxes, Gitea repos pending local checkout).
- Batch author: `tools/batch_author_repo_classifications.py`, an agent first pass
  for 51 local repos (skips 10 human-reviewed custodian fixtures); all validated
  against T01; committed in each target repo.
- Registration: 7 newly registered (`coordination-engine`, `human-resources`,
  `markitect-main`, `repo-seed`, `tegwick-control`, `vantage-point`,
  `whynot-control`); `make register-from-classification-all` updated 43 existing
  rows from `classified_by: migration` → `agent` (0 invalid).
- **Coverage:** 63 active `managed_repos`: 11 `human`, 51 `agent`, 1 deferred
  (`marki-docx`, hub-only, on exclusion list pending clone). Excluded locally:
  `hub-core-seed`, `sand-boxer`. Archived hub rows (4) unchanged.

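The "bulk, idempotent" property of the registration step can be sketched as an upsert keyed by repo slug. This is an illustration only, not the STATE-WP-0065 P3 tooling: the function below is a hypothetical stand-in, the registry is a plain dict instead of the `managed_repos` table, and the slugs `repo-a`, `repo-b`, and `sandbox-x` with their classification values are invented placeholders.

```python
def register_from_classification(registry, classifications, excluded=()):
    """Upsert one row per repo slug; a second run changes nothing."""
    summary = {"registered": [], "updated": [], "unchanged": [], "skipped": []}
    for slug in sorted(classifications):
        doc = classifications[slug]
        if slug in excluded:
            summary["skipped"].append(slug)  # on the exclusion list
        elif slug not in registry:
            registry[slug] = dict(doc)       # new registration
            summary["registered"].append(slug)
        elif registry[slug] != doc:
            registry[slug] = dict(doc)       # reclassify an existing row
            summary["updated"].append(slug)
        else:
            summary["unchanged"].append(slug)
    return summary


# Placeholder data: one existing row still carrying its migration-era value.
registry = {
    "repo-a": {"category": "tooling", "domain": "infotech",
               "classified_by": "migration"},
}
classifications = {
    "repo-a": {"category": "tooling", "domain": "infotech",
               "classified_by": "agent"},
    "repo-b": {"category": "product", "domain": "consumer",
               "classified_by": "agent"},
    "sandbox-x": {"category": "project", "domain": "infotech",
                  "classified_by": "agent"},
}
first = register_from_classification(registry, classifications, {"sandbox-x"})
second = register_from_classification(registry, classifications, {"sandbox-x"})
```

Comparing the stored row with the committed file before writing is what makes a re-run a no-op, which is the behaviour the "Done when" condition relies on for repeated bulk runs.
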
## Open Questions / Decisions

- **D1 (RESOLVED 2026-06-22): the repo is the primary anchor.** Workplans bind to
  repos (`repo_id` required); market-domain is *derived* from the repo's
  classification; `topic`/`domain` stop being the spine (`topic` retires or
  becomes an optional cross-repo tag). This supersedes the earlier "keep topic as
  an independent coordination unit" proposal. Implemented by T04/T10.
- **D1a (open, follows from D1): anchor for cross-repo workplans.** Per **ADR-005**,
  a complex cross-repo effort gets its own **project repo** (`category: project`)
  as the anchor, retired to archive on completion with results living in the
  modified product repos. Open sub-point: the project-repo **naming convention**
  (e.g. `proj-<slug>` vs a dedicated grouping) and the archival trigger details.
- **D2: classification ownership/approval.** Who approves each repo's
  `.repo-classification.yaml`: the per-repo owner, or central custodian review?
- **D3 (RESOLVED 2026-06-22): exclusion list.** Recorded at
  `canon/standards/repo-classification.exclusions.yaml`, covering forks/personal repos,
  archived phantoms, template/sandbox checkouts, and Gitea slugs pending local
  checkout (incl. `marki-docx`).
- **D4: behavioural vs descriptive.** Do `secondary_domains` / `capability_tags`
  / `business_stake` drive any Hub behaviour initially, or are they descriptive
  until a later phase?

## Risks

- **Breaking-migration blast radius**: topics/workstreams/goals/decisions and
  charter `topic_id` references all move; mitigate with a reviewed dry-run and a
  tested downgrade (T05).
- **Cross-repo coordination**: T03 as originally scoped touches ~70 repos via PRs;
  sequence behind T01/T02 so the vocabulary is stable first. (T03 was dropped on
  2026-06-22 and its inventory work deferred to T11.)
- **Consistency-checker coupling**: existing C-rules assume the current domain
  model; update alongside (T08) to avoid mass false positives.
- **Boundary drift**: keep implementation in `state-hub`; this plan coordinates.