Start CUST-WP-0050: T01 allowed-values + validator; classify the-custodian
Activate the workplan and complete T01: add the machine-readable controlled vocabulary canon/standards/repo-classification.allowed.yaml (categories, domains, business_stake, business_mechanics, capability families, guidance), reference it from the standard §12, and add tools/validate_repo_classification.py (stdlib + PyYAML, --self-test PASS). Begin T02: author the-custodian/.repo-classification.yaml (research · infotech · agents), which validates clean. classified_by: agent, pending human review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
38
.repo-classification.yaml
Normal file
38
.repo-classification.yaml
Normal file
@@ -0,0 +1,38 @@
|
||||
repo_classification:
|
||||
standard: Repo Classification Standard
|
||||
version: "1.0"
|
||||
classified_at: "2026-06-22"
|
||||
classified_by: agent
|
||||
|
||||
# the-custodian is the governance/continuity substrate: canon, standards,
|
||||
# ADRs, charters, memory, and cross-domain coordination scaffolding.
|
||||
category: research
|
||||
domain: infotech
|
||||
secondary_domains:
|
||||
- agents
|
||||
|
||||
capability_tags:
|
||||
- governance
|
||||
- knowledge
|
||||
- coordination
|
||||
- policy
|
||||
- documentation
|
||||
|
||||
business_stake:
|
||||
- technology
|
||||
- operations
|
||||
- intelligence
|
||||
- execution
|
||||
|
||||
business_mechanics:
|
||||
- intention
|
||||
- control
|
||||
- coordination
|
||||
- adaptation
|
||||
|
||||
notes: >
|
||||
Primary domain is infotech (the intended users are the ecosystem's
|
||||
developers and agents); agents is a secondary domain because the repo is
|
||||
agent-coordination infrastructure. Classified as research because its core
|
||||
output is canon, standards, and decision records rather than a deployable
|
||||
product. First-pass agent classification pending human review (CUST-WP-0050 T02).
|
||||
@@ -682,6 +682,10 @@ Add only the mechanics that materially apply.
|
||||
|
||||
## 12. Validation Checklist
|
||||
|
||||
The controlled vocabularies are maintained in machine-readable form at
|
||||
`canon/standards/repo-classification.allowed.yaml`. Validate a file with
|
||||
`tools/validate_repo_classification.py <path-to-.repo-classification.yaml>`.
|
||||
|
||||
A repo classification is valid when:
|
||||
|
||||
- [ ] `category` exists and has exactly one allowed value.
|
||||
|
||||
111
canon/standards/repo-classification.allowed.yaml
Normal file
111
canon/standards/repo-classification.allowed.yaml
Normal file
@@ -0,0 +1,111 @@
|
||||
# Machine-readable allowed-values for the Repo Classification Standard.
|
||||
#
|
||||
# Single source of truth for the standard's controlled vocabularies, derived
|
||||
# from canon/standards/repo-classification-standard_v1.0.md. Consumed by:
|
||||
# - the per-repo .repo-classification.yaml linter (tools/validate_repo_classification.py)
|
||||
# - the State Hub registration validator (CUST-WP-0050 T04)
|
||||
#
|
||||
# When the standard's vocabularies change, update this file and bump `version`
|
||||
# to match the standard version. CUST-WP-0050 T01.
|
||||
|
||||
standard: "Repo Classification Standard"
|
||||
version: "1.0"
|
||||
canon_id: "canon-repo-classification"
|
||||
|
||||
# category — exactly 1 required (§5)
|
||||
categories:
|
||||
- experimental
|
||||
- research
|
||||
- project
|
||||
- product
|
||||
- business
|
||||
|
||||
# domain / secondary_domains — primary exactly 1; secondaries 0..n (§6)
|
||||
domains:
|
||||
- infotech
|
||||
- financials
|
||||
- communication
|
||||
- consumer
|
||||
- health
|
||||
- industrials
|
||||
- energy
|
||||
- utilities
|
||||
- materials
|
||||
- realestate
|
||||
- crypto
|
||||
- agents
|
||||
- space
|
||||
- government
|
||||
|
||||
# business_stake — 0..n; 2..6 recommended (§8)
|
||||
business_stake:
|
||||
- execution
|
||||
- intelligence
|
||||
- finance
|
||||
- legal
|
||||
- sales
|
||||
- experience
|
||||
- technology
|
||||
- operations
|
||||
- product
|
||||
- people
|
||||
- procurement
|
||||
- sustainability
|
||||
- automation
|
||||
|
||||
# business_mechanics — 0..n, optional (§9)
|
||||
business_mechanics:
|
||||
- intention
|
||||
- control
|
||||
- coordination
|
||||
- operation
|
||||
- adaptation
|
||||
|
||||
# capability_tags are intentionally OPEN-ENDED (§7): lowercase kebab-case, not
|
||||
# restricted to this set. The families below are the standard's recommended
|
||||
# canonical tags — used to warn on likely synonyms/typos, never to reject.
|
||||
capability_families:
|
||||
identity_and_access:
|
||||
- identity
|
||||
- authentication
|
||||
- authorization
|
||||
- access-control
|
||||
- user-management
|
||||
- tenancy
|
||||
knowledge_and_evidence:
|
||||
- knowledge
|
||||
- citations
|
||||
- evidence
|
||||
- source-management
|
||||
- traceability
|
||||
- documentation
|
||||
platform_and_operations:
|
||||
- platform
|
||||
- deployment
|
||||
- operations
|
||||
- observability
|
||||
- feature-control
|
||||
- configuration
|
||||
- orchestration
|
||||
market_and_coordination:
|
||||
- marketplace
|
||||
- pricing
|
||||
- reputation
|
||||
- challenges
|
||||
- bounties
|
||||
- collaboration
|
||||
- coordination
|
||||
governance_and_control:
|
||||
- governance
|
||||
- policy
|
||||
- compliance
|
||||
- risk
|
||||
- audit
|
||||
- control
|
||||
|
||||
# Validation guidance (advisory bounds the linter applies as warnings)
|
||||
guidance:
|
||||
secondary_domains_max: 3
|
||||
business_stake_recommended_min: 2
|
||||
business_stake_recommended_max: 6
|
||||
capability_tag_pattern: "^[a-z0-9]+(-[a-z0-9]+)*$"
|
||||
186
tools/validate_repo_classification.py
Normal file
186
tools/validate_repo_classification.py
Normal file
@@ -0,0 +1,186 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Validate a .repo-classification.yaml against the Repo Classification Standard.
|
||||
|
||||
Single small linter shared by repo authors and (later) the State Hub registration
|
||||
validator. It checks a repo's classification file against the controlled
|
||||
vocabularies in canon/standards/repo-classification.allowed.yaml.
|
||||
|
||||
Usage:
|
||||
validate_repo_classification.py <path-to-.repo-classification.yaml> ...
|
||||
validate_repo_classification.py --self-test
|
||||
|
||||
Exit code 0 = all files valid (warnings allowed); 1 = at least one invalid.
|
||||
|
||||
CUST-WP-0050 T01. Depends on PyYAML (stdlib + pyyaml only).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import yaml
|
||||
|
||||
ALLOWED_PATH = (
|
||||
Path(__file__).resolve().parent.parent
|
||||
/ "canon"
|
||||
/ "standards"
|
||||
/ "repo-classification.allowed.yaml"
|
||||
)
|
||||
|
||||
|
||||
def load_allowed(path: Path = ALLOWED_PATH) -> dict:
|
||||
with path.open() as fh:
|
||||
return yaml.safe_load(fh)
|
||||
|
||||
|
||||
def _known_capability_tags(allowed: dict) -> set[str]:
|
||||
tags: set[str] = set()
|
||||
for fam in (allowed.get("capability_families") or {}).values():
|
||||
tags.update(fam or [])
|
||||
return tags
|
||||
|
||||
|
||||
def validate(doc: dict, allowed: dict) -> tuple[list[str], list[str]]:
|
||||
"""Return (errors, warnings) for a parsed classification document."""
|
||||
errors: list[str] = []
|
||||
warnings: list[str] = []
|
||||
|
||||
block = doc.get("repo_classification") if isinstance(doc, dict) else None
|
||||
if not isinstance(block, dict):
|
||||
return (["missing top-level `repo_classification:` mapping"], [])
|
||||
|
||||
categories = set(allowed["categories"])
|
||||
domains = set(allowed["domains"])
|
||||
stakes = set(allowed["business_stake"])
|
||||
mechanics = set(allowed["business_mechanics"])
|
||||
guidance = allowed.get("guidance", {})
|
||||
pattern = re.compile(guidance.get("capability_tag_pattern", r"^[a-z0-9]+(-[a-z0-9]+)*$"))
|
||||
|
||||
# category — required, exactly one allowed value
|
||||
category = block.get("category")
|
||||
if category is None:
|
||||
errors.append("`category` is required")
|
||||
elif category not in categories:
|
||||
errors.append(f"`category` '{category}' not in {sorted(categories)}")
|
||||
|
||||
# domain — required, exactly one allowed value
|
||||
domain = block.get("domain")
|
||||
if domain is None:
|
||||
errors.append("`domain` is required")
|
||||
elif domain not in domains:
|
||||
errors.append(f"`domain` '{domain}' not in allowed domains")
|
||||
|
||||
# secondary_domains — 0..n allowed domains, excluding primary, no dups
|
||||
secondary = block.get("secondary_domains") or []
|
||||
if not isinstance(secondary, list):
|
||||
errors.append("`secondary_domains` must be a list")
|
||||
secondary = []
|
||||
for d in secondary:
|
||||
if d not in domains:
|
||||
errors.append(f"secondary domain '{d}' not in allowed domains")
|
||||
if d == domain:
|
||||
errors.append(f"secondary domain '{d}' repeats the primary domain")
|
||||
if len(secondary) != len(set(secondary)):
|
||||
errors.append("`secondary_domains` contains duplicates")
|
||||
smax = guidance.get("secondary_domains_max", 3)
|
||||
if len(secondary) > smax:
|
||||
warnings.append(f"{len(secondary)} secondary_domains exceeds recommended max {smax}")
|
||||
|
||||
# capability_tags — open-ended, kebab-case; warn on unknown/synonym
|
||||
tags = block.get("capability_tags") or []
|
||||
if not isinstance(tags, list):
|
||||
errors.append("`capability_tags` must be a list")
|
||||
tags = []
|
||||
known = _known_capability_tags(allowed)
|
||||
for t in tags:
|
||||
if not isinstance(t, str) or not pattern.match(t):
|
||||
errors.append(f"capability_tag '{t}' is not lowercase kebab-case")
|
||||
elif t not in known:
|
||||
warnings.append(f"capability_tag '{t}' is not a recommended family tag (allowed, check for synonym)")
|
||||
|
||||
# business_stake — 0..n allowed; recommend 2..6
|
||||
stake = block.get("business_stake") or []
|
||||
if not isinstance(stake, list):
|
||||
errors.append("`business_stake` must be a list")
|
||||
stake = []
|
||||
for s in stake:
|
||||
if s not in stakes:
|
||||
errors.append(f"business_stake '{s}' not in {sorted(stakes)}")
|
||||
if stake:
|
||||
lo = guidance.get("business_stake_recommended_min", 2)
|
||||
hi = guidance.get("business_stake_recommended_max", 6)
|
||||
if not (lo <= len(stake) <= hi):
|
||||
warnings.append(f"{len(stake)} business_stake values; {lo}-{hi} recommended")
|
||||
|
||||
# business_mechanics — 0..n allowed
|
||||
mech = block.get("business_mechanics") or []
|
||||
if not isinstance(mech, list):
|
||||
errors.append("`business_mechanics` must be a list")
|
||||
mech = []
|
||||
for m in mech:
|
||||
if m not in mechanics:
|
||||
errors.append(f"business_mechanics '{m}' not in {sorted(mechanics)}")
|
||||
|
||||
return errors, warnings
|
||||
|
||||
|
||||
def validate_file(path: Path, allowed: dict) -> bool:
|
||||
try:
|
||||
doc = yaml.safe_load(path.read_text())
|
||||
except (OSError, yaml.YAMLError) as exc:
|
||||
print(f"FAIL {path}: cannot read/parse ({exc})")
|
||||
return False
|
||||
errors, warnings = validate(doc, allowed)
|
||||
for w in warnings:
|
||||
print(f"warn {path}: {w}")
|
||||
if errors:
|
||||
for e in errors:
|
||||
print(f"FAIL {path}: {e}")
|
||||
return False
|
||||
print(f"ok {path}")
|
||||
return True
|
||||
|
||||
|
||||
def self_test(allowed: dict) -> bool:
|
||||
good = {
|
||||
"repo_classification": {
|
||||
"category": "research",
|
||||
"domain": "infotech",
|
||||
"secondary_domains": ["agents"],
|
||||
"capability_tags": ["governance", "knowledge", "coordination"],
|
||||
"business_stake": ["technology", "operations", "intelligence"],
|
||||
"business_mechanics": ["intention", "control", "coordination"],
|
||||
}
|
||||
}
|
||||
bad = {
|
||||
"repo_classification": {
|
||||
"category": "platform", # not a category (it's a tag)
|
||||
"domain": "knowledge", # not a market domain
|
||||
"secondary_domains": ["infotech", "infotech"],
|
||||
"capability_tags": ["Stuff", "access-control"],
|
||||
"business_stake": ["technology", "wizardry"],
|
||||
"business_mechanics": ["teleportation"],
|
||||
}
|
||||
}
|
||||
ge, _ = validate(good, allowed)
|
||||
be, _ = validate(bad, allowed)
|
||||
ok = (ge == []) and (len(be) >= 5)
|
||||
print(f"self-test: good_errors={len(ge)} bad_errors={len(be)} -> {'PASS' if ok else 'FAIL'}")
|
||||
return ok
|
||||
|
||||
|
||||
def main(argv: list[str]) -> int:
|
||||
allowed = load_allowed()
|
||||
args = argv[1:]
|
||||
if not args or args == ["--self-test"]:
|
||||
return 0 if self_test(allowed) else 1
|
||||
all_ok = True
|
||||
for a in args:
|
||||
if not validate_file(Path(a), allowed):
|
||||
all_ok = False
|
||||
return 0 if all_ok else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main(sys.argv))
|
||||
@@ -4,13 +4,14 @@ type: workplan
|
||||
title: "Repo Classification & State Hub Registration Redesign"
|
||||
domain: custodian
|
||||
repo: the-custodian
|
||||
status: proposed
|
||||
status: active
|
||||
owner: custodian
|
||||
topic_slug: custodian
|
||||
planning_priority: high
|
||||
planning_order: 50
|
||||
created: "2026-06-22"
|
||||
updated: "2026-06-22"
|
||||
started: "2026-06-22"
|
||||
state_hub_workstream_id: "9f031f48-8de8-48b6-8e69-d2d83ad70a7a"
|
||||
---
|
||||
|
||||
@@ -150,7 +151,7 @@ hub remains a read/index model fed by repo-owned files (ADR-001).
|
||||
|
||||
```task
|
||||
id: CUST-WP-0050-T01
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "d978b1f3-4eca-4a17-835b-2c25d13cae22"
|
||||
```
|
||||
@@ -164,13 +165,20 @@ families) into a single machine-readable artefact (e.g.
|
||||
Done when a single allowed-values file exists, is referenced by the standard, and
|
||||
a small validator can check a `.repo-classification.yaml` against it.
|
||||
|
||||
**Delivered (2026-06-22):** `canon/standards/repo-classification.allowed.yaml`
|
||||
(categories, domains, business_stake, business_mechanics, capability families,
|
||||
guidance bounds); referenced from the standard §12; validator
|
||||
`tools/validate_repo_classification.py` (stdlib + PyYAML) with `--self-test`
|
||||
(PASS) — checks category/domain enums, secondary-domain rules, kebab-case tags,
|
||||
and stake/mechanics enums.
|
||||
|
||||
### Phase 2 — Classify the portfolio (repo-owned source of truth)
|
||||
|
||||
### T02 - Classify custodian-owned repos
|
||||
|
||||
```task
|
||||
id: CUST-WP-0050-T02
|
||||
status: todo
|
||||
status: in_progress
|
||||
priority: high
|
||||
state_hub_task_id: "b7edfbb5-483f-4600-9356-8f885c78ce58"
|
||||
```
|
||||
@@ -183,6 +191,12 @@ standard's §16 agent prompt as a first pass.
|
||||
Done when each custodian repo has a committed file that validates against T01 and
|
||||
has been reviewed by a human.
|
||||
|
||||
**Progress (2026-06-22):** `the-custodian/.repo-classification.yaml` authored
|
||||
(category: research · domain: infotech · secondary: agents) and validates clean;
|
||||
flagged `classified_by: agent` pending human review. Remaining 10 custodian repos
|
||||
(state-hub, hub-core, inter-hub, activity-core, issue-core, kaizen-agentic,
|
||||
llm-connect, ops-bridge, ops-warden, email-connect) still to classify.
|
||||
|
||||
### T03 - Classify the full Gitea inventory
|
||||
|
||||
```task
|
||||
|
||||
Reference in New Issue
Block a user