Implements CUST-WP-0007. Resolves inconsistencies I-1, I-2, I-5, I-6
identified in the GEMS audit (GenericEntityModellingSystem.md).
Pass 1 (e1f2a3b4c5d6): domain_id FK on extension_points and
technical_debt (replaces raw string column); repo_id FK on contributions.
Fixes domain-filtering bugs in EP/TD dashboard pages.
Pass 2 (f2a3b4c5d6e7): repo_id nullable FK on workstreams, aligning
the GEMS primary attachment with ADR-001 (repo > topic). Dashboard
pages updated to prefer repo->domain over topic->domain.
Pass 3 (a3b4c5d6e7f8): SBOMSnapshot container entity (GEMS Complex
between Repository and SBOMEntry). Ingest is now additive — each call
creates a new snapshot; history is retained. List/report endpoints
filter to latest snapshot per repo via _latest_snapshot_ids_subquery().
New endpoints: GET /sbom/snapshots/, GET /sbom/snapshots/{id}/.
Dashboard gains a Snapshot History section.
Also adds GEMS analysis artefacts: wiki/GEMS-StateHub-TypeRegistry.md,
wiki/GEMS-StateHub-SWOT.md, workplans/CUST-WP-0006 (analysis),
workplans/CUST-WP-0007 (migration, now completed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SWOT Analysis — Migrating State-Hub to GEMS
Evaluation of migrating the Custodian State Hub data store from its current
ad-hoc relational schema to the Generic Entity Modelling System (GEMS) as
defined in wiki/GenericEntityModellingSystem.md and instantiated in
wiki/GEMS-StateHub-TypeRegistry.md.
Created: 2026-03-02
Author: Custodian (analytical session)
Migration Options Under Consideration
Before the SWOT, three architectural options are in scope:
Option A — Full Generic Entity Model
A single entities table, an attachments junction table, and a JSONB payload column. This is a true GEMS implementation: all current typed tables are dissolved into the entity model.
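To make Option A concrete, here is a minimal sketch of the two core tables. It uses Python's built-in sqlite3 in place of PostgreSQL, so the JSONB payload is stored as JSON text. The table names, columns, and helper functions are assumptions for illustration, not an existing State Hub schema. A partial unique index can enforce at most one primary attachment per entity, but "exactly one" still has to be checked in application code.

```python
import json
import sqlite3
from typing import Iterable, Optional

# Hypothetical Option A layout. SQLite stands in for PostgreSQL, so the
# JSONB payload column is plain JSON text here.
SCHEMA = """
CREATE TABLE entities (
    id      TEXT PRIMARY KEY,
    type    TEXT NOT NULL,   -- e.g. 'Repository', 'Workstream'
    payload TEXT NOT NULL    -- JSON document (JSONB in PostgreSQL)
);
CREATE TABLE attachments (
    entity_id  TEXT NOT NULL REFERENCES entities(id),
    context_id TEXT NOT NULL REFERENCES entities(id),
    is_primary INTEGER NOT NULL CHECK (is_primary IN (0, 1)),
    PRIMARY KEY (entity_id, context_id)
);
-- At most one primary attachment per entity; "exactly one" needs app code.
CREATE UNIQUE INDEX one_primary_per_entity ON attachments(entity_id)
    WHERE is_primary = 1;
"""

def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def add_entity(conn: sqlite3.Connection, entity_id: str, entity_type: str,
               payload: dict, primary: Optional[str] = None,
               secondaries: Iterable[str] = ()) -> None:
    """Insert one entity plus its primary and secondary attachments."""
    conn.execute("INSERT INTO entities VALUES (?, ?, ?)",
                 (entity_id, entity_type, json.dumps(payload)))
    if primary is not None:
        conn.execute("INSERT INTO attachments VALUES (?, ?, 1)", (entity_id, primary))
    for context_id in secondaries:
        conn.execute("INSERT INTO attachments VALUES (?, ?, 0)", (entity_id, context_id))

def children_of(conn: sqlite3.Connection, context_id: str) -> list:
    """Entities whose primary attachment is context_id, with decoded payloads."""
    rows = conn.execute(
        """SELECT e.id, e.type, e.payload
           FROM attachments a JOIN entities e ON e.id = a.entity_id
           WHERE a.context_id = ? AND a.is_primary = 1
           ORDER BY e.id""", (context_id,))
    return [(i, t, json.loads(p)) for i, t, p in rows]
```

Every entity type shares these two tables, which is where both the uniformity (S1) and the loss of DB-level type checking (W2) come from.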
Option B — Typed-Table Approach with GEMS Constraints
Keep the typed tables (domains, topics, workstreams, etc.) but add:
- A universal entity_id abstraction layer
- An attachments junction table for secondary attachments
- Application-level GEMS constraint validation
- Fixes for all structural inconsistencies (I-1 through I-8 in CUST-WP-0006)
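The "application-level GEMS constraint validation" item in Option B could look roughly like the sketch below. A registry maps each entity type to its allowed primary and secondary container types, and a validator enforces the exactly-one-primary rule. The registry entries are illustrative placeholders; the authoritative mapping would come from wiki/GEMS-StateHub-TypeRegistry.md. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Illustrative entries only; the real contract lives in the type registry wiki page.
# entity type -> (allowed primary container types, allowed secondary types)
TYPE_REGISTRY: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Repository": ({"Domain"}, {"Topic"}),
    "Workstream": ({"Repository"}, {"Topic"}),  # ADR-001: repo is the primary
    "Task": ({"Workstream"}, {"Decision"}),
}

@dataclass
class Attachment:
    context_type: str
    context_id: str
    primary: bool

@dataclass
class Entity:
    type: str
    id: str
    attachments: List[Attachment] = field(default_factory=list)

def validate(entity: Entity) -> List[str]:
    """Return GEMS constraint violations for one entity (empty list = valid)."""
    if entity.type not in TYPE_REGISTRY:
        return [f"unknown entity type {entity.type!r}"]
    allowed_primary, allowed_secondary = TYPE_REGISTRY[entity.type]
    errors = []
    primaries = [a for a in entity.attachments if a.primary]
    if len(primaries) != 1:
        errors.append(f"expected exactly one primary attachment, found {len(primaries)}")
    for a in entity.attachments:
        allowed = allowed_primary if a.primary else allowed_secondary
        if a.context_type not in allowed:
            role = "primary" if a.primary else "secondary"
            errors.append(f"{a.context_type} is not an allowed {role} for {entity.type}")
    return errors
```

Because the registry is plain data, the same structure could also be rendered as documentation, which is the idea behind O5.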
Option C — Incremental Normalization (Pattern C from GEMS §9)
Fix the most critical inconsistencies (I-1, I-2, I-5) immediately and leave lesser items wrapped or deferred. No generic entity table is introduced.
SWOT Analysis
Strengths
S1 — Uniform modeling surface eliminates special-casing (all options)
Currently each entity type has bespoke FKs, bespoke routers, and bespoke MCP tools. GEMS gives a predictable pattern: every entity has a primary context and optional secondaries. New entity types follow the same pattern with zero schema design work.
S2 — Fixes real, observable bugs (Options B and C)
The domain string inconsistency (I-1) causes SBOM and EP/TD dashboard views to silently display wrong or missing domain associations. The Workstream/Topic container mismatch (I-2) causes domain attribution to fail in the Dependencies view. These are current, user-visible defects, and the migration resolves them.
S3 — ADR-001 alignment (Options B and C)
ADR-001 mandates that workstreams originate in repos. The current schema forces workstreams under Topics. Migrating Workstream.primary → Repository would bring the schema into conformance with the governing ADR.
S4 — Enables first-class graph queries (Option A, partially B)
With Relations as first-class entities, queries like "what decisions influenced which tasks?" or "what dependencies cross domain boundaries?" become uniform and indexable. Currently these require ad-hoc multi-table joins.
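As an illustration of S4: once DependsOn edges are rows in a single relations table, "what dependencies cross domain boundaries?" becomes one indexed join. This sqlite3 sketch simplifies heavily. The tables are hypothetical, relations connect only workstreams, and each workstream carries its domain directly rather than resolving it through a repository.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Hypothetical minimal schema: workstreams with a domain, plus generic relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE workstreams (id TEXT PRIMARY KEY, domain TEXT NOT NULL);
        CREATE TABLE relations (
            kind      TEXT NOT NULL,  -- e.g. 'DependsOn', 'Motivates'
            source_id TEXT NOT NULL REFERENCES workstreams(id),
            target_id TEXT NOT NULL REFERENCES workstreams(id)
        );
        CREATE INDEX relations_by_kind ON relations(kind, source_id, target_id);
    """)
    return conn

def cross_domain_dependencies(conn: sqlite3.Connection) -> list:
    """DependsOn edges whose two endpoints live in different domains."""
    return conn.execute("""
        SELECT r.source_id, s.domain, r.target_id, t.domain
        FROM relations r
        JOIN workstreams s ON s.id = r.source_id
        JOIN workstreams t ON t.id = r.target_id
        WHERE r.kind = 'DependsOn' AND s.domain <> t.domain
        ORDER BY r.source_id, r.target_id
    """).fetchall()
```

The same query shape would also answer O4's cross-repo question if the comparison were made on repository instead of domain.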
S5 — Incremental migration is supported by the model (all options)
GEMS §9 explicitly defines integration patterns for existing systems. Pattern C (progressive normalization) allows working systems to remain stable while the most valuable types are migrated first.
S6 — Future-proofs multi-domain cross-system queries
As more repositories are registered and domains become interdependent, the inconsistencies in the current schema compound. Aligning with GEMS now keeps that complexity from accumulating.
Weaknesses
W1 — Option A requires full schema rewrite (high risk)
Dissolving typed tables into a generic entity model means every router, every MCP tool, every dashboard data loader, and every Alembic migration must be rewritten. This is weeks of work with high regression risk.
W2 — Loss of SQL-level type safety (Option A)
Typed tables make the database schema serve as documentation and enforce type-correct relations at the DB constraint level (FK types, enum columns). A generic entity table with JSONB payloads moves type enforcement to the application layer, where it is easier to break silently.
W3 — GEMS does not define a concrete SQL schema
The GEMS document is conceptual. Translating the attachment list model into PostgreSQL requires design decisions (indexed JSONB vs. junction table, UUID ordering, etc.) that are not trivial and have performance implications.
W4 — ProgressEvent's multi-attach pattern doesn't map cleanly to GEMS
ProgressEvent's current schema (nullable topic_id, workstream_id, task_id, decision_id) is intentionally flexible for an append-only log. GEMS's "exactly one primary attachment" rule may force awkward choices (e.g. always using Workstream as primary even for domain-level events).
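One way to picture the W4 problem: to give each ProgressEvent exactly one primary attachment, the nullable columns have to be ranked, and the most specific non-null one is promoted to primary. The precedence order below is an assumption; neither GEMS nor the current schema defines it. The function name is hypothetical. An event with no attachment at all cannot be mapped, which is the kind of awkward choice W4 describes.

```python
from typing import Dict, List, Optional, Tuple

# Assumed precedence, most specific first. GEMS does not prescribe this order.
PRIMARY_PRECEDENCE = ("task_id", "workstream_id", "decision_id", "topic_id")

Attachment = Tuple[str, str]  # (column name, referenced id)

def split_attachments(event: Dict[str, Optional[str]]) -> Tuple[Attachment, List[Attachment]]:
    """Promote the most specific non-null FK to primary; the rest become secondaries."""
    present = [(col, event[col]) for col in PRIMARY_PRECEDENCE
               if event.get(col) is not None]
    if not present:
        raise ValueError("ProgressEvent has no attachment; GEMS needs exactly one primary")
    primary, *secondaries = present
    return primary, secondaries
```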
W5 — Ecosystem root is of uncertain value
Adding an explicit Ecosystem singleton adds ceremony for little practical query benefit in the current six-domain setup. It may become valuable when the system grows to multi-tenant or multi-ecosystem scope, but is premature now.
Opportunities
O1 — Snapshot diffing for SBOM (SBOMSnapshot entity)
Adding an SBOMSnapshot container (resolves I-5) enables queries such as "what packages were added or removed between ingests?" This is a feature with direct user value, not just architectural cleanup.
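Here is a minimal sketch of the O1 diff. It assumes each snapshot can be reduced to (package name, version) pairs with at most one version per name. Real SBOM entries may carry more identity than this, such as an ecosystem or package URL. The function name is hypothetical, and versions are compared as plain strings, not by semantic-version ordering.

```python
from typing import Dict, Iterable, List, Tuple

Package = Tuple[str, str]  # (name, version); assumes one version per name

def diff_snapshots(old: Iterable[Package], new: Iterable[Package]) -> Dict[str, List]:
    """Compare two SBOM snapshots by package name."""
    old_v = dict(old)
    new_v = dict(new)
    return {
        "added": sorted(new_v.keys() - old_v.keys()),
        "removed": sorted(old_v.keys() - new_v.keys()),
        "changed": sorted(
            (name, old_v[name], new_v[name])
            for name in old_v.keys() & new_v.keys()
            if old_v[name] != new_v[name]
        ),
    }
```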
O2 — Unified contribution and decision provenance graph
With Relation entities, you can model "Decision D motivates Workstream W" or "Contribution C implements Decision D" as queryable, auditable edges. This is the foundation for a richer Custodian agent that can reason about the provenance of work items.
O3 — Generic dashboard patterns
Once GEMS is in place, dashboard pages can share a single entity-browsing component rather than one bespoke page per entity type. This reduces UI technical debt significantly.
O4 — Enabling cross-repo task relations (DependsOn at Repository scope)
With Relations as first-class entities, it becomes natural to register "Task A in repo X blocks Task B in repo Y", a cross-repo dependency that the current WorkstreamDependency table cannot model.
O5 — Type registry as a self-documenting schema
A GEMS Type Registry is human-readable, machine-validatable, and version-controlled. It replaces the current implicit understanding of "what can be attached to what" with an explicit contract.
Threats
T1 — Risk of over-engineering a working system
The state-hub currently works well enough for its intended read-model role. A full schema rewrite to achieve theoretical elegance could introduce regressions, stall other domain work, and deliver minimal user-visible value in the short term.
T2 — ADR-001 workplan file format would need updates
If Workstream moves from Topic to Repository as its primary container, the topic_slug field in every existing workplan's frontmatter would need to be replaced by, or supplemented with, repo_slug. Workplan files across all registered repos would require updating.
T3 — Hybrid state during incremental migration is confusing
Pattern C leaves the system in a mixed state for an extended period: some entities are GEMS-conformant, others are legacy. Tooling must handle both shapes simultaneously, increasing maintenance burden.
T4 — Dashboard rewrites could introduce new bugs
The dashboard is the primary UI for the hub. Rewriting data loaders and query patterns risks introducing visual regressions that would go unnoticed without a test suite (there is currently none for the dashboard).
T5 — No migration dry-run tooling exists
The make sync-workplans target does not exist yet (it is a future CUST-WP deliverable).
Running migrations against production data without a rollback path is risky.
Verdict and Recommended Path
Recommended: Option C — Incremental Normalization
Proceed in three targeted passes, each independently releasable:
Pass 1 — Fix structural inconsistencies (I-1, I-6): low risk, high consistency gain
- Migrate ExtensionPoint.domain (String) → domain_id FK + back-fill
- Migrate TechnicalDebt.domain (String) → domain_id FK + back-fill
- Add repo_id FK to Contribution (nullable initially)
- This pass has no breaking API changes; only the DB schema and router filter logic change.
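The Pass 1 back-fill could be sketched as follows. It uses sqlite3 so the example runs on its own; the real change would be an Alembic migration against PostgreSQL. The domains.slug column, the trim-and-lowercase normalization, and the helper name are assumptions. Rows that fail to match are returned for manual review instead of being silently left unmapped, because they presumably need resolving before the legacy string column can be dropped.

```python
import sqlite3
from typing import List, Tuple

# Table names from the commit message; whitelisted because they are
# interpolated into SQL below.
LEGACY_TABLES = ("extension_points", "technical_debt")

def backfill_domain_ids(conn: sqlite3.Connection, table: str) -> List[Tuple[int, str]]:
    """Add a nullable domain_id FK, fill it from the legacy domain string,
    and return (id, domain) rows that matched no domain."""
    if table not in LEGACY_TABLES:
        raise ValueError(f"unexpected table {table!r}")
    # Step 1: nullable column, so existing rows stay valid during the back-fill.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN domain_id INTEGER REFERENCES domains(id)")
    # Step 2: match the free-text value against domains.slug (assumed column).
    conn.execute(f"""
        UPDATE {table}
        SET domain_id = (SELECT d.id FROM domains d
                         WHERE d.slug = lower(trim({table}.domain)))
    """)
    # Step 3: report leftovers; these need manual mapping before enforcing NOT NULL.
    return conn.execute(
        f"SELECT id, domain FROM {table} WHERE domain_id IS NULL ORDER BY id"
    ).fetchall()
```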
Pass 2 — Align Workstream with ADR-001 (I-2): medium risk, architectural gain
- Add repo_id FK to Workstream (nullable initially, then enforce)
- Update the MCP create_workstream tool to require repo_id
- Update workplan frontmatter format to include repo_slug
- Migrate dependencies.md to use repo instead of topic for domain resolution
- Decision DEC-GEMS-002 must be resolved before this pass begins
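The commit message notes that dashboard pages now prefer repo->domain over topic->domain. While repo_id is still nullable, that lookup needs a fallback. The sketch below shows one, with hypothetical dictionaries standing in for the repository and topic tables.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Workstream:
    id: str
    repo_id: Optional[str]   # Pass 2: nullable FK to the owning repository
    topic_id: Optional[str]  # legacy container, kept during the transition

def resolve_domain(ws: Workstream,
                   repo_domains: Dict[str, str],
                   topic_domains: Dict[str, str]) -> Optional[str]:
    """Prefer repo -> domain (ADR-001); fall back to topic -> domain."""
    if ws.repo_id is not None and ws.repo_id in repo_domains:
        return repo_domains[ws.repo_id]
    if ws.topic_id is not None:
        return topic_domains.get(ws.topic_id)
    return None
```

Once repo_id is enforced as NOT NULL, the topic branch becomes dead code and can be removed.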
Pass 3 — Add SBOMSnapshot container (I-5): medium risk, feature gain
- Add sbom_snapshots table + FK from sbom_entries
- Update ingest API to create/find snapshot per repo+timestamp
- Enable snapshot history and diff queries in SBOM dashboard
- Decision DEC-GEMS-004 must be resolved before this pass begins
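The additive ingest and latest-snapshot filtering described in the commit message can be sketched in sqlite3 as shown here. This is not the actual _latest_snapshot_ids_subquery(). It assumes snapshot ids increase with ingest order, so MAX(id) per repo identifies the latest snapshot. Column names beyond the sbom_snapshots and sbom_entries table names are assumptions.

```python
import sqlite3
from typing import Iterable, List, Tuple

SCHEMA = """
CREATE TABLE sbom_snapshots (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    repo_id    INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE sbom_entries (
    id          INTEGER PRIMARY KEY,
    snapshot_id INTEGER NOT NULL REFERENCES sbom_snapshots(id),
    name        TEXT NOT NULL,
    version     TEXT NOT NULL
);
"""

# Latest snapshot per repo, assuming ids increase with ingest order.
LATEST_SNAPSHOT_IDS = "SELECT MAX(id) FROM sbom_snapshots GROUP BY repo_id"

def ingest(conn: sqlite3.Connection, repo_id: int,
           packages: Iterable[Tuple[str, str]]) -> int:
    """Additive ingest: every call creates a new snapshot; older ones are kept."""
    cur = conn.execute("INSERT INTO sbom_snapshots (repo_id) VALUES (?)", (repo_id,))
    snapshot_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sbom_entries (snapshot_id, name, version) VALUES (?, ?, ?)",
        [(snapshot_id, name, version) for name, version in packages])
    return snapshot_id

def current_entries(conn: sqlite3.Connection) -> List[Tuple[int, str, str]]:
    """List-endpoint view: only entries from each repo's latest snapshot."""
    return conn.execute(
        f"""SELECT s.repo_id, e.name, e.version
            FROM sbom_entries e JOIN sbom_snapshots s ON s.id = e.snapshot_id
            WHERE e.snapshot_id IN ({LATEST_SNAPSHOT_IDS})
            ORDER BY s.repo_id, e.name""").fetchall()
```

Because older snapshots are never deleted, the history and diff views can read any pair of snapshots for a repo.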
Deferred: Full generic entity model (Option A), Ecosystem root (I-7), DependsOn as first-class Relation (I-8), ManagedRepo.topic_id cleanup (I-4). These are tracked as extension points; revisit after Passes 1-3 are stable.
Decision Dependency Map
DEC-GEMS-001 (architecture) ──────────────────────────────────► Pass 3+
DEC-GEMS-002 (workstream/topic vs repo) ──────────────────────► Pass 2
DEC-GEMS-003 (domain string → FK) ────────────────────────────► Pass 1
DEC-GEMS-004 (SBOMSnapshot container) ────────────────────────► Pass 3
DEC-GEMS-005 (Ecosystem root) ─────────────────────────────────► Deferred
DEC-GEMS-006 (DependsOn as Relation entity) ───────────────────► Deferred
Pass 1 can begin as soon as DEC-GEMS-003 is resolved (expected: trivially yes). Pass 2 requires DEC-GEMS-002 resolution (breaking change; needs explicit approval). Pass 3 requires DEC-GEMS-004 resolution.