# SWOT Analysis — Migrating State-Hub to GEMS

Evaluation of migrating the Custodian State Hub data store from its current ad-hoc relational schema to the Generic Entity Modelling System (GEMS) as defined in `wiki/GenericEntityModellingSystem.md` and instantiated in `wiki/GEMS-StateHub-TypeRegistry.md`.

**Created:** 2026-03-02
**Author:** Custodian (analytical session)

---

## Migration Options Under Consideration

Before the SWOT, three architectural options are in scope:

**Option A — Full Generic Entity Model**
Single `entities` table + `attachments` junction + JSONB payload. True GEMS implementation. All current typed tables dissolved into the entity model.

**Option B — Typed-Table Approach with GEMS Constraints**
Keep typed tables (domains, topics, workstreams, etc.) but add:
- A universal `entity_id` abstraction layer
- An `attachments` junction table for secondary attachments
- Application-level GEMS constraint validation
- Fix all structural inconsistencies (I-1 through I-8 in CUST-WP-0006)

**Option C — Incremental Normalization (Pattern C from GEMS §9)**
Fix the most critical inconsistencies immediately (I-1, I-2, I-5), leave lesser items wrapped/deferred. No generic entity table introduced.

---

## SWOT Analysis

### Strengths

**S1 — Uniform modeling surface eliminates special-casing (all options)**
Currently each entity type has bespoke FKs, bespoke routers, and bespoke MCP tools. GEMS gives a predictable pattern: every entity has a primary context and optional secondaries. New entity types follow the same pattern with zero schema design work.

**S2 — Fixes real, observable bugs (Options B and C)**
The domain string inconsistency (I-1) causes SBOM and EP/TD dashboard views to silently display wrong or missing domain associations. The Workstream/Topic container mismatch (I-2) causes domain attribution to fail in the Dependencies view. These are current user-visible defects — migration resolves them.

**S3 — ADR-001 alignment (Options B and C)**
ADR-001 mandates that workstreams originate in repos. The current schema forces workstreams under Topics. Migrating Workstream.primary → Repository would bring the schema into conformance with the governing ADR.

**S4 — Enables first-class graph queries (Option A, partially B)**
With Relations as first-class entities, queries like "what decisions influenced which tasks?" or "what dependencies cross domain boundaries?" become uniform and indexable. Currently these require ad-hoc multi-table joins.

**S5 — Incremental migration is supported by the model (all options)**
GEMS §9 explicitly defines integration patterns for existing systems. Pattern C (progressive normalization) allows working systems to remain stable while the most valuable types are migrated first.

**S6 — Future-proofs multi-domain cross-system queries**
As more repositories are registered and domains become interdependent, the current schema's inconsistencies compound. Aligning with GEMS now keeps that complexity from accumulating further.

---

### Weaknesses

**W1 — Option A requires full schema rewrite (high risk)**
Dissolving typed tables into a generic entity model means every router, every MCP tool, every dashboard data loader, and every Alembic migration must be rewritten. This is weeks of work with high regression risk.

**W2 — Loss of SQL-level type safety (Option A)**
Typed tables let the database schema double as documentation and enforce type-correct relations at the DB constraint level (FK types, enum columns). A generic entity table with JSONB payloads moves type enforcement to the application layer, where it is easier to break silently.

**W3 — GEMS does not define a concrete SQL schema**
The GEMS document is conceptual. Translating the attachment list model into PostgreSQL requires design decisions (indexed JSONB vs. junction table, UUID ordering, etc.) that are not trivial and have performance implications.

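
To make the W3 design space a little more concrete, here is a minimal sketch of the junction-table variant. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table names, column names, and `is_primary` flag are all assumptions made for illustration; GEMS itself does not prescribe them. A partial unique index (a feature PostgreSQL also supports) caps each entity at one primary attachment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id          INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL,
    payload     TEXT NOT NULL DEFAULT '{}'   -- stands in for a JSONB column
);
CREATE TABLE attachments (
    entity_id  INTEGER NOT NULL REFERENCES entities(id),
    target_id  INTEGER NOT NULL REFERENCES entities(id),
    is_primary INTEGER NOT NULL DEFAULT 0 CHECK (is_primary IN (0, 1)),
    PRIMARY KEY (entity_id, target_id)
);
-- Hypothetical constraint: at most one primary attachment per entity.
CREATE UNIQUE INDEX one_primary_per_entity
    ON attachments (entity_id) WHERE is_primary = 1;
""")

conn.executemany(
    "INSERT INTO entities (id, entity_type) VALUES (?, ?)",
    [(1, "Repository"), (2, "Topic"), (3, "Workstream")],
)
conn.execute("INSERT INTO attachments VALUES (3, 1, 1)")  # primary: Repository
conn.execute("INSERT INTO attachments VALUES (3, 2, 0)")  # secondary: Topic


def try_promote_second_primary(connection):
    """Try to mark the Topic attachment as primary too; report whether it was accepted."""
    try:
        connection.execute(
            "UPDATE attachments SET is_primary = 1 "
            "WHERE entity_id = 3 AND target_id = 2"
        )
        return True
    except sqlite3.IntegrityError:
        return False


second_primary_accepted = try_promote_second_primary(conn)
primary_count = conn.execute(
    "SELECT COUNT(*) FROM attachments WHERE entity_id = 3 AND is_primary = 1"
).fetchone()[0]
```

Note that the index only guarantees "at most one" primary. The "exactly one" half of the rule (every entity has a primary) would still need an application-level check or a deferred constraint, which is part of why the choice is not trivial.
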
**W4 — ProgressEvent's multi-attach pattern doesn't map cleanly to GEMS**
ProgressEvent's current schema (nullable topic_id, workstream_id, task_id, decision_id) is intentionally flexible for an append-only log. GEMS's "exactly one primary attachment" rule may force awkward choices (e.g. always using Workstream as primary even for domain-level events).

**W5 — Ecosystem root is of uncertain value**
Adding an explicit Ecosystem singleton adds ceremony for little practical query benefit in the current six-domain setup. It may become valuable when the system grows to multi-tenant or multi-ecosystem scope, but is premature now.

---

### Opportunities

**O1 — Snapshot diffing for SBOM (SBOMSnapshot entity)**
Adding an SBOMSnapshot container (resolves I-5) enables queries such as "what packages were added/removed between ingests?" This is a feature with direct user value, not just architectural cleanup.

**O2 — Unified contribution and decision provenance graph**
With Relation entities, you can model "Decision D motivates Workstream W" or "Contribution C implements Decision D" as queryable, auditable edges. This is the foundation for a richer Custodian agent that can reason about the provenance of work items.

**O3 — Generic dashboard patterns**
Once GEMS is in place, dashboard pages can share a single entity-browsing component rather than one bespoke page per entity type. This reduces UI technical debt significantly.

**O4 — Enabling cross-repo task relations (DependsOn at Repository scope)**
With Relations as first-class entities, it becomes natural to register "Task A in repo X blocks Task B in repo Y", a cross-repo dependency that the current WorkstreamDependency table cannot model.

**O5 — Type registry as a self-documenting schema**
A GEMS Type Registry is human-readable, machine-validatable, and version-controlled. It replaces the current implicit understanding of "what can be attached to what" with an explicit contract.

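
As a rough illustration of what "machine-validatable" in O5 could mean, the sketch below encodes a tiny registry as a Python dictionary and checks a proposed attachment against it. The registry contents are assumptions chosen to match the direction of this analysis (Workstream under Repository, SBOM entries under SBOMSnapshot); the real rules live in `wiki/GEMS-StateHub-TypeRegistry.md` and may differ. `TYPE_REGISTRY` and `validate_attachments` are hypothetical names.

```python
# Hypothetical registry: for each entity type, which container types are
# allowed as the single primary attachment and which as secondaries.
TYPE_REGISTRY = {
    "Workstream": {"primary": {"Repository"}, "secondary": {"Topic", "Domain"}},
    "Task": {"primary": {"Workstream"}, "secondary": set()},
    "SBOMEntry": {"primary": {"SBOMSnapshot"}, "secondary": set()},
}


def validate_attachments(entity_type, primary, secondaries=()):
    """Return a list of violations; an empty list means the attachment set conforms."""
    rules = TYPE_REGISTRY.get(entity_type)
    if rules is None:
        return [f"unknown entity type: {entity_type}"]

    errors = []
    if primary not in rules["primary"]:
        errors.append(f"{entity_type} cannot use {primary} as primary container")
    for secondary in secondaries:
        if secondary not in rules["secondary"]:
            errors.append(f"{entity_type} cannot attach to {secondary} as secondary")
    return errors


ok = validate_attachments("Workstream", "Repository", ["Topic"])
legacy = validate_attachments("Workstream", "Topic")
```

Because the registry is plain data, the same structure could be kept under version control and used both by the API layer and by a CI check.
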
---

### Threats

**T1 — Risk of over-engineering a working system**
The state-hub currently works well enough for its intended read-model role. A full schema rewrite to achieve theoretical elegance could introduce regressions, stall other domain work, and deliver minimal user-visible value in the short term.

**T2 — ADR-001 workplan file format would need updates**
If Workstream moves from Topic to Repository as its primary container, the `topic_slug` field in every existing workplan's frontmatter would need to be replaced by, or supplemented with, `repo_slug`. All workplan files across all registered repos would require updating.

**T3 — Hybrid state during incremental migration is confusing**
Pattern C leaves the system in a mixed state for an extended period: some entities are GEMS-conformant, others are legacy. Tooling must handle both shapes simultaneously, increasing maintenance burden.

**T4 — Dashboard rewrites could introduce new bugs**
The dashboard is the primary UI for the hub. Rewriting data loaders and query patterns risks introducing visual regressions that would go unnoticed without a test suite (there is currently none for the dashboard).

**T5 — No migration dry-run tooling exists**
The planned `make sync-workplans` target does not exist yet (it is a future CUST-WP deliverable). Running migrations against production data without a rollback path is risky.

---

## Verdict and Recommended Path

**Recommended: Option C — Incremental Normalization**

Proceed in three targeted passes, each independently releasable:

**Pass 1 — Fix structural inconsistencies (I-1, I-6): low risk, high consistency gain**
- Migrate `ExtensionPoint.domain` (String) → `domain_id` FK + back-fill
- Migrate `TechnicalDebt.domain` (String) → `domain_id` FK + back-fill
- Add `repo_id` FK to `Contribution` (nullable initially)
- This pass introduces no breaking API changes; only the DB schema and router filter logic change.

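
The string-to-FK back-fill in Pass 1 could look roughly like the sketch below. It runs against an in-memory SQLite database rather than the real PostgreSQL store, and the table names (`domains`, `extension_points`), the `slug` column, and the sample rows are all invented for illustration. The normalization step (`lower(trim(...))`) is likewise an assumption about how the legacy strings drifted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE);
CREATE TABLE extension_points (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    domain TEXT                      -- legacy free-text column
);
INSERT INTO domains (id, slug) VALUES (1, 'custodian'), (2, 'infra');
INSERT INTO extension_points (id, title, domain) VALUES
    (1, 'ep-a', 'custodian'),
    (2, 'ep-b', 'Infra '),           -- case and whitespace drift
    (3, 'ep-c', 'unknown-domain');   -- no matching domain row
""")

# Step 1: add the FK column as nullable so existing rows stay valid.
conn.execute(
    "ALTER TABLE extension_points "
    "ADD COLUMN domain_id INTEGER REFERENCES domains(id)"
)

# Step 2: back-fill by matching the normalized legacy string to a domain slug.
conn.execute("""
UPDATE extension_points
SET domain_id = (
    SELECT d.id FROM domains d
    WHERE d.slug = lower(trim(extension_points.domain))
)
""")

# Step 3: list rows that could not be matched, for manual review before
# the legacy column is dropped or the FK is made NOT NULL.
unmatched = conn.execute(
    "SELECT id, domain FROM extension_points "
    "WHERE domain_id IS NULL ORDER BY id"
).fetchall()
backfilled = conn.execute(
    "SELECT id, domain_id FROM extension_points ORDER BY id"
).fetchall()
```

Keeping the legacy string column until the unmatched list is empty gives a cheap rollback path, which partly offsets the lack of dry-run tooling noted in T5.
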
**Pass 2 — Align Workstream with ADR-001 (I-2): medium risk, architectural gain**
- Add `repo_id` FK to `Workstream` (nullable initially, then enforce)
- Update MCP `create_workstream` to require `repo_id`
- Update workplan frontmatter format to include `repo_slug`
- Migrate `dependencies.md` to use `repo` instead of `topic` for domain resolution
- Decision DEC-GEMS-002 must be resolved before this pass begins

**Pass 3 — Add SBOMSnapshot container (I-5): medium risk, feature gain**
- Add `sbom_snapshots` table + FK from `sbom_entries`
- Update ingest API to create/find snapshot per repo+timestamp
- Enable snapshot history and diff queries in SBOM dashboard
- Decision DEC-GEMS-004 must be resolved before this pass begins

**Deferred:** Full generic entity model (Option A), Ecosystem root (I-7), DependsOn as first-class Relation (I-8), ManagedRepo.topic_id cleanup (I-4). These are tracked as extension points; revisit after Passes 1-3 are stable.

---

## Decision Dependency Map

```
DEC-GEMS-001 (architecture) ───────────────────────────────────► Pass 3+
DEC-GEMS-002 (workstream/topic vs repo) ───────────────────────► Pass 2
DEC-GEMS-003 (domain string → FK) ─────────────────────────────► Pass 1
DEC-GEMS-004 (SBOMSnapshot container) ─────────────────────────► Pass 3
DEC-GEMS-005 (Ecosystem root) ─────────────────────────────────► Deferred
DEC-GEMS-006 (DependsOn as Relation entity) ───────────────────► Deferred
```

Pass 1 can begin as soon as DEC-GEMS-003 is resolved (expected: trivially yes).
Pass 2 requires DEC-GEMS-002 resolution (breaking change; needs explicit approval).
Pass 3 requires DEC-GEMS-004 resolution.
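
For a sense of what Pass 3 would unlock once DEC-GEMS-004 is resolved, the snapshot diff can be sketched in a few lines of Python. The `SBOMEntry` shape (just `name` and `version`) and the `diff_snapshots` helper are hypothetical; the real `sbom_entries` columns are not described here. The "changed" bucket for version bumps is an extra beyond the added/removed question raised under O1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SBOMEntry:
    """Hypothetical minimal SBOM row: a package name and its version."""
    name: str
    version: str


def diff_snapshots(old, new):
    """Compare two snapshots (iterables of SBOMEntry) by package name."""
    old_by_name = {entry.name: entry.version for entry in old}
    new_by_name = {entry.name: entry.version for entry in new}

    added = sorted(new_by_name.keys() - old_by_name.keys())
    removed = sorted(old_by_name.keys() - new_by_name.keys())
    changed = sorted(
        name
        for name in old_by_name.keys() & new_by_name.keys()
        if old_by_name[name] != new_by_name[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Usage with two invented ingests of the same repo.
first_ingest = [SBOMEntry("alpha", "1.0"), SBOMEntry("beta", "2.1")]
second_ingest = [SBOMEntry("beta", "2.2"), SBOMEntry("gamma", "0.3")]
result = diff_snapshots(first_ingest, second_ingest)
```

Without a snapshot container there is no stable "old" and "new" set to compare, which is why this query depends on the `sbom_snapshots` table rather than on `sbom_entries` alone.
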