Implements CUST-WP-0007. Resolves inconsistencies I-1, I-2, I-5, I-6
identified in the GEMS audit (GenericEntityModellingSystem.md).
Pass 1 (e1f2a3b4c5d6): domain_id FK on extension_points and
technical_debt (replaces raw string column); repo_id FK on contributions.
Fixes domain-filtering bugs in EP/TD dashboard pages.
Pass 2 (f2a3b4c5d6e7): repo_id nullable FK on workstreams, aligning
the GEMS primary attachment with ADR-001 (repo > topic). Dashboard
pages updated to prefer repo->domain over topic->domain.
Pass 3 (a3b4c5d6e7f8): SBOMSnapshot container entity (GEMS Complex
between Repository and SBOMEntry). Ingest is now additive — each call
creates a new snapshot; history is retained. List/report endpoints
filter to latest snapshot per repo via _latest_snapshot_ids_subquery().
New endpoints: GET /sbom/snapshots/, GET /sbom/snapshots/{id}/.
Dashboard gains a Snapshot History section.
Also adds GEMS analysis artefacts: wiki/GEMS-StateHub-TypeRegistry.md,
wiki/GEMS-StateHub-SWOT.md, workplans/CUST-WP-0006 (analysis),
workplans/CUST-WP-0007 (migration, now completed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
206 lines
9.5 KiB
Markdown
# SWOT Analysis: Migrating State-Hub to GEMS

Evaluation of migrating the Custodian State Hub data store from its current
ad-hoc relational schema to the Generic Entity Modelling System (GEMS) as
defined in `wiki/GenericEntityModellingSystem.md` and instantiated in
`wiki/GEMS-StateHub-TypeRegistry.md`.

**Created:** 2026-03-02
**Author:** Custodian (analytical session)

---

## Migration Options Under Consideration

Before the SWOT, three architectural options are in scope:

**Option A: Full Generic Entity Model**
Single `entities` table + `attachments` junction + JSONB payload. True GEMS
implementation. All current typed tables dissolved into the entity model.

**Option B: Typed-Table Approach with GEMS Constraints**
Keep typed tables (domains, topics, workstreams, etc.) but add:
- A universal `entity_id` abstraction layer
- An `attachments` junction table for secondary attachments
- Application-level GEMS constraint validation
- Fixes for all structural inconsistencies (I-1 through I-8 in CUST-WP-0006)

**Option C: Incremental Normalization (Pattern C from GEMS §9)**
Fix the most critical inconsistencies immediately (I-1, I-2, I-5), leave
lesser items wrapped/deferred. No generic entity table introduced.
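
To make Option A more concrete, the following is a minimal sketch of a single `entities` table with an `attachments` junction, using SQLite from the Python standard library in place of PostgreSQL. The column names, the `is_primary` flag, and the `add_entity` helper are assumptions made for illustration; GEMS itself does not prescribe them.

```python
import json
import sqlite3

# Hypothetical Option A layout: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE entities (
    id      INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,          -- e.g. 'domain', 'repository', 'workstream'
    payload TEXT NOT NULL           -- JSON document (JSONB in PostgreSQL)
);
CREATE TABLE attachments (
    entity_id  INTEGER NOT NULL REFERENCES entities(id),
    context_id INTEGER NOT NULL REFERENCES entities(id),
    is_primary INTEGER NOT NULL DEFAULT 0
);
-- At most one primary attachment per entity.
CREATE UNIQUE INDEX one_primary_per_entity
    ON attachments(entity_id) WHERE is_primary = 1;
"""


def add_entity(conn, type_, payload, primary=None):
    """Insert an entity and, optionally, its primary attachment."""
    cur = conn.execute(
        "INSERT INTO entities (type, payload) VALUES (?, ?)",
        (type_, json.dumps(payload)),
    )
    entity_id = cur.lastrowid
    if primary is not None:
        conn.execute(
            "INSERT INTO attachments (entity_id, context_id, is_primary) "
            "VALUES (?, ?, 1)",
            (entity_id, primary),
        )
    return entity_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
repo = add_entity(conn, "repository", {"slug": "state-hub"})
ws = add_entity(conn, "workstream", {"title": "GEMS migration"}, primary=repo)
```

The partial unique index only guarantees that an entity never has two primary attachments; requiring that every entity has one would still fall to application code in this sketch.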

---

## SWOT Analysis

### Strengths

**S1: Uniform modeling surface eliminates special-casing (all options)**
Currently each entity type has bespoke FKs, bespoke routers, and bespoke MCP
tools. GEMS gives a predictable pattern: every entity has a primary context and
optional secondaries. New entity types follow the same pattern with little or
no new schema design work.

**S2: Fixes real, observable bugs (Options B and C)**
The domain string inconsistency (I-1) causes SBOM and EP/TD dashboard views to
silently display wrong or missing domain associations. The Workstream/Topic
container mismatch (I-2) causes domain attribution to fail in the Dependencies
view. These are current user-visible defects; migration resolves them.

**S3: ADR-001 alignment (Options B and C)**
ADR-001 mandates that workstreams originate in repos. The current schema forces
workstreams under Topics. Migrating Workstream.primary → Repository would bring
the schema into conformance with the governing ADR.

**S4: Enables first-class graph queries (Option A, partially B)**
With Relations as first-class entities, queries like "what decisions influenced
which tasks?" or "what dependencies cross domain boundaries?" become uniform
and indexable. Currently these require ad-hoc multi-table joins.
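
As an illustration of the "dependencies that cross domain boundaries" query, here is a rough sketch in which every edge lives in one `relations` table. The table layout, the `kind` values, and the sample rows are all hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, domain_id INTEGER);
-- Hypothetical first-class relation rows: one table for every edge kind.
CREATE TABLE relations (
    kind      TEXT NOT NULL,                       -- e.g. 'depends_on'
    source_id INTEGER NOT NULL REFERENCES entities(id),
    target_id INTEGER NOT NULL REFERENCES entities(id)
);
CREATE INDEX relations_by_kind ON relations(kind, source_id, target_id);
""")
conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    (1, "ws-api", 10),
    (2, "ws-dashboard", 10),
    (3, "ws-billing", 20),
])
conn.executemany("INSERT INTO relations VALUES (?, ?, ?)", [
    ("depends_on", 2, 1),   # both ends in domain 10
    ("depends_on", 3, 1),   # crosses from domain 20 to domain 10
])

# The same two joins serve every relation kind; only the parameter changes.
CROSS_DOMAIN = """
SELECT s.name, t.name
FROM relations r
JOIN entities s ON s.id = r.source_id
JOIN entities t ON t.id = r.target_id
WHERE r.kind = ? AND s.domain_id <> t.domain_id
"""
cross_domain = conn.execute(CROSS_DOMAIN, ("depends_on",)).fetchall()
```

Asking the "decisions influenced tasks" question would reuse the join shape with a different `kind` and no domain predicate, which is the uniformity S4 refers to.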

**S5: Incremental migration is supported by the model (all options)**
GEMS §9 explicitly defines integration patterns for existing systems. Pattern C
(progressive normalization) allows working systems to remain stable while the
most valuable types are migrated first.

**S6: Future-proofs multi-domain cross-system queries**
As more repositories are registered and domains become interdependent, the
current schema's inconsistencies compound. Aligning with GEMS now keeps that
complexity from accumulating with each new repository and domain.

---

### Weaknesses

**W1: Option A requires full schema rewrite (high risk)**
Dissolving typed tables into a generic entity model means every router, every
MCP tool, every dashboard data loader, and every Alembic migration must be
rewritten. This is weeks of work with high regression risk.

**W2: Loss of SQL-level type safety (Option A)**
Typed tables make the database schema self-documenting and enforce type-correct
relations at the DB constraint level (FK types, enum columns). A generic entity
table with JSONB payloads moves type enforcement to the application layer, where
violations are easier to introduce silently.
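
To show what "type enforcement in the application layer" could look like under Option A, here is a small sketch of a payload validator. The `REQUIRED_FIELDS` table and its field names are invented for the example and do not reflect the actual State Hub columns.

```python
# Hypothetical per-type payload rules; with typed tables the database would
# enforce these through column types, NOT NULL, and enum constraints.
REQUIRED_FIELDS = {
    "workstream": {"title": str, "status": str},
    "technical_debt": {"summary": str, "severity": int},
}


def validate_payload(entity_type: str, payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload is valid."""
    rules = REQUIRED_FIELDS.get(entity_type)
    if rules is None:
        return [f"unknown entity type: {entity_type}"]
    problems = []
    for field, expected in rules.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```

Every write path has to remember to call a check like this, which is the silent-breakage risk W2 describes: a path that skips it still succeeds at the database level.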

**W3: GEMS does not define a concrete SQL schema**
The GEMS document is conceptual. Translating the attachment list model into
PostgreSQL requires design decisions (indexed JSONB vs. junction table, UUID
ordering, etc.) that are not trivial and have performance implications.

**W4: ProgressEvent's multi-attach pattern doesn't map cleanly to GEMS**
ProgressEvent's current schema (nullable topic_id, workstream_id, task_id,
decision_id) is intentionally flexible for an append-only log. GEMS's "exactly
one primary attachment" rule may force awkward choices (e.g. always using
Workstream as primary even for domain-level events).
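
One way to see the awkwardness is to try writing the rule down. The sketch below splits an event's nullable FKs into one primary and the remaining secondaries using a fixed priority order. That order is an assumption made for this example; neither GEMS nor the current schema states one.

```python
from typing import Optional

# Most specific context first. This ordering is an assumption for the sketch,
# not a rule stated by GEMS or by the current schema.
PRIORITY = ("task_id", "workstream_id", "decision_id", "topic_id")

Attachment = tuple[str, int]


def primary_attachment(event: dict) -> tuple[Optional[Attachment], list[Attachment]]:
    """Split an event's nullable FKs into one primary and the secondaries."""
    present = [(col, event[col]) for col in PRIORITY if event.get(col) is not None]
    if not present:
        # No FK set at all (e.g. a domain-level event): no primary can be chosen.
        return None, []
    return present[0], present[1:]
```

The `None` branch is where the "exactly one primary attachment" rule has no natural answer, so a real migration would need either a fallback container or an explicit exception for such events.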

**W5: Ecosystem root is of uncertain value**
Adding an explicit Ecosystem singleton adds ceremony for little practical query
benefit in the current six-domain setup. It may become valuable when the system
grows to multi-tenant or multi-ecosystem scope, but is premature now.

---

### Opportunities

**O1: Snapshot diffing for SBOM (SBOMSnapshot entity)**
Adding an SBOMSnapshot container (resolves I-5) enables: "what packages were
added/removed between ingests?" This is a feature with direct user value, not
just architectural cleanup.
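
The diff itself is simple once two snapshots are retained. A minimal sketch, assuming each snapshot can be loaded as a `{package: version}` mapping (the function name and sample data are made up for illustration):

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {package: version} mappings from consecutive ingests."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only.
before = {"alpha": "1.0.0", "beta": "2.1.0", "gamma": "0.9.0"}
after = {"alpha": "1.1.0", "beta": "2.1.0", "delta": "3.0.0"}
report = diff_snapshots(before, after)
```

Version changes are reported separately from additions and removals here; whether the dashboard needs that distinction is a product decision, not something the snapshot model forces.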

**O2: Unified contribution and decision provenance graph**
With Relation entities, you can model "Decision D motivates Workstream W" or
"Contribution C implements Decision D" as queryable, auditable edges. This is the
foundation for a richer Custodian agent that can reason about the provenance of
work items.

**O3: Generic dashboard patterns**
Once GEMS is in place, dashboard pages can share a single entity-browsing
component rather than one bespoke page per entity type. This reduces UI technical
debt significantly.

**O4: Enabling cross-repo task relations (DependsOn at Repository scope)**
With Relations as first-class entities, it becomes natural to register "Task A
in repo X blocks Task B in repo Y", a cross-repo dependency that the current
WorkstreamDependency table cannot model.

**O5: Type registry as a self-documenting schema**
A GEMS Type Registry is human-readable, machine-validatable, and version-controlled.
It replaces the current implicit understanding of "what can be attached to what"
with an explicit contract.
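
As a sketch of what "machine-validatable" could mean in practice, the registry can be reduced to a lookup from entity type to its allowed primary containers. The entries below are a hypothetical excerpt inferred from this analysis; the authoritative list is `wiki/GEMS-StateHub-TypeRegistry.md` and may differ.

```python
# Hypothetical registry excerpt: entity type -> allowed primary container types.
# The real entries live in wiki/GEMS-StateHub-TypeRegistry.md and may differ.
TYPE_REGISTRY = {
    "repository": {"domain"},
    "workstream": {"repository", "topic"},
    "sbom_snapshot": {"repository"},
    "sbom_entry": {"sbom_snapshot"},
}


def can_attach(child_type: str, container_type: str) -> bool:
    """True if the registry allows child_type to use container_type as primary."""
    return container_type in TYPE_REGISTRY.get(child_type, set())
```

A check like `can_attach("sbom_entry", "repository")` returning `False` is the explicit contract replacing the implicit understanding: unknown types and unlisted containers are rejected by default.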

---

### Threats

**T1: Risk of over-engineering a working system**
The state-hub currently works well enough for its intended read-model role. A
full schema rewrite to achieve theoretical elegance could introduce regressions,
stall other domain work, and deliver minimal user-visible value in the short term.

**T2: ADR-001 workplan file format would need updates**
If Workstream moves from Topic to Repository as its primary container, the
`topic_slug` field in every existing workplan's frontmatter would need to be
replaced or supplemented by `repo_slug`. All workplan files across all
registered repos require updating.

**T3: Hybrid state during incremental migration is confusing**
Pattern C leaves the system in a mixed state for an extended period: some
entities are GEMS-conformant, others are legacy. Tooling must handle both
shapes simultaneously, increasing maintenance burden.

**T4: Dashboard rewrites could introduce new bugs**
The dashboard is the primary UI for the hub. Rewriting data loaders and query
patterns risks introducing visual regressions that would go unnoticed without a
test suite (there is currently none for the dashboard).

**T5: No migration dry-run tooling exists**
The planned `make sync-workplans` target does not exist yet (a future CUST-WP
deliverable). Running migrations against production data without a rollback
path is risky.

---

## Verdict and Recommended Path

**Recommended: Option C (Incremental Normalization)**

Proceed in three targeted passes, each independently releasable:

**Pass 1: Fix structural inconsistencies (I-1, I-6). Low risk, high consistency gain.**
- Migrate `ExtensionPoint.domain` (String) → `domain_id` FK + back-fill
- Migrate `TechnicalDebt.domain` (String) → `domain_id` FK + back-fill
- Add `repo_id` FK to `Contribution` (nullable initially)
- This pass introduces no breaking API changes; only DB schema and router filter logic change.
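
The back-fill step in Pass 1 could look roughly like the following, shown against SQLite for brevity. The `domains.slug` column, the normalisation via `lower(trim(...))`, and the sample rows are assumptions; the real migration would run through Alembic against PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL);
CREATE TABLE technical_debt (
    id INTEGER PRIMARY KEY,
    domain TEXT,                                   -- legacy free-text column
    domain_id INTEGER REFERENCES domains(id)       -- new nullable FK
);
INSERT INTO domains (id, slug) VALUES (1, 'platform'), (2, 'finance');
INSERT INTO technical_debt (id, domain) VALUES
    (1, 'platform'), (2, 'Finance '), (3, 'unknown-domain');
""")

# Back-fill: match on a normalised form of the legacy string.
conn.execute("""
UPDATE technical_debt
SET domain_id = (
    SELECT d.id FROM domains d
    WHERE d.slug = lower(trim(technical_debt.domain))
)
""")

# Rows that could not be matched need manual review before the string
# column is dropped.
unmatched = [row[0] for row in conn.execute(
    "SELECT id FROM technical_debt WHERE domain_id IS NULL ORDER BY id"
)]
```

Listing the unmatched rows before dropping the string column gives a cheap dry-run check: an empty list means the FK can be made non-nullable without losing associations.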

**Pass 2: Align Workstream with ADR-001 (I-2). Medium risk, architectural gain.**
- Add `repo_id` FK to `Workstream` (nullable initially, then enforce)
- Update MCP `create_workstream` to require `repo_id`
- Update workplan frontmatter format to include `repo_slug`
- Migrate `dependencies.md` to use `repo` instead of `topic` for domain resolution
- Decision DEC-GEMS-002 must be resolved before this pass begins
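
While `repo_id` is still nullable, domain resolution has to tolerate both shapes. A minimal sketch of a resolver that prefers the repo path and falls back to the topic path, assuming workstreams, repos, and topics are available as plain dictionaries (the function and its arguments are hypothetical):

```python
from typing import Optional


def resolve_domain_id(workstream: dict, repos: dict, topics: dict) -> Optional[int]:
    """Resolve a workstream's domain, preferring repo -> domain over topic -> domain."""
    repo = repos.get(workstream.get("repo_id"))
    if repo is not None and repo.get("domain_id") is not None:
        return repo["domain_id"]
    # Legacy path: workstreams that have not been given a repo_id yet.
    topic = topics.get(workstream.get("topic_id"))
    if topic is not None:
        return topic.get("domain_id")
    return None


# Illustrative data only.
repos = {1: {"domain_id": 10}}
topics = {5: {"domain_id": 20}}
migrated = {"repo_id": 1, "topic_id": 5}
legacy = {"repo_id": None, "topic_id": 5}
```

Once `repo_id` is enforced as non-null, the fallback branch becomes dead code and can be removed along with the topic-based lookup.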

**Pass 3: Add SBOMSnapshot container (I-5). Medium risk, feature gain.**
- Add `sbom_snapshots` table + FK from `sbom_entries`
- Update ingest API to create/find snapshot per repo+timestamp
- Enable snapshot history and diff queries in SBOM dashboard
- Decision DEC-GEMS-004 must be resolved before this pass begins
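
With history retained, list and report queries need to restrict themselves to the newest snapshot per repo. One possible shape for that filter is sketched below in SQLite; the column names and sample rows are assumptions, and this is not necessarily how the hub's own query is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sbom_snapshots (
    id INTEGER PRIMARY KEY, repo_id INTEGER NOT NULL, created_at TEXT NOT NULL
);
CREATE TABLE sbom_entries (
    id INTEGER PRIMARY KEY,
    snapshot_id INTEGER NOT NULL REFERENCES sbom_snapshots(id),
    package TEXT NOT NULL
);
INSERT INTO sbom_snapshots VALUES
    (1, 10, '2026-03-01T09:00:00'), (2, 10, '2026-03-02T09:00:00'),
    (3, 20, '2026-03-01T12:00:00');
INSERT INTO sbom_entries (snapshot_id, package) VALUES
    (1, 'alpha'), (2, 'alpha'), (2, 'beta'), (3, 'gamma');
""")

# Keep only entries whose snapshot is the newest one for its repo.
LATEST_ENTRIES = """
SELECT s.repo_id, e.package
FROM sbom_entries e
JOIN sbom_snapshots s ON s.id = e.snapshot_id
WHERE s.id IN (
    SELECT s2.id FROM sbom_snapshots s2
    WHERE s2.created_at = (
        SELECT MAX(created_at) FROM sbom_snapshots WHERE repo_id = s2.repo_id
    )
)
ORDER BY s.repo_id, e.package
"""
latest = conn.execute(LATEST_ENTRIES).fetchall()
```

The sketch relies on ISO 8601 timestamps sorting lexicographically and on timestamps being unique per repo; two snapshots with identical `created_at` for one repo would both be treated as latest.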

**Deferred:** Full generic entity model (Option A), Ecosystem root (I-7),
DependsOn as first-class Relation (I-8), ManagedRepo.topic_id cleanup (I-4).
These are tracked as extension points; revisit after Passes 1-3 are stable.

---

## Decision Dependency Map

```
DEC-GEMS-001 (architecture) ──────────────────────────────────► Pass 3+
DEC-GEMS-002 (workstream/topic vs repo) ──────────────────────► Pass 2
DEC-GEMS-003 (domain string → FK) ────────────────────────────► Pass 1
DEC-GEMS-004 (SBOMSnapshot container) ────────────────────────► Pass 3
DEC-GEMS-005 (Ecosystem root) ─────────────────────────────────► Deferred
DEC-GEMS-006 (DependsOn as Relation entity) ───────────────────► Deferred
```

Pass 1 can begin as soon as DEC-GEMS-003 is resolved (expected: trivially yes).
Pass 2 requires DEC-GEMS-002 resolution (breaking change; needs explicit approval).
Pass 3 requires DEC-GEMS-004 resolution.
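
The gating rules can also be kept as data so that tooling can answer "which passes may start?". This is only a sketch: the `GATES` mapping transcribes the three gating statements in this section, and the helper name is made up.

```python
# Gating decisions per pass, transcribed from this section.
# DEC-GEMS-001 points at "Pass 3+" in the map; it is left out here because
# the text names only DEC-GEMS-004 as the gate for Pass 3.
GATES = {
    "Pass 1": {"DEC-GEMS-003"},
    "Pass 2": {"DEC-GEMS-002"},
    "Pass 3": {"DEC-GEMS-004"},
}


def unblocked_passes(resolved: set[str]) -> list[str]:
    """Return the passes whose gating decisions are all resolved."""
    return [name for name, needs in GATES.items() if needs <= resolved]
```

For example, `unblocked_passes({"DEC-GEMS-003", "DEC-GEMS-004"})` yields Pass 1 and Pass 3 while Pass 2 stays blocked on DEC-GEMS-002.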