the-custodian/wiki/GEMS-StateHub-SWOT.md
tegwick 8ab6e6c9c5 feat(gems): three-pass schema migration aligning state-hub with GEMS
Implements CUST-WP-0007. Resolves inconsistencies I-1, I-2, I-5, I-6
identified in the GEMS audit (GenericEntityModellingSystem.md).

Pass 1 (e1f2a3b4c5d6): domain_id FK on extension_points and
technical_debt (replaces raw string column); repo_id FK on contributions.
Fixes domain-filtering bugs in EP/TD dashboard pages.

Pass 2 (f2a3b4c5d6e7): repo_id nullable FK on workstreams, aligning
the GEMS primary attachment with ADR-001 (repo > topic). Dashboard
pages updated to prefer repo->domain over topic->domain.

Pass 3 (a3b4c5d6e7f8): SBOMSnapshot container entity (GEMS Complex
between Repository and SBOMEntry). Ingest is now additive — each call
creates a new snapshot; history is retained. List/report endpoints
filter to latest snapshot per repo via _latest_snapshot_ids_subquery().
New endpoints: GET /sbom/snapshots/, GET /sbom/snapshots/{id}/.
Dashboard gains a Snapshot History section.

Also adds GEMS analysis artefacts: wiki/GEMS-StateHub-TypeRegistry.md,
wiki/GEMS-StateHub-SWOT.md, workplans/CUST-WP-0006 (analysis),
workplans/CUST-WP-0007 (migration, now completed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 23:39:17 +01:00


# SWOT Analysis — Migrating State-Hub to GEMS
Evaluation of migrating the Custodian State Hub data store from its current
ad-hoc relational schema to the Generic Entity Modelling System (GEMS) as
defined in `wiki/GenericEntityModellingSystem.md` and instantiated in
`wiki/GEMS-StateHub-TypeRegistry.md`.
**Created:** 2026-03-02

**Author:** Custodian (analytical session)

---
## Migration Options Under Consideration
Before the SWOT, three architectural options are in scope:
**Option A — Full Generic Entity Model**
Single `entities` table + `attachments` junction + JSONB payload. True GEMS
implementation. All current typed tables dissolved into the entity model.
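To make Option A concrete, here is a minimal sketch of the single-table shape, using SQLite from the Python standard library in place of PostgreSQL (so the JSONB payload becomes JSON text). The column names, the `is_primary` flag, and the partial unique index are assumptions made for illustration; GEMS itself does not prescribe them.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id      INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,      -- e.g. 'Repository', 'Workstream'
    payload TEXT NOT NULL       -- JSON text; JSONB in PostgreSQL
);
CREATE TABLE attachments (
    entity_id  INTEGER NOT NULL REFERENCES entities(id),
    target_id  INTEGER NOT NULL REFERENCES entities(id),
    is_primary INTEGER NOT NULL DEFAULT 0
);
-- Hypothetical guard: at most one primary attachment per entity.
-- "Exactly one" would still need an application-level check.
CREATE UNIQUE INDEX one_primary_per_entity
    ON attachments(entity_id) WHERE is_primary = 1;
""")


def add_entity(type_, payload):
    """Insert an entity of any type and return its id."""
    cur = conn.execute(
        "INSERT INTO entities (type, payload) VALUES (?, ?)",
        (type_, json.dumps(payload)),
    )
    return cur.lastrowid


repo = add_entity("Repository", {"slug": "example-repo"})
topic = add_entity("Topic", {"slug": "example-topic"})
ws = add_entity("Workstream", {"title": "Example workstream"})

conn.execute("INSERT INTO attachments VALUES (?, ?, 1)", (ws, repo))   # primary
conn.execute("INSERT INTO attachments VALUES (?, ?, 0)", (ws, topic))  # secondary


def try_second_primary():
    """Return True if the database accepts a second primary attachment."""
    try:
        conn.execute("INSERT INTO attachments VALUES (?, ?, 1)", (ws, topic))
        return True
    except sqlite3.IntegrityError:
        return False


second_primary_accepted = try_second_primary()
```

In this sketch every entity type shares the same two tables, which is what makes the model uniform and also what removes the per-type columns and constraints that the typed tables provide today.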
**Option B — Typed-Table Approach with GEMS Constraints**
Keep typed tables (domains, topics, workstreams, etc.) but add:
- A universal `entity_id` abstraction layer
- An `attachments` junction table for secondary attachments
- Application-level GEMS constraint validation
- Fix all structural inconsistencies (I-1 through I-8 in CUST-WP-0006)
**Option C — Incremental Normalization (Pattern C from GEMS §9)**
Fix the most critical inconsistencies immediately (I-1, I-2, I-5), leave
lesser items wrapped/deferred. No generic entity table introduced.

---
## SWOT Analysis
### Strengths
**S1 — Uniform modeling surface eliminates special-casing (all options)**
Currently each entity type has bespoke FKs, bespoke routers, and bespoke MCP
tools. GEMS gives a predictable pattern: every entity has a primary context and
optional secondaries. New entity types follow the same pattern with little or
no new schema design work.
**S2 — Fixes real, observable bugs (Options B and C)**
The domain string inconsistency (I-1) causes SBOM and EP/TD dashboard views to
silently display wrong or missing domain associations. The Workstream/Topic
container mismatch (I-2) causes domain attribution to fail in the Dependencies
view. These are current user-visible defects — migration resolves them.
**S3 — ADR-001 alignment (Options B and C)**
ADR-001 mandates that workstreams originate in repos. The current schema forces
workstreams under Topics. Migrating Workstream.primary → Repository would bring
the schema into conformance with the governing ADR.
**S4 — Enables first-class graph queries (Option A, partially B)**
With Relations as first-class entities, queries like "what decisions influenced
which tasks?" or "what dependencies cross domain boundaries?" become uniform
and indexable. Currently these require ad-hoc multi-table joins.
**S5 — Incremental migration is supported by the model (all options)**
GEMS §9 explicitly defines integration patterns for existing systems. Pattern C
(progressive normalization) allows working systems to remain stable while the
most valuable types are migrated first.
**S6 — Future-proofs multi-domain cross-system queries**
As more repositories are registered and domains become interdependent, the
current schema's inconsistencies compound. Aligning with GEMS now keeps that
complexity from growing with every new repository and cross-domain link.

---
### Weaknesses
**W1 — Option A requires full schema rewrite (high risk)**
Dissolving typed tables into a generic entity model means every router, every
MCP tool, every dashboard data loader, and every Alembic migration must be
rewritten. This is weeks of work with high regression risk.
**W2 — Loss of SQL-level type safety (Option A)**
Typed tables let the database schema double as documentation and enforce type-correct
relations at the DB constraint level (FK types, enum columns). A generic entity
table with JSONB payloads moves type enforcement to the application layer, which
is easier to break silently.
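The difference can be seen in a small experiment. The sketch below uses SQLite from the Python standard library rather than PostgreSQL, and the `severity` column with its allowed values is a hypothetical stand-in for an enum column. The same invalid record is rejected by the typed table and accepted by the generic one.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE);

-- Typed table: the FK and the CHECK are enforced by the database.
CREATE TABLE technical_debt (
    id        INTEGER PRIMARY KEY,
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    severity  TEXT NOT NULL CHECK (severity IN ('low', 'medium', 'high'))
);

-- Generic table: the payload is opaque to the database.
CREATE TABLE entities (
    id      INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,
    payload TEXT NOT NULL
);

INSERT INTO domains (id, slug) VALUES (1, 'example-domain');
""")


def accepted(sql, params):
    """Return True if the database accepts the statement, False if it rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Domain 999 does not exist, so the typed insert violates the foreign key.
typed_ok = accepted(
    "INSERT INTO technical_debt (domain_id, severity) VALUES (?, ?)",
    (999, "high"),
)

# The generic insert carries a dangling domain and an invalid severity,
# yet nothing at the database level objects.
bad_payload = {"domain_id": 999, "severity": "urgent"}
generic_ok = accepted(
    "INSERT INTO entities (type, payload) VALUES (?, ?)",
    ("TechnicalDebt", json.dumps(bad_payload)),
)
```

Under Option A the second insert would have to be caught by application-level validation, which is the enforcement gap this weakness describes.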
**W3 — GEMS does not define a concrete SQL schema**
The GEMS document is conceptual. Translating the attachment list model into
PostgreSQL requires design decisions (indexed JSONB vs. junction table, UUID
ordering, etc.) that are not trivial and have performance implications.
**W4 — ProgressEvent's multi-attach pattern doesn't map cleanly to GEMS**
ProgressEvent's current schema (nullable topic_id, workstream_id, task_id,
decision_id) is intentionally flexible for an append-only log. GEMS's "exactly
one primary attachment" rule may force awkward choices (e.g. always using
Workstream as primary even for domain-level events).
**W5 — Ecosystem root is of uncertain value**
Adding an explicit Ecosystem singleton adds ceremony for little practical query
benefit in the current six-domain setup. It may become valuable when the system
grows to multi-tenant or multi-ecosystem scope, but is premature now.

---
### Opportunities
**O1 — Snapshot diffing for SBOM (SBOMSnapshot entity)**
Adding a SBOMSnapshot container (resolves I-5) enables: "what packages were
added/removed between ingests?" This is a feature with direct user value, not
just architectural cleanup.
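As a rough illustration of what such a diff could look like once two snapshots are loaded, the sketch below compares two mappings of package name to version. The function name, the dict shape, and the sample packages are assumptions for illustration, and reporting version changes goes slightly beyond the added/removed question above.

```python
def diff_snapshots(old, new):
    """Compare two SBOM snapshots given as {package_name: version} dicts."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Packages present in both snapshots whose version differs.
    changed = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only; not taken from a real ingest.
previous = {"requests": "2.31.0", "pyyaml": "6.0.1", "click": "8.1.7"}
latest = {"requests": "2.32.0", "pyyaml": "6.0.1", "httpx": "0.27.0"}

result = diff_snapshots(previous, latest)
```

Here `result` lists `httpx` as added, `click` as removed, and `requests` as changed. This only becomes possible once ingest retains history instead of overwriting the previous entries.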
**O2 — Unified contribution and decision provenance graph**
With Relation entities, you can model "Decision D motivates Workstream W" or
"Contribution C implements Decision D" as queryable, auditable edges. This is the
foundation for a richer Custodian agent that can reason about the provenance of
work items.
**O3 — Generic dashboard patterns**
Once GEMS is in place, dashboard pages can share a single entity-browsing
component rather than one bespoke page per entity type. This reduces UI technical
debt significantly.
**O4 — Enabling cross-repo task relations (DependsOn at Repository scope)**
With Relations as first-class entities, it becomes natural to register "Task A in repo X
blocks Task B in repo Y" — a cross-repo dependency that the current
WorkstreamDependency table cannot model.
**O5 — Type registry as a self-documenting schema**
A GEMS Type Registry is human-readable, machine-validatable, and version-controlled.
It replaces the current implicit understanding of "what can be attached to what"
with an explicit contract.

---
### Threats
**T1 — Risk of over-engineering a working system**
The state-hub currently works well enough for its intended read-model role. A
full schema rewrite to achieve theoretical elegance could introduce regressions,
stall other domain work, and deliver minimal user-visible value in the short term.
**T2 — ADR-001 workplan file format would need updates**
If Workstream moves from Topic to Repository as its primary container, the
`topic_slug` field in every existing workplan's frontmatter would need to be
replaced or supplemented by `repo_slug`. All workplan files across all registered repos require updating.
**T3 — Hybrid state during incremental migration is confusing**
Pattern C leaves the system in a mixed state for an extended period: some
entities are GEMS-conformant, others are legacy. Tooling must handle both
shapes simultaneously, increasing maintenance burden.
**T4 — Dashboard rewrites could introduce new bugs**
The dashboard is the primary UI for the hub. Rewriting data loaders and query
patterns risks introducing visual regressions that would go unnoticed without a
test suite (there is currently none for the dashboard).
**T5 — No migration dry-run tooling exists**
The `make sync-workplans` target does not exist yet (a future CUST-WP deliverable).
Running migrations against production data without a dry run or a rollback path is risky.

---
## Verdict and Recommended Path
**Recommended: Option C — Incremental Normalization**
Proceed in three targeted passes, each independently releasable:
**Pass 1 — Fix structural inconsistencies (I-1, I-6): low risk, high consistency gain**
- Migrate `ExtensionPoint.domain` (String) → `domain_id` FK + back-fill
- Migrate `TechnicalDebt.domain` (String) → `domain_id` FK + back-fill
- Add `repo_id` FK to `Contribution` (nullable initially)
- This pass has zero API breaking changes; only DB schema and router filter logic change.
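The back-fill step in Pass 1 could look roughly like the following, shown with SQLite from the Python standard library rather than as an Alembic migration. It assumes the legacy string column holds a domain slug, possibly with inconsistent casing or whitespace; the table contents and slugs are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE);
CREATE TABLE extension_points (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    domain TEXT               -- legacy raw string column
);
INSERT INTO domains (id, slug) VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO extension_points (name, domain) VALUES
    ('ep-a', 'alpha'),
    ('ep-b', ' Beta '),       -- inconsistent casing and whitespace
    ('ep-c', 'gamma-typo');   -- no matching domain
""")

# Step 1: add the new nullable FK column.
conn.execute(
    "ALTER TABLE extension_points ADD COLUMN domain_id INTEGER REFERENCES domains(id)"
)

# Step 2: back-fill by matching the normalised string against domains.slug.
conn.execute("""
UPDATE extension_points
SET domain_id = (
    SELECT d.id FROM domains d
    WHERE d.slug = lower(trim(extension_points.domain))
)
""")

# Step 3: report rows that could not be matched, so they can be fixed by hand
# before the legacy column is dropped or the FK is made NOT NULL.
unmatched = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM extension_points WHERE domain_id IS NULL ORDER BY id"
    )
]
```

Keeping the column nullable until `unmatched` is empty lets the pass ship without breaking API changes, as described above. The same three steps would apply to `TechnicalDebt.domain`.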
**Pass 2 — Align Workstream with ADR-001 (I-2): medium risk, architectural gain**
- Add `repo_id` FK to `Workstream` (nullable initially, then enforce)
- Update MCP `create_workstream` to require `repo_id`
- Update workplan frontmatter format to include `repo_slug`
- Migrate `dependencies.md` to use `repo` instead of `topic` for domain resolution
- Decision DEC-GEMS-002 must be resolved before this pass begins
**Pass 3 — Add SBOMSnapshot container (I-5): medium risk, feature gain**
- Add `sbom_snapshots` table + FK from `sbom_entries`
- Update ingest API to create/find snapshot per repo+timestamp
- Enable snapshot history and diff queries in SBOM dashboard
- Decision DEC-GEMS-004 must be resolved before this pass begins
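Once ingest keeps history, list and report queries need to restrict themselves to the newest snapshot per repository. The sketch below shows one way to express that, using SQLite from the Python standard library. The column names (`created_at`, `package`) and the sample rows are assumptions, and this is not the hub's actual query code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sbom_snapshots (
    id         INTEGER PRIMARY KEY,
    repo_id    INTEGER NOT NULL,
    created_at TEXT NOT NULL          -- ISO-8601 timestamps sort lexically
);
CREATE TABLE sbom_entries (
    id          INTEGER PRIMARY KEY,
    snapshot_id INTEGER NOT NULL REFERENCES sbom_snapshots(id),
    package     TEXT NOT NULL
);
INSERT INTO sbom_snapshots (id, repo_id, created_at) VALUES
    (1, 10, '2026-03-01T10:00:00'),
    (2, 10, '2026-03-02T10:00:00'),
    (3, 20, '2026-03-01T12:00:00');
INSERT INTO sbom_entries (snapshot_id, package) VALUES
    (1, 'requests'),
    (2, 'requests'), (2, 'httpx'),
    (3, 'numpy');
""")

# Ids of the newest snapshot for each repository.
LATEST_SNAPSHOT_IDS = """
SELECT s.id
FROM sbom_snapshots s
WHERE s.created_at = (
    SELECT MAX(s2.created_at)
    FROM sbom_snapshots s2
    WHERE s2.repo_id = s.repo_id
)
"""


def current_entries():
    """Return (repo_id, package) pairs from each repo's latest snapshot only."""
    query = f"""
        SELECT sn.repo_id, e.package
        FROM sbom_entries e
        JOIN sbom_snapshots sn ON sn.id = e.snapshot_id
        WHERE e.snapshot_id IN ({LATEST_SNAPSHOT_IDS})
        ORDER BY sn.repo_id, e.package
    """
    return conn.execute(query).fetchall()


entries = current_entries()
```

Older snapshots stay in the table for history and diff queries, while `current_entries()` ignores snapshot 1 because snapshot 2 supersedes it for the same repository.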
**Deferred:** Full generic entity model (Option A), Ecosystem root (I-7),
DependsOn as first-class Relation (I-8), ManagedRepo.topic_id cleanup (I-4).
These are tracked as extension points; revisit after Passes 1-3 are stable.

---
## Decision Dependency Map
```
DEC-GEMS-001 (architecture) ──────────────────────────────────► Pass 3+
DEC-GEMS-002 (workstream/topic vs repo) ──────────────────────► Pass 2
DEC-GEMS-003 (domain string → FK) ────────────────────────────► Pass 1
DEC-GEMS-004 (SBOMSnapshot container) ────────────────────────► Pass 3
DEC-GEMS-005 (Ecosystem root) ─────────────────────────────────► Deferred
DEC-GEMS-006 (DependsOn as Relation entity) ───────────────────► Deferred
```
Pass 1 can begin as soon as DEC-GEMS-003 is resolved (expected: trivially yes).
Pass 2 requires DEC-GEMS-002 resolution (breaking change; needs explicit approval).
Pass 3 requires DEC-GEMS-004 resolution.
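The gating rules above are simple enough to encode as data, for example in a pre-migration check. The sketch below is illustrative only; the function names and the idea of passing in a set of resolved decision ids are assumptions, not existing tooling.

```python
# Decisions that must be resolved before each pass may begin,
# taken from the three gating statements above.
REQUIRED_DECISIONS = {
    "Pass 1": {"DEC-GEMS-003"},
    "Pass 2": {"DEC-GEMS-002"},
    "Pass 3": {"DEC-GEMS-004"},
}


def blocked_by(pass_name, resolved):
    """Return the unresolved decisions that still block the given pass."""
    return sorted(REQUIRED_DECISIONS[pass_name] - set(resolved))


def startable(resolved):
    """Return the passes whose gating decisions are all resolved."""
    return [name for name in REQUIRED_DECISIONS if not blocked_by(name, resolved)]


resolved_so_far = {"DEC-GEMS-003"}
ready = startable(resolved_so_far)
pass_2_blockers = blocked_by("Pass 2", resolved_so_far)
```

With only DEC-GEMS-003 resolved, `ready` contains just Pass 1 and Pass 2 is still blocked by DEC-GEMS-002.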