
tegwick 8ab6e6c9c5 feat(gems): three-pass schema migration aligning state-hub with GEMS

Implements CUST-WP-0007. Resolves inconsistencies I-1, I-2, I-5, I-6
identified in the GEMS audit (GenericEntityModellingSystem.md).

Pass 1 (e1f2a3b4c5d6): domain_id FK on extension_points and
technical_debt (replaces raw string column); repo_id FK on contributions.
Fixes domain-filtering bugs in EP/TD dashboard pages.

Pass 2 (f2a3b4c5d6e7): repo_id nullable FK on workstreams, aligning
the GEMS primary attachment with ADR-001 (repo > topic). Dashboard
pages updated to prefer repo->domain over topic->domain.

Pass 3 (a3b4c5d6e7f8): SBOMSnapshot container entity (GEMS Complex
between Repository and SBOMEntry). Ingest is now additive — each call
creates a new snapshot; history is retained. List/report endpoints
filter to latest snapshot per repo via _latest_snapshot_ids_subquery().
New endpoints: GET /sbom/snapshots/, GET /sbom/snapshots/{id}/.
Dashboard gains a Snapshot History section.

Also adds GEMS analysis artefacts: wiki/GEMS-StateHub-TypeRegistry.md,
wiki/GEMS-StateHub-SWOT.md, workplans/CUST-WP-0006 (analysis),
workplans/CUST-WP-0007 (migration, now completed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-02 23:39:17 +01:00


Generic entity modeling system

A domain-agnostic data modeling system for organizing “entities under management” in a rigorous, flexible, and extensible way.

Goals

Rigorous: clear invariants, predictable querying, safe evolution.
Flexible: new entity types and new relations without migrations that rewrite everything.
Extensible: supports multiple domains, sub-domains, and incremental adoption over existing data.

1. Core concepts

1.1 Entity

An Entity is the atomic unit of identity and lifecycle.

Entity fields (conceptual)

id (immutable unique identifier)
kind ∈ {Atom, Complex, Relation}
type (domain-specific type name, e.g. Task, Repository, Customer)
payload (type-specific attributes; ideally versioned)
attachments (ordered list of entity references)
meta (timestamps, version, permissions, provenance)

Entity kinds

Atom: primary facts / content objects.
Complex: organizational containers and structure owners (hierarchy, collections, indexes, contexts).
Relation: first-class edge object that encodes a relationship between entities; owned by a Complex.
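
As a minimal sketch, the conceptual fields above could map onto a record like the following. The class names (`Kind`, `Entity`), the use of plain string ids as entity references, and the `primary` helper are illustrative assumptions, not part of the model's definition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Kind(Enum):
    ATOM = "Atom"
    COMPLEX = "Complex"
    RELATION = "Relation"


@dataclass
class Entity:
    id: str                                                # immutable unique identifier
    kind: Kind                                             # Atom, Complex or Relation
    type: str                                              # domain-specific type name
    payload: Dict[str, Any] = field(default_factory=dict)  # type-specific attributes
    attachments: List[str] = field(default_factory=list)   # ordered entity ids (assumed form of EntityRef)
    meta: Dict[str, Any] = field(default_factory=dict)     # timestamps, version, provenance

    @property
    def primary(self) -> Optional[str]:
        """Id of the primary attachment, or None when there are no attachments."""
        return self.attachments[0] if self.attachments else None


# Hypothetical sample entities
repo = Entity(id="Repo42", kind=Kind.COMPLEX, type="Repository", attachments=["Domain1"])
task = Entity(id="TaskA", kind=Kind.ATOM, type="Task", attachments=["Repo42"])
```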

2. Attachments

2.1 Attachment list

Every entity has an ordered list:

attachments: [EntityRef]
attachments[0] is the Primary Attachment.

Derived notion: “Part-of”

If an entity’s primary attachment is an Atom, then the entity is a Part of that Atom.

This is a classification derived from data, not a separate stored relation.
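
Because "part-of" is derived, it can be computed on demand rather than stored. A minimal sketch, assuming two lookup tables (`primary_of` and `kind_of`, both hypothetical names) that map entity ids to their primary attachment and kind:

```python
def is_part_of_atom(entity_id, primary_of, kind_of):
    """Return True when the entity's primary attachment is an Atom."""
    parent = primary_of.get(entity_id)
    return parent is not None and kind_of.get(parent) == "Atom"


# Hypothetical sample data: a comment attached to a task, a task attached to a repository
primary_of = {"Comment1": "TaskA", "TaskA": "Repo42"}
kind_of = {"Comment1": "Atom", "TaskA": "Atom", "Repo42": "Complex"}
```

Here `Comment1` is a part of `TaskA`, while `TaskA` itself is a member of `Repo42` rather than a part of it.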

2.2 Attachment roles (recommended)

To avoid ambiguity and allow validation, each attachment can optionally have a role label.

Conceptually:

attachment = { targetId, position, role }

Common roles:

primary (implicit by position 0)
index (entity appears in this complex for navigation/search)
provenance (source reference)
tag (classification)
context (additional scope)

Ordering remains canonical; roles improve clarity and constraints.
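
One way to represent role-labelled attachments and check that roles agree with ordering is sketched below. The `Attachment` class and the `role_errors` function are illustrative names, and the two rules enforced (only position 0 may be `primary`; stored positions match list order) are one reading of the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Attachment:
    target_id: str
    position: int
    role: Optional[str] = None   # roles are optional


def role_errors(attachments: List[Attachment]) -> List[str]:
    """Return human-readable problems; an empty list means the roles are consistent."""
    errors = []
    for index, att in enumerate(attachments):
        if att.position != index:
            errors.append(f"{att.target_id}: position {att.position} does not match list index {index}")
        if index == 0 and att.role not in (None, "primary"):
            errors.append(f"{att.target_id}: position 0 cannot carry role {att.role!r}")
        if index > 0 and att.role == "primary":
            errors.append(f"{att.target_id}: role 'primary' is only valid at position 0")
    return errors


good = [Attachment("Repo42", 0), Attachment("Workstream7", 1, "index")]
bad = [Attachment("Repo42", 0, "tag"), Attachment("Workstream7", 1, "primary")]
```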

3. Hierarchy and layering

3.1 Primary chain

The Primary Chain of an entity is obtained by repeatedly following attachments[0].

Invariant (recommended): the primary chain must be acyclic.

This yields a robust layering model:

Every entity “lives in” a context (a Complex), or is “part of” an Atom.
You can always answer: “Where does this belong?” by walking the primary chain.
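
Walking the primary chain, with the acyclicity invariant checked along the way, can be sketched as follows. The `primary_of` mapping (entity id to primary attachment id) is an assumed storage shape, not something the model prescribes.

```python
def primary_chain(entity_id, primary_of):
    """Return the ids reached by repeatedly following attachments[0].

    Raises ValueError if the chain revisits an entity (a cycle).
    """
    chain = []
    seen = {entity_id}
    current = primary_of.get(entity_id)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in primary chain at {current!r}")
        chain.append(current)
        seen.add(current)
        current = primary_of.get(current)
    return chain


# Hypothetical sample hierarchy
primary_of = {"TaskA": "Repo42", "Repo42": "Domain1", "Domain1": "Ecosystem1"}
```

With this data, `primary_chain("TaskA", primary_of)` yields the repository, then the domain, then the ecosystem.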

3.2 Roots and scopes

A system should define at least one root Complex (e.g. Ecosystem, Workspace, Tenant).

All managed entities must be reachable from a root by following primary attachments.

Equivalently, the primary chain of every managed entity must end at a root Complex.

4. Relations as first-class entities

4.1 Relation entity

A Relation is an entity whose purpose is to define a connection among other entities.

Key rule

A Relation’s primary attachment MUST be a Complex. That Complex is the relation-space (the context that “owns” the relationship).

This avoids “atoms knowing” relation details: atoms remain content, complexes and relations hold structure.

4.2 Relation endpoints convention

To make relations queryable and consistent, standardize attachment slots:

attachments[0] = contextComplex (primary; relation-space)
attachments[1] = fromEndpoint
attachments[2] = toEndpoint
attachments[3..] = optional extra endpoints (evidence, via, stakeholder, etc.)

Relation semantics live in:

type (e.g. DependsOn, Implements, References)
and/or payload fields like { relType: "...", strength: ..., rationale: ... }
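
With the slots standardized, reading a relation becomes a positional unpack. A small sketch, where `RelationView` and `read_relation` are illustrative names:

```python
from typing import List, NamedTuple


class RelationView(NamedTuple):
    context: str        # attachments[0], the relation-space
    source: str         # attachments[1], fromEndpoint
    target: str         # attachments[2], toEndpoint
    extras: List[str]   # attachments[3..], optional extra endpoints


def read_relation(attachments):
    """Split a relation's attachment list into its conventional slots."""
    if len(attachments) < 3:
        raise ValueError("a relation needs a context, a from endpoint and a to endpoint")
    context, source, target, *extras = attachments
    return RelationView(context, source, target, extras)


view = read_relation(["Repo42", "TaskA", "TaskB", "Evidence1"])
```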

5. Type system and constraints

5.1 Entity Type Registry

Maintain a registry of types describing:

kind: Atom/Complex/Relation
allowed primary attachment kinds/types
allowed secondary attachment kinds/types
payload schema (optional but recommended)
indexing / query defaults

Example (conceptual):

Task: kind=Atom, primary must be Repository (Complex)
Repository: kind=Complex, primary must be Domain (Complex)
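
A registry of this shape could be held as plain data. The sketch below encodes the two example entries; the dictionary layout, the `primary_types` key and the `primary_allowed` helper are assumptions made for illustration.

```python
# Hypothetical registry layout: type name -> kind and allowed primary attachment types
TYPE_REGISTRY = {
    "Task": {"kind": "Atom", "primary_types": {"Repository"}},
    "Repository": {"kind": "Complex", "primary_types": {"Domain"}},
}


def primary_allowed(child_type, parent_type, registry=TYPE_REGISTRY):
    """Return True when the registry lets child_type use parent_type as its primary."""
    spec = registry.get(child_type)
    return spec is not None and parent_type in spec["primary_types"]
```

Unknown types are rejected here rather than silently accepted; a more lenient system might instead treat unregistered types as unconstrained.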

5.2 Validation invariants (recommended minimum)

Exactly one primary attachment (position 0) for every entity except a root Complex.
Primary chain must be acyclic.
Primary attachment kind/type constraints must match the registry.
Context-consistency constraints for organizer complexes:
- if Task has a secondary attachment to Workstream, then Task.primary == Workstream.primary (same repository).
Relation constraints:
- primary must be Complex
- endpoint types must match relation type definition
- relation context must match endpoint context rules (usually same repo/domain)

These constraints give rigor without hard-coding a single domain model.
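
Two of these invariants (a Relation's primary must be a Complex, and the Task/Workstream context-consistency rule) can be checked with a short pass over the data. This is a sketch under assumptions: entities are stored as a dict of plain dicts, and `validate` is a hypothetical function name.

```python
def validate(entities):
    """entities maps id -> {"kind": ..., "type": ..., "attachments": [...]}.

    Returns a list of violation messages; empty means both checks pass.
    """
    errors = []
    for eid, entity in entities.items():
        attachments = entity["attachments"]
        primary = attachments[0] if attachments else None

        # Relation constraint: primary must be a Complex
        if entity["kind"] == "Relation":
            if primary is None or entities.get(primary, {}).get("kind") != "Complex":
                errors.append(f"{eid}: relation primary must be a Complex")

        # Context consistency: Task.primary == Workstream.primary
        if entity["type"] == "Task":
            for secondary in attachments[1:]:
                target = entities.get(secondary)
                if target is not None and target["type"] == "Workstream":
                    ws_primary = target["attachments"][0] if target["attachments"] else None
                    if ws_primary != primary:
                        errors.append(f"{eid}: primary differs from that of workstream {secondary}")
    return errors


# Hypothetical sample data containing two deliberate violations
entities = {
    "Repo42": {"kind": "Complex", "type": "Repository", "attachments": ["Domain1"]},
    "Repo43": {"kind": "Complex", "type": "Repository", "attachments": ["Domain1"]},
    "Workstream7": {"kind": "Complex", "type": "Workstream", "attachments": ["Repo42"]},
    "TaskA": {"kind": "Atom", "type": "Task", "attachments": ["Repo42", "Workstream7"]},
    "TaskB": {"kind": "Atom", "type": "Task", "attachments": ["Repo43", "Workstream7"]},
    "DependsOn1": {"kind": "Relation", "type": "DependsOn", "attachments": ["TaskA", "TaskA", "TaskB"]},
}
errors = validate(entities)
```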

6. Query model (domain-agnostic)

These queries exist in any domain:

6.1 Locate context

context(entity) = walk primary chain to root, or to the nearest scope boundary (e.g. nearest Domain/Workspace).

6.2 Membership

Members of a Complex: all entities with primary == complexId.

6.3 Parts of an Atom

Parts of an Atom: all entities with primary == atomId.

6.4 Relations in a relation-space

Relations owned by a Complex: all Relation entities with primary == complexId.

6.5 Neighborhood (graph view)

For entity X: find all relations in the same relation-space where X appears as endpoint.
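
The membership and neighborhood queries reduce to simple filters over attachment lists. A minimal in-memory sketch, assuming entities are stored as a dict of plain dicts (a real system would likely push these filters into its database); `members` and `neighborhood` are illustrative names:

```python
def members(entities, complex_id):
    """Ids of all entities whose primary attachment is complex_id."""
    return sorted(
        eid for eid, e in entities.items()
        if e["attachments"][:1] == [complex_id]
    )


def neighborhood(entities, x_id, space_id):
    """Ids of relations owned by space_id in which x_id appears as an endpoint."""
    return sorted(
        eid for eid, e in entities.items()
        if e["kind"] == "Relation"
        and e["attachments"][:1] == [space_id]
        and x_id in e["attachments"][1:]
    )


# Hypothetical sample data
entities = {
    "Repo42": {"kind": "Complex", "attachments": ["Domain1"]},
    "TaskA": {"kind": "Atom", "attachments": ["Repo42"]},
    "TaskB": {"kind": "Atom", "attachments": ["Repo42"]},
    "Decision9": {"kind": "Atom", "attachments": ["Repo42"]},
    "DependsOn1": {"kind": "Relation", "attachments": ["Repo42", "TaskA", "TaskB"]},
    "Motivates1": {"kind": "Relation", "attachments": ["Repo42", "Decision9", "TaskA"]},
}
```

Note that `members` returns relations as well as atoms, since a relation's primary attachment is its owning Complex.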

7. Example domain: Ecosystem → Domain → Repository → Workstreams/SBOMs

This section makes the system concrete using the types of a software-engineering domain.

7.1 Complexes

Ecosystem (Complex, root)
Domain (Complex, primary = Ecosystem)
Repository (Complex, primary = Domain)
Workstream (Complex, primary = Repository) — organizes work items
SBOM (Complex, primary = Repository) — organizes dependencies

7.2 Atoms

Decision (Atom, primary = Repository)
Task (Atom, primary = Repository)
TechDebt (Atom, primary = Repository)
Extend (Atom, primary = Repository)
Dependency (Atom, primary = Repository)

7.3 Organizing via secondary attachments

A Task in a Workstream:
- Task.attachments = [Repo42, Workstream7]
A Dependency in an SBOM:
- Dependency.attachments = [Repo42, Sbom3]

Atoms remain ignorant of how the workstream orders tasks; the workstream can store structure.

7.4 Relation examples

Task → Task dependency (repo-scoped)

Relation type: DependsOn (Relation)

DependsOn.attachments = [Repo42, TaskA, TaskB]
payload: { critical: true, reason: "API contract needed first" }

Decision influences tasks (repo-scoped)

Relation type: Motivates (Relation)

Motivates.attachments = [Repo42, Decision9, TaskA]

Dependency graph inside an SBOM (sbom-scoped)

Relation type: Requires (Relation)

Requires.attachments = [Sbom3, DependencyX, DependencyY]
payload: { scope: "runtime" }

This cleanly separates:

planning relations (Repo relation-space)
supply-chain relations (SBOM relation-space)
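
The separation can be made visible by grouping the three example relations by their relation-space, which is simply `attachments[0]`. The helper name `by_relation_space` is an illustrative assumption.

```python
from collections import defaultdict

# The three relation examples from this section, keyed by relation type
relations = {
    "DependsOn": ["Repo42", "TaskA", "TaskB"],
    "Motivates": ["Repo42", "Decision9", "TaskA"],
    "Requires": ["Sbom3", "DependencyX", "DependencyY"],
}


def by_relation_space(relations):
    """Group relation names by their primary attachment (the relation-space)."""
    spaces = defaultdict(list)
    for name, attachments in relations.items():
        spaces[attachments[0]].append(name)
    return dict(spaces)


spaces = by_relation_space(relations)
```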

8. Applying the modeling system to a new domain

You can apply this to any domain by following a small method.

8.1 Step-by-step method

Step 1 — Choose a root Complex

Pick the top-level scope:

Workspace, Tenant, Organization, Ecosystem, etc.

Step 2 — Identify “containers” vs “content”

Containers become Complexes (projects, folders, accounts, repositories, case files).
Content objects become Atoms (documents, customers, invoices, tickets, assets).

Rule of thumb:

If it organizes others or defines a scope, it’s a Complex.
If it’s a “thing” with intrinsic content/lifecycle, it’s an Atom.

Step 3 — Define the primary hierarchy (layering)

Decide what belongs to what: the primary attachment is the default place where an entity lives. Example pattern:

Atom.primary = nearest containing Complex

Step 4 — Define organizer complexes (optional)

Introduce complexes like Workstream, Board, Collection, SBOM, Timeline that provide structure. Use secondary attachments from atoms to these complexes.

Step 5 — Define relation-spaces

Choose where relations live:

typically in the “owning” complex (project/repo/case)
sometimes in a specialized complex (SBOM, timeline, graph)

Step 6 — Create a Type Registry + constraints

For each type, specify:

kind
required primary attachment type(s)
optional secondary attachment types
allowed relation endpoints (if relation type)

Step 7 — Migrate incrementally

Start with primary attachments and identity first. Add organizer complexes and relations later without breaking identity.

9. Applying it to an existing domain with pre-existing entities

The key is to wrap existing entities as Entities in this system without rewriting them all at once.

9.1 Integration patterns

Pattern A — “Entity wrapper” over existing tables/documents

Keep existing storage unchanged.
Create an Entity record that references external storage:
- payload contains { externalType, externalId, sourceSystem }
Attachments, relations, and organization are managed in the new layer.

This is the safest “overlay” approach.
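
A wrapper record for Pattern A might look like the sketch below. The payload keys come from the description above; the function name `wrap_legacy`, the colon-separated id scheme and the choice of `kind="Atom"` are assumptions for illustration.

```python
def wrap_legacy(source_system, external_type, external_id, primary_id):
    """Build an overlay entity that points at a record kept in legacy storage."""
    return {
        # Assumed id scheme: legacy id reused behind a namespace prefix
        "id": f"{source_system}:{external_type}:{external_id}",
        "kind": "Atom",   # assumption: wrapped legacy records are content objects
        "type": external_type,
        "payload": {
            "externalType": external_type,
            "externalId": external_id,
            "sourceSystem": source_system,
        },
        "attachments": [primary_id],   # coarse primary context to start with
    }


# Hypothetical legacy row wrapped and attached to a repository
wrapped = wrap_legacy("legacydb", "Task", "1187", "Repo42")
```

The legacy record itself is never touched; only the overlay entity carries attachments.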

Pattern B — “Dual write” for new objects

New entities are created in the new model as the source of truth.
Optionally mirrored into legacy storage for compatibility.

Pattern C — “Progressive normalization”

Start overlay-style.
Gradually move the most valuable types (e.g., Tasks, Decisions) into native entities.
Leave rarely touched legacy objects wrapped indefinitely.

9.2 Migration steps for existing data

Assign stable IDs
- If legacy IDs exist, reuse them with a namespace prefix.
Create root complexes
- e.g. one Ecosystem or per-tenant Workspace.
Attach existing entities to a primary context
- even if initially coarse (everything attaches to one domain/project).
Introduce finer complexes
- split into domains, repos/projects later by moving primary attachments.
Add relations incrementally
- create relation entities for the relationships you query most.
Backfill organizer complexes
- workstreams, boards, SBOMs, etc., via secondary attachments.

Because relations and organization are additive, you can evolve structure without breaking identity.
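
Refining a coarse initial placement then amounts to replacing `attachments[0]` while leaving the id and secondary attachments alone. A minimal sketch, with `move_primary` as a hypothetical helper operating on plain dict entities:

```python
def move_primary(entity, new_primary_id):
    """Return a copy of the entity re-homed under new_primary_id.

    The id and all secondary attachments are preserved; the input is not mutated.
    """
    moved = dict(entity)
    moved["attachments"] = [new_primary_id] + list(entity["attachments"][1:])
    return moved


# Hypothetical entity first attached to a catch-all domain, later moved to a repository
before = {"id": "legacydb:Task:1187", "attachments": ["Domain1", "Workstream7"]}
after = move_primary(before, "Repo42")
```

In a validated system the move would be followed by the registry and context-consistency checks, since secondary attachments may no longer share the new primary's context.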

10. What this system buys you

A uniform modeling surface across domains.
A clean separation of content (atoms) from structure (complexes + relations).
Multiple overlapping organizations via secondary attachments without duplication.
First-class relationships with auditability and contextual ownership.
Incremental adoption over legacy systems.

Extension Points

This could be turned into a compact “spec” format (like a small RFC) plus a concrete “Type Registry” table for your example (including recommended relation types and constraints).
