the-custodian/wiki/GenericEntityModellingSystem.md
tegwick 8ab6e6c9c5 feat(gems): three-pass schema migration aligning state-hub with GEMS
Implements CUST-WP-0007. Resolves inconsistencies I-1, I-2, I-5, I-6
identified in the GEMS audit (GenericEntityModellingSystem.md).

Pass 1 (e1f2a3b4c5d6): domain_id FK on extension_points and
technical_debt (replaces raw string column); repo_id FK on contributions.
Fixes domain-filtering bugs in EP/TD dashboard pages.

Pass 2 (f2a3b4c5d6e7): repo_id nullable FK on workstreams, aligning
the GEMS primary attachment with ADR-001 (repo > topic). Dashboard
pages updated to prefer repo->domain over topic->domain.

Pass 3 (a3b4c5d6e7f8): SBOMSnapshot container entity (GEMS Complex
between Repository and SBOMEntry). Ingest is now additive — each call
creates a new snapshot; history is retained. List/report endpoints
filter to latest snapshot per repo via _latest_snapshot_ids_subquery().
New endpoints: GET /sbom/snapshots/, GET /sbom/snapshots/{id}/.
Dashboard gains a Snapshot History section.

Also adds GEMS analysis artefacts: wiki/GEMS-StateHub-TypeRegistry.md,
wiki/GEMS-StateHub-SWOT.md, workplans/CUST-WP-0006 (analysis),
workplans/CUST-WP-0007 (migration, now completed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 23:39:17 +01:00


Generic entity modeling system

A domain-agnostic data modeling system for organizing “entities under management” in a rigorous, flexible, and extensible way.

Goals

  • Rigorous: clear invariants, predictable querying, safe evolution.
  • Flexible: new entity types and new relations without migrations that rewrite everything.
  • Extensible: supports multiple domains, sub-domains, and incremental adoption over existing data.

1. Core concepts

1.1 Entity

An Entity is the atomic unit of identity and lifecycle.

Entity fields (conceptual)

  • id (immutable unique identifier)
  • kind ∈ {Atom, Complex, Relation}
  • type (domain-specific type name, e.g. Task, Repository, Customer)
  • payload (type-specific attributes; ideally versioned)
  • attachments (ordered list of entity references)
  • meta (timestamps, version, permissions, provenance)

Entity kinds

  • Atom: primary facts / content objects.
  • Complex: organizational containers and structure owners (hierarchy, collections, indexes, contexts).
  • Relation: first-class edge object that encodes a relationship between entities; owned by a Complex.
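Here is a minimal sketch of the conceptual entity record in Python. It assumes an in-memory representation in which attachments are stored as entity ids. The names `Kind`, `Entity` and `primary` are illustrative and are not taken from any existing GEMS codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Kind(Enum):
    ATOM = "Atom"
    COMPLEX = "Complex"
    RELATION = "Relation"


# frozen=True: a change produces a new record, so `id` is never reassigned in place.
@dataclass(frozen=True)
class Entity:
    """Hypothetical in-memory shape of an Entity; field names mirror the list above."""
    id: str                                                # immutable unique identifier
    kind: Kind                                             # Atom, Complex or Relation
    type: str                                              # domain type name, e.g. "Task"
    payload: dict[str, Any] = field(default_factory=dict)  # type-specific attributes
    attachments: tuple[str, ...] = ()                      # ordered ids; index 0 is primary
    meta: dict[str, Any] = field(default_factory=dict)     # timestamps, version, provenance

    @property
    def primary(self) -> str | None:
        """The primary attachment id, or None for a root Complex."""
        return self.attachments[0] if self.attachments else None


ecosystem = Entity(id="Eco", kind=Kind.COMPLEX, type="Ecosystem")
task = Entity(id="TaskA", kind=Kind.ATOM, type="Task", attachments=("Repo42", "Workstream7"))
```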

2. Attachments

2.1 Attachment list

Every entity has an ordered list:

  • attachments: [EntityRef]
  • attachments[0] is the Primary Attachment.

Derived notion: “Part-of”

If an entity's primary attachment is an Atom, then the entity is a Part of that Atom.

This is a classification derived from data, not a separate stored relation.
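Because the part-of classification is derived, it can be computed on demand instead of stored. The sketch below assumes a hypothetical `store` dict that maps ids to entities, with kinds held as strings. `Option1` is a made-up sub-part of a decision.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    kind: str               # "Atom", "Complex" or "Relation"
    attachments: list[str]  # ordered ids; attachments[0] is the primary


def is_part_of_atom(entity: Entity, store: dict[str, Entity]) -> bool:
    """Derived 'part-of': true when the primary attachment is an Atom. Nothing is stored."""
    if not entity.attachments:
        return False  # a root has no primary, so it is part of nothing
    return store[entity.attachments[0]].kind == "Atom"


store = {
    "Eco": Entity("Eco", "Complex", []),
    "DomainA": Entity("DomainA", "Complex", ["Eco"]),
    "Repo42": Entity("Repo42", "Complex", ["DomainA"]),
    "Decision9": Entity("Decision9", "Atom", ["Repo42"]),
    "Option1": Entity("Option1", "Atom", ["Decision9"]),  # hypothetical part of Decision9
}
```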

2.2 Attachment roles

To avoid ambiguity and allow validation, each attachment can optionally have a role label.

Conceptually:

  • attachment = { targetId, position, role }

Common roles:

  • primary (implicit by position 0)
  • index (entity appears in this complex for navigation/search)
  • provenance (source reference)
  • tag (classification)
  • context (additional scope)

Ordering remains canonical; roles improve clarity and constraints.
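One way to carry role labels while keeping the order canonical is sketched below with a hypothetical `Attachment` record and a `make_attachments` helper. Position 0 always receives the implicit `primary` role. Rejecting unknown roles is an assumed validation policy; the model itself does not require it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    target_id: str
    position: int
    role: str  # "primary", "index", "provenance", "tag", "context"


KNOWN_ROLES = {"primary", "index", "provenance", "tag", "context"}


def make_attachments(primary: str, *secondary: tuple[str, str]) -> list[Attachment]:
    """Build an ordered attachment list; position 0 always carries the 'primary' role."""
    result = [Attachment(primary, 0, "primary")]
    for pos, (target, role) in enumerate(secondary, start=1):
        # Only position 0 may be primary; other roles must come from the known set.
        if role == "primary" or role not in KNOWN_ROLES:
            raise ValueError(f"invalid secondary role {role!r} at position {pos}")
        result.append(Attachment(target, pos, role))
    return result


atts = make_attachments("Repo42", ("Workstream7", "index"), ("Tag:security", "tag"))
```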


3. Hierarchy and layering

3.1 Primary chain

The Primary Chain of an entity is obtained by repeatedly following attachments[0].

Invariant (recommended): the primary chain must be acyclic.

This yields a robust layering model:

  • Every entity “lives in” a context (a Complex), or is “part of” an Atom.
  • You can always answer: “Where does this belong?” by walking the primary chain.
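Walking the primary chain and enforcing the acyclicity invariant take only a few lines. This sketch assumes a hypothetical `primary_of` map from each entity id to its primary attachment id, with `None` for a root.

```python
from __future__ import annotations


def primary_chain(entity_id: str, primary_of: dict[str, str | None]) -> list[str]:
    """Follow attachments[0] from entity_id up to a root (an entity with no primary).

    Raises ValueError if an entity repeats, i.e. the acyclicity invariant is broken.
    """
    chain = [entity_id]
    seen = {entity_id}
    current = primary_of[entity_id]
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in primary chain at {current!r}")
        chain.append(current)
        seen.add(current)
        current = primary_of[current]
    return chain


primary_of = {"Eco": None, "DomainA": "Eco", "Repo42": "DomainA", "TaskA": "Repo42"}
print(primary_chain("TaskA", primary_of))  # ['TaskA', 'Repo42', 'DomainA', 'Eco']
```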

3.2 Roots and scopes

A system should define at least one root Complex (e.g. Ecosystem, Workspace, Tenant).

Every managed entity's primary chain must end at a root. Equivalently, every managed entity is reachable from a root by following primary attachments in reverse.
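The reachability rule can be checked over the same kind of hypothetical `primary_of` map. The sketch below reports every entity whose primary chain does not end at a declared root, including entities caught in a cycle or pointing at an unknown id. The helper names are illustrative.

```python
from __future__ import annotations


def is_rooted(start: str, primary_of: dict[str, str | None], roots: set[str]) -> bool:
    """True if following primary attachments from `start` reaches a declared root."""
    seen: set[str] = set()
    current: str | None = start
    while current is not None:
        if current in roots:
            return True
        if current in seen or current not in primary_of:
            return False  # cycle, or a dangling reference to an unknown entity
        seen.add(current)
        current = primary_of[current]
    return False  # chain ended at an entity that is not a declared root


def unrooted_entities(primary_of: dict[str, str | None], roots: set[str]) -> set[str]:
    """All entities that violate the 'reachable from a root' rule."""
    return {eid for eid in primary_of if not is_rooted(eid, primary_of, roots)}


primary_of = {
    "Eco": None,
    "DomainA": "Eco",
    "Repo42": "DomainA",
    "Orphan": None,         # no primary, but not declared as a root
    "Loop1": "Loop2",       # cycle
    "Loop2": "Loop1",
    "Dangling": "Missing",  # primary points at an unknown id
}
roots = {"Eco"}
```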


4. Relations as first-class entities

4.1 Relation entity

A Relation is an entity whose purpose is to define a connection among other entities.

Key rule

  • A Relation's primary attachment MUST be a Complex. That Complex is the relation-space (the context that “owns” the relationship).

This keeps atoms from “knowing” relation details: atoms remain content, while complexes and relations hold structure.

4.2 Relation endpoints convention

To make relations queryable and consistent, standardize attachment slots:

  • attachments[0] = contextComplex (primary; relation-space)
  • attachments[1] = fromEndpoint
  • attachments[2] = toEndpoint
  • attachments[3..] = optional extra endpoints (evidence, via, stakeholder, etc.)

Relation semantics live in:

  • type (e.g. DependsOn, Implements, References)
  • and/or payload fields like { relType: "...", strength: ..., rationale: ... }
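Because the slots are positional, a relation's context and endpoints can be read with a simple unpack. The sketch below uses a hypothetical `Entity` shape and a `relation_slots` helper, and it also enforces the key rule that the relation-space is a Complex.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entity:
    id: str
    kind: str               # "Atom", "Complex" or "Relation"
    type: str
    attachments: list[str]  # ordered ids
    payload: dict[str, Any] = field(default_factory=dict)


def relation_slots(rel: Entity, store: dict[str, Entity]) -> tuple[str, str, str, list[str]]:
    """Split a Relation's attachments into (context, from, to, extras) per the slot convention."""
    if rel.kind != "Relation":
        raise ValueError(f"{rel.id} is not a Relation")
    if len(rel.attachments) < 3:
        raise ValueError(f"{rel.id} needs context, from and to attachments")
    context, src, dst, *extras = rel.attachments
    # Key rule: the primary attachment (the relation-space) must be a Complex.
    if store[context].kind != "Complex":
        raise ValueError(f"{rel.id}: relation-space {context} must be a Complex")
    return context, src, dst, extras


store = {
    "Repo42": Entity("Repo42", "Complex", "Repository", ["DomainA"]),
    "TaskA": Entity("TaskA", "Atom", "Task", ["Repo42"]),
    "TaskB": Entity("TaskB", "Atom", "Task", ["Repo42"]),
}
dep = Entity("Rel1", "Relation", "DependsOn", ["Repo42", "TaskA", "TaskB"], {"critical": True})
```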

5. Type system and constraints

5.1 Entity Type Registry

Maintain a registry of types describing:

  • kind: Atom/Complex/Relation
  • allowed primary attachment kinds/types
  • allowed secondary attachment kinds/types
  • payload schema (optional but recommended)
  • indexing / query defaults

Example (conceptual):

  • Task: kind=Atom, primary must be Repository (Complex)
  • Repository: kind=Complex, primary must be Domain (Complex)
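A registry can be as small as a table of type specs. This sketch encodes the two example rows, plus `Ecosystem` and `Domain` rows so that the chain is complete. `TypeSpec`, `REGISTRY` and `check_primary` are hypothetical names. Treating an empty `primary_types` set as a root type is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TypeSpec:
    kind: str                                      # "Atom", "Complex" or "Relation"
    primary_types: frozenset[str]                  # allowed types of attachments[0]; empty = root
    secondary_types: frozenset[str] = frozenset()  # allowed types of attachments[1:]


REGISTRY = {
    "Ecosystem": TypeSpec("Complex", frozenset()),
    "Domain": TypeSpec("Complex", frozenset({"Ecosystem"})),
    "Repository": TypeSpec("Complex", frozenset({"Domain"})),
    "Task": TypeSpec("Atom", frozenset({"Repository"}), frozenset({"Workstream"})),
}


def check_primary(entity_type: str, primary_type: str | None) -> None:
    """Raise ValueError if the registry does not allow this primary attachment type."""
    spec = REGISTRY[entity_type]
    if not spec.primary_types:  # root type: must have no primary
        if primary_type is not None:
            raise ValueError(f"{entity_type} is a root type and takes no primary")
        return
    if primary_type not in spec.primary_types:
        allowed = sorted(spec.primary_types)
        raise ValueError(f"{entity_type} primary must be one of {allowed}, got {primary_type}")
```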
5.2 Constraints

  1. Exactly one primary attachment (position 0), except for root Complexes, which have none.

  2. Primary chain must be acyclic.

  3. Primary attachment kind/type constraints must match the registry.

  4. Context-consistency constraints for organizer complexes:

    • if Task has a secondary attachment to Workstream, then Task.primary == Workstream.primary (same repository).
  5. Relation constraints:

    • primary must be Complex
    • endpoint types must match relation type definition
    • relation context must match endpoint context rules (usually same repo/domain)

These constraints give rigor without hard-coding a single domain model.
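Rules 4 and 5 can be checked entity by entity against a store. In this sketch the relation signatures in `RELATION_TYPES` are invented for illustration. It also reads "endpoint context rules" as requiring each endpoint to be attached, as primary or secondary, to the relation-space. That reading accepts both the repo-scoped and the SBOM-scoped examples in section 7, but it is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    kind: str               # "Atom", "Complex" or "Relation"
    type: str
    attachments: list[str]  # ordered ids; attachments[0] is the primary


# Hypothetical relation signatures: relation type -> (allowed from-types, allowed to-types)
RELATION_TYPES = {
    "DependsOn": ({"Task"}, {"Task"}),
    "Motivates": ({"Decision"}, {"Task"}),
}


def violations(e: Entity, store: dict[str, Entity]) -> list[str]:
    """Check rules 4 and 5 for one entity; returns a list of human-readable problems."""
    problems = []
    primary = e.attachments[0] if e.attachments else None
    # Rule 4: a Task organised by a Workstream must share the Workstream's repository.
    if e.type == "Task":
        for ref in e.attachments[1:]:
            organiser = store[ref]
            if organiser.type == "Workstream" and organiser.attachments[0] != primary:
                problems.append(f"{e.id}: {ref} belongs to a different repository")
    # Rule 5: relation constraints.
    if e.kind == "Relation":
        if primary is None or store[primary].kind != "Complex":
            return problems + [f"{e.id}: primary must be a Complex"]
        if len(e.attachments) < 3:
            return problems + [f"{e.id}: needs from and to endpoints"]
        src, dst = store[e.attachments[1]], store[e.attachments[2]]
        from_types, to_types = RELATION_TYPES[e.type]
        if src.type not in from_types or dst.type not in to_types:
            problems.append(f"{e.id}: {src.type} -> {dst.type} not allowed for {e.type}")
        # Assumed reading of "endpoint context rules": endpoints are attached to the space.
        for end in (src, dst):
            if primary not in end.attachments:
                problems.append(f"{e.id}: {end.id} is not attached to {primary}")
    return problems


store = {
    "Repo42": Entity("Repo42", "Complex", "Repository", ["DomainA"]),
    "Repo43": Entity("Repo43", "Complex", "Repository", ["DomainA"]),
    "Workstream7": Entity("Workstream7", "Complex", "Workstream", ["Repo42"]),
    "TaskA": Entity("TaskA", "Atom", "Task", ["Repo42", "Workstream7"]),
    "TaskB": Entity("TaskB", "Atom", "Task", ["Repo42"]),
    "TaskX": Entity("TaskX", "Atom", "Task", ["Repo43", "Workstream7"]),  # breaks rule 4
    "Rel1": Entity("Rel1", "Relation", "DependsOn", ["Repo42", "TaskA", "TaskB"]),
}
```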


6. Query model (domain-agnostic)

These queries exist in any domain:

6.1 Locate context

  • context(entity) = walk primary chain to root, or to the nearest scope boundary (e.g. nearest Domain/Workspace).
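The sketch below implements `context()` so that it stops at the first ancestor whose type is a scope boundary, or at the root. The `boundary_types` parameter and the `primary_of`/`type_of` maps are assumptions for illustration, and the chain is assumed to be acyclic.

```python
from __future__ import annotations


def context(entity_id: str, primary_of: dict[str, str | None], type_of: dict[str, str],
            boundary_types: frozenset[str] = frozenset({"Domain", "Workspace"})) -> str:
    """Walk the primary chain from entity_id; stop at the first boundary type or the root.

    Assumes the primary chain is acyclic. A root entity is treated as its own context.
    """
    current = primary_of[entity_id]
    if current is None:
        return entity_id
    while type_of[current] not in boundary_types:
        parent = primary_of[current]
        if parent is None:
            return current  # reached the root without meeting a boundary type
        current = parent
    return current


primary_of = {"Eco": None, "DomainA": "Eco", "Repo42": "DomainA", "TaskA": "Repo42"}
type_of = {"Eco": "Ecosystem", "DomainA": "Domain", "Repo42": "Repository", "TaskA": "Task"}
```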

6.2 Membership

  • Members of a Complex: all entities with primary == complexId.

6.3 Parts of an Atom

  • Parts of an Atom: all entities with primary == atomId.

6.4 Relations in a relation-space

  • Relations owned by a Complex: all Relation entities with primary == complexId.
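Queries 6.2 to 6.4 all reduce to "entities whose primary is X", so a single reverse index can serve all three. The sketch below works over hypothetical `primary_of` and `kind_of` maps.

```python
from __future__ import annotations

from collections import defaultdict


def build_primary_index(primary_of: dict[str, str | None]) -> dict[str, list[str]]:
    """Reverse index: primary id -> ids of entities whose attachments[0] is that id."""
    index: dict[str, list[str]] = defaultdict(list)
    for eid, primary in primary_of.items():
        if primary is not None:
            index[primary].append(eid)
    return dict(index)


def members(container_id: str, index: dict[str, list[str]]) -> list[str]:
    """6.2 for a Complex id, 6.3 for an Atom id: the same lookup, read differently."""
    return sorted(index.get(container_id, []))


def relations_in(complex_id: str, index: dict[str, list[str]], kind_of: dict[str, str]) -> list[str]:
    """6.4: the Relation entities owned by a Complex."""
    return [eid for eid in members(complex_id, index) if kind_of[eid] == "Relation"]


primary_of = {
    "DomainA": None,
    "Repo42": "DomainA",
    "TaskA": "Repo42",
    "TaskB": "Repo42",
    "Decision9": "Repo42",
    "Rel1": "Repo42",
    "Option1": "Decision9",  # hypothetical part of Decision9
}
kind_of = {"DomainA": "Complex", "Repo42": "Complex", "TaskA": "Atom", "TaskB": "Atom",
           "Decision9": "Atom", "Rel1": "Relation", "Option1": "Atom"}
index = build_primary_index(primary_of)
```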

6.5 Neighborhood (graph view)

  • For entity X: find all relations in the same relation-space where X appears as endpoint.
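Here is the neighborhood query, sketched over a flat list of relations that follow the slot convention from 4.2. Counting extra endpoints (slots 3 and up) as matches is an assumption. The hypothetical `Relation` record carries only what the query needs, and every relation is assumed to have at least three attachments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Relation:
    id: str
    type: str
    attachments: list[str]  # [relation-space, from, to, *extras]


def neighborhood(x: str, space: str, relations: list[Relation]) -> list[tuple[str, str, str]]:
    """(type, from, to) for each relation owned by `space` in which x is an endpoint."""
    return [
        (rel.type, rel.attachments[1], rel.attachments[2])
        for rel in relations
        if rel.attachments[0] == space and x in rel.attachments[1:]
    ]


relations = [
    Relation("R1", "DependsOn", ["Repo42", "TaskA", "TaskB"]),
    Relation("R2", "Motivates", ["Repo42", "Decision9", "TaskA"]),
    Relation("R3", "Requires", ["Sbom3", "DependencyX", "DependencyY"]),
]
```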

7. Example domain: Ecosystem → Domain → Repository → Workstreams/SBOMs

This section makes the system concrete with a worked set of types from a software ecosystem.

7.1 Complexes

  • Ecosystem (Complex, root)
  • Domain (Complex, primary = Ecosystem)
  • Repository (Complex, primary = Domain)
  • Workstream (Complex, primary = Repository) — organizes work items
  • SBOM (Complex, primary = Repository) — organizes dependencies

7.2 Atoms

  • Decision (Atom, primary = Repository)
  • Task (Atom, primary = Repository)
  • TechDebt (Atom, primary = Repository)
  • Extend (Atom, primary = Repository)
  • Dependency (Atom, primary = Repository)

7.3 Organizing via secondary attachments

  • A Task in a Workstream:

    • Task.attachments = [Repo42, Workstream7]
  • A Dependency in an SBOM:

    • Dependency.attachments = [Repo42, Sbom3]

Atoms remain ignorant of how the workstream orders tasks; the workstream can store structure.
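The sketch below models 7.3 with plain dicts. Tasks and dependencies only list their attachments, while the workstream keeps its own ordering. The `order` payload field and the helper names are assumptions and are not part of the model.

```python
from __future__ import annotations

# Hypothetical data for 7.3: atoms only list attachments and carry no ordering themselves.
attachments = {
    "TaskA": ["Repo42", "Workstream7"],
    "TaskB": ["Repo42", "Workstream7"],
    "TaskC": ["Repo42"],  # not organised by any workstream
    "DependencyX": ["Repo42", "Sbom3"],
}

# The workstream owns the ordering; "order" is an assumed payload field.
workstream_payload = {"Workstream7": {"order": ["TaskB", "TaskA"]}}


def organised_by(complex_id: str) -> set[str]:
    """Entities that reference complex_id as a secondary attachment (positions 1 and up)."""
    return {eid for eid, atts in attachments.items() if complex_id in atts[1:]}


def ordered_items(workstream_id: str) -> list[str]:
    """Members in the workstream's own order; members it does not rank go last, by id."""
    members = organised_by(workstream_id)
    order = workstream_payload.get(workstream_id, {}).get("order", [])
    ranked = [eid for eid in order if eid in members]
    return ranked + sorted(members - set(ranked))
```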

7.4 Relation examples

Task → Task dependency (repo-scoped)

Relation type: DependsOn (Relation)

  • DependsOn.attachments = [Repo42, TaskA, TaskB]
  • payload: { critical: true, reason: "API contract needed first" }

Decision influences tasks (repo-scoped)

Relation type: Motivates (Relation)

  • Motivates.attachments = [Repo42, Decision9, TaskA]

Dependency graph inside an SBOM (sbom-scoped)

Relation type: Requires (Relation)

  • Requires.attachments = [Sbom3, DependencyX, DependencyY]
  • payload: { scope: "runtime" }

This cleanly separates:

  • planning relations (Repo relation-space)
  • supply-chain relations (SBOM relation-space)
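Grouping the three example relations by their primary attachment shows this separation directly. In the sketch below, the relation ids `R1` to `R3` and the `Relation` record are made up.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Relation:
    id: str
    type: str
    attachments: list[str]  # [relation-space, from, to, *extras]
    payload: dict[str, Any] = field(default_factory=dict)


relations = [
    Relation("R1", "DependsOn", ["Repo42", "TaskA", "TaskB"],
             {"critical": True, "reason": "API contract needed first"}),
    Relation("R2", "Motivates", ["Repo42", "Decision9", "TaskA"]),
    Relation("R3", "Requires", ["Sbom3", "DependencyX", "DependencyY"], {"scope": "runtime"}),
]


def by_relation_space(rels: list[Relation]) -> dict[str, list[str]]:
    """Group relation ids by their primary attachment (the owning Complex)."""
    spaces: dict[str, list[str]] = defaultdict(list)
    for rel in rels:
        spaces[rel.attachments[0]].append(rel.id)
    return dict(spaces)
```

Dropping or rebuilding one relation-space (for example, a re-ingested SBOM) leaves the planning relations untouched.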

8. Applying the modeling system to a new domain

You can apply this to any domain by following a short method.

8.1 Step-by-step method

Step 1 — Choose a root Complex

Pick the top-level scope:

  • Workspace, Tenant, Organization, Ecosystem, etc.

Step 2 — Identify “containers” vs “content”

  • Containers become Complexes (projects, folders, accounts, repositories, case files).
  • Content objects become Atoms (documents, customers, invoices, tickets, assets).

Rule of thumb:

  • If it organizes others or defines a scope, it's a Complex.
  • If it's a “thing” with intrinsic content/lifecycle, it's an Atom.

Step 3 — Define the primary hierarchy (layering)

Decide what “belongs to what”, i.e. the default place where each entity lives. Example pattern:

  • Atom.primary = nearest containing Complex

Step 4 — Define organizer complexes (optional)

Introduce complexes like Workstream, Board, Collection, SBOM, Timeline that provide structure. Use secondary attachments from atoms to these complexes.

Step 5 — Define relation-spaces

Choose where relations live:

  • typically in the “owning” complex (project/repo/case)
  • sometimes in a specialized complex (SBOM, timeline, graph)

Step 6 — Create a Type Registry + constraints

For each type, specify:

  • kind
  • required primary attachment type(s)
  • optional secondary attachment types
  • allowed relation endpoints (if relation type)

Step 7 — Migrate incrementally

Start with primary attachments and identity first. Add organizer complexes and relations later without breaking identity.


9. Applying it to an existing domain with pre-existing entities

The key is to wrap existing entities as Entities in this system without rewriting them all at once.

9.1 Integration patterns

Pattern A — “Entity wrapper” over existing tables/documents

  • Keep existing storage unchanged.

  • Create an Entity record that references external storage:

    • payload contains { externalType, externalId, sourceSystem }
  • Attachments, relations, and organization are managed in the new layer.

This is the safest “overlay” approach.
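Here is a sketch of Pattern A, in which the overlay entity holds only a pointer to the legacy record. The `wrap_legacy` helper, the `source:table:row` id format and the `tracker` source name are hypothetical. The payload keys follow the `{ externalType, externalId, sourceSystem }` shape above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entity:
    id: str
    kind: str
    type: str
    attachments: list[str]
    payload: dict[str, Any] = field(default_factory=dict)


def wrap_legacy(row_id: int, legacy_table: str, source_system: str,
                entity_type: str, kind: str, primary: str) -> Entity:
    """Create an overlay Entity that points at a legacy record instead of copying it."""
    return Entity(
        id=f"{source_system}:{legacy_table}:{row_id}",  # namespaced, stable id
        kind=kind,
        type=entity_type,
        attachments=[primary],
        # Only the reference lives here; the legacy row itself is never modified.
        payload={"externalType": legacy_table, "externalId": str(row_id),
                 "sourceSystem": source_system},
    )


ticket = wrap_legacy(1187, "tickets", "tracker", "Task", "Atom", primary="Repo42")
```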

Pattern B — “Dual write” for new objects

  • New entities are created in the new model as the source of truth.
  • Optionally mirrored into legacy storage for compatibility.

Pattern C — “Progressive normalization”

  • Start overlay-style.
  • Gradually move the most valuable types (e.g., Tasks, Decisions) into native entities.
  • Leave rarely touched legacy objects wrapped indefinitely.

9.2 Migration steps for existing data

  1. Assign stable IDs

    • If legacy IDs exist, reuse them with a namespace prefix.
  2. Create root complexes

    • e.g. one Ecosystem, or one Workspace per tenant.
  3. Attach existing entities to a primary context

    • even if initially coarse (everything attaches to one domain/project).
  4. Introduce finer complexes

    • later, split that coarse context into domains and repos/projects by moving primary attachments.
  5. Add relations incrementally

    • create relation entities for the relationships you query most.
  6. Backfill organizer complexes

    • workstreams, boards, SBOMs, etc., via secondary attachments.
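Steps 1 to 4 are sketched below on a few hypothetical legacy rows. Ids get a namespace prefix once, everything first attaches to a single coarse domain, and later refinement only rewrites primary attachments. The `legacy.` prefix and the `Domain:unsorted` and `Repo:<project>` naming are assumptions.

```python
from __future__ import annotations

# Hypothetical legacy rows: (legacy table, legacy id, legacy project name)
legacy_rows = [("tasks", 17, "billing"), ("tasks", 18, "search"), ("decisions", 3, "billing")]


def stable_id(table: str, legacy_id: int) -> str:
    """Step 1: reuse the legacy id behind a namespace prefix (prefix format is assumed)."""
    return f"legacy.{table}:{legacy_id}"


def coarse_attach(rows: list[tuple[str, int, str]]) -> dict[str, str | None]:
    """Steps 2-3: one root, one catch-all domain, every legacy entity attached to it."""
    primary_of: dict[str, str | None] = {"Eco": None, "Domain:unsorted": "Eco"}
    for table, legacy_id, _project in rows:
        primary_of[stable_id(table, legacy_id)] = "Domain:unsorted"
    return primary_of


def refine(rows: list[tuple[str, int, str]],
           primary_of: dict[str, str | None]) -> dict[str, str | None]:
    """Step 4: introduce per-project repositories by moving primary attachments only."""
    refined = dict(primary_of)
    for table, legacy_id, project in rows:
        repo = f"Repo:{project}"
        refined.setdefault(repo, "Domain:unsorted")  # new Complex under the coarse domain
        refined[stable_id(table, legacy_id)] = repo   # same id, new primary
    return refined


before = coarse_attach(legacy_rows)
after = refine(legacy_rows, before)
```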

Because relations and organization are additive, you can evolve structure without breaking identity.


10. What this system buys you

  • A uniform modeling surface across domains.
  • A clean separation of content (atoms) from structure (complexes + relations).
  • Multiple overlapping organizations via secondary attachments without duplication.
  • First-class relationships with auditability and contextual ownership.
  • Incremental adoption over legacy systems.

Extension Points

This could be turned into a compact “spec” format (like a small RFC), plus a concrete “Type Registry” table for the example domain in section 7 (including recommended relation types and constraints).
