Implements CUST-WP-0007. Resolves inconsistencies I-1, I-2, I-5, I-6
identified in the GEMS audit (GenericEntityModellingSystem.md).
Pass 1 (e1f2a3b4c5d6): domain_id FK on extension_points and
technical_debt (replaces raw string column); repo_id FK on contributions.
Fixes domain-filtering bugs in EP/TD dashboard pages.
Pass 2 (f2a3b4c5d6e7): repo_id nullable FK on workstreams, aligning
the GEMS primary attachment with ADR-001 (repo > topic). Dashboard
pages updated to prefer repo->domain over topic->domain.
Pass 3 (a3b4c5d6e7f8): SBOMSnapshot container entity (GEMS Complex
between Repository and SBOMEntry). Ingest is now additive — each call
creates a new snapshot; history is retained. List/report endpoints
filter to latest snapshot per repo via _latest_snapshot_ids_subquery().
New endpoints: GET /sbom/snapshots/, GET /sbom/snapshots/{id}/.
Dashboard gains a Snapshot History section.
Also adds GEMS analysis artefacts: wiki/GEMS-StateHub-TypeRegistry.md,
wiki/GEMS-StateHub-SWOT.md, workplans/CUST-WP-0006 (analysis),
workplans/CUST-WP-0007 (migration, now completed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

## Generic entity modeling system

A domain-agnostic data modeling system for organizing “entities under management” in a rigorous, flexible, and extensible way.

### Goals

* **Rigorous**: clear invariants, predictable querying, safe evolution.
* **Flexible**: new entity types and new relations without migrations that rewrite everything.
* **Extensible**: supports multiple domains, sub-domains, and incremental adoption over existing data.

---

# 1. Core concepts

## 1.1 Entity

An **Entity** is the atomic unit of identity and lifecycle.

**Entity fields (conceptual)**

* `id` (immutable unique identifier)
* `kind` ∈ {`Atom`, `Complex`, `Relation`}
* `type` (domain-specific type name, e.g. `Task`, `Repository`, `Customer`)
* `payload` (type-specific attributes; ideally versioned)
* `attachments` (ordered list of entity references)
* `meta` (timestamps, version, permissions, provenance)

### Entity kinds

* **Atom**: primary facts / content objects.
* **Complex**: organizational containers and structure owners (hierarchy, collections, indexes, contexts).
* **Relation**: first-class edge object that encodes a relationship between entities; owned by a Complex.
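
The conceptual fields above could be carried by a small record type. The following is a minimal sketch, not a prescribed schema: the `Kind` enum, the choice of plain dictionaries for `payload` and `meta`, and the sample ids are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Kind(Enum):
    ATOM = "Atom"
    COMPLEX = "Complex"
    RELATION = "Relation"


@dataclass
class Entity:
    id: str                       # unique identifier; treated as immutable by convention
    kind: Kind
    type: str                     # domain-specific type name, e.g. "Task"
    payload: dict[str, Any] = field(default_factory=dict)    # type-specific attributes
    attachments: list[str] = field(default_factory=list)     # ordered entity ids
    meta: dict[str, Any] = field(default_factory=dict)       # timestamps, provenance, ...


# Hypothetical sample data: a repository and a task that lives in it.
repo = Entity(id="Repo42", kind=Kind.COMPLEX, type="Repository")
task = Entity(
    id="TaskA",
    kind=Kind.ATOM,
    type="Task",
    payload={"title": "Define API contract"},
    attachments=[repo.id],
)
```

Storing attachments as ids rather than object references keeps the record serializable; a real store would resolve them on demand.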

---

# 2. Attachments

## 2.1 Attachment list

Every entity has an ordered list:

* `attachments: [EntityRef]`
* `attachments[0]` is the **Primary Attachment**.

### Derived notion: “Part-of”

If an entity’s primary attachment is an **Atom**, then the entity is a **Part** of that Atom.

This is a *classification* derived from data, not a separate stored relation.
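
Because “Part-of” is derived, it can be computed on demand instead of stored. A minimal sketch, assuming entities are held in an in-memory dictionary keyed by id; the `is_part` helper and the `Checklist1` entity are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    kind: str                                   # "Atom", "Complex", or "Relation"
    attachments: list[str] = field(default_factory=list)


def is_part(entity, store):
    """Return True when the entity's primary attachment (position 0) is an Atom."""
    if not entity.attachments:
        return False                            # no primary attachment, e.g. a root
    primary = store[entity.attachments[0]]
    return primary.kind == "Atom"


store = {
    "Repo42": Entity("Repo42", "Complex"),
    "TaskA": Entity("TaskA", "Atom", ["Repo42"]),
    "Checklist1": Entity("Checklist1", "Atom", ["TaskA"]),   # hypothetical part of TaskA
}
```

Here `Checklist1` is classified as a part of `TaskA`, while `TaskA` itself is a member of `Repo42`.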

## 2.2 Attachment roles (recommended)

To avoid ambiguity and allow validation, each attachment can optionally have a role label.

Conceptually:

* `attachment = { targetId, position, role }`

Common roles:

* `primary` (implicit by position 0)
* `index` (entity appears in this complex for navigation/search)
* `provenance` (source reference)
* `tag` (classification)
* `context` (additional scope)

Ordering remains canonical; roles improve clarity and constraints.
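
One way to represent role-labelled attachments while keeping order canonical is sketched below. The `Attachment` class and `check_attachments` helper are illustrative assumptions; the only rules checked are that positions follow list order and that the `primary` role appears only at position 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attachment:
    target_id: str
    position: int
    role: Optional[str] = None      # roles are optional labels


def check_attachments(attachments):
    """Return a list of problems; an empty list means the attachment list is consistent."""
    problems = []
    for index, att in enumerate(attachments):
        if att.position != index:
            problems.append(f"{att.target_id}: position {att.position} != index {index}")
        if att.role == "primary" and att.position != 0:
            problems.append(f"{att.target_id}: role 'primary' outside position 0")
    return problems


good = [Attachment("Repo42", 0), Attachment("Workstream7", 1, "index")]
bad = [Attachment("Repo42", 0), Attachment("Workstream7", 1, "primary")]
```

`check_attachments(good)` reports nothing, whereas `check_attachments(bad)` flags the misplaced `primary` role.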

---

# 3. Hierarchy and layering

## 3.1 Primary chain

The **Primary Chain** of an entity is obtained by repeatedly following `attachments[0]`.

**Invariant (recommended):** the primary chain must be **acyclic**.

This yields a robust layering model:

* Every entity “lives in” a context (a Complex), or is “part of” an Atom.
* You can always answer: “Where does this belong?” by walking the primary chain.
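
Walking the primary chain and enforcing the acyclicity invariant can be done in one pass. This is a sketch under the assumption that primary attachments are available as a simple id-to-id mapping (`primary_of`, a hypothetical name), with `None` marking a root.

```python
def primary_chain(entity_id, primary_of):
    """Follow attachments[0] upward from entity_id; raise ValueError on a cycle.

    primary_of maps an entity id to its primary attachment id (None for a root).
    """
    chain = []
    seen = {entity_id}
    current = primary_of.get(entity_id)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in primary chain at {current!r}")
        chain.append(current)
        seen.add(current)
        current = primary_of.get(current)
    return chain


# Hypothetical sample hierarchy.
primary_of = {
    "Ecosystem1": None,
    "DomainA": "Ecosystem1",
    "Repo42": "DomainA",
    "TaskA": "Repo42",
}
```

For `TaskA` the chain is `Repo42`, then `DomainA`, then `Ecosystem1`, which answers “Where does this belong?” at every level.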

## 3.2 Roots and scopes

A system should define at least one **root Complex** (e.g. `Ecosystem`, `Workspace`, `Tenant`).

All managed entities must be reachable from a root by following primary attachments.
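
The reachability requirement can be checked by verifying that every primary chain ends at a declared root. A minimal sketch, again assuming an id-to-id `primary_of` mapping; the `unreachable` helper and the `Stray` entity are hypothetical.

```python
def unreachable(primary_of, roots):
    """Return ids whose primary chain does not end at one of the given roots."""
    orphans = []
    for entity_id in primary_of:
        current, seen = entity_id, set()
        while current not in roots:
            if current is None or current in seen:
                orphans.append(entity_id)       # dangling chain or cycle
                break
            seen.add(current)
            current = primary_of.get(current)
    return orphans


primary_of = {
    "Ecosystem1": None,
    "DomainA": "Ecosystem1",
    "Repo42": "DomainA",
    "Stray": "Gone",            # hypothetical entity whose primary no longer exists
}
```

Running such a check after bulk imports is one way to catch entities that fell out of every scope.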

---

# 4. Relations as first-class entities

## 4.1 Relation entity

A **Relation** is an entity whose purpose is to define a connection among other entities.

**Key rule**

* A Relation’s **primary attachment MUST be a Complex**.
  That Complex is the **relation-space** (the context that “owns” the relationship).

This avoids “atoms knowing” relation details: atoms remain content, complexes and relations hold structure.

## 4.2 Relation endpoints convention

To make relations queryable and consistent, standardize attachment slots:

* `attachments[0] = contextComplex` (primary; relation-space)
* `attachments[1] = fromEndpoint`
* `attachments[2] = toEndpoint`
* `attachments[3..] = optional extra endpoints` (evidence, via, stakeholder, etc.)

Relation semantics live in:

* `type` (e.g. `DependsOn`, `Implements`, `References`)
* and/or payload fields like `{ relType: "...", strength: ..., rationale: ... }`
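
The slot convention above lends itself to named accessors over the attachment list. The sketch below is illustrative only: the `Relation` class, its property names (`from_endpoint` and `to_endpoint` stand in for `fromEndpoint` and `toEndpoint`), and the id `Rel1` are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    id: str
    type: str                                   # e.g. "DependsOn"
    attachments: list[str]                      # slot order is significant
    payload: dict = field(default_factory=dict)

    @property
    def context(self):
        return self.attachments[0]              # relation-space (primary)

    @property
    def from_endpoint(self):
        return self.attachments[1]

    @property
    def to_endpoint(self):
        return self.attachments[2]

    @property
    def extras(self):
        return self.attachments[3:]             # optional extra endpoints


dep = Relation("Rel1", "DependsOn", ["Repo42", "TaskA", "TaskB"], {"critical": True})
```

Keeping the endpoints inside `attachments` means a relation is stored and indexed exactly like any other entity; the properties are only a reading aid.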

---

# 5. Type system and constraints

## 5.1 Entity Type Registry

Maintain a registry of types describing:

* `kind`: Atom/Complex/Relation
* allowed primary attachment kinds/types
* allowed secondary attachment kinds/types
* payload schema (optional but recommended)
* indexing / query defaults

Example (conceptual):

* `Task`: kind=Atom, primary must be `Repository` (Complex)
* `Repository`: kind=Complex, primary must be `Domain` (Complex)
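
A registry of this shape can be as simple as a dictionary of type descriptors. The following sketch covers only the two example types; `TypeSpec`, `REGISTRY`, and `primary_allowed` are hypothetical names, and payload schemas and query defaults are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeSpec:
    kind: str                                   # "Atom", "Complex", or "Relation"
    primary_types: frozenset = frozenset()      # allowed types for attachments[0]
    secondary_types: frozenset = frozenset()    # allowed types for attachments[1..]


REGISTRY = {
    "Task": TypeSpec("Atom", primary_types=frozenset({"Repository"})),
    "Repository": TypeSpec("Complex", primary_types=frozenset({"Domain"})),
}


def primary_allowed(entity_type, primary_type):
    """Check a proposed primary attachment type against the registry."""
    return primary_type in REGISTRY[entity_type].primary_types
```

With this in place, `primary_allowed("Task", "Repository")` holds while `primary_allowed("Task", "Domain")` does not.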

## 5.2 Validation invariants (recommended minimum)

1. **Exactly one primary attachment** (position 0) for every non-root entity.
2. **Primary chain must be acyclic**.
3. **Primary attachment kind/type constraints** must match the registry.
4. **Context-consistency constraints** for organizer complexes:
   * if `Task` has a secondary attachment to `Workstream`,
     then `Task.primary == Workstream.primary` (same repository).
5. **Relation constraints**:
   * primary must be Complex
   * endpoint types must match relation type definition
   * relation context must match endpoint context rules (usually same repo/domain)

These constraints give rigor without hard-coding a single domain model.
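
As an illustration of how invariants 1, 3, and 4 might be enforced for a `Task`, here is a minimal sketch. It hard-codes the `Task`/`Repository`/`Workstream` rules for readability, whereas a real validator would read them from the registry; `validate_task`, `Repo43`, and `DomainA` are hypothetical.

```python
def validate_task(task_attachments, entities):
    """Check a Task's attachment list; return a list of error strings.

    entities maps id -> {"type": ..., "primary": ...}.
    """
    if not task_attachments:
        return ["missing primary attachment"]
    errors = []
    primary_id, secondary_ids = task_attachments[0], task_attachments[1:]
    if entities[primary_id]["type"] != "Repository":
        errors.append("primary must be a Repository")
    for sid in secondary_ids:
        other = entities[sid]
        # Context consistency: the workstream must live in the same repository.
        if other["type"] == "Workstream" and other["primary"] != primary_id:
            errors.append(f"{sid} belongs to a different repository")
    return errors


entities = {
    "Repo42": {"type": "Repository", "primary": "DomainA"},
    "Repo43": {"type": "Repository", "primary": "DomainA"},
    "Workstream7": {"type": "Workstream", "primary": "Repo42"},
}
```

Returning a list of errors rather than raising on the first one makes it easier to report every violation during a bulk migration.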

---

# 6. Query model (domain-agnostic)

These queries exist in any domain:

## 6.1 Locate context

* `context(entity)` = walk primary chain to root, or to the nearest scope boundary (e.g. nearest Domain/Workspace).

## 6.2 Membership

* Members of a Complex: all entities with `primary == complexId`.

## 6.3 Parts of an Atom

* Parts of an Atom: all entities with `primary == atomId`.

## 6.4 Relations in a relation-space

* Relations owned by a Complex: all Relation entities with `primary == complexId`.

## 6.5 Neighborhood (graph view)

* For entity X: find all relations in the same relation-space where X appears as endpoint.
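
Three of these queries can be sketched as plain filters over an in-memory list. This is an assumption-laden illustration (entities as dictionaries, linear scans, hypothetical function names); a real store would back the same predicates with an index on `attachments[0]`.

```python
def members(entities, complex_id):
    """Ids of all entities whose primary attachment is complex_id."""
    return [e["id"] for e in entities if e["attachments"][:1] == [complex_id]]


def relations_in(entities, space_id):
    """Relation entities owned by the given relation-space."""
    return [
        e for e in entities
        if e["kind"] == "Relation" and e["attachments"][:1] == [space_id]
    ]


def neighborhood(entities, entity_id, space_id):
    """Ids of relations in space_id where entity_id appears as an endpoint."""
    return [
        r["id"] for r in relations_in(entities, space_id)
        if entity_id in r["attachments"][1:]
    ]


ENTITIES = [
    {"id": "Repo42", "kind": "Complex", "attachments": ["DomainA"]},
    {"id": "TaskA", "kind": "Atom", "attachments": ["Repo42"]},
    {"id": "TaskB", "kind": "Atom", "attachments": ["Repo42"]},
    {"id": "Dep1", "kind": "Relation", "attachments": ["Repo42", "TaskA", "TaskB"]},
]
```

Note that `members` also returns relations, since the membership rule only looks at the primary attachment.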

---

# 7. Example domain: Ecosystem → Domain → Repository → Workstreams/SBOMs

This section makes the system concrete using a software-ecosystem example.

## 7.1 Complexes

* `Ecosystem` (Complex, root)
* `Domain` (Complex, primary = Ecosystem)
* `Repository` (Complex, primary = Domain)
* `Workstream` (Complex, primary = Repository): organizes work items
* `SBOM` (Complex, primary = Repository): organizes dependencies

## 7.2 Atoms

* `Decision` (Atom, primary = Repository)
* `Task` (Atom, primary = Repository)
* `TechDebt` (Atom, primary = Repository)
* `Extend` (Atom, primary = Repository)
* `Dependency` (Atom, primary = Repository)

## 7.3 Organizing via secondary attachments

* A Task in a Workstream:
  * `Task.attachments = [Repo42, Workstream7]`
* A Dependency in an SBOM:
  * `Dependency.attachments = [Repo42, Sbom3]`

Atoms remain ignorant of *how* the workstream orders tasks; the workstream can store structure.

## 7.4 Relation examples

### Task → Task dependency (repo-scoped)

Relation type: `DependsOn` (Relation)

* `DependsOn.attachments = [Repo42, TaskA, TaskB]`
* payload: `{ critical: true, reason: "API contract needed first" }`

### Decision influences tasks (repo-scoped)

Relation type: `Motivates` (Relation)

* `Motivates.attachments = [Repo42, Decision9, TaskA]`

### Dependency graph inside an SBOM (sbom-scoped)

Relation type: `Requires` (Relation)

* `Requires.attachments = [Sbom3, DependencyX, DependencyY]`
* payload: `{ scope: "runtime" }`

This cleanly separates:

* planning relations (Repo relation-space)
* supply-chain relations (SBOM relation-space)
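
The separation falls out of the data: grouping the three example relations by `attachments[0]` yields one bucket per relation-space. A small sketch, with relations represented as plain dictionaries and `by_relation_space` as a hypothetical helper:

```python
from collections import defaultdict

relations = [
    {"type": "DependsOn", "attachments": ["Repo42", "TaskA", "TaskB"],
     "payload": {"critical": True, "reason": "API contract needed first"}},
    {"type": "Motivates", "attachments": ["Repo42", "Decision9", "TaskA"],
     "payload": {}},
    {"type": "Requires", "attachments": ["Sbom3", "DependencyX", "DependencyY"],
     "payload": {"scope": "runtime"}},
]


def by_relation_space(rels):
    """Group relation type names by their relation-space (attachments[0])."""
    spaces = defaultdict(list)
    for rel in rels:
        spaces[rel["attachments"][0]].append(rel["type"])
    return dict(spaces)
```

Planning relations end up under `Repo42` and the supply-chain relation under `Sbom3`, without either kind needing to know about the other.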

---

# 8. Applying the modeling system to a new domain

You can apply this to any domain by following a small method.

## 8.1 Step-by-step method

### Step 1: Choose a root Complex

Pick the top-level scope:

* `Workspace`, `Tenant`, `Organization`, `Ecosystem`, etc.

### Step 2: Identify “containers” vs “content”

* Containers become **Complexes** (projects, folders, accounts, repositories, case files).
* Content objects become **Atoms** (documents, customers, invoices, tickets, assets).

Rule of thumb:

* If it *organizes* others or defines a scope, it’s a Complex.
* If it’s a “thing” with intrinsic content/lifecycle, it’s an Atom.

### Step 3: Define the primary hierarchy (layering)

Decide what “belongs to what” as the default place where entities live.

Example pattern:

* `Atom.primary = nearest containing Complex`

### Step 4: Define organizer complexes (optional)

Introduce complexes like `Workstream`, `Board`, `Collection`, `SBOM`, `Timeline` that provide structure.

Use **secondary attachments** from atoms to these complexes.

### Step 5: Define relation-spaces

Choose where relations live:

* typically in the “owning” complex (project/repo/case)
* sometimes in a specialized complex (SBOM, timeline, graph)

### Step 6: Create a Type Registry + constraints

For each type, specify:

* kind
* required primary attachment type(s)
* optional secondary attachment types
* allowed relation endpoints (if relation type)

### Step 7: Migrate incrementally

Start with primary attachments and identity first.

Add organizer complexes and relations later without breaking identity.

---

# 9. Applying it to an existing domain with pre-existing entities

The key is to **wrap** existing entities as Entities in this system without rewriting them all at once.

## 9.1 Integration patterns

### Pattern A: “Entity wrapper” over existing tables/documents

* Keep existing storage unchanged.
* Create an `Entity` record that references external storage:
  * payload contains `{ externalType, externalId, sourceSystem }`
* Attachments, relations, and organization are managed in the new layer.

This is the safest “overlay” approach.
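
A wrapper record for Pattern A might look like the sketch below. The payload keys come from the pattern description; everything else (the `wrap_legacy` function, the composite id scheme, defaulting the kind to `Atom`, and the `legacydb` system name) is an assumption for illustration.

```python
def wrap_legacy(external_type, external_id, source_system, primary_id):
    """Build an overlay Entity record that points at a row in legacy storage."""
    return {
        # Assumed id scheme: source system, type, and legacy id joined by colons.
        "id": f"{source_system}:{external_type}:{external_id}",
        "kind": "Atom",                         # assumption: wrapped rows are content
        "type": external_type,
        "payload": {
            "externalType": external_type,
            "externalId": external_id,
            "sourceSystem": source_system,
        },
        "attachments": [primary_id],            # structure lives in the new layer
    }


wrapped = wrap_legacy("Ticket", 1017, "legacydb", "Repo42")
```

The legacy row is never modified; only the overlay record carries attachments, so it can be reorganized freely.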

### Pattern B: “Dual write” for new objects

* New entities are created in the new model as the source of truth.
* Optionally mirrored into legacy storage for compatibility.

### Pattern C: “Progressive normalization”

* Start overlay-style.
* Gradually move the most valuable types (e.g., Tasks, Decisions) into native entities.
* Leave rarely touched legacy objects wrapped indefinitely.

## 9.2 Migration steps for existing data

1. **Assign stable IDs**
   * If legacy IDs exist, reuse them with a namespace prefix.
2. **Create root complexes**
   * e.g. one `Ecosystem` or per-tenant `Workspace`.
3. **Attach existing entities to a primary context**
   * even if initially coarse (everything attaches to one domain/project).
4. **Introduce finer complexes**
   * split into domains, repos/projects later by moving primary attachments.
5. **Add relations incrementally**
   * create relation entities for the relationships you query most.
6. **Backfill organizer complexes**
   * workstreams, boards, SBOMs, etc., via secondary attachments.

Because relations and organization are additive, you can evolve structure without breaking identity.
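
Steps 1 through 4 can be sketched in a few lines to show why identity survives later restructuring. The function names, the `Workspace1` root, and the row format are hypothetical; attaching everything directly to the root is just one possible coarse starting point.

```python
def migrate(legacy_rows, namespace, root_id="Workspace1"):
    """Steps 1-3: stable ids, one root Complex, coarse primary attachments."""
    entities = {root_id: {"kind": "Complex", "type": "Workspace", "attachments": []}}
    for row in legacy_rows:
        entity_id = f"{namespace}:{row['id']}"      # reuse legacy id with a prefix
        entities[entity_id] = {
            "kind": "Atom",
            "type": row["type"],
            "attachments": [root_id],               # coarse for now; refined later
        }
    return entities


def move_primary(entities, entity_id, new_primary_id):
    """Step 4: refine placement by changing attachments[0]; the id is unchanged."""
    entities[entity_id]["attachments"][0] = new_primary_id


entities = migrate([{"id": 7, "type": "Task"}], namespace="legacy")
entities["Repo42"] = {"kind": "Complex", "type": "Repository",
                      "attachments": ["Workspace1"]}
move_primary(entities, "legacy:7", "Repo42")
```

After the move, `legacy:7` still has the same id, so any relation or secondary attachment that references it remains valid.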

---

# 10. What this system buys you

* A **uniform modeling surface** across domains.
* A **clean separation** of content (atoms) from structure (complexes + relations).
* **Multiple overlapping organizations** via secondary attachments without duplication.
* **First-class relationships** with auditability and contextual ownership.
* **Incremental adoption** over legacy systems.

## Extension Points

This could be turned into a compact “spec” format (like a small RFC) plus a concrete “Type Registry” table for the example in section 7 (including recommended relation types and constraints).