the-custodian/wiki/GenericEntityModellingSystem.md
tegwick 8ab6e6c9c5 feat(gems): three-pass schema migration aligning state-hub with GEMS
Implements CUST-WP-0007. Resolves inconsistencies I-1, I-2, I-5, I-6
identified in the GEMS audit (GenericEntityModellingSystem.md).

Pass 1 (e1f2a3b4c5d6): domain_id FK on extension_points and
technical_debt (replaces raw string column); repo_id FK on contributions.
Fixes domain-filtering bugs in EP/TD dashboard pages.

Pass 2 (f2a3b4c5d6e7): repo_id nullable FK on workstreams, aligning
the GEMS primary attachment with ADR-001 (repo > topic). Dashboard
pages updated to prefer repo->domain over topic->domain.

Pass 3 (a3b4c5d6e7f8): SBOMSnapshot container entity (GEMS Complex
between Repository and SBOMEntry). Ingest is now additive — each call
creates a new snapshot; history is retained. List/report endpoints
filter to latest snapshot per repo via _latest_snapshot_ids_subquery().
New endpoints: GET /sbom/snapshots/, GET /sbom/snapshots/{id}/.
Dashboard gains a Snapshot History section.

Also adds GEMS analysis artefacts: wiki/GEMS-StateHub-TypeRegistry.md,
wiki/GEMS-StateHub-SWOT.md, workplans/CUST-WP-0006 (analysis),
workplans/CUST-WP-0007 (migration, now completed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 23:39:17 +01:00

## Generic entity modeling system
A domain-agnostic data modeling system for organizing “entities under management” in a rigorous, flexible, and extensible way.
### Goals
* **Rigorous**: clear invariants, predictable querying, safe evolution.
* **Flexible**: new entity types and new relations without migrations that rewrite everything.
* **Extensible**: supports multiple domains, sub-domains, and incremental adoption over existing data.
---
# 1. Core concepts
## 1.1 Entity
An **Entity** is the atomic unit of identity and lifecycle.
**Entity fields (conceptual)**
* `id` (immutable unique identifier)
* `kind` ∈ {`Atom`, `Complex`, `Relation`}
* `type` (domain-specific type name, e.g. `Task`, `Repository`, `Customer`)
* `payload` (type-specific attributes; ideally versioned)
* `attachments` (ordered list of entity references)
* `meta` (timestamps, version, permissions, provenance)
### Entity kinds
* **Atom**: primary facts / content objects.
* **Complex**: organizational containers and structure owners (hierarchy, collections, indexes, contexts).
* **Relation**: first-class edge object that encodes a relationship between entities; owned by a Complex.
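A minimal Python sketch of the conceptual Entity record. It assumes entity references are plain string IDs and that `meta` is a free-form dict; the names `Kind` and `Entity` are illustrative, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Kind(Enum):
    """The three entity kinds."""
    ATOM = "Atom"
    COMPLEX = "Complex"
    RELATION = "Relation"


@dataclass(frozen=True)
class Entity:
    """Conceptual entity record; references are plain string IDs (an assumption)."""
    id: str                                                # immutable unique identifier
    kind: Kind                                             # Atom, Complex, or Relation
    type: str                                              # domain type name, e.g. "Task"
    payload: dict[str, Any] = field(default_factory=dict)  # type-specific attributes
    attachments: tuple[str, ...] = ()                      # ordered refs; [0] is primary
    meta: dict[str, Any] = field(default_factory=dict)     # timestamps, version, provenance

    @property
    def primary(self) -> str | None:
        """The primary attachment, or None for a root entity."""
        return self.attachments[0] if self.attachments else None


repo = Entity("repo-42", Kind.COMPLEX, "Repository", attachments=("domain-1",))
task = Entity("task-a", Kind.ATOM, "Task", {"title": "Define API"}, ("repo-42", "ws-7"))
print(task.primary)  # repo-42
```

`frozen=True` only stops field reassignment; a real system would also version `payload` as the field list suggests.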
---
# 2. Attachments
## 2.1 Attachment list
Every entity has an ordered list:
* `attachments: [EntityRef]`
* `attachments[0]` is the **Primary Attachment**.
### Derived notion: “Part-of”
If an entity's primary attachment is an **Atom**, then the entity is a **Part** of that Atom.
This is a *classification* derived from data, not a separate stored relation.
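Because "Part-of" is derived rather than stored, it can be computed on read. A sketch, assuming a hypothetical in-memory lookup from entity ID to `(kind, attachments)`:

```python
from __future__ import annotations

# Hypothetical lookup: entity ID -> (kind, attachments); attachments[0] is primary.
ENTITIES: dict[str, tuple[str, list[str]]] = {
    "repo-42": ("Complex", ["domain-1"]),
    "task-a": ("Atom", ["repo-42"]),
    "subtask-a1": ("Atom", ["task-a"]),  # primary is an Atom, so this is a Part
}


def part_of(entity_id: str) -> str | None:
    """Return the Atom this entity is a Part of, or None.

    Derived from attachments[0] on every call; no extra relation is stored.
    """
    _, attachments = ENTITIES[entity_id]
    if not attachments:
        return None
    primary_entry = ENTITIES.get(attachments[0])
    if primary_entry is not None and primary_entry[0] == "Atom":
        return attachments[0]
    return None
```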
## 2.2 Attachment roles (recommended)
To avoid ambiguity and allow validation, each attachment can optionally have a role label.
Conceptually:
* `attachment = { targetId, position, role }`
Common roles:
* `primary` (implicit by position 0)
* `index` (entity appears in this complex for navigation/search)
* `provenance` (source reference)
* `tag` (classification)
* `context` (additional scope)
Ordering remains canonical; roles improve clarity and constraints.
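One way to keep roles consistent with the canonical ordering is to normalize them when attachments are written. This sketch assumes position 0 is always labelled `primary`, secondary roles are optional, and any given secondary role must come from the list above; `Attachment` and `normalize` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Secondary roles from the list above; "primary" is reserved for position 0.
SECONDARY_ROLES = {"index", "provenance", "tag", "context"}


@dataclass(frozen=True)
class Attachment:
    target_id: str
    position: int
    role: str | None


def normalize(targets: list[tuple[str, str | None]]) -> list[Attachment]:
    """Build attachments from (target_id, role) pairs, preserving their order."""
    result = []
    for position, (target_id, role) in enumerate(targets):
        if position == 0:
            if role not in (None, "primary"):
                raise ValueError(f"position 0 is implicitly primary, got {role!r}")
            role = "primary"
        elif role is not None and role not in SECONDARY_ROLES:
            raise ValueError(f"role {role!r} not allowed at position {position}")
        result.append(Attachment(target_id, position, role))
    return result
```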
---
# 3. Hierarchy and layering
## 3.1 Primary chain
The **Primary Chain** of an entity is obtained by repeatedly following `attachments[0]`.
**Invariant (recommended):** the primary chain must be **acyclic**.
This yields a robust layering model:
* Every entity “lives in” a context (a Complex), or is “part of” an Atom.
* You can always answer: “Where does this belong?” by walking the primary chain.
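Walking the primary chain and enforcing acyclicity fit in one loop. A sketch over a hypothetical store mapping entity IDs to their ordered attachment IDs:

```python
from __future__ import annotations

# Hypothetical store: entity ID -> ordered attachment IDs (attachments[0] is primary).
ATTACHMENTS: dict[str, list[str]] = {
    "eco": [],                    # root Complex: no primary attachment
    "domain-1": ["eco"],
    "repo-42": ["domain-1"],
    "task-a": ["repo-42", "ws-7"],
}


def primary_chain(entity_id: str) -> list[str]:
    """Follow attachments[0] upward; raise if the chain revisits an entity."""
    chain: list[str] = []
    seen: set[str] = set()
    current: str | None = entity_id
    while current is not None:
        if current in seen:
            raise ValueError(f"primary chain cycle at {current!r}")
        seen.add(current)
        chain.append(current)
        attachments = ATTACHMENTS.get(current, [])
        current = attachments[0] if attachments else None
    return chain


print(primary_chain("task-a"))  # ['task-a', 'repo-42', 'domain-1', 'eco']
```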
## 3.2 Roots and scopes
A system should define at least one **root Complex** (e.g. `Ecosystem`, `Workspace`, `Tenant`).
All managed entities must be reachable from a root by following primary attachments.
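The reachability rule can be checked store-wide. This sketch assumes `Ecosystem` is the only root type and flags any entity whose chain dangles, cycles, or ends at a non-root; the store contents are hypothetical.

```python
from __future__ import annotations

# Hypothetical store: entity ID -> (type, primary ID or None).
STORE: dict[str, tuple[str, str | None]] = {
    "eco": ("Ecosystem", None),
    "domain-1": ("Domain", "eco"),
    "repo-42": ("Repository", "domain-1"),
    "orphan": ("Task", "missing-repo"),  # dangling primary: not reachable
}
ROOT_TYPES = {"Ecosystem"}  # assumed root Complex type(s)


def unreachable(store: dict[str, tuple[str, str | None]]) -> set[str]:
    """Return IDs whose primary chain does not end at a root Complex."""
    bad: set[str] = set()
    for entity_id in store:
        seen: set[str] = set()
        current = entity_id
        while True:
            if current in seen or current not in store:
                bad.add(entity_id)  # cycle or dangling reference
                break
            seen.add(current)
            entity_type, primary = store[current]
            if primary is None:
                if entity_type not in ROOT_TYPES:
                    bad.add(entity_id)  # chain ends somewhere other than a root
                break
            current = primary
    return bad


print(unreachable(STORE))  # {'orphan'}
```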
---
# 4. Relations as first-class entities
## 4.1 Relation entity
A **Relation** is an entity whose purpose is to define a connection among other entities.
**Key rule**
* A Relation's **primary attachment MUST be a Complex**.
That Complex is the **relation-space** (the context that “owns” the relationship).
This keeps relation details out of atoms: atoms remain content, while complexes and relations hold structure.
## 4.2 Relation endpoints convention
To make relations queryable and consistent, standardize attachment slots:
* `attachments[0] = contextComplex` (primary; relation-space)
* `attachments[1] = fromEndpoint`
* `attachments[2] = toEndpoint`
* `attachments[3..] = optional extra endpoints` (evidence, via, stakeholder, etc.)
Relation semantics live in:
* `type` (e.g. `DependsOn`, `Implements`, `References`)
* and/or payload fields like `{ relType: "...", strength: ..., rationale: ... }`
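With fixed slots, a relation's attachment list can be read through a named view. In this sketch `source`/`target` stand in for `fromEndpoint`/`toEndpoint` (`from` is a Python keyword); `RelationView` and `view_relation` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RelationView:
    """Named view over a Relation's attachment slots."""
    context: str             # attachments[0]: owning relation-space (a Complex)
    source: str              # attachments[1]: fromEndpoint
    target: str              # attachments[2]: toEndpoint
    extras: tuple[str, ...]  # attachments[3..]: evidence, via, stakeholder, ...


def view_relation(attachments: list[str]) -> RelationView:
    """Split a relation's attachments into the conventional slots."""
    if len(attachments) < 3:
        raise ValueError("a Relation needs context, from, and to attachments")
    context, source, target, *extras = attachments
    return RelationView(context, source, target, tuple(extras))


r = view_relation(["repo-42", "task-a", "task-b", "doc-7"])
print(r.source, "->", r.target, "in", r.context)  # task-a -> task-b in repo-42
```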
---
# 5. Type system and constraints
## 5.1 Entity Type Registry
Maintain a registry of types describing:
* `kind`: Atom/Complex/Relation
* allowed primary attachment kinds/types
* allowed secondary attachment kinds/types
* payload schema (optional but recommended)
* indexing / query defaults
Example (conceptual):
* `Task`: kind=Atom, primary must be `Repository` (Complex)
* `Repository`: kind=Complex, primary must be `Domain` (Complex)
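A registry can be as simple as a dict of type specs. This sketch covers kind, allowed primary and secondary types, and a stand-in for the payload schema (a set of field names); indexing defaults are omitted, and the `title` field and `TypeSpec` name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TypeSpec:
    """One registry entry; field names are illustrative."""
    kind: str                                      # "Atom", "Complex", or "Relation"
    primary_types: frozenset[str]                  # allowed at attachments[0]; empty = root
    secondary_types: frozenset[str] = frozenset()  # allowed at attachments[1:]
    payload_fields: frozenset[str] = frozenset()   # stand-in for a real payload schema


REGISTRY: dict[str, TypeSpec] = {
    "Ecosystem": TypeSpec("Complex", frozenset()),
    "Domain": TypeSpec("Complex", frozenset({"Ecosystem"})),
    "Repository": TypeSpec("Complex", frozenset({"Domain"})),
    "Workstream": TypeSpec("Complex", frozenset({"Repository"})),
    "Task": TypeSpec("Atom", frozenset({"Repository"}),
                     secondary_types=frozenset({"Workstream"}),
                     payload_fields=frozenset({"title"})),
}


def primary_allowed(child_type: str, primary_type: str) -> bool:
    """True if the registry lets child_type sit under primary_type."""
    spec = REGISTRY.get(child_type)
    return spec is not None and primary_type in spec.primary_types
```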
## 5.2 Validation invariants (recommended minimum)
1. **Exactly one primary attachment** (position 0).
2. **Primary chain must be acyclic**.
3. **Primary attachment kind/type constraints** must match the registry.
4. **Context-consistency constraints** for organizer complexes:
   * if `Task` has a secondary attachment to `Workstream`,
     then `Task.primary == Workstream.primary` (same repository).
5. **Relation constraints**:
   * primary must be Complex
   * endpoint types must match relation type definition
   * relation context must match endpoint context rules (usually same repo/domain)
These constraints give rigor without hard-coding a single domain model.
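The five invariants can be checked per entity. This sketch uses a hypothetical store and registry, checks that a primary exists (an ordered list can only ever have one position 0), and assumes the "usually same repo" rule for relations means each endpoint's primary equals the relation-space.

```python
from __future__ import annotations

# Hypothetical store: entity ID -> (kind, type, attachments); attachments[0] is primary.
STORE: dict[str, tuple[str, str, list[str]]] = {
    "eco": ("Complex", "Ecosystem", []),
    "dom": ("Complex", "Domain", ["eco"]),
    "repo-42": ("Complex", "Repository", ["dom"]),
    "repo-43": ("Complex", "Repository", ["dom"]),
    "ws-7": ("Complex", "Workstream", ["repo-42"]),
    "task-a": ("Atom", "Task", ["repo-42", "ws-7"]),
    "task-b": ("Atom", "Task", ["repo-43", "ws-7"]),                      # breaks invariant 4
    "dep-1": ("Relation", "DependsOn", ["repo-42", "task-a", "task-b"]),  # breaks invariant 5
}
# Hypothetical registry: type -> (kind, allowed primary types); empty set marks a root type.
REGISTRY: dict[str, tuple[str, set[str]]] = {
    "Ecosystem": ("Complex", set()),
    "Domain": ("Complex", {"Ecosystem"}),
    "Repository": ("Complex", {"Domain"}),
    "Workstream": ("Complex", {"Repository"}),
    "Task": ("Atom", {"Repository"}),
    "DependsOn": ("Relation", {"Repository"}),
}
ORGANIZERS = {"Workstream"}                  # organizer complex types for invariant 4
ENDPOINTS = {"DependsOn": ("Task", "Task")}  # relation type -> (from type, to type)


def primary(eid: str) -> str | None:
    atts = STORE[eid][2]
    return atts[0] if atts else None


def validate(eid: str) -> list[str]:
    """Return a list of invariant violations for one entity (empty means valid)."""
    kind, etype, atts = STORE[eid]
    reg_kind, allowed = REGISTRY[etype]
    errors: list[str] = []
    if kind != reg_kind:
        errors.append(f"kind {kind} does not match registry kind {reg_kind}")
    # Invariants 1 and 3: a primary exists (unless root type) and has an allowed type.
    if not atts:
        if allowed:
            errors.append("missing primary attachment")
    elif STORE[atts[0]][1] not in allowed:
        errors.append(f"primary type {STORE[atts[0]][1]} is not allowed for {etype}")
    # Invariant 2: the primary chain is acyclic.
    seen: set[str] = set()
    current = eid
    while current is not None:
        if current in seen:
            errors.append("primary chain has a cycle")
            break
        seen.add(current)
        current = primary(current)
    # Invariant 4: an organizer complex must share the entity's primary context.
    if kind != "Relation":
        for sec in atts[1:]:
            if STORE[sec][1] in ORGANIZERS and primary(sec) != atts[0]:
                errors.append(f"{sec} belongs to {primary(sec)}, not {atts[0]}")
    # Invariant 5: relation-space is a Complex; endpoints match type and context.
    if kind == "Relation":
        if len(atts) < 3:
            return errors + ["relation needs context, from and to attachments"]
        ctx, src, dst = atts[:3]
        if STORE[ctx][0] != "Complex":
            errors.append("relation-space must be a Complex")
        expected = ENDPOINTS.get(etype)
        if expected and (STORE[src][1], STORE[dst][1]) != expected:
            errors.append("endpoint types do not match the relation type")
        for end in (src, dst):
            if primary(end) != ctx:
                errors.append(f"endpoint {end} is outside relation-space {ctx}")
    return errors
```

Returning a list of messages, rather than raising on the first failure, lets a migration report every violation in one pass.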
---
# 6. Query model (domain-agnostic)
These queries exist in any domain:
## 6.1 Locate context
* `context(entity)` = walk primary chain to root, or to the nearest scope boundary (e.g. nearest Domain/Workspace).
## 6.2 Membership
* Members of a Complex: all entities with `primary == complexId`.
## 6.3 Parts of an Atom
* Parts of an Atom: all entities with `primary == atomId`.
## 6.4 Relations in a relation-space
* Relations owned by a Complex: all Relation entities with `primary == complexId`.
## 6.5 Neighborhood (graph view)
* For entity X: find all relations in the same relation-space where X appears as endpoint.
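All five queries reduce to filters on `attachments[0]` plus one upward walk. A sketch over a hypothetical in-memory store, assuming `Domain` is the scope boundary for `context()` and that the relation-space for the neighborhood query is passed explicitly; a real store would index `attachments[0]` rather than scan.

```python
from __future__ import annotations

# Hypothetical store: entity ID -> (kind, type, attachments); attachments[0] is primary.
STORE: dict[str, tuple[str, str, list[str]]] = {
    "eco": ("Complex", "Ecosystem", []),
    "dom": ("Complex", "Domain", ["eco"]),
    "repo-42": ("Complex", "Repository", ["dom"]),
    "task-a": ("Atom", "Task", ["repo-42"]),
    "task-b": ("Atom", "Task", ["repo-42"]),
    "note-1": ("Atom", "Note", ["task-a"]),  # hypothetical type; a Part of task-a
    "dep-1": ("Relation", "DependsOn", ["repo-42", "task-a", "task-b"]),
}


def context(eid: str, boundary_types: frozenset[str] = frozenset({"Domain"})) -> str:
    """6.1: walk the primary chain to the nearest boundary type, or to the root."""
    current = eid
    while STORE[current][2]:
        current = STORE[current][2][0]
        if STORE[current][1] in boundary_types:
            break
    return current


def children(parent_id: str) -> list[str]:
    """6.2 and 6.3: all entities whose primary attachment is parent_id."""
    return [eid for eid, (_, _, atts) in STORE.items() if atts and atts[0] == parent_id]


def relations_in(complex_id: str) -> list[str]:
    """6.4: Relation entities owned by a relation-space."""
    return [eid for eid in children(complex_id) if STORE[eid][0] == "Relation"]


def neighborhood(eid: str, space: str) -> list[str]:
    """6.5: relations in `space` that list eid among their endpoints."""
    return [rid for rid in relations_in(space) if eid in STORE[rid][2][1:]]


print(children("repo-42"))  # ['task-a', 'task-b', 'dep-1']
```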
---
# 7. Example domain: Ecosystem → Domain → Repository → Workstreams/SBOMs
This section makes the system concrete using the types of a software-delivery domain.
## 7.1 Complexes
* `Ecosystem` (Complex, root)
* `Domain` (Complex, primary = Ecosystem)
* `Repository` (Complex, primary = Domain)
* `Workstream` (Complex, primary = Repository) — organizes work items
* `SBOM` (Complex, primary = Repository) — organizes dependencies
## 7.2 Atoms
* `Decision` (Atom, primary = Repository)
* `Task` (Atom, primary = Repository)
* `TechDebt` (Atom, primary = Repository)
* `Extend` (Atom, primary = Repository)
* `Dependency` (Atom, primary = Repository)
## 7.3 Organizing via secondary attachments
* A Task in a Workstream:
  * `Task.attachments = [Repo42, Workstream7]`
* A Dependency in an SBOM:
  * `Dependency.attachments = [Repo42, Sbom3]`
Atoms remain ignorant of *how* the workstream orders tasks; the workstream can store structure.
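A sketch of that split, using the IDs above. It assumes the Workstream keeps its ordering in a hypothetical `order` payload field; the tasks themselves only carry the secondary attachment.

```python
from __future__ import annotations

# Attachments from the examples above, plus one task not in any workstream.
ATTACHMENTS: dict[str, list[str]] = {
    "TaskA": ["Repo42", "Workstream7"],
    "TaskB": ["Repo42", "Workstream7"],
    "TaskC": ["Repo42"],
    "DependencyX": ["Repo42", "Sbom3"],
}
# Assumption: the organizer stores its own structure; tasks never see it.
WORKSTREAM_PAYLOAD = {"Workstream7": {"order": ["TaskB", "TaskA"]}}


def organized_by(complex_id: str) -> set[str]:
    """Entities that reference complex_id as a secondary attachment."""
    return {eid for eid, atts in ATTACHMENTS.items() if complex_id in atts[1:]}


def ordered_items(workstream_id: str) -> list[str]:
    """Members in the workstream's own order; members it has not ordered go last."""
    members = organized_by(workstream_id)
    stored = WORKSTREAM_PAYLOAD.get(workstream_id, {}).get("order", [])
    order = [eid for eid in stored if eid in members]
    return order + sorted(members - set(order))


print(ordered_items("Workstream7"))  # ['TaskB', 'TaskA']
```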
## 7.4 Relation examples
### Task → Task dependency (repo-scoped)
Relation type: `DependsOn` (Relation)
* `DependsOn.attachments = [Repo42, TaskA, TaskB]`
* payload: `{ critical: true, reason: "API contract needed first" }`
### Decision influences tasks (repo-scoped)
Relation type: `Motivates` (Relation)
* `Motivates.attachments = [Repo42, Decision9, TaskA]`
### Dependency graph inside an SBOM (sbom-scoped)
Relation type: `Requires` (Relation)
* `Requires.attachments = [Sbom3, DependencyX, DependencyY]`
* payload: `{ scope: "runtime" }`
This cleanly separates:
* planning relations (Repo relation-space)
* supply-chain relations (SBOM relation-space)
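Because every relation's primary attachment is its relation-space, separating planning from supply-chain relations is just a grouping on `attachments[0]`. The three relations above as data; the `rel-*` IDs are hypothetical.

```python
from collections import defaultdict

# The three example relations from this section; payload values copied from the text.
RELATIONS = [
    {"id": "rel-1", "type": "DependsOn", "attachments": ["Repo42", "TaskA", "TaskB"],
     "payload": {"critical": True, "reason": "API contract needed first"}},
    {"id": "rel-2", "type": "Motivates", "attachments": ["Repo42", "Decision9", "TaskA"],
     "payload": {}},
    {"id": "rel-3", "type": "Requires", "attachments": ["Sbom3", "DependencyX", "DependencyY"],
     "payload": {"scope": "runtime"}},
]


def by_relation_space(relations):
    """Group relation IDs by their relation-space (attachments[0])."""
    spaces = defaultdict(list)
    for rel in relations:
        spaces[rel["attachments"][0]].append(rel["id"])
    return dict(spaces)


print(by_relation_space(RELATIONS))  # {'Repo42': ['rel-1', 'rel-2'], 'Sbom3': ['rel-3']}
```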
---
# 8. Applying the modeling system to a new domain
You can apply the system to any domain by following a short method.
## 8.1 Step-by-step method
### Step 1 — Choose a root Complex
Pick the top-level scope:
* `Workspace`, `Tenant`, `Organization`, `Ecosystem`, etc.
### Step 2 — Identify “containers” vs “content”
* Containers become **Complexes** (projects, folders, accounts, repositories, case files).
* Content objects become **Atoms** (documents, customers, invoices, tickets, assets).
Rule of thumb:
* If it *organizes* others or defines a scope, it's a Complex.
* If it's a “thing” with intrinsic content/lifecycle, it's an Atom.
### Step 3 — Define the primary hierarchy (layering)
Decide what “belongs to what” as the default place where entities live.
Example pattern:
* `Atom.primary = nearest containing Complex`
### Step 4 — Define organizer complexes (optional)
Introduce complexes like `Workstream`, `Board`, `Collection`, `SBOM`, `Timeline` that provide structure.
Use **secondary attachments** from atoms to these complexes.
### Step 5 — Define relation-spaces
Choose where relations live:
* typically in the “owning” complex (project/repo/case)
* sometimes in a specialized complex (SBOM, timeline, graph)
### Step 6 — Create a Type Registry + constraints
For each type, specify:
* kind
* required primary attachment type(s)
* optional secondary attachment types
* allowed relation endpoints (if relation type)
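As an illustration of Steps 1 to 6, here is a registry for a hypothetical billing domain, plus a registry-level check that every type can reach the root through its allowed primary types. All type names here are assumptions chosen for the example.

```python
from __future__ import annotations

# Hypothetical registry for a billing domain:
# type -> (kind, allowed primary types, allowed secondary types)
REGISTRY: dict[str, tuple[str, set[str], set[str]]] = {
    "Workspace": ("Complex", set(), set()),            # Step 1: root Complex
    "Account": ("Complex", {"Workspace"}, set()),      # Step 2: container
    "Customer": ("Atom", {"Account"}, set()),          # Step 2: content
    "Invoice": ("Atom", {"Account"}, {"BillingRun"}),  # Step 4: organized by BillingRun
    "BillingRun": ("Complex", {"Account"}, set()),     # Step 4: organizer complex
    "Bills": ("Relation", {"Account"}, set()),         # Step 5: relation-space is Account
}
ROOT = "Workspace"


def types_reaching_root(registry: dict[str, tuple[str, set[str], set[str]]]) -> set[str]:
    """Types with at least one allowed primary path up to the root type."""
    ok = {ROOT}
    changed = True
    while changed:  # fixed-point iteration: repeat until no new type qualifies
        changed = False
        for name, (_, primaries, _) in registry.items():
            if name not in ok and primaries & ok:
                ok.add(name)
                changed = True
    return ok


print(sorted(set(REGISTRY) - types_reaching_root(REGISTRY)))  # []
```

Running this check on the registry catches types that could never satisfy the reachability rule, before any instance data exists.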
### Step 7 — Migrate incrementally
Start with primary attachments and identity first.
Add organizer complexes and relations later without breaking identity.
---
# 9. Applying it to an existing domain with pre-existing entities
The key is to **wrap** existing entities as Entities in this system without rewriting them all at once.
## 9.1 Integration patterns
### Pattern A — “Entity wrapper” over existing tables/documents
* Keep existing storage unchanged.
* Create an `Entity` record that references external storage:
  * payload contains `{ externalType, externalId, sourceSystem }`
* Attachments, relations, and organization are managed in the new layer.
This is the safest “overlay” approach.
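A minimal sketch of the overlay. It assumes the wrapper's ID is built as `sourceSystem:externalType:externalId` and that legacy records can be looked up by those three values; `WrappedEntity`, `wrap`, `resolve`, and the `tracker` source system are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class WrappedEntity:
    id: str
    kind: str
    type: str
    attachments: list[str]
    payload: dict[str, Any] = field(default_factory=dict)  # only a pointer to the legacy record


def wrap(source_system: str, external_type: str, external_id: str,
         kind: str, type_name: str, primary: str) -> WrappedEntity:
    """Create an overlay entity; the legacy record itself is never copied or modified."""
    return WrappedEntity(
        id=f"{source_system}:{external_type}:{external_id}",  # namespaced legacy ID
        kind=kind,
        type=type_name,
        attachments=[primary],
        payload={"externalType": external_type, "externalId": external_id,
                 "sourceSystem": source_system},
    )


# Stand-in for untouched legacy storage, keyed by (system, type, id).
LEGACY: dict[tuple[str, str, str], dict[str, Any]] = {
    ("tracker", "issue", "PRJ-12"): {"summary": "Fix login redirect"},
}


def resolve(entity: WrappedEntity) -> dict[str, Any]:
    """Fetch the wrapped content from legacy storage on demand."""
    p = entity.payload
    return LEGACY[(p["sourceSystem"], p["externalType"], p["externalId"])]


task = wrap("tracker", "issue", "PRJ-12", "Atom", "Task", "repo-42")
print(task.id, resolve(task)["summary"])  # tracker:issue:PRJ-12 Fix login redirect
```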
### Pattern B — “Dual write” for new objects
* New entities are created in the new model as the source of truth.
* Optionally mirrored into legacy storage for compatibility.
### Pattern C — “Progressive normalization”
* Start overlay-style.
* Gradually move the most valuable types (e.g., Tasks, Decisions) into native entities.
* Leave rarely touched legacy objects wrapped indefinitely.
## 9.2 Migration steps for existing data
1. **Assign stable IDs**
   * If legacy IDs exist, reuse them with a namespace prefix.
2. **Create root complexes**
   * e.g. one `Ecosystem` or per-tenant `Workspace`.
3. **Attach existing entities to a primary context**
   * even if initially coarse (everything attaches to one domain/project).
4. **Introduce finer complexes**
   * later, split into domains and repositories/projects by moving primary attachments.
5. **Add relations incrementally**
   * create relation entities for the relationships you query most.
6. **Backfill organizer complexes**
   * workstreams, boards, SBOMs, etc., via secondary attachments.
Because relations and organization are additive, you can evolve structure without breaking identity.
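Steps 1 to 4 as a sketch over a hypothetical in-memory store. Note that while everything sits under one coarse Domain, the registry would have to temporarily allow `Task` directly under `Domain`; the `legacy-tasks` namespace is an assumption.

```python
from __future__ import annotations

# Hypothetical in-memory store: entity ID -> {"type": ..., "attachments": [...]}.
store: dict[str, dict] = {}


def stable_id(namespace: str, legacy_id: str | int) -> str:
    """Step 1: reuse a legacy ID under a namespace prefix."""
    return f"{namespace}:{legacy_id}"


def create(eid: str, type_name: str, attachments: list[str]) -> None:
    store[eid] = {"type": type_name, "attachments": list(attachments)}


def move_primary(eid: str, new_primary: str) -> None:
    """Step 4: re-home an entity; its ID and secondary attachments are unchanged."""
    store[eid]["attachments"][0] = new_primary


# Step 2: one root. Step 3: everything starts under one coarse Domain.
create("eco", "Ecosystem", [])
create("dom-all", "Domain", ["eco"])
for legacy_id in (101, 102):
    create(stable_id("legacy-tasks", legacy_id), "Task", ["dom-all"])

# Step 4: introduce a Repository later and move one task into it.
create("repo-42", "Repository", ["dom-all"])
move_primary("legacy-tasks:101", "repo-42")
print(store["legacy-tasks:101"]["attachments"])  # ['repo-42']
```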
---
# 10. What this system buys you
* A **uniform modeling surface** across domains.
* A **clean separation** of content (atoms) from structure (complexes + relations).
* **Multiple overlapping organizations** via secondary attachments without duplication.
* **First-class relationships** with auditability and contextual ownership.
* **Incremental adoption** over legacy systems.
## Extension Points
This could be turned into a compact “spec” (like a small RFC), plus a concrete “Type Registry” table for the Section 7 example domain (including recommended relation types and constraints).