## Generic entity modeling system

A domain-agnostic data modeling system for organizing “entities under management” in a rigorous, flexible, and extensible way.

### Goals

* **Rigorous**: clear invariants, predictable querying, safe evolution.
* **Flexible**: new entity types and new relations without migrations that rewrite everything.
* **Extensible**: supports multiple domains, sub-domains, and incremental adoption over existing data.

---

# 1. Core concepts

## 1.1 Entity

An **Entity** is the atomic unit of identity and lifecycle.

**Entity fields (conceptual)**

* `id` (immutable unique identifier)
* `kind` ∈ {`Atom`, `Complex`, `Relation`}
* `type` (domain-specific type name, e.g. `Task`, `Repository`, `Customer`)
* `payload` (type-specific attributes; ideally versioned)
* `attachments` (ordered list of entity references)
* `meta` (timestamps, version, permissions, provenance)

### Entity kinds

* **Atom**: primary facts / content objects.
* **Complex**: organizational containers and structure owners (hierarchy, collections, indexes, contexts).
* **Relation**: first-class edge object that encodes a relationship between entities; owned by a Complex.
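
The conceptual fields above could be sketched as a small Python data structure. This is only an illustration: the `Kind` enum, the `Entity` class, the `primary` helper, and the choice to store attachments as plain id strings are assumptions, not part of the model itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Kind(Enum):
    ATOM = "Atom"
    COMPLEX = "Complex"
    RELATION = "Relation"


@dataclass
class Entity:
    id: str                       # treated as immutable by convention
    kind: Kind
    type: str                     # domain-specific type name, e.g. "Task"
    payload: Dict[str, Any] = field(default_factory=dict)
    attachments: List[str] = field(default_factory=list)  # ordered entity ids
    meta: Dict[str, Any] = field(default_factory=dict)

    @property
    def primary(self) -> Optional[str]:
        """Id of the attachment at position 0, or None if there is none."""
        return self.attachments[0] if self.attachments else None


# Hypothetical sample entities.
repo = Entity(id="Repo42", kind=Kind.COMPLEX, type="Repository", attachments=["Domain1"])
task = Entity(id="TaskA", kind=Kind.ATOM, type="Task", attachments=["Repo42"])
```

Storing ids rather than object references keeps each entity independently loadable; a real store might resolve references lazily.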

---

# 2. Attachments

## 2.1 Attachment list

Every entity has an ordered list:

* `attachments: [EntityRef]`
* `attachments[0]` is the **Primary Attachment**.

### Derived notion: “Part-of”

If an entity’s primary attachment is an **Atom**, then the entity is a **Part** of that Atom.

This is a *classification* derived from data, not a separate stored relation.
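
Because "Part-of" is derived, it can be computed on demand instead of stored. A minimal sketch, assuming a hypothetical in-memory `ENTITIES` map from id to `(kind, attachments)`:

```python
from typing import Optional

# Hypothetical in-memory store: id -> (kind, ordered attachment ids).
ENTITIES = {
    "Repo42": ("Complex", []),
    "Doc1": ("Atom", ["Repo42"]),
    "Section1": ("Atom", ["Doc1"]),
}


def part_of(entity_id: str) -> Optional[str]:
    """Return the id of the Atom this entity is a Part of, or None."""
    _, attachments = ENTITIES[entity_id]
    if not attachments:
        return None
    primary_id = attachments[0]
    primary_kind, _ = ENTITIES[primary_id]
    # Only a primary attachment that is an Atom makes the entity a Part.
    return primary_id if primary_kind == "Atom" else None
```

Here `Section1` is a Part of `Doc1`, while `Doc1` is not a Part of anything because its primary attachment is a Complex.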

## 2.2 Attachment roles (recommended)

To avoid ambiguity and allow validation, each attachment can optionally have a role label.

Conceptually:

* `attachment = { targetId, position, role }`

Common roles:

* `primary` (implicit by position 0)
* `index` (entity appears in this complex for navigation/search)
* `provenance` (source reference)
* `tag` (classification)
* `context` (additional scope)

Ordering remains canonical; roles improve clarity and constraints.
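
One possible encoding of role-labelled attachments is sketched below. The `Attachment` class and the `make_attachments` helper are hypothetical names; the only rule enforced is that position 0 is always the primary attachment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attachment:
    target_id: str
    position: int
    role: Optional[str] = None


def make_attachments(targets):
    """Build attachments from (target_id, role) pairs, keeping list order."""
    result = []
    for position, (target_id, role) in enumerate(targets):
        if position == 0:
            if role not in (None, "primary"):
                raise ValueError("position 0 is always the primary attachment")
            role = "primary"  # implicit by position
        elif role == "primary":
            raise ValueError("only position 0 may carry the 'primary' role")
        result.append(Attachment(target_id, position, role))
    return result


task_attachments = make_attachments([("Repo42", None), ("Workstream7", "index")])
```

The position stays authoritative; the role is an extra label that a validator can check against the type registry.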

---

# 3. Hierarchy and layering

## 3.1 Primary chain

The **Primary Chain** of an entity is obtained by repeatedly following `attachments[0]`.

**Invariant (recommended):** the primary chain must be **acyclic**.

This yields a robust layering model:

* Every entity “lives in” a context (a Complex), or is “part of” an Atom.
* You can always answer: “Where does this belong?” by walking the primary chain.
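
Walking the primary chain and enforcing the acyclicity invariant can be done in one pass. The sketch below assumes a hypothetical `ATTACHMENTS` map from entity id to its ordered attachment ids; an entity with no attachments ends the chain.

```python
# Hypothetical store: entity id -> ordered attachment ids.
ATTACHMENTS = {
    "Eco": [],
    "Domain1": ["Eco"],
    "Repo42": ["Domain1"],
    "Workstream7": ["Repo42"],
    "TaskA": ["Repo42", "Workstream7"],
}


def primary_chain(entity_id, attachments=ATTACHMENTS):
    """Return the ids reached by repeatedly following attachments[0]."""
    chain = []
    seen = {entity_id}
    current = entity_id
    while attachments.get(current):
        current = attachments[current][0]
        if current in seen:
            raise ValueError(f"cycle in primary chain at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain
```

For the sample data, `primary_chain("TaskA")` yields `["Repo42", "Domain1", "Eco"]`; the secondary attachment to `Workstream7` is ignored.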

## 3.2 Roots and scopes

A system should define at least one **root Complex** (e.g. `Ecosystem`, `Workspace`, `Tenant`).

Every managed entity's primary chain must end at a root Complex; an entity whose chain ends anywhere else is unmanaged.
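
A store could check this rule with a scan like the following sketch. `ROOTS`, `ATTACHMENTS`, and the deliberately broken `Orphan` entry are hypothetical sample data.

```python
ROOTS = {"Eco"}

# Hypothetical store: entity id -> ordered attachment ids.
ATTACHMENTS = {
    "Eco": [],
    "Domain1": ["Eco"],
    "Repo42": ["Domain1"],
    "Orphan": ["Missing"],  # primary attachment points at an unknown entity
}


def reaches_root(entity_id):
    """True if following primary attachments from entity_id ends at a root."""
    seen = set()
    current = entity_id
    while current not in ROOTS:
        dangling = current not in ATTACHMENTS or not ATTACHMENTS[current]
        if dangling or current in seen:
            return False
        seen.add(current)
        current = ATTACHMENTS[current][0]
    return True


unrooted = sorted(e for e in ATTACHMENTS if not reaches_root(e))
```

Entities listed in `unrooted` are candidates for repair or for attachment to a coarse default context.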

---

# 4. Relations as first-class entities

## 4.1 Relation entity

A **Relation** is an entity whose purpose is to define a connection among other entities.

**Key rule**

* A Relation’s **primary attachment MUST be a Complex**.
  That Complex is the **relation-space** (the context that “owns” the relationship).

This avoids “atoms knowing” relation details: atoms remain content, complexes and relations hold structure.

## 4.2 Relation endpoints convention

To make relations queryable and consistent, standardize attachment slots:

* `attachments[0] = contextComplex` (primary; relation-space)
* `attachments[1] = fromEndpoint`
* `attachments[2] = toEndpoint`
* `attachments[3..] = optional extra endpoints` (evidence, via, stakeholder, etc.)

Relation semantics live in:

* `type` (e.g. `DependsOn`, `Implements`, `References`)
* and/or payload fields like `{ relType: "...", strength: ..., rationale: ... }`
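
The slot convention makes it straightforward to read a relation's attachment list into named parts. A minimal sketch, where `RelationView` and `read_relation` are hypothetical names:

```python
from typing import NamedTuple, Tuple


class RelationView(NamedTuple):
    context: str             # attachments[0], the relation-space
    source: str              # attachments[1], the "from" endpoint
    target: str              # attachments[2], the "to" endpoint
    extras: Tuple[str, ...]  # attachments[3..], optional extra endpoints


def read_relation(attachments):
    """Split a relation's ordered attachment ids into the standard slots."""
    if len(attachments) < 3:
        raise ValueError("a relation needs a context and two endpoints")
    context, source, target, *extras = attachments
    return RelationView(context, source, target, tuple(extras))


depends_on = read_relation(["Repo42", "TaskA", "TaskB"])
```

Callers can then use `depends_on.source` and `depends_on.target` instead of remembering slot numbers.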

---

# 5. Type system and constraints

## 5.1 Entity Type Registry

Maintain a registry of types describing:

* `kind`: Atom/Complex/Relation
* allowed primary attachment kinds/types
* allowed secondary attachment kinds/types
* payload schema (optional but recommended)
* indexing / query defaults

Example (conceptual):

* `Task`: kind=Atom, primary must be `Repository` (Complex)
* `Repository`: kind=Complex, primary must be `Domain` (Complex)
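
In code, such a registry can start as a plain mapping. The sketch below encodes only the two example entries; the `REGISTRY` layout and the `primary_allowed` helper are assumptions made for illustration.

```python
# Hypothetical registry layout: type name -> kind and allowed primary types.
REGISTRY = {
    "Task": {"kind": "Atom", "primary": {"Repository"}},
    "Repository": {"kind": "Complex", "primary": {"Domain"}},
}


def primary_allowed(child_type, parent_type):
    """True if the registry lets child_type use parent_type as its primary."""
    spec = REGISTRY.get(child_type)
    return spec is not None and parent_type in spec["primary"]
```

Unknown types are rejected rather than silently accepted, which keeps the registry the single place where new types are introduced.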

## 5.2 Validation invariants (recommended minimum)

1. **Exactly one primary attachment** (position 0) for every entity except a root Complex.
2. **Primary chain must be acyclic**.
3. **Primary attachment kind/type constraints** must match the registry.
4. **Context-consistency constraints** for organizer complexes:

   * if `Task` has a secondary attachment to `Workstream`,
     then `Task.primary == Workstream.primary` (same repository).
5. **Relation constraints**:

   * primary must be Complex
   * endpoint types must match relation type definition
   * relation context must match endpoint context rules (usually same repo/domain)

These constraints give rigor without hard-coding a single domain model.
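
Two of these invariants are sketched below: the context-consistency rule for `Task` and `Workstream` (invariant 4) and the rule that a Relation's primary attachment is a Complex (part of invariant 5). The `STORE` contents are hypothetical and include two deliberately invalid entities.

```python
# Hypothetical store: id -> kind, type, ordered attachment ids.
STORE = {
    "Repo42": {"kind": "Complex", "type": "Repository", "attachments": ["Domain1"]},
    "Repo43": {"kind": "Complex", "type": "Repository", "attachments": ["Domain1"]},
    "Workstream7": {"kind": "Complex", "type": "Workstream", "attachments": ["Repo42"]},
    "TaskA": {"kind": "Atom", "type": "Task", "attachments": ["Repo42", "Workstream7"]},
    "TaskB": {"kind": "Atom", "type": "Task", "attachments": ["Repo43", "Workstream7"]},
    "Rel1": {"kind": "Relation", "type": "DependsOn",
             "attachments": ["TaskA", "TaskA", "TaskB"]},
}


def primary(entity_id):
    attachments = STORE[entity_id]["attachments"]
    return attachments[0] if attachments else None


def validate(entity_id):
    """Return a list of human-readable violations for one entity."""
    errors = []
    entity = STORE[entity_id]
    # Invariant 4: a Task organized by a Workstream must share its repository.
    if entity["type"] == "Task":
        for other in entity["attachments"][1:]:
            if STORE[other]["type"] == "Workstream" and primary(other) != primary(entity_id):
                errors.append(f"{entity_id}: {other} belongs to a different repository")
    # Invariant 5: a Relation's primary attachment must be a Complex.
    if entity["kind"] == "Relation":
        owner = primary(entity_id)
        if owner is None or STORE[owner]["kind"] != "Complex":
            errors.append(f"{entity_id}: primary attachment must be a Complex")
    return errors
```

Returning a list of violations instead of raising on the first one lets a migration report every problem in a single pass.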

---

# 6. Query model (domain-agnostic)

These queries exist in any domain:

## 6.1 Locate context

* `context(entity)` = walk primary chain to root, or to the nearest scope boundary (e.g. nearest Domain/Workspace).

## 6.2 Membership

* Members of a Complex: all entities with `primary == complexId`.

## 6.3 Parts of an Atom

* Parts of an Atom: all entities with `primary == atomId`.

## 6.4 Relations in a relation-space

* Relations owned by a Complex: all Relation entities with `primary == complexId`.

## 6.5 Neighborhood (graph view)

* For entity X: find all relations in a given relation-space where X appears as an endpoint (any attachment position after 0).
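
Membership, parts, owned relations, and neighborhood all reduce to filters on the attachment list. A minimal sketch over a hypothetical in-memory `STORE`; a real system would back these with indexes on the primary attachment:

```python
# Hypothetical store: id -> (kind, ordered attachment ids).
STORE = {
    "Repo42": ("Complex", []),
    "TaskA": ("Atom", ["Repo42"]),
    "TaskB": ("Atom", ["Repo42"]),
    "Note1": ("Atom", ["TaskA"]),
    "Dep1": ("Relation", ["Repo42", "TaskA", "TaskB"]),
}


def with_primary(target_id, kind=None):
    """Ids of entities whose primary attachment is target_id (6.2 to 6.4)."""
    return sorted(
        eid for eid, (k, atts) in STORE.items()
        if atts and atts[0] == target_id and (kind is None or k == kind)
    )


def neighborhood(entity_id, space_id):
    """Relations owned by space_id that list entity_id as an endpoint (6.5)."""
    return [
        rel for rel in with_primary(space_id, "Relation")
        if entity_id in STORE[rel][1][1:]
    ]
```

With this data, `with_primary("TaskA")` returns the parts of `TaskA`, and `neighborhood("TaskB", "Repo42")` returns the relations that touch `TaskB` in the repository's relation-space.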

---

# 7. Example domain: Ecosystem → Domain → Repository → Workstreams/SBOMs

This section makes the system concrete using types from a software-ecosystem domain.

## 7.1 Complexes

* `Ecosystem` (Complex, root)
* `Domain` (Complex, primary = Ecosystem)
* `Repository` (Complex, primary = Domain)
* `Workstream` (Complex, primary = Repository) — organizes work items
* `SBOM` (Complex, primary = Repository) — organizes dependencies

## 7.2 Atoms

* `Decision` (Atom, primary = Repository)
* `Task` (Atom, primary = Repository)
* `TechDebt` (Atom, primary = Repository)
* `Extend` (Atom, primary = Repository)
* `Dependency` (Atom, primary = Repository)

## 7.3 Organizing via secondary attachments

* A Task in a Workstream:

  * `Task.attachments = [Repo42, Workstream7]`
* A Dependency in an SBOM:

  * `Dependency.attachments = [Repo42, Sbom3]`

Atoms remain ignorant of *how* the workstream orders tasks; the workstream itself stores that structure.

## 7.4 Relation examples

### Task → Task dependency (repo-scoped)

Relation type: `DependsOn` (Relation)

* `DependsOn.attachments = [Repo42, TaskA, TaskB]`
* payload: `{ critical: true, reason: "API contract needed first" }`

### Decision influences tasks (repo-scoped)

Relation type: `Motivates` (Relation)

* `Motivates.attachments = [Repo42, Decision9, TaskA]`

### Dependency graph inside an SBOM (sbom-scoped)

Relation type: `Requires` (Relation)

* `Requires.attachments = [Sbom3, DependencyX, DependencyY]`
* payload: `{ scope: "runtime" }`

This cleanly separates:

* planning relations (Repo relation-space)
* supply-chain relations (SBOM relation-space)
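
The separation falls out of grouping relations by their primary attachment. In the sketch below the relation ids (`DependsOn1`, `Motivates1`, `Requires1`) are hypothetical; the attachment lists are the ones from the examples above.

```python
from collections import defaultdict

# Relation id -> ordered attachment ids (context first, then endpoints).
RELATIONS = {
    "DependsOn1": ["Repo42", "TaskA", "TaskB"],
    "Motivates1": ["Repo42", "Decision9", "TaskA"],
    "Requires1": ["Sbom3", "DependencyX", "DependencyY"],
}

# Group relations by relation-space, i.e. by attachments[0].
by_space = defaultdict(list)
for relation_id, attachments in RELATIONS.items():
    by_space[attachments[0]].append(relation_id)
```

`by_space["Repo42"]` then holds the planning relations and `by_space["Sbom3"]` the supply-chain relations.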

---

# 8. Applying the modeling system to a new domain

You can apply this to any domain by following a small method.

## 8.1 Step-by-step method

### Step 1 — Choose a root Complex

Pick the top-level scope:

* `Workspace`, `Tenant`, `Organization`, `Ecosystem`, etc.

### Step 2 — Identify “containers” vs “content”

* Containers become **Complexes** (projects, folders, accounts, repositories, case files).
* Content objects become **Atoms** (documents, customers, invoices, tickets, assets).

Rule of thumb:

* If it *organizes* others or defines a scope, it’s a Complex.
* If it’s a “thing” with intrinsic content/lifecycle, it’s an Atom.

### Step 3 — Define the primary hierarchy (layering)

Decide what “belongs to what” as the default place where entities live.
Example pattern:

* `Atom.primary = nearest containing Complex`

### Step 4 — Define organizer complexes (optional)

Introduce complexes like `Workstream`, `Board`, `Collection`, `SBOM`, `Timeline` that provide structure.
Use **secondary attachments** from atoms to these complexes.

### Step 5 — Define relation-spaces

Choose where relations live:

* typically in the “owning” complex (project/repo/case)
* sometimes in a specialized complex (SBOM, timeline, graph)

### Step 6 — Create a Type Registry + constraints

For each type, specify:

* kind
* required primary attachment type(s)
* optional secondary attachment types
* allowed relation endpoints (if relation type)

### Step 7 — Migrate incrementally

Start with primary attachments and identity first.
Add organizer complexes and relations later without breaking identity.

---

# 9. Applying it to an existing domain with pre-existing entities

The key is to **wrap** existing entities as Entities in this system without rewriting them all at once.

## 9.1 Integration patterns

### Pattern A — “Entity wrapper” over existing tables/documents

* Keep existing storage unchanged.
* Create an `Entity` record that references external storage:

  * payload contains `{ externalType, externalId, sourceSystem }`
* Attachments, relations, and organization are managed in the new layer.

This is the safest “overlay” approach.
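
A wrapper record for Pattern A might be built as in the sketch below. The `wrap_legacy` function, the colon-separated id format, and the default kind of `Atom` are assumptions; only the three payload keys come from the description above.

```python
def wrap_legacy(source_system, external_type, external_id, primary_id, kind="Atom"):
    """Build an overlay Entity record that points at a legacy object."""
    return {
        # Assumed id scheme: legacy id namespaced by system and type.
        "id": f"{source_system}:{external_type}:{external_id}",
        "kind": kind,
        "type": external_type,
        "payload": {
            "externalType": external_type,
            "externalId": external_id,
            "sourceSystem": source_system,
        },
        # Start with a primary attachment only; organization is added later.
        "attachments": [primary_id],
    }


ticket = wrap_legacy("jira", "Ticket", "PROJ-17", primary_id="Repo42")
```

The legacy row is never copied, so the overlay can be rebuilt from the source system at any time.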

### Pattern B — “Dual write” for new objects

* New entities are created in the new model as the source of truth.
* Optionally mirrored into legacy storage for compatibility.

### Pattern C — “Progressive normalization”

* Start overlay-style.
* Gradually move the most valuable types (e.g., Tasks, Decisions) into native entities.
* Leave rarely touched legacy objects wrapped indefinitely.

## 9.2 Migration steps for existing data

1. **Assign stable IDs**

   * If legacy IDs exist, reuse them with a namespace prefix.
2. **Create root complexes**

   * e.g. one `Ecosystem` or per-tenant `Workspace`.
3. **Attach existing entities to a primary context**

   * even if initially coarse (everything attaches to one domain/project).
4. **Introduce finer complexes**

   * split into domains, repos/projects later by moving primary attachments.
5. **Add relations incrementally**

   * create relation entities for the relationships you query most.
6. **Backfill organizer complexes**

   * workstreams, boards, SBOMs, etc., via secondary attachments.

Because relations and organization are additive, you can evolve structure without breaking identity.
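
Step 4 relies on moving a primary attachment without touching the entity's id or its secondary attachments. A minimal sketch, assuming a hypothetical map from entity id to ordered attachment ids; the `move_primary` helper also refuses moves that would make the primary chain cyclic.

```python
def move_primary(attachments, entity_id, new_primary):
    """Re-point attachments[0] of entity_id, keeping secondary attachments."""
    # Walk up from the new primary; meeting entity_id would mean a cycle.
    current, seen = new_primary, set()
    while current is not None and current not in seen:
        if current == entity_id:
            raise ValueError("move would create a cycle in the primary chain")
        seen.add(current)
        parents = attachments.get(current, [])
        current = parents[0] if parents else None
    attachments[entity_id] = [new_primary] + attachments[entity_id][1:]


# Hypothetical data: TaskA starts out attached coarsely to the domain.
data = {
    "Eco": [],
    "Domain1": ["Eco"],
    "Repo42": ["Domain1"],
    "TaskA": ["Domain1", "Workstream7"],
}
move_primary(data, "TaskA", "Repo42")
```

After the move, `TaskA` lives in `Repo42` and is still indexed by `Workstream7`.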

---

# 10. What this system buys you

* A **uniform modeling surface** across domains.
* A **clean separation** of content (atoms) from structure (complexes + relations).
* **Multiple overlapping organizations** via secondary attachments without duplication.
* **First-class relationships** with auditability and contextual ownership.
* **Incremental adoption** over legacy systems.

## Extension Points

A natural next step is to turn this document into a compact spec format (like a small RFC) plus a concrete Type Registry table for the example domain, including recommended relation types and constraints.
