identity-canon/research/ResearchSeed.md

# identity-canon Research Seed

This file captures the initial research seed for `identity-canon`.

The research goal is to distill a canonical terminology and conceptual data model for identity, user, organization, community, tenant, and relationship management in complex systems that are multi-tenant, multi-vendor, multi-community, and multi-user capable.

The model should support enterprises with sub-organizations, social communities, social-media follower graphs, single users, family entities, spontaneous interest groups, bots, service accounts, AI agents, and weak/strong synonymity between identity records.

## Initial Framing

The project should not start from a simple `user` table or from a classic `users + groups + roles` IAM schema.

A more robust canonical core is a graph of:

- actors;
- identities;
- accounts;
- identifiers;
- profiles;
- personas;
- scopes;
- tenants;
- organizations;
- communities;
- families/households;
- memberships;
- relationships;
- credentials;
- claims;
- evidence;
- synonymity assertions.

Classic IAM systems, social networks, enterprise directories, family accounts, communities, vendors, customers, and spontaneous groups can then be modeled as specializations or patterns over that graph.

## Important Research Domains

## 1. Identity Provisioning and Directory Models

Important sources:

- SCIM 2.0: RFC 7643 and RFC 7644;
- LDAP and inetOrgPerson: RFC 4519 and RFC 2798;
- Keycloak Organizations;
- ZITADEL organizations and projects;
- Ory Kratos and Keto.

Research focus:

- provisioning semantics;
- users and groups;
- organization/member terminology;
- directory assumptions;
- account lifecycle;
- separation between identity management and authorization.

SCIM is especially important as a provisioning baseline because it defines platform-neutral schemas and protocol operations for user and group resources.
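To illustrate why SCIM is a baseline rather than a full model, the following minimal sketch splits a SCIM core User resource into scoped identifiers instead of treating it as "the user". The schema URN and the `id`, `externalId`, `userName`, and `emails` attributes are taken from RFC 7643. The `Identifier` type, the `scim_user_to_identifiers` helper, the kind labels, and the sample values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"

@dataclass(frozen=True)
class Identifier:
    kind: str   # hypothetical label such as "scim_id" or "email"
    value: str
    scope: str  # where the value is meaningful (here: one SCIM service provider)

def scim_user_to_identifiers(resource: dict, provider_scope: str) -> List[Identifier]:
    """Extract identifiers from a SCIM User without assuming it is a person."""
    if SCIM_USER_SCHEMA not in resource.get("schemas", []):
        raise ValueError("not a SCIM core User resource")
    found = [
        Identifier("scim_id", resource["id"], provider_scope),
        Identifier("username", resource["userName"], provider_scope),
    ]
    if "externalId" in resource:
        found.append(Identifier("external_id", resource["externalId"], provider_scope))
    for email in resource.get("emails", []):
        found.append(Identifier("email", email["value"], provider_scope))
    return found

# Illustrative resource; the values are made up.
example = {
    "schemas": [SCIM_USER_SCHEMA],
    "id": "2819c223",
    "userName": "bjensen",
    "emails": [{"value": "bjensen@example.com", "primary": True}],
}
identifiers = scim_user_to_identifiers(example, provider_scope="scim:provider-a")
```

Whether the resource describes a natural person, a service account, or a shared mailbox is deliberately left open here; that decision would belong to the actor layer, not to provisioning.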

LDAP and inetOrgPerson remain important because lightweight IAM stacks and enterprise systems still inherit LDAP-style person, organizational unit, and group terminology.

Keycloak and ZITADEL provide vocabularies from actively maintained multi-tenant IAM products. Ory is useful because it separates identity management (Kratos) from authorization (Keto).

## 2. Authentication and Federation

Important sources:

- OpenID Connect Core;
- SAML 2.0;
- NIST SP 800-63-4;
- OpenID Shared Signals, CAEP, and RISC.

Research focus:

- issuer and subject concepts;
- pairwise and public subject identifiers;
- authentication assurance;
- federation assurance;
- assertions and claims;
- risk and security event streams;
- account linking and pseudonymous identifiers.

OIDC is central because externally issued subject identifiers and pairwise identifiers directly affect synonymity and account-linking semantics.
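A small sketch can make the linking consequence concrete. In OIDC, a `sub` value is only unique within its issuer, and with pairwise identifiers the same person is given different `sub` values at different relying parties. The code below therefore keys links by issuer, subject, and relying-party scope. `ExternalSubject`, `SubjectLinks`, the scope labels, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class ExternalSubject:
    issuer: str   # OIDC `iss` claim
    subject: str  # OIDC `sub` claim, unique only within the issuer

class SubjectLinks:
    """Maps (relying-party scope, issuer, subject) to a local account id."""

    def __init__(self) -> None:
        self._links: Dict[Tuple[str, ExternalSubject], str] = {}

    def link(self, rp_scope: str, subject: ExternalSubject, account_id: str) -> None:
        self._links[(rp_scope, subject)] = account_id

    def resolve(self, rp_scope: str, subject: ExternalSubject) -> Optional[str]:
        # A subject linked in one scope is unknown in every other scope.
        return self._links.get((rp_scope, subject))

links = SubjectLinks()
shop_sub = ExternalSubject("https://idp.example.com", "pairwise-7f3a")
forum_sub = ExternalSubject("https://idp.example.com", "pairwise-c91e")
links.link("rp:shop", shop_sub, "account:u1")
links.link("rp:forum", forum_sub, "account:u9")
```

Nothing in this structure asserts that `account:u1` and `account:u9` belong to the same actor. That would require a separate, explicit synonymity assertion.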

SAML remains important for enterprise federation and assertion semantics.

The NIST Digital Identity Guidelines (SP 800-63-4) are useful for separating identity proofing, authentication assurance, federation assurance, and lifecycle management.

Shared Signals, CAEP, and RISC suggest that canonical identity models should also anticipate dynamic security and lifecycle events.

## 3. Social Graph and Community Models

Important sources:

- ActivityPub;
- FOAF;
- WebID;
- Solid profiles;
- Schema.org Person and Organization.

Research focus:

- actors;
- followers/following;
- public profiles;
- handles;
- accounts on federated servers;
- communities;
- groups;
- social relationships;
- semantic vocabularies for persons and organizations.

ActivityPub is especially relevant because it treats users as server-side actors with inboxes and outboxes. A person may have several actors across servers, which maps well to contextual identities and personas.

FOAF and Schema.org are useful because they distinguish persons, agents, organizations, groups, and accounts, and because they define membership-like properties.

WebID and Solid are useful for user-controlled profiles and for decentralized, URI-based profile discovery.

## 4. Authorization and Relationship Semantics

Important sources:

- Google Zanzibar;
- OpenFGA;
- Cedar;
- AWS Verified Permissions;
- Cerbos.

Research focus:

- relationship-based authorization;
- subject-relation-object tuples;
- principals;
- resources;
- actions;
- context;
- roles vs permissions vs relationships;
- delegated administration.

Zanzibar/OpenFGA-style relationship tuples are especially close to what `identity-canon` needs for memberships, ownership, representation, delegation, family roles, community moderation, vendor/customer relationships, and tenant administration.
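The tuple idea can be sketched in a few lines. This is a simplified illustration, not the Zanzibar or OpenFGA API: it supports only direct subjects and userset subjects written as `object#relation`, and it omits rewrite rules, conditions, and consistency tokens. `RelationTuple`, `check`, and the identifiers are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

class RelationTuple(NamedTuple):
    obj: str       # e.g. "account:c"
    relation: str  # e.g. "manager"
    subject: str   # an actor id, or a userset such as "family:mueller#guardian"

def check(tuples: List[RelationTuple], obj: str, relation: str, subject: str,
          _seen: Optional[Set[Tuple[str, str]]] = None) -> bool:
    """Return True if `subject` holds `relation` on `obj`, following usersets."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:  # guard against cyclic userset references
        return False
    seen.add((obj, relation))
    for t in tuples:
        if t.obj != obj or t.relation != relation:
            continue
        if t.subject == subject:
            return True
        if "#" in t.subject:
            inner_obj, inner_relation = t.subject.split("#", 1)
            if check(tuples, inner_obj, inner_relation, subject, seen):
                return True
    return False

tuples = [
    RelationTuple("family:mueller", "guardian", "person:g"),
    # Every guardian of the family may manage the child account.
    RelationTuple("account:c", "manager", "family:mueller#guardian"),
]
```

Here `check(tuples, "account:c", "manager", "person:g")` succeeds through the family's guardian set, which is the kind of indirection that memberships, representation, and delegated administration need.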

Cedar’s principal-action-resource-context distinction is useful for preserving orthogonality between identity, action, resource, and request context.

## 5. Decentralized Identity and Verifiable Claims

Important sources:

- W3C DID Core;
- W3C Verifiable Credentials Data Model 2.0;
- OpenID for Verifiable Credentials.

Research focus:

- decentralized identifiers;
- DID subjects and controllers;
- verification methods;
- claims;
- issuers;
- holders;
- verifiers;
- presentations;
- portable identity claims;
- externally controlled identifiers.

DID and Verifiable Credentials are relevant when identity, membership, authorization, or representation claims are issued outside the platform.

The canonical model should distinguish claims from verified facts and should preserve issuer, evidence, scope, validity, and revocation state.
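One possible reading of this requirement, as a sketch: a claim is stored with its provenance, and it only counts as a usable fact when it has been verified, comes from a trusted issuer, is not revoked, and is inside its validity window. The `Claim` type, its field names, and `is_usable_fact` are assumptions made for illustration, not part of the DID or Verifiable Credentials specifications.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional, Set

@dataclass(frozen=True)
class Claim:
    subject_id: str
    statement: str               # e.g. "member_of org:binect"
    issuer: str
    scope: str
    evidence_ref: Optional[str]
    valid_from: datetime
    valid_until: Optional[datetime] = None
    revoked: bool = False
    verified: bool = False       # set only after signature/proof checks succeed

def is_usable_fact(claim: Claim, trusted_issuers: Set[str], at: datetime) -> bool:
    """A claim is treated as a fact only if every condition holds at time `at`."""
    in_window = claim.valid_from <= at and (
        claim.valid_until is None or at < claim.valid_until
    )
    return (claim.verified and not claim.revoked
            and claim.issuer in trusted_issuers and in_window)

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
claim = Claim(
    subject_id="did:example:123",
    statement="member_of org:binect",
    issuer="https://issuer.example",
    scope="tenant:t",
    evidence_ref="vc:abc",
    valid_from=datetime(2025, 1, 1, tzinfo=timezone.utc),
    verified=True,
)
revoked_claim = replace(claim, revoked=True)  # same claim after revocation
```

The revoked claim stays in the record with its issuer and evidence intact; only its usability changes.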

## 6. Entity Resolution, Synonymity, and Privacy

Important sources:

- deterministic matching;
- probabilistic matching;
- entity resolution and record linkage literature;
- GDPR pseudonymization and anonymization guidance.

Research focus:

- weak identity matches;
- strong identity links;
- scoped identity equivalence;
- operational account linking;
- legal identity links;
- privacy-preserving links;
- source and evidence;
- confidence;
- revocation;
- GDPR implications.

The model should avoid treating identity linkage as a destructive merge. Instead, synonymity should be modeled as an assertion with strength, scope, source, evidence, confidence, validity, and revocation state.
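The difference from a destructive merge can be sketched as follows: records are never rewritten, and the "same actor" grouping is a view computed from the currently active assertions. The `Link` type, the `clusters` function, and the record ids are hypothetical. Treating links as transitive is itself an assumption that a real model might restrict by scope or relation type.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

STRENGTH_ORDER = {"weak": 0, "medium": 1, "strong": 2, "authoritative": 3}

@dataclass
class Link:
    source: str
    target: str
    strength: str
    revoked: bool = False

def clusters(record_ids: List[str], links: List[Link],
             min_strength: str = "strong") -> List[List[str]]:
    """Group records by active links at or above `min_strength` (union-find)."""
    parent: Dict[str, str] = {r: r for r in record_ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for link in links:
        if link.revoked or STRENGTH_ORDER[link.strength] < STRENGTH_ORDER[min_strength]:
            continue
        parent[find(link.source)] = find(link.target)

    groups: Dict[str, Set[str]] = {}
    for r in record_ids:
        groups.setdefault(find(r), set()).add(r)
    return sorted(sorted(g) for g in groups.values())

records = ["acct:a", "acct:b", "csv:row7"]
links = [
    Link("acct:a", "acct:b", "strong"),
    Link("csv:row7", "acct:a", "weak"),
]
strong_view = clusters(records, links)
weak_view = clusters(records, links, min_strength="weak")
links[0].revoked = True  # revocation changes the view, not the records
after_revocation = clusters(records, links)
```

Because the grouping is derived, revoking a link or raising the required strength undoes the "merge" without any data repair.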

## Terminology Challenge

Many common terms are overloaded:

| Term | Common Meanings | Modeling Risk |
| --- | --- | --- |
| User | Human, account, login principal, profile, customer record, app user | Collapses person, account, and actor |
| Account | Login credential set, billing account, social media handle, tenant account | Collapses authentication and business relationship |
| Organization | Legal entity, tenant, department, team, community, vendor, customer | Collapses legal structure, membership scope, and operational boundary |
| Group | LDAP group, social group, permission group, family, team, community | Collapses social grouping and authorization grouping |
| Role | Job function, permission bundle, relationship label, social role | Collapses semantics, permissions, and responsibility |
| Identity | Real-world personhood, credentialed subject, account identity, profile | Collapses entity, claim, authenticator, and identifier |
| Principal | Human user, service account, agent, organization acting as an entity | Good for authorization, too narrow for social modeling |
| Tenant | Isolation boundary, customer organization, billing unit, realm | Collapses infrastructure boundary and social/legal actor |

The key design move is to stop using `user` as the root concept.

## Candidate Canonical Vocabulary

## Entity and Actor Layer

### Entity

Anything that can be referred to as a modeled thing: person, organization, family, community, bot, service, account, resource, project, domain, or device.

### Actor

An entity capable of intentional or delegated action in a system. Examples include human persons, organizations acting through representatives, AI agents, service accounts, and community bots.

### Natural Person

A human being. This should not be identical to `user`, because a person can have many accounts, profiles, personas, and relationships.

### Collective Actor

A group-like actor that can act collectively or be represented by members/admins. Subtypes include enterprise, department, family, community, interest group, vendor, customer tenant, and project team.

### Artificial Actor

A bot, service account, automation, coding agent, or autonomous agent.

## Identity and Account Layer

### Identity

A claim-bearing representation of an actor in a context. An actor can have multiple identities.

### Identifier

A value used to refer to an identity or entity: UUID, email address, username, OIDC subject, SAML NameID, DID, domain name, phone number, or employee number.

### Account

A system-local operational identity used for login, profile, preferences, sessions, and credentials.

### Profile

A presentation surface of an identity or account. A profile may be public, private, tenant-local, app-local, community-local, or audience-specific.

### Persona

A deliberate contextual identity expression of an actor. Examples include private person, employee persona, admin persona, and pseudonymous community handle.

### Credential

Something used to authenticate or prove a claim: password, passkey, certificate, TOTP seed, recovery factor, verifiable credential, or domain ownership proof.

### Authenticator

The concrete authentication factor or mechanism bound to an account/subscriber.

## Scope and Tenancy Layer

### Scope

A bounded context in which identifiers, memberships, roles, policies, and profile data have meaning.

### Tenant

A scope with operational isolation and delegated administration. A tenant may be backed by an organization, family, community, individual, vendor, or platform unit.

### Realm / Identity Domain

A hard identity boundary with separate users, credentials, clients, policies, and lifecycle.

### Organization

A structured collective actor with governance, membership, and possibly sub-organizations. It may or may not be a legal entity.

### Legal Entity

An organization recognized by a jurisdiction. Not every organization, community, or team is a legal entity.

### Community

A collective actor primarily organized by shared interest, social graph, participation, or moderation rules rather than employment/legal hierarchy.

### Household / Family

A collective actor organized around family/household relationships, guardianship, shared resources, and dependent accounts.

### Spontaneous Group

A lightweight collective actor created ad hoc around temporary interest, event, project, or conversation.

Important distinction: tenant, organization, and community must not be synonyms. A tenant is an operational boundary. An organization, community, or family is a social/legal actor that may own or inhabit a tenant.
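A minimal sketch of this separation, assuming hypothetical `CollectiveActor` and `Tenant` types: the tenant only references the actor that owns it, so one organization can own several tenants and a family can own one, without any of them being the same concept.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class CollectiveActor:
    actor_id: str
    kind: str  # e.g. "organization", "community", "family"

@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    owner_actor_id: str  # the social/legal actor behind the operational boundary

def tenants_owned_by(actor: CollectiveActor, tenants: List[Tenant]) -> List[str]:
    """List the operational boundaries owned by one actor."""
    return [t.tenant_id for t in tenants if t.owner_actor_id == actor.actor_id]

binect = CollectiveActor("org:binect", "organization")
family = CollectiveActor("family:mueller", "family")
tenants = [
    Tenant("tenant:binect-prod", "org:binect"),
    Tenant("tenant:binect-sandbox", "org:binect"),
    Tenant("tenant:mueller-home", "family:mueller"),
]
```

A schema that stored the organization and the tenant in one row could not express the two Binect tenants above.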

## Relationship Layer

### Relationship

A typed edge between entities, actors, accounts, scopes, resources, or other modeled concepts.

### Membership

A relationship where an actor participates in a collective actor or scope.

### Affiliation

A looser relationship indicating association without necessarily implying membership, authority, or access.

### Representation

A relationship where one actor can act on behalf of another.

### Delegation

A scoped, revocable grant of authority from one actor to another.

### Administration

A delegated authority to manage lifecycle, membership, policy, or resources in a scope.

### Ownership

A strong control or responsibility relationship over an entity, resource, or scope. This may require legal, operational, and data-control subtypes.

### Follower Relationship

A directional social relationship expressing subscription or attention, not necessarily trust, membership, or permission.

### Trust Relationship

A relationship where one actor accepts claims, credentials, or decisions from another actor under defined conditions.

## Role and Capability Layer

### Role

A named relationship pattern in a scope. Examples include member, owner, moderator, billing admin, guardian, employee, and vendor admin.

### Capability

An ability to perform an action, usually derived from roles, policies, relationships, credentials, or explicit grants.

### Permission

A concrete allowed action on a resource type or instance.

### Policy

A rule that derives permissions or capabilities from relationships, attributes, credentials, and context.

Keeping role, capability, permission, and policy as separate concepts prevents the classic collapse of role, group, permission bundle, and job title into a single term.
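As a sketch of how these terms can stay separate, the code below treats a role as a scoped relationship, a policy as a mapping from roles to permissions, and a permission as the derived result. The policy table, the permission strings, and `permissions_for` are hypothetical and far simpler than a real policy language such as Cedar.

```python
from typing import Dict, List, Set, Tuple

# Policy: which permissions each role name yields (illustrative values).
ROLE_POLICY: Dict[str, Set[str]] = {
    "moderator": {"post:hide", "member:mute"},
    "member": {"post:create"},
}

# Role assignment: (actor_id, role, scope_id). A role only has meaning in a scope.
RoleAssignment = Tuple[str, str, str]

def permissions_for(actor_id: str, scope_id: str,
                    assignments: List[RoleAssignment],
                    policy: Dict[str, Set[str]]) -> Set[str]:
    """Derive permissions from the actor's roles in exactly one scope."""
    granted: Set[str] = set()
    for assigned_actor, role, assigned_scope in assignments:
        if assigned_actor == actor_id and assigned_scope == scope_id:
            granted |= policy.get(role, set())
    return granted

assignments = [
    ("person:bernd", "moderator", "community:chess"),
    ("person:bernd", "member", "org:binect"),
]
```

The same person is a moderator in one scope and a plain member in another, and neither role leaks across the scope boundary.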

## Synonymity and Identity Resolution Layer

### Strong Synonymity

Two identifiers, accounts, or identities are asserted to refer to the same underlying actor with high confidence and strong evidence.

Examples:

- same verified OIDC subject from the same issuer;
- account explicitly linked after re-authentication;
- verifiable credential bound to the same DID/controller.

### Weak Synonymity

Two records may refer to the same actor based on partial, contextual, or probabilistic evidence.

Examples:

- same email seen in imported CSV and social profile;
- matching name/domain;
- same account handle without explicit proof.

### Scoped Synonymity

Two identifiers are treated as equivalent only within a defined context.

Example:

- a pairwise OIDC subject mapped to a local account for one relying party.

### Operational Link

A system-level account link used for convenience, not necessarily a real-world identity assertion.

### Legal Identity Link

A stronger assertion that may support contracts, billing, employment, guardianship, or compliance.

### Privacy-Preserving Link

A link that enables continuity without exposing global identity.

Examples:

- pairwise identifiers;
- pseudonymous handles;
- tenant-local subjects.

## Synonymity Assertion Fields

A synonymity assertion should carry at least:

```text
source
target
relation_type: same_as | probably_same_as | linked_to | represents | controls | acts_for
strength: weak | medium | strong | authoritative
scope
evidence
issuer/source_system
created_at
valid_from / valid_until
revocation_state
privacy_classification
```
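The field list above could be typed roughly as follows. This is a sketch, not a schema decision: the enum values mirror the list, while the class names, the `parse_assertion` helper, the defaults, and the assumed `"active"` revocation state are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional, Tuple

class RelationType(Enum):
    SAME_AS = "same_as"
    PROBABLY_SAME_AS = "probably_same_as"
    LINKED_TO = "linked_to"
    REPRESENTS = "represents"
    CONTROLS = "controls"
    ACTS_FOR = "acts_for"

class Strength(Enum):
    WEAK = "weak"
    MEDIUM = "medium"
    STRONG = "strong"
    AUTHORITATIVE = "authoritative"

@dataclass(frozen=True)
class SynonymityAssertion:
    source: str
    target: str
    relation_type: RelationType
    strength: Strength
    scope: str
    evidence: Tuple[str, ...]
    source_system: str
    created_at: datetime
    valid_from: datetime
    valid_until: Optional[datetime]
    revocation_state: str        # assumed values: "active" or "revoked"
    privacy_classification: str

def parse_assertion(raw: dict) -> SynonymityAssertion:
    """Build a typed assertion; unknown enum values raise ValueError."""
    until = raw.get("valid_until")
    return SynonymityAssertion(
        source=raw["source"],
        target=raw["target"],
        relation_type=RelationType(raw["relation_type"]),
        strength=Strength(raw["strength"]),
        scope=raw["scope"],
        evidence=tuple(raw.get("evidence", [])),
        source_system=raw["source_system"],
        created_at=datetime.fromisoformat(raw["created_at"]),
        valid_from=datetime.fromisoformat(raw["valid_from"]),
        valid_until=datetime.fromisoformat(until) if until else None,
        revocation_state=raw.get("revocation_state", "active"),
        privacy_classification=raw.get("privacy_classification", "unclassified"),
    )

assertion = parse_assertion({
    "source": "oidc:sub123",
    "target": "account:u",
    "relation_type": "linked_to",
    "strength": "strong",
    "scope": "rp:r",
    "evidence": ["reauth:2025-01-01"],
    "source_system": "login-service",
    "created_at": "2025-01-01T00:00:00+00:00",
    "valid_from": "2025-01-01T00:00:00+00:00",
})
```

Closed vocabularies for `relation_type` and `strength` mean a typo such as `"strnog"` fails at ingestion instead of silently creating a new link category.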

## Initial Conceptual Model Shape

```text
Entity
  ├─ Actor
  │   ├─ NaturalPerson
  │   ├─ CollectiveActor
  │   │   ├─ Organization
  │   │   │   └─ LegalEntity
  │   │   ├─ Community
  │   │   ├─ FamilyOrHousehold
  │   │   └─ SpontaneousGroup
  │   └─ ArtificialActor
  │       ├─ ServiceAccount
  │       ├─ Bot
  │       └─ Agent
  ├─ Account
  ├─ Profile
  ├─ Credential
  ├─ Resource
  └─ Scope
      ├─ Tenant
      ├─ Realm
      ├─ OrganizationScope
      ├─ CommunityScope
      └─ ApplicationScope
```

## Initial Relationship Model Shape

```text
Relationship
  subject_entity_id
  relation_type
  object_entity_id
  scope_id
  source
  evidence_ref
  strength
  status
  valid_from
  valid_until
  metadata
```

## Example Statements the Model Should Express

```text
Bernd is member of Binect
Binect is sub-organization of Whynot GmbH
User account A is operated by Bernd
ActivityPub actor @x follows @y
Child account C is represented by guardian G
Vendor tenant V provides application App1
Customer tenant C consumes application App1
Service account S acts for organization O in scope T
OIDC subject sub123 is strongly linked to local account U in relying-party scope R
Email e@example.com is weakly linked to person P based on imported evidence
```
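Several of the statements above can be written as records of the relationship shape, as in this sketch. The `Relationship` type keeps only a subset of the fields listed earlier. The id prefixes, the relation type names, and the `organizations_of` query are hypothetical, and the rule that membership extends upward through `sub_organization_of` is an assumption, not a decided semantics.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass(frozen=True)
class Relationship:
    subject_entity_id: str
    relation_type: str
    object_entity_id: str
    scope_id: Optional[str] = None
    strength: Optional[str] = None
    evidence_ref: Optional[str] = None

STATEMENTS: List[Relationship] = [
    Relationship("person:bernd", "member_of", "org:binect"),
    Relationship("org:binect", "sub_organization_of", "org:whynot-gmbh"),
    Relationship("account:a", "operated_by", "person:bernd"),
    Relationship("actor:@x", "follows", "actor:@y"),
    Relationship("service:s", "acts_for", "org:o", scope_id="tenant:t"),
    Relationship("oidc:sub123", "linked_to", "account:u",
                 scope_id="rp:r", strength="strong"),
    Relationship("email:e@example.com", "linked_to", "person:p",
                 strength="weak", evidence_ref="import:csv"),
]

def organizations_of(person_id: str, rels: List[Relationship]) -> Set[str]:
    """Direct memberships plus parent organizations (assumed upward closure)."""
    found = {r.object_entity_id for r in rels
             if r.subject_entity_id == person_id and r.relation_type == "member_of"}
    frontier = list(found)
    while frontier:
        current = frontier.pop()
        for r in rels:
            if (r.subject_entity_id == current
                    and r.relation_type == "sub_organization_of"
                    and r.object_entity_id not in found):
                found.add(r.object_entity_id)
                frontier.append(r.object_entity_id)
    return found

bernd_orgs = organizations_of("person:bernd", STATEMENTS)
```

Scenario tests for the model could take this form: each statement becomes one record, and each expected inference becomes one query.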

## CLI/UI Implications for Later Work

Although `identity-canon` is not an implementation repository, the model should later support convenient CLI/UI workflows such as:

```text
create-person
create-organization
create-community
create-family
create-spontaneous-group
create-tenant-for-actor
invite-member
link-account
claim-domain
assign-admin
delegate-authority
create-service-account
create-agent
add-follower-edge
assert-synonymity
review-synonymity
revoke-link
export-scim
sync-ldap
provision-keycloak
```

These workflows should remain downstream implementation concerns.

## Working Hypothesis

A strong canonical model can be based on five orthogonal primitives:

```text
Actor        who/what can act
Identity     how an actor is represented or claimed in a context
Scope        where a statement has meaning
Relationship how modeled things are connected
Evidence     why a statement is trusted
```

From these, operational IAM concepts can be derived:

```text
User      = account/identity used by a natural person in a scope
Tenant    = operational scope with delegated administration
Group     = collective actor or membership set, depending on context
Role      = named relationship/policy pattern in a scope
Org       = structured collective actor
Community = participatory collective actor
Family    = household/kinship collective actor
```
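The first derivation can be sketched as a query rather than a stored type. Assuming hypothetical `Actor` and `Account` records, "user" becomes the set of accounts in a scope that are operated by a natural person:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Actor:
    actor_id: str
    kind: str  # assumed values: "natural_person", "collective", "artificial"

@dataclass(frozen=True)
class Account:
    account_id: str
    scope_id: str
    operated_by: str  # actor_id of the operating actor

def users_in_scope(scope_id: str, accounts: List[Account],
                   actors: List[Actor]) -> List[str]:
    """Derive 'users': accounts in the scope operated by a natural person."""
    kinds = {a.actor_id: a.kind for a in actors}
    return [acc.account_id for acc in accounts
            if acc.scope_id == scope_id
            and kinds.get(acc.operated_by) == "natural_person"]

actors = [
    Actor("person:bernd", "natural_person"),
    Actor("service:s", "artificial"),
]
accounts = [
    Account("account:a", "tenant:t", "person:bernd"),
    Account("account:svc", "tenant:t", "service:s"),
    Account("account:b", "tenant:other", "person:bernd"),
]
```

The service account shares the scope but is not a user, and the same person appears as a different user in another scope. Neither fact needs a dedicated `user` table.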

## Research Direction

The next step is to populate the source-stack notes, extract terminology from each source, and create:

- terminology inventory;
- terminology conflict map;
- canonical glossary;
- concept cards;
- scenario tests;
- conceptual model;
- synonymity model;
- scope model;
- downstream recommendations.