Research Proposal: identity-canon
1. Purpose
identity-canon is a research and terminology project focused on developing a canonical conceptual model for identity, user, organization, community, tenant, and relationship management in complex multi-tenant systems.
The purpose of this research is to distill a coherent, orthogonal vocabulary and conceptual data model from overlapping domains such as identity and access management, enterprise directories, social graph systems, community platforms, family and household account models, decentralized identity, entity resolution, and relationship-based authorization.
The resulting work should provide a stable conceptual foundation for later implementation projects, including CLI tools, UI components, identity adapters, user-management engines, organization-management systems, and relationship graph services.
This repository is intentionally not an implementation project. It does not build a user-management system, identity provider, authorization engine, CLI, or UI. Its role is to clarify the conceptual ground before such systems are built.
2. Background
Modern identity and user-management systems often reuse the same terms with different meanings. Words such as user, account, identity, organization, group, role, tenant, profile, principal, and member are commonly overloaded.
In classic enterprise IAM, a user may mean an account in a directory. In social platforms, a user may mean a public profile, handle, or actor. In authorization systems, the relevant concept may be a principal or subject. In multi-tenant SaaS systems, an organization may be a customer, tenant, billing unit, legal entity, administrative boundary, or application-specific workspace. In communities and social graphs, groups, followers, moderators, families, and spontaneous interest groups introduce relationship patterns that do not fit cleanly into traditional enterprise directory models.
At the same time, modern systems increasingly need to support:
- multiple tenants;
- multiple vendors;
- multiple customer organizations;
- organizations with sub-organizations;
- communities and social graphs;
- family and household entities;
- individual users;
- bots, service accounts, and AI agents;
- delegated administration;
- cross-system identity linking;
- weak and strong synonymity between identity records;
- privacy-preserving pseudonymous identities;
- relationship-based authorization.
The research challenge is to develop a conceptual model that is broad enough to represent these cases, but precise enough to avoid conceptual collapse.
3. Research Goal
The primary goal is to answer the following guiding question:
What is the smallest clear set of orthogonal concepts needed to model persons, accounts, identities, organizations, tenants, communities, families, agents, and their relationships across enterprise IAM, social systems, and multi-tenant platforms?
The research should produce a canonical vocabulary and conceptual model that can serve as a reference for future identity-related systems.
4. Research Objectives
The research has the following objectives:
- Identify and compare relevant terminology from major identity, directory, federation, authorization, social graph, and decentralized identity systems.
- Document where common terms overlap, conflict, or imply different modeling assumptions.
- Define a clear canonical vocabulary for the identity-canon project.
- Distinguish social, legal, operational, authentication, and authorization meanings of identity-related concepts.
- Develop a conceptual entity and relationship model that can represent natural persons, accounts, organizations, tenants, communities, families, service accounts, bots, agents, and spontaneous groups.
- Define a model for weak and strong synonymity between entities, identities, accounts, identifiers, and profiles.
- Identify how the conceptual model can map to practical systems such as SCIM, LDAP, OIDC, SAML, Keycloak, LLDAP, Authelia, privacyIDEA, OpenBao, OpenFGA, Cedar, ActivityPub, FOAF, DID, and Verifiable Credentials.
- Provide recommendations for downstream implementation repositories without prematurely binding the research model to any specific tool or product.
5. Scope
5.1 In Scope
The research covers:
- terminology research;
- standards analysis;
- conceptual modeling;
- canonical vocabulary design;
- terminology conflict mapping;
- identity and account modeling;
- organization, tenant, community, family, and group modeling;
- actor and agent modeling;
- membership and relationship semantics;
- delegation and representation semantics;
- synonymity and identity-resolution concepts;
- source and evidence modeling;
- privacy and pseudonymity considerations;
- conceptual mapping to existing standards and tools;
- recommendations for downstream implementation work.
5.2 Out of Scope
The research does not cover:
- implementation code;
- database migrations;
- production APIs;
- CLI implementation;
- UI implementation;
- identity-provider implementation;
- authorization-policy implementation;
- direct integration with Keycloak, LDAP, SCIM, OIDC, SAML, OpenFGA, or other systems;
- operational runbooks;
- production security hardening;
- final product UX design.
Implementation-specific work may be derived from the research later, but remains outside this repository unless explicitly extracted into a separate implementation project.
6. Research Domains
The research should cover at least the following domains.
6.1 Identity Provisioning and Directory Models
Relevant topics include:
- SCIM users, groups, schemas, and provisioning flows;
- LDAP object classes and attributes;
- inetOrgPerson;
- organizational units;
- group membership models;
- enterprise directory assumptions;
- provisioning and deprovisioning lifecycle semantics.
Key questions:
- What does each system mean by user, group, organization, and member?
- Which concepts are identity concepts, and which are directory organization concepts?
- How do provisioning models differ from authentication and authorization models?
6.2 Authentication and Federation
Relevant topics include:
- OpenID Connect;
- SAML;
- WebAuthn;
- passkeys;
- federated identity;
- subject identifiers;
- pairwise and public identifiers;
- identity provider and relying party relationships;
- authentication assurance.
Key questions:
- What is the difference between an actor, an identity, an account, a subject, and a principal?
- How should externally issued identifiers be represented?
- How should pairwise or pseudonymous identifiers be modeled?
- How should assurance levels and authentication evidence be represented?
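The question about pairwise identifiers can be made concrete with a minimal sketch. OpenID Connect Core describes one example derivation in which a pairwise subject is a SHA-256 hash over the sector identifier, the local account ID, and a provider-held salt. The sketch below follows that idea but adds length prefixes to each field, which is an assumption of this example. The function name, parameter names, and sample values are hypothetical and not part of any canonical model.

```python
import hashlib


def pairwise_subject(sector_identifier: str, local_account_id: str, salt: bytes) -> str:
    """Derive a per-relying-party subject identifier (illustrative only)."""
    digest = hashlib.sha256()
    # Length-prefix each field so that different field boundaries
    # cannot produce the same byte stream (an assumption of this sketch).
    for part in (sector_identifier.encode(), local_account_id.encode(), salt):
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.hexdigest()


# Hypothetical sample values; a real salt would be kept secret by the provider.
salt = b"provider-secret-salt"
sub_a = pairwise_subject("app-a.example", "account-42", salt)
sub_b = pairwise_subject("app-b.example", "account-42", salt)

# The same account yields different, stable identifiers per relying party.
print(sub_a != sub_b)
```

For the conceptual model, the relevant point is that one account can carry several externally visible identifiers, each meaningful only within the scope of one relying party.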
6.3 Multi-Tenant IAM Systems
Relevant topics include:
- Keycloak realms, organizations, groups, roles, and clients;
- Keycape as a lightweight Keycloak-compatible concept;
- LLDAP and Authelia;
- privacyIDEA;
- ZITADEL organizations and projects;
- Ory Kratos and Keto;
- tenant administration;
- delegated user management.
Key questions:
- What is a tenant?
- How is a tenant different from an organization, realm, customer, or scope?
- How should vendor and customer relationships be modeled?
- How should one actor participate in multiple tenants or organizations?
6.4 Authorization and Relationship-Based Access Control
Relevant topics include:
- Google Zanzibar;
- OpenFGA;
- relationship tuples;
- ReBAC;
- RBAC;
- ABAC;
- Cedar;
- Cerbos;
- principal-action-resource-context models;
- delegated administration.
Key questions:
- Which relationships belong in the identity model?
- Which relationships belong only in the authorization model?
- How can canonical identity relationships support later authorization decisions without becoming an authorization engine?
- How should roles be distinguished from permissions, capabilities, memberships, and relationship labels?
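To ground the discussion of relationship tuples, the following is a minimal sketch of a Zanzibar-style tuple store with a naive check. It is not the API of OpenFGA or any other product. The `RelationTuple` type, the `check` function, the `object#relation` userset notation as used here, and all sample identifiers are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    obj: str       # e.g. "org:acme"
    relation: str  # e.g. "member"
    subject: str   # e.g. "actor:alice", or a userset such as "group:eng#member"


def check(tuples: set[RelationTuple], obj: str, relation: str, subject: str,
          _seen: set | None = None) -> bool:
    """Return True if subject holds relation on obj, directly or via a userset."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:  # guard against cyclic usersets
        return False
    seen.add((obj, relation))
    for t in tuples:
        if t.obj != obj or t.relation != relation:
            continue
        if t.subject == subject:
            return True
        if "#" in t.subject:  # follow the userset indirection
            inner_obj, inner_rel = t.subject.split("#", 1)
            if check(tuples, inner_obj, inner_rel, subject, seen):
                return True
    return False


tuples = {
    RelationTuple("group:eng", "member", "actor:alice"),
    RelationTuple("org:acme", "member", "group:eng#member"),
    RelationTuple("org:acme", "admin", "actor:bob"),
}

print(check(tuples, "org:acme", "member", "actor:alice"))  # membership via group:eng
print(check(tuples, "org:acme", "member", "actor:bob"))    # admin does not imply member here
```

The sketch illustrates the boundary question raised above: the tuples themselves look like identity relationships, while the `check` traversal is already authorization behavior.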
6.5 Social Graph and Community Models
Relevant topics include:
- ActivityPub actors;
- followers and following relationships;
- communities;
- moderation roles;
- social profiles;
- handles;
- public and private personas;
- FOAF;
- WebID;
- Solid profiles;
- Schema.org person and organization concepts.
Key questions:
- How can social graph relationships coexist with enterprise IAM relationships?
- How should followers differ from members?
- How should communities differ from organizations?
- How should public personas differ from accounts and real-world persons?
6.6 Family, Household, and Small Group Models
Relevant topics include:
- family accounts;
- household membership;
- guardianship;
- dependent accounts;
- shared resources;
- informal group administration;
- temporary and spontaneous groups.
Key questions:
- How should families and households be represented without forcing them into enterprise organization models?
- How should guardianship, representation, and delegated control be modeled?
- How should informal groups differ from formal organizations and tenants?
6.7 Decentralized Identity and Verifiable Claims
Relevant topics include:
- Decentralized Identifiers;
- DID documents;
- Verifiable Credentials;
- issuers, holders, subjects, and verifiers;
- credential evidence;
- portable claims;
- externally controlled identifiers.
Key questions:
- How should externally issued claims be represented?
- How can credentials support relationship and membership assertions?
- How should the model distinguish a claim from a verified fact?
- How can decentralized identity concepts inform a canonical model without forcing the system to become a decentralized identity platform?
6.8 Entity Resolution, Synonymity, and Privacy
Relevant topics include:
- deterministic matching;
- probabilistic matching;
- entity resolution;
- record linkage;
- weak identity links;
- strong identity links;
- pseudonymization;
- anonymization;
- GDPR-sensitive identity linkage;
- source, evidence, confidence, and revocation.
Key questions:
- When can two accounts, identities, or identifiers be treated as the same actor?
- How should uncertainty be represented?
- What is the difference between operational linking and real-world identity equivalence?
- How should identity links be scoped?
- How should privacy-preserving identity linkage be modeled?
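One way to explore these questions is to treat a link between two records as an explicit, scoped assertion rather than a merge. The sketch below is an assumption-laden illustration: the `SynonymityAssertion` fields, the `LinkStrength` values, the confidence threshold, and the rule that strong links always count are hypothetical choices, not conclusions of the research.

```python
from dataclasses import dataclass
from enum import Enum


class LinkStrength(Enum):
    WEAK = "weak"      # e.g. a probabilistic match from imported data
    STRONG = "strong"  # e.g. a link made after explicit verification


@dataclass(frozen=True)
class SynonymityAssertion:
    left: str
    right: str
    scope: str            # the link is only meaningful inside this scope
    strength: LinkStrength
    confidence: float     # 0.0 to 1.0
    evidence: str         # free-text pointer to the basis of the link
    revoked: bool = False


def same_actor_in_scope(assertions, a, b, scope, min_confidence=0.9):
    """Hypothetical policy: strong links count; weak links need high confidence."""
    for s in assertions:
        if s.revoked or s.scope != scope:
            continue
        if {s.left, s.right} != {a, b}:
            continue
        if s.strength is LinkStrength.STRONG or s.confidence >= min_confidence:
            return True
    return False


links = [
    SynonymityAssertion("account:crm/1001", "account:idp/alice", "scope:acme",
                        LinkStrength.WEAK, 0.72, "email match in imported data"),
    SynonymityAssertion("account:idp/alice", "account:forum/al1ce", "scope:acme",
                        LinkStrength.STRONG, 1.0, "explicit verification"),
    SynonymityAssertion("account:idp/alice", "account:forum/al1ce", "scope:public",
                        LinkStrength.STRONG, 1.0, "explicit verification", revoked=True),
]

print(same_actor_in_scope(links, "account:crm/1001", "account:idp/alice", "scope:acme"))
print(same_actor_in_scope(links, "account:idp/alice", "account:forum/al1ce", "scope:acme"))
print(same_actor_in_scope(links, "account:idp/alice", "account:forum/al1ce", "scope:public"))
```

Keeping the assertion separate from the linked records means a link can be revoked or restricted to a scope without rewriting either record, which is the property privacy-preserving linkage appears to need.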
7. Initial Conceptual Hypothesis
The initial hypothesis is that the canonical model can be built around a small number of orthogonal primitives:
- Entity: anything that can be referred to as a modeled thing;
- Actor: an entity capable of action or delegated action;
- Identity: a claim-bearing representation of an actor in a context;
- Account: a system-local operational identity used for access, profile, preferences, or credentials;
- Identifier: a value that refers to an entity, identity, account, or actor;
- Scope: a bounded context in which identifiers, relationships, policies, and claims have meaning;
- Relationship: a typed edge between modeled entities;
- Evidence: the basis on which a claim, relationship, or synonymity assertion is trusted;
- Credential: proof material or authentication material associated with an identity, account, or actor;
- Claim: a statement made by a source about an entity, relationship, or attribute.
From these primitives, more familiar concepts can be derived or specialized:
- User as an account or identity used by a natural person in a given scope;
- Tenant as an operational scope with isolation and delegated administration;
- Organization as a structured collective actor;
- Community as a participatory collective actor;
- Family or Household as a kinship or domestic collective actor;
- Group as either a collective actor, membership set, or authorization convenience, depending on context;
- Role as a named relationship or policy pattern within a scope;
- Principal as an actor or account considered for an authorization decision.
This hypothesis should be challenged and refined through the research process.
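As a thinking aid only, a few of the primitives and two of the derived concepts can be sketched as types. This is not a proposed schema. The field names, the `ActorKind` values, and the reduction of Tenant to a single `isolated` flag (ignoring delegated administration) are simplifying assumptions of this example.

```python
from dataclasses import dataclass
from enum import Enum


class ActorKind(Enum):
    NATURAL_PERSON = "natural_person"
    COLLECTIVE = "collective"   # organization, community, family, ...
    AGENT = "agent"             # bot, service account, AI agent


@dataclass(frozen=True)
class Scope:
    name: str
    isolated: bool = False  # assumption: a Tenant is read as an isolated scope


@dataclass(frozen=True)
class Actor:
    entity_id: str
    kind: ActorKind


@dataclass(frozen=True)
class Account:
    entity_id: str
    scope: Scope
    used_by: Actor


@dataclass(frozen=True)
class Relationship:
    source: str   # entity_id of the source entity
    type: str     # e.g. "member_of", "acts_for"
    target: str   # entity_id of the target entity
    scope: Scope


def is_user(account: Account) -> bool:
    """Derived concept: an account used by a natural person in a scope."""
    return account.used_by.kind is ActorKind.NATURAL_PERSON


def is_tenant(scope: Scope) -> bool:
    """Derived concept, simplified here to isolation only."""
    return scope.isolated


acme = Scope("acme", isolated=True)
alice = Actor("actor:alice", ActorKind.NATURAL_PERSON)
build_bot = Actor("actor:build-bot", ActorKind.AGENT)
alice_account = Account("account:acme/alice", acme, alice)
bot_account = Account("account:acme/build-bot", acme, build_bot)
acts_for = Relationship(build_bot.entity_id, "acts_for", "actor:acme-inc", acme)

print(is_user(alice_account), is_user(bot_account), is_tenant(acme))
```

In this reading, User and Tenant need no types of their own; they are predicates over Account and Scope. Whether that holds for Role, Group, and Principal is one of the things the research should test.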
8. Research Method
The research should proceed through iterative corpus analysis and model refinement.
8.1 Corpus Collection
Collect relevant standards, specifications, product documentation, architecture references, academic concepts, and practical examples.
Each source should be summarized with:
- source name;
- source type;
- domain;
- key concepts;
- relevant terminology;
- assumptions;
- modeling implications;
- conflicts with other sources;
- usefulness for identity-canon.
8.2 Terminology Extraction
Extract important terms from each source and classify them by domain.
For each term, capture:
- source-specific definition;
- implied assumptions;
- related terms;
- overloaded meanings;
- canonical candidate mapping;
- open questions.
8.3 Conflict Mapping
Identify where different systems use the same term for different concepts or different terms for similar concepts.
Examples:
- user vs person vs account vs subject vs principal;
- tenant vs organization vs realm vs customer;
- group vs community vs team vs role;
- membership vs affiliation vs follower;
- identity vs identifier vs credential vs account.
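A conflict map of this kind could start as a simple grouping of observed meanings per term. The sketch below assumes a hypothetical observation format of (domain, term, observed meaning). The sample rows paraphrase the overloads described in the Background section and are not findings of the research.

```python
from collections import defaultdict

# (domain, term, observed meaning); illustrative rows only
observations = [
    ("enterprise IAM", "user", "account in a directory"),
    ("social platform", "user", "public profile, handle, or actor"),
    ("multi-tenant SaaS", "organization", "tenant"),
    ("multi-tenant SaaS", "organization", "legal entity"),
    ("multi-tenant SaaS", "organization", "billing unit"),
    ("authorization", "principal", "subject of an authorization decision"),
]


def overloaded_terms(rows):
    """Return the terms that were observed with more than one meaning."""
    meanings = defaultdict(set)
    for _domain, term, meaning in rows:
        meanings[term].add(meaning)
    return {term: sorted(found) for term, found in meanings.items() if len(found) > 1}


conflicts = overloaded_terms(observations)
for term, found in sorted(conflicts.items()):
    print(term, "->", found)
```

The inverse grouping (one meaning, several terms) would surface the second kind of conflict named above, where different terms denote similar concepts.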
8.4 Canonical Concept Design
Define canonical concepts that minimize overlap and make distinctions explicit.
Each canonical concept should include:
- name;
- definition;
- rationale;
- included meanings;
- excluded meanings;
- related concepts;
- examples;
- counterexamples;
- mapping to external systems;
- open issues.
8.5 Model Construction
Build an initial conceptual entity and relationship model.
The model should include:
- entity types;
- actor types;
- identity/account/profile/persona distinctions;
- scope and tenant distinctions;
- collective actor types;
- relationship types;
- synonymity assertions;
- evidence and claim references;
- lifecycle states.
8.6 Scenario Testing
Test the model against representative scenarios.
Initial scenarios should include:
- single person with one local account;
- person with multiple accounts across different scopes;
- enterprise with sub-organizations;
- vendor tenant providing applications to customer tenants;
- customer organization with delegated administrators;
- family with guardian and dependent accounts;
- spontaneous interest group;
- community with members, moderators, and followers;
- social media follower graph;
- bot or service account acting for an organization;
- AI agent acting under delegated authority;
- weak identity match from imported data;
- strong account link after explicit verification;
- pseudonymous profile linked only within a restricted scope;
- organization represented by a legal entity and several operational tenants.
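A scenario can be written down as a small set of facts plus the properties the model is expected to preserve. The sketch below covers the scenario of one person with multiple accounts across different scopes. The `Account` fields, the helper names, and the sample data are hypothetical, and the checked property (identifiers are unique per scope, not globally) is an assumption drawn from the working definition of Scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    scope: str
    identifier: str
    actor: str


def accounts_of(accounts, actor):
    """List the (scope, identifier) pairs through which an actor appears."""
    return sorted((a.scope, a.identifier) for a in accounts if a.actor == actor)


def scoped_identifiers_unique(accounts):
    """Identifiers may repeat across scopes but not within one scope."""
    keys = [(a.scope, a.identifier) for a in accounts]
    return len(keys) == len(set(keys))


accounts = [
    Account("tenant:acme", "alice", "person:1"),
    Account("community:chess", "queen_a", "person:1"),
    Account("community:chess", "alice", "person:2"),  # same handle, different person
]

print(accounts_of(accounts, "person:1"))
print(scoped_identifiers_unique(accounts))
```

The handle "alice" refers to two different persons here, so the scenario fails for any model that treats identifiers as globally meaningful.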
8.7 Model Revision
Refine the model based on scenario tests and terminology conflicts.
Revisions should explicitly document:
- what changed;
- why it changed;
- which ambiguity was resolved;
- which scenarios are better supported;
- which trade-offs remain.
9. Expected Deliverables
The repository should eventually provide the following deliverables.
9.1 Research Corpus
A structured set of notes on relevant standards, tools, and conceptual domains.
Suggested location:
research/
9.2 Source Summaries
Concise summaries of individual sources.
Suggested location:
research/sources/
9.3 Terminology Inventory
A collected inventory of terms across sources.
Suggested location:
terminology/TerminologyInventory.md
9.4 Terminology Conflict Map
A document showing overloaded, conflicting, and ambiguous terms.
Suggested location:
terminology/TerminologyConflictMap.md
9.5 Canonical Glossary
The core glossary of identity-canon terms.
Suggested location:
canon/CanonicalGlossary.md
9.6 Concept Cards
Reusable concept definition files for important canonical concepts.
Suggested location:
canon/concepts/
9.7 Conceptual Model
A conceptual model describing entity types, relationship types, scopes, synonymity assertions, and evidence references.
Suggested location:
model/ConceptualModel.md
9.8 Scenario Tests
A set of scenario-based model validation notes.
Suggested location:
scenarios/
9.9 Open Questions
A running list of unresolved conceptual decisions.
Suggested location:
OpenQuestions.md
9.10 Downstream Recommendations
Recommendations for later implementation repositories.
Suggested location:
DownstreamRecommendations.md
10. Suggested Repository Structure
identity-canon/
  INTENT.md
  ResearchProposal.md
  README.md
  research/
    CorpusIndex.md
    sources/
      scim.md
      ldap.md
      oidc.md
      saml.md
      nist-digital-identity.md
      keycloak.md
      zitadel.md
      ory.md
      activitypub.md
      foaf.md
      webid-solid.md
      zanzibar-openfga.md
      cedar.md
      did.md
      verifiable-credentials.md
      entity-resolution.md
      gdpr-pseudonymization.md
  terminology/
    TerminologyInventory.md
    TerminologyConflictMap.md
    ExternalTermMappings.md
  canon/
    CanonicalGlossary.md
    DesignPrinciples.md
    concepts/
      Entity.md
      Actor.md
      NaturalPerson.md
      CollectiveActor.md
      Organization.md
      Community.md
      FamilyOrHousehold.md
      Tenant.md
      Scope.md
      Identity.md
      Account.md
      Identifier.md
      Profile.md
      Persona.md
      Credential.md
      Principal.md
      Relationship.md
      Membership.md
      Affiliation.md
      Representation.md
      Delegation.md
      Ownership.md
      Trust.md
      SynonymityAssertion.md
      Evidence.md
      Claim.md
  model/
    ConceptualModel.md
    RelationshipModel.md
    SynonymityModel.md
    ScopeModel.md
    LifecycleModel.md
  scenarios/
    ScenarioIndex.md
    SinglePersonSingleAccount.md
    EnterpriseWithSubOrganizations.md
    VendorCustomerTenants.md
    FamilyWithDependentAccounts.md
    CommunityWithFollowers.md
    SpontaneousInterestGroup.md
    ServiceAccountActingForOrganization.md
    WeakIdentityMatch.md
    StrongAccountLink.md
    PseudonymousProfile.md
  decisions/
    DecisionLog.md
  OpenQuestions.md
  DownstreamRecommendations.md
11. Success Criteria
The research should be considered successful when it provides:
- a clear distinction between persons, users, accounts, identities, profiles, personas, principals, and actors;
- a clear distinction between tenants, realms, organizations, legal entities, customers, vendors, communities, families, groups, and scopes;
- a relationship model that can represent memberships, affiliations, followers, ownership, representation, delegation, administration, and trust without collapsing them into generic groups or roles;
- a synonymity model that can represent weak, strong, scoped, operational, legal, and privacy-preserving identity links;
- terminology mappings to major external standards and tools;
- scenario tests showing that the model can represent enterprise IAM, social graph, family, community, vendor/customer, and agentic use cases;
- enough clarity for downstream projects to design schemas, APIs, CLI workflows, UI workflows, and adapters without re-solving the terminology problem.
12. Risks and Challenges
12.1 Over-Abstraction
The model may become too abstract to guide implementation. To reduce this risk, concepts should be tested against concrete scenarios.
12.2 Product Terminology Contamination
The model may accidentally inherit the terminology of one specific tool, such as Keycloak, LDAP, or OpenFGA. To reduce this risk, external mappings should be kept separate from canonical definitions.
12.3 Conceptual Collapse
Terms such as user, group, role, tenant, and organization may continue to overlap. To reduce this risk, the project should explicitly document excluded meanings and counterexamples for each canonical concept.
12.4 Privacy and Legal Complexity
Identity linking, synonymity, family relationships, and organizational representation may have privacy or legal implications. The research should document these implications without attempting to provide legal advice.
12.5 Scope Creep
The research may expand into implementation, authorization policy design, UI design, or operational tooling. To reduce this risk, downstream implementation ideas should be captured as recommendations, not built inside this repository.
13. Immediate Next Steps
- Create the initial repository structure.
- Add INTENT.md and ResearchProposal.md.
- Create research/CorpusIndex.md.
- Seed source notes for SCIM, LDAP, OIDC, SAML, Keycloak, ActivityPub, OpenFGA, Cedar, DID, Verifiable Credentials, and entity resolution.
- Create terminology/TerminologyInventory.md.
- Create terminology/TerminologyConflictMap.md.
- Draft the first version of canon/DesignPrinciples.md.
- Draft the first version of canon/CanonicalGlossary.md.
- Create the first scenario tests.
- Refine the conceptual model based on the scenario tests.
14. Working Definition of the Project
identity-canon researches and defines the conceptual language needed to model identity-related systems before they become implementation-specific.
It aims to make identity, account, organization, tenant, community, family, agent, relationship, and synonymity concepts precise enough that future systems can be built with less ambiguity, better interoperability, and clearer security boundaries.