identity-canon/research/entity-resolution-privacy/gdpr-pseudonymization.md
tegwick 1c1b5c9bc6 Complete IDENTITY-WP-0003 corpus backfill and model refinement
Backfill all 23 research source notes with terminology extracts, modeling
assumptions, conflicts, canonical mappings, and references. Refresh terminology
artifacts, refine the conceptual model with explicit scenario paths, reconcile
canon surfaces and open questions, and mark the workplan finished.
2026-06-21 20:22:20 +02:00

# GDPR Pseudonymization and Privacy
## Source Type
Regulatory guidance. EU GDPR (Regulation 2016/679) Article 4(5) and Recital 26;
EDPB guidance on identifiability, anonymization, and data subject rights.
## Domain
Privacy regulation, pseudonymization, identifiability, data minimization, and
lawful basis for identity processing.
## Why This Source Matters
GDPR pseudonymization and identifiability concepts affect how canonical models
should represent privacy-limited links, scoped identifiers, and correlation risk.
## Key Concepts
- **Personal data**: any information relating to an identified or identifiable
natural person.
- **Identifiable person**: a person who can be identified, directly or
indirectly, by means reasonably likely to be used (Recital 26).
- **Pseudonymization (Art. 4(5))**: processing personal data so that it can no
longer be attributed to a specific data subject without additional information
that is kept separately and protected by technical and organizational measures.
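- **Anonymization**: irreversible de-identification; the result is no longer
personal data.
- **Data subject**: the identified or identifiable natural person to whom
personal data relates.
- **Controller / Processor**: the controller determines the purposes and means
of processing; the processor processes personal data on the controller's behalf.
- **Purpose limitation**: data is collected and used only for specified,
explicit, and legitimate purposes.
- **Data minimization**: data is adequate, relevant, and limited to what is
necessary for the purpose.
- **Right of access / erasure**: data subject rights (Art. 15, Art. 17) that
affect linked records.
- **Additional information**: whatever allows re-identification of pseudonymous
data (for example a key or mapping table), held separately from that data.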
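
The separation between pseudonymous data and the additional information can be illustrated with a minimal Python sketch. It assumes a keyed HMAC-SHA256 pseudonym and an in-memory reverse map; the `KeyCustodian` class and its method names are hypothetical and are not defined by GDPR or by the canon.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class KeyCustodian:
    """Hypothetical holder of the 'additional information' (key and reverse map)."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)   # secret key, kept away from data users
        self._reverse: Dict[str, str] = {}    # pseudonym -> normalized identifier

    def pseudonymize(self, identifier: str) -> str:
        """Return a keyed pseudonym and remember how to reverse it."""
        normalized = identifier.strip().lower()
        pseudonym = hmac.new(
            self._key, normalized.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        self._reverse[pseudonym] = normalized
        return pseudonym

    def reidentify(self, pseudonym: str) -> Optional[str]:
        """Only the custodian can map a pseudonym back to the identifier."""
        return self._reverse.get(pseudonym)


custodian = KeyCustodian()
pseudonym = custodian.pseudonymize("Alice@Example.com")

# The working dataset carries only the pseudonym, never the email address.
working_record = {"subject": pseudonym, "plan": "premium"}
```

Anyone holding only `working_record` cannot attribute it to a person, while the custodian still can. That is why the record remains personal data rather than anonymized data.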
## Relevant Terminology
| Term | Source meaning |
| --- | --- |
| Personal data | Information relating to an identified or identifiable natural person. |
| Pseudonymization | De-identification that stays reversible through separately held additional information. |
| Anonymization | Irreversible; no longer personal data (if effective). |
| Data subject | Natural person the data relates to. |
| Identifiable | Linkable to a person by means reasonably likely to be used. |
| Additional information | Re-identification key or mapping, stored separately. |
| Controller | Determines purposes and means of processing. |
| Processing | Any operation on personal data. |
| Erasure | Deletion of personal data on request (right to be forgotten, Art. 17). |
| Profiling | Automated evaluation of personal aspects. |
## Modeling Assumptions
- **Pseudonymization is not anonymization**; data may remain personal.
- **Separate storage of additional information** is required for pseudonymization.
- **Scope and access control on keys** determine correlation risk.
- **Linking pseudonymous records across purposes** may increase identifiability.
- **Legal basis and purpose** govern whether linking is permissible.
- **Erasure requests** may require breaking links or deleting assertions.
- **Regulatory role (controller)** is organizational, not purely technical.
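
The assumptions about key scope and cross-purpose linking can be made concrete with a short sketch. It assumes pseudonyms are derived per scope with HMAC-SHA256; `scoped_pseudonym`, the scope names, and the demo key are illustrative only.

```python
import hashlib
import hmac


def scoped_pseudonym(key: bytes, scope: str, identifier: str) -> str:
    """Derive a pseudonym bound to one scope (hypothetical helper)."""
    normalized = identifier.strip().lower().encode("utf-8")
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = scope.encode("utf-8") + b"\x00" + normalized
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Assumption: a real key would come from a separately secured store.
key = b"demo-key-for-illustration-only"

billing = scoped_pseudonym(key, "billing", "alice@example.com")
analytics = scoped_pseudonym(key, "analytics", "alice@example.com")

# Stable within a scope, so records inside one purpose still join.
same_scope_stable = billing == scoped_pseudonym(key, "billing", "Alice@Example.com")
# Different across scopes, so datasets cannot be joined without the key.
cross_scope_linkable = billing == analytics
```

In this sketch, correlating the billing and analytics datasets requires access to the key, so whoever controls the key controls the correlation risk.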
## Identity-Canon Implications
- **Pseudonymous Identifier** and **Scoped Identifier** map to pseudonymization
techniques (pairwise `sub` values, hashed email addresses, internal IDs).
- **Privacy-limited Synonymity Assertion** must record privacy classification
and scope (S14).
- **Additional information** (re-identification key) maps to separately secured
**Evidence Source** or **Credential** with strict Scope access.
- **Data subject** maps to **Natural Person** with privacy rights overlay
(downstream policy, not canon legal advice).
- **Erasure** maps to Lifecycle State transitions: revoke assertions, sever
bindings, archive with legal exceptions noted downstream.
- Pairwise OIDC subject identifiers, tenant-local subjects, and restricted
persona links are technical pseudonymization patterns aligned with GDPR concepts.
- Reinforces visibility of privacy constraints on relationships (**P8**, S14 checks).
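
One way a privacy-limited Synonymity Assertion might carry its classification and scope is sketched below. The enum values, field names, and the `visible_in` rule are all assumptions made for illustration; the canon does not define them.

```python
from dataclasses import dataclass
from enum import Enum


class PrivacyClassification(Enum):
    # Hypothetical values; not a canon-defined enum.
    OPEN = "open"
    PSEUDONYMOUS = "pseudonymous"
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class SynonymityAssertion:
    """Claim that two identifiers refer to the same entity (illustrative shape)."""
    left_identifier: str
    right_identifier: str
    scope: str
    privacy_classification: PrivacyClassification


def visible_in(assertion: SynonymityAssertion, requesting_scope: str) -> bool:
    """Assumed rule: non-OPEN links are visible only inside their own scope."""
    if assertion.privacy_classification is PrivacyClassification.OPEN:
        return True
    return assertion.scope == requesting_scope


link = SynonymityAssertion(
    left_identifier="tenant-a:sub-91f3",
    right_identifier="crm:contact-5512",
    scope="tenant-a",
    privacy_classification=PrivacyClassification.PSEUDONYMOUS,
)
```

Here `visible_in(link, "tenant-a")` is true while `visible_in(link, "marketing")` is false, which keeps the privacy constraint attached to the relationship itself instead of to either identifier.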
## Terminology Conflicts
- **Pseudonym vs. Pseudonymization**: pseudonym is identifier; pseudonymization
is processing technique.
- **Anonymous vs. Pseudonymous**: often conflated in product marketing.
- **Identity vs. Personal data**: not all identifiers are personal data in all
contexts.
- **Deletion vs. Revocation**: erasure may require more than assertion revocation.
- **Subject**: a GDPR data subject is always a natural person; an OIDC/SAML
subject is the principal a token or assertion is about, which need not be one.
## Candidate Canonical Mappings
| GDPR concept | Candidate canonical concept |
| --- | --- |
| Data subject | Natural Person (privacy overlay) |
| Pseudonymization | Processing pattern on Identifier / Profile |
| Pseudonymous identifier | Scoped Identifier / Pseudonymous Identifier |
| Additional information | Separately secured Evidence Source / key |
| Purpose limitation | Scope + policy metadata on processing |
| Cross-system link | Synonymity Assertion (privacy classification required) |
| Erasure request | Lifecycle State + assertion revocation |
| Identifiability risk | Privacy classification on links |
| Controller | Organization actor (downstream legal role) |
| Anonymized dataset | Out of scope for personal identity linking |
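
The erasure row above (Lifecycle State plus assertion revocation) could be sketched as follows. The policy shown, revoking every active assertion that touches the erased account, is only one possible choice and an assumption of this example; `Assertion`, `LifecycleState`, and `apply_erasure` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class LifecycleState(Enum):
    ACTIVE = "active"
    REVOKED = "revoked"


@dataclass
class Assertion:
    accounts: FrozenSet[str]                       # accounts the assertion links
    state: LifecycleState = LifecycleState.ACTIVE


def apply_erasure(assertions: List[Assertion], erased_account: str) -> int:
    """Revoke each active assertion touching erased_account; return the count."""
    revoked = 0
    for assertion in assertions:
        touches = erased_account in assertion.accounts
        if touches and assertion.state is LifecycleState.ACTIVE:
            assertion.state = LifecycleState.REVOKED
            revoked += 1
    return revoked


assertions = [
    Assertion(frozenset({"acct-1", "acct-2"})),
    Assertion(frozenset({"acct-2", "acct-3"})),
    Assertion(frozenset({"acct-3", "acct-4"})),
]
revoked_count = apply_erasure(assertions, "acct-2")
```

Erasing `acct-2` revokes the two assertions that mention it and leaves the `acct-3`/`acct-4` link active. Whether the remaining accounts should stay transitively linked is a policy decision this sketch does not settle.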
## Open Questions
- Should canon include a standard `privacy_classification` enum for assertions?
- How should erasure of one account affect Synonymity Assertions touching other
accounts (S02)?
- Does pseudonymization key storage warrant a canonical secured Scope type?
- Should identifiability review be documented as an operator workflow only in
downstream recommendations, rather than in the canon itself?
## References
- GDPR Article 4(5), pseudonymization: https://gdpr-info.eu/art-4-gdpr/
- GDPR Recital 26, identifiability: https://gdpr-info.eu/recitals/no-26/
- EDPB Guidelines on identifiability (various): https://edpb.europa.eu/
- ISO/IEC 20889, privacy enhancing data de-identification terminology