# GDPR Pseudonymization and Privacy

## Source Type

Regulatory guidance. EU GDPR (Regulation 2016/679) Article 4(5) and Recital 26; EDPB guidance on identifiability, anonymization, and data subject rights.

## Domain

Privacy regulation, pseudonymization, identifiability, data minimization, and lawful basis for identity processing.

## Why This Source Matters

GDPR pseudonymization and identifiability concepts affect how canonical models should represent privacy-limited links, scoped identifiers, and correlation risk.

## Key Concepts

- **Personal data**: information relating to an identified or identifiable natural person.
- **Identifiable person**: a person who can be identified, directly or indirectly, by means reasonably likely to be used (Recital 26).
- **Pseudonymization (Art. 4(5))**: processing personal data so that it cannot be attributed to a specific data subject without additional information, which is kept separately and protected by technical and organizational measures.
- **Anonymization**: irreversible de-identification; the data is no longer personal.
- **Data subject**: an identified or identifiable natural person.
- **Controller / Processor**: roles responsible for processing personal data.
- **Purpose limitation**: data is used only for specified, explicit, and legitimate purposes.
- **Data minimization**: data is adequate, relevant, and limited to what is necessary.
- **Right of access / erasure**: data subject rights that affect linked records.
- **Additional information**: the key, held separately, that allows pseudonymous data to be re-identified.

## Relevant Terminology

| Term | Source meaning |
| --- | --- |
| Personal data | Data about an identified or identifiable natural person. |
| Pseudonymization | Reversible de-identification with separate key. |
| Anonymization | Irreversible; no longer personal data (if effective). |
| Data subject | Natural person the data relates to. |
| Identifiable | Reasonably linkable to a person. |
| Additional information | Re-identification key stored separately. |
| Controller | Determines purposes and means of processing. |
| Processing | Any operation on personal data. |
| Erasure | Delete personal data (right to be forgotten). |
| Profiling | Automated evaluation of personal aspects. |

## Modeling Assumptions

- **Pseudonymization is not anonymization**; data may remain personal.
- **Separate storage of additional information** is required for pseudonymization.
- **Scope and access control on keys** determine correlation risk.
- **Linking pseudonymous records across purposes** may increase identifiability.
- **Legal basis and purpose** govern whether linking is permissible.
- **Erasure requests** may require breaking links or deleting assertions.
- **Regulatory role (controller)** is organizational, not purely technical.

## Identity-Canon Implications

- **Pseudonymous Identifier** and **Scoped Identifier** map to pseudonymization techniques (pairwise sub, hashed email, internal IDs).
- **Privacy-limited Synonymity Assertion** must record privacy classification and scope (S14).
- **Additional information** (re-identification key) maps to a separately secured **Evidence Source** or **Credential** with strict Scope access.
- **Data subject** maps to **Natural Person** with a privacy rights overlay (downstream policy, not canon legal advice).
- **Erasure** maps to Lifecycle State transitions: revoke assertions, sever bindings, archive with legal exceptions noted downstream.
- Pairwise OIDC, tenant-local subjects, and restricted persona links are technical pseudonymization patterns aligned with GDPR concepts.
- Reinforces visibility of privacy constraints on relationships (**P8**, S14 checks).

## Terminology Conflicts

- **Pseudonym vs. Pseudonymization**: a pseudonym is an identifier; pseudonymization is a processing technique.
- **Anonymous vs. Pseudonymous**: often conflated in product marketing.
- **Identity vs. Personal data**: not all identifiers are personal data in all contexts.
- **Deletion vs. Revocation**: erasure may require more than assertion revocation.
- **Subject**: GDPR data subject vs. OIDC/SAML subject.
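
The pseudonym versus pseudonymization distinction, and the scoped identifier pattern noted under Identity-Canon Implications, can be illustrated with a minimal sketch. It assumes keyed hashing (HMAC-SHA256) as the pseudonymization technique, which is one common choice and is not prescribed by GDPR. The function name `scoped_pseudonym`, the scope labels, and `demo_key` are hypothetical names introduced for illustration; in a real deployment the key is the "additional information" and would be held in a separately secured store.

```python
import hashlib
import hmac


def scoped_pseudonym(secret_key: bytes, scope: str, identifier: str) -> str:
    """Derive a scope-specific pseudonym from an identifier (hypothetical helper).

    The function is the pseudonymization step; the returned hex string is the
    pseudonym. The output can still be personal data, because whoever holds
    the key can recompute the mapping.
    """
    # Normalize so trivially different spellings map to the same pseudonym.
    normalized = identifier.strip().lower()
    # Length-prefix the scope so ("ab", "c") and ("a", "bc") cannot collide.
    message = f"{len(scope)}:{scope}:{normalized}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Assumption for illustration only: a hard-coded key. A real key would be
# stored apart from the pseudonymized records, under strict access control.
demo_key = b"demo-key-not-for-production"

billing = scoped_pseudonym(demo_key, "billing", "Alice@example.com")
analytics = scoped_pseudonym(demo_key, "analytics", "alice@example.com")
billing_again = scoped_pseudonym(demo_key, "billing", "alice@example.com ")
```

Within one scope the pseudonym is stable (`billing == billing_again`), while the same person receives a different value in each scope (`billing != analytics`), so records cannot be joined across scopes by comparing pseudonyms alone. Anyone holding the key can still recompute and correlate them, which is why scope and access control on the key determine correlation risk.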

## Candidate Canonical Mappings

| GDPR concept | Candidate canonical concept |
| --- | --- |
| Data subject | Natural Person (privacy overlay) |
| Pseudonymization | Processing pattern on Identifier / Profile |
| Pseudonymous identifier | Scoped Identifier / Pseudonymous Identifier |
| Additional information | Separately secured Evidence Source / key |
| Purpose limitation | Scope + policy metadata on processing |
| Cross-system link | Synonymity Assertion (privacy classification required) |
| Erasure request | Lifecycle State + assertion revocation |
| Identifiability risk | Privacy classification on links |
| Controller | Organization actor (downstream legal role) |
| Anonymized dataset | Out of scope for personal identity linking |

## Open Questions

- Should canon include a standard `privacy_classification` enum for assertions?
- How should erasure of one account affect Synonymity Assertions touching other accounts (S02)?
- Does pseudonymization key storage warrant a canonical secured Scope type?
- Should identifiability review be documented as operator workflow in downstream recommendations only?

## References

- GDPR Article 4(5) pseudonymization: https://gdpr-info.eu/art-4-gdpr/
- GDPR Recital 26 on identifiability: https://gdpr-info.eu/recitals-novo/26/
- EDPB Guidelines on identifiability (various): https://edpb.europa.eu/
- ISO/IEC 20889 privacy enhancing data de-identification terminology