identity-canon/research/entity-resolution-privacy/gdpr-pseudonymization.md
tegwick 1c1b5c9bc6 Complete IDENTITY-WP-0003 corpus backfill and model refinement
Backfill all 23 research source notes with terminology extracts, modeling
assumptions, conflicts, canonical mappings, and references. Refresh terminology
artifacts, refine the conceptual model with explicit scenario paths, reconcile
canon surfaces and open questions, and mark the workplan finished.
2026-06-21 20:22:20 +02:00


GDPR Pseudonymization and Privacy

Source Type

Regulation and regulatory guidance. EU GDPR (Regulation (EU) 2016/679), Article 4(5) and Recital 26; EDPB guidance on identifiability, anonymization, and data subject rights.

Domain

Privacy regulation, pseudonymization, identifiability, data minimization, and lawful basis for identity processing.

Why This Source Matters

GDPR pseudonymization and identifiability concepts affect how canonical models should represent privacy-limited links, scoped identifiers, and correlation risk.

Key Concepts

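  • Personal data: any information relating to an identified or identifiable natural person (Art. 4(1)).
  • Identifiable person: one who can be identified, directly or indirectly, taking into account all means reasonably likely to be used (Recital 26).
  • Pseudonymization (Art. 4(5)): processing personal data so that it can no longer be attributed to a specific data subject without additional information, which is kept separately and protected by technical and organizational measures.
  • Anonymization: irreversible de-identification; effectively anonymized data is no longer personal data and falls outside the GDPR (Recital 26).
  • Data subject: the identified or identifiable natural person to whom personal data relates.
  • Controller / Processor: the controller determines the purposes and means of processing; the processor processes personal data on the controller's behalf (Art. 4(7), 4(8)).
  • Purpose limitation: data is collected for specified, explicit, and legitimate purposes and not further processed in a way incompatible with them (Art. 5(1)(b)).
  • Data minimization: data is adequate, relevant, and limited to what is necessary for the purpose (Art. 5(1)(c)).
  • Right of access / erasure: data subject rights (Art. 15, Art. 17) that affect linked records.
  • Additional information: the key, mapping table, or similar data needed to re-identify pseudonymized data, held separately.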
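
As an illustration of the Art. 4(5) idea that attribution requires separately held additional information, here is a minimal Python sketch. The `KeyVault` class, its method names, and the sample record are assumptions made for this note, not part of the GDPR text or of the canon. A keyed hash (HMAC-SHA-256) is used as one common pseudonymization technique among several.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class KeyVault:
    """Hypothetical holder of the 'additional information': a secret key and
    a reverse lookup table, assumed to be stored apart from the dataset."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)   # secret never stored with the data
        self._reverse: Dict[str, str] = {}    # pseudonym -> original identifier

    def pseudonymize(self, identifier: str) -> str:
        # Keyed hash: stable for the same input, not recomputable without the key.
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        self._reverse[digest] = identifier
        return digest

    def reidentify(self, pseudonym: str) -> Optional[str]:
        # Only a party with access to the vault can attribute the record.
        return self._reverse.get(pseudonym)


vault = KeyVault()
record = {"subject": vault.pseudonymize("alice@example.com"), "plan": "premium"}
```

The `record` on its own carries no direct identifier, yet it remains personal data because whoever holds `vault` can call `reidentify`. A different vault (a different key and table) cannot attribute the same record.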

Relevant Terminology

| Term | Source meaning |
| --- | --- |
| Personal data | Data about an identified or identifiable natural person. |
| Pseudonymization | De-identification reversible only with separately held additional information. |
| Anonymization | Irreversible; no longer personal data (if effective). |
| Data subject | Natural person the data relates to. |
| Identifiable | Linkable to a person by means reasonably likely to be used. |
| Additional information | Re-identification key or mapping, stored separately. |
| Controller | Determines purposes and means of processing. |
| Processing | Any operation on personal data. |
| Erasure | Deletion of personal data (right to be forgotten). |
| Profiling | Automated evaluation of personal aspects. |

Modeling Assumptions

  • Pseudonymization is not anonymization; data may remain personal.
  • Separate storage of additional information is required for pseudonymization.
  • Scope and access control on keys determine correlation risk.
  • Linking pseudonymous records across purposes may increase identifiability.
  • Lawful basis and purpose limitation govern whether linking is permissible.
  • Erasure requests may require breaking links or deleting assertions.
  • Regulatory role (controller) is organizational, not purely technical.
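
The assumption that scope and key access determine correlation risk can be sketched as follows. This is an illustrative example only: the function name, the two scope keys, and the `person-123` identifier are hypothetical, and real keys would be random and held under separate access control rather than written inline.

```python
import hashlib
import hmac


def scoped_pseudonym(scope_key: bytes, person_id: str) -> str:
    """Derive a pseudonym that is stable within one scope (one key)."""
    return hmac.new(scope_key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical per-purpose keys, inlined here only to keep the demo short.
billing_key = b"billing-scope-demo-key"
analytics_key = b"analytics-scope-demo-key"

billing_id = scoped_pseudonym(billing_key, "person-123")
analytics_id = scoped_pseudonym(analytics_key, "person-123")

# Same person, different scopes: the pseudonyms differ, so the two datasets
# cannot be joined on this field by someone who holds neither key.
can_join_without_keys = billing_id == analytics_id

# A party holding both keys can recompute both values and correlate them.
can_join_with_keys = (
    scoped_pseudonym(billing_key, "person-123") == billing_id
    and scoped_pseudonym(analytics_key, "person-123") == analytics_id
)
```

In this sketch the linkability of the two records depends entirely on who can read both keys, which is why key scope and access control belong in the model rather than being treated as an implementation detail.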

Identity-Canon Implications

  • Pseudonymous Identifier and Scoped Identifier map to pseudonymization techniques (pairwise OIDC sub values, hashed email addresses, internal IDs).
  • Privacy-limited Synonymity Assertion must record privacy classification and scope (S14).
  • Additional information (re-identification key) maps to separately secured Evidence Source or Credential with strict Scope access.
  • Data subject maps to Natural Person with privacy rights overlay (downstream policy, not canon legal advice).
  • Erasure maps to Lifecycle State transitions: revoke assertions, sever bindings, archive with legal exceptions noted downstream.
  • Pairwise OIDC, tenant-local subjects, and restricted persona links are technical pseudonymization patterns aligned with GDPR concepts.
  • Reinforces visibility of privacy constraints on relationships (P8, S14 checks).
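
One possible shape for an assertion that records privacy classification and scope, with erasure handled as a lifecycle transition, is sketched below. The enum values, field names, and the `erase_identifier` helper are all hypothetical; the canon does not define them, and the sketch does not settle how erasure should propagate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PrivacyClassification(Enum):  # hypothetical values, not defined by the canon
    PUBLIC = "public"
    PSEUDONYMOUS = "pseudonymous"
    RESTRICTED = "restricted"


class LifecycleState(Enum):  # hypothetical states
    ACTIVE = "active"
    REVOKED = "revoked"


@dataclass
class SynonymityAssertion:
    left: str
    right: str
    privacy_classification: PrivacyClassification
    scope: str
    state: LifecycleState = LifecycleState.ACTIVE

    def __post_init__(self) -> None:
        # Reject assertions that do not say where the link may be used.
        if not self.scope:
            raise ValueError("a privacy-limited assertion must record its scope")


def erase_identifier(assertions: List[SynonymityAssertion], identifier: str) -> int:
    """Revoke every active assertion touching the identifier; return the count."""
    revoked = 0
    for assertion in assertions:
        touches = identifier in (assertion.left, assertion.right)
        if touches and assertion.state is LifecycleState.ACTIVE:
            assertion.state = LifecycleState.REVOKED
            revoked += 1
    return revoked


assertions = [
    SynonymityAssertion("crm:42", "idp:abc", PrivacyClassification.PSEUDONYMOUS, "tenant-a"),
    SynonymityAssertion("idp:abc", "hr:7", PrivacyClassification.RESTRICTED, "tenant-a"),
    SynonymityAssertion("crm:99", "hr:8", PrivacyClassification.PUBLIC, "tenant-b"),
]
revoked_count = erase_identifier(assertions, "idp:abc")
```

Here erasing `idp:abc` revokes the two assertions that reference it and leaves the unrelated link active. Revocation alone only severs the links; deleting the underlying personal data, or archiving it under a legal exception, would be a separate downstream step.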

Terminology Conflicts

  • Pseudonym vs. Pseudonymization: pseudonym is identifier; pseudonymization is processing technique.
  • Anonymous vs. Pseudonymous: often conflated in product marketing.
  • Identity vs. Personal data: not all identifiers are personal data in all contexts.
  • Deletion vs. Revocation: erasure may require more than assertion revocation.
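  • Subject: the GDPR data subject is a natural person; an OIDC/SAML subject is the principal a token or assertion is about, which may not be a natural person.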

Candidate Canonical Mappings

| GDPR concept | Candidate canonical concept |
| --- | --- |
| Data subject | Natural Person (privacy overlay) |
| Pseudonymization | Processing pattern on Identifier / Profile |
| Pseudonymous identifier | Scoped Identifier / Pseudonymous Identifier |
| Additional information | Separately secured Evidence Source / key |
| Purpose limitation | Scope + policy metadata on processing |
| Cross-system link | Synonymity Assertion (privacy classification required) |
| Erasure request | Lifecycle State + assertion revocation |
| Identifiability risk | Privacy classification on links |
| Controller | Organization actor (downstream legal role) |
| Anonymized dataset | Out of scope for personal identity linking |

Open Questions

  • Should canon include a standard privacy_classification enum for assertions?
  • How should erasure of one account affect Synonymity Assertions touching other accounts (S02)?
  • Does pseudonymization key storage warrant a canonical secured Scope type?
  • Should identifiability review be documented as an operator workflow in downstream recommendations only?

References