Why de-identification isn't enough to protect health data
Caitlin Anthoney Jan 20, 2025 6:11:00 PM

HIPAA’s Privacy Rule defines de-identified health information as follows: “Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.”
However, according to a recent Cornell University study titled ‘De-identification is not always enough,’ traditional de-identification methods aren’t sufficient to safeguard individuals’ protected health information (PHI).
While removing names, addresses, and other identifiers can seem effective, research shows that de-identified clinical notes remain vulnerable to privacy attacks. As the study states, "de-identification of real clinical notes does not protect records against a membership inference attack."
What is a membership inference attack?
Michalsons describes a membership inference attack as “an AI attack during which an attacker tries to determine if you have used a particular person’s personal information to train a machine learning model. The attacker’s goal is to access the person’s personal information.”
Even after de-identification, subtle data patterns can be exploited, making it possible for attackers to re-identify individuals.
Ultimately, individuals’ sensitive health data, including their PHI, isn’t as private as many believe.
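To make the idea concrete, here is a minimal Python sketch of one common form of membership inference, a loss-threshold test. It is not the method used in the study. The notes, the toy word-count "model," the vocab_size value, and the threshold are all hypothetical and chosen only for illustration. The intuition is that a model tends to fit the records it was trained on more closely than records it has never seen, so an unusually low loss hints that a record was part of the training data.

```python
import math
from collections import Counter

def train_unigram(notes):
    """Toy 'model': word counts over the training notes (a stand-in for a real ML model)."""
    counts = Counter(word for note in notes for word in note.lower().split())
    return counts, sum(counts.values())

def avg_loss(model, note, vocab_size=1000):
    """Average negative log-likelihood per word, using add-one smoothing.
    vocab_size is an assumed constant for this sketch."""
    counts, total = model
    words = note.lower().split()
    log_probs = [math.log((counts[w] + 1) / (total + vocab_size)) for w in words]
    return -sum(log_probs) / len(words)

def infer_membership(model, note, threshold):
    """Attacker's guess: a loss below the threshold suggests the note was in the training set."""
    return avg_loss(model, note) < threshold

# Hypothetical de-identified notes used to train the toy model.
training_notes = [
    "patient reports chest pain after hiking near the lighthouse",
    "rare enzyme disorder diagnosed after third clinic visit",
]
# A hypothetical note the model never saw.
unseen_note = "mild seasonal allergy treated with antihistamine tablets"

model = train_unigram(training_notes)
THRESHOLD = 6.5  # assumed value; a real attacker would calibrate this on reference data

guess_for_member = infer_membership(model, training_notes[0], THRESHOLD)
guess_for_unseen = infer_membership(model, unseen_note, THRESHOLD)
```

Neither note contains a name or address, yet the attacker's guess differs for the two. The signal comes from how the model responds to the text, not from any explicit identifier.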
Why de-identification falls short
Complex data patterns
Healthcare providers’ clinical notes are usually filled with detailed, patient-specific information, so traditional de-identification methods might not remove all the identifying clues. As the study states, these methods cannot "adequately obscure these complex data patterns," leaving data exposed.
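As a rough illustration, the Python sketch below applies a few rule-based scrubbing patterns to a made-up clinical note. The note, the patterns, and the placeholder labels are assumptions for this example. They are not a complete de-identification method and not the approach evaluated in the study.

```python
import re

# Illustrative patterns only; real de-identification tools cover far more cases.
PATTERNS = {
    "[NAME]": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+"),
    "[DATE]": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "[PHONE]": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def scrub(note):
    """Replace each matched identifier with its placeholder label."""
    for placeholder, pattern in PATTERNS.items():
        note = pattern.sub(placeholder, note)
    return note

# Hypothetical note written for this example.
note = (
    "Mrs. Rivera, seen 03/14/2024, callback 555-867-5309. "
    "Retired lighthouse keeper, only resident of her island, "
    "with a rare enzyme disorder."
)

scrubbed = scrub(note)
print(scrubbed)
```

The name, date, and phone number are replaced, but the free-text description of the patient's occupation, living situation, and condition passes through untouched. Details like these are the kind of identifying clues that simple rules can miss.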
Smarter technology
As machine learning models advance, it’s easier than ever to detect hidden correlations in datasets. Malicious actors can use these tools to reverse-engineer de-identified data and recover identifiable information.
Data linkage risks
De-identified data can still be cross-referenced with external datasets. Attackers can combine de-identified health data with publicly available information to uncover patients’ identities.
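A linkage attack of this kind can be sketched in a few lines of Python. Everything here is hypothetical: the records, the names, and the choice of partial ZIP code, birth year, and sex as the shared fields are assumptions made for illustration.

```python
# Hypothetical de-identified health records: no names, but some demographic fields remain.
deidentified = [
    {"zip3": "968", "birth_year": 1954, "sex": "F", "diagnosis": "rare enzyme disorder"},
    {"zip3": "968", "birth_year": 1987, "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public dataset (for example, a directory or public roll) that includes names.
public = [
    {"name": "A. Example", "zip3": "968", "birth_year": 1954, "sex": "F"},
    {"name": "B. Sample", "zip3": "968", "birth_year": 1987, "sex": "M"},
    {"name": "C. Placeholder", "zip3": "968", "birth_year": 1987, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")

def link(health_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return {name: diagnosis} for health records that match exactly one public record."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = {}
    for row in health_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[candidates[0]] = row["diagnosis"]
    return matches

reidentified = link(deidentified, public)
print(reidentified)
```

In this toy data, the first health record matches exactly one named person, so its diagnosis is re-identified. The second record matches two people and stays ambiguous. The rarer the combination of remaining fields, the more likely a unique match becomes.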
What’s the answer?
To mitigate these issues, some researchers use synthetic data: artificially generated notes meant to mimic real clinical notes.
However, synthetic data comes with its own risks. The study warns that "when synthetically generated notes closely match the performance of real data, they also exhibit similar privacy concerns to the real data."
Additionally, researchers demonstrated that even models trained on synthetic data are susceptible to privacy breaches. They "proposed a way to mount a membership inference attack where the target model is trained with synthetic data."
So, what can be done?
Neither de-identification nor synthetic data is a perfect solution. As the study notes, "Whether other approaches to synthetically generated clinical notes could offer better trade-offs and become a better alternative to sensitive real notes warrants further investigation."
In practice, HIPAA-covered entities, like healthcare organizations, should not rely solely on de-identification to protect patients’ PHI.
Using a HIPAA compliant email solution, like Paubox, can help providers safeguard PHI. These solutions use advanced security measures, including encryption and access controls, to protect PHI in transit and at rest.
These solutions help prevent potential data interception, mitigating the risk of data breaches while upholding federal regulations.
Learn more: HIPAA Compliant Email: The Definitive Guide
FAQs
What is HIPAA?
The Health Insurance Portability and Accountability Act (HIPAA) sets national standards for protecting the privacy and security of certain health information, known as protected health information (PHI).
HIPAA is designed to protect the privacy and security of individuals’ health information and to ensure that healthcare providers and insurers can securely exchange electronic health information.
Who does HIPAA apply to?
HIPAA applies to covered entities, which include healthcare providers, health plans, and healthcare clearinghouses. It also applies to business associates, which are entities that perform certain functions or activities on behalf of a covered entity.
How does encryption help HIPAA compliance?
Encryption converts email content into a secure format only authorized recipients can access, preventing potential data breaches and upholding HIPAA compliance.