Can AI Improve De-identification of PHI?

By Zac Amos, Features Editor, ReHack
LinkedIn: Zachary Amos
LinkedIn: ReHack Magazine

Healthcare providers and information technology (IT) teams work tirelessly to safeguard systems against cyberattacks and preserve patients’ privacy. De-identification, one of their best defenses, has long been a staple in the industry, and the Health Insurance Portability and Accountability Act (HIPAA) sets the standard for it.

Under HIPAA’s Safe Harbor method, data counts as de-identified only if it contains none of the 18 specified identifiers that the rule treats as personally identifiable information (PII). While some firms view this as their goal, they should consider it only the baseline, because cyberattacks in the healthcare industry are rampant and cybercriminals are clever. Can artificial intelligence (AI) help them improve?

The Importance of De-Identification in Healthcare

Even with proper safeguards, most hospitals experience cyberattacks. One study revealed cybercriminals compromised 90.49% of health records from 2015 to 2019. The popularity of digitalization has given cybercriminals more ways to infiltrate networks and sensitive data storage systems.

Bad actors infiltrate networks, hack into internet-connected wearables and inject malware into systems to get medical records. Since most data breaches in this industry target people’s protected health information (PHI) or PII, de-identification is crucial. Hacks and cyberattacks are borderline inevitable, so records must be anonymized to protect patients.

Why Healthcare Organizations Are Turning to AI

More healthcare companies are looking to AI as a solution for administrative tasks because it processes information rapidly and can improve with each training session. Using it to de-identify records for HIPAA compliance could result in productivity gains and cost savings. Around 15% of organizations using it for compliance decreased their costs by up to 20%.

Purpose-built models generally outperform out-of-the-box alternatives. IT teams should consider using federated learning to develop a machine learning algorithm. Each party trains the model locally on its own data and sends only model updates to a central server, which combines them. This reduces the need for data sharing, protecting sensitive information and preserving privacy.
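To make the federated idea concrete, here is a minimal sketch of one common approach, federated averaging, written in plain Python. Everything in it is an assumption for illustration: the tiny linear model, the function names (local_update, federated_average, run_rounds) and the toy numbers stand in for a real model and real hospital data. The point it shows is that only model weights travel to the server, never the records themselves.

```python
from typing import List, Tuple

Weights = List[float]            # [slope, intercept] of a toy linear model
Record = Tuple[float, float]     # (input, target) pair held privately by one site


def local_update(weights: Weights, data: List[Record],
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run gradient descent on one site's private data and return new weights."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine site updates on the server, weighting each by its record count."""
    total = sum(sizes)
    return [
        sum(update[i] * size for update, size in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]


def run_rounds(site_data: List[List[Record]], rounds: int = 50) -> Weights:
    """Alternate local training and server-side averaging for several rounds."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each site starts from the shared weights and trains on its own data.
        updates = [local_update(list(global_weights), data) for data in site_data]
        # Only the resulting weights are sent back and averaged.
        global_weights = federated_average(updates, [len(d) for d in site_data])
    return global_weights


# Hypothetical example: two sites whose private data follow y = 2x + 1.
example_sites = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
learned = run_rounds(example_sites)
```

A production system would add protections this sketch leaves out, such as secure aggregation or differential privacy, since model updates can still leak information about the training data.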

AI’s Role in De-Identifying Patient Information

While healthcare facilities could simply use AI to automate the de-identification process, it has multiple unique features they should take advantage of.

Image Recognition Technology
AI can use its image and optical character recognition capabilities to de-identify unstructured and semi-structured data. Instead of being bound to text-based spreadsheets, it can de-identify handwritten clinical notes, medical imaging tests and, when paired with speech recognition, audio recordings of visits.

Automated Image Annotation
Metadata, which is information about the data itself, may inadvertently reveal PHI or help hackers re-identify patients. With automated image annotation, AI can automatically assign de-identified labels, captions or keywords to records based on predefined parameters. For example, it could generalize age groups or genders. On top of preserving privacy, it makes retrieval easier.
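The "predefined parameters" part of that process can be sketched in a few lines of Python. This is an illustration only: the field names (patient_name, mrn, device_serial, age, study_date) are hypothetical, and a real imaging pipeline would work against its own metadata schema. The age and date handling follows the Safe Harbor convention of grouping ages of 90 and over and keeping only the year of a date.

```python
# Hypothetical metadata keys that directly identify a patient or device.
DIRECT_IDENTIFIERS = {"patient_name", "mrn", "device_serial"}


def generalize_age(age: int) -> str:
    """Map an exact age to a ten-year band, grouping 90 and over together."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age >= 90:
        return "90+"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def deidentify_metadata(meta: dict) -> dict:
    """Drop direct identifiers and generalize quasi-identifiers in one record."""
    cleaned = {}
    for key, value in meta.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # remove the field entirely
        if key == "age":
            cleaned["age_group"] = generalize_age(value)
        elif key == "study_date":
            # Assumes an ISO-style "YYYY-MM-DD" string; keep the year only.
            cleaned["study_year"] = value[:4]
        else:
            cleaned[key] = value
    return cleaned


# Hypothetical record for illustration.
label = deidentify_metadata(
    {"patient_name": "Jane Doe", "age": 47, "study_date": "2023-03-14", "modality": "MRI"}
)
```

The generalized label still supports retrieval by modality, year or age band, which is the search benefit described above.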

Natural Language Processing
Algorithms use natural language processing to understand text and hold conversations. Healthcare companies can use it to de-identify electronic health records. Well-designed systems can also flag fields they are unsure about for human review, which helps catch the errors automated redaction inevitably makes.
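The redact-then-flag workflow can be sketched as follows. Note the swap: this example uses simple regular expressions rather than a trained language model, so it only shows the shape of the process. The patterns, the placeholder labels and the name heuristic are all assumptions for illustration and would miss many real-world formats.

```python
import re
from typing import Tuple

# Illustrative patterns only; real clinical text needs far broader coverage.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

# A title followed by a capitalized word suggests a personal name that the
# patterns above cannot reliably remove, so the note is flagged instead.
NAME_HINT = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+")


def redact(text: str) -> Tuple[str, bool]:
    """Replace matched identifiers with labels and flag likely leftover names."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    needs_review = bool(NAME_HINT.search(text))
    return text, needs_review


note = "Dr. Smith saw the patient on 3/14/2023. Call 555-123-4567 or email jane@example.org."
redacted, needs_review = redact(note)
```

Routing flagged notes to a person, rather than trusting the automated pass, is the design choice that matters here; a model-based system would follow the same pattern with confidence scores in place of the heuristic.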

How Providers Can Use AI to Improve De-Identification

Providers using AI for de-identification, or directly related processes, can see performance, productivity and security improvements.

Enhancing Encryption Methods
Machine learning models can recognize hidden patterns to enhance key randomness, helping IT teams develop more robust data masking methods. Healthcare providers could take this approach to strengthen format-preserving encryption, a cryptographic technique that retains the plaintext’s formatting after turning it into ciphertext.
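To show what "retaining the formatting" means in practice, the sketch below masks a value while keeping its length, separators and character classes. It is deliberately not format-preserving encryption: it is a one-way, keyed substitution built on HMAC-SHA256 from the Python standard library, it cannot be decrypted, and it says nothing about the machine learning angle. The function name and the sample key are hypothetical. Teams that need reversible protection would look at a standardized FPE mode instead.

```python
import hashlib
import hmac


def mask_preserving_format(value: str, key: bytes) -> str:
    """Replace digits with digits and letters with letters, keeping separators.

    One-way keyed masking for illustration; this is NOT reversible encryption.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    used = 0  # index of the next digest byte to consume
    for ch in value:
        byte = digest[used % len(digest)]
        if ch.isdigit():
            masked.append(str(byte % 10))
            used += 1
        elif ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            masked.append(chr(base + byte % 26))
            used += 1
        else:
            masked.append(ch)  # keep dashes, spaces and other punctuation
    return "".join(masked)


# Hypothetical key and identifier for illustration.
masked_id = mask_preserving_format("123-45-6789", b"demo-key-not-for-production")
```

Because the output has the same shape as the input, downstream systems that validate field formats keep working, which is the practical appeal of format-preserving approaches.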

Generating Synthetic PHI
Generative models can produce synthetic PII or PHI. Since simple pseudonymization schemes can be relatively easy to reverse, this strategy is appealing. The algorithm can eliminate connections to the original details while ensuring the records remain legible, lessening confusion and minimizing re-identification risk.

Ensuring HIPAA-Compliant APIs
An application programming interface (API) that interacts with electronic PHI must generally adhere to HIPAA. AI can help IT teams manage compliance by automating authentication measures and periodic audits. It can handle the technical aspect of security and even notify leaders when a breach or privacy issue occurs.

The Bottom Line of Using AI for De-Identification

AI can be a powerful business-to-business asset to help healthcare organizations comply with regulations and protect their reputations. IT teams have several implementation options, including generative, machine learning and deep learning models. Whatever their de-identification-related pain points are, this technology can help address them.