Are Health Care Institutions Prepared?
By Ittai Dayan, MD, Co-Founder and CEO, Rhino Health
Following the enactment of the Health Insurance Portability and Accountability Act of 1996 (HIPAA), patient privacy cemented itself among the foremost concerns of healthcare providers. This focus was only amplified after the European Union enacted its own, stricter set of rules under the General Data Protection Regulation (GDPR). These well-meaning regulations have unintentionally obstructed medical research.
Researchers have struggled to gain access to sufficiently diverse patient data amid a sea of varying, complex legislation, often forcing them to perform analyses on relatively homogeneous datasets. As a result, findings often fail to generalize to the wider population, because the data being analyzed underrepresents many demographic groups. The problem is compounded by current advances in artificial intelligence (AI) and intelligent software products, which risk propagating biases present in the underlying real-world datasets to new populations.
Innovation in analytical techniques over the last few years offers a way out of this dilemma: scientists need access to diverse patient data for research purposes, yet they must also protect that data by minimizing access to it.
With the advent of reliable containerization technology, as well as a new approach to machine learning called "federated learning," researchers can use data for computation and algorithm development without the underlying raw data points ever moving outside institutional firewalls; the data remains under institutional control at all times. Put simply, instead of moving the data itself, researchers move an AI model to the data. The model then trains or runs tests on the local data, and only the results are reported back to a centralized server.
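To make the mechanics concrete, here is a minimal sketch of one federated-learning round in plain Python. It is illustrative only: the function names are hypothetical, the model is a toy linear regressor, and any real deployment would add secure transport, containerized runtimes, and safeguards on the reported updates.

```python
# Minimal federated-averaging sketch. Everything here is illustrative;
# real systems secure the channel and protect the exchanged updates.
import numpy as np

def train_locally(weights, local_data, lr=0.01):
    """Runs inside one institution's firewall: a single gradient step
    on a toy linear model. The raw data never leaves this function."""
    X, y = local_data
    grad = X.T @ (X @ weights - y) / len(y)  # gradient of squared-error loss
    return weights - lr * grad               # only updated weights go back

def federated_round(global_weights, sites):
    """Runs on the central server: send the model out to each site,
    collect the updated weights (never the data), and average them."""
    updates = [train_locally(global_weights.copy(), d) for d in sites]
    return np.mean(updates, axis=0)

# Two 'institutions', each holding its own private dataset.
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(100, 3)), rng.normal(size=100)) for _ in range(2)]
weights = np.zeros(3)
for _ in range(50):
    weights = federated_round(weights, sites)
```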
Federated learning thereby enables researchers to train AI models on diverse datasets without requiring data owners to give up control by transferring copies of the data to third parties, mitigating the risk of violating patient privacy. It seems like a win-win, but are health IT teams ready to handle these novel technologies?
Supporting AI Adoption in Clinical Settings
The federated learning approach permits predictive AI models to learn from diverse medical data around the globe, boosting the models' performance for demographics and populations typically underrepresented in datasets. Federated learning can thereby help improve both the diversity and the quality of data available to medical researchers. Obviously, that has significant implications for the entire healthcare industry.
Consider how the adoption of AI models in clinical research settings has been stalled by a lack of available data, as well as by a lack of generalizability that diminished the models' performance when deployed at the bedside. Researchers were forced to confront widespread inaccuracies when models trained on limited datasets were applied to larger populations. AI models are simply not that accurate with only a few data points to learn from, and such inconsistencies did much to stigmatize the adoption of AI.
But if researchers could train and validate AI models on diverse data from around the globe (data representative of more populations), the models would perform better and thus do more good at the bedside. As a result, the use of AI models in the health care sector would expand, encouraging broader adoption in clinical settings. Such adoption would also facilitate cross-institutional collaboration, in which researchers deploy models to each other's patient data pools.
How Health IT Teams Can Prepare
Traditionally, to ensure patient privacy, medical research has relied on data de-identification, in which all of a patient's identifying features (as defined by relevant legislation, such as HIPAA) are removed from the data. Think name, address, and so on, but not necessarily age, ethnicity, or sex, because those attributes are frequently relevant to the analysis.
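As a toy illustration, consider stripping direct identifiers from a record while keeping analytically useful attributes. The field names below are hypothetical; a real pipeline must cover every identifier class named by the applicable law (HIPAA's Safe Harbor method, for example, enumerates eighteen categories) and handle free text as well.

```python
# Toy de-identification pass; field names are hypothetical.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "mrn"}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers, keep attributes that analyses
    often need, such as age, ethnicity, or sex."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {"name": "Jane Doe", "mrn": "12345", "age": 54,
           "sex": "F", "diagnosis": "E11.9"}
print(deidentify(patient))  # {'age': 54, 'sex': 'F', 'diagnosis': 'E11.9'}
```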
However, de-identification means that re-identification is always a possibility, and the more researchers who have access to the data, the more likely re-identification becomes. Malicious actors are more likely to compromise the data when multiple copies of it circulate, and by combining it with public data sources, such as publicly available government records, the identity of individuals can sometimes be deduced.
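Here is a toy sketch of such a linkage attack, with entirely fabricated values and hypothetical field names: a single unique match on quasi-identifiers such as age, ZIP code, and sex is enough to put a name back on a "de-identified" record.

```python
# Toy linkage (re-identification) attack; all values are fabricated.
deidentified = [{"age": 54, "zip": "02139", "sex": "F", "diagnosis": "E11.9"}]
public_records = [{"name": "Jane Doe", "age": 54, "zip": "02139", "sex": "F"}]

QUASI_IDENTIFIERS = ("age", "zip", "sex")
for row in deidentified:
    matches = [p for p in public_records
               if all(p[k] == row[k] for k in QUASI_IDENTIFIERS)]
    if len(matches) == 1:  # a unique match re-identifies the patient
        print(matches[0]["name"], "->", row["diagnosis"])
```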
Both failure modes carry serious implications for patient privacy. But with federated learning, raw data never leaves the institution, so re-identification is no longer a threat, and institutions can breathe a sigh of relief over data governance.
However, while federated learning allays many patient privacy concerns, institutions need to refocus their efforts on interoperability. In other words, datasets need to be standardized among collaborating institutions: data must be in a common format so that an AI algorithm trained on one dataset can read and analyze another.
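In practice, that standardization amounts to mapping each site's local field names, units, and codes onto one shared vocabulary. The sketch below is purely hypothetical; the schema and mappings are invented for illustration and are not drawn from any real standard.

```python
# Hypothetical harmonization of two sites' records into one shared schema.
COMMON_SCHEMA = {"age_years", "sex", "systolic_bp_mmhg"}

SITE_A_MAP = {"age": "age_years", "gender": "sex", "sbp": "systolic_bp_mmhg"}
SITE_B_MAP = {"patient_age": "age_years", "sex": "sex", "bp_sys": "systolic_bp_mmhg"}

def to_common(record: dict, mapping: dict) -> dict:
    """Rename site-specific fields to the shared vocabulary; a real
    pipeline would also convert units and clinical code systems."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = to_common({"age": 61, "gender": "M", "sbp": 138}, SITE_A_MAP)
b = to_common({"patient_age": 47, "sex": "F", "bp_sys": 121}, SITE_B_MAP)
assert set(a) == set(b) == COMMON_SCHEMA  # both sites now align
```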
Some data standardization protocols are in place, such as the United States Core Data for Interoperability (USCDI), but such standards are not legally mandated for all relevant institutions, and they do not apply beyond the national level, so they do not help international research collaborations. The adoption of common data models and interoperability standards also remains far from comprehensive.
In addition to understanding new paradigms for distributed analysis, such as federated learning, health IT teams must work to improve interoperability and focus on developing protocols that enable strong data governance. Doing so will ensure high data quality as such data becomes increasingly available for privacy-preserving collaborations, and it will ultimately lead to stronger research collaborations and better medical analyses powered by AI models.
The health care sector is finally getting on board with AI, and that’s good news for everyone. We just need to prepare accordingly to take full advantage of the shift.