By Josh Rubel, Chief Commercial Officer, MDClone
Twitter: @MDClone_
From electronic health records to perioperative information systems to supply chain and finance, health systems are swimming in data – but using that data to improve care quality and performance remains a significant challenge.
All hospitals employ a staff of smart, well-educated, highly trained clinicians who hold the potential to use these rich data sets to discover new ways of enhancing patient care, but they are limited in their abilities to get their hands on the data, perform analysis, and drive change.
In many cases, the roadblock that limits clinicians’ access to data is patient privacy – and rightfully so. Health data is highly personal and deserves a strong level of protection. However, that protection often comes at the cost of denying clinicians the ability to derive potential insights from the troves of valuable data that hospitals possess. For example, a clinician may want to determine whether there is a correlation between patients who are prescribed a certain post-operative pain medication and readmission, but may lack the authorizations necessary to dig into all of the hospital’s data sets that would enable her to answer that question.
To overcome this problem, many hospitals and health systems have turned to synthetic data. Synthetic data is information derived from original, real data: it tells the same statistical story as the original data, but it contains no information pertaining to real people. Synthetic data enables clinicians to understand what is happening inside a real patient population by working with a synthetic population of fictional patients, created from scratch, that match the characteristics of the real one.
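As a toy illustration of the idea (not MDClone's actual engine), one simple way to build fictional patients is to fit summary statistics to a real cohort and sample entirely new records from them. All field names and values below are invented for the sketch; production synthetic-data tools model joint distributions and correlations, not just these marginals.

```python
import random

# Invented example cohort; in practice this would be real patient data.
real_cohort = [
    {"age": 64, "sex": "F", "readmitted": True},
    {"age": 71, "sex": "M", "readmitted": False},
    {"age": 58, "sex": "F", "readmitted": False},
    {"age": 80, "sex": "M", "readmitted": True},
]

def synthesize(cohort, n, seed=0):
    """Generate n fictional patients matching the cohort's aggregate
    statistics. No synthetic record copies any real record."""
    rng = random.Random(seed)
    ages = [p["age"] for p in cohort]
    mean = sum(ages) / len(ages)
    std = (sum((a - mean) ** 2 for a in ages) / len(ages)) ** 0.5
    sexes = [p["sex"] for p in cohort]
    readmit_rate = sum(p["readmitted"] for p in cohort) / len(cohort)
    return [
        {
            "age": round(rng.gauss(mean, std)),  # age sampled from fitted distribution
            "sex": rng.choice(sexes),            # preserves sex frequencies
            "readmitted": rng.random() < readmit_rate,
        }
        for _ in range(n)
    ]

fake_patients = synthesize(real_cohort, 100)
```

The synthetic cohort can be any size, and because each record is sampled rather than copied, no row corresponds to an actual person.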
By coupling synthetic data with simple, easy-to-use extraction and modeling tools, health systems can grant clinicians free access to the breadth of their data, enabling a "self-service" model in which clinicians ask their own novel questions and derive insights that spur patient care quality improvement initiatives.
The traditional approach: Unscalable and demoralizing
Health systems’ traditional approach to deriving insights from their own data often involves large teams of data analysts. Clinicians think of their own questions, then send these queries to data analysts who must assemble an appropriate data set, which can be a lengthy process. In this approach, when a clinician poses an interesting question and thinks there is data inside the health system to answer it, she must effectively get in line with any other administrators or clinicians who are also seeking answers from the hospital’s data.
This approach is inherently unscalable and demoralizing to clinicians and researchers who must wait to obtain the answers to important questions that could help improve care quality and health system performance. In contrast, a self-service approach using synthetic data allows health systems to avoid the bottlenecks.
By offering self-service, health systems grant more clinicians and administrators direct access to a dialogue with data, driving more quality and performance improvement initiatives at a much faster rate. Without any of the privacy challenges normally associated with patient data, health systems can widen the scope of potential users, enabling employees to work directly with the data to find opportunities for clinical and performance improvement.
Synthetic data in the real world: The Veterans Health Administration
Importantly, synthetic data differs from de-identified data in that it is built from scratch, as opposed to being based on individual patient records, which means synthetic data cannot be re-identified. While synthetic data is relatively new to the healthcare industry, many health systems have already begun to embrace it as a means of analyzing large but sensitive samples of individual-level patient data.
To cite one example, the Veterans Health Administration (VHA) leveraged synthetic data to build a neural network model to predict which heart failure patients were at risk of additional adverse events after their first admission for heart failure. Heart failure can be difficult to manage because the disease is variable. When veterans are discharged from the hospital, their therapy must be regularly adjusted based on their dynamic physiological state. The heart failure management guidelines provided by the VHA indicate that hospitalizations due to heart failure could be largely prevented if ambulatory care were provided in a timely and effective manner, but, too often, siloed data delays the intervention.
The VHA performed an initial cohort analysis that included a wide variety of clinical variables, such as labs, conditions, and vital signs, to build a prediction model identifying the phenotypes of patients more likely to die after discharge. The predictive models for adverse events were trained on these synthetic datasets and then rerun on the de-identified underlying data.
The team was able to demonstrate that the model results obtained from synthetic data were statistically similar to those obtained from the de-identified data, enabling the VHA to begin developing a strong predictive model and gain early insights within the constraints of its data governance policies for external collaborations. Without synthetic data, the VHA would have faced significant data governance hurdles that would have delayed the innovation.
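The validation workflow described above can be sketched in a few lines. This is not the VHA's actual code, and the field names and the trivial threshold "model" are invented stand-ins: the point is the pattern of fitting on synthetic records, then re-evaluating the same fitted model on de-identified data to confirm the results agree.

```python
# Invented stand-in for a predictive model: pick the lab-value cutoff
# (here a fictional "bnp" field) that best separates adverse events.
def fit_threshold(records):
    best_t, best_acc = None, -1.0
    for t in sorted({r["bnp"] for r in records}):
        acc = sum((r["bnp"] >= t) == r["adverse_event"] for r in records) / len(records)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(records, threshold):
    """Share of records where the thresholded prediction matches the outcome."""
    return sum((r["bnp"] >= threshold) == r["adverse_event"] for r in records) / len(records)

synthetic = [   # fictional patients generated from the real cohort's statistics
    {"bnp": 120, "adverse_event": False}, {"bnp": 900, "adverse_event": True},
    {"bnp": 300, "adverse_event": False}, {"bnp": 1500, "adverse_event": True},
]
deidentified = [  # the real, de-identified cohort
    {"bnp": 150, "adverse_event": False}, {"bnp": 1100, "adverse_event": True},
    {"bnp": 250, "adverse_event": False}, {"bnp": 800, "adverse_event": True},
]

t = fit_threshold(synthetic)  # the model only ever sees synthetic data
print(accuracy(synthetic, t), accuracy(deidentified, t))  # compare performance
```

If the two accuracies are statistically similar, the team gains confidence that insights developed against the synthetic population will hold on the protected data, without anyone needing access to real records during model development.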
Health system IT leaders know they are in possession of mountains of valuable data, but sometimes struggle to extract full value from that data. By removing privacy concerns, self-service synthetic data enables health systems to empower their clinicians to obtain the answers to important patient-care questions, accelerating quality and performance improvement projects.