The healthcare industry stands to benefit from recent technological advances in processing Big Data. However, information sources exist in silos, limiting our ability to solve healthcare’s most complex problems. We expect that the breakdown of these data barriers, coupled with the impact of machine learning and AI, will form the basis of a ‘healthcare OS’, in which the unification of disparate data sources powers applications that reduce the gap between healthcare costs and patient outcomes. The scope of this article is limited to two key elements of a ‘healthcare OS’: breaking down data barriers and applying machine learning and AI. Other potential building blocks, such as security and data privacy, are not addressed.
Data-driven healthcare affects all major stakeholders in the healthcare ecosystem, including pharma companies, insurance companies, providers, and patients. For a patient, ‘healthcare’ is a part of everything, from the food you eat and the drugs you consume to the providers and insurance companies you interact with. Machine learning and big data applied across each of these verticals will enable a ‘full-stack’ data-driven healthcare system.
Enablers of data-driven healthcare
Before diving into the different branches of data-driven healthcare and what we can do differently to accelerate adoption, it’s helpful to understand the recent developments in technology, science, and policy that are motivating the creation of a data-driven healthcare system. Here are a few that we see:
Combination of big data and rich data. The widespread uptake of electronic health records (EHR) has generated vast data sets from the thousands of data points collected from every patient/provider interaction. Along with standard patient demographic data, the EHR has also consolidated critical pieces of information that can greatly enhance the predictive accuracy of models and algorithms, such as lab values, dosing history, and records of medication delivery.
Instead of perceiving this data as a byproduct of healthcare delivery, we need to leverage it to improve the quality and efficiency of patient care. In other words, let’s use the data to predict and improve, instead of just storing it for compliance purposes (which makes it no better than the physical files it replaced).
Combined with the ‘rich data’ surfacing from advances in diagnostic technologies, such as biomarker data from diagnostic labs, big data from EHRs will eventually enable us to improve the accuracy with which we diagnose and treat disease.
New assays enable multi-dimensional data. Dramatic drops in the cost of sequencing make gathering genetic data more affordable and experimentation more viable. In addition, new assays and analytics allow researchers to integrate further biological perspectives (RNA expression, epigenetics, proteomics) into predictive models that are more representative of our biology.
Machine Learning and cloud computing. By harnessing the virtually limitless amounts of computing and data storage in the Cloud, machine learning (along with other quantitative approaches) can generate new insights more readily as the volume and richness of data increase exponentially each year.
Big data and machine learning are inextricably linked. For example, throwing an enormous amount of data at a poor machine-learning model won’t improve the predictive accuracy of that model. Conversely, a strong machine-learning algorithm without enough data cannot be applied effectively, rendering it almost useless.
Passive Data Generation. Another force is the ubiquity of information-collecting devices such as mobile phones and smart watches. With 5 billion mobile phones now in use worldwide, nearly half of which are smartphones, a stream of real-time healthcare-related data (covering everything from physical activity and demographics to sleep patterns) can be collected. Mobile devices also provide the best platform for connecting with the individual healthcare consumer.
Macro-level force towards value-based care. New value-based payment models now tie incentives to cost, quality, and outcomes. Because these new payment methods have the potential to upend healthcare stakeholders’ traditional patient care and business models, a sharper focus is now being placed on delivering results and demonstrating value of care, rather than on financially incentivizing providers based on the volume of tests and procedures performed.
Data-Driven Healthcare Categories
As a result of the developments listed above, there are several areas where big data and machine learning are finding new solutions to old problems. As you read the list, we ask that you keep in mind that companies employing new ML technologies still need to overcome traditional hurdles. For example, in drug discovery, no matter how a new molecule is discovered (using ML or not), the asset will still have to contend with the same risk factors. Will a pharma company buy this asset? How is this specific asset differentiated from other attempts in this category? Will the drug work?
Applying this concept more broadly, it is important to understand how a technology impacts every player in the ecosystem. A company may have identified a pain point and developed a solution, but has it anticipated the reaction across the ecosystem? Are incentives aligned for adoption across payers, providers, and patients?
Drug Discovery. This branch combines machine learning with genomics, metabolomics, and proteomics to accelerate drug discovery by identifying, early in development, the molecules most likely to succeed. The layers of abstraction are manifold: models link the physicochemical properties of a drug (e.g. lipophilicity, number of hydrogen atoms) to gene expression, disease pathways, metabolites and protein signatures, cell morphology, and even downstream adverse drug-related events. The promise of this space is compelling for drug development. Machine learning techniques give us a way not just to predict optimal compounds, but also to gain a deeper biological understanding of the underlying disease and pharmacological pathways.
Diagnostics. This space combines machine learning with disease markers and/or imaging data to detect the onset of disease. Some companies and research groups focus on enhancing a healthcare practitioner’s ability to quickly identify regions of interest on a CT scan or MRI image. Others do the same for a radiologist reading a mammogram, potentially reducing the number of false positives. Several companies are also focusing on early cancer detection by combining liquid biopsy with machine learning and genomic data. Though there is a long way to go in this space, we believe early detection of cancer will be even more powerful as other sources of data from cellular readouts (e.g. metabolites, proteins, epigenetics) are integrated into the predictive algorithm.
Population Management. In this category, big data with machine learning is utilized in a hospital to identify at-risk patients. Based on a patient’s characteristics, once he/she leaves the hospital, how likely is it that this person will suffer from a subsequent stroke or heart attack? Will the patient require additional attention from post-acute care providers to prevent downstream complications?
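To make the question concrete, here is a minimal sketch of how such a risk score might be computed at discharge. It assumes a logistic model has already been trained elsewhere; the feature names, coefficients, intercept, and the 20% threshold are all hypothetical values chosen for illustration, not figures from any study or product.

```python
import math

# Hypothetical coefficients for illustration only; a real model would be
# fitted to historical patient data and clinically validated.
COEFFICIENTS = {"age_over_65": 0.8, "prior_stroke": 1.2, "diabetes": 0.5}
INTERCEPT = -3.0


def risk_probability(features):
    """Return the modeled probability of a subsequent event.

    `features` maps feature names to 0/1 flags; missing features count as 0.
    """
    z = INTERCEPT + sum(
        weight * float(features.get(name, 0))
        for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def flag_at_risk(patients, threshold=0.2):
    """Return the IDs of patients whose modeled risk meets the threshold."""
    return [
        patient_id
        for patient_id, features in patients.items()
        if risk_probability(features) >= threshold
    ]


if __name__ == "__main__":
    discharged = {
        "patient_a": {"age_over_65": 1, "prior_stroke": 1, "diabetes": 1},
        "patient_b": {"age_over_65": 0, "prior_stroke": 0, "diabetes": 0},
    }
    print(flag_at_risk(discharged))
```

In practice the flagged list would feed a care-coordination workflow, and the threshold would be set by weighing the cost of follow-up against the cost of a missed complication.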
Precision Medicine. In the space of precision medicine, machine learning is used to identify the optimal treatment and dosing regimens for a particular patient with a specific disease. The contextual application of pharmacogenomics (how genes affect drug response) alongside other relevant clinical and demographic factors holds much promise in elucidating which drug and dose will be most effective for a given patient. For example, InsightRX is focused on leveraging patient demographics, genetic data, and clinical lab data to individualize treatment at the point of care. Companies like Foundation Medicine sequence DNA from tumors and help determine which cancer therapy will be most effective.
Clinical trial matching. Clinical trial operators struggle with matching the right patient to the right trial. For example, an Alzheimer’s patient might find 100 potential trials on clinicaltrials.gov, each of which has an exhaustive list of eligibility criteria that must be read and assessed. Companies are directly addressing this problem with natural language processing and artificial intelligence algorithms to match patients with clinical trials more efficiently. Relevant clinical features from a patient’s records, such as symptoms, diagnoses, treatments, and diagnostic measurements, are combined into a multi-dimensional vector, which can then be matched against clinical trial eligibility criteria to find suitable patients very quickly.
Adherence and virtual assistants. Big data and machine learning are also used to remotely assess a patient’s symptoms and deliver alerts to clinicians only when patient care is needed. From the clinical side, this has the potential to reduce unnecessary hospital visits. It can also lessen the burden on medical professionals. From the patient side, the underlying hypothesis is that patients need guideline-driven answers, but also need someone to help them stay on track. Patient-centered virtual health assistants may be a good option to give patients around-the-clock access to current information tailored to their medical condition, especially if they do not live in the vicinity of a healthcare provider.
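The matching step described above can be sketched roughly as follows. This is a simplified illustration, not any company’s actual method: it assumes the natural language processing step has already turned free-text eligibility criteria into structured fields, and the `Patient` and `Trial` classes, their fields, and the trial IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Patient:
    """Structured features assumed to be extracted from a patient's records."""
    age: int
    diagnoses: Set[str]
    treatments: Set[str]


@dataclass
class Trial:
    """Eligibility criteria assumed to be already parsed from free text."""
    trial_id: str
    min_age: int
    max_age: int
    required_diagnoses: Set[str] = field(default_factory=set)
    excluded_treatments: Set[str] = field(default_factory=set)


def is_eligible(patient: Patient, trial: Trial) -> bool:
    """Check inclusion criteria (age, diagnoses) and exclusion criteria."""
    in_age_range = trial.min_age <= patient.age <= trial.max_age
    has_diagnoses = trial.required_diagnoses <= patient.diagnoses
    has_excluded = bool(trial.excluded_treatments & patient.treatments)
    return in_age_range and has_diagnoses and not has_excluded


def match_trials(patient: Patient, trials: List[Trial]) -> List[str]:
    """Return the IDs of every trial the patient appears eligible for."""
    return [t.trial_id for t in trials if is_eligible(patient, t)]


if __name__ == "__main__":
    patient = Patient(age=72, diagnoses={"alzheimers"}, treatments={"donepezil"})
    trials = [
        Trial("TRIAL-1", 60, 85, {"alzheimers"}),
        Trial("TRIAL-2", 60, 85, {"alzheimers"}, {"donepezil"}),
        Trial("TRIAL-3", 18, 50, {"alzheimers"}),
    ]
    print(match_trials(patient, trials))
```

Real eligibility criteria are far messier than three fields, which is exactly why the hard part of this problem is the language processing that produces the structured criteria, not the matching itself.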
Robot-assisted surgery. Cognitive robotics can integrate information from pre-operative medical records with real-time clinical and operational metrics to physically guide and enhance the physician’s instrument precision. The technology incorporates data from actual surgical experiences to inform new, improved techniques and insights. Such improvements may enhance overall patient outcomes.
We’re beginning to grapple with how data-driven healthcare may look in its next wave. Is it possible that the next generation of healthcare startups will be powered by the unification of data between seemingly disparate stakeholders? As mentioned earlier, the combination of EHR data with new assay technologies will fuel machine-learning applications. However, uniform access to such data remains a bottleneck that, if addressed, will power the next generation of data-driven healthcare applications. Without defining the structural components, we’ve been considering the potential applications that could be created on top of a healthcare OS.
Precision Dosing. The future of individualized pharmacotherapy will integrate elusive data types such as drug adherence, diet, and even metabolic reactions to optimize treatment. Precision dosing, especially in the outpatient setting, stands to benefit from the unification of disparate data from different sources, providing a complete picture of a patient’s pharmacological profile. The Otsuka/Proteus partnership to develop the Abilify MyCite System, which passively records the date and time of tablet ingestion, as well as certain physiological data such as activity level, marks a significant step towards the integration of seemingly disparate data sources and the future of pharmacotherapy.
Payers meet passive wearables. Would it be possible for payers to set their premiums according to the health data generated by your Apple Watch or Fitbit? Could a payer’s actuarial analysis use wearable health data as an input and automatically adjust premiums? UnitedHealthcare is exploring this idea.
Personalized Health. One of the more ambitious manifestations of a healthcare OS is being created by the Chinese company iCarbonX. The company is looking to gather data from traditional diagnostic sources (genomic sequencing, blood biomarkers, metabolites, heart data) as well as from microbiome tests via a ‘smart toilet’, among many other sources. While the final result remains to be seen, the effort to correlate this sort of information with non-traditional sources (like forum posts on health websites such as PatientsLikeMe) may demonstrate the value of intersectional healthcare data.
Advances in EHRs, diagnostic technologies, and cloud computing, together with the shift towards value-based care, lower the activation energy for Big Data and machine learning to fundamentally transform the healthcare system.
To build on the existing progress, there are a few things that we should consider doing differently. As a community, we need to temper our outrageous claims and hold outlandish promises at bay. Marketing is a powerful tool when used correctly, but we need to be careful not to put the cart before the horse by claiming benefits that don’t yet exist. The complexity lies in the time it takes to train models with the right data, to formulate the right clinical questions, and to understand disease in context from both the scientific and the medical side.
For example, when IBM Watson’s famous Jeopardy appearance propelled artificial intelligence out of the realm of science fiction, it quickly became the poster child for machine learning applications in healthcare. The community expected that Watson would be able to synthesize patient symptoms, gene sequence, pathology reports, physician notes, and even relevant journal articles to aid doctors in diagnosis and treatment, an audacious and clinically meaningful objective.
However, Watson hasn’t yet been able to fulfill that promise. The recent criticism it has faced reflects not a failing of the company but overly optimistic claims by the media about how far along Watson should be by now. Using machine learning to improve diagnostic and treatment accuracy is a monumental task that will take time and is far more complex than it is often made out to be.
We need to communicate in terms of added clinical and economic value, instead of through jargon such as “predictions” and “ROC curves”. We also need to communicate statistics appropriately and be mindful that we are not misusing them to support unsubstantiated claims.
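As one illustration of what this translation could look like, the sketch below converts a single operating point on a ROC curve (a sensitivity and specificity pair) into counts per 1,000 patients screened. The function name and the example inputs are hypothetical and chosen only to show the arithmetic.

```python
def per_thousand_summary(sensitivity, specificity, prevalence, n=1000):
    """Translate a model's operating point into expected patient counts.

    All three inputs are fractions between 0 and 1; `n` is the cohort size.
    """
    diseased = prevalence * n
    healthy = n - diseased
    true_positives = sensitivity * diseased
    false_positives = (1.0 - specificity) * healthy
    missed_cases = diseased - true_positives
    flagged = true_positives + false_positives
    # Positive predictive value: the share of flagged patients who truly
    # have the condition.
    ppv = true_positives / flagged if flagged else 0.0
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "missed_cases": missed_cases,
        "ppv": ppv,
    }


if __name__ == "__main__":
    summary = per_thousand_summary(
        sensitivity=0.9, specificity=0.9, prevalence=0.1
    )
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

With these illustrative inputs, a model that sounds “90% accurate” on both axes flags about 180 of every 1,000 patients, and only about half of them actually have the condition. A statement like that is something a clinician or administrator can weigh directly.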
We should also be consistent with the terminology that we use. We all have a habit of using terms like machine learning, big data, rich data, and artificial intelligence interchangeably, which can create downstream confusion and ultimately adds another point of friction to widespread adoption.
It’s critical to realize that machine learning and big data are ultimately tools that help provide guidance and suggestions. Stories about replacing clinicians have pervaded the field of healthcare, but healthcare practitioners are by no means going to be replaced. For the tools to take off, we need to be just as mindful of the key healthcare players who use them and of their role in the overall patient journey.
Humans and AI machines are symbiotic in nature. We possess intuition and empathy, two characteristics imperative for high-quality patient care. Machines can make sense of enormous amounts of data and can be (with the right algorithms and data) incredibly powerful at pattern recognition, allowing us to forecast a future health state for a particular patient. Combining intuition and empathy with the ability to predict with precision represents the future of healthcare.