Note: This article is the second of a two-part series. The quotes in this article have been edited and condensed for clarity.
In the panel “The Data Problem in Scaling Artificial Intelligence in Healthcare” at the recent MedCity CONVERGE conference, experts from big pharma and health tech discussed the opportunities and challenges posed by the data underpinning AI tools.
Panelists talked about some of the ways they have enlisted AI to support cancer treatment. They also called attention to challenges such as curating data sets with normalized, standardized data and the need for context and transparency to avoid so-called “black boxes.”
Participants included Chris Boone, head of real-world data and analytics at Pfizer; Gaurav Singal, chief data officer at Foundation Medicine; Janak Joshi, chief technology officer and head of strategy at Life Image; and Nate Nussbaum, senior medical director at Flatiron Health. Brenda Hodge, chief marketing officer for healthcare at Nuance, served as the moderator.
Nussbaum of Flatiron Health highlighted the shortcomings of some clinical data. He pointed out that some variables that are important for characterizing a population aren’t captured through routine care because clinicians aren’t making those assessments on a regular basis. That has led some companies to develop analytic approaches for handling the missing data, or modeling approaches for filling in those gaps.
He also pointed out the importance of understanding how outcomes in real-world data compare with outcomes in data generated from a clinical trial.
“Something like mortality is fairly easy to define once you have the necessary source data. But something like whether a tumor is responding to treatment, in the real world, is much more complicated than in a clinical trial where you’re following set rules like Response Evaluation Criteria In Solid Tumors (RECIST) — a set of published rules that define when tumors in cancer patients improve (respond), stay the same (stabilize), or worsen (progress) during treatment.”
Democratizing data: Balancing pragmatism and responsibility with ethical considerations
Boone of Pfizer said that although he’s a self-described “data hippie,” the democratization of data is fraught with ethical pitfalls because there are not yet best practices for how companies should interpret or analyze this data.
“On the one hand, we want to democratize the use of data, especially real-world data, because it can be used in so many contexts with so many use cases across the drug development life cycle … I hope that we can start to see much more data liquidity internationally and we can learn from all these patient experiences and be able to feed that back into not just clinical practice but clinical research. So how we create that model is still the million-dollar question, right? How do we create a model that’s sustainable, that becomes a win-win for all parties involved?”
Joshi believes the patient should be at the center of this discussion, particularly given that some states have passed or are reviewing data privacy legislation akin to the GDPR legislation passed in the European Union.
“The patient should always be in the loop. It should not matter which vendor or facility, or whether it’s an ambulatory, acute care, or in-patient setting. Anywhere they are seen, patients should be aware of where their data is and how it is used. Putting information in the hands of patients and engaging patients is becoming increasingly important,” said Joshi.
He continued, “There was a big push about five years ago in the pharmaceutical space. Everybody was talking about beyond the pill. However, the least discussed topic was patient privacy, patient rights, and putting the patient in the loop with the trial process.”
Nussbaum agreed with Joshi but also emphasized the importance of making clinical research more patient-centered.
“I think the other side is helping to prioritize research questions that matter to patients to democratize the insights that we’re getting from those data sets. We can do that through collaborations with research partners, with academic centers. There are lots of ways to do that, but it involves getting a lot of parties together, working with patient advocacy organizations to figure out the best ways to put the data sets to use.”
The challenges of the single gene, single drug model
Singal noted that when Foundation Medicine was founded six years ago, the idea was that once the company identified the gene, it would know the drug. But he observed that some conditions involve more than one mutated gene, and that makes modeling more complex.
“I think what we’ve realized is that may have been an overly simplistic view of what the answer would be. I think we’re now starting to appreciate there’s probably interactions between multiple components of somebody’s phenotype, whether that’s multiple genes, multiple pathways, other components of their clinical history that ultimately will impact our ability to accurately predict who will and won’t respond to a therapy, which is ultimately what we’re all trying to do.”
Identifying multiple genes or networks of genes requires more genomic profiles.
“If you start to get into the world of not just one gene but two genes or three genes or networks of genes, now the size of data you need, labeled data with clinical outcomes specifically, has to grow quite a bit. Despite sequencing more patients every week than every academic medical center in the country put together, 300,000 cases total, we have about 50,000 cases of profiled genomic results, in collaboration with Flatiron Health, that are harmonized and linked to the really high-quality clinical data that Nate just described. 50,000 on one hand is a lot and, on the other hand, is not nearly enough.”
Handling dirty data/unstructured data when scaling AI
Nussbaum of Flatiron noted that both structured and unstructured data need to go through a clean-up process, a step that many take for granted.
“There is an assumption that structured data is usable and ready to go. That’s an assumption that actually doesn’t hold in many cases because when you take structured data, let’s say lab data, for example, it comes through in all sorts of forms. The process of cleaning and merging into a single unified data model, having the same units, for example, for a lab result, is crucial. And it’s not glamorous work at all, but it’s fundamental to be able to put data sets to use.”
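To illustrate the kind of unit harmonization Nussbaum describes, here is a minimal Python sketch. It is not Flatiron's pipeline: the `LabResult` type, the `normalize` function, the choice of canonical units, and the two example analytes are all assumptions made for illustration. The only external fact it relies on is the molar mass of glucose (about 180.156 g/mol), which gives the mg/dL to mmol/L conversion.

```python
from dataclasses import dataclass

GLUCOSE_MOLAR_MASS = 180.156  # g/mol, used to convert mg/dL to mmol/L

# Hypothetical lookup: (analyte, source unit) -> multiplier into the canonical unit.
TO_CANONICAL = {
    ("glucose", "mmol/l"): 1.0,
    ("glucose", "mg/dl"): 10.0 / GLUCOSE_MOLAR_MASS,
    ("hemoglobin", "g/dl"): 1.0,
    ("hemoglobin", "g/l"): 0.1,
}

# Hypothetical choice of one canonical unit per analyte in the unified model.
CANONICAL_UNIT = {"glucose": "mmol/L", "hemoglobin": "g/dL"}


@dataclass(frozen=True)
class LabResult:
    analyte: str
    value: float
    unit: str


def normalize(result: LabResult) -> LabResult:
    """Return the result expressed in the canonical unit for its analyte."""
    key = (result.analyte.strip().lower(), result.unit.strip().lower())
    if key not in TO_CANONICAL:
        # Refuse to guess: an unknown unit is flagged rather than silently kept.
        raise ValueError(f"no conversion known for {result.analyte!r} in {result.unit!r}")
    analyte = key[0]
    return LabResult(analyte, result.value * TO_CANONICAL[key], CANONICAL_UNIT[analyte])


if __name__ == "__main__":
    raw = [
        LabResult("Glucose", 90.0, "mg/dL"),
        LabResult("glucose", 5.2, "mmol/L"),
        LabResult("Hemoglobin", 135.0, "g/L"),
    ]
    for r in raw:
        print(normalize(r))
```

Raising an error on an unrecognized unit, rather than passing the value through, reflects the point made in the quote: structured data cannot be assumed usable until every record has been mapped into the same model.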
For unstructured data, Nussbaum said it takes time to figure out how to take clinical data and draw insights from it. PD-L1 testing in different types of cancer, for example, is a rapidly evolving science, so a solution Flatiron developed a few years ago to collect this data from medical records is no longer suitable.
“It requires constant iteration and working closely to maintain an understanding of where the science is and how to pull that data out in a thoughtful way. Right now, we’re using humans because the data is so complicated that it requires human eyes to really understand all of the complexity and be able to get to a result that we can actually interpret. It’s also crucial to understand data quality. You have to measure the quality of data sets to understand how to use them appropriately.”