Our guest this episode was Farnoosh Abbas Aghababazadeh, who joined Reuben Hall to explore how machine learning and federated data approaches are shaping the future of cancer research.
“If you want to have reproducible research, you should start with the right data and standardized data” — Farnoosh Abbas Aghababazadeh on reproducibility in biomedical research.
Find Moving Digital Health on Apple Podcasts and Spotify, and subscribe to the MindSea newsletter to be notified about future episodes.
Read Transcript:
Reuben (00:00)
Welcome to the MindSea podcast series, Moving Digital Health. Our guest today is Farnoosh Abbas Aghababazadeh, postdoctoral research fellow at the Princess Margaret Cancer Center, University Health Network. Thanks for joining us today, Farnoosh!
Farnoosh Abbas (00:16)
Thanks, Reuben, for having me! It’s great to be here, and I’m looking forward to our conversation.
Reuben (00:23)
Excellent. Could you start by telling us a little bit about your background?
Farnoosh Abbas (00:28)
Sure, definitely. So I come from a strong statistics background. I did both my undergrad and master’s in statistics and mathematical statistics in my home country. Then in 2010, I moved to Canada to do my PhD program at the University of Ottawa, mainly focusing on biostatistics. And after completing my PhD, I moved to the United States for postdoctoral training at the Moffitt Cancer Center. And that was an amazing experience.
So that helped me a lot to dive deeper into statistical genomics and gave me hands-on experience building computational pipelines for cancer research, including omics data and more complex data. And then I eventually returned to Canada and joined Dr. Haibe-Kains’s lab at the Princess Margaret Cancer Center, and more or less kept working on the same things that I did in my previous training, more at the intersection of machine learning and computational pipelines, including multi-omics, with applications in pharmacogenomics and precision oncology.
Reuben (01:54)
Okay, and as a research fellow at Princess Margaret Cancer Center, what does a typical day look like for you?
Farnoosh Abbas (02:03)
That’s a really good question. Yeah, my days can vary depending on the project needs.
A day can start with planning: looking at my calendar, checking the to-do list, and reviewing any meetings, deadlines, or priorities for the day. It can also start with project work, depending on the priorities, to build or improve the pipelines or debug them. And sometimes, if a project is finalized, I can focus on writing.
But these days, I think I also spend most of my time in meetings, whether collaboration meetings or guiding the students that I’m supervising or co-supervising, to discuss their progress, the problems they have in their tasks, or the plan for the next steps. Yeah, but I really try to keep some sort of balance between the collaborations and the deep focus on my own learning and research. Sometimes, though, I think I should accept that it doesn’t go the way I expected.
Reuben (03:37)
Yes, for sure. And how many different projects might you be working on at the same time?
Farnoosh Abbas (03:45)
Many projects, but to be honest, there are two main projects that I’m working on. The lab that I’m involved in is mainly focused on improving the pipelines for biomarker discovery. One of the two main projects is related to immuno-oncology: building pipelines to see how we can improve the discovery of biomarkers associated with immunotherapy, collecting immunotherapy data, and seeing how we can have better pipelines to predict immunotherapy response. It is one of the main projects, and I think it takes the majority of my time.
I’m also leading this project at the BHK Lab. The other project is again associated with biomarker discovery, but for a rare cancer, sarcoma, specifically soft tissue sarcoma. That one is also really interesting, and it has its own challenges due to the limitations of the soft tissue sarcoma data, in terms of both the cell line data and the patient data. The other projects are more associated with collaborations, whether with clinicians inside UHN and the Princess Margaret Cancer Center or with pharma companies like Roche and co.
Reuben (05:26)
I see, well that is a lot. I know the quality of the data is obviously very important. Where are you able to source the data sets that you need to do some of that research?
Farnoosh Abbas (05:43)
That’s a really good question. In our lab, we focus mainly on data curation and on getting access to the data that are really important for our projects. Sometimes we wait a long time to get access to private data sets, but we try to be patient to have them in the lab. And sometimes we have publicly available data sets, whether clinical data or cell line and other preclinical data.
But any data that we provide, publicly available or private, we try to curate. We follow the standardizations that are defined in the lab for pharmacogenomics, because I mainly work in pharmacogenomics. Those standardizations can be, for example, aligning the codes for the clinical data sets so they follow the common platforms. For the genomics and transcriptome data sets, we also have a standardized pipeline, and we go through all these standardized steps for all the data to be curated.
Reuben (07:03)
So is there a lot of time spent cleaning up the data and standardizing it so you can do the research properly?
Farnoosh Abbas (07:12)
Exactly, it takes a lot of time. We have an amazing curation team in the lab, and I was also lucky to have students working with me to help curate the data for both the immunotherapy and the sarcoma projects. I think that sometimes people or researchers think it doesn’t matter, but if you want to have reproducible research, you should start from the right data and standardized data. Otherwise, it’s really difficult, in this area and in the time that we are living in, to have better collaborations.
Reuben (07:55)
And do you find there’s enough data out there to get really valuable insights? Do you find you’re always kind of looking for, I need more, I need more to get larger sample sizes?
Farnoosh Abbas (08:09)
That’s a really good question. It is exactly our concern for both of the projects that I’m working on, the rare cancer sarcoma and the immunotherapy one. We are always saying that we need more data.
So let me explain why we need more data, for example, for the immunotherapy project. Yes, a lot of research has already been done, but it’s still not enough. Some researchers focus mainly on one specific type of omics data, like only RNA, and others only on DNA. We still need biomarkers for immunotherapy, and the question is how we can improve those biomarkers by having more data. Sometimes we look at the publications and we see that, yes, there are maybe more than 10 or more than 50 published biomarkers in the literature, but they are not really powerful biomarkers, because maybe they are derived from data sets that don’t have enough samples or patients, or maybe not enough validation has been done on those biomarkers, or maybe they are pan-cancer biomarkers that are not validated for specific cancers. So there are a lot of open questions.
That’s the reason that even when we did the first paper for immunotherapy, we noticed afterwards that we are still at the beginning and we need more data. That can mean waiting to get access to more data, because most of the amazing data sets are private and the data providers keep the privacy of the patients. That’s totally fine, and we respect the privacy of the patients, but we also want to have better biomarkers. So we should try to get access or find better pipelines for biomarker discovery.
Reuben (10:08)
Okay. In addition to your work at the University Health Network, you also collaborate with the Terry Fox Research Institute. How did you get involved in that organization and how are you working with them?
Farnoosh Abbas (10:23)
Yes, that’s right. So in 2023, we were thinking that, in terms of having more data, why not start a collaboration with Terry Fox and MOHCCN, the Marathon of Hope team and network, to get access to more of the data that’s already provided in Canada across all the provinces?
So in 2023, we sent an application for our immuno-oncology project, and that application was awarded. That was the start of the collaboration with Terry Fox and MOHCCN, and we were granted access to more relevant data for the immuno-oncology project to see how we can improve the biomarker pipelines.
Reuben (11:25)
And so is some of that related to the recently published article in the journal Cancer Cell on the Terry Fox Research Institute Marathon of Hope Cancer Centers Network, like you say, MOHCCN, and the Pan-Canadian Precision Oncology Initiative?
Farnoosh Abbas (11:46)
Yes, some of it, because MOHCCN is the national network led by Terry Fox that supports projects that are mainly relevant to precision oncology. They also provide the MOHCCN gold cohort, which includes, I think, 15,000 patients.
That’s a lot, including clinical, genomic, and even transcriptomic data. This group can also make connections between scientists, patients, and clinicians across all the provinces. And that’s a good connection for sharing data between the groups and even for improving the pipelines by getting access to those groups without moving the data between the institutes or between the provinces. So they will also consider some sort of federated systems for the data and, for sure, for the pipelines as well.
Reuben (13:04)
Okay, so I know one of the major focuses of the Terry Fox Research Institute is their Digital Health and Discovery Platform, or DHDP, with federated data sharing. For people who aren’t familiar, how would you explain what that means and why it’s important?
Farnoosh Abbas (13:28)
Yes, federated data sharing means we can work with data from multiple institutes or multiple cancer centers without sharing the data. Instead of centralizing the data from different locations, we send the pipelines or tools to those locations, and they can train the models locally using their own data while keeping control over the privacy of their data.
In the traditional way, we usually centralize all the data, and in this way, yes, we could have amazing data in house, let’s say at UHN. But we also had limitations, because sometimes the data provider does not share the data due to privacy concerns or the risk of identifying the patients. So that’s the reason that, instead of waiting to get access, we can improve the pipelines as federated or decentralized ones. And for sure, in this way, we can cover more patients, increase the number of samples in the pipelines, and also include groups that we call underrepresented, for example, having more females in immunotherapy datasets, right?
Because usually when we look at the data, we see we have more males compared to females. Or we can have more different races in the data, or different cancer types. And even for the rare cancers, I’m sure that this sort of federated approach works very well.
Reuben (15:22)
And in terms of standardization, like you mentioned, if you have all these data sets from different sources, is it a challenge to get them to line up and standardize them?
Farnoosh Abbas (15:39)
Yes, that’s a really good question. That’s really challenging. In the use case that we are now working on with the DHDP and Terry Fox for immuno-oncology, we noticed that it is really challenging to standardize the data. And that was a question for myself and for the CanDIG team, one of the parties in the DHDP and Terry Fox.
So they try to harmonize and define standardizations to follow for all the data that they want to store as federated, because otherwise it’s impossible, right? For any script or computation, when we first work on it in the lab in the centralized way, we have the nice, standardized, curated data, right?
And then that standard should be something national, a general format that can be generalized across all the other institutes, so that all the other data providers also follow those steps.
Reuben (16:50)
So what are some of the most promising use cases you see for federated learning in oncology, particularly around immuno-oncology?
Farnoosh Abbas (17:00)
I’m sure that there are a lot of applications for federated learning in oncology, and in immuno-oncology too. One of them is predicting the treatment response, right?
One of the challenges that we have in immunotherapy right now is figuring out which patients will actually respond to a specific treatment, for example, immune checkpoint inhibitors. These therapies can be really life-changing, right? But they don’t work for all patients. I’m sure that federated learning can build predictive models by using data from multiple institutions, including more patients and more samples, without moving those data. And then for sure we can have better biomarkers and can better identify which patients are likely to benefit. The other thing is biomarker discovery.
And again, it’s the same story with federated pipelines. Instead of using only one local cohort to find the biomarkers, we can include different cohorts or data from different institutes and improve the biomarker discovery. And then for sure, we can have more powerful biomarkers, and all these projects can lead to better precision medicine and better design of clinical trials. And for sure we can improve the outcomes and the efficiency as well.
Reuben (19:06)
Okay, and how do you see federated approaches evolving in precision medicine over the next three to five years, let’s say?
Farnoosh Abbas (19:17)
It can become a core part of precision medicine, by respecting the privacy of the patients while also making it easier for all the groups and scientists, let’s say in Canada across different provinces, to collaborate, include more data, and improve the biomarkers and the pipelines without even moving the data. So for sure, it can be one of the core keys for precision medicine.
Reuben (20:03)
Okay, and I know you’ve touched on some of this already, but could you tell us any more about some of your work in predictive oncology?
Farnoosh Abbas (20:15)
So for predictive oncology, yes, immuno-oncology is one of the projects that I mainly focus on: improving the biomarker discovery and predicting the treatment response. We start with simple models, like linear or Cox models.
And then we also move to more complicated ones, the federated ones, and include machine learning approaches. And in this immuno-oncology project, we also try to dive more into the impact of the bias that we have in the data in terms of demographics.
As I mentioned, for example, we have more males than females in the immunotherapy datasets, or we have more melanoma cohorts but fewer sarcoma cohorts that are publicly available or provided. That’s the reason we try to improve the pipelines as federated ones, to see how we can involve more cancer types, and how, for the checkpoint inhibitors, we can involve more treatments, like those targeting CTLA-4, because most of the data are about PD-1 and PD-L1.
So we also try to see how, with this federated approach, we can address the sorts of bias that we may have in the data. That is one of the projects. The other one is more associated with sarcoma, that rare cancer, specifically soft tissue sarcoma, which has more than 10 subtypes. So it’s a little bit challenging to find biomarkers there as well. That’s why we decided to start with the preclinical or cell line data that we have. They are still limited, and we also had challenges annotating the cell lines for soft tissue sarcoma in the right way. Then, after finding the predictive biomarkers from the cell lines, we validate them on the clinical trial data.
Reuben (22:43)
Looking forward, what will be the focus of your research over the next year?
Farnoosh Abbas (22:51)
I’ll keep working on more or less the same path that I’m on right now, mainly focused on cancer research and precision oncology, but including more data modalities, because for now I mainly focus on DNA or RNA data, or clinical or even preclinical data.
But we will see more data sets, like single-cell data sets and spatial data sets. They are coming, and for sure each of these data sets matters individually. And if we combine them together, for sure we can improve the biomarkers. So, besides including more data, one of the things that I really want to try is improving the pipelines so they can handle multi-omics data as well. And for sure the federated or decentralized pipelines can be a priority as well, in terms of including the different data modalities.
And besides all these things, I’m working in a lab and with a team that mainly focus on open science. That really matters to me as a researcher and computational scientist: to keep it open science, in terms of the data and the pipelines that we build.
Reuben (24:28)
Okay, well, I really appreciate your insight today and all the work and research that you and your team are doing on cancer and improving our understanding and ways to treat this disease. Thank you so much for joining me on the podcast, Farnoosh.
Farnoosh Abbas (24:46)
Thank you so much Reuben for having me.