Press Release

Artificial Intelligence in Health: USC Researchers receive $10.5 million to develop cutting-edge machine learning approaches in cancer research

Bokie Muigai December 05, 2022

The National Cancer Institute has awarded $10.5 million to the Division of Biostatistics at the Department of Population and Public Health Sciences at the Keck School of Medicine of USC. The Research Program Project Grant (P01) will enable researchers to develop statistical methods aimed at uncovering new cancer risk factors by drawing on datasets from several cancer studies. The award is significant not only because it is highly competitive, but also because it directs funds toward developing complex methods that are applied to multiple sources of integrated data. The National Institutes of Health rarely awards a Program Project Grant solely for statistical methodology research.

The program is co-led by James Gauderman, PhD, Chair of the Biostatistics Division, and Kimberly Siegmund, PhD, a biostatistics expert in cancer modeling. The project aims to address some of the major problems that cancer research faces in handling vast amounts of data. The data are "big" both because epidemiological and clinical studies enroll large numbers of participants and because different investigations collect varying measures. The program will develop novel statistical methods to integrate large volumes of health, genomic, and exposure data, creating opportunities to discover genetic associations and genomic-by-exposure interactions related to cancer risk.

“We expect that the methods we develop and their corresponding applications to large ongoing studies will provide new insights into complex biological processes as well as discoveries of novel associations important for cancer risk, prognosis, and/or treatment,” explains Gauderman. The first 5-year cycle of this program focused on colorectal cancer, including a study with over 130,000 subjects. Prior studies have produced a rich spectrum of potential biomarkers, allowing for a better understanding of the disease and its prognosis. The renewal of the grant broadens the focus to also include breast, ovarian, and prostate cancer, with the potential to include other types of cancer. “What is exciting is that there are many new technologies that allow us to measure what is happening inside our cells, and we are collecting those measurements on cancer cells,” says Siegmund, an expert in statistical analysis and machine learning of epigenetic data in human disease. “So now, we are trying to create statistical methods to answer questions about how the cancer cells are behaving.”

This initiative is a collaborative effort that brings together numerous investigators with unique expertise, interests, and experience from large-scale cancer epidemiology studies. These include a recent nationwide study of prostate cancer in African American men led by Chris Haiman, ScD, Chair of the Division of Epidemiology and Genetics. The grant also draws on the Multiethnic Cohort, established in 1993, with over 215,000 participants in Los Angeles County and Hawaii. This diverse cohort includes African American, Asian American, Caucasian, Hispanic, and Native Hawaiian participants. “We saw a need to bring together a number of people to the same table to discuss this new exciting data being collected, and explore how we can integrate genetics, environmental factors and a variety of omics measures,” explains Gauderman. “There is real utility in thinking about how to integrate data from multiple sources to examine aspects that were not anticipated by any one study alone.”

The researchers intend for their methods to be widely used throughout the field of cancer epidemiology. To that end, another component of the program involves turning the statistical models into user-friendly software packages. It has been important to the research team to develop methods that will reach downstream users. “We are contributing to the software ecosystem by facilitating translation to other data sets, so that anyone can mimic what we are doing with their own data,” says Siegmund. In this way, their methods will be applicable to genetic epidemiology studies investigating not only cancer but many other traits.

Aims of the Program

1. Develop statistical analysis and machine learning methods that integrate big data to identify novel cancer risk factors.

2. Apply these statistical methods to various cancer studies to discover new biological insights. These insights will in turn lead to predictive models that inform targeted cancer interventions, such as screening programs and recommended lifestyle changes.

3. Share user-friendly software tools with the broader scientific community, ensuring that the methods developed are widely available and can be applied to a wide range of data.

The Research Program Project Grant (P01) supports broadly based, integrated, multi-project research programs that have a major scientific objective or common theme.