Development of a Universal Breast Cancer Risk Prediction Tool for the Diverse US Population Using Data from 2.4 Million Women Across 29 Cohorts

In the U.S., breast cancer is the second most common and the second deadliest cancer in women. Estimating individualized absolute risk for breast cancer can help identify women at high risk who may benefit from enhanced preventive screening. Most risk models are trained and validated on data from non-Hispanic White women, though some models are now available for other specific population groups, including Black, Hispanic, and Asian/Pacific Islander American women. However, race-specific models can sow mistrust by modifying risk across groups without a biological basis, and they leave behind people who do not fall into one of only a few racial/ethnic categories. In this project, we will develop and validate a universal breast cancer risk prediction tool for all women living in the U.S. We will develop the relative risk model with data from the NCI-funded Breast Cancer Risk Prediction Project, the largest and most diverse breast cancer consortium, with over 2.4 million women, including more than 140,000 breast cancer cases. We will use modern transfer learning and data integration methods to combine disparate data across studies and maximize information from underrepresented groups. The resulting universal relative risk model will incorporate established breast cancer risk factors, including medical and family history, anthropometric measurements, reproductive history, lifestyle factors, mammographic density, and hormonal biomarkers. We will then expand the model to include a polygenic risk score developed from an ongoing large, multi-ancestry genome-wide association study of breast cancer. For predicting absolute risk, we will use the iCARE framework to develop models with and without race/ethnicity-specific population incidence rates and explore the risks and benefits of each approach for downstream applications. Long-term goals include validation of the models in additional cohort studies of diverse backgrounds and implementation of the models in clinical applications.