Multi-Risk Cancer Prediction Using Deep Sequential Modeling of Large-Scale Longitudinal EHRs

Early detection of cancer is of substantial benefit, because many malignancies are diagnosed at a late stage; risk prediction in the general population can therefore inform surveillance programs that enable earlier diagnosis. Because different cancer types share genetic, environmental, and lifestyle risk factors, a systematic multi-cancer risk-prediction framework may offer greater predictive power than single-cancer approaches. In our most recent work, we combined medication and diagnosis data from a large-scale VA database for pancreatic cancer risk prediction, achieving performance substantially better than risk estimates based solely on age and sex. Building on this result, we now propose an end-to-end, multi-task deep-learning architecture that leverages the longitudinal nature of electronic health record (EHR) data and incorporates prior knowledge of tumor similarities to enhance predictive performance. The model jointly learns cancer-specific and shared risk factors, enabling simultaneous risk prediction across multiple cancer types. We will benchmark this multi-cancer model against single-cancer versions of our earlier architecture and conduct a retrospective, out-of-distribution analysis comparing its predictions with current clinical practice. Together with clinical experts, we will then devise a detailed plan for implementing a realistic cancer surveillance program.