RROADMAP: Creation of Metastatic Database From Radiology Reports Using Natural Language Processing
This project will generate a dynamic database of metastatic patterns for cancer patients reported in radiology studies, updated in real time. While cancer databases typically contain patient metastatic status (M0 or M1), granular structured data on metastasis locations are lacking, and obtaining them requires manual curation of clinical reports by trained individuals. Given the increasing importance of incorporating metastatic locations into prognostic models of cancers, a comprehensive database of metastatic patterns would benefit cancer researchers investigating the underlying mechanisms of metastatic organotropism. Recent innovations in language models, a cutting-edge methodology in Natural Language Processing (NLP), allow accurate extraction of the desired information from radiology reports. Our group will build upon an existing database of metastatic patterns based on CT reports to create a public database of metastatic tropism data as a resource for the cancer research community.
Final Report:
Our initial objectives were to develop natural language processing (NLP) models for annotation of PET/CT reports for metastatic disease, to document patterns of metastatic spread on imaging at large scale. Over the duration of this grant, our key accomplishments are:
- Development of BERT-based language models trained with VASTA and radiologist annotations
- Transition to Large Language Models (LLMs)
- Integration of LLaMA and BERT models
The complete narrative of these accomplishments, including tables, references and appendix, is available as a PDF download.



