A vision–language foundation model for precision oncology – Nature

Sammut, S.-J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).
Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).
Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).
Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J. & Shah, S. P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 22, 114–126 (2022).
Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Kim, C. et al. Transparent medical image AI via an image–text foundation model grounded in medical literature. Nat. Med. 30, 1154–1165 (2024).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat. Med. 30, 2924–2935 (2024).
Wang, X. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).
Christensen, M., Vukadinovic, M., Yuan, N. & Ouyang, D. Vision–language foundation model for echocardiogram interpretation. Nat. Med. 30, 1481–1488 (2024).
Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature 634, 466–473 (2024).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. Int. Conf. Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).
Schuhmann, C. et al. LAION-5B: an open large-scale dataset for training next generation image-text models. Adv. Neural Inf. Process. Syst. 35, 25278–25294 (2022).
Bhinder, B., Gilvary, C., Madhukar, N. S. & Elemento, O. Artificial intelligence in cancer research and precision medicine. Cancer Discovery 11, 900–915 (2021).
Wang, W. et al. Image as a foreign language: BEiT pretraining for vision and vision-language tasks. In Proc. IEEE/CVF Conf. Computer Vision Pattern Recognition (eds Brown, M. S., Li, F.-F., Mori, G. & Sato, Y.) 19175–19186 (IEEE, 2023).
Gamper, J. & Rajpoot, N. Multiple instance captioning: learning representations from histopathology textbooks and articles. In Proc. IEEE/CVF Conf. Computer Vision Pattern Recognition (eds Brown, M. S., Sukthankar, R., Tan, T. & Zelnik, L.) 16549–16559 (IEEE, 2021).
Sun, Y. et al. PathMMU: a massive multimodal expert-level benchmark for understanding and reasoning in pathology. In Eur. Conf. Computer Vision (eds Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T. & Varol, G.) 56–73 (Springer, 2025).
Kim, J.-H., Jun, J. & Zhang, B.-T. Bilinear attention networks. In Adv. Neural Inf. Process. Syst. (eds Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N. & Garnett, R.) 1571–1581 (PMLR, 2018).
Nguyen, B. D. et al. Overcoming data limitation in medical visual question answering. In Proc. Medical Image Computing Computer Assisted Intervention–MICCAI 2019: 22nd Int. Conf. (eds Shen, D. et al.) 522–530 (Springer, 2019).
Li, L. H., Yatskar, M., Yin, D., Hsieh, C.-J. & Chang, K.-W. VisualBERT: a simple and performant baseline for vision and language. Preprint at https://arxiv.org/abs/1908.03557 (2019).
Naseem, U., Khushi, M., Dunn, A. G. & Kim, J. K-PathVQA: knowledge-aware multimodal representation for pathology visual question answering. IEEE J. Biomed. Health Inf. 28, 1886–1895 (2024).
He, X., Zhang, Y., Mou, L., Xing, E. & Xie, P. PathVQA: 30000+ questions for medical visual question answering. Preprint at https://arxiv.org/abs/2003.10286 (2020).
Barbano, C. A. et al. UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading. In 2021 IEEE Int. Conf. Image Processing (ICIP) (eds alZahir, S., Labeau, F. & Mock, K.) 76–80 (IEEE, 2021).
Brancati, N. et al. BRACS: a dataset for breast carcinoma subtyping in H&E histology images. Database 2022, baac093 (2022).
Veeling, B. S., Linmans, J., Winkens, J., Cohen, T. & Welling, M. Rotation equivariant CNNs for digital pathology. In Proc. Medical Image Computing Computer Assisted Intervention, MICCAI 2018: 21st Int. Conf. (eds Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C. & Fichtinger, G.) 210–218 (Springer, 2018).
Kriegsmann, K. et al. Deep learning for the detection of anatomical tissue structures and neoplasms of the skin on scanned histopathological tissue sections. Front. Oncol. 12, 1022967 (2022).
Kumar, N. et al. A multi-organ nucleus segmentation challenge. IEEE Trans. Med. Imaging 39, 1380–1391 (2019).
Silva-Rodríguez, J., Colomer, A., Sales, M. A., Molina, R. & Naranjo, V. Going deeper through the Gleason scoring scale: an automatic end-to-end system for histology prostate grading and cribriform pattern detection. Comput. Methods Programs Biomed. 195, 105637 (2020).
Borkowski, A. A. et al. Lung and colon cancer histopathological image dataset (LC25000). Preprint at https://arxiv.org/abs/1912.12142 (2019).
Brummer, O., Pölönen, P., Mustjoki, S. & Brück, O. Integrative analysis of histological textures and lymphocyte infiltration in renal cell carcinoma using deep learning. Preprint at bioRxiv https://doi.org/10.1101/2022.08.15.503955 (2022).
Kather, J. N. et al. Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med. 16, e1002730 (2019).
Arunachalam, H. B. et al. Viable and necrotic tumor assessment from whole slide images of osteosarcoma using machine-learning and deep-learning models. PLoS One 14, e0210706 (2019).
Han, C. et al. Multi-layer pseudo-supervision for histopathology tissue semantic segmentation using patch-level classification labels. Med. Image Anal. 80, 102487 (2022).
Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).
Xu, F. et al. Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides. Front. Oncol. 11, 759007 (2021).
Roetzer-Pejrimovsky, T. et al. The digital brain tumour atlas, an open histopathology resource. Sci. Data 9, 55 (2022).
Atkins, M. B. et al. The state of melanoma: emergent challenges and opportunities. Clin. Cancer Res. 27, 2678–2697 (2021).
Thompson, A. K., Kelley, B. F., Prokop, L. J., Murad, M. H. & Baum, C. L. Risk factors for cutaneous squamous cell carcinoma recurrence, metastasis, and disease-specific death: a systematic review and meta-analysis. JAMA Dermatol. 152, 419–428 (2016).
VisioMel. VisioMel Challenge: Predicting Melanoma Relapse (2023) (accessed 1 April 2023); https://www.drivendata.org/competitions/148/visiomel-melanoma/page/674/.
Ikezogwo, W. et al. Quilt-1M: one million image-text pairs for histopathology. Adv. Neural Inf. Process. Syst. 36, 37995–38017 (2024).
Zhang, S. et al. Large-scale domain-specific pretraining for biomedical vision-language processing. Preprint at https://arxiv.org/abs/2303.00915 (2023).
Hellmann, M. D. et al. Nivolumab plus ipilimumab in advanced non-small-cell lung cancer. N. Engl. J. Med. 381, 2020–2031 (2019).
Gandhi, L. et al. Pembrolizumab plus chemotherapy in metastatic non-small-cell lung cancer. N. Engl. J. Med. 378, 2078–2092 (2018).
Samstein, R. M. et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat. Genet. 51, 202–206 (2019).
Cristescu, R. et al. Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based immunotherapy. Science 362, eaar3593 (2018).
Bagaev, A. et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell 39, 845–865 (2021).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Mok, T. S. et al. Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial. Lancet 393, 1819–1830 (2019).
Johnson, D. B., Nebhan, C. A., Moslehi, J. J. & Balko, J. M. Immune-checkpoint inhibitors: long-term implications of toxicity. Nat. Rev. Clin. Oncol. 19, 254–267 (2022).
Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
Bruni, D., Angell, H. K. & Galon, J. The immune contexture and immunoscore in cancer prognosis and therapeutic efficacy. Nat. Rev. Cancer 20, 662–680 (2020).
Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1-selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339 (2020).
Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q. V., Hinton, G. E. & Dean, J. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. In Int. Conf. Learning Representations (eds Bengio, Y. & LeCun, Y.) 1–19 (OpenReview.net, 2017).
Bao, H. et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts. Adv. Neural Inf. Process. Syst. 35, 32897–32912 (2022).
Esser, P. et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first Int. Conf. Machine Learning (eds Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J. & Berkenkamp, F.) 12606–12633 (PMLR, 2024).
Sun, Y. et al. PathAsst: a generative foundation AI assistant towards artificial general intelligence of pathology. In AAAI Conf. Artificial Intelligence (ed. Wooldridge, M.) 5034–5042 (AAAI, 2024).
Li, J., Li, D., Xiong, C. & Hoi, S. C. H. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In Int. Conf. Machine Learning (eds Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G. & Sabato, S.) 12888–12900 (PMLR, 2022).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In North American Chapter Assoc. Comp. Linguistics (eds Burstein, J., Doran, C., Pedersen, T. & Solorio, T.) 4171–4186 (ACL, 2019).
Ramesh, A. et al. Zero-shot text-to-image generation. In Int. Conf. Machine Learning (eds Meila, M. & Zhang, T.) 8821–8831 (PMLR, 2021).
Peng, Z., Dong, L., Bao, H., Ye, Q. & Wei, F. BEiT v2: masked image modeling with vector-quantized visual tokenizers. Preprint at https://arxiv.org/abs/2208.06366 (2022).
Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
Shen, Y., Luo, Y., Shen, D. & Ke, J. RandStainNA: learning stain-agnostic features from histology slides by bridging stain augmentation and normalization. In Int. Conf. Medical Image Computing and Computer-Assisted Intervention (eds Wang, L., Dou, Q., Fletcher, P. T., Speidel, S. & Li, S.) 212–221 (Springer, 2022).
Kang, M., Song, H., Park, S., Yoo, D. & Pereira, S. Benchmarking self-supervised learning on diverse pathology datasets. In Proc. IEEE/CVF Conf. Computer Vision Pattern Recognition (eds Chellappa, R., Matas, J., Quan, L. & Shah, M.) 3344–3354 (IEEE, 2023).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In Int. Conf. Learning Representations (ed. Sainath, T.) 1–18 (OpenReview.net, 2019).
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Int. Conf. Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).
Weinstein, J. N. et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Kefeli, J. & Tatonetti, N. TCGA-Reports: a machine-readable pathology report resource for benchmarking text-based AI models. Patterns 5, 100933 (2024).
Achiam, J. et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).
Callahan, A. et al. The Stanford Medicine data science ecosystem for clinical and translational research. JAMIA Open 6, ooad054 (2023).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).