A multimodal fusion model for bone tumor benign and malignant diagnosis: development and validation with clinical text and radiographs

Written on 12/03/2026
by Ju Zeng

Transl Cancer Res. 2026 Feb 28;15(2):91. doi: 10.21037/tcr-2025-1832. Epub 2026 Feb 2.

ABSTRACT

BACKGROUND: Bone tumors have diverse clinical and imaging features, making preoperative differentiation of benign and intermediate/malignant types challenging. Unimodal methods (medical records or X-rays alone) are prone to misdiagnosis or missed diagnosis because each modality carries incomplete information. While postoperative histopathology is the gold standard, there is an urgent clinical demand for a precise preoperative diagnostic tool. This study aims to develop and validate a multimodal model that integrates deep learning with Dempster-Shafer (DS) evidence theory for the differential diagnosis of benign and intermediate/malignant bone tumors. Using postoperative histopathology as the reference standard, the model performs diagnosis by fusing preoperative clinical text and radiographs.

METHODS: This single-center retrospective study included 319 patients with pathologically confirmed bone tumors who were admitted between 2020 and 2025 and met the selection criteria. Using the patients' X-ray images and medical record text, we constructed a fusion model based on deep learning and DS evidence theory to classify tumors into benign and intermediate/malignant categories. Model performance was evaluated with the receiver operating characteristic (ROC) curve along with its 95% confidence interval (CI).
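The abstract does not specify the authors' fusion implementation, but Dempster's rule of combination, the core of DS evidence theory, can be sketched for the two-class case considered here (benign vs. intermediate/malignant). In the toy example below, the mass values assigned by the text and image branches are hypothetical, chosen only to illustrate how agreement between modalities reinforces the fused belief:

```python
def dempster_combine(m1, m2):
    """Fuse two mass functions over the same frame of discernment
    using Dempster's rule: intersect focal elements, multiply masses,
    and renormalize by the total non-conflicting mass (1 - K)."""
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c  # set intersection of focal elements
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc  # mass falling on the empty set (K)
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

BENIGN = frozenset({"benign"})
MALIG = frozenset({"intermediate/malignant"})
text_model = {BENIGN: 0.7, MALIG: 0.3}   # hypothetical text-branch output
image_model = {BENIGN: 0.6, MALIG: 0.4}  # hypothetical image-branch output
fused = dempster_combine(text_model, image_model)
# fused[BENIGN] ≈ 0.778, fused[MALIG] ≈ 0.222
```

Because both branches favor the benign class, the combined mass on "benign" (≈0.778) exceeds either branch's individual belief; conflicting evidence would instead be discounted through the normalization term.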

RESULTS: The dataset comprised text data and radiographs from 319 patients and was split by time into a training set, an internal validation set, and an external validation set. On the internal validation set, the fusion model achieved an area under the curve (AUC) of 0.821 (95% CI: 0.713-0.916), with an accuracy of 81.6%, precision of 81.3%, recall of 76.5%, and an F1 score of 78.8%, outperforming both the unimodal text model (AUC 0.814, accuracy 77.6%) and the image model (AUC 0.782, accuracy 72.4%). On the external validation set, the fusion model maintained robust performance: an AUC of 0.808 (95% CI: 0.667-0.928), accuracy of 77.3%, and F1 score of 70.6%. Most baseline models underperformed the proposed fusion approach across all metrics, with accuracies ranging from 59.1% to 77.3% and F1 scores from 47.1% to 70.6%. Furthermore, the model's diagnostic performance rivaled that of senior radiologists and significantly exceeded that of junior radiologists: McNemar's test confirmed no significant difference between the model and senior radiologists, while a statistically significant performance gap existed between junior and senior radiologists.
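The abstract does not report the discordant-pair counts underlying its McNemar comparisons, but the test itself is straightforward to reproduce. The sketch below implements the exact (binomial) two-sided version; the counts b and c, the numbers of cases on which exactly one of the two raters is correct, are hypothetical and for illustration only:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test on discordant pairs.
    b, c: counts where rater A is right and B wrong, and vice versa.
    Under H0 the discordant pairs split 50/50, so the p-value is a
    two-sided binomial tail probability with p = 0.5."""
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical disagreement counts, for illustration only:
p_model_vs_senior = mcnemar_exact(5, 7)    # ≈ 0.774 -> no significant difference
p_junior_vs_senior = mcnemar_exact(3, 12)  # ≈ 0.035 -> significant gap
```

Note that McNemar's test conditions only on the discordant pairs, which is why it suits paired comparisons of raters on the same cases; for small counts the exact binomial form above is preferable to the chi-square approximation.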

CONCLUSIONS: We developed and validated a fusion model that integrates deep learning and DS evidence theory. In distinguishing benign from intermediate/malignant bone tumors, the fusion model demonstrated encouraging performance relative to unimodal models and other baseline fusion models.

PMID:41815162 | PMC:PMC12971601 | DOI:10.21037/tcr-2025-1832