A stacking ensemble framework integrating radiomics and deep learning for prognostic prediction in head and neck cancer

Bibliographic Details
Main Authors: Wang, Bingzhen, Liu, Jinghua, Zhang, Xiaolei, Lin, Jianpeng, Li, Shuyan, Wang, Zhongxiao, Cao, Zhendong, Wen, Dong, Liu, Tiange, Harun Ramli, Hafiz Rashidi, Harith, Hazreen Haizi, Wan Hasan, Wan Zuha, Dong, Xianling
Format: Article
Language: English
Published: BioMed Central 2025
Online Access: http://psasir.upm.edu.my/id/eprint/120354/
http://psasir.upm.edu.my/id/eprint/120354/1/120354.pdf
Description
Summary: Background: Radiomics models frequently face challenges related to reproducibility and robustness. To address these issues, we propose a multimodal, multi-model fusion framework that uses stacking ensemble learning for prognostic prediction in head and neck cancer (HNC). This approach seeks to improve the accuracy and reliability of survival predictions.
Methods: A total of 806 cases from nine centers were collected. The 143 cases from two centers were assigned to the external validation cohort, and the remaining 663 were split by stratified random sampling into training (n = 530) and internal validation (n = 133) sets. Radiomics features were extracted according to the Image Biomarker Standardisation Initiative (IBSI) standards, and deep learning features were obtained using a 3D DenseNet-121 model. After feature selection, the selected features were input into Cox, SVM, RSF, DeepCox, and DeepSurv models, and a stacking fusion strategy was used to build the prognostic model. Model performance was evaluated using Kaplan–Meier survival curves and time-dependent ROC curves.
Results: On the external validation set, the models using combined PET and CT radiomics features outperformed the single-modality models, with the RSF model obtaining the highest concordance index (C-index), 0.7302. With deep features extracted by 3D DenseNet-121, the PET + CT models showed significantly improved prognostic accuracy, with DeepSurv and DeepCox achieving C-indices of 0.9217 and 0.9208, respectively. Among the stacking models, the PET + CT model using only radiomics features reached a C-index of 0.7324, while the deep-feature-based stacking model reached 0.9319. The best performance came from the multi-feature fusion model, which integrated both radiomics and deep learning features from PET and CT and yielded a C-index of 0.9345. Kaplan–Meier survival analysis further confirmed the fusion model's ability to distinguish between high-risk and low-risk groups.
Conclusion: The stacking-based ensemble model demonstrates superior performance compared to individual machine learning models, markedly improving the robustness of prognostic predictions.
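The cohort arithmetic in the summary (806 - 143 = 663 = 530 + 133) can be reproduced with a stratified random split. The summary does not say which variable the split was stratified on; the sketch below assumes the event indicator, and `stratified_split` is a hypothetical helper written for illustration, not the authors' code. The 200/463 event breakdown in the example is made up.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], test_size: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split indices into (train, test) so that each
    stratum (assumed here: event vs. censored) keeps roughly its overall share."""
    if not 0 <= test_size <= len(labels):
        raise ValueError("test_size must be between 0 and len(labels)")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, label in enumerate(labels):
        strata[label].append(i)

    # Proportional allocation of test slots, rounded by largest remainder
    # so the per-stratum counts add up to exactly test_size.
    n = len(labels)
    quotas = {k: test_size * len(v) / n for k, v in strata.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = test_size - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1

    train, test = [], []
    for k, idx in strata.items():
        idx = list(idx)
        rng.shuffle(idx)  # random draw within each stratum
        test.extend(idx[: alloc[k]])
        train.extend(idx[alloc[k]:])
    return sorted(train), sorted(test)


# 663 non-external cases; the 200/463 event split is invented for illustration.
labels = [1] * 200 + [0] * 463
train_idx, test_idx = stratified_split(labels, test_size=133, seed=42)
print(len(train_idx), len(test_idx))  # 530 133
```

Largest-remainder rounding is one way to make the per-stratum counts sum exactly to 133; library splitters may round differently.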
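The C-index values reported above are pairwise ranking scores. The summary does not state which estimator was used; the sketch below assumes Harrell's C, in which a pair of patients is comparable when the one with the shorter follow-up had an observed event, and the score is the fraction of comparable pairs the model ranks in the right order. `harrell_c_index` is a hypothetical name, and the toy data are not from the study.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's C for right-censored data (higher risk = earlier event).

    Pair (i, j) is comparable when i had an observed event and times[i] < times[j].
    It is concordant when risks[i] > risks[j]; tied risks count as half.
    Pairs with tied times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from the study): follow-up times, event flags, predicted risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.6, 0.4, 0.1]))  # 1.0
print(harrell_c_index(times, events, [0.9, 0.3, 0.4, 0.1]))  # 0.8
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the range in which the reported values of 0.7302 to 0.9345 sit.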
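The summary does not describe the stacking meta-learner. As one possible sketch, assume out-of-fold risk scores from each base model (for example Cox, RSF, DeepSurv) are already available for the training set. A very simple meta-learner can then be fit as a non-negative weighted sum of standardised risks, with weights chosen by grid search to maximise the C-index. This combiner is a deliberately simple stand-in, not the authors' method, and `fit_stacking`, `fuse`, and `c_index` are hypothetical helpers.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Stats = Dict[str, Tuple[float, float]]


def c_index(times, events, risks) -> float:
    """Harrell's C-index; tied risks count as half a concordant pair."""
    num, den = 0.0, 0
    for ti, ei, ri in zip(times, events, risks):
        if not ei:
            continue
        for tj, rj in zip(times, risks):
            if ti < tj:
                den += 1
                num += 1.0 if ri > rj else (0.5 if ri == rj else 0.0)
    return num / den


def _mean_sd(xs: Sequence[float]) -> Tuple[float, float]:
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return mean, sd or 1.0  # guard against a constant risk score


def fuse(weights: Dict[str, float], stats: Stats, risks: Dict[str, Sequence[float]]) -> List[float]:
    """Meta-learner output: weighted sum of base-model risks, each standardised
    with the mean/SD estimated on the training (out-of-fold) predictions."""
    n = len(next(iter(risks.values())))
    return [
        sum(w * (risks[k][i] - stats[k][0]) / stats[k][1] for k, w in weights.items())
        for i in range(n)
    ]


def fit_stacking(oof_risks, times, events, step: float = 0.1):
    """Grid-search non-negative weights summing to 1 that maximise the C-index
    of the fused out-of-fold risk. Returns (weights, stats, best C-index)."""
    names = sorted(oof_risks)
    stats = {k: _mean_sd(oof_risks[k]) for k in names}
    n_steps = round(1 / step)
    best_w, best_c = {}, -1.0
    for combo in itertools.product(range(n_steps + 1), repeat=len(names)):
        if sum(combo) != n_steps:
            continue  # keep only weight vectors on the simplex
        w = {k: units / n_steps for k, units in zip(names, combo)}
        score = c_index(times, events, fuse(w, stats, oof_risks))
        if score > best_c:
            best_w, best_c = w, score
    return best_w, stats, best_c


# Toy out-of-fold risks: "a" ranks patients well, "b" ranks them backwards.
oof = {"a": [5, 4, 3, 2, 1], "b": [1, 2, 3, 4, 5]}
weights, stats, best = fit_stacking(oof, times=[1, 2, 3, 4, 5], events=[1, 1, 1, 1, 1])
print(weights, best)
```

The exhaustive grid and the O(n²) C-index make this slow for five base models and hundreds of patients. A coarser step, a faster C-index, or a different meta-learner (for example a Cox model fit on the base-model risks) would be more practical there.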
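The Kaplan–Meier risk-group analysis can be sketched as follows. The summary does not give the cut-off used to define high- and low-risk groups, so this assumes a split at the median predicted risk and computes a product-limit estimate for each group. The helper names are hypothetical and the data are toy values, not study results.

```python
from typing import Dict, List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (event time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censorings both leave the risk set
    return curve


def km_by_risk_group(times, events, risks) -> Dict[str, List[Tuple[float, float]]]:
    """Split at the median predicted risk (assumed cut-off) and fit KM per group."""
    s = sorted(risks)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    curves = {}
    for group in ("high", "low"):
        idx = [i for i, r in enumerate(risks) if (r > median) == (group == "high")]
        curves[group] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return curves


# Toy data (not from the study).
curves = km_by_risk_group(
    times=[1, 2, 3, 4, 5, 6],
    events=[1, 1, 0, 1, 0, 1],
    risks=[0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
)
print(curves["high"])
print(curves["low"])
```

Clear separation between the two curves is what the fusion model is reported to achieve. A log-rank test is the usual way to quantify it but is not shown here.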