Ensemble Feature Fusion of VGG16, ResNet50, and Vision Transformer for Pneumonia Detection in Chest X-ray Images

Abstract

This study proposes a heterogeneous ensemble deep learning architecture for pneumonia classification from chest X-ray images that integrates two pretrained convolutional neural networks (CNNs), VGG16 and ResNet50, with a fine-tuned Vision Transformer (ViT). The model uses a feature-level fusion strategy: deep local spatial features extracted by the CNN backbones are concatenated and fed into the ViT, which captures global contextual relationships via self-attention. This design addresses the limitations of standalone CNN and ViT models by combining their complementary strengths. Ablation studies and experimental evaluations show that the ensemble model significantly outperforms the individual CNN and ViT baselines, achieving an accuracy of 98.5%, precision of 98.7%, recall of 98.3%, F1-score of 98.5%, and an area under the receiver operating characteristic curve (AUC-ROC) of 0.99 on the pneumonia X-ray dataset. The architecture balances detailed local feature extraction with holistic global context modelling, offering a robust and efficient solution for medical image classification.
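As a quick consistency check on the reported metrics, the F1-score can be recomputed from the stated precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only and is not taken from the paper: the function name `f1_score` and the variable names are assumptions, and only the three percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when the model makes no correct positive predictions.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.987
reported_recall = 0.983
reported_f1 = 0.985

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}")  # prints: computed F1 = 0.9850
```

The recomputed value agrees with the reported F1-score of 98.5% to the precision given, so the three figures are mutually consistent.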

Author Biographies

Deepa A B, Rajagiri School of Engineering & Technology, APJ Abdul Kalam Technological University, Kakkanad, Kerala, 682039, India and College of Engineering & Management, APJ Abdul Kalam Technological University, Punnapra, Kerala, 688003, India

Research Scholar

Varghese Paul, Rajagiri School of Engineering & Technology, APJ Abdul Kalam Technological University, Kakkanad, Kerala, 682039, India

Professor

DOI:

https://doi.org/10.31449/inf.v50i12.9647

Published

05/13/2026

How to Cite

B, D. A., & Paul, V. (2026). Ensemble Feature Fusion of VGG16, ResNet50, and Vision Transformer for Pneumonia Detection in Chest X-ray Images. Informatica, 50(12). https://doi.org/10.31449/inf.v50i12.9647