Enhanced Cardio Care: Explainable Vision Transformer Multimodal Pipeline For Cardiac Abnormalities Detection Using Electrocardiogram Image Reports
Abstract
Artificial Intelligence (AI) analysis of electrocardiograms (ECGs) has matured to the point where its performance in diagnosing arrhythmias is comparable to that of human experts, giving it the potential to assist societies with limited healthcare resources. However, these settings often hold only paper-based ECG image archives, whereas current AI-ECG analysis requires digitised ECG signals. To address this gap, we previously introduced Cardio Care, a mobile-friendly diagnostic pipeline capable of analysing both ECG signals and scanned ECG images. In this extended study, we enhance the pipeline's explainability and expand its model benchmarking by comparing the Vision Transformer (ViT) with two of its data-efficient variants, DeiT and BEiT. The models were evaluated on two image-based ECG datasets: one public (Mendeley) and one private (Tam Duc Cardiometabolic). Our results show that ViT achieves the strongest classification performance of the three models, with macro F1-scores of up to 0.99 on Mendeley and 0.81 on Tam Duc. Additionally, we integrate a Grad-CAM-based explainability feature to visualise model attention, improving interpretability for clinical use. With this addition, the enhanced Cardio Care pipeline demonstrates significant potential for scalable, low-cost cardiac screening in underserved healthcare settings.
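The macro F1-score reported above is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The following is a minimal sketch of that computation in plain Python. The function name `macro_f1` and the class labels in the usage note are illustrative assumptions, not the evaluation code or label set used in the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; not taken from the Cardio Care code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with the made-up labels `y_true = ["normal", "normal", "mi", "mi", "abnormal", "abnormal"]` and `y_pred = ["normal", "mi", "mi", "mi", "abnormal", "normal"]`, the per-class F1 scores are 0.5, 0.8 and 2/3, giving a macro F1 of roughly 0.656.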
References
@article{Darmawahyuni2022DeepLE,
title={Deep learning-based electrocardiogram rhythm and beat features for heart abnormality classification},
author={Annisa Darmawahyuni and Siti Nurmaini and Muhammad Naufal Rachmatullah and Bambang Tutuko and Ade Iriani Sapitri and Firdaus Firdaus and Ahmad Fansyuri and Aldi Predyansyah},
journal={PeerJ Computer Science},
year={2022},
volume={8},
url={https://peerj.com/articles/cs-825/}
}
@article{Chiang2019,
title={Noise Reduction in ECG Signals Using Fully Convolutional Denoising Autoencoders},
author={Hsin-Tien Chiang and Yi-Yen Hsieh and Szu-Wei Fu and Kuo-Hsuan Hung and Yu Tsao and Shao-Yi Chien},
journal={IEEE Access},
year={2019},
volume={7},
pages={60806-60813},
doi={10.1109/ACCESS.2019.2912036}
}
@article{ZHANG2021102373,
title = {Interpretable deep learning for automatic diagnosis of 12-lead electrocardiogram},
journal = {iScience},
volume = {24},
number = {4},
pages = {102373},
year = {2021},
issn = {2589-0042},
doi = {10.1016/j.isci.2021.102373},
author = {Dongdong Zhang and Samuel Yang and Xiaohui Yuan and Ping Zhang}
}
@misc{Khan2021ECGID,
title={ECG Images dataset of Cardiac Patients},
author={Ali Haider Khan and Muzammil Hussain},
year={2021},
doi = {10.17632/gwbz3fsgp8.2},
url={https://data.mendeley.com/datasets/gwbz3fsgp8/2}
}
@misc{ECG,
title={ECG: Reading The Waves},
author={{MSD Manuals}},
year={2023},
url={https://www.msdmanuals.com/home/multimedia/image/ecg-reading-the-waves}
}
@inproceedings{Gour2023ACR,
title={A Comprehensive Review of Heart Disease Classification Techniques Utilizing ECG Signal Analysis},
author={Akshita Gour and Muktesh Gupta and Rajesh Wadhvani and Sanyam Shukla},
booktitle={2023 International Conference on Electrical, Electronics, Communication and Computers (ELEXCOM)},
year={2023},
pages={1-6},
url={https://ieeexplore.ieee.org/document/10370226}
}
@article{Khunte2024AutomatedDR,
title={Automated Diagnostic Reports from Images of Electrocardiograms at the Point-of-Care},
author={Akshay Khunte and Veer Sangha and Evangelos K. Oikonomou and Lovedeep Singh Dhingra and Arya Aminorroaya and Andreas C Coppi and Sumukh Vasisht Shankar and Bobak J. Mortazavi and Deepak L. Bhatt and Harlan M. Krumholz and Girish N. Nadkarni and Akhil Vaid and Rohan Khera},
journal={medRxiv},
year={2024},
doi={10.1101/2024.02.17.24302976}
}
@article{Ng2018AnOA,
title={An Open Access Database for Evaluating the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality Detection},
author={Eddie Y. K. Ng and others},
journal={Journal of Medical Imaging and Health Informatics},
year={2018},
url={https://doi.org/10.1166/JMIHI.2018.2442}
}
@article{doi:10.1161/CIR.0000000000001209,
author = {Seth S. Martin and others},
title = {2024 Heart Disease and Stroke Statistics: A Report of US and Global Data From the American Heart Association},
journal = {Circulation},
volume = {149},
number = {8},
pages = {e347-e913},
year = {2024},
doi = {10.1161/CIR.0000000000001209},
eprint = {https://www.ahajournals.org/doi/pdf/10.1161/CIR.0000000000001209}
}
@article{ROTH20202982,
title = {Global Burden of Cardiovascular Diseases and Risk Factors, 1990–2019: Update From the GBD 2019 Study},
journal = {Journal of the American College of Cardiology},
volume = {76},
number = {25},
pages = {2982-3021},
year = {2020},
issn = {0735-1097},
doi = {10.1016/j.jacc.2020.11.010},
author = {Gregory A. Roth and others},
keywords = {cardiovascular diseases, global health, health policy, population health},
}
@misc{dosovitskiy2021image,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
year={2021},
eprint={2010.11929},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@Article{Sadad2023,
AUTHOR = {Sadad, Tariq and Safran, Mejdl and Khan, Inayat and Alfarhood, Sultan and Khan, Razaullah and Ashraf, Imran},
TITLE = {Efficient Classification of ECG Images Using a Lightweight CNN with Attention Module and IoT},
JOURNAL = {Sensors},
VOLUME = {23},
YEAR = {2023},
NUMBER = {18},
ARTICLE-NUMBER = {7697},
PubMedID = {37765754},
ISSN = {1424-8220},
doi = {10.3390/s23187697}
}
@ARTICLE{Abu2023,
author={Abubaker, Mohammed B. and Babayiğit, Bilal},
journal={IEEE Transactions on Artificial Intelligence},
title={Detection of Cardiovascular Diseases in ECG Images Using Machine Learning and Deep Learning Methods},
year={2023},
volume={4},
number={2},
pages={373-382},
keywords={Feature extraction;Electrocardiography;Heart;Diseases;Convolutional neural networks;Machine learning;Deep learning;Cardiovascular;deep learning;electrocardiogram (ECG) images;feature extraction;machine learning;transfer learning},
doi={10.1109/TAI.2022.3159505}}
@article{doi:10.1161/JAHA.113.000268,
author = {James M. McCabe and Ehrin J. Armstrong and Ivy Ku and Ameya Kulkarni and Kurt S. Hoffmayer and Prashant D. Bhave and Stephen W. Waldo and Priscilla Hsue and John C. Stein and Gregory M. Marcus and Scott Kinlay and Peter Ganz},
title = {Physician Accuracy in Interpreting Potential ST-Segment Elevation Myocardial Infarction Electrocardiograms},
journal = {Journal of the American Heart Association},
volume = {2},
number = {5},
pages = {e000268},
year = {2013},
doi = {10.1161/JAHA.113.000268}
}
@InProceedings{10.1007/978-3-031-59091-7_16,
author="Chukwu, Emmanuel C.
and Moreno-S{\'a}nchez, Pedro A.",
editor="S{\"a}rest{\"o}niemi, Mariella
and Keikhosrokiani, Pantea
and Singh, Daljeet
and Harjula, Erkki
and Tiulpin, Aleksei
and Jansson, Miia
and Isomursu, Minna
and van Gils, Mark
and Saarakkala, Simo
and Reponen, Jarmo",
title="Enhancing Arrhythmia Diagnosis with Data-Driven Methods: A 12-Lead ECG-Based Explainable AI Model",
booktitle="Digital Health and Wireless Solutions",
year="2024",
publisher="Springer Nature Switzerland",
address="Cham",
pages="242--259",
isbn="978-3-031-59091-7"
}
@article{Nus2011,
author = {Nussinovitch, Udi and Elishkevitz, Keren Politi and Kaminer, Keren and Nussinovitch, Moshe and Segev, Shlomo and Volovitz, Benjamin and Nussinovitch, Naomi},
title = {The Efficiency of 10-Second Resting Heart Rate for the Evaluation of Short-Term Heart Rate Variability Indices},
journal = {Pacing and Clinical Electrophysiology},
volume = {34},
number = {11},
pages = {1498-1502},
keywords = {heart rate variability, electrocardiography, autonomic nervous system, resting heart rate},
doi = {10.1111/j.1540-8159.2011.03178.x},
year = {2011}
}
@article{JI202188,
title = {A semi-supervised zero-shot image classification method based on soft-target},
journal = {Neural Networks},
volume = {143},
pages = {88-96},
year = {2021},
issn = {0893-6080},
url = {https://www.sciencedirect.com/science/article/pii/S089360802100215X},
author = {Zhong Ji and Qiang Wang and Biying Cui and Yanwei Pang and Xianbin Cao and Xuelong Li},
keywords = {Zero-shot learning, Image classification, Autoencoder, Soft-Target, Semi-supervised learning},
abstract = {Zero-shot learning (ZSL) aims at training a classification model with data only from seen categories to recognize data from disjoint unseen categories. Domain shift and generalization capability are two fundamental challenges in ZSL. In this paper, we address them with a novel Soft-Target Semi-supervised Classification (STSC) model. Specifically, an autoencoder network is leveraged, where both labeled seen data from the seen categories and unlabeled ancillary data collected from Internet or other datasets are employed as two branches, respectively. For the branch of labeled seen data, side information are employed as the latent vectors to separately connect the input of encoder and the output of decoder. In this way, visual and side information are implicitly aligned. For the branch of unlabeled ancillary data, it explicitly strengthens the reconstruction ability of the network. Meanwhile, these ancillary data can be viewed as a smooth to the domain distribution, which contributes to the alleviation of the domain shift problem. To further guarantee the generation ability, a Softmax-T loss function is proposed by making full use of the soft target. Extensive experiments on three benchmark datasets show the superiority of the proposed approach under tasks of both traditional zero-shot learning and generalized zero-shot learning.}
}
@ARTICLE{he2021,
author={He, Fang and Nie, Feiping and Wang, Rong and Jia, Weimin and Zhang, Fenggan and Li, Xuelong},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Semisupervised Band Selection With Graph Optimization for Hyperspectral Image Classification},
year={2021},
volume={59},
number={12},
pages={10298-10311},
keywords={Optimization;Analytical models;Hyperspectral imaging;Feature extraction;Hidden Markov models;Computational modeling;Laplace equations;Band selection (BS);hyperspectral images (HSIs);optimal graph;semisupervised},
doi={10.1109/TGRS.2020.3037746}}
@article{lee2019,
title={PyWavelets: A Python package for wavelet analysis},
author={Lee, Gregory and Gommers, Ralf and Waselewski, Filip and Wohlfahrt, Kai and O'Leary, Aaron},
journal={Journal of Open Source Software},
volume={4},
number={36},
pages={1237},
year={2019},
publisher={The Open Journal}
}
@article{vonesch2007,
title={Generalized Daubechies wavelet families},
author={Vonesch, C{\'e}dric and Blu, Thierry and Unser, Michael},
journal={IEEE transactions on signal processing},
volume={55},
number={9},
pages={4415--4429},
year={2007},
publisher={IEEE}
}
@article{li2019,
title={Noise estimation for image sensor based on local entropy and median absolute deviation},
author={Li, Yongsong and Li, Zhengzhou and Wei, Kai and Xiong, Weiqi and Yu, Jiangpeng and Qi, Bo},
journal={Sensors},
volume={19},
number={2},
pages={339},
year={2019},
publisher={MDPI}
}
@article{aha,
author = {Paul Kligfield and Leonard S. Gettes and James J. Bailey and Rory Childers and Barbara J. Deal and E. William Hancock and Gerard van Herpen and Jan A. Kors and Peter Macfarlane and David M. Mirvis and Olle Pahlm and Pentti Rautaharju and Galen S. Wagner },
title = {Recommendations for the Standardization and Interpretation of the Electrocardiogram},
journal = {Circulation},
volume = {115},
number = {10},
pages = {1306-1324},
year = {2007},
doi = {10.1161/CIRCULATIONAHA.106.180200},
abstract = {This statement examines the relation of the resting ECG to its technology. Its purpose is to foster understanding of how the modern ECG is derived and displayed and to establish standards that will improve the accuracy and usefulness of the ECG in practice. Derivation of representative waveforms and measurements based on global intervals are described. Special emphasis is placed on digital signal acquisition and computer-based signal processing, which provide automated measurements that lead to computer-generated diagnostic statements. Lead placement, recording methods, and waveform presentation are reviewed. Throughout the statement, recommendations for ECG standards are placed in context of the clinical implications of evolving ECG technology.}}
@article{Sangha2021AutomatedMD,
title={Automated multilabel diagnosis on electrocardiographic images and signals},
author={V. Sangha and B. Mortazavi and Adrian D. Haimovich and Ant{\^o}nio H. Ribeiro and Cynthia A. Brandt and Daniel L. Jacoby and Wade L. Schulz and Harlan M. Krumholz and Ant{\^o}nio Luiz P. Ribeiro and Rohan Khera},
journal={Nature Communications},
year={2022},
volume={13},
url={https://rdcu.be/dRfuy}
}
@article{ji2021semi,
title={A semi-supervised zero-shot image classification method based on soft-target},
author={Ji, Zhong and Wang, Qiang and Cui, Biying and Pang, Yanwei and Cao, Xianbin and Li, Xuelong},
journal={Neural Networks},
volume={143},
pages={88--96},
year={2021},
publisher={Elsevier}
}
@article{he2020semisupervised,
title={Semisupervised band selection with graph optimization for hyperspectral image classification},
author={He, Fang and Nie, Feiping and Wang, Rong and Jia, Weimin and Zhang, Fenggan and Li, Xuelong},
journal={IEEE Transactions on Geoscience and Remote Sensing},
volume={59},
number={12},
pages={10298--10311},
year={2020},
publisher={IEEE}
}
@inproceedings{miao2021spatial,
title={Spatial-spectral hyperspectral image classification via multiple random anchor graphs ensemble learning},
author={Miao, Yanling and Wang, Qi and Chen, Mulin and Li, Xuelong},
booktitle={2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS},
pages={3641--3644},
year={2021},
organization={IEEE}
}
@article{xie2019srsc,
title={SRSC: selective, robust, and supervised constrained feature representation for image classification},
author={Xie, Guo-Sen and Zhang, Zheng and Liu, Li and Zhu, Fan and Zhang, Xu-Yao and Shao, Ling and Li, Xuelong},
journal={IEEE transactions on neural networks and learning systems},
volume={31},
number={10},
pages={4290--4302},
year={2019},
publisher={IEEE}
}
@article{sangha2022automated,
title={Automated multilabel diagnosis on electrocardiographic images and signals},
author={Sangha, Veer and Mortazavi, Bobak J and Haimovich, Adrian D and Ribeiro, Ant{\^o}nio H and Brandt, Cynthia A and Jacoby, Daniel L and Schulz, Wade L and Krumholz, Harlan M and Ribeiro, Antonio Luiz P and Khera, Rohan},
journal={Nature communications},
volume={13},
number={1},
pages={1583},
year={2022},
publisher={Nature Publishing Group UK London}
}
@article{DiCesare2024TheHO,
title={The Heart of the World},
author={Mariachiara Di Cesare and Pablo Perel and Sean Taylor and Chodziwadziwa Whiteson Kabudula and Honor Bixby and Thomas A Gaziano and Diana Vaca McGhie and Jeremiah Mwangi and Borjana Pervan and Jagat Narula and Daniel Jos{\'e} Pi{\~n}eiro and Fausto J. Pinto},
journal={Global Heart},
year={2024},
volume={19},
url={https://api.semanticscholar.org/CorpusID:267254220}
}
@inproceedings{Touvron2020,
title={Training data-efficient image transformers \& distillation through attention},
author={Hugo Touvron and Matthieu Cord and Matthijs Douze and Francisco Massa and Alexandre Sablayrolles and Herv{\'e} J{\'e}gou},
booktitle={International Conference on Machine Learning},
year={2020},
url={https://api.semanticscholar.org/CorpusID:229363322}
}
@article{Bao2021,
title={BEiT: BERT Pre-Training of Image Transformers},
author={Hangbo Bao and Li Dong and Furu Wei},
journal={ArXiv},
year={2021},
volume={abs/2106.08254},
url={https://api.semanticscholar.org/CorpusID:235436185}
}
@InProceedings{cardiocaresoict,
author="Vu, Vo Quoc
and Minh To, Ngoc
and Nguyen Duc, Thanh
and Phung, Nhat
and Ngo, Quoc
and Kumar, Dinesh
and Dinh, Minh",
editor="Buntine, Wray
and Fjeld, Morten
and Tran, Truyen
and Tran, Minh-Triet
and Huynh Thi Thanh, Binh
and Miyoshi, Takumi",
title="Cardio Care: A Vision Transformer Cardiac Classification Based on Electrocardiogram Images and Signals",
booktitle="Information and Communication Technology",
year="2025",
publisher="Springer Nature Singapore",
address="Singapore",
pages="199--209",
abstract="Electrocardiograms, or ECGs, are essential for evaluating cardiac function. Most available models focus on digitized ECG data rather than imaging reports, making them unsuitable for under-resourced communities that only have access to paper-based ECG reports. To address this disadvantage, we propose Cardio Care, a mobile solution that processes both images and signals ECGs to detect abnormalities. It utilises Vision Transformer technology to enhance image recognition, making it more applicable for a broader range of scenarios. We use three datasets (public and local datasets) with varying sample sizes and input types to reflect the data in real-world settings. The results show consistent performance across all datasets, emphasizing the potential of Cardio Care to assist cardiologists in remote and resource-limited healthcare facilities. The average macro F1 scores achieved were 65, 99, and 82 for the CPSC, Mendeley, and Cardiometabolic datasets, respectively. This study proposed an alternative to preprocessing images and signals ECGs for a Vision Transformer-based deep learning network, with the inspiring goal of enhancing healthcare access for under-resourced and underserved communities.",
isbn="978-981-96-4285-4"
}
DOI: https://doi.org/10.31449/inf.v49i3.10180
This work is licensed under a Creative Commons Attribution 3.0 License.








