Efficient Transformer Architectures for Diabetic Retinopathy Classification from Fundus Images: DR-MobileViT, DR-EfficientFormer, and DR-SwinTiny
Abstract
Diabetic retinopathy (DR) is a leading cause of vision loss, creating a need for efficient diagnostic tools, particularly in resource-limited settings. This study presents three lightweight transformer-based models, DR-MobileViT, DR-EfficientFormer, and DR-SwinTiny, for automated DR classification from fundus images (APTOS 2019: 3,662 images; Messidor-2: 1,748 images). After preprocessing, including resizing to 224×224 pixels and CLAHE enhancement, the models, which use compact architectures (1.8–3.5M parameters), are trained with the AdamW optimizer and data augmentation. DR-MobileViT integrates convolutional and transformer layers, DR-EfficientFormer employs a dimension-consistent design, and DR-SwinTiny uses shifted-window attention. All models were initialized with ImageNet-pretrained weights. Evaluated on the APTOS 2019 and Messidor-2 datasets, they achieve quadratic weighted kappa (QWK) scores up to 0.89 and areas under the ROC curve (AUC) up to 0.95. The models approach the performance of the top CNN ensembles from the APTOS 2019 challenge (which exceed 40M parameters) while reducing inference time to 10–15 ms/image (NVIDIA P100 GPU) and computational overhead by over 90%. These results indicate their potential for scalable, point-of-care DR screening and early detection in underserved regions.
DOI: https://doi.org/10.31449/inf.v49i29.8695
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika