Liver Disease Classification - An XAI Approach to Biomedical AI

Ebenezer Agbozo, Daniel Musafiri Balungu


Advances in biological and medical technologies have made vast amounts of biological and physiological data available, including medical images, electroencephalograms, genomic information, and protein sequences. Learning from this data makes it easier to understand human health and disease. Deep learning algorithms, which grew out of artificial neural networks, have significant potential for identifying patterns and extracting features from large amounts of complex data. However, these recent advances involve black-box models: algorithms that do not provide human-understandable explanations in support of their decisions. This limitation hampers the fairness, accountability, and transparency of these models; the field of explainable AI (XAI) addresses the problem by providing human-understandable explanations for black-box models. This paper focuses on the requirement that XAI explain in detail the decisions made by an AI in a biomedical setting to the domain expert, e.g., the physician in the case of AI-based clinical decisions related to the diagnosis, treatment, or prognosis of a disease. We used the Indian Liver Patient Dataset (ILPD), collected from the Andhra Pradesh region. The deep learning model, which achieved an accuracy of 0.81 (0.82 after hyperparameter tuning), was built with Keras and TensorFlow; because the target classes are imbalanced, we used generative adversarial networks (GANs) to oversample the dataset. The study applied Shapley values, an XAI technique, to shed light on the predictions of the liver disease detection model.
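To make the idea of Shapley values concrete, the following is a minimal sketch of the exact computation for a single prediction. It is not the authors' pipeline: the function `shapley_values`, the scoring function `toy_risk_score`, its coefficients, and the patient and baseline values are all hypothetical and chosen only for illustration. Features left out of a coalition are replaced by a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one instance `x`.

    Assumption: a feature that is absent from a coalition takes its
    value from `baseline` (one common convention, not the only one).
    The cost grows as 2^n, so this is only practical for few features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` come from x.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(features):
    """Hypothetical stand-in for a trained model; coefficients are invented."""
    age, bilirubin, albumin = features
    return 0.01 * age + 0.3 * bilirubin - 0.2 * albumin + 0.05 * bilirubin * albumin


# Hypothetical patient and reference values (age, total bilirubin, albumin).
patient = [60.0, 3.0, 2.5]
reference = [40.0, 1.0, 3.5]

contributions = shapley_values(toy_risk_score, patient, reference)
for name, phi in zip(["age", "bilirubin", "albumin"], contributions):
    print(f"{name}: {phi:+.4f}")
```

The contributions add up to the difference between the score for the patient and the score for the reference values, which is the property that lets a domain expert read each value as that feature's share of the prediction. Because the exact computation is exponential in the number of features, practical tools approximate it.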


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.