Predicting Stages of Liver Cirrhosis Using Data Mining and Machine Learning Techniques
Abstract
Liver cirrhosis often results from the long, persistent progression of chronic liver disorders, and it is a major cause of death worldwide. Early diagnosis and identification of cirrhosis are essential for preventing the disease's progression and the complete destruction of liver tissue. This paper aims to build an intelligent automated system that predicts the stages of cirrhosis using Machine Learning (ML) algorithms, including Random Forest (RF), Extra Trees (ET), and Support Vector Machine (SVM). The dataset used in this research is sourced from Zenodo, where it is linked to GitHub. The data are publicly accessible, and this work is our first use of them. Data mining techniques were also applied to analyze the data before predicting the outcome. Because the dataset's classes are considerably imbalanced, we applied the Synthetic Minority Oversampling Technique (SMOTE) to mitigate bias in the machine learning models. The newly proposed models combine the feature selection techniques Chi-Square and Recursive Feature Elimination with Cross-Validation (RFECV) with the RF and SVM classifiers (RF-RFECV, SVM-RFECV). The experimental findings demonstrate that the Extra Trees model using the Chi-Square feature selection method (ET-Chi-Square) achieved the highest accuracy, 93.87%. It also obtained recall, F1-score, and precision values of 94% each, and an Area Under the Curve (AUC) of 99%. Our method exhibited strong performance compared with previous related research.
DOI: https://doi.org/10.31449/inf.v48i21.6752
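As a rough illustration of the ET-Chi-Square pipeline described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes scikit-learn is available, uses a synthetic three-class dataset as a stand-in for the cirrhosis data, and replaces a library SMOTE with a simplified, hand-written `smote_like_oversample` helper (a hypothetical name). The number of selected features (`k=10`), the tree count, and the split ratio are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler


def smote_like_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): synthesize minority samples by
    interpolating between a sample and one of its same-class neighbours.
    Assumes every class has at least two samples."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue  # majority class needs no synthetic samples
        X_cls = X[y == cls]
        n_neighbors = min(k, len(X_cls) - 1) + 1  # +1: first neighbour is the point itself
        nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X_cls)
        neigh_idx = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        pick = rng.integers(0, neigh_idx.shape[1], size=n_new)
        neighbour = X_cls[neigh_idx[base, pick]]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(X_cls[base] + gap * (neighbour - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced stand-in data (assumption: not the Zenodo dataset).
X, y = make_classification(
    n_samples=600, n_features=18, n_informative=8,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# chi2 requires non-negative features, so scale the training data to [0, 1].
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)

# Oversample the training split only, so the test split stays untouched.
X_bal, y_bal = smote_like_oversample(X_train_s, y_train)

# Chi-square feature selection followed by an Extra Trees classifier.
selector = SelectKBest(chi2, k=10).fit(X_bal, y_bal)
model = ExtraTreesClassifier(n_estimators=200, random_state=0)
model.fit(selector.transform(X_bal), y_bal)

y_pred = model.predict(selector.transform(scaler.transform(X_test)))
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy on synthetic data: {accuracy:.3f}")
```

Oversampling is applied after the train/test split so that synthetic samples never leak into the evaluation data. The accuracy printed here reflects only the synthetic data and says nothing about the results reported in the abstract. For the RFECV variants, scikit-learn's `RFECV` could take the place of `SelectKBest`, wrapped around an estimator that exposes feature importances or coefficients.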
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







