Analysis Implementation of the Ensemble Algorithm in Predicting Customer Churn in Telco Data: A Comparative Study

Globalization and technological advancements in the telecommunication industry have led to a significant rise in the number of operators and, in turn, to intense market competition. This sector has become crucial in developed countries, and companies strive to increase profits by acquiring new customers, up-selling existing ones, and extending the retention period of current clients. In the traditional method of defect prediction, a single classifier is used to build a model on a pre-labeled dataset. However, this approach has limitations in predicting defects accurately under certain circumstances. To overcome these limitations, boosting is applied to combine multiple weak classifiers and create a robust classification model. Among the many algorithms used for churn prediction, ensemble techniques have demonstrated greater accuracy than simpler approaches. This study experiments with five ensemble algorithms: AdaBoost, Gradient Boost, XGBoost, CatBoost, and LightGBM. The results indicate that XGBoost outperforms the other techniques and is the most suitable algorithm to build the predictive model. Additionally, a Grid Search CV hyper-parameter search was performed with XGBoost, resulting in an accuracy of 81.2%.
Summary: The study compared five ensemble algorithms for predicting subscriber churn. The results show that XGBoost is the best algorithm, with an accuracy of 81.2%.


Introduction
The telecommunication industry's globalization and advancements have resulted in an exponential increase in operators, leading to heightened market competition [1]. Over the past two decades, the telecommunications sector has emerged as a critical industry in developed countries [2]. Due to the availability of extensive data, data mining has become essential for prediction and analysis in this industry. One primary application of data mining is predicting churners, which helps increase customer retention and profitability. Data mining techniques are commonly used in telecom to monitor customer churn behavior. Customers anticipate excellent services at reasonable prices. If they are dissatisfied, they will quickly switch to another telecom network. In such a competitive market, companies must discover innovative approaches to forecast possible customer churn to thrive. Customer churn refers to the percentage of customers who have stopped utilizing a company's products or services during a specific period [3]. To maximize profits in this competitive era, companies have proposed various strategies, including acquiring new customers, up-selling existing customers, and increasing the retention period of existing customers. Among these strategies, customer retention is the least expensive. To adopt this strategy, companies must reduce potential customer churn, which occurs when customers switch service providers due to dissatisfaction with the consumer service and support system. Dissatisfaction, increased costs, poor quality, lack of features, and privacy concerns are among the reasons why customers may churn. To address this problem, it is necessary to forecast which customers are at risk of churning [4,5,6]. The telecommunications industry is expanding rapidly thanks to various technologies, and different companies offer varying data communication services with different levels of quality. 
To combat churn, companies offer various attractive services to retain their customers. Data mining technologies, such as Naïve Bayes, decision trees, neural networks, and logistic regression algorithms, are used to predict churn. An accurate prediction model is essential for correctly identifying customer churn and is critical in making retention decisions [7]. The most effective customer churn prediction model can identify churners and guide decision-makers to generate maximum profit [8,9]. Algorithms perform differently when determining customer churn on telco data. These differences are influenced by the variables, the types of data, and the amount of data, all of which vary between studies. Consequently, previous studies disagree on the best algorithm for customer churn analysis. This study combines two methods to determine the best algorithm for customer churn analysis: the experimental method as the primary method and the literature study method as a comparison method. The results therefore compare the experiments carried out here under the worst possibility with the findings of pre-existing research.
R.P. Sari et al.
Churn analysis on telco data is a crucial process for any company operating in the telecommunications industry. The main aim of this analysis is to predict the likelihood of customers abandoning a company's services. By identifying customers likely to churn, companies can take proactive measures to retain them and prevent revenue loss. This analysis involves using machine learning techniques and data analysis methods to identify the factors influencing a customer's decision to stop using the company's services. Churn analysis on telco data can benefit companies in several ways. Firstly, it can help improve customer retention rates, as companies can take timely action to retain customers at risk of churning. This can lead to increased customer loyalty and higher revenue. Secondly, it can help increase customer satisfaction by identifying and addressing the factors that lead to customer dissatisfaction. This can lead to an overall improvement in the quality of services provided by the company. Lastly, it can reduce the cost of acquiring new customers, as retaining existing customers is often cheaper than acquiring new ones.
In conclusion, churn analysis on telco data is a critical process for companies operating in the telecommunications industry. It can help improve customer retention, increase customer satisfaction, and reduce the cost of acquiring new customers. By leveraging the right algorithmic approach, companies can gain valuable insights into customer behavior and make informed decisions to retain their customers. There is still scope for further research to explore new techniques and approaches to churn analysis and strengthen existing ones.

An ensemble algorithm
The conventional approach to forecasting defects involves utilizing a solitary classifier, such as the naïve Bayes classifier, decision trees, or a multilayer perceptron, to establish a predictive model based on a dataset that has already been labeled. However, particular classifiers may not be effective in predicting certain defects in specific situations. To tackle this problem, ensemble learning combines the advantages of several classifiers to enhance the identification of defects in the dataset. In recent years, various researchers have demonstrated through empirical evidence that ensemble methods yield greater classification accuracy than individual classifiers [10].
Boosting is an approach that combines several weak classifiers to create a robust classifier. The first boosting algorithm designed to enhance accuracy in binary classification was AdaBoost, now considered a practical technique for various types of boosting in machine learning. However, AdaBoost has the inherent drawback of being a cost-insensitive boosting algorithm, which limits its applications in situations where the costs of misclassification errors need to be treated differently. This study aims to explore ways to overcome this limitation [11].
Previous studies have indicated that an effective churn prediction model should efficiently use a large volume of historical data to identify churners accurately. However, existing models have several limitations that prevent efficient and accurate churn prediction. The telecom sector generates large amounts of data that may contain missing values, resulting in poor and inaccurate prediction outputs. The churn prediction model being proposed combines clustering and classification algorithms. Its performance is assessed on various datasets used for churn prediction. The evaluation uses accuracy, precision, recall, and F-measure metrics.
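The boosting idea described above, in which several weak classifiers are combined into a stronger one, can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses synthetic data from `make_classification` in place of a labeled churn dataset; the sample size, feature counts, and number of boosting rounds are illustrative choices, not values from the studies cited here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for a labeled dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A single weak learner: a decision stump (tree of depth 1).
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_train, y_train)

# AdaBoost fits a sequence of weak learners, reweighting the training
# samples after each round so that later learners focus on the examples
# misclassified so far. scikit-learn's default weak learner is a depth-1 tree.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X_train, y_train)

stump_acc = stump.score(X_test, y_test)
boosted_acc = boosted.score(X_test, y_test)
print(f"single stump accuracy: {stump_acc:.3f}")
print(f"AdaBoost accuracy:     {boosted_acc:.3f}")
```

On data like this the boosted model typically scores higher than the single stump, although the size of the gap depends on the data and is not guaranteed.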
The research goals are to pinpoint problems in previous studies, create a more efficient model for predicting customer churn, accurately identify potential churners, and offer them retention strategies. The experimental results show that the proposed churn prediction model achieves higher accuracy and performs better in predicting churn [11].
Further research presents a new approach to prediction using a hybrid ensemble model that combines various classifiers. The proposed model is tested on two datasets related to telecom and demonstrates high accuracy. However, the model is specifically designed for particular datasets rather than being generalized. Additionally, another article introduces a methodology for churn prediction in fund management services that utilizes ensemble learning and a new weighting mechanism to handle imbalanced cost sensitivity when dealing with financial data. The model uses data from different companies and can be enhanced through other learning techniques [12,13].
Ensemble learning refers to the process of creating and combining multiple learners to achieve better results than could be achieved with a single algorithm. The method utilizes machine learning algorithms to produce weak predictive outcomes based on features extracted from different data projections. These results are then combined using diverse voting methods to achieve better performance [14,15,16].
Özer Çelik et al. found that deep learning techniques are beneficial for analyzing vast quantities of data. In contrast, ensemble machine learning techniques are more suitable for smaller datasets to achieve superior prediction outcomes. Additionally, the study discovered that the Cox Regression approach effectively identifies highly dispersed independent features in the dataset. However, the primary constraint of this study was how the dataset was adapted: separate datasets were employed for machine learning and deep learning techniques.
Utilizing a single dataset for both machine learning and deep learning would have been more effective [18].
Deng et al. predicted customer churn using CatBoost, LightGBM, and Random Forest (RF). Random Forest exhibited the best performance in this experiment, with 92% accuracy, followed by LightGBM. The dataset used in this experiment had many samples with over 80 features, making it susceptible to over-fitting. However, Random Forest is an ensemble algorithm, and its accuracy surpasses that of most individual algorithms. Because it incorporates two sources of randomness, Random Forest is less prone to over-fitting and has a degree of resistance to noise, which results in a strong performance on the testing set. The model can identify the correlation between customer attributes, service attributes, customer spending data, and customer loss, and provide specific mathematical formulas or rules to determine customer churn probability. By leveraging this information, customer churn can be effectively minimized through improved customer service [19].
Thakkar et al. utilized AdaBoost with a Cost-Enabled Cost-Sensitive Classifier to predict customer churn. While various classifiers and boosting techniques, as well as deep learning (DL) algorithms, have been recommended to address customer churn, traditional classification algorithms rely on an error-based framework that prioritizes improving the classifier's accuracy rather than cost sensitivity. In real-world scenarios, misclassification errors carry unequal costs, but conventional classification algorithms treat them as equal. DL algorithms, in turn, are computationally intensive and time-consuming. To overcome these challenges, the study proposes a new class-dependent cost-sensitive boosting algorithm called AdaBoostWithCost, which seeks to minimize the cost of churn.
The research assesses the proposed algorithm and demonstrates that it consistently outperforms the discrete AdaBoost algorithm in telecom churn prediction. The main objective of the AdaBoostWithCost classifier is to significantly decrease false negative errors and misclassification costs compared to the AdaBoost algorithm [20].
Ahmad et al. developed a churn prediction model using four different models: Decision Tree, Random Forest, Gradient Boosted Machine Tree (GBM), and Extreme Gradient Boosting (XGBOOST). Their work involved utilizing machine learning techniques on a big data platform and introducing a novel feature engineering and selection approach. To evaluate model performance, they adopted the Area Under Curve (AUC) standard measure, which yielded a value of 93.3%. Another significant contribution of the study was incorporating customer social networks into the prediction model through Social Network Analysis (SNA) feature extraction, which improved model performance from 84% to 93.3% against the AUC standard. The researchers worked on a large dataset created by transforming big raw data provided by the SyriaTel telecom company and tested the model using the Spark environment. The best results were achieved by implementing the XGBOOST algorithm for classification in this churn predictive model [21].
Based on the literature review, it is evident that customer churn analysis has become a crucial research topic in the telecommunication industry, given the significant impact of customer churn on company revenue and growth. Various studies have used machine learning algorithms, particularly ensemble algorithms, to analyze customer churn and predict customer behavior. However, there has yet to be a consensus on the best ensemble algorithm sequence for customer churn analysis.
Therefore, this study aims to fill the gap by comparing and evaluating the performance of several commonly used ensemble algorithms, including XGBoost, AdaBoost, CatBoost, LightGBM, and Gradient Boost, on telco data, using a set of existing variables, including those with the worst possibility. The study aims to determine the best-performing algorithm sequence and provide insights into the factors contributing to churn prediction accuracy. Furthermore, this study aims to strengthen previous research using ensemble algorithms to perform customer churn analysis using a machine-learning approach. Previous studies have shown promising results in using ensemble algorithms to predict customer churn. Still, the results are often limited by the choice of variables, the algorithm sequence used, and the evaluation metrics applied. This study intends to address these limitations by comprehensively evaluating various ensemble algorithms using telco data and a set of existing variables. This study hopes to contribute to the growing body of literature on customer churn analysis and provide practical insights for telecommunication companies to improve their customer retention strategies.

Material and method
Figure 1: Overall methodological framework.
The methodology used to create the prediction model in this research is illustrated in Figure 1. The individual steps involved in the process are elaborated on in the following sections. The dataset utilized in this study is sourced from the website www.kaggle.com and pertains to customer data from Telco. The data consists of 1409 entries, each with a variety of attributes: whether the customer terminated their services within the past month (listed in the "Churn" column); the services that each customer has subscribed to, such as phone, internet, online security, and streaming TV and movies; details on the customer's account, including their length of tenure, contract, payment method, monthly charges, and total charges; and demographic information about the customers, including their gender, age range, and whether they have partners and dependents. The data preprocessing step aims to eliminate any incomplete, noisy, or untrustworthy data from the system. This process is critical in creating a forecasting model to identify customer churn behavior. The Python pandas library was utilized to accomplish this goal. One attribute, the customer ID, was dropped because it is not used in the analysis. The attributes of the dataset are described in Table 1.
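The preprocessing described above can be sketched with pandas as follows. This is a minimal illustration rather than the authors' actual script: the column names (`customerID`, `TotalCharges`, and so on), the blank string used to mark a missing value, and the four sample rows are assumptions made for the example.

```python
import pandas as pd

# Hypothetical rows mimicking a telco customer table; not real data.
raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "tenure": [1, 34, 0, 45],
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
    "MonthlyCharges": [29.85, 56.95, 52.55, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],  # " " marks a missing value
    "Churn": ["Yes", "No", "No", "No"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the identifier, remove incomplete rows, and encode the target."""
    # The customer ID carries no predictive information, so it is discarded.
    out = df.drop(columns=["customerID"])
    # Text that cannot be parsed as a number becomes NaN.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Remove incomplete records.
    out = out.dropna().reset_index(drop=True)
    # Encode the target as 1 (churned) / 0 (stayed).
    out["Churn"] = out["Churn"].map({"Yes": 1, "No": 0})
    return out


clean = preprocess(raw)
print(clean)
```

Dropping incomplete rows is the simplest option; imputing the missing values would be an alternative when the dataset is small.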

Experiment and evaluation
Each model described above underwent training and testing through a 5-fold cross-validation method, which is a fair approach in which each observation from the dataset has an equal opportunity to become part of the train and test sets [22]. Due to the small dataset, this approach is considered appropriate. The models were evaluated based on standard classification metrics, including accuracy (the proportion of all customers classified correctly), recall (the proportion of actual churners that the model identifies), precision (the proportion of predicted churners who actually churned), and F1-score (the harmonic mean of precision and recall). The results for all metrics are reported in percentages, with higher values indicating better performance [23].

Result of identifying highly correlated features
This section discusses the outcomes of comparing the efficiency of multiple algorithms in forecasting non-churners and churners in the telecommunications sector. Several algorithms were evaluated against specific metrics during the prediction process. The correlation between variables in the dataset is shown in Figure 2. An extensive analysis was conducted to understand the connection between the various attributes and the target attribute, using Pearson correlation analysis. Figure 2 presents the numerical values and total scores obtained from comparing the relationship between each variable and the target variable, 'churn.' The outcomes helped identify the highly correlated features with an absolute score of 0.5 or more. Further analysis then recognized only three out of ten attributes as having a specific correlation level with the target variable. Therefore, the remaining highly correlated features were eliminated from the dataset. In this context, the correlation between variables in the telco dataset refers to the relationship or association between variables that can be used to detect or identify customer churn behavior in the telecommunications industry. Telco datasets typically include information on phone calls, data usage, subscriptions, and customer costs, and these variables can have a significant correlation with customer churn behavior. In correlation analysis, a correlation coefficient measures the strength of the relationship between two variables. The correlation coefficient ranges from -1 to 1, with 1 indicating a perfectly positive relationship, -1 indicating a perfectly negative relationship, and 0 indicating no relationship. In customer churn analysis on telco datasets, the correlation between variables can help to identify the most influential variables in predicting customer churn behavior, and it can be used to select the most essential features for the prediction model. In addition, correlation analysis can help reduce the dataset's dimensions by eliminating highly correlated or redundant variables, thereby increasing the performance and efficiency of the prediction model.
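The Pearson-based screening described above can be sketched with pandas. The example below is an illustration under stated assumptions: the feature names and values are made up, and the selection rule (keep features whose absolute correlation with the target is at least 0.5) is one reading of the threshold mentioned in the text.

```python
import pandas as pd

# Hypothetical numeric features; values are invented for illustration only.
df = pd.DataFrame({
    "tenure":         [1, 2, 5, 8, 20, 30, 45, 60],
    "MonthlyCharges": [70, 85, 90, 75, 60, 80, 55, 65],
    "SeniorCitizen":  [0, 1, 0, 1, 0, 1, 0, 1],
    "Churn":          [1, 1, 1, 1, 0, 0, 0, 0],
})

# Pearson correlation of every feature with the target variable.
corr_with_target = df.corr(method="pearson")["Churn"].drop("Churn")

# Keep only features whose absolute correlation reaches the threshold.
THRESHOLD = 0.5
selected = corr_with_target[corr_with_target.abs() >= THRESHOLD]

print(corr_with_target.round(3))
print("selected features:", list(selected.index))
```

Because the coefficient is signed, the absolute value is compared with the threshold so that strongly negative relationships, such as long tenure going with low churn in this toy data, are retained as well.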
PhoneService | Whether the customer has a phone service or not | Boolean
8 | MultipleLines | Whether the customer has multiple lines or not | String
9 | InternetService | Customer's internet service provider (DSL, Fiber Optic, or no) | String
10 | OnlineSecurity | Whether the customer has online security or not | String
After identifying the critical attributes, the dataset was divided into two unequal sections: training and testing. These sections were then fed into different machine learning techniques, namely AdaBoost, Gradient Boost, XGBoost, CatBoost, and LightGBM, to determine the best one for creating the final forecasting model. The performance results during the forecasting process were assessed using several methods, including Accuracy, Confusion Matrix, Precision, Recall, and F1-Score, with K-fold cross-validation as the primary technique [24]. The values obtained for each evaluation metric are given in Table 2.
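The evaluation measures named above all derive from the four counts of a confusion matrix. The following dependency-free sketch shows the arithmetic; the function names and the ten example labels are hypothetical and chosen only to make the calculation easy to follow.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # churner correctly flagged
        elif pred == positive:
            fp += 1  # non-churner wrongly flagged
        elif truth == positive:
            fn += 1  # churner missed
        else:
            tn += 1  # non-churner correctly passed
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here three of the four churners are found and one non-churner is flagged by mistake, giving an accuracy of 0.8 and a precision, recall, and F1 of 0.75 each.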
The results of the machine learning algorithms used to build the different models are presented in Table 2, which shows the accuracy, recall, precision, and F1 scores. Based on the outcomes, most ensemble algorithms effectively predicted customer churn, with an accuracy rate of over 80% on the 1409 records. Among the five algorithms tested, XGBoost demonstrated the highest accuracy, with an average accuracy of 81.2%, recall of 91%, precision of 84%, and F1-Score of 88%. This confirms that XGBoost outperformed the other techniques and is the most suitable algorithm for the final predictive model. Moreover, a Grid Search CV hyperparameter search was performed with XGBoost to improve accuracy further, resulting in an accuracy of 81.2% in forecasting churning behavior.
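A grid search with cross-validation of the kind mentioned above can be sketched with scikit-learn's `GridSearchCV`. Several things here are assumptions: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs no extra packages, the data are synthetic, and the parameter grid is hypothetical because the values searched in the study are not listed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, mildly imbalanced data standing in for the churn table (assumption).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    weights=[0.73, 0.27], random_state=42,
)

# Hypothetical grid; these values are illustrative, not the study's settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

# Every parameter combination is scored by 5-fold cross-validated accuracy.
search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

If the xgboost package is installed, its `XGBClassifier` follows the scikit-learn estimator interface and could be passed as `estimator` instead, with a grid built from its own parameter names.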

Comparison of performance
In addition to the results in the tables above, K-fold cross-validation with five folds was performed to find the approach that best handles any proportion of testing and training data; the results can be seen in Figure 3. The graph illustrates the accuracy of the various algorithms for each fold, together with the mean accuracy.
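The per-fold and mean accuracies described above can be produced with a short scikit-learn sketch. It is illustrative only: the data are synthetic, and just two of the five algorithms (AdaBoost and Gradient Boost) are included because they ship with scikit-learn; XGBoost, CatBoost, and LightGBM would need their own packages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data standing in for the preprocessed churn table (assumption).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6, random_state=7
)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=7),
    "Gradient Boost": GradientBoostingClassifier(n_estimators=50, random_state=7),
}

# A fixed random_state makes every model see the same five splits.
cv = KFold(n_splits=5, shuffle=True, random_state=7)

fold_scores = {}
for name, model in models.items():
    # One accuracy value per fold; each observation is tested exactly once.
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    fold_scores[name] = scores
    per_fold = ", ".join(f"{s:.3f}" for s in scores)
    print(f"{name}: folds=[{per_fold}] mean={scores.mean():.3f}")
```

Reusing the same splitter for every model keeps the comparison fair, since differences in accuracy then come from the algorithms rather than from different partitions of the data.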
Overall, the accuracy decreases after the 2nd fold and continues declining until the 3rd fold, except for ensemble techniques demonstrating a more stable accuracy level.
Although there is a slight improvement in the accuracy of LightGBM in the 5th fold, the highest mean accuracy is achieved by XGBoost. XGBoost has been shown to be the best algorithm in the class of ensemble algorithms because it has a better level of accuracy [25]. Swetha et al., who conducted a customer churn analysis in the telecommunication industry using XGBoost, reported an accuracy of 99.6%. The XGBoost algorithm also shows a good improvement, with accuracy increasing gradually from 27.4% to 52%. Fahd Idrissi Khamlichi et al. [26] conducted a study where various standalone machine learning techniques, including Random Forest, XGBoost, SVM, Decision Tree, Logistic Regression, and KNN, were applied to a publicly available dataset containing 5000 samples. According to their findings, XGBoost was the most successful technique, achieving an accuracy of 95% and an F-measure of 80%. On this dataset, they obtained an improved accuracy and F1-score of 96.25% and 86.34%, respectively [27].
Table 4 provides a comparison of previous research with the experimental results of this study. By examining this table, researchers can gain insights into the similarities and differences between previous research and the results obtained in this study. This comparison can help researchers identify areas where further research is needed, or where new experimental approaches may be necessary to improve the accuracy of customer churn analysis. By analyzing the strengths and weaknesses of previous research, researchers can refine their experimental approaches and better understand the most effective algorithms and techniques for analyzing customer churn in telco data. Table 4 compares the results of four previous studies that carried out comparisons using the best scheme. The best scheme involves maximizing data preprocessing and optimizing the algorithm's performance by focusing on several parameters to achieve the highest accuracy possible.
As a result, previous studies have found that XGBoost is the algorithm with the best accuracy for analyzing customer churn. However, this study used the worst scheme instead of the best scheme. The worst scheme used fewer records than previous studies and applied no special treatment to the dataset: the data were processed without changing the data types of several attributes. Despite using the worst scheme, the XGBoost algorithm still outperformed the other algorithms in terms of accuracy. The experiments carried out in this study provide new insights into how XGBoost can achieve high accuracy even without the best scheme. However, it should be noted that in a real case study, it is not feasible to implement the worst scheme. The worst scheme only tests the algorithm's strength in detecting and analyzing data.
This study further strengthens previous research findings by using a different approach in the ensemble algorithm comparison. While previous studies tended to use the best scheme to improve the algorithm's accuracy, this study utilized a different scheme to identify the best of the existing ensemble algorithms. Despite using a different scheme, the results and conclusions of this study are consistent with previous studies, which have consistently found that the XGBoost algorithm is the most effective algorithm for analyzing customer churn with telco data. The findings suggest that the XGBoost algorithm is robust and can achieve high accuracy even when a less optimal scheme is utilized. These results can be valuable for businesses looking to improve customer retention and reduce customer churn. By using the XGBoost algorithm, companies can effectively predict customer churn and implement strategies to retain their customers.

Conclusion
The customer churn prediction experiment with five ensemble algorithms shows that most of the ensemble algorithms accurately predicted customer churn, achieving an accuracy rate of over 80% on the 1409 records. The dataset was divided into training and testing sections and fed into AdaBoost, Gradient Boost, XGBoost, CatBoost, and LightGBM to find the best algorithm for creating the final forecasting model. The performance of the models was evaluated using different metrics, including Accuracy, Confusion Matrix, Precision, Recall, and F1-Score, with K-fold cross-validation being the primary method. Among the five algorithms evaluated, XGBoost showed the best accuracy, with an average of 81.2%, recall of 91%, precision of 84%, and an F1-Score of 88%. This indicates that XGBoost outperformed the other algorithms and is the most appropriate algorithm to use in creating the final predictive model. Additionally, a Grid Search CV hyper-parameter search was conducted with XGBoost to further enhance accuracy, resulting in an 81.2% accuracy rate in forecasting churning behavior. The final attempt resulted in the successful creation of a highly practical predictive model. This effort provided the advantage of accurately predicting the probability of customer churn. The model can be expanded further by creating a combined churn prediction model for the telecommunications industry. It can be regarded as the most suitable prediction model and can be used freely in various other companies.
It was also observed that executing hyperparameter tuning before conducting cross-validation can improve the accuracy of ensemble techniques.