Stock Market Decision Support Modeling with Tree-Based Adaboost Ensemble Machine Learning Models

Forecasting stock market behavior has received tremendous attention from investors and researchers for a very long time due to its potential profitability. Predicting stock market behavior is regarded as one of the most challenging applications of time series forecasting. While opinion is divided on the efficiency of markets, numerous widely accepted empirical studies have shown that the stock market is predictable to some extent. Both statistical methods and machine learning models are used to forecast and analyze the stock market. Machine learning (ML) models typically perform better than statistical and econometric models, and the performance of ensemble ML models is typically superior to that of individual ML models. In this paper, we study and compare the efficiency of tree-based ensemble ML models (namely, the Bagging classifier, Random Forest (RF), Extra Trees classifier (ET), AdaBoost of Bagging (ADA_of_BAG), AdaBoost of Random Forest (ADA_of_RF), and AdaBoost of Extra Trees (ADA_of_ET)). Stock data randomly collected from three different stock exchanges were used for the study. Forty technical indicators were computed and used as input features. The data set was split into training and test sets. The performance of the models was evaluated on the test set using accuracy, precision, recall, F1-score, specificity, and AUC metrics. The Kendall W test of concordance was used to rank the performance of the different models. The experimental results indicated that the AdaBoost of Bagging (ADA_of_BAG) model was the best performer among the tree-based ensemble models studied. In addition, boosting improved the performance of the bagging ensemble models.


Introduction
Forecasting stock market behavior has received tremendous attention from investors and researchers for a very long time due to its potential profitability (Bacchetta et al., 2009; Campbell & Hamao, 1992; Granger & Morgenstern, 1970; Lin et al., 2009; Rajashree & Pradipta, 2016; Weng et al., 2018). It offers investors the opportunity to be proactive and take knowledge-driven decisions in order to gain good returns on their investments with less risk. Predicting stock market behaviour is regarded as one of the most challenging applications of time series forecasting. The stock market is affected by many factors, such as economic policies, government decrees, political situations, the psychology of investors, and so on (Tan et al., 2007). These factors give the market a very dynamic, nonlinear, complex, nonparametric, and chaotic nature (Abu-Mostafa & Atiya, 1996). While opinion is divided on the efficiency of markets, numerous widely accepted empirical studies have shown that the stock market is predictable to some extent (Bollerslev et al., 2014; Chen et al., 2003; Feuerriegel & Gordon, 2018; Kim et al., 2011; Phan et al., 2015). Both statistical methods and machine learning models are used to forecast and analyze the stock market. The statistical approaches are not able to predict the stock market very well due to the chaotic, noisy, and nonlinear nature of the market. In contrast, machine learning methods are able to deal with the dynamic, chaotic, noisy, and nonlinear data of the stock market and have been widely used for more accurate forecasting of the stock market. Ensemble models create several individual models to make predictions and then aggregate the outcomes of the individual models into a final prediction. The performance of ensemble models is better than that of individual models because ensemble models reduce the generalization error of the predictions.
Informatica 44 (2020) 477-489, E. K. Ampomah et al.

The dominance of ensemble models over individual models has been demonstrated in the field of financial expert systems (Chen et al., 2007; Huang et al., 2008; Tsai et al., 2011). Hence, in this work, we study and compare the effectiveness of tree-based bagging ensemble machine learning models and the impact of boosting on the tree-based bagging ensemble models. Specifically, the study compares the effectiveness of the following classifiers in forecasting one-day-ahead stock price movement: Random Forest classifier (RF), Bagging classifier (BAG), Extra Trees classifier (ET), AdaBoost of Random Forest classifier (ADA_of_RF), AdaBoost of Bagging classifier (ADA_of_BAG), and AdaBoost of Extra Trees classifier (ADA_of_ET).

Related studies
There have been a number of research studies on forecasting stock market behavior with machine learning algorithms. In this section, we provide a review of some of these studies. Tsai et al. (2011) studied the performance of ensemble classifiers in analyzing stock returns. They considered the hybrid approaches of majority voting and bagging, and compared the performance of homogeneous and heterogeneous ensemble classifiers with that of single baseline classifiers (decision trees, neural networks, and logistic regression). The experimental results indicated that the ensemble classifiers outperformed the single classifiers in terms of prediction. In terms of prediction accuracy, there was no significant difference between majority voting and bagging; however, majority voting yielded better stock returns than bagging. Finally, the homogeneous neural network ensemble classifiers produced the best performance under majority voting when predicting stock returns. Huang et al. (2008) applied a wrapper approach to select a subset of optimal features from an initial feature set of 23 technical indices and then employed an ensemble voting scheme that combines different classifiers to forecast the trend in the Korea and Taiwan stock markets. The experimental outcome showed that the wrapper approach produced better performance than commonly used feature filters, including the chi-square statistic, information gain, ReliefF, symmetrical uncertainty, and CFS. In addition, the proposed ensemble voting scheme performed better than single classifiers such as SVM, k-nearest neighbor, back-propagation neural network, decision tree, and logistic regression. Lunga & Marwala (2006) investigated the predictability of the direction of stock market movement with the Learn++ algorithm by predicting the daily movement direction of the Dow Jones. The Learn++ algorithm is derived from the AdaBoost algorithm. The framework was implemented with a multi-layer perceptron (MLP) as the weak learner.
Initially, a weak learning algorithm, which attempts to learn a class concept with a single input perceptron, is established. The Learn++ algorithm is applied to improve the learning capacity of the weak MLP and introduces the concept of online incremental learning. The proposed framework can adapt as new data are introduced and is able to classify new instances. Ballings et al. (2015) compared the performance of ensemble classifier models (Random Forest, AdaBoost, and Kernel Factory) against individual classifier models (neural networks, logistic regression, SVM, and k-nearest neighbor). They used data from 5,767 publicly listed European companies and the AUC metric to evaluate the models. The experimental results indicated that Random Forest was the best performer, with SVM, Kernel Factory, AdaBoost, neural networks, k-nearest neighbors, and logistic regression following in that order. Nayak et al. (2016) made an attempt to predict stock market trends. Two models, one for daily prediction and the other for monthly prediction, were built. Three supervised machine learning algorithms, namely boosted decision tree, support vector machine, and logistic regression, were used. With the daily prediction model, historical stock price data were combined with sentiment data. An accuracy of up to 70% was observed using the supervised machine learning algorithms on the daily prediction model, and the boosted decision tree performed better than the support vector machine and logistic regression. The monthly prediction models were used to evaluate the similarity between the trends of any two different months. The evaluation demonstrated that the trend of one month was least correlated with the trends of other months. Khan et al. (2020) employed machine learning algorithms on social media and financial news data to establish the influence of these data on stock market prediction accuracy over ten subsequent days.
In order to improve the performance and quality of predictions, the authors performed feature selection and spam-tweet reduction on the data sets. In addition, experiments were carried out to determine the stock markets that are difficult to predict and those that are more influenced by social media and financial news. The results of different algorithms were compared to find a consistent classifier, deep learning was used, and some classifiers were ensembled. The experimental outcome showed that the highest prediction accuracies of 80.53% and 75.16% were attained using social media and financial news, respectively. The results also showed that the New York and Red Hat stock markets are difficult to predict, that the New York and IBM stocks are strongly influenced by social media, and that the London and Microsoft stocks are strongly influenced by financial news. The Random Forest classifier proved to be consistent and, ensembled, provided the highest accuracy of 83.22%; this suggests that studies on stock market direction prediction ought to include ensemble techniques in their sets of algorithms. Vijha et al. (2020) utilized artificial neural network and random forest techniques to predict the next-day closing price for five companies belonging to different sectors of operation. The authors generated new variables from the financial data (the open, high, low, and close prices of stocks) and used them as inputs to the models. The evaluation of the models was done using the standard RMSE and MAPE metrics.

Method
The stock data were subjected to (i) data cleaning, to deal with missing and erroneous values, and (ii) data normalization, to ensure that the machine learning models perform well. Each data set was split into training and test sets for the purpose of this experiment. The training set was made up of the initial 70% of the data set, and the final 30% constituted the test set. Each model was trained on the training set and evaluated on the test set.
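As a minimal sketch of this chronological split (not the authors' exact code; the function name and data here are illustrative assumptions), the key point is that the data are not shuffled, so the test set always lies after the training period:

```python
import numpy as np

def chronological_split(X, y, train_frac=0.7):
    """Split time-ordered data: the first 70% forms the training set,
    the final 30% the test set. No shuffling is performed."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Toy time-ordered data standing in for the indicator matrix and labels.
X = np.arange(100).reshape(-1, 1)
y = (np.arange(100) % 2 == 0).astype(int)
X_train, X_test, y_train, y_test = chronological_split(X, y)
```

Avoiding a random split matters for stock data: a shuffled split would let the model train on observations that occur after some test observations, leaking future information.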

Data and features
For this research study, we randomly collected ten different stock data sets from three different stock markets (namely NYSE, NASDAQ, and NSE) through the Yahoo Finance API. The companies from which the data were collected are presented in Table 1.

Feature scaling
The input features have different ranges of values. Hence, we apply standardization scaling (z-score) to bring all the input features within the same range. The z-score centres values around the mean with a unit standard deviation. Scaling the input features ensures that larger-valued features do not overwhelm smaller-valued inputs, and also helps minimize prediction errors (Kim, 2003).
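A sketch of z-score standardization with scikit-learn's `StandardScaler` (the synthetic data here is an assumption standing in for the technical-indicator features); note that the scaler should be fitted on the training set only, then applied to both sets, to avoid leaking test-set statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=10, size=(70, 3))  # stand-in indicator features
X_test = rng.normal(loc=50, scale=10, size=(30, 3))

# Fit on the training data only, then transform both splits.
scaler = StandardScaler().fit(X_train)
X_train_z = scaler.transform(X_train)
X_test_z = scaler.transform(X_test)
```

After transformation each training feature has zero mean and unit standard deviation; the test features are scaled with the training-set statistics, so they are close to but not exactly standardized.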

Machine learning algorithms
The study considered and compared the efficacy of the Random Forest classifier (RF), Bagging classifier (BAG), Extra Trees classifier (ET), AdaBoost of Random Forest (ADA_of_RF), AdaBoost of Bagging (ADA_of_BAG), and AdaBoost of Extra Trees (ADA_of_ET) models in forecasting one-day-ahead stock price movement. A discussion of these machine learning (ML) algorithms is presented here.

AdaBoost algorithm
AdaBoost is an ensemble/meta-learning approach that builds a strong classifier as a linear combination of weak classifiers in an iterative way. In every iteration, it calls a weak learning algorithm (the base learner), which returns a classifier, and assigns a weight coefficient to it. AdaBoost tweaks subsequent base learners in favor of the instances misclassified by preceding classifiers. The outcomes of the weak learners are aggregated into a weighted sum that represents the final output of the boosted classifier, which is decided by a weighted "vote" of the base classifiers. The smaller the error of a base classifier, the larger its weight in the final vote (Freund & Schapire, 1996). AdaBoost is sensitive to outliers and noisy data. The AdaBoost ML algorithm is given as Algorithm 1 below.
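The iterative reweighting described above can be sketched with scikit-learn's `AdaBoostClassifier` (a minimal illustration on synthetic data, not the paper's exact configuration; by default the weak learner is a depth-1 decision tree, a "stump"):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic binary classification data standing in for up/down labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 50 boosting rounds; each round reweights training instances so that
# the next weak learner focuses on previously misclassified examples.
clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X[:350], y[:350])
acc = clf.score(X[350:], y[350:])
```

The fitted model exposes `estimator_weights_`, the per-round vote weights: rounds with lower weighted error receive larger weights, exactly as described above.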

Decision tree algorithm
A decision tree is a hierarchical tree structure that is used to determine the class label of instances based on a series of if-then rules about the features/attributes of the class. A decision tree consists of nodes (root, internal, and leaf) and branches. The root and internal nodes specify a test condition on a feature, each branch represents one of the possible values of the feature, and each leaf node contains a class label. To classify an instance, we start from the root node, apply the test condition to the instance, and follow the branch with the value corresponding to the test outcome. This takes us either to an internal node, at which another test condition is applied, or to a leaf node.
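The if-then structure of a fitted tree can be made visible with scikit-learn's `export_text` (an illustrative sketch on the Iris data set, which is not one of the paper's data sets):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text prints the learned rules: one test condition per root/internal
# node, one class label per leaf, matching the structure described above.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each printed line corresponds to a node: internal lines show a feature threshold test, and leaf lines show the predicted class.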

Bagging algorithm
A Bagging classifier is an ensemble classifier that generates multiple base learners (decision trees), fits each of them on a random subset of the initial dataset, and then combines their individual predictions (through voting or averaging) to produce a final prediction. All the base learners are trained in parallel on new training sets generated by randomly drawing N samples with replacement from the original training dataset, where N is the size of the original training set. The training sets of the base learners are independent of one another. Since the training set for each base learner is generated by resampling the initial training set with replacement, some instances may appear many times while others may not appear at all. If perturbing the training set can cause significant changes in the models built, then bagging can increase accuracy (Breiman, 1996). Bagging is less sensitive to outliers and noise, and has a parallel structure that allows efficient implementations. It is a technique that reduces the variance of an estimated prediction function.
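A minimal sketch of bagging with scikit-learn's `BaggingClassifier` on synthetic data (an assumption, not the paper's data). Because each bootstrap leaves some instances out, the out-of-bag (OOB) samples give a built-in validation estimate:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=1)

# Each of the 100 trees is fit on N samples drawn with replacement (a bootstrap);
# oob_score=True evaluates every tree on the instances it never saw.
bag = BaggingClassifier(n_estimators=100, oob_score=True, random_state=1)
bag.fit(X[:400], y[:400])
```

The default base learner is a decision tree, matching the description above; `bag.oob_score_` reports accuracy on the left-out instances without touching the test set.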

Random forest algorithm
Random Forest constructs an ensemble of de-correlated trees and aggregates them to improve upon the robustness and performance of the decision trees (Breiman, 2001). Each tree is trained with a bootstrap sample from the original training data. In addition, a subset of features is selected randomly from the full set of original features to grow the tree at each node. To establish the class label of a new instance, each decision tree delivers a class label for the instance, and the random forest then aggregates the predicted class labels and selects the most voted prediction as the label for the new instance. Since RF searches for the best feature among a random subset of features, it leads to a wide diversity that generally produces a better model. RF can handle large input datasets.
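The per-node feature subsampling can be sketched with scikit-learn's `RandomForestClassifier` (illustrative synthetic data, not the paper's stock data); `max_features="sqrt"` means each split considers a random subset of roughly sqrt(20) of the 20 features, which de-correlates the trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=5, random_state=7)

# 200 trees, each grown on a bootstrap sample; every split examines only a
# random sqrt-sized feature subset rather than all 20 features.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=7)
rf.fit(X, y)
importances = rf.feature_importances_  # normalized to sum to 1
```

A useful by-product is `feature_importances_`, which in a setting like this paper's would rank the forty technical indicators by their contribution to the splits.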

Extra trees algorithm
The Extra Trees algorithm is a tree-based ensemble machine learning algorithm. ET constructs an ensemble of base learners (decision trees) using the classical top-down procedure, and the predictions of all the trees are combined into the final prediction through a majority vote. ET is similar to RF in that it constructs the trees and splits nodes with random subsets of features. However, ET differs from RF on two main counts: (i) ET uses the entire training data to grow the trees (instead of a bootstrap replica), and (ii) ET splits nodes by selecting split-points fully at random. The randomization of the cut-point and features together with ensemble averaging reduces variance, while the use of the entire original training sample minimizes bias (Geurts, et al, 2006). ET is computationally efficient.
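The two differences from RF are visible directly in scikit-learn's `ExtraTreesClassifier` defaults (a sketch on assumed synthetic data): `bootstrap=False`, so each tree sees the full training set, and split thresholds are drawn at random rather than optimized:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=3)

# bootstrap defaults to False: every tree is grown on the entire training
# sample, and cut-points are chosen fully at random at each node.
et = ExtraTreesClassifier(n_estimators=100, random_state=3)
et.fit(X[:400], y[:400])
acc = et.score(X[400:], y[400:])
```

Because no per-split threshold search is performed, ET is typically cheaper to train than RF for the same number of trees, which is the computational-efficiency point made above.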

Hyperparameter optimization
Machine learning algorithms have a set of hyperparameters, and these hyperparameters determine how the model is structured. Our aim is to find the combination of values for these hyperparameters that ensures the machine learning models perform at their best. In this work, we set the hyperparameters of the various machine learning algorithms using the Bayesian hyperparameter optimization technique.

Evaluation metric
The following classical quality evaluation metrics are used to evaluate the performance of the tree-based AdaBoost ensemble ML models: accuracy, precision, recall, F1-score, specificity, and AUC. The AUC measures a model's ability to discriminate between positive and negative instances; the worst AUC is 0.5, and the best AUC is 1.0.
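These metrics can all be computed with `sklearn.metrics` (a sketch on hand-made toy predictions, not the paper's results); specificity has no dedicated helper, so it is derived from the confusion matrix as TN / (TN + FP):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                     # actual up/down labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                     # predicted labels
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]     # predicted P(up), for AUC

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "accuracy":    accuracy_score(y_true, y_pred),
    "precision":   precision_score(y_true, y_pred),
    "recall":      recall_score(y_true, y_pred),
    "f1":          f1_score(y_true, y_pred),
    "specificity": tn / (tn + fp),   # true-negative rate
    "auc":         roc_auc_score(y_true, y_score),
}
```

Note that AUC is computed from the predicted probabilities (`y_score`), not from the hard labels, since it summarizes ranking quality across all thresholds.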

Results and discussion
The performances of the different tree-based ensemble ML models on the stock data sets are summarized and discussed in this section. Overall, the mean accuracy value of ADA_of_BAG was the best among all the tree-based ensemble algorithms. Boosting of the bagging algorithms (ADA_of_BAG, ADA_of_RF, and ADA_of_ET) improved the mean accuracy values of their respective bagging algorithms (BAG, RF, and ET). Figure 1 presents the box plot of the accuracy values of the various models.

Table 3 presents the F1-scores of the tree-based ensemble models on the various stock data. ADA_of_BAG obtained the highest F1-score on the AAPL, S&P_500, BAC, and HPCL stock data sets. Also, BAG recorded the highest F1-scores on the KMX and TATASTEEL stock data sets. ADA_of_RF achieved the best F1-score on the ABT stock data set. In general, the mean F1 value of ADA_of_BAG was the best among all the tree-based ensemble algorithms. In addition, boosting of the bagging algorithms (ADA_of_BAG, ADA_of_RF, and ADA_of_ET) improved the mean F1 values of their respective base bagging algorithms (BAG, RF, and ET). Figure 2 presents the box plot of the F1-scores of the various models.

Table 4 shows the specificity results of the tree-based ensemble models on the various stock data. ADA_of_BAG had the highest specificity on the ABT, S&P_500, and HPCL stock data sets. Also, BAG obtained the highest specificity on the KMX, TATASTEEL, and BAC stock data sets. ADA_of_RF achieved the highest specificity on the ABT stock data set. The mean specificity value of ADA_of_BAG was the best among all the tree-based ensemble algorithms. Moreover, boosting of the bagging algorithms (ADA_of_BAG, ADA_of_RF, and ADA_of_ET) improved the mean specificity results of their respective base bagging algorithms (BAG, RF, and ET). Figure 3 presents the box plot of the specificity results of the various models.

Table 5 presents the AUC results of the tree-based ensemble models on the various stock data.
ADA_of_BAG performed better than the other models on the AAPL, ABT, S&P_500, BAC, and HPCL stock data sets. Similarly, the performance of BAG was higher than that of the other models on the KMX and TATASTEEL stock data sets. In general, the mean AUC of ADA_of_BAG was the best among all the tree-based ensemble algorithms. In addition, boosting of the bagging algorithms BAG and RF (ADA_of_BAG and ADA_of_RF) recorded better mean AUC values than their respective base bagging algorithms (BAG and RF). Figure 4 shows the box plot of the AUC results of the various models. Figures 5-11 show the ROC curves of all the tree-based ensemble models considered in this study on the AAPL, ABT, KMX, S&P_500, TATASTEEL, HPCL, and BAC stock data sets, respectively.

Kendall's coefficient of concordance (W) is applied to rank the efficiency of the different tree-based AdaBoost ensemble models. This test is a measure that applies ranks to establish an agreement among raters (Kendall & Babington, 1939). It determines the agreement among diverse raters who are evaluating a given set of n objects. Depending on the area where it is applied, the raters can be variables, characters, and so on. The raters in this article are the different data sets. Kendall's coefficient of concordance has been applied in many studies, including "Kendall's Coefficient of Concordance for Sociometric Rankings with Self Excluded" by Gordon et al. (1971), "Use of Kendall's coefficient of concordance to assess agreement among observers of very high-resolution imagery" by Gearhart et al. (2013), and "Measuring and testing interdependence among random vectors based on Spearman's ρ and Kendall's τ" by Zhang & Wang (2020). In this study, a cut-off value of 0.05 for the significance level (p-value) is used. Kendall's coefficient is considered significant, and capable of giving an overall ranking, when p < 0.05. At p = 0.05, the critical value of chi-square (χ2) for five (5) degrees of freedom is 11.07.
The degrees of freedom equal the total number of ML algorithms (six) minus one. The results of Kendall's coefficient of concordance are given in Tables 6-9 below using the accuracy, F1-score, specificity, and AUC metrics, respectively. Table 6 shows that Kendall's coefficient using the accuracy metric is significant (p < 0.05, χ2 > 11.07) and that the performance of the ADA_of_BAG model is the best among the ensemble methods. The overall ranking is ADA_of_BAG > BAG > ADA_of_RF > RF > ADA_of_ET > ET. Table 7 shows that Kendall's coefficient using the F1-score metric is significant (p < 0.05, χ2 > 11.07) and that the ADA_of_BAG model performs best among the ML ensemble models. The overall ranking is ADA_of_BAG > BAG > ADA_of_RF > RF > ET > ADA_of_ET. Table 8 shows that Kendall's coefficient using the specificity metric is significant (p < 0.05, χ2 > 11.07), with ADA_of_BAG having the highest rank. The overall ranking is ADA_of_BAG > BAG > ADA_of_RF > RF = ADA_of_ET > ET. Table 9 shows that Kendall's coefficient using the AUC metric is significant (p < 0.05, χ2 > 11.07) and that the ADA_of_BAG model has the best rank.
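Kendall's W and its chi-square statistic are straightforward to compute from a rank matrix (a sketch in plain numpy, assuming untied ranks; the perfect-agreement example below is illustrative, not the paper's data):

```python
import numpy as np

def kendalls_w(ranks):
    """Kendall's coefficient of concordance.

    ranks: (m, n) array -- m raters (here, the data sets) each ranking
    n objects (here, the six ensemble models) from 1..n, without ties.
    Returns (W, chi-square); compare chi-square against the critical
    value for n-1 degrees of freedom (11.07 for df=5 at p=0.05).
    """
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()   # deviation of rank sums
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w
    return w, chi2

# Perfect agreement: 7 raters all rank 6 objects identically -> W = 1.
ranks = np.tile(np.arange(1, 7), (7, 1))
w, chi2 = kendalls_w(ranks)
```

With seven data sets ranking six models, perfect agreement yields chi2 = 7 x 5 x 1 = 35, well above the 11.07 critical value, which is the significance test applied in Tables 6-9.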

Conclusion
This study compared the efficacy of tree-based bagging ensemble machine learning models and boosting of tree-based bagging machine learning models in forecasting the direction of movement of stock prices. Seven randomly collected stock data sets from three different stock exchanges were used. The data sets were split into training and test sets. The performance of the models was evaluated using accuracy, F1-score, specificity, and AUC metrics on the test data set. The Kendall W test of concordance was used to rank the performance of the different models. The results indicated that boosting of tree-based bagging ensemble models improves the performance of the bagging models. Overall, the performance of the ADA_of_BAG model was superior to that of the remaining models used in the study. The limitation of this study is that it only considered bagging models and boosting of bagging models. Hence, future studies will investigate boosting models and bagging of boosting models in predicting stock price behaviour.

Table 9: Kendall's coefficient of concordance ranks of tree-based ensemble models using the AUC metric.

Volume Indicator Description
Chaikin A/D Line (ADL): Estimates the advance/decline of the market.
Chaikin A/D Oscillator (ADOSC): An indicator of another indicator; created by applying MACD to the Chaikin A/D Line.
On Balance Volume (OBV): Uses volume flow to forecast changes in the price of a stock.
Log Return: The log return for a period of time is the sum of the log returns of the partitions of that period. It assumes that returns are compounded continuously rather than across sub-periods.
Percentage Price Oscillator (PPO): Computes the difference between two moving averages as a percentage of the larger moving average.
Rate of Change (ROC): Measures the percentage change of the current price with respect to the closing price n periods ago.
Relative Strength Index (RSI): Determines the strength of the current price in relation to preceding prices.
Stochastic (STOCH): Measures momentum by comparing the closing price of a security with its earlier trading range over a specific period of time.
Stochastic Relative Strength Index (STOCHRSI): Used to estimate whether a security is overbought or oversold; measures RSI relative to its own high/low range over a specified period.
Ultimate Oscillator (ULTOSC): Estimates the price momentum of a security across different time frames.

Williams' %R (WILLR): Indicates the position of the last closing price relative to the highest and lowest prices over a time period.

Price Transform Indicator Description
Median Price (MEDPRICE): Measures the mid-point of each day's high and low prices.
Typical Price (TYPPRICE): Measures the average of each day's prices.
Weighted Close Price (WCLPRICE): The average of each day's prices with extra weight given to the closing price.