A Consolidated Tree Structure Combining Multiple Regression Trees With Varying Depths, Resulting in an Efficient Ensemble Model

Elmira Ashoor Mahani, Koorush Ziarati

Abstract


Regression is a commonly used technique for predicting a continuous target value from a set of input features. Decision trees are hierarchical models that offer high interpretability and fast, precise reasoning, and they are also used for regression tasks. However, determining the optimal stopping conditions for a decision tree is a complex problem that has attracted significant research interest. Ensemble-based modeling is an effective way to handle hyper-parameters: instead of searching for the single best value, base models built with different parameter values are combined. Random forests are a classic example of an ensemble model that combines decision trees generated from different perspectives. This paper proposes a novel approach that generates all base trees with the same tree-generation procedure but with different stopping conditions. Unlike a random forest, the resulting ensemble can be efficiently consolidated into a single tree structure. The paper also proposes several aggregation methods based on weighting the base models. Experimental results on standard datasets show that the proposed method outperforms trees built with well-known stopping conditions.
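
The consolidation idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the stopping condition being varied is the maximum depth, that splits are chosen greedily by squared error, and that base models are combined by a weighted average with uniform default weights. All names (`Node`, `build`, `path_values`, `predict_depth`, `predict_ensemble`) are hypothetical. Under these assumptions a tree grown to depth d is a prefix of the tree grown to any larger depth, so one deep tree that stores a mean at every node can stand in for all of the shallower base trees.

```python
from statistics import mean


class Node:
    """A regression-tree node; every node (not only leaves) stores a mean."""
    def __init__(self, value):
        self.value = value        # mean target of the training rows reaching this node
        self.feature = None
        self.threshold = None
        self.left = None
        self.right = None


def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build(X, y, max_depth, depth=0):
    """Greedy regression tree; max_depth is the only stopping condition varied here."""
    node = Node(mean(y))
    if depth >= max_depth or len(set(y)) == 1:
        return node
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:              # no feature has two distinct values
        return node
    _, node.feature, node.threshold, left, right = best
    node.left = build([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1)
    node.right = build([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1)
    return node


def path_values(root, x):
    """Node means along the root-to-leaf path followed by x."""
    values, node = [], root
    while True:
        values.append(node.value)
        if node.left is None:
            return values
        node = node.left if x[node.feature] <= node.threshold else node.right


def predict_depth(root, x, d):
    """Prediction of the depth-d base tree, read from the single deep tree."""
    values = path_values(root, x)
    return values[min(d, len(values) - 1)]


def predict_ensemble(root, x, depths, weights=None):
    """Weighted average over base trees; uniform weights are an assumption."""
    weights = weights or [1.0] * len(depths)
    values = path_values(root, x)
    preds = [values[min(d, len(values) - 1)] for d in depths]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)
```

With this layout, one traversal of the deep tree yields the predictions of every base tree at once, so the ensemble costs roughly the same at prediction time as a single tree. Other stopping conditions, such as a minimum node size, would need their own bookkeeping and are not covered by this sketch.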




DOI: https://doi.org/10.31449/inf.v47i9.3844

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.