Optimizing Sequential Forward Selection on Classification using Genetic Algorithm

Knitchepon Chotchantarakun

Abstract


With the digital transformation of modern technologies, the amount of data has increased significantly, driving novel knowledge discovery techniques in Data Analytics and Data Mining. These data usually contain noise or non-informative features that degrade analysis results. Approaches for eliminating such features, studied extensively over the past few decades, are known as feature selection. Feature selection is a significant preprocessing step of the mining process that selects only the informative features from the original feature set; these selected features improve the efficiency of the learning model. This study proposes a sequential forward feature selection method called Forward Selection with Genetic Algorithm (FS-GA). FS-GA consists of three major steps. First, it creates the preliminary selected subsets. Second, it improves on the previous subsets. Third, it optimizes the selected subset using the genetic algorithm. Hence, it maximizes the classification accuracy during feature addition. We performed experiments on ten standard UCI datasets using three popular classification models: the Decision Tree, Naive Bayes, and K-Nearest Neighbour classifiers. The results are compared with state-of-the-art methods. FS-GA achieved the best results against the other sequential forward selection methods on all tested datasets, with O(n²) time complexity.
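The three steps described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the paper's implementation: the `evaluate` function is a hypothetical stand-in for classifier accuracy (in the paper this would be, e.g., k-NN cross-validation accuracy on a UCI dataset), and the per-feature `RELEVANCE` scores, population size, and mutation rate are assumed values chosen only for the demonstration.

```python
import random

random.seed(0)

# Hypothetical per-feature utility scores; a stand-in for real data.
RELEVANCE = [0.9, 0.1, 0.7, 0.05, 0.6, 0.02]

def evaluate(subset):
    """Toy fitness: reward relevant features, penalize subset size.
    In FS-GA this role is played by classifier accuracy."""
    if not subset:
        return 0.0
    return sum(RELEVANCE[i] for i in subset) - 0.05 * len(subset)

def forward_selection(n_features):
    """Steps 1-2: greedily add the feature that most improves fitness."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in set(range(n_features)) - set(selected):
            score = evaluate(selected + [f])
            if score > best:
                best, pick, improved = score, f, True
        if improved:
            selected.append(pick)
    return selected

def ga_refine(candidates, generations=30, pop_size=20, p_mut=0.1):
    """Step 3: genetic algorithm over bitmasks of the candidate features."""
    n = len(candidates)

    def decode(mask):
        return [candidates[i] for i in range(n) if mask[i]]

    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: keep the fitter half of the population.
        pop.sort(key=lambda m: evaluate(decode(m)), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, n)                # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (random.random() < p_mut) for g in child]  # mutation
            children.append(child)
        pop = elite + children
    return decode(max(pop, key=lambda m: evaluate(decode(m))))

selected = forward_selection(len(RELEVANCE))
refined = ga_refine(selected)
print("forward selection:", selected, "after GA:", sorted(refined))
```

The GA stage can only ever return a subset of the forward-selected candidates, so it acts as a refinement pass that may prune features whose greedy addition no longer pays off jointly.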


References


Zeng, Z., Zhang, H., Zhang, R. and Zhang, Y. (2014). Hybrid Feature Selection Method based on Rough Conditional Mutual Information and Naïve Bayesian Classifier, Hindawi Publishing Corporation, ISRN Applied Mathematics.

https://doi.org/10.1155/2014/382738

Somol, P., Pudil, P. and Kittler, J. (2004). Fast Branch & Bound Algorithms for Optimal Feature Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7), pp. 900-912.

https://doi.org/10.1109/tpami.2004.28

Nakariyakul, S. and Casasent, D. P. (2007). Adaptive branch and bound algorithm for selecting optimal features, Pattern Recognition Letters, 28, pp. 1415-1427.

Cai, J., Luo, J., Wang, S. and Yang, S. (2018). Feature selection in machine learning: A new perspective, Neurocomputing, pp. 70-79.

https://doi.org/10.1016/j.neucom.2017.11.077

Chandrashekar, G. and Sahin, F. (2014). A survey on feature selection methods, Computers and Electrical Engineering, 40, pp. 16-28.

Sutha, K. and Tamilselvi, J. J. (2015). A Review of Feature Selection Algorithms for Data Mining Techniques, International Journal on Computer Science and Engineering (IJCSE), pp. 63-67.

Jovic, A., Brkic, K. and Bogunovic, N. (2015). A review of feature selection methods with applications, International Convention on Information and Communication Technology.

Pudil, P., Novovicova, J. and Kittler, J. (1994). Floating search methods in feature selection, Pattern Recognition Letters, pp. 1119-1125.

https://doi.org/10.1016/0167-8655(94)90127-9

Pavya, K. and Srinivasan, B. (2017). Feature Selection Techniques in Data Mining: A Study, International Journal of Scientific Development and Research (IJSDR), 2(6), pp. 594-598.

Whitney, A. W. (1971). A Direct Method of Nonparametric Measurement Selection, IEEE Transactions on Computers, pp. 1100-1103.

https://doi.org/10.1109/t-c.1971.223410

Somol, P., Pudil, P., Novovicova, J. and Paclik, P. (1999). Adaptive floating search methods in feature selection, Pattern Recognition Letters, pp. 1157-1163.

Nakariyakul, S. and Casasent, D. P. (2009). An improvement on floating search algorithms for feature subset selection, Pattern Recognition, pp. 1932-1940.

Lv, J., Peng, Q. and Sun, Z. (2015). A modified sequential deep floating search algorithm for feature selection, International Conference on Information and Automation, pp. 2988-2993.

Pudil, P., Ferri, F. J., Novovicova, J. and Kittler, J. (1994). Floating Search Methods for Feature Selection with Nonmonotonic Criterion Functions, Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 279-283.

https://doi.org/10.1109/icpr.1994.576920

Chotchantarakun, K. and Sornil, O. (2021). An Adaptive Multi-levels Sequential Feature Selection, International Journal of Computer Information Systems and Industrial Management Applications (IJCISIM), 13, pp. 010-019.

Chotchantarakun, K. and Sornil, O. (2021). Adaptive Multi-level Backward Tracking for Sequential Feature Selection, Journal of ICT Research and Applications, 15, pp. 1-20.

https://doi.org/10.5614/itbj.ict.res.appl.2021.15.1.1

Holland, J. H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press: Cambridge, MA.

El-Shafiey, M. G., Hagag, A., El-Dahshan, E. A. and Ismail, M. A. (2022). A hybrid GA and PSO optimized approach for heart-disease prediction based on random forest, Multimedia Tools and Applications, 81, pp. 18155-18179.

Homsapaya, K. and Sornil, O. (2017). Improving Floating Search Feature Selection using Genetic Algorithm, Journal of ICT Research and Applications, 11(3), pp. 299-317.

Ileberi, E., Sun, Y. and Wang, Z. (2022). A machine learning based credit card fraud detection using the GA algorithm for feature selection, Journal of Big Data, 9(24).

https://doi.org/10.1186/s40537-022-00573-8

Aswal, S., Jyothi, A. and Mehra, R. (2023). Feature Selection Method Based on Honeybee-SMOTE for Medical Data Classification. Informatica, 46(9), pp. 111-118. https://doi.org/10.31449/inf.v46i9.4098

Alija, S., Beqiri, E., Gaafar, A. S. and Hamoud, A. K. (2023). Predicting Students Performance Using Supervised Machine Learning Based on Imbalanced Dataset and Wrapper Feature Selection. Informatica, 47(1), pp. 11-20.

https://doi.org/10.31449/inf.v47i1.4519

Al-jadir, I., Wong, K. W., Fung, C. C. and Xie, H. (2017). Text Document Clustering Using Memetic Feature Selection, Proceedings of the 9th International Conference on Machine Learning and Computing (ICMLC), pp. 415-420.

https://doi.org/10.1145/3055635.3056603

Panda, D., Panda, D., Dash, S. R. and Parida, S. (2021). Extreme Learning Machines with Feature Selection Using GA for Effective Prediction of Fetal Heart Disease: A Novel Approach. Informatica, 45(3), pp. 381-392.

https://doi.org/10.31449/inf.v45i3.3223

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), Irvine, CA: University of California, School of Information and Computer Science.

https://archive.ics.uci.edu/datasets




DOI: https://doi.org/10.31449/inf.v46i9.4964

This work is licensed under a Creative Commons Attribution 3.0 License.