Cancer Classification through Gene Selection Using the Social Spider Optimization Algorithm
Abstract
Cancer is a leading cause of global mortality, underscoring the need for advanced diagnostic tools to enable early and accurate detection. Microarray technology allows for the simultaneous analysis of thousands of genes, offering valuable insights into cancer biology. However, the high dimensionality of microarray data presents significant challenges for classification tasks. In this study, we propose a novel approach that integrates the Social Spider Optimization (SSO) algorithm with mutual information-based feature selection to identify the most discriminative genes for cancer classification. We evaluate the performance of four machine learning classifiers, namely Decision Tree (DT), K-Nearest Neighbors (K-NN), Neural Networks (NN), and Support Vector Machines (SVM), with and without feature selection. Our results demonstrate that the SSO algorithm significantly enhances classification accuracy, with SVM achieving near-perfect performance on leukemia and lymphoma datasets when combined with Max-Relevance Min-Redundancy (MRMR) feature selection. This hybrid approach provides a robust solution for cancer diagnosis by addressing key challenges such as data redundancy and computational complexity.
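As a rough illustration of the mutual information-based filtering mentioned in the abstract, the following minimal sketch shows a greedy Max-Relevance Min-Redundancy selection over discretized expression values. It is not the authors' implementation and does not include the SSO search; the function names (`mutual_information`, `mrmr_select`), the gene names, and the toy data are all assumptions made up for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR: pick k features by relevance minus mean redundancy.

    `features` maps a (hypothetical) gene name to its discretized expression
    values across samples; `labels` holds the class of each sample.
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:

        def score(g):
            # The first pick uses relevance only; later picks are penalized
            # by their average mutual information with already-chosen genes.
            if not selected:
                return relevance[g]
            redundancy = sum(
                mutual_information(features[g], features[s]) for s in selected
            ) / len(selected)
            return relevance[g] - redundancy

        # Sorting makes tie-breaking deterministic (alphabetical by name).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data invented for illustration: 8 samples, expression discretized to 0/1.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],  # strongly related to the class
    "gene_b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of gene_a (redundant)
    "gene_c": [0, 0, 1, 0, 1, 1, 0, 1],  # weaker signal, little overlap with gene_a
    "gene_d": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
}

print(mrmr_select(features, labels, k=2))  # ['gene_a', 'gene_c']
```

In this toy case the duplicate `gene_b` is skipped in favor of `gene_c`, even though `gene_b` has higher relevance on its own, because its redundancy with the already selected `gene_a` outweighs that relevance. Real microarray values are continuous, so a discretization step or a continuous mutual information estimator would be needed before applying a filter like this.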
DOI:
https://doi.org/10.31449/inf.v49i3.9126
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







