Cancer Classification through Gene Selection Using the Social Spider Optimization Algorithm
Abstract
Cancer is a leading cause of death worldwide, which makes diagnostic tools for early and accurate detection a priority. Microarray technology measures the expression of thousands of genes simultaneously and offers valuable insight into cancer biology, but the high dimensionality of microarray data makes classification difficult. In this study, we propose an approach that integrates the Social Spider Optimization (SSO) algorithm with mutual information-based feature selection to identify the most discriminative genes for cancer classification. We evaluate four machine learning classifiers, Decision Tree (DT), K-Nearest Neighbors (K-NN), Neural Networks (NN), and Support Vector Machines (SVM), with and without feature selection. Our results show that the SSO algorithm significantly improves classification accuracy, with SVM achieving near-perfect performance on the leukemia and lymphoma datasets when combined with Max-Relevance Min-Redundancy (MRMR) feature selection. This hybrid approach offers a robust solution for cancer diagnosis by addressing two key challenges of microarray data: redundancy and computational complexity.
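As a rough illustration of the filter stage described in the abstract, the following Python sketch ranks genes with a greedy Max-Relevance Min-Redundancy criterion and compares a linear SVM with and without the selected subset. It is a minimal sketch under stated assumptions, not the authors' implementation: the SSO search stage is omitted entirely, a synthetic matrix from scikit-learn stands in for the leukemia and lymphoma datasets, mutual information is estimated with scikit-learn's nearest-neighbor estimators, and the names `mrmr_select` and `cv_accuracy` are hypothetical helpers introduced only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mrmr_select(X, y, k, seed=0):
    """Greedy mRMR (hypothetical helper): return k column indices that
    maximize relevance to y minus mean redundancy with genes already chosen."""
    k = min(k, X.shape[1])
    # Relevance: mutual information between each gene and the class label.
    relevance = mutual_info_classif(X, y, random_state=seed)
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(X.shape[1])
    while len(selected) < k:
        # Redundancy: MI between every gene and the most recently chosen gene.
        redundancy_sum += mutual_info_regression(
            X, X[:, selected[-1]], random_state=seed
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick the same gene twice
        selected.append(int(np.argmax(score)))
    return selected


def cv_accuracy(X, y, k=None, n_splits=5, seed=0):
    """Mean stratified-CV accuracy of a linear SVM (hypothetical helper).
    If k is given, mRMR runs on each training fold only, so the held-out
    fold never influences which genes are kept."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in folds.split(X, y):
        cols = mrmr_select(X[train], y[train], k) if k else np.arange(X.shape[1])
        model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        model.fit(X[train][:, cols], y[train])
        scores.append(model.score(X[test][:, cols], y[test]))
    return float(np.mean(scores))


if __name__ == "__main__":
    # Synthetic stand-in for a microarray matrix (samples x genes); an assumption.
    X, y = make_classification(
        n_samples=72, n_features=100, n_informative=8, n_redundant=10, random_state=0
    )
    print("all genes :", cv_accuracy(X, y))
    print("mRMR top 5:", cv_accuracy(X, y, k=5))
```

Running the selection inside each training fold is a deliberate choice: selecting genes on the full dataset before cross-validation would let test samples influence the gene subset and inflate the reported accuracy.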
DOI: https://doi.org/10.31449/inf.v49i3.9126
This work is licensed under a Creative Commons Attribution 3.0 License.








