A Comparative Analysis of Extreme Gradient Boosting, Decision Tree, Support Vector Machines, and Random Forest Algorithm in Data Analysis of College Students' Psychological Health
Abstract
To address the problem of identifying the mental health status of college students, this study surveyed the psychological condition of students in one department of a university in Hubei Province using the SCL-90 scale. Machine learning algorithms were applied to assess the applicability of the models and to explore the differences between students in healthy and sub-healthy mental states. Data (including basic information) on 500 students were randomly collected. A self-compiled questionnaire, combined with on-site scoring by psychology teachers, was used to classify the mental states of the 500 students as healthy or sub-healthy. The questionnaire data were analyzed with decision tree, support vector machine, random forest, and XGBoost algorithms to quickly identify healthy and sub-healthy states and to mine behavioral characteristics correlated with students' mental health status. Each algorithm was used to model the data of the 500 students, and the classification performance of the models was evaluated by accuracy, precision, recall, F1-score, and AUC. The results showed that random forest achieved the best classification performance of the four methods, with an R2 score of 0.8891, compared with 0.8393 for the decision tree, 0.8840 for the support vector machine, and 0.8618 for XGBoost. Given the advantages of random forest in classification performance, modeling time, interpretability, feature selection, and simplicity, we recommend the random forest model as an aid in diagnosing and classifying mental health status.
The experimental results on the SCL-90 scale survey and the student basic information dataset show that the proposed model achieves high accuracy and converges quickly, enabling more effective and accurate prediction of students' mental health status.
DOI: https://doi.org/10.31449/inf.v49i15.7004
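The abstract names accuracy, precision, recall, F1-score, and AUC as evaluation criteria. As a minimal sketch of how these could be computed for a binary healthy/sub-healthy classifier, the pure-Python functions below work from true labels, predicted labels, and predicted scores. The function names, the label coding (1 = sub-healthy, 0 = healthy), and the 0.5 decision threshold in the usage note are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (assumed: sub-healthy) as positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))
```

To compare several models, one would turn each model's scores into hard predictions (for example `[1 if s >= 0.5 else 0 for s in scores]`), then call `classification_metrics` and `roc_auc` on the same held-out labels for every model.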
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







