A Comparative Analysis of Extreme Gradient Boosting, Decision Tree, Support Vector Machines, and Random Forest Algorithm in Data Analysis of College Students' Psychological Health
Abstract
To address the problem of identifying the mental health status of college students, this study surveyed students in one department of a university in Hubei Province with a questionnaire based on the SCL-90 scale, and applied machine learning algorithms to assess model applicability and to explore the differences between students with healthy and sub-healthy mental states. Data, including basic information, were randomly collected from 500 students. A self-compiled questionnaire, combined with on-site scoring by psychology teachers, was used to classify the mental states of the 500 students as healthy or sub-healthy. The questionnaire data were analyzed with decision tree, support vector machine, random forest, and XGBoost algorithms to quickly identify healthy and sub-healthy states and to mine behavioral characteristics correlated with students' mental health status. Each algorithm was used to model the data of the 500 students, and the classification performance of the models was evaluated by accuracy, precision, recall, F1-score, and AUC. Among the four methods, the random forest classified best, with an R2 score of 0.8891, compared with 0.8840 for the support vector machine, 0.8618 for XGBoost, and 0.8393 for the decision tree. Given the random forest's advantages in classification performance, modeling time, interpretability, feature selection, and simplicity, we recommend the random forest model to assist in diagnosing and classifying mental health status.
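As an illustration of the evaluation metrics named above, the following is a minimal sketch, not the authors' code. It assumes a binary encoding in which 1 denotes the sub-healthy state and 0 the healthy state; the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and are not taken from the study's data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sub-healthy, 0 = healthy (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # made-up classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds.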
The experimental results on the SCL-90 scale survey and the student basic information dataset show that the proposed model has high accuracy and can converge quickly, enabling more effective and accurate prediction of students' mental health status.
DOI: https://doi.org/10.31449/inf.v49i15.7004
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







