A Hybrid RF-CART-SMOTE-GA Model for Early Warning in University Budget Management
Abstract
In the big data era, university budget management faces heightened demands as funding sources diversify and institutional scale expands. To address these challenges, this study builds a budget information early warning model on the Random Forest (RF) algorithm. The model uses the Classification and Regression Trees (CART) algorithm to optimize the decision tree structure, introduces weighted ensemble voting to improve classification, and applies the Synthetic Minority Over-sampling Technique (SMOTE) for data balancing and a genetic algorithm (GA) for parameter optimization. The model was validated on two datasets: the Integrated Postsecondary Education Data System (IPEDS) and the Higher Education Statistics Agency (HESA). In the budgeting-phase classification task, the model achieved a maximum classification accuracy of 94% on the HESA dataset, with a recall of 92.18%, an F1 score of 92.87%, and a sample balance rate of 81.64%; the lowest budget information early warning error was 3.7%. The average runtime was 0.04 seconds and CPU utilization was only 24.17%, outperforming models such as DBN, XGBoost, and VAE-TSAD. These results indicate that the proposed model offers high accuracy, real-time capability, and computational efficiency for early warning in university budget management, providing reliable technical support for financial decision-making in higher education.
DOI: https://doi.org/10.31449/inf.v50i6.9805
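As a rough illustration of two ingredients named in the abstract, the sketch below shows SMOTE-style interpolation between minority samples and a weighted vote across trees. It is a minimal standard-library sketch under stated assumptions, not the authors' implementation: the function names `smote_oversample` and `weighted_vote`, the toy feature vectors, the labels, and the per-tree weights are all hypothetical, and the CART construction and genetic-algorithm tuning steps are omitted.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across trees."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


# Hypothetical toy data: four minority-class ("warn") feature vectors.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority, n_new=4)

# Hypothetical per-tree predictions and weights (e.g. validation accuracy).
label = weighted_vote(["warn", "ok", "ok"], [0.9, 0.3, 0.4])
print(len(new_points), label)  # → 4 warn
```

Because each synthetic point lies on the segment between two existing minority samples, it stays within the range the minority class already occupies. In the vote, one highly weighted tree (0.9) outweighs two weaker trees (0.3 + 0.4 = 0.7), which is how weighting differs from a plain majority vote.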
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







