Early Warning of Financial Crises in Manufacturing Using SMOTE-Tomek Random Forest and Sentiment-Enhanced Indicators
Abstract
To improve the accuracy of financial crisis warning in the manufacturing industry, and to address the narrow indicator sets and weak handling of imbalanced data in traditional models, this study constructs a warning system that integrates traditional financial indicators with text big data indicators. A synthetic minority oversampling technique, Tomek link, and random forest (SMOTE-Tomek-RF) model is adopted for early warning. Using 21 manufacturing enterprises listed on the Shanghai Stock Exchange A-share market as samples and 22 warning indicators, core variables are selected through random forest (RF) feature selection, and the warning performance of RF, SMOTE-RF, a single decision tree (DT), and the proposed SMOTE-Tomek-RF model is compared. The importance scores of emotional inclination and popularity were 0.052 and 0.047, respectively. Both scores exceeded the selection threshold and ranked highly, indicating that the text indicators effectively supplement the financial indicators. The proposed model achieved an area under the receiver operating characteristic curve (AUC) of 0.968, an F1 score of 84.97%, and a G-Mean of 90.11%. By comparison, the AUCs of the traditional RF, SMOTE-RF, and DT models were 0.934, 0.953, and 0.943, respectively. In addition, after text big data was incorporated, the prediction accuracy reached 100% for healthy firms and 92.86% for crisis firms. In summary, the model handles the data imbalance problem effectively and improves the precision of early warning. It offers a reliable approach to financial crisis early warning in the manufacturing industry, with practical value for enterprise risk control and investor decision-making.
DOI: https://doi.org/10.31449/inf.v49i32.11034
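The F1 score and G-Mean quoted in the abstract are standard metrics for imbalanced classification, and both can be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `imbalance_metrics` and the counts in the usage example are hypothetical and are not taken from the paper's data. It treats crisis firms as the positive class and uses the common definition of G-Mean as the geometric mean of sensitivity and specificity.

```python
from math import sqrt


def imbalance_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute recall, specificity, precision, F1 and G-Mean.

    The positive class is assumed to be the minority (crisis) class.
    Ratios with a zero denominator are reported as 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    g_mean = sqrt(recall * specificity)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only (not results from the study).
example = imbalance_metrics(tp=8, fn=2, fp=5, tn=85)
print({name: round(value, 4) for name, value in example.items()})
```

Because G-Mean multiplies the two per-class rates, a model that ignores the minority class scores close to zero even when overall accuracy is high, which is why it is often reported alongside AUC and F1 for imbalanced samples.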
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







