A Bi-GRU and BERT-Based Intelligent Audit System for News Moderation via NLP and Sentiment Analysis
Abstract
In the era of information explosion, the exponential growth and rapid turnover of news data pose significant challenges to traditional manual review of news dissemination. Existing methods struggle to balance the comprehensiveness and the accuracy of content moderation. To address these issues, this study develops an intelligent audit system for news communication that integrates natural language processing (NLP) and sentiment analysis. Using NLP techniques such as semantic analysis and keyword extraction, the system quickly identifies core news information and potential risk points. Sentiment analysis algorithms are integrated to precisely assess the emotional tone and social impact of news content, enabling intelligent screening and early risk warning. The system employs BERT for the NLP component and Bi-GRU for the sentiment analysis component. Experimental results demonstrate its effectiveness: news audit efficiency increased by nearly 40%, and the error rate decreased by about 30%. The system can also detect and filter false information and public opinion risks, enhancing news credibility and social value, and it outperforms existing methods in accuracy and recall. It has a hierarchical architecture with four layers: data collection, preprocessing, NLP and sentiment analysis, and audit decision-making. Data collection is performed by web crawlers, and preprocessing includes deduplication, cleaning, word segmentation, and vectorization. The pre-trained BERT model is fine-tuned for NLP tasks, while sentiment analysis utilizes an LSTM-attention mechanism model; all components are implemented in Python with the PyTorch framework. Trained on the THUCNews corpus for news text classification and on SST-2 for sentiment analysis, the model achieves over 90% news classification accuracy and an F1 score exceeding 85% for sentiment analysis.
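The four preprocessing steps named above (deduplication, cleaning, word segmentation, vectorization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `clean`, `deduplicate`, `segment`, `vectorize`, and `preprocess` are hypothetical, the segmentation is a naive regex split (Chinese corpora such as THUCNews would need a dedicated word segmenter), and bag-of-words counts stand in for whatever vectorization the system actually uses.

```python
import re
from collections import Counter


def clean(text):
    """Strip HTML-like tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(docs):
    """Drop case-insensitive exact duplicates, keeping the first occurrence."""
    seen = set()
    unique = []
    for doc in docs:
        key = doc.lower()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def segment(text):
    """Naive word segmentation (assumption): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def vectorize(token_lists):
    """Bag-of-words count vectors over a vocabulary built from the corpus."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in token_lists:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def preprocess(raw_docs):
    """Run the four steps in sequence and return (vocabulary, vectors)."""
    cleaned = [clean(doc) for doc in raw_docs]
    unique = deduplicate(cleaned)
    tokens = [segment(doc) for doc in unique]
    return vectorize(tokens)


if __name__ == "__main__":
    vocab, vectors = preprocess(
        ["<p>Markets  rally</p>", "markets rally", "Storm hits coast"]
    )
    print(vocab)
    print(vectors)
```

In this sketch cleaning runs before deduplication so that copies differing only in markup or spacing collapse to one document; this ordering is a design choice of the sketch, not something the abstract specifies.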
Additionally, the system incorporates multilingual capabilities by integrating multilingual pre-trained models such as mBERT and XLM-R, and introducing language adapters. It can audit news texts in English, Chinese, Spanish, and other languages, achieving an average accuracy of 85% on multilingual datasets and a 30% improvement in cross-lingual transfer compared to monolingual models, effectively supporting global news dissemination audits and handling multilingual mixed content.
DOI: https://doi.org/10.31449/inf.v49i36.9274
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







