A Bi-GRU and BERT-Based Intelligent Audit System for News Moderation via NLP and Sentiment Analysis
Abstract
In the era of information explosion, the exponential growth and rapid turnover of news data pose significant challenges to traditional manual review of news dissemination. Existing methods struggle to balance comprehensiveness and accuracy in content moderation. To address these issues, this study develops an intelligent audit system for news communication that integrates natural language processing (NLP) and sentiment analysis. Using NLP techniques such as semantic analysis and keyword extraction, the system quickly identifies core news information and potential risk points. Sentiment analysis algorithms are integrated to precisely assess the emotional tone and social impact of news content, enabling intelligent screening and early risk warning. The system employs models such as BERT for the NLP component and Bi-GRU for the sentiment analysis component. Experimental results demonstrate its effectiveness: news audit efficiency increased by nearly 40%, and the error rate decreased by about 30%. The system also detects and filters false information and public opinion risks, enhancing news credibility and social value, and it outperforms existing methods in accuracy and recall. It features a hierarchical architecture with four layers: data collection, preprocessing, NLP and sentiment analysis, and audit decision-making. Data collection is performed by web crawlers, and preprocessing includes deduplication, cleaning, word segmentation, and vectorization. The pre-trained BERT model is fine-tuned for NLP tasks, while sentiment analysis uses an LSTM-attention model; all components are implemented in Python with the PyTorch framework. Trained on the THUCNews corpus for news text classification and SST-2 for sentiment analysis, the model achieves over 90% news classification accuracy and an F1 score exceeding 85% for sentiment analysis.
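The preprocessing layer described above (deduplication, cleaning, word segmentation, vectorization) can be illustrated with a minimal sketch. The abstract does not specify how each step is implemented, so everything here is an assumption: the function names are hypothetical, a simple regular-expression tokenizer stands in for a real word segmenter (Chinese text such as THUCNews would need a dedicated one), and bag-of-words counts stand in for the tokenization and embeddings a BERT-based model would actually consume.

```python
import hashlib
import re
from collections import Counter

# Hypothetical helpers; the paper's actual preprocessing is not specified.
TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag matcher
SPACE_RE = re.compile(r"\s+")        # runs of whitespace
TOKEN_RE = re.compile(r"[a-z0-9]+")  # stand-in for a real word segmenter


def clean(text):
    """Strip HTML tags and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def deduplicate(docs):
    """Drop exact duplicates (case-insensitive), keeping first occurrences."""
    seen = set()
    unique = []
    for doc in docs:
        key = hashlib.sha256(doc.lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def segment(text):
    """Split lowercased text into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def vectorize(token_lists):
    """Build a sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in token_lists:
        vec = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


def preprocess(raw_docs):
    """Run cleaning, deduplication, segmentation, and vectorization in order."""
    cleaned = [clean(d) for d in raw_docs]
    unique = deduplicate(cleaned)
    tokens = [segment(d) for d in unique]
    return vectorize(tokens)


if __name__ == "__main__":
    raw = ["<p>Markets rally  today</p>", "Markets rally today", "Storm hits coast"]
    vocab, vectors = preprocess(raw)
    print(vocab)
    print(vectors)
```

Cleaning runs before deduplication in this sketch so that two crawled copies of the same article that differ only in markup or spacing collapse to a single entry.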
Additionally, the system incorporates multilingual capabilities by integrating multilingual pre-trained models such as mBERT and XLM-R, and introducing language adapters. It can audit news texts in English, Chinese, Spanish, and other languages, achieving an average accuracy of 85% on multilingual datasets and a 30% improvement in cross-lingual transfer compared to monolingual models, effectively supporting global news dissemination audits and handling multilingual mixed content.
DOI: https://doi.org/10.31449/inf.v49i36.9274
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







