KorvexChecker: A BERT-Based Verification Framework for LLM Outputs in Turkish
Abstract
Large Language Models (LLMs) have demonstrated exceptional capabilities in generating human-like text, yet their outputs often suffer from critical issues such as hallucination, ethical violations, and lack of meaningfulness. These challenges can result in misinformation, offensive language, and unreliable outputs, especially in lower-resource languages such as Turkish. To address these concerns, we introduce KorvexChecker, a modular and multi-functional tool for validating LLM outputs in terms of ethical compliance, meaningfulness, and hallucination detection. KorvexChecker employs a combination of fine-tuned BERT models for sentiment analysis, ethical violation detection, and text classification to identify whether a given text requires further verification. The tool integrates several external verification resources, such as a web search API for checking factual accuracy and a journal database for validating academic documents. Furthermore, Zemberek, a Turkish NLP library, is utilized for preprocessing, normalization, and segmentation of Turkish text. The system was evaluated on a custom-labeled dataset of 7,552 Turkish samples, divided into training, validation, and test sets. Experimental results demonstrate that the BERT-based classifiers achieved an average F1-score of 93.3% across multiple tasks, significantly outperforming both rule-based methods and traditional machine learning baselines. A small-scale user test showed the framework to be responsive and effective in real-time scenarios, with an average inference latency of 800 milliseconds per query. By leveraging both advanced machine learning techniques and external validation mechanisms, KorvexChecker offers a robust framework for ensuring the trustworthiness of AI-generated text, with potential applications in academia, media, and content moderation.
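The routing step described above, in which per-task classifier scores decide whether a text is escalated to external verification, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the task names, score dictionary, and 0.5 thresholds are assumptions introduced here, and in the actual system the scores would come from the fine-tuned BERT classifiers.

```python
# Illustrative sketch of KorvexChecker-style verification routing.
# Risk scores per task (here assumed to be in [0, 1], produced upstream
# by fine-tuned BERT classifiers) are compared against per-task
# thresholds; any task exceeding its threshold is flagged for external
# verification (e.g., a web search API or journal database lookup).
# Task names and threshold values below are hypothetical.

THRESHOLDS = {"ethics": 0.5, "meaningfulness": 0.5, "hallucination": 0.5}

def needs_verification(scores: dict) -> list:
    """Return the names of tasks whose risk score exceeds its threshold."""
    return [task for task, score in scores.items()
            if score > THRESHOLDS.get(task, 0.5)]

flags = needs_verification(
    {"ethics": 0.1, "meaningfulness": 0.2, "hallucination": 0.9}
)
print(flags)  # → ['hallucination']
```

A text with an empty flag list would pass straight through, so only the flagged subset incurs the latency of external lookups.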
DOI: https://doi.org/10.31449/inf.v50i10.8378
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.