A Hybrid BERT-ALBERT Model for Text Classification: Improving Accuracy in Document Analysis

Abstract

In document analysis, text classification is an essential task that facilitates automatic content categorisation, sentiment analysis, and effective information retrieval. This paper explores a combination of BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) to enhance classification accuracy while reducing computational complexity. The model utilises transformer encoder blocks and bidirectional positional encoding for text processing. Handling large volumes of text data requires automation, and attention mechanisms and transformers have become viable approaches. The model was evaluated on a custom dataset of 80,000 Turkish-language documents spanning 30 categories, including finance, education, healthcare, and travel. The dataset was split into 70% training, 15% validation, and 15% testing. Pretrained FastText embeddings were used alongside BERT and ALBERT to capture rich semantic features. While ALBERT improves efficiency through parameter reduction and cross-layer parameter sharing, BERT offers deep contextual embeddings. In experimental assessments, the BERT+ALBERT hybrid model outperforms the individual transformer models (BERT and ALBERT) as well as LSTM and classic machine learning models, attaining an accuracy of 96.6%, precision of 95.7%, recall of 95.1%, and F1-score of 95.5%. The statistical significance of the performance gains was confirmed using t-tests (p < 0.05) across five independent runs. The training and validation curves show strong generalisation and minimal overfitting. These results demonstrate the benefits of combining multiple transformer architectures for document categorisation, offering a trade-off between computational efficiency and accuracy. Future research can investigate additional optimisations, such as domain-specific fine-tuning and more sophisticated attention mechanisms.
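The 70%/15%/15% train/validation/test split described above can be sketched as follows. This is a minimal illustration, not the paper's code; the function name, the fixed random seed, and the 1,000-document toy corpus are all illustrative assumptions (the paper's corpus contains 80,000 documents).

```python
import random

def split_dataset(docs, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle and split documents into train/validation/test partitions."""
    rng = random.Random(seed)        # fixed seed for reproducibility (illustrative)
    shuffled = docs[:]               # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remaining ~15%
    return train, val, test

# Example with placeholder document IDs
docs = list(range(1000))
train, val, test = split_dataset(docs)
print(len(train), len(val), len(test))  # 700 150 150
```

Taking the test set as the remainder (rather than a third slice) guarantees every document lands in exactly one partition even when the fractions do not divide the corpus size evenly.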

Authors

  • Xiaokui Liu, Anyang Normal University, Anyang, Henan 455000, China

DOI:

https://doi.org/10.31449/inf.v50i7.8368

Published

02/21/2026

How to Cite

Liu, X. (2026). A Hybrid BERT-ALBERT Model for Text Classification: Improving Accuracy in Document Analysis. Informatica, 50(7). https://doi.org/10.31449/inf.v50i7.8368