MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced Language Model for Power Audit Text Understanding
Abstract
In the rapidly evolving energy sector, efficient access to relevant information from power audit reports is crucial for informed decision-making, regulatory compliance, and operational improvements. However, the intricate language, complex vocabulary, and unstructured format of power audit texts present significant challenges for conventional information retrieval techniques. To address these issues, this study proposes a power audit text understanding approach that combines multi-dimensional information retrieval enhancement with a domain-adapted Large Language Model (LLM). The resulting method, Multi-Dimensional Information Retrieval-based Bidirectional Encoder Representations from Transformers (MDIR-BERT), is pre-trained on a large corpus of electric power audit transcripts using both word-level and entity-level masked language modeling tasks, which allows it to capture electric-power-specific morphology, domain-specific vocabulary, and intricate entity relationships more effectively. The model is trained on a curated dataset of annotated electric power audit documents sourced from regulatory and industrial environments. Data preprocessing includes comprehensive text cleaning, normalization, and tokenization to ensure high-quality input for model training. Experimental results show that MDIR-BERT achieves a classification accuracy of 98.82%, an improvement of 16.86 percentage points over the baseline EPAT-BERT model (81.96%), along with notable gains in precision, recall, and F1-score.
These findings highlight the effectiveness of integrating enhanced information retrieval techniques with specialized language modeling for the intelligent understanding of power audit documentation, paving the way for more accurate, scalable, and interpretable audit methods.
DOI: https://doi.org/10.31449/inf.v49i12.9094
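The abstract distinguishes word-level from entity-level masked language modeling but does not give the masking procedure. As a minimal sketch of the general idea only, and not the authors' implementation, the Python below contrasts the two: word-level masking hides individual tokens independently, while entity-level masking hides every token of an entity span together. The function names, the masking rate, the example sentence, and the entity spans are all hypothetical.

```python
import random

MASK = "[MASK]"  # placeholder token, as used in BERT-style models


def mask_words(tokens, rate, rng):
    """Word-level masking: each token is masked independently with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


def mask_entities(tokens, entity_spans, rate, rng):
    """Entity-level masking: each entity span is masked as a whole or left intact.

    `entity_spans` holds half-open (start, end) token index pairs. In practice
    they would come from an entity annotator; here they are supplied by hand.
    """
    out = list(tokens)
    for start, end in entity_spans:
        if rng.random() < rate:
            for i in range(start, end):
                out[i] = MASK
    return out


# Hypothetical audit sentence with two hand-labelled entity spans.
tokens = "the audit found that transformer substation T4 exceeded its rated load".split()
entity_spans = [(4, 7), (9, 11)]  # "transformer substation T4", "rated load"

rng = random.Random(0)  # fixed seed so repeated runs give the same masks
print(mask_words(tokens, rate=0.15, rng=rng))
print(mask_entities(tokens, entity_spans, rate=0.5, rng=rng))
```

With entity-level masking the model has to reconstruct a whole term such as an equipment name from its context, rather than guessing one missing word from the remaining words of the same term. That is one plausible reading of how entity relationships could be learned under this objective.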
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







