MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced Language Model for Power Audit Text Understanding

Jia Xiaoliang; Li Sen; Cui Xia; Li Jing; Sun Chang-peng; Liu Dong-hua; Chen Zheng-long

doi:10.31449/inf.v49i12.9094

Abstract

In the rapidly evolving energy sector, efficient access to relevant information in power audit reports is crucial for informed decision-making, regulatory compliance, and operational improvement. However, the intricate language, specialized vocabulary, and unstructured format of power audit texts pose significant challenges for conventional information retrieval techniques. To address these issues, this paper proposes a power audit text understanding approach that combines multi-dimensional information retrieval enhancement with a domain-adapted Large Language Model (LLM). The resulting model, Multi-Dimensional Information Retrieval-based Bidirectional Encoder Representations from Transformers (MDIR-BERT), is pre-trained on a large corpus of electric power audit texts using both word-level and entity-level masked language modeling tasks, which allows it to capture electric-power-specific morphology, domain terminology, and complex entity relationships more effectively. The model is trained on a curated dataset of annotated electric power audit documents sourced from regulatory and industrial environments. Data preprocessing includes text cleaning, normalization, and tokenization to ensure high-quality input for model training. Experimental results show that MDIR-BERT achieves a classification accuracy of 98.82%, a gain of 16.86 percentage points over the baseline EPAT-BERT model (81.96%), along with notable gains in precision, recall, and F1-score. These findings highlight the effectiveness of integrating enhanced information retrieval techniques with specialized language modeling for the intelligent understanding of power audit documentation, paving the way for more accurate, scalable, and interpretable audit methods.
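
The difference between the word-level and entity-level masking objectives mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the example sentence, and the hand-written entity spans are all assumptions made for illustration, and a real pipeline would use a subword tokenizer and an entity recognizer.

```python
import random

MASK = "[MASK]"


def word_level_mask(tokens, prob, rng):
    """Mask each token independently with probability `prob`."""
    return [MASK if rng.random() < prob else tok for tok in tokens]


def entity_level_mask(tokens, entity_spans, prob, rng):
    """Mask whole entity spans: all tokens of a selected span are replaced together.

    `entity_spans` holds (start, end) token indices with `end` exclusive.
    """
    masked = list(tokens)
    for start, end in entity_spans:
        if rng.random() < prob:
            for i in range(start, end):
                masked[i] = MASK
    return masked


# Hypothetical audit-style sentence, split on whitespace (assumption).
tokens = "the main transformer at substation A exceeded its rated load".split()
# Hypothetical entity annotations: "main transformer" and "substation A".
entity_spans = [(1, 3), (4, 6)]

rng = random.Random(0)
print(word_level_mask(tokens, 0.15, rng))
print(entity_level_mask(tokens, entity_spans, 0.5, rng))
```

Under word-level masking the model may see "main [MASK]" and recover the missing word from its neighbor, whereas entity-level masking hides the full term, so the prediction has to rely on the surrounding context.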

Authors

Jia Xiaoliang, State Grid Tianjin Electric Power Company
Li Sen
Cui Xia
Li Jing
Sun Chang-peng, State Grid Tianjin Electric Power Company
Liu Dong-hua, State Grid Tianjin Electric Power Company
Chen Zheng-long

DOI:

https://doi.org/10.31449/inf.v49i12.9094

Published

11/23/2025

Issue

Vol. 49 No. 12 (2025): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.

How to Cite

MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced Language Model for Power Audit Text Understanding. (2025). Informatica, 49(12). https://doi.org/10.31449/inf.v49i12.9094
