Joint Symbol-Text Parsing in Power Grid Blueprints via Multimodal Fusion Using YOLOv7, PP-OCRv3, and GCN
Abstract
To meet the need for efficient analysis of blueprint information in the intelligent construction of power grid projects, this paper proposes a joint parsing algorithm for power grid blueprint symbols and text based on multimodal fusion. The method uses a two-stream feature extraction and cross-modal alignment framework. First, the YOLOv7 model with spatial pyramid pooling is adopted to improve the detection of small electrical symbols. Second, the high-precision PP-OCRv3 engine performs text detection and recognition, with position encoding introduced to strengthen its spatial awareness. Finally, a symbol-text association matrix is constructed, and its topological connections are modelled using a graph convolutional network (GCN). In addition, an attention-guided feature fusion module (AG-Fusion) performs dynamic weighted fusion of visual and textual features, enabling joint parsing within an end-to-end process. To verify the effectiveness of the algorithm, systematic experiments are conducted on a self-built power grid blueprint dataset, GBD-1.0, which contains 217 standard blueprints, 12 types of electrical symbols and 3862 text instances. The experimental results show that the method achieves 93.7% mAP@0.5 in symbol detection, a 95.4% F1 score in text recognition, and 89.2% accuracy in joint parsing, the most critical task. The algorithm resolves parsing ambiguity in complex scenarios, such as drawing occlusion and dense text, and provides reliable technical support for the digital construction of power grids.
DOI: https://doi.org/10.31449/inf.v50i7.12060
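The abstract does not say how the symbol-text association matrix is built or which GCN variant is used. As a rough illustration only, the sketch below assumes a Gaussian kernel on the distance between bounding-box centres for the association scores, and the commonly used normalised propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) for one graph convolution step. The function names, the `sigma` parameter and the example boxes are hypothetical and are not taken from the paper.

```python
import math

def box_center(box):
    """Centre of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def association_matrix(symbol_boxes, text_boxes, sigma=50.0):
    """Assumed scoring: A[i][j] = exp(-d_ij^2 / (2 * sigma^2)),
    where d_ij is the centre distance between symbol i and text j."""
    rows = []
    for s in symbol_boxes:
        sx, sy = box_center(s)
        row = []
        for t in text_boxes:
            tx, ty = box_center(t)
            d2 = (sx - tx) ** 2 + (sy - ty) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        rows.append(row)
    return rows

def bipartite_adjacency(assoc):
    """Symmetric adjacency over [symbols..., texts...] nodes,
    with the association scores as symbol-text edge weights."""
    n_s, n_t = len(assoc), len(assoc[0])
    n = n_s + n_t
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n_s):
        for j in range(n_t):
            adj[i][n_s + j] = assoc[i][j]
            adj[n_s + j][i] = assoc[i][j]
    return adj

def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n))
            for c in range(d_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]

def match_texts(assoc):
    """For each symbol, the index of its highest-scoring text."""
    return [max(range(len(row)), key=row.__getitem__) for row in assoc]

# Hypothetical detections: two symbols and two text boxes.
symbol_boxes = [(10, 10, 30, 30), (200, 200, 220, 220)]
text_boxes = [(205, 225, 260, 240), (35, 12, 80, 28)]
assoc = association_matrix(symbol_boxes, text_boxes)
print(match_texts(assoc))
```

In a full pipeline the node features would come from the visual and text branches rather than from hand-written vectors, and the weight matrix would be learned. The sketch only shows how spatial proximity can be turned into graph edges over which features are propagated.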
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







