Semantic Completeness-Enhanced Transformer Architecture for Named Entity and Relation Extraction in Noun Predicate Sentences
Abstract
To address the problems of semantic association and syntactic pattern parsing in noun predicate sentences, this paper proposes an entity recognition and relation extraction method based on the Transformer architecture. The method integrates three core components: Similarity Whole Word Masking (Sim WWM) pre-training to strengthen text understanding, a semantic integrity measure that quantifies defects in the semantic coverage of the text, and multi-head semantic integrity mapping networks that learn complex features across multiple dimensions. Experiments were conducted on the CoNLL04 dataset, containing 1437 sentences, and the SciERC dataset, containing 500 scientific abstracts, with MTL-NER, SPKTE, DRPFF, and other models as baselines. In the entity recognition task, the proposed method achieved an average accuracy, recall, and F1 score of 90.7%, 91.6%, and 90.6% on CoNLL04, and 69.8%, 70.4%, and 69.8% on SciERC, respectively. In the relation extraction task, the three metrics were 75.3%, 73.4%, and 73.2% on CoNLL04, and 62.9%, 60.5%, and 61.4% on SciERC, respectively, all significantly better than the baseline models. Ablation experiments showed that introducing semantic integrity improved the model's recognition accuracy by 3.9%, making it the main source of the performance gain. The method can effectively handle nested entities and confusion between entity types, providing an efficient technical path for parsing complex semantic relations.
DOI: https://doi.org/10.31449/inf.v50i9.11597
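The whole-word masking idea underlying the pre-training step can be illustrated with a minimal sketch. It shows only generic whole-word masking over WordPiece-style tokens, where a `##` prefix marks a continuation piece; the similarity-based selection that distinguishes Sim WWM is not described in the abstract and is not modeled here. The function name `whole_word_mask`, the `mask_rate` default, and the `[MASK]` token are assumptions chosen for illustration.

```python
import random


def whole_word_mask(tokens, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Mask complete words in a list of WordPiece-style tokens.

    Assumption: a token starting with '##' continues the previous word,
    so all pieces of a chosen word are masked together.
    """
    rng = random.Random(seed)

    # Group token indices into words: a '##' piece joins the preceding word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    if not words:
        return []

    # Choose how many whole words to mask (at least one, at most all).
    n_to_mask = min(len(words), max(1, round(len(words) * mask_rate)))
    chosen = rng.sample(words, n_to_mask)

    # Replace every piece of each chosen word with the mask token.
    masked = list(tokens)
    for group in chosen:
        for i in group:
            masked[i] = mask_token
    return masked


if __name__ == "__main__":
    example = ["the", "trans", "##former", "extract", "##s", "relations"]
    print(whole_word_mask(example, mask_rate=0.3, seed=0))
```

Because selection happens at the word level, a multi-piece word such as `trans ##former` is either fully masked or left intact, which is the property that separates whole-word masking from per-token masking.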
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







