Research on Fine-Tuned BERT-CRF Joint Labeling Model for Contractual Syntactic and Lexical Ambiguity Resolution in Business

Xiaohong Huang; Xiao Tao

doi:10.31449/inf.v50i13.10364

Abstract

Ambiguity in business English contracts (e.g., lexical polysemy, syntactic nesting) seriously undermines accurate understanding and efficient execution. To address this, we propose a fine-tuned BERT-CRF joint labeling model with the following key methodological details. 1) Corpus construction: a 12,800-sample contract corpus covering 8 business domains is built, with unified labeling for 4 ambiguity types (lexical, syntactic, reference, pragmatic) and 9,750 annotated ambiguity instances. 2) Model design: BERT is adaptively fine-tuned via “term-aware masking,” which prioritizes contract-specific terms; a CRF is integrated as the sequence labeling layer, with a domain-prior transition matrix to optimize label generation. 3) Training strategies: adversarial training (FGSM/PGD perturbations) enhances robustness, and incremental online learning (importance sampling) enables dynamic adaptation. Experimental results show that, for lexical ambiguity, the model achieves 67.5% accuracy, 78.9% recall, and 72.8% F1; boundary labeling accuracy for the clause “USD 90.1 million” improves from 56% to 79%; and syntactic ambiguity F1 is 62.3%. Lightweight optimization (hidden layer dimension = 34) reduces parameters by 45% (to 12.78 M) but requires 89 training rounds to balance performance. The model outperforms generic BERT-CRF by 9.2% in F1 for complex contract processing, verifying its effectiveness in resolving contractual syntactic and lexical ambiguity.
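
As a reading aid, the lexical-ambiguity figures quoted in the abstract can be checked against the standard F1 definition (the harmonic mean of precision and recall). The sketch below is not taken from the paper: the function name `f1_score` is hypothetical, and it assumes that the reported 67.5% "accuracy" plays the role of precision. Under that assumption the three reported figures are mutually consistent.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Lexical-ambiguity figures quoted in the abstract.
# Assumption (not stated in the source): "accuracy" is treated as precision.
lexical_f1 = f1_score(0.675, 0.789)
print(f"{lexical_f1:.3f}")  # prints 0.728
```

The printed value rounds to 72.8%, which matches the F1 reported in the abstract.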

Authors

Xiaohong Huang School of Humanities and Law, Fuzhou Technology and Business University
Xiao Tao

DOI:

https://doi.org/10.31449/inf.v50i13.10364

Published

05/18/2026

How to Cite

Huang, X., & Tao, X. (2026). Research on Fine-Tuned BERT-CRF Joint Labeling Model for Contractual Syntactic and Lexical Ambiguity Resolution in Business. Informatica, 50(13). https://doi.org/10.31449/inf.v50i13.10364

Issue

Vol. 50 No. 13 (2026): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.
