Research on Fine-Tuned BERT-CRF Joint Labeling Model for Contractual Syntactic and Lexical Ambiguity Resolution in Business

Abstract

Ambiguity in business English contracts, such as lexical polysemy and syntactic nesting, undermines accurate interpretation and efficient execution. To address this, we propose a fine-tuned BERT-CRF joint labeling model with three methodological components. 1) Corpus construction: we build a 12,800-sample contract corpus covering 8 business domains, with unified labeling of 4 ambiguity types (lexical, syntactic, reference, pragmatic) and 9,750 annotated ambiguity instances. 2) Model design: BERT is adaptively fine-tuned via “term-aware masking”, which prioritizes contract-specific terms, and a CRF sequence labeling layer with a domain-prior transition matrix is integrated to optimize label generation. 3) Training strategies: adversarial training with FGSM/PGD perturbations enhances robustness, and incremental online learning with importance sampling enables dynamic adaptation. Experimental results show that, for lexical ambiguity, the model achieves 67.5% accuracy, 78.9% recall, and 72.8% F1; boundary labeling accuracy for the clause “USD 90.1 million” improves from 56% to 79%; and syntactic ambiguity F1 is 62.3%. Lightweight optimization (hidden layer dimension = 34) reduces parameters by 45% (to 12.78 M) but requires 89 training rounds to balance performance. The model outperforms generic BERT-CRF by 9.2% in F1 on complex contract processing, verifying its effectiveness in resolving contractual syntactic and lexical ambiguity.
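
To illustrate how a prior on the CRF transition matrix can shape label generation, the following is a minimal sketch of Viterbi decoding under a hard BIO constraint. It is not the authors' implementation: the tag set `LABELS`, the emission scores, and the helper names `prior_transitions`, `viterbi`, and `greedy` are hypothetical, and the paper's domain-prior matrix may well be soft or learned rather than the fixed rule assumed here.

```python
# Hypothetical BIO tag set for ambiguity spans (an assumption, not from the paper).
LABELS = ["O", "B-AMB", "I-AMB"]
NEG_INF = -1e9  # large negative score standing in for a forbidden transition


def prior_transitions(labels):
    """Build transition scores encoding one structural prior:
    an 'I-x' tag may only follow 'B-x' or 'I-x'."""
    trans = {}
    for prev in labels:
        for cur in labels:
            allowed = True
            if cur.startswith("I-"):
                allowed = prev[0] in "BI" and prev[2:] == cur[2:]
            trans[(prev, cur)] = 0.0 if allowed else NEG_INF
    return trans


def viterbi(emissions, trans, labels):
    """Return the highest-scoring label path.
    emissions: one dict per token mapping label -> score."""
    # An 'I-' tag cannot open a sequence.
    score = {
        l: emissions[0][l] + (NEG_INF if l.startswith("I-") else 0.0)
        for l in labels
    }
    back = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in labels:
            best_prev = max(labels, key=lambda p: score[p] + trans[(p, cur)])
            new_score[cur] = score[best_prev] + trans[(best_prev, cur)] + emit[cur]
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda l: score[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def greedy(emissions):
    """Per-token argmax, ignoring transitions (baseline for comparison)."""
    return [max(e, key=e.get) for e in emissions]


# Made-up emission scores for the tokens "pay USD 90.1 million".
tokens = ["pay", "USD", "90.1", "million"]
emissions = [
    {"O": 2.0, "B-AMB": 0.1, "I-AMB": 0.3},
    {"O": 1.0, "B-AMB": 0.8, "I-AMB": 1.5},
    {"O": 0.2, "B-AMB": 0.3, "I-AMB": 1.4},
    {"O": 0.5, "B-AMB": 0.1, "I-AMB": 1.2},
]

greedy_path = greedy(emissions)
crf_path = viterbi(emissions, prior_transitions(LABELS), LABELS)
print(list(zip(tokens, greedy_path)))
print(list(zip(tokens, crf_path)))
```

With these invented scores, per-token argmax opens the span with an `I-AMB` tag directly after `O`, which is not a valid BIO sequence, whereas the constrained decoder starts the span with `B-AMB` at "USD". This is the kind of boundary error that transition constraints are designed to remove.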

Authors

  • Xiaohong Huang, School of Humanities and Law, Fuzhou Technology and Business University
  • Xiao Tao

DOI:

https://doi.org/10.31449/inf.v50i13.10364

Published

05/18/2026

How to Cite

Huang, X., & Tao, X. (2026). Research on Fine-Tuned BERT-CRF Joint Labeling Model for Contractual Syntactic and Lexical Ambiguity Resolution in Business. Informatica, 50(13). https://doi.org/10.31449/inf.v50i13.10364