Research on Fine-Tuned BERT-CRF Joint Labeling Model for Contractual Syntactic and Lexical Ambiguity Resolution in Business
Abstract
Ambiguity in business English contracts (e.g., lexical polysemy and nested syntactic structures) seriously undermines accurate interpretation and efficient execution. To address this, we propose a fine-tuned BERT-CRF joint labeling model built on the following key methods: 1) Corpus construction: a 12,800-sample contract corpus covering 8 business domains is built, with a unified labeling scheme for 4 ambiguity types (lexical, syntactic, reference, and pragmatic) and 9,750 annotated ambiguity instances. 2) Model design: BERT is adaptively fine-tuned via "term-aware masking," which prioritizes contract-specific terms, and a CRF is integrated as the sequence labeling layer, with a domain-prior transition matrix to optimize label generation. 3) Training strategies: adversarial training with FGSM/PGD perturbations enhances robustness, and incremental online learning with importance sampling enables dynamic adaptation. Experimental results show that, for lexical ambiguity, the model achieves 67.5% accuracy, 78.9% recall, and 72.8% F1; boundary labeling accuracy for the phrase "USD 90.1 million" improves from 56% to 79%; and the F1 for syntactic ambiguity is 62.3%. Lightweight optimization (hidden layer dimension = 34) reduces the parameter count by 45%, to 12.78 M, but requires 89 training rounds to balance performance. The model outperforms a generic BERT-CRF by 9.2% in F1 on complex contract processing, verifying its effectiveness in resolving syntactic and lexical ambiguity in contracts.
DOI: https://doi.org/10.31449/inf.v50i13.10364
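The abstract says only that term-aware masking "prioritizes contract-specific terms" during BERT fine-tuning. A minimal sketch of one way to do that, assuming a hypothetical term lexicon and a masking budget filled with lexicon terms first and ordinary tokens second; the function name, parameters, and selection rule are illustrative assumptions, not the paper's implementation.

```python
import random

MASK = "[MASK]"

def term_aware_mask(tokens, term_lexicon, mask_ratio=0.15, seed=0):
    """Mask about mask_ratio of the tokens, filling the budget with lexicon terms first.

    Hypothetical sketch: the selection rule and parameter names are assumptions,
    not the paper's implementation.
    """
    rng = random.Random(seed)
    budget = max(1, round(len(tokens) * mask_ratio))
    term_pos = [i for i, t in enumerate(tokens) if t.lower() in term_lexicon]
    other_pos = [i for i, t in enumerate(tokens) if t.lower() not in term_lexicon]
    # Shuffle within each group so the choice among terms (or among ordinary
    # tokens) is random, while terms as a group are always taken first.
    rng.shuffle(term_pos)
    rng.shuffle(other_pos)
    chosen = sorted((term_pos + other_pos)[:budget])
    chosen_set = set(chosen)
    masked = [MASK if i in chosen_set else t for i, t in enumerate(tokens)]
    return masked, chosen


tokens = "the lessee shall indemnify the lessor against all claims".split()
lexicon = {"lessee", "lessor", "indemnify"}  # hypothetical contract-term lexicon
masked, positions = term_aware_mask(tokens, lexicon, mask_ratio=0.3)
print(positions)         # [1, 3, 5]
print(" ".join(masked))  # the [MASK] shall [MASK] the [MASK] against all claims
```

The sketch works on whole words and always substitutes `[MASK]`; standard BERT pre-training operates on subword units and mixes in random and unchanged replacements, which are omitted here for brevity.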
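A CRF layer picks the tag sequence that maximizes emission plus transition scores, so a domain prior can be folded into the transition matrix. A minimal sketch, assuming a hypothetical BIO tag set (`O`, `B-AMB`, `I-AMB`) and a prior that simply forbids BIO-invalid transitions with a score of minus infinity; the paper's actual prior and tag set are not given in the abstract. The emission scores are toy numbers standing in for BERT outputs.

```python
import math

NEG_INF = -math.inf
TAGS = ["O", "B-AMB", "I-AMB"]  # hypothetical BIO tag set for an ambiguous span

def domain_prior(tags):
    """Hypothetical prior: 0 for allowed transitions, -inf for BIO-invalid ones.

    An I-X tag may only follow B-X or I-X.
    """
    prior = {}
    for prev in tags:
        for cur in tags:
            invalid = cur.startswith("I-") and prev not in ("B-" + cur[2:], cur)
            prior[(prev, cur)] = NEG_INF if invalid else 0.0
    return prior

def viterbi(emissions, transitions, start, tags):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   per-token dicts mapping tag -> emission score (e.g. from BERT).
    transitions: dict mapping (prev_tag, cur_tag) -> transition score.
    start:       dict mapping tag -> score for beginning the sequence with that tag.
    """
    scores = {t: start[t] + emissions[0][t] for t in tags}
    backpointers = []
    for emit in emissions[1:]:
        new_scores, bp = {}, {}
        for cur in tags:
            best_prev = max(tags, key=lambda p: scores[p] + transitions[(p, cur)])
            new_scores[cur] = scores[best_prev] + transitions[(best_prev, cur)] + emit[cur]
            bp[cur] = best_prev
        scores = new_scores
        backpointers.append(bp)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: scores[t])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]

# Toy emission scores for "USD 90.1 million": I-AMB scores highest at every token.
emissions = [
    {"O": 0.5, "B-AMB": 1.0, "I-AMB": 1.2},
    {"O": 0.2, "B-AMB": 0.3, "I-AMB": 1.5},
    {"O": 0.1, "B-AMB": 0.2, "I-AMB": 1.4},
]
learned = {(p, c): 0.0 for p in TAGS for c in TAGS}  # stand-in for trained CRF weights
prior = domain_prior(TAGS)
transitions = {k: learned[k] + prior[k] for k in learned}
start = {t: NEG_INF if t.startswith("I-") else 0.0 for t in TAGS}
print(viterbi(emissions, transitions, start, TAGS))  # ['B-AMB', 'I-AMB', 'I-AMB']
```

Picking the best tag per token would label all three tokens `I-AMB`, an invalid span with no beginning; with the prior in the transition matrix, Viterbi decoding recovers a well-formed `B-AMB I-AMB I-AMB` boundary.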
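FGSM takes one step of size epsilon along the sign of the loss gradient with respect to the input; PGD repeats smaller sign steps and projects back into the epsilon-ball after each one. In BERT-based models these perturbations are usually applied to input embeddings. A minimal sketch, assuming a toy two-dimensional "embedding" and a fixed logistic scorer in place of BERT-CRF so the gradient can be written in closed form; the step sizes are illustrative, not the paper's settings.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def loss_and_grad(x, w, y):
    """Logistic loss log(1 + exp(-y * w.x)) and its gradient with respect to x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = math.log1p(math.exp(-margin))
    coeff = -y / (1.0 + math.exp(margin))  # d loss / d (w.x)
    return loss, [coeff * wi for wi in w]

def fgsm(x, w, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    _, g = loss_and_grad(x, w, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(x, w, y, eps, alpha, steps):
    """Repeated sign-gradient steps, each projected onto the L-inf ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, w, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Toy "embedding" x with label y = +1 under a fixed linear scorer w.
w, x, y = [1.0, -2.0], [0.5, 0.1], 1
clean_loss, _ = loss_and_grad(x, w, y)
x_fgsm = fgsm(x, w, y, eps=0.1)
x_pgd = pgd(x, w, y, eps=0.1, alpha=0.04, steps=5)
print(clean_loss, loss_and_grad(x_fgsm, w, y)[0], loss_and_grad(x_pgd, w, y)[0])
```

In adversarial training, the loss on the perturbed input is added to (or replaces) the clean loss at each update, so the model learns to keep its predictions stable under small embedding shifts.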
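The abstract does not say which quantity drives importance sampling during incremental online learning. A minimal sketch, assuming loss-proportional sampling from a replay buffer of earlier examples, with weights of 1 / (N * p_i) to offset the bias of non-uniform sampling (the scheme used in prioritized replay). The buffer and its loss values are hypothetical.

```python
import random

def importance_sample(losses, k, seed=0, eps=1e-6):
    """Draw k replay indices with probability proportional to per-sample loss.

    Returns (indices, weights). Each weight is 1 / (N * p_i), divided by the
    largest weight in the batch, so that weighting the sampled losses offsets
    the bias from non-uniform sampling.
    """
    n = len(losses)
    priorities = [loss + eps for loss in losses]  # eps keeps zero-loss samples reachable
    total = sum(priorities)
    probs = [p / total for p in priorities]
    rng = random.Random(seed)
    indices = rng.choices(range(n), weights=probs, k=k)
    raw = [1.0 / (n * probs[i]) for i in indices]
    max_w = max(raw)
    return indices, [r / max_w for r in raw]

# Hypothetical per-sample losses of earlier contract examples kept in a replay buffer.
buffer_losses = [0.05, 0.10, 2.00, 0.80]
indices, weights = importance_sample(buffer_losses, k=8)
print(indices)
print([round(wt, 3) for wt in weights])
```

High-loss examples, which the current model handles worst, are revisited most often when new contracts arrive, and their down-weighted losses keep the gradient estimate from being dominated by them.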
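F1 is the harmonic mean of precision and recall, so the reported lexical-ambiguity figures can be checked arithmetically. The abstract labels the 67.5% figure as accuracy, but the reported 72.8% F1 is exactly what the harmonic mean of 67.5% and 78.9% gives. The check below therefore treats 67.5% as precision; that reading is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Lexical-ambiguity figures from the abstract; reading 67.5% as precision is an assumption.
print(f"{f1_score(0.675, 0.789):.1%}")  # 72.8%
```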
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.