Adaptive Transformer-Based Framework for Cross-Lingual Translation Similarity Detection with Bilingual Embedding Alignment

Jiao Jiao

Abstract


This study proposes a deep learning framework for bilingual translation similarity detection that addresses semantic gaps between structurally different languages. Its core innovation is an Adaptive Transformer with dynamic masking. The framework has three key components: (1) an adaptive transformer whose dynamic content-based and structure-aware masking mechanisms adjust attention weights according to cross-lingual semantic relevance; (2) cross-lingual feature representation using supervised and unsupervised bilingual embedding alignment strategies; and (3) a multi-dimensional similarity measurement framework covering semantic, syntactic, and pragmatic dimensions. In experiments on three language pairs (English-Chinese, English-German, and English-Urdu), the proposed method achieves an F1 score of 0.876, a 7.2% relative improvement over the best baseline (0.817). Ablation studies confirm that adaptive masking and cross-lingual alignment are crucial for handling cultural adaptations and non-literal translations. The framework has applications in machine translation quality assessment, cross-lingual information retrieval, and multilingual plagiarism detection.
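The reported 7.2% figure is a relative gain, not an absolute one, and can be checked from the two F1 scores given in the abstract. The short sketch below does only that arithmetic; the function name `relative_improvement` is a hypothetical helper introduced here for illustration and is not part of the paper's framework.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the gain of `new` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# F1 scores as stated in the abstract.
proposed_f1 = 0.876
baseline_f1 = 0.817

absolute_gain = proposed_f1 - baseline_f1
gain = relative_improvement(proposed_f1, baseline_f1)

print(f"absolute gain: {absolute_gain:.3f}")  # difference in F1 points
print(f"relative gain: {gain:.1%}")           # difference as a share of the baseline
```

The absolute difference is 0.059 F1 points; dividing by the baseline of 0.817 gives the relative improvement quoted in the abstract.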







DOI: https://doi.org/10.31449/inf.v49i32.8834

This work is licensed under a Creative Commons Attribution 3.0 License.