Hybrid Context Aware Gujarati Spell Correction Using Norvig Algorithm, GRU, and IndicBERT
Abstract
Many Natural Language Processing (NLP) applications, including email, opinion mining, text summarization, and chatbots, rely on spelling and grammar checking. Spelling mistakes in regional languages such as Assamese, Gujarati, and Hindi can damage an individual's credibility, weaken cybersecurity efforts, introduce legal ambiguity, and degrade the performance of NLP applications. This article focuses on reducing spelling errors in Gujarati. It examines the issues specific to the Gujarati language and reviews current strategies for correcting spelling mistakes based on the context of a word. We propose a novel hybrid approach to context-aware Gujarati spelling correction. To choose the best correction among the generated suggestions while keeping the context in mind, we used a two-layer GRU network and the IndicBERTv2-SS model, which was fine-tuned only on our curated Gujarati dataset of about 20,000 sentences (split 70/15/15 into training, validation, and test sets). Preprocessing comprised Gujarati normalization (diacritics, compound characters, and numbers), regex-based tokenization, and edit-distance candidate generation. We evaluated the system on the test split using accuracy, precision, recall, and F1 score. Our proposed IndicBERT-GUJBRIJAPU tool achieved 93.49% accuracy, 94.46% precision, 90.13% recall, and a 91.59% F1 score, outperforming the other approaches compared for context-aware correction.
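The abstract names regex-based tokenization and edit-distance candidate creation as preprocessing steps, and the title refers to the Norvig algorithm. The following is a minimal Python sketch of how such a step might look; it is not the authors' implementation. The function names, the use of NFC normalization, the Unicode-range regex, and the toy vocabulary are all assumptions made for illustration. The sketch also edits individual code points rather than full Gujarati grapheme clusters, which a production system would likely need to handle more carefully.

```python
import re
import unicodedata

# Assumption: a token is a run of characters from the Gujarati Unicode block
# (U+0A80..U+0AFF). Latin text, ASCII digits, and punctuation are skipped.
GUJARATI_TOKEN = re.compile(r"[\u0A80-\u0AFF]+")

# Assumption: the edit alphabet is every assigned letter, vowel sign, or digit
# in the Gujarati block, discovered via Unicode general categories.
ALPHABET = [
    chr(cp)
    for cp in range(0x0A80, 0x0B00)
    if unicodedata.category(chr(cp)) in ("Lo", "Mn", "Mc", "Nd")
]


def tokenize(text):
    """Normalize to NFC, then extract Gujarati-script tokens with a regex."""
    return GUJARATI_TOKEN.findall(unicodedata.normalize("NFC", text))


def edits1(word):
    """All strings one edit away from `word` (Norvig-style edit distance 1)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + ch + right[1:] for left, right in splits if right for ch in ALPHABET
    ]
    inserts = [left + ch + right for left, right in splits for ch in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, vocabulary):
    """Return the word if known, else known words one edit away, else the word."""
    if word in vocabulary:
        return {word}
    known = edits1(word) & vocabulary
    return known or {word}


if __name__ == "__main__":
    # Hypothetical toy vocabulary; a real system would load a Gujarati lexicon.
    vocabulary = {"ગુજરાત", "ભાષા"}
    for token in tokenize("ગુજરત ભાષા"):
        print(token, candidates(token, vocabulary))
```

In a pipeline like the one the abstract describes, the candidate set produced here would then be passed to a context model (the GRU network or the fine-tuned IndicBERT model) to rank the suggestions; that ranking step is not shown.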
DOI: https://doi.org/10.31449/inf.v49i34.9836
Published
Versions
- 01/06/2026 (2)
- 10/22/2025 (1)
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







