Hybrid Context Aware Gujarati Spell Correction Using Norvig Algorithm, GRU, and IndicBERT
Abstract
Many Natural Language Processing (NLP) applications, including email, opinion mining, text summarization, and chatbots, rely on spelling and grammar checking. Spelling mistakes in regional languages such as Assamese, Gujarati, and Hindi can damage an individual's credibility, weaken cybersecurity efforts, create legal ambiguities, and degrade NLP application performance. This article focuses on reducing spelling errors in Gujarati. In addition to a thorough examination of issues specific to the Gujarati language, it presents up-to-date strategies for correcting spelling mistakes based on the context of the word. We propose a novel hybrid approach to context-aware Gujarati spelling correction. After all candidate suggestions are considered, a two-layer GRU network and the IndicBERTv2-SS model, fine-tuned only on our curated Gujarati dataset of about 20,000 sentences (70/15/15 split into training, validation, and test), choose the best correction in context. Preprocessing comprised Gujarati-specific normalization (diacritics, compound characters, and numbers), regex-based tokenization, and edit-distance candidate generation. We evaluated the system on the test split using accuracy, precision, recall, and F1 score. Our proposed IndicBERT-GUJBRIJAPU tool achieved 93.49% accuracy, 94.46% precision, 90.13% recall, and a 91.59% F1 score, substantially outperforming other approaches to context-aware correction.
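The regex-based tokenization and edit-distance candidate generation mentioned in the abstract can be illustrated with a minimal Norvig-style sketch. This is not the authors' implementation: the function names, the toy corpus, the Unicode-range tokenizer, and the frequency-based ordering are all assumptions made for illustration. In the described system a GRU network and IndicBERTv2-SS choose the final correction from context, and that step is not reproduced here.

```python
import re
from collections import Counter

# Assumption: a token is a run of code points from the Gujarati Unicode block.
TOKEN_RE = re.compile(r"[\u0A80-\u0AFF]+")


def tokenize(text):
    """Return the Gujarati-script tokens found in text."""
    return TOKEN_RE.findall(text)


def edits1(word, alphabet):
    """All strings one edit away from word (Norvig-style).

    Edits operate on Unicode code points, not grapheme clusters,
    which is a simplification for Gujarati script.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, counts):
    """Known words within one edit of word, most frequent first."""
    if word in counts:
        return [word]
    # Hypothetical choice: the alphabet is every character seen in the vocabulary.
    alphabet = sorted(set("".join(counts)))
    found = [w for w in edits1(word, alphabet) if w in counts]
    # Frequency ordering stands in for the context-aware ranking step.
    return sorted(found, key=lambda w: (-counts[w], w))


# "ગુજરાત" (Gujarat), written with escapes so the code points are unambiguous.
CORRECT = "\u0a97\u0ac1\u0a9c\u0ab0\u0abe\u0aa4"
# Misspelling: the same word with the vowel sign AA (U+0ABE) dropped.
MISSPELLED = CORRECT.replace("\u0abe", "")

# Toy corpus, for illustration only.
corpus = f"{CORRECT} સુંદર છે. {CORRECT} મોટું રાજ્ય છે."
counts = Counter(tokenize(corpus))

print(candidates(MISSPELLED, counts))
```

Because the misspelling lacks a single vowel sign, one insertion recovers the known word. A fuller system would also consider candidates at edit distance two and pass the candidate list to the contextual model for ranking.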
DOI: https://doi.org/10.31449/inf.v49i34.9836
Versions
- 01/06/2026 (2)
- 10/22/2025 (1)
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







