Contextual Embedding Comparison for Out-of-vocabulary Handling in Indonesian POS Tagging

Abstract

Out-of-vocabulary (OOV) words remain a significant challenge in part-of-speech (POS) tagging. They degrade not only tagging performance but also downstream tasks, particularly in educational case studies. The problem stems from the limited availability of datasets for low-resource languages (LRLs), the absence of representative features, and the complexity of grammatical variation. Current approaches perform well in recognizing patterned OOV words but often fail with unpatterned OOV words, such as proper nouns and polysemous words. To address this issue, this study uses contextual embeddings to represent OOV words so that the model can recognize them more reliably. Two types of embeddings are compared: static embeddings (Word2Vec, GloVe, and FastText) and contextual embeddings (ELMo, BERT, and Flair). The models are evaluated with accuracy and the macro F1 score on a curated Indonesian corpus of 30,960 words, using k-fold cross-validation in both OOV and in-vocabulary (IV) word scenarios. The results show that models with contextual embeddings outperform those with static embeddings. Flair achieved the highest accuracy (95.65%), followed by BERT and ELMo at 92.73% and 91.61%, respectively. The proposed model was effective in handling OOV cases, achieving an accuracy of 88.12%, a 25.15% improvement over the baseline model. However, it still struggles with redundant words and capitalized letters. Future research should explore integrating form-based and contextual information to improve performance.
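The evaluation described above reports accuracy and macro F1 separately for OOV and IV words. As a minimal sketch of how such a split could be computed, the following Python example partitions test tokens by whether their word form appears in the training vocabulary and scores each group. The function names, the toy Indonesian tokens, the tag labels, and the choice to lowercase words before the vocabulary lookup are all assumptions made for illustration; they are not taken from the study's implementation.

```python
def accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold) if gold else 0.0


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over every tag seen in gold or pred."""
    tags = set(gold) | set(pred)
    if not tags:
        return 0.0
    scores = []
    for tag in tags:
        tp = sum(1 for g, p in zip(gold, pred) if g == tag and p == tag)
        fp = sum(1 for g, p in zip(gold, pred) if g != tag and p == tag)
        fn = sum(1 for g, p in zip(gold, pred) if g == tag and p != tag)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate_by_vocab(train_words, test_tokens):
    """Score IV and OOV tokens separately.

    test_tokens is a list of (word, gold_tag, predicted_tag) triples.
    Assumption: a word counts as in-vocabulary if its lowercased form
    occurred in the training data.
    """
    vocab = {w.lower() for w in train_words}
    groups = {"IV": ([], []), "OOV": ([], [])}
    for word, gold, pred in test_tokens:
        key = "IV" if word.lower() in vocab else "OOV"
        groups[key][0].append(gold)
        groups[key][1].append(pred)
    return {
        key: {"n": len(g), "accuracy": accuracy(g, p), "macro_f1": macro_f1(g, p)}
        for key, (g, p) in groups.items()
    }


# Hypothetical toy data: one fold's training vocabulary and tagged test tokens.
train_words = ["saya", "makan", "nasi", "di", "rumah"]
test_tokens = [
    ("saya", "PRON", "PRON"),
    ("makan", "VERB", "VERB"),
    ("Surabaya", "PROPN", "NOUN"),     # unpatterned OOV: proper noun, mistagged
    ("nasi", "NOUN", "NOUN"),
    ("berlari-lari", "VERB", "VERB"),  # patterned OOV: reduplicated verb
]

report = evaluate_by_vocab(train_words, test_tokens)
for group, metrics in report.items():
    print(group, metrics)
```

In a k-fold setting, the same function would be called once per fold with that fold's training vocabulary, and the per-group metrics averaged across folds. Macro F1 weights every tag equally, so a rare tag that is always mistagged lowers the score more than it lowers accuracy.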

Author Biographies

Muhammad Alfian, Department of Informatics, Institut Teknologi Sepuluh Nopember

Muhammad Alfian is currently pursuing the Ph.D. degree in computer science with the Institut Teknologi Sepuluh Nopember. He received the applied bachelor’s degree in Informatics Engineering and the applied master’s degree in Informatics and Computer Engineering from the Politeknik Elektronika Negeri Surabaya, Indonesia, in 2019 and 2021, respectively. His research interests include natural language processing, machine learning, and deep learning.

Umi Laili Yuhana, Department of Informatics, Institut Teknologi Sepuluh Nopember

Umi Laili Yuhana is currently a Professor of software engineering for education with the Department of Informatics, Institut Teknologi Sepuluh Nopember. She received the B.S. degree from the Department of Informatics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia, in 2002, the M.S. degree in computer science and information engineering from National Taiwan University, in 2008, and the Ph.D. degree from the Department of Electrical Engineering, ITS, in 2019. She is involved in teaching software engineering and machine learning. She has published more than 50 journal articles and conference papers related to software engineering in education. Her research interests include software engineering, artificial intelligence, natural language processing, computer-aided instruction, and data mining.

Daniel Siahaan, Department of Informatics, Institut Teknologi Sepuluh Nopember

Daniel Siahaan is currently a Professor with the Department of Informatics, Institut Teknologi Sepuluh Nopember. He received the master’s degree in software engineering from Technische Universiteit Delft, in 2002, and the Ph.D. degree in software engineering from Technische Universiteit Eindhoven, in 2004. He has published more than 50 journal articles and conference papers related to software engineering. His research interests include requirements engineering and natural language processing. He is a member of the IEEE Computer Society.

Harum Munazharoh, Department of Indonesian Language and Literature, Universitas Airlangga

Harum Munazharoh is currently a lecturer with Universitas Airlangga. She received the B.A. degree and M.A. degree in Linguistics from Gadjah Mada University, Indonesia, in 2016. She is involved in teaching phonology, morphology, and syntax. Her research interests include linguistics, Indonesian language and culture, discourse and text analysis.

Eric Pardede, Department of Computer and Information Technology, La Trobe University

Eric Pardede is currently a Professor with the Department of Computer Science and IT, La Trobe University, Melbourne, Australia. He has published more than 140 peer-reviewed papers in books, journals, and conference proceedings. He has supervised 15 Ph.D. students to completion. His current research interests include data processing and analytics, and IT/science higher education pedagogies and practices.

DOI:

https://doi.org/10.31449/inf.v49i22.11204

Published

12/18/2025

How to Cite

Alfian, M., Yuhana, U. L., Siahaan, D., Munazharoh, H., & Pardede, E. (2025). Contextual Embedding Comparison for Out-of-vocabulary Handling in Indonesian POS Tagging. Informatica, 49(22). https://doi.org/10.31449/inf.v49i22.11204