Integrating Equation-Based Labeling and Classification for Adaptive Turkish Vocabulary Acquisition
Abstract
Traditional vocabulary assessment techniques frequently emphasize correctness over behavioral indicators such as attempt count and response time. To address this gap, our study proposes a machine learning technique that combines behavioral analysis with linguistic insights to identify vocabulary gaps among Turkish language learners. A Support Vector Machine (SVM) model with a Radial Basis Function (RBF) kernel was trained on a dataset of 1,000 interactions from 20 students, with hyperparameters tuned via grid search (C=10, γ=0.1). Behavioral attributes such as attempt count, response time, and answer correctness were collected to quantify student uncertainty and engagement. The approach also integrates word difficulty levels and thematic categories. An equation-based labeling technique was first applied to identify vocabulary weaknesses, laying the foundation for subsequent machine learning classification. The model achieved an accuracy of 89%, precision of 86%, recall of 91%, and an F1-score of 88%, surpassing linear and polynomial kernel alternatives. These results underscore the importance of behavioral metrics in adaptive learning systems and support scalable integration into mobile applications.
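As a reading aid, the following minimal Python sketch illustrates two quantities named in the abstract: the RBF kernel evaluated with the reported γ=0.1, and the F1-score as the harmonic mean of the reported precision and recall. The function names and the sample feature vectors are illustrative assumptions and are not taken from the paper; in practice features would normally be scaled before an RBF kernel is applied.

```python
import math

GAMMA = 0.1  # gamma value reported in the abstract


def rbf_kernel(x, y, gamma=GAMMA):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def f1_score(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical interactions: (attempt count, response time in seconds, correct 0/1).
# These values are made up for illustration only.
interaction_a = (1.0, 2.0, 1.0)
interaction_b = (3.0, 2.5, 0.0)

# Similarity of an interaction with itself is exactly 1.0.
print(rbf_kernel(interaction_a, interaction_a))
# Similarity decays toward 0 as interactions differ more.
print(rbf_kernel(interaction_a, interaction_b))
# The reported precision (0.86) and recall (0.91) give an F1 of about 0.88.
print(round(f1_score(0.86, 0.91), 2))
```

The last line shows that the reported F1-score of 88% is consistent with the reported precision and recall.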
DOI: https://doi.org/10.31449/inf.v49i27.8821
Copyright © Slovenian Society Informatika







