String Transformation Based Morphology Learning

László Kovács, Gabor Szabo

Abstract


There are several morphological methods that can solve the morphological rule induction problem. For different languages this task represents different difficulty levels. In this paper we propose a novel method that can learn prefix, infix and suffix transformations as well. The test language is Hungarian, and we chose a previously generated word pair set of accusative case for evaluating the method, comparing its training time, memory requirements, average inflection time and correctness ratio with some of the most popular models like dictionaries, finite state transducers, the tree of aligned suffix rules and a lattice based method. We also provide multiple training and searching strategies, introducing parallelism and the concept of prefix trees to optimize the number of rules that need to be processed for each input word. This newly created novel method can be applied not only for morphology, but also for any problems in the field of bioinformatics and data mining that can benefit from string transformations learning.

Full Text:

PDF

References


A. Gelbukh, M. Alexandrov and S.-Y. Han (2004) Detecting inflection patterns in natural language by minimization of morphological model, Iberoamerican Congress on Pattern Recognition, Springer, pp. 432–-438.

G. Satta and J. C. Henderson (1997) String transformation learning, In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 444-451.

J. Hajic (1988) Formal morphology, In Proceedings of International Conference on Computational Linguistics, pp. 223-229.

K. Koskenniemi (1983) Two-level morphology: A General Computational Model for Word-Form Recognition and Production, Department of General Linguistics, University of Helsinki, Finland.

L. Bauer (2003) Introducing linguistic morphology, Edinburgh University Press.

J. Oncina, P. García and E. Vidal (1993) Learning subsequential transducers for pattern recognition interpretation tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 15, Number 5, pp. 448-458.

C. De la Higuera (2010) Grammatical inference: learning automata and grammars, Cambridge University Press.

D. Gildea and D. Jurafsky (1995) Automatic Induction of Finite State Transducers for Simple Phonological Rules, Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, Cambridge, Massachusetts, pp. 9-15.

P. Theron and I. Cloete (1997) Automatic acquisition of two-level morphological rules, Proceedings of the fifth conference on Applied natural language processing, pp. 103-110.

J. Goldsmith (2006) An algorithm for the unsupervised learning of morphology, Natural Language Engineering, Volume 12, Number 4, pp. 353-371.

J. Lee and J. Goldsmith (2016) Linguistica 5: Unsupervised Learning of Linguistic Structure, HLT-NAACL Demos, pp. 22-26.

K. Shalonova and P. Flach (2007) Morphology learning using tree of aligned suffix rules, In ICML Workshop: Challenges and Applications of Grammar Induction.

G. Szabó and L. Kovács (2018) Lattice based morphological rule induction, Acta Universitas Apulensis, Number 53, pp. 93-110.

C. Fellbaum (1998) WordNet: An electronic lexical database, MIT Press, Cambridge.

M. Miháltz, Cs. Hatvani, J. Kuti, Gy. Szarvas, J. Csirik, G. Prósz{'e}ky and T. Váradi (2007) Methods and results of the Hungarian WordNet project, In Proceedings of GWC 2008: 4th Global WordNet Conference, University of Szeged, pp. 311-320.

G. Szabó and L. Kovács (2015) Efficiency analysis of inflection rule induction, In Proceedings of the 2015 16th International Carpathian Control Conference (ICCC), IEEE, pp. 521-525.

N. C. Jones (2004) An introduction to bioinformatics algorithms, MIT press.

E. Mourad and Z. Y. Albert (2011) Algorithms in computational molecular biology: techniques, approaches and applications, Volume 21, John Wiley & Sons.

V. I. Levenshtein (1966) Binary codes capable of correcting deletions, insertions, and reversals, Soviet physics doklady, Volume 10, Number 8, pp. 707-710.

S. Cucerzan and E. Brill (2004) Spelling Correction as an Iterative Process that Exploits the Collective Knowledge of Web Users, EMNLP, Volume 4, pp. 293-300.




DOI: https://doi.org/10.31449/inf.v43i4.2520

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.