String Transformation Based Morphology Learning



Introduction
In the area of natural language processing (NLP), word structure is essential information for higher-level analyses such as syntactic parsing, part-of-speech tagging, named entity detection, and sentiment and opinion analysis. The main difference between syntax and morphology is that syntax works at the level of sentences, treating individual words as atoms, while morphology works with intraword components.
According to morphology models, words are built up from morphemes, the smallest morphological units that encode semantic information. There are two types of morphemes: the lemma is the root, grammatically correct form of a word that is associated with the base meaning, while affixes are usually shorter character strings that slightly modify the meaning of the word. Affixes are language dependent and can be prepended (incorrect), appended (flying) or inserted into the word. Prepended affixes are called prefixes, appended affixes are called suffixes, and affixes inserted inside the word are called infixes. The latter category is rare in most languages; one example is the Latin verb vincō, where the n denotes the present tense. The addition of affixes is called inflection, while the inverse operation is called lemmatization.
Languages can be categorized into six main groups based on their morphological features [1]. Analytic languages such as English have a fixed set of possible affixes for each part-of-speech category. Isolating languages like Chinese and Vietnamese usually have words that are their own stems, without any affixes. Languages that have only a few affix types usually use auxiliary words and word position to encode grammatical information. In intraflective languages (Arabic, Hebrew), consonants express the meaning of words, while vowels add the grammatical meaning. Synthetic languages have three subcategories: polysynthetic languages, such as many Native American languages, contain complex words that are equivalent to sentences of other languages; in fusional languages such as Russian, Polish, Slovak and Czech, the morphemes are not easily distinguishable and multiple grammatical relations are often fused into one affix; agglutinative languages such as Hungarian, Finnish and Turkish have many affix types, and each word can contain a large number of affixes.
Since morphology is language dependent, different models are used to learn morphological rules for different languages. Creating such models is a complex task, especially for agglutinative languages. The literature includes approaches based on suffix trees and error-driven learning [2] that store transformation rules optimally and search among them.
Hajic [3] proposed a generalized grammar model suitable for both synthetic and agglutinative languages. The author introduces a controlled rewriting system \(\mathrm{CRS} = \langle A, V, K, t, R \rangle\), where \(A\) is the alphabet, \(V\) is the set of variables, \(K\) contains the grammatical meanings (morphological categories), \(t\) maps the variables to types and \(R\) is a set of atomic rewrite rules. The substitution operation defined in the rewrite rules replaces every variable with a string, and all instances of the same variable are replaced by the same string. The main parameters of an elementary substitution rule include the input state id, the output state id, the variable id, the morphological category and the resulting string. The article provides a formal framework for describing the transformation process, but it does not detail how the rules are generated, since the model assumes that the rule set is constructed by human experts.
In the two-level morphology model [4], the inflected words are represented on two levels. The outer or surface level contains the written form of the words, while the inner or lexical level contains the morphological structures. For example, the surface level word "tries" is related to the lexical level "try+s". The lexical level represents the morphological categories and separator symbols for the surface form. The model uses a dictionary to store the valid lemmas and morpheme categories. The transformation between the lexical level and the surface level is implemented with a set of finite state transducers. A transducer is a special automaton that can model the string transformations.
FSTs (finite state transducers) are widely used in morphological analysis for both generation and recognition. One of the main issues with this model is the computational complexity of its implementations. It was shown [5] that the model is inefficient when working with complex morphological constraints, i.e. complex dependencies among different morpheme units such as vowel harmony. The analysis shows that both recognition and generation are NP-hard problems. One of the most widely known approaches to constructing an FST is the OSTIA method [6,7]. It first builds a prefix tree transducer, then merges states wherever possible, pushes output symbols toward the initial state and eliminates non-determinism.
The OSTIA algorithm was later improved by Gildea and Jurafsky [8], who extended it with a better similarity alignment component. Theron and Cloete [9] proposed a more general method based on edit-distance similarities of the base and inflected words. The algorithm learns two-level transformation rules by calculating the string edit difference between each source-target pair and representing the edit sequences as a minimal acyclic finite state automaton. The constructed automaton can segment the target word into its constituent morphemes. The algorithm determines the minimal discerning context for each rule by comparing all possible contiguous contexts to find the shortest one.
Among more recent results, one important approach is presented in [10] and [11]. Goldsmith's proposal uses a simplified morphology model based on the substitution of suffixes. The words are decomposed into sets of short substrings that play a role similar to morphemes. The method uses the principle of minimum description length (MDL) to determine the appropriate word segmentations.
Another popular and simple method is the tree of aligned suffix rules (TASR) [12], which is well suited for morphological rule induction: according to previous evaluations, it can be built and searched very quickly and provides an outstanding correctness ratio. Unlike dictionary based systems and FSTs, the TASR method can correctly inflect even previously unseen words. Its only downside is that it can only handle inflection rules that modify the end of the input word. In Hungarian, however, we must be able to describe not only suffix rules but also prefix and infix rules.
Besides trees, some existing models use lattice structures to store transformation rules. The goal of [13] is to reduce the lattice size by dropping rules that have a small impact on the overall results. Its rule model uses concepts similar to those of the Levenshtein model, such as additions, removals and replacements. The paper shows that this lattice based model has a very favorable memory footprint, fast inflection time and a correctness ratio of almost 100%.
In this paper we present a novel model called Atomic String Transformation Rule Assembler (ASTRA), whose base concept is similar to TASR, but which can handle any type of affix, including prefixes, infixes and suffixes. Our test language is Hungarian, a morphologically complex, highly agglutinative language that is frequently targeted by morphological model researchers due to its complexity. Hungarian has a large number of affix types that can form long affix chains; moreover, each affix can modify the base form significantly through vowel gradation and changes in consonant length. The inflection rules of the language are complex, and there are several exceptions, too. Besides morphological rule induction, our model can handle other string transformation problems as well. Such problems can be found in bioinformatics (e.g. investigating DNA sequences) and data mining (e.g. data preprocessing, including spelling correction and data cleaning).
The structure of this paper is the following:
- Section 2 introduces the reference methods: dictionary based systems, finite state transducers, the tree of aligned suffix rules and the lattice based method.
- Section 3 describes the novel ASTRA method: its rule model, training phase and inflection phase. We also introduce three search algorithms to speed up inflection.
- Section 4 presents the evaluation of the proposed method. The four metrics we measure and compare with the base methods are the training time, average inflection time, size and correctness ratio.
- Section 5 presents a general application of the ASTRA model.

Dictionary Based Models
One of the most basic methods for learning inflection rules is to use dictionaries. For morphological usage, a dictionary can be considered a relation \(D \subseteq W \times W\): for each input word it returns an output word. Dictionaries usually contain not only the inflected forms of words but also other semantic information such as their meaning, part-of-speech tag, sample sentences and so on. There are many language-specific WordNet projects [14,15] whose goal is to build such databases. Besides being populated by automatic data mining techniques, these databases are often validated and corrected by human experts.
Because of the large amount of data (the Hungarian WordNet contains more than 40,000 synsets, i.e. sets of words with the same meaning), dictionaries take a long time to build. Their advantage is that irregular morphological forms are guaranteed to be retained, since they are not dropped by generalization techniques. Besides the building time, the downside of dictionaries is their lack of generalization: other automated methods can usually handle previously unseen words, but dictionaries can only inflect and lemmatize the words they contain.
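The lack of generalization can be seen in a minimal sketch of a dictionary based inflector. The class name `DictionaryInflector` is hypothetical, and we assume one inflected form per base form, although a relation \(D \subseteq W \times W\) could in general map a word to several outputs.

```python
from typing import Dict, Iterable, Optional, Tuple


class DictionaryInflector:
    """A dictionary D ⊆ W × W used as a map from base forms to inflected forms."""

    def __init__(self, pairs: Iterable[Tuple[str, str]]):
        # Assumption: one inflected form per base form (a later pair overwrites an earlier one).
        self.table: Dict[str, str] = dict(pairs)

    def inflect(self, word: str) -> Optional[str]:
        # Exact lookup only: an unseen word has no entry, so nothing is generalized.
        return self.table.get(word)


d = DictionaryInflector([("try", "tried"), ("cry", "cried")])
print(d.inflect("try"))  # tried
print(d.inflect("fry"))  # None
```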

Finite State Transducers
A finite state automaton (FSA) is the base model for finite state transducers. An FSA is a tuple \(A = \langle Q, \Sigma, q_0, E, F \rangle\), where \(Q\) is the finite set of states, \(\Sigma\) is the input alphabet, \(q_0\) is the start state, \(E : Q \times \Sigma \to Q\) is the state transition function and \(F\) is the set of accepting states.
Finite state transducers (FST) [7] extend this model with additional components, most importantly with output strings. There are multiple transducer models. A rational transducer is a tuple \(T = \langle Q, \Sigma, \Gamma, q_0, E \rangle\), where \(Q\), \(\Sigma\) and \(q_0\) are the same as for an FSA, \(\Gamma\) is the output alphabet and \(E \subset Q \times \Sigma^* \times \Gamma^* \times Q\) is the state transition relation. In practice, \(\Sigma = \Gamma\). A sequential transducer is almost the same, except for two additional conditions: every transition reads exactly one input symbol, i.e. \(E \subseteq Q \times \Sigma \times \Gamma^* \times Q\), and the transitions are deterministic, i.e. \((q, a, u, r), (q, a, v, s) \in E\) implies \(u = v\) and \(r = s\). A subsequential transducer is a special sequential transducer with a sixth component, the state output function \(\sigma : Q \to \Gamma^*\). Such a transducer works as follows: each input character causes a state transition, and the label of this transition is appended to the output string. Finally, the output of the ending state is also appended, resulting in the final output string. A transducer is onward if, for every state, the state's output and the outputs of the transitions starting from this state have no common prefix.
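The way a subsequential transducer produces its output can be illustrated with a small hand-built example. This is a minimal sketch, not Lucene's FST implementation: the class name `SubsequentialTransducer` and the toy transducer for the pairs (try, tried) and (cry, cried) are hypothetical.

```python
from typing import Dict, Optional, Tuple


class SubsequentialTransducer:
    """Deterministic transducer: one input symbol per transition plus a state output σ."""

    def __init__(self, start: int,
                 transitions: Dict[Tuple[int, str], Tuple[str, int]],
                 state_output: Dict[int, str]):
        self.start = start
        self.transitions = transitions    # (state, symbol) -> (output label, next state)
        self.state_output = state_output  # σ : Q -> Γ*, defined on final states only

    def run(self, word: str) -> Optional[str]:
        state, out = self.start, []
        for ch in word:
            step = self.transitions.get((state, ch))
            if step is None:
                return None               # no transition: the input is rejected
            label, state = step
            out.append(label)             # append the label of the transition
        if state not in self.state_output:
            return None
        out.append(self.state_output[state])  # append the ending state's output
        return "".join(out)


fst = SubsequentialTransducer(
    start=0,
    transitions={(0, "t"): ("t", 1), (0, "c"): ("c", 1),
                 (1, "r"): ("r", 2), (2, "y"): ("", 3)},
    state_output={3: "ied"},
)
print(fst.run("try"))  # tried
print(fst.run("fry"))  # None
```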
FSTs are used extensively for string transformations, because they can be minimized to compact sizes and produce the output in time proportional to the length of the input, independently of the size of the transducer. However, as we will see, their generalization ability is not really usable in morphological applications.

Tree of Aligned Suffix Rules
There are three main types of substrings that can change in a word during inflection: prefixes, suffixes and infixes. The substring \(\mathit{pre} \in \Sigma^*\), \(|\mathit{pre}| > 0\), is a prefix of the string \(s_1 \in \Sigma^*\) if there exists another string \(s_2 \in \Sigma^*\) such that \(s_1 = \mathit{pre} + s_2\). Similarly, the substring \(\mathit{suff} \in \Sigma^*\), \(|\mathit{suff}| > 0\), is a suffix of the string \(s_1\) if there exists another string \(s_2\) such that \(s_1 = s_2 + \mathit{suff}\). The substring \(\mathit{inf} \in \Sigma^*\), \(|\mathit{inf}| > 0\), is an infix of the string \(s_1\) if there exist two other strings \(s_2, s_3\) such that \(s_1 = s_2 + \mathit{inf} + s_3\), where \(|s_2| > 0\) and \(|s_3| > 0\).
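The three definitions translate directly into string checks. The sketch below, with the hypothetical helper `substring_role`, returns every role a non-empty substring can play in a word; note that the infix case requires non-empty strings on both sides.

```python
def substring_role(sub: str, s: str) -> set:
    """Return the roles ('prefix', 'suffix', 'infix') that a non-empty sub can play in s."""
    roles = set()
    if not sub:
        return roles                      # the definitions require |sub| > 0
    if s.startswith(sub):
        roles.add("prefix")               # s = sub + s2
    if s.endswith(sub):
        roles.add("suffix")               # s = s2 + sub
    # infix: s = s2 + sub + s3 with |s2| > 0 and |s3| > 0
    for i in range(1, len(s) - len(sub)):
        if s[i:i + len(sub)] == sub:
            roles.add("infix")
            break
    return roles


print(substring_role("in", "incorrect"))  # {'prefix'}
print(substring_role("ing", "flying"))    # {'suffix'}
print(substring_role("n", "vinco"))       # {'infix'}
```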
The TASR model can only work with morphological rules that modify the end of the words, meaning that it can only model suffix transformations. This restriction is acceptable for morphologically simpler languages, but complex agglutinative languages often contain prefix and infix transformation rules as well.
The goal of the TASR learning phase is to generate a set of suffix rules from a set of training word pairs. In this paper, this rule set is denoted by \(\mathcal{R}_T = \{R_T\}\). A suffix rule consists of two components, \(R_T = \langle \sigma_T, \tau_T \rangle\). Here, \(\sigma_T\) contains the word-ending characters that are modified by the rule, and \(\tau_T\) contains the replacement characters. For example, for the English verb try, whose past tense is tried, we can generate a suffix rule with \(\sigma_T = \mathrm{y}\) and \(\tau_T = \mathrm{ied}\).
From a word pair such as (try, tried), we can generate multiple aligned suffix rules. The minimal suffix rule is (y, ied), and by extending this rule with one character at a time, we get (ry, ried) and (try, tried).
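Generating the aligned suffix rules of a word pair can be sketched as follows. We assume that the minimal rule starts right after the longest common prefix of the two words; `aligned_suffix_rules` is a hypothetical helper name, not part of the TASR paper.

```python
def aligned_suffix_rules(base: str, inflected: str):
    """Return the minimal suffix rule (σ, τ) and its left extensions, shortest first."""
    # length of the longest common prefix: these characters are left unchanged
    p = 0
    while p < min(len(base), len(inflected)) and base[p] == inflected[p]:
        p += 1
    # the minimal rule is base[p:] -> inflected[p:]; extend it one character at a time
    return [(base[k:], inflected[k:]) for k in range(p, -1, -1)]


print(aligned_suffix_rules("try", "tried"))
# [('y', 'ied'), ('ry', 'ried'), ('try', 'tried')]
```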
We can define a frequency metric \(\mathrm{freq}(R_T \mid I)\) for each rule \(R_T\) based on the training word pair set \(I = \{(w_1, w_2) \mid w_1, w_2 \in W\}\), counting the number of word pairs to which \(R_T\) applies.
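A minimal sketch of the frequency metric follows. We assume that a rule applies to a pair when the base form ends with \(\sigma_T\) and replacing that ending with \(\tau_T\) yields exactly the inflected form; the helper names are hypothetical.

```python
def rule_applies(rule, pair) -> bool:
    """Assumed meaning of 'applies': base ends with σ and swapping it for τ gives the target."""
    sigma, tau = rule
    base, inflected = pair
    # base[:len(base) - len(sigma)] also works when σ is empty
    return base.endswith(sigma) and base[:len(base) - len(sigma)] + tau == inflected


def freq(rule, pairs) -> int:
    """freq(R_T | I): the number of training pairs to which the rule applies."""
    return sum(1 for pair in pairs if rule_applies(rule, pair))


I = [("try", "tried"), ("cry", "cried"), ("play", "played")]
print(freq(("y", "ied"), I))  # 2
print(freq(("", "ed"), I))    # 1
```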
For every word pair in the training set, we first generate all the aligned suffix rules according to the above definitions and insert these rules into a tree \((T, \subseteq)\). This tree consists of nodes \(n_{T1}, n_{T2}, \ldots, n_{Tm}\), each node \(n_{Ti}\) associated with a set of rules \(n_{Ti} \to R_{Tij} = \langle \sigma_{Tij}, \tau_{Tij} \rangle\). All the rules associated with the same node have the same context.
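Let us consider two nodes, \(n_{T\downarrow}\) and \(n_{T\uparrow}\), associated with the rules \(R_{T\downarrow i} = \langle \sigma_{T\downarrow i}, \tau_{T\downarrow i} \rangle\) and \(R_{T\uparrow j} = \langle \sigma_{T\uparrow j}, \tau_{T\uparrow j} \rangle\), respectively. The node \(n_{T\downarrow}\) is a child of \(n_{T\uparrow}\), or shortly \(n_{T\downarrow} \subset n_{T\uparrow}\), if \(\exists x \in \Sigma : \forall i, j : \sigma_{T\downarrow i} = x + \sigma_{T\uparrow j}\), i.e. the child's context extends the parent's context by one character on the left. The root node and its rules are denoted by \(n_{T\Uparrow} \to R_{T\Uparrow k} = \langle \sigma_{T\Uparrow k}, \tau_{T\Uparrow k} \rangle\). For the root, the following condition applies: \(\forall k : \sigma_{T\Uparrow k} = \min_{i,j} \sigma_{Tij}\).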
With these definitions, we can define the winning rule of node \(n_{T\downarrow}\) among its associated rules \(R_{T\downarrow i} = \langle \sigma_{T\downarrow i}, \tau_{T\downarrow i} \rangle\). Let \(n_{T\uparrow}\) be the parent node with rules \(R_{T\uparrow j} = \langle \sigma_{T\uparrow j}, \tau_{T\uparrow j} \rangle\). After that, we can build the tree from the generated rules. Typically, the most general rules are close to the root node, while the most specific rules are stored in the leaves. Therefore, during inflection we can search the tree bottom-up, returning the winning rule of the first node whose context matches the input word. Since we start at the leaves, the first matching rule is the most specific one, i.e. the one with the longest context. As a result, the inflected form mirrors the main characteristics of the training data.
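The effect of the bottom-up search, namely that the most specific matching rule wins, can be sketched without an explicit tree by looking up contexts from the longest to the shortest. This is a simplification under stated assumptions: the winning rule of a context is taken to be its most frequent replacement (the exact TASR winner criterion is not reproduced here), and `train` and `inflect` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def train(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every suffix context σ to a single winning replacement τ."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for base, inflected in pairs:
        # the minimal rule starts after the longest common prefix; then extend leftwards
        p = 0
        while p < min(len(base), len(inflected)) and base[p] == inflected[p]:
            p += 1
        for k in range(p, -1, -1):
            counts[base[k:]][inflected[k:]] += 1
    # Assumed winner criterion: the most frequent replacement for the context.
    return {sigma: c.most_common(1)[0][0] for sigma, c in counts.items()}


def inflect(word: str, winners: Dict[str, str]) -> Optional[str]:
    """Try contexts from longest to shortest, i.e. the most specific rule first."""
    for k in range(len(word) + 1):
        sigma = word[k:]
        if sigma in winners:
            return word[:k] + winners[sigma]
    return None


winners = train([("try", "tried"), ("cry", "cried"), ("play", "played")])
print(inflect("dry", winners))   # dried  (context "ry")
print(inflect("stay", winners))  # stayed (context "ay" beats the more general "y")
```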

Lattice Based Method
In the rule model of the examined lattice based inflection method [13], a rule consists of the following components: \(\alpha \in \Sigma^*\) is the prefix of the rule, containing the characters before the changing part; \(\sigma \in \Sigma^*\) is the core of the rule, i.e. the changing part; \(\omega \in \Sigma^*\) is the postfix of the rule, containing the characters after the changing part; \(\overrightarrow{\eta} \in \mathbb{N}\) is the front index and \(\overleftarrow{\eta} \in \mathbb{N}\) is the back index of the rule context's occurrence in the source word; and \(\delta_i\) is a list of simple transformation steps on the core. These rules are generated automatically from training word pairs and then inserted into a lattice structure, where the parent-child relationship is based on rule context containment. In the original paper we formalized multiple lattice builder algorithms that aim to reduce the size of the resulting lattice. The best builder only inserts the rules and intersections that are actually responsible for the high correctness ratio; every other, redundant rule is eliminated.
As we will see, the size characteristics of this model are very promising, but because of its high degree of generalization, the lattice can inflect some words incorrectly. This is due to the overgeneralization effect of the lattice model itself.

Atomic string transformation rule assembler
The goal of the Atomic String Transformation Rule Assembler (ASTRA) model is to collect atomic, elementary patterns from a training word pair set during the training phase, and to use the best matching atomic rules for each input word during the production phase. For each input, every matching, non-overlapping atomic rule is applied to produce the inflected form. As discussed previously, with these concepts the proposed method can model prefix, infix and suffix inflection rules alike, and thus can be used for morphologically complex agglutinative languages.
First, we define an extended alphabet that makes it easier to determine where a word starts and ends. We introduce two special characters: $ marks the start of the word and # marks the end of the word. If a rule's context contains either of these special characters, it is easy to tell whether the beginning or the end of the word needs to be transformed.
Of course, these characters are not part of the original alphabet \(\Sigma\). The extended alphabet is denoted by \(\bar{\Sigma} = \Sigma \cup \{\$, \#\}\). We also define a new operator on strings that prepends $ and appends # to the string \(w\): \(\mu(w) = \bar{w} = \$ + w + \#\). The inverse operation drops the special characters from the extended word: \(\mu^{-1}(\bar{w}) = w\). The set of extended words is denoted by \(\bar{W}\).
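The extension operator and its inverse are simple string operations. In this sketch, `mu` and `mu_inv` are hypothetical names, and, as in the text, we assume that $ and # never occur in the original alphabet.

```python
START, END = "$", "#"  # boundary markers, assumed not to occur in Σ


def mu(word: str) -> str:
    """µ(w) = $ + w + #."""
    return START + word + END


def mu_inv(extended: str) -> str:
    """µ^-1: drop the boundary markers of an extended word."""
    if not (extended.startswith(START) and extended.endswith(END)):
        raise ValueError(f"not an extended word: {extended!r}")
    return extended[1:-1]


print(mu("dob"))              # $dob#
print(mu_inv("$ledobott#"))   # ledobott
```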
The input of the training process for the new method is the same set of word pairs containing the base form and the inflected form of each word, but the first step of the algorithm is to extend these word pairs with the special characters. After the extension, we get a new training set \(\bar{I} = \{(\bar{w}_1, \bar{w}_2)\}\).
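We split each word pair into corresponding segments \(\psi^i_1 \to \psi^i_2\), so that \(\bar{w}_1 = \psi^1_1 + \psi^2_1 + \cdots + \psi^k_1\) and \(\bar{w}_2 = \psi^1_2 + \psi^2_2 + \cdots + \psi^k_2\). A segment is called variant if \(\psi^i_1 \neq \psi^i_2\); otherwise it is called invariant. In a segment decomposition, variant and invariant segments alternate.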
As one word pair might have multiple segment decompositions, we need to select the best one. To quantify the quality of a decomposition, we use a segment fitness formula that returns how well aligned the segment \(\psi^i_1 \to \psi^i_2\) is:

Example 3.1. Let us take the training word pair (dob, ledobott), roughly "throw" and "threw down" in Hungarian, to demonstrate the segment decomposition algorithm. First, the words are extended with the special characters: ($dob#, $ledobott#). One valid segment decomposition is \((\psi^1_1 = \$,\ \psi^1_2 = \$\mathrm{le})\), \((\psi^2_1 = \mathrm{dob},\ \psi^2_2 = \mathrm{dob})\), \((\psi^3_1 = \#,\ \psi^3_2 = \mathrm{ott}\#)\). The middle segment is invariant, while the first and last ones are variant segments.
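To make the notion of alternating segments concrete, the sketch below swaps in a different technique: the matching-block heuristic of Python's `difflib.SequenceMatcher` instead of the segment fitness formula used by ASTRA. It therefore produces a different, but still alternating, decomposition of Example 3.1: the variant segments come out as ('', 'le') and ('', 'ott') rather than ($, $le) and (#, ott#). The helper name `decompose` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def decompose(w1: str, w2: str) -> List[Tuple[str, str, bool]]:
    """Split an extended word pair into segments (ψ1, ψ2, is_variant)."""
    matcher = SequenceMatcher(None, w1, w2, autojunk=False)
    # get_opcodes alternates 'equal' blocks with replace/insert/delete blocks
    return [(w1[i1:i2], w2[j1:j2], tag != "equal")
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


for segment in decompose("$dob#", "$ledobott#"):
    print(segment)
# ('$', '$', False)
# ('', 'le', True)
# ('dob', 'dob', False)
# ('', 'ott', True)
# ('#', '#', False)
```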
For each variant segment, we can define so-called atomic rules of the form \(R_A = (\alpha_A, \sigma_A, \tau_A, \omega_A)\), where \(\alpha_A\) is the prefix and \(\omega_A\) is the suffix of the rule. The rule context that must later be searched for in the input words is \(\gamma_A(R_A) = \alpha_A + \sigma_A + \omega_A\). Thanks to the \(\alpha_A\) and \(\omega_A\) components, this rule model is not limited to suffix rules.
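An atomic rule can be represented as a small immutable record. This is a minimal sketch with hypothetical names; its `apply` method follows the transformation described in the text (search for \(\gamma_A\) and replace the \(\sigma_A\) part with \(\tau_A\)), and it uses the first occurrence of the context, which is unambiguous for contexts that occur only once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomicRule:
    alpha: str  # α_A: characters required before the changing part
    sigma: str  # σ_A: the changing part in the base form
    tau: str    # τ_A: its replacement
    omega: str  # ω_A: characters required after the changing part

    def context(self) -> str:
        """γ_A(R_A) = α_A + σ_A + ω_A."""
        return self.alpha + self.sigma + self.omega

    def apply(self, word: str) -> Optional[str]:
        """Find γ_A in the word and replace its σ_A part with τ_A (first occurrence)."""
        pos = word.find(self.context())
        if pos < 0:
            return None
        start = pos + len(self.alpha)
        return word[:start] + self.tau + word[start + len(self.sigma):]


prefix_rule = AtomicRule("", "$", "$le", "dob#")
suffix_rule = AtomicRule("$dob", "#", "ott#", "")
print(prefix_rule.context())         # $dob#
print(prefix_rule.apply("$dob#"))    # $ledob#
print(suffix_rule.apply("$dob#"))    # $dobott#
```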
Then, we can extend this core atomic rule by one character at a time on the left and right sides, symmetrically. Let's assume that In this case, the extended rule candidates are \(R_{Aij} = (\alpha_{Aij}, \sigma_{Aij}, \tau_{Aij}, \omega_{Aij})\) with the following components \((\forall\, 1 \le j \le \min\{n, m\})\): Here, \(w[i, j]\) denotes the substring of \(w\) from the \(i\)th to the \(j\)th character.
To make the generated atomic rules unambiguous, we have to make sure that the context of each rule appears only once in the base form of the word (\(\bar{w}_1\)). Every atomic rule candidate whose context appears more than once in the base form is dropped from the final set. Transforming a word \(\bar{w} \in \bar{W}\) with the atomic rule \(R_A = (\alpha_A, \sigma_A, \tau_A, \omega_A)\) means that we search for \(\gamma_A(R_A)\) in \(\bar{w}\) and replace \(\sigma_A\) with \(\tau_A\).
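Candidate generation together with the uniqueness filter can be sketched as below. Because the exact extension formulas are not given above, we assume that every combination of left extension length i and right extension length j is tried; the helper names are hypothetical. Under this assumption, the variant segments of Example 3.1 yield ten rules with nine distinct contexts.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    return sum(1 for k in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, k))


def atomic_rule_candidates(base: str, start: int, end: int, tau: str):
    """Rules (α, σ, τ, ω) around base[start:end] whose context occurs exactly once.

    Assumption: every left extension length i and right extension length j is tried.
    """
    sigma = base[start:end]
    rules = []
    for i in range(start + 1):                # i characters taken from the left
        for j in range(len(base) - end + 1):  # j characters taken from the right
            alpha, omega = base[start - i:start], base[end:end + j]
            if count_occurrences(base, alpha + sigma + omega) == 1:
                rules.append((alpha, sigma, tau, omega))
    return rules


print(atomic_rule_candidates("$dob#", 4, 5, "ott#"))
# [('', '#', 'ott#', ''), ('b', '#', 'ott#', ''), ('ob', '#', 'ott#', ''),
#  ('dob', '#', 'ott#', ''), ('$dob', '#', 'ott#', '')]
```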
The base form of the method does not require building a tree; we can simply group the atomic rules by their contexts. A rule group is defined as a set of atomic rules that share the same context, \(\Gamma_A(\gamma) = \{R_A \mid \gamma_A(R_A) = \gamma\}\). For the atomic rules of example 3.2, we can produce nine different rule groups, each containing a single atomic rule, except for the rule group with context $dob#, which contains both (−, $, $le, dob#) and ($dob, #, ott#, −), where − denotes the empty string.
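Grouping by context is a single pass over the rules. In this sketch, `group_by_context` is a hypothetical name, and the ten input rules are an assumption: they are the rules one obtains for ($dob#, $ledobott#) when every left and right extension is kept, which is consistent with the nine groups described for example 3.2.

```python
from collections import defaultdict


def group_by_context(rules):
    """Γ_A: atomic rules (α, σ, τ, ω) that share the same context α + σ + ω."""
    groups = defaultdict(list)
    for rule in rules:
        alpha, sigma, _tau, omega = rule
        groups[alpha + sigma + omega].append(rule)
    return dict(groups)


rules = [("", "$", "$le", ""), ("", "$", "$le", "d"), ("", "$", "$le", "do"),
         ("", "$", "$le", "dob"), ("", "$", "$le", "dob#"),
         ("", "#", "ott#", ""), ("b", "#", "ott#", ""), ("ob", "#", "ott#", ""),
         ("dob", "#", "ott#", ""), ("$dob", "#", "ott#", "")]
groups = group_by_context(rules)
print(len(groups))           # 9
print(groups["$dob#"])       # [('', '$', '$le', 'dob#'), ('$dob', '#', 'ott#', '')]
```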
The goal of the training phase is to produce a set of rule groups \(\mathcal{R}_A = \{\Gamma_A\}\) from the training word pair set \(\bar{I}\). The generated atomic rule set can then be used to inflect new input words.

For each input, our goal is to choose atomic rules that match the input word. Rules with longer matching substrings in the input word are better than rules with shorter ones. The fitness function of a rule depends on a parameter k and on a function θ that returns how similar the rule context is to the input word. For simplicity, we used k = 1 and a discrete θ function that returns 1 if \(\gamma(R_A)\) is a substring of \(\bar{w}\), and 0 otherwise. Using this fitness function, we choose the first n atomic rules that best suit the given input word, where n is a parameter.

We implemented three separate candidate selector algorithms. The first one is a sequential algorithm that processes the rule groups one by one: if a rule group's context matches the input word, its atomic rules are added to the resulting set of candidate rules. The second one is a parallel algorithm that does the same in a divide and conquer manner, processing the rule groups in parallel; the number of threads depends on the number of available CPU cores. The third one uses a prefix tree that is built from the rule groups during the training phase. With the prefix tree, we can speed up the candidate search by looking up substrings of the input word: if a substring is found in the prefix tree, the corresponding rule group's atomic rules are added to the resulting set.
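The sequential and the prefix tree based selectors can be sketched as follows; the parallel variant would split the loop of the sequential one across workers. All names are hypothetical, and the ranking (longer matching context first, ties broken alphabetically) is our assumed stand-in for the fitness function with k = 1 and the discrete θ.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str, str, str]  # (α, σ, τ, ω)
Groups = Dict[str, List[Rule]]    # context γ -> rule group Γ


def _rank(contexts) -> List[str]:
    # Assumed fitness: longer matching contexts first, ties broken alphabetically.
    return sorted(contexts, key=lambda c: (-len(c), c))


def select_sequential(word: str, groups: Groups, n: int) -> List[Rule]:
    """Check every rule group; θ = 1 when its context is a substring of the word."""
    hits = [ctx for ctx in groups if ctx in word]
    return [rule for ctx in _rank(hits) for rule in groups[ctx]][:n]


def build_trie(groups: Groups) -> dict:
    """Prefix tree over the group contexts; the key None marks the end of a context."""
    root: dict = {}
    for ctx in groups:
        node = root
        for ch in ctx:
            node = node.setdefault(ch, {})
        node[None] = ctx
    return root


def select_trie(word: str, groups: Groups, trie: dict, n: int) -> List[Rule]:
    """Walk the trie from every start position of the word to find matching contexts."""
    found = set()
    for start in range(len(word)):
        node = trie
        for ch in word[start:]:
            node = node.get(ch)
            if node is None:
                break
            if None in node:
                found.add(node[None])
    return [rule for ctx in _rank(found) for rule in groups[ctx]][:n]


groups = {"$dob#": [("", "$", "$le", "dob#"), ("$dob", "#", "ott#", "")],
          "$": [("", "$", "$le", "")], "#": [("", "#", "ott#", "")]}
trie = build_trie(groups)
print(select_trie("$fut#", groups, trie, n=5))
# [('', '#', 'ott#', ''), ('', '$', '$le', '')]
```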
Since multiple overlapping rule candidates might transform the same substring of the word, leading to ambiguity, only the first of these rules is used and the others are dropped. After choosing the best non-overlapping rules, we apply them one by one to the input word, producing its inflected form.
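Dropping overlapping candidates and applying the rest can be sketched as below. The helper name is hypothetical; we assume that two rules overlap when the positions of their \(\sigma\) parts intersect, and we rewrite from right to left so that earlier positions remain valid. With the two rules of the $dob# group, the training target is reproduced, and the shorter-context rules also generalize to an unseen word (fut, "run").

```python
def apply_non_overlapping(word, rules):
    """Apply candidate rules (α, σ, τ, ω) in priority order, skipping overlapping ones."""
    edits = []  # (start, end, replacement) for the σ parts that will be rewritten
    for alpha, sigma, tau, omega in rules:
        pos = word.find(alpha + sigma + omega)
        if pos < 0:
            continue
        start = pos + len(alpha)
        end = start + len(sigma)
        # keep the rule only if its σ span is disjoint from every accepted span
        if all(end <= s or start >= e for s, e, _ in edits):
            edits.append((start, end, tau))
    # rewrite from right to left so that earlier indices stay valid
    for start, end, tau in sorted(edits, reverse=True):
        word = word[:start] + tau + word[end:]
    return word


print(apply_non_overlapping("$dob#", [("", "$", "$le", "dob#"), ("$dob", "#", "ott#", "")]))
# $ledobott#
print(apply_non_overlapping("$fut#", [("", "#", "ott#", ""), ("", "$", "$le", "")]))
# $lefutott#
```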

Evaluation of the proposed method
For evaluation purposes, we used a training word pair set generated by [16]. We chose the Hungarian accusative case. We compared a custom dictionary implementation, Lucene's FST method, the TASR model, the previously mentioned lattice based method and the proposed ASTRA method, measuring their training times, average inflection times, rule base sizes and correctness ratios, i.e. the percentage of evaluation words that are inflected correctly after the training phase. If \(W^+\) is the set of evaluation words for which the model yields a correct inflected form and \(W^-\) is the set of failed evaluation words, then the correctness ratio is \(|W^+| / (|W^+| + |W^-|)\). Where applicable, we also measured the differences between the sequential, parallel and prefix tree search algorithms of ASTRA.
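The correctness ratio can be computed with a few lines of code. In this sketch, `correctness_ratio` is a hypothetical name, a model is any function from a base form to a predicted inflected form, and an empty evaluation set is treated as an error.

```python
from typing import Callable, Optional, Sequence, Tuple


def correctness_ratio(model: Callable[[str], Optional[str]],
                      evaluation: Sequence[Tuple[str, str]]) -> float:
    """|W+| / (|W+| + |W-|) over (base form, expected inflected form) pairs."""
    if not evaluation:
        raise ValueError("empty evaluation set")
    correct = sum(1 for base, expected in evaluation if model(base) == expected)  # |W+|
    failed = len(evaluation) - correct                                            # |W-|
    return correct / (correct + failed)


print(correctness_ratio({"try": "tried"}.get, [("try", "tried"), ("cry", "cried")]))  # 0.5
```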
Figure 1a shows the training time of the methods, using a logarithmic scale on the y axis. There are three clusters based on training time. The fastest solution is to store the already available word pairs in a dictionary, because we only have to store these records and no extra processing occurs. Building an FST is next in line, with characteristics very similar to those of the ASTRA method; if we include the prefix tree building as well, ASTRA's training time increases slightly. The third cluster consists of the TASR and the lattice based methods: building a tree of aligned suffix rules takes more time than the previous methods, and the complexity of the lattice adds even more time to TASR's results.

Figure 1b shows the average inflection time of the methods. As expected, if we use an appropriate hash function in the dictionary implementation, retrieving the matching record for each input word takes almost constant time. The second best method in terms of average inflection time is the FST: its curve is also very flat, but slightly higher than the dictionary's. ASTRA with a prefix tree comes next, very close to the lattice based method. The remaining methods have much steeper curves: TASR comes next, with the parallel search variant of ASTRA very close to it, while the worst inflection time is achieved by the sequential search variant. Note that although the prefix tree search variant gives ASTRA's best inflection time, it adds some overhead to the training time. Even with this overhead, however, it is worth using.
In Figure 2 we can see the overall size of the rule bases, i.e. the number of word pairs in the dictionary, states in the FST, nodes in the TASR and the lattice, and atomic rules in case of ASTRA.
It is not surprising that ASTRA generates more atomic rules than there are nodes in the tree of aligned suffix rules, since the atomic rule definition allows multiple variant segments in a word pair, and multiple core and extended atomic rules can be produced from each variant segment. TASR, on the other hand, generates only one minimal suffix rule per word pair plus its aligned extensions. The advantage of the ASTRA model is that even with this higher number of rules and the prefix tree, it can be trained faster than TASR. Moreover, it covers more cases, including prefix, infix and suffix rules. The FST has better size characteristics, because its builder algorithm merges every state that can be merged without losing information from the original training word pair set. The line of the dictionary shows that the number of states in the FST and the number of rules in ASTRA and TASR are higher than the number of input word pairs. The minimal lattice builder algorithm produces an even smaller structure: the number of nodes in the resulting lattice is lower than the size of all the other structures.

Finally, Figures 3a and 3b show the correctness ratio of the models. The results on the left side were achieved using disjoint training and evaluation word pair sets. The correctness ratio plateaus slightly below 95% for TASR and ASTRA, with ASTRA performing slightly better. The lattice based method is worse, probably because of its higher degree of generalization: when we compared its results with those of TASR and ASTRA, we saw that in multiple cases the lattice found a node whose rule resulted in an invalid inflected form. The correctness ratio of the dictionary and the FST is 0%, because they could not generalize at all. For the dictionary, this is understandable, because a dictionary is a static map of word pairs.
On the other hand, although an FST can generalize, these types of morphological applications don't benefit from this generalization, as the generalized transformations do not result in real inflection rules.
On the right side of the figure, we can see what happens if we train the methods on the first 100, 200, ..., 10,000 word pairs and then use the same 10,000 word pairs for evaluation. All the methods reach an almost 100% correctness ratio at the end of the diagram. The only reason we cannot reach 100% is that the training word pair set contains records with the same lemma and different inflected forms, such as örömöt and örömet, two valid inflected forms of the Hungarian word öröm (joy in English). The difference lies in the shapes of the curves. The dictionary and the FST cannot really generalize inflection rules, so their lines are linear. The other methods reach higher percentages more quickly, and ASTRA is even better than TASR in that it achieves a higher correctness ratio with fewer training word pairs. The lattice based method is worse than TASR and ASTRA in this case as well.

General application of the ASTRA model
One of the scientific areas where string algorithms, including string transformation based methods, are applied is bioinformatics and computational biology [17]. DNA sequences are modelled as strings over four characters matching the four types of bases: adenine (A), thymine (T), guanine (G) and cytosine (C). One of the goals of bioinformatics is to compare genes in DNA to find important regions, find out which region is responsible for which functions and features, and determine how genetic information is encoded. DNA analysis is a very computationally intensive task, which is why modelling, statistical algorithms and mathematical techniques are important for success. Besides string transformations, computational biology uses many string matching and comparison techniques as well [18]. Finding the longest matching substrings of two strings (DNA sequences) helps in finding the best DNA alignments and thus in comparing different DNA sequences, finding matching parts and differences. One of the techniques used for this comparison is the edit distance originally introduced by Levenshtein [19], which is also widely used in morphological analysis.
Another application area of string transformation based methods is data mining. Data mining engines usually consist of multiple phases that extract information from unannotated training data such as long free texts. The first phase is often called data cleaning, where the raw input data is preprocessed so that invalid records are either removed or fixed before the data mining algorithms run. One way to fix typos and other errors in free texts is spelling correction. Spelling correction can be interpreted as learning the string transformations that transform an unknown word containing typos into the closest known word. There are multiple techniques to solve this problem; iterative algorithms usually perform better, as a word can contain multiple errors that are easier to fix in multiple steps [20]. The goal is to find a word \(w \in W\) for any unknown string \(s\) such that their distance \(d(w, s)\) is lower than an acceptable threshold \(\delta\).
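The thresholded lookup \(d(w, s) < \delta\) can be sketched with the Levenshtein distance as \(d\). This is a single-step correction over a known word list, not the iterative algorithms of [20]; `levenshtein` and `correct` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def correct(s: str, lexicon, delta: int):
    """The closest known word w with d(w, s) < delta, or None if none is close enough."""
    best = min(lexicon, key=lambda w: levenshtein(w, s), default=None)
    if best is not None and levenshtein(best, s) < delta:
        return best
    return None


print(levenshtein("kitten", "sitting"))        # 3
print(correct("helo", ["hello", "world"], 2))  # hello
```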
A third, more intuitive, non-morphological application of the ASTRA model is character sorting. Let \(s \in \Sigma^*\) be a random string of length \(|s| = n\). The goal is to rearrange the characters of \(s\) so that \(s_i \le s_{i+1}\) for each index \(1 \le i < |s|\), according to a given total ordering, for example alphabetical order.
For our evaluation, we used input lengths 100, 200, ..., 3000. For each input length, we generated a random string and applied a pre-trained ASTRA model to the string repeatedly until the output of an iteration was equal to its input. For the training process of ASTRA, we generated a training word pair set in which each word pair contained a necessary transformation as its core, such as (ba, ab), (ca, ac), ..., (zy, yz). To make the rules noisier, we also generated a random string of 10 characters and prepended and appended it to both words of the word pair; this random prefix-suffix part was different for each word pair. All the results were correct. The number of required iterations and the sorting time are displayed in Figure 4.
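The iteration scheme can be sketched without training: below, the core rules (ba, ab), (ca, ac), ..., (zy, yz) are written down directly (without the random noisy context) and, mimicking ASTRA's application step, all matching non-overlapping rules are applied in each pass until a pass leaves the string unchanged. This is a hypothetical illustration of the idea, not the trained ASTRA model; the lowercase alphabet a-z is an assumption. Each applied swap removes exactly one inversion, so the loop terminates, and a string with no adjacent out-of-order pair is sorted.

```python
import random
import string
from typing import Dict, Tuple


def swap_rules(alphabet: str) -> Dict[str, str]:
    """Core rules σ -> τ such as (ba, ab): every adjacent pair that is out of order."""
    return {x + y: y + x for x in alphabet for y in alphabet if x > y}


def one_pass(s: str, rules: Dict[str, str]) -> str:
    """Apply every matching, non-overlapping rule in one left-to-right pass."""
    out, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in rules:
            out.append(rules[pair])
            i += 2  # skip both characters so the next match cannot overlap
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def sort_by_rules(s: str, rules: Dict[str, str]) -> Tuple[str, int]:
    """Repeat passes until the output equals the input; return (result, passes)."""
    passes = 0
    while True:
        passes += 1
        t = one_pass(s, rules)
        if t == s:
            return s, passes
        s = t


rules = swap_rules(string.ascii_lowercase)
print(sort_by_rules("dcba", rules))  # ('abcd', 5)
text = "".join(random.choice(string.ascii_lowercase) for _ in range(100))
print(sort_by_rules(text, rules)[0] == "".join(sorted(text)))  # True
```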
Unlike ASTRA, the other examined methods could not sort the characters correctly. As we saw previously, the dictionary and FST methods cannot be used for inputs that are not present in the training word pair set. TASR can only transform inputs that need to be modified at the end. The disadvantage of the lattice based method in this case is that it is not position agnostic; therefore, it cannot determine the atomic transformations necessary for sorting the characters.

Conclusion
In this paper we presented the novel ASTRA model. The motivation was that although the TASR method handles suffix rules extremely well, it cannot describe rules that modify the beginning or the middle of words. In Hungarian, the target language of our research, there are a few affix types with prefix inflection rules. The proposed rule model contains multiple components to store not only the changing part of the word but also its preceding and following characters. We also defined a novel training algorithm that generates such rules and stores them in rule groups. We defined a fitness function that helps us choose the best rules from the rule database for each input word, so that the inflected form can be produced easily. Finally, we implemented three search algorithms: a sequential, a parallel and a prefix tree based search function. We evaluated the proposed method by comparing its training time, average inflection time, size and correctness ratio with those of several base models, including a dictionary based system, Lucene's FST implementation, the TASR method and a lattice based model. The training time of ASTRA is excellent: only the dictionary's and the FST's training times are better, even if we also build a prefix tree from the generated rules. The same holds for the average inflection times. The rule base of ASTRA is the largest among the compared methods, but this is not a real problem, because the inflection time does not get worse and we can handle more general inflection rules. The correctness ratio is also excellent; moreover, ASTRA reaches higher percentages with less knowledge, i.e. with fewer training word pairs than, for example, the TASR method. Besides these metrics, the advantage of the proposed ASTRA method is that it can be used not only for morphological rule induction but for any problem that can be modelled with string transformations.
To demonstrate this, we adapted ASTRA to a character sorting problem, achieving a correctness ratio of 100%.

Acknowledgement
The described article/presentation/study was carried out as part of the EFOP-3.6.1-16-00011 "Younger and Renewing University -Innovative Knowledge City -institutional development of the University of Miskolc aiming at intelligent specialisation" project implemented in the framework of the Szechenyi 2020 program. The realization of this project is supported by the European Union, co-financed by the European Social Fund.