Graph Theoretical View on Text Understanding
Abstract
The STAVEK-02 system described in this contribution aims to provide supplementary information for text understanding (beyond the parsing and tagging of words) by clustering nouns and/or verbs according to their meanings and common features. The system consists of two word-processing blocks. The first block is a vocabulary of 149,000 Slovenian word roots and 3,100 endings; it assigns grammatical features to words by grammatical rules, without any link to pre-tagged lexical corpora. The second block is a network of meanings of Slovenian words, which is in principle a graph connecting 45,000 noun lexemes and 15,000 verb lexemes, all of them hierarchically clustered into progressively larger groups that exhibit specific features and/or common properties of the words they encompass. In similar lexical systems such formations are usually called synsets. Because the synsets (groups) in the graph are completely connected, it is possible to find all property/feature paths between any pair of words (nouns and/or verbs) in the network. Because the words are clustered according to their meanings during the parsing of one, two, or several consecutive sentences, the features and properties that appear on the closest path between particular words within a sentence are quite informative for the interpretation of the text. Clustering words according to their meanings during the parsing of text is a novel concept in text interpretation. Using a simple example of parsing a sentence and clustering the nouns within it, the concept of using the network of meanings in the program STAVEK-02 is described and discussed.
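The idea of reading features off the closest path between two words can be illustrated with a minimal sketch. It is not the STAVEK-02 implementation: the toy hierarchy, the English words, and the names `PARENT`, `build_graph`, and `closest_path` are all hypothetical, and breadth-first search is assumed here only as one simple way to find a shortest path in an unweighted graph.

```python
from collections import deque

# Hypothetical toy hierarchy (word or group -> enclosing group).
# These entries are illustrative only and are not taken from STAVEK-02.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "living being",
    "oak": "tree",
    "tree": "plant",
    "plant": "living being",
}


def build_graph(parent):
    """Turn the child -> group mapping into an undirected adjacency map."""
    graph = {}
    for child, group in parent.items():
        graph.setdefault(child, set()).add(group)
        graph.setdefault(group, set()).add(child)
    return graph


def closest_path(graph, start, goal):
    """Return a shortest path between two nodes via breadth-first search,
    or None if either node is missing or no path exists."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


GRAPH = build_graph(PARENT)
print(closest_path(GRAPH, "dog", "cat"))
print(closest_path(GRAPH, "dog", "oak"))
```

In this sketch the intermediate nodes on the returned path play the role of the shared features: two words that meet at a small group are closely related, while words that meet only at a very general group share little beyond that general property.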
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







