Using Semi-Supervised Learning and Wikipedia to Train an Event Argument Extraction System

Patrik Zajec, Dunja Mladenić


The paper presents a methodology for training an event argument extraction system in a semi-supervised setting. We use Wikipedia and Wikidata to automatically obtain a small, noisily labeled dataset and a large unlabeled dataset. The data are organized into event clusters containing Wikipedia pages in multiple languages. The unlabeled data is labeled iteratively using semi-supervised learning combined with probabilistic soft logic, which infers the pseudo-label of each example from the predictions of multiple base learners. The proposed methodology is applied to Wikipedia pages about earthquakes and terrorist attacks in a cross-lingual setting. Our experiments show that the proposed methodology improves the results: the system achieves an F1-score of 0.79 when trained only on the automatically labeled dataset, and an F1-score of 0.84 when trained with semi-supervised learning combined with probabilistic soft logic.
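
The pseudo-labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain score averaging with a confidence threshold stands in for probabilistic soft logic inference, and the function names, the threshold value, and the argument labels (`magnitude`, `location`, `other`) are all hypothetical. Retraining the base learners between rounds is omitted.

```python
from typing import Dict, List, Optional, Tuple

# One base learner's output for one example: label -> confidence score.
Scores = Dict[str, float]


def infer_pseudo_label(predictions: List[Scores], threshold: float = 0.7) -> Optional[str]:
    """Combine the predictions of several base learners for one example.

    Assumption: simple averaging replaces probabilistic soft logic here.
    Returns the best label if its mean score reaches the threshold, else None.
    """
    if not predictions:
        return None
    totals: Dict[str, float] = {}
    for scores in predictions:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    means = {label: total / len(predictions) for label, total in totals.items()}
    best = max(means, key=lambda label: means[label])
    return best if means[best] >= threshold else None


def pseudo_label_round(
    unlabeled: Dict[str, List[Scores]], threshold: float = 0.7
) -> Tuple[Dict[str, str], Dict[str, List[Scores]]]:
    """Run one labeling round: confident examples get a pseudo-label,
    the rest stay unlabeled for a later round."""
    labeled: Dict[str, str] = {}
    remaining: Dict[str, List[Scores]] = {}
    for example_id, predictions in unlabeled.items():
        label = infer_pseudo_label(predictions, threshold)
        if label is None:
            remaining[example_id] = predictions
        else:
            labeled[example_id] = label
    return labeled, remaining


if __name__ == "__main__":
    # Hypothetical predictions from two base learners for two examples.
    pool = {
        "ex1": [{"magnitude": 0.9, "other": 0.1}, {"magnitude": 0.8, "other": 0.2}],
        "ex2": [{"location": 0.6, "other": 0.4}, {"location": 0.4, "other": 0.6}],
    }
    newly_labeled, still_unlabeled = pseudo_label_round(pool)
    print(newly_labeled)          # examples the learners agree on confidently
    print(list(still_unlabeled))  # examples deferred to a later round
```

In a full iterative setup, the newly labeled examples would be added to the training set and the base learners retrained before the next round; the threshold controls how much label noise each round admits.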

This work is licensed under a Creative Commons Attribution 3.0 License.