Discriminating Between Closely Related Languages on Twitter

Nikola Ljubešić; Denis Kranjčić

Abstract

In this paper we tackle the problem of discriminating Twitter users by the language they tweet in, focusing on four very similar South Slavic languages: Bosnian, Croatian, Montenegrin and Serbian. We take a supervised machine learning approach, annotating a subset of 500 users from an existing Twitter collection with the language each user primarily tweets in. We show that a simple bag-of-words model, univariate feature selection, the 320 strongest features and a standard classifier reach a user classification accuracy of 98%. Annotating the entire Twitter collection of 63,160 users with the best-performing classifier and plotting the result on a map via tweet geo-information, we produce a Twitter language map that clearly illustrates the robustness of the classifier.
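
The pipeline named in the abstract (bag-of-words, univariate feature selection, a fixed number of strongest features, a standard classifier) can be illustrated with a minimal sketch in plain Python. Several parts are assumptions rather than the authors' setup: the abstract does not say which univariate score or which classifier was used, so chi-square and multinomial Naive Bayes stand in here, and the four "users" are invented toy texts, not data from the paper. The toy example keeps 4 features instead of the 320 reported in the abstract.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase whitespace tokenization into token counts."""
    return Counter(text.lower().split())

def chi2_scores(docs, labels):
    """Chi-square score of each token's presence against the class labels.
    (Assumption: the abstract only says 'univariate feature selection'.)"""
    n = len(docs)
    class_sizes = Counter(labels)
    present = Counter()  # token -> number of docs containing it
    joint = Counter()    # (token, label) -> docs of that label containing it
    for bow, label in zip(docs, labels):
        for tok in bow:
            present[tok] += 1
            joint[(tok, label)] += 1
    scores = {}
    for tok, n_tok in present.items():
        score = 0.0
        for label, n_label in class_sizes.items():
            for has_tok in (True, False):
                hits = joint[(tok, label)]
                observed = hits if has_tok else n_label - hits
                row_total = n_tok if has_tok else n - n_tok
                expected = row_total * n_label / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[tok] = score
    return scores

def select_features(scores, k):
    """Keep the k highest-scoring tokens (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])

def train_nb(docs, labels, features):
    """Multinomial Naive Bayes counts, restricted to the selected features.
    (Assumption: stands in for the unspecified 'standard classifier'.)"""
    counts = {label: Counter() for label in set(labels)}
    for bow, label in zip(docs, labels):
        for tok, c in bow.items():
            if tok in features:
                counts[label][tok] += c
    return counts, Counter(labels), features

def predict_nb(model, bow):
    """Return the label with the highest add-one-smoothed log probability."""
    counts, priors, features = model
    total_docs = sum(priors.values())
    best_label, best_logp = None, -math.inf
    for label in sorted(counts):
        total = sum(counts[label].values()) + len(features)
        logp = math.log(priors[label] / total_docs)
        for tok, c in bow.items():
            if tok in features:
                logp += c * math.log((counts[label][tok] + 1) / total)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical toy "users": each text stands in for one user's tweets.
users = [
    ("tko je vidio lijepo vrijeme ovaj tjedan", "hr"),
    ("kupio sam kruh i mlijeko ovaj tjedan", "hr"),
    ("ko je video lepo vreme ove nedelje", "sr"),
    ("kupio sam hleb i mleko ove nedelje", "sr"),
]
docs = [bag_of_words(text) for text, _ in users]
labels = [label for _, label in users]

features = select_features(chi2_scores(docs, labels), k=4)
model = train_nb(docs, labels, features)
pred_hr = predict_nb(model, bag_of_words("vidimo se ovaj tjedan"))
pred_sr = predict_nb(model, bag_of_words("vidimo se ove nedelje"))
```

On this toy data the selection step keeps only tokens that occur in every text of one class and in none of the other, while tokens shared by both classes score zero and are discarded. With real data, an established library implementation of these steps would likely be a better choice than this hand-rolled version.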

Authors

Nikola Ljubešić
Denis Kranjčić

How to Cite

Ljubešić, N., & Kranjčić, D. (2015). Discriminating Between Closely Related Languages on Twitter. Informatica, 39(1). Retrieved from https://www.informatica.si/index.php/informatica/article/view/746

Issue

Vol. 39 No. 1 (2015)

Section

Regular papers

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.
