Hybrid-MELAu: A Hybrid Mixing Engineered Linguistic Features Framework Based on Autoencoder for Social Bot Detection

Zineb Ferhat Hamida; Allaoua Refouf; Ahlem Drif; Silvia Giordano

doi:10.31449/inf.v46i6.4081

Abstract

Social bots are defined as computer algorithms that generate massive amounts of obnoxious or meaningful information. Most bot detection methods leverage a wide range of characteristics, including network features, temporal dynamics features, activity features, and sentiment features. However, relatively little work has explored lexical measurements and linguistic indicators for detecting bots. The main purpose of this research is to recognize social bots through their writing style. We therefore carried out an exploratory study of the effectiveness of a set of only 17 linguistic features for bot detection, without resorting to other types of features, and we developed a novel hybrid framework for Mixing Engineered Linguistic features based on Autoencoders (Hybrid-MELAu). The semi-supervised Hybrid-MELAu framework is composed of two essential constituents: the feature learner and the predictors. The feature learner is built on two powerful structures: a) a Deep dense Autoencoder fed by the Lexical and Syntactic content (DALS), which represents the high-order lexical and syntactic features in a latent space, and b) a GloVe-BiLSTM autoencoder, which captures the semantic features. We then generate elite elements from each latent space using the pre-trained encoder parts with transfer learning. We consider a sample of 1 million from the Cresci datasets to compare the writing styles of humans and bots. On this dataset, we observe that the median lexical diversity of bot text is greater than that of human text, and the syntactic analysis based on part-of-speech tagging shows creative behavior in the human writing style. Finally, we test the model’s robustness on several public datasets (celebrity, pronbots-2019, and political bots). The proposed framework achieves an accuracy of 92.22%. Overall, the results shown in this paper, and the related discussion, argue that it is possible to discern the differences between humans’ and bots’ writing styles based on an efficient linguistic deep framework.
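
As a rough illustration of the lexical diversity comparison mentioned in the abstract, the sketch below computes a type-token ratio (distinct words divided by total words) per text and takes the median over a group of texts. The abstract does not say which diversity measure the authors use, so the type-token ratio, the tokenization rule, and the function names `type_token_ratio` and `median_diversity` are all assumptions made for this example, not the paper's method.

```python
import re
from statistics import median


def type_token_ratio(text: str) -> float:
    """Return distinct tokens / total tokens for one text (0.0 if empty).

    Assumption: tokens are lowercase runs of letters, digits, or apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def median_diversity(texts: list[str]) -> float:
    """Median type-token ratio over the non-blank texts of one group."""
    scores = [type_token_ratio(t) for t in texts if t.strip()]
    return median(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the paper.
    group_a = ["the cat saw the cat", "win win win now"]
    group_b = ["new deals every single day", "click here today"]
    print(median_diversity(group_a), median_diversity(group_b))
```

Comparing the two medians mirrors the kind of group-level statement made in the abstract. Note that the type-token ratio is sensitive to text length, so an analysis on real data would likely need a length-normalized measure.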

Authors

Zineb Ferhat Hamida, Department of Computer Science, University of Sétif 1, Sétif, Algeria
Allaoua Refouf, Department of Computer Science, University of Sétif 1, Sétif, Algeria
Ahlem Drif, Department of Computer Science, University of Sétif 1, Sétif, Algeria
Silvia Giordano, Networking Lab, SUPSI University of Applied Sciences of Southern Switzerland, Lugano, Switzerland

DOI:

https://doi.org/10.31449/inf.v46i6.4081

Published

09/05/2022

Issue

Vol. 46 No. 6 (2022): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.

How to Cite

Hybrid-MELAu: A Hybrid Mixing Engineered Linguistic Features Framework Based on Autoencoder for Social Bot Detection. (2022). Informatica, 46(6). https://doi.org/10.31449/inf.v46i6.4081
