Sentiment Analysis of Algerian Dialect Using Machine Learning and Deep Learning with Word2vec

Ahmed Cherif Mazari, Abdelhamid Djeffal

Abstract


In this paper, we address sentiment analysis of dialectal comments extracted from social media. The comments are in the language spoken in Algeria, written in Arabic and/or Latin characters, and may be Modern Standard Arabic, French or the local dialect. This mix raises a large number of text processing issues.

The contributions of this work are fourfold. First, we build an Algerian dialect sentiment dataset of 11760 comments collected from diverse social media platforms. Second, we train Skip-Gram and CBOW models with word2vec on a corpus of 466424 comments; these models are used to enhance the sentiment dataset with semantically similar words. Third, we propose an adapted set of preprocessing steps for dialectal texts. Finally, we implement several machine learning classifiers (SVM and three Naive Bayes variants: Bernoulli NB, Gaussian NB and Multinomial NB) and two deep learning architectures (CNN, RNN), and use them to evaluate and compare the dataset in its original version, in a version transcribed to Latin characters, and in a version semantically enhanced by the word2vec models.
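
The enhancement and classification steps can be illustrated with a minimal sketch. It is not the authors' implementation: the neighbour table is a hand-written, hypothetical stand-in for the nearest-neighbour output of a trained Skip-Gram or CBOW model, the comments are invented toy data in Latin transcription, and the function `enrich` and the small `MultinomialNB` class are names introduced here for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for the nearest neighbours a trained word2vec
# model (Skip-Gram or CBOW) would return; entries are invented toy data.
NEIGHBOURS = {
    "mlih": ["chbab", "bien"],
    "khayeb": ["mauvais"],
}


def enrich(tokens, neighbours, top_n=2):
    """Append up to top_n similar words after each token that has neighbours."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend(neighbours.get(tok, [])[:top_n])
    return out


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore out-of-vocabulary words
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (invented): tokenised comments with sentiment labels.
train = [
    ("film mlih bezaf".split(), "pos"),
    ("service chbab".split(), "pos"),
    ("film khayeb".split(), "neg"),
    ("service mauvais bezaf".split(), "neg"),
]
docs = [enrich(tokens, NEIGHBOURS) for tokens, _ in train]
labels = [label for _, label in train]
model = MultinomialNB().fit(docs, labels)

print(model.predict(enrich(["mlih"], NEIGHBOURS)))
print(model.predict(enrich(["khayeb"], NEIGHBOURS)))
```

In this sketch the same enrichment is applied to training and test comments, so a word seen only once can still contribute evidence through its neighbours. Whether the paper enriches both sides in this way is an assumption.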

In the experiments, the sentiment classifiers reach the following accuracies: on the dataset transcribed to Latin characters, MNB 84.21% and CNN 64.11%; on the transcribed dataset enhanced by the word2vec models, SVM 83.70% and RNN 65.21%.





DOI: https://doi.org/10.31449/inf.v46i6.3340

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.