Machine Learning Approach for Emotion Recognition in Speech
This paper presents a machine learning approach for emotion recognition in speech. The approach consists of three steps. First, numerical features are extracted from the sound database using an audio feature extractor. Then, a feature selection method is used to select the most relevant features. Finally, a machine learning model is trained to recognize seven universal emotions: anger, fear, sadness, happiness, boredom, disgust, and neutral. A thorough machine learning experimental analysis is
performed for each step. The results showed that 300 (out of 1582) features, as ranked by the gain ratio,
are sufficient for achieving 86% accuracy when evaluated with 10-fold cross-validation. SVM achieved
the highest accuracy when compared to KNN and Naive Bayes. We additionally compared the accuracy
of the standard SVM (with default parameters) and the one enhanced by Auto-WEKA (optimized
algorithm parameters) using the leave-one-speaker-out technique. The results showed that the SVM
enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM (77% vs. 73%, respectively). Finally, the results achieved with 10-fold cross-validation are comparable to those achieved by humans, i.e., 86% accuracy in both cases. Moreover, low-energy emotions (boredom, sadness, and disgust) are recognized better by our machine learning approach than by humans.
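The gain-ratio ranking used for feature selection can be sketched in plain Python. This is a minimal illustration assuming discretized feature values and a toy dataset; the feature names and data here are hypothetical, not from the paper:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature, labels):
    """Gain ratio of a discretized feature: information gain
    divided by the feature's own entropy (split information)."""
    n = len(labels)
    # Group class labels by feature value to get H(Y|X)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    cond_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond_entropy
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy example: two discretized features, rank them by gain ratio
X = [["a", "a", "b", "b"], ["a", "b", "a", "b"]]
y = ["anger", "anger", "joy", "joy"]
ranked = sorted(range(len(X)), key=lambda i: gain_ratio(X[i], y), reverse=True)
```

In the paper's setting, the 1582 extracted features would be ranked this way and only the top 300 kept for training.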
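The leave-one-speaker-out technique used to compare the standard and Auto-WEKA-tuned SVMs holds out all utterances of one speaker per fold, so the model is always evaluated on a speaker it has never seen. A minimal sketch with hypothetical speaker IDs:

```python
def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_indices, test_indices) per fold,
    holding out every utterance of one speaker at a time."""
    for held_out in sorted(set(speakers)):
        test = [i for i, s in enumerate(speakers) if s == held_out]
        train = [i for i, s in enumerate(speakers) if s != held_out]
        yield held_out, train, test

# Toy example: 6 utterances from 3 speakers
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_speaker_out(speakers))
```

This evaluation is stricter than 10-fold cross-validation, where utterances from the same speaker can appear in both training and test sets, which helps explain the lower accuracies (73%/77% vs. 86%).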
This work is licensed under a Creative Commons Attribution 3.0 License.