Recognize basic emotional states in speech by machine learning techniques using mel-frequency cepstral coefficient features

Text - Accepted Version
· Please see our End User Agreement before downloading.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.


Yang, N., Dey, N., Sherratt, R. S. (ORCID: https://orcid.org/0000-0001-7899-4445) and Shi, F. (2020) Recognize basic emotional states in speech by machine learning techniques using mel-frequency cepstral coefficient features. Journal of Intelligent & Fuzzy Systems, 39 (2). pp. 1925-1936. ISSN 1875-8967 doi: 10.3233/JIFS-179963

Abstract/Summary

Speech Emotion Recognition (SER) has been widely used in many fields, such as the smart home assistants commonly found on the market. A smart home assistant that could detect the user's emotion would improve communication between the user and the assistant, enabling the assistant to offer more productive feedback. Thus, the aim of this work is to analyze emotional states in speech and propose a suitable algorithm, considering performance versus complexity, for deployment in smart home devices. Four emotional speech sets were selected from the Berlin Emotional Database (EMO-DB) as experimental data, and 26 MFCC features were extracted from each type of emotional speech to identify the emotions of happiness, anger, sadness and neutrality. Speaker-independent SER experiments were then conducted using the Back Propagation Neural Network (BPNN), Extreme Learning Machine (ELM), Probabilistic Neural Network (PNN) and Support Vector Machine (SVM). Weighing recognition accuracy against processing time, this work shows that the SVM performed best among the four methods, making it a good candidate for SER deployment in smart home devices: it achieved an overall accuracy of 92.4% while imposing low computational requirements during training and testing. We conclude that the MFCC features and the SVM classification models used in speaker-independent experiments are highly effective in the automatic prediction of emotion.
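The pipeline described in the abstract (a fixed-length MFCC feature vector per utterance, classified into four emotions with an SVM) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the synthetic 26-dimensional Gaussian clusters stand in for MFCC features that would in practice be extracted from EMO-DB audio with a front end such as librosa, and the RBF kernel and class labels are assumptions.

```python
# Sketch of the SER classification stage: 26-dim "MFCC" vectors -> 4-class SVM.
# Synthetic features stand in for real MFCCs extracted from speech audio.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
EMOTIONS = ["happiness", "anger", "sadness", "neutral"]  # assumed label order

# One Gaussian cluster of 26-dimensional feature vectors per emotion class.
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(50, 26)) for i in range(4)])
y = np.repeat(np.arange(4), 50)

# Hold out a test split; a speaker-independent setup would instead split by speaker.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = SVC(kernel="rbf", C=1.0)  # RBF-kernel SVM, a common choice for SER
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

Because the synthetic clusters are well separated, this toy setup classifies almost perfectly; the paper's 92.4% figure refers to real EMO-DB speech, where the classes overlap far more.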


Item Type: Article
URI: https://reading-clone.eprints-hosting.org/id/eprint/88046
Identification Number/DOI: 10.3233/JIFS-179963
Refereed: Yes
Divisions: Life Sciences > School of Biological Sciences > Biomedical Sciences; Life Sciences > School of Biological Sciences > Department of Bio-Engineering
Publisher: IOS Press
Publisher Statement: Special issue: Applied Machine Learning & Management of Volatility, Uncertainty, Complexity and Ambiguity

