The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
Human speech indirectly conveys the mental state or emotion of the speaker. Artificial Intelligence (AI)-based techniques may revolutionize this modern era by recognizing emotion from speech. In this study, we introduce a robust method for emotion recognition from human speech that combines a well-designed preprocessing technique with a deep learning-based mixed model consisting of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) components. About 2800 audio files were extracted from the Toronto emotional speech set (TESS) database for this study. A high-pass filter and a Savitzky-Golay filter were used to obtain noise-free, smooth audio data. Seven types of emotion were considered: Angry, Disgust, Fear, Happy, Neutral, Pleasant-surprise, and Sad. Energy, fundamental frequency, and Mel Frequency Cepstral Coefficients (MFCC) were used as emotion features, and these features yielded 97.5% accuracy with the mixed LSTM+CNN model. The mixed model is found to perform better than the usual state-of-the-art models in emotion recognition from speech, which indicates that it could be effectively utilized in advanced research on sound processing.
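The pipeline the abstract describes (high-pass plus Savitzky-Golay preprocessing, energy/F0/MFCC features, and an LSTM+CNN hybrid over seven emotion classes) can be sketched in a few dozen lines. The following is a minimal, illustrative Python sketch, assuming librosa, SciPy, and Keras (TensorFlow); the filter cutoff, Savitzky-Golay window, F0 search range, and layer sizes are assumptions for illustration, not values reported in the paper:

```python
import numpy as np
import librosa
from scipy.signal import butter, sosfilt, savgol_filter
from tensorflow.keras import layers, models

def preprocess(y, sr, cutoff_hz=80, sg_window=11, sg_order=3):
    """High-pass filter to remove low-frequency noise, then
    Savitzky-Golay smoothing (parameter values are assumptions)."""
    sos = butter(4, cutoff_hz, btype="highpass", fs=sr, output="sos")
    y = sosfilt(sos, y)
    return savgol_filter(y, sg_window, sg_order)

def extract_features(y, sr, n_mfcc=40):
    """Frame-level energy, fundamental frequency (F0), and MFCCs,
    stacked into a (timesteps, features) matrix."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, T)
    energy = librosa.feature.rms(y=y)                             # (1, T)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)[np.newaxis, :]  # (1, T')
    T = min(mfcc.shape[1], energy.shape[1], f0.shape[1])          # align frame counts
    feats = np.vstack([mfcc[:, :T], energy[:, :T], f0[:, :T]])
    return feats.T                                                # (T, n_mfcc + 2)

def build_model(timesteps, n_features, n_classes=7):
    """One possible LSTM+CNN hybrid: Conv1D blocks over the feature
    sequence feed an LSTM, ending in a softmax over 7 emotions."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.Conv1D(64, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In practice the per-file feature matrices would be padded or truncated to a common number of timesteps before training. Placing the Conv1D blocks before the LSTM is one common way to build such a mixed model: the convolutions compress local spectral patterns so the recurrent layer models a shorter, more abstract sequence.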
| Main Authors: | Uddin, Mohammad Amaz; Chowdury, Mohammad Salah Uddin; Khandaker, Mayeen Uddin; Tamam, Nissren; Sulieman, Abdelmoneim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tech Science Press, 2022 |
| Published in: | Computers, Materials & Continua, 74 (1), pp. 1709-1722. ISSN 1546-2226 |
| DOI: | https://doi.org/10.32604/cmc.2023.031177 |
| Subjects: | BF Psychology; Q Science (General); TA Engineering (General). Civil engineering (General) |
| Online Access: | http://eprints.sunway.edu.my/2250/ http://eprints.sunway.edu.my/2250/1/28.pdf |