Speech Emotion Recognition Using Deep Learning

Authors

Nagaraj, Dhavan

Issue Date

2024

Degree

MSc in Artificial Intelligence

Publisher

Dublin Business School

Abstract

This study investigates emotion recognition in speech, applying deep learning techniques to analyse and categorize emotions from audio data. The primary dataset is CREMA-D (Crowd-Sourced Emotional Multimodal Actors Dataset), a comprehensive collection of vocal expressions spanning a range of emotions. The research involves preprocessing the audio data and extracting meaningful features, chiefly Mel-Frequency Cepstral Coefficients (MFCCs) alongside x-vectors, which capture the tonal and speaker characteristics of the speech. The processed features are fed into two neural network models: a Recurrent Neural Network (RNN) built from SimpleRNN layers and a Long Short-Term Memory (LSTM) network. Both models are trained and validated on the dataset to classify emotions into the categories neutral, happy, sad, angry, fear, and disgust. Model performance is evaluated using metrics such as accuracy and F1 score. The results indicate that deep learning holds significant potential for recognizing and categorizing emotions in speech, though challenges in accuracy and model optimization persist.

Keywords

Emotion Recognition, Speech Processing, Deep Learning, Neural Networks, MFCC, RNN, LSTM, Audio Data Analysis
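The MFCC features described in the abstract are typically produced by framing the waveform, taking the power spectrum, applying a mel filterbank, and decorrelating the log energies with a DCT. The sketch below illustrates that pipeline in plain NumPy; the frame sizes, filter counts, and coefficient count are illustrative defaults, not the parameters used in the thesis (which would more likely rely on a library such as librosa).

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Minimal MFCC sketch: frame -> window -> power spectrum
    -> mel filterbank -> log -> DCT-II."""
    # Frame the signal with a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filterbank: filter centres equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bin_idx = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bin_idx[m - 1], bin_idx[m], bin_idx[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    # Log mel energies, then DCT-II to decorrelate into cepstral coefficients
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T  # shape: (n_frames, n_mfcc)
```

The resulting (frames × coefficients) matrix is the kind of time-series input the abstract's SimpleRNN and LSTM classifiers consume, with each frame treated as one timestep.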