Spoken Arabic digits recognition using deep learning / AbdulAziz Saleh Mahfoudh Ba Wazir

Bibliographic Details
Main Author: Abdul Aziz Saleh, Mahfoudh Ba Wazir
Format: Thesis
Published: 2018
Online Access: http://studentsrepo.um.edu.my/9521/
http://studentsrepo.um.edu.my/9521/1/AbdulAziz_Saleh_Mahfoudh_Ba_Wazir.jpg
http://studentsrepo.um.edu.my/9521/11/abdulaziz.pdf
Description
Summary: The dissertation proposes an Arabic digit speech recognition model based on a recurrent neural network (RNN). The model represents each speech signal by Mel-Frequency Cepstral Coefficients (MFCCs), which are extracted after the signal has been processed for noise reduction and digit separation. The features extracted from each spoken digit are fed into a network with long short-term memory (LSTM) cells. LSTM cells can learn long-term temporal dependencies and mitigate the vanishing gradient problem associated with standard RNNs. The study uses a dataset of 1040 samples of spoken Arabic digits from different dialects: 840 samples are used to train the network and the remaining 200 are used for testing. Model training is carried out on a GPU. The LSTM learning parameters are tuned to improve accuracy, which reaches 94% during training. In testing, the best-tuned model recognizes spoken Arabic digit samples with 69% accuracy. The highest per-digit accuracy, 80%, is obtained for the digit zero.
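
The summary gives no network dimensions or code, so the following is only a rough illustration of how a sequence of MFCC frames might pass through LSTM cells to produce a ten-way digit prediction. Everything specific here is an assumption made for the sketch: the 13 coefficients per frame, the hidden size of 8, the random untrained weights, the random stand-in frames, and all function names. It is not the thesis's implementation.

```python
import math
import random

N_MFCC = 13    # assumed MFCCs per frame (not stated in the summary)
N_HIDDEN = 8   # assumed LSTM state size (not stated in the summary)
N_DIGITS = 10  # digits 0-9


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(rng):
    """Random, untrained weights: four LSTM gates plus an output layer."""
    def layer(rows, cols):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(rows)]
        return weights, [0.0] * rows
    params = {name: layer(N_HIDDEN, N_MFCC + N_HIDDEN) for name in "fiog"}
    params["out"] = layer(N_DIGITS, N_HIDDEN)
    return params


def gate(params, name, z, activation):
    weights, bias = params[name]
    return [activation(a + b) for a, b in zip(matvec(weights, z), bias)]


def lstm_step(x, h, c, params):
    """One LSTM time step; returns the new hidden and cell states."""
    z = x + h                              # concatenate frame and previous state
    f = gate(params, "f", z, sigmoid)      # forget gate
    i = gate(params, "i", z, sigmoid)      # input gate
    o = gate(params, "o", z, sigmoid)      # output gate
    g = gate(params, "g", z, math.tanh)    # candidate cell update
    # The additive cell update is what lets gradients survive long sequences.
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def digit_probabilities(frames, params):
    """Run the LSTM over all frames, then softmax the final hidden state."""
    h = [0.0] * N_HIDDEN
    c = [0.0] * N_HIDDEN
    for x in frames:
        h, c = lstm_step(x, h, c, params)
    weights, bias = params["out"]
    logits = [a + b for a, b in zip(matvec(weights, h), bias)]
    peak = max(logits)                     # subtract max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
params = init_params(rng)
# Stand-in for the MFCC frames of one spoken digit (random values, 20 frames).
frames = [[rng.uniform(-1, 1) for _ in range(N_MFCC)] for _ in range(20)]
probs = digit_probabilities(frames, params)
print("predicted digit:", probs.index(max(probs)))
```

Because the weights are untrained, the predicted digit is arbitrary. In practice the parameters would be fitted on the 840 training samples and evaluated on the 200 held-out samples, typically with a deep learning framework rather than hand-written loops.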