Testing Sphinx’s language model fault-tolerance for the Holy Quran

Carnegie Mellon University’s (CMU) Sphinx framework is increasingly used for Arabic speech recognition in general and for the Holy Quran in particular. Generating the language model involves the tedious task of preparing transcriptions for all the data. In this paper, we investigat...


Bibliographic Details
Main Authors: El Amrani, Mohamed Yassine, Rahman, M.M. Hafizur, Wahiddin, Mohamed Ridza, Shah, Asadullah
Format: Proceeding Paper
Language: English
Published: The Institute of Electrical and Electronics Engineers, Inc. 2017
Subjects: TK7800 Electronics. Computer engineering. Computer hardware. Photoelectronic devices
Online Access: http://irep.iium.edu.my/54937/
http://irep.iium.edu.my/54937/1/54893_A%20Practical%20and%20Interactive%20%20Web-based.pdf
http://irep.iium.edu.my/54937/12/54937_Testing%20Sphinx%E2%80%99s%20language_scopus.pdf
building IIUM Repository
collection Online Access
description Carnegie Mellon University’s (CMU) Sphinx framework is increasingly used for Arabic speech recognition in general and for the Holy Quran in particular. Generating the language model involves the tedious task of preparing transcriptions for all the data. In this paper, we investigate the fault-tolerance of the automatically generated language model as compared to corrected and uncorrected transcriptions, with and without silence tagging. This editing addresses the repetitions and pauses encountered during recitations. Experiments show that the average difference between the lowest and highest Word Error Rate (WER) for each configuration of the number of senones is 0.6% when all files are used for training and 1.6% when 80% of the files are used for training the language model of 17 chapters of the Holy Quran. The results show that models trained without any correction can perform close to models trained after all required transcription corrections have been made.
first_indexed 2025-11-14T16:38:08Z
id iium-54937
institution International Islamic University Malaysia
institution_category Local University
last_indexed 2025-11-14T16:38:08Z
recordtype eprints
repository_type Digital Repository
spelling iium-54937 2018-02-04T06:49:47Z http://irep.iium.edu.my/54937/
Subject: TK7800 Electronics. Computer engineering. Computer hardware. Photoelectronic devices
Publisher: The Institute of Electrical and Electronics Engineers, Inc., 2017-01-16
Type: Proceeding Paper, PeerReviewed
Citation: El Amrani, Mohamed Yassine and Rahman, M.M. Hafizur and Wahiddin, Mohamed Ridza and Shah, Asadullah (2017) Testing Sphinx’s language model fault-tolerance for the Holy Quran. In: 6th International Conference on Information and Communication Technology for the Muslim World (ICT4M 2016), 22nd-24th November 2016, Jakarta, Indonesia.
http://ieeexplore.ieee.org/document/7814882/
DOI: 10.1109/ICT4M.2016.27