Using Closely-related Language to Build an ASR for a Very Under-resourced Language: Iban

This paper describes our work on automatic speech recognition system (ASR) for an under-resourced language, namely the Iban language, which is spoken in Sarawak, a Malaysian Borneo state. To begin this study, we collected 8 hours of speech data due to no resources yet for ASR concerning this lang...

Bibliographic Details
Main Authors: Juan, Sarah Samson, Besacier, Laurent, Lecouteux, Benjamin, Tan, Tien-Ping
Format: Proceeding
Language: English
Published: 2014
Subjects: Q Science (General); QA75 Electronic computers. Computer science
Online Access: http://ir.unimas.my/id/eprint/8881/
http://ir.unimas.my/id/eprint/8881/1/COCOSDA-sarahsamsonjuan.pdf
author Juan, Sarah Samson
Besacier, Laurent
Lecouteux, Benjamin
Tan, Tien-Ping
building UNIMAS Institutional Repository
collection Online Access
description This paper describes our work on an automatic speech recognition (ASR) system for an under-resourced language, Iban, which is spoken in Sarawak, a Malaysian state on Borneo. To begin this study, we collected 8 hours of speech data, because no ASR resources existed for this language. Given this lack of resources, we applied bootstrapping techniques using a closely related language to build the Iban system. Specifically, we used Malay data to bootstrap the grapheme-to-phoneme (G2P) system for the target language. We also developed several G2P systems to produce Iban pronunciation dictionaries, which were then evaluated in the Iban ASR system to select the best version. Subsequently, we conducted cross-lingual ASR experiments using subspace Gaussian mixture models (SGMM), in which the shared parameters were obtained in either a monolingual or a multilingual fashion. We observed that using out-of-language data as the source language yielded a lower word error rate (WER) when Iban data is very limited.
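The description above compares systems by word error rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), and not the scoring tool used by the authors, which this record does not name, the hypothetical function `word_error_rate` below illustrates the calculation on placeholder tokens:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; whitespace tokenization is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Iban data: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.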
id unimas-8881
institution Universiti Malaysia Sarawak
institution_category Local University
recordtype eprints
repository_type Digital Repository
citation Juan, Sarah Samson and Besacier, Laurent and Lecouteux, Benjamin and Tan, Tien-Ping (2014) Using Closely-related Language to Build an ASR for a Very Under-resourced Language: Iban. In: COCOSDA 2014, Phuket, Thailand. Published 2014-09; peer reviewed; record dated 2015-10-16T01:22:09Z.
topic Q Science (General)
QA75 Electronic computers. Computer science