Semi-supervised G2P Bootstrapping and Its Application to ASR for a Very Under-resourced Language: Iban
| Main Authors: | , , |
|---|---|
| Format: | Proceeding |
| Language: | English |
| Published: | 2014 |
| Subjects: | |
| Online Access: | http://ir.unimas.my/id/eprint/8879/ http://ir.unimas.my/id/eprint/8879/1/sltu2014_sarah.pdf |
| Summary: | This paper describes our experiments and results on using a locally dominant language in Malaysia (Malay) to bootstrap automatic speech recognition (ASR) for a very under-resourced language: Iban (also spoken in Malaysia, in the Malaysian part of Borneo). Iban resources for building a speech recognition system were nonexistent. To address this, we sought to take advantage of a language from the same family that shares several similarities with Iban. First, to build the pronunciation dictionary, we proposed a bootstrapping strategy that develops an Iban pronunciation lexicon from a Malay one. A hybrid version, mixing Malay and Iban pronunciations, was also built and evaluated. We then experimented with three Iban ASR systems, each using one of three pronunciation dictionaries: Malay, Iban, or hybrid. |
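
The lexicon bootstrapping described in the summary can be illustrated with a minimal sketch. It is not the authors' procedure: the letter-to-phone rules, the phone symbols, the helper names (`rule_g2p`, `bootstrap_lexicon`, `hybrid_lexicon`) and the reading of "hybrid" as a per-word union of Malay and Iban pronunciation variants are all hypothetical assumptions. Words shared with the Malay lexicon reuse its pronunciations; the remaining words get a rule-based draft and are queued for manual correction, which is one plausible semi-supervised workflow.

```python
from typing import Dict, List, Tuple

# A lexicon maps a word to its pronunciation variants (lists of phone symbols).
Lexicon = Dict[str, List[List[str]]]

# HYPOTHETICAL toy letter-to-phone rules for a Malay-like orthography;
# the phone symbols are illustrative, not the paper's phone set.
DIGRAPHS = {"ng": "N", "ny": "J", "sy": "S", "kh": "x"}
SINGLE = {c: c for c in "abcdefghijklmnopqrstuvwxyz"}
SINGLE.update({"c": "tS", "j": "dZ", "y": "j"})


def rule_g2p(word: str) -> List[str]:
    """Greedy left-to-right conversion; digraphs are tried before single letters."""
    w, phones, i = word.lower(), [], 0
    while i < len(w):
        if w[i:i + 2] in DIGRAPHS:
            phones.append(DIGRAPHS[w[i:i + 2]])
            i += 2
        else:
            if w[i] in SINGLE:  # characters without a rule (e.g. '-') are skipped
                phones.append(SINGLE[w[i]])
            i += 1
    return phones


def bootstrap_lexicon(target_words: List[str], source: Lexicon) -> Tuple[Lexicon, List[str]]:
    """Draft a target-language lexicon from a related source-language one.

    Words found in the source lexicon reuse its pronunciations; the others get a
    rule-based draft and are returned in a review list for manual correction.
    """
    draft: Lexicon = {}
    to_review: List[str] = []
    for word in target_words:
        if word in source:
            draft[word] = [list(p) for p in source[word]]
        else:
            draft[word] = [rule_g2p(word)]
            to_review.append(word)
    return draft, to_review


def hybrid_lexicon(a: Lexicon, b: Lexicon, words: List[str]) -> Lexicon:
    """ASSUMED meaning of 'hybrid': per-word union of variants from both lexicons."""
    hybrid: Lexicon = {}
    for word in words:
        variants: List[List[str]] = []
        for lex in (a, b):
            for pron in lex.get(word, []):
                if pron not in variants:  # keep each distinct variant once
                    variants.append(pron)
        if variants:
            hybrid[word] = variants
    return hybrid


# Usage with illustrative entries.
malay: Lexicon = {"rumah": [["r", "u", "m", "a", "h"]]}
draft, review = bootstrap_lexicon(["rumah", "makai"], malay)
print(draft["makai"], review)  # -> [['m', 'a', 'k', 'a', 'i']] ['makai']
```

Reusing source-language entries first keeps manual effort focused on the words that the related language does not cover; in practice the toy rules would be replaced by whatever G2P model is available.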