Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language
| Main Authors: | , |
|---|---|
| Format: | Proceeding |
| Language: | English |
| Published: | 2013 |
| Subjects: | |
| Online Access: | http://ir.unimas.my/id/eprint/8876/ http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf |
| Summary: | This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a locally dominant language (Malay) and a very under-resourced language (Iban, spoken in Sarawak and in several parts of the island of Borneo) for which hardly any resources or knowledge are available. More precisely, a pre-existing Malay G2P system is used to produce phoneme sequences for Iban words. The phonemes are then manually post-edited (corrected) by a native Iban speaker. This resource, produced in a semi-supervised fashion, is then used to train the first G2P system for the Iban language. As a by-product of this methodology, the analysis of the “pronunciation distance” between Malay and Iban sheds light on the phonological and orthographic relations between the two languages. The experiments show that a reasonably effective Iban G2P system can be obtained after only two hours of post-editing (correction) of the output of the Malay G2P system applied to Iban words. |
|---|
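
The summary mentions a “pronunciation distance” between the Malay G2P output and the post-edited Iban pronunciations, but this record does not say how it is computed. One common way to quantify such a gap is a phoneme-level edit distance, normalised by the length of the corrected reference. The sketch below is an illustration under that assumption, not the authors' method; the function names and the toy phoneme sequences are hypothetical placeholders, not real Iban or Malay transcriptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """Total edits divided by total reference phonemes.

    `pairs` holds (post-edited reference, Malay G2P output) sequences.
    """
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return edits / total if total else 0.0


# Placeholder symbols only; these are not real transcriptions.
toy_pairs = [
    (["p", "a", "t", "a"], ["p", "a", "t", "a", "h"]),  # one extra phoneme
    (["k", "i", "t"], ["k", "e", "t"]),                # one substituted phoneme
]
print(phoneme_error_rate(toy_pairs))
```

A lower rate would mean that less manual correction is needed per word, which is the kind of saving the semi-supervised bootstrapping described in the summary relies on.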