Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language
This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-res...
| Main Authors: | Juan, Sarah Samson; Besacier, Laurent |
|---|---|
| Format: | Proceeding |
| Language: | English |
| Published: | 2013 |
| Subjects: | QA75 Electronic computers. Computer science; T Technology (General) |
| Online Access: | http://ir.unimas.my/id/eprint/8876/ http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf |
| _version_ | 1848836462579023872 |
|---|---|
| author | Juan, Sarah Samson Besacier, Laurent |
| author_sort | Juan, Sarah Samson |
| building | UNIMAS Institutional Repository |
| collection | Online Access |
| description | This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-resourced language (Iban, spoken in Sarawak and in several parts of the island of Borneo) for which hardly any resources or knowledge are available. More precisely, a pre-existing Malay G2P is used to produce phoneme sequences for Iban words. The phonemes are then manually post-edited (corrected) by a native Iban speaker. This resource, which has been produced in a semi-supervised fashion, is later used to train the first G2P system for the Iban language. As a by-product of this methodology, the analysis of the “pronunciation distance” between Malay and Iban sheds light on the phonological and orthographic relations between these two languages. The experiments conducted show that a rather efficient Iban G2P system can be obtained after only two hours of post-editing (correcting) the output of the Malay G2P applied to Iban words. |
| first_indexed | 2025-11-15T06:24:09Z |
| format | Proceeding |
| id | unimas-8876 |
| institution | Universiti Malaysia Sarawak |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T06:24:09Z |
| publishDate | 2013 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | unimas-8876 2015-10-16T01:10:04Z http://ir.unimas.my/id/eprint/8876/ Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language Juan, Sarah Samson Besacier, Laurent QA75 Electronic computers. Computer science T Technology (General) 2013-10 Proceeding PeerReviewed text en http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf Juan, Sarah Samson and Besacier, Laurent (2013) Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language. In: Proceedings of 4th Workshop on South and Southeast Asian Natural Language Processing 2013, Nagoya, Japan. http://www.aclweb.org/anthology/W13-4701 |
| title | Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language |
| title_sort | fast bootstrapping of grapheme to phoneme system for under-resourced languages - application to the iban language |
| topic | QA75 Electronic computers. Computer science T Technology (General) |
| url | http://ir.unimas.my/id/eprint/8876/ http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf |