Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language

This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-res...

Bibliographic Details
Main Authors: Juan, Sarah Samson; Besacier, Laurent
Format: Proceeding
Language: English
Published: 2013
Subjects: QA75 Electronic computers. Computer science; T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/8876/
http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf
author Juan, Sarah Samson
Besacier, Laurent
building UNIMAS Institutional Repository
collection Online Access
description This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-resourced language (Iban, spoken in Sarawak and in several parts of Borneo) for which hardly any resources or linguistic knowledge are available. More precisely, a pre-existing Malay G2P system is used to produce phoneme sequences for Iban words. The phonemes are then manually post-edited (corrected) by a native Iban speaker. This resource, produced in a semi-supervised fashion, is later used to train the first G2P system for the Iban language. As a by-product of this methodology, the analysis of the “pronunciation distance” between Malay and Iban sheds light on the phonological and orthographic relations between the two languages. The experiments show that a rather efficient Iban G2P system can be obtained after only two hours of post-editing (correction) of the output of the Malay G2P system applied to Iban words.
first_indexed 2025-11-15T06:24:09Z
format Proceeding
id unimas-8876
institution Universiti Malaysia Sarawak
institution_category Local University
language English
last_indexed 2025-11-15T06:24:09Z
publishDate 2013
recordtype eprints
repository_type Digital Repository
spelling unimas-8876 2015-10-16T01:10:04Z http://ir.unimas.my/id/eprint/8876/ 2013-10 Proceeding PeerReviewed text en Juan, Sarah Samson and Besacier, Laurent (2013) Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language. In: Proceedings of 4th Workshop on South and Southeast Asian Natural Language Processing 2013, Nagoya, Japan. http://www.aclweb.org/anthology/W13-4701
title Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language
topic QA75 Electronic computers. Computer science
T Technology (General)
url http://ir.unimas.my/id/eprint/8876/
http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf
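
The description mentions a "pronunciation distance" between Malay and Iban but this record does not define it. One plausible reading, stated here as an assumption and not necessarily the authors' metric, is the phoneme-level edit distance between the Malay G2P output for an Iban word and the pronunciation obtained after manual post-editing, aggregated into a phoneme error rate. The sketch below illustrates that reading. The function names and the toy phoneme sequences are hypothetical placeholders, not data from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference phoneme
                curr[j - 1] + 1,     # insert a hypothesis phoneme
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Total edit distance divided by total reference length.

    Each pair is (post-edited reference, G2P hypothesis).
    """
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total if total else 0.0


# Placeholder sequences for illustration only (not real Iban or Malay entries).
toy_pairs = [
    (["p", "a", "t"], ["p", "a", "d"]),       # one substitution
    (["s", "i", "k", "o"], ["s", "i", "k"]),  # one missing phoneme
]
rate = phoneme_error_rate(toy_pairs)
```

Under this assumption, a low rate would mean the Malay G2P output needs few corrections, which would fit the short post-editing effort reported in the description.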