Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation

Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e. A-Z), identical to that of English. The written language uses this character set as building blocks to...


Bibliographic Details
Main Authors: Shah, Asadullah, Saidin, Aznan Zuhid, Taha Alshaikhli, Imad Fakhri, Zeki, Akram M.
Format: Proceeding Paper
Language: English
Published: 2011
Subjects: PL Languages and literatures of Eastern Asia, Africa, Oceania; PL5101 Malay
Online Access: http://irep.iium.edu.my/2933/
http://irep.iium.edu.my/2933/1/Poster-asadullah_aznan.ppt
_version_ 1848776075677532160
author Shah, Asadullah
Saidin, Aznan Zuhid
Taha Alshaikhli, Imad Fakhri
Zeki, Akram M.
building IIUM Repository
collection Online Access
description Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e. A-Z), identical to that of English. The written language uses this character set as building blocks for words, phrases and sentences which, together with punctuation marks and other signs, make up documents. This paper presents the results of a preliminary investigation of Malay text documents. For this purpose, articles written in Malay on various topics were scanned, covering approximately 31 thousand characters in total. Preliminary observations indicate that, on average, the character “A” accounts for 19% of the text, “N” for 10%, “E” for 9% and “I” for 8%. These are the characters with the highest frequencies of occurrence in the whole set, and they are expected to remain the most frequent characters in further investigation. Furthermore, the results indicate that character frequencies in Bahasa Melayu text are very close to those of Bahasa Indonesia but differ from those of English. The investigation also indicates that Bahasa Melayu and Bahasa Indonesia share a close phonetic structure with each other, but not with English, although all three languages use the same character set.
first_indexed 2025-11-14T14:24:20Z
format Proceeding Paper
id iium-2933
institution International Islamic University Malaysia
institution_category Local University
language English
last_indexed 2025-11-14T14:24:20Z
publishDate 2011
recordtype eprints
repository_type Digital Repository
spelling iium-2933 2020-12-07T07:45:29Z http://irep.iium.edu.my/2933/ 2011-07 Proceeding Paper PeerReviewed application/pdf en http://irep.iium.edu.my/2933/1/Poster-asadullah_aznan.ppt Shah, Asadullah and Saidin, Aznan Zuhid and Taha Alshaikhli, Imad Fakhri and Zeki, Akram M. (2011) Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation. In: 12th Conference of the Pacific Association for Computational Linguistics (PACLING 2011), 19 - 21 July 2011, IIUM. (Unpublished) http://kict.iium.edu.my/pacling/index.html
topic PL Languages and literatures of Eastern Asia, Africa, Oceania
PL5101 Malay
url http://irep.iium.edu.my/2933/
http://irep.iium.edu.my/2933/1/Poster-asadullah_aznan.ppt
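The description above reports each character's share of the scanned text as a percentage. The record does not say how the authors computed these figures, so the following is only a minimal sketch of one way such a count could be made. It assumes that letters are counted case-insensitively and that everything outside A-Z (spaces, digits, punctuation) is ignored. The function name letter_frequencies and the sample sentence are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of the A-Z letters in `text`, in percent.

    Assumption (not stated in the source): case is ignored and all
    non-letter characters are excluded from the total.
    """
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the letters from most to least frequent.
    return {letter: 100.0 * n / total for letter, n in counts.most_common()}


# Illustrative sample only; not part of the paper's data.
sample = "Saya makan nasi"
freqs = letter_frequencies(sample)
for letter, pct in freqs.items():
    print(f"{letter}: {pct:.1f}%")
```

Running the same function over a larger body of Malay text would give a frequency table comparable in form to the one summarised in the description, though the exact percentages depend on the corpus and on the counting assumptions noted above.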