An application of Malay short-form word conversion using Levenshtein distance

Bibliographic Details
Format: Restricted Document
building INTELEK Repository
collection Online Access
collectionurl https://intelek.unisza.edu.my/intelek/pages/search.php?search=!collection407072
date 2024-08-27 15:40:33
id 12970
institution UniSZA
originalfilename 7277-01-FH02-FIK-20-43364.pdf
person HP
recordtype oai_dc
resourceurl https://intelek.unisza.edu.my/intelek/pages/view.php?ref=12970
type Article Journal
journal Mathematical Sciences and Informatics Journal
citation vol. 1, no. 1, pp. 34-42
mimetype application/pdf
access Private Access
summary Short-form words were once used mainly in journalism, but they are now widely used by the general public, especially in online communication. These short forms cause problems in data mining, particularly in online text processing, where they lead to inaccurate text-mining results. However, only a few works have investigated Malay short-form word identification and conversion. This work therefore develops an application that identifies Malay short-form words and converts them into their full forms. To build it, the short-form rules were examined carefully. The formal rules from Dewan Bahasa & Pustaka (DBP) serve as the primary reference for the short-form word identification algorithm, while the conversion algorithm uses Levenshtein Distance (LD) to measure the similarity between a short-form word and candidate full words. A rule-based technique complements the LD technique. Overall, 70.27% of the Malay short-form words were correctly converted into their full forms. This conversion rate is promising, and the work could be strengthened further by incorporating more rules into the algorithm.
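The LD-based conversion step described in the summary can be illustrated with a minimal Python sketch. Only the Levenshtein distance is the standard algorithm; the lexicon (`LEXICON`) and the rule filter (same first letter, short form's letters appear in order in the candidate) are hypothetical stand-ins for the DBP-based rules, which this record does not spell out. It is not the authors' implementation.

```python
from typing import Iterable, Optional

# Hypothetical lexicon of Malay full words (illustration only, not the authors' data).
LEXICON = ["yang", "dengan", "tidak", "untuk", "kepada", "saya", "dia", "tadi", "dan"]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def is_subsequence(short: str, word: str) -> bool:
    """Hypothetical rule: every letter of the short form occurs in `word`, in order."""
    it = iter(word)
    return all(ch in it for ch in short)  # `in` consumes the iterator as it searches


def convert(short: str, lexicon: Iterable[str]) -> Optional[str]:
    """Return the lexicon word closest to `short` by LD among rule-filtered candidates."""
    short = short.lower()
    if not short:
        return None
    # Rule-based filter (assumed): same first letter and subsequence match.
    candidates = [w for w in lexicon if w and w[0] == short[0] and is_subsequence(short, w)]
    if not candidates:
        return None
    # Smallest LD wins; ties are broken alphabetically for determinism.
    return min(candidates, key=lambda w: (levenshtein(short, w), w))
```

For example, `[convert(s, LEXICON) for s in ["yg", "dgn", "tdk", "utk", "kpd", "dn"]]` returns `['yang', 'dengan', 'tidak', 'untuk', 'kepada', 'dan']`. For `"dn"`, both `"dan"` and `"dengan"` pass the rule filter, and LD selects `"dan"` (distance 1 versus 4), which shows how the rules narrow the candidates and LD ranks them.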