A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium

Bibliographic Details
Main Authors: Abu Seman, Muhamad Sadry, Wan Mamat, Wan Ali @ Wan Yusoff, Noordin, Mohamad Fauzan, Othman, Roslina
Format: Monograph
Language: English
Published: 2019
Subjects: T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases
Online Access: http://irep.iium.edu.my/73052/
http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf
author Abu Seman, Muhamad Sadry
Wan Mamat, Wan Ali @ Wan Yusoff
Noordin, Mohamad Fauzan
Othman, Roslina
building IIUM Repository
collection Online Access
description Research in Information Technology on the content of Malay manuscripts is limited, especially research that takes a statistical approach rather than a rule-based one. This research proposes a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents. It assesses the quality scores obtained with a prevalent statistical model, Statistical Model Transliteration (SMT), for Jawi-Roman transliteration, and it follows an exploratory approach. The data were extracted from three Malay manuscripts acquired from ISTAC: Bidāyat al-Mubtadī bi-Faḍlillāh al-Muhdī, Kashf al-Asrār and Hujjat al-Balighah, giving a total of 3,420 rows of data transliterated into old Jawi, modern Jawi and Roman form. The SMT output is evaluated with two quality scores: the Bilingual Evaluation Understudy (BLEU) score and the word error rate. The findings show that, on the initial data, the E-Jawi.net word error rate is 55.8% for old Jawi-Roman and 32.42% for modern Jawi-Roman. Hence, the research opted to have a human expert develop a quality corpus for SMT, consisting of multiple transliterations of the manuscript contents in modern Jawi and Roman. Significantly, the model depends on a quality parallel corpus.
first_indexed 2025-11-14T17:29:33Z
format Monograph
id iium-73052
institution International Islamic University Malaysia
institution_category Local University
language English
last_indexed 2025-11-14T17:29:33Z
publishDate 2019
recordtype eprints
repository_type Digital Repository
spelling iium-73052 2019-12-01T03:57:12Z http://irep.iium.edu.my/73052/ 2019-07-01 Monograph NonPeerReviewed application/pdf en http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf Abu Seman, Muhamad Sadry and Wan Mamat, Wan Ali @ Wan Yusoff and Noordin, Mohamad Fauzan and Othman, Roslina (2019) A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium. Research Report. UNSPECIFIED. (Unpublished)
title A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
topic T Technology (General)
Z665 Library Science. Information Science
ZA4450 Databases
url http://irep.iium.edu.my/73052/
http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf