A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium

Bibliographic Details
Main Authors:	Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina
Format:	Monograph
Language:	English
Published:	2019
Subjects:	T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases
Online Access:	http://irep.iium.edu.my/73052/ http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf

author	Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina
author_sort	Abu Seman, Muhamad Sadry
building	IIUM Repository
collection	Online Access
description	Research on the content of Malay manuscripts in Information Technology is limited, especially research using statistical approaches, compared with rule-based ones. This research proposes a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript content. It assesses the quality scores obtained when a prevalent statistical model, Statistical Model Transliteration (SMT), is used for Jawi-Roman transliteration. The research takes an exploratory approach. The data were extracted from three Malay manuscripts acquired from ISTAC (Bidāyat al-Mubtadī bi-Faḍlillāh al-Muhdī, Kashf al-Asrār and Hujjat al-Balighah), totalling 3,420 rows transliterated into old Jawi, modern Jawi and Roman forms. SMT output is evaluated using the Bilingual Evaluation Understudy (BLEU) score and word error rate. On the initial data, E-Jawi.net has a word error rate of 55.8% for old Jawi-Roman transliteration and 32.42% for modern Jawi-Roman transliteration. Hence, the research relied on human expertise to develop a quality corpus for SMT consisting of multiple transliterations of the manuscript content in modern Jawi and Roman. Significantly, the model depends on a high-quality parallel corpus.
first_indexed	2025-11-14T17:29:33Z
format	Monograph
id	iium-73052
institution	International Islamic University Malaysia
institution_category	Local University
language	English
last_indexed	2025-11-14T17:29:33Z
publishDate	2019
recordtype	eprints
repository_type	Digital Repository
spelling	iium-730522019-12-01T03:57:12Z http://irep.iium.edu.my/73052/ A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium Abu Seman, Muhamad Sadry Wan Mamat, Wan Ali @ Wan Yusoff Noordin, Mohamad Fauzan Othman, Roslina T Technology (General) Z665 Library Science. Information Science ZA4450 Databases Research on the content of Malay manuscripts in Information Technology is limited, especially research using statistical approaches, compared with rule-based ones. This research proposes a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript content. It assesses the quality scores obtained when a prevalent statistical model, Statistical Model Transliteration (SMT), is used for Jawi-Roman transliteration. The research takes an exploratory approach. The data were extracted from three Malay manuscripts acquired from ISTAC (Bidāyat al-Mubtadī bi-Faḍlillāh al-Muhdī, Kashf al-Asrār and Hujjat al-Balighah), totalling 3,420 rows transliterated into old Jawi, modern Jawi and Roman forms. SMT output is evaluated using the Bilingual Evaluation Understudy (BLEU) score and word error rate. On the initial data, E-Jawi.net has a word error rate of 55.8% for old Jawi-Roman transliteration and 32.42% for modern Jawi-Roman transliteration. Hence, the research relied on human expertise to develop a quality corpus for SMT consisting of multiple transliterations of the manuscript content in modern Jawi and Roman. Significantly, the model depends on a high-quality parallel corpus. 2019-07-01 Monograph NonPeerReviewed application/pdf en http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf Abu Seman, Muhamad Sadry and Wan Mamat, Wan Ali @ Wan Yusoff and Noordin, Mohamad Fauzan and Othman, Roslina (2019) A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium. Research Report. UNSPECIFIED. (Unpublished)
title	A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
topic	T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases
url	http://irep.iium.edu.my/73052/ http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf
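
Word error rate, one of the two metrics the description says was used to evaluate SMT output, is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a system transliteration and a reference transliteration, divided by the number of words in the reference. The sketch below illustrates that standard definition only. It is not the report's or E-Jawi.net's evaluation code, the function name `word_error_rate` is hypothetical, and the example sentences are invented for illustration rather than taken from the ISTAC manuscript data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper illustrating the standard WER definition; it
    tokenises on whitespace only, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion of a reference word
                dp[i][j - 1] + 1,         # insertion of an extra word
                dp[i - 1][j - 1] + cost,  # substitution, or match if cost == 0
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Invented example: one substitution (allah -> alah) and one deletion (maha).
    reference = "dengan nama allah yang maha pemurah"
    hypothesis = "dengan nama alah yang pemurah"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")  # WER = 33.33%
```

Because insertions are counted, WER can exceed 100% when the output is much longer than the reference. A corpus-level figure is often computed by summing edits and reference lengths over all rows rather than averaging per-row rates; the record does not say which aggregation underlies the 55.8% and 32.42% figures.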
