Sentence-based alignment for parallel text corpora preparation for machine translation.

Natural Language Processing (NLP) underpins many everyday applications, such as speech recognition and machine translation. Machine translation matters in daily life because it is faster to translate a large nu...

Bibliographic Details
Main Author: Lee, Yong Wei
Format: Final Year Project / Dissertation / Thesis
Published: 2021
Subjects: QA75 Electronic computers. Computer science; T Technology (General)
Online Access: http://eprints.utar.edu.my/4261/
http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf
_version_ 1848886112435568640
author Lee, Yong Wei
author_facet Lee, Yong Wei
author_sort Lee, Yong Wei
building UTAR Institutional Repository
collection Online Access
description Natural Language Processing (NLP) underpins many everyday applications, such as speech recognition and machine translation. Machine translation matters in daily life because it translates large volumes of text faster than human translators and at lower cost. In machine translation, a parallel corpus plays a significant role as a resource for translation training and language teaching, and a good-quality parallel corpus greatly increases translation accuracy. Sentence-based alignment of parallel text corpora is therefore important for NLP, especially for machine translation. However, parallel corpus resources are limited for some source and target language pairs, and machine translation accuracy for some target languages is still low. This study therefore proposes an approach for generating a parallel corpus for a source and target language. A parallel corpus of English (source language) and Malay (target language) is collected, and a machine translation system is developed using a recurrent neural network (RNN) translation model. The model obtains a training accuracy of 0.9, and the translated Malay text achieves a BLEU score of 0.65, which is considered a good score.
first_indexed 2025-11-15T19:33:19Z
format Final Year Project / Dissertation / Thesis
id utar-4261
institution Universiti Tunku Abdul Rahman
institution_category Local University
last_indexed 2025-11-15T19:33:19Z
publishDate 2021
recordtype eprints
repository_type Digital Repository
spelling utar-4261 2022-03-09T13:04:36Z Sentence-based alignment for parallel text corpora preparation for machine translation. Lee, Yong Wei QA75 Electronic computers. Computer science T Technology (General) 2021-04-15 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf Lee, Yong Wei (2021) Sentence-based alignment for parallel text corpora preparation for machine translation.
Final Year Project, UTAR. http://eprints.utar.edu.my/4261/
title Sentence-based alignment for parallel text corpora preparation for machine translation.
topic QA75 Electronic computers. Computer science
T Technology (General)
url http://eprints.utar.edu.my/4261/
http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf