Sentence-based alignment for parallel text corpora preparation for machine translation.
Natural Language Processing (NLP) underpins many everyday applications, such as speech recognition and machine translation. Machine translation is important in daily life because it is faster at translating a large nu...
| Main Author: | Lee, Yong Wei |
|---|---|
| Format: | Final Year Project / Dissertation / Thesis |
| Published: | 2021 |
| Subjects: | QA75 Electronic computers. Computer science; T Technology (General) |
| Online Access: | http://eprints.utar.edu.my/4261/ http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf |
| _version_ | 1848886112435568640 |
|---|---|
| author | Lee, Yong Wei |
| building | UTAR Institutional Repository |
| collection | Online Access |
| description | Natural Language Processing (NLP) underpins many everyday applications, such as speech recognition and machine translation. Machine translation matters in daily life because it translates large volumes of text faster, and at lower cost, than human translators. A parallel corpus is a key resource for training translation systems and for language teaching, and a high-quality parallel corpus greatly improves machine translation accuracy. Sentence-based alignment of parallel text corpora therefore plays an important role in NLP, especially in machine translation. However, parallel corpora are scarce for some source and target language pairs, and machine translation accuracy for some target languages remains low. This study therefore proposes an approach for generating a parallel corpus for a source language and a target language. A parallel corpus of English (source language) and Malay (target language) is collected, and a neural machine translation system is developed using a recurrent neural network (RNN) model. The model achieves a training accuracy of 0.9, and the translated Malay text achieves a BLEU score of 0.65, which is considered a good score. |
| first_indexed | 2025-11-15T19:33:19Z |
| format | Final Year Project / Dissertation / Thesis |
| id | utar-4261 |
| institution | Universiti Tunku Abdul Rahman |
| institution_category | Local University |
| last_indexed | 2025-11-15T19:33:19Z |
| publishDate | 2021 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | utar-4261 2022-03-09T13:04:36Z Sentence-based alignment for parallel text corpora preparation for machine translation. Lee, Yong Wei QA75 Electronic computers. Computer science T Technology (General) 2021-04-15 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf Lee, Yong Wei (2021) Sentence-based alignment for parallel text corpora preparation for machine translation. Final Year Project, UTAR. http://eprints.utar.edu.my/4261/ |
| title | Sentence-based alignment for parallel text corpora preparation for machine translation. |
| topic | QA75 Electronic computers. Computer science T Technology (General) |
| url | http://eprints.utar.edu.my/4261/ http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf |