An automated web scraping tool for Malaysia tourism

This project designs a web scraper for Malaysia tourism data. Data are the essential element of the data analytics process, but the value of most public tourism data on the Internet has been overlooked because collecting it is time-consuming and difficult. Therefore, th...

Bibliographic Details
Main Author: Choong, Wei Jen
Format: Final Year Project / Dissertation / Thesis
Published: 2019
Subjects: Q Science (General)
Online Access: http://eprints.utar.edu.my/3493/
http://eprints.utar.edu.my/3493/1/CS%2D2019%2D1505499.pdf
building UTAR Institutional Repository
collection Online Access
description This project designs a web scraper for Malaysia tourism data. Data are the essential element of the data analytics process, but the value of most public tourism data on the Internet has been overlooked because collecting it is time-consuming and difficult. This project is therefore motivated to provide a low-cost, simple solution for collecting public tourism data from the Internet. Through its realization, the project offers insights into methodology, concept, and design to those who want to build their own web scraper. On the technical side, an agile System Development Life Cycle (SDLC) methodology is adopted throughout the project. The emphasis is on capturing public tourism data from a travel website by targeting that website's HTML code structure. The project therefore demonstrates how to interpret the HTML code structure of a website and how to locate a target element for data extraction through an HTML locator. It also discusses the selection of the most suitable programming language, libraries, tools, and frameworks. Because the project is developed in Python, it also covers building a simple user interface in Python and saving the extracted data to a CSV file. The project further covers some data pre-processing, because the extracted data attributes may contain excessive text. Testing the performance of the proposed system is an important aspect of the project, so the most appropriate testing approach is surveyed and applied to the system. Finally, a contingency plan for backup and recovery is discussed in case the system encounters errors.
A web scraping system designed specifically for Malaysia tourism will be developed to ease the collection of tourism data; it could potentially draw the attention of the tourism industry and the government sector to public tourism data for the improvement of Malaysia tourism.
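
The workflow the description outlines (locate target elements in a page's HTML structure, trim excess text, save the result to a CSV file) might be sketched roughly as follows. This is an illustration only, not the thesis's implementation: the record does not say which libraries the project used, so the sketch relies on Python's standard library, and the sample markup, the class names `listing`, `name`, and `rating`, and the helper names are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup; a real travel site will use different tags and classes.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="name">  Petronas Twin Towers </h2>
  <span class="rating">4.5 of 5 bubbles</span>
</div>
<div class="listing">
  <h2 class="name">Batu Caves</h2>
  <span class="rating">4.0 of 5 bubbles</span>
</div>
"""


class ListingParser(HTMLParser):
    """Collect the text of elements whose class is 'name' or 'rating'."""

    FIELDS = ("name", "rating")

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per element with class "listing"
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "listing":
            self.rows.append({})
        elif cls in self.FIELDS and self.rows:
            self._field = cls
            self.rows[-1][cls] = ""

    def handle_data(self, data):
        if self._field is not None:
            self.rows[-1][self._field] += data

    def handle_endtag(self, tag):
        # Assumes target elements contain plain text with no nested tags.
        self._field = None


def clean(row):
    """Pre-process one row: collapse whitespace, keep only the leading rating number."""
    name = " ".join(row.get("name", "").split())
    rating_words = row.get("rating", "").split()
    rating = rating_words[0] if rating_words else ""
    return {"name": name, "rating": rating}


def scrape(html):
    """Parse the markup and return a list of cleaned rows."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return [clean(row) for row in parser.rows]


def write_csv(rows, fileobj):
    """Write the rows to an open text file object in CSV format."""
    writer = csv.DictWriter(fileobj, fieldnames=["name", "rating"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(scrape(SAMPLE_HTML), buffer)
    print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, `write_csv` can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`. Fetching live pages, and checking a site's terms of use before doing so, is outside this sketch.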
id utar-3493
institution Universiti Tunku Abdul Rahman
institution_category Local University
recordtype eprints
repository_type Digital Repository
spelling utar-3493 2019-08-20T04:16:07Z An automated web scraping tool for Malaysia tourism Choong, Wei Jen Q Science (General) 2019-04-23 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/3493/1/CS%2D2019%2D1505499.pdf Choong, Wei Jen (2019) An automated web scraping tool for Malaysia tourism. Final Year Project, UTAR. http://eprints.utar.edu.my/3493/
topic Q Science (General)