Unstructured big data processing in cloud computing environment by using Amazon Elastic Map Reduce


Bibliographic Details
Main Author: Busu, Norzaharawani
Format: Thesis (Masters)
Institution: Universiti Putra Malaysia
Language: English
Published: 2017
Subjects: Cloud computing - Data processing; Big data
Online Access: http://psasir.upm.edu.my/id/eprint/67852/
http://psasir.upm.edu.my/id/eprint/67852/1/FSKTM%202017%2024%20IR.pdf

Description
The continuing growth of content on the web provides a huge pool of shared resources. Twitter, one of the biggest social media sites, collects millions of tweets every day, amounting to data in the petabyte range per year. People share their experiences and thoughts online, or simply talk about whatever concerns them. Unstructured big data from social media plays a vital role in sentiment analysis, also known as opinion mining. Structured and unstructured data are generated continuously and at large scale every day, and these data are meaningless unless they are captured and analyzed appropriately. Traditional RDBMS technology becomes less reliable when dealing with huge amounts of structured data, and processing slows down if the infrastructure is not upgraded to match the data volume. Furthermore, an RDBMS is not designed to handle unstructured data. Because petabytes of records are generated on the web every year, capturing and analyzing big data is challenging; cloud computing technologies can provide on-demand infrastructure and services based on user requirements. This thesis therefore uses a cloud-based infrastructure, Amazon Web Services (AWS), to capture unstructured big data and then to analyze, visualize and extract useful information from large, diverse, distributed and mixed data gathered from public data sets and Twitter's Application Programming Interface (API). The experiments reported in Chapter Four present test-bed results for collecting Twitter data, for processing the Twitter input data, and for the output data. The analysis focuses on the elapsed time for collecting Twitter data and on the performance of Amazon Elastic MapReduce (EMR). The infrastructure provided by Amazon Web Services proved capable of capturing and manipulating large volumes of unstructured big data from Twitter.
The study then tested the capability of Amazon EMR to process the previously collected Twitter input data and transform it into meaningful output that can support decision making.
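The abstract does not state which job was run on EMR. As a hedged illustration only, the sketch below simulates locally the map, shuffle and reduce phases that a MapReduce job on EMR goes through, using a hypothetical task: counting hashtags in collected tweets. It assumes the tweets were stored one JSON object per line with a `text` field; the function names `mapper`, `shuffle`, `reducer` and `run_job` are illustrative and come neither from the thesis nor from any AWS API.

```python
import json
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase (hypothetical task): emit (hashtag, 1) for every hashtag in a tweet.

    Assumes one JSON tweet object per line with a "text" field.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records, which raw collections often contain
        if not isinstance(tweet, dict):
            continue  # valid JSON but not a tweet object
        for token in tweet.get("text", "").split():
            # Normalise case and strip trailing punctuation, e.g. "#AWS!" -> "#aws"
            tag = token.lower().rstrip(".,!?:;")
            if tag.startswith("#") and len(tag) > 1:
                yield tag, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Shuffle/sort phase: the framework sorts map output by key before reducing."""
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum the counts of each consecutive run of identical keys."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce in a single process (local simulation only)."""
    return dict(reducer(shuffle(mapper(lines))))


if __name__ == "__main__":
    sample = [
        '{"text": "Trying #AWS EMR for #BigData today"}',
        '{"text": "#bigdata is everywhere!"}',
        "not json",
        '{"user": "x"}',
    ]
    for tag, count in sorted(run_job(sample).items()):
        print(f"{tag}\t{count}")
```

On a real cluster, Hadoop Streaming would run the map and reduce logic as separate executables that read standard input and write tab-separated key/value lines, and the framework itself would perform the sort that `shuffle` imitates here.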

Citation: Busu, Norzaharawani (2017) Unstructured big data processing in cloud computing environment by using Amazon Elastic Map Reduce. Masters thesis, Universiti Putra Malaysia.