Big data challenges & opportunities for development using Hadoop 2.0 platform

Today, data analytics has become one of the fastest-growing research topics in computing, thanks to technological advancements in the past few years that led to this era of data deluge. Consequently, processing such large volumes of data has become a very complicated job as the data kee...

Bibliographic Details
Main Author:	Hegazi, Abdel Rahman Farag
Format:	Dissertation (University of Nottingham only)
Language:	English
Published:	2014
Subjects:	Apache Hadoop; MapReduce; YARN; Dynamic Allocation; Big Data; Cluster; Data-sets; Data Warehouse; Resource Allocation; Data Analysis; Cloud Computing
Online Access:	https://eprints.nottingham.ac.uk/30755/

_version_	1848794051548020736
author	Hegazi, Abdel Rahman Farag
author_facet	Hegazi, Abdel Rahman Farag
author_sort	Hegazi, Abdel Rahman Farag
building	Nottingham Research Data Repository
collection	Online Access
description	Today, data analytics has become one of the fastest-growing research topics in computing, thanks to technological advancements in the past few years that led to this era of data deluge. Consequently, processing such large volumes of data has become a very complicated job, as the data keeps growing continuously at enormous rates. Unfortunately, traditional data analytics systems cannot process these large volumes of data because of the difficulty of managing system resources at this scale. Enhancing overall system performance is another issue, as performance keeps degrading while the queue of tasks waiting to be processed gets longer. The term “big data” was coined to describe this trend of fast-growing data-sets. These obstacles add extra layers of complexity to the underlying platforms that perform data acquisition, transmission, storage management, and large-scale data processing. In this project we address the possibility of having a platform that processes large data-sets with a fair, dynamic resource allocation scheme among all running jobs, instead of static allocation schemes that waste resources through unbalanced allocation and reduce resource utilization. In addition, overall system performance should remain the same, if not improve, with no degradation as more running applications ask for additional resources. We present a performance tuning scheme for the Apache Hadoop platform, an open-source framework for processing large-scale data-sets on clusters of commodity hardware. The outcome of our work shows an overall performance enhancement of 11.7% and an overall resource utilization improvement of 17% in comparison with the un-tuned and older traditional data analytics systems.
first_indexed	2025-11-14T19:10:03Z
format	Dissertation (University of Nottingham only)
id	nottingham-30755
institution	University of Nottingham Malaysia Campus
institution_category	Local University
language	English
last_indexed	2025-11-14T19:10:03Z
publishDate	2014
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-307552017-10-19T15:04:36Z https://eprints.nottingham.ac.uk/30755/ Big data challenges & opportunities for development using Hadoop 2.0 platform Hegazi, Abdel Rahman Farag 2014-12-09 Dissertation (University of Nottingham only) NonPeerReviewed application/pdf en https://eprints.nottingham.ac.uk/30755/1/AHegazi_dledata_temp_turnitintool_766644993._13264_1413205799_116303.pdf Hegazi, Abdel Rahman Farag (2014) Big data challenges & opportunities for development using Hadoop 2.0 platform. [Dissertation (University of Nottingham only)] Apache Hadoop MapReduce YARN Dynamic Allocation Big Data Cluster Data-sets. Data Warehouse Resource Allocation Data Analysis Cluster Cloud Computing
spellingShingle	Apache Hadoop MapReduce YARN Dynamic Allocation Big Data Cluster Data-sets. Data Warehouse Resource Allocation Data Analysis Cluster Cloud Computing Hegazi, Abdel Rahman Farag Big data challenges & opportunities for development using Hadoop 2.0 platform
title	Big data challenges & opportunities for development using Hadoop 2.0 platform
topic	Apache Hadoop MapReduce YARN Dynamic Allocation Big Data Cluster Data-sets. Data Warehouse Resource Allocation Data Analysis Cluster Cloud Computing
url	https://eprints.nottingham.ac.uk/30755/

Similar Items