Big data challenges & opportunities for development using Hadoop 2.0 platform

Data analytics has become one of the fastest-growing research topics in computing, thanks to technological advances over the past few years that have led to the current deluge of data. Consequently, processing such large volumes of data has become a very complicated job, as the data kee...

Bibliographic Details
Main Author: Hegazi, Abdel Rahman Farag
Format: Dissertation (University of Nottingham only)
Language: English
Published: 2014
Subjects:
Online Access: https://eprints.nottingham.ac.uk/30755/
_version_ 1848794051548020736
author Hegazi, Abdel Rahman Farag
building Nottingham Research Data Repository
collection Online Access
description Data analytics has become one of the fastest-growing research topics in computing, thanks to technological advances over the past few years that have led to the current deluge of data. Consequently, processing such large volumes of data has become a very complicated job, as the data keeps growing continuously at enormous rates. Unfortunately, traditional data analytics systems cannot process these volumes because of the difficulty of managing system resources at such scale. Enhancing overall system performance is another issue, as performance keeps degrading while the queue of tasks waiting to be processed gets longer. The term “big data” was coined to describe this trend of fast-growing data sets. These obstacles add extra layers of complexity to the underlying platforms that perform data acquisition, transmission, storage management, and large-scale data processing. In this project we address the possibility of building a platform that processes large data sets with a fair, dynamic resource allocation scheme shared among all running jobs, instead of static allocation schemes that waste resources through unbalanced allocation and lower resource utilization. In addition, overall system performance should remain the same, if not improve, without degrading as more running applications request additional resources. We present a performance tuning scheme for the Apache Hadoop platform, an open-source framework for processing large-scale data sets on clusters of commodity hardware. Our results show an overall performance improvement of 11.7% and an overall resource utilization improvement of 17% in comparison with the untuned and older traditional data analytics systems.
first_indexed 2025-11-14T19:10:03Z
format Dissertation (University of Nottingham only)
id nottingham-30755
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T19:10:03Z
publishDate 2014
recordtype eprints
repository_type Digital Repository
spelling nottingham-30755 2017-10-19T15:04:36Z https://eprints.nottingham.ac.uk/30755/ Big data challenges & opportunities for development using Hadoop 2.0 platform Hegazi, Abdel Rahman Farag 2014-12-09 Dissertation (University of Nottingham only) NonPeerReviewed application/pdf en https://eprints.nottingham.ac.uk/30755/1/AHegazi_dledata_temp_turnitintool_766644993._13264_1413205799_116303.pdf Hegazi, Abdel Rahman Farag (2014) Big data challenges & opportunities for development using Hadoop 2.0 platform. [Dissertation (University of Nottingham only)]
title Big data challenges & opportunities for development using Hadoop 2.0 platform
topic Apache Hadoop
MapReduce
YARN
Dynamic Allocation
Big Data
Cluster
Data-sets
Data Warehouse
Resource Allocation
Data Analysis
Cloud Computing
url https://eprints.nottingham.ac.uk/30755/