Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad

Advances in science and technology allow numerous complex scientific applications to be executed in heterogeneous computing environments. However, efficient scheduling remains the bottleneck. Such complex applications can be expressed as workflows. Geographically distribu...

Bibliographic Details
Main Author: Saima Gulzar, Ahmad
Format: Thesis
Published: 2017
Subjects: QA75 Electronic computers. Computer science
Online Access: http://studentsrepo.um.edu.my/7761/
http://studentsrepo.um.edu.my/7761/2/All.pdf
http://studentsrepo.um.edu.my/7761/1/thesis.pdf
_version_ 1848773478477463552
author Saima Gulzar, Ahmad
author_facet Saima Gulzar, Ahmad
author_sort Saima Gulzar, Ahmad
building UM Research Repository
collection Online Access
description Advances in science and technology allow numerous complex scientific applications to be executed in heterogeneous computing environments. However, efficient scheduling remains the bottleneck. Such complex applications can be expressed as workflows. Geographically distributed heterogeneous resources can execute such workflows in parallel, which improves workflow execution. In data-intensive workflows, large volumes of data move between execution nodes, causing high communication overhead. Many techniques have been used to avoid this overhead; this thesis uses a stream-based data processing model, in which data is processed as a continuous stream of data items. Data-intensive workflow optimization is an active research area because numerous applications produce huge amounts of data, and that volume is growing exponentially. This thesis proposes data-intensive workflow optimization algorithms. The first algorithm consists of two phases: (a) workflow partitioning and (b) partition mapping. Partitions are formed so that a minimum amount of data moves across partitions. Because each partition is mapped to a single execution node, heavy data processing happens locally on that node, which avoids high communication costs. In the mapping phase, each partition is mapped to the execution node that offers the minimum execution time. The workflow is then executed. The second algorithm is a variation of the first in which data parallelism is introduced in each partition: the most compute-intensive task in each partition is identified and data parallelism is applied to that task, reducing its execution time.
The simulation results show that the proposed algorithms outperform state-of-the-art algorithms for a variety of workflows. The datasets used for performance evaluation include synthesized workflows as well as workflows derived from real-world applications. The latter include Montage and Cybershake. Synthesized workflows were generated with different sizes, shapes and densities to evaluate the proposed algorithms. The simulation results show 60% lower latency and a 47% improvement in throughput. When data parallelism is introduced, performance improves further, by 12% in latency and 17% in throughput compared to the PDWA algorithm. For the real-time stream processing framework, experiments were performed using STORM with a use-case data-intensive workflow (EURExpressII). These experiments show that PDWA performs better in terms of workflow execution time across different input data sizes.
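The two phases described above (partition so that little data crosses partitions, then map each partition to the node with the minimum execution time) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not say how partitions are formed, so the greedy heaviest-edge merging below is an assumed stand-in, and all names (partition_workflow, map_partitions, heaviest_task, the work and speed tables) are hypothetical. The sketch also assumes each node hosts at most one partition and that execution time is work divided by node speed.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def partition_workflow(tasks: List[str], data: Dict[Edge, float], k: int) -> List[Set[str]]:
    """Assumed heuristic: merge the endpoints of the heaviest edges until k partitions remain."""
    owner = {t: {t} for t in tasks}  # task -> the partition set it currently belongs to
    count = len(tasks)
    for (u, v), _ in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        if count <= k:
            break
        if owner[u] is not owner[v]:
            merged = owner[u] | owner[v]
            for t in merged:
                owner[t] = merged
            count -= 1
    seen, parts = set(), []
    for t in tasks:  # collect the distinct partitions in task order
        if id(owner[t]) not in seen:
            seen.add(id(owner[t]))
            parts.append(owner[t])
    return parts


def cross_partition_data(parts: List[Set[str]], data: Dict[Edge, float]) -> float:
    """Total data volume that still has to move between partitions."""
    index = {t: i for i, p in enumerate(parts) for t in p}
    return sum(w for (u, v), w in data.items() if index[u] != index[v])


def map_partitions(parts: List[Set[str]], work: Dict[str, float],
                   speed: Dict[str, float]) -> Dict[int, str]:
    """Map each partition (largest first) to the free node with the lowest estimated time."""
    if len(parts) > len(speed):
        raise ValueError("this sketch assumes at least one node per partition")
    loads = [sum(work[t] for t in p) for p in parts]
    free = dict(speed)
    mapping = {}
    for i in sorted(range(len(parts)), key=lambda i: -loads[i]):
        node = min(free, key=lambda n: loads[i] / free[n])
        mapping[i] = node
        del free[node]
    return mapping


def heaviest_task(part: Set[str], work: Dict[str, float]) -> str:
    """Candidate for data parallelism: the most compute-intensive task in a partition."""
    return max(sorted(part), key=lambda t: work[t])


# Toy workflow: edge weights are data volumes, work values are compute costs.
tasks = ["a", "b", "c", "d"]
data = {("a", "b"): 100.0, ("b", "c"): 5.0, ("c", "d"): 80.0, ("a", "c"): 10.0}
work = {"a": 4.0, "b": 6.0, "c": 20.0, "d": 10.0}
speed = {"n1": 1.0, "n2": 2.0}

parts = partition_workflow(tasks, data, k=2)
mapping = map_partitions(parts, work, speed)
```

In this toy case the two heavy edges end up inside partitions, so only the two light edges cross a partition boundary, and the larger partition is placed on the faster node. A real scheduler would also need to respect task dependencies and node capacity, which this sketch ignores.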
first_indexed 2025-11-14T13:43:03Z
format Thesis
id um-7761
institution University Malaya
institution_category Local University
last_indexed 2025-11-14T13:43:03Z
publishDate 2017
recordtype eprints
repository_type Digital Repository
spelling um-7761 2017-09-19T08:21:38Z Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad Saima Gulzar, Ahmad QA75 Electronic computers. Computer science 2017-07-24 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/7761/2/All.pdf application/pdf http://studentsrepo.um.edu.my/7761/1/thesis.pdf Saima Gulzar, Ahmad (2017) Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/7761/
spellingShingle QA75 Electronic computers. Computer science
Saima Gulzar, Ahmad
Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad
title Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad
title_full Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad
title_fullStr Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad
title_full_unstemmed Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad
title_short Workflow optimization in distributed computing environment for stream-based data processing model / Saima Gulzar Ahmad
title_sort workflow optimization in distributed computing environment for stream-based data processing model / saima gulzar ahmad
topic QA75 Electronic computers. Computer science
url http://studentsrepo.um.edu.my/7761/
http://studentsrepo.um.edu.my/7761/2/All.pdf
http://studentsrepo.um.edu.my/7761/1/thesis.pdf