Cloud-to-cloud data transfer parallelization framework via spawning intermediate instances for scalable data migration

Bibliographic Details
Main Author: Boey, Calvin Mun Lek
Format: Final Year Project / Dissertation / Thesis
Published: 2019
Subjects: QA76 Computer software
Online Access: http://eprints.utar.edu.my/3570/
http://eprints.utar.edu.my/3570/1/Cloud%2Dto%2Dcloud_data_transfer_parallelization_framework_via_spawning_intermediate_instances_for_scalable_data_migration.pdf
Citation: Boey, Calvin Mun Lek (2019) Cloud-to-cloud data transfer parallelization framework via spawning intermediate instances for scalable data migration. Master dissertation/thesis, Universiti Tunku Abdul Rahman (UTAR).

Description

As enterprises increasingly adopt multi-cloud federation, scalable data transfer between cloud datacenters has become important to cloud consumers. Much existing work takes the service provider's perspective and requires insight into datacenter operations that is not available to the cloud consumer. This dissertation proposes a data transfer framework that lets cloud consumers circumvent bandwidth limitations by spawning intermediate nodes and performing parallel, many-to-many transfers through them. The effectiveness of this approach, however, depends on factors such as the time required to spawn new nodes and the bandwidth between nodes. The objective of this work is to investigate the limitations and potential of cloud-to-cloud parallel transfer (CPT). First, all components involved in the parallel data transfer are identified and modelled, and the resulting transfer time and cost models are used to identify the circumstances in which parallel transfer is worthwhile. Two optimizations are then proposed to increase transfer throughput: pipelining, which lets the stages of the parallel transfer run concurrently, and network data piping, which reduces the time spent dividing files into chunks. Second, selected cloud virtual machines (VMs) are benchmarked, and based on the observed behavior, two techniques are proposed: pre-testing, which uses only the top-performing nodes, and VM-type selection, which chooses a suitable VM type and size. Third, CPT is implemented and tested on Amazon EC2, and an adaptation of CPT for transfer between Hadoop clusters is also tested. The results show that CPT not only achieves a shorter transfer time than DistCp but also a lower cost, by up to 8x in certain scenarios.
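The abstract does not reproduce the thesis's actual transfer time and cost models, so the following is only a minimal back-of-envelope sketch of the trade-off it describes: spawn time and per-node bandwidth decide when parallel transfer beats a direct transfer. Everything in it is an assumption for illustration: the names (`Scenario`, `spawn_s`, `node_bw`, `price_per_hour`, `break_even_mb`), linear scaling of aggregate bandwidth with node count, nodes booting concurrently, and per-second billing. The pipelining function is the textbook pipeline estimate, not the author's formulation.

```python
"""Hypothetical back-of-envelope model for parallel cloud-to-cloud transfer.

Assumptions (illustrative only, not the thesis's models):
- the direct transfer runs at a single-path bandwidth `direct_bw` (MB/s);
- each of `n_nodes` spawned nodes sustains `node_bw` MB/s, and the
  aggregate bandwidth scales linearly with the number of nodes;
- all nodes boot concurrently, taking `spawn_s` seconds before moving data;
- nodes are billed per second at a made-up `price_per_hour`.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scenario:
    data_mb: float         # total data to move, in MB
    direct_bw: float       # MB/s of the direct (single-path) transfer
    node_bw: float         # MB/s per intermediate node
    n_nodes: int           # number of intermediate nodes spawned
    spawn_s: float         # seconds to spawn the nodes (in parallel)
    price_per_hour: float  # hypothetical price per node-hour


def direct_time(s: Scenario) -> float:
    """Seconds to move all data over the direct path."""
    return s.data_mb / s.direct_bw


def parallel_time(s: Scenario) -> float:
    """Spawn overhead is paid once, then data is split across all nodes."""
    return s.spawn_s + s.data_mb / (s.n_nodes * s.node_bw)


def parallel_cost(s: Scenario) -> float:
    """Every node is billed for the whole parallel run, boot time included."""
    return s.n_nodes * s.price_per_hour * parallel_time(s) / 3600.0


def break_even_mb(s: Scenario) -> float:
    """Smallest data size for which parallel transfer is faster (inf if never)."""
    gain_per_mb = 1.0 / s.direct_bw - 1.0 / (s.n_nodes * s.node_bw)
    if gain_per_mb <= 0:
        return float("inf")  # aggregate bandwidth no better than direct path
    return s.spawn_s / gain_per_mb


def sequential_time(stage_s: list[float], chunks: int) -> float:
    """Each chunk passes through every stage before the next chunk starts."""
    return chunks * sum(stage_s)


def pipelined_time(stage_s: list[float], chunks: int) -> float:
    """Textbook pipeline estimate: fill once, then one slowest stage per chunk."""
    return sum(stage_s) + (chunks - 1) * max(stage_s)


if __name__ == "__main__":
    # Made-up numbers: 100 GB, 100 MB/s direct, ten 100 MB/s nodes, 60 s spawn.
    s = Scenario(data_mb=100_000, direct_bw=100, node_bw=100,
                 n_nodes=10, spawn_s=60, price_per_hour=0.10)
    print(direct_time(s))           # 1000.0
    print(parallel_time(s))         # 160.0
    print(round(break_even_mb(s)))  # 6667
    # Hypothetical per-chunk stage times: split, cross-cloud send, merge.
    print(sequential_time([2.0, 5.0, 3.0], chunks=100))  # 1000.0
    print(pipelined_time([2.0, 5.0, 3.0], chunks=100))   # 505.0
```

Under these invented numbers the parallel path wins for anything above roughly 6.7 GB, and because the break-even size is proportional to `spawn_s`, halving the spawn time halves that threshold. This is consistent with the abstract naming spawn time as a key factor, but real figures would come from the benchmarks in the thesis, not from this model.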