Performance analysis of bacterial genome assemblers using Illumina next generation sequencing data / Nur ‘ Ain Mohd Ishak

The advancement of next generation sequencing (NGS) technology has revolutionized the field of genomic and genetic studies. Compared to conventional methods, NGS generates comprehensive genomic data at a fraction of the cost and with higher accuracy. One of the processing and analyzing...


Bibliographic Details
Main Author: Nur ‘ Ain , Mohd Ishak
Format: Thesis
Published: 2020
Subjects:
Online Access: http://studentsrepo.um.edu.my/12724/
http://studentsrepo.um.edu.my/12724/1/Nur_'ain.pdf
http://studentsrepo.um.edu.my/12724/2/Nur_%E2%80%98ain.pdf
_version_ 1848774718050533376
author Nur ‘ Ain , Mohd Ishak
building UM Research Repository
collection Online Access
description The advancement of next generation sequencing (NGS) technology has revolutionized the field of genomic and genetic studies. Compared to conventional methods, NGS generates comprehensive genomic data at a fraction of the cost and with higher accuracy. One of the steps in processing and analyzing NGS data is genome assembly. De novo assembly is the process of assembling short reads into contiguous sections of sequence without a reference, which distinguishes it from the conventional mapping technique. The de Bruijn graph is one of the assembly algorithms widely used for short-read sequences produced by NGS platforms. This study reports the performance of four de novo assemblers (SPAdes, ABySS, Velvet and MaSuRCA), each of which applies a variant of the de Bruijn graph algorithm, using genomic data generated by the Illumina sequencing platform. The computational performance of the assemblers was compared in terms of running time. The assembled contigs and scaffolds were also evaluated on several quality measures, specifically their length and the contiguity of the assembly, using ABySS-fac. Results showed that on single-end data sets, MaSuRCA and SPAdes generally produced the best results among the four assemblers, with the highest percentage of contigs equal to or longer than 500 bp, the highest total base pairs, the highest N50 and the lowest L50 in most cases. For paired-end data sets, Velvet was suitable for assembling all seven bacterial genome sequences. This comparative study will advance current knowledge of de novo genome assembly, as it is the first step toward characterizing and revealing whole genomic information. In addition, this work provides a practical guideline that could aid researchers in identifying the appropriate assembler(s) for their research projects.
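The contiguity measures named in the description (percentage of contigs of at least 500 bp, total base pairs, N50 and L50) can be illustrated with a minimal Python sketch. It is not the thesis's method and not the ABySS-fac implementation; the function name `assembly_stats`, the `AssemblyStats` tuple and the choice to compute N50 and L50 only over contigs that pass the length cut-off are assumptions made for illustration.

```python
from typing import NamedTuple, Sequence


class AssemblyStats(NamedTuple):
    total_bp: int      # summed length of contigs >= min_len
    n50: int           # length of the contig at which half of total_bp is reached
    l50: int           # number of contigs needed to reach half of total_bp
    frac_long: float   # fraction of all contigs with length >= min_len


def assembly_stats(lengths: Sequence[int], min_len: int = 500) -> AssemblyStats:
    """Hypothetical helper: contiguity statistics from a list of contig lengths.

    Assumption: contigs shorter than min_len are excluded before N50 and L50
    are computed.
    """
    # Keep only contigs that pass the cut-off, longest first.
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    frac_long = len(kept) / len(lengths) if lengths else 0.0

    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        # The first contig that brings the running sum to at least half of the
        # total defines N50 (its length) and L50 (its rank).
        if 2 * running >= total:
            return AssemblyStats(total, length, count, frac_long)

    # No contig passed the cut-off.
    return AssemblyStats(0, 0, 0, frac_long)


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(assembly_stats([100, 600, 800, 1000, 1600]))
```

A higher N50 together with a lower L50 indicates that fewer, longer contigs cover half of the assembly, which is why the description treats that combination as the better result.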
first_indexed 2025-11-14T14:02:45Z
format Thesis
id um-12724
institution University Malaya
institution_category Local University
last_indexed 2025-11-14T14:02:45Z
publishDate 2020
recordtype eprints
repository_type Digital Repository
spelling um-12724 2021-12-14T19:07:31Z Performance analysis of bacterial genome assemblers using illumina next generation sequencing data / Nur ‘ Ain Mohd Ishak Nur ‘ Ain , Mohd Ishak Q Science (General) QH301 Biology 2020-10 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/12724/1/Nur_'ain.pdf application/pdf http://studentsrepo.um.edu.my/12724/2/Nur_%E2%80%98ain.pdf Nur ‘ Ain , Mohd Ishak (2020) Performance analysis of bacterial genome assemblers using illumina next generation sequencing data / Nur ‘ Ain Mohd Ishak. Masters thesis, Universiti Malaya. http://studentsrepo.um.edu.my/12724/
title Performance analysis of bacterial genome assemblers using illumina next generation sequencing data / Nur ‘ Ain Mohd Ishak
topic Q Science (General)
QH301 Biology