Nanopore adaptive sequencing of gigabase length genomes for mixed samples, whole exome capture, and targeted panels

Bibliographic Details
Main Author: Payne, Stuart Alexander
Format: Thesis (University of Nottingham only)
Language: English
Published: 2022
Online Access: https://eprints.nottingham.ac.uk/69266/
Description
Summary: Single molecule sequencing technologies, such as nanopore sequencing, provide new ways to investigate genomes and genetics. They permit the detailed analysis of stretches of DNA orders of magnitude larger than previously possible. Studying genomes at this detail allows for a better understanding of genome organisation and structural variants that are typically difficult to resolve using short read sequencing. Oxford Nanopore Technologies sequencers drive single molecules of DNA through membrane-bound protein nanopores by applying a voltage across the membrane. This applied voltage draws ions and DNA through the nanopore, and the resulting ionic current is measured as a real-time data stream. Inspecting the current data in real time allows specific molecules to be rejected by reversing the voltage across an individual nanopore. This process is called “Read Until”. Previously, Read Until has been carried out by inspecting and comparing the current data produced during sequencing. This dissertation proposes a method for implementing Read Until using graphics cards to accelerate basecalling, combined with optimised real-time alignment. To build up to a full system for selective sequencing, the raw signal data that nanopore sequencers output must first be assessed (Chapter 3), specifically to better understand the characteristics of the continuous data stream. This is accomplished by inspecting bulk FAST5 files, for which a visualisation application is first built. This visualisation application is then used to assess both DNA and RNA samples, looking specifically at how unblocking is actioned and the impact it has on sequencing. With a grasp of the raw signal data, an application, readfish, is developed to enable real-time basecalling of read chunks from molecules that are currently being sequenced (Chapter 4). This approach uses GPU-accelerated basecalling and fast alignment to decide whether to select or reject individual molecules.
In addition, a schema is designed that allows arbitrary experiments to be devised and multiple experiments to take place simultaneously. Then, an optimised CPU basecaller and barcode demultiplexing are incorporated, extending the platforms and sample types that can be considered. As a proof of concept, readfish is used to selectively sequence target panels encompassing thousands of loci, in the form of whole exome sequencing of the human cell line NA12878. This single experiment demonstrates great flexibility in the chosen target panel and the ability to use reference genomes at a gigabase scale. In further experiments using the ZymoBIOMICS mock community, adaptive techniques are introduced in which the experimental parameters are updated dynamically in response to the data generated by the same experiment. Finally, exemplar problems and applications of selective sequencing are considered, as well as other practical mechanisms for real-time feedback that make the whole process adaptive (Chapter 5). These exemplar problems show how the methods developed in this thesis enable time-efficient screening using panels of gene targets, decrease the time to identify fusions in a leukaemic cell line, and reduce sequencing costs through standard library preparation methods.
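
Illustrative sketch: the selection step outlined in the summary (basecall a chunk of a molecule that is still being sequenced, align it, then keep or reject the molecule) could look roughly like the Python below. This is not the readfish implementation or its API. The `Alignment` record, the `Targets` mapping, the `decide` function, the `max_chunks` limit, and the three action labels are all hypothetical names chosen for illustration, and rejecting reads that remain unmapped after `max_chunks` chunks is an assumed policy, not one stated in the thesis summary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record of where a basecalled chunk mapped."""
    contig: str
    position: int


# Hypothetical target panel: contig name -> half-open (start, end) intervals.
Targets = Dict[str, List[Tuple[int, int]]]


def on_target(aln: Alignment, targets: Targets) -> bool:
    """True if the alignment start falls inside any target interval."""
    return any(start <= aln.position < end
               for start, end in targets.get(aln.contig, []))


def decide(alignments: List[Alignment], targets: Targets,
           chunks_seen: int, max_chunks: int = 4) -> str:
    """Choose an action for one molecule from its current alignments.

    Returns one of three illustrative labels:
      "stop_receiving" - on target, let the molecule sequence to completion
      "unblock"        - off target, reverse the voltage to eject it
      "proceed"        - not enough evidence yet, wait for another chunk
    """
    if not alignments:
        # Assumed policy: give up on reads that never map.
        return "unblock" if chunks_seen >= max_chunks else "proceed"
    if any(on_target(a, targets) for a in alignments):
        return "stop_receiving"
    return "unblock"


if __name__ == "__main__":
    panel: Targets = {"chr1": [(100, 200)]}
    print(decide([Alignment("chr1", 150)], panel, chunks_seen=1))
    print(decide([Alignment("chr1", 250)], panel, chunks_seen=1))
    print(decide([], panel, chunks_seen=1))
```

In a real run this decision would be made repeatedly for every pore as new signal chunks arrive, so the basecalling and alignment that feed it have to keep pace with the sequencer.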