Models and methods to integrate epidemiological and whole genome sequence data for effectively analysing infectious disease outbreak data
Advances in sequencing technology and the reduction in associated costs have enabled scientists to obtain highly detailed genomic data on disease-causing pathogens on a scale never seen before. Combining genomic data with traditional epidemiological data (e.g. incidence data) provides a unique oppor...
| Main Author: | Marsh, J. S. |
|---|---|
| Format: | Thesis (University of Nottingham only) |
| Language: | English |
| Published: | 2024 |
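| Subjects: | stochastic epidemic models; bayesian inference; genetic data; transmission trees; mrsa; foot and mouth disease; pairwise genetic distance data; who infected whom |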
| Online Access: | https://eprints.nottingham.ac.uk/77270/ |
| _version_ | 1848800980441759744 |
|---|---|
| author | Marsh, J. S. |
| building | Nottingham Research Data Repository |
| collection | Online Access |
| description | Advances in sequencing technology and the reduction in associated costs have enabled scientists to obtain highly detailed genomic data on disease-causing pathogens on a scale never seen before. Combining genomic data with traditional epidemiological data (e.g. incidence data) provides a unique opportunity to determine the actual transmission pathway of the pathogen through a population. Despite recent advances, existing approaches have limitations, such as simplifications to the underlying biological processes, arbitrary phenomenological models or approximations to the likelihood function.
We present a novel modelling framework for integrating epidemiological and whole genome sequence data that overcomes the above limitations: (i) we use the matrix of pairwise genetic distances between sequences as a summary statistic for the genetic data and (ii) we explicitly derive the joint probability distribution of pairwise genetic distances under the assumption of microevolution mutation models. We develop bespoke and computationally efficient data-augmentation MCMC algorithms to infer the transmission network, infection times and unobserved genetic distances from pathogen sequences at the time of transmission.
The framework presented is general and applicable to a variety of outbreak scenarios. For example, we explicitly consider a discrete-time transmission model for healthcare-associated infections, demonstrate the performance of our framework on simulated data, and analyse an outbreak of *S. aureus* in an intensive care unit in Brighton during 2011-2012. Our approach integrates healthcare worker data at an individual level and considers the possibility of multiple distinct genetic subtypes.
Finally, we consider integrating genetic data with a continuous-time SEIR model and analyse an outbreak of foot-and-mouth disease in Darlington, a town in the north east of England, in 2001. We validate our inferred transmission network against previous modelling studies and demonstrate that pairwise genetic distance is an informative summary of the raw sequence data. |
| first_indexed | 2025-11-14T21:00:11Z |
| format | Thesis (University of Nottingham only) |
| id | nottingham-77270 |
| institution | University of Nottingham Malaysia Campus |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-14T21:00:11Z |
| publishDate | 2024 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | nottingham-77270 2024-07-24T04:40:51Z https://eprints.nottingham.ac.uk/77270/ Marsh, J. S. (2024) Models and methods to integrate epidemiological and whole genome sequence data for effectively analysing infectious disease outbreak data. PhD thesis, University of Nottingham. 2024-07-24 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en cc_by https://eprints.nottingham.ac.uk/77270/1/JoeMarshPhDThesis.pdf |
| title | Models and methods to integrate epidemiological and whole genome sequence data for effectively analysing infectious disease outbreak data |
| topic | stochastic epidemic models; bayesian inference; genetic data; transmission trees; mrsa; foot and mouth disease; pairwise genetic distance data; who infected whom |
| url | https://eprints.nottingham.ac.uk/77270/ |
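
The description in this record summarises the genetic data by the matrix of pairwise genetic distances between sequences. The record does not say how that distance is defined, so the following is only a minimal sketch under stated assumptions: sequences are already aligned to equal length, and distance is taken to be the number of differing sites (a Hamming or SNP count). The function names and the toy sequences are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distance_matrix(seqs: list[str]) -> list[list[int]]:
    """Build the symmetric matrix of pairwise distances (zero diagonal)."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = hamming(seqs[i], seqs[j])
    return dist


# Toy aligned sequences, invented purely for illustration.
toy_sequences = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
matrix = pairwise_distance_matrix(toy_sequences)

for row in matrix:
    print(row)
```

A matrix of this kind has one entry per pair of sampled sequences, which is what makes it usable as a compact summary statistic in place of the raw sequences.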