Statistical analysis of proteomic mass spectrometry data


Bibliographic Details
Main Author: Handley, Kelly
Format: Thesis (University of Nottingham only)
Language: English
Published: 2007
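Subjects: Markov chain Monte Carlo; MCMC; multilevel modelling; classification; high-dimensional; bioinformatics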
Online Access: https://eprints.nottingham.ac.uk/10287/
author Handley, Kelly
building Nottingham Research Data Repository
collection Online Access
description This thesis considers the statistical modelling and analysis of proteomic mass spectrometry data. Proteomics is a relatively new field of study, and tried and tested methods of analysis do not yet exist. Mass spectrometry output is high-dimensional, so we first develop an algorithm to identify peaks in the spectra in order to reduce the dimensionality of the datasets. We use the results, along with a variety of classification methods, to examine the classification of new spectra based on a training set. Another way to reduce the complexity of the problem is to fit a parametric model to the data. We model the data as a mixture of Gaussian peaks with parameters representing the peak locations, heights and variances, and apply a Bayesian Markov chain Monte Carlo (MCMC) algorithm to estimate them. The resulting estimates are used to identify m/z values where differences are apparent between groups, where the m/z value of an ion is its mass divided by its charge. A multilevel modelling framework is also considered to incorporate the structure in the data, and locations exhibiting differences are again obtained. We consider two mass spectrometry datasets in detail. The first consists of mass spectra from breast cancer cells which either have or have not been treated with the chemotherapeutic agent Taxol. The second consists of mass spectra from melanoma cells classified as stage I or stage IV using the TNM system. Using the MCMC and multilevel techniques described above we show that, in both datasets, small subsets of the available m/z values can be identified which exhibit significant differences in protein expression between groups. We also see that good classification of new data can be achieved using a small number of m/z values, and that the classification rate does not fall greatly when compared with results from the complete spectra. For both datasets we compare our results with those in the literature which use other techniques on the same data. We conclude by discussing potential areas for further research.
first_indexed 2025-11-14T18:22:28Z
format Thesis (University of Nottingham only)
id nottingham-10287
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T18:22:28Z
publishDate 2007
recordtype eprints
repository_type Digital Repository
spelling nottingham-10287 2025-02-28T11:07:43Z https://eprints.nottingham.ac.uk/10287/ Handley, Kelly (2007) Statistical analysis of proteomic mass spectrometry data. PhD thesis, University of Nottingham. NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/10287/1/thesis_final.pdf
title Statistical analysis of proteomic mass spectrometry data
topic Markov chain Monte Carlo
MCMC
multilevel modelling
classification
high-dimensional
bioinformatics
url https://eprints.nottingham.ac.uk/10287/