Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology

Bibliographic Details
Main Authors: Swan, Anna L., Mobasheri, Ali, Allaway, David, Liddell, Susan, Bacardit, Jaume
Format: Article
Published: Mary Ann Liebert 2013
Online Access: https://eprints.nottingham.ac.uk/2349/
building Nottingham Research Data Repository
collection Online Access
description Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high-throughput capabilities. However, the large datasets it generates require informatics approaches, such as machine learning techniques, to analyze and interpret the relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: directly to mass spectral peaks, or to proteins identified by sequence database searching, although the latter requires relative protein quantification. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in the diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and to classify samples into disease or treatment groups, which may be applicable to diagnostics. It also covers the challenges such investigations face, including prediction of the proteins present, protein quantification, planning for the use of machine learning, and small sample sizes.
first_indexed 2025-11-14T18:17:45Z
id nottingham-2349
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T18:17:45Z
recordtype eprints
repository_type Digital Repository
citation Swan, Anna L., Mobasheri, Ali, Allaway, David, Liddell, Susan and Bacardit, Jaume (2013) Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology. OMICS: a Journal of Integrative Biology, 17 (12). pp. 595-610. ISSN 1536-2310. Peer reviewed. doi:10.1089/omi.2013.0017 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3837439/