Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning
Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SN...
Main Authors: | , , , |
---|---|
Format: | Online |
Language: | English |
Published: |
Public Library of Science
2014
|
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4006705/ |
id |
pubmed-4006705 |
---|---|
recordtype |
oai_dc |
spelling |
pubmed-40067052014-05-09 Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning Zhao, Nan Han, Jing Ginger Shyu, Chi-Ren Korkin, Dmitry Research Article Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. Public Library of Science 2014-05-01 /pmc/articles/PMC4006705/ /pubmed/24784581 http://dx.doi.org/10.1371/journal.pcbi.1003592 Text en © 2014 Zhao et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
repository_type |
Open Access Journal |
institution_category |
Foreign Institution |
institution |
US National Center for Biotechnology Information |
building |
NCBI PubMed |
collection |
Online Access |
language |
English |
format |
Online |
author |
Zhao, Nan Han, Jing Ginger Shyu, Chi-Ren Korkin, Dmitry |
spellingShingle |
Zhao, Nan Han, Jing Ginger Shyu, Chi-Ren Korkin, Dmitry Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning |
author_facet |
Zhao, Nan Han, Jing Ginger Shyu, Chi-Ren Korkin, Dmitry |
author_sort |
Zhao, Nan |
title |
Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning |
title_short |
Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning |
title_full |
Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning |
title_fullStr |
Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning |
title_full_unstemmed |
Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning |
title_sort |
determining effects of non-synonymous snps on protein-protein interactions using supervised and semi-supervised learning |
description |
Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. |
publisher |
Public Library of Science |
publishDate |
2014 |
url |
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4006705/ |
_version_ |
1612084273937907712 |