Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SN...

Full description

Bibliographic Details
Main Authors: Zhao, Nan, Han, Jing Ginger, Shyu, Chi-Ren, Korkin, Dmitry
Format: Online
Language:English
Published: Public Library of Science 2014
Online Access:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4006705/
id pubmed-4006705
recordtype oai_dc
spelling pubmed-40067052014-05-09 Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning Zhao, Nan Han, Jing Ginger Shyu, Chi-Ren Korkin, Dmitry Research Article Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. Public Library of Science 2014-05-01 /pmc/articles/PMC4006705/ /pubmed/24784581 http://dx.doi.org/10.1371/journal.pcbi.1003592 Text en © 2014 Zhao et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
repository_type Open Access Journal
institution_category Foreign Institution
institution US National Center for Biotechnology Information
building NCBI PubMed
collection Online Access
language English
format Online
author Zhao, Nan
Han, Jing Ginger
Shyu, Chi-Ren
Korkin, Dmitry
spellingShingle Zhao, Nan
Han, Jing Ginger
Shyu, Chi-Ren
Korkin, Dmitry
Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning
author_facet Zhao, Nan
Han, Jing Ginger
Shyu, Chi-Ren
Korkin, Dmitry
author_sort Zhao, Nan
title Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning
title_short Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning
title_full Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning
title_fullStr Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning
title_full_unstemmed Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning
title_sort determining effects of non-synonymous snps on protein-protein interactions using supervised and semi-supervised learning
description Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/.
publisher Public Library of Science
publishDate 2014
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4006705/
_version_ 1612084273937907712