Rule-based analysis for detecting epistasis using associative classification mining

Advances in high-throughput sequencing of the human genome and in computational capabilities have greatly improved the understanding of the genetic architecture of complex diseases. The development of high-throughput genotyping and next-generation sequencing technologies enables large-scale...

Bibliographic Details
Main Authors: Uppu, Suneetha, Krishna, Aneesh, Gopalan, Raj
Format: Journal Article
Published: 2015
Online Access: http://hdl.handle.net/20.500.11937/28822
_version_ 1848752639327600640
author Uppu, Suneetha
Krishna, Aneesh
Gopalan, Raj
building Curtin Institutional Repository
collection Online Access
description Advances in high-throughput sequencing of the human genome and in computational capabilities have greatly improved the understanding of the genetic architecture of complex diseases. The development of high-throughput genotyping and next-generation sequencing technologies has made large-scale data available for genetic epidemiological analysis. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) associated with complex diseases. Interactions between SNPs responsible for disease susceptibility are increasingly explored in the current literature. These interaction studies are mathematically challenging and computationally complex, and a number of data mining and machine learning approaches have been proposed to address these challenges. The goal of this research is to implement associative classification and study its effectiveness for detecting epistasis in balanced and imbalanced datasets. The proposed approach was evaluated on simulated data for single-locus through six-locus models. The datasets were generated for five different penetrance functions by varying heritability, minor allele frequency and sample size. In total, 57,300 datasets were generated, and several experiments were conducted to identify the disease-causing SNP interactions. The classification accuracy of the proposed approach was compared with that of existing approaches. The experimental results demonstrated significant improvements in accuracy for detecting interactions associated with the phenotype. Further, the approach was successfully applied to sporadic breast cancer data. The results show an interaction among six polymorphisms, which included five different estrogen-metabolism genes.
first_indexed 2025-11-14T08:11:49Z
format Journal Article
id curtin-20.500.11937-28822
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T08:11:49Z
publishDate 2015
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-28822 2017-09-13T15:17:10Z 2015 Journal Article http://hdl.handle.net/20.500.11937/28822 10.1007/s13721-015-0084-3 restricted
title Rule-based analysis for detecting epistasis using associative classification mining
url http://hdl.handle.net/20.500.11937/28822