Detecting SNP Interactions in Balanced and Imbalanced Datasets using Associative Classification

Bibliographic Details
Main Authors: Uppu, S., Krishna, Aneesh, Gopalan, Raj
Format: Journal Article
Published: Australian National University 2014
Subjects: associative classification, Epistasis, SNP interactions, multi-locus
Online Access: http://hdl.handle.net/20.500.11937/14459
author Uppu, S.
Krishna, Aneesh
Gopalan, Raj
building Curtin Institutional Repository
collection Online Access
description The genetic epidemiology of complex diseases is characterised by multiple factors acting together or independently. The complex network of these factors induces pathological mechanisms that lead to disease manifestation. Advances in genotyping technology have dramatically increased our understanding of single nucleotide polymorphisms (SNPs) associated with complex diseases. The SNP interactions responsible for disease susceptibility are being intensively explored in this era of genome-wide association studies (GWAS). Several machine learning and data mining approaches have been proposed to track the inheritance of disease and its susceptibility to environmental factors. However, detecting these interactions remains a critical challenge owing to biomolecular complexity and computational limitations. The goal of this research is to study the effectiveness of associative classification for detecting epistasis in balanced and imbalanced datasets. The proposed approach was evaluated on two-locus epistatic interactions using simulated data. The datasets were generated for five different penetrance functions by varying heritability, minor allele frequency and sample size. In total, 23,400 datasets were generated and several experiments were conducted to identify the disease-causing SNP interactions. The classification accuracy of the proposed approach was compared with that of previous approaches. Although associative classification showed a small improvement in accuracy for balanced datasets, it outperformed existing approaches for higher-order multi-locus interactions in imbalanced datasets.
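The abstract names the pipeline (simulated two-locus case-control data from penetrance functions, then associative classification) but gives no details. The following is a minimal illustrative sketch under stated assumptions, not the authors' implementation: genotypes are coded 0/1/2 under Hardy-Weinberg equilibrium, a fixed number of cases and controls is drawn by rejection sampling, and a CBA-style rule miner is restricted to two-SNP antecedents and omits CBA's database-coverage pruning. Heritability is not modelled directly here; it is implied by the chosen penetrance table and minor allele frequency. The penetrance table, the function names (`simulate_case_control`, `mine_pair_rules`, `classify`), the thresholds and the seed are all hypothetical.

```python
import itertools
import random
from collections import Counter


def hwe_genotype_probs(maf):
    # Genotype probabilities for 0 (AA), 1 (Aa), 2 (aa) under Hardy-Weinberg
    # equilibrium, where maf is the frequency of the minor allele a.
    p, q = 1.0 - maf, maf
    return [p * p, 2 * p * q, q * q]


def simulate_case_control(penetrance, maf, n_cases, n_controls, n_snps, seed=0):
    # Draw individuals and keep them until the requested numbers of cases and
    # controls are reached (rejection sampling). Assumption for illustration:
    # penetrance[g0][g1] = P(disease | SNP0 = g0, SNP1 = g1); the other SNPs
    # are unassociated noise loci with the same minor allele frequency.
    rng = random.Random(seed)
    weights = hwe_genotype_probs(maf)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        row = rng.choices([0, 1, 2], weights=weights, k=n_snps)
        affected = rng.random() < penetrance[row[0]][row[1]]
        if affected and len(cases) < n_cases:
            cases.append(row)
        elif not affected and len(controls) < n_controls:
            controls.append(row)
    return [(row, 1) for row in cases] + [(row, 0) for row in controls]


def mine_pair_rules(data, n_snps, min_support, min_confidence):
    # Class association rules with a two-SNP antecedent:
    # (SNP i = gi, SNP j = gj) -> label, kept if they pass both thresholds.
    n = len(data)
    rules = []
    for i, j in itertools.combinations(range(n_snps), 2):
        antecedent_counts = Counter()
        rule_counts = Counter()
        for row, label in data:
            key = (row[i], row[j])
            antecedent_counts[key] += 1
            rule_counts[(key, label)] += 1
        for (key, label), count in rule_counts.items():
            support = count / n
            confidence = count / antecedent_counts[key]
            if support >= min_support and confidence >= min_confidence:
                rules.append(((i, j), key, label, support, confidence))
    # CBA-style precedence: higher confidence first, ties broken by support.
    rules.sort(key=lambda r: (-r[4], -r[3]))
    return rules


def classify(rules, row, default_label):
    # Predict with the first (highest-ranked) rule whose antecedent matches.
    for (i, j), key, label, _support, _confidence in rules:
        if (row[i], row[j]) == key:
            return label
    return default_label


# Hypothetical penetrance table (rows: SNP0 genotype, columns: SNP1 genotype):
# disease risk rises only when both loci carry at least one minor allele.
PENETRANCE = [
    [0.1, 0.1, 0.1],
    [0.1, 0.8, 0.8],
    [0.1, 0.8, 0.8],
]

data = simulate_case_control(PENETRANCE, maf=0.3, n_cases=500, n_controls=500, n_snps=5, seed=1)
rules = mine_pair_rules(data, n_snps=5, min_support=0.05, min_confidence=0.6)
majority = Counter(label for _, label in data).most_common(1)[0][0]
accuracy = sum(classify(rules, row, majority) == label for row, label in data) / len(data)
print("top rule:", rules[0])
print("training accuracy:", round(accuracy, 3))
```

Setting `n_cases` and `n_controls` to unequal values (for example 200 and 800) produces an imbalanced case-control ratio. The accuracy printed is training accuracy; a real evaluation would score the rules on held-out data.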
first_indexed 2025-11-14T07:08:04Z
format Journal Article
id curtin-20.500.11937-14459
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T07:08:04Z
publishDate 2014
publisher Australian National University
recordtype eprints
repository_type Digital Repository
title Detecting SNP Interactions in Balanced and Imbalanced Datasets using Associative Classification
topic associative classification
Epistasis
SNP interactions
multi-locus