Detecting SNP Interactions in Balanced and Imbalanced Datasets using Associative Classification
The genetic epidemiology of complex diseases is characterised by multiple factors acting together or independently. The complex network of these multiple factors induces pathological mechanisms which lead to disease manifestation. Advances in genotyping technology have dramatically increase...
| Main Authors: | Uppu, S.; Krishna, Aneesh; Gopalan, Raj |
|---|---|
| Format: | Journal Article |
| Published: | Australian National University, 2014 |
| Subjects: | associative classification; Epistasis; SNP interactions; multi-locus |
| Online Access: | http://hdl.handle.net/20.500.11937/14459 |
| _version_ | 1848748628262256640 |
|---|---|
| author | Uppu, S.; Krishna, Aneesh; Gopalan, Raj |
| author_facet | Uppu, S.; Krishna, Aneesh; Gopalan, Raj |
| author_sort | Uppu, S. |
| building | Curtin Institutional Repository |
| collection | Online Access |
| description | The genetic epidemiology of complex diseases is characterised by multiple factors acting together or independently. The complex network of these multiple factors induces pathological mechanisms which lead to disease manifestation. Advances in genotyping technology have dramatically increased the understanding of single nucleotide polymorphisms (SNPs) associated with complex diseases. The interactions between SNPs responsible for disease susceptibility are being intensively explored in this era of genome-wide association studies (GWAS). Several machine learning and data mining approaches have been proposed to track the inheritance of the disease and its susceptibility to environmental factors. However, detecting these interactions continues to be a critical challenge due to bio-molecular complexities and computational limitations. The goal of this research is to study the effectiveness of associative classification for detecting epistasis in balanced and imbalanced datasets. The proposed approach was evaluated for two-locus epistasis interactions using simulated data. The datasets were generated for five different penetrance functions by varying heritability, minor allele frequency and sample size. In total, 23,400 datasets were generated and several experiments were conducted to identify the disease-causing SNP interactions. The classification accuracy of the proposed approach was compared with that of previous approaches. Although associative classification showed only a small improvement in accuracy for balanced datasets, it outperformed existing approaches for higher-order multi-locus interactions in imbalanced datasets. |
| first_indexed | 2025-11-14T07:08:04Z |
| format | Journal Article |
| id | curtin-20.500.11937-14459 |
| institution | Curtin University Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-14T07:08:04Z |
| publishDate | 2014 |
| publisher | Australian National University |
| recordtype | eprints |
| repository_type | Digital Repository |
| title | Detecting SNP Interactions in Balanced and Imbalanced Datasets using Associative Classification |
| topic | associative classification; Epistasis; SNP interactions; multi-locus |
| url | http://hdl.handle.net/20.500.11937/14459 |