Rule-based analysis for detecting epistasis using associative classification mining
Advances in high-throughput human genome sequencing and in computational capabilities have greatly improved the understanding of the genetic architecture behind complex diseases. The development of high-throughput genotyping and next-generation sequencing technologies enables large-scale...
| Main Authors: | Uppu, Suneetha; Krishna, Aneesh; Gopalan, Raj |
|---|---|
| Format: | Journal Article |
| Published: | 2015 |
| Online Access: | http://hdl.handle.net/20.500.11937/28822 |
| _version_ | 1848752639327600640 |
|---|---|
| author | Uppu, Suneetha; Krishna, Aneesh; Gopalan, Raj |
| author_facet | Uppu, Suneetha; Krishna, Aneesh; Gopalan, Raj |
| author_sort | Uppu, Suneetha |
| building | Curtin Institutional Repository |
| collection | Online Access |
| description | Advances in high-throughput human genome sequencing and in computational capabilities have greatly improved the understanding of the genetic architecture behind complex diseases. The development of high-throughput genotyping and next-generation sequencing technologies enables the generation of large-scale data for genetic epidemiological analysis. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) associated with complex diseases. The interactions between SNPs responsible for disease susceptibility have been increasingly explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. The goal of this research is to implement associative classification and study its effectiveness for detecting epistasis in balanced and imbalanced datasets. The proposed approach was evaluated on single-locus to six-locus models using simulated data. The datasets were generated for five different penetrance functions by varying heritability, minor allele frequency and sample size. In total, 57,300 datasets were generated and several experiments were conducted to identify the disease-causing SNP interactions. The classification accuracy of the proposed approach was compared with that of existing approaches. The experimental results demonstrated significant improvements in accuracy for detecting interactions associated with the phenotype. Further, the approach was successfully applied to sporadic breast cancer data. The results show an interaction among six polymorphisms, which included five different estrogen-metabolism genes. |
| first_indexed | 2025-11-14T08:11:49Z |
| format | Journal Article |
| id | curtin-20.500.11937-28822 |
| institution | Curtin University Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-14T08:11:49Z |
| publishDate | 2015 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | curtin-20.500.11937-28822 2017-09-13T15:17:10Z Rule-based analysis for detecting epistasis using associative classification mining Uppu, Suneetha; Krishna, Aneesh; Gopalan, Raj 2015 Journal Article http://hdl.handle.net/20.500.11937/28822 10.1007/s13721-015-0084-3 restricted |
| spellingShingle | Uppu, Suneetha; Krishna, Aneesh; Gopalan, Raj; Rule-based analysis for detecting epistasis using associative classification mining |
| title | Rule-based analysis for detecting epistasis using associative classification mining |
| title_full | Rule-based analysis for detecting epistasis using associative classification mining |
| title_fullStr | Rule-based analysis for detecting epistasis using associative classification mining |
| title_full_unstemmed | Rule-based analysis for detecting epistasis using associative classification mining |
| title_short | Rule-based analysis for detecting epistasis using associative classification mining |
| title_sort | rule-based analysis for detecting epistasis using associative classification mining |
| url | http://hdl.handle.net/20.500.11937/28822 |
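
The description above names associative classification as the technique used to detect SNP interactions. As a rough illustration only, the sketch below mines class association rules of the form "genotype combination implies phenotype" from a toy dataset and keeps those that pass support and confidence thresholds. It is not the authors' implementation: the function name `mine_class_rules`, the thresholds, the SNP names and the toy genotypes are all hypothetical, and the toy phenotype is constructed so that it depends only on the combination of `SNP1` and `SNP2`.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(samples, labels, order=2, min_support=0.1, min_confidence=0.8):
    """Return class association rules as (antecedent, label, support, confidence).

    samples: list of dicts mapping a SNP name to a genotype code
    labels:  list of class labels, one per sample
    order:   number of SNPs in each rule antecedent
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n = len(samples)
    antecedent_counts = Counter()  # antecedent -> number of matching samples
    rule_counts = Counter()        # (antecedent, label) -> number of matching samples
    for genotypes, label in zip(samples, labels):
        for snps in combinations(sorted(genotypes), order):
            antecedent = tuple((snp, genotypes[snp]) for snp in snps)
            antecedent_counts[antecedent] += 1
            rule_counts[(antecedent, label)] += 1

    rules = []
    for (antecedent, label), count in rule_counts.items():
        support = count / n                                # fraction of all samples
        confidence = count / antecedent_counts[antecedent]  # P(label | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, label, support, confidence))
    # Highest confidence first, then highest support.
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules


# Toy data (made up): a sample is a case exactly when SNP1 and SNP2 differ,
# so neither SNP is informative alone; SNP3 is noise.
samples = [
    {"SNP1": 0, "SNP2": 1, "SNP3": 0},
    {"SNP1": 1, "SNP2": 0, "SNP3": 1},
    {"SNP1": 0, "SNP2": 0, "SNP3": 1},
    {"SNP1": 1, "SNP2": 1, "SNP3": 0},
    {"SNP1": 0, "SNP2": 1, "SNP3": 1},
    {"SNP1": 1, "SNP2": 0, "SNP3": 0},
    {"SNP1": 0, "SNP2": 0, "SNP3": 0},
    {"SNP1": 1, "SNP2": 1, "SNP3": 1},
]
labels = ["case", "case", "control", "control", "case", "case", "control", "control"]

rules = mine_class_rules(samples, labels, order=2, min_support=0.2, min_confidence=0.8)
for antecedent, label, support, confidence in rules:
    print(antecedent, "->", label, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

On this toy data the only rules that pass the thresholds are the two-locus rules over `SNP1` and `SNP2`; with `order=1` no rule passes, which is the pattern a purely epistatic interaction would produce. Exhaustive enumeration like this grows combinatorially with `order` and the number of SNPs, so it is only practical for small examples.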