Filtering of Background DNA Sequences Improves DNA Motif Prediction Using Clustering Techniques

Noisy objects are known to degrade the performance of clustering algorithms. This paper addresses the problem of high false positive rates when a self-organizing map (SOM) is used for DNA motif prediction, caused by noisy background sequences in the input dataset. We propose the use of...

Bibliographic Details
Main Authors: Lee, Nung Kion; Chieng, Allen Hoon Choong
Format: Article
Language: English
Published: Elsevier 2013
Subjects: Q Science (General); QA Mathematics
Online Access: http://ir.unimas.my/id/eprint/11945/
http://ir.unimas.my/id/eprint/11945/1/Filtering%20of%20background%20DNA_abstract.pdf
author Lee, Nung Kion
Chieng, Allen Hoon Choong
building UNIMAS Institutional Repository
collection Online Access
description Noisy objects are known to degrade the performance of clustering algorithms. This paper addresses the problem of high false positive rates when a self-organizing map (SOM) is used for DNA motif prediction, caused by noisy background sequences in the input dataset. We propose using a sequence filter in the pre-processing step to remove a portion of the noisy background before the data are passed to the SOM. Our method is motivated by the evolutionary conservation of binding sites, as opposed to the randomness of background sequences. Our contributions are: (a) the use of string mismatch as the filtering threshold function; and (b) two filtering methods, namely sequence-driven and gapped consensus pattern. We used real datasets to evaluate the performance of the SOM for DNA motif prediction after the filtering process. The evaluation results show promising improvements in terms of precision rates and data reduction. We conclude that filtering background sequences is a feasible way to improve the accuracy of SOM-based DNA motif prediction.
first_indexed 2025-11-15T06:34:12Z
format Article
id unimas-11945
institution Universiti Malaysia Sarawak
institution_category Local University
language English
last_indexed 2025-11-15T06:34:12Z
publishDate 2013
publisher Elsevier
recordtype eprints
repository_type Digital Repository
spelling unimas-11945 2016-05-12T03:21:59Z PeerReviewed text en Lee, Nung Kion and Chieng, Allen Hoon Choong (2013) Filtering of Background DNA Sequences Improves DNA Motif Prediction Using Clustering Techniques. Procedia - Social and Behavioral Sciences, 97. pp. 602-611. ISSN 1877-0428 http://ac.els-cdn.com/S1877042813037245/1-s2.0-S1877042813037245-main.pdf?_tid=9ff50ec4-135b-11e6-b07e-00000aab0f26&acdnat=1462519672_d9a1dd367fa2434926676d8ad2649fd1 doi:10.1016/j.sbspro.2013.10.279
title Filtering of Background DNA Sequences Improves DNA Motif Prediction Using Clustering Techniques
topic Q Science (General)
QA Mathematics
url http://ir.unimas.my/id/eprint/11945/
http://ir.unimas.my/id/eprint/11945/1/Filtering%20of%20background%20DNA_abstract.pdf