Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method

Convolutionary neural network (CNN) is a popular choice for supervised DNA motif prediction due to its very high performance. To employ CNN, the input DNA sequences are required to be encoded as numerical values and represented as either vectors or multi-dimensional matrices. This paper evaluates a...
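As a rough illustration of the two encodings named in the title, the sketch below turns a DNA string into an ordinal vector (one number per base) and a one-hot matrix (four numbers per base). The numeric values in ORDINAL, the function names, and the handling of unknown bases such as N are assumptions made for this example; the record does not state the values or code used in the paper.

```python
# Illustrative sketch only: mapping values and function names are assumptions,
# not taken from the paper described in this record.
BASES = "ACGT"
ORDINAL = {"A": 0.25, "C": 0.50, "G": 0.75, "T": 1.00}  # hypothetical values


def ordinal_encode(seq):
    """Encode a DNA string as one number per base (0.0 for unknown bases)."""
    return [ORDINAL.get(base, 0.0) for base in seq.upper()]


def one_hot_encode(seq):
    """Encode a DNA string as one 4-element row per base.

    Each row has a single 1.0 in the column of the matching base,
    or all zeros when the base is not one of A, C, G, T.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    seq = "ACGTN"
    ordinal = ordinal_encode(seq)
    one_hot = one_hot_encode(seq)
    # Ordinal: len(seq) values; one-hot: len(seq) rows of 4 values each.
    print(len(ordinal))
    print(len(one_hot), len(one_hot[0]))
```

For a sequence of length L, the ordinal form holds L values while the one-hot form holds 4L, which is one way to read the "more compact" claim for ordinal encoding.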


Bibliographic Details
Main Authors: Choong, Allen Chieng Hoon, Lee, Nung Kion
Format: Article
Language: English
Published: IEEE 2018
Subjects: Q Science (General); QA75 Electronic computers. Computer science; T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/19014/
http://ir.unimas.my/id/eprint/19014/1/encoding1.pdf
author Choong, Allen Chieng Hoon
Lee, Nung Kion
building UNIMAS Institutional Repository
collection Online Access
description Convolutionary neural network (CNN) is a popular choice for supervised DNA motif prediction due to its very high performance. To employ CNN, the input DNA sequences are required to be encoded as numerical values and represented as either vectors or multi-dimensional matrices. This paper evaluates a simple and more compact ordinal encoding method versus the popular one-hot encoding for DNA sequences. We compare the performance of both encoding methods using three sets of datasets enriched with DNA motifs. We found that the ordinal encoding performs comparably to the one-hot method but with a significant reduction in training time. In addition, the performance of the one-hot encoding is rather consistent across the datasets, but it requires a suitable CNN configuration to perform well. The ordinal encoding with matrix representation performs best on some of the evaluated datasets. This study implies that the performance of CNN for DNA motif discovery depends on a suitable design of the sequence encoding and representation. The good performance of the ordinal encoding method demonstrates that there is still room for improvement in the one-hot encoding method.
first_indexed 2025-11-15T06:58:37Z
format Article
id unimas-19014
institution Universiti Malaysia Sarawak
institution_category Local University
language English
last_indexed 2025-11-15T06:58:37Z
publishDate 2018
publisher IEEE
recordtype eprints
repository_type Digital Repository
spelling unimas-19014 2021-11-03T02:26:34Z http://ir.unimas.my/id/eprint/19014/ Choong, Allen Chieng Hoon and Lee, Nung Kion (2018) Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method. IEEE Xplore, 1 (2018). pp. 1-6. IEEE 2018-01-29 Article PeerReviewed text en http://ir.unimas.my/id/eprint/19014/1/encoding1.pdf https://ieeexplore.ieee.org/abstract/document/8270400 10.1109/ICONDA.2017.8270400
title Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method
topic Q Science (General)
QA75 Electronic computers. Computer science
T Technology (General)
url http://ir.unimas.my/id/eprint/19014/
http://ir.unimas.my/id/eprint/19014/1/encoding1.pdf