Fast algorithms for approximate circular string matching

Background: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate ci...

Full description

Bibliographic Details
Main Authors: Barton, C., Iliopoulos, Costas, Pissis, S.
Format: Journal Article
Published: Springer 2014
Online Access:http://hdl.handle.net/20.500.11937/21124
_version_ 1848750502625411072
author Barton, C.
Iliopoulos, Costas
Pissis, S.
author_facet Barton, C.
Iliopoulos, Costas
Pissis, S.
author_sort Barton, C.
building Curtin Institutional Repository
collection Online Access
description Background: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.Results: In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is k = O(m/ logm). We show how the same results can be easily obtained under the edit distancemodel. The presented algorithms are also implemented as library functions. Experimental results demonstrate thatthe functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach.Conclusions: We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at <a href="http://www.inf.kcl.ac.uk/research/projects/asmf">http://www.inf.kcl.ac.uk/research/projects/asmf</a>/.
first_indexed 2025-11-14T07:37:51Z
format Journal Article
id curtin-20.500.11937-21124
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T07:37:51Z
publishDate 2014
publisher Springer
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-211242017-09-13T13:53:47Z Fast algorithms for approximate circular string matching Barton, C. Iliopoulos, Costas Pissis, S. Background: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.Results: In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is k = O(m/ logm). We show how the same results can be easily obtained under the edit distancemodel. The presented algorithms are also implemented as library functions. Experimental results demonstrate thatthe functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach.Conclusions: We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at <a href="http://www.inf.kcl.ac.uk/research/projects/asmf">http://www.inf.kcl.ac.uk/research/projects/asmf</a>/. 2014 Journal Article http://hdl.handle.net/20.500.11937/21124 10.1186/1748-7188-9-9 Springer unknown
spellingShingle Barton, C.
Iliopoulos, Costas
Pissis, S.
Fast algorithms for approximate circular string matching
title Fast algorithms for approximate circular string matching
title_full Fast algorithms for approximate circular string matching
title_fullStr Fast algorithms for approximate circular string matching
title_full_unstemmed Fast algorithms for approximate circular string matching
title_short Fast algorithms for approximate circular string matching
title_sort fast algorithms for approximate circular string matching
url http://hdl.handle.net/20.500.11937/21124