Fast, Practical Algorithms for Computing All the Repeats in a String

Given a string x = x[1..n] on an alphabet of size α, and a threshold p min ≥ 1, we describe four variants of an algorithm PSY1 that, using a suffix array, computes all the complete nonextendible repeats in x of length p ≥ p min . The basic algorithm PSY1–1 and its simple extension PSY1–2 are fast on...

Full description

Bibliographic Details
Main Authors:	Puglisi, Simon, Smyth, William, Yusufu, M.
Format:	Journal Article
Published:	Springer 2010
Subjects:	Repeat – Repetition – Suffix array – Suffix tree
Online Access:	http://hdl.handle.net/20.500.11937/36774

_version_	1848754863766241280
author	Puglisi, Simon Smyth, William Yusufu, M.
author_facet	Puglisi, Simon Smyth, William Yusufu, M.
author_sort	Puglisi, Simon
building	Curtin Institutional Repository
collection	Online Access
description	Given a string x = x[1..n] on an alphabet of size α, and a threshold p min ≥ 1, we describe four variants of an algorithm PSY1 that, using a suffix array, computes all the complete nonextendible repeats in x of length p ≥ p min . The basic algorithm PSY1–1 and its simple extension PSY1–2 are fast on strings that occur in biological, natural language and other applications (not highly periodic strings), while PSY1–3 guarantees Θ(n) worst-case execution time. The final variant, PSY1–4, also achieves Θ(n) processing time and, over the complete range of strings tested, is the fastest of the four. The space requirement of all four algorithms is about 5n bytes, but all make use of the “longest common prefix” (LCP) array, whose construction requires about 6n bytes. The four algorithms are faster in applications and use less space than a recently-proposed algorithm (Narisawa in Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching, pp. 340–351, 2007) that produces equivalent output. The suffix array is not explicitly used by algorithms PSY1, but may be required for postprocessing; in this case, storage requirements rise to 9n bytes. We also describe two variants of a fast Θ(n)-time algorithm PSY2 for computing all complete supernonextendible repeats in x.
first_indexed	2025-11-14T08:47:10Z
format	Journal Article
id	curtin-20.500.11937-36774
institution	Curtin University Malaysia
institution_category	Local University
last_indexed	2025-11-14T08:47:10Z
publishDate	2010
publisher	Springer
recordtype	eprints
repository_type	Digital Repository
spelling	curtin-20.500.11937-367742017-09-13T15:56:20Z Fast, Practical Algorithms for Computing All the Repeats in a String Puglisi, Simon Smyth, William Yusufu, M. Repeat – Repetition – Suffix array – Suffix tree Given a string x = x[1..n] on an alphabet of size α, and a threshold p min ≥ 1, we describe four variants of an algorithm PSY1 that, using a suffix array, computes all the complete nonextendible repeats in x of length p ≥ p min . The basic algorithm PSY1–1 and its simple extension PSY1–2 are fast on strings that occur in biological, natural language and other applications (not highly periodic strings), while PSY1–3 guarantees Θ(n) worst-case execution time. The final variant, PSY1–4, also achieves Θ(n) processing time and, over the complete range of strings tested, is the fastest of the four. The space requirement of all four algorithms is about 5n bytes, but all make use of the “longest common prefix” (LCP) array, whose construction requires about 6n bytes. The four algorithms are faster in applications and use less space than a recently-proposed algorithm (Narisawa in Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching, pp. 340–351, 2007) that produces equivalent output. The suffix array is not explicitly used by algorithms PSY1, but may be required for postprocessing; in this case, storage requirements rise to 9n bytes. We also describe two variants of a fast Θ(n)-time algorithm PSY2 for computing all complete supernonextendible repeats in x. 2010 Journal Article http://hdl.handle.net/20.500.11937/36774 10.1007/s11786-010-0033-6 Springer fulltext
spellingShingle	Repeat – Repetition – Suffix array – Suffix tree Puglisi, Simon Smyth, William Yusufu, M. Fast, Practical Algorithms for Computing All the Repeats in a String
title	Fast, Practical Algorithms for Computing All the Repeats in a String
title_full	Fast, Practical Algorithms for Computing All the Repeats in a String
title_fullStr	Fast, Practical Algorithms for Computing All the Repeats in a String
title_full_unstemmed	Fast, Practical Algorithms for Computing All the Repeats in a String
title_short	Fast, Practical Algorithms for Computing All the Repeats in a String
title_sort	fast, practical algorithms for computing all the repeats in a string
topic	Repeat – Repetition – Suffix array – Suffix tree
url	http://hdl.handle.net/20.500.11937/36774

Fast, Practical Algorithms for Computing All the Repeats in a String

Similar Items