Inverted files versus suffix arrays for locating patterns in primary memory

Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of com...

Full description

Bibliographic Details
Main Authors:	Puglisi, Simon, Smyth, William, Turpin, Andrew
Other Authors:	Fabio Crestani
Format:	Conference Paper
Published:	Springer 2006
Online Access:	http://hdl.handle.net/20.500.11937/43428

_version_	1848756689022484480
author	Puglisi, Simon Smyth, William Turpin, Andrew
author2	Fabio Crestani
author_facet	Fabio Crestani Puglisi, Simon Smyth, William Turpin, Andrew
author_sort	Puglisi, Simon
building	Curtin Institutional Repository
collection	Online Access
description	Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of compressed suffix array algorithms against compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams, thus allowing full pattern matching capabilities, rather than simple word based search, making their functionality equivalent to the compressed suffix array structures. When using equivalent memory for the two structures, inverted files are faster at reporting the location of patterns when the number of occurrences of the patterns is high.
first_indexed	2025-11-14T09:16:11Z
format	Conference Paper
id	curtin-20.500.11937-43428
institution	Curtin University Malaysia
institution_category	Local University
last_indexed	2025-11-14T09:16:11Z
publishDate	2006
publisher	Springer
recordtype	eprints
repository_type	Digital Repository
spelling	curtin-20.500.11937-434282017-10-02T02:28:21Z Inverted files versus suffix arrays for locating patterns in primary memory Puglisi, Simon Smyth, William Turpin, Andrew Fabio Crestani Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of compressed suffix array algorithms against compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams, thus allowing full pattern matching capabilities, rather than simple word based search, making their functionality equivalent to the compressed suffix array structures. When using equivalent memory for the two structures, inverted files are faster at reporting the location of patterns when the number of occurrences of the patterns is high. 2006 Conference Paper http://hdl.handle.net/20.500.11937/43428 10.1007/11880561_11 Springer restricted
spellingShingle	Puglisi, Simon Smyth, William Turpin, Andrew Inverted files versus suffix arrays for locating patterns in primary memory
title	Inverted files versus suffix arrays for locating patterns in primary memory
title_full	Inverted files versus suffix arrays for locating patterns in primary memory
title_fullStr	Inverted files versus suffix arrays for locating patterns in primary memory
title_full_unstemmed	Inverted files versus suffix arrays for locating patterns in primary memory
title_short	Inverted files versus suffix arrays for locating patterns in primary memory
title_sort	inverted files versus suffix arrays for locating patterns in primary memory
url	http://hdl.handle.net/20.500.11937/43428

Inverted files versus suffix arrays for locating patterns in primary memory

Similar Items