Inverted files versus suffix arrays for locating patterns in primary memory
Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of com...
| Main Authors: | , , |
|---|---|
| Other Authors: | |
| Format: | Conference Paper |
| Published: |
Springer
2006
|
| Online Access: | http://hdl.handle.net/20.500.11937/43428 |
| _version_ | 1848756689022484480 |
|---|---|
| author | Puglisi, Simon Smyth, William Turpin, Andrew |
| author2 | Fabio Crestani |
| author_facet | Fabio Crestani Puglisi, Simon Smyth, William Turpin, Andrew |
| author_sort | Puglisi, Simon |
| building | Curtin Institutional Repository |
| collection | Online Access |
| description | Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of compressed suffix array algorithms against compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams, thus allowing full pattern matching capabilities, rather than simple word based search, making their functionality equivalent to the compressed suffix array structures. When using equivalent memory for the two structures, inverted files are faster at reporting the location of patterns when the number of occurrences of the patterns is high. |
| first_indexed | 2025-11-14T09:16:11Z |
| format | Conference Paper |
| id | curtin-20.500.11937-43428 |
| institution | Curtin University Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-14T09:16:11Z |
| publishDate | 2006 |
| publisher | Springer |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | curtin-20.500.11937-434282017-10-02T02:28:21Z Inverted files versus suffix arrays for locating patterns in primary memory Puglisi, Simon Smyth, William Turpin, Andrew Fabio Crestani Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of compressed suffix array algorithms against compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams, thus allowing full pattern matching capabilities, rather than simple word based search, making their functionality equivalent to the compressed suffix array structures. When using equivalent memory for the two structures, inverted files are faster at reporting the location of patterns when the number of occurrences of the patterns is high. 2006 Conference Paper http://hdl.handle.net/20.500.11937/43428 10.1007/11880561_11 Springer restricted |
| spellingShingle | Puglisi, Simon Smyth, William Turpin, Andrew Inverted files versus suffix arrays for locating patterns in primary memory |
| title | Inverted files versus suffix arrays for locating patterns in primary memory |
| title_full | Inverted files versus suffix arrays for locating patterns in primary memory |
| title_fullStr | Inverted files versus suffix arrays for locating patterns in primary memory |
| title_full_unstemmed | Inverted files versus suffix arrays for locating patterns in primary memory |
| title_short | Inverted files versus suffix arrays for locating patterns in primary memory |
| title_sort | inverted files versus suffix arrays for locating patterns in primary memory |
| url | http://hdl.handle.net/20.500.11937/43428 |