AS-Index: A Structure For String Search Using n-grams and Algebraic Signatures

Constantin, Camelia; du Mouza, Cedric; Litwin, Witold; Rigaux, Philippe; Schwarz, Thomas (2016), AS-Index: A Structure For String Search Using n-grams and Algebraic Signatures, Journal of Computer Science and Technology, 31, 1, p. 147–166. 10.1007/s11390-016-1618-6

Type
Article accepted for publication or published
External document link
https://hal.archives-ouvertes.fr/hal-01126550
Date
2016
Journal name
Journal of Computer Science and Technology
Volume
31
Number
1
Pages
147–166
Publication identifier
10.1007/s11390-016-1618-6
Author(s)
Constantin, Camelia

du Mouza, Cedric

Litwin, Witold
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Rigaux, Philippe

Schwarz, Thomas
Abstract (EN)
We present the AS-Index, a new index structure for exact string search in disk-resident databases. AS-Index relies on a classical inverted file structure, whose main innovation is a probabilistic search based on the properties of algebraic signatures, which are used for both n-gram hashing and pattern search. Specifically, the properties of our signatures allow a search to be carried out by inspecting only two of the posting lists. The algorithm thus has the unique feature of requiring a constant number of disk accesses, independent of both the pattern size and the database size. We conduct extensive experiments on large datasets to evaluate the behavior of our index. They confirm that it consistently provides a search performance proportional to the two disk accesses necessary to obtain the posting lists. This makes our structure a choice of interest for the class of applications that require very fast lookups in large textual databases. We describe the index structure, our use of algebraic signatures, and the search algorithm. We discuss the operational trade-offs based on the parameters that affect the behavior of our structure, and present the theoretical and experimental performance analysis. We then compare the AS-Index with the state-of-the-art alternatives and show that 1) its construction time matches that of its competitors, owing to the similarity of the structures, and 2) its search time consistently outperforms that of the standard approach, thanks to the economical access to data complemented by signature calculations, which is at the core of our search method.
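The abstract does not define the signatures themselves, so the following is only an illustrative sketch of one common formulation of an algebraic signature, not the AS-Index algorithm. It assumes symbols are bytes in GF(2^8) with reduction polynomial 0x11D and primitive element alpha = 2; the function names (`gf_mul`, `gf_pow`, `signature`, `concat_signature`) are hypothetical. It shows the kind of algebraic property that lets the signature of a longer string be derived from the signatures of its parts without rereading the data.

```python
ALPHA = 2          # assumed primitive element of GF(2^8)
REDUCTION = 0x11D  # assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:            # degree 8 reached: reduce modulo the polynomial
            a ^= REDUCTION
        b >>= 1
    return result


def gf_pow(base: int, exp: int) -> int:
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def signature(data: bytes) -> int:
    """One-symbol signature: XOR-sum of data[i] * ALPHA**i over GF(2^8)."""
    sig = 0
    power = 1                    # ALPHA**0
    for byte in data:
        sig ^= gf_mul(byte, power)
        power = gf_mul(power, ALPHA)
    return sig


def concat_signature(sig_left: int, len_left: int, sig_right: int) -> int:
    """Signature of left+right computed from the two partial signatures only."""
    return sig_left ^ gf_mul(gf_pow(ALPHA, len_left), sig_right)


left, right = b"alge", b"braic"
combined = concat_signature(signature(left), len(left), signature(right))
assert combined == signature(left + right)
```

Under these assumptions, the final assertion holds for any pair of byte strings, because shifting the right-hand part by `len(left)` positions multiplies each of its terms by the same power of alpha.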
Subjects / Keywords
Full text indexing; Large scale indexing; Algebraic signatures

Related items

Showing items related by title and author.

  • AS-Index: A Structure For String Search Using n-grams and Algebraic Signatures
    du Mouza, Cedric; Litwin, Witold; Rigaux, Philippe; Schwarz, Thomas (2009) Conference paper
  • AS-Index: A Structure for String Search Using n-Grams and Algebraic Signatures
    Rigaux, Philippe; Litwin, Witold; du Mouza, Cédric; Schwarz, Thomas (2009) Conference paper
  • Fast nGram-Based String Search Over Data Encoded Using Algebraic Signatures
    Litwin, Witold; Mokadem, Riad; Rigaux, Philippe; Schwartz, Thomas (2007) Conference paper
  • Cumulative Algebraic Signatures for Fast String Search, Protection Against Incidental Viewing and Corruption of Data in an SDDS
    Litwin, Witold; Mokadem, Riad; Schwarz, Thomas (2007) Conference paper
  • The melodic signature index for fast content-based retrieval of symbolic scores
    Constantin, Camelia; Faget, Zoe; du Mouza, Cédric; Rigaux, Philippe (2011) Conference paper