• français
    • English
  • English 
    • français
    • English
  • Login
JavaScript is disabled for your browser. Some features of this site may not work without it.
BIRD Home

Browse

This CollectionBy Issue DateAuthorsTitlesSubjectsJournals BIRDResearch centres & CollectionsBy Issue DateAuthorsTitlesSubjectsJournals

My Account

Login

Statistics

View Usage Statistics

Fast nGram-Based String Search Over Data Encoded Using Algebraic Signatures

Thumbnail
View/Open
Fast_ngram.PDF (450.5Kb)
Date
2007
Dewey
Organisation des données
Sujet
databases; algebraic signature; Search algorithm
Conference name
33rd International Conference on Very Large Data Bases VLDB 2007
Conference date
09-2007
Conference city
Vienne
Conference country
Autriche
Book title
Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007
Author
Koch, Christoph; Gehrke, Johannes; Garofalakis, Minos; Srivastava, Divesh; Aberer, Karl; Deshpande, Anand; Forescu, Daniela; Chan, Chee Yong; Ganti, Venkatesh; Kanne, Karl-Christian; Klas, Wolfgang; Neuhold, Erich J.
Publisher
AMC
Year
2007
ISBN
978-1-59593-649-3
URI
https://basepub.dauphine.fr/handle/123456789/4957
Collections
  • LAMSADE : Publications
Metadata
Show full item record
Author
Litwin, Witold
Mokadem, Riad
Rigaux, Philippe
Schwartz, Thomas
Type
Communication / Conférence
Item number of pages
207-218
Abstract (EN)
We propose a novel string search algorithm for data stored once and read many times. Our search method combines the sublinear traversal of the record (as in Boyer Moore or Knuth-Morris-Pratt) with the agglomeration of parts of the record and search pattern into a single character – the algebraic signature – in the manner of Karp-Rabin. Our experiments show that our algorithm is up to seventy times faster for DNA data, up to eleven times faster for ASCII, and up to a six times faster for XML documents compared with an im- plementation of Boyer-Moore. To obtain this speed-up, we store records in encoded form, where each original character is replaced with an algebraic signature. Our method applies to records stored in databases in general and to distributed implementations of a Database As Service (DAS) in particular. Clients send records for insertion and search patterns already in encoded form and servers never operate on records in clear text. No one at a node can involuntarily discover the content of the stored data.

  • Accueil Bibliothèque
  • Site de l'Université Paris-Dauphine
  • Contact
SCD Paris Dauphine - Place du Maréchal de Lattre de Tassigny 75775 Paris Cedex 16

 Content on this site is licensed under a Creative Commons 2.0 France (CC BY-NC-ND 2.0) license.