dc.contributor.author: Constantin, Camelia
dc.contributor.author: du Mouza, Cedric
dc.contributor.author: Litwin, Witold
dc.contributor.author: Rigaux, Philippe
dc.contributor.author: Schwarz, Thomas
dc.date.accessioned: 2020-05-27T12:39:12Z
dc.date.available: 2020-05-27T12:39:12Z
dc.date.issued: 2016
dc.identifier.uri: https://basepub.dauphine.fr/handle/123456789/20784
dc.language.iso: en [en]
dc.subject: Full text indexing [en]
dc.subject: Large scale indexing [en]
dc.subject: Algebraic signatures [en]
dc.subject: Signatures algébriques [en]
dc.subject: Indexation de grande échelle [en]
dc.subject: L'indexation de texte intégral [en]
dc.subject.ddc: 005.7 [en]
dc.title: AS-Index: A Structure For String Search Using n-grams and Algebraic Signatures [en]
dc.type: Article accepted for publication or published
dc.description.abstracten: We present the AS-Index, a new index structure for exact string search in disk-resident databases. The AS-Index relies on a classical inverted file structure; its main innovation is a probabilistic search based on the properties of algebraic signatures, which are used for both n-gram hashing and pattern search. Specifically, the properties of our signatures allow a search to be carried out by inspecting only two of the posting lists. The algorithm thus has the unique feature of requiring a constant number of disk accesses, independent of both the pattern size and the database size. We conduct extensive experiments on large datasets to evaluate the behavior of our index. They confirm that it consistently delivers search performance proportional to the cost of the two disk accesses needed to obtain the posting lists. This makes our structure an attractive choice for the class of applications that require very fast lookups in large textual databases. We describe the index structure, our use of algebraic signatures, and the search algorithm. We discuss the operational trade-offs in terms of the parameters that affect the behavior of our structure, and present a theoretical and experimental performance analysis. We then compare the AS-Index with state-of-the-art alternatives and show that 1) its construction time matches that of its competitors, owing to the similarity of the structures, and 2) its search time consistently outperforms the standard approach, thanks to the economical access to data complemented by signature calculations, which is at the core of our search method. [en]
dc.relation.isversionofjnlname: Journal of Computer Science and Technology
dc.relation.isversionofjnlvol: 31 [en]
dc.relation.isversionofjnlissue: 1 [en]
dc.relation.isversionofjnldate: 2016-01
dc.relation.isversionofjnlpages: 147–166 [en]
dc.relation.isversionofdoi: 10.1007/s11390-016-1618-6 [en]
dc.identifier.urlsite: https://hal.archives-ouvertes.fr/hal-01126550 [en]
dc.subject.ddclabel: Data organization [en]
dc.relation.forthcoming: no [en]
dc.relation.forthcomingprint: no [en]
dc.description.ssrncandidate: no [en]
dc.description.halcandidate: no [en]
dc.description.readership: research [en]
dc.description.audience: International [en]
dc.relation.Isversionofjnlpeerreviewed: no [en]
dc.date.updated: 2020-05-27T12:36:29Z
hal.person.labIds:
hal.person.labIds:
hal.person.labIds: 989
hal.person.labIds:
hal.person.labIds:

