Graph sketching-based Space-efficient Data Clustering

File
Graph_sketching-based.pdf (5.440 MB)
Date
2018
Dewey
Programming, software, data organization
Subject
space constraints; resources-limited mobile devices; DBMSTClu; clustering partition; Spectral Clustering method; data cluster
DOI
http://dx.doi.org/10.1137/1.9781611975321.2
Conference name
2018 SIAM International Conference on Data Mining
Conference date
05-2018
Conference city
San Diego
Conference country
United States
Book title
Proceedings of the 2018 SIAM International Conference on Data Mining
Book author
Ester, Martin; Pedreschi, Dino
Publisher
SIAM - Society for Industrial and Applied Mathematics
Publisher city
Philadelphia
Number of pages
764
ISBN
978-1-61197-532-1
Book DOI
10.1137/1.9781611975321
URI
https://basepub.dauphine.fr/handle/123456789/20861
Collections
  • LAMSADE : Publications
Author
Morvan, Anne
Choromanski, Krzysztof
Gouy-Pailler, Cedric
Atif, Jamal
989 Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Type
Conference paper
Pages
10-18
Abstract (EN)
In this paper, we address the problem of recovering arbitrary-shaped data clusters from datasets under tight space constraints, as is the case in many real-world applications where analysis algorithms are deployed directly on the resource-limited mobile devices that collect the data. We present DBMSTClu, a new space-efficient, density-based, non-parametric method that works on a Minimum Spanning Tree (MST) recovered from a limited number of linear measurements, i.e. a sketched version of the dissimilarity graph between the N objects to cluster. Unlike the k-means, k-medians or k-medoids algorithms, it does not fail to distinguish clusters with particular shapes, thanks to the ability of the MST to express the underlying structure of a graph. No input parameter is needed, unlike DBSCAN or the Spectral Clustering method. An approximate MST is retrieved by following the dynamic semi-streaming model: the dissimilarity graph is handled as a stream of edge weight updates, which is sketched in one pass over the data into a compact structure requiring O(N polylog(N)) space, far better than the theoretical memory cost O(N^2) of the full dissimilarity graph. Taking the recovered approximate MST as input, DBMSTClu then successfully detects the right number of nonconvex clusters by performing relevant cuts on this tree in a time linear in N. We provide theoretical guarantees on the quality of the clustering partition and also demonstrate its advantage over the existing state of the art on several datasets.
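The general idea of clustering by cutting a spanning tree can be illustrated with a small Python sketch. This is not DBMSTClu itself: it builds an exact MST with Prim's algorithm rather than an approximate one from a graph sketch, and it removes a fixed number of the heaviest edges, which is precisely the kind of input parameter the method described in the abstract avoids. The names `prim_mst`, `cut_clusters` and `n_cuts` are hypothetical and chosen only for this illustration.

```python
import math


def prim_mst(points):
    """Return the MST edges (weight, i, j) of the complete Euclidean graph.

    Assumption: exact Prim's algorithm in O(N^2) time, not the sketch-based
    approximate MST described in the abstract.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax the connection cost of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cut_clusters(n, mst_edges, n_cuts):
    """Drop the n_cuts heaviest MST edges and label the resulting components.

    Assumption: a naive "cut the heaviest edges" rule, used here only as a
    stand-in for the cut criterion of the paper, which is not given above.
    """
    kept = sorted(mst_edges)[: len(mst_edges) - n_cuts]
    root = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = prim_mst(points)
labels = cut_clusters(len(points), edges, n_cuts=1)
```

With the two well-separated groups of points above, removing the single heaviest edge splits the tree into one component per group. The labels are arbitrary component representatives, so only equality between labels is meaningful.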

 Content on this site is licensed under a Creative Commons 2.0 France (CC BY-NC-ND 2.0) license.