dc.contributor.author: Diday, Edwin
dc.date.accessioned: 2011-04-26T15:02:59Z
dc.date.available: 2011-04-26T15:02:59Z
dc.date.issued: 2011
dc.identifier.uri: https://basepub.dauphine.fr/handle/123456789/6055
dc.language.iso: en [en]
dc.subject: Multidimensional data [en]
dc.subject: Data analysis [en]
dc.subject.ddc: 519 [en]
dc.title: Principal Component Analysis for Categorical Histogram Data: Some Open Directions of Research [en]
dc.type: Book chapter
dc.description.abstracten: In recent years, the analysis of symbolic data, where the units are categories, classes or concepts described by intervals, distributions, sets of categories and the like, has become a challenging task, since many application fields generate massive amounts of data that are difficult to store and to analyze with traditional techniques [1]. In this paper we propose a strategy for extending standard PCA to such data in the case where the variable values are “categorical histograms” (i.e. a set of categories, called bins, with their relative frequencies). These variables are a special case of “modal” variables (see, for example, Diday and Noirhomme [5]) or of “compositional” variables (Aitchison [1]), where the weights are not necessarily frequencies. First, we introduce “metabins”, which mix together bins of the different histograms and enhance interpretability. Standard PCA applied to the bins of such a data table loses the histogram constraints and assumes independence between the bins, whereas copulas take care of the probabilities and the underlying dependencies. Then, we give several ways of representing the units (called “individuals”), the bins, the variables and the metabins when the number of categories is not the same for each variable. A way of representing the variation of the individuals and of obtaining histograms as output is given. Finally, some theoretical results allow the representation of the categorical histogram variables inside a hypercube covering the correlation sphere. [en]
dc.identifier.citationpages: 3-15 [en]
dc.relation.ispartofseriestitle: Studies in Classification, Data Analysis, and Knowledge Organization
dc.relation.ispartoftitle: Classification and Multivariate Analysis for Complex Data Structures [en]
dc.relation.ispartofeditor: Fichet, Bernard
dc.relation.ispartofeditor: Piccolo, Domenico
dc.relation.ispartofeditor: Verde, Rosanna
dc.relation.ispartofeditor: Vichi, Maurizio
dc.relation.ispartofpublname: Springer [en]
dc.relation.ispartofpublcity: Berlin [en]
dc.relation.ispartofdate: 2011
dc.relation.ispartofpages: 473 [en]
dc.relation.ispartofurl: http://dx.doi.org/10.1007/978-3-642-13312-1 [en]
dc.description.sponsorshipprivate: yes [en]
dc.subject.ddclabel: Probability and applied mathematics [en]
dc.relation.ispartofisbn: 978-3-642-13311-4 [en]
dc.identifier.doi: http://dx.doi.org/10.1007/978-3-642-13312-1_1

