dc.contributor.author: Nguyen, Thanh Hai
dc.contributor.author: Prifti, Edi
dc.contributor.author: Chevaleyre, Yann
dc.contributor.author: Sokolovska, Nataliya
dc.contributor.author: Zucker, Jean-Daniel
dc.date.accessioned: 2020-09-30T09:51:59Z
dc.date.available: 2020-09-30T09:51:59Z
dc.date.issued: 2018
dc.identifier.uri: https://basepub.dauphine.fr/handle/123456789/21018
dc.language.iso: en [en]
dc.subject: Machine Learning for Health
dc.subject: classification
dc.subject: metagenomics
dc.subject: deep learning
dc.subject: visualization
dc.subject.ddc: 004 [en]
dc.title: Disease Classification in Metagenomics with 2D Embeddings and Deep Learning
dc.type: Conference communication (Communication / Conférence)
dc.description.abstracten: Deep learning (DL) techniques have shown unprecedented success when applied to images, waveforms, and text. Generally, when the sample size (N) is much larger than the number of features (d), DL, often in the form of Convolutional Neural Networks (CNNs), outperforms other machine learning (ML) techniques. However, in many bioinformatics fields, including metagenomics, we encounter the opposite situation, where d is significantly greater than N. In these settings, applying DL techniques directly would lead to severe overfitting. Here we aim to improve the classification of various diseases from metagenomic data using CNNs. To this end, we propose representing metagenomic data as images. The proposed Met2Img approach relies on taxonomic and t-SNE embeddings to transform abundance data into "synthetic images". We applied our approach to twelve benchmark data sets comprising more than 1,400 metagenomic samples. Our results show significant improvements over the state-of-the-art algorithms Random Forest (RF) and Support Vector Machine (SVM). We observe that integrating phylogenetic information alongside abundance data improves classification. Beyond classification, the proposed approach also allows complex metagenomic data to be visualized. Met2Img is implemented in Python.
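The abstract names two ways of turning an abundance vector into a "synthetic image": a taxonomy-driven layout and a t-SNE layout. The Python sketch below illustrates both ideas under stated assumptions; it is not the Met2Img code. All function names are hypothetical, and so are these choices: abundances are relative values in [0, 1] keyed by species name, species are ordered by sorting their lineage tuples, non-zero abundances are quantized into 10 log-scale bins (absent species become background), and the 2D coordinates for the embedding variant are assumed to be computed elsewhere (for example with scikit-learn's `sklearn.manifold.TSNE` fitted on the species-by-sample matrix). Species that land in the same cell keep the highest level.

```python
import math

# Hypothetical helper: quantize a relative abundance into a discrete intensity level.
def bin_abundance(x, n_bins=10, min_val=1e-7, max_val=1.0):
    """Return 0 for absent species, otherwise a level in 1..n_bins on a log10 scale."""
    if x <= 0:
        return 0  # absent species -> background pixel
    x = min(max(x, min_val), max_val)  # clip into the binned range
    lo, hi = math.log10(min_val), math.log10(max_val)
    frac = (math.log10(x) - lo) / (hi - lo)  # position in [0, 1] on the log scale
    return 1 + min(int(frac * n_bins), n_bins - 1)

# Hypothetical "fill-up" layout: species in taxonomic order, written row by row.
def fill_up_image(abundances, order, n_bins=10):
    side = math.ceil(math.sqrt(len(order)))  # smallest square holding every species
    image = [[0] * side for _ in range(side)]
    for i, species in enumerate(order):
        row, col = divmod(i, side)
        image[row][col] = bin_abundance(abundances.get(species, 0.0), n_bins)
    return image

# Hypothetical embedding layout: species placed at precomputed 2D coordinates.
def embedding_image(abundances, coords, side, n_bins=10):
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    x0, y0 = min(xs), min(ys)
    x_span = (max(xs) - x0) or 1.0  # avoid division by zero for degenerate layouts
    y_span = (max(ys) - y0) or 1.0
    image = [[0] * side for _ in range(side)]
    for species, (x, y) in coords.items():
        col = int((x - x0) / x_span * (side - 1))
        row = int((y - y0) / y_span * (side - 1))
        level = bin_abundance(abundances.get(species, 0.0), n_bins)
        image[row][col] = max(image[row][col], level)  # collisions keep the strongest signal
    return image

# Usage with made-up lineages (phylum, genus, species) and one made-up sample.
lineages = {
    "Bacteroides_vulgatus": ("Bacteroidetes", "Bacteroides", "vulgatus"),
    "Faecalibacterium_prausnitzii": ("Firmicutes", "Faecalibacterium", "prausnitzii"),
    "Alistipes_putredinis": ("Bacteroidetes", "Alistipes", "putredinis"),
}
order = sorted(lineages, key=lineages.get)  # related species end up adjacent
sample = {"Bacteroides_vulgatus": 0.3, "Faecalibacterium_prausnitzii": 1e-5}
fill = fill_up_image(sample, order)
print(fill)  # [[0, 10], [3, 0]]

# Made-up 2D coordinates standing in for a t-SNE embedding of the species.
coords = {
    "Alistipes_putredinis": (0.0, 0.0),
    "Bacteroides_vulgatus": (2.0, 2.0),
    "Faecalibacterium_prausnitzii": (2.0, 0.0),
}
emb = embedding_image(sample, coords, side=2)
print(emb)  # [[0, 3], [0, 10]]
```

In a full pipeline these integer grids would presumably be colour-mapped or used as single-channel inputs to a CNN. The fill-up layout gives every species its own pixel, whereas the embedding layout can merge species that fall into the same cell, which is why the sketch resolves collisions by keeping the maximum level.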
dc.identifier.urlsite: https://hal.sorbonne-universite.fr/hal-01819205
dc.subject.ddclabel: General computer science (Informatique générale) [en]
dc.relation.conftitle: La Conférence sur l'Apprentissage automatique (CAp)
dc.relation.confdate: 2018-06
dc.relation.confcity: Rouen
dc.relation.confcountry: FRANCE
dc.relation.forthcoming: no [en]
dc.description.ssrncandidate: no
dc.description.halcandidate: no
dc.description.readership: non-research
dc.description.audience: International
dc.date.updated: 2020-10-23T11:20:07Z
hal.person.labIds: 989

