Field | Value | Language
hal.structure.identifier | Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE] |
dc.contributor.author | Benhamou, Éric |
dc.date.accessioned | 2020-11-12T10:33:45Z |
dc.date.available | 2020-11-12T10:33:45Z |
dc.date.issued | 2019 |
dc.identifier.uri | https://basepub.dauphine.fr/handle/123456789/21202 |
dc.language.iso | en | en
dc.subject | Policy gradient | en
dc.subject | Supervised learning | en
dc.subject | Cross entropy | en
dc.subject | Kullback-Leibler divergence | en
dc.subject | Entropy | en
dc.subject.ddc | 006.3 | en
dc.title | Similarities between policy gradient methods (PGM) in reinforcement learning (RL) and supervised learning (SL) | en
dc.type | Document de travail / Working paper |
dc.description.abstracten | Reinforcement learning (RL) is about sequential decision making and is traditionally contrasted with supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state, whereas in SL (and USL) the next state remains the same regardless of the decisions taken, in both batch and online learning. Although this difference between SL and RL is fundamental, there are connections between them that have been overlooked. In particular, we prove in this paper that policy gradient methods can be cast as a supervised learning problem in which true labels are replaced with discounted rewards. We provide a new proof of policy gradient methods (PGM) that emphasizes the tight link with cross entropy and supervised learning. We present a simple experiment in which we interchange labels and pseudo-rewards. We conclude that further relationships with SL could be established by modifying the reward functions judiciously. | en
dc.publisher.city | Paris | en
dc.relation.ispartofseriestitle | Preprint Lamsade | en
dc.identifier.urlsite | https://hal.archives-ouvertes.fr/hal-02886505 | en
dc.subject.ddclabel | Artificial intelligence | en
dc.description.ssrncandidate | no | en
dc.description.halcandidate | no | en
dc.description.readership | Research | en
dc.description.audience | International | en
dc.date.updated | 2020-11-12T10:32:03Z |
hal.author.function | aut |
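The abstract's central claim, that a policy gradient update is a supervised cross-entropy update in which the true labels are replaced with discounted rewards, can be illustrated with a minimal NumPy sketch. It assumes a softmax policy parameterised directly by per-state logits and uses the standard REINFORCE estimator; the function names (`discounted_returns`, `weighted_cross_entropy`, `grad_wrt_logits`) and the random toy data are hypothetical, and this is not presented as the paper's own derivation or experiment. Replacing the one-hot label of the taken action with a vector holding the discounted return at that action's position is the same as weighting each cross-entropy term by that return; setting every weight to 1 recovers the ordinary supervised loss.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... (computed backwards)."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def weighted_cross_entropy(logits, actions, weights):
    """Cross entropy with the taken actions as one-hot 'labels', each term weighted.
    With weights = discounted returns this is the REINFORCE surrogate
    -sum_t G_t * log pi(a_t | s_t); with weights = 1 it is plain supervised CE."""
    probs = softmax(logits)
    log_probs = np.log(probs[np.arange(len(actions)), actions])
    return -np.sum(weights * log_probs)

def grad_wrt_logits(logits, actions, weights):
    """Analytic gradient of weighted_cross_entropy w.r.t. the logits:
    weights[t] * (softmax(logits[t]) - onehot(actions[t]))."""
    probs = softmax(logits)
    onehot = np.eye(logits.shape[1])[actions]
    return weights[:, None] * (probs - onehot)

# Hypothetical toy episode: 5 visited states, 3 possible actions.
rng = np.random.default_rng(0)
T, n_actions = 5, 3
logits = rng.normal(size=(T, n_actions))      # policy logits at each visited state
actions = rng.integers(0, n_actions, size=T)  # actions actually taken (the "labels")
rewards = rng.normal(size=T)
G = discounted_returns(rewards, gamma=0.9)

pg_loss = weighted_cross_entropy(logits, actions, G)           # policy gradient surrogate
sl_loss = weighted_cross_entropy(logits, actions, np.ones(T))  # ordinary supervised CE

# Central finite differences confirm the analytic gradient of the surrogate loss.
eps = 1e-5
numeric = np.zeros_like(logits)
for i in range(T):
    for j in range(n_actions):
        up, down = logits.copy(), logits.copy()
        up[i, j] += eps
        down[i, j] -= eps
        numeric[i, j] = (weighted_cross_entropy(up, actions, G)
                         - weighted_cross_entropy(down, actions, G)) / (2 * eps)
max_err = np.max(np.abs(numeric - grad_wrt_logits(logits, actions, G)))
print("policy-gradient loss:", pg_loss, "supervised loss:", sl_loss, "grad check:", max_err)
```

Each row of the policy gradient is the supervised cross-entropy gradient for that sample scaled by its discounted return, which is the sense in which the return takes the place of the label.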