• français
    • English
  • English 
    • français
    • English
  • Login
JavaScript is disabled for your browser. Some features of this site may not work without it.
BIRD Home

Browse

This CollectionBy Issue DateAuthorsTitlesSubjectsJournals BIRDResearch centres & CollectionsBy Issue DateAuthorsTitlesSubjectsJournals

My Account

Login

Statistics

View Usage Statistics

Similarities between policy gradient methods (PGM) in reinforcement learning (RL) and supervised learning (SL)

Thumbnail
Date
2019
Publisher city
Paris
Collection title
Preprint Lamsade
Link to item file
https://hal.archives-ouvertes.fr/hal-02886505
Dewey
Intelligence artificielle
Sujet
Policy gradient; Supervised learning; Cross entropy; Kullback Leibler divergence; entropy
URI
https://basepub.dauphine.fr/handle/123456789/21202
Collections
  • LAMSADE : Publications
Metadata
Show full item record
Author
Benhamou, Éric
989 Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Type
Document de travail / Working paper
Abstract (EN)
Reinforcement learning (RL) is about sequential decision making and is traditionally opposed to supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state as opposed to SL (and USL) where, the next state remains the same, regardless of the decisions taken, either in batch or on-line learning. Although this difference is fundamental between SL and RL, there are connections that have been overlooked. In particular, we prove in this paper that gradient policy method can be cast as a supervised learning problem where true label are replaced with discounted rewards. We provide a new proof of policy gradient methods (PGM) that emphasizes the tight link with the cross entropy and supervised learning. We provide a simple experiment where we interchange label and pseudo rewards. We conclude that other relationships with SL could be made if we modify the reward functions wisely.

  • Accueil Bibliothèque
  • Site de l'Université Paris-Dauphine
  • Contact
SCD Paris Dauphine - Place du Maréchal de Lattre de Tassigny 75775 Paris Cedex 16

 Content on this site is licensed under a Creative Commons 2.0 France (CC BY-NC-ND 2.0) license.