
Similarities between policy gradient methods (PGM) in reinforcement learning (RL) and supervised learning (SL)
Benhamou, Éric (2019), Similarities between policy gradient methods (PGM) in reinforcement learning (RL) and supervised learning (SL). https://basepub.dauphine.fr/handle/123456789/21202
Type
Working paper
Link to a document not held in this repository
https://hal.archives-ouvertes.fr/hal-02886505
Date
2019
Series title
Preprint Lamsade
Place of publication
Paris
Metadata
Author(s)
Benhamou, Éric
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Abstract (EN)
Reinforcement learning (RL) is about sequential decision making and is traditionally contrasted with supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state. In SL (and USL), by contrast, the next state remains the same regardless of the decisions taken, whether in batch or online learning. Although this difference between SL and RL is fundamental, there are connections between them that have been overlooked. In particular, we prove in this paper that the policy gradient method can be cast as a supervised learning problem in which the true labels are replaced with discounted rewards. We provide a new proof of policy gradient methods (PGM) that emphasizes the tight link with cross entropy and supervised learning. We provide a simple experiment in which we interchange labels and pseudo-rewards. We conclude that other relationships with SL could be made if we modify the reward functions wisely.
Keywords
Policy gradient; Supervised learning; Cross entropy; Kullback-Leibler divergence; Entropy
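The abstract's central claim is that a policy gradient loss can be read as a supervised cross-entropy in which the labels are replaced by discounted rewards. A minimal sketch can illustrate that reading. It is not the paper's proof or experiment. It assumes a discrete action space, a softmax policy given directly by per-step logits, and the standard REINFORCE estimator. The function names (`discounted_returns`, `weighted_cross_entropy`, `grad_wrt_logits`) are hypothetical.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * r_{t+1} + gamma**2 * r_{t+2} + ... for each t."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):  # accumulate from the last step backwards
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax policy pi(. | s) from unnormalized action scores."""
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(
    logits_per_step: Sequence[Sequence[float]],
    actions: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Sum over steps of w_t * (-log pi(a_t | s_t)).

    weights = 1 at every step: ordinary supervised cross-entropy, with the
    taken action a_t playing the role of the one-hot label.
    weights = discounted returns G_t: REINFORCE surrogate loss (hypothetical
    name); with G_t held fixed, its negative gradient is the policy gradient
    estimate sum_t G_t * grad log pi(a_t | s_t).
    """
    loss = 0.0
    for logits, a, w in zip(logits_per_step, actions, weights):
        loss += w * -math.log(softmax(logits)[a])
    return loss


def grad_wrt_logits(logits: Sequence[float], action: int, weight: float) -> List[float]:
    """Analytic gradient of w * (-log softmax(z)[a]) w.r.t. z: w * (softmax(z) - onehot(a))."""
    probs = softmax(logits)
    return [weight * (p - (1.0 if k == action else 0.0)) for k, p in enumerate(probs)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    returns = discounted_returns(rewards, gamma=0.5)
    print(returns)  # [1.5, 1.0, 2.0]

    logits = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    actions = [0, 1, 1]
    sl_loss = weighted_cross_entropy(logits, actions, [1.0] * len(actions))
    pg_loss = weighted_cross_entropy(logits, actions, returns)
    print(sl_loss, pg_loss)
```

If every weight is set to 1, the function computes the usual supervised cross-entropy on the taken actions. Swapping in the returns G_t changes only the per-sample weights. This is one way to read the claim that labels are traded for discounted rewards. A practical implementation would produce the logits from a parameterized network and rely on an autodiff library. The analytic `grad_wrt_logits` is kept here only so the sketch stays dependency-free.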