• français
    • English
  • English 
    • français
    • English
  • Login
JavaScript is disabled for your browser. Some features of this site may not work without it.
BIRD Home

Browse

This CollectionBy Issue DateAuthorsTitlesSubjectsJournals BIRDResearch centres & CollectionsBy Issue DateAuthorsTitlesSubjectsJournals

My Account

Login

Statistics

View Usage Statistics

Variance Reduction in Actor Critic Methods (ACM)

Thumbnail
Date
2019
Publisher city
Paris
Collection title
Preprint Lamsade
Link to item file
https://hal.archives-ouvertes.fr/hal-02886487
Dewey
Intelligence artificielle
Sujet
Actor critic method; Variance reduction; Projection; Deep RL
URI
https://basepub.dauphine.fr/handle/123456789/21200
Collections
  • LAMSADE : Publications
Metadata
Show full item record
Author
Benhamou, Éric
989 Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Type
Document de travail / Working paper
Abstract (EN)
After presenting Actor Critic Methods (ACM), we show ACM are control variate estimators. Using the projection theorem, we prove that the Q and Advantage Actor Critic (A2C) methods are optimal in the sense of the L 2 norm for the control variate estima-tors spanned by functions conditioned by the current state and action. This straightforward application of Pythagoras theorem provides a theoretical justification of the strong performance of QAC and AAC most often referred to as A2C methods in deep policy gradient methods. This enables us to derive a new formulation for Advantage Actor Critic methods that has lower variance and improves the traditional A2C method.

  • Accueil Bibliothèque
  • Site de l'Université Paris-Dauphine
  • Contact
SCD Paris Dauphine - Place du Maréchal de Lattre de Tassigny 75775 Paris Cedex 16

 Content on this site is licensed under a Creative Commons 2.0 France (CC BY-NC-ND 2.0) license.