
hal.structure.identifier: Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
dc.contributor.author: Benhamou, Éric
dc.date.accessioned: 2020-11-12T10:18:17Z
dc.date.available: 2020-11-12T10:18:17Z
dc.date.issued: 2019
dc.identifier.uri: https://basepub.dauphine.fr/handle/123456789/21200
dc.language.iso: en
dc.subject: Actor critic method
dc.subject: Variance reduction
dc.subject: Projection
dc.subject: Deep RL
dc.subject.ddc: 006.3
dc.title: Variance Reduction in Actor Critic Methods (ACM)
dc.type: Document de travail / Working paper
dc.description.abstract: After presenting Actor Critic Methods (ACM), we show that ACMs are control variate estimators. Using the projection theorem, we prove that the Q and Advantage Actor Critic (A2C) methods are optimal in the sense of the L2 norm among the control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of Pythagoras' theorem provides a theoretical justification for the strong performance of QAC and AAC, most often referred to as A2C methods, in deep policy gradient methods. This enables us to derive a new formulation of the Advantage Actor Critic method that has lower variance and improves on the traditional A2C method.
dc.publisher.city: Paris
dc.relation.ispartofseriestitle: Preprint Lamsade
dc.identifier.urlsite: https://hal.archives-ouvertes.fr/hal-02886487
dc.subject.ddclabel: Intelligence artificielle
dc.description.ssrncandidate: non
dc.description.halcandidate: non
dc.description.readership: recherche
dc.description.audience: International
dc.date.updated: 2020-11-12T10:14:50Z
hal.author.function: aut
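
As context for the abstract above, here is a minimal sketch of the standard control-variate view of policy gradients (notation is ours, not taken from the paper): the score-function estimator

    \nabla_\theta J(\theta) = \mathbb{E}_{s,a}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s,a) \right]

remains unbiased after subtracting any state-dependent baseline b(s),

    \nabla_\theta J(\theta) = \mathbb{E}_{s,a}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, \big( Q^{\pi}(s,a) - b(s) \big) \right],

because \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, b(s) \right] = 0. The baseline therefore acts as a control variate; choosing b(s) = V^{\pi}(s) yields the advantage A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s) used in A2C, and the paper's projection (Pythagoras) argument concerns which member of this family is optimal in the L2 sense.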


Files in this item


There are no files associated with this item.

