Robustness of stochastic bandit policies

Salomon, Antoine; Audibert, Jean-Yves (2014), Robustness of stochastic bandit policies, Theoretical Computer Science, 519, p. 46-67. http://dx.doi.org/10.1016/j.tcs.2013.09.019

Type: Article accepted for publication or published
External document link: http://hal.inria.fr/hal-00821670
Date: 2014
Journal name: Theoretical Computer Science
Volume: 519
Publisher: Elsevier
Pages: 46-67
Publication identifier: http://dx.doi.org/10.1016/j.tcs.2013.09.019
Author(s): Salomon, Antoine; Audibert, Jean-Yves
Abstract (EN)
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. [2] exhibit a policy such that with probability at least 1−1/n, the regret of the policy is of order log n. They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. [3]. This work first answers an open question: it extends this negative result to any anytime policy (i.e. any policy that does not take the number of plays n into account). Another contribution of this paper is to design robust anytime policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms. We also show that, for any policy (i.e. even when the number of plays n is known), the regret is of order log n with probability at least 1−1/n, so that the policy of Audibert et al. has the best possible deviation properties.
Subjects / Keywords: Exploration–exploitation tradeoff; Multi-armed stochastic bandit; Regret deviations/risk
JEL
C73 - Stochastic and Dynamic Games; Evolutionary Games; Repeated Games
D81 - Criteria for Decision-Making under Risk and Uncertainty
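
For readers less familiar with the setting described in the abstract above, the sketch below simulates the standard ucb1 index policy of Auer et al. on Bernoulli arms and accumulates the pseudo-regret. It is only an illustrative sketch: the Bernoulli reward model, the arm means, the horizon and the function name are assumptions made here, not material from the paper, which studies the deviations of the regret rather than prescribing any simulation.

    import math
    import random

    def ucb1_regret(means, n_plays, seed=0):
        """Simulate the ucb1 policy on Bernoulli arms with the given means
        and return the accumulated pseudo-regret after n_plays rounds.
        (Illustrative helper, not code from the paper.)"""
        rng = random.Random(seed)
        k = len(means)
        counts = [0] * k      # number of times each arm has been played
        sums = [0.0] * k      # sum of rewards observed on each arm
        best_mean = max(means)
        regret = 0.0
        for t in range(1, n_plays + 1):
            if t <= k:
                arm = t - 1   # play every arm once to initialise the indices
            else:
                # ucb1 index: empirical mean plus an exploration bonus
                arm = max(range(k),
                          key=lambda i: sums[i] / counts[i]
                          + math.sqrt(2.0 * math.log(t) / counts[i]))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best_mean - means[arm]   # pseudo-regret increment
        return regret

    # Two Bernoulli arms with a 0.1 gap, horizon n = 10000 (illustrative values)
    print(ucb1_regret([0.5, 0.6], 10000))

Averaged over many seeds the pseudo-regret of such a run grows logarithmically in n, but, as the abstract points out, individual runs of ucb1 can deviate well above that average; those deviations are exactly the robustness issue the paper addresses.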

Related items

Showing items related by title and author.

  • Lower Bounds and Selectivity of Weak-Consistent Policies in Stochastic Multi-Armed Bandit Problem.
    El Alaoui, Issam; Audibert, Jean-Yves; Salomon, Antoine (2013-01) Article accepted for publication or published
  • Time policies need time use researches
    Boulin, Jean-Yves (2020) Conference communication
  • Local Time Policies in Europe
    Boulin, Jean-Yves (2017-06) Conference communication
  • Local Time Policies in Europe
    Boulin, Jean-Yves (2006) Book chapter
  • Cohésion sociale emploi et compétitivité : éléments pour un débat
    Trimouille, Frédérique; Saint-Martin, Anne; Lerais, Frédéric; Klein, Tristan; Kerbouch, Jean-Yves; Estrade, Marc-Antoine; Beaujolin, Rachel; Méda, Dominique (2002) Working paper