The Thresholding Bandit (TB) problem is a popular sequential decision-making problem that aims to identify the systems whose means are greater than a threshold. Unlike conventional approaches, which work on an upper bound of a loss function, our approach directly minimizes the loss itself. Leveraging large deviation theory, we first provide an asymptotically optimal allocation rule for the TB problem, and then propose a parameter-free Large Deviation (LD) algorithm to make the allocation rule implementable. A Central Limit Theorem-based Large Deviation (CLD) algorithm is further proposed as a supplement that improves computational efficiency using a normal approximation. Extensive experiments validate the superiority of our algorithms over existing methods and demonstrate their broader applicability to more general distributions and various kinds of loss functions.
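To make the problem setting concrete, the following is a minimal sketch of a TB instance under a naive round-robin (uniform) allocation. It is not the LD or CLD algorithm described above, whose allocation rules are not given in this abstract. The function names, the Gaussian noise model, and the misclassification count used as a loss are all illustrative assumptions.

```python
import random


def uniform_thresholding(means, threshold, budget, sigma=1.0, seed=0):
    """Round-robin baseline (an assumption, not the paper's method):
    sample every system equally often, then report the systems whose
    sample mean exceeds the threshold."""
    k = len(means)
    if budget < k:
        raise ValueError("budget must allow at least one sample per system")
    rng = random.Random(seed)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        i = t % k  # uniform allocation: cycle through the systems
        sums[i] += rng.gauss(means[i], sigma)  # assumed Gaussian observations
        counts[i] += 1
    estimates = [s / c for s, c in zip(sums, counts)]
    return {i for i, m in enumerate(estimates) if m > threshold}


def misclassified(selected, means, threshold):
    """Illustrative loss: number of systems placed on the wrong side
    of the threshold."""
    truth = {i for i, m in enumerate(means) if m > threshold}
    return len(selected ^ truth)


if __name__ == "__main__":
    true_means = [0.1, 0.9, 0.4, 0.6]  # hypothetical system means
    picked = uniform_thresholding(true_means, threshold=0.5, budget=400)
    print(sorted(picked), misclassified(picked, true_means, 0.5))
```

An adaptive allocation rule would replace the `t % k` line, typically spending more samples on systems whose means lie close to the threshold, since those are the hardest to classify.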
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).