Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Published in International Conference on Machine Learning (ICML), 2015
This paper provides an optimal regret analysis of Thompson sampling in the stochastic multi-armed bandit problem with multiple plays, in which the learner selects a fixed number of arms in each round rather than a single one. We establish theoretical guarantees for this Bayesian algorithm in the multiple-play setting, showing that Thompson sampling achieves the asymptotically optimal regret bound.
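For intuition, the following is a minimal sketch of the multiple-play Thompson sampling idea analyzed in the paper: draw one posterior sample per arm and play the L arms with the largest draws. It assumes Bernoulli rewards with Beta(1, 1) priors; the function name, parameters, and simulation setup are illustrative, not the paper's reference implementation.

```python
import numpy as np

def mp_thompson_sampling(K, L, true_means, T, seed=0):
    """Sketch of Thompson sampling with multiple plays (Bernoulli bandits).

    K: number of arms, L: arms played per round (L < K), T: horizon.
    `true_means` is used only to simulate rewards; the algorithm
    itself observes rewards alone.
    """
    rng = np.random.default_rng(seed)
    alpha = np.ones(K)  # Beta posterior parameter: successes + 1
    beta = np.ones(K)   # Beta posterior parameter: failures + 1
    total_reward = 0.0
    for _ in range(T):
        # One posterior sample per arm; play the L largest samples.
        theta = rng.beta(alpha, beta)
        chosen = np.argsort(theta)[-L:]
        rewards = rng.binomial(1, true_means[chosen])
        # Standard conjugate Beta-Bernoulli posterior update.
        alpha[chosen] += rewards
        beta[chosen] += 1 - rewards
        total_reward += rewards.sum()
    return total_reward

# Example: 5 arms, play 2 per round for 10,000 rounds.
print(mp_thompson_sampling(5, 2, np.array([0.1, 0.2, 0.3, 0.4, 0.5]), 10_000))
```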
Recommended citation: Komiyama, J., Honda, J., & Nakagawa, H. (2015). “Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays.” In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 1152-1161.
Download Paper