Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Published in International Conference on Machine Learning (ICML), 2015

This paper provides an optimal regret analysis of Thompson sampling for stochastic multi-armed bandit problems with multiple plays, where the learner selects several arms per round instead of one. We establish theoretical guarantees for this Bayesian bandit algorithm in the multiple-play setting, showing that Thompson sampling achieves the asymptotically optimal regret bound.
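A minimal sketch of multiple-play Thompson sampling with Bernoulli rewards: each round, draw one sample from each arm's Beta posterior and play the L arms with the largest samples. All names here are illustrative; this is an assumption-laden sketch for intuition, not the paper's exact algorithm or experimental setup.

```python
import random


def mp_ts_round(successes, failures, num_plays):
    """One round of multiple-play Thompson sampling (sketch).

    Draws a Beta(s+1, f+1) posterior sample per arm and returns the
    indices of the num_plays arms with the largest samples.
    """
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    ranked = sorted(range(len(samples)), key=lambda i: samples[i],
                    reverse=True)
    return ranked[:num_plays]


def run_mp_ts(true_means, num_plays, horizon, seed=0):
    """Simulate MP-TS on Bernoulli arms and return the total reward."""
    random.seed(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    total_reward = 0
    for _ in range(horizon):
        for i in mp_ts_round(successes, failures, num_plays):
            reward = 1 if random.random() < true_means[i] else 0
            total_reward += reward
            if reward:
                successes[i] += 1
            else:
                failures[i] += 1
    return total_reward
```

With arms of distinct means, the posterior samples of the best L arms dominate over time, so suboptimal arms are played only logarithmically often.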

Download paper here

Recommended citation: Komiyama, J., Honda, J., & Nakagawa, H. (2015). “Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays.” In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 1152-1161.