Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
Published in Conference on Learning Theory (COLT), 2015
This paper derives a regret lower bound for the dueling bandit problem, a preference-based variant of the multi-armed bandit in which the learner observes only the outcomes of pairwise comparisons between arms. It then proposes an algorithm whose regret upper bound matches this lower bound, making the algorithm optimal.
Recommended citation: Komiyama, J., Honda, J., Kashima, H., & Nakagawa, H. (2015). “Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem.” In Proceedings of the 28th Annual Conference on Learning Theory (COLT 2015), 1141-1154.