Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

Published in Conference on Learning Theory (COLT), 2015

This paper establishes a regret lower bound for the dueling bandit problem, a preference-based learning setting in which feedback comes only as pairwise comparisons between arms. It also develops an algorithm whose regret upper bound matches that lower bound, making it optimal.

Download paper here

Recommended citation: Komiyama, J., Honda, J., Kashima, H., & Nakagawa, H. (2015). “Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem.” In Proceedings of the 28th Annual Conference on Learning Theory (COLT 2015), 1141-1154.
