Short paper

Multi-Armed Bandit with Sparse and Noisy Feedback

Abstract

The increased use of personal assistants has made question answering a common mode of user-system interaction. In these systems, implicit feedback, such as a user clicking on a link provided by the QA system, is easy to observe but can be noisy. Explicit feedback on a response, by contrast, is rare but more valuable. To address this trade-off, this paper proposes a new stochastic multi-armed bandit model that accounts for both types of feedback: noisy rewards and sparse rewards. The model is studied in both the classical and contextual bandit settings, and an efficient algorithm based on the UCB framework is proposed and analyzed. The algorithm is evaluated through empirical studies on various reward distributions and on a real-world dataset and application.
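
To make the setting concrete, the following is a minimal sketch of the standard UCB1 rule running on Bernoulli arms under a hypothetical two-channel feedback model. It is not the algorithm proposed in this paper. The names `ucb1_index` and `run_ucb1`, the parameters `p_explicit` and `flip_prob`, and the symmetric reward-flipping noise are all assumptions made for illustration only.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")  # an unpulled arm is always tried first
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(true_means, horizon, p_explicit=0.1, flip_prob=0.2, seed=0):
    """Run UCB1 on Bernoulli arms with an assumed two-channel feedback model.

    With probability p_explicit the learner observes the true reward
    (sparse, explicit feedback). Otherwise it observes the reward flipped
    with probability flip_prob (noisy, implicit feedback). Both parameters
    are hypothetical and chosen only for this illustration.
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        # Pick the arm with the largest upper confidence bound.
        arm = max(range(k), key=lambda a: ucb1_index(means[a], pulls[a], t))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        if rng.random() < p_explicit:
            observed = reward  # explicit channel: true reward
        else:
            # implicit channel: reward flipped with probability flip_prob
            observed = 1.0 - reward if rng.random() < flip_prob else reward
        pulls[arm] += 1
        # Incremental update of the empirical mean of observed rewards.
        means[arm] += (observed - means[arm]) / pulls[arm]
    return pulls, means


if __name__ == "__main__":
    pulls, means = run_ucb1([0.2, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("estimated means:", [round(m, 3) for m in means])
```

Under this assumed noise model, flipping pulls every estimated mean toward 0.5 but preserves the ordering of the arms whenever `flip_prob` is below 0.5, so plain UCB1 still tends to favor the better arm. A model that treats the two feedback channels separately, as this paper proposes, would not pool them into a single average as this sketch does.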