Neural belief reasoner
Haifeng Qian
IJCAI 2020
Deep reinforcement learning combined with Monte-Carlo tree search (MCTS) has demonstrated high performance and has attracted much attention. However, learning converges slowly. By comparison, learning by playing board games against human opponents is more efficient, because skills and strategies can be acquired from failure patterns. We hypothesize that failure patterns carry information that can expedite training, serving as prior knowledge for reinforcement learning. To use this prior knowledge, we propose an efficient tree search method that introduces a failure ratio, which takes a high value for failure patterns. We tested our hypothesis by applying this method to the board game Othello. The results show that our method achieves a higher winning ratio than a state-of-the-art method, especially in the early stage of learning.