Dzung Phan, Vinicius Lima
INFORMS 2023
This paper presents a case study in which the TD(λ) algorithm for training connectionist networks, proposed in (Sutton, 1988), is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, networks are able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these networks have apparently discovered useful features, a longstanding goal of computer games research. Furthermore, when a set of handcrafted features is added to the input representation, the resulting networks reach a near-expert level of performance, and have achieved good results in tests against world-class human play.
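The TD(λ) update referred to in the abstract can be sketched roughly as follows. This is a minimal illustration of the general eligibility-trace rule from Sutton (1988) for a linear value function, not the network architecture or parameters used in the paper; all names (`td_lambda_update`, `alpha`, `lam`) and the linear form are assumptions for the sketch.

```python
import numpy as np

def td_lambda_update(w, x_t, x_next, reward, trace, alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step for a linear value estimate V(x) = w . x.

    delta is the temporal-difference error between successive value
    estimates; the eligibility trace spreads that error back over
    recently visited states.
    """
    delta = reward + gamma * np.dot(w, x_next) - np.dot(w, x_t)  # TD error
    trace = gamma * lam * trace + x_t                            # decay and accumulate trace
    w = w + alpha * delta * trace                                # update weights along the trace
    return w, trace

# Toy usage with two-feature states and zero-initialized weights.
w = np.zeros(2)
trace = np.zeros(2)
w, trace = td_lambda_update(w, np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                            reward=1.0, trace=trace)
```

In self-play training, the reward would be the game outcome at the terminal position and zero elsewhere, with one such update applied per move.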
Bing Zhang, Mikio Takeuchi, et al.
ICAIF 2024
Yuta Tsuboi, Yuya Unno, et al.
AAAI 2011
Hagen Soltau, Lidia Mangu, et al.
ASRU 2011