Dong Ki Kim, Matthew Riemer, et al.
NeurIPS 2022
Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed "Hyper-Q" Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
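A minimal sketch of the Hyper-Q idea in Python (an illustration, not the paper's implementation): it discretizes the mixed-strategy simplex with a coarse grid, uses a Dirichlet-style posterior mean as a simple stand-in for the paper's Bayesian opponent-strategy estimation, and runs tabular Q-learning over (estimated opponent strategy, own mixed strategy) pairs. The grid resolution, learning rates, and the fixed-mixed-strategy opponent are illustrative assumptions; the paper instead evaluates against IGA and PHC opponents.

import random
from collections import defaultdict

ACTIONS = (0, 1, 2)                                 # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]       # row player's payoff

def simplex_grid(k=5):
    """All mixed strategies (p_rock, p_paper, p_scissors) with denominator k."""
    return [(i / k, j / k, (k - i - j) / k)
            for i in range(k + 1) for j in range(k + 1 - i)]

def nearest(point, grid):
    """Snap an arbitrary point on the simplex to the closest grid strategy."""
    return min(grid, key=lambda g: sum((p - q) ** 2 for p, q in zip(point, g)))

class HyperQ:
    def __init__(self, k=5, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.grid = simplex_grid(k)
        self.Q = defaultdict(float)                 # Q[(opponent estimate, own strategy)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.counts = [1.0, 1.0, 1.0]               # Dirichlet(1, 1, 1) prior over opponent actions

    def opponent_estimate(self):
        total = sum(self.counts)
        return nearest(tuple(c / total for c in self.counts), self.grid)

    def choose_strategy(self, y):
        if random.random() < self.epsilon:          # epsilon-greedy over mixed strategies
            return random.choice(self.grid)
        return max(self.grid, key=lambda x: self.Q[(y, x)])

    def step(self, opp_policy=(0.5, 0.3, 0.2)):
        y = self.opponent_estimate()                # current belief about the opponent
        x = self.choose_strategy(y)                 # mixed strategy to play this round
        my_action = random.choices(ACTIONS, weights=x)[0]
        opp_action = random.choices(ACTIONS, weights=opp_policy)[0]
        reward = PAYOFF[my_action][opp_action]
        self.counts[opp_action] += 1                # Bayesian update of the opponent model
        y_next = self.opponent_estimate()
        best_next = max(self.Q[(y_next, x2)] for x2 in self.grid)
        self.Q[(y, x)] += self.alpha * (reward + self.gamma * best_next - self.Q[(y, x)])
        return reward

if __name__ == "__main__":
    agent = HyperQ()
    total = sum(agent.step() for _ in range(5000))
    print("average reward vs. the fixed opponent:", total / 5000)

Run against this hypothetical fixed opponent, the learner should drift toward the pure "paper" strategy, which is the best response to (0.5, 0.3, 0.2); against an adaptive opponent such as IGA or PHC, the opponent-strategy estimate and the learned values would both vary over time, which is the setting the paper targets.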
Matthew Riemer, Ignacio Cases, et al.
ICLR 2019
Gerald Tesauro, David M. Chess, et al.
AAMAS 2004
Gerald Tesauro
IEEE Internet Computing