Many practical reinforcement learning environments have a discrete factored action space that induces a large combinatorial set of actions, which poses significant challenges. Existing approaches leverage the regular structure of the action space and resort to a linear decomposition of Q-functions, which avoids enumerating all combinations of factored actions. In this paper, we consider Q-functions defined over a lower-dimensional projected subspace of the original action space and study the conditions under which such decomposed Q-functions are unbiased, drawing on causal effect estimation in the observed-confounder setting from causal statistics. This leads to a general scheme that uses the projected Q-functions to approximate the Q-function in standard model-free reinforcement learning algorithms. The proposed approach is also shown to improve sample complexity in a model-based reinforcement learning setting. We demonstrate improvements in sample efficiency over state-of-the-art baselines in online continuous control environments and a real-world offline sepsis treatment environment.
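
As a concrete illustration of the linear decomposition mentioned above, the following is a minimal sketch, assuming a PyTorch-style Q-network; the class and parameter names (FactoredQNetwork, sub_action_dims) are hypothetical and not taken from the paper. It shows how Q(s, a) can be computed as a sum of per-factor Q-functions, so that neither evaluation nor greedy action selection enumerates the combinatorial action set.

```python
# Minimal sketch (not the authors' implementation) of a linearly decomposed
# Q-function over a factored action space: Q(s, a) ≈ sum_i Q_i(s, a_i).
# All names (FactoredQNetwork, sub_action_dims) are illustrative assumptions.
import torch
import torch.nn as nn


class FactoredQNetwork(nn.Module):
    def __init__(self, state_dim, sub_action_dims, hidden=128):
        super().__init__()
        # One Q-head per action factor; head i outputs a vector of
        # Q_i(s, .) values of size sub_action_dims[i].
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n))
            for n in sub_action_dims
        ])

    def forward(self, state):
        # Per-factor Q-value vectors, each of shape [batch, n_i].
        return [head(state) for head in self.heads]

    def q_value(self, state, action):
        # action: LongTensor [batch, num_factors];
        # Q(s, a) is the sum of the selected per-factor values.
        per_factor = self.forward(state)
        return sum(q.gather(1, action[:, i:i + 1]).squeeze(1)
                   for i, q in enumerate(per_factor))

    def greedy_action(self, state):
        # Under the linear decomposition, the greedy joint action is the
        # factor-wise argmax, found without enumerating joint actions.
        return torch.stack([q.argmax(dim=1) for q in self.forward(state)],
                           dim=1)
```

Under this decomposition, the cost of computing Q-values and greedy actions grows with the sum of the factor sizes rather than their product, which is the source of the efficiency gain the abstract refers to.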