Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Value-based reinforcement learning methods, such as Deep Q-Networks (DQNs), typically estimate returns by minimizing the mean squared error between predicted and target values. From a Bayesian standpoint, this procedure implicitly assumes that returns follow a unimodal Gaussian distribution, with parameters learned via maximum likelihood estimation. However, this assumption can be limiting in environments characterized by high stochasticity or complex reward dynamics, where capturing uncertainty and multi-modality in the return distribution is critical for robust decision-making. We propose Gaussian Mixture Q-Networks (GQN), a novel extension of Q-learning that models the return distribution as a mixture of Gaussians. Architecturally, GQN can be interpreted as a mixture-of-experts Q-learning algorithm, where each Gaussian component acts as an expert head and mixture weights are adaptively updated via temporal-difference responsibilities inspired by Expectation–Maximization. We evaluate GQN on the Atari benchmark suite and observe improvements in both learning stability and final performance compared to standard DQN baselines.
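The abstract does not spell out the update equations, but the mixture-of-experts reading suggests a straightforward sketch. The PyTorch code below is a minimal, assumed illustration, not the paper's confirmed implementation: each action gets K Gaussian expert heads (mean, log-std) plus a mixing logit, Q(s, a) is taken as the mixture-weighted mean, and the EM-inspired TD update computes responsibilities of a bootstrapped target under each component and minimizes the mixture negative log-likelihood. All class and function names, layer sizes, and the exact loss form are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMixtureQNetwork(nn.Module):
    """Sketch (assumed architecture): for each action, K Gaussian 'expert'
    heads output (mean, log-std) plus a mixing logit; Q(s, a) is the
    mixture-weighted mean of the per-component return estimates."""

    def __init__(self, obs_dim, num_actions, num_components=5, hidden=128):
        super().__init__()
        self.num_actions, self.num_components = num_actions, num_components
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One (mean, log-std, mixing-logit) triple per action-component pair.
        self.mean_head = nn.Linear(hidden, num_actions * num_components)
        self.log_std_head = nn.Linear(hidden, num_actions * num_components)
        self.logit_head = nn.Linear(hidden, num_actions * num_components)

    def forward(self, obs):
        h = self.torso(obs)
        shape = (-1, self.num_actions, self.num_components)
        means = self.mean_head(h).view(shape)
        log_stds = self.log_std_head(h).view(shape).clamp(-5.0, 2.0)
        log_weights = F.log_softmax(self.logit_head(h).view(shape), dim=-1)
        # Point estimate used for greedy action selection: the mixture mean.
        q_values = (log_weights.exp() * means).sum(dim=-1)
        return means, log_stds, log_weights, q_values


def mixture_td_loss(means, log_stds, log_weights, td_target):
    """One plausible EM-inspired TD update (an assumption, not the paper's
    stated rule). E-step: responsibilities r_k ∝ w_k · N(y; mu_k, sigma_k)
    for the TD target y. M-step (amortized by gradient descent): minimize
    the mixture negative log-likelihood, whose gradient weights each
    component's update by its responsibility."""
    # means/log_stds/log_weights: [batch, K] for the taken action;
    # td_target: [batch] bootstrapped return, treated as fixed (detached).
    y = td_target.detach().unsqueeze(-1)                    # [batch, 1]
    comp_log_prob = torch.distributions.Normal(
        means, log_stds.exp()).log_prob(y)                  # [batch, K]
    joint = log_weights + comp_log_prob                     # log w_k + log N_k
    responsibilities = F.softmax(joint, dim=-1)             # E-step quantities
    nll = -torch.logsumexp(joint, dim=-1)                   # -log Σ_k w_k N_k
    return nll.mean(), responsibilities
```

In a full agent one would gather the mixture parameters of the taken actions, form the TD target from a target network's mixture mean, and explore as in standard DQN; the returned responsibilities could also modulate per-component updates directly, which is one way to read the abstract's "temporal-difference responsibilities."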