Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Value-based reinforcement learning methods, such as Deep Q-Networks (DQNs), typically estimate returns by minimizing the mean squared error between predicted and target values. From a Bayesian standpoint, this procedure implicitly assumes that returns follow a unimodal Gaussian distribution, with parameters learned via maximum likelihood estimation. However, this assumption can be limiting in environments characterized by high stochasticity or complex reward dynamics, where capturing uncertainty and multi-modality in the return distribution is critical for robust decision-making. We propose Gaussian Mixture Q-Networks (GQN), a novel extension of Q-learning that models the return distribution as a mixture of Gaussians. Architecturally, GQN can be interpreted as a mixture-of-experts Q-learning algorithm, where each Gaussian component acts as an expert head and mixture weights are adaptively updated via temporal-difference responsibilities inspired by Expectation–Maximization. We evaluate GQN on the Atari benchmark suite and observe improvements in both learning stability and final performance compared to standard DQN baselines.
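The abstract does not spell out the update equations, but the mixture-of-experts reading suggests a straightforward sketch. The PyTorch code below is a minimal, assumed illustration, not the paper's confirmed implementation: each action gets K Gaussian expert heads (mean, log-std) plus a mixing logit, Q(s, a) is taken as the mixture-weighted mean, and the EM-inspired TD update computes responsibilities of a bootstrapped target under each component and minimizes the mixture negative log-likelihood. All class and function names, layer sizes, and the exact loss form are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMixtureQNetwork(nn.Module):
    """Sketch (assumed architecture): for each action, K Gaussian 'expert'
    heads output (mean, log-std) plus a mixing logit; Q(s, a) is the
    mixture-weighted mean of the per-component return estimates."""

    def __init__(self, obs_dim, num_actions, num_components=5, hidden=128):
        super().__init__()
        self.num_actions, self.num_components = num_actions, num_components
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One (mean, log-std, mixing-logit) triple per action-component pair.
        self.mean_head = nn.Linear(hidden, num_actions * num_components)
        self.log_std_head = nn.Linear(hidden, num_actions * num_components)
        self.logit_head = nn.Linear(hidden, num_actions * num_components)

    def forward(self, obs):
        h = self.torso(obs)
        shape = (-1, self.num_actions, self.num_components)
        means = self.mean_head(h).view(shape)
        log_stds = self.log_std_head(h).view(shape).clamp(-5.0, 2.0)
        log_weights = F.log_softmax(self.logit_head(h).view(shape), dim=-1)
        # Point estimate used for greedy action selection: the mixture mean.
        q_values = (log_weights.exp() * means).sum(dim=-1)
        return means, log_stds, log_weights, q_values


def mixture_td_loss(means, log_stds, log_weights, td_target):
    """One plausible EM-inspired TD update (an assumption, not the paper's
    stated rule). E-step: responsibilities r_k ∝ w_k · N(y; mu_k, sigma_k)
    for the TD target y. M-step (amortized by gradient descent): minimize
    the mixture negative log-likelihood, whose gradient weights each
    component's update by its responsibility."""
    # means/log_stds/log_weights: [batch, K] for the taken action;
    # td_target: [batch] bootstrapped return, treated as fixed (detached).
    y = td_target.detach().unsqueeze(-1)                    # [batch, 1]
    comp_log_prob = torch.distributions.Normal(
        means, log_stds.exp()).log_prob(y)                  # [batch, K]
    joint = log_weights + comp_log_prob                     # log w_k + log N_k
    responsibilities = F.softmax(joint, dim=-1)             # E-step quantities
    nll = -torch.logsumexp(joint, dim=-1)                   # -log Σ_k w_k N_k
    return nll.mean(), responsibilities
```

In a full agent one would gather the mixture parameters of the taken actions, form the TD target from a target network's mixture mean, and explore as in standard DQN; the returned responsibilities could also modulate per-component updates directly, which is one way to read the abstract's "temporal-difference responsibilities."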