Ben Huh, Avinash Baidya
NeurIPS 2022
Dealing with non-stationarity in environments (i.e., transition dynamics) and objectives (i.e., reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that jointly learns the causal structure in terms of a factored MDP and a factored representation of the individual time-varying change factors. We prove that under standard assumptions we can completely recover the causal graph representing the factored transition and reward functions, as well as a partial structure between the individual change factors and the state components. Through our general framework, we can handle general non-stationary scenarios with different function types and changing frequencies, including changes across episodes and within episodes. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of rewards, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
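To make the factored non-stationarity idea in the abstract concrete, here is a minimal sketch (not the authors' code) of a factored MDP in which each next-state component and the reward depends only on its causal parents plus its own time-varying latent change factor. The dimensions, masks, and drift dynamics (`state_mask`, `reward_mask`, `change_factors`) are illustrative assumptions, not quantities from the paper.

```python
# Minimal illustrative sketch of a factored non-stationary MDP:
# each state component s_i' depends on a masked subset of (state, action)
# and on an individual latent change factor theta_i(t); the reward has
# its own change factor. All names and dynamics are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_state, n_action = 3, 1  # assumed dimensions
# Assumed causal masks over the concatenated (state, action) vector.
state_mask = np.array([[1, 0, 0, 1],    # s1' <- s1, a
                       [1, 1, 0, 0],    # s2' <- s1, s2
                       [0, 0, 1, 1]])   # s3' <- s3, a
reward_mask = np.array([0, 1, 1, 0])    # r   <- s2, s3

def change_factors(t, n):
    """Individual latent change factors theta_i(t): a slow drift across
    episodes plus a within-episode oscillation (purely illustrative)."""
    return 0.1 * t / 100.0 + 0.05 * np.sin(0.3 * t + np.arange(n))

def step(s, a, t):
    """One factored transition: each component uses only its masked
    parents and its own change factor."""
    x = np.concatenate([s, a])
    theta = change_factors(t, n_state + 1)
    s_next = np.tanh(state_mask @ x + theta[:n_state])
    r = float(reward_mask @ x + theta[-1])
    return s_next, r

s = rng.normal(size=n_state)
for t in range(5):
    a = rng.normal(size=n_action)
    s, r = step(s, a, t)
    print(f"t={t}  s={np.round(s, 3)}  r={r:.3f}")
```

FANS-RL would observe only trajectories like the ones above and jointly infer the causal masks and the per-factor change variables; the sketch simply shows the generative structure such a method assumes.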
Hongyu Tu, Shantam Shorewala, et al.
NeurIPS 2022
Shiqiang Wang, Nathalie Baracaldo Angel, et al.
NeurIPS 2022
Chanakya Ekbote, Moksh Jain, et al.
NeurIPS 2022