Saurabh Paul, Christos Boutsidis, et al.
JMLR
Standard Reinforcement Learning (RL) aims to optimize decision-making rules in terms of the expected return. However, especially for risk-management purposes, other criteria such as the expected shortfall are sometimes preferred. Here, we describe a method of approximating the distribution of returns, which allows us to derive various kinds of information about the returns. We first show that the Bellman equation, which is a recursive formula for the expected return, can be extended to the cumulative return distribution. Then we derive a nonparametric return distribution estimator with particle smoothing based on this extended Bellman equation. A key aspect of the proposed algorithm is that the recursion in the extended Bellman equation is represented by a simple procedure that replaces the particles associated with a state with those of its successor state. We show that our algorithm leads to a risk-sensitive RL paradigm. The usefulness of the proposed approach is demonstrated through numerical experiments.
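A minimal Python sketch of the idea described in the abstract, not the authors' implementation: particles approximating each state's return distribution are replaced by reward-shifted, discounted particles of the successor state, following the distributional Bellman recursion Z(s) =d R(s) + gamma * Z(s'); the two-state chain, transition matrix P, rewards R, and all parameter values are illustrative assumptions, and risk measures such as the expected shortfall (CVaR) are then read off the particle sets.

import numpy as np

rng = np.random.default_rng(0)

N_PARTICLES = 1000
GAMMA = 0.95
N_STATES = 2

# Hypothetical 2-state Markov chain under a fixed policy (made up for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # P[s, s'] = transition probability
R = np.array([1.0, -0.5])         # reward received on leaving state s

# One particle set per state approximates that state's return distribution.
particles = np.zeros((N_STATES, N_PARTICLES))

for sweep in range(200):
    new_particles = np.empty_like(particles)
    for s in range(N_STATES):
        # Sample a successor state for every particle of state s ...
        succ = rng.choice(N_STATES, size=N_PARTICLES, p=P[s])
        # ... and replace the particle with a reward-shifted, discounted
        # particle drawn from the successor state's current particle set.
        idx = rng.integers(N_PARTICLES, size=N_PARTICLES)
        new_particles[s] = R[s] + GAMMA * particles[succ, idx]
    particles = new_particles

# Risk-sensitive quantities follow directly from the particles,
# e.g. the mean return and the 10% expected shortfall (CVaR) at state 0.
returns_s0 = np.sort(particles[0])
alpha = 0.10
cvar = returns_s0[: int(alpha * N_PARTICLES)].mean()
print(f"E[return | s=0] ~= {returns_s0.mean():.3f}")
print(f"CVaR_10%(s=0)   ~= {cvar:.3f}")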
Saurabh Paul, Christos Boutsidis, et al.
JMLR
C.A. Micchelli, W.L. Miranker
Journal of the ACM
Joxan Jaffar
Journal of the ACM
Kenneth L. Clarkson, Elad Hazan, et al.
Journal of the ACM