Aditya Malik, Nalini Ratha, et al.
CAI 2024
We consider a new form of decision making under uncertainty, based on a general Markov decision process (MDP) framework devised to support directly learning the optimal control policy. Our MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. Through this general framework we establish convergence and optimality results for our control-based methods, both in general and within various control paradigms (e.g., piecewise linear control policies), including convergence of Q-learning in the context of our MDP framework.
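For context, a brief sketch of the classical objects the abstract says this framework generalizes; the notation below (reward r, discount factor γ, transition kernel P, step size α_t) is standard textbook notation and is not taken from the paper itself. The classical Bellman optimality operator and the tabular Q-learning update are, respectively,

\[
(T^{*} Q)(s,a) \;=\; r(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, \max_{a'} Q(s', a'),
\]
\[
Q_{t+1}(s_t, a_t) \;=\; Q_t(s_t, a_t) \;+\; \alpha_t \Big[\, r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \Big].
\]

Per the abstract, the paper's contribution is to generalize the policy notion underlying these fixed-point equations and to prove that convergence and optimality carry over in that broader setting.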
Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025