Hongliang Li, Miao He, et al.
IJCNN 2016
In this paper, we present an error bound analysis of the Q-function for action-dependent adaptive dynamic programming applied to discounted optimal control problems of unknown discrete-time nonlinear systems. We first establish the convergence of the Q-functions generated by a policy iteration algorithm under ideal conditions. Then, accounting for the approximation errors of the Q-function and the control policy in the policy evaluation and policy improvement steps, we derive error bounds on the approximate Q-function at each iteration. Under the given boundedness conditions, the approximate Q-function converges to a finite neighborhood of the optimal Q-function. To implement the presented algorithm, two three-layer neural networks are employed to approximate the Q-function and the control policy, respectively. Finally, a simulation example verifies the validity of the presented algorithm.
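The abstract describes policy iteration on the Q-function, alternating policy evaluation and policy improvement with two three-layer neural networks as approximators (a critic for the Q-function, an actor for the control policy). The sketch below illustrates that structure only; the toy dynamics f, the quadratic utility, the discount factor, and all network sizes and learning rates are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of action-dependent ADP via Q-function policy iteration,
# assuming a toy system and cost; not the paper's exact algorithm.
import torch

torch.manual_seed(0)
gamma = 0.95                              # discount factor (assumed)

def f(x, u):
    # Stand-in for the unknown discrete-time nonlinear plant (assumed).
    return 0.8 * torch.sin(x) + 0.5 * u

def utility(x, u):
    return x ** 2 + 0.1 * u ** 2          # quadratic stage cost (assumed)

# Two three-layer networks: critic approximates Q(x, u), actor the policy.
critic = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                             torch.nn.Linear(16, 1))
actor = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-2)
a_opt = torch.optim.Adam(actor.parameters(), lr=1e-2)

def Q(x, u):
    return critic(torch.cat([x, u], dim=1))

for it in range(50):                      # outer policy-iteration loop
    x = 4 * torch.rand(256, 1) - 2        # sampled states
    # Policy evaluation: fit Q to the Bellman equation under the current
    # policy, Q(x, u) ~ U(x, u) + gamma * Q(x', pi(x')).
    for _ in range(100):
        u = 2 * torch.rand(256, 1) - 1    # exploratory actions
        with torch.no_grad():
            xn = f(x, u)
            target = utility(x, u) + gamma * Q(xn, actor(xn))
        loss = ((Q(x, u) - target) ** 2).mean()
        c_opt.zero_grad()
        loss.backward()
        c_opt.step()
    # Policy improvement: drive pi(x) toward argmin_u Q(x, u).
    for _ in range(100):
        a_loss = Q(x, actor(x)).mean()
        a_opt.zero_grad()
        a_loss.backward()
        a_opt.step()
```

In this reading, the residual of the critic fit plays the role of the policy-evaluation error and the gap between the actor's output and the true minimizer plays the role of the policy-improvement error, which are the two quantities the paper's error bounds track.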
Hongliang Li, Bo Zhang, et al.
IBM J. Res. Dev.
Ding Wang, Derong Liu, et al.
IEEE Transactions on SMC: Systems