Mixing-time regularized policy gradient. Tetsuro Morimura, Takayuki Osogami, et al. 2014. AAAI 2014. Conference paper.