Modeling polarization for Hyper-NA lithography tools and masks
Kafai Lai, Alan E. Rosenbluth, et al.
SPIE Advanced Lithography 2007
We consider a class of multi-armed bandit problems in which the set of available actions can be mapped to a convex, compact region of ℝ^d, sometimes called the "continuum-armed bandit" problem. The paper establishes bounds on the efficiency of any arm-selection procedure under certain conditions on the class of possible underlying reward functions. We derive both finite-time lower bounds on the growth rate of the regret and asymptotic upper bounds on the rates of convergence of the selected control values to the optimum. We explicitly characterize the dependence of these convergence rates on the minimal rate of variation of the mean reward function in a neighborhood of the optimal control. The bounds can be used to demonstrate the asymptotic optimality of the Kiefer-Wolfowitz method of stochastic approximation with regard to a large class of possible mean reward functions. © 2009 IEEE.
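For readers unfamiliar with the Kiefer-Wolfowitz method named in the abstract, the following is a minimal one-dimensional sketch of the classical finite-difference recursion, not the paper's own procedure or analysis. The function names, the noisy quadratic reward, the interval [0, 1], and the step-size choices a_n = a/n and c_n = c/n^(1/6) are all illustrative assumptions. Clipping the two query points to the interval is one simple way to respect a compact action set, again an assumption.

```python
import random


def kiefer_wolfowitz(sample_reward, x0, lo, hi, n_steps, a=1.0, c=0.5, gamma=1 / 6):
    """Seek the maximizer of an unknown mean reward on [lo, hi] from noisy samples.

    Assumed (illustrative) schedules: gain a_n = a / n, probe width c_n = c / n**gamma.
    """
    x = min(hi, max(lo, x0))
    for n in range(1, n_steps + 1):
        a_n = a / n
        c_n = c / n ** gamma
        # Probe on either side of the current point, staying inside the action set.
        x_plus = min(hi, x + c_n)
        x_minus = max(lo, x - c_n)
        # Noisy finite-difference estimate of the slope of the mean reward.
        slope = (sample_reward(x_plus) - sample_reward(x_minus)) / (x_plus - x_minus)
        # Ascent step, projected back onto [lo, hi].
        x = min(hi, max(lo, x + a_n * slope))
    return x


def make_noisy_quadratic(peak, noise_sd, seed):
    """Hypothetical reward: concave quadratic peaked at `peak`, plus Gaussian noise."""
    rng = random.Random(seed)

    def sample_reward(x):
        return -(x - peak) ** 2 + rng.gauss(0.0, noise_sd)

    return sample_reward


estimate = kiefer_wolfowitz(
    make_noisy_quadratic(peak=0.3, noise_sd=0.1, seed=0),
    x0=0.8, lo=0.0, hi=1.0, n_steps=5000,
)
```

Each iteration spends two reward samples, so the regret in the bandit setting accrues at the probe points rather than at the running estimate itself.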
Rolf Clauberg
IBM J. Res. Dev
Sabine Deligne, Ellen Eide, et al.
INTERSPEECH - Eurospeech 2001
Nanda Kambhatla
ACL 2004