Harsha Kokel, Aamod Khatiwada, et al.
VLDB 2025
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q-learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box-pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions. (1) The learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program. (2) Using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks. © 1992.
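The abstract's idea of propagating reinforcement temporally with Q-learning and spatially with Hamming distance can be illustrated with a small sketch. This is not the paper's algorithm: the class name `HammingQLearner`, the parameters `alpha`, `gamma`, and `radius`, and the rule of applying the same update to every previously seen state within a fixed Hamming radius are all assumptions made for illustration. States are assumed to be tuples of sensor bits.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class HammingQLearner:
    """Tabular Q-learning with a hypothetical Hamming-distance generalization."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, radius=1):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.radius = radius    # max Hamming distance for spatial propagation
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.seen = set()            # states encountered so far

    def best_value(self, state):
        """Highest Q-value over all actions in the given state."""
        return max(self.q[(state, a)] for a in self.actions)

    def update(self, state, action, reward, next_state):
        """Apply one Q-learning step, shared with nearby seen states."""
        self.seen.update((state, next_state))
        # Temporal propagation: standard one-step Q-learning target.
        target = reward + self.gamma * self.best_value(next_state)
        # Spatial propagation: move every seen state within `radius`
        # bits of `state` toward the same target for this action.
        for s in self.seen:
            if hamming(s, state) <= self.radius:
                key = (s, action)
                self.q[key] += self.alpha * (target - self.q[key])
```

With `radius=0` this reduces to ordinary tabular Q-learning; a larger radius lets one rewarded experience update similar sensor states, at the risk of over-generalizing across states that should be treated differently.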
Arnold L. Rosenberg
Journal of the ACM
Yehuda Naveh, Michal Rimon, et al.
AAAI/IAAI 2006
Rama Akkiraju, Pinar Keskinocak, et al.
Applied Intelligence