R. Sebastian, M. Weise, et al.
ECPPM 2022
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error, using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q-learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box-pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions: (1) the learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program; and (2) using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
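The abstract's pairing of temporal and spatial reinforcement propagation can be made concrete with a short sketch. The following Python toy is a hypothetical illustration, not code from the paper: it applies the standard Q-learning update to the visited state and then spreads a similarity-weighted share of the same update to binary states within a small Hamming radius. All names (`hamming`, `neighbors`, `update`) and constants are illustrative assumptions.

```python
import itertools
from collections import defaultdict

# Hypothetical toy illustrating the two mechanisms named in the abstract:
# Q-learning propagates reinforcement temporally across actions, and a
# Hamming-distance neighborhood propagates it spatially across binary states.
# Names and constants here are assumptions, not taken from the paper.

ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor
ACTIONS = ("forward", "left", "right")
Q = defaultdict(float)           # Q[(state, action)] -> estimated value

def hamming(s1, s2):
    """Count of differing bits between two equal-length binary tuples."""
    return sum(a != b for a, b in zip(s1, s2))

def neighbors(state, radius=1):
    """Yield every binary state within `radius` bit flips of `state`."""
    for k in range(radius + 1):
        for idx in itertools.combinations(range(len(state)), k):
            flipped = list(state)
            for i in idx:
                flipped[i] = 1 - flipped[i]
            yield tuple(flipped)

def update(state, action, reward, next_state, radius=1):
    """One learning step: temporal Q-learning target, spatial Hamming spread."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next           # temporal propagation
    for s in neighbors(state, radius):            # spatial propagation
        weight = ALPHA / (1 + hamming(state, s))  # nearer states learn more
        Q[(s, action)] += weight * (target - Q[(s, action)])

# Example step on 4-bit sensor states (e.g. bump/IR readings).
update((1, 0, 1, 0), "forward", reward=1.0, next_state=(1, 1, 1, 0))
```

Statistical clustering, the paper's other spatial-generalization scheme, would replace the fixed Hamming radius with learned groupings of similar states; the sketch shows only the simpler Hamming variant.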
Michael Muller, Anna Kantosalo, et al.
CHI 2024
Paula Harder, Venkatesh Ramesh, et al.
EGU 2023
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010