Uncorrected least-squares temporal difference with lambda-return
Takayuki Osogami
AAAI 2020
The dynamic Boltzmann machine (DyBM) has recently been proposed as a model of an artificial recurrent neural network that can be trained via spike-timing-dependent plasticity (STDP), a learning rule that has been postulated and experimentally confirmed for biological neural networks. More specifically, the parameters of a DyBM, such as its weights and biases, can be optimized via this learning rule so that the likelihood of given time-series data is maximized (i.e., the time-series data are most likely to be generated according to the probability distributions defined by the DyBM with the optimized parameters). A DyBM, however, also has hyperparameters that need to be tuned by other means, including the decay rates of its eligibility traces and the lengths of its first-in-first-out queues. Here, we propose ways to learn the values of these hyperparameters. In particular, we tune the value of the decay rate via a stochastic gradient method with an approximation that keeps the learning time in each step independent of the length of the time series used for learning. We empirically demonstrate the effectiveness of the proposed approaches on tasks of predicting sequential patterns, including music.
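The sketch below is not the paper's implementation; it only illustrates, under simplifying assumptions, the kind of mechanism the abstract describes: an eligibility trace with a decay rate, updated recursively, together with a recursively maintained derivative of the trace with respect to the decay rate so that a stochastic-gradient update of the decay rate costs a constant amount of work per time step, regardless of the length of the time series. All names (EligibilityTrace, tune_decay_rate, the single-unit predictor) are hypothetical and chosen for illustration only.

```python
import numpy as np

class EligibilityTrace:
    """One eligibility trace with decay rate `lam`, plus a recursion for
    d(trace)/d(lam).  Both updates use only the previous step's values,
    so per-step cost does not grow with the length of the series."""

    def __init__(self, lam: float):
        self.lam = lam        # decay rate (the hyperparameter being tuned)
        self.alpha = 0.0      # trace value: alpha[t] = lam * (alpha[t-1] + x[t])
        self.dalpha = 0.0     # d alpha[t] / d lam, maintained recursively

    def step(self, x: float) -> float:
        # Derivative recursion (computed before alpha is overwritten):
        #   d alpha[t]/d lam = (alpha[t-1] + x) + lam * d alpha[t-1]/d lam
        self.dalpha = (self.alpha + x) + self.lam * self.dalpha
        self.alpha = self.lam * (self.alpha + x)
        return self.alpha


def tune_decay_rate(series, lam=0.5, w=1.0, lr=1e-3):
    """Toy stochastic-gradient tuning of the decay rate for a single-unit
    predictor x_hat[t] = w * alpha[t-1], minimizing squared prediction error.
    Updating lam online while reusing the stored derivative is itself an
    approximation; the paper's actual approximation may differ."""
    trace = EligibilityTrace(lam)
    trace.step(series[0])                  # trace now summarizes x[0]
    for t in range(1, len(series)):
        pred = w * trace.alpha             # predict x[t] from the trace of x[0..t-1]
        err = pred - series[t]
        grad_lam = err * w * trace.dalpha  # chain rule through the trace
        trace.lam = float(np.clip(trace.lam - lr * grad_lam, 0.0, 0.999))
        trace.step(series[t])              # fold in the new observation
    return trace.lam


if __name__ == "__main__":
    data = np.sin(np.linspace(0.0, 20.0, 500))
    print("tuned decay rate:", tune_decay_rate(data))
```

In this sketch the decay rate stays a valid discount in [0, 1) via clipping; a real implementation would also tune the remaining hyperparameters (e.g., the FIFO queue lengths mentioned in the abstract), which this toy example omits.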