Network of tensor time series
Baoyu Jing, Hanghang Tong, et al.
WWW 2021
Learning a high-performance trade execution model via reinforcement learning (RL) requires interaction with the real dynamic market, but the massive number of interactions demanded by direct RL incurs significant training overhead. In this paper, we propose a cost-efficient RL approach called Deep Dyna-Double Q-learning (D3Q), which integrates deep reinforcement learning and planning to reduce training overhead while improving trading performance. Specifically, D3Q includes a learnable market environment model that approximates market impact from real market experience, so that policy learning can also be driven by simulated experience from the learned environment. Meanwhile, we propose a novel state-balanced exploration scheme that corrects the exploration bias caused by the non-increasing residual inventory during trade execution, thereby accelerating learning. As demonstrated by our extensive experiments, the proposed D3Q framework significantly improves sample efficiency and also outperforms state-of-the-art methods in average trading cost.
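The Dyna-style loop the abstract describes (direct RL on real experience, plus planning backups from a learned environment model, with double Q-learning as the update rule) can be illustrated with a minimal tabular sketch. This is not the paper's method: the deep networks, the market-impact model, and the state-balanced exploration scheme are omitted, and ToyEnv plus all constants below are hypothetical placeholders.

    import random
    import numpy as np

    N_STATES, N_ACTIONS = 50, 5          # hypothetical toy sizes
    ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1   # hypothetical hyperparameters
    PLANNING_STEPS = 10                  # simulated backups per real step (the Dyna part)

    Q1 = np.zeros((N_STATES, N_ACTIONS)) # double Q-learning keeps two estimators
    Q2 = np.zeros((N_STATES, N_ACTIONS))
    model = {}                           # learned environment model: (s, a) -> (r, s', done)

    class ToyEnv:
        # Hypothetical stand-in for the real market environment.
        def step(self, s, a):
            s2 = (s + a - N_ACTIONS // 2 + random.choice([-1, 1])) % N_STATES
            r = -abs(s2 - N_STATES // 2) / N_STATES
            return s2, r, s2 == 0

    def double_q_update(s, a, r, s2, done):
        # Randomly pick which estimator to update; the other evaluates
        # the greedy action, which removes the maximization bias.
        A, B = (Q1, Q2) if random.random() < 0.5 else (Q2, Q1)
        best = int(np.argmax(A[s2]))
        target = r + (0.0 if done else GAMMA * B[s2, best])
        A[s, a] += ALPHA * (target - A[s, a])

    def real_step(env, s):
        # Epsilon-greedy action on the summed estimates.
        a = random.randrange(N_ACTIONS) if random.random() < EPS \
            else int(np.argmax(Q1[s] + Q2[s]))
        s2, r, done = env.step(s, a)
        model[(s, a)] = (r, s2, done)        # refine the environment model
        double_q_update(s, a, r, s2, done)   # direct RL from real experience
        for _ in range(PLANNING_STEPS):      # planning: replay model transitions
            (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
            double_q_update(ps, pa, pr, ps2, pdone)
        return s2, done

    env, s = ToyEnv(), N_STATES // 2
    for _ in range(2000):
        s, done = real_step(env, s)
        if done:
            s = N_STATES // 2                # reset the episode

The sample-efficiency gain comes from PLANNING_STEPS: each costly real interaction is amortized over many cheap simulated backups against the learned model.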
Taposh Banerjee, Miao Liu, et al.
ACC 2017
Pei Yang, Qi Tan, et al.
KAIS
Hongxia Yang, Yada Zhu, et al.
KDD 2017