Residential College | false |
Status | Published
Integrated Double Estimator Architecture for Reinforcement Learning | |
Lv, Pingli1,2; Wang, Xuesong1,3; Cheng, Yuhu1,3; Duan, Ziming4; Chen, C. L. Philip5,6 | |
2022-05-01 | |
Source Publication | IEEE Transactions on Cybernetics |
ABS Journal Level | 3 |
ISSN | 2168-2267 |
Volume | 52, Issue 5, Pages 3111-3122
Abstract | Estimation bias is an important index for evaluating the performance of reinforcement learning (RL) algorithms. Popular RL algorithms, such as Q-learning and deep Q-network (DQN), often suffer from overestimation due to the maximum operation used in estimating the maximum expected action values of the next states, while double Q-learning (DQ) and double DQN may fall into underestimation by using a double estimator (DE) to avoid overestimation. To keep the balance between overestimation and underestimation, we propose a novel integrated DE (IDE) architecture that combines the maximum operation and the DE operation to estimate the maximum expected action value. Based on IDE, two RL algorithms are proposed: 1) integrated DQ (IDQ) and 2) its deep network version, integrated double DQN (IDDQN). The main idea of the proposed RL algorithms is that the maximum and DE operations are integrated to eliminate the estimation bias: one estimator is stochastically chosen to perform action selection based on the maximum operation, and a convex combination of the two estimators is used to carry out action evaluation. We theoretically analyze the reason for the estimation bias caused by using a nonmaximum operation to estimate the maximum expected value, and investigate the possible reasons for underestimation in DQ. We also prove the unbiasedness of IDE and the convergence of IDQ. Experiments on grid-world and Atari 2600 games indicate that IDQ and IDDQN can reduce or even eliminate estimation bias, make learning more stable and balanced, and improve performance effectively.
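The abstract's core idea — one randomly chosen estimator selects the greedy next action via the maximum operation, while a convex combination of both estimators evaluates it — can be sketched as a tabular update. This is a minimal illustration, not the paper's exact algorithm: the combination weight `beta`, the choice of which table to update, and the function name `ide_update` are all assumptions for illustration.

```python
import numpy as np

def ide_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.99, beta=0.5, rng=None):
    """One tabular update following the integrated double-estimator idea:
    a stochastically chosen estimator performs max-based action selection,
    and a convex combination of both estimators performs action evaluation.
    qa, qb: two Q-tables of shape (n_states, n_actions)."""
    rng = np.random.default_rng() if rng is None else rng
    # stochastically pick the estimator that selects the action (and is updated)
    learner = qa if rng.random() < 0.5 else qb
    a_star = int(np.argmax(learner[s_next]))  # maximum operation for selection
    # convex combination of the two estimators for evaluation
    target = r + gamma * (beta * qa[s_next, a_star] + (1 - beta) * qb[s_next, a_star])
    learner[s, a] += alpha * (target - learner[s, a])
    return learner[s, a]
```

With `beta = 1` (or `0`) and the evaluation tied to the non-selecting table, this would collapse toward the standard double-estimator update; the intermediate convex weight is what lets the architecture trade off overestimation against underestimation.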
DOI | 10.1109/TCYB.2020.3023033 |
URL | View the original |
Language | English
Scopus ID | 2-s2.0-85118651347 |
Fulltext Access | |
Citation statistics | |
Document Type | Journal article |
Collection | Faculty of Science and Technology / Department of Computer and Information Science
Affiliation | 1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China
2. School of Information and Electronics Engineering, Jiangsu Vocational Institute of Architectural Technology, Xuzhou, 221116, China
3. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
4. School of Mathematics, China University of Mining and Technology, Xuzhou, 221116, China
5. Faculty of Science and Technology, University of Macau, Macao SAR, 99999
6. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
Recommended Citation GB/T 7714 | Lv, Pingli, Wang, Xuesong, Cheng, Yuhu, et al. Integrated Double Estimator Architecture for Reinforcement Learning[J]. IEEE Transactions on Cybernetics, 2022, 52(5): 3111-3122.
APA | Lv, Pingli, Wang, Xuesong, Cheng, Yuhu, Duan, Ziming, & Chen, C. L. Philip (2022). Integrated Double Estimator Architecture for Reinforcement Learning. IEEE Transactions on Cybernetics, 52(5), 3111-3122.
MLA | Lv, Pingli, et al. "Integrated Double Estimator Architecture for Reinforcement Learning." IEEE Transactions on Cybernetics 52.5 (2022): 3111-3122.
Files in This Item: | There are no files associated with this item. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.