Residential College: No
Status: Published
Integrated Double Estimator Architecture for Reinforcement Learning
Lv, Pingli (1,2); Wang, Xuesong (1,3); Cheng, Yuhu (1,3); Duan, Ziming (4); Chen, C. L. Philip (5,6)
Publication Date: 2022-05-01
Source Publication: IEEE Transactions on Cybernetics
ABS Journal Level: 3
ISSN: 2168-2267
Volume: 52, Issue: 5, Pages: 3111-3122
Abstract

Estimation bias is an important index for evaluating the performance of reinforcement learning (RL) algorithms. Popular RL algorithms, such as Q-learning and the deep Q-network (DQN), often suffer from overestimation due to the maximum operation used to estimate the maximum expected action values of the next states, while double Q-learning (DQ) and double DQN may fall into underestimation by using a double estimator (DE) to avoid overestimation. To keep the balance between overestimation and underestimation, we propose a novel integrated DE (IDE) architecture that combines the maximum operation and the DE operation to estimate the maximum expected action value. Based on IDE, two RL algorithms are proposed: 1) integrated DQ (IDQ) and 2) its deep network version, integrated double DQN (IDDQN). The main idea of the proposed algorithms is that the maximum and DE operations are integrated to eliminate the estimation bias: one estimator is stochastically chosen to perform action selection based on the maximum operation, and a convex combination of the two estimators is used to carry out action evaluation. We theoretically analyze why estimation bias arises when a nonmaximum operation is used to estimate the maximum expected value, and investigate the possible reasons for the underestimation observed in DQ. We also prove the unbiasedness of IDE and the convergence of IDQ. Experiments on grid-world and Atari 2600 games indicate that IDQ and IDDQN can reduce or even eliminate estimation bias, make learning more stable and balanced, and improve performance effectively.
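
The abstract describes the IDE update at a high level. Below is a minimal tabular sketch of that idea, assuming a grid-world-style setting; the 50/50 coin flip for selection, the weight `beta` of the convex combination, and the choice of which table to update are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Minimal tabular sketch of the integrated double estimator (IDE) update
# described in the abstract: one of two estimators is stochastically picked
# to SELECT the greedy next action (the maximum operation), and a convex
# combination of BOTH estimators EVALUATES that action. The coin flip, the
# weight `beta`, and which table gets updated are assumptions for
# illustration only.

def ide_q_update(q_a, q_b, s, a, r, s_next, done,
                 alpha=0.1, gamma=0.99, beta=0.5, rng=None):
    """Apply one IDQ-style update to the tables q_a, q_b (states x actions)."""
    rng = rng if rng is not None else np.random.default_rng()

    # Stochastic action selection: one estimator performs the max operation.
    selector = q_a if rng.random() < 0.5 else q_b
    a_star = int(np.argmax(selector[s_next]))

    # Action evaluation: convex combination of the two estimators, which is
    # the "integrated" part meant to balance over- and underestimation.
    bootstrap = beta * q_a[s_next, a_star] + (1.0 - beta) * q_b[s_next, a_star]
    target = r if done else r + gamma * bootstrap

    # Update the estimator that performed the selection (an assumption;
    # double Q-learning similarly updates the selecting estimator).
    selector[s, a] += alpha * (target - selector[s, a])


# Example setup: q_a = np.zeros((n_states, n_actions)); q_b = np.zeros_like(q_a)
```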

DOI: 10.1109/TCYB.2020.3023033
Language: English
Scopus ID: 2-s2.0-85118651347
Document Type: Journal article
Collection: Faculty of Science and Technology, Department of Computer and Information Science
Affiliations:
1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
2. School of Information and Electronics Engineering, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 221116, China
3. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
4. School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China
5. Faculty of Science and Technology, University of Macau, Macao SAR, China
6. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
Recommended Citation
GB/T 7714: Lv, Pingli, Wang, Xuesong, Cheng, Yuhu, et al. Integrated Double Estimator Architecture for Reinforcement Learning[J]. IEEE Transactions on Cybernetics, 2022, 52(5): 3111-3122.
APA: Lv, P., Wang, X., Cheng, Y., Duan, Z., & Chen, C. L. P. (2022). Integrated Double Estimator Architecture for Reinforcement Learning. IEEE Transactions on Cybernetics, 52(5), 3111-3122.
MLA: Lv, Pingli, et al. "Integrated Double Estimator Architecture for Reinforcement Learning." IEEE Transactions on Cybernetics 52.5 (2022): 3111-3122.
Related Services
Similar articles in Google Scholar
Similar articles in Baidu Academic
Similar articles in Bing Scholar
Authors' articles: [Lv, Pingli], [Wang, Xuesong], [Cheng, Yuhu]

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.