Residential College | false |
Status | Published
Integrated Double Estimator Architecture for Reinforcement Learning | |
Lv, Pingli1,2; Wang, Xuesong1,3; Cheng, Yuhu1,3; Duan, Ziming4; Chen, C. L. Philip5,6 | |
2022-05-01 | |
Source Publication | IEEE Transactions on Cybernetics |
ABS Journal Level | 3 |
ISSN | 2168-2267 |
Volume | 52, Issue 5, Pages 3111-3122
Abstract | Estimation bias is an important index for evaluating the performance of reinforcement learning (RL) algorithms. Popular RL algorithms, such as Q-learning and deep Q-network (DQN), often suffer from overestimation due to the maximum operation used in estimating the maximum expected action values of the next states, while double Q-learning (DQ) and double DQN may fall into underestimation by using a double estimator (DE) to avoid overestimation. To keep the balance between overestimation and underestimation, we propose a novel integrated DE (IDE) architecture that combines the maximum operation and the DE operation to estimate the maximum expected action value. Based on IDE, two RL algorithms are proposed: 1) integrated DQ (IDQ) and 2) its deep network version, integrated double DQN (IDDQN). The main idea of the proposed RL algorithms is that the maximum and DE operations are integrated to eliminate the estimation bias: one estimator is stochastically chosen to perform action selection based on the maximum operation, and a convex combination of the two estimators is used to carry out action evaluation. We theoretically analyze the reason for the estimation bias caused by using a nonmaximum operation to estimate the maximum expected value, and investigate the possible reasons for underestimation in DQ. We also prove the unbiasedness of IDE and the convergence of IDQ. Experiments on grid-world and Atari 2600 games indicate that IDQ and IDDQN can reduce or even eliminate estimation bias, make learning more stable and balanced, and improve performance effectively.
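The abstract's core idea — one randomly chosen estimator selects the greedy next action via the maximum operation, while a convex combination of both estimators evaluates it — can be sketched as a tabular update. This is a minimal illustration, not the paper's exact algorithm: the combination weight `beta`, the choice of which table to update, and the function name `ide_update` are all assumptions for illustration.

```python
import numpy as np

def ide_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.99, beta=0.5, rng=None):
    """One tabular update following the integrated double-estimator idea:
    a stochastically chosen estimator performs max-based action selection,
    and a convex combination of both estimators performs action evaluation.
    qa, qb: two Q-tables of shape (n_states, n_actions)."""
    rng = np.random.default_rng() if rng is None else rng
    # stochastically pick the estimator that selects the action (and is updated)
    learner = qa if rng.random() < 0.5 else qb
    a_star = int(np.argmax(learner[s_next]))  # maximum operation for selection
    # convex combination of the two estimators for evaluation
    target = r + gamma * (beta * qa[s_next, a_star] + (1 - beta) * qb[s_next, a_star])
    learner[s, a] += alpha * (target - learner[s, a])
    return learner[s, a]
```

With `beta = 1` (or `0`) and the evaluation tied to the non-selecting table, this would collapse toward the standard double-estimator update; the intermediate convex weight is what lets the architecture trade off overestimation against underestimation.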
DOI | 10.1109/TCYB.2020.3023033 |
URL | View the original |
Language | English
Scopus ID | 2-s2.0-85118651347 |
Fulltext Access | |
Citation statistics | |
Document Type | Journal article |
Collection | Faculty of Science and Technology / Department of Computer and Information Science
Affiliation | 1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China
2. School of Information and Electronics Engineering, Jiangsu Vocational Institute of Architectural Technology, Xuzhou, 221116, China
3. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
4. School of Mathematics, China University of Mining and Technology, Xuzhou, 221116, China
5. Faculty of Science and Technology, University of Macau, Macao SAR, 99999
6. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
Recommended Citation GB/T 7714 | Lv, Pingli, Wang, Xuesong, Cheng, Yuhu, et al. Integrated Double Estimator Architecture for Reinforcement Learning[J]. IEEE Transactions on Cybernetics, 2022, 52(5): 3111-3122.
APA | Lv, Pingli, Wang, Xuesong, Cheng, Yuhu, Duan, Ziming, & Chen, C. L. Philip (2022). Integrated Double Estimator Architecture for Reinforcement Learning. IEEE Transactions on Cybernetics, 52(5), 3111-3122.
MLA | Lv, Pingli, et al. "Integrated Double Estimator Architecture for Reinforcement Learning." IEEE Transactions on Cybernetics 52.5 (2022): 3111-3122.
Files in This Item: | There are no files associated with this item. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.