Model-Free λ-Policy Iteration for Discrete-Time Linear Quadratic Regulation

doi:10.1109/TNNLS.2021.3098985


Residential College	false
Status	Published
Title	Model-Free λ-Policy Iteration for Discrete-Time Linear Quadratic Regulation
Authors	Yang, Yongliang (1); Kiumarsi, Bahare (2); Modares, Hamidreza (3); Xu, Chengzhong (4)
Publication Date	2023-02-01
Source Publication	IEEE Transactions on Neural Networks and Learning Systems
ISSN	2162-237X
Volume	34, Issue 2, Pages 635-649
Abstract	This article presents a model-free λ-policy iteration (λ-PI) algorithm for the discrete-time linear quadratic regulation (LQR) problem. To solve the algebraic Riccati equation (ARE) associated with the LQR problem iteratively, we define two novel matrix operators: the weighted Bellman operator and the composite Bellman operator. The λ-PI algorithm is first designed as a recursion with the weighted Bellman operator, and it is then shown to be equivalent to a fixed-point iteration with the composite Bellman operator. The contraction and monotonicity properties of the composite Bellman operator guarantee the convergence of the λ-PI algorithm. Unlike the policy iteration (PI) algorithm, λ-PI does not require an admissible initial policy, and its convergence rate is faster than that of the value iteration (VI) algorithm. A model-free extension of the λ-PI algorithm is developed using the off-policy reinforcement learning technique, and the off-policy variants of the λ-PI algorithm are shown to be robust against probing noise. Finally, simulation examples validate the efficacy of the λ-PI algorithm.
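The abstract describes λ-PI only at the level of operators. As a hedged illustration, the Python sketch below applies the classical λ-policy-iteration step from the dynamic-programming literature to a scalar discrete-time LQR problem x' = a x + b u with stage cost q x^2 + r u^2: the greedy gain k is computed from the current value coefficient p, and the new p is the fixed point of P -> (1 - λ) T_k(p) + λ T_k(P), where T_k(P) = q + r k^2 + (a - b k)^2 P. That this matches the article's weighted and composite Bellman operators is an assumption; the scalar restriction, the system values, and all function names (greedy_gain, lambda_pi_step, lambda_pi, dare_scalar) are hypothetical, and the article's model-free, off-policy variant is not reproduced.

```python
"""Sketch: model-based lambda-policy iteration for a *scalar* discrete-time LQR.

Assumptions (illustrative, not taken from the article): system x' = a*x + b*u,
stage cost q*x^2 + r*u^2, linear policy u = -k*x, and the classical lambda-PI
step: p_next is the fixed point of P -> (1 - lam)*T_k(p) + lam*T_k(P), where
T_k is the Bellman operator of the greedy policy k computed from p.
"""
import math


def greedy_gain(p, a, b, r):
    # Policy improvement: u = -k*x minimizes r*u^2 + p*(a*x + b*u)^2.
    return a * b * p / (r + b * b * p)


def lambda_pi_step(p, a, b, q, r, lam):
    k = greedy_gain(p, a, b, r)
    ac = a - b * k  # closed-loop coefficient under the greedy policy
    if lam * ac * ac >= 1.0:
        # The lambda-weighted evaluation equation has no positive solution.
        raise ValueError("lam * (a - b*k)^2 must be < 1")
    # Solve P = (1 - lam)*(q + r*k^2 + ac^2*p) + lam*(q + r*k^2 + ac^2*P) for P.
    # lam = 0 gives value iteration; lam = 1 gives exact policy evaluation (PI).
    return (q + r * k * k + (1.0 - lam) * ac * ac * p) / (1.0 - lam * ac * ac)


def lambda_pi(a, b, q, r, lam, p0=0.0, tol=1e-12, max_iter=1000):
    # Iterate lambda-PI steps until successive value coefficients agree to tol.
    p = p0
    for it in range(1, max_iter + 1):
        p_next = lambda_pi_step(p, a, b, q, r, lam)
        if abs(p_next - p) < tol:
            return p_next, greedy_gain(p_next, a, b, r), it
        p = p_next
    raise RuntimeError("lambda-PI did not converge")


def dare_scalar(a, b, q, r):
    # Positive root of b^2 p^2 + (r - q*b^2 - a^2*r) p - q*r = 0 (scalar DARE).
    c1 = r - q * b * b - a * a * r
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # open loop is unstable since |a| > 1
    p_star = dare_scalar(a, b, q, r)
    for lam in (0.0, 0.5):  # lam = 0 reduces to value iteration
        p, k, its = lambda_pi(a, b, q, r, lam)
        print(f"lam={lam}: p={p:.9f} (DARE {p_star:.9f}), k={k:.6f}, iterations={its}")
```

In this sketch, λ = 1 (policy iteration) raises a ValueError when started from p = 0, because the zero gain does not stabilize a = 1.2, while λ = 0.5 starts from the same non-stabilizing gain and still reaches the DARE solution. That only holds while λ(a - b k)^2 < 1 at every step, a condition the code checks rather than guarantees.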
Keyword	Algebraic Riccati equation (ARE); Fixed-point theory; Off-policy reinforcement learning (RL); Optimal control
DOI	10.1109/TNNLS.2021.3098985
Indexed By	SCIE
Language	英語English
WOS Research Area	Computer Science; Engineering
WOS Subject	Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic
WOS ID	WOS:000732146500001
Scopus ID	2-s2.0-85148473067
Document Type	Journal article
Collection	Faculty of Science and Technology
Affiliation	1. University of Macau, State Key Laboratory of IoTSC, Taipa, 999078, Macao; 2. Michigan State University, Department of Electrical and Computer Engineering, East Lansing, 48824, United States; 3. Michigan State University, Department of Mechanical Engineering, East Lansing, 48824, United States; 4. University of Macau, State Key Laboratory of IoTSC, Taipa, Macao
First Author Affiliation	University of Macau
Recommended Citation GB/T 7714	Yang, Yongliang, Kiumarsi, Bahare, Modares, Hamidreza, et al. Model-Free λ-Policy Iteration for Discrete-Time Linear Quadratic Regulation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(2): 635-649.
APA	Yang, Y., Kiumarsi, B., Modares, H., & Xu, C. (2023). Model-free λ-policy iteration for discrete-time linear quadratic regulation. IEEE Transactions on Neural Networks and Learning Systems, 34(2), 635-649. https://doi.org/10.1109/TNNLS.2021.3098985
MLA	Yang, Yongliang, et al. "Model-Free λ-Policy Iteration for Discrete-Time Linear Quadratic Regulation." IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 2, 2023, pp. 635-649.
