Status | Published |
Self-Supervised Imitation for Offline Reinforcement Learning With Hindsight Relabeling | |
Yu, Xudong1; Bai, Chenjia2; Wang, Changhong1; Yu, Dengxiu3; Chen, C. L. Philip4,5 | |
2023-12-01 | |
Source Publication | IEEE Transactions on Systems, Man, and Cybernetics: Systems |
ABS Journal Level | 3 |
ISSN | 2168-2216 |
Volume | 53, Issue: 12, Pages: 7732-7743 |
Abstract | Reinforcement learning (RL) requires a lot of interactions with the environment, which is usually expensive or dangerous in real-world tasks. To address this problem, offline RL considers learning policies from fixed datasets, which is promising in utilizing large-scale datasets, but still suffers from the unstable estimation for out-of-distribution data. Recent developments in RL via supervised learning methods offer an alternative to learning effective policies from suboptimal datasets while relying on oracle information from the environment. In this article, we present an offline RL algorithm that combines hindsight relabeling and supervised regression to predict actions without oracle information. We use hindsight relabeling on the original dataset and learn a command generator and command-conditional policies in a supervised manner, where the command represents the desired return or goal location according to the corresponding task. Theoretically, we illustrate that our method optimizes the lower bound of the goal-conditional RL objective. Empirically, our method achieves competitive performance in comparison with existing approaches in the sparse reward setting and favorable performance in continuous control tasks. |
Keyword | Hindsight Relabeling ; Offline Reinforcement Learning (RL) ; Supervised Learning |
DOI | 10.1109/TSMC.2023.3297711 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Automation & Control Systems ; Computer Science |
WOS Subject | Automation & Control Systems ; Computer Science, Cybernetics |
WOS ID | WOS:001069562300001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141 |
Scopus ID | 2-s2.0-85168736234 |
Document Type | Journal article |
Collection | Faculty of Science and Technology DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Wang, Zhen |
Affiliation | 1. Harbin Institute of Technology, Space Control and Inertial Technology Research Center, Harbin, 150001, China; 2. Shanghai Artificial Intelligence Laboratory, Fundamental Theory, Shanghai, 200232, China; 3. Unmanned System Research Institute, Northwestern Polytechnical University, Xi'an, 710072, China; 4. South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China; 5. University of Macau, Faculty of Science and Technology, Macao; 6. Northwestern Polytechnical University, School of Artificial Intelligence, Optics and Electronics and the School of Cyberspace, Xi'an, 710072, China |
Recommended Citation GB/T 7714 | Yu, Xudong, Bai, Chenjia, Wang, Changhong, et al. Self-Supervised Imitation for Offline Reinforcement Learning With Hindsight Relabeling[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(12), 7732-7743. |
APA | Yu, Xudong., Bai, Chenjia., Wang, Changhong., Yu, Dengxiu., Chen, C. L. Philip., & Wang, Zhen (2023). Self-Supervised Imitation for Offline Reinforcement Learning With Hindsight Relabeling. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(12), 7732-7743. |
MLA | Yu, Xudong, et al. "Self-Supervised Imitation for Offline Reinforcement Learning With Hindsight Relabeling." IEEE Transactions on Systems, Man, and Cybernetics: Systems 53.12 (2023): 7732-7743. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.