Residential College: false
Status: Published
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation
Cheng, Wenhao (1); Dong, Xingping (2); Khan, Salman (3); Shen, Jianbing (4)
2022-10-29
Conference Name: 17th European Conference on Computer Vision (ECCV)
Source Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13696
Pages: 309-329
Conference Date: Oct 23-27, 2022
Conference Place: Tel Aviv, Israel
Publisher: SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY
Abstract

Vision-and-Language Navigation (VLN) requires an agent to follow complex natural language instructions and perceive the visual environment for real-world navigation. Intuitively, we find that instruction disentanglement for each viewpoint along the agent’s path is critical for accurate navigation. However, most methods use only the whole complex instruction or inaccurate sub-instructions, because they lack accurate disentanglement as an intermediate supervision stage. To address this problem, we propose a new Disentanglement framework with Decoupled Labels (DDL) for VLN. First, we manually extend the benchmark dataset Room-to-Room (R2R) with landmark- and action-aware labels to provide fine-grained information for each viewpoint. Furthermore, to enhance generalization ability, we propose a Decoupled Label Speaker module that generates pseudo-labels for augmented data and reinforcement training. To make full use of the proposed fine-grained labels, we design a Disentangled Decoding Module that guides discriminative feature extraction and aids multi-modal alignment. To demonstrate the generality of our proposed method, we apply it to an LSTM-based model and two recent Transformer-based models. Extensive experiments on two VLN benchmarks (i.e., R2R and R4R) demonstrate the effectiveness of our approach, which achieves better performance than previous state-of-the-art methods.

Keywords: Disentanglement; Imitation/Reinforcement Learning; LSTM and Transformer; Modular Network; Vision-and-Language Navigation
DOI: 10.1007/978-3-031-20059-5_18
Indexed By: CPCI-S
Language: English
WOS Research Area: Computer Science; Imaging Science & Photographic Technology
WOS Subject: Computer Science, Artificial Intelligence; Imaging Science & Photographic Technology
WOS ID: WOS:000903751800018
Scopus ID: 2-s2.0-85142667748
Document Type: Conference paper
Collection: Faculty of Science and Technology
THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Shen, Jianbing
Affiliation:
1. School of Computer Science, Beijing Institute of Technology, Beijing, China
2. Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
3. Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
4. SKL-IOTSC, Computer and Information Science, University of Macau, Macao
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Cheng, Wenhao, Dong, Xingping, Khan, Salman, et al. Learning Disentanglement with Decoupled Labels for Vision-Language Navigation[C]: SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY, 2022, 309-329.
APA: Cheng, W., Dong, X., Khan, S., & Shen, J. (2022). Learning Disentanglement with Decoupled Labels for Vision-Language Navigation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 13696, 309-329.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.