Learning Disentanglement with Decoupled Labels for Vision-Language Navigation

doi:10.1007/978-3-031-20059-5_18

Status	Published
Title	Learning Disentanglement with Decoupled Labels for Vision-Language Navigation
Author	Cheng, Wenhao 1; Dong, Xingping 2; Khan, Salman 3; Shen, Jianbing 4
Date	2022-10-29
Conference Name	17th European Conference on Computer Vision (ECCV)
Source Publication	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13696
Pages	309-329
Conference Date	OCT 23-27, 2022
Conference Place	Tel Aviv, ISRAEL
Publisher	SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY
Abstract	Vision-and-Language Navigation (VLN) requires an agent to follow complex natural language instructions and perceive the visual environment for real-world navigation. Intuitively, we find that instruction disentanglement for each viewpoint along the agent’s path is critical for accurate navigation. However, most methods use only the whole complex instruction or inaccurate sub-instructions, because they lack accurate disentanglement as an intermediate supervision stage. To address this problem, we propose a new Disentanglement framework with Decoupled Labels (DDL) for VLN. First, we manually extend the benchmark dataset Room-to-Room with landmark- and action-aware labels to provide fine-grained information for each viewpoint. Furthermore, to enhance generalization, we propose a Decoupled Label Speaker module that generates pseudo-labels for augmented data and reinforcement training. To make full use of the proposed fine-grained labels, we design a Disentangled Decoding Module that guides discriminative feature extraction and helps align the modalities. To show the generality of the proposed method, we apply it to an LSTM-based model and two recent Transformer-based models. Extensive experiments on two VLN benchmarks (i.e., R2R and R4R) demonstrate the effectiveness of our approach, which achieves better performance than previous state-of-the-art methods.
Keyword	Disentanglement ; Imitation/Reinforcement Learning ; LSTM and Transformer ; Modular Network ; Vision-and-Language Navigation
DOI	10.1007/978-3-031-20059-5_18
Indexed By	CPCI-S
Language	English
WOS Research Area	Computer Science ; Imaging Science & Photographic Technology
WOS Subject	Computer Science, Artificial Intelligence ; Imaging Science & Photographic Technology
WOS ID	WOS:000903751800018
Scopus ID	2-s2.0-85142667748
Document Type	Conference paper
Collection	Faculty of Science and Technology ; The State Key Laboratory of Internet of Things for Smart City (University of Macau) ; Department of Computer and Information Science
Corresponding Author	Shen, Jianbing
Affiliation	1. School of Computer Science, Beijing Institute of Technology, Beijing, China; 2. Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates; 3. Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; 4. SKL-IOTSC, Computer and Information Science, University of Macau, Macao
Corresponding Author Affiliation	University of Macau
Recommended Citation GB/T 7714	Cheng, Wenhao,Dong, Xingping,Khan, Salman,et al. Learning Disentanglement with Decoupled Labels for Vision-Language Navigation[C]:SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY, 2022, 309-329.
APA	Cheng, Wenhao., Dong, Xingping., Khan, Salman., & Shen, Jianbing (2022). Learning Disentanglement with Decoupled Labels for Vision-Language Navigation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 13696, 309-329.
