Multimedia Generative Script Learning for Task Planning

Status	Published
Title	Multimedia Generative Script Learning for Task Planning
Authors	Wang, Qingyun 1; Li, Manling 1; Chan, Hou Pong 2; Huang, Lifu 3; Hockenmaier, Julia 1; Chowdhary, Girish 1; Ji, Heng 1
Year	2023
Conference Name	61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Source Publication	Proceedings of the Annual Meeting of the Association for Computational Linguistics
Pages	986-1006
Conference Date	9 July 2023 through 14 July 2023
Conference Place	Toronto
Publisher	Association for Computational Linguistics (ACL)
Abstract	Goal-oriented generative script learning aims to generate subsequent steps to reach a particular goal, which is an essential task to assist robots or humans in performing stereotypical activities. An important aspect of this process is the ability to capture historical states visually, which provides detailed information that is not covered by text and will guide subsequent steps. Therefore, we propose a new task, Multimedia Generative Script Learning, to generate subsequent steps by tracking historical states in both text and vision modalities, as well as presenting the first benchmark containing 5,652 tasks and 79,089 multimedia steps. This task is challenging in three aspects: the multimedia challenge of capturing the visual states in images, the induction challenge of performing unseen tasks, and the diversity challenge of covering different information in individual steps. We propose to encode visual state changes through a selective multimedia encoder to address the multimedia challenge, transfer knowledge from previously observed tasks using a retrieval-augmented decoder to overcome the induction challenge, and further present distinct information at each step by optimizing a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation and inductive quality. Experiment results demonstrate that our approach significantly outperforms strong baselines.
Language	English
Scopus ID	2-s2.0-85173080203
Document Type	Conference paper
Collection	University of Macau
Affiliation	1. University of Illinois, Urbana-Champaign, United States; 2. University of Macau, Macao; 3. Virginia Tech, United States
Recommended Citation GB/T 7714	Wang, Qingyun,Li, Manling,Chan, Hou Pong,et al. Multimedia Generative Script Learning for Task Planning[C]:Association for Computational Linguistics (ACL), 2023, 986-1006.
APA	Wang, Q., Li, M., Chan, H. P., Huang, L., Hockenmaier, J., Chowdhary, G., & Ji, H. (2023). Multimedia Generative Script Learning for Task Planning. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 986-1006.
