Status: Published
Multimedia Generative Script Learning for Task Planning
Wang, Qingyun (1); Li, Manling (1); Chan, Hou Pong (2); Huang, Lifu (3); Hockenmaier, Julia (1); Chowdhary, Girish (1); Ji, Heng (1)
2023
Conference Name: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Source Publication: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Pages: 986-1006
Conference Date: 9 July 2023 through 14 July 2023
Conference Place: Toronto
Publisher: Association for Computational Linguistics (ACL)
Abstract

Goal-oriented generative script learning aims to generate the subsequent steps needed to reach a particular goal, an essential capability for assisting robots or humans in performing stereotypical activities. An important aspect of this process is the ability to capture historical states visually, since images provide detailed information that is not covered by text and that guides subsequent steps. We therefore propose a new task, Multimedia Generative Script Learning, which generates subsequent steps by tracking historical states in both the text and vision modalities, and we present the first benchmark for it, containing 5,652 tasks and 79,089 multimedia steps. The task is challenging in three respects: the multimedia challenge of capturing the visual states in images, the induction challenge of performing unseen tasks, and the diversity challenge of covering different information in individual steps. To address the multimedia challenge, we encode visual state changes through a selective multimedia encoder; to overcome the induction challenge, we transfer knowledge from previously observed tasks using a retrieval-augmented decoder; and to present distinct information at each step, we optimize a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation quality and inductive quality. Experimental results demonstrate that our approach significantly outperforms strong baselines.

Language: English
Scopus ID: 2-s2.0-85173080203
Document Type: Conference paper
Collection: University of Macau
Affiliations:
1. University of Illinois, Urbana-Champaign, United States
2. University of Macau, Macao
3. Virginia Tech, United States
Recommended Citation
GB/T 7714
Wang Qingyun, Li Manling, Chan Hou Pong, et al. Multimedia Generative Script Learning for Task Planning[C]. Association for Computational Linguistics (ACL), 2023: 986-1006.
APA
Wang, Q., Li, M., Chan, H. P., Huang, L., Hockenmaier, J., Chowdhary, G., & Ji, H. (2023). Multimedia generative script learning for task planning. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 986-1006.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.