Residential College | false |
Status | Published |
Title | Multimedia Generative Script Learning for Task Planning |
Author | Wang, Qingyun (1); Li, Manling (1); Chan, Hou Pong (2); Huang, Lifu (3); Hockenmaier, Julia (1); Chowdhary, Girish (1); Ji, Heng (1) |
Publication Year | 2023 |
Conference Name | 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 |
Source Publication | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
Pages | 986-1006 |
Conference Date | 9 July 2023 through 14 July 2023 |
Conference Place | Toronto |
Publisher | Association for Computational Linguistics (ACL) |
Abstract | Goal-oriented generative script learning aims to generate subsequent steps to reach a particular goal, which is an essential task to assist robots or humans in performing stereotypical activities. An important aspect of this process is the ability to capture historical states visually, which provides detailed information that is not covered by text and will guide subsequent steps. Therefore, we propose a new task, Multimedia Generative Script Learning, to generate subsequent steps by tracking historical states in both text and vision modalities, and we present the first benchmark, containing 5,652 tasks and 79,089 multimedia steps. This task is challenging in three aspects: the multimedia challenge of capturing the visual states in images, the induction challenge of performing unseen tasks, and the diversity challenge of covering different information in individual steps. We propose to encode visual state changes through a selective multimedia encoder to address the multimedia challenge, transfer knowledge from previously observed tasks using a retrieval-augmented decoder to overcome the induction challenge, and further present distinct information at each step by optimizing a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation and inductive quality. Experimental results demonstrate that our approach significantly outperforms strong baselines. |
Language | English |
Scopus ID | 2-s2.0-85173080203 |
Document Type | Conference paper |
Collection | University of Macau |
Affiliation | 1. University of Illinois, Urbana-Champaign, United States; 2. University of Macau, Macao; 3. Virginia Tech, United States |
Recommended Citation GB/T 7714 | Wang, Qingyun, Li, Manling, Chan, Hou Pong, et al. Multimedia Generative Script Learning for Task Planning[C]: Association for Computational Linguistics (ACL), 2023: 986-1006. |
APA | Wang, Q., Li, M., Chan, H. P., Huang, L., Hockenmaier, J., Chowdhary, G., & Ji, H. (2023). Multimedia Generative Script Learning for Task Planning. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 986-1006. |