Residential College | false |
Status | Published |
Title | Visual In-Context Learning for Large Vision-Language Models |
Authors | Zhou, Yucheng¹; Li, Xiang²; Wang, Qianning³; Shen, Jianbing¹ |
Year | 2024 |
Conference Name | 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 |
Source Publication | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
Pages | 15890-15902 |
Conference Date | 11-16 August 2024 |
Conference Place | Hybrid, Bangkok |
Country | Thailand |
Publisher | Association for Computational Linguistics (ACL) |
Abstract | In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a “Retrieval & Rerank” paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate the effectiveness of our method, and investigate the impact of the length and position of demonstrations for LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining. |
DOI | 10.48550/arXiv.2402.11574 |
Language | English |
Scopus ID | 2-s2.0-85205302832 |
Document Type | Conference paper |
Collection | THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU) |
Corresponding Author | Shen, Jianbing |
Affiliation | 1. SKL-IOTSC, CIS, University of Macau, Macao; 2. Tianjin University, China; 3. Nanjing Audit University, China |
First Author Affiliation | University of Macau |
Corresponding Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Zhou, Yucheng, Li, Xiang, Wang, Qianning, et al. Visual In-Context Learning for Large Vision-Language Models[C]: Association for Computational Linguistics (ACL), 2024: 15890-15902. |
APA | Zhou, Y., Li, X., Wang, Q., & Shen, J. (2024). Visual In-Context Learning for Large Vision-Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 15890-15902. |