Visual In-Context Learning for Large Vision-Language Models

doi:10.48550/arXiv.2402.11574

Residential College	false
Status	Published
Title	Visual In-Context Learning for Large Vision-Language Models
Author	Zhou, Yucheng 1; Li, Xiang 2; Wang, Qianning 3; Shen, Jianbing 1
Year	2024
Conference Name	62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Source Publication	Proceedings of the Annual Meeting of the Association for Computational Linguistics
Pages	15890-15902
Conference Date	11-16 August 2024
Conference Place	Hybrid, Bangkok
Country	Thailand
Publisher	Association for Computational Linguistics (ACL)
Abstract	In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a “Retrieval & Rerank” paradigm, summarises images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate the effectiveness of our method, and investigate the impact of the length and position of demonstrations for LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.
DOI	10.48550/arXiv.2402.11574
Language	English
Scopus ID	2-s2.0-85205302832
Document Type	Conference paper
Collection	THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
Corresponding Author	Shen, Jianbing
Affiliation	1.SKL-IOTSC, CIS, University of Macau, Macao 2.Tianjin University, China 3.Nanjing Audit University, China
First Author Affiliation	University of Macau
Corresponding Author Affiliation	University of Macau
Recommended Citation GB/T 7714	Zhou, Yucheng, Li, Xiang, Wang, Qianning, et al. Visual In-Context Learning for Large Vision-Language Models[C]: Association for Computational Linguistics (ACL), 2024, 15890-15902.
APA	Zhou, Y., Li, X., Wang, Q., & Shen, J. (2024). Visual In-Context Learning for Large Vision-Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 15890-15902.
