Status: Published
Visual In-Context Learning for Large Vision-Language Models
Zhou, Yucheng (1); Li, Xiang (2); Wang, Qianning (3); Shen, Jianbing (1)
2024
Conference Name: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Source Publication: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Pages: 15890-15902
Conference Date: 11-16 August 2024
Conference Place: Hybrid, Bangkok
Country: Thailand
Publisher: Association for Computational Linguistics (ACL)
Abstract

In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a “Retrieval & Rerank” paradigm, summarises images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce the token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate why the method is effective, and investigate the impact of the length and position of demonstrations on LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.
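
The three stages named in the abstract (retrieve, rerank, compose text-only demonstrations) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy 2-D vectors stand in for image embeddings, the word-overlap reranker is an assumed placeholder for whatever reranking model the paper uses, and all names (`retrieve`, `rerank`, `compose`, `POOL`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, pool, k):
    """Stage 1 (assumed): keep the k demonstrations closest to the query embedding."""
    return sorted(pool, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]


def rerank(query_text, candidates, n):
    """Stage 2 (assumed): reorder candidates by Jaccard word overlap with the query text."""
    q = set(query_text.lower().split())

    def score(d):
        s = set(d["summary"].lower().split())
        return len(q & s) / len(q | s) if q | s else 0.0

    return sorted(candidates, key=score, reverse=True)[:n]


def compose(demos, question):
    """Stage 3 (assumed): build a text-only prompt from image summaries, not raw images."""
    parts = [f"Image summary: {d['summary']}\nAnswer: {d['answer']}" for d in demos]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# Hypothetical demonstration pool: toy embeddings plus pre-written summaries.
POOL = [
    {"vec": [1.0, 0.0], "summary": "a red car parked on a street", "answer": "car"},
    {"vec": [0.9, 0.1], "summary": "a blue bus on a city street", "answer": "bus"},
    {"vec": [0.0, 1.0], "summary": "a cat sleeping on a sofa", "answer": "cat"},
]

candidates = retrieve([1.0, 0.05], POOL, k=2)
demos = rerank("a blue bus", candidates, n=1)
prompt = compose(demos, "What vehicle is shown?")
print(prompt)
```

Because the demonstrations enter the prompt as short text summaries rather than images, the prompt stays compact, which is the token-count reduction the abstract refers to.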

DOI: 10.48550/arXiv.2402.11574
Language: English
Scopus ID: 2-s2.0-85205302832
Document Type: Conference paper
Collection: THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
Corresponding Author: Shen, Jianbing
Affiliation:
1. SKL-IOTSC, CIS, University of Macau, Macao
2. Tianjin University, China
3. Nanjing Audit University, China
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714
Zhou, Yucheng,Li, Xiang,Wang, Qianning,et al. Visual In-Context Learning for Large Vision-Language Models[C]:Association for Computational Linguistics (ACL), 2024, 15890-15902.
APA
Zhou, Yucheng., Li, Xiang., Wang, Qianning., & Shen, Jianbing (2024). Visual In-Context Learning for Large Vision-Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 15890-15902.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.