Status: Published
Unifying Image Processing as Visual Prompting Question Answering
Yihao Liu1,2; Xiangyu Chen1,2,3; Xianzheng Ma1; Xintao Wang4; Jiantao Zhou3; Yu Qiao1,2; Chao Dong1,2
2024-02
Conference Name: International Conference on Machine Learning (ICML 2024)
Source Publication: Proceedings of Machine Learning Research
Volume: 235
Pages: 30873-30891
Conference Date: July 21 through July 27, 2024
Conference Place: Vienna, Austria
Publisher: ML Research Press
Abstract

Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a similar trend in computer vision, which focuses on developing large-scale models through pretraining and in-context learning. This paradigm shift reduces the reliance on task-specific models, yielding a powerful unified model to deal with various tasks. However, these advances have predominantly concentrated on high-level vision tasks, with less attention paid to low-level vision tasks. To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction tasks, etc. Our proposed framework, named PromptGIP, unifies these diverse image processing tasks within a universal framework. Inspired by NLP question answering (QA) techniques, we employ a visual prompting question answering paradigm. Specifically, we treat the input-output image pair as a structured question-answer sentence, thereby reprogramming the image processing task as a prompting QA problem. PromptGIP can undertake diverse cross-domain tasks using provided visual prompts, eliminating the need for task-specific finetuning. Capable of handling up to 15 different image processing tasks, PromptGIP represents a versatile and adaptive approach to general image processing.
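The question-answer framing described in the abstract can be illustrated with a toy data structure. The following is a minimal sketch, not the PromptGIP implementation: it assumes a single prompt pair followed by one query, and the names `Slot`, `build_qa_sequence`, and `slots_to_predict` are hypothetical. The abstract does not specify how many prompt pairs are used or how the unknown answer is represented, so both choices here are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy grayscale image: rows of pixel values (an assumption for illustration).
Image = List[List[float]]


@dataclass
class Slot:
    role: str               # "question" or "answer"
    image: Optional[Image]  # None marks the answer the model must produce


def build_qa_sequence(prompt_q: Image, prompt_a: Image, query_q: Image) -> List[Slot]:
    """Arrange a prompt pair and a query as: question, answer, question, blank."""
    return [
        Slot("question", prompt_q),  # example input, e.g. a degraded image
        Slot("answer", prompt_a),    # example output, e.g. its restored version
        Slot("question", query_q),   # new input to be processed
        Slot("answer", None),        # unknown output, left for the model
    ]


def slots_to_predict(sequence: List[Slot]) -> List[int]:
    """Return the positions whose image is missing and must be predicted."""
    return [i for i, slot in enumerate(sequence) if slot.image is None]


# The task is specified only by the prompt pair, not by a task label.
noisy = [[0.9, 0.1], [0.2, 0.8]]
clean = [[1.0, 0.0], [0.0, 1.0]]
query = [[0.7, 0.3], [0.4, 0.6]]

sequence = build_qa_sequence(noisy, clean, query)
print([slot.role for slot in sequence])
print(slots_to_predict(sequence))
```

Swapping the prompt pair for a different input-output example would change the task the sequence expresses while the structure stays the same, which is the sense in which the abstract describes handling diverse tasks without task-specific finetuning.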

Language: English
Scopus ID: 2-s2.0-85203824374
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Chao Dong
Affiliation:
1. Shanghai Artificial Intelligence Laboratory
2. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
3. University of Macau
4. ARC Lab, Tencent PCG
Recommended Citation
GB/T 7714
Yihao Liu, Xiangyu Chen, Xianzheng Ma, et al. Unifying Image Processing as Visual Prompting Question Answering[C]: ML Research Press, 2024, 30873-30891.
APA
Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, & Chao Dong (2024). Unifying Image Processing as Visual Prompting Question Answering. Proceedings of Machine Learning Research, 235, 30873-30891.