Residential College: false
Status: Published
Learning A Low-Level Vision Generalist via Visual Task Prompt
Chen, Xiangyu (1,2,3); Liu, Yihao (4,5); Pu, Yuandong (6); Zhang, Wenlong (7); Zhou, Jiantao (8); Qiao, Yu (4,5); Dong, Chao (4,5,9)
2024-11
Conference Name: 32nd ACM International Conference on Multimedia, MM 2024
Source Publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Pages: 2671-2680
Conference Date: 28 October 2024 - 1 November 2024
Conference Place: Melbourne
Country: Australia
Publisher: Association for Computing Machinery, Inc
Abstract

Building a unified model for general low-level vision tasks holds significant research and practical value, but current methods encounter several critical issues. Multi-task restoration approaches can address multiple degradation-to-clean restoration tasks, but their applicability to tasks with different target domains (e.g., image stylization) is limited. Methods like PromptGIP can handle multiple input-target domains but rely on the Masked Autoencoder (MAE) paradigm. Consequently, they are tied to the ViT architecture, resulting in suboptimal image reconstruction quality. In addition, these methods are sensitive to prompt image content and often struggle with low-frequency information processing. In this paper, we propose a Visual task Prompt-based Image Processing (VPIP) framework to overcome these challenges. VPIP employs visual task prompts to manage tasks with different input-target domains and allows flexible selection of a backbone network suitable for general tasks. In addition, a new prompt cross-attention is introduced to facilitate interaction between the input and prompt information. Based on the VPIP framework, we train a low-level vision generalist model, namely GenLV, on 30 diverse tasks. Experimental results show that GenLV can successfully address a variety of low-level tasks, significantly outperforming existing methods both quantitatively and qualitatively. Code is available at https://github.com/chxy95/GenLV.
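
The abstract mentions a prompt cross-attention that lets input features interact with prompt information. The record gives no architectural details, so the following is only a minimal sketch of generic scaled dot-product cross-attention, not the authors' implementation. The function name `prompt_cross_attention`, the identity (unlearned) query/key/value projections, and the residual connection are all assumptions made for illustration; the actual GenLV design should be taken from the paper or the linked repository.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_cross_attention(input_tokens, prompt_tokens):
    """Generic cross-attention sketch (hypothetical, not the paper's code).

    Queries come from the input-image tokens; keys and values come from
    the prompt tokens. Learned projections are omitted (identity) to keep
    the example short; that simplification is an assumption.
    """
    dim = len(input_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for q in input_tokens:
        # Similarity of this input token to every prompt token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in prompt_tokens]
        weights = softmax(scores)
        # Weighted sum of prompt tokens (used here as the values).
        attended = [sum(w * v[j] for w, v in zip(weights, prompt_tokens))
                    for j in range(dim)]
        # Residual connection (assumed): prompt information is added to
        # the input features rather than replacing them.
        output.append([qi + ai for qi, ai in zip(q, attended)])
    return output
```

In this sketch each input token gathers a weighted mixture of prompt tokens, so the prompt conditions the processing of the input. With a single prompt token the attention weights collapse to 1 and the output is simply the input plus that prompt token.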

Keywords: General Low-level Vision; Image Restoration and Enhancement; Multi-task Learning; Visual Prompt
DOI: 10.1145/3664647.3681621
Language: English
Scopus ID: 2-s2.0-85209806287
Document Type: Conference paper
Collection: Faculty of Science and Technology
THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation:
1. University of Macau, Macao
2. Shanghai AI Laboratory, Macau, Macao
3. Shenzhen Institute of Advanced Technology, CAS, Macau, Macao
4. Shanghai AI Laboratory, Shanghai, China
5. Shenzhen Institute of Advanced Technology, CAS, Shanghai, China
6. Shanghai Jiao Tong University, Shanghai AI Laboratory, Shanghai, China
7. Shanghai AI Laboratory, The Hong Kong Polytechnic University, Shanghai, China
8. State Key Laboratory of Internet of Things for Smart City, University of Macau, Macao
9. Shenzhen University of Advanced Technology, Shanghai, China
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714
Chen, Xiangyu, Liu, Yihao, Pu, Yuandong, et al. Learning A Low-Level Vision Generalist via Visual Task Prompt[C]: Association for Computing Machinery, Inc, 2024, 2671-2680.
APA
Chen, Xiangyu, Liu, Yihao, Pu, Yuandong, Zhang, Wenlong, Zhou, Jiantao, Qiao, Yu, & Dong, Chao (2024). Learning A Low-Level Vision Generalist via Visual Task Prompt. MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 2671-2680.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.