Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Status	Published
Title	Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Authors	Sun, Zeyi 1; Fang, Ye 2; Wu, Tong 3; Zhang, Pan 4; Zang, Yuhang 4; Kong, Shu 5; Xiong, Yuanjun 4; Lin, Dahua 3; Wang, Jiaqi 4
Date	2024-06
Conference Name	The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)
Conference Date	June 21, 2024
Conference Place	Seattle
Country	USA
Abstract	Contrastive Language-Image Pre-training (CLIP) plays an essential role in extracting valuable content information from images across diverse tasks. It aligns textual and visual modalities to comprehend the entire image, including all the details, even those irrelevant to specific tasks. However, for a finer understanding and controlled editing of images, it becomes crucial to focus on specific regions of interest, which can be indicated as points, masks, or boxes by humans or perception models. To meet this need, we introduce Alpha-CLIP, an enhanced version of CLIP with an auxiliary alpha channel that indicates the regions to attend to, fine-tuned on millions of constructed RGBA region-text pairs. Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents. It demonstrates effectiveness in various tasks, including but not limited to open-world recognition, multimodal large language models, and conditional 2D/3D generation. It has strong potential to serve as a versatile tool for image-related tasks.
Document Type	Conference paper
Collection	DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author	Wang, Jiaqi
Affiliation	1. Shanghai Jiao Tong University; 2. Fudan University; 3. The Chinese University of Hong Kong; 4. Shanghai AI Laboratory; 5. University of Macau; 6. MThreads, Inc.
Recommended Citation GB/T 7714	Sun, Zeyi, Fang, Ye, Wu, Tong, et al. Alpha-CLIP: A CLIP Model Focusing on Wherever You Want[C], 2024.
APA	Sun, Z., Fang, Y., Wu, T., Zhang, P., Zang, Y., Kong, S., Xiong, Y., Lin, D., & Wang, J. (2024). Alpha-CLIP: A CLIP Model Focusing on Wherever You Want.

Files in This Item:
File Name/Size	Publications	Version	Access	License
Alpha-CLIP.pdf (14556 KB)	Conference paper		Open access	CC BY-NC-SA