Status: Published
A simple but effective vision transformer framework for visible–infrared person re-identification
Li, Yudong (1); Zhao, Sanyuan (1,2); Shen, Jianbing (3)
2024-12
Source Publication: Computer Vision and Image Understanding
ISSN: 1077-3142
Volume: 249
Pages: 104192
Abstract

In visible–infrared person re-identification (VI-ReID), learning a robust visual representation is paramount. Existing approaches predominantly rely on convolutional neural networks (CNNs), guided by intricately designed loss functions, to extract features. In contrast, the vision transformer (ViT), despite being a powerful visual backbone, has often yielded subpar results in VI-ReID. We contend that the training methodologies and insights derived from CNNs do not transfer seamlessly to ViT, leaving its potential in VI-ReID underutilized. One notable limitation is ViT's appetite for data: it needs extensive pre-training data, such as the JFT-300M dataset, to surpass CNNs. Consequently, ViT struggles to transfer its knowledge from visible to infrared images when training data are inadequate. Even the largest available dataset, SYSU-MM01, proves insufficient for ViT to learn a robust representation of infrared images. The problem is exacerbated when ViT is trained on the smaller RegDB dataset, where slight modifications to the data flow drastically affect performance, in stark contrast to the behavior of CNNs. These observations lead us to conjecture that the CNN-inspired paradigm impedes ViT's progress in VI-ReID. To address these challenges, we conduct comprehensive ablation studies that shed new light on ViT's applicability to VI-ReID, and we propose a straightforward yet effective framework, named “Idformer”, for training a high-performing ViT for VI-ReID. Idformer serves as a robust baseline that can be further enhanced with carefully designed techniques akin to those used for CNNs. Remarkably, our method attains competitive results even without auxiliary information, achieving 78.58%/76.99% Rank-1/mAP on the SYSU-MM01 dataset and 96.82%/91.83% Rank-1/mAP on the RegDB dataset. The code will be made publicly accessible.
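The Rank-1 and mAP figures quoted above are standard retrieval metrics. As a rough illustration only (not the authors' evaluation code), the sketch below computes both from a query-by-gallery distance matrix, assuming one identity label per image and a single ranking over the whole gallery. The official SYSU-MM01 and RegDB protocols add further details, such as repeated random gallery splits, that this sketch omits. The function name `rank1_and_map` is hypothetical.

```python
from typing import Sequence, Tuple


def rank1_and_map(dist: Sequence[Sequence[float]],
                  query_ids: Sequence[int],
                  gallery_ids: Sequence[int]) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP) as percentages.

    Hypothetical helper: a simplified sketch, not the paper's protocol.
    dist[i][j] is the distance between query i (e.g. an infrared image)
    and gallery item j (e.g. a visible image); smaller means more similar.
    Queries with no matching identity in the gallery are skipped.
    """
    rank1_hits = 0
    ap_sum = 0.0
    valid = 0
    for i, row in enumerate(dist):
        # Sort gallery indices by ascending distance for this query.
        order = sorted(range(len(row)), key=lambda j: row[j])
        matches = [gallery_ids[j] == query_ids[i] for j in order]
        num_relevant = sum(matches)
        if num_relevant == 0:
            continue  # no ground-truth match; excluded from both metrics
        valid += 1
        if matches[0]:
            rank1_hits += 1  # the nearest gallery item has the right identity
        # Average precision: mean of precision@k over ranks k holding a true match.
        hits = 0
        precision_sum = 0.0
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / k
        ap_sum += precision_sum / num_relevant
    if valid == 0:
        raise ValueError("no query has a matching gallery identity")
    return 100.0 * rank1_hits / valid, 100.0 * ap_sum / valid
```

For example, with query labels [1, 2], gallery labels [1, 2, 2] and distances [[0.2, 0.5, 0.9], [0.1, 0.4, 0.3]], the function gives Rank-1 = 50.0 and mAP ≈ 79.17: the first query's true match ranks first, while the second query's two true matches sit at ranks 2 and 3, so its AP is (1/2 + 2/3)/2 ≈ 0.583.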

Keywords: Cross-modality; Visual Infrared Person Re-identification; ViT
DOI: 10.1016/j.cviu.2024.104192
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
WOS ID: WOS:001336496500001
Publisher: ACADEMIC PRESS INC ELSEVIER SCIENCE, 525 B ST, STE 1900, SAN DIEGO, CA 92101-4495
Scopus ID: 2-s2.0-85206271981
Document Type: Journal article
Collection: Faculty of Science and Technology
THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Zhao, Sanyuan
Affiliation:
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
2. Yangtze Delta Region Academy, Beijing Institute of Technology, Jiaxing, 314019, China
3. State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, University of Macau, Macao Special Administrative Region of China
Recommended Citation
GB/T 7714
Li, Yudong, Zhao, Sanyuan, Shen, Jianbing. A simple but effective vision transformer framework for visible–infrared person re-identification[J]. Computer Vision and Image Understanding, 2024, 249, 104192.
APA
Li, Y., Zhao, S., & Shen, J. (2024). A simple but effective vision transformer framework for visible–infrared person re-identification. Computer Vision and Image Understanding, 249, 104192.
MLA
Li, Yudong, et al. "A simple but effective vision transformer framework for visible–infrared person re-identification." Computer Vision and Image Understanding 249 (2024): 104192.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.