Status | Published
Title | A simple but effective vision transformer framework for visible–infrared person re-identification
Authors | Li, Yudong1; Zhao, Sanyuan1,2; Shen, Jianbing3
Date Issued | 2024-12
Source Publication | Computer Vision and Image Understanding |
ISSN | 1077-3142 |
Volume | 249
Pages | 104192
Abstract | In the context of visible–infrared person re-identification (VI-ReID), the acquisition of a robust visual representation is paramount. Existing approaches predominantly rely on convolutional neural networks (CNNs), which are guided by intricately designed loss functions to extract features. In contrast, the vision transformer (ViT), a potent visual backbone, has often yielded subpar results in VI-ReID. We contend that the prevailing training methodologies and insights derived from CNNs do not seamlessly apply to ViT, leading to the underutilization of its potential in VI-ReID. One notable limitation is ViT's appetite for extensive data, exemplified by the JFT-300M dataset, to surpass CNNs. Consequently, ViT struggles to transfer its knowledge from visible to infrared images due to inadequate training data. Even the largest available dataset, SYSU-MM01, proves insufficient for ViT to glean a robust representation of infrared images. This predicament is exacerbated when ViT is trained on the smaller RegDB dataset, where slight data flow modifications drastically affect performance—a stark contrast to CNN behavior. These observations lead us to conjecture that the CNN-inspired paradigm impedes ViT's progress in VI-ReID. In light of these challenges, we undertake comprehensive ablation studies to shed new light on ViT's applicability in VI-ReID. We propose a straightforward yet effective framework, named “Idformer”, to train a high-performing ViT for VI-ReID. Idformer serves as a robust baseline that can be further enhanced with carefully designed techniques akin to those used for CNNs. Remarkably, our method attains competitive results even in the absence of auxiliary information, achieving 78.58%/76.99% Rank-1/mAP on the SYSU-MM01 dataset, as well as 96.82%/91.83% Rank-1/mAP on the RegDB dataset. The code will be made publicly accessible. |
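Note: the abstract's central claim is that a plain ViT, trained directly on mixed visible/infrared identity batches, can serve as a strong VI-ReID baseline without auxiliary information. As a rough illustration of that idea only (this is not the paper's Idformer, whose details are not reproduced in this record), the following PyTorch sketch trains a small shared-backbone ViT with a plain identity-classification loss. Every name and hyperparameter here (SimpleViTReID, NUM_IDS, the layer sizes) is an invented assumption for illustration.

import torch
import torch.nn as nn

NUM_IDS = 395            # e.g. SYSU-MM01 has 395 training identities
IMG_H, IMG_W, PATCH = 256, 128, 16

class SimpleViTReID(nn.Module):
    def __init__(self, dim=384, depth=6, heads=6):
        super().__init__()
        num_patches = (IMG_H // PATCH) * (IMG_W // PATCH)
        # One patch embedding shared by both modalities; infrared frames are
        # loaded as 3-channel tensors so the same projection applies to both.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=PATCH, stride=PATCH)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.norm = nn.LayerNorm(dim)
        self.id_head = nn.Linear(dim, NUM_IDS)    # identity classifier

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # B x N x dim
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        feat = self.norm(self.encoder(x))[:, 0]   # CLS token as ID feature
        return feat, self.id_head(feat)

# One training step on a mixed-modality batch: visible and infrared crops of
# the same identities carry the same labels.
model = SimpleViTReID()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)
images = torch.randn(8, 3, IMG_H, IMG_W)   # e.g. 4 visible + 4 infrared
labels = torch.randint(0, NUM_IDS, (8,))
opt.zero_grad()
feat, logits = model(images)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
opt.step()

Because visible and infrared crops of the same person share a label, cross-entropy alone already pulls the two modalities toward a common embedding; the abstract's argument is that how the ViT is trained matters more for VI-ReID than the elaborate loss engineering inherited from CNNs.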
Keyword | Cross-modality; Visible–infrared person re-identification; ViT
DOI | 10.1016/j.cviu.2024.104192 |
Indexed By | SCIE |
Language | English
WOS Research Area | Computer Science ; Engineering |
WOS Subject | Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic |
WOS ID | WOS:001336496500001 |
Publisher | Academic Press Inc. Elsevier Science, 525 B St, Ste 1900, San Diego, CA 92101-4495
Scopus ID | 2-s2.0-85206271981 |
Document Type | Journal article |
Collection | Faculty of Science and Technology; State Key Laboratory of Internet of Things for Smart City (University of Macau); Department of Computer and Information Science
Corresponding Author | Zhao, Sanyuan |
Affiliation | 1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; 2. Yangtze Delta Region Academy, Beijing Institute of Technology, Jiaxing 314019, China; 3. State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, University of Macau, Macao Special Administrative Region of China
Recommended Citation GB/T 7714 | Li, Yudong, Zhao, Sanyuan, Shen, Jianbing. A simple but effective vision transformer framework for visible–infrared person re-identification[J]. Computer Vision and Image Understanding, 2024, 249: 104192.
APA | Li, Yudong, Zhao, Sanyuan, & Shen, Jianbing (2024). A simple but effective vision transformer framework for visible–infrared person re-identification. Computer Vision and Image Understanding, 249, 104192.
MLA | Li, Yudong, et al. "A simple but effective vision transformer framework for visible–infrared person re-identification." Computer Vision and Image Understanding 249 (2024): 104192.
Files in This Item: | There are no files associated with this item. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.