Status: Published
GSB: Group superposition binarization for vision transformer with limited training samples
Gao, Tian1; Xu, Cheng Zhong2,3; Zhang, Le4; Kong, Hui2,3,5
2024-01-18
Source Publication: Neural Networks
ISSN: 0893-6080
Volume: 172
Pages: 106133
Abstract

Vision Transformer (ViT) has performed remarkably well in various computer vision tasks. Nonetheless, owing to its massive number of parameters, ViT usually suffers from serious overfitting when the number of training samples is relatively limited. In addition, ViT generally demands heavy computing resources, which limits its deployment on resource-constrained devices. As a type of model-compression method, model binarization is potentially a good choice for solving these problems. Compared with a full-precision model, a binarized model replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision parameters and activations with only 1-bit ones, which addresses the problems of model size and computational complexity, respectively. In this paper, we investigate a binarized ViT model. Empirically, we observe that existing binarization techniques designed for Convolutional Neural Networks (CNNs) do not transfer well to the task of binarizing a ViT. We also find that the accuracy drop of a binary ViT model is mainly due to information loss in the attention module and the Value vectors. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, to further improve the performance of the binarized model, we investigate the gradient calculation procedure of the binarization process and derive more suitable gradient calculation equations for GSB to reduce the influence of gradient mismatch. We then introduce knowledge distillation to alleviate the performance degradation caused by binarization. Analytically, model binarization limits the parameter search space during parameter updates, so the binarization process can play an implicit regularization role and help mitigate overfitting when training data are insufficient. Experiments on three datasets with limited numbers of training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among binary quantization schemes and exceeds its full-precision counterpart on some metrics. Code and models are available at: https://github.com/IMRL/GSB-Vision-Transformer.
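To make the group-superposition idea concrete, the following is a minimal, self-contained PyTorch sketch: a full-precision tensor is approximated as a weighted superposition of several binary ({-1, +1}) tensors via greedy residual fitting, with a plain straight-through-estimator (STE) gradient. All names here (BinarizeSTE, group_superposition_binarize, num_groups) are hypothetical illustrations under stated assumptions, not the authors' implementation, and the simple clipped STE below is not the refined GSB gradient derived in the paper; see the GitHub repository above for the actual code.

# Minimal sketch of group-superposition-style binarization (assumptions,
# not the paper's implementation). Requires PyTorch.
import torch


class BinarizeSTE(torch.autograd.Function):
    """Binarize to {-1, +1} with a straight-through estimator gradient.

    Forward: element-wise sign quantization.
    Backward: pass gradients through where |x| <= 1, zero elsewhere -- a
    common heuristic to reduce gradient mismatch (the paper derives its own,
    more refined gradient for GSB, which is not reproduced here).
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map every element to exactly -1 or +1 (sign(0) would give 0).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


def group_superposition_binarize(x: torch.Tensor, num_groups: int = 3) -> torch.Tensor:
    """Approximate x as a weighted sum of binary tensors.

    Greedy residual fitting: each group binarizes the remaining residual and
    scales it by the residual's mean absolute value, so the superposition
    a_1*b_1 + ... + a_k*b_k approaches x as num_groups grows.
    """
    residual = x
    approx = torch.zeros_like(x)
    for _ in range(num_groups):
        scale = residual.abs().mean()          # per-group scaling factor
        binary = BinarizeSTE.apply(residual)   # {-1, +1} tensor
        approx = approx + scale * binary
        residual = x - approx                  # fit what is still missing
    return approx


if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(4, 4, requires_grad=True)
    for k in (1, 2, 4):
        err = (w - group_superposition_binarize(w, k)).abs().mean()
        print(f"groups={k}: mean abs error {err.item():.4f}")
    # Gradients still flow to w through the STE despite the quantization.
    group_superposition_binarize(w, 2).sum().backward()
    print("grad norm:", w.grad.norm().item())

Running the sketch shows the approximation error shrinking as the number of superposed binary groups grows, which is the intuition behind why a group superposition retains more information than a single sign quantization while each individual operand stays 1-bit.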

Keywords: Group Superposition Binarization; Insufficient Training Data; Self-attention; Vision Transformer (ViT)
DOI: 10.1016/j.neunet.2024.106133
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science; Neurosciences & Neurology
WOS Subject: Computer Science, Artificial Intelligence; Neurosciences
WOS ID: WOS:001172493500001
Publisher: PERGAMON-ELSEVIER SCIENCE LTD, THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, ENGLAND
Scopus ID: 2-s2.0-85185191404
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Faculty of Science and Technology
THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
DEPARTMENT OF ELECTROMECHANICAL ENGINEERING
Corresponding Author: Kong, Hui
Affiliations:
1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
2. State Key Laboratory of Internet of Things for Smart City (SKL-IOTSC), University of Macau, 999078, China
3. Department of Computer and Information Science (CIS), University of Macau, 999078, China
4. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
5. Department of Electromechanical Engineering (EME), University of Macau, 999078, China
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Gao, Tian, Xu, Cheng Zhong, Zhang, Le, et al. GSB: Group superposition binarization for vision transformer with limited training samples[J]. Neural Networks, 2024, 172, 106133.
APA: Gao, Tian, Xu, Cheng Zhong, Zhang, Le, & Kong, Hui (2024). GSB: Group superposition binarization for vision transformer with limited training samples. Neural Networks, 172, 106133.
MLA: Gao, Tian, et al. "GSB: Group superposition binarization for vision transformer with limited training samples." Neural Networks 172 (2024): 106133.
Files in This Item:
There are no files associated with this item.
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.