Residential College: false
Status: Published
Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding
Xu, Lixiang (1); Cui, Qingzhe (1); Hong, Richang (2); Xu, Wei (1); Chen, Enhong (3); Yuan, Xin (4); Li, Chenglong (5); Tang, Yuanyan (6)
2024
Source Publication: IEEE Transactions on Multimedia
ISSN: 1520-9210
Pages: 1-14
Abstract

In recent years, the performance of view-based 3D shape recognition methods has saturated, and the best-performing models cannot be deployed on memory-limited devices because of their large number of parameters. To address this problem, we introduce a knowledge-distillation-based compression method for this field, which substantially reduces the number of parameters while preserving model performance as much as possible. Specifically, to strengthen the smaller models, we first design a high-performing large model, the Group Multi-view Vision Transformer (GMViT). In GMViT, a view-level ViT first models the relationships among view-level features. Next, to capture deeper features, a grouping module enhances the view-level features into group-level features. Finally, a group-level ViT aggregates the group-level features into complete, well-formed 3D shape descriptors. Notably, both ViTs use a spatial encoding of the camera coordinates as a novel position embedding. Building on GMViT, we propose two compressed versions, GMViT-simple and GMViT-mini. To train these small models more effectively, we apply knowledge distillation throughout the GMViT pipeline, using the key outputs of each GMViT component as distillation targets. Extensive experiments demonstrate the efficacy of the proposed method. The large model GMViT achieves excellent 3D classification and retrieval results on the benchmark datasets ModelNet, ShapeNetCore55, and MCB. The smaller models, GMViT-simple and GMViT-mini, reduce the number of parameters by factors of 8 and 17.6, respectively, and speed up shape recognition by a factor of 1.5 on average, while preserving at least 90% of the recognition performance. The code is available at https://github.com/bigdata-graph/GMViT.
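The abstract says only that the key outputs of each GMViT component serve as distillation targets; it does not state the loss. As a minimal sketch, assuming per-stage mean-squared-error feature matching plus the widely used temperature-scaled soft-label KL term on the logits (a common choice swapped in for illustration, not necessarily the authors' formulation), a multi-stage distillation objective could look like this in pure Python. The stage names "view", "group", "shape", and "logits", the weights, and the temperature are hypothetical, and the student features are assumed to be already projected to the teacher's dimension.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length (flattened) feature vectors."""
    if len(a) != len(b):
        raise ValueError("teacher and student features must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    teacher: Dict[str, Vector],
    student: Dict[str, Vector],
    stage_weights: Dict[str, float],
    temperature: float = 4.0,
) -> float:
    """Weighted per-stage feature matching plus a soft-label term on the logits.

    Hypothetical keys: intermediate stages such as "view", "group", "shape",
    plus "logits". Student features are ASSUMED to be already projected to the
    teacher's dimension (the abstract does not say how dimensions are matched).
    """
    loss = 0.0
    # Feature-level targets: one weighted MSE term per component output.
    for stage, weight in stage_weights.items():
        loss += weight * mse(teacher[stage], student[stage])
    # Output-level target: KL between softened teacher and student predictions.
    p_teacher = softmax(teacher["logits"], temperature)
    p_student = softmax(student["logits"], temperature)
    # Scaling by T^2 is a common convention so this term's gradients do not shrink as T grows.
    loss += temperature ** 2 * kl_divergence(p_teacher, p_student)
    return loss


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not taken from the paper.
    teacher = {"view": [0.2, 0.4], "group": [1.0, 0.0], "shape": [0.5, 0.5], "logits": [2.0, 0.5, -1.0]}
    student = {"view": [0.1, 0.4], "group": [0.8, 0.1], "shape": [0.5, 0.4], "logits": [1.5, 0.7, -0.8]}
    weights = {"view": 1.0, "group": 1.0, "shape": 1.0}
    print(f"total distillation loss: {distillation_loss(teacher, student, weights):.4f}")
```

Pairing MSE on intermediate features with KL on softened logits is one common way to supervise every stage of a teacher pipeline; the implementation in the linked GMViT repository may weight, normalize, or match the stages differently.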

Keywords: 3D Object Recognition; 3D Position Embedding; Aggregates; Computational Modeling; Feature Extraction; Knowledge Distillation; Long Short Term Memory; Multi-view ViT; Shape; Solid Modeling; Three-dimensional Displays; View Grouping
DOI: 10.1109/TMM.2024.3394731
Language: English
Publisher: Institute of Electrical and Electronics Engineers Inc.
Scopus ID: 2-s2.0-85192199589
Document Type: Journal article
Collection: Department of Computer and Information Science
Affiliations:
1. College of Artificial Intelligence and Big Data, Hefei University, Hefei, China
2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
3. Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
4. School of Electrical and Mechanical Engineering, The University of Adelaide, Adelaide, Australia
5. School of Artificial Intelligence, Anhui University, Hefei, China
6. Zhuhai UM Science and Technology Research Institute, FST, University of Macau, Macau, China
Recommended Citation
GB/T 7714: Xu, Lixiang, Cui, Qingzhe, Hong, Richang, et al. Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding[J]. IEEE Transactions on Multimedia, 2024, 1-14.
APA: Xu, L., Cui, Q., Hong, R., Xu, W., Chen, E., Yuan, X., Li, C., & Tang, Y. (2024). Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding. IEEE Transactions on Multimedia, 1-14.
MLA: Xu, Lixiang, et al. "Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding." IEEE Transactions on Multimedia (2024): 1-14.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.