Residential College | false |
Status | Published |
Title | Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks |
Author | Han, Wencheng [1]; Dong, Xingping [2]; Zhang, Yiyuan [3]; Crandall, David [4]; Xu, Cheng Zhong [1]; Shen, Jianbing [1] |
Date | 2024-05-14 |
Source Publication | IEEE Transactions on Pattern Analysis and Machine Intelligence |
ISSN | 0162-8828 |
Pages | 3400873 |
Abstract | Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusing methods are often space-consuming and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations associated with both fusion methods. Based on our findings, we propose a generalized module named Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features. This method can be widely applied to fuse feature maps across different shapes. Furthermore, distinguished from parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks: visual object tracking, referring video object segmentation, and monocular 3D object detection. Extensive experimental results on three tasks and several datasets demonstrate that our new module can bring significant improvements and noteworthy efficiency. |
Keyword | Fusing Features; Asymmetric Convolution; Convolution; Feature Maps; Vision Tasks |
DOI | 10.1109/TPAMI.2024.3400873 |
Language | English |
Scopus ID | 2-s2.0-85193256171 |
Document Type | Journal article |
Collection | Faculty of Science and Technology; The State Key Laboratory of Internet of Things for Smart City (University of Macau); Department of Computer and Information Science |
Corresponding Author | Shen, Jianbing |
Affiliation | 1. State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, University of Macau, Macau, China; 2. Inception Institute of Artificial Intelligence, Abu Dhabi, UAE; 3. Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China; 4. School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana |
First Author Affiliation | University of Macau |
Corresponding Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Han, Wencheng, Dong, Xingping, Zhang, Yiyuan, et al. Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 3400873. |
APA | Han, Wencheng, Dong, Xingping, Zhang, Yiyuan, Crandall, David, Xu, Cheng Zhong, & Shen, Jianbing (2024). Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3400873. |
MLA | Han, Wencheng, et al. "Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024): 3400873. |
Files in This Item: | There are no files associated with this item. |
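
Note on the abstract: the claim that a convolution over concatenated features can be replaced by a mathematically equivalent, cheaper operation rests on the linearity of convolution. The sketch below is only an illustration of that general identity, not the authors' ACM implementation. It assumes a 1x1 convolution, a spatial feature map `x`, and a second feature `z` with spatial size 1x1; the function names, shapes, and random inputs are hypothetical choices made for this example.

```python
import numpy as np

def conv1x1(weight, feat):
    """Apply a 1x1 convolution: weight (C_out, C_in), feat (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feat)

def fuse_by_concat(weight, x, z):
    """Baseline: tile z to x's spatial size, concatenate channels, convolve."""
    z_tiled = np.broadcast_to(z, (z.shape[0],) + x.shape[1:])
    return conv1x1(weight, np.concatenate([x, z_tiled], axis=0))

def fuse_by_split(weight, x, z):
    """Equivalent form: convolve each input with its kernel slice, then add."""
    w_x, w_z = weight[:, : x.shape[0]], weight[:, x.shape[0]:]
    # conv1x1(w_z, z) has shape (C_out, 1, 1) and broadcasts over H and W,
    # so z is never tiled and never concatenated.
    return conv1x1(w_x, x) + conv1x1(w_z, z)

# Hypothetical shapes chosen only for this illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))   # spatial feature map
z = rng.standard_normal((4, 1, 1))     # feature with a different shape
weight = rng.standard_normal((6, 12))  # kernel over the 8 + 4 input channels

assert np.allclose(fuse_by_concat(weight, x, z), fuse_by_split(weight, x, z))
```

In the split form, the term for `z` is computed once per output channel rather than once per spatial position, which is one way to see where a saving in time and memory could come from when the inputs differ in shape.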