Residential College: false
Status: Published
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Yin, Junbo (1); Shen, Jianbing (2); Chen, Runnan (3); Li, Wei (4); Yang, Ruigang (4); Frossard, Pascal (5); Wang, Wenguan (6)
2024-09
Conference Name: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Source Publication: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Pages: 14905-14915
Conference Date: 16-22 June 2024
Conference Place: Seattle, WA, USA
Country: USA
Publisher: IEEE Computer Society
Abstract

Bird's eye view (BEV) representation has emerged as a dominant solution for describing 3D space in autonomous driving scenarios. However, objects in the BEV representation typically exhibit small sizes, and the associated point cloud context is inherently sparse, which leads to great challenges for reliable 3D perception. In this paper, we propose IS-Fusion, an innovative multimodal fusion framework that jointly captures the Instance- and Scene-level contextual information. IS-Fusion essentially differs from existing approaches that only focus on the BEV scene-level fusion by explicitly incorporating instance-level multimodal information, thus facilitating the instance-centric tasks like 3D object detection. It comprises a Hierarchical Scene Fusion (HSF) module and an Instance-Guided Fusion (IGF) module. HSF applies Point-to-Grid and Grid-to-Region transformers to capture the multimodal scene context at different granularities. IGF mines instance candidates, explores their relationships, and aggregates the local multimodal context for each instance. These instances then serve as guidance to enhance the scene feature and yield an instance-aware BEV representation. On the challenging nuScenes benchmark, IS-Fusion outperforms all the published multimodal works to date. Code is available at: https://github.com/yinjunbo/IS-Fusion.
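The instance-guided step described in the abstract (mine instance candidates, relate them to one another, then let them guide the scene feature) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the top-k selection from a score heatmap, and the unparameterized dot-product attention are all assumptions made for illustration, and the multimodal (camera plus LiDAR) aspect is omitted. The released code at the URL above is the authoritative reference.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_instances(heatmap, k):
    """Return (row, col) indices of the k highest-scoring BEV cells."""
    flat = np.argsort(heatmap.ravel())[::-1][:k]
    return np.stack(np.unravel_index(flat, heatmap.shape), axis=1)

def instance_guided_fusion(scene, heatmap, k=4):
    """Hypothetical sketch: scene is (H, W, C) BEV features, heatmap is (H, W) scores."""
    h, w, c = scene.shape
    idx = select_instances(heatmap, k)           # (k, 2) candidate locations
    inst = scene[idx[:, 0], idx[:, 1]]           # (k, C) instance features

    # Instance-to-instance attention: each candidate looks at the others.
    attn_ii = softmax(inst @ inst.T / np.sqrt(c))
    inst = inst + attn_ii @ inst

    # Scene-to-instance attention: every BEV cell is enhanced by the instances.
    cells = scene.reshape(h * w, c)
    attn_si = softmax(cells @ inst.T / np.sqrt(c))   # (H*W, k)
    fused = cells + attn_si @ inst                   # residual update
    return fused.reshape(h, w, c), idx

# Usage with random stand-in data (assumed shapes only).
rng = np.random.default_rng(0)
scene = rng.standard_normal((8, 8, 16))
heatmap = rng.random((8, 8))
fused, idx = instance_guided_fusion(scene, heatmap, k=4)
print(fused.shape, idx.shape)  # (8, 8, 16) (4, 2)
```

The residual update keeps the original scene feature intact and adds instance context on top, which is one simple way to obtain an instance-aware BEV map. A trained model would use learned projections and local multimodal context rather than raw dot products.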

Keywords: Point Cloud Compression; Computer Vision; Three-dimensional Displays; Collaboration; Object Detection; Benchmark Testing; Transformers; Multimodal Object Detection; Point Cloud; Autonomous Driving
DOI: 10.1109/CVPR52733.2024.01412
Language: English
Scopus ID: 2-s2.0-85189114966
Document Type: Conference paper
Collection: Faculty of Science and Technology; The State Key Laboratory of Internet of Things for Smart City (University of Macau); Department of Computer and Information Science
Corresponding Author: Li, Wei; Wang, Wenguan
Affiliation:
1. School of Computer Science and Technology, Beijing Institute of Technology, China
2. SKL-IOTSC, CIS, University of Macau, Macao
3. The University of Hong Kong, Hong Kong
4. Inceptio, United States
5. École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
6. ReLER, CCAI, Zhejiang University, China
Recommended Citation
GB/T 7714: Yin, Junbo, Shen, Jianbing, Chen, Runnan, et al. IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection[C]: IEEE Computer Society, 2024, 14905-14915.
APA: Yin, J., Shen, J., Chen, R., Li, W., Yang, R., Frossard, P., & Wang, W. (2024). IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 14905-14915.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.