IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

doi:10.1109/CVPR52733.2024.01412


Residential College	false
Status	Published
Title	IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Author	Yin, Junbo 1; Shen, Jianbing 2; Chen, Runnan 3; Li, Wei 4; Yang, Ruigang 4; Frossard, Pascal 5; Wang, Wenguan 6
Issue Date	2024-09
Conference Name	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Source Publication	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Pages	14905-14915
Conference Date	16-22 June 2024
Conference Place	Seattle, WA, USA
Country	USA
Publisher	IEEE Computer Society
Abstract	Bird's eye view (BEV) representation has emerged as a dominant solution for describing 3D space in autonomous driving scenarios. However, objects in the BEV representation typically exhibit small sizes, and the associated point cloud context is inherently sparse, which leads to great challenges for reliable 3D perception. In this paper, we propose IS-Fusion, an innovative multimodal fusion framework that jointly captures the Instance- and Scene-level contextual information. IS-Fusion essentially differs from existing approaches that only focus on the BEV scene-level fusion by explicitly incorporating instance-level multimodal information, thus facilitating the instance-centric tasks like 3D object detection. It comprises a Hierarchical Scene Fusion (HSF) module and an Instance-Guided Fusion (IGF) module. HSF applies Point-to-Grid and Grid-to-Region transformers to capture the multimodal scene context at different granularities. IGF mines instance candidates, explores their relationships, and aggregates the local multimodal context for each instance. These instances then serve as guidance to enhance the scene feature and yield an instance-aware BEV representation. On the challenging nuScenes benchmark, IS-Fusion outperforms all the published multimodal works to date. Code is available at: https://github.com/yinjunbo/IS-Fusion.
Keyword	Point Cloud Compression; Computer Vision; Three-dimensional Displays; Collaboration; Object Detection; Benchmark Testing; Transformers; Multimodal Object Detection; Point Cloud; Autonomous Driving
DOI	10.1109/CVPR52733.2024.01412
Language	English
Scopus ID	2-s2.0-85189114966
Document Type	Conference paper
Collection	Faculty of Science and Technology; The State Key Laboratory of Internet of Things for Smart City (University of Macau); Department of Computer and Information Science
Corresponding Author	Li, Wei; Wang, Wenguan
Affiliation	1. School of Computer Science and Technology, Beijing Institute of Technology, China; 2. SKL-IOTSC, CIS, University of Macau, Macao; 3. The University of Hong Kong, Hong Kong; 4. Inceptio, United States; 5. École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; 6. ReLER, CCAI, Zhejiang University, China
Recommended Citation GB/T 7714	Yin, Junbo, Shen, Jianbing, Chen, Runnan, et al. IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection[C]: IEEE Computer Society, 2024, 14905-14915.
APA	Yin, Junbo, Shen, Jianbing, Chen, Runnan, Li, Wei, Yang, Ruigang, Frossard, Pascal, & Wang, Wenguan (2024). IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 14905-14915.
