Residential College | false |
Status | Published |
Title | IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection |
Authors | Yin, Junbo (1); Shen, Jianbing (2); Chen, Runnan (3); Li, Wei (4); Yang, Ruigang (4); Frossard, Pascal (5); Wang, Wenguan (6) |
Date | 2024-09 |
Conference Name | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
Source Publication | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
Pages | 14905-14915 |
Conference Date | 16-22 June 2024 |
Conference Place | Seattle, WA, USA |
Country | USA |
Publisher | IEEE Computer Society |
Abstract | Bird's eye view (BEV) representation has emerged as a dominant solution for describing 3D space in autonomous driving scenarios. However, objects in the BEV representation typically exhibit small sizes, and the associated point cloud context is inherently sparse, which leads to great challenges for reliable 3D perception. In this paper, we propose IS-Fusion, an innovative multimodal fusion framework that jointly captures the Instance- and Scene-level contextual information. IS-Fusion essentially differs from existing approaches that only focus on the BEV scene-level fusion by explicitly incorporating instance-level multimodal information, thus facilitating the instance-centric tasks like 3D object detection. It comprises a Hierarchical Scene Fusion (HSF) module and an Instance-Guided Fusion (IGF) module. HSF applies Point-to-Grid and Grid-to-Region transformers to capture the multimodal scene context at different granularities. IGF mines instance candidates, explores their relationships, and aggregates the local multimodal context for each instance. These instances then serve as guidance to enhance the scene feature and yield an instance-aware BEV representation. On the challenging nuScenes benchmark, IS-Fusion outperforms all the published multimodal works to date. Code is available at: https://github.com/yinjunbo/IS-Fusion. |
Keyword | Point Cloud Compression; Computer Vision; Three-dimensional Displays; Collaboration; Object Detection; Benchmark Testing; Transformers; Multimodal Object Detection; Point Cloud; Autonomous Driving |
DOI | 10.1109/CVPR52733.2024.01412 |
Language | English |
Scopus ID | 2-s2.0-85189114966 |
Document Type | Conference paper |
Collection | Faculty of Science and Technology; State Key Laboratory of Internet of Things for Smart City (University of Macau); Department of Computer and Information Science |
Corresponding Author | Li, Wei; Wang, Wenguan |
Affiliation | 1. School of Computer Science and Technology, Beijing Institute of Technology, China; 2. SKL-IOTSC, CIS, University of Macau, Macao; 3. The University of Hong Kong, Hong Kong; 4. Inceptio, United States; 5. École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; 6. ReLER, CCAI, Zhejiang University, China |
Recommended Citation GB/T 7714 | Yin, Junbo,Shen, Jianbing,Chen, Runnan,et al. IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection[C]:IEEE Computer Society, 2024, 14905-14915. |
APA | Yin, J., Shen, J., Chen, R., Li, W., Yang, R., Frossard, P., & Wang, W. (2024). IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 14905-14915. |
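The abstract describes the Instance-Guided Fusion (IGF) module in four steps: mine instance candidates, explore their relationships, aggregate multimodal context for each instance, and use the instances to enhance the BEV scene feature. The toy sketch below walks through that data flow on tiny hand-made vectors. It is not taken from the authors' released code, and everything in it is an assumption for illustration: the function names are hypothetical, instance candidates are simply the top-k cells of a given score map, attention is single-head scaled dot-product attention with no learned projections, each instance attends to all image tokens rather than a local neighborhood, and every update is a plain residual addition.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query.

    Simplification (assumption): one head, no learned query/key/value projections.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def select_instances(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring BEV cells (hypothetical instance candidates)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def instance_guided_fusion(bev: List[Vector], scores: List[float],
                           image_tokens: List[Vector], k: int) -> List[Vector]:
    # 1. Mine instance candidates from the BEV map.
    instances = [bev[i] for i in select_instances(scores, k)]
    # 2. Explore instance relationships: self-attention among the candidates.
    related = [add(q, attend(q, instances, instances)) for q in instances]
    # 3. Aggregate camera context per instance (here: all image tokens, not a local window).
    enriched = [add(q, attend(q, image_tokens, image_tokens)) for q in related]
    # 4. Instance guidance: each BEV cell attends to the instances; residual update.
    return [add(cell, attend(cell, enriched, enriched)) for cell in bev]


bev = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
scores = [0.9, 0.1, 0.7, 0.2]
image_tokens = [[0.2, 0.8], [0.6, 0.4]]
fused = instance_guided_fusion(bev, scores, image_tokens, k=2)
print(len(fused), len(fused[0]))  # 4 2
```

The output keeps one feature vector per BEV cell, so the instance-aware map can replace the original scene feature downstream. In a real detector the score map, projections, and image features would all be learned, and the context aggregation would be restricted to each instance's neighborhood.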