Residential College | false
Status | Published
Title | SparseInteraction: Sparse Semantic Guidance for Radar and Camera 3D Object Detection
Authors | Xu, Shaoqing1; Jiang, Shengyin2; Li, Fang3; Liu, Li3; Song, Ziying4; Yang, Bo2; Yang, Zhi Xin1
Date | 2024-11
Conference Name | 32nd ACM International Conference on Multimedia, MM 2024 |
Source Publication | MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia |
Pages | 9224-9233 |
Conference Date | 28 October 2024 - 1 November 2024 |
Conference Place | Melbourne |
Country | Australia |
Publisher | Association for Computing Machinery, Inc |
Abstract | Multi-modal fusion techniques, such as radar and images, enable a complementary and cost-effective perception of the surrounding environment regardless of lighting and weather conditions. However, existing fusion methods for surround-view images and radar are challenged by the inherent noise and positional ambiguity of radar, which leads to significant performance losses. To address this limitation effectively, our paper presents a robust, end-to-end fusion framework dubbed SparseInteraction. First, we introduce the Noisy Radar Filter (NRF) module to extract foreground features by creatively using queried semantic features from the image to filter out noisy radar features. Furthermore, we implement the Sparse Cross-Attention Encoder (SCAE) to effectively blend foreground radar features and image features to address positional ambiguity issues at a sparse level. Ultimately, to facilitate model convergence and performance, the foreground prior queries containing position information of the foreground radar are concatenated with predefined queries and fed into the subsequent transformer-based decoder. The experimental results demonstrate that the proposed fusion strategies markedly enhance detection performance and achieve new state-of-the-art results on the nuScenes benchmark. Source code is available at https://github.com/GG-Bonds/SparseInteraction. |
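The abstract's pipeline — scoring radar features against image-derived semantic queries to keep only foreground points, fusing those foreground features with image features via cross-attention, and concatenating the result with predefined decoder queries — can be sketched roughly as follows. This is an illustrative NumPy sketch under assumed shapes and function names, not the authors' implementation; the actual NRF and SCAE modules are learned transformer components described in the paper.

```python
import numpy as np

def noisy_radar_filter(radar_feats, semantic_queries, keep_ratio=0.5):
    """Sketch of the NRF idea: score each radar feature by similarity to
    image semantic queries and keep the top fraction as foreground."""
    sims = radar_feats @ semantic_queries.T           # (N_radar, N_query)
    scores = sims.max(axis=1)                         # best-matching query per point
    k = max(1, int(len(scores) * keep_ratio))
    keep_idx = np.argsort(scores)[-k:]                # indices of the k highest scores
    return radar_feats[keep_idx], keep_idx

def sparse_cross_attention(fg_radar, img_feats):
    """Sketch of the SCAE idea: single-head cross-attention where the
    foreground radar features attend over image features."""
    d = img_feats.shape[1]
    logits = (fg_radar @ img_feats.T) / np.sqrt(d)    # (N_fg, N_img)
    logits -= logits.max(axis=1, keepdims=True)       # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ img_feats                           # fused foreground features

rng = np.random.default_rng(0)
radar = rng.normal(size=(20, 8))     # 20 radar features, feature dim 8 (assumed)
queries = rng.normal(size=(4, 8))    # 4 image semantic queries (assumed)
images = rng.normal(size=(50, 8))    # 50 image tokens (assumed)

fg, idx = noisy_radar_filter(radar, queries, keep_ratio=0.5)
fused = sparse_cross_attention(fg, images)
# Per the abstract, foreground prior queries are concatenated with
# predefined queries before entering the transformer-based decoder.
decoder_queries = np.concatenate([fused, rng.normal(size=(10, 8))], axis=0)
```

Here the fused foreground features carry radar positional priors into the decoder query set, which is the convergence aid the abstract describes; the filtering step is what makes the subsequent attention sparse.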
Keyword | 3D Object Detection; Autonomous Driving; Multi-modal
DOI | 10.1145/3664647.3681565 |
Language | English
Scopus ID | 2-s2.0-85204507729 |
Document Type | Conference paper |
Collection | Faculty of Science and Technology; Department of Electromechanical Engineering
Corresponding Author | Yang, Zhi Xin |
Affiliation | 1. University of Macau, Macao; 2. Beijing University of Posts and Telecommunications, Beijing, China; 3. Beijing Institute of Technology, Beijing, China; 4. Beijing Jiaotong University, Beijing, China
First Author Affiliation | University of Macau
Corresponding Author Affiliation | University of Macau
Recommended Citation GB/T 7714 | Xu, Shaoqing, Jiang, Shengyin, Li, Fang, et al. SparseInteraction: Sparse Semantic Guidance for Radar and Camera 3D Object Detection[C]//MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia. Association for Computing Machinery, Inc, 2024: 9224-9233.
APA | Xu, S., Jiang, S., Li, F., Liu, L., Song, Z., Yang, B., & Yang, Z. X. (2024). SparseInteraction: Sparse semantic guidance for radar and camera 3D object detection. MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 9224-9233. https://doi.org/10.1145/3664647.3681565
Files in This Item: | There are no files associated with this item. |