Residential College | false |
Status | Forthcoming |
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching | |
Chu, Meng1; Zheng, Zhedong2 | |
2025 | |
Conference Name | 18th European Conference on Computer Vision, ECCV 2024 |
Source Publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 15069 LNCS |
Pages | 213-231 |
Conference Date | 29 September 2024 to 4 October 2024 |
Conference Place | Milan; Italy |
Publisher | Springer Science and Business Media Deutschland GmbH |
Abstract | Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data. To address this pressing need, we introduce GeoText-1652, a new natural language-guided geolocalization benchmark. This dataset is systematically constructed through an interactive human-computer process leveraging Large Language Model (LLM) driven annotation techniques in conjunction with pre-trained vision models. GeoText-1652 extends the established University-1652 image dataset with spatial-aware text annotations, thereby establishing one-to-one correspondences between image, text, and bounding box elements. We further introduce a new optimization objective to leverage fine-grained spatial associations, called blending spatial matching, for region-level spatial relation matching. Extensive experiments reveal that our approach maintains a competitive recall rate compared with other prevailing cross-modality methods. This underscores the promising potential of our approach in elevating drone control and navigation through the seamless integration of natural language commands in real-world scenarios. |
Keyword | Drone Navigation ; Geolocalization ; Spatial Relation Matching ; Text Guidance |
DOI | 10.1007/978-3-031-73247-8_13 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Computer Science, Theory & Methods |
WOS ID | WOS:001353688700013 |
Scopus ID | 2-s2.0-85210022886 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Zheng, Zhedong |
Affiliation | 1. School of Computing, National University of Singapore, Singapore, Singapore; 2. FST and ICI, University of Macau, Macao; 3. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China |
Corresponding Author Affiliation | Faculty of Science and Technology |
Recommended Citation GB/T 7714 | Chu, Meng, Zheng, Zhedong, Ji, Wei, et al. Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching[C]: Springer Science and Business Media Deutschland GmbH, 2025, 213-231. |
APA | Chu, Meng., Zheng, Zhedong., Ji, Wei., Wang, Tingyu., & Chua, Tat Seng (2025). Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 15069 LNCS, 213-231. |