Status: Published
Relational Network via Cascade CRF for Video Language Grounding
Zhang, Tong¹; Lu, Xiankai¹; Zhang, Hao²; Nie, Xiushan³; Yin, Yilong¹; Shen, Jianbing⁴
2024
Source Publication: IEEE Transactions on Multimedia
ISSN: 1520-9210
Volume: 26    Pages: 8297-8311
Abstract

Video Language Grounding (VLG) is one of the most challenging cross-modal video understanding tasks. It aims to localize the target moment in an untrimmed video that semantically corresponds to a given language query. Many existing VLG methods rely on a proposal-based framework. Despite their dominant performance, they usually score segment proposals by interacting only a few internal frames with the query, so they struggle to capture long-range dependencies when the proposal features are limited. Meanwhile, adjacent proposals share similar visual semantics, which makes it hard for VLG models to accurately align the semantics of the video and query contents and degrades ranking performance. To remedy these limitations, we propose VLG-CRF, which introduces conditional random fields (CRFs) to handle discrete yet indistinguishable proposals. Specifically, VLG-CRF consists of two cascaded CRF-based modules. AttentiveCRFs is developed for multi-modal feature fusion, to better integrate the temporal and semantic relations between modalities. We also devise a new variant of ConvCRFs that captures the relations among discrete segments and rectifies the predicted scores so that relatively high scores are clustered within a range. Experiments on three benchmark datasets, i.e., Charades-STA, ActivityNet-Caption, and TACoS, demonstrate the superiority of our method, which achieves state-of-the-art performance.
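The abstract does not give the exact formulation of the ConvCRF variant. As a rough illustration of the general idea only (not the authors' method), the sketch below runs binary mean-field inference with a Potts-style pairwise term over a 2D (start, end) proposal score map, using a truncated local Gaussian kernel in the spirit of ConvCRFs. The 2D map layout, the function name conv_crf_refine, and the parameters window, weight, sigma and iters are all hypothetical assumptions.

```python
import math


def _logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conv_crf_refine(scores, window=1, weight=0.5, sigma=1.0, iters=5):
    """Hypothetical mean-field refinement of a 2D proposal score map.

    scores[i][j] is an assumed probability that the proposal starting at
    clip i and ending at clip j matches the query; only cells with i <= j
    are valid proposals, and invalid cells are returned as 0.0.
    """
    n = len(scores)
    # Unary term: log-odds of each proposal's initial score.
    unary = [[_logit(scores[i][j]) if i <= j else 0.0 for j in range(n)] for i in range(n)]
    q = [[scores[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        new_q = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i, n):
                msg = 0.0
                # Truncated (convolutional) Gaussian kernel over neighbouring proposals.
                for di in range(-window, window + 1):
                    for dj in range(-window, window + 1):
                        if di == 0 and dj == 0:
                            continue
                        a, b = i + di, j + dj
                        if 0 <= a <= b < n:
                            k = weight * math.exp(-(di * di + dj * dj) / (2.0 * sigma ** 2))
                            # Potts pairwise term in binary form: a neighbour votes
                            # for "match" when (2q - 1) > 0 and against it when < 0.
                            msg += k * (2.0 * q[a][b] - 1.0)
                new_q[i][j] = _sigmoid(unary[i][j] + msg)
        q = new_q  # parallel (synchronous) mean-field update
    return q


if __name__ == "__main__":
    demo = [[0.1] * 5 for _ in range(5)]
    demo[0][4] = 0.6  # an isolated high score among low-scoring neighbours
    refined = conv_crf_refine(demo)
    print(round(refined[0][4], 3))
```

With attractive pairwise weights, an isolated spike is pulled down while a moderately scored proposal surrounded by high-scoring neighbours is pulled up, which is one simple way to make high scores cluster in a contiguous region of the map. The learned kernels and training procedure of the actual VLG-CRF module are not described in this record.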

Keyword: Vision-language Grounding; Conditional Random Fields; Temporal Relation; Proposal Free
DOI: 10.1109/TMM.2023.3303712
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science; Telecommunications
WOS Subject: Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications
WOS ID: WOS:001283692500020
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141
Scopus ID: 2-s2.0-85167800103
Document Type: Journal article
Collection: THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
Corresponding Author: Lu, Xiankai; Yin, Yilong
Affiliation:
1. School of Software, Shandong University, Jinan 250101, China
2. School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798
3. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
4. SKL-IOTSC, University of Macau, Macau 999078, China
Recommended Citation
GB/T 7714: Zhang, Tong, Lu, Xiankai, Zhang, Hao, et al. Relational Network via Cascade CRF for Video Language Grounding[J]. IEEE Transactions on Multimedia, 2024, 26, 8297-8311.
APA: Zhang, T., Lu, X., Zhang, H., Nie, X., Yin, Y., & Shen, J. (2024). Relational Network via Cascade CRF for Video Language Grounding. IEEE Transactions on Multimedia, 26, 8297-8311.
MLA: Zhang, Tong, et al. "Relational Network via Cascade CRF for Video Language Grounding." IEEE Transactions on Multimedia 26 (2024): 8297-8311.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.