Residential College: false
Status: Published
MIVCN: Multimodal interaction video captioning network based on semantic association graph
Wang, Ying¹; Huang, Guoheng¹; Yuming, Lin¹; Yuan, Haoliang²; Pun, Chi Man³; Ling, Wing Kuen⁴; Cheng, Lianglun¹
2021-08-07
Source Publication: APPLIED INTELLIGENCE
ISSN: 0924-669X
Volume: 52  Issue: 5  Pages: 5241-5260
Abstract

In the field of computer vision, generating natural-language captions from input videos is a challenging task. To address it, videos are usually treated as feature sequences and fed into a Long Short-Term Memory (LSTM) network to generate natural language. To obtain a richer and more detailed representation of video content, we develop a Multimodal Interaction Video Captioning Network based on a Semantic Association Graph (MIVCN). The network consists of two modules: the Semantic Association Graph Module (SAGM) and the Multimodal Attention Constraint Module (MACM). First, because they lack semantic interdependence, existing methods often produce illogical sentence structures. We therefore propose SAGM, which is based on information association and enables the network to strengthen the connections between logically related language elements and to weaken the relations between logically unrelated ones. Second, the features of each modality need to attend to different information, and the captured multimodal features are highly informative but also redundant. Based on this observation, we propose MACM, an LSTM-based module that captures complementary visual features and filters out redundant ones. MACM integrates the multimodal features into the LSTM so that the network can screen for and focus on informative features. Through the association of semantic attributes and the interaction of multimodal features, the network captures semantically and contextually interdependent information as well as complementary visual information, and makes better use of the informative representations in videos for caption generation. The proposed MIVCN achieves the best caption generation performance on MSVD, with 56.8%, 36.4%, and 79.1% on the BLEU@4, METEOR, and ROUGE-L metrics, respectively. It also reports superior BLEU@4, METEOR, and ROUGE-L results on MSR-VTT compared with state-of-the-art methods.
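The abstract describes the two modules only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that SAGM behaves like a standard graph convolution (the keywords list Graph Convolutional Network) over embeddings of semantic attributes, so that attributes joined by an edge reinforce each other while unconnected ones do not mix directly. It also assumes that MACM's screening of modalities can be approximated by softmax attention driven by the LSTM hidden state. The function names `normalize_adjacency`, `gcn_layer`, and `modality_attention`, as well as all toy inputs, are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (pure Python, no external dependencies)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2 (assumed, not taken from the paper)."""
    n = len(adj)
    # Add self-loops so each attribute keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One hypothetical graph-convolution step over attribute embeddings: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_attention(hidden: Vector, modality_feats: List[Vector]) -> Vector:
    """Hypothetical stand-in for MACM: weight each modality feature by its
    dot-product relevance to the LSTM hidden state, then sum the weighted features."""
    scores = [sum(h * f for h, f in zip(hidden, feat)) for feat in modality_feats]
    weights = softmax(scores)
    dim = len(modality_feats[0])
    return [sum(wt * feat[d] for wt, feat in zip(weights, modality_feats)) for d in range(dim)]


if __name__ == "__main__":
    # Toy graph over three hypothetical attributes, e.g. "man", "play", "guitar":
    # edges man-play and play-guitar, no direct man-guitar edge.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(gcn_layer(normalize_adjacency(adj), feats, identity))
    # Hypothetical appearance and motion features scored against an LSTM hidden state.
    print(modality_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]]))
```

In this sketch the graph step mixes each attribute only with its neighbours, so attributes without an edge do not influence each other directly, and the softmax weights let the modality that best matches the decoder state dominate the fused feature, which is one simple way to down-weight redundant input. Whether MIVCN uses these exact formulations is not stated in this record.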

Keywords: Attention Mechanism; Gated Recurrent Unit; Graph Convolutional Network; Long Short-Term Memory; Multimodal Fusion; Video Captioning
DOI: 10.1007/s10489-021-02612-y
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Artificial Intelligence
WOS ID: WOS:000682627000001
Scopus ID: 2-s2.0-85112645418
Document Type: Journal article
Collection: Faculty of Science and Technology
Department of Computer and Information Science
Corresponding Authors: Huang, Guoheng; Yuan, Haoliang; Pun, Chi Man; Ling, Wing Kuen
Affiliations:
1. School of Computer, Guangdong University of Technology, Guangzhou, 510006, China
2. School of Automation, Guangdong University of Technology, Guangzhou, 510006, China
3. Department of Computer and Information Science, University of Macau, 999078, Macao
4. School of Information Engineering, Guangdong University of Technology, Guangzhou, 510006, China
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Wang, Ying, Huang, Guoheng, Yuming, Lin, et al. MIVCN: Multimodal interaction video captioning network based on semantic association graph[J]. APPLIED INTELLIGENCE, 2021, 52(5), 5241-5260.
APA: Wang, Ying, Huang, Guoheng, Yuming, Lin, Yuan, Haoliang, Pun, Chi Man, Ling, Wing Kuen, & Cheng, Lianglun. (2021). MIVCN: Multimodal interaction video captioning network based on semantic association graph. APPLIED INTELLIGENCE, 52(5), 5241-5260.
MLA: Wang, Ying, et al. "MIVCN: Multimodal interaction video captioning network based on semantic association graph." APPLIED INTELLIGENCE 52.5 (2021): 5241-5260.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.