Residential College | false |
Status | Published
Title | Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition
Authors | Dong, Guan Nan1; Pun, Chi Man1; Zhang, Zheng2
Year | 2022
Source Publication | IEEE Transactions on Circuits and Systems for Video Technology |
ISSN | 1051-8215 |
Volume | 32
Issue | 9
Pages | 6472-6485
Abstract | Speech emotion recognition (SER) is a non-trivial task even for humans, and it remains challenging for automatic SER due to linguistic complexity and contextual distortion. Notably, previous automatic SER systems have typically treated multi-modal information and the temporal relations of speech as two independent tasks, ignoring their association. We argue that valid semantic features and the temporal relations of speech are both meaningful event relationships. This paper proposes a novel temporal relation inference network (TRIN) for multi-modal SER that fully considers the underlying hierarchy of phonetic structure and its associations across modalities under sequential temporal guidance. Specifically, we design a temporal reasoning calibration module to imitate real, varied contextual conditions. Unlike previous works, which assume that all modalities are related, it infers the dependency relationships among semantic information at the temporal level and learns to handle the multi-modal interaction sequence in a flexible order. To enhance the feature representation, an innovative temporal attentive fusion unit magnifies the details embedded in a single modality at the semantic level. Meanwhile, an adaptive feature fusion mechanism aggregates representations from both the temporal and semantic levels, selectively collecting implicit complementary information to strengthen the dependencies between different information subspaces and maximize the integrity of the feature representation. Extensive experiments on two benchmark datasets demonstrate the superiority of TRIN over state-of-the-art SER methods.
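The record gives no implementation details for the adaptive feature fusion mechanism mentioned in the abstract. As a purely illustrative sketch of one common realization of such a mechanism (a learned sigmoid gate producing a per-dimension convex combination of two modality embeddings), the following might look like this; every name, shape, and weight here is an assumption, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_fusion(audio_feat, text_feat, W_gate, b_gate):
    """Gated fusion: a learned sigmoid gate weighs the two modalities
    per dimension, selectively mixing complementary information."""
    combined = np.concatenate([audio_feat, text_feat], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(combined @ W_gate + b_gate)))  # sigmoid in (0, 1)
    # Per-dimension convex combination of the two modality embeddings
    return gate * audio_feat + (1.0 - gate) * text_feat

d = 8                               # assumed feature dimension per modality
audio = rng.standard_normal(d)      # stand-in acoustic embedding
text = rng.standard_normal(d)       # stand-in lexical embedding
W = rng.standard_normal((2 * d, d)) * 0.1   # randomly initialized gate weights
b = np.zeros(d)

fused = adaptive_fusion(audio, text, W, b)
assert fused.shape == (d,)
```

Because the gate lies strictly in (0, 1), each fused dimension stays between the corresponding audio and text values, which is one simple way a model can "selectively collect" information from either modality.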
Keyword | Cognition; Correlation; Emotion Recognition; Feature Extraction; Hidden Markov Models; Multi-modal Learning; Relation Inference Network; Speech Emotion Recognition; Speech Recognition; Task Analysis; Temporal Learning
DOI | 10.1109/TCSVT.2022.3163445 |
Indexed By | SCIE |
Language | English
WOS Research Area | Engineering |
WOS Subject | Engineering, Electrical & Electronic |
WOS ID | WOS:000849300000061 |
Scopus ID | 2-s2.0-85127503787 |
Document Type | Journal article |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Pun, Chi Man |
Affiliation | 1. Department of Computer and Information Science, University of Macau, Macau 999078, China; 2. Department of Computer and Information Science, University of Macau, Macau 999078, China, and Harbin Institute of Technology, Shenzhen, China.
First Author Affiliation | University of Macau
Corresponding Author Affiliation | University of Macau
Recommended Citation GB/T 7714 | Dong, Guan Nan, Pun, Chi Man, Zhang, Zheng. Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(9): 6472-6485.
APA | Dong, Guan Nan, Pun, Chi Man, & Zhang, Zheng. (2022). Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 32(9), 6472-6485.
MLA | Dong, Guan Nan, et al. "Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition." IEEE Transactions on Circuits and Systems for Video Technology 32.9 (2022): 6472-6485.