Inductive Document Representation Learning for Short Text Clustering

doi:10.1007/978-3-030-67664-3_36

Residential College	false
Status	Published
Title	Inductive Document Representation Learning for Short Text Clustering
Author	Chen, Junyang 1; Gong, Zhiguo 1; Wang, Wei 1; Dong, Xiao 2; Wang, Wei 3; Liu, Weiwen 4; Wang, Cong 3; Chen, Xian 5
Date	2021-02-25
Conference Name	ECML PKDD 2020: Machine Learning and Knowledge Discovery in Databases
Source Publication	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12459 LNAI
Pages	600-616
Conference Date	2020/09/14-2020/09/18
Conference Place	Online
Abstract	Short text clustering (STC) is an important task for discovering topics or groups in fast-growing social networks, e.g., Tweets and Google News. Compared with long texts, short texts are more challenging to cluster because their limited word co-occurrence patterns cause traditional methods (e.g., TF-IDF) to produce sparse representations. These sparse representations can in turn degrade clustering performance, since clustering essentially relies on the distances between representations. To alleviate this problem, recent studies mostly develop representation learning approaches that learn compact low-dimensional embeddings. However, most of them, including probabilistic graphical models and word embedding models, require all documents in the corpus to be present during training. These methods therefore inherently perform transductive learning and cannot handle unseen documents well when few of their words have been learned before. Recently, Graph Neural Networks (GNNs) have drawn a lot of attention in various applications. Inspired by the way GNNs propagate vertex information along the graph structure, we propose an inductive document representation learning model, called IDRL, that maps short text structures into a graph network and recursively aggregates the neighbor information of the words in unseen documents. The representations of previously unseen short texts can then be reconstructed from the limited number of word embeddings learned before. Experimental results on four real-world datasets show that the proposed method learns more discriminative representations on inductive classification tasks and achieves better clustering performance than state-of-the-art models.
DOI	10.1007/978-3-030-67664-3_36
Indexed By	CPCI-S
Language	English
WOS Research Area	Computer Science ; Mathematics
WOS Subject	Computer Science, Artificial Intelligence ; Computer Science, Information Systems ; Mathematics, Applied
WOS ID	WOS:000717550500036
Scopus ID	2-s2.0-85103260063
Document Type	Conference paper
Collection	DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author	Gong, Zhiguo; Wang, Wei
Affiliation	1. University of Macau, Macao, Macao; 2. The University of Queensland, Brisbane, Australia; 3. Dalian University of Technology, Dalian, China; 4. The Chinese University of Hong Kong, Hong Kong, China; 5. The University of Hong Kong, Hong Kong, Hong Kong
First Author Affiliation	University of Macau
Corresponding Author Affiliation	University of Macau
Recommended Citation GB/T 7714	Chen, Junyang,Gong, Zhiguo,Wang, Wei,et al. Inductive Document Representation Learning for Short Text Clustering[C], 2021, 600-616.
APA	Chen, Junyang., Gong, Zhiguo., Wang, Wei., Dong, Xiao., Wang, Wei., Liu, Weiwen., Wang, Cong., & Chen, Xian (2021). Inductive Document Representation Learning for Short Text Clustering. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12459 LNAI, 600-616.
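
The abstract describes building a graph from a short text and recursively aggregating neighbor information so that an unseen document can be represented from the few word embeddings already learned. The following is only a minimal sketch of that general idea, not the IDRL model itself: the mean aggregator, the within-document co-occurrence graph, the number of hops, and the names build_word_graph, mean and embed_unseen are all assumptions made for illustration.

```python
from collections import defaultdict


def build_word_graph(docs):
    """Link every pair of words that co-occur in the same short text."""
    graph = defaultdict(set)
    for doc in docs:
        words = set(doc)
        for w in words:
            graph[w] |= words - {w}
    return graph


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def embed_unseen(doc, known, dim, hops=2):
    """Embed an unseen document from previously learned word embeddings.

    Assumption: words without a learned embedding start at zero and are
    filled in by averaging their neighbors for `hops` rounds; learned
    embeddings stay fixed. The document vector is the mean over its words.
    """
    graph = build_word_graph([doc])
    if not graph:
        return [0.0] * dim
    state = {w: list(known.get(w, [0.0] * dim)) for w in graph}
    for _ in range(hops):
        new_state = {}
        for w, neighbors in graph.items():
            if w in known or not neighbors:
                new_state[w] = state[w]
            else:
                new_state[w] = mean([state[n] for n in neighbors])
        state = new_state
    return mean(list(state.values()))


# Toy embeddings (made up for illustration, not learned from data).
known = {"apple": [1.0, 0.0], "stock": [0.0, 1.0]}
doc_vector = embed_unseen(["apple", "stock", "surge"], known, dim=2)
```

In this toy case the unseen word "surge" inherits the average of its two known neighbors, so the document still receives a dense vector even though one of its words was never trained. A learned aggregator, as in typical GNNs, would replace the plain mean used here.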
