Status | Published
Title | Inductive Document Representation Learning for Short Text Clustering
Authors | Chen, Junyang1; Gong, Zhiguo1; Wang, Wei1; Dong, Xiao2; Wang, Wei3; Liu, Weiwen4; Wang, Cong3; Chen, Xian5
Date Issued | 2021-02-25
Conference Name | ECML PKDD 2020: Machine Learning and Knowledge Discovery in Databases |
Source Publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 12459 LNAI |
Pages | 600-616 |
Conference Date | 2020/09/14-2020/09/18 |
Conference Place | Online |
Abstract | Short text clustering (STC) is an important task for discovering topics or groups in fast-growing social media, e.g., Tweets and Google News. Compared with long texts, STC is more challenging: the sparse word co-occurrence patterns in short texts cause traditional methods (e.g., TF-IDF) to produce sparse representations, which in turn degrade clustering performance, since clustering essentially relies on computing distances between representations. To alleviate this problem, recent studies have mostly focused on representation learning approaches that produce compact low-dimensional embeddings. However, most of them, including probabilistic graph models and word embedding models, require all documents in the corpus to be present during training. These methods therefore perform transductive learning and cannot handle well the representations of unseen documents, in which few of the words have been learned before. Recently, Graph Neural Networks (GNNs) have drawn much attention in various applications. Inspired by the mechanism of vertex information propagation guided by graph structure in GNNs, we propose an inductive document representation learning model, called IDRL, which maps short text structures into a graph network and recursively aggregates the neighbor information of the words in unseen documents. We can then reconstruct the representations of previously unseen short texts from the limited number of word embeddings learned before. Experimental results show that our proposed method learns more discriminative representations on inductive classification tasks and achieves better clustering performance than state-of-the-art models on four real-world datasets.
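To illustrate the inductive mechanism the abstract describes, here is a minimal Python/NumPy sketch. It is not the authors' implementation: all names (build_cooccurrence_graph, embed_document) are hypothetical, and the fixed one-hop mean aggregator stands in for the trained recursive GNN aggregation in IDRL. Words seen during training keep their learned vectors; an unseen word is reconstructed by averaging the embeddings of its seen co-occurrence neighbors, so a new document can be embedded without retraining.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(docs):
    # Words are vertices; an edge links any two words that appear
    # in the same short text (hypothetical helper, not IDRL's code).
    graph = defaultdict(set)
    for doc in docs:
        for u, v in combinations(sorted(set(doc.split())), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph

def embed_document(doc, graph, word_vecs, dim=8):
    # Seen words use their stored vectors; an unseen word is
    # reconstructed by mean-aggregating the vectors of its seen
    # neighbours (a one-hop stand-in for recursive aggregation).
    words = doc.split()
    vecs = []
    for w in words:
        if w in word_vecs:
            vecs.append(word_vecs[w])
            continue
        nbrs = set(graph.get(w, ())) | (set(words) - {w})
        nbr_vecs = [word_vecs[n] for n in nbrs if n in word_vecs]
        if nbr_vecs:
            vecs.append(np.mean(nbr_vecs, axis=0))
    # Document embedding: mean of its word vectors.
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Training corpus fixes the graph and the word embeddings...
train = ["apple banana fruit", "banana fruit market"]
graph = build_cooccurrence_graph(train)
rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=8) for d in train for w in d.split()}
# ...then a previously unseen document ("price" was never trained on)
# is embedded without retraining, enabling inductive clustering.
print(embed_document("apple market price", graph, word_vecs))
```

In the paper the aggregation is recursive and learned rather than this fixed one-hop mean, but the control flow above shows why unseen documents require no retraining.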
DOI | 10.1007/978-3-030-67664-3_36 |
Indexed By | CPCI-S |
Language | English
WOS Research Area | Computer Science ; Mathematics |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Information Systems ; Mathematics, Applied |
WOS ID | WOS:000717550500036 |
Scopus ID | 2-s2.0-85103260063 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Gong, Zhiguo; Wang, Wei |
Affiliation | 1. University of Macau, Macao, Macao; 2. The University of Queensland, Brisbane, Australia; 3. Dalian University of Technology, Dalian, China; 4. The Chinese University of Hong Kong, Hong Kong, China; 5. The University of Hong Kong, Hong Kong, Hong Kong
First Author Affiliation | University of Macau
Corresponding Author Affiliation | University of Macau
Recommended Citation GB/T 7714 | Chen, Junyang, Gong, Zhiguo, Wang, Wei, et al. Inductive Document Representation Learning for Short Text Clustering[C], 2021, 600-616.
APA | Chen, Junyang, Gong, Zhiguo, Wang, Wei, Dong, Xiao, Wang, Wei, Liu, Weiwen, Wang, Cong, & Chen, Xian. (2021). Inductive Document Representation Learning for Short Text Clustering. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12459 LNAI, 600-616.