Status | Published |
A nonparametric model for online topic discovery with word embeddings | |
Chen, J.; Gong, Z. G.; Liu, W. | |
2019-12-01 | |
Source Publication | Information Sciences |
ISSN | 0020-0255 |
Pages | 32-47 |
Abstract | With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic from a manually set hyper-parameter or threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by existing models often cover a wide portion of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) that exploits auxiliary word embeddings to infer the number of topics and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analysis. NPMM can automatically decide whether a given document belongs to an existing topic, as measured by the squared Mahalanobis distance. Hence, the proposed model does not require tuning a hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can be further sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms state-of-the-art algorithms. |
Keyword | Data mining; Clustering; Topic model; Online topic discovery; Nonparametric model; Word embeddings |
Language | English |
The Source to Article | PB_Publication |
PUB ID | 47654 |
Document Type | Journal article |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Recommended Citation GB/T 7714 | Chen, J., Gong, Z. G., Liu, W. A nonparametric model for online topic discovery with word embeddings[J]. Information Sciences, 2019, 32-47. |
APA | Chen, J., Gong, Z. G., & Liu, W. (2019). A nonparametric model for online topic discovery with word embeddings. Information Sciences, 32-47. |
MLA | Chen, J., et al. "A nonparametric model for online topic discovery with word embeddings." Information Sciences (2019): 32-47. |
Files in This Item: | There are no files associated with this item. |
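
The abstract mentions measuring a document's fit to existing topics with the squared Mahalanobis distance and using Cholesky decompositions of covariance matrices. The following is a minimal sketch of that one quantity, not the authors' implementation: the function name `squared_mahalanobis`, the 2-d vectors, and the two toy topics are illustrative assumptions. How NPMM turns this distance into a new-topic decision is not stated in the abstract, so the sketch only computes the distances and reports the closest topic.

```python
import numpy as np

def squared_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean).

    Uses the Cholesky factor cov = L L^T, so the distance equals ||z||^2
    where L z = (x - mean); the covariance matrix is never inverted.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))  # lower-triangular L
    z = np.linalg.solve(chol, diff)  # solve L z = diff
    return float(z @ z)

# Hypothetical 2-d document embedding and two toy topics (illustrative values only).
doc = np.array([1.0, 2.0])
topics = {
    "topic_a": (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 4.0]])),
    "topic_b": (np.array([5.0, 5.0]), np.eye(2)),
}

# Distance from the document to each topic's Gaussian (mean, covariance).
distances = {name: squared_mahalanobis(doc, m, c) for name, (m, c) in topics.items()}
nearest = min(distances, key=distances.get)
print(distances, nearest)
```

Working from the Cholesky factor rather than an explicit inverse is the usual choice here because it is cheaper and numerically more stable for symmetric positive-definite covariance matrices.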