Status: Published
A nonparametric model for online topic discovery with word embeddings
Chen, J.; Gong, Z. G.; Liu, W.
2019-12-01
Source Publication: Information Sciences
ISSN: 0020-0255
Pages: 32-47
Abstract: With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic from a manually set hyper-parameter or threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by existing models often cover a wide range of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the number of topics and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analysis. NPMM can automatically decide whether a given document belongs to an existing topic, as measured by the squared Mahalanobis distance. Hence, the proposed model does not require tuning a hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can be further sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms state-of-the-art algorithms.
Keywords: Data mining; Clustering; Topic model; Online topic discovery; Nonparametric model; Word embeddings
Language: English
The Source to Article: PB_Publication
PUB ID: 47654
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Recommended Citation
GB/T 7714: Chen, J., Gong, Z. G., Liu, W. A nonparametric model for online topic discovery with word embeddings[J]. Information Sciences, 2019: 32-47.
APA: Chen, J., Gong, Z. G., & Liu, W. (2019). A nonparametric model for online topic discovery with word embeddings. Information Sciences, 32-47.
MLA: Chen, J., et al. "A nonparametric model for online topic discovery with word embeddings." Information Sciences (2019): 32-47.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.