Status: Published
A Dirichlet process biterm-based mixture model for short text stream clustering
Junyang Chen (1); Zhiguo Gong (1); Weiwen Liu (2)
2020-05
Source Publication: Applied Intelligence
ISSN: 0924-669X
Volume: 50, Issue: 5, Pages: 1609-1619
Abstract

Short text stream clustering has become an important problem for mining textual data on diverse social media platforms (e.g., Twitter). However, most existing clustering methods (e.g., LDA and PLSA) assume a static corpus of long texts, and little attention has been given to short text streams. Clustering short texts is more challenging than clustering long texts because their word co-occurrence patterns are sparse. In this paper, we propose a Dirichlet process biterm-based mixture model (DP-BMM), which can deal with both the topic drift problem and the sparsity problem in short text stream clustering. The major advantages of DP-BMM are that (1) it explicitly exploits the word pairs constructed from each document to enhance the word co-occurrence pattern in short texts, and (2) it deals with the topic drift problem of short text streams naturally. Moreover, we propose an improved variant of DP-BMM with a forgetting property, called DP-BMM-FP, which can efficiently delete the biterms of outdated documents by deleting the clusters of outdated batches. To perform inference, we adopt an online Gibbs sampling method for parameter estimation. Our extensive experimental results on real-world datasets show that DP-BMM and DP-BMM-FP achieve better performance than state-of-the-art methods in terms of the NMI metric.
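
The abstract describes word pairs (biterms) constructed from each document. As a minimal sketch of that idea, assuming a biterm is any unordered pair of tokens drawn from the same short document, the construction could look like the following. The function name `extract_biterms` and the sample tokens are hypothetical and do not come from the paper, whose exact construction rules (for example, how repeated words are handled) may differ.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered token pair (biterm) from one short document.

    Assumption: each pair of token positions yields one biterm, and the
    two words are sorted so that (a, b) and (b, a) count as the same pair.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical tokenized short text, used only for illustration.
doc = ["apple", "releases", "new", "phone"]
biterms = extract_biterms(doc)
print(len(biterms))  # 6, i.e. 4 choose 2
```

A document with n tokens yields n(n-1)/2 biterms under this assumption, which is how pairing can densify the co-occurrence signal of a very short text.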

Keyword: Data Mining; Stream Clustering; Topic Modeling
DOI: 10.1007/s10489-019-01606-1
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Artificial Intelligence
WOS ID: WOS:000515703600001
Publisher: SPRINGER, VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS
Scopus ID: 2-s2.0-85078879429
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation: 1. State Key Laboratory of Internet of Things for Smart City and Department of Computer and Information Science, University of Macau, Macao
2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Junyang Chen, Zhiguo Gong, Weiwen Liu. A Dirichlet process biterm-based mixture model for short text stream clustering[J]. Applied Intelligence, 2020, 50(5), 1609-1619.
APA: Junyang Chen, Zhiguo Gong, & Weiwen Liu (2020). A Dirichlet process biterm-based mixture model for short text stream clustering. Applied Intelligence, 50(5), 1609-1619.
MLA: Junyang Chen, et al. "A Dirichlet process biterm-based mixture model for short text stream clustering". Applied Intelligence 50.5 (2020): 1609-1619.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.