Status: Published
Recurrent Coupled Topic Modeling over Sequential Documents
Authors: Jinjin Guo (1); Longbing Cao (2); Zhiguo Gong (1)
2022-02-01
Source Publication: ACM Transactions on Knowledge Discovery from Data
ISSN: 1556-4681
Volume: 16, Issue: 1
Abstract

Sequential documents such as online archives, social media posts, and news feeds are updated as streams, and each chunk of documents carries topics that evolve smoothly yet depend on one another. Such digital texts have attracted extensive research on dynamic topic modeling, which infers hidden evolving topics and their temporal dependencies. However, most existing approaches focus on single-topic-thread evolution and ignore the fact that a current topic may be coupled with multiple relevant prior topics. These approaches also face an intractable inference problem when inferring latent parameters, resulting in high computational cost and performance degradation. In this work, we assume that a current topic evolves from all prior topics with corresponding coupling weights, forming multi-topic-thread evolution. Our method models the dependencies between evolving topics and thoroughly encodes their complex multi-couplings across time steps. To overcome the intractable inference challenge, we propose a new solution with a set of novel data augmentation techniques, which successfully decomposes the multi-couplings between evolving topics. A fully conjugate model is thus obtained, which guarantees the effectiveness and efficiency of the inference technique. A novel Gibbs sampler with a backward-forward filter algorithm efficiently learns the latent time-evolving parameters in closed form. In addition, the latent Indian Buffet Process compound distribution is exploited to automatically infer the overall number of topics and to customize the sparse topic proportions for each sequential document without bias. The proposed method is evaluated on both synthetic and real-world datasets against competitive baselines, demonstrating its superiority in terms of lower per-word perplexity, more coherent topics, and better document time prediction.
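
The idea that a current topic evolves from all prior topics with coupling weights can be illustrated with a toy sketch. This is not the authors' model or inference procedure: the Dirichlet draw, the function names `couple_topics` and `sample_dirichlet`, the `concentration` parameter, and all numeric values are assumptions chosen only to show how a weighted mixture of prior topics could act as the prior mean of a current topic.

```python
import random

def couple_topics(prior_topics, weights):
    """Mix all prior topic-word distributions into one prior mean.

    prior_topics: list of word distributions from the previous time step.
    weights: non-negative coupling weights (hypothetical), normalized here.
    """
    total = sum(weights)
    vocab_size = len(prior_topics[0])
    return [
        sum(w * topic[v] for w, topic in zip(weights, prior_topics)) / total
        for v in range(vocab_size)
    ]

def sample_dirichlet(mean, concentration, rng):
    """Draw a word distribution from a Dirichlet centred on `mean`."""
    # A Dirichlet draw is a vector of normalized Gamma draws; the small
    # offset keeps every shape parameter strictly positive.
    draws = [rng.gammavariate(concentration * m + 1e-9, 1.0) for m in mean]
    norm = sum(draws)
    return [d / norm for d in draws]

rng = random.Random(0)

# Two prior topics over a four-word vocabulary (made-up numbers).
prior_topics = [
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
]
# The current topic is coupled three times more strongly to the first prior topic.
weights = [3.0, 1.0]

prior_mean = couple_topics(prior_topics, weights)
current_topic = sample_dirichlet(prior_mean, concentration=50.0, rng=rng)
```

Setting one weight to 1 and the rest to 0 recovers single-topic-thread evolution in this toy setting, which is the special case the abstract contrasts against.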

Keywords: Bayesian Network; Data Augmentation; Dropout; Gibbs Sampling; Multiple Dependency; Topic Coupling; Topic Evolution; Topic Modeling
DOI: 10.1145/3451530
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Information Systems; Computer Science, Software Engineering
WOS ID: WOS:000675516600011
Scopus ID: 2-s2.0-85111140551
Document Type: Journal article
Collection: Department of Computer and Information Science
Affiliations:
1. Department of Computer and Information Science, University of Macau, 999078, Macao
2. Data Science Lab, University of Technology Sydney, Ultimo, 2007, Australia
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Jinjin Guo, Longbing Cao, Zhiguo Gong. Recurrent Coupled Topic Modeling over Sequential Documents[J]. ACM Transactions on Knowledge Discovery from Data, 2022, 16(1).
APA: Jinjin Guo, Longbing Cao, & Zhiguo Gong (2022). Recurrent Coupled Topic Modeling over Sequential Documents. ACM Transactions on Knowledge Discovery from Data, 16(1).
MLA: Jinjin Guo, et al. "Recurrent Coupled Topic Modeling over Sequential Documents." ACM Transactions on Knowledge Discovery from Data 16.1 (2022).