Status | Published |
Recurrent Coupled Topic Modeling over Sequential Documents | |
Jinjin Guo1; Longbing Cao2; Zhiguo Gong1 | |
2022-02-01 | |
Source Publication | ACM Transactions on Knowledge Discovery from Data |
ISSN | 1556-4681 |
Volume | 16, Issue: 1 |
Abstract | Abundant sequential documents such as online archives, social media, and news feeds are continuously updated as streams, where each chunk of documents carries smoothly evolving yet dependent topics. Such digital texts have attracted extensive research on dynamic topic modeling to infer hidden evolving topics and their temporal dependencies. However, most existing approaches focus on single-topic-thread evolution and ignore the fact that a current topic may be coupled with multiple relevant prior topics. In addition, these approaches face an intractable inference problem when inferring latent parameters, resulting in high computational cost and performance degradation. In this work, we assume that a current topic evolves from all prior topics with corresponding coupling weights, forming multi-topic-thread evolution. Our method models the dependencies between evolving topics and thoroughly encodes their complex multi-couplings across time steps. To overcome the intractable inference challenge, a new solution with a set of novel data augmentation techniques is proposed, which successfully decomposes the multi-couplings between evolving topics. A fully conjugate model is thus obtained to guarantee the effectiveness and efficiency of the inference technique. A novel Gibbs sampler with a backward-forward filter algorithm efficiently learns latent time-evolving parameters in closed form. In addition, the latent Indian Buffet Process compound distribution is exploited to automatically infer the overall topic number and customize the sparse topic proportions for each sequential document without bias. The proposed method is evaluated on both synthetic and real-world datasets against competitive baselines, demonstrating its superiority in terms of lower per-word perplexity, more coherent topics, and better document time prediction. |
Keyword | Bayesian Network ; Data Augmentation ; Dropout ; Gibbs Sampling ; Multiple Dependency ; Topic Coupling ; Topic Evolution ; Topic Modeling |
DOI | 10.1145/3451530 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Information Systems ; Computer Science, Software Engineering |
WOS ID | WOS:000675516600011 |
Scopus ID | 2-s2.0-85111140551 |
Document Type | Journal article |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Affiliation | 1. Department of Computer and Information Science, University of Macau, 999078, Macao; 2. Data Science Lab, University of Technology Sydney, Ultimo, 2007, Australia |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Jinjin Guo, Longbing Cao, Zhiguo Gong. Recurrent Coupled Topic Modeling over Sequential Documents[J]. ACM Transactions on Knowledge Discovery from Data, 2022, 16(1). |
APA | Jinjin Guo, Longbing Cao, & Zhiguo Gong (2022). Recurrent Coupled Topic Modeling over Sequential Documents. ACM Transactions on Knowledge Discovery from Data, 16(1). |
MLA | Jinjin Guo, et al. "Recurrent Coupled Topic Modeling over Sequential Documents." ACM Transactions on Knowledge Discovery from Data 16.1 (2022). |