Residential College | false |
Status | Published |
Title | Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data |
Authors | Zhu, Jiadi (1); Yuan, Ziyang (2); Shu, Lianjie (3); Liao, Wenhui (4); Zhao, Mingtao (5); Zhou, Yan (2) |
Date | 2021-03-04 |
Source Publication | Frontiers in Genetics |
ISSN | 1664-8021 |
Volume | 12; Pages: 642227 |
Abstract | Next-generation sequencing has emerged as an essential technology for the quantitative analysis of gene expression. In medical research, RNA sequencing (RNA-seq) data are commonly used to identify which type of disease a patient has. Because RNA-seq data are discrete, the statistical methods developed for microarray data cannot be applied to them directly. Existing statistical methods usually model RNA-seq data with a discrete distribution: the Poisson, the negative binomial, or a mixture of a point mass at zero and a Poisson distribution that allows for an excess of zeros. Corresponding classifiers have been developed for these three distributions: Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). However, when these classifiers are applied to a new, real dataset, it is unclear which of these distributions actually fits the data. Because count datasets frequently exhibit both excess zeros and overdispersion, this paper extends the existing distributions to a mixture of a point mass at zero and a negative binomial distribution, and proposes zero-inflated negative binomial logistic discriminant analysis (ZINBLDA) for classification. More importantly, we compare the four classification methods above in terms of their model parameters, since understanding these parameters is necessary for selecting the optimal method for RNA-seq data. Furthermore, we show that the four methods can transform into one another in some cases. Using simulation studies, we compare and evaluate the performance of these classification methods over a wide range of settings, and we present a decision tree model that helps select the optimal classifier for a new RNA-seq dataset.
The results on the two real datasets agree with the theoretical and simulation results. The methods used in this work are implemented in open-source R scripts, with source code freely available at https://github.com/FocusPaka/ZINBLDA. |
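The four models named in the abstract can be viewed as special cases of one zero-inflated negative binomial (ZINB) count model, which is one way to read the statement that the methods "transform into each other". The sketch below is a minimal illustration, not the authors' ZINBLDA code (that is in the linked R repository). It assumes a common mean/dispersion parameterization (variance mu + phi*mu^2, with phi = 0 meaning Poisson), a zero-inflation weight pi, genes treated as independent, and per-class parameters that are already known. It omits the size-factor normalization and parameter estimation that PLDA/NBLDA-style methods perform. The function names `nb_logpmf`, `zinb_logpmf` and `classify` are hypothetical.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu > 0 and variance mu + phi*mu**2.

    phi == 0 is treated as the Poisson limit (assumed parameterization).
    """
    if phi == 0:
        return y * math.log(mu) - mu - math.lgamma(y + 1)
    r = 1.0 / phi  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_logpmf(y, mu, phi, pi):
    """Zero-inflated NB: extra point mass pi at zero (0 <= pi < 1), NB otherwise.

    pi = 0 gives NB, phi = 0 gives ZIP, and both = 0 gives Poisson.
    """
    if pi == 0:
        return nb_logpmf(y, mu, phi)
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(nb_logpmf(0, mu, phi)))
    return math.log(1 - pi) + nb_logpmf(y, mu, phi)


def classify(x, class_params, priors):
    """Plug-in discriminant rule (hypothetical simplification).

    x: counts for one sample, one entry per gene.
    class_params[k]: list of (mu, phi, pi) tuples, one per gene, for class k.
    priors[k]: prior probability of class k.
    Returns the class maximizing log prior + sum of per-gene log-likelihoods.
    """
    scores = {}
    for k, params in class_params.items():
        score = math.log(priors[k])
        for y, (mu, phi, pi) in zip(x, params):
            score += zinb_logpmf(y, mu, phi, pi)
        scores[k] = score
    return max(scores, key=scores.get)


# Toy example with two genes and two classes (made-up parameters).
class_params = {
    "A": [(5.0, 0.1, 0.0), (1.0, 0.5, 0.3)],
    "B": [(1.0, 0.1, 0.0), (6.0, 0.5, 0.3)],
}
priors = {"A": 0.5, "B": 0.5}
print(classify([6, 0], class_params, priors))  # → A
```

Setting every pi to 0 turns the rule into an NB-based discriminant, setting every phi to 0 gives a zero-inflated Poisson one, and setting both to 0 gives a Poisson one. Under these assumptions, choosing among the four classifiers amounts to deciding whether the data show overdispersion (phi > 0), excess zeros (pi > 0), or both.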
Keyword | Classification; NBLDA; PLDA; RNA-seq data; ZINBLDA; ZIPLDA |
DOI | 10.3389/fgene.2021.642227 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Genetics & Heredity |
WOS Subject | Genetics & Heredity |
WOS ID | WOS:000629970400001 |
Publisher | FRONTIERS MEDIA SA, AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE CH-1015, SWITZERLAND |
Scopus ID | 2-s2.0-85102863457 |
Document Type | Journal article |
Collection | Faculty of Business Administration; Department of Accounting and Information Management |
Corresponding Author | Liao, Wenhui; Zhao, Mingtao; Zhou, Yan |
Affiliation | 1. Department of Mathematics and Statistics, Xidian University, Xi'an, China; 2. Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, Shenzhen, China; 3. Faculty of Business Administration, University of Macau, Macao; 4. Guangdong University of Finance, Guangzhou, China; 5. Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, China |
Recommended Citation GB/T 7714 | Zhu, Jiadi, Yuan, Ziyang, Shu, Lianjie, et al. Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data[J]. Frontiers in Genetics, 2021, 12, 642227. |
APA | Zhu, J., Yuan, Z., Shu, L., Liao, W., Zhao, M., & Zhou, Y. (2021). Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data. Frontiers in Genetics, 12, 642227. |
MLA | Zhu, Jiadi, et al. "Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data." Frontiers in Genetics 12 (2021): 642227. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.