Status: Published
Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data
Zhu, Jiadi (1); Yuan, Ziyang (2); Shu, Lianjie (3); Liao, Wenhui (4); Zhao, Mingtao (5); Zhou, Yan (2)
2021-03-04
Source Publication: Frontiers in Genetics
ISSN: 1664-8021
Volume: 12, Article: 642227
Abstract

Next-generation sequencing has emerged as an essential technology for the quantitative analysis of gene expression. In medical research, RNA sequencing (RNA-seq) data are commonly used to identify which type of disease a patient has. Because RNA-seq data are discrete counts, statistical methods developed for microarray data cannot be applied to them directly. Existing methods usually model RNA-seq data with a discrete distribution: the Poisson, the negative binomial, or, to allow for an excess of zeros, a mixture of a point mass at zero and a Poisson distribution. Classifiers corresponding to these three distributions have been developed: Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). However, when one of these classifiers is applied to a new, real dataset, it is unclear which distribution the data actually follow. Because count datasets are frequently characterized by both excess zeros and overdispersion, this paper extends the existing models to a mixture of a point mass at zero and a negative binomial distribution and proposes zero-inflated negative binomial logistic discriminant analysis (ZINBLDA) for classification. More importantly, we compare the four classification methods from the perspective of their model parameters, since understanding the parameters is necessary for selecting the optimal method for RNA-seq data. Furthermore, we show that the four methods can transform into one another in some cases. Using simulation studies, we compare and evaluate the performance of these classification methods in a wide range of settings, and we present a decision tree model to help select the optimal classifier for a new RNA-seq dataset.
The results on two real datasets agree with the theoretical and simulation results. The methods used in this work are implemented in open-source R scripts, with source code freely available at https://github.com/FocusPaka/ZINBLDA.
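
The statement that the four models can transform into one another can be illustrated with a small numerical sketch. It is not taken from the paper or its R scripts. It assumes a common mean/dispersion parameterization of the negative binomial (mean mu > 0, variance mu + phi * mu^2) and a zero-inflation probability pi0; the function names nb_pmf and zinb_pmf are hypothetical. Under these assumptions, setting pi0 = 0 recovers the negative binomial, setting phi = 0 recovers the zero-inflated Poisson, and setting both to zero recovers the Poisson.

```python
import math


def nb_pmf(y, mu, phi):
    """Negative binomial pmf with mean mu > 0 and dispersion phi >= 0.

    Assumed parameterization: variance = mu + phi * mu**2.
    phi == 0 is treated as the Poisson limit.
    """
    if phi <= 0:
        # Poisson pmf, computed on the log scale for stability.
        return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
    r = 1.0 / phi  # "size" parameter of the negative binomial
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, phi, pi0):
    """Zero-inflated negative binomial pmf.

    With probability pi0 the count is a structural zero; otherwise it is
    drawn from the negative binomial defined by (mu, phi).
    """
    point_mass = pi0 if y == 0 else 0.0
    return point_mass + (1.0 - pi0) * nb_pmf(y, mu, phi)


if __name__ == "__main__":
    mu = 3.0
    for name, phi, pi0 in [
        ("Poisson", 0.0, 0.0),
        ("NB", 0.5, 0.0),
        ("ZIP", 0.0, 0.2),
        ("ZINB", 0.5, 0.2),
    ]:
        # Probability of observing a zero count under each special case.
        print(name, round(zinb_pmf(0, mu, phi, pi0), 4))
```

Comparing the probability of a zero count across the four settings shows how zero inflation and overdispersion each raise the share of zeros relative to the plain Poisson model.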

Keywords: classification; NBLDA; PLDA; RNA-seq data; ZINBLDA; ZIPLDA
DOI: 10.3389/fgene.2021.642227
Indexed By: SCIE
Language: English
WOS Research Area: Genetics & Heredity
WOS Subject: Genetics & Heredity
WOS ID: WOS:000629970400001
Publisher: FRONTIERS MEDIA SA, AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE CH-1015, SWITZERLAND
Scopus ID: 2-s2.0-85102863457
Document Type: Journal article
Collection: Faculty of Business Administration; Department of Accounting and Information Management
Corresponding Author: Liao, Wenhui; Zhao, Mingtao; Zhou, Yan
Affiliation:
1. Department of Mathematics and Statistics, Xidian University, Xi'an, China
2. Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, Shenzhen, China
3. Faculty of Business Administration, University of Macau, Macao
4. GuangDong University of Finance, Guangzhou, China
5. Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, China
Recommended Citation
GB/T 7714: Zhu, Jiadi, Yuan, Ziyang, Shu, Lianjie, et al. Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data[J]. Frontiers in Genetics, 2021, 12, 642227.
APA: Zhu, Jiadi, Yuan, Ziyang, Shu, Lianjie, Liao, Wenhui, Zhao, Mingtao, & Zhou, Yan (2021). Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data. Frontiers in Genetics, 12, 642227.
MLA: Zhu, Jiadi, et al. "Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data." Frontiers in Genetics 12 (2021): 642227.