Status: Published
Identification of cancer-related long non-coding RNAs using XGBoost with high accuracy
Zhang, Xuan (1,2); Li, Tianjun (3); Wang, Jun (4); Li, Jing (1); Chen, Long (3); Liu, Changning (1)
2019-08
Source Publication: Frontiers in Genetics
Volume: 10
Abstract

In the past decade, hundreds of long noncoding RNAs (lncRNAs) have been identified as significant players in diverse types of cancer; however, the functions and mechanisms of most lncRNAs in cancer remain unclear. Several computational methods have been developed to detect associations between cancer and lncRNAs, yet those approaches have limitations in both sensitivity and specificity. With the goal of improving the prediction accuracy for associations of lncRNA with cancer, we upgraded our previously developed cancer-related lncRNA classifier, CRlncRC, to generate CRlncRC2. CRlncRC2 is an eXtreme Gradient Boosting (XGBoost) machine learning framework, including Synthetic Minority Over-sampling Technique (SMOTE)-based over-sampling, along with Laplacian Score-based feature selection. Ten-fold cross-validation showed that the AUC value of CRlncRC2 for identification of cancer-related lncRNAs is much higher than previously reported by CRlncRC and others. Compared with CRlncRC, the number of features used by CRlncRC2 dropped from 85 to 51. Finally, we identified 439 cancer-related lncRNA candidates using CRlncRC2. To evaluate the accuracy of the predictions, we first consulted the cancer-related long non-coding RNA database Lnc2Cancer v2.0 and relevant literature for supporting information, then conducted statistical analysis of somatic mutations, distance from cancer genes, and differential expression in tumor tissues, using various data sets. The results showed that our approach was highly reliable for identifying cancer-related lncRNA candidates. Notably, the highest ranked candidate, lncRNA AC074117.1, has not been reported previously; however, integrated multi-omics analyses demonstrate that it is the target of multiple cancer-related miRNAs and interacts with adjacent protein-coding genes, suggesting that it may act as a cancer-related competing endogenous RNA, which warrants further investigation. 
In conclusion, CRlncRC2 is an effective and accurate method for identification of cancer-related lncRNAs, and has potential to contribute to the functional annotation of lncRNAs and guide cancer therapy.
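
The abstract names SMOTE-based over-sampling as one component of CRlncRC2 but gives no implementation details. As a rough illustration of the general idea only, the sketch below creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are all hypothetical; this simplified variant picks base samples at random, and it should not be read as the authors' code.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built by SMOTE-style interpolation.

    Simplified sketch: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (made up): 3 positive samples versus 8 negatives, 2 features each.
positives = [[0.9, 0.8], [1.0, 1.1], [1.2, 0.9]]
negatives = [[0.1 * i, 0.05 * i] for i in range(8)]

# Over-sample the minority class until both classes have the same size.
new_positives = smote(positives, n_new=len(negatives) - len(positives), k=2)
balanced_positives = positives + new_positives
```

In a pipeline like the one described, over-sampling of this kind would normally be applied to the training folds only, so that synthetic samples do not leak into the held-out fold during cross-validation.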

Keywords: Cancer; Long Noncoding RNA; Machine Learning; Synthetic Minority Over-sampling Technique; XGBoost
DOI: 10.3389/fgene.2019.00735
Indexed By: SCIE
Language: English
WOS Research Area: Genetics & Heredity
WOS ID: WOS:000480270100003
Scopus ID: 2-s2.0-85070591123
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Chen, Long; Liu, Changning
Affiliation:
1. Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
3. University of Macau
4. Central South University
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Zhang, Xuan, Li, Tianjun, Wang, Jun, et al. Identification of cancer-related long non-coding RNAs using XGBoost with high accuracy[J]. Frontiers in Genetics, 2019, 10.
APA: Zhang, X., Li, T., Wang, J., Li, J., Chen, L., & Liu, C. (2019). Identification of cancer-related long non-coding RNAs using XGBoost with high accuracy. Frontiers in Genetics, 10.
MLA: Zhang, Xuan, et al. "Identification of cancer-related long non-coding RNAs using XGBoost with high accuracy." Frontiers in Genetics 10 (2019).
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.