Journal of Peking University (Health Sciences), 2019, Vol. 51, Issue (4): 615-622. doi: 10.19723/j.issn.1671-167X.2019.04.003

• Article •

Construction of prognostic model and identification of prognostic biomarkers based on the expression of long non-coding RNA in bladder cancer via bioinformatics

Fei-long YANG, Kai HONG, Guo-jiang ZHAO, Cheng LIU, Yi-meng SONG, Lu-lin MA

  1. Department of Urology, Peking University Third Hospital, Beijing 100191, China
  • Received: 2019-03-13 Online: 2019-08-18 Published: 2019-09-03
  • Contact: Yi-meng SONG, Lu-lin MA E-mail: song_yimeng@126.com; malulin@medmail.com.cn
  • Supported by:
    the National Natural Science Foundation of China (81711530048, 81572515) and the Beijing Municipal Science & Technology Commission (Z151100003915105)

Abstract:

Objective: To construct a prognostic model and identify prognostic biomarkers based on long non-coding RNA (lncRNA) in bladder cancer.

Methods: The lncRNA expression data and corresponding clinical data of bladder cancer were collected from The Cancer Genome Atlas (TCGA) database. Perl, R and R packages were used for data integration, extraction, analysis and visualization. Specifically, the R package “edgeR” was used to screen lncRNA that were differentially expressed in bladder cancer tissues compared with normal bladder samples. Univariate Cox regression and least absolute shrinkage and selection operator (Lasso) regression were performed to identify key lncRNA, which were then used to construct the prognostic model by multivariate Cox regression. Patients were divided into a high-risk group and a low-risk group according to the median risk score, and the prognostic power of the model was evaluated with Kaplan-Meier (K-M) survival curves, the receiver operating characteristic (ROC) curve and the C-index. In addition, the hazard ratio (HR) and 95% confidence interval (CI) of each key lncRNA were calculated by multivariate Cox regression, and K-M survival analysis was performed for each key lncRNA that was significant in the multivariate Cox regression.

Results: A total of 691 differentially expressed lncRNA were identified. Univariate Cox regression showed that 35 of them might be associated with the prognosis of bladder cancer, and 23 of these were identified by Lasso regression as key lncRNA affecting prognosis. The overall survival time of the low-risk group was significantly longer than that of the high-risk group [(2.85±2.72) years vs. (1.58±1.51) years, P<0.001]. The area under the ROC curve (AUC) was 0.813 for 3-year survival and 0.778 for 5-year survival, and the C-index was 0.73. Multivariate Cox regression showed that 11 of the 23 key lncRNA were statistically significant. K-M survival analysis further indicated that 3 of these lncRNA may have independent prognostic value: AL589765.1 (P = 0.004), AC023824.1 (P = 0.022) and PKN2-AS1 (P = 0.016).

Conclusion: The present study constructed a prognostic model for bladder cancer based on the expression levels of 23 lncRNA, with moderate predictive accuracy, and identified one protective prognostic biomarker, AL589765.1, and two adverse prognostic biomarkers, AC023824.1 and PKN2-AS1.
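The median split on the risk score and the C-index mentioned in the Methods can be illustrated with a minimal Python sketch. It is not the authors' R code. The gene names, coefficients, expression values and survival data below are hypothetical. The risk score is assumed to be the usual Cox linear predictor (the sum of coefficient times expression), which the abstract does not state explicitly.

```python
from statistics import median


def risk_scores(expr_rows, coefs):
    """Cox linear predictor per patient: sum of coefficient * expression."""
    return [sum(coefs[g] * row[g] for g in coefs) for row in expr_rows]


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def harrell_c_index(times, events, scores):
    """Harrell's C: share of usable pairs in which the patient who died
    earlier also has the higher risk score (ties in score count as 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy cohort (not data from the study).
coefs = {"lncA": 0.5, "lncB": -0.3}          # assumed Cox coefficients
expr = [
    {"lncA": 2.0, "lncB": 1.0},
    {"lncA": 0.5, "lncB": 2.0},
    {"lncA": 3.0, "lncB": 0.0},
    {"lncA": 1.0, "lncB": 1.0},
]
times = [1.5, 4.0, 0.8, 3.0]                 # follow-up in years
events = [1, 0, 1, 1]                        # 1 = death observed, 0 = censored

scores = risk_scores(expr, coefs)
groups = split_by_median(scores)
c_index = harrell_c_index(times, events, scores)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported value of 0.73 should be read.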

Key words: Long non-coding RNA, Prognostic model, Prognostic biomarker, Bladder cancer, Bioinformatics

CLC Number:

  • R737.11

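Figure 1

Differential expression analysis of lncRNA and Lasso regression analysis

Figure 2

Heatmap of the lncRNA selected by univariate Cox regression analysis; the first 19 genes are highly expressed and the last 16 genes are lowly expressed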

Table 1

Baseline clinical characteristics of patients with bladder transitional cell carcinoma in the TCGA database

Clinical characteristics Value
Gender, n
Male 303
Female 106
Age/years, x̄±s 68.1±5.6
Grade
High 385
Low 21
Unknown 3
TNM stage
Stage I 2
Stage II 130
Stage III 139
Stage IV 136
Unknown 2
T
T0 1
T1 3
T2 120
T3 194
T4 59
Unknown 32
N
N0 237
N1 47
N2 76
N3 8
Unknown 41
M
M0 194
M1 11
Unknown 204

Table 2

lncRNA name Ensembl ID HR P
AC005008.2 ENSG00000237896 1.54 0.001
AL159153.1 ENSG00000275611 1.17 <0.001
AC025437.2 ENSG00000253424 1.21 0.001
AC073316.2 ENSG00000231892 0.89 0.002
AC104793.1 ENSG00000249568 1.22 0.002
AC087071.1 ENSG00000229196 1.10 0.003
KRT73-AS1 ENSG00000257495 1.11 0.005
ADAMTS9-AS1 ENSG00000241158 1.09 0.006
AC023824.1 ENSG00000260073 1.10 0.007
AL589765.1 ENSG00000227045 0.89 0.008
AL139130.1 ENSG00000237390 0.89 0.008
AC092725.1 ENSG00000261482 1.20 0.009
MYO16-AS1 ENSG00000236242 1.08 0.009
AL391704.1 ENSG00000224750 1.10 0.011
AP002812.5 ENSG00000255449 0.90 0.011
LINC02474 ENSG00000228437 1.05 0.015
AL137804.1 ENSG00000255525 1.14 0.018
AC105053.1 ENSG00000229498 0.87 0.018
PKN2-AS1 ENSG00000237505 1.07 0.019
LINC00536 ENSG00000249917 1.09 0.019
AC104071.1 ENSG00000251434 1.17 0.023
RGMB-AS1 ENSG00000246763 1.13 0.024
LINC00608 ENSG00000236445 0.85 0.025
AL023584.1 ENSG00000233138 1.15 0.027
AL138885.3 ENSG00000231056 1.10 0.031
LINC01468 ENSG00000231131 1.06 0.033
AF279873.3 ENSG00000253642 1.06 0.034
AC020558.1 ENSG00000264666 1.07 0.035
AC026469.1 ENSG00000275088 0.91 0.038
AL138789.1 ENSG00000233589 1.07 0.039
AL356489.2 ENSG00000260947 1.08 0.039
AL353804.1 ENSG00000228906 1.08 0.041
AC104472.1 ENSG00000214919 0.90 0.044
AC004973.1 ENSG00000226661 0.90 0.046
LARGE-IT1 ENSG00000232081 1.13 0.047

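Figure 3

Evaluation of the bladder cancer prognostic model by K-M survival analysis and ROC curves

Table 3

Results of the multivariate Cox regression analysis of the 23 key lncRNA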

lncRNA name n HR (95%CI) P
AC004973.1 406 0.87 (0.77 - 0.99) 0.029
AC005008.2 406 1.73 (1.40 - 2.13) <0.001
AC023824.1 406 1.10 (1.02 - 1.19) 0.019
AC025437.2 406 1.28 (1.09 - 1.50) 0.003
AC026469.1 406 0.84 (0.75 - 0.93) 0.001
AC073316.2 406 0.94 (0.87 - 1.02) 0.124
AC087071.1 406 1.03 (0.95 - 1.12) 0.501
AC092725.1 406 1.34 (1.12 - 1.61) 0.002
AC104071.1 406 1.13 (0.97 - 1.31) 0.113
AC104793.1 406 1.12 (0.99 - 1.28) 0.076
AC105053.1 406 0.85 (0.75 - 0.96) 0.009
ADAMTS9-AS1 406 1.04 (0.97 - 1.12) 0.266
AL023584.1 406 1.15 (0.99 - 1.33) 0.070
AL137804.1 406 1.01 (0.89 - 1.14) 0.885
AL139130.1 406 0.93 (0.86 - 1.02) 0.137
AL159153.1 406 1.07 (0.96 - 1.19) 0.196
AL356489.2 406 1.12 (1.02 - 1.23) 0.020
AL589765.1 406 0.91 (0.84 - 0.99) 0.028
AP002812.5 406 0.94 (0.86 - 1.03) 0.189
LINC00608 406 0.74 (0.63 - 0.87) <0.001
LINC02474 406 1.05 (0.99 - 1.11) 0.105
MYO16-AS1 406 0.98 (0.90 - 1.07) 0.679
PKN2-AS1 406 1.11 (1.03 - 1.19) 0.007
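As a reading aid for Table 3, the sketch below transcribes the HR and P columns, keeps the rows with P < 0.05, and separates protective (HR < 1) from adverse (HR > 1) lncRNA. It also converts each HR to a Cox coefficient using the standard relation HR = exp(β). This is an illustration in Python rather than the authors' workflow. The 0.05 threshold is an assumption, "<0.001" is stored as 0.001 (an upper bound), and the coefficients are only approximate because the published HRs are rounded.

```python
import math

# (lncRNA name, HR, P) transcribed from Table 3.
# "<0.001" is stored as 0.001, i.e. an upper bound on the true P value.
TABLE3 = [
    ("AC004973.1", 0.87, 0.029), ("AC005008.2", 1.73, 0.001),
    ("AC023824.1", 1.10, 0.019), ("AC025437.2", 1.28, 0.003),
    ("AC026469.1", 0.84, 0.001), ("AC073316.2", 0.94, 0.124),
    ("AC087071.1", 1.03, 0.501), ("AC092725.1", 1.34, 0.002),
    ("AC104071.1", 1.13, 0.113), ("AC104793.1", 1.12, 0.076),
    ("AC105053.1", 0.85, 0.009), ("ADAMTS9-AS1", 1.04, 0.266),
    ("AL023584.1", 1.15, 0.070), ("AL137804.1", 1.01, 0.885),
    ("AL139130.1", 0.93, 0.137), ("AL159153.1", 1.07, 0.196),
    ("AL356489.2", 1.12, 0.020), ("AL589765.1", 0.91, 0.028),
    ("AP002812.5", 0.94, 0.189), ("LINC00608", 0.74, 0.001),
    ("LINC02474", 1.05, 0.105), ("MYO16-AS1", 0.98, 0.679),
    ("PKN2-AS1", 1.11, 0.007),
]

ALPHA = 0.05  # assumed significance threshold

# Rows that are significant in the multivariate model.
significant = [(name, hr, p) for name, hr, p in TABLE3 if p < ALPHA]

# HR < 1: higher expression goes with lower hazard (protective);
# HR > 1: higher expression goes with higher hazard (adverse).
protective = [name for name, hr, _ in significant if hr < 1]
adverse = [name for name, hr, _ in significant if hr > 1]

# Approximate Cox coefficients, beta = ln(HR), for all 23 lncRNA.
coefs = {name: math.log(hr) for name, hr, _ in TABLE3}

print(len(significant), "significant:", len(protective), "protective,",
      len(adverse), "adverse")
```

The filter returns the 11 significant lncRNA reported in the abstract. Of these, only AL589765.1, AC023824.1 and PKN2-AS1 also showed a significant difference in the subsequent K-M survival analysis.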

Figure 4

K-M survival analysis results for the 11 statistically significant key lncRNA from the multivariate Cox regression analysis
