Journal of Peking University (Health Sciences), 2021, Vol. 53, Issue 6: 1163-1170. doi: 10.19723/j.issn.1671-167X.2021.06.026
WU Jing-yi¹,², LIN Yu¹, LIN Ke¹, HU Yong-hua¹,³, KONG Gui-lan²,⁴,△
Abstract:
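Objective: To build classification models that predict the length of ICU stay (LOS-ICU) of intensive care unit (ICU) patients using three machine learning algorithms, support vector machine (SVM), classification and regression tree (CART) and random forest (RF), and to compare them with the traditional customized simplified acute physiology score Ⅱ (SAPS-Ⅱ) model.

Methods: Using the large US critical care database Medical Information Mart for Intensive Care Ⅲ (MIMIC-Ⅲ), with the occurrence of prolonged LOS-ICU (pLOS-ICU) as the outcome, we built customized SAPS-Ⅱ, SVM, CART and RF models. Recursive feature elimination was used for feature selection, and the best prediction model was identified by five-fold cross-validation. Predictive performance was evaluated with the Brier score, the area under the receiver operating characteristic (ROC) curve (AUROC) and the estimated calibration index (ECI); lower Brier and ECI values and a higher AUROC indicate better performance. Performance metrics were compared between models using two-sided t tests. The five most important predictors were reported from the variable-importance ranking produced by the best-performing model.

Results: A total of 40 200 ICU patients were included, of whom 23.7% developed pLOS-ICU; 57.6% were male, and the mean age was (61.9±16.5) years. Five-fold cross-validation showed that all three machine learning models outperformed the customized SAPS-Ⅱ model on every metric, with statistically significant differences (P<0.01). The RF model performed best in overall performance (Brier score), discrimination (AUROC) and calibration (ECI), with a Brier score, AUROC and ECI of 0.145, 0.770 and 7.259, respectively. Calibration curves showed that the RF model tended to slightly overestimate the risk in ICU patients at high risk of pLOS-ICU and slightly underestimate it in those at low risk. The five most important predictors of pLOS-ICU identified by the RF model were, in order, age, heart rate, systolic blood pressure, body temperature, and the ratio of arterial partial pressure of oxygen to the fraction of inspired oxygen (PaO2/FiO2).

Conclusion: Compared with the traditional customized SAPS-Ⅱ model, machine learning models for predicting pLOS-ICU in ICU patients achieve clearly better predictive performance. The RF-based model performs best and has considerable potential for clinical application.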
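The Methods describe scoring each candidate model by five-fold cross-validation with the Brier score and AUROC. The following is a minimal, dependency-free Python sketch of that evaluation loop under stated assumptions; it is not the authors' code. Assumptions: the folds are stratified by outcome (the abstract only says "five-fold"); `fit_predict`, `stratified_folds` and `cross_validate` are hypothetical names, with `fit_predict` standing in for any of the models (customized SAPS-Ⅱ, SVM, CART, RF), which in practice would be fitted, together with recursive feature elimination, on the training indices only. The ECI and the two-sided t tests are left out because the abstract does not say how the calibration curve was fitted or how fold results were compared.

```python
import random
from typing import Callable, List, Sequence, Tuple


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auroc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; average ranks make tied pairs count 1/2."""
    n = len(y_true)
    order = sorted(range(n), key=lambda i: p_pred[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied predicted probabilities.
        while j + 1 < n and p_pred[order[j + 1]] == p_pred[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_folds(y: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: deal each outcome class round-robin into k folds
    so every fold keeps roughly the overall outcome prevalence (assumed stratification)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# fit_predict(train_idx, test_idx) -> predicted probabilities for test_idx (hypothetical).
FitPredict = Callable[[List[int], List[int]], List[float]]


def cross_validate(y: Sequence[int], fit_predict: FitPredict, k: int = 5) -> List[Tuple[float, float]]:
    """Return one (Brier score, AUROC) pair per held-out fold."""
    results = []
    folds = stratified_folds(y, k)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        probs = fit_predict(train_idx, test_idx)
        y_test = [y[i] for i in test_idx]
        results.append((brier_score(y_test, probs), auroc(y_test, probs)))
    return results


if __name__ == "__main__":
    # Toy cohort with 24% positives (illustrative, not MIMIC-III data).
    y = [1] * 24 + [0] * 76

    def prevalence_model(train_idx: List[int], test_idx: List[int]) -> List[float]:
        # Baseline "model": predict the training-fold prevalence for every test patient.
        prev = sum(y[i] for i in train_idx) / len(train_idx)
        return [prev] * len(test_idx)

    for fold, (brier, auc) in enumerate(cross_validate(y, prevalence_model), 1):
        print(f"fold {fold}: Brier={brier:.3f}, AUROC={auc:.3f}")
```

A constant predictor such as this baseline always scores an AUROC of 0.5, which makes it a useful sanity check before plugging in a real model. Computing AUROC from ranks keeps the cost at O(n log n); the pairwise definition would need roughly 290 million comparisons for a cohort of 40 200 patients with a 23.7% event rate.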