Journal of Peking University (Health Sciences) ›› 2021, Vol. 53 ›› Issue (3): 566-572. doi: 10.19723/j.issn.1671-167X.2021.03.021


Prediction of intensive care unit readmission for critically ill patients based on ensemble learning

LIN Yu1,2, WU Jing-yi3, LIN Ke1,2, HU Yong-hua2,4, KONG Gui-lan1,3,Δ

  1. National Institute of Health Data Science, Peking University, Beijing 100191, China
    2. Department of Epidemiology and Biostatistics, Peking University School of Public Health, Beijing 100191, China
    3. Advanced Institute of Information Technology, Peking University, Hangzhou 311215, China
    4. Peking University Medical Informatics Center, Beijing 100191, China
  • Received: 2019-07-05; Online: 2021-06-18; Published: 2021-06-16
  • Contact: Gui-lan KONG, E-mail: guilan.kong@hsc.pku.edu.cn
  • Supported by:
    National Natural Science Foundation of China (81771938); National Natural Science Foundation of China (91846101); Beijing Municipal Natural Science Foundation (7212201); Project of the University of Michigan Health System-Peking University Health Science Center Joint Institute for Translational and Clinical Research (BMU2020JI011)

Abstract:

Objective: To develop machine learning models for predicting intensive care unit (ICU) readmission using ensemble learning algorithms. Methods: The publicly accessible American ICU database, Medical Information Mart for Intensive Care (MIMIC)-Ⅲ, was used as the data source, and patients were selected according to the inclusion and exclusion criteria. A set of variables with predictive ability for the outcome, including demographics, vital signs, laboratory tests, and comorbidities, was extracted from the dataset. We built ICU readmission prediction models based on ensemble learning methods, including random forest, adaptive boosting (AdaBoost), and gradient boosting decision tree (GBDT), and compared their prediction performance with that of a conventional Logistic regression model. Five-fold cross validation was used to train and validate the prediction models. Average sensitivity, positive predictive value, negative predictive value, false positive rate, false negative rate, area under the receiver operating characteristic curve (AUROC), and Brier score were used as performance measures. After the prediction models were constructed, the top 10 predictive variables ranked by feature importance were identified by the model with the best discrimination. Results: Among the ICU readmission prediction models, GBDT (AUROC=0.858) performed better than random forest (AUROC=0.827) and was slightly superior to AdaBoost (AUROC=0.851) in terms of AUROC. Compared with Logistic regression (AUROC=0.810), the discrimination of the three ensemble learning models was much better. The feature importance provided by GBDT showed that the top-ranking variables included vital signs and laboratory tests. The patients with ICU readmission had higher mean arterial pressure, systolic blood pressure, diastolic blood pressure, and heart rate than the patients without ICU readmission. Meanwhile, the patients readmitted to ICU had lower urine output and higher serum creatinine. Overall, the patients with repeated ICU admissions during their hospitalization showed worse cardiac and renal function than the patients without ICU readmission. Conclusion: The ensemble learning based ICU readmission prediction models performed better than the Logistic regression model. Such ensemble learning models have the potential to help ICU physicians identify patients at high risk of ICU readmission and thus improve overall clinical outcomes.
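The modeling workflow summarized above can be sketched as follows, assuming scikit-learn; the hyperparameters and the variable names X and y are illustrative placeholders, not the settings used in the study.

```python
# Minimal sketch of the model comparison described in the abstract.
# Assumes scikit-learn; hyperparameters and the names X, y are illustrative.
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
}

def evaluate(X, y):
    """Five-fold cross validation with AUROC and Brier score for each model.

    X: feature matrix (demographics, vital signs, laboratory tests, comorbidities)
    y: binary ICU readmission label; both are assumed to be prepared elsewhere.
    """
    for name, model in models.items():
        cv = cross_validate(model, X, y, cv=5,
                            scoring=["roc_auc", "neg_brier_score"])
        print(name,
              "AUROC = %.3f" % cv["test_roc_auc"].mean(),
              "Brier = %.3f" % (-cv["test_neg_brier_score"].mean()))
```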

Key words: Intensive care units, Patient readmission, Machine learning, Predictive value of tests

CLC Number: R459.7

Table 1  Performance comparison between random under-sampling and NearMiss methods

Classifier            Accuracy (random under-sampling)   Accuracy (NearMiss)
Logistic regression   0.615                              0.838
Random forest         0.542                              0.844
AdaBoost              0.620                              0.873
GBDT                  0.626                              0.874
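A minimal sketch of the two under-sampling strategies compared in Table 1, assuming the imbalanced-learn package; the parameter choices are illustrative, not those used in the study.

```python
# Class balancing by under-sampling, assuming imbalanced-learn (imblearn).
from imblearn.under_sampling import RandomUnderSampler, NearMiss

def balance(X, y, method="nearmiss"):
    """Return a class-balanced training set using the chosen strategy."""
    if method == "random":
        sampler = RandomUnderSampler(random_state=0)  # drop majority samples at random
    else:
        sampler = NearMiss(version=1)  # keep majority samples closest to the minority class
    return sampler.fit_resample(X, y)
```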

Figure 1  Recursive feature elimination based on Logistic regression. RFE, recursive feature elimination; LR, Logistic regression.
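A sketch of recursive feature elimination with a Logistic regression base estimator, as in Figure 1, assuming scikit-learn; the target number of features is an illustrative value.

```python
# Recursive feature elimination (RFE) with Logistic regression, per Figure 1.
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

def select_features(X, y, n_features=20):
    rfe = RFE(estimator=LogisticRegression(max_iter=1000),
              n_features_to_select=n_features)
    rfe.fit(X, y)
    return rfe.support_  # boolean mask of the retained variables
```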

Figure 2  Logit function
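For reference, the logit (log-odds) link plotted in Figure 2 and the Logistic regression model it induces; the coefficients β are generic symbols, not estimates from the study.

$$\operatorname{logit}(p)=\ln\frac{p}{1-p}, \qquad p=\frac{1}{1+e^{-(\beta_0+\beta_1 x_1+\cdots+\beta_k x_k)}}$$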

Figure 3  Decision tree. a, b, c, and d represent the thresholds of feature values used in node splitting.
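A sketch of how node-splitting thresholds such as a, b, c, and d in Figure 3 can be read off a fitted tree, assuming scikit-learn; the depth and function name are illustrative.

```python
# Inspecting split thresholds of a fitted decision tree (cf. Figure 3).
from sklearn.tree import DecisionTreeClassifier, export_text

def show_split_thresholds(X, y, feature_names):
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    # Prints one "feature <= threshold" rule per internal node.
    print(export_text(tree, feature_names=feature_names))
```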

Table 2  The characteristics of critically ill patients with and without ICU readmission

Characteristics                        Total             Patients with ICU readmission   Patients without ICU readmission   P value
Age^b/years, $\bar{x} \pm s$           62.67±16.22       64.08±15.16                     62.58±16.29                        <0.001
Gender (female)^a, n (%)               11 319 (42.38)    671 (41.69)                     10 648 (42.49)                     0.16
ICU stay^b/d, $\bar{x} \pm s$          4.56±5.60         5.46±6.12                       4.50±5.56                          <0.001
GCS total score^b, $\bar{x} \pm s$     14.33±1.42        14.17±1.58                      14.34±1.41                         <0.001
GCS motor^b, $\bar{x} \pm s$           5.89±0.50         5.88±0.50                       5.89±0.50                          0.08
GCS verbal^b, $\bar{x} \pm s$          4.41±1.28         4.40±1.18                       4.41±1.29                          <0.001
GCS eyes^b, $\bar{x} \pm s$            3.75±0.54         3.73±0.56                       3.75±0.54                          0.10
Admission type^a, n (%)                                                                                                     <0.001
  Medical                              18 208 (68.17)    1 054 (63.92)                   17 154 (68.45)
  Scheduled surgery                    3 620 (13.55)     179 (10.86)                     3 441 (13.73)
  Unscheduled surgery                  4 881 (18.27)     416 (25.23)                     4 465 (17.82)
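Group comparisons of the kind reported in Table 2 are commonly obtained with a two-sample t-test for continuous characteristics and a chi-square test for categorical ones; a sketch assuming SciPy, with illustrative argument names (the study's exact tests are not stated here).

```python
# Plausible group comparison for Table 2-style characteristics, assuming SciPy.
from scipy.stats import ttest_ind, chi2_contingency

def compare_groups(readmitted_values, not_readmitted_values, contingency_table):
    # Continuous variable (e.g. age): two-sample t-test between the groups.
    t_stat, p_continuous = ttest_ind(readmitted_values, not_readmitted_values)
    # Categorical variable (e.g. gender, admission type): chi-square test
    # on a groups-by-categories contingency table.
    chi2, p_categorical, dof, expected = chi2_contingency(contingency_table)
    return p_continuous, p_categorical
```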

Table 3  The prediction performance of Logistic regression, random forest, AdaBoost, and GBDT

Performance measure   Logistic regression   Random forest   AdaBoost      GBDT
Sensitivity           0.763±0.029           0.787±0.010     0.821±0.029   0.817±0.028
PPV                   0.843±0.022           0.858±0.056     0.876±0.040   0.892±0.038
NPV                   0.784±0.018           0.802±0.012     0.832±0.020   0.832±0.017
FPR                   0.143±0.026           0.134±0.057     0.120±0.044   0.101±0.040
FNR                   0.237±0.029           0.213±0.010     0.179±0.029   0.183±0.028
AUROC                 0.810±0.013           0.827±0.029     0.851±0.017   0.858±0.013
Brier score           0.190±0.013           0.173±0.029     0.149±0.017   0.142±0.013
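A sketch of how the measures in Table 3 can be computed from a binary confusion matrix and predicted probabilities, assuming scikit-learn; y_true, y_pred, and y_prob are illustrative names for cross-validation outputs.

```python
# Performance measures of Table 3 from labels, predictions, and probabilities.
from sklearn.metrics import confusion_matrix, roc_auc_score, brier_score_loss

def performance(y_true, y_pred, y_prob):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "Sensitivity": tp / (tp + fn),
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
        "FPR": fp / (fp + tn),   # false positive rate
        "FNR": fn / (fn + tp),   # false negative rate
        "AUROC": roc_auc_score(y_true, y_prob),
        "Brier score": brier_score_loss(y_true, y_prob),
    }
```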

Table 4  Top 10 variables identified by feature importance based on the GBDT model and their distributions

Rank   Variables                                   Importance   With ICU readmission, $\bar{x} \pm s$   Without ICU readmission, $\bar{x} \pm s$
1      Platelet/(×10³/μL)                          0.109        240.05±154.37                           230.05±131.88
2      Glucose (maximum)/(mg/dL)                   0.093        208.36±109.04                           199.68±104.80
3      Urine output/mL                             0.089        1 908.70±1 111.36                       2 089.51±1 129.90
4      Mean arterial pressure (maximum)/mmHg       0.067        122.04±43.03                            116.87±38.68
5      Glucose (minimum)/(mg/dL)                   0.058        96.06±33.03                             99.83±35.78
6      Heart rate (maximum)/(beats/min)            0.058        114.38±25.07                            109.35±23.49
7      Creatinine/(mg/dL)                          0.053        1.33±1.24                               1.19±1.23
8      Systolic blood pressure (maximum)/mmHg      0.042        160.42±29.69                            158.04±28.14
9      Diastolic blood pressure (minimum)/mmHg     0.035        40.47±12.59                             41.99±12.85
10     Heart rate (minimum)/(beats/min)            0.031        68.79±15.12                             68.30±14.46
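A sketch of how a Table 4-style ranking can be produced from a fitted GBDT model, assuming scikit-learn; feature_names is an illustrative list of the input variable names in column order.

```python
# Ranking variables by GBDT feature importance, as in Table 4.
from sklearn.ensemble import GradientBoostingClassifier

def top_features(X, y, feature_names, k=10):
    gbdt = GradientBoostingClassifier(random_state=0).fit(X, y)
    ranked = sorted(zip(feature_names, gbdt.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]  # (variable, importance) pairs, highest first
```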