Quick Take
- Gradient-boosted (XGBoost) and ridge logistic regression models were trained on 1.6 million emergency department (ED) visits from electronic health records (EHRs) and temporally validated on ~719,000 visits to predict sepsis within 48 hours, achieving area under the receiver operating characteristic curve (AUROC) ≈ 0.92–0.94.
- Models could be embedded into Epic/Cerner ED decision support to trigger sepsis order sets and help prioritize high‑risk pediatric patients, enabling earlier treatment and targeted pharmacy review workflows.
Why it Matters
- Sepsis is a leading cause of pediatric death; early ED recognition is essential to prevent organ dysfunction, yet existing predictive tools have not reliably identified children before dysfunction occurs, leaving a gap in early prognostic identification.
- That gap complicates ED triage and medication workflows because prior models often focused on patients already meeting sepsis criteria or had suboptimal test characteristics; validated early predictors are needed to support targeted EHR decision support, antimicrobial stewardship, and efficient use of constrained medication and staffing resources.
What They Did
- Used a multisite EHR registry (Epic and Cerner) from five U.S. pediatric ED systems caring for children aged 2 months to <18 years, with 1.6 million visits for model training and ~719,000 temporally held‑out validation visits.
- Trained gradient‑boosted tree (XGBoost) and ridge logistic regression models using features from the first 4 hours of ED care (vital signs, laboratory results, Emergency Severity Index [ESI], and markers of medical complexity).
- Predicted sepsis and septic shock within 48 hours using the Phoenix Sepsis Criteria (PSC); visits that met sepsis criteria during the 4‑hour feature window were excluded from training and evaluation.
- Validated models with temporal holdout (Jan 2021–Dec 2022) and held‑out site testing, and excluded March–Dec 2020 to avoid atypical early‑COVID care.
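The cohort rules above can be sketched as a small date-based filter. This is only an illustration of the described design, not the study's code: the function names are hypothetical, the summary does not say when the training period begins (so every visit before March 2020 is assumed to be training data), and treating exactly 4 hours as inside the feature window is also an assumption.

```python
from datetime import date

# Date boundaries taken from the summary above.
EXCLUDED_START = date(2020, 3, 1)    # start of the excluded early-COVID window
EXCLUDED_END = date(2020, 12, 31)    # end of the excluded early-COVID window
TEST_START = date(2021, 1, 1)        # temporal holdout begins
TEST_END = date(2022, 12, 31)        # temporal holdout ends


def assign_split(visit_date):
    """Return 'train', 'test', or 'excluded' for one ED visit date.

    Assumption: all visits before March 2020 are training visits, because
    the summary does not state the start of the training period.
    """
    if EXCLUDED_START <= visit_date <= EXCLUDED_END:
        return "excluded"
    if TEST_START <= visit_date <= TEST_END:
        return "test"
    if visit_date < EXCLUDED_START:
        return "train"
    return "excluded"  # later than the described validation window


def keep_for_modeling(hours_to_sepsis):
    """Keep a visit unless it met sepsis criteria inside the feature window.

    hours_to_sepsis is None when the visit never met the criteria.
    Assumption: a visit meeting criteria at exactly 4 hours is dropped.
    """
    return hours_to_sepsis is None or hours_to_sepsis > 4.0


def label_within_48h(hours_to_sepsis):
    """Outcome label: 1 if criteria were met within 48 hours, else 0."""
    return int(hours_to_sepsis is not None and hours_to_sepsis <= 48.0)
```

Splitting by calendar date rather than at random is what makes the validation temporal: the model is scored only on visits that occurred after everything it was trained on.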
What They Found
- Models trained on 1.6M ED visits and temporally validated on ~719K achieved AUROCs of 0.923 (ridge) to 0.936 (XGBoost) for PSC sepsis and ≥0.92 for PSC shock, with site AUROCs ranging 0.922–0.945.
- At 90% sensitivity, XGBoost had specificity 0.807 (positive predictive value [PPV] 0.017; number needed to evaluate [NNE] 59) versus ridge specificity 0.779 (PPV 0.015; NNE 68), implying pharmacists may need to review on the order of 59–68 alerts per true sepsis case at that threshold.
- Youden cutpoints increased positive likelihood ratios (LR+) to ~5.8–6.2 and PPV to ~0.022, which corresponds to roughly 45 alerts per true case (1/PPV); sensitivity at those cutpoints is not reported here.
- Event counts: PSC sepsis occurred in 5,634 (0.35%) training visits and 2,639 (0.37%) test visits; PSC shock occurred in 2,446 (0.15%) training visits and 1,062 (0.15%) test visits.
- Top predictors included ESI triage criteria, age‑adjusted vital signs (minimum oxygen saturation, shock index), and markers of medical complexity.
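The alert-burden figures above follow from sensitivity, specificity, and prevalence through Bayes' rule. The sketch below shows that arithmetic, using the reported 90% sensitivity operating points and the test-set sepsis prevalence of about 0.37%. The function name `alert_metrics` is hypothetical, and the inputs are rounded values from this summary, so results are approximate.

```python
def alert_metrics(sensitivity, specificity, prevalence):
    """Derive PPV, NNE, and LR+ for an alert threshold.

    PPV = TP / (TP + FP), with TP and FP expressed per visit.
    NNE (number needed to evaluate) = 1 / PPV.
    LR+ = sensitivity / (1 - specificity).
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    return {
        "ppv": ppv,
        "nne": 1.0 / ppv,
        "lr_plus": sensitivity / (1.0 - specificity),
    }


# Operating points at 90% sensitivity, test-set prevalence ~0.37%.
xgb = alert_metrics(sensitivity=0.90, specificity=0.807, prevalence=0.0037)
ridge = alert_metrics(sensitivity=0.90, specificity=0.779, prevalence=0.0037)

for name, m in (("XGBoost", xgb), ("ridge", ridge)):
    print(f"{name}: PPV={m['ppv']:.3f}  NNE={m['nne']:.0f}  LR+={m['lr_plus']:.2f}")
```

With these rounded inputs the XGBoost result matches the reported PPV of about 0.017 and NNE of about 59. The ridge NNE comes out near 67 rather than 68, a gap that is plausibly due to rounding of the published inputs. The calculation also shows why PPV stays low despite high AUROC: at a prevalence below 0.4%, even a specificity of about 0.8 produces far more false alerts than true ones.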
Takeaways
- Integrate model outputs into Epic/Cerner ED workflows to flag high‑risk pediatric visits, trigger sepsis order sets, and populate pharmacist review workqueues so teams can prioritize early action.
- Plan staffing and thresholds around the alert burden: expect roughly 59–68 alerts per true sepsis case at 90% sensitivity; pilot a two‑step clinician + pharmacist review and tune cutpoints (for example, Youden) to reduce alerts before wider deployment.
- Establish multidisciplinary governance (informatics, ED, pharmacy), monitor PPV/NNE and subgroup performance (age, payer), schedule periodic recalibration, and log model‑driven actions for quality review.
- Operationalize the model as an ‘early‑warning spotlight’ that surfaces candidates for focused human review rather than as an automatic treatment trigger; preserve pharmacist clinical judgment and formalize thresholds and workflow changes through governance.
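Cutpoint tuning of the kind suggested above can be sketched in a few lines. The example compares a Youden cutpoint (the threshold maximizing J = sensitivity + specificity − 1) with the highest threshold that still reaches a target sensitivity. All names and the toy scores are hypothetical and chosen only for illustration; a local pilot would use the site's own validation scores and outcomes.

```python
def confusion_at(scores, labels, threshold):
    """Return (tp, fp, fn, tn) when alerting on score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        alert = score >= threshold
        if alert and label:
            tp += 1
        elif alert:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def youden_threshold(scores, labels):
    """Threshold maximizing Youden's J (needs both classes present)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion_at(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def threshold_for_sensitivity(scores, labels, target=0.90):
    """Highest threshold whose sensitivity still meets the target."""
    for t in sorted(set(scores), reverse=True):
        tp, fp, fn, tn = confusion_at(scores, labels, t)
        if tp / (tp + fn) >= target:
            return t
    return None


# Toy data (hypothetical): 5 sepsis visits, 5 non-sepsis visits.
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

youden_t, youden_j = youden_threshold(toy_scores, toy_labels)
sens90_t = threshold_for_sensitivity(toy_scores, toy_labels, target=0.90)
```

On this toy data the Youden cutpoint sits above the 90% sensitivity cutpoint, so it fires fewer alerts but misses one true case. A governance group would need to weigh that same trade-off explicitly when setting local thresholds.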
Strengths and Limitations
Strengths:
- Large multicenter EHR derivation cohort with temporal and held‑out site validation across five pediatric ED systems, enhancing robustness and generalizability.
- Rigorous hyperparameter tuning and cross‑validation plus Shapley additive explanations (SHAP) for feature importance supported interpretability and facilitated assessment of performance equity.
Limitations:
- Outcomes may reflect care‑process artifacts and EHR missingness (for example, absent oxygen saturation), which can confound physiologic signals and affect transportability across settings.
- Complex gradient‑boosted models pose challenges for explainability and deployment; low event prevalence and payer skew mean prospective local calibration is needed to limit false alerts and alert fatigue.
Bottom Line
Models are ready for pilot integration into ED and pharmacy workflows to prioritize high‑risk pediatric patients and support pharmacist review and sepsis order sets, but require local calibration, two‑step review workflows, and multidisciplinary governance to limit alert burden.