Quick Take

  • Transfer learning (TL) raised 90‑day mortality area under the precision‑recall curve (AUPRC) from 0.37 to 0.54 (+38%) in the esophageal cohort, and improved AUPRC by 14% in liver and 8% in pancreatic cohorts (all p<0.001).
  • Pharmacy implication: TL enables building accurate, locally‑fine‑tuned predictive models from limited local data, allowing prioritization of highest‑risk patients/orders and reducing manual review burden.

Why it Matters

  • Data scarcity prevents reliable, high‑dimensional risk modeling for low‑frequency, high‑impact events; conventional scores (American Society of Anesthesiologists Physical Status, ASA‑PS) and small 'homegrown' models can miss patients or overfit to noise, undermining safety and wasting pharmacist time.
  • Transfer learning (TL) leverages large pre‑trained models with local fine‑tuning to stabilize predictions and recover clinically plausible drivers (age, Charlson Comorbidity Index), enabling accurate stratification where local samples are too small for standalone machine learning.
  • For inpatient pharmacy this creates a feasible path to trustworthy prediction for rare adverse drug events (ADEs), therapeutic drug monitoring (TDM) edge cases, and risk‑sorted pharmacist workflows—reducing manual review, strengthening stewardship, and improving clinical decision support under resource constraints.

What They Did

  • Retrospective multicenter cohort from three tertiary hospitals (2015–2023) including 14,922 advanced general surgery patients; primary endpoint was 90‑day mortality.
  • Trained large source neural networks (NNs) on 85 preoperative variables using non‑target cases, then transferred and fine‑tuned the models for each organ‑specific cohort (esophageal, liver, pancreatic, and colorectal).
  • Compared TL models to conventional organ‑specific neural nets and to ASA‑PS and Charlson Comorbidity Index (CCI) scores, evaluating area under the receiver operating characteristic curve (AUROC), area under the precision‑recall curve (AUPRC), and F1 with cross‑validation and held‑out test splits.
  • Design notes: retrospective electronic health record data with k‑nearest neighbors (k‑NN) imputation for missing values, Shapley Additive Explanations (SHAP) for feature importance, nested hyperparameter tuning, and no external validation beyond the three hospitals.
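The pretrain‑then‑fine‑tune recipe described above can be sketched in a few lines. This is an illustrative toy, not the authors' code: the synthetic cohorts, the tiny scikit‑learn MLP, and the logistic‑regression "head" are all stand‑ins for the paper's 85‑variable architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: a large "source" cohort (non-target cases) and a
# small "target" cohort that share the same risk-driving features
# (think age- and CCI-like variables). Sizes are illustrative only.
def make_cohort(n, seed):
    r = np.random.default_rng(seed)
    X = r.normal(size=(n, 10))
    logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 2.5
    y = (r.random(n) < 1 / (1 + np.exp(-logits))).astype(int)
    return X, y

X_src, y_src = make_cohort(5000, 1)   # large pool of non-target cases
X_tgt, y_tgt = make_cohort(150, 2)    # small target cohort (e.g., esophageal)

# 1) Pre-train a source network on the large cohort.
source_net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                           random_state=0).fit(X_src, y_src)

# 2) "Freeze" the hidden layer: reuse it as a fixed feature extractor
#    (ReLU matches MLPClassifier's default activation).
def hidden_features(X):
    return np.maximum(0, X @ source_net.coefs_[0] + source_net.intercepts_[0])

# 3) Fine-tune only a new output head on the small target cohort.
head = LogisticRegression(max_iter=1000).fit(hidden_features(X_tgt), y_tgt)
target_risk = head.predict_proba(hidden_features(X_tgt))[:, 1]
```

The key idea is that the representation learned on plentiful source data is kept fixed, so the scarce target cohort only has to estimate a small output layer rather than the whole network.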

What They Found

  • Transfer learning increased 90‑day mortality AUPRC: esophageal 0.37 → 0.54 (+38%); liver 0.30 → 0.34 (+14%); pancreatic 0.29 → 0.31 (+8%); esophageal AUROC rose 0.75 → 0.84.
  • Colorectal (n=7,424) already performed well with conventional training (AUROC 0.92, AUPRC 0.57); TL added no net gain, though fine‑tuning restored parity with the conventional model (fine‑tuned NN AUPRC 0.57).
  • All neural networks outperformed ASA‑PS and CCI across domains (for example, esophageal ASA AUPRC 0.12 vs TL 0.54; liver ASA AUPRC 0.10 vs fine‑tuned TL NN 0.34).
  • SHAP analyses showed TL models consistently prioritized age and Charlson Comorbidity Index (normalized importance ≈1.0) and avoided the spurious predictors that appeared in small‑sample conventional models. Practical pharmacy implication: TL‑derived risk scores deliver stable, actionable drivers for prioritizing pharmacist review and triaging high‑risk medication orders.
  • The performance gains were driven by stable, clinically plausible pre‑trained feature representations that prevented small‑sample overfitting.
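Why the paper leans on AUPRC rather than AUROC for a rare outcome like 90‑day mortality can be shown directly: for a random score, AUPRC collapses to the event prevalence, while AUROC stays near 0.5 regardless of imbalance. A minimal sketch with synthetic data (the ~5% prevalence is illustrative, not the paper's exact rate):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Imbalanced outcome: roughly 5% events, loosely mimicking rare mortality.
n = 2000
y = (rng.random(n) < 0.05).astype(int)

# A useful risk score (shifted upward for events) vs. an uninformative one.
informative = y + rng.normal(scale=0.8, size=n)
random_score = rng.normal(size=n)

auroc = roc_auc_score(y, informative)
auprc = average_precision_score(y, informative)
baseline = y.mean()   # AUPRC of a random classifier ≈ prevalence
```

Because the no‑skill AUPRC baseline is the prevalence (here ≈0.05), even a modest‑looking AUPRC of 0.3–0.5 represents a large multiple of chance, which is why the esophageal jump from 0.37 to 0.54 is meaningful.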

Takeaways

  • Pilot a TL 90‑day mortality (or analogous ADE/TDM) score: fine‑tune a pre‑trained source model on local cases and surface the score in pharmacist workqueues (for example, as a dedicated column in the verification queue) to triage reviews.
  • Data and interpretability requirements: map the paper’s 85 preoperative features to local EHR fields, audit completeness, define imputation rules, and show SHAP top drivers alongside each score; train pharmacists to interpret feature drivers.
  • Prioritize small, high‑risk services where TL adds most (analogous to the esophageal cohort); for large, well‑represented groups, baseline models may suffice. Monitor AUPRC, alert volume, and subgroup performance with dashboards.
  • Operational framing: TL behaves like an experienced float pharmacist—arrives skilled and adapts locally. Keep human oversight: an AI governance group should set thresholds and escalation rules, and pharmacists retain final authority.
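Setting the threshold and escalation rules mentioned above is itself a concrete, auditable step: a governance group can pick an operating point from the precision‑recall curve (e.g., "keep recall at or above 80%") and read off the resulting alert volume the pharmacist workqueue would absorb. A hypothetical sketch with a synthetic risk score (the 80% recall target and cohort size are illustrative assumptions):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
n = 1000
y = (rng.random(n) < 0.05).astype(int)      # synthetic rare outcome
score = y + rng.normal(scale=0.7, size=n)   # stand-in model risk score

prec, rec, thr = precision_recall_curve(y, score)

# Choose the highest threshold that still keeps recall >= 80%, then count
# how many orders/patients would surface in the review queue at that cutoff.
# (recall has one more element than thresholds; drop the final point.)
keep = rec[:-1] >= 0.80
chosen = thr[keep][-1]          # thresholds ascend, so take the last match
alerts = int((score >= chosen).sum())
```

This makes the staffing trade‑off explicit: lowering the recall target shrinks the queue but misses more high‑risk patients, and the dashboard metrics in the bullet above (AUPRC, alert volume, subgroup performance) are exactly what should be monitored as the threshold is tuned.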

Strengths and Limitations

Strengths:

  • Multicenter cohort from three tertiary centers with nested cross‑validation, hyperparameter tuning, and an AUPRC‑focused evaluation tailored to imbalanced mortality outcomes.
  • SHAP explainability, TRIPOD/TITAN‑aligned reporting, and public code increased transparency and reproducibility.

Limitations:

  • Retrospective design with k‑NN imputation for ~25% missingness and inclusion only of patients with 90‑day follow‑up, risking imputation and follow‑up bias.
  • No external validation beyond the three urban teaching centers, limiting transportability across other hospitals and electronic health record systems.

Bottom Line

Transfer learning is ready for controlled pharmacy pilots: it can enable accurate risk stratification in data‑scarce medication domains, but local fine‑tuning, rigorous silent validation on local data, and governance (interpretability, monitoring for bias, and maintenance) are essential before operational deployment.