Quick Take

  • PROP-RL (Pipeline for Learning Robust Policies in Reinforcement Learning) was estimated to lower in-hospital mortality from 6.2% to 5.7% overall (0.5 percentage points) and from 3.8% to 2.2% (1.6 percentage points) in the subgroup where the learned recommendations diverged from clinicians.
  • Operational guidance for pharmacists: prioritize alerts for high‑impact loop‑diuretic states, concentrate reviews on patients likely to be treatment‑responsive, and reduce unnecessary chart review and workflow disruption by limiting recommendations to actionable states.

Why it Matters

  • Loop diuretics are commonly prescribed, yet substantial uncertainty and variability about when to start or stop them can lead to inadequate care and are associated with worse outcomes (for example, higher rates of acute kidney injury and electrolyte disturbances).
  • Decisions about diuretics are sequential, multi‑day, and data‑rich; this complexity burdens pharmacy and clinical teams and makes machine learning (ML) policies vulnerable to overfitting and misleading evaluations unless state representations and hyperparameter selection are robust.
  • Addressing these issues is important because reproducible, conservative recommendations that limit unnecessary deviations from clinicians can reduce low‑value alerts, protect constrained pharmacist time, and improve the reliability of clinical decision support (CDS) under limited resources.

What They Did

  • Retrospective, single‑center electronic health record (EHR) study at Michigan Medicine (2015–2019) of adults admitted via the emergency department who received supplemental oxygen; 36,570 hospitalizations (development 2015–2018, test 2019).
  • Developed PROP‑RL: an offline reinforcement learning (RL) pipeline that converts routine EHR features into discrete daily states and recommends a binary give/withhold action for loop diuretics.
  • Trained policies using offline constraints (batch‑constrained Q‑learning and pessimistic MDP) with multi‑partition hyperparameter tuning (Split‑Select‑Retrain, SSR) and evaluated performance against estimated clinician behavior using weighted importance sampling (WIS) and other off‑policy evaluation (OPE) estimators on the 2019 hold‑out.
  • Simulated retrospective deployment with 24‑hour windows (6 AM cutoffs), used FIDDLE preprocessing to derive 243‑dimensional features, clustered embeddings to form discrete states, and applied an 'unimportant‑state' relaxation to defer to clinicians in states unlikely to affect outcomes and reduce alert volume (a schematic sketch of the state construction and WIS/ESS evaluation follows this list).
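
To make the pipeline mechanics concrete, the sketch below is a simplified, hypothetical illustration rather than the published PROP‑RL implementation: it clusters daily feature vectors into discrete states, estimates the clinician (behavior) policy from logged actions, and scores a candidate policy with weighted importance sampling (WIS) together with its effective sample size (ESS). All data are synthetic, and the dimensions, trajectory structure, and variable names are assumptions for illustration only.

```python
# Simplified, hypothetical sketch of two pipeline mechanics described above:
# (1) clustering daily feature vectors into discrete states and
# (2) scoring a candidate policy with weighted importance sampling (WIS)
#     and its effective sample size (ESS).
# All data are synthetic; dimensions, names, and thresholds are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for FIDDLE-style daily features: one row per 24-hour window.
n_windows, n_features, n_states, traj_len = 5000, 243, 60, 5
X = rng.normal(size=(n_windows, n_features))

# (1) Discretize daily feature vectors into a small number of states.
states = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit(X).labels_

# Synthetic logged actions (1 = give loop diuretic, 0 = withhold) and,
# per "hospitalization" of traj_len windows, a terminal survival outcome.
actions = rng.integers(0, 2, size=n_windows)
n_traj = n_windows // traj_len
survived = rng.integers(0, 2, size=n_traj)

# Estimate the clinician (behavior) policy as action frequencies per state.
behavior = np.full((n_states, 2), 0.5)
for s in range(n_states):
    in_state = states == s
    if in_state.any():
        p_give = actions[in_state].mean()
        behavior[s] = [1.0 - p_give, p_give]

# A candidate deterministic target policy (arbitrary here, for illustration).
target_action = rng.integers(0, 2, size=n_states)

# (2) Per-trajectory importance weights: product over windows of
#     pi_target(a|s) / pi_behavior(a|s), with a deterministic target policy.
weights = np.ones(n_traj)
for i in range(n_traj):
    sl = slice(i * traj_len, (i + 1) * traj_len)
    for s, a in zip(states[sl], actions[sl]):
        matches = 1.0 if target_action[s] == a else 0.0
        weights[i] *= matches / behavior[s, a]

# WIS estimate of survival under the target policy, plus ESS of the weights.
wis_survival = np.sum(weights * survived) / np.sum(weights)
ess = np.sum(weights) ** 2 / np.sum(weights ** 2)
print(f"WIS-estimated survival: {wis_survival:.3f} (ESS ~ {ess:.0f})")
```

In a real deployment the features, trajectory structure, reward definition, and behavior‑policy estimate would come from the EHR pipeline rather than synthetic data, and the number of states would be chosen through the pipeline's state‑definition tuning rather than fixed in advance.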

What They Found

  • On the held‑out test set, PROP‑RL was estimated to reduce in‑hospital mortality from 6.2% to 5.7% (0.52 percentage points; 95% CI, −0.03 to 1.05; P = .03), outperforming clinician behavior in 967/1000 bootstrap trials (96.7%).
  • In the subgroup of hospitalizations that passed through the 2 divergent states, mortality was estimated to fall from 3.8% to 2.2% (1.58 percentage points; 95% CI, 0.38–2.75; P = .006); the learned policy outperformed clinicians in 994/1000 bootstrap trials (99.4%), with an effective sample size (ESS) of about 550 (a bootstrap‑comparison sketch follows this list).
  • Of 60 discrete states, the policy deferred to clinicians in 36 'unimportant' states and produced recommendations in 24 actionable states; among windows in actionable states, only 17.7% received a different action under the learned policy, and overall disagreement with clinicians was ~21%, implying focused, lower‑volume pharmacy alerts.
  • Ablation studies showed that removing unimportant‑state relaxation or multi‑partition/state tuning produced either no viable policy (insufficient ESS) or markedly worse worst‑case performance. The improvements were driven by robust state‑definition tuning, unimportant‑state relaxation, and multi‑partition hyperparameter selection.
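
As context for the "x/1000 bootstrap trials" summaries above, the snippet below is a minimal, hypothetical sketch of a bootstrap comparison: resample hospitalizations with replacement and count how often the policy's WIS estimate exceeds the observed clinician outcome. The data are synthetic, and the weights and outcome coding are illustrative assumptions rather than the authors' code.

```python
# Hypothetical sketch of a bootstrap comparison of a learned policy against
# logged clinician behavior: resample hospitalizations with replacement and
# count how often the policy's WIS estimate beats the observed outcome.
# Data are synthetic; weights and outcome coding are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n_traj, n_boot = 2000, 1000

weights = rng.exponential(scale=1.0, size=n_traj)  # stand-in per-trajectory importance weights
survived = rng.integers(0, 2, size=n_traj)         # 1 = survived hospitalization, 0 = died

wins = 0
for _ in range(n_boot):
    idx = rng.integers(0, n_traj, size=n_traj)     # bootstrap resample of hospitalizations
    w, s = weights[idx], survived[idx]
    policy_value = np.sum(w * s) / np.sum(w)       # WIS estimate under the learned policy
    clinician_value = s.mean()                     # observed survival under clinician behavior
    wins += int(policy_value > clinician_value)
print(f"Policy beat clinician behavior in {wins}/{n_boot} bootstrap resamples")
```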

Takeaways

  • Integrate PROP‑RL with local EHR pipelines (using FIDDLE and the open‑source PROP‑RL codebase) to derive the discrete state space, and surface recommendations only for the 24 actionable states (including the 2 divergent states) in clinician‑facing alerts or dashboards timed to morning rounds (6 AM) for pharmacist review.
  • Operationalize cautiously: adopt SSR multi‑partition tuning and unimportant‑state relaxation, validate retrospectively with WIS and ESS thresholds during development, pilot a pharmacist‑led review workflow for flagged windows, and iteratively adjust thresholds to constrain alert volume before broader CDS deployment.
  • Practical framing: treat PROP‑RL as an 'AI pre‑rounds filter' that triages likely diuretic responders for pharmacist attention rather than as a replacement for clinical judgment; establish pharmacist‑led governance, train staff on state visualizations and disagreement metrics, and continuously monitor clinical performance and alert volume (a schematic triage sketch follows this list).
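
As a concrete reading of the 'pre‑rounds filter' framing, the snippet below is a hypothetical sketch, not the PROP‑RL codebase or a validated interface: it raises a pharmacist alert only when a 24‑hour window falls in an actionable state and the learned recommendation diverges from the clinician's planned action. The state IDs, policy table, and alert payload are illustrative assumptions.

```python
# Hypothetical "pre-rounds filter" sketch (not the PROP-RL interface):
# raise a pharmacist alert only when a 24-hour window is in an actionable
# state AND the learned recommendation diverges from the clinician's plan.
# State IDs, the policy table, and the alert payload are illustrative.
from __future__ import annotations
from dataclasses import dataclass

ACTIONABLE_STATES = {3, 7, 12, 18}           # states where the policy makes a recommendation
POLICY_ACTION = {3: 1, 7: 0, 12: 1, 18: 0}   # 1 = give loop diuretic, 0 = withhold

@dataclass
class Window:
    patient_id: str
    state: int           # discrete state for this 24-hour window (6 AM cutoff)
    planned_action: int  # clinician's current plan: 1 = give, 0 = withhold

def triage(windows: list[Window]) -> list[dict]:
    """Return alert payloads only for actionable states with divergent actions."""
    alerts = []
    for w in windows:
        if w.state not in ACTIONABLE_STATES:
            continue  # "unimportant" state: defer to the clinician, no alert
        recommended = POLICY_ACTION[w.state]
        if recommended != w.planned_action:
            alerts.append({
                "patient_id": w.patient_id,
                "state": w.state,
                "planned": w.planned_action,
                "recommended": recommended,
            })
    return alerts

if __name__ == "__main__":
    morning_windows = [
        Window("A", state=3, planned_action=0),   # actionable and divergent -> alert
        Window("B", state=3, planned_action=1),   # actionable but concordant -> no alert
        Window("C", state=41, planned_action=1),  # deferred state -> no alert
    ]
    for alert in triage(morning_windows):
        print(alert)
```

Restricting alerts to divergence within actionable states is what keeps expected alert volume low, consistent with the ~21% overall disagreement reported above.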

Strengths and Limitations

Strengths:

  • Pipeline robustness: multi‑partition SSR tuning, state‑definition tuning, unimportant‑state relaxation, and ablation studies improved robustness and constrained recommendations.
  • Rigorous evaluation: held‑out 2019 test set, primary use of WIS with high ESS, corroboration from other OPE methods, adherence to TRIPOD+AI reporting, and publicly available code.

Limitations:

  • Single‑center, retrospective design that relies on off‑policy evaluation (WIS during tuning); prospective external validation and safety testing are required before clinical deployment.
  • Coarse action/time granularity: decisions were binary (give/not) at fixed daily times (6 AM) and did not account for dose, other medications, or finer timing, limiting immediate clinical applicability without further refinement.

Bottom Line

PROP‑RL produces conservative, actionable loop‑diuretic recommendations that focus pharmacist attention and reduce alert volume; it is suitable for pharmacist‑led pilot deployment as a triage tool but requires prospective external validation and safety testing before wider clinical use.