Summary
A three-stage evaluation of a prolonged opioid use (POU) prediction model showed that external population shifts reduced performance and changed where the model provided clinical benefit, with retraining improving accuracy but not eliminating subgroup disparities. Decision curve analysis using standardized net benefit (SNB) revealed that different patient groups benefited at different risk thresholds, indicating a single cutoff could produce unequal care. The model’s clinical utility varied meaningfully by subgroup—especially in opioid-exposed and high-comorbidity patients—so health systems should locally validate and recalibrate thresholds before EHR implementation. Fairness assessment should move beyond AUROC parity to include calibration and SNB analyses across demographics and risk groups.
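The standardized net benefit (SNB) analyses described above follow the standard decision-curve definitions: net benefit at threshold t is NB(t) = TP/N − (FP/N)·t/(1−t), and SNB rescales this by outcome prevalence so that 1.0 corresponds to a perfect model. A minimal sketch, assuming those standard definitions (function names are illustrative, not from the paper):

```python
def net_benefit(y_true, y_score, threshold):
    """Net benefit of a risk model at a probability threshold t:
    NB(t) = TP/N - (FP/N) * t / (1 - t)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight implied by t
    return tp / n - (fp / n) * odds

def standardized_net_benefit(y_true, y_score, threshold):
    """SNB divides net benefit by prevalence, so 1.0 = perfect prediction.
    Comparing SNB curves across subgroups shows where each group benefits."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_score, threshold) / prevalence

# Toy illustration: SNB for one subgroup at a 0.5 risk cutoff.
y = [1, 0, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.8, 0.6]
snb = standardized_net_benefit(y, p, 0.5)
```

Computing SNB per subgroup across a range of thresholds, rather than a single AUROC per group, is what reveals the threshold-dependent disparities the paper highlights.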
Citation
Naderalvojoud B, Curtin C, Asch SM, et al. Evaluating the impact of data biases on algorithmic fairness and clinical utility of machine learning models for prolonged opioid use prediction. JAMIA Open. 2025;8(5):ooaf115. Published 2025 Sep 30. doi:10.1093/jamiaopen/ooaf115