Summary

For implementation, the paper emphasizes rigorous local validation on retrospective institutional data, using cohorts that both reflect real practice (consecutive) and probe edge cases (enriched). It warns against overreliance on area under the receiver operating characteristic curve (AUROC) or Dice scores when positive predictive value (PPV) and error patterns are what drive workload and safety: at 1% prevalence, for example, a model with 90% sensitivity and 90% specificity yields a PPV of only about 8%, or roughly eleven false positives per true positive. Notably, only 15% of FDA-cleared AI clinical decision support tools publicly report test cohorts larger than 1,000 cases.

Prospective limited trials should vet the human–AI interface (e.g., triage, overlays, confidence displays), integration with PACS/EHR/reporting, algorithm turnaround time and status visibility, and user feedback before scaling. Full deployment then requires site harmonization, compute and network scaling, metric trending, and explicit downtime plans, since AI-enabled workflows become part of critical communication pathways.

Value assessment spans financial and nonfinancial returns. Direct reimbursement is rare and often temporary (e.g., New Technology Add-on Payments last up to 3 years); efficiency gains may be real but are context-dependent (MRI sequence time reductions greater than 60% do not always shorten room time); vendor ROI claims are often overstated; and costs beyond licensing (IT, integration, legal, and clinical time) are substantial. Non-monetary benefits include accuracy, timeliness, and provider satisfaction (e.g., 82% of clinicians reported improved job satisfaction with ambient AI scribes).

Post-deployment surveillance should log inputs and outputs and track performance by subgroup, input/output distributions, and calibration. Drift should be detected against predefined thresholds under structured governance oversight, recognizing the difficulty of collecting ground-truth labels and the need to pair quantitative monitoring with clinician feedback; the paper illustrates this with control-chart QC using ±2 SD triggers in MRI reconstruction.
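
As a concrete illustration of that control-chart style of monitoring, here is a minimal Python sketch, not the paper's implementation: it derives a center line and ±2 SD limits from a known-good baseline window of a tracked metric, then flags later observations that breach them. The choice of metric (weekly PPV) and all figures are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline, n_sd=2.0):
    """Derive a center line and +/- n_sd limits from a known-good baseline
    window of a tracked metric (e.g., weekly PPV or a reconstruction
    quality score)."""
    mu, sd = mean(baseline), stdev(baseline)
    return mu, (mu - n_sd * sd, mu + n_sd * sd)

def breaches(values, limits):
    """Yield (index, value) for observations outside the control limits,
    i.e., candidate drift events for governance review."""
    lower, upper = limits
    for i, v in enumerate(values):
        if v < lower or v > upper:
            yield i, v

# Hypothetical baseline: weekly PPV over a stable reference period.
baseline_ppv = [0.41, 0.39, 0.43, 0.40, 0.42, 0.38, 0.41, 0.40]
center, limits = control_limits(baseline_ppv, n_sd=2.0)

# Live values, the last of which drifts low after an upstream change.
for week, value in breaches([0.40, 0.42, 0.39, 0.31], limits):
    print(f"Week {week}: PPV {value:.2f} outside "
          f"{limits[0]:.2f}-{limits[1]:.2f}; escalate for review")
```

One reasonable design is to route flagged weeks to a governance review queue rather than trigger automatic action, in line with the structured oversight the paper calls for.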

Pharmacy translation: apply the same four-phase governance to medication-use AI (e.g., order verification triage, antimicrobial decision support, prior authorization automation, inventory/compounding QC, or large language model assistants). Prioritize locally relevant metrics such as positive predictive value, alert burden, pharmacist intervention acceptance rate, turnaround time, and safety outcomes; run silent or limited pilots inside Epic/Willow workflows; ensure reliable routing and status indicators; plan for downtime and fallbacks; track fairness across sites and patient subgroups; and build dashboards that combine performance, workload, and ROI, with clear ownership by an interdisciplinary governance committee.
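
As a sketch of what the performance-and-workload side of such a dashboard might compute, assuming hypothetical pilot log tallies (the PilotStats fields and all numbers below are illustrative, not an Epic/Willow interface):

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class PilotStats:
    """Aggregated tallies from a silent or limited pharmacy AI pilot.
    All fields are hypothetical log counts, not vendor-reported figures."""
    true_positive_alerts: int        # alerts a pharmacist confirmed as actionable
    false_positive_alerts: int       # alerts dismissed as noise
    orders_screened: int             # total orders the tool evaluated
    interventions_suggested: int
    interventions_accepted: int      # AI-suggested interventions acted on
    turnaround_seconds: list[float]  # order entry to AI result, per order

def summarize(s: PilotStats) -> dict:
    """Compute the locally relevant metrics named above from raw pilot counts."""
    alerts = s.true_positive_alerts + s.false_positive_alerts
    return {
        "ppv": s.true_positive_alerts / alerts if alerts else None,
        "alert_burden_per_1k_orders": 1000 * alerts / s.orders_screened,
        "acceptance_rate": (s.interventions_accepted / s.interventions_suggested
                            if s.interventions_suggested else None),
        "median_turnaround_s": median(s.turnaround_seconds),
    }

# One hypothetical pilot week: ~25% PPV and ~16 alerts per 1,000 orders,
# the kind of numbers that make alert burden visible before scaling.
week = PilotStats(true_positive_alerts=38, false_positive_alerts=112,
                  orders_screened=9_500, interventions_suggested=41,
                  interventions_accepted=29,
                  turnaround_seconds=[4, 6, 7, 9, 12, 15, 31])
print(summarize(week))
```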


Citation

Shah C, Ghodasara S, Chen D, Chen PH. Does It Work, Help, and Stay? A Framework for Deploying AI Tools in Radiology. J Am Coll Radiol. Published online October 6, 2025. doi:10.1016/j.jacr.2025.09.032