Quick Take

  • Census review of 1,012 FDA Summary of Safety and Effectiveness Data (SSED) documents for AI/ML-enabled devices found a mean AI Characteristics Transparency Reporting (ACTR) score of 3.3 out of 17 (SD 3.1), with 30% of devices scoring 0 and a maximum observed score of 12.
  • Transparency improved only modestly after the FDA’s 2021 Good Machine Learning Practice (GMLP) guidance (mean +0.88 points, 95% CI 0.54–1.23); over half (51.6%) of devices reported no performance metrics and most omitted training/testing provenance.

Why it Matters

  • FDA summaries are often the primary public source that procurement and clinical teams can access; widespread omission of dataset provenance, demographics, and evaluation metrics prevents assessment of whether a cleared algorithm is valid for local patients and workflows.
  • Nearly half of devices reported no clinical study and most clinical evaluations were retrospective, which risks optimistic performance estimates that may not generalize to real-time inpatient pharmacy operations.
  • These transparency gaps convert many cleared AI tools into operational 'black boxes,' forcing pharmacy informatics and safety teams to perform resource-intensive local validation and ongoing stewardship to manage unknown risks.

What They Did

  • Census review of 1,012 accessible FDA SSEDs for AI/ML-enabled devices cleared or approved through December 2024; devices spanned clinical specialties (76% radiology) and clearance pathways (predominantly 510(k)).
  • Manual extraction of model, dataset, clinical-study, and performance fields from each public SSED with dual-reviewer checks and consensus resolution; development of a 17-point ACTR transparency score mapped to FDA GMLP elements.
  • Compared reporting before versus after the 2021 GMLP guidance using linear mixed-effects models (controlling for deep learning use and predicate AI; a sketch of such a model follows this list) and descriptive frequencies; the study was limited to publicly disclosed SSED information.
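
For orientation, here is a minimal Python sketch of the kind of mixed-effects comparison described above, using statsmodels. The column names (actr_score, post_gmlp, deep_learning, predicate_ai), the hypothetical input file, and the choice of manufacturer as the random-effects grouping variable are all illustrative assumptions; the authors' exact model specification is not reproduced in the public summary.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical extraction file: one row per device SSED.
    df = pd.read_csv("ssed_extractions.csv")

    # Fixed effects: a post-2021-GMLP indicator plus the two covariates the
    # study reports adjusting for; a random intercept per manufacturer
    # (an assumption) accounts for repeated submissions from the same firm.
    model = smf.mixedlm(
        "actr_score ~ post_gmlp + deep_learning + predicate_ai",
        data=df,
        groups=df["manufacturer"],
    )
    result = model.fit()
    # The post_gmlp coefficient corresponds to the reported +0.88-point estimate.
    print(result.summary())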

What They Found

  • Mean ACTR score was 3.3/17 (SD 3.1); 30% of devices (n = 304) scored 0 and the highest observed score was 12.
  • After the 2021 guidance, ACTR increased by 0.88 points (95% CI 0.54–1.23) but remained low overall.
  • Dataset and provenance gaps: only 23.7% reported dataset demographics, training dataset source was absent in 93.3% of SSEDs, and testing dataset source was absent in 75.5%.
  • Clinical evidence gaps: 46.9% of devices reported no clinical study; of those that did, 60.5% were retrospective and 14% were prospective. Clinical study sample size was reported for 39.8% of devices (n = 403).
  • Performance reporting gaps: 51.6% reported no performance metric. Where metrics were reported, median values were sensitivity 91.2%, specificity 91.4%, AUROC 96.1%, positive predictive value (PPV) 59.9%, and negative predictive value (NPV) 98.9%; the wide gap between PPV and NPV is largely a prevalence effect, illustrated after this list.
  • Regulatory context relevant to pharmacy: 96.4% of devices were cleared via the 510(k) pathway; 76.3% lacked demographic disclosure and 51.6% lacked performance metrics, indicating hospitals will frequently need local 'shadow' validation before operational deployment.
  • The modest post-2021 transparency gains were driven primarily by small increases in reporting of dataset sources, sizes, and demographics rather than wholesale improvements in clinical evaluation or public performance disclosure.
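
The PPV/NPV pattern above is what one would expect at modest prevalence. Below is a minimal, self-contained Python sketch applying Bayes' rule to the reported median sensitivity (91.2%) and specificity (91.4%) across a range of illustrative prevalence values; the prevalence figures are assumptions, since SSEDs rarely disclose test-set prevalence.

    # Convert the reported median sensitivity/specificity into predictive
    # values via Bayes' rule. Prevalence values are illustrative assumptions.
    SENS, SPEC = 0.912, 0.914  # reported medians

    def ppv_npv(prevalence):
        tp = SENS * prevalence                # true positives per unit population
        fp = (1 - SPEC) * (1 - prevalence)    # false positives
        tn = SPEC * (1 - prevalence)          # true negatives
        fn = (1 - SENS) * prevalence          # false negatives
        return tp / (tp + fp), tn / (tn + fn)

    for p in (0.01, 0.05, 0.12, 0.50):
        ppv, npv = ppv_npv(p)
        print(f"prevalence {p:>4.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")

At roughly 12% prevalence these medians yield a PPV near 59% and an NPV near 99%, in line with the reported median PPV (59.9%) and NPV (98.9%); at 1% prevalence the same sensitivity and specificity give a PPV under 10%. This is why predictive values should be requested at the deploying site's own prevalence, not taken from the SSED as fixed properties of the device.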

Takeaways

  • Treat FDA clearance as a regulatory minimum, not as sufficient clinical evidence—SSEDs routinely omit the provenance, demographics, and performance details needed to assess local fit.
  • For inpatient pharmacy, consider every AI/ML tool a potential high-risk intervention: require vendor disclosure of training/testing populations, clear performance metrics (including PPV/NPV at relevant prevalence), and prospective or multi-site validation before deployment.
  • Plan to resource mandatory local validation (shadow-mode testing, concordance analysis, and subgroup/bias checks) and ongoing monitoring (sentinel metrics for drift) as part of procurement and implementation budgets; a minimal monitoring sketch follows this list.
  • Use procurement leverage to require structured disclosures (standardized 'model cards' or equivalent) and Predetermined Change Control Plans (PCCPs) as contract conditions; if vendors will not provide these, decline adoption or require robust local evaluation before go-live.
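
To make the monitoring bullet concrete, here is a minimal Python sketch of shadow-mode concordance plus a simple alert-rate drift sentinel. The log schema, the four-week baseline window, and the 2-percentage-point drift threshold are local assumptions to be tuned, not recommendations from the paper.

    import pandas as pd

    # Hypothetical shadow-mode log: timestamp, model_flag (0/1) from the AI
    # tool running silently, pharmacist_flag (0/1) from routine practice.
    log = pd.read_csv("shadow_mode_log.csv", parse_dates=["timestamp"])

    # Concordance: how often the model's call matches the pharmacist's.
    concordance = (log["model_flag"] == log["pharmacist_flag"]).mean()
    print(f"shadow-mode concordance: {concordance:.1%}")

    # Drift sentinel: compare each week's alert rate to an early baseline.
    weekly = log.set_index("timestamp")["model_flag"].resample("W").mean()
    baseline = weekly.iloc[:4].mean()  # first four weeks as baseline (assumption)
    for week, rate in weekly.items():
        if abs(rate - baseline) > 0.02:  # flag a >2-point absolute shift
            print(f"{week.date()}: alert rate {rate:.1%} vs baseline {baseline:.1%}")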

Strengths and Limitations

Strengths:

  • Census of 1,012 publicly accessible FDA SSEDs across specialties provides a comprehensive, system-level audit of public regulatory transparency.
  • A prespecified 17-point ACTR score aligned to FDA GMLP elements, with dual-reviewer extraction and mixed-effects modeling, supports analytic rigor and appropriate adjustment for confounders.

Limitations:

  • Analysis is limited to publicly available SSEDs; manufacturers may have submitted more complete data confidentially to the FDA that are not reflected in public summaries.
  • Manual abstraction of heterogeneous narratives and use of a novel ACTR index may introduce subjectivity and may not capture all nuances or the relative clinical weight of different reporting elements.

Bottom Line

Public FDA device summaries for AI/ML are insufficiently transparent for pharmacy reliance—treat FDA clearance as a floor, require vendor-provided model cards and PCCPs, and implement mandatory local validation and monitoring before operational use.