Quick Take
- Census review of 1,012 FDA Summary of Safety and Effectiveness Data (SSED) documents for AI/ML-enabled devices found a mean AI Characteristics Transparency Reporting (ACTR) score of 3.3 out of 17 (SD 3.1); 30% of devices scored 0, and the maximum observed score was 12.
- Transparency improved only modestly after the FDA’s 2021 Good Machine Learning Practice (GMLP) guidance (mean +0.88 points, 95% CI 0.54–1.23); over half (51.6%) of devices reported no performance metrics and most omitted training/testing provenance.
Why it Matters
- FDA summaries are often the primary public source procurement and clinical teams can access; widespread omission of dataset provenance, demographics, and evaluation metrics prevents assessment of whether a cleared algorithm is valid for local patients and workflows.
- Nearly half of devices reported no clinical study and most clinical evaluations were retrospective, which risks optimistic performance estimates that may not generalize to real-time inpatient pharmacy operations.
- These transparency gaps convert many cleared AI tools into operational 'black boxes,' forcing pharmacy informatics and safety teams to perform resource-intensive local validation and ongoing stewardship to manage unknown risks.
What They Did
- Census review of 1,012 accessible FDA SSEDs for AI/ML-enabled devices cleared or approved through December 2024; devices spanned clinical specialties (76% radiology) and clearance pathways (predominantly 510(k)).
- Manual extraction of model, dataset, clinical-study, and performance fields from each public SSED with dual-reviewer checks and consensus resolution; development of a 17-point ACTR transparency score mapped to FDA GMLP elements.
- Compared reporting before versus after the 2021 GMLP guidance using linear mixed-effects models (controlling for deep learning use and predicate AI) and descriptive frequencies; study limited to publicly disclosed SSED information.
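The ACTR instrument is described as a 17-point tally of GMLP-aligned reporting elements. As a rough illustration of how such a score reduces to counting disclosed items (the item names below are hypothetical placeholders, not the study's actual checklist):

```python
# Hypothetical sketch of a 17-item transparency tally in the spirit of ACTR.
# Item names are illustrative only; the study's actual instrument is not
# reproduced here.
ACTR_ITEMS = [
    "model_type", "training_data_source", "training_data_size",
    "testing_data_source", "testing_data_size", "dataset_demographics",
    "performance_metrics",
    # ... remaining illustrative items, up to 17 total
]

def actr_score(disclosed: set) -> int:
    """Count how many checklist items a public summary discloses."""
    return sum(1 for item in ACTR_ITEMS if item in disclosed)
```

A summary disclosing only model type and test-set size would score 2; one disclosing nothing scores 0, matching the 30% of devices at the floor.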
What They Found
- Mean ACTR score was 3.3/17 (SD 3.1); 30% of devices (n = 304) scored 0 and the highest observed score was 12.
- After the 2021 guidance, ACTR increased by 0.88 points (95% CI 0.54–1.23) but remained low overall.
- Dataset and provenance gaps: only 23.7% reported dataset demographics, training dataset source was absent in 93.3% of SSEDs, and testing dataset source was absent in 75.5%.
- Clinical evidence gaps: 46.9% of devices reported no clinical study; among those that did, 60.5% of evaluations were retrospective and only 14% prospective. A clinical study sample size was reported for just 39.8% of devices (n = 403).
- Performance reporting gaps: 51.6% reported no performance metric. Where metrics were reported, median values were sensitivity 91.2%, specificity 91.4%, AUROC 96.1%, positive predictive value (PPV) 59.9%, and negative predictive value (NPV) 98.9%.
- Regulatory context relevant to pharmacy: 96.4% of devices were cleared via the 510(k) pathway; 76.3% lacked demographic disclosure and 51.6% lacked performance metrics, indicating hospitals will frequently need local 'shadow' validation before operational deployment.
- The modest post-2021 transparency gains were driven primarily by small increases in reporting of dataset sources, sizes, and demographics rather than wholesale improvements in clinical evaluation or public performance disclosure.
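The median metrics above illustrate why PPV lags the others: unlike sensitivity and specificity, PPV depends heavily on local disease prevalence. A quick Bayes calculation using the study's median sensitivity (91.2%) and specificity (91.4%) makes this concrete (the prevalence values below are hypothetical examples, not from the study):

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity/specificity via Bayes' rule."""
    tp = sens * prev              # expected true positives per case
    fp = (1 - spec) * (1 - prev)  # expected false positives per case
    return tp / (tp + fp)

def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from sensitivity/specificity via Bayes' rule."""
    tn = spec * (1 - prev)        # expected true negatives per case
    fn = (1 - sens) * prev        # expected false negatives per case
    return tn / (tn + fn)

# PPV collapses at low prevalence even with strong sensitivity/specificity:
for prev in (0.10, 0.02):  # hypothetical local prevalences
    print(f"prevalence {prev:.0%}: PPV {ppv(0.912, 0.914, prev):.1%}, "
          f"NPV {npv(0.912, 0.914, prev):.1%}")
```

At 10% prevalence PPV is about 54%; at 2% it falls below 18%, while NPV stays above 99% in both cases. This is why PPV/NPV at the hospital's own prevalence matter more than a vendor's headline AUROC.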
Takeaways
- Treat FDA clearance as a regulatory minimum, not as sufficient clinical evidence—SSEDs routinely omit the provenance, demographics, and performance details needed to assess local fit.
- For inpatient pharmacy, consider every AI/ML tool a potential high-risk intervention: require vendor disclosure of training/testing populations, clear performance metrics (including PPV/NPV at relevant prevalence), and prospective or multi-site validation before deployment.
- Plan to resource mandatory local validation (shadow-mode testing, concordance analysis, and subgroup/bias checks) and ongoing monitoring (sentinel metrics for drift) as part of procurement and implementation budgets.
- Use procurement leverage to require structured disclosures (standardized 'model cards' or equivalent) and Predetermined Change Control Plans (PCCPs) as contract conditions; if vendors will not provide these, decline adoption or require robust local evaluation before go-live.
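Ongoing monitoring can start simply. A minimal sketch of one sentinel-metric drift check, comparing the recent alert rate against a shadow-mode baseline (the tolerance and window are illustrative assumptions, not recommendations from the study):

```python
from statistics import mean

def alert_rate_drift(baseline: list, recent: list,
                     tolerance: float = 0.05) -> bool:
    """Flag drift when the recent alert rate moves more than `tolerance`
    (absolute) away from the baseline rate. Inputs are per-case 0/1
    outcomes (alert fired / not fired)."""
    return abs(mean(recent) - mean(baseline)) > tolerance

# Example: baseline shadow-mode alert rate ~10%, recent rate ~20%
baseline = [1] * 10 + [0] * 90  # 10% alert rate
recent = [1] * 20 + [0] * 80    # 20% alert rate
print(alert_rate_drift(baseline, recent))  # prints True (drift flagged)
```

In practice this would run alongside concordance and subgroup checks, with flagged drift triggering human review rather than automatic action.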
Strengths and Limitations
Strengths:
- Census of 1,012 publicly accessible FDA SSEDs across specialties provides a comprehensive, system-level audit of public regulatory transparency.
- A prespecified 17-point ACTR score aligned to FDA GMLP elements, with dual-reviewer extraction and mixed-effects modeling, supports analytic rigor and appropriate adjustment for confounders.
Limitations:
- Analysis is limited to publicly available SSEDs; manufacturers may have submitted more complete data confidentially to the FDA that are not reflected in public summaries.
- Manual abstraction of heterogeneous narratives and use of a novel ACTR index may introduce subjectivity and may not capture all nuances or the relative clinical weight of different reporting elements.
Bottom Line
Public FDA device summaries for AI/ML are insufficiently transparent for pharmacy reliance—treat FDA clearance as a floor, require vendor-provided model cards and PCCPs, and implement mandatory local validation and monitoring before operational use.