Quick Take
- Census review of 1,012 FDA Summary of Safety and Effectiveness Data (SSED) documents for AI/ML-enabled devices found a mean AI Characteristics Transparency Reporting (ACTR) score of 3.3 out of 17 (SD 3.1); 30% of devices scored 0, and the maximum observed score was 12.
- Transparency improved only modestly after the FDA’s 2021 Good Machine Learning Practice (GMLP) guidance (mean +0.88 points, 95% CI 0.54–1.23); over half (51.6%) of devices reported no performance metrics and most omitted training/testing provenance.
Why it Matters
- FDA summaries are often the primary public source procurement and clinical teams can access; widespread omission of dataset provenance, demographics, and evaluation metrics prevents assessment of whether a cleared algorithm is valid for local patients and workflows.
- Nearly half of devices reported no clinical study and most clinical evaluations were retrospective, which risks optimistic performance estimates that may not generalize to real-time inpatient pharmacy operations.
- These transparency gaps convert many cleared AI tools into operational 'black boxes,' forcing pharmacy informatics and safety teams to perform resource-intensive local validation and ongoing stewardship to manage unknown risks.
What They Did
- Census review of 1,012 accessible FDA SSEDs for AI/ML-enabled devices cleared or approved through December 2024; devices spanned clinical specialties (76% radiology) and clearance pathways (predominantly 510(k)).
- Manual extraction of model, dataset, clinical-study, and performance fields from each public SSED with dual-reviewer checks and consensus resolution; development of a 17-point ACTR transparency score mapped to FDA GMLP elements.
- Compared reporting before versus after the 2021 GMLP guidance using linear mixed-effects models (controlling for deep learning use and predicate AI) and descriptive frequencies; study limited to publicly disclosed SSED information.
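The ACTR instrument is described as a 17-point tally of GMLP-aligned reporting elements. As a rough illustration of how such a score reduces to counting disclosed items (the item names below are hypothetical placeholders, not the study's actual checklist):

```python
# Hypothetical sketch of a 17-item transparency tally in the spirit of ACTR.
# Item names are illustrative only; the study's actual instrument is not
# reproduced here.
ACTR_ITEMS = [
    "model_type", "training_data_source", "training_data_size",
    "testing_data_source", "testing_data_size", "dataset_demographics",
    "performance_metrics",
    # ... remaining illustrative items, up to 17 total
]

def actr_score(disclosed: set) -> int:
    """Count how many checklist items a public summary discloses."""
    return sum(1 for item in ACTR_ITEMS if item in disclosed)
```

A summary disclosing only model type and test-set size would score 2; one disclosing nothing scores 0, matching the 30% of devices at the floor.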
What They Found
- Mean ACTR score was 3.3/17 (SD 3.1); 30% of devices (n = 304) scored 0 and the highest observed score was 12.
- After the 2021 guidance, ACTR increased by 0.88 points (95% CI 0.54–1.23) but remained low overall.
- Dataset and provenance gaps: only 23.7% reported dataset demographics, training dataset source was absent in 93.3% of SSEDs, and testing dataset source was absent in 75.5%.
- Clinical evidence gaps: 46.9% of devices reported no clinical study; among those that did, 60.5% of evaluations were retrospective and only 14% prospective. A clinical study sample size was reported for just 39.8% of devices (n = 403).
- Performance reporting gaps: 51.6% reported no performance metric. Where metrics were reported, median values were sensitivity 91.2%, specificity 91.4%, AUROC 96.1%, positive predictive value (PPV) 59.9%, and negative predictive value (NPV) 98.9%.
- Regulatory context relevant to pharmacy: 96.4% of devices were cleared via the 510(k) pathway; 76.3% lacked demographic disclosure and 51.6% lacked performance metrics, indicating hospitals will frequently need local 'shadow' validation before operational deployment.
- The modest post-2021 transparency gains were driven primarily by small increases in reporting of dataset sources, sizes, and demographics rather than wholesale improvements in clinical evaluation or public performance disclosure.
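The median metrics above illustrate why PPV lags the others: unlike sensitivity and specificity, PPV depends heavily on local disease prevalence. A quick Bayes calculation using the study's median sensitivity (91.2%) and specificity (91.4%) makes this concrete (the prevalence values below are hypothetical examples, not from the study):

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity/specificity via Bayes' rule."""
    tp = sens * prev              # expected true positives per case
    fp = (1 - spec) * (1 - prev)  # expected false positives per case
    return tp / (tp + fp)

def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from sensitivity/specificity via Bayes' rule."""
    tn = spec * (1 - prev)        # expected true negatives per case
    fn = (1 - sens) * prev        # expected false negatives per case
    return tn / (tn + fn)

# PPV collapses at low prevalence even with strong sensitivity/specificity:
for prev in (0.10, 0.02):  # hypothetical local prevalences
    print(f"prevalence {prev:.0%}: PPV {ppv(0.912, 0.914, prev):.1%}, "
          f"NPV {npv(0.912, 0.914, prev):.1%}")
```

At 10% prevalence PPV is about 54%; at 2% it falls below 18%, while NPV stays above 99% in both cases. This is why PPV/NPV at the hospital's own prevalence matter more than a vendor's headline AUROC.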
Takeaways
- Treat FDA clearance as a regulatory minimum, not as sufficient clinical evidence—SSEDs routinely omit the provenance, demographics, and performance details needed to assess local fit.
- For inpatient pharmacy, consider every AI/ML tool a potential high-risk intervention: require vendor disclosure of training/testing populations, clear performance metrics (including PPV/NPV at relevant prevalence), and prospective or multi-site validation before deployment.
- Plan to resource mandatory local validation (shadow-mode testing, concordance analysis, and subgroup/bias checks) and ongoing monitoring (sentinel metrics for drift) as part of procurement and implementation budgets.
- Use procurement leverage to require structured disclosures (standardized 'model cards' or equivalent) and Predetermined Change Control Plans (PCCPs) as contract conditions; if vendors will not provide these, decline adoption or require robust local evaluation before go-live.
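Ongoing monitoring can start simply. A minimal sketch of one sentinel-metric drift check, comparing the recent alert rate against a shadow-mode baseline (the tolerance and window are illustrative assumptions, not recommendations from the study):

```python
from statistics import mean

def alert_rate_drift(baseline: list, recent: list,
                     tolerance: float = 0.05) -> bool:
    """Flag drift when the recent alert rate moves more than `tolerance`
    (absolute) away from the baseline rate. Inputs are per-case 0/1
    outcomes (alert fired / not fired)."""
    return abs(mean(recent) - mean(baseline)) > tolerance

# Example: baseline shadow-mode alert rate ~10%, recent rate ~20%
baseline = [1] * 10 + [0] * 90  # 10% alert rate
recent = [1] * 20 + [0] * 80    # 20% alert rate
print(alert_rate_drift(baseline, recent))  # prints True (drift flagged)
```

In practice this would run alongside concordance and subgroup checks, with flagged drift triggering human review rather than automatic action.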
Strengths and Limitations
Strengths:
- Census of 1,012 publicly accessible FDA SSEDs across specialties provides a comprehensive, system-level audit of public regulatory transparency.
- A prespecified 17-point ACTR score aligned to FDA GMLP elements, with dual-reviewer extraction and mixed-effects modeling, supports analytic rigor and appropriate adjustment for confounders.
Limitations:
- Analysis is limited to publicly available SSEDs; manufacturers may have submitted more complete data confidentially to the FDA that are not reflected in public summaries.
- Manual abstraction of heterogeneous narratives and use of a novel ACTR index may introduce subjectivity and may not capture all nuances or the relative clinical weight of different reporting elements.
Bottom Line
Public FDA device summaries for AI/ML are insufficiently transparent for pharmacy reliance—treat FDA clearance as a floor, require vendor-provided model cards and PCCPs, and implement mandatory local validation and monitoring before operational use.