Quick Take
- The best-performing ensemble AI models produced lower root mean squared error (RMSE) than population pharmacokinetic (PK) models for 3 of 4 antiepileptic drugs (AEDs): carbamazepine (CBZ) RMSE 2.71 versus 3.09 μg/mL, phenytoin (PHE) 4.15 versus 16.12 μg/mL, and valproic acid (VPA) 13.68 versus 25.02 μg/mL; phenobarbital (PHB) favored the population PK approach (PK 26.04 versus AI 27.45 μg/mL). Time since last dose (TSLD) was the dominant predictor.
- Findings support embedding EHR/EMR-integrated ensemble AI predictions to triage therapeutic drug monitoring (TDM) results and prioritize pharmacist review and dose adjustments, particularly for CBZ, PHE, and VPA.
Why it Matters
- Maintaining therapeutic concentrations of carbamazepine, phenobarbital, phenytoin, and valproic acid is clinically challenging: real‑world TDM is often sparse and irregular, and many patients receive multiple AEDs, which increases pharmacokinetic variability and interaction risk.
- Pharmacy teams commonly use population PK models, but dozens of published models exist per drug and local performance is uncertain; building or refining PK models is time‑consuming and often limited by the covariates available in routine electronic medical records (EMR).
- Evaluating AI that learns from EMR/TDM data and highlights key drivers (notably time since last dose) matters because it can streamline TDM workflows and provide a practical foundation for EHR‑based clinical decision support to focus individualized dosing and stewardship.
What They Did
- Retrospective extraction of TDM and EMR data from the Seoul National University Hospital clinical data warehouse (2010–2021) for patients with ≥1 concentration of CBZ, PHB, PHE, or VPA; exclusions included intravenous dosing and missing administration time or dose.
- For each drug, developed ten AI models (Lasso, Ridge, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, XGBoost, LightGBM, artificial neural network, convolutional neural network); datasets were split 60/20/20 for training/validation/test, with Multivariate Imputation by Chained Equations (MICE) for missing data and MinMax scaling (a minimal pipeline sketch follows this list).
- Identified published population PK models from the literature, implemented published parameter sets and refitted them to the training data (NONMEM), and compared AI versus PK predictive performance on held‑out test data using MSE/RMSE/MAE and goodness‑of‑fit (GOF) plots.
- Applied variance inflation factor (VIF) filtering to control collinearity, constrained predictors to routinely available EMR covariates, and used Shapley value explanations to identify influential predictors for the top AI models.
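A minimal sketch of that preprocessing-and-modeling pipeline, assuming a pandas DataFrame loaded from a hypothetical extract file with an observed-concentration column and illustrative covariate names; it uses scikit-learn's IterativeImputer as a MICE-style imputer, MinMax scaling, one gradient-boosting ensemble as a stand-in for the ten models, a VIF screen via statsmodels, and RMSE/MAE on the held-out test split. The file path, column names, and hyperparameters are assumptions, not the authors' code.

```python
# Sketch only: file path, column names, and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer                 # MICE-style chained-equations imputation
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
from statsmodels.stats.outliers_influence import variance_inflation_factor

tdm = pd.read_csv("tdm_emr_extract.csv")  # hypothetical per-drug TDM/EMR extract
features = ["tsld_h", "daily_dose_mg", "weight_kg", "age_y", "creatinine_mg_dl"]  # hypothetical names
X, y = tdm[features], tdm["conc_ug_ml"]

# 60/20/20 train/validation/test split (the validation split would drive hyperparameter tuning)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

imputer = IterativeImputer(random_state=42)
X_train_imp = imputer.fit_transform(X_train)

# Illustrative VIF screen on the imputed training matrix; drop features above a chosen cutoff
vif = {col: variance_inflation_factor(X_train_imp, i) for i, col in enumerate(features)}

scaler = MinMaxScaler()
X_train_p = scaler.fit_transform(X_train_imp)
X_test_p = scaler.transform(imputer.transform(X_test))

model = GradientBoostingRegressor(random_state=42).fit(X_train_p, y_train)
pred = model.predict(X_test_p)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
mae = float(mean_absolute_error(y_test, pred))
print(f"test RMSE={rmse:.2f} ug/mL, MAE={mae:.2f} ug/mL, VIF={vif}")
```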
What They Found
- Ensemble AI models achieved lower test RMSE than population PK models for CBZ, PHE, and VPA (CBZ 2.71 vs 3.09 μg/mL; PHE 4.15 vs 16.12 μg/mL; VPA 13.68 vs 25.02 μg/mL). PHB favored the population PK model (PK 26.04 vs AI 27.45 μg/mL).
- Goodness‑of‑fit plots showed AI models tracked observed concentrations across the measured range; differences among the top ensemble models were small (absolute ΔRMSE ≈ 0.03–0.55 μg/mL), indicating robust ensemble performance.
- Shapley value analysis identified time since last dose (TSLD) as the dominant predictor, with daily dose and body weight consistently influential; PHB predictions were primarily dose‑driven, and higher creatinine was associated with higher predicted PHE concentrations (see the Shapley sketch after this list).
- Operational implication for pharmacy: improved accuracy for PHE and VPA suggests ensemble AI could better flag likely out‑of‑range TDM results for pharmacist review; the principal drivers of AI improvement were TSLD and dose‑related covariates.
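A hedged illustration of how such Shapley drivers could be ranked for a fitted tree ensemble; it assumes the `model`, `X_test_p`, and `features` objects from the pipeline sketch above and the `shap` package's TreeExplainer, and is not the authors' analysis code.

```python
# Sketch: rank predictors by mean absolute SHAP value for the fitted ensemble above.
import numpy as np
import shap

explainer = shap.TreeExplainer(model)             # works for sklearn/XGBoost/LightGBM tree models
shap_values = explainer.shap_values(X_test_p)     # shape: (n_samples, n_features)
mean_abs = np.abs(shap_values).mean(axis=0)
for name, importance in sorted(zip(features, mean_abs), key=lambda kv: -kv[1]):
    print(f"{name}: {importance:.3f}")            # TSLD would be expected to rank first here
```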
Takeaways
- Deploy an EMR‑embedded TDM triage: run ensemble AI predictions for CBZ, PHE, and VPA and surface Shapley drivers (time since last dose, dose, weight) in the pharmacist queue; continue using population PK support for PHB where PK outperformed AI.
- Harden data capture on TDM orders: require last‑dose date/time, dosing regimen, specimen draw time, body weight, and key labs in structured fields and pull these from the EMR/CDW; run predictions only when these mandatory fields are present to avoid garbage‑in, garbage‑out results (a minimal gating sketch follows this list).
- Roll out in staged phases: start silent (side‑by‑side) monitoring, compare AI outputs with current practice, define alert thresholds that match team capacity, provide staff training on interpreting model drivers, and schedule periodic model refreshes using new TDM/EMR data.
- Operational stance: treat ensemble AI as a screening tool that directs pharmacist attention to likely off‑target concentrations; preserve pharmacist authority with governance, audit trails, and documented override and escalation pathways.
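A minimal sketch of that field‑gating and triage logic, assuming hypothetical structured field names, commonly cited therapeutic ranges that a site would confirm locally, and a `predict` callable standing in for whichever ensemble model is deployed; it illustrates the workflow only and is not a validated clinical rule.

```python
# Sketch of a TDM triage gate: require mandatory structured fields, then flag
# predicted out-of-range concentrations for pharmacist review.
# Field names, ranges, and the predict() callable are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

MANDATORY_FIELDS = ("last_dose_datetime", "dosing_regimen", "draw_datetime", "weight_kg")
THERAPEUTIC_RANGE_UG_ML = {"CBZ": (4.0, 12.0), "PHE": (10.0, 20.0), "VPA": (50.0, 100.0)}  # confirm locally

@dataclass
class TriageResult:
    ran: bool
    flag_for_review: bool = False
    reason: str = ""

def triage(order: dict, drug: str, predict: Callable[[dict], float]) -> TriageResult:
    missing = [f for f in MANDATORY_FIELDS if not order.get(f)]
    if missing:
        return TriageResult(ran=False, reason=f"missing fields: {', '.join(missing)}")
    low, high = THERAPEUTIC_RANGE_UG_ML[drug]
    predicted = predict(order)  # ensemble model prediction in ug/mL
    out_of_range = predicted < low or predicted > high
    reason = f"predicted {predicted:.1f} ug/mL ({'outside' if out_of_range else 'within'} {low}-{high})"
    return TriageResult(ran=True, flag_for_review=out_of_range, reason=reason)
```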
Strengths and Limitations
Strengths:
- Robust benchmarking approach: evaluated ten AI models per drug against refitted published PK comparators with held‑out testing and quantitative metrics (RMSE/MAE) plus goodness‑of‑fit assessments.
- Pragmatic, transparency‑focused design: used routinely available EMR/TDM covariates, VIF screening for collinearity, MICE imputation for missing data, and Shapley value explanations for model interpretability.
Limitations:
- Single‑center, retrospective dataset with sparsely and irregularly sampled TDM; underrepresentation of certain subgroups (pediatric, elderly) and limited pharmacogenomic diversity — external, multi‑center validation is required.
- Comparator imbalance: published PK models originated from different populations and used selected covariates, while AI models leveraged the full set of EMR features available here, complicating direct equivalence between approaches.
Bottom Line
Ensemble AI outperformed population PK models for CBZ, PHE, and VPA on this TDM/EMR dataset and is a reasonable candidate for an EHR‑integrated TDM triage pilot, pending local validation; PHB predictions remain better supported by population PK for now.