Externally Validated Triage Rule Streamlines Penicillin Delabeling

Quick Take

BL-Predictor (8 items) identified low-risk penicillin allergy with high specificity: ~86% in internal validation (N=2,207) and ~92.7% in pooled multicenter external validation (N=4,261). In the multicenter comparison, its specificity was roughly 25 percentage points higher than that of PEN‑FAST.

Operational implication: BL-Predictor could support protocolized direct oral challenges (DPT/OC) and delabeling for patients clearly classified as low risk, but only after local piloting, EHR workflow integration, and clinical oversight to confirm safety and NPV in our population.

Why it Matters

Documented beta‑lactam (BL) allergy labels are common (~10% prevalence) but often incorrect (estimates vary ~5%–30% truly allergic), driving use of second‑line antibiotics that may reduce effectiveness, increase adverse events and resistance, lengthen stays, and raise costs and mortality risk.

The diagnostic pathway is resource‑intensive: histories are often incomplete, in‑vitro tests have limited sensitivity/availability, skin testing lacks a perfect negative predictive value, and drug provocation tests (DPTs) are time‑consuming and staff‑intensive. Together, these barriers slow delabeling.

A reliable front‑end triage tool that produces a dependable low‑risk group can reduce unnecessary testing, focus pharmacist effort, enable protocolized delabeling pathways, and support antimicrobial stewardship and clinical decision support alignment.

What They Did

Developed BL‑Predictor from a retrospective derivation cohort of 2,207 adults evaluated for penicillin allergy at Málaga University Hospital (1985–2024); reference standard was positive skin testing or, if skin tests were negative, a positive drug provocation test (DPT).

Candidate items were expert‑drafted and selected with a two‑step process: univariate filtering followed by stepwise logistic regression with bootstrapped resampling; final logistic coefficients were rounded into an 8‑item point score.
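The rounding step can be illustrated with a minimal sketch. The item names and coefficients below are hypothetical placeholders, not the published BL‑Predictor items or weights, and dividing by the smallest absolute coefficient before rounding is one common convention that the study is not confirmed to have used.

```python
# Hypothetical coefficients for illustration only.
# These are NOT the published BL-Predictor items or weights.
HYPOTHETICAL_COEFFS = {
    "reaction_within_1h": 1.42,
    "reaction_over_10y_ago": -0.55,
    "resolved_spontaneously": -0.97,
    "timing_unknown": -0.31,
}


def coefficients_to_points(coeffs, scale=None):
    """Turn logistic coefficients into integer points.

    Assumed convention: divide by the smallest absolute non-zero
    coefficient, then round to the nearest integer.
    """
    if scale is None:
        scale = min(abs(b) for b in coeffs.values() if b != 0)
    return {name: round(b / scale) for name, b in coeffs.items()}


def total_score(points, present_items):
    """Sum the points of the items present in a patient's history."""
    return sum(points[name] for name in present_items)


def is_low_risk(score, threshold=0):
    """Flag low risk when the score is at or below the threshold."""
    return score <= threshold


POINTS = coefficients_to_points(HYPOTHETICAL_COEFFS)
```

With these placeholder weights, a history with only "resolved_spontaneously" and "timing_unknown" sums to a negative score and would be flagged low risk at a ≤0 cutoff, while adding "reaction_within_1h" pushes the total above it.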

Validated the score internally and across six international retrospective sites (total N=4,261) with mapping of items to external datasets; computed PEN‑FAST for head‑to‑head comparison. Unknown histories were coded as an informative category, SCARs were excluded, and a threshold of ≤0 points was prespecified to flag low risk for direct challenge pathways.

What They Found

BL‑Predictor prioritizes specificity: internal specificity 85.9% (NPV 83.1%, accuracy 80.0%, sensitivity 69.9%); pooled multicenter specificity 92.65% (external accuracy 83.3%, pooled sensitivity 49.5%, external NPV ~87.0%).

Compared with PEN‑FAST in the pooled multicenter data, BL‑Predictor increased specificity by roughly 25 percentage points and showed higher mean accuracy in these retrospective datasets, indicating improved ability to exclude non‑allergic patients in these cohorts—however, performance varied by site.

Site variation was substantial: e.g., the Nashville (USA) cohort showed very high specificity (98.49%) but low sensitivity (~30%), underscoring the need for local calibration and safety checks before US deployment.

Operational example: in the Málaga derivation cohort the score (≤0) would have obviated skin testing for ~75% of patients in that sample.

Safety signal: most patients with prior severe systemic reactions were classified medium–high risk (97%); a very small fraction of systemic reactors (0.4% in the studied cohorts) were flagged low risk.

Takeaways

Treat BL‑Predictor as a high‑specificity triage rule that can create a relatively 'clean' low‑risk group suitable for protocolized direct oral challenges, but recognize it will not catch every true allergy (sensitivity is limited and variable by site).

Expect site‑to‑site variability. Before operational use, run local validation (compute sensitivity, specificity, and NPV on our data), pilot the score in a monitored pathway, and define escalation and oversight processes for borderline or discordant cases.
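A local validation could start from something as simple as the sketch below. It assumes each local record can be reduced to a (score, confirmed_allergy) pair, where confirmed_allergy comes from our own reference testing; the record layout and function names are illustrative assumptions, not part of the published tool.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # score above threshold, allergy confirmed
    fp: int = 0  # score above threshold, allergy not confirmed
    tn: int = 0  # score at or below threshold, allergy not confirmed
    fn: int = 0  # score at or below threshold, allergy confirmed


def tally(records, threshold=0):
    """Build a confusion table from (score, confirmed_allergy) pairs."""
    c = Confusion()
    for score, confirmed in records:
        flagged = score > threshold  # not low risk
        if flagged and confirmed:
            c.tp += 1
        elif flagged:
            c.fp += 1
        elif confirmed:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def metrics(c):
    """Sensitivity, specificity, and NPV from a confusion table."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Synthetic toy records, not study data.
toy = [(3, True), (2, False), (0, False), (-1, False), (0, True), (4, True)]
toy_metrics = metrics(tally(toy))
```

Running the same tally per site or per clinic makes the site‑to‑site variability visible before any pathway goes live.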

For pharmacists: use BL‑Predictor as a sieve to reliably exclude many non‑allergic patients, not as a standalone gate to capture every allergy. Confirm that required history elements (timing, spontaneous resolution vs duration >24h, unknown/remote timing) are consistently captured in interviews or documentation before relying on scores.

Maintain human oversight: exclude SCARs, pilot with monitored challenges and outcome tracking (including adverse events and observed NPV), and be prepared to recalibrate thresholds if low‑risk calls generate higher‑than‑expected positive tests or reactions.

Strengths and Limitations

Strengths:

Interpretable development: expert‑driven item drafting combined with stepwise logistic regression and extensive bootstrapped resampling, distilled into an 8‑item, point‑based score.

Robust reference standard in the derivation site (skin testing [ST] with conditional DPT) and multicenter external validation across six international sites (N=4,261), providing initial signals about generalizability across heterogeneous practices.

Limitations:

Derivation was single‑center and retrospective from a specialist allergy clinic (selection bias and higher pretest probability of allergy), and external validations were retrospective—prospective, pragmatic validation is still needed.

SCARs were excluded by design, the tool depends on patient recall for timing and resolution (recall bias), and the internal NPV (≈83%) is below common delabeling safety benchmarks cited in some settings (e.g., US benchmarks often target much higher NPVs), so local NPV assessment is essential before operational use.
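Because NPV depends on how common true allergy is in the tested population, the same sensitivity and specificity can yield different NPVs in a specialist clinic and in a general inpatient population. The sketch below applies Bayes' rule to show that dependence. It assumes sensitivity and specificity carry over unchanged between settings, which the observed site variation suggests may not hold, so it is a planning aid rather than a substitute for measuring NPV locally.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' rule.

    NPV = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    All inputs are proportions between 0 and 1.
    """
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def npv_by_prevalence(sensitivity, specificity, prevalences):
    """NPV at each assumed prevalence, for side-by-side comparison."""
    return {p: npv(sensitivity, specificity, p) for p in prevalences}


# Internal-validation sensitivity and specificity from the summary above;
# the prevalence values are illustrative assumptions, not study results.
table = npv_by_prevalence(0.699, 0.859, [0.05, 0.10, 0.20, 0.30])
for prevalence, value in table.items():
    print(f"prevalence {prevalence:.0%}: NPV {value:.1%}")
```

NPV falls as the assumed prevalence rises, which is why an NPV measured in a high‑pretest‑probability allergy clinic should not be read across directly to our population.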

Bottom Line

BL‑Predictor is an interpretable, high‑specificity triage tool that can help create a 'clean' low‑risk group and may streamline pharmacist‑led delabeling; pilot it locally with protocolized clinical oversight, measure NPV and clinical outcomes, and recalibrate thresholds before broader integration.