Prompt injection and poisoned fine-tuning undermine medication automation

Quick Take

Adversarial prompt attacks and poisoned fine-tuning can hijack medical Large Language Model (LLM) outputs: GPT‑4 vaccine recommendations fell from 100.00% to 3.98% under prompt attack, while dangerous drug‑combination suggestions rose to 80.60%.

Pharmacy and informatics teams should treat LLM outputs as potentially manipulated — require human verification, prefer trusted fine-tunes, monitor fine-tune/LoRA weight norms, and run paraphrase/consistency checks before automating vaccine, medication, or test orders.

Why it Matters

LLMs integrated into diagnostic and prescribing workflows can be silently altered by prompt injections or poisoned fine-tuning, producing outputs that discourage vaccines, recommend harmful drug combinations, or prompt unnecessary imaging while leaving general benchmark performance largely unchanged — making attacks stealthy.

In hospital pharmacy settings, manipulated LLM outputs can corrupt automated prescribing and clinical decision support (CDS) rules, increasing the risk of adverse drug events, missed preventive care, and erosion of clinician trust.

Detecting or mitigating these attacks increases manual verification, audits, and stewardship workload, straining limited pharmacy capacity and undermining the operational reliability of CDS automation.

What They Did

Used 1,200 MIMIC‑III discharge notes (filtered to >1,000 characters and summarized with GPT‑4)

first 1,000 summaries for training and 200 for testing, with transfer evaluation on 200 PMC‑Patients summaries.

Compared two attack vectors: prompt‑based attacks (malicious prompt prefixes / prompt injection) and poisoned fine‑tuning (fine‑tuning on adversarial responses).

Executed attacks on commercial Azure models (GPT‑3.5/GPT‑4/GPT‑4o) and on open‑source models (Llama family, Vicuna, PMC‑LlaMA)

open models were fine‑tuned using QLoRA/LoRA on A100 GPUs.

Validated effects by comparing attacked and clean fine‑tunes and baselines, measuring attack success rate (ASR), evaluating general medical QA benchmarks (MedQA / PubMedQA / MedMCQA), inspecting LoRA weight norms, and testing paraphrase and weight‑scaling defenses.

What They Found

Prompt attacks dramatically changed recommendations in proprietary models: GPT‑4 vaccine recommendations fell from 100.00% to 3.98% while dangerous drug‑combination suggestions rose from 0.50% to 80.60%

GPT‑4o showed similar shifts (vaccines 88.06%→6.47%, drug suggestions 1.00%→61.19%) — a direct threat to automated prescribing/CDS that increases pharmacist verification burden.

Open‑source models were comparably vulnerable: Llama‑3.3 70B vaccine recommendations dropped to 2.50% under prompt attack, and poisoned fine‑tuning pushed dangerous drug recommendations up to 94.53%

across models, discouraging vaccination had the highest ASR.

Attacks also increased recommendations for unnecessary imaging (e.g., GPT‑4 ultrasound 20.90%→80.10%, CT 48.76%→90.05%, MRI 24.88%→88.56%). Poisoned fine‑tunes remained stealthy on MedQA/MedMCQA/PubMedQA (only minor accuracy changes), and larger poisoned‑sample fractions correlated with larger LoRA weight norms and stronger attacks. Paraphrase‑based checks and selective LoRA weight scaling reduced ASR for some tasks but were not universally effective.

Takeaways

Operationalize prompt transparency and paraphrase/consistency checks in EHR/RAG/CDS deployments — ensure prompts (and any retrieval context) are visible to clinicians and require pharmacist sign‑off before automated vaccine, medication, or test orders are executed.

Plan staffing and training: allocate pharmacist review capacity for LLM‑generated recommendations, train informatics teams to verify model provenance and run paraphrase‑consistency audits, and stage rollouts with clear change‑management checklists.

Adopt governance controls: accept only trusted, vetted fine‑tunes

log model provenance and fine‑tuning history

instrument fine‑tuning pipelines to monitor LoRA/weight norms for abnormal increases

and run routine paraphrase‑based integrity tests.

Practical framing: treat LLMs as workflow amplifiers — they can sharpen efficiencies but also magnify hidden tampering

keep pharmacists as the final verifier supported by governance, monitoring, and staged automation.

Strengths and Limitations

Strengths:

Model‑agnostic evaluation on real clinical datasets (MIMIC‑III, PMC‑Patients) spanning prevention, diagnosis, and treatment tasks.

Robust analyses including benchmark QA evaluations, bootstrapped confidence intervals, LoRA weight‑norm inspection, and experiments on paraphrase and weight‑scaling defenses.

Limitations:

Scope limited to manually designed prompts and a selected set of LLMs

experiments are not exhaustive across all architectures, clinical scenarios, or automated prompt‑generation methods.

Defensive approaches are promising but imperfect: weight‑norm monitoring requires a reliable baseline for comparison, and paraphrase checks can be bypassed if adversaries include paraphrases in poisoned fine‑tuning.

Bottom Line

Adversarial prompt attacks and poisoned fine‑tuning make medical LLM outputs unreliable for autonomous prescribing or testing; safe deployment requires trusted fine‑tunes, continuous weight‑norm monitoring, paraphrase/consistency checks, prompt transparency, and pharmacist oversight.