Quick Take
- In simulations (N=10,000; n=500 labels), semi-supervised multiple machine learning (SMMAL) reduced variance by ≈1.3–1.6× when both treatment and outcome surrogates had AUC≈0.95, and by ≈2.2–2.9× at AUC≈0.99. Because interval width scales with the square root of variance, that corresponds to confidence intervals roughly 12–21% and 33–41% narrower, respectively, for the same chart‑review spend.
- In a Mass General Brigham metastatic colorectal cancer cohort (N=4,147; n=100 labels), SMMAL achieved relative efficiency 1.88 versus supervised Double ML and 1.35 versus a supervised calibrated estimator, using note‑mention and terminal‑progression surrogates (AUC=0.93 each).
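To make these efficiency figures concrete, the minimal sketch below converts a variance ratio into a confidence-interval width ratio and an approximate label-equivalent. It assumes that relative efficiency is a ratio of variances, that intervals are normal-approximation intervals (width proportional to the standard error), and that supervised variance scales as 1/n. The function names are illustrative and do not come from the paper.

```python
import math


def ci_width_ratio(relative_efficiency: float) -> float:
    """Width of the more efficient estimator's CI divided by the baseline's.

    Assumes CI width is proportional to the standard error (sqrt of variance).
    """
    return 1.0 / math.sqrt(relative_efficiency)


def equivalent_labels(n_labels: int, relative_efficiency: float) -> float:
    """Labels a supervised estimator would need to match the variance.

    Assumes supervised variance shrinks as 1 / n_labels.
    """
    return n_labels * relative_efficiency


# Variance ratios quoted in the Quick Take (simulation ranges, then the applied example).
for re_value in (1.3, 1.6, 2.2, 2.9, 1.88, 1.35):
    narrower_pct = 100 * (1 - ci_width_ratio(re_value))
    print(f"RE={re_value:<4}: CI about {narrower_pct:.0f}% narrower; "
          f"100 labels behave like about {equivalent_labels(100, re_value):.0f}")
```

Under these assumptions, a relative efficiency of 1.88 with n=100 labels is roughly what a supervised analysis would achieve with about 188 labels.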
Why it Matters
- Chart review is expensive and often the rate‑limiting step for timely comparative‑effectiveness, formulary, and stewardship decisions.
- EHR exposure/outcome capture is frequently noisy or delayed; relying on imperfect proxies can bias effect estimates and produce under‑coverage.
- Methods that tighten estimates per label can change feasibility and ROI for pilots, vendor NLP investments, and stewardship actions.
What They Did
- Developed SMMAL, a semi‑supervised estimator that combines cohort‑wide surrogates and covariates with a small, randomly chart‑reviewed subset to estimate average treatment effect (ATE) under a missing completely at random (MCAR) labeling assumption.
- Proved semiparametric efficiency in low‑dimensional smooth settings and proposed a high‑dimensional pipeline: two‑layer cross‑fitting, Lasso/regularized nuisance fits, calibrated logistic refitting, and truncation to stabilize predictions.
- Validated the approach with simulations across surrogate AUCs (0.80–0.999) and a single‑system oncology application (first‑line targeted therapy vs chemotherapy; outcome = 1‑year progression‑free survival).
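The core idea of combining cohort-wide surrogates with a small random label set can be illustrated with a deliberately simplified toy. The sketch below is not the authors' SMMAL implementation: it estimates a single outcome rate rather than an ATE, uses one binary surrogate, and replaces the Lasso and calibration steps with a two-cell lookup. The prevalence, error rates, fold count, and all names are hypothetical. It shows a cross-fitted imputation fitted on labeled data, averaged over the whole cohort, and corrected with held-out labeled residuals.

```python
import random
import statistics

random.seed(0)

N, n_labels, K = 10_000, 500, 5        # cohort size, labels, folds (illustrative)
PREV, FPR, FNR = 0.30, 0.20, 0.10      # hypothetical outcome rate and surrogate error rates

# Simulate true outcomes Y and a noisy binary surrogate S for the whole cohort.
Y = [1 if random.random() < PREV else 0 for _ in range(N)]
S = [
    (1 if random.random() > FNR else 0) if y == 1 else (1 if random.random() < FPR else 0)
    for y in Y
]

# MCAR labeling: a simple random sample of the cohort is chart-reviewed.
labeled = random.sample(range(N), n_labels)
folds = [labeled[k::K] for k in range(K)]


def fit_imputation(train_idx):
    """Estimate m(s) = P(Y = 1 | S = s) from labeled training units."""
    overall = statistics.mean(Y[i] for i in train_idx)
    m = {}
    for s in (0, 1):
        ys = [Y[i] for i in train_idx if S[i] == s]
        m[s] = statistics.mean(ys) if ys else overall  # fall back if a cell is empty
    return m


fold_estimates = []
for k in range(K):
    # Fit the imputation on the other folds, then correct it on the held-out fold.
    train = [i for j, fold in enumerate(folds) if j != k for i in fold]
    m = fit_imputation(train)
    imputed_mean = statistics.mean(m[s] for s in S)                 # uses all N units
    correction = statistics.mean(Y[i] - m[S[i]] for i in folds[k])  # held-out labels only
    fold_estimates.append(imputed_mean + correction)

semi_supervised = statistics.mean(fold_estimates)
supervised = statistics.mean(Y[i] for i in labeled)  # labels only
naive = statistics.mean(S)                           # surrogate treated as truth
truth = statistics.mean(Y)                           # known only because this is simulated

print(f"truth={truth:.3f} supervised={supervised:.3f} "
      f"semi_supervised={semi_supervised:.3f} naive={naive:.3f}")
```

In this toy the surrogate-as-truth estimate is pulled toward the surrogate's positive rate, while the residual correction keeps the semi-supervised estimate centered on the labeled truth. The correction is only valid here because the labeled units are a random sample of the cohort.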
What They Found
- Simulations: consistent efficiency gains when surrogates are informative; naïve “surrogates‑as‑truth” approaches showed substantial bias and poor CI coverage even with strong surrogates.
- Applied EHR: medication codes were weak (AUC=0.60; FPR=0.44; FNR=0.17) while note mentions and the terminal‑progression score were strong (AUC≈0.93; FNR≈0.10). SMMAL yielded the narrowest adjusted CIs in the oncology example.
- Key limitations in the reported work: labeling was assumed MCAR, with no reported diagnostics to distinguish MCAR from missing at random (MAR); the empirical example is single‑system with n=100 labels; and the retrieved text omitted numeric ATE point estimates and per‑arm rates.
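Surrogate quality metrics such as those quoted above can be computed on the chart-reviewed subset before committing to a semi-supervised analysis. The sketch below is a generic illustration with made-up labels and scores, not data from the paper. It computes AUC as the proportion of (positive, negative) pairs ranked correctly, and FPR/FNR at a hypothetical cutoff of 0.5.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], score: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, score) if y == 1]
    neg = [s for y, s in zip(y_true, score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def error_rates(y_true: Sequence[int], score: Sequence[float],
                cutoff: float) -> Tuple[float, float]:
    """Return (FPR, FNR) when scores at or above the cutoff are called positive."""
    fp = sum(1 for y, s in zip(y_true, score) if y == 0 and s >= cutoff)
    fn = sum(1 for y, s in zip(y_true, score) if y == 1 and s < cutoff)
    n_neg = sum(1 for y in y_true if y == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    return fp / n_neg, fn / n_pos


# Made-up chart-review labels and surrogate scores for eight patients.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

print("AUC:", auc(labels, scores))
print("FPR, FNR at 0.5:", error_rates(labels, scores, cutoff=0.5))
```

With only about 100 labels, these metrics carry their own sampling uncertainty, so a point estimate near a decision threshold deserves caution.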
Takeaways
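- Practical role: use SMMAL to stretch limited chart‑review budgets. Prioritizing adjudication of uncertain, high‑impact cases is nonrandom label sampling, so it conflicts with the MCAR assumption unless the sampling design is explicitly accounted for.
- For pharmacy operations: SMMAL shifts effort from large‑scale abstraction to a small, randomly sampled chart review, enabling faster, more precise stewardship and formulary decisions.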
- Watch‑outs: nonrandom label sampling (which violates MCAR), weak or lagging surrogates (AUC well below 0.85), and time‑varying capture gaps across care settings.
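One simple way to screen for nonrandom label sampling is to compare covariates between labeled and unlabeled patients. The sketch below computes a standardized mean difference (SMD) for one covariate. The simulated "age" variable, the cohort size, and the two labeling schemes are hypothetical, and the often-used |SMD| < 0.1 rule of thumb is a convention rather than something taken from the paper. Balance on observed covariates is consistent with MCAR but cannot prove it.

```python
import math
import random
import statistics
from typing import Sequence


def standardized_mean_difference(labeled: Sequence[float],
                                 unlabeled: Sequence[float]) -> float:
    """(mean_labeled - mean_unlabeled) divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(labeled) + statistics.variance(unlabeled)) / 2
    )
    return (statistics.mean(labeled) - statistics.mean(unlabeled)) / pooled_sd


random.seed(1)
age = [random.gauss(65, 10) for _ in range(4_000)]  # hypothetical covariate

# Two labeling schemes: a simple random sample versus reviewing the oldest patients.
mcar_idx = set(random.sample(range(len(age)), 100))
oldest_idx = set(sorted(range(len(age)), key=lambda i: age[i])[-100:])


def smd_for(label_idx: set) -> float:
    lab = [age[i] for i in label_idx]
    unl = [age[i] for i in range(len(age)) if i not in label_idx]
    return standardized_mean_difference(lab, unl)


smd_mcar = smd_for(mcar_idx)
smd_oldest = smd_for(oldest_idx)
print(f"SMD under random labeling:     {smd_mcar:+.2f}")
print(f"SMD when labeling the oldest:  {smd_oldest:+.2f}")
```

In practice the same check would be repeated across covariates and surrogates, and a large imbalance would point toward a MAR-style analysis with modeled labeling probabilities rather than the MCAR-based estimator.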
Strengths and Limitations
Strengths:
- Semiparametric theory and analytic variance that leverage unlabeled data; practical high‑dimensional implementation; demonstrated simulation and single‑site EHR gains.
Limitations:
- Main theory assumes MCAR and the MAR extension is conceptual only; single‑center applied example (n=100) limits generalizability; key applied numerics (ATE point estimates, per‑arm rates) absent from the retrieved excerpts.
Bottom Line
A promising, theory‑backed way to tighten comparative‑effect estimates with small labeling budgets. Pilot with ~100 labels only when surrogates are strong, and only after prespecified validation, MCAR/MAR diagnostics, and quantitative bias analyses.