Quick Take
- The AI Model Passport framework is implemented as the open‑source AIPassport tool, which produces verifiable digital identities and automated provenance for AI models. It was demonstrated in ProCAncer‑I on data from >14,300 patients (≈9.5M multiparametric MRI images), capturing training‑to‑deployment metadata via MLflow, DVC, Git and MINIO.
- Pharmacy teams deploying AI and clinical decision support (CDS) can adopt AIPassport to generate auditable lineage and versioning aligned with the European Health Data Space (EHDS) and HealthDCAT‑AP. This reduces manual documentation and eases governance, validation, and cross‑site reproducibility before real‑world use.
Why it Matters
- Hospitals handle sensitive data and need transparent, auditable records of data sources, preprocessing, training choices and deployments; manual, human‑readable documentation is inconsistent and limits cross‑site comparability and validation.
- A unique, verifiable model identity plus automated lifecycle metadata capture address authenticity and reproducibility gaps, reduce the risk of undocumented changes, and simplify governance across typical local, on‑premise hospital environments.
- For pharmacy‑led AI/CDS workflows, traceable dataset and model provenance shifts scarce staff time from paperwork to targeted review and aligns stewardship, safety and controlled‑use expectations with HealthDCAT‑AP/EHDS guidance.
What They Did
- Implemented the AI Model Passport within the ProCAncer‑I project to document end‑to‑end AI workflows, using multi‑center data from >14,300 prostate cancer cases (≈9.5M multiparametric MRI images).
- Developed AIPassport, an open‑source MLOps tool that automatically records dataset provenance, preprocessing, training and deployment metadata by integrating MLflow, DVC, Git and MINIO and by consuming declarative YAML pipeline definitions.
- Applied the tool to a lesion segmentation pipeline: code, datasets, intermediate outputs and evaluation metrics were versioned and logged. A modular, Docker‑based on‑premise architecture stores content‑addressed artifacts and machine‑readable passports (metadata + checksums) while avoiding storage of raw PHI.
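The "metadata + checksums" idea above can be illustrated with a minimal sketch. This is not AIPassport's actual schema or code: the function names (`file_md5`, `build_passport`) and the field names (`model_name`, `artifacts`, `size_bytes`) are hypothetical, chosen only to show how a passport can reference artifacts by hash without embedding their contents.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large artifacts never load fully into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_passport(model_name: str, artifacts: list[Path], metadata: dict) -> dict:
    """Collect metadata and per-artifact checksums (hypothetical field names).

    Only names, sizes and hashes are recorded; file contents are never copied in.
    """
    return {
        "model_name": model_name,
        "metadata": metadata,
        "artifacts": [
            {"name": p.name, "md5": file_md5(p), "size_bytes": p.stat().st_size}
            for p in artifacts
        ],
    }


def passport_to_json(passport: dict) -> str:
    """Serialize with sorted keys so identical passports produce identical text."""
    return json.dumps(passport, indent=2, sort_keys=True)
```

MD5 is used here only because the source reports an MD5 example; it serves as an integrity check and content address, not as a security control.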
What They Found
- At scale, AIPassport captured verifiable provenance for ProCAncer‑I (>14,300 prostate MRI cases, ≈9.5M images), recording dataset identifiers, DVC pipeline DAGs, MLflow experiment and model IDs, checksums and final model artifacts.
- Automated, standards‑based logging: the tool auto‑logged hyperparameters, environment/dependencies and evaluation metrics (e.g., Dice coefficient), and generated re‑executable DVC pipelines that link exact code, inputs and outputs for programmatic reproduction; dataset metadata were made HealthDCAT‑AP/EUCAIM‑compatible and exposed via a FAIR Data Point (FDP).
- Practical pharmacy impact: lightweight passports (metadata + checksums) with content‑addressed data in DVC/MINIO reduce manual documentation for AI/CDS validation and cross‑site audits. The checksum overhead is small in the reported example (MD5: ~0.97 s for a 510 MB file), and the improvement comes from automated, standards‑based provenance tied to versioned code and object storage.
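For readers unfamiliar with the Dice coefficient named above as a logged evaluation metric, the following is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks. It is not taken from the ProCAncer‑I pipeline; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (nonzero = lesion voxel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total lesion voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

A value of 1.0 means the predicted and reference segmentations coincide, and 0.0 means they share no lesion voxels.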
Takeaways
- Deploy AIPassport on‑premise via Docker Compose (MLflow, DVC, Git, MINIO). Require declarative YAML pipelines; enable autologging; register models in MLflow; store content‑addressed artifacts and checksums in MINIO. Passports contain metadata and hashes only, not raw PHI.
- Maintain a HealthDCAT‑AP/EUCAIM‑compatible dataset catalogue and publish descriptions through a FAIR Data Point. A dataset administrator curates cohorts; AIPassport harvests standardized descriptors automatically, tying exact inputs to runs and simplifying audits and reproducibility.
- Use passports as a release gate: promotion to production should require auto‑captured lineage plus completion of manual fields (purpose, risks, license). Use the MLflow UI and the Passport marketplace for sign‑off and schedule periodic DVC‑based re‑validation.
- Operational insight: treat the Passport like a medication label and lot number, so that ingredients and batch travel with the model. Pharmacists remain the final check; governance defines permitted use, monitoring and retraining policies.
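The release‑gate takeaway above could be enforced with a simple completeness check before promotion. The sketch below is illustrative only: the auto‑captured field names (`dataset_id`, `code_commit`, `mlflow_run_id`, `artifact_checksums`) are hypothetical stand‑ins for whatever the passport actually records, while the manual fields (purpose, risks, license) follow the list given in the text.

```python
# Hypothetical names for lineage fields captured automatically.
REQUIRED_AUTO_FIELDS = ("dataset_id", "code_commit", "mlflow_run_id", "artifact_checksums")
# Manual fields named in the takeaway: purpose, risks, license.
REQUIRED_MANUAL_FIELDS = ("purpose", "risks", "license")


def missing_fields(passport: dict) -> list[str]:
    """Return required fields that are absent or empty, in a stable order."""
    required = REQUIRED_AUTO_FIELDS + REQUIRED_MANUAL_FIELDS
    return [name for name in required if not passport.get(name)]


def can_promote(passport: dict) -> bool:
    """A model may be promoted only when every required field is filled in."""
    return not missing_fields(passport)
```

A deployment script might call `missing_fields` and refuse to promote while the list is non‑empty, showing the reviewer exactly which entries still need attention.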
Strengths and Limitations
Strengths:
- Standards‑conformant, open‑source stack integrating MLflow, DVC, Git and MINIO automates lifecycle provenance and issues verifiable digital identities for datasets and models.
- Modular, on‑premise‑ready architecture records checksums and executable lineage (code, parameters, artifacts), reducing manual documentation while explicitly avoiding storage of raw PHI in the passport artifact.
Limitations:
- Post‑deployment monitoring is pending: drift detection, outlier detection and in‑use performance metrics are not yet integrated into the Passport artifact itself.
- The current implementation targets centralized workflows and does not yet support federated learning. It was validated only in prostate MRI, so generalizability and cross‑site portability require local validation in other modalities and settings.
Bottom Line
AIPassport is ready for on‑premise pilots to standardize AI lineage and audits for pharmacy and to reduce documentation burden; next steps are to integrate in‑use monitoring and to validate locally across sites and modalities.