FDA Issues Guiding Principles for Good Machine Learning Practice for Medical Device Development


On October 27, 2021, the U.S. Food and Drug Administration (FDA), Health Canada and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) issued a set of ten guiding principles meant to aid the development of Good Machine Learning Practice (GMLP).

Artificial intelligence and machine learning (AI/ML) offer the potential to analyze the vast amount of real-world data generated from health care every day and to provide transformative insights. These insights can not only help improve individual product design and performance, but also hold the promise of transforming health care.

However, AI/ML technology has unique complexities and considerations. The goal of these guiding principles is to help promote safe, effective, and high-quality medical devices that use AI/ML to best cultivate the future of this rapidly progressing field.

Although the principles are neither formal nor binding, companies that continue to leverage AI/ML in their medical devices should remain mindful of each of the ten guiding principles:

  1. Leveraging Multi-Disciplinary Expertise Throughout the Total Product Life Cycle

Companies should leverage internal and external multi-disciplinary expertise to ensure they have a thorough understanding of the model’s integration into the clinical workflow, and the desired benefits and associated patient risks, to ensure the safety and effectiveness of the device while serving clinically meaningful needs throughout the product lifecycle.

  2. Implementing Good Software Engineering and Security Practices

As part of model design, companies should implement data quality assurance, data management, good software engineering practices, and robust cybersecurity practices.

  3. Utilizing Clinical Study Participants and Data Sets that Are Representative of the Intended Patient Population

Companies should ensure that their data collection protocols capture the relevant characteristics of the intended patient population, conditions of use, and measurement inputs, with an adequate sample size in the clinical study and in the training and test datasets, so that results can reasonably be generalized to the population of interest.  Data collection protocols appropriate for the intended patient population may help to identify where the model may underperform and may mitigate bias.

  4. Keeping Training Sets and Test Sets Independent

Companies should consider and address all sources of dependence between the training and test datasets, including patient, data acquisition, and site factors, to ensure that the two sets remain independent.
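To make the patient-level dependence concern concrete, the sketch below shows one common way practitioners keep training and test sets independent: splitting by patient identifier so that no patient's data appears in both sets. This is an illustrative example, not part of the agencies' guidance; the data, column layout, and use of scikit-learn's `GroupShuffleSplit` are all assumptions for the sake of the sketch.

```python
# Illustrative sketch only: group-aware splitting to avoid patient-level
# leakage between training and test data. All data here is synthetic.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_samples = 100
X = rng.normal(size=(n_samples, 4))                 # synthetic feature matrix
y = rng.integers(0, 2, size=n_samples)              # synthetic binary labels
patient_ids = rng.integers(0, 20, size=n_samples)   # 20 patients, repeat records

# Split at the patient level, not the record level, so every patient's
# records land entirely in one side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# By construction, no patient contributes data to both sets.
overlap = set(patient_ids[train_idx]) & set(patient_ids[test_idx])
print(len(overlap))  # 0
```

A naive random split over records would let the same patient appear on both sides, inflating measured performance; grouping by patient (and, analogously, by site or acquisition device) addresses the dependence sources the principle names.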

  5. Selecting Reference Datasets Based Upon Best Available Methods

Companies should use accepted, best available methods for developing a reference dataset, i.e., a reference standard, to ensure clinically relevant and well characterized data are collected (and that the reference’s limitations are understood).  Where available, companies should use accepted reference datasets in model development and testing that promote and demonstrate model robustness and generalizability across the target population.

  6. Tailoring Model Design to the Available Data and Reflecting the Intended Use of the Device

Companies should have a solid understanding of the clinical benefits and risks related to the product and utilize this understanding to create clinically meaningful performance goals.  Additionally, companies should ensure the model design is suited to the available data and supports active mitigation of the known risks.

  7. Focusing on the Performance of the Human-AI Team

Where the model has a human element, companies should consider human factors and human interpretability of the model outputs.

  8. Testing Demonstrates Device Performance during Clinically Relevant Conditions

Companies should develop and execute statistically sound test plans to assess device performance on data independent of the training dataset. Such assessment should be conducted under clinically relevant conditions, with consideration given to the intended use population, important subgroups, the clinical environment, use by the Human-AI Team, measurement inputs, and potential confounding factors.

  9. Providing Users Clear, Essential Information

Companies should provide users ready access to clear, contextually relevant information that is appropriate for the target audience. This includes information pertaining to the product's intended use and indications for use, the performance of the model for appropriate subgroups, the characteristics of the data used to train and test the model, acceptable inputs, known limitations, user interface interpretation, and clinical workflow integration of the model. Users should also be made aware of device modifications, updates from real-world performance monitoring, the basis for decision-making (when available), and a way to communicate product concerns to the company.

  10. Monitoring Deployed Models for Performance and Managing Re-Training Risks

Companies should deploy models that are capable of being monitored in real-world usage with a focus on maintaining or improving safety and performance. Further, when models are trained after deployment, companies should ensure there are appropriate controls in place to manage risks that may impact the safety and performance of the model.
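As a minimal illustration of what real-world performance monitoring can look like in practice, the sketch below flags distribution drift in a single model input using a two-sample Kolmogorov-Smirnov test from SciPy. The data, the feature, and the alerting threshold are all hypothetical assumptions; actual monitoring programs would cover outputs, subgroups, and clinical outcomes as well.

```python
# Illustrative sketch only: detecting input drift after deployment by
# comparing deployed data against the training-time reference distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
training_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # reference data
deployed_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted live inputs

# Two-sample KS test: a small p-value indicates the deployed inputs no
# longer match the distribution the model was trained on.
stat, p_value = ks_2samp(training_feature, deployed_feature)
drift_detected = bool(p_value < 0.01)  # threshold is a hypothetical policy choice
print(drift_detected)
```

A drift flag like this would not by itself establish a safety problem; it is a trigger for the risk controls the principle describes, such as investigation, performance re-assessment, or controlled re-training.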

FDA’s expectations with respect to GMLP will continue to advance and become more granular as additional stakeholder input is considered.  The docket for FDA’s GMLP Guiding Principles, FDA-2019-N-1185, is open for public comment.