Our 5 tips for AI start-ups to effectively complete and maintain their CE marking technical file

Bridging the gap between theory and practice for a Software as a Medical Device powered by artificial intelligence

                                                         by Inès D’Heygère 09 May 2022

Every company that plans on launching a Software as a Medical Device (SaMD) faces the obligation to comply with the regulations applicable in the countries where it markets. If you want to commercialize your medical device in Europe, you will need to familiarize yourself with CE marking.

The regulations in place apply to all types of SaMD, whether or not they rely on Artificial Intelligence (AI), and require you to build a technical file, i.e. a set of documents that demonstrate the conformity of your product with the CE marking legislation. 

However, AI comes with many specificities, and because AI is still a relatively new technology in medical laboratories, there is little clear guidance today on how to reflect them in your technical file.

At ImVitro, we initiated the CE marking process early in the product design phase and were able to submit our file after only 4 months of work. As Operations Manager, I was in charge of building our technical file with the help of specialized consultants. In spite of their assistance, we still dedicated significant time to completing the file, because no one was better positioned than we were to understand our medical device in detail.

This article covers the 5 CE marking requirements for AI-powered medical device software that were the most challenging for us, and how we bridged the gap between theory and practice at ImVitro.

 

1. Classification of our medical device

While attempting to define the class of our medical device as per the new EU Medical Device Regulation (MDR), we were faced with a rather unique question: do we provide information that is used to diagnose a disease or impact a therapeutic treatment?

Our software is intended for In Vitro Fertilization (IVF) experts and analyzes embryo videos and patient clinical data to assess the potential that each embryo has to lead to a pregnancy.

A diagnosis is the identification of a disease or a state according to symptoms. Our embryo assessment identifies diseases neither in the embryos nor in the patients, who have already been diagnosed with infertility.

On the other hand, a therapeutic treatment relates to healing a disease, mostly with medication. No matter how it is used, our software does not heal infertility.

Looking at these definitions, it appears that our medical device is neither intended for diagnosis nor therapeutic purposes, and because no harm can come from its use other than the usual ones related to embryo selection, it belongs to Class I devices. This was confirmed by the ANSM in writing.

The classification is a crucial step, as it affects not only the regulatory timeline for obtaining the CE mark but also the pace of continuous improvement and changes afterwards. In retrospect, being in Class I is a major plus because it removes the need for regular audits; you do, however, need to be ready for an audit at any time, and must therefore thoroughly document your product and every change to it.

 

2. Scope of the CE marking

One of the first questions we asked ourselves concerned the scope of our future medical device. Our product is composed of a SaaS platform and AI algorithms. Even though our entire software is meant for healthcare, only the AI algorithms within it have a medical intended use that meets the definition of a medical device according to the MDR.

Therefore, should we CE mark only our algorithms, or our entire software, which also contains other modules (e.g. patient inventory, profile page, etc.)?

CE marking the entire software requires more day-to-day documentation from the regulatory and research & development teams, which is more resource-intensive. Because every functionality must be specified, analyzed and tested, there is ultimately a loss of flexibility in terms of product releases.

CE marking an algorithm alone is not the standard approach, but it has become more common in the last few years, and today medical algorithms are CE marked and FDA approved. In order to do so, one should first be able to justify that the algorithm is entirely independent from the rest of the software and that the medical intent described in the technical file is fulfilled only by the algorithm in question.

Therefore, if the majority of the software does not meet the definition of a medical device, it may be smarter to CE mark the algorithm only.

 

3. Performance specifications

The MDR requires the performance of the medical device to be included in the technical file. The device performance is defined as its ability to achieve its intended purpose. For AI algorithms, there are a multitude of metrics that can be used to report the technical performance, so we wondered: which metric should we report?

First, there is no one right answer. Second, this metric can be different from the one you communicate to your customers if need be. 

At ImVitro, a first choice could have been to report accuracy alone. But this oversimplification has drawbacks, because reporting only one metric does not provide enough information to describe how an algorithm behaves and evolves. There is generally a compromise to be made between metrics to achieve the targeted improvements.

We decided to report the following three metrics, which also seemed to be common practice at other AI-based health start-ups we contacted: accuracy, sensitivity and specificity.
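To make these three metrics concrete, here is a minimal sketch of how they can be computed from a binary classification test set. The variable names, the 0.5 decision threshold and the toy inputs are purely illustrative assumptions, not our actual pipeline.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5) -> dict:
    """Return accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    y_pred = (y_score >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Toy example: scores produced by a model on a small, made-up test set
metrics = evaluate(
    y_true=np.array([1, 0, 1, 1, 0, 0]),
    y_score=np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1]),
)
print(metrics)
```

Reporting all three together is what reveals the compromises mentioned above: a threshold change that improves sensitivity will typically cost some specificity, and accuracy alone would hide that trade-off.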

Our metrics are calculated on a test dataset, as is common in machine learning. Unlike for FDA approval, there seem to be, as of today, no regulatory requirements on reporting the performance of Deep Learning algorithms, nor on the datasets used to train, validate and test them.

We made the effort to curate and document our initial test dataset early in the process and made sure it included a wide range of clinical situations to prove our performance. We keep track of any change and systematically provide rationales explaining why corrections were necessary (for example, to modify the distribution of patients to better demonstrate our claims, to remove bias, or to increase the set size). Making changes to the test set is not trivial, and you should in any case be able to compute performance on the same test set to demonstrate non-inferiority across versions over time, as sketched below.
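The sketch below shows one way such a non-inferiority check between releases could look, assuming each release stores its test-set metrics together with a hash of the frozen test set. The margin, field names and numbers are illustrative assumptions, not our actual process or figures.

```python
MARGIN = 0.01  # illustrative non-inferiority margin on each metric
METRICS = ("accuracy", "sensitivity", "specificity")

def check_non_inferiority(previous: dict, candidate: dict, margin: float = MARGIN) -> bool:
    """The candidate release passes if no tracked metric drops by more than the margin."""
    # Only compare results computed on the identical frozen test set
    assert previous["test_set_sha256"] == candidate["test_set_sha256"], "test sets differ"
    return all(candidate[m] >= previous[m] - margin for m in METRICS)

# Made-up records for two releases evaluated on the same frozen test set
v1_2 = {"test_set_sha256": "ab12…", "accuracy": 0.71, "sensitivity": 0.68, "specificity": 0.73}
v1_3 = {"test_set_sha256": "ab12…", "accuracy": 0.72, "sensitivity": 0.67, "specificity": 0.74}
print("non-inferior:", check_non_inferiority(v1_2, v1_3))
```

Keeping this comparison automatic and versioned makes it much easier to answer an auditor's question about why a given release was allowed to ship.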

 

4. Risk analysis on algorithms

Another challenging exercise that we conducted early in the design process was the product safety risk analysis. While many software-related risks are easily identifiable and mitigable (e.g. those arising from a network issue, system failure or security breach), those specific to artificial intelligence are harder to identify in advance.

The classic known risks of Deep Learning algorithms relate to the input data, which is vulnerable to biases; the development itself, which is subject to human errors or sub-optimal choices; and the output, which can be inaccurately interpreted or used (1).

As much as possible, the company must implement “by design” safeguards to mitigate these risks. Making sure the results are properly interpreted and used can be addressed with usability mitigations, i.e. by optimizing the user interface to prevent human errors. However, making sure the development and results don’t contain biases or human errors is more complex because of the “black box” aspect.

But what relative harm can come to the patient or user from the latter situations if the algorithm has proven to perform better than the doctor alone? Is it a risk to the patient if using such an algorithm could still improve their treatment? Are false positives a risk if the algorithm produces fewer of them than the doctor does? Remember that it all comes down to the intended use of your medical device and its claims: whether you claim clinical non-inferiority or superiority, or only guarantee absolute performance, the associated risks may change.

In any case, even if not all risks can be identified effectively or in a timely manner, algorithm-related risks must be dealt with as part of a company-wide effort, with an appropriate organization and well thought-out processes. The design & development should put a strong focus on data management, algorithm testing, validation, and monitoring once the product is in use.

5. Major or minor changes

A start-up is usually fast-paced and has to adapt in a very agile way. Making changes to the product may be necessary for several reasons: newly discovered user needs, additional data available, parameter optimization, technological developments etc.

The regulation on medical devices imposes strong control on changes, especially significant ones. For Class II and III medical devices, a significant change requires approval from a Notified Body. For a Class I device, the regime is more permissive, but a significant change at least impacts its identification (UDI-DI), declaration of conformity and technical file, so it is important to identify it in order to modify the registration and be prepared in case of an unexpected audit.

However, the definition of significant changes is rather vague. In the guidance MDCG 2020-3, any “change of an algorithm” is categorized as significant, but to us, some changes are undoubtedly minor. 

We discussed with consultants and with peers from more advanced start-ups and learned that their Notified Bodies had agreed to classify some changes to the algorithm as minor, provided that the resulting performance had not decreased following the change.

In practice, at ImVitro, we first discuss the desired changes during a design review to align on their level of significance. Second, change requests are created by the lead developers in such a way that their description is understandable to the regulatory team. Third, the regulatory team documents the impacts using the guidance MDCG 2020-3 on significant changes and approves the request. Fourth, the research & development team provides an analysis and conclusion on the new performance after the development. Finally, the change request is closed after the release, and every step is traced in documents that are signed by at least 3 team members.
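To illustrate the kind of traceability this workflow produces, here is a sketch of a change request captured as a structured record. The fields, statuses and example content are assumptions for illustration only, not our actual template.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeRequest:
    identifier: str
    description: str                  # written by the lead developer, readable by the regulatory team
    significance: str                 # "minor" or "significant", per the MDCG 2020-3 analysis
    impact_analysis: str = ""         # documented and approved by the regulatory team
    performance_conclusion: str = ""  # added by R&D after development
    signatures: list[str] = field(default_factory=list)  # at least 3 team members before closure

    def can_close(self) -> bool:
        """A request can only be closed once every step of the workflow has left a trace."""
        return bool(self.impact_analysis and self.performance_conclusion and len(self.signatures) >= 3)

# Hypothetical example of a minor change moving through the workflow
cr = ChangeRequest(
    identifier="CR-EXAMPLE-001",
    description="Retrain scoring model with additional clinic data",
    significance="minor",
)
cr.impact_analysis = "No change to intended use; UDI-DI unaffected."
cr.performance_conclusion = "Non-inferior on the frozen test set."
cr.signatures = ["lead developer", "regulatory", "R&D manager"]
print(cr.can_close())
```

Whatever the tooling, the point is that each step named above leaves a signed, auditable trace that can be produced on demand.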

If you are looking for more examples of changes that can be justified as minor, I suggest you refer to the FDA's Proposed Regulatory Framework for Modifications to AI/ML-Based Software as a Medical Device.

 

Conclusion

Standards and regulations are still catching up with AI, so you need to pay attention to the changes in requirements and be ready to act. The new laws, including the Artificial Intelligence Act, should answer some of our most pressing questions.

If you have any doubts while building your technical file, I would recommend that you reach out to other AI start-ups in the medical field. 

Do not underestimate the technical debt you can accumulate if you do not enforce good habits from the start within the entire company. Do not underestimate either the efforts required to maintain your technical file over the course of the product releases. We recently hired a QARA manager who ensures our day-to-day compliance and works with consultants from time to time.

In addition to the CE marking, we decided at ImVitro to also obtain ISO 13485 certification of our Quality Management System (QMS), which requires several external audits, to consolidate our internal processes and reassure external stakeholders that we take regulatory compliance very seriously.

As much as this can sound cumbersome, meeting all these regulatory requirements has made our company stronger and ensures that we work safely to better help patients, who are at the heart of our mission.