Reduce blastocyst grading variability with EMBRYOLY


About us & our product

EMBRYOLY is ImVitro’s AI-powered SaaS platform. Its core feature ranks embryos based on their morphokinetics, then recommends a transfer priority personalized to the patient for increased accuracy. See our product page for more information!

The challenge

While evaluating embryo kinetics is paramount and core to EMBRYOLY, embryologists still lend weight to the morphological evaluation of embryos, especially at the blastocyst stage. Several grading systems have been described and are in use across countries, with the Gardner score being the most common. The Gardner score helps embryologists grade the blastocyst expansion (EXP), as well as the quality of the inner cell mass (ICM) and trophectoderm (TE), according to a defined scale. The scale is, however, prone to subjectivity from one embryologist to another: it relies on a qualitative visual assessment of the embryo, and embryos are not necessarily graded at the same developmental time. This can create inequality of care and affect each patient’s chances of success.

Recently, Artificial Intelligence (AI) has gained momentum in IVF as a potential means to accelerate embryo evaluation and maximize success rates, as showcased in our case studies. We have recently added a new algorithm that automatically provides the Gardner score for blastocysts, in the hope of also standardizing blastocyst grading practices.

Questions to answer

  • Can EMBRYOLY’s semi-automatic AI-based blastocyst grading algorithm reduce variability in Gardner score grading? 

The study at a glance

27.8% agreement across embryologists on the complete Gardner score

38.1% more agreement on the complete Gardner score between embryologists & EMBRYOLY compared to embryologists between themselves

70.0% agreement in grading between experts and EMBRYOLY


A total of 60 blastocysts from 60 different patients across 9 clinics in France and Spain, selected to contain mixed Gardner scores, were used for this evaluation. 

For each blastocyst, EMBRYOLY showed both the embryo video and the corresponding proposed Gardner score to N=3 embryologists, each from a different clinic and with at least 5 years of experience. EMBRYOLY computed the Gardner score from the video’s last frame. Embryologists were shown the Gardner score provided by EMBRYOLY at the embryo level, without knowing at which frame it was computed. If they disagreed with it, they were asked to give their own grade based on their preferred frame, as they would in their clinical practice.

For each pair of participants and between each participant and EMBRYOLY, the agreement and Cohen’s Kappa score, a chance-corrected measure of agreement, were computed for each of the Gardner score elements (EXP, ICM and TE). The Kappa score reflects inter-annotator reliability after accounting for the agreement that would occur by chance. Scores range between −1 and 1, where 1 is perfect agreement, 0 is the agreement expected from random chance, and values below 0 (although unlikely) indicate agreement worse than chance. The agreement on the overall Gardner score (e.g. B4BB) was also computed.
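As an illustration of the statistic described above, here is a minimal Python sketch of Cohen’s Kappa for one pair of annotators on one Gardner element. The function name `cohen_kappa` and the ICM grades are hypothetical and for illustration only; this is not EMBRYOLY’s implementation and the values are not study data.

```python
from collections import Counter


def cohen_kappa(grades_a, grades_b):
    """Cohen's Kappa for two annotators grading the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label frequencies.
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("Both annotators must grade the same non-empty set")
    n = len(grades_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement: product of marginal label frequencies, summed.
    count_a, count_b = Counter(grades_a), Counter(grades_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ICM grades from two annotators (illustrative only).
annotator_1 = ["A", "A", "B", "B", "C", "A", "B", "C"]
annotator_2 = ["A", "B", "B", "B", "C", "A", "C", "C"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Raw agreement: 75.0%, Kappa: {kappa:.2f}")
```

In this toy example the two annotators agree on 6 of 8 blastocysts (75%), yet the Kappa is lower because part of that agreement would be expected by chance alone.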


Experts agreed with each other only 27.8±5.8% of the time on the complete Gardner score (e.g. B3AA) of the embryo, regardless of when it was graded. This shows high variability in blastocyst grading across experts and across the hours post insemination (hpi) of the frames used for grading.
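A pairwise agreement figure of this kind can be sketched as follows. The grades, the annotator names, and the reading of "±" as a standard deviation across annotator pairs are assumptions made for illustration; the study does not state how the spread was computed.

```python
from itertools import combinations
from statistics import mean, pstdev


def complete_score_agreement(scores_a, scores_b):
    """Share of blastocysts given the exact same complete Gardner score."""
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical complete Gardner scores for 5 blastocysts (not study data).
grades = {
    "embryologist_1": ["B4AA", "B3AB", "B5BB", "B4BA", "B3BB"],
    "embryologist_2": ["B4AA", "B3BB", "B5BB", "B4BB", "B3BB"],
    "embryologist_3": ["B4AB", "B3AB", "B5BB", "B4BA", "B2BB"],
}

# One agreement value per pair of embryologists (3 pairs for 3 experts).
pairwise = {
    (a, b): complete_score_agreement(grades[a], grades[b])
    for a, b in combinations(grades, 2)
}

# Assumption: summarize as mean and population standard deviation over pairs.
average = mean(pairwise.values())
spread = pstdev(pairwise.values())
print(f"Complete-score agreement: {average:.1%} ± {spread:.1%}")
```

Because the complete score only counts as a match when EXP, ICM and TE all match, it is a stricter criterion than agreement on each element taken separately.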

Experts agreed with EMBRYOLY 38.1% more often (in relative terms) than with each other on the complete Gardner score of the embryo, regardless of when it was graded. Experts accepted 38.4±8.4% of the complete Gardner scores suggested by EMBRYOLY, compared to 27.8±5.8% agreement with the complete Gardner scores given by other embryologists.
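The 38.1% figure is consistent with a relative difference between the two agreement rates quoted above, as this short check shows. Reading it as a relative increase is our interpretation of the reported numbers; the variable names are illustrative.

```python
# Agreement rates on the complete Gardner score, as reported above (in %).
expert_vs_embryoly = 38.4
expert_vs_expert = 27.8

# Absolute difference in percentage points.
absolute_gain = expert_vs_embryoly - expert_vs_expert

# Relative increase over the expert-to-expert baseline, in percent.
relative_gain = absolute_gain / expert_vs_expert * 100

print(f"Absolute gain: {absolute_gain:.1f} percentage points")
print(f"Relative gain: {relative_gain:.1f}%")
```

The absolute gain is 10.6 percentage points, which corresponds to a relative gain of about 38.1% over the expert-to-expert baseline.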

There is a high acceptance of the EMBRYOLY Gardner score by experts, regardless of when it was graded. This has been demonstrated in terms of both:
a) Agreement: Embryologists agreed on average 70.0±10% of the time with EMBRYOLY compared to 62.1±10.1% agreement between themselves on average when considering EXP, ICM and TE separately.
b) Kappa score: The average Kappa scores between experts and EMBRYOLY were 0.68±0.05 (EXP), 0.44±0.08 (ICM), and 0.62±0.06 (TE) vs. 0.57±0.05 (EXP), 0.41±0.04 (ICM) and 0.55±0.07 (TE) between experts.

In 20% of the cases, embryologists (n=2) chose the same frame to grade the blastocysts. When they graded the exact same frame, they agreed on the overall Gardner grade in 66.67% of the cases vs. 33.33% when considering agreement on all blastocysts (n=60).

Conclusion at a glance

This study observed high variability between experts when grading blastocysts, especially for ICM and TE, in accordance with existing studies. The variability stems not only from inter-observer differences but also from the fact that experts did not necessarily grade the blastocyst at the exact same developmental time, much as in their clinical routine. In contrast, embryologists agreed more often with EMBRYOLY’s Gardner score than with each other, which suggests that EMBRYOLY can support a more standardized, less subjective approach to grading blastocysts. As such, EMBRYOLY can serve as a second pair of eyes to maximize the chances that patients always benefit from the same standard of care across experts and clinics.

Contact us for a demo to see how our algorithms perform on your data.