CGM Data Sufficiency

How Much Evidence Is Enough?

A statistician sat in a clinic teaching session, listened to two CGMs being compared as if their 5/5 scores meant the same thing, then asked a small question: how big were the studies behind those scores? One was a study of three hundred people across multiple sites. The other was a study of twelve. Both scored 5/5 on the original framework, and the room had been treating them as equivalent. The data sufficiency upgrade came directly out of that question. Answering yes to all five questions is necessary; the size of the study behind those answers is what makes them trustworthy.


The loophole the upgrade closes

The original DSN Forum UK CGM Comparison Chart scored devices on five criteria. A 5/5 score meant the accuracy data had addressed each of the five risk areas. That was meaningful progress in a market where regulatory approval did not require any of them. But a loophole emerged: a device could score 5/5 with twelve people in the study; another could score 5/5 with three hundred. The label looked the same. The generalisability did not. The January 2026 framework update added a data sufficiency requirement to close that gap.

Figure: Data sufficiency: how many days of CGM are enough for each clinical conversation? A staircase of four data-coverage thresholds, ordered by depth of read. Each step names what the data supports.

7 days, snapshot. Acute pattern, last week. Useful for a quick check.

14 days, ATTD standard. Battelino 2019 consensus. Time in range, time below, average glucose and GMI are all stable enough to act on. Sensor wear of at least 70% to call the read trustworthy.

30 days, cycle-aware. A complete menstrual cycle is visible, along with the weekend pattern over 4 to 5 weeks. Supports cycle-phase pattern discussions, shift work and the weekend-vs-weekday split. Sensor wear of at least 80% for the cycle read to hold; below 80%, treat the cycle read as directional only.

90 days, HbA1c window. The averaging window that HbA1c describes. Supports GMI vs lab HbA1c comparison (glycator status read), trend lines vs the prior 90-day window, and the clinic conversation cadence. Sensor wear of at least 80% across the full 90 days; below 80%, the GMI becomes increasingly unreliable as an HbA1c surrogate.

More days with low wear are worse than fewer days with high wear. The 70 to 80% threshold turns the report from suggestive into actionable.
More days do not always mean more useful. Each step supports a different conversation. Sensor wear is the gating criterion.
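
The staircase can be sketched as a small lookup, purely as an illustration. The thresholds are transcribed from the figure; the function name, the table layout and the rule that a long but low-wear record falls back to the next step down are assumptions made for this sketch, not part of the framework.

```python
from typing import List, Optional, Tuple

# (minimum days, label, minimum sensor wear in percent or None if not stated).
# Values come from the staircase figure; the structure itself is hypothetical.
STEPS: List[Tuple[int, str, Optional[float]]] = [
    (90, "HbA1c window", 80.0),
    (30, "cycle-aware", 80.0),
    (14, "ATTD standard", 70.0),
    (7, "snapshot", None),
]


def deepest_supported_read(days: int, wear_percent: float) -> str:
    """Return the deepest step whose day count and wear threshold are both met.

    Assumption: when wear is too low for a step, the record is checked
    against the next step down instead of being rejected outright.
    """
    for min_days, label, min_wear in STEPS:
        wear_ok = min_wear is None or wear_percent >= min_wear
        if days >= min_days and wear_ok:
            return label
    return "insufficient"


if __name__ == "__main__":
    # Ninety days at 75% wear misses the 80% gate for the two deepest steps.
    print(deepest_supported_read(90, 75.0))
```

The fallback rule is one possible reading of "sensor wear is the gating criterion"; a stricter policy could treat any low-wear record as directional only.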

The upgraded test is five criteria, plus enough people for the figures to mean something. A ±20/20 agreement of 94% in a study of twelve does not carry the same confidence as the same figure in a study of three hundred. Sufficiency is the second filter behind every device that earns its place in the GNL CGM Guide.
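
To make the twelve-versus-three-hundred contrast concrete, here is a minimal sketch using the Wilson score interval for a proportion. It assumes, crudely, that each participant counts as one independent observation; real agreement rates are computed over matched pairs that are correlated within a person, so this is an illustration of scale rather than a reanalysis of any study. The function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an observed proportion p_hat from n observations.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Same headline figure (94%), two study sizes taken from the text.
    for n in (12, 300):
        low, high = wilson_interval(0.94, n)
        print(f"n={n}: {low:.3f} to {high:.3f}")
```

Under this simplification the interval for twelve is several times wider than the interval for three hundred, which is the gap the sufficiency requirement is meant to expose.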

Two routes to data sufficiency

To meet the threshold, an accuracy study has to satisfy at least one of two conditions. Both aim at the same underlying property: accuracy figures stable enough to generalise.

Route A, minimum participant count

At least 50 participants in the accuracy study. The clearest route: enough people that the figures can be considered reasonably generalisable to a broader population. Most pivotal CGM trials in the cluster sit well above this floor.

Route B, high data-point density

Fewer than 50 participants, but with a very high number of paired CGM-to-reference data points per participant. Some intensive study designs achieve this with tight sensor-paired sampling and multiple sensors per person. If the total matched pairs are sufficient to produce stable accuracy statistics, a study with around 40 participants can meet the threshold.

A small study with sparse sampling produces accuracy numbers that can shift significantly with a handful of outliers. A study that meets either route produces figures that are less sensitive to which individuals happened to be in the cohort. For a device used by hundreds of thousands of people to drive insulin dosing, the evidence base needs to be big enough that the numbers describe the population, not just the sample.
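
The two routes can be written as a short check. This is a sketch under stated assumptions: the floor of 50 participants comes from Route A above, but the text gives no numeric cut-off for Route B, so `min_pairs_route_b` is a hypothetical parameter the caller must supply, and the function name is illustrative.

```python
from typing import Optional

ROUTE_A_MIN_PARTICIPANTS = 50  # stated in the Route A description


def sufficiency_route(
    participants: int,
    matched_pairs: int,
    min_pairs_route_b: int,
) -> Optional[str]:
    """Return "A", "B", or None if neither route is satisfied.

    min_pairs_route_b is a placeholder: the framework text does not
    publish a numeric threshold for "sufficient" matched pairs.
    """
    if participants >= ROUTE_A_MIN_PARTICIPANTS:
        return "A"
    if matched_pairs >= min_pairs_route_b:
        return "B"
    return None


if __name__ == "__main__":
    # Dexcom G7 figures from the status list; the 10,000 cut-off is invented
    # for illustration and does not affect a Route A result.
    print(sufficiency_route(316, 77_774, min_pairs_route_b=10_000))
```

Keeping the Route B threshold as an explicit argument makes clear that the judgement of "sufficient matched pairs" sits outside this sketch.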

Current sufficiency status by device

The five mainstream devices in the GNL CGM cluster all meet the sufficiency threshold. The two devices on the watching list are pending; the gating steps are peer-reviewed publication and the wider evidence base needed for confident generalisability. Numbers below reflect published evidence as of April 2026.

Sufficiency status, mainstream devices

Dexcom G7. Met. Garg et al. 2022, n=316, 619 sensors, 77,774 matched reference pairs.

FreeStyle Libre 3 (and 3 Plus). Met. Abbott pivotal 2022 (Libre 3), n=72; Vaughan et al. 2025 meta-analysis pooling a decade of Libre studies provides the wider population layer.

Roche Accu-Chek SmartGuide. Met. Mader et al. 2024, n=48 with three sensors per participant (139 sensors analysed), three German and Austrian sites. n=48 sits just below the Route A floor of 50, so the device relies on Route B: the three-sensor design supplies the data-point density that supports the analysis.

MiniMed Simplera Sync. Met. CIP330 pivotal trial, n=243, ages 2 to 80 years, FDA submission data; peer-reviewed publication anticipated.

Senseonics Eversense 365. Met. Bailey et al. 2025 ENHANCE trial, n=110 adults, 40,497 matched YSI reference pairs.

Sufficiency status, watching list

CareSens Air (Spirit Health). Pending. CE marked; manufacturer-published 15-day accuracy data on file. Pivotal-trial publication and the wider evidence base needed to clear sufficiency are the next required steps. Once peer-reviewed accuracy data lands, the page will be rebuilt to flagship format and the device will join the mainstream row.

GlucoMen iCan (Menarini Diagnostics). Pending. CE marked; published data is pending or under review. Same gating conditions as CareSens Air.

Why sufficiency matters clinically

CGM accuracy claims from small studies are real claims from real measurements. But sampling effects bite hardest at small numbers. If the participants happen to have more stable glucose patterns, the device looks better than it would across a broader population. If they happen to have unusually variable glucose, it looks worse. Only larger studies smooth this out. Sufficiency is a way of saying: the figure on the page is the figure you can trust to describe how the device behaves in the people who will actually use it.
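
The sampling effect described above can be illustrated with a small simulation. Everything in it is assumed for the sake of the sketch: per-participant agreement rates are drawn from a Beta(47, 3) distribution (mean 0.94), which is an arbitrary stand-in for between-person variability and is not fitted to any device or study.

```python
import random
import statistics


def study_level_spread(cohort_size: int, n_studies: int = 500, seed: int = 0) -> float:
    """Standard deviation of the study-level agreement rate across simulated studies.

    Each simulated study recruits cohort_size participants, each with an
    individual agreement rate drawn from Beta(47, 3); the study's headline
    figure is the mean of those rates.
    """
    rng = random.Random(seed)
    headline_figures = []
    for _ in range(n_studies):
        cohort = [rng.betavariate(47, 3) for _ in range(cohort_size)]
        headline_figures.append(statistics.fmean(cohort))
    return statistics.stdev(headline_figures)


if __name__ == "__main__":
    for size in (12, 300):
        print(f"cohort of {size}: spread {study_level_spread(size):.4f}")
```

With the same underlying population, the headline figure from cohorts of twelve scatters far more from study to study than the figure from cohorts of three hundred, which is the instability that larger studies smooth out.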
