Why Calibration is Your Algorithm’s Airbag, John Pemberton / GNL

Prepared for Dexcom Medical Affairs.

Confidential, for Dexcom Medical Affairs Europe and Dexcom Spain, internal review only

CGM Accuracy in Insulin Decisions

Calibration is the route back
when the sensor drifts

Every insulin decision made from a CGM reading trusts the sensor completely. That is true of an AID algorithm running every five minutes, and it is equally true of a person on MDI adjusting a mealtime dose. This is a clinician’s perspective on why that dependency matters, when it becomes dangerous, and why optional user calibration is a safety feature that factory calibration alone cannot provide.

AID systems · MDI in T1 and T2 · CGM accuracy · Sensor calibration · Hypoglycaemia prevention

John Pemberton, Diabetes Dietitian · Birmingham Women’s & Children’s NHS Foundation Trust · The Glucose Never Lies®

The Fundamental Principle

AID does not make decisions.
Your sensor does.

Automated insulin delivery represents a genuine step change in T1D management. But beneath the sophistication – the predictive algorithms, the adaptive basal rates, the micro-bolus calculations running every five minutes – there is one irreducible dependency: the glucose reading the system receives.

The algorithm does not think. It does not feel. It cannot ask whether a reading makes sense in context, whether the person has been lying on the sensor, or whether the value has drifted slowly over three hours. It receives a number and acts on that number. Every time, without doubt.

“The algorithm will always make the right decision for the number it receives. If the number is wrong, so is the decision.”

Why the threshold matters: same actual glucose, different outcomes

20% error: AID can compensate

🩸 Actual 6.0 mmol/L → 📡 Sensor reads 7.2 (+20%) → ⚙️ Algorithm makes a small correction → 🛡️ Predictive low glucose suspend (PLGS) triggers, insulin suspended → ✅ Hypo prevented

The sensor still reads high as glucose drops, but close enough to the truth that it crosses the suspend threshold in time. PLGS catches it.

40% error: AID cannot compensate

🩸 Actual 6.0 mmol/L → 📡 Sensor reads 8.4 (+40%) → ⚙️ Algorithm makes a larger correction → ⚠️ Actual falls to 3.2, sensor reads 4.5 → 🔴 No PLGS trigger, hypo

Actual glucose is 3.2. Sensor reads 4.5, above the suspend threshold. AID never suspends.
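The arithmetic behind the two panels can be sketched in a few lines. This is a simplified illustration, not any manufacturer's algorithm: it assumes a constant percentage over-read and a single static suspend threshold, and the 3.9 mmol/L value (`SUSPEND_THRESHOLD_MMOL`) is a hypothetical choice made only for illustration. Real PLGS logic acts on predicted glucose, and thresholds vary by system and settings.

```python
# Simplified model: the sensor over-reads by a fixed percentage, and insulin
# is suspended only when the sensor value is at or below a static threshold.
SUSPEND_THRESHOLD_MMOL = 3.9  # hypothetical static threshold, for illustration


def sensor_reading(actual_mmol: float, error_fraction: float) -> float:
    """Sensor value for a constant positive (over-reading) error."""
    return actual_mmol * (1.0 + error_fraction)


def suspends(actual_mmol: float, error_fraction: float,
             threshold: float = SUSPEND_THRESHOLD_MMOL) -> bool:
    """True if the biased sensor value reaches the suspend threshold."""
    return sensor_reading(actual_mmol, error_fraction) <= threshold


if __name__ == "__main__":
    actual_low = 3.2  # true glucose after the fall, as in the 40% panel
    for err in (0.20, 0.40):
        reading = sensor_reading(actual_low, err)
        print(f"+{err:.0%} error: sensor {reading:.1f} mmol/L, "
              f"suspend={suspends(actual_low, err)}")
```

With the same fall to 3.2 mmol/L, the +20% sensor reads about 3.8 and crosses the assumed threshold, while the +40% sensor reads about 4.5 and never does. The true glucose is identical; only the size of the error decides whether the safety net fires.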

The Compounding Error

A small inaccuracy is not a small problem

A mean absolute relative difference (MARD) of 8-10% sounds reassuring – particularly when we accept similar variation from blood glucose meters in clinical practice. But AID systems operate continuously near critical thresholds. An 8% error at 12.0 mmol/L is clinically irrelevant. The same error at 4.5 mmol/L can trigger a correction that causes a hypoglycaemic event.

Sensor Error Impact Explorer (example settings: CGM error 10%, actual glucose 5.5 mmol/L)

Any extra insulin shown is illustrative, assuming an insulin sensitivity factor (ISF) of 2.2 mmol/L per unit. Individual sensitivity varies significantly.
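The explorer's extra-insulin figure can be approximated under one stated assumption: a linear correction dose of (glucose minus target) / ISF. The target then cancels, and the extra insulin attributable to the sensor error is simply the over-read divided by the ISF. The function name `extra_insulin_units` is hypothetical; this is a sketch, not the explorer's own code.

```python
ISF_MMOL_PER_UNIT = 2.2  # illustrative ISF from the explorer footnote


def extra_insulin_units(actual_mmol: float, cgm_error_fraction: float,
                        isf: float = ISF_MMOL_PER_UNIT) -> float:
    """Extra correction insulin caused by a CGM reading error.

    Assumes a linear correction dose, (glucose - target) / isf. Dosing from
    the sensor instead of the true value adds (sensor - actual) / isf units;
    the target cancels out. A negative result means an under-read, i.e. less
    insulin than the true glucose would call for.
    """
    sensor_mmol = actual_mmol * (1.0 + cgm_error_fraction)
    return (sensor_mmol - actual_mmol) / isf


if __name__ == "__main__":
    units = extra_insulin_units(5.5, 0.10)
    print(f"10% over-read at 5.5 mmol/L: {units:.2f} extra units")
```

At the example settings (10% error, actual 5.5 mmol/L), the sensor reads 6.05 and the sketch gives 0.55 / 2.2 = 0.25 units of extra insulin.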

What the accuracy data actually shows

Even the best factory-calibrated CGM sensor, under controlled trial conditions, does not achieve universal 20/20 agreement. The Dexcom G7 De Novo Summary submitted to the FDA shows approximately 5% of paired readings falling outside the 20/20 zone and 0.5% outside 40/40. For a sensor reporting every five minutes (288 readings a day), that is a meaningful number of readings every day.

Dexcom G7, paired reading accuracy distribution

Within ±20% (clinically acceptable): ~95%

Outside ±20%, algorithm compensation uncertain: ~5%

Outside ±40%, algorithm cannot compensate (included in the ~5% above): ~0.5%

Source: Dexcom G7 Continuous Glucose Monitoring System, De Novo Summary (DEN220056), FDA, 2022. Data from pivotal trial across adult and paediatric populations under ambulatory conditions.

Why this matters: frequency versus recoverability

Every factory-calibrated sensor has a tail. At five-minute reading intervals, 0.5% outside 40/40 works out to roughly 1 to 2 readings per day in the trial data. Most of these self-correct within a reading or two: pressure artefacts, rapid-change lag, transient noise. What warrants a user-led calibration is a sustained drift episode, in which the sensor stays in the tail. In lived experience, that happens around three times a month. Rare. But when it happens, calibration is the only user-side mechanism that can intercept it before an algorithm or a person acts on it. This applies equally to AID and to anyone on MDI making a manual dose decision from a CGM reading. See the GNL CGM Accuracy guide →
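The readings-per-day figures follow from simple arithmetic: a five-minute interval gives 288 readings a day, so 5% outside 20/20 is about 14 readings and 0.5% outside 40/40 is about 1.4. A minimal sketch, assuming errors are spread evenly across the day; in practice they cluster in drift episodes, which is why a sustained episode matters more than the daily count.

```python
MINUTES_PER_DAY = 24 * 60


def readings_per_day(interval_min: float = 5.0) -> float:
    """Number of CGM readings in a day at a fixed reporting interval."""
    return MINUTES_PER_DAY / interval_min


def tail_readings_per_day(fraction_outside: float,
                          interval_min: float = 5.0) -> float:
    """Expected daily readings outside an agreement zone, assuming errors
    are spread evenly over the day (real errors cluster in episodes)."""
    return readings_per_day(interval_min) * fraction_outside


if __name__ == "__main__":
    print(f"readings per day:      {readings_per_day():.0f}")
    print(f"outside 20/20 (~5%):   {tail_readings_per_day(0.05):.1f}")
    print(f"outside 40/40 (~0.5%): {tail_readings_per_day(0.005):.1f}")
```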

When Accuracy Fails

Five scenarios where sensors drift –
and AID cannot tell

Factory calibration provides excellent baseline accuracy across the majority of sensor wear. But five predictable scenarios challenge that accuracy in ways the algorithm cannot detect or compensate for. In each case, the system keeps running: it just runs on the wrong information.

😴 Sensor Compression · High Risk

Lying on the sensor restricts interstitial fluid flow around the sensing filament, causing falsely low readings. The algorithm reduces or suspends insulin – while actual glucose may be entirely stable or rising. Overnight compression events can masquerade as real hypoglycaemia.

📉 End-of-Life Drift · High Risk

Sensor accuracy declines over the wear period, often more sharply in the final 20-30% of sensor life. Day 9 of a 10-day sensor may behave very differently to day 2. The cumulative drift is typically gradual – no single reading appears obviously wrong, yet the systematic error compounds every algorithm decision made during that period.

⚡ Rapid Glucose Change · Medium Risk

CGM measures interstitial glucose, which physiologically lags blood glucose by 5-15 minutes. During rapid postprandial rises or exercise-driven drops, this lag creates transient inaccuracy. AID systems model this delay, but the model is imperfect – particularly during unusual rates of change.

🔴 Glucose Extremes · High Risk

CGM accuracy decreases at both ends of the glucose range – particularly below 3.5 mmol/L and above 16.7 mmol/L. These are precisely the moments when algorithm decisions carry the greatest clinical consequence. Calibration under normal glucose conditions improves performance at the extremes.

🌡️ Environmental Factors · Medium Risk

Significant dehydration, high ambient temperature, and changes in skin perfusion affect interstitial fluid dynamics. High-intensity exercise creates a physiological environment that challenges sensor performance – altered blood flow, local tissue heating, and rapid metabolic shifts interact with the sensing chemistry in ways that vary between individuals.
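The lag described in the Rapid Glucose Change card can be put into numbers. Under a simple pure-delay model (an assumption; real interstitial kinetics are smoother than a fixed delay), the CGM shows the blood glucose from `lag_min` minutes earlier, so the gap is roughly the rate of change multiplied by the lag. The 0.15 mmol/L per minute fall used below is a hypothetical exercise-driven rate, not a figure from this page.

```python
def cgm_minus_blood(rate_mmol_per_min: float, lag_min: float) -> float:
    """Approximate CGM-minus-blood gap under a pure-delay lag model.

    If glucose changes at a constant rate and the CGM reports blood glucose
    from lag_min minutes ago, the CGM trails blood by rate * lag. A falling
    glucose (negative rate) therefore makes the CGM read high.
    """
    return -rate_mmol_per_min * lag_min


if __name__ == "__main__":
    falling_rate = -0.15  # hypothetical exercise-driven fall, mmol/L per min
    for lag in (5, 10, 15):  # the 5-15 minute physiological lag range
        gap = cgm_minus_blood(falling_rate, lag)
        print(f"lag {lag:>2} min: CGM reads {gap:+.2f} mmol/L versus blood")
```

Across the 5 to 15 minute range, that fall leaves the CGM reading roughly 0.75 to 2.25 mmol/L above blood glucose, which is why the card rates rapid change as a transient rather than a calibration problem.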

The Airbag Principle

You install it before you need it.
That is the point.

An airbag does not make you a better driver. It does not prevent accidents. It does not improve the car’s performance under normal conditions. You cannot feel it, hear it, or see it – until the moment you need it.

Calibration works on exactly the same logic. Most of the time, your sensor is accurate within clinically acceptable limits and calibration changes nothing meaningful. But calibration is not for most of the time. It is for the scenario you did not plan for – the compression artefact at 3 a.m., the sensor drift on day nine, the post-exercise reading that does not match how the person feels.

Same night. Same system. Different outcome.

Without Calibration 🔴

Sensor drifts 30% overnight. Algorithm reads 3.8 mmol/L (actual: 5.5 mmol/L). Suspends insulin. Glucose rises to 12.2 mmol/L by morning. Person wakes confused about why they are high – and why the system let it happen.

Error goes undetected until it becomes a problem

With Calibration ✅

Routine pre-sleep fingerstick flags a 27% discrepancy. Calibration entered. Sensor corrected. AID continues managing through the night with accurate data. Morning glucose: 6.2 mmol/L.

Error intercepted before the algorithm acts on it

“The algorithm cannot doubt its sensor. Calibration gives the person in the loop the ability to do what the algorithm cannot: ask whether the reading makes sense.”

Practical Guide

How to calibrate well

Calibration is most effective when it provides a clean blood glucose reference under stable physiological conditions. The GNL practice threshold is two consecutive readings more than 20% away from a fingerstick: a single discrepancy is often noise, but two in a row signal genuine drift. Following this rule, most people calibrate only a few times a month. Without the ability to calibrate at all, there is no route back when a reading lands outside 40/40, the 0.5% of readings where no algorithm or manual insulin decision, however well designed, can compensate.

When and how

✓ At least 2 hours after the last meal or bolus

✓ When the trend arrow is flat – glucose stable for 20 or more minutes

✓ Using a quality, well-maintained blood glucose meter

✓ Clean, dry fingertip – second drop, not first

✓ Before bed – a routine pre-sleep check catches overnight drift early

✓ If symptoms do not match the CGM reading at any time

✓ If two consecutive readings are more than 20% away from a fingerstick – single discrepancies can be noise, two in a row signals genuine drift worth correcting

When not to

✗ During or within 30 minutes of exercise – lag is physiological, not error

✗ During rapid rise or fall (↑↑ or ↓↓ trend arrows)

✗ More than twice in any 24-hour period

✗ When the meter itself may be inaccurate – poor technique, old strips

✗ If readings are wildly inconsistent – consider sensor replacement instead

>20% · Discrepancy threshold: if two consecutive readings exceed this, calibrate

2× · Maximum calibrations in any 24 hours; following this protocol, most people calibrate only a few times a month

Stable · Required glucose state: flat trend arrow for a minimum of 20 minutes
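The measurable parts of this protocol can be written as a single decision rule. The sketch below is an illustration under stated assumptions, not a GNL or Dexcom tool: discrepancy is measured relative to the meter value, and the names `Context` and `should_calibrate` are hypothetical. Meter quality, fingertip technique and symptom mismatch still need human judgement, so they are left out.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Thresholds taken from the protocol on this page.
DISCREPANCY_THRESHOLD = 0.20        # more than 20% from the fingerstick
MIN_FLAT_TREND_MIN = 20             # flat trend arrow for 20+ minutes
MIN_AFTER_MEAL_OR_BOLUS_MIN = 120   # at least 2 hours after meal or bolus
MIN_AFTER_EXERCISE_MIN = 30         # not during or within 30 min of exercise
MAX_CALIBRATIONS_PER_24H = 2        # never more than twice in 24 hours


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of conditions at the time of the check."""
    minutes_trend_flat: float
    minutes_since_meal_or_bolus: float
    minutes_since_exercise: float
    calibrations_last_24h: int


def discrepancy(cgm_mmol: float, meter_mmol: float) -> float:
    """Relative CGM error, using the fingerstick as the reference."""
    return abs(cgm_mmol - meter_mmol) / meter_mmol


def should_calibrate(pairs: Sequence[Tuple[float, float]], ctx: Context) -> bool:
    """pairs: (cgm_mmol, meter_mmol) checks, oldest first."""
    if len(pairs) < 2:
        return False  # a single discrepancy may just be noise
    sustained_drift = all(
        discrepancy(cgm, meter) > DISCREPANCY_THRESHOLD
        for cgm, meter in pairs[-2:]
    )
    safe_to_calibrate = (
        ctx.minutes_trend_flat >= MIN_FLAT_TREND_MIN
        and ctx.minutes_since_meal_or_bolus >= MIN_AFTER_MEAL_OR_BOLUS_MIN
        and ctx.minutes_since_exercise >= MIN_AFTER_EXERCISE_MIN
        and ctx.calibrations_last_24h < MAX_CALIBRATIONS_PER_24H
    )
    return sustained_drift and safe_to_calibrate


if __name__ == "__main__":
    ctx = Context(minutes_trend_flat=25, minutes_since_meal_or_bolus=180,
                  minutes_since_exercise=240, calibrations_last_24h=0)
    checks = [(4.0, 5.5), (4.1, 5.6)]
    for cgm, meter in checks:
        print(f"CGM {cgm} vs meter {meter}: {discrepancy(cgm, meter):.0%}")
    print("calibrate:", should_calibrate(checks, ctx))
```

In the example, CGM 4.0 against a meter 5.5 and CGM 4.1 against 5.6 are both about 27% discrepancies, so with a flat arrow for 25 minutes, three hours since the last bolus, no recent exercise and no calibration in the past day, the rule recommends calibrating. Drop either check below 20%, or add a second calibration in the last 24 hours, and it does not.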

Summary

The bottom line

Automated insulin delivery, MDI in type 1 diabetes, and increasingly MDI in type 2 diabetes all share the same dependency: the glucose reading the person or the algorithm receives. The effectiveness of every one of those approaches is bounded not by algorithm sophistication or pump hardware, but by the accuracy of the information on which the decision is made. An algorithm will always make the best possible decision for the number it sees. A person on MDI will make the best decision they can for the number they see. Our job as clinicians and educators is to ensure that number is as reliable as possible.

Calibration is not about distrust of modern sensor technology. It is about understanding where any technology’s limits lie, and keeping a simple user-side pathway available for the moments when those limits are tested. Every factory-calibrated sensor has a tail. Only some offer the user a way back when it happens.

“Calibration does not improve average accuracy. It gives the person in the loop a route back from the tail.”

Learn with Grace

One lesson, taught at three depths

Calibration is the airbag. Prediction is what the airbag protects. Every CGM reading, and every automated decision built on it, is really a short forward estimate: where is this glucose heading in the next ten, twenty, thirty minutes. Calibration does not make that estimate; it keeps the sensor honest enough for the estimate to be trusted.

In Learn with Grace, this single idea is taught at three depths. Foundations builds the picture, Advanced teaches the mechanism, and Mastery puts the clinician in the loop. The page above is the clinical argument; what follows is how we turn it into something a person, or a clinician, actually learns and retains.

Why we teach this inside Black Swan thinking

You cannot predict the tail. You can prepare for it.

The calibration case is a Black Swan case. For most of a sensor’s life the reading is accurate and calibration changes nothing; the routine is predictable and the algorithm handles it. The risk does not live in the routine. It lives in the rare, high-impact tail: the compression artefact at 3 a.m., the drift on day nine, the reading that quietly walks outside 40/40. You cannot forecast which night it lands. You can only make sure a route back exists when it does.

That is the Taleb lesson on which Learn with Grace builds its prediction teaching: predict the ordinary, but design for the extraordinary you cannot predict. Calibration is the cheap, optional safeguard that survives the tail event. The teaching ladder makes a learner fluent in both halves: the prediction and the protection. Read the CGM Black Swan companion →

The Advanced teaching figure: two sensors can sit either side of the true glucose and both still be safe to dose from. GNL ranks accuracy on plus or minus 20/20 and 40/40 agreement, never on MARD.

Where prediction sits in the ladder

Foundations with Jude · What a reading is, and what it is not. The starting picture: a CGM reads the fluid under the skin, it can lag, and the arrow is a guess about the next few minutes. No maths, just the mental model. Browse the guides →

Advanced with Grace · CGM in depth: accuracy, alignment, prediction. The mechanism: how factory calibration shifts the accuracy lever, why predictive low alerts beat threshold alarms, and why the lag is largest exactly when glucose falls fast. Open the CGM guide →

Mastery with John · CGM Explorer, back to front, and the AID optimiser. The clinician in the loop: when an optional calibration genuinely protects the algorithm’s prediction, and when it just adds noise. Calibration discipline feeds straight into AID performance. Open the AID systems guide →

The key feature: the CGM Explorer

Interactive explorer, an additional technology module

The CGM selector that teaches prediction by doing

The CGM Explorer is the tool a learner reaches for when they want to feel how prediction works rather than be told. It is the technology module that sits alongside the clinical guides, and Grace teaches through it directly.

Trend projection. Enter a reading and a trend arrow; the Explorer projects where glucose is heading at ten, twenty and thirty minutes if the current rate held. That is prediction made visible, and the point that lands is that a rate can change, so a projection is an anticipation, never a promise.
Calibration alignment. It shows how two sensors can read the same physiology differently and both be safe to dose from, so a higher time in range on a new sensor is not automatically better control.
Honest accuracy. It grades every device on the plus or minus 20/20 and 40/40 agreement rate, the clinical-risk frame, and never on MARD.
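The trend-projection idea reduces to linear extrapolation: reading plus rate times horizon. The sketch below assumes the rate is supplied directly in mmol/L per minute, because the Explorer's mapping from trend arrows to rates is not described here; `project` is a hypothetical name, not the Explorer's code.

```python
from typing import Dict, Sequence


def project(reading_mmol: float, rate_mmol_per_min: float,
            horizons_min: Sequence[int] = (10, 20, 30)) -> Dict[int, float]:
    """Linear trend projection: where glucose would be if the rate held.

    An anticipation, not a forecast of physiology: any change in rate
    (food, insulin, exercise) breaks the straight line.
    """
    return {h: reading_mmol + rate_mmol_per_min * h for h in horizons_min}


if __name__ == "__main__":
    for minutes, value in project(6.0, -0.1).items():
        print(f"+{minutes} min: {value:.1f} mmol/L")
```

A reading of 6.0 mmol/L falling at 0.1 mmol/L per minute projects to about 5.0, 4.0 and 3.0 mmol/L at 10, 20 and 30 minutes, but only if the rate holds, which is exactly the point the Explorer is built to make.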

A Learn with Grace module uses the Explorer as its lab: the learner predicts, Grace explains why the prediction held or slipped, and calibration becomes a lever they understand rather than a button they fear. Try the CGM Explorer →

How Grace teaches it

Grace, Advanced voice

Most CGMs refresh about every five minutes, which is enough to draw a continuous feedback loop across the day; the Libre 3 family streams about every minute. More frequent points make a smoother line and quicker alerts; they do not change the accuracy of any single reading. B

So the question is never “is this one number perfect”. It is “is this number close enough to dose from, and is the trend behind it real”. Ranked on the plus or minus 20/20 agreement rate (the GNL bar is 87% or more) with under 1.5% outside 40/40, the best sensors clear it comfortably. GNL never quotes or ranks on MARD; it is a marketing metric, not a clinical-risk metric. A
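Ranking on agreement rather than MARD can be sketched as follows. The zone definition is an assumption stated in the code: the common 20/20 convention of within 20 mg/dL when the reference is below 100 mg/dL (about 5.6 mmol/L) and within 20% at or above it, with 40/40 defined the same way. The 87% and 1.5% cut-offs are the GNL bar quoted in the paragraph; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

MGDL_PER_MMOL = 18.0               # approximate mg/dL per mmol/L
CUTOFF_MMOL = 100 / MGDL_PER_MMOL  # 100 mg/dL, about 5.6 mmol/L


def within_zone(cgm_mmol: float, ref_mmol: float, band: int) -> bool:
    """band=20 for 20/20, band=40 for 40/40.

    Assumed convention: within +/- band mg/dL when the reference is below
    100 mg/dL, otherwise within +/- band percent of the reference.
    """
    diff = abs(cgm_mmol - ref_mmol)
    if ref_mmol < CUTOFF_MMOL:
        return diff <= band / MGDL_PER_MMOL
    return diff <= ref_mmol * band / 100


def agreement_rate(pairs: List[Tuple[float, float]], band: int) -> float:
    """Fraction of (cgm, reference) pairs inside the zone."""
    return sum(within_zone(c, r, band) for c, r in pairs) / len(pairs)


def meets_gnl_bar(pairs: Iterable[Tuple[float, float]]) -> bool:
    """GNL bar: >= 87% within 20/20 and < 1.5% outside 40/40."""
    pairs = list(pairs)
    return (agreement_rate(pairs, 20) >= 0.87
            and 1 - agreement_rate(pairs, 40) < 0.015)


if __name__ == "__main__":
    good = [(10.0, 10.0)] * 90 + [(13.0, 10.0)] * 10
    bad = [(10.0, 10.0)] * 88 + [(13.0, 10.0)] * 10 + [(15.0, 10.0)] * 2
    print("good sensor meets bar:", meets_gnl_bar(good))
    print("bad sensor meets bar:", meets_gnl_bar(bad))
```

Both example sensors clear 87% within 20/20; the second fails only because 2% of its readings land outside 40/40, the tail that a MARD figure would average away.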

Two questions, the same idea at two depths

Advanced · with Grace

Predictive low alerts (such as urgent low soon) tend to be more useful than a simple threshold alarm at hypo-treatment time. Why?

Correct: B. A predictive alert reads the rate of change, not just the level, so it fires while there is still time to act. The interstitial lag is widest during a fast fall, which is precisely when an early warning earns its keep. It does not remove the confirmatory check (A), it is not about volume (C), and it still senses interstitial fluid (D). B Grace, CGM in depth.

Mastery · with John · the same idea, one tier up

A parent on an AID system asks whether they should calibrate overnight to make the predictive low alerts fire earlier. Under the GNL approach, when does an optional calibration genuinely protect the prediction, rather than just add noise to it?

Correct: B. The prediction is built on the trend, the rate of change. Calibrating under stable, flat-arrow conditions corrects sustained drift without disturbing that rate signal. Calibrating mid-fall (C) feeds the algorithm a step change it misreads as real movement, degrading the very prediction the parent wants to protect; chasing every late-feeling alert (A) just adds noise; and a factory-calibrated sensor still leaves the user the optional route back when it drifts (D). The two-consecutive-readings-over-20% rule is the GNL calibration threshold set out above. C Mastery item, authored from the Advanced question, consistent with the GNL calibration protocol.
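The flat-arrow condition in that answer has a simple arithmetic basis. The sketch assumes a pure 10-minute interstitial delay and a one-point calibration that shifts the CGM to match the meter; both are simplifications, and `learned_offset` is a hypothetical name. Under those assumptions, a calibration entered while glucose is flat recovers exactly the sensor's drift, whereas one entered mid-fall also absorbs the lag and over-corrects.

```python
LAG_MIN = 10.0  # assumed pure delay, within the 5-15 minute range on this page


def learned_offset(rate_mmol_per_min: float, drift_mmol: float,
                   blood_mmol: float = 5.0, lag_min: float = LAG_MIN) -> float:
    """Offset a one-point calibration would apply (meter minus CGM).

    Pure-delay model: the CGM shows blood glucose from lag_min ago plus a
    constant sensor drift. The ideal offset is simply -drift_mmol.
    """
    blood_lag_ago = blood_mmol - rate_mmol_per_min * lag_min
    cgm_mmol = blood_lag_ago + drift_mmol
    return blood_mmol - cgm_mmol


if __name__ == "__main__":
    drift = 0.5  # genuine sensor over-read, mmol/L
    for label, rate in (("flat arrow", 0.0), ("falling 0.1/min", -0.1)):
        offset = learned_offset(rate, drift)
        print(f"{label}: offset {offset:+.1f}, error vs ideal {offset + drift:+.1f}")
```

With 0.5 mmol/L of genuine drift, a flat-arrow calibration applies -0.5 mmol/L; the same calibration during a fall of 0.1 mmol/L per minute applies -1.5 mmol/L, so once glucose levels off the sensor reads about 1.0 mmol/L low. The displayed value also drops by the full 1.5 mmol/L at the moment of calibration, a step that a simple rate estimate would read as a sudden fall.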
