How to Read a Research Study, Part 6 of 7

The auditor’s checklist

Three worked examples from the GNL evidence base, the replication problem in plain language, and the ten-point TG Audit checklist that gates every clinical claim Grace ships.

Worked example 1, CGM in T2D marketing vs real-world

The marketing materials for several CGM systems in T2D quote pivotal-trial mean reductions of 0.4 to 0.6 percentage points in HbA1c at six months. The real-world cohorts published in 2023 and 2024 show smaller reductions (typically 0.2 to 0.3 percentage points) and very high person-to-person variability, with a meaningful subgroup (roughly one in five) who derive no measurable HbA1c benefit.

The trial-vs-real-world gap is not a failure of the device; it is a failure of the marketing framing. The Grade C qualification applies: the technology helps some, the average is positive, and the individual outcome is variable. The conversation should ask what the person would use the data for, not whether the data will hit a target on average.

Worked example 2, closed-loop AID pivotal vs real-world TIR

Pivotal trials for hybrid closed-loop systems in T1D report 70 to 78% time in range (TIR) across the cohort at six months. Real-world cohorts of the same systems at twelve to twenty-four months consistently report 55 to 65% TIR, with a similar subgroup of roughly one in five who derive no meaningful improvement over standard pump-CGM.

The gap is not the device; it is the user environment (sleep, antibiotics, infusion-set reuse, carb estimation slippage, exercise patterns), all of which the trial cohort controlled for and the real user cannot. The clinical conversation should set the trial number as the ceiling and the real-world median as the realistic working assumption.

Worked example 3, IOB modelling vs lived absorption

Insulin-on-board (IOB) calculations in AID systems and in standalone bolus calculators model insulin absorption with simplified pharmacokinetic curves. The lived absorption profile in any particular person on any particular day depends on injection site, ambient temperature, recent exercise, infusion set age, body composition shifts, and lipohypertrophy.

The IOB number on the screen is a useful approximation; the lived absorption can deviate by 30 to 50% from the model. The clinical conversation should treat the IOB display as informative but not authoritative; the CGM trend is the corrective signal.
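To make the size of that gap concrete, here is a minimal sketch in Python. It rests on assumptions that are not in the source: the linear-decay curve, the 240-minute duration, and the function names `modelled_iob` and `plausible_range` are all hypothetical, and no real device is claimed to work this way. It also takes one reading of "deviate by 30 to 50%", namely a symmetric band around the displayed number. It is an illustration of the arithmetic, not a dosing tool.

```python
def modelled_iob(dose_units, minutes_since_bolus, duration_minutes=240.0):
    """Insulin on board under a hypothetical linear-decay curve.

    Real devices use their own pharmacokinetic curves; this straight
    line is only a stand-in chosen to keep the arithmetic visible.
    """
    if minutes_since_bolus <= 0:
        return dose_units
    remaining_fraction = max(0.0, 1.0 - minutes_since_bolus / duration_minutes)
    return dose_units * remaining_fraction


def plausible_range(iob, deviation):
    """Symmetric band around the displayed IOB (assumed interpretation).

    deviation is a fraction: 0.3 for 30%, 0.5 for 50%.
    """
    return iob * (1.0 - deviation), iob * (1.0 + deviation)


if __name__ == "__main__":
    shown = modelled_iob(dose_units=4.0, minutes_since_bolus=120.0)
    for deviation in (0.3, 0.5):
        low, high = plausible_range(shown, deviation)
        print(f"displayed {shown:.1f} U, "
              f"band at {deviation:.0%}: {low:.1f} to {high:.1f} U")
```

Under these assumptions, a displayed value halfway through the modelled duration sits inside a band wide enough that the CGM trend, not the IOB figure, has to settle the question.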

The replication problem

A meaningful fraction of published findings do not survive when independent teams attempt to replicate them. In the Open Science Collaboration’s 2015 replication of 100 psychology studies, roughly 36% of replications produced a statistically significant result, and the replicated effect sizes averaged about half the magnitude of the originals. The implication for diabetes evidence is structural: any single-trial finding has a meaningful probability of not surviving replication, which is why the GNL evidence-grade taxonomy weights single trials at Grade B, not Grade A.

“Let’s say you have back pain. It comes and goes. You have good days and bad days, good weeks and bad weeks. When it’s at its very worst, it’s going to get better, because that’s the way things are with your back pain. You might take a homeopathic remedy. You might sacrifice a goat and dangle its entrails around your neck. You might bully your GP into giving you antibiotics. Then, when you get better, as you surely will from a cold, you will naturally assume that whatever you did when your symptoms were at their worst must be the reason for your recovery.”
Source: Goldacre, Bad Science (2008), Chapter 4.

Regression to the mean is the daily hazard of T1D peer communities and social media: someone has a run of bad glucose days, starts a probiotic, their glucose settles, and they announce the probiotic fixed it. The same mechanism explains why most “promising new finding” papers regress towards the null on independent replication.
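Regression to the mean can be seen in a short simulation. The sketch below is illustrative only: the glucose-like values, the mean of 8.0, the spread of 1.5, and the assumption that each day is independent of the one before are all invented for the example (real glucose days are correlated, which weakens the effect without removing it). Nothing is done to the simulated person at any point; the code simply picks out the worst 10% of days and looks at the day after each one.

```python
import random
import statistics


def simulate_regression_to_mean(n_days=10_000, mean=8.0, sd=1.5, seed=0):
    """Return (average on the worst days, average on the following days).

    No intervention is modelled: every day is drawn from the same
    distribution, so any 'improvement' is regression to the mean.
    """
    rng = random.Random(seed)
    days = [rng.gauss(mean, sd) for _ in range(n_days)]

    # Last of the nine decile cut points is the 90th percentile.
    threshold = statistics.quantiles(days, n=10)[-1]

    # Pair each very bad day with the day that follows it.
    pairs = [(days[i], days[i + 1])
             for i in range(n_days - 1)
             if days[i] >= threshold]

    worst_avg = statistics.mean(bad for bad, _ in pairs)
    next_avg = statistics.mean(nxt for _, nxt in pairs)
    return worst_avg, next_avg


if __name__ == "__main__":
    worst_avg, next_avg = simulate_regression_to_mean()
    print(f"average on worst days:   {worst_avg:.2f}")
    print(f"average on the next day: {next_avg:.2f}")
```

Whatever was started on the worst day, whether a probiotic or a goat, would appear to have worked, because the following day sits close to the long-run average on its own.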

The TG Audit checklist, ten questions

Trust does not run with credentials; trust runs with skin in the game. The TG Audit gate asks one question of every author cited in any clinical content shipped: who pays the consequence if the author is wrong? That question is applied as a ten-point checklist that any reader can run on any paper they encounter.

Goldacre (clinical-research-grade) rigour

Skin-in-the-game disclosure named alongside the claim.
Healthy-user-bias check on any pivotal-trial figure.
Surrogate-endpoint discipline (HbA1c paired with the outcome that matters to the person living with diabetes).
Publication-bias check (single trial flagged; meta-analysis industry-funding ratio flagged).
Relative-vs-absolute risk paired reporting.

Taleb (epistemological) rigour

Ergodicity / population-vs-individual gap named.
Lindy check on methodology (newer is not necessarily better).
Black-swan recognition (rare ruinous event named explicitly).
Via Negativa subtraction (one load-bearing reason, not three).
Anti-IYI structure (data first, model second).

A claim that passes ten of ten ships at full confidence. A claim that fails one or two earns a qualifier in the text. A claim that fails three or more earns a refusal and a return to the author for redrafting.
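The scoring rule above can be written down as a small decision function. This is a sketch of the rule as stated in the text, not the tooling Grace actually uses; the check identifiers in `CHECKS` and the function name `audit_verdict` are hypothetical labels for the ten questions listed above.

```python
# Hypothetical short names for the ten checklist questions, in order.
CHECKS = [
    "skin_in_the_game_disclosure",
    "healthy_user_bias",
    "surrogate_endpoint_discipline",
    "publication_bias",
    "relative_vs_absolute_risk",
    "ergodicity_gap",
    "lindy_check",
    "black_swan_recognition",
    "via_negativa",
    "anti_iyi_structure",
]


def audit_verdict(results):
    """Map ten pass/fail answers to the outcome described in the text.

    results: dict of check name -> True (passed) or False (failed).
    """
    missing = [name for name in CHECKS if name not in results]
    if missing:
        raise ValueError(f"unanswered checks: {missing}")

    failures = sum(1 for name in CHECKS if not results[name])
    if failures == 0:
        return "ship at full confidence"
    if failures <= 2:
        return "ship with a qualifier in the text"
    return "refuse and return to the author for redrafting"


if __name__ == "__main__":
    answers = {name: True for name in CHECKS}
    answers["publication_bias"] = False  # e.g. a single unreplicated trial
    print(audit_verdict(answers))
```

Raising an error on an unanswered check is a design choice made for the sketch: a question that was skipped should not be counted as a pass.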
