LearnGrace DIABETES EDUCATION
The Glucose Never Lies, one guide, three voices

How to Read a Diabetes Research Study:
What the Headline Leaves Out

Why a frightening headline is rarely what it seems, told plainly with Jude, then the appraisal toolkit with Grace, then the four lenses applied to a real study with John. Stop wherever you have enough.

How we teach: three rules, borrowed from Taleb

1. Skin in the game

You earn each level by showing you understand it, not by scrolling past it. We only teach what we would use on ourselves and the people we love.

2. Don’t be fooled by randomness

Understanding beats memory and luck, so the checks reshuffle every time you retry. A pass means you got it, not that you guessed it. And we teach you to tell a trend (signal) from one reading (noise).

3. Curiosity, not lectures

We give you the scaffolding and get out of your way. Roam where your curiosity leads, go as deep as you want, and ask Grace anything. We will not teach a bird how to fly.

Ask Grace

Reading a study and not sure what the numbers really say? Ask Grace, then take it to your care team.

How this works, you build it in order

One page, three depths

This guide compounds: each layer rests on the one beneath it. Read Jude’s plain version, then pass a short understanding check to open Grace, then another to open John. You can roam freely within a layer; you cannot skip ahead a layer, because the next one would not make sense and you would be standing on a gap.

Foundation (Jude), Advanced (Grace), Mastery (John)
LearnGrace FOUNDATION
With Jude, the essentials

Why a scary headline is rarely the whole story

A headline says a new pill “halves your risk”. It sounds enormous. Then you read on and find the risk went from two people in a thousand down to one in a thousand. That is still a halving, and it is true, but for any one person it is a very small change. The big-sounding number and the small real change are the same fact, said two ways. The trick is knowing which way you are being shown.

There is a kind question to ask of any health story, and it keeps everyone honest: out of how many people, and over how long? “Halves your risk” tells you nothing on its own. “One fewer person in a thousand, over ten years” tells you what it would mean for someone like you. The first is built for sharing; the second is built for deciding.

Two more gentle habits. First, one study is not the final word; a single result can be a fluke, and the strongest findings are the ones repeated by different teams. Second, the people in a study are rarely exactly you; a trial often picks the most motivated volunteers, so the everyday result is usually a little smaller. None of this means research cannot be trusted. It means you read it the way you read the weather: useful, honest about its limits, and never a promise about your own day.

[Figure: "50% lower" (the sharing number) = 2 in 1,000 down to 1 in 1,000; one fewer, over ten years (the deciding number).]

The same change, said two ways. The shape of risk, not your personal probability. The big-sounding version is built to share; the small honest version is built to decide. Always ask: out of how many, over how long?
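
If you like to check the arithmetic yourself, here is a minimal sketch in Python using the two-in-a-thousand example above. The function name `risk_change` is our own invention for illustration, not part of any library.

```python
def risk_change(baseline_per_1000, treated_per_1000):
    """Return (relative reduction, absolute reduction per 1,000 people)."""
    absolute = baseline_per_1000 - treated_per_1000
    relative = absolute / baseline_per_1000
    return relative, absolute

# The headline example: risk falls from 2 in 1,000 to 1 in 1,000.
relative, absolute = risk_change(2, 1)
print(f"Sharing number: {relative:.0%} lower")
print(f"Deciding number: {absolute:g} fewer per 1,000")
```

Both lines come from the same two inputs; neither is wrong, and only the second tells you how many people are behind the change.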

Through the Pemberton lens

Does this match the life of the person living it? A number that frightens you off a sensible choice, or rushes you into a risky one, is being used on you, not for you. Slow down, ask the kind question, and take it to your team.

The Pemberton lens, lived recognisability, one of the four GNL appraisal lenses.

This is the taster. Complete the full Foundation module and its questions in the Grace app.
Open Advanced, a quick understanding check
Answer all three correctly to open Grace. Get one wrong and you get a fresh three, no penalty; this is how you know you have it, not just read it.
LearnGrace ADVANCED
With Grace, the evidence

The appraisal toolkit

Relative risk, absolute risk, and number needed to treat

Take a real teaching case. In the ERSPC prostate-screening trial, screening cut prostate-cancer deaths by a relative 20%; the same data, expressed in absolute terms, was about one death prevented per 1,000 men screened over roughly nine years.[1] Both numbers are correct. Only the second helps a man decide. The bridge between them is the number needed to treat (NNT): how many people take the treatment for one to benefit. The smaller the absolute change, the larger the NNT, and the more carefully the benefit must be weighed against the cost and the harms.

How it is framed | What it says | What it leaves out
Relative (“20% fewer deaths”) | The proportion the risk fell by | The baseline; how big the risk was to begin with
Absolute (“~1 fewer death per 1,000”) | The real change for a person like you | Nothing essential; this is the deciding number
NNT (“~1,000 screened for 1 death prevented”) | How many are treated for one to gain | Reads alongside number needed to harm

The Goldacre rule: a percentage risk claim with no baseline is incomplete and possibly misleading. “Reduces risk by 50%” is not yet a fact; it becomes one only when you know the risk it started from.[2]
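
To see why the baseline matters, here is a small sketch that holds the relative reduction at 50% and varies only the starting risk. The three baselines are hypothetical, chosen to show the pattern rather than taken from any trial, and `nnt` is an illustrative helper of our own.

```python
def nnt(baseline_risk, relative_reduction):
    """Number needed to treat = 1 / absolute risk reduction."""
    absolute_reduction = baseline_risk * relative_reduction
    return 1 / absolute_reduction

# Hypothetical baselines (assumptions for illustration only).
for baseline in (0.20, 0.02, 0.002):
    print(f"baseline {baseline:.1%}: NNT about {nnt(baseline, 0.50):.0f}")
```

The same "50% lower" headline corresponds to treating roughly ten people, a hundred, or a thousand for one to benefit, depending entirely on where the risk started.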

The study hierarchy, and why it is only a rule of thumb

Different questions need different designs. For “does this treatment work”, a randomised controlled trial sits high, because randomising who gets what balances out the hidden differences between people (confounding). For “what is it like to live with this device”, a qualitative study is the right tool, and an RCT answers nothing. A systematic review that pools many trials sits at the top, but only if its question and methods were fixed in advance.

Question | Best-suited design | Main weakness to watch
Does the treatment work? | Randomised controlled trial | May enrol the most motivated volunteers
What causes harm over time? | Prospective cohort study | Confounding; the groups differ in other ways
What is the whole picture? | Systematic review of trials | Only as good as the trials it pools
What is it like to live with? | Qualitative study | Not designed to give an effect size

Greenhalgh’s warning matters as much as the ladder itself: do not apply the hierarchy mechanically. A sloppy meta-analysis does not outrank a large, well-designed cohort study; the design is a starting prior, not a verdict.[3]

What a headline hides

Three quiet failures account for most misleading health stories. Outcome switching: the planned main result was disappointing, so a secondary one is promoted to the headline. Publication bias: the trials that found nothing were never published, so the survivors paint too rosy a picture. Surrogate endpoints: the study measured a stand-in (a blood marker) rather than the thing that matters to a person (feeling well, living longer). When you meet a striking claim, ask what was actually measured, and whether the trials you can read are all the trials that were run.

[Figure: Seven trials were run; one survived into print. 7 trials run: 6 found no benefit (never published), 1 found a benefit (published). What the reader sees: benefit looks certain.]

Publication bias is a survivor effect: the trials that disagreed are missing from the shelf, so the published picture overstates the benefit. Reboxetine is the textbook case (six of seven trials unpublished).[2] Counts illustrate the pattern, not one trial’s exact tally.
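
The survivor effect can be sketched with a toy simulation. Everything here is an assumption made for illustration: a treatment whose true effect is exactly zero, each trial's result drawn as random noise around that zero, and a publication filter that lets through only results that look significantly positive. It is not a model of reboxetine or of any real set of trials.

```python
import random
import statistics

def simulate_shelf(n_trials=2000, true_effect=0.0, standard_error=1.0, seed=0):
    """Run many noisy trials, then keep only the ones that would reach print."""
    rng = random.Random(seed)
    all_results = [rng.gauss(true_effect, standard_error) for _ in range(n_trials)]
    # Assumed filter: only "significantly positive" estimates are published.
    published = [r for r in all_results if r > 1.96 * standard_error]
    return all_results, published

all_results, published = simulate_shelf()
print(f"trials run: {len(all_results)}, published: {len(published)}")
print(f"mean effect, every trial: {statistics.mean(all_results):+.2f}")
print(f"mean effect, published only: {statistics.mean(published):+.2f}")
```

With every trial on the shelf the average sits near zero; with only the survivors it looks like a clear benefit, even though nothing in the simulation works.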

Through the Goldacre lens

A big-sounding percentage is not the same as a big difference. Whenever you are handed a risk number, ask the two questions that keep everyone honest: out of how many, and over how long? And ask whether the studies you can see are all the studies that were done.

The Goldacre lens, evidence-grade discipline, one of the four GNL appraisal lenses.

This is the taster. Complete the full Advanced module and its questions in the Grace app.
Open Mastery, a harder check
Three correct to open John. These ask you to apply the evidence, not just recall it.
LearnGrace MASTERY
With John, the full depth

The four GNL lenses, worked on a real study

The method: four lenses, applied in turn

Appraisal at GNL runs through four lenses. Each asks a different question; a claim that survives all four is one we will build on, and a claim that fails one earns a caveat or a refusal.

Lens | The question it asks | What it catches
Taleb | Is it robust to the rare, ruinous case, and does the author carry the downside? | Tail risk hidden by an average; advice from people with no skin in the game
Goldacre | Out of how many, over how long, and are all the trials on the shelf? | Relative dressed as absolute; publication bias; surrogate endpoints
Pemberton | Does it match the life of the person living it? | A trial cohort that is not the real-world patient; numbers that shame
Hayes | Is the method sound, and are the assumptions named? | A model sold as the territory; a single fragile data source

A worked appraisal: a Grade A method that returns low-certainty evidence

Consider a systematic review of artificial-intelligence tutoring in undergraduate health-professions education (Lai 2026), chosen because its method is textbook and its honest conclusion is still cautious.[4] It was registered in advance, reported to PRISMA standard, used a formal risk-of-bias tool, and pooled 66 randomised trials across 4,911 students. Every upstream box is ticked. Yet the certainty of the evidence (graded with GRADE) came back Low to Very Low on nearly every outcome. Run the four lenses and you can see why, and the discipline transfers cleanly to any diabetes-technology study.

Lens | What it surfaces in this study
Taleb | The pooled average hides the tails. Most trials ran under three weeks; none reached real-world practice change. An average across short, shallow trials says little about the durable case that matters.
Goldacre | One feasible sensitivity analysis moved a “significant improvement” to “no significant difference” once high-risk trials were removed. The headline effect was fragile; out-of-how-many and over-how-long both undercut it.
Pemberton | The evidence stopped at learner reaction and knowledge; it never reached whether anyone practised differently or any patient was better off. The outcome that matters to a life was not measured.
Hayes | The method was sound; the upstream trials were the wrong shape (median 63 participants, allocation concealment adequate in under a quarter). A clean review can only return what its inputs support, and it said so plainly.

Evidence grade A. The systematic review is a Grade A design; its GRADE certainty output is Low to Very Low, which is the teaching point, not a flaw. A well-conducted review returns only what the upstream evidence supports.[4]
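
The Goldacre row above turns on a sensitivity analysis, and the mechanics can be sketched in a few lines. The five trials below are invented for illustration and are not data from the review; the pooling is a simple fixed-effect, inverse-variance average, which is a deliberate simplification of what a real meta-analysis would do.

```python
import math

# Hypothetical trials: (effect estimate, standard error, high risk of bias?)
TRIALS = [
    (0.80, 0.25, True),
    (0.70, 0.30, True),
    (0.10, 0.20, False),
    (0.15, 0.25, False),
    (0.05, 0.30, False),
]

def pool(trials):
    """Fixed-effect inverse-variance pooled estimate with a 95% interval."""
    weights = [1 / se**2 for _, se, _ in trials]
    estimate = sum(w * eff for w, (eff, _, _) in zip(weights, trials)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return estimate, estimate - 1.96 * pooled_se, estimate + 1.96 * pooled_se

everything = pool(TRIALS)
low_risk_only = pool([t for t in TRIALS if not t[2]])
for label, (est, lo, hi) in (("all trials", everything), ("low-risk only", low_risk_only)):
    verdict = "excludes zero" if lo > 0 or hi < 0 else "includes zero"
    print(f"{label}: {est:.2f} (95% interval {lo:.2f} to {hi:.2f}), {verdict}")
```

In this made-up set, the pooled interval excludes zero when every trial is counted and includes zero once the high-risk trials are dropped. That is the shape of fragility to look for: a headline that depends on the weakest inputs.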

The lesson that transfers

A strong method does not guarantee strong evidence. When a “promising” review lands, do not stop at the design label; ask what shape the underlying trials were, how long they ran, and whether they measured anything that matters to a person. The same questions catch the over-sold device review and the over-sold supplement study alike.

Where to land: living honestly in the gap

Two more real cases sharpen the habit. Ergodicity (the population-vs-individual gap): DiRECT reported about a 46% one-year remission rate for type 2 diabetes, which is a fact about a hundred people, not a 46% chance for any one of them.[5] The trial owns the average; the person walks a single path. The trial-vs-real-world gap: hybrid closed-loop systems report 70 to 78% time in range in pivotal trials, and real-world cohorts of the same systems often sit lower, because real users miss carbs, sleep badly, and live their lives. The device is doing its job; the cohort was not the world. And the bottom of any curve is rarely free: the trial that proved tight glucose control works also recorded about three times the rate of severe hypoglycaemia at the lowest targets.[6] The honest reader names what the evidence forces them to say, and what it cannot, and sets the decision with the care team.

[Figure: a band running from over-claiming to dismissing, with "what the evidence actually supports" in the middle.]

Calibration, not certainty: a band of what the evidence supports, not a single point. The shape of confidence, not a personal probability. Over-claim on the left, dismiss on the right; the honest reader lands in the middle and sets decisions with the care team.
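
The population-vs-individual gap in the DiRECT example can be made concrete with a short sketch. The subgroup figures are hypothetical, invented to show that very different individual chances can sit behind the same cohort average; they are not findings from DiRECT.

```python
# Hypothetical subgroups: (share of the cohort, chance of remission in that subgroup)
SCENARIOS = {
    "everyone alike": [(1.0, 0.46)],
    "two kinds of people": [(0.5, 0.82), (0.5, 0.10)],
}

def cohort_rate(subgroups):
    """Weighted average remission rate across the subgroups."""
    return sum(share * chance for share, chance in subgroups)

for name, subgroups in SCENARIOS.items():
    chances = ", ".join(f"{chance:.0%}" for _, chance in subgroups)
    print(f"{name}: cohort rate {cohort_rate(subgroups):.0%}; individual chances {chances}")
```

Both scenarios report the same 46% for the group. The trial result alone cannot tell you which scenario you are in, which is why the number describes the cohort and not your own odds.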

Through the Taleb lens

It is the rare, ruinous case that decides a strategy, not the average Tuesday. An average that wins ninety-nine times and loses everything once is a losing strategy over time. Read every study for what it says about the tail, and trust the author who carries the downside of being wrong.

The Taleb lens, robustness to outliers and skin in the game, one of the four GNL appraisal lenses.
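
The ninety-nine-wins claim is simple arithmetic, sketched here under stated assumptions: each round wins a small gain 99% of the time and loses everything 1% of the time, and rounds are independent. The 2% gain is an arbitrary illustrative figure.

```python
def expected_multiplier(p_win, gain):
    """Average one-round outcome when a loss wipes everything out."""
    return p_win * (1 + gain) + (1 - p_win) * 0.0

def survival_probability(p_win, rounds):
    """Chance of never meeting the ruinous outcome in `rounds` independent tries."""
    return p_win ** rounds

print(f"average multiplier per round: {expected_multiplier(0.99, 0.02):.4f}")
for rounds in (1, 100, 500):
    print(f"still standing after {rounds} rounds: {survival_probability(0.99, rounds):.1%}")
```

The per-round average is above 1, so the strategy looks like a winner on paper, yet the chance of surviving a hundred rounds is a little over a third and the chance of surviving five hundred is under one in a hundred. The average hides the tail.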

And through the Hayes lens

A model is only as honest as its assumptions. A review is only as strong as the trials it pools; a risk figure only as solid as the cohort behind it. Name what would strengthen the claim, and never sell the model as the territory.

The Hayes lens, technical and methodological rigour, one of the four GNL appraisal lenses.

The Mastery check
Three to finish the guide, the hardest tier; these ask you to judge the evidence, not just recall it.
This is the taster. Complete the full Mastery module and its questions in the Grace app.
In one look

The whole guide, summarised

[Figure: 50% lower = 1 fewer per 1,000, over ten years.]
Relative vs absolute. Same fact, two framings. Ask: out of how many, over how long?
[Figure: 7 run, 1 published. Legend: published; unpublished, found nothing. The shelf overstates the benefit.]
What’s missing. One study is not the final word; the unpublished ones change the picture.
[Figure: what the evidence supports, sitting between over-claim and dismiss.]
Four lenses, honest landing. Taleb, Goldacre, Pemberton, Hayes; aim for calibrated, set with your team.

Glucose never lies; neither should the study that claims to read it. Ask the kind question, weigh the tails, and live honestly in the gap.

One last thing

This page is the taster. The full journey, three modules and their questions, with your progress saved, lives in Learn with Grace. Glucose never lies; come and learn to read the studies that claim to read it.

A necessary word. General education in critical appraisal, built on real published studies used as teaching examples. It is not personalised medical advice, and not a prediction about you. The risk figures are illustrations of how risk is shaped and framed, not your personal probability. Type 1 diabetes varies enormously between people. Any change to your management belongs in a conversation with your own diabetes care team.

References

Evidence grades A (strongest) to D (editorial or working analysis). Books cited as teaching scaffolds, not as clinical evidence.

  1. Schröder FH, et al; ERSPC Investigators. Screening and prostate-cancer mortality in a randomized European study. N Engl J Med. 2009;360(13):1320-1328 (and 2014 follow-up, Lancet 384:2027-2035). Canonical relative-vs-absolute teaching case (~20% relative reduction, ~1 death prevented per 1,000 over ~9 years). Grade A.
  2. Goldacre B. Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients. Fourth Estate, 2012 (reboxetine: six of seven trials unpublished); and I Think You’ll Find It’s a Bit More Complicated Than That. Fourth Estate, 2014 (relative-vs-absolute, the baseline rule). Grade D as cited evidence; Grade A as a teaching scaffold.
  3. Greenhalgh T, Dijkstra P. How to Read a Paper: The Basics of Evidence-Based Medicine and Healthcare, 7th edition. Wiley-Blackwell, 2025 (study-design hierarchy as a rule of thumb, Ch 3). Grade A.
  4. Lai NM, Lim YS, Win MT, Bhargava P, Thomas P, Ong QC. The Effectiveness of Artificial Intelligence in Undergraduate Health Professions Education: Systematic Review and Meta-Analysis of RCTs. JMIR Medical Education. 2026;12:e88933, DOI 10.2196/88933. Grade A method, Low-to-Very-Low GRADE output (the worked teaching case). Grade A.
  5. Lean MEJ, et al. Primary care-led weight management for remission of type 2 diabetes (DiRECT). Lancet. 2018;391:541-551 (~46% one-year remission); 5-year follow-up Lancet Diabetes Endocrinol. 2024;12:233-246. Grade A.
  6. DCCT Research Group. The effect of intensive treatment of diabetes on the development and progression of long-term complications. N Engl J Med. 1993;329(14):977-986 (about three-fold higher severe hypoglycaemia at the lowest HbA1c). Grade A.