How to Read a Research Study, Part 2 of 7

The A-to-D evidence grades

How Grace classifies every clinical claim, why each tier earns the qualifier it carries, and why Grade C is the tier most often confused with Grade A.

The taxonomy in one picture

GNL uses an A-to-D taxonomy across all clinical content. Grade flows from study design plus risk of bias plus replication. The pyramid below is the at-a-glance version; the four sections that follow set out what sits in each tier and the qualifier Grace surfaces alongside any claim drawn from that tier.

Figure: The A-to-D Evidence Pyramid. How Grace classifies every clinical claim before it ships. Four tiers, from the narrow apex (highest evidence) to the broad base (lowest evidence), each with the qualifier Grace surfaces alongside it:

  • Grade A: systematic reviews, replicated pivotal RCTs, international consensus guidelines. Qualifier: “used without caveat”.
  • Grade B: single well-conducted RCTs and large prospective cohorts. Qualifier: “single-trial qualifier, replication pending”.
  • Grade C: retrospective cohorts, mechanistic studies in humans, unreplicated industry pilot trials, real-world evidence without comparator. Qualifier: “context only, never action”.
  • Grade D: expert opinion, case series, animal models, educational synthesis. Qualifier: “educational only, defer to care team”.

Source: Cochrane Handbook v6.5 (2024) Chapter I, plus the GRADE Working Group Handbook. The qualifier listed for each tier is the language Grace surfaces with every claim at that tier.
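For readers who think in lookup tables, the pyramid can be sketched as two small mappings. This is an illustration only, not GNL's implementation: the design labels and the names `DESIGN_TO_GRADE`, `QUALIFIER` and `qualifier_for` are hypothetical, and the sketch keys on study design alone, leaving out the risk-of-bias and replication judgements that also feed into a grade.

```python
from enum import Enum


class Grade(Enum):
    A = "A"
    B = "B"
    C = "C"
    D = "D"


# Qualifier wording as it appears on the pyramid, keyed by tier.
QUALIFIER = {
    Grade.A: "used without caveat",
    Grade.B: "single-trial qualifier, replication pending",
    Grade.C: "context only, never action",
    Grade.D: "educational only, defer to care team",
}

# Hypothetical design labels (assumed for this sketch) mapped to their tier.
DESIGN_TO_GRADE = {
    "systematic_review": Grade.A,
    "replicated_pivotal_rct": Grade.A,
    "international_consensus_guideline": Grade.A,
    "single_well_conducted_rct": Grade.B,
    "large_prospective_cohort": Grade.B,
    "retrospective_cohort": Grade.C,
    "human_mechanistic_study": Grade.C,
    "unreplicated_industry_trial": Grade.C,
    "real_world_evidence_no_comparator": Grade.C,
    "expert_opinion": Grade.D,
    "case_series": Grade.D,
    "animal_model": Grade.D,
    "educational_synthesis": Grade.D,
}


def qualifier_for(design: str) -> str:
    """Return the tier and pyramid qualifier for a study-design label."""
    grade = DESIGN_TO_GRADE[design]
    return f"Grade {grade.value}: {QUALIFIER[grade]}"
```

Calling `qualifier_for("single_well_conducted_rct")` returns the Grade B qualifier. An unknown label raises `KeyError`, which is deliberate in this sketch: a design the table does not cover should not be graded by default.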

Grade A, the load-bearing tier

Three categories of evidence earn the top tier:

  • Well-conducted systematic reviews with low risk of bias across included studies (Cochrane Handbook methodology): multiple independent trials, with network-meta-analytic confidence intervals that do not cross the line of no effect.
  • Pivotal randomised controlled trials independently replicated, registered before enrolment, reported per CONSORT, with effect sizes that survive sensitivity analysis.
  • International consensus guidelines (ADA, EASD, NICE, ISPAD, IDF) that synthesise the above with named methodology.

Grade A claims are what Grace uses without caveat.

“Systematic reviews seek to collate evidence that fits pre-specified eligibility criteria in order to answer a specific research question. They aim to minimize bias by using explicit, systematic methods documented in advance with a protocol.”

Source: Cochrane Handbook for Systematic Reviews of Interventions v6.5 (2024), Chapter I, Key Points. The “pre-specified” and “documented in advance” clauses are load-bearing; a “systematic review” without a registered protocol is a literature review with delusions of grandeur.

Grade B, the working tier

Two categories of evidence sit one rung down from the top:

  • Single well-conducted RCTs before independent replication.
  • Large prospective cohort studies with pre-specified hypotheses and low loss to follow-up.

Grade B claims earn an explicit single-trial qualifier (“the trial showed X, replication pending”). Grace always surfaces them with that qualifier.

Grade C, the qualified tier

Four categories of evidence sit at the qualified tier:

  • Retrospective cohort and case-control studies with adjustment for known confounders.
  • Mechanistic studies in humans that establish the why but do not quantify the clinical effect.
  • Real-world evidence without comparator, useful for hypothesis generation, not for recommendation.
  • Industry pilot or pivotal trials that have NOT been independently replicated. This is the category most often confused with Grade A.

Grade C claims earn the line “evidence base is moderate; recommendation should be made jointly with the diabetes care team”. Grace uses Grade C claims for context, never for action.

Grade C is the category most often confused with Grade A. Industry pilot trials live here, not at the top of the pyramid.
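The reason the two tiers get confused is that one fact, independent replication, moves a pivotal trial a long way up or down the pyramid. A minimal sketch of that rule follows. It assumes the trial already meets the other conditions listed under Grade A (registered before enrolment, reported per CONSORT, effect sizes that survive sensitivity analysis). The function name and its two flags are hypothetical, and the text does not say how a replicated pilot trial is graded, so the sketch covers pivotal trials only.

```python
def grade_pivotal_trial(independently_replicated: bool, industry_run: bool) -> str:
    """Grade an otherwise well-conducted pivotal trial (illustrative only)."""
    if independently_replicated:
        # Replicated pivotal RCTs sit in the top tier.
        return "A"
    if industry_run:
        # Unreplicated industry trials are context only, never action.
        return "C"
    # A single well-conducted RCT before replication is the working tier.
    return "B"
```

The same industry trial scores `"C"` before replication and `"A"` after it. Nothing about the trial itself has changed, only what is known about whether its result holds up elsewhere.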

Grade D, the educated-opinion tier

Four categories of evidence sit at the educated-opinion tier:

  • Expert opinion without underlying systematic synthesis.
  • Case series and case reports.
  • Mechanistic studies in animal models or in vitro pending human evidence.
  • Educational synthesis (the AID Optimiser ladder, for instance, is a Grade D synthesis on a Grade A/B evidence base).

Grade D claims are always paired with the line that they are educational only, that the underlying evidence has not yet quantified the effect for a person, and that the diabetes care team is the right place to take the decision.
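Putting the four tiers side by side, the qualifier lines can be sketched as templates wrapped around a claim. This is a hedged illustration, not how Grace is built: `TEMPLATES` and `render_claim` are hypothetical names, the Grade B and Grade C wording is taken from the lines quoted above, and the Grade D wording is a paraphrase because the text describes that line without quoting it.

```python
# Templates built from the qualifier lines in the text. The Grade D wording
# is a paraphrase (an assumption), since the text does not quote it verbatim.
TEMPLATES = {
    "A": "{claim}",
    "B": "the trial showed {claim}, replication pending",
    "C": "{claim} (evidence base is moderate; recommendation should be made "
         "jointly with the diabetes care team)",
    "D": "{claim} (educational only; the effect has not yet been quantified "
         "for a person; take the decision to the diabetes care team)",
}


def render_claim(grade: str, claim: str) -> str:
    """Wrap a claim in the qualifier line that belongs to its grade."""
    try:
        template = TEMPLATES[grade]
    except KeyError:
        raise ValueError(f"unknown grade: {grade!r}") from None
    return template.format(claim=claim)
```

With the placeholder claim `"X"`, `render_claim("A", "X")` returns the claim unchanged, while `render_claim("B", "X")` returns the single-trial wording quoted in the Grade B section.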
