A more productive and realistic approach is to place the likelihood that a subgroup effect is real on a continuum from “highly plausible” to “extremely unlikely”, possibly using a visual analogue scale.

For instance, Russell et al28 compared the effect of vasopressin versus norepinephrine infusion on 28-day mortality in a randomised trial of 778 patients with septic shock.

As the primary subgroup analysis, the authors hypothesised a priori that the benefit of vasopressin over norepinephrine would be larger in patients with more severe septic shock.

Since 1992, these seven criteria have been widely used to assess hypothesised subgroup effects,14 15 16 17 18 19 20 21 22 23 and have undergone only minimal cosmetic revisions.4 After years of use of the 1992 criteria, we had begun to perceive limitations.

These limitations became vivid when we had to judge the credibility of a subgroup hypothesis in a large multi-centre randomised trial.24 On the basis of this experience, a review of published methodological articles addressing subgroup analyses, and consultation with clinician and epidemiologist colleagues, we identified four new criteria that could further help to differentiate between spurious and real subgroup effects.

Because the subgroups were not selected on the basis of characteristics at baseline, the most likely explanation of the results is not that insulin therapy is harmful in those destined to stay in ICU for less than three days and beneficial in those destined to stay for at least three days. Rather, an effect of treatment was to create prognostic imbalance between groups in those who ultimately stayed less than three days or at least three days.
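The mechanism can be illustrated with a small worked example. This is a minimal sketch under stated assumptions, not an analysis of the trial discussed above: the patient types ("frail", "robust"), their mortality risks, and all counts are hypothetical. The treatment is assumed to have no effect on mortality and only to change who ends up in the short-stay or long-stay group.

```python
from fractions import Fraction

# Hypothetical latent prognosis: mortality depends only on patient type and is
# NOT changed by treatment in this toy model.
MORTALITY = {"frail": Fraction(40, 100), "robust": Fraction(10, 100)}

# Hypothetical counts per arm, split by length of stay, which is a
# post-randomisation variable.
# Control: frail patients stay long, robust patients stay short.
# Treatment: moves half of each patient type into the other stay category.
ARMS = {
    "control": {
        "short": {"frail": 0, "robust": 500},
        "long": {"frail": 500, "robust": 0},
    },
    "treatment": {
        "short": {"frail": 250, "robust": 250},
        "long": {"frail": 250, "robust": 250},
    },
}


def expected_mortality(counts):
    """Expected proportion of deaths in a group with the given type counts."""
    total = sum(counts.values())
    deaths = sum(n * MORTALITY[kind] for kind, n in counts.items())
    return deaths / total


def subgroup_table():
    """Expected mortality per arm: short stay, long stay, and overall."""
    table = {}
    for arm, strata in ARMS.items():
        overall = {k: strata["short"][k] + strata["long"][k] for k in MORTALITY}
        table[arm] = {
            "short": expected_mortality(strata["short"]),
            "long": expected_mortality(strata["long"]),
            "overall": expected_mortality(overall),
        }
    return table


if __name__ == "__main__":
    for arm, row in subgroup_table().items():
        print(arm, {name: float(value) for name, value in row.items()})
```

In this toy setting, overall mortality is identical in both arms, yet the treatment appears harmful among short-stay patients and beneficial among long-stay patients. The apparent subgroup effect arises entirely from the different prognostic mix within each post-randomisation stratum.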

Such post-randomisation subgroup analyses have very low credibility; in most cases, they can be readily dismissed.

[…] specified direction will increase the credibility of a subgroup analysis; failure to specify the direction (or, worse yet, getting the direction wrong) weakens the case for a real underlying subgroup effect.

Finally, we propose a re-structured checklist of items addressing study design, analysis, and context.

A crucial issue in subgroup analyses is that the effects should be examined with relative rather than absolute measures.

Subgroups can be defined according to characteristics measured at baseline or after randomisation. Subgroups defined according to post-randomisation characteristics might be influenced by tested interventions; that is, the apparent difference of treatment effect between subgroups can be explained by the intervention itself, or by differing prognostic characteristics in subgroups that emerge after randomisation, rather than by the subgroup characteristic itself.

Many subgroup claims are, however, subsequently shown to be false.4 Thus, investigators, clinicians, and policy makers face the challenge of whether or not to believe apparent differences in effect.
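The preference for relative over absolute measures noted above can be illustrated with simple arithmetic. This is a sketch under assumptions: the baseline risks and the relative risk of 0.75 are hypothetical, and `effect_measures` is an illustrative helper, not part of any published method. It shows that a constant relative effect produces different absolute risk differences whenever subgroups differ in baseline risk, so absolute measures can suggest a subgroup difference where the relative effect is the same.

```python
from fractions import Fraction


def effect_measures(control_risk, relative_risk):
    """Return relative and absolute effect for a given baseline (control) risk."""
    treated_risk = control_risk * relative_risk
    return {
        "relative_risk": treated_risk / control_risk,
        "risk_difference": control_risk - treated_risk,
    }


# Hypothetical subgroups that share the same relative risk (0.75)
# but differ in baseline risk (10% versus 40%).
low_risk = effect_measures(Fraction(10, 100), Fraction(3, 4))
high_risk = effect_measures(Fraction(40, 100), Fraction(3, 4))

if __name__ == "__main__":
    for label, result in (("low baseline risk", low_risk), ("high baseline risk", high_risk)):
        print(label, {name: float(value) for name, value in result.items()})
```

Under these assumed numbers, the relative risk is the same in both subgroups, while the absolute risk reduction is four times larger in the high-risk subgroup.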

