Key points
- Summary of Findings tables provide succinct presentations of evidence quality and magnitude of effects.
- Summarizing the findings of continuous outcomes presents special challenges to interpretation that become daunting when individual trials use different measures for the same construct.
- The most commonly used approach to providing pooled estimates for different measures, presenting results in standard deviation units, has limitations related to both statistical properties and interpretability.
- Potentially preferable alternatives include presenting results in the natural units of the most popular measure, transforming into a binary outcome and presenting relative and absolute effects, presenting the ratio of the means of intervention and control groups, and presenting results in preestablished minimally important difference units.
GRADE Series
GRADE guidelines: 13. Preparing Summary of Findings tables and evidence profiles—continuous outcomes
Introduction
The first 12 articles in this series introduced the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to systematic reviews and guideline development [1], discussed the framing of the question [2], presented GRADE's concept of quality of evidence and how to apply it [3], [4], [5], [6], [7], [8], [9], presented GRADE's approach to resource use considerations [10], described how to make overall ratings of confidence [11], and discussed Summary of Findings (SoF) tables presenting the results of binary outcomes [12]. In this thirteenth article, we address issues specific to SoF tables that report results of continuous outcomes.
Our recommendations will differ according to whether
1. investigators have all used the same measure that is familiar to the target audiences
2. investigators have all used the same or very similar measures that are less familiar to the target audiences
3. investigators have used different measures
Options when investigators have all used the same measure that is familiar to the target audiences
In the simplest situation, authors of primary studies have all used the same measure of the continuous outcome of interest, and the target audiences will easily interpret that outcome. This is likely to be true, for instance, of durations of events, such as hospitalization or symptoms for conditions such as sore throat, otitis media, or influenza. For such outcomes, the SoF table should include a weighted difference of means.
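A weighted difference of means can be sketched as follows. This is a minimal illustration that assumes fixed-effect, inverse-variance weighting, which is one common way to pool mean differences and is not necessarily the method used in the reviews the article cites. The function name and the input layout are hypothetical.

```python
import math

def pooled_mean_difference(studies):
    """Pool mean differences with inverse-variance weights (fixed-effect assumption).

    Each study is a tuple:
    (mean_treat, sd_treat, n_treat, mean_ctrl, sd_ctrl, n_ctrl)
    Returns the pooled difference and an approximate 95% confidence interval.
    """
    weight_total = 0.0
    weighted_diff_total = 0.0
    for mean_t, sd_t, n_t, mean_c, sd_c, n_c in studies:
        diff = mean_t - mean_c                      # difference in natural units
        variance = sd_t**2 / n_t + sd_c**2 / n_c    # variance of that difference
        weight = 1.0 / variance                     # more precise studies weigh more
        weight_total += weight
        weighted_diff_total += weight * diff
    pooled = weighted_diff_total / weight_total
    se = math.sqrt(1.0 / weight_total)
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical data: days of symptoms in two trials of equal size and spread.
example_studies = [
    (4.0, 2.0, 50, 5.0, 2.0, 50),
    (3.0, 2.0, 50, 6.0, 2.0, 50),
]
pooled, (lower, upper) = pooled_mean_difference(example_studies)
print(f"Pooled difference: {pooled:.2f} days (95% CI {lower:.2f} to {upper:.2f})")
```

Because the result stays in the natural units of the measure (here, days), readers can interpret it directly without further transformation.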
Table 1 presents examples of such outcomes from systematic reviews in
Options when investigators have all used the same or very similar measures that are less familiar to the target audiences
Transparency becomes more challenging when clinicians and patients are unfamiliar with the units of the outcome measure. For instance, Table 2 presents data derived from a systematic review addressing the impact of compression stockings for people taking long flights [16]. Outcomes include the presence of edema. Because each study used the same measurement tool for assessing edema, it is possible to make the pooled difference between the groups (the “weighted mean difference”) of 4.7 units more
Options when investigators have used different measures
Reviewers face further challenges when studies measure the same concept but use different measurement instruments. For instance, one set of trials may have measured depression using the Beck Depression Inventory-II [22], and another set may have used the Hamilton Rating Scale for Depression [23]. Under these circumstances, providing pooled estimates of effect and making results interpretable mandates use of one of five available approaches. Table 3 summarizes the merits of each approach and our
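Three of the presentations named in the key points (standard deviation units, ratio of means, and minimally important difference units) reduce to simple arithmetic for a single trial. The sketch below is illustrative only: the function names, the trial data, and the minimally important difference value are hypothetical, and the standard deviation version assumes a pooled standard deviation without small-sample correction.

```python
import math

def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference in means expressed in pooled standard deviation units."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd

def ratio_of_means(mean_t, mean_c):
    """Intervention mean divided by control mean (both must share a scale)."""
    return mean_t / mean_c

def difference_in_mid_units(mean_t, mean_c, mid):
    """Difference in means divided by the instrument's minimally important difference."""
    return (mean_t - mean_c) / mid

# Hypothetical trial: symptom score, lower is better; assumed MID of 5 points.
mean_t, sd_t, n_t = 12.0, 4.0, 30
mean_c, sd_c, n_c = 14.0, 4.0, 30

smd = standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c)
rom = ratio_of_means(mean_t, mean_c)
mid_units = difference_in_mid_units(mean_t, mean_c, mid=5.0)
print(f"SD units: {smd:.2f}, ratio of means: {rom:.2f}, MID units: {mid_units:.2f}")
```

Each function places results from different instruments on a common scale, which is what makes pooling across instruments possible. They differ in what the reader must know to interpret the result: the spread of scores, the control group mean, or the instrument's minimally important difference.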
Reflections on the interpretation of the five methods
The prior discussion makes evident that there is no ideal method for making results of continuous variables interpretable, particularly when studies have used different measurement tools for the same construct (e.g., pain, physical function, emotional function). Given the sometimes questionable assumptions that each approach makes, it would be reassuring if the methods led to essentially the same inferences. This is true for the respiratory rehabilitation example: all approaches suggest a
Recommendations for enhancing interpretability in meta-analyses in which primary studies use different instruments to measure the same underlying construct
We have described five approaches to enhancing the interpretability of continuous variables in meta-analyses in which primary studies have used different instruments. Review authors will have to tailor their approach to the individual situation but may find the following guides helpful:
1. Using more than one presentation is likely to be both informative and, if the clinical message is similar, reassuring. It can also reduce the risk of biased selection of which presentation to use when the
Conclusion
Summarizing continuous variables in ways that are both valid and interpretable is challenging. To achieve these goals, systematic review authors and guideline developers should carefully consider the approaches we have suggested.
References (36)
- et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol (2011)
- et al. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J Clin Epidemiol (2011)
- et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol (2011)
- et al. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol (2011)
- et al. GRADE guidelines: 5. Rating the quality of evidence—publication bias. J Clin Epidemiol (2011)
- et al. GRADE guidelines 6. Rating the quality of evidence—imprecision. J Clin Epidemiol (2011)
- et al. GRADE guidelines: 7. Rating the quality of evidence–inconsistency. J Clin Epidemiol (2011)
- et al. GRADE guidelines: 8. Rating the quality of evidence—indirectness. J Clin Epidemiol (2011)
- et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol (2011)
- et al. GRADE guidelines 10 - Considering resource use and rating the quality of economic evidence. J Clin Epidemiol (2013)
- GRADE guidelines 11 - Making an overall rating of evidence for a single outcome and for all outcomes. J Clin Epidemiol
- GRADE guidelines 12 - Preparing summary of findings tables (SOF) - binary outcomes. J Clin Epidemiol
- Methods to explain the clinical significance of health status measures. Mayo Clin Proc
- Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials
- Binary methods for continuous outcomes: a parametric alternative. J Clin Epidemiol
- Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations. J Pain
- From effect size into number needed to treat. Lancet
- How can quality of life researchers make their work more useful to health workers and their patients? Qual Life Res
The GRADE system has been developed by the GRADE Working Group. The named authors drafted and revised this article. A complete list of contributors to this series can be found on the Journal of Clinical Epidemiology Web site.