Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria

https://doi.org/10.1016/j.jclinepi.2004.02.007

Abstract

Objective

This study contrasted the use of responsiveness indices at the group level vs. individual patient level.

Study Design and Setting

We followed a cohort of 211 patients (50% male; mean age 47.5 years; SD 14) with musculoskeletal upper extremity problems for a total of 3 months. Outcome measures included the Disabilities of the Arm, Shoulder, and Hand (DASH) questionnaire, Shoulder Pain and Disability Index (SPADI), Patient-Rated Wrist Evaluation (PRWE), and the Medical Outcomes Study 12-Item Short-Form Health Survey (SF-12). We calculated confidence intervals on various group-level responsiveness statistics based on effect size and correlation with global change. The proportion of patients exceeding the minimum detectable change (or reliable change proportion) and minimum important difference (MID proportion) were included as indices applicable to the individual patient.

Results

For the DASH, effect size ranged from 1.06 to 1.67 for various patient subgroups, and the reliable change and MID proportions indicated that 50%–70% of individuals exhibited change based on individual change scores. Only the standardized response mean (SRM) and reliable change proportion indicated differences among the outcome measures used in this study.

Conclusion

The reliable change and MID proportions have an intuitive interpretation and facilitate quantitative responsiveness comparisons among outcome measures based on individual patient criteria.

Introduction

The ability to assess longitudinal changes in health status is critical for outcome measures used in the study of treatment efficacy. This aspect of measurement is termed responsiveness, or sensitivity to change [1]. Responsiveness has been defined as the ability of an instrument to detect small but important changes in health status over time [2], [3], [4]. Responsiveness is essential for both group and individual measurement purposes. In group applications, the greater the responsiveness of an outcome measure, the fewer subjects required to detect a significant treatment effect [3], [5]. This has obvious implications for power calculations, sample size, and cost of clinical research. For individual applications, a responsive outcome measure is more likely to show change for a particular patient in response to effective treatment. A clinician may make treatment decisions with greater confidence using scores from an instrument known to be sensitive to small but important changes over time. The identification and selection of the most responsive outcome measure is critical for both group research and clinical decision-making.

There is no agreement in the literature on how to calculate and report responsiveness to clinical change. Various statistics have been used, and often a combination of statistics is reported within a single study. Group-level statistics such as effect size are commonly reported. Effect size is calculated as the mean score change divided by the standard deviation of baseline scores for a group of patients judged to have changed over time [6]. The standardized response mean (SRM) [5] and Guyatt's responsiveness index [3] are variations on effect size and also qualify as group-level statistics.
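The two group-level statistics above differ only in the denominator: effect size standardizes the mean change by the baseline standard deviation, while the SRM standardizes it by the standard deviation of the change scores. A minimal sketch, using hypothetical DASH-style scores (the patient data and function names are illustrative, not from the study):

```python
import numpy as np

def effect_size(baseline, followup):
    """Effect size: mean change divided by the SD of baseline scores."""
    change = baseline - followup  # positive = improvement (DASH decreases as function improves)
    return change.mean() / baseline.std(ddof=1)

def standardized_response_mean(baseline, followup):
    """SRM: mean change divided by the SD of the change scores."""
    change = baseline - followup
    return change.mean() / change.std(ddof=1)

# Hypothetical DASH scores for five patients (0-100, lower = less disability)
baseline = np.array([52.0, 61.0, 45.0, 70.0, 58.0])
followup = np.array([30.0, 42.0, 28.0, 55.0, 33.0])
print(round(effect_size(baseline, followup), 2))
print(round(standardized_response_mean(baseline, followup), 2))
```

Because the change scores here vary less than the baseline scores, the SRM comes out larger than the effect size; which denominator is used can therefore change the apparent responsiveness ranking of two instruments.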

Statistically significant change at the group level may not be significant at the individual level. Average effects across a group may not be meaningful to the individual patient. Recent authors have emphasized the importance of responsiveness at the level of the individual patient [7], [8], [9], [10], [11], [12]. Mean changes for a group may be the result of few individuals with relatively large changes, or numerous individuals with relatively small changes. Liang [13] suggests reserving the term responsiveness to denote important change (importance being represented by some individual change criterion), with the term sensitivity used for indices such as effect size, which reflect change of any level without consideration for the individual patient's perspective.

Proponents of evidence-based practice advocate an individual threshold approach in the concepts of control event rate, experimental event rates, and number needed to treat (NNT) [14], [15], [16], [17], [18]. This method requires the specification of each patient as a success or failure based on some individual-level criterion. Because the definition of treatment success is based on individual criteria, it would be useful to develop similar approaches to comparing responsiveness of outcome measures intended for clinical decision-making.

Two common threshold approaches to defining individual-level change have been based on either statistically reliable change or clinically important change. Statistically reliable change is calculated using the standard error of measurement (SEM), which reflects the amount of error associated with an individual subject assessment. The SEM is calculated as the square root of the mean square error term from an analysis of variance on test–retest reliability data [19]. Alternatively, it may be estimated by the formula SEM = SD√(1 − R), where SD is the baseline standard deviation and R is the test–retest reliability coefficient [20]. The minimum detectable change (MDC), also known as reliable change or smallest real difference, may then be calculated by multiplying the SEM by the z-score associated with the desired level of confidence and the square root of 2, reflecting the additional uncertainty introduced by using difference scores from measurements at two points in time [21]. The MDC represents the smallest change in score that likely reflects true change rather than measurement error alone [22].
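The SEM and MDC formulas above can be sketched directly; the baseline SD of 20 points and the test–retest reliability of 0.90 below are assumed values for illustration, not figures from the study:

```python
import math

def sem(baseline_sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - R)."""
    return baseline_sd * math.sqrt(1.0 - reliability)

def mdc(baseline_sd, reliability, z=1.96):
    """Minimum detectable change at the confidence level implied by z
    (z = 1.96 for 95%). The sqrt(2) term reflects the additional error
    from taking the difference of two measurements."""
    return z * math.sqrt(2.0) * sem(baseline_sd, reliability)

# Hypothetical instrument: baseline SD = 20 points, test-retest R = 0.90
print(round(sem(20.0, 0.90), 2))   # SEM in score points
print(round(mdc(20.0, 0.90), 2))   # 95% MDC in score points
```

Note how even a respectable reliability of 0.90 yields an MDC of roughly 17.5 points on this hypothetical 0–100 scale: an individual patient must change by that much before the change can be distinguished from measurement error.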

Authors have previously noted that outcome measures used for individual patient applications require greater reliability than those used only for research purposes on the group level. Minimum reliability standards have been arbitrarily set at 0.90 or 0.95 [23], [24]. An advantage of the MDC approach is that it incorporates reliability in calculations of responsiveness. The lower the reliability coefficient, the greater the SEM, and the higher the MDC will be. An outcome measure with a large MDC threshold will have fewer patients with change scores that exceed that threshold (yielding a lower event rate), and thus will appear less responsive to change than more reliable outcome measures. In contrast to arbitrary reliability coefficients, the MDC expresses the reliability threshold in the same units as the outcome measure at the specified confidence level.

Another threshold approach to individual health assessment questionnaire scores is based on the concept of clinically important change [6], [10], [25], [26]. Anchoring refers to a process for connecting score changes with some external criterion, often a global change measure [25], [26]. The minimal important difference (MID; also known as minimal clinically important difference) is defined as the smallest difference in score which patients perceive as beneficial [25]. Some authors advocate using the MID because it allows the patient to determine the level of improvement deemed important and relevant [3], [10], [13], [25], [26], [27], [28]. Like the MDC, the MID also is expressed in the same units as the outcome measure.

Although the MDC and MID are useful and important benchmarks in clinical applications, they have seldom been used to compare responsiveness across outcome measures. Davidson and Keating [29] have illustrated an approach using an individual patient criterion to compare responsiveness across outcome measures for a cohort of patients. These authors determined the proportion of patients in a sample that exceeded the minimum detectable change (MDC, or reliable change) on each outcome measure. This index of responsiveness represents the proportion of individual patients in a sample that exhibit true change, surpassing the degree of change that could be expected due to measurement error alone. We term this the reliable change proportion. The reliable change proportion is analogous to the event rate in evidence-based practice applications. This approach could also be used with the MID as an “important change” benchmark. The more responsive the outcome measure, the greater the proportion of patients who will exceed the minimum change criterion. Although these proportions or event rates are based on individual criteria, they may be used to summarize and compare results across different outcome measures for a group of patients.
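The proportion-based indices described above reduce to classifying each patient against a threshold and averaging. A minimal sketch, assuming hypothetical change scores and threshold values (the MDC of 17.5 and MID of 10.0 points are illustrative, not estimates from the study):

```python
import numpy as np

def threshold_proportion(change_scores, threshold):
    """Proportion of patients whose absolute change exceeds a threshold.
    With the MDC as threshold this is the reliable change proportion;
    with the MID it is the MID proportion (an 'event rate')."""
    change = np.asarray(change_scores, dtype=float)
    return float((np.abs(change) > threshold).mean())

# Hypothetical change scores for ten patients on a 0-100 instrument
changes = [22, 5, 19, 12, 30, 8, 15, 25, 3, 18]
print(threshold_proportion(changes, 17.5))  # reliable change proportion (MDC = 17.5)
print(threshold_proportion(changes, 10.0))  # MID proportion (MID = 10.0)
```

Computing these proportions for each outcome measure on the same cohort gives directly comparable event rates, mirroring the number-needed-to-treat framework from evidence-based practice.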

The purpose of this study was to contrast the use of group-level vs. individual-level responsiveness indices and to investigate the sensitivity of these methods to differences in responsiveness among outcome measures.


Subjects

Patients were recruited at the initial visit to local physical and occupational (hand therapy) outpatient clinics in the Minneapolis area. All patients 18 years and older with a physician's referral and diagnosis of a musculoskeletal upper extremity problem were eligible for the study. The ability to read and understand English was required for eligibility. Patients with primary or coexisting systemic conditions including etiology in the cervical spine, multiple sclerosis, cancer, rheumatoid

Results

A total of 211 eligible subjects were enrolled and 155 (73.5%) returned the 3-month follow-up packet of questionnaires. The subset of patients at 3-month follow-up was similar to the baseline sample in age, sex, location of symptoms, occupation, education, and race. A demographic summary is given in Table 1. The sample represented a mixture of diagnoses involving the upper extremity. The most common diagnoses included shoulder pain (n = 45 patients, 21% of the sample), shoulder tendonitis (n = 33;

Discussion

Our study is consistent with previous research showing that the highest responsiveness rank among outcome measures is dependent on the responsiveness index chosen [22], [46], [54], [55]. The lack of agreement on a preferred index makes comparisons of responsiveness problematic. Beaton et al. [56] have suggested that the selected index of responsiveness may be decided by the context in which the outcome measure will be used. For comparing responsiveness among measures intended for clinical

Limitations

There is no treatment of known efficacy available for a sample of patients with multiple diagnoses, so results of some calculations were dependent on external criterion measures of change to determine “true” change. The lack of an accepted gold standard for assessing change made it difficult to validate our external criterion. We were able to show a significant correlation of the Global Disability Rating criterion with therapist ratings and change scores on accepted functional outcome measures.

Conclusion

Minimal change indices provide useful benchmarks for clinical decision making about the progress of individual patients. The reliable change and minimal important difference proportions characterize the ability of an outcome measure to detect small but important changes in patient status. We recommend using these indices to compare responsiveness among outcome measures used for clinical decision making.

Acknowledgements

We would like to thank the therapists at Park Nicollet Medical Center and the Institute for Athletic Medicine in Minneapolis, MN, for their assistance in recruiting patients and gathering data. We also express our appreciation to the Orthopaedic Section of the American Physical Therapy Association for funding this project through the Clinical Research Grant Program.

References (59)

  • J.G. Wright et al. A comparison of different indices of responsiveness. J Clin Epidemiol (1997)
  • G.R. Norman et al. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol (1997)
  • D.E. Beaton et al. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol (1997)
  • M.R. Tuley et al. Estimating and testing an index of responsiveness and the relationship of the index to power. J Clin Epidemiol (1991)
  • D.E. Beaton et al. A taxonomy for responsiveness. J Clin Epidemiol (2001)
  • K.W. Wyrwich et al. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol (1999)
  • R.A. Deyo et al. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials (1991)
  • M.H. Liang et al. Comparisons of five health status instruments for orthopedic evaluation. Med Care (1990)
  • L.E. Kazis et al. Effect sizes for interpreting changes in health status. Med Care (1989)
  • N.S. Jacobson et al. Methods for defining and determining the clinical significance of treatment effects: description, application, and alternatives. J Consult Clin Psychol (1999)
  • P.W. Stratford et al. Applying the results of self-report measures to individual patients: an example using the Roland-Morris Questionnaire. J Orthop Sports Phys Ther (1999)
  • M.A. Testa. Interpretation of quality-of-life outcomes: issues that affect magnitude and meaning. Med Care (2000)
  • S. Wiebe et al. Clinically important change in quality of life in epilepsy. J Neurol Neurosurg Psychiatry (2002)
  • J.M. Simpson et al. The Standardized Three-metre Walking Test for elderly people (WALK3m): repeatability and real change. Clin Rehabil (2002)
  • T. Ljungquist et al. Physical performance tests for people with spinal pain: sensitivity to change. Disabil Rehabil (2003)
  • M.H. Liang. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care (2000)
  • D.L. Sackett et al. Evidence-based medicine: how to practice and teach EBM (2000)
  • T.A. Furukawa et al. Can we individualize the ‘number needed to treat’? An empirical study of summary effect measures in meta-analyses. Int J Epidemiol (2002)
  • S.D. Walter. Number needed to treat (NNT): estimation of a measure of clinical benefit. Stat Med (2001)