Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria
Introduction
The ability to assess longitudinal changes in health status is critical for outcome measures used in the study of treatment efficacy. This aspect of measurement is termed responsiveness, or sensitivity to change [1]. Responsiveness has been defined as the ability of an instrument to detect small but important changes in health status over time [2], [3], [4]. Responsiveness is essential for both group and individual measurement purposes. In group applications, the greater the responsiveness of an outcome measure, the fewer subjects required to detect a significant treatment effect [3], [5]. This has obvious implications for power calculations, sample size, and cost of clinical research. For individual applications, a responsive outcome measure is more likely to show change for a particular patient in response to effective treatment. A clinician may make treatment decisions with greater confidence using scores from an instrument known to be sensitive to small but important changes over time. The identification and selection of the most responsive outcome measure is critical for both group research and clinical decision-making.
There is no agreement in the literature on how to calculate and report responsiveness to clinical change. Various statistics have been used, and often a combination of statistics is reported within a single study. Group-level statistics such as effect size are commonly reported. Effect size is calculated as the mean score change divided by the standard deviation of baseline scores for a group of patients judged to have changed over time [6]. The standardized response mean (SRM) [5] and Guyatt's responsiveness index [3] are variations on effect size and also qualify as group-level statistics.
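As a minimal sketch of these group-level indices, the effect size and SRM can be computed from paired scores. The values below are made-up illustrative scores (lower score = less disability), not data from this study:

```python
import numpy as np

# Hypothetical baseline and 3-month follow-up disability scores
# for eight patients (illustrative values only).
baseline = np.array([42.0, 55.0, 38.0, 60.0, 47.0, 51.0, 44.0, 58.0])
followup = np.array([30.0, 41.0, 35.0, 44.0, 33.0, 45.0, 31.0, 40.0])

# Positive change = improvement on this hypothetical scale
change = baseline - followup

# Effect size: mean change divided by the SD of baseline scores
effect_size = change.mean() / baseline.std(ddof=1)

# Standardized response mean: mean change divided by the SD of change scores
srm = change.mean() / change.std(ddof=1)

print(f"effect size = {effect_size:.2f}, SRM = {srm:.2f}")
```

Note that the two indices differ only in the denominator: the SRM standardizes against the variability of the change scores themselves, so the same mean change can yield quite different values for the two statistics.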
Statistically significant change at the group level may not be significant at the individual level. Average effects across a group may not be meaningful to the individual patient. Recent authors have emphasized the importance of responsiveness at the level of the individual patient [7], [8], [9], [10], [11], [12]. Mean changes for a group may be the result of a few individuals with relatively large changes, or numerous individuals with relatively small changes. Liang [13] suggests reserving the term responsiveness to denote important change (importance being represented by some individual change criterion), with the term sensitivity used for indices such as effect size, which reflect change of any level without consideration for the individual patient's perspective.
Proponents of evidence-based practice advocate an individual threshold approach in the concepts of control event rate, experimental event rates, and number needed to treat (NNT) [14], [15], [16], [17], [18]. This method requires the specification of each patient as a success or failure based on some individual-level criterion. Because the definition of treatment success is based on individual criteria, it would be useful to develop similar approaches to comparing responsiveness of outcome measures intended for clinical decision-making.
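The event-rate arithmetic behind the NNT can be sketched as follows. The event rates below are assumptions chosen for illustration, not results from any trial:

```python
# Hypothetical proportions of patients classified as treatment "successes"
# under an individual-level criterion (illustrative values only).
experimental_event_rate = 0.60
control_event_rate = 0.35

# Absolute risk difference between groups
absolute_risk_difference = experimental_event_rate - control_event_rate

# Number needed to treat: patients who must be treated for one
# additional success beyond the control condition
number_needed_to_treat = 1.0 / absolute_risk_difference

print(f"NNT = {number_needed_to_treat:.1f}")
```

Under these assumed rates, four patients would need to be treated to produce one additional treatment success relative to control.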
Two common threshold approaches to defining individual-level change have been based on either statistically reliable change or clinically important change. Statistically reliable change is calculated using the standard error of measurement (SEM), which reflects the amount of error associated with an individual subject assessment. The SEM is calculated as the square root of the mean square error term from an analysis of variance on test–retest reliability data [19]. Alternatively, it may be estimated by the formula SEM = SD × √(1 − R), where SD is the baseline standard deviation and R is the test–retest reliability coefficient [20]. The minimum detectable change (MDC), also known as reliable change or smallest real difference, may then be calculated by multiplying the SEM by the z-score associated with the desired level of confidence and the square root of 2, reflecting the additional uncertainty introduced by using difference scores from measurements at two points in time [21]. The MDC represents the smallest change in score that likely reflects true change rather than measurement error alone [22].
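A minimal sketch of the SEM and MDC calculations described above, assuming an illustrative baseline SD of 10 points and a test–retest reliability of 0.90 (both values are assumptions, not figures from this study):

```python
import math

sd_baseline = 10.0              # baseline standard deviation (assumed)
test_retest_reliability = 0.90  # test-retest reliability coefficient R (assumed)

# SEM = SD * sqrt(1 - R)
sem = sd_baseline * math.sqrt(1 - test_retest_reliability)

# MDC at the 95% confidence level: z = 1.96, with sqrt(2) reflecting
# the added uncertainty of a difference between two measurements
mdc_95 = 1.96 * math.sqrt(2) * sem

print(f"SEM = {sem:.2f}, MDC95 = {mdc_95:.2f}")
```

With these assumed inputs, an individual change of roughly 8.8 points or more would exceed measurement error at the 95% confidence level; lowering R to 0.80 would inflate the MDC to about 12.4 points, illustrating how reliability drives the threshold.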
Authors have previously noted that outcome measures used for individual patient applications require greater reliability than those used only for research purposes on the group level. Minimum reliability standards have been arbitrarily set at 0.90 or 0.95 [23], [24]. An advantage of the MDC approach is that it incorporates reliability in calculations of responsiveness. The lower the reliability coefficient, the greater the SEM, and the higher the MDC will be. An outcome measure with a large MDC threshold will have fewer patients with change scores that exceed that threshold (yielding a lower event rate), and thus will appear less responsive to change than more reliable outcome measures. In contrast to arbitrary reliability coefficients, the MDC expresses the reliability threshold in the same units as the outcome measure at the specified confidence level.
Another threshold approach to individual health assessment questionnaire scores is based on the concept of clinically important change [6], [10], [25], [26]. Anchoring refers to a process for connecting score changes with some external criterion, often a global change measure [25], [26]. The minimal important difference (MID; also known as minimal clinically important difference) is defined as the smallest difference in score which patients perceive as beneficial [25]. Some authors advocate using the MID because it allows the patient to determine the level of improvement deemed important and relevant [3], [10], [13], [25], [26], [27], [28]. Like the MDC, the MID is also expressed in the same units as the outcome measure.
Although the MDC and MID are useful and important benchmarks in clinical applications, they have seldom been used to compare responsiveness across outcome measures. Davidson and Keating [29] have illustrated an approach using an individual patient criterion to compare responsiveness across outcome measures for a cohort of patients. These authors determined the proportion of patients in a sample that exceeded the minimum detectable change (MDC, or reliable change) on each outcome measure. This index of responsiveness represents the proportion of individual patients in a sample who exhibit true change, surpassing the degree of change that could be expected due to measurement error alone. We term this the reliable change proportion. The reliable change proportion is analogous to the event rate in evidence-based practice applications. This approach could also be used with the MID as an “important change” benchmark. The more responsive the outcome measure, the greater the proportion of patients who will exceed the minimum change criterion. Although these proportions or event rates are based on individual criteria, they may be used to summarize and compare results across different outcome measures for a group of patients.
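The reliable change and MID proportions described above can be sketched for one outcome measure. The change scores and both thresholds below are hypothetical illustrations, not values from this study:

```python
import numpy as np

# Hypothetical individual change scores for a cohort of ten patients
change_scores = np.array([12.0, 3.0, 9.0, 15.0, 1.0, 7.0, 20.0, 5.0, 11.0, 2.0])

mdc = 8.8   # minimum detectable change threshold (assumed, e.g. an MDC95)
mid = 10.0  # minimal important difference threshold (assumed anchor-based value)

# Proportion of patients whose change exceeds each individual-level criterion
reliable_change_proportion = np.mean(change_scores >= mdc)
mid_proportion = np.mean(change_scores >= mid)

print(f"reliable change proportion = {reliable_change_proportion:.2f}")
print(f"MID proportion = {mid_proportion:.2f}")
```

Computing these proportions for each candidate outcome measure on the same cohort yields directly comparable "event rates": the measure with the larger proportion exceeding its own threshold is the more responsive one under that criterion.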
The purpose of this study was to contrast the use of group-level vs. individual-level responsiveness indices and to investigate the sensitivity of these methods to differences in responsiveness among outcome measures.
Section snippets
Subjects
Patients were recruited at the initial visit to local physical and occupational (hand therapy) outpatient clinics in the Minneapolis area. All patients 18 years and older with a physician's referral and diagnosis of a musculoskeletal upper extremity problem were eligible for the study. The ability to read and understand English was required for eligibility. Patients with primary or coexisting systemic conditions including etiology in the cervical spine, multiple sclerosis, cancer, rheumatoid
Results
A total of 211 eligible subjects were enrolled and 155 (73.5%) returned the 3-month follow-up packet of questionnaires. The subset of patients at 3-month follow-up was similar to the baseline sample in age, sex, location of symptoms, occupation, education, and race. A demographic summary is given in Table 1. The sample represented a mixture of diagnoses involving the upper extremity. The most common diagnoses included shoulder pain (n = 45 patients, 21% of the sample), shoulder tendonitis (n = 33;
Discussion
Our study is consistent with previous research showing that the highest responsiveness rank among outcome measures is dependent on the responsiveness index chosen [22], [46], [54], [55]. The lack of agreement on a preferred index makes comparisons of responsiveness problematic. Beaton et al. [56] have suggested that the selected index of responsiveness may be decided by the context in which the outcome measure will be used. For comparing responsiveness among measures intended for clinical
Limitations
There is no treatment of known efficacy available for a sample of patients with multiple diagnoses, so results of some calculations were dependent on external criterion measures of change to determine “true” change. The lack of an accepted gold standard for assessing change made it difficult to validate our external criterion. We were able to show a significant correlation of the Global Disability Rating criterion with therapist ratings and change scores on accepted functional outcome measures.
Conclusion
Minimal change indices provide useful benchmarks for clinical decision making about the progress of individual patients. The reliable change and minimal important difference proportions characterize the ability of an outcome measure to detect small but important changes in patient status. We recommend using these indices to compare responsiveness among outcome measures used for clinical decision making.
Acknowledgements
We would like to thank the therapists at Park Nicollet Medical Center and the Institute for Athletic Medicine in Minneapolis, MN, for their assistance in recruiting patients and gathering data. We also express our appreciation to the Orthopaedic Section of the American Physical Therapy Association for funding this project through the Clinical Research Grant Program.
References (59)
- et al. A methodological framework for assessing health indices. J Chronic Dis (1985)
- et al. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis (1987)
- et al. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis (1986)
- Calculating confidence intervals for the number needed to treat. Control Clin Trials (2001)
- et al. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials (1989)
- et al. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol (1996)
- et al. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol (1999)
- Outcome evaluation in patients with elbow pathology: issues in instrument development and evaluation. J Hand Ther (2001)
- et al. Responsiveness of the short form-36, disability of the arm, shoulder, and hand questionnaire, patient-rated wrist evaluation, and physical impairment measurements in evaluating recovery after a distal radius fracture. J Hand Surg [Am] (2000)
- et al. Assessing the reliability and responsiveness of 5 shoulder questionnaires. J Shoulder Elbow Surg (1998)