Alcohol and Alcoholism Advance Access originally published online on September 25, 2008
Alcohol and Alcoholism 2008 43(6):675-682; doi:10.1093/alcalc/agn064
Validation of a Scale for Rating the Delivery of Psycho-Social Treatments for Alcohol Dependence and Misuse: The UKATT Process Rating Scale (PRS)
1 Leeds Addiction Unit, Leeds Partnerships NHS Foundation Trust, Leeds, UK
2 NPCPlus, Keele University, Keele, UK
3 Disability Services, Ireland Health Services Executive, Louth, UK
4 Clinical Trials Research Unit, University of Leeds, Leeds, UK
5 IMSCaR, University of Wales, Bangor, UK
* Corresponding author: Leeds Addiction Unit, Leeds Partnerships NHS Foundation Trust, Leeds, UK. Tel: +44-113-2951335; Fax: +44-113-2952770; E-mail: Gillian.tober{at}leedspft.nhs.uk
Received 3 August 2006; first review notified 2 October 2006; in revised form 17 July 2008; accepted 17 July 2008; advance access publication 25 September 2008
| ABSTRACT |
|---|
Aim: The aim of this study was to describe the development and validation of the UK Alcohol Treatment Trial Process Rating Scale (UKATT PRS), a manual-based method for monitoring and rating the delivery of psychosocial treatments of alcohol dependence and misuse. Methods: Following adaptation and further development of a validated rating scale, the ability of the UKATT PRS to rate the delivery of video-recorded treatment in the UK Alcohol Treatment Trial (UKATT) was tested. Results: Tests of the validity and reliability of the UKATT PRS show that it can validly and reliably detect the two treatments for which it was designed and discriminate between them. Conclusions: The UKATT PRS is a valid and reliable method of rating the frequency and quality of therapeutic style and content in the delivery of two psychosocial treatments of alcohol use and dependence.
| Introduction |
|---|
Why it is important to measure treatment fidelity
The requirement to monitor the quality of the delivery of psychological and social treatment underlies clinical governance of routine clinical practice, supervision and psychotherapy research. Treatment integrity or fidelity checks provide the means to examine the extent to which treatments are delivered and the quality of such delivery (Moncher and Prinz, 1991).
If inferences are to be drawn from effectiveness studies, participant characteristics, treatment delivery, qualities of the therapist and the interaction between therapist and client need to be measured and their influence calculated. The technology model of psychotherapy research design, described by Waskow (1984) and Carroll and Nuro (1996), aims to replicate the rigour of randomized controlled drug trials by specifying therapy in terms of dose delivered, active and inert ingredients and the conditions of administration (Carroll and Rounsaville, 1990).
Procedures to safeguard treatment fidelity and measure treatment delivered have not always been adopted as a standard procedure in psychotherapy outcome research (Moncher and Prinz, 1991). However, the National Institute of Mental Health Treatment of Depression Collaborative Research Program (TDCRP) (Elkin et al., 1985, 1989; Hill et al., 1992), which used a manual-backed scale to record adherence to treatment for depression, the Collaborative Study Psychotherapy Rating Scale (Hill et al., 1992), provided the impetus for measurement of treatment fidelity in psychotherapy research in general and specifically in the addiction field (e.g. Carroll et al., 1998a; Barber et al., 1996, 2004).
Measuring treatment fidelity in psychotherapeutic treatment of addiction
Waltz et al. (1993) recommended the measurement of adherence and competence to account for treatment integrity: adherence refers to the extent to which a therapist used the recommended intervention and competence refers to the skills demonstrated by the therapist in implementing the intervention.
Project MATCH (Project MATCH Research Group, 1997) used a similar rating system to the TDCRP depression study. The MATCH Tape Rating Scale (MTRS) (Carroll et al., 1994, 1998b) consisted of Likert-type scales to measure delivery of the unique and essential active ingredients of the therapies in order to assess trial treatment protocol adherence and differentiation between treatments. It successfully discriminated between three treatments compared in the trial.
A development of the MTRS was the Yale Adherence and Competence Scale (YACS) for rating therapist adherence and competence in psychological treatment delivery for substance misuse disorders (Carroll et al., 2000). The scale includes items referring to treatment ingredients and therapist behaviours that are (1) unique and essential, (2) essential but not unique, (3) acceptable but neither unique nor essential and (4) proscribed within the therapeutic framework.
An instrument designed to rate motivational interviewing, the Motivational Interviewing Skill Code (MISC) (Moyers et al., 2003), measures adherence but not competence. A psycholinguistic code for measuring changes in client speech expected to occur with effective motivational interviewing is used in conjunction with it. Three passes per therapy session are required: the rater (1) makes global ratings of therapeutic factors relating to the clinician, the client and the clinician–client interaction; (2) codes each client and therapist utterance in terms of motivational interviewing factors and (3) measures the proportion of time that the client and the clinician speak.
A quicker method described by Strang and McCambridge (2004) used a short therapist self-report that included categorical and 5-point scale ratings to rate session content, taking about 2 min to complete in writing after the session. Although this method is time efficient, the potential unreliability of the therapist self-report calls its validity into question. Miller and Mount (2001) found that therapists were likely to report greater increases in motivational interviewing skills following a motivational interviewing training programme than other observers who rated therapy sessions using the MISC. In a study investigating the effectiveness of psychotherapy and pharmacotherapy for cocaine users, Carroll et al. (1998a) similarly found a lack of concordance between therapist and observer ratings of session content when using a session checklist immediately after the session.
The purpose of the present study was to develop and validate a manual-based, time-efficient method of rating treatment fidelity, including frequency and quality of the delivery of treatment components, treatment manual adherence, therapeutic style and discriminability between different treatments. It was hypothesized that the UK Alcohol Treatment Trial Process Rating Scale (UKATT PRS) would detect the delivery and measure the quality of delivery of two treatments in the UKATT (UKATT Research Team, 2001, 2005a, 2005b) and discriminate between them. It was designed to be readily adaptable to rate different types of substance misuse treatment.
| Method |
|---|
Scale and manual development and piloting
The MTRS (Carroll et al., 1998b
To generate rating scale items, essential active ingredients (both unique and common) of each treatment were identified from treatment manuals. Active ingredients included style and content specific to each treatment. The items were re-examined to ensure that they covered all treatment components, balanced between the treatments and combined into a 20-item scale, divided into two sections: one contained 13 items measuring treatment-specific tasks and the other contained 7 items measuring treatment-specific therapist style. Appendix A shows the items with the treatment to which they were hypothesized to belong. These 20 items were rated on two 5-point scales, one measuring the extent to which the item was performed (frequency) and the other measuring how well the therapist performed the item (quality). The scale measuring frequency was anchored at 0 (Not at all) and 4 (Extensively) with intermediate labels of A little, Somewhat and Considerably. The scale measuring quality was anchored at 0 (Not at all well) and 4 (Very well) with unlabelled intermediate points.
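The two rating scales described above can be pictured as a small record type. The following is a minimal sketch only: the class name, field names and validation rules are illustrative assumptions and are not taken from the UKATT PRS manual. The one rule borrowed from the paper is that a quality rating is recorded only when the item was rated as having occurred (frequency of 1 or more).

```python
from dataclasses import dataclass
from typing import Optional

# Frequency scale labels as described in the text (0 to 4).
FREQUENCY_LABELS = {
    0: "Not at all",
    1: "A little",
    2: "Somewhat",
    3: "Considerably",
    4: "Extensively",
}

# Quality scale: only the end points are labelled in the text.
QUALITY_ANCHORS = {0: "Not at all well", 4: "Very well"}


@dataclass(frozen=True)
class ItemRating:
    """One rated item for one session (hypothetical structure)."""

    item: str                      # item name, e.g. "empathy"
    treatment: str                 # treatment the item was hypothesized to belong to
    frequency: int                 # 0..4, extent to which the item was performed
    quality: Optional[int] = None  # 0..4, only when frequency > 0

    def __post_init__(self) -> None:
        if self.frequency not in FREQUENCY_LABELS:
            raise ValueError("frequency must be an integer from 0 to 4")
        if self.quality is not None and not 0 <= self.quality <= 4:
            raise ValueError("quality must be an integer from 0 to 4")
        if self.frequency == 0 and self.quality is not None:
            raise ValueError("quality is rated only when the item occurred")


# Illustrative rating; the values are invented.
example = ItemRating(item="empathy", treatment="MET", frequency=3, quality=4)
```

Keeping frequency and quality as separate fields mirrors the manual's guidance on differentiating how often a behaviour occurred from how well it was performed.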
Item definitions, descriptions of the characteristics of high and low ratings for frequency and quality, and examples of therapist dialogue illustrating these were specified in a manual. Guidance was included on differentiating the frequency and quality of therapist behaviours, on avoiding common pitfalls indicative of rater bias and the method for note-taking during the session.
Sampling
One video per trial client (where available) was sampled for process rating. The sample of 452 videotapes was stratified by treatment (motivational enhancement therapy, MET, or social behaviour and network therapy, SBNT), session number (1–3 for MET, 1–8 for SBNT) and centre (see Table 1). To maintain balance between treatments, session numbers and centres, replacement sampling was used when a video was unrateable. Fifty randomly selected videos were scored by two raters independently, and of these 25 were scored by three independent raters, balanced by treatment, session number and centre.
Rater training and supervision
The primary rater, blind to the types of therapy, was trained in the use of the scale by two of the authors (G.T. and W.C.) and supervised throughout the study at 3-week intervals, when independently rated tapes were discussed to enhance rating manual adherence and consistency over time.
| Analysis |
|---|
Data were collected and analysed using SPSS version 14.
Validity
To test construct validity, the factor structure of the scale was examined by Principal Components Analysis. Summary scores were calculated for treatment-specific items that had factor loadings of >0.25 on a single treatment component: METf was the mean of the frequency scores for MET items; METq was the mean of the quality scores for MET items where frequency ratings were >0; SBNTf was the mean of the frequency scores for SBNT items and SBNTq was the mean of the quality scores for SBNT items where frequency ratings were >0.
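The summary scores defined above could be computed along the following lines. This is a sketch under stated assumptions: the function names and the dictionary layout are hypothetical, the ratings are invented, and the study itself used SPSS rather than Python.

```python
from statistics import mean


def frequency_summary(freq):
    """Mean frequency score over a treatment's items (e.g. METf or SBNTf)."""
    return mean(list(freq.values()))


def quality_summary(freq, qual):
    """Mean quality score over items whose frequency rating is > 0
    (e.g. METq or SBNTq). Returns None if no item occurred."""
    rated = [qual[item] for item, f in freq.items() if f > 0 and item in qual]
    return mean(rated) if rated else None


# Invented ratings for three MET items in one session.
met_freq = {"empathy": 3, "exploration of feelings": 0, "ambivalence": 1}
met_qual = {"empathy": 4, "ambivalence": 2}

met_f = frequency_summary(met_freq)          # mean of 3, 0 and 1
met_q = quality_summary(met_freq, met_qual)  # mean of 4 and 2 only
```

Excluding items with a frequency of 0 from the quality mean means that a therapist is not penalized on quality for components that were not delivered in the session.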
The ability of the scale to discriminate between the two treatments was investigated by comparing individual item scores and frequency summary scores for each treatment. It was hypothesized that MET item scores and MET summary scores would be high for MET sessions and low for SBNT sessions and vice versa. A t-test was used to compare the mean item scores and the mean frequency summary scores between SBNT and MET.
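The comparison of mean scores between treatments can be sketched as a two-sample t statistic. The text does not say which form of the t-test was used, so the unequal-variance (Welch) form below is an assumption, as are the function name and the invented session scores. Only the statistic and its degrees of freedom are computed; a P value would need a t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented MET frequency summary scores for MET and SBNT sessions.
met_sessions = [1.0, 1.5, 1.2, 1.6, 1.3]
sbnt_sessions = [0.3, 0.5, 0.4, 0.2, 0.6]

t_stat, dof = welch_t(met_sessions, sbnt_sessions)
```

A large positive statistic here would correspond to the hypothesized pattern of MET items scoring high in MET sessions and low in SBNT sessions.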
Concurrent validity was examined by comparing manual-derived quality summary scores for the two treatments with global ratings of individual therapists' skills (low/medium/high) given by the treatment-specific supervisors. These global ratings were derived following an instruction to the supervisors to base their response on consistency and quality of delivery across the whole period of treatment.
Reliability
Cronbach's alpha was calculated to assess the internal consistency of the four summary scores. Inter-rater reliability of the individual frequency items of the scale was examined using the intraclass correlation coefficient (ICC) two-way mixed effects model (Case 3, Shrout and Fleiss, 1979) to estimate the reliability of a mean of several ratings (Fleiss, 1981). For the four summary scores, the average of the two raters' summary scores was plotted against the difference in their summary scores (Bland and Altman, 1986) to make pairwise comparisons between raters. This illustrates graphically whether the summary scores are rated consistently, how well the raters agree on average and what the limits of agreement are. Figure 3 shows the (dotted) line of mean difference that indicates whether one rater consistently rates higher (or lower) than the other and the spread of data points about the line of mean difference that illustrates the variability in agreement between the raters.
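The quantities behind a Bland and Altman plot can be sketched as follows. The function name and the example scores are hypothetical. The difference is taken as second rater minus primary rater, matching the sign convention used in the Results, and the limits of agreement are taken as the mean difference plus or minus 1.96 standard deviations of the differences, which is the usual convention but is not stated explicitly in the text.

```python
from statistics import mean, stdev


def bland_altman(primary, second):
    """Return per-pair averages and differences, the mean difference,
    and the (lower, upper) limits of agreement for two raters."""
    averages = [(p + s) / 2 for p, s in zip(primary, second)]
    differences = [s - p for p, s in zip(primary, second)]  # second minus primary
    mean_diff = mean(differences)
    sd_diff = stdev(differences)
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return averages, differences, mean_diff, limits


# Invented summary scores from two raters for four sessions.
primary_scores = [1.0, 2.0, 1.5, 2.5]
second_scores = [0.5, 1.5, 1.5, 1.5]

avgs, diffs, bias, (lower, upper) = bland_altman(primary_scores, second_scores)
```

Plotting `diffs` against `avgs`, with horizontal lines at `bias`, `lower` and `upper`, gives the kind of display described for Figure 3; a negative `bias` means the second rater scored lower overall.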
| Results |
|---|
Five hundred and sixty-four of 774 (72.8%) clients had at least one video: 76.1% of 443 were for MET and 68.6% of 331 were for SBNT. Four hundred and fifty-two clients (58.4% of 774) had a rateable video: 259 were for MET and 193 were for SBNT. One hundred and twelve clients had at least one unrateable video—a total of 160 videos were unrateable: 101 for MET and 59 for SBNT. Thus videos for SBNT were rarer but the proportion of rateable videos in SBNT and MET was the same. Selection of videos was successful in capturing a spread of equivalent proportions across the sessions (see Table 1).
Construct validity
Principal Components Analysis of treatment-specific therapist task and style items showed a dominant eigenvalue of 5.13 accounting for 26% of the variance (see Figure 1). Although the scree plot in Fig. 1 hints that there could be a second factor, there is no evidence that this is more than random variation. The single factor solution provides an adequate characterization of the data.
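The reported share of variance can be checked with a short calculation, assuming that the analysis was run on the correlation matrix of all 20 items (an assumption, since the text does not say so). In that case each standardized item contributes one unit of variance, so the total variance equals the number of items $p$, and the share explained by the first component with eigenvalue $\lambda_1$ is:

```latex
\[
\frac{\lambda_1}{p} = \frac{5.13}{20} \approx 0.257 \approx 26\%
\]
```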
Eighteen treatment-specific items had a loading >0.25: all nine of the originally hypothesized SBNT items and all but two of the originally hypothesized MET items. All MET items had positive loadings and all SBNT items had negative loadings, suggesting a treatment component where the more MET was practiced, the less SBNT was practiced. Results of the Principal Components Analysis are shown in Table 2.
Concurrent validity
Global ratings of the quality of therapists' treatment delivery were provided in three categories (high, medium and low quality) by the two treatment-specific supervisors and compared with quality summary ratings made by the primary rater for the whole sample. The magnitude of ratings between the primary rater and the supervisors showed concurrence in which rater-derived scores were highest for those in the supervisors' high category and lowest in the supervisors' low category. Analysis of variance revealed a significant difference in quality ratings given by the primary rater for therapists with global ratings of high, medium and low provided by the treatment supervisors (see Table 3).
Criterion validity
Table 4 shows mean frequency scores for the treatment-specific items for both treatments. There is a significant difference between frequency ratings in each of the treatments with a higher rating in each case for the treatment for which the item was designed.
SBNT and MET frequency summary scores by treatment group are represented in Figure 2. Mean scores for frequency of MET items were significantly higher (P < 0.001) in MET sessions (MET item mean = 1.3) than in SBNT sessions (MET item mean = 0.4; 95% CI for the difference = 0.91–1.02). Mean scores for frequency of SBNT items were significantly higher in SBNT sessions (SBNT item mean = 1.4) than in MET sessions (SBNT item mean = 0.4; 95% CI for the difference = –1.10 to –0.93).
As quality scores were only given if the item was given a frequency rating of 1 or more (that is, if the item was rated as having occurred), some items had very low numbers of quality ratings, particularly for the treatment to which those items were not attributed. Items with 10 or more quality ratings were included in the analysis. Of the 13 items with sufficient data, 3 of 7 MET items showed a significantly higher quality score for MET than for SBNT. Six of the seven SBNT items had significantly higher ratings of quality for SBNT than for MET (see Table 5).
Figure 2 also shows SBNT and MET quality summary scores by the randomized group. Where SBNT quality ratings are given, the quality is rated significantly higher in SBNT treatment (mean summary score 2.4) than in MET treatment (mean = 1.9; 95% CI for the difference = –0.61 to –0.30). Where MET quality ratings are given, they are higher in MET treatment (mean = 2.5) than in SBNT treatment (mean = 2.4; 95% CI for the difference = 0.1–0.2).
Reliability
Item analysis was conducted separately for frequency of MET items and for frequency of SBNT items producing Cronbach's alpha of 0.71 for MET items and 0.76 for SBNT items. Item-total correlations are given in Table 4.
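For readers unfamiliar with Cronbach's alpha, the standard formula is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j s_j^2}{s_T^2}\right)$, where $k$ is the number of items, $s_j^2$ the variance of item $j$ and $s_T^2$ the variance of the total score. A minimal sketch follows; the function name and the ratings are invented for illustration and do not reproduce the study data or its SPSS output.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of sessions, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented frequency ratings: five sessions by three items.
ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [0, 1, 0],
    [3, 4, 3],
    [2, 3, 2],
]

alpha = cronbach_alpha(ratings)
```

Values closer to 1 indicate that the items vary together across sessions, which is what the reported values of 0.71 and 0.76 summarize for the MET and SBNT frequency items.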
Inter-rater reliability as measured by the ICC is reported in Table 4. The items generally show high values of ICC, indicating good levels of consistency between raters, as the majority of the variation is attributable to the clients rather than the raters. The SBNT item, active agent for change, shows a low level of consistency with an ICC of 0.28, while the MET items creating conflict, exploration of feelings and empathy show moderate levels of consistency with ICCs of 0.45, 0.51 and 0.60, respectively.
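The ICC used here (Shrout and Fleiss Case 3, reliability of the mean of several ratings) can be computed from a two-way analysis of variance as (BMS − EMS) / BMS, where BMS is the between-targets mean square and EMS the residual mean square. The sketch below is illustrative only: the function name and the ratings are assumptions, and it expects a complete table with every session scored by every rater.

```python
def icc_3k(ratings):
    """ICC(3,k): two-way mixed effects, consistency, average of k raters.

    `ratings` is a list of rows, one per rated session, each holding
    the scores given by the same k raters in the same order.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between sessions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / bms


# Invented frequency ratings: four sessions scored by two raters.
example_ratings = [[1, 1], [2, 3], [4, 3], [0, 1]]
icc = icc_3k(example_ratings)
```

Because rater effects are removed as a separate source of variation, a rater who is consistently one point higher than another still yields an ICC of 1; systematic offsets of that kind are what the Bland and Altman comparison is meant to reveal.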
Agreement for individual items is relatively high, whereas there is much more variability in agreement between raters when summary scores for frequency are compared (see Figure 3). Comparisons between the primary rater and second rater show more disagreement than the other figures. A positive difference indicates that the second rater scores higher than the primary rater; a negative difference indicates that the second rater scores lower than the primary rater. For example, the mean difference of –0.58 for the MET frequency summary score in Figure 3 indicates that the second rater gave lower ratings overall than the primary rater.
| Discussion |
|---|
The scale developed for process rating the delivery of MET and SBNT in the UK Alcohol Treatment Trial is able to accurately detect components of each of the treatments and to discriminate between them. The scale detected that the randomized treatment was delivered as planned and that characteristics of the other treatment were either missing or were delivered infrequently.
The strengths of the study included the absence of sampling bias suggested by the similar proportions of available videos and the same proportion of rateable videos for the two treatments. Furthermore, video recordings were available for 73% of the study clients. Of those sampled on the basis of one video from each client, 160 were unrateable. When a video was found to be unrateable, a further video was randomly selected from the same therapist and the same client where available. The resulting sample was large for this kind of study. Quality assurance of the rating process was equally rigorous.
Principal Components Analysis produced a single factor (labelled UKATT treatment) accounting for 26% of the variance. MET items had positive loadings >0.25, with the exception of creating conflict and commitment to goal (these items are deemed suitable for rejection or modification), and all SBNT items had negative loadings. This supports the conclusion that therapists delivered either MET or SBNT; the scree plot gave no evidence of a second factor.
Relatively low frequency ratings were found for four of the MET items (self-efficacy for change, commitment to change, creating conflict and ambivalence) and four of the SBNT items (homework, alternative activities to drinking, active agent for change and collaboration). It is possible that therapists performed some aspects of these treatments infrequently. That is, the scale did detect these aspects of the two treatments when they occurred but they did not occur very frequently. Alternatively it is possible that the therapists performed these components of the two treatments but the scale did not accurately measure performance. Reliability analysis enables us to determine which explanation is more likely. As inter-rater reliability was high we can assume that these items were clearly specified and functional, leading us to question the level of performance of the specific component of treatment.
The scale is also able to distinguish SBNT and MET on the grounds of quality ratings. Six out of seven SBNT items with sufficient data for analysis were found to have significantly higher quality ratings in SBNT than in MET, and two out of the seven MET items with sufficient data for analysis were found to have significantly higher quality ratings in MET than in SBNT. Summary MET quality scores are not significantly different in the two treatment conditions. It is possible either that the manual guidance given for rating quality is better able to detect quality of SBNT delivery or that the treatments have a different number of components that are both essential and unique to that treatment. For example, SBNT is likely to contain more essential items that are also unique than does MET; the latter, more established treatment was designed to incorporate basic principles of a widely used, evidence-based counselling style, namely motivational interviewing, whose characteristic style of delivery is more likely to be practiced across different treatment types without adversely affecting the integrity of the treatment. It may also be relevant that one of the qualifying criteria for admission to train as a therapist in the trial was a demonstrable ability to talk to clients in the style of motivational interviewing.
To assess the validity of the scale in measuring the quality of delivery, summary quality scores were compared with global ratings of the quality of therapists' practice provided by the treatment supervisors. Supervisors' ratings were general ratings of performance of therapists across the whole of the treatment trial and the primary ratings were derived from observation of individual sessions. Global ratings are likely to focus on one aspect of practice, instead of the detailed rating of the delivery of all session components. However, they are in the right direction and the general agreement in these sets of ratings suggests that the UKATT PRS is able to measure the quality of therapists' delivery of the two treatments.
Measurement of consistency between the three independent raters for individual frequency items is relatively high and comparable for MET and SBNT sessions. This suggests that the scale is a reliable measure of components of the two treatments. Agreement between the three raters for summary scores is variable.
Five items, elicit self-efficacy, commitment to drinking goal, create conflict (dissonance), elicit commitment to change drinking and active agent for change, could be removed from the scale on the basis of their low frequency ratings, their low corrected item total correlations and low loadings on the treatment factor in Principal Components Analysis.
| Conclusion |
|---|
The UKATT PRS is a valid and reliable method of rating the delivery of two psychosocial treatments for alcohol problems and dependence and identifying which one is being delivered. It is likely to be adaptable to rating the delivery of other psychosocial treatments by applying the same principles used in its development. It can therefore form the basis of measuring performance and treatment fidelity in clinical trials involving MET and SBNT, in treatment audit and in routine supervision of practice.
| ACKNOWLEDGEMENTS |
|---|
We are grateful to Veronica Morton for her assistance with statistical analysis and for the contribution to this study of the UKATT Research Team whose names and affiliations are listed in the acknowledgements in the cited paper, UKATT Research Team (2001).
| References |
|---|
Barber JP, Foltz C, Crits-Christoph P, et al. Therapists' adherence and competence and treatment discrimination in the NIDA Collaborative Cocaine Treatment Study. J Clin Psychol (2004) 60:29–41.
Barber JP, Mercer D, Krakauer I, et al. Development of an adherence/competence rating scale for individual drug counselling. Drug Alcohol Depend (1996) 43:125–32.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet (1986) i:307–10.
Carroll KM, Connors GJ, Cooney NL, et al. Internal validity of Project MATCH treatments: discriminability and integrity. J Consult Clin Psychol (1998b) 66:290–303.
Carroll KM, Kadden RM, Donovan DM, et al. Implementing the treatment and protecting the validity of the independent variable in treatment matching studies. J Stud Alcohol (1994) 55(Suppl. 12):149–55.
Carroll KM, Nich C, Rounsaville BJ. Utility of therapist session checklists to monitor delivery of coping skills treatment for cocaine abusers. Psychother Res (1998a) 8:307–20.
Carroll KM, Nich C, Sifry RL, et al. A general system for evaluating therapist adherence and competence in psychotherapy research in the addictions. Drug Alcohol Depend (2000) 57:225–38.
Carroll KM, Nuro KF. The Technology Model: An Introduction to Psychotherapy Research in Substance Abuse (1996) Yale: Yale University Psychotherapy Development Center. Training Series No. 1.
Carroll KM, Rounsaville BJ. Can a technology model be applied to psychotherapy research in cocaine abuse treatment? In: Onken LS, Blaine JD, eds. Psychotherapy and Counseling in the Treatment of Drug Abuse (1990) NIDA Research Monograph Series, Number 104. Rockville, MD: NIDA, 91–104.
Copello A, Orford J, Hodgson R, et al. on behalf of the UKATT Research Team. Social behaviour and network therapy. Basic principles and early experiences. Addict Behav (2002) 27:345–66.
Elkin I, Parloff MB, Hadley SW, et al. NIMH treatment of depression collaborative research program. Arch Gen Psychiatry (1985) 42:305–16.
Elkin I, Shea T, Watkins JT, et al. NIMH treatment of depression collaborative research program: general effectiveness of treatment. Arch Gen Psychiatry (1989) 46:971–82.
Fleiss JL. Statistical Methods for Rates and Proportions (1981) 2nd edn. New York: Wiley. 38–46.
Hill CE, O'Grady KE, Elkin I. Applying the collaborative study psychotherapy rating scale to rate therapist adherence in cognitive behavior therapy, interpersonal therapy and clinical management. J Consult Clin Psychol (1992) 60:73–9.
Kazdin AE. Methodology, design and evaluation in psychotherapy research. In: Bergin AE, Garfield SL, eds. Handbook of Psychotherapy and Behavior Change (1994) 4th edn. New York: John Wiley. 19–71.
Miller WR, Mount KA. A small study of training in motivational interviewing: does one workshop change clinician and client behavior? Behav Cogn Psychother (2001) 29:457–71.
Miller WR, Zweben A, DiClemente CC, et al. Motivational Enhancement Therapy Manual: A Clinical Guide for Therapists Treating Individuals with Alcohol Abuse and Dependence (1995) NIAAA Project MATCH monograph series, Vol. 2. USA: Rockville.
Moncher FJ, Prinz RJ. Treatment fidelity in outcome studies. Clin Psychol Rev (1991) 11:247–66.
Moyers T, Martin T, Catley D, et al. Assessing the integrity of motivational interviewing interventions: reliability of the Motivational Interviewing Skills Code. Behav Cogn Psychother (2003) 31:177–84.
Project MATCH Research Group. Matching alcoholism treatments to client heterogeneity: Project MATCH posttreatment drinking outcomes. J Stud Alcohol (1997) 58:7–29.
Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull (1979) 86:420–7.
Strang J, McCambridge J. Can the practitioner correctly predict outcome in motivational interviewing? J Subst Abuse Treat (2004) 27:83–8.
Tober G, Godfrey C, Parrott S, et al. on behalf of the UKATT Research Team. Setting standards for training and competence: the UK Alcohol Treatment Trial. Alcohol Alcohol (2005) 40:413–8.
UKATT Research Team. United Kingdom Alcohol Treatment Trial (UKATT): hypotheses, design and methods. Alcohol Alcohol (2001) 36:11–21.
UKATT Research Team. Effectiveness of treatment for alcohol problems: findings of the randomised UK Alcohol Treatment Trial. Br Med J (2005a) 331:541–4.
UKATT Research Team. Cost-effectiveness of treatment for alcohol problems: findings of the randomised UK Alcohol Treatment Trial (UKATT). Br Med J (2005b) 331:544–8.
Waltz J, Addis ME, Koerner K, et al. Testing the integrity of a psychotherapy protocol: assessment of adherence and competence. J Consult Clin Psychol (1993) 61:620–30.
Waskow IE. Specification of the technique variable in the NIMH Treatment of Depression Collaborative Research Program. In: Williams JB, Spitzer RL, eds. Psychotherapy Research (1984) New York: Guilford Press. 150–9.