
J Clin Epidemiol Vol. 46, No. 9, pp. 959-971, 1993. Printed in Great Britain. All rights reserved.

0895-4356/93 $6.00 + 0.00. Copyright © 1993 Pergamon Press Ltd

RANDOMIZED DISCONTINUATION TRIALS: UTILITY AND EFFICIENCY

JACEK A. KOPEC,* MICHAL ABRAHAMOWICZ† and JOHN M. ESDAILE‡

Department of Epidemiology and Biostatistics, and Divisions of Rheumatology and Clinical Epidemiology, Department of Medicine, Montreal General Hospital, McGill University, Montreal, Quebec, Canada H3G 1A4
(Received in revised form 14 April 1993)

Abstract: The randomized discontinuation trial (RDT) is a two-phase trial. In phase I all patients are openly treated with the medication being evaluated. In phase II, those who have responded are randomly assigned to continue the same treatment or switch to placebo. Usually, non-compliers and adverse reactors identified in phase I are excluded from phase II. To investigate the value of this design, we reviewed the advantages and limitations of discontinuation studies, and compared the RDT design to the classic randomized clinical trial design in terms of clinical utility and efficiency (sample size). A computer model was used to study the efficiency of the two designs under a broad range of assumptions. The RDT design is particularly useful in studying the effect of long-term, non-curative therapies, especially when the clinically important effect is relatively small, and the use of placebo should be minimized for ethical or feasibility reasons. However, its use is limited if the objective of an investigation is to estimate the magnitude of absolute treatment effects and toxic effects in the source population, or to evaluate a potentially curative treatment. Our results indicate that selecting responders prior to randomization has a very strong effect on the relative efficiency of the trial. Further improvement may be achieved by excluding non-compliers and adverse reactors. Under the assumptions tested in our model, the sample size required in phase II of an RDT was only 20-50% of that in a classic trial.

Keywords: Epidemiology; Efficiency; Statistics; Clinical trials; Research design; Sample size

*Research fellow of the National Health Research and Development Program of Canada.
†Research Scholar of the Montreal General Hospital Research Institute.
‡Senior Research Scholar of the Fonds de la recherche en Sante du Quebec and Visiting Professor of Medicine, Department of Rheumatology and Immunology, Harvard Medical School, Brigham and Women's Hospital, Robert B. Brigham Multipurpose Arthritis and Musculoskeletal Disease Center, Boston, MA (grants AR 36308, AI 07306 and AR 07530). Requests for reprints should be addressed to: Dr John M. Esdaile, Division of Clinical Epidemiology, Montreal General Hospital, Montreal, Quebec, Canada H3G 1A4.

INTRODUCTION

The methods of randomized discontinuation (withdrawal) trials (RDT) have been used in clinical research for two decades. The design was first described by Amery and Dony [1] in 1975, as a method of minimizing the duration and degree of patient exposure to placebo in drug efficacy studies. In the RDT, in contrast to the classic randomized clinical trial (RCT), only those patients who appear to improve when treated with the medication under study (the responders) are selected for the second, randomized phase (Fig. 1). The responders can be identified in two ways. One approach is to give the treatment to all eligible patients for a short period of time prior to randomization (the so-called run-in period). Alternatively, the response among patients
treated with the medication of interest in normal clinical practice, sometimes for many years, is evaluated retrospectively. During the open phase, non-compliers and those developing serious adverse reactions are usually excluded. In the second phase, which resembles a traditional, double-blind randomized controlled trial, the responders are randomly assigned to continue the same treatment or switch to placebo. After a suitable period of time, the two groups are compared in terms of response, or relapse rates. Quitkin and Rabkin [2] discussed the utility of this design in clinical research in psychiatry and suggested that it may be particularly suitable for studying new drugs. They also stated that the design is less likely to result in type II errors (false acceptance of the null hypothesis) than the classic design. Friedman et al. [3, p. 45] pointed out that because a highly selected sample is evaluated, this design can overestimate benefit and underestimate toxicity. These authors also observed that discontinuation trials should adhere to the same general standards as other designs. Recently, the concept of excluding non-responders, non-compliers and adverse reactors prior to randomization has been discussed by Knipschild et al. [4], in a broader context of dealing with certain common problems in the design, analysis and ethics of clinical trials. These authors suggested that a prerandomization qualification period can be used to identify suitable baseline characteristics in the admitted patients, adjust the dose of medication, remove placebo responders, determine special therapeutic stratifications, and improve the ethics of informed consent. The RDT design has been applied to assess the efficacy of a number of medical interventions [5-33]. Yet compared to the classic design, it has been used relatively infrequently, and only recently has it gained some popularity.
Apart from its application in efficacy trials, the RDT approach has been employed to investigate the optimal duration of therapies known to be effective [34-39], and to assess the so-called rebound effect after treatment withdrawal [40-44]. In this paper we are concerned with the use of the RDT design in studies of efficacy, where it can often be considered an alternative to the classic RCT. In the first part, we review the advantages and limitations of this design, with an emphasis on its clinical utility. Next, we compare the RDT design with the classic, one-phase RCT in terms of efficiency. Technical details, including the formulas for sample size calculation, are provided in the Appendix.

CLINICAL UTILITY

[Figure 1: flow diagram. Phase I: source population, with exclusions (no response, no compliance, toxicity). Phase II: randomization (R) to the same treatment or to placebo, each followed by relapse or no relapse.]

Fig. 1. Schematic representation of the randomized discontinuation trial (RDT) design.

Due to the selection of responders prior to randomization, the difference in response rates between the treatment and placebo groups observed in phase II of a discontinuation trial cannot be generalized to the source population of patients with a given condition, eligible for phase I. Intuitively, this difference will generally be larger than that observed in a classic trial conducted in the same source population. (Later, we shall discuss conditions under which this is true.) This apparent selection bias has been viewed by some authors as a major limitation of the RDT design [3]. However, others have argued that those subjects who appear to respond to initial, open treatment are more likely to receive the medication of interest in normal clinical practice [1]. Therefore, the effect observed in this group may be more meaningful. For example, in terms of day-to-day decision making, knowing that 80% of patients with rheumatoid arthritis who seem to respond to initial treatment with a new drug may be expected to improve in the long run (as opposed to, say, 40% on placebo), is at least as interesting to a clinician as knowing what proportion of all patients with rheumatoid arthritis are likely to improve (say, 30% on the drug and
20% on placebo), even if the number of initial responders is relatively low. Even though the quantitative result from an RDT does not generally apply to the same population as the result from a classic trial, the qualitative result (i.e. whether or not the treatment has any effect at all) does. A statistically significant difference in response rates in phase II provides evidence that there are some patients in the source population for whom the treatment is effective. In that sense, the RDT design and the classic randomized trial design can, under very broad assumptions, be regarded as alternative methods of addressing the same question. Furthermore, under some specific assumptions to be discussed later, the estimate of relative effect (response rate ratio) obtained in a discontinuation study could also be generalized to the source population.

There are a number of limitations in the application of discontinuation studies that do not apply to classic, one-phase randomized clinical trials. The RDT design cannot be used to evaluate a definitive irreversible treatment, such as a surgical intervention. In studying other forms of therapy, a negative result of an RDT may be due to a permanent, curative effect of treatment in a substantial number of subjects, rather than the lack of any clinically significant effect. A negative result may also occur if the medication under study has a short-term beneficial effect but fails to provide an effective maintenance therapy. Although for many diseases and treatments, particularly drug therapies in most chronic conditions, both situations seem unlikely, these alternative explanations of a negative result should always be kept in mind. A positive response to treatment initiation in a symptomatic patient may involve a different biological mechanism than a relapse observed after treatment withdrawal.
In particular, if the withdrawal of a given medication is likely to produce a strong rebound effect, distinguishing between this effect and reemergence of the natural disease process may be difficult. For example, in a study of low-dose prednisolone in rheumatoid arthritis [45], a flare occurring after sudden discontinuation of therapy was interpreted as demonstrating low-dose steroid efficacy. However, others argued that the results were more likely to reflect steroid withdrawal symptoms [46]. Problems of this type may often be avoided by tapering the dose of the drug as opposed to abruptly withdrawing the medication. Nevertheless, some understanding of the
biological mechanism involved may be helpful in interpreting the results of discontinuation trials. Quantitative assessment of treatment toxicity in the randomized phase of a discontinuation study is limited, since most subjects prone to adverse reactions are eliminated in phase I. As a result, the rates of adverse reactions in the treatment and placebo groups, as well as the difference between them, will generally be lower than in the source population. On the other hand, the observed rates will apply to those patients who are most likely to receive the treatment over a long period of time. It has been suggested [47,48] that effective blinding in the randomized phase of a trial may be more difficult to achieve if the patients have already been exposed to the active treatment or placebo. In most studies which used the RDT design the effectiveness of blinding was not reported, but at least in one study [31] the authors found that blinding was successful. Nevertheless, the hypothetical risk of unblinding may explain the reluctance on the part of some researchers to use this design. The RDT design may sometimes be applicable in situations where the classic randomized trial might be considered unfeasible for ethical or logistic reasons. The design allows all patients to obtain any beneficial effect that might result from the initial treatment, thus improving cooperation from both patients and physicians. A good example is a study of hydroxychloroquine in systemic lupus erythematosus [31]. Although prior to that study the drug had never been conclusively proved to be effective for this disorder, it had been widely used by physicians and was believed to have a beneficial effect. At this stage, randomizing patients with symptomatic systemic lupus erythematosus to active drug versus placebo was unacceptable to many physicians and patients. However, neither patients nor physicians objected to a random withdrawal of treatment among the responders. 
A randomized discontinuation trial was then conducted to demonstrate the efficacy of the drug. The design could prove particularly valuable in testing new drugs, as suggested by previous methodological studies [1, 2]. In early stages of drug evaluation, open treatment is routinely used to establish the optimal dose level and to look for obvious side effects. In a selected group of responders, the initial treatment could be followed by a randomized discontinuation phase. Such an approach might provide strong evidence of drug efficacy at a relatively low cost. It should be stressed, however, that the possibility of a curative effect or a short-term effect (when the results are negative), and withdrawal syndrome (when the results are positive) should always be taken into account when discontinuation studies are used to evaluate new drugs.

EFFICIENCY

Since selecting patients who are more likely to respond to a given treatment tends to increase the difference in response rates, it will also improve the efficiency of the trial. Thus a discontinuation trial conducted among initial responders may be expected to require a smaller sample size than a classic trial of the same power, carried out in the general population of patients. Hallstrom et al. [49] studied the relative efficiency of a modified RDT design, referred to as a predosed design, employed in the Cardiac Arrhythmia Suppression Trial [28]. These authors concluded that selecting responders may have a very significant impact on the required sample size and strongly recommended the use of this design. The efficiency of a trial may be further improved by identifying and eliminating potential non-compliers and adverse reactors prior to randomization. The effect of non-compliance on the required sample size was studied by Schork and Remington [50] and Halperin et al. [51]. Probstfield [47], Lang [52] and Knipschild et al. [4] discussed the role of prerandomization screening procedures in improving compliance in clinical trials. Brittain and Wittes [48] investigated the effect of misclassifying subjects with respect to compliance during a run-in period on the efficiency of a trial. These authors concluded that the run-in period is most effective when there is a high proportion of poor compliers and a low rate of misclassification. In a discontinuation trial, patients who do not comply with the treatment or experience serious adverse reactions are usually eliminated during the initial open phase. This should improve the efficiency of the study over and above what has been achieved by excluding non-responders. It should also reduce the bias in the estimate of treatment effect among those patients who initially respond to the treatment.

General model

We used a computer model to compare the sample size required in a discontinuation trial and in a classic one-phase trial, under equivalent assumptions. The following factors that might affect the sample size have been taken into account: (1) true response rates on placebo and on active treatment in the source population; (2) frequency of non-compliance and adverse reactions in that population; (3) accuracy of identifying three categories of patients in the initial phase of the trial: responders, compliers and tolerators (those who do not experience serious adverse reactions). For the comparison between the two designs to be valid, both trials must refer to the same source population of patients, representing the domain of interest in terms of standard eligibility criteria, such as the diagnosis, treatment indications and contra-indications, demographic characteristics, etc. The difference is that in the classic trial all subjects who agree to participate are randomized to receive active treatment versus placebo, whereas in the discontinuation trial only those deemed eligible after the initial open treatment are randomized. Our analysis is restricted to trials in which treatment efficacy is measured against placebo rather than against another active intervention, and the outcome of interest is defined as a dichotomous response to treatment. It may be noted that this model applies not only to discontinuation trials but to any trial in which subjects are selected based on their initial response. The results presented in this paper are obtained under some additional assumptions. First, we assume that the treatment does not have any permanent, curative effect. In other words, patients who respond to active treatment will relapse in phase II if the treatment is discontinued. We refer to these patients as active responders. Furthermore, we define spontaneous responders as patients who improve, and remain in remission, independently of the treatment they are given. 
We assume that the proportion of spontaneous responders on active treatment is equal to the proportion of placebo responders in the same population (i.e. placebo response is not prevented by active medication). Finally, the accuracy of identifying active responders in phase I is assumed to be the same as the accuracy of identifying spontaneous responders. Although departures from these assumptions may occur in actual trials, they are
difficult to predict given the paucity of empirical data on the natural history of many chronic conditions. We shall examine some effects of serious violations of these assumptions in a later section of this paper. The relative sample size (RSS) is calculated as the ratio of the sample size required in a discontinuation trial to that in a classic trial. Thus the relative efficiency of the RDT design can be defined as 1/RSS. The sample sizes are based on the standard, approximate formula for a difference between two proportions [53]. The proportions of responders (response rates) on active treatment and placebo in the randomized phase of a trial, denoted as $r$ and $r_0$, respectively, are calculated using the formulas derived in the Appendix. In all sample size calculations we take the probability of type I error $\alpha = 0.05$ (two-sided), and the probability of type II error $\beta = 0.20$ (80% power).
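
As a rough illustration of the sample size calculation described above, the sketch below uses one common form of the normal-approximation formula for comparing two proportions. The text cites [53] without reproducing the formula at this point, so the exact expression, the function names `n_per_group` and `relative_sample_size`, and the example rates are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of subjects per arm to detect p1 versus p2.

    Assumed formula (a common normal approximation, not quoted from [53]):
    n = (z_a*sqrt(2*pbar*qbar) + z_b*sqrt(p1*q1 + p2*q2))**2 / (p1 - p2)**2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_b = NormalDist().inv_cdf(power)          # power = 1 - type II error
    pbar = (p1 + p2) / 2
    num = z_a * sqrt(2 * pbar * (1 - pbar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (num / (p1 - p2)) ** 2


def relative_sample_size(r, r0, R, R0):
    """RSS: sample size for phase II rates (r, r0) over classic-trial rates (R, R0)."""
    return n_per_group(r, r0) / n_per_group(R, R0)


# Hypothetical classic trial: 30% response on drug, 20% on placebo.
n_classic = n_per_group(0.30, 0.20)
```

The relative efficiency is then simply the reciprocal of the value returned by `relative_sample_size`.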
Effect of excluding non-responders

It is useful to consider a situation in which the rates of non-compliance and adverse reactions in the source population are zero and there is perfect agreement between the response observed in phase I and phase II. Under these assumptions the response rate in the treatment group of an RDT will be $r = 1.0$, or 100% (since only responders are selected for phase II), and the response rate in the placebo group will be $r_0 = R_0/R$, where $R_0$ and $R$ are, respectively, the response rates on placebo and active treatment in the source population. Note that the difference $r_d = 1 - R_0/R$ is larger than the difference $RD = R - R_0$ observed in a classic trial, if $1 > R > R_0$.

The assumption of a perfect agreement between treatment response in phase I and phase II is unrealistic. Published data indicate that in most trials there will be some patients who appear to respond to the initial open treatment, but turn out to be non-responders in the second randomized phase [6, 7, 9-16, 18, 21-23, 26, 27, 30-33]. If the results of phase II are considered a gold standard, those patients who appear to respond in phase I but fail to respond in phase II may be referred to as false responders. Similarly, some of the true responders may be misclassified as non-responders in phase I and excluded from the randomized phase of the trial. Thus it may be useful to treat the initial assessment of response like a diagnostic test with certain sensitivity and specificity, relative to the results that would have been observed in phase II had all subjects been admitted to that phase. Then the expected absolute effect of treatment in phase II can be calculated as (see the Appendix):

$$r_d = r - r_0 = (R - R_0)\,\frac{SE_r}{(SE_r)(R) + (1 - SP_r)(1 - R)}$$

where $SE_r$ and $SP_r$ are the sensitivity and specificity of identifying the responders in phase I. Both $SE_r$ and $SP_r$ are treated in a deterministic rather than stochastic fashion. It can be shown that $r_d$ is greater than $RD$ if $SE_r > 1 - SP_r$, i.e. as long as selecting responders on the basis of results obtained in phase I is better than random selection. However, if $RD = 0$ then $r_d = 0$ as well. This demonstrates that qualitatively the assessment of treatment efficacy in an RDT is valid for the source population (as we stated previously), and is practically always more efficient than in the classic trial. Quantitatively, the absolute effect (rate difference) can only be generalized to the population of initial responders. The relative effect (rate ratio), on the other hand, remains the same as in the source population as long as the assumption of equal selection probabilities for active and spontaneous responders holds. The rate ratio will be inflated if active responders are selected with higher probability than spontaneous responders, and it will be reduced if the reverse is true.

The exclusion of non-responders, even if not complete, has a very strong effect on the relative efficiency of the trial. The gain in efficiency is particularly dramatic for small treatment effects and low placebo response rates (Fig. 2). For example, assuming 80% sensitivity and 80% specificity of the criteria for identifying responders, the sample size in a discontinuation trial may be only 30% of that in a classic trial. Figure 3 shows that improving the specificity of these criteria may significantly reduce the relative sample size, whereas the sensitivity of selecting responders has much less effect. This is understandable, given that the proportion of false responders among those admitted to the second phase depends on the specificity of selecting responders to a much greater extent than on its sensitivity.
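
The effect of selecting responders can be sketched numerically. The code below is a minimal illustration, assuming zero non-compliance and adverse reactions. The split of the phase II rate difference into separate treatment and placebo rates is inferred from the stated assumption that active and spontaneous responders are selected with the same sensitivity; it is not copied from the Appendix. The sample size formula is a common normal approximation assumed for illustration, and all function names and example rates are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size, common normal approximation (assumed, not from [53])."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    num = z_a * sqrt(2 * pbar * (1 - pbar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (num / (p1 - p2)) ** 2


def phase2_rates(R, R0, se, sp):
    """Expected phase II response rates (treatment, placebo).

    R, R0 : response rates on active treatment and placebo in the source population
    se, sp: sensitivity and specificity of identifying responders in phase I
    """
    # Fraction of phase I patients labeled responders (true plus false responders).
    admitted = se * R + (1 - sp) * (1 - R)
    r = se * R / admitted    # all selected true responders respond on treatment
    r0 = se * R0 / admitted  # only spontaneous responders respond on placebo
    return r, r0


def relative_sample_size(R, R0, se, sp):
    """Phase II sample size of an RDT relative to a classic trial (RSS)."""
    r, r0 = phase2_rates(R, R0, se, sp)
    return n_per_group(r, r0) / n_per_group(R, R0)


# Hypothetical scenario: placebo rate 0.2, relative response rate 1.5,
# 80% sensitivity and 80% specificity.
rss = relative_sample_size(R=0.30, R0=0.20, se=0.8, sp=0.8)
print(f"relative sample size: {rss:.0%}")
```

With perfect selection (`se = sp = 1`) the sketch reproduces the limiting case of a 100% response rate on treatment, and with random selection (`se = 1 - sp`) the phase II rate difference equals the source-population difference, so nothing is gained.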
[Figure 2: plot of relative sample size (%) against the reference (placebo) response rate, with one curve for each relative response rate (RR).]

Fig. 2. Sample size required in an RDT relative to that in a classic RCT (in %), for three different relative response rates in the source population: 1.2, 1.5 and 2.0. The frequencies of non-compliance and adverse reactions are assumed to be zero, and the sensitivity and specificity of identifying responders are assumed to be 0.8.

[Figure 3: plot of relative sample size (%) against the reference (placebo) response rate, with curves for different combinations of SE (sensitivity of identifying responders) and SP (specificity of identifying responders), including SE = 0.9, SP = 0.7 and SE = 0.7, SP = 0.9.]

Fig. 3. Sample size required in an RDT relative to that in a classic RCT (in %), for different sensitivities and specificities of identifying responders. The relative response rate is assumed to be 1.5, and the frequencies of non-compliance and adverse reactions are assumed to be zero.

Non-compliance and adverse reactions

[Figure 4: plot of relative sample size (%) against the reference (placebo) response rate, with curves for NR excluded, NR + AR excluded, NR + NC excluded, and NR + AR + NC excluded.]

Fig. 4. Effect of excluding non-compliers (NC), adverse reactors (AR) or both, in addition to non-responders (NR), on the sample size required in an RDT relative to a classic RCT (in %). The relative response rate is assumed to be 1.5, the frequencies of non-compliance and adverse reactions in the source population are both assumed to be 0.2 and the sensitivities and specificities of identifying responders, compliers and tolerators are all assumed to be 0.8.

In this section we examine the effect of excluding potential non-compliers and dropouts due to adverse reactions on the relative efficiency of the discontinuation design. To avoid an excessive number of arbitrary parameters, the following additional assumptions will be made: (1) response rate in the placebo group is not affected by non-compliance (no contamination of the control group); (2) response rate among spontaneous responders (in the treatment group) is not affected by non-compliance; (3) adverse reactions are equally common in the treatment and placebo groups; (4) withdrawals due to adverse reactions are not included in the final analysis; (5) responders, compliers and tolerators are selected independently in phase I. We recognize that in many trials some of these assumptions may be violated to a varying degree. However, both theoretical considerations and the results of our
sensitivity analysis (data not provided) suggest that these violations are unlikely to have any substantial effect on the relative efficiency of the RDT design. For the sake of clarity, more technical aspects of the model are discussed in the Appendix. If the frequency of non-compliance and the frequency of adverse reactions are similar, excluding non-compliers has a somewhat stronger effect on sample size than excluding adverse reactors (Fig. 4). When both frequencies are assumed to be 0.2, excluding these two groups of individuals with 80% sensitivity and 80% specificity may reduce the relative sample size by more than 30%. Given that non-responders have already been excluded with a fairly good accuracy, the sample size required in a discontinuation trial may be less than 20% of that in a classic trial of similar power (Fig. 4). As the rates of non-compliance and adverse reactions in the source population increase, the sample size gets smaller relative to that in a classic trial (even though the actual number of subjects required for each type of study will, of course, increase). For example, if both rates were 0.3, the relative sample size could be reduced by as much as 50-60%.
Departures from assumptions

Among the assumptions of the model discussed in the previous sections, the most critical is the assumption that the treatment does not have a permanent effect. If some of the active responders are cured in phase I, the observed response rate on placebo will increase (relapse rate will decrease) and the efficiency of the study will be reduced. The relationship between the relative sample size in a discontinuation trial and the rate of curability in phase I for two levels of treatment effect is presented in Fig. 5. It is noteworthy that the RDT design is more efficient than the classic design even if a substantial proportion of the subjects are cured, especially for low reference response rates and small treatment effects. We also assumed that the sensitivities of identifying spontaneous and active responders in phase I are the same. If, in fact, active responders are identified with a higher sensitivity than spontaneous responders (which in most instances seems more likely, especially if the initial phase is relatively short), the relative efficiency of the RDT design will be further improved. If the opposite is true, the efficiency will be reduced. This effect is shown in Fig. 6, where we consider two scenarios: selecting 100% and 50% of spontaneous responders. In both cases we assume that 80% of active responders are identified. It seems clear that even very strong departures from the assumption of equal selection probabilities would not invalidate our conclusions.
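
The two departures discussed above can be explored with a small extension of the response-rate calculation. This is only a sketch under assumptions of our own: a fraction `cure` of selected active responders is taken to remain in remission on placebo, active and spontaneous responders are given separate sensitivities, and non-compliance and adverse reactions are ignored (unlike the scenarios in Figs 5 and 6). The function name and parameters are hypothetical and are not taken from the Appendix.

```python
def phase2_rates_general(R, R0, se_active, se_spont, sp, cure=0.0):
    """Expected phase II response rates (treatment, placebo) under relaxed assumptions.

    R, R0     : source-population response rates on active treatment and placebo
    se_active : sensitivity of identifying active responders in phase I
    se_spont  : sensitivity of identifying spontaneous responders in phase I
    sp        : specificity of identifying responders in phase I
    cure      : assumed fraction of active responders cured during phase I
    """
    sel_spont = se_spont * R0          # selected spontaneous responders
    sel_active = se_active * (R - R0)  # selected active responders
    sel_false = (1 - sp) * (1 - R)     # false responders admitted to phase II
    admitted = sel_spont + sel_active + sel_false
    r = (sel_spont + sel_active) / admitted
    # On placebo, spontaneous responders and cured active responders stay in remission.
    r0 = (sel_spont + cure * sel_active) / admitted
    return r, r0


# Baseline: equal sensitivities, no cure; the rate ratio equals R / R0.
r, r0 = phase2_rates_general(0.30, 0.20, 0.8, 0.8, 0.8)

# Curability raises the placebo response rate and shrinks the difference.
r_c, r0_c = phase2_rates_general(0.30, 0.20, 0.8, 0.8, 0.8, cure=0.5)

# Selecting fewer spontaneous responders inflates the rate ratio.
r_s, r0_s = phase2_rates_general(0.30, 0.20, 0.8, 0.5, 0.8)
```

Feeding these rates into a sample size formula for two proportions would give the corresponding relative sample sizes for this simplified setting.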

[Figure 5: plot of relative sample size (%) against the curability rate.]

Fig. 5. Effect of disease curability on the sample size required in an RDT relative to that in a classic RCT (in %), for two relative response rates: 2.0 and 1.5. The reference response rate is assumed to be 0.2, the frequencies of non-compliance and adverse reactions in the source population are both assumed to be 0.2, and the sensitivities and specificities of identifying responders, compliers and tolerators are all assumed to be 0.8.

[Figure 6: plot of relative sample size (%) against the reference (placebo) response rate, with one curve for each value of SE(S), the sensitivity of identifying spontaneous responders.]

Fig. 6. Sample size required in an RDT relative to that in a classic RCT (in %), for three different sensitivities of identifying spontaneous responders: 1.0, 0.8 and 0.5 (specificity is assumed to be 0.8). The relative response rate is assumed to be 1.5, the frequencies of non-compliance and adverse reactions are both assumed to be 0.2, and the sensitivities and specificities of identifying active responders, compliers and tolerators are all assumed to be 0.8.

Phase I sample size

In a discontinuation study, only a fraction of all subjects initially treated are admitted to the randomized phase. It is, therefore, important to ask whether the initial pool of candidates for the study (phase I sample size) has to be larger than that needed for a classic trial. The question is particularly relevant to those trials in which the first phase is conducted prospectively, since the number of subjects available at the onset of the study may be limited. Our analysis shows that the initial number of subjects needed in phase I of an RDT tends to be slightly lower than in a classic trial, if the sensitivity and specificity of the criteria for identifying responders are above 0.7 (Fig. 7). Both sensitivity and specificity are important, but for different reasons. High specificity reduces the number of false responders and thus improves the efficiency of phase II, whereas high sensitivity allows for a larger proportion of true responders to enter the second, randomized phase. These results also suggest that the initial sample has to be larger than the sample needed for a classic trial if the methods of selecting responders are very inaccurate. The exclusion of non-compliers is likely to increase the required phase I sample size (data not provided), unless the sensitivity and, to a lesser degree, specificity of detecting these individuals in phase I are very high.

[Figure 7: plot of relative initial (phase I) sample size (%) against the reference (placebo) response rate, with curves for different combinations of SE (sensitivity of identifying responders) and SP (specificity of identifying responders), including SE = 0.7, SP = 0.9.]

Fig. 7. The initial (phase I) sample size required in an RDT relative to a classic RCT (in %), for different sensitivities and specificities of identifying responders. The relative response rate is assumed to be 1.5 and the frequencies of non-compliance and adverse reactions in the source population are both assumed to be zero.
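
One way to reason about the phase I sample size is to divide the phase II requirement by the fraction of phase I patients admitted to randomization. The sketch below follows that idea under assumptions of our own: no non-compliance or adverse reactions, equal sensitivity for active and spontaneous responders, and a common normal-approximation sample size formula. It is not the authors' computation, and all function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size, common normal approximation (assumed, not from [53])."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    num = z_a * sqrt(2 * pbar * (1 - pbar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (num / (p1 - p2)) ** 2


def relative_initial_sample_size(R, R0, se, sp):
    """Phase I sample size of an RDT relative to the size of a classic trial."""
    # Fraction of phase I patients who are labeled responders and randomized.
    admitted = se * R + (1 - sp) * (1 - R)
    r = se * R / admitted    # phase II response rate on treatment
    r0 = se * R0 / admitted  # phase II response rate on placebo
    n_phase1 = n_per_group(r, r0) / admitted  # patients to screen per randomized arm
    return n_phase1 / n_per_group(R, R0)


# Hypothetical scenarios: placebo rate 0.2, relative response rate 1.5.
fairly_accurate = relative_initial_sample_size(0.30, 0.20, se=0.8, sp=0.8)
inaccurate = relative_initial_sample_size(0.30, 0.20, se=0.6, sp=0.6)
```

In this simplified setting, fairly accurate selection criteria keep the initial pool below that of a classic trial, while inaccurate criteria push it above, which agrees in direction with the pattern described in the text.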

DISCUSSION

It is important to emphasize that the results presented in this paper should be applied with caution. Our goal was to develop a realistic and yet relatively parsimonious model of the RDT design. In real life, some of the simplifying assumptions built into the model will not hold exactly, and the actual values of the parameters needed to calculate the required sample size may be difficult to estimate with precision. Nevertheless, if reasonable estimates of these parameters can be obtained, the formulas given in the Appendix should provide a fairly accurate assessment of the response rates that might be expected in phase II of an RDT.

Our results indicate that in those situations where the RDT design is applicable, it may offer a considerable advantage over the traditional randomized trial design. The design might be particularly useful in studying the effect of long-term, non-curative therapies, especially when the clinically important effect is relatively small and the use of placebo should be minimized for ethical or feasibility reasons. It seems reasonable to expect that identifying and eliminating non-responders in phase I may reduce the required sample size by at least 50%, depending on the reference response rate and the effect of treatment. Further reduction may be achieved by excluding non-compliers and adverse reactors. The relative efficiency of the RDT design depends to a large extent on the accuracy (mainly specificity) of the selection criteria with respect to treatment response and, to a somewhat lesser degree, with respect to patient compliance and treatment tolerance. The sensitivity of these criteria becomes an important issue if the number of potential candidates for the trial is limited. The findings of a similar study conducted by Hallstrom et al. [49] appear to suggest that the gain in efficiency due to selecting responders
may be even larger than reported here. However, to compare our results with those obtained by Hallstrom et al., it is necessary to discuss briefly the differences between their model, based on the design of the Cardiac Arrythmia Suppression Trial (CAST), and our model of a typical discontinuation trial. In the CAST study, patients who appeared to respond to an antiarrthymic drug by showing suppression of premature ventricular depolarizations (the interim outcome) were subsequently randomized to a continuation of the drug or placebo. Thus the selection of responders was based on a very specific interim measure of effect, different from the main outcome of interest. In principle, this modification of the RDT design could be handled within the framework offered in this paper. Hallstrom et al. made an additional adjustment for the possible loss of a small proportion of subjects who may develop the primary outcome during the initial open phase. They also assumed that the proportion of patients responsive to the interim outcome may be slightly higher in a predosed study than in a classic trial. On the other hand, the results provided by Hallstrom et al. do not take into consideration the additional effect of excluding non-compliers and adverse reactors during the initial open treatment. It should also be noted that these authors provide data that refer to the initial number of subjects needed, rather than the number of subjects randomized. As we pointed out earlier, many discontinuation trials include patients treated with the medication of interest in normal clinical practice. Since these patients have already been preselected in terms of treatment response, compliance and tolerance, we prefer to use the sample size needed for the randomized phase in assessing relative efficiency. A more important but not obvious difference between our model and that used by Hallstrom et al. lies in +he assumptions concerning the sensitivity and specificity of identifying responders. 
In our model, these parameters are fixed a priori, and are allowed to take any values between 0 and 1 (although we provide data for only a few, selected values). By contrast, in the Hallstrom et al. model, the specificity and overall sensitivity of the initial response in predicting the primary outcome are determined by the reference response rate, the interim response rate on treatment and the ratio of the primary outcome rate among initial responders to that among non-responders. However, these authors make an implicit assumption that the sensitivity of initial response in identifying those patients who will ultimately benefit from the treatment (active responders in our model) is 100%, whereas spontaneous responders are selected randomly, i.e. with a sensitivity equal to the overall proportion of subjects deemed responsive to the interim outcome and selected for the randomized phase. This explains the relatively large reduction in the initial sample size in Hallstrom et al.'s study, compared to our data presented in Fig. 7. The above assumption implies that the true effect of treatment with respect to the primary outcome is always mediated through the observed suppression of arrhythmia. Although this may seem plausible in the context of the CAST study, an analogous assumption would be difficult to justify in most discontinuation trials, in which specific tests of interim treatment effect are not available. Nevertheless, once all the above differences in assumptions are adjusted for, the two models give exactly the same results.

Although our model was restricted to trials in which treatment efficacy is evaluated against placebo rather than against another active intervention, the RDT design could also be used to compare two active treatments. To this end, a group of patients would have to be treated simultaneously with two different medications. Subsequently, those who appear to respond would be randomized to discontinuation of one medication or the other. Since the selection of responders is unlikely to be perfectly accurate, the validity of this approach hinges on the assumption of equal selection probabilities for subjects responding to different treatments.

A final point relates to the problem of maximizing the efficiency of a trial in a more general context. In theory, the selection of subjects could be further refined by discontinuing active therapy in all responders after a period of open treatment, and randomizing only those who initially improved but relapsed during the discontinuation phase. Since the ratio of active to spontaneous (placebo) responders should be much higher among the responders-relapsers than among initial responders, this approach would seem optimal in providing qualitative evidence of treatment efficacy while minimizing the number of subjects to be randomized and exposed to placebo. However, it might prove impractical, and the clinical utility of the quantitative result obtained in this way might be limited.


Acknowledgements-The authors wish to thank Dr S. Shapiro for helpful comments on an earlier draft of this manuscript, and Drs J. C. Bailar III and O. S. Miettinen for useful discussions about the topics addressed in this paper. This study was supported in part by grants from the Arthritis Society of Canada, and the Natural Sciences and Engineering Council of Canada.

REFERENCES

1. Amery W, Dony J. A clinical trial design avoiding undue placebo treatment. J Clin Pharmacol 1975; October: 674-679.
2. Quitkin FM, Rabkin JG. Methodological problems in studies of depressive disorder: utility of the discontinuation design. J Clin Psychopharmacol 1981; 1: 283-288.
3. Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials, 2nd edn. Littleton, MA: PSG Publishing Company; 1985.
4. Knipschild P, Leffers P, Feinstein AR. The qualification period. J Clin Epidemiol 1991; 44: 461-464.
5. Verhaeghe L. Treatment of angina pectoris with lidoflazine. Arzneim-Forsch 1969: 1842-1848.
6. The Sixty Plus Reinfarction Study Research Group. A double blind trial to assess long-term oral anticoagulant therapy in elderly patients after myocardial infarction. Lancet 1980; 2: 990-994.
7. Pontén J, Biber B, Bjurö T, Henriksson B-A, Hjalmarson A. Beta-receptor blocker withdrawal. A preoperative problem in general surgery? Acta Anaesth Scand 1982; Suppl. 76: 32-37.
8. Pontén J, Biber B, Bjurö T, Henriksson B-A, Hjalmarson A, Lundberg D. Beta-receptor blockade and spinal anaesthesia. Withdrawal versus continuation of long-term therapy. Acta Anaesth Scand 1982; Suppl. 76: 62-69.
9. Ahern MJ, Hall ND, Case K, Maddison PJ. D-Penicillamine withdrawal in rheumatoid arthritis. Ann Rheum Dis 1984; 43: 213-217.
10. DiBianco R, Shabetai R, Silverman BD, Leier CV, Benotti JR. Oral amrinone for the treatment of chronic congestive heart failure: results of a multicenter randomized double blind and placebo-controlled withdrawal study. J Am Coll Cardiol 1984; 4: 855-866.
11. Evans JR, Pacht K, Huss P, Unverferth DV, Bashore TM, Leier CV. Chronic oral amrinone therapy in congestive heart failure: a double-blind placebo-controlled withdrawal study. Int J Clin Pharm Res 1984; IV(1): 9-18.
12. DiBianco R, Katz RJ, Chesler E, Alpert JS, Spann JF. Long-term efficacy of bepridil in patients with chronic stable angina pectoris: results of a multicenter, placebo-controlled study of extended bepridil use. Am J Cardiol 1985; 55: 50C-54C.
13. Stellon AJ, Portmann B, Hegarty JE, Williams R. Randomized controlled trial of azathioprine withdrawal in autoimmune chronic active hepatitis. Lancet 1985; 1: 668-670.
14. Ruoff G. Effect of withdrawal of terazosin therapy in patients with hypertension. Am J Med 1986; 80 (Suppl. 5B): 35-41.
15. Casaer P, Aicardi J, Curatolo P, Dias K, Maia M, Motte J, Pineda M, Pouplard F, Preney-Cramatte S, Stephenson J, Szliwowski H. Flunarizine in alternating hemiplegia in childhood. An international study in 12 children. Neuropediatrics 1987; 18: 191-195.
16. Frazier LM, Mulrow CD, Alexander LT Jr, Harris RT, Heise KR, Brown JT, Feussner JR. Need for insulin therapy in type II diabetes mellitus. A randomized trial. Arch Intern Med 1987; 147: 1085-1089.
17. Giles TD, Sander GE, Roffidal L, Thomas MG, Mersch DP, Moyer RR, Burris JF, Mroczek WJ, Brachfeld J. Remission of mild to moderate hypertension after treatment with carteolol, a beta-adrenoceptor blocker with intrinsic sympathomimetic activity. Arch Intern Med 1988; 148: 1725-1728.
18. Liebowitz MR, Gorman JM, Fyer AJ, Campeas R, Levin AP, Sandberg D, Hollander E, Papp L, Goetz D. Pharmacotherapy of social phobia: an interim report of a placebo-controlled comparison of phenelzine and atenolol. J Clin Psychiatry 1988; 49: 252-257.
19. Stellon AJ, Keating JJ, Johnson PJ, McFarlane IG, Williams R. Maintenance of remission in autoimmune chronic active hepatitis with azathioprine after corticosteroid withdrawal. Hepatology 1988; 8: 781-784.
20. Gisslinger H, Linkesch W, Fritz E, Ludwig H, Chott A, Radaszkiewicz Th. Long-term interferon therapy for thrombocytosis in myeloproliferative diseases. Lancet 1989; 1: 634-637.
21. Misra SP, Thorat VK, Sachdev GK, Anand BS. Long-term treatment of irritable bowel syndrome: results of a randomized controlled trial. Q J Med 1989; 73: 931-939.
22. Scottish Schizophrenia Research Group: McCreadie RG, Wiles D, Grant S, Crockett GT, Mahmood Z, Livingston MG, Watt JAG, Greene JG, Kershaw PW, Todd NA, Scott AM, Loudon J, Dyer JAT, Philip AE, Batchelor D. The Scottish first episode schizophrenia study. VII. Two-year follow-up. Acta Psychiatr Scand 1989; 80: 597-602.
23. Dunk AA, Prabhu U, Tobin A, O'Morain C, Mowat NA. The safety and efficacy of tripotassium dicitrato bismuthate (De-Nol) maintenance therapy in patients with duodenal ulceration. Aliment Pharmacol Ther 1990; 4: 157-162.
24. Fabricius PG, Weizert P, Dunzendorfer D et al. Efficacy of once-a-day terazosin in benign prostatic hyperplasia. Prostate Suppl 1990; 3: 85-93.
25. Horan RF, Sheffer AL, Austen KF. Cromolyn sodium in the management of systemic mastocytosis. J Allergy Clin Immunol 1990; 85: 852-855.
26. Rabkin JG, McGrath PJ, Quitkin FM, Tricamo E, Stewart JW, Klein DF. Effects of pill giving on maintenance of placebo response in patients with chronic mild depression. Am J Psychiatry 1990; 147: 1622-1626.
27. Eklund K, Forsman A. Minimal effective dose and relapse: double blind trial: haloperidol decanoate vs. placebo. Clin Neuropharmacol 1991; 14(Suppl. 2): S7-S12.
28. Echt DS, Liebson PR, Mitchell LB, Peters RW, Obias-Manno D, Barker AH, Arensberg D, Baker A, Friedman L, Greene LH, Huther ML, Richardson DW and the CAST Investigators. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The Cardiac Arrhythmia Suppression Trial. N Engl J Med 1991; 324: 781-788.
29. Robinson DS, Lerfald SC, Bennett B, Laux D, Devereaux E, Kayser A, Corcella J, Albright D. Continuation and maintenance treatment of major depression with the monoamine oxidase inhibitor phenelzine: a double blind placebo-controlled discontinuation study. Psychopharmacol Bull 1991; 27: 31-39.
30. Ruskin PE, Nyman G. Discontinuation of neuroleptic medication in older, outpatient schizophrenics. J Nerv Ment Dis 1991; 179: 212-214.
31. The Canadian hydroxychloroquine study group. A randomized study of the effect of withdrawing hydroxychloroquine sulphate in systemic lupus erythematosus. N Engl J Med 1991; 324: 150-154.
32. Walsh BT, Hadigan CM, Devlin MJ, Gladis M, Roose SP. Long-term outcome of antidepressant treatment for bulimia nervosa. Am J Psychiatry 1991; 148: 1206-1212.
33. Mavissakalian M, Perel JM. Clinical experiments in maintenance and discontinuation of imipramine therapy in panic disorder with agoraphobia. Arch Gen Psychiatry 1992; 49: 318-323.
34. Snider DE Jr, Long MW, Cross FS, Farer LS. Six-month isoniazid-rifampin therapy for pulmonary tuberculosis. Am Rev Respir Dis 1984; 129: 573-579.
35. Langford HG, Blaufox MD, Oberman A, Hawkins CM, Curb JD, Cutter GR, Wassertheil-Smaller S, Pressel S, Babcock C, Abernethy JD, Hotchkiss HJ, Tyler M. Dietary therapy slows the return of hypertension after stopping prolonged medication. JAMA 1985; 253: 657-664.
36. Sobel JD. Recurrent vulvovaginal candidiasis. A prospective study of the efficacy of maintenance ketoconazole therapy. N Engl J Med 1986; 315: 1455-1458.
37. Fletcher AE, Franks PJ, Bulpitt CJ. The effect of withdrawing antihypertensive therapy: a review. J Hypertension 1988; 6: 431-436.
38. Coiffier B, Gisselbrecht Ch, Herbrecht R, Tilly H, Bosly A, Brousse N. LNH-84 Regimen: A multicenter study of intensive chemotherapy in 737 patients with aggressive malignant lymphoma. J Clin Oncol 1989; 7: 1018-1026.
39. Freis ED, Thomas JR, Fisher SG, Hamburger R, Borreson RE, Mezey KC, Mukherji B, Neal WW, Perry HM, Taguchi JT. Effects of reduction in drugs or dosage after long-term control of systemic hypertension. Am J Cardiol 1989; 63: 702-708.
40. Rickels K, Case WG, Downing RW, Winokur A. Long-term diazepam therapy and clinical outcome. JAMA 1983; 250: 767-771.
41. Schroeder JS, Walker SD, Skalland ML, Hemberger JA. Absence of rebound from diltiazem therapy in Prinzmetal's variant angina. J Am Coll Cardiol 1985; 6: 174-178.
42. Greenblatt DJ, Harmatz JS, Zinny MA, Shader RI. Effect of gradual withdrawal on the rebound sleep disorder after discontinuation of triazolam. N Engl J Med 1987; 317: 722-728.
43. Lederle FA, Pluhar RE, Joseph AM, Niewoehner DE. Tapering of corticosteroid therapy following exacerbation of asthma. A randomized, double-blind, placebo-controlled trial. Arch Intern Med 1987; 147: 2201-2203.
44. Mander AJ, Loudon JB. Rapid recurrence of mania following abrupt discontinuation of lithium. Lancet 1988; 2: 15-17.
45. Buchanan WW, Stephen LJ, Buchanan HM. Are homeopathic doses of oral corticosteroids effective in rheumatoid arthritis? Clin Exp Rheumatol 1988; 6: 281-284.
46. Caldwell JR, Furst DE. The efficacy and safety of low-dose corticosteroids for rheumatoid arthritis. Semin Arthritis Rheum 1991; 21: 1-11.
47. Probstfield JL. Clinical trial prerandomization compliance (adherence) screen. In: Cramer JA, Spilker B, Eds. Patient Compliance in Medical Practice and Clinical Trials. New York: Raven Press; 1991: 323-333.
48. Brittain E, Wittes J. The run-in period in clinical trials: The effect of misclassification on efficiency. Controlled Clin Trials 1990; 11: 327-338.
49. Hallstrom AP, Verter J, Friedman L, for the Cardiac Arrhythmia Suppression Trial (CAST) Investigators. Randomizing responders. Controlled Clin Trials 1991; 12: 486-503.
50. Schork MA, Remington RD. The determination of sample size in treatment-control comparisons for chronic disease studies in which drop-out or non-adherence is a problem. J Chron Dis 1967; 20: 233-239.
51. Halperin M, Rogot E, Gurian J, Ederer F. Sample sizes for medical trials with special reference to long-term therapy. J Chron Dis 1968; 21: 13-24.
52. Lang JL. The use of run-in to enhance compliance. Stat Med 1990; 9: 87-95.
53. Armitage P, Berry G. Statistical Methods in Medical Research, 2nd edn. Oxford: Blackwell Scientific Publications; 1989.

APPENDIX

Effect of Excluding Non-responders

If all subjects in the source population were admitted to phase II, the results of a discontinuation study could be displayed in the following 2 x 2 table (the results of phase II are the gold standard):

                               TRUE RESPONSE
                               YES       NO
    PHASE I RESPONSE   YES     f_1       f_2
                       NO      f_3       f_4
                               R         1-R       1.0

In this table $f_1$, $f_2$, $f_3$ and $f_4$ denote proportions and $f_1 + f_2$ is the proportion of subjects that would normally be admitted to phase II of an RDT. Since the sensitivity of identifying responders in phase I is:

$$\mathrm{SE}_r = \frac{f_1}{R}, \tag{A.1}$$

and the specificity is:

$$\mathrm{SP}_r = \frac{f_4}{1 - R}, \tag{A.2}$$

then,

$$f_1 + f_2 = (\mathrm{SE}_r)(R) + (1 - \mathrm{SP}_r)(1 - R). \tag{A.3}$$

Thus the expected response rate in the treatment group is:

$$r = \frac{f_1}{f_1 + f_2} = \frac{(\mathrm{SE}_r)(R)}{(\mathrm{SE}_r)(R) + (1 - \mathrm{SP}_r)(1 - R)}. \tag{A.4}$$

Let $R_0$ be the reference response rate, i.e. the proportion of subjects in the source population who respond (improve) spontaneously (i.e. when given a placebo), and $f_0$ the proportion of spontaneous responders admitted to phase II. If we assumed the same sensitivity for identifying active and spontaneous responders, the expected response rate in the placebo group of a discontinuation trial would be:

$$r_0 = \frac{f_0}{f_1 + f_2} = \frac{(\mathrm{SE}_r)(R_0)}{(\mathrm{SE}_r)(R) + (1 - \mathrm{SP}_r)(1 - R)}. \tag{A.5}$$

From equations (A.4) and (A.5), the expected absolute effect of treatment (response rate difference) in phase II is:

$$rd = r - r_0 = \frac{(\mathrm{SE}_r)(R - R_0)}{(\mathrm{SE}_r)(R) + (1 - \mathrm{SP}_r)(1 - R)}, \tag{A.6}$$

and the relative effect (response rate ratio) is:

$$rr = \frac{r}{r_0} = \frac{R}{R_0}. \tag{A.7}$$

The expected absolute effect of treatment in a classic (one-phase) trial conducted in the same source population is, naturally, $RD = R - R_0$, and the relative effect is $RR = R/R_0$.
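As a numerical illustration of equations (A.1) to (A.7), the following minimal sketch fills in the 2 x 2 table and derives the phase II response rates. It assumes, as in the text above, the same sensitivity for active and spontaneous responders. The function names and the input values (R = 0.5, R0 = 0.3, sensitivity and specificity of 0.8) are hypothetical and chosen only for illustration.

```python
def two_by_two(R, se_r, sp_r):
    """Cell proportions f1..f4 of the 2 x 2 table, from (A.1) and (A.2)."""
    f1 = se_r * R              # true responders who respond in phase I
    f3 = R - f1                # true responders missed in phase I
    f4 = sp_r * (1.0 - R)      # non-responders correctly excluded
    f2 = (1.0 - R) - f4        # non-responders admitted by mistake
    return f1, f2, f3, f4


def phase2_rates(R, R0, se_r, sp_r):
    """Expected phase II response rates on treatment (r) and placebo (r0)."""
    f1, f2, _, _ = two_by_two(R, se_r, sp_r)
    admitted = f1 + f2         # (A.3)
    r = f1 / admitted          # (A.4)
    r0 = se_r * R0 / admitted  # (A.5)
    return r, r0


R, R0 = 0.5, 0.3               # hypothetical source-population rates
r, r0 = phase2_rates(R, R0, se_r=0.8, sp_r=0.8)
print(f"classic: RD = {R - R0:.3f}, RR = {R / R0:.3f}")
print(f"RDT:     rd = {r - r0:.3f}, rr = {r / r0:.3f}")
```

With these inputs the response rate ratio is the same in both designs, as (A.7) requires, while the response rate difference grows from 0.20 in the classic design to 0.32 in phase II of the RDT.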

Effect of Excluding Non-compliers

It is reasonable to expect that non-compliance will affect only those subjects who actually respond to active treatment, whereas spontaneous responders will not be affected. The proportion of active responders in the source population is $R - R_0$. If the rate (frequency) of compliance in phase II is $p_c$, then the response rate in the treatment group of an RDT will be:

$$r = \frac{\mathrm{SE}_r[(R - R_0)p_c + R_0]}{(\mathrm{SE}_r)(R) + (1 - \mathrm{SP}_r)(1 - R)}. \tag{A.8}$$

If there is no contamination, the response rate in the placebo group will not be affected by non-compliance. The expected rate of compliance in phase II, $p_c$, can be calculated using the same approach as was used to derive formulas (A.4) and (A.5). If we denote the sensitivity and specificity of detecting compliance in phase I by $\mathrm{SE}_c$ and $\mathrm{SP}_c$, respectively, and $\pi_c$ is the rate of compliance in the source population, then the expected rate of compliance among those selected for phase II can be calculated as:

$$p_c = \frac{(\mathrm{SE}_c)(\pi_c)}{(\mathrm{SE}_c)(\pi_c) + (1 - \mathrm{SP}_c)(1 - \pi_c)}. \tag{A.9}$$

The expected response rates in the classic, one-phase trial carried out in the source population are $R_c = (R - R_0)\pi_c + R_0$ in the treatment group, and $R_0$ in the placebo group.

Effect of Curability

Let $c$ be the proportion of active responders who are permanently cured in phase I. Then the expected response rate on placebo in a discontinuation study is:

$$r_0 = \frac{\mathrm{SE}_r[(R - R_0)c + R_0]}{(\mathrm{SE}_r)(R) + (1 - \mathrm{SP}_r)(1 - R)}. \tag{A.10}$$

Different Selection Probabilities

If we assumed different selection probabilities (different sensitivities of the identification criteria) for active and spontaneous responders in phase I, the response rates on treatment and placebo would be given by the following, general formulas:

$$r = \frac{(R - R_0)(\mathrm{SE}_a)p_c + (R_0)(\mathrm{SE}_s)}{(R - R_0)(\mathrm{SE}_a) + (R_0)(\mathrm{SE}_s) + (1 - \mathrm{SP}_r)(1 - R)}, \tag{A.11}$$

$$r_0 = \frac{(R - R_0)(\mathrm{SE}_a)c + (R_0)(\mathrm{SE}_s)}{(R - R_0)(\mathrm{SE}_a) + (R_0)(\mathrm{SE}_s) + (1 - \mathrm{SP}_r)(1 - R)}, \tag{A.12}$$

where $\mathrm{SE}_a$ and $\mathrm{SE}_s$ are the sensitivities of the criteria for identifying active and spontaneous responders, respectively. The values for $r$ and $r_0$ obtained using these formulas can be employed in calculating the sample size.

Effect of Excluding Adverse Reactors

We are only concerned here with adverse reactions that result in a patient's withdrawal from the study. We assume that withdrawals due to adverse reactions are not included in the final analysis and that the withdrawal rate is the same in the treatment and placebo groups (assuming unequal withdrawal rates would not change the results substantially). If $N_0$ denotes the number of subjects originally randomized, and $p_t$ is the proportion of patients who tolerate the treatment in phase II (i.e. do not withdraw due to adverse reactions), then the number of subjects available for analysis is simply:

$$N = N_0 p_t. \tag{A.13}$$

To calculate $p_t$, we assume that tolerators are identified in phase I with sensitivity $\mathrm{SE}_t$ and specificity $\mathrm{SP}_t$. If $\pi_t$ is the true proportion of tolerators in the source population, then, analogously to equation (A.9), we have:

$$p_t = \frac{(\mathrm{SE}_t)(\pi_t)}{(\mathrm{SE}_t)(\pi_t) + (1 - \mathrm{SP}_t)(1 - \pi_t)}. \tag{A.14}$$
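The general formulas of this Appendix can be combined into a rough comparison of the number of subjects to be randomized under the two designs. The sketch below is an illustration under stated assumptions, not the authors' computer model. In particular, the sample size function uses a common normal-approximation formula for comparing two proportions, which may differ from the method used in the paper. All function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def selected_rate(se, sp, prevalence):
    """(A.9)/(A.14): share of true compliers (or tolerators) among those selected."""
    selected_true = se * prevalence
    selected_false = (1.0 - sp) * (1.0 - prevalence)
    return selected_true / (selected_true + selected_false)


def rdt_response_rates(R, R0, se_a, se_s, sp_r, p_c=1.0, c=0.0):
    """(A.11) and (A.12): phase II response rates on treatment and placebo."""
    active = (R - R0) * se_a                     # active responders selected
    spontaneous = R0 * se_s                      # spontaneous responders selected
    false_responders = (1.0 - sp_r) * (1.0 - R)  # non-responders selected
    denom = active + spontaneous + false_responders
    r = (active * p_c + spontaneous) / denom
    r0 = (active * c + spontaneous) / denom
    return r, r0


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects per arm, two-sided test of two proportions (assumed formula)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    pooled = z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    unpooled = z_beta * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return math.ceil((pooled + unpooled) ** 2 / (p1 - p2) ** 2)


def randomized_needed(n_analysed, p_t):
    """Invert (A.13): subjects to randomize so that n_analysed remain."""
    return math.ceil(round(n_analysed / p_t, 9))


# Hypothetical source population
R, R0 = 0.5, 0.3          # response rates on treatment and placebo
pi_c, pi_t = 0.8, 0.9     # true rates of compliance and tolerance

# RDT: compliers and tolerators are screened in phase I (accuracy values assumed)
p_c = selected_rate(se=0.9, sp=0.9, prevalence=pi_c)
p_t = selected_rate(se=0.9, sp=0.9, prevalence=pi_t)
r, r0 = rdt_response_rates(R, R0, se_a=0.8, se_s=0.8, sp_r=0.8, p_c=p_c)
n_rdt = randomized_needed(2 * n_per_group(r, r0), p_t)

# Classic trial: no screening, so source-population rates apply
R_c = (R - R0) * pi_c + R0
n_classic = randomized_needed(2 * n_per_group(R_c, R0), pi_t)

print(f"RDT phase II: r = {r:.3f}, r0 = {r0:.3f}, randomized = {n_rdt}")
print(f"Classic:      R_c = {R_c:.3f}, R0 = {R0:.3f}, randomized = {n_classic}")
```

Setting `c` above zero raises the placebo response rate in phase II, as in (A.10), and so shrinks the advantage of the discontinuation design. This is consistent with the remark in the Abstract that the design is of limited use for potentially curative treatments.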
