72–85
doi:10.1093/biostatistics/kxj034
Advance Access publication on April 5, 2006
Keywords: Barthel index; Bounded outcome scores; Compliance research; Logistic transform; Ordinal probit
regression.
1. INTRODUCTION
Bounded outcome scores are measurements that are restricted to a finite interval, which can be closed,
open, or half-closed. Examples of bounded outcome scores can be found in many medical disciplines.
For instance, in compliance research one measures the proportion of days that patients correctly take
their drug, hereafter denoted as “pdays.” Another example is the Barthel index (Mahoney and Barthel,
1965), which is an Activities of Daily Living scale that (in one version) jumps in steps of 5 from 0
© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.
The logistic transform for bounded outcome scores 73
(death or completely immobilized) to 100 (able to perform all daily activities independently). This scale
is often used in stroke trials to measure the recovery of a patient after an acute stroke. Finally, in pain and
pain-relief studies, a visual analog score (VAS) is used to measure the psychological state of the subject.
Bounded outcome scores show a variety of distributions, from unimodal to J- and U-shaped. These
peculiar shapes often motivate the use of non-parametric methods, like the Wilcoxon test (Lesaffre and
others, 1993) when comparing two treatments. However, possibilities for statistical modeling, e.g. when
covariate adjustment is envisaged, are then limited. Alternatively, a dichotomized version of the score
may be constructed and analyzed using logistic regression. For instance, the Barthel index could be split
at 0.9. A value above 0.9 implies that the patients are able to perform most of their daily activities, and
hence the dichotomized Barthel index has a simple interpretation. However, such an approach has two
disadvantages: first, the choice of the threshold is usually ad hoc and, second, reducing the score to a binary
variable may reduce the efficiency of the comparison. Ordinal regression (McCullagh, 1980) is an alter-
The LN distribution can take very different shapes depending on the choice of µ and σ², as is shown
in Figure 1. Note that when µ changes sign this corresponds to mirroring the distribution around u = 0.5.
Hence, the logistic transformation is very well suited to model a variety of distributions on (0, 1). A
similar property holds for the Beta family, but Aitchison and Begg (1976) indicate that the LN distribution
is richer and can approximate any Beta density.
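
The mirroring property can be checked numerically. The following is a minimal sketch, assuming the usual definition of the LN distribution as the law of U = 1/(1 + exp(−Z)) with Z ∼ N(µ, σ²); the function name `ln_density` and the parameter values are illustrative and do not come from the paper.

```python
import math

def ln_density(u, mu, sigma):
    """Density of U = 1 / (1 + exp(-Z)) with Z ~ N(mu, sigma^2), for 0 < u < 1."""
    z = math.log(u / (1.0 - u))                      # logit(u)
    jacobian = 1.0 / (u * (1.0 - u))                 # dz/du
    normal_pdf = math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return normal_pdf * jacobian

# Changing the sign of mu mirrors the density around u = 0.5:
# the two printed densities on each line should agree.
for u in (0.1, 0.3, 0.8):
    left = ln_density(u, mu=1.0, sigma=1.5)
    right = ln_density(1.0 - u, mu=-1.0, sigma=1.5)
    print(u, round(left, 6), round(right, 6))
```

Varying `mu` and `sigma` in this sketch gives unimodal, J-shaped and U-shaped densities on (0, 1).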
It is clear that when the bounded outcome scores have an LN distribution, the analysis could be done
on the Z-scale using classical statistical analyses assuming a Gaussian distribution. For instance, suppose
we wish to compare the effects of control and new treatments based on a bounded score with distributions
LN(µ, σ1²) and LN(µ + Δ, σ2²), respectively. When σ1² = σ2² = σ², a simple unpaired t-test can be
calculated on the final Z-values and a 95% confidence interval can be obtained for Δ (>0) on the
Z-scale. The interpretation of Δ is more difficult because it represents a location shift on the transformed
scale. Since the logistic transformation is strictly monotone,
\[ \Delta = \log \frac{\nu_2/(1-\nu_2)}{\nu_1/(1-\nu_1)}, \]
where ν1 and ν2 are the
medians for the control and new treatment, respectively, on the original scale. Figure 2 gives an example
of how the location-shift alternative on the Z-scale is translated into an alternative hypothesis on the
observed U-scale, when σ1² = σ2². The parameter Δ can also be interpreted in relation to the Wilcoxon–
Mann–Whitney test (Lehmann and D’Abrera, 1998). Assume that Z1 and Z2 are independent random
variables on the transformed scale
corresponding to the control and new treatments, respectively, then
\[ P(Z_2 > Z_1) = P = \Phi\!\left(\frac{\Delta}{\sigma\sqrt{2}}\right), \]
which is also equal to P(U2 > U1) for the corresponding original U
values. Brunner and Munzel (2000) called P the “relative effect” of the treatment, which is therefore
seen to be determined by the ratio Δ/σ. In general, the relative effect is equal to ∫ F1 dF2, where Fj is the
cumulative distribution function of Zj or here equivalently of Uj (j = 1, 2). Hence, loosely speaking, P
determines the proportion of individuals better off with the new treatment than with the control treatment.
A 95% CI for P can be obtained using the Delta method when estimates for Δ, σ, and their covariance
matrix are available. If instead transformation to a logistic distribution is envisaged, then Δ/σ can be
directly interpreted as a log-odds ratio of cumulative distribution functions, see Section 6.
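
As an illustration of these two interpretations, the sketch below computes the odds ratio of the medians on the original scale and the relative effect for an equal-variance location shift on the Z-scale. It assumes Z1 ∼ N(µ, σ²) and Z2 ∼ N(µ + shift, σ²), so that Z2 − Z1 is normal with mean equal to the shift and variance 2σ²; the function names and numerical values are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

def relative_effect(delta, sigma):
    """P(Z2 > Z1) for independent Z1 ~ N(mu, sigma^2), Z2 ~ N(mu + delta, sigma^2)."""
    return NormalDist().cdf(delta / (sigma * sqrt(2.0)))

def median_on_u_scale(mu):
    """Median of LN(mu, sigma^2): the inverse logit of the median on the Z-scale."""
    return 1.0 / (1.0 + exp(-mu))

mu, delta, sigma = 0.5, 1.0, 2.0                     # illustrative values only
nu1 = median_on_u_scale(mu)                          # control median on the U-scale
nu2 = median_on_u_scale(mu + delta)                  # new-treatment median on the U-scale
odds_ratio = (nu2 / (1 - nu2)) / (nu1 / (1 - nu1))   # equals exp(delta)

print(round(nu1, 3), round(nu2, 3), round(odds_ratio, 3))
print(round(relative_effect(delta, sigma), 3))
```

The relative effect depends on the shift only through its ratio to σ, whereas the medians on the U-scale also depend on µ.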
When σ1 ≠ σ2, one could use the Welch test (Welch, 1951) on the transformed Z-scale. This test is
also called the unpaired t-test for unequal variances. However, it is well known that ignoring the inequality
of the variances, i.e. applying in this case the classical unpaired t-test, has no great impact on the type I
error as long as n1 ≈ n2 (see, e.g. Wetherill, 1960; Murphy, 1967). Further, in this case the relative effect is
equal to P(Z2 > Z1) = P = Φ(Δ/√(σ1² + σ2²)), which can be estimated in a similar manner as above.
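
A minimal sketch of the unequal-variance analysis on the Z-scale follows. It assumes all scores lie strictly inside (0, 1), computes the Welch statistic with Satterthwaite degrees of freedom by hand using only the standard library, and evaluates the unequal-variance relative effect. The data and function names are made up for illustration; the p-value (which needs a t-distribution) is deliberately left out.

```python
from math import log, sqrt
from statistics import NormalDist, mean, variance

def logit(u):
    return log(u / (1.0 - u))

def welch_on_logit_scale(u_control, u_new):
    """Welch t statistic and Satterthwaite degrees of freedom on the Z-scale."""
    z1 = [logit(u) for u in u_control]
    z2 = [logit(u) for u in u_new]
    n1, n2 = len(z1), len(z2)
    v1, v2 = variance(z1) / n1, variance(z2) / n2    # squared standard errors
    t = (mean(z2) - mean(z1)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def relative_effect_unequal(delta, sigma1, sigma2):
    """P(Z2 > Z1) when Z1 ~ N(mu, sigma1^2) and Z2 ~ N(mu + delta, sigma2^2)."""
    return NormalDist().cdf(delta / sqrt(sigma1 ** 2 + sigma2 ** 2))

# Made-up scores strictly inside (0, 1), for illustration only.
control = [0.20, 0.35, 0.50, 0.55, 0.70, 0.40]
new = [0.45, 0.60, 0.80, 0.90, 0.75, 0.85]
t_stat, df = welch_on_logit_scale(control, new)
print(round(t_stat, 3), round(df, 2))
```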
The logistic transformation is useful for power and sample size calculations in a clinical trial with
a bounded outcome score U as primary endpoint because the classical location-shift alternative is most
often not appropriate. While power and sample size calculations are more difficult, they can be realized
by first specifying the relative effect together with σ .
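
One way to make this concrete is sketched below, under assumptions that are not taken from the paper: equal variances, a two-sided test on the Z-scale, and the textbook normal-approximation formula for two independent groups. The relative effect P is first converted into the standardized shift on the Z-scale via P = Φ(shift/(σ√2)), and that value is plugged into the formula. The function name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(rel_effect, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample test on the Z-scale.

    Assumption: normal approximation with equal variances, not the authors' method.
    """
    nd = NormalDist()
    effect_size = sqrt(2.0) * nd.inv_cdf(rel_effect)  # shift / sigma implied by P
    z_alpha = nd.inv_cdf(1.0 - alpha / 2.0)
    z_beta = nd.inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for p in (0.60, 0.66, 0.75):
    print(p, n_per_group(p))
```

In this simplified continuous-data sketch only the standardized shift enters the sample size; σ itself matters when the alternative is translated back into medians on the U-scale.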
Finally, the logistic transformation is also useful in statistical modeling of bounded outcome scores on
(0, 1). Indeed, the logistic regression model
\[ \log\frac{U}{1-U} = x^{T}\beta + \sigma Z, \qquad (2.1) \]
with Z ∼ N (0, 1), has been used in various applications (Kieschnick and McCullough, 2003). This
approach is especially useful in clinical trial applications when baseline covariate adjustment is envisaged.
Finally, Expression (2.1) can easily be extended to allow σ to depend on covariates, as for example in
Pourahmadi (1999).
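
When all scores lie strictly inside (0, 1) and σ is constant, Model (2.1) can be fitted by ordinary least squares on the logit scale. The sketch below simulates data with a single binary covariate and recovers the coefficients; the simulated design, the true parameter values and the function name `fit_logit_linear` are assumptions made for illustration only.

```python
import random
from math import exp, log
from statistics import mean

def fit_logit_linear(x, u):
    """Least-squares fit of logit(u) = b0 + b1 * x; returns (b0, b1, sigma_hat)."""
    z = [log(ui / (1.0 - ui)) for ui in u]
    xbar, zbar = mean(x), mean(z)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    b0 = zbar - b1 * xbar
    rss = sum((zi - b0 - b1 * xi) ** 2 for xi, zi in zip(x, z))
    return b0, b1, (rss / (len(x) - 2)) ** 0.5       # residual SD estimates sigma

rng = random.Random(1)
x = [i % 2 for i in range(400)]                      # hypothetical treatment indicator
# Simulate U from (2.1) with beta = (0.5, 1.0) and sigma = 1.5.
u = [1.0 / (1.0 + exp(-(0.5 + 1.0 * xi + 1.5 * rng.gauss(0.0, 1.0)))) for xi in x]
b0, b1, sigma_hat = fit_logit_linear(x, u)
print(round(b0, 2), round(b1, 2), round(sigma_hat, 2))
```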
\[ r_i \sim \mathrm{Bin}(U_i, N_i) \quad (i = 1, \ldots, n) \qquad (3.1) \]
with Ui ∼ LN(µ, σ²) and say that ri has a BLN distribution. For each value of Ui, one observes Ni
binary outcomes Wij (j = 1, ..., Ni) summing up to ri. For the compliance example, the recorded
adherence is the observed proportion of days that the patients take their medication correctly
(with respect to dosage and timing) in a period of Ni days. In this case, the Ui could be interpreted
as the (true but unobserved) latent adherence of the ith patient to the drug. Observe that this model is
actually a classical measurement error model (Carroll and others, 1995), specifying the distribution
f(Y | U).
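
The marginal probability of an observed count under Model (3.1) has no closed form, since the latent Ui must be integrated out. The sketch below approximates it with a simple trapezoidal rule over the standard normal variable behind Ui. The integration scheme, the function name `bln_pmf` and the parameter values are assumptions chosen for illustration; the text here does not say which numerical method the authors used.

```python
from math import comb, exp, pi, sqrt

def bln_pmf(r, n_trials, mu, sigma, grid=2001, width=8.0):
    """Approximate P(r | N, mu, sigma) under the BLN model.

    Integrates Bin(r; N, expit(mu + sigma * z)) * phi(z) over z in [-width, width]
    with the trapezoidal rule (an illustrative choice of quadrature).
    """
    step = 2.0 * width / (grid - 1)
    total = 0.0
    for k in range(grid):
        z = -width + k * step
        u = 1.0 / (1.0 + exp(-(mu + sigma * z)))     # latent score U on (0, 1)
        binom = comb(n_trials, r) * u ** r * (1.0 - u) ** (n_trials - r)
        weight = 0.5 if k in (0, grid - 1) else 1.0  # trapezoid end-point weights
        total += weight * binom * exp(-0.5 * z * z) / sqrt(2.0 * pi)
    return total * step

# Distribution of the number of correct dosing days out of N = 30 (made-up values).
probs = [bln_pmf(r, 30, mu=1.0, sigma=1.8) for r in range(31)]
print(round(sum(probs), 6))
```

Summing the logarithm of `bln_pmf` over patients would give a log-likelihood that a numerical optimizer could maximize.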
Model (3.1) can be extended by replacing µ by x_i^T β to give a “generalized linear mixed-effects model,”
However, our approach also allows a grid varying with the subjects.
The framework of coarsened data has been formalized by Heitjan and Rubin (1991) and Heitjan
(1993). In their terminology, we consider here only deterministic CO. More formally, we assume that
a_{s(i)} ≤ U_i < a_{s(i)+1} when Y_i is recorded. For the likelihood this implies the following expression:
\[ L(\theta; y) = \prod_{i=1}^{n} \int_{a_{s(i)}}^{a_{s(i)+1}} g(u; \theta)\,du, \qquad (3.2) \]
where g(·) is the probability density function of the LN distribution. This leads to the likelihood
\[ L(\theta; y) \equiv L(\beta, \sigma; y) = \prod_{i=1}^{n} \left[ \Phi\!\left(\frac{z^{(u)}_{s(i)} - x_i^{T}\beta}{\sigma}\right) - \Phi\!\left(\frac{z^{(l)}_{s(i)} - x_i^{T}\beta}{\sigma}\right) \right], \qquad (3.3) \]
where z^{(l)}_{s(i)} = logit(a_{s(i)}), z^{(u)}_{s(i)} = logit(a_{s(i)+1}), and Φ(·) is the distribution function of the standard
normal distribution. At the boundaries, i.e. a_0 = 0 and a_{m+1} = 1, the values of z_{s(i)} become −∞ and
∞, respectively. When σ depends on covariates, the above expression needs to be adapted. For instance,
in the two-group comparison Expression (3.3) splits up in two parts, one with σ1 (first treatment) and
the other with σ2 (second treatment). For obvious reasons, we have called this the “CO approach” either
assuming equal variances or allowing unequal variances.
The maximum likelihood estimates for this model can easily be obtained using standard numerical
optimization procedures such as the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) algorithm (Lange,
2004).
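
The log of Expression (3.3) is straightforward to code. The sketch below is a minimal stdlib-only version for the equal-variance case: each subject contributes the normal probability of the interval, on the logit scale, that contains the latent score. The grid of cut points, the data and the function names are hypothetical; the resulting function could be passed (with its sign reversed) to any general-purpose optimizer such as a BFGS routine.

```python
from math import inf, log
from statistics import NormalDist

PHI = NormalDist().cdf                               # standard normal cdf

def logit(a):
    """Logit with the boundary conventions logit(0) = -inf and logit(1) = +inf."""
    if a <= 0.0:
        return -inf
    if a >= 1.0:
        return inf
    return log(a / (1.0 - a))

def interval_prob(s_i, eta, sigma, cuts):
    """P(a_s <= U < a_{s+1}) when logit(U) ~ N(eta, sigma^2)."""
    upper = PHI((logit(cuts[s_i + 1]) - eta) / sigma)
    lower = PHI((logit(cuts[s_i]) - eta) / sigma)
    return upper - lower

def co_loglik(beta, sigma, cuts, s, X):
    """Log of (3.3); s[i] is the interval index of subject i, X[i] its covariate row."""
    total = 0.0
    for s_i, x_i in zip(s, X):
        eta = sum(b * x for b, x in zip(beta, x_i))  # x_i^T beta
        total += log(interval_prob(s_i, eta, sigma, cuts))
    return total

cuts = [0.0, 0.25, 0.5, 0.75, 1.0]                   # hypothetical grid a_0, ..., a_{m+1}
s = [0, 1, 3, 2, 3]                                  # made-up interval indices
X = [[1, 0], [1, 0], [1, 1], [1, 0], [1, 1]]         # intercept and treatment indicator
value = co_loglik(beta=[0.2, 0.8], sigma=1.5, cuts=cuts, s=s, X=X)
print(round(value, 4))
```

For unequal variances, the same function can be evaluated separately per treatment group with its own σ and the two log-likelihood parts added.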
where −∞ < θ0 ≤ θ1 ≤ ··· ≤ θ(m−1) < ∞ are unknown ordered cut points that need to be estimated
from the data.
Suppose that, after standardization to the [0, 1] interval, Yi satisfies the CO model of Section 3.2 and
σ does not depend on covariates. Then (3.2) and (3.3) imply that
4. SIMULATION STUDY
In this section, we describe a simulation study that we have performed to evaluate our proposals in Sections
3.1 and 3.2 for analyzing bounded outcome scores on [0, 1] in the two-group situation and under the
location-shift alternative on the transformed scale. For more details, we refer to supplementary material
available at Biostatistics online (http://www.biostatistics.oxfordjournals.org).
78 E. L ESAFFRE AND OTHERS
Coarsened data on [0, 1]. First we summarize the results when there are no covariates. When σ1 = σ2 ,
the type I error was well preserved for all approaches. Further, overall the power of the CO2-approach
was less than for the other approaches, which is natural because the other approaches are developed under
the assumption of equal variances. In all cases, the treatment effect was estimated without bias. When
σ1 ≠ σ2, the type I error was well preserved for the CO2-approach, but was sometimes severely increased
for the other approaches. For the CO1- and the OP regression models, the reason is that the treatment
effect is sometimes estimated with a large bias. The anti-conservative character of the Wilcoxon test is
explained by its relationship with ordinal logistic regression (McCullagh, 1980). The power of the CO2-
approach was sometimes much less than for the other approaches, but this can be explained by their anti-
conservative character. Indeed, the power of the other approaches was even higher than the corresponding
power obtained from the Welch test, i.e. when no CO is involved.
When covariates are available, the first and obvious conclusion is that the power can be greatly improved,
depending on the relationship of the covariates with the response. Apart from that, the conclusions
are similar to those reported above.
Discussion. We expected the BLN approach, and especially the CO approach, to be more powerful than
OP regression since the latter requires more parameters to be estimated. However, the simulation results
5. APPLICATIONS
5.1 A compliance-enhancing intervention study: THAMES study
Recently, an open-label, multicenter compliance-enhancing intervention (THAMES) study was completed
in Belgium to measure the effect of a program of pharmaceutical care, designed to enhance adherence to
atorvastatin treatment. Four well-defined districts were identified, two in Flanders (northern Belgium)
and two in Wallonia (southern Belgium). In both Flanders and Wallonia, all pharmacists in one of the
districts were to apply measures to improve compliance and enhance persistence, whereas in the second
district no such measures were taken. There were 187 patients in the intervention group and 182 patients
in the control group. All pharmacists were equipped with the Medication Event Monitoring System
(MEMS), an electronically monitored pharmaceutical package designed to compile the dosing
histories of ambulatory patients taking oral medications (Urquhart, 1997). The total study duration was
12 months. The number of visits to the pharmacy ranged from 5 to 13. At each visit, the patient’s dosing
history was checked by means of the electronic monitoring system. The period between the first and
second visit was considered to be the baseline period. More details on the setup of this intervention study
can be found in Vrijens and others (2006).
The primary efficacy parameter of the THAMES study was adherence to prescribed therapy in the
post-baseline period, whereby adherence was defined for each patient as the proportion of days during
which the MEMS record showed that the patient had opened the pill container correctly. This variable
was also estimated at baseline (baseline adherence). Finally, for the calculation of the “post-baseline
adherence” the post-baseline period was arbitrarily cut off at day 300.
Baseline covariates. The THAMES study could not be randomized due to practical difficulties. Therefore,
we need to compare the baseline covariates of the intervention and control groups. In Table 1, we
Table 1. THAMES study: comparison of baseline covariates between the two groups. Frequencies
(percentages) are reported for categorical variables and means for continuous variables
Efficacy comparison. Initially, we checked for a treatment effect without correcting for any baseline
covariates. Both the Wilcoxon test and the BLN model gave a significant intervention effect with p <
0.001. Figure 3 shows histograms for the two treatment groups with superimposed kernel estimates and
fitted LN distributions. The LN distributions provide a good fit to the observed data. Since Δ̂ = 1.054
and σ̂ = 1.848, the estimated effect size Δ̂/σ̂ is equal to 0.57 with 95% CI (0.36, 0.79), supporting the
intervention effect. The same conclusion can be drawn from P̂ = 0.66 with 95% CI (0.60, 0.71).
We then re-analyzed the data taking into account baseline covariates, including the logit of baseline
adherence. Table 2 shows the results for the BLN model. The effect of intervention and baseline adherence
are both highly significant (p < 0.001). The estimated value of Δ decreased from 1.054 to 0.818 after
taking into account the imbalance at baseline. Further, the estimated value of σ decreased from 1.848 to
1.433 because the inclusion of covariates decreased the residual variability. As a result, the intervention
effect remained at Δ̂/σ̂ = 0.57 with 95% CI (0.35, 0.79) and P̂ = 0.66 with 95% CI (0.60, 0.72). An
additional analysis, allowing for unequal variances for the latent adherence score, resulted in practically
identical parameter estimates and almost equal variances.
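
The reported point estimates can be reproduced from the quoted shift and σ estimates. The sketch below assumes the equal-variance relation between the relative effect and the effect size from Section 2, i.e. the relative effect equals Φ applied to the effect size divided by √2; the function name `summarize` is illustrative, and the confidence intervals are not recomputed because they require the covariance matrix of the estimates.

```python
from math import sqrt
from statistics import NormalDist

def summarize(shift_hat, sigma_hat):
    """Return (effect size, relative effect), both rounded to two decimals."""
    effect_size = shift_hat / sigma_hat
    rel_effect = NormalDist().cdf(effect_size / sqrt(2.0))
    return round(effect_size, 2), round(rel_effect, 2)

unadjusted = summarize(1.054, 1.848)                 # without baseline covariates
adjusted = summarize(0.818, 1.433)                   # with baseline covariates
print(unadjusted, adjusted)                          # (0.57, 0.66) (0.57, 0.66)
```

Both the shift and σ shrink after covariate adjustment, by roughly the same factor, which is why the effect size and the relative effect are unchanged to two decimals.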
Thus, we can conclude that the intervention significantly improves the adherence of the patients despite
the fact that the two groups differed in adherence already at baseline. Finally, we did not fit an OP
regression model, for reasons stated above.
Table 2. THAMES study: parameter estimates, standard errors, and p-values for the BLN model for the
compliance data
the latent score for most if not all patients is less than 1. This is medically supported since an observed
score of 1 does not necessarily imply complete neurologic recovery of the patient. Also, patients who
survived a stroke with an observed score of zero may have a true latent score close to, rather than equal to,
zero, whereas non-survivors can be considered as having a true score of zero. One way to treat “death” in
the analysis is to regard it as a separate class and hence to distinguish it from the zero values of survivors.
Table 3. ECASS-1 study: parameter estimates, standard errors, and p-values for the CO2-approach
model for the Barthel index
However, in a clinical trial context this approach does not give a clear picture of the overall effect of
6. ALTERNATIVE APPROACHES
In this section, we will consider only the case of equal variances. When the logistic transformation yields
a logistic distribution on the z-scale, then the effect size Δ/σ can be interpreted as a log-odds ratio for the
cumulative distributions on the z-scale, i.e.
\[ \frac{\Delta}{\sigma} = \log \frac{F_0(z)/(1 - F_0(z))}{F_\Delta(z)/(1 - F_\Delta(z))}, \qquad (6.1) \]
where F0(z) and FΔ(z) are the cumulative logistic distributions under H0 and Ha, respectively. Thus,
a generalization of the proportional odds model to continuous data is obtained. Further, a more robust
approach would be to use the logistic t-distribution instead of the LN (Lange and others, 1989). Of course,
for scores on (0, 1) a part of the attractiveness of the approach is lost because we would need to replace
classical techniques, like the t-test, by more sophisticated ones.
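
The log-odds interpretation in (6.1) can be verified numerically: for two logistic distributions with a common scale and locations differing by a fixed shift, the log-odds ratio of the cumulative distribution functions is the same at every z and equals the shift divided by the scale. In the sketch below σ is taken to be the scale parameter of the logistic distribution, which is an assumption about the parametrization; the function names and values are illustrative.

```python
from math import exp, log

def logistic_cdf(z, location, scale):
    return 1.0 / (1.0 + exp(-(z - location) / scale))

def log_odds_ratio(z, mu, delta, sigma):
    """Log-odds ratio of the cdf under H0 versus the cdf under a location shift delta."""
    f0 = logistic_cdf(z, mu, sigma)                  # under H0
    fd = logistic_cdf(z, mu + delta, sigma)          # under Ha
    return log((f0 / (1.0 - f0)) / (fd / (1.0 - fd)))

# The ratio does not depend on z: each value should equal delta / sigma = 0.5.
values = [log_odds_ratio(z, mu=0.3, delta=1.0, sigma=2.0) for z in (-3.0, 0.0, 2.5)]
print([round(v, 6) for v in values])
```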
Another extension to the LN distribution is obtained by changing the logistic transformation. Aitchison
and Lauder (1985) suggested a Box-Cox transformation. For the analysis of clinical trial data, we
suspect that this extra flexibility has little to offer. Trigonometric functions like arctan (suitably scaled to
[0, 1]) provide another class of transformations to normality. Dexter and Chestnut (1995) experimented
with the arc sine square root transformation for the analysis of VAS pain data. This is an interesting
transformation since it attains its boundary values. Finally, one referee pointed out that OP regression
allows the possibility that an arbitrary transformation of the underlying distribution is normally dis-
7. CONCLUSION
The logistic transform is one of the most used transformations in statistics. Hence, we do not claim
originality when proposing the BLN and CO approaches. However, we argue that the strategy to analyze
bounded outcome scores as laid down in this paper has some advantages over other well-established
approaches, like OP and logistic regression. First, our strategy makes the distinction between the two
types of bounded outcome scores. Second, we believe that explicitly drawing the relationship with a
latent normal variable on the transformed scale will help practitioners in planning a clinical trial even
when they decide to use an ordinal regression model for the analysis. Third, our approach is quite
flexible. The extension to unequal variances could be quite important in practice. Other extensions, like
allowing for a varying grid of cut points, are also easy to do. Finally, we have developed an
approach to perform sample size calculation based on the CO approach, see Tsonaka and others (2005).
Given that the power of the OP regression model is close to that of the CO approach in the case of
equal variances, we believe that this approach is also useful for sample size calculations for ordinal
regression.
With respect to further research, we believe that models for repeated bounded outcome scores either
using a multivariate approach as suggested by Aitchison and Shen (1980) or incorporating random effects
could be useful. In this context, it would also be useful to draw the connection with random-effects OP
regression. Further, it is of interest to develop an approach which can handle a bounded outcome score as
a covariate in a regression model. Recently, Liang and others (2005) have published a related paper that
deals with a bounded covariate. Finally, we intend to explore various procedures to analyze VAS data,
again allowing the variance to depend on covariates.
ACKNOWLEDGMENTS
The authors acknowledge support from the Interuniversity Attraction Poles Program P5/24—Belgian
State—Federal Office for Scientific, Technical, and Cultural Affairs. The authors also thank Pfizer
Belgium for the permission to use the data of the THAMES study and Boehringer Ingelheim for the
permission to use the ECASS-1 data. Finally, the authors also acknowledge the comments of a referee,
the associate editor, and the editor which improved the readability of the paper considerably. Conflict of
Interest: None declared.
REFERENCES
AITCHISON, J. AND BEGG, C. (1976). Statistical diagnosis when cases are not classified with certainty. Biometrika
63, 1–12.
AITCHISON, J. AND LAUDER, I. J. (1985). Kernel density estimation for compositional data. Applied Statistics 34,
129–37.
AITCHISON, J. AND SHEN, S. (1980). Logistic-normal distributions: some properties and uses. Biometrika 67,
261–72.
BRUNNER, E. AND MUNZEL, U. (2000). The non-parametric Behrens–Fisher problem: asymptotic theory and
small-sample approximation. Biometrical Journal 42, 17–25.
BUTLER, B. AND LOUIS, T. (1992). Random effects models with non-parametric priors. Statistics in Medicine 11,
TSONAKA, R., RIZOPOULOS, D. AND LESAFFRE, E. (2005). Power and sample size calculations for discrete
bounded outcome scores (submitted).
URQUHART, J. (1997). The electronic medication event monitor: lessons for pharmacotherapy. Clinical
Pharmacokinetics 32, 345–56.
VRIJENS, B., BELMANS, A., MATTHYS, K., KLERK, E. D. AND LESAFFRE, E. (2006). Effect of intervention
through a pharmaceutical care program on patient adherence with prescribed once-daily atorvastatin.
Pharmacoepidemiology and Drug Safety 15, 115–21.
WAINER, H. AND BROWN, L. (2004). Two statistical paradoxes in the interpretation of group differences: illustrated
with medical school admission and licensing data. American Statistician 58, 117–23.
WELCH, B. (1951). On the comparison of several mean values. Biometrika 38, 330–6.
[Received September 30, 2004; first revision June 8, 2005; second revision January 17, 2006;
third revision February 17, 2006; fourth revision March 16, 2006 ]