Wang 2015

Computational Statistics and Data Analysis 83 (2015) 140150
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis

journal homepage: www.elsevier.com/locate/csda
Regression analysis of bivariate current status data under the

Gamma-frailty proportional hazards model using the EM
algorithm
Naichen Wang a , Lianming Wang a, , Christopher S. McMahan b
a
Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA
article
info
Article history:
Received 6 December 2013
Received in revised form 8 August 2014
Accepted 13 October 2014
Available online 22 October 2014
Keywords:
Current status data
EM algorithm
Frailty model
Monotone splines
Multivariate regression
Poisson latent variables
Proportional hazards model
abstract
The Gamma-frailty proportional hazards (PH) model is commonly used to analyze correlated survival data. Despite this models popularity, the analysis of correlated current status
data under the Gamma-frailty PH model can prove to be challenging using traditional techniques. Consequently, in this paper we develop a novel expectationmaximization (EM)
algorithm under the Gamma-frailty PH model to study bivariate current status data. Our
method uses a monotone spline representation to approximate the unknown conditional
cumulative baseline hazard functions. Proceeding in this fashion leads to the estimation
of a finite number of parameters while simultaneously allowing for modeling flexibility.
The derivation of the proposed EM algorithm relies on a three-stage data augmentation
involving Poisson latent variables. The resulting algorithm is easy to implement, robust to
initialization, and enjoys quick convergence. Simulation results suggest that the proposed
method works well and is robust to the misspecification of the frailty distribution. Our
methodology is used to analyze chlamydia and gonorrhea data collected by the Nebraska
Public Health Laboratory as a part of the Infertility Prevention Project.
2014 Published by Elsevier B.V.
1. Introduction
Current status data, also referred to as case 1 interval-censored data, commonly arise in many epidemiological, social,
and medical studies. Current status data are characterized by the fact that the failure time of interest is not directly observed, but is known to occur either before or after an examination time. In other words, the failure times are either left- or
right-censored. Data of this form typically result from studies in which subjects (i.e., study units) are examined only once
during the course of an investigation. For example, in tumorigenicity studies conducted by the National Toxicology Program,
rats are exposed to different test agents in an effort to assess their toxicity. Researchers then examine the animals organs
for tumors at the time of their death. Consequently, the tumor onset time for a particular animal is never known exactly,
but rather is known relative to the animals time of death; i.e., the tumor onset time is either before or after the time of
death. The statistical literature is replete with methods of analyzing current status data pertaining to a single failure time of
interest. The focus of this paper will be to extend existing univariate techniques to allow for the analysis of bivariate current
status data; i.e., the situation in which two correlated failure times of interest are both either left- or right-censored.
Corresponding author.
E-mail address: wang99@mailbox.sc.edu (L. Wang).
http://dx.doi.org/10.1016/j.csda.2014.10.013
0167-9473/ 2014 Published by Elsevier B.V.
N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150
141
The primary goals of analyzing multivariate current status data are usually centered around the estimation of the survival
functions, assessing the significance of covariate effects, and estimating the correlation between failure times. Most existing
approaches for regression analysis of correlated current status data can be classified into two categories: marginal or frailty
modeling techniques. Existing methods which do not fall into these categories include the techniques presented in Wang
et al. (2008) and Kim (2014), which make use of copula and multistate models, respectively.
The marginal likelihood approach (Wei et al., 1989), which ignores the correlation between the failure times, has been
a popular method of analyzing multivariate survival data. Various regression models have been proposed and investigated
for studying current status and interval-censored data using this modeling technique. For example, Goggins and Finkelstein
(2000) and Kim and Xue (2002) studied correlated interval-censored data using the marginal proportional hazards (PH)
model. Wang et al. (2006) proposed a goodness-of-fit test for the marginal Cox model for multivariate interval-censored
data. Chen et al. (2007) and Tong et al. (2008) investigated the marginal proportional odds model and the marginal additive
hazards model, respectively. Although the marginal likelihood models provide robust inferences in general, it does not
account for the correlation that naturally exists between the multiple failure times.
In order to acknowledge the underlying correlation structure, frailty models are commonly used to jointly model multivariate survival data; i.e., one or more frailty terms are introduced in order to describe the dependence structure between
the multiple responses. For example, Dunson and Dinse (2002) proposed a probit model with normal frailty for bivariate current status data with informative censoring. Komarek and Lesaffre (2007) proposed a frailty accelerated failure time model
for correlated interval-censored data. Zuma (2007) explored the Gamma-frailty Weibull model for multivariate intervalcensored data. Chen et al. (2009) studied multivariate current status data under the PH model with a normal frailty. Lin
and Wang (2011) proposed a Bayesian proportional odds model with a shared gamma frailty for bivariate or clustered
current status data. Callegaro and Lacobeli (2012) proposed a Cox model with log-skewed-normal frailties for clustered
right-censored data. Chen et al. (2014) studied multivariate current status data under univariate informative censoring. For
a more in depth review of frailty modeling techniques in survival analysis, please see Hougaard (2000), Ibrahim et al. (2008),
and Wienke (2012).
The Gamma-frailty PH model, as a special frailty model, has been widely used for studying correlated survival data.
For the purposes of analyzing multivariate right-censored data, the Gamma-frailty PH model has become quite popular;
e.g., see Klein (1992), Andersen et al. (1997), Rondeau et al. (2003), Cui and Sun (2004), and Yin and Ibrahim (2005) among
many others. In contrast, the Gamma-frailty PH model is less frequently adopted for studying multivariate current status
or interval-censored data, due to the complex structure of these types of data. For clustered current status data, Wen and
Chen (2011) proposed a computational method under the Gamma-frailty PH model, which involves iteratively updating the
regression and frailty parameters through a NewtonRaphson technique, while the parameters associated with the cumulative baseline hazard functions are updated through self-consistency equations. Chang et al. (2007) proposed a correlated
Gamma-frailty PH model for clustered current status data. Hens et al. (2009) developed an estimation approach that makes
use of a shared and correlated Gamma-frailty PH models for bivariate current status data. Considering the popularity of
the Gamma-frailty PH model, further efforts to develop flexible techniques of analyzing multivariate current status and
interval-censored data under this model are obviously needed.
The primary goal of this work is to develop a precise, flexible, and computationally efficient method that can be used to
analyze correlated bivariate current status data under the Gamma-frailty PH model. To provide adequate modeling flexibility, the monotone splines of Ramsay (1988) are adopted to model the unknown conditional cumulative baseline hazard
functions. Through a three-stage data augmentation procedure, involving latent Poisson random variables, a novel EM algorithm is derived for the purposes of model fitting. At each iteration of our algorithm the spline coefficients are updated
in closed form, with the regression parameters being updated through solving a low-dimensional system of equations. Additionally, all of the expectations involved in the E-step of our algorithm are available in closed form. These features allow
our approach to be very computationally efficient, when compared to other competing methods, especially for the analysis
of large data sets as is illustrated in our simulation and data analysis sections. Further our proposed technique is easy to
implement and robust to initialization.
The remainder of this article is organized as follows. In Section 2, we present the Gamma-frailty PH model, the observed
data likelihood, and describe how monotone splines are used to approximate the unknown conditional cumulative baseline
hazard functions. In Section 3, we present the derivation of our EM algorithm. In Section 4 we illustrate the performance
of our method through a simulation study and in Section 5 we present the results from a real data application with a large
sample size. Section 6 concludes with a summary and discussion.
2. Model, data and likelihood
2.1. Gamma-frailty PH model
Let T1 and T2 denote two failure times of interest. Under the Gamma-frailty PH model, the conditional cumulative hazard
function for Tj , given the frailty , can be expressed as
j (t |x, ) = 0j (t ) exp(x j ),
for j = 1, 2,
(1)
142
where Ga( 1 , 1 ), 0j () is the conditional cumulative baseline hazard function, x is a vector of covariates, and
j is the corresponding vector of regression parameters. The frailty distribution is taken to have the same shape and rate
parameters so that the mean of the frailty distribution is 1. This constraint is necessary for identifiability reasons. In the above
model, it is assumed that T1 and T2 are conditionally independent given the frailty term . Further, the conditional cumulative
distribution function for Tj given the frailty takes the form Fj (t |x, ) = 1 exp{0j (t ) exp(x j )}. Theoretically for
Fj (t |x, ) to be proper, 0j () must be a nondecreasing function with 0j (0) = 0 and limt 0j (t ) = . The latter
characteristic can be relaxed when one considers a finite time domain; e.g., when conducting data analysis.
The Gamma-frailty PH model has several desirable properties. First, greater modeling flexibility can be obtained since
the conditional cumulative baseline hazard functions (i.e., 0j (), for j = 1, 2) are not required to have a specific form. A
discussion on how to exploit this characteristic is provided in Section 2.3. Second, the marginal and joint survival functions
can be expressed in closed form as
Sj (t |x) = P (Tj > t |x) = 1 + 0j (t ) exp(x j )
for j = 1, 2,
(2)
S (t1 , t2 |x) = P (T1 > t1 , T2 > t2 |x) = 1 + 01 (t1 ) exp(x 1 ) + 02 (t2 ) exp(x 2 )
respectively. Notice, it can be ascertained from (2) that Tj marginally follows a generalized odds-rate hazards model (Scharfstein et al., 1998; Banerjee et al., 2007). Consequently, the regression parameter j can also be interpreted as the marginal
covariate effects on Tj under the generalized odds-rate hazards model. Finally, the Gamma-frailty PH model also provides
a closed-form expression for the statistical association between the two failure times in terms of Kendalls with = +
2
(Hougaard, 2000; Sun, 2006). Kendalls is defined as
= E [sign{(Ti1 Tj1 )(Ti2 Tj2 )}],

where (Ti1 , Ti2 ) and (Tj1 , Tj2 ) are two independent and identically distributed copies of (T1 , T2 ) and sign() is the usual sign
function; i.e., this function takes values 1, 0, and 1 when the argument is positive, zero, and negative, respectively. It is
important to notice that under the Gamma-frailty PH model is a deterministic function of . Therefore, the association
(i.e., ) between the two failure times can be accurately estimated as long as a good estimate of is available.
2.2. Data structure and observed likelihood function
In order to derive the observed data likelihood, we proceed under several common assumptions as in Chen et al. (2009),
Hens et al. (2009), and Wen and Chen (2011) among others. In particular, we assume that the two failure times are subject to
univariate censoring at the observation time C , and moreover that Tj , given the covariates, is independent of C , for j = 1, 2.
Let j = 1(Tj C ) denote the censoring indicator for Tj , where 1() denotes the usual indicator function; i.e., j = 1 if the failure
time is left-censored and j = 0 if it is right-censored. Consequently, the observed data from a study consisting of n subjects
can be succinctly expressed as {(ci , xi , i1 , i2 ), i = 1, . . . , n}, where each (ci , xi , i1 , i2 ) is an independent realization of
(C , x, 1 , 2 ). Under the Gamma-frailty PH model and the aforementioned assumptions, the observed data likelihood is
Lobs () =
n
2
i =1
{1 Sj (ci |xi , i )}ij {Sj (ci |xi , i )}(1ij ) g (i | )di ,
j =1
where denotes the unknown parameters in the model, Sj (t |xi , i ) = 1 Fj (t |xi , i ), and g (| ) is the probability density
function of a gamma random variable whose shape and rate parameters both equal 1 . Using the expressions that were
presented in Section 2.1 for the marginal and joint survival functions, one can rewrite the observed data likelihood, after
integrating out the frailty terms, as
Lobs () =
S (ci , ci |xi )
iA1
{S1 (ci |xi ) S (ci , ci |xi )}
iA2
{S2 (ci |xi ) S (ci , ci |xi )}
iA3
{1 S1 (ci |xi ) S2 (ci |xi ) + S (ci , ci |xi )},
(3)
iA4
where A1 = {i : i1 = 0, i2 = 0}, A2 = {i : i1 = 0, i2 = 1}, A3 = {i : i1 = 1, i2 = 0}, and A4 = {i : i1 = 1, i2 = 1}.

2.3. Monotone splines representation of 0j ()
The unknown parameters in (3) involve the regression parameters j , the frailty variance , and the nondecreasing
functions 0j (), for j = 1, 2. It is important to note that 0j () is an infinite dimensional parameter. Following the work of
Cai et al. (2011) and McMahan et al. (2013), we propose to model these unknown functions using the monotone splines of
Ramsay (1988). Specifically, we assume that 0j () can be written as,
0j (t ) =
l =1
jl Ijl (t ),
(4)
143
where Ijl (), for l = 1, . . . , k, is a monotone spline basis function and jl is the corresponding spline coefficient. In particular, the spline basis functions are nondecreasing piecewise polynomials ranging from 0 to 1 and the spline coefficients are
restricted to be nonnegative (i.e., jl 0). Proceeding in this fashion ensures the monotonicity of 0j ().
The monotone spline basis functions Ijl s are fully determined once their degree and knot set have been specified (Ramsay,
1988). The degree of the basis functions controls the overall smoothness of the splines; e.g., specifying the degree to be 1,
2, or 3 corresponds to the splines being linear, quadratic, or cubic, respectively. The knot set is typically comprised of an
increasing sequence of values within the data range, and in conjunction with the degree controls the shape of the splines.
Given the degree and knot set, the number of corresponding basis functions (k) is equal to the degree plus the number of
interior knots.
Since both failure times are subject to the same censoring time, we are modeling both of the conditional cumulative
baseline hazard functions over the same time domain. Consequently, it is reasonable to use the same set of basis functions
to model both 01 () and 02 (). Thus, to simplify our notation we write Ijl (), for j = 1, 2, as Il () from henceforward. It has
been our experience that specifying the degree of the basis functions to be either 2 or 3 provides adequate smoothness (Lin
and Wang, 2010; Cai et al., 2011; Wang and Lin, 2011). Further, we recommend that the knot set consist of a fixed number of
equally spaced points between the minimum and maximum of the censoring times. Model selection criteria, such as Akaikes
information criterion (AIC) or the Bayesian information criterion (BIC), can be used to determine the appropriate number
of knots, as was demonstrated in Rosenberg (1995) and McMahan et al. (2013). An alternative approach would be to treat
both the number and position of the knots as unknown parameters and optimize over them according to some selection
criterion as in Shen (1998) among others. Methods based on this strategy are usually computationally burdensome and time
consuming.
3. The proposed method
3.1. Data augmentation steps
In lieu of the monotone spline representation of the conditional cumulative baseline hazard functions, the unknown
parameters in (3) are = (1 , 2 , 1 , 2 , ) , where j = (j1 , . . . , jk ) for j = 1, 2. Consequently, one could obtain
an estimator of by directly maximizing the observed data likelihood; i.e., the maximum likelihood estimate (MLE) could
be obtained as
= argmax Lobs (). However, numerically maximizing (3) with respect to is challenging because the
spline coefficients are constrained to be nonnegative. Further, we have found that it is very difficult to provide good
initial values for constrained optimization procedures, and even with good initial values being specified, it is common that
numerical optimization techniques (e.g., Newton type algorithms) often fail to converge. To obviate these difficulties, we
have developed a novel EM algorithm for the purpose of obtaining the MLE of .
The derivation of our proposed EM algorithm relies on a three-stage data augmentation involving latent Poisson random
variables. In the first stage we introduce the individual frailty terms (i.e., i , for i = 1, . . . , n) as latent random variables,
and obtain the following conditional likelihood
L1 () =
g (i | )
i=1
{1 Sj (ci |xi , i )}ij {Sj (ci |xi , i )}(1ij ) .
(5)
j =1
The second stage involves relating the censoring indicator ij to a latent Poisson random variable zij ; i.e., we introduce
zij |i P {|0j (ci ) exp(xi j )i } so that ij = 1(zij >0) , where P (|a) denotes the probability mass function for the Poisson
distribution with mean a. Consequently, the conditional likelihood of the observed data and the latent variables can be
expressed as
L2 () =
g (i | )
i=1
1(z >0)
ij
ij
(1 ij )
1(z =0)
ij
P {zij |0j (ci ) exp(xi j )i }.
(6)
j =1
In the final stage, we decompose each zij as the sum of k independent latent Poisson random variables; i.e., we let zij =
zijl , where zijl |i P {|jl Il (ci ) exp(xi j )i }, for l = 1, . . . , k. At this layer, we arrive at the conditional likelihood that
we refer to as the complete data likelihood, which is given by
l =1
Lc () =
g (i | )
i=1
j =1
1(z >0)
ij
ij
1(z =0)
ij
(1 ij )
zij =
l=1
zijl
P {zijl |jl Il (ci ) exp(xi j )i }.
(7)
l =1
It is important to notice that by integrating all of the zijl s out of (7) one would obtain (6), similarly integrating all of the zij s
out of (6) results in (5), and finally integrating all of the i s out of (5) leads back to the observed data likelihood.
3.2. The EM algorithm
To develop our EM algorithm we will view (7) as our complete data likelihood, in which the latent variables (i.e., the zij s,
zijl s, and i s) are viewed as missing data. The E-step in our EM algorithm involves taking the expectation of logLc () with
144
respect to all of the latent variables conditional on the observed data, which we denote by D , and the current parameter
(m)
, (2m) , (1m) , (2m) , (m) ) . This yields Q (, (m) ) = H1 (, (m) ) + H2 (, (m) ) + H3 ( (m) ), where
n
2
k
{log(jl ) + xi j }E (zijl ) jl Il (ci ) exp(xi j )E (i ) ,

H1 (, (m) ) =
estimate (m) = (1
i=1 j=1 l=1
H2 (, (m) ) = n 1 log( 1 ) n log{ ( 1 )} + 1
[E {log(i )} E (i )],
i=1
(m)
(m)
and H3 ( ) is a function of
but is free of . Notice, for notational brevity we have suppressed the conditioning
arguments in the above expectations; i.e., it is understood that E () = E (|D , (m) ) from henceforth. All of the conditional
expectations in H1 (, (m) ) and H2 (, (m) ) have closed form expressions and are provided in the Appendix A, along with a
brief sketch of their derivation. The M-step in our algorithm then finds (m+1) as the maximizer of Q (, (m) ); i.e., (m+1) =
argmax Q (, (m) ). First we note that H1 (, (m) ) is free of , consequently one can obtain (m+1) by directly maximizing
H2 (, (m) ), a univariate function of . It can be shown that this maximizer is unique. Since is positive, the maximization of
H2 (, (m) ) with respect to can be accomplished by either using constrained maximization routines (e.g., optim in R) or by
using unconstrained optimization routines (e.g., Newton related algorithms) after a reparameterization; e.g., = log().
It has been our experience that the former approach is typically more reliable, when compared to the latter. To obtain the
updates of j and j , for j = 1, 2, we first note that H2 (, (m) ) is free of the regression parameters and spline coefficients, so
we need only maximize H1 (, (m) ) with respect to the j s and j s. To this end, we consider the following partial derivatives,
n
H1 (, (m) )
=
E (zij ) E (i )0j (ci ) exp(xi j ) xi , for j = 1, 2,
j
i =1
n
(m)
H1 (, )
=
jl 1 E (zijl ) Il (ci ) exp(xi j )E (i ) , for l = 1, . . . , k and j = 1, 2.
jl
i =1
Setting H1 (, (m) )/jl equal to 0 and solving for jl we obtain the maximizer as a function of j , which can be expressed
as
n
(m+1)
jl
E (zijl )
i=1
(j ) =
for l = 1, . . . , k and j = 1, 2.
(8)
E (i )Il (ci ) exp(xi j )
i=1
(m+1)
To find j
(m+1)
for j
(m+1)
, one would then replace jl by jl

(m+1)
; i.e., j
would be obtained as the solution to the following system of equations
E (zij ) E (i )
i =1
(j ) in the system of equations given by H1 (, (m) )/j = 0 and solve
jl(m+1) (j )Il (ci ) exp(xi j ) xi = 0,
for j = 1, 2.
l=1
Since j is unconstrained, the aforementioned system of equations can easily be solved using standard root finding software.
(m+1)
In doing so, jl
(m+1)
is simultaneously determined as jl
(m+1)
expression of jl
= jl(m+1) (j(m+1) ), for l = 1, . . . , k and j = 1, 2. Notice the
automatically satisfies the nonnegative constraint for each j and l.
We now succinctly state our EM algorithm. First, initialize (0) and set m = 0, then repeat the following steps until
convergence:
(m+1)
1. Obtain j
as the solution to the following system of equations,
E (zij ) E (i )
i =1
jl(m+1) (j )Il (ci ) exp(xi j ) xi = 0,
for j = 1, 2.
l=1
(m+1)
2. Calculate jl
= jl(m+1) ((j m+1) ), for l = 1, . . . , k and j = 1, 2.
3. Obtain (m+1) = argmax H2 (; (m) ) and update m = m + 1.

Denote the final value (m+1) at convergence as
. It is easy to show that
satisfies the score equations associated with the
observed data likelihood, and it is therefore the MLE of .
145
3.3. Variance estimation

In order to draw inference, an estimate of the variancecovariance matrix of
has to be obtained. Since the observed
data likelihood exists in closed form, a natural estimator of the variancecovariance matrix would be {I (
)}1 , where I ()
denotes the observed information matrix or the so-called Hessian matrix, i.e., I () = 2 log Lobs / . As one might
expect, analytically expressing the mixed partial derivatives involved in I () can be quite tedious, and likewise evaluating
them can be computationally burdensome. An alternative is to use Louiss method to evaluate I (
), but this approach is also
fraught with the same complexities. A simple and feasible solution is to use existing statistical routines that numerically
approximate the Hessian matrix at . Here we provide a brief description of an approximation that was similarly used in
Zeng et al. (2006) and Lin and Wang (2010). In particular, we propose to approximate Is,l (
), the (s, l)th element of I (
),
using
2
h
log{Lobs (
+ hn es + hn el )} log{Lobs (
+ hn es )} log{Lobs (
+ hn el )} + log{Lobs (
)} ,
n
where
es is a binary vector whose sth element is 1 with all others being 0 and hn is a small tuning constant. In particular, as
the tuning parameter hn goes to zero (i.e., hn 0) the approximation is expected to improve, although numerical instability
can be encountered if hn is taken to be too small. In general, hn should be selected to be of order n1/2 , and we have found that
selecting a decreasing sequence of hn and approximating I (
) at each allows one to establish a range of values for which the
tuning parameter performs well; i.e., a range of hn for which the approximation of I (
) is stable. Proceeding in this fashion
provides a straightforward, reliable, and computationally efficient method of estimating the variancecovariance matrix of
.
4. Simulation study
In order to evaluate the finite sample performance of the proposed methodology, several extensive simulation studies
were conducted. In particular, three scenarios were considered: Scenario (1) considers the situation in which the assumed
model (i.e., the Gamma-frailty PH model) is correctly specified; Scenario (2) investigates the situation in which the frailty
distribution is misspecified; and Scenario (3) examines the situation in which the two failure times share both a common
cumulative baseline hazard function and set of regression parameters. The two former scenarios investigate the performance
of the proposed methodology across a wide variety of settings, while the latter allows for a direct comparison between the
proposed approach and the methodology presented in Wen and Chen (2011).
Under Scenario (1), the following models for T1 and T2 were considered,
Fj (t |x1 , x2 , ) = 1 exp{0j (t ) exp(j1 x1 + j2 x2 )},
for j = 1, 2,
where x1 Bernoulli(0.5), x2 N (0, 0.5 ), G(1, 1), 0j (t ) = log(1 + t ) + t 3/2 , and each of the regression parameters
took on values 0.5 or 0.5, with 11 = 21 and 12 = 22 . The censoring time C was generated from a truncated exponential
distribution E () with support (0, 10). In order to examine different censoring rates, several values of the rate parameter
were considered; i.e., {1, 2, 5}. The censoring indicator, j , was sampled according to a Bernoulli distribution with
success probability F (C |x1 , x2 , ), for each j. Proceeding in this fashion circumvents sampling the failure times directly. For
each of the regression parameter configurations we generated 500 data sets, each containing n = 200 observations.
To fit the proposed model, the degree of the monotone splines was specified to be 3 and a knot set consisting of 5 equally
spaced knots within the minimum and maximum of the censoring times for each data set was considered. The EM algorithm
outlined in Section 3.2 was then used to estimate both the regression and spline coefficients as well as the frailty variance
parameter for each of the data sets. Under all simulation settings, the regression parameters were initialized to be 0, the
initial value of the frailty variance was taken to be 2, and the initial values of the spline coefficients were randomly generated according to an E (1) distribution. Convergence of the algorithm was declared when all of the absolute differences
between consecutive updates for the regression parameters and the frailty variance were less than 104 . To approximate
the observed information matrix, the tuning parameter hn was taken to be 0.01n1/2 . Table 1 summarizes the parameter
estimates resulting from the proposed approach, across a variety of the considered simulation settings. In particular, the settings chosen for Table 1 provide a summary of the performance of the proposed methodology across a variety of censoring
rates. Additional simulation results under Scenario (1) can be found in the web-based supplementary file (see Appendix B).
The summarized results in Table 1 include the average of the 500 MLEs minus their true value (BIAS), the average of the
estimated standard errors (ESE), the sample standard deviation of the 500 MLEs (SSD), and empirical coverage probabilities associated with 95% Wald confidence intervals (CP95). As seen in Table 1, the parameter estimates exhibit little, if any,
evidence of bias, the averaged standard errors are in agreement with the sample standard deviations of the MLEs, and the
coverage probabilities are close to their nominal levels, for all the regression parameters and the frailty variance parameter
under all considered configurations.
Scenario (2) was aimed at investigating how sensitive the proposed techniques is to the gamma frailty assumption. This
simulation considered exactly the same model specifications as in Scenario (1), with the exception that a mixture log-normal
distribution was specified for the frailty term; i.e.,
2
f () = 0.25LN (1, 2) + 0.50LN (1, 0.61) + 0.25LN (0.5, 0.39),
146
Table 1
Simulation results under Scenario (1) with different left-censoring rates (LR). Summarized results include the average of the 500 MLEs minus their true
value (BIAS), the sample standard deviation of the 500 MLEs (SSD), the average of the estimated standard errors (ESE), and empirical coverage probabilities
associated with 95% Wald confidence intervals (CP95), for each parameter.
j1 = 0.5 and j2 = 0.5

LR
BIAS
45.2%
45.2%
30.9%
30.8%
16.2%
16.0%
11
12
21
22
11
12
21
22
11
12
21
22
SSD
ESE
CP95
j1 = 0.5 and j2 = 0.5

LR
BIAS
0.011
0.012
0.001
0.030
0.034
0.347
0.340
0.335
0.374
0.336
0.348
0.352
0.350
0.352
0.367
0.952
0.956
0.952
0.946
0.982
54.3%
0.040
0.366
0.395
0.368
0.384
0.416
0.364
0.372
0.363
0.372
0.414
0.952
0.946
0.946
0.956
0.978
39.9%
0.482
0.532
0.507
0.487
0.716
0.459
0.460
0.457
0.458
0.635
0.934
0.928
0.906
0.926
0.974
22.7%
0.021
0.029
0.038
0.077
0.037
0.037
0.060
0.017
0.187
54.1%
39.7%
22.7%
11
12
21
22
11
12
21
22
11
12
21
22
SSD
ESE
CP95
0.014
0.018
0.354
0.370
0.330
0.355
0.302
0.346
0.351
0.344
0.350
0.346
0.936
0.942
0.968
0.954
0.982
0.011
0.026
0.010
0.044
0.039
0.361
0.355
0.363
0.354
0.355
0.350
0.355
0.352
0.355
0.385
0.956
0.952
0.946
0.966
0.992
0.046
0.048
0.010
0.014
0.088
0.410
0.410
0.400
0.406
0.539
0.403
0.405
0.403
0.402
0.509
0.954
0.956
0.942
0.952
0.976
0.007
0.005
0.013
Table 2
Simulation results under Scenario (2) with different left-censoring rates (LR) when the gamma frailty distribution is misspecified. Summarized results
include the average of the 500 MLEs minus their true value (BIAS), the sample standard deviation of the 500 MLEs (SSD), the average of the estimated
standard errors (ESE), and empirical coverage probabilities associated with 95% Wald confidence intervals (CP95), for each parameter.
j1 = 0.5 and j2 = 0.5

LR
BIAS
43.3%
43.4%
29.0%
37.7%
15.0%
15.0%
SSD
ESE
CP95
j1 = 0.5 and j2 = 0.5

LR
BIAS
11
12
21
22
0.020
11
12
21
22
0.006
0.009
11
12
21
22
11
12
21
22
0.034
0.005
0.025
0.013
0.340
0.358
0.342
0.348
0.340
0.348
0.337
0.346
0.956
0.948
0.948
0.954
52.1%
11
12
21
22
0.015
0.004
0.042
0.370
0.377
0.372
0.377
0.952
0.938
0.940
0.956
29.2%
0.011
0.391
0.398
0.399
0.394
11
12
21
22
0.005
0.060
0.009
0.029
0.485
0.504
0.486
0.505
0.471
0.480
0.464
0.478
0.944
0.944
0.938
0.944
21.1%
52.2%
37.6%
21.4%
SSD
ESE
CP95
0.327
0.339
0.344
0.332
0.332
0.335
0.334
0.336
0.948
0.956
0.950
0.950
0.013
0.006
0.335
0.368
0.375
0.361
0.357
0.362
0.360
0.355
0.968
0.938
0.930
0.958
0.017
0.022
0.011
0.034
0.426
0.428
0.444
0.426
0.414
0.417
0.420
0.415
0.946
0.958
0.948
0.950
0.013
0.030
0.002
where LN (, 2 ) denotes the log-normal distribution with location parameter and scale parameter . The proposed
model was again fit using the same settings as were described in Scenario (1). Table 2 presents a summary of the regression
parameter estimates obtained by the proposed methodology for the same simulation configurations as were considered in
Table 1. These results again exhibit little, if any, bias in the point estimates, the average estimated standard errors remain
in agreement with the sample standard deviation of the MLEs, and the coverage probabilities are again close to 0.95 for all
regression parameters, under all considered settings. This suggests that our proposed methodology can provide accurate
point and variance estimates for all of the regression parameters even when the frailty distribution is egregiously misspecified; i.e., the proposed methodology is robust to the misspecification of the frailty distribution. Additional simulation results
under Scenario (2) can be found in the web-based supplementary file (see Appendix B).
Scenario (3) was designed to compare the proposed approach with the methodology presented in Wen and Chen (2011).
In particular, the technique proposed by Wen and Chen (2011) was designed to analyze clustered current status data under
the Gamma-frailty PH model. This method can also be used to analyze bivariate current status data, under the assumption
that the two correlated failure times share both a common cumulative baseline hazard function and set of regression
coefficients. In order to facilitate this comparison, slight modifications of the methods presented in Section 3 were made
so that these two techniques could be compared. In particular, the modifications allow for different censoring times for the
two events and require 1 = 2 = and 1 = 2 = . For further details pertaining to these modifications, see Section 2
of the web-based supplementary file (see Appendix B). Simulation settings were chosen to emulate the studies conducted
in Wen and Chen (2011). Specifically, the common cumulative baseline hazard function was taken to be 0 (t ) = t, the
frailty distribution was specified to be G(1, 1), and the two censoring times were independently generated according to
U(0,1) . The settings for the regression coefficients and covariate distributions were specified to be the same as in Scenario (1).
147
Table 3
Simulation results from the proposed method and the approach of Wen and Chen (2011) under Scenario (3). Summarized results include the average of
the 500 MLEs minus the true value (BIAS), the sample standard deviation of the 500 MLEs (SSD), the average of the estimated standard errors (ESE), and
empirical coverage probabilities associated with 95% Wald confidence intervals (CP95), for each parameter.
Proposed method
BIAS
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
1
2
1
2
1
2
1
2
0.018
0.010
0.019
0.024
0.018
0.048
0.023
0.020
0.119
0.025
0.035
0.058
Wen and Chen (2011)
SSD
ESE
CP95
0.232
0.246
0.404
0.239
0.234
0.438
0.244
0.265
0.522
0.253
0.261
0.525
0.230
0.230
0.427
0.231
0.232
0.428
0.257
0.250
0.542
0.257
0.253
0.534
0.954
0.936
0.990
0.950
0.942
0.990
0.950
0.950
0.976
0.954
0.948
0.990
BIAS
0.043
0.035
0.097
0.045
0.038
0.120
0.048
0.060
0.214
0.044
0.040
0.124
SSD
ESE
CP95
0.247
0.261
0.445
0.255
0.245
0.473
0.261
0.275
0.572
0.260
0.280
0.570
0.240
0.238
0.461
0.241
0.245
0.473
0.258
0.267
0.594
0.255
0.257
0.569
0.954
0.934
0.966
0.952
0.944
0.956
0.942
0.954
0.994
0.952
0.944
0.980
Convergence for both methods was declared when the difference between consecutive updates of the parameters of interest
were less than 105 . This stricter convergence criterion was chosen because it was the default value used in Wen and
Chens Matlab package for implementing their method. Table 3 summarizes the parameter estimates obtained by the two
competing techniques. In summary, both methods seem to perform well; however, the proposed method seems to perform
slightly better; i.e., the proposed technique results in parameter estimates that exhibit both a smaller bias and variance.
In summary, through simulation the proposed methodology has been shown to be very reliable for the purposes of
analyzing bivariate current status data. Further, through the derivation of a novel EM algorithm, the proposed model can be
fit at a minimal computational expense; e.g., the average time required for the EM algorithm to both converge and estimate
the variancecovariance matrix under Scenarios (1) and (2) was approximately 34 s per data set. Time trials were also
conducted between the proposed approach and the methodology presented in Wen and Chen (2011). The results from these
additional studies, which are provided in the web-based supplementary file (see Appendix B), indicate that the proposed
method is far more computationally efficient when compared to this competing technique, especially for larger sample sizes.
For example, when N = 2000 the proposed approach is approximately 470 times faster than the methodology proposed
in Wen and Chen (2011). This finding suggests that our methodology, unlike many competing approaches, can be used to
analyze relatively large data sets in a timely fashion.
5. Data application
Chlamydia, one of the most common sexually transmitted diseases (STDs), is a bacterial infection that often co-exists
with gonorrhea. Both of these diseases are predominantly asymptomatic and if left untreated, can lead to serious medical
complications, to include sterility, ectopic pregnancy, and pelvic inflammatory disease. Started by the Centers for Disease
Control and Prevention, the Infertility Prevention Project (IPP) funds screening and treatment services for both of these
diseases, throughout the United States. For example, the Nebraska Public Health Laboratory (NPHL), as a part of the
IPP, annually screens approximately 25,000 patients for both of these diseases. Like many national infertility prevention
programs, one of the key components of the IPP is to analyze STD screening data for the purposes of both monitoring trends
in disease prevalence and assessing the effectiveness of the program.
In this analysis, we consider chlamydia and gonorrhea data collected by the NPHL during the calendar years of 2008 and
2009, which consist of 23,146 and 27,551 observations, respectively. This data consists of the infection status of each of
the patients, for both diseases. In addition to this information, several covariates were collected on each patient, to include
their age, past sexual behavior, race, etc. In particular, we consider several binary covariates: the patients gender (Gender),
the reason for the clinical visit (StdScreen), whether they have had multiple sexual partners (MultSP), whether they were
Caucasian (C-Amer)/African-American (A-Amer), and whether they presented with symptoms (Symptoms). A summary of
the disease prevalence rates and covariate information for this study population is provided in the web-based supplementary file (see Appendix B). Using this information, we will jointly model the infection times of these diseases, both of which
are not directly observed but are rather known to be either left- or right-censored relative to the age of the patient at the
time of testing. Given the observed predictor information, we believe that it reasonable to proceed under our modeling
assumptions, for further discussion please see Section 6. The purpose of this analysis is two-fold, first we wish to jointly
model the infection times of these diseases for the purposes of identifying the sub-populations that are at higher risk as well
as quantifying the statistical dependence that exists between these two diseases. Second, we wish to further illustrate the
computational efficiency of our proposed methodology, when applied to a relatively large data set; i.e., a data set consisting
of over 50,000 observations.
To implement our proposed methodology, we specified the degree of the monotone splines to be 3 and investigated
several knot sets each consisting of m equally spaced interior knots, where m {1, . . . , 10}. In implementing our method,
148
Table 4
Nebraska IPP: Estimates of the covariate effects and their standard errors (in parenthesis) for the chlamydia (CT) and gonorrhea (GC) data using m equally
spaced interior knots. Also provided are the AIC and BIC values as well as model fitting times (Time), in seconds. The list of covariates considered in
this analysis: gender (Gender), whether the visit was for STD screening (StdScreen), whether the participant had multiple sexual partners (MultSP), was
Caucasian (C-Amer), was African American (A-Amer), and presented with symptoms (Symptoms).
Number of interior knots
CT
GC
m=2
m=3
m=4
m=5
m=6
m=7
Gender
StdScreen
MultSP
C-Amer
A-Amer
Symptoms
0.13 (0.04)
0.13 (0.04)
0.13 (0.04)
0.13 (0.04)
0.13 (0.05)
0.13 (0.04)
0.09 (0.05)
0.41 (0.05)
0.32 (0.06)
0.34 (0.06)
0.45 (0.04)
0.09 (0.05)
0.42 (0.05)
0.32 (0.06)
0.33 (0.06)
0.46 (0.04)
0.09 (0.05)
0.42 (0.05)
0.32 (0.06)
0.33 (0.06)
0.46 (0.04)
0.09 (0.05)
0.42 (0.05)
0.32 (0.06)
0.33 (0.06)
0.46 (0.04)
0.09 (0.06)
0.42 (0.05)
0.32 (0.07)
0.33 (0.07)
0.46 (0.04)
0.09 (0.05)
0.42 (0.05)
0.32 (0.06)
0.33 (0.06)
0.46 (0.04)
Gender
StdScreen
MultSP
C-Amer
A-Amer
Symptoms
0.36 (0.08)
0.37 (0.09)
0.37 (0.08)
0.37 (0.08)
0.37 (0.08)
0.37 (0.08)
0.10 (0.14)
0.64 (0.08)
0.22 (0.16)
1.46 (0.16)
1.49 (0.07)
0.10 (0.22)
0.64 (0.08)
0.22 (0.27)
1.45 (0.24)
1.50 (0.08)
0.09 (0.16)
0.64 (0.08)
0.22 (0.20)
1.45 (0.18)
1.50 (0.08)
0.09 (0.15)
0.64 (0.08)
0.22 (0.18)
1.45 (0.17)
1.50 (0.08)
0.09 (0.14)
0.64 (0.08)
0.22 (0.17)
1.45 (0.16)
1.50 (0.07)
0.10 (0.18)
0.64 (0.08)
0.22 (0.22)
1.45 (0.20)
1.50 (0.08)
2.58 (0.20)
35966.28
36169.46
1796.39
2.48 (0.21)
35898.11
36118.95
1729.99
2.47 (0.20)
35889.97
36128.48
1705.76
2.47 (0.19)
35893.96
36150.13
1659.40
2.47 (0.20)
35899.00
36172.85
1707.14
2.47 (0.20)
35905.12
36196.63
1776.90
AIC
BIC
Time
Table 5
Nebraska IPP: Summary of the two final models. The presented results include the estimated covariate effects, their standard errors, and the corresponding
p-values. The models selected by AIC and BIC make use of m = 4 and 3 interior knots, respectively. The list of covariates considered in this analysis: gender
(Gender), whether the visit was for STD screening (StdScreen), whether the participant had multiple sexual partners (MultSP), was Caucasian (C-Amer),
was African American (A-Amer), and presented with symptoms (Symptoms).
Covariate
m=3
m=4
Chlamydia
Gonorrhea
Est
SE
p.v.
Est
SE
p.v.
Gender
StdScreen
MultSP
C-Amer
A-Amer
Symptoms
0.1336
0.0410
0.0486
0.0481
0.0578
0.0626
0.0397
0.0011
0.0561
0
2.4E8
9.2E8
0
0.3722
0.0872
0.2161
0.0798
0.2654
0.2419
0.0812
2.0E5
0.6571
0
0.4065
2.9E9
0
Gender
StdScreen
MultSP
C-Amer
A-Amer
Symptoms
0.1339
0.0408
0.0485
0.0481
0.0574
0.0624
0.0397
0.0010
0.0572
0
2.2E8
1.0E7
0
0.3741
0.0786
0.1598
0.0790
0.1953
0.1820
0.0764
2.0E6
0.5543
0
0.2622
0
0
0.0929
0.4153
0.3224
0.3345
0.4558
0.0922
0.4165
0.3210
0.3320
0.4595
0.0959
0.6429
0.2203
1.4498
1.4991
0.0945
0.6439
0.2189
1.4476
1.5033
the convergence of the EM algorithm was declared when all of the absolute differences between consecutive updates for
the regression parameters and the frailty variance were less than 105 . Table 4 summarizes the parameter estimates under
different settings of m, as well as the corresponding AIC and BIC values and model fitting times. These results indicate that
the estimation of the regression coefficients (and their standard errors) tend to be robust to the number of knots. The model
fits based on m = 3 and 4 interior knots were selected as the final models, since these fits resulted in the smallest values of
BIC and AIC, respectively. Table 5 summarizes the estimates of the regression parameters obtained from our joint analysis
under our two final models (i.e., when m = 3 and 4). Both models identify the same set of significant risk factors; i.e., being
male, having multiple sexual partners, and presenting with symptoms are significantly related to an increases in a patients
risk. Further, it is observed that the incidence of both of these diseases is more prominent in African-Americans and the least
prevalent in Caucasians.
It is worthwhile to point out that our joint model holds the marginal logistic model as a special case; i.e., the marginal
survival function, Sj (t ), of Tj in expression (2) reduces to the P (j = 0| t ) under the logistic regression model when = 1,
with the unspecified 0j (t ) corresponding to an unknown nonlinear age effect. Further, using this fact we can formally test
the validity of whether a marginal logistic model would be appropriate by testing H0 : = 1 versus Ha : = 1. As seen
in Table 4, our joint analysis yields an estimate of to be = 2.48(2.47) with an estimated standard error of 0.21(0.20),
when m = 3(4). This leads to a p-value 0 for the test of H0 : = 1 versus Ha : = 1, under both considered values of
m. Consequently, the univariate analysis of this data, under the logistic model, would not be appropriate.
To quantify the statistical association that exists between these two diseases, we can obtain an estimate of Kendalls
concordance between the two infection times from our joint analysis. This is accomplished by an application of the delta
method based on the relationship between and . The estimated Kendalls is = 0.55 with an estimated standard error
149
of 0.20 when m = 3 and 4. This suggests that there is a moderate association between the two failure times. In comparison,
it is not possible to estimate the dependence between the infection times of chlamydia and gonorrhea through the use of
univariate modeling techniques.
In summary, this analysis illustrates the key features of the proposed methodology; i.e., the proposed techniques can
robustly and efficiently estimate all unknown parameters, through the proposed modeling structure one can estimate the
dependence between the two failure times, and the proposed methodology can be used to assess the validity of an alternate marginal modeling approach. All of this is accomplished in a computationally efficient fashion, a characteristic that
other competing procedures do not possess. For example, in the comparisons between our proposed methodology and the
techniques presented in Wen and Chen (2011), which is provided in the web-based supplementary file (see Appendix B),
we have found that it takes this competing procedure approximately 200 min, on average, to analyze data sets consisting
of 2000 observations. In contrast, our proposed methodology is capable of analyzing a data set consisting of over 50,000
observations in approximately 1/7th of this time.
6. Concluding remark
In this paper, we develop a computationally efficient method of analyzing bivariate current status data under the Gammafrailty PH model, by generalizing the work of McMahan et al. (2013). Our formulation approximates the unknown conditional
cumulative baseline hazard functions with monotone splines, which significantly reduces the number of unknown parameters while maintaining adequate modeling flexibility. A three-stage data augmentation procedure is used to facilitate the
derivation of our EM algorithm. The resulting algorithm involves solving a low-dimensional system of equations for updating the regression parameters and the frailty variance parameter, with the spline coefficients being updated in closed form.
All of the expectations involved in the EM algorithm can be expressed in closed form. The EM algorithm is easy to implement
and enjoys fast convergence. Through simulation, we have shown that the proposed method accurately and efficiently estimates all of the unknown parameters (and thus the correlation between the two failure times) when the model is correctly
specified and that the estimation of the regression parameters is robust to the misspecification of the frailty distribution.
The approach of Chen et al. (2009) can also be used to analyze bivariate current status data. Specifically their approach
is based on an EM algorithm under a frailty PH model with correlated normal frailties. The EM algorithm proposed by these
authors is definitively more complicated than ours since it uses a series of numerically intensive approximations to evaluate the conditional expectations involved in the E-step of their algorithm. Such numerical approximations cause not only
computational burdens but convergence problems when a strict convergence criteria is used. Consequently, this method, because of its computational nature, may not be appropriate for analyzing large data sets, due to the time required to complete
model fitting. On the other hand, the approach of Chen et al. (2009) allows for correlated normal frailties for multivariate
current status data, while our work focuses only on bivariate current status data. Our method can naturally be extended to
multivariate data with a shared frailty, but extensions allowing for correlated frailties do not appear to be straightforward.
Topics for future work include the development of hypothesis testing procedures that can be used to evaluate our
modeling assumptions. For example, in our data application it may or may not be reasonable to assume that the infection
times are conditionally independent of the censoring times, given the covariates. Consequently, the development of a formal
method of testing this assumption would be discernibly beneficial. Further, for situations in which the aforementioned
modeling assumption does not hold, we plan to extend our proposed methodology to allow for informative censoring, a
modeling attribute that would extend the utilitarian nature of our work to many other epidemiological and medical research
areas.
Acknowledgments
We wish to thank the Editor, Associate Editor, and the two anonymous referees for their constructive comments which
have greatly improved the quality of this work. We would like to thank professors Chi-Chung Wen and Yi-Hau Chen for
providing us with the software that implements the methodology presented in Wen and Chen (2011). We also thank
professor Christopher R. Bilder and colleagues for allowing us to use the chlamydia and gonorrhea data collected in Nebraska.
Appendix A. The conditional expectations in Section 3.3
The conditional expectations in Section 3.3 are summarized as follows,
i1 i2 i1 {S2 (ci |xi )}1+ i2 {S1 (ci |xi )}1+ + {S (ci , ci |xi )}1+
,
i1 i2 i1 S2 (ci |xi ) i2 S1 (ci |xi ) + S (ci , ci |xi )
( 1 )
Bi ()
E {log(i )|D , } =
,
( 1 )
i1 i2 i1 S2 (ci |xi ) i2 S1 (ci |xi ) + S (ci , ci |xi )
ij [1 {Sj (ci |xi )}1+ ] (1 ij ){Sj (ci |xi )}1+
E (zij |D , ) =
ij ij , j {1, 2} \ j,
ij Sj (ci |xi ) ij Sj (ci |xi ) + S (ci , ci |xi )
E (i |D , ) =
E (zijl |D , ) = {0j (ci )}1 jl Il (ci )E (zij |D , ),
150
where Bi () = i1 i2 log( 1 ) i1 S2 (ci |xi ) log( 1 + i2 ) i2 S1 (ci |xi ) log( 1 + i1 ) + S (ci , ci |xi ) log( 1 + i1 + i2 ) and
ij = 0j (ci ) exp(xi j ) for j = 1, 2 and i = 1, . . . , n.
The derivation of E (i |D , ) and E {log(i )|D , } arise directly from the augmented likelihood (5). When deriving
E (zij |D , ), we first use the law of iterated expectations to obtain
i ij ij
D , .
E (zij |D , ) = E {E (zij |D , i , )} = E
1 exp(i ij )
The last step uses the fact that the conditional distribution of zij , given i and the observed data, is a truncated Poisson with
a support of all positive integers when ij = 1 and is degenerated at 0 when ij = 0. One can complete the conditional expectation in the above expression based on the augmented likelihood (5). The derivation of E (zijl |D , ) is also accomplished
through the use of the law of iterated expectations and by noting that the conditional distribution of (zij1 , . . . , zijk ), given zij ,
is multinomial.
Appendix B. Supplementary data
Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.csda.2014.10.013.
References
Andersen, P.K., Klein, J.P., Knudsen, K.M., Palacios, R.T., 1997. Estimation of variance in Coxs regression model with shared Gamma frailties. Biometrics 53,
14751484.
Banerjee, T., Chen, M.-H., Dey, D.K., Kim, S., 2007. Bayesian analysis of generalized odds-rate hazards models for survival data. Lifetime Data Anal. 13,
241260.
Cai, B., Lin, X., Wang, L., 2011. Bayesian proportional hazards model for current status data with monotone splines. Comput. Statist. Data Anal. 55,
26442651.
Callegaro, A., Lacobeli, S., 2012. The Cox shared frailty model with log-skew-normal frailties. Stat. Model. 12, 399418.
Chang, I.-S., Wen, C.-C., Wu, Y.-J., 2007. A profile likelihood theory for the correlated Gamma-frailty model with current statust family data. Statist. Sinica
17, 10231046.
Chen, M.-H., Tong, X.W., Sun, J., 2007. The proportional odds model for multivariate interval-censored failure time data. Stat. Med. 26, 51475161.
Chen, M.-H., Tong, X.W., Sun, J., 2009. A frailty model approach for regression analysis of multivariate current status data. Stat. Med. 28, 34243436.
Chen, C.-M., Wei, J.C., Hsu, C.-M., Lee, M.-Y., 2014. Regression analysis of multivariate current status data with dependent censoring: application to
ankylosing spondylitis data. Stat. Med. 33, 772785.
Cui, S., Sun, Y., 2004. Checking for the Gamma frailty distribution under the marginal proportional hazards frailty model. Statist. Sinica 14, 249267.
Dunson, D.B., Dinse, G., 2002. Bayesian models for multivariate current status data with informative censoring. Biometrics 58, 7988.
Goggins, W.B., Finkelstein, D.M., 2000. A proportional hazards model for multivariate interval-censored failure time data. Biometrics 56, 940943.
Hens, N., Wienke, A., Aerts, M., Molenberghs, G., 2009. The correlated and shared Gamma frailty model for bivariate current status data: an illustration for
cross-sectional serological data. Stat. Med. 28, 27852800.
Hougaard, P., 2000. Analysis of Multivariate Survival Data. Springer, New York.
Ibrahim, J.S., Chen, M.-H., Sinha, D., 2008. Bayesian Survival Analysis. Springer, New York.
Kim, Y.-J., 2014. Regression analysis of bivariate current status data using a multistate model. Comm. Statist. Simulation Comput. 43, 462475.
Kim, M.Y., Xue, X.N., 2002. The analysis of multivariate interval-censored survival data. Stat. Med. 21, 37153726.
Klein, J.P., 1992. Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 48, 795806.
Komarek, A., Lesaffre, E., 2007. Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as error distribution.
Statist. Sinica 17, 549569.
Lin, X., Wang, L., 2010. A semiparametric probit model for case 2 interval-censored failure time data. Stat. Med. 29, 972981.
Lin, X., Wang, L., 2011. Bayesian proportional odds models for analyzing current status data: univariate, clustered, and multivariate. Comm. Statist.
Simulation Comput. 40, 11711181.
McMahan, C., Wang, L., Tebbs, J., 2013. Regression analysis of current status data using the EM algorithm. Stat. Med. 32, 44524466.
Ramsay, J.O., 1988. Monotone regression splines in action. Statist. Sci. 3, 425441.
Rondeau, V., Commenge, D., Joly, P., 2003. Maximum penalized likelihood estimation in a Gamma-frailty model. Lifetime Data Anal. 9, 139153.
Rosenberg, P.S., 1995. Hazard function estimation using B-splines. Biometrics 51, 874887.
Scharfstein, D.O., Tsiatis, A.A., Gilbert, P.B., 1998. Semiparametric efficient estimation in the generalized odds-rate class of regression models for rightcensored time-to-event data. Lifetime Data Anal. 4, 355391.
Shen, X.T., 1998. Proportional odds regression and sieve maximum likelihood estimation. Biometrika 85, 165177.
Sun, J., 2006. The Statistical Analysis of Interval-Censored Data. Springer, New York.
Tong, X.W., Chen, M.-H., Sun, J., 2008. Regression analysis of multivariate interval-censored failure time data with application to tumorigenicity
experiments. Biom. J. 50, 364374.
Wang, L., Lin, X., 2011. A Bayesian approach for analyzing case 2 interval-censored failure time data under the semiparametric proportional odds model.
Statist. Probab. Lett. 81, 876883.
Wang, L., Sun, L., Sun, J., 2006. A goodness-of-fit test for the marginal Cox model for correlated interval-censored failure time data. Biom. J. 5, 19.
Wang, L., Sun, J., Tong, X.W., 2008. Efficient estimation for bivariate current status data. Lifetime Data Anal. 14, 134153.
Wei, L.J., Lin, D.Y., Weissfeld, L., 1989. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Amer. Statist.
Assoc. 84, 10651073.
Wen, C.-C., Chen, Y.-H., 2011. Nonparametric maximum likelihood analysis of clustered current status data with the Gamma-frailty Cox model. Comput.
Statist. Data Anal. 55, 10531060.
Wienke, A., 2012. Frailty Models in Survival Analysis. Chapman & Hall.
Yin, G., Ibrahim, J.G., 2005. A class of Bayesian shared Gamma frailty models with multivariate failure time data. Biometrics 61, 208216.
Zeng, D., Cai, J., Shen, Y., 2006. Semiparametric additive risks model for interval-censored data. Statist. Sinica 16, 287302.
Zuma, K., 2007. A Bayesian analysis of correlated interval-censored data. Comm. Statist. Theory Methods 36, 725730.

Wang 2015

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Wang 2015

Uploaded by

Copyright:

Available Formats

Computational Statistics and Data Analysis 83 (2015) 140150

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis

Regression analysis of bivariate current status data under the

Department of Statistics, University of South Carolina, Columbia, SC 29208, USA

Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

Sj (t |x) = P (Tj > t |x) = 1 + 0j (t ) exp(x j )

= E [sign{(Ti1 Tj1 )(Ti2 Tj2 )}],

{1 Sj (ci |xi , i )}ij {Sj (ci |xi , i )}(1ij ) g (i | )di ,

{S1 (ci |xi ) S (ci , ci |xi )}

{S2 (ci |xi ) S (ci , ci |xi )}

{1 S1 (ci |xi ) S2 (ci |xi ) + S (ci , ci |xi )},

where A1 = {i : i1 = 0, i2 = 0}, A2 = {i : i1 = 0, i2 = 1}, A3 = {i : i1 = 1, i2 = 0}, and A4 = {i : i1 = 1, i2 = 1}.

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

{1 Sj (ci |xi , i )}ij {Sj (ci |xi , i )}(1ij ) .

P {zij |0j (ci ) exp(xi j )i }.

P {zijl |jl Il (ci ) exp(xi j )i }.

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

{log(jl ) + xi j }E (zijl ) jl Il (ci ) exp(xi j )E (i ) ,

i=1 j=1 l=1

H2 (, (m) ) = n 1 log( 1 ) n log{ ( 1 )} + 1

E (i )Il (ci ) exp(xi j )

, one would then replace jl by jl

would be obtained as the solution to the following system of equations

(j ) in the system of equations given by H1 (, (m) )/j = 0 and solve

jl(m+1) (j )Il (ci ) exp(xi j ) xi = 0,

= jl(m+1) (j(m+1) ), for l = 1, . . . , k and j = 1, 2. Notice the

automatically satisfies the nonnegative constraint for each j and l.

as the solution to the following system of equations,

jl(m+1) (j )Il (ci ) exp(xi j ) xi = 0,

= jl(m+1) ((j m+1) ), for l = 1, . . . , k and j = 1, 2.

3. Obtain (m+1) = argmax H2 (; (m) ) and update m = m + 1.

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

3.3. Variance estimation

f () = 0.25LN (1, 2) + 0.50LN (1, 0.61) + 0.25LN (0.5, 0.39),

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

j1 = 0.5 and j2 = 0.5

j1 = 0.5 and j2 = 0.5

j1 = 0.5 and j2 = 0.5

j1 = 0.5 and j2 = 0.5

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

Wen and Chen (2011)

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

E (zijl |D , ) = {0j (ci )}1 jl Il (ci )E (zij |D , ),

N. Wang et al. / Computational Statistics and Data Analysis 83 (2015) 140150

You might also like