Public and Private Sector Wage Distributions

Public and private sector wage distributions controlling for endogenous sector choice
Blaise Melly
Swiss Institute for International Economics and Applied Economic Research (SIAW),
University of St. Gallen
February 2005
Preliminary
Address for correspondence

Blaise Melly
Swiss Institute for International Economics and Applied Economic Research (SIAW)
University of St. Gallen
Bodanstrasse 8, 9000 St. Gallen, Switzerland
blaise.melly@unisg.ch
Abstract:
In this paper, we apply the instrumental quantile regression model of Chernozhukov and
Hansen (2005) to data from the German Socio Economic Panel to examine the wage
structure in the public and private sector. Since the original estimator proposed by
Chernozhukov and Hansen does not allow for interaction terms between the endogenous
variable and the covariates if we do not have additional instruments, we propose an
extension of the model to allow for this possibility. The results assuming exogenous sector
choice give a negative mean public sector wage premium and show that the wage
distribution is more compressed in the public sector. Correcting for endogenous sector
choice reverses the findings concerning the mean premium but preserves the more
compressed structure of the public sector earnings distribution. Thus, there is positive
selection into the private sector, the sector with higher wage inequality, confirming the
prediction of the Roy model.
Theme: Public sector labor markets
Keywords: Wage Inequality, Quantile Regression, Instrumental Variables, Wage Differentials, Public and Private Sector.
JEL classification: C13, C14, C21, J31, J45.
1 Introduction
Public sector pay has always attracted public attention. Two characteristics of public sector
labor markets explain this interest. First, public sector labor markets are large and, obviously,
the size of the public sector wage bill has implications for both monetary and fiscal policy. The
government remains by far the largest employer in Germany. In 2002, 15.2 percent of the
labor force or 5.88 million people were employed by public employers (Federal Statistical
Office 2003). Furthermore, public sector pay practices can have a substantial impact on the
operation of private sector labor markets in which the government is a major employer. The
existence of a public sector premium may induce private sector employers to pay higher
wages to private sector employees. The concern is that such general wage increases can
jeopardize competitiveness.
Second, public sector labor markets are different and there are a number of reasons why an
earnings differential between the private and the public sector exists. The public sector is
subject to political constraints and not profit constraints. Therefore the political system may
have different objectives from those of the private sector. Issues of pay equity and fairness
can survive in the political market place more than in the economic market place. Reder
(1975) and Borjas (1980) present models which use vote maximization to explain public
sector wages and employment. Budget maximization models where the bureaucrat is
interested in increasing his budget are also popular, if non-rigorous. In contrast,
Holmlund (1993) introduces a formal theoretical model of why a wage differential exists. He
shows that a public sector wage premium arises in a model with two non-cooperative unions,
one in the public sector and the other in the private sector. The reasons for this result are
externalities; the public sector union ignores that a wage increase will raise taxes for private
sector workers and reduce public consumption for all workers, including those in the private
sector.
Given these differences in the wage setting procedures and the possible consequences on the
labor market, many researchers have sought to ascertain whether an identical employee
working in the same job in the public and in the private sector would earn the same or a
different amount. Early research comparing the earnings of public sector employees was
undertaken in the United States by Smith (1976 and 1977). Ehrenberg and Schwarz (1986)
and, more recently, Bender (1998) and Gregory and Borland (1999) have surveyed this
voluminous literature. Studies for the United States and United Kingdom have generally
found a positive wage premium for public sector employees. Women, non-whites and workers in
low-skill occupations receive the highest premium. A smaller number of studies have
compared the extent of earnings dispersion for public sector and private sector employees.
The main conclusion from these studies is that the public sector compresses the distribution of
earnings. Evidence of this effect is available for the USA (Poterba and Rueben 1994), Canada
(Mueller 1998), UK (Blackaby, Murphy and O’Leary 1999), France and Italy (Lucifora and
Meurs 2004), among others. Given that there is a choice being made by workers whether to
work in the public or private sector, there is however potential for sample selection bias.
Therefore the more recent literature takes account of nonrandom selection by Heckman/Lee
correction for selectivity, endogenous switching regression models or fixed effect panel
models. However, results differ from one study to another and it is difficult to draw any
definite conclusions. The variability of the results could be caused by weak or absent
instruments in the case of selection models and by the small number of sector changes in the
case of fixed effects.
Such wage comparisons using German data are not as voluminous. Dustmann and Van Soest
(1997) use data from the German Socio-Economic Panel for the years 1984-93 to analyze
developments and differences in public and private sector wage distributions. They found that
conditional wages are higher in the private sector for males but higher in the public sector for
females. Dustmann and Van Soest (1998) estimate switching regression models extended to
allow for endogeneity of education, experience and hours worked. Their results are stronger
than the results of their first work. Jürges (2003) and Melly (2004a) find that the public sector
compresses the wage distribution.
In this paper, we estimate the effect of the public sector status on the entire wage distribution
controlling for endogenous sector choice. To our knowledge, it is the first study that both
controls for endogenous sector choice and analyzes the public and private sector wage
distributions. In order to do this, we employ the instrumental quantile regression model and
estimator developed in Chernozhukov and Hansen (2004a, 2004b, 2004c and 2005). We
extend the original estimator to allow for the estimation of fully interacted models since the
public private sector wage differential seems to be different at different values of the
covariates. We use data from the German Socio-Economic Panel, which contain background
information on parents’ economic status and provides us with reasonable instruments. The
results indicate that there is considerable heterogeneity in the effect of the public sector status
and that controlling for endogenous sector choice is important.
Section 2 compares the different models that have been proposed recently to correct for
endogeneity in the quantile regression model. Section 3 presents the model and the estimator
of Chernozhukov and Hansen (2004a and 2005) and proposes an extension in the case of
regressors fully interacted with the endogenous dummy variable. Section 4 shows how it is
possible to recover the unconditional wage distribution if we have estimated the conditional
wage distribution by quantile regression and how to use this result to decompose differences
in the wage distribution. Section 5 describes the data set, including the instruments, along
with some descriptive statistics. Section 6 discusses the assumptions of the instrumental
quantile regression model in the context of the application. Section 7 presents the empirical
results and section 8 gives some concluding remarks.
2 Endogeneity in the quantile regression model
2.1 The traditional (exogenous) quantile regression model

The quantile regression model introduced by Koenker and Bassett (1978) extends the notion of
ordinary quantiles in the location model to a more general class of linear models in which
conditional quantiles have a linear form. The most common form is median regression (Least
Absolute Deviations or LAD), where the object is to estimate the median of the dependent
variable, conditional on the value of the independent variables. The remaining conditional
quantiles are estimated by minimizing an asymmetrically weighted sum of absolute errors.
Thus, these models can be used to characterize the entire conditional distribution of a
dependent variable given a set of regressors.
The point of departure is an elementary definition of the quantiles. Let $0 < \theta < 1$. The $\theta$th
quantile of the distribution of a random variable $Y$ is a value $\xi_\theta$ such that
$$\theta = \Pr(Y \le \xi_\theta) = F(\xi_\theta) \tag{1}$$
where $F$ is the cumulative distribution function of $Y$. Let $\{y_1, \ldots, y_n\}$ be a random sample of
size $n$ drawn from the distribution of $Y$. By analogy to (1), the sample quantile is estimated by
$$\hat{\xi}_\theta = \inf\{y : F_n(y) \ge \theta\} \tag{2}$$
where Fn is the empirical distribution function of Y. To extend this idea to the estimation of
conditional quantile functions we need a new way to estimate the quantiles. Circumventing
the usual reliance on an ordered set of sample observations, we can define the quantiles as an
optimization problem. It is straightforward to show that $\hat{\xi}_\theta$ solves
$$\hat{\xi}_\theta = \arg\min_{m} \sum_{i=1}^{n} \rho_\theta(y_i - m) \tag{3}$$
where $\rho_\theta$ is the check function, $\rho_\theta(z) = z\,(\theta - 1(z \le 0))$, and $1(\cdot)$ is the usual indicator
function.
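The equivalence between (2) and (3) can be checked numerically. The following is a minimal Python sketch for illustration only; the function names and the toy sample are assumptions made here and are not part of the paper. It compares the order-statistic definition (2) with a direct minimization of the check-function objective (3) over the sample points, which is sufficient because the objective is piecewise linear with kinks at the observations.

```python
import math

def check_loss(z, theta):
    """Check function rho_theta(z) = z * (theta - 1(z <= 0))."""
    return z * (theta - (1.0 if z <= 0 else 0.0))

def sample_quantile(ys, theta):
    """Equation (2): inf{y : F_n(y) >= theta}, read off the order statistics."""
    ordered = sorted(ys)
    k = max(math.ceil(theta * len(ordered)), 1)  # smallest k with k/n >= theta
    return ordered[k - 1]

def argmin_check(ys, theta):
    """Equation (3): minimise the summed check loss over candidates m in the sample."""
    def objective(m):
        return sum(check_loss(y - m, theta) for y in ys)
    return min(sorted(ys), key=objective)

# Illustrative sample (made up for this sketch).
sample = [3.1, 0.5, 2.2, 7.4, 1.9, 4.0, 5.6]
for theta in (0.25, 0.5, 0.75):
    print(theta, sample_quantile(sample, theta), argmin_check(sample, theta))
```

When $\theta n$ is an integer the minimizer of (3) is not unique; in exact arithmetic the sketch then returns the smallest minimizing observation, which coincides with (2).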
Quantile regression extends these ideas to the linear model generating a new class of statistics.
Let $\{x_1, \ldots, x_n\}$ be vectors of explanatory variables that correspond to $\{y_1, \ldots, y_n\}$. It is assumed
that:
$$Y = X'\beta(\theta) + \varepsilon \quad \text{and} \quad \mathrm{Quant}_\theta(Y \mid X = x_i) = x_i'\beta(\theta), \tag{4}$$
where $\mathrm{Quant}_\theta(Y \mid X = x_i)$ denotes the $\theta$th quantile of $Y$ conditional on $X = x_i$. The $\theta$th
regression quantile is then defined in the same manner as the sample quantile by the
minimization problem
$$\hat{\beta}(\theta) = \arg\min_{\beta \in \mathbb{R}^K} \sum_{i=1}^{n} \rho_\theta(y_i - x_i'\beta). \tag{5}$$
Buchinsky (1991) shows that this problem has a GMM interpretation: $\beta(\theta)$ satisfies the
following moment condition:
$$E\left[\left(\theta - 1(Y < X'\beta(\theta))\right) X'\right] = 0.$$
The problem can be formulated as a linear program and simplex-based methods provide
efficient algorithms for most applications (Koenker and d’Orey 1987). Recent developments
in interior point methods (Portnoy and Koenker 1997) have shown that quantile regression
problems can be solved faster than least squares problems.
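For intuition about problem (5), note that a linear program attains its optimum at a basic solution, which here means a fit that passes exactly through $K$ observations. With an intercept and a single regressor, a brute-force search over all pairs of points therefore finds a minimizer of (5). The Python sketch below is only an illustrative enumeration under that assumption (one regressor, small sample); it is not the simplex algorithm of Koenker and d'Orey (1987) nor an interior point method, and the data are made up.

```python
from itertools import combinations

def check_loss(z, theta):
    """Check function rho_theta(z) = z * (theta - 1(z <= 0))."""
    return z * (theta - (1.0 if z <= 0 else 0.0))

def objective(xs, ys, b0, b1, theta):
    """Objective of equation (5) for the line b0 + b1 * x."""
    return sum(check_loss(y - b0 - b1 * x, theta) for x, y in zip(xs, ys))

def quantile_line(xs, ys, theta):
    """Return (intercept, slope) minimising (5) by enumerating lines through two points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # the two points do not determine a finite slope
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        value = objective(xs, ys, b0, b1, theta)
        if best is None or value < best[0]:
            best = (value, b0, b1)
    return best[1], best[2]

# Made-up data: y = 1 + 2x except for one large outlier at x = 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 30.0]
print(quantile_line(xs, ys, 0.5))  # median regression line
```

The enumeration costs on the order of $n^3$ operations, so it only serves to make the structure of the solution visible; practical work relies on the algorithms cited above.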
Increasing θ continuously from 0 to 1, we can trace the entire distribution of Y conditional on
X. By replacing a monolithic model of conditional central tendency with a family of models
for conditional quantiles, we are able to achieve considerably greater flexibility and a much
more complete view of the effect of the covariates on the dependent variable, allowing them
to influence location, scale and shape of the response distribution. For instance, the
distributional consequences of minimum wages, training programs and education are of
primary interest to policy makers. Unfortunately, in most cases, the treatment is self-selected
or endogenous, making conventional quantile regression inappropriate.
2.2 Comparison of different endogenous quantile regression models
Amemiya (1982) was the first to seriously consider quantile regression methods for the
structural equation model showing the consistency and asymptotic normality of a class of
two-stage median regression estimators. Subsequent work of Powell (1983) and Chen and
Portnoy (1996) extended this approach but maintained the focus primarily on the conditional
median problem. The main motivation of these works was the robustness of the median
regression. Chernozhukov and Hansen (2004a) show that this approach is not consistent
when the quantile treatment effect differs across quantiles and it is precisely in that case that
quantile regression is interesting.
Another approach has been considered. If we have instruments $Z$ that are independent of
$Y \mid X$, we have the following moment condition:
$$E\left[\left(\theta - 1(Y < X'\beta(\theta))\right) (X', Z')\right] = 0.$$
This moment condition was used by Hong and Tamer (2003), Chen, Linton and Keilegom
(2003) and Honore and Hu (2004) to construct estimators. Abadie (1995) noted the
computational difficulty in obtaining the solution to the optimization problem. The objective
function is “million-modal” and has zero derivative almost everywhere, implying the need to
perform a grid search over a subset of $\mathbb{R}^{\dim(X)}$. Moreover, the interpretation of the results is not
obvious without further assumptions. Therefore, we discuss in the following the models
proposed by Abadie, Angrist and Imbens (2002), Chernozhukov and Hansen (2004a, 2004b,
2004c and 2005) and Chesher (2003).
These three instrumental variable models propose different strategies for identifying
and estimating the marginal distributions of the outcome given the value of the endogenous
variable. Note first that the approaches are not directly comparable since they require different
types of variables. Abadie, Angrist and Imbens (2002) require a binary treatment variable and a
binary instrument. Extensions to continuous treatments and to overidentified cases with many
instruments are not straightforward. Chesher (2003), on the other hand, requires D and Z to be
continuous and, once again, it is not clear how his approach can be extended to the discrete
case. Chernozhukov and Hansen’s (2004a, 2004b, 2004c and 2005) approach is the most
general one since it allows for both discrete and continuous D and Z. They are also not directly comparable
because the prime objects of interest are not exactly identical. Abadie, Angrist and Imbens
define and identify effects only for the compliers and not for the entire population, as
Chernozhukov and Hansen do. Chesher identifies the treatment effect locally, that is, at a given
quantile of the distribution of the endogenous variable given X. Therefore, his outcome
equation is not generally equal to the outcome equation of the two other models.
All models require an exclusion (independence) restriction. However, while Abadie, Angrist
and Imbens and Chesher require that both the disturbances in the outcome equations and the
disturbances in the selection are jointly independent from Z, Chernozhukov and Hansen
require only independence of the disturbances in the outcome equation. The stronger
assumption can be violated if the instruments are measured with error or assigned depending
on D. Finally, Abadie, Angrist and Imbens require a monotonicity assumption. This
assumption is automatically satisfied by latent-index models for treatment assignments and
holds trivially in a given number of cases (JTPA programs or 401(K) participation) but must
be carefully examined in other cases. Chernozhukov and Hansen require rank similarity. This
assumption is strong and will be examined in detail in the context of the public / private
sector application in section 6.
Naturally, the plausibility of the different assumptions necessary to identify the marginal
distributions can be discussed in different contexts and sometimes a model can be preferred to
another. But, once we have estimated the marginal distribution, how can we interpret the
results and are these interpretations useful for the policy maker? This is a different question
and there is no guarantee that the response will be positive.
Many criteria of interest (Heckman, Smith and Clements 1997) such as the proportion of
people who benefit from the treatment, the distribution of the causal effect for the participants
or for the whole population require knowledge of the joint distribution. Unlike for the mean
effect, the marginal distributions of the potential outcomes are not sufficient to identify the
quantile effects. In interpreting their results, many authors implicitly make an
assumption. They compare the θ th quantile of the marginal distribution given a certain value
of the endogenous variable with the θ th quantile of the marginal distribution given another
value of the endogenous variable. Abadie, Angrist and Imbens (2002) speak of a quantile
treatment effect and interpret it. Ma and Koenker (2004), applying an estimator based on
Chesher’s model, also analyse how the quantile treatment effect changes along the
distribution. However the differences between the quantiles of the marginal distribution are
not informative in general. For example, somebody who is at the 75th percentile of the conditional
distribution if she participates in the programme might be at the 25th percentile if she did not
participate. Without further assumptions it is only possible to bound the quantile treatment
effects (Fréchet 1951) and the bounds are often very wide and uninformative.
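The point that marginal distributions do not pin down individual effects can be seen in a small constructed example. The Python sketch below uses made-up potential outcomes for four individuals (an assumption for illustration, not data from the paper): two different pairings of the same marginal distributions give identical differences between marginal quantiles but very different shares of individuals who benefit.

```python
def quantile_differences(y0, y1):
    """Differences between same-rank order statistics of the two marginal distributions."""
    return [b - a for a, b in zip(sorted(y0), sorted(y1))]

def share_benefiting(y0, y1):
    """Share of individuals whose own outcome is strictly higher under treatment."""
    gains = [b - a for a, b in zip(y0, y1)]
    return sum(g > 0 for g in gains) / len(gains)

# Potential outcomes listed individual by individual (hypothetical values).
y0 = [1.0, 2.0, 3.0, 4.0]
y1_rank_invariant = [2.0, 3.0, 4.0, 5.0]  # everyone keeps the same rank
y1_rank_reversed = [5.0, 4.0, 3.0, 2.0]   # same marginal distribution, ranks reversed

for y1 in (y1_rank_invariant, y1_rank_reversed):
    print(quantile_differences(y0, y1), share_benefiting(y0, y1))
```

Both pairings produce the same quantile differences, yet only under the rank-preserving pairing do these differences describe what happens to any given individual.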
By interpreting the quantile treatment effect, these authors implicitly assume rank invariance. With rank
invariance the joint distribution can be recovered from the marginal distributions. With rank
similarity, the joint distribution cannot be recovered. However, the quantile treatment effects
are interpretable since an individual at the θ th quantile of the marginal distribution given a
value of the treatment expects to be at the same quantile of the marginal distribution given
another value of the treatment. Therefore, we think that the assumption of rank similarity is of
crucial importance if we want to estimate not only the marginal distributions but also
policy-relevant parameters. A discussion of this assumption is necessary in all models, although they
do not need it to estimate the marginal distributions. And, if we assume rank similarity, why
not use it to estimate the parameters of interest, since it helps to identify the model?
For all these reasons, we give priority to the estimator of Chernozhukov and Hansen.
Moreover, in our application, the endogenous variable is a dummy and we have 5 different
instruments. Thus, only the model of Chernozhukov and Hansen is applicable.
3 Extension of the instrumental quantile regression model of Chernozhukov and Hansen
3.1 Chernozhukov and Hansen's model and estimator

Chernozhukov and Hansen (2005) propose a model of quantile treatment effects in the
presence of endogeneity. Their model is different from that of Abadie, Angrist and Imbens
(2002) since they identify effects for the whole population and not only for the sub-population
of compliers. The principal feature of their model is the imposition of conditions which
restrict the evolution of ranks across treatment states.
The data consist of $n$ observations on a continuously distributed scalar outcome variable, $Y$, an
$l$-vector of endogenous variables (possibly interacted with covariates), $D$, a vector of
instruments, $Z$, and a $k$-vector of covariates, $X$. The causal effects of interest are defined using
potential outcomes and potential treatment status to describe counterfactual states of the
world. Potential outcomes are indexed against $D$, and denoted $Y_D$, while potential treatment
status is indexed against $Z$, and denoted $D_Z$. Both discrete and continuous $D$, $X$ and $Z$ are
allowed. Of primary interest are the $\theta$th quantiles of potential outcomes under various
treatments $D$ conditional on observed characteristics $X$, denoted
$$q(D, X, \theta).$$
The main assumptions of the model are
1. Potential outcomes. Conditional on $X = x$, for each $d$, $Y_d = q(d, x, U_d)$, where
$q(d, x, \theta)$ is strictly increasing in $\theta$ and $U_d \sim U(0,1)$.
2. Independence. Conditional on $X = x$, $\{U_d\}$ are independent of $Z$.
3. Selection. $D \equiv \delta(Z, X, V)$ for some unknown function $\delta$ and random vector $V$.
4. Rank invariance or rank similarity. Conditional on $X = x$ and $Z = z$,
(a) $\{U_d\}$ are equal to each other; or more generally,
(b) $\{U_d\}$ are identically distributed conditional on $V$.
5. Observed variables consist of $Y \equiv q(D, X, U_D)$, $D$, $X$ and $Z$.
6. Full rank condition. The impact of instrument $Z$ on the joint distribution of $(Y, D)$
should be sufficiently rich; in particular, $Z$ should not be independent of $D$ and
$\dim(Z) \ge \dim(D)$. The exact technical conditions can be found in theorem 2 of
Chernozhukov and Hansen (2005).
Condition 4 is the main assumption and is absent from conventional endogenous treatment
effect models. Its strongest form is rank invariance, 4. (a), when ranks do not vary with $D$:
$$U_1 = U_0 = U.$$
For instance, people who are strong earners in the public sector remain strong earners in the
private sector. Thus rank invariance implies that a common unobserved factor U determines
the ranking of a given person across treatment states. This implies that the potential outcomes
are not truly multivariate, which may be implausible. Rank similarity, 4. (b), relaxes rank
invariance by allowing "slippage" in the ranks. It states that given the information $(V, Z, X)$
the expectation of any function of $U_d$ does not vary across the treatment states $d$. In other
words, ex ante the ranks are similar, while ex post the ranks could differ.
Proposition 1 of Chernozhukov and Hansen (2004a and 2005) shows that, if assumptions 1 to
5 are satisfied, then for all $\theta \in (0,1)$, a.s.
$$\Pr(Y \le q(D, X, \theta) \mid X, Z) = \Pr(Y < q(D, X, \theta) \mid X, Z) = \theta.$$
Furthermore,
$$0 = \mathrm{Quant}_{Y - q(D, X, \theta) \mid X, Z}(\theta)$$
and, assuming integrability and identification condition 6, $q(d, x, \theta)$ is the unique function
such that
$$0 = \arg\min_{f} E\left[\rho_\theta\left(Y - q(D, X, \theta) - f(X, Z)\right) \mid X, Z\right]. \tag{6}$$
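The first statement of the proposition can be illustrated on a constructed example. The Python sketch below uses a deterministic toy design that is purely an assumption for illustration (no covariates, a binary instrument, rank invariance, and selection into $D$ that depends on the rank $U$); the structural function `q` and all numbers are hypothetical. It shows that the event $Y \le q(D, \theta)$ has frequency $\theta$ within each value of $Z$, but not within each value of the endogenous $D$.

```python
def q(d, theta):
    """Toy structural quantile function, strictly increasing in theta."""
    return 1.0 + 2.0 * theta + d * (0.5 + theta)

def toy_data(n_grid=200):
    """Rows (y, d, z); every rank u appears once with z = 0 and once with z = 1."""
    rows = []
    for i in range(n_grid):
        u = (i + 0.5) / n_grid                 # rank, independent of z by construction
        for z in (0, 1):
            d = 1 if u > 0.7 - 0.4 * z else 0  # selection depends on the rank
            rows.append((q(d, u), d, z))
    return rows

def coverage(rows, theta, column):
    """Share of rows with y <= q(d, theta), by value of the chosen column (1 = d, 2 = z)."""
    hits, counts = {}, {}
    for row in rows:
        key = row[column]
        counts[key] = counts.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + (row[0] <= q(row[1], theta))
    return {key: hits[key] / counts[key] for key in counts}

rows = toy_data()
by_z = coverage(rows, 0.5, column=2)  # conditioning on the instrument
by_d = coverage(rows, 0.5, column=1)  # conditioning on the endogenous variable
print(by_z, by_d)
```

In this design the frequencies conditional on $Z$ equal $\theta$, whereas those conditional on $D$ do not, which is why the conventional quantile regression moment condition fails under endogenous selection.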
This result can be used to build estimators. Chernozhukov and Hansen (2004a, 2004b, 2004c)
propose a computationally convenient estimator based on (6). They focus on the basic linear
model,
$$q(d, x, \theta) = D'\alpha(\theta) + X'\beta(\theta),$$
which covers a wide range of applications, but the results can be easily extended to other
parametric forms. In the linear model, assumption 6 specializes to the restriction that
$$\frac{\partial}{\partial(\alpha, \beta, \gamma)} E\left[1(Y < D'\alpha + X'\beta + Z'\gamma)\,[X' : Z']'\right] \tag{7}$$
has full rank and is continuous in $(\alpha, \beta, \gamma)$ uniformly over the compact parameter space.
The estimation procedure is simple and involves only the estimation of traditional quantile
regressions along an $l$-dimensional grid. In the simplest case where $D$ and $Z$ are one-dimensional
(1 instrument and 1 endogenous variable), the procedure consists simply in finding $\alpha$ such
that the traditional quantile regression of $Y - D'\alpha$ on $Z$ and $X$ gives a coefficient of zero on $Z$.
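The one-dimensional procedure can be sketched in a few lines. The Python example below is a simplified illustration under strong assumptions that are not taken from the paper: a binary instrument, no covariates, and a deterministic toy sample in which the true effect at quantile $\theta$ is $0.5 + \theta$. With a binary $Z$ and no $X$, the coefficient on $Z$ in the quantile regression of $Y - D\alpha$ on a constant and $Z$ is simply the difference between the $\theta$-quantiles of $Y - D\alpha$ in the two instrument groups, so no quantile regression routine is needed here. All function names are hypothetical.

```python
import math

def sample_quantile(values, theta):
    """inf{y : F_n(y) >= theta}, read off the order statistics."""
    ordered = sorted(values)
    k = max(math.ceil(theta * len(ordered)), 1)
    return ordered[k - 1]

def toy_data(n_grid=200):
    """Deterministic toy rows (y, d, z); each rank u is paired with z = 0 and z = 1."""
    rows = []
    for i in range(n_grid):
        u = (i + 0.5) / n_grid                 # rank, independent of z by construction
        for z in (0, 1):
            d = 1 if u > 0.7 - 0.4 * z else 0  # selection on the rank: d is endogenous
            y = 1.0 + 2.0 * u + d * (0.5 + u)  # true effect at quantile theta: 0.5 + theta
            rows.append((y, d, z))
    return rows

def gamma_hat(rows, alpha, theta):
    """Coefficient on z from the quantile regression of y - d*alpha on (1, z)."""
    q1 = sample_quantile([y - d * alpha for y, d, z in rows if z == 1], theta)
    q0 = sample_quantile([y - d * alpha for y, d, z in rows if z == 0], theta)
    return q1 - q0

def iv_quantile_effect(rows, theta, grid):
    """Grid value of alpha that brings the coefficient on z closest to zero."""
    return min(grid, key=lambda a: abs(gamma_hat(rows, a, theta)))

rows = toy_data()
grid = [k / 100 for k in range(151)]           # candidate values 0.00, 0.01, ..., 1.50
alpha_hat = iv_quantile_effect(rows, 0.5, grid)
print(alpha_hat)
```

At $\theta = 0.5$ the search should land on the grid point closest to the true effect of 1.0. With covariates or a non-binary instrument, `gamma_hat` would have to be replaced by an actual quantile regression fit.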
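In order to formalize the estimator in the general case, define the weighted quantile regression
objective function:
$$Q_n(\theta, \alpha, \beta, \gamma) \equiv \frac{1}{n} \sum_{i=1}^{n} \rho_\theta\left(Y_i - D_i'\alpha - X_i'\beta - \hat{\Phi}_i(\theta)'\gamma\right) \hat{V}_i(\theta)$$
where $\Phi_i(\theta) \equiv \Phi(\theta, X_i, Z_i)$ is an $r$-vector of instruments and $V_i(\theta) \equiv V(\theta, X_i, Z_i) > 0$ is a
weight function. $\hat{\Phi}_i(\theta)$ and $\hat{V}_i(\theta)$ are consistent estimates of $\Phi_i(\theta)$ and $V_i(\theta)$. The
estimation procedure is defined as follows:
$$\hat{\alpha}(\theta) = \arg\inf_{\alpha \in A} W_n(\theta, \alpha), \quad W_n(\theta, \alpha) := n\, \hat{\gamma}(\theta, \alpha)' \hat{\mathrm{A}}(\theta, \alpha)\, \hat{\gamma}(\theta, \alpha)$$
such that
$$\left(\hat{\beta}(\theta, \alpha), \hat{\gamma}(\theta, \alpha)\right) = \arg\inf_{(\beta, \gamma)} Q_n(\theta, \alpha, \beta, \gamma),$$
so that
$$\left(\hat{\alpha}(\theta), \hat{\beta}(\theta)\right) = \left(\hat{\alpha}(\theta), \hat{\beta}(\theta, \hat{\alpha}(\theta))\right).$$
Here $\hat{\mathrm{A}}(\theta, \alpha) = \mathrm{A}(\theta, \alpha) + o_p(1)$ and $\mathrm{A}(\theta, \alpha)$ is positive definite uniformly in $\alpha \in A$. It is
convenient to set $\mathrm{A}(\theta, \alpha)$ equal to the inverse of the asymptotic covariance matrix of
$\sqrt{n}\left(\hat{\gamma}(\theta, \alpha) - \gamma(\theta, \alpha)\right)$. In this case, more weight is given to the instruments that are more
precisely estimated and $W_n(\theta, \alpha)$ is the Wald statistic for testing $\gamma(\theta, \alpha) = 0$.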
Under some technical regularity conditions, Chernozhukov and Hansen (2004a and 2004c)
derive the asymptotic distribution of $\left(\hat{\alpha}(\theta), \hat{\beta}(\theta)\right)$:
$$\sqrt{n} \begin{pmatrix} \hat{\alpha}(\theta) - \alpha(\theta) \\ \hat{\beta}(\theta) - \beta(\theta) \end{pmatrix} \xrightarrow{d} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Lambda(\theta) \right), \quad \Lambda(\theta) = (K', L')' S (K', L') \tag{8}$$
where, for $\Psi = V \cdot [X', \Phi']'$ and $\varepsilon = Y - D'\alpha(\theta) - X'\beta(\theta)$, $S = \theta(1 - \theta) E[\Psi\Psi']$,
$K = (J_\alpha' H J_\alpha)^{-1} J_\alpha' H$, $H = J_\gamma' \mathrm{A}(\theta, \alpha) J_\gamma$, $L = J_\beta M$, $M = I_{k+r} - J_\alpha K$,
$J_\alpha = E\left[f_\varepsilon(0 \mid X, Z, D)\, \Psi D'\right]$, and $[J_\beta', J_\gamma']'$ is a partition of $E\left[f_\varepsilon(0 \mid X, Z)\, \Psi\Psi' / V\right]^{-1}$ such
that $J_\beta$ is a $k \times (k + l)$ matrix and $J_\gamma$ is an $l \times (k + l)$ matrix. Efficiency can be achieved by
choosing $V^* = f_\varepsilon(0 \mid X, Z)$ and $\Phi^* = E[D \mid X, Z]$. In this case, the asymptotic variance
simplifies to $\theta(1 - \theta) E[\Psi\Psi']^{-1}$.
These results can be used directly for inference. Chernozhukov and Hansen (2004a) discuss
estimation of the variance and we will not go into details because we prefer to use resampling
procedures which avoid estimating a large number of densities by methods sensitive to the
choice of the bandwidth. Chernozhukov and Hansen propose a second inference procedure
called dual inference. If we set $\mathrm{A}(\theta, \alpha)$ equal to the inverse of the asymptotic covariance
matrix of $\sqrt{n}\left(\hat{\gamma}(\theta, \alpha) - \gamma(\theta, \alpha)\right)$, the objective function is the Wald statistic for testing
$\gamma(\theta, \alpha) = 0$. Therefore, for the true parameter $\alpha(\theta)$, $W_n(\theta, \alpha(\theta)) \sim \chi^2_r$ and a confidence
region given by
$$CR_p(\theta) := \{\alpha : W_n(\theta, \alpha) < c_p\}$$
is consistent, where $c_p$ is the $p$th quantile of a $\chi^2_r$-distributed random variable. This inference
procedure is interesting since it is consistent in the case of weak (or no) instruments.
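Computationally, the dual confidence region is obtained by inverting the test along the same grid that is used for estimation. The Python sketch below only illustrates this inversion step: the function `toy_wald` is a hypothetical quadratic stand-in for $W_n(\theta, \alpha)$, not a statistic computed from data, and the critical value is the 0.95 quantile of a $\chi^2$ distribution with one degree of freedom (one instrument).

```python
CHI2_1_95 = 3.841458820694124  # 0.95 quantile of the chi-square distribution, 1 d.o.f.

def confidence_region(grid, wald_stat, critical_value):
    """Collect all grid values of alpha at which gamma = 0 is not rejected."""
    return [a for a in grid if wald_stat(a) < critical_value]

def toy_wald(alpha):
    """Hypothetical Wald profile with its minimum at alpha = 1 (illustration only)."""
    return 400.0 * (alpha - 1.0) ** 2

grid = [k / 100 for k in range(201)]  # candidate values 0.00, 0.01, ..., 2.00
region = confidence_region(grid, toy_wald, CHI2_1_95)
print(region[0], region[-1], len(region))
```

With a weak instrument the Wald profile is flat in $\alpha$, so the region returned by such an inversion becomes wide or even covers the whole grid, which is how the procedure signals a lack of identification.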
This estimator has a lot of advantages. It is easy to compute with standard algorithms for
quantile regression. The assumptions needed for identification and interpretation of the results
are clearly stated and are reasonable in numerous applications. A detailed discussion of rank
similarity in the public-private sector application can be found in section 6. However, this
estimator needs one instrument for each endogenous regressor, including interaction terms
with covariates. For instance, if we want to estimate not only a different constant between the
public and private sector but also different rates of return to X, we need k+1 instruments,
which can be very difficult if not impossible to find. Therefore two possible estimators which allow
for interacted terms are proposed in the next two sections. The first uses the sample selection
correction of Buchinsky (1998) to estimate the slope parameters and the Chernozhukov and
Hansen estimator to estimate the constants. The second uses the Chernozhukov and Hansen
estimator locally at different points of the distribution of X and then integrates the results using
a minimum distance framework.
3.2 Combination of sample selection and IV quantile regression

Buchinsky (1998, 2001) proposes a sample selection procedure for quantile regression. This
estimator cannot be directly compared to the IV estimators presented above. However IV and
sample selection are closely linked, particularly when the endogenous variable is a dummy
(cf. Vytlacil 2002).
Buchinsky assumes that the dependent variable $Y^*$ depends linearly on some regressors $X$:
$$Y_i^* = X_i'\beta(\theta) + \upsilon_i \quad \text{and} \quad \mathrm{Quant}_\theta(\upsilon_i \mid X_i) = 0.$$
However, we observe $Y$ only if $Y^R$ is equal to 1. $Y^R$ is defined by
$$Y_i^R = 1(X_i'\alpha_X + Z_i'\alpha_Z + u_i > 0)$$
where it is assumed that at least one element of $\alpha_Z$ is different from 0. Thus, we have
$Y_i = Y_i^R Y_i^*$ and the conditional quantile of the observed $Y$ is given by
$$\mathrm{Quant}_\theta(Y^* \mid X, Z, Y^R = 1) = X'\beta(\theta) + \mathrm{Quant}_\theta(\upsilon \mid X, Z, Y^R = 1)$$
and, in general, $\mathrm{Quant}_\theta(\upsilon \mid X, Z, Y^R = 1) \ne 0$. Nevertheless, if we assume
continuity: $w \equiv (\upsilon, u)$ has a continuous density
and single index:
$$f_w(\cdot \mid X, Z) = f_w(\cdot \mid g(X'\alpha_X + Z'\alpha_Z)), \tag{9}$$
we can write
$$\mathrm{Quant}_\theta(Y^* \mid X, Z, Y^R = 1) = X'\beta(\theta) + h(g(X'\alpha_X + Z'\alpha_Z)) + \varepsilon \tag{10}$$
and $\mathrm{Quant}_\theta(\varepsilon \mid X, Y^R = 1) = 0$ by construction. Based on this result, Buchinsky proposes a
two-step estimator. In the first step $\alpha$ is estimated using the semiparametric estimator of
Ichimura (1993). In the second step a quantile regression is run on $X$ and an approximation of
the bias by a series expansion. It is obvious from (10) that the constant term in $\beta(\theta)$ cannot
be separately identified from $h(g(X'\alpha_X + Z'\alpha_Z))$. Therefore, Buchinsky proposes to
estimate the constant term "at infinity" from the sub-population for which the probability of
participation is close to one. This idea was suggested by Heckman (1990) and Andrews and
Schafgans (1996) for the estimation of the constant term in mean regression.
Now, if we consider the public-private sector application again, $D$ is a dummy variable that is
equal to 0 if the person works in the private sector and 1 if she works in the public sector. The
idea of the estimator proposed in this section is that a model with an endogenous dummy
variable $D$ and a fully interacted vector of regressors $X$,
$$Y = D'\alpha(\theta) + X'\beta(\theta) + DX'\delta(\theta) + \varepsilon,$$
can be rewritten as a switching regression model
$$Y_0 = \alpha_0(\theta) + X'\beta_0(\theta) + \varepsilon_0$$
$$Y_1 = \alpha_1(\theta) + X'\beta_1(\theta) + \varepsilon_1$$
$$Y = D Y_1 + (1 - D) Y_0.$$
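The mapping between the two representations is purely algebraic and can be checked directly. The Python sketch below assumes, for illustration only, a scalar regressor and an intercept written separately from $X$ (called `const`), and it ignores the error terms; the coefficient values are made up. Under these assumptions $\alpha_0$ is the common intercept, $\alpha_1$ adds the coefficient on $D$, and $\beta_1 = \beta_0 + \delta$.

```python
import math

def interacted(d, x, const, alpha, beta, delta):
    """Fully interacted form: const + d*alpha + x*beta + d*x*delta."""
    return const + d * alpha + x * beta + d * x * delta

def switching(d, x, a0, b0, a1, b1):
    """Switching form: y = d*y1 + (1 - d)*y0 with sector-specific equations."""
    y0 = a0 + x * b0
    y1 = a1 + x * b1
    return d * y1 + (1 - d) * y0

# Hypothetical coefficients of the interacted model.
const, alpha, beta, delta = 1.0, 0.3, 0.08, -0.02

# Implied coefficients of the two sector equations.
a0, b0 = const, beta
a1, b1 = const + alpha, beta + delta

for d in (0, 1):
    for x in (0.0, 5.0, 12.5):
        same = math.isclose(interacted(d, x, const, alpha, beta, delta),
                            switching(d, x, a0, b0, a1, b1))
        print(d, x, same)
```

The check makes explicit that estimating the interacted model requires one endogenous term per element of $X$, which is why the number of required instruments grows with the number of covariates.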
Therefore, considering the two wage equations separately, we can estimate them using the
sample selection correction of Buchinsky (1998) if we assume the single index assumption (9).
Note that this assumption is not innocuous since it implies, for instance, that
heteroscedasticity is only allowed if it depends on the index $X'\alpha_X + Z'\alpha_Z$. The constant
terms could theoretically be estimated "at infinity" as proposed by Buchinsky if there are
some observations with $\Pr(D = 1) \to 1$ and others with $\Pr(D = 1) \to 0$. This would allow
estimation of all parameters without assuming rank similarity. Such a method has the
drawbacks that it requires very strong, large support conditions and that estimation that
directly follows the identification strategy involves estimation on "thin sets" and thus a slow
rate of convergence. Identification hinges on sufficient distribution mass in the tails of the
index $X'\alpha_X + Z'\alpha_Z$, which must be unbounded. The estimation often rests on just a handful
of observations surpassing the growing threshold, which may be hard to distinguish from
unreasonable outliers.
For all these reasons we propose to estimate only the slope coefficients with the sample
selection procedure of Buchinsky, which yields $\sqrt{n}$-consistent and asymptotically normally
distributed estimates $\hat{\beta}_0(\theta)$ and $\hat{\beta}_1(\theta)$. In a second step, both constant terms can be
estimated using a slightly modified version of the estimator of Chernozhukov and Hansen.
The new weighted quantile regression objective function is given by
$$Q_n(\theta, \alpha, \delta, \gamma) \equiv \frac{1}{n} \sum_{i=1}^{n} \rho_\theta\left(Y_i - \alpha - D_i'\delta - (1 - D_i) X_i'\hat{\beta}_0 - D_i X_i'\hat{\beta}_1 - \hat{\Phi}_i(\theta)'\gamma\right) \hat{V}_i(\theta).$$
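The estimation procedure is defined as follows:
$$\hat{\delta}(\theta) = \arg\inf_{\delta \in A} n\, \hat{\gamma}(\theta, \delta)' \hat{\mathrm{A}}(\theta, \delta)\, \hat{\gamma}(\theta, \delta),$$
such that
$$\left(\hat{\alpha}(\theta, \delta), \hat{\gamma}(\theta, \delta)\right) = \arg\inf_{(\alpha, \gamma)} Q_n(\theta, \alpha, \delta, \gamma),$$
so that
$$\left(\hat{\alpha}_0(\theta), \hat{\alpha}_1(\theta), \hat{\beta}_0(\theta), \hat{\beta}_1(\theta)\right) = \left(\hat{\alpha}(\theta, \hat{\delta}(\theta)),\; \hat{\alpha}(\theta, \hat{\delta}(\theta)) + \hat{\delta}(\theta),\; \hat{\beta}_0(\theta),\; \hat{\beta}_1(\theta)\right).$$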
The asymptotic distributions of $\hat{\beta}_0(\theta)$ and $\hat{\beta}_1(\theta)$ are given in Buchinsky (1998). Note
however that $\hat{\beta}_0(\theta)$ and $\hat{\beta}_1(\theta)$ are correlated although they use different sets of observations
since they use the same first step estimate of $\Pr(D = 1 \mid X, Z)$. The asymptotic distribution of
$\hat{\alpha}_0(\theta)$ and $\hat{\alpha}_1(\theta)$ can be derived using the results for sequential GMM estimators since they
solve a moment condition given the first step estimates $\hat{\beta}_0(\theta)$ and $\hat{\beta}_1(\theta)$:
$$E\left[\left(1\left(Y_i < \alpha(\theta) + D_i'\delta(\theta) + (1 - D_i) X_i'\hat{\beta}_0(\theta) + D_i X_i'\hat{\beta}_1(\theta)\right) - \theta\right) [1 : Z_i']'\right] = 0.$$
3.3 Integration of nonparametric first step estimation

When the set of independent variables is discrete, the minimum distance framework provides
an alternative estimation procedure. Buchinsky (1991, chapter 1, section 9) and Chamberlain
(1994) derive and apply such an estimator for exogenous regressors. For the case with an
endogenous regressor, the idea consists in estimating the IV quantile regression separately in
each cell and then using the minimum distance framework to obtain $\sqrt{n}$-consistent and
asymptotically normally distributed estimates of the coefficients. In contrast to the
method proposed in section 3.2, this approach can be directly extended to the case of
continuous endogenous variables. Moreover, it is generally consistent under heteroscedasticity.
On the other hand, the derivation of the asymptotic distribution of the estimator in the
presence of continuous X needs some further work. Moreover, since the first step is estimated
nonparametrically, only regions with the common support property ($0 < \Pr(D = 1 \mid X) < 1$) can be
used in this procedure.
Suppose that $X_i$ comes from a discrete distribution so that there is a finite number, say J, of
different possible vectors $X^{(j)}$, $j = 1,\dots,J$. Chernozhukov and Hansen's estimator can be
applied separately in each cell. Of course, only a constant $\alpha_j(\theta)$ and the quantile treatment
effect (the coefficient on D) $\delta_j(\theta)$ are estimated. The asymptotic distribution of $\hat\alpha_j(\theta)$ and
$\hat\delta_j(\theta)$ is directly derived from (8):
$$\sqrt{n}\begin{pmatrix}\hat\alpha_j(\theta)-\alpha_j(\theta)\\ \hat\delta_j(\theta)-\delta_j(\theta)\end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\; \frac{\Lambda_j(\theta)}{\Pr\left(X = x^{(j)}\right)}\right)$$
where $\Lambda_j(\theta)$ is equal to $\Lambda(\theta)$ defined in (8), with the exceptions that we condition all
expected values on $X = x^{(j)}$ and the vector of regressors consists only of a constant term. We
also note that $\left(\hat\alpha_j(\theta),\hat\delta_j(\theta)\right)$ is independent of $\left(\hat\alpha_{j'}(\theta),\hat\delta_{j'}(\theta)\right)$ for $j \neq j'$.
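The cell-by-cell step can be illustrated with a minimal sketch, under assumptions that are not in the text: a single binary instrument, unit weights, and a user-supplied grid of candidate values for the treatment effect. In that saturated case the quantile regression of Y - D*delta on a constant and the instrument reduces to two group quantiles, so the coefficient on the instrument is simply their difference. The function names `empirical_quantile` and `ivqr_cell` are hypothetical.

```python
import math


def empirical_quantile(values, theta):
    """Return inf{y : F_hat(y) >= theta} for the empirical distribution of values."""
    ordered = sorted(values)
    # Smallest order statistic whose cumulative share reaches theta.
    k = max(1, math.ceil(theta * len(ordered) - 1e-12))
    return ordered[k - 1]


def ivqr_cell(y, d, z, theta, delta_grid):
    """Inverse quantile regression in one covariate cell with a binary instrument.

    Assumption: z takes the values 0 and 1 only and all weights equal one.
    For each candidate delta, alpha(delta) is the theta-quantile of
    y - d*delta for z = 0 and gamma(delta) is the difference between the
    z = 1 and z = 0 quantiles. The estimate of delta is the grid point at
    which |gamma| is smallest.
    """
    best = None
    for delta in delta_grid:
        adjusted = [yi - di * delta for yi, di in zip(y, d)]
        q0 = empirical_quantile([a for a, zi in zip(adjusted, z) if zi == 0], theta)
        q1 = empirical_quantile([a for a, zi in zip(adjusted, z) if zi == 1], theta)
        gamma = q1 - q0
        if best is None or abs(gamma) < best[0]:
            best = (abs(gamma), q0, delta)
    _, alpha_hat, delta_hat = best
    return alpha_hat, delta_hat
```

With several instruments or non-unit weights, the two group quantiles would have to be replaced by a weighted quantile regression and |gamma| by the quadratic form in the estimation procedure of section 3.2.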
Recall that we assume that the conditional quantiles of Y given X are linear in each sector.
That is,
$$\alpha_j(\theta) = X^{(j)\prime}\beta_0(\theta)$$
$$\alpha_j(\theta) + \delta_j(\theta) = X^{(j)\prime}\beta_1(\theta)$$
where the parameter vectors $\beta_0(\theta)$ and $\beta_1(\theta)$ are the same for $j = 1,\dots,J$. Define G to be a
$J \times k$ (with $J \ge k$) matrix with rows $X^{(1)},\dots,X^{(J)}$, $\alpha(\theta) = (\alpha_1(\theta),\dots,\alpha_J(\theta))$,
$\delta(\theta) = (\delta_1(\theta),\dots,\delta_J(\theta))$, and let $\hat W_s(\theta)$ be a $J \times J$ matrix that converges with probability one to
$W_s(\theta)$, a positive-definite matrix, for $s = 0,1$. The minimum distance estimators of $\beta_0(\theta)$
and $\beta_1(\theta)$ are then defined by
$$\hat\beta_0(\theta) = \arg\min_\beta \left(\hat\alpha(\theta) - G\beta\right)'\hat W_0(\theta)\left(\hat\alpha(\theta) - G\beta\right)$$
and
$$\hat\beta_1(\theta) = \arg\min_\beta \left(\hat\alpha(\theta) + \hat\delta(\theta) - G\beta\right)'\hat W_1(\theta)\left(\hat\alpha(\theta) + \hat\delta(\theta) - G\beta\right).$$
Then
$$\sqrt{n}\left(\hat\beta_s(\theta) - \beta_s(\theta)\right) \xrightarrow{d} N\left(0,\; \left(G'W_s(\theta)G\right)^{-1} G'W_s(\theta)\,\Omega_s(\theta)\,W_s(\theta)G \left(G'W_s(\theta)G\right)^{-1}\right),$$
for $s = 0,1$, where $\Omega_0(\theta)$ is a $J \times J$ diagonal matrix with the jth diagonal element equal to the
variance of $\hat\alpha_j(\theta)$ and $\Omega_1(\theta)$ is a $J \times J$ diagonal matrix with the jth diagonal element equal to the
variance of $\hat\alpha_j(\theta) + \hat\delta_j(\theta)$. An efficient minimum distance estimator is obtained by setting
$W_s(\theta)$ equal to a consistent estimator of $\Omega_s^{-1}$. Note that if we estimate the same model as
Chernozhukov and Hansen (without interaction terms), the efficiently weighted minimum
distance estimator is asymptotically identical to the estimator of Chernozhukov and Hansen
with optimal instruments and weights. The estimation of $\Omega_s^{-1}$ is not trivial since it involves
the estimation of density functions. Moreover, if the model is misspecified, the attempt to
increase efficiency would in fact lower the convergence rate to less than $\sqrt{n}$. Therefore
Chamberlain (1994) recommends using only cell sample sizes as weights.
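The minimum distance step with a diagonal weight matrix, such as the cell sample sizes, amounts to a weighted least squares fit of the cell estimates on the cell covariates. The following is a minimal sketch, assuming a diagonal weight matrix and a nonsingular G'WG; the helper names `solve_linear_system` and `minimum_distance` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def minimum_distance(cell_estimates, G, weights):
    """Minimise (c - G b)' W (c - G b) over b for a diagonal weight matrix W.

    cell_estimates: one first-step estimate per cell (length J)
    G: J rows, each holding the k covariate values of that cell
    weights: diagonal of W, for example the cell sample sizes
    """
    k = len(G[0])
    # Normal equations: (G' W G) b = G' W c.
    gwg = [[sum(w * g[p] * g[q] for g, w in zip(G, weights)) for q in range(k)]
           for p in range(k)]
    gwc = [sum(w * g[p] * c for g, w, c in zip(G, weights, cell_estimates))
           for p in range(k)]
    return solve_linear_system(gwg, gwc)
```

Passing the cell constants gives the private sector coefficients, and passing the sums of cell constants and cell treatment effects gives the public sector coefficients.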
If some of the exogenous variables are continuous, the problem becomes more complicated.
We propose a procedure which consists of first estimating α and δ nonparametrically at each
observation using a locally weighted version of the instrumental quantile regression estimator
and then using the minimum distance framework to obtain an estimate of the coefficients.
We have not yet derived the asymptotic distribution of this estimator but plan to do
so in the coming months. Intuitively, we think that the asymptotic distribution of the
nonparametric first-step estimator could be derived similarly to Chaudhuri (1991), Welsh
(1996) and Fan, Hu and Truong (1994) for the traditional quantile regression model, adding
the correction for endogeneity as derived by Chernozhukov and Hansen. Under suitable
conditions on the kernel and the bandwidth, the second step should be $\sqrt{n}$-consistent since we
integrate over all observations and there are only a finite number of parameters to estimate.
This is very similar to the average derivative estimator of Chaudhuri, Doksum and Samarov
(1997).
4 Decomposition of differences in distribution

The most basic approach to exploring the wage differential between groups or sectors involves
estimating an earnings regression using pooled data for public and private sector employees
and including a dummy variable for a worker's sector of employment. This specification can
be estimated for the conditional mean or the conditional quantiles of the dependent variable. If
the sector of employment is considered to be endogenous, the conditional mean of the log
wage can be estimated by traditional instrumental variable methods. For quantile regression, the recent
developments presented in section 2 allow us to correct for endogeneity. This simple dummy
variable approach is easy to estimate and the results are straightforward to interpret since the
“discrimination” part of the difference is the same for all observations. However, a very
strong restriction is implied by this specification: the returns to human characteristics are
constrained to be equal across sectors. The effect of a worker's sector of employment is
limited to an intercept effect.
Since this restriction is very often violated by the data, alternative methodologies have been
proposed. The first step naturally consists of estimating the wage equation separately for each
sector. The discrimination then differs at different points of the distribution of the
covariates. A first possibility for presenting the results is to consider the expected wage rates or
the quantiles of the wage distributions in the public and private sectors for reference
individuals. Another common procedure consists of aggregating the results. The Oaxaca
(1973) / Blinder (1973) decomposition is the best known decomposition procedure for models
of the mean. It makes it easy to decompose the total difference into a part explained by
different characteristics and a part explained by coefficients, which is often interpreted as
discrimination. Decomposing differences in distribution is a more complex problem because,
unlike the mean, the quantile of a linear function is not equal to the linear function of the
quantiles.
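For reference, the mean decomposition can be written out as follows. This is a sketch under assumptions not stated in the text: each sector's wage equation is estimated by ordinary least squares with a constant, so that the sample mean of the wage equals the mean characteristics times the estimated coefficients, and the private sector coefficients are taken as the reference, which mirrors the ordering used for the quantile decomposition in this section.

```latex
\bar{Y}_{pub} - \bar{Y}_{priv}
  = \underbrace{\bar{X}_{pub}'\left(\hat\beta_{pub} - \hat\beta_{priv}\right)}_{\text{coefficients}}
  + \underbrace{\left(\bar{X}_{pub} - \bar{X}_{priv}\right)'\hat\beta_{priv}}_{\text{characteristics}}
```

The identity holds exactly for the mean because the mean of a linear function is the linear function of the means; no such identity is available for quantiles.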
Melly (2004b) proposes an intuitive procedure to decompose differences at different quantiles
of the unconditional distribution. In a first step, the conditional distribution is estimated by
quantile regression. In the second step, the conditional distribution is integrated over the range
of the covariates. Formally, let $\hat\beta = \left(\hat\beta(\tau_1),\dots,\hat\beta(\tau_j),\dots,\hat\beta(\tau_J)\right)$ be the quantile regression
coefficients estimated at J different quantiles $0 < \tau_j < 1$, $j = 1,\dots,J$. Integrating over all
quantile regressions and over all observations, a natural estimator of the θth unconditional
quantile of the dependent variable is given by
$$Q(\theta, X, \beta) = \inf\left\{ q : \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{J} \left(\tau_j - \tau_{j-1}\right) 1\left(x_i\hat\beta(\tau_j) \le q\right) \ge \theta \right\}.$$
Melly (2004b) shows that this estimator is consistent and asymptotically normally distributed.
Now, we can estimate counterfactual distributions by replacing the estimated coefficients or
the distribution of characteristics in a sector with the estimated coefficients or the distribution
of characteristics in the other sector. It is thus possible to separate the difference at each
quantile of the unconditional distribution into a part explained by coefficients and a part
explained by characteristics:
$$Q(\theta, X_{pub}, \beta_{pub}) - Q(\theta, X_{priv}, \beta_{priv}) = \left[Q(\theta, X_{pub}, \beta_{pub}) - Q(\theta, X_{pub}, \beta_{priv})\right] + \left[Q(\theta, X_{pub}, \beta_{priv}) - Q(\theta, X_{priv}, \beta_{priv})\right]$$
where the first bracket represents the effect of differences in coefficients (discrimination) and
the second bracket represents the effect of differences in the distribution of characteristics
(justified differential).
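The unconditional quantile estimator and the resulting decomposition can be sketched as follows. The sketch takes the formula literally and assumes τ_0 = 0, so the weights sum to τ_J rather than to one; how the grid end points are treated in Melly (2004b) is not stated here. The function names are hypothetical and the quantile regression coefficients are taken as given.

```python
def unconditional_quantile(theta, X, betas, taus):
    """Return inf{q : (1/n) sum_i sum_j (tau_j - tau_{j-1}) 1(x_i'b(tau_j) <= q) >= theta}.

    X: list of covariate rows; betas[j]: coefficient vector estimated at taus[j].
    Assumption: tau_0 = 0 and taus is sorted in increasing order.
    """
    n = len(X)
    weighted = []
    previous_tau = 0.0
    for tau, beta in zip(taus, betas):
        weight = (tau - previous_tau) / n
        previous_tau = tau
        for row in X:
            fitted = sum(x * b for x, b in zip(row, beta))
            weighted.append((fitted, weight))
    weighted.sort()
    cumulative = 0.0
    for fitted, weight in weighted:
        cumulative += weight
        if cumulative >= theta - 1e-12:  # tolerance for floating point sums
            return fitted
    raise ValueError("theta exceeds the total weight tau_J of the quantile grid")


def decompose(theta, X_pub, betas_pub, X_priv, betas_priv, taus):
    """Split the gap at quantile theta into coefficient and characteristic parts."""
    q_pub = unconditional_quantile(theta, X_pub, betas_pub, taus)
    q_counterfactual = unconditional_quantile(theta, X_pub, betas_priv, taus)
    q_priv = unconditional_quantile(theta, X_priv, betas_priv, taus)
    return q_pub - q_counterfactual, q_counterfactual - q_priv
```

The counterfactual quantile combines public sector characteristics with private sector coefficients, so the two returned parts add up to the total difference by construction.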
5 Data, descriptive statistics, instruments
The analysis in this paper draws on data from the German Socio-Economic Panel (GSOEP)1
for the year 2003. It would be interesting to use the panel structure of the data to estimate a
fixed effect model. Unfortunately there is not enough movement between the public sector
and the private sector to obtain useful results. The choice between the public sector and the
private sector seems to be a choice for the entire working life. Therefore we concentrate in this
paper on the last wave of the panel and we control for endogeneity of the sector choice by
instrumental variable methods. After reunification, the panel was extended to include the
eastern part of Germany, but we focus here on West Germany only because undeniable
economic differences persist between East and West Germany. Since many public sector jobs
are not open to foreign nationals, the analysis is based on the subsample of Germans only.
Furthermore, the sample is restricted to men who were between 17 and 65 years
old and were in full-time or part-time employment. As the sample includes only wage earners,
the results must be interpreted conditional on the selected sample. However, since we
concentrate on males, we can hope that the selection bias is not important. Finally, all
observations with a missing value for one of the variables have been excluded. The final data
set has 3125 observations.
Table 1 defines the variables we use for our descriptive analyses and in the decompositions.
The dependent variable is Lnghearn, the logged gross hourly wage. X, the vector of regressors
assumed to be exogenous, contains experience and experience squared, tenure, a part-time
dummy, 5 educational dummies and 5 dummies describing the education level of the father2. The
endogenous variable is Psect, a dummy variable equal to 1 if the person is employed in the
public sector and 0 if he is employed in the private sector. We do not distinguish between
civil servants (Beamte) and other public sector employees since pay scales are the same and
apply to all public sector workers at the federal, state and local level. Table 2 presents
descriptive statistics for public and private sector employees. Means of relevant variables
show that average hourly earnings are higher in the public sector than in the private sector.
They also show that public sector employees are, on average, better educated than private
sector employees. For instance, 22.7 per cent of the employees in the public sector have
achieved a university degree (Ed level 6), compared with only 13 per cent in the private sector.
1 For an English language description of the GSOEP see SOEP Group (2001).
2 The exogeneity of different elements of X can be discussed. Dustmann and van Soest (1998) show that it is
important to control for possible endogeneity of educational choices. However, to avoid computational
difficulties and presumably very high variances of the estimates, we do not consider this possibility in this paper.
Public sector employees have acquired more labor market experience and tenure, too. These
differences in work experience, education and tenure may explain the higher average wages
of public sector employees.
A first visual summary of the public and private sector wage distributions is provided in
figure 1. The density functions were estimated using an Epanechnikov kernel estimator and
the bandwidth was chosen according to Silverman's (1986) rule of thumb. It can be seen from
this figure that the distributions are quite distinct between sectors. The public sector earnings
distribution is characterized by a higher density around the mode and a lower
dispersion. The public sector earnings distribution lies “within” the private distribution. Public
sector employees at the 10% quantile of the public sector earnings distribution enjoy an
earnings advantage over private sector employees at the same point in the private sector
distribution of wages, but the reverse holds for employees at the 90% quantile of the public
sector and private sector earnings distributions. With “higher floors” and “lower ceilings”, the
public sector compresses the unconditional wage distribution.
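A density estimate of this kind can be sketched as follows. The kernel is the Epanechnikov kernel named in the text; the bandwidth helper uses one common normal-reference form of the rule of thumb, 0.9 min(sd, IQR/1.34) n^(-1/5), which is an assumption, since the exact variant and any kernel-specific rescaling used for figure 1 are not stated. All function names are hypothetical.

```python
import statistics


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5).

    Assumption: this common form of the rule of thumb is used unchanged;
    the variant applied in the paper is not specified.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** (-0.2)


def kernel_density(x, data, bandwidth):
    """Kernel density estimate at x: (1 / (n h)) * sum_i K((x - x_i) / h)."""
    total = sum(epanechnikov((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)
```

Evaluating `kernel_density` on a grid of log wages separately for each sector would give two curves comparable to those described for figure 1.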
Since workers choose whether to work in the public or the private sector, there is potential
for sample selection bias because the workers employed in each sector are not a random draw
from the population. To correct for endogenous sector
choice, nonparametric identification requires exclusion restrictions. In many studies, the data
are not rich enough to provide appropriate instruments and the identification assumptions are
sometimes dubious. For example, different measures of education have been used in the wage
equation and in the selection equation, or age is used in one equation and experience in the
other. The GSOEP is a rich data set that contains a large range of background variables
usually not available in other studies. We will use 5 variables related to parents' occupational
status. Dustmann and van Soest (1998) have used very similar exclusion restrictions. The most
important instrument is Fcivil, a dummy variable that is equal to 1 if the father was a civil
servant at the time the employee was 16 years old. The idea is that children tend to
choose the same occupation as their father by imitation. Table 2 shows that the correlation is very high
for civil servants. Having a father who worked in the public sector almost doubles the
probability of working in the public sector. We also note that the probability of working in the
public sector increases if the father was a white-collar worker and decreases if he was a blue-collar worker.
Finally, there is a difference of 5.2% in the probability of working in the public sector
between employees whose mother worked and those whose mother did not work.
2 (continued) This is the first study that estimates private and public sector wage distributions controlling for
endogeneity of the sector choice. Simultaneous endogeneity of education is left for later work.
6 Discussion of the assumptions
The good properties of the estimator of Chernozhukov and Hansen (2004a) depend crucially
on a set of assumptions detailed in section 3.1. In this section, we discuss the two most
controversial assumptions: the exclusion restriction and rank similarity.
The instruments must satisfy two requirements: the full rank condition (7) and
independence from the potential outcomes given the value of the covariates. The full rank
condition is above all an empirical question, which will be examined in section 7.2. The
second requirement can also be partially tested because we have 5 instruments for a single
endogenous variable, but we can also discuss its plausibility. The occupational status of the
father could affect the wage of his son if the father has better connections in his sector of
employment and the son can profit from these connections to increase his wage. Another
possibility is that the father can teach something to his son and this increases the productivity
of the son only if he works in the same sector. However, we can control for the educational
level of the father and it seems that this information is more important than the occupational
choice of the father in determining the wage. We do not control for the educational level of
the mother because it was found to be totally insignificant in all specifications. The fact that
the mother is not working could increase the quality of the education of the child if she helps
him more with homework. All these scenarios are imaginable. However, it is unlikely that they
play an important role, if any. In any case, these variables seem to be the best available instruments and
are less dubious than the variables traditionally used as instruments.
The principal assumption of the instrumental variable framework of Chernozhukov and
Hansen is rank similarity, a generalization of rank invariance. Given its importance, we
discuss this assumption in general and in the context of the application. To facilitate the
discussion, we begin by considering rank invariance and abstract from the dependence on
covariates. Let $F_0$ denote the distribution of wages in the private sector and $F_1$ the
distribution of wages in the public sector. Following Doksum (1974) and Lehmann (1975, p.
68-69), we define $\delta(y)$ as the “horizontal distance” between $F_0$ and $F_1$ at y:
$$F_1(y) = F_0\left(y + \delta(y)\right).$$
If we assume that $y + \delta(y)$ is nondecreasing, then $\delta(y)$ is uniquely defined and can be
expressed as
$$\delta(y) = F_0^{-1}\left(F_1(y)\right) - y$$
and is called the treatment effect function by Doksum (1974). Changing variables so that
$\theta = F_1(y)$, we obtain the quantile treatment effect (cf. Koenker and Machado 1999, Koenker
and Bilias 2001 and Koenker and Geling 2001):
$$\Delta(\theta) = \delta\left(F_1^{-1}(\theta)\right) = F_0^{-1}(\theta) - F_1^{-1}(\theta). \qquad (11)$$
In this model, subjects at different quantiles react differently to the sector of employment.
This is much more flexible than the linear or shift model, which assumes that $\Delta(\theta) = \Delta$.
Doksum proposes to interpret the quantile or rank of an individual in a given distribution as
the proneness to learn fast, the proneness to work hard or simply as ability. If we consider two
employees that have this proneness to a different degree, then we would expect the sector of
employment to affect these two employees differently. The function $\Delta(\theta)$ is a measure of this
interaction between the sector and the ability.
However, Doksum notes that “it is a rather special model since it assumes that two subjects
with the same control response will react the same way to the treatment”. In other words, an
employee at the median of the potential wage distribution in the private sector would also be
at the median of the potential wage distribution in the public sector. Thus, employees react
differently to the treatment if and only if they differ in the latent ability θ. This property is
called rank invariance. Of course, there is no possibility of really knowing whether the sector
choice operates in this way, since we cannot observe the same employee in both the public
and private sector. In the best case, a randomized experiment with perfect compliance, we only
observe the two marginal distributions. Without a randomized experiment, the availability of an
instrument allows us to estimate the marginal distributions, but not the joint distribution.
Without covariates and with exogenous sector choice, the quantile treatment effect can be
naturally estimated with the sample analog of (11)
$$\hat\Delta(\theta) = \hat F_0^{-1}(\theta) - \hat F_1^{-1}(\theta)$$
where $\hat F_s$ denotes the empirical distribution function in sector s and $\hat F_s^{-1}(\theta) = \inf\left\{y : \hat F_s(y) \ge \theta\right\}$, as usual.
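The sample analog can be computed directly from the two sorted wage samples. The following is a minimal sketch with hypothetical function names; it follows the sign convention of (11), so the effect is the private sector quantile minus the public sector quantile.

```python
import math


def empirical_quantile(sample, theta):
    """Return inf{y : F_hat(y) >= theta} for the empirical distribution F_hat."""
    ordered = sorted(sample)
    # Smallest order statistic whose cumulative share reaches theta.
    k = max(1, math.ceil(theta * len(ordered) - 1e-12))
    return ordered[k - 1]


def quantile_treatment_effect(private_wages, public_wages, theta):
    """Sample analog of (11): F0_hat^{-1}(theta) - F1_hat^{-1}(theta),
    with F0 the private sector and F1 the public sector distribution."""
    return (empirical_quantile(private_wages, theta)
            - empirical_quantile(public_wages, theta))
```

For example, with private wages 1, 2, 3, 4 and public wages 2, 2, 3, 3, the estimate is -1 at θ = 0.25, 0 at θ = 0.5 and 1 at θ = 1, the pattern expected when the public sector distribution is more compressed.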
If the treatment is continuous or if we need to control for the presence of a vector of
exogenous covariates X, quantile regression is a natural extension of the Lehmann-Doksum
formulation. In the presence of covariates, the interpretation of the results requires assuming
rank invariance conditionally on the covariates. For instance, if we regress wages on years
of experience and education, we must assume that a worker at the θth quantile of the
conditional distribution with 20 years of experience would be at the θth quantile of the
conditional distribution with 21 years of experience in order to interpret the coefficient of
experience. This implicit assumption is thus present in the traditional, exogenous quantile
regression model.
Now, if the sector choice is endogenous, Chernozhukov and Hansen (2005) show that we can
identify and interpret the quantile treatment effect if, among other assumptions (in
particular the presence of an instrument), the assumption of rank similarity is satisfied. This
assumption states that, given the values of the observed characteristics X and of the
instruments Z, the expectation of the rank in the potential distributions does not vary with the
sector of employment. An employee at the θth quantile of the potential distribution of wages
in the private sector conditional on X and Z expects to be at the θth quantile of the potential
distribution of wages in the public sector conditional on the same characteristics and
instruments. This assumption is weaker than rank invariance since the ranks could differ ex
post. In other words, the employee's information does not allow him to predict
a systematic variation of his rank across sectors. Rank similarity is simply a restriction on
the information set of the individual.
Since it is impossible to test this assumption, we can only discuss its plausibility. Rank
invariance means that the high earners in the public sector are also the high earners in the
private sector, conditionally on X and Z. It would be violated if totally different qualities
were required in one sector compared with the other sector, for instance
conscientiousness and precision in the public sector and originality and rapidity in the private
sector, and if these qualities were negatively correlated among the employees. If the public
sector is organized totally differently from the private sector, it is possible that rank similarity
is violated. This seems to be implausible. Control over public expenditure has been a central
theme of the political discussion since the 1990s. In order to meet the Maastricht criteria,
pressure has been put on public spending in general and on wages in particular. As a
result, the public sector was reorganized, some parts of the public sector were
privatized and there is more competition between the private and the public sector. The
globalization of the economy is another source of pressure that motivated the government to
liberalize and deregulate the market. Economic principles are now being introduced in
administrative structures. The model for these efforts is the private sector and its enterprises;
clear evidence of this evolution is the new administrative terminology. Given these
reforms, it is unlikely that public sector employees are rewarded very differently from
private sector employees. Moreover, violation of rank similarity would require that an
individual knows, before he begins to work, that he will be a good employee in one sector and a
bad employee in the other. As explained above, there are few changes of sector
during a working life. Therefore, it seems difficult for an employee to judge whether he would be good in a
sector in which he has never worked.
7 Empirical results
7.1 Exogenous sector choice
As a benchmark, we first estimate the public-private sector wage differential using traditional
quantile regression methods, assuming that the sector choice is exogenous. The first method
used is the simple dummy variable approach. We have regressed the logged wage on X and on
the public sector dummy with traditional quantile regression at 100 different quantiles
uniformly distributed between 0 and 1. The results of the median regression are given in table
6 but we will concentrate on the coefficients of the public sector dummy. They are plotted in
figure 2 with a 95% bootstrap confidence interval. At the median of the conditional
distribution, public sector employees earn 11.5% less than private sector employees with the
same characteristics (standard error: 1.6%). This negative coefficient is significantly different
from zero. If we integrate over all quantiles, we estimate the mean effect to be -9.5%. To
avoid the noisiest estimates at the extremes of the distribution, we follow Koenker and
Portnoy (1987) and trim the distribution between 10% and 90%. Finally, we weight the
quantiles by the inverse of their variance. We obtain a weighted trimmed mean of -9.9%
(standard error: 1.4%). The results also show that the public sector compresses wages by
giving a positive premium at the low end of the conditional distribution and a significant
negative premium at the upper tail of the distribution.
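The weighted trimmed mean can be sketched as follows. This is a minimal illustration, assuming that the trimming bounds are inclusive and that only the variances of the individual quantile coefficients enter the weights; it returns the point estimate only and does not reproduce the standard error, which would also require the covariances between quantile estimates. The function name is hypothetical.

```python
def weighted_trimmed_mean(taus, coefficients, variances, lower=0.1, upper=0.9):
    """Inverse-variance weighted mean of the coefficients with lower <= tau <= upper."""
    kept = [(c, 1.0 / v) for t, c, v in zip(taus, coefficients, variances)
            if lower <= t <= upper]
    total_weight = sum(w for _, w in kept)
    return sum(c * w for c, w in kept) / total_weight
```

Quantiles with a small variance, typically those near the center of the distribution, receive the largest weights, which is why the trimmed and weighted figure can differ from the simple average over all quantiles.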
These results are correct only if the returns to human characteristics in X are the same in both
sectors. To test this restriction, we have estimated a fully interacted model where all
characteristics are interacted with the public sector dummy and we have tested whether the
interaction terms are significantly different from zero with a Wald test. The variance-
covariance matrix is estimated using a bootstrap with 200 replications. Under the null
hypothesis, the test statistic follows a chi-squared distribution with 14 degrees of
freedom. The null hypothesis is clearly rejected for all quantile regressions between the 5%
quantile and the 95% quantile. The value of the statistic is, for instance, 103.55 for the median
regression, 328.48 for the 20% quantile regression and 309.28 for the 80% quantile
regression.
Therefore, we have estimated 100 separate quantile regressions for each sector. Then, using
the procedure described in section 4, we have decomposed the difference between the
quantiles of the unconditional distributions into a part explained by different distributions of
characteristics and a part explained by different coefficients (which could be interpreted as a premium
or as discrimination). Figure 3 plots the decomposition results with a 95% bootstrap confidence
interval for all estimates. The compression of the unconditional public sector wage
distribution can be seen by looking at the total differential. The 10% quantile of the public
sector wage distribution is higher than the 10% quantile of the private sector wage distribution but the
contrary holds at the 90% quantile. This is only another way of presenting the results of figure
1. The part explained by characteristics is significantly positive, reflecting the fact that
public sector employees are better educated and have more tenure than private sector
employees. We cannot reject the hypothesis that the part explained by characteristics is
constant across the distribution. Therefore, the lower wage dispersion in the public sector is
not caused by a lower dispersion of the characteristics. Finally, the part explained by
coefficients is very similar to the results of the dummy variable approach in figure 2. The
premium is significantly negative at the median and decreases monotonically from the low end
to the high end of the distribution. Similar results have been found by Melly (2004a) and
Jürges (2002).
Now, since the returns to characteristics are different in the two sectors, the unexplained
differential may differ for different individuals. Therefore, we consider the quantile treatment
effect at three different quantiles (0.2, 0.5, 0.8) for five reference individuals. The first and
base case is a full-time employee with 22 years of experience, 9.4 years of tenure and education
level 3 whose father has basic schooling (median of all variables). The second individual
has no experience and tenure and education level 1. The third has 30 years of experience and
tenure and education level 6. The fourth has no experience and tenure and education level 6.
Finally, the last individual has the same characteristics as the first except that he is only
employed part-time. The second column of table 3 gives the results for the separate estimation
of the wage regression in the two sectors assuming exogeneity of sector choice. The
comparison of the results for different individuals shows that assuming a constant effect is too
restrictive. In particular, individuals with a high level of human capital (education, experience
and tenure) are better off in the private sector. Part-time employment is less penalized in
the public sector. Generally, the differential declines with the quantile of the conditional distribution at
which it is measured. Thus, the public sector compresses the pay distribution at all points of
the covariate distribution. However, we should recall that these results were obtained
assuming exogenous sector choice. This assumption, which is unlikely to be satisfied, will be relaxed in
sections 7.3 and 7.4.
7.2 Choice between private and public sector

To describe the selection process between both sectors and as a first step estimation for the
sample selection correction procedure of Buchinsky (1998), we estimate the probability of
working in the public sector conditionally on the covariates and on the instruments by a logit
model. Since the consistency of the logit estimator depends heavily on the distributional
assumption, we also estimate the sector choice equation by smoothed binary quantile
regression (Kordas 2005). Smoothed binary quantile regression was preferred to Klein and
Spady's (1993) or Ichimura's (1993) estimators because it allows for arbitrary
heteroscedasticity. Moreover, it imposes the same type of assumptions as quantile regression
in the other steps of the estimation (conditional quantile restriction).
Maximum score estimation, developed and studied by Manski (1975, 1985), is equivalent to
median regression applied to the binary choice model. Kim and Pollard (1990) established an
$O(n^{-1/3})$ rate of convergence for the estimator and showed that its limiting distribution cannot
be used for inference because it depends on unknown nuisance parameters. To overcome
these shortcomings, Horowitz (1992) proposed smoothing the objective function. Kordas
(2005) extends results regarding smoothed median regression to general smoothed binary
quantile regression. While quantile regressions are interesting because they provide a more
complete picture of how covariates affect the dependent variable, binary quantile regressions
are even more interesting because they can identify the model in cases where the
median parameters are not identified. If both are identified, binary quantile regression can be
much more efficient than the maximum score estimator. Recall that the maximum score estimator
effectively uses only the observations for which $\Pr(y = 1 \mid X) = 0.5$. In the public-private sector
application, 22% of the employees in the sample work in the public sector. Using the logit
results, only 129 observations or 4% of the sample have a probability of working in the public
sector higher than 50%. Therefore, the (smoothed) maximum score estimator will have a very
high variance since it depends on very few observations. A binary quantile regression for a
quantile higher than the median will have a lower variance. Kordas (2000) establishes the
optimal quantile for two simple configurations. Depending on the variance of the error terms,
he finds that a quantile of 65% or 72% is optimal if the unconditional success probability is
20%. Based on these results and given that the unconditional success probability is 23%, we
have chosen to estimate the 70% smoothed binary quantile regression.
Since the model is identified only up to scale, we normalize the coefficient of tenure to 1. The
smoothed binary quantile regression estimator is defined as the solution to the problem
$$\hat\alpha(\theta) = \arg\max_a \frac{1}{n}\sum_{i=1}^{n} \left(y_i - (1-\theta)\right) K_h\left(\frac{x_i'a}{\sigma_n}\right)$$
where $K_h(\cdot)$ is the integral of an hth order kernel function and $\sigma_n$ is a bandwidth converging to
0 as $n \to \infty$. Given the conditions of Horowitz (1992) or Kordas (2005), $\hat\alpha(\theta)$ is
asymptotically normally distributed. The fastest possible rate of convergence of $\hat\alpha(\theta)$ is
$O\left(n^{-h/(2h+1)}\right)$. In the application, we use the fourth order kernel proposed by Horowitz (2002).
Using the plug-in method of Horowitz (1992) to estimate the optimal bandwidth, we obtain an
optimal bandwidth near 1 for a large range of starting values. To remove the asymptotic bias,
we undersmooth and choose a bandwidth of 0.1. The variance of the estimates was obtained
by bootstrapping the results 200 times. Horowitz (2002) proves the validity of the bootstrap.
The objective function to be maximized has many local maxima and requires a global
optimization algorithm. We use a procedure similar to that of Horowitz (1993). Starting from
the rescaled logit parameters, we carry out 10000 iterations of simulated annealing algorithm
followed by as many Nelder-Mead iterations as were needed to obtained convergence.
Simulated annealing yields a value of α̂ (θ ) that is sufficiently near the global maximum to
enable the maximum to be found by Nelder-Mead. We repeat this procedure 5 times to be
sure that we have found the optimal solution.
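The two-stage search described above can be sketched in Python as follows. This is an illustrative sketch only, not the procedure of Horowitz (1993) or the code used here: the objective is a standard multimodal test function standing in for the negated criterion (so the sketch minimizes), the starting point is arbitrary rather than the rescaled logit parameters, and the cooling schedule, step sizes and the compact Nelder-Mead variant are all assumptions.

```python
import math
import random

def rastrigin(p):
    """Multimodal test function (assumption: stands in for the negated objective)."""
    return 10.0 * len(p) + sum(x * x - 10.0 * math.cos(2.0 * math.pi * x) for x in p)

def simulated_annealing(f, start, iters, rng, t0=5.0, step=0.5):
    """Random-walk annealing with linear cooling; returns the best point visited."""
    current, f_cur = list(start), f(start)
    best, f_best = list(current), f_cur
    for k in range(iters):
        temp = t0 * (1.0 - k / iters) + 1e-12
        cand = [x + rng.gauss(0.0, step) for x in current]
        f_cand = f(cand)
        if f_cand < f_cur or rng.random() < math.exp(-(f_cand - f_cur) / temp):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = list(current), f_cur
    return best

def nelder_mead(f, x0, step=0.05, tol=1e-12, max_iter=2000):
    """Compact textbook Nelder-Mead (reflection, expansion, contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [2.0 * c - w for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [3.0 * c - 2.0 * w for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            contracted = [0.5 * (c + w) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex halfway towards the best one
                simplex = [best] + [[0.5 * (b + v) for b, v in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)

def global_minimize(f, start, restarts=5, iters=10000, seed=0):
    """Repeat annealing plus local refinement and keep the best result."""
    rng = random.Random(seed)
    best = list(start)
    for _ in range(restarts):
        rough = simulated_annealing(f, start, iters, rng)
        refined = nelder_mead(f, rough)
        if f(refined) < f(best):
            best = refined
    return best

start = [3.2, -2.7]
solution = global_minimize(rastrigin, start)
print(solution, rastrigin(solution))
```

The annealing stage only needs to land in the right basin; the local simplex search then does the fine adjustment, and the repetitions guard against an unlucky annealing run.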
The results of the logit and smoothed binary quantile regression estimations are given in table
4. The coefficients of the logit and of the smoothed binary quantile model are similar, with the
exception of the constants, which have different normalizations. The standard errors of the logit
estimates are generally slightly lower than those of the smoothed quantile estimates. The results
are broadly as expected. Experience seems to play no role in the sector choice. Tenure and
education increase the probability of working in the public sector. The education of the father
is not important for the sector choice of his son, in contrast to the occupations of the
parents. With the smoothed binary quantile regression, Fcivil is significantly different from
zero at the 0.1% level and Mnwork at the 1% level. The Wald test of the hypothesis that the
coefficients of all instruments are equal to zero gives a value of 21.76, decisively rejecting
the null hypothesis.
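To make the size of this rejection concrete, the following Python sketch computes the p-value of the reported Wald statistic. It assumes that the statistic is compared with a chi-squared distribution with 5 degrees of freedom, one per instrument related to the parents' occupational status; this reference distribution is an assumption of the sketch, and the function name is hypothetical. The survival function uses the closed form available for odd degrees of freedom.

```python
import math

def chi2_sf_5df(x):
    """P(X > x) for a chi-squared variable with 5 degrees of freedom (closed form)."""
    half = x / 2.0
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * (1.0 + x / 3.0))

wald = 21.76  # value reported in the text
p_value = chi2_sf_5df(wald)
print(f"p-value = {p_value:.5f}")
```

Under this assumption the p-value lies below 0.1%, in line with the decisive rejection reported above.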
7.3 Endogenous dummy
The results of section 7.1 assume that the sector choice is exogenous. However, since workers
choose whether to work in the public or in the private sector, there is potential for sample
selection bias because the workers employed in each sector are not a random draw. Therefore, we
correct for endogeneity of the sector choice by using the estimator of Chernozhukov and Hansen
(2004a, 2004b and 2004c) presented in section 3.1. The instruments are the 5 variables related to
the parents' occupational status defined in section 5. 100 different instrumental variable
quantile regressions, uniformly distributed between 0 and 1, were estimated using the estimation
procedure of Chernozhukov and Hansen. Given the moderate sample size and the difficulty of
estimating densities, we do not attempt to estimate optimal weights and optimal instruments. We
do not weight the observations and we choose $A(\theta, \alpha)$ to be the inverse of the
asymptotic covariance matrix of $\sqrt{n}\,(\hat{\gamma}(\theta, \alpha) - \gamma(\theta, \alpha))$.
We estimate this covariance matrix with a Huber sandwich estimate using a local estimate of the
sparsity (see footnote 3). The estimation procedure consists simply in running a series of
standard quantile regressions of $Y - D\alpha$ on the covariates $X$ and the instruments $Z$ over
a grid of values of $\alpha$. The parameter space for $\alpha$ was taken to be between -2.5 and
2.5 for θ < 0.05 or θ > 0.95 and between -1 and 1 for the other quantiles. We used an equally
spaced grid with a step size of 0.01. Figure 4 shows the objective function for 3 different
instrumental quantile regressions (θ = 0.2, θ = 0.5 and θ = 0.8). The optima are clear and quite
precise. This was the case for all quantiles, but the precision of the estimation diminishes as
we move away from the median.
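The grid search described above can be illustrated with a deliberately simplified Python sketch. It is not the implementation used in this paper: it assumes a single binary instrument and no covariates, so that the quantile regression of Y - Dα on a constant and Z reduces to comparing group quantiles, and it minimizes the absolute value of the instrument coefficient instead of the weighted Wald statistic. All names and the simulated design (true effect 0.3) are hypothetical.

```python
import random

def quantile(values, theta):
    """Empirical theta-quantile (simple order-statistic definition)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(theta * len(ordered)))]

def gamma_hat(alpha, y, d, z, theta):
    """Coefficient on a binary instrument in a quantile regression of y - d*alpha
    on a constant and z: the gap between the two group quantiles."""
    adj1 = [yi - di * alpha for yi, di, zi in zip(y, d, z) if zi == 1]
    adj0 = [yi - di * alpha for yi, di, zi in zip(y, d, z) if zi == 0]
    return quantile(adj1, theta) - quantile(adj0, theta)

def ivqr_grid(y, d, z, theta, lo=-1.0, hi=1.0, step=0.01):
    """Pick the alpha on the grid that drives the instrument coefficient closest to zero."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    return min(grid, key=lambda a: abs(gamma_hat(a, y, d, z, theta)))

# Simulated data (hypothetical design): true effect 0.3, endogenous treatment.
rng = random.Random(1)
y, d, z = [], [], []
for _ in range(5000):
    zi = 1 if rng.random() < 0.5 else 0
    e = rng.gauss(0.0, 1.0)
    v = 0.6 * e + 0.8 * rng.gauss(0.0, 1.0)  # correlated with e, hence endogeneity
    di = 1 if 2.0 * zi - 1.0 + v > 0 else 0
    y.append(0.3 * di + 0.5 * e)
    d.append(di)
    z.append(zi)

alpha_hat = ivqr_grid(y, d, z, theta=0.5)
print(alpha_hat)
```

The logic is the one exploited by the estimator: at the true α, the instrument should have no remaining explanatory power for the quantiles of Y - Dα.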
Figure 5 plots the coefficients of the public sector dummies at each percentile. If we compare
these results with the results of section 7.1, the premium is about 40% higher if we correct for
endogeneity. While the differential was negative over the major part of the distribution with
quantile regression, it is now positive over 75% of the distribution. The correction for
endogeneity of the sector choice inverts the conclusion: the majority of public sector
employees are over- and not underpaid. Thus, there is positive selection into the private sector
and negative selection into the public sector. Goddeeris (1988) and Gyourko and Tracy (1988)
obtain similar results. The direction of the selection effect is correctly predicted by the Roy
model (Roy 1951; see also the discussion in Heckman and Honore 1990). Employees with an absolute
disadvantage have a comparative advantage in the sector in which earnings are more
concentrated. Thus, individuals will be positively selected towards the sector with higher
wage inequality, the private sector.

Footnote 3: This corresponds to the option nid in the quantreg package of Roger Koenker. A
description can be found in Koenker and Machado (1999).
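The selection mechanism of the Roy model invoked above can be illustrated with a small simulation. The following Python sketch rests on strong simplifying assumptions that are not taken from this paper: a single ability index that is rewarded in both sectors (perfectly correlated skills), workers who choose the sector with the higher wage, and purely hypothetical parameter values in which the private sector has the more dispersed wage structure.

```python
import random

rng = random.Random(0)
mu_public, sd_public = 2.75, 0.3    # hypothetical: compressed wage structure
mu_private, sd_private = 2.65, 0.5  # hypothetical: more dispersed wage structure

public_ability, private_ability = [], []
for _ in range(10000):
    ability = rng.gauss(0.0, 1.0)
    wage_public = mu_public + sd_public * ability
    wage_private = mu_private + sd_private * ability
    # Each worker picks the sector that pays more.
    if wage_private > wage_public:
        private_ability.append(ability)
    else:
        public_ability.append(ability)

mean_private = sum(private_ability) / len(private_ability)
mean_public = sum(public_ability) / len(public_ability)
print(mean_private, mean_public)
```

In this stylized setting, the workers who choose the sector with the higher wage dispersion have above-average ability, and those who choose the compressed sector have below-average ability.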
Now, if we consider the evolution of the premium when we let θ vary between 0 and 1, we
obtain a picture similar to figure 2. The premium declines more or less monotonically from
high positive values at the lower end of the distribution to negative values at the higher end of
the distribution. The differences are even more pronounced, with estimates ranging from 2 to -2.
However, given the variance of the estimates, these extreme results should be interpreted with
caution. In any case, the compression of the pay distribution by the public sector remains
after correcting for endogeneity and is perhaps even accentuated. One could have thought that
the different distributions of wages were caused by different distributions of unobserved
ability; this is not the case.
The dual inference procedure is illustrated in figure 4 for the 20%, 50% and 80% quantile
regressions. α is plotted on the horizontal axis and the vertical axis measures the function
value. The horizontal line in each graph is the 95% critical value for the dual testing
procedure. The dual confidence region consists of all values of α such that the function value
lies below the horizontal line. This inference procedure is attractive since it is valid with weak
instruments. However, if there are more instruments than endogenous variables, this Wald
statistic has traditionally been used to test the exclusion restrictions. Thus, if we want to
obtain small confidence intervals, it is sufficient to add "instruments" which are endogenous and
which will artificially increase the objective function. Since the results of section 7.2
indicate that we have strong instruments, we concentrate primarily on the direct inference
procedure and use the minimal objective value to test the exclusion restrictions. Of the 100
instrumental quantile regressions, none has an objective value higher than the 99% quantile
of a $\chi^2_5$ distributed variable, 5 have an objective value higher than the 95% quantile of a
$\chi^2_5$ and 13 have an objective value higher than the 90% quantile of a $\chi^2_5$. Thus, we
cannot reject the null hypothesis that the instruments do not affect the wage.
A 95% subsampling confidence interval is plotted in figure 5. We choose the m out of n
bootstrap without replacement following the recommendations of Chernozhukov (2002) and
Chernozhukov and Hansen (2004c). Using their experience, we choose the block size to be
300 (see footnote 4). 1000 replications were performed. We do not plot the confidence interval
of the 10 lowest and 10 highest quantiles because the estimated values were on the border of the
parameter space for some replications. The confidence intervals based on the dual procedure,
not plotted in figure 5, are generally narrower.

Footnote 4: They recommend choosing a block size of $k n^{2/5}$ with k between 3 and 10. k = 10
gives a block size of 250. We prefer to take a subsample size of 300 to avoid multicollinearity
in too many replications.
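The block size rule and the subsampling interval can be sketched in Python as follows. This is a generic illustration of subsampling without replacement, not the code used for figure 5: the statistic is a simple sample median on simulated data, a root-n convergence rate is assumed for the rescaling, and all function names are hypothetical.

```python
import math
import random

def block_size(n, k):
    """Subsample size k * n^(2/5), rounded to the nearest integer."""
    return round(k * n ** 0.4)

def median(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

def subsampling_ci(data, statistic, b, reps, rng, level=0.95):
    """Confidence interval from subsamples of size b drawn without replacement."""
    n = len(data)
    estimate = statistic(data)
    roots = sorted(
        math.sqrt(b) * (statistic(rng.sample(data, b)) - estimate)
        for _ in range(reps)
    )
    tail = (1.0 - level) / 2.0
    lower_root = roots[int(tail * reps)]
    upper_root = roots[min(reps - 1, int((1.0 - tail) * reps))]
    return estimate - upper_root / math.sqrt(n), estimate - lower_root / math.sqrt(n)

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(3125)]  # simulated stand-in data
low, high = subsampling_ci(sample, median, b=300, reps=1000, rng=rng)
print(block_size(3125, 10), low, high)
```

With n = 3125, the rule gives 25k, so k = 10 corresponds to 250 observations; the subsample size of 300 used in the application is slightly above that range.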
The standard errors of the estimates are higher (about 7 times) if we correct for endogeneity,
as is usually the case. Therefore, it is difficult to give a precise estimate if we consider
only one instrumental quantile regression. However, we can draw clearer conclusions by
considering the whole instrumental quantile regression process. First, to estimate the level of
the public sector wage premium, we compute the weighted trimmed mean effect by integrating over
all instrumental quantile regressions between 10% and 90%, weighting them by the inverse of
their standard errors. We obtain a significant public sector premium of
30.5% (standard error 6.3%).
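The weighting scheme just described can be written down in a few lines. The following Python sketch is illustrative only: the function name is hypothetical, the numbers are invented for the example and are not estimates from this paper, and the computation of the standard error of the weighted mean is not covered.

```python
def weighted_trimmed_mean(quantiles, estimates, std_errors, lower=0.10, upper=0.90):
    """Average the estimates with quantile index in [lower, upper],
    weighting each by the inverse of its standard error."""
    num = 0.0
    den = 0.0
    for q, est, se in zip(quantiles, estimates, std_errors):
        if lower <= q <= upper:
            num += est / se
            den += 1.0 / se
    return num / den

# Hypothetical numbers, for illustration only.
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]
estimates = [1.20, 0.60, 0.30, 0.00, -0.90]
std_errors = [0.80, 0.20, 0.10, 0.20, 0.80]
result = weighted_trimmed_mean(quantiles, estimates, std_errors)
print(result)
```

Trimming discards the imprecise extreme quantiles, and the inverse standard error weights give more influence to the quantiles that are estimated more precisely.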
Chernozhukov and Hansen (2004c) propose inference procedures to evaluate the impact of
the treatment on the entire distribution of outcomes. A difficulty when implementing these
tests is that the asymptotic distribution depends on unknown nuisance parameters. This was
called the Durbin problem by Koenker and Xiao (2002). Chernozhukov and Hansen suggest a
bootstrap procedure to compute asymptotically valid critical values for these tests. They
propose a method of score resampling but we prefer to resample the estimates in order to
avoid the difficult estimation of conditional densities. We use the Kolmogorov-Smirnov (KS)
and Smirnov-Cramér-von Mises (SCM) statistics and we estimate Anderson-Darling
weights by subsampling. We estimate the critical values by constructing 1000 replications
with 300 observations and estimate the instrumental quantile regression process on
$\tau \in [0.1, 0.9]$.
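The structure of these process tests can be sketched in Python as follows. This is a simplified illustration and not the procedure of Chernozhukov and Hansen: it assumes that the observed process has already been centered at the null hypothesis and rescaled, that each resampled process has been recentered at the full-sample estimate and rescaled, that the quantile grid is equally spaced, and that no Anderson-Darling weights are applied. All names and numbers are hypothetical.

```python
def ks_statistic(process):
    """Kolmogorov-Smirnov type statistic: largest absolute value over the quantile grid."""
    return max(abs(v) for v in process)

def scm_statistic(process):
    """Cramér-von Mises type statistic: average squared value over the quantile grid."""
    return sum(v * v for v in process) / len(process)

def resampling_p_value(observed_process, resampled_processes, statistic):
    """Share of resampled processes whose statistic is at least the observed one."""
    stat_obs = statistic(observed_process)
    exceed = sum(1 for rep in resampled_processes if statistic(rep) >= stat_obs)
    return exceed / len(resampled_processes)

# Hypothetical processes on a three-point quantile grid.
observed = [3.0, -1.0, 2.0]
resampled = [[1.0, 0.0, -1.0], [0.5, 4.0, 0.0], [-2.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
p_ks = resampling_p_value(observed, resampled, ks_statistic)
p_scm = resampling_p_value(observed, resampled, scm_statistic)
print(p_ks, p_scm)
```

The KS statistic reacts to the single largest deviation along the process, whereas the SCM statistic averages the squared deviations over all quantiles.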
The p-values for 5 hypotheses are given in table 5. As expected, we can reject the hypothesis
that there is no difference between both sectors. The tests also strongly reject the null
hypothesis of a constant effect, which was taken to be the weighted trimmed mean estimated
above. The tests also strongly reject the hypothesis of no endogeneity, confirming the need to
instrument for the sector choice. This confirms the results of figure 5 and shows that there is
positive selection into the private sector. Finally, we reject the hypothesis that the wage
distribution in the private sector dominates the distribution in the public sector but we cannot
reject the opposite.
7.4 Endogenous sector choice with fully interacted covariates

The results of section 7.3 assume that the returns to characteristics are the same in both
sectors. In the exogenous case, it was shown in section 7.1 that this restriction is not satisfied.
Therefore, we apply in this section the estimator proposed in section 3.2. Since there are 2
continuous covariates and 11 dummy variables for only 3125 observations, we refrain from
presenting results using the estimators proposed in section 3.3. The estimator uses the sample
selection procedure of Buchinsky (1998) to estimate the slope coefficients and the
Chernozhukov and Hansen procedure to estimate the constants. Buchinsky (1998) proposes to
use the Ichimura (1993) or the Klein and Spady (1993) semiparametric estimator as a first
step estimator. A critical assumption necessary to obtain consistent estimates is that the error
terms are homoscedastic. Since this assumption may not be satisfied, and in order to impose
the same type of restrictions in both steps of the estimation, we prefer to use smoothed
binary quantile regression. The results were already discussed in section 7.2. In the second
step estimation, the bias term is approximated by a fourth-order series expansion. This
procedure allows us to estimate the slope coefficients. The constants are then estimated with the
Chernozhukov and Hansen estimator. The objective functions for three quantiles are plotted
in figure 6 and we can note that the minima are clear and unique.
The estimation of the whole coefficient vector corrected for endogeneity allows us to estimate
the potential wage distribution in both sectors and to compare these potential
wage distributions with the observed uncorrected wage distributions using the procedure of
section 4. Figure 7 plots the effects of selection on each quantile of both distributions. The
effect is estimated as the difference between the observed wage distribution and the wage
distribution that would prevail without selection. If employees sorted randomly
between the sectors, the difference between both distributions would be zero over the whole
distribution. In reality, we observe that the observed distribution is below the corrected wage
distribution for the public sector, indicating that there is negative selection into the public
sector. The inverse phenomenon is observed in the private sector.
The decomposition of the corrected differential into the effects of characteristics and
coefficients (public sector wage premium) is plotted in figure 8. The effects of characteristics
are positive and amount to about 20% over the whole distribution. This result is very similar
to the decomposition assuming exogenous sector choice in figure 3. The public sector wage
premium is positive over the whole distribution and decreases slightly when we move up the
wage distribution. Thus, the pattern of the public sector wage premium is very similar to
the results assuming the same return to human capital in both sectors. However, this does not
mean that the dummy variable model is correctly specified. In the fourth column of table 3,
the quantile treatment effects are given for three different quantiles and for the 5 reference
individuals defined in section 7.1. We observe very different public sector premia for the
different employees. Education is more rewarded in the private sector and the return to
experience (in fact age, since we have only potential experience) is higher in the public sector.
Moreover, part-time work is much less penalized by the public employer.
We have not plotted confidence intervals in figure 8 because they are very wide and
provide no real information (standard errors are about 30%). This was expected since the
assumptions are less restrictive and we try to estimate twice as many parameters as above.
These new results, even though they are not significant, go in the same direction as those of
the more restrictive model and thus provide a kind of robustness test. By considering the whole
process and applying the same testing procedure as above, we obtain some significant results. If
we consider the Kolmogorov-Smirnov statistic, which seems to be more powerful, the results of
table 7 indicate that we can reject the hypothesis that the public sector wage premium is zero,
constant or exogenous. Thus, we conclude that it is important to control for endogeneity of the
sector choice and to consider the whole conditional distribution, since the
premium is not the same at different points of the distribution. The results of table 3 indicate
that it seems to be important to allow for different slopes between the public and the private
sector, but the increase in the variance makes a definitive conclusion difficult.
8 Summary and conclusion

In this paper, we apply the instrumental quantile regression estimator of Chernozhukov and
Hansen (2004a, 2004b, 2004c and 2005) to data from the German Socio Economic Panel to
examine the wage structure in the public and private sector in West Germany. After reviewing
the literature on endogeneity in the quantile regression model, we carefully motivate the
choice of the estimator and discuss the assumptions. Since the original estimator proposed by
Chernozhukov and Hansen does not allow for interaction terms between the endogenous
variable and the covariates if we do not have additional instruments, we propose an extension
of the model to allow for this possibility.
The results assuming exogenous sector choice give a negative mean public sector wage
premium and show that the wage distribution is more compressed in the public sector. The
rich data set gives us sensible instruments related to the occupational status of the employees’
parents. In particular, we know whether the father of each employee was a civil servant at the
time the employee was 16 years old. By applying the instrumental quantile regression estimator we
can correct for endogenous sector choice and provide a full characterization of the effect of
the public sector status on the distribution of wages. Correcting for endogeneity reverses the
findings concerning the mean premium but preserves the more compressed structure of the
public sector earnings distribution. The Roy model (Roy 1951 and Heckman and Honore
1990) predicts that individuals will be positively selected towards the sector with higher wage
inequality. This prediction is in fact confirmed by the data since we find positive selection
into the private sector. Allowing for different returns to human capital in the two sectors, we
find that the public sector also reduces the between-group inequality by giving smaller returns
to education.
References
Abadie A (1995) Changes in Spanish labor income structure during the 1980s: a quantile
regression approach. CEMFI Working Paper No. 9521
Abadie A, Angrist J, Imbens G (2002) Instrumental Variables Estimates of the Effect of
Subsidized Training on the Quantiles of Trainee Earnings. Econometrica 70: 91-117
Amemiya T (1982) Two Stage Least Absolute Deviations Estimators. Econometrica 50: 689-
711
Andrews DWK, Schafgans M (1996) Semiparametric estimation of the intercept of a sample
selection model. Review of Economic Studies 65:497-517
Bender KA (1998) The central government-private sector wage differential. Journal of
Economic Surveys 12: 177-220
Blackaby DH, Murphy PD, O’Leary NC (1999) The payment of public sector workers in the
UK: reconciliation with North-American findings. Economics Letters 65:239-243
Blinder A (1973) Wage discrimination: reduced form and structural estimates. Journal of
Human Resources 8: 436-455
Borjas GJ (1980) Wage determination in the federal government: the role of constituents and
bureaucrats. Journal of Political Economy 88:1110-1147
Buchinsky M (1991) The Theory and Practice of Quantile Regression. Ph. D. Dissertation,
Harvard University
Buchinsky M (1998) The dynamics of changes in female wage distribution in the US: a
quantile regression approach. Journal of Applied Econometrics 13:1-30
Buchinsky M (2001) Quantile regression with sample selection: estimating women’s return to
education in the US. Empirical Economics 26: 87-113
Chamberlain G (1994) Quantile regression, censoring and the structure of wages. In: Sims C
(ed) Advances in Econometrics. Elsevier, New York, pp. 171-209
Chaudhuri P (1991) Nonparametric estimates of regression quantiles and their local Bahadur
representation. The Annals of Statistics 19: 760-777
Chaudhuri P, Doksum K, Samarov A (1997) On average derivative quantile regression. The
Annals of Statistics 25:715-744
Chen L, Portnoy S (1996) Two Staged Regression Quantiles and Two-Staged Trimmed Least-
Squares Estimators for Structural Equation Model. Communication in Statistics: Theory
and Methods 25: 1005-1032
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the
criterion function is not smooth. Econometrica 71:1591-1608
Chernozhukov V (2002) Inference on the quantile regression process: an alternative. MIT
Working Paper
Chernozhukov V, Hansen C (2004a) Instrumental quantile regression. MIT Working Paper
Chernozhukov V, Hansen C (2004b) The impact of 401K participation on savings: an IV-QR
analysis. Review of Economics and Statistics 86:735-751
Chernozhukov V, Hansen C (2004c) Inference on the instrumental quantile regression process
for structural and treatment effect models. Journal of Econometrics forthcoming
Chernozhukov V, Hansen C (2005) An IV Model of Quantile Treatment Effects.
Econometrica 73: 245-262
Chesher A (2003) Identification in nonseparable Models. Econometrica 71: 1405-1441
Doksum K (1974) Empirical probability plots and statistical inference for nonlinear models in
the two-sample case. Annals of Statistics 2:267-277
Dustmann C, Van Soest A (1997) Wage Structures in the Private and Public Sectors in West
Germany. Fiscal Studies 18:225-247
Dustmann C, Van Soest A (1998) Public and private sector wages of male workers in
Germany. European Economic Review 42:1417-41
Ehrenberg RG, Schwarz JL (1986) Public Sector Labor Markets. In: Ashenfelter O, Layard R
(eds.) Handbook of Labor Economics, Volume 2. Elsevier Science Publishers,
Amsterdam, pp. 1219-1268
Fan J, Hu TC, Truong YK (1994) Robust Nonparametric Function Estimation. Scandinavian
Journal of Statistics 21: 433-446
Federal Statistical Office (2003) Statistical Yearbook 2003 for the Federal Republic of
Germany. Federal Statistical Office, Wiesbaden
Fréchet M (1951) Sur les tableaux de corrélation dont les marges sont données. Annales de
l’Université de Lyon 14:53-77
Goddeeris JH (1988) Compensating differentials and self-selection: an application to
lawyers. Journal of Political Economy 96:411-428
Gregory RG, Borland J (1999) Public Sector Labor Markets. In: Ashenfelter O, Card D (eds.)
Handbook of Labor Economics, Volume 3c. Elsevier Science Publishers, Amsterdam,
pp. 3573-3630
Gyourko J, Tracy J (1988) An analysis of public- and private- sector wages allowing for
endogenous choices of both government and union status. Journal of Labor Economics
6:229-253
Heckman J (1990) Varieties of sample selection bias. American Economic Review 80:313-
318
Heckman J, Honore B (1990) The empirical content of the Roy model. Econometrica
58:1121-1149
Heckman J, Smith J, Clements N (1997) Making the most out of programme evaluations and
social experiments: accounting for heterogeneity in programme impacts. Review of
Economic Studies 64:487-535
Holmlund B (1993) Wage setting in private and public sectors in a model with endogenous
government behavior. European Journal of Political Economy 9: 149-162
Hong H, Tamer E (2003) Inference in censored models with endogenous regressors.
Econometrica 71:905-932
Honore B, Hu L (2004) On the performance of some robust instrumental variables estimator.
Journal of Business and Economic Statistics 22:30-39
Horowitz JL (1992) A smoothed maximum score estimator for the binary response model.
Econometrica 60: 505-531
Horowitz JL (1993) Semiparametric estimation of a work-trip mode choice model. Journal of
Econometrics 58: 49-70
Horowitz JL (2002) Bootstrap critical values for tests based on the smoothed maximum score
estimator. Journal of Econometrics 111: 141-167
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-
index models. Journal of Econometrics 58:71-120
Jürges H (2002) The distribution of the German public-private wage gap. Labour 16:347-381
Klein R, Spady R (1993) An efficient semiparametric estimator of the binary response model.
Econometrica 61:387-421
Koenker R, Bassett G (1978) Regression Quantiles. Econometrica 46:33-50
Koenker R, Bilias Y (2001) Quantile regression for duration data: a reappraisal of the
Pennsylvania reemployment bonus experiments. Empirical Economics 26:199-220
Koenker R, Geling O (2001) Reappraising medfly longevity: a quantile regression survival
analysis. Journal of the American Statistical Association 96:458-468
Koenker R, Machado JAF (1999) Goodness of Fit and Related Inference Process for
Quantile Regression. Journal of the American Statistical Association 94:1296-1310
Koenker R, D’Orey V (1987) Computing Regression Quantiles. Applied Statistics 36: 383-
393
Koenker R, Portnoy S (1987) L-estimation for linear models. Journal of the American
Statistical Association 82:851-857
Koenker R, Xiao Z (2002) Inference on the quantile regression process. Econometrica 70:
1583-1612
Kordas (2000) Smoothed binary regression quantiles. Working paper
Kordas (2005) Smoothed binary regression quantiles. Journal of Applied Econometrics,
forthcoming
Lehmann E (1975) Nonparametrics: statistical methods based on ranks. Holden-Day, San
Francisco
Lucifora C, Meurs D (2004) The public sector gap in France, Great Britain and Italy. IZA
discussion paper No. 1041
Ma L, Koenker R (2004) Quantile regression methods for recursive structural equation
models. University of Illinois at Urbana Champaign working paper
Manski CF (1975) Maximum score estimation of the stochastic utility model of choice.
Journal of Econometrics 3: 205-228
Manski CF (1985) Semiparametric analysis of discrete response: asymptotic properties of the
maximum score estimator. Journal of Econometrics 32: 65-108
Melly B (2004a) Public-private sector wage differentials in Germany: evidence from quantile
regression. Empirical Economics forthcoming
Melly B (2004b) Decomposition of differences in distribution using quantile regression.
Unpublished working paper, downloadable from www.siaw.unisg.ch/lechner/melly
Mueller R (1998) Public-private sector wage differentials in Canada: evidence from quantile
regressions. Economics Letters 60:229-235
Oaxaca R (1973) Male-female wage differentials in urban labor markets. International
Economic Review 14:693-709
Portnoy S, Koenker R (1997) The Gaussian Hare and the Laplacian Tortoise: Computability
of Squared-Error versus Absolute-Error Estimators. Statistical Science 12: 279-300
Poterba J, Rueben K (1995) The Distribution of Public Sector Wage Premia: New Evidence
Using Quantile Regression Methods. NBER Working Paper No. 4734
Powell JL (1983) The Asymptotic Normality of Two-Stage Least Absolute Deviations
Estimators. Econometrica 51: 1569-1575
Reder M (1975) The theory of employment and wages in the public sector. In: Hamermesh D
(ed.) Labor in the Public and Nonprofit Sectors. Princeton University Press,
Princeton, pp. 1-48
Roy AD (1951) Some thoughts on the distribution of earnings. Oxford Economic Papers
3:135-146
Silverman BW (1986) Density estimation. Chapman and Hall, London
Smith S (1976) Pay differentials between federal government and private sectors workers.
Industrial and Labor Relations Review 29:233-257
Smith S (1977) Equal Pay in the Public Sector: Fact or Fantasy. Industrial Relations Section,
Princeton
SOEP Group (2001) The German Socio-Economic Panel (GSOEP) after more than 15 years -
Overview. In: Holst E, Lillard DR, DiPrete TA (eds.) Proceedings of the 2000 Fourth
International Conference of German Socio-Economic Panel Study Users (GSOEP2000),
Vierteljahrshefte zur Wirtschaftsforschung (Quarterly Journal of Economic Research),
70:1, pp. 7-14
Vytlacil EJ (2002) Independence, monotonicity, and latent index models: an equivalence
result. Econometrica 70:331-341
Welsh AH (1996) Robust Estimation of smooth regression and spread functions and their
derivatives. Statistica Sinica 6: 347-366
Table 1: Definition of variables
Variable     Description
Wage         Gross hourly earnings from employment. Gross hourly wages are derived by dividing gross monthly earnings by monthly actual hours worked.
Ln(wage)     The natural logarithm of wage.
Expr         Number of years of potential work experience the individual has accumulated. It is measured by min(age - schooling - 6, age - 18).
Tenure       Number of years with current employer.
Part-time    Dummy; 1 if the individual is part-time or marginally employed.
Ed level     Ordered variable on education:
Ed level 1   Dummy; 1 if no degree or basic or intermediate schooling with no training.
Ed level 2   Dummy; 1 if basic schooling with apprenticeship.
Ed level 3   Dummy; 1 if intermediate schooling with apprenticeship.
Ed level 4   Dummy; 1 if high school (Abitur or Fachabitur) with no training or with apprenticeship.
Ed level 5   Dummy; 1 if high school with technical school or polytechnic.
Ed level 6   Dummy; 1 if university.
Psect        Dummy; 1 if employed in the public sector.
Fs1          Dummy; 1 if father basic schooling (Hauptschule).
Fs2          Dummy; 1 if father secondary school (Realschule).
Fs3          Dummy; 1 if father high school (Abitur).
Ft1          Dummy; 1 if father apprenticeship (Lehre).
Ft2          Dummy; 1 if father college (Hochschule or Universität).
Fcivil       Dummy; 1 if father civil servant at the time the respondent was 16 years old.
Fblue        Dummy; 1 if father blue collar at the time the respondent was 16 years old.
Fself        Dummy; 1 if father self employed at the time the respondent was 16 years old.
Fwhite       Dummy; 1 if father white collar at the time the respondent was 16 years old.
Mnwork       Dummy; 1 if mother did not work at the time the respondent was 16 years old.
Table 2: Descriptive statistics, means
Variable                 All      Public sector   Private sector
Ln(wage)                 2.693    2.745           2.677
Expr                     22.09    24.15           21.47
Tenure                   12.18    16.30           10.96
Part-time                8.1%     7.1%            8.4%
Education:
Ed level 1               9.7%     5.7%            10.9%
Ed level 2               30.1%    22.3%           32.4%
Ed level 3               24.7%    26.4%           24.2%
Ed level 4               8.4%     9.3%            8.1%
Ed level 5               11.9%    13.5%           11.4%
Ed level 6               15.3%    22.7%           13%
Fs1                      64.7%    63.7%           65%
Fs2                      12.5%    13%             12.4%
Fs3                      23.7%    26.1%           23%
Ft1                      37.7%    37.2%           37.8%
Ft2                      6.2%     6.7%            6%
Fcivil                   10.3%    16.2%           8.6%
Fblue                    39.9%    35%             41.4%
Fself                    12%      12.6%           11.8%
Fwhite                   21.6%    23.2%           21.2%
Mnwork                   21.2%    25.2%           20%
Number of observations   3125     717             2408
Table 3: Estimated wage differentials for five reference individuals
                   Exogenous   Exogenous    Endogenous   Endogenous
                   dummy       interacted   dummy        interacted
20% quantile       -2.86%                   44%
  Individual 1                 4.23%                     49.82%
  Individual 2                 -18.33%                   14.86%
  Individual 3                 -13.09%                   51.12%
  Individual 4                 -32.84%                   15.29%
  Individual 5                 48.88%                    95.74%
50% quantile       -11.55%                  37%
  Individual 1                 -7.52%                    40.03%
  Individual 2                 -13.72%                   28.68%
  Individual 3                 -23.92%                   29.43%
  Individual 4                 -28.85%                   19.73%
  Individual 5                 43.99%                    93.99%
80% quantile       -16.98%                  -15%
  Individual 1                 -14.49%                   -3.02%
  Individual 2                 -4.75%                    -12.51%
  Individual 3                 -27.34%                   14.26%
  Individual 4                 -26.25%                   -13.60%
  Individual 5                 37.18%                    57.15%
Note: reference individuals are defined in the text. The estimates of the dummy variable models are reported once per quantile.
Table 4: Estimation of the selection equation, dependent variable: psect
Logit Smoothed binary quantile
Coefficient Std. error Coefficient Std. error
Constant -46.422*** 5.222 -29.334*** 5.896
Expr 0.261 0.348 0.209 0.524
Expr^2 -0.009 0.007 -0.011 0.015
Tenure 1*** 0.101 1***
Part 5.595 3.467 4.060 3.966
Ed level 2 -0.653 3.598 -0.738 3.359
Ed level 3 9.332*** 3.567 9.263*** 3.490
Ed level 4 13.669*** 4.049 12.327*** 4.081
Ed level 5 10.797*** 3.899 11.278** 4.155
Ed level 6 19.403*** 3.778 19.469*** 4.170
Fs1 -1.869 2.801 -2.688 2.537
Fs2 1.786 3.473 0.248 3.729
Fs3 -4.980 3.743 -6.216* 3.683
Ft1 2.028 1.944 1.589 2.008
Ft2 0.874 3.750 0.079 4.360
Fcivil 10.801*** 3.027 13.872*** 3.357
Fblue 1.279 2.516 1.940 2.588
Fself 2.370 3.090 3.046 3.109
Fwhite 3.517 2.710 3.613 2.901
Mnwork 4.021* 2.136 7.040*** 2.719
Note: *: significant at the 10% level, **: significant at the 5% level, ***: significant at the 1% level.
Table 5: Tests on the instrumental quantile regression process
Null hypothesis                                        P-value with KS statistic   P-value with SCM statistic
No effect: $\alpha(\cdot) = 0$                         <0.1%                       <0.1%
Constant effect: $\alpha(\cdot) = \alpha$              <0.1%                       <0.1%
Exogeneity: $\alpha(\cdot) = \alpha^{QR}(\cdot)$       <0.1%                       <0.1%
Dominance: $\alpha(\cdot) \geq 0$                      6.5%                        92.7%
Dominance: $\alpha(\cdot) \leq 0$                      <0.1%                       <0.1%
Note: The statistics and critical values are computed using the method of Chernozhukov and Hansen (2002).
1000 replications with 300 observations without replacement were constructed. KS: Kolmogorov-Smirnov
statistic; SCM: Smirnov-Cramér-von Mises statistic.
Table 6: Median regression with different estimators
Variable   RQ   IRQ   RQ public   RQ private   SIRQ public   SIRQ private
Constant 1.87** 1.905** 1.811** 1.905** 2.224** 1.894**
(0.05) (0.05) (0.128) (0.054) (0.175) (0.079)
Expr 0.042** 0.040** 0.047** 0.040** 0.045** 0.039**
(0.003) (0.004) (0.007) (0.004) (0.009) (0.005)
Expr^2 -7e-4** -7e-4** -7e-4** -7e-4** -6e-4** -6e-4**
(7e-5) (7e-5) (1e-4) (9e-5) (1.8e-4) (1.2e-4)
Tenure 8e-3** 3e-3* 2e-3 9e-3** -8e-4 4e-3
(9e-4) (1.5e-3) (2e-3) (1e-3) (3e-3) (2e-3)
Part -0.582** -0.619** -0.191* -0.706** -0.182 -0.721**
(0.05) (0.052) (0.089) (0.064) (0.128) (0.064)
Ed level 2 0.109** 0.119** 0.045 0.086* 0.062 0.078
(0.035) (0.034) (0.072) (0.035) (0.136) (0.043)
Ed level 3 0.219** 0.187** 0.164* 0.213** 0.139 0.142**
(0.032) (0.034) (0.068) (0.035) (0.139) (0.046)
Ed level 4 0.316** 0.245** 0.234** 0.344** 0.185 0.255**
(0.038) (0.043) (0.090) (0.045) (0.150) (0.059)
Ed level 5 0.504** 0.483** 0.389** 0.509** 0.349* 0.436**
(0.036) (0.04) (0.077) (0.039) (0.141) (0.051)
Ed level 6 0.587** 0.510** 0.445** 0.597** 0.401** 0.491*
(0.039) (0.045) (0.075) (0.041) (0.147) (0.062)
Fs1 0.013 4e-3 -0.015 0.029 -0.024 0.019
(0.025) (0.026) (0.054) (0.028) (0.063) (0.032)
Fs2 -0.060* -0.028 -0.089 -0.023 -0.071 -0.025
(0.029) (0.038) (0.058) (0.041) (0.073) (0.047)
Fs3 0.088** 0.062 0.079 0.069 0.057 0.079
(0.033) (0.041) (0.077) (0.043) (0.082) (0.050)
Ft1 0.044** 0.021 0.010 0.045* 0.016 0.055*
(0.016) (0.018) (0.026) (0.019) (0.034) (0.022)
Ft2 0.016 8e-3 0.059 0.008 0.057 0.024
(0.032) (0.043) (0.069) (0.045) (0.099) (0.056)
Psect -0.115** 0.33**
(0.016) (0.111)
Note: column 1: quantile regression, column 2: instrumental quantile regression, column 3: quantile regression in
the public sector, column 4: quantile regression in the private sector, column 5: instrumental quantile regression
with sample selection correction in the public sector, column 6: instrumental quantile regression with sample
selection correction in the private sector. *: significant at the 5% level, **: significant at the 1% level. Bootstrap
standard errors are given in parentheses.
Table 7: Tests on the instrumental quantile regression process with sample selection
Null hypothesis    P-value for KS test    P-value for SCM test
No effect: $\alpha(\cdot) = 0$ and $\beta_1(\cdot) = \beta_0(\cdot)$    2.9%    16.1%
No effect: $\alpha(\cdot) = 0$    <0.1%    13.3%
Constant effect: $\alpha(\cdot) = \alpha$ and $\beta_1(\cdot) - \beta_0(\cdot) = \delta$    1.3%    52.1%
Constant effect: $\alpha(\cdot) = \alpha$    <0.1%    0.6%
Exogeneity: $\alpha_1(\cdot) - \alpha_0(\cdot) = \alpha_1^{QR}(\cdot) - \alpha_0^{QR}(\cdot)$ and $\beta_1(\cdot) - \beta_0(\cdot) = \beta_1^{QR}(\cdot) - \beta_0^{QR}(\cdot)$    3.5%    49.7%
Exogeneity: $\alpha_1(\cdot) - \alpha_0(\cdot) = \alpha_1^{QR}(\cdot) - \alpha_0^{QR}(\cdot)$    0.1%    2.4%
Note: The statistics and critical values are computed using the method of Chernozhukov and Hansen (2002).
1000 subsamples of 300 observations each were drawn without replacement. KS: Kolmogorov-Smirnov
statistic; SCM: Smirnov-Cramér-von Mises statistic.
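The resampling scheme described in the note can be illustrated with a deliberately simplified sketch. It is not the Chernozhukov and Hansen procedure itself: instead of the instrumental quantile regression process it uses the raw gap between the sample quantiles of two groups, and the function names (`quantile_gap`, `ks_cm`, `subsampling_pvalues`), the recentring at the full-sample estimate and the synthetic data are all assumptions made for illustration. Only the structure carries over: both statistics are computed on the full sample, recomputed on 1000 subsamples of 300 observations drawn without replacement, and the p-value is the share of subsample statistics at least as large as the full-sample one.

```python
import numpy as np

def quantile_gap(y, d, taus):
    """Gap between group-1 and group-0 sample quantiles at each tau."""
    return np.quantile(y[d == 1], taus) - np.quantile(y[d == 0], taus)

def ks_cm(process, scale, taus):
    """Sup-type (KS) and integrated-square (SCM-type) statistics."""
    scaled = np.sqrt(scale) * process
    ks = np.max(np.abs(scaled))
    # Riemann approximation of the integral of the squared process over tau
    cm = np.mean(scaled ** 2) * (taus[-1] - taus[0])
    return ks, cm

def subsampling_pvalues(y, d, taus, b=300, reps=1000, seed=0):
    """P-values for the null of a zero gap at every tau (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = y.size
    full = quantile_gap(y, d, taus)
    ks_n, cm_n = ks_cm(full, n, taus)
    draws = np.empty((reps, 2))
    for r in range(reps):
        idx = rng.choice(n, size=b, replace=False)  # without replacement
        sub = quantile_gap(y[idx], d[idx], taus)
        draws[r] = ks_cm(sub - full, b, taus)        # recentred statistic
    p_ks = np.mean(draws[:, 0] >= ks_n)
    p_cm = np.mean(draws[:, 1] >= cm_n)
    return p_ks, p_cm

# Synthetic data: group 1 earns 0.5 log points more at every quantile.
rng = np.random.default_rng(42)
n = 4000
d = rng.integers(0, 2, size=n)
y = rng.normal(3.0, 1.0, size=n) + 0.5 * d
taus = np.linspace(0.1, 0.9, 17)
p_ks, p_cm = subsampling_pvalues(y, d, taus)
```

With a gap this large both p-values should be close to zero. As in the table, the two statistics need not agree in less clear-cut cases, since the sup-type statistic reacts to a deviation at a single quantile while the integrated statistic averages over the whole range.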
Figure 1: Kernel density estimates of the wage distributions
[Figure: two density curves (private sector, public sector). Horizontal axis: log gross hourly wages, 1 to 5. Vertical axis: 0.0 to 1.2.]
Note: Density functions estimated using an Epanechnikov kernel estimator, with bandwidths chosen according to
Silverman’s rule of thumb.
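A minimal sketch of the estimator named in the note, run on synthetic data. The note does not state which constant is used in the rule of thumb; the sketch assumes the common form h = 0.9 min(sd, IQR/1.34) n^(-1/5), and the function names `silverman_bandwidth` and `epanechnikov_kde` are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth (assumed form, see the lead-in)."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * x.size ** (-0.2)

def epanechnikov_kde(x, grid, h=None):
    """Kernel density estimate on `grid` with the Epanechnikov kernel."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if h is None:
        h = silverman_bandwidth(x)
    u = (grid[:, None] - x[None, :]) / h
    # K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return k.mean(axis=1) / h

# Synthetic log wages, not the GSOEP data used in the paper.
rng = np.random.default_rng(0)
log_wages = rng.normal(3.0, 0.4, size=2000)
grid = np.linspace(1.0, 5.0, 401)
density = epanechnikov_kde(log_wages, grid)
```

Estimating the two sectors separately, each with its own bandwidth, would give the two curves of the figure.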
Figure 2: Public sector wage “premium” at different quantiles
[Figure: horizontal axis: quantile, 0.0 to 1.0. Vertical axis: -0.2 to 0.1.]
Note: coefficient on the public sector dummy variable estimated by traditional quantile regression with a 95%
bootstrap confidence interval.
Figure 3: Decomposition of the public-private sector wage differential at different quantiles
[Figure: three series (total differential, coefficients, characteristics). Horizontal axis: quantile, 0.0 to 1.0. Vertical axis: -0.2 to 0.4.]
Note: 95% bootstrap confidence intervals are delimited by the lines.
Figure 4: Objective functions of three instrumental quantile regressions
[Figure: three panels (θ = 0.2, θ = 0.5, θ = 0.8), each plotting the objective function against α. Horizontal axis: α, from -0.5 to 1.5 in the first panel and from -1.0 to 1.0 in the other two.]
Note: the horizontal line is the 95% critical value from a $\chi^2_5$ distribution.
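The role of the horizontal line can be sketched as follows: the values of α at which the objective function lies below the critical value form a 95% confidence region for α(θ). The objective values below are a hypothetical quadratic, not the paper's estimates, and the name `confidence_set` is an assumption; the critical value 11.0705 is the 95% quantile of a chi-squared distribution with 5 degrees of freedom.

```python
import numpy as np

CHI2_5_CRIT_95 = 11.0705  # 95% quantile of chi-squared with 5 degrees of freedom

def confidence_set(alpha_grid, objective_values, critical_value=CHI2_5_CRIT_95):
    """Grid points whose objective value does not exceed the critical value."""
    alpha_grid = np.asarray(alpha_grid, dtype=float)
    objective_values = np.asarray(objective_values, dtype=float)
    return alpha_grid[objective_values <= critical_value]

# Hypothetical objective function evaluated on a grid of alpha values.
alpha_grid = np.linspace(-1.0, 1.0, 201)
objective = 5.0 + 60.0 * (alpha_grid - 0.3) ** 2

point_estimate = alpha_grid[np.argmin(objective)]  # minimiser of the objective
accepted = confidence_set(alpha_grid, objective)   # roughly -0.01 to 0.61 here
```

Because the region is read off the objective function directly, it need not be symmetric around the point estimate, and it can be disconnected or empty if the objective function is irregular.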
Figure 5: Public sector wage premium using instrumental quantile regression
[Figure: horizontal axis: quantile, 0.0 to 1.0. Vertical axis: -2 to 2.]
Note: coefficient on the public sector dummy variable estimated by instrumental quantile regression with a 95%
m out of n bootstrap confidence interval.
Figure 6: Objective functions of three instrumental quantile regressions with sample selection
[Figure: three panels (θ = 0.2, θ = 0.5, θ = 0.8), each plotting the objective function against α. Horizontal axis: α, -1.0 to 1.0 in each panel.]
Note: the horizontal line is the 95% critical value from a $\chi^2_5$ distribution.
Figure 7: Effect of sample selection on the wage distributions
[Figure: two series (public sector, private sector). Horizontal axis: quantile, 0.0 to 1.0. Vertical axis: -0.3 to 0.3.]
Note: Difference between the observed quantiles of the wage distributions and the simulated quantiles of the
corrected wage distributions that would prevail without endogenous sector choice. A positive difference is the
consequence of positive selection into this sector and a negative difference the consequence of negative selection
into this sector.
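The quantity plotted here is a difference between two sets of quantiles, which can be sketched in a few lines. The sketch assumes that a sample from the corrected distribution is already available (in the paper it is simulated from the estimated model, a step not reproduced here); the name `selection_effect` and the synthetic data are hypothetical.

```python
import numpy as np

def selection_effect(observed, corrected, taus):
    """Observed minus corrected quantiles at each tau."""
    return np.quantile(observed, taus) - np.quantile(corrected, taus)

# Synthetic example: the observed wages lie 0.1 log points above the
# corrected ones at every quantile, which mimics positive selection.
rng = np.random.default_rng(1)
corrected = rng.normal(3.0, 0.4, size=5000)
observed = corrected + 0.1
taus = np.linspace(0.05, 0.95, 19)
effect = selection_effect(observed, corrected, taus)
```

A selection effect that varies with the quantile, as in the figure, would show up as a non-constant `effect` vector.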
Figure 8: Decomposition of the public-private sector wage differential correcting for endogeneity
[Figure: four series (uncorrected differential, corrected differential, coefficients, characteristics). Horizontal axis: quantile, 0.0 to 1.0. Vertical axis: 0.0 to 0.8.]
Note: The uncorrected differential is the observed differential between the quantiles of the public sector wage
distribution and the quantiles of the private sector wage distribution. The corrected differential is the differential
that would prevail if we corrected for endogenous sector choice. The effect of characteristics is the effect
of the different distributions of characteristics. The effect of coefficients can be interpreted as discrimination.