Knaus m23805 PDF

Uncovering Treatment Effect Heterogeneity in
Swiss Job Search Programs

Michael Knaus, Michael Lechner+ , and Anthony Strittmatter *
First draft: April 2017

Date this version has been printed: 15 April 2017
Preliminary and incomplete. Please do not circulate or quote without the permission of one of
the authors
Comments are very welcome
Abstract: We systematically investigate the effect heterogeneity of Job Search Programmes for
unemployed, which are part of the Swiss Active Labour Market Policy. The analysis is based on rich
data coming from the Swiss social insurance system. We combine recently developed Lasso-type
estimators from machine learning to detect such heterogeneities with inverse probability weighting to
correct for selective participation in the programme. We find that while in the medium run-effect there
appear to be no systematic heterogeneities, the so-called lock-in effect (close to the beginning of the
programme) shows considerable heterogeneities. In line with previous results in the literature,
unemployed with a priori bad chances (bad employment risks) in the labour market suffer much less
from a lock-in effect than unemployed expected to find jobs soon even without a programme.
Keywords: Active labour market policy, LASSO, machine learning, effect heterogeneity
JEL classification: I12, I18, J24, L83, C21.
Addresses for correspondence: Michael Knaus, Swiss Institute for Empirical Economic Research (SEW),
University of St. Gallen, Varnbelstrasse 14, CH-9000 St. Gallen, Switzerland, Michael.Knaus@unisg.ch.
Michael Lechner, Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen,
Varnbelstrasse 14, CH-9000 St. Gallen, Switzerland, Michael.Lechner@unisg.ch, www.michael-lechner.eu .
Anthony Strittmatter, Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen,
Varnbelstrasse 14, CH-9000 St. Gallen, Switzerland, Anthony.Strittmatter@unisg.ch,
www.anthonystrittmatter.com.
+
Michael Lechner is also affiliated with CEPR and PSI, London, CESIfo, Munich, IAB, Nuremberg, and IZA, Bonn.
*
A previous version of the paper was presented at the University of Maastricht, Department of Economics. We thank
participant for helpful comments and suggestions. The usual disclaimer applies.
1 Introduction
In this study, we employ machine learning methods for a large-scale principled investigation of
effect heterogeneity in Job Search Programmes (JSPs) in Switzerland. Programme evaluation
studies widely acknowledge the possibility of effect heterogeneity for different groups of
individuals. Stratifying the data in mutually exclusive groups or including interactions in a
regression framework are two baseline approaches to investigate effect heterogeneity (see
Athey and Imbens, 2016a, for a review). However, for large-scale investigations of effect
heterogeneity, the standard p-values of classical (single) hypothesis tests are no longer valid
because of the multiple testing problem. The multiple testing problem can lead to the reporting
of spurious heterogeneous effects that result from false positives (see, e.g., Lan et al., 2016,
List, Shaikh, and Xu, 2016).
The disadvantages of ex post data mining have been recognized in the evaluation
literature. For example, in randomized experiments, researchers may be required to define their
heterogeneity analysis plan prior to the experiment to avoid reporting (and searching for)
significant effects only (e.g., Casey, Glennerster, and Miguel, 2012, Olken, 2015). However,
these pre-analysis plans are inflexible and usually not required for observational studies. An
alternative approach that partly alleviates the ex post selection problem is to report effect
heterogeneity for all possible groups. For large-scale investigations, an approach taking account
of all possible differences might lead to very small groups and thus imprecise estimates, and
make it also difficult to report the results in an intuitive way.
A developing literature proposes to use (adapted) machine learning algorithms to
systematically search for groups with heterogeneous effects (see, e.g., the review of Athey and
Imbens, 2016b). Potentially, machine learning approaches are attractive because they could
provide a principled approach to heterogeneity detection and avoid the multiple testing
problem. In addition, they are flexible and remain computationally feasible, even when the
1
covariate space becomes high-dimensional and possibly exceeds the sample size. Such a
systematic search for heterogeneity may uncover substantial effect heterogeneity between a
priori unexpected groups.
Machine learning methods are powerful tools for out-of-sample predictions of
observable variables. However, the fundamental problem of causal analyses is the inability to
observe individual causal effects (see Angrist and Pischke, 2010, Imbens and Wooldridge,
2009, among many others). We are able to observe individual outcome levels under specific
treatment conditions, but the counterfactual outcome is unobservable. Recently, several
methods have been proposed to apply machine learning methods in ways which overcome the
fundamental problem of causal analyses (see, e.g., Athey and Imbens, 2016c, Foster, Taylor,
and Ruberg, 2011, Green and Kern, 2012, Grimmer, Messing, and Westwood, 2016, Imai and
Strauss, 2011, Taddy et al., 2015, Vansteelandt et al., 2008). 1 Two promising approaches are
the Modified Outcome Method (MOM, Signorovitch, 2007) and the Modified Covariate
Method (MCM, Tian et al., 2014). The modifications of both methods allow to approximate
group specific treatment effects in a first step. 2 Even though these group effects are (usually)
not well identified and estimated because they may relate to a high dimensional covariate space,
their approximations can be used to apply systematic group search algorithms in a second step.
However, the robustness of the estimation results across different methods and model
specifications and the efficiency of different estimation procedures are mostly unexplored.
1
Qian and Murphy (2011), Xu et al. (2015), and Zhao et al. (2012) study individualized treatment rules, which directly focus
on the decision rule instead of estimating the effect heterogeneity. Ciarleglio et al. (2015) propose a method to select the
optimal treatment conditional on observed individual characteristics. Zhao et al. (2015) investigate the optimal dynamic
order of sequential treatments.
2
While MOM approximates group specific treatment effects with the help of an outcome modification, the MCM modifies
the covariates. The MOM can be estimated fully non-parametrically (e.g., using classification tree or random forest
estimators, see, e.g., Athey and Imbens, 2016c). The MCM requires a parametric model specification, but the model can be
very flexible and efficiency improving estimation algorithms are available.
2
JSPs provide training in effective job search and application strategies as well as
monitoring of actual applications. An assignment to a JSP may lead to threat effects before the
programme start (see, e.g., Black et al., 2003, van den Berg, Bergemann, and Caliendo, 2008)
and may affects the matching process and quality between the participants and the potential
new job (see, e.g., Blasco and Rosholm, 2011, Cottier et al., 2017). Push effects could occur if
participants accept jobs with low matching quality because of actual or perceived sanctions or
perceived future ALMP assignments. Push effects decrease the unemployment duration, but
may reduce the employment stability. On the other hand, JSP participation could improve the
visibility of suitable job vacancies and the efficiency of the application process, which may
improve the employment stability. Furthermore, many studies are concerned about the
crowding-out of non-participants (see, e.g., Blundell et al., 2004, Crpon et al., 2013, Gautier
et al., 2017).
The empirical evidence about the effectiveness of JSPs is mixed. The review studies of
Card, Kluve, and Weber (2010, 2015) and Crpon and van den Berg (2016) document a weak
tendency towards positive effects of JSPs, especially in the short-run. 3 For the Swiss JSP we
investigate, the literature finds negative employment effects, which fade one year after the start
of participation (see Gerfin and Lechner, 2002, Lalive, van Ours, and Zweimller, 2008). One
reason for the ambiguous effectiveness of JSPs might be different relative intensities of jobs
search training and monitoring. Van den Berg and van der Klaauw (2006) are concerned that
intensive monitoring reduces informal job search, which might be a more efficient strategy than
formal job search for some unemployed persons. They suggest formal job search is more
effective for individuals with little labour market opportunities. Consistent with their
3
For the US, Meyer (1995) reports negative effects on unemployment benefit payments and positive earnings effects of JSPs.
For Denmark, Graversen and van Ours (2008) and Rosholm (2008) report positive effects of JSPs on the unemployment
exit rate. For Germany, Wunsch and Lechner (2008) find JSPs have negative effect during the first two years after program
begin, which fade-out afterwards. They show that training sequences are responsible for long lasting negative lock-in effects.
3
arguments, Card, Kluve, and Weber (2015) document that job search programmes are relatively
more effective for disadvantaged participants. Vikstrm, Rosholm, and Svarer (2013) find
slightly more positive effects of JSP for women and younger participants. Dolton and ONeill
(2002) report negative employment effects of JSP for men and insignificant effects for women
five years after programme start. Surprisingly, the programme evaluation literature is lacking
large-scale evidence about the effect heterogeneity of JSPs.
In this study, we systematically investigate effect heterogeneity of JSPs. Possibly some
groups of unemployed persons benefit more from JSPs than others. We base the search
algorithm on many attributes of the unemployed and their caseworkers (we consider 1,268
different variables which might be predictive for effect heterogeneity). Effect heterogeneity by
caseworker characteristics could reflect different baseline monitoring intensities (Behncke,
Frlich, and Lechner, 2010a, 2010b). Furthermore, we translate discovered heterogeneities into
information useful for decision makers. Knowledge of heterogeneous effects enables decision
makers to improve the allocation rules for JSPs such that their intended effects and cost-benefit
efficiency increase (see discussions in, e.g., Bhattacharya and Dupas, 2012, Dehejia, 2005,
Lechner and Smith, 2007).
In the active labour market programme (ALMP) evaluation literature based on the
informative data sets from administrative registers, it has become common to pursue a
selection-on-observables strategy (see, e.g., Imbens and Wooldridge, 2009). We assume to
control for all confounders that jointly affect the probability to participate in the JSP and the re-
employment probability. This assumption is likely to hold based on the large and rich
administrative data we use (see, e.g., discussion in Biewen et al., 2014, Lechner and Wunsch,
2013). We balance the confounders with an Inverse Probability Weighting (IPW) estimator
(e.g., Hirano, Imbens, and Ridder, 2003). After calculating these IPW weights, we follow Chen
et al.s (2017) general weighting framework for MCM that is applicable in non-experimental
4
empirical designs. 4 We combine the re-weighted MCM with Tibshiranis (1996) Least Absolute
Shrinkage and Selection Operator (LASSO). 5 We achieve valid confidence intervals by splitting
the sample. In particular, we use one half of the sample to select variables that are relevant to
predict the size of the treatment effect and the other sample to estimate the coefficients for the
selected variables. This approach yields unbiased coefficients and honest confidence
intervals, which are valid under an independent sampling argument (see, e.g., Fithian, Sun,
Taylor, 2015, Cox, 1975).
Our results suggest that the different methods for principled effect heterogeneity
investigation provide useful information. The main conclusions remain robust across a variety
of different methods and model specifications. We find substantial effect heterogeneity of Swiss
JSPs during the first year after the start of participation. During this period, we observe for most
unemployed groups negative effects. The heterogeneity is strongly related to the characteristics
of the unemployed persons. Consistent with the previous literature, participants with
disadvantaged labour market characteristics benefit more from JSPs. We find only little
heterogeneity by caseworker characteristics. The effects fade after one year. We do not discover
substantial effect heterogeneity for time periods beyond one year after program begin.
In the next section, we give information about the institutional background. In Section
3, we introduce our data. In Section 4, we discuss our econometric approach. In Section 5, we
report and discuss our results. In Section 6, we conclude. Appendices A to C give additional
detailed information on descriptive statistics, the selection model and additional results.
4
Imai and Ratkovic (2013) and Zhang et al. (2012) develop alternative non-experimental approaches for principled effected
heterogeneity search.
5
Additionally, we consider other LASSO-type estimators (Adaptive LASSO, Post LASSO), different procedures to specify
tuning parameters (Cross-validation) and compare the MCM and MOM. To this end, we investigate the consistency of our
results across different procedures to investigate effect heterogeneity.
5
2 Institutional Background
Switzerland is a federal country with 26 cantons and three major language regions (French,
German, and Italian). It is a relatively wealthy country with approximately 78,000 CHF
(approx. 77,000 US-Dollar) GDP per capita and has a low unemployment rate of 3-4% (SECO,
2017, Bundesamt fr Statistik, 2017). Unemployed persons have to register at the regional
employment agency closest to their home. 6 The employment agency pays income maintenance.
Benefits amount to 70-80% of the former salary depending on age, children, and past salary
(see Behncke, Frlich, and Lechner, 2010a). The maximum benefit entitlement period is 24
months.
The yearly expenditures for Swiss active labour market programs (ALMP) exceed 500
mio. CHF (Morlok et al., 2014). Unemployed persons can participate in a variety of different
ALMP. Gerfin and Lechner (2002) classify these ALMP as (a) training courses, (b)
employment programs, and (c) temporary employment schemes. Training courses include job
search, personality, language, computer, vocational, and employment programs. We focus
exclusively on JSPs in this study, because this the most common ALMP in Switzerland (more
than 50% of the assigned ALMPs are JSPs, Huber, Lechner, and Mellace, 2017) and provides
a sufficient number of observations to search for heterogeneity. JSPs provide training in
effective job search and application strategies (e.g., training in rsum writing). Furthermore,
actual applications are screened and monitored. The JSPs we consider are relatively short, with
an average duration about three weeks. Training takes place in class rooms. The employment
agency covers the costs of training and travel costs. Participants are obliged to continue job
search during the course.
6
At the beginning of the unemployment spell, newly registered unemployed are often sent to a one-day workshop providing
information about the unemployment law, obligations and rights, job search requirements, etc.
6
In Switzerland, the regional employment agencies have a large degree of autonomy,
which is partly related to the countrys federal organisation. The assignment decision to a
training course is made by caseworkers based on their subjective impression combined which
local office policy and national eligibility rules. The national rules are however rather vague.
They imply, for example, that the training has to be necessary and adequate to improve the
individual employment chances. Unemployed persons have the possibility to apply for
participating in such courses, but the final decision is always made by the caseworkers.
Caseworkers can also essentially force unemployed into such courses by threating to impose
sanctions.
3 Data
3.1 General
We use data including all individuals who are registered as unemployed at a Swiss regional
employment agencies in the year 2003. The data contains rich information from different
unemployment insurance databases (AVAM/ASAL) and social security records (AHV). This
is the standard data used for many Swiss ALMP evaluations (e.g. Gerfin and Lechner, 2002,
Lalive, van Ours, and Zweimller, 2008, Lechner and Smith, 2007). We observe (among others)
nationality, qualification, education, language skills, employment history, profession, position,
industry of last job, desired occupation and industry, and an employability rating by the
caseworkers. The data contains detailed (de-)registration dates, and participation in ALMP. The
administrative data is linked with regional labour market characteristics, such as the population
size of municipalities and the cantonal unemployment rate. The availability of caseworker
information distinguishes our data. Swiss caseworkers employed in the period 2003-2004 were
surveyed based on a written questionnaire in December 2004 (see Behncke, Frlich, Lechner,
7
2010a, 2010b). The questionnaire contained questions about aims, strategies, processes of the
caseworker and the regional employment agency.
3.2 Sample definition
In total, 238,902 persons registered as being unemployed in 2003. Here, we only consider the
first unemployment registration per individual in 2003. Each registered unemployed person is
assigned to a caseworker. In most cases, the same caseworker is responsible for the entire
unemployment spell of his/her client. If this is not the case, we focus on the first caseworker to
avoid concerns about (rare) endogenous caseworker changes (see Behncke, Frlich, Lechner,
2010a). We only consider unemployed aged between 24 and 55 years who receive
unemployment insurance benefits. We omitted unemployed persons who apply for disability
insurance benefits, when the responsible caseworker is not clearly defined, or when their
caseworkers did not answer the questionnaire (that had a response rate of 84%). We drop
unemployed foreigners with a residence permit that is valid for less than a year. Finally, we
drop unemployed persons from a few regional employment agencies that are not comparable to
the other regional employment agencies. This sample is identical to the data used in Huber,
Lechner, and Mellace (2017). It contains 100,120 unemployed persons.
One concern regarding the treatment definition is the timing with respect to the elapsed
unemployment duration prior to participation: Caseworkers may assign unemployed persons to
job training programmes essentially anytime during their unemployment spell. Several points
have to be taken into account to address this issue (see the discussions in Abbring and van den
Berg, 2003, 2004, Fredriksson and Johansson, 2008, Heckman and Navarro, 2007, Lechner,
2009, Robins, 1986, Sianesi, 2004, among others).
We consider a classical static evaluation model and use the following denitions. The
treatment is defined as the first participation in a job search program during the first six months
8
of unemployment (83% of JSP are assigned within the first six months of unemployment). We
exclude individuals who participate in other ALMP within the first six months of
unemployment from the sample, such that our control group represents non-participants of all
programmes (8,781 other ALMP participants are dropped). Potentially this approach could lead
to a higher share of individuals with better labour market characteristics among control group
than among the training participants, because individuals in the control group possibly found
already a job prior to their potential treatment times. This would bias the results negatively. We
randomly assign (pseudo) participation starts to each individual in the control group. Thereby,
we recover the distribution of the elapsed unemployment duration at the time of training
participation from the treatment group (similar to, e.g., Lechner, 1999, Lechner and Smith,
2007). To ensure comparability of the treatment denitions of the participants and non-
participants, we only consider individuals who are unemployed at their (pseudo) treatment
dates. This makes the groups of participants and non-participants comparable with respect to
the duration of unemployment and ensures that treated and controls are eligible for programme
participation at their respective assigned start dates.
The finale sample contains 85,185 unemployed individuals (Table A.1 in Appendix A
provides the details of the sample selections steps). From this sample, 12,998 unemployed
individuals participate in a job search program and 72,187 are members of the control group.
These unemployed are assigned to 1,282 different caseworkers.
3.3 Descriptive statistics

Table 3.1 reports the means and standard deviations for the main variables for participants and
non-participants. The descriptive statistics for all other variables are shown in Table B.1 in
Appendix B. During the first 31 months after the start of training, participants in JSPs are less
months employed than non-participants. Six months after the start of participation, we find
strong negative differences with standardized differences above 20. For the other time periods
9
the standardized differences are smaller. During the months 25 until 31 the after start of training,
participants are slightly higher employment intensity.
Table 3.1: Descriptive statistics of important variables by treatment status
Participants Non-Participants Std. Diff.

Mean S.D. Mean S.D. in %
Outcomes
Cumulated employment in first 6 months 1.21 1.93 1.94 2.44 -23.29
Cumulated employment in months 25 - 31 3.48 2.88 3.33 2.86 3.72
Individual characteristics
Female 0.45 - 0.44 - 0.58
Age in 10 year 3.73 0.88 3.66 0.86 5.59
Qualification: unskilled 0.22 - 0.23 - -1.80
Qualification: some degree 0.60 - 0.56 - 5.19
Employability rating: low 0.12 - 0.14 - -3.97
Employability rating: medium 0.77 - 0.74 - 5.79
Employability rating: high 0.11 - 0.12 - -3.62
Number of unemployment spells in last two years 0.41 0.98 0.64 1.27 -13.85
Fraction of months employed in last two years 0.83 0.22 0.79 0.25 12.57
Past income in CHF 10,000 4.58 2.02 4.16 2.05 14.50
Caseworker characteristics
Age in 10 years 4.43 1.16 4.44 1.16 -0.77
Female 0.45 - 0.41 - 6.94
Tenure in employment office in years 5.54 3.23 5.86 3.31 -6.84
Own experience of unemployment 0.63 - 0.63 - 0.54
Degree in vocational training for caseworkers 0.26 - 0.23 - 5.63
Local labour market characteristics
German speaking employment office 0.89 - 0.67 - 39.68
French speaking employment office 0.08 - 0.25 - -33.30
Italian speaking employment office 0.03 - 0.08 - -16.81
Cantonal unemployment rate in % 3.64 0.77 3.75 0.86 -9.23
GDP per capita in the canton in CHF 10,000 5.13 0.92 4.92 0.93 15.75
Number of caseworker 989 1,282
Number of observations 12,998 72,187
Note: Unconditional means of all, standard deviations of non-binary variables, and standardized differences in % between
participants and non-participants. The descriptive statistics of all variables are in Table B.1 of Appendix B. See
Rosenbaum and Rubin (1983) for a definition of the standardized difference. They consider an absolute standardized
difference of more than 20 as being large.
Furthermore, we observe differences in individual characteristics, the characteristics of
their caseworkers, and the local labour market conditions. Participants have been more months
employed and received a higher income than non-participants in the last two years. We only
10
find little differences between the caseworker of participants and non-participants. 7 Finally, we
find participants are more often registered at a German speaking local employment agency and
live in cantons with better economic conditions (in terms of local GDP and unemployment rate)
than non-participants.
4 Econometric approach
4.1 Basic concept and identification

We describe the parameters of interest using the causality notion introduced in Rubins (1974)
potential outcome framework. Let Yi1 denote the potential outcome (i.e., months employment)
when individual i (for i = 1,..., N ) participates in a JSP ( Di = 1) . Conversely, Yi denotes the

0
potential outcome when individual i is not participating in a JSP ( Di = 0 ) . Obviously, each
individual can either participate or not, but both cannot occur simultaneously. This implies that
only one potential outcome is observable. The observed outcome equals
Yi = Yi1 Di + Yi 0 (1 Di ) .
It is common to define the causal treatment effect of D on Y for an individual as
=i Yi1 Yi 0 .
Even in randomized experiments, the individual causal effects, i , are not identified; however,
suitable average values that may be of interest and potentially identifiable under additional
assumptions. Standard examples are the average treatment effect (ATE), = E [ i ] , or the
=
average treatment effect on the treated (ATET), [ i | Di 1] . The former represents the
E=
7
In Table B.1 in Appendix B we also show caseworker characteristics interacted with the language of the local employment
agency. For the interacted variables we find partly strong differences between participants and non-participants.
11
average causal effect for the population of unemployed, the latter the average for participants
only. ATE and ATET might differ in the presence of selection into treatment and effect
heterogeneity.
Both aforementioned average effects could hide interesting effect heterogeneity.
Therefore, researchers gain interest in investigating the conditional average treatment effects
(CATEs). CATEs report causal treatment effects for a specific group characterized by the
variables in the vector Z i ,
( z ) = E i Z i = z = E Yi1 Yi 0 Z i = z ,
(=
z ) E i =
Z i z ,= 1 E Yi1 Yi 0 =
Di = Di 1 ,
Z i z ,=
where we indicate random variables by capital letters and realizations of these random variables
by small letters. The vector Z i contains variables that are potentially relevant for effect
heterogeneity.
In addition to the variables included in the vector Z i , we consider confounding variables
which are included in the vector X i . Confounders are variables which jointly affect the
probability to participate in JSP and the employment outcome. Z i may be larger, smaller,
partially or fully overlapping with X i depending on the question under investigation. We
impose the following assumption to achieve identification.
Assumption 1 (Conditional independence): Yi1 , Yi 0 Di |=

X i x=
, Z i z for all values of x and
z in the support.
Assumption 2 (Common support): 0 < P( Di =

1| X i =
x, Z i ==
z ) p ( x, z ) < 1, for all values of
x and z in the support.
=
Assumption 3 (Exogeneity of controls): X i1 X=0 1
i , Zi Z i0 .
12
Assumption 1 states that the potential outcomes are independent of program
participation conditional on confounding and heterogeneity variables. The plausibility of this
assumption is justified by the availability of a very detailed set of confounding variables
containing characteristics of unemployed and caseworkers. The studies of Biewen et al. (2014)
and Lechner and Wunsch (2013) discuss the selection of confounders in ALMP evaluations
based on rich administrative data. The decision for program participation is mainly driven by
the caseworker, who has high autonomy within the employment office. Our data contain the
same objective measures about labour market history, education and socio-demographics of the
unemployed as well as local labour market characteristics that are observable to the caseworker
when deciding about program participation. Additionally, we observe detailed information
about the caseworker and her counselling style. These are potential confounders as caseworker
characteristics might affect the probability of participation and labour market outcomes
simultaneously. In our baseline estimation, we adapt the specification of Huber, Lechner, and
Mellace (2017).
Under Assumption 2 the conditional probability to participate in JSP lies between zero
and one. The common support assumption has to hold when conditioning jointly on X and Z .
By Bayes rule, this assumption implies 0 < p( x) < 1 and 0 < p( z ) < 1 for
p=
( x) P=
( Di 1|=
X i x) and p=
( z ) P=
( Di 1|=
Z i z ). We enforce common support by
trimming observations below the 0.5 quantile the participants and the 99.5 quantile of non-
participants. 8 This procedure shows good final sample performance in the study Lechner and
Strittmatter (2017). Assumption 3 requires exogeneity of confounding and heterogeneity
variables. To account for this assumption, we only use control variables which are determined
prior to the start of JSP participation.
8
Total number of trimmed observations: 6,767 (579 participants, 6,188 non-participants).
13
Theorem 1 (Identification): Under Assumptions 1-3 the following equalities hold
( z) = [ E (Yi | Zi =
E X |Z z = 1) | Z i z ] E X |Z
z , X i , Di == z [ E (Yi | Zi = 0) | Z i z ]
z , X i , Di ==
( z) =
E (Yi | Z i = 1) E X |Z = z [ E (Yi | Z i =
z , Di = z , X i , Di =
0) | Z i = 1]
z , Di =
Thus ( z ) and ( z ) are identified from observable data on { yi , di , zi , xi }i =1 .

N
The proof of Theorem 1 is in Appendix C. When the vectors X i and Z i are identical,
then ( x) =
( x) =
E (Yi | X i =
x, Di =
1) E (Yi | X i =
x, Di =
0), but this does not allow to search
for groups of unemployed persons with relevant effect heterogeneity.
4.2 Modified covariate method (MCM)
Our baseline analysis applies the MCM introduced by Tian et al. (2014) to systematically
uncover effect heterogeneity. To gain some intuition about this method, assume participation
in a programme is randomly assigned to 50% of the unemployed persons. Accordingly, in this
introductory example is no need to adjust for selection bias adjustments using confounding
variables X i . Furthermore, consider Z i contains a constant term and one additional variable
Z i1 which is responsible for effect heterogeneity. A standard linear regression model is
Yi =Z i s + Di Z i + ui . (1)
The first term on the right side of equation (1) provides the linear approximation of the
=
conditional expectation for under non-participants z s E=
Yi 0 | Z i z . The second term on
the right side of (1) provides a linear approximation of the CATE
( z) =
z =
E (Yi1 Yi 0 | Z i =
z ).
Tian et al. (2014) propose to include the modified variable=

Ti 2 Di 1 in the regression
model
14
Ti Z i
Yi =Z i t + + vi . (2)
2
The treatment indicator is shifted from Di {0,1} to Ti 2 {0.5, 0.5} . The modification does
not alter the parameters . But z t becomes now the linear approximation of E [Yi | Z i = z ] .
Notice that Cov ( Z i1 , Ti Z i1=

) Var ( Zi1 ) E[Ti=] 0, where the first equality holds under random
assignment of training participation and the second equality holds because E[T1 ] = 0. 9
Accordingly, the right hand terms of equation (2) are independent of each other and the
coefficients t and can be estimated in two separate bivariate regressions. For example, we
can estimate CATEs with the model
Ti Z i
=Yi + i ,
2
Which we call the MCM model in the following. 10 The MCM is a suitable method when only
the interaction effects and not the main effects are of interest. Parsimony and robustness to
misspecification of the main effects are two advantages of the MCM compared to specification
(1). These properties maintain in the non-experimental setting, which we discuss next.
Chen et al. (2017) show that MCM can be combined with IPW re-weighting, a standard
approach to balance covariates in observational studies (Hirano, Imbens, and Ridder, 2003).
The parameters can be estimated using Weighted Ordinary Least Squares (WOLS), i.e. by
minimising the objective function
1 N
tz
2

= arg min w (di , xi , zi ) yi i i
2
(3)
N i =1
9
In contrast, Cov ( Z i1 , Di Z i1 )= Var ( Z i1 ) E[ Di ]= Var ( Z i1 ) 2 > 0.
10
Alternatively, the MOM makes the outcome modification Yi * = 2TY
i i
, which enables the non-parametric
( z ) E=
= Yi * Z i + ri estimation of CATEs.
Yi * Z i z or parametric =

15
where the IPW weights
di P ( Di =1| X i = x, Z i = z )
w (di , xi , zi ) ti
P ( Di =1| X i = x, Z i = z ) 1 P ( Di =1| X i = x, Z i = z )
are calculated using the estimated propensity score.
4.3 Principled search for effect heterogeneity

An attractive estimator from the machine learning literature to conduct a principled search for
predictive variables is Tibshiranis (1996) Least Absolute Shrinkage and Selection Operator
(LASSO, see, e.g., Hastie, Tibshirani, Wainwright, 2015, Horowitz, 2015, for recent
developments of LASSO). The weighted LASSO estimator of the MCM minimizes the
objective function
1 N zi ti
2 p
= arg min w ( di , xi , zi ) yi + N j
(4)
=
N i 1 = 2 j 1
where we add a penalization term for the sum of the coefficients of the p variables in Z . The
penalizing parameter N specifies the amount of penalization. If N = 0 , then (4) is equivalent
to the WOLS model in (3). However, when N > 0 some coefficients are shrunken towards
zero. For sufficiently high values of N some coefficients become exactly zero. Therefore, the
LASSO serves as a model selector, omitting variables with little predictive power from the
model. 11 A challenge is the optimization of the penalty term, such that only the relevant
predictors of the CATE remain in the model. Too low penalties lead to overfitting, too high
penalties lead to models which miss important variables. We use cross-validation to optimize
the penalty term.
11
The larger the values of N the fewer variables remain in the model. By gradually increasing the penalty term one can
obtain a path from a full model to a model which only contains the constant term.
16
A common approach to select the optimal penalty term is to use cross-validation (e.g.,
Bhlmann and van der Geer, 2011, Chen et al., 2017, Tian et al., 2014). Following this
approach, we apply 10-fold cross-validation to find the penalty term with the best out-of-sample
performance. This means we randomly split the sample into ten subsamples of equal size. The
weighted LASSO is then estimated on a grid of different penalties using only nine of the
subsamples. The remaining subsample is used to evaluate the out-of-sample root mean squared
error (RMSE) for each penalty term on the grid. This is done ten times such that every
subsample is left out once. The average over the ten RMSE paths approximates the out-of-
sample RMSE. The penalty term with the minimum average RMSE along the grid is used to
estimate the weighted LASSO in the full sample.
The LASSO coefficients are biased when N > 0 (see, e.g., Zou, 2006). For this reason,
we use the so called Post-LASSO coefficients to calculate the RMSE in the cross-validation.
The Post-LASSO coefficients are obtained from a WOLS (similar to model (3)) which includes
all variables with non-zero coefficients in the LASSO at the considered penalty value (see, e.g.
Belloni, Chernozhukov, Hansen, 2013).
The 10-fold cross-validated Post-LASSO selects a set of heterogeneity variables that
can be used to calculate the predicted CATE(z) ( z ) = z . However, selecting and estimating
these coefficients in the same sample can lead to invalid inference, because the selected model
would just be fitted to the data. To overcome possible concerns, we randomly split the sample
in two parts with similar size. The first sample is used to select the variables with the described
procedure and the second sample is used to estimate with the selected model. Under
independence between the selection and estimation samples, standard WOLS statistical
inference is valid for the selected variables. Taking these coefficients, we are able to calculate
the predicted treatment effect of each observation in our sample using ( zi ) = zi .
17
We observe that different random splits of the sample lead to different models and the
overlap of the selected variables across different splits is only small. However, the correlations
of the predicted individual effects ( zi ) across different random splits are usually above 0.5.
We conclude that the selected variables are not identical but contain similar relevant
information for effect heterogeneity across sample splits. The predicted individual effects are
therefore averaged over 20 different random splits to reduce model dependency and smooth
outliers resulting from potential in specific model extrapolation. 12 These average predicted
individual effects, ( zi ) are then used to summarize the detected effect heterogeneity.
In our baseline model, we adapt the propensity score specification of Huber, Lechner,
and Mellace (2017), which is reported in Table D.1 of Appendix D. The set of candidate
heterogeneity variables consists of individual and caseworker characteristics, their second order
interactions, up to fourth order polynomials and logged versions of non-binary variables.
Additionally, we consider dummy variables for the 103 employment agencies as well as 29
category dummies for previous industry and 29 category dummies for previous job description.
We exclude variables where less than 1% of (non-)participants show non-zero values. Further,
we keep only one variable of variable pairs that show correlations higher than 0.99. In total,
this lead to 1,268 heterogeneity variables which we consider in the analyses.
4.4 Efficiency augmentation
In MCM the main effects do not have to be specified. Nevertheless, Tian et al. (2014) and Chen
et al. (2017) show that accounting for the main effects can improve performance of the estimator
for , because they can absorb variation in the outcome which is unrelated to the effect
heterogeneity. They propose two ways to account for the main effects.
12
This is in the spirit of bootstrap aggregation (`bagging`) in the machine learning literature (Breiman, 1996).
18
First, the one-step procedure selects main effects and interaction effects simultaneously
by solving
1 N zt
2 p
= arg min w (di , xi , zi ) yi zi i i + N j .
=
N i 1 = 2 j 1
We use this procedure in our main specifications.
Second, the two-step procedure estimates a model for the main effects in the first place.
Afterwards, the residuals u of the first step regression are used as new outcome when selecting
the interaction effects
1 N zi ti
2 p
= arg min w(di , xi , zi ) ui
+ N j .
=
N i 1 = 2 j 1
5 Results
5.1 Average effects
In Table D.1 in Appendix D reports the marginal effects of the propensity score model. The
results confirm the notion of the descriptive statistics that the participation probability is
increasing with the level the of previous labour market success. Unemployed with low labour
market opportunities have a lower probability to participate in JSP. Such cream-skimming is
often observed for the assignment of ALMP (e.g., Bell and Orr, 2002).
For the IPW estimation, we have to check the balancing of our covariates. Table D.2 in
Appendix D documents that IPW successfully balances the covariates between participants and
non-participants. The absolute standardised differences after re-weighting are always below
2.5%, even for basis variables not included in the propensity score. Figure D.1 shows that
common support is not an issue in our setting because the propensity score distributions of
participants and non-participants are overlapping for nearly all values.

19
Figure 5.1: Development of potential outcome and average programme effects after programme start
Notes: Inverse Probability Weighting estimates. Standard errors based on 4999 bootstraps clustered at caseworker level.
Circles / triangles indicate that the effects are statistically different from zero at the 5% level.
Figure 5.1 shows the estimated potential outcomes and average program effects on
employment for each of the first 31 months after the program start. We observe substantial lock-
in effects. The employment probability in the first three months is about 15 percentage points
lower for JSP participants compared to non-participants. However, participants catch up and
the differences in the employment probability fade after 16 months. In the months 22-24 after
programme start we find small positive effects. But this seems to be only a short nuisance.
Overall, the long-run effects are insignificant and very close to zero. The negative lock-in
effects are consistent with the findings of the previous Swiss JSP evaluations (e.g., Gerfin and
Lechner, 2002, Lalive, van Ours, and Zweimller, 2008). But also in other countries the
effectiveness of JSP is negative (see e.g., Dolton and ONeil, 2002, Wunsch and Lechner,
2008). Possibly participants reduce the intensity of informal job search and focus on formal job
search during participation in JSP. This might be an inefficient strategy, especially for those
participants with good labour market opportunities.
20
Table 5.1: Average programme effects for aggregated outcomes
ATE ATET ATENT
Cumulated employment in months Coef. S.E. Coef. S.E. Coef. S.E.
1-6 -0.80*** 0.02 -0.82*** 0.02 -0.80*** 0.02
1-12 -1.10*** 0.05 -1.13*** 0.04 -1.09*** 0.05
1-31 -1.14*** 0.14 -1.20*** 0.13 -1.12*** 0.15
25-31 -0.007 0.03 -0.011 0.03 -0.007 0.04
Notes: Inverse Probability Weighting estimates. Standard errors based on 4999 bootstraps clustered at caseworker level.
*, **, *** means statistically different from zero at the 10%, 5%, 1% level.
Searching for effect heterogeneity at each month after program start is computationally
expensive and hard to summarize in an intuitive way. Therefore, we focus on cumulated
employment after program start in the first 6, 12 and 31 months as well as cumulated
employment in months 25 to 31. Table 5.1 shows the respective average effects that mirror the
findings in Figure 5.1. The lower employment probabilities after the program translate into an
average decline of 0.8 employment months during the first six months. This decrement slightly
to 1.1 months during the first year and during the first 31 months. During the months 25 to 31
we find no significant employment effects.
5.2 Effect heterogeneity
Going beyond the average treatment effects, Table 5.2 reports the estimated coefficients in
the first sample split considered. 13 The coefficients are marginal effects on the treatment effect
of JSP as opposed to marginal effects on the outcome level in standard OLS. The coefficients
of the first row are the hypothetical treatment effect when all other variables in the model would
equal zero.
The first column of Table 5.2 reports the coefficient estimates for the outcome
cumulated employment during the first six months after the start of training participation. 17
out of the 1,268 considered variables are selected as being predictive for the size of the treatment
13
We omit the coefficients for the selected main effects because they are only used for the efficiency augmentation and
irrelevant for the interpretation.
21
effect in the model selection sample. In the estimation sample, five of these variables are
significant. For example, the treatment effect increment by 0.3 months for unskilled workers
with previous earnings between 0 and 25,000 CHF per year (see row 3). For a hypothetical
individual for whom all other selected variables equal zero, the predicted treatment effect would
be obtained as i =
0.87 + 0.3 =
0.57 .
Table 5.2: Coefficients of selected variables in outcome equation
Cumulated employment after registration in months 1-6 1-12

Coef. S.E. Coef. S.E.
Job search program (treatment) -0.89*** 0.05 -1.29*** 0.09
# of unemployment spells in last two years 0.06 0.12 - -
Unskilled * past income 0 - 25k 0.30*** 0.11 0.53 0.53
Skilled w/o degree * same gender 0.20 0.21 - -
Skilled w/o degree * age difference -0.01 0.01 - -
# of unemployment spells in last two years * age of CW 0.00 0.00 - -
# of unemployment spells in last two years * medium city size -0.05 0.06 -0.13 0.14
# of unemployment spells in last two years * past income 0 - 25k -0.04 0.06 -0.10 0.14
# of unemployment spells in last two years * prev. job unskilled 0.04 0.05 0.21* 0.13
# of unemployment spells in last two years * same gender -0.01 0.05 - -
Caseworker own unemployment experience * prev. job unskilled 0.19** 0.09 0.34* 0.21
Foreigner with C permit * past income 25 - 50k 0.19 0.12 - -
Small city * 50 - 75k -0.16* 0.09 -0.26 0.20
Single household * zero employment spell last two years -0.17** 0.08 - -
Single household * prev.job unskilled 0.16 0.11 - -
Previous job primary sector * age difference -0.02** 0.01 - -
Previous job restaurant -0.01 0.12 - -
Previous job tourist sector -0.09 0.12 - -
Unskilled * prev. job unskilled - - -0.22 0.64
# of unemployment spells in last two years * both primary education - - 0.19** 0.08
Degree in vocational training for caseworkers * past income 50 - 75k - - -0.13 0.30
Past income 25 - 50k * unskilled - - 0.14 0.24
# employment spells past five years * prev. job in primary sector - - -0.24 2.16
Prev. job in primary sector * unskilled - - -0.19 0.53
Employment agency 44 - - -0.68 0.52
# of selected variables 17 of 1,268 13 of 1,268
# significant at 10% 5 3
# of caseworkers 1,282
# of observations 39,216
Note: *, **, *** means statistically different from zero at the 10%, 5%, 1%. Standard errors clustered at caseworker level.
Table shows results for the first sample split, applying one step efficiency augmentation and 10-fold cross-validated
Post-Lasso. Results for long-term outcomes in Appendix E.
22
Figure 5.2: Distribution of average predicted individual effects for cumulated employment of first 6 months
Note: Kernel smoothed distribution of average predicted individual effects. Bandwidth chosen using the
Silverman rule. Results from the one step efficiency augmented 10-fold cross-validated Post-Lasso.
Dashed line shows the IPW estimate of the ATE.
The second column of Table 5.2 shows the 13 selected variables for cumulated
employment during the first 12 months after the start of training participation. The selected
heterogeneity variables are partially overlapping for the two employment outcome durations
(comp. column 1 and 2). Table E.1 in Appendix E reports additional results for the outcome
months employed in the first 31 months and months 25-31. While the former looks similar to
the results observed in Table 5.2, no heterogeneity variable was selected for the latter.
Figure 5.2 shows the distribution of the average predicted individual effects and reports
substantial variation in the treatment effects that are hidden by averaging over the entire
population. While the biggest part of the mass is between the ATE estimate -0.8 and -1, we
observe a non-negligible part of the unemployed with substantially less negative predicted
effects and even some with a positive predicted effect of the program.
Table 5.3 shows summary statistics of the predicted CATEs. The means of all the
predicted treatment effects are close to the estimated ATEs, as they should by the law of iterated
expectations. The largest predicted group effect for the first 6 months is 0.89. This suggest that
23
there are potentially unemployed who really benefit from the program. However, the majority
seems to be worse off under participation.
Table 5.3: Descriptive statistics of CATEs

Mean Median S.D. Min Max
Cumulated employment in first 6 months -0.78 -0.84 0.23 -1.40 0.89
Cumulated employment in months 25 - 31 -0.01 -0.01 0.03 -0.20 0.25
Notes: Descriptive statistics of the predicted individual effects on support (n = 78,431).
Averaged over the models obtained from 20 different sample splits.
Accordingly, the MCM is successful in discovering substantial effect heterogeneity.
However, the interpretation of the results is not easily accessible, because the underlying
functions are too complex. We can use computer algorithms to translate these complex
functions into decision rules. However, we want to go beyond that abstract description and want
to make explicit policy recommendations. One way to summarise the results for policy makers
is to marginalise the effects for specific variables of interest.
Figure 5.3 reports the average of the predicted group effects by 15 individual
characteristics. All characteristics are classified in bins with high and low. We report the
average predicted group effects low and high associated low and high values of the
characteristics, respectively. To account for the fact that the predicted group effects are based
on first step estimates , we exploit the asymptotic normality of WOLS coefficients. To this
end, we calculate 1,000 simulated predicted group effects s ( zi ) = zis , where for each
simulation s, s is drawn from a multivariate normal distribution N , ( ( )) with ( )

being the estimated covariance matrix of the WOLS. The standard deviation of all simulated
s ,high and s ,low is then used to calculate the confidence intervals that are displayed in Figures
5.3 and 5.4. We order the characteristics by the absolute size of the differences between high
and low .
24
Figure 5.3: Effect heterogeneity along individual characteristics for cumulated employment in the first 6 months
Note: Conditional average treatment effect in strata defined by low and high values of individual characteristics. Low
value: binary characteristic: 0, non-binary: < median; High: binary characteristic: 1, non-binary: median. Ordered
by size of absolute differences. Based on average predicted individual effects of 20 replications with one step
efficiency augmented 10-fold cross-validated Post-Lasso. 95%-confidence interval shown from 1000 draws from
the multivariate distribution of the estimated models and their respective means. *, **, *** means that the
differences in these 1000 draws are significantly different from zero at the 10%, 5%, 1% level, respectively.
Figure 5.3 shows substantial differences for nearly all considered variables. The biggest
difference is observed for the number of unemployment spells in the past two year. The average
effect for people with no previous unemployment is much lower than for unemployed with at
least one recent unemployment spell. The former show an average of -0.85 and the latter an
average of -0.61. Further, we see that individuals with some educational degree suffer much
more from JSP than individuals with no degree. In general, we observe that the lock-in effect
is much less pronounced for unemployed with lower qualification and therefore employment
prospects. These findings are consistent with the previous results of the evaluation literature
(e.g., Card, Kluve, and Weber, 2015, van den Berg and van der Klaauw, 2006). Furthermore,
25
the lock-in effects are less negative for foreigners. Possibly foreigners have a relatively small
network for informal job search. Therefore, the formal job search strategy might be relatively
successful for foreigners. We find only little heterogeneity by gender and age, which is in line
with the findings of Vikstrm, Rosholm, and Svarer (2013).
Figure 5.4 shows the same analysis with respect to caseworker characteristics. Although
we find some significant differences, they are much less pronounced than for the individual
characteristics. The biggest difference is observed if caseworkers have own unemployment
experience but the lock-in effect is only different by 0.06 months, which seems negligible.
Figure 5.4: Effect heterogeneity along caseworker characteristics for cumulated employment in the first 6 months
Note: Conditional average treatment effect in strata defined by low and high values of individual characteristics. Low
value: binary characteristic: 0, non-binary: < median; High: binary characteristic: 1, non-binary: median. Ordered
by size of absolute differences. Based on average predicted individual effects of 20 replications with one step
efficiency augmented 10-fold cross-validated Post-Lasso. 95%-confidence interval shown from 1000 draws from
the multivariate distribution of the estimated models and their respective means. *, **, *** means that the
differences in these 1000 draws are significantly different from zero at the 10%, 5%, 1% level, respectively.
26
Table 5.4 Characteristics of individuals with positive vs. negative predicted effects
0 < 0 Difference S.E.

Job search program within 6 months after registration 0.08 0.16 -0.08*** 0.01
Female 0.37 0.45 -0.06* 0.03
Past income in CHF 10,000 3.32 4.24 -0.10*** 0.01
Fraction of months employed in last two years 0.70 0.80 -0.10*** 0.01
Number of unemployment spells in last two years 4.96 0.54 4.39*** 0.16
Qualification: unskilled 0.57 0.24 0.33*** 0.05
Qualification: semiskilled 0.15 0.16 -0.01 0.02
Qualification: skilled without degree 0.20 0.04 0.13** 0.05
Qualification: some degree 0.08 0.57 -0.45*** 0.02
Foreigner with mother tongue in canton's language 0.14 0.11 0.02* 0.01
Employability rating: low 0.36 0.14 0.24*** 0.04
Employability rating: medium 0.63 0.76 -0.15*** 0.05
Employability rating: high 0.01 0.10 -0.09*** 0.00
Age in 10 year 3.64 3.67 -0.02 0.05
Foreigner with B permit 0.28 0.13 0.17*** 0.03
Foreigner with C permit 0.44 0.25 0.14*** 0.02
Number of observations 572 77,859
Note: Average characteristics of individuals with positive and negative predicted effects for cumulated employment in the first
6 months. Based on average predicted individual effects of 20 replications with one step efficiency augmented 10-
fold cross-validated Post-Lasso. Standard errors of the difference calculated as the standard deviation of the
differences in 1000 draws from the multivariate distribution of the estimated models and their respective means. *,
**, *** means that the differences in these 1000 draws are significantly different from zero at the 10%, 5%, 1%
level, respectively.
Finally, we can use the predicted group effects to investigate two highly policy relevant
questions: What are the characteristics of unemployed who benefit from the program? Do
caseworkers send these promising unemployed actually to the program?
The number of individuals with positive predicted effects is 572 which corresponds to
0.7% of the unemployed. Table 5.4 shows the differences in characteristics, standard errors are
again obtained by simulating 1000 predicted effects and calculating the standard deviation of
the differences in all simulations. The first row reports the JSP assignment share. Only 8% of
the unemployed with a predicted positive effect are actually assigned to the program. This is
half of the 16% who are assigned with negative predicted group effects. This points at potentials
to improve the selection of JSP participants. The results in the remaining rows show how the
selection could be improved concretely. The effectiveness of JSP could be higher if caseworker
assign preferably unemployed persons with bad labour market prospects. Especially
27
unemployed persons with a recent history of unemployment, without degree, and with a low
employability rating are associated with positive predicted outcomes.
5.3 Robustness
To investigate the sensitivity of our results to an alternative algorithm for model selection, we
apply the so-called Adaptive Lasso (Zou, 2006) defined by
N
G ( zi ) ti 2 p
j
Adaptive Lasso arg min w (di , xi , zi )( yi
= ) + Ridge , (3)
j 1 j
= i 1= 2
where jRidge is obtained in a first step by solving
N
G ( zi ) ti 2 p
= Ridge arg min w (di , xi , zi )( yi ) + j2 .
=i 1 =j 1 2
The Ridge estimator penalises the sum of squared coefficients instead of the sum of the
absolute coefficients (Hoerl and Kennard, 1970). It therefore shrinks the coefficients also to
zero but never exactly to zero and cannot be used for model selection. The Adaptive Lasso in
equation (3) shrinks coefficients that are already relatively small after Ridge penalization more
aggressive to zero and addresses the problem that Lasso tends to select too many variables. Zou
(2006) shows that this modification of the LASSO can achieve the Oracle property, which
means that it selects the correct model at least asymptotically.
We rely in our baseline estimation on the modified covariate method because it offers the
explained possibility of efficiency augmentation. Another possibility is to use the modified
outcome method that leaves the covariates unchanged but modifies the outcome. This procedure
is first proposed by Signorovich (2007) and extended to observational studies by Zhang et al.
(2012). It can be estimated using
28
1 N p

= arg min ( yi* G ( zi ) ) 2 + N j ,
= N i 1 =j 1
where
di p ( xi , zi )
=yi* y=
i w( d i , xi , zi )Ti yi
p ( xi , zi ) 1 p ( xi , zi )
is the modified outcome. Note that the average of all yi* is identical to the IPW estimate
of the ATE.
Table 5.5 shows that the average predicted individual effects for different
implementations are highly correlated. In the first column, the baseline model with one step
efficiency augmentation and Post-Lasso model selection shows a correlation of at least 0.7 with
the alternative methods. Therefore, it is not surprising that the main conclusions drawn from
the reported baseline estimations are confirmed using alternative implementations.
Table 5.5 Correlation of average predicted individual effects over different methods
Cumulated employment in months 1-6 1st PL 2st PL 1st Ada 2st Ada MOM PL MOM Ada
One step Post-Lasso 1.00
Two step Post-Lasso 0.85 1.00
One step Adaptive Lasso 0.79 0.51 1.00
Two step Adaptive Lasso 0.83 0.59 0.93 1.00
Modified outcome method Post-Lasso 0.82 0.81 0.62 0.65 1.00
Modified outcome method Adaptive Lasso 0.70 0.56 0.67 0.67 0.76 1.00
Note: Correlation of average predicted individual effects for different methods of efficiency augmentation, model selection
and modifications.
6 Conclusion
We investigate recently developed methods to uncover treatment effect heterogeneity. We
apply these methods to a large-scale investigation of effect heterogeneity of a training
programme. We allow for a high-dimensional set of potential heterogeneity variables and use
algorithms based on the LASSO (Tibshirani, 1996) to select variables that are predictive for the
29
size of the treatment effect. We split the sample to obtain consistent estimates after model
selection and aggregate over several models to obtain predicted group effects.
We find that the standard estimates of average treatment effects hide substantial
heterogeneity in the size of the effects. While unemployed with good labour market prospects
suffer from substantial lock-in effects if participating in the job search program, unemployed
with bad employment risks might even benefit. However, in the case of the considered JSP this
seems to be only a tiny fraction of unemployment. We find evidence that caseworkers are less
likely to send those unemployed to the JSP who actually profit from it. This suggests that
expected returns to program participation are either not considered or expected to be completely
different than they turned out to be.
The newly considered methods provide useful and plausible additional information for
empirical evaluation studies. Specifically, they could inform policy makers how to improve
existing assignment mechanisms to target individuals who are expected to benefit most and
therefore increase cost-benefit efficiency.
Researchers have a variety of choices how to specifically implement these new methods.
They can decide between modifying either the outcome or the covariates. Additionally, the
implementation of the variable selection is open to numerous different choices. We find that
our results are consistent over six different implementations. However, the current state of the
literature gives very little guidance which method is preferable for a specific application at
hand.
30
Appendix A: Sample selection
Table A.1: Sample selection criteria for empirical analysis
Selection criteria Remaining sample size
Population: all new jobseekers during the year 2003 238,902
Exclude Geneva and five other employment offices -19,464 219,438
Exclude jobseekers not (yet) assigned to a caseworker -4,289 215,149
Exclude foreigners without yearly or permanent work permit -5,399 209,750
Exclude jobseekers without unemployment benefit claim -18,434 191,316
Exclude jobseekers who applied for or claim disability insurance -3,163 188,153
Restrict to prime-age population (24 to 55 years old) -51,649 136,504
Exclude unemployed whose caseworker did not respond to the questionnaire -31,469 105,035
Exclude unemployed whose caseworkers did not respond to the -4,915 100,120
cooperativeness question
Exclude participants in other ALMP than JSP -8,787 91,333
Exclude individuals employed at (pseudo) treatment date -6,148 85,185
Source: Selection steps are (partly) collected from Table B.1 in Huber, Lechner, Mellace (2017).
31
Appendix B: Descriptive statistics
Table B.1: Descriptive statistics of variables used in empirical analysis by treatment status

Outcomes
Cumulated employment in months 25 - 31 3.48 2.88 3.33 2.86 3.72
Female 0.45 - 0.44 - 0.56
*French 0.04 - 0.11 - -19.51
*Italian 0.01 - 0.04 - -11.85
Mother tongue other than German, French, Italian 0.29 - 0.32 - -5.40
*French 0.02 - 0.08 - -18.01
*Italian 0.01 - 0.02 - -9.80
Qualification: unskilled 0.22 - 0.23 - -1.80
*French 0.03 - 0.05 - -8.36
*Italian 0.01 - 0.03 - -8.62
Qualification: semiskilled 0.15 - 0.16 - -2.45
*French 0.02 - 0.05 - -12.10
*Italian 0.00 - 0.01 - -5.16
Qualification: skilled without degree 0.03 - 0.05 - -4.72
*French 0.00 - 0.02 - -11.22
*Italian 0.00 - 0.01 - -4.11
Number of unemployment spells in last two years 0.41 0.98 0.64 1.27 -13.86
*French 0.05 0.36 0.19 0.76 -16.84
*Italian 0.02 0.22 0.07 0.46 -10.16
Fraction of months employed in last two years 0.83 0.22 0.79 0.25 12.57
*French 0.06 0.22 0.19 0.35 -30.04
*Italian 0.02 0.13 0.06 0.22 -15.77
Employability rating: low 0.12 - 0.14 - -3.98
*French 0.01 - 0.02 - -9.87
*Italian 0.00 - 0.01 - -4.94
Employability rating: medium 0.77 - 0.74 - 5.80
*French 0.07 - 0.19 - -26.32
*Italian 0.02 - 0.05 - -11.57
Age in 10 year 3.73 0.88 3.66 0.86 5.60
Age squared / 10,000 0.15 0.07 0.14 0.07 5.61
Married 0.47 - 0.49 - -2.35
Foreigner with B permit 0.11 - 0.14 - -6.96
Foreigner with C permit 0.23 - 0.25 - -3.12
Table B.1 to be continued
32
Table B.1 continued

Lives in big city 0.17 - 0.17 - -0.05
Lives in medium sized city 0.16 - 0.13 - 4.83
Past income in CHF 10,000 4.58 2.02 4.16 2.05 14.50
Number of employment spells in last 5 years 0.10 0.13 0.13 0.15 -14.70
Previous job in primary sector 0.06 - 0.10 - -10.44
Previous job in secondary sector 0.16 - 0.13 - 6.04
Previous job in tertiary sector 0.63 - 0.58 - 7.07
Foreigner with mother tongue in canton's language 0.12 - 0.11 - 2.40
Previous job self-employed 0.00 - 0.01 - -3.01
Previous job manager 0.08 - 0.07 - 1.85
Previous job skilled worker 0.63 - 0.60 - 4.70
Previous job unskilled worker 0.26 - 0.29 - -5.01
Training start in second quarter of unemployment 0.45 - 0.46 - -0.38
Age in 10 years 4.43 1.16 4.44 1.16 -0.77
*French 0.37 1.29 1.14 2.04 -31.79
*Italian 0.11 0.70 0.34 1.19 -16.43
Female 0.45 - 0.41 - 6.94
*French 0.02 - 0.09 - -22.33
*Italian 0.01 - 0.02 - -6.15
Tenure in employment office in years 5.54 3.23 5.86 3.31 -6.84
*French 0.47 1.78 1.59 3.07 -31.36
*Italian 0.21 1.39 0.60 2.29 -14.58
Own experience of unemployment 0.63 - 0.63 - 0.54
*French 0.05 - 0.17 - -26.33
*Italian 0.02 - 0.05 - -11.73
Indicator for missing caseworker characteristics 0.04 - 0.04 - 0.13
Education: above vocational training 0.45 - 0.43 - 2.36
*French 0.04 - 0.10 - -17.68
*Italian 0.01 - 0.03 - -9.46
Education: tertiary track 0.21 - 0.24 - -4.68
*French 0.02 - 0.09 - -21.92
*Italian 0.00 - 0.02 - -8.25
Degree in vocational training for caseworkers 0.26 - 0.23 - 5.63
*French 0.00 - 0.01 - -9.64
*Italian 0.01 - 0.04 - -11.28
Table B.1 to be continued
33
Table B.1 continued

Allocation of unemployed to caseworkers
By industry 0.66 - 0.53 - 17.73
*French 0.05 - 0.10 - -12.88
*Italian 0.01 - 0.04 - -11.32
By occupation 0.58 - 0.56 - 3.08
*French 0.06 - 0.17 - -25.14
*Italian 0.01 - 0.05 - -14.27
By age 0.04 - 0.03 - 2.58
By employability 0.07 - 0.07 - 0.12
By region 0.09 - 0.12 - -7.55
Other 0.07 - 0.07 - -1.37
Frenchs speaking employment office 0.08 - 0.25 - -33.30
Italian speaking employment office 0.03 - 0.08 - -16.81
Cantonal unemployment rate in % 3.64 0.77 3.75 0.86 -9.23
*French 0.32 1.10 1.05 1.86 -33.93
*Italian 0.11 0.69 0.34 1.16 -16.61
GDP per capita in the canton in CHF 10,000 5.13 0.92 4.92 0.93 15.75
Additional variables not used in propensity score
Qualification: some degree 0.60 - 0.56 - 5.19
Employability rating: high 0.11 - 0.12 - -3.62
Lives in small city 0.68 - 0.70 - -3.64
Number of people in household 0.21 0.13 0.22 0.13 -2.75
Same gender 0.58 - 0.58 - 0.18
Age caseworker - age unemployed 7.96 12.57 8.69 12.63 -4.05
Same age 5 years 0.24 - 0.23 - 1.94
Both primary education 0.67 - 0.72 - -8.18
Both secondary education 0.28 - 0.31 - -4.45
Both upper secondary education 0.37 - 0.42 - -8.18
Both tertiary education 0.53 - 0.56 - -3.41
Same gender, age, and education 0.03 - 0.03 - -1.89
German speaking employment office 0.89 - 0.67 - 39.67
Number of caseworker 989 1,282
Number of observations 12,998 72,187
Note: Unconditional means and standard deviations by participation status. Standardized differences between
participants and non-participants in percent.
34
Appendix C: Identification of CATE
Proof of Theorem 1
) E X |Z = z , D = 1 d E (Y d | Z= z , X ) | Z= z ,
E (Y d | Z= z=
= E X |Z = z , D = 1 d E (Y=
d
=
| Z z, X , D d=) | Z z ,
= E X |Z = z , D = 1 d [ E (Y = =
| Z z, X , D d=) | Z z ].
1 d ) =E X |Z = z , D = 1 d E (Y d | Z =z , X , D =
E (Y d | Z =z , D = 1 d ,
1 d ) | Z =z , D =
= E X |Z = z , D = 1 d E (Y d | Z = z , X , D = d ) | Z = z , D = 1 d ,
= E X |Z = z , D = 1 d [ E (Y | Z = z , X , D = d ) | Z = z , D = 1 d ] .
35
Appendix D: Selection model, balancing tests, and common
support
Table D.1: Propensity score
Marg. eff. S.E. p-value

Female 0.01 0.00 0.11
*French 0.01 0.01 0.22
*Italian -0.03 0.02 0.11
Mother tongue other than German, French, Italian -0.01** 0.01 0.01
*French -0.03** 0.01 0.01
*Italian -0.01 0.02 0.48
Qualification: unskilled 0.01* 0.01 0.06
*French 0.10*** 0.02 0.00
*Italian 0.05** 0.02 0.02
Qualification: semiskilled 0.00 0.01 0.79
*French 0.06*** 0.01 0.00
*Italian 0.03 0.03 0.27
Qualification: skilled without degree 0.01* 0.01 0.09
*French -0.03 0.03 0.33
*Italian 0.02 0.03 0.44
Number of unemployment spells in last two years -0.01*** 0.00 0.00
*French 0.00 0.00 0.52
*Italian 0.00 0.01 0.52
Fraction of months employed in last two years 0.03*** 0.01 0.00
*French -0.03 0.02 0.12
*Italian -0.05* 0.02 0.05
Employability rating: low -0.04*** 0.01 0.00
*French 0.10*** 0.03 0.00
*Italian 0.13*** 0.03 0.00
Employability rating: medium -0.02 0.01 0.16
*French 0.09*** 0.02 0.00
*Italian 0.07*** 0.02 0.00
Age in 10 year -0.01 0.01 0.45
Age squared / 10,000 0.21 0.17 0.22
Married 0.00 0.00 0.57
Foreigner with B permit -0.02*** 0.01 0.00
Foreigner with C permit 0.00 0.00 0.68
Lives in big city -0.01 0.01 0.49
Lives in medium sized city 0.02*** 0.01 0.00
Past income in CHF 10,000 0.09*** 0.01 0.00
Number of employment spells in last 5 years -0.08*** 0.01 0.00
Table D.1 to be continued
36
Table D.1 continued

Previous job in primary sector -0.04*** 0.01 0.00
Previous job in secondary sector 0.04*** 0.01 0.00
Previous job in tertiary sector 0.01** 0.01 0.01
Foreigner with mother tongue in canton's language 0.03*** 0.00 0.00
Previous job self-employed -0.09*** 0.02 0.00
Previous job manager -0.05*** 0.01 0.00
Previous job skilled worker -0.02** 0.01 0.01
Previous job unskilled worker -0.02** 0.01 0.04
Training start in second quarter of unemployment 0.02*** 0.00 0.00
Age in 10 years 0.00 0.00 0.51
*French 0.00 0.00 0.25
*Italian 0.00 0.00 0.80
Female 0.02** 0.01 0.01
*French -0.05* 0.03 0.05
*Italian 0.05* 0.02 0.05
Tenure in employment office in years 0.00 0.00 0.15
*French -0.01 0.01 0.17
*Italian 0.00 0.00 0.48
Own experience of unemployment 0.01 0.01 0.48
*French -0.03 0.03 0.32
*Italian 0.05 0.03 0.11
Indicator for missing caseworker characteristics 0.00 0.02 0.92
Education: above vocational training 0.00 0.01 0.99
*French 0.01 0.03 0.80
*Italian 0.01 0.03 0.64
Education: tertiary track 0.00 0.01 0.84
*French -0.02 0.04 0.51
*Italian 0.01 0.04 0.76
Degree in vocational training for caseworkers 0.02* 0.01 0.09
*French -0.09 0.07 0.24
*Italian 0.02 0.03 0.51
37
Table D.1 continued

By industry 0.02*** 0.01 0.00
*French 0.05* 0.03 0.06
*Italian -0.04* 0.02 0.09
By occupation 0.02** 0.01 0.01
*French 0.04* 0.03 0.09
*Italian -0.06** 0.03 0.03
By age 0.01 0.01 0.38
By employability -0.02 0.01 0.14
By region -0.04** 0.01 0.01
Other -0.03** 0.01 0.02
Frenchs speaking employment office -0.06 0.09 0.45
Italian speaking employment office -0.19 0.11 0.10
Cantonal unemployment rate in % 0.03*** 0.01 0.00
*French -0.07*** 0.01 0.00
*Italian -0.03* 0.02 0.09
GDP per capita in the canton in CHF 10,000 -0.03*** 0.01 0.00
Number of caseworker 1,282
Number of observations 85,185
Note: Binary dependent variable: participation in job search program. Model includes constant term. Average marginal
effects reported. Bootstrapped standard errors are clustered at caseworker level (4999 replications). *, **, ***
means statistically different from zero at the 10%, 5%, 1% level, respectively.
38
Table D.2: Balancing tests
Part. Non-Part. Std. Diff. Std. Diff.

Mean Mean in % pre in % post t-statistic
Female 0.44 0.44 0.56 -0.21 -0.31
*French 0.08 0.08 -19.51 -0.01 -0.01
*Italian 0.03 0.03 -11.85 -0.66 -0.98
Mother tongue other than German, French, Italian 0.32 0.32 -5.40 0.23 0.34
*French 0.05 0.05 -18.01 -0.50 -0.74
*Italian 0.02 0.02 -9.80 0.30 0.43
Qualification: unskilled 0.24 0.24 -1.80 -0.31 -0.46
*French 0.05 0.05 -8.36 -0.56 -0.83
*Italian 0.02 0.02 -8.62 -0.44 -0.65
Qualification: semiskilled 0.16 0.16 -2.45 -0.27 -0.39
*French 0.03 0.04 -12.10 -1.43 -2.13**
*Italian 0.01 0.01 -5.16 -0.45 -0.66
Qualification: skilled without degree 0.04 0.04 -4.72 1.42 2.04**
*French 0.01 0.01 -11.22 1.30 1.84*
*Italian 0.01 0.01 -4.11 1.46 2.05**
Number of unemployment spells in last two years 0.59 0.57 -13.86 1.01 1.48
*French 0.12 0.11 -16.84 0.36 0.53
*Italian 0.05 0.05 -10.16 -0.68 -1.02
Fraction of months employed in last two years 0.80 0.80 12.57 -0.48 -0.71
*French 0.13 0.13 -30.04 -0.83 -1.21
*Italian 0.05 0.05 -15.77 -1.23 -1.82*
Employability rating: low 0.15 0.14 -3.98 1.00 1.45
*French 0.01 0.01 -9.87 -0.06 -0.08
*Italian 0.01 0.01 -4.94 0.03 0.05
Employability rating: medium 0.75 0.75 5.80 0.02 0.02
*French 0.14 0.15 -26.32 -0.70 -1.03
*Italian 0.04 0.04 -11.57 -0.20 -0.3
Age in 10 year 3.65 3.67 5.60 -1.57 -2.3**
Age squared / 10,000 0.14 0.14 5.61 -1.59 -2.33**
Married 0.48 0.48 -2.35 0.36 0.53
Foreigner with B permit 0.13 0.13 -6.96 0.06 0.08
Foreigner with C permit 0.25 0.25 -3.12 0.61 0.89
Lives in big city 0.16 0.17 -0.05 -0.98 -1.43
Lives in medium sized city 0.15 0.14 4.83 1.09 1.59
Past income in CHF 10,000 4.21 4.24 14.50 -0.85 -1.28
Number of employment spells in last 5 years 0.12 0.12 -14.70 1.19 1.73*
Previous job in primary sector 0.08 0.08 -10.44 0.45 0.65
Previous job in secondary sector 0.14 0.14 6.04 -0.17 -0.24
Previous job in tertiary sector 0.59 0.60 7.07 -0.72 -1.05
Foreigner with mother tongue in canton's language 0.11 0.11 2.40 0.10 0.14
39
Table D.2 continued

Previous job self-employed 0.01 0.01 -3.01 -0.40 -0.6
Previous job manager 0.07 0.07 1.85 0.10 0.14
Previous job skilled worker 0.59 0.60 4.70 -1.19 -1.74*
Previous job unskilled worker 0.30 0.30 -5.01 0.89 1.29
Training start in second quarter of unemployment 0.43 0.45 -0.38 -1.72 -2.51**
Age in 10 years 4.43 4.43 -0.77 -0.24 -0.36
*French 0.79 0.80 -31.79 -0.50 -0.73
*Italian 0.27 0.29 -16.43 -0.92 -1.35
Female 0.41 0.41 6.94 0.35 0.52
*French 0.05 0.05 -22.33 0.15 0.22
*Italian 0.02 0.02 -6.15 -0.15 -0.22
Tenure in employment office in years 5.77 5.75 -6.84 0.29 0.44
*French 1.09 1.09 -31.36 0.22 0.32
*Italian 0.48 0.52 -14.58 -1.26 -1.87*
Own experience of unemployment 0.63 0.63 0.54 -0.36 -0.52
*French 0.11 0.11 -26.33 -0.46 -0.67
*Italian 0.04 0.04 -11.73 -0.71 -1.04
Indicator for missing caseworker characteristics 0.04 0.04 0.13 -0.24 -0.35
Education: above vocational training 0.44 0.45 2.36 -0.95 -1.39
*French 0.08 0.08 -17.68 -0.50 -0.73
*Italian 0.02 0.02 -9.46 -1.73 -2.6***
Education: tertiary track 0.23 0.22 -4.68 1.41 2.05**
*French 0.06 0.06 -21.92 2.44 3.48***
*Italian 0.01 0.01 -8.25 0.18 0.27
Degree in vocational training for caseworkers 0.23 0.24 5.63 -0.40 -0.58
*French 0.00 0.00 -9.64 -0.47 -0.7
*Italian 0.03 0.04 -11.28 -1.36 -2.03**
By industry 0.58 0.58 17.73 -0.56 -0.82
*French 0.09 0.09 -12.88 -0.54 -0.78
*Italian 0.03 0.03 -11.32 -1.16 -1.73*
By occupation 0.56 0.56 3.08 0.01 0.02
*French 0.13 0.13 -25.14 0.13 0.19
*Italian 0.04 0.04 -14.27 -1.19 -1.76*
By age 0.03 0.03 2.58 0.53 0.77
By employability 0.08 0.07 0.12 1.23 1.78*
By region 0.11 0.11 -7.55 -1.31 -1.92*
Other 0.08 0.07 -1.37 1.89 2.72***
40
Table D.2 continued

Frenchs speaking employment office 0.17 0.18 -33.30 -0.41 -0.59
Italian speaking employment office 0.06 0.07 -16.81 -1.16 -1.72*
Cantonal unemployment rate in % 3.68 3.68 -9.23 0.29 0.44
*French 0.69 0.71 -33-93 -1.02 1.50
*Italian 0.27 0.29 -16.61 -1.10 -1.62
GDP per capita in the canton in CHF 10,000 4.98 4.98 15.75 -0.18 -0.26
Additional variables not used in propensity score
Qualification: some degree 0.56 0.56 5.19 -0.09 -0.13
Employability rating: high 0.10 0.10 -3.62 -1.19 -1.76*
Lives in small city 0.69 0.69 -3.64 -0.05 -0.07
Number of people in household 0.22 0.22 -2.75 0.31 0.45
Same gender 0.60 0.59 0.18 1.99 2.91***
Age caseworker - age unemployed 8.72 8.55 -4.05 0.99 1.45
Same age 5 years 0.24 0.23 1.94 0.95 1.38
Both primary education 0.70 0.71 -8.18 -2.41 -3.49***
Both secondary education 0.30 0.31 -4.45 -1.29 -1.89*
Both upper secondary education 0.40 0.41 -8.18 -1.61 -2.35**
Both tertiary education 0.55 0.56 -3.41 -1.73 -2.53**
Same gender, age, and education 0.03 0.03 -1.89 -1.61 -2.4**
German speaking employment office 0.76 0.76 39.67 1.04 1.52
Number of observations trimmed 584 6246
Number of observations used in estimation 12,414 65,941
Note: Means, differences, standardized differences, and t-statistic of two-sided t-test after inverse probability weighting
and common support adjustment. *, **, *** means statistically different from zero at the 10%, 5%, 1% level,
respectively.
Common support enforced by trimming observations below the 0.5 quantile the
participants and the 99.5 quantile of non-participants. Total number of trimmed observations:
6,767 (579 participants, 6,188 non-participants)
41
Figure D.1: Distribution propensity score
Note: Dashed lines show the lower and upper threshold of trimming.
42
Appendix E: Additional outcome variables
Table E.1: Coefficients of selected variables for long-term outcomes in prediction sample
Cumulated employment after registration in months 1-31 25-31

Coef. S.E. Coef. S.E.
Job search program (treatment) -1.37*** 0.32
Female * CW education: above vocational training 0.07 0.54
Unskilled * CW education: above vocational training -1.24 1.56
Unskilled * prev. job unskilled 4.66*** 1.58
# of unemployment spells in last two years * both primary education 0.29 0.20
Fraction of months employed in last two years * past income 57 - 75k 0.32 0.63
GDP per capita * prev. job self-employed 4.56 5.60
CW education: above vocational training * past income 25 - 50k 1.09* 0.66
CW education: tertiary track * past income 25 - 50k 0.21 0.84
Degree in vocational training for caseworkers * past income 50 - 75k -0.70 0.81 No
Married * 50 - 75k -0.37 0.70 heterogeneity
detected
Foreigner with C permit * 50 - 75k 0.32 0.76
Medium city * prev. job unskilled -0.48 0.94
Single household * zero employment spell last two years -0.24 0.56
Past income 0 - 25k * # employment spells past five years 3.60 2.47
# employment spells past five years * both primary education -1.41 1.87
# employment spells past five years * same gender, age, and education -6.34 5.93
Zero employment spell last two years * skilled worker -0.56 0.53
Prev. job in primary sector * unskilled -0.20 1.3
Unskilled * both primary education 0.09 0.56
Employment agency 44 -0.98 1.27
# of selected variables 20 of 1,268 0 of 1,268
# of caseworkers 1,282
# of observations 39,216
Note: *, **, *** means statistically different from zero at the 10%, 5%, 1%. Standard errors clustered at caseworker level.
43
Figure E.1: Heterogeneity for different time horizons individual characteristics: 1-12 months
44
45
46
Figure E.4: Heterogeneity for different time horizons caseworker characteristics: 1-12 months
47
48
49
References
Abbring, J.H. and van den Berg, G.J. (2003): The Non-Parametric Identification of Treatment Effects in
Duration Models, Econometrica, 71, 1491-1517.
Abbring, J.H. and van den Berg, G.J. (2004): Analyzing the Effect of Dynamically Assigned Treatments using
Duration Models, Binary Treatment Models, and Panel Data Models, Empirical Economics, 29, 5-20.
Angrist, J.D., J.-S. Pischke (2010): The Credibility Revolution in Empirical Economics: How Better Research
Design is Taking the Con out of Econometrics, Journal of Economic Perspectives, 24 (2), 330.
Athey, S., G.W. Imbens (2016a): The Econometrics of Randomized Experiments, Working Paper,
arXiv:1607.00698.
Athey, S., G.W. Imbens (2016b): The State of Applied Econometrics - Causality and Policy Evaluation, Working
Paper, arXiv:1607.00699.
Athey, S., G.W. Imbens (2016c): Recursive Partitioning for Heterogeneous Causal Effects, Proceedings of the
National Academy of Science of the United States of America, 113 (27), 7353-7360.
Bhattacharya, D., P. Dupas (2012): Inferring Welfare Maximizing Treatment Assignment Under Budget
Constraints, Journal of Econometrics, 167 (1), 168-196.
Behncke, S., M. Frlich, M. Lechner (2010a): A Caseworker like Me Does the Similarity between the
Unemployed and their Caseworkers Increase Job Placements? Economic Journal, 120 (549), 1430-1459.
Behncke, S., M. Frlich, M. Lechner (2010b): Unemployed and their Caseworkers: Should they be Friends or
Foes? Journal of the Royal Statistical Society, Series A, 173 (1), 67-92.
Bell, S., and L. Orr (2002): Screening (and Creaming?) Applicants to Job Training Programs: The AFDC
Homemaker Home Health Aide Demonstration, Labour Economics, 9(2), 279-302.
Belloni, A., V. Chernozhukov, C. Hansen (2013): Inference on Treatment Effects after Selection amongst High-
Dimensional Controls, Review of Economic Studies, 81 (2), 608-650.
Biewen, M., B. Fitzenberger, A. Osikominu, and M. Paul (2014): The Effectiveness of Public Sponsored Training
Revisited: The Importance of Data and Methodological Choices, Journal of Labor Economics, 32 (4), 837-
897.
Black D.A., J.A. Smith, M.C. Berger, B.J. Noel (2003): Is the Threat of Reemployment Services more Effective
than the Services Themselves? Evidence from Random Assignment in the UI System, American Economic
Review, 93 (4), 1313-1327.
Blasco, S, M. Rosholm (2011): The Impact of Active Labour Market Policy on Post-Unemployment Outcomes:
Evidence from a Social Experiment in Denmark, IZA Discussion Paper, No. 5631.
Blundell, R., M.C. Dias, C. Meghir, J.V. Reenen (2004): Evaluating the Employment Impact of a Mandatory Job
Search Program, Journal of the European Economic Association, 2 (4), 569-606.
Breiman, L. (1996): Bagging predictors, Machine Learning, 24 (2): 123140.
Bhlmann, P., S. van de Geer (2011): Statistics for High-Dimensional Data: Methods, Theory and Applications,
Springer, Heidelberg.
Bundesamt fr Statistik (2016): Gross Domestic Product per Capita.
50
Card, D, J. Kluve, A. Weber (2010): Active Labour Market Policy Evaluations: A Meta Analysis, Economic
Journal, 120 (548), F452-F477.
Card, D, J. Kluve, A. Weber (2015): What Works? A Meta Analysis of Recent Active Labor Market Program
Evaluations, IZA Discussion Paper, No. 9236.
Casey, K., R. Glennerster, E. Miguel (2012): Reshaping Institutions: Evidence on Aid Impacts Using a Pre-
analysis Plan, Quarterly Journal of Economics, 124 (4), 1755-1812.
Chen, S., L. Tian, T. Cai, M. Yu (2017): A General Statistical Framework for Subgroup Identification and
Comparative Treatment Scoring, Biometrics, forthcoming.
Ciarleglio, A., E. Petkova, R.T. Ogden, T. Tarpey (2015): Treatment Decisions Based on Scalar and Functional
Baseline Covariates, Biometrics, 71, 884894.
Cottier, L., P. Kempeneers, Y. Flckiger, R. Lalive (2017): Does Intensive Job Search Assistance Help Job
Seekers Find and Keep Jobs?, mimeo.
Cox, D. (1975): A Note on Data-Splitting for the Evaluation of Significance Levels, Biometrika, 62 (2), 441-
444.
Crpon, B., E. Duflo, M. Gurgand, R. Rathelot, P. Zamora (2013): Do Labor Market Policies Have Displacement
Effects? Evidence from a Clustered Randomized Experiment, Quarterly Journal of Economics, 128 (2), 531-
80.
Crpon B, G. van den Berg (2016): Active Labor Market Policies, IZA Discussion Paper, No. 10321.
Fithian, W., D. Sun, J. Taylor (2015): Optimal Inference After Model Selection, Working Paper,
arXiv:1410.2597.
Dehejia, R. (2005): Program Evaluation as a Decision Problem, Journal of Econometrics, 125 (1-2), 141-173.
Dolton, P., D. ONeill (2002): The Long-Run Effects of Unemployment Monitoring and Work-Search Programs:
Experimental Evidence from the United Kingdom, Journal of Labor Economics, 20 (2), 381-403.
Fredriksson, P. and Johannsen, P. (2008): Dynamic Treatment Assignment - The Consequences for Evaluations
using Observational Data, Journal of Business and Economic and Statistics, 26, 435-445.
Foster, J.C., J.M.G. Taylor, S.J. Ruberg (2011): Subgroup Identification from Randomized Clinical Trial Data,
Statistics in Medicine, 30 (24), 2867-2880.
Gautier P., P. Muller, B. van der Klaauw, M. Rosholm, M. Svarer (2017): Estimating Equilibrium Effects of Job
Search Assistance, Working Paper.
Gerfin M., M. Lechner (2002): A Microeconometric Evaluation of Active Labour Market Policy in Switzerland,
Economic Journal, 112 (482), 854-893.
Graversen, B.K., J.C. Van Ours (2008): How to Help Unemployed find Jobs Quickly: Experimental Evidence
from a Mandatory Activation Program Journal of Public Economics, 92 (10-11), 2020-2035.
Green, D.P., H.L. Kern (2012): Modelling Heterogeneous Treatment Effects in Survey Experiments with
Bayesian Additive Regression Trees, Public Opinion Quarterly, 76 (3), 491-511.
Grimmer, J., S. Messing, S.J. Westwood (2016), Estimating Heterogeneous Treatment Effects and the Effects of
Heterogeneous Treatments with Ensemble Methods, Working Paper.
51
Hastie, T., R. Tibshirani, M. Wainwright (2015): Statistical Learning with Sparsity The Lasso and its
Generalisations, CRC Press.
Heckman, J. and Navarro, S. (2007): Dynamic Discrete Choice and Dynamic Treatment Effects, Journal of
Econometrics, 136, 341-396.
Hirano, K., G.W. Imbens, G. Ridder (2003): Efficient Estimation of Average Treatment Effects Using the
Estimated Propensity Score, Econometrica, 71 (4), 1161-1189.
Hoerl, A. E., and R. W. Kennard (1970): Ridge Regression: Biased Estimation for Nonorthogonal Problems,
Technometrics, 12(1) 55-67.
Horowitz, J. L. (2015): Variable Selection and Estimation in High-Dimensional Models, Canadian Journal of
Economics, 48 (2), 389-407.
Huber, M., M. Lechner, G. Mellace (2017): Why Do Tougher Caseworkers Increase Employment? The Role of
Programme Assignment as a Causal Mechanism, Review of Economics and Statistics, 99 (1), 180-183.
Imai K., A. Strauss (2011): Estimation of Heterogeneous Treatment Effects from Randomized Experiments, with
Applications to the Optimal Planning of the Get-Out-of-the-Vote Campaign, Political Analysis, 19 (1), 1-19.
Imai K., M. Ratkovic (2013): Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation,
Annals of Applied Statistics, 7 (1), 443-470.
Imbens, G.W., J.M. Wooldridge (2009): Recent Developments in the Econometrics of Program Evaluation,
Journal of Economic Literature, 47 (1), 5-86.
Lalive, R., J.C. van Ours, J. Zweimller (2008): The Impact of Active Labor Market Programs on the Duration
of Unemployment, Economic Journal, 118 (525), 235-257.
Lan, W., P. Zhong, R. Li, H. Wang, C. Tsai (2016): Testing a Single Regression Coefficient in High Dimensional
Linear Models, Journal of Econometrics, 195 (1), 154-168.
Lechner, M. (1999): Earnings and Employment Effects of Continuous Off-the-job Training in East Germany after
Unification, Journal of Business Economics and Statistics, 17, 74-90.
Lechner, M. (2009): Sequential Causal Models for the Evaluation of Labor Market Program, Journal of Business
and Economic Statistics, 27, 7183.
Lechner, M., J.A. Smith (2007): What is the Value Added by Caseworkers? Labour Economics, 14 (2), 135-
151.
Lechner, M., C. Wunsch (2013): Sensitivity of Matching-Based Program Evaluations to the Availability of
Control Variables, Labour Economics, 21 (C), 111-121.
List, J. Shaikh, A., Y. Xu (2016): Multiple Hypothesis Testing in Experimental Economics, NBER Working
Paper No. 21875.
Meyer, B.D. (1995): Lessons from the US Unemployment Insurance Experiments, Journal of Economic
Literature, 33 (1), 91-131.
Morlok, M., D. Liechti, R. Lalive. A. Osikominu, J. Zweimller (2014): Evaluation der arbeitsmarktlichen
Massnahmen: Wirkung auf Bewerbungsverhalten und chancen, SECO Publikationen, Arbeitsmarktpolitik
No. 41.
Olken, B. (2015): Promises and Perils of Pre-Analysis Plans, Journal of Economic Perspectives, 29 (3), 61-80.
52
Qian, M., S.A. Murphy (2011): Performance Guarantees for Individualized Treatment Rules, Annals of
Statistics, 39, 11-80.
Robins, J.M. (1986): A new approach to causal inference in mortality studies with sustained exposure periods
application to control of the healthy worker survivor effect, Mathematical Modelling, 7, 13931512, with
1987 Errata to A new approach to causal inference in mortality studies with sustained exposure periods
application to control of the healthy worker survivor effect. Computers & Mathematics with Applications, 14,
917921; 1987 Addendum to A new approach to causal inference in mortality studies with sustained exposure
periodsapplication to control of the healthy worker survivor effect, Computers & Mathematics with Appli-
cations, 14, 923945; and 1987 Errata to Addendum to A new approach to causal inference in mortality
studies with sustained exposure periodsapplication to control of the healthy worker survivor effect, Com-
puters & Mathematics with Applications, 18, 477.
Rosholm, M. (2008): Experimental Evidence on the Nature of the Danish Employment Miracle, IZA Discussion
Paper No. 3620.
Rubin, D.B. (1974): Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies, Jour-
nal of Educational Psychology, 66 (5), 688-701.
SECO, State Secretary for Economic Affairs (2017): Die Lage auf dem Arbeitsmarkt im Februar 2017.
Sianesi, B. (2004): An Evaluation of the Swedish System of Active Labour Market Programmes in the 1990s,
Review of Economics and Statistics, 86, 133-155.
Signorovitch, J.E. (2007): Identifying Informative Biological Markers in High-Dimensional Genomic Data and
Clinical Trials, PhD thesis, Harvard University.
Taddy, M., M. Gardner, L. Chen, D. Draper (2015): A Nonparametric Bayesian Analysis of Heterogeneous
Treatment Effects in Digital Experimentation, Working Paper, arXiv:1412.8563.
Tian, L., A.A. Alizadeh, A.J. Gentles, R. Tibshirani (2014): A Simple Method for Estimating Interactions
Between a Treatment and a Large Number of Covariates, Journal of the American Statistical Association, 109
(508), 1517-1532.
Tibshirani, R. (1996): Regression Shrinkage via the Lasso, Journal of the Royal Statistical Society. Series B, 58
(1), 267-288.
Van den Berg, G.J, A. Bergemann, M. Caliendo (2009): The Effect of Active Labor Market Programs on Not-
Yet Treated Unemployed Individuals, Journal of the European Economic Association, Papers and
Proceedings, 7 (2-3), 606-616.
Van den Berg, G.J., B. van der Klaauw (2006): Counseling and Monitoring of Unemployed Workers: Theory and
Evidence from a Controlled Social Experiment, International Economic Review, 47 (3), 895-936.
Vansteelandt, S., T.J. VanderWeele, E.J. Tchetgen, J.M. Robins (2008): Multiply Robust Inference for Statistical
Interactions, Journal of the American Statistical Association, 103, 16931704.
Vikstrm, J., M. Rosholm, M. Svare (2013): The Relative Efficiency of Active Labour Market Policies: Evidence
from a Social Experiment and Non-Parametric Methods, Labour Economics, 24, 58-67.
Wunsch, C., M. Lechner (2008): What Did All the Money Do? On the General Ineffectiveness of Recent West
German Labour Market Programmes, Kyklos, 61 (1), 134-174.
53
Xu, Y., M. Yu, Y.-Q. Zhao, Q. Li, S. Wang, J. Shao (2015): Regularized Outcome Weighted Subgroup
Identification for Differential Treatment Effects, Biometrics, 71 (3), 645653.
Zhang, B., A.A. Tsiatis, E.B. Laber, M. Davidian (2012): A Robust Method for Estimating Optimal Treatment
Regimes, Biometrics, 68, 10101018.
Zhao, Y., D. Zeng, E.B. Laber, M.R. Kosorok (2015): New Statistical Learning Methods for Estimating Optimal
Dynamic Treatment Regimes, Journal of the American Statistical Association, 110 (510), 583-598.
Zhao, Y., D. Zeng, A.J. Rush, M.R. Kosorok (2012): Estimating Individualized Treatment Rules using Outcome
Weighted Learning Journal of the American Statistical Association, 107, 11061118.
Zou, H. (2006): The Adaptive Lasso and its Oracle Properties, Journal of the American Statistical, Association,
101 (476), 1418-1429.
54

Knaus m23805 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Knaus m23805 PDF

Uploaded by

Copyright:

Available Formats

Uncovering Treatment Effect Heterogeneity in

Swiss Job Search Programs

First draft: April 2017

effect heterogeneity in Job Search Programmes (JSPs) in Switzerland. Programme evaluation

individuals. Stratifying the data in mutually exclusive groups or including interactions in a

List, Shaikh, and Xu, 2016).

make it also difficult to report the results in an intuitive way.

A developing literature proposes to use (adapted) machine learning algorithms to

priori unexpected groups.

Machine learning methods are powerful tools for out-of-sample predictions of

treatment conditions, but the counterfactual outcome is unobservable. Recently, several

large-scale evidence about the effect heterogeneity of JSPs.

In this study, we systematically investigate effect heterogeneity of JSPs. Possibly some

caseworker characteristics could reflect different baseline monitoring intensities (Behncke,

Lechner and Smith, 2007).

selection-on-observables strategy (see, e.g., Imbens and Wooldridge, 2009). We assume to

Taylor, 2015, Cox, 1975).

3, we introduce our data. In Section 4, we discuss our econometric approach. In Section 5, we

search, personality, language, computer, vocational, and employment programs. We focus

a sufficient number of observations to search for heterogeneity. JSPs provide training in

search during the course.

nationality, qualification, education, language skills, employment history, profession, position,

caseworker and the regional employment agency.

3.2 Sample definition

Lechner, and Mellace (2017). It contains 100,120 unemployed persons.

unemployment duration prior to participation: Caseworkers may assign unemployed persons to

2009, Robins, 1986, Sianesi, 2004, among others).

participation at their respective assigned start dates.

These unemployed are assigned to 1,282 different caseworkers.

3.3 Descriptive statistics

participants are slightly higher employment intensity.

Table 3.1: Descriptive statistics of important variables by treatment status

Participants Non-Participants Std. Diff.

Furthermore, we observe differences in individual characteristics, the characteristics of

4.1 Basic concept and identification

when individual i (for i = 1,..., N ) participates in a JSP ( Di = 1) . Conversely, Yi denotes the

potential outcome when individual i is not participating in a JSP ( Di = 0 ) . Obviously, each

only one potential outcome is observable. The observed outcome equals

It is common to define the causal treatment effect of D on Y for an individual as

Both aforementioned average effects could hide interesting effect heterogeneity.

variables in the vector Z i ,

In addition to the variables included in the vector Z i , we consider confounding variables

partially or fully overlapping with X i depending on the question under investigation. We

impose the following assumption to achieve identification.

Assumption 1 (Conditional independence): Yi1 , Yi 0 Di |=

Assumption 2 (Common support): 0 < P( Di =

x and z in the support.

participation conditional on confounding and heterogeneity variables. The plausibility of this

assumption is justified by the availability of a very detailed set of confounding variables

when deciding about program participation. Additionally, we observe detailed information

Strittmatter (2017). Assumption 3 requires exogeneity of confounding and heterogeneity

prior to the start of JSP participation.

Thus ( z ) and ( z ) are identified from observable data on { yi , di , zi , xi }i =1 .

for groups of unemployed persons with relevant effect heterogeneity.

4.2 Modified covariate method (MCM)

in a programme is randomly assigned to 50% of the unemployed persons. Accordingly, in this

Z i1 which is responsible for effect heterogeneity. A standard linear regression model is

the right side of (1) provides a linear approximation of the CATE

Tian et al. (2014) propose to include the modified variable=

Notice that Cov ( Z i1 , Ti Z i1=

can estimate CATEs with the model

minimising the objective function

are calculated using the estimated propensity score.