
Uncovering Treatment Effect Heterogeneity in

Swiss Job Search Programs


Michael Knaus, Michael Lechner+ , and Anthony Strittmatter *

First draft: April 2017


Date this version has been printed: 15 April 2017

Preliminary and incomplete. Please do not circulate or quote without the permission of one of
the authors
Comments are very welcome

Abstract: We systematically investigate the effect heterogeneity of Job Search Programmes for
unemployed, which are part of the Swiss Active Labour Market Policy. The analysis is based on rich
data coming from the Swiss social insurance system. We combine recently developed Lasso-type
estimators from machine learning to detect such heterogeneities with inverse probability weighting to
correct for selective participation in the programme. We find that while there appear to be no
systematic heterogeneities in the medium-run effect, the so-called lock-in effect (close to the beginning of the
programme) shows considerable heterogeneities. In line with previous results in the literature,
unemployed with a priori bad chances (bad employment risks) in the labour market suffer much less
from a lock-in effect than unemployed expected to find jobs soon even without a programme.

Keywords: Active labour market policy, LASSO, machine learning, effect heterogeneity
JEL classification: I12, I18, J24, L83, C21.
Addresses for correspondence: Michael Knaus, Swiss Institute for Empirical Economic Research (SEW),
University of St. Gallen, Varnbelstrasse 14, CH-9000 St. Gallen, Switzerland, Michael.Knaus@unisg.ch.
Michael Lechner, Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen,
Varnbelstrasse 14, CH-9000 St. Gallen, Switzerland, Michael.Lechner@unisg.ch, www.michael-lechner.eu .
Anthony Strittmatter, Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen,
Varnbelstrasse 14, CH-9000 St. Gallen, Switzerland, Anthony.Strittmatter@unisg.ch,
www.anthonystrittmatter.com.

+
Michael Lechner is also affiliated with CEPR and PSI, London, CESIfo, Munich, IAB, Nuremberg, and IZA, Bonn.

*
A previous version of the paper was presented at the University of Maastricht, Department of Economics. We thank
participants for helpful comments and suggestions. The usual disclaimer applies.
1 Introduction
In this study, we employ machine learning methods for a large-scale principled investigation of

effect heterogeneity in Job Search Programmes (JSPs) in Switzerland. Programme evaluation

studies widely acknowledge the possibility of effect heterogeneity for different groups of

individuals. Stratifying the data in mutually exclusive groups or including interactions in a

regression framework are two baseline approaches to investigate effect heterogeneity (see

Athey and Imbens, 2016a, for a review). However, for large-scale investigations of effect

heterogeneity, the standard p-values of classical (single) hypothesis tests are no longer valid

because of the multiple testing problem. The multiple testing problem can lead to the reporting

of spurious heterogeneous effects that result from false positives (see, e.g., Lan et al., 2016,

List, Shaikh, and Xu, 2016).

The disadvantages of ex post data mining have been recognized in the evaluation

literature. For example, in randomized experiments, researchers may be required to define their

heterogeneity analysis plan prior to the experiment to avoid reporting (and searching for)

significant effects only (e.g., Casey, Glennerster, and Miguel, 2012, Olken, 2015). However,

these pre-analysis plans are inflexible and usually not required for observational studies. An

alternative approach that partly alleviates the ex post selection problem is to report effect

heterogeneity for all possible groups. For large-scale investigations, an approach taking account

of all possible differences might lead to very small groups and thus imprecise estimates, and

make it also difficult to report the results in an intuitive way.

A developing literature proposes to use (adapted) machine learning algorithms to

systematically search for groups with heterogeneous effects (see, e.g., the review of Athey and

Imbens, 2016b). Potentially, machine learning approaches are attractive because they could

provide a principled approach to heterogeneity detection and avoid the multiple testing

problem. In addition, they are flexible and remain computationally feasible, even when the

covariate space becomes high-dimensional and possibly exceeds the sample size. Such a

systematic search for heterogeneity may uncover substantial effect heterogeneity between a

priori unexpected groups.

Machine learning methods are powerful tools for out-of-sample predictions of

observable variables. However, the fundamental problem of causal analyses is the inability to

observe individual causal effects (see Angrist and Pischke, 2010, Imbens and Wooldridge,

2009, among many others). We are able to observe individual outcome levels under specific

treatment conditions, but the counterfactual outcome is unobservable. Recently, several

methods have been proposed to apply machine learning methods in ways which overcome the

fundamental problem of causal analyses (see, e.g., Athey and Imbens, 2016c, Foster, Taylor,

and Ruberg, 2011, Green and Kern, 2012, Grimmer, Messing, and Westwood, 2016, Imai and

Strauss, 2011, Taddy et al., 2015, Vansteelandt et al., 2008). 1 Two promising approaches are

the Modified Outcome Method (MOM, Signorovitch, 2007) and the Modified Covariate

Method (MCM, Tian et al., 2014). The modifications in both methods make it possible to approximate

group specific treatment effects in a first step. 2 Even though these group effects are (usually)

not well identified and estimated because they may relate to a high dimensional covariate space,

their approximations can be used to apply systematic group search algorithms in a second step.

However, the robustness of the estimation results across different methods and model

specifications and the efficiency of different estimation procedures are mostly unexplored.

1
Qian and Murphy (2011), Xu et al. (2015), and Zhao et al. (2012) study individualized treatment rules, which directly focus
on the decision rule instead of estimating the effect heterogeneity. Ciarleglio et al. (2015) propose a method to select the
optimal treatment conditional on observed individual characteristics. Zhao et al. (2015) investigate the optimal dynamic
order of sequential treatments.
2
While MOM approximates group specific treatment effects with the help of an outcome modification, the MCM modifies
the covariates. The MOM can be estimated fully non-parametrically (e.g., using classification tree or random forest
estimators, see, e.g., Athey and Imbens, 2016c). The MCM requires a parametric model specification, but the model can be
very flexible and efficiency improving estimation algorithms are available.

JSPs provide training in effective job search and application strategies as well as

monitoring of actual applications. An assignment to a JSP may lead to threat effects before the

programme start (see, e.g., Black et al., 2003, van den Berg, Bergemann, and Caliendo, 2008)

and may affect the matching process and quality between the participants and the potential

new job (see, e.g., Blasco and Rosholm, 2011, Cottier et al., 2017). Push effects could occur if

participants accept jobs with low matching quality because of actual or perceived sanctions or

perceived future ALMP assignments. Push effects decrease the unemployment duration, but

may reduce the employment stability. On the other hand, JSP participation could improve the

visibility of suitable job vacancies and the efficiency of the application process, which may

improve the employment stability. Furthermore, many studies are concerned about the

crowding-out of non-participants (see, e.g., Blundell et al., 2004, Crépon et al., 2013, Gautier

et al., 2017).

The empirical evidence about the effectiveness of JSPs is mixed. The review studies of

Card, Kluve, and Weber (2010, 2015) and Crépon and van den Berg (2016) document a weak

tendency towards positive effects of JSPs, especially in the short-run. 3 For the Swiss JSP we

investigate, the literature finds negative employment effects, which fade one year after the start

of participation (see Gerfin and Lechner, 2002, Lalive, van Ours, and Zweimüller, 2008). One

reason for the ambiguous effectiveness of JSPs might be different relative intensities of job

search training and monitoring. Van den Berg and van der Klaauw (2006) are concerned that

intensive monitoring reduces informal job search, which might be a more efficient strategy than

formal job search for some unemployed persons. They suggest formal job search is more

effective for individuals with few labour market opportunities. Consistent with their

3
For the US, Meyer (1995) reports negative effects on unemployment benefit payments and positive earnings effects of JSPs.
For Denmark, Graversen and van Ours (2008) and Rosholm (2008) report positive effects of JSPs on the unemployment
exit rate. For Germany, Wunsch and Lechner (2008) find that JSPs have negative effects during the first two years after the
programme start, which fade out afterwards. They show that training sequences are responsible for long-lasting negative lock-in effects.

arguments, Card, Kluve, and Weber (2015) document that job search programmes are relatively

more effective for disadvantaged participants. Vikström, Rosholm, and Svarer (2013) find

slightly more positive effects of JSP for women and younger participants. Dolton and O'Neill

(2002) report negative employment effects of JSP for men and insignificant effects for women

five years after programme start. Surprisingly, the programme evaluation literature is lacking

large-scale evidence about the effect heterogeneity of JSPs.

In this study, we systematically investigate effect heterogeneity of JSPs. Possibly some

groups of unemployed persons benefit more from JSPs than others. We base the search

algorithm on many attributes of the unemployed and their caseworkers (we consider 1,268

different variables which might be predictive for effect heterogeneity). Effect heterogeneity by

caseworker characteristics could reflect different baseline monitoring intensities (Behncke,

Frölich, and Lechner, 2010a, 2010b). Furthermore, we translate discovered heterogeneities into

information useful for decision makers. Knowledge of heterogeneous effects enables decision

makers to improve the allocation rules for JSPs such that their intended effects and cost-benefit

efficiency increase (see discussions in, e.g., Bhattacharya and Dupas, 2012, Dehejia, 2005,

Lechner and Smith, 2007).

In the active labour market programme (ALMP) evaluation literature based on the

informative data sets from administrative registers, it has become common to pursue a

selection-on-observables strategy (see, e.g., Imbens and Wooldridge, 2009). We assume that we

control for all confounders that jointly affect the probability to participate in the JSP and the re-

employment probability. This assumption is likely to hold based on the large and rich

administrative data we use (see, e.g., discussion in Biewen et al., 2014, Lechner and Wunsch,

2013). We balance the confounders with an Inverse Probability Weighting (IPW) estimator

(e.g., Hirano, Imbens, and Ridder, 2003). After calculating these IPW weights, we follow Chen

et al.'s (2017) general weighting framework for MCM that is applicable in non-experimental

empirical designs. 4 We combine the re-weighted MCM with Tibshirani's (1996) Least Absolute

Shrinkage and Selection Operator (LASSO). 5 We achieve valid confidence intervals by splitting

the sample. In particular, we use one half of the sample to select variables that are relevant to

predict the size of the treatment effect and the other half to estimate the coefficients for the

selected variables. This approach yields unbiased coefficients and honest confidence

intervals, which are valid under an independent sampling argument (see, e.g., Fithian, Sun,

Taylor, 2015, Cox, 1975).
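
The sample-splitting idea described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the estimation code of this study: the function names `lasso_cd` and `split_select_estimate`, the plain coordinate-descent LASSO, the penalty value, and the data-generating process are all assumptions made for the example.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/(2n))*||y - Xb||^2 + lam*||b||_1 (illustrative)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_resid / n
            # Soft-thresholding sets small coefficients exactly to zero.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def split_select_estimate(X, y, lam, seed=0):
    """Select variables on one random half, re-estimate by OLS on the other half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    sel_half, est_half = idx[:half], idx[half:]
    selected = np.flatnonzero(lasso_cd(X[sel_half], y[sel_half], lam) != 0)
    coef = np.zeros(0)
    if selected.size:
        coef, *_ = np.linalg.lstsq(X[est_half][:, selected], y[est_half], rcond=None)
    return selected, coef

# Hypothetical toy data: only columns 0 and 3 matter.
rng = np.random.default_rng(1)
n, p = 2000, 20
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=n)
selected, coef = split_select_estimate(X, y, lam=0.1)
print(selected, coef.round(2))
```

Because the coefficients are estimated on observations that played no role in the selection step, the usual OLS inference on the second half is not distorted by the variable search.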

Our results suggest that the different methods for principled effect heterogeneity

investigation provide useful information. The main conclusions remain robust across a variety

of different methods and model specifications. We find substantial effect heterogeneity of Swiss

JSPs during the first year after the start of participation. During this period, we observe for most

unemployed groups negative effects. The heterogeneity is strongly related to the characteristics

of the unemployed persons. Consistent with the previous literature, participants with

disadvantaged labour market characteristics benefit more from JSPs. We find only little

heterogeneity by caseworker characteristics. The effects fade after one year. We do not discover

substantial effect heterogeneity for time periods beyond one year after the programme start.

In the next section, we give information about the institutional background. In Section

3, we introduce our data. In Section 4, we discuss our econometric approach. In Section 5, we

report and discuss our results. In Section 6, we conclude. Appendices A to C give additional

detailed information on descriptive statistics, the selection model and additional results.

4
Imai and Ratkovic (2013) and Zhang et al. (2012) develop alternative non-experimental approaches for principled effect
heterogeneity search.
5
Additionally, we consider other LASSO-type estimators (Adaptive LASSO, Post LASSO), different procedures to specify
tuning parameters (Cross-validation) and compare the MCM and MOM. To this end, we investigate the consistency of our
results across different procedures to investigate effect heterogeneity.
2 Institutional Background
Switzerland is a federal country with 26 cantons and three major language regions (French,

German, and Italian). It is a relatively wealthy country with approximately 78,000 CHF

(approx. 77,000 US-Dollar) GDP per capita and has a low unemployment rate of 3-4% (SECO,

2017, Bundesamt für Statistik, 2017). Unemployed persons have to register at the regional

employment agency closest to their home. 6 The employment agency pays income maintenance.

Benefits amount to 70-80% of the former salary depending on age, children, and past salary

(see Behncke, Frölich, and Lechner, 2010a). The maximum benefit entitlement period is 24

months.

The yearly expenditures for Swiss active labour market programs (ALMP) exceed 500

mio. CHF (Morlok et al., 2014). Unemployed persons can participate in a variety of different

ALMP. Gerfin and Lechner (2002) classify these ALMP as (a) training courses, (b)

employment programs, and (c) temporary employment schemes. Training courses include job

search, personality, language, computer, vocational, and employment programs. We focus

exclusively on JSPs in this study, because this is the most common ALMP in Switzerland (more

than 50% of the assigned ALMPs are JSPs, Huber, Lechner, and Mellace, 2017) and provides

a sufficient number of observations to search for heterogeneity. JSPs provide training in

effective job search and application strategies (e.g., training in résumé writing). Furthermore,

actual applications are screened and monitored. The JSPs we consider are relatively short, with

an average duration of about three weeks. Training takes place in classrooms. The employment

agency covers the costs of training and travel costs. Participants are obliged to continue job

search during the course.

6
At the beginning of the unemployment spell, newly registered unemployed are often sent to a one-day workshop providing
information about the unemployment law, obligations and rights, job search requirements, etc.

In Switzerland, the regional employment agencies have a large degree of autonomy,

which is partly related to the country's federal organisation. The assignment decision to a

training course is made by caseworkers based on their subjective impression combined with

local office policy and national eligibility rules. The national rules are however rather vague.

They imply, for example, that the training has to be necessary and adequate to improve the

individual employment chances. Unemployed persons have the possibility to apply for

participating in such courses, but the final decision is always made by the caseworkers.

Caseworkers can also essentially force unemployed persons into such courses by threatening to impose

sanctions.

3 Data

3.1 General

We use data including all individuals who are registered as unemployed at Swiss regional

employment agencies in the year 2003. The data contains rich information from different

unemployment insurance databases (AVAM/ASAL) and social security records (AHV). This

is the standard data used for many Swiss ALMP evaluations (e.g. Gerfin and Lechner, 2002,

Lalive, van Ours, and Zweimüller, 2008, Lechner and Smith, 2007). We observe (among others)

nationality, qualification, education, language skills, employment history, profession, position,

industry of last job, desired occupation and industry, and an employability rating by the

caseworkers. The data contains detailed (de-)registration dates, and participation in ALMP. The

administrative data is linked with regional labour market characteristics, such as the population

size of municipalities and the cantonal unemployment rate. The availability of caseworker

information distinguishes our data. Swiss caseworkers employed in the period 2003-2004 were

surveyed based on a written questionnaire in December 2004 (see Behncke, Frölich, and Lechner,

2010a, 2010b). The questionnaire contained questions about aims, strategies, processes of the

caseworker and the regional employment agency.

3.2 Sample definition

In total, 238,902 persons registered as being unemployed in 2003. Here, we only consider the

first unemployment registration per individual in 2003. Each registered unemployed person is

assigned to a caseworker. In most cases, the same caseworker is responsible for the entire

unemployment spell of his/her client. If this is not the case, we focus on the first caseworker to

avoid concerns about (rare) endogenous caseworker changes (see Behncke, Frölich, and Lechner,

2010a). We only consider unemployed aged between 24 and 55 years who receive

unemployment insurance benefits. We omitted unemployed persons who apply for disability

insurance benefits, when the responsible caseworker is not clearly defined, or when their

caseworkers did not answer the questionnaire (that had a response rate of 84%). We drop

unemployed foreigners with a residence permit that is valid for less than a year. Finally, we

drop unemployed persons from a few regional employment agencies that are not comparable to

the other regional employment agencies. This sample is identical to the data used in Huber,

Lechner, and Mellace (2017). It contains 100,120 unemployed persons.

One concern regarding the treatment definition is the timing with respect to the elapsed

unemployment duration prior to participation: Caseworkers may assign unemployed persons to

job training programmes essentially anytime during their unemployment spell. Several points

have to be taken into account to address this issue (see the discussions in Abbring and van den

Berg, 2003, 2004, Fredriksson and Johansson, 2008, Heckman and Navarro, 2007, Lechner,

2009, Robins, 1986, Sianesi, 2004, among others).

We consider a classical static evaluation model and use the following definitions. The

treatment is defined as the first participation in a job search program during the first six months

of unemployment (83% of JSP are assigned within the first six months of unemployment). We

exclude individuals who participate in other ALMP within the first six months of

unemployment from the sample, such that our control group represents non-participants of all

programmes (8,781 other ALMP participants are dropped). Potentially this approach could lead

to a higher share of individuals with better labour market characteristics among the control group

than among the training participants, because individuals in the control group may have

already found a job prior to their potential treatment times. This would bias the results negatively. We

randomly assign (pseudo) participation starts to each individual in the control group. Thereby,

we recover the distribution of the elapsed unemployment duration at the time of training

participation from the treatment group (similar to, e.g., Lechner, 1999, Lechner and Smith,

2007). To ensure comparability of the treatment definitions of the participants and

non-participants, we only consider individuals who are unemployed at their (pseudo) treatment

dates. This makes the groups of participants and non-participants comparable with respect to

the duration of unemployment and ensures that treated and controls are eligible for programme

participation at their respective assigned start dates.
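
The assignment of pseudo start dates can be illustrated as follows. The sketch assumes that pseudo starts are drawn from the empirical distribution of elapsed unemployment durations observed among participants; the function name `assign_pseudo_starts`, the monthly time unit, and the simulated inputs are hypothetical, and the procedure used in this study may differ in detail.

```python
import numpy as np

def assign_pseudo_starts(treated_elapsed, control_durations, seed=0):
    """Draw one pseudo start per control from the participants' elapsed durations.

    Returns the pseudo starts and a mask of controls who are still
    unemployed at their pseudo start (and therefore remain in the sample).
    """
    rng = np.random.default_rng(seed)
    pseudo = rng.choice(treated_elapsed, size=len(control_durations), replace=True)
    still_unemployed = control_durations >= pseudo
    return pseudo, still_unemployed

# Hypothetical inputs, measured in months.
rng = np.random.default_rng(42)
treated_elapsed = rng.integers(1, 7, size=1_000)     # programme starts in months 1..6
control_durations = rng.integers(1, 25, size=5_000)  # unemployment spells of 1..24 months
pseudo, keep = assign_pseudo_starts(treated_elapsed, control_durations)
print(f"controls kept: {keep.sum()} of {keep.size}")
```

Dropping controls who have already left unemployment before their pseudo start removes the short spells that would otherwise make the control group look better than the participants.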

The final sample contains 85,185 unemployed individuals (Table A.1 in Appendix A

provides the details of the sample selection steps). From this sample, 12,998 unemployed

individuals participate in a job search program and 72,187 are members of the control group.

These unemployed are assigned to 1,282 different caseworkers.

3.3 Descriptive statistics


Table 3.1 reports the means and standard deviations for the main variables for participants and

non-participants. The descriptive statistics for all other variables are shown in Table B.1 in

Appendix B. During the first 31 months after the start of training, participants in JSPs are employed for

fewer months than non-participants. Six months after the start of participation, we find

strong negative differences with absolute standardized differences above 20%. For the other time periods

the standardized differences are smaller. During months 25 to 31 after the start of training,

participants have a slightly higher employment intensity.

Table 3.1: Descriptive statistics of important variables by treatment status

                                                      Participants   Non-Participants Std. Diff.
                                                      Mean    S.D.      Mean    S.D.       in %
Outcomes
  Cumulated employment in first 6 months              1.21    1.93      1.94    2.44     -23.29
  Cumulated employment in first 12 months             3.68    4.27      4.53    4.80     -13.12
  Cumulated employment in first 31 months            15.30   12.49     15.59   12.85      -1.60
  Cumulated employment in months 25 - 31              3.48    2.88      3.33    2.86       3.72
Individual characteristics
  Female                                              0.45       -      0.44       -       0.58
  Age in 10 years                                     3.73    0.88      3.66    0.86       5.59
  Qualification: unskilled                            0.22       -      0.23       -      -1.80
  Qualification: some degree                          0.60       -      0.56       -       5.19
  Employability rating: low                           0.12       -      0.14       -      -3.97
  Employability rating: medium                        0.77       -      0.74       -       5.79
  Employability rating: high                          0.11       -      0.12       -      -3.62
  Number of unemployment spells in last two years     0.41    0.98      0.64    1.27     -13.85
  Fraction of months employed in last two years       0.83    0.22      0.79    0.25      12.57
  Past income in CHF 10,000                           4.58    2.02      4.16    2.05      14.50
Caseworker characteristics
  Age in 10 years                                     4.43    1.16      4.44    1.16      -0.77
  Female                                              0.45       -      0.41       -       6.94
  Tenure in employment office in years                5.54    3.23      5.86    3.31      -6.84
  Own experience of unemployment                      0.63       -      0.63       -       0.54
  Degree in vocational training for caseworkers       0.26       -      0.23       -       5.63
Local labour market characteristics
  German speaking employment office                   0.89       -      0.67       -      39.68
  French speaking employment office                   0.08       -      0.25       -     -33.30
  Italian speaking employment office                  0.03       -      0.08       -     -16.81
  Cantonal unemployment rate in %                     3.64    0.77      3.75    0.86      -9.23
  GDP per capita in the canton in CHF 10,000          5.13    0.92      4.92    0.93      15.75
Number of caseworkers                                  989               1,282
Number of observations                              12,998              72,187

Note: Unconditional means of all variables, standard deviations of non-binary variables, and standardized differences in %
between participants and non-participants. The descriptive statistics of all variables are in Table B.1 of Appendix B. See
Rosenbaum and Rubin (1983) for a definition of the standardized difference. They consider an absolute standardized
difference of more than 20 as being large.
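
For orientation, the standardized differences in Table 3.1 can be approximately reproduced from the reported means and standard deviations. The sketch below assumes the normalization 100 times the difference in means divided by the square root of the sum of the two variances. This normalization is inferred from the reported numbers rather than stated in the text (other conventions divide the variance sum by two), and small deviations from the table arise because the inputs are rounded. The function name `standardized_difference` is hypothetical.

```python
import math

def standardized_difference(mean_1, sd_1, mean_0, sd_0):
    """Difference in means in % of the square root of the summed variances (assumed form)."""
    return 100.0 * (mean_1 - mean_0) / math.sqrt(sd_1 ** 2 + sd_0 ** 2)

# First outcome row of Table 3.1 (cumulated employment in first 6 months).
std_diff_6m = standardized_difference(1.21, 1.93, 1.94, 2.44)
print(round(std_diff_6m, 2))  # close to the reported -23.29, up to rounding of the inputs
```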

Furthermore, we observe differences in individual characteristics, the characteristics of

their caseworkers, and the local labour market conditions. Participants have been more months

employed and received a higher income than non-participants in the last two years. We find

only small differences between the caseworkers of participants and non-participants. 7 Finally, we

find participants are more often registered at a German speaking local employment agency and

live in cantons with better economic conditions (in terms of local GDP and unemployment rate)

than non-participants.

4 Econometric approach

4.1 Basic concept and identification


We describe the parameters of interest using the causality notion introduced in Rubin's (1974) potential outcome framework. Let $Y_i^1$ denote the potential outcome (i.e., months of employment) when individual $i$ (for $i = 1, \dots, N$) participates in a JSP ($D_i = 1$). Conversely, $Y_i^0$ denotes the potential outcome when individual $i$ is not participating in a JSP ($D_i = 0$). Obviously, each individual can either participate or not, but both cannot occur simultaneously. This implies that only one potential outcome is observable. The observed outcome equals

$$Y_i = Y_i^1 D_i + Y_i^0 (1 - D_i).$$

It is common to define the causal treatment effect of $D$ on $Y$ for an individual as

$$\gamma_i = Y_i^1 - Y_i^0.$$

Even in randomized experiments, the individual causal effects, $\gamma_i$, are not identified; however, suitable average values may be of interest and are potentially identifiable under additional assumptions. Standard examples are the average treatment effect (ATE), $\gamma = E[\gamma_i]$, or the average treatment effect on the treated (ATET), $\theta = E[\gamma_i \mid D_i = 1]$. The former represents the

7
In Table B.1 in Appendix B we also show caseworker characteristics interacted with the language of the local employment
agency. For the interacted variables we find partly strong differences between participants and non-participants.

average causal effect for the population of unemployed, the latter the average for participants

only. ATE and ATET might differ in the presence of selection into treatment and effect

heterogeneity.
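
A small simulation illustrates why the two averages can differ. It is a hypothetical example with made-up numbers: one binary characteristic drives both the size of the individual effect and the participation probability, so participants are a selected group with larger effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.binomial(1, 0.5, size=n)          # one binary characteristic
y0 = rng.normal(size=n)                   # potential outcome without the programme
effect = np.where(z == 1, 2.0, 0.0)       # heterogeneous individual effects
y1 = y0 + effect                          # potential outcome with the programme

p = np.where(z == 1, 0.8, 0.2)            # selection into treatment depends on z
d = rng.binomial(1, p)
y = y1 * d + y0 * (1 - d)                 # only one potential outcome is observed

ate = effect.mean()                       # average over the whole population
atet = effect[d == 1].mean()              # average over participants only
print(f"ATE  = {ate:.2f}")
print(f"ATET = {atet:.2f}")
```

In this design the population average is about 1.0, while the average among participants is about 1.6, because 80% of the participants belong to the group with the larger effect. In real data neither quantity can be computed this way, since `effect` is never observed.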

Both aforementioned average effects could hide interesting effect heterogeneity. Therefore, researchers are increasingly interested in the conditional average treatment effects (CATEs). CATEs report causal treatment effects for a specific group characterized by the variables in the vector $Z_i$,

$$\gamma(z) = E[\gamma_i \mid Z_i = z] = E[Y_i^1 - Y_i^0 \mid Z_i = z],$$

$$\theta(z) = E[\gamma_i \mid Z_i = z, D_i = 1] = E[Y_i^1 - Y_i^0 \mid Z_i = z, D_i = 1],$$

where we indicate random variables by capital letters and realizations of these random variables by small letters. The vector $Z_i$ contains variables that are potentially relevant for effect heterogeneity.

In addition to the variables included in the vector $Z_i$, we consider confounding variables which are included in the vector $X_i$. Confounders are variables which jointly affect the probability to participate in JSP and the employment outcome. $Z_i$ may be larger, smaller, partially or fully overlapping with $X_i$ depending on the question under investigation. We impose the following assumptions to achieve identification.

Assumption 1 (Conditional independence): $Y_i^1, Y_i^0 \perp D_i \mid X_i = x, Z_i = z$ for all values of $x$ and $z$ in the support.

Assumption 2 (Common support): $0 < P(D_i = 1 \mid X_i = x, Z_i = z) = p(x, z) < 1$ for all values of $x$ and $z$ in the support.

Assumption 3 (Exogeneity of controls): $X_i^1 = X_i^0$, $Z_i^1 = Z_i^0$.

Assumption 1 states that the potential outcomes are independent of program

participation conditional on confounding and heterogeneity variables. The plausibility of this

assumption is justified by the availability of a very detailed set of confounding variables

containing characteristics of unemployed and caseworkers. The studies of Biewen et al. (2014)

and Lechner and Wunsch (2013) discuss the selection of confounders in ALMP evaluations

based on rich administrative data. The decision for program participation is mainly driven by

the caseworker, who has high autonomy within the employment office. Our data contain the

same objective measures about labour market history, education and socio-demographics of the

unemployed as well as local labour market characteristics that are observable to the caseworker

when deciding about program participation. Additionally, we observe detailed information

about the caseworker and her counselling style. These are potential confounders as caseworker

characteristics might affect the probability of participation and labour market outcomes

simultaneously. In our baseline estimation, we adapt the specification of Huber, Lechner, and

Mellace (2017).

Under Assumption 2 the conditional probability to participate in JSP lies between zero

and one. The common support assumption has to hold when conditioning jointly on X and Z .

By Bayes' rule, this assumption implies $0 < p(x) < 1$ and $0 < p(z) < 1$ for $p(x) = P(D_i = 1 \mid X_i = x)$ and $p(z) = P(D_i = 1 \mid Z_i = z)$. We enforce common support by trimming observations below the 0.5 quantile of the participants and above the 99.5 quantile of the non-participants. 8 This procedure shows good finite sample performance in the study of Lechner and Strittmatter (2017). Assumption 3 requires exogeneity of confounding and heterogeneity

variables. To account for this assumption, we only use control variables which are determined

prior to the start of JSP participation.
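
The trimming rule can be sketched as a simple mask on the estimated propensity score. The sketch reads the thresholds as the 0.5% and 99.5% quantiles of the propensity score distribution, which is an interpretation rather than something the text states explicitly; the function name `common_support_mask` and the simulated scores are hypothetical.

```python
import numpy as np

def common_support_mask(pscore, d, lower_q=0.005, upper_q=0.995):
    """Keep observations between a lower quantile of the participants' scores
    and an upper quantile of the non-participants' scores (assumed rule)."""
    lower = np.quantile(pscore[d == 1], lower_q)
    upper = np.quantile(pscore[d == 0], upper_q)
    return (pscore >= lower) & (pscore <= upper)

# Hypothetical propensity scores and treatment indicators.
rng = np.random.default_rng(0)
pscore = rng.uniform(0.01, 0.99, size=20_000)
d = rng.binomial(1, pscore)
mask = common_support_mask(pscore, d)
print(f"trimmed: {(~mask).sum()} of {mask.size}")
```

The lower bound removes non-participants whose participation probability is so low that hardly any comparable participant exists, and the upper bound does the reverse for participants.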

8
Total number of trimmed observations: 6,767 (579 participants, 6,188 non-participants).

Theorem 1 (Identification): Under Assumptions 1-3 the following equalities hold

$$\gamma(z) = E_{X \mid Z = z}\left[ E(Y_i \mid Z_i = z, X_i, D_i = 1) \mid Z_i = z \right] - E_{X \mid Z = z}\left[ E(Y_i \mid Z_i = z, X_i, D_i = 0) \mid Z_i = z \right],$$

$$\theta(z) = E(Y_i \mid Z_i = z, D_i = 1) - E_{X \mid Z = z}\left[ E(Y_i \mid Z_i = z, X_i, D_i = 0) \mid Z_i = z, D_i = 1 \right].$$

Thus $\gamma(z)$ and $\theta(z)$ are identified from observable data on $\{y_i, d_i, z_i, x_i\}_{i=1}^{N}$.

The proof of Theorem 1 is in Appendix C. When the vectors $X_i$ and $Z_i$ are identical, then $\gamma(x) = \theta(x) = E(Y_i \mid X_i = x, D_i = 1) - E(Y_i \mid X_i = x, D_i = 0)$, but this does not allow searching for groups of unemployed persons with relevant effect heterogeneity.

4.2 Modified covariate method (MCM)

Our baseline analysis applies the MCM introduced by Tian et al. (2014) to systematically

uncover effect heterogeneity. To gain some intuition about this method, assume participation

in a programme is randomly assigned to 50% of the unemployed persons. Accordingly, in this introductory example there is no need to adjust for selection bias using confounding variables $X_i$. Furthermore, suppose $Z_i$ contains a constant term and one additional variable $Z_{i1}$, which is responsible for effect heterogeneity. A standard linear regression model is

$$Y_i = Z_i \beta_s + D_i Z_i \delta + u_i. \qquad (1)$$

The first term on the right side of equation (1) provides the linear approximation of the conditional expectation of the outcome under non-participation, $z \beta_s = E[Y_i^0 \mid Z_i = z]$. The second term on the right side of (1) provides a linear approximation of the CATE

$$\gamma(z) = z \delta = E(Y_i^1 - Y_i^0 \mid Z_i = z).$$

Tian et al. (2014) propose to include the modified variable $T_i = 2 D_i - 1$ in the regression model

$$Y_i = Z_i \beta_t + \frac{T_i Z_i}{2} \delta + v_i. \qquad (2)$$

The treatment indicator is shifted from $D_i \in \{0, 1\}$ to $T_i / 2 \in \{-0.5, 0.5\}$. The modification does not alter the parameters $\delta$, but $z \beta_t$ now becomes the linear approximation of $E[Y_i \mid Z_i = z]$. Notice that $Cov(Z_{i1}, T_i Z_{i1}) = Var(Z_{i1}) E[T_i] = 0$, where the first equality holds under random assignment of training participation and the second equality holds because $E[T_i] = 0$. 9

Accordingly, the right hand terms of equation (2) are uncorrelated with each other and the coefficients $\beta_t$ and $\delta$ can be estimated in two separate bivariate regressions. For example, we can estimate CATEs with the model

$$Y_i = \frac{T_i Z_i}{2} \delta + \varepsilon_i,$$

which we call the MCM model in the following. 10 The MCM is a suitable method when only

the interaction effects and not the main effects are of interest. Parsimony and robustness to

misspecification of the main effects are two advantages of the MCM compared to specification

(1). These properties carry over to the non-experimental setting, which we discuss next.
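The intuition for the randomised case can be checked in a small simulation. The following is a minimal sketch, not the authors' implementation: the data-generating process, the sample size, and the function name `simulate_mcm` are assumptions made purely for illustration. It regresses the outcome on the modified covariates $T_i Z_i / 2$ only and leaves a deliberately nonlinear main effect unmodelled.

```python
import math
import random


def simulate_mcm(n=20000, seed=0):
    """Estimate delta by regressing Y on (T/2, T*z1/2) without main effects."""
    rng = random.Random(seed)
    # Running sums for the 2x2 normal equations of the MCM regression.
    s11 = s12 = s22 = r1 = r2 = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        d = 1 if rng.random() < 0.5 else 0   # 50% random assignment
        t = 2 * d - 1                         # T = 2D - 1, so T/2 is -0.5 or 0.5
        main = 2.0 + math.sin(3.0 * z1)       # nonlinear main effect, never modelled
        cate = -0.8 + 0.5 * z1                # assumed true CATE: delta = (-0.8, 0.5)
        y = main + d * cate + rng.gauss(0.0, 1.0)
        x1 = t / 2.0                          # modified constant term
        x2 = t * z1 / 2.0                     # modified covariate T*z1/2
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    det = s11 * s22 - s12 * s12
    delta0 = (r1 * s22 - r2 * s12) / det
    delta1 = (s11 * r2 - s12 * r1) / det
    return delta0, delta1


if __name__ == "__main__":
    print(simulate_mcm())
```

Under these assumptions the two estimates should lie close to the assumed values of -0.8 and 0.5 even though the main effect is misspecified (in fact omitted), which is the robustness property described above.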

Chen et al. (2017) show that MCM can be combined with IPW re-weighting, a standard

approach to balance covariates in observational studies (Hirano, Imbens, and Ridder, 2003).

The parameters $\delta$ can be estimated using Weighted Ordinary Least Squares (WOLS), i.e. by minimising the objective function

$$\hat{\delta} = \arg\min_{\delta} \frac{1}{N} \sum_{i=1}^{N} w(d_i, x_i, z_i) \left( y_i - \frac{t_i z_i}{2}\delta \right)^2 \qquad (3)$$

⁹ In contrast, $Cov(Z_{i1}, D_i Z_{i1}) = Var(Z_{i1}) E[D_i] = Var(Z_{i1})/2 > 0$.
¹⁰ Alternatively, the modified outcome method (MOM) makes the outcome modification $Y_i^* = 2 T_i Y_i$, which enables the non-parametric, $\gamma(z) = E[Y_i^* | Z_i = z]$, or parametric, $Y_i^* = Z_i \delta + r_i$, estimation of CATEs.

where the IPW weights

$$w(d_i, x_i, z_i) = t_i \, \frac{d_i - P(D_i = 1 | X_i = x_i, Z_i = z_i)}{P(D_i = 1 | X_i = x_i, Z_i = z_i) \left[ 1 - P(D_i = 1 | X_i = x_i, Z_i = z_i) \right]}$$

are calculated using the estimated propensity score.
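To make the weights concrete, the sketch below evaluates the weight formula for given propensity scores and plugs the result into a one-regressor WOLS. It is only an illustration: the function names and the toy numbers are hypothetical, and the propensity scores are taken as given rather than estimated.

```python
def ipw_weight(d, p):
    """Return w = t * (d - p) / (p * (1 - p)) with t = 2d - 1.

    This equals 1/p for participants (d = 1) and 1/(1 - p) for
    non-participants (d = 0).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    t = 2 * d - 1
    return t * (d - p) / (p * (1.0 - p))


def wols_single(y, x, w):
    """Weighted least squares slope of y on a single regressor x, no intercept."""
    numerator = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denominator = sum(wi * xi * xi for wi, xi in zip(w, x))
    return numerator / denominator


if __name__ == "__main__":
    # Toy data (hypothetical): one participant, one non-participant, z_i = 1.
    y = [3.0, 1.0]
    d = [1, 0]
    p = [0.5, 0.5]
    w = [ipw_weight(di, pi) for di, pi in zip(d, p)]
    x = [(2 * di - 1) / 2.0 for di in d]   # modified covariate t_i * z_i / 2
    print(w, wols_single(y, x, w))
```

With only a constant in $Z_i$, the WOLS coefficient in this toy example reduces to the weighted difference in mean outcomes between the two groups.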

4.3 Principled search for effect heterogeneity


An attractive estimator from the machine learning literature to conduct a principled search for

predictive variables is Tibshirani's (1996) Least Absolute Shrinkage and Selection Operator

(LASSO, see, e.g., Hastie, Tibshirani, Wainwright, 2015, Horowitz, 2015, for recent

developments of LASSO). The weighted LASSO estimator of the MCM minimizes the

objective function

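$$\hat{\delta} = \arg\min_{\delta} \frac{1}{N} \sum_{i=1}^{N} w(d_i, x_i, z_i) \left( y_i - \frac{z_i t_i}{2}\delta \right)^2 + \lambda_N \sum_{j=1}^{p} |\delta_j| \qquad (4)$$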

where we add a penalization term for the sum of the absolute values of the coefficients of the $p$ variables in $Z$. The penalizing parameter $\lambda_N$ specifies the amount of penalization. If $\lambda_N = 0$, then (4) is equivalent to the WOLS model in (3). However, when $\lambda_N > 0$ some coefficients are shrunken towards zero. For sufficiently high values of $\lambda_N$ some coefficients become exactly zero. Therefore, the LASSO serves as a model selector, omitting variables with little predictive power from the model.¹¹ A challenge is the optimization of the penalty term, such that only the relevant

predictors of the CATE remain in the model. Too low penalties lead to overfitting, too high

penalties lead to models which miss important variables. We use cross-validation to optimize

the penalty term.
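The shrinkage and selection behaviour of the weighted LASSO can be illustrated with a small coordinate-descent routine. This is a sketch under stated assumptions, not the estimation code behind the results: the simulated data, the weights, and the function names are hypothetical, and the regressors stand in for the modified covariates $z_i t_i / 2$.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0 if it would cross zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(x, y, w, lam, n_iter=100):
    """Minimise (1/N) sum_i w_i (y_i - x_i'b)^2 + lam * sum_j |b_j|."""
    n, p = len(y), len(x[0])
    b = [0.0] * p
    a = [sum(w[i] * x[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residuals y - x'b for the current b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            # Weighted covariance of regressor j with the partial residual
            # that excludes the current contribution of coordinate j.
            rho = sum(w[i] * x[i][j] * (resid[i] + x[i][j] * b[j])
                      for i in range(n)) / n
            new_b = soft_threshold(rho, lam / 2.0) / a[j]
            if new_b != b[j]:
                change = new_b - b[j]
                for i in range(n):
                    resid[i] -= x[i][j] * change
                b[j] = new_b
    return b


def demo(seed=0, n=300):
    """Simulated data (hypothetical): two relevant and three irrelevant regressors."""
    rng = random.Random(seed)
    true_b = [1.0, -0.5, 0.0, 0.0, 0.0]
    x = [[rng.gauss(0, 1) for _ in true_b] for _ in range(n)]
    w = [rng.uniform(1.0, 3.0) for _ in range(n)]
    y = [sum(bj * v for bj, v in zip(true_b, row)) + rng.gauss(0, 0.5) for row in x]
    return x, y, w


if __name__ == "__main__":
    x, y, w = demo()
    for lam in (0.0, 0.2, 1.0, 100.0):
        print(lam, [round(bj, 3) for bj in weighted_lasso(x, y, w, lam)])
```

With a zero penalty the routine reproduces weighted least squares; with a very large penalty every coefficient is set exactly to zero, which is the path from the full model to the empty model described in the text.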

¹¹ The larger the value of $\lambda_N$, the fewer variables remain in the model. By gradually increasing the penalty term one can obtain a path from a full model to a model which only contains the constant term.

A common approach to select the optimal penalty term is to use cross-validation (e.g.,

Bühlmann and van der Geer, 2011, Chen et al., 2017, Tian et al., 2014). Following this

approach, we apply 10-fold cross-validation to find the penalty term with the best out-of-sample

performance. This means we randomly split the sample into ten subsamples of equal size. The

weighted LASSO is then estimated on a grid of different penalties using only nine of the

subsamples. The remaining subsample is used to evaluate the out-of-sample root mean squared

error (RMSE) for each penalty term on the grid. This is done ten times such that every

subsample is left out once. The average over the ten RMSE paths approximates the out-of-

sample RMSE. The penalty term with the minimum average RMSE along the grid is used to

estimate the weighted LASSO in the full sample.
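The cross-validation steps described above can be sketched as follows. To keep the example short it uses a single regressor, for which the LASSO has a closed form, and it omits the IPW weights and the Post-LASSO refit; the data, the penalty grid, and all function names are assumptions for illustration only.

```python
import math
import random


def make_folds(n, k=10, seed=0):
    """Randomly split the indices 0..n-1 into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_lasso_1d(x, y, lam):
    """Closed-form LASSO for one regressor: min (1/n) sum (y - x*b)^2 + lam*|b|."""
    n = len(y)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n
    a = sum(xi * xi for xi in x) / n
    shrunk = max(abs(rho) - lam / 2.0, 0.0)
    return math.copysign(shrunk, rho) / a


def cv_select(x, y, grid, k=10, seed=0):
    """Return the penalty with the lowest average out-of-sample RMSE."""
    folds = make_folds(len(y), k, seed)
    avg_rmse = []
    for lam in grid:
        rmses = []
        for hold in folds:
            hold_set = set(hold)
            train = [i for i in range(len(y)) if i not in hold_set]
            b = fit_lasso_1d([x[i] for i in train], [y[i] for i in train], lam)
            mse = sum((y[i] - x[i] * b) ** 2 for i in hold) / len(hold)
            rmses.append(math.sqrt(mse))
        avg_rmse.append(sum(rmses) / k)   # average over the k RMSE values
    best = min(range(len(grid)), key=lambda g: avg_rmse[g])
    return grid[best], avg_rmse


def demo_data(n=200, seed=1):
    """Simulated data (hypothetical): y = x + noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 0.5) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = demo_data()
    print(cv_select(x, y, [0.0, 0.05, 0.2, 1.0, 100.0]))
```

The selected penalty would then be used to re-estimate the model on the full sample, as described in the text.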

The LASSO coefficients are biased when $\lambda_N > 0$ (see, e.g., Zou, 2006). For this reason,

we use the so-called Post-LASSO coefficients to calculate the RMSE in the cross-validation.

The Post-LASSO coefficients are obtained from a WOLS (similar to model (3)) which includes

all variables with non-zero coefficients in the LASSO at the considered penalty value (see, e.g.

Belloni, Chernozhukov, Hansen, 2013).

The 10-fold cross-validated Post-LASSO selects a set of heterogeneity variables that can be used to calculate the predicted CATE $\hat{\gamma}(z) = z\hat{\delta}$. However, selecting and estimating these coefficients in the same sample can lead to invalid inference, because the selected model would be tailored to that particular sample. To overcome such concerns, we randomly split the sample into two parts of similar size. The first sample is used to select the variables with the described procedure and the second sample is used to estimate $\delta$ with the selected model. Under independence between the selection and estimation samples, standard WOLS statistical inference is valid for the selected variables. Taking these coefficients, we are able to calculate the predicted treatment effect of each observation in our sample using $\hat{\gamma}(z_i) = z_i\hat{\delta}$.

We observe that different random splits of the sample lead to different models and that the overlap of the selected variables across splits is small. However, the correlations of the predicted individual effects $\hat{\gamma}(z_i)$ across different random splits are usually above 0.5. We conclude that the selected variables are not identical but contain similar relevant information for effect heterogeneity across sample splits. The predicted individual effects are therefore averaged over 20 different random splits to reduce model dependency and to smooth outliers resulting from potential extrapolation in specific models.¹² These average predicted individual effects, $\bar{\gamma}(z_i)$, are then used to summarize the detected effect heterogeneity.
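The aggregation over sample splits amounts to averaging per-split predictions and, as a diagnostic, correlating them across splits. The helper functions below are a minimal sketch; the names and the toy predictions are hypothetical and only two splits are shown instead of 20.

```python
import math


def average_over_splits(predictions):
    """predictions[s][i] is the predicted effect of observation i in split s."""
    n_splits = len(predictions)
    n_obs = len(predictions[0])
    return [sum(p[i] for p in predictions) / n_splits for i in range(n_obs)]


def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


if __name__ == "__main__":
    # Hypothetical predicted effects for three observations from two splits.
    split_1 = [-1.0, -0.5, 0.0]
    split_2 = [-0.8, -0.6, 0.2]
    print(average_over_splits([split_1, split_2]))
    print(pearson(split_1, split_2))
```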

In our baseline model, we adapt the propensity score specification of Huber, Lechner,

and Mellace (2017), which is reported in Table D.1 of Appendix D. The set of candidate

heterogeneity variables consists of individual and caseworker characteristics, their second order

interactions, up to fourth order polynomials and logged versions of non-binary variables.

Additionally, we consider dummy variables for the 103 employment agencies as well as 29

category dummies for previous industry and 29 category dummies for previous job description.

We exclude variables where fewer than 1% of (non-)participants show non-zero values. Further, we keep only one variable of variable pairs that show correlations higher than 0.99. In total, this leads to 1,268 heterogeneity variables which we consider in the analyses.
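The two screening rules can be sketched as a simple filter. The sketch assumes that the 1% rule is applied separately to participants and non-participants and that, within a highly correlated pair, the variable encountered first is kept; both are assumptions, as are the function names and the toy data.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if one of the variables is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    if s_aa == 0.0 or s_bb == 0.0:
        return 0.0
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return s_ab / math.sqrt(s_aa * s_bb)


def nonzero_share(values, keep):
    """Share of non-zero values among the observations flagged in keep."""
    selected = [v for v, k in zip(values, keep) if k]
    return sum(1 for v in selected if v != 0) / len(selected)


def screen_variables(columns, treated, min_share=0.01, max_corr=0.99):
    """Return the names of candidate variables that pass both screening rules."""
    is_treated = [d == 1 for d in treated]
    is_control = [d == 0 for d in treated]
    kept = []
    for name, values in columns.items():
        share = min(nonzero_share(values, is_treated),
                    nonzero_share(values, is_control))
        if share < min_share:
            continue  # too few non-zero values in one of the two groups
        if any(pearson(values, columns[other]) > max_corr for other in kept):
            continue  # nearly collinear with a variable that is already kept
        kept.append(name)
    return kept


if __name__ == "__main__":
    n = 200
    treated = [i % 2 for i in range(n)]
    columns = {
        "age": [20 + i % 50 for i in range(n)],
        "age_copy": [2 * (20 + i % 50) for i in range(n)],
        "rare": [1 if i == 0 else 0 for i in range(n)],
    }
    print(screen_variables(columns, treated))
```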

4.4 Efficiency augmentation

In MCM the main effects do not have to be specified. Nevertheless, Tian et al. (2014) and Chen

et al. (2017) show that accounting for the main effects can improve performance of the estimator

for $\delta$, because they can absorb variation in the outcome which is unrelated to the effect

heterogeneity. They propose two ways to account for the main effects.

¹² This is in the spirit of bootstrap aggregation ("bagging") in the machine learning literature (Breiman, 1996).

First, the one-step procedure selects main effects and interaction effects simultaneously

by solving

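$$(\hat{\beta}, \hat{\delta}) = \arg\min_{\beta, \delta} \frac{1}{N} \sum_{i=1}^{N} w(d_i, x_i, z_i) \left( y_i - z_i\beta - \frac{z_i t_i}{2}\delta \right)^2 + \lambda_N \sum_{j=1}^{p} |\delta_j|.$$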

We use this procedure in our main specifications.

Second, the two-step procedure first estimates a model for the main effects. Afterwards, the residuals $\hat{u}_i$ of the first-step regression are used as the new outcome when selecting the interaction effects

$$\hat{\delta} = \arg\min_{\delta} \frac{1}{N} \sum_{i=1}^{N} w(d_i, x_i, z_i) \left( \hat{u}_i - \frac{z_i t_i}{2}\delta \right)^2 + \lambda_N \sum_{j=1}^{p} |\delta_j|.$$
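The potential gain from the two-step procedure can be illustrated in a small Monte Carlo exercise. The sketch below is a simplification under stated assumptions: treatment is randomised (so all weights are equal), no penalty is applied, the first step is plain OLS on the main effects, and the data-generating process and function names are hypothetical.

```python
import random


def ols_pair(x1, x2, y):
    """Least squares of y on two regressors without intercept (2x2 normal equations)."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    r1 = sum(a * c for a, c in zip(x1, y))
    r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det


def one_draw(rng, n, augment):
    """Simulate one sample and return the MCM estimates of delta."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    t = [rng.choice((-1, 1)) for _ in range(n)]
    # Strong main effect 5*z; assumed true CATE is -0.8 + 0.5*z.
    y = [5.0 * zi + (ti / 2.0) * (-0.8 + 0.5 * zi) + rng.gauss(0, 1)
         for zi, ti in zip(z, t)]
    if augment:
        # First step: regress y on the main effects (constant and z) only,
        # then continue with the residuals as the new outcome.
        b0, b1 = ols_pair([1.0] * n, z, y)
        y = [yi - b0 - b1 * zi for yi, zi in zip(y, z)]
    x1 = [ti / 2.0 for ti in t]
    x2 = [ti * zi / 2.0 for ti, zi in zip(t, z)]
    return ols_pair(x1, x2, y)


def mse_of_slope(augment, reps=200, n=300, seed=1):
    """Mean squared error of the estimated heterogeneity slope (true value 0.5)."""
    rng = random.Random(seed)
    errors = [(one_draw(rng, n, augment)[1] - 0.5) ** 2 for _ in range(reps)]
    return sum(errors) / reps


if __name__ == "__main__":
    print("without augmentation:", mse_of_slope(False))
    print("with augmentation:   ", mse_of_slope(True))
```

In this setup the residualised outcome no longer carries the variation from the main effect, so the heterogeneity coefficient should be estimated far more precisely, in line with the argument of Tian et al. (2014) and Chen et al. (2017).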

5 Results

5.1 Average effects

Table D.1 in Appendix D reports the marginal effects of the propensity score model. The results confirm the impression from the descriptive statistics that the participation probability is increasing with the level of previous labour market success. Unemployed with low labour

market opportunities have a lower probability to participate in JSP. Such cream-skimming is

often observed for the assignment of ALMP (e.g., Bell and Orr, 2002).

For the IPW estimation, we have to check the balancing of our covariates. Table D.2 in

Appendix D documents that IPW successfully balances the covariates between participants and

non-participants. The absolute standardised differences after re-weighting are always below

2.5%, even for basis variables not included in the propensity score. Figure D.1 shows that

common support is not an issue in our setting because the propensity score distributions of

participants and non-participants are overlapping for nearly all values.


Figure 5.1: Development of potential outcome and average programme effects after programme start

Notes: Inverse Probability Weighting estimates. Standard errors based on 4999 bootstraps clustered at caseworker level.
Circles / triangles indicate that the effects are statistically different from zero at the 5% level.

Figure 5.1 shows the estimated potential outcomes and average program effects on

employment for each of the first 31 months after the program start. We observe substantial lock-

in effects. The employment probability in the first three months is about 15 percentage points

lower for JSP participants compared to non-participants. However, participants catch up and

the differences in the employment probability fade after 16 months. In the months 22-24 after

programme start we find small positive effects. But this seems to be only a transitory phenomenon.

Overall, the long-run effects are insignificant and very close to zero. The negative lock-in

effects are consistent with the findings of the previous Swiss JSP evaluations (e.g., Gerfin and

Lechner, 2002, Lalive, van Ours, and Zweimüller, 2008). Negative effects of JSP are also found in other countries (see, e.g., Dolton and O'Neill, 2002, Wunsch and Lechner, 2008). Possibly participants reduce the intensity of informal job search and focus on formal job

search during participation in JSP. This might be an inefficient strategy, especially for those

participants with good labour market opportunities.

Table 5.1: Average programme effects for aggregated outcomes

Cumulated employment          ATE                 ATET                ATENT
in months                 Coef.     S.E.      Coef.     S.E.      Coef.     S.E.
1-6                      -0.80***   0.02     -0.82***   0.02     -0.80***   0.02
1-12                     -1.10***   0.05     -1.13***   0.04     -1.09***   0.05
1-31                     -1.14***   0.14     -1.20***   0.13     -1.12***   0.15
25-31                    -0.007     0.03     -0.011     0.03     -0.007     0.04

Notes: Inverse Probability Weighting estimates. Standard errors based on 4999 bootstraps clustered at caseworker level.
*, **, *** means statistically different from zero at the 10%, 5%, 1% level.

Searching for effect heterogeneity at each month after program start is computationally

expensive and hard to summarize in an intuitive way. Therefore, we focus on cumulated

employment after program start in the first 6, 12 and 31 months as well as cumulated

employment in months 25 to 31. Table 5.1 shows the respective average effects that mirror the

findings in Figure 5.1. The lower employment probabilities after the program translate into an

average decline of 0.8 employment months during the first six months. This decline increases slightly to 1.1 months during the first year and during the first 31 months. During the months 25 to 31

we find no significant employment effects.

5.2 Effect heterogeneity

Going beyond the average treatment effects, Table 5.2 reports the estimated coefficients $\hat{\delta}$ in the first sample split considered.¹³ The coefficients are marginal effects on the treatment effect of JSP, as opposed to marginal effects on the outcome level in standard OLS. The coefficient in the first row is the hypothetical treatment effect when all other variables in the model equal zero.

The first column of Table 5.2 reports the coefficient estimates for the outcome cumulated employment during the first six months after the start of training participation. Seventeen of the 1,268 considered variables are selected as being predictive for the size of the treatment effect in the model selection sample. In the estimation sample, five of these variables are significant. For example, the treatment effect increases by 0.3 months for unskilled workers with previous earnings between 0 and 25,000 CHF per year (see row 3). For a hypothetical individual with these characteristics for whom all other selected variables equal zero, the predicted treatment effect would be obtained as $\hat{\gamma}_i = -0.87 + 0.3 = -0.57$.

¹³ We omit the coefficients for the selected main effects because they are only used for the efficiency augmentation and are irrelevant for the interpretation.

Table 5.2: Coefficients of selected variables in outcome equation

Cumulated employment after registration in months                          1-6              1-12
                                                                       Coef.    S.E.    Coef.    S.E.
Job search program (treatment)                                        -0.89***  0.05   -1.29***  0.09
# of unemployment spells in last two years                             0.06     0.12     -        -
Unskilled * past income 0 - 25k                                        0.30***  0.11    0.53     0.53
Skilled w/o degree * same gender                                       0.20     0.21     -        -
Skilled w/o degree * age difference                                   -0.01     0.01     -        -
# of unemployment spells in last two years * age of CW                 0.00     0.00     -        -
# of unemployment spells in last two years * medium city size         -0.05     0.06   -0.13     0.14
# of unemployment spells in last two years * past income 0 - 25k      -0.04     0.06   -0.10     0.14
# of unemployment spells in last two years * prev. job unskilled       0.04     0.05    0.21*    0.13
# of unemployment spells in last two years * same gender              -0.01     0.05     -        -
Caseworker own unemployment experience * prev. job unskilled           0.19**   0.09    0.34*    0.21
Foreigner with C permit * past income 25 - 50k                         0.19     0.12     -        -
Small city * 50 - 75k                                                 -0.16*    0.09   -0.26     0.20
Single household * zero employment spell last two years               -0.17**   0.08     -        -
Single household * prev. job unskilled                                 0.16     0.11     -        -
Previous job primary sector * age difference                          -0.02**   0.01     -        -
Previous job restaurant                                               -0.01     0.12     -        -
Previous job tourist sector                                           -0.09     0.12     -        -
Unskilled * prev. job unskilled                                         -        -     -0.22     0.64
# of unemployment spells in last two years * both primary education     -        -      0.19**   0.08
Degree in vocational training for caseworkers * past income 50 - 75k    -        -     -0.13     0.30
Past income 25 - 50k * unskilled                                        -        -      0.14     0.24
# employment spells past five years * prev. job in primary sector       -        -     -0.24     2.16
Prev. job in primary sector * unskilled                                 -        -     -0.19     0.53
Employment agency 44                                                    -        -     -0.68     0.52
# of selected variables                                               17 of 1,268      13 of 1,268
# significant at 10%                                                   5                3
# significant at 5%                                                    4                1
# significant at 1%                                                    1                0
# of caseworkers                                                              1,282
# of observations                                                            39,216

Note: *, **, *** means statistically different from zero at the 10%, 5%, 1% level. Standard errors clustered at caseworker level.
Table shows results for the first sample split, applying one step efficiency augmentation and 10-fold cross-validated
Post-Lasso. Results for long-term outcomes in Appendix E.

Figure 5.2: Distribution of average predicted individual effects for cumulated employment of first 6 months

Note: Kernel smoothed distribution of average predicted individual effects. Bandwidth chosen using the
Silverman rule. Results from the one step efficiency augmented 10-fold cross-validated Post-Lasso.
Dashed line shows the IPW estimate of the ATE.

The second column of Table 5.2 shows the 13 selected variables for cumulated

employment during the first 12 months after the start of training participation. The selected

heterogeneity variables are partially overlapping for the two employment outcome durations

(compare columns 1 and 2). Table E.1 in Appendix E reports additional results for the outcome

months employed in the first 31 months and months 25-31. While the former looks similar to

the results observed in Table 5.2, no heterogeneity variable was selected for the latter.

Figure 5.2 shows the distribution of the average predicted individual effects and reveals

substantial variation in the treatment effects that is hidden by averaging over the entire

population. While the biggest part of the mass is between the ATE estimate -0.8 and -1, we

observe a non-negligible part of the unemployed with substantially less negative predicted

effects and even some with a positive predicted effect of the program.

Table 5.3 shows summary statistics of the predicted CATEs. The means of all the

predicted treatment effects are close to the estimated ATEs, as they should be by the law of iterated expectations. The largest predicted group effect for the first 6 months is 0.89. This suggests that there are potentially unemployed persons who really benefit from the program. However, the majority

seems to be worse off under participation.

Table 5.3: Descriptive statistics of CATEs

                                             Mean    Median   S.D.    Min     Max
Cumulated employment in first 6 months      -0.78    -0.84    0.23   -1.40    0.89
Cumulated employment in first 12 months     -1.09    -1.22    0.37   -2.38    1.98
Cumulated employment in first 31 months     -1.12    -1.19    0.40   -2.64    2.29
Cumulated employment in months 25 - 31      -0.01    -0.01    0.03   -0.20    0.25

Notes: Descriptive statistics of the predicted individual effects on support (n = 78,431).
Averaged over the models obtained from 20 different sample splits.

Accordingly, the MCM is successful in discovering substantial effect heterogeneity.

However, the interpretation of the results is not easily accessible, because the underlying

functions are too complex. We can use computer algorithms to translate these complex

functions into decision rules. However, we want to go beyond that abstract description and want

to make explicit policy recommendations. One way to summarise the results for policy makers

is to marginalise the effects for specific variables of interest.

Figure 5.3 reports the average of the predicted group effects by 15 individual

characteristics. All characteristics are split into bins with low and high values. We report the average predicted group effects $\bar{\gamma}_{low}$ and $\bar{\gamma}_{high}$ associated with low and high values of the characteristics, respectively. To account for the fact that the predicted group effects are based on first-step estimates $\hat{\delta}$, we exploit the asymptotic normality of WOLS coefficients. To this end, we calculate 1,000 simulated predicted group effects $\hat{\gamma}^s(z_i) = z_i\hat{\delta}^s$, where for each simulation $s$, $\hat{\delta}^s$ is drawn from a multivariate normal distribution $N(\hat{\delta}, \hat{\Sigma}(\hat{\delta}))$ with $\hat{\Sigma}(\hat{\delta})$ being the estimated covariance matrix of the WOLS. The standard deviation of all simulated $\bar{\gamma}^{s,high}$ and $\bar{\gamma}^{s,low}$ is then used to calculate the confidence intervals that are displayed in Figures 5.3 and 5.4. We order the characteristics by the absolute size of the differences between $\bar{\gamma}_{high}$ and $\bar{\gamma}_{low}$.
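The simulation step can be sketched for a model with two coefficients. Everything numeric here is hypothetical: the coefficient vector, its covariance matrix (taken to be diagonal for simplicity), and the two groups are illustrative inputs, and the function names are not taken from the original analysis.

```python
import math
import random
import statistics


def cholesky_2x2(cov):
    """Lower-triangular factor L with L L' = cov for a 2x2 covariance matrix."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    return l11, l21, l22


def simulate_group_effects(delta_hat, cov, z_groups, n_sim=1000, seed=0):
    """Return, per group, the simulated average predicted effects mean_i(z_i' delta^s)."""
    rng = random.Random(seed)
    l11, l21, l22 = cholesky_2x2(cov)
    sims = {group: [] for group in z_groups}
    for _ in range(n_sim):
        # Draw delta^s from N(delta_hat, cov) via the Cholesky factor.
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        d0 = delta_hat[0] + l11 * e1
        d1 = delta_hat[1] + l21 * e1 + l22 * e2
        for group, rows in z_groups.items():
            effects = [d0 * z0 + d1 * z1 for z0, z1 in rows]
            sims[group].append(sum(effects) / len(effects))
    return sims


if __name__ == "__main__":
    delta_hat = (-0.87, 0.30)                  # hypothetical coefficients
    cov = [[0.0025, 0.0], [0.0, 0.0121]]       # hypothetical covariance matrix
    z_groups = {"low": [(1.0, 0.0)], "high": [(1.0, 1.0)]}
    sims = simulate_group_effects(delta_hat, cov, z_groups)
    for group, draws in sims.items():
        centre = statistics.mean(draws)
        sd = statistics.stdev(draws)
        print(group, round(centre - 1.96 * sd, 3), round(centre + 1.96 * sd, 3))
```

The standard deviation of the simulated group averages plays the role of a standard error, from which normal-approximation confidence intervals follow.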
Figure 5.3: Effect heterogeneity along individual characteristics for cumulated employment in the first 6 months

Note: Conditional average treatment effect in strata defined by low and high values of individual characteristics. Low
value: binary characteristic: 0, non-binary: < median; High: binary characteristic: 1, non-binary: ≥ median. Ordered
by size of absolute differences. Based on average predicted individual effects of 20 replications with one step
efficiency augmented 10-fold cross-validated Post-Lasso. 95%-confidence interval shown from 1000 draws from
the multivariate distribution of the estimated models and their respective means. *, **, *** means that the
differences in these 1000 draws are significantly different from zero at the 10%, 5%, 1% level, respectively.

Figure 5.3 shows substantial differences for nearly all considered variables. The biggest

difference is observed for the number of unemployment spells in the past two years. The average

effect for people with no previous unemployment is much lower than for unemployed with at

least one recent unemployment spell. The former show an average of -0.85 and the latter an

average of -0.61. Further, we see that individuals with some educational degree suffer much

more from JSP than individuals with no degree. In general, we observe that the lock-in effect

is much less pronounced for unemployed with lower qualification and therefore worse employment

prospects. These findings are consistent with the previous results of the evaluation literature

(e.g., Card, Kluve, and Weber, 2015, van den Berg and van der Klaauw, 2006). Furthermore,

the lock-in effects are less negative for foreigners. Possibly foreigners have a relatively small

network for informal job search. Therefore, the formal job search strategy might be relatively

successful for foreigners. We find only little heterogeneity by gender and age, which is in line

with the findings of Vikström, Rosholm, and Svarer (2013).

Figure 5.4 shows the same analysis with respect to caseworker characteristics. Although

we find some significant differences, they are much less pronounced than for the individual

characteristics. The biggest difference is observed for caseworkers with their own unemployment

experience, but the lock-in effect differs by only 0.06 months, which seems negligible.

Figure 5.4: Effect heterogeneity along caseworker characteristics for cumulated employment in the first 6 months

Note: Conditional average treatment effect in strata defined by low and high values of caseworker characteristics. Low
value: binary characteristic: 0, non-binary: < median; High: binary characteristic: 1, non-binary: ≥ median. Ordered
by size of absolute differences. Based on average predicted individual effects of 20 replications with one step
efficiency augmented 10-fold cross-validated Post-Lasso. 95%-confidence interval shown from 1000 draws from
the multivariate distribution of the estimated models and their respective means. *, **, *** means that the
differences in these 1000 draws are significantly different from zero at the 10%, 5%, 1% level, respectively.

Table 5.4: Characteristics of individuals with positive vs. negative predicted effects

                                                          Predicted     Predicted
                                                          effect ≥ 0    effect < 0    Difference   S.E.
Job search program within 6 months after registration       0.08          0.16         -0.08***    0.01
Female                                                       0.37          0.45         -0.06*      0.03
Past income in CHF 10,000                                    3.32          4.24         -0.10***    0.01
Fraction of months employed in last two years                0.70          0.80         -0.10***    0.01
Number of unemployment spells in last two years              4.96          0.54          4.39***    0.16
Qualification: unskilled                                     0.57          0.24          0.33***    0.05
Qualification: semiskilled                                   0.15          0.16         -0.01       0.02
Qualification: skilled without degree                        0.20          0.04          0.13**     0.05
Qualification: some degree                                   0.08          0.57         -0.45***    0.02
Foreigner with mother tongue in canton's language            0.14          0.11          0.02*      0.01
Employability rating: low                                    0.36          0.14          0.24***    0.04
Employability rating: medium                                 0.63          0.76         -0.15***    0.05
Employability rating: high                                   0.01          0.10         -0.09***    0.00
Age in 10 years                                              3.64          3.67         -0.02       0.05
Foreigner with B permit                                      0.28          0.13          0.17***    0.03
Foreigner with C permit                                      0.44          0.25          0.14***    0.02
Number of observations                                       572           77,859

Note: Average characteristics of individuals with positive and negative predicted effects for cumulated employment in the first
6 months. Based on average predicted individual effects of 20 replications with one step efficiency augmented 10-fold
cross-validated Post-Lasso. Standard errors of the difference calculated as the standard deviation of the
differences in 1000 draws from the multivariate distribution of the estimated models and their respective means. *,
**, *** means that the differences in these 1000 draws are significantly different from zero at the 10%, 5%, 1%
level, respectively.

Finally, we can use the predicted group effects to investigate two highly policy relevant

questions: What are the characteristics of unemployed who benefit from the program? Do

caseworkers actually send these promising unemployed to the program?

The number of individuals with positive predicted effects is 572 which corresponds to

0.7% of the unemployed. Table 5.4 shows the differences in characteristics; standard errors are

again obtained by simulating 1000 predicted effects and calculating the standard deviation of

the differences in all simulations. The first row reports the JSP assignment share. Only 8% of

the unemployed with a predicted positive effect are actually assigned to the program. This is

half of the 16% assignment share among those with negative predicted group effects. This points to potential for improving the selection of JSP participants. The results in the remaining rows show how the selection could be improved concretely. The effectiveness of JSP could be higher if caseworkers preferentially assigned unemployed persons with bad labour market prospects. Especially unemployed persons with a recent history of unemployment, without a degree, and with a low employability rating are associated with positive predicted effects.
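The comparison of assignment shares by the sign of the predicted effect is a simple tabulation. The sketch below uses hypothetical toy data and a hypothetical function name; only the two group sizes in the final lines are taken from Table 5.4.

```python
def assignment_by_predicted_sign(effects, treated):
    """Share with non-negative predicted effect and assignment rates by sign."""
    positive = [d for e, d in zip(effects, treated) if e >= 0]
    negative = [d for e, d in zip(effects, treated) if e < 0]

    def rate(group):
        return sum(group) / len(group) if group else float("nan")

    return {
        "share_positive": len(positive) / len(effects),
        "assigned_if_positive": rate(positive),
        "assigned_if_negative": rate(negative),
    }


if __name__ == "__main__":
    # Toy data (hypothetical): predicted effects and actual programme assignment.
    effects = [0.2, -0.5, -0.9, 0.1, -0.3]
    treated = [0, 1, 0, 1, 0]
    print(assignment_by_predicted_sign(effects, treated))
    # Group sizes reported in Table 5.4: 572 of 572 + 77,859 individuals.
    print(round(100 * 572 / (572 + 77859), 2), "% with positive predicted effect")
```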

5.3 Robustness

To investigate the sensitivity of our results to an alternative algorithm for model selection, we

apply the so-called Adaptive Lasso (Zou, 2006) defined by

$$\hat{\delta}^{\text{Adaptive Lasso}} = \arg\min_{\delta} \sum_{i=1}^{N} w(d_i, x_i, z_i) \left( y_i - \frac{G(z_i) t_i}{2} \right)^2 + \lambda \sum_{j=1}^{p} \frac{|\delta_j|}{|\hat{\delta}_j^{\text{Ridge}}|}, \qquad (5)$$

where $\hat{\delta}_j^{\text{Ridge}}$ is obtained in a first step by solving

$$\hat{\delta}^{\text{Ridge}} = \arg\min_{\delta} \sum_{i=1}^{N} w(d_i, x_i, z_i) \left( y_i - \frac{G(z_i) t_i}{2} \right)^2 + \lambda \sum_{j=1}^{p} \delta_j^2.$$

The Ridge estimator penalises the sum of squared coefficients instead of the sum of the absolute coefficients (Hoerl and Kennard, 1970). It therefore shrinks the coefficients towards zero but never exactly to zero and cannot be used for model selection. The Adaptive Lasso in equation (5) shrinks coefficients that are already relatively small after Ridge penalization more aggressively towards zero and addresses the problem that Lasso tends to select too many variables. Zou

(2006) shows that this modification of the LASSO can achieve the Oracle property, which

means that it selects the correct model at least asymptotically.
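The two-stage logic of the Adaptive Lasso can be sketched with a generic coordinate-descent routine. This is an illustration under assumptions rather than the implementation used here: $G(z_i)$ is taken to be the linear index $z_i\delta$, the regressors stand in for the modified covariates, and the simulated data, penalty values, and function names are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0 if it would cross zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def coordinate_descent(x, y, w, update, n_iter=100):
    """Cyclic coordinate descent; update(j, rho, a) returns the new coefficient j."""
    n, p = len(y), len(x[0])
    b = [0.0] * p
    a = [sum(w[i] * x[i][j] ** 2 for i in range(n)) for j in range(p)]
    resid = list(y)  # residuals y - x'b for the current b
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(w[i] * x[i][j] * (resid[i] + x[i][j] * b[j]) for i in range(n))
            new_b = update(j, rho, a[j])
            for i in range(n):
                resid[i] += x[i][j] * (b[j] - new_b)
            b[j] = new_b
    return b


def ridge(x, y, w, lam):
    """Minimise sum_i w_i (y_i - x_i'b)^2 + lam * sum_j b_j^2."""
    return coordinate_descent(x, y, w, lambda j, rho, a: rho / (a + lam))


def adaptive_lasso(x, y, w, lam, lam_ridge=1.0):
    """Minimise sum_i w_i (y_i - x_i'b)^2 + lam * sum_j |b_j| / |ridge_j|."""
    first = ridge(x, y, w, lam_ridge)

    def update(j, rho, a):
        if first[j] == 0.0:
            return 0.0  # infinite penalty weight on this coefficient
        return soft_threshold(rho, lam / (2.0 * abs(first[j]))) / a

    return coordinate_descent(x, y, w, update)


def demo(seed=0, n=300):
    """Simulated data (hypothetical): one relevant and three irrelevant regressors."""
    rng = random.Random(seed)
    true_b = [1.0, 0.0, 0.0, 0.0]
    x = [[rng.gauss(0, 1) for _ in true_b] for _ in range(n)]
    w = [rng.uniform(1.0, 2.0) for _ in range(n)]
    y = [sum(bj * v for bj, v in zip(true_b, row)) + rng.gauss(0, 0.5) for row in x]
    return x, y, w


if __name__ == "__main__":
    x, y, w = demo()
    print("ridge:         ", [round(v, 3) for v in ridge(x, y, w, 1.0)])
    print("adaptive lasso:", [round(v, 3) for v in adaptive_lasso(x, y, w, 40.0)])
```

Because the first-stage Ridge coefficients of irrelevant regressors are small, their second-stage penalty is large, so they are removed while the relevant coefficient is shrunk only mildly.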

We rely in our baseline estimation on the modified covariate method because it offers the

explained possibility of efficiency augmentation. Another possibility is to use the modified

outcome method that leaves the covariates unchanged but modifies the outcome. This procedure

was first proposed by Signorovitch (2007) and extended to observational studies by Zhang et al. (2012). It can be estimated using

$$\hat{\delta} = \arg\min_{\delta} \frac{1}{N} \sum_{i=1}^{N} \left( y_i^* - G(z_i) \right)^2 + \lambda_N \sum_{j=1}^{p} |\delta_j|,$$

where

$$y_i^* = y_i \, \frac{d_i - p(x_i, z_i)}{p(x_i, z_i) \left[ 1 - p(x_i, z_i) \right]} = w(d_i, x_i, z_i) \, T_i \, y_i$$

is the modified outcome. Note that the average of all yi* is identical to the IPW estimate

of the ATE.
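The identity between the mean of the modified outcomes and the (unnormalised) IPW estimate of the ATE can be verified numerically. The sketch below uses hypothetical toy data and function names, and takes the propensity scores as given.

```python
def modified_outcome(y, d, p):
    """y* = y * (d - p) / (p * (1 - p)) for treatment d and propensity score p."""
    return y * (d - p) / (p * (1.0 - p))


def ipw_ate(ys, ds, ps):
    """Unnormalised IPW estimate: mean of y*d/p minus mean of y*(1-d)/(1-p)."""
    n = len(ys)
    treated = sum(y * d / p for y, d, p in zip(ys, ds, ps)) / n
    control = sum(y * (1 - d) / (1.0 - p) for y, d, p in zip(ys, ds, ps)) / n
    return treated - control


if __name__ == "__main__":
    # Toy data (hypothetical).
    ys = [3.0, 1.0, 4.0, 2.0]
    ds = [1, 0, 1, 0]
    ps = [0.5, 0.25, 0.8, 0.4]
    y_star = [modified_outcome(y, d, p) for y, d, p in zip(ys, ds, ps)]
    print(sum(y_star) / len(y_star), ipw_ate(ys, ds, ps))
```

The equality holds observation by observation, because $(d - p) / [p(1 - p)]$ equals $1/p$ for $d = 1$ and $-1/(1 - p)$ for $d = 0$.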

Table 5.5 shows that the average predicted individual effects for different

implementations are highly correlated. In the first column, the baseline model with one step

efficiency augmentation and Post-Lasso model selection shows a correlation of at least 0.7 with

the alternative methods. Therefore, it is not surprising that the main conclusions drawn from

the reported baseline estimations are confirmed using alternative implementations.

Table 5.5: Correlation of average predicted individual effects over different methods

Cumulated employment in months 1-6         1-step PL  2-step PL  1-step Ada  2-step Ada  MOM PL  MOM Ada
One step Post-Lasso                          1.00
Two step Post-Lasso                          0.85       1.00
One step Adaptive Lasso                      0.79       0.51       1.00
Two step Adaptive Lasso                      0.83       0.59       0.93        1.00
Modified outcome method Post-Lasso           0.82       0.81       0.62        0.65        1.00
Modified outcome method Adaptive Lasso       0.70       0.56       0.67        0.67        0.76    1.00

Note: Correlation of average predicted individual effects for different methods of efficiency augmentation, model selection
and modifications. PL: Post-Lasso; Ada: Adaptive Lasso; MOM: modified outcome method.

6 Conclusion
We investigate recently developed methods to uncover treatment effect heterogeneity. We

apply these methods to a large-scale investigation of effect heterogeneity of a training

programme. We allow for a high-dimensional set of potential heterogeneity variables and use

algorithms based on the LASSO (Tibshirani, 1996) to select variables that are predictive for the

size of the treatment effect. We split the sample to obtain consistent estimates after model

selection and aggregate over several models to obtain predicted group effects.

We find that the standard estimates of average treatment effects hide substantial

heterogeneity in the size of the effects. While unemployed with good labour market prospects

suffer from substantial lock-in effects if participating in the job search program, unemployed

with bad employment risks might even benefit. However, in the case of the considered JSP this

seems to be only a tiny fraction of the unemployed. We find evidence that caseworkers are less

likely to send those unemployed to the JSP who actually profit from it. This suggests that

expected returns to program participation are either not considered or expected to be completely

different than they turned out to be.

The newly considered methods provide useful and plausible additional information for

empirical evaluation studies. Specifically, they could inform policy makers how to improve

existing assignment mechanisms to target individuals who are expected to benefit most and

therefore increase cost-benefit efficiency.

Researchers have a variety of choices how to specifically implement these new methods.

They can decide between modifying either the outcome or the covariates. Additionally, the

implementation of the variable selection is open to numerous different choices. We find that

our results are consistent over six different implementations. However, the current state of the

literature gives very little guidance which method is preferable for a specific application at

hand.

Appendix A: Sample selection
Table A.1: Sample selection criteria for empirical analysis

Selection criteria                                                            Excluded   Remaining sample size
Population: all new jobseekers during the year 2003                                             238,902
Exclude Geneva and five other employment offices                              -19,464           219,438
Exclude jobseekers not (yet) assigned to a caseworker                          -4,289           215,149
Exclude foreigners without yearly or permanent work permit                     -5,399           209,750
Exclude jobseekers without unemployment benefit claim                         -18,434           191,316
Exclude jobseekers who applied for or claim disability insurance               -3,163           188,153
Restrict to prime-age population (24 to 55 years old)                         -51,649           136,504
Exclude unemployed whose caseworker did not respond to the questionnaire      -31,469           105,035
Exclude unemployed whose caseworkers did not respond to the
  cooperativeness question                                                     -4,915           100,120
Exclude participants in other ALMP than JSP                                    -8,787            91,333
Exclude individuals employed at (pseudo) treatment date                        -6,148            85,185

Source: Selection steps are (partly) collected from Table B.1 in Huber, Lechner, Mellace (2017).

Appendix B: Descriptive statistics
Table B.1: Descriptive statistics of variables used in empirical analysis by treatment status

Variable   Participants (Mean, S.D.)   Non-Participants (Mean, S.D.)   Std. Diff. in %
Outcomes
Cumulated employment in first 6 months 1.21 1.93 1.94 2.44 -23.29
Cumulated employment in first 12 months 3.68 4.27 4.53 4.80 -13.12
Cumulated employment in first 31 months 15.30 12.49 15.59 12.85 -1.60
Cumulated employment in months 25 - 31 3.48 2.88 3.33 2.86 3.72
Individual characteristics
Female 0.45 - 0.44 - 0.56
*French 0.04 - 0.11 - -19.51
*Italian 0.01 - 0.04 - -11.85
Mother tongue other than German, French, Italian 0.29 - 0.32 - -5.40
*French 0.02 - 0.08 - -18.01
*Italian 0.01 - 0.02 - -9.80
Qualification: unskilled 0.22 - 0.23 - -1.80
*French 0.03 - 0.05 - -8.36
*Italian 0.01 - 0.03 - -8.62
Qualification: semiskilled 0.15 - 0.16 - -2.45
*French 0.02 - 0.05 - -12.10
*Italian 0.00 - 0.01 - -5.16
Qualification: skilled without degree 0.03 - 0.05 - -4.72
*French 0.00 - 0.02 - -11.22
*Italian 0.00 - 0.01 - -4.11
Number of unemployment spells in last two years 0.41 0.98 0.64 1.27 -13.86
*French 0.05 0.36 0.19 0.76 -16.84
*Italian 0.02 0.22 0.07 0.46 -10.16
Fraction of months employed in last two years 0.83 0.22 0.79 0.25 12.57
*French 0.06 0.22 0.19 0.35 -30.04
*Italian 0.02 0.13 0.06 0.22 -15.77
Employability rating: low 0.12 - 0.14 - -3.98
*French 0.01 - 0.02 - -9.87
*Italian 0.00 - 0.01 - -4.94
Employability rating: medium 0.77 - 0.74 - 5.80
*French 0.07 - 0.19 - -26.32
*Italian 0.02 - 0.05 - -11.57
Age in 10 years 3.73 0.88 3.66 0.86 5.60
Age squared / 10,000 0.15 0.07 0.14 0.07 5.61
Married 0.47 - 0.49 - -2.35
Foreigner with B permit 0.11 - 0.14 - -6.96
Foreigner with C permit 0.23 - 0.25 - -3.12
Lives in big city 0.17 - 0.17 - -0.05
Lives in medium sized city 0.16 - 0.13 - 4.83
Past income in CHF 10,000 4.58 2.02 4.16 2.05 14.50
Number of employment spells in last 5 years 0.10 0.13 0.13 0.15 -14.70
Previous job in primary sector 0.06 - 0.10 - -10.44
Previous job in secondary sector 0.16 - 0.13 - 6.04
Previous job in tertiary sector 0.63 - 0.58 - 7.07
Foreigner with mother tongue in canton's language 0.12 - 0.11 - 2.40
Previous job self-employed 0.00 - 0.01 - -3.01
Previous job manager 0.08 - 0.07 - 1.85
Previous job skilled worker 0.63 - 0.60 - 4.70
Previous job unskilled worker 0.26 - 0.29 - -5.01
Training start in second quarter of unemployment 0.45 - 0.46 - -0.38
Caseworker characteristics
Age in 10 years 4.43 1.16 4.44 1.16 -0.77
*French 0.37 1.29 1.14 2.04 -31.79
*Italian 0.11 0.70 0.34 1.19 -16.43
Female 0.45 - 0.41 - 6.94
*French 0.02 - 0.09 - -22.33
*Italian 0.01 - 0.02 - -6.15
Tenure in employment office in years 5.54 3.23 5.86 3.31 -6.84
*French 0.47 1.78 1.59 3.07 -31.36
*Italian 0.21 1.39 0.60 2.29 -14.58
Own experience of unemployment 0.63 - 0.63 - 0.54
*French 0.05 - 0.17 - -26.33
*Italian 0.02 - 0.05 - -11.73
Indicator for missing caseworker characteristics 0.04 - 0.04 - 0.13
Education: above vocational training 0.45 - 0.43 - 2.36
*French 0.04 - 0.10 - -17.68
*Italian 0.01 - 0.03 - -9.46
Education: tertiary track 0.21 - 0.24 - -4.68
*French 0.02 - 0.09 - -21.92
*Italian 0.00 - 0.02 - -8.25
Degree in vocational training for caseworkers 0.26 - 0.23 - 5.63
*French 0.00 - 0.01 - -9.64
*Italian 0.01 - 0.04 - -11.28
Allocation of unemployed to caseworkers
By industry 0.66 - 0.53 - 17.73
*French 0.05 - 0.10 - -12.88
*Italian 0.01 - 0.04 - -11.32
By occupation 0.58 - 0.56 - 3.08
*French 0.06 - 0.17 - -25.14
*Italian 0.01 - 0.05 - -14.27
By age 0.04 - 0.03 - 2.58
By employability 0.07 - 0.07 - 0.12
By region 0.09 - 0.12 - -7.55
Other 0.07 - 0.07 - -1.37
Local labour market characteristics
French speaking employment office 0.08 - 0.25 - -33.30
Italian speaking employment office 0.03 - 0.08 - -16.81
Cantonal unemployment rate in % 3.64 0.77 3.75 0.86 -9.23
*French 0.32 1.10 1.05 1.86 -33.93
*Italian 0.11 0.69 0.34 1.16 -16.61
GDP per capita in the canton in CHF 10,000 5.13 0.92 4.92 0.93 15.75
Additional variables not used in propensity score
Qualification: some degree 0.60 - 0.56 - 5.19
Employability rating: high 0.11 - 0.12 - -3.62
Lives in small city 0.68 - 0.70 - -3.64
Number of people in household 0.21 0.13 0.22 0.13 -2.75
Same gender 0.58 - 0.58 - 0.18
Age caseworker - age unemployed 7.96 12.57 8.69 12.63 -4.05
Same age +/- 5 years 0.24 - 0.23 - 1.94
Both primary education 0.67 - 0.72 - -8.18
Both secondary education 0.28 - 0.31 - -4.45
Both upper secondary education 0.37 - 0.42 - -8.18
Both tertiary education 0.53 - 0.56 - -3.41
Same gender, age, and education 0.03 - 0.03 - -1.89
German speaking employment office 0.89 - 0.67 - 39.67
Number of caseworkers 989 1,282
Number of observations 12,998 72,187
Note: Unconditional means and standard deviations by participation status. Standardized differences between
participants and non-participants in percent.
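
The standardized differences in the last column can be approximated from the reported means and standard deviations. The sketch below is an illustration only. The normalisation by the square root of the sum of both group variances is an assumption (the exact formula is not stated in this appendix), chosen because it comes close to the reported values when fed the rounded table entries. The function names `std_diff_percent` and `binary_sd` are hypothetical.

```python
import math

def std_diff_percent(mean_t, sd_t, mean_c, sd_c):
    """Difference in means scaled by sqrt(var_t + var_c), in percent.

    The scaling is an assumed normalisation, not taken from the paper.
    """
    return 100.0 * (mean_t - mean_c) / math.sqrt(sd_t ** 2 + sd_c ** 2)

def binary_sd(p):
    """Standard deviation of a 0/1 variable with mean p."""
    return math.sqrt(p * (1.0 - p))

# Rounded inputs copied from Table B.1.
employment_6m = std_diff_percent(1.21, 1.93, 1.94, 2.44)   # reported: -23.29
past_income = std_diff_percent(4.58, 2.02, 4.16, 2.05)     # reported: 14.50
french_office = std_diff_percent(0.08, binary_sd(0.08),
                                 0.25, binary_sd(0.25))    # reported: -33.30

print(round(employment_6m, 2), round(past_income, 2), round(french_office, 2))
```

With the rounded inputs, the three results differ from the reported values by less than half a percentage point, which is consistent with rounding of the table entries.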

Appendix C: Identification of CATE
Proof of Theorem 1
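In each chain below, the first equality uses the law of iterated expectations, the second the conditional independence of $Y^d$ and $D$ given $X$ and $Z = z$, and the third the observation rule that $Y = Y^d$ whenever $D = d$.

\begin{align*}
E(Y^d \mid Z = z) &= E_{X \mid Z = z}\left[ E(Y^d \mid Z = z, X) \mid Z = z \right], \\
&= E_{X \mid Z = z}\left[ E(Y^d \mid Z = z, X, D = d) \mid Z = z \right], \\
&= E_{X \mid Z = z}\left[ E(Y \mid Z = z, X, D = d) \mid Z = z \right].
\end{align*}

Analogously, for the subpopulation with $D = 1 - d$:

\begin{align*}
E(Y^d \mid Z = z, D = 1 - d) &= E_{X \mid Z = z, D = 1 - d}\left[ E(Y^d \mid Z = z, X, D = 1 - d) \mid Z = z, D = 1 - d \right], \\
&= E_{X \mid Z = z, D = 1 - d}\left[ E(Y^d \mid Z = z, X, D = d) \mid Z = z, D = 1 - d \right], \\
&= E_{X \mid Z = z, D = 1 - d}\left[ E(Y \mid Z = z, X, D = d) \mid Z = z, D = 1 - d \right].
\end{align*}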
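
A small numerical illustration may help to see what the two chains of equalities deliver. The sketch below uses a made-up population within one stratum $Z = z$ with a binary confounder $X$. All probabilities and conditional means are hypothetical, and the helper `average_over` is not part of the paper. It computes the identified means by averaging the observable conditional means $E(Y \mid Z = z, X, D = 1)$ over the relevant distribution of $X$, and contrasts them with the naive mean among participants.

```python
# Hypothetical population within one stratum Z = z (all numbers are made up).
p_x = {0: 0.6, 1: 0.4}            # P(X = x | Z = z)
p_d1_given_x = {0: 0.2, 1: 0.7}   # P(D = 1 | X = x, Z = z)
# E(Y^1 | X = x, Z = z); under conditional independence this also equals
# E(Y | X = x, Z = z, D = 1), which is what the data reveal.
mu1 = {0: 2.0, 1: 5.0}

def average_over(cond_mean, dist_x):
    """Outer expectation: average conditional means over a distribution of X."""
    return sum(cond_mean[x] * dist_x[x] for x in dist_x)

# Distributions of X among participants and non-participants (Bayes' rule).
p_d1 = sum(p_d1_given_x[x] * p_x[x] for x in p_x)
p_x_given_d1 = {x: p_d1_given_x[x] * p_x[x] / p_d1 for x in p_x}
p_x_given_d0 = {x: (1 - p_d1_given_x[x]) * p_x[x] / (1 - p_d1) for x in p_x}

e_y1 = average_over(mu1, p_x)               # first chain: E(Y^1 | Z = z)
e_y1_d0 = average_over(mu1, p_x_given_d0)   # second chain, d = 1: E(Y^1 | Z = z, D = 0)
naive = average_over(mu1, p_x_given_d1)     # E(Y | Z = z, D = 1), ignores selection on X

print(e_y1, e_y1_d0, naive)
```

In this example participation is more likely for $X = 1$, where outcomes are higher, so the naive participant mean overstates $E(Y^1 \mid Z = z)$, while reweighting the conditional means by the appropriate distribution of $X$ recovers it.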

Appendix D: Selection model, balancing tests, and common
support
Table D.1: Propensity score

Marg. eff. S.E. p-value


Individual characteristics
Female 0.01 0.00 0.11
*French 0.01 0.01 0.22
*Italian -0.03 0.02 0.11
Mother tongue other than German, French, Italian -0.01** 0.01 0.01
*French -0.03** 0.01 0.01
*Italian -0.01 0.02 0.48
Qualification: unskilled 0.01* 0.01 0.06
*French 0.10*** 0.02 0.00
*Italian 0.05** 0.02 0.02
Qualification: semiskilled 0.00 0.01 0.79
*French 0.06*** 0.01 0.00
*Italian 0.03 0.03 0.27
Qualification: skilled without degree 0.01* 0.01 0.09
*French -0.03 0.03 0.33
*Italian 0.02 0.03 0.44
Number of unemployment spells in last two years -0.01*** 0.00 0.00
*French 0.00 0.00 0.52
*Italian 0.00 0.01 0.52
Fraction of months employed in last two years 0.03*** 0.01 0.00
*French -0.03 0.02 0.12
*Italian -0.05* 0.02 0.05
Employability rating: low -0.04*** 0.01 0.00
*French 0.10*** 0.03 0.00
*Italian 0.13*** 0.03 0.00
Employability rating: medium -0.02 0.01 0.16
*French 0.09*** 0.02 0.00
*Italian 0.07*** 0.02 0.00
Age in 10 years -0.01 0.01 0.45
Age squared / 10,000 0.21 0.17 0.22
Married 0.00 0.00 0.57
Foreigner with B permit -0.02*** 0.01 0.00
Foreigner with C permit 0.00 0.00 0.68
Lives in big city -0.01 0.01 0.49
Lives in medium sized city 0.02*** 0.01 0.00
Past income in CHF 10,000 0.09*** 0.01 0.00
Number of employment spells in last 5 years -0.08*** 0.01 0.00
Previous job in primary sector -0.04*** 0.01 0.00
Previous job in secondary sector 0.04*** 0.01 0.00
Previous job in tertiary sector 0.01** 0.01 0.01
Foreigner with mother tongue in canton's language 0.03*** 0.00 0.00
Previous job self-employed -0.09*** 0.02 0.00
Previous job manager -0.05*** 0.01 0.00
Previous job skilled worker -0.02** 0.01 0.01
Previous job unskilled worker -0.02** 0.01 0.04
Training start in second quarter of unemployment 0.02*** 0.00 0.00
Caseworker characteristics
Age in 10 years 0.00 0.00 0.51
*French 0.00 0.00 0.25
*Italian 0.00 0.00 0.80
Female 0.02** 0.01 0.01
*French -0.05* 0.03 0.05
*Italian 0.05* 0.02 0.05
Tenure in employment office in years 0.00 0.00 0.15
*French -0.01 0.01 0.17
*Italian 0.00 0.00 0.48
Own experience of unemployment 0.01 0.01 0.48
*French -0.03 0.03 0.32
*Italian 0.05 0.03 0.11
Indicator for missing caseworker characteristics 0.00 0.02 0.92
Education: above vocational training 0.00 0.01 0.99
*French 0.01 0.03 0.80
*Italian 0.01 0.03 0.64
Education: tertiary track 0.00 0.01 0.84
*French -0.02 0.04 0.51
*Italian 0.01 0.04 0.76
Degree in vocational training for caseworkers 0.02* 0.01 0.09
*French -0.09 0.07 0.24
*Italian 0.02 0.03 0.51
Allocation of unemployed to caseworkers
By industry 0.02*** 0.01 0.00
*French 0.05* 0.03 0.06
*Italian -0.04* 0.02 0.09
By occupation 0.02** 0.01 0.01
*French 0.04* 0.03 0.09
*Italian -0.06** 0.03 0.03
By age 0.01 0.01 0.38
By employability -0.02 0.01 0.14
By region -0.04** 0.01 0.01
Other -0.03** 0.01 0.02
Local labour market characteristics
French speaking employment office -0.06 0.09 0.45
Italian speaking employment office -0.19 0.11 0.10
Cantonal unemployment rate in % 0.03*** 0.01 0.00
*French -0.07*** 0.01 0.00
*Italian -0.03* 0.02 0.09
GDP per capita in the canton in CHF 10,000 -0.03*** 0.01 0.00
Number of caseworkers 1,282
Number of observations 85,185
Note: Binary dependent variable: participation in job search program. Model includes constant term. Average marginal
effects reported. Bootstrapped standard errors are clustered at caseworker level (4999 replications). *, **, ***
means statistically different from zero at the 10%, 5%, 1% level, respectively.
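
The note mentions bootstrapped standard errors clustered at the caseworker level. The following is a minimal sketch of a cluster bootstrap, assuming that whole caseworker clusters are resampled with replacement and the statistic is recomputed on each resample. The function `cluster_bootstrap_se`, its arguments, and the toy data are hypothetical, and the statistic here is a simple mean rather than the marginal effects reported in the table.

```python
import random
import statistics

def cluster_bootstrap_se(values, cluster_ids, statistic=statistics.fmean,
                         n_rep=499, seed=0):
    """Bootstrap standard error of `statistic`, resampling whole clusters."""
    rng = random.Random(seed)
    # Group observations by cluster (e.g. by caseworker).
    by_cluster = {}
    for value, cluster in zip(values, cluster_ids):
        by_cluster.setdefault(cluster, []).append(value)
    clusters = list(by_cluster)

    draws = []
    for _ in range(n_rep):
        # Draw as many clusters as there are in the data, with replacement,
        # and keep all observations of each drawn cluster together.
        sample = []
        for cluster in rng.choices(clusters, k=len(clusters)):
            sample.extend(by_cluster[cluster])
        draws.append(statistic(sample))
    return statistics.stdev(draws)

# Toy data: six unemployed persons assigned to three caseworkers.
outcomes = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
caseworkers = [1, 1, 2, 2, 3, 3]
print(cluster_bootstrap_se(outcomes, caseworkers))
```

Resampling clusters rather than individual observations keeps the dependence among the unemployed who share a caseworker intact in every bootstrap sample.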

Table D.2: Balancing tests

Columns: Part. Mean, Non-Part. Mean, Std. Diff. in % (pre), Std. Diff. in % (post), t-statistic
Individual characteristics
Female 0.44 0.44 0.56 -0.21 -0.31
*French 0.08 0.08 -19.51 -0.01 -0.01
*Italian 0.03 0.03 -11.85 -0.66 -0.98
Mother tongue other than German, French, Italian 0.32 0.32 -5.40 0.23 0.34
*French 0.05 0.05 -18.01 -0.50 -0.74
*Italian 0.02 0.02 -9.80 0.30 0.43
Qualification: unskilled 0.24 0.24 -1.80 -0.31 -0.46
*French 0.05 0.05 -8.36 -0.56 -0.83
*Italian 0.02 0.02 -8.62 -0.44 -0.65
Qualification: semiskilled 0.16 0.16 -2.45 -0.27 -0.39
*French 0.03 0.04 -12.10 -1.43 -2.13**
*Italian 0.01 0.01 -5.16 -0.45 -0.66
Qualification: skilled without degree 0.04 0.04 -4.72 1.42 2.04**
*French 0.01 0.01 -11.22 1.30 1.84*
*Italian 0.01 0.01 -4.11 1.46 2.05**
Number of unemployment spells in last two years 0.59 0.57 -13.86 1.01 1.48
*French 0.12 0.11 -16.84 0.36 0.53
*Italian 0.05 0.05 -10.16 -0.68 -1.02
Fraction of months employed in last two years 0.80 0.80 12.57 -0.48 -0.71
*French 0.13 0.13 -30.04 -0.83 -1.21
*Italian 0.05 0.05 -15.77 -1.23 -1.82*
Employability rating: low 0.15 0.14 -3.98 1.00 1.45
*French 0.01 0.01 -9.87 -0.06 -0.08
*Italian 0.01 0.01 -4.94 0.03 0.05
Employability rating: medium 0.75 0.75 5.80 0.02 0.02
*French 0.14 0.15 -26.32 -0.70 -1.03
*Italian 0.04 0.04 -11.57 -0.20 -0.30
Age in 10 years 3.65 3.67 5.60 -1.57 -2.30**
Age squared / 10,000 0.14 0.14 5.61 -1.59 -2.33**
Married 0.48 0.48 -2.35 0.36 0.53
Foreigner with B permit 0.13 0.13 -6.96 0.06 0.08
Foreigner with C permit 0.25 0.25 -3.12 0.61 0.89
Lives in big city 0.16 0.17 -0.05 -0.98 -1.43
Lives in medium sized city 0.15 0.14 4.83 1.09 1.59
Past income in CHF 10,000 4.21 4.24 14.50 -0.85 -1.28
Number of employment spells in last 5 years 0.12 0.12 -14.70 1.19 1.73*
Previous job in primary sector 0.08 0.08 -10.44 0.45 0.65
Previous job in secondary sector 0.14 0.14 6.04 -0.17 -0.24
Previous job in tertiary sector 0.59 0.60 7.07 -0.72 -1.05
Foreigner with mother tongue in canton's language 0.11 0.11 2.40 0.10 0.14
Previous job self-employed 0.01 0.01 -3.01 -0.40 -0.60
Previous job manager 0.07 0.07 1.85 0.10 0.14
Previous job skilled worker 0.59 0.60 4.70 -1.19 -1.74*
Previous job unskilled worker 0.30 0.30 -5.01 0.89 1.29
Training start in second quarter of unemployment 0.43 0.45 -0.38 -1.72 -2.51**
Caseworker characteristics
Age in 10 years 4.43 4.43 -0.77 -0.24 -0.36
*French 0.79 0.80 -31.79 -0.50 -0.73
*Italian 0.27 0.29 -16.43 -0.92 -1.35
Female 0.41 0.41 6.94 0.35 0.52
*French 0.05 0.05 -22.33 0.15 0.22
*Italian 0.02 0.02 -6.15 -0.15 -0.22
Tenure in employment office in years 5.77 5.75 -6.84 0.29 0.44
*French 1.09 1.09 -31.36 0.22 0.32
*Italian 0.48 0.52 -14.58 -1.26 -1.87*
Own experience of unemployment 0.63 0.63 0.54 -0.36 -0.52
*French 0.11 0.11 -26.33 -0.46 -0.67
*Italian 0.04 0.04 -11.73 -0.71 -1.04
Indicator for missing caseworker characteristics 0.04 0.04 0.13 -0.24 -0.35
Education: above vocational training 0.44 0.45 2.36 -0.95 -1.39
*French 0.08 0.08 -17.68 -0.50 -0.73
*Italian 0.02 0.02 -9.46 -1.73 -2.60***
Education: tertiary track 0.23 0.22 -4.68 1.41 2.05**
*French 0.06 0.06 -21.92 2.44 3.48***
*Italian 0.01 0.01 -8.25 0.18 0.27
Degree in vocational training for caseworkers 0.23 0.24 5.63 -0.40 -0.58
*French 0.00 0.00 -9.64 -0.47 -0.70
*Italian 0.03 0.04 -11.28 -1.36 -2.03**
Allocation of unemployed to caseworkers
By industry 0.58 0.58 17.73 -0.56 -0.82
*French 0.09 0.09 -12.88 -0.54 -0.78
*Italian 0.03 0.03 -11.32 -1.16 -1.73*
By occupation 0.56 0.56 3.08 0.01 0.02
*French 0.13 0.13 -25.14 0.13 0.19
*Italian 0.04 0.04 -14.27 -1.19 -1.76*
By age 0.03 0.03 2.58 0.53 0.77
By employability 0.08 0.07 0.12 1.23 1.78*
By region 0.11 0.11 -7.55 -1.31 -1.92*
Other 0.08 0.07 -1.37 1.89 2.72***
Local labour market characteristics
French speaking employment office 0.17 0.18 -33.30 -0.41 -0.59
Italian speaking employment office 0.06 0.07 -16.81 -1.16 -1.72*
Cantonal unemployment rate in % 3.68 3.68 -9.23 0.29 0.44
*French 0.69 0.71 -33.93 -1.02 1.50
*Italian 0.27 0.29 -16.61 -1.10 -1.62
GDP per capita in the canton in CHF 10,000 4.98 4.98 15.75 -0.18 -0.26
Additional variables not used in propensity score
Qualification: some degree 0.56 0.56 5.19 -0.09 -0.13
Employability rating: high 0.10 0.10 -3.62 -1.19 -1.76*
Lives in small city 0.69 0.69 -3.64 -0.05 -0.07
Number of people in household 0.22 0.22 -2.75 0.31 0.45
Same gender 0.60 0.59 0.18 1.99 2.91***
Age caseworker - age unemployed 8.72 8.55 -4.05 0.99 1.45
Same age +/- 5 years 0.24 0.23 1.94 0.95 1.38
Both primary education 0.70 0.71 -8.18 -2.41 -3.49***
Both secondary education 0.30 0.31 -4.45 -1.29 -1.89*
Both upper secondary education 0.40 0.41 -8.18 -1.61 -2.35**
Both tertiary education 0.55 0.56 -3.41 -1.73 -2.53**
Same gender, age, and education 0.03 0.03 -1.89 -1.61 -2.40**
German speaking employment office 0.76 0.76 39.67 1.04 1.52
Number of observations trimmed 584 6,246
Number of observations used in estimation 12,414 65,941

Note: Means, differences, standardized differences, and t-statistic of two-sided t-test after inverse probability weighting
and common support adjustment. *, **, *** means statistically different from zero at the 10%, 5%, 1% level,
respectively.
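
The reweighted means in the first two columns can be thought of as inverse-probability-weighted averages. The sketch below is an illustration under assumptions: it uses normalised weights $D/p(X)$ for participants and $(1-D)/(1-p(X))$ for non-participants, which reweight both groups to the full population. Whether the paper uses exactly these weights is not stated in this appendix. The function `ipw_weighted_means` and the toy data are hypothetical.

```python
def ipw_weighted_means(x, treated, pscore):
    """Normalised IPW means of covariate x for participants and non-participants."""
    w1 = [d / p for d, p in zip(treated, pscore)]              # weights for D = 1
    w0 = [(1 - d) / (1 - p) for d, p in zip(treated, pscore)]  # weights for D = 0
    mean1 = sum(w * v for w, v in zip(w1, x)) / sum(w1)
    mean0 = sum(w * v for w, v in zip(w0, x)) / sum(w0)
    return mean1, mean0

# Toy data: a binary covariate that drives participation.
# x = 1: 3 of 4 participate (propensity 0.75); x = 0: 1 of 4 participates (0.25).
x =       [1, 1, 1, 1, 0, 0, 0, 0]
treated = [1, 1, 1, 0, 1, 0, 0, 0]
pscore =  [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]

raw1 = sum(v for v, d in zip(x, treated) if d == 1) / sum(treated)
raw0 = sum(v for v, d in zip(x, treated) if d == 0) / (len(treated) - sum(treated))
print(raw1, raw0)                                # unweighted means differ
print(ipw_weighted_means(x, treated, pscore))    # weighted means coincide
```

In the toy data the unweighted covariate means differ between the groups, while the weighted means coincide at the population share of `x`, which is the pattern the pre- and post-weighting standardized differences in the table summarise.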

Common support is enforced by trimming observations with a propensity score below the 0.5% quantile of the participants or above the 99.5% quantile of the non-participants. Total number of trimmed observations: 6,767 (579 participants, 6,188 non-participants).
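
The trimming rule can be sketched as follows. This is an illustration under an assumed reading of the rule: the lower threshold is a low quantile of the participants' propensity scores, the upper threshold is a high quantile of the non-participants' scores, and observations outside the interval are dropped. The helper names `quantile` and `common_support_mask`, the linear-interpolation quantile definition, and the toy data are assumptions.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def common_support_mask(pscore, treated, lower_q=0.005, upper_q=0.995):
    """True for observations kept after trimming, False for trimmed ones."""
    p_treated = sorted(p for p, d in zip(pscore, treated) if d == 1)
    p_control = sorted(p for p, d in zip(pscore, treated) if d == 0)
    lower = quantile(p_treated, lower_q)   # lower threshold from participants
    upper = quantile(p_control, upper_q)   # upper threshold from non-participants
    return [lower <= p <= upper for p in pscore]

# Toy data; with lower_q=0 and upper_q=1 the thresholds are the participants'
# minimum and the non-participants' maximum propensity score.
pscore = [0.01, 0.2, 0.3, 0.5, 0.7, 0.9, 0.99]
treated = [0, 1, 0, 1, 0, 1, 1]
keep = common_support_mask(pscore, treated, lower_q=0.0, upper_q=1.0)
print(keep)
```

Trimming on both sides removes non-participants with propensity scores lower than almost all participants and participants with scores higher than almost all non-participants, so that the inverse probability weights stay bounded.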

Figure D.1: Distribution of the propensity score

Note: Dashed lines show the lower and upper threshold of trimming.

Appendix E: Additional outcome variables
Table E.1: Coefficients of selected variables for long-term outcomes in prediction sample

Columns: Cumulated employment after registration in months 1-31 (Coef., S.E.) and in months 25-31 (Coef., S.E.)
Job search program (treatment) -1.37*** 0.32
Female * CW education: above vocational training 0.07 0.54
Unskilled * CW education: above vocational training -1.24 1.56
Unskilled * prev. job unskilled 4.66*** 1.58
# of unemployment spells in last two years * both primary education 0.29 0.20
Fraction of months employed in last two years * past income 57 - 75k 0.32 0.63
GDP per capita * prev. job self-employed 4.56 5.60
CW education: above vocational training * past income 25 - 50k 1.09* 0.66
CW education: tertiary track * past income 25 - 50k 0.21 0.84
Degree in vocational training for caseworkers * past income 50 - 75k -0.70 0.81
Married * 50 - 75k -0.37 0.70
[Months 25-31: no heterogeneity detected]
Foreigner with C permit * 50 - 75k 0.32 0.76
Medium city * prev. job unskilled -0.48 0.94
Single household * zero employment spell last two years -0.24 0.56
Past income 0 - 25k * # employment spells past five years 3.60 2.47
# employment spells past five years * both primary education -1.41 1.87
# employment spells past five years * same gender, age, and education -6.34 5.93
Zero employment spell last two years * skilled worker -0.56 0.53
Prev. job in primary sector * unskilled -0.20 1.30
Unskilled * both primary education 0.09 0.56
Employment agency 44 -0.98 1.27
# of selected variables 20 of 1,268 0 of 1,268
# significant at 10% 2 0
# significant at 5% 1 0
# significant at 1% 1 0
# of caseworkers 1,282
# of observations 39,216
Note: *, **, *** mean statistically different from zero at the 10%, 5%, 1% level, respectively. Standard errors clustered at caseworker level.

Figure E.1: Heterogeneity for different time horizons, individual characteristics: 1-12 months

Figure E.2: Heterogeneity for different time horizons, individual characteristics: 1-31 months

Figure E.3: Heterogeneity for different time horizons, individual characteristics: 25-31 months

Figure E.4: Heterogeneity for different time horizons, caseworker characteristics: 1-12 months

Figure E.5: Heterogeneity for different time horizons, caseworker characteristics: 1-31 months

Figure E.6: Heterogeneity for different time horizons, caseworker characteristics: 25-31 months

References
Abbring, J.H. and van den Berg, G.J. (2003): The Non-Parametric Identification of Treatment Effects in
Duration Models, Econometrica, 71, 1491-1517.

Abbring, J.H. and van den Berg, G.J. (2004): Analyzing the Effect of Dynamically Assigned Treatments using
Duration Models, Binary Treatment Models, and Panel Data Models, Empirical Economics, 29, 5-20.

Angrist, J.D., J.-S. Pischke (2010): The Credibility Revolution in Empirical Economics: How Better Research
Design is Taking the Con out of Econometrics, Journal of Economic Perspectives, 24 (2), 3-30.

Athey, S., G.W. Imbens (2016a): The Econometrics of Randomized Experiments, Working Paper,
arXiv:1607.00698.

Athey, S., G.W. Imbens (2016b): The State of Applied Econometrics - Causality and Policy Evaluation, Working
Paper, arXiv:1607.00699.

Athey, S., G.W. Imbens (2016c): Recursive Partitioning for Heterogeneous Causal Effects, Proceedings of the
National Academy of Sciences of the United States of America, 113 (27), 7353-7360.

Bhattacharya, D., P. Dupas (2012): Inferring Welfare Maximizing Treatment Assignment Under Budget
Constraints, Journal of Econometrics, 167 (1), 168-196.

Behncke, S., M. Frölich, M. Lechner (2010a): A Caseworker like Me - Does the Similarity between the
Unemployed and their Caseworkers Increase Job Placements? Economic Journal, 120 (549), 1430-1459.

Behncke, S., M. Frölich, M. Lechner (2010b): Unemployed and their Caseworkers: Should they be Friends or
Foes? Journal of the Royal Statistical Society, Series A, 173 (1), 67-92.

Bell, S., and L. Orr (2002): Screening (and Creaming?) Applicants to Job Training Programs: The AFDC
Homemaker Home Health Aide Demonstration, Labour Economics, 9(2), 279-302.

Belloni, A., V. Chernozhukov, C. Hansen (2013): Inference on Treatment Effects after Selection amongst High-
Dimensional Controls, Review of Economic Studies, 81 (2), 608-650.

Biewen, M., B. Fitzenberger, A. Osikominu, and M. Paul (2014): The Effectiveness of Public Sponsored Training
Revisited: The Importance of Data and Methodological Choices, Journal of Labor Economics, 32 (4), 837-
897.

Black, D.A., J.A. Smith, M.C. Berger, B.J. Noel (2003): Is the Threat of Reemployment Services more Effective
than the Services Themselves? Evidence from Random Assignment in the UI System, American Economic
Review, 93 (4), 1313-1327.

Blasco, S., M. Rosholm (2011): The Impact of Active Labour Market Policy on Post-Unemployment Outcomes:
Evidence from a Social Experiment in Denmark, IZA Discussion Paper, No. 5631.

Blundell, R., M.C. Dias, C. Meghir, J.V. Reenen (2004): Evaluating the Employment Impact of a Mandatory Job
Search Program, Journal of the European Economic Association, 2 (4), 569-606.

Breiman, L. (1996): Bagging Predictors, Machine Learning, 24 (2), 123-140.

Bühlmann, P., S. van de Geer (2011): Statistics for High-Dimensional Data: Methods, Theory and Applications,
Springer, Heidelberg.

Bundesamt für Statistik (2016): Gross Domestic Product per Capita.

Card, D., J. Kluve, A. Weber (2010): Active Labour Market Policy Evaluations: A Meta Analysis, Economic
Journal, 120 (548), F452-F477.

Card, D., J. Kluve, A. Weber (2015): What Works? A Meta Analysis of Recent Active Labor Market Program
Evaluations, IZA Discussion Paper, No. 9236.

Casey, K., R. Glennerster, E. Miguel (2012): Reshaping Institutions: Evidence on Aid Impacts Using a Pre-
analysis Plan, Quarterly Journal of Economics, 127 (4), 1755-1812.

Chen, S., L. Tian, T. Cai, M. Yu (2017): A General Statistical Framework for Subgroup Identification and
Comparative Treatment Scoring, Biometrics, forthcoming.

Ciarleglio, A., E. Petkova, R.T. Ogden, T. Tarpey (2015): Treatment Decisions Based on Scalar and Functional
Baseline Covariates, Biometrics, 71, 884-894.

Cottier, L., P. Kempeneers, Y. Flückiger, R. Lalive (2017): Does Intensive Job Search Assistance Help Job
Seekers Find and Keep Jobs?, mimeo.

Cox, D. (1975): A Note on Data-Splitting for the Evaluation of Significance Levels, Biometrika, 62 (2), 441-
444.

Crépon, B., E. Duflo, M. Gurgand, R. Rathelot, P. Zamora (2013): Do Labor Market Policies Have Displacement
Effects? Evidence from a Clustered Randomized Experiment, Quarterly Journal of Economics, 128 (2), 531-580.

Crépon, B., G. van den Berg (2016): Active Labor Market Policies, IZA Discussion Paper, No. 10321.

Fithian, W., D. Sun, J. Taylor (2015): Optimal Inference After Model Selection, Working Paper,
arXiv:1410.2597.

Dehejia, R. (2005): Program Evaluation as a Decision Problem, Journal of Econometrics, 125 (1-2), 141-173.

Dolton, P., D. O'Neill (2002): The Long-Run Effects of Unemployment Monitoring and Work-Search Programs:
Experimental Evidence from the United Kingdom, Journal of Labor Economics, 20 (2), 381-403.

Fredriksson, P. and Johansson, P. (2008): Dynamic Treatment Assignment - The Consequences for Evaluations
using Observational Data, Journal of Business and Economic Statistics, 26, 435-445.

Foster, J.C., J.M.G. Taylor, S.J. Ruberg (2011): Subgroup Identification from Randomized Clinical Trial Data,
Statistics in Medicine, 30 (24), 2867-2880.

Gautier, P., P. Muller, B. van der Klaauw, M. Rosholm, M. Svarer (2017): Estimating Equilibrium Effects of Job
Search Assistance, Working Paper.

Gerfin, M., M. Lechner (2002): A Microeconometric Evaluation of Active Labour Market Policy in Switzerland,
Economic Journal, 112 (482), 854-893.

Graversen, B.K., J.C. Van Ours (2008): How to Help Unemployed Find Jobs Quickly: Experimental Evidence
from a Mandatory Activation Program, Journal of Public Economics, 92 (10-11), 2020-2035.

Green, D.P., H.L. Kern (2012): Modelling Heterogeneous Treatment Effects in Survey Experiments with
Bayesian Additive Regression Trees, Public Opinion Quarterly, 76 (3), 491-511.

Grimmer, J., S. Messing, S.J. Westwood (2016): Estimating Heterogeneous Treatment Effects and the Effects of
Heterogeneous Treatments with Ensemble Methods, Working Paper.

Hastie, T., R. Tibshirani, M. Wainwright (2015): Statistical Learning with Sparsity: The Lasso and its
Generalisations, CRC Press.

Heckman, J. and Navarro, S. (2007): Dynamic Discrete Choice and Dynamic Treatment Effects, Journal of
Econometrics, 136, 341-396.

Hirano, K., G.W. Imbens, G. Ridder (2003): Efficient Estimation of Average Treatment Effects Using the
Estimated Propensity Score, Econometrica, 71 (4), 1161-1189.

Hoerl, A.E., and R.W. Kennard (1970): Ridge Regression: Biased Estimation for Nonorthogonal Problems,
Technometrics, 12 (1), 55-67.

Horowitz, J. L. (2015): Variable Selection and Estimation in High-Dimensional Models, Canadian Journal of
Economics, 48 (2), 389-407.

Huber, M., M. Lechner, G. Mellace (2017): Why Do Tougher Caseworkers Increase Employment? The Role of
Programme Assignment as a Causal Mechanism, Review of Economics and Statistics, 99 (1), 180-183.

Imai, K., A. Strauss (2011): Estimation of Heterogeneous Treatment Effects from Randomized Experiments, with
Applications to the Optimal Planning of the Get-Out-of-the-Vote Campaign, Political Analysis, 19 (1), 1-19.

Imai, K., M. Ratkovic (2013): Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation,
Annals of Applied Statistics, 7 (1), 443-470.

Imbens, G.W., J.M. Wooldridge (2009): Recent Developments in the Econometrics of Program Evaluation,
Journal of Economic Literature, 47 (1), 5-86.

Lalive, R., J.C. van Ours, J. Zweimüller (2008): The Impact of Active Labor Market Programs on the Duration
of Unemployment, Economic Journal, 118 (525), 235-257.

Lan, W., P. Zhong, R. Li, H. Wang, C. Tsai (2016): Testing a Single Regression Coefficient in High Dimensional
Linear Models, Journal of Econometrics, 195 (1), 154-168.

Lechner, M. (1999): Earnings and Employment Effects of Continuous Off-the-job Training in East Germany after
Unification, Journal of Business and Economic Statistics, 17, 74-90.

Lechner, M. (2009): Sequential Causal Models for the Evaluation of Labor Market Programs, Journal of Business
and Economic Statistics, 27, 71-83.

Lechner, M., J.A. Smith (2007): What is the Value Added by Caseworkers? Labour Economics, 14 (2), 135-
151.

Lechner, M., C. Wunsch (2013): Sensitivity of Matching-Based Program Evaluations to the Availability of
Control Variables, Labour Economics, 21 (C), 111-121.

List, J., A. Shaikh, Y. Xu (2016): Multiple Hypothesis Testing in Experimental Economics, NBER Working
Paper No. 21875.

Meyer, B.D. (1995): Lessons from the US Unemployment Insurance Experiments, Journal of Economic
Literature, 33 (1), 91-131.

Morlok, M., D. Liechti, R. Lalive, A. Osikominu, J. Zweimüller (2014): Evaluation der arbeitsmarktlichen
Massnahmen: Wirkung auf Bewerbungsverhalten und -chancen, SECO Publikationen, Arbeitsmarktpolitik
No. 41.

Olken, B. (2015): Promises and Perils of Pre-Analysis Plans, Journal of Economic Perspectives, 29 (3), 61-80.

Qian, M., S.A. Murphy (2011): Performance Guarantees for Individualized Treatment Rules, Annals of
Statistics, 39, 11-80.

Robins, J.M. (1986): A new approach to causal inference in mortality studies with sustained exposure periods -
application to control of the healthy worker survivor effect, Mathematical Modelling, 7, 1393-1512, with
1987 Errata to "A new approach to causal inference in mortality studies with sustained exposure periods -
application to control of the healthy worker survivor effect", Computers & Mathematics with Applications, 14,
917-921; 1987 Addendum to "A new approach to causal inference in mortality studies with sustained exposure
periods - application to control of the healthy worker survivor effect", Computers & Mathematics with
Applications, 14, 923-945; and 1987 Errata to "Addendum to 'A new approach to causal inference in mortality
studies with sustained exposure periods - application to control of the healthy worker survivor effect'",
Computers & Mathematics with Applications, 18, 477.

Rosholm, M. (2008): Experimental Evidence on the Nature of the Danish Employment Miracle, IZA Discussion
Paper No. 3620.

Rubin, D.B. (1974): Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies,
Journal of Educational Psychology, 66 (5), 688-701.

SECO, State Secretariat for Economic Affairs (2017): Die Lage auf dem Arbeitsmarkt im Februar 2017.

Sianesi, B. (2004): An Evaluation of the Swedish System of Active Labour Market Programmes in the 1990s,
Review of Economics and Statistics, 86, 133-155.

Signorovitch, J.E. (2007): Identifying Informative Biological Markers in High-Dimensional Genomic Data and
Clinical Trials, PhD thesis, Harvard University.

Taddy, M., M. Gardner, L. Chen, D. Draper (2015): A Nonparametric Bayesian Analysis of Heterogeneous
Treatment Effects in Digital Experimentation, Working Paper, arXiv:1412.8563.

Tian, L., A.A. Alizadeh, A.J. Gentles, R. Tibshirani (2014): A Simple Method for Estimating Interactions
Between a Treatment and a Large Number of Covariates, Journal of the American Statistical Association, 109
(508), 1517-1532.

Tibshirani, R. (1996): Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society,
Series B, 58 (1), 267-288.

Van den Berg, G.J, A. Bergemann, M. Caliendo (2009): The Effect of Active Labor Market Programs on Not-
Yet Treated Unemployed Individuals, Journal of the European Economic Association, Papers and
Proceedings, 7 (2-3), 606-616.

Van den Berg, G.J., B. van der Klaauw (2006): Counseling and Monitoring of Unemployed Workers: Theory and
Evidence from a Controlled Social Experiment, International Economic Review, 47 (3), 895-936.

Vansteelandt, S., T.J. VanderWeele, E.J. Tchetgen, J.M. Robins (2008): Multiply Robust Inference for Statistical
Interactions, Journal of the American Statistical Association, 103, 1693-1704.

Vikström, J., M. Rosholm, M. Svarer (2013): The Relative Efficiency of Active Labour Market Policies: Evidence
from a Social Experiment and Non-Parametric Methods, Labour Economics, 24, 58-67.

Wunsch, C., M. Lechner (2008): What Did All the Money Do? On the General Ineffectiveness of Recent West
German Labour Market Programmes, Kyklos, 61 (1), 134-174.

Xu, Y., M. Yu, Y.-Q. Zhao, Q. Li, S. Wang, J. Shao (2015): Regularized Outcome Weighted Subgroup
Identification for Differential Treatment Effects, Biometrics, 71 (3), 645-653.

Zhang, B., A.A. Tsiatis, E.B. Laber, M. Davidian (2012): A Robust Method for Estimating Optimal Treatment
Regimes, Biometrics, 68, 1010-1018.

Zhao, Y., D. Zeng, E.B. Laber, M.R. Kosorok (2015): New Statistical Learning Methods for Estimating Optimal
Dynamic Treatment Regimes, Journal of the American Statistical Association, 110 (510), 583-598.

Zhao, Y., D. Zeng, A.J. Rush, M.R. Kosorok (2012): Estimating Individualized Treatment Rules using Outcome
Weighted Learning, Journal of the American Statistical Association, 107, 1106-1118.

Zou, H. (2006): The Adaptive Lasso and its Oracle Properties, Journal of the American Statistical Association,
101 (476), 1418-1429.
