Games and Economic Behavior 79 (2013) 132–147

The value of recommendations

Jeanine Miklós-Thal a,*, Heiner Schumacher b,1
a Simon Graduate School of Business, University of Rochester, Carol Simon Hall 3-141, Rochester, NY 14610, USA
b Goethe-University Frankfurt, Grueneburgplatz 1, Rechts- und Wirtschaftswissenschaften, DE-60323 Frankfurt am Main, Germany
Article info

Article history:
Received 20 October 2010
Available online 29 January 2013

JEL classification: C73, D83, L15

Keywords: Repeated games; Moral hazard; Imperfect monitoring; Information sellers

Abstract

Many markets without repeated seller–buyer relations feature third-party monitors that
sell recommendations. We analyze the profit-maximizing recommendation policies of
such monitors. In an infinitely repeated game with seller moral hazard and short-lived
consumers, a monopolistic monitor with superior information about the seller's past effort
decisions sells recommendations about the seller to consumers. We show that the monitor
has an incentive to make its recommendations hard to predict, which in general leads to
inefficient effort provision by the seller. These results hold under perfect and imperfect
monitoring and in a variety of informational settings. When there are multiple competing
sellers, the conflict between the monitor's profit-maximization objective and efficient effort
provision is mitigated.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Third party recommendations influence the trade of many goods and services. When selecting restaurants or hotels,
many consumers follow the recommendations of guidebooks. Durable good purchase decisions are often guided by the
recommendations of consumer magazines. Some sellers of recommendations are very influential. Lonely Planet tourist guides
for instance are the preferred choice of many younger travelers. As anybody who ever visited one of the restaurants or
hostels featured in a Lonely Planet guide can attest, this strong position in the guidebook market translates into substantial
power in channeling travelers to certain businesses. Similarly, many consumers rely on expensive gourmet guides, such as
the Michelin Guide or Gault Millau, to decide where to dine.
A seller of recommendations (henceforth called "monitor") can play an important role in alleviating moral hazard
problems. When consumers interact only once with a firm, like tourists in a foreign city, they usually know very little about
the firm's track record. As is well known, this implies that the firm will lack the incentive to exert costly effort to provide
quality. A monitor who has information about past outcomes can potentially solve this incentive problem by rewarding
good outcomes with positive recommendations and punishing poor outcomes with negative recommendations. However,
as our analysis will show, a profit-maximizing monitor does not generally want to adopt a recommendation policy that
induces efficient effort by the firms it recommends. Concerned with selling its advice profitably, the monitor instead wants

We are grateful to the Editor, two anonymous referees, Matthias Blonski, Tore Ellingsen, Guido Friebel, Thomas Gall, Philipp Kircher, Michael Kosfeld,
Timofiy Mylovanov, Michael Raith, Ernst-Ludwig von Thadden, and audiences at the University of Frankfurt, the SFB/TR 15 Conference 2008, and the 2009
EEA Meeting for helpful comments. The usual disclaimer applies.
* Corresponding author. Fax: +1 585 273 1140.
E-mail addresses: jeanine.miklos-thal@simon.rochester.edu (J. Miklós-Thal), heiner.schumacher@econ.uni-frankfurt.de (H. Schumacher).
1 Fax: +49 798 35021.
0899-8256/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.geb.2013.01.005
to generate the right amount of unpredictability about its recommendations, as inducing an easy-to-predict action would
destroy the value of the monitor's advice to consumers.
To model this conflict between the monitor's profit-maximization objective and efficiency, we analyze the infinitely
repeated interaction between a long-lived monitor, a long-lived firm, and an infinite sequence of short-lived consumers.
In each period, the firm decides whether to exert costly effort, and the current consumer decides whether to trade with
the firm. The value of trade is positive if and only if the firm exerts effort. Our model departs from the canonical repeated
game in that consumers cannot observe the outcomes of past transactions (they observe past recommendations though).
Thus, without the monitor who does observe past outcomes, there would be no trade in equilibrium. In each period, the
monitor sells a guide to consumers that contains a recommendation of the form "I (do not) recommend buying from the
firm." The monitor's recommendation policy, set at the beginning of the game, determines the current recommendation
based on the history of the game including past outcomes of trade. We allow for imperfect monitoring of the firm's effort
by the monitor.
Since we are interested in the impact of the monitor's profit maximization objective, our analysis focuses on equilibria
that maximize the monitor's payoff. A consumer has positive willingness to pay for the monitor's guide only if it improves
her ability to predict the firm's effort and thus make a well-informed trading decision. Consider the perfect monitoring case.
If the discount factor is high enough, the firm is willing to always follow the monitor's recommendation (i.e., exert effort
if and only if it is recommended) provided consumers trade with the firm only if it is recommended and a bad outcome
triggers the loss of all future business. The monitor then maximizes the value of its guide by recommending the firm with
a probability strictly below 1, although efficiency would call for trade and effort provision in every period. Intuitively, to sell
its guide at a high price, the monitor wants to induce a hard-to-predict action. This creates a need for randomization on
the equilibrium path, which leads to inefficiency.
The conflict between the value of the monitor's guide and (constrained) efficient effort provision persists under imperfect
monitoring. For the value of the guide to reach its maximum in all periods, the recommendation policy must be such
that all consumer generations predict the same recommendation probability at some level strictly below 1. The monitor
hence needs to make past recommendations uninformative for consumers, while punishing bad outcomes (which now
happen on the equilibrium path) with phases of negative recommendations to induce effort incentives. We construct such
a recommendation policy and show that if the firm is sufficiently patient and monitoring close enough to perfect, there
is an equilibrium in which all consumers purchase the guide at the highest possible price. Moreover, if there are some
"commitment consumers" who trade if and only if the monitor recommends the firm, then for high enough discount factors
and almost perfect monitoring there are no alternative equilibria in which the monitor earns less. However, as in the perfect
monitoring case, any equilibrium in which the monitor earns maximum profits is inefficient because trade takes place too
infrequently.
The model lends itself to several extensions. First, we consider various alternative information structures. We show that
if consumers do not observe past recommendations and have little knowledge about which period they are born into, then
an equilibrium in which the value of the guide attains its upper bound exists for a larger range of discount factors than
in the baseline model. Uncertainty about the firm's behavior can now be generated by means of long phases of negative
recommendations after bad outcomes instead of recommendation probabilities below 1 after good outcomes. This implies
that the firm has stronger effort incentives, which expands the scope for equilibria in which the monitor can achieve
maximum profits. Another alternative information structure is that, as in the canonical repeated game, consumers have
the same (potentially imperfect) information about past effort decisions as the monitor, for instance thanks to word-of-mouth
communication between consumers. We show that for high enough discount factors and good enough monitoring
technology, there still exist inefficient equilibria in which the monitor earns positive profits by acting as a correlating
device for the firm and consumers.
Second, we extend the model to situations in which consumers choose between multiple firms. If consumers face an
outside option of known value (for instance, McDonald's instead of a local restaurant), the maximum value of the guide
may increase or decrease. In either case, welfare in an equilibrium in which the guide's value is maximized increases with
the outside option because the profit-maximizing recommendation probability becomes higher. Another possibility is that
consumers can choose between multiple ex ante identical firms, which are all included in the monitor's guide. With two
firms, an optimal recommendation policy for the monitor is such that one and only one of the firms is recommended in
every period and consumers predict that each firm is equally likely to be the chosen one in any given period. Consumers
then value the guide because it allows them to choose the right firm. Since one of the firms exerts effort in every period,
the conflict between the monitor's profit-maximization objective and welfare is fully resolved in the special case of
undifferentiated firms.
Our results have implications for a wide range of situations in which decisions are influenced by a third party with
superior information but no immediate stake in the outcome. Consumers are willing to pay more for information provided
by product reviewers, investment advisors, political journalists, and other recommendation sellers, the more it improves
their ability to make good (product adoption, investment, or election and campaign financing) decisions. The third party
that sells the guide is therefore better off if consumers are more uncertain, and may want to exacerbate such uncertainty
where possible. Whenever efficiency calls for little or no variation in decisions over time, the incentive to make information
valuable to consumers is therefore likely to clash with efficiency.
Fig. 1. Intra-period timing.
The remainder of the paper is organized as follows. Section 2 describes and discusses the framework. Section 3 contains
the analysis of the baseline model. Section 4 deals with alternative information structures. Section 5 extends the model to
multiple firms. The literature review is relegated to Section 6. Section 7 concludes. Appendix A contains all proofs of results
in the baseline model (Section 3). The proofs of results in the model extensions (Sections 4 and 5) are relegated to the
Online Appendix.
2. The model
Framework. Time is discrete and indexed by $t \in \{0, 1, \ldots\}$. We consider an infinitely repeated game with a long-lived
monitor M, a long-lived firm F, and infinitely many consumers $C_1, C_2, \ldots$, where $C_t$ is alive in period $t$ only.
At the beginning of each period $t \geq 1$, $C_t$ can buy a guide from M. For each $t \geq 1$, the guide contains a recommendation $s_t$
that can take on two values, 0 or 1, where $s_t = 1$ ($s_t = 0$) signifies "I (do not) recommend buying from F."$^{2}$ $C_t$ decides
whether to purchase the guide ($a_t = 1$) or not ($a_t = 0$) and observes $s_t$ if and only if $a_t = 1$. F always observes $s_t$. The price
of the guide is denoted by $q$, and M's marginal cost of selling the guide is normalized to zero.
Next, $C_t$ decides whether to buy from F ($b_t = 1$) or not ($b_t = 0$), and F simultaneously decides whether to exert effort
($e_t = 1$) or not ($e_t = 0$). The cost of effort for F is $c \in (0, 1)$, and the price of F's good is $p \in (c, 1)$. If trade occurs and F
exerts effort in period $t$, then $C_t$'s payoff is $1 - p$ and F's period payoff is $p - c$. If trade occurs and F does not exert effort
in period $t$, then $C_t$'s payoff is $-p$ and F's period payoff is $p$. If $C_t$ does not buy from F, both receive a payoff of zero.
At the end of each period $t \geq 1$, M observes a signal about F's effort. M's monitoring technology may be imperfect.
If trade occurs in period $t$ and F exerts effort, M erroneously observes a bad outcome ($y_t = 0$) with probability $\varepsilon \in [0, 1)$,
and a good outcome ($y_t = 1$) with probability $1 - \varepsilon$. If no trade occurs and/or F does not exert effort, M observes a bad
outcome ($y_t = 0$) with probability 1. To simplify the exposition, it is assumed that $(1 - \varepsilon)p > c$.$^{3}$
Let $h^M_1 = \emptyset$ and denote by $h^M_t = ((s_1, y_1, b_1), \ldots, (s_{t-1}, y_{t-1}, b_{t-1}))$ the M-history at $t \geq 2$. Let $H^M$ be the set of all
finite M-histories. In period 0, M commits to a price $q$ and a recommendation policy $P : H^M \to [0, 1]$, where $P(h^M_t)$ is the
probability that M sets $s_t = 1$ when the M-history in period $t$ is given by $h^M_t$. F and all consumers observe $(q, P)$. Let $\mathcal{P}$ be
the set of all recommendation policies.
Fig. 1 provides an overview of the sequence of events in any period $t \geq 1$. First, M sets its recommendation $s_t$. Second,
$C_t$ decides whether to purchase M's guide. Third, $C_t$ decides whether to buy from F and F decides whether to exert
effort. Fourth, M observes $y_t$.
Strategies and equilibrium. We now define strategies for F and consumers. Let $h^C_1 = \emptyset$ and denote by $h^C_t = (s_1, \ldots, s_{t-1})$ the
C-history at $t \geq 2$. Let $H^C$ be the set of all finite C-histories. Let $\Delta(\{0, 1\})$ be the set of mixed actions if the pure actions
are 0 and 1. $C_t$'s strategy is given by $\sigma_t = (\sigma_t^a, \sigma_t^b)$, where $\sigma_t^a : \mathcal{P} \times [0, 1] \times H^C \to \Delta(\{0, 1\})$ determines $a_t$ as a function of $P$,
$q$ and $h^C_t$, and $\sigma_t^b : \mathcal{P} \times [0, 1] \times H^C \times \{\emptyset, 0, 1\} \to \Delta(\{0, 1\})$ determines $b_t$ as a function of $P$, $q$, $h^C_t$ and $s_t$ (conditional on $C_t$
having purchased the guide). Denote $\sigma_C = \{\sigma_t\}_{t=1}^{\infty}$.
Let $h^F_1 = \emptyset$ and denote by $h^F_t = ((s_1, e_1, y_1, b_1), \ldots, (s_{t-1}, e_{t-1}, y_{t-1}, b_{t-1}))$ the F-history at $t \geq 2$. Let $H^F$ be the set
of all finite F-histories. F's strategy $\sigma_F : \mathcal{P} \times [0, 1] \times H^F \times \{0, 1\} \to \Delta(\{0, 1\})$ determines $e_t$ as a function of $P$, $q$, $h^F_t$ and
$s_t$. An outcome $\omega \in (\{0, 1\}^5)^{\infty}$ takes on the form $\omega = ((s_1, e_1, y_1, a_1, b_1), (s_2, e_2, y_2, a_2, b_2), \ldots)$. A strategy profile
$\sigma = (P, q, \sigma_F, \sigma_C)$ induces a probability measure on the set of outcomes $\Omega$, denoted by $Q^{\sigma}(\cdot)$. We say that consumers
always follow M's recommendation if $Q^{\sigma}(a_t = 1) = Q^{\sigma}(b_t = s_t) = 1$ for all $t \geq 1$. F always follows M's recommendation
(follows M's recommendation in period $t$) if $Q^{\sigma}(e_t = s_t) = 1$ for all $t \geq 1$ (in period $t$).
The long-lived players F and M maximize the sum of their expected discounted payoffs using the common discount
factor $\delta \in (0, 1)$, while $C_t$ maximizes her payoff in period $t$. We define welfare as the normalized sum of expected discounted
payoffs from trade between F and consumers. The equilibrium concept is subgame perfection.
2 Since consumers have only two actions available and we will focus on equilibria in which consumers play pure strategies, allowing M to use more than
two different recommendations would not affect our results.
3 This assumption guarantees a strictly positive upper bound on welfare in the canonical repeated game that will serve as a benchmark in the later
analysis.
Discussion. A discussion of several key model assumptions is in order. To simplify, we assume that M is able to commit to a
recommendation policy. This assumption, which is common in the literature on intermediaries (see Admati and Pfleiderer,
1986, 1988; Lizzeri, 1999; Albano and Lizzeri, 2001), allows us to exclude collusion between F and M. One way to endogenize
M's commitment power would be to introduce long-term relationships between consumers and M. If consumers
repeatedly purchase a guide from M (for instance, travel guides for different destinations), they can punish M if F delivers
a disappointing outcome. The commitment assumption can hence be viewed as a short-cut for modeling M's reputation.
In the equilibria we will focus on, consumers' willingness to pay for M's guide is constant over time. Hence, the assumption
that M sets the price $q$ at the beginning of the game and cannot change it later merely economizes on notation.
Similarly, the assumption that all consumers observe $P$ is not important for the results, because in equilibrium they must
have correct beliefs about $P$. An exception is Proposition 2, where we refine the equilibrium set to those equilibria in
which M earns maximum profits.
Another simplifying assumption is that M can use a recommendation policy with randomization. Alternatively, one could
assume that the quality of the traded good can take on any value in the unit interval. M could then condition the current
recommendation on past observed quality such that no randomization is required.
The intra-period timing we adopt is natural for markets with anonymous consumers, such as gourmet or travel guides.
While the qualitative results would remain unchanged if the decisions of $C_t$ and F were sequential instead of simultaneous,
it is crucial that M sets $s_t$ before $C_t$'s decision whether to purchase M's guide. Otherwise, M could easily force $C_t$ to
purchase its guide by recommending F to $C_t$ if and only if $C_t$ buys the recommendation.
Finally, one may question the need for a monitor in a world where consumers can share their consumption experiences
in online consumer blogs and leave feedback on public websites. There are several reasons why such communication is
unlikely to fully replace independent sellers of guides. First, consumers typically do not receive any compensation for online
reviews, so they lack extrinsic motivation to share their experiences. Second, firms have incentives to manufacture positive
reviews for themselves, thereby making online information unreliable (see Mayzlin et al., 2012, for empirical evidence of
online review manipulation). Third, consumer ratings are often biased because consumers with extreme (especially very
negative) experience are more prone to posting reviews than others (Anderson, 1998). Most of our analysis therefore assumes
that consumers cannot observe past outcomes of trade. In Section 4.2, however, we will analyze the case in which
consumers observe the same (potentially imperfect) information about past outcomes as the monitor, for instance thanks
to communication with earlier generations. We show that there still can be equilibria in which M makes positive profits
because it acts as a correlating device for F and consumers.
3. The value of recommendations and efficiency
This section starts with the case of perfect monitoring to illustrate the key trade-offs in our model. We then derive
formal results for the case of imperfect monitoring. In particular, we will derive an upper bound on the equilibrium value
of the guide and characterize conditions for the existence of an equilibrium in which M earns this maximum profit in
every period. We then show that, for high enough discount factors, this is the unique equilibrium outcome if a fraction of
consumers are commitment consumers who always follow M's recommendation.
3.1. Perfect monitoring
Before we discuss equilibria in which M sells a guide to consumers, consider the benchmark "canonical repeated game"
in which there is no M and consumers observe past outcomes. Under perfect monitoring ($\varepsilon = 0$), this game has an efficient
equilibrium with effort and trade in every period if and only if $\delta \geq \frac{c}{p}$. F's effort incentives are maximized if consumers play
the following grim trigger strategy: trade with F as long as the outcome of trade has been good in all past periods and
stop trading for good if F ever deviates to no effort. F is then willing to exert effort if and only if the long-term loss of a
deviation exceeds its short-term gain:
$$\frac{\delta}{1 - \delta}(p - c) \geq c, \tag{1}$$
which holds whenever $\delta \geq \frac{c}{p}$.
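Now consider the following recommendation policy in our original game with M and short-lived consumers who cannot
observe past outcomes:
$$P^{\rho_0}\left(h^M_t\right) = \begin{cases} 0, & \text{if } s_\tau = b_\tau = 1 \text{ and } y_\tau = 0 \text{ for any } \tau \in \{1, \ldots, t - 1\}, \\ \rho_0, & \text{otherwise.} \end{cases} \tag{2}$$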
In words, M recommends F with probability $\rho_0$ as long as the outcome was good in every past period in which F was
recommended and trade occurred. Otherwise, M punishes F by never recommending it again.
It is easy to see that, as long as $\delta \geq \frac{c}{p}$, there again exists an efficient equilibrium with effort and trade in every period.
Suppose that M always recommends F after a good outcome ($\rho_0 = 1$) and that consumers always follow M's recommendation.
F is then willing to exert effort in a period $t$ with $s_t = 1$ if and only if condition (1) from above holds.$^{4}$ Moreover, if F
always follows M's recommendation, then it is rational for consumers who bought the guide to do the same. Finally, M is
willing to adopt the recommendation policy $P^{\rho_0}$ with $\rho_0 = 1$ and set $q = 0$, because F and consumers can credibly agree on
never trading with each other should M make different choices at $t = 0$. Hence, an efficient equilibrium again exists as long
as the discount factor (weakly) exceeds $\frac{c}{p}$. M is thus able to fully substitute for consumers' lack of knowledge about past
outcomes.
However, since F's action is the same in every period of this equilibrium, consumers' willingness to pay for M's guide
is 0. They are willing to buy M's guide at a strictly positive price only if it resolves some prior uncertainty about what F's
effort will be. M can generate such uncertainty by setting $\rho_0 \in (0, 1)$. Suppose that F always follows M's recommendation.
Then it is rational for a consumer who bought the guide to trade with F if and only if $s_t = 1$; hence, the consumer's
expected (gross) payoff from buying the guide is $\rho_0(1 - p)$. For a consumer who did not purchase the guide it is rational
to trade with F if and only if $\rho_0 - p \geq 0$. Consumers' willingness to pay for the guide hence becomes
$$\rho_0(1 - p) - \max\{0, \rho_0 - p\}, \tag{3}$$
which is strictly positive for any $\rho_0 \in (0, 1)$ and reaches a unique maximum $\bar{q} \equiv p(1 - p)$ at $\rho_0 = p$.$^{5}$
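As an illustration of expression (3), the following minimal Python sketch evaluates the willingness to pay on a grid of recommendation probabilities and locates its maximizer. The function names, the grid search and the value p = 0.6 are hypothetical choices made for this example only; `rho0` stands for the recommendation probability after good histories.

```python
def willingness_to_pay(rho0: float, p: float) -> float:
    """Expression (3): value of the guide when F always follows the recommendation."""
    payoff_with_guide = rho0 * (1.0 - p)       # buyer trades only when F is recommended
    payoff_without_guide = max(0.0, rho0 - p)  # non-buyer trades blindly only if worthwhile
    return payoff_with_guide - payoff_without_guide


def best_rho0(p: float, grid_size: int = 1001) -> float:
    """Grid search for the recommendation probability that maximizes (3)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda rho0: willingness_to_pay(rho0, p))


if __name__ == "__main__":
    p = 0.6  # hypothetical price of F's good
    rho_star = best_rho0(p)
    print(rho_star, willingness_to_pay(rho_star, p), p * (1.0 - p))
```

On the grid, the value rises linearly up to the price p and falls linearly afterwards, which is the kink produced by the max term in (3).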
There is an equilibrium in which all consumers pay the value in (3) for M's guide if F always follows M's recommendation
given that consumers do so. If $s_t = 0$, it is clearly optimal for F not to exert effort. If $s_t = 1$, F is willing to exert effort
if and only if
$$\frac{\delta}{1 - \delta}\rho_0(p - c) \geq c. \tag{4}$$
Since the long-term loss from a deviation to no effort is increasing in $\rho_0$, the inequality in (4) is easier to satisfy the
higher $\rho_0$. For $\rho_0 = p$, (4) holds whenever $\delta \geq \frac{c}{p(p - c) + c} > \frac{c}{p}$. Hence, for high enough discount factors, there is an equilibrium
in which the value of M's guide reaches the maximum value $\bar{q}$.
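The thresholds implied by the incentive constraints (1) and (4) can be compared with a short Python sketch. Rearranging (4) gives a smallest admissible discount factor of c / (rho0 (p - c) + c), which reduces to c / p when the recommendation probability is 1. The function names and the parameter values used in the demonstration are hypothetical.

```python
def min_discount_factor(c: float, p: float, rho0: float = 1.0) -> float:
    """Smallest delta with delta / (1 - delta) * rho0 * (p - c) >= c, i.e. condition (4)."""
    if not 0.0 < c < p < 1.0:
        raise ValueError("expected 0 < c < p < 1")
    if not 0.0 < rho0 <= 1.0:
        raise ValueError("expected 0 < rho0 <= 1")
    return c / (rho0 * (p - c) + c)


def effort_is_incentive_compatible(delta: float, c: float, p: float, rho0: float = 1.0) -> bool:
    """Direct check of condition (4); rho0 = 1 gives condition (1)."""
    return delta / (1.0 - delta) * rho0 * (p - c) >= c


if __name__ == "__main__":
    c, p = 0.2, 0.6  # hypothetical effort cost and price
    print("threshold with rho0 = 1:", min_discount_factor(c, p))
    print("threshold with rho0 = p:", min_discount_factor(c, p, rho0=p))
```

A lower recommendation probability raises the threshold, which is the tension between the value of the guide and effort incentives described in the text.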
The perfect monitoring case illustrates the fundamental conflict between M's profit-maximization objective and efficient
trade between F and consumers. For any $\delta > \frac{c}{p}$, M's preferred equilibrium is such that $\rho_0 < 1$ although an equilibrium in
which $\rho_0 = 1$ exists. Hence, welfare in M's preferred equilibrium is strictly less than $1 - c$, the highest level that could be
attained in equilibrium. M's goal to make its guide valuable for consumers also conflicts with effort incentives. Since F is
willing to exert effort after being recommended only if a deviation triggers a high enough long-term loss, any reduction in
the recommendation probability $\rho_0$ makes it more difficult to induce effort. For discount factors between $\frac{c}{p}$ and $\frac{c}{p(p - c) + c}$,
the incentive constraint in (4) hence puts a limit on the extent to which M can reduce $\rho_0$ in order to increase its profits.
3.2. Imperfect monitoring
We first provide an upper bound on the level of welfare that F and consumers could achieve without M's help if
consumers had the same (imperfect) information as M about the past. As before, we call this the canonical repeated game,
and the game with M the original game. Formally, in the canonical repeated game, $C_t$ observes $h^M_t$, and, because there is
no M, $q = 0$ and $P(h^M_t) = 0$ for all $h^M_t$.
Lemma 1. (1) The highest level of welfare that can be attained in an equilibrium of the canonical repeated game is
$\bar{W}(\varepsilon) = \left(1 - \frac{\varepsilon}{1 - \varepsilon}\frac{c}{p - c}\right)(1 - c)$. (2) If F and consumers always follow M's recommendation in an equilibrium of the original game, then welfare
is at most $\bar{W}(\varepsilon)$. (3) If $\delta > \frac{c}{p}$ and $\varepsilon$ is sufficiently close to 0, then the original game has an equilibrium in which F and consumers
always follow M's recommendation and welfare equals $\bar{W}(\varepsilon)$.
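A small Python sketch of the welfare bound in Lemma 1(1) may help to see how it varies with the monitoring noise. It writes the bound as (1 - eps/(1 - eps) * c/(p - c)) * (1 - c), with `eps` denoting the probability of an erroneous bad signal; the function name and the numbers are hypothetical. The bound is strictly positive exactly when (1 - eps) p > c, the assumption imposed in Section 2.

```python
def welfare_bound(eps: float, c: float, p: float) -> float:
    """Upper bound on equilibrium welfare in the canonical repeated game (Lemma 1(1))."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("expected 0 <= eps < 1")
    if not 0.0 < c < p < 1.0:
        raise ValueError("expected 0 < c < p < 1")
    return (1.0 - eps / (1.0 - eps) * c / (p - c)) * (1.0 - c)


if __name__ == "__main__":
    c, p = 0.2, 0.6  # hypothetical effort cost and price
    for eps in (0.0, 0.05, 0.2, 0.5):
        print(eps, welfare_bound(eps, c, p))
```

With perfect monitoring (eps = 0) the bound equals 1 - c, and it falls as eps grows.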
Under imperfect monitoring ($\varepsilon > 0$), effort provision no longer automatically translates into a good outcome. This implies
that punishments, i.e., periods with no trade, must occur on the equilibrium path. As the monitoring technology worsens
(higher $\varepsilon$), the maximum level of welfare that is attainable in an equilibrium of the canonical repeated game decreases.
In the proof of Lemma 1(1), we show that welfare reaches its maximum value only if F exerts effort in every period with
trade. Hence, in the original game, $\bar{W}(\varepsilon)$ remains the highest attainable welfare level if F and consumers always follow
M's recommendation on the equilibrium path. As shown below, the latter must hold in any equilibrium in which the value
4 Given that consumers always follow M's recommendation, it is obviously optimal for F to set $e_t = 0$ whenever $s_t = 0$.
5 Lemma 2 below shows that $\bar{q}$ is the maximal value of M's guide to consumers for any $\varepsilon \geq 0$ and without any prior restriction on the class of
recommendation policies.
of M's guide is maximized. We will therefore call an equilibrium of the original game "constrained efficient" if welfare
equals $\bar{W}(\varepsilon)$.$^{6}$
Lemma 1(3) shows that the original game has a constrained efficient equilibrium if the discount factor is high enough
and monitoring close enough to perfect. The reason is that F and consumers can force M to adopt a recommendation
policy $P^{*}$ that enables constrained efficient trade by playing the following strategy profile: If $P = P^{*}$ and $q = 0$, then both F
and consumers always follow M's recommendation; otherwise, they never trade with each other. It is then a best response
for M to indeed adopt $P^{*}$ and $q = 0$.
Under imperfect monitoring, consumers' willingness to pay for M's guide is strictly positive in at least some periods on a
constrained efficient equilibrium path. Intuitively, consumers pay M for knowing whether an (equilibrium path) punishment
is in play or not. However, if monitoring is close enough to perfect, then the probability that F exerts no effort is small in
any given period on a constrained efficient equilibrium path, which implies a low willingness to pay for the guide. Therefore,
M can again increase its profits by inducing more uncertainty about F's actions. The following result presents the maximum
value of M's guide to consumers for any $\varepsilon \geq 0$.
Lemma 2. (1) The value of M's guide to consumers is at most $\bar{q} = p(1 - p)$. (2) It is rational for $C_t$ to purchase the guide at $\bar{q}$ under
the strategy profile $\sigma$ if and only if F follows M's recommendation in period $t$ and $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$.
The results of Lemma 2 are intuitive. The value of M's guide to consumers increases in its precision, that is, in how well
the guide allows consumers to predict F's action. Ceteris paribus, the guide is thus most valuable if F always follows M's
recommendation. In this case, $C_t$'s willingness to pay for the guide becomes
$$Q^{\sigma}\left(s_t = 1 \mid h^C_t\right)(1 - p) - \max\left\{0, Q^{\sigma}\left(s_t = 1 \mid h^C_t\right) - p\right\}, \tag{5}$$
which reaches its maximum value $\bar{q} = p(1 - p)$ at the recommendation probability $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$.
In the following, we show that for small enough $\varepsilon$, there is a recommendation policy such that (i) $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$ for
all $t$ and $h^C_t$ if F always follows M's recommendation, and (ii) a bad outcome in a period with a positive recommendation
and trade triggers a punishment phase with no recommendations. The main challenge in constructing such a recommendation
policy is that, while punishments occur on the equilibrium path whenever $\varepsilon > 0$, past recommendations must not
reveal any information about the recommendation probability in the current period to consumers. The recommendation
policy below achieves this by increasing the recommendation probability after periods with negative recommendations that
do not serve as punishment. Define
$$P^{T,\varepsilon}\left(h^M_t\right) = \begin{cases} 0, & \text{if } s_{t-\tau} = b_{t-\tau} = 1 \text{ and } y_{t-\tau} = 0 \text{ for some } \tau \in \{1, \ldots, T\}, \\ \rho^{T,\varepsilon}(h^M_t), & \text{otherwise.} \end{cases} \tag{6}$$
Under $P^{T,\varepsilon}$, a bad outcome in a period with a positive recommendation and trade triggers a punishment phase lasting $T$
periods. If the game is not in a punishment phase, then $s_t = 1$ with probability $\rho^{T,\varepsilon}(h^M_t)$. We now construct a function
$\rho^{T,\varepsilon}(h^M_t)$ such that $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$ for all $t$ and $h^C_t$ if F always follows M's recommendation, and show that this function
is well-defined for $\varepsilon$ sufficiently close to 0:
1. We set $\rho^{T,\varepsilon}(h^M_t) = \rho^{T,\varepsilon}_T = p$ for all $h^M_t$ where (i) $h^M_t = h^M_1$, or (ii) $s_{t-T} = \cdots = s_{t-1} = 0$. We thus have $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$
for any $h^C_t$ with $h^C_t = h^C_1$ or $s_{t-T} = \cdots = s_{t-1} = 0$.
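2. We define $\rho^{T,\varepsilon}(h^M_t)$ for M-histories with a positive recommendation in (at least) one of the last $T$ periods. Let $\sigma_C$ be
such that consumers always follow M's recommendation. If $C_t$ observes $s_{t-1} = 1$, then she assigns probability $1 - \varepsilon$ to
the game not being in a punishment phase in period $t$, in which case $s_t = 1$ with probability $\rho^{T,\varepsilon}(h^M_t)$. We therefore
set
$$\rho^{T,\varepsilon}\left(h^M_t\right) = \rho^{T,\varepsilon}_0 = \frac{p}{1 - \varepsilon} \tag{7}$$
for all $h^M_t$ where $s_{t-1} = b_{t-1} = y_{t-1} = 1$, which implies that $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$ for any $h^C_t$ with $s_{t-1} = 1$. Note that
$\rho^{T,\varepsilon}_0 < 1$ for small enough $\varepsilon$. Now let $\tau \in \{1, \ldots, T - 1\}$. Using $\rho^{T,\varepsilon}_0$ from (7), we can inductively define $\rho^{T,\varepsilon}_\tau$ for any
$\tau \in \{1, \ldots, T - 1\}$ as
$$\rho^{T,\varepsilon}_\tau = p\,\frac{\varepsilon + (1 - \varepsilon)\prod_{g=0}^{\tau-1}\left(1 - \rho^{T,\varepsilon}_g\right)}{(1 - \varepsilon)\prod_{g=0}^{\tau-1}\left(1 - \rho^{T,\varepsilon}_g\right)}. \tag{8}$$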
6 Interestingly, for $\varepsilon > 0$ equilibrium welfare in the original game may exceed $\bar{W}(\varepsilon)$ if M adopts a recommendation policy where in some periods it
recommends F to consumers, consumers follow M's recommendation, but F does not exert effort. M can thereby increase F's reward for periods with
good outcomes (which is impossible in the canonical repeated game, because short-lived consumers would not trade with F if it does not exert effort).
This, in turn, reduces the expected length of punishment phases after bad outcomes needed to provide incentives to F, which raises welfare. In the Online
Appendix, we provide an example in which welfare indeed exceeds $\bar{W}(\varepsilon)$.
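Note that $\rho^{T,\varepsilon}_\tau < 1$ for all $\tau$ if $\varepsilon$ is sufficiently small. We set $\rho^{T,\varepsilon}(h^M_t) = \rho^{T,\varepsilon}_\tau$ for all $h^M_t$ where $s_{t-\tau-1} = b_{t-\tau-1} = y_{t-\tau-1} = 1$
and F has not been recommended for the past $\tau$ period(s). If $C_t$ observes $s_{t-\tau-1} = 1$ and $s_{t-\tau} = \cdots = s_{t-1} = 0$,
she then assigns probability
$$\frac{(1 - \varepsilon)\prod_{g=0}^{\tau-1}\left(1 - \rho^{T,\varepsilon}_g\right)}{\varepsilon + (1 - \varepsilon)\prod_{g=0}^{\tau-1}\left(1 - \rho^{T,\varepsilon}_g\right)} \tag{9}$$
to the game not being in a punishment phase at time $t$, in which case $s_t = 1$ with probability $\rho^{T,\varepsilon}_\tau$. By (8), we thus have
$Q^{\sigma}(s_t = 1 \mid h^C_t) = p$ for any $h^C_t$ with $s_{t-\tau-1} = 1$ and $s_{t-\tau} = \cdots = s_{t-1} = 0$ if $\sigma_C$ is such that consumers always follow M's
recommendation.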
3. Finally, we set $\rho^{T,\varepsilon}(h^M_t) = \rho^{T,\varepsilon}_T = p$ for all $h^M_t$ where (i) $s_{t-1} = 1$, $b_{t-1} = 0$, or (ii) $s_{t-\tau-1} = 1$, $b_{t-\tau-1} = 0$ for some
$\tau \in \{1, \ldots, T - 1\}$ and $s_{t-\tau} = \cdots = s_{t-1} = 0$. Together with parts 1 and 2 this implies that, if F always follows M's
recommendation, then we have $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$ for any $h^C_t$, regardless of $\sigma_C$.$^{7}$
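The inductive construction in (7) and (8) can be traced numerically. The Python sketch below is only an illustration under stated assumptions: it writes `eps` for the probability of an erroneous bad signal, `rhos[tau]` for the recommendation probability after a good outcome followed by `tau` periods without a recommendation, and it uses hypothetical function names. It computes the sequence and then recomputes the consumer's prediction from the posterior in (9) as a consistency check.

```python
from typing import List


def recommendation_probabilities(p: float, eps: float, T: int) -> List[float]:
    """Return [rho_0, ..., rho_T]: (7) for tau = 0, (8) for 0 < tau < T, and rho_T = p."""
    rhos: List[float] = []
    silent = 1.0  # probability of tau periods without recommendation outside punishment
    for tau in range(T):
        alive = (1.0 - eps) * silent
        posterior = alive / (eps + alive)  # expression (9); equals 1 - eps when tau = 0
        rho = p / posterior
        if not rho < 1.0:
            raise ValueError(f"rho_{tau} = {rho:.4f} is not below 1; eps is too large")
        rhos.append(rho)
        silent *= 1.0 - rho
    rhos.append(p)  # after T silent periods any punishment phase has ended
    return rhos


def predicted_probability(rhos: List[float], eps: float, tau: int) -> float:
    """Consumer's belief that s_t = 1 after a recommendation and tau < T silent periods."""
    silent = 1.0
    for g in range(tau):
        silent *= 1.0 - rhos[g]
    alive = (1.0 - eps) * silent
    return alive / (eps + alive) * rhos[tau]


if __name__ == "__main__":
    p, eps, T = 0.5, 0.05, 3  # hypothetical parameter values
    rhos = recommendation_probabilities(p, eps, T)
    print(rhos)
    print([predicted_probability(rhos, eps, tau) for tau in range(T)])
```

The probabilities rise with the number of silent periods because each additional period without a recommendation makes a punishment phase more likely in the consumer's eyes, and the policy has to compensate for that to keep the prediction at p.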
If M adopts $P^{T,\varepsilon}$ as defined above and F always follows M's recommendation, then, by Lemma 2(2), every consumer
is willing to pay $\bar{q}$ for the guide. It remains to show that F is always willing to follow M's recommendation if consumers
do so. If $s_t = 0$, then it is clearly optimal for F not to exert effort. By the construction of $P^{T,\varepsilon}$, if $s_t = 1$ and F follows
M's recommendation in all future periods, then $Q^{\sigma}(b_\tau = e_\tau = 1 \mid s_t, h^F_t) = p$ for any possible $h^F_t$ and all $\tau > t$. Hence, the
condition that rules out a profitable one-shot deviation to no effort in a period where $s_t = 1$ is
$$(p - c) + \frac{\delta}{1 - \delta}\,p(p - c) \geq p + \frac{\delta^{T+1}}{1 - \delta}\,p(p - c). \tag{10}$$
The first result in the following proposition follows.
Proposition 1. (1) If >
c
p(pc)+c
and is suciently close to 0, then there exists an equilibrium in which M earns q in every period
t 1.
8
(2) If <
c
p(pc)+c
, then there is no equilibrium in which M earns q in every period t 1. (3) Welfare in any equilibrium in
which M earns q in every period t 1 is equal to p(1 c) and strictly less than

W().
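As a numerical illustration of condition (10) and the threshold in Proposition 1, the following minimal sketch searches for the shortest punishment length T that deters a one-shot deviation. The function names and the parameter values in the usage note are illustrative assumptions, not part of the model.

```python
def effort_threshold(p: float, c: float) -> float:
    """Discount-factor threshold from Proposition 1: c / (p(p - c) + c)."""
    return c / (p * (p - c) + c)


def condition_10_holds(delta: float, p: float, c: float, T: int) -> bool:
    """Check inequality (10) for a punishment phase lasting T periods."""
    # Payoff from following a positive recommendation today and forever after.
    follow = (p - c) + delta / (1 - delta) * p * (p - c)
    # Payoff from shirking today: p now, then T periods without trade.
    deviate = p + delta ** (T + 1) / (1 - delta) * p * (p - c)
    return follow >= deviate


def shortest_punishment(delta: float, p: float, c: float, max_T: int = 10_000):
    """Smallest T satisfying (10), or None if no T up to max_T works."""
    for T in range(1, max_T + 1):
        if condition_10_holds(delta, p, c, T):
            return T
    return None
```

For example, with the hypothetical values p = 0.6 and c = 0.2, the threshold is roughly 0.455; a discount factor of 0.5 admits a finite punishment length, whereas 0.4 does not.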
Proposition 1(3) shows that the conflict between welfare and the value of M's guide discussed in the perfect monitoring case persists under imperfect monitoring. The proof of Proposition 1(3) is based on the observation that if there exists an equilibrium in which M earns $\bar{q}$ in every period, then there also exists an alternative equilibrium in which positive recommendations are more frequent and F as well as consumers always follow M's recommendation. A particularly simple way to achieve such a welfare improvement is to keep the recommendation policy unchanged except for increasing the recommendation probability in period 1 from p to 1. In terms of welfare, this means that if there is an equilibrium in which M earns $\bar{q}$ in every period, then there must also exist an equilibrium in which welfare is strictly above $p(1-c)$ but, by Lemma 1(2), at most $\bar{W}(\delta)$. It follows that $p(1-c) < \bar{W}(\delta)$, which leads to the following corollary:
Corollary 1. An equilibrium in which M earns $\bar{q}$ in every period $t \ge 1$ can only exist if $\varepsilon < \frac{(1-p)(p-c)}{(1-p)(p-c)+c}$.
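The step from $p(1-c) < \bar{W}(\delta)$ to the bound in Corollary 1 can be checked numerically. The sketch below assumes the closed form $\bar{W}(\delta) = (1-c)\bigl(1 - \frac{\varepsilon c}{(1-\varepsilon)(p-c)}\bigr)$, obtained by multiplying the bound (21) in Appendix A by $(1-\delta)(1-c)$; the function names are illustrative.

```python
def welfare_bound(eps: float, p: float, c: float) -> float:
    """Welfare upper bound: (1 - c) * (1 - eps*c / ((1 - eps)(p - c))).

    Assumption: derived from (21) times (1 - delta)(1 - c); delta cancels.
    """
    return (1 - c) * (1 - eps * c / ((1 - eps) * (p - c)))


def corollary_1_bound(p: float, c: float) -> float:
    """Upper bound on the monitoring error eps stated in Corollary 1."""
    return (1 - p) * (p - c) / ((1 - p) * (p - c) + c)


def welfare_gap_is_positive(eps: float, p: float, c: float) -> bool:
    """True if p(1 - c) lies strictly below the welfare upper bound."""
    return p * (1 - c) < welfare_bound(eps, p, c)
```

On any grid of error rates that avoids the boundary value itself, `welfare_gap_is_positive` should agree with the comparison `eps < corollary_1_bound(p, c)`.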
Finally, note that there continues to be a conflict between the value of M's guide and effort incentives. Recall from Lemma 1(3) that for $\delta > \frac{c}{p}$ and $\varepsilon$ sufficiently small, M could enable a constrained efficient equilibrium. However, any recommendation policy that enables M to earn $\bar{q}$ in each period must induce $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$ for all t and $h^C_t$, which conflicts with M's objective to provide incentives to F by maximally rewarding good outcomes and optimally punishing bad outcomes. An equilibrium in which M's guide has value $\bar{q}$ can therefore only exist if $\delta \ge \frac{c}{p(p-c)+c} > \frac{c}{p}$.
3.3. Commitment consumers
We observed that F and consumers can credibly threaten M not to trade if M fails to choose a certain recommendation
policy and price. The game can therefore have multiple equilibria and the equilibrium price of the guide can take on any
value in $[0, \bar{q}]$.
In this subsection, we show that a small modification of our model suffices to refine the equilibrium set for large $\delta$ to equilibria in which M earns the maximum price $\bar{q}$ in every period. Specifically, we assume that any $C_t$ is a commitment consumer with probability $\gamma \in (0, 1)$ and a normal consumer (playing strategy $\sigma_t$) with probability $1-\gamma$.⁹ A commitment consumer follows M's recommendation if and only if $q \le \bar{q}$. Thus, we rule out that all consumers coordinate on punishing M for not choosing a certain recommendation policy and price q. Neither M nor F can discriminate between commitment and normal consumers. The rest of the model remains the same.
7 This feature of $P^{T,\varepsilon}$ will play an important role in the proof of Proposition 2.
8 For $\delta = \frac{c}{p(p-c)+c}$, this equilibrium exists if $\varepsilon = 0$. The range of $\varepsilon$ for which $P^{T,\varepsilon}$ is well-defined is smaller the higher T. In the limit case $T = \infty$, $P^{T,\varepsilon}$ only exists if $\varepsilon = 0$ and coincides with $P^0$ from Section 3.1 for $\alpha_0 = p$.
9 The idea of using commitment types to reduce the equilibrium set is borrowed from the reputation literature, see Mailath and Samuelson (2006, Chapter 15).
Suppose that M adopts a recommendation policy that punishes F for bad outcomes in periods with positive recommendations, such as $P^{T,\varepsilon}$ from Section 3.2. F then loses future business with commitment consumers if it fails to follow a positive recommendation. If $\delta$ is sufficiently close to 1 and the punishment is long enough, it is a strict best response for F to always follow M's recommendation. Consequently, M's guide has positive value for all consumers and the threat of not purchasing M's guide is no longer credible. Hence, for $\delta$ large and $\varepsilon$ small enough, the equilibrium set reduces to those equilibria where $q = \bar{q}$ and F as well as consumers always follow M's recommendation.
Proposition 2. Let each consumer be a commitment consumer with probability (0, 1). If >
p(pc)
(1)p+[p(pc)+c]
and is su-
ciently close to 0, then in any equilibrium M earns q in every period t 1.
Intuitively, since commitment consumers always follow M's recommendation, the recommendation policy has a direct impact on F's payoff: recommendations are no longer pure cheap talk. This creates an incentive for F to follow M's recommendation, which, in turn, makes the recommendation valuable to normal consumers.
4. Alternative information structures
This section considers two informational variations of the baseline model analyzed so far. First, a setting in which con-
sumers observe neither past outcomes nor past recommendations. Consumers who consider buying a travel guide for a new
destination, for example, are likely to have little to no information about past recommendations. Second, a setting in which
consumers observe the same history as M, for instance thanks to word-of-mouth communication with earlier consumers.
4.1. Consumers cannot observe past recommendations
So far, we assumed that consumers observe M's previous recommendations. To maximize profits, M therefore had to choose a recommendation policy that makes previous recommendations uninformative about F's current action. In this subsection, we show that it becomes easier for M to maximize the value of the guide and provide effort incentives to F if consumers know even less about the history of the game.
Assume that consumers know the recommendation policy P and price q, but cannot observe past recommendations. In addition, they do not know exactly which period they are born into. We partition time into segments of K periods, where we think of K as a large number. That is, segment 1 comprises the periods $1, \dots, K$, and segment $\kappa \ge 2$ comprises the periods $(\kappa-1)K + 1, \dots, \kappa K$. $C_t$ knows her segment but not t, and assigns probability $\frac{1}{K}$ to each period in her segment.¹⁰ This informational setting captures the idea that consumers know very little about the time index t.¹¹
To maximize the value of the guide, M can now select a recommendation policy that changes the recommendation infrequently. For example, consider the following policy:
$$P^{T}\bigl(h^M_t\bigr) = \begin{cases} 0, & \text{if } s_{t-\tau} = b_{t-\tau} = 1 \text{ and } y_{t-\tau} = 0 \text{ for some } \tau \in \{1, \dots, T\},\\ 1, & \text{otherwise.} \end{cases} \tag{11}$$
Under $P^T$, a bad outcome in a period with a positive recommendation and trade triggers a punishment phase lasting T periods. If the game is not in a punishment phase, then M sets $s_t = 1$ with probability 1. If $\varepsilon$ is small and both F and consumers always follow M's recommendation, there are long phases with positive recommendations and punishment phases of length T with negative recommendations. With M's guide, consumers essentially purchase the information on whether the game is in a punishment phase.
Suppose that M chooses $P^T$ and F always follows M's recommendation. Since consumers only know which segment they belong to, their willingness to pay for M's recommendation is determined by the average recommendation probability in their segment. This probability can be manipulated through the choice of T. Ceteris paribus, if T is small (large), the average recommendation probability is relatively high (low). If $\varepsilon$ is small and K is sufficiently large, then T can be chosen such that the average recommendation probability in each segment is close to p, which implies that the value of the guide is close to the maximum value $\bar{q}$. At the same time, F has strong effort incentives because the recommendation changes infrequently. When $\varepsilon$ is small and $s_t = 1$, F knows that if it always follows M's recommendation, trade will take place with high probability in each period of the near future; if $s_t = 1$ and F does not follow M's recommendation in period t, then M will not recommend F to consumers in the next T periods. Thus, the conflict between the value of the guide and incentives is mitigated relative to the previous case where M had to manipulate recommendation probabilities after good outcomes in order to make previous recommendations uninformative for consumers.¹²
10 The uniform distribution is not crucial for the result. What matters is that each period has only low probability.
11 A possible interpretation is that consumers do not precisely observe the age of the firm that M recommends. We are grateful to one of the reviewers for this suggestion.
Proposition 3. Suppose each consumer observes only q, P, and the K-period segment she is born into. If $\delta > \frac{c}{p}$, $\varepsilon$ is sufficiently close to 0, and K is sufficiently large, then there is an equilibrium in which M earns $\bar{q}$ in every period $t \ge 1$.
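The claim that T can be chosen so that the average recommendation probability is close to p can be illustrated with a back-of-the-envelope calculation. The sketch below rests on an assumption that is not spelled out in the text: under the policy in (11), with F and consumers following recommendations, phases of positive recommendations last $1/\varepsilon$ periods on average and each is followed by T punishment periods, so the long-run share of recommended periods is approximately $1/(1+\varepsilon T)$. The function names are illustrative.

```python
import math


def long_run_recommendation_share(eps: float, T: int) -> float:
    """Approximate long-run share of periods with a positive recommendation.

    Assumption: renewal argument with mean recommended-phase length 1/eps
    followed by a punishment phase of exactly T periods.
    """
    return 1.0 / (1.0 + eps * T)


def punishment_length_for_target(eps: float, p: float) -> int:
    """Integer T whose long-run recommendation share is closest to p."""
    T_real = (1 - p) / (eps * p)  # solves 1 / (1 + eps*T) = p
    candidates = {max(1, math.floor(T_real)), max(1, math.ceil(T_real))}
    return min(candidates,
               key=lambda T: abs(long_run_recommendation_share(eps, T) - p))
```

The smaller the error rate, the finer the grid of attainable shares, which is in line with the requirement that the monitoring error be small and K large.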
Note also that if negative recommendations only occur after bad outcomes, and F as well as consumers always follow M's recommendation, welfare exceeds
$$(1-\delta)\sum_{t=0}^{\infty}\delta^{t}(1-\varepsilon)^{t}(1-c) = \frac{(1-\delta)(1-c)}{1-\delta(1-\varepsilon)}. \tag{12}$$
For any given $\delta < 1$, the difference between (12) and $\bar{W}(\delta)$ converges to 0 as $\varepsilon \to 0$. Hence, welfare can be close to $\bar{W}(\delta)$ in an equilibrium in which M earns maximum profits. However, for a social planner who does not discount future payoffs, the outcome would still be inefficient whenever the value of the guide is maximized, because the punishment phases are excessively long.
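The closed form in (12) and its convergence to the welfare upper bound can be verified numerically. This is a minimal sketch; it assumes the closed form $\bar{W}(\delta) = (1-c)\bigl(1 - \frac{\varepsilon c}{(1-\varepsilon)(p-c)}\bigr)$ implied by (21) in Appendix A, and all function names and parameter values are illustrative.

```python
def welfare_lower_bound(delta: float, eps: float, c: float) -> float:
    """Right-hand side of (12)."""
    return (1 - delta) * (1 - c) / (1 - delta * (1 - eps))


def welfare_partial_sum(delta: float, eps: float, c: float,
                        n_terms: int = 5000) -> float:
    """Left-hand side of (12), truncated after n_terms terms."""
    return (1 - delta) * sum(
        (delta * (1 - eps)) ** t * (1 - c) for t in range(n_terms)
    )


def welfare_upper_bound(eps: float, p: float, c: float) -> float:
    """Assumed closed form of the welfare upper bound (from (21))."""
    return (1 - c) * (1 - eps * c / ((1 - eps) * (p - c)))


def welfare_gap(delta: float, eps: float, p: float, c: float) -> float:
    """Difference between the upper bound and expression (12)."""
    return welfare_upper_bound(eps, p, c) - welfare_lower_bound(delta, eps, c)
```

Evaluating `welfare_gap` for a decreasing sequence of error rates at a fixed discount factor shows the gap shrinking towards zero.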
4.2. Consumers have the same information as the monitor
If each $C_t$ observes $h^M_t$, as in the canonical repeated game, then F may exert effort in equilibrium even if M is inactive. This does not imply that there are no equilibria in which M earns positive profits, however. Consider the following recommendation policy:
$$\hat{P}^{T}\bigl(h^M_t\bigr) = \begin{cases} 0, & \text{if } s_{t-\tau} = b_{t-\tau} = 1 \text{ and } y_{t-\tau} = 0 \text{ for some } \tau \in \{1, \dots, T\},\\ p, & \text{otherwise.} \end{cases} \tag{13}$$
Under $\hat{P}^T$, a bad outcome in a period with a positive recommendation and trade triggers a punishment phase lasting T periods. If the game is not in a punishment phase, M sets $s_t = 1$ with probability p. Hence, if $P = \hat{P}^T$ and F always follows M's recommendation, $C_t$'s willingness to pay for the guide is $\bar{q}$ if $s_{t-1} = y_{t-1} = 1$, and 0 otherwise.
Suppose that M chooses $P = \hat{P}^T$, consumers (do not) purchase the guide if in none (at least one) of the last T periods the outcome was bad and trade occurred, and trade with F if and only if they observe that M recommends it. If $s_t = 0$, it is optimal for F not to exert effort. If $s_t = 1$ and $\varepsilon$ is small, then, by always following M's recommendation, F earns $p(p-c)$ in expectation in the following T periods; by exerting no effort in period t, F earns 0 in each of these periods. Hence, if T and $\delta$ are sufficiently large, and $\varepsilon$ sufficiently small, it is a best response for F to always follow M's recommendation. This leads to the next result.
Proposition 4. Let consumers have the same information as M. (1) If $\delta \ge \frac{c}{p(p-c)+c}$ and $\varepsilon = 0$, there exists an equilibrium in which M earns $\bar{q}$ in every period $t \ge 1$. (2) If $\varepsilon > 0$, such an equilibrium does not exist. (3) If $\delta > \frac{c}{p(p-c)+c}$ and $\varepsilon$ is sufficiently close to 0, there exists an equilibrium in which M earns $\bar{q}$ in some periods.
The only difference between the game in which consumers have the same information as M and the canonical repeated game is the existence of M. Therefore, any equilibrium of the canonical game is also an equilibrium of the game with M in which consumers have the same information as M. In addition, however, the game with M also admits equilibria in which M earns positive profits by acting as a correlating device for F and consumers. Yet, unless monitoring is perfect, the maximum profit level that M can attain remains strictly below that in the original game. Intuitively, M loses its informational advantage over consumers in knowing whether the game is in a punishment phase, which limits its ability to sell the guide at a high price in all periods.
5. The value of recommendations with multiple firms
We now analyze the value of M's guide when consumers can choose between multiple firms. Two scenarios will be considered. In the first scenario, consumers have an outside option of known value. Tourists, for instance, could visit McDonald's instead of a local restaurant. In the second scenario, M's guide includes recommendations for two competing firms, both of which are subject to moral hazard.
12 The analysis in this section shares some similarity with Gershkov and Szentes (2009). They show that the following mechanism solves a free-rider problem among agents that can acquire costly information to improve the decision of a social planner: agents are selected sequentially and randomly to acquire information and report it to the social planner; they know neither their position in the sequence nor the reports of previous agents. For each agent, the probability of being pivotal for the decision is then large enough such that information acquisition becomes optimal. In a similar spirit, we show that M can both maximize the value of its guide to consumers and provide almost maximum incentives for F if consumers observe neither the period they are born into nor past recommendations.
5.1. Fixed outside option
Suppose all consumers have an outside option of known value $u > 0$. If $u \ge 1-p$, then consumers always prefer the outside option to trading with F, so M's guide is valueless. Therefore, assume that $u < 1-p$ from now onwards. There can then exist equilibria in which M's guide has strictly positive value because it allows consumers to make a better choice between F and the outside option. If F always follows M's recommendation, $C_t$'s willingness to pay for M's guide becomes
$$Q^{\sigma}\bigl(s_t = 1 \mid h^C_t\bigr)(1-p) + \bigl[1 - Q^{\sigma}\bigl(s_t = 1 \mid h^C_t\bigr)\bigr]u - \max\bigl\{Q^{\sigma}\bigl(s_t = 1 \mid h^C_t\bigr) - p,\; u\bigr\}, \tag{14}$$
which attains its maximum value $\bar{q}_o = (p+u)(1-p-u)$ if and only if $Q^{\sigma}(s_t = 1 \mid h^C_t) = p + u$. We obtain the following results:
Proposition 5. Let consumers have an outside option of value $u < 1-p$. (1) The value of M's guide to consumers is at most $\bar{q}_o = (p+u)(1-p-u)$. (2) If $\delta > \frac{c}{(p+u)(p-c)+c}$ and $\varepsilon$ is sufficiently close to 0, then there exists an equilibrium in which M earns $\bar{q}_o$ in every period $t \ge 1$.
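Expression (14) and the maximum in Proposition 5(1) can be checked with a short computation. The sketch below evaluates the willingness to pay as a function of the recommendation probability Q, assuming F always follows M's recommendation; the function names and parameter values are illustrative.

```python
def willingness_to_pay(Q: float, p: float, u: float) -> float:
    """Expression (14): value of the guide for recommendation probability Q."""
    # With the guide: trade if recommended, take the outside option otherwise.
    with_guide = Q * (1 - p) + (1 - Q) * u
    # Without the guide: trade blindly or take the outside option.
    without_guide = max(Q - p, u)
    return with_guide - without_guide


def max_guide_value(p: float, u: float) -> float:
    """Maximum of (14), attained at Q = p + u."""
    return (p + u) * (1 - p - u)
```

For instance, with the hypothetical values p = 0.5 and u = 0.2, the willingness to pay rises linearly in Q up to Q = 0.7 and falls thereafter, peaking at 0.21.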
A positive outside option relaxes the conflict between the maximum value of M's guide and efficient effort provision. Since a consumer who does not buy the guide obtains a strictly positive utility, M must induce trade more often to ensure purchase of its guide. In addition, a higher outside option for consumers can increase the maximum value of M's guide, that is, $\bar{q}_o$ may be greater than $\bar{q}$ and increasing in u. Intuitively, the opportunity cost of trading with F without knowing M's recommendation is increasing in the consumer's outside option. Trading with F without buying the guide is therefore more attractive for a consumer with a low outside option than for a consumer with a higher outside option, which implies that M may need to lower the price of its guide to attract consumers with low outside options.
5.2. Competition for recommendations
Now consider an extension of our basic model to situations with two identical firms, $F_1$ and $F_2$. In each period, the consumer trades with at most one firm and firms simultaneously decide whether to exert effort. M's guide contains a (positive or negative) recommendation for each firm.¹³
Proposition 6. Let there be two firms that M can recommend to consumers. (1) The value of M's guide to consumers is at most $\bar{q}_2 = \min\{\frac{1}{2}, 1-p\}$. (2) If $\delta > \frac{c}{\frac{1}{2}(p+c)}$ and $\varepsilon$ is sufficiently close to 0, then there exists an equilibrium in which M earns $\bar{q}_2$ in every period $t \ge 1$ and welfare equals $1-c$.
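The bound in Proposition 6(1) can be reproduced with a simple calculation. The sketch assumes, as described in the discussion of the proposition, that exactly one firm is recommended in each period, that the consumer regards each firm as equally likely to be the recommended one, and that only the recommended firm exerts effort; function names are illustrative.

```python
def two_firm_guide_value(p: float) -> float:
    """Value of the guide when each firm is recommended with probability 1/2."""
    # With the guide: trade with the recommended firm, which exerts effort.
    with_guide = 1 - p
    # Without the guide: pick a firm at random (effort with prob. 1/2) or abstain.
    without_guide = max(0.5 - p, 0.0)
    return with_guide - without_guide


def two_firm_delta_threshold(p: float, c: float) -> float:
    """Discount-factor threshold from Proposition 6(2): c / ((p + c) / 2)."""
    return c / (0.5 * (p + c))
```

The first function coincides with $\min\{\frac{1}{2}, 1-p\}$: for low prices the guide is worth one half, for high prices it is worth $1-p$.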
Since M can now recommend different firms in different periods, it no longer needs to impose a probability of trade below 1 to create the uncertainty that makes its guide valuable. If the firms always follow M's recommendation, M maximizes the value of its guide by means of a recommendation policy such that, in each period, exactly one of the two firms is recommended and the consumer assigns fifty percent probability to each of the two firms. Trade then takes place in every period, which implies that welfare is maximized. It is important to note, however, that the conflict between M's profit maximization incentive and efficiency would persist if $F_1$ and $F_2$ were differentiated in the eyes of consumers. Note also that the maximum value of M's guide is higher when there are two firms than when there is only one firm ($\bar{q}_2 > \bar{q}$): intuitively, the guide becomes more valuable when there are two firms because it guarantees trade with a firm that exerts effort.
The conflict between the value of M's guide and effort incentives persists. M could maximize the range of discount factors for which at least one firm exerts effort in every period by recommending one particular firm as long as M observes only good outcomes from this firm. However, for small $\varepsilon$, consumers' willingness to pay for M's guide then lies strictly below $\bar{q}_2$ for all $t \ge 2$, as each consumer expects the firm that was recommended in the last period to be recommended with a probability close to 1 in the current period.
6. Related literature
This paper contributes to several strands of the literature. First, in a broad sense the paper is related to the literature on seller reputation in repeated games building on Klein and Leffler (1981) and Shapiro (1983). In contrast to this literature, consumers cannot observe the history of the game in our model, which precludes the firm from maintaining a reputation with consumers. The monitor could discipline the firm in the same way consumers do in models à la Klein and Leffler (1981), but, as stressed above, this would conflict with the monitor's profit-maximization objective.
Second, our research is related to the literature on strategic communication building on Crawford and Sobel (1982). As in cheap talk models, the sender in our game (i.e., the monitor) wants to manipulate the decisions of the receivers (i.e., the firm and consumers). The key difference is that the sender has no immediate stake in the receivers' decisions; instead, the sender wants to increase the value of its message to receivers.
13 The Online Appendix contains a formal model description.
Third, our analysis contributes to the literature on costly acquisition of information about past actions in infinitely repeated games. Dixit (2003) is most closely related to our work in that he analyzes a profit-maximizing monitor selling information about past actions of potential trading partners to economic agents. His modeling approach, however, is quite different from ours. In particular, he considers a population with three different types of individuals (honest, dishonest and opportunistic) who are randomly matched into pairs in every period. The main determinant of the value of the monitor's information to individuals, and hence the monitor's profit, is the distribution of different types in the population. In our model, on the other hand, the value of the monitor's guide is endogenously determined by the equilibrium strategies of the firm and consumers, all of whom are payoff maximizers.
A series of other papers in this literature analyze costly information acquisition in infinitely repeated games with a non-strategic monitor (Ben-Porath and Kahneman, 2003; Miyagawa et al., 2008; Kandori and Obara, 2004; Flesch and Perea, 2009). This literature focuses on showing that the folk theorem holds even when the cost of monitoring is high. The key difference of our approach is that the monitor is a strategic player that may find it optimal to manipulate the information it sells in order to increase the price consumers are willing to pay for it. While the constrained efficient equilibrium is one equilibrium outcome in our basic setting, a small modification of the model reduces the equilibrium set to those equilibria in which the monitor maximizes its profit.
Fourth, our paper is related to the literature on intermediaries (or middlemen) that reduce asymmetric information problems between sellers and buyers. Biglaiser (1993) considers an intermediary who becomes an expert in determining product quality, thereby resolving an adverse selection problem. In Biglaiser and Friedman (1994), intermediaries buy and resell goods of several firms, which implies that reputational spillovers between products can reduce moral hazard. Lizzeri (1999) and Albano and Lizzeri (2001) study the optimal disclosure policy of a certification intermediary. In Lizzeri (1999), the intermediary has incentives to strategically restrict the amount of information disclosure. In Albano and Lizzeri (2001), there can be full disclosure of information, but there will always be underprovision of quality. Kennes and Schiff (2007, 2008) examine the role of intermediaries in (finite horizon) search models with asymmetric information about firm qualities. In contrast to this literature, we consider an intermediary who influences the repeated interaction between a long-lived firm and short-lived consumers.
Finally, a related literature investigates how reputational concerns can prevent collusion between intermediaries and firms (Strausz, 2005; Peyrache and Quesada, 2011). Focusing on adverse selection, a key result of this literature is that the monitor and the seller will not collude if $\delta$ is high enough. Our model assumes away the possibility of collusion in order to focus on the value of the monitor's guide.
7. Conclusion
This paper analyzes the repeated interaction between a long-lived firm and short-lived consumers in the presence of a profit-maximizing monitor selling a guide that allows consumers to predict the firm's effort decision. We show that the monitor has an incentive to adopt a recommendation policy that creates sufficient uncertainty about the firm's action for consumers and thereby makes the guide more valuable. This conflicts with welfare maximization in a broad range of settings.
An interesting direction for future research would be to introduce a cost of monitoring. Intuitively, one would expect that the monitor might be willing to accept a higher monitoring error rate in this case, even if this reduces the value of its guide to consumers. Note, however, that starting from perfect monitoring, there are several ways for the monitor to save monitoring costs without losing revenues. First, the monitor could completely stop monitoring firms in periods in which they are not recommended. Second, provided the discount factor is high enough, the monitor could reduce the intensity of monitoring in periods with a positive recommendation to a level just high enough to preserve effort incentives.
Appendix A
Proof of Lemma 1. We prove the first statement. Let the canonical repeated game with correlating device (CRG$^+$) be identical to the canonical repeated game (CRG) except that F and consumers also observe the outcomes of a correlating device and can select continuation equilibria on its basis.¹⁴ Consider an equilibrium $\sigma$ in CRG$^+$. Define
$$Q^{\sigma} \equiv \sum_{t=1}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1). \tag{15}$$
The expected sum of payoffs in period t is 0 if $b_t = 0$ or $e_t = 0$, and $1-c$ if $b_t = e_t = 1$. Hence, welfare in equilibrium $\sigma$ is given by $(1-\delta)(1-c)Q^{\sigma}$. Suppose that $\sigma$ satisfies the following two properties: (a) $Q^{\sigma}(b_1 = e_1 = 1) = 1$, and (b) $Q^{\sigma}(b_t = 1, e_t = 0) = 0$ in all periods t. We now derive an upper bound $\bar{Q}$ on $Q^{\sigma}$. Abbreviate
$$Q^{\sigma}_{1} \equiv \sum_{t=2}^{\infty}\delta^{t-2}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = b_1 = 1), \tag{16}$$
$$Q^{\sigma}_{0} \equiv \sum_{t=2}^{\infty}\delta^{t-2}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 1). \tag{17}$$
Since $\sigma$ has property (b), it is rational for F to exert effort in period 1 if and only if
$$(p-c) + \delta(1-\varepsilon)Q^{\sigma}_{1}(p-c) + \delta\varepsilon Q^{\sigma}_{0}(p-c) \;\ge\; p + \delta Q^{\sigma}_{0}(p-c). \tag{18}$$
14 Lemma 1(1) does not depend on whether there is a correlating device or not. What we need in the proof is that in CRG$^+$ players have one additional option to correlate their play.
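Rearranging this inequality yields
$$Q^{\sigma}_{0} \le Q^{\sigma}_{1} - \frac{c}{\delta(1-\varepsilon)(p-c)}. \tag{19}$$
Since $\sigma$ has property (a), we have $Q^{\sigma} = 1 + \delta(1-\varepsilon)Q^{\sigma}_{1} + \delta\varepsilon Q^{\sigma}_{0}$. Using (19) to substitute for $Q^{\sigma}_{0}$ in this equality yields
$$Q^{\sigma} \le 1 + \delta Q^{\sigma}_{1} - \frac{\varepsilon c}{(1-\varepsilon)(p-c)}. \tag{20}$$
If $\bar{Q}$ is an upper bound on $Q^{\sigma}$, we must have $Q^{\sigma}_{1} \le \bar{Q}$. From (20) it then follows
$$\bar{Q} = \frac{1}{1-\delta}\left(1 - \frac{\varepsilon}{1-\varepsilon}\,\frac{c}{p-c}\right). \tag{21}$$
By multiplying $\bar{Q}$ with $(1-\delta)(1-c)$, we obtain the welfare upper bound $\bar{W}(\delta)$. The result holds if we can show that for each equilibrium $\sigma$ in CRG, there is an equilibrium $\sigma'$ in CRG$^+$ that satisfies properties (a), (b) and has $Q^{\sigma'} \ge Q^{\sigma}$. This follows from the following two claims: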
Claim 1. If there exists an equilibrium $\sigma$ in CRG with $Q^{\sigma} > 0$ and $Q^{\sigma}(b_1 = e_1 = 1) < 1$, there exists an alternative equilibrium $\sigma'$ in CRG that satisfies property (a) and has $Q^{\sigma'} > Q^{\sigma}$.
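Proof. Let $\sigma$ be an equilibrium in CRG with $Q^{\sigma} > 0$ and $Q^{\sigma}(b_1 = e_1 = 1) < 1$. We show that either there exists an equilibrium $\sigma'$ with $Q^{\sigma'} \ge \frac{1}{\delta}Q^{\sigma}$ or an equilibrium $\sigma''$ with property (a) and $Q^{\sigma''} > Q^{\sigma}$. The result then follows from the fact that there cannot exist an infinite sequence of equilibria $\{\sigma_k\}_{k=1}^{\infty}$ with $Q^{\sigma_{k+1}} \ge \frac{1}{\delta}Q^{\sigma_k} > 0$ for each k. Let $t^*$ be the smallest number such that $Q^{\sigma}(b_{t^*} = e_{t^*} = 1) > 0$.
Assume that $t^* > 1$. Since it is rational for $C_t$ to trade with F only if $Q^{\sigma}(e_t = 1 \mid h^M_t) > 0$, the outcome is $(y_{\tau} = b_{\tau} = 0)$ in all periods $\tau \in \{1, \dots, t^*-1\}$. Let $\sigma'$ be a profile that starts play in period 1 like $\sigma$ after the outcome $\{(y_{\tau} = b_{\tau} = 0)\}_{\tau=1}^{t^*-1}$. Clearly, $\sigma'$ is an equilibrium and $Q^{\sigma'} \ge \frac{1}{\delta}Q^{\sigma}$.
Assume that $t^* = 1$. We then have $Q^{\sigma}(b_1 = e_1 = 1) \in (0, 1)$. Note that F must be indifferent between $e_1 = 1$ and $e_1 = 0$. We have
$$\begin{aligned} Q^{\sigma} ={}& Q^{\sigma}(b_1 = e_1 = 1)\Bigl[1 + (1-\varepsilon)\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = b_1 = 1) + \varepsilon\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 1)\Bigr]\\ &+ Q^{\sigma}(b_1 = 1, e_1 = 0)\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 1)\\ &+ Q^{\sigma}(b_1 = 0)\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 0). \end{aligned} \tag{22}$$
Suppose that
$$\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 1) \;\ge\; \sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 0) \tag{23}$$
and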
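$$1 + (1-\varepsilon)\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = b_1 = 1) + \varepsilon\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 1) \;\le\; \sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 1). \tag{24}$$
We then can find an equilibrium $\sigma'$ with $Q^{\sigma'}(b_1 = e_1 = 0) = 1$ and where continuation play is identical to $\sigma$ after outcome $(y_1 = 0, b_1 = 1)$. By construction, we have $Q^{\sigma'} \ge Q^{\sigma}$. By the result for the case $t^* > 1$, we then can find an equilibrium $\sigma''$ with $Q^{\sigma''} \ge \frac{1}{\delta}Q^{\sigma}$. Next, suppose that the weak inequality in (23) is reversed and
$$1 + (1-\varepsilon)\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = b_1 = 1) + \varepsilon\sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 1) \;\le\; \sum_{t=2}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1 \mid y_1 = 0, b_1 = 0). \tag{25}$$
Again, we can find an equilibrium $\sigma'$ with $Q^{\sigma'}(b_1 = e_1 = 0) = 1$ and where continuation play is identical to $\sigma$ after outcome $(y_1 = 0, b_1 = 0)$. By construction, we have $Q^{\sigma'} \ge Q^{\sigma}$. By the result for the case $t^* > 1$, we then can find an equilibrium $\sigma''$ with $Q^{\sigma''} \ge \frac{1}{\delta}Q^{\sigma}$. Finally, suppose that both (24) and (25) are violated. Since F is indifferent between $e_1 = 1$ and $e_1 = 0$ in equilibrium $\sigma$, there is an equilibrium $\sigma'$ with $Q^{\sigma'}(b_1 = e_1 = 1) = 1$ and where continuation play is identical to $\sigma$. We then have $Q^{\sigma'} > Q^{\sigma}$, which completes the proof. □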


Claim 2. If there exists an equilibrium $\sigma$ in CRG that satisfies property (a) and has $Q^{\sigma}(b_t = 1, e_t = 0) > 0$ for some period t, there exists an alternative equilibrium $\sigma'$ in CRG$^+$ that satisfies properties (a), (b), and has $Q^{\sigma'} > Q^{\sigma}$.
Proof. Let $\sigma$ be an equilibrium in CRG that satisfies property (a) and $Q^{\sigma}(b_t = 1, e_t = 0) > 0$ for some period t. Choose a period $t^*$ for which there exists an M-history $\tilde{h}^M_{t^*}$ with $Q^{\sigma}(\tilde{h}^M_{t^*}) > 0$ and $Q^{\sigma}(b_{t^*} = 1, e_{t^*} = 0 \mid \tilde{h}^M_{t^*}) > 0$. Define
$$\Delta = \delta^{t^*-1}\,Q^{\sigma}\bigl(\tilde{h}^M_{t^*}\bigr)\,Q^{\sigma}\bigl(b_{t^*} = 1 \mid \tilde{h}^M_{t^*}\bigr)\,Q^{\sigma}\bigl(e_{t^*} = 0 \mid \tilde{h}^M_{t^*}\bigr). \tag{26}$$
Note that $\Delta > 0$. The proof proceeds stepwise. In Step 1, we show that if for a period t there exists an M-history $h^M_t$ with $Q^{\sigma}(h^M_t) > 0$ and $Q^{\sigma}(b_t = 1, e_t = 0 \mid h^M_t) > 0$, then we can modify $\sigma$ in CRG$^+$ (by using the correlating device) such that in period t after $h^M_t$ the profile $(b_t = 1, e_t = 0)$ is played with probability 0 and the profile $(b_t = 1, e_t = 1)$ is played with higher probability than before. In Step 2, we use the technique of Step 1 to modify $\sigma$ in CRG$^+$ to derive an equilibrium $\sigma'$ where profile $(b_t = 1, e_t = 0)$ is played only in finitely many periods and $Q^{\sigma'} > Q^{\sigma} - \Delta$. In Step 3, we use the technique of Step 1 to modify $\sigma'$ in CRG$^+$ to get an equilibrium $\sigma''$ that satisfies properties (a), (b) and $Q^{\sigma''} \ge Q^{\sigma'} + \Delta$, which, by Step 2, yields the main result.¹⁵
Step 1. Let t be any period for which there exists an M-history $h^M_t$ with $Q^{\sigma}(h^M_t) > 0$ and $Q^{\sigma}(b_t = 1, e_t = 0 \mid h^M_t) > 0$. We show that we can find an equilibrium $\sigma'$ in CRG$^+$ such that $Q^{\sigma'}(b_t = 1, e_t = 0 \mid h^M_t) = 0$ and $Q^{\sigma'}(b_t = e_t = 1 \mid h^M_t) > Q^{\sigma}(b_t = e_t = 1 \mid h^M_t)$. Let $V(y_t, b_t, h^M_t)$ be F's continuation value after outcome $(y_t, b_t)$ and M-history $h^M_t$. Note that we must have $Q^{\sigma}(e_t = 0 \mid h^M_t) < 1$, otherwise $b_t = 0$ would be the unique best response of $C_t$ when she observes $h^M_t$. Hence, in period t after history $h^M_t$, F must be indifferent between $e_t = 1$ and $e_t = 0$, so that
$$(p-c) + \delta(1-\varepsilon)V\bigl(1, 1, h^M_t\bigr) + \delta\varepsilon V\bigl(0, 1, h^M_t\bigr) = p + \delta V\bigl(0, 1, h^M_t\bigr). \tag{27}$$
Now consider an alternative profile $\sigma'$ in CRG$^+$ that is identical to $\sigma$ except in period t after history $h^M_t$. There, profile $(b_t = e_t = 1)$ is played with probability $Q^{\sigma}(b_t = 1 \mid h^M_t)$, and profile $(b_t = e_t = 0)$ is played with probability $1 - Q^{\sigma}(b_t = 1 \mid h^M_t)$. Note that F's continuation payoff in any period after any M-history remains unchanged so that $\sigma'$ is an equilibrium. Moreover, we have
$$Q^{\sigma'}\bigl(b_t = e_t = 1 \mid h^M_t\bigr) - Q^{\sigma}\bigl(b_t = e_t = 1 \mid h^M_t\bigr) = Q^{\sigma}\bigl(b_t = 1 \mid h^M_t\bigr)\,Q^{\sigma}\bigl(e_t = 0 \mid h^M_t\bigr) > 0, \tag{28}$$
which proves the result.
Step 2. We show that there exists an equilibrium $\sigma'$ in CRG$^+$ where (i) the profile $(b_t = 1, e_t = 0)$ is played with positive probability only in finitely many periods, and (ii) $Q^{\sigma'} > Q^{\sigma} - \Delta$. For any $\bar{t}$ we can write
$$Q^{\sigma} = \sum_{t=1}^{\bar{t}}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1) + \sum_{t=\bar{t}+1}^{\infty}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1). \tag{29}$$
15 The idea behind Steps 2 and 3 is to use a backward induction argument (as in the proof of the one-shot deviation principle in Mailath and Samuelson, 2006, pp. 25–27).
Note that the last term on the right-hand side of this equation is weakly smaller than $\frac{\delta^{\bar{t}}}{1-\delta}$. Hence, we can choose $\bar{t} > t^*$ large enough such that
$$\sum_{t=1}^{\bar{t}}\delta^{t-1}Q^{\sigma}(b_t = e_t = 1) > Q^{\sigma} - \Delta. \tag{30}$$
Now consider an alternative profile $\sigma'$ in CRG$^+$ that (i) is identical to $\sigma$ in the first $\bar{t}$ periods, and (ii) where play is modified in each period $t > \bar{t}$ as described in Step 1, i.e., for any period $t > \bar{t}$ and M-history $h^M_t$ with $Q^{\sigma}(h^M_t) > 0$ and $Q^{\sigma}(b_t = 1, e_t = 0 \mid h^M_t) > 0$, we have $Q^{\sigma'}(b_t = 1, e_t = 0 \mid h^M_t) = 0$. By construction, $\sigma'$ is an equilibrium and we have $Q^{\sigma'} > Q^{\sigma} - \Delta$.
Step 3. We show that by modifying $\sigma'$ (from Step 2) in each period $t \le \bar{t}$ as described in Step 1, we obtain an equilibrium $\sigma''$ in CRG$^+$ that has properties (a), (b) and $Q^{\sigma''} \ge Q^{\sigma'} + \Delta$. Let $\tau \le \bar{t}$ be a period for which there exists an M-history $\hat{h}^M_{\tau}$ with $Q^{\sigma'}(\hat{h}^M_{\tau}) > 0$, $Q^{\sigma'}(b_{\tau} = 1, e_{\tau} = 0 \mid \hat{h}^M_{\tau}) > 0$ and $Q^{\sigma'}(b_t = 1, e_t = 0) = 0$ for all $t > \tau$ (since $t^* < \bar{t}$ there exists at least one such period). If $\hat{h}^M_{\tau}$ has occurred, then in each period $t > \tau$ we either have $b_t = e_t = 1$ or $b_t = 0$. Since $\sigma'$ is an equilibrium, incentive compatibility implies
$$\sum_{t=\tau+1}^{\infty}\delta^{t-\tau}\Bigl[Q^{\sigma'}\bigl(b_t = e_t = 1 \mid \hat{h}^M_{\tau}, y_{\tau} = 1, b_{\tau} = 1\bigr) - Q^{\sigma'}\bigl(b_t = e_t = 1 \mid \hat{h}^M_{\tau}, y_{\tau} = 0, b_{\tau} = 1\bigr)\Bigr] > 0. \tag{31}$$
Consider now an equilibrium $\hat{\sigma}$ in CRG$^+$ that is identical to $\sigma'$ except in period $\tau$ after history $\hat{h}^M_{\tau}$, where play is modified as described in Step 1. It follows from (28) and (31) that $Q^{\hat{\sigma}} > Q^{\sigma'}$. The proof is now completed with a backward induction argument. If we modify play in $\sigma'$ in all periods $t \le \bar{t}$ for all possible M-histories as described in Step 1, we get an equilibrium $\sigma''$ that satisfies properties (a), (b) and has $Q^{\sigma''} \ge Q^{\sigma'} + \Delta$ (by the definition of $\Delta$ and the fact that $t^* \le \bar{t}$). □
We prove the second statement. If in an equilibrium $\sigma$ of the original game F and consumers always follow M's recommendation, then $\sigma$ satisfies property (b) as defined above. Then, as in the proof of Claim 1, we can show that there exists an equilibrium $\sigma'$ which satisfies properties (a), (b) and has $Q^{\sigma'} \ge Q^{\sigma}$. That $\bar W(\delta)$ is the upper bound on welfare can then be shown by going through the same line of reasoning as in the proof of the first statement.
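We prove the third statement. Let $\delta > c/p$ be given and $T \in \mathbb{R}_+$. Denote by $[T]$ the largest integer smaller than $T$. Consider the following recommendation policy:
\[ \bar P_T\big(h^M_t\big) = \begin{cases} 0, & \text{if } s_{t-\tau} = b_{t-\tau} = 1 \text{ and } y_{t-\tau} = 0 \text{ for some } \tau \in \{1, \dots, [T]\}, \\ T - [T], & \text{if } s_{t-[T]-1} = b_{t-[T]-1} = 1 \text{ and } y_{t-[T]-1} = 0, \\ 1, & \text{otherwise.} \end{cases} \tag{32} \]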
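As an illustration only, the piecewise policy can be sketched in code. The sketch assumes the policy reads as follows: the value is 0 if F was recommended, traded and produced a bad signal in any of the last $[T]$ periods; it is $T - [T]$ if this happened exactly $[T]+1$ periods ago; and it is 1 otherwise. Representing a history as a list of `(s, b, y)` triples, and the function names, are hypothetical choices made for this example and do not come from the paper.

```python
import math
from typing import List, Tuple

# One past period: (s, b, y) = (recommendation, trade decision, signal), each 0 or 1.
# This encoding of an M-history is an assumption made for the sketch.
Period = Tuple[int, int, int]


def bracket(T: float) -> int:
    """Largest integer strictly smaller than T (written [T] in the text)."""
    return math.ceil(T) - 1


def policy_value(history: List[Period], T: float) -> float:
    """Value of the piecewise policy for the period following `history`."""
    k = bracket(T)

    def bad_outcome(lag: int) -> bool:
        # True if, `lag` periods ago, F was recommended, traded, and y = 0.
        if lag < 1 or lag > len(history):
            return False
        s, b, y = history[-lag]
        return s == 1 and b == 1 and y == 0

    # First case: a bad outcome within the last [T] periods.
    if any(bad_outcome(lag) for lag in range(1, k + 1)):
        return 0.0
    # Second case: a bad outcome exactly [T] + 1 periods ago.
    if bad_outcome(k + 1):
        return T - k
    # Third case: otherwise.
    return 1.0
```

For example, with `T = 2.5` a bad outcome in the previous period gives the value 0, while a bad outcome three periods ago (followed by two periods without one) gives the fractional value `T - [T]`.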
Suppose that M chooses $P = \bar P_T$, $q = 0$ and consumers always follow M's recommendation. It is rational for F not to exert effort in a period $t$ where $s_t = 0$. If is sufficiently small, we can choose $T$ such that F is indifferent between $e_t = 1$ and $e_t = 0$ in any period $t$ where $s_t = 1$. Let $T^*$ be this value. Let $\sigma$ be a strategy profile with the following features: (i) $P = \bar P_{T^*}$ and $q = 0$; (ii) $\sigma_F$ is such that F always follows M's recommendation if and only if $P = \bar P_{T^*}$ and $q = 0$; otherwise, F never exerts effort; (iii) $\sigma_C$ is such that consumers always follow M's recommendation if and only if $P = \bar P_{T^*}$ and $q = 0$; otherwise, they never trade with F. By construction, $\sigma$ is an equilibrium and has properties (a) and (b). By the choice of $T^*$, the inequalities in (18), (19) and (20) hold with equality. Since $Q^{\sigma} = Q^{\sigma}_1$, we get $Q^{\sigma} = \bar Q$, which proves the result. □


Proof of Lemma 2. We prove the first statement. Consider any strategy profile $\sigma$. Abbreviate $\mu = Q^{\sigma}(s_t = 1 \mid h^C_t)$, $\beta_1 = Q^{\sigma}(e_t = 1 \mid s_t = 1, h^C_t)$ and $\beta_0 = Q^{\sigma}(e_t = 1 \mid s_t = 0, h^C_t)$. We first derive $C_t$'s willingness to pay for the guide and then show that it is bounded above by $p(1-p)$. We have
\[ Q^{\sigma}\big(e_t = 1 \mid h^C_t\big) = \mu\beta_1 + (1-\mu)\beta_0. \tag{33} \]
Suppose $C_t$ buys M's guide. If $s_t = s$, then it is rational to trade with F if and only if $\beta_s \ge p$. Hence, the expected gross payoff from purchasing the guide is
\[ \mu \max\{0, \beta_1 - p\} + (1-\mu) \max\{0, \beta_0 - p\}. \tag{34} \]
The expected payoff from not purchasing the guide is
\[ \max\big\{0, Q^{\sigma}\big(e_t = 1 \mid h^C_t\big) - p\big\}. \tag{35} \]
Using (33) to replace $Q^{\sigma}(e_t = 1 \mid h^C_t)$ in (35), $C_t$'s willingness to pay for the guide becomes
\[ \mu \max\{0, \beta_1 - p\} + (1-\mu) \max\{0, \beta_0 - p\} - \max\big\{0, \mu\beta_1 + (1-\mu)\beta_0 - p\big\}. \tag{36} \]
We now establish an upper bound on (36). If $\beta_1 - p$ and $\beta_0 - p$ have the same sign (both non-negative or both non-positive), then (36) is 0. Assume therefore that $\beta_1 > p$ and $\beta_0 \le p$.¹⁶ In this case, (36) becomes
\[ \mu(\beta_1 - p) - \max\big\{0, \mu\beta_1 + (1-\mu)\beta_0 - p\big\}, \tag{37} \]
which is weakly decreasing in $\beta_0$. To maximize (37), we set $\beta_0 = 0$, which yields
\[ \mu(\beta_1 - p) - \max\{0, \mu\beta_1 - p\}. \tag{38} \]
This term has a unique maximum at $\beta_1 = 1$ and $\mu = p$. Hence, the willingness to pay for the guide has a positive upper bound equal to $\bar q = p(1-p)$.
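The bound can be checked numerically. The sketch below assumes the willingness to pay has the form $\mu\max\{0,\beta_1-p\} + (1-\mu)\max\{0,\beta_0-p\} - \max\{0,\mu\beta_1+(1-\mu)\beta_0-p\}$, where $\mu$ is the consumer's belief that F is recommended and $\beta_1$, $\beta_0$ are the effort probabilities with and without a recommendation. The function names, the grid resolution and the value $p = 0.4$ are arbitrary choices made for illustration.

```python
import itertools


def willingness_to_pay(mu: float, beta1: float, beta0: float, p: float) -> float:
    """Payoff with the guide minus payoff without it, for price p."""
    with_guide = mu * max(0.0, beta1 - p) + (1 - mu) * max(0.0, beta0 - p)
    without_guide = max(0.0, mu * beta1 + (1 - mu) * beta0 - p)
    return with_guide - without_guide


def grid_maximum(p: float, steps: int = 50) -> float:
    """Largest willingness to pay over a uniform grid on [0, 1]^3."""
    grid = [i / steps for i in range(steps + 1)]
    return max(
        willingness_to_pay(mu, b1, b0, p)
        for mu, b1, b0 in itertools.product(grid, repeat=3)
    )


p = 0.4                                   # illustrative price
best_on_grid = grid_maximum(p)            # brute-force maximum
at_candidate = willingness_to_pay(p, 1.0, 0.0, p)  # mu = p, beta1 = 1, beta0 = 0
bound = p * (1 - p)
```

On this grid, `best_on_grid` and `at_candidate` both coincide with `bound` up to floating-point error. The search runs over the whole cube rather than only the case $\beta_1 > p \ge \beta_0$, so the mirrored labelling of the recommendation (F is good exactly when it is not recommended) attains the same value.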
We prove the second statement. If $\beta_1 = 1$ and $\mu = p$, then (37) is maximized if and only if $\beta_0 = 0$. Therefore, $C_t$ is willing to purchase the guide at $\bar q$ if and only if F follows M's recommendation in period $t$ and (as shown above) $Q^{\sigma}(s_t = 1 \mid h^C_t) = p$. □
Proof of Proposition 1. The main text contains the proof of the first statement. We prove the second statement. Assume by contradiction that there is an equilibrium $\sigma$ in which F and consumers always follow M's recommendation and $q = \bar q$. Lemma 2(2) then implies that $Q^{\sigma}(b_t = e_t = 1) = p$ for each period $t \ge 1$. If $s_1 = 1$ (which happens with probability $p$ in this equilibrium), then F follows M's recommendation only if $p - c + \frac{\delta}{1-\delta}\, p(p-c) \ge p$. This inequality is equivalent to $\delta \ge \frac{c}{p(p-c)+c}$, a contradiction.
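The equivalence between the incentive constraint and the bound on the discount factor can be verified step by step. The derivation below assumes $0 < \delta < 1$ and $p > c > 0$, so that multiplying through by $1-\delta$ and dividing by $p(p-c)+c$ preserves the direction of the inequality:

```latex
\begin{align*}
p - c + \frac{\delta}{1-\delta}\,p(p-c) \ge p
  &\iff \frac{\delta}{1-\delta}\,p(p-c) \ge c \\
  &\iff \delta\,p(p-c) \ge (1-\delta)\,c \\
  &\iff \delta\,\bigl[p(p-c)+c\bigr] \ge c \\
  &\iff \delta \ge \frac{c}{p(p-c)+c}.
\end{align*}
```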
We prove the third statement. Consider an equilibrium $\sigma$ in which $q = \bar q$ and consumers always follow M's recommendation. By Lemma 2(2), F always follows M's recommendation in this equilibrium and $Q^{\sigma}(e_t = b_t = 1) = p < 1$ for each period $t$. Hence, welfare equals $p(1-c)$. Note that in equilibrium $\sigma$, F exerts effort in period 1 if $s_1 = 1$. Consequently, there is an alternative equilibrium $\sigma'$ that is identical to $\sigma$ except that M recommends F in period 1 with probability 1 and $C_1$ trades with F without purchasing the guide. Since $Q^{\sigma'}(e_t = b_t = s_t) = 1$ for all $t \ge 1$, Lemma 1(2) implies that welfare in $\sigma'$ is at most $\bar W(\delta)$. It follows that welfare in equilibrium $\sigma$ is strictly lower than $\bar W(\delta)$. □
Proof of Proposition 2. Let > 0 and >
p(pc)
(1)p+[p(pc)+c]
be given. The proof proceeds by steps. In Step 1, we analyze the incentives of normal consumers if F does not always follow M's recommendation. In Step 2, we show that if $\varepsilon$ is sufficiently small, M can choose a recommendation policy so that in any continuation equilibrium F always follows M's recommendation. In Step 3, we show that if $\varepsilon$ is sufficiently small, there is no equilibrium in which $q < \bar q$. In Step 4, we show that if $\varepsilon$ is sufficiently small, there is no equilibrium in which $q = \bar q$ and some consumers do not purchase M's guide. From Steps 3 and 4 the result follows.

Step 1. Suppose that M chooses $P = P_{T^*,\varepsilon}$ (as defined in Section 3.2) with $T^* \ge 2$ and $q \le \bar q$. Assume that $\varepsilon$ is small enough so that $P_{T^*,\varepsilon}$ is well-defined. Consider any $\sigma_C$. Let there be a period $t$ and an M-history $h^M_t$ that occurs with positive probability, $Q^{\sigma}(s_t = 1 \mid h^M_t) > 0$ and $Q^{\sigma}(e_t = 1 \mid s_t = 1, h^M_t) < 1$. Let $h^C_t$ be the corresponding C-history. Since $\varepsilon > 0$, trade takes place in period $t$ with positive probability. Consider any period $\tau \in \{t+1, \dots, t+T^*\}$ and assume that the normal consumer $C_\tau$ observes $h^C_t$, $s_t = 1$ and that F has not been recommended since period $t$. Let $h^C_\tau$ be the corresponding C-history. Then, by the construction of $P_{T^*,\varepsilon}$, we have $Q^{\sigma}(s_\tau = 1 \mid h^C_\tau) < p$. If F exerts effort only if it is recommended to consumers, $C_\tau$'s expected payoff from trade with F without the guide is at most $Q^{\sigma}(s_\tau = 1 \mid h^C_\tau) - p < 0$, i.e., $C_\tau$ then strictly prefers not to trade with F.

Step 2. We show that we can find $T^*$
and $\bar\varepsilon > 0$ such that (i) $P_{T^*,\varepsilon}$ is well-defined if $\varepsilon < \bar\varepsilon$ and (ii) F always follows M's recommendation in any continuation equilibrium $(\sigma_F, \sigma_C)$ when $\varepsilon < \bar\varepsilon$ and M has chosen $q \le \bar q$ as well as $P = P_{T^*,\varepsilon}$ in period 0. Fix $T^*$ large enough such that
\[ (p - c) + \frac{\delta}{1-\delta}\, p(p-c) > p + \frac{\delta^{T^*+1}}{1-\delta}\, p. \tag{39} \]
The assumption on $\delta$ makes sure that this is possible. Clearly, if $\bar\varepsilon$ is sufficiently small, then $P_{T^*,\varepsilon}$ is well-defined when $\varepsilon < \bar\varepsilon$. Now assume by contradiction that there is a continuation equilibrium $(\sigma_F, \sigma_C)$ where F does not always follow M's recommendation when $P = P_{T^*,\varepsilon}$ and $q \le \bar q$. Consider any period $t$ and M-history $h^M_t$ that occurs with positive probability where $Q^{\sigma}(s_t = 1 \mid h^M_t) > 0$ and $Q^{\sigma}(e_t = 1 \mid s_t = 1, h^M_t) < 1$. Let $h^C_t$ be the corresponding C-history. If $s_t = 1$ and no trade takes place in period $t$, F's expected discounted payoff is independent of its effort. Suppose that $s_t = 1$ and trade takes place in period $t$. If F does not exert effort, M sets $s_\tau = 0$ in all periods $\tau \in \{t+1, \dots, t+T^*\}$. Clearly, F never exerts effort in a period where it is not recommended to consumers. By definition, commitment consumers never trade with F in this phase. By Step 1, the same holds for normal consumers. Consequently, if F exerts no effort in period $t$, its expected discounted payoff is less than $p + \frac{\delta^{T^*+1}}{1-\delta}\, p$, while if it follows M's recommendation in period $t$ and all future periods, it is at least $(p - c) + \frac{\delta}{1-\delta}\, p(p-c)$. Hence, the choice of $T^*$ ensures that it is a unique best response for F to always follow M's recommendation.

Step 3. We show that if $\varepsilon < \bar\varepsilon$, there is no equilibrium in which $q < \bar q$. Assume by contradiction that
such an equilibrium exists. If M chooses $P = P_{T^*,\varepsilon}$ and $q' \in (q, \bar q)$, then, by Step 2, the value of M's guide is $\bar q$ for each consumer, i.e., the unique best response for each consumer is to purchase M's guide and to follow the recommendation. Hence, by choosing $P = P_{T^*,\varepsilon}$ and $q'$, M earns more than if it chooses $q$, a contradiction.

Step 4. We show that if $\varepsilon < \bar\varepsilon$, there is no equilibrium in which $q = \bar q$ and some consumers do not purchase M's guide. Assume by contradiction that an equilibrium exists where for some period $t$ we have $a_\tau = 1$ for all $\tau \in \{1, \dots, t-1\}$ and $a_t = 0$ with positive probability on the equilibrium path. If M chooses $P_{T^*,\varepsilon}$ and lowers $q$ by a sufficiently small amount, then, by Step 2, it becomes the unique best response for all consumers to follow M's recommendation. Hence, M can increase its profit compared to the original situation, a contradiction. □

¹⁶ This is without loss of generality. If $Q^{\sigma}(e_t = 1 \mid s_t = 1, h^C_t) \le p$ and $Q^{\sigma}(e_t = 1 \mid s_t = 0, h^C_t) > p$ instead, the interpretation of $s_t = 1$ would simply change from "I recommend buying from F" to "I do not recommend buying from F".
Supplementary material
The online version of this article contains additional supplementary material.
Please visit http://dx.doi.org/10.1016/j.geb.2013.01.005.