
BOLYAI SOCIETY MATHEMATICAL STUDIES, 16

Entropy, Search, Complexity, pp. 151–158.

Reinforced Random Walk

MICHAEL KEANE

One of the distinguishing properties of the present scientific method is reproducibility. In one of its guises, probability theory is based on statistical reproduction: near certainty of the truth of statements is obtained by averaging over the long term to remove the randomness occurring in individual experiments. When one assumes, as is often the case, that events farther and farther in the past have less and less influence on the present, the probabilistic paradigm is currently well understood and is successful in many scientific and technological applications. Recently, however, we have come to realize that precisely in these applications important stochastic processes occur whose present outcomes are significantly influenced by events in the remote past. This behaviour is not at all well understood, and some of the simplest questions remain today irritatingly beyond reach. A salient example occurs in the theory of random walks, where there is a dichotomy between recurrent and transient behaviour. After explaining this classical dichotomy, we present a very simple example with infinite memory which is not known to be either recurrent or transient. Then, using a reinforcement mechanism due to Pólya, we explain the nature of a particular infinite memory process in terms of the spontaneous emergence of opinions. Finally, we would like to discuss briefly some of our recent results towards understanding the recurrence–transience dichotomy for reinforced random walks.

An open problem

First consider the graph whose vertices are the points of $\mathbb{Z}^d$ and whose edges are the subsets $\{z, z'\}$ of $\mathbb{Z}^d$ with $|z - z'| = 1$, $|\cdot|$ denoting Euclidean distance. Here $d$ is a fixed positive integer. Let $S_n$, $n \ge 0$, be the position of a simple random walker on this graph:

$S_0 = 0$ (the origin of $\mathbb{Z}^d$);

$|S_{n+1} - S_n| = 1$ for each $n \ge 0$;


the jumps $X_{n+1} = S_{n+1} - S_n$, $n \ge 0$, are independent identically distributed random variables whose common distribution is uniform over the $2d$ possible elements of $\mathbb{Z}^d$ of length one.

In general, a discrete time and space stochastic process $S_n$, $n \ge 0$, is called recurrent if
$$P(\exists\, n \ge 1 : S_n = S_0) = 1,$$
and transient if this probability is less than one. We recall at this point a remarkable result due to Pólya:

If $d = 1$ or $d = 2$, simple random walk on $\mathbb{Z}^d$ is recurrent.

If $d \ge 3$, simple random walk on $\mathbb{Z}^d$ is transient.

As simple random walk is a Markov process, this yields a strong dichotomy:

If $d = 1$ or $d = 2$, each point of $\mathbb{Z}^d$ is visited an infinite number of times by $S_n$, with probability one.

If $d \ge 3$, each point of $\mathbb{Z}^d$ is visited at most finitely often by $S_n$, with probability one; that is, $\lim_{n\to\infty} |S_n| = \infty$ with probability one.

Our open problem consists of a seemingly slight modification of simple random walk on $\mathbb{Z}^2$. We denote the modified stochastic process by $S_n$, $n \ge 0$. We still require:

$S_0 = 0$ (the origin of $\mathbb{Z}^2$);

$|S_{n+1} - S_n| = 1$ for each $n \ge 0$;

the jumps $X_{n+1} = S_{n+1} - S_n$, $n \ge 0$, are distributed over the four possible elements of $\mathbb{Z}^2$ of length one.

However, these jumps $X_{n+1}$ will no longer be uniformly distributed; their distribution will depend on the history $\{S_0, \ldots, S_n\}$ of our process. The idea is to make the probability of traversing an edge at time $n + 1$ larger than $1/4$ if the edge was previously traversed, and less than $1/4$ if the edge was not previously traversed. A simple mechanism for this is to introduce a parameter $a > 1$, fixed, and to require that this probability is proportional to $a$ for edges previously traversed and proportional to $1$ for edges not previously traversed. For example, if $a = 2$ and if we arrive at a point $z \in \mathbb{Z}^2$ at time $n$, having previously traversed two of the four edges containing $z$ (in either direction),


then the probability of leaving $z$ at time $n + 1$ via one of these two edges is
$$\frac{a}{2a + 2} = \frac{1}{3},$$
whereas the probability of leaving $z$ via one of the two virgin edges is
$$\frac{1}{2a + 2} = \frac{1}{6};$$
note that $a$ is simply the factor by which the probability of taking an edge is increased if it has previously been taken. Note also that if $a = 1$ we have the original simple random walk of Pólya in $\mathbb{Z}^2$, which is recurrent.

Open Problem. Prove that for some $a > 1$, the reinforced random walk $S_n$, $n \ge 0$, is recurrent.

We believe strongly that this is true for any $a > 1$, and in fact, it is easy to show that it is true in $d = 1$ for any $a > 0$. (If $a < 1$ then, strictly speaking, we should not use the word reinforced, but this is a minor point.) However, the process $S_n$ is in no way similar to a Markov process, having memory which does not attenuate as time progresses, and up to now this problem remains beyond reach using currently known techniques.
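To see the mechanism in action, here is a minimal simulation sketch of this once-reinforced walk on $\mathbb{Z}^2$. It is an illustrative addition, not part of the original text: the function name and sample sizes are my own, and of course no finite simulation can decide the open problem; the sketch only shows how the edge weights enter the jump distribution.

```python
import random

def reinforced_return_frequency(a=2.0, steps=10_000, trials=200):
    """Estimate how often the once-reinforced walk on Z^2 with
    reinforcement parameter a returns to the origin within `steps`
    steps (illustrative only; finite runs cannot decide recurrence)."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    returns = 0
    for _ in range(trials):
        pos = (0, 0)
        traversed = set()  # undirected edges traversed so far
        for _ in range(steps):
            neighbours = [(pos[0] + dx, pos[1] + dy) for dx, dy in moves]
            # weight a for previously traversed edges, 1 for virgin edges
            weights = [a if frozenset((pos, q)) in traversed else 1.0
                       for q in neighbours]
            nxt = random.choices(neighbours, weights=weights)[0]
            traversed.add(frozenset((pos, nxt)))
            pos = nxt
            if pos == (0, 0):
                returns += 1
                break
    return returns / trials

print(reinforced_return_frequency(a=2.0))
```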

Spontaneous emergence of opinions


In this section we attempt to give the reader a feeling for the naturalness of questions concerning reinforcement. We feel this to be necessary for the following philosophical reason. The Markovian paradigm has been an extremely successful driving force for the development of probability theory over the last century, arguably the most important for practical applications. The main reason for this is that a large number of physical and technological stochastic processes do have an asymptotic forgetfulness property: what happened in the past has less and less effect as time progresses, or, what happened very far away does not significantly influence the happenings here. This is a so-called locality principle which, if we exclude quantum mechanics, seems to be a law of nature; even in quantum probability there has been a large discussion and much disagreement concerning the possible validity of this type of locality. (As it now seems, the question has been


decided through Bell's work and the Aspect experiments, but the discussion still continues.) Thus it seems to be useful to point to naturally arising processes which in no sense obey this principle. Here, I have chosen to illustrate the point with a story concerning the spontaneous emergence of opinions; there are a number of parallel ideas in the realm of (universal) coding theory, and in this audience there are certainly people more capable than I of illustrating this phenomenon. What follows is not new mathematics, being based upon classical ideas again due to Pólya, developed around the 1920s; the important point is the interpretation in terms of reinforcement and the non-Markovian nature, leading to surprising behavior.

Let me begin by telling the story. Some 25 years ago I moved to The Netherlands and bought a house in Scheveningen, a bathing resort on the coast which is part of the town of The Hague, seat of government and royal residence. We knew at that point nothing about the surroundings. Our house was very close to the beach and also to the center of night life, so in the evening we had two natural alternatives for amusement:

a visit to a nice bar $B$;

a stroll on the beautiful beach $b$.

Thus we consider a graph with one vertex (our house) and two loops $B$ (bar) and $b$ (beach) from the vertex to itself. Our original opinion is denoted by a function giving weights to each of the loops; we start with each loop having weight one. Each evening we search for amusement; for simplicity let us assume (but see the next section for more generality) that our search is restricted to $B$ and $b$, and we choose one of these with a probability proportional to its current weight. Thus the first evening we visit the bar with probability $1/2$ and the beach with probability $1/2$. If, for example, we visit the bar and have a drink, we enjoy it very much, and decide to increase the weight of $B$ by adding one to it; this would result in weight two for bar and weight one for beach. If the beach is visited, its weight will be increased by one. After $n - 2$ nights of entertainment, suppose that we have chosen $k - 1$ times bar and $l - 1$ times beach, with $k, l \ge 1$ and $k + l = n$. Then at this time, the bar weight is $k$ and the beach weight is $l$, and the probability
$$\alpha_n = \frac{k}{n}$$

of choosing bar the next time is our alcohol preference, whereas the probability
$$\beta_n = \frac{l}{n} = 1 - \alpha_n$$
is our nature preference, or probability of choosing beach the next time. Thus our opinion at time $n$ (after $n - 2$ choices) is represented by the random pair $(\alpha_n, \beta_n)$. It is a (not so simple but) interesting result of Pólya which tells us:

Our opinion becomes more and more stable as time passes.

or, in mathematical terms,
$$\alpha = \lim_{n\to\infty} \alpha_n, \qquad \beta = \lim_{n\to\infty} \beta_n$$
exist with probability one. Nowadays we understand the reason behind this result to be contained in the martingale convergence theorem. That is, a simple calculation yields:
$$E\Big[\alpha_{n+1} \,\Big|\, \alpha_n = \frac{k}{n}\Big] = \frac{k}{n}\cdot\frac{k+1}{n+1} + \frac{l}{n}\cdot\frac{k}{n+1} = \frac{k(n+1)}{n(n+1)} = \frac{k}{n} = \alpha_n,$$

so that $\alpha_n$ is indeed a positive martingale, and the theory says that all positive martingales converge. (Pólya's proof was different, and perhaps simpler; it is not such an easy matter to prove martingale convergence theorems.)

Next comes a minor surprise: suppose that not only I have bought a house in Scheveningen, but also a number of others, who search in the same manner for entertainment. Each of these householders develops a more and more stable opinion, but these opinions differ in the limit. For instance, perhaps in the limit

$\alpha(\text{Keane}) = 0.9$, $\alpha(\text{v.d. Toorn}) = 0.2$, $\alpha(\text{Pronk}) = 0.6$, $\alpha(\text{Hermina}) = 0.7$, $\alpha(\text{Lootsma}) = 0.3$, $\alpha(\text{Trahtenbroit}) = 0.8$,

and so on. The second part of Pólya's result tells us the distribution of the limit opinion $\alpha$:

The random limit $\alpha$ is uniformly distributed over the unit interval.

Thus, although the alcohol preference of a given individual is random, we can predict the preferences of the population as a whole using its distribution, which is uniform! Some are alcoholics, some are nature lovers, and all types of mixtures occur equally often; each person is convinced of his or her own opinion. This behavior is one of the salient characteristics of reinforced random walk. It is in fact very surprising that such a calculation can be accomplished, and in our open problem of the first section we know of no way to do this type of calculation. The reason that it is possible in this case is usually called (partial) exchangeability. This has been intensively studied by Diaconis and Freedman, and later on trees by Pemantle. In our example things are very simple, which we now explain.

It is best to take a sample event. Suppose that for the first eight visits, three were visits to the beach and five were visits to the bar, in the following order: $BBbBbbBB$ ($B$ = bar, $b$ = beach). Let us now calculate the probability of this event; after a bit of thought we see that it is
$$\frac{1}{2}\cdot\frac{2}{3}\cdot\frac{1}{4}\cdot\frac{3}{5}\cdot\frac{2}{6}\cdot\frac{3}{7}\cdot\frac{4}{8}\cdot\frac{5}{9} = \frac{3!\,5!}{9!}.$$
(The reader should now verify this calculation step by step.) At this point, $n = 10$, $k = 6$, $l = 4$, and $\alpha_{10} = \frac{6}{10}$. The important point to notice is that if we have any other sequence of eight visits with three $b$'s and five $B$'s, then this probability remains the same. More generally, if $n$, $k$, and $l$ are given, then $\alpha_n = \frac{k}{n}$, and each sequence of $k - 1$ $B$'s and $l - 1$ $b$'s has probability
$$\frac{(k-1)!\,(l-1)!}{(n-1)!};$$
as there are
$$\frac{(n-2)!}{(k-1)!\,(l-1)!}$$
such sequences, we see that
$$P\Big(\alpha_n = \frac{k}{n}\Big)$$
is the product of these two numbers, which is simply $\frac{1}{n-1}$. This is valid for each $k$ in the range $1 \le k \le n - 1$, which shows clearly that the limit distribution is uniform.
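Both parts of Pólya's result can be observed numerically. The sketch below is an illustrative addition (the function name and sample sizes are my own choices): it runs many independent copies of the bar–beach urn, and the empirical distribution of the late-time values $\alpha_n$ comes out close to uniform, roughly ten percent per decile of $[0, 1]$.

```python
import random

def polya_alpha(picks=10_000):
    """One copy of the urn: bar and beach each start with weight one;
    each evening a loop is chosen with probability proportional to its
    weight, and the chosen loop's weight is increased by one.  Returns
    alpha_n = bar weight / total weight after `picks` choices."""
    bar, beach = 1, 1
    for _ in range(picks):
        if random.random() < bar / (bar + beach):
            bar += 1
        else:
            beach += 1
    return bar / (bar + beach)

alphas = [polya_alpha() for _ in range(1_000)]
# If the limit is uniform on (0, 1), each decile should catch about 10%.
for k in range(10):
    share = sum(k / 10 <= x < (k + 1) / 10 for x in alphas) / len(alphas)
    print(f"[{k / 10:.1f}, {(k + 1) / 10:.1f}): {share:.3f}")
```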

This concludes our philosophical section.

The current state of affairs


In this section I describe informally what we have done in the past few years concerning reinforcement. First of all, the result of Pólya for the simple case above has been extended to random walk with reinforcement on a finite graph with arbitrary initial weights, according to a suggestion of Diaconis and Coppersmith. This, together with related references, can be found in [1]. Sellke and Vervoort have studied intensively once-reinforced random walks (as in the first section) on bi-infinite strips of finite width, which are called ladders. The most recent results can all be found in the thesis of Vervoort [3]. It is still unknown whether, for any value of the reinforcement parameter $a$, once-reinforced random walk is recurrent or not on ladders of width larger than two. The case of width two was settled a number of years ago by an interesting calculation due to T. Sellke (unpublished). In [2], we treat multiple reinforcement in essentially one-dimensional cases (tubes) when the weights are put on directed edges. Curiously enough, this is simpler and leads to relations with random walk in random environment. It is still unknown whether random walk with multiple reinforcement is recurrent or transient in two or higher dimensions, or whether the behavior depends on the amount of reinforcement. We do not discuss here the case in which the underlying graph is a tree, for which there are many interesting and nontrivial results.
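To make the ladder setting concrete, here is a hedged sketch adapting the $\mathbb{Z}^2$ simulation above to the width-two ladder $\mathbb{Z} \times \{0, 1\}$, the case settled by Sellke's calculation; the encoding and names are my own illustration, not code from the papers cited.

```python
import random

def ladder_visits(a=2.0, steps=100_000):
    """Once-reinforced walk on the ladder Z x {0, 1}: from (x, y) the
    neighbours are (x - 1, y), (x + 1, y) and the rung partner (x, 1 - y).
    Returns the number of returns to the starting vertex (0, 0)."""
    pos = (0, 0)
    traversed = set()  # undirected edges traversed so far
    visits = 0
    for _ in range(steps):
        x, y = pos
        neighbours = [(x - 1, y), (x + 1, y), (x, 1 - y)]
        weights = [a if frozenset((pos, q)) in traversed else 1.0
                   for q in neighbours]
        nxt = random.choices(neighbours, weights=weights)[0]
        traversed.add(frozenset((pos, nxt)))
        pos = nxt
        if pos == (0, 0):
            visits += 1
    return visits

print(ladder_visits())
```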

References


[1] M. S. Keane and S. W. W. Rolles, Edge-reinforced random walks on finite graphs, in: Infinite Dimensional Stochastic Analysis (eds. Ph. Clément et al.), KNAW Verhandelingen, Afdeling Natuurkunde, Eerste Reeks, Deel 52 (2000), pp. 217–234.

[2] M. S. Keane and S. W. W. Rolles, Tubular recurrence, Acta Mathematica Hungarica, 97 (2002), 207–221.

[3] M. Vervoort, Games, walks and grammars: some problems I've worked on, Dissertation, University of Amsterdam (2000).

Michael Keane
Department of Mathematics
Wesleyan University
Middletown, Connecticut 06459
U.S.A.
mkeane@wesleyan.edu
