Chapter 19
Modeling Learning Impacts on Day-to-day Travel
Choice
Ozlem Yanmaz-Tuzel and Kaan Ozbay, Rutgers University, U.S.A.
Abstract This paper uses Stochastic Learning Automata and Bayesian Inference theory to model drivers' day-to-day learning behavior in an uncertain environment. The proposed model addresses the adaptation of travelers on the basis of experienced choices and user-specific characteristics. The parameters of the model are estimated using individual commuter data obtained from the New Jersey Turnpike. The proposed model aims to capture commuters' departure time choice learning/adaptation behavior under disturbed network conditions (after a toll change), and to investigate commuters' responses to toll, travel time, and departure/arrival time restrictions while selecting their departure times. The results demonstrate the possibility of developing a psychological framework (i.e., learning models) as a viable approach to representing travel behavior.
1. Introduction
The need for a better understanding of the day-to-day evolution of travel choices has recently been recognized by several researchers (Ramming 2002). A better grasp of individual travelers' decisions will shed light on the behavioral processes involved in travel decisions. However, it is very difficult to understand, and then accurately represent, the underlying decision mechanisms in highly dynamic, non-stationary, and uncertain systems such as transportation systems. The key mechanisms underlying these decision dynamics include how commuters update their perceptions of the transportation system (i.e., learning) and how they adapt their behavior as a result of this learning.
In the transportation field, only a few studies have focused on jointly modeling learning and adaptation processes, and fewer still have attempted to estimate the parameters of such a model empirically (Jotisankasa and Polak 2006). The purpose of this study is to design an agent-based learning system that can model the learning and adaptation processes of travelers' day-to-day departure time choices in a non-stationary stochastic environment, and to test this model empirically on a real transportation network, the New Jersey Turnpike (NJTPK).
A wide variety of existing studies model the perception of travel time on a given day as a weighted average of previous days' travel times (Horowitz 1984; Ben-Akiva et al. 1991; Nakayama et al. 2001; Ettema et al. 2004; Jotisankasa and Polak 2005). However, recent research has shown that travelers do not necessarily minimize travel time when making a travel choice (Kahneman and Tversky 1979; Mahmassani and Chang 1985; Roth and Erev 1995; Avineri and Prashker 2003; Senbil and Kitamura 2004). Rather, they may adopt simple rules, such that favorable and acceptable outcomes associated with selecting a particular strategy increase the probability that the strategy will be chosen again. In a recent study, Jotisankasa and Polak (2006) developed a day-to-day learning model based on bounded rationality, similar to the one by Chen and Mahmassani (2004). The authors assumed that travelers would update their perception of travel time if the difference between the perceived and experienced travel time exceeded some threshold value. This kind of learning process, in which travelers update their behavior based on previous experiences, has been studied by applying Reinforcement Learning (Erev and Roth 1998; Erev et al. 1999; Nakayama et al. 2001; Arentze and Timmermans 2003; Avineri and Prashker 2003; Schreckenberg and Selten 2004; Miyagi 2005; Bogers et al. 2007; Selten et al. 2007), Bayesian Learning (BL) (Kobayashi 1994; March 1996; Jha et al. 1998; Chen and Mahmassani 2004), and Stochastic Learning Automata (SLA) (Ozbay et al. 2001; Ozbay et al. 2002; Ozbay and Yanmaz-Tuzel 2006).
SLA mimics drivers' day-to-day learning behavior by updating travelers' choice probabilities based on their experience with the system. Unlike other learning frameworks, SLA treats the environment as an unknown random medium in which an automaton operates, and considers the response of the environment to a specific action rather than the environment itself. In simple terms, the SLA approach is an inductive inference mechanism that updates the probabilities of its actions in a stochastic environment so as to improve a certain performance index. This process is closely related to BL, in which the distribution function of a parameter is updated at each instant on the basis of new information. However, in BL updating takes place according to Bayes' rule, while it is more general in SLA (Narendra and Thathachar 1989).
This research focuses on developing a theoretical framework for modeling behavioral mechanisms in departure time choice, and illustrates the implementation of this framework on the NJTPK toll road. In particular, SLA theory is extended with a Bayesian framework and bounded rationality (BR) to model NJTPK users' day-to-day learning behavior within the context of departure time choice. In this approach, the Bayesian framework systematically accounts for commuters' beliefs and perceptions about the transportation system, and represents these dynamics as random variables. Specifically, each user updates his/her choice based on the rewards/punishments received for the actions selected on previous days. At the end of each day, favorable actions are rewarded, while unfavorable actions are punished. Whether an action is favorable is determined using a bounded rationality approach, via indifference bands calculated around the traveler's experienced utility function value and deviation from the desired arrival time. Utility functions are estimated via Bayesian random-coefficient (BRC) models, considering variables other than travel time, such as toll, departure time, early/late arrival amount, income, employment type, education level, gender, age, and interactions between them. After favorable and unfavorable actions are determined, a linear reward-penalty reinforcement scheme is used to update the day-to-day learning behavior of NJTPK users, and to investigate commuters' responses to toll, travel time, and departure/arrival-time restrictions while selecting their departure times.
One of the main contributions of this paper is that it introduces user heterogeneity into day-to-day learning modeling via a Bayesian Inference approach, and estimates the posterior probability distribution of the learning parameters. In addition to the extended Bayesian-SLA framework it proposes, this paper uses extensive vehicle-by-vehicle real-world traffic data to understand traveler responses to real changes in the transportation system. This is a unique opportunity to test the validity of this and other models using observed vehicle-by-vehicle traffic data obtained from a complex and highly stochastic traffic system such as the NJTPK.
2. Bayesian-SLA Modeling Framework
In this paper, commuters' day-to-day learning behavior, on the basis of experienced travel choices and user-specific characteristics, is modeled via Bayesian-SLA theory. In SLA, the automaton attempts a solution to the problem without any information on the optimal action. One action is selected at random, the response from the environment is observed, the action probabilities are updated based on that response, and the procedure is repeated. A stochastic automaton acting as described, so as to improve its performance, is called a learning automaton (Narendra and Thathachar 1989). The objective in the design of the automaton is to determine how the choice of action at any stage should be guided by past actions and responses. The field of learning automata is concerned with the analysis and synthesis of automata that operate in random environments. In this section we describe the random environment, the structure and characteristics of the automaton, and the mathematical tools applicable to the analysis of such systems.
2.1 Environment
In SLA, the environment, in our case the transportation system, is defined as a large class of unknown random media in which an automaton (traveler) can operate. Mathematically, an environment is represented by a triple {α, c, β}, where α = {α1, ..., αr} is the set of actions available to the automaton, β is the set of responses (outputs) of the environment, and c = {c1, ..., cr} is the set of penalty probabilities, each ci corresponding to action αi. The automaton itself is characterized by two mappings:

F(.): a stochastic function that maps the current state and input into the next state
H(.): a function that maps the current state and input into the current output
Fig. 1. The automaton and the environment (Narendra and Thathachar 1989)
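The automaton-environment loop of Fig. 1 can be sketched in a few lines of code. This is a minimal illustration rather than the paper's implementation; the names (run_automaton, penalty_probs, update) are hypothetical, and the update rule is left abstract so that any reinforcement scheme can be plugged in.

```python
import random

def run_automaton(penalty_probs, update, p0, n_days=100, seed=0):
    """Automaton-environment interaction loop.

    The environment is the triple {alpha, c, beta}: the action set alpha is
    range(len(p0)), c = penalty_probs are its unknown penalty probabilities,
    and beta is the binary response (0 = favorable, 1 = unfavorable).
    `update` maps (probabilities, chosen action, response) to the next
    probability vector.
    """
    rng = random.Random(seed)
    p = list(p0)
    for _ in range(n_days):
        i = rng.choices(range(len(p)), weights=p)[0]        # select an action
        beta = 1 if rng.random() < penalty_probs[i] else 0  # environment responds
        p = update(p, i, beta)                              # learn from the response
    return p
```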
If the utility of the experienced travel choice falls within the indifference bands, the travelers will not update their choice. In calculating the indifference bands, 10% confidence intervals are used. The utility values employed in the behavior-updating mechanism are estimated via Bayesian random-coefficient (BRC) models. The next section provides the details of this framework.
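The bounded-rationality rule described above can be sketched as a three-way classification of each day's outcome. This is an illustrative reading of the indifference-band mechanism, with hypothetical names; in the paper the band half-width comes from a 10% confidence interval around the experienced utility.

```python
def classify_outcome(utility, reference, halfwidth):
    """Classify a day's experienced utility against an indifference band
    centered on the reference (previously experienced) utility.

    Returns 'no_update' inside the band, 'reward' above it, 'penalty' below.
    """
    if abs(utility - reference) <= halfwidth:
        return "no_update"
    return "reward" if utility > reference else "penalty"
```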
2.3.1 Bayesian Random-Coefficient Models
One of the important contributions of this paper is the introduction of population heterogeneity through the use of BRC models. This is especially important in the case of a learning model, where different users have different learning behaviors that can be represented by varying the coefficients of the model. The descriptive variables considered in the BRC models are summarized in Table 1.
In BRC, the coefficients of the utility model are assumed to vary in the population rather than being fixed at the same value for each person. Thus, unlike the classical approach, in Bayesian statistics parameters are treated as random variables, and prior knowledge about the parameter vector β (the model coefficient vector) is represented by a prior distribution, p(β). The prior distribution can be based either on previous empirical work or on the researchers' subjective beliefs. In this paper, we assume a multivariate normal distribution for the prior of the parameter vector β, which is the most commonly used prior for regression parameters:

p(β) ~ Normal_k(b0, Σ0),   (2)

where b0 is the vector of means for the k explanatory variables and Σ0 is the k×k variance-covariance matrix. To specify the values of b0 and Σ0, an empirical Bayes approach is utilized: part of the data set is selected at random, and the BRC model is fitted to this randomly selected subset using non-informative priors. The resulting posterior mean vector and variance-covariance matrix are then used as the prior values of b0 and Σ0 for the training dataset (85% of the dataset), and the BRC model is re-estimated. After some data are observed, by Bayes' Theorem the information about β is given by the posterior distribution, where the likelihood p(Y | X, β) is the logit choice probability Logit(X^T β) and f(β) is the Normal(b0, Σ0) prior density:

p(β | X, Y) = p(Y | X, β) f(β) / ∫ p(Y | X, β) f(β) dβ = Normal(β | b0, Σ0) Logit(X^T β) / ∫ Normal(β | b0, Σ0) Logit(X^T β) dβ.   (3)
The posterior distribution in the above equation is a complex multidimensional function that requires integration. Thus, sampling methods such as modern Bayesian Monte Carlo algorithms are needed to summarize the posterior distribution. In this paper, the Random Walk Metropolis (RWM) algorithm is used to produce Markov Chain Monte Carlo samples and estimate the BRC model for travel choice at the NJTPK. Based on the prior information and the likelihood function, the RWM algorithm approximates the asymptotic normal distribution

p(β | X, Y) ∝ |H|^(1/2) exp( -(1/2) (β - β̂)^T H (β - β̂) ),   (4)

where β̂ is the posterior mode and H is the negative Hessian of the log-posterior evaluated at β̂. The RWM proposal distribution is centered at the current value of β.
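A one-dimensional sketch of the RWM sampler may help make the mechanics concrete: propose a Gaussian step centered on the current value, then accept with probability min(1, posterior ratio). This is a generic textbook implementation under simplifying assumptions (scalar parameter, fixed step size), not the paper's Matlab code; all names are hypothetical.

```python
import math
import random

def rwm_sample(log_post, x0, step=0.5, n_iter=5000, seed=1):
    """Random Walk Metropolis for a scalar parameter (the paper's coefficient
    vector is multivariate, but the mechanics are the same).

    log_post: unnormalized log-posterior density.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)  # proposal centered at current value
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < log_post(proposal) - log_post(x):
            x = proposal
        samples.append(x)
    return samples
```

For example, with log_post = lambda t: -0.5 * (t - 2.0) ** 2 the sampler draws from a unit-variance normal centered at 2.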
Table 1. Descriptive variables considered in the BRC models

Pre-peak: Respondents traveling in pre-peak periods
Peak: Respondents traveling in peak periods
Post-peak: Respondents traveling in post-peak periods
tr time: Travel time, in hours
Toll: Toll paid per occupancy, in dollars
Early: Amount of early arrival time, in minutes
Late: Amount of late arrival time, in minutes
DepTime: (Departure time) - (Desired arrival time), in minutes
Income: Income level, in $10,000
Age: Age
Gender: 1 if female, 0 otherwise
Education: 1 if user has at least a bachelor's degree, 0 otherwise
Employment: 1 if user is a manager or professional, 0 otherwise
If the action αi selected on day n elicits a favorable response, β(n) = 0, the choice probabilities are updated as

pi(n+1) = pi(n) + a [1 - pi(n)]
pj(n+1) = (1 - a) pj(n),  j ≠ i;

if the response is unfavorable, β(n) = 1,

pi(n+1) = (1 - b) pi(n)
pj(n+1) = b/(r - 1) + (1 - b) pj(n),  j ≠ i,   (8)
where 0<a<1 is the reward parameter, and 0<b<1 is the penalty parameter of the
reinforcement scheme.
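The update rule of Eq. (8) can be written directly as code. A minimal sketch, assuming the standard linear reward-penalty form; the parameter values in the signature are placeholders, not the estimates reported later.

```python
def lrp_update(p, i, beta, a=0.05, b=0.01):
    """Linear reward-penalty (L_R-P) scheme of Eq. (8).

    p: current action probabilities; i: action chosen on day n; beta:
    environment response (0 = favorable, 1 = unfavorable); 0 < a, b < 1.
    """
    r = len(p)
    if beta == 0:
        # Favorable response: reward the chosen action.
        new = [(1 - a) * pj for pj in p]
        new[i] = p[i] + a * (1 - p[i])
    else:
        # Unfavorable response: penalize the chosen action.
        new = [b / (r - 1) + (1 - b) * pj for pj in p]
        new[i] = (1 - b) * p[i]
    return new
```

Note that both branches preserve the total probability mass, so the vector remains a valid distribution after every update.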
The concepts associated with the convergence of SLA require sophisticated mathematical tools, and the nature of convergence depends on the kind of reinforcement scheme employed (Narendra and Thathachar 1989). A multi-action automaton using the linear reward-penalty scheme L_R-P is expedient for all initial action probabilities and in all stationary random environments; that is, the automaton will behave better than the pure-chance automaton. The details of the derivation of the expediency criterion can be found in Narendra and Thathachar (1989).
In previous studies, the learning parameters (a, b) were estimated via a trial-and-error approach (Narendra and Thathachar 1989; Ozbay et al. 2001; Ozbay et al. 2002; Ozbay and Yanmaz-Tuzel 2006). This study, on the other hand, utilizes Bayesian Inference theory and estimates the posterior distribution of these parameters. The next section provides the details of the estimation process.
In the proposed framework, the posterior distributions of the learning parameters are estimated. Unlike maximum likelihood analysis, the aim of a Bayesian analysis is not to provide so-called point estimates of the model parameters; the result of the analysis is the posterior probability distribution itself. With this approach, it is possible to introduce user heterogeneity into the estimation process and to investigate the distribution of the learning parameters among different users.
The proposed likelihood function of the observations is

p(D | a, b) = ∏(k=1..K) ∏(n=1..N) ∏(i=1..r) p_ki(n),   (9)

where each factor follows from the reinforcement scheme of Eq. (8):

p_ki(n) = [p_ki(n-1) + a (1 - p_ki(n-1))]^(δ_ki(n-1) (1 - β_k(n-1)))
× [(1 - a) p_ki(n-1)]^((1 - δ_ki(n-1)) (1 - β_k(n-1)))
× [(1 - b) p_ki(n-1)]^(δ_ki(n-1) β_k(n-1))
× [b/(r - 1) + (1 - b) p_ki(n-1)]^((1 - δ_ki(n-1)) β_k(n-1)),

where p(D | a, b) is the likelihood of the observations D given the learning parameters (a, b); k is the index over users (K: total number of users); n is the index over days (N: total number of days); i is the index over choices (r: total number of choices); p_ki(n-1) is the probability that user k selects choice i on day (n-1); and

δ_ki(n-1) = 1 if user k selects choice i on day (n-1), and 0 otherwise;
β_k(n-1) = 1 if the outcome experienced by user k on day (n-1) is unfavorable, and 0 otherwise.   (10)
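For a single user, the likelihood can be evaluated by replaying the probability updates day by day and accumulating the log-probability of each observed choice. A sketch under the assumption that beta = 0 marks a favorable day and that the initial choice probabilities are uniform; the helper name log_likelihood is hypothetical.

```python
import math

def log_likelihood(choices, responses, r, a, b):
    """Log-likelihood of one user's observed choice sequence (K = 1).

    choices[n]: index of the action the user picked on day n.
    responses[n]: 0 if day n's outcome was favorable, 1 otherwise.
    r: number of available choices; a, b: reward/penalty parameters.
    """
    p = [1.0 / r] * r          # uniform initial choice probabilities
    ll = 0.0
    for i, beta in zip(choices, responses):
        ll += math.log(p[i])   # probability of the observed choice on day n
        if beta == 0:          # favorable: reward the chosen action
            newp = [(1 - a) * pj for pj in p]
            newp[i] = p[i] + a * (1 - p[i])
        else:                  # unfavorable: penalize the chosen action
            newp = [b / (r - 1) + (1 - b) * pj for pj in p]
            newp[i] = (1 - b) * p[i]
        p = newp
    return ll
```

Evaluating this quantity on a grid of (a, b) values, or inside an MCMC sampler, gives the likelihood surface that the posterior in Eq. (12) is built on.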
The prior distribution of the learning parameters is taken to be bivariate normal:

p(a, b) = 1 / (2π |Σ|^(1/2)) exp( -(1/2) (x - μ)^T Σ^(-1) (x - μ) ),   (11)

where x = (a, b)^T. By Bayes' Theorem, the posterior distribution is

p(a, b | D) = p(D | a, b) p(a, b) / p(D).

Since p(D) is independent of (a, b), the posterior distribution of the learning parameters is proportional to the product of the likelihood function and the prior distribution of the learning parameters:

p(a, b | D) ∝ p(D | a, b) p(a, b).   (12)
The posterior distribution of the learning parameters is a very complex multidimensional function that requires integration. Thus, in order to obtain the mean and variance of the parameters (a, b), the Metropolis-Hastings (M-H) algorithm is used. The Metropolis-Hastings algorithm is a rejection sampling algorithm used to generate a sequence of samples from a probability distribution that is difficult to sample from directly. The details of the algorithm can be found in Gelman et al. (2003). Since inference on the posterior distribution of (a, b) is based on simulating the posterior distribution by constructing an MCMC via the M-H algorithm, the chain needs to be monitored and tested for convergence. To ensure MCMC convergence, the first diagnostic test of Heidelberger and Welch (1983) was employed. This diagnostic compares the observed sequence of MCMC samples to a hypothetical stationary distribution using the Cramer-von Mises statistic. The test iteratively discards the first 10% of the chain until the null hypothesis is not rejected (i.e., the chain is stationary), or until only 50% of the original chain remains. If the null hypothesis is rejected each time, the stationarity test fails. For those samples that pass the stationarity test, a second test, which calculates a (1 - α) × 100% confidence interval on the sample mean, is executed. The half-width of this interval is compared to the mean over the same interval; if the ratio of the half-width to the mean is larger than some threshold, the test fails. Only the parameters that passed both tests are included in the final estimation. Given these specifications, the next section focuses on the empirical testing of the proposed model.
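The half-width portion of the convergence check can be sketched as follows. This is a simplified stand-in: the actual Heidelberger-Welch test uses a spectral estimate of the variance of the mean, while this sketch uses the naive sample variance; the function name and defaults are hypothetical.

```python
import math

def halfwidth_test(samples, eps=0.1, z=1.96):
    """Half-width check in the spirit of Heidelberger and Welch (1983).

    The run passes when the half-width of an approximate 95% interval on the
    sample mean is at most a small fraction (eps) of the mean itself.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    halfwidth = z * math.sqrt(var / n)  # interval half-width for the mean
    return abs(halfwidth) <= eps * abs(mean)
```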
3. Model Estimation
Two types of datasets were considered while training and testing the proposed learning model. The first dataset covers the traffic data, which include real-world vehicle-by-vehicle traffic and travel time data observed from passenger cars with toll tags. The traffic data contain the counts for each 1-hour time interval from 6:00 am to 10:00 am from January 2003 to March 2003, the three months following the toll increase at the NJTPK (Ozbay et al. 2005). The travel time data include the means and standard deviations of the travel times observed for the corresponding time period. Weekends and holidays were excluded from the database during the estimation process, leaving approximately 15 days per month. A preliminary analysis of the response of travelers to disturbed conditions (the toll increase in January 2003) can be found in Ozbay et al. (2006). The results of this analysis revealed that travelers do not make their travel choices solely on the basis of toll differentials; rather, travelers' individual preferences affect their travel behavior.
The second dataset is an individual travel survey, which was used to estimate the utility functions and to provide information regarding users' departure time choices and their socio-economic characteristics. The surveys were conducted by the Eagleton Institute of Rutgers University (Ozbay et al. 2005). The data set contains 513 observations, 483 (94.2%) of which are current regular users residing in NJ. The survey participants were asked in detail about their most recent trips in the am and pm peaks. The questions covered the origin, destination, toll, departure time, and desired/actual arrival time of each trip, as well as socio-economic characteristics such as income, education, employment, age, and gender.
Table 2. Estimated BRC utility model coefficients (posterior means and standard deviations)

Variable          E-ZPass pre-peak    E-ZPass peak       E-ZPass post-peak
                  Mean      SD        Mean      SD       Mean      SD
Constant          -3.621    0.122     -4.251    0.160    -4.025    0.110
DepTime           -0.069    0.045     -0.083    0.029    -0.082    0.025
tr time           -1.861    0.691     -1.853    0.350    -1.960    0.220
Toll              -0.673    0.112     -0.680    0.121    -0.650    0.150
Early             -0.086    0.019     -0.105    0.028    -0.109    0.023
Income             0.235    0.033      0.242    0.012     0.238    0.013
Late              -0.125    0.041     -0.114    0.035    -0.118    0.047
income*tr time    -0.218    0.051     -0.198    0.064    -0.215    0.054
income*toll       -0.217    0.025     -0.218    0.011    -0.282    0.026
toll*tr time      -0.895    0.055     -0.713    0.035    -0.887    0.089
Education          0.712    0.062      1.241    0.055     0.520    0.085
Age                0.088    0.055      0.112    0.036     0.098    0.039
Employment         0.667    0.219      0.861    0.025     0.620    0.063
Gender             0.751    0.123      0.951    0.062     0.439    0.050
The bivariate Normal prior for the learning parameters, N(μ, Σ), was specified with

μ = [0.06, 0.02]^T,  Σ = [0.02  0; 0  0.06].   (13)
In order to determine the joint posterior distribution that best represents traveler behavior, mean standard deviations (MSD) for each day were calculated as the percent difference between the observed traffic values and the traffic volumes assigned using the converged learning parameters. The parameters that minimize the MSD value were selected. Fig. 2 provides samples from the joint posterior distribution of the converged learning parameters (a, b). The samples were obtained from the Metropolis-Hastings algorithm with 10,000 iterations, coded in Matlab. The sensitivity analysis revealed that the Normal prior distribution resulted in the lowest mean MSD, at a value of 0.07. These results show that the proposed Bayesian-SLA model can successfully mimic NJTPK travelers' day-to-day travel behavior.
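The MSD criterion described above can be sketched as a percent-deviation measure between observed and assigned volumes. The exact aggregation used in the paper may differ; the function name is hypothetical.

```python
def mean_percent_deviation(observed, assigned):
    """Percent difference between observed traffic counts and the volumes
    assigned by the learning model, averaged over the day's time intervals.
    Intervals with zero observed count are skipped to avoid division by zero.
    """
    terms = [abs(o - s) / o for o, s in zip(observed, assigned) if o > 0]
    return 100.0 * sum(terms) / len(terms)
```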
Fig. 3 shows the histogram of each learning parameter. The estimation process resulted in a Beta distribution for the posterior of each parameter. Since the Beta distribution always lies within [0, 1], the constraints on the learning parameters are satisfied at all times. The mean values of the parameters (a, b) are (0.062, 0.0067), and the standard deviations are (0.0046, 0.0021), respectively.
The fitted posterior density of the reward parameter is a Beta distribution rescaled to the interval [0.003, 0.106]:

p(a) = (a - 0.003)^1.29 (0.106 - a)^0.73 / [B(2.29, 1.73) (0.103)^3.02],   (14)

and an analogous rescaled Beta density was obtained for the penalty parameter b.   (15)
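As a consistency check, the rescaled Beta density with parameters (2.29, 1.73) on [0.003, 0.106] can be coded and integrated numerically; its mean comes out near the reported posterior mean of a, 0.062. The function name is hypothetical.

```python
import math

def scaled_beta_pdf(x, alpha, beta, lo, hi):
    """Density of a Beta(alpha, beta) variable rescaled to [lo, hi].

    With alpha = 2.29, beta = 1.73, lo = 0.003, hi = 0.106 this matches the
    fitted posterior of the reward parameter a.
    """
    if not lo < x < hi:
        return 0.0
    # Beta function B(alpha, beta) via the gamma function.
    bfun = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    norm = bfun * (hi - lo) ** (alpha + beta - 1.0)
    return (x - lo) ** (alpha - 1.0) * (hi - x) ** (beta - 1.0) / norm
```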
The mean values of the reward and penalty parameters estimated for NJTPK users differ from those reported in other disciplines. These relatively different values may stem from the fact that NJTPK commuters are familiar with the system and can therefore adapt to changes in it rather quickly. This is in fact expected behavior for NJTPK users, since most E-ZPass users are frequent users of the NJTPK and are familiar with daily traffic conditions.
4. Conclusions
In this paper, SLA theory was extended to take advantage of the powerful Bayesian Inference approach, to model drivers' day-to-day learning behavior within the context of departure time choice, and to evaluate the effect of a feedback mechanism on decision-making under uncertainty. The proposed model has several advantages, as summarized below:
1. Day-to-day learning behavior is modeled based on Bayesian-SLA theory, where each user updates his/her choice based on the rewards/punishments received for the actions selected on previous days. A linear reward-penalty reinforcement scheme is used to represent the day-to-day behavior of NJTPK users in response to toll changes while selecting their departure times.
2. The original SLA model proposed by Ozbay et al. (2001, 2002) considered
only travel times with some random perception error, and assumed the same
reward/penalty parameters for each user. In this paper, by using realistic utility functions we introduced user heterogeneity into the SLA modeling process.
3. Instead of using just travel times, utility functions were introduced into the
learning model. These functions were estimated via BRC models, considering
a wide variety of explanatory variables, including travel time, toll, departure
time, early/late arrival, income, education, employment, age, and gender.
4. Finally, the learning parameters were modeled as probability distributions rather than deterministic values, and their Bayesian posterior distributions were estimated.
To the best of our knowledge, this is the first attempt to dynamically model the variations in perception in a day-to-day travel choice model. The estimation process, conducted via the Bayesian Inference approach, resulted in a Beta distribution for the posterior of both learning parameters. The mean values of the learning parameters (a, b) are (0.062, 0.0067), and the standard deviations are (0.0046, 0.0021). These results show that the learning parameters are not constant across different users of the transportation system; rather, they exhibit perception variations among the population.
This paper has attempted to gain insights into commuters' learning behavior in uncertain and dynamic environments. The results show that the proposed Bayesian-SLA model can successfully mimic NJTPK travelers' day-to-day travel behavior, demonstrating that a psychological framework (i.e., learning models) is a viable approach to representing travel behavior. The present framework does not incorporate traffic assignment into the modeling process; rather, it uses observed travel times to model the learning behavior. A natural next step is to integrate the proposed day-to-day update mechanism into dynamic traffic assignment, so that learning models can serve as an alternative representation of traveler behavior within assignment procedures.
References
Avineri, E. and Prashker, J.N. (2003). Sensitivity to uncertainty: the need for a paradigm shift.
Transportation Research Record, 1854, 90-98.
Arentze, T. and Timmermans, H. (2003). Modeling learning and adaptation processes in activity-travel choice: a framework and numerical experiment. Transportation, 30, 37-62.
Bogers, E.A.I., Bierlaire, M. and Hoogendoorn, S.P. (2007). Modeling learning in route choice.
Transportation Research Record, 2014, 1-8.
Ben-Akiva, M.E., De Palma, A. and Kaysi, I. (1991). Dynamic network models and driver information systems. Transportation Research Part A, 25, 251-266.
Chen, R.B. and Mahmassani, H.S. (2004). Travel time perception and learning mechanisms in
traffic networks. Transportation Research Record, 1894, 209-221.
Erev, I. and Roth, A.E. (1998). Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88,
848-881.
Erev, I., Bereby-Meyer, Y. and Roth, A. (1999). The effect of adding a constant to all payoffs:
experimental investigation, and implications for reinforcement learning models. Journal of
Economic Behavior & Organization, 39, 111-128.
Ettema, D., Timmermans, H. and Arentze, T. (2004). Modeling perception updating of travel
times in the context of departure time choice under ITS. ITS Journal, 8, 33-43.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin D.B. (2003). Posterior simulation. Bayesian Data
Analysis, 291-292.
Heidelberger, P. and Welch, P. (1983). Simulation run length control in the presence of an initial
transient. Operations Research, 31, 1109-1144.
Horowitz, J.L. (1984). The stability of stochastic equilibrium in a two-link transportation network. Transportation Research Part B, 18, 13-28.
Jha, M., Madanat, S. and Peeta, S. (1998). Perception updating and day-to-day travel choice dynamics in traffic networks with information provision. Transportation Research Part C, 6,
189-212.
Jotisankasa, A. and Polak, J.W. (2005). Modeling learning and adaptation in route and departure
time choice behavior: achievements and prospects. Integrated Land-Use and Transportation
Models. Elsevier, Oxford, United Kingdom.
Jotisankasa, A. and Polak, J.W. (2006). Framework for travel time learning and behavioral adaptation in route and departure time choice. Transportation Research Record, 1985, 231-240.
Kahneman, D. and Tversky, A. (1979). Prospect theory: an analysis of decisions under risk.
Econometrica, 47, 263-291.
Kobayashi, K. (1994). Information, rational expectations, and network equilibria: an analytical
perspective for route guidance systems. Annals of Regional Science, 28, 369-393.
Mahmassani, H.S. and Chang, G.L. (1985). Dynamic aspects of departure time choice behavior
in commuting system: theoretical framework and experimental analysis. Transportation Research Record, 1037, 88-101.
March, J.G. (1996). Learning to be risk averse. Psychological Review, 10, 309-319.
Miyagi, T. (2005). A reinforcement learning model for simulating route choice behaviors in
transport network. Proceedings of the 16th Mini-EURO Conference and 10th Meeting of
EWGT.