Abstract: A methodology for estimating the destination of passenger journeys from automated fare collection (AFC) system
data is described. It proposes new spatial validation features to
increase the accuracy of destination inference results and to verify
key assumptions present in previous origin-destination estimation
literature. The methodology applies to entry-only system configurations combined with distance-based fare structures, and it
aims to enhance raw AFC system data with the destination of
individual journeys. This paper describes an algorithm developed
to implement the methodology and the results from its application
to bus service data from Porto. The data relate to an AFC system
integrated with an automatic vehicle location system that records
a transaction for each passenger boarding a bus, containing attributes regarding the route, the vehicle, and the travel card used,
along with the time and the location where the journey began.
Some of these are recorded for the purpose of allowing onboard
ticket inspection but additionally enable innovative spatial validation features introduced by the methodology. The results led to the
conclusion that the methodology is effective for estimating journey
destinations at the disaggregate level and identifies false positives
reliably.
Index Terms: Automated fare collection, O-D matrix, public
transport, spatial validation, travel patterns.
I. INTRODUCTION
Manuscript received December 23, 2014; revised May 18, 2015; accepted
July 19, 2015. This work was supported by Fundação para a Ciência e
Tecnologia (FCT) through a doctoral fellowship of the MIT Portugal Program.
The Associate Editor for this paper was S. Sun.
The authors are with the Faculty of Engineering, University of Porto,
4200-465 Porto, Portugal and with Instituto de Engenharia de Sistemas e
Computadores, Tecnologia e Ciência (INESC TEC), 4200 Porto, Portugal
(e-mail: antonio.nunes@fe.up.pt; tgalvao@fe.up.pt; jfcunha@fe.up.pt).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TITS.2015.2464335
1524-9050 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
NUNES et al.: PASSENGER JOURNEY DESTINATION ESTIMATION FROM AFC SYSTEM DATA USING SPATIAL VALIDATION
TABLE I
IRREGULAR TRANSACTION RECORDS AFTER DATA PREPARATION
is the structure of bus routes (67 in total), listing the bus stops
and their respective sequence in each direction of travel. The
third is a matrix with the minimum number of zones traveled
between O-D zone pairs (for a total of 18 zones).
B. Data Preparation
Some authors have previously noted that AFC system data
often contains transaction records with missing or illogical attribute values [3], [6]. Hence, it is necessary to prepare the data
beforehand. The Andante system data set is not an exception.
The proportion of transaction records in the data sample missing at least one of the relevant data attributes was initially 3.2%.
Additionally, 0.8% of transaction records were found to have
illogical values across two attributes. However, it has been possible to identify and, in 81.5% of those cases, correct such errors
through thorough analysis and subsequent pre-processing of the data.
The most frequent cause of missing and illogical attribute
values in the Andante data is the changeover between consecutive vehicle trips. Upon arrival at the terminus stop, the
bus driver needs to signal completion of the trip and afterward
signal the beginning of the next trip if not returning the vehicle
to the depot. That next trip is often a return on the same route,
but in the opposite direction. The data sample shows that passengers sometimes board the bus, initiating their journey, before
the changeover process is completed. Two different types of
error occur in those circumstances. The first is when a passenger
boards the bus before the driver signals the completion of the
previous trip. This creates an illogical value for the bus
stop code attribute of that transaction record, because it does not
make sense that a passenger would board the bus at the terminus
stop of the trip being completed. The second is when a passenger boards the bus after the
driver signals the completion of the previous trip, yet before
the next trip is initiated. This produces missing values for the
direction of travel and vehicle trip start time attributes of that
boarding transaction record. Both types of errors originating
(1)
where,
$s^{a}_{pjk}$
$i$
$R$
$s^{R}_{i}$
$n^{R}$
$A^{R}_{pjk}$
$$s^{a}_{pjk} = \arg\min\, d\left(s^{b}_{p1k},\, s^{a}_{pjk}\right), \quad s^{a}_{pjk} \in A^{R}_{pjk}, \quad j = m_{pk} \tag{4}$$
(5)
where,
$s^{a}_{pjk}$
$s^{b}_{pjk}$
$m_{pk}$
Applying the key assumptions sets out a candidate destination for each boarding transaction record (Fig. 2), unless there
is a single daily journey for a passenger ($m_{pk} = 1$), in which
case the candidate of its last stage is not determined. It was
decided that the destination of single daily journeys would not
be inferred because if a passenger occasionally changes part of
(6)
V. IMPLEMENTATION
Fig. 3 illustrates the algorithm developed in SQL to implement the proposed methodology. It has linear complexity; the
execution time is proportional to the number of transaction
records selected. The algorithm goes through the transaction
record data set sorted first by travel card serial number and
then by the travel card transaction timestamp. The first
decision is to verify whether the record is a duplicate, in which case it
will be used for spatial validation, but its destination will not
be inferred.
The next decision is to check whether the record is
the last or the only stage of a single daily journey on that
travel card serial number. Two aspects are highlighted here.
The first is the day interval definition. In this case, the data set
reveals significant transaction levels around midnight, dropping
steadily to minimums between 3:00 am and 5:00 am. Transactions between midnight and 3:00 am appear to be largely related
to the previous day's passenger journey chains. Coincidentally, the
shift from night time to daytime bus services happens around
5:00 am. Therefore, it was decided that, in terms of daily journey
chains, a day starts at 5:00 am and ends at 4:59 am of the
following morning.
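As a minimal sketch of the day interval definition above, the function below maps a transaction timestamp to the day whose journey chain it belongs to. The paper's algorithm is written in SQL; Python is used here only for illustration, and the names `DAY_START` and `service_day` are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Day boundary taken from the text: a day runs from 5:00 am to 4:59 am.
DAY_START = time(5, 0)


def service_day(ts: datetime) -> date:
    """Return the day whose journey chain a transaction belongs to.

    Transactions before 5:00 am are attributed to the previous calendar day.
    """
    if ts.time() < DAY_START:
        return (ts - timedelta(days=1)).date()
    return ts.date()
```

Under this definition, a boarding at 2:30 am on 15 January is chained with the journeys of 14 January, whereas a boarding at 5:00 am starts the chain of 15 January.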
The second aspect relates to the distinction between journey
stages and complete journeys. Passengers often have to change
between public transport routes to reach their destination and,
in the case of the Andante system, tap their travel card on a
reader every time they board a different vehicle. Each of those
transactions relates to a stage of their complete journey. The
difference matters in cases when there is a single daily journey
for a passenger. If that journey has a single stage, a destination
cannot be inferred because there is not enough information
to determine a candidate destination. But if that journey has
several stages, it is arguably most likely that the last journey
stage was to reach a destination other than the daily origin
(left in Fig. 4), otherwise the passenger would be traveling in
a circle (right in Fig. 4). Both of these scenarios are possible
in theory, but simply assuming that the latter is true carries
great risk of inferring the destination incorrectly. Therefore,
in keeping with the objective of the highest accuracy of estimates, a candidate for that destination is not determined either,
rather than assuming it to be the daily origin as seen in previous literature.
Recall that the stages of a complete journey are defined
by the time-based Andante rules for pay-per-use passengers,
which set out maximum journey durations according to the
number of travel zones.
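The decision described above can be sketched as follows. This is an illustration only, not the authors' SQL implementation: it assumes each record already carries a hypothetical `journey_id` produced by applying the Andante time-based rules (not reproduced here), and it returns the records for which no candidate destination would be determined because they are the last or only stage of a single daily journey.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def records_without_candidate(records):
    """Return indices of records that are the last (or only) stage of a
    single daily journey on a travel card.

    records: list of (card_serial, timestamp, journey_id) tuples, where
    journey_id is an assumed, precomputed grouping of stages into journeys.
    """
    by_card_day = defaultdict(list)
    for idx, (card, ts, journey_id) in enumerate(records):
        # Shifting by five hours implements the 5:00 am day boundary.
        day = (ts - timedelta(hours=5)).date()
        by_card_day[(card, day)].append((ts, journey_id, idx))

    skipped = set()
    for stages in by_card_day.values():
        journeys = {journey_id for _, journey_id, _ in stages}
        if len(journeys) == 1:
            # Only one journey that day: its last stage gets no candidate.
            skipped.add(max(stages)[2])
    return skipped
```

Earlier stages of a multi-stage single journey are not flagged, since the next boarding on the same card still provides a candidate for them.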
Fig. 4. Single daily journeys with multiple stages (three stages example).
TABLE II
RESULTS OF THE METHODOLOGY APPLIED TO ANDANTE AFC SYSTEM DATA
TABLE III
SENSITIVITY ANALYSIS OF THE CUT-OFF EUCLIDEAN DISTANCE PARAMETER
the only or last stage of single daily journeys, or that fail the
first validation rule, as further journeys may exist for those
passengers within the Andante system with other operators.
Additional results were obtained by applying the methodology to available data from a previous month. The objective
was to check for significant variations, yet the results were
very similar. Using data for the whole month of January 2010,
the percentage of boarding transactions that survived validation
was 62.1% of the total (−0.3%), which subdivided into 35.8%
of the total (−0.8%) having the arrival time inferred or bounded, and
26.3% of the total (+0.5%) not having enough information
to estimate an arrival time. The remaining transaction records
from January 2010 totaled 37.9% (+0.3%).
Sensitivity analysis of the cutoff Euclidean distance parameter is shown in Table III. If it were made stricter by reducing
it from 640 m to 400 m as in [2], the percentage of transaction
records failing the second spatial validation rule would increase
to 10.5% and the percentage of inferred destinations would drop
to 60.0%. Conversely, if the parameter were made more liberal
by increasing it to 1000 m as in [10], those percentages would
respectively drop to 5.8% and increase to 64.6%. The variation
is significant; however, the value of 640 m was considered an
adequate trade-off between the risk of rejecting true positives
(rejecting a correct candidate destination), which is greater with a
shorter cutoff distance, and the risk of accepting false positives (accepting
an incorrect candidate destination), which is greater with a longer
cutoff distance.
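A cutoff test of this kind can be sketched as below. The 640 m default comes from the text; everything else is an assumption made for illustration: the function name `within_cutoff` is hypothetical, coordinates are assumed to be projected and expressed in metres, the comparison is assumed to be inclusive, and which two stops the rule actually compares is not restated in this section.

```python
import math

CUTOFF_M = 640.0  # cutoff Euclidean distance used in the case study


def within_cutoff(candidate_xy, reference_xy, cutoff_m=CUTOFF_M):
    """Return True if two points lie within the cutoff Euclidean distance.

    Both arguments are (x, y) pairs in metres (assumed projected coordinates).
    """
    dx = candidate_xy[0] - reference_xy[0]
    dy = candidate_xy[1] - reference_xy[1]
    return math.hypot(dx, dy) <= cutoff_m
```

A pair of stops 500 m apart passes with the 640 m setting but would be rejected at 400 m, which is the kind of shift between rejected and accepted candidates that the sensitivity analysis quantifies.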
Another contribution of this work is the introduction of the
third and fourth spatial validation rules with the main purpose
of testing, at the maximum disaggregation level (the single journey
stage), that the two key assumptions underlying most O-D
estimation methodologies in the literature apply to a particular case study. Each public transport system has specific
usage patterns; therefore the fit of key assumptions may vary
and should be tested for each particular case. Previously
those assumptions had mostly been validated at an aggregate
level using external data from O-D surveys and exit counts.
However, there is a risk that errors tend to
average out when the assumptions are tested at the aggregate level
and there is no bias. For example, if the assumptions were
flawed, an overestimate of journeys from origin A to destination
C may compensate for an underestimate of journeys from origin B
to destination C. Despite this risk, endogenous and exogenous
validation rules should ideally have been combined. This was
not possible for the present case study due to the absence
of a recent O-D survey or Automatic Passenger Count (APC)
devices in STCP buses. The latter had reportedly been trialed,
but the operator found the APC technology unreliable and
decided against its use.
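The averaging-out risk can be illustrated with a small sketch. The journey counts below are invented purely for illustration and are not results from the paper; the helper names are hypothetical. Arrivals at destination C match exactly when aggregated, as an exit count would see them, even though both O-D pairs are estimated incorrectly.

```python
def arrivals_by_destination(od):
    """Sum journeys per destination, as an exit count would observe them."""
    totals = {}
    for (_, dest), n in od.items():
        totals[dest] = totals.get(dest, 0) + n
    return totals


def pair_errors(true_od, est_od):
    """Estimated minus true journeys for every O-D pair."""
    pairs = true_od.keys() | est_od.keys()
    return {p: est_od.get(p, 0) - true_od.get(p, 0) for p in pairs}


# Hypothetical counts: A-C is overestimated and B-C is underestimated.
true_od = {("A", "C"): 100, ("B", "C"): 100}
est_od = {("A", "C"): 120, ("B", "C"): 80}
```

Comparing `arrivals_by_destination(true_od)` with `arrivals_by_destination(est_od)` shows no discrepancy at C, while `pair_errors(true_od, est_od)` exposes the two offsetting errors that only a disaggregate check reveals.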
ACKNOWLEDGMENT
The authors thank STCP for providing the AFC system data
necessary for this work.
REFERENCES
[1] W. Wang, J. P. Attanucci, and N. H. M. Wilson, "Bus passenger origin-destination estimation and related analyses using automated data collection systems," J. Public Transp., vol. 14, no. 4, pp. 131-150, 2011.
[2] J. Zhao, A. Rahbee, and N. H. M. Wilson, "Estimating a rail passenger
trip origin-destination matrix using automatic data collection systems,"
Comput. Civ. Infrastruct. Eng., vol. 22, no. 5, pp. 376-387, Jul. 2007.
[3] J. J. Barry, R. Freimer, and H. Slavin, "Use of entry-only automatic fare
collection data to estimate linked transit trips in New York City," Transp.
Res. Rec. J. Transp. Res. Board, vol. 2112, pp. 53-61, Dec. 2009.
[4] J. J. Barry, R. Newhouser, A. Rahbee, and S. Sayeda, "Origin and destination estimation in New York City with automated fare system data,"
Transp. Res. Rec. J. Transp. Res. Board, vol. 1817, pp. 183-187, 2002.
[5] M. Munizaga, F. Devillaine, C. Navarrete, and D. Silva, "Validating travel
behavior estimated from smartcard data," Transp. Res. C, Emerg. Technol.,
vol. 44, pp. 70-79, Jul. 2014.
[6] M. Trépanier, N. Tranchant, and R. Chapleau, "Individual trip destination estimation in a transit smart card automated fare collection system,"
J. Intell. Transp. Syst. Technol. Plann., Oper., vol. 11, no. 1, pp. 1-14,
2007.
[7] J. M. Farzin, "Constructing an automated bus origin-destination matrix using farecard and global positioning system data in São Paulo,
Brazil," Transp. Res. Rec. J. Transp. Res. Board, vol. 2072, pp. 30-37,
Dec. 2008.
[8] D. Li, Y. Lin, X. Zhao, H. Song, and N. Zou, "Estimating a transit
passenger trip origin-destination matrix using automatic fare collection
system," in Database Systems for Advanced Applications, J. Xu, G. Yu,
S. Zhou, and R. Unland, Eds. Berlin, Germany: Springer-Verlag, 2011,
pp. 502-513.
[9] J. B. Gordon, "Intermodal passenger flows on London's public transport
network: Automated inference of full passenger journeys using fare-transaction and vehicle-location data," M.S. thesis, Dept. Urban Stud.
Plann., Mass. Inst. Technol., Cambridge, MA, USA, 2012.
[10] M. A. Munizaga and C. Palma, "Estimation of a disaggregate multimodal
public transport Origin-Destination matrix from passive smartcard data
from Santiago, Chile," Transp. Res. C, Emerg. Technol., vol. 24, pp. 9-18,
Oct. 2012.
[11] F. Devillaine, M. Munizaga, and M. Trépanier, "Detection of activities of
public transport users by analyzing smart card data," Transp. Res. Rec. J.
Transp. Res. Board, vol. 2276, pp. 48-55, Dec. 2012.
[12] E. van der Hurk, L. Kroon, G. Maroti, and P. Vervest, "Deduction of passengers' route choices from smart card data," IEEE Trans. Intell. Transp.
Syst., vol. 16, no. 1, pp. 430-440, Feb. 2015.
[13] C. Oberli, M. Torres-Torriti, and D. Landau, "Performance evaluation of
UHF RFID technologies for real-time passenger recognition in intelligent
public transportation systems," IEEE Trans. Intell. Transp. Syst., vol. 11,
no. 3, pp. 748-753, Sep. 2010.
[14] X. Zhou and H. S. Mahmassani, "Dynamic origin-destination demand estimation using automatic vehicle identification data," IEEE Trans. Intell.
Transp. Syst., vol. 7, no. 1, pp. 105-114, Mar. 2006.
João Falcão e Cunha received the M.Eng. in electrical engineering from the University of Porto, Porto,
Portugal, in 1983; the M.Sc. in operations research
from Cranfield University, Bedford, U.K., in 1984;
and the Ph.D. in computing science from Imperial
College London, London, U.K., in 1989. He is the
Dean of the Faculty of Engineering, University of
Porto, and a researcher at Instituto de Engenharia
de Sistemas e Computadores, Tecnologia e Ciência
(INESC TEC). He has been involved with theoretical
and experimental work in software engineering and
information systems applied to transportation systems for the past 25 years. His
research interests include decision support systems, graphical user interfaces,
object-oriented modeling, and service engineering and management. Dr. Falcão
e Cunha joined the IEEE in 1982 and is now a Senior Member.