
Neurocomputing 169 (2015) 158–168


Formalizing computational intensity of big traffic data understanding and analysis for parallel computing
Yingjie Xia a,b,c,*, Jinlong Chen b, Chunhui Wang b

a College of Computer Science, Zhejiang University, 310012 Hangzhou, China
b Hangzhou Institute of Service Engineering, Hangzhou Normal University, 310012 Hangzhou, China
c Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, 215006 Soochow, China

Article info

Article history:
Received 21 March 2014
Received in revised form 5 October 2014
Accepted 5 October 2014
Available online 16 May 2015

Abstract

Nowadays, traffic data can be collected from different types of sensors widely deployed in urban districts. Big traffic data understanding and analysis in intelligent transportation systems (ITS) turns out to be an urgent requirement. This requirement leads to computation-intensive and data-intensive problems in ITS, which can be innovatively resolved by using Cyber-Infrastructure (CI). A generic process for the solution contains four steps: (1) formalized data understanding and representation, (2) computational intensity transformation, (3) computing task creation, and (4) CI resource allocation. In this paper, we firstly propose a computational domain theory to formally represent heterogeneous big traffic data based on the data understanding, and then use data-centric and operation-centric transformation functions to evaluate the computational intensity of traffic data analysis in different aspects. Afterwards, the computational intensity is leveraged to decompose the domain into sub-domains by an octree structure. All the sub-domains create computing tasks which are scheduled to CI resources for parallel computing. Based on the evaluation of overall computational intensity, an example of fusing Sydney Coordinated Adaptive Traffic System (SCATS) data and Global Positioning System (GPS) data for traffic state estimation is parallelized and executed on CI resources to test the accuracy of domain decomposition and the efficiency of the parallelized implementation. The experimental results show that the ITS computational domain is decomposed into load-balanced sub-domains, therefore facilitating significant acceleration for parallelized big traffic data fusion.
© 2015 Elsevier B.V. All rights reserved.

Keywords:
Intelligent transportation systems
Computational intensity
Computational domain
Formalization
Parallel computing

1. Introduction
The quantity of traffic data collected from different types of sensors widely deployed in urban districts has increased dramatically in recent decades, while the data quality has improved significantly. This trend will continue in the foreseeable future, and it leads to the urgent requirement of big traffic data understanding and analysis in intelligent transportation systems (ITS) [1]. A solution involves the use of Cyber-Infrastructure (CI), which can support the analysis of computation-intensive and data-intensive problems with high performance [2]. This solution generically consists of four steps: (1) formalized understanding and representation of heterogeneous big traffic data; (2) computational intensity evaluation of ITS applications; (3) algorithmic parallelization to create computing tasks; and (4) CI scheduling and allocation for parallel computing. For each step, the output of its previous step is used as its input.

* Corresponding author at: College of Computer Science, Zhejiang University, 310012 Hangzhou, China. Tel.: +86 571 28866898; fax: +86 571 28866717.
E-mail address: xiayingjie@zju.edu.cn (Y. Xia).

http://dx.doi.org/10.1016/j.neucom.2014.10.104
0925-2312/© 2015 Elsevier B.V. All rights reserved.

Therefore, in order to efficiently parallelize the data-driven ITS applications in step (3), we need to formalize the computational intensity of big traffic data analysis based on the formalized data understanding and representation in steps (1) and (2).

Traditional computational complexity theory assesses the computational intensity of traffic data analysis through algorithmic complexity notation [3]. However, this notation merely focuses on the evaluation of algorithmic structure, and does not adequately capture the spatio-temporal characteristics of the data and operations involved in the analysis. These characteristics always depend on the spatio-temporal clustering, neighborhood, autocorrelation, and interaction dynamics of big traffic data, and their transformed computational intensity can be measured in different aspects.

Following the aforementioned solution steps and characteristics of computational intensity, in this paper we propose the architecture of our work as the flowchart in Fig. 1. We firstly define an ITS computational domain to represent multi-sensor heterogeneous data in an accommodating structure. The computational domain is defined as a high-dimensional data space consisting of a large number of cell tuples, whose attributes include spatio-temporal information and traffic features.


Fig. 1. Architecture of our work.

Based on the domain, two types of computational intensity transformation functions, the data-centric function and the operation-centric function, are adopted to elucidate the computational intensity of a particular big traffic data analysis in three principal aspects: memory, I/O, and computing time. The different aspects of computational intensity are synthesized to decompose the computational domain into load-balanced sub-domains, which create computing tasks to be executed in parallel. CI resources are allocated to all tasks according to the matching of computing capability and computational intensity, to improve the efficiency of the parallel computing. The formalization of computational intensity provides a comprehensive evaluation of algorithmic complexity and of the correlation between neighboring data cells, and therefore helps accelerate big traffic data analysis through accurate domain decomposition by overall computational intensity.
The rest of the paper is organized as follows. Related work on data-driven ITS and computational complexity analysis is reviewed in Section 2. Section 3 introduces the ITS computational domain theory to formally represent multi-sensor heterogeneous traffic data. Section 4 specifies the data-centric and operation-centric transformation functions that implement the computational intensity transformation into different aspects. We also define the overall computational intensity in Section 4. An example application of data-driven ITS, multi-sensor data fusion for traffic state estimation, is illustrated in Section 5. Section 6 analyzes the utilization of computational intensity for parallel computing by using real traffic data, and conducts load-balance, accuracy, and efficiency tests on the data. Finally, the conclusion with remarks on future work is drawn in Section 7.

2. Related work
Recently, conventional ITS has been evolving into data-driven ITS, where data collected from multiple traffic sensors play an essential role. Based on the types of traffic sensors used, the way the data are processed, and the specific applications, a fully data-driven ITS can be classified into several categories. The main categories include vision-driven ITS, multisource-driven ITS, learning-driven ITS, and visualization-driven ITS [1].
Vision-driven ITS takes the traffic data collected from video sensors as input, and uses the processing output for ITS-related applications, such as (1) traffic object detection [4], monitoring [5], and recognition [6], (2) traffic behavior analysis [7], (3) traffic data statistical analysis [4], and (4) vehicle trajectory construction [8]. However, vision-driven ITS suffers from environmental constraints, e.g., snow, static or dynamic shadows, and rain [9].
Therefore, multisource-driven ITS uses multiple types of sensors, such as the Sydney Coordinated Adaptive Traffic System (SCATS) loop detector, Global Positioning System (GPS), and Remote Traffic Microwave Sensor (RTMS), which play complementary roles for each other. The collected heterogeneous traffic data can be fused for traffic state estimation [10], clustered for urban advertising value evaluation [11], and combined to improve the reliability of vehicle classification [12]. Although video devices and multiple sensors can generate traffic data for different applications in ITS, they still require learning tools, including online learning [13], rough set theory [14], adaptive dynamic programming (ADP) [15], and fuzzy logic [16], to extract intrinsic mechanisms from historical and real-time data in specific applications. This is named learning-driven ITS. The output of data processing by learning tools can be visualized to help people understand and analyze traffic data intuitively in visualization-driven ITS [17]. Some visualization packages, such as CubeView, have been developed to identify abnormal traffic patterns and accordingly bring the system back to the normal track [18].
Although all the aforementioned work is related to data-driven ITS, to the best of our knowledge, little research has been carried out on how to deal with data growing at a large scale. For example, Google cooperates with INRIX to use the GPS data collected from more than 30,000,000 taxis, transit vehicles, and trucks to estimate the traffic states plotted on Google Maps [19]. These massive GPS data are processed by high performance computers (HPC), which divide the data by cities and create the corresponding multiple computing tasks. The performance of this implementation is limited by the unbalanced computing loads of different cities, and can be improved by achieving load-balance. As the solution, the goal of this paper is to create load-balanced computing tasks based on the evaluation of computational complexity. Computational complexity theory dates from the 1960s, and was first used to evaluate polynomial time on Turing machines [20]. The topic came into the picture owing to the discovery of NP-complete problems in the 1970s. NP-completeness can indicate the computational complexity of problems by using enumerative optimization methods and approximation algorithms [21]. Loosely speaking, an algorithm can be described as a finite sequence of instructions, and its computability can be quantified as the computational complexity measured by how fast a computer works out all instructions [22]. The analysis of computational complexity leads to various post-processing models, such as parallel computing, probability calculation, and decision trees. For example, the evaluated complexity of rule extraction from massive traffic data is used to parallelize the attribute significance calculation in a bootstrapping rough set algorithm, to estimate the traffic state more accurately and efficiently [23].
The work in [23] focuses on the evaluation of algorithmic complexity for the algorithmic parallelization of computation-intensive problems. However, massive traffic data analysis urgently requires the evaluation of data-centric computational complexity to parallelize data-intensive problems. It should also be noted that computational complexity is different from computational intensity. Computational complexity theory addresses how much intrinsic complexity a computing task has, while computational intensity requires extra consideration of the spatio-temporally correlated characteristics of massive traffic data and algorithms. In geographical information systems (GIS), the computational intensity of spatial analysis has been proposed to uncover the difficult nature of spatial domain decomposition. This work is fundamental for analyzing computation-intensive spatial problems which focus on geographic data in two-dimensional space [24]. However, in current ITS research, few efforts have been made to formalize the computational intensity with respect to traffic data characteristics, quantity, and spatio-temporal distribution. Therefore, this is what we aim to elucidate in the following parts.


Fig. 2. Location indices definition of road segments.

Fig. 3. ITS computational domain in three-dimensional Euclidean space.

3. ITS computational domain theory


As an essential preparation for formalizing the computational intensity of big traffic data analysis, the ITS computational domain theory is proposed to understand and represent the multi-sensor heterogeneous traffic data set. This theory is inspired by the spatial computational domain, which is designed to represent a two-dimensional computational surface in GIScience [25]. Comparatively, the ITS computational domain is defined as a high-dimensional data space consisting of a large number of cell tuples projected to a three-dimensional spatio-temporal domain [26].
Taking the SCATS loop detector and GPS as example traffic sensors, the cell tuple can be defined as (t, l_x, l_y, s, v, f), where t and (l_x, l_y) denote the discretized time and the location indices of a road segment, s denotes the traffic sensor type, and v and f are respectively the estimated speed and flux of that road segment at that time. The location indices are defined in Fig. 2. We use a location matrix M whose elements represent the existence of a road segment from one intersection to another, e.g., M(a, b) = 1 indicates the existence of a road segment from intersection a to b, while M(a, b) = 0 indicates the absence of that road segment. Both speed and flux can be estimated from SCATS data and GPS data by the corresponding algorithms [23]. The tuple format is flexible and can be adjusted by adding or removing attributes. In the ITS computational domain, the cell reorganizes the attributes of the original cell tuples, and can be formally represented as D = {c_{i,j,k}} in the three-dimensional Euclidean space R^N shown in Fig. 3, where N = x_t × y_{l_x} × z_{l_y}. The cell c_{i,j,k} denotes the vector (s, v, f) at position (i, j, k); x_t, y_{l_x}, and z_{l_y} are the discretization numbers of cells along the respective t, l_x, and l_y axes.
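To make this representation concrete, the following sketch shows one possible encoding of cell tuples and the location matrix M in Python; the field names, the toy road network, and the numeric values are illustrative assumptions rather than structures defined in the paper.

```python
from collections import namedtuple
import numpy as np

# Cell tuple (t, lx, ly, s, v, f): discretized time, location indices of a road
# segment, sensor type, estimated speed, and estimated flux (illustrative encoding).
CellTuple = namedtuple("CellTuple", ["t", "lx", "ly", "s", "v", "f"])

# Location matrix M: M[a, b] = 1 if there is a road segment from intersection a to b.
num_intersections = 4
M = np.zeros((num_intersections, num_intersections), dtype=int)
M[0, 1] = M[1, 2] = M[2, 3] = 1          # a toy network with three segments

# Two example tuples for the same segment (0 -> 1) observed by SCATS and GPS.
cells = [
    CellTuple(t=0, lx=0, ly=1, s="SCATS", v=42.0, f=380.0),
    CellTuple(t=0, lx=0, ly=1, s="GPS",   v=45.5, f=None),   # GPS gives no flux here
]

# Only tuples on existing segments (M[lx, ly] == 1) enter the computational domain.
domain_cells = [c for c in cells if M[c.lx, c.ly] == 1]
print(len(domain_cells))   # -> 2
```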
The ITS computational domain in three-dimensional Euclidean space can be divided and normalized into multiple unit cubes, which provides a unified data input for different ITS applications. The unit cube is defined according to the time slot which is used to create a data unit for processing. Once one data unit has been processed, the next data unit is input into the ITS applications. For example, if the time slot is set as 15 min and the t of a cell is set as 1 min, then the data unit consists of 15 min/1 min = 15 layers of cells along the x_t axis. These 15 layers of cells are processed entirely in the ITS applications. In the rest of the paper, we use the computational domain with one data unit as the example for the computational intensity transformation and parallel processing.
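As a small illustration of the data-unit construction described above, the sketch below slices a domain array into 15-layer units along the time axis; the array shape and the attribute layout are arbitrary assumptions made for the example.

```python
import numpy as np

# Domain for one hour with 1-min cells: xt = 60 time layers, a 20 x 20 location grid,
# and 3 attributes (sensor type id, speed, flux) per cell (shape chosen arbitrarily).
domain = np.zeros((60, 20, 20, 3))

time_slot_min, cell_step_min = 15, 1
layers_per_unit = time_slot_min // cell_step_min      # 15 layers per data unit

# Split the domain along the xt axis into consecutive data units for processing.
data_units = [domain[i:i + layers_per_unit]
              for i in range(0, domain.shape[0], layers_per_unit)]
print(len(data_units), data_units[0].shape)           # -> 4 (15, 20, 20, 3)
```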
Based on the specification above, the normalized computational domain is defined as (1), where a cell at a position (i/x_t, j/y_{l_x}, k/z_{l_y}) ∈ I^3, i = 1, …, x_t; j = 1, …, y_{l_x}; k = 1, …, z_{l_y}, denotes a three-dimensional vector as (2):

f: I^3 = [0, 1] × [0, 1] × [0, 1] → R^3   (1)

f(i/x_t, j/y_{l_x}, k/z_{l_y}) = (s, v, f)   (2)

Based on the formalized representation of the data, the computational intensity of data-related ITS applications will be transformed in data-centric and operation-centric ways.

4. Computational intensity transformation


4.1. Computational intensity aspects
For a particular data-driven ITS application, the data in the computational domain can be used to analyze the computational burden in the following three aspects:

(1) Memory: the memory needed to perform the computation for cells in the computational domain.
(2) I/O: the amount of data input/output for cell interactions in the computational domain.
(3) Computing time: the time needed to complete the computation for cells in the computational domain.

The formalized representation of heterogeneous traffic data facilitates formalizing the evaluation of computational intensity. These three aspects are derived from the computational transformation, which involves algorithmic and data analysis on ITS applications. The aspects, which have been adopted to measure the consumption of major computer resources, can also be extended with additional ones required by specific applications, such as the number of threads in multithreaded applications.


4.2. Transformation functions


The computational transformation is performed to characterize the three aspects of computational intensity for a particular data-driven ITS application, based on the characteristics of its data as well as its specific analysis algorithms. According to the objects operated on by the transformation, two types of functions, the data-centric function and the operation-centric function, are identified as follows:

(1) TR_d denotes the data-centric transformation function, which transforms the characteristics of traffic data, in the context of their corresponding operations, into the memory and I/O aspects. Based on the ITS computational domain, the function used to evaluate the memory consumption of operations on a cell (i, j, k) is defined as

TR_{d-m}(i, j, k) = memoryUnit × [attr_number(c_{i,j,k}) + attr_number(c_{i,j,k_neighbors})]   (3)

where memoryUnit is a constant used to convert the number of attributes calculated from the cell and its neighbors in operations into units of memory capacity. The neighbors in the definition represent the cells adjacent to the cell (i, j, k) in applications. The neighboring cells (m, n, r) of the cell (i, j, k) can be formally represented as

{(m, n, r) : D(p, q) < d ∧ (i ≠ m ∨ j ≠ n ∨ k ≠ r)}, where p = (i, j, k) and q = (m, n, r)   (4)

where the definition of the distance function D depends on the specific application and indicates an existing interaction relationship.

Similarly, the amount of data input and output for a cell is determined by the number of its neighboring cells which interact with that cell in applications. Therefore, the transformation function to count the I/O for a cell (i, j, k) is defined as

TR_{d-I/O}(i, j, k) = Input_Count(c_{i,j,k}, c_{i,j,k_neighbors}) + Output_Count(c_{i,j,k_neighbors})   (5)

The input count of a cell is evaluated as the number of writes to the cell by its neighboring cells, and the output count of a cell is evaluated as the number of reads of the cell by its neighboring cells. The I/O amount of the cell sums the input count and the output count.

(2) TR_o denotes the operation-centric transformation function, which transforms the characteristics of operations related to traffic data into the computing time aspect. The computing time is mostly related to the operation algorithm, which is directly dependent upon the data characteristics. It can be evaluated by Eq. (6), where timeUnit is a constant used to convert the number of operations on the cell, O(c_{i,j,k}), and of its interactions with neighboring cells, O(c_{i,j,k}, c_{i,j,k_neighbors}), into a computing time unit:

TR_{o-t}(i, j, k) = timeUnit × O(c_{i,j,k}, c_{i,j,k_neighbors})   (6)

The overall transformation function TR for an application can be defined as a composite of TR_d and TR_o as Eq. (7), where ⊕ denotes function addition [27]. Since the outputs of TR_d and TR_o are in different computational intensity aspects, the function addition of TR obeys the commutative law as Eq. (8):

TR = TR_d ⊕ TR_o   (7)

TR = TR_d ⊕ TR_o = TR_o ⊕ TR_d   (8)

TR_d and TR_o can each be composed of multiple transformation sub-functions as in (9) and (10), where ∘ represents function multiplication [27]. All these sub-functions output the same aspect, and their order should be consistent with the specific input and output sequences between neighboring sub-functions; thus the function multiplication does not obey the commutative law:

TR_d = TR_{d1} ∘ TR_{d2} ∘ ⋯ ∘ TR_{dn}   (9)

TR_o = TR_{o1} ∘ TR_{o2} ∘ ⋯ ∘ TR_{on}   (10)

The definitions of the aforementioned functions are based on the transformation of traffic data and application characteristics, which can be acquired from the analysis on the computational domain.

4.3. Overall computational intensity

A metric is required to evaluate the overall computational intensity (OCI), which combines the three computational intensity aspects: memory, I/O, and computing time. OCI is calculated depending on the performance of the computing resources of computers, such as CPU and memory. The OCI calculation for one non-null cell is designed as

OCI = w_1 × TR_{d-m}/CI_{m-unit} + w_2 × TR_{d-I/O}/CI_{I/O-unit} + w_3 × TR_{o-t}/CI_{t-unit}   (11)

where w_1, w_2, and w_3 are the weights for computational intensity in memory, I/O, and computing time, and CI_{m-unit}, CI_{I/O-unit}, and CI_{t-unit} are the units used to normalize the respective outputs of the data-centric and operation-centric transformation functions. The weights and the units depend on the real hardware conditions, and they can be evaluated from the respective baseline conditions and experience. In Section 6, we use a real example to evaluate the OCI.

5. An illustrative example

There are four combinations of these two types of computational transformation functions: data-centric and operation-centric, non-data-centric and operation-centric, data-centric and non-operation-centric, and non-data-centric and non-operation-centric. A particular ITS data application must be placed into one of these combinations to evaluate its computational intensity. Specifically, in this part an example of fusing a large amount of SCATS and GPS data for traffic state estimation is provided to demonstrate how the computational transformation functions are used to derive computational intensity from the formalized data representation. Since the example is a data-intensive and computation-intensive program, it requires both data-centric and operation-centric functions to transform into memory consumption, I/O count, and CPU computing time.

5.1. Traffic data fusion

As a cutting-edge research topic in ITS, traffic state estimation by collecting, processing, and transmitting traffic detector data aims to provide travelers with real-time and accurate traffic states for automatic control and guidance. Traffic state estimation is a typical data-intensive ITS application, which meets two grand challenges: how to combine heterogeneous data collected from multiple types of traffic sensors, and how to efficiently process a large amount of data from numerous traffic sensors. In order to implement real-time and accurate traffic state estimation for a large-scale urban road network, the parallelized fusion of traffic data from heterogeneous multi-sensors is implemented and executed on CI resources [28].


The complete fusion process consists of four steps: sensor data input, traffic feature conversion, D-S evidential fusion, and traffic state output. Differently from the work in [28], in this paper we define the computation load by the features of the traffic data, not by the number of road segments. Therefore, we can formalize the computational domain and the computational intensity to implement efficient parallel computing. Based on the formalized representation of SCATS and GPS data in the computational domain, traffic data fusion only needs the last two parts, evidential fusion and output, to be implemented and parallelized for traffic state estimation. Therefore, we mainly evaluate the computational intensity of the fusion algorithm by the data-centric and operation-centric transformation functions, and provide a corresponding suggestion for parallelizing the fusion.

Fig. 4. D-S evidential fusion of multi-sensor heterogeneous traffic data.

5.2. Fusion algorithm

The formalized traffic features are fused by D-S evidence theory [29] as shown in Fig. 4. The algorithm can overcome conflicts between evidences and mutually compensate for the deficiencies of the data sources. Specifically, the evidential fusion part fuses the same traffic feature in the computational domains of SCATS and GPS data on one road, and provides an overall traffic state evaluation for that road.

The D-S evidential fusion on the discretized speed of each cell in the computational domain is given by

m(s_t) = (m_1(s_{1,t}) ⊕ m_2(s_{2,t}) ⊕ ⋯ ⊕ m_X(s_{X,t})) = [Σ_{∩_{i=1}^{X} s_{i,t} = s_t} Π_{i=1}^{X} m_i(s_{i,t})] / [Σ_{∩_{i=1}^{X} s_{i,t} ≠ ∅} Π_{i=1}^{X} m_i(s_{i,t})]   (12)

m_i(s_{i,t}) = n_{i,t}(s, d) / n_{i,t}(d)   (13)

where m(s_t) denotes the basic probability assignment (BPA) of a discretized speed as the fusion result at time t, and m_i(s_{i,t}), i = 1, 2, …, X, represents the BPA of this speed for a cell in the computational domain of the i-th sensor at time t. The m_i(s_{i,t}) of one cell can be estimated as the proportion obtained by dividing the number n_{i,t}(s, d) of its neighboring cells within a distance threshold d that have speed s by the number n_{i,t}(d) of all neighboring cells within d that indicate existing road segments. The speed discretization is implemented by manual segmentation according to the data distribution, i.e., a three-level segmentation as smaller than (1/3)V_max, between (1/3)V_max and (2/3)V_max, and greater than (2/3)V_max [30]. For each speed level, the fusion algorithm has to be run once to evaluate the traffic state and its corresponding probability. The final traffic state can be determined following a certain decision rule, such as maximum belief or maximum plausibility, which are typical measurements in D-S evidence theory. Since the decision part is trivial in computational intensity evaluation, we focus on the analysis of the data-centric and operation-centric computational transformation for the fusion part.
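A compact sketch of the per-cell fusion in Eqs. (12) and (13) is given below, assuming the BPAs of each sensor over the discretized speed levels are already available as lists; the variable names and the simplification that speed levels are the only focal elements reflect our reading of the equations, not code from the paper.

```python
def estimate_bpa(neighbor_speed_levels, level, d_count):
    """Eq. (13): m_i(s) = n_{i,t}(s, d) / n_{i,t}(d).

    neighbor_speed_levels: discretized speed levels of the neighboring cells
    (within distance d) in one sensor's domain; d_count: number of those cells."""
    return neighbor_speed_levels.count(level) / d_count if d_count else 0.0

def ds_fuse(bpa_per_sensor):
    """Eq. (12): combine the BPAs of X sensors over the speed levels.

    bpa_per_sensor: list of X lists, one BPA value per speed level and sensor.
    Returns the fused BPA per speed level; since speed levels are treated as the
    only focal elements, the denominator is the total non-conflicting mass."""
    levels = range(len(bpa_per_sensor[0]))
    joint = []
    for k in levels:
        prod = 1.0
        for sensor in bpa_per_sensor:
            prod *= sensor[k]          # product of m_i(s_{i,t}) with all s_{i,t} = s_t
        joint.append(prod)
    total = sum(joint)                  # mass of all non-empty intersections
    return [p / total if total else 0.0 for p in joint]

# BPAs of the SCATS domain from 10 neighboring cells at speed levels 0..2 (Eq. (13)).
scats_neighbors = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
scats = [estimate_bpa(scats_neighbors, lvl, len(scats_neighbors)) for lvl in range(3)]
gps = [0.5, 0.4, 0.1]                   # assumed BPAs from the GPS domain
print(ds_fuse([scats, gps]))            # the level with the largest fused BPA wins
```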

5.3. Computational transformation for the fusion algorithm

For a cell (i, j, k) whose non-null values represent an existing road segment, the computational intensity of D-S evidential fusion in memory and I/O is transformed by the data-centric functions DSTR_{d-m} and DSTR_{d-I/O}:

DSTR_{d-m}(i, j, k) = memoryUnit × 2 × BPA_number   (14)

where the number 2 indicates that there are two types of traffic sensors, SCATS and GPS, and the value of BPA_number is determined by the number of speed segmentation levels. For a cell set cs with n cells, such as the data of one road segment (i, j) within a time slice, the memory consumption can be directly calculated by

DSTR_{d-m}(cs) = n × DSTR_{d-m}(i, j, k) = n × memoryUnit × 2 × BPA_number   (15)

Similarly, the I/O count can be evaluated as

DSTR_{d-I/O}(i, j, k) = N_{i,j,k_neighbors} + BPA_number × (2 × BPA_number^2 + 2 × BPA_number^2 + 1) = N_{i,j,k_neighbors} + 4 × BPA_number^3 + BPA_number   (16)

where N_{i,j,k_neighbors} indicates the number of reads (output) of the neighboring cells of (i, j, k) for calculating the BPA values. According to Eq. (12), the first BPA_number indicates the number of times the fusion algorithm is called over all the speed levels, the first 2 × BPA_number^2 indicates the read count of a pair of SCATS and GPS data cells in the numerator Σ_{∩_{i=1}^{X} s_{i,t} = s_t} Π_{i=1}^{X} m_i(s_{i,t}) of the D-S evidential fusion algorithm, the second 2 × BPA_number^2 indicates the read count in the denominator Σ_{∩_{i=1}^{X} s_{i,t} ≠ ∅} Π_{i=1}^{X} m_i(s_{i,t}), and the number 1 denotes the write (input) count of the fusion result.

The computing time aspect of traffic data fusion can be evaluated by the following operation-centric transformation function:

DSTR_{o-t}(i, j, k) = timeUnit × [BPA_number × N_{i,j,k_neighbors} + BPA_number × (2 × BPA_number^2 + 2 × BPA_number^2)] = timeUnit × (BPA_number × N_{i,j,k_neighbors} + 4 × BPA_number^3)   (17)

where BPA_number × N_{i,j,k_neighbors} is the number of CPU cycles needed to calculate the BPA values for all speed levels, and, also according to Eq. (12), the second BPA_number and the two 2 × BPA_number^2 terms have the same meanings as in Eq. (16) with respect to the count of CPU cycles.
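The three transformation functions of Eqs. (14), (16), and (17) reduce to simple closed forms in BPA_number and the neighbor count. The small helper below evaluates them for one cell; it is our own transcription, with memoryUnit and timeUnit left symbolic as 1 and illustrative input values.

```python
def dstr_memory(bpa_number, memory_unit=1.0):
    # Eq. (14): two sensor types (SCATS, GPS), one BPA per speed level each.
    return memory_unit * 2 * bpa_number

def dstr_io(n_neighbors, bpa_number):
    # Eq. (16): neighbor reads for the BPAs plus, per speed level, the reads in the
    # numerator and denominator of Eq. (12) and one write of the fusion result.
    return n_neighbors + bpa_number * (2 * bpa_number**2 + 2 * bpa_number**2 + 1)

def dstr_time(n_neighbors, bpa_number, time_unit=1.0):
    # Eq. (17): cycles for the BPA calculation plus cycles for the fusion itself.
    return time_unit * (bpa_number * n_neighbors + 4 * bpa_number**3)

# Three speed levels and, say, 20 neighboring cells (illustrative values).
print(dstr_memory(3), dstr_io(20, 3), dstr_time(20, 3))   # -> 6 131 168
```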

6. Utilizing computational intensity for parallel computing


The transformed computational intensity is utilized to decompose the ITS computational domain into sub-domains under the load-balancing strategy, which inherently enables an efficient parallelization of big traffic data processing. The implementation consists of computational intensity evaluation, computational domain decomposition, and computing resources allocation.
6.1. Evaluation of OCI
The sub-domains evaluate their OCIs to determine a threshold that ends the decomposing recursion. According to the OCI definition, from evaluation experience we can set the unit values as CI_{m-unit} = memoryUnit, CI_{I/O-unit} = 2, and CI_{t-unit} = 2 × timeUnit. We can also set baseline conditions for the different hardware devices of computing nodes, such as 1 GB for memory, 667 MHz for memory I/O speed, and 2 GHz for CPU frequency.


Therefore, the weights are calculated according to the real hardware conditions as

w_1 = Memory / 1 GB   (18)

w_2 = I/O_Speed / 667 MHz   (19)

w_3 = CPU / 2 GHz   (20)

The weights can also be determined manually; e.g., if the evaluation of computational intensity focuses only on memory consumption, we set w_1 = 1, w_2 = 0, and w_3 = 0. Taking the multi-sensor traffic data fusion in Section 5 as an experimental case, the hardware conditions of our computing node are 4 GB memory, 667 MHz I/O speed, and a 2.5 GHz Xeon CPU. The OCI can be calculated as
OCI = (4 GB/1 GB) × (memoryUnit × 2 × BPA_number)/memoryUnit + (667 MHz/667 MHz) × (N_{i,j,k_neighbors} + 4 × BPA_number^3 + BPA_number)/2 + (2.5 GHz/2 GHz) × timeUnit × (BPA_number × N_{i,j,k_neighbors} + 4 × BPA_number^3)/(2 × timeUnit)
= 8 × BPA_number + 0.5 × N_{i,j,k_neighbors} + 2 × BPA_number^3 + 0.5 × BPA_number + 0.625 × BPA_number × N_{i,j,k_neighbors} + 2.5 × BPA_number^3
= 8.5 × BPA_number + 4.5 × BPA_number^3 + (0.625 × BPA_number + 0.5) × N_{i,j,k_neighbors}   (21)

Supposing that we adopt a three-level speed segmentation strategy, i.e. BPA_number = 3, the OCI of one cell can be calculated as

OCI = 147 + 2.375 × N_{i,j,k_neighbors}   (22)

Fig. 5. Octree-based ITS computational domain decomposition under a threshold of OCI.

Fig. 6. The corresponding octree structure of the domain decomposition in Fig. 5.

Therefore, the cells aggregate their OCI values, which will be used to decompose the ITS computational domain into load-balanced sub-domains.
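Plugging the hardware-dependent weights and units of Section 6.1 into Eq. (11) gives the per-cell OCI of Eq. (21). The sketch below reproduces that closed form and checks the value 147 + 2.375 N for BPA_number = 3; the DSTR expressions repeat Eqs. (14), (16), and (17) and are our own transcription.

```python
def oci_per_cell(n_neighbors, bpa_number=3,
                 memory_gb=4.0, io_speed_mhz=667.0, cpu_ghz=2.5):
    """Eq. (21): OCI of one non-null cell for the D-S fusion example."""
    w1, w2, w3 = memory_gb / 1.0, io_speed_mhz / 667.0, cpu_ghz / 2.0   # Eqs. (18)-(20)
    tr_m = 2 * bpa_number                                               # Eq. (14) / memoryUnit
    tr_io = n_neighbors + 4 * bpa_number**3 + bpa_number                # Eq. (16)
    tr_t = bpa_number * n_neighbors + 4 * bpa_number**3                 # Eq. (17) / timeUnit
    # CI units: CI_m-unit = memoryUnit, CI_I/O-unit = 2, CI_t-unit = 2 * timeUnit.
    return w1 * tr_m + w2 * tr_io / 2.0 + w3 * tr_t / 2.0

# With a three-level speed segmentation this reduces to 147 + 2.375 * N (Eq. (22)).
for n in (0, 8, 20):
    assert abs(oci_per_cell(n) - (147 + 2.375 * n)) < 1e-9
print(oci_per_cell(20))   # -> 194.5
```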
6.2. Octree-based computational domain decomposition
The octree structure is a representation of the regular partition of three-dimensional space into eight octants by recursive decomposition, until a tradeoff between the number of sub-domains and their computational intensity is achieved [31]. Several kinds of octrees have been defined, differing in the rules that govern data decomposition, the data type indexed, and other details [32]. A basic octree in three-dimensional space is an 8-way branching tree, wherein at each level a cubic domain is decomposed into eight equal-size cubes. By traversing all the leaf nodes of the octree, the generated sub-domains are represented by a specific data structure which is linked and stored in a singly directional list [33]. Big traffic data in the computational domain can be decomposed into data pieces in sub-domains by the octree structure.
In the fusion example, we take OCI as the metric, and calculate its threshold for domain decomposition by dividing the OCI value of the whole computational domain by a specified number of computing nodes. By representing the SCATS and GPS data collected from the Shanghai downtown area as a computational domain, a threshold can be calculated by

OCI_threshold = OCI_computational_domain / N_computing_node = 1,793,152.6/64   (23)

where 1,793,152.6 denotes the sum of the OCI values of all the sub-domains and 64 denotes the number of commonly used computing nodes in the CI resources. This threshold calculation helps make full use of the computing resources and achieve better load-balance. The octree-based domain decomposition is shown in Fig. 5. Each sub-domain is tagged with A-B-C, where A denotes its level in the corresponding octree structure in Fig. 6, B denotes its sequence number in that level, and C denotes its OCI value, which is calculated by summing the OCI values of all its contained non-null cells. All the leaves of the octree correspond to the sub-domains which are generated from the decomposition of the whole ITS computational domain.
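A minimal recursive octree split driven by an OCI threshold might look like the sketch below; it works on a dense array of per-cell OCI values and returns the leaf boxes, which is only one of several ways the decomposition of Fig. 5 could be realized.

```python
import numpy as np

def octree_decompose(oci, threshold, origin=(0, 0, 0)):
    """Recursively split a 3-D array of per-cell OCI values into leaf boxes whose
    aggregated OCI is at most `threshold` (or which can no longer be split)."""
    total = float(oci.sum())
    nx, ny, nz = oci.shape
    if total <= threshold or max(nx, ny, nz) == 1:
        return [(origin, oci.shape, total)]            # leaf: (origin, size, OCI)
    leaves = []
    hx, hy, hz = max(nx // 2, 1), max(ny // 2, 1), max(nz // 2, 1)
    for ox, sx in ((0, hx), (hx, nx - hx)):
        for oy, sy in ((0, hy), (hy, ny - hy)):
            for oz, sz in ((0, hz), (hz, nz - hz)):
                if sx and sy and sz:
                    sub = oci[ox:ox + sx, oy:oy + sy, oz:oz + sz]
                    leaves.extend(octree_decompose(
                        sub, threshold,
                        (origin[0] + ox, origin[1] + oy, origin[2] + oz)))
    return leaves

# Random per-cell OCI values on a 16 x 16 x 16 data unit, 64 target nodes.
rng = np.random.default_rng(0)
oci = rng.uniform(0.0, 300.0, size=(16, 16, 16))
threshold = oci.sum() / 64                              # as in Eq. (23)
leaves = octree_decompose(oci, threshold)
print(len(leaves), max(leaf[2] for leaf in leaves) <= threshold)
```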
6.3. CI resources allocation
The computational intensity evaluated for the sub-domains from the domain decomposition enables the optimal scheduling of computing tasks to CI resources under the load-balancing strategy [34]. Supposing that we use a cluster machine as the CI resource and its nodes are regarded as homogeneous, one node can be allocated as the data repository to contain the ITS computational domain, and the other nodes can simply be allocated to process sub-domains following the sequence of leaves in the octree. As an example, using the domain decomposition in Fig. 5 and the corresponding octree structure in Fig. 6, the cluster node allocation to all the leaves is shown in Fig. 7. If the number of nodes is less than the number of sub-domains, some computing nodes have to be allocated more than once to complete the fusion on all the sub-domains. Here we adopt the max-min scheduling algorithm [35], which schedules a sub-domain with high computational intensity together with a sub-domain with low computational intensity to one cluster node. The target of this resource scheduling for excessive tasks is to avoid the bucket effect and keep the computing loads on the nodes as balanced as possible.
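The heavy/light pairing described above can be sketched as follows: sort the sub-domain OCIs, then repeatedly assign the heaviest remaining sub-domain together with the lightest one to the same node. This is a simplified reading of the scheduling idea in [35], with illustrative data, not the authors' implementation.

```python
def maxmin_pairing(sub_domain_oci, num_nodes):
    """Assign sub-domains to nodes by pairing heavy and light OCI values.

    Returns a list of per-node task index lists; when there are more sub-domains
    than nodes, each node may receive several (heavy, light) pairs."""
    order = sorted(range(len(sub_domain_oci)),
                   key=lambda i: sub_domain_oci[i], reverse=True)
    nodes = [[] for _ in range(num_nodes)]
    lo, hi, slot = len(order) - 1, 0, 0
    while hi <= lo:
        nodes[slot % num_nodes].append(order[hi])        # heaviest remaining sub-domain
        if hi < lo:
            nodes[slot % num_nodes].append(order[lo])    # ... paired with the lightest
        hi, lo, slot = hi + 1, lo - 1, slot + 1
    return nodes

# 4 sub-domains on 2 nodes (illustrative OCI values).
oci = [120.0, 30.0, 90.0, 60.0]
for node, tasks in enumerate(maxmin_pairing(oci, 2)):
    print(node, [oci[i] for i in tasks])
# node 0 -> [120.0, 30.0], node 1 -> [90.0, 60.0]: both loads equal 150
```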
For heterogeneous CI resources, e.g. when there are two kinds of nodes with different computing capabilities, we can set two different OCI thresholds for the sub-domain decomposition. Suppose that the CI resource contains 64 computing nodes, 16 of which are evaluated to have capability 1, while the remaining 48 nodes are evaluated to have capability 0.5. Then the threshold for the sub-domains allocated to the nodes with capability 1 is set as OCI_computational_domain/(16 × 1 + 48 × 0.5) = OCI_computational_domain/40, and the threshold for the sub-domains allocated to the nodes with capability 0.5 is set as OCI_computational_domain/80. Differently from recursive octree-based domain decomposition with a single threshold on homogeneous CI resources, we set different thresholds for the sub-domains running on the heterogeneous CI resources. Among the eight sub-domains obtained from decomposing the domain by the octree, we choose (16/64) × 8 = 2 sub-domains to be given the threshold OCI_computational_domain/40, and the remaining 6 sub-domains are given the threshold OCI_computational_domain/80. The serial numbers of the 2 sub-domains given the higher threshold are fixed within the eight sub-domains of the recursive octree-based domain decomposition, e.g. the first and second octants. This threshold-setting method and the corresponding octree-based domain decomposition can support heterogeneous CI resource allocation, with the mapping of different computational intensities to different computing capabilities and the matching of the number of different computing nodes to the number of different sub-domains.

Because most CI resources have a homogeneous architecture, we take the cluster as the main CI resource category and specify the work on it in this paper.

Fig. 7. Cluster nodes allocation for the computation on sub-domains in the fusion example.

6.4. Experimental results and analysis

We conduct experiments on the fusion example to evaluate the load-balance of computational domain decomposition, the accuracy of multi-sensor traffic data fusion after decomposition, and the computing efficiency of the parallelized implementation, which are based on the formalization of computational intensity and the decomposition of the computational domain. The experiments are executed on a cluster which consists of 32 computing nodes, and each node consists of two quad-core Intel Xeon E5420 2.5 GHz CPUs, 4 GB memory, and 667 MHz I/O speed. We use the SCATS and GPS data units collected on 393 road segments of the Shanghai downtown area every 15 min from 8:00 to 17:00, with their representation and decomposition as in Fig. 5. The source data can be downloaded from the website.1

The load-balance of computational domain decomposition is tested under different OCI thresholds. The accuracy of multi-sensor traffic data fusion is also tested under different OCI thresholds, by comparing fusion results with ground-truth values, which are manually extracted from traffic surveillance videos. The standard for evaluating the ground-truth values is set as follows: from the captured snapshots of roads in the videos, if the vehicles waiting in a queue cannot be totally released during two periods of green light, the traffic state is evaluated as congested; if the waiting queue can be released during one period of green light, the traffic state is evaluated as smooth; otherwise, the traffic state is evaluated as medium. The experimental metrics of computing efficiency, including average execution time and computational throughput, are measured by parallel computing on 1, 2, 4, 8, 16, 32, 64, 128, and 256 computing cores. The efficiency test is conducted to compare our method with two other common load-balancing methods, octree-based domain decomposition by number of cells and even domain decomposition by amount of data, which are applied to the same example. The detailed experimental results and analysis are listed below.

1 Source data download URL: http://its.hznu.edu.cn/data.rar

6.4.1. Load balance test


By using the computational intensity, we need to ensure the load-balance of decomposing the computational domain, which can improve the acceleration performance of parallel computing. In this test, we use three different OCI thresholds, calculated as in Eq. (23) in Section 6.2, to decompose the ITS computational domain. The three OCI thresholds are 1,793,152.6/64 = 28,018, 1,793,152.6/32 = 56,036, and 1,793,152.6/8 = 224,144. Based on these thresholds, the whole ITS computational domain is decomposed into 71, 71, and 29 sub-domains, respectively. We show the OCI values of all the sub-domains in Fig. 8, and compare the maximum one with the minimum one to evaluate the balance of load after domain decomposition.

In Fig. 8(a), if the threshold is set as 28,018 or 56,036, the decomposition results show that the OCI values of all 71 sub-domains are less than the threshold and greater than 20,000. Exploring the results, the maximum OCI value is 27,965.125, while the minimum OCI value is 20,110.5. The minimum value is above 70% of the maximum value, and therefore the maximum one will not delay the completion of all computing tasks for long. However, in Fig. 8(b), if the threshold is set as 224,144, there is a big gap between the maximum and minimum OCI values of the 29 sub-domains. The comparison between the two figures demonstrates that different thresholds can lead to different levels of load-balance for the sub-domains. Therefore, for domain decomposition, we can try different OCI thresholds to find a better one for load-balance. In this experiment, decomposing the ITS computational domain into 71 sub-domains improves the computing performance of the parallelized fusion with high efficiency, which will be shown in the efficiency test.
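The balance criterion used above, the smallest sub-domain OCI staying above roughly 70% of the largest, can be checked with a one-line ratio; the numbers below are the extremes reported for the 71-sub-domain decomposition.

```python
def balance_ratio(oci_values):
    """Ratio of the smallest to the largest sub-domain OCI (1.0 = perfectly balanced)."""
    return min(oci_values) / max(oci_values)

# Extremes reported for the 71 sub-domains under the 28,018 threshold.
print(round(balance_ratio([27965.125, 20110.5]), 3))   # -> 0.719, i.e. above 70%
```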
6.4.2. Accuracy test
Because the BPA calculation depends on the data of the neighboring cells in each sub-domain, the domain decomposition affects the accuracy of traffic state estimation when using the BPA to fuse SCATS and GPS data. The metric for this accuracy test is defined as follows:

A_c = S_c / S_all   (24)

where A_c denotes the accuracy of data fusion, S_c denotes the number of cells whose traffic states are correctly evaluated by fusing the data within the cells, and S_all denotes the total number of non-null cells with both SCATS and GPS data. We calculate the average A_c of all the sub-domains in each 15 min. The D-S evidential fusion applied in sub-domains is compared with the typical Bayesian fusion applied in sub-domains, the D-S evidential fusion applied in the whole ITS computational domain, and traffic state estimation using a single type of sensor. The average A_c is also evaluated for the respective 71 and 29 sub-domains obtained by domain decomposition under the different OCI thresholds used in the load-balance test. The experimental results are shown in Fig. 9.

Fig. 8. OCI values of all the sub-domains. (a) OCI values of 71 sub-domains by thresholds 28,018 or 56,036. (b) OCI values of 29 sub-domains by threshold 224,144.
As illustrated in the results of Fig. 9(a) and (b), the average A_c of D-S evidential fusion applied in sub-domains is close to that of D-S evidential fusion applied in the whole ITS computational domain. Their results are all above 0.9, which means the D-S evidential fusion works well for traffic state estimation. The results also show that the domain decomposition does not seriously affect the BPA calculation, because in each sub-domain there are enough non-null neighboring cells to calculate the BPA of either type of sensor, and the data fusion can lessen the impact of BPA calculation errors. In this test, we compare the D-S evidential fusion with the typical Bayesian fusion to evaluate the impact of domain decomposition on different fusion methods. The Bayesian fusion is based on the prior probability counted from historical data in the time series only, which is affected by the domain decomposition more seriously than the D-S evidential fusion based on the BPA calculation from the neighboring data cells in the spatio-temporal series. The results of all fusion methods are significantly higher than those of SCATS or GPS alone, owing to the synergistic effect of multi-sensor data fusion in traffic state estimation. In the figure, we also find that the average A_c of traffic state estimation using SCATS data is higher than that using GPS data. This is mainly caused by the errors introduced by GPS sampling and map matching.

In Fig. 9(a) and (b), under the two different domain decompositions, the curves of D-S evidential fusion and Bayesian fusion applied in 71 and 29 sub-domains are compared. Taking the curve of D-S evidential fusion applied in the whole ITS computational domain as the reference line, the average A_c of D-S evidential fusion and Bayesian fusion applied in 29 sub-domains are higher than those of the fusion methods applied in 71 sub-domains. This is because the sub-domains with bigger sizes contain more non-null data cells, which can improve the precision of the calculated BPA and prior probability. In the figures it can also be found that the two curves of D-S evidential fusion and Bayesian fusion applied in 29 sub-domains become closer. This illustrates that the domain decomposition affects Bayesian fusion more seriously than D-S evidential fusion.

The accuracy test is conducted not to evaluate the effectiveness of D-S evidential fusion in traffic state estimation, but to evaluate the effectiveness of the fusion methods after the domain decomposition.

Fig. 9. Average A_c of D-S evidential fusion applied in sub-domains, Bayesian fusion applied in sub-domains, D-S evidential fusion applied in the whole ITS computational domain, SCATS data, and GPS data by domain decomposition under different OCI thresholds. (a) Average A_c of test cases for 71 sub-domains. (b) Average A_c of test cases for 29 sub-domains.

Fig. 10. Efficiency of parallelized fusion by different load-balancing methods on 1, 2, 4, 8, 16, 32, 64, 128, and 256 computing cores.

6.4.3. Efficiency test

(1) Execution time: The efficiency test firstly uses execution time as the metric to measure the performance of parallelized fusion on sub-domains. The execution time focuses on the time spent on the execution of the D-S evidential fusion part shown in Eqs. (12) and (13). This experiment explores the trend of execution time when allocating 1, 2, 4, 8, 16, 32, 64, 128, and 256 computing cores to the 71 sub-domains. The execution time of parallelized fusion using octree-based domain decomposition by OCI is compared with two other load-balancing methods for domain decomposition: octree-based domain decomposition by number of cells and even domain decomposition by amount of data. By number of cells, based on the threshold definition in Eq. (23), the computational domain is decomposed into 64 sub-domains, which correspond to a three-level full octree. The amount of SCATS and GPS data records on 393 road segments during 15 min is 7914, and based on the threshold definition the amount of data records in each sub-domain by even domain decomposition is 124. We compare the execution time of the parallelized fusion by these three load-balancing methods on different numbers of computing cores with the ideal acceleration trend. In order to enhance the visual representation, we use Efficiency (= 1/Execution Time) as the y-axis. The experimental results are shown in Fig. 10.
In the figure, as the number of computing cores increases from 1 to 64, the efficiency increases significantly (equivalently, the execution time decreases significantly), following the ideal acceleration trend on the cluster. Converting the efficiency back into execution time, the parallelized fusion using octree-based domain decomposition by OCI, for example, takes more than 4500 s on 1 computing core, decreasing to around 150 s on 64 computing cores. However, the rate at which efficiency increases for the parallelized fusion based on number of cells or amount of data is smaller than that based on OCI. This is because, compared with the number of cells or the amount of data, the computational intensity evaluates the computation load more exactly for load-balancing. The efficiency of parallelized fusion using even domain decomposition by amount of data is greater than that using octree-based domain decomposition by number of cells. With the number of cells, each sub-domain contains an uncertain number of non-null cells, which makes the computation load seriously unbalanced. As an improvement, the computational domain is evenly decomposed by amount of data, which is equivalent to decomposing by the number of non-null cells. Although the amount of data evaluates the computation load more exactly than the number of cells, neither is as exact as OCI, because the computation load of multi-sensor traffic data fusion depends not only on the data amount, for the memory aspect of computational intensity, but also on the algorithmic complexity, for the I/O and computing time aspects. The increasing rates of efficiency of all the test cases are smaller than the ideal acceleration trend because of the load-unbalanced sub-domains and the computation overhead of domain decomposition and CI resources allocation. As the number of computing cores increases from 64 to 128 and 256, the efficiency of all the test cases stays nearly stable. The reason is that the numbers of sub-domains are 71, 64, and 64 after the computational domain decomposition by the respective load-balancing methods using OCI, number of cells, and amount of data, and the number of computing cores is then much greater than the number of sub-domains.
In this experiment, by using OCI we decompose the ITS computational domain into 71 sub-domains, and the computing task on each sub-domain is distributed to 1 computing core in the program. Therefore, following the load-balancing method, the maximum number of computing tasks assigned to one computing core is 71 when using 1 computing core, 36 when using 2 computing cores, 18 when using 4, 9 when using 8, 5 when using 16, 3 when using 32, 2 when using 64, and 1 when using 128 or 256 computing cores. Supposing that all sub-domains have the same computational intensity, and taking the execution time on 1 computing core as the reference, the execution time of parallelized fusion can theoretically be reduced to 36/71 when using 2 computing cores, 18/71 when using 4, 9/71 when using 8, 5/71 when using 16, 3/71 when using 32, 2/71 when using 64, and 1/71 when using 128 or 256 computing cores, which are respectively greater than the ideal rates, i.e. 1/2 for 2 computing cores, 1/4 for 4 computing cores, 1/8 for 8 computing cores, etc. In this experiment, the evaluated efficiency results show that the actual acceleration by the respective numbers of computing cores is less than the aforementioned theoretical evaluation. This is because the actually different OCI values of the sub-domains bring unbalanced loads to the computing cores, especially in the worse case where the number of sub-domains is greater than the number of cores and cannot be exactly divided by the number of cores.
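The theoretical time fractions quoted above follow from the maximum number of sub-domains that any single core has to process; a short sketch, assuming equally heavy sub-domains:

```python
import math

def theoretical_time_fraction(num_subdomains, num_cores):
    """Execution time relative to 1 core when every sub-domain costs the same:
    bounded by the core that receives the most sub-domains."""
    return math.ceil(num_subdomains / num_cores) / num_subdomains

for cores in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    print(cores, f"{theoretical_time_fraction(71, cores):.4f}")
# e.g. 2 cores -> 36/71, 64 cores -> 2/71, 128 or 256 cores -> 1/71 (vs. ideal 1/2, 1/64, ...)
```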
(2) Computational throughput: Because the parallelized fusion will be employed as a public service for many users to invoke simultaneously, the computational throughput needs to be considered to measure the efficiency and collaboration capability of the parallelization, based on efficiently allocating CI resources to sub-domains. Computational throughput thrp, defined as the amount of computing tasks performed continuously and stably [36], can be evaluated by counting the average number of tasks completed per time unit, as follows:

thrp = N_task / d_t   (25)

where N_task denotes the number of completed computing tasks and d_t is the number of time units. The computational throughput can be regarded as the reciprocal of the execution time. In this experiment, we set the time unit as 1000 s and concurrently submit 10 instances of computing tasks to measure the computational throughput. The computing tasks are divided into two categories: a single D-S evidential fusion part, and multiple parts consisting of the fusion part accompanied by some overhead parts, such as domain decomposition, task scheduling, and CI resources allocation. We execute the computing tasks on 1, 2, 4, 8, 16, 32, 64, 128, and 256 computing cores, respectively. The experimental results are shown in Fig. 11.

Fig. 11. Computational throughput of single fusion part and multiple parts by 1, 2, 4, 8, 16, 32, 64, 128, and 256 computing cores.

As the number of computing cores increases from 1 to 256, the thrp of the single fusion part and of multiple parts exhibits a consistently increasing pattern, and the increasing trend of the thrp of the single fusion part is especially close to the ideal increasing trend of thrp with the number of computing cores. This shows that the load-balance of domain decomposition based on OCI helps the parallelization achieve high efficiency. The thrp of the single fusion part is always greater than that of multiple parts. This is due to the overhead parts occupying CI resources for some time, which decreases the computational throughput. The thrp difference per computing core between these two categories of computing tasks also shows an increasing trend. Using 1 computing core, the thrp of the single fusion part and of multiple parts are 0.22 and 0.21 instances per 1000 s respectively, and their difference is 0.01 instances per 1000 s per computing core. Using 256 computing cores, the thrp of these two cases reach 36.79 and 22.82 instances per 1000 s respectively, and the difference reaches 0.05 instances per 1000 s per computing core. This is because the data fusion is the only part which can be parallelized, and the other overhead parts become the bottleneck of parallel computing on multiple computing cores. When more computing cores take part in parallelizing the data fusion, the computing time of the fusion part on each computing core decreases, while the bottleneck effect of the overhead part on one computing core becomes more obvious. Therefore, the thrp difference per computing core increases as the number of computing cores increases, and the thrp difference between the single fusion part and multiple parts becomes greater.

7. Conclusions and future work

This paper studies how to formalize the computational intensity of big traffic data understanding and analysis in order to facilitate parallelization under a load-balancing strategy. The solution is based on the preparatory work on the computational domain theory, which can formally represent multi-sensor heterogeneous traffic data as a high-dimensional data space consisting of cell tuples. Moreover, the computational domain is transformed into three different computational intensity aspects, memory, I/O, and computing time, by the corresponding data-centric and operation-centric transformation functions. Afterwards, the derived overall computational intensity is used to decompose the computational domain into load-balanced sub-domains by the octree structure. Finally, these sub-domains are distributed to CI computing resources for parallel computing.

An example of fusing SCATS and GPS data for traffic state estimation is presented to demonstrate the data understanding and representation, computational intensity transformation, domain decomposition, and application parallelization. Its experimental results on the accuracy of domain decomposition and the efficiency of parallelized fusion demonstrate that the sub-domains derived from the computational-intensity-based decomposition have close OCI values, which facilitates significant acceleration of parallel computing in execution time and computational throughput. Our work is much more computationally efficient than sequential computing, and costs much less power than parallel computing without a load-balancing strategy based on computational intensity. Therefore, our work achieves a better trade-off between computing efficiency and power dissipation.

Further research will firstly explore the utilization of computational intensity in more cases of traffic data analysis to validate the formalization. Afterwards, by extracting the commonality from these cases, a generic computing framework which synthesizes data representation, computational intensity evaluation, and application parallelization will be investigated for data-driven ITS. The approaches presented in multimedia content analysis [37,38] are also of good use for big traffic data understanding. Furthermore, it is of particular interest to investigate the implementation on some other types of CI devices, such as general-purpose graphics processing units.
Acknowledgments
This research is supported in part by the following funds: National Natural Science Foundation of China under Grant numbers 61472113 and 61304188, and Zhejiang Provincial Natural Science Foundation of China under Grant numbers LZ13F020004 and LR14F020003.
References
[1] J. Zhang, F. Wang, K. Wang, W. Lin, X. Xu, C. Chen, Data-driven intelligent transportation systems: a survey, IEEE Trans. Intell. Transp. Syst. 12 (4) (2011) 1624–1639.
[2] D. Chen, L. Wang, M. Tian, J. Tian, S. Wang, C. Bian, X. Li, Massively parallel modelling & simulation of large crowd with GPGPU, J. Supercomput. 63 (3) (2013) 675–690.
[3] O. Goldreich, Computational complexity: a conceptual perspective, ACM SIGACT News 39 (3) (2008) 35–39.
[4] B.T. Morris, M.M. Trivedi, Learning, modeling, and classification of vehicle track patterns from live video, IEEE Trans. Intell. Transp. Syst. 9 (3) (2008) 425–437.
[5] W. Wang, J. Jin, B. Ran, X. Guo, Large-scale freeway network traffic monitoring: a map-matching algorithm based on low-logging frequency GPS probe data, J. Intell. Transp. Syst. 15 (2) (2011) 63–74.
[6] C.N. Anagnostopoulos, I.E. Anagnostopoulos, I.D. Psoroulas, V. Loumos, E. Kayafas, License plate recognition from still images and video sequences: a survey, IEEE Trans. Intell. Transp. Syst. 9 (3) (2008) 377–391.
[7] T.H. Chang, C.S. Hsu, C. Wang, L. Yang, Onboard measurement and warning module for irregular vehicle behavior, IEEE Trans. Intell. Transp. Syst. 9 (3) (2008) 501–513.
[8] S. Atev, G. Miller, N.P. Papanikolopoulos, Clustering of vehicle trajectories, IEEE Trans. Intell. Transp. Syst. 11 (3) (2010) 647–657.
[9] M.S. Shehata, J. Cai, W.M. Badawy, T.W. Burr, M.S. Pervez, R.J. Johannesson, A. Radmanesh, Video-based automatic incident detection for smart roads: the outdoor environmental challenges regarding false alarms, IEEE Trans. Intell. Transp. Syst. 9 (2) (2008) 349–360.
[10] K. Choi, Y. Chung, A data fusion algorithm for estimating link travel time, ITS J. 7 (3–4) (2002) 235–260.
[11] Z. Shan, Y. Xia, K. Li, X. Shi, A meta-level k-means method for evaluating the advertising value of urban roads, in: ICTIS 2011: Multimodal Approach to Sustained Transportation System Development–Information, Technology, Implementation: Proceedings of the First International Conference on Transportation Information and Safety, Wuhan, China, June 30–July 2, 2011, ASCE Publications, Wuhan, 2011, p. 440.
[12] T. Gandhi, R. Chang, M.M. Trivedi, Video and seismic sensor-based structural health monitoring: framework, algorithms, and implementation, IEEE Trans. Intell. Transp. Syst. 8 (2) (2007) 169–180.
[13] J. van Lint, Online learning solutions for freeway travel time prediction, IEEE Trans. Intell. Transp. Syst. 9 (1) (2008) 38–47.
[14] J.T. Wong, Y.S. Chung, Rough set approach for accident chains exploration, Accid. Anal. Prev. 39 (3) (2007) 629–637.
[15] F. Wang, H. Zhang, D. Liu, Adaptive dynamic programming: an introduction, IEEE Comput. Intell. Mag. 4 (2) (2009) 39–47.
[16] Y. Zhang, Z. Ye, Short-term traffic flow forecasting using fuzzy logic system methods, J. Intell. Transp. Syst. 12 (3) (2008) 102–112.
[17] S. Shekhar, C.T. Lu, R. Liu, C. Zhou, CubeView: a system for traffic data visualization, in: Proceedings of the IEEE Fifth International Conference on Intelligent Transportation Systems, IEEE, Singapore, 2002, pp. 674–678.
[18] C.T. Lu, L.N. Sripada, S. Shekhar, R. Liu, Transportation data visualisation and mining for emergency management, Int. J. Crit. Infrastruct. 1 (2) (2005) 170–194.
[19] http://www.inrix.com/, 2012.
[20] J. Hartmanis, R.E. Stearns, On the computational complexity of algorithms, Trans. Am. Math. Soc. 117 (1965) 285–306.
[21] J.K. Lenstra, A. Rinnooy Kan, Computational complexity of discrete optimization problems, Ann. Discrete Math. 4 (1979) 121–140.
[22] V.D. Blondel, J.N. Tsitsiklis, A survey of computational complexity results in systems and control, Automatica 36 (9) (2000) 1249–1274.
[23] Y. Xia, Z. Ye, Y. Fang, T. Zhang, Parallelized extraction of traffic state estimation rules based on bootstrapping rough set, in: 2012 Ninth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), IEEE, Chongqing, 2012, pp. 1532–1536.
[24] S. Wang, M.K. Cowles, M.P. Armstrong, Grid computing of spatial statistics: using the TeraGrid for Gi*(d) analysis, Concurr. Comput.: Pract. Exp. 20 (14) (2008) 1697–1720.
[25] S. Wang, M.P. Armstrong, A quadtree approach to domain decomposition for spatial interpolation in grid computing environments, Parallel Comput. 29 (10) (2003) 1481–1504.
[26] Y. Xia, J. Hu, M.D. Fontaine, Cyber-ITS framework for massive traffic data analysis using Cyber-Infrastructure, Sci. World J. 2013 (2013), Article ID 462846, 9 pages.
[27] K. Devlin, Sets, Functions, and Logic: An Introduction to Abstract Mathematics, CRC Press, 2003.
[28] Y. Xia, X. Li, Z. Shan, Parallelized fusion on multisensor transportation data: a case study in CyberITS, Int. J. Intell. Syst. 28 (6) (2013) 540–564.
[29] P. Smets, R. Kennes, The transferable belief model, Artif. Intell. 66 (2) (1994) 191–234.
[30] Q. Kong, Z. Li, Y. Chen, Y. Liu, An approach to urban traffic state estimation by fusing multisource information, IEEE Trans. Intell. Transp. Syst. 10 (3) (2009) 499–511.
[31] C.L. Jackins, S.L. Tanimoto, Oct-trees and their use in representing three-dimensional objects, Comput. Gr. Image Process. 14 (3) (1980) 249–270.
[32] H. Samet, The quadtree and related hierarchical data structures, ACM Comput. Surv. 16 (2) (1984) 187–260.
[33] Y. Xia, L. Kuang, X. Li, Accelerating geospatial analysis on GPUs using CUDA, J. Zhejiang Univ. Sci. C 12 (12) (2011) 990–999.
[34] K. Qureshi, B. Majeed, J.H. Kazmi, S.A. Madani, Task partitioning, scheduling and load balancing strategy for mixed nature of tasks, J. Supercomput. 59 (3) (2012) 1348–1359.
[35] T.D. Braun, H.J. Siegel, N. Beck, L.L. Bölöni, M. Maheswaran, A.I. Reuther, J.P. Robertson, M.D. Theys, B. Yao, D. Hensgen, et al., A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, J. Parallel Distrib. Comput. 61 (6) (2001) 810–837.
[36] R. Raman, M. Livny, M. Solomon, Matchmaking: distributed resource management for high throughput computing, in: Proceedings of the Seventh International Symposium on High Performance Distributed Computing, IEEE, 1998, pp. 140–146.
[37] L. Zhang, Y. Gao, K. Lu, J. Shen, R. Ji, Representative discovery of structure cues for weakly-supervised image segmentation, IEEE Trans. Multimed. 16 (2) (2014) 470–479.
[38] L. Zhang, Y. Gao, R. Zimmermann, Q. Tian, X. Li, Fusion of multi-channel local and global structural cues for photo aesthetics evaluation, IEEE Trans. Image Process. 23 (3) (2013) 1419–1429.
Yingjie Xia received his Ph.D. degree in computer science from Zhejiang University, China. He was a Postdoc in the Department of Automation, Shanghai Jiao Tong University, from 2010 to 2012, supervised by Professor Yuncai Liu. Before that, he was a Visiting Student at the University of Illinois at Urbana-Champaign from 2008 to 2009, supervised by Professor Shaowen Wang. He is currently an Associate Professor at the Hangzhou Institute of Service Engineering, Hangzhou Normal University. His research interests include multimedia analysis, pattern recognition, and intelligent transportation systems.

Jinlong Chen received his Bachelor degree in computer science and technology from West Anhui University, China. He is currently a graduate student at Hangzhou Normal University. His research interests include cloud computing and intelligent transportation systems.

Chunhui Wang received his Bachelor degree in automation from Nanjing University of Posts and Telecommunications, China. He is currently a graduate student at Hangzhou Normal University. His research interests include data fusion, data mining, and intelligent transportation systems.
