You are on page 1of 8

Real-Time Monitoring of Road Traffic using Data

Stream Mining
Guilherme Guerreiro Ruben Costa
Paulo Figueiras
CTS, UNINOVA, Dep. de Eng.ª CTS, UNINOVA, Dep. de Eng.ª
CTS, UNINOVA, Dep. de Eng.ª
Eletrotécnica Faculdade de Ciências e Eletrotécnica Faculdade de Ciências e
Eletrotécnica Faculdade de Ciências e
Tecnologia, FCT, Universidade Nova de Tecnologia, FCT, Universidade Nova de
Tecnologia, FCT, Universidade Nova de
Lisboa Lisboa
Lisboa
2829-516 Caparica, Portugal 2829-516 Caparica, Portugal
2829-516 Caparica, Portugal
gam@uninova.pt rddc@uninova.pt
paf@uninova.pt
António Rosa Ricardo Jardim-Gonçalves
Zala Herga
CTS, UNINOVA, Dep. de Eng.ª CTS, UNINOVA, Dep. de Eng.ª
Artificial Intelligence Laboratory
Eletrotécnica Faculdade de Ciências e Eletrotécnica Faculdade de Ciências e
Jožef Stefan Institute
Tecnologia, FCT, Universidade Nova de Tecnologia, FCT, Universidade Nova de
Jamova cesta 39, 1000 Ljubljana,
Lisboa Lisboa
Slovenia
2829-516 Caparica, Portugal 2829-516 Caparica, Portugal
zala.herga@ijs.si
arm.rosa@campus.fct.unl.pt rg@uninova.pt

Abstract—The development of applications for advanced whether we address public transports or roads infrastructures.
Intelligent Transportation Systems (ITS), is a growing research The correlation of more population and more private vehicle
area, pushed especially by the need to mitigate problems related usage, is a known catalytic agent to contribute to traffic
to transportation inefficiency, overuse and pollution, mainly congestions, and consequently, increase traffic accidents and
caused by the increasing of traffic congestions in big cities. more CO2 emissions into the environment. Hence, also capable
Nowadays, traffic related data coming from the road of affecting the overall public health, by diminishing air
infrastructure, user’s cars or mobile phones, is generated in quality, which is easily perceived in heavy traffic areas.
unprecedent volume and speed. Therefore, ITS face new Besides contributing to economic losses associated with
challenges, with respect to growing demand to process large
infrastructures´ costs and hours spent in traffic jams that could
volumes of traffic data in real-time, and extract value that can be
immediately used for decision making. The work presented here,
be productive otherwise. These issues, are rapidly growing and
proposes a scalable architecture supported by Big Data need a paradigm shift in the way cities are organized, as well as
technologies, capable of processing real-time traffic data in people´s behaviour, in order to keep cities sustainable[2].
captured from 349 inductive loop counters, placed in Slovenian Moreover, since building more roads is no longer viable and
road network. The main goal is to propose an approach which sustainable as long run policy in many cities, there is a clear
can be adopted by national road operators, capable of need to find new ways to make a better use of existing
monitoring, in real-time, the current status of the road network. infrastructures. Thus, it is necessary to support new
Traffic events need to be detected in real-time and require the technological solutions and new transportation models that can
collection of data from several fixed sensors. To deal with data help achieve more efficiency, and equally offer good
volume and velocity challenges, a data stream management convenience of use to users, fitting their needs.
system was used, able to collect traffic data that arrives and
process it on a continuous (24x7) basis. Results achieved, suggest The transportation sector, aligned with the need to mitigate
that there’s a great advantage in the adoption of a data stream the problems stated above, is under a profound transformation,
management system, with respect to traditional database pushed by the advances of several technological areas, either in
management systems, illustrated by practical examples, hardware (e.g. sensing and wireless communication
providing quantitative metrics about data processing capabilities) or software (Artificial Intelligence based on
performance. Machine Learning already used by autonomous vehicles).
These technologies have the possibility to change the way we
Keywords— Intelligent Transportation Systems, Stream see mobility nowadays and between 2020 and 2030, as
Analytics, Stream Processing, Traffic Management. autonomous vehicles (AVs) become regulated by countries, is
expected a consequential disruption in transportation methods.
I. INTRODUCTION This idea is supported by roadmaps[3] that indicate future
strength of a new market that will explore electric, autonomous
Cities have to deal with several challenges regarding their
vehicles and shared rides, using business models based on
transportation infrastructures due to the number of people
“transport-as-a-service” (TaaS). Such transformations in the
living and commuting in these areas on a daily basis. The
transportation domain, should be capable of dropping
United Nations project that, in 2030, the 63 most populated
transportation costs through cheaper maintenance and cheaper
cities will house at least 400 million people[1], which is an
running costs, having a positive impact for families. Promoting
unprecedent number. Urban infrastructures and services, were
in that way, a paradigm shift in the way we look into
not designed to be used by such number of commuting people,
transportation nowadays.

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)
One major thing in common within technologies that soon In [10], some of the authors of this paper studied and
can be in full use, such as autonomous cars or Vehicle to proposed an architecture based on Big Data technologies (using
Vehicle/Infrastructure (V2V,V2I) communication, and Apache Spark, MongoDB) for batch processing, mainly to
technologies already widely implemented in roads today (in- perform “Big ETL” on traffic related information, obtained
roadway sensors like pneumatic road tubes, inductive loop through large batch files and datasets from web services.
detectors, piezoelectric sensors, along with over-roadway Framing the needs in an application scenario from an European
sensors like infrared sensors and video cameras)[4] is the research project, OPTIMUM [11], contributing to deliver
generation of massive amounts of heterogeneous traffic related multi-source Big Data fusion driven proactivity for intelligent
data. Volume of data that can be even bigger if we consider all mobility. The work presented here proposes and validates a
useful traffic related information created by cars, mobile stream processing approach for traffic management, with the
phones, users in social media or in traffic mobile applications. objective of deliver also real-time Big Data processing
All this data represents an opportunity for data driven capabilities and streaming analytics, together with an interface
application for Intelligent Transportation Systems (ITS), to show basic key performance indicators about traffic
applications that can also be data providers to others. One of behavior. The main processing engine adopted to perform on
the upfront challenges is how to properly deal with this kind of the fly transformations and aggregations was Apache Storm.
data, commonly called Big Data. Characterized by its The validation was performed using 17 months of Slovenian
overwhelming continuously rising Volume, Variety of data traffic loop data as input, with the developed topologies
sources with structured or unstructured datasets that may vary (Storm´s workers logic) tailor-made for the specific datasets.
in Veracity (different data quality/ accuracy), Velocity of data Nevertheless, this logic can easily be changed/extended to
collection and transmission, and finally Value, referring to the enable the processing of other types of datasets and calculate
extraction of information from datasets[5][6], which can be other metrics than the ones presented. In terms of performance,
done by applying data analytics and data mining tools. This the objective was to assess how the proposed stream processing
kind of traffic related information can be extremely useful, approach for traffic management, behaves. Results achieved so
namely to identify mobility pattern and understand far, indicate that the proposed approach for stream processing
transportation network weaknesses, in order to road authorities is capable of handling real-time traffic data from loop sensors,
take actions to improve efficiency. Other advantageous and also showed good performance when on the fly
applications, only possible through the availability of such data, aggregations or queries on data streams were needed.
are the ones for forecasting purposes, where historical data can
be used to train models that will predict how traffic will behave This paper is structured as follows: Section II presents the
in the future, as in [7]. Information that can be applied, for Related work, describing also technologies assessed. The
instance, in transport infrastructure design, by predicting how scenario and data sources used for tests are described in section
drivers will behave if any changes are introduced. III. Section IV describes the followed methodology, while in
Section V is described the technical architecture with special
But before any data analysis is possible, all data generated focus in Storms´ topologies and implemented logics. Section
from sensors must be cleaned, transformed and stored VI describes the performance results and shows the
efficiently. This process, named Extract, Transform, Load visualization component, created for validation. Finally, section
(ETL)[8] stage, is particularly challenging in terms of VII presents the conclusions and future work.
computational resources, requiring the adoption of reliable Big
Data technologies that can combine distributed and parallel II. RELATED WORK
processing, together with Map Reduce algorithms. Big Data
processing is classified in [9]: ITS and Big Data are two recent, deeply intertwined
research areas. In [12], authors present a review of ITS history
• Batch Processing: processes that normally run at and ITS architecture, defining the utilization of multiple
certain place in time, normally scheduled or triggered sources of traffic related data in ITS applications one of the
by something, dealing with big files/datasets in large main challenges today. While Data Fusion and Data Mining are
volumes; presented as both key technologies and bottlenecks.
• Stream Processing: continuous processes that deal There are several studies about the implementation of
with datasets produced at high velocity rates, requiring parallel and distributed in-memory technologies for stream and
on the fly transformations. Ideally processed near real- batch processing in the ITS domain, as the way to deal with the
time. great amount of heterogeneous data generated nowadays,
addressing issues such as scalability, data fusion, and good
A Stream processing approach is suitable for performing
processing capabilities [13][14][15], as well as for data
data analytics on the fly, when traffic data is being streamed
analytics. More specifically, in [14], a platform to process
from sensors, e.g. loop sensors, weather data, car vehicle data.
structured and unstructured data to feed Smart Cities
Nevertheless, these streams of traffic related data can also
applications is presented, using Hadoop, Apache Spark and
produce traffic indicators near real-time, through the use of
Apache Storm for data processing. Kafka is the choice for data
stream analytics jobs in the process. These stream analytics
ingestion. The work also shows benchmarks to assess
jobs are a fast way to identify events on the road and enable the
technologies’ performance. The authors of [13], propose an
road manager to take action faster and proactively (before the
approach for automotive applications that includes machine
event is reported), trying to resolve any situation more quickly.
learning libraries and reviews mainly the same technologies as
the previous authors, besides some others such as Spark

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)
Streaming and Flink. In [15] is applied similar technologies for TABLE I. COMPARISON BETWEEN STREAM PROCESSING
FRAMEWORKS[24].
a mobility analytics framework, applying Spark Streaming
Analytics. Spark
Samza Storm
Streaming
Considering applications for traffic control/monitoring in One record at One record at time Micro-batch
Processing
the ITS domain using Big Data streaming and analytics to model time
process data near real-time, the literature is scarse, as Latency milliseconds milliseconds seconds
confirmed by [16], which is unexpected, since there is a shared 100k+ records 10k+ records per 100k+
understanding that joining both traffic data and real-time per node per node per second records per
streaming analytics can be advantageous. The works presented Throughput second node per
by [16] and [17], have a more similar objective with the work second
presented here. Yet, both authors follow completely different At-least-once At-least-once or Exactly
approaches. The authors of [16] follow a similar approach to delivery; exactly once once
the above ITS studies, applying parallel distributed Processing Support for
guarantees exactly-once
technologies: Kafka is employed, with Hadoop serving as data planned
warehouse, for analytics. The study also indicates as future
work the usage of Spark Streaming for data processing, Language
+ +++ ++
analytics, and machine learning, in order to deal with the future support
multi-objective control platforms that are expected to be Estimated
+ +++ ++
community size
needed for V2X, autonomous vehicles, etc. On the other hand,
[17] employs a completely different approach, presenting Storm was the chosen technology to use, because of
DBStream, a continuous analytics system that combines data its latency of milliseconds, throughput rate, compatibility with
from multiple sources and generates analyses, along with many programming languages, big community support, and
visualization modules. In this case, the approach is a because is widely used in studies for high stream performance
centralized architecture, developed on top of improved applications, as in [25][26].
Database Management Systems (DBMS). In the demonstration,
the application runs in powerful hardware, which is the III. APPLICATION SCENARIO AND DATA AVAILABILITY
opposite of shown in other solutions. The scenario is based on the use of Slovenian loop
With respect to Distributed Stream Processing Frameworks counters‘ data, in order to develop a stream processing
(DSPF) for Big Data processing, Storm, Spark Streaming, and architecture, capable of:
Samza (the most recent one), all from Apache[18], are three • Process big volumes of traffic data that may come from
popular open-source solutions referenced at time, all of them multiple sources;
with different characteristics. Besides other commercial
solutions like Microsoft Azure[19] or IBM Streams[20]. • Perform data transformations and aggregations, either
through ETL jobs, traffic data analytics, or both at
(i)Samza[21] functions with ordered partitions that contain
same time;
immutable messages. It uses Kafka in his architecture as
streaming layer and YARN, a resource manager for clustering, Other requirements are: easy scalability to include
in the execution layer. different data sources and correlate findings, distributed
processing capabilities for ease the processing needs, using one
(ii)Storm[22] is the most referenced one. It uses tuples,
or multiple “not high-end” machines. Both these requirements
named list of values, to structure data in streams, being the
are fulfilled with the use of Apache Storm.
processes defined by topologies. These topologies, deployed to
Storm engine, determine the data flow graph and consists More specifically, due to data available, implementation
mainly of: Spouts, units that provide data, and Bolts, units that focusses in the processing of traffic loop counters´ data,
perform logic processing. generated by inductive loop sensors, with the end of producing
key performance indicators with usable information for traffic
(iii)Spark Streaming[23] is part of overall Spark
managers. Data is generated by 349 sensors in multiple roads,
framework, it uses RDDs ( Resilient Distributed Datasets) as
spread in urban and rural areas around all country, as shown in
data abstraction, being possible to perform multiple operations
Figure 1.
over these RDDs. A stream of data in Spark Streaming is
denominated as DStream (Discretized Stream), being nothing
less than a sequence of RDDs partitioned between processing
nodes.
All three offer relative low latency, scalability and fault-
tolerance mechanisms, but present different approaches. Storm
and Samza process one record at a time, while Spark Streaming
produces micro-batches to be processed. Table 1 shows a
comparison between the three.

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)
well accepted by the scientific community. It can be
synthesized in 6 steps: business understanding(i), data
understanding(ii), data preparation(iii), modeling(iv),
evaluation(v), and finally deployment(vi). The sequence of
steps is not strictly sequential, it is rather recursive, where
many times is necessary to revisit previous steps, altering and
adapting the development if necessary.
In this case, business understanding and data understanding
were especially important, since one of the objectives is to
deliver information that have somehow value for road
operators, in order to use the information to improve efficiency
in the road network. Considering this, several studies on raw
data were performed, such as: assess data availability per
sensor, assess data availability per month, study the behavior of
average speed and occupancy during weekdays or weekends,
Fig. 1. Loop Sensors ‘distribution in Slovenian.
study the flow in different temporal aggregations, compare
Data was collected and stored through web services that traffic flows in different geographical areas, between others.
provide loop sensors´ data, in JSON format, matching the flow As example of the data exploration made, Figure 3 shows
of vehicles within a 5 minute interval. Each sensor collects occupancy values per month, for the year 2016. Outliers, which
several information, namely: average time interval between are unusual values of occupation in the road, are represented
vehicles passage (“stevci_gap”), average speed (“stevci_hit”), with red marks. Typically, outliers represent unusual situations
road occupation in number of vehicles (“stevci_stev”). Each on the road that affect the normal flow of cars.
record of data, in addition to the information collected by each
sensor, also has the location of the sensor, date and time of data
collection, sensor Id, name and section of the road where the
sensor is placed.

Fig. 3. Box plot showing occupancy values, per month (January to


December, left to right), for year 2016.

From the studies was possible to assess the


importance of some variables regarding the application context.
Therefore, speed and occupancy values were the variables
selected to serve as key performance indicators for traffic
monitoring in this first iteration of the system. Thereunder, a
large part of the processing work is done on this data, being
also the two main outputs in the visualization component,
either in simple or temporally aggregated metrics.

Fig. 2. Sample of one sensor´s data, for two lanes in the same direction. V. ARCHITECTURE
The architecture can be divided into 3 main blocks, which
The data collected and used for this study, corresponds to also represent the 3 phases of the data processing pipeline
17 months of loop data, from January 2016 to May 2017, implemented: data ingestion, stream data processing and data
making a total of 33 Gigabytes of data stored. storage/visualization.

IV. METHODOLY Taking into account the requirements and related work, the
technologies adopted for each phase are (Figure 4):
The methodology adopted to conduct the work was CRISP-
DM (Cross Industry Standard Process for Data Mining), which • RabbitMQ[27] is the technology responsible for
establish the principles for data-mining related projects, being receiving and collecting data from sources, sending

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)
afterwards stream messages to Apache Storm.
RabbitMQ utilizes Advanced Message Queuing
Protocol (AMQP), which allows asynchronous
messages exchange between applications or services. It
can also use more than one thread to deliver messages,
key feature for distributed data processing applications.
Besides the possibility to deliver the same messages to
multiples data processing jobs, for different ends.
Nevertheless, other solutions like Kafka would also fit.
• Apache Storm is the choice for the core processing
engine, as seen before. Being responsible for all
transformations and analytics made and sent to the
Fig. 5. Storm Topology for aggregation of speed and occupation values.
visualization component and storage. Hence, was
developed and deployed to Storm topologies to process
(i)RabbitMongoSpout: Spout unit that consumes the
accordingly the data coming from RabbitMQ. As well
messages made available by RabbitMQ. Every 1 millisecond it
as to, simultaneously, output the wanted analytics.
checks if there are new messages in the queue. Here the String
• HTML and JavaScript, also together with Leaflet API messages are serialized into Objects in tuples, before passed
and Highcharts API were used for data visualization. into the next stage.
Being used Web Sockets to connect Storm and the (ii) GetterMongoBolts: Objects are received from (i) using
visualization layer, along with Apache Camel for a ShuffleGrouping connection, meaning that the data is passed
connections mediation. For storage is adopted between the instances of the topology randomly. This stage is
MongoDb, for the same reasons as applied in[10], responsible mainly to acknowledge the sensor that generated
explicitly because it utilizes a document oriented the data through the sensor Id. This step also enables the
NoSQL approach, has low latency in real-time for parallelization process for the aggregations made in the next
reading and writing documents, besides its good step.
scalability. In this way, it is also possible to integrate
the architecture present in this study with the (iii)FieldsBolt: Objects are received by Bolts throught
architecture implemented in [10]. FieldsGrouping connection, taking into account their sensor Id
number, allowing that data from the same sensor to be properly
aggregate. Also considering directions and lanes. Therefore, in
FieldsBolt, there will be one Bolt per sensor when parallel
processing. In the case of aggregations that require that all
traffic data associated to a certain time/sensor be process
together within a specific end, like hourly averages
calculations, was also created a customGrouping. This
customGrouping guarantees that each Bolt, for the necessary
period of time, will receive loop counter´ data from the same
sensor. Not erasing completely the previous data immediately
after the first record is processed, which is what happens in
standard jobs. Enabling in that way, continuous calculations
taking into account previous values, without any intermediate
datawarehouse technology.
Fig. 4. Architecture.
(iv)AggregationBolt: here tuples are joined in just one Bolt.
This Bolt is responsible for sending the output to MongoDB
A. Topology instances or to the visualization component, or both.
All logic behind stream processing work is deployed to
It’s also worth mention that an additional stage was created
storm via topologies. These topologies represent a graph of the
in the topology to manage the transference of data to the
entire computation flow. In this case, two main topologies were
visualization component. This stage, called JmsBolt1, is based
developed, one to deal with speed values and another to handle
in an open source library, “Storm JMS”, available in GitHub.
occupation values. Both to generate instant and temporally
JmsBolt has the job of coordinate messages between storm´
aggregated values, like for example, to calculate hourly
output tuples and ActiveMQ, a message broker similar to
averages. Both topologies implement the logic shown in Figure
RabbigMQ for point to point public subscribe service. Being
5, following these 4 stages in Storm (better describe below).
Apache Camel responsible to manage the exchange of
The stages are: (i)RabbitMongoSpout,(ii)getterMongoBolts, information in the Web Sockets, passing messages to front-end
(iii)FieldsBolt, (iv)AggregationBolt. technologies.

1 https://github.com/ptgoetz/storm-jms

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)
VI. VALIDATION To stream and process all data, with 1 worker, it took
The validation of the real-time monitoring system for road approximately 3 hours and 11 minutes. Using 4 workers the
traffic, using data stream mining, proposed here, focusses in execution time were worse, approximately 3 hours and 35
two main dimensions: (i)assess the performance achieved when minutes, being a consequence of slightly worse latency in the
processing data streams and producing analytics; (ii)assess the tuples-hour indicator (more 0.2 milliseconds), although, results
data visualization component, which is the main interface to of performance are very similar. Both setups (1 and 4 workers)
provide traffic monitoring to any user. process almost 9 messages per second, that is, 9 times 349
sensors´ data, verified by the Publish/ Consumer Acknowledge
(8.8 messages per second in Figure 6). From the Storm
A. Performance performance is important to highlight the very low latency in
The main goal of performance testings was to evaluate how Bolts (pointed by the circle in Figure 7) and the capacity used,
Storm performs when applying the topologies shown in Figure very near to 0.
5, to process a specific workload composed by real traffic flow
data, as input. In particular, analyse the output rate, latency 1) Performance Discussion
produced, and evaluate processing stability and reliability when From the results is possible to infer that Storm processed
data come with higher periodicity. Framing also system´s the coming streams very easily, confirmed by the low capacity
limitations. To do this, all data stored from Slovenian roads, values reached in Bolts. That is, if needed, processing
described in Section III, were used, in order to see how fast all capabilities could even be increased by augmenting data
data could be processed in streams by the system. To perform ingestion, namely using more RabbitMQ instances, which are
the tests, was used one instance of RabbitMQ to retrieve time- the bottleneck of the stream processing pipeline at this time.
ordered data, continuously, from MongoDB, where historical Latency was also very low, as expected. Regarding the slightly
data is stored. Each message retrieved, contains loop data for worse output rate obtained with 4 workers, by comparison with
all the 349 sensors available, for a specific date time, and is only 1 worker, they are explained and confirmed in [22] by the
publish by RabbitMQ to be further consumed by Storm. Being need to use the workers transfer buffer, to allow the
always published a new message after the previous one is transference of data between workers, when performing tasks.
consumed, simulating in that way streams of data that could be In [28], where is shown how Spotify uses Storm, same
coming from loop sensors itself or from any intermediate in- conclusion is reached, being the advised tuning 1 worker per
memory data warehouse, for instance. Both RabbitMQ and node per topology, to achieve better performance.
Apache Storm offer a monitoring interface, where all
information concerning performance and tasks accomplished B. Visualization Component
can be accessed, shown in Figure 6 and Figure 7. In Figure 8 and Figure 9 are shown screenshots samples of
Machine Specification: Intel(R) Core(TM) i7-7500U CPU @ the visualization interface component, showing the traffic
2,70Ghz , Quad-Core. 8GB RAM , 256 SSD monitoring application working. This component is responsible
of showing to the user, in real –time, the output of the
Software Version: Apache Storm 0.9.1 - incubating, processing engine, when is receiving streams. The metrics
MongoDB 3.0.4, RabbitMQ 3.6.10. shown in the Figures, linked to speed and occupancy, are the
Setup of Storm: 1 Nimbus (master node), 1 Zookeeper metrics reached in Section IV, determined through following
(coordination service), 1 to 4 Worker nodes and Supervisors CRISP-DM methodology, being two common key performance
(manage worker processes). indicators to measure traffic network performance.
Thereunder, when Storm receives a new stream of loop data,
Results from processing 17 months of loop data, about processes it and sends the results to the visualization
95500 messages, can be seen in Figure 6 and Figure 7. component, through the process already described in Section V,
thereafter, charts in the interface are updated.
In Figure 8, is shown an example of the web-based user
interface, where is displayed instant values of speed and
occupancy, together with the average number of cars detected
in the last hour, for a group of sensors. It is possible to, for
example, check the values for all sensors or isolate the ones are
required, for instance, to monitor a certain geographic region.
Fig. 6. RabbitMQ report using 1 Worker in Storm. Time aggregated variables can also be customized, in order to
see average values, peaks and minimum values for a certain
period in time: last hour, last 30 minutes, others. This enables
the user to continuously check and compare several metrics in
one dashboard, enabling real-time information for road
managers.

Fig. 7. Storm performance using 1 Worker.

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)
Fig. 8. Charts showing instant values of speed (upper left) , occupation (upper right) and hourly aggregated values for occupation (bottom), for a big range of
sensors.

Besides the common bar charts, other types of of cope the main objectives, namely, perform real-time stream
visualizations were also explored, with the objective of show to computing of traffic related data and generate on the fly traffic
the user usable information in a most readable, interactive, and indicators with it, through stream mining. Being shown,
meaningful way. In Figure 9, a geographical representation of through a user interface, average speeds and occupation
the sensors and their values is shown, done by placing bars in metrics, aggregated in different granularities (e.g. current time,
the map that vary according to their values. From it, is possible last 30 minutes, hourly). Moreover, for the specific scenario of
to realize, for instance, which areas comprise more traffic in a Slovenian loop counters´ data, the performance achieved by the
certain time. system in tests, outperform the needs. Having a throughput and
output rate much faster than the input rate, if working in real-
time, showing that the implementation is valid to perform
stream processing work, such as ETL jobs and real-time
streaming analytics. The use of Big Data technologies also
cover requirements like parallel processing capabilities and
scalability.
ITS applications capable of stream mining, such as the one
developed in this study, have increasing value for traffic
operators, making available information not generally
accessible in real-time, considering the classical methods to
monitor the road network. Especially, if we compare these new
Data Stream Management Systems (DSMS) approaches with
the classical ones, using Database Management Systems
(DBMS), which are more static, less flexible, and perform
normally later analytics. In the other hand, DBMS still fit best
when is required to perform analysis on persistent data.
Fig. 9. Overview of occupancy values, for all Slovenian loop sensors,
varying their value in real-time depending on time. Future work involves the study and development of a
Complex Event Processing (CEP) software layer that could
take advantage of the traffic monitoring information, to
VII. CONCLUSIONS AND FUTURE WORK automatically detect unusual events on the road. In another
Results showed that the approach presented here, perspective, we also wish to use the base of the stream
materialized in an application for traffic monitoring, is capable processing pipeline, shown here, to process other data sources

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)
that can affect traffic, such as weather data, traffic events data, IEEE Int. Conf. Big Data Secur. Cloud, BigDataSecurity 2017, 3rd
social events data, or others. The objective of using different IEEE Int. Conf. High Perform. Smart Comput. HPSC 2017 2nd
IEEE Int. Conf. Intell. Data Secur., pp. 167–172, 2017.
data sources is to develop also new topologies, applying the
[13] A. Luckow, K. Kennedy, F. Manhardt, E. Djerekarov, B. Vorster,
knowledge acquired here, to generate different analytics and, if and A. Apon, “Automotive big data: Applications, workloads and
possible, in a next stage, correlate more than one source and infrastructures,” Big Data (Big Data), 2015 IEEE Int. Conf., pp.
type of data. The combination of CEP and more data sources, 1201–1210, 2015.
would allow the development of a more complete real-time [14] S. Ma and Z. Liang, “Design and Implementation of Smart City Big
traffic management application, that could sense traffic related Data Processing Platform Based on Distributed Architecture,” 2015
events faster, alerting quickly road operators about any 10th Int. Conf. Intell. Syst. Knowl. Eng., pp. 428–433, 2015.
problem. [15] M. Taneja, “A Mobility Analytics Framework for Internet of
Things,” pp. 113–118, 2015.
[16] S. Amini, I. Gerostathopoulos, and C. Prehofer, “Big data analytics
ACKNOWLEDGMENT architecture for real-time traffic control,” Model. Technol. Intell.
Transp. Syst. (MT-ITS), 2017 5th IEEE Int. Conf., no. Tum Llcm,
The authors acknowledge the European Commission for its pp. 710–715, 2017.
support and partial funding and the partners of the research
[17] A. Bär, P. Casas, L. Golab, and A. Finamore, “DBStream: an Online
project: H2020– 636160 OPTIMUM. Aggregation, Filtering and Processing System for Network Traffic
Monitoring.”
REFERENCES [18] “Welcome to The Apache Software Foundation!” [Online].
Available: https://www.apache.org/. [Accessed: 13-Mar-2018].
[1] United Nations, World Urbanization Prospects, vol. 12. 2014.
[19] “Plataforma de Informática na Cloud e Serviços do Microsoft
[2] R. Louf and M. Barthelemy, “How congestion shapes cities: From Azure.” [Online]. Available: https://azure.microsoft.com/pt-pt/.
mobility patterns to scaling,” Sci. Rep., vol. 4, 2014. [Accessed: 07-Mar-2018].
[3] J. Arbib and T. Seba, “Rethinking Transportation 2020-2030,” 2017. [20] “IBM Streams - Overview - Montserrat.” [Online]. Available:
[4] G. S. Tewolde, “Sensor and network technology for intelligent https://www.ibm.com/ms-en/marketplace/stream-computing.
transportation systems,” in 2012 IEEE International Conference on [Accessed: 07-Mar-2018].
Electro/Information Technology, 2012, pp. 1–7. [21] “Samza.” [Online]. Available: http://samza.apache.org/. [Accessed:
[5] I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and 07-Mar-2018].
S. Ullah Khan, “The rise of ‘big data’ on cloud computing: Review [22] “Apache Storm.” [Online]. Available: http://storm.apache.org/.
and open research issues,” Inf. Syst., vol. 47, pp. 98–115, 2015. [Accessed: 07-Mar-2018].
[6] J. Anuradha, “A Brief Introduction on Big Data 5Vs Characteristics [23] “Spark Streaming | Apache Spark.” [Online]. Available:
and Hadoop Technology,” Procedia - Procedia Comput. Sci., vol. https://spark.apache.org/streaming/. [Accessed: 07-Mar-2018].
48, pp. 319–324, 2015.
[24] “The Only Converged Data Platform | MapR.” [Online]. Available:
[7] Y. G. Petalas, A. Ammari, P. Georgakis, and C. Nwagboso, “A Big https://mapr.com/. [Accessed: 07-Mar-2018].
Data Architecture for Traffic Forecasting Using Multi-Source
Information,” in ALGORITHMIC ASPECTS OF CLOUD [25] S. Kamburugamuve, S. Ekanayake, M. Pathirage, and G. Fox,
COMPUTING, ALGOCLOUD 2016, 2017, vol. 10230, pp. 65–83. “Towards High Performance Processing of Streaming Data in Large
Data Centers,” in 2016 IEEE International Parallel and Distributed
[8] M. Bala, O. Boussaid, and Z. Alimazighi, “Big-ETL: Extracting- Processing Symposium Workshops (IPDPSW), 2016, pp. 1637–
Transforming-Loading Approach for Big Data,” pp. 462–468, 2015. 1644.
[9] “IBM Knowledge Center - Home of IBM product documentation.” [26] P. Karunaratne, S. Karunasekera, and A. Harwood, “Distributed
[Online]. Available: stream clustering using micro-clusters on Apache Storm,” J. Parallel
https://www.ibm.com/support/knowledgecenter/. [Accessed: 06- Distrib. Comput., vol. 108, pp. 74–84, 2017.
Mar-2018].
[27] “RabbitMQ - Messaging that just works.” [Online]. Available:
[10] G. Guerreiro, P. Figueiras, R. Silva, R. Costa, and R. Jardim- https://www.rabbitmq.com/. [Accessed: 07-Mar-2018].
Goncalves, “An architecture for big data processing on intelligent
transportation systems. An application scenario on highway traffic [28] “How Spotify Scales Apache Storm | Labs.” [Online]. Available:
flows,” in 2016 IEEE 8th International Conference on Intelligent https://labs.spotify.com/2015/01/05/how-spotify-scales-apache-
Systems (IS), 2016, pp. 65–72. storm/. [Accessed: 08-Mar-2018].
[11] “Optimum Project - Home.” [Online]. Available:
http://www.optimumproject.eu/. [Accessed: 06-Mar-2018].
[12] Y. Lin, P. Wang, and M. Ma, “Intelligent Transportation
System(ITS): Concept, Challenge and Opportunity,” Proc. - 3rd

978-1-5386-1469-3/18/$31.00 ©2018 IEEE 2018 IEEE International Conference on Engineering,


Technology and Innovation (ICE/ITMC)

You might also like