IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 40, NO. 5, SEPTEMBER 2010

Survey on Contemporary Remote Surveillance Systems for Public Safety
Tomi D. Räty

Abstract—Surveillance systems provide the capability of collecting authentic and purposeful information and forming appropriate decisions to enhance safety. This paper concisely reviews the historical development and current state of the three generations of contemporary surveillance systems. Recently, in addition to the employment of an incessantly enlarging variety of sensors, the inclination has been to utilize more intelligence and situation-awareness capabilities to assist the human surveillance personnel. The most recent generation is decomposed into multisensor environments, video and audio surveillance, wireless sensor networks, distributed intelligence and awareness, architecture and middleware, and the utilization of mobile robots. The prominent difficulties of contemporary surveillance systems are highlighted. These challenging dilemmas comprise the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, the location difficulties of surveillance personnel, and scalability difficulties. The paper is concluded with a concise summary and the future of surveillance systems for public safety.

Index Terms—Distributed systems, human safety, surveillance, survey.

Manuscript received August 4, 2009; revised November 16, 2009 and January 28, 2010; accepted January 28, 2010. Date of publication March 1, 2010; date of current version August 18, 2010. This paper was recommended by Associate Editor L. Zhang. The author is with the VTT Technical Research Centre of Finland, Oulu 90571, Finland (e-mail: tomi.raty@vtt.fi). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCC.2010.2042446

I. INTRODUCTION

SURVEILLANCE systems enable the remote surveillance of widespread society for public safety and proprietary integrity. This paper reviews the background of surveillance and the three generations of surveillance systems. The emphasis of the paper is on the third-generation surveillance system (3GSS) and its current, significant difficulties. The 3GSSs use multiple sensors. Domain-specific issues are omitted from this paper, despite being inherent to their own domains. The focus is on generic surveillance, which is applicable to public safety.

Surveillance systems are typically categorized into three distinct generations, of which the 3GSS is the current generation. The essential dilemmas of the 3GSSs are related to the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. These aspects occurred repeatedly in the literature review. In public safety, a real-time distributed architecture is required to transmit sensor data immediately for deduction. Awareness and intelligence are applied to achieve automatic deduction. Video surveillance is used extensively in public safety. The usage of wireless networks is growing in public safety, and it is accompanied by energy-efficiency concerns. Surveillance personnel often patrol in surveyed areas, and their precise location must be known to exploit their benefit to the fullest. As surveyed areas become constantly larger and more complex, scalability is a crucial issue in the surveillance of public safety.

Public safety and homeland security are substantial concerns for governments worldwide, which must protect their people and the critical infrastructures that uphold them. Information technology plays a significant role in such initiatives. It can assist in reducing risk and enabling effective responses to disasters of natural or human origin [1].

Recent events, including terrorist attacks, have resulted in an increased demand for security in society and a growing need for surveillance activities in many environments. This has influenced governments to make personal and asset security priorities in their policies. Valera and Velastin [2] state that the demand for remote surveillance relative to safety and security has received significant attention, especially in public places, the remote surveillance of human activities, surveillance in forensic applications, and remote surveillance in military applications. The public can be perceived either as individuals or as a crowd. Valera and Velastin [2] indicate that a future challenge is to develop a wide-area distributed multisensor surveillance system with robust, real-time computer algorithms that are executable with minimal manual reconfiguration for different applications [2].

There is a growing interest in surveillance applications because of the availability of sensors and processors at reasonable costs, an emerging need from the public for improved safety and security in urban environments, and the significant utilization of resources in public infrastructure. This, together with the growing maturity of algorithms and techniques, enables the application of the technology in miscellaneous sectors, such as security, transportation, and the automotive industry. The problem of remote surveillance of unattended environments has received particular attention in the past few years [3].

Intelligent remote monitoring systems allow users to survey sites from significant distances. This is especially useful when numerous sites require security surveillance simultaneously. These systems use rapid and efficient corrective actions, which are executed immediately once a suspicious activity is detected. An alert system can be used to warn security personnel of impending difficulties, and numerous sites can be simultaneously monitored. This considerably reduces the load of the security personnel [4].

A fundamental goal of surveillance systems is to acquire good coverage of the observed region with as few cameras as possible to keep the costs of the installation and maintenance of cameras, the transmission channels, and the complexity of scene calibration reasonable [5].

In this paper, we first present the background and progression of surveillance systems. This is followed by careful descriptions of the three generations of surveillance systems. Then we present the difficulties of contemporary surveillance systems, which comprise the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. The paper is concluded with a future prospect and a brief summary.

II. HISTORICAL SURVEILLANCE AND SURVEILLANCE SYSTEMS

The stone-age warrior used his eyes and ears from atop a mantle to survey his battle area and to distinguish targets against which he could utilize his primitive weapons. Despite advancements in weaponry to catapults, swords, and shields, the eyes and ears of warriors were still what was utilized for surveillance. The observation balloon and the telegraph significantly improved range in visibility and information transmission, respectively, but it was in the twentieth century that improvements upon the eyes and ears transformed surveillance into the "modern" concept [6].

Military operations have introduced the importance of the combat surveillance problem. Locating target coordinates and shifting one's own troops accordingly requires dynamic actions accompanied by decisions. Rapid, complete, and precise information is needed to address this [7]. Such information included the detection and approximate location of personnel, concentrations of troops, and the monitoring and storage of position data over time and according to movements [8]. Surveillance information must be delivered to the correct commander when he requires it, and the information must be presented in a meaningful form to address the problem of information processing [7]. The data-collection problem is addressed by the entities that perform the surveillance, e.g., intelligence sources and human surveillance, and transmit it to the command [7].

The fundamental intention of a surveillance system is to acquire information about an aspect of the real world. Military surveillance systems enhance the sensory capabilities of a military commander. Surveillance systems have evolved from simple visual and verbal systems, but the purpose is still the same. Even the most primitive surveillance systems gathered information concerning reality and communicated it to the appropriate users [9].

Generic surveillance is composed of three essential parts. These are data acquisition, information analysis, and on-field operation. Any surveillance system requires means to monitor the environment and collect data in the form of, e.g., video, still images, or audio. Such data are processed and analyzed by a human, a computer, or a combination of both at a command center. An administrator can decide on performing an on-field operation to put the environment back into a situation considered as normal. On-field control operations are issued by on-field agents, who require effective communication channels to uphold a close interaction with the command center [10].

A surveillance system can be defined as a technological tool that assists humans by offering an extended perception and reasoning capability about situations of interest that occur in the monitored environments. Human perception and reasoning are restricted by the capabilities and limits of the human senses and mind to simultaneously collect, process, and store a limited amount of data [3].

To address this amount of information, aspects such as scalability and usability become very significant. This includes how information needs to be given to the right people at the right time. To tolerate this growing demand, research and development has been executed in commercial and academic environments to discover improvements or new solutions in signal processing, communications, system engineering, and computer vision [2].

III. PROGRESSION OF SURVEILLANCE SYSTEMS

Over the past two decades, surveillance systems have been an area of considerable research. Recently, plenty of research has been concentrated on video-based surveillance systems, particularly for public safety and transportation systems [11].

Data are collected by distributed sources and then typically transmitted to a remote control center. The automatic capability to learn and adjust to altering scene conditions and the learning of statistical models of normal event patterns are growing issues in surveillance systems. A learning system offers a mechanism to flag potentially anomalous events through the discovery of the normal patterns of activity and the flagging of the least probable ones. Two substantial restrictions that affect the deployment of these systems in the real world are real-time performance and low cost. Multisensor systems can capitalize on processing either the same type or different types of information collected by sensors, e.g., video cameras and microphones, of the same monitored area. Appropriate processing techniques, and new sensors offering real-time information associated with different scene characteristics, can assist both in enlarging the monitored environments and in enhancing the performance of alarm detection in regions monitored by multiple sensors [3].

Security surveillance systems are becoming crucial in situations in which personal safety could be compromised as a result of criminal activity. Video cameras are constantly being installed for security reasons in prisons, banks, automatic teller machines, petrol stations, and elevators, which are the places most susceptible to criminal activities. Usually, the video camera is connected to a recorder or to a display screen from which security personnel constantly monitor suspicious activities. As security personnel typically monitor multiple locations simultaneously,
this manual task is labor intensive and inefficient. Significant stress may be placed on the security personnel involved [4].

Another technological breakthrough substantial to the development of surveillance systems is the capability of remotely transmitting and reproducing images and video information, e.g., TV broadcasting and the successive use of video signal transmission and display in closed-circuit TV (CCTV) systems. CCTVs that provide data at acceptable quality date back to the 1960s. The availability of CCTVs can be considered the starting point that made online surveillance feasible, and 1960 can be considered the beginning date of the first-generation surveillance systems [3].

Surveillance systems have developed over three generations [11]. The first generation of surveillance systems (1GSSs) used analogue equipment throughout the complete system [11]. Analogue closed-circuit television (CCTV) cameras captured the observed scene and transmitted the video signals over analogue communication lines to the central back-end systems, which presented and archived the video data [11]. The main challenge in the 1GSS is that it uses analogue techniques for image distribution and storage [2].

The second generation of surveillance systems (2GSSs) uses digital back-end components [11]. They enable real-time automated analysis of the incoming video data [11]. Automated event detection and alarms substantially improve the content of simultaneously monitored data and the quality of the surveillance system [11]. The difficulty in the 2GSS is that it does not support the robust detection and tracking algorithms needed for behavioral analysis [2].

The 3GSSs have finalized the digital transformation. In these systems, the video signal is converted into the digital domain at the cameras, which transmit the video data through a computer network, for instance a local area network. The back-end and transmission systems of a third-generation surveillance system have also improved their functionality [11].

There are immediate needs for automated surveillance systems in commercial and military applications and in law enforcement. Mounting video cameras is inexpensive, but locating available human resources to survey the output is expensive. Despite the usage of surveillance cameras in banks, stores, and parking lots, video data are currently used only retrospectively as a forensic tool, thus losing their primary benefit as an active real-time medium. What is required is continuous 24-h monitoring of surveillance video to alert security officers of a burglary in progress, or of a suspicious individual lingering in a parking lot, while there is still time to prevent the criminal offence [12].

IV. FIRST-GENERATION SURVEILLANCE SYSTEMS

First-generation video surveillance systems (1960–1980) considerably extend human perception capabilities in a spatial sense. The 1GSSs are based on analogue signal and image transmission and processing. In these systems, analogue video data from a collection of cameras, which view remote scenes, present information to the human operators. The main disadvantages of these systems concern the reasonably small attention span of operators, which may result in a significant miss rate of the events of interest. From a communication perspective, these systems suffered from the main difficulties of analogue video communication, e.g., high-bandwidth requirements and poor allocation flexibility [3].

The 1GSS utilizes analogue CCTV systems. The advantage is that they provide good performance in some situations and the technology is mature. The utilization of analogue techniques for image distribution and storage is inefficient. The current 1GSSs examine the usage of digital information against analogue, digital video recording, and CCTV video compression [2].

Computer vision is a significant artificial intelligence (AI) research area. From the 1970s to the 1990s, computer vision proved its practical value in a vast range of application domains, including medical diagnostics, automatic target recognition, and remote sensing [13].

V. SECOND-GENERATION SURVEILLANCE SYSTEMS

In this technological evolution, the 2GSSs (1980–2000) correspond to the maturity phase of the analogue 1GSS. The 2GSSs benefited from the early progression in digital video communications, e.g., digital compression, robust transmission, bandwidth reduction, and processing methods, which assist the human operator by prescreening important visual events [3].

Regarding the 2GSS, automated visual surveillance is achieved through the combination of computer vision technology and CCTV systems. The benefit of the second generation is that the surveillance efficiency of CCTV is enhanced. The difficulties lie within the robust detection and tracking algorithms needed for behavioral analysis. The current research of the 2GSS rests in real-time robust computer vision algorithms, automatic learning of scene variability and patterns of behavior, and bridging the differences between the statistical analyses of a scene and natural-language interpretations [2].

The 2GSS research addressed multiple areas with improved results in real-time analysis and separation of 2-D image sequences, identification and tracking of multiple objects in complex scenes, human behavior comprehension, and multisensor data fusion. The 2GSS also improved intelligent man–machine interfaces, performance evaluation of video processing algorithms, wireless and wired broadband access networks, signal processing for video compression, and multimedia transmission for video-based surveillance systems [3].

The majority of research efforts during the period of the 2GSSs were invested in the development of automated real-time event detection techniques for video surveillance. The availability of automated methods would significantly ease the monitoring of large sites with multiple cameras, as automated event detection enables prefiltering and the presentation of the main events [3].

VI. THIRD-GENERATION SURVEILLANCE SYSTEMS

The 3GSSs handle a large number of cameras, a geographical spread of resources, and many monitoring points. From an image processing view, they are based on the distribution of processing capacities over the network and the use of embedded signal-processing devices to achieve the benefits of scalability and potential robustness offered by distributed systems [14].

Fig. 1. Illustration of a typical processing flow in video surveillance systems [2].
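The distributed-processing idea described above can be illustrated with a small sketch (an illustration of ours, not code from any surveyed system; all class and field names are hypothetical): a camera node runs an embedded change detector and transmits only compact event records to the central back end, instead of streaming raw video over the network.

```python
# Illustrative 3GSS-style split of processing: the camera node performs
# signal processing locally and sends only small event records onward.
from dataclasses import dataclass
from queue import Queue

@dataclass
class Event:
    camera_id: str
    frame_no: int
    kind: str    # e.g., "motion"
    bbox: tuple  # (x, y, w, h) of the changed region

class CameraNode:
    """Embedded processing: detect frame-to-frame changes at the camera."""
    def __init__(self, camera_id, network, threshold=30):
        self.camera_id = camera_id
        self.network = network      # stands in for the transmission network
        self.threshold = threshold
        self.prev = None

    def process(self, frame_no, frame):
        if self.prev is not None:
            changed = [(x, y)
                       for y, row in enumerate(frame)
                       for x, v in enumerate(row)
                       if abs(v - self.prev[y][x]) > self.threshold]
            if changed:
                xs = [x for x, _ in changed]
                ys = [y for _, y in changed]
                bbox = (min(xs), min(ys),
                        max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
                # only the compact event record crosses the network
                self.network.put(Event(self.camera_id, frame_no, "motion", bbox))
        self.prev = frame

class CentralServer:
    """Back end: archives events received from all distributed nodes."""
    def __init__(self, network):
        self.network = network
        self.log = []

    def drain(self):
        while not self.network.empty():
            self.log.append(self.network.get())
```

The design point is the one named in the text: scalability comes from moving the pixel-level work to the edge, so the back end only handles event records.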
In the 3GSS, the technology revolves around wide-area
surveillance systems. This results in the advantages of the col-
lection of more accurate information by combining different types
of sensors and in the distribution of the information. The diffi-
culties are in the efficient integration and communication of in-
formation, establishment of design methodologies, and moving
and multisensor platforms. The current research of 3GSSs concentrates on distributed and centralized intelligence, data fusion,
probabilistic reasoning frameworks, and multicamera surveil-
lance techniques [2].
The fundamental goals that are expected of a third-generation
vision surveillance application, based on end-user requirements,
are to offer good scene comprehension, surveillance informa-
tion at real-time in a multisensor environment, and the use of
low-cost standard components. Fig. 1 presents a typical processing flow of video surveillance systems. It is composed of object detection, object recognition, tracking, behavior and activities analysis, and a database [2].

Fig. 2. Example of combining the data of multiple sensors in different events: (a) walking, (b) running, (c) talking, (d) knocking on a door, and (e) shouting [17].
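As a rough illustration of the processing flow of Fig. 1, the stages can be chained as below. The stage implementations are hypothetical placeholders of ours, not algorithms from the surveyed systems; a real system would plug vision algorithms into each step.

```python
# Minimal sketch of the Fig. 1 flow: detection -> recognition -> tracking ->
# behavior analysis -> database. "Frames" are abstracted as lists of region
# dicts so that the chaining of the stages is the only point being made.

def detect_objects(frame):
    """Object detection: keep the regions that differ from the background."""
    return [region for region in frame if region.get("moving")]

def recognize(objects):
    """Object recognition: attach a (hypothetical) model-based class label."""
    for obj in objects:
        obj["label"] = "person" if obj.get("height", 0) > 1 else "other"
    return objects

def track(objects, tracks):
    """Tracking: extend an existing track, or open a new one, per object id."""
    for obj in objects:
        tracks.setdefault(obj["id"], []).append(obj["pos"])
    return tracks

def analyze_behavior(tracks):
    """Behavior analysis: flag tracks that moved farther than a threshold."""
    return {oid: ("fast" if len(path) > 1 and abs(path[-1] - path[0]) > 5
                  else "normal")
            for oid, path in tracks.items()}

def run_pipeline(frames, database):
    """Run every frame through the stages and store the results."""
    tracks = {}
    for frame in frames:
        objs = recognize(detect_objects(frame))
        tracks = track(objs, tracks)
    database["tracks"] = tracks
    database["behavior"] = analyze_behavior(tracks)  # stored for retrieval
    return database
```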
Once the object is detected, the object recognition task uses model-based techniques in recognition and tracking. This is followed by the behavior and activities analysis of the tracked objects. The database addresses the storage and retrieval [2].

Research on distributed real-time video processing techniques in intelligent, open, and dedicated networks is anticipated to offer interesting results. This is largely due to the availability of enhanced computational power at reasonable expense, advanced video processing and comprehension methods, and multisensor data fusion [3].

The main objective of the fully digital 3GSSs is to ease efficient data communication, management, and extraction of events in real-time video from a large collection of sensors. To achieve this goal, improvements in automatic recognition functionalities and digital multiuser communications are required. Technologies that satisfy the requirements of the recognition algorithms concern computational speed, memory utilization, remote data access, and multiuser communications between distributed processors. The availability of this technology significantly eases 3GSS development and deployment [3].

The main application areas for the 3GSSs are in the region of public monitoring. This is required by the rapid growth of metropolitan localities and by the increasing need to offer enhanced safety and security to the general public. Other factors that drive the deployment of these systems include efficient resource management and rapid emergency assistance [3].

The essential limitation in the efficiency of CCTV surveillance systems is the cost of offering adequate human monitoring coverage for what is a considerably boring task. Additionally, CCTV is generally used as a reactive tool. If a problem happens and is not noticed, then it will proceed without any response [15].

The notable aspects of the 3GSSs are decomposed into the topics of the following subsections. They consist of multisensor environments, video surveillance, audio surveillance, wireless sensor networks, distributed intelligence and awareness, architecture and middleware, and the utilization of mobile robots.

A. Multiple Sensor-Enabled Environments

Spatially distributed multisensor environments offer interesting possibilities and challenges to surveillance. Recently, there have been studies on data fusion techniques to support the sharing of information that results from different types of sensors [2]. The communication aspects within separate parts of the system play a crucial role, with particular challenges due either to bandwidth constraints or to the asymmetric characteristics of communication [2]. Rasheed et al. exploit data fusion over multiple modalities, including radar information, automatic identification system (AIS) receivers, and global positioning system (GPS) receivers [16].

Fig. 2 illustrates the readings of two discrete sensors, the video sensor and the audio sensor, which can be fused to enhance information. The sequences comprise walking, running, talking, knocking, and shouting events. The recorded audio is segmented into audio frames of 50 ms. Each sequence was recorded over a time of 8 s [17].
Fig. 3. Generic view of networked cameras [18].

Fig. 4. Illustration of tracking with an occluding object passing across the view of the camera [22].

Junejo et al. state that a single camera is not sufficient for monitoring a large area. To address this problem, a network of cameras is established. Junejo et al. utilize an automatically configurable network of nonoverlapping cameras to attain sufficient monitoring capabilities over large areas of interest. Fig. 3 illustrates the principles of a network of cameras. In this example, each camera is mounted on a moving platform while detecting and tracking objects [18].

B. Video Surveillance

Video surveillance has become an omnipresent aspect of the modern urban landscape, situated in a vast variety of environments, including shopping malls, railway stations, hospitals, government buildings, and commercial premises. In some cases, surveillance performs persuasion, discouraging unacceptable behavior that can no longer be performed anonymously, recording and logging events for evidential reasons, or offering remote observation of sensitive locations where access control is crucial [19].

Intelligent visual surveillance systems address the real-time monitoring of persistent and transient objects within a specific environment. The primary goals of these systems are to offer an automatic interpretation of scenes, and to understand and predict the actions and interactions of the observed objects. The understanding and prediction are based on the information collected by sensors. The basic stages of processing in an intelligent visual surveillance system are moving object definition, recognition, tracking, behavioral analysis, and retrieval. These stages contain the topics of machine vision, pattern analysis, artificial intelligence, and data management [2].

As an active research topic in computer vision, visual surveillance in dynamic scenes attempts to detect, recognize, and track certain objects from image sequences. In addition, it is important to comprehend and depict object behaviors. The aim is to develop intelligent visual surveillance to replace the traditional passive video surveillance, which is proving to be inefficient as the number of cameras exceeds the capability of human operator surveillance. In short, the goal of visual surveillance is not only to place cameras in the place of human eyes, but also to achieve the exhaustive surveillance task as automatically as possible [20].

Intelligent cameras execute a statically defined collection of low-level image-processing operations on the captured frames to enhance the video compression and intelligent host efficiency. Changing or reconfiguring the video processing and analysis during the operation of a surveillance system is difficult [11].

The difficulty of tracking an individual maneuvering in a cluttered environment is a well-studied area. Usually, the objective is to predict the state of an object based on a set of noisy and unclear measurements. There is a vast range of applications in which the target-tracking problem is presented, including vehicle collision warning and avoidance, mobile robotics, speaker localization, people and animal tracking, and tracking a military target [21].

Fig. 4 illustrates a tracking sequence with an individual passing through the view of a camera causing an occlusion. The output of the background subtraction method for each frame is a binary image, which is composed of the foreground region. When an occlusion occurs, multiple objects may merge into the same area. This requires an object model that can address split-and-merge cases. Each pixel in the foreground receives the object label for which the product of color and spatial probability is the highest [22].

However watchful the operators, manual monitoring suffers from information overload, which results in periods of operator inattention due to weariness, distractions, and interruptions. In practice, it is unavoidable that a significant number of the video channels are not usually monitored, and potentially important events are overlooked. Additionally, weariness grows significantly as the number of cameras in the system increases. The automation of all or part of this process would obviously offer dramatic benefits, ranging from a capability to alert an operator to a potential event of interest, to a completely automatic detection and analysis system. However, the dependability of automated detection systems is an essential issue, because frequent false alarms introduce skepticism in the operators, who quickly learn to disregard the system [19].

It is desirable that visual surveillance systems can understand the activity of the scene they are detecting and tracking. Ideally, this would be done in a manner consistent with that of a human observer. The task of automating the interpretation of the video data is a detailed one and can depend on a vast range of factors, including location, context, time, and date.
This information indicates where objects are and what they may
be doing as they are observed, and attempts to characterize usual
behavior [19].
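Characterizing usual behavior from factors such as location and time can be sketched with a minimal frequency model (a hypothetical illustration of ours, not the method of any surveyed system): contexts seen rarely, or never, during observation are flagged as potentially anomalous.

```python
# Illustrative "usual behavior" model: count how often each
# (location, hour) context has been observed, then flag contexts whose
# empirical probability falls below a threshold.
from collections import Counter

class NormalBehaviorModel:
    def __init__(self, min_prob=0.05):
        self.counts = Counter()
        self.total = 0
        self.min_prob = min_prob

    def observe(self, location, hour):
        """Record one normal observation in the given context."""
        self.counts[(location, hour)] += 1
        self.total += 1

    def is_anomalous(self, location, hour):
        """An event is flagged when its context was rarely (or never) seen."""
        if self.total == 0:
            return True
        prob = self.counts[(location, hour)] / self.total
        return prob < self.min_prob
```

This is the shape of the idea discussed earlier in the survey: discover the normal patterns of activity and flag the least probable events, rather than hand-coding rules per site.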
Research interests have shifted from ordinary static image-
based analysis to video-based dynamic monitoring and anal-
ysis. Researchers have advanced in addressing static aspects such as illumination, color, background, and perspective. They have
advanced in tracking and analyzing shapes related to moving
human bodies and moving cameras. They have improved activ-
ity analysis and control of multicamera systems. The research
of Trivedi et al. [13] addresses a distributed collection of cam-
eras, which provide wide-area monitoring and scene analysis
on several levels of abstraction. Installing multiple sensors in-
troduces new design aspects and challenges. Handoff schemes
are needed to pass tracked objects between sensors and clusters, methods are required to specify the best view given the scene's context, and sensor-fusion algorithms capitalize on a given sensor's strengths [13].

Fig. 5. Illustration of images and the output of background subtraction [22].
Modern visual surveillance systems deploy multicamera clus-
ters operating at real-time with embedded adaptive algorithms.
These advanced systems need to be operational constantly, and
to robustly and reliably detect events of interest in difficult
weather conditions. This includes adjusting to natural and ar-
tificial changes in the illumination, and withstanding hardware
and software system failures [23].
Generally, the initial step for automatic video surveillance is
adaptive background subtraction to extract foreground regions
from the incoming frames. Object tracking is then executed on
the foreground regions. In this case, tracking isolated objects is
relatively easy. When multiple tracked objects are placed into groups with miscellaneous complexities of occlusion, tracking each individual object through crowds becomes a challenging task. First, when objects merge into a group, the visual characteristics of each object become unclear and obscure. The objects distant from the camera can be partially or completely occluded by the surrounding objects. Second, the poses and scales of the target objects may severely change when they are in crowds. Third, the motion speed and the direction of the target objects may essentially change during occlusion [24].

Basically, moving objects are detected through background subtraction, which comprises a model of the background and the detection of moving objects as those that differ from this model. In comparison to other approaches, such as optical flow, this approach is computationally affordable for real-time applications. The main dilemma is its sensitivity to dynamic scene challenges and the subsequent need for background model adaptation through background maintenance. This type of problem is known to be essential and demanding [25].

Fig. 5 illustrates a collection of images from a parking lot and the background subtraction output of these images. Object detection is achieved by constructing a representation of the scene, which is called a background model, and then locating the differences from the model against each incoming frame. The higher image sequence illustrates the complete scene and the lower image sequence represents the resulting background subtraction output [22].

Li et al. state that the aim of multitarget tracking is to infer the target trajectories from image observations in a video. This poses a significant challenge in crowded environments where there are frequent occlusions and multiple targets have a similar appearance and intersecting trajectories. Data association-based tracking (DAT) links short track fragments, i.e., tracklets, or detection responses into trajectories based on similarity in position, size, and appearance. This enables multitarget tracking from a single camera by progressively associating detection responses into longer tracklets to resolve target trajectories. Fig. 6 presents an image of tracklet tracking [26].

Fig. 6. Example of tracklet tracking [26].

Human motion tracking based on the input from red–green–blue (RGB) cameras can produce results in indoor scenes with consistent illumination and a steady background [27]. Outdoor scenes with significant background clutter resulting from illumination changes are a challenge for conventional charge-coupled device (CCD) cameras [27]. There have been contributions on pedestrian localization and tracking in visible and infrared videos [28]. Fig. 7 presents a thermal image and a color image of the same scene [28].

Fig. 7. (Left) Thermal image and (right) color image of a scene [28].

A significant problem encountered in numerous surveillance systems is the change in ambient light, particularly in an outdoor environment, where the lighting conditions vary. This renders conventional digital color image analysis very difficult. Thermography, or thermal visualization, is a type of infrared visualization. Thermal cameras have been utilized for imaging objects in the dark. These cameras use infrared (IR) sensors that capture the IR radiation of different objects in the environment and form IR images [29].

C. Audio Surveillance

The creativeness of the research of Istrate et al. [30] is to use sound as an informative source simultaneously with other sensors. Istrate et al. [30] suggest extracting and classifying normal life sounds, such as a door banging, glass shattering, and objects falling, with the intention of identifying serious accidents, for instance, a fall or somebody fainting. The approach of Istrate et al. [30] comprises the replacement of the video camera with a multichannel sound acquisition system, which analyzes the sound range of the location in real time and specifies situations of emergency. Only the previously detected sound event is transmitted to the alarm monitor, if it is considered to be a possible alarm. To reduce the computation time required for a multichannel real-time system, the sound extraction process has been split into detection and classification. Sound event detection is a complicated task, because the audio signals occur in a noisy environment [30].

Accurate and robust localization and tracking of acoustic sources is of interest to a variety of applications in surveillance, multimedia, and hearing enhancement. The miniaturization of microphone arrays combined with acoustic processing further enhances the advantages of these systems, but poses challenges to achieving precise localization performance due to the decreasing aperture. For surveillance, acoustic emissions from ground vehicles offer an easily detected signature, which can be used for unobtrusive and passive tracking. This results in a higher localization performance in distributed sensing environments. It exceeds the requirement for excessive data transfer and fine-grain time synchronization among nodes, with low communication bandwidth and low complexity. Additional improvement can also be achieved through the fusion of other data modalities, such as video. Traditionally, large sensor arrays are used for source localization to guarantee adequate spatial diversity over sensors to resolve time delays between source observations. The precision of delay-based bearing estimation degrades with decreasing dimensions (aperture) of the sensor array [31].

Sound localization using compact sensor nodes deployed in networks has applications in surveillance, security, and law enforcement. Numerous groups have reported noncoherent and coherent methods for sound localization, detection, classification, and tracking in sensor networks. Coherent methods are based on the arrival time differences of the acoustic signal at the sensors. In standard systems, microphones are separated to maximize precision. The need for synchronization requires frequent communication that is expensive in terms of power consumption. The nodes must achieve synchronization to produce a valid estimate [32].

Fig. 8. Example of a microphone array for measuring the bearing angle [32].

Fig. 8 presents an example of sound localization. An array of microphones (M1, M2, M3, and M4) pairwise separated by a distance (d) is considered. The angle of the source of sound is presented against the coordinate axis. The bearing of the microphone pair M1 and M3 is given as the beta angle. The bearing of the microphone pair M2 and M4 is presented as the alpha angle [32].

Considering the nature of an event that is desirable to detect, the content of information created is more than just visual information. Many of the significant events from a monitoring point of view are accompanied with audio information, which would be useful to examine. The significance of these events is provided by their semantic information and their temporal context. A monitoring system that must distinguish between a door opening and glass breaking should be expected to identify one and not the other at a given time and location. By expanding the range of information available to the system, the precision of the operation can be improved. The purpose of an audio sensor network would be to assist the end user to search through data and return the points of interest. This would not be done by adding an overwhelming amount of additional data, but by drawing attention to the data already collected, but difficult to locate [33].

The sound analysis system has been separated into three modules, as illustrated in Fig. 9. The first module is applied to every channel to detect sound events and to extract them from the signal flow. The source of speech or sound can be localized by comparing the predicted SNR for every channel. The fusion module chooses the premium channel if multiple events are detected simultaneously. The third module receives the sound event extracted by the previous module, and it predicts the most probable sound class [30].
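The three-module decomposition described above (per-channel event detection, SNR-based channel selection, and classification) can be sketched in code. This is an illustrative toy pipeline; the thresholds, frame length, noise floor, and classifier stub are invented for the example and are not taken from [30]:

```python
import numpy as np

def detect_events(channel, frame_len=160, energy_thresh=0.01):
    """Module 1: flag frames whose short-time energy exceeds a threshold."""
    n = len(channel) // frame_len
    frames = channel[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    return energy > energy_thresh

def select_channel(channels, noise_floor=1e-4):
    """Module 2 (fusion): pick the channel with the highest estimated SNR."""
    snrs = [10 * np.log10((c ** 2).mean() / noise_floor) for c in channels]
    return int(np.argmax(snrs))

def classify(event):
    """Module 3: placeholder classifier; a real system would run a trained
    model (e.g., a GMM or neural network) over spectral features."""
    return "impulsive" if np.abs(event).max() > 0.5 else "sustained"

rng = np.random.default_rng(0)
quiet = 0.001 * rng.standard_normal(1600)   # channel with background noise only
loud = quiet.copy()
loud[800:960] += 0.8                        # a door-bang-like burst on channel 1
best = select_channel([quiet, loud])        # fusion picks the premium channel
events = detect_events(loud)                # one frame flagged as an event
label = classify(loud)
```

Splitting detection from classification, as above, mirrors the motivation given in the text: the cheap detector runs continuously on every channel, and the expensive classifier only runs on extracted events.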
D. Wireless Sensor Networks

Wireless devices, such as wireless-enabled laptops and palm pilots, have progressed into an integral part of daily lives. A wireless network can be considered to be a sensor network, where the network nodes function as sensors. They sense changes in the environment according to the movement of objects or humans. A possible additional functionality could be the indoor surveillance of corporate buildings and private houses [34].

Wireless sensor networks represent a new type of ad hoc networks, which integrate sensing, processing, and wireless communication in a distributed system [35]. Sensor networks are a growing technology that promises a novel ability to monitor and equip the physical world [36]. In a sensing-covered network, each point in a geographic area of interest needs to be within the sensing range of at least one sensor [35]. Sensor networks comprise a significant amount of inexpensive wireless devices (nodes) that are densely distributed over the region of interest [36]. They are usually battery powered with restricted computation and communication abilities [36]. Every node is equipped with different sensing modalities, such as acoustic, infrared, and seismic [36].

Wireless sensor networks have the potential to improve the ability to develop user-centric applications to monitor and prevent harmful events. The availability of inexpensive low-power sensors, radios, and embedded processors enables the deployment of distributed sensor networks to offer information to users in distinct environments and to provide them control over undesirable situations. Networked sensors can collaborate to process and make deductions from the collected data and provide the user with access to continuous or selective observations of the environment. In most situations, these devices must be small in size, require low power, and be lightweight and unobtrusive [37].

In addition to the new applications, wireless sensor networks offer an alternative to several existing technologies. The wiring costs restrict complicated environment controls and the reconfigurability of these systems. In many cases, the savings in the wiring costs alone justify the use of the wireless sensor nodes [38].

A basic issue that arises naturally in sensor networks is coverage. Due to the significant variety of sensors and their applications, sensor coverage is subject to a vast sphere of interpretations. Generally, coverage can be considered as a measure of the quality of service of a sensor network. Coverage formulations can attempt to locate the weak points in a sensor field and suggest future deployment or reconfiguration schemes to enhance the total quality of service [38].

In the previous years, wireless networks, such as IEEE 802.11a, b, and g wireless local-area networks (WLANs), have become plentiful and their popularity is only increasing. In the near future, wireless networks will become omnipresent, and they will supply high-speed communication capabilities almost anywhere. An immediate question is whether it is possible to utilize the wireless network infrastructure to implement other functionalities in addition to communication. WLANs have been used for positioning mobile terminals and tracking their movements. If the communication infrastructure could be utilized for security purposes, the deployment of the additional infrastructure could be avoided or reduced, resulting in a considerably more cost-effective solution [34].

Fig. 9. Analysis of sound [30].

Fig. 10 illustrates the basic functionality of a store-and-forward wireless sensor network (WSN) in which video information is obtained with cameras and transmitted forward. The WSN is composed of shared-medium cameras, store-and-forward cameras, distributed servers, routing nodes, wireless cameras and base stations, and a control room. The cameras distribute their information through the nodes and the distributed server to the control room [39].

E. Distributed Intelligence and Awareness

The 3GSSs use distributed intelligence functionality. An important design issue is to determine the granularity at which the tasks can be distributed based on available computational resources, network bandwidth, and task requirements. The distribution of intelligence can be achieved by the dynamic partition of all the logical processing tasks, including event recognition and communications. The dynamic task allocation dilemma is studied through the usage of a computational complexity model for representation and communication tasks [3].

A surveillance task can be separated into four phases, which are 1) event detection, 2) event representation, 3) event recognition, and 4) event query. The detection phase addresses multisource spatiotemporal data fusion for efficient and reliable extraction of motion trajectories from videos. The representation phase revises raw trajectory data to construct hierarchical, invariant, and adequate representations of the motion events. The recognition phase handles event recognition and classification.
The query component indexes and retrieves videos that match some query criteria [40].

The key to security is situation awareness. Awareness requires information, which spans across multiple scales of time and space. A security analyst must keep track of "who are the people and vehicles in a space" (identity tracking), "where are the people in a space" (location tracking), and "what are the people/vehicles/objects in a space doing" (activity tracking). The analyst must use historical content to interpret this data. Smart video surveillance systems are capable of enhancing situational awareness over multiple scales of time and space. Currently, the component technologies are evolving in isolation. For instance, face recognition technology handles the identity-tracking challenge, while restricting the subject to be in front of the camera, and intelligent video surveillance technologies offer activity detection capabilities to video streams while disregarding the identity-tracking challenge. To offer comprehensive, nonintrusive situation awareness, it is crucial to address the challenge of multiscale, spatiotemporal tracking [41].

Bandini and Sartori [42] present a monitoring and control system (MCS). An MCS attempts to support humans in decision making regarding problems, which can occur in critical domains. It can be characterized based on its functionalities: 1) to gather data of the monitored situation, 2) to evaluate if the data concern an anomalous situation, and 3) in case of anomalous situations, to perform the proper actions, e.g., to remedy the problems [42].

An action is typically the creation of an alarm to notify humans about the problem. MCSs should be intelligent. For this reason, MCSs have been traditionally developed by using artificial intelligence (AI) technologies, such as neural networks, data mining, and knowledge-based systems [42].

Fig. 10. Wireless sensor network accompanied with distributed location servers [39].

Fig. 11. Generic architecture of a monitoring/control pervasive system [42].

Fig. 11 presents an illustration of an MCS, which is structured into three logical levels, which are 1) observation, 2) interpretation, and 3) actuation. In observation, the state of a monitored field is periodically captured by a specified monitoring agency (MA). This is usually a set of sensors. In interpretation, the values detected by sensors are evaluated by a specified interpretation agency (IA). In actuation, the specified actions are executed by a specific actuation agency (AA), depending on the interpretational results [42].

F. Architecture and Middleware

The field of automated video surveillance is quite novel and the majority of contemporary approaches are engineered in an ad hoc manner. Recently, researchers have begun to consider architectures for video surveillance. Middleware that provides general support to video surveillance architectures is the logical next step. It should be noted that while video surveillance networks are a class of sensor networks, the engineering challenges are quite different. A large quantity of data flows through a surveillance network. In particular, the requirement for extreme economizing in the use of power and network bandwidth, which is a dominating factor in most sensor networks, is absent from most surveillance networks [43].

Fig. 12 illustrates a simple architecture for information fusion. The nodes scan the environment periodically and transmit a signal. The received signal is first processed by a preprocessor to extract significant characteristics from the environment. The preprocessors are responsible for quantifying how much the environment differs from the steady state. The information fusion function then deduces if there is an intruder present or not [34].

Due to the availability of more advanced and powerful communications, sensors, and processing units, the architectural choice in the 3GSSs can potentially become extremely variable and flexibly customized to acquire a desired performance level. The system architecture represents a key factor. For instance, different levels of distributed intelligence can result in preattentive detection methods either closer to the sensors or deployed at different levels in a computational processing hierarchy. Another source of variability results from the usage of heterogeneous networks, either wireless or wired, and transmission modalities both in means of source and channel coding and in means of multiuser access techniques. Temporal and spatial
coding scalability can be extremely productive for reducing the quantity of information to be transmitted by every camera, depending on the intelligence level of the camera itself. Multiple access techniques are a fundamental tool to allow a significant amount of sensors to share a communication channel in the most efficient and robust way [3].

Fig. 12. Simple example of a basic architecture [34].

Surveillance network management techniques are required in the 3GSSs to coordinate distributed intelligence modules to acquire optimal performances and to adjust the system behavior according to the variety of conditions occurring either in a scene or in the parameters of a system. All of these tools are crucial to design efficient systems. Finally, a further evolution is the integration among surveillance networks based on different types of sensor information, such as audio or visual, but oriented according to completely different functionalities, e.g., face detection, and different types of sensors, e.g., standard cameras [3].

G. Utilization of Mobile Robots

Seals defines a robot to be an automatic machine with a certain degree of autonomy, which is designed for active interaction with the environment. It integrates different systems for the perception of the environment, decision making, and the formation and execution of plans. In addition to these characteristics, a mobile robot must produce a transitable path and then follow this path [44].

The extremely hostile environments imposed by combat, space, and deep-ocean environments created the need for practical autonomous vehicles for military applications, space, and ocean exploration. Several efforts have formed the foundation for autonomous vehicle development, such as Shaky, Jason, and the Stanford Cart. These first-generation autonomous vehicles were used to explore fundamental issues in vision, planning, and robot control [45].

These systems were strictly hampered by primitive sensing and computing hardware. Efforts in the 1980s created the second generation of autonomous vehicle testbeds. This era includes the developments of the autonomous land vehicle (ALV) and the United States Marine Corps (USMC) ground surveillance robot (GSR). The GSR was an autonomous vehicle, which transited from one known geographic location to another known geographic location across a completely unknown terrain [45].

In detail, the GSR was an experimental M114 personnel carrier, which had been modified for computer control. It had sensors and computer control for vision, navigation, and proximity aspects. The vision subsystem was mounted on a transport platform. The proximity sensor subsystem used acoustic ranging sensors to provide short-range obstacle position and target tracking information. The proximity sensor subsystem fused the information from the sensors into consistent target and obstacle position and velocity vectors. In target tracking, vision estimates of target bearing could be fused with proximity estimates to enhance the knowledge of target angular position and motion for accurate vehicle response [46].

SURBOT was another notable mobile surveillance robot developed in 1985. SURBOT was developed by Remote Technology Corporation (REMOTEC) to execute visual, sound, and radiation surveillance within rooms specified as radiologically hazardous at nuclear power plants. The results verified that SURBOT could be used for remote surveillance in 54 separate controlled radiation rooms at the plant [47].

Currently, the development of a completely automated surveillance system based on mobile multifunctional robots is an active research area. Mobility and multifunctionality are generically adopted to reduce the amount of sensors required to cover a given region. Mobile robots can be organized in teams, which results in intelligent distributed surveillance over considerable areas. Several worldwide projects attempt to develop completely or semiautonomous mobile security systems. There are a few security robot guards commercially available, e.g., CyberGuard, RoboGuard, and Security Patrolbot [48].

Recent progression in automation technologies, combined with research in machine vision and robot control, should in the near future allow industrial robots to adapt to unexpected variations in their environments. Such autonomous systems are dependent on real-time sensor feedback to reliably and precisely detect, recognize, and continuously track objects within the robot's workspace, especially for applications such as on-the-fly object interception [49].

Traditionally, the amount of different sensors mounted on the robot and the amount of tasks related to navigation, exploration, monitoring, and detection operations render the design of the overall control system challenging. In recent years, there has been research in issues such as autonomous navigation in indoor and outdoor environments and outdoor rough terrains, visual recognition, sensor fusion and modulation, and sensor scheduling. An essential part of the research has concentrated on behavior-based approaches in which complexity is reduced with computationally simple algorithms that process sensor information at real-time with high-level inference strategies [48].

The inclusion of distributed artificial intelligence has introduced the development of new technologies in detection (sensors and captors), robotics (actuators), and data communication. These technologies enable surveillance systems to detect a wider frequency range, to cover a wider sensor area, and to decide the character of a particular situation [50].

Fig. 13. Platform model of iBot [52].

Fig. 14. Detected target tracked and geo-registered on the map [53].

Researchers in robotics have debated the surveillance issue. Robots and installed cameras can identify obstacles or humans in the environment. The systems guide robots around these obstacles. These systems typically extract purposeful information


from massive visual data, which requires substantial computation or manpower [51].

A security guard system, which uses autonomous mobile guard robots, can be used in buildings. The guard can be a wheel-type autonomous robot that moves on a planned path. The robot is always on alert for anything unusual, from moving objects to leaking water. The robot is equipped with cameras. While the robot is patrolling, it transmits images back to the monitoring station. After the robot finishes patrolling, it can automatically return to and dock in a battery recharging station. These security robot systems can improve the security of homes and offices [52].

A basic need in security is the ability to automatically verify an intruder, to alert remote guards, and to allow them to monitor the intruder when an intruder enters a secure or prohibited area. To assure both mobility and automaticity, the camera is embedded onto a teleoperated robot. Mobility and teleoperation, in which security guards can remotely instruct a mobile robot to track and identify a potential intruder, are more attractive than conventional immovable security systems [52].

An example of mobile robots is "iBotGuard," which was developed by Liu et al. [52]. It is an Internet-based intelligent robot security system, which can detect intruders utilizing invariant face recognition [52].

Fig. 13 illustrates the iBot platform model. This platform enables users to remotely control a robot in response to live video images captured by the camera on the robot. The iBot Server connects the robot and camera over a wireless channel, excluding problems associated with cables [52].

The iBot Server includes two components, which are 1) a streaming media encoder (SME) and 2) a controller server (CS). The iBot client includes another two components, which are 1) a streaming media player (SMP) and 2) a controller client (CC). The SME captures and encodes the real-time video from the camera on the robot under the instruction of the CS. The encoded streams are delivered by the streaming media server (SMS) to the SMP. The SMP receives, decodes, and displays the media data. The CC communicates with the SMP and the CC interacts with the CS to perform the intelligent control algorithms. The CS eventually deploys its robot movement commands and camera pan-tilt-zoom commands [52].

Liu et al. present an unmanned water vehicle (UWV), which performs automatic maritime visual surveillance. The UWV mobile platform is equipped with a GPS device and a high-resolution omnicamera. Omnicameras provide a 360° view capability. Targets are detected with a saliency-based model and adaptively tracked through selective features. Each target is geo-registered to a longitude and latitude coordinate. The target geo-location and appearance information is then transmitted to the fusion sensor, where the target location and image are displayed on a map, as in Fig. 14 [53].

VII. DISCUSSION ON CURRENT DILEMMAS IN THE 3GSS

According to Pavlidis et al. [54], the contemporary security infrastructure could be summarized as the following: 1) security systems act locally and they do not cooperate in an efficient manner; 2) extremely high-value assets are insufficiently protected by obsolete technology systems; and 3) there is a dependence on intensive human concentration to detect and assess threats. Considering the practical realities, Pavlidis et al. [54] recommend to cooperate closely with both the business unit that would productize the surveillance prototype and the potential customers [54].

Security-related technology is a growing industry. Governments and corporations worldwide are spending billions of dollars in the research, development, and deployment of intelligent video surveillance systems, data mining software, biometrics systems, and Internet geolocation technology. The technologies target terrorists and violators of export restrictions. Surveillance technologies are typically shrouded with secrecy, because of the fear that exposing them will make them less efficient, but the growing utilization of these technologies has provoked public interest and resistance to security-related technologies [55].

The following sections consist of the most notable aspects discovered in the literature review. They compose of the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors,
the location difficulties of surveillance personnel, and scalability difficulties.

A. Real-Time Distributed Architecture

It is fundamental to establish a framework or methodology for designing distributed wide-area surveillance systems. This ranges from the generation of requirements to the creation of design paradigms by defining functional and intercommunication models. The future realization of a wide-area distributed intelligent surveillance system should be through a collection of distinct disciplines. Computer vision, telecommunications, and system engineering are clearly needed [2].

A distributed multiagent approach may provide numerous benefits. First, intelligent cooperation between agents may enable the use of less expensive sensors and, therefore, a large number of sensors may be deployed over a larger area. Second, robustness is enhanced, because even if some agents fail, others remain to perform the mission. Third, performance is more flexible, as there is a distribution of tasks at miscellaneous locations between groups of agents. For instance, the likelihood of correctly classifying an object or target increases if multiple sensors are concentrated on it from different locations [2].

A video surveillance network is a complicated distributed application and requires sophisticated support from middleware. The role of middleware is primarily to support communication between modules. The nonfunctional requirements for video surveillance networks are best defined in architectural terms and include scalability (middleware must offer tools suitable for the scalable re-implementation of these algorithms), availability (the middleware needs to support sufficient fault tolerance to uphold acceptable levels of availability), evolvability (the capacity of the surveillance network to adjust to changes, including changes to the hardware and modifications to the software), integration (middleware is the intermediary for this type of communication), security (middleware needs to offer security facilities to address such attacks), and manageability (the network middleware must support the on-demand requirement for manageability) [43].

The systems provide concrete and profitable assistance to forensic investigations, despite the fact that their potential capabilities are decreased in reality by the limitations of storage capacities, frame skipping, and data compression. Currently, real-time reactivity is insufficient, because the human operators cannot handle enormous amounts of surveillance streams [57].

1) Architectural Dilemmas in Video Surveillance: While existing research has addressed multiple issues in the analysis of surveillance video, there has been little work in the area of more efficient information acquisition based on real-time automatic video analysis, such as the automatic acquisition of high-resolution face images. There is a challenge in transmitting

The fundamental techniques for interpreting video and extracting information from it have received a substantial amount of attention. The successive set of challenges addresses how to use these techniques to construct large-scale deployable systems. Several challenges of deployment include the cost minimization of wiring, low-power hardware for battery-operated camera installations, automatic calibration of cameras, automatic fault detection, and the development of system management tools [41].

Improving the smart cameras with additional sensors could transform them into a high-performance multisensor system. By combining visual, acoustic, tactile, or location-based information, the smart cameras become more sensitive and can transmit results that are more precise. This makes the results more widely applicable [11].

The usual scenario in an industrial research and development unit developing vision systems is that a customer presents a system specification and its requirements. The engineer then interprets these requirements into a system design and validates that the system design fulfils the user-specified requirements. The accuracy requirements are typically defined in terms of detection and false alarm rates for objects. The computational requirement is specified commonly by the system response time to the presence of an object, e.g., real-time or delayed. The intention of the vision systems engineer is to then exploit these restrictions and design a system that is operational in the sense that it satisfies customer requirements regarding speed, accuracy, and expenses [58].

The essential dilemma is that there is no known systematic way for vision systems engineers to conduct this translation of the system requirements to a detailed design. It is still an art to engineer systems that satisfy application-specific requirements. There are two basic steps in the design process, which are 1) the choice of the system architecture and the modules to achieve the task, and 2) the statistical analysis and validation of the system to check if it fulfils user requirements. In real life, the system design and analysis phases usually follow each other in a cycle until the engineer creates a design and a suitable analysis that satisfies the user specifications [58].

Automation of the design process is a research area with multiple open issues, even though there have been some studies in the context of image analysis, e.g., automatic programming. The systems analysis (performance characterization) phase in the context of video processing systems has been an active region of research in the recent years. Performance evaluation of image and video analysis components or systems is an active research topic in the vision community [58].

2) Real-Time Data Constraints: Society requires the results of research activities to address new solutions in video surveillance and sensor networks. Security and safety calls for new generations of multimedia surveillance systems, in which com-
information across different scales and the interpretation of the puters will act not only as supporting platforms but as the essen-
information become essential. Multiscale techniques present a tial core of real-time data comprehension process, is becoming
completely novel region of research, including camera control, a reality [57].
processing video from moving cameras, resource allocation, Most of the new research activities in surveillance are explor-
and task-based camera management in addition to challenges in ing larger dimensions, such as distributed video surveillance
performance modeling and evaluation [41]. systems, heterogeneous video surveillance, audio surveillance,
and biometric systems. In vast distributed environments, the exploitation of networks of small cooperative sensors should considerably improve the surveillance capability of high-level sensors, such as cameras [57].

As system size and diversity grow and, consequently, the complexity increases, the probability of inconsistency, unreliability, and nonresponsiveness grows. The design and implementation of distributed real-time systems present essential challenges to ensure that these complicated systems function as required. To comprehend or implement any complex system, it is necessary to decompose it into component parts and functions. Distributed systems can be considered in terms of independent concurrent activities that need to exchange data in ways that do not weaken the overall predictability and performance of the system [59].

There are four crucial objectives that design methods for real-time systems should achieve: 1) to be able to structure the system in concurrent tasks, 2) to be capable of developing reusable software by information hiding, 3) to be able to determine the behavioral characteristics of the system, and 4) to be able to analyze the performance of the design by distinguishing its performance and the fulfillment of requirements [59].

The main motivation of the paradigm shift from a central to a distributed control surveillance system is an improvement of the functionality, availability, and autonomy of the surveillance system. These surveillance systems can respond autonomously to changes in the environment of the system and to detected events in the monitored scenes. A static surveillance system configuration is not desirable. The system architecture must support reconfiguration, migration, quality of service, and power adaptation in analysis tasks [11].

Recently, there has been rapid development in advanced surveillance systems to solve a collection of difficulties that vary from people recognition to behavior analysis with the intention to enhance security. These challenges have been approached from different perspectives and were followed by a vast selection of system architectures. As cheaper and faster computing hardware accompanied with efficient and versatile sensors reached the consumer, there was a rapid development of multicamera systems. In spite of their large area coverage, they introduce new dilemmas that must be addressed in the architectural definition [60].

B. Difficulties in Video Surveillance

In realistic surveillance scenarios, it is impossible for a single sensor to view all the areas simultaneously, or to visually track a moving object for a long period. Objects become occluded by buildings and trees, and the sensors themselves have confined fields of view. A promising solution to this difficulty is to use a network of video sensors to cooperatively monitor all the objects within an extended region and seamlessly track individual objects that cannot be viewed continuously by an individual sensor alone. Some of the technical challenges within this method are to 1) actively control sensors to cooperatively track multiple moving objects, 2) fuse information from multiple sensors into scene-level object representations, 3) survey the scene for events and activities that should "trigger" further processing or operator involvement, and 4) offer human users a high-level interface for dynamic scene visualization and system tasking [12].

Intelligent visual surveillance is a vital application area for computer vision. In situations in which networks of hundreds of cameras are used to cover a wide area, the obvious restriction is the ability of the user to manage vast amounts of information. Due to this reason, automated tools that can generalize activities or track objects are crucial to the operator. The ability to track objects across (spatially separated) camera scenes is the key to the user requirements. Extensive geometric knowledge of the site and camera positions is normally needed. This type of explicit mapping to camera placement is impossible for large installations, because it requires that the operator knows to which camera to switch when an object vanishes [61].

While detecting and tracking objects are crucial capabilities for smart surveillance, from the perspective of a human intelligence analyst, the most critical challenge in video-based surveillance is interpreting the automatic analysis of data into the detection of events of interest and the identification of trends. Contemporary systems have just begun to examine automatic event detection. The key points are video-based detection and tracking, video-based person identification, large-scale surveillance systems, and automatic system calibration [41].

Object tracking is a vital task for many applications in the region of computer vision and particularly in those associated with video surveillance. Recently, the research community has concentrated its interests on developing smart applications to enhance event detection capabilities in video surveillance systems. Advanced visual-based surveillance systems need to process videos resulting from multiple cameras to detect the presence of mobile objects in the monitored scene. Every detected object is tracked, and their trajectories are analyzed to deduce their movement in the scene. Finally, at the highest levels of the system, detected objects are recognized and their behavior is analyzed to verify if the state is normal or potentially dangerous [62].

Motion detection, tracking, behavior comprehension, and personal identification at a distance can be realized by single-camera-based visual surveillance systems. Multiple-camera-based visual surveillance systems can be helpful, because the surveillance region is enlarged and multiple-view information can overcome occlusion. Tracking with a single camera easily creates obscurity resulting from occlusion or depth (see Fig. 15). This incomprehensibility may be removed by another view. Visual surveillance using multiple cameras introduces dilemmas, such as camera installation, camera calibration, object matching, automated camera switching, and data fusion [20].

The recognition of human activities in restricted settings, such as airports, parking lots, and banks, is of significant interest in security and automated surveillance systems. Albanese et al. [63] state that science is still far from achieving a systematic solution to this difficulty. The analysis of activities executed by humans in restricted settings is of great importance in applications, such as automated security and surveillance systems. There has been essential interest in this area, where the challenge is to automatically recognize the activities occurring in the field of a camera and detect abnormalities [63].
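The fusion of per-camera detections into scene-level object representations, listed above as challenge 2), can be illustrated with a minimal sketch. The approach below maps each camera's image-plane foot points onto a common ground plane through a homography (assumed known from calibration) and merges nearby points; the camera names, identity homographies, and merge radius are illustrative assumptions, not details of the cited systems.

```python
# Sketch: fusing per-camera detections into scene-level objects on a
# shared ground plane. Each camera reports image-plane foot points; a
# 3x3 homography maps them to world coordinates, and detections that
# land close together are merged into one scene-level object.

def apply_homography(H, point):
    """Map an image point (x, y) to world coordinates via 3x3 homography H."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def fuse_detections(camera_detections, homographies, radius=1.0):
    """Merge world-plane detections from several cameras.

    camera_detections: {camera_id: [(x, y), ...]} image-plane foot points.
    homographies: {camera_id: 3x3 homography to the common ground plane}.
    Returns fused world positions (centroids of merged clusters).
    """
    world_points = []
    for cam, dets in camera_detections.items():
        H = homographies[cam]
        world_points.extend(apply_homography(H, d) for d in dets)

    clusters = []  # each cluster is a list of world points
    for p in world_points:
        for cluster in clusters:
            cx = sum(q[0] for q in cluster) / len(cluster)
            cy = sum(q[1] for q in cluster) / len(cluster)
            if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2:
                cluster.append(p)  # same scene-level object
                break
        else:
            clusters.append([p])  # a new scene-level object

    return [(sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c))
            for c in clusters]

# Two cameras see the same person; identity homographies keep it simple.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
fused = fuse_detections(
    {"cam_a": [(10.0, 5.0)], "cam_b": [(10.4, 5.2)]},
    {"cam_a": identity, "cam_b": identity},
)
print(len(fused))  # the two nearby detections merge into one object
```

Real systems would estimate the homographies from calibration data and associate detections over time rather than per frame, but the same map-then-merge structure underlies the scene-level representation.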
Fig. 15. Example of occlusion [15].

Visual surveillance is a very active research area in computer vision because of the rapidly increasing number of surveillance cameras, which results in a strong demand for automatic processing methods of their output. The scientific challenge is to plan and implement automatic systems that can detect and track moving objects, and interpret their activities and behaviors. This need is a worldwide phenomenon, which is required by private companies as well as governmental and public institutions, with the aim of enhancing public safety. Visual surveillance is a key technology in public safety, e.g., in transport networks, town centers, schools, and hospitals. The main tasks in visual surveillance systems contain motion detection, object classification, tracking, activity understanding, and semantic classification [25].

Luan et al. proclaim that tracking in low frame rate (LFR) video is a practical requirement for numerous real-time applications, including visual surveillance. For tracking systems, an LFR condition is equivalent to abrupt motion, which is typically encountered but difficult to address. Specifically, these difficulties include poor motion continuity, fast appearance variation of the target, and increased background clutter. The majority of existing approaches cannot be readily applied to LFR tracking problems because of their vulnerability to the motion and appearance discontinuity inflicted by LFR data [64].

1) Occlusions: Outdoor and indoor surveillance have some distinct requirements. Indoor surveillance can be considered less complicated than outdoor surveillance. The operating conditions are stable in indoor environments. The cameras are typically fixed and not subject to vibration, weather conditions do not affect the scene, and the moving targets are generally limited to people. Regardless of these simplified conditions, in-house scenes are characterized by other eccentricities, which enlarge the dilemmas of surveillance systems [65].

Occlusions and operation in difficult weather conditions are fundamental challenges. In a multiple-target-tracking system, the key points of the local tracker are typically the detection subsystem and the measurements-to-tracks association subsystem. The design of the association system is dependent on the quality of the detection subsystem [66].

The difficulty of tracking multiple objects among complicated crowds in busy areas is far from being completely solved. The majority of existing algorithms are designed under one or multiple presumptions on occlusions, e.g., the number of objects, partial occlusion, short-term occlusion, constant motion, and constant illumination. Some methods use a human model for reasoning about the occlusion between standing humans. Exploiting a human appearance model can achieve better results in tracking multiple standing and walking people in a large crowd, but it may also result in difficulties in addressing occlusions involving objects, such as bags, luggage, children, sitting people, and vehicles. The change and interchange of labels of tracked objects after occlusion are the most conventional and significant errors of these methods [24].

Tracking multiple people in cluttered and crowded scenes is a demanding task primarily because of the occlusion between people. If a person is visually isolated, it is easier to perform the tasks of detection and tracking. The increase of the density of objects in the scene increases interobject occlusions. A foreground blob may not belong to a single individual, and it may belong to several individuals in the scene. A person may even be completely occluded by other people, resulting in an impossibility to detect and track multiple individuals with a single camera. The use of multiple views of the same scene attempts to acquire information that might be omitted in a particular view [67].

The usage of multiple cameras in visual surveillance has grown significantly, because it is very useful to address many difficulties, such as occlusion. Visual surveillance that uses multiple cameras has numerous problems though. These include camera installation, calibration of multiple cameras, correspondence between multiple cameras, automated camera switching, and data fusion [68].

2) Feature Extraction and Classification: The recognition of moving targets in a video stream still remains a difficulty. Moving target recognition entails two main steps, which are 1) feature extraction and 2) classification. The feature extraction process derives a collection of features from the video stream. Numerous machine-learning classification techniques have been studied for surveillance tasks [69].

The most typical approach to detect moving objects is background subtraction, in which each frame of a video sequence is compared against a background model. One dilemma in background subtraction is caused by the detection of false objects when an object that belongs to the background, e.g., after remaining stationary for a period of time, moves away. This creates what are called "ghosts." It is vital to address this problem because ghost objects will unfavorably affect many tasks, such as object classification, tracking, and event analysis, e.g., abandoned item detection [70].

Fig. 16 presents visual results from a basic motion tracker and a ghost detection algorithm. Boxes with dark borders indicate the valid moving tracks created by the tracker. Boxes with dashed dark borders denote the valid but static tracks. Boxes with white borders represent invalid tracks, also known as ghost tracks. The patches presented in the boxes present the foreground pixels, which are detected as moving pixels [70].

3) Automatic Video Analysis: The strategy proposed by Wang et al. [71] to support rapid decision making is to reduce the amount of information required to be processed by human operators. For this reason, researchers have been studying automatic video content analysis technologies to extract information from videos. Even though substantial progress has been made, the high computational cost of these techniques limits their
usage in real-time situations in the near future. Even though these techniques can essentially reduce the amount of video information, which must be analyzed by human operators, the human operators must resolve the ambiguities in the videos, synthesize a vast range of context information within the videos, and make final decisions. Therefore, it is important to design interactive visualizations that can support real-time information synthesis and decision making for video surveillance tasks [71].

Fig. 16. Example basic tracking and ghosts [70].

Additionally, the amount of cameras and the area under surveillance are restricted by the personnel available. To reduce the restrictions of traditional surveillance methods, there is ongoing effort in the computer vision and artificial intelligence community to develop automated systems for the real-time monitoring of people, vehicles, and other objects. These systems can create a depiction of the events occurring within their vicinity and raise alarms if they detect a suspicious person or unusual activity [22].

Camera systems for surveillance are in extensive use and produce considerable amounts of video data, which are stored for future or immediate utilization. In this context, efficient indexing and retrieval from surveillance video databases are crucial. Surveillance videos are rich in motion information, which is the most important cue to identify the dynamic content of videos. Extraction, storage, and analysis of motion information in videos and content-based surveillance video retrieval are of importance [72].

C. Awareness and Intelligence

The ultimate goal of surveillance systems is to automatically evaluate the ongoing activities of the monitored environment by flagging and presenting the suspicious events in real time to the operator to prevent dangerous situations. Data fusion techniques can be used to enhance the estimation performance and system robustness by exploiting the redundancy offered by multiple sensors observing the same scene. With recent advancements in camera and processing technology, data fusion is being considered for video-based systems. Intelligent sensors, which are equipped with microprocessors to execute distributed data processing and computation, are available and can decrease the computational burden of a central processing node [73].

The reliability of sensors is never explicitly considered. The difficulty in choosing the most relevant sensor or collection of sensors to execute a particular task often arises. The task could be target tracking, audio recording of a suspicious event, or triggering an alarm. It would be desirable to have a system that could automatically select the correct camera or collection of cameras. If data from multiple sensors are available and data fusion can be performed, results could be considerably affected in a case of a malfunctioning sensor. A means to evaluate the performance of the sensors and to weight their contribution in the fusion process is required [73].

In the contemporary generation of surveillance systems, in which multiple asynchronous and miscellaneous sensors are used, the adaption of the information acquired from them to derive the events from the environment is an important and challenging research problem. Information adaption refers to the process of combining the sensor and nonsensor information using the context and past experience. The issue of information adaption is vital, because when information is acquired from multiple sources, adapted information offers more precise inferences of the environment than individual sources [74].

1) Context Awareness: To improve software autonomy, applications depend on context information to dynamically adapt their behavior to match the environment and user requirements. Context-aware applications require middleware for the transparent distribution of components. Context-aware applications are needed to support personalization and adaptation based on context awareness. The user must understand how the applications function, such as what context information and logic are utilized in particular automated actions. Context-aware applications must ensure that actions committed on behalf of users are both accountable and intelligible. The system cannot simply be trusted to act on behalf of users [75].

To address these dilemmas, autonomous context-aware systems need to provide mechanisms to present a suitable balance between user control and software autonomy. This contains providing mechanisms to make users aware of application adaptations by indicating aspects of the application state, such as context information and adaptation logic used in decision-making processes. The challenge is not only to identify what application state information should be presented, but in what manner, e.g., with what level of explanation. In traditional applications, the tradeoff between user control and software autonomy has been fixed during the design phase. In contrast, context-aware applications may need to adjust the balance of software autonomy and user control at run-time by changing the level of feedback to users and the content of user input. The support for adaption includes the management of rules and user preferences that are used to distinguish how the context-aware system will respond to the available context information [75].

The design of aware systems, i.e., systems that have the capabilities of automatic adaption to changes, learning from experience, and active interaction with external entities, is an active topic of research involving several disciplines ranging from computer vision to artificial intelligence. To reach this goal, the approaches
based on the imitation of human brain skills are typical, and, in the past, they have offered successful applications. In Fig. 17, Dore et al. present a possible model that contains sensing information from the external world, analyzing and representing the information, conducting decisions, and issuing actions and communications to the external world [76].

Fig. 17. Example of a cognitive cycle [76].

2) Data Fusion: Blasch and Plano [77] state that "data fusion" is a term used to refer to the bottom-level, data-driven fusion. "Information fusion" refers to the processing of already-fused data, such as from primary sensors or sources, into meaningful and preferably relevant information to another part of the system, human or not [77].

A multimedia system incorporates relevant media streams to accomplish a detection task. As the different streams have different confidence levels in achieving distinct tasks, it is vital for the system to wisely identify the most appropriate streams for a specific analysis task for it to reach higher confidence. The confidence information of media streams is usually used in their incorporation by assigning weights to them accordingly. The confidence in a stream is normally determined by how it has assisted in performing the detection task previously. Arguably, if the system acquires precise results based on a particular stream, a higher confidence level is assigned to it in the adaption process [16], [17].

Data fusion from multiple cameras involving the same objects is a main challenge in multicamera surveillance systems and influences the optimal data combination of different sources. It is required to estimate the reliability of the available sensors and processes to combine complementary information in regions where there are multiple views to solve dilemmas of specific sensors, such as occlusions, overlaps, and shadows. Some traditional benefits, in addition to extended spatial coverage, are the enhancements in accuracy with the combination of covariance reduction, improved robustness by the identification of malfunctioning sensors, and enhanced continuity with complementary detections [78].

Typically, surveillance systems are composed of numerous sensors to acquire data from each target in the environment. These systems encounter two types of dilemmas, which are 1) the fusion of data, which addresses the combination of data from distinct sources in an optimal manner, and 2) the management of multiple sensors, which addresses the optimization of the global management of the system through the application of individual operations in every sensor [14].

In Castanedo et al.'s [14] surveillance systems, autonomous agents can cooperate with other agents for two different objectives, which are 1) to acquire enhanced performance or precision for a specific surveillance task, in which the complementary information can be incorporated and then combined through data fusion techniques, and 2) to use the capabilities of other agents to expand system coverage and execute tasks that they are not able to achieve individually [14].

Information adaption is a challenging task, because of 1) the diversity and asynchrony of sensors, 2) the disagreement or agreement of media streams, and 3) the confidence regarding the media streams. There is an issue on how to fuse individual information to establish comprehensive information. These are items of importance and essential challenges [74].

D. Wireless Networks and Their Applicability

The WSN has multiple applications in environment monitoring. Advances in microsensor and communication technologies have made it possible to manufacture cost-effective and small WSNs. Several interesting WSN applications have been specified, such as the active badge system, which locates individuals within a building. Radio frequency identification (RFID) technology is utilized in inventory management and monitoring, e.g., rail car tracking. The confidence in object location may be improved with an RFID stream in comparison to an audio stream [16]. Berkeley Smart Dust can be used to periodically receive readings from sensors. The Massachusetts Institute of Technology (MIT) Cricket uses the time difference of arrival (TDoA) model to distinguish the position and orientation of a device [79].

The combination of these technologies could provide many new applications. The sensor networks can detect and indicate environment-related information and events. Through messaging systems, these events can be transmitted to the outside world for immediate processing. These events may trigger human or application programs to respond with actions, which may be further conveyed back into the sensor networks [79].

By adopting these networks as the communications medium for real-time transmission of video signals in a security-sensitive operation, many technological issues need to be resolved. A great amount of data flow can cause network congestion. The system must provide real-time transmission of video signals even though there might be only a small amount of bandwidth available. Robust and efficient error control mechanisms and video compression techniques need to be used to prevent the difficulties related to limited bandwidth [4].
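The confidence-based weighting of media streams described above can be sketched as follows. The stream names, initial weights, and exponential update rule are illustrative assumptions for the purpose of the sketch, not the mechanism of the cited systems [16], [17].

```python
# Sketch: confidence-weighted combination of media streams, where a
# stream's weight is raised or lowered according to how well it has
# assisted past detections (an assumed exponential update rule).

class StreamFuser:
    def __init__(self, streams, learning_rate=0.2):
        # Start with equal confidence in every stream.
        self.confidence = {s: 1.0 / len(streams) for s in streams}
        self.learning_rate = learning_rate

    def fuse(self, scores):
        """Combine per-stream detection scores (0..1) into one score."""
        total = sum(self.confidence.values())
        return sum(self.confidence[s] * scores[s] for s in scores) / total

    def update(self, stream, was_correct):
        """Adjust a stream's confidence based on its past performance."""
        target = 1.0 if was_correct else 0.0
        c = self.confidence[stream]
        self.confidence[stream] = c + self.learning_rate * (target - c)

fuser = StreamFuser(["video", "audio", "rfid"])
print(round(fuser.fuse({"video": 0.9, "audio": 0.3, "rfid": 0.6}), 2))  # 0.6
fuser.update("audio", was_correct=False)  # audio misled a past detection
```

After the update, the audio stream contributes less to subsequent fused scores, mirroring the idea that a stream which has produced precise results earns a higher confidence level in the adaption process.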
Recently, there has been an emphasis on the development A sensor surveillance system comprises a set of wireless sen-
of wide-area distributed wireless sensor networks with self- sor nodes and a set of targets to be monitored. The wireless
organization capabilities to tolerate sensor failures, changing sensor nodes collaborate with each other to survey the targets
environmental conditions, and distinct environmental sensing and transmit the sensed data to a base station. The wireless sen-
applications. Particularly, mobile sensor networks (MSNs) re- sor nodes are powered by batteries and have demanding power
quire support from self-configuration mechanisms to guarantee requirements. The lifetime is the duration until there is no target
adaptability, scalability, and optimal performance. The best net- that can be surveyed by any wireless sensor node or data cannot
work configuration is typically time varying and context depen- be forwarded to be processed because of a lack of energy in the
dent. Mobile sensors can physically change the network topol- sensors nodes [84].
ogy, responding to events of the environment or to changes in A client-side computing device has a crucial influence on the
the mission [80]. total performance of a surveillance system. The utilization of
a cellular phone as a client of a surveillance system is notable,
because of its portability and omnipresent computing. The in-
tegration of video information and sensor networks established
E. Energy Efficiency of Remote Sensors the fundamental infrastructure for new generations of multime-
With the emergence of high-resolution image sensors, dia surveillance systems. In this infrastructure, different media
video transmission requires high-bandwidth communication streams, such as audio, video and sensor signals, would pro-
networks. It is predicted that future intelligent video surveil- vide an automatic analysis of the controlled environment and a
lance requires more computing power and higher communi- real-time interpretation of the scene [85].
cation bandwidth than currently. This results in higher reso-
lution images, higher frame rates, and increasing numbers of
cameras in video surveillance networks. Novel solutions are F. Dilemmas in Scalability
needed to handle demanding restrictions of video surveillance A scalable system should be able to integrate the sensor data
systems, both in terms of communication bandwidth and com- with contextual information and domain knowledge provided
puting power [81]. by both the humans and the physical environment to maintain
Intruder detection and data collection are examples of ap- a coherent picture of the world over time. The performance of
plications envisioned for battery-powered sensor networks. In many of these applications, the detection of a certain triggering event is the initial step executed prior to any other processing. If trigger events occur seldom, sensor nodes will spend a large majority of their lifetime in the detection loop. The efficient use of system resources in detection then plays a key role in the longevity of the sensor nodes. The energy consumption of the system includes the transmission energy, whereas the energy required by processing has not been considered directly in the detection problem [82].

It is crucial to note that technology scaling will gradually decrease processing costs while the transmission cost remains constant. With the usage of compression techniques, one can reduce the number of transmitted bits, so the transmission cost is decreased at the price of additional computation. This communication–computation tradeoff is the fundamental idea behind low-energy sensor networks. It is in sharp contrast to classical distributed systems, in which the goal is usually maximizing the speed of execution. The most appropriate metric in wireless networks is power. Experimental measurements indicate that the communication cost in wireless ad hoc networks can be two orders of magnitude higher than the computation cost in terms of consumed power [38].
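The communication–computation tradeoff above can be sketched with a toy energy model. All constants below are invented for illustration (real figures are platform dependent); the point is only that when radio energy per bit dwarfs CPU energy per operation, spending cycles on compression reduces total energy.

```python
# Hypothetical energy model for a sensor node: radio transmission is
# assumed to be far more expensive per bit than computation is per
# operation, as the measurements cited above suggest.
E_TX_PER_BIT = 1e-6      # joules per transmitted bit (assumed)
E_CPU_PER_OP = 1e-8      # joules per CPU operation (assumed)

def energy_raw(bits: int) -> float:
    """Energy to transmit the payload uncompressed."""
    return bits * E_TX_PER_BIT

def energy_compressed(bits: int, ratio: float, ops_per_bit: int) -> float:
    """Energy to compress first (extra CPU work), then transmit fewer bits."""
    compute = bits * ops_per_bit * E_CPU_PER_OP
    transmit = (bits / ratio) * E_TX_PER_BIT
    return compute + transmit

payload = 8 * 100_000    # a 100-kB sensor reading, in bits
raw = energy_raw(payload)
compressed = energy_compressed(payload, ratio=4.0, ops_per_bit=10)
print(f"raw: {raw:.3f} J, compressed: {compressed:.3f} J")
```

Under these assumed constants, compression trades cheap CPU cycles for expensive radio time and lowers the total energy per reading.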
Integrated video systems (IVSs) are based on the recent development of smart cameras. In addition to high demands on computing performance, power awareness is of major importance in IVSs. Power savings may be achieved by graceful degradation of quality of service (QoS). Research has been done on the tradeoff between image quality and power consumption, mainly concentrating on sophisticated image compression techniques [83].

... the majority of the systems is far from what is required from real-world applications [86].

A large-scale distributed video surveillance system usually comprises many video sources distributed over a vast area, transmitting live video streams to a central location for monitoring and processing. Contemporary advances in video sensors and the increasing availability of networked digital video cameras have allowed the deployment of large-scale surveillance systems over existing IP-network infrastructure. Implementing an intelligent, scalable, and distributed video surveillance system remains a research problem. Researchers have not paid much attention to the scalability of video surveillance systems; they typically utilize a centralized architecture and assume the availability of all the required system resources, such as computational power and network bandwidth [87].

Fig. 18 presents an example of sensor coverage in a large complex [15]. Each sensor and its coverage are drawn and indicated, e.g., B1, C1, C2, and C3 [15].

Fig. 18. Schematic representation of sensor coverage in a large area [15].

The integration of heterogeneous digital networks in the same surveillance architecture needs a video encoding and distribution technology capable of adapting to the currently available bandwidth, which may change in time for the same communication channel, and of being robust against transmission errors. The presence of clients with different processing power and display capabilities accessing video information requires a multiscale representation of the signal. The restrictions of surveillance applications regarding delay, security, complexity, and visual quality introduce strict demands on the technology of the video codec. In a large surveillance system, the digital network that enables remote monitoring, storage, control, and analysis is not within a single local area network (LAN). It typically represents a collection of interconnected LANs, wired or wireless, with different bandwidths and QoS. Different types of clients connect to these networks and access one or multiple video sources, decode them at the temporal and spatial resolution they require, and provide different functions [88].
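A multiscale representation can be thought of as a ladder of encodings from which each client picks the richest level its link can sustain. The sketch below is a minimal illustration of that selection step; the level names and bandwidth thresholds are invented for the example and are not taken from the cited systems.

```python
# Hypothetical encoding ladder: (level name, required bandwidth in kb/s).
# A client steps down the ladder until it finds a level that its
# currently measured bandwidth can sustain.
LADDER = [
    ("1080p30", 4000),
    ("720p30", 2000),
    ("480p15", 800),
    ("thumbnail", 100),
]

def pick_level(available_kbps: float) -> str:
    """Return the highest-quality level the link can carry."""
    for name, required in LADDER:
        if available_kbps >= required:
            return name
    return "audio-only"   # last resort when even thumbnails do not fit

print(pick_level(2500))   # a mid-bandwidth wireless client
print(pick_level(50))     # a severely constrained link
```

In a real deployment the measured bandwidth changes over time for the same channel, so this selection would be re-evaluated continuously rather than once per session.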
QoS is a fundamental concern in distributed IVSs. In video-based surveillance, normal QoS parameters contain frame rate, transfer delay, image resolution, and video-compression rate. The surveillance tasks might also provide multiple QoS levels. In addition, the offered QoS levels can change over time due to user instructions or modifications in the monitored environment. Novel IVS systems need to contain dedicated QoS management mechanisms [11].

1) Scalability in Testing: Testing of individual modules is called unit testing. Integration testing comprises rerunning the unit test cases after the system has been completely integrated. For feature testing, which is also called system testing, testers develop test cases based on the requirements of the system and choose adequate test cases according to every expected result. Load testing comprises four subphases: 1) stability testing, 2) stress testing, 3) reliability testing, and 4) performance testing. Stability testing comprises the installation of the software in a field-like environment and the verification of its ability to appropriately address data continuously. Stress testing comprises the verification of the ability of the software to address heavy loads for short periods without crashing. Reliability testing comprises the verification that the software can fulfill reliability requirements. Performance testing comprises the verification that the software can achieve performance requirements [89].

A substantial pitfall in incorporating intelligent functions into real-world systems is the lack of robustness, the inability to test and validate these systems under a variety of use cases, and the lack of quantification of the performance of the system. Additionally, the system should gracefully degrade in performance as the complexity of data grows. This is a very open research issue that is vital for the deployment of these systems [3].

G. Location Difficulties

Location techniques have numerous possible applications in wireless communication, surveillance, military equipment, tracking, and safety applications. Sagiraju et al. [56] concentrate on positioning in cellular wireless networks, but the results can be applied to other systems. In the GPS, code-modulated signals are transmitted by numerous satellites, which orbit the earth, and are received by GPS receivers to determine the current position. To calculate a position, the receiver must first acquire the satellite signals. Traditionally, GPS receivers have been designed with specific acquisition and tracking modes. After the signal has been acquired, the receiver switches to the tracking mode. If it loses the lock, then the acquisition needs to be repeated [56].

The GPS system comprises at least 24 satellites in orbit around the world, with at least four satellites viewable from any point on Earth at a given time. Despite GPS being a sophisticated solution to the location discovery process, it has multiple network dilemmas. First, GPS is expensive both in terms of hardware and power requirements. Second, GPS requires line-of-sight between the receiver and the satellites; it does not function well when obstructions, such as buildings, block the direct "view" of the satellites. Locations can also be calculated by trilateration. For a trilateration to be successful, a node needs to have at least three neighbors who are already aware of their positions [38].

Security personnel review their wireless video systems for critical incident information. Complementary information in the form of maps and live video streaming can assist in locating the problematic zone and in acting quickly and with knowledge of the situation. The need for providing detailed real-time information to the surveillance agents has been identified and is being addressed by the research community [10].

The analysis and fusion of different sensor information requires mapping observations to a common coordinate system to achieve situational awareness and scene comprehension. Availability of mapping capabilities enables critical operational tasks, such as the fusion of multiple target measurements across the network, deduction of the relative size and speed of the target, and the assignment of tasks to pan-tilt-zoom (PTZ) and mobile sensors. This presents the need for an automated and efficient geo-registration mechanism for all sensors. For instance, target observations from multiple sensors may be mapped to a geodetic coordinate system and then displayed on a map-based interface. Fig. 19 illustrates an example of geo-registration in a visual sensor network [90].
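Such geo-registration can be sketched in a few lines: each sensor that knows its own geodetic position can project a locally observed target into latitude/longitude, after which observations from different sensors live in one shared frame. The flat-earth approximation and the sensor positions below are assumptions made for illustration only; operational systems use a proper geodetic datum and full calibration.

```python
import math

# Minimal geo-registration sketch: a target observed at a local
# (range, bearing) relative to a sensor is mapped to latitude and
# longitude, so all sensors report in one common coordinate system.
# The flat-earth approximation is adequate at site-scale distances.
EARTH_RADIUS_M = 6_371_000.0

def to_geodetic(sensor_lat, sensor_lon, range_m, bearing_deg):
    """Project a local polar observation onto latitude/longitude (degrees)."""
    north = range_m * math.cos(math.radians(bearing_deg))
    east = range_m * math.sin(math.radians(bearing_deg))
    dlat = math.degrees(north / EARTH_RADIUS_M)
    dlon = math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(sensor_lat))))
    return sensor_lat + dlat, sensor_lon + dlon

# Two sensors (positions invented) report the same target; once both
# reports are in geodetic coordinates they can be fused or drawn on a map.
a = to_geodetic(60.1699, 24.9384, range_m=120.0, bearing_deg=90.0)
b = to_geodetic(60.1702, 24.9406, range_m=10.0, bearing_deg=180.0)
```

The same registered coordinates can then drive the map-based interface and the tasking of PTZ or mobile sensors mentioned above.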
H. Challenges in Privacy

Surveillance of events poses ethical problems. For instance, events involving humans and the right to monitor can conflict with the individual privacy rights of the monitored people. These privacy challenges depend heavily on the shared acceptance of the surveillance task as a necessity by the public with respect to a given application [3].

The suitability of homeland security for this role is plagued by questions ranging from dependability to the risks that technologies, e.g., surveillance, profiling, and data aggregation, pose to privacy and civil liberties [1].

In many applications, surveillance data needs to be transmitted across open networks with multiuser access characteristics. Information protection on these networks is a crucial issue for upholding privacy in the surveillance service. Paternity of surveillance data can be extremely essential for efficient use in law enforcement. Legal requirements necessitate the development of watermarking and data-hiding techniques for secure sensor identity assessment [3].

Despite the relevance of contemporary surveillance systems, and their role of supporting human control, there is a worldwide controversy about their utilization, connected with risks of privacy violations [57].

Advancements in sensor, communications, and storage capacities ease the large-scale collection of multimedia material. The value of this recorded data is only unlocked by technologies that can efficiently exploit the knowledge it contains. Regardless of the concerns over privacy issues, such capabilities are becoming more common in different environments, for example, in public transportation premises, cities, public buildings, and commercial establishments [91].

CCTV surveillance systems used in the field, with their centralized processing and recording architecture together with a simple multimonitor visualization of the crude video streams, have several disadvantages and restrictions. The most relevant dilemma is the complete lack of privacy. An automated and privacy-respecting surveillance system is a desirable goal. The latest video analysis systems emerging currently are based on centralized approaches that impose strict limitations on expandability and privacy [92].

To realize the fusion for integrated situation awareness, Trivedi et al. [13] developed the networked sensor tapestry (NeST) framework for multilevel semantic integration. NeST ensures the tracked person's privacy by using a set of programmable plug-in privacy filters operating on incoming sensor data. The filters either inhibit access to the data or remove any personally identifiable information. Trivedi et al. [13] use privacy filters with a privacy grammar that can connect multiple low-level data filters and aspects to create data-dependent privacy definitions.
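A plug-in privacy filter chain in the spirit of this design can be sketched as below. The filter set, record fields, and zone names are invented for illustration; this is not the NeST privacy grammar itself, only the general pattern of composable filters that either block a record or strip identifiable fields.

```python
# Sketch of a plug-in privacy filter chain: each filter either inhibits
# access to a record outright (returns None) or removes personally
# identifiable fields before the record reaches higher-level reasoning.
from typing import Callable, Optional

Record = dict
Filter = Callable[[Record], Optional[Record]]

def drop_faces(record: Record) -> Optional[Record]:
    """Remove raw face crops; keep the anonymous track geometry."""
    return {k: v for k, v in record.items() if k != "face_crop"}

def block_restricted_zone(record: Record) -> Optional[Record]:
    """Inhibit access entirely for observations from a private zone."""
    return None if record.get("zone") == "restroom_corridor" else record

def apply_chain(record: Record, chain: list[Filter]) -> Optional[Record]:
    for f in chain:
        record = f(record)
        if record is None:          # a filter inhibited access
            return None
    return record

chain = [block_restricted_zone, drop_faces]
ok = apply_chain({"zone": "lobby", "track": (3, 7), "face_crop": b"raw"}, chain)
blocked = apply_chain({"zone": "restroom_corridor", "track": (1, 2)}, chain)
```

Because the filters are ordinary composable functions, data-dependent privacy policies can be assembled per deployment without changing the sensor pipeline itself.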
Fig. 19. Fields of view of four cameras at a port [90].

VIII. GROWING TECHNOLOGIES AND TRENDS

There are novel technologies and trends that have begun or are beginning to establish themselves. Kankanhalli and Rui [93] have indicated numerous ones. Prati et al. [94] introduced a multisensor surveillance system containing video cameras and passive infrared (PIR) sensors. Calderara et al. [95] state that visual sensors will continue to be the dominant sensors but that they will be complemented with other appropriate sensors.

Atrey et al. [96] claim that contemporary systems are constructed for specific physical environments with specific sensor types and sensor deployments. While this is efficient, it lacks the portability required for widespread deployment. The system architecture should be capable of using the sensors and resources to address the needs of the environment.

With the increasing variety and decreasing expense of miscellaneous types of sensors, there will be an increase in the usage of radically differentiated media, such as infrared and motion sensor information, text in diverse formats, optical sensor data, biological and satellite telemetric data, and location data obtained by GPS devices. Other developments are mobile sensors, such as moving cameras on vehicles, for example public buses. Humans are also mobile sensors, recording information in different media types such as blogs. It would be beneficial to enhance the environment with suitable sensors to reduce the sensor and semantic omissions [93].

Accompanied with the increased popularity of portable security applications, it is more important that the surveillance system has low power consumption, simple functionality, and compact size. This includes the integration of the miscellaneous functional blocks and a motion detection sensor (MDS) into a single chip [97].

The process of extracting and tracking human figures in image sequences is vital for video surveillance and video-indexing applications. A useful and popular approach is based on silhouette analysis with a spatiotemporal representation, in which the goal is to achieve an invariant representation of the detected object. Symmetries of the silhouette can be used as a gait parameter for the identification of a person [98].

Biometrics has been vastly applied to secure surveillance, access control, and personal identification with high security. With the rise of pervasive and personal computation, cell phones and PDAs will become a major communication and computation platform for individuals and suitable organizations. Even though biometrics has been an appropriate method for attaching a physical identity to a digital correspondence, a flexible biometrics system that can accommodate real-world applications in a secure manner is still a substantial challenge [99].

The next-generation video surveillance system will be a networked, intelligent, multicamera cooperative system with integrated situation awareness of complicated and dynamic scenes. It will be applicable to urban centers or indoor complexes. The essence of such a system is increasingly intelligent and robust video analysis that is capable of reviewing the videos from low-level image appearance and feature extraction to middle-level object or event detection, and finally to high-level reasoning and scene comprehension. Significant steps have been reached
in examining these issues by research laboratories in the last decade. Currently, the focus is on the application of these integrated systems and the supplying of automated solutions to realistic surveillance dilemmas [100].

There has been a dramatic progression in sensing for security applications and in the analysis and processing of sensor data. O'Sullivan and Pless [101] concentrate on two broad applications of sensors for security, which are 1) anomaly detection and 2) object or pattern recognition [101].

In anomaly detection, the difficulty is to detect activity, behavior, objects, or substances that are atypical. Typical is defined with respect to historical data and is extremely scenario dependent. Algorithms for anomaly detection must adjust to the scenario and be robust to a vast range of possible assumptions. As a result, there is typically no model for an anomaly, and the models for the location and time are derived from observations. Scenarios that need anomaly detection include perimeter, border, or gateway surveillance [101].

In object or pattern recognition, there is typically a model or prior information of the object or pattern, and the intention is to categorize the pattern. The level of categorization, the required system robustness, and the required system efficiency define and restrict the possible models and processing. The usage of biometrics for the recognition of people is a prime example of an application that is evolving rapidly [101].

Gupta et al. [102] propose a leader–follower system, which receives multimodal sensor information from a wide array of sensors, including radars and cameras. In such a system, a fixed wide field-of-view (FOV) sensor conducts the duties of the leader. The leader directs follower PTZ cameras to zoom in on targets of interest. One of the typical difficulties in a leader–follower system is that the follower camera can only follow the target as long as it remains in the FOV of the leader. Additionally, inaccuracies in the leader–follower calibration may result in imprecise zooming operations [102].
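The leader-to-follower hand-off can be illustrated with a toy calibration: the sketch below converts a target's pixel position in the leader's wide-FOV image into pan/tilt angles for a follower PTZ camera assumed to be co-located with the leader. The resolution and FOV figures are assumptions, and the linear pixel-to-angle model ignores lens distortion and the baseline between the cameras, which is exactly where the calibration inaccuracies mentioned above come from.

```python
# Toy leader -> follower cueing: map a pixel in the leader's wide-FOV
# image to pan/tilt angles for an (assumed) co-located PTZ follower.
LEADER_W, LEADER_H = 1920, 1080     # leader resolution (assumed)
HFOV_DEG, VFOV_DEG = 90.0, 50.0     # leader field of view (assumed)

def cue_follower(px: float, py: float) -> tuple[float, float]:
    """Return (pan, tilt) in degrees, zero at the leader's optical axis."""
    pan = (px / LEADER_W - 0.5) * HFOV_DEG
    tilt = (0.5 - py / LEADER_H) * VFOV_DEG   # image y grows downward
    return pan, tilt

pan, tilt = cue_follower(1440, 270)   # target in the upper-right quadrant
print(f"pan={pan:+.1f} deg, tilt={tilt:+.1f} deg")
```

A real system would replace this with a full extrinsic calibration between leader and follower, but the sketch shows why an error of a fraction of a degree already translates into a misplaced zoom window at long range.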
In general, there is plenty of prototypical research that has transformed into practical solutions. Environments with multiple sensors include solutions in which electronic locks and user identification have been incorporated into doors, both of which can be perceived as individual sensors. The electronic lock indicates its own status, and the user identification device denotes the access rights of the user. This also forms a simple realization of distributed intelligence and awareness, in which each sensor acts independently but a higher level of deduction can be performed based on the individual information of each sensor. Video surveillance has been employed in solutions such as the detection of the direction of movement. Airports have utilized this technology to automatically raise alarms in situations in which a person goes through a passage in the wrong direction. Audio surveillance technology has been adopted in video camera solutions, which direct the cameras to the location of alarming sounds. Within various police forces, mobile robots have been used to remotely survey a potentially hazardous environment and transmit video feed to the user. Wireless sensor networks can be used to indicate the locations of nomadic guards to the control room within an indoor perimeter. All of these solutions have their own appropriate middleware and architecture, which serve their unique properties and purposes.

There are several major companies that deliver surveillance systems. GE Security offers integrated security management, intrusion and property protection, and video surveillance [103]. ObjectVideo provides intelligent video software for security, public safety, and other applications [104]. IOImage provides video surveillance, real-time detection, and alert and tracking services [105]. RemoteReality offers video surveillance services, including the detection and tracking of objects, in both visible and infrared thermal spectra [106]. Point Grey Research offers digital camera technology for machine vision and computer vision applications [107].

IX. CONCLUSION

This paper presented the contemporary state of modern surveillance systems for public safety, with a special emphasis on the 3GSSs and especially the difficulties of present surveillance systems. The paper briefly reviewed the background and progression of surveillance systems, including a short review of the first and second generations of surveillance systems. The third generation of surveillance systems addresses topics such as multisensor environments, video surveillance, audio surveillance, wireless sensor networks, distributed intelligence and awareness, and architecture and middleware. According to the current literature, the difficulties of surveillance systems for public safety reside in the fields of the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. A portion of the difficulties are the same as declared in the 3GSSs, but with detailed descriptions of the characteristics of the dilemmas, such as the architectural, visual, and awareness aspects. Other difficulties are completely novel or substantially highlighted, such as surveillance personnel location, application of wireless networks, energy efficiency, and scalability.

Novel sensors and new requirements will accompany surveillance systems. This places demanding challenges on architecture and its real-time functionality. There are existing fundamental concepts, such as video and audio surveillance, but there is a lack of their intelligent usage and especially of their seamless interoperability through a united real-time architecture. Contemporary surveillance systems still reside in a state in which individual concepts may achieve functionality in specific cases, but their comprehensive on-site interoperability is yet to be reached. Substantial evidence of a distributed multisensor intelligent surveillance system does not exist. As the size of surveyed complexes and buildings grows, the deployment of wireless sensors and their energy consumption become more notable. Wireless sensors are easy to deploy, and their low energy consumption is constantly improving. Scalability issues are fundamentally related to the magnitude of the areas under surveillance. The areas that require surveillance are growing, and the complexity of surveillance systems is also expanding. Both pose great challenges to the scalability aspect. Different sensors provide
different information, and their exploitation in intelligent tasks remains a challenge. Sensor data should be decomposed into fundamental blocks, and the intelligent components should have the responsibility of composing the deductions from them. An attempt should be made to construct a multisensor distributed intelligent surveillance system that functions at a relatively high level, capturing alerting situations with a very low false alarm rate. The surveillance personnel are one of the strongest aspects of a surveillance system and should be retained in the system. Despite advancements in intelligence and awareness, the human being will always be a forerunner in adaptability and deductions.

The endless demand for and abundance of surveillance systems for public safety involve multiple issues that still require resolutions. Extensive intelligence and automation, accompanied with energy efficiency and scalability in large areas, are required to be adopted by suppliers to establish surveillance systems for civic and communal public safety.

REFERENCES

[1] M. Reiter and P. Rohatgi, "Homeland security guest editor's introduction," IEEE Internet Comput., vol. 8, no. 6, pp. 16–17, Nov./Dec. 2004, doi: 10.1109/MIC.2004.62.
[2] M. Valera and S. A. Velastin, "Intelligent distributed surveillance systems: A review," IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 192–204, Apr. 2005, doi: 10.1049/ip-vis:20041147.
[3] C. S. Regazzoni, V. Ramesh, and G. L. Foresti, "Scanning the issue/technology special issue on video communications, processing, and understanding for third generation surveillance systems," Proc. IEEE, vol. 89, no. 10, pp. 1355–1367, Oct. 2001, doi: 10.1109/5.959335.
[4] A. C. M. Fong and S. C. Hui, "Web-based intelligent surveillance system for detection of criminal activities," Comput. Control Eng. J., vol. 12, no. 6, pp. 263–270, Dec. 2001.
[5] K. Müller, A. Smolic, M. Dröse, P. Voigt, and T. Wiegand, "3-D construction of a dynamic environment with a fully calibrated background for traffic scenes," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 4, pp. 538–549, Apr. 2005, doi: 10.1109/TCSVT.2005.844452.
[6] W. M. Thames, "From eye to electron—Management problems of the combat surveillance research and development field," IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 548–551, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008288.
[7] H. A. Nye, "The problem of combat surveillance," IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 551–555, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008289.
[8] A. S. White, "Application of signal corps radar to combat surveillance," IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 561–565, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008291.
[9] C. E. Wolfe, "Information system displays for aerospace surveillance applications," IEEE Trans. Aerosp., vol. AS-2, no. 2, pp. 204–210, Apr. 1964, doi: 10.1109/TA.1964.4319590.
[10] R. Ott, M. Gutierrez, D. Thalmann, and F. Vexo, "Advanced virtual reality technologies for surveillance and security applications," in Proc. ACM SIGGRAPH Int. Conf. Virtual Real. Continuum Its Appl. (VCRIA), Jun. 2006, pp. 163–170.
[11] M. Bramberger, A. Doblander, A. Maier, B. Rinner, and H. Schwabach, "Distributed embedded smart cameras for surveillance applications," Computer, vol. 39, no. 2, pp. 68–75, Feb. 2006, doi: 10.1109/MC.2006.55.
[12] R. T. Collins, A. J. Lipton, H. Fujiyoshi, and T. Kanade, "Algorithms for cooperative multisensor surveillance," Proc. IEEE, vol. 89, no. 10, pp. 1456–1477, Oct. 2001, doi: 10.1109/5.959341.
[13] M. M. Trivedi, T. L. Gandhi, and K. S. Huang, "Homeland security and distributed interactive video arrays for event capture and enhanced situational awareness," IEEE Intell. Syst., vol. 20, no. 5, pp. 58–66, Sep./Oct. 2005, doi: 10.1109/MIS.2005.86.
[14] F. Castanedo, M. A. Patricio, J. Garcia, and J. M. Molina, "Extending surveillance systems capabilities using BDI cooperative sensor agents," in Proc. 4th Int. Workshop Video Surveill. Sens. Netw. (VSSN), Oct. 2006, pp. 131–138.
[15] S. A. Velastin, B. A. Boghossian, B. P. I. Lo, J. Sun, and M. A. Vicencio-Silva, "PRISMATICA: Toward ambient intelligence in public transport environments," IEEE Trans. Syst., Man, Cybern. A, Syst. Hum., vol. 35, no. 1, pp. 164–182, Jan. 2005, doi: 10.1109/TSMCA.2004.838461.
[16] Z. Rasheed, X. Cao, K. Shafique, H. Liu, L. Yu, M. Lee, K. Ramnath, T. Choe, O. Javed, and N. Haering, "Automated visual analysis in large scale sensor networks," in Proc. 2nd ACM/IEEE Int. Conf. Distrib. Smart Cameras (ICDSC), Sep. 2008, pp. 1–10, doi: 10.1109/ICDSC.2008.4635678.
[17] P. K. Atrey and A. El Saddik, "Confidence evolution in multimedia systems," IEEE Trans. Multimedia, vol. 10, no. 7, pp. 1288–1298, Nov. 2008, doi: 10.1109/TMM.2008.2004907.
[18] I. N. Junejo, X. Cao, and H. Foroosh, "Autoconfiguration of a dynamic nonoverlapping camera network," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 4, pp. 803–816, Aug. 2007, doi: 10.1109/TSMCB.2007.895366.
[19] D. Makris and T. Ellis, "Learning semantic sense models from observing activity in visual surveillance," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 3, pp. 397–408, Jun. 2005.
[20] W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 3, pp. 334–352, Aug. 2004, doi: 10.1109/TSMCC.2004.829274.
[21] C. Kreucher, K. Kastella, and A. O. Hero III, "Multitarget tracking using the joint multitarget probability density," IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 4, pp. 1396–1414, Oct. 2005, doi: 10.1109/TAES.2005.1561892.
[22] M. Shah, O. Javed, and K. Shafique, "Automated visual surveillance in realistic scenarios," IEEE Multimedia, vol. 14, no. 1, pp. 30–39, Jan.–Mar. 2007, doi: 10.1109/MMUL.2007.3.
[23] G. L. Foresti, C. Micheloni, L. Snidaro, P. Remagnino, and T. Ellis, "Active video-based surveillance system," IEEE Signal Process. Mag., vol. 22, no. 2, pp. 25–37, Mar. 2005, doi: 10.1109/MSP.2005.1406473.
[24] L. Li, W. Huang, I. Y.-H. Gu, R. Luo, and Q. Tian, "An efficient sequential approach to tracking multiple objects through crowds for real-time intelligent CCTV systems," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 5, pp. 1254–1269, Oct. 2008, doi: 10.1109/TSMCB.2008.927265.
[25] L. Maddalena and A. Petrosino, "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008, doi: 10.1109/TIP.2008.924285.
[26] Y. Li, C. Huang, and R. Nevatia, "Learning to associate: Hybrid boosted multi-target tracker for crowded scene," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 2953–2960, doi: 10.1109/CVPRW.2009.5206735.
[27] A. Leykin, Y. Ran, and R. Hammoud, "Thermal-visible video fusion for moving target tracking and pedestrian classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007, pp. 1–8, doi: 10.1109/CVPR.2007.383444.
[28] A. Leykin and R. Hammoud, "Robust multi-pedestrian tracking in thermal-visible surveillance videos," in Proc. Conf. Comput. Vis. Pattern Recognit. Workshop (CVPRW), Jun. 2006, pp. 136–143, doi: 10.1109/CVPRW.2006.175.
[29] W. K. Wong, P. N. Tan, C. K. Loo, and W. S. Lim, "An effective surveillance system using thermal camera," in Proc. Int. Conf. Signal Acquis. Process. (ICSAP), Apr. 2009, pp. 13–17, doi: 10.1109/ICSAP.2009.12.
[30] D. Istrate, E. Castelli, M. Vacher, L. Besacier, and J. F. Serignat, "Information extraction from sound for medical telemonitoring," IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 2, pp. 264–274, Apr. 2006, doi: 10.1109/TITB.2005.859889.
[31] M. Stanacevic and G. Cauwenberghs, "Micropower gradient flow acoustic localizer," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 10, pp. 2148–2157, Oct. 2005, doi: 10.1109/TCSI.2005.853356.
[32] P. Julian, A. G. Andreou, L. Riddle, S. Shamma, D. H. Goldberg, and G. Cauwenberghs, "A comparative study of sound localization algorithms for energy aware sensor network nodes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 4, pp. 640–648, Apr. 2004, doi: 10.1109/TCSI.2004.826205.
[33] A. F. Smeaton and M. McHugh, "Towards event detection in an audio-based sensor network," in Proc. 3rd Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 87–94.
[34] J. Chen, Z. Safar, and J. A. Sorensen, "Multimodal wireless networks: Communication and surveillance on the same infrastructure," IEEE
Trans. Inf. Forensics Secur., vol. 2, no. 3, pp. 468–484, Sep. 2007, doi: 10.1109/TIFS.2007.904944.
[35] G. Xing, C. Lu, R. Pless, and Q. Huang, "Impact of sensing coverage on greedy geographic routing algorithms," IEEE Trans. Parallel Distrib. Syst., vol. 17, no. 4, pp. 348–360, Apr. 2006, doi: 10.1109/TPDS.2006.48.
[36] R. R. Brooks, P. Ramanathan, and A. M. Sayeed, "Distributed target classification and tracking in sensor networks," Proc. IEEE, vol. 91, no. 8, pp. 1163–1171, Aug. 2003, doi: 10.1109/JPROC.2003.814923.
[37] A. M. Tabar, A. Keshavarz, and H. Aghajan, "Smart home care network using sensor fusion and distributed vision-based reasoning," in Proc. 4th Int. Workshop Video Surveill. Sens. Netw. (VSSN), Oct. 2006, pp. 145–154.
[38] S. Megerian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava, "Worst and best-case coverage in sensor networks," IEEE Trans. Mobile Comput., vol. 4, no. 1, pp. 84–92, Jan./Feb. 2005, doi: 10.1109/TMC.2005.15.
[39] V. Chandramohan and K. Christensen, "A first look at wired sensor networks for video surveillance systems," in Proc. 27th Annu. IEEE Conf. Local Comput. Netw. (LCN), Nov. 2002, pp. 728–729.
[40] Z. Dimitrijevic, G. Wu, and E. Y. Chang, "SFINX: A multi-sensor fusion and mining system," in Proc. 2003 Joint Conf. Fourth Int. Conf. Inf., Commun. Signal Process., Dec. 2003, vol. 2, pp. 1128–1132, doi: 10.1109/ICICS.2003.1292636.
[41] A. Hampapur, L. Brown, J. Connell, A. Ekin, N. Haas, M. Lu, H. Merkl, S. Pankanti, A. Senior, C.-F. Shu, and Y. L. Tian, "Smart video surveillance: Exploring the concept of multiscale spatiotemporal tracking," IEEE Signal Process. Mag., vol. 22, no. 2, pp. 38–51, Mar. 2005, doi: 10.1109/MSP.2005.1406476.
[42] S. Bandini and F. Sartori, "Improving the effectiveness of monitoring and control systems exploiting knowledge-based approaches," Pers. Ubiquitous Comput., vol. 9, no. 5, pp. 301–311, Sep. 2005, doi: 10.1007/s00779-004-0334-3.
[43] H. Detmold, A. Dick, K. Falkner, D. S. Munro, A. Van Den Hengel, and P. Morrison, "Middleware for video surveillance networks," in Proc. 1st Int. Workshop Middleware Sens. Netw. (MidSens), Nov.–Dec. 2006, pp. 31–36.
[44] R. Seals, "Mobile robotics," Electron. Power, vol. 30, no. 7, pp. 543–546, Jul. 1984, doi: 10.1049/ep.1984.0286.
[45] S. Harmon, "The ground surveillance robot (GSR): An autonomous vehicle designed to transit unknown terrain," IEEE J. Robot. Autom., vol. RA-3, no. 3, pp. 266–279, Jun. 1987, doi: 10.1109/JRA.1987.1087091.
[46] S. Harmon, G. Bianchini, and B. Pinz, "Sensor data fusion through a
[56] P. K. Sagiraju, S. Agaian, and D. Akopian, "Reduced complexity acquisition of GPS signals for software embedded applications," IEE Proc.-Radar Sonar Navig., vol. 153, no. 1, pp. 69–78, Feb. 2006, doi: 10.1049/ip-rsn:20050091.
[57] R. Cucchiara, "Multimedia surveillance systems," in Proc. 3rd Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 3–10.
[58] M. Greiffenhagen, D. Comaniciu, H. Niemann, and V. Ramesh, "Design, analysis, and engineering of video monitoring systems: An approach and a case study," Proc. IEEE, vol. 89, no. 10, pp. 1498–1517, Oct. 2001, doi: 10.1109/5.959343.
[59] M. Valera and S. A. Velastin, "Real-time architecture for a large distributed surveillance system," in Proc. IEE Intell. Surveill. Syst., London, U.K., Feb. 2004, pp. 41–45.
[60] C. Micheloni, L. Snidaro, L. Visentini, and G. L. Foresti, "Sensor bandwidth assignment through video annotation," in Proc. IEEE Int. Conf. Video Signal Based Surveill. (AVSS), Nov. 2006, pp. 48–48, doi: 10.1109/AVSS.2006.102.
[61] R. Bowden and P. KaewTraKulPong, "Towards automated wide area visual surveillance: Tracking objects between spatially-separated, uncalibrated views," IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 213–223, Apr. 2005, doi: 10.1049/ip-vis:20041233.
[62] C. Micheloni, G. L. Foresti, and L. Snidaro, "A network of co-operative cameras for visual surveillance," IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 205–212, Apr. 2005, doi: 10.1049/ip-vis:20041256.
[63] M. Albanese, R. Chellappa, V. Moscato, A. Picariello, V. S. Subrahmanian, P. Turaga, and O. Udrea, "A constrained probabilistic Petri net framework for human activity detection in video," IEEE Trans. Multimedia, vol. 10, no. 8, pp. 1429–1443, Dec. 2009, doi: 10.1109/TMM.2008.2010417.
[64] L. Yuan, A. Haizhou, T. Tamashita, L. Shihong, and M. Kaware, "Tracking in low frame rate video: A cascade particle filter with discriminative observers of different life spans," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1728–1740, Oct. 2008, doi: 10.1109/TPAMI.2008.73.
[65] R. Cucchiara, C. Grana, A. Prati, and R. Vezzani, "Computer vision system for in-house video surveillance," IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 242–249, Apr. 2005, doi: 10.1049/ip-vis:20041215.
[66] J. A. Besada, J. Garcia, J. Portillo, J. M. Molina, A. Varona, and G. Gonzalex, "Airport surface surveillance based on video images," IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 3, pp. 1075–1082, Jul. 2005, doi: 10.1109/TAES.2005.1541452.
[67] S. M. Khan and M. Shah, "Tracking multiple occluding people by local-
distributed blackboard,” in Proc. IEEE Int. Conf. Robot. Autom., Apr. izing on multiple scene planes,” IEEE Trans. Pattern Anal. Mach. Intell.,
1986, pp. 1449–1454. vol. 31, no. 3, pp. 505–519, Mar. 2009, doi: 10.1109/TPAMI.2008.102.
[47] J. White, H. Harvey, and K. Farnstrom, “Testing of mobile surveillance [68] W. Hu, M. Hu, X. Zhou, T. Tan, J. Lou, and S. Maybank, “Principal axis-
robot at a nuclear power plant,” in Proc. IEEE Int. Conf. Robot. Autom., based correspondence between multiple cameras for people tracking,”
Mar. 1987, pp. 714–719. IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 663–671, Apr.
[48] D. Di Paola, D. Naso, A. Milella, G. Cicirelli, and A. Distante, “Multi- 2006, doi: 10.1109/TPAMI.2006.80.
sensor surveillance of indoor environments by an autonomous mobile [69] D.-Y. Chen, K. Cannons, H.-R. Tyan, S.-W. Shih, and H.-Y. M. Liao,
robot,” in Proc. 15th Int. Conf. Mechatronics Mach. Vis. Pract. (M2VIP), “Spatiotemporal motion analysis for the detection and classification of
Dec. 2008, pp. 23–28, doi: 10.1109/MMVIP.2008.474501. moving targets,” IEEE Trans. Multimedia, vol. 10, no. 8, pp. 1578–1591,
[49] A. Bakhtari, M. D. Naish, M. Eskandari, E. A. Cloft, and B. Ben- Dec. 2008, doi:10.1109/TMM.2008.2007289.
habib, “Active-vision-based multisensor surveillance—An implemen- [70] F. Yin, D. Makris, and S. A. Velastin, “Time efficient ghost removal for
tation,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 5, motion detection in visual surveillance systems,” Electron. Lett., vol. 44,
pp. 668–680, Sep. 2006, doi: 10.1109/TSMCC.2005.855525. no. 23, pp. 1351–1353, Nov. 2008, doi: 10.1049/el:20082118.
[50] J. J. Valencia-Jimenez and A. Fernandez-Caballero, “Holonic multi- [71] Y. Wang, D. Bowman, D. Krum, E. Coelho, T. Smith-Jackson, D. Bailey,
agent systems to integrate multi-sensor platforms in complex surveil- S. Peck, S. Anand, T. Kennedy, and Y. Abdrazakov, “Effects on video
lance,” in Proc. IEEE Int. Conf. Video Signal Based Surveill. (AVSS), placement and spatial context presentation on path reconstruction tasks
Nov. 2006, p. 49, doi: 10.1109/AVS.2006.58. with contextualized videos,” IEEE Trans. Vis. Comput. Graph., vol. 14,
[51] Y.-C. Tseng, Y.-C. Wang, K.-Y. Cheng, and Y.-Y. Hsieh, “iMouse: An no. 6, pp. 1755–1762, Nov./Dec. 2008, doi:10.1109/TVCG.2008.126.
integrated mobile surveillance and wireless sensor system,” Computer, [72] W. Hu, D. Xie, Z. Fu, W. Zeng, and S. Maybank, “Semantic-based
vol. 40, no. 6, pp. 60–66, Jun. 2007, doi: 10.1109/MC.2007.211. surveillance video retrieval,” IEEE Trans. Image Process., vol. 16, no. 4,
[52] J. N. K. Liu, M. Wang, and B. Feng, “iBotGuard: An internet-based pp. 1168–1181, Apr. 2007, doi:10.1109/TIP.2006.891352.
intelligent robot security system using invariant face recognition against [73] L. Snidaro, R. Niu, G. L. Foresti, and P. K. Varshney, “Quality-based
intruder,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 1, fusions of multiple video sensors for video surveillance,” IEEE Trans.
pp. 97–105, Feb. 2005, doi:10.1109/TSMCC.2004.840051. Syst., Man, Cybern. – Part B: Cybern., vol. 37, no. 4, pp. 1044–1051,
[53] H. Liu, O. Javed, G. Taylor, X. Cao, and N. Haering, “Omni-directional Aug. 2007, doi: 10.1109/TSMCB.2007.895331.
surveillance for unmanned water vehicles,” presented at the 8th Int. [74] P. K. Atrey, M. S. Kankanhalli, and R. Jain, “Timeline-based informa-
Workshop Vis. Surveill., Marseilles, France, Oct. 2008. tion assimilation in multimedia surveillance and monitoring systems,” in
[54] I. Pavlidis, V. Morellas, P. Tsiamyrtzis, and S. Harp, “Urban surveillance Proc. 3rd Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005,
systems: From the laboratory to the commercial world,” Proc. IEEE, pp. 103–112.
vol. 89, no. 10, pp. 1478–1497, Oct. 2001, doi: 10.1109/5.959342. [75] B. Hardian, “Middleware support for transparency and user control in
[55] J. Krikke, “Intelligent surveillance empowers security analysts,” IEEE context-aware systems,” presented at the 3rd Int. Middleware Doctoral
Intell. Syst., vol. 21, no. 3, pp. 102–104, May/Jun. 2006. Symp. (MDS), Melbourne, Australia, Nov.–Dec. 2006.
RÄTY: SURVEY ON CONTEMPORARY REMOTE SURVEILLANCE SYSTEMS FOR PUBLIC SAFETY 515

[76] A. Dore, M. Pinasco, and C. S. Regazzoni, “A bio-inspired learning [93] M. S. Kankanhalli and Y. Rui, “Application potential of multimedia
approach for the classification of risk zones in a smart space,” in Proc. information retrieval,” Proc. IEEE, vol. 96, no. 4, pp. 712–720, Apr.
IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8, doi: 2008, doi: 10.1109/JPROC.2008.916383.
10.1109/CVPR.2007.383440. [94] A. Prati, R. Vezzani, L. Benini, E. Farella, and P. Zappi, “An integrated
[77] E. Blasch and S. Plano, “Proactive decision fusion for site security,” multi-modal sensor network for video surveillance,” in Proc. ACM Int.
in Proc. 8th Int. Conf. Inf. Fusion, Jul. 2005, pp. 1584–1591, doi: Workshop Video Surveill. Sens. Netw., Nov. 2005, pp. 95–102.
10.1109/ICIF.2005.1592044. [95] S. Calderara, R. Cucchiara, and A. Prati, “Multimedia surveillance:
[78] F. Castanedo, M. A. Patricio, J. Garcia, and J. M. Molina, “Robust data Content-based retrieval with multicamera people tracking,” in Proc. ACM
fusion in a visual sensor multi-agent architecture,” in Proc. 10th Int. Int. Workshop Video Surveill. Sens. Netw., Oct. 2006, pp. 95–100.
Conf. Inf. Fusion, Jul. 2007, pp. 1–7, doi: 10.1109/ICIF.2007.4408121. [96] P. K. Atrey, M. S. Kankanhalli, and R. Jain, “Information assimila-
[79] Y.-C. Tseng, T.-Y. Lin, Y.-K. Liu, and B.-R. Lin, “Event-driven tion framework for event detection in multimedia surveillance systems,”
messaging services over integrated cellular and wireless sensor net- ACM Multimedia Syst. J., vol. 12, no. 3, pp. 239–253, Dec. 2006.
works: Prototyping experiences of a visitor system,” IEEE J. Sel. [97] J. Kim, J. Park, K. Lee, K.-H. Baek, and S. Kim, “A portable surveil-
Areas Commun., vol. 23, no. 6, pp. 1133–1145, Jun. 2005, doi: lance camera architecture using one-bit motion detection,” IEEE Trans.
10.1109/JSAC.2005.845623. Consum. Electron., vol. 53, no. 4, pp. 1254–1259, Nov. 2007, doi:
[80] J.-S. Lee, “A petri net design of command filters for semiautonomous 10.1109/TCE.2007.4429209.
mobile sensor networks,” IEEE Trans. Ind. Electron., vol. 55, no. 4, [98] L. Havasi, Z. Szlavik, and T. Sziranyi, “Detection of gait charac-
pp. 1835–1841, Apr. 2008, doi: 10.1109/TIE.2007.911926. teristics for scene registration in video surveillance system,” IEEE
[81] E. Norouznezhad, A. Bigdeli, A. Postula, and B. C. Lovell, “A high res- Trans. Image Process., vol. 16, no. 2, pp. 503–510, Feb. 2007, doi:
olution smart camera with GigE vision extension for surveillance appli- 10.1109/TIP.2006.88839.
cations,” in Proc. Second ACM/IEEE Int. Conf. Distrib. Smart Cameras, [99] Y. Huang, X. Ao, Y. Li, and C. Wang, “Multiple biometrics system based
Sep. 2008, pp. 1–8, doi: 10.1109/ICDSC.2008.4635711. on DavinCi platform,” in Proc. Int. Symp. Inf. Sci. Eng. (ISISE), Dec.
[82] S. Appadwedula, V. V. Veeravalli, and D. L. Jones, “Energy-efficient 2008, pp. 88–92, doi: 10.1109/ISISE.2008.163.
detection in sensor networks,” IEEE J. Sel. Areas Commun., vol. 23, [100] L.-Q. Xu, “Issues in video analytics and surveillance systems: Re-
no. 4, pp. 693–702, Apr. 2005, doi: 10.1109/JSAC.2005.843536. search/prototyping vs. applications/user requirements,” in Proc. IEEE
[83] A. Maier, B. Rinner, W. Schriebl, and H. Schwabach, “Online multi- Conf. Adv. Video Signal Based Surveill. (AVSS), Sep. 2007, pp. 10–14,
criterion optimization for dynamic power-aware camera configura- doi: 10.1109/AVSS.2007.4425278.
tion in distributed embedded surveillance clusters,” in Proc. 20th Int. [101] J. A. O. O’Sullivan and R. Pless, “Advances in security technologies:
Conf. Adv. Inf. Netw. Appl. (AINA 2006), Apr., pp. 307–312, doi: Imaging, anomaly detection, and target and biometric recognition,” in
10.1109/AINA.2006.250. Proc. IEEE/MTT-S Int. Microw. Symp., Jun. 2007, pp. 761–764, doi:
[84] H. Liu, X. Jia, P.-J. Wan, C.-W. Yi, S.-K. Makki, and N. Pissnou, 10.1109/MWSYM.2007.380051.
“Maximizing lifetime of sensor surveillance systems,” IEEE/ACM [102] H. Gupta, X. Cao, and N. Haering, “Map-based active leader-follower
Trans. Netw., vol. 15, no. 2, pp. 334–345, Apr. 2007, doi: surveillance system,” presented at the Workshop Multi-Camera Multi-
10.1109/TNET.2007.892883. Modal Sens. Fusion Algorithms Appl. (M2SFA2), Marseille, France,
[85] Y. Imai, Y. Hori, and S. Masuda, “Development and a brief evaluation Oct. 2008.
of a web-based surveillance system for cellular phones and other mo- [103] GE Security website. (2009). [Online]. Available: http://www.gesecurity.
bile computing clients,” in Proc. Conf. Hum. Syst. Interact., May 2008, com/portal/site/GESecurity
pp. 526–531, doi: 10.1109/HSI.2008.4581494. [104] ObjectVideo website. (2009). [Online]. Available: http://www.
[86] V. A. Petrushin, O. Shakil, D. Roqueiro, G. Wei, and A. V. Gershman, objectvideo.com/company/
“Multiple-sensor indoor surveillance system,” in Proc. 3rd Can. Conf. [105] IOImage website. (2009). [Online]. Available: http://www.ioimage.com/
Comput. Robot Vis., Jun. 2006, p. 40, doi:10.1109/CRV.2006.50. [106] RemoteReality website. (2009). [Online]. Available: http://www.
[87] P. Korshunov and W. T. Ooi, “Critical video quality for distributed au- remotereality.com/
tomated video surveillance,” in Proc. 13th Annu. ACM Int. Conf. Multi- [107] PointGrey webiste. (2009). [Online]. Available: http://www.ptgrey.com/
media, Nov. 2005, pp. 151–160.
[88] A. May, J. Teh, P. Hobson, F. Ziliani, and J. Reichel, “Scalable video
requirements for surveillance systems,” IEE Intell. Surveill. Syst., pp. 17–
20, Feb. 2004.
[89] A. Avritzer, J. P. Ros, and E. Weyuker, “Reliability testing of rule-
based systems,” IEEE Softw., vol. 13, no. 5, pp. 76–82, Sep. 1996, doi:
10.1109/52.536461. Tomi D. Räty received the Ph.D. degree in informa-
[90] K. Shafique, F. Guo, G. Aggarwal, Z. Rasheed, X. Cao, and N. Haering, tion processing science from the University of Oulu,
“Automatic geo-registration and inter-sensor calibration in large sen- Oulu, Finland, in 2008.
sor networks,” in Smart Cameras. New York: Springer-Verlag, 2009, He is currently a Senior Research Scientist and
pp. 245–257. a Team Leader of the Software Platforms Team at
[91] C. Caricotte, X. Desurmont, B. Ravera, F. Bremond, J. Orwell, S. A. Ve- VTT Technical Research Centre of Finland, Oulu.
lastin, J. M. Obodez, B. Corbucci, J. Palo, and J. Cernocky, “Toward His research interests include surveillance systems,
generic intelligent knowledge extractions from video and audio: The model-based testing, network monitoring, software
EU-funded CARETAKER project,” in Proc. Inst. Eng. Technol. Conf. platforms, and middleware. He is the author or coau-
Crime Secur., Jun. 2006, pp. 470–475. thor of more than 20 papers published in various
[92] S. Fleck and W. Strasser, “Smart camera based monitoring system and conferences and journals.
its application to assisted living,” Proc. IEEE, vol. 96, no. 10, pp. 1698– Dr. Räty has served as a Reviewer for IEEE TRANSACTIONS ON MOBILE
1714, Oct. 2008, doi:10.1109/JPROC.2008.928765. COMPUTING and in several conferences.