Distributed Compositional Operations for
Aggregation and Visualization of Cyberspace Data
PI: K. Ravindran
City University of New York
Contact information: Email: ravi@cs.ccny.cuny.edu; Ph.: 212-650-6218.
TASK SUMMARY
The proposed project deals with designing composition operators for data aggregation and vi-
sualization in a geographically distributed information network. The composition functions are
prescribed by the application to act upon the data collected from the external environment. The
component data may pertain to a certain geographic area being observed and/or disparate sub-
systems being controlled. The fusion of a set of such data components for dissemination purposes
involves applying logical operations on these data to yield a high level representation of the exter-
nal phenomenon being studied. An example is the assessment of terrorist threat level in a certain
geographic area based on the monitored communications, the social demographics, the time-frame
in question, and the presence of high-profile targets. For such complex application domains, the
existing models of using simple syntactic rules for data aggregation are not sufficient. Instead,
multiple interpretations of the low level data should be composable for analysis purposes. A visual
representation of the composed data will also allow the human commanders to aid more complex
compositions of the data. In this project we propose semantics-based operators that can be in-
stalled at aggregation nodes in order to customize an application-specific interpretation of the raw
data. Such data aggregation nodes are part of an overlay tree set up over the distributed information
network, working as application proxies to drive the necessary operations on data collected
from the external environment. The semantics-aided data aggregation can improve the situational
awareness of applications (be it in military or commercial or industrial settings), in comparison to
the currently prevalent syntactic rule-based models of processing data.
We shall develop the proxy-based overlay architecture for data aggregation, with a set of re-
lational operators that can be applied on the target data items. Both the spatial and temporal
characteristics of data can be incorporated in the fusion operators that are installed at the proxy
nodes of the aggregation tree. The formal properties of the fusion operators will be identified,
and a prototype implementation of the fusion system will be undertaken. GUIs will be developed
that allow the human users to install customized functions and operators in the proxy nodes and
visualize the data in different ways. Case studies of applications will be undertaken for analyzing
‘denial of service’ attacks on distributed networks and for application-specific QoS control with
dashboard-like visual interfaces, to demonstrate the benefits of our semantics-based approach.
TECHNICAL APPROACH
We begin with a generalized structure of a geographically distributed system maintaining different
data components that are related. It provides the basis for our proposed overlay tree based data
aggregation using application-supplied fusion operators.
Figure 1: Structure of a geographically distributed information network
Since an affected subsystem impacts the visible behavior of a DCS in some way, we need to
instrument mechanisms for detecting deviations from the expected behaviors. To enable construc-
tion of these mechanisms, a DCS should be modularized in such a way that the overall behavior of
interest can be inferred from the easily controllable and/or observable behavior of system compo-
nents in various geographic areas. The high-level inference involves aggregating the data collected
through the various components.
Figure 2: Tree-structured overlay for semantics-driven data fusion
The criteria used for data aggregation need to be embodied into the functions placed in the
overlay nodes. These functions act as proxies for the application, in that a projection of the data
processing needs in a certain geographic area is encapsulated into the overlay function serving that
area.
known before-hand at the start of a data aggregation procedure. In the absence of comprehensive
analytical models to capture the relations between the various data components, heuristics-based
models offer effective means of relating them over specific operating regions of the system. This in
turn calls for an active participation of human users during the event inferencing over a distributed
information network.
User participation requires an ability to reconfigure event prescriptions as the partially aggre-
gated data propagates upstream along the overlay tree. In an example of target tracking over a
terrain, the system may initially look for certain simple patterns in the terrain images. When these
patterns appear in the images, there may be a need to examine other image-level attributes of
the terrain, such as the patterns induced by, say, dense clouds over the terrain, to delineate the
suspicious objects from the aberrations caused by other environmental parameters. Thus, we need
user-level interface tools for prescribing data aggregations, say, with visual representations of the
meta-relationships between data elements.
Figure 3: A schema for on-tree event aggregation
An aggregation of events at various nodes in the tree typically enlarges the time-scale of changes
in the resulting macro-level data. An example is to determine if there is a sustained lack of morale
in the deployed soldier battalion, based on the observations of spatially separated smaller groups.
The overall loss of morale is then a maximum of the per-group observations. Since the ‘max’
operator selects the highest of a set of rapidly fluctuating per-group morale levels at any given time,
the composite metric varies slowly in comparison. Consider another example, namely, vehicular
networks. Here, the traffic congestion on a given route is the maximum of the reported congestion
levels in the various stretches of roads along that route.
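The 'max' composition over an overlay tree can be sketched as follows. This is a minimal illustration, not the proposal's implementation: the `OverlayNode` class, the `aggregate` function, the node names, and the congestion values are all hypothetical, and the operator is passed in as a parameter so that other compositions can be substituted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class OverlayNode:
    """One node of a hypothetical aggregation tree.

    Leaves carry a locally observed value; interior nodes carry none and
    combine whatever their children report.
    """
    name: str
    reading: Optional[float] = None
    children: List["OverlayNode"] = field(default_factory=list)


def aggregate(node: OverlayNode, op: Callable[[List[float]], float] = max) -> float:
    """Fold the readings of a subtree bottom-up with the operator `op`."""
    if not node.children:
        if node.reading is None:
            raise ValueError(f"leaf {node.name!r} has no reading")
        return node.reading
    return op([aggregate(child, op) for child in node.children])


# Made-up congestion levels (0 = free flowing, 1 = blocked) on three stretches.
route = OverlayNode("route", children=[
    OverlayNode("north", children=[
        OverlayNode("stretch-1", 0.2),
        OverlayNode("stretch-2", 0.7),
    ]),
    OverlayNode("south", children=[OverlayNode("stretch-3", 0.4)]),
])
route_congestion = aggregate(route)  # congestion of the worst stretch
```

Because each interior node only needs the partial results of its children, the same fold can run at the proxy nodes themselves rather than at a central site.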
A domain-specific interpretation of the events in different regions cannot be adequately captured
with the standard mathematical operators of aggregation — as argued in [14]. For example, the
effect of a morale loss in one group of soldiers on the overall battle-readiness of the battalion
spanning the adjoining regions cannot be expressed through simple syntactic connectives. Instead,
the role of the group in the overall battle plan needs to be taken into account. This motivates the
need for semantic knowledge in interpreting events.
Figure 4: Functional view of mapping feature spaces to objects
$M_i : F_i \times L_i \longrightarrow O$,
where $L_i$ is a set of logical formulas applied on the feature values instantiated for $F_i$. See Figure 4
for illustration. $M_i$ often employs ‘statistical pattern recognition’ techniques to obtain a functional
view of the data space of $F_i$ sensed by $V_i$ [12], as captured through the logical formulas $L_i$.
Consider an example of detecting the incidence of a disease outbreak, say, Malaria, in a geographic
area where troops are deployed. Here, $F_i$ may depict the input data features that describe the
conduciveness of mosquito breeding, such as water stagnancy and land marshiness, atmospheric
temperature, and altitude of the terrain, whereas $O$ depicts the various types of Malaria outbreaks
and their intensity levels. The algorithmic procedure $M_i$ may use, say, the water stagnancy and
marshiness as the primary features to first determine the possibility of Malaria outbreaks, such
as a rule clause ‘stagnancy’ > 6 days; thereafter, the atmospheric temperature and the population
demographics may be used to distinguish different types of Malaria. $L_i$ is a set of such clauses.
These clauses can be represented as nodes of an aggregation tree.
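A rule-clause mapping of this kind might look like the sketch below. Only the 6-day stagnancy bound is taken from the text; the feature names, the marshiness and temperature thresholds, and the output labels are placeholders invented for illustration and carry no epidemiological meaning.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Clause = Callable[[Features], bool]

# Hypothetical clause set L_i. Only the 6-day stagnancy bound comes from the
# text; the marshiness and temperature thresholds are made-up placeholders.
PRIMARY: List[Clause] = [
    lambda f: f["stagnancy_days"] > 6,
    lambda f: f["marshiness"] > 0.5,
]
REFINEMENTS: List[Tuple[str, Clause]] = [
    ("outbreak-type-A", lambda f: f["temperature_c"] >= 25.0),
    ("outbreak-type-B", lambda f: f["temperature_c"] < 25.0),
]


def classify(features: Features) -> str:
    """Sketch of M_i: apply the clauses in L_i to F_i and return an object in O."""
    # Primary features decide whether an outbreak is possible at all.
    if not all(clause(features) for clause in PRIMARY):
        return "no-outbreak"
    # Secondary features then distinguish between outbreak types.
    for label, clause in REFINEMENTS:
        if clause(features):
            return label
    return "unclassified-outbreak"
```

Keeping each clause as a separate callable mirrors the idea that clauses can sit at individual nodes of an aggregation tree and be replaced independently.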
algorithm $o_i = M_i(F_i, L_i)$ has $|F_i| \ll |F^*|$ and $|L_i| \ll |L^*|$. Though the non-represented features
$F^* - F_i$ are deemed as less important by $M_i$, they do have some impact on the ability of $M_i$ to
distinguish one object from another in certain value regions of the missing feature space.
We then say that $\| o_i - o^* \|$ is a measure of the accuracy of the device $V_i$. The uncertainty in the
detection of an object by $V_i$ is captured by a confusion probability parameter $p_i$, i.e., the probability
that an object $o_i$ reported by $V_i$ indeed matches the actual physical world phenomenon $o^*$
within an accuracy level, where $0.5 \ll p_i < 1.0$.
Voting among such replica devices provides an overall confidence level $\Theta$ that is higher than
the per-device confidence level in the system, i.e., $\max(\{p_i\}_{i=1,2,\cdots,N}) < \Theta < 1.0$. This algorithmic
requirement is expressed by the mathematical relation:
$$\left(1 - \binom{N-1}{K_1}\,[1 - p_i(X_j)]^{K_1+1-K_2}\right) > \Theta, \qquad (1)$$
where $K_1$ and $K_2$ are the number of consenting and dissenting votes on a data $X_j \in O$ proposed
by $V_i$, assuming that devices have the same capability for object classification. For example,
$p_i = 0.85$ and $N = 10$ can achieve a confidence level of 98% with replica voting.
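The qualitative claim that voting lifts confidence above the per-device level can be checked numerically with a much simpler model than relation (1). The sketch below is not that relation: it assumes a plain strict-majority vote among replicas that are independently correct with the same probability, and the function name `majority_confidence` is hypothetical.

```python
from math import comb


def majority_confidence(p: float, n: int) -> float:
    """Probability that a strict majority of n independent replicas is correct.

    Each replica is assumed to report the right object with the same
    probability p, independently of the others.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("need at least one replica")
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(need, n + 1))


single = majority_confidence(0.85, 1)  # one device: just its own p
voted = majority_confidence(0.85, 9)   # nine replicas voting
```

Under these assumptions the voted confidence exceeds the per-device value whenever p is above 0.5, which is the regime the text considers. Correlated failures, which the independence assumption ignores, would weaken the gain.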
The users who disseminate the data are part of the voting functionalities. The computational
aspects of the voting can be incorporated as part of a GUI in the data aggregation software tools.
The enforcement of timeliness constraints requires knowledge of the overlay tree topology and the
data delays incurred in the various fusion path segments.
4.2 Event predicates
In general, an event may be represented as a condition on the data components $O$ collected from
the external environment. An event is said to occur when a user-prescribed condition on the data
components holds, i.e., a predicate $L'(O) = \text{true}$, where $L'(\cdots)$ is a logical formula applied on the
data space $O$.
$L'$ is an ‘applicative’ function that maps the observed value of data to a boolean result (such as
>, <, =, +, · · ·). The functions $\{L'\}$ may be hooked onto the computation as ‘plug-ins’ at the
on-tree proxy nodes that can be invoked by the data fusion procedures. In comparison with the
work in [3], our notion of predicates blends the notion of ‘flow of real-time’ and ‘spatial spreadout’
as part of a predicate formulation.
The condition $L'(\cdots)$ is supplied by a user to the data fusion procedures through a GUI. In
a way, we carry out event filtering, wherein an event filter is a programmable selection criterion
for classifying or selecting data sets from the raw data collected from the environment. The filter
conditions can be prescribed in a high-level declarative language that is compiled into the data
fusion portion of the system.
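One way to picture predicates installed as plug-ins at a proxy node is the sketch below. The `ProxyNode` class, its `install` and `evaluate` methods, and the two example events are hypothetical names chosen for illustration; the proposal does not prescribe this interface or a particular filter language.

```python
from typing import Callable, Dict, Sequence

Predicate = Callable[[Sequence[float]], bool]


class ProxyNode:
    """Hypothetical on-tree proxy node holding user-installed event predicates."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Predicate] = {}

    def install(self, event: str, predicate: Predicate) -> None:
        """Register (or replace) the predicate that defines `event`."""
        self._plugins[event] = predicate

    def evaluate(self, samples: Sequence[float]) -> Dict[str, bool]:
        """Apply every installed predicate to the collected samples."""
        return {event: bool(pred(samples)) for event, pred in self._plugins.items()}


node = ProxyNode()
# Two made-up filter conditions over a window of link-utilization samples.
node.install("HIGH_LOAD", lambda xs: sum(xs) / len(xs) > 0.8)
node.install("ANY_OUTAGE", lambda xs: min(xs) == 0.0)
result = node.evaluate([0.9, 0.95, 0.0, 1.0])
```

Since `install` overwrites an existing entry, a user could redefine an event while partially aggregated data is still moving up the tree, which is the kind of reconfiguration the GUI is meant to support.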
$$\mathrm{evnt\_spec}(\mathrm{FIRE}) \equiv \left( (\mathrm{avg}(\mathrm{loc}(p)) < T_1) \wedge \left( \frac{d\{\mathrm{loc}(p)\}}{dt} > T_2 \right) \right).$$
Suppose the missile system which was readied for firing needs to be pulled back when $p$ moves to
a distance beyond a threshold $T_1'$ ($> T_1$), which implies that $\frac{d\{\mathrm{loc}(p)\}}{dt} > 0$, to avoid a firing (see footnote 4).
This relation may be expressed as:
$$\mathrm{evnt\_spec}(\mathrm{NOFIRE}) \equiv \left( (\mathrm{avg}(\mathrm{loc}(p)) > T_1') \wedge \left( \frac{d\{\mathrm{loc}(p)\}}{dt} > 0 \right) \right).$$
The radar device that samples loc(p) is embedded into the snapshot-taking mechanism which filters
the timed samples from the radar.
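The two predicates can be evaluated over a window of timed radar samples roughly as follows. This is a sketch under stated assumptions: samples are (time, distance) pairs with at least two distinct timestamps, the derivative is approximated by a finite difference across the window, and the threshold values in the example are arbitrary.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (time, distance of p); units are left open


def avg(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def rate(samples: Sequence[Sample]) -> float:
    """Finite-difference estimate of d{loc(p)}/dt over the snapshot window."""
    (t0, x0), (t1, x1) = samples[0], samples[-1]
    return (x1 - x0) / (t1 - t0)


def fire(samples: Sequence[Sample], t1: float, t2: float) -> bool:
    """evnt_spec(FIRE): average distance below T1 and rate of change above T2."""
    locs = [x for _, x in samples]
    return avg(locs) < t1 and rate(samples) > t2


def nofire(samples: Sequence[Sample], t1_prime: float) -> bool:
    """evnt_spec(NOFIRE): average distance above T1' while p is moving away."""
    locs = [x for _, x in samples]
    return avg(locs) > t1_prime and rate(samples) > 0


# Arbitrary thresholds with T1' > T1, giving the hysteresis band.
T1, T1_PRIME, T2 = 10.0, 12.0, -5.0
approaching = [(0.0, 9.0), (1.0, 8.0), (2.0, 7.0)]
receding = [(0.0, 12.0), (1.0, 13.0), (2.0, 14.0)]
in_band = [(0.0, 10.5), (1.0, 11.0), (2.0, 11.5)]
```

A window whose average lies between the two thresholds, such as `in_band`, triggers neither predicate, so the system keeps its previous state instead of toggling.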
Consider the earlier example of determining the ‘inference reliability’ parameter for a bank of
sensors that detect enemy movements in a battle terrain. A property of this surveillance function
may be prescribed as:
$$\mathrm{evnt\_spec}(\mathrm{SENREL}) \equiv \left( (N_s > 10) \wedge \left( \frac{N_g}{N_s} > 0.7 \right) \right),$$
where $N_s$ is the total number of sensors in the bank and $N_g$ is the number of non-faulty sensors.
The underlying data fusion procedures determine $N_g$ from a knowledge of the set of sensors in
the bank (such as ‘majority voting’ among sensors), which allows the evaluation of predicates.
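A minimal sketch of evaluating this predicate is given below. How the number of non-faulty sensors is determined is left open by the text; here it is assumed, purely for illustration, that the sensors agreeing with the most common report count as non-faulty. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def count_non_faulty(reports: Sequence[Hashable]) -> int:
    """Estimate N_g as the number of sensors agreeing with the most common report."""
    if not reports:
        return 0
    _, votes = Counter(reports).most_common(1)[0]
    return votes


def senrel(reports: Sequence[Hashable],
           min_sensors: int = 10,
           min_ratio: float = 0.7) -> bool:
    """evnt_spec(SENREL): (N_s > 10) and (N_g / N_s > 0.7)."""
    n_s = len(reports)
    n_g = count_non_faulty(reports)
    # The size check comes first, so an empty bank never reaches the division.
    return n_s > min_sensors and n_g / n_s > min_ratio
```

For example, a bank of twelve sensors in which nine report the same observation satisfies the predicate (ratio 0.75), while eight out of twelve does not.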
With the above form of predicates loaded into the data fusion GUI, the data fusion procedure
computes the applicative functions ‘avg(···)’/‘$\frac{d(\cdots)}{dt}$’ and ‘<’/‘$\frac{(\cdots)}{(\cdots)}$’ on the data collected from
leaf nodes, namely, loc(p) and $N_s$/$N_g$ respectively. This in turn allows the evaluation of predicates:
evnt_spec(FIRE) and evnt_spec(SENREL), as the case may be.
Footnote 4: That $T_1' > T_1$ implies a ‘hysteresis’ during increasing trends of loc(p), to avoid ‘chattering’ of the missile system.
with one another (unlike benign failures which are statistically independent of one another) [17],
the temporal and/or spatial relationships between the various observed symptoms can be easily
gleaned using a tree-like visual representation of the relationships. For instance, the loss of a
communication link in the network can be correlated with a reduction in the data arrival rate at an
end-point (such as the events ey and ex in Figure 1). Here, the link will appear as a lower node in
the tree and the data rate will appear as an upper node. All the causal events that can impact the
data arrival rate (including the link availability) will then be represented as lower nodes connected
to the upper node.
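The upper-node and lower-node arrangement described here can be sketched as a small tree of symptoms. The `Symptom` class, the `candidate_causes` function, and the event names are hypothetical; the sketch only shows how observed lower nodes could be read off as candidate explanations for an observed upper node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Symptom:
    """A node in a hypothetical symptom tree; `causes` are its lower nodes."""
    name: str
    observed: bool = False
    causes: List["Symptom"] = field(default_factory=list)


def candidate_causes(symptom: Symptom) -> List[str]:
    """List observed lower-level events that could explain an observed symptom."""
    if not symptom.observed:
        return []
    found: List[str] = []
    for cause in symptom.causes:
        if cause.observed:
            found.append(cause.name)
            found.extend(candidate_causes(cause))
    return found


# Upper node: reduced data arrival rate. Lower nodes: events that can cause it.
link_loss = Symptom("link-loss", observed=True)
server_overload = Symptom("server-overload", observed=False)
rate_drop = Symptom("data-rate-drop", observed=True,
                    causes=[link_loss, server_overload])
explanations = candidate_causes(rate_drop)
```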
The case study will use a Cisco router-based network testbed with Spirent traffic
analyzers that are available at CUNY. The traffic analyzers can inject arbitrary amounts of data
at selected points in the network to simulate different types of DOS attacks. Delay and bandwidth
sensors will be implanted at the observation points. Where necessary, multiple sensor algorithms
will be installed (say, different bandwidth estimation methods) to enhance the confidence levels in
the event reporting by replica voting.
The goal of the case study will be to assess the effectiveness of visual GUI aided data aggregation
methods in accurately detecting DOS attacks.
Figure 5: A schematic of dashboard-like visual interface for QoS control
user can gauge the delays involved in the reporting of objects. Network monitors will sample the
various resource information (such as bandwidth, links, processing speeds, and battery power) and
aggregate them to update the resource indicator bar.
We shall use the network testbed at CUNY to realize the dashboard interface. Standard image
processing algorithms [?] will be constructed in a layered fashion, and then applied on (publicly
available) target data to establish the feasibility of dashboard-like QoS control.
Figure 6: Software components of semantics-driven data fusion system
6.0 Expertise of project personnel
6.1 Qualifications of PI K. Ravindran: Professor of Computer Science (City College,
CUNY); Ph.D. (Computer Science, 1987, University of British Columbia); research ar-
eas: distributed collaboration systems & protocols, information assurance systems, service-
level management of network infrastructures; managed external grants/contracts of about
$1.1M; has about 90 refereed publications; 17 years of university experience and 5 years
of experience in space and communication industries.
This PI has studied the on-line monitoring and control paradigms for distributed multimedia net-
works and enterprise web server systems. These works involve extensive simulation modeling of the
underlying network and server infrastructures. This expertise will help in the conduct of the AFRL-
HE project. The PI has also worked on coarse granular event management services in distributed
collaboration settings. The PI has extensively worked in the area of replica voting algorithms to
decide among conflicting results involving deterministic input data — partly through summer fel-
lowships at the Air Force Research Lab (Rome, NY) during 2001-07. This expertise will be useful
in the application case studies.
A: STATEMENT OF WORK
TASKS/TECHNICAL REQUIREMENTS
The contractor shall accomplish the following:
A.1 Study models for characterizing the external environment behaviors and how they map into
specific data characteristics;
A.2 Develop the distributed control mechanisms from the high-level specification of data fusion
procedures;
A.3 Develop algorithms for event filtering at the protocol level and for event propagations at the
application interface level;
A.4 Develop designer-friendly GUIs for rapid event compositions and visual evaluation in the
in-house network testbed at CUNY;
A.5 Develop SQL-based and XML-based schemas for specifying event filters and aggregation
rules;
A.6 Carry out application case studies of data aggregation procedures for use in military settings:
i) the analysis of DOS attacks on distributed networks using visual tools, and ii) the study of
dashboard-like visual interfaces for QoS control and the underlying system structures.
The task deliverables A.1-A.6 will be accomplished on a software-level simulation testbed with
real data obtained from AFRL-HE offices.
During the project period, the contractor shall write technical reports outlining the progress of
research work for dissemination of the results (both positive and negative) to the technical liaison
groups in the AFRL-HE. To permit full understanding of the techniques and procedures used in
evolving the protocol testing technology, the reports will include pertinent observations, nature of
technical problems, design methods used, computer algorithms developed, etc. The contractor shall
also make detailed technical presentations on the progress of work at the above site semi-annually
during the project period, and present the completed work (including the demonstration of data
fusion software tools and applications) at the end of the project period. In addition, the contractor
shall provide a demonstration of the ‘in-progress’ project works on the CUNY simulation testbeds
in about 16 months from the start of the project.
The contractor shall also write research articles and technical papers for publication in journals
and conferences for wider dissemination of the results across academic and industrial research
communities in the areas of advanced network architectures. Such publications will carry a citation
to acknowledge the AFRL-HE contract in supporting the published work.
References
[1] S. Chamberlain. Automated Information Distribution in Bandwidth-constrained
Environments. In Proc. Milcom’94, North-Holland pub., 1994.
[2] H. Kopetz and P. Verissimo. Real Time Dependability Concepts. Chap. 16, Distributed
Systems, ed. S. Mullender, Addison-Wesley Publ. Co., 1993.
[4] Y. Yao and J. Gehrke. The Cougar Approach to in-network Query processing in
Sensor Networks. In ACM SIGMOD Record, 2002.
[8] K. Ravindran and Jun Wu. Programming Models for Behavioral Monitoring of Dis-
tributed Information Networks. in proc. Distributed Systems and Real-time Applications,
IEEE-DSRT 2005, Oct.2005.
[9] K. Birman et al. Astrolabe: a publish-subscribe based event processing system.
In Technical Reports, Cornell University, 2002-2005.
[12] D. G. Stork, R. O. Duda, and P. E. Hart. Pattern Recognition Systems. Chapter 1.3, Pattern
Classification, 2000.
[13] W. Hu, A. Misra, and R. Shorey. CAPS: Energy-Efficient Processing of Continuous Aggregate
Queries in Sensor Networks. In proc. 4th Intl. conf. on Pervasive Computing and Communi-
cations, IEEE-PerCom’06, pp.190-199, June 2006.
[14] S. Kabadayi, A. Pridgen, and C. Julien. Virtual Sensors: Abstracting Data from Physical
Sensors. In Technical Report 2006-01, University of Texas at Austin, 2006.
[15] K. Stranc. Airborne Networking. In Presentation, MITRE Corporation, Public Release Ref.#
04-0941, 2004.
[16] R. Clark, E.D. Jensen, A. Kanevsky, J. Maurer, T. Wheeler, Y. Zhang, D. Wells, T. Lawrence,
and P. Hurley. An Adaptive, Distributed Airborne Tracking System. In proc. IEEE WPDRTS,
vol.1586 of LNCS, 1999.
[18] J. Jachner, S. Petrack, E. Darmois, and T. Ozugur. Rich Presence: A New User Communica-
tions Experience. In Alcatel Telecommunications Review, Technology White Paper, pp.73-77,
1st quarter 2005.
K. Ravindran
Current Position: Professor of Computer Science
City University of New York (City College), New York (joined in 1996)
Degrees Received: Ph.D (Computer Science), University of British Columbia, Canada, 1987
M.Eng (Computer Science & Automation), Indian Institute of Science, Bangalore, 1978
B.Eng (Electronics), Indian Institute of Science, 1976.
3. Assistant Professor,
Department of Computer Science, Indian Institute of Science (1988).
4. Teaching/Research Assistant,
Department of Computer Science, University of British Columbia (1983 – 1987).
Collaborators: 1. Dr. Kevin A. Kwiat, Air Force Research Laboratory (Rome, NY)
(2002-present) Areas: Information Assurance and Security
2. Dr. K. K. Ramakrishnan, AT&T Research (Florham Park, NJ)
Areas: Rate Control for Video Distribution in Large Multicast Groups
3. Prof. D. Kumar, City College of CUNY (New York, NY)
Areas: Design and Validation of Distributed Protocols
Teaching activities: 1. Instructor for graduate Distributed Systems course
2. Instructor for graduate level and undergraduate level
operating systems and computer networks courses
3. Instructor for undergraduate level Data Structures & Algorithms
Publications relevant to proposal
References
[1] X. Liu, K. Ravindran, and D. Loguinov. A Queuing-theoretic Foundation of Available
Bandwidth Estimation: Single Hop Analysis. accepted for publication in IEEE/ACM
Transactions on Networking, June 2006.
[2] K. Ravindran and Jun Wu. Architecture for Dynamic Protocol-level Adaptation for
Enhancing Network Service Performance. In proc. IEEE/IFIP Conf. on Network Oper-
ations and Management (NOMS’06), Vancouver (Canada), April 2006.
[4] A. Sabbir and K. Ravindran. Concurrency Control for Interactive Sharing of Data
Spaces for Distributed Real-time Collaborations. in proc. of IEEE Intl. Symp. on
Distributed Simulation and Real-time Applications, Montreal (Canada), Oct. 2005.
[5] K. Ravindran and X. Liu. Service-level Management Frameworks for Adaptive Dis-
tributed Network Applications. in Springer Verlag Lecture Notes on Computer Science,
(Service Availability), LNCS 3335, 2005.
[6] K. Ravindran, J. P. Fortin, and X. Liu. Flow Management for QoS-controlled ‘Data
Connectivity’ Provisioning. Accepted for publication in Elsevier Journal of Computer
Communications, Feb. 2005.
[10] K. Ravindran and X. T. Lin. Structural Complexity and Execution Efficiency of Distributed
Application Protocols. In Proc. Conf. on Communication Architectures, Protocols
and Applications, ACM SIGCOMM, San Francisco (CA), pp.160-169, Sept. 1993.
Other significant Publications
References
[1] K. A. Kwiat, K. Ravindran, and P. Hurley. Energy-efficient Replica Voting Mechanisms
for Secure Real-time Embedded Systems. In proc. of Intl. Conf. on World of Wireless
and Mobile Multimedia, WOWMOM’05, Taormina (Italy), June 2005.
[2] K. Ravindran, K. A. Kwiat, and A. Sabbir. Adapting Distributed Voting Algorithms for
Secure Real-time Embedded Systems. In proc. of workshop on Distributed Auto-adaptive
Reconfigurations in Software Systems, ICDCS-DARES’04, Tokyo (Japan), March 2004.