
Distributed Compositional Operations for

Aggregation and Visualization of Cyberspace Data

A Proposal submitted to the

Air Force Research Laboratory


AFRL/RH Human Effectiveness Directorate

Det 1 AFRL/PKHA, Bldg 167


2310 Eighth Street
Wright-Patterson AFB, OH, 45433-7801

BAA-08-02-RH (AFRL/RH FY08 HBCU/MI Set Aside Program)

Contracting POC: Rhonda L. Powderly, (937)656-9046

(Technical Area No. RHC-1: Synthesizing the Dynamics of Cyberspace)


Visual Renderings for Distributive Systems to Characterize
Cyber Attack, Performance, and Vulnerability

Amount: $99,887 Duration: 14 months Start date: August 2008

Principal Investigator

Prof. Kaliappa Ravindran Phone number: (212)650-6218


Department of Computer Science Fax number: (212)650-6248
The City College of CUNY Email address: ravi@cs.ccny.cuny.edu
160 Convent Avenue
New York, NY 10031.

Authorized Institutional Representative

Regina Masterson, Director Phone number: (212)650-5418


Office of Research Administration Fax number: (212)650-7906
The City College of CUNY Email address: rmaster@scisun.sci.ccny.cuny.edu

Distributed Compositional Operations for
Aggregation and Visualization of Cyberspace Data

PI: K. Ravindran
City University of New York
Contact information: Email: ravi@cs.ccny.cuny.edu; Ph.: 212-650-6218.

TASK SUMMARY

The proposed project deals with designing composition operators for data aggregation and vi-
sualization in a geographically distributed information network. The composition functions are
prescribed by the application to act upon the data collected from the external environment. The
component data may pertain to a certain geographic area being observed and/or disparate sub-
systems being controlled. The fusion of a set of such data components for dissemination purposes
involves applying logical operations on these data to yield a high level representation of the exter-
nal phenomenon being studied. An example is the assessment of terrorist threat level in a certain
geographic area based on the monitored communications, the social demographics, the time-frame
in question, and the presence of high-profile targets. For such complex application domains, the
existing models of using simple syntactic rules for data aggregation are not sufficient. Instead,
multiple interpretations of the low level data should be composable for analysis purposes. A visual
representation of the composed data will also allow the human commanders to guide more complex
compositions of the data. In this project we propose semantics-based operators that can be
installed at aggregation nodes in order to customize an application-specific interpretation of the raw
data. Such data aggregation nodes are part of an overlay tree set up over the distributed
information network, working as application proxies to drive the necessary operations on data collected
from the external environment. The semantics-aided data aggregation can improve the situational
awareness of applications (be it in military or commercial or industrial settings), in comparison to
the currently prevalent syntactic rule-based models of processing data.
We shall develop the proxy-based overlay architecture for data aggregation, with a set of re-
lational operators that can be applied on the target data items. Both the spatial and temporal
characteristics of data can be incorporated in the fusion operators that are installed at the proxy
nodes of the aggregation tree. The formal properties of the fusion operators will be identified,
and a prototype implementation of the fusion system will be undertaken. GUIs will be developed
that allow the human users to install customized functions and operators in the proxy nodes and
visualize the data in different ways. Case studies of applications will be undertaken for analyzing
‘denial of service’ attacks on distributed networks and for application-specific QoS control with
dashboard-like visual interfaces, to demonstrate the benefits of our semantics-based approach.

TECHNICAL APPROACH
We begin with a generalized structure of a geographically distributed system maintaining different
data components that are related. It provides the basis for our proposed overlay tree based data
aggregation using application-supplied fusion operators.

1: Behavioral view of distributed systems


The distributed structure of a system consists of multiple computation nodes (or sub-systems)
that collaborate with one another to carry out an application task. The task execution state is
determined by the spontaneous changes occurring in the external environment and the service-level
requirements prescribed by application entities. The meta-data pertaining to the external
environment and the service-level needs of applications are made available to the control task through
a set of sensors and actuators. Human elements and/or intelligent agents at distinct geographic
locations are part of the distributed control system (DCS), interacting with one or more nodes
to obtain strategic decision support. In a ‘digital battle field’ for instance, the environment may
consist of infrared sensors to collect data about enemy troop and machinery movement in a certain
terrain. The computation subsystems are interconnected by a geographically distributed
information network that makes the relevant pieces of data available to these subsystems at various points
in time, to enable them to carry out the application task [1].

1.1: Modular structuring of a DCS


The behavior of a DCS as a whole may be continually changing, sometimes even falling below acceptable
threshold levels of operations. This may be due to sub-system level components (viz., computa-
tion nodes and/or their interconnections) being forced to operate in a degraded mode because of
undesirable changes occurring in the external environment (say, ’denial of service’ attacks on a
network) and/or component failures. Visible changes in a system behavior mean that the behavior
is measurable in terms of concrete parameters of the system. See Figure 1.
Consider, as an example, sensor devices deployed in the field of operations, say, to detect enemy
movements. These sensors are prone to failures (such as the loss of battery power to sensors).
At a macro level, the reliability of inferences made about enemy movements, which depends on
the accuracy and timeliness of sensor data, is a measurable property of the detection system.
Some form of ‘majority voting’ technique (such as ‘at least 10 sensors in the field have detected
enemy movements’) may be employed to increase the quality of inferences. The decision-making by
commanders, say, about attacks on enemy positions, may itself be based on such a quantitative
assessment of the ‘inference reliability’ (or confidence level) parameter (footnote 1).
Consider the case of a public health surveillance system. When the incidence rate of asthma in a
certain geographic area exceeds a threshold (say, as observed from the number of patient visits to
clinics), the system may dispatch additional medical resources to the area (such as ambulances,
medicine supplies, and nursing staff). The system may also initiate actions that have large time-
scale effects — such as reducing the aerosol concentrations in the atmosphere. A combination of
such corrective actions may be put in effect until the asthma incidence rate falls below the threshold.
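The threshold-triggered response loop described above can be sketched as follows. The threshold value, the weekly-visits data format, and the function name `respond` are illustrative assumptions, not part of any fielded surveillance system.

```python
# Sketch of the surveillance response loop: deploy extra medical
# resources when the incidence rate exceeds a threshold, and keep them
# in effect until the rate falls back below it.

THRESHOLD = 50.0  # patient visits per week that trigger a response


def respond(weekly_visits):
    """Return the weeks in which extra medical resources are deployed.

    Resources stay deployed from the first week the incidence rate
    exceeds THRESHOLD until it falls back below the threshold.
    """
    deployed = False
    active_weeks = []
    for week, visits in enumerate(weekly_visits):
        if visits > THRESHOLD:
            deployed = True       # dispatch ambulances, supplies, staff
        elif deployed:
            deployed = False      # incidence back to normal: withdraw
        if deployed:
            active_weeks.append(week)
    return active_weeks
```

For instance, a visit series of [40, 60, 70, 45, 30] keeps the resources deployed only during the two weeks the rate stays above the threshold.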
Footnote 1: The sensors constitute the ‘resources’ that improve the quality of inference in the battlefield application.

Figure 1: Structure of a geographically distributed information network

Since an affected subsystem impacts the visible behavior of a DCS in some way, we need to
instrument mechanisms for detecting deviations from the expected behaviors. To enable construc-
tion of these mechanisms, a DCS should be modularized in such a way that the overall behavior of
interest can be inferred from the easily controllable and/or observable behavior of system compo-
nents in various geographic areas. The high-level inference involves aggregating the data collected
through the various components.

1.2: Overlay trees


Our aggregation architecture is based on a tree-structured overlay set up over an information net-
work, in which the root node is attached to a data dissemination station and the leaf nodes are
attached to the data collection modules in different geographic regions. An intermediate node in
the tree aggregates the data emanating from its downstream nodes, and this partially aggregated
data propagates up the tree towards the root. See Figure 2.
Suppose, in the earlier example of sensors, it is stipulated that the confidence level in the
reliability of ‘enemy movement’ information in a certain area exceed a certain threshold, say 70%,
in order to initiate a pre-emptive strike on enemy positions. Here the overlay node carries out a
majority voting on the sensor data collected at the leaf nodes. When the number of active sensors
falls below the corresponding ‘majority’ figure for the number of sensors detecting enemy movement
(say, 10), the detection system can no longer supply the information at the critical level
of reliability. The system should then notify a ‘failure’ to the C2 center, so that the
latter can resort to alternate means of inferring enemy movements.
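A minimal sketch of this overlay-node voting step is given below, assuming a simple (sensor_id, active, detected) report format and the stipulated majority figure of 10; these names and the report layout are illustrative, not a prescribed interface.

```python
# Overlay-node majority voting over leaf-level sensor reports, with a
# 'failure' notification when too few sensors remain active.

MAJORITY = 10  # minimum number of detecting sensors required


def aggregate(sensor_reports):
    """Combine leaf-level reports at an overlay node.

    sensor_reports: list of (sensor_id, active, detected) tuples.
    Returns ('ENEMY_MOVEMENT', votes) when at least MAJORITY active
    sensors report a detection, and ('FAILURE', active_count) when too
    few sensors remain active to ever reach that majority.
    """
    active = [r for r in sensor_reports if r[1]]
    votes = sum(1 for r in active if r[2])
    if len(active) < MAJORITY:
        # Not enough working sensors: notify the C2 center so it can
        # fall back to alternate means of inferring enemy movements.
        return ('FAILURE', len(active))
    if votes >= MAJORITY:
        return ('ENEMY_MOVEMENT', votes)
    return ('NO_MOVEMENT', votes)
```

With 12 active sensors of which 11 detect movement, the node reports the movement; with only 8 active sensors it reports a failure instead.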

Figure 2: Tree-structured overlay for semantics-driven data fusion

The criteria used for data aggregation need to be embodied in the functions placed in the
overlay nodes. These functions act as proxies for the application, in that a projection of the data
processing needs in a certain geographic area is encapsulated into the overlay function serving that
area.

1.3: User-level control of data aggregation


A high level inference of events in the cyberspace, say, for situational awareness, may involve
examining many data components, and require capturing the causal and timing relationships
between them. Consider an example of assessing the health and fatigue of soldiers deployed in a
geographic region. The influential parameters are, say, the climatic conditions, social demographics
(of the local population), distance from the nearest base, and the medical facilities. Their
relationship may often not be expressible in a closed form, due to the non-separable ways in which
one parameter may impact the others. In general, only imprecise information may be available in
the system, compounded by the lack of quantitative models of the impact of various parameters.
Often, the occurrence of an event e may influence the prescription of which future events
are possible in the system state in which e occurred. The large problem dimensionalities and
incompleteness in system-level characterizations often make the required event prescriptions not

known beforehand at the start of a data aggregation procedure. In the absence of comprehensive
analytical models to capture the relations between the various data components, heuristics-based
models offer effective means of relating them over specific operating regions of the system. This in
turn calls for an active participation of human users during the event inferencing over a distributed
information network.
User participation requires an ability to reconfigure event prescriptions as the partially aggre-
gated data propagates upstream along the overlay tree. In an example of target tracking over a
terrain, the system may initially look for certain simple patterns in the terrain images. When these
patterns appear in the images, there may be a need to examine other image-level attributes of
the terrain, such as the patterns induced by, say, dense clouds over the terrain, to delineate the
suspicious objects from the aberrations caused by other environmental parameters. Thus, we need
user-level interface tools for prescribing data aggregations, say, with visual representations of the
meta-relationships between data elements.

2: Relationship to current models of data aggregation


Several proposals elsewhere have studied in-network data aggregation, particularly in the context
of sensor networks — such as TinyDB [5] and Cougar [4]. The goal of in-network data aggregation is
to reduce the energy costs that are otherwise incurred when raw data are sent to the disseminating
nodes. These works deal mainly with homogeneous data types, such as all the sensors observing
the environment temperature in a given region. In contrast, our proposal is for heterogeneous data
types, with the rules of aggregation determined by the semantic relationships between various data.
Directed diffusion techniques [6] employ an attribute-based naming of data, which provides a
basis for specifying the relationships between them. But the focus of this work is more on the network
protocols and algorithms for in-network aggregation, and less on exploiting the data semantics in
the aggregation operations. The SensorML language developed elsewhere [7] allows defining the
external characteristics of a physical sensor device using XML-based schemas. This language is
specifically designed for satellite-based sensor data collections; its use for defining high level data
semantics has not been explored.
Virtual Sensors [14] is an abstraction that allows heterogeneous physical sensors to be viewed
uniformly through a canonical interface. It provides a publish-subscribe interface to the sensor
data, for applications to extract the aggregated data they need. This work however focuses more
on network-oriented data and parameters, with the main goal being to reduce the energy of sensor
devices operating in wireless mode.
In comparison with the above works, our proposal differs in two significant respects. First, it
offers a framework to fuse disparate data items based on the semantic relationship between them.
Second, our notion of sensors is far more abstract, in that any data collection system in the external
environment of an application is treated as a (logical) sensor. For instance, a question-answer
session with a soldier in the field can be abstracted as a sensor that collects data about the mental
health and fatigue of the soldier. Thus, our generalized notion of sensors and data fusion goes
beyond the existing notions of physical sensor devices. The meta-model of sensors in our approach
can thus be instantiated in diverse application domains (with domain-specific stub interfaces, of
course) — thereby reducing the system development effort and costs by software re-use.

Figure 3: A schema for on-tree event aggregation

3: Data aggregation in on-tree nodes


We now describe the high level aggregation operations carried out by the on-tree nodes in our
model (footnote 2). There are two reasons for the on-tree aggregation of events as they surface, instead of
aggregating all the events at the root node. First, it enhances the scalability of the event reporting
system when large amounts of data are collected. Second, it enables a faster reaction to the events
by overlay nodes as soon as a composite situation emerges that warrants an action (e.g., responding
to traffic congestion events in a vehicular network).

3.1 Aggregation using syntactic rules


Let Θ1 and Θ2 be the confidence intervals of the data delivered at an overlay node Z from its two
downstream segments. With only a syntactic processing of the two distinct events, a confidence
measure associated with the combined data sent by Z to its upstream node is: min({Θ1, Θ2}).
See Figure 3 for an illustration.
Similarly, other types of aggregation operators can be implemented in Z, such as addition,
maximum, average, median, set union & intersection, selection, and the like. Scalability considerations
require that these operators satisfy the commutativity and associativity properties [11]. These prop-
erties allow an efficient examination of the events arriving asynchronously from various downstream
nodes (by avoiding inter-event synchronizations).
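The syntactic folding of partial aggregates can be sketched as below. The (value, confidence) message format and the function names are illustrative assumptions; the key point is that a commutative, associative operator lets node Z fold in downstream events as they arrive, in any order.

```python
# Syntactic aggregation at an overlay node Z: fold asynchronously
# arriving partial aggregates with a commutative, associative operator.

def fold(op, partials):
    """Fold partial aggregates with operator op, in arrival order."""
    result = partials[0]
    for p in partials[1:]:
        result = op(result, p)
    return result


def min_confidence(a, b):
    """Combine two (value, confidence) pairs; the confidence of the
    merged data is min(Theta1, Theta2), as in the syntactic model.
    Here the values themselves are simply summed for illustration."""
    (v1, t1), (v2, t2) = a, b
    return (v1 + v2, min(t1, t2))
```

Because `min_confidence` is commutative and associative, folding the same set of events in a different arrival order yields the same aggregate.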
Footnote 2: The data aggregation functions in on-tree overlay nodes and the communication functions between overlay nodes are orthogonal aspects.

An aggregation of events at various nodes in the tree typically enlarges the time-scale of changes
in the resulting macro-level data. An example is to determine if there is a sustained lack of morale
in the deployed soldier battalion, based on the observations of spatially separated smaller groups.
The overall loss of morale is then a maximum of the per-group observations. Since the ’max’
operator selects the highest of a set of rapidly fluctuating per-group morale levels at any given time,
the composite metric varies slowly in comparison. Consider another example, namely, vehicular
networks. Here, the traffic congestion on a given route is the maximum of the reported congestion
levels in the various stretches of roads along that route.
A domain-specific interpretation of the events in different regions cannot be adequately captured
with the standard mathematical operators of aggregation — as argued in [14]. For example, the
effect of a morale loss in one group of soldiers on the overall battle-readiness of the battalion
spanning the adjoining regions cannot be expressed through simple syntactic connectives. Instead,
the role of the group in the overall battle plan needs to be taken into account. This motivates the
need for a semantic knowledge in interpreting events.

3.2 Aggregation using semantic knowledge


Information network applications often require abstracted measurements of the diverse environment
phenomena (or events) in various geographic regions. These measurements need to be interpreted
using a semantic relationship between the events (which may take into account the weak con-
sistency and the temporal correlation among events [13]). Typically, the confidence level in the
reporting of a combined event can be increased with a semantic knowledge that interconnects the
two independently reported events.
As an example, consider the detection of a plane (in terms of speed and location) by the devices
in region 1 followed by the detection of a plane by the devices in an adjacent region 2 after a
certain time interval T . If the geographic distance between regions 1 and 2 depicts a flight time
close to T at the given speed, then it is highly likely that the object detected in regions 1 and 2
refers to the same plane. So, when the detection reports from regions 1 and 2 arrive at the overlay
node Z, the latter may aggregate them into a single report with a confidence measure higher than
max({Θ1, Θ2}). The timing correlation in the two reports increases the confidence level of the
combined report to higher than that of the individual reports.
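The semantic correlation rule above can be sketched as follows. The report format, the 10% timing tolerance, and the particular confidence-boost formula are illustrative assumptions; the document only requires that the merged confidence exceed that of the individual reports.

```python
# Semantic aggregation at overlay node Z: merge two detections from
# adjacent regions when the inter-report delay matches the flight time
# implied by the distance and the reported speed.

def correlate(rep1, rep2, distance_km, tolerance=0.1):
    """rep1, rep2: dicts with 'time' (s), 'speed' (km/s), 'conf'.

    Returns a merged report whose confidence exceeds both inputs when
    the timing is consistent with a single plane, else None.
    """
    dt = rep2['time'] - rep1['time']
    flight_time = distance_km / rep1['speed']
    if abs(dt - flight_time) <= tolerance * flight_time:
        conf = max(rep1['conf'], rep2['conf'])
        # semantic knowledge (timing correlation) raises the confidence
        boosted = conf + (1.0 - conf) * 0.5
        return {'object': 'same plane', 'conf': boosted}
    return None  # the reports cannot be semantically linked
```

When the timing does not fit the flight-time estimate, the node falls back to treating the two reports as independent events.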
Where semantic knowledge is used, the aggregation operations on two events may have to
be carried out in a certain order (i.e., the operations may not satisfy the commutativity and/or
associativity properties). Typically, each overlay node may implement the required synchroniza-
tion between the arrival of various data items from its downstream nodes, based on the ordering
relationship between the data items — such as the causal relationship between events.

3.3 Replica voting based on data classification


A computer algorithm or system sampling an environment parameter is less-than-100% certain
about the accuracy of the indicator due to the large dimensionality of input data (e.g., a question-
answer session with soldiers to reason about their morale). To increase the confidence level on
the outcomes in a quantifiable way, we resort to replica voting among the various computational
modules. Here, voters are the computational evaluation algorithms that are replicated to observe
the same external parameter (or, object).

Figure 4: Functional view of mapping feature spaces to objects

We denote the ‘data classifier’ implemented by a voter algorithm Vi as Mi(Fi, Li)|i=1,2,···,N,
where Mi is an algorithmic procedure operating on a feature set Fi to describe the parameter
values of a class O. In the earlier example to assess the morale of soldiers, a question-answer
session may pose 50 questions and examine their answers (so N = 50). The outcome may be the
determination of morale as one of 4 levels: EXCELLENT, HIGH, MEDIUM, and LOW — which
constitutes O. Typically, the computational procedure Mi may employ some form of Markov
Modeling for object classifications [10].
For notational purposes, a canonical structure of Mi may take the functional form:

    Mi : Fi × Li −→ O,

where Li is a set of logical formulas applied on the feature values instantiated for Fi. See Figure 4
for illustration. Mi often employs ‘statistical pattern recognition’ techniques to obtain a functional
view of the data space of Fi sensed by Vi [12], as captured through the logical formulas Li.
Consider an example of detecting the incidence of a disease outbreak, say, Malaria, in a geographic
area where troops are deployed. Here, Fi may depict the input data features that describe the
conduciveness of mosquito breeding: such as water stagnancy and land marshiness, atmospheric
temperature, and altitude of the terrain; whereas, O depicts the various types of Malaria outbreaks
and their intensity levels. The algorithmic procedure Mi may use, say, the water stagnancy and
marshiness as the primary features to first determine the possibility of Malaria outbreaks: such
as a rule clause ‘stagnancy’ > 6 days; thereafter, the atmospheric temperature and the population
demographics may be used to distinguish different types of Malaria — Li is a set of such clauses.
These clauses can be represented as nodes of an aggregation tree.
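A classifier of the form Mi : Fi × Li −→ O can be sketched as below for the Malaria example. The feature names, the thresholds, and the outcome labels are assumptions made for illustration, not a validated epidemiological model.

```python
# Illustrative data classifier Mi: an ordered set of logical clauses
# (Li) applied to feature values (Fi), yielding an outcome class in O.

def classify(features):
    """features: dict with 'stagnancy_days', 'temperature_c', 'altitude_m'.

    Returns one of the outcome classes in O. The first clause whose
    condition holds determines the classification.
    """
    clauses = [
        # Li: rule clauses over the feature values, most specific first
        ('OUTBREAK_LIKELY', lambda f: f['stagnancy_days'] > 6
                                      and 20 <= f['temperature_c'] <= 35),
        ('OUTBREAK_POSSIBLE', lambda f: f['stagnancy_days'] > 6),
    ]
    for outcome, clause in clauses:
        if clause(features):
            return outcome
    return 'NO_OUTBREAK'
```

Each clause can equally be viewed as a node of an aggregation tree, with the primary-feature clause evaluated before the refining ones.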

3.4 Confidence levels associated with data


Suppose o* = M*(F*, L*) is a ‘data classifier’ that detects an object o* ∈ O with 100% certainty.
Here, F* is an exhaustive enumeration of features in terms of which O can be completely described
using an appropriate set of logical formulas L*. Relative to this ideal case, a data classification
algorithm oi = Mi(Fi, Li) has |Fi| ≪ |F*| and |Li| ≪ |L*|. Though the non-represented features
F* − Fi are deemed as less important by Mi, they do have some impact on the ability of Mi to
distinguish one object from another in certain value regions of the missing feature space.
We then say that ‖oi − o*‖ is a measure of the accuracy of the device Vi. The uncertainty in the
detection of an object by Vi is captured by a confusion probability parameter pi, i.e., the probability
that an object oi reported by Vi indeed matches with the actual physical world phenomenon o*
within an accuracy level ε — where 0.5 < pi < 1.0.
Voting among such replica devices provides an overall confidence level Θ that is higher than
the per-device confidence level in the system, i.e., max({pi}i=1,2,···,N) < Θ < 1.0. This algorithmic
requirement is expressed by the mathematical relation:
requirement is expressed by the mathematical relation:
    (1 − C(N−1, K1) · [1 − pi(Xj)]^(K1+1−K2)) > Θ,        (1)

where K1 and K2 are the numbers of consenting and dissenting votes on a data item Xj ∈ O proposed
by Vi — assuming that the devices have the same capability for object classification. For example,
pi = 0.85 and N = 10 can achieve a confidence level of 98% with replica voting (footnote 3).
The users who disseminate the data are part of the voting functionalities. The computational
aspects of the voting can be incorporated as part of a GUI in the data aggregation software tools.
The enforcement of timeliness constraints requires knowledge of the overlay tree topology and the
data delays incurred in the various fusion path segments.

4: Specification of semantic knowledge in middleware


4.1 Temporal and spatial dimension of information elements
An information element Ii|i=1,2,··· may assume a value vali(t, s) at time t in a spatial location s.
Certain changes in vali ’s may indicate a significant deviation in the external environment relative
to its current operating point, which may be deemed as an ‘event’ [2].
We may prescribe a time scale and a spatial scale (εi, σi) over which the changes in Ii occur: say,
εi is an average time interval between the changes and σi is an average area of the region affected
by the changes. For example, a noticeable change in aircraft coordinates provided by a tracking
radar may not occur any more frequently than once in 5 sec. The ε and σ parameters may determine
the sampling parameters for the sensor data emanating from the external environment.
Suppose an event pertaining to Ii is observed to occur at time t and at a location s. A data
fusion procedure starts at time t′ > t in a region s′ that is close to s. The fusion procedure should
complete within a time duration β > 0, where (t′ + β) < (t + εi). In other words, the fusion
procedure should complete before the next change in Ii occurs — on the average (otherwise, changes
occur faster than they can be handled). The ε parameter depicts the time scale of event occurrences,
whereas the σ parameter depicts the geographic area of impact of events.
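The per-element timing constraint can be checked directly; the parameter names below (t_prime for t′, epsilon_i for εi) mirror the notation above and the function name is an illustrative assumption.

```python
# Check that a fusion round started at t_prime, for a change observed
# at t, can finish within beta and before the next expected change of
# the information element at (t + epsilon_i).

def fusion_deadline_ok(t, t_prime, beta, epsilon_i):
    """True when the fusion procedure can complete before the next
    change in the information element occurs, on average."""
    return t_prime > t and (t_prime + beta) < (t + epsilon_i)
```

A fusion round of duration 2 s launched 1 s after an event with a 5 s change interval meets the constraint; one of duration 5 s does not.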
Footnote 3: Replica voting on fuzzy data may be viewed as a knowledge-based ‘data aggregation’ procedure executed at a leaf node. Here, the goal is to generate a single event notification with a base confidence measure that is higher than pi — c.f. Equation (1). The semantic knowledge is that when two devices report the same datum with confidence levels of pi1 and pi2, the leaf node can accept the datum with a confidence level higher than min({pi1, pi2}).

4.2 Event predicates
In general, an event may be represented as a condition on the data components O collected from
the external environment. An event is said to occur when a user-prescribed condition on the data
components holds, i.e., a predicate L0(O) = true, where L0(···) is a logical formula applied on the
data space O.
L0 is an ‘applicative’ function that maps the observed values of data to a boolean result (using
operators such as >, <, =, +, ···). The functions {L0} may be hooked onto the computation as
‘plug-ins’ at the on-tree proxy nodes, where they can be invoked by the data fusion procedures.
In comparison with the work in [3], our notion of predicates blends the notions of ‘flow of real-time’
and ‘spatial spread-out’ as part of a predicate formulation.
The condition L0(···) is supplied by a user to the data fusion procedures through a GUI. In
a way, we carry out event filtering, wherein an event filter is a programmable selection criterion
for classifying or selecting data sets from the raw data collected from the environment. The filter
conditions can be prescribed in a high-level declarative language that is compiled into the data
fusion portion of the system.
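The plug-in/event-filter mechanism above can be sketched as follows. The `ProxyNode` class, its `install`/`events` methods, and the dict-based record format are illustrative assumptions about the middleware interface, not a defined API.

```python
# Predicate plug-ins at an on-tree proxy node: each filter L0 maps an
# observed data record to a boolean, and filters are installed and
# invoked by name by the data fusion procedures.

class ProxyNode:
    def __init__(self):
        self.filters = {}

    def install(self, name, predicate):
        """Hook a user-supplied applicative function L0 onto the node."""
        self.filters[name] = predicate

    def events(self, name, records):
        """Return the records for which the named predicate holds,
        i.e., those that constitute 'events' under the filter."""
        predicate = self.filters[name]
        return [r for r in records if predicate(r)]
```

A user (via the GUI or a compiled declarative specification) would install a condition such as `lambda r: r['temp'] > 30` and the fusion procedures would then select only the matching records from the raw data.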

4.3 Illustrative examples of our specification approach


Consider the tracking of an unidentified plane p in a theater of operation. Suppose a critical
situation is stipulated as the rate of approach of p towards friendly assets exceeding a threshold
T2 when the current distance of p is within T1 (this may trigger the firing of a missile towards p).
This event may be expressed as:

    evnt_spec(FIRE) ≡ ((avg(loc(p)) < T1) ∧ (d{loc(p)}/dt > T2)).

Suppose the missile system which was readied for firing needs to be pulled back when p moves to
a distance beyond a threshold T1′ (> T1) — which implies that d{loc(p)}/dt > 0 — to avoid a firing (footnote 4).
This relation may be expressed as:

    evnt_spec(NOFIRE) ≡ ((avg(loc(p)) > T1′) ∧ (d{loc(p)}/dt > 0)).
The radar device that samples loc(p) is embedded into the snapshot-taking mechanism which filters
the timed samples from the radar.
Consider the earlier example of determining the ‘inference reliability’ parameter for a bank of
sensors that detect enemy movements in a battle terrain. A property of this surveillance function
may be prescribed as:

    evnt_spec(SENREL) ≡ ((Ns > 10) ∧ (Ng/Ns > 0.7)),

where Ns is the total number of sensors in the bank and Ng is the number of non-faulty sensors.
The underlying data fusion procedures determine Ng from a knowledge of the set of sensors in
the bank (such as by ‘majority voting’ among sensors), which allows the evaluation of predicates.
With the above form of predicates loaded into the data fusion GUI, the data fusion procedure
Footnote 4: That T1′ > T1 implies a ‘hysteresis’ during increasing trends of loc(p), to avoid ‘chattering’ of the missile system.

computes the applicative functions — ‘avg(···)’/‘d(···)/dt’ and ‘<’/‘(···)/(···)’ — on the data collected from
the leaf nodes, namely, loc(p) and Ns/Ng respectively. This in turn allows the evaluation of the
predicates evnt_spec(FIRE) and evnt_spec(SENREL), as the case may be.
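The two predicates can be written out directly in code; the sampling window and the finite-difference estimate of d{loc(p)}/dt below are illustrative assumptions about how the applicative functions would be realized.

```python
# Direct evaluation of the evnt_spec(FIRE) and evnt_spec(SENREL)
# predicates over sampled data.

def evnt_spec_fire(loc_samples, dt, T1, T2):
    """loc_samples: recent distance-to-asset samples of plane p, taken
    every dt seconds. True when the averaged distance is inside T1 and
    the rate of approach exceeds T2."""
    avg_loc = sum(loc_samples) / len(loc_samples)
    # rate of approach: how fast the distance is shrinking per second
    rate = (loc_samples[0] - loc_samples[-1]) / (dt * (len(loc_samples) - 1))
    return avg_loc < T1 and rate > T2


def evnt_spec_senrel(n_s, n_g):
    """True when the sensor bank is large enough (Ns > 10) and at
    least 70% of its sensors are non-faulty."""
    return n_s > 10 and n_g / n_s > 0.7
```

For instance, a plane whose sampled distances shrink from 100 to 80 over two 1-second intervals approaches at 10 distance-units/s, which satisfies the FIRE predicate for T1 = 120 and T2 = 5.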

4.4 Event causality based data fusion


Due to the involvement of human elements and/or physical systems in the application, the fusion system
should post events only in an order that does not violate causality [9, 8]. The causality relations
between events, expressed with a ≺ operator, are specific to the given application.
Consider the earlier example of tracking an unidentified plane p. The control system may pre-
scribe two operations: ready missile and pull back, as part of a recovery module for the purpose
of threat containment. Even though the enabling conditions for ready missile and pull back are
prescribed separately (by FIRE and NOFIRE events respectively), there is an application-specific
causal relationship that needs to be enforced on these actions, namely, ready_missile_j ≺ pull_back_j
in a j-th control cycle. If a computation node sees these actions in an incorrect order, it thinks that
the missile system is ready to fire, while nodes seeing the actions in the causal order will treat the
missile system as pulled back. This inconsistency may in turn lead to dangerous threat situations
in the terrain.
Thus the behavior B of a system interface may be prescribed in terms of the allowed temporal
ordering relationships between various events possible at the interface level, given as: B ≡ (≺, {e}).
The problem-specific orderings are best prescribable by the user.
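One way to enforce the causal posting of events is to buffer an action that overtakes its causal predecessor, as sketched below; the event names and the (cycle, action) encoding are taken from the missile example above, while the buffering scheme itself is an illustrative assumption.

```python
# Post events only in causal order: a pull_back of cycle j is held
# back until ready_missile of the same cycle j has been delivered,
# per the application-supplied relation ready_missile < pull_back.

def deliver_in_causal_order(arrivals):
    """arrivals: (cycle, action) pairs in network arrival order.

    Returns the delivery order; a 'pull_back' that overtakes its
    cycle's 'ready_missile' is buffered until the latter arrives."""
    delivered, pending, ready = [], {}, set()
    for cycle, action in arrivals:
        if action == 'ready_missile':
            delivered.append((cycle, action))
            ready.add(cycle)
            if cycle in pending:              # release the buffered action
                delivered.append(pending.pop(cycle))
        elif cycle in ready:
            delivered.append((cycle, action))
        else:
            pending[cycle] = (cycle, action)  # arrived out of causal order
    return delivered
```

A pull_back that arrives first is thus never seen by the application before its cycle's ready_missile, which removes the inconsistent "ready to fire" view described above.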

5: Prototyping and integration (non-military and AFRL related)


5.1: Data aggregation trees for visual representation
A data aggregation tree can provide the right level of visual interface for users to disseminate the
data collected at various points in the system. In some cases, the data collection points may be
different control functions in the system structure (such as the communication bandwidth and ca-
pacity information in a network QoS control setting). In some other cases, the data collection points
may be geographically distributed where the tree may be a topological overlay on the information
network (such as collecting the terrain data in a battlefield). With a tree-like structure, the users
can choose the right level of details and confidence levels required in disseminating the data.
Typically, the points closer to the leaf nodes of a tree convey the low level details of the
external phenomenon being observed. Whereas, the points closer to the root node of a tree convey
the decision outcomes obtained from the low level data. A human user examining the tree (say, a
commander) can choose the tree depth to navigate based on the situation on hand and the criticality
of actions needed.
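Depth-based navigation of an aggregation tree can be sketched as below. The nested-tuple tree encoding and the node labels are illustrative assumptions; the point is that the chosen depth selects the level of detail a user sees, with decision outcomes at the root and raw observations toward the leaves.

```python
# Navigate an aggregation tree to a chosen depth: the root holds
# decision outcomes, deeper nodes hold progressively lower-level detail.

def view_at_depth(tree, depth):
    """tree: (label, [children]). Return the labels visible to a user
    who expands the tree down to the given depth (root = depth 0)."""
    label, children = tree
    if depth == 0 or not children:
        return [label]
    out = []
    for child in children:
        out.extend(view_at_depth(child, depth - 1))
    return out
```

A commander inspecting such a tree at depth 0 sees only the high-level outcome; increasing the depth progressively exposes the regional aggregates and, finally, the leaf-level observations.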

5.2: A case study of analyzing DOS attacks


Consider a case study of a system for detecting the ‘denial of service’ (DOS) attacks on a distributed
network. The resource information at various points in the network (such as the available bandwidth
and communication delays) can be organized as the nodes of a tree, based on the hierarchical
relationship between these pieces of information. Since distributed attacks are generally correlated

with one another (unlike benign failures which are statistically independent of one another) [17],
the temporal and/or spatial relationships between the various observed symptoms can be easily
gleaned using a tree-like visual representation of the relationships. For instance, the loss of a
communication link in the network can be correlated with a reduction in the data arrival rate at an
end-point (such as the events ey and ex in Figure 1). Here, the link will appear as a lower node in
the tree and the data rate will appear as an upper node. All the causal events that can impact the
data arrival rate (including the link availability) will then be represented as lower nodes connected
to the upper node.
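A minimal sketch of such a pairwise correlation test follows; the function and field names are hypothetical, and a fixed time window stands in for whatever temporal model the DFP would actually employ.

```python
# Hypothetical sketch: flag a suspected DOS symptom when a low-level event
# (link loss, a lower tree node) and a high-level event (rate drop, an upper
# tree node) fall within a temporal correlation window.

WINDOW = 5.0  # seconds within which cause and effect are considered correlated

def correlated(cause, effect, window=WINDOW):
    """True if `effect` occurred within `window` seconds after `cause`."""
    return 0.0 <= effect["t"] - cause["t"] <= window

e_link_loss = {"name": "link-L3-down", "t": 100.0}        # lower tree node
e_rate_drop = {"name": "rate-drop-endpoint", "t": 102.5}  # upper tree node

if correlated(e_link_loss, e_rate_drop):
    print("suspected causal pair:", e_link_loss["name"], "->", e_rate_drop["name"])
```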
The case study will involve using a Cisco-router-based network testbed with Spirent traffic
analyzers that are available at CUNY. The traffic analyzers can inject arbitrary amounts of data
at selected points in the network to simulate different types of DOS attacks. Delay and bandwidth
sensors will be deployed at the observation points. Where necessary, multiple sensor algorithms
will be installed (say, different bandwidth estimation methods) to enhance the confidence levels in
the event reporting by replica voting.
The goal of the case study will be to assess the effectiveness of visual GUI aided data aggregation
methods in accurately detecting DOS attacks.
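The replica-voting step mentioned above can be illustrated with a simple median vote over replicated sensor reports, which tolerates a minority of faulty or subverted estimators. This is only a sketch; the project would build on the PI's published voting mechanisms, not this toy rule.

```python
# Illustrative sketch of replica voting over multiple sensor algorithms:
# several bandwidth estimators report independently, and the median is
# taken as the agreed reading.
import statistics

def vote(reports):
    """Agree on one reading from replicated sensor reports (median vote)."""
    return statistics.median(reports)

# Three estimators; one grossly wrong (e.g., faulty or under attack).
reports = [9.8, 10.1, 2.0]   # Mbps (invented figures)
print(vote(reports))  # -> 9.8
```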

5.3: A case study of dashboard-like visual interface for QoS control


Typically, QoS control is driven by applications, based on how important the data being analyzed
is and the situation on hand. In a non-combat scenario, for instance, a coarse interpretation of a
moving object detected in a terrain suffices, whereas in a combat scenario the moving object
needs to be closely examined in order to detect, say, enemy planes. The latter involves more complex
algorithmic procedures for target recognition purposes, and hence incurs more computational
and communication resources than a casual examination of the objects (a study of
AWACS from an adaptive QoS standpoint is given in [16]). Thus, the application requirements
directly control the amount of resources expended for detecting the objects in a terrain. Since the
resources are scarce in a battlefield setting, conserving the resources with an appropriate high-level
QoS interface is important.
A dashboard is a high-level visual interface that provides the human users with a simple means
of controlling the QoS — and hence the computational and communication resources [18]. See
Figure 5. The dashboard provides sliding bars that depict the normalized ranges of QoS metrics
for the application. It also provides a resource indicator that represents the available resources
(in a macroscopic way)5. The QoS metric bars capture two coarse parameters: timeliness and
accuracy of the object detection. These metrics are not orthogonal from a resource allocation
standpoint, because improving the timeliness of a report will involve using simpler and less expensive
computational algorithms, albeit with a reduction in the accuracy of the report. The user can
slide the bars appropriately to control the timeliness and accuracy parameters.
In the underlying implementation, the timeliness and accuracy parameters will be mapped onto
domain-specific stub procedures to control the computational and communication resources. For
instance, sliding the accuracy bar to the higher end will invoke more complex algorithms (with
a large Fi and Li); it may also use multiple algorithms to enhance the accuracy by replica
voting. Accordingly, the system will update the timeliness indicator on the dashboard, so that the
user can gauge the delays involved in the reporting of objects. Network monitors will sample the
various resource information (such as bandwidth, links, processing speeds, and battery power) and
aggregate them to update the resource indicator bar.

Footnote 5: An analogy is the fuel indicator in an automobile dashboard that allows drivers to
decide how far they can travel.

Figure 5: A schematic of dashboard-like visual interface for QoS control
We shall use the network testbed at CUNY to realize the dashboard interface. Standard image
processing algorithms will be constructed in a layered fashion, and then applied on (publicly
available) target data to establish the feasibility of dashboard-like QoS control.
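One plausible realization of the slider-to-resource mapping is a table of algorithms ordered by cost, from which the cheapest one meeting the accuracy demand is chosen; its latency then drives the timeliness indicator. The sketch below is purely illustrative: the algorithm names, accuracy figures, and latencies are invented for the example.

```python
# Minimal sketch (hypothetical names and figures) of mapping the accuracy
# slider onto algorithm selection, with latency feeding the timeliness
# indicator on the dashboard.

ALGORITHMS = [
    # (name, accuracy achieved, latency in seconds) -- illustrative values
    ("coarse-motion-detect", 0.60, 0.5),
    ("shape-classifier",     0.80, 2.0),
    ("full-target-recog",    0.95, 8.0),
]

def select_algorithm(accuracy_slider):
    """Pick the cheapest algorithm meeting the slider's accuracy demand."""
    for name, acc, latency in ALGORITHMS:
        if acc >= accuracy_slider:
            return name, latency
    # Demand exceeds every algorithm: fall back to the most accurate one.
    name, _, latency = ALGORITHMS[-1]
    return name, latency

name, latency = select_algorithm(0.9)
print(name, latency)  # -> full-target-recog 8.0
```

Sliding the accuracy bar up thus selects costlier algorithms, and the returned latency is what the dashboard would report back as the expected timeliness.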

5.4: Software architecture of data fusion system


There are three functions in our model of data fusion procedures (DFP): the ‘event transformer’,
‘event detector’, and ‘event dispatcher’. See Figure 6. We plan to use languages such as SQL and
XML to program the components of DFP. These languages provide relational constructs necessary
for object-level data aggregation. For event notification purposes, we shall develop JAVA-based
program interfaces to the DFP procedures.
The ‘event transformer’ maps the physical manifestation of an event into a form that can be
observed by the DFP. The user may supply the required mapping functions through the GUI. The
event dispatcher maintains a registry of predicates supplied by the application (the registry may be
viewed as an ‘event rule library’). It invokes the DFP with a predicate definition and appropriate
parameters. When a predicate becomes true signifying the occurrence of an event, a ‘call back’ to
the application may occur (say, to generate an audible siren and/or a visual red alert from a GUI).
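A skeletal version of the dispatcher's registry-and-callback behavior could look as follows. The proposal targets SQL/XML/Java implementations; this Python sketch, with hypothetical names, is only to fix the idea.

```python
# Illustrative sketch of the event dispatcher: predicates registered by the
# application (the 'event rule library') are evaluated against each
# transformed observation, and a call-back fires when a predicate holds.

class EventDispatcher:
    def __init__(self):
        self._registry = []  # list of (predicate, callback) pairs

    def register(self, predicate, callback):
        self._registry.append((predicate, callback))

    def dispatch(self, observation):
        # Test every registered predicate on the incoming observation.
        for predicate, callback in self._registry:
            if predicate(observation):
                callback(observation)

alerts = []
d = EventDispatcher()
# Predicate: available bandwidth below a threshold signifies an event.
d.register(lambda obs: obs["bandwidth"] < 1.0,
           lambda obs: alerts.append(("RED ALERT", obs["node"])))

d.dispatch({"node": "r1", "bandwidth": 4.2})  # no predicate holds
d.dispatch({"node": "r7", "bandwidth": 0.3})  # triggers the call-back
print(alerts)  # -> [('RED ALERT', 'r7')]
```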

Figure 6: Software components of semantics-driven data fusion system

5.5: Developing menu-driven GUI


We plan to develop an interactive menu-driven window interface between human users and the
machines implementing the DFP. The menus will identify the system functions as a set of 'buttons'
displaying a verbose description of the physical events and the data describing them. When a user
clicks on a button, a 'value' menu will pop open, indicating the possible values for the function
type clicked. A set of applicative procedures prescribed by the DFP will also be loaded onto the
desktop as icons. Using these buttons and icons, a user can profile an event as satisfying any
desired condition, invoking the pre-loaded applicative procedures to operate on the variables.
Event causality relationships between system interface events may likewise be prescribed from
the menu system.
The user-friendliness of the menu-driven window interface to the DFP may allow users with
less experience in distributed programming and algorithm development to inject a variety
of critical scenarios when conducting the event simulations. In one application, commanders in a
battlefield may study the morale of soldiers using low-level observations of the soldiers' behaviors.
In another application, paramedical personnel may gear themselves up to attend to a potential
disease outbreak in a geographic area. These case studies will be conducted in our research by
event simulations.

6.0 Expertise of project personnel
6.1 Qualifications of PI K. Ravindran: Professor of Computer Science (City College, CUNY);
Ph.D. (Computer Science, 1987, University of British Columbia); research areas: distributed
collaboration systems & protocols, information assurance systems, service-level management
of network infrastructures; managed external grants/contracts of about $1.1M; has about 90
refereed publications; 17 years of university experience and 5 years of experience in the space
and communication industries.

This PI has studied the on-line monitoring and control paradigms for distributed multimedia net-
works and enterprise web server systems. These works involve extensive simulation modeling of the
underlying network and server infrastructures. This expertise will help in the conduct of the AFRL-
HE project. The PI has also worked on coarse granular event management services in distributed
collaboration settings. The PI has also worked extensively on replica voting algorithms for deciding
among conflicting results involving deterministic input data, partly through summer fellowships
at the Air Force Research Lab (Rome, NY) during 2001-07. This expertise will be useful in the
application case studies.

6.2 Role of Graduate Students


A 14-month graduate student position has been requested; the student will assist in developing
the software and simulation tools needed to implement the data fusion procedures. The graduate
student will have expertise in MATLAB, XML, UML, and JAVA software tools, which will be useful
for developing the DFP software. Besides, the graduate student will also look into the geographic
and social aspects of human behavior. This will be useful in conducting case studies for assessing
the human-effectiveness-oriented metrics using our model of data aggregation (such as the morale
of soldiers when deployed in remote regions where the social demographics and the geography of
the region have a strong impact). Personnel with the above combination of expertise will be sought
to participate in this project.

7.0 Schedule of milestones


1. Implementation of discrete-event simulation testbed for event aggregation using MATLAB
(Aug.’08 to Dec.’08);

2. Preparation of in-house network testbed at CUNY for experimental studies (Oct.’08-Jan.’09);

3. Development of GUI for event specification and aggregation (Jan.-April’09);

4. Development of SQL and XML schema for event specifications (Jan.-April’09);

5. Study of replica voting mechanisms for event fusion (May-July’09);

6. Case studies of battlefield applications and human-in-the-loop simulations, in collaboration
with AFRL-HE lab (June-Aug.'09);

7. Documentation of project results (June-Sept.’09).

A: STATEMENT OF WORK
TASKS/TECHNICAL REQUIREMENTS
The contractor shall accomplish the following:
A.1 Study models for characterizing the external environment behaviors and how they map into
specific data characteristics;
A.2 Develop the distributed control mechanisms from the high-level specification of data fusion
procedures;
A.3 Develop algorithms for event filtering at the protocol level and for event propagations at the
application interface level;
A.4 Develop designer-friendly GUIs for rapid event compositions and visual evaluation in the
in-house network testbed at CUNY;
A.5 Develop SQL-based and XML-based schemas for specifying event filters and aggregation
rules;
A.6 Carry out application case studies of data aggregation procedures for use in military settings:
i) the analysis of DOS attacks on distributed networks using visual tools, and ii) the study of
dashboard-like visual interfaces for QoS control and the underlying system structures.
The task deliverables A.1-A.6 will be accomplished on a software-level simulation testbed with
real data obtained from AFRL-HE offices.

During the project period, the contractor shall write technical reports outlining the progress of
research work for dissemination of the results (both positive and negative) to the technical liaison
groups in the AFRL-HE. To permit full understanding of the techniques and procedures used in
evolving the data fusion technology, the reports will include pertinent observations, the nature of
technical problems, design methods used, computer algorithms developed, etc. The contractor shall
also make detailed technical presentations on the progress of work at the above site semi-annually
during the project period, and present the completed work (including the demonstration of data
fusion software tools and applications) at the end of the project period. In addition, the contractor
shall provide a demonstration of the ‘in-progress’ project works on the CUNY simulation testbeds
in about 16 months from the start of the project.
The contractor shall also write research articles and technical papers for publication in journals
and conferences for wider dissemination of the results across academic and industrial research
communities in the areas of advanced network architectures. Such publications will carry a citation
to acknowledge the AFRL-HE contract in supporting the published work.

References
[1] S. Chamberlain. Automated Information Distribution in Bandwidth-constrained
Environments. In Proc. Milcom’94, North-Holland pub., 1994.

[2] H. Kopetz and P. Verissimo. Real Time Dependability Concepts. Chap. 16, Distributed
Systems, ed. S. Mullender, Addison-Wesley Publ. Co., 1993.

[3] V. K. Garg. Observation of Global Properties in Distributed Systems. In Proc. Intl.
Conf. on Software and Knowledge Engineering, IEEE CS, pp. 418-425, Lake Tahoe (NV), June
1996.

[4] Y. Yao and J. Gehrke. The Cougar Approach to in-network Query processing in
Sensor Networks. In ACM SIGMOD Record, 2002.

[5] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. TinyDB: an Acquisitional Query
Processing System for Sensor Networks. In ACM Transactions on Database Systems, 2005.

[6] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva. Directed Diffusion
for Wireless Sensor Networking. In IEEE/ACM Transactions on Networking, Feb. 2003.

[7] Sensor Model Language (SensorML). in http://vast.uah.edu//SensorML, 2005.

[8] K. Ravindran and Jun Wu. Programming Models for Behavioral Monitoring of Distributed
Information Networks. In Proc. Distributed Systems and Real-time Applications,
IEEE-DSRT 2005, Oct. 2005.

[9] K. Birman et al. Astrolabe: a publish-subscribe based event processing system.
In Technical Reports, Cornell University, 2002-2005.

[10] C. G. Cassandras and S. Lafortune. Introduction to Discrete Event Systems. Kluwer
Academic Publishers (Springer), 2007.

[11] R. Stadler, F. Wuhib, M. Dam, and A. Clemm. Decentralized Computation of Threshold-
crossing Alerts. In Proc. Conf. on Distributed Systems: Operations and Management,
IEEE/IFIP, Barcelona (Spain), Oct. 2005.

[12] D. G. Stork, R. O. Duda, and P. E. Hart. Pattern Recognition Systems. Chapter 1.3, Pattern
Classification, 2000.

[13] W. Hu, A. Misra, and R. Shorey. CAPS: Energy-Efficient Processing of Continuous Aggregate
Queries in Sensor Networks. In Proc. 4th Intl. Conf. on Pervasive Computing and
Communications, IEEE-PerCom'06, pp. 190-199, June 2006.

[14] S. Kabadayi, A. Pridgen, and C. Julien. Virtual Sensors: Abstracting Data from Physical
Sensors. In Technical Report 2006-01, University of Texas at Austin, 2006.

[15] K. Stranc. Airborne Networking. In Presentation, MITRE Corporation, Public Release Ref.#
04-0941, 2004.

[16] R. Clark, E. D. Jensen, A. Kanevsky, J. Maurer, T. Wheeler, Y. Zhang, D. Wells, T. Lawrence,
and P. Hurley. An Adaptive, Distributed Airborne Tracking System. In Proc. IEEE WPDRTS,
vol. 1586 of LNCS, 1999.

[17] W. J. Gutjahr. Reliability Optimization of Redundant Software with Correlated Failures. In
Proc. 9th Intl. Symp. on Software Reliability Engineering, Germany, 1998.

[18] J. Jachner, S. Petrack, E. Darmois, and T. Ozugur. Rich Presence: A New User Communica-
tions Experience. In Alcatel Telecommunications Review, Technology White Paper, pp.73-77,
1st quarter 2005.

K. Ravindran
Current Position: Professor of Computer Science
City University of New York (City College), New York (joined in 1996)

Degrees Received: Ph.D (Computer Science), University of British Columbia, Canada, 1987
M.Eng (Computer Science & Automation), Indian Institute of Science, Bangalore, 1978
B.Eng (Electronics), Indian Institute of Science, 1976.

Work experience: 1. Assistant Professor,
Department of Computing & Information Sciences, Kansas State University (1989 – 1996).

2. Member of Scientific Staff,
Bell Northern Research, Ottawa, Canada (1988 – 1989).

3. Assistant Professor,
Department of Computer Science, Indian Institute of Science (1988).

4. Teaching/Research Assistant,
Department of Computer Science, University of British Columbia (1983 – 1987).

Research activities: 1. Working on distributed systems modeling and design
(for information security, replication control, QoS assurance)
2. Working on architectures and protocols for
Future Internet, network management, traffic engineering
3. Published about 90 papers in refereed international conferences and journals
4. Received research grants/contracts from IBM, Philips Research,
US Air Force, ARPA, BMDO during 1992-01 (totaling about $1.1M)
5. Supervised research theses of 4 Ph.D. and 18 M.S. students
(currently supervising 2 Ph.D. thesis works and 2 M.S. thesis works)
6. Selected as Senior Summer Faculty Fellow at Naval Research Lab, 2008
7. Summer Faculty Fellow at Air Force Research Lab, Rome, 2001, 2003, 2005-07

Collaborators: 1. Dr. Kevin A. Kwiat, Air Force Research Laboratory (Rome, NY)
(2002-present) Areas: Information Assurance and Security
2. Dr. K. K. Ramakrishnan, AT&T Research (Florham Park, NJ)
Areas: Rate Control for Video Distribution in Large Multicast Groups
3. Prof. D. Kumar, City College of CUNY (New York, NY)
Areas: Design and Validation of Distributed Protocols

Teaching activities: 1. Instructor for graduate Distributed Systems course
2. Instructor for graduate level and undergraduate level
operating systems and computer networks courses
3. Instructor for undergraduate level Data Structures & Algorithms

Publications relevant to proposal

References
[1] X. Liu, K. Ravindran, and D. Loguinov. A Queuing-theoretic Foundation of Available
Bandwidth Estimation: Single Hop Analysis. accepted for publication in IEEE/ACM
Transactions on Networking, June 2006.

[2] K. Ravindran and Jun Wu. Architecture for Dynamic Protocol-level Adaptation for
Enhancing Network Service Performance. In proc. IEEE/IFIP Conf. on Network Oper-
ations and Management (NOMS’06), Vancouver (Canada), April 2006.

[3] X. Liu, K. Ravindran, and D. Loguinov. Towards a Generalized Stochastic Model of
End-to-End Packet Pair. Accepted for publication in IEEE Journal on Selected Areas in
Communications, March 2006.

[4] A. Sabbir and K. Ravindran. Concurrency Control for Interactive Sharing of Data
Spaces for Distributed Real-time Collaborations. in proc. of IEEE Intl. Symp. on
Distributed Simulation and Real-time Applications, Montreal (Canada), Oct. 2005.

[5] K. Ravindran and X. Liu. Service-level Management Frameworks for Adaptive Dis-
tributed Network Applications. in Springer Verlag Lecture Notes on Computer Science,
(Service Availability), LNCS 3335, 2005.

[6] K. Ravindran, J. P. Fortin, and X. Liu. Flow Management for QoS-controlled 'Data
Connectivity' Provisioning. Accepted for publication in Elsevier Journal of Computer
Communications, Feb. 2005.

[7] K. Ravindran. Programming Models for Behavioral Monitoring of Distributed
Information Networks. In Proc. of Hawaii Intl. Conf. on System Sciences, Big Island (HI),
Jan. 2005.

[8] K. Ravindran and A. Sabbir. Event-based Programming Structures for Multimedia
Information Flows. In Springer Verlag Lecture Notes in Computer Science (Management
of Multimedia Networks and Systems), LNCS 3271, 2004.

[9] K. Ravindran and R. Steinmetz. Object-oriented Communication Structures for
Multimedia Data Transport. In IEEE Journal on Selected Areas in Communications, Special
Issue on Distributed Multimedia Systems and Technology, vol. 14, no. 7, pp. 1360-1375, Sept.
1996.

[10] K. Ravindran and X. T. Lin. Structural Complexity and Execution Efficiency of Distributed
Application Protocols. In Proc. Conf. on Communication Architectures, Protocols and
Applications, ACM SIGCOMM, San Francisco (CA), pp. 160-169, Sept. 1993.

Other significant Publications

References
[1] K. A. Kwiat, K. Ravindran, and P. Hurley. Energy-efficient Replica Voting Mechanisms
for Secure Real-time Embedded Systems. In proc. of Intl. Conf. on World of Wireless
and Mobile Multimedia, WOWMOM’05, Taormina (Italy), June 2005.

[2] K. Ravindran, K. A. Kwiat, and A. Sabbir. Adapting Distributed Voting Algorithms for
Secure Real-time Embedded Systems. In proc. of workshop on Distributed Auto-adaptive
Reconfigurations in Software Systems, ICDCS-DARES’04, Tokyo (Japan), March 2004.

[3] K. Ravindran, A. Sabbir, D. Loguinov, and G. S. Bloom. Cost-optimal Multicast Trees
for Multi-source Data Flows. In Proc. INFOCOM'01, IEEE Com. Soc., Anchorage (AK),
April 2001.

[4] K. Ravindran and T. J. Gong. Cost Analysis of Multicast Data Transport Architectures
in Multi-service Networks. In IEEE/ACM Transactions on Networking, Feb. 1998.

[5] K. Ravindran, G. Singh, and C. M. Woodside. Architectural Concepts for Implementation of
End-systems High Performance Communications. In Intl. Conf. on Network Protocols
(ICNP), Columbus (OH), Oct. 1996.

[6] K. Ravindran. Architectures and Protocols for Data Multicasting in Multi-service
Networks. In Computer Communications Review, ACM SIGCOMM, July 1996.

[7] K. Ravindran. A Flexible Network Architecture for Data Multicasting in High
Speed 'Multi-service Networks'. In IEEE Journal on Selected Areas in Communications,
Special Issue on Global Internets, Oct. 1995.

[8] K. Ravindran and V. Bansal. Delay Compensation Protocols for Synchronization of
Multimedia Data Streams. In IEEE Transactions on Knowledge and Data Engineering,
Special Issue on Multimedia Information Systems, vol. 5, no. 4, pp. 574-589, Aug. 1993.

