
WIRELESS SENSOR NETWORKS

An algorithm for determining backbones in a wireless sensor network using the Smallest Last graph coloring algorithm in random geometric graphs.

JEFF ALLEN
ALGORITHM ENGINEERING FALL 2009 PROFESSOR DAVID W. MATULA

EXECUTIVE SUMMARY
INTRODUCTION & SUMMARY
ABSTRACT
I implemented an algorithm which efficiently creates, colors, and computes statistics on Random Geometric Graphs (RGGs) of various shapes and distributions. This implementation produces redundant independent sets which could serve as backbones within the network, using a greedy graph coloring method [Kosowski] and the Smallest Last (SL) vertex ordering algorithm [Matula]. Many of these backbones achieve a very high percentage of coverage on the graph while using only a small subset of the vertices. This implementation can produce reasonably sized networks for visualization in under a second, and can produce networks containing over a hundred thousand vertices and millions of edges in a few minutes. This study has focused on applications within the field of wireless sensor networks and could serve as a useful tool in the simulation and study of such networks.

Note that a website has been developed to accompany this material. This site includes more interactive features which better convey some of the algorithms in this project, including Smallest Last ordering, Grundy coloring, etc. The site also includes extensive details showing the performance of this algorithm on certain benchmark sets, along with all of the more data-intensive features (such as tabulated orderings and statistics for benchmark graphs). This paper is intended only to outline the algorithmic developments behind the project.


BACKGROUND & ENVIRONMENT


Wireless sensor networks are collections of sensing devices which communicate without traditional cabling. These networks have widespread applications from agriculture to military security [Lawson 3]. Wireless sensor networks are gradually becoming prevalent in environments where running wires and cabling is impractical [Cook 26]. Setting aside the mechanical and manufacturing challenges involved in creating these networks, much energy has been expended in attempting to efficiently enable communication between these sensors [Sohrabi, Mahjoub]. This is a particularly interesting problem in wireless networks for two reasons: first, the sensors must be able to handle congestion, which is often much trickier in wireless applications; second, the deployment of sensors is often poorly structured, if not completely random. This introduces new challenges in trying to enable communication between these nodes, as well as in trying to extract the sensed information from the networks.

With so many potential applications in hand, researchers are very interested in algorithms to develop "backbones" within these networks reliably and quickly. A backbone, more technically, is a (nearly) dominating, independent set in the graph of nodes, meaning that the backbone nodes are out of one another's range, but would be able to relay information through an intermediary node. These nodes could, theoretically, operate on the same frequency without fear of interference, due to their placement.

A naive solution to calculate the ideal backbone would be a "brute-force" analysis, analyzing all possible connections between nodes in the graph. Unfortunately, this solution is super-exponential in the number of nodes and very quickly becomes computationally infeasible, even for a handful of nodes. Obviously, there is great interest in more efficient solutions for this problem.

Algorithmic efficiency in this problem, as applied to wireless sensor networks, is important for three primary reasons:

1. These networks are often dynamic and/or mobile. Thus, we may need to re-compute network backbones frequently. The algorithm must be able to produce a backbone quickly enough for it to still be a useful means of transporting information before the nodes have relocated.
2. These sensor nodes are often impoverished devices with very limited capabilities. In a self-organizing network, sensors would need to be able to quickly and easily handle these computations.
3. The scale of the problem is much larger than the naive observer would expect. Many researchers hope to deploy these sensors ubiquitously. Some go so far as to say that, in the future, every plant on a farm could have its own sensor to ensure optimal growing conditions and nutrients [Lawson 2]. In such hypothetical problems, we're quickly dealing with hundreds of thousands, if not millions, of nodes. Algorithms must be developed that can reliably handle networks of this size.

The scope of this project is to analyze these networks as random geometric graphs (RGGs), which are graphs in which n vertices are randomly placed and an edge is established between any two vertices that are within some distance r of one another. When applied to wireless sensor networks, we can imagine that the vertices of these graphs are sensors and an edge between two nodes indicates that these two nodes are within communication range of one another.


RESULTS
There are two primary contributions of this project: the Java-based graph creation and coloring algorithm, and the Flash tool used in visualizing and examining such graphs. The graph generation and coloring algorithm could be a useful implementation in this field. I am currently unaware of any implementations in Java, so the fact that it provides this functionality in a (presumably) new language could be of use. Aside from that, I have been able to make certain optimizations on the construction of the graph which pertain to the underlying algorithm, as opposed to its implementation. These optimizations may be of interest to users in the field at large. The Flash program may have less tangible benefits, but I feel that it could be a valuable tool nonetheless. The most obvious applications I see are in visually analyzing the performance of graphs and also in teaching. Visual analysis of problems is an important - albeit often overlooked - aspect of algorithm engineering and analysis. Many important breakthroughs have been made only after a physical or visual manifestation of the problem could be studied. Also, because the interface is more interactive and animated, it could serve as an effective learning tool in studying these and related topics. It may be possible to convey the nuances of the Smallest Last algorithm, for instance, by viewing an animated sequence.

PROGRAMMING ENVIRONMENT DESCRIPTION
My algorithm was developed on a custom-built machine with the specifications listed in Table 1. I used two languages, primarily, in the development of the algorithm and displays. The algorithm itself was developed completely in Java [Sun]. The only imported library classes used in the code were java.awt.Point, java.util.Stack, java.util.Vector, and java.util.Random. No preexisting code or external Java libraries (other than the above) were used.

Table 1 - Specifications of computer

Item              Description
CPU               Intel Q6600, 2.4 GHz Intel Core 2 Quad (quad core)
RAM               8GB DDR2, 800MHz
Operating System  Windows 7
Hard Drive        Western Digital Raptor, 10K RPM, 130GB

I wanted to find a solution which would allow for a more interactive experience in trying to convey these complicated topics. With this in mind, I developed a custom application in Adobe Flash to display the graphs [Adobe]. Specifically, I used Flash 9 with ActionScript 3. I chose Flash, in part, because of its ease of use in a web browser. This means that I can publish my application online and interested readers can interact with the graphical tools in the same way I did.

Flash is a client-side technology that runs within a user's web browser. This means that any information that needs to be passed to Flash needs to be downloaded from a server, generally as an HTML or XML file. In order for my graphical application in Flash to be able to retrieve the graphs produced in Java, I needed to bridge the two technologies. To do this, I used Java Server Pages. This technology runs Java code in a web server, making the Java output available as an HTML file, which makes it accessible to the Flash client. I considered using XML markup to describe the structure of the generated graphs, but found that the files were far too large, so I developed my own format to efficiently transmit the information.

The ultimate goal of much of this work involved visualization. Due to the current pixel density of modern displays, it became infeasible to visually examine graphs any larger than n = 10,000. With this in mind, the hardware on which I ran the programs turned out to be more than was needed, the RAM in particular. As will be discussed later, a graph with 10,000 nodes occupied only about 50MB of memory and took around 1 second to generate. A typical modern computer with 2GB of RAM would be sufficient for most graphs that could be visualized on a computer monitor. However, having more memory allowed me to study the performance on some more theoretical networks containing hundreds of thousands of vertices.

REFERENCES
Adobe Inc. Adobe Flash Platform. http://www.adobe.com/flashplatform/. Accessed December 11, 2009.
Cook, Diane J.; Das, Sajal K. Smart Environments: Technologies, Protocols and Applications. Wiley & Sons Inc. 2005.
Kosowski, Adrian; Manuszewski, Krzysztof. Classical Coloring of Graphs.
Lawson, Shaun. University of Lincoln. Wireless Sensor Networks. http://hemswell.lincoln.ac.uk/~slawson/napier/CO42022/lectures/Week10.pdf 2005.
Mahjoub, Dhia; Matula, David. Experimental Study of Independent and Dominating Sets in Wireless Sensor Networks Using Graph Coloring Algorithms. WASA 2009, LNCS 5283, pp. 32-42, 2009.
Matula, David; Beck, Leland. Smallest-last ordering and clustering and graph coloring algorithms. Journal of the ACM, Volume 30, Issue 3. July 1983.
Sohrabi, Katayoun. Protocols for Self-Organization of a Wireless Network. Allerton Conference on Communication, Computing and Control, September 1999.
Sun Microsystems. Developer Resources for Java Technology. http://java.sun.com/. Accessed December 11, 2009.


WIRELESS SENSOR NETWORK BACKBONE

REDUCTION TO PRACTICE
This project consisted of a multitude of algorithms which needed to be implemented. This section will detail just a few of the higher-level, more important algorithms which were used. To give an idea of the overall scope of the project, the program currently in use to develop these graphs consists of over 1,100 lines of code just to create and color the graph. The goal is to create and color a graph in O(|V|+|E|) time, meaning that the running time should scale linearly with the number of vertices and edges in the graph. After some early testing, it became obvious that the memory requirements of this program would be minimal. With that in mind, I made the initial decision to prioritize computational complexity over memory use, where possible. There are certain situations in which one data structure could be converted to another, for example, but I typically just duplicate the data structures to avoid the overhead of converting the data back and forth. This may, at times, double the amount of memory required to perform some operation, but it typically saves enough time to justify this.

GRAPH CREATION
The heart of the graph creation algorithm is an iterative loop which creates n vertices with random x and y coordinates between 0 and 1, stored as single-precision floating point values. The work involved in creating these points is on the order of the number of nodes we're creating, or O(n). More specifically, we're generating two random numbers per vertex, so the time will increase on the order of 2n. These nodes can be stored in an array of predefined size, as the size doesn't change.

When working with a square, all coordinates in [0,1] are acceptable, so no further filtering is necessary (Figure 1). However, when working with a disc, only a subset of these points actually fall within the area of the disc. To calculate whether or not a coordinate is within the acceptable bounds of a disc, we just compute the Euclidean distance from the center of the disc (0.5, 0.5). If this distance is greater than 0.5, then we know that the point must be outside of the perimeter of the disc and thus cannot be used (Figure 2).

Figure 1 - A uniform, Square graph with n = 400 and r = 0.1

Figure 2 - A uniform disc with n = 100 and r = 0.2

Non-uniform distributions on these surfaces are not built into the pseudo-random generation of the coordinates. Instead, after generating a point with coordinates (x,y), the algorithm adds the node to the graph with a probability that depends on its location. This method achieves the effect of distributing vertices on the graph with the desired distribution.
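The uniform square and disc cases described above can be sketched in Java as follows. This is a minimal illustration, not the project's code: the class and method names are hypothetical, and it assumes that rejected points are redrawn until n vertices have been placed, which the text does not state explicitly.

```java
import java.util.Random;

// Hypothetical helper; names and structure are illustrative only.
class PointGenerator {
    /** Returns n points as {x, y} pairs, uniform on the unit square or on the inscribed disc. */
    static float[][] generate(int n, boolean disc, Random rng) {
        float[][] pts = new float[n][];
        int placed = 0;
        while (placed < n) {
            float x = rng.nextFloat();   // uniform in [0, 1)
            float y = rng.nextFloat();
            if (disc) {
                // Reject points farther than 0.5 from the disc center (0.5, 0.5).
                // Comparing squared distances avoids a square root.
                float dx = x - 0.5f, dy = y - 0.5f;
                if (dx * dx + dy * dy > 0.25f) continue;   // assumption: redraw on rejection
            }
            pts[placed++] = new float[] {x, y};
        }
        return pts;
    }
}
```

A call such as `PointGenerator.generate(400, false, new Random())` would correspond to the square in Figure 1; passing `true` restricts the points to the disc.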

The two non-uniform distributions considered in this project were both applied only to the disc. The first is a "Skewed distribution" which, given a candidate vertex with coordinates (x,y), will accept and place the vertex with probability $p = 2\sqrt{(x - 0.5)^2 + (y - 0.5)^2}$, where the center of the disc is at (0.5, 0.5). This function is the Euclidean distance from the center, scaled by the disc's radius of 0.5; essentially, it will place a vertex on the perimeter of the disc with 100% probability, and will never place a vertex at the center. This creates a graph which is very sparse in the center and very dense as you approach the border (Figure 3).
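As a rough sketch of the skewed placement rule, the acceptance test might look like the following in Java. The names are hypothetical, and the loop assumes that candidates are redrawn until n vertices have been accepted; the acceptance probability is taken to be twice the distance from the center (0.5, 0.5), so that it reaches 1 on the perimeter.

```java
import java.util.Random;

// Hypothetical sketch of the skewed disc; not the project's code.
class SkewedDisc {
    /** Acceptance probability: 0 at the center (0.5, 0.5), 1 on the perimeter of the disc. */
    static double acceptProbability(double x, double y) {
        double dx = x - 0.5, dy = y - 0.5;
        return 2.0 * Math.sqrt(dx * dx + dy * dy);
    }

    /** Draws candidates until n points have been accepted (assumption). */
    static double[][] generate(int n, Random rng) {
        double[][] pts = new double[n][];
        int placed = 0;
        while (placed < n) {
            double x = rng.nextDouble(), y = rng.nextDouble();
            double p = acceptProbability(x, y);
            if (p > 1.0) continue;                       // candidate lies outside the disc
            if (rng.nextDouble() < p) {                  // accept with probability p
                pts[placed++] = new double[] {x, y};
            }
        }
        return pts;
    }
}
```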


The second distribution, a "Two-Tiered" distribution, does not have a continuously varying probability of placement, as the skewed disc does. Instead, it distinguishes only between (1) those nodes which are within distance r of the border and (2) those nodes which are not. The function gives a 100% chance of placing those nodes within distance r of the border and only a 50% chance of placing those nodes in the interior region. This creates a graph which resembles the uniform disc on the interior but has a much more crowded border region (Figure 5).

Figure 3 - A skewed disc with n = 400 and r = 0.1

Figure 5 - A two-tiered disc with n = 400 and r = 0.1

Once the nodes have been placed, they must be connected. A random geometric graph, as stated earlier, connects those nodes which are within distance r of one another. In order to establish all of these connections, we must calculate the distances between many of the nodes in the graph. A naive algorithm to handle this problem would merely calculate the distance between each node (n total nodes) and every other node (n - 1 other nodes) and connect the pair if that distance is less than r. The problem with this solution, of course, is that it requires O(n^2) calculations, which quickly becomes computationally infeasible for large graphs (Figure 4).

Figure 4 - Number of nodes vs. number of comparisons between nodes. (Series: Naive Comparisons and Cell Method; horizontal axis: n = 100, 400, 800, 1600, 3200, 6400; vertical axis: comparisons, 0 to 50,000,000.)
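For reference, the naive all-pairs approach can be sketched as below. The names are hypothetical, and the sketch uses ArrayList adjacency lists and a "within distance r" test (less than or equal) as assumptions. It counts one comparison per ordered pair, so for n = 6400 the counter would reach 6400 x 6399 = 40,953,600.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the naive O(n^2) connection step.
class NaiveConnect {
    static long comparisons;   // number of distance checks made by the last call

    static List<List<Integer>> connect(float[][] pts, float r) {
        int n = pts.length;
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        comparisons = 0;
        float r2 = r * r;      // compare squared distances to avoid sqrt
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                comparisons++;
                float dx = pts[i][0] - pts[j][0], dy = pts[i][1] - pts[j][1];
                if (dx * dx + dy * dy <= r2) adj.get(i).add(j);   // j is a neighbor of i
            }
        }
        return adj;
    }
}
```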


A more efficient method to connect the nodes is to divide the graph into smaller pieces, or "cells," and compare nodes only within and between nearby cells. If this division is done intelligently, we can minimize the number of connections which would overlap between cells, which will save even further computation. More specifically, we will need to compare all nodes within one cell first. This operation is still technically on the order of the number of nodes squared, O(n^2); however, it will be much faster than a naive implementation. As shown in Figure 4, the number of comparisons required to create a graph is significantly lower. For instance, for a graph of size n = 6400, the naive method would require about 40 million comparisons, while the cell-based implementation averages just over 110 thousand comparisons.

Clearly, the cell method would be an improvement in the number of comparisons, but is it feasible in terms of the computational complexity of dividing these nodes, as well as the memory requirements to do so? If implemented efficiently, it can be. To minimize the overlap of cells, it is best to use cells with side length r. This way, a vertex can never be connected to a vertex in a non-adjacent cell, as that would require spanning a distance of more than r. However, cells any larger than r would begin to be costly, as the number of comparisons increases on the order of the number of nodes in the cell squared. Thus, we partition the graph into cells of size r x r, as displayed in Figure 6.

Figure 6 - A view of a graph split into cells. The red nodes are in the current cell, the blue nodes are in adjacent cells so they must be compared.

Dividing the nodes into cells requires going through all n nodes in the graph and doing a constant amount of work on each; thus the division into cells is O(n). In order to do this, we can do a "bucket sort" on all n nodes into (1/r)^2 cells. As we go through the nodes, we classify each node's destination cell based on its x and y coordinates. Once we know which cell it will end up in, we can copy that node into the "bucket" corresponding to that cell. These buckets are variable length, but we must be able to retrieve elements by their index (for reasons to be explained later), so we store these nodes in a Vector, a Java class which implements the List interface. It is similar to a stack, but provides indexed access (O(1) lookup time) to any element. Essentially, it's a growable array; by setting its initial capacity to the estimated cell size, we can minimize the need to grow a Vector, so the performance, on average, will be almost, if not just, as good as keeping these elements in an array. These cells will double the required amount of memory, as they duplicate the initial array of nodes, which we're keeping for use later.

Once these elements are sorted into their constituent cells, we can begin connecting the nodes. Connections are stored redundantly by both vertices at either end of an edge. Each node stores a Vector of connections which can grow to any desired size. Each connection stores the node ID of the node at the other end. This avoids the overhead and wasted space of storing a sparse adjacency matrix.

To utilize the cells to connect the nodes, we must first check all pairs of nodes within a cell, which is quadratic in the number of nodes in that cell. Note, however, that the number of nodes within a cell is, assuming a uniform distribution, O(n*r^2), which is significantly lower than O(n). Assuming that r halves every time n is increased by a factor of four (which keeps the average degree of the nodes constant and was used on all test graphs in this project), the number of nodes per cell stays constant, so the comparisons summed over all (1/r)^2 cells could be said to be O(n). After the connections are made within a cell, we must check for connections with adjacent cells.

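The bucket sort of nodes into cells can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the names are hypothetical, ArrayList stands in for the Vector used in the original, and the number of cells per side is taken as floor(1/r), so that each cell has side length at least r even when 1/r is not a whole number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buckets node ids into a square grid of cells.
class CellGrid {
    final int cellsPerSide;
    final List<List<Integer>> cells;   // cells.get(row * cellsPerSide + col) holds node ids

    CellGrid(float[][] pts, float r) {
        // floor(1/r) cells per side keeps every cell at least r wide (assumption).
        cellsPerSide = Math.max(1, (int) Math.floor(1.0 / r));
        cells = new ArrayList<>();
        for (int c = 0; c < cellsPerSide * cellsPerSide; c++) cells.add(new ArrayList<>());
        // One pass over the nodes with constant work per node: O(n).
        for (int id = 0; id < pts.length; id++) {
            cells.get(cellOf(pts[id][0], pts[id][1])).add(id);
        }
    }

    /** Maps coordinates in [0, 1] to a cell index; the clamp handles x or y equal to 1. */
    int cellOf(float x, float y) {
        int col = Math.min((int) (x * cellsPerSide), cellsPerSide - 1);
        int row = Math.min((int) (y * cellsPerSide), cellsPerSide - 1);
        return row * cellsPerSide + col;
    }
}
```

Storing node ids rather than copies of the nodes is a design choice of this sketch; it keeps the buckets small while still allowing indexed lookup into the original node array.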

Again, some optimization can take place here. If we were to compare against all eight adjacent cells (eight assumes that the source cell is not on an edge), we would be redundantly checking nine cells for each source cell. The first optimization we can make is to check only those relationships coming from the source cell and going to some adjacent cell; this means that we won't check connections within an adjacent cell until later. Also, if we consistently check for connections between cells in a certain direction (to the upper-left, for instance), we can minimize the number of adjacent cells we need to check. By starting at the lower right and working up and to the left by column then by row, we actually only need to check the adjacent cells to the left, upper-left, and above the source cell. This greatly reduces the complexity of the connection process.

One decision which was glossed over previously may deserve more consideration here. The decision of whether or not to sort vertices within these cells carries with it certain pros and cons. It may be possible to further divide a cell dynamically if the nodes were sorted by x-coordinate, for instance. A node on the far edge of an adjacent cell may not need to be compared to a node on the opposite edge of this cell. There are other analytical purposes that sorting these vertices could serve. However, for the sake of this implementation, it seemed beneficial to leave the cells unsorted: lookup within these cells was always done by index, so we never needed to search for any particular item within a cell.

Thus, we are able to create random geometric graphs of size n with a given radius in O(n^2) time in the worst case, which, in practice, is typically much closer to O(n) time when the radius scales as 1/sqrt(n) in order to maintain a constant average vertex degree.
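A self-contained sketch of the cell-based connection step is given below. The names are hypothetical and ArrayList replaces the original Vector. One deliberate difference from the description above is an assumption of this sketch: it visits four "forward" neighbor cells (left, upper-left, above, and upper-right) rather than three, so that each pair of adjacent cells, including both diagonals, is examined exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the cell method; not the project's code.
class CellConnect {
    static List<List<Integer>> connect(float[][] pts, float r) {
        int n = pts.length;
        int k = Math.max(1, (int) Math.floor(1.0 / r));   // cells per side; cell side >= r
        List<List<Integer>> cells = new ArrayList<>();
        for (int c = 0; c < k * k; c++) cells.add(new ArrayList<>());
        for (int id = 0; id < n; id++) {                   // bucket sort into cells, O(n)
            int col = Math.min((int) (pts[id][0] * k), k - 1);
            int row = Math.min((int) (pts[id][1] * k), k - 1);
            cells.get(row * k + col).add(id);
        }
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        float r2 = r * r;
        // Forward neighbors as {row offset, column offset}: left, upper-left, above, upper-right.
        int[][] forward = { {0, -1}, {-1, -1}, {-1, 0}, {-1, 1} };
        for (int row = 0; row < k; row++) {
            for (int col = 0; col < k; col++) {
                List<Integer> here = cells.get(row * k + col);
                // Pairs inside the source cell.
                for (int a = 0; a < here.size(); a++)
                    for (int b = a + 1; b < here.size(); b++)
                        link(pts, adj, here.get(a), here.get(b), r2);
                // Pairs between the source cell and each forward neighbor.
                for (int[] d : forward) {
                    int nr = row + d[0], nc = col + d[1];
                    if (nr < 0 || nc < 0 || nc >= k) continue;   // neighbor is off the grid
                    for (int a : here)
                        for (int b : cells.get(nr * k + nc))
                            link(pts, adj, a, b, r2);
                }
            }
        }
        return adj;
    }

    /** Stores the edge redundantly at both endpoints if the nodes are within distance r. */
    private static void link(float[][] pts, List<List<Integer>> adj, int a, int b, float r2) {
        float dx = pts[a][0] - pts[b][0], dy = pts[a][1] - pts[b][1];
        if (dx * dx + dy * dy <= r2) { adj.get(a).add(b); adj.get(b).add(a); }
    }
}
```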


COMPONENT ANALYSIS
One consideration which hadn't been anticipated coming into the project but had to be addressed was the issue of disconnected graphs being created. At times, some nodes would have a degree of 0 or would belong to an isolated subgraph. The literature and others in this field suggested that the project limit itself to only those graphs with one component, i.e. connected graphs. For the smaller graphs in the sample set (n ≤ 1,600) this was not a large problem. However, on the largest graphs there were multiple occasions on which the graph consisted of multiple separate components. This was a particular problem on the square graph (edges/corners) as well as on the skewed disc (the sparse center), as can be seen in Figure 7. In order to limit my study only to those graphs with one component, I needed to perform some additional analysis on the graph once it was created to ensure that it was, indeed, one contiguous component. To do this, I use a breadth-first search and keep an array recording which nodes have been visited during the search. This requires O(|V|) memory, as I need a bit for each vertex in the graph. Regarding the computational complexity, a breadth-first search requires O(|V| + |E|) time.
Figure 7 - Histogram of the number of components on the Skewed Disc for n=6400 and r = 0.025
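The connectivity check described above can be sketched with a standard breadth-first search. The names are hypothetical, and the adjacency-list representation (one list of neighbor ids per vertex) is an assumption that matches the redundant edge storage described earlier.

```java
import java.util.ArrayDeque;
import java.util.List;

// Hypothetical sketch: returns true if the graph has exactly one component.
class Connectivity {
    static boolean isConnected(List<List<Integer>> adj) {
        int n = adj.size();
        if (n == 0) return true;
        boolean[] visited = new boolean[n];        // O(|V|) extra memory
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        visited[0] = true;
        queue.add(0);
        int reached = 1;
        while (!queue.isEmpty()) {                 // each vertex and edge is handled once
            int v = queue.poll();
            for (int w : adj.get(v)) {
                if (!visited[w]) {
                    visited[w] = true;
                    reached++;
                    queue.add(w);
                }
            }
        }
        return reached == n;                       // connected iff BFS reached every vertex
    }
}
```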

VERTEX ORDERING
I implemented the Smallest Last algorithm to order the vertices in the graph based on their degree. The logic here is to sort the nodes based, roughly, on their degree so that we can color those nodes with the highest degree first, as it's likely that their neighbors will have lower degree, and thus less competition for color availability, than the large-degree nodes. The Smallest-Last (SL) algorithm is designed to work in O(|V| + |E|) time. To truly understand the algorithm, I recommend that you view the animations available on the accompanying website. However, I will give a brief description of the algorithm here, as it pertains to the implementation and data structures.

The algorithm is initially interested in grouping vertices by their degree. In order to do this, we first perform a "bucket sort" on all of the vertices in the graph, an O(|V|) operation, assuming that each node stores its number of connections as an O(1) accessible variable. In this case, as all of my vertices stored their connections as a Vector, I was able to retrieve the size in O(1) time.

Once the nodes have been grouped into these buckets, we begin by taking a vertex of the lowest degree and performing a "cut" on the graph regarding this node. This simulates the deletion of this node, including all edges connecting to it. We must then update two features of all nodes to which this node was previously connected (O(|E|) in total over the whole run): first, we remove the edge from the other node; second, we move that node from a bucket of degree D to a bucket of degree D-1, to show that its degree is now one less than it was previously.

In implementation, one must be careful at this point not to have an algorithm which searches through the entire bucket to find a neighbor node and remove it. If this were the case, the algorithm would approach O(n^2) (or O(n lg n), if the bucket were sorted) in its time complexity, as there could be up to n elements in any bucket.


The easiest way to handle this is to implement a doubly-linked list for the buckets so that a vertex, upon deletion, can update the previous and next nodes' pointers in O(1) time. Note that, because of the lack of traditional pointers in Java, this had to be simulated. As we delete the nodes, we place them into an array or stack which represents the order in which the nodes were deleted. This ordering, when read backwards, serves as the order in which we color the vertices. By repeating this action, we eventually consume the entire graph, beginning with low-degree nodes and moving up to higher-degree nodes, for the most part. Thus, the smallest-last ordering can be completed in O(|V| + |E|) time.
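One way to simulate the doubly-linked buckets is with integer arrays, as in the sketch below. This is an assumption about a workable design, not the project's code: the names are hypothetical, the graph is assumed to be simple (no duplicate edges), and deleted vertices are marked with a flag instead of physically removing edges from their neighbors' lists.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of smallest-last ordering with array-based linked buckets.
class SmallestLast {
    /** Returns the vertices in coloring order, i.e. the reverse of the deletion order. */
    static int[] order(List<List<Integer>> adj) {
        int n = adj.size();
        int[] deg = new int[n];
        int[] next = new int[n], prev = new int[n];   // simulated pointers, -1 means none
        int[] head = new int[n + 1];                  // head[d] = first vertex of degree d
        boolean[] removed = new boolean[n];
        Arrays.fill(head, -1);
        for (int v = 0; v < n; v++) {                 // bucket sort by degree, O(|V|)
            deg[v] = adj.get(v).size();
            insert(v, deg[v], head, next, prev);
        }
        int[] result = new int[n];
        int d = 0;
        for (int i = n - 1; i >= 0; i--) {
            // The minimum degree can drop by at most one after a deletion.
            d = Math.max(0, d - 1);
            while (head[d] == -1) d++;
            int v = head[d];
            unlink(v, d, head, next, prev);
            removed[v] = true;
            result[i] = v;                            // deleted first, colored last
            for (int w : adj.get(v)) {                // move each live neighbor down one bucket
                if (removed[w]) continue;
                unlink(w, deg[w], head, next, prev);
                deg[w]--;
                insert(w, deg[w], head, next, prev);
            }
        }
        return result;
    }

    private static void insert(int v, int d, int[] head, int[] next, int[] prev) {
        prev[v] = -1;
        next[v] = head[d];
        if (head[d] != -1) prev[head[d]] = v;
        head[d] = v;
    }

    private static void unlink(int v, int d, int[] head, int[] next, int[] prev) {
        if (prev[v] != -1) next[prev[v]] = next[v]; else head[d] = next[v];
        if (next[v] != -1) prev[next[v]] = prev[v];
    }
}
```

Because both insert and unlink touch only a constant number of array entries, no bucket is ever searched, which is what keeps the whole ordering within O(|V| + |E|).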
Figure 8 - Degree When Removed during smallest-last algorithm (vertical axis: degree when removed, 0 to 18; horizontal axis: vertex in smallest-last order, smallest to largest, 1 to 1565)

A few interesting observations can be made regarding the degree of nodes when removed from the graph. For one thing, the degree can decrease by at most one between consecutively removed nodes. This is because a node will only be selected if it is the lowest-degree node in the remaining graph. Thus, the only way the next node could be of lower degree is if that node was connected to the node that just got deleted, and is now of degree D - 1. However, the degree when removed can increase by any amount between consecutive nodes. Also, the final "plunge" down to zero marks the terminal clique. Other cliques can be seen by the sharp vertical descents elsewhere in the plot.

The maximum number of colors to be used in the graph is bounded by the largest degree when removed plus one. This is because, if a node has degree D when removed, its remaining neighbors use at most D distinct colors, so one of the first D + 1 colors is always available for the node itself. No higher color could possibly be needed.

GRAPH COLORING
Note that the final nodes removed by the Smallest Last algorithm will necessarily be a clique - a subgraph in which every vertex is connected to every other vertex - commonly called the "terminal clique." This is because the algorithm will remove all nodes of lower degree before arriving at m nodes which all have the same degree, m-1, as they are all connected to one another. This is typically one of the larger cliques in the graph. This is a near ideal place to start coloring, as we know that we will need at least m colors to color the graph if there is an m-sized clique. By assigning these colors initially, we can assume that the lower-degree nodes will be easier to color. We apply the greedy algorithm commonly referred to as the "Grundy" function which, given a node off of the Smallest-Last ordering, will analyze all edges connected to this node (O(|E|)) and will find the smallest color which is available, i.e. not used by any adjacent node.


To do this, my implementation creates a bitmap of the size of the current colors. It then visits every neighbor of the current node and marks the spot in the bitmap pertaining to the color of the visited neighbor as "taken." After visiting all neighbors, the algorithm finds the lowest index which has not been marked as "taken," and applies that color to the current vertex. Because this must be done for every node in the graph, and each edge is examined once from each of its endpoints, the total time of coloring is O(|V| + |E|). Applying this function to each vertex in turn produces a coloring of the graph which, typically, is a near-optimal solution.
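A minimal sketch of this greedy (Grundy) coloring step is shown below. The names are hypothetical. As an assumption made to keep the work per vertex proportional to its degree, the "taken" array here is sized to the vertex's degree plus one rather than to the number of colors in use; a vertex with D neighbors never needs a color index above D.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of greedy coloring along a given vertex order.
class GreedyColoring {
    /** Colors vertices in the given order; returns color[v] as a 0-based color index. */
    static int[] color(List<List<Integer>> adj, int[] order) {
        int n = adj.size();
        int[] color = new int[n];
        Arrays.fill(color, -1);                       // -1 means not yet colored
        for (int v : order) {
            int degree = adj.get(v).size();
            boolean[] taken = new boolean[degree + 1];
            for (int w : adj.get(v)) {                // mark the colors used by neighbors
                int cw = color[w];
                if (cw >= 0 && cw <= degree) taken[cw] = true;
            }
            int c = 0;
            while (taken[c]) c++;                     // lowest color not used by any neighbor
            color[v] = c;
        }
        return color;
    }
}
```

Passing the smallest-last order as the `order` argument gives the combination described in this section; any other permutation of the vertices would also yield a proper, though possibly less economical, coloring.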

BENCHMARK RESULT SUMMARY & DISPLAY
All data related to the performance of my implementation on the benchmark algorithms is detailed extensively on the accompanying website.
