
Introduction to Algorithmic Analysis

Algorithmic Analysis is primarily concerned with bounding the running time (also called the growth rate or the time complexity) of an algorithm. Now, this concept, bounding the running time of an algorithm, is more complex than it appears at first glance. So let's work our way into it slowly. We'll first consider what bounding means, then what the running time of an algorithm is, and finally, we'll consider how these two concepts work together to give us the material in this chapter.

Bounding
Generally, we bound the value of something because we don't know the exact value for that thing. Since we don't know the exact value, we settle for upper and lower limits between which the value we are interested in must lie. These form the upper and lower bounds respectively. For example, let's say that someone robs a bank and a witness says that the robber is between 150 and 170 pounds. Well, that witness probably didn't have the opportunity to determine the exact weight of the robber. The best he could do was to give a range within which the weight of the robber probably lies. So he says something like between 150 and 170 pounds. In this case, 170 pounds would be the upper bound, because it bounds the actual weight of the robber on the upper side. The value of 150 pounds would be the lower bound, because it bounds the weight of the robber on the lower side. In either case, we assume that the actual weight of the robber is in between, hence bounded by, those two numbers. Perhaps the witness bounds the height of the robber in a similar fashion. He may say that the robber was between 6 feet and 6 feet 4 inches tall. Again, he doesn't know the exact height of the robber, but he does the best he can and attempts to place the robber's height within a range. Assuming that he is correct, 6 feet 4 inches would be the upper bound on the actual height of the robber, 6 feet 0 inches would be the lower bound, and the actual height would be in between those two values. Having bounds like this is not trivial or a waste of time. Even though we don't know the exact value being bounded, these bounds are frequently very useful. For example, if we know that the height of the robber was between 6 feet and 6 feet 4 inches and we find a suspect who is 5 foot 2 inches tall, then we can reject him as the robber. Similarly, if we know the weight of the robber is between 150 and 170 pounds, we're not going to arrest someone who is 250 pounds (unless there is another reason, of course). In general, we would not be able to reject any potential suspect if we didn't at least have the bounds on the height and weight values. It would be better if we had an exact figure, that is true. Nevertheless, these upper and lower bounds still serve a very useful purpose. Note that the usefulness of the bounds is directly related to how tight the bounds are. By tightness is meant how close the bound values are to the real value being bounded. For example, the height bounds of 6 feet to 6 feet 4 inches allow us to reject anyone outside of that range, but those bounds would not allow us to reject anyone within that range. Thus, anyone whose height was between 6 feet and 6 feet 4 inches would remain a potential suspect. If the specified bounds were tighter, however, say 6 feet 1 to 6 feet 2 (with the robber's height being in between those two bounds), then we could reject people we couldn't have rejected before, such as those whose height is between 6 feet and 6 feet 1 inch or between 6 feet 2 and 6 feet 4 inches, and we'd have a much clearer idea of the actual height of the robber.

If the bounds are not very tight at all, then we lose information. Let's say that the witness reported that the robber was between 4 feet and 7 feet in height. Well, these bounds are so loose as to be almost useless. The only people we could reject would be those at the extremes; virtually everyone else would be in between those two values. The much tighter upper and lower bounds of 6 feet 1 and 6 feet 2 respectively give us far more information about the actual height of the robber, which is, after all, what we are really interested in. In general, therefore, we want our bounds to be as tight as possible. This means that the upper bound must be as low as possible but still greater than or equal to the actual value being bounded, while the lower bound must be as great as possible while still remaining less than or equal to the actual value. This gives a general introduction into the idea of bounding a value. The situation in Algorithmic Analysis, however, is significantly different, since we are attempting to bound the running time (also called the time complexity or growth rate) of an algorithm, a more difficult and subtle concept than bounding a simple value like weight or height. Let's take a look at just what the running time of an algorithm means.

Running Time (Time Complexity) of an Algorithm


Definition of an operation
All non-trivial algorithms do useful work by performing a series of operations. (Some books call these primitive operations.) These operations are simply the steps necessary to accomplish the task defined by the algorithm. For example, let's say that we are given an array of integers, and our task is to determine the sum of the integers in the array. For a small example, let's say the array has five elements in it. It might look like this:

| 10 | 3 | 21 | 7 | 5 |
A reasonable algorithm to solve this problem might be:

1. create a variable called sum and set sum = 0
2. loop through the array elements from the first element to the fifth element
   a. at each position, add the contents of the current array element to sum
3. return the value of sum

When this algorithm runs, we get the following results. Step (1): sum is created and initialized to 0. Step (2): as the loop runs through all five elements, sum goes from 0 to 10, 13, 34, 41, and finally 46, which is the sum of these integers. In Step (3) the value of sum is returned in one operation. Note that Step (1) takes two operations' worth of work, and Step (3) takes one operation of work. That is a fixed overhead of 3 operations. Step (2) takes 5 operations of work precisely because there are 5 elements in the array. The total number of operations performed by this algorithm is therefore 5+3=8. But what if we generalize the algorithm so that it runs on an array of any size and is not limited to summing just 5 elements? Say we want it to work for an array of n elements, whatever n is for a particular array. We could then rewrite the algorithm as follows:

0. get the value of n
1. create a variable called sum and set sum = 0
2. loop through the array elements from the first element to the nth element
   i. add the contents of the current array element to sum
3. return the value of sum

This would perform the exact same function, i.e., adding up the elements of the array, but now it would work on an array of any size. We just have to specify the size of the array beforehand. But what is the amount of work (i.e., the number of operations) done by the algorithm now? Well, we still have the fixed overhead of 3 operations for Steps (1) and (3), plus one additional operation for Step (0). That is a fixed overhead of 4 operations. (We had to add one operation to the overhead to obtain the value of n.) But note what happens in Step (2) now. Instead of being a fixed number, as it was when the size of the array was limited to 5, the number of operations done in Step (2) is now a function of n. It is, in fact, precisely equal to n. When we add everything up, we see that the total number of operations performed by this algorithm is now given by the expression n+4. Note that the total work done by the algorithm, i.e., the total number of operations performed, depends on n, the size of the input. As n increases, so does the number of operations performed by the algorithm. Similarly, as n decreases, so does the amount of work done by the algorithm. In either case, as long as we know what n is, we can use the formula we developed to calculate the number of operations performed by this algorithm when it runs. Now, we all know that modern computers, even desktop systems, are blindingly fast, and that the amount of time it takes to do one operation, say to add a number to sum, for example, may be measured in microseconds or less. This will vary, of course, from one computer to the next, as some computers are faster than others. Nevertheless, no matter how much time a particular computer takes to do one operation, it is not zero time. It will take less time on a faster computer, more time on a slower computer, but still some non-zero amount of time.
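To make the counting concrete, here is a small sketch in Python (Python is used here only as convenient, runnable pseudocode; the function and variable names are our own, and the operation tally follows the counting convention used above, where Step 1 costs two operations and Steps 0 and 3 cost one each):

    def sum_array(a):
        """Sum the elements of a, tallying 'operations' as counted in the text."""
        ops = 0

        n = len(a)       # Step 0: get the value of n            -> 1 operation
        ops += 1

        total = 0        # Step 1: create sum and set it to 0    -> 2 operations
        ops += 2

        for i in range(n):           # Step 2: one addition per element
            total = total + a[i]     #         -> n operations in total
            ops += 1

        ops += 1         # Step 3: return the value of sum       -> 1 operation
        return total, ops

    result, operations = sum_array([10, 3, 21, 7, 5])
    print(result, operations)        # prints 46 and 9, matching T(5) = 5 + 4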

Relationship between operations and actual time


So what is the relationship between the number of operations a particular algorithm requires when it runs and the actual time it takes to run on a specific computer? Well, clearly, if we knew the number of operations a particular algorithm requires for a given input size, we could multiply that number by the average time it takes to do one operation to get an estimate of the actual time the algorithm requires on that computer.

Of course, a single operation would take much longer on an old 8086 processor than on a modern Cray supercomputer, and therefore, the total time would be longer for any algorithm. But that calculation is not difficult to do. Say, for example, we know that a computer takes 1 second to do one operation. Then this algorithm will take 5+4=9 seconds to run on an array of size 5, and 30+4=34 seconds to run on an array of size 30. If, on the other hand, our computer takes 0.1 second to do one operation, then the total time will be 3.4 seconds. In either case, the number of operations performed is still 34 for this particular input, no matter what computer it is run on. The actual time it takes for the algorithm to run, however, will vary depending on the speed of the specific computer. Although actual time information might be useful in some situations, it is, in general, ignored in algorithmic analysis. There are several reasons for this, but as a threshold issue, note that if we included the actual time a particular computer takes to do one operation in our formulae, we'd have to create a different formula for each separate computer. A faster computer would have a different time factor per operation than a slower computer, resulting in a different formula. This would limit the value of our analysis considerably. Rather than be bogged down in computer-specific details, we generalize our algorithmic analysis to be specific to an algorithm, and not to a particular computer. We accomplish this by focusing our analysis on the number of operations a particular algorithm requires when it runs, and ignore the time it takes to do one operation. (We have to assume, of course, that the time required for one operation is constant for a given machine, but that is a reasonable assumption given the ultra-fast nature of today's computers, and it helps us avoid implementation-specific details.) The formulae we develop as a result will apply to any machine that runs a given algorithm. For example, we could run the algorithm we developed above on a 20-year-old 8086 processor or on a modern Cray supercomputer, and both will require n+4 operations to run, whatever n is for a particular input array. That's because we're looking at the number of operations and not the actual time the algorithm takes to run on a particular machine. It is this assumption that makes time complexity analysis so worthwhile. By generalizing the analysis to track the number of operations instead of actual time, we get a single formula for an algorithm that applies to any computer on which the algorithm is run. This makes the formula specific to the algorithm and not to the computer. With experience, you'll see that this is a huge advantage when doing this type of analysis.
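As a minimal sketch of the arithmetic just described (the per-operation times are made-up illustrative figures, not measurements):

    def estimated_seconds(n, seconds_per_op):
        """Estimated running time of the summing algorithm, which performs n + 4 operations."""
        operations = n + 4
        return operations * seconds_per_op

    # The operation count depends only on n; the wall-clock estimate also
    # depends on how fast the particular machine is.
    print(estimated_seconds(30, 1.0))    # 34.0 seconds at 1 second per operation
    print(estimated_seconds(30, 0.1))    # 3.4 seconds at 0.1 second per operation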

Time complexity formulae


In general, we let capital T represent time when doing this analysis. (Again, this is not very precise terminology, since, as we mentioned, we are really tracking the number of operations instead of time. But as we discussed above, this is a simplification that has a purpose and is widely accepted.) Thus, we would say that for this algorithm, T(n) = n+4. Stated in English, this algorithm will require n+4 operations to run on an input size of n. Its time complexity, represented by T(n), is n+4. We've now created an equation that represents the amount of time it takes for our algorithm to run as a function of n, the input size. This equation is specific to the algorithm, and will apply to whatever computer on which the algorithm is run. It is called a time complexity equation, and what it tracks, in our loose sense, is the running time of the algorithm as a function of n, the size of the input. Equations like this can be generated for almost any algorithm, and the variations are as numerous as there are algorithms.

In this case, T(n) = n+4 is a linear equation, since the exponent on the single n term is 1. But any variation you could think of is certainly possible for time complexity equations. Instead of T(n) = n+4, for example, you could have time complexity equations like T(n) = n^2 + 4, or T(n) = n^25 + 14, or T(n) = n^25 + 47n^13 - 5n^3 + 56 (I'm just making this up), or T(n) = n log2 n (a very common time complexity for sorting algorithms). Nor do these functions have to be polynomials in n. T(n) = 2^n is an exponential time complexity function that is found in many algorithms. (It is called exponential because the n value is in the exponent position.) It all depends on what the particular algorithm does as a function of n when it runs.
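To see how differently these shapes behave, here is a short sketch that evaluates a few of the forms mentioned above at some sample input sizes (the specific sizes are arbitrary; the point is only the relative growth):

    import math

    # A few of the time complexity shapes mentioned above, written as Python
    # functions so their values can be compared at sample input sizes.
    complexities = {
        "n + 4":     lambda n: n + 4,
        "n^2 + 4":   lambda n: n**2 + 4,
        "n log2 n":  lambda n: n * math.log2(n),
        "2^n":       lambda n: 2**n,
    }

    for name, T in complexities.items():
        print(name, [round(T(n), 1) for n in (2, 8, 16, 32)])
    # The exponential 2^n dwarfs the others long before n reaches 32.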

Graphing time complexity functions


In our current discussion, it will be useful to look at some graphs of these functions. But before we do, let's take note of a couple of factors that will simplify the graphs. First, since we are talking about time complexities of algorithms for various input sizes, the n value that is fed into the function will always be positive. (It doesn't make any sense to talk about a negative input size.) Furthermore, since the time complexity function measures the amount of work (or the number of operations) of an algorithm as it runs on input size n, T(n) will always be positive as well. (It doesn't make any sense to talk about a negative amount of work.) Thus, these functions can always be graphed in the upper right-hand quadrant of a Cartesian coordinate plane, and the other three quadrants can be ignored. If we graph our function, T(n) = n+4, we get a straight line. For simplicity, we'll omit the 4, since all it does is shift the curve upwards without changing the shape of the curve. (As we will see shortly, it's the shape of the curve that is of most importance to us. We will generally ignore additive and multiplicative constants.) Let's graph the output of our time complexity function, T(n), against the size of the input, n. Since we intend to add some more functions to this graph, we'll include some subscripts so we can distinguish between them. Let's call our current function T1(n) = n, and graph it.

[Figure: T1(n) = n plotted as a straight line; vertical axis T(n), horizontal axis n]
In this case, our function graphs as a straight line. But note what happens if we make even a slight change to the exponent on n. Let's say that instead of T1(n) = n = n^1, our time complexity function is T2(n) = n^1.2. When we graph these two together, we get something like:
[Figure: T1(n) = n and T2(n) = n^1.2 plotted together; T2 curves upward above T1]
since if T2(n) = n^1.2, then T2(5) = 6.9, T2(10) = 15.8, and T2(15) = 25.8. From this graph we can observe a very important point, namely, that the second time complexity function, T2(n) = n^1.2, has a very different shape from the first function, T1(n). We note that the output of both time functions increases as the size of the input, n, increases, but we also note that T2 increases at a faster rate than the original time function T1(n) = n increases. The actual shape of the curves is different. This critical point is at the heart of algorithmic analysis, and it is not captured by the simple fact that T2 is greater than T1 for any value of n greater than 1. Of far more importance is the fact that the shape of T2 is different from T1 and arcs upwards more rapidly. We express this fact by saying that the growth rate of T2 is greater than the growth rate of T1. (This analysis is a little bit incomplete, because it doesn't take into account the effect of multiplicative or additive constants. That issue will be discussed shortly, however, so please bear with us.) This same type of analysis can be continued for any time function. Let's say our time function is T3(n) = n^2. We'd have to change the scale on our graph to display it, but T3(n) = n^2 will increase at an even faster rate than the other two.
[Figure: T1(n) = n, T2(n) = n^1.2, and T3(n) = n^2 plotted together; T3 rises the fastest]
Here the values for T2(n) = n^1.2 and T3(n) = n^2 were calculated as follows:

    n              2     3     5            8
    T1(n) = n      2     3     5            8
    T2(n) = n^1.2  2.3   3.7   6.9          12.1
    T3(n) = n^2    4     9     25 (not shown)   64 (not shown)
Again, the critical point to be observed from this graph is that, as n increases, T3(n) = n^2 grows more rapidly than either of the other two time functions. Stated more succinctly, the growth rate of T3(n) = n^2 is higher than the growth rates of the other two functions. Similarly, the growth rate (i.e., the shape of the curve as n increases) is higher for T2(n) = n^1.2 than for T1(n) = n, but it is not higher than that of T3(n) = n^2. Functions of the form n^k, where k is a constant, are called polynomial functions in n. The growth rate of such functions is determined by the value of the constant exponent: the larger the exponent, the larger the growth rate. Thus, for two functions n^k and n^t, the growth rate of n^k is larger than the growth rate of n^t whenever k > t. If k = t, then the two functions have the same growth rate.
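The table above is easy to reproduce; the following sketch recomputes it and makes the point about exponents directly (the sample values of n are the same ones used in the table):

    # Recomputing the comparison table: for polynomial functions n^k,
    # a larger exponent k means a faster-growing curve.
    t1 = lambda n: n
    t2 = lambda n: n ** 1.2
    t3 = lambda n: n ** 2

    print("n     T1=n   T2=n^1.2   T3=n^2")
    for n in (2, 3, 5, 8):
        print(f"{n:<5} {t1(n):<6} {t2(n):<10.1f} {t3(n):<6}")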

Bounding time complexity functions


Now, here comes the main point to which this discussion has been leading. It is a subtle point but very important. As we mentioned above, in time complexity analysis we attempt to bound the time complexity of an algorithm. Well, time complexity is a loose way of describing the shape, and hence the growth rate, of the time complexity function for that particular algorithm. Therefore, we could say that in time complexity analysis we want to establish upper and lower bounds for the growth rate of a time function, exactly as we established upper and lower bounds for the height or weight of a robber. Of course, in the latter case, those height and weight values are single numbers and therefore much easier to bound. In time complexity analysis, on the other hand, we bound functions and not single values. Thus, bounds consisting of single values will not work, as they did with our robber example. So, how do we bound time complexity functions? We do it by finding other functions that have higher growth rates (if we want an upper bound), or lower growth rates (if we want a lower bound). We then let those other functions be the bounds on our specific function. Using functions to bound functions is actually a very natural solution, even in the context of the bounding examples of height and weight we looked at previously. Height and weight values are single numbers, and so we use single numbers to bound them. Time complexity functions, on the other hand, are functions, and so we use functions to bound them. Single values won't work. For an example, let us say that we wanted to find an upper and lower bound for the growth rate of T2.

As we observed in the last graph above, T3 has a higher growth rate than T2 (it increases more rapidly as n increases), and therefore, T3 could serve as an upper bound for T2. T1, on the other hand, has a lower growth rate than T2 (it grows more slowly as n increases). Thus, T1 could serve as a lower bound on T2. T2 is therefore bounded from above by T3 and bounded from below by T1. Note that, even in the context of using functions to bound functions, the idea of tightness comes into play. For example, T2 is a lower bound on the growth rate of T3. Well, you can see from the graph that T1 is also a lower bound on the growth rate of T3, but it is not as tight a bound as T2. (Why is this? Because the growth rate of T2 is closer to the actual growth rate of T3 than the growth rate of T1 is. Therefore, T2 is a tighter lower bound.) Similarly, T3 is an upper bound on T1, but it is not as tight an upper bound as T2. It would be nice if we had some formal method that could be used to determine if one function is an upper or lower bound on some other function. In the next section, we'll develop formal criteria and a definition that can be used to do just that. We'll begin by looking at two issues: (a) a lower bound on the n values to be used in making that determination, and (b) the role of constants in the definition. Once we have the definition, we will describe a method by which we can show that some functions are upper bounds on other functions. Finally, we'll consider some practical examples of the method.

Lower bound on the n values


Not all time complexity functions are as straightforward as the ones in our examples, which were simple polynomials in n. In general, functions may not be so well behaved, and there may be regions of the graph where one function is above the other and other regions in the same graph where that situation is reversed. As we will see, this does not necessarily prevent one function from serving as the upper or lower bound on another function. Let's say that we have a time complexity function T4, and we'd like to show that another function, T5, is an upper bound on the growth rate of T4. The problem is that T5 crosses T4 multiple times in the lower portion of the graph. We have a situation like this...

[Figure: T4(n) and T5(n) plotted together; the two curves cross each other several times for small n, but beyond a point marked n0 (roughly 7 as drawn) T5 stays above T4]
Although T5 is sometimes greater than and sometimes less than T4, T5 can still serve as the upper bound for T4, provided it meets a very well-defined criterion, described shortly. In general, we don't require that T5 be greater than T4 for all values of n, although that certainly was the case with our examples above, T1, T2, and T3. All functions from that set of functions that were upper bounds for other functions in that set were greater for all values of n. But this constraint is too limiting in a general sense. After all, since almost all computers nowadays are fast enough to perform almost any algorithm quickly for small input sizes, we are really concerned about what happens with our function for large values of n. Therefore, what happens in the lower part of the graph (i.e., for small n) is not a major concern, and we need a way of defining the bounding process that ignores that part of the graph. This is accomplished by setting a lower limit on the n values that concern us. When we say that one function, say T5, is an upper bound for another function, say T4, we require that there exist some value of n, let's call it n0, for which T5 is greater than T4 for all values of n that are greater than or equal to n0 (up to some constant, to be discussed shortly). It doesn't make any difference per se what the value of n0 is. It varies for different functions and can be anywhere on the positive x-axis. It can be 1 or 10 or 100 (although clearly, it must be positive). It is just important that there be a specific value of n after which the curves no longer cross. In the case of the upper bound, the bounding function should be higher than the function being bounded after the n0 point. In the case of the lower bound, it should be lower. But the discussion is symmetrical in both cases. In the graph above, n0 is approximately 7 as it is drawn. For values of n less than 7, we see that T5 wanders around and is sometimes greater than and sometimes less than T4. T5, therefore, could not serve as an upper bound for T4 in that region. But after n=7, every value of T5 is greater than the corresponding value of T4, and T5 never crosses below T4 after that point. This is the critical characteristic that allows T5 to serve as an upper bound for T4. If we were to express this pseudo-mathematically (keeping in mind that this is a bit incomplete, because we have yet to talk about the role of constants), we could say that some function T5 is an upper bound for some function T4 if there is a positive integer n0 such that T4(n) <= T5(n) for all values of n >= n0. This statement simply uses basic logical and mathematical terms to express what we have been saying.
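The following sketch shows one way to look for such an n0 numerically. The functions T4 and T5 used here are hypothetical stand-ins (the text does not specify them), and the search is a spot check over a finite range of integer n values, not a proof:

    import math

    def smallest_n0(lower, upper, n_max=10_000):
        """Return the smallest sampled n0 such that upper(n) >= lower(n) for
        every integer n from n0 up to n_max, or None if there is no such n0
        in the sampled range.  (A numerical check, not a proof.)"""
        n0 = None
        for n in range(1, n_max + 1):
            if upper(n) >= lower(n):
                if n0 is None:
                    n0 = n        # candidate start of the "stays above" region
            else:
                n0 = None         # upper dipped below lower; reset the candidate
        return n0

    # Hypothetical T4 and T5: T5 wobbles above and below T4 for small n,
    # but eventually stays above it.
    T4 = lambda n: n
    T5 = lambda n: 1.5 * n + 5 * math.sin(n)

    print(smallest_n0(T4, T5))    # prints 6 for these particular stand-ins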

The role of constants in determining upper bounds


Although we've made good progress in the analysis so far, it remains incomplete until we take into account the effect of constants. To remind ourselves of some comments made earlier, we fundamentally are interested in something deeper than whether one curve has higher values than another in the graph. We are looking for some way to bound growth rates, the actual shape of the curve, and multiplicative constants can sometimes make that analysis more difficult. Consequently, we need some way to eliminate or at least mitigate their influence. For example, let's look at the linear function with which we began this discussion, T1(n) = n, and consider a variation that differs only by a multiplicative constant: T6(n) = 2n. Note that both of these functions are linear, which means that their basic growth rates are the same. Since the growth rates are the same, we would want them to be considered the same in our algorithmic analysis. Yet, because of the multiplicative constant in T6, this is not possible in our current analysis. T6 will be greater than T1 for all values of n.

If we graph T1 and T6 together, we get something like:
[Figure: T1(n) = n and T6(n) = 2n plotted together; both are straight lines, with T6 rising more steeply]
T6 is also a straight line (i.e., linear); it just has a higher slope than T1. If we let n0 = 1, then we can say that T1(n) <= T6(n) for all values of n >= n0. Apparently, then, T6 fulfills the definition for an upper bound on T1 as the definition has been written so far. The problem with this analysis is that we really don't want to consider two functions that differ only by a constant to have different time complexities. Time complexity analysis is not based solely on whether one function is larger than another in the graph, but on the growth rates of the two functions (i.e., how quickly they grow as n increases). In the case of T1(n) = n and T6(n) = 2n, the growth rates (in contrast to the actual values of T1 or T6) are exactly the same -- they are both linear -- even though T6 is apparently a larger function because of the times-2 factor. (Note that this was not the case when comparing T1(n) = n, T2(n) = n^1.2, and T3(n) = n^2. There the growth rates of all three functions were different (and listed in order of increasing growth rate).) Since T1 and T6 have the same growth rate, we would want either one to be able to serve as an upper bound for the other. This would not work, however, with our current definition, since the inequality T6(n) <= T1(n) is not true for any value of n. The reason for making this distinction between growth rates and actual values in the graph is that basing upper and lower bound analysis merely on the actual values makes the comparison of time complexity functions too fine-grained. It can make one function appear to be an upper bound when it really isn't. What we really want to look at, in effect, is the shape of the curve rather than how steep it is. The differences provided by the different shapes of curves are far more powerful than those differences provided by constants. An actual example will illustrate this point.

Growth rates always predominate over multiplicative constants


Let's consider two different time complexity functions f(n) = 250n and g(n) = n^1.001. (Not all authors follow the convention of using T to represent time complexity functions. Using functions such as f and g in time complexity analysis is also perfectly acceptable as long as the context is clear.) It looks like f(n) is going to be a much larger function than g(n). We quickly check some numbers to verify this and come up with:

    n       f(n) = 250*n    g(n) = n^1.001    h(n) = n
    10      2,500           10.02             10
    100     25,000          100.5             100
    2000    500,000         2,015.3           2,000
    5000    1,250,000       5,042.8           5,000

Here, f(n) is much larger than g(n) for all the numbers we checked, and at first glance, it looks like f(n) is always going to be larger than g(n). We might conclude, if we look solely at the values in the graph, that f(n) has a higher growth rate than g(n). (As a quick point, note that this would not be the case if the multiplicative constant 250 were not there, and the h(n) = n column is included to emphasize this point. If we compare h(n) and g(n), we see that g(n) is larger than h(n) for all values of n.) However, that is not the case. It can easily be shown (and this is a skill that you will learn shortly) that there exists a specific n value for which g(n) will cross f(n) and be larger from that point onwards. Stated another way that makes use of the partial definition created above, there is an n0 value beyond which g(n) will be larger than f(n) and never again cross below it. (In logical terms, we say that there is a positive integer n0 such that f(n) <= g(n) for all n >= n0.) In this particular case, that n value will be 250^1000, a very large number indeed, but still, just a number. (We can calculate that value by solving the inequality 250n <= n^1.001 for n. To do that, first divide both sides by n. (Note that this is allowed, since we know that n > 0.) That gives us 250 <= n^0.001. Then, take both sides to the 1000th power to get the result that 250^1000 <= n.) For all values of n greater than 250^1000, g(n) will be larger than f(n). Thus, although it looks like f(n) is larger than g(n), this is really true only for n values on the lower end of the graph. There is an n0 value beyond which g(n) will be larger than f(n). Therefore, g(n) is an upper bound on f(n), rather than the other way around, a fact that was at first obscured by the large multiplicative constant in f(n). This illustrates an extremely important point, namely that the difference in the shape of the curve (f(n) is linear whereas g(n) is slightly greater than linear) will eventually overwhelm the effect of any multiplicative constant. The constant simply pushes that point of crossing to different places along the graph. For example, if the constant were 350 instead of 250, that n0 value would be 350^1000, an even greater number. If the constant were less than 250, then n0 would be correspondingly smaller. But no matter how large the constant is in f(n), there will be some n0 value after which g(n) will be greater than f(n). The underlying reason for this is so very important that it is worth restating: the shape of the curve (i.e., the growth rate) will eventually overwhelm the effect of any constant. A function in n that has an exponent of 1.001, say, on the n term has a different growth rate than a function that has an exponent of 1 on the n, and the former function has a larger growth rate than the latter. At some point, that larger growth rate will overwhelm the effect of any constant in the function with the lower growth rate. This factor needs to be captured in our time complexity analysis. We need to be able to focus on the growth rates of the various functions, and not get sidetracked by the effect of constants. We want the growth rates specifically to be the determinative factor, and not merely the issue of whether one function is greater than another when they are graphed.
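The crossover is easy to check numerically. Since 250^1000 is far too large for ordinary floating-point arithmetic, the sketch below compares the two functions through their base-10 logarithms (the offsets of plus or minus 100 are arbitrary, chosen just to land clearly on either side of the crossover):

    import math

    # Compare f(n) = 250*n and g(n) = n**1.001 near the crossover n0 = 250**1000.
    # Working with base-10 logarithms keeps the numbers manageable:
    #   log10 f(n) = log10(250) + log10(n)
    #   log10 g(n) = 1.001 * log10(n)
    def log10_f(log10_n):
        return math.log10(250) + log10_n

    def log10_g(log10_n):
        return 1.001 * log10_n

    crossover = 1000 * math.log10(250)     # log10 of n0 = 250**1000

    for log10_n in (crossover - 100, crossover + 100):
        print(log10_f(log10_n) <= log10_g(log10_n))
    # prints False, then True: f(n) is still the larger function somewhat
    # below the crossover, but g(n) has overtaken it above the crossover.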

So how do we fix this problem? We fix it by adding a constant to the definition. We now say that some function g(n) is an upper bound for some function f(n) if there is a positive integer n0 and a real constant c > 0 such that f(n) <= c*g(n) for all n >= n0. Here, f(n) is the function being bounded, and g(n) is the bounding function (the upper bound, in this case). We've simply added the possibility of including a multiplicative constant to the bounding (not the bounded) function. What is the effect of this constant c? It eliminates the effect of any such constant in f(n). For example, if we want to show that g(n) = n is an upper bound on f(n) = 2n, we simply let c = 2 and n0 = 1. Given these constants, it is now clear that f(n) <= c*g(n) for all n >= n0. Why? Because 2n (= f(n)) is indeed less than or equal to 2 * n (= c*g(n)) for all n >= 1 (= n0). Given these constants, these two functions now fulfill the definition, and g(n) = n is indeed an upper bound on f(n) = 2n. (We'll work some examples that illustrate this process in more detail.)
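A small sketch can spot-check a proposed pair of constants against this definition. It samples the inequality over a finite range of n, so it only gathers evidence (the definition itself requires the inequality for all n >= n0); the function names and the sampling range are our own choices:

    def check_witness(f, g, c, n0, n_max=1_000_000, samples=1000):
        """Spot-check f(n) <= c*g(n) for sampled integer n between n0 and n_max.
        Returns True if the inequality held at every sampled point."""
        step = max(1, (n_max - n0) // samples)
        return all(f(n) <= c * g(n) for n in range(n0, n_max + 1, step))

    # The pair just discussed: f(n) = 2n and g(n) = n.
    f = lambda n: 2 * n
    g = lambda n: n
    print(check_witness(f, g, c=1, n0=1))   # False: c = 1 is not enough
    print(check_witness(f, g, c=2, n0=1))   # True:  c = 2 with n0 = 1 works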

Big-Oh Notation
In the literature, these concepts have been incorporated under the name of the Big-Oh notation. (It is called Big-Oh because there is also a small-Oh relationship that has a different definition.) Using set notation, one reasonable definition for Big-Oh is the following: O(g(n)) = {f(n) | there are constants n0 and c > 0 such that f(n) <= c*g(n) for all n >= n0}. That is, Big-Oh of some function g(n) is the set of functions f(n) that are bounded from above by g(n). In the literature, this idea is expressed colloquially by saying that f(n) is O(g(n)), or even f(n) = O(g(n)). (Some authors even use the more correct set notation and say that f(n) ∈ O(g(n)).) The function inside the Big-Oh, g(n) in this case, is the bounding function (that is, it is the upper bound); the function outside the Big-Oh, f(n) in this case, is the bounded function. To say that some function f(n) is Big-Oh of some function g(n) means that f(n) is bounded from above by g(n), which is another way of saying that the growth rate of g(n) is greater than or equal to the growth rate of f(n). We can determine whether some function f(n) is bounded from above by g(n) by finding correct constants n0 and c and then showing that the different elements of the definition are fulfilled with those constants. That is the subject of the next section.

Using the Big-Oh notation


Our ultimate goal is to be able to use the Big-Oh definition as a tool to determine whether one function could be an upper bound of some other function. For a general example (specific examples will follow), let us say that we have two functions in n, f(n) and g(n), that these are time complexity functions for some algorithms, and that we want to show that f(n) = O(g(n)) (stated in English, we want to show that f(n) is Big-Oh of g(n)). Before we get into the details of how to do that, let's restate the definition in a way that will prove more useful to our efforts: If there exist constants n0 and c such that f(n) <= c*g(n) for all n >= n0, then f(n) = O(g(n)).

This definition is now a propositional statement of the form p implies q, or if p then q, where p is the proposition "there exist constants n0 and c such that f(n) <= c*g(n) for all n >= n0", and q is the proposition "f(n) = O(g(n))". Because of the logical structure of this statement, if we can show that p is true, we can conclude that q is true as well. There are several things to clarify here. First, notice that in the statement f(n) = O(g(n)) there is a function inside the Big-Oh, g(n) in this case, and a function outside the Big-Oh, f(n) in this case. The function inside the Big-Oh is the bounding function, that is, it is the upper bound. The function outside the Big-Oh is the function being bounded. The relationship between these is clearer if we look at the definition in a semi-graphical form, with the pieces laid out to show the relationship between the components:

    if there exist constants n0 and c such that:   f(n) <= c*g(n) for all n >= n0,
    then we can conclude that:                     f(n) is O(g(n))

Note that the function that appears inside the Big-Oh, namely g(n), is the same function that is greater than or equal to (up to a constant) the function that appears outside the Big-Oh, namely f(n). The function g(n) is therefore an upper bound on f(n) (i.e., it's on the big side of the inequality). The constant c in the definition eliminates the effect of any possible constants in f(n), since we can set c to be anything we want, including something greater than the constant in f(n). And the constant n0 eliminates from our consideration lower values of n. Putting all of these elements together, this definition now focuses on how the growth rates of these two functions relate for large values of n, and that is exactly what we wanted all along. Based on this definition of the Big-Oh, the following four-step process can be used to prove that some function f(n) is O(g(n)):

1) Determine the inequality template for that particular problem. (This template is the inequality found in the first line of the definition.) Take particular note of which function is the bounding function, and which is the bounded function.

2) Specify some values for n0 and c. Note that n0 must be a positive integer greater than or equal to 1 (n0 >= 1), and c is any real number strictly greater than 0 (c > 0).

3) Show that the elements of the definition are true with those two constants and the functions f(n) and g(n). In general, this requires us to show that the inequality is true with the four values for f(n), c, g(n), and n0 plugged in. (We simply plug the four values into the template.)

4) If the inequality is true with the specified values, we can conclude that f(n) is O(g(n)). (This follows from the propositional nature of the definition, if p then q.)

Let's work through some examples to illustrate how this is done.

Example 1:

Let f(n) = n and g(n) = n. Show that f(n) = O(g(n)).

Here, both f(n) and g(n) are the same function. This is no problem, and we can still show that f(n) = O(g(n)). All we have to do is find constants n0 and c such that the definition holds. We can set these values to anything we want within the described limits; we just have to find some correct ones. Following the four-step process, we first determine the inequality template for this particular problem. As illustrated in the diagram above, the positions of the functions within the template are given by the positions of the functions in what we are trying to prove. Since g(n) is inside the Big-Oh, it will be on the big side of the inequality, and f(n) will be on the other side. For this problem, therefore, the inequality template is: f(n) <= c*g(n) for all n >= n0. We now need to determine some specific values for c and n0. Well, let's try a few and see what happens. What happens if we let c=1 and n0=1? Is the definition true? Well, let's plug these values into the business end of the definition (i.e., the inequality) and find out. Our goal is to see if f(n) <= c*g(n) for all n >= n0 for these particular values. When the values f(n) = n, c=1, g(n) = n, and n0 = 1 are all plugged into the inequality template, we get the statement: n <= 1*n for all n >= 1. We now need to determine if this statement is true. Simplifying a bit, we see that this statement is the same as n <= n when n >= 1, which is obviously true. Since the statement is true when the specified values for n0, c, f(n), and g(n) are used in the inequality template, we can conclude that f(n) = n is O(g(n)). That's all it takes to make a proof. Since our definition is of the form if p then q, and we have shown that p is true, we can conclude that q is true as well.

Example 2:

Let f(n) = 2n and let g(n) = n. Show that f(n) = O(g(n)).

In this problem, the constants n0=1 and c=1 won't work. Why? Because when we plug these values into the definition, the resulting inequality isn't true. Let's verify that.

If f(n)=2n, c=1, g(n)=n, and n0=1, and we plug these values into the inequality template for this problem, f(n) <= c*g(n) for all n >= n0, we get the statement: 2n <= 1*n (= n) for all n >= 1. This statement is the same as 2n <= n for all n >= 1, which is clearly not true for any n (remember that n is always positive). For example, let n=5. Then the inequality says that 2*5 <= 5, or 10 <= 5. This statement is mathematically false. Hence, we cannot conclude that f(n) = O(g(n)) by using these constants. Now, the fact that the definition doesn't work for a specific pair of constants doesn't mean it won't work for another pair. The definition of Big-Oh requires us to find just a single pair that works; it does not have to work for all pairs. So let's try a different set of constants, say c=2 and n0=1, and see what happens. When we plug these values into the same template inequality f(n) <= c*g(n) for all n >= n0, we get: 2n <= 2*n for all n >= 1. This statement is the same as 2n <= 2n for all n >= 1. In contrast to the previous result, this is clearly true for all n. Therefore the definition holds, and we can conclude that f(n) = O(g(n)) from that analysis alone. What if we let c=3? Then we'd have the statement 2n <= 3n for all n >= 1. This statement is clearly true as well. What about c=125? Then we'd have 2n <= 125n for all n >= 1. This is also true. In fact, the inequality will be true for all c >= 2 and all n0 >= 1. It is true for an infinite number of values for both c and n0. This was a lucky occurrence on this problem. In general, when doing Big-Oh analysis, it is not necessary to show that the definition is true for an infinite number of c's and n0's. We really need to demonstrate only one such pair to complete the proof. In this case, however, it is certainly true for many such values of c and n0. What if we let c be some value less than 2, say c=1.5? Now the definition does not hold. The inequality 2n <= 1.5n is not true for any values of n. Thus, c=1.5 won't work at all, no matter what n0 value we choose.

Example 3:

Let f(n) = 2n and g(n) = n (exactly as in Example 2). Show that g(n) = O(f(n)).

Now we need to find constants n0 and c such that g(n) <= c*f(n) for all n>= n0. Note that even though the values for f(n) and g(n) have not changed, f(n) is now the bounding function, while g(n) is the function being bounded. Thus, these two functions have changed positions in the definition from the previous problem. It is very important that this orientation be correct in the inequality template when working these problems.

Again, finding correct constants is easy for this problem. Let's examine c=1 and n0=1 and see if they work. If we plug these into the general inequality specified by the definition, g(n) <= c*f(n) for all n >= n0, we get: n <= 1*(2n) for all n >= 1, which is the same as: n <= 2n for all n >= 1. This statement is clearly true for all n values. (Example: let n=5. Then the statement says that 5 <= 10, which is true.) Since the inequality is true with the specified constants, we can conclude that g(n) = O(f(n)).

Example 4:

Let f(n) = n and g(n) = n^2. Show that f(n) = O(g(n)).

Finding constants c and n0 that will work here is also a simple matter. Let's let c=1 and n0=1 and see what happens. Since g(n) is inside the Big-Oh, the inequality template for this problem is: f(n) <= c*g(n) for all n >= n0. When we plug in the values f(n)=n, c=1, g(n)=n^2, and n0=1, we get: n <= 1*n^2 for all n >= 1. This statement is the same as n <= n^2 for all n >= 1. Since n is always positive, this is clearly a true statement, and we can conclude that f(n) = O(g(n)).

Example 5:

Let f(n) = 25n and g(n) = n^2. Show that f(n) = O(g(n)).

Here the growth rate of g(n) is clearly higher than the growth rate of f(n), so we expect to be able to prove this. However, the presence of the multiplicative constant in f(n) creates a problem. Because of it, our previous constants of c=1 and n0=1 will no longer work. Let's quickly verify this. The template for this problem is again: f(n) <= c*g(n) for all n >= n0. But when we plug in the values f(n)=25n, c=1, g(n)=n^2, and n0=1, we get: 25n <= 1*n^2 for all n >= 1.
This statement is not true for any n < 25. For example, let n=10. When we plug n=10 into that last statement, we get: 25*10 = 250 <= 10^2 = 100. In other words, the statement says that 250 <= 100. This is clearly false mathematically. As mentioned in an earlier discussion, the fact that a given pair of constants c and n0 doesn't work does not necessarily mean that we can't make the proof. There may be another pair of constants that would work. In this case, we can observe that the inequality above fails only for values of n < 25, while it is correct for all values of n >= 25. We could thus set n0, the lowest value of n that we are willing to consider, to 25. So let's try the constants c=1 and n0=25, and see what happens. When we plug these into the template, we get the statement: 25n <= n^2 for all n >= 25. This statement is now true. For example, if n=25, the lowest value of n now allowed, then the statement says that: 25*25 = 625 <= 25^2 = 625, or 625 = 625, which is true. Say n=50. Then the statement says that 25*50 = 1,250 <= 50^2 = 2,500, which is also true. Clearly, the statement is true for all values of n greater than or equal to 25 (but it is not true for any n < 25). Thus, the values of c=1 and n0=25 also satisfy the definition, and we can conclude from them that f(n) = O(g(n)). Setting the lower limit of the n values we were willing to consider to 25 worked nicely. Are there other constant pairs that would work? How about setting c=25? If we did, what value for n0 would work? If we let c=25 and n0=1 and plug them into the template, we get the following statement: 25n <= 25*n^2 for all n >= 1. This statement is the same as 25n <= 25n^2 for all n >= 1. This is clearly true as well. Thus, the constant pair c=25 and n0=1 also satisfies the definition. Notice that changing the c value allowed us to use a different n0 value. These two constants must work together to achieve a correct statement. What if we let c=5? Is there a value of n0 that would work? If c=5, then we have the inequality: 25n <= 5*n^2 for all n >= n0. We haven't figured out n0 yet, but we can do that with a little math. Let's solve the inequality 25n <= 5n^2 for n and see when it is true.

If we divide both sides by 5n (acceptable, since n is positive), we get the inequality 5 <= n. This result says that the original inequality is true for all n >= 5. This small calculation gives us a clue for yet another constant pair. Apparently, if we set the lower bound on n values to 5, the inequality will be true. So let's set c=5 and n0=5 and see if that pair works. If we plug those values into the template, we get: 25n <= 5*n^2 for all n >= 5. Mathematically, this statement is the same as the statement 5 <= n for all n >= 5, which is trivially true. Nevertheless, let's take a few examples and see how it works. If n=5, then this statement says that 25*5 = 125 <= 5*5^2 = 5*25 = 125, or 125 <= 125. This is clearly true. If n=10, then this statement says that 25*10 = 250 <= 5*10^2 = 5*100 = 500, or 250 <= 500. This is also true. In fact, this statement will indeed be true for all n >= 5. Thus, we have found another constant pair, c=5 and n0=5, that will satisfy the definition and lead to a successful proof. This example illustrates an important point: the constants c and n0 have to work together. If we change the value of one, there is oftentimes a different value of the other that will lead to a successful proof. Just remember that the overall objective in Big-Oh analysis is to find a single pair that will make the inequality true.
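The three constant pairs found above can all be spot-checked the same way; the sketch below samples each inequality over a finite range of n (again, numerical evidence rather than a proof, and the upper limit of 10,000 is arbitrary):

    # Example 5 revisited: several (c, n0) pairs witness that f(n) = 25n
    # is bounded above by g(n) = n^2 in the sense of the definition.
    f = lambda n: 25 * n
    g = lambda n: n ** 2

    for c, n0 in [(1, 25), (25, 1), (5, 5)]:
        holds = all(f(n) <= c * g(n) for n in range(n0, 10_000))
        print(f"c = {c:>2}, n0 = {n0:>2}: {holds}")
    # All three lines print True.  The pair c = 1, n0 = 1 would fail,
    # because 25n <= n^2 is false for n < 25.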

Example 6:

Let f(n) = n and g(n) = n^2 (as in a previous example). Is g(n) = O(f(n))?

The answer to this question is no, g(n) is not O(f(n)). We know this because both f(n) and g(n) are polynomial functions, and the exponent in f(n) is 1 while the exponent in g(n) is 2. Since g(n)'s exponent is larger, it has the higher growth rate, and there is no way that f(n), with a lower growth rate, could bound the growth rate of g(n) from above. (f(n) could be a lower bound for g(n), however, but that is a different story.) Nevertheless, as a learning experience, let's attempt to prove that g(n) = O(f(n)). We'll go through the process and see where the proof fails. Since we want to show that g(n) = O(f(n)), the inequality template for this problem is: g(n) <= c*f(n) for all n >= n0. We begin by plugging in the values for g(n) and f(n), and see what we have. When we plug in g(n) = n^2 and f(n) = n, we get the statement: n^2 <= c*n for all n >= n0. If we are going to complete this proof, we still have to find values for c and n0 that make this statement true. However, there is a significant problem already. There is no way that this inequality can be true for large values of n, no matter what values are chosen for c and n0.

Why is that? Well, let's examine the conditions under which this inequality, n^2 <= c*n, is true. If we divide both sides by n, we end up with the statement n <= c. In other words, this inequality is true only when n is less than or equal to the constant c. This is a serious problem. It means that the inequality can only be true for values of n that are less than some fixed constant, i.e., the lower values of n. Therefore, the inequality can never be true for large values of n extending to infinity. That's why we didn't even need to get to the step of giving specific values to c and n0. It really doesn't make any difference what c is. This inequality will only be true for the lower portion of the graph (in terms of n values), never in the upper portion in which we are truly interested. Since the inequality cannot be true for large values of n extending to infinity, there is no n0 value for which the inequality is true for all n >= n0. Therefore, the proof fails, and we cannot conclude that g(n) = O(f(n)).
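A short sketch makes the failure concrete: whatever constant c we try, the inequality n^2 <= c*n breaks down as soon as n exceeds c (the candidate values of c below are arbitrary):

    # Example 6 revisited: no constant c can make n^2 <= c*n hold for all
    # large n, because the inequality reduces to n <= c.
    g = lambda n: n ** 2
    f = lambda n: n

    for c in (10, 1_000, 1_000_000):
        n = c + 1                              # the first n beyond the constant
        holds = g(n) <= c * f(n)
        print(f"c = {c}: does n^2 <= c*n hold at n = {n}?  {holds}")
    # Every line prints False: each candidate c is overwhelmed once n > c.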
