Algorithmic analysis is primarily concerned with bounding the running time (also called the growth rate or the time complexity) of an algorithm. This concept, bounding the running time of an algorithm, is more complex than it appears at first glance, so let's work our way into it slowly. We'll first consider what bounding means, then what the running time of an algorithm is, and finally how these two concepts work together to give us the material in this chapter.
Bounding
Generally, we bound the value of something because we don't know the exact value. Since we don't know the exact value, we settle for upper and lower limits between which the value we are interested in must lie. These form the upper and lower bounds, respectively. For example, suppose someone robs a bank and a witness says that the robber weighs between 150 and 170 pounds. That witness probably didn't have the opportunity to determine the exact weight of the robber; the best he could do was give a range within which the weight of the robber probably lies. So he says something like "between 150 and 170 pounds." In this case, 170 pounds is the upper bound, because it bounds the actual weight of the robber on the upper side. The value of 150 pounds is the lower bound, because it bounds the weight of the robber on the lower side. Either way, we assume that the actual weight of the robber lies between, and hence is bounded by, those two numbers. Perhaps the witness bounds the height of the robber in a similar fashion, saying the robber was between 6 feet and 6 feet 4 inches tall. Again, he doesn't know the exact height of the robber, but he does the best he can and places the robber's height within a range. Assuming that he is correct, 6 feet 4 inches is the upper bound on the actual height of the robber, 6 feet 0 inches is the lower bound, and the actual height lies between those two values. Having bounds like this is not trivial or a waste of time. Even though we don't know the exact value being bounded, these bounds are frequently very useful. For example, if we know that the height of the robber was between 6 feet and 6 feet 4 inches and we find a suspect who is 5 feet 2 inches tall, we can reject him as the robber. Similarly, if we know the weight of the robber is between 150 and 170 pounds, we're not going to arrest someone who weighs 250 pounds (unless there is another reason, of course).
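This rejection logic can be sketched in a few lines of Python; the suspect figures below are hypothetical, but the witness's bounds are the ones from the example.

```python
# A minimal sketch of bound-based rejection: any suspect whose height or
# weight falls outside the witness's bounds cannot be the robber.
# The individual suspects below are hypothetical.

def within(value, lower, upper):
    """True if value lies between the lower and upper bounds, inclusive."""
    return lower <= value <= upper

def possible_robber(height_inches, weight_pounds):
    # Witness bounds: 6'0"-6'4" tall (72-76 inches), 150-170 pounds.
    return within(height_inches, 72, 76) and within(weight_pounds, 150, 170)

print(possible_robber(62, 160))   # the 5'2" suspect: rejected
print(possible_robber(74, 250))   # the 250-pound suspect: rejected
print(possible_robber(74, 160))   # inside both ranges: remains a suspect
```

Note that the function can only rule suspects out, never identify the robber: that asymmetry is exactly what having bounds, rather than exact values, buys us.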
In general, we would not be able to reject any potential suspect if we didn't at least have bounds on the height and weight values. It would be better if we had exact figures, that is true. Nevertheless, these upper and lower bounds still serve a very useful purpose. Note that the usefulness of the bounds is directly related to how tight they are. By tightness is meant how close the bound values are to the real value being bounded. For example, the height bounds of 6 feet to 6 feet 4 inches allow us to reject anyone outside that range, but those bounds would not allow us to reject anyone within it. Thus, anyone whose height was between 6 feet and 6 feet 4 inches would remain a potential suspect. If the specified bounds were tighter, however, say 6 feet 1 to 6 feet 2 (with the robber's height lying between those two bounds), then we could reject people we couldn't have rejected before, such as those whose height is between 6 feet and 6 feet 1 inch or between 6 feet 2 and 6 feet 4 inches, and we'd have a much clearer idea of the actual height of the robber. If the bounds are not very tight at all, then we lose information. Let's say the witness reported that the robber was between 4 feet and 7 feet tall. These bounds are so loose as to be almost useless: the only people we could reject would be those at the extremes; virtually everyone else falls between those two values. The much tighter upper and lower bounds of 6 feet 2 and 6 feet 1, respectively, give us far more information about the actual height of the robber, which is, after all, what we are really interested in. In general, therefore, we want our bounds to be as tight as possible. This means that the upper bound must be as low as possible while still remaining greater than or equal to the actual value being bounded, and the lower bound must be as great as possible while still remaining less than or equal to the actual value. This gives a general introduction to the idea of bounding a value. The situation in algorithmic analysis, however, is significantly different, since we are attempting to bound the running time (also called the time complexity or growth rate) of an algorithm, a more difficult and subtle concept than bounding a simple value like weight or height. Let's take a look at just what the running time of an algorithm means.
[Figure: a five-element array containing the values 10, 3, 21, 7, and 5.]
A reasonable algorithm to solve this problem might be:

1. create a variable called sum and set sum = 0
2. loop through the array elements from the first element to the fifth element
   a. at each position, add the contents of the current array element to sum
3. return the value of sum

When this algorithm runs, we get the following results. In Step (1), sum is created and initialized to 0. In Step (2), as the loop runs through the five elements, sum takes the values, respectively, 10, 13, 34, 41, and finally 46, which is the sum of these integers. In Step (3), the value of sum is returned in one operation. Note that Step (1) takes two operations' worth of work, and Step (3) takes one operation of work. That is a fixed overhead of 3 operations. Step (2) takes 5 operations of work precisely because there are 5 elements in the array. The total number of operations performed by this algorithm is therefore 5+3=8.

But what if we generalize the algorithm so that it runs on an array of any size and is not limited to summing just 5 elements? Say we want it to work for an array of n elements, whatever n is for a particular array. We could then rewrite the algorithm as follows:

0. get the value of n
1. create a variable called sum and set sum = 0
2. loop through the array elements from the first element to the nth element
   a. add the contents of the current array element to sum
3. return the value of sum

This would perform the exact same function, i.e., adding up the elements of the array, but now it would work on an array of any size. We just have to specify the size of the array beforehand. But what is the amount of work (i.e., the number of operations) done by the algorithm now? Well, we still have the fixed overhead of 3 operations for Steps (1) and (3), plus one additional operation for Step (0) to obtain the value of n. That is a fixed overhead of 4 operations. But note what happens in Step (2) now. Instead of being a fixed number, as it was when the size of the array was limited to 5, the number of operations done in Step (2) is now a function of n; it is, in fact, precisely equal to n. When we add everything up, we see that the total number of operations performed by this algorithm is now given by the expression n+4. Note that the total work done by the algorithm, i.e., the total number of operations performed, depends on n, the size of the input.
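The generalized algorithm translates directly into Python. As a sketch, the version below is instrumented with a counter that mirrors the text's bookkeeping: one operation for Step 0, two for Step 1, one per element for Step 2, and one for Step 3.

```python
# Sketch of the generalized summing algorithm, with an operation counter
# that follows the accounting used in the text.

def sum_with_op_count(arr):
    ops = 0
    n = len(arr)          # Step 0: get the value of n        (1 operation)
    ops += 1
    total = 0             # Step 1: create sum, set it to 0   (2 operations)
    ops += 2
    for i in range(n):    # Step 2: one addition per element  (n operations)
        total += arr[i]
        ops += 1
    ops += 1              # Step 3: return the value of sum   (1 operation)
    return total, ops

# The five-element array from the example: running sums 10, 13, 34, 41, 46.
total, ops = sum_with_op_count([10, 3, 21, 7, 5])
print(total, ops)   # 46 9, i.e. n + 4 operations with n = 5
```

Running it on arrays of other sizes confirms the n+4 formula: a 30-element array costs 34 operations, whatever its contents.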
As n increases, so does the number of operations performed by the algorithm. Similarly, as n decreases, so does the amount of work done by the algorithm. In either case, as long as we know what n is, we can use the formula we developed to calculate the number of operations performed by this algorithm when it runs. Now, we all know that modern computers, even desktop systems, are blindingly fast, and that the amount of time it takes to do one operation, say to add a number to sum, may be measured in microseconds or less. This will vary, of course, from one computer to the next, as some computers are faster than others. Nevertheless, no matter how little time a particular computer takes to do one operation, it is not zero time. It will take less time on a faster computer, more time on a slower computer, but still some non-zero amount of time.
One operation would take much longer on an old 8086 processor than on a modern Cray supercomputer, and therefore the total time would be longer for any algorithm. But that calculation is not difficult to do. Say, for example, we know that a computer takes 1 second to do one operation. Then this algorithm will take 5+4=9 seconds to run on an array of size 5, and 30+4=34 seconds to run on an array of size 30. If, on the other hand, our computer takes 0.1 second to do one operation, then the total time for the size-30 array will be 34 x 0.1 = 3.4 seconds. In either case, the number of operations performed is still 34 for this particular input, no matter what computer it is run on. The actual time it takes for the algorithm to run, however, will vary depending on the speed of the specific computer. Although actual time information might be useful in some situations, it is, in general, ignored in algorithmic analysis. There are several reasons for this, but as a threshold issue, note that if we included the actual time a particular computer takes to do one operation in our formulae, we'd have to create a different formula for each separate computer. A faster computer would have a different time factor per operation than a slower computer, resulting in a different formula. This would limit the value of our analysis considerably. Rather than get bogged down in computer-specific details, we generalize our algorithmic analysis to be specific to an algorithm, and not to a particular computer. We accomplish this by focusing our analysis on the number of operations a particular algorithm requires when it runs, and ignoring the time it takes to do one operation. (We have to assume, of course, that the time required for one operation is constant for a given machine, but that is a reasonable assumption given the ultra-fast nature of today's computers, and it helps us avoid implementation-specific details.) The formulae we develop as a result will apply to any machine that runs a given algorithm.
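The separation between the two quantities can be sketched in a couple of lines: the operation count is a property of the algorithm, while the clock time also depends on a per-operation cost that varies by machine.

```python
# The operation count n + 4 is fixed by the algorithm; the running time
# scales it by the machine's per-operation cost.

def op_count(n):
    return n + 4                   # formula derived for the summing algorithm

def running_time(n, seconds_per_op):
    return op_count(n) * seconds_per_op

print(op_count(30))                # 34 operations, on any machine
print(running_time(30, 1.0))       # 34.0 seconds at 1 s per operation
print(running_time(30, 0.1))       # about 3.4 seconds at 0.1 s per operation
```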
For example, we could run the algorithm we developed above on a 20-year-old 8086 processor or on a modern Cray supercomputer, and both will require n+4 operations to run, whatever n is for a particular input array. That's because we're looking at the number of operations, not the actual time the algorithm takes to run on a particular machine. It is this assumption that makes time complexity analysis so worthwhile. By generalizing the analysis to track the number of operations instead of actual time, we get a single formula for an algorithm that applies to any computer on which the algorithm is run. This makes the formula specific to the algorithm and not to the computer. With experience, you'll see that this is a huge advantage when doing this type of analysis.
In the formula T(n) = n + 4, the exponent on the single n term is 1. But any variation you could think of is certainly possible for time complexity equations. Instead of T(n) = n + 4, for example, you could have time complexity equations like T(n) = n^2 + 4, or T(n) = n^25 + 14, or T(n) = n^25 + 47n^13 - 5n^3 + 56 (I'm just making this up), or T(n) = n log2 n (a very common time complexity for sorting algorithms). Nor do these functions have to be polynomials in n. T(n) = 2^n is an exponential time complexity function that is found in many algorithms. (It is called exponential because the n value is in the exponent position.) It all depends on what the particular algorithm does as a function of n when it runs.
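A quick sketch makes the variety concrete: evaluating a few of these shapes at small input sizes already shows how differently they grow.

```python
import math

# A few of the time-complexity shapes mentioned above, evaluated at
# small input sizes.
shapes = {
    "n + 4":    lambda n: n + 4,
    "n^2 + 4":  lambda n: n**2 + 4,
    "n log2 n": lambda n: n * math.log2(n),
    "2^n":      lambda n: 2**n,
}

for name, f in shapes.items():
    print(f"{name:10s}", [round(f(n), 1) for n in (2, 4, 8, 16)])
```

Even at n = 16 the exponential function has pulled four orders of magnitude ahead of the linear one.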
[Figure: graph of T1(n) = n, plotted as T(n) against n for n up to 20; the function is a straight line.]
In this case, our function graphs as a straight line. But note what happens if we make even a slight change to the exponent on n. Let's say that instead of T1(n) = n = n^1, our time complexity function is T2(n) = n^1.2. When we graph these two together, we get something like:
2010 Charles O. Shields, Jr.
[Figure: graphs of T1(n) = n and T2(n) = n^1.2 plotted together for n up to 20; T2 curves upward above the straight line of T1.]
The plotted values follow from the formula, since if T2(n) = n^1.2, then T2(5) = 6.9, T2(10) = 15.8, and T2(15) = 25.8. From this graph we can observe a very important point, namely, that the second time complexity function, T2(n) = n^1.2, has a very different shape from the first function, T1(n). The output of both time functions increases as the size of the input, n, increases, but T2 increases at a faster rate than the original time function T1(n) = n does. The actual shape of the curves is different. This critical point is at the heart of algorithmic analysis, and it is not captured by the simple fact that T2 is greater than T1 for any value of n greater than 1. Of far more importance is the fact that the shape of T2 is different from that of T1, arcing upwards more rapidly. We express this fact by saying that the growth rate of T2 is greater than the growth rate of T1. (This analysis is a little incomplete, because it doesn't take into account the effect of multiplicative or additive constants. That issue will be discussed shortly, however, so please bear with us.) This same type of analysis can be continued for any time function. Let's say our time function is T3(n) = n^2. We'd have to change the scale on our graph to display it, but T3(n) = n^2 increases at an even faster rate than the other two.
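The quoted values are easy to check numerically:

```python
# Checking the T2(n) = n**1.2 values quoted in the text against T1(n) = n.
T1 = lambda n: n
T2 = lambda n: n ** 1.2

for n in (5, 10, 15):
    print(n, T1(n), round(T2(n), 1))   # T2 pulls ahead: 6.9, 15.8, 25.8
```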
[Figure: graphs of T1(n) = n, T2(n) = n^1.2, and T3(n) = n^2 plotted together; T3 rises fastest of the three.]
Here the values for T2(n) = n^1.2 and T3(n) = n^2 were calculated as follows:

    n              2     3     5                8
    T1(n) = n      2     3     5                8
    T2(n) = n^1.2  2.3   3.7   6.9              12.1
    T3(n) = n^2    4     9     25 (not shown)   64 (not shown)
Again, the critical point to be observed from this graph is that, as n increases, T3(n) = n^2 grows more rapidly than either of the other two time functions. Stated more succinctly, the growth rate of T3(n) = n^2 is higher than the growth rates of the other two functions. Similarly, the growth rate (i.e., the shape of the curve as n increases) is higher for T2(n) = n^1.2 than for T1(n) = n, but it is not higher than that of T3(n) = n^2. Functions of the form n^k, where k is a constant, are called polynomial functions in n. The growth rate of such functions is determined by the value of the constant exponent: the larger the exponent, the larger the growth rate. Thus, for two functions n^k and n^t, the growth rate of n^k is larger than the growth rate of n^t whenever k > t. If k = t, then the two functions have the same growth rate.
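The exponent rule can be sketched numerically: for k > t the ratio n^k / n^t equals n^(k-t), which grows without bound, while equal exponents give a constant ratio.

```python
# For polynomial functions, the larger exponent always wins eventually:
# n**k / n**t = n**(k - t), which grows without bound when k > t.

def ratio(k, t, n):
    return n**k / n**t

for n in (10, 100, 1000):
    print(n, round(ratio(2, 1.2, n), 2))   # grows like n**0.8

print(round(ratio(1.2, 1.2, 1000), 2))     # equal exponents: ratio stays 1.0
```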
T3 has a higher growth rate than T2 (it increases more rapidly as n increases), and therefore T3 could serve as an upper bound for T2. T1, on the other hand, has a lower growth rate than T2 (it grows more slowly as n increases); thus, T1 could serve as a lower bound on T2. T2, then, is bounded from above by T3 and bounded from below by T1. Note that, even in the context of using functions to bound functions, the idea of tightness comes into play. We said above that T2 is a lower bound on the growth rate of T3. Well, you can see from the graph that T1 is also a lower bound on the growth rate of T3, but it is not as tight a bound as T2. (Why is this? Because the growth rate of T2 is closer to the actual growth rate of T3 than the growth rate of T1 is. Therefore, T2 is a tighter lower bound.) Similarly, T3 is an upper bound on T1, but it is not as tight an upper bound as T2. It would be nice if we had some formal method that could be used to determine whether one function is an upper or lower bound on some other function. In the next section, we'll develop formal criteria and a definition that can be used to do just that. We'll begin by looking at two issues: (a) a lower bound on the n values to be used in making that determination, and (b) the role of constants in the definition. Once we have the definition, we will describe a method by which we can show that some functions are upper bounds on other functions. Finally, we'll consider some practical examples of the method.
[Figure: graphs of two functions T4(n) and T5(n); the curves cross each other for small n, but after a point marked n0 on the n-axis, T5 stays above T4.]
Although T5 is sometimes greater than and sometimes less than T4, T5 can still serve as the upper bound for T4, provided it meets a very well-defined criterion, described shortly. In general, we don't require that T5 be greater than T4 for all values of n, although that certainly was the case with our examples above, T1, T2, and T3: every function from that set that was an upper bound for another function in the set was greater for all values of n. But this constraint is too limiting in a general sense. After all, since almost all computers nowadays are fast enough to run almost any algorithm quickly for small input sizes, we are really concerned about what happens with our function for large values of n. Therefore, what happens in the lower part of the graph (i.e., for small n) is not a major concern, and we need a way of defining the bounding process that ignores that part of the graph. This is accomplished by setting a lower limit on the n values that concern us. When we say that one function, say T5, is an upper bound for another function, say T4, we require that there exist some value of n, let's call it n0, for which T5 is greater than T4 for all values of n greater than or equal to n0 (up to some constant, to be discussed shortly). It doesn't matter per se what the value of n0 is. It varies for different functions and can be anywhere on the positive x-axis. It can be 1 or 10 or 100 (although clearly, it must be positive). What matters is that there be a specific value of n after which the curves no longer cross. In the case of the upper bound, the bounding function should be higher than the function being bounded after the n0 point; in the case of the lower bound, it should be lower. But the discussion is symmetrical in both cases. In the graph above, n0 is approximately 7 as drawn. For values of n less than 7, we see that T5 wanders around and is sometimes greater than and sometimes less than T4.
T5, therefore, could not serve as an upper bound for T4 in that region. But after n = 7, every value of T5 is greater than the corresponding value of T4, and T5 never crosses below T4 after that point. This is the critical characteristic that allows T5 to serve as an upper bound for T4. If we were to express this pseudo-mathematically (keeping in mind that this is a bit incomplete, because we have yet to talk about the role of constants), we could say that some function T5 is an upper bound for some function T4 if there is a positive integer n0 such that T4(n) <= T5(n) for all values of n >= n0. This statement simply uses basic logical and mathematical terms to express what we have been saying.
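The n0 criterion can be sketched with two hypothetical functions. (The text's T4 and T5 are drawn, not given by formulas, so the stand-ins below are made up for illustration.)

```python
# Find the smallest checked n after which t_high(n) >= t_low(n) holds
# with no further crossings, up to the search limit n_max.

def candidate_n0(t_low, t_high, n_max):
    n0 = None
    for n in range(1, n_max + 1):
        if t_high(n) >= t_low(n):
            if n0 is None:
                n0 = n          # tentative n0: the bound holds from here...
        else:
            n0 = None           # ...unless the curves cross back
    return n0

# Hypothetical stand-ins: T5 dips below T4 for small n, then stays above.
T4 = lambda n: 3 * n
T5 = lambda n: n * n - 10

print(candidate_n0(T4, T5, 100))   # 5: after n = 5, T5 never drops below T4
```

A search like this only inspects n up to n_max, so it suggests an n0 rather than proving one; the proof is the algebraic argument developed next.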
[Figure: graphs of T1(n) = n and T6(n) = 2n plotted together for n up to 20; both are straight lines, with T6 rising twice as steeply.]
T6 is also a straight line (i.e., linear); it just has a higher slope than T1. If we let n0 = 1, then we can say that T1(n) <= T6(n) for all values of n >= n0. Apparently, then, T6 fulfills the definition of an upper bound on T1 as the definition has been written so far. The problem with this analysis is that we really don't want two functions that differ only by a constant factor to have different time complexities. Time complexity analysis is based not solely on whether one function is larger than another in the graph, but on the growth rates of the two functions (i.e., how quickly they grow as n increases). In the case of T1(n) = n and T6(n) = 2n, the growth rates (in contrast to the actual values of T1 or T6) are exactly the same (both are linear), even though T6 is apparently a larger function because of the factor of 2. (Note that this was not the case when comparing T1(n) = n, T2(n) = n^1.2, and T3(n) = n^2. There the growth rates of all three functions were different, and listed in order of increasing growth rate.) Since T1 and T6 have the same growth rate, we would want either one to be able to serve as an upper bound for the other. This would not work, however, with our current definition, since the inequality T6(n) <= T1(n) is not true for any value of n. The reason for making this distinction between growth rates and actual values is that basing upper and lower bound analysis merely on the actual values makes the comparison of time complexity functions too fine-grained: it can make one function appear to be an upper bound when it really isn't. What we really want to look at, in effect, is the shape of the curve rather than how steep it is. The differences provided by the different shapes of curves are far more powerful than the differences provided by constants. An actual example will illustrate this point.
Here, f(n) is much larger than g(n) for all the numbers we checked, and at first glance, it looks like f(n) is always going to be larger than g(n). We might conclude, looking solely at the values in the graph, that f(n) has a higher growth rate than g(n). (As a quick point, note that this would not be the case if the multiplicative constant 250 were not there, and the h(n) = n column is included to emphasize this point: comparing h(n) and g(n), we see that g(n) is larger than h(n) for all values of n.) However, that is not the case. It can easily be shown (and this is a skill you will learn shortly) that there exists a specific n value at which g(n) will cross f(n) and be larger from that point onwards. Stated another way that makes use of the partial definition created above, there is an n0 value beyond which g(n) will be larger than f(n) and never again cross below it. (In logical terms, we say that there is a positive integer n0 such that f(n) <= g(n) for all n >= n0.) In this particular case, that n value will be 250^1000, a very large number indeed, but still, just a number. (We can calculate that value by solving the inequality 250n <= n^1.001 for n. To do that, first divide both sides by n. (Note that this is allowed, since we know that n > 0.) That gives us 250 <= n^0.001. Then raise both sides to the 1000th power to get the result 250^1000 <= n.) For all values of n greater than 250^1000, g(n) will be larger than f(n). Thus, although it looks like f(n) is larger than g(n), this is really true only for n values at the lower end of the graph. There is an n0 value beyond which g(n) will be larger than f(n). Therefore, g(n) is an upper bound on f(n), rather than the other way around, a fact that was at first obscured by the large multiplicative constant in f(n).
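The crossing is hard to exhibit numerically with the exponent 1.001, since 250^1000 overflows any floating-point type. But the same algebra with a more tractable exponent of 1.2 (my substitution, not the text's) puts the crossing at 250^5, which we can check directly.

```python
# f(n) = 250*n versus g(n) = n**1.2.  Solving 250*n <= n**1.2 gives
# 250 <= n**0.2, i.e. n >= 250**5 -- the same algebra the text applies
# to the exponent 1.001, where the crossing lands at 250**1000.

f = lambda n: 250 * n
g = lambda n: n ** 1.2

n0 = 250 ** 5                      # predicted crossing point
print(f(n0 // 2) > g(n0 // 2))     # below n0: f is still larger (True)
print(f(2 * n0) < g(2 * n0))       # beyond n0: g has overtaken f (True)
```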
This illustrates an extremely important point, namely that the difference in the shape of the curve (f(n) is linear whereas g(n) is slightly more than linear) will eventually overwhelm the effect of any multiplicative constant. The constant simply pushes the point of crossing to a different place along the graph. For example, if the constant were 350 instead of 250, the n0 value would be 350^1000, an even greater number. If the constant were less than 250, then n0 would be correspondingly smaller. But no matter how large the constant in f(n) is, there will be some n0 value after which g(n) is greater than f(n). The underlying reason for this is so important that it is worth restating: the shape of the curve (i.e., the growth rate) will eventually overwhelm the effect of any constant. A function in n with an exponent of 1.001, say, on the n term has a different growth rate from a function with an exponent of 1 on the n, and the former function has the larger growth rate. At some point, that larger growth rate will overwhelm the effect of any constant in the function with the lower growth rate. This factor needs to be captured in our time complexity analysis. We need to be able to focus on the growth rates of the various functions, and not get sidetracked by the effect of constants. We want the growth rates specifically to be the determinative factor, and not merely the issue of whether one function is greater than another when graphed. So how do we fix this problem? We fix it by adding a constant to the definition. We now say that some function g(n) is an upper bound for some function f(n) if there is a positive integer n0 and a real constant c > 0 such that f(n) <= c*g(n) for all n >= n0. Here, f(n) is the function being bounded, and g(n) is the bounding function (the upper bound, in this case). We've simply added the possibility of a multiplicative constant on the bounding (not the bounded) function. What is the effect of this constant c? It eliminates the effect of any such constant in f(n). For example, if we want to show that g(n) = n is an upper bound on f(n) = 2n, we simply let c = 2 and n0 = 1. Given these constants, it is now clear that f(n) <= c*g(n) for all n >= n0. Why? Because 2n (= f(n)) is indeed less than or equal to 2*n (= c*g(n)) for all n >= 1 (= n0). With these constants, the two functions fulfill the definition, and g(n) = n is indeed an upper bound on f(n) = 2n. (We'll work some examples that illustrate this process in more detail.)
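A little checker makes the definition concrete. Sampling values of n cannot prove a bound (that takes the algebra worked through in the examples), but a single failing n disproves a candidate witness pair (c, n0).

```python
# Check the Big-Oh inequality f(n) <= c*g(n) for n = n0 .. n_max.
# Passing on samples is only evidence; one failure is a disproof.

def witness_holds(f, g, c, n0, n_max=10_000):
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))

f = lambda n: 2 * n
g = lambda n: n

print(witness_holds(f, g, c=2, n0=1))   # True: 2n <= 2*n everywhere
print(witness_holds(f, g, c=1, n0=1))   # False: 2n <= n fails immediately
```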
Big-Oh Notation
In the literature, these concepts have been incorporated under the name of Big-Oh notation. (It is called Big-Oh because there is also a small-oh relationship that has a different definition.) Using set notation, one reasonable definition for Big-Oh is the following:

O(g(n)) = {f(n) | there exist constants n0 and c > 0 such that f(n) <= c*g(n) for all n >= n0}

That is, Big-Oh of some function g(n) is the set of functions f(n) that are bounded from above by g(n). In the literature, this idea is expressed colloquially by saying that f(n) is O(g(n)), or even f(n) = O(g(n)). (Some authors use the more correct set notation and say that f(n) ∈ O(g(n)).) The function inside the Big-Oh, g(n) in this case, is the bounding function (that is, it is the upper bound); the function outside the Big-Oh, f(n) in this case, is the bounded function. To say that some function f(n) is Big-Oh of some function g(n) means that f(n) is bounded from above by g(n), which is another way of saying that the growth rate of g(n) is greater than or equal to the growth rate of f(n). We can determine whether some function f(n) is bounded from above by g(n) by finding correct constants n0 and c and then showing that the different elements of the definition are fulfilled with those constants. That is the subject of the next section.
This definition is a propositional statement of the form p implies q, or if p then q, where p is the proposition "there exist constants n0 and c such that f(n) <= c*g(n) for all n >= n0," and q is the proposition "f(n) = O(g(n))." Because of the logical structure of this statement, if we can show that p is true, we can conclude that q is true as well. There are several things to clarify here. First, notice that in the statement f(n) = O(g(n)) there is a function inside the Big-Oh, g(n) in this case, and a function outside the Big-Oh, f(n) in this case. The function inside the Big-Oh is the bounding function, that is, it is the upper bound. The function outside the Big-Oh is the function being bounded. The relationship between these is clearer if we look at the definition in a semi-graphical form:
    if there exist constants n0 and c such that:   f(n) <= c*g(n) for all n >= n0,
    then we can conclude that:                     f(n) is O(g(n))
Note that the function that appears inside the Big-Oh, namely g(n), is the same function that is greater than or equal to (up to a constant) the function that appears outside the Big-Oh, namely f(n). The function g(n) is therefore an upper bound on f(n) (i.e., it is on the big side of the inequality). The constant c in the definition eliminates the effect of any possible constants in f(n), since we can set c to be anything we want, including something greater than the constant in f(n). And the constant n0 eliminates from our consideration lower values of n. Putting all of these elements together, this definition now focuses on how the growth rates of these two functions relate for large values of n, and that is exactly what we wanted all along. Based on this definition of Big-Oh, the following four-step process can be used to prove that some function f(n) is O(g(n)):

1) Determine the inequality template for that particular problem. (This template is the inequality found in the first line of the definition.) Take particular note of which function is the bounding function, and which is the bounded function.

2) Specify some values for n0 and c. Note that n0 must be a positive integer greater than or equal to 1 (n0 >= 1), and c is any real number strictly greater than 0 (c > 0).

3) Show that the elements of the definition are true with those two constants and the functions f(n) and g(n). In general, this requires us to show that the inequality is true with the four values for f(n), c, g(n), and n0 plugged in. (We simply plug the four values into the template.)

4) If the inequality is true with the specified values, we can conclude that f(n) is O(g(n)). (This follows from the propositional nature of the definition, if p then q.)
Example 1:
Let f(n) = n and g(n) = n. Show that f(n) = O(g(n)).

Here, both f(n) and g(n) are the same function. This is no problem, and we can still show that f(n) = O(g(n)). All we have to do is find constants n0 and c such that the definition holds. We can set these values to anything we want within the described limits; we just have to find some that work. Following the four-step process, we first determine the inequality template for this particular problem. As illustrated in the diagram above, the positions of the functions within the template are given by their positions in what we are trying to prove. Since g(n) is inside the Big-Oh, it will be on the big side of the inequality, and f(n) will be on the other side. For this problem, therefore, the inequality template is:

f(n) <= c*g(n) for all n >= n0

We now need to determine specific values for c and n0. Well, let's try a few and see what happens. What happens if we let c = 1 and n0 = 1? Is the definition true? Let's plug these values into the business end of the definition (i.e., the inequality) and find out. Our goal is to see whether f(n) <= c*g(n) for all n >= n0 with these particular values. When the values f(n) = n, c = 1, g(n) = n, and n0 = 1 are all plugged into the inequality template, we get the statement: n <= 1*n for all n >= 1. We now need to determine whether this statement is true. Simplifying a bit, we see that this statement is the same as n <= n when n >= 1, which is obviously true. Since the statement is true when the specified values for n0, c, f(n), and g(n) are used in the inequality template, we can conclude that f(n) = n is O(g(n)). That's all it takes to make a proof. Since our definition is of the form if p then q, and we have shown that p is true, we can conclude that q is true as well.
Example 2:
Let f(n) = 2n and g(n) = n. Show that f(n) = O(g(n)).

In this problem, the constants n0 = 1 and c = 1 won't work. Why? Because when we plug these values into the definition, the resulting inequality isn't true. Let's verify that. If f(n) = 2n, c = 1, g(n) = n, and n0 = 1, and we plug these values into the inequality template for this problem:

f(n) <= c*g(n) for all n >= n0
we get the statement: 2n <= 1*n (= n) for all n >= 1. This statement is the same as 2n <= n for all n >= 1, which is clearly not true for any n (remember that n is always positive). For example, let n = 5. Then the inequality says that 2*5 <= 5, or 10 <= 5. This statement is mathematically false. Hence, we cannot conclude that f(n) = O(g(n)) using these constants. Now, the fact that the definition doesn't work for a specific pair of constants doesn't mean it won't work for another pair. The definition of Big-Oh requires us to find just a single pair that works; it does not have to work for all pairs. So let's try a different pair of constants, say c = 2 and n0 = 1, and see what happens. When we plug these values into the same inequality template

f(n) <= c*g(n) for all n >= n0

we get: 2n <= 2*n for all n >= 1. This statement is the same as 2n <= 2n for all n >= 1. In contrast to the previous result, this is clearly true for all n. Therefore the definition holds, and we can conclude that f(n) = O(g(n)) from that analysis alone. What if we let c = 3? Then we'd have the statement 2n <= 3n for all n >= 1, which is clearly true as well. What about c = 125? Then we'd have 2n <= 125n for all n >= 1. This is also true. In fact, the inequality is true for all c >= 2 and all n0 >= 1: it is true for an infinite number of values of both c and n0. This was a lucky occurrence on this problem. In general, when doing Big-Oh analysis, it is not necessary to show that the definition is true for an infinite number of c's and n0's; we really need to demonstrate only one such pair to complete the proof. In this case, however, it is certainly true for many such values of c and n0. What if we let c be some value less than 2, say c = 1.5? Now the definition does not hold. The inequality 2n <= 1.5n is not true for any value of n. Thus, c = 1.5 won't work at all, no matter what n0 value we choose.
Example 3:
Let f(n) = 2n and g(n) = n (exactly as in Example 2). Show that g(n) = O(f(n)).
Now we need to find constants n0 and c such that g(n) <= c*f(n) for all n >= n0. Note that even though the values of f(n) and g(n) have not changed, f(n) is now the bounding function, while g(n) is the function being bounded. Thus, the two functions have swapped positions in the definition relative to the previous problem. It is very important that this orientation be correct in the inequality template when working these problems.
Again, finding correct constants is easy for this problem. Let's examine c=1 and n0=1 and see if they work. If we plug these into the general inequality specified by the definition: g(n) <= c*f(n) for all n >= n0, we get: n <= 1*(2n) for all n >= 1, which is the same as: n <= 2n for all n >= 1. This statement is clearly true for all n values. (Example: let n=5. Then the statement says that 5 <= 10, which is true.) Since the inequality is true with the specified constants, we can conclude that g(n) = O(f(n)).
Example 4:
Let f(n) = n and g(n) = n^2. Show that f(n) = O(g(n)).
Finding constants c and n0 that will work here is also a simple matter. Let's let c=1 and n0=1 and see what happens. Since g(n) is inside the Big-Oh, the inequality template for this problem is: f(n) <= c*g(n) for all n >= n0. When we plug in the values f(n)=n, c=1, g(n)=n^2, and n0=1, we get: n <= 1*n^2 for all n >= 1. This statement is the same as n <= n^2 for all n >= 1. Since n is always positive, this is clearly a true statement, and we can conclude that f(n) = O(g(n)).
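As a quick sanity check of this argument, a short loop can confirm that n <= n^2 holds at every sampled positive n; this is a heuristic check under the assumption that spot-checking a spread of values is convincing, not a substitute for the algebraic reasoning above.

```python
# Spot-check of Example 4: f(n) = n, g(n) = n^2, with c = 1 and n0 = 1.
# Sampling values cannot prove the inequality, but a single failure
# would disprove this (c, n0) pair -- and none occurs.
c, n0 = 1, 1
for n in range(n0, 100_000):
    assert n <= c * n**2, f"inequality fails at n = {n}"
print("n <= n^2 held for every sampled n >= 1")
```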
Example 5:
Let f(n) = 25n and g(n) = n^2. Show that f(n) = O(g(n)).
Here the growth rate of g(n) is clearly higher than the growth rate of f(n), so we expect to be able to prove this. However, the presence of the multiplicative constant in f(n) creates a problem. Because of it, our previous constants of c=1 and n0=1 will no longer work. Let's quickly verify this. The template for this problem is again: f(n) <= c*g(n) for all n >= n0. But when we plug in the values f(n)=25n, c=1, g(n)=n^2, and n0=1, we get: 25n <= 1*n^2 for all n >= 1.
2010 Charles O. Shields, Jr.
This statement is not true for any n < 25. For example, let n=10. When we plug n=10 into that last statement, we get: 25*10 = 250 <= 10^2 = 100. In other words, the statement says that 250 <= 100, which is clearly false mathematically. As mentioned in an earlier discussion, the fact that a given pair of constants c and n0 doesn't work does not necessarily mean that we can't make the proof. There may be another pair of constants that would work. In this case, we can observe that the inequality above fails only for values of n < 25, while it is correct for all values of n >= 25. We could thus set n0, the lowest value of n that we are willing to consider, to 25.

So let's try the constants c=1 and n0=25, and see what happens. When we plug these into the template, we get the statement: 25n <= n^2 for all n >= 25. This statement is now true. For example, if n=25, the lowest value of n now allowed, then the statement says that: 25*25 = 625 <= 25^2 = 625, or 625 <= 625, which is true. Say n=50. Then the statement says that 25*50 = 1,250 <= 50^2 = 2,500, which is also true. Clearly, the statement is true for all values of n greater than or equal to 25 (though it is not true for any n < 25). Thus, the values c=1 and n0=25 satisfy the definition, and we can conclude from them that f(n) = O(g(n)). Setting the lower limit of the n values we were willing to consider to 25 worked nicely.

Are there other constant pairs that would work? How about setting c=25? If we did, what value for n0 would work? If we let c=25 and n0=1 and plug them into the template, we get the statement: 25n <= 25*n^2 for all n >= 1. This statement is the same as 25n <= 25n^2 for all n >= 1, which is clearly true as well. Thus, the constant pair c=25 and n0=1 also satisfies the definition. Notice that changing the c value allowed us to use a different n0 value; the two constants must work together to achieve a correct statement. What if we let c=5? Is there a value of n0 that would work?
If c=5, then we have the inequality: 25n <= 5*n^2 for all n >= n0. We haven't figured out n0 yet, but we can do that with a little math. Let's solve the inequality 25n <= 5n^2 for n and see when it is true.
If we divide both sides by 5n (acceptable, since n is positive), we get the inequality 5 <= n. This result says that the original inequality is true for all n >= 5. This small calculation gives us a clue for yet another constant pair. Apparently, if we set the lower bound on n values to 5, the inequality will be true. So let's set c=5 and n0=5 and see if that pair works. If we plug those values into the template, we get: 25n <= 5*n^2 for all n >= 5. Mathematically, this statement is the same as the statement 5 <= n for all n >= 5, which is trivially true. Nevertheless, let's take a few examples and see how it works. If n=5, then this statement says that 25*5 = 125 <= 5*5^2 = 5*25 = 125, or 125 <= 125, which is clearly true. If n=10, then this statement says that 25*10 = 250 <= 5*10^2 = 5*100 = 500, or 250 <= 500, which is also true. In fact, this statement is true for all n >= 5. Thus, we have found another constant pair, c=5 and n0=5, that satisfies the definition and leads to a successful proof.

This example illustrates an important point: the constants c and n0 have to work together. If we change the value of one, there is often a different value of the other that will lead to a successful proof. Just remember that the overall objective in Big-Oh analysis is to find a single pair that makes the inequality true.
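The division step above generalizes: for f(n) = 25n and g(n) = n^2, dividing 25n <= c*n^2 by c*n gives n >= 25/c, so the smallest workable integer n0 for a given positive c is ceil(25/c). The sketch below (the helper name `smallest_n0` is my own, for illustration) computes that boundary for the three c values tried in the text and verifies that the inequality turns true exactly there.

```python
import math

def smallest_n0(c):
    """For f(n)=25n and g(n)=n^2: 25n <= c*n^2 iff n >= 25/c,
    so the smallest admissible integer n0 is ceil(25/c)."""
    return math.ceil(25 / c)

for c in (1, 5, 25):
    n0 = smallest_n0(c)
    # Verify the boundary: the inequality holds at n0 but not just below it.
    assert 25 * n0 <= c * n0**2
    if n0 > 1:
        assert not (25 * (n0 - 1) <= c * (n0 - 1)**2)
    print(f"c={c}: smallest n0 = {n0}")
```

This reproduces the three pairs found above: (c=1, n0=25), (c=5, n0=5), and (c=25, n0=1).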
Example 6:
Let f(n) = n and g(n) = n^2 (as in Example 4). Is g(n) = O(f(n))?
The answer to this question is no, g(n) is not O(f(n)). We know this because both f(n) and g(n) are polynomial functions, the exponent in f(n) is 1, and the exponent in g(n) is 2. Since g(n)'s exponent is larger, it has the higher growth rate, and there is no way that f(n), with a lower growth rate, could bound the growth rate of g(n) from above. (f(n) could be a lower bound for g(n), but that is a different story.) Nevertheless, as a learning experience, let's attempt to prove that g(n) = O(f(n)). We'll go through the process and see where the proof fails. Since we want to show that g(n) = O(f(n)), the inequality template for this problem is: g(n) <= c*f(n) for all n >= n0. We begin by plugging in the values for g(n) and f(n) and seeing what we have. When we plug in g(n)=n^2 and f(n)=n, we get the statement: n^2 <= c*n for all n >= n0. If we are going to complete this proof, we still have to find values for c and n0 that make this statement true. However, there is already a significant problem: there is no way this inequality can be true for large values of n, no matter what values are chosen for c and n0.
Why is that? Well, let's examine the conditions under which this inequality, n^2 <= c*n, is true. If we divide both sides by n, we end up with the statement n <= c. In other words, the inequality is true only when n is less than or equal to the constant c. This is a serious problem. It means that the inequality can be true only for values of n below some fixed constant, i.e., the lower values of n. Therefore, the inequality can never be true for large values of n extending to infinity. That's why we didn't even need to get to the step of giving specific values to c and n0. It really doesn't make any difference what c is: the inequality will be true only in the lower portion of the graph (in terms of n values), never in the upper portion in which we are truly interested. Since the inequality cannot be true for large values of n extending to infinity, there is no n0 value for which it holds for all n >= n0. Therefore, the proof fails, and we cannot conclude that g(n) = O(f(n)).
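The failure can be made concrete: since n^2 <= c*n is equivalent to n <= c, the candidate bound breaks at n = c + 1 for any integer c, so no choice of n0 can rescue it. The sketch below (the helper `first_failure` is illustrative, not from the text) searches for the first counterexample for a few values of c.

```python
def first_failure(c, limit=10**7):
    """Return the smallest n >= 1 with n**2 > c*n, i.e. the first point
    where the candidate bound n^2 <= c*n breaks. Because n^2 <= c*n is
    equivalent to n <= c, this is n = c + 1 for any integer c >= 1."""
    for n in range(1, limit):
        if n**2 > c * n:
            return n
    return None  # not reached for any reasonable c

for c in (1, 10, 1000):
    n = first_failure(c)
    print(f"c={c}: n^2 <= c*n first fails at n = {n}")
```

No matter how large c is pushed, a failing n always exists just past it, which is exactly why the proof in this example cannot be completed.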