CSCI 3320/8325 Data Structures
Module 2: First Algorithms for Analysis

The Maximum Subsequence Sum Problem

To emphasize algorithm analysis, we will consider the problem of finding the maximum sum of a contiguous subsequence of integers from a given input data sequence.

For example, if the input data sequence contains

    -2 11 -4 13 -5 -2

the answer is 20 (11 - 4 + 13 = 20).


For convenience, we define the maximum
subsequence sum as 0 if all the integers are
negative.

The Solutions

We will consider four different algorithms to solve this problem. Each algorithm will be explained, and the C++ code will be examined. We will also compute the worst-case running time for each of the algorithms, and observe how relatively simple algorithms can yield enormous improvements in running time.

Algorithm 1: Exhaustive Trials

The most obvious way to solve this problem is to compute the sum of each possible subsequence, and retain the largest sum as the result.

Each possible subsequence is identified by the subscripts of the starting and ending elements in the subsequence (assuming the entire sequence has been stored in an array). Thus a pair of nested loops will be used to generate all possible pairs of starting and ending subscripts.

For each subsequence, we will also need a loop to compute the sum of the elements in the candidate subsequence. Note that we are not concerned with identifying the subsequence itself in this problem, only with obtaining its sum.

Algorithm 1: C++ Code

int maxSubSum1(const vector<int> &a)
{
    int maxSum = 0;                           // 1
    for (int i = 0; i < a.size(); i++)        // 2
        for (int j = i; j < a.size(); j++) {  // 3
            int thisSum = 0;                  // 4
            for (int k = i; k <= j; k++)      // 5
                thisSum += a[k];              // 6
            if (thisSum > maxSum)             // 7
                maxSum = thisSum;             // 8
        }
    return maxSum;                            // 9
}

Algorithm 1: Analysis

The analysis of this algorithm is quite simple. It is easy to see that the maximum number of iterations of each loop (the worst case) is governed by the size of the sequence, N. There is just one statement (6) repeated inside the three nested loops (2, 3, and 5), and it has a constant, or O(1), running time. Thus, the worst-case running time for this algorithm is O(N³). While a more precise analysis can be done, it yields an expression that is still dominated by N³, and thus the worst-case running time is not affected.

Algorithm 2: Eliminating a Loop

A simple observation allows us to eliminate one of the nested loops. The sum computed by the loop on lines 5 and 6 includes just one more element than the previous calculation (for the same value of i). Thus we can eliminate the innermost loop and rewrite the code with only two nested loops. The running time of this improved algorithm is clearly O(N²).

Algorithm 2: C++ Code

int maxSubSum2(const vector<int> &a)
{
    int maxSum = 0;
    for (int i = 0; i < a.size(); i++) {
        int thisSum = 0;
        for (int j = i; j < a.size(); j++) {
            thisSum += a[j];
            if (thisSum > maxSum)
                maxSum = thisSum;
        }
    }
    return maxSum;
}

Algorithm 3: A Recursive Solution

The idea in this algorithm is to split the problem into two pieces, each approximately the same size, and solve them independently. Recall that dividing the size of a problem (usually by 2) typically introduces a logarithmic factor in the worst-case running time; that is the case with this solution.

If we divide the sequence, then compute the maximum subsequence sum of each part, the result is, with one exception, just the maximum of the two resulting sums. The exception occurs when the maximum subsequence crosses the middle, and has elements in each of the two parts.

Algorithm 3: The Three Cases

Consider this input:

    Left Half     Right Half
    4 -3 5 -2     -1 2 6 -2

The maximum of the left half is 6 (4 - 3 + 5), and the maximum of the right half is 8 (2 + 6). The maximum sum in the left half that includes its rightmost element is 4 (4 - 3 + 5 - 2); the maximum sum in the right half that includes its leftmost element is 7 (-1 + 2 + 6). The maximum sum that crosses the middle is thus just 11 (4 + 7).

Algorithm 3: Putting It All Together

The solution has a recursive function that receives the entire array along with the subscripts of the left and right border elements. It checks first for the base case (just one element). If not the base case, the two parts of the sequence are checked recursively to obtain their maximum sums. The maximum sums of the left and right subsequences that include the border elements are then added to obtain the maximum sum that crosses the middle. Finally, the maximum of these three sums is returned as the result.

Algorithm 3: C++ Code (Part 1)

int maxSumRec(const vector<int> &a, int l, int r)
{
    if (l == r)                           // base case: only one element
        if (a[l] > 0) return a[l]; else return 0;
    int c = (l + r) / 2;                  // approximate center
    int maxlsum = maxSumRec(a, l, c);     // solve left part
    int maxrsum = maxSumRec(a, c + 1, r); // solve right part
    int lbsum = 0, maxlbsum = 0;          // left border sum
    for (int i = c; i >= l; i--) {
        lbsum += a[i];
        if (lbsum > maxlbsum) maxlbsum = lbsum;
    }

Algorithm 3: C++ Code (Part 2)

    int rbsum = 0, maxrbsum = 0;          // right border sum
    for (int i = c + 1; i <= r; i++) {
        rbsum += a[i];
        if (rbsum > maxrbsum) maxrbsum = rbsum;
    }
    return max3(maxlsum, maxrsum, maxlbsum + maxrbsum);
}

int maxSubSum3(const vector<int> &a)
{
    return maxSumRec(a, 0, a.size() - 1);
}

Algorithm 3: Analysis

If we consider only the base case for the recursion (a single element), it is clear that T(1) = O(1), since only one of two return statements is executed.

If more than one element is in the subsequence, then we recursively invoke the function with the left and right halves of the original vector, each taking time T(N/2), then compute the sum of (potentially all) the elements in the vector, taking time O(N). Thus the total worst-case running time for the recursive cases is

    T(N) = 2 T(N/2) + O(N).

Algorithm 3: Analysis Conclusion

We won't bother with formally solving these equations now, but will revisit them later in the course. At this point, however, we simply note that the final result is

    T(N) = O(N log N)

Again, recall that when an algorithm works by dividing the work to be done by a constant factor in each iteration (or invocation), the running time will likely include a factor that is logarithmic in the problem size (N).

Algorithm 4: Further Improvements

Our final algorithm for the maximum subsequence sum problem is not only the simplest, but also the most efficient (in terms of running-time growth). Eliminating the need for the i loop can be understood if we make the following observations:

    If a[i] is negative, then any sequence that begins with it can be improved by starting with the next element.
    Any subsequence that begins with a negative subsequence can be improved by eliminating that negative subsequence.

Algorithm 4: A Single Loop

To eliminate the i loop, we just keep track of the subscript of the last element of the subsequence being examined. As soon as the sum of a subsequence becomes negative, we just set the sum back to zero, essentially eliminating it from the current subsequence's sum.

Algorithm 4: C++ Code

int maxSubSum4(const vector<int> &a)
{
    int maxSum = 0, thisSum = 0;
    for (int j = 0; j < a.size(); j++) {
        thisSum += a[j];
        if (thisSum > maxSum)
            maxSum = thisSum;
        else if (thisSum < 0)   // eliminate negative prefix
            thisSum = 0;
    }
    return maxSum;
}

Algorithm 4: Analysis

The running time for algorithm 4 should now be easy for you to determine. The body of the for loop has running time O(1), since it contains
    one assignment statement, with an addition, clearly requiring O(1) time, and
    one if statement, with the then and else parts containing one assignment statement each, with running time O(1).
Since the number of times the body of the for loop is executed is equal to the problem size, we clearly have

    T(N) = O(N)

Algorithm 4: An On-line Algorithm

This last algorithm has the property of being an on-line algorithm. This means that the algorithm
    requires only constant space (since only three integers, maxSum, thisSum, and the current value from the a vector, are needed at any time), and
    can instantly provide an answer to the problem for the data it has already processed.

Binary Search

Given an integer X and integers A[0], A[1], ..., A[N-1], which are presorted and already in memory, find i such that A[i] = X, or return i = -1 if X is not in the input.

The most obvious solution is a linear search, examining A[0], then A[1], and so forth. It should be easy to see that this solution has T(N) = O(N). The linear search does not use the fact that the data is presorted.

The binary search does better by examining the middle element, which either is the desired value X or identifies which of the remaining parts (left or right of the middle element) should be examined further.

Binary Search: C++ Code

int binarySearch(const vector<int> &a, const int x)
{
    int low = 0, high = a.size() - 1;
    while (low <= high) {
        int mid = (low + high) / 2;   // middle subscript
        if (a[mid] < x)
            low = mid + 1;
        else if (x < a[mid])
            high = mid - 1;
        else
            return mid;               // found
    }
    return -1;                        // not found
}

Binary Search: Analysis

The work done inside the while loop clearly takes O(1) time, so the total running time depends on the number of iterations of the while loop. Without loss of generality, we can consider the number of elements in A to be equal to 2^k. Each iteration reduces the number of elements remaining to be considered by half, so clearly the number of iterations required is k, where 2^(k-1) ≤ N ≤ 2^k. Thus T(N) = O(log N).

Euclid's Algorithm

Euclid's algorithm is used to compute the greatest common divisor (gcd) of two integers a and b (that is, the largest integer that divides each of a and b). The algorithm works by repeatedly computing the remainder of a / b, replacing a by b, and b by the remainder, until the remainder is zero. The last nonzero remainder is the answer.

For example, computation of the gcd of 137,912 and 151,360 yields the following sequence of remainders:

    137,912  13,448  3,432  3,152  280  72  64  8  0

Thus the greatest common divisor is 8.

Euclid's Algorithm: C++ Code

long gcd(long a, long b)
{
    while (b != 0) {       // #1
        long rem = a % b;  // #2
        a = b;             // #3
        b = rem;           // #4
    }
    return a;              // #5
}

Euclid's Algorithm: Analysis

Each execution of the body of the loop started with statement 1 (that is, statements 2, 3, and 4) takes constant time, so the running time of the algorithm depends only on the number of iterations of the loop, which depends on the length of the sequence of nonzero remainders.

If we could show that each iteration of the loop decreased the value of the remainder by at least a constant factor, then we could predict a logarithmic running time. But this is not the case (refer back to the computation of the gcd of 137,912 and 151,360). We can, however, show that after two iterations the remainder is at most half its original value (the proof appears on the next slide). Thus, since 2 log N = O(log N), we have established the logarithmic running time of the algorithm.

Remainder Reduction Rate Analysis

To show that the remainder decreases by at least half with every pair of iterations, it is sufficient to prove that if a > b, then a mod b < a / 2. (Recall that the remainder of a / b, or a mod b, will always be less than b.)

There are two cases to consider:
    If b ≤ a / 2, the remainder is clearly less than a / 2, since the remainder is always less than b.
    If b > a / 2, then a / b = 1 with a remainder of a - b, which must be less than a / 2.

Thus, even if a < b, at most two iterations of the loop will be required to obtain a remainder that is at most half as large as a.

Exponentiation

Computation of a^b (where a and b are both integers) can be done using the obvious technique involving b - 1 multiplications. We can do better than this if we observe that if z = a^k, then z * z = a^(2k). Thus, if b is even, a^b = (a^2)^(b/2), eliminating almost half of the multiplications required by the obvious technique.

Exponentiation: C++ Code

long pow(long a, long b)
{
    if (b == 0) return 1;              // #1
    if (b == 1) return a;              // #2
    if (isEven(b))                     // #3
        return pow(a * a, b / 2);      // #4
    else
        return pow(a * a, b / 2) * a;  // #5
}

Exponentiation: Analysis

The base cases for the recursive function (statements 1 and 2) clearly take O(1) time. Each recursive invocation of pow reduces the size of the exponent by a constant factor (2), so the running time is logarithmic. If we count the number of multiplications, it's easy to see that at most two multiplications are involved for each invocation of the function, so the maximum number of multiplications is 2 log2 b.
