
CSCE750: Analysis of Algorithms

Lecture 6
Max Alekseyev


University of South Carolina

September 10, 2013


Outline

Divide-and-Conquer
Fast Integer Multiplication
Fast Matrix Multiplication


Fast Integer Multiplication


Let b, c ≥ 0 be integers, represented in binary, with n bits each. Let us assume that n is large, so that b and c cannot be added, subtracted, or multiplied in constant time. We imagine that b and c are both represented as arrays of n bits: b = b_{n-1} ⋯ b_0 and c = c_{n-1} ⋯ c_0, where the b_i and c_i are individual bits (leading 0s are allowed). Thus,

b = b_0 2^0 + b_1 2^1 + ⋯ + b_{n-1} 2^{n-1} = Σ_{i=0}^{n-1} b_i 2^i,
c = c_0 2^0 + c_1 2^1 + ⋯ + c_{n-1} 2^{n-1} = Σ_{i=0}^{n-1} c_i 2^i.
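For concreteness, here is a minimal Python sketch (our illustration, not part of the slides) of this little-endian bit-array representation; the helper names bits_to_int and int_to_bits are ours:

def bits_to_int(bits):
    # bits[i] is b_i, the coefficient of 2^i (least significant bit first)
    return sum(bit << i for i, bit in enumerate(bits))

def int_to_bits(x, n):
    # n-bit little-endian representation; high-order zeros are allowed
    return [(x >> i) & 1 for i in range(n)]

For example, int_to_bits(6, 4) gives [0, 1, 1, 0], and bits_to_int([0, 1, 1, 0]) returns 6.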


Addition

The usual sequential binary add-with-carry algorithm that we all learned in school takes Θ(n) time, since we spend a constant amount of time on each column, from right to left. The sum is representable by at most n + 1 bits. Q: Can we do better?

A: No. This algorithm is clearly asymptotically optimal, since to produce the correct sum we must at least examine each bit of b and of c.
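To make the column-by-column process concrete, here is a minimal Python sketch (our illustration, not part of the slides) of add-with-carry on the little-endian bit arrays defined earlier:

def add_bits(b, c):
    # b, c: lists of n bits, least significant bit first
    n = len(b)
    result = []
    carry = 0
    for i in range(n):            # constant work per column, Θ(n) total
        s = b[i] + c[i] + carry
        result.append(s & 1)      # the sum bit for this column
        carry = s >> 1            # the carry into the next column
    result.append(carry)          # the sum needs at most n + 1 bits
    return result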


Subtraction

Similar to addition, the usual subtract-and-borrow algorithm takes Θ(n) time, which is clearly asymptotically optimal. The result can be represented by at most n bits.


Multiplication

If we multiply b and c using the naive grade-school algorithm, then it takes quadratic (i.e., Θ(n²)) time. Essentially, this algorithm is tantamount to expanding the product bc according to the expressions above:

bc = (Σ_{i=0}^{n-1} b_i 2^i)(Σ_{j=0}^{n-1} c_j 2^j) = Σ_{i,j} b_i c_j 2^{i+j},

then adding everything up term by term. There are n² terms. Q: Can we do better?
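As an illustration (ours, not from the slides), here is the grade-school method as shift-and-add on Python integers: each of the n bits of c contributes one shifted copy of b, and each of these n additions costs Θ(n) bit operations, giving Θ(n²) in total.

def grade_school_multiply(b, c):
    product = 0
    j = 0
    while c:
        if c & 1:                 # if bit j of c is set...
            product += b << j     # ...add b * 2^j (a Θ(n)-bit addition)
        c >>= 1
        j += 1
    return product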


Multiplying with Divide-and-Conquer

If n = 1, then the multiplication is trivial, so assume that n > 1. Let us further assume for simplicity that n is even. In fact, we can assume that n is a power of 2: if it is not, pad each number with leading 0s to the next power of 2; at worst this just doubles the input size.


Multiplying with Divide-and-Conquer

Let m = n/2. Split b and c up into their m least and m most significant bits. Let b_ℓ and b_h be the numbers given by the low m bits and the high m bits of b, respectively. Similarly, let c_ℓ and c_h be the low and high halves of c. Thus, 0 ≤ b_ℓ, b_h, c_ℓ, c_h < 2^m and

b = b_ℓ + b_h 2^m,  c = c_ℓ + c_h 2^m.


Multiplying with Divide-and-Conquer


We then have

bc = (b_ℓ + b_h 2^m)(c_ℓ + c_h 2^m) = b_ℓ c_ℓ + (b_ℓ c_h + b_h c_ℓ) 2^m + b_h c_h 2^n.

This suggests that we can compute bc with four recursive multiplications of pairs of m-bit numbers, namely b_ℓ c_ℓ, b_ℓ c_h, b_h c_ℓ, and b_h c_h, plus Θ(n) time spent doing other things: some additions and multiplications by powers of two (the latter amount to arithmetic shifts of the bits, which can be done in linear time). The running time of this divide-and-conquer multiplication algorithm thus satisfies the recurrence

T(n) = 4T(m) + Θ(n) = 4T(n/2) + Θ(n).

The Master Theorem (Case 1) then gives T(n) = Θ(n²), which is asymptotically no better than the naive algorithm.
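A minimal Python sketch of this four-product recursion (our illustration; Python integers stand in for the bit arrays, and n is assumed to be a power of 2):

def dc_multiply4(b, c, n):
    # multiply two n-bit nonnegative integers
    if n == 1:
        return b * c              # single-bit base case
    m = n // 2
    mask = (1 << m) - 1
    bl, bh = b & mask, b >> m     # low and high halves of b
    cl, ch = c & mask, c >> m     # low and high halves of c
    # four recursive multiplications of m-bit numbers
    ll = dc_multiply4(bl, cl, m)
    lh = dc_multiply4(bl, ch, m)
    hl = dc_multiply4(bh, cl, m)
    hh = dc_multiply4(bh, ch, m)
    # Θ(n) combining work: shifts and additions
    return ll + ((lh + hl) << m) + (hh << n)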

Better Approach
Another way to compute

bc = (b_ℓ + b_h 2^m)(c_ℓ + c_h 2^m) = b_ℓ c_ℓ + (b_ℓ c_h + b_h c_ℓ) 2^m + b_h c_h 2^n:

split b and c up into their low and high halves as above, but then recursively compute only three products:

x = b_ℓ c_ℓ,
y = b_h c_h,
z = (b_ℓ + b_h)(c_ℓ + c_h).

Now you should verify for yourself that

bc = x + (z - y - x) 2^m + y 2^n,

which the algorithm then computes.
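A minimal Karatsuba sketch in Python (our illustration; it recurses on value rather than on a fixed power-of-2 width, and Python integers stand in for the bit arrays):

def karatsuba(b, c):
    # multiply nonnegative integers b and c
    if b < 2 or c < 2:
        return b * c                     # single-bit base case
    n = max(b.bit_length(), c.bit_length())
    m = n // 2
    mask = (1 << m) - 1
    bl, bh = b & mask, b >> m
    cl, ch = c & mask, c >> m
    x = karatsuba(bl, cl)                # x = bl * cl
    y = karatsuba(bh, ch)                # y = bh * ch
    z = karatsuba(bl + bh, cl + ch)      # z = (bl + bh)(cl + ch)
    # bc = x + (z - y - x) * 2^m + y * 2^(2m)
    return x + ((z - y - x) << m) + (y << (2 * m))

As a sanity check, karatsuba(12345, 6789) equals 12345 * 6789.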

Running Time Analysis

How much time does this take? Besides the recursive calls, there is a linear time's worth of overhead: additions, subtractions, and arithmetic shifts. There are three recursive calls, computing x, y, and z. The numbers x and y are products of two m-bit integers each, and z is the product of (at most) two (m + 1)-bit integers. Thus the running time satisfies

T(n) = 2T(n/2) + T(n/2 + 1) + Θ(n).


Running Time Analysis


It can be shown that the +1 doesn't affect the result, so the recurrence is effectively

T(n) = 3T(n/2) + Θ(n),

which yields T(n) = Θ(n^{lg 3}) by the Master Theorem. Since lg 3 ≈ 1.585 < 2, the new approach runs significantly faster asymptotically.
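For completeness (our worked detail, not on the slide), this is Case 1 of the Master Theorem with a = 3, b = 2, f(n) = Θ(n): here n^{log_b a} = n^{lg 3} ≈ n^{1.585}, and f(n) = Θ(n) = O(n^{lg 3 - ε}) for, say, ε = 0.5, so T(n) = Θ(n^{lg 3}).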

Remark
If you're really worried about the +1, you should verify that T(n) = Θ(n^{lg 3}) directly using the substitution method. Alternatively, you can modify the algorithm a bit so that only m-bit numbers are multiplied recursively while the overhead is still Θ(n).


A Bit of History

This approach dates back at least to Gauss, who discovered (using the same trick) that multiplying two complex numbers together can be done with only three real multiplications instead of the more naive four. The same idea has been applied to long-integer multiplication by Karatsuba and to matrix multiplication by Strassen.
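Concretely (our illustration of the trick), for real a, b, c, d:

(a + bi)(c + di) = (ac - bd) + (ad + bc)i.

Computing t1 = ac, t2 = bd, and t3 = (a + b)(c + d) gives ad + bc = t3 - t1 - t2, so the product is (t1 - t2) + (t3 - t1 - t2)i using three real multiplications instead of four.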


Matrix Multiplication

Given two n × n matrices A = (a_ij)_{i,j=1}^n and B = (b_ij)_{i,j=1}^n, their product is defined as follows:

A · B = (c_ij)_{i,j=1}^n, where c_ij = Σ_{k=1}^n a_ik b_kj.

Therefore, to compute the matrix product, we need to compute n² matrix entries. The naive approach takes n multiplications and n - 1 additions for each entry.



Naive Matrix Multiplication Pseudocode


Matrix-Multiply(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. for i = 1 to n
4.     for j = 1 to n
5.         c_ij = 0
6.         for k = 1 to n
7.             c_ij = c_ij + a_ik · b_kj
8. return C

Running time is Θ(n³).


Can we do better?
Is Θ(n³) the best we can do? Can we multiply matrices in o(n³) time?
It seems like any algorithm to multiply matrices must take Ω(n³) time:
  Must compute n² entries.
  Each entry is the sum of n terms.
But with Strassen's method, we can multiply matrices in o(n³) time:
  Strassen's algorithm runs in Θ(n^{lg 7}) time.
  2.80 < lg 7 < 2.81.
  Hence, it runs in O(n^{2.81}) time.


Divide-and-Conquer Multiplication Algorithm


For simplicity, assume that n is a power of 2. To compute the product of the matrices, we subdivide each matrix into four n/2 × n/2 submatrices, so that the equation C = A · B takes the form

[ C11 C12 ]   [ A11 A12 ]   [ B11 B12 ]
[ C21 C22 ] = [ A21 A22 ] · [ B21 B22 ].

This matrix equation corresponds to the following four equations on the submatrices:

C11 = A11 B11 + A12 B21,
C12 = A11 B12 + A12 B22,
C21 = A21 B11 + A22 B21,
C22 = A21 B12 + A22 B22.


Divide-and-Conquer Multiplication Pseudocode


Matrix-Multiply-Recursive(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. if n == 1
4.     c_11 = a_11 · b_11
5. else partition each of A, B, C into four n/2 × n/2 submatrices
6.     C11 = Matrix-Multiply-Recursive(A11, B11) + Matrix-Multiply-Recursive(A12, B21)
7.     C12 = Matrix-Multiply-Recursive(A11, B12) + Matrix-Multiply-Recursive(A12, B22)
8.     C21 = Matrix-Multiply-Recursive(A21, B11) + Matrix-Multiply-Recursive(A22, B21)
9.     C22 = Matrix-Multiply-Recursive(A21, B12) + Matrix-Multiply-Recursive(A22, B22)
10. return C


Divide-and-Conquer Multiplication Running Time


Using index calculations, we can execute Step 5 in Θ(1) time (in contrast to the Θ(n²) that would be required if we created the submatrices and copied their entries). However, that does not make a difference asymptotically. The running time T(n) of Matrix-Multiply-Recursive on n × n matrices satisfies the recurrence

T(n) = Θ(1) + 8T(n/2) + Θ(n²) = 8T(n/2) + Θ(n²), with T(1) = Θ(1).

By the Master Theorem, T(n) = Θ(n³), which is unfortunately no faster than the naive method Matrix-Multiply.



Divide-and-Conquer Multiplication Drawback


Each time we split the matrix sizes in half, but we do not actually reduce the total amount of work. Assume that naive matrix multiplication takes c·n³ time. Then computing each product of submatrices takes c·(n/2)³ = c·n³/8, and we need eight such products, resulting in a total time of 8 · c·n³/8 = c·n³ (plus overhead), which is no better than simply doing the multiplication the naive way.

In contrast, consider Merge-Sort, with the running-time recurrence T(n) = 2T(n/2) + Θ(n). Even if we did naive quadratic (that is, c·n²-time) sorting for each of the 2 subproblems, the total time would be 2 · c·(n/2)² = c·n²/2 (plus Θ(n) overhead), which is faster than naive sorting of the whole problem by a factor of 2. This tells us that divide-and-conquer sorting may be more efficient than naive sorting (and it is indeed, as the Master Theorem proves).

Strassen's Method

The idea behind Strassen's method is to reduce the number of multiplications at each recursive call from eight to seven. That makes the recursion tree slightly less bushy.


Strassen's Method

Strassen's method has four steps:
1. Divide the input matrices A and B into submatrices as before, using index calculations, in Θ(1) time.
2. Create ten n/2 × n/2 matrices S_1, S_2, ..., S_10, each equal to the sum or difference of two submatrices created in Step 1. This step takes Θ(n²) time.
3. Using the submatrices created in Steps 1 and 2, recursively compute seven products P_1, P_2, ..., P_7, each of size n/2 × n/2.
4. Compute the submatrices of C by adding or subtracting various combinations of the matrices P_i. This step takes Θ(n²) time.

The running time of Strassen's method satisfies the recurrence

T(n) = Θ(1)              if n = 1,
T(n) = 7T(n/2) + Θ(n²)   if n > 1.

By the Master Theorem, T(n) = Θ(n^{lg 7}).
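The following NumPy sketch (our illustration, using one common formulation of the seven products; the slides' S_i and P_i bookkeeping is equivalent) shows the structure of the method for n a power of 2:

import numpy as np

def strassen(A, B):
    # A, B: n x n NumPy arrays with n a power of 2 (pad with zeros otherwise)
    n = A.shape[0]
    if n == 1:
        return A * B
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    # seven recursive products (instead of eight)
    P1 = strassen(A11 + A22, B11 + B22)
    P2 = strassen(A21 + A22, B11)
    P3 = strassen(A11, B12 - B22)
    P4 = strassen(A22, B21 - B11)
    P5 = strassen(A11 + A12, B22)
    P6 = strassen(A21 - A11, B11 + B12)
    P7 = strassen(A12 - A22, B21 + B22)
    # combine with Θ(n^2) additions and subtractions
    C = np.empty_like(A)
    C[:m, :m] = P1 + P4 - P5 + P7
    C[:m, m:] = P3 + P5
    C[m:, :m] = P2 + P4
    C[m:, m:] = P1 - P2 + P3 + P6
    return C

As a sanity check, strassen(A, B) agrees with A @ B on random 4 × 4 integer matrices.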

Notes
Strassen's algorithm was the first to beat Θ(n³) time, but it is not the asymptotically fastest known. A method by Coppersmith and Winograd runs in O(n^{2.376}) time.

Practical issues with Strassen's algorithm:
  Higher constant factor than the obvious Θ(n³)-time method.
  Not good for sparse matrices.
  Not numerically stable: larger errors accumulate than in the naive method.
  Submatrices consume space, especially if copying.

Various researchers have tried to find the crossover point, where Strassen's algorithm runs faster than the naive Θ(n³)-time method. Theoretical analyses (that ignore caches and hardware pipelines) have produced crossover points as low as n = 8, and practical experiments have found crossover points as low as n = 400.
