
CSCE750: Analysis of Algorithms

Lecture 6
Max Alekseyev


University of South Carolina

September 10, 2013


Outline

Divide-and-Conquer
Fast Integer Multiplication
Fast Matrix Multiplication


Fast Integer Multiplication


Let b, c ≥ 0 be integers, represented in binary, with n bits each. Let us assume that n is large, so that b and c cannot be added, subtracted, or multiplied in constant time. We imagine that b and c are both represented as arrays of n bits: b = b_{n-1} ⋯ b_0 and c = c_{n-1} ⋯ c_0, where the b_i and c_i are individual bits (leading 0s are allowed). Thus,

b = b_0 2^0 + b_1 2^1 + ⋯ + b_{n-1} 2^{n-1} = Σ_{i=0}^{n-1} b_i 2^i,
c = c_0 2^0 + c_1 2^1 + ⋯ + c_{n-1} 2^{n-1} = Σ_{i=0}^{n-1} c_i 2^i.
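For concreteness, here is a minimal Python sketch (our illustration, not part of the slides) of this little-endian bit-array representation; the helper names bits_to_int and int_to_bits are ours:

def bits_to_int(bits):
    # bits[i] is b_i, the coefficient of 2^i (least significant bit first)
    return sum(bit << i for i, bit in enumerate(bits))

def int_to_bits(x, n):
    # n-bit little-endian representation; high-order zeros are allowed
    return [(x >> i) & 1 for i in range(n)]

For example, int_to_bits(6, 4) gives [0, 1, 1, 0], and bits_to_int([0, 1, 1, 0]) returns 6.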


Addition

The usual sequential binary add-with-carry algorithm that we all learned in school takes Θ(n) time, since we spend a constant amount of time on each column, from right to left. The sum is representable by at most n + 1 bits. Q: Can we do better?

A: No. This algorithm is clearly asymptotically optimal, since to produce the correct sum we must at least examine each bit of b and of c.
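To make the column-by-column process concrete, here is a minimal Python sketch (our illustration, not part of the slides) of add-with-carry on the little-endian bit arrays defined earlier:

def add_bits(b, c):
    # b, c: lists of n bits, least significant bit first
    n = len(b)
    result = []
    carry = 0
    for i in range(n):            # constant work per column, Θ(n) total
        s = b[i] + c[i] + carry
        result.append(s & 1)      # the sum bit for this column
        carry = s >> 1            # the carry into the next column
    result.append(carry)          # the sum needs at most n + 1 bits
    return result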


Subtraction

Similar to addition, the usual subtract-and-borrow algorithm takes Θ(n) time, which is clearly asymptotically optimal. The result can be represented by at most n bits.


Multiplication

If we multiply b and c using the naive grade-school algorithm, then it takes quadratic (i.e., Θ(n²)) time. Essentially, this algorithm is tantamount to expanding the product bc according to the expressions above:

bc = (Σ_{i=0}^{n-1} b_i 2^i)(Σ_{j=0}^{n-1} c_j 2^j) = Σ_{i,j} b_i c_j 2^{i+j},

then adding everything up term by term. There are n² terms. Q: Can we do better?
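As an illustration (ours, not from the slides), here is the grade-school method as shift-and-add on Python integers: each of the n bits of c contributes one shifted copy of b, and each of these n additions costs Θ(n) bit operations, giving Θ(n²) in total.

def grade_school_multiply(b, c):
    product = 0
    j = 0
    while c:
        if c & 1:                 # if bit j of c is set...
            product += b << j     # ...add b * 2^j (a Θ(n)-bit addition)
        c >>= 1
        j += 1
    return product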


Multiplying with Divide-and-Conquer

If n = 1, then the multiplication is trivial, so assume that n > 1. Let us further assume for simplicity that n is even. In fact, we can assume that n is a power of 2: if it is not, pad each number with leading 0s to the next power of 2; at worst this just doubles the input size.


Multiplying with Divide-and-Conquer

Let m = n/2. Split b and c up into their m least and m most significant bits. Let b_ℓ and b_h be the numbers given by the low m bits and the high m bits of b, respectively. Similarly, let c_ℓ and c_h be the low and high halves of c. Thus, 0 ≤ b_ℓ, b_h, c_ℓ, c_h < 2^m and

b = b_ℓ + b_h 2^m,  c = c_ℓ + c_h 2^m.


Multiplying with Divide-and-Conquer


We then have

bc = (b_ℓ + b_h 2^m)(c_ℓ + c_h 2^m) = b_ℓ c_ℓ + (b_ℓ c_h + b_h c_ℓ) 2^m + b_h c_h 2^n.

This suggests that we can compute bc with four recursive multiplications of pairs of m-bit numbers, namely b_ℓ c_ℓ, b_ℓ c_h, b_h c_ℓ, and b_h c_h, plus Θ(n) time spent doing other things: some additions and multiplications by powers of two (the latter amount to arithmetic shifts of the bits, which can be done in linear time). The running time of this divide-and-conquer multiplication algorithm thus satisfies the recurrence

T(n) = 4T(m) + Θ(n) = 4T(n/2) + Θ(n).

The Master Theorem (Case 1) then gives T(n) = Θ(n²), which is asymptotically no better than the naive algorithm.
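A minimal Python sketch of this four-product recursion (our illustration; Python integers stand in for the bit arrays, and n is assumed to be a power of 2):

def dc_multiply4(b, c, n):
    # multiply two n-bit nonnegative integers
    if n == 1:
        return b * c              # single-bit base case
    m = n // 2
    mask = (1 << m) - 1
    bl, bh = b & mask, b >> m     # low and high halves of b
    cl, ch = c & mask, c >> m     # low and high halves of c
    # four recursive multiplications of m-bit numbers
    ll = dc_multiply4(bl, cl, m)
    lh = dc_multiply4(bl, ch, m)
    hl = dc_multiply4(bh, cl, m)
    hh = dc_multiply4(bh, ch, m)
    # Θ(n) combining work: shifts and additions
    return ll + ((lh + hl) << m) + (hh << n)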

Better Approach
Another way to compute

bc = (b_ℓ + b_h 2^m)(c_ℓ + c_h 2^m) = b_ℓ c_ℓ + (b_ℓ c_h + b_h c_ℓ) 2^m + b_h c_h 2^n:

split b and c up into their low and high halves as above, but then recursively compute only three products:

x = b_ℓ c_ℓ,
y = b_h c_h,
z = (b_ℓ + b_h)(c_ℓ + c_h).

Now you should verify for yourself that

bc = x + (z - y - x) 2^m + y 2^n,

which the algorithm then computes.
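A minimal Karatsuba sketch in Python (our illustration; it recurses on value rather than on a fixed power-of-2 width, and Python integers stand in for the bit arrays):

def karatsuba(b, c):
    # multiply nonnegative integers b and c
    if b < 2 or c < 2:
        return b * c                     # single-bit base case
    n = max(b.bit_length(), c.bit_length())
    m = n // 2
    mask = (1 << m) - 1
    bl, bh = b & mask, b >> m
    cl, ch = c & mask, c >> m
    x = karatsuba(bl, cl)                # x = bl * cl
    y = karatsuba(bh, ch)                # y = bh * ch
    z = karatsuba(bl + bh, cl + ch)      # z = (bl + bh)(cl + ch)
    # bc = x + (z - y - x) * 2^m + y * 2^(2m)
    return x + ((z - y - x) << m) + (y << (2 * m))

As a sanity check, karatsuba(12345, 6789) equals 12345 * 6789.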

Running Time Analysis

How much time does this take? Besides the recursive calls, there is a linear time's worth of overhead: additions, subtractions, and arithmetic shifts. There are three recursive calls, computing x, y, and z. The numbers x and y are products of two m-bit integers each, and z is the product of (at most) two (m + 1)-bit integers. Thus the running time satisfies

T(n) = 2T(n/2) + T(n/2 + 1) + Θ(n).


Running Time Analysis


It can be shown that the +1 doesn't affect the result, so the recurrence is effectively

T(n) = 3T(n/2) + Θ(n),

which yields T(n) = Θ(n^{lg 3}) by the Master Theorem. Since lg 3 ≈ 1.585 < 2, the new approach runs significantly faster asymptotically.
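For completeness (our worked detail, not on the slide), this is Case 1 of the Master Theorem with a = 3, b = 2, f(n) = Θ(n): here n^{log_b a} = n^{lg 3} ≈ n^{1.585}, and f(n) = Θ(n) = O(n^{lg 3 - ε}) for, say, ε = 0.5, so T(n) = Θ(n^{lg 3}).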

Remark
If you're really worried about the +1, you should verify that T(n) = Θ(n^{lg 3}) directly using the substitution method. Alternatively, you can modify the algorithm a bit so that only m-bit numbers are multiplied recursively while the overhead is still Θ(n).


A Bit of History

This approach dates back at least to Gauss, who discovered (using the same trick) that multiplying two complex numbers together can be done with only three real multiplications instead of the more naive four. The same idea has been applied to long-integer multiplication by Karatsuba and to matrix multiplication by Strassen.
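Concretely (our illustration of the trick), for real a, b, c, d:

(a + bi)(c + di) = (ac - bd) + (ad + bc)i.

Computing t1 = ac, t2 = bd, and t3 = (a + b)(c + d) gives ad + bc = t3 - t1 - t2, so the product is (t1 - t2) + (t3 - t1 - t2)i using three real multiplications instead of four.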


Matrix Multiplication

Given two n × n matrices A = (a_ij)_{i,j=1}^n and B = (b_ij)_{i,j=1}^n, their product is defined as follows:

A · B = (c_ij)_{i,j=1}^n, where c_ij = Σ_{k=1}^n a_ik b_kj.

Therefore, to compute the matrix product, we need to compute n² matrix entries. The naive approach takes n multiplications and n - 1 additions for each entry.



Naive Matrix Multiplication Pseudocode


Matrix-Multiply(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. for i = 1 to n
4.     for j = 1 to n
5.         c_ij = 0
6.         for k = 1 to n
7.             c_ij = c_ij + a_ik · b_kj
8. return C

Running time is Θ(n³).


Can we do better?
Is Θ(n³) the best we can do? Can we multiply matrices in o(n³) time?
It seems like any algorithm to multiply matrices must take Ω(n³) time:
  Must compute n² entries.
  Each entry is the sum of n terms.
But with Strassen's method, we can multiply matrices in o(n³) time:
  Strassen's algorithm runs in Θ(n^{lg 7}) time.
  2.80 < lg 7 < 2.81.
  Hence, it runs in O(n^{2.81}) time.


Divide-and-Conquer Multiplication Algorithm


For simplicity, assume that n is a power of 2. To compute the product of the matrices, we subdivide each matrix into four n/2 × n/2 submatrices, so that the equation C = A · B takes the form

[ C11 C12 ]   [ A11 A12 ]   [ B11 B12 ]
[ C21 C22 ] = [ A21 A22 ] · [ B21 B22 ].

This matrix equation corresponds to the following four equations on the submatrices:

C11 = A11 B11 + A12 B21,
C12 = A11 B12 + A12 B22,
C21 = A21 B11 + A22 B21,
C22 = A21 B12 + A22 B22.


Divide-and-Conquer Multiplication Pseudocode


Matrix-Multiply-Recursive(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. if n == 1
4.     c_11 = a_11 · b_11
5. else partition each of A, B, C into four n/2 × n/2 submatrices
6.     C11 = Matrix-Multiply-Recursive(A11, B11) + Matrix-Multiply-Recursive(A12, B21)
7.     C12 = Matrix-Multiply-Recursive(A11, B12) + Matrix-Multiply-Recursive(A12, B22)
8.     C21 = Matrix-Multiply-Recursive(A21, B11) + Matrix-Multiply-Recursive(A22, B21)
9.     C22 = Matrix-Multiply-Recursive(A21, B12) + Matrix-Multiply-Recursive(A22, B22)
10. return C


Divide-and-Conquer Multiplication Running Time


Using index calculations, we can execute Step 5 in Θ(1) time (in contrast to the Θ(n²) that would be required if we created the submatrices and copied their entries). However, that does not make a difference asymptotically. The running time T(n) of Matrix-Multiply-Recursive on n × n matrices satisfies the recurrence

T(n) = Θ(1) + 8T(n/2) + Θ(n²) = 8T(n/2) + Θ(n²), with T(1) = Θ(1).

By the Master Theorem, T(n) = Θ(n³), which is unfortunately no faster than the naive method Matrix-Multiply.



Divide-and-Conquer Multiplication Drawback


Each time we split the matrix sizes in half, but we do not actually reduce the total amount of work. Assume that naive matrix multiplication takes c·n³ time. Then computing each product of submatrices takes c·(n/2)³ = c·n³/8, and we need eight such products, resulting in a total time of 8 · c·n³/8 = c·n³ (plus overhead), which is no better than simply doing the multiplication the naive way.

In contrast, consider Merge-Sort, with the running-time recurrence T(n) = 2T(n/2) + Θ(n). Even if we did naive quadratic (that is, c·n²-time) sorting for each of the 2 subproblems, the total time would be 2 · c·(n/2)² = c·n²/2 (plus Θ(n) overhead), which is faster than naive sorting of the whole problem by a factor of 2. This tells us that divide-and-conquer sorting may be more efficient than naive sorting (and it is indeed, as the Master Theorem proves).

Strassen's Method

The idea behind Strassen's method is to reduce the number of multiplications at each recursive call from eight to seven. That makes the recursion tree slightly less bushy.


Strassen's Method

Strassen's method has four steps:
1. Divide the input matrices A and B into submatrices as before, using index calculations, in Θ(1) time.
2. Create ten n/2 × n/2 matrices S_1, S_2, ..., S_10, each equal to the sum or difference of two submatrices created in Step 1. This step takes Θ(n²) time.
3. Using the submatrices created in Steps 1 and 2, recursively compute seven products P_1, P_2, ..., P_7, each of size n/2 × n/2.
4. Compute the submatrices of C by adding or subtracting various combinations of the matrices P_i. This step takes Θ(n²) time.

The running time of Strassen's method satisfies the recurrence

T(n) = Θ(1)              if n = 1,
T(n) = 7T(n/2) + Θ(n²)   if n > 1.

By the Master Theorem, T(n) = Θ(n^{lg 7}).
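The following NumPy sketch (our illustration, using one common formulation of the seven products; the slides' S_i and P_i bookkeeping is equivalent) shows the structure of the method for n a power of 2:

import numpy as np

def strassen(A, B):
    # A, B: n x n NumPy arrays with n a power of 2 (pad with zeros otherwise)
    n = A.shape[0]
    if n == 1:
        return A * B
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    # seven recursive products (instead of eight)
    P1 = strassen(A11 + A22, B11 + B22)
    P2 = strassen(A21 + A22, B11)
    P3 = strassen(A11, B12 - B22)
    P4 = strassen(A22, B21 - B11)
    P5 = strassen(A11 + A12, B22)
    P6 = strassen(A21 - A11, B11 + B12)
    P7 = strassen(A12 - A22, B21 + B22)
    # combine with Θ(n^2) additions and subtractions
    C = np.empty_like(A)
    C[:m, :m] = P1 + P4 - P5 + P7
    C[:m, m:] = P3 + P5
    C[m:, :m] = P2 + P4
    C[m:, m:] = P1 - P2 + P3 + P6
    return C

As a sanity check, strassen(A, B) agrees with A @ B on random 4 × 4 integer matrices.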

Notes
Strassen's algorithm was the first to beat Θ(n³) time, but it is not the asymptotically fastest known. A method by Coppersmith and Winograd runs in O(n^{2.376}) time.

Practical issues with Strassen's algorithm:
  Higher constant factor than the obvious Θ(n³)-time method.
  Not good for sparse matrices.
  Not numerically stable: larger errors accumulate than in the naive method.
  Submatrices consume space, especially if copying.

Various researchers have tried to find the crossover point, where Strassen's algorithm runs faster than the naive Θ(n³)-time method. Theoretical analyses (that ignore caches and hardware pipelines) have produced crossover points as low as n = 8, and practical experiments have found crossover points as low as n = 400.
