Let b and c be two n-bit nonnegative integers, written in binary as

$$b = b_0 2^0 + b_1 2^1 + \cdots + b_{n-1} 2^{n-1} = \sum_{i=0}^{n-1} b_i 2^i, \qquad c = c_0 2^0 + c_1 2^1 + \cdots + c_{n-1} 2^{n-1} = \sum_{i=0}^{n-1} c_i 2^i.$$
Addition
The usual sequential binary add-with-carry algorithm that we all learned in school takes Θ(n) time, since we spend a constant amount of time at each column, from right to left. The sum is representable by n + 1 bits (at most). Q: Can we do better?
This algorithm is clearly asymptotically optimal, since to produce the correct sum we must at least examine each bit of b and of c.
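For reference, a minimal plain-Python sketch of the column-by-column algorithm (the name ripple_add is illustrative, not from the slides):

    def ripple_add(b: int, c: int, n: int) -> int:
        """Column-by-column add-with-carry; Theta(n) time for n-bit inputs."""
        result, carry = 0, 0
        for i in range(n):                     # right to left over the columns
            s = ((b >> i) & 1) + ((c >> i) & 1) + carry
            result |= (s & 1) << i             # sum bit of column i
            carry = s >> 1                     # carry into column i + 1
        return result | (carry << n)           # the sum needs at most n + 1 bits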
Subtraction
Similar to addition, the usual subtract-and-borrow algorithm takes Θ(n) time, which is clearly asymptotically optimal. The result can be represented by at most n bits.
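A matching sketch for subtraction, under the assumption b ≥ c (again, the function name is illustrative):

    def ripple_subtract(b: int, c: int, n: int) -> int:
        """Column-by-column subtract-and-borrow, assuming b >= c; Theta(n) time."""
        result, borrow = 0, 0
        for i in range(n):
            d = ((b >> i) & 1) - ((c >> i) & 1) - borrow
            result |= (d & 1) << i             # d & 1 is d mod 2, even when d < 0
            borrow = 1 if d < 0 else 0         # borrow from column i + 1
        return result                          # fits in n bits since b >= c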
Multiplication
If we multiply b and c using the naive grade-school algorithm, then it takes quadratic (i.e., Θ(n^2)) time. Essentially, this algorithm amounts to expanding the product bc according to the expressions above:
$$bc = \left(\sum_{i=0}^{n-1} b_i 2^i\right)\left(\sum_{j=0}^{n-1} c_j 2^j\right) = \sum_{i,j} b_i c_j 2^{i+j},$$

then adding everything up term by term. There are n^2 terms. Q: Can we do better?
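Concretely, a shift-and-add rendering of the grade-school method (a minimal sketch; each of the n additions works on numbers of Θ(n) bits, giving Θ(n^2) total):

    def naive_multiply(b: int, c: int) -> int:
        """Grade-school shift-and-add: one shifted copy of b per set bit of c."""
        product, i = 0, 0
        while c >> i:                          # for each bit position i of c
            if (c >> i) & 1:                   # when c_i = 1 ...
                product += b << i              # ... add b * 2^i into the total
            i += 1
        return product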
If n = 1, then the multiplication is trivial, so assume that n > 1. Let us further assume, for simplicity, that n is even. In fact, we can assume that n is a power of 2: if it is not, pad each number with leading 0s to the next power of 2; at worst this just doubles the input size.
Let m = n/2. Split b and c up into their m least and m most significant bits. Let b_ℓ and b_h be the numbers given by the low m bits and the high m bits of b, respectively. Similarly, let c_ℓ and c_h be the low and high halves of c. Thus, 0 ≤ b_ℓ, b_h, c_ℓ, c_h < 2^m and b = b_ℓ + b_h 2^m, c = c_ℓ + c_h 2^m.
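In code, the split is just a mask and a shift; a sketch (the helper name split is hypothetical):

    def split(x: int, m: int) -> tuple[int, int]:
        """Return (x_lo, x_hi) with x == x_lo + (x_hi << m) and 0 <= x_lo < 2**m."""
        return x & ((1 << m) - 1), x >> m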
Better Approach
Expanding the product gives bc = (b_ℓ + b_h 2^m)(c_ℓ + c_h 2^m) = b_ℓ c_ℓ + (b_ℓ c_h + b_h c_ℓ) 2^m + b_h c_h 2^n, which involves four products of half-size numbers.
Split b and c up into their low and high halves as above, but then recursively compute only three products:

$$x = b_\ell c_\ell, \qquad y = b_h c_h, \qquad z = (b_\ell + b_h)(c_\ell + c_h).$$
Now you should verify for yourself that bc = x + (z − y − x) 2^m + y 2^n, which is what the algorithm then computes.
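A minimal Python sketch of the three-product recursion (as an assumption of this sketch, it recurses on the operands' actual bit lengths instead of padding n to a power of 2, so the final shift uses 2m in place of n):

    def karatsuba(b: int, c: int) -> int:
        """Multiply nonnegative integers using three half-size products."""
        if b < 2 or c < 2:                     # 0- or 1-bit operand: trivial
            return b * c
        n = max(b.bit_length(), c.bit_length())
        m = n // 2
        b_lo, b_hi = b & ((1 << m) - 1), b >> m
        c_lo, c_hi = c & ((1 << m) - 1), c >> m
        x = karatsuba(b_lo, c_lo)              # x = b_lo * c_lo
        y = karatsuba(b_hi, c_hi)              # y = b_hi * c_hi
        z = karatsuba(b_lo + b_hi, c_lo + c_hi)
        # bc = x + (z - y - x) * 2^m + y * 2^(2m)
        return x + ((z - y - x) << m) + (y << (2 * m))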
How much time does this take? Besides the recursive calls, there is a linear time's worth of overhead: additions, subtractions, and arithmetic shifts. There are three recursive calls, computing x, y, and z. The numbers x and y are products of two m-bit integers each, and z is the product of (at most) two (m + 1)-bit integers. Thus the running time satisfies T(n) = 2T(n/2) + T(n/2 + 1) + Θ(n).
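Ignoring the +1 for a moment, the three calls together cost 3T(n/2), and the master theorem resolves the recurrence directly:

$$T(n) = 3\,T(n/2) + \Theta(n) \;\Longrightarrow\; T(n) = \Theta\big(n^{\log_2 3}\big) = \Theta\big(n^{\lg 3}\big) \approx \Theta\big(n^{1.585}\big),$$

by case 1 of the master theorem, since $n^{\log_2 3}$ grows polynomially faster than the Θ(n) overhead.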
Remark
If you're really worried about the +1, you should verify that T(n) = Θ(n^{lg 3}) directly using the substitution method. Alternatively, you can modify the algorithm a bit so that only m-bit numbers are multiplied recursively and the overhead is still Θ(n).
A Bit of History
This approach dates back at least to Gauss, who discovered (using the same trick) that multiplying two complex numbers together could be done with only three real multiplications instead of the more naive four. The same idea has been applied to long integer multiplication by Karatsuba and to matrix multiplication by Strassen.
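For concreteness, here is one standard way to write Gauss's scheme (the same three-products trick, applied to complex numbers):

$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i,$$

where computing $t_1 = ac$, $t_2 = bd$, and $t_3 = (a + b)(c + d)$ gives the real part as $t_1 - t_2$ and the imaginary part as $t_3 - t_1 - t_2$: three real multiplications instead of four.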
Matrix Multiplication
Given two n × n matrices $A = (a_{ij})_{i,j=1}^{n}$ and $B = (b_{ij})_{i,j=1}^{n}$, their product is defined as follows:

$$A \cdot B = (c_{ij})_{i,j=1}^{n}, \qquad \text{where } c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$

Therefore, to compute the matrix product, we need to compute n^2 matrix entries. A naive approach takes n multiplications and n − 1 additions for each entry.
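Carrying out the definition directly gives the Θ(n^3)-time algorithm; a minimal Python sketch:

    def matmul(A, B):
        """Naive product of n x n matrices: Theta(n^3) scalar operations."""
        n = len(A)
        C = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):                 # n^2 entries ...
                for k in range(n):             # ... each a sum of n terms
                    C[i][j] += A[i][k] * B[k][j]
        return C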
Can we do better?
Is Θ(n^3) the best we can do? Can we multiply matrices in o(n^3) time? It seems like any algorithm to multiply matrices must take Ω(n^3) time: it must compute n^2 entries, and each entry is the sum of n terms. But with Strassen's method, we can multiply matrices in o(n^3) time: Strassen's algorithm runs in Θ(n^{lg 7}) time, and since 2.80 < lg 7 < 2.81, it runs in O(n^{2.81}) time.
Partition A, B, and C into four n/2 × n/2 submatrices each, so that C = A · B reads

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}.$$

This matrix equation corresponds to the following four equations on the submatrices:

$$\begin{aligned} C_{11} &= A_{11} B_{11} + A_{12} B_{21}, \\ C_{12} &= A_{11} B_{12} + A_{12} B_{22}, \\ C_{21} &= A_{21} B_{11} + A_{22} B_{21}, \\ C_{22} &= A_{21} B_{12} + A_{22} B_{22}. \end{aligned}$$
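Computing the four block equations with eight recursive half-size products gives a divide-and-conquer algorithm whose running time satisfies T(n) = 8T(n/2) + Θ(n^2) = Θ(n^3), no better than the naive method. A minimal numpy sketch, assuming n is a power of 2:

    import numpy as np

    def block_matmul(A, B):
        """Divide-and-conquer product via the four block equations."""
        n = A.shape[0]
        if n == 1:
            return A * B                       # 1 x 1 base case
        m = n // 2
        A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
        B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
        return np.block([                      # eight recursive half-size products
            [block_matmul(A11, B11) + block_matmul(A12, B21),
             block_matmul(A11, B12) + block_matmul(A12, B22)],
            [block_matmul(A21, B11) + block_matmul(A22, B21),
             block_matmul(A21, B12) + block_matmul(A22, B22)],
        ])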
Strassen's Method
The idea behind Strassen's method is to reduce the number of multiplications at each recursive call from eight to seven. That makes the recursion tree slightly less bushy.
Strassen's method has four steps:

1. Divide the input matrices A and B into submatrices as before, using index calculations, in Θ(1) time.
2. Create ten n/2 × n/2 matrices S_1, S_2, ..., S_10, each equal to the sum or difference of two submatrices created in Step 1. This step takes Θ(n^2) time.
3. Using the submatrices created in Steps 1 and 2, recursively compute seven products P_1, P_2, ..., P_7, each of size n/2 × n/2.
4. Compute the submatrices of C by adding or subtracting various combinations of the matrices P_i. This step takes Θ(n^2) time.

The running time for Strassen's method satisfies the recurrence:

$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1, \\ 7\,T(n/2) + \Theta(n^2) & \text{if } n > 1. \end{cases}$$
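A minimal numpy sketch, assuming n is a power of 2; the particular combinations of sums and products below are the classic formulation of the seven P_i, which the slide does not spell out:

    import numpy as np

    def strassen(A, B):
        """Strassen's seven-product scheme for n x n matrices."""
        n = A.shape[0]
        if n == 1:
            return A * B
        m = n // 2
        A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
        B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
        # Steps 2-3: ten sums/differences feed seven recursive products.
        P1 = strassen(A11, B12 - B22)
        P2 = strassen(A11 + A12, B22)
        P3 = strassen(A21 + A22, B11)
        P4 = strassen(A22, B21 - B11)
        P5 = strassen(A11 + A22, B11 + B22)
        P6 = strassen(A12 - A22, B21 + B22)
        P7 = strassen(A11 - A21, B11 + B12)
        # Step 4: combine with Theta(n^2) additions and subtractions.
        C11 = P5 + P4 - P2 + P6
        C12 = P1 + P2
        C21 = P3 + P4
        C22 = P5 + P1 - P3 - P7
        return np.block([[C11, C12], [C21, C22]])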
Notes
Strassen's algorithm was the first to beat Θ(n^3) time, but it is not the asymptotically fastest known. A method by Coppersmith and Winograd runs in O(n^{2.376}) time. Practical issues weigh against Strassen's algorithm:

- Higher constant factor than the obvious Θ(n^3)-time method.
- Not good for sparse matrices.
- Not numerically stable: larger errors accumulate than in the naive method.
- Submatrices consume space, especially if copying.

Various researchers have tried to find the crossover point, where Strassen's algorithm runs faster than the naive Θ(n^3)-time method. Theoretical analyses (that ignore caches and hardware pipelines) have produced crossover points as low as n = 8, and practical experiments have found crossover points as low as n = 400.