
Computing 36, 375-382 (1986)

Computing
© by Springer-Verlag 1986

An Improved Algorithm for Boolean Matrix Multiplication*


N. Santoro and J. Urrutia, Ottawa

Received December 17, 1984


Abstract - Zusammenfassung

An Improved Algorithm for Boolean Matrix Multiplication. A new algorithm for computing the product of two arbitrary $N \times N$ Boolean matrices is presented. The algorithm requires $O(N^3/\log N)$ bit operations and only $O(N \log N)$ bits of additional storage. This represents an improvement on the Four Russians' method, which requires the same number of operations but uses $O(N^3/\log N)$ bits of additional storage.

Key words: Boolean matrix multiplication, analysis of algorithms.


An Improved Algorithm for the Multiplication of Boolean Matrices. This paper describes a new algorithm for computing the product of $N \times N$ matrices with Boolean coefficients. The number of operations is $O(N^3/\log N)$, while only $O(N \log N)$ additional storage is required. This is an improvement over the "Four Russians" method, which, for the same number of operations, requires $O(N^3/\log N)$ storage.

1. Introduction

In parallel with the analysis of asymptotically fast methods, the research on Boolean matrix multiplication has also focused on the determination of "efficient" bounds; that is, finding non-optimal but practical algorithms that outperform the asymptotic algorithms for a bounded matrix size or for special classes of matrices. Examples of these results are the $O(N^2)$ algorithms for multiplying $N \times N$ sparse or dense Boolean matrices [2, 3], and the $O(N^3/\log N)$ algorithm for multiplying arbitrary $N \times N$ Boolean matrices [1]. The latter algorithm, known as the "Four Russians' Method", unfortunately requires $O(N^3/\log N)$ additional bits to store the $O(N/\log N)$ sets, each containing $N$ rows of $N$ bits each. In this paper, a new algorithm for Boolean matrix multiplication is presented; this algorithm is based on some properties of the product matrix, and it is shown to require $O(N^3/\log N)$ bit operations (thus achieving the Four Russians' bound) but only $O(N \log N)$ bits of additional storage.

* This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant No. A.2415.


The paper is organized as follows. In the next section, some properties of the product matrix are discussed; based on these properties, in Section 3, an algorithm is presented which multiplies a $p \times N$ matrix by an $N \times N$ matrix in $O(N^2)$ bit operations using $O(N \log N)$ bits of additional storage, where $p \le \lfloor \log_2 N \rfloor$; finally, in Section 4, this algorithm is employed to obtain the claimed result. In the following, all logarithms are in base two and $(P, Q, R)$ denotes the problem of multiplying a $P \times Q$ by a $Q \times R$ Boolean matrix.

2. Properties of the Product Matrix


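Consider the product $C = A \times B$ where $A$ is a $p \times N$ Boolean matrix, $p \le \lfloor \log N \rfloor$, and $B$ is an $N \times N$ Boolean matrix. Consider the sets

$$Z^{i+1}_{2k-1} = \{x \in Z^i_k : A[i+1, x] = 1\} \qquad (1a)$$

and

$$Z^{i+1}_{2k} = \{x \in Z^i_k : A[i+1, x] = 0\} \qquad (1b)$$

($0 \le i < p$, $1 \le k \le 2^i$), where $Z^0_1 = \{1, 2, \ldots, N\}$. Obviously, $Z^{i+1}_{2k-1} \cap Z^{i+1}_{2k} = \emptyset$ and $Z^{i+1}_{2k-1} \cup Z^{i+1}_{2k} = Z^i_k$ ($0 \le i < p$, $1 \le k \le 2^i$).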

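Example 1: Consider the matrix

$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}.$$

In this case, the sets $Z^i_k$ are as follows:

$Z^1_1 = \{2,3,6,8\}$, $Z^1_2 = \{1,4,5,7\}$;
$Z^2_1 = \{2,3\}$, $Z^2_2 = \{6,8\}$, $Z^2_3 = \{4,5\}$, $Z^2_4 = \{1,7\}$;
$Z^3_1 = \{2\}$, $Z^3_2 = \{3\}$, $Z^3_3 = \{8\}$, $Z^3_4 = \{6\}$, $Z^3_5 = \{5\}$, $Z^3_6 = \{4\}$, $Z^3_7 = \{1\}$, $Z^3_8 = \{7\}$.

Let $f^i : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ ($0 \le i \le p$) be the mapping

$$f^i(x) = k \quad \text{iff} \quad x \in Z^i_k. \qquad (2)$$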
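The recursive definition (1a)-(1b) and the mapping (2) can be sketched in a few lines of Python. This is only an illustration on the matrix of Example 1; the function names `z_levels` and `f` are hypothetical and not part of the paper, and the sets are held as plain lists rather than in any space-efficient form.

```python
# Matrix A of Example 1 (p = 3, N = 8); the paper indexes rows and columns from 1.
A = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
]

def z_levels(A):
    """Return levels with levels[i][k-1] = Z^i_k for i = 0..p, following (1a) and (1b)."""
    n = len(A[0])
    levels = [[list(range(1, n + 1))]]            # Z^0_1 = {1, ..., N}
    for row in A:                                 # row i+1 of A splits every Z^i_k
        nxt = []
        for z in levels[-1]:
            nxt.append([x for x in z if row[x - 1] == 1])   # Z^{i+1}_{2k-1}
            nxt.append([x for x in z if row[x - 1] == 0])   # Z^{i+1}_{2k}
        levels.append(nxt)
    return levels

def f(levels, i, x):
    """The mapping f^i of (2): the index k such that x belongs to Z^i_k."""
    return next(k for k, z in enumerate(levels[i], start=1) if x in z)

levels = z_levels(A)
print(levels[1])   # [[2, 3, 6, 8], [1, 4, 5, 7]]
print(levels[2])   # [[2, 3], [6, 8], [4, 5], [1, 7]]
```

Each level is a partition of {1, ..., N}, so the total size of one level is always N, whatever the number of (possibly empty) parts.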

A bijection $\pi : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ is said to be i-canonical ($0 \le i \le p$) if for all $x, y \in \{1, 2, \ldots, N\}$

$$\pi^{-1}(x) < \pi^{-1}(y) \quad \text{if} \quad f^i(x) < f^i(y). \qquad (3)$$

Obviously, the identity function $\iota$ is 0-canonical.


Given a bijection $\pi : \{1, \ldots, N\} \to \{1, \ldots, N\}$, let

$$\pi[i,k] = \{\pi^{-1}(x) : x \in Z^i_k\},$$

$$l_\pi[i,k] = \min\{a \in \pi[i,k]\}, \qquad h_\pi[i,k] = \max\{a \in \pi[i,k]\}.$$

Lemma 1: Let $\pi$ be i-canonical. Then, for all non-empty $Z^i_k$ ($1 \le k \le 2^i$),

$$Z^i_k = \{\pi(l_\pi[i,k]),\ \pi(l_\pi[i,k] + 1),\ \ldots,\ \pi(h_\pi[i,k])\}.$$

Proof: By definition of i-canonical mapping, each $x \in Z^i_k$ is such that $\pi^{-1}(w) < \pi^{-1}(x) < \pi^{-1}(z)$ for all $w \in Z^i_{k'}$, $z \in Z^i_{k''}$, where $k' < k < k''$. Hence $\pi[i,k]$ is a sequence of consecutive integers. By definition of $l_\pi[i,k]$ and $h_\pi[i,k]$, and since $\pi$ is a bijection, the Lemma follows. $\square$
The sets $Z^i_k$ may conveniently be imagined as lying in a binary tree, each $Z^i_k$ having $Z^{i+1}_{2k-1}$ and $Z^{i+1}_{2k}$ as left and right children. The i-th level of the tree then consists of the sets $Z^i_k$ ($k = 1, 2, \ldots, 2^i$), which form a partition (possibly with some empty parts) of $\{1, 2, \ldots, N\}$; the function $f^i$ is the characteristic function of the partition. If the nodes at the i-th level, $Z^i_1, Z^i_2, \ldots$, are sets of size $m^i_1, m^i_2, \ldots$, an i-canonical bijection is one that maps the first $m^i_1$ positive integers onto $Z^i_1$, the next $m^i_2$ integers onto $Z^i_2$, and so on.
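One way to obtain a bijection of this shape, sketched here purely for illustration, is a stable sort of 1, ..., N by the value of $f^i$. This is not the paper's method (which builds the bijection level by level) and the helper names `f_values`, `canonical_bijection` and `is_canonical` are hypothetical; the sketch only makes condition (3) concrete on the matrix of Example 1.

```python
# Matrix A of Example 1; pi is stored as a list with pi[a-1] = pi(a).
A = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
]

def f_values(A, i):
    """f^i(x) for x = 1..N: a 1 in row i+1 selects child 2k-1, a 0 selects child 2k."""
    out = []
    for x in range(1, len(A[0]) + 1):
        k = 1
        for row in A[:i]:
            k = 2 * k - 1 if row[x - 1] == 1 else 2 * k
        out.append(k)
    return out

def canonical_bijection(A, i):
    """An i-canonical bijection obtained by a stable sort of 1..N on f^i."""
    fv = f_values(A, i)
    return sorted(range(1, len(fv) + 1), key=lambda x: fv[x - 1])

def is_canonical(pi, A, i):
    """Condition (3) holds iff f^i(pi(1)), ..., f^i(pi(N)) is non-decreasing."""
    fv = f_values(A, i)
    ks = [fv[x - 1] for x in pi]
    return all(ks[a] <= ks[a + 1] for a in range(len(ks) - 1))

print(canonical_bijection(A, 2))   # [2, 3, 6, 8, 4, 5, 1, 7]
```

The identity passes `is_canonical` for i = 0 but not for i = 1, which matches the remark that the identity is 0-canonical.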

In the case of the example above, we have the tree

level 0: {1,2,3,4,5,6,7,8}
level 1: {2,3,6,8}   {1,4,5,7}
level 2: {2,3}   {6,8}   {4,5}   {1,7}
level 3: {2} {3}   {8} {6}   {5} {4}   {1} {7}

Construction 2.1:

Given an i-canonical bijection $\pi_i$, consider the following construction.

1. For all $k$ ($1 \le k \le 2^i$), if $Z^i_k \ne \emptyset$:
   a) Partition $\pi_i[i,k]$ into two sets $P^i_k$ and $Q^i_k$ such that $P^i_k = \{a \in \pi_i[i,k] : A[i+1, \pi_i(a)] = 1\}$ and $Q^i_k = \{a \in \pi_i[i,k] : A[i+1, \pi_i(a)] = 0\}$.
   b) Define $\psi^i_k : \pi_i[i,k] \to \pi_i[i,k]$ to be a bijection such that for all $a \in P^i_k$, $b \in Q^i_k$, $\psi^i_k(a) < \psi^i_k(b)$.
2. Define $\pi_{i+1} : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ to be the bijection defined by $\pi_{i+1}^{-1}(x) = \psi^i_k(\pi_i^{-1}(x))$ iff $\pi_i^{-1}(x) \in \pi_i[i,k]$.
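A minimal sketch of one application of Construction 2.1 follows. It makes two assumptions that the construction itself does not impose: the bijection $\psi^i_k$ is chosen to keep the relative order inside each of the two parts, and each set $\pi_i[i,k]$ is passed as a pair (l, h) of its smallest and largest element, with `None` standing for an empty set. The function name `construction_2_1` is hypothetical. The input is the matrix of Example 1 together with a 2-canonical bijection.

```python
A = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
]

def construction_2_1(A, i, pi, ranges):
    """Build pi_{i+1} from an i-canonical pi_i.

    pi     : list with pi[a-1] = pi_i(a)
    ranges : one (l, h) pair per k = 1..2^i, or None where Z^i_k is empty
    Returns (pi_next, ranges_next) for level i + 1.
    """
    row = A[i]                       # row i+1 of A in the paper's 1-based notation
    pi_next = pi[:]
    ranges_next = []
    for rng in ranges:
        if rng is None:              # an empty set has two empty children
            ranges_next += [None, None]
            continue
        l, h = rng
        # Step 1(a): split the positions l..h by the entry A[i+1, pi_i(a)].
        P = [a for a in range(l, h + 1) if row[pi[a - 1] - 1] == 1]
        Q = [a for a in range(l, h + 1) if row[pi[a - 1] - 1] == 0]
        # Steps 1(b) and 2: psi sends P, then Q, onto positions l, l+1, ..., h,
        # and pi_{i+1}(psi(a)) = pi_i(a).
        for new_pos, a in enumerate(P + Q, start=l):
            pi_next[new_pos - 1] = pi[a - 1]
        mid = l + len(P)
        ranges_next.append((l, mid - 1) if P else None)
        ranges_next.append((mid, h) if Q else None)
    return pi_next, ranges_next

pi2 = [2, 3, 6, 8, 4, 5, 1, 7]                   # a 2-canonical bijection for A
ranges2 = [(1, 2), (3, 4), (5, 6), (7, 8)]
pi3, ranges3 = construction_2_1(A, 2, pi2, ranges2)
print(pi3)   # [2, 3, 8, 6, 5, 4, 1, 7]
```

Returning the new (l, h) pairs alongside the bijection means the next level can be processed without ever materialising the sets $Z^i_k$.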


Example 2:

It is easy to verify that the bijection $\pi_2$ defined as follows

$1 \to 2$, $2 \to 3$, $3 \to 6$, $4 \to 8$, $5 \to 4$, $6 \to 5$, $7 \to 1$, $8 \to 7$

is 2-canonical for the matrix of Example 1. The sets $\pi_2[2,k]$ are as follows:

$\pi_2[2,1] = \{1,2\}$, $\pi_2[2,2] = \{3,4\}$, $\pi_2[2,3] = \{5,6\}$, $\pi_2[2,4] = \{7,8\}$;

the partitions $P^2_k$, $Q^2_k$ obtained by applying Construction 2.1 are

$P^2_1 = \{1\}$, $P^2_2 = \{4\}$, $P^2_3 = \{6\}$, $P^2_4 = \{7\}$;
$Q^2_1 = \{2\}$, $Q^2_2 = \{3\}$, $Q^2_3 = \{5\}$, $Q^2_4 = \{8\}$;

and the bijections $\psi^2_k$ are

$\psi^2_1$: $1 \to 1$, $2 \to 2$; $\psi^2_2$: $4 \to 3$, $3 \to 4$; $\psi^2_3$: $6 \to 5$, $5 \to 6$; $\psi^2_4$: $7 \to 7$, $8 \to 8$.

The bijection $\pi_3$ obtained by Step 2 of Construction 2.1 is then

$\pi_3$: $1 \to 2$, $2 \to 3$, $3 \to 8$, $4 \to 6$, $5 \to 5$, $6 \to 4$, $7 \to 1$, $8 \to 7$.

Lemma 2: Let $\pi_i$ be i-canonical, and let $\pi_{i+1}$ be the bijection obtained by Construction 2.1. Then, $\pi_{i+1}$ is (i+1)-canonical.

Proof: It is not difficult to see that if $\pi_i$ is i-canonical, then $\pi_{i+1}$ is i-canonical. Hence, it suffices to prove that $\pi_{i+1}^{-1}(x) < \pi_{i+1}^{-1}(y)$ if $f^{i+1}(x) < f^{i+1}(y)$. Let $x \in Z^{i+1}_{k'}$, $y \in Z^{i+1}_{k''}$ with $k' < k''$. Two cases may arise.

Case 1 ($k' = k'' - 1 = 2k - 1$): In this case, $\pi_i^{-1}(x) \in P^i_k$ and $\pi_i^{-1}(y) \in Q^i_k$; by Step 1(b) of Construction 2.1,

$$\pi_{i+1}^{-1}(x) = \psi^i_k(\pi_i^{-1}(x)) < \psi^i_k(\pi_i^{-1}(y)) = \pi_{i+1}^{-1}(y).$$

Case 2: Let $c' = \lceil k'/2 \rceil$ and $c'' = \lceil k''/2 \rceil$; obviously $c' < c''$. By definition (1), $x \in Z^i_{c'}$ and $y \in Z^i_{c''}$; since $\pi_{i+1}$ is i-canonical, $\pi_{i+1}^{-1}(x) < \pi_{i+1}^{-1}(y)$. $\square$

Let $\phi^i_k(j)$ be the Boolean function ($1 \le i \le p$, $1 \le k \le 2^i$, $1 \le j \le N$)

$$\phi^i_k(j) = \begin{cases} \bigvee_{x \in Z^i_k} B[x,j] & \text{if } Z^i_k \ne \emptyset \\ 0 & \text{otherwise,} \end{cases} \qquad (4)$$

where $\bigvee$ denotes the Boolean OR.


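Lemma 3: Let $\pi$ be p-canonical. Then

$$\phi^p_k(j) = \begin{cases} \bigvee_{s = l_\pi[p,k]}^{h_\pi[p,k]} B[\pi(s), j] & \text{if } Z^p_k \ne \emptyset \\ 0 & \text{otherwise} \end{cases}$$

for $1 \le k \le 2^p$, $1 \le j \le N$.

Proof: By Lemma 1. $\square$

Lemma 4: For $1 \le i < p$, $1 \le k \le 2^i$, $1 \le j \le N$,

$$\phi^i_k(j) = \phi^{i+1}_{2k-1}(j) \vee \phi^{i+1}_{2k}(j).$$

Proof: By observing that $Z^i_k = Z^{i+1}_{2k-1} \cup Z^{i+1}_{2k}$. $\square$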


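Theorem 1:

$$C[i,j] = 1 \quad \text{iff} \quad \bigvee_{r=1}^{2^{i-1}} \phi^i_{2r-1}(j) = 1 \qquad (1 \le i \le p,\ 1 \le j \le N).$$

Proof: By (1), it follows that

$$A[i,l] = 1 \quad \text{iff} \quad \exists\, r \in \{1, \ldots, 2^{i-1}\} : l \in Z^i_{2r-1}.$$

Then, by (4), it follows that

$C[i,j] = 1$
iff $\exists\, l \in \{1, \ldots, N\} : A[i,l] = B[l,j] = 1$
iff $\exists\, r \in \{1, \ldots, 2^{i-1}\},\ l \in \{1, \ldots, N\} : l \in Z^i_{2r-1} \wedge B[l,j] = 1$
iff $\exists\, r \in \{1, \ldots, 2^{i-1}\} : \bigvee_{x \in Z^i_{2r-1}} B[x,j] = 1$
iff $\exists\, r \in \{1, \ldots, 2^{i-1}\} : \phi^i_{2r-1}(j) = 1$
iff $\bigvee_{r=1}^{2^{i-1}} \phi^i_{2r-1}(j) = 1$. $\square$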
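Theorem 1 can be checked numerically on small inputs. The sketch below evaluates definition (4) directly from the sets $Z^i_k$ and compares the result with the ordinary Boolean product; it makes no attempt at efficiency. The matrix `B` is an arbitrary test pattern chosen for this illustration and the function names are hypothetical.

```python
A = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
]
N = 8
# Hypothetical test matrix, not taken from the paper.
B = [[1 if (x * j) % 3 == 1 else 0 for j in range(1, N + 1)] for x in range(1, N + 1)]

def z_level(A, i):
    """The sets Z^i_1, ..., Z^i_{2^i}, built by applying (1a)/(1b) i times."""
    level = [list(range(1, len(A[0]) + 1))]
    for row in A[:i]:
        level = [part for z in level
                 for part in ([x for x in z if row[x - 1] == 1],
                              [x for x in z if row[x - 1] == 0])]
    return level

def phi(A, B, i, k, j):
    """Definition (4): OR of B[x, j] over x in Z^i_k, and 0 for an empty set."""
    return int(any(B[x - 1][j - 1] for x in z_level(A, i)[k - 1]))

def c_by_theorem_1(A, B, i, j):
    """C[i, j] as the OR of phi^i_k(j) over the odd indices k = 2r - 1."""
    return int(any(phi(A, B, i, 2 * r - 1, j) for r in range(1, 2 ** (i - 1) + 1)))

def c_direct(A, B, i, j):
    """C[i, j] straight from the definition of the Boolean product."""
    return int(any(A[i - 1][l] and B[l][j - 1] for l in range(len(B))))

agree = all(c_by_theorem_1(A, B, i, j) == c_direct(A, B, i, j)
            for i in range(1, len(A) + 1) for j in range(1, N + 1))
print(agree)   # True
```

The odd-indexed sets at level i are exactly the columns in which row i of A holds a 1, which is why only those indices appear in the OR.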

3. An Algorithm for (p, N, N) with $p \le \lfloor \log N \rfloor$

Theorem 1 shows that to compute the entries of the j-th column of the product matrix $C = A \times B$ it is sufficient to compute the OR of the Boolean functions $\phi^i_k(j)$ over the appropriate indices. Lemma 4 gives a method for computing the $\phi^i_k(j)$'s once the $\phi^{i+1}_k(j)$'s have been computed; Lemma 3 shows how to compute the starting values $\phi^p_k(j)$ once a p-canonical bijection $\pi$ and the values $h_\pi[p,k]$ and $l_\pi[p,k]$ are known. Finally, Construction 2.1 provides a method for determining an (i+1)-canonical bijection once an i-canonical bijection is known. Observe that, by Lemma 1, each set $\pi_i[i,k]$ is uniquely defined by the values $l_{\pi_i}[i,k]$ and $h_{\pi_i}[i,k]$ ($1 \le k \le 2^i$). The above considerations lead to the following algorithm for computing the product $C = A \times B$.
Algorithm 3.1:

Step 1: (Computation of a p-canonical bijection: Initialization)

Set $\pi_0$ to be the identity function on $\{1, \ldots, N\}$; set $l_{\pi_0}[0,1] := 1$, $h_{\pi_0}[0,1] := N$, $i := 1$.

Step 2: (Computation of a p-canonical bijection: Iteration)

a) Compute an i-canonical bijection $\pi_i$ from $\pi_{i-1}$ using Construction 2.1. (Note: the set $\pi_{i-1}[i-1,k]$ is uniquely defined by the values $l_{\pi_{i-1}}[i-1,k]$ and $h_{\pi_{i-1}}[i-1,k]$, $1 \le k \le 2^{i-1}$);
b) Compute the values $l_{\pi_i}[i,k]$ and $h_{\pi_i}[i,k]$ ($1 \le k \le 2^i$);
c) Set $i := i + 1$; if $i \le p$, repeat Step 2.

Step 3: (Computation of product matrix: Initialization)

Set $j := 1$.

Step 4: (Computation of j-th column of product matrix: Initialization)

a) Set $i := p$;
b) Compute $\phi^p_k(j)$ ($1 \le k \le 2^p$) using Lemma 3.

Step 5: (Computation of j-th column of product matrix: Iteration)

a) Compute $C[i,j]$ using Theorem 1;
b) Compute $\phi^{i-1}_k(j)$ ($1 \le k \le 2^{i-1}$) using Lemma 4;
c) Set $i := i - 1$; if $i \ge 1$, repeat this step.

Step 6: (Computation of product matrix: Iteration)

Set $j := j + 1$; if $j \le N$, go to Step 4.
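Steps 1 to 6 can be put together as in the following sketch. It is meant to show the control flow only: Python lists and integers are used throughout, so it does not reproduce the bit-level operation counts or the storage layout analysed in the paper. The name `multiply_strip`, the use of `None` for an empty set, and the order-preserving choice of the bijection in Construction 2.1 are assumptions of this illustration.

```python
def multiply_strip(A, B):
    """Boolean product of a p x N matrix A and an N x N matrix B (lists of 0/1 rows)."""
    p, n = len(A), len(B)
    # Steps 1-2: build a p-canonical bijection and the (l, h) range of each Z^p_k.
    pi = list(range(1, n + 1))              # pi_0 is the identity
    ranges = [(1, n)]                       # l[0,1] = 1, h[0,1] = N
    for row in A:                           # one pass of Construction 2.1 per row
        nxt_pi, nxt_ranges = pi[:], []
        for rng in ranges:
            if rng is None:
                nxt_ranges += [None, None]
                continue
            l, h = rng
            P = [a for a in range(l, h + 1) if row[pi[a - 1] - 1] == 1]
            Q = [a for a in range(l, h + 1) if row[pi[a - 1] - 1] == 0]
            for pos, a in enumerate(P + Q, start=l):
                nxt_pi[pos - 1] = pi[a - 1]
            mid = l + len(P)
            nxt_ranges.append((l, mid - 1) if P else None)
            nxt_ranges.append((mid, h) if Q else None)
        pi, ranges = nxt_pi, nxt_ranges

    C = [[0] * n for _ in range(p)]
    for j in range(n):                      # Steps 3 and 6: one column at a time
        # Step 4: starting values phi^p_k(j), computed as in Lemma 3.
        phi = [0 if rng is None else
               int(any(B[pi[s - 1] - 1][j] for s in range(rng[0], rng[1] + 1)))
               for rng in ranges]
        for i in range(p, 0, -1):           # Step 5
            C[i - 1][j] = int(any(phi[0::2]))          # Theorem 1: odd k only
            # Lemma 4: phi^{i-1}_k = phi^i_{2k-1} OR phi^i_{2k}
            phi = [phi[2 * k] | phi[2 * k + 1] for k in range(len(phi) // 2)]
    return C

A = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
]
identity = [[1 if x == j else 0 for j in range(8)] for x in range(8)]
print(multiply_strip(A, identity) == A)   # True
```

The list `phi` holds one level of values at a time and halves in length on every pass of Step 5, so a single array of at most $2^p$ entries is enough for a whole column.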
Theorem 2: Algorithm 3.1 correctly computes the product $C = A \times B$ within finite time.

Proof: Finiteness is obvious. Correctness follows from Theorem 1 and Lemmas 2-4. $\square$

Lemma 5: Given an i-canonical bijection $\pi_i$, and the values $l_{\pi_i}[i,k]$ and $h_{\pi_i}[i,k]$ for each non-empty $Z^i_k$ ($1 \le k \le 2^i$), an (i+1)-canonical bijection can be computed by Construction 2.1 using $O(N \log N)$ bit operations.

Proof: Each non-empty $\pi_i[i,k]$ is composed (see Lemma 1) of the consecutive integers between $l_{\pi_i}[i,k]$ and $h_{\pi_i}[i,k]$. To construct $P^i_k$ and $Q^i_k$, it is sufficient to test each entry $A[i+1, \pi_i(x)]$ ($l_{\pi_i}[i,k] \le x \le h_{\pi_i}[i,k]$); if the entry is zero, then $x$ is added to $Q^i_k$; otherwise, it is added to $P^i_k$. Note that since $x \le h_{\pi_i}[i,k] \le N$, the addition of $x$ to either $P^i_k$ or $Q^i_k$ would require $\lceil \log N \rceil$ bit operations. Hence, the construction of $P^i_k$ and $Q^i_k$ requires a total of $|\pi_i[i,k]| (\lceil \log N \rceil + 1)$ bit operations; since $\psi^i_k$ can be constructed by simply examining the sets $P^i_k$ and $Q^i_k$ (in that order), and assigning consecutive integers between $l_{\pi_i}[i,k]$ and $h_{\pi_i}[i,k]$, an additional $|\pi_i[i,k]| \lceil \log N \rceil$ bit operations suffice. Assuming that testing whether $\pi_i[i,k]$ is empty can be done in $\lceil \log N \rceil$ bit operations (note: this can obviously be achieved by setting $l_{\pi_i}[i,k] = h_{\pi_i}[i,k] = 0$ for empty $\pi_i[i,k]$), the execution of Step 1 of Construction 2.1 requires at most

$$\sum_{k=1}^{2^i} \Big( |\pi_i[i,k]| \, (2 \lceil \log N \rceil + 1) + \lceil \log N \rceil \Big)$$

bit operations. Since the $\pi_i[i,k]$'s are all disjoint and $\bigcup_k \pi_i[i,k] = \{1, \ldots, N\}$, $O(N \log N)$ bit operations are required in total by Step 1. Step 2 can obviously be performed in an additional $O(N \log N)$ bit operations; hence, the Lemma holds. $\square$
Lemma 6: The total execution of Step 2 of Algorithm 3.1 requires $O(p N \log N)$ bit operations.


Proof: By Lemma 5, the i-th iteration of Step 2(a) requires $O(N \log N)$ bit operations. Since the values $l_{\pi_i}[i,k]$ and $h_{\pi_i}[i,k]$ (in Step 2(b)) can be computed from the sets $P^{i-1}_k$ and $Q^{i-1}_k$ in $|\pi_{i-1}[i-1,k]| \, \lceil \log N \rceil$ bit operations, and since $\sum_k |\pi_{i-1}[i-1,k]| = N$, it follows that the i-th iteration of Step 2(b) requires $O(N \log N)$ bit operations. Since Steps 2(a) and 2(b) are performed $p$ times, the Lemma holds. $\square$

Lemma 7: For a given $j$, $1 \le j \le N$, the total execution of Step 5 of Algorithm 3.1 requires $O(N)$ bit operations.

Proof: The i-th execution of Step 5(a) (for a fixed $j$) requires at most $2^{i-1} - 1$ bit operations (see Theorem 1); Step 5(b) requires $2^{i-1}$ bit operations, one for each $\phi^{i-1}_k(j)$ (see Lemma 4). Since Steps 5(a) and 5(b) are executed for $i \in \{0, \ldots, p\}$,

$$\sum_{i=0}^{p} \left( 2^{i-1} - 1 + 2^{i-1} \right) = 2^{p+1} - p - 2 = O(N)$$

bit operations are required in total, since $2^p \le N$. $\square$
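The operation count in the proof of Lemma 7 is a geometric sum and can be checked directly. The sketch below adds the two contributions for $i = 1, \ldots, p$ (the $i = 0$ term is zero) and compares the total with the closed form $2^{p+1} - p - 2$; the function name `step5_ops` is hypothetical.

```python
def step5_ops(p):
    """OR operations in Step 5 for one column j, summed over i = 1..p:
    2^(i-1) - 1 for Step 5(a) plus 2^(i-1) for Step 5(b)."""
    return sum((2 ** (i - 1) - 1) + 2 ** (i - 1) for i in range(1, p + 1))

for p in range(1, 11):
    assert step5_ops(p) == 2 ** (p + 1) - p - 2
    assert step5_ops(p) < 2 * 2 ** p        # below 2N whenever 2^p <= N

print(step5_ops(3))   # 11
```

With $p \le \lfloor \log N \rfloor$ the total stays below $2N$, which is the $O(N)$ bound of the lemma.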

Theorem 3: Algorithm 3.1 computes the product $C = A \times B$ using at most $O(N^2)$ bit operations.

Proof: By Lemmas 6 and 7, and by observing that each execution of Step 4 requires $O(N)$ bit operations (see Lemma 3), and that Steps 4 and 5 are executed $N$ times while Step 2 is executed only once. $\square$

Algorithm 3.1 can be implemented so as to employ only $O(N \log N)$ bits of additional storage. The basic ideas of this implementation are the following. An i-canonical bijection $\pi_i$ can obviously be stored in an integer array of $N$ elements, where the j-th entry contains $\pi_i(j)$. Since $\pi_{i+1}(j)$ is constructed only afterwards, the storage area for $\pi_i(j)$ can be reused for $\pi_{i+1}(j)$. Since $P^i_k \cup Q^i_k \subseteq \{1, \ldots, N\}$ and $P^i_k \cap Q^i_k = \emptyset$, each of the auxiliary sets $P^i_k$ and $Q^i_k$ can obviously be stored in an integer array of $N$ elements; furthermore, the same storage area can be used to store $P^i_{k+1}$ and $Q^i_{k+1}$ once $\psi^i_k$ has been computed. Since the mapping $\psi^i_k$ is only a "fragment" of $\pi_{i+1}$ (see Step 2 of Construction 2.1), it is implicitly contained in $\pi_{i+1}$. To store all values $l_{\pi_i}[i,k]$ and $h_{\pi_i}[i,k]$, two integer arrays of size $2N$ each suffice (recall, there are $2^{p+1} - 1 \le 2N - 1$ possible $l$'s, and the same number of $h$'s). Since all these integers are in $\{1, \ldots, N\}$, $O(N \log N)$ bits of additional storage in total suffice to implement Steps 1 and 2 of Algorithm 3.1. For a fixed $j$, $1 \le j \le N$, all the $2^{p+1} - 1$ values $\phi^i_k(j)$ can be stored in a Boolean array of $2N$ elements (recall, $2^{p+1} - 1 \le 2N - 1$); the same array can obviously be employed for successive $j$'s. Hence, $O(N)$ bits of additional storage in total are sufficient to implement the remaining Steps 3-6 of Algorithm 3.1. Therefore:

Theorem 4: Algorithm 3.1 can be implemented so as to use at most $O(N \log N)$ bits of additional space.
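To give a feeling for the sizes involved, the arrays listed above can be tallied for a concrete N. The sketch below is an illustration under stated assumptions, not a figure from the paper: every integer is given a fixed-width word wide enough for the values 0, ..., N (0 being the marker for an empty set), and the Four Russians' requirement is taken as exactly $N^3/\log N$ bits with the hidden constant set to 1. The function names are hypothetical.

```python
def additional_bits(n):
    """Itemised additional storage, in bits, for the arrays named in the text."""
    word = n.bit_length()            # bits needed for an integer in 0..n
    items = {
        "pi": n * word,              # one array of N integers, reused across levels
        "P and Q": 2 * n * word,     # two arrays of N integers
        "l and h": 2 * 2 * n * word, # two arrays of 2N integers
        "phi": 2 * n,                # one Boolean array of 2N entries
    }
    items["total"] = sum(items.values())
    return items

def four_russians_bits(n):
    """N^3 / log N with constant 1 (an assumption made only for comparison)."""
    return n ** 3 // max(1, (n - 1).bit_length())

n = 1024
print(additional_bits(n)["total"])   # 80896
print(four_russians_bits(n))         # 107374182
```

Under these assumptions the tally is $7N$ words plus $2N$ bits, so the gap to the $N^3/\log N$ figure widens roughly quadratically with N.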

4. An Algorithm for (N, N, N)


Consider now the product of two $N \times N$ Boolean matrices, $A$ and $B$. Let $m = \lfloor \log N \rfloor$ and $k = \lceil N/m \rceil$. Partition matrix $A$ into matrices $A_1, A_2, \ldots, A_k$, where matrix $A_i$ ($1 \le i < k$) consists of rows $m(i-1)+1, \ldots, mi$ of $A$, and $A_k$ consists of the remaining rows. The product $C_i = A_i \times B$ ($1 \le i < k$) is an $m \times N$ matrix which consists of the rows $m(i-1)+1, \ldots, mi$ of the product matrix $C = A \times B$; $C_k = A_k \times B$ consists of the remaining rows. Therefore, to compute $C$, it suffices to compute the products $A_i \times B$ ($1 \le i \le k$) using the matrix multiplication algorithm proposed in Section 3. Since $m = \lfloor \log N \rfloor$, the computation of each product will take $O(N^2)$ bit operations; hence $O(N^3/\log N)$ bit operations will be performed in total. The additional storage required when multiplying $A_i \times B$ can obviously be employed in the computation of $A_{i+1} \times B$; hence, by Theorem 4, $O(N \log N)$ bits of additional storage suffice to compute the product matrix $C$.
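The partition into strips can be sketched as follows. To keep the example short and self-contained, the product of one strip with B is computed by a naive stand-in, `strip_product`, which is an assumption of this sketch and takes the place of Algorithm 3.1; only the slicing of A into blocks of $m = \lfloor \log N \rfloor$ rows is the point here.

```python
def strip_product(strip, B):
    """Stand-in for Algorithm 3.1: naive Boolean product of one strip of A with B."""
    n = len(B)
    return [[int(any(row[l] and B[l][j] for l in range(n))) for j in range(n)]
            for row in strip]

def boolean_product(A, B):
    """Multiply two N x N Boolean matrices strip by strip."""
    n = len(A)
    m = max(1, n.bit_length() - 1)       # m = floor(log2 N); max(1, .) guards N = 1
    C = []
    for start in range(0, n, m):         # ceil(N / m) strips; the last may be shorter
        C.extend(strip_product(A[start:start + m], B))
    return C

A = [[1, 0, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, 1],
     [0, 0, 0, 0, 0],
     [1, 1, 0, 0, 0]]
B = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [1, 0, 0, 0, 0]]
print(boolean_product(A, B)[0])   # [0, 1, 0, 0, 1]
```

Because the strips are processed one after another, whatever working storage a strip multiplier needs can be reused from one strip to the next, which is the observation behind the storage bound.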

5. Conclusions
A new algorithm for computing the product of two arbitrary $N \times N$ Boolean matrices has been presented. It has been shown that the proposed algorithm requires $O(N^3/\log N)$ bit operations (thus achieving the Four Russians' bound) but only $O(N \log N)$ bits of additional storage. It should be pointed out that, unlike the Four Russians' Method, the proposed algorithm cannot be directly executed on a vector computer.

Acknowledgements
The authors wish to thank Mike Atkinson for the helpful discussions.

References
[1] Arlazarof, V. L., Dinic, E. A., Kronrod, M. A., Faradzev, I. A.: On economical construction of the transitive closure of a directed graph. Dokl. Akad. Nauk SSSR 194, 487-488 (1970).
[2] Santoro, N.: Four $O(N^2)$ multiplication methods for sparse and dense Boolean matrices. In: Proc. 10th Conf. Numerical Mathematics and Computing, pp. 241-253. Winnipeg, Manitoba, 1980.
[3] Vyskoč, J.: A note on Boolean matrix multiplication. Inf. Proc. Letters 19, 249-251 (1984).

Dr. Nicola Santoro, School of Computer Science, Carleton University, Ottawa, K1S 5B6, Canada
Dr. Jorge Urrutia, Computer Science Department, University of Ottawa, Ottawa, K1N 5B4, Canada

