
ECS289: Scalable Machine Learning

Big-O Notations

Cho-Jui Hsieh
UC Davis

Oct 20, 2015


Outline

Time complexity and Big-O notations


Time complexity for basic linear algebra operators
Time Complexity

From Wikipedia:
The time complexity of an algorithm quantifies the amount of time taken
by an algorithm to run as a function of the length of the input
The time complexity of an algorithm is commonly expressed using big O
notation, which excludes coefficients and lower order terms.
Although time complexity is a good indication of efficiency, in practical
numerical computation sometimes constants are important:
For example, time for running 1 billion operations:
exp: 30.19 secs
*: 1.84 secs
/: 7.31 secs
+: 1.77 secs
In this course we will ignore these constants
Big-O

Definition of O(·):
Let f and g be two functions. We write

f(x) = O(g(x)) as x → ∞

if and only if there exist a positive constant M and an x0 such that

|f(x)| ≤ M |g(x)| for all x ≥ x0

In short, f(x) = O(g(x)) means f is upper bounded by g up to a
constant factor.
How to show the time complexity is O(g(x))?
Show there exists an implementation of the algorithm that requires at
most C·g(x) operations for some constant C.
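As a quick sanity check of the definition, consider the hypothetical pair f(x) = 3x² + 5x and g(x) = x². The sketch below verifies numerically that |f(x)| ≤ M|g(x)| for the witnesses M = 4 and x0 = 5 (one valid choice of constants among many, since 3x² + 5x ≤ 4x² whenever x ≥ 5):

```python
# Numerically verify the Big-O definition for f(x) = 3x^2 + 5x, g(x) = x^2.
# Claim: f(x) = O(g(x)), witnessed by M = 4 and x0 = 5.

def f(x):
    return 3 * x**2 + 5 * x

def g(x):
    return x**2

M, x0 = 4, 5

# Check |f(x)| <= M * |g(x)| for a range of x >= x0.
ok = all(abs(f(x)) <= M * abs(g(x)) for x in range(x0, 10_000))
print(ok)  # True: 3x^2 + 5x <= 4x^2 whenever x >= 5
```

A finite scan of course does not prove the bound for all x; here the algebraic argument (5x ≤ x² for x ≥ 5) is what makes the claim exact.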
Big-Omega

Definition of Ω(·):
Let f and g be two functions. We write

f(x) = Ω(g(x)) as x → ∞

if and only if there exist a positive constant m and an x0 such that

|f(x)| ≥ m |g(x)| for all x ≥ x0

In short, f(x) = Ω(g(x)) means f is lower bounded by g up to a
constant factor.
How to show the time complexity is Ω(g(x))?
Prove that any implementation requires at least C·g(x) operations for
some constant C.
Big-Theta

Definition of Θ(·):
Let f and g be two functions. We write

f(x) = Θ(g(x)) as x → ∞

if and only if there exist positive constants m, M and an x0 such that

m |g(x)| ≤ |f(x)| ≤ M |g(x)| for all x ≥ x0

In short, f(x) = Θ(g(x)) means f has the same order as g up to
constant factors.
How to show the time complexity is Θ(g(x))?
Show both the Big-O and the Big-Omega bounds.
Count number of operations

Count the total number of operations (+, −, ×, /, exp, log, if, …)
We only need the order of the operation count, which we then express
in big-O notation.
Dense vector and sparse vector

If x, y ∈ Rᵐ are both dense:
x + y, x − y, xᵀy: O(m) operations
If x, y ∈ Rᵐ, x is dense and y is sparse:
x + y, x − y, xᵀy: O(nnz(y)) operations
If x, y ∈ Rᵐ are both sparse:
x + y, x − y, xᵀy: O(nnz(x) + nnz(y)) operations
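The dense vs. sparse inner products above can be sketched in pure Python; here a sparse vector is represented as a dict mapping index → nonzero value (one common convention, chosen for brevity — CSR-style parallel arrays would work equally well):

```python
# Dot products for dense and sparse vectors (pure-Python sketch).
# Sparse vector convention here: dict {index: nonzero value}.

def dot_dense(x, y):
    # O(m): touches every coordinate of both vectors.
    return sum(xi * yi for xi, yi in zip(x, y))

def dot_dense_sparse(x, y_sparse):
    # O(nnz(y)): iterates only over the nonzeros of y.
    return sum(v * x[i] for i, v in y_sparse.items())

x = [1.0, 2.0, 0.0, 4.0]
y = {0: 3.0, 3: 0.5}          # sparse: nnz(y) = 2

print(dot_dense(x, [3.0, 0.0, 0.0, 0.5]))  # 5.0
print(dot_dense_sparse(x, y))              # 5.0
```

Both calls compute the same value; the sparse version simply skips the coordinates where y is zero.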
Dense Matrix vs Sparse Matrix

A matrix X ∈ Rᵐˣⁿ can be stored as dense or sparse:
Dense matrix: most entries of X are nonzero (mn space)
Sparse matrix: only a few entries of X are nonzero (O(nnz) space)
Dense Matrix Operations

Let A ∈ Rᵐˣⁿ, B ∈ Rᵐˣⁿ, s ∈ R:
A + B, sA, Aᵀ: O(mn) operations
Let A ∈ Rᵐˣⁿ, b ∈ Rⁿˣ¹:
Ab: O(mn) operations
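The O(mn) matrix-vector product is just one multiply-add per entry of A; a minimal pure-Python sketch, with A stored as a list of rows:

```python
# Dense matrix-vector product Ab: O(mn) multiply-adds.

def matvec(A, b):
    # A: list of m rows, each a list of length n; b: list of length n.
    # Each of the m output entries is an O(n) inner product.
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in A]

A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]          # 3 x 2 matrix
b = [1.0, -1.0]

print(matvec(A, b))  # [-1.0, -1.0, -1.0]
```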
Dense Matrix Operations

Matrix-matrix multiplication: let A ∈ Rᵐˣᵏ, B ∈ Rᵏˣⁿ. What is the
time complexity of computing AB?
Dense Matrix Operations

Assume A, B ∈ Rⁿˣⁿ. What is the time complexity of computing AB?
Naive implementation: O(n³)
Theoretical best: O(n^{2.xxx}) (but slower than the naive
implementation in practice)
Best way in practice: use BLAS (Basic Linear Algebra Subprograms)
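The naive O(n³) bound comes directly from the triple loop: one multiply-add in the innermost body, executed n·n·n times. A minimal sketch:

```python
# Naive O(n^3) matrix-matrix product: three nested loops over i, j, k,
# with a single multiply-add in the innermost body.

def matmul(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

An optimized BLAS GEMM performs the same O(n³) arithmetic but reorders it into cache-friendly blocks, which is why it is far faster in practice despite the identical operation count.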
Dense Matrix Operations

BLAS matrix product: O(mnk) operations for computing AB, where
A ∈ Rᵐˣᵏ, B ∈ Rᵏˣⁿ
Computes the product block by block to minimize the cache miss rate
Can be called from C and Fortran; used by MATLAB, R, Python, …
Sparse Matrix Operations

Widely-used formats: Compressed Sparse Column (CSC), Compressed
Sparse Row (CSR), …
CSR: three arrays for storing an m × n matrix with nnz nonzeroes
1. val (nnz real numbers): the value of each nonzero element
2. col_ind (nnz integers): the column index corresponding to each value
3. row_ptr (m + 1 integers): the index in val where each row starts
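With the three CSR arrays just described, a matrix-vector product costs O(nnz): row_ptr delimits each row, and within a row we touch only the stored nonzeros. A pure-Python sketch using a small example matrix:

```python
# CSR matrix-vector product in O(nnz).
# Example 3 x 3 matrix:
#   [10  0  2]
#   [ 0  3  0]
#   [ 0  0  7]
val     = [10.0, 2.0, 3.0, 7.0]   # nonzero values, stored row by row
col_ind = [0, 2, 1, 2]            # column index of each value
row_ptr = [0, 2, 3, 4]            # row i occupies val[row_ptr[i]:row_ptr[i+1]]

def csr_matvec(val, col_ind, row_ptr, b):
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        # Only the nonzeros of row i are visited.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[p] * b[col_ind[p]]
    return y

print(csr_matvec(val, col_ind, row_ptr, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 7.0]
```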
Sparse Matrix Operations

If A ∈ Rᵐˣⁿ (sparse), B ∈ Rᵐˣⁿ (sparse or dense), s ∈ R:
A + B, sA, Aᵀ: O(nnz) operations
If A ∈ Rᵐˣⁿ (sparse), b ∈ Rⁿˣ¹:
Ab: O(nnz) operations
If A ∈ Rᵐˣᵏ (sparse), B ∈ Rᵏˣⁿ (dense):
AB: O(nnz(A) · n) operations (use sparse BLAS)
If A ∈ Rᵐˣᵏ (sparse), B ∈ Rᵏˣⁿ (sparse):
AB: O(nnz(A) · nnz(B)/k) operations on average
AB: O(nnz(A) · n) operations in the worst case
The resulting matrix will typically be much denser
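A sparse-sparse product can be sketched with each matrix stored as a list of sparse rows (dict {column: value} — an illustrative convention, not a standard format). Each nonzero A[i][k] scans the nonzeros of row k of B, which is exactly where the nnz(A)·nnz(B)/k average-case count comes from when nonzeros are spread evenly across the k rows of B:

```python
# Sparse-sparse product AB, each matrix as a list of sparse rows
# (dict {col: value}). Each nonzero A[i][k] touches only the nonzeros
# of row k of B; accumulated entries show how the result densifies.

def sparse_matmul(A_rows, B_rows):
    C_rows = []
    for a_row in A_rows:
        c_row = {}
        for k, a_val in a_row.items():
            for j, b_val in B_rows[k].items():
                c_row[j] = c_row.get(j, 0.0) + a_val * b_val
        C_rows.append(c_row)
    return C_rows

A_rows = [{0: 2.0}, {1: 3.0}]            # [[2, 0], [0, 3]]
B_rows = [{0: 1.0, 1: 4.0}, {1: 5.0}]    # [[1, 4], [0, 5]]
print(sparse_matmul(A_rows, B_rows))     # [{0: 2.0, 1: 8.0}, {1: 15.0}]
```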
