
The Linear Algebra Behind Google

by
Sindhujarani V. (Roll No. 10212338)


May 1, 2012


Outline of the Talk

Basic working of Google.

The PageRank algorithm.

Solving the PageRank problem as an eigensystem.

Solving the PageRank problem as a linear system.


Introduction

The Internet can be seen as a large directed graph, in which the Web pages are the vertices and the links between them are the edges.

The PageRank algorithm ranks pages by their back links (incoming edges).

Vertices with more back links are considered more important.


Example: Figure 1

[Figure 1: a four-page web. Page 1 links to pages 2, 3, and 4; page 2 links to pages 3 and 4; page 3 links to page 1; page 4 links to pages 1 and 3.]

Counting votes, we have x1 = 2, x2 = 1, x3 = 3, and x4 = 2, so page 3 is the most important, pages 1 and 4 are tied for second, and page 2 is the least important.

Drawback: Not all votes are equally important. A vote from a page of low importance should count for less than a vote from a more important page.

To account for this, each vote's weight is divided by the number of different links the voting page casts.


Matrix Model

The new format can be written as a matrix of the form

    A_ij = 1/N_j if P_j links to P_i, and A_ij = 0 otherwise,    (1)

where N_j is the number of out links from page P_j.

Recursive form: The recursive form per page is defined as

    r_i = sum over j in L_i of r_j / N_j,    (2)

where r_i is the PageRank of page P_i, N_j is the number of out links from page P_j, and L_i is the set of pages that link to page P_i.

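To make equation (1) concrete, here is a minimal Python sketch (added for illustration; the out-link dictionary mirrors Figure 1 and the variable names are my own):

```python
import numpy as np

# Out links for the four-page web of Figure 1 (pages numbered from 1).
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
n = len(out_links)

# A[i, j] = 1/N_j if page j+1 links to page i+1, else 0  (equation (1)).
A = np.zeros((n, n))
for j, targets in out_links.items():
    for i in targets:
        A[i - 1, j - 1] = 1.0 / len(targets)

print(A.sum(axis=0))  # each column sums to 1: the matrix is column stochastic
```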

Let us apply this approach to Figure 1. For page 1 the recursive form is x1 = x3/1 + x4/2; for page 2, x2 = x1/3; for page 3, x3 = x1/3 + x2/2 + x4/2; and for page 4, x4 = x1/3 + x2/2. Now these linear equations can be written as Ax = x, where x = [x1, x2, x3, x4]^T and, in matrix form,

        ( 0    0    1    1/2 )
    A = ( 1/3  0    0    0   )
        ( 1/3  1/2  0    1/2 )
        ( 1/3  1/2  0    0   ),

which transforms the web ranking problem into the standard problem of finding an eigenvector of a square matrix. In this case we obtain x1 ≈ 0.387, x2 ≈ 0.129, x3 ≈ 0.290, and x4 ≈ 0.194, so page 1 gets rank 1, page 3 gets rank 2, page 4 gets rank 3, and page 2 gets rank 4.

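As an added check, the same vector can be computed with numpy's eigensolver:

```python
import numpy as np

A = np.array([[0,   0,   1, 1/2],
              [1/3, 0,   0, 0  ],
              [1/3, 1/2, 0, 1/2],
              [1/3, 1/2, 0, 0  ]])

# Pick the eigenvector for the eigenvalue closest to 1 and normalize its sum to 1.
vals, vecs = np.linalg.eig(A)
k = np.argmin(np.abs(vals - 1.0))
x = np.real(vecs[:, k])
x = x / x.sum()
print(np.round(x, 3))   # [0.387 0.129 0.29  0.194]
```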

Speciality of the matrix A

Definition: A square matrix is called a column stochastic matrix if all of its entries are nonnegative and the entries in each column sum to 1.

A is a column stochastic matrix.

A has 1 as an eigenvalue.

The row vector of ones is a left eigenvector of A for the eigenvalue 1.
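A one-line justification (added): let e = (1, 1, ..., 1) be the row vector of ones. That each column of A sums to 1 says exactly that eA = e, i.e. A^T e^T = e^T, so 1 is an eigenvalue of A^T; since A and A^T have the same eigenvalues, 1 is an eigenvalue of A.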


Difficulties that arise when using formula (2):

Getting stuck at a page (dangling nodes).

Nonunique rankings.

Getting stuck in a subgraph.


Stuck at a page

Definition: A node that has no out links is called a dangling node.

If the graph has a dangling node, then the link matrix has a column of zeros for that node, so we do not get a column stochastic matrix. To modify the link matrix into a column stochastic matrix, replace all zeros in each zero column with 1/n, where n is the dimension of the matrix. The modified matrix is

    Ā = A + (1/n) e^T d,    (3)

where e is a row vector of ones and d is a row vector defined as

    d_j = 1 if N_j = 0, and d_j = 0 otherwise.    (4)
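An added Python sketch of equation (3); it assumes A is the raw link matrix of equation (1):

```python
import numpy as np

def fix_dangling(A):
    """Return A_bar = A + (1/n) e^T d: replace each zero column with 1/n entries."""
    n = A.shape[0]
    d = (A.sum(axis=0) == 0).astype(float)   # d_j = 1 iff column j is all zeros
    e = np.ones(n)
    return A + np.outer(e, d) / n            # (1/n) e^T d as an outer product

# Tiny example: page 1 is dangling in a 2-page web where page 2 links to page 1.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(fix_dangling(A))   # column 1 becomes [0.5, 0.5]; column 2 is unchanged
```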

Example: Figure 2

[Figure 2: a six-page web in which page 1 is a dangling node (no out links).]

For our Figure 2 we have d = [1, 0, 0, 0, 0, 0], since page 1 is the only dangling node. Thus

    Ā = A + (1/6) e^T d,

which leaves every other column of A unchanged and replaces the zero column for page 1 with a column whose every entry is 1/6. With the creation of the matrix Ā, we have a column stochastic matrix.


Nonunique rankings

For our rankings it is desirable that the dimension of V1(A), the eigenspace corresponding to the eigenvalue 1, equals 1, so that there is a unique nonnegative eigenvector x with sum_i x_i = 1 that can be used for page ranking.

This is not always true in general.


Example: Figure 3

[Figure 3: a five-page web made of two disconnected subwebs: pages 1 and 2 link to each other, pages 3 and 4 link to each other, and page 5 links to pages 3 and 4.]

The link matrix of Figure 3 is

        ( 0  1  0  0  0   )
        ( 1  0  0  0  0   )
    A = ( 0  0  0  1  1/2 )
        ( 0  0  1  0  1/2 )
        ( 0  0  0  0  0   ).

We find here that V1(A) is two-dimensional. One possible pair of basis vectors is x = [1/2, 1/2, 0, 0, 0]^T and y = [0, 0, 1/2, 1/2, 0]^T.

Any linear combination of these two vectors yields another vector in V1(A), so we face a problem in the ranking: it is not unique.


Overcoming dim(V1(A)) > 1

To solve this problem we modify equation (2). The analysis that follows is basically a special case of the Perron-Frobenius theorem.

Perron-Frobenius theorem: Let B be an n x n matrix with nonnegative real entries. Then:

1. B has a nonnegative real eigenvalue. The largest such eigenvalue, ρ(B), dominates the absolute values of all other eigenvalues of B. The domination is strict if the entries of B are strictly positive.
2. If B has strictly positive entries, then ρ(B) is a simple positive eigenvalue, and the corresponding eigenvector can be normalized to have strictly positive entries.
3. If B has an eigenvector v with strictly positive entries, then the corresponding eigenvalue is ρ(B).


A Modification of the Link Matrix A

For an n-page web with no dangling nodes, we replace the matrix A with the matrix

    Â = αA + (1 − α)S.    (5)

For an n-page web with dangling nodes, we replace the matrix Ā with the matrix

    Â = αĀ + (1 − α)S,    (6)

where 0 ≤ α ≤ 1 is called a damping factor and S denotes the n x n matrix with all entries 1/n. The matrix S is column stochastic, and V1(S) is one-dimensional.
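As an added sketch (the α value is an arbitrary choice of mine), the following code forms Â = αA + (1 − α)S for the Figure 3 matrix and shows that damping yields a single well-defined ranking vector:

```python
import numpy as np

A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 1, 0.5],
              [0, 0, 1, 0, 0.5],
              [0, 0, 0, 0, 0]], dtype=float)
n = A.shape[0]
alpha = 0.85                         # damping factor (illustrative value)

S = np.ones((n, n)) / n              # all entries 1/n
A_hat = alpha * A + (1 - alpha) * S  # equation (5)

vals, vecs = np.linalg.eig(A_hat)
k = np.argmin(np.abs(vals - 1.0))
x = np.real(vecs[:, k])
x = x / x.sum()
print(np.round(x, 3))                # one unique, strictly positive ranking vector
```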


Speciality of the matrix Â

1. All the entries Â_ij satisfy 0 ≤ Â_ij ≤ 1.
2. Each of the columns sums to one: sum_i Â_ij = 1 for all j.
3. If α = 1, then Â = A and we have the original problem.
4. If α = 0, then Â = S.


Random walker

A random walker starts from a random page and then follows one of the out links of that page, chosen at random.

The PageRank of a specific page can now be viewed as the asymptotic probability that the walker is present at that page.

This makes sense, as the walker is more likely to wander to pages with many votes (lots of in links), giving him a large probability of ending up at such pages.
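A small added simulation (the seed and step count are arbitrary choices) illustrates this for the Figure 1 web, which is strongly connected, so the visit frequencies approach the eigenvector computed earlier:

```python
import random
from collections import Counter

# Figure 1 web again: the walker follows a random out link at each step.
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

random.seed(0)
page = 1
visits = Counter()
steps = 100_000
for _ in range(steps):
    page = random.choice(out_links[page])
    visits[page] += 1

for p in sorted(visits):
    print(p, round(visits[p] / steps, 3))  # approaches 0.387, 0.129, 0.290, 0.194
```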


Stuck in a subgraph

There is still one possible pitfall in the ranking: the walker may wander into a subsection of the complete graph that does not link to any outside pages. The link matrix of such a model is a reducible matrix. We therefore want the matrix to be irreducible, which makes sure the walker cannot get stuck in a subgraph. Irreducibility is obtained by teleportation, which means the ability to jump with a small probability from any page in the link structure to any other page. Mathematically this can be described, for a web with no dangling nodes, as

    Â = αA + (1 − α)(1/n) e^T e,    (7)

and for a web with dangling nodes as

    Â = αĀ + (1 − α)(1/n) e^T e,    (8)

where e is a row vector of ones and α is the damping factor. Note that (1/n) e^T e = S, so (7) and (8) agree with (5) and (6).

Example: Figure 4

[Figure 4: the same six-page web as Figure 2, with page 1 a dangling node.]

The link matrix for Figure 4, using equation (8), is

    Â = αĀ + (1 − α)(1/n) e^T e,

with α = 0.9 and n = 6, so the teleportation term adds (1 − α)/n = 1/60 to every entry. Each entry of Ā is damped and shifted: an entry 0 becomes 1/60, an entry 1/3 becomes 0.9 · 1/3 + 1/60 = 19/60, an entry 1/2 becomes 7/15, an entry 1 becomes 11/12, and the entries 1/6 in the dangling column remain 1/6. The matrix Â is column stochastic. Adding (1 − α)(1/n) e^T e gives an equal chance of jumping to all pages.


Analysis of the matrix Â

Definition: A matrix is positive if all of its entries satisfy A_ij > 0.

If Â is positive and column stochastic, then any eigenvector in V1(Â) has all positive or all negative components.

If Â is positive and column stochastic, then V1(Â) has dimension 1.


Solution Methods for Solving the PageRank Problem

Computing the PageRank is the same as finding the eigenvector corresponding to the largest eigenvalue of the matrix Â. To solve this we need iterative methods that work well for large sparse matrices. There are two ways to pose the PageRank problem:

1. As an eigensystem problem (the power method).
2. As a linear system problem (Jacobi method, Gauss-Seidel method, SOR method, etc.).


The power method

The power method is a simple method for finding the largest eigenvalue and the corresponding eigenvector of a matrix. It can be used when Â has a dominant eigenvalue. The power method iterates applied to Â are

    x^(k) = Â x^(k−1) = αA x^(k−1) + α(1/n) e^T d x^(k−1) + (1 − α)(1/n) e^T e x^(k−1),

where x^(k−1) is a probability vector, and thus e x^(k−1) = 1.
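A minimal power-method sketch (added for illustration; the tolerance and starting vector are my choices). It uses a dense Â for brevity, whereas a real implementation would only use sparse products with A:

```python
import numpy as np

def power_method(A_hat, tol=1e-10, max_iter=1000):
    """Iterate x <- A_hat x from the uniform vector until the 1-norm change is small."""
    n = A_hat.shape[0]
    x = np.ones(n) / n                 # a probability vector, ||x||_1 = 1
    for _ in range(max_iter):
        x_new = A_hat @ x              # column stochastic: preserves ||x||_1 = 1
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x
```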


Convergence of the power method

Rescale the power method at each iteration: x^(k) = Â x^(k−1) / ||Â x^(k−1)||, where || · || can be any vector norm.

Every positive column stochastic matrix Â has a unique vector x with positive components such that Âx = x and ||x||_1 = 1. The vector x can be computed as x = lim_{k→∞} Â^k x_0 for any initial guess x_0 with positive components such that ||x_0||_1 = 1.

The rate of convergence of the power method is linear, with convergence factor |λ2/λ1|.


Linear system problem

We begin by formulating the PageRank problem as a linear system. The eigensystem problem Âx = αĀx + (1 − α)(1/n) e^T e x = x can be rewritten, using e x = 1, as

    (I − αĀ)x = (1 − α)(1/n) e^T =: b.    (9)

We split the matrix (I − αĀ) as

    (I − αĀ) = L + D + U,    (10)

where D is a diagonal matrix, and L and U are strictly lower triangular and strictly upper triangular, respectively.
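To make equation (9) concrete, here is an added sketch that forms b and solves the system directly with a dense solver (A_bar stands for the dangling-fixed matrix Ā). For Web-scale matrices only the iterative methods of the next slides are feasible; the direct solve is just a reference:

```python
import numpy as np

def pagerank_linear_system(A_bar, alpha=0.85):
    """Solve (I - alpha*A_bar) x = (1 - alpha)/n * e^T directly (equation (9))."""
    n = A_bar.shape[0]
    b = (1 - alpha) / n * np.ones(n)
    x = np.linalg.solve(np.eye(n) - alpha * A_bar, b)
    return x / x.sum()   # normalize to a probability vector
```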


Properties of (I − αĀ)

1. (I − αĀ) is an M-matrix.
2. (I − αĀ) is nonsingular.
3. The row sums of (I − αĀ) are 1 − α.
4. ||I − αĀ||_∞ = 1 + α, provided at least one nondangling node exists.
5. Since (I − αĀ) is an M-matrix, (I − αĀ)^{−1} ≥ 0.
6. The row sums of (I − αĀ)^{−1} are 1/(1 − α). Therefore ||(I − αĀ)^{−1}||_∞ = 1/(1 − α).
7. Thus, the condition number κ_∞(I − αĀ) = (1 + α)/(1 − α).

Definition: A real matrix A with A_ij ≤ 0 for i ≠ j and A_ii ≥ 0 can be expressed as A = sI − B, where s > 0 and B ≥ 0. When s ≥ ρ(B), A is called an M-matrix. An M-matrix can be either singular or nonsingular.


Jacobi method

The Jacobi method can be applied to the Google system (10):

    (L + D + U)x = b
    D x^(k) = b − (L + U) x^(k−1)
    x^(k) = D^{−1} [b − (L + U) x^(k−1)],

where D is an invertible matrix.
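An added Python sketch of this iteration (M stands for the coefficient matrix I − αĀ of equation (9)):

```python
import numpy as np

def jacobi(M, b, tol=1e-10, max_iter=1000):
    """Jacobi iteration x^(k) = D^{-1} (b - (L+U) x^(k-1)) for M x = b."""
    diag = np.diag(M)             # diagonal of D
    LU = M - np.diag(diag)        # strict lower + strict upper part
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_new = (b - LU @ x) / diag
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x
```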


Gauss-Seidel method

The Gauss-Seidel method can be applied to the Google system (10):

    (L + D + U)x = b
    (L + D) x^(k) = b − U x^(k−1)
    x^(k) = (L + D)^{−1} [b − U x^(k−1)],

where (L + D) is an invertible matrix. The Gauss-Seidel method converges much faster than the power and Jacobi methods. The disadvantage is that it is very hard to parallelize.
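A corresponding Gauss-Seidel sketch (added), using a triangular solve for (L + D):

```python
import numpy as np
from scipy.linalg import solve_triangular

def gauss_seidel(M, b, tol=1e-10, max_iter=1000):
    """Gauss-Seidel iteration x^(k) = (L+D)^{-1} (b - U x^(k-1)) for M x = b."""
    LD = np.tril(M)               # L + D: lower triangle including the diagonal
    U = M - LD                    # strict upper part
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_new = solve_triangular(LD, b - U @ x, lower=True)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x
```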


SOR method

The SOR method can be applied to the Google system (10):

    (L + D + U)x = b
    (ωL + D) x^(k) = ω(b − U x^(k−1)) + (1 − ω) D x^(k−1)
    x^(k) = (ωL + D)^{−1} [ω(b − U x^(k−1)) + (1 − ω) D x^(k−1)],

where 1 ≤ ω ≤ 2. When ω = 1, this method reduces to Gauss-Seidel. Here (ωL + D) is an invertible matrix.

The SOR method is more expensive per iteration and less efficient in parallel computing for huge matrix systems.
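An added SOR sketch (ω is the relaxation parameter; the value used in any real run would be tuned):

```python
import numpy as np

def sor(M, b, omega=1.2, tol=1e-10, max_iter=1000):
    """SOR: x^(k) = (omega*L + D)^{-1} [omega*(b - U x^(k-1)) + (1-omega) D x^(k-1)]."""
    D = np.diag(np.diag(M))
    L = np.tril(M, k=-1)          # strict lower part
    U = np.triu(M, k=1)           # strict upper part
    x = np.zeros_like(b)
    for _ in range(max_iter):
        rhs = omega * (b - U @ x) + (1 - omega) * (D @ x)
        x_new = np.linalg.solve(omega * L + D, rhs)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x
```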


Plots of the number of iterations required for convergence by the different methods
[Four convergence plots, one per method: Jacobi method, Gauss-Seidel method, SOR method, and power method.]


Conclusion

We discussed the mathematical ideas used in the Google search engine.

We investigated various problems that arise when computing the Google matrix (link matrix).

Taking an example of a large matrix representation of the Internet, we computed the PageRank using different methods.


Bibliography

Erik Andersson and Per-Anders Ekström. Investigating Google's PageRank Algorithm. 2004.

Pavel Berkhin. A Survey on PageRank Computing. Internet Mathematics, 2(1), 2005.

Kurt Bryan and Tanya Leise. The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google. SIAM Review, 48(3), 2006.

Amy N. Langville and Carl D. Meyer. Deeper Inside PageRank. Internet Mathematics, 1(3), 2004.

