by
SINDHUJARANI.V Roll No. [10212338]
()
May 1, 2012
1 / 34
Introduction
The Internet can be seen as a large directed graph, where the Web pages themselves are the vertices and the links between them are the edges.
The PageRank algorithm ranks pages according to their back links (the edges pointing into each vertex).
Example: Figure 1
[Figure 1: four pages with links 1→2, 1→3, 1→4, 2→3, 2→4, 3→1, 4→1, and 4→3.]
Counting in-links as votes, we have x1 = 2, x2 = 1, x3 = 3, and x4 = 2. So page 3 is the most important, pages 1 and 4 are second in importance, and page 2 is the least important.
Drawback: Not all votes are equally important. A vote from a page of low importance should count for less than a vote from a more important page.
To account for this, each vote's importance is divided by the number of different votes the voting page casts (its number of out-links).
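As an illustration (the slides contain no code), a minimal Python sketch of raw and weighted vote counting for Figure 1; the out-link lists are reconstructed from the link matrix given later in the slides, and the variable names are my own.

```python
# Out-link lists for the four-page graph of Figure 1.
outlinks = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

votes = {p: 0 for p in outlinks}        # raw votes = number of in-links
for src, targets in outlinks.items():
    for t in targets:
        votes[t] += 1

weighted = {p: 0.0 for p in outlinks}   # each vote divided by the voter's out-degree
for src, targets in outlinks.items():
    for t in targets:
        weighted[t] += 1.0 / len(targets)

print(votes)  # page 3 collects the most raw votes
```

With the weighted count, page 1 overtakes page 3, because page 3 casts its single vote entirely for page 1.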
Matrix Model
The new format considers a matrix of the form

A_ij = 1/N_j if P_j links to P_i, 0 otherwise, (1)

where N_j is the number of out-links from page P_j.
Recursive form: The PageRank of each page is defined as

r_i = Σ_{j ∈ L_i} r_j / N_j, (2)

where r_i is the PageRank of page P_i, N_j is the number of out-links from page P_j, and L_i is the set of pages that link to page P_i.
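A small Python sketch of building the matrix of equation (1) from out-link lists; the function name and the four-page example are my own.

```python
def link_matrix(outlinks, n):
    # A[i][j] = 1/N_j if page j+1 links to page i+1, else 0 -- equation (1),
    # with pages numbered from 1 and matrix indices from 0.
    A = [[0.0] * n for _ in range(n)]
    for j, targets in outlinks.items():
        for i in targets:
            A[i - 1][j - 1] = 1.0 / len(targets)
    return A

# The graph of Figure 1: no page is dangling, so every column sums to 1.
A = link_matrix({1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}, 4)
```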
Let us apply this approach to Figure 1. For page 1 the recursive form is x1 = x3 + x4/2; for page 2, x2 = x1/3; for page 3, x3 = x1/3 + x2/2 + x4/2; and for page 4, x4 = x1/3 + x2/2. These linear equations can be written as Ax = x, where x = [x1, x2, x3, x4]^T and, in matrix form,

A =
[  0    0    1   1/2
  1/3   0    0    0
  1/3  1/2   0   1/2
  1/3  1/2   0    0  ],
which transforms the web ranking problem into the standard problem of finding an eigenvector of a square matrix. In this case we obtain x1 ≈ 0.387, x2 ≈ 0.129, x3 ≈ 0.290, and x4 ≈ 0.194, so page 1 gets rank 1, page 3 gets rank 2, page 4 gets rank 3, and page 2 gets rank 4.
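The eigenvector can be checked numerically by repeated multiplication; a minimal Python sketch (helper name mine), using the matrix above. The exact solution is x = [12/31, 4/31, 9/31, 6/31].

```python
def matvec(A, x):
    # y = A x for a dense matrix stored as a list of rows
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[0.0, 0.0, 1.0, 0.5],
     [1/3, 0.0, 0.0, 0.0],
     [1/3, 0.5, 0.0, 0.5],
     [1/3, 0.5, 0.0, 0.0]]

x = [0.25] * 4            # start from the uniform distribution
for _ in range(200):      # repeated multiplication converges to the
    x = matvec(A, x)      # eigenvector for eigenvalue 1

# x approaches [12/31, 4/31, 9/31, 6/31], i.e. roughly [0.387, 0.129, 0.290, 0.194]
```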
Definition: A square matrix is called a column stochastic matrix if all of its entries are nonnegative and the entries in each column sum to 1.
If A is column stochastic, then A has 1 as an eigenvalue: the all-ones row vector e satisfies eA = e, so e is a left eigenvector for the eigenvalue 1.
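A Python sketch of both facts (function names mine): checking column stochasticity, and verifying that the all-ones row vector is a left eigenvector, which is why 1 is always an eigenvalue of a column stochastic matrix.

```python
def is_column_stochastic(A, tol=1e-12):
    # Nonnegative entries, and every column summing to 1
    n = len(A)
    if any(A[i][j] < 0 for i in range(n) for j in range(n)):
        return False
    return all(abs(sum(A[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))

def left_action_of_ones(A):
    # e A for e the all-ones row vector: the vector of column sums,
    # which equals e exactly when A is column stochastic.
    n = len(A)
    return [sum(A[i][j] for i in range(n)) for j in range(n)]

A = [[0.0, 0.0, 1.0, 0.5],
     [1/3, 0.0, 0.0, 0.0],
     [1/3, 0.5, 0.0, 0.5],
     [1/3, 0.5, 0.0, 0.0]]
ok = is_column_stochastic(A)
row = left_action_of_ones(A)
```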
Stuck at a page.
Non-uniqueness of rankings.
Stuck in a subgraph.
Stuck at a page
Definition: A node that has no out-links is called a dangling node.
If the graph has a dangling node, then the link matrix has a column of zeros for that node, so the matrix is not column stochastic. To modify the link matrix into a column stochastic matrix, replace all zeros in each zero column with 1/n, where n is the dimension of the matrix:

Ā = A + (1/n) e^T d, (3)

where e is a row vector of ones, and d is a row vector defined as

d_j = 1 if N_j = 0, 0 otherwise. (4)
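The dangling-node fix can be sketched in Python as follows (function name and the three-page example are mine); it implements Ā = A + (1/n) e^T d by filling each zero column with entries 1/n.

```python
def fix_dangling(A):
    # Abar = A + (1/n) e^T d: every zero column of A (a dangling node)
    # is replaced by a column with all entries equal to 1/n.
    n = len(A)
    Abar = [row[:] for row in A]
    for j in range(n):
        if all(A[i][j] == 0 for i in range(n)):
            for i in range(n):
                Abar[i][j] = 1.0 / n
    return Abar

A = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]     # page 3 has no out-links (dangling)
Abar = fix_dangling(A)
# the third column of Abar is now [1/3, 1/3, 1/3]
```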
Example: Figure 2
[Figure 2: a six-page web graph in which page 6 has no out-links (a dangling node).]
Ā = A + (1/6) e^T d: the correction matrix (1/6) e^T d is zero except in the sixth column, so the dangling node's zero column in A is replaced by a column with every entry equal to 1/6, while the remaining columns of A (with entries 0, 1/2, and 1/3) are unchanged.
Non-uniqueness of rankings
For our rankings it is desirable that the dimension of V1(A) (the eigenspace corresponding to the eigenvalue 1) equals 1, so that there is a unique nonzero eigenvector x with Σ_i x_i = 1 which can be used for page ranking.
Example: Figure 3
[Figure 3: five pages; pages 1 and 2 link to each other, pages 3 and 4 link to each other, and page 5 links to both 3 and 4.]
A =
[ 0  1  0  0   0
  1  0  0  0   0
  0  0  0  1  1/2
  0  0  1  0  1/2
  0  0  0  0   0 ].
We find here that V1(A) is two-dimensional. One possible pair of basis vectors is x = [1/2, 1/2, 0, 0, 0]^T and y = [0, 0, 1/2, 1/2, 0]^T.
Any linear combination of these two vectors yields another vector in V1(A), so we face a problem in the ranking: there is no single distinguished ranking vector.
For an n-page web with no dangling nodes, we replace the matrix A with the matrix

Â = αA + (1 − α)S. (5)

For an n-page web with dangling nodes, we replace the matrix Ā from (3) with the matrix

Â = αĀ + (1 − α)S, (6)

where 0 ≤ α ≤ 1 is called a damping factor and S denotes the n × n matrix with all entries equal to 1/n. The matrix S is column stochastic, and V1(S) is one-dimensional.
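The damping step of equations (5)/(6) can be sketched in Python (function name and example mine): every entry of the column stochastic input is blended with the uniform matrix S.

```python
def damp(A, alpha=0.85):
    # Ahat = alpha*A + (1 - alpha)*S, where S is the n x n matrix
    # with all entries equal to 1/n.
    n = len(A)
    return [[alpha * A[i][j] + (1.0 - alpha) / n for j in range(n)]
            for i in range(n)]

A = [[0.0, 0.0, 1.0, 0.5],
     [1/3, 0.0, 0.0, 0.0],
     [1/3, 0.5, 0.0, 0.5],
     [1/3, 0.5, 0.0, 0.0]]
Ahat = damp(A)
# Ahat is strictly positive and still column stochastic
```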
1. All the entries of Â = αA + (1 − α)S satisfy 0 ≤ Â_ij ≤ 1.
2. Each of the columns sums to one: Σ_i Â_ij = 1 for all j.
3. If α = 1, then we have the original problem, Â = A.
4. If α = 0, then Â = S.
Random walker
The random walker starts from a random page, and then selects one of the out-links from that page at random.
The PageRank of a specific page can now be viewed as the asymptotic probability that the walker is present at that page.
This works because the walker is more likely to wander to pages with many votes (lots of in-links), giving it a high probability of ending up at such pages.
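A Python sketch of the random walker on the Figure 1 graph (function name, seed, and step count mine); since that graph has no dangling nodes and is strongly connected, no teleportation is needed, and the visit frequencies approach the PageRank vector [0.387, 0.129, 0.290, 0.194].

```python
import random

def surf(outlinks, steps, seed=1):
    # Simulate the random walker; visit frequencies approximate PageRank.
    rng = random.Random(seed)
    page = rng.choice(sorted(outlinks))      # start at a random page
    visits = {p: 0 for p in outlinks}
    for _ in range(steps):
        page = rng.choice(outlinks[page])    # follow a random out-link
        visits[page] += 1
    return visits

visits = surf({1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}, 100_000)
```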
Stuck in a subgraph
There is still one possible pitfall in the ranking: the walker may wander into a subsection of the complete graph that does not link to any outside pages. The link matrix for this model is a reducible matrix. We therefore want the matrix to be irreducible, which makes sure the walker cannot get stuck in a subgraph. Irreducibility is obtained by teleportation, which means the ability to jump with a small probability from any page in the link structure to any other page. This can be described mathematically, for a web with no dangling nodes, as

Â = αA + (1 − α)(1/n) e^T e. (7)

For a web with dangling nodes it is

Â = αĀ + (1 − α)(1/n) e^T e, (8)

where e is a row vector of ones, and α is a damping factor.
Example: Figure 4
[Figure 4: the same six-page graph as in Figure 2.]
Â = αĀ + (1 − α)(1/n) e^T e with α = 0.9 and n = 6: every zero entry of Ā becomes (1 − α)/n = 1/60, every entry 1/2 becomes 7/15, every entry 1/3 becomes 19/60, the entry 1 becomes 11/12, and the dangling-node entries 1/6 remain 1/6.
Here the matrix Â is a column stochastic matrix. Adding (1 − α)(1/n) e^T e gives an equal chance of jumping to every page.
Definition: A matrix A is positive if A_ij > 0 for all i and j.
If A is positive and column stochastic, then any eigenvector in V1(A) has all positive or all negative components.
If A is positive and column stochastic, then V1(A) has dimension 1.
The Power method is a simple method for finding the largest eigenvalue and a corresponding eigenvector of a matrix. It can be used when A has a dominant eigenvalue. Consider the iterates of the power method applied to Â:

Âx^(k−1) = αAx^(k−1) + (α/n) e^T d x^(k−1) + (1 − α)(1/n) e^T e x^(k−1) = x^(k),

where x^(k−1) is a probability vector, and thus e x^(k−1) = 1.
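A Python sketch of one power step in exactly this expanded form (function name and the three-page dangling example are mine): the dangling mass d·x and the teleportation term are added as scalars, so Â never needs to be formed explicitly.

```python
def pagerank_power(A, dangling, alpha=0.85, iters=100):
    # One step: x_k = alpha*A x + (alpha/n)*(d.x)*e^T + ((1-alpha)/n)*e^T,
    # using e x = 1 for a probability vector x.
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        dx = sum(x[j] for j in range(n) if dangling[j])   # d . x
        x = [alpha * sum(A[i][j] * x[j] for j in range(n))
             + alpha * dx / n + (1.0 - alpha) / n
             for i in range(n)]
    return x

# Chain 1 -> 2 -> 3 with page 3 dangling
A = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x = pagerank_power(A, dangling=[False, False, True])
# x stays a probability vector throughout the iteration
```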
In general, each iterate of the power method can be normalized as x^(k) = Âx^(k−1) / ‖Âx^(k−1)‖.
Every positive column stochastic matrix A has a unique vector x with positive components such that Ax = x with ‖x‖₁ = 1. The vector x can be computed as x = lim_{k→∞} A^k x_0 for any initial guess x_0 with positive components such that ‖x_0‖₁ = 1. The rate of convergence of the Power method is linear, with ratio |λ2/λ1|.
We begin by formulating the PageRank problem as a linear system. The eigensystem problem Âx = αAx + (1 − α)(1/n) e^T e x = x can be rewritten as

(I − αA)x = (1 − α)(1/n) e^T =: b. (9)

Let us split the matrix (I − αA) as

(I − αA) = L + D + U, (10)

where D is a diagonal matrix, and L and U are the strictly lower triangular and strictly upper triangular parts, respectively.
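A Python sketch of this setup (function names mine): building the coefficient matrix I − αA of the linear system and splitting it into L + D + U.

```python
def system_matrix(A, alpha=0.85):
    # M = I - alpha*A, the coefficient matrix of the linear system (9)
    n = len(A)
    return [[(1.0 if i == j else 0.0) - alpha * A[i][j] for j in range(n)]
            for i in range(n)]

def split_LDU(M):
    # M = L + D + U: strictly lower, diagonal, strictly upper parts
    n = len(M)
    L = [[M[i][j] if i > j else 0.0 for j in range(n)] for i in range(n)]
    D = [[M[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[M[i][j] if i < j else 0.0 for j in range(n)] for i in range(n)]
    return L, D, U

A = [[0.0, 0.0, 1.0, 0.5],
     [1/3, 0.0, 0.0, 0.0],
     [1/3, 0.5, 0.0, 0.5],
     [1/3, 0.5, 0.0, 0.0]]
M = system_matrix(A)
L, D, U = split_LDU(M)
```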
Properties of (I − αA)
1. (I − αA) is an M-matrix.
2. (I − αA) is nonsingular.
3. The row sums of (I − αA) are 1 − α.
4. ‖I − αA‖∞ = 1 + α, provided at least one nondangling node exists.
5. Since (I − αA) is an M-matrix, (I − αA)^(−1) ≥ 0.
6. The row sums of (I − αA)^(−1) are at most 1/(1 − α). Therefore ‖(I − αA)^(−1)‖∞ ≤ 1/(1 − α).

Definition: A real matrix A that has A_ij ≤ 0 when i ≠ j can be expressed as A = sI − B, where s > 0 and B ≥ 0. When s ≥ ρ(B), A is called an M-matrix. An M-matrix can be either singular (s = ρ(B)) or nonsingular (s > ρ(B)).
Jacobi method
The Jacobi method can be applied to the Google matrix using the splitting (10):

(L + D + U)x = b
Dx^(k) = b − (L + U)x^(k−1)
x^(k) = D^(−1)[b − (L + U)x^(k−1)],

where the diagonal matrix D is invertible.
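A Python sketch of Jacobi iteration applied to the system (I − αA)x = b for the Figure 1 graph (function name, α = 0.85, and the iteration count are my choices); the solution is the damped PageRank vector, whose components sum to 1.

```python
def jacobi(M, b, iters=200):
    # x_k = D^{-1} [ b - (L + U) x_{k-1} ]; the diagonal of M must be nonzero.
    n = len(M)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(M[i][j] * x[j] for j in range(n) if j != i)) / M[i][i]
             for i in range(n)]
    return x

alpha, n = 0.85, 4
A = [[0.0, 0.0, 1.0, 0.5],
     [1/3, 0.0, 0.0, 0.0],
     [1/3, 0.5, 0.0, 0.5],
     [1/3, 0.5, 0.0, 0.0]]
M = [[(1.0 if i == j else 0.0) - alpha * A[i][j] for j in range(n)]
     for i in range(n)]
b = [(1.0 - alpha) / n] * n
x = jacobi(M, b)
```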
The Gauss-Seidel method can be applied to the Google matrix using the splitting (10):

(L + D + U)x = b
(L + D)x^(k) = b − Ux^(k−1)
x^(k) = (L + D)^(−1)[b − Ux^(k−1)],

where (L + D) is an invertible matrix. The Gauss-Seidel method converges much faster than the Power and Jacobi methods. The disadvantage is that it is very hard to parallelize.
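A Python sketch of Gauss-Seidel for the same system (function name and example mine); the only change from Jacobi is that each component update immediately uses the components already computed in the current sweep, which is also what makes the method sequential.

```python
def gauss_seidel(M, b, iters=100):
    # (L + D) x_k = b - U x_{k-1}: sweep the components in order, each
    # update using the values already refreshed in this sweep.
    n = len(M)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(M[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / M[i][i]
    return x

alpha, n = 0.85, 4
A = [[0.0, 0.0, 1.0, 0.5],
     [1/3, 0.0, 0.0, 0.0],
     [1/3, 0.5, 0.0, 0.5],
     [1/3, 0.5, 0.0, 0.0]]
M = [[(1.0 if i == j else 0.0) - alpha * A[i][j] for j in range(n)]
     for i in range(n)]
b = [(1.0 - alpha) / n] * n
x = gauss_seidel(M, b)
```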
SOR method
The SOR method can be applied to the Google matrix using the splitting (10):

(L + D + U)x = b
(ωL + D)x^(k) = ω(b − Ux^(k−1)) + (1 − ω)Dx^(k−1)
x^(k) = (ωL + D)^(−1)[ω(b − Ux^(k−1)) + (1 − ω)Dx^(k−1)],

where 0 < ω < 2; when ω = 1, this method reduces to Gauss-Seidel. Here (ωL + D) is an invertible matrix.
The cost of the SOR method per iteration is more expensive, and it is less efficient in parallel computing for huge matrix systems.
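A Python sketch of SOR for the same system (function name, ω = 1.05, and the example are my choices, not the slides'); each component is the over-relaxed blend of its old value and its Gauss-Seidel update.

```python
def sor(M, b, omega=1.05, iters=100):
    # (omega*L + D) x_k = omega*(b - U x_{k-1}) + (1 - omega)*D x_{k-1};
    # omega = 1 recovers Gauss-Seidel exactly.
    n = len(M)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(M[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / M[i][i]
    return x

alpha, n = 0.85, 4
A = [[0.0, 0.0, 1.0, 0.5],
     [1/3, 0.0, 0.0, 0.0],
     [1/3, 0.5, 0.0, 0.5],
     [1/3, 0.5, 0.0, 0.0]]
M = [[(1.0 if i == j else 0.0) - alpha * A[i][j] for j in range(n)]
     for i in range(n)]
b = [(1.0 - alpha) / n] * n
x = sor(M, b)
```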
Plot of the number of iterations required for convergence by the different methods.
[Figure: convergence plots for the iterative methods; one panel shows Gauss-Seidel.]
Conclusion
Investigated the various problems that arise when computing the Google matrix (link matrix).
Took an example of a large matrix representation of the Internet and computed its PageRank using different methods.
Bibliography
Erik Andersson and Per-Anders Ekström. Investigating Google's PageRank algorithm. 2004.
Pavel Berkhin. A survey on PageRank computing. 2005.
Kurt Bryan and Tanya Leise. The $25,000,000,000 eigenvector: The linear algebra behind Google. SIAM Review, 2006.
Amy N. Langville and Carl D. Meyer. Deeper inside PageRank. 2004.