chunqi.shi@hotmail.com
http://hi.baidu.com/shichunqi
1 07/06
2 07/08
3 07/16 1/3 07/30
4
5
http://sewm.pku.edu.cn/IR-Guide.txt
--
A Simple Scratch of Search Engine
1. Spider
2. Spider
3. Spider
4. Spider
5. Spider
1.
2. De-duplicate
3. Anti-spam
1,
Google, Baidu
Baidu: http://www.baidu.com/more/
Google: http://www.google.com.hk/intl/en/options/
Lab
Wiki
http://en.wikipedia.org/wiki/List_of_search_engines 10
1 2 3
45P2P
6Email78910
14
1[] 23456
78910 1112
13 14
2
Yahoo, Infoseek, Google, Baidu
Google 1 1000
3171
Grassroots
--
1.
2.
(Index) (Hash)
3.
Storage Pyramid: Register, Cache, Internal Storage, External Storage
Index
Keywords/Term
Retrieval
1.1
entirely, timely
fast indexing
Terms
resemblance ranking
Internet
WWW
1.1
IO
/
/
CACHE->->->
/->/ Pyramid Hierarchy
Cluster, Distributed
Inverted Index, Sequential
Hashing
WEB
MVC(Model-View-Controller) WEB DATA
ResemblanceRank WEB DATA
Retrieval
Web-Data-Retrieval
Google Baidu
Yahoo
TREC, SIGIR, WWW
(Information Retrieval) WEB Web Technology
1.2
Spider
Spider CrawlerSpider Schedule
Spider Update
Indexer Indexer
IR
Indexer Analyze
Data Base
Retrieval Retrieval
Retrieval Query Resemblance
Rank
User Interface
[Figure 1.2. Labels: Frontend, Internet, Query, Schedule, Spider, Indexer, Update, Preprocess, Retrieval, Analyze, Rank, Backend]
Modeling the Internet and the Web: Probabilistic Methods and Algorithms
http://book.douban.com/subject/1756106/
http://ibook.ics.uci.edu/slides.html
PDF:
http://bib.tiera.ru/DVD-010/Baldi_P.,_Frasconi_P.,_Smyth_P._Modeling_the_Internet_and_the_Web._Probabilistic_Methods_and_Algorithms_(2003)(en)(285s).pdf
Internet Intranet LAN
RJ-45
ISO TCP/IP
Internet
Internet, Intranet, LAN
TCP/IP
World Wide Web, HTTP, hyperlinks
Net, Web, WWW, Website
WWW
1999 Chinaren 263
WWW WWW Web
1
-- CNNIC
16 http://www.cnnic.net.cn/index/0E/21/index.htm
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
Internet Internet
1994 5 BBS BBS
2000 12 12
2001 1 1 ""
2001 7 9
2001 12 20
2004 2 3 18 2003
2004 5 13
2004 6 16
2005 8 5
Blog, RSS, Wiki, SNS
14. 2006 12 18 Verizon
15. 2007 100
2.
()
3.
4.
()
5.
WEB2.0
=> => =>
2
2010 1 CNNIC
http://www.cnnic.net.cn/uploadfiles/pdf/2010/1/15/101600.pdf
/
- 11
html/htm shtml
php asp jsp aspx 3:1:5
- 10
75% 55%
30% 8%
1%
- 5 336 ~
- 14 30K
964 Terabytes
- 11
Suffix         Share
.html          20.1%
htm             6.5%
(unlabeled)     2.1%
shtml           8.7%
asp            12.6%
php            22.2%
txt             0.0%
nsf             0.0%
xml             0.0%
jsp             1.0%
cgi             0.2%
pl              0.0%
aspx            6.1%
do              0.5%
dll             0.0%
jhtml           0.0%
cfm             0.0%
php3            0.0%
phtml           0.0%
(unlabeled)    19.7%
Total         100%
- 10
7.7%
21.2%
28.1%
18.8%
24.3%
100%
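- 5
Item                       2008               2009                  Growth
Total pages                16,086,370,233     33,601,732,128        108.88%
Static pages                7,891,388,272     18,998,243,013        140.75%
  share of total pages     49.06%             56.54%
Dynamic pages               8,194,981,961     14,603,489,115         78.20%
  share of total pages     50.94%             43.46%
Static : dynamic           0.96:1             1.3:1
Total page size (KB)       460,217,386,099    1,059,950,881,533     130.32%
Average pages per site     5,588              10,397                 86.06%
Average page size (KB)     28.6               31.5                   10.30%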
- 14 (row labels not recoverable): total 964.0 Terabytes; average page size 30.8 KB
3
QQ
hao123
- 8 80%
- 8
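An early Spider was written by Matthew Gray (MIT) in 1993, the year of NCSA Mosaic: the World Wide Web Wanderer ("www wanderer"), a Perl program.
David Eichmann followed with the RBSE spider.
[Figure 3.2: David Eichmann, Iowa]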
Spider Spider
EVP() Google --
http://www.cqumzh.cn/att_blog/month_0901/a2fe1b64c99263b246e9d923f1055549_1231307756.pdf
1. Spider
Spider
1
2 Spider
1
2
URI/URL
1 URL URL
JSP/ASP/Servlet/PHPURL URL Mapping
2 URL URL
URL
Spider Trap
URL
URL Spider URI /
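The URL points above (one resource reachable under several URL spellings, and Spider Traps that generate endless URLs) can be illustrated with a small normalizer. This is only a sketch under assumptions: the normalization rules chosen here, the function names `normalize_url` and `is_probable_trap`, and the `max_depth` threshold are all hypothetical and not taken from the original text.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the resource.
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url):
    """Return a canonical spelling of url (assumed rules, not the author's)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))

def is_probable_trap(url, max_depth=8):
    """Heuristic: very deep paths are treated as a possible Spider Trap."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return len(segments) > max_depth

seen = set()
for raw in ["HTTP://WWW.Example.com:80/a?b=1#top", "http://www.example.com/a?b=1"]:
    seen.add(normalize_url(raw))
# Both spellings collapse to one entry in the seen-set.
```

A real Spider would add more rules (percent-encoding, session ids, parameter ordering); the seen-set is what keeps the same page from being fetched twice.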
[Figure: basic Spider. Labels: Internet, tcp/ip, Downloader, unfetched URLs, URI, update, URL extractor & normalizer, HTML, Analyzer]
2. Spider
IP
URL HOST IP
DNS Resolver Spider WebSite
Robots Robots
Win-Win Crawler Trap URL path
robots.txt Spider
(Stress) Access Rate, Maximum Stress
Block
Website(Domain
Name) DN IP Download
IP Wildcard Domain
Infinite Sub-domain Generator
Domain Uniformization
IP
Multi-Downloader, Schedule, DNS (DNS resolver), Robots (Robots Protocol Checker), Website Meta-info Collector [Maximum Stress, IP (Multi-IP Stress Balance), Domain Uniformization]
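The Robots Protocol Checker named above can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the user agent `MySpider`, and the `example.com` URLs are made-up assumptions for illustration; a real Spider would download robots.txt from each site and cache it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real checker fetches it per site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_checker(robots_text):
    """Parse robots.txt text into a checker object."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

checker = build_checker(ROBOTS_TXT)
allowed = checker.can_fetch("MySpider", "http://example.com/index.html")
blocked = checker.can_fetch("MySpider", "http://example.com/private/x")
delay = checker.crawl_delay("MySpider")  # seconds between requests, if declared
```

The crawl delay is one possible input for the Maximum Stress limit: the scheduler can refuse to hand a Downloader a second URL on the same site until the delay has passed.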
[Figure. Labels: Internet, robots.txt, DNS Cache, DNS Resolver, Client, pages, Robots Checker, Multi-Downloader, Scheduler, tcp/ip, unfetched URLs, URI, Site/Domain, URL extractor & normalizer, Content extractor, HTML Parser, Data Package, Analyzer, update]
3.2.1 Spider
Spider
//Multi-Downloader
Multi-Downloader
Paralleled Crawler
Partitioning
Crawler Center
Distributed Crawler
Locally DistributedLower Latency
Internet Backbone Traffic Interchange Politeness
[Figure: Spider Architecture. Labels: Crawler Center 0, Crawler Center 1 ... Crawler Center L; Crawler 0, Crawler 1 ... Crawler N; Multi-Downloader (Downloader 0 ...); Paralleled Crawler; Distributed Crawler]
3.2.2 Spider
[Figure 3.2.3: Static Paralleled. Internet; Scheduler; Crawler 0: qq.com/download, qq.com/blog, 163.com/news, 163.com/mail; Crawler 1: news.sina.com.cn/beijing, news.sina.com.cn/whether, sohu.com/news, sohu.com/maps; ... Crawler N-1]
[Figure 3.2.4: Dynamic Paralleled. Internet; Scheduler; Crawler 0, Crawler 1 ... Crawler N-1 over the same URLs: qq.com/download, qq.com/blog, 163.com/news, 163.com/mail, news.sina.com.cn/beijing, news.sina.com.cn/whether, sohu.com/news, sohu.com/maps]
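In the Static Paralleled layout above, each Crawler owns a fixed set of sites (qq.com and 163.com paths on Crawler 0, sina and sohu paths on Crawler 1). One common way to get such a fixed split is to hash the host name; the sketch below assumes that approach, which the original does not state, and `assign_crawler` is a hypothetical name.

```python
import hashlib
from urllib.parse import urlsplit

def assign_crawler(url, n_crawlers):
    """Map a URL to a crawler index by hashing its host (assumed scheme)."""
    # Accept both "qq.com/blog" and "http://qq.com/blog".
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    # A stable hash keeps every URL of one site on the same crawler.
    return int.from_bytes(digest[:8], "big") % n_crawlers

urls = ["qq.com/download", "qq.com/blog", "163.com/news", "sohu.com/maps"]
assignment = {u: assign_crawler(u, 4) for u in urls}
```

Keeping a whole site on one crawler makes per-site politeness limits easy to enforce locally; a Dynamic Paralleled scheduler would instead hand out URLs on demand.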
3. Spider
Devanshu Dhyani A Survey of Web Metrics
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.5859&rep=rep1&type=pdf
336 Seeds
30% 8%
Yahoo Research Barcelona Lab (http://labs.yahoo.com/Yahoo_Labs_Barcelona): tutorial by Ricardo Baeza-Yates and B. Barla Cambazoglu,
http://www.lirmm.fr/~coletta/CaisePresentations/TutorialYAHOO.pdf
Spider Quality Metrics (Crawler):
1 Coverage: the percentage of the Web discovered or downloaded by the crawler.
2 Freshness: a measure of how out of date the local copy of a page is relative to the page's original copy on the Web.
3 Page importance: the percentage of important or popular pages in the repository.
Ricardo Baeza-Yates, Yahoo VP, ACM Fellow
SIGIR 2009: Quantifying Performance and Quality Gains in Distributed Web Search Engines
http://research.yahoo.com/search/node/Quantifying
http://www.dcc.uchile.cl/~rbaeza/
http://research.yahoo.com/user/70
[Figure 3.3.1. Labels: Spider, Coverage, Internet, Schedule, Freshness, Page importance, Analyzer]
2003 :
--
http://sewm.pku.edu.cn/TianwangLiterature/Other/%5B%CD%F5%BC%CC%C3%F1,2003a%5D/032116.pdf
Poisson Web Poisson
3 :(1);(2);(3)
Web T Web
$1 - e^{-\lambda T} = 0.5$
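A short worked step, assuming (as the Poisson model above suggests) that changes to a page arrive at a constant rate; the symbol $\lambda$ for that rate is an assumption, since the extracted formula lost it. The time $T$ after which half of the pages have changed at least once is then:

```latex
\begin{aligned}
P(\text{page changed within } T) &= 1 - e^{-\lambda T} = 0.5 \\
e^{-\lambda T} &= 0.5 \\
T &= \frac{\ln 2}{\lambda} \approx \frac{0.693}{\lambda}
\end{aligned}
```

For example, under this model a change rate of one change per 100 days gives $T \approx 69$ days.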
CNNIC -
10
Sanasam Ranbir
http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-462.pdf
Link Rot
Dead Link / Broken Link, Link Rot, Dead Link
Link Rot
Dead Link
[Figure 3.3.2: Internet Status(t + T). Labels: Spider, obsolete, stable, Re-fetch, Fetched, fresh, To-fetch, New Links]
Outdated/Obsolete Spider
Create-Delete-Update CRUD
Spider Spider
Google Deepbot Freshbot Deepbot
Coverage Freshbot Re-visit/Refresh
[Figure 3.3.3. Labels: Spider, Coverage, Internet, Fetch, Refresh, Schedule, Freshness, Page importance, Dead links, Analyzer]
4. Spider
Carlos Castillo, Effective Web Crawling
http://www.webir.org/resources/phd/Castillo_2004.pdf
Carlos Castillo University of Chile
YAHOO
&VPRicardo Baeza-Yates
--
http://www.c-s-a.org.cn/ch/reader/view_abstract.aspx?file_no=20090752&flag=1
Selecting/Ordering Strategies
Fetch, Refresh
Granularity
Divide Standard
Fetch
Refresh
Regular
Historical/Empirical Feedback
4.1
Granularity Geographically
Website
Page
Link/URL
Tracks
Linguistic
Architecture
Popularity
Group Pattern
Encoding
Traffic
Utility
URL Keyword
Popularity
Relevance
Path Depth
Quantity &
Saturation
4.2
,
Page Importance
Website ArchitectureLink Path Depth
Search Engine
Optimize
Search Engine Cheat
Spam Website
Page Importance
Google .Matt Cutts
HIThis is the great content I has
http://www.mattcutts.com/blog/
http://v.youku.com/v_show/id_XMTY3NTM2ODQ0.html
{
~
.
Is content still the king or has something else (structure) taken over? "Content is necessary. It's
not always sufficient because people have to find out about your content. But if you don't have good
content, it's a lot harder to do good search engine optimization for your site." ~ Matt Cutts.
}
Net
Homepage/Index Link
ContentLink
Content
2.3.4.1
1 Breadth First
a) --
b) /
c)
Baseline
Carlos Castillo, Effective Web Crawling
Crawling the Infinite Web: Five Levels are Enough
3-5
90%
90%
Follow 5
d)
d) TOP N
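The Breadth First strategy with a depth limit of 5 ("Five Levels are Enough") described above can be sketched as follows. The `get_links` callback and the toy link graph are assumptions for illustration; in a Spider the callback would download the page and extract its links.

```python
from collections import deque

def bfs_crawl_order(seed, get_links, max_depth=5):
    """Return URLs in breadth-first order, following links at most max_depth levels."""
    seen = {seed}
    order = []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# Toy site used only for this example.
toy_graph = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/a/1", "/"], "/a/1": []}
order = bfs_crawl_order("/", lambda u: toy_graph.get(u, []))
```

The depth limit bounds the work per site and also acts as a guard against link structures that never end.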
3Skeleton Links
a)--
b)
c) Yida Wang, SIGIR08: Exploring Traversal Strategy for Web Forum Crawling
http://research.microsoft.com/pubs/131117/forumcrawl_sigir08.pdf
Unique
Pruning
4 Poisson Process
a)--
b)
c)
--
http://d.wanfangdata.com.cn/Periodical_xdjsj-xby200912018.aspx
10%
Index
LinkContent
http://www.jos.org.cn/ch/reader/view_abstract.aspx?file_no=20060513
d)Index/Link/Content F(Index)/F(Link)/F(Content)
X 5*X/t
5 Backlink Count
a)
b)/
c) Hyperlink
Backlink/Inlink
Baseline
d)
Link Backlink Count
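A minimal sketch of ordering by Backlink Count: URLs that more already-seen pages link to are fetched first. The `(source, target)` pair format and the function name are assumptions made for this example.

```python
from collections import Counter

def order_by_backlinks(frontier, known_links):
    """Sort frontier URLs by how many distinct known pages link to them."""
    # set() removes repeated (source, target) pairs so each page votes once.
    backlinks = Counter(target for _, target in set(known_links))
    # Highest count first; ties are broken alphabetically for determinism.
    return sorted(frontier, key=lambda url: (-backlinks[url], url))

links = [("a", "x"), ("b", "x"), ("a", "y"), ("a", "x")]
ordered = order_by_backlinks(["y", "x", "z"], links)
```

Only links found so far are counted, so the ordering is an estimate that improves as the crawl proceeds.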
6 PageRank (Batch PageRank)
a)
b)/
c) Pagerank
Pagerank
d) Pagerank K Pagerank
Pagerank
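For reference, PageRank itself can be sketched as a power iteration over the link graph collected so far. This is the textbook form with a damping factor of 0.85, given as an assumption; it is not necessarily how a Batch PageRank ordering recomputes scores between batches.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each page to the list of pages it links to."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread over all pages.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in graph.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank

toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy)
```

A batch strategy would run such a computation after every K downloaded pages and then fetch the highest-scoring unfetched URLs next.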
7 PageRank (Partial PageRank)
a)
b)/
c) Pagerank Pagerank Pagerank
Pagerank Pagerank Pagerank Pagerank
d) Pagerank K Pagerank
Pagerank Pagerank
d) Phrase/Multi-Words Query
Single Word Query
p q r1 r2 r1,r2 Q(p)
p Q(p) Q(p) P(p,t)=Q(p)*(t-LR(p,t)) LR(p,t)
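The priority formula P(p,t) = Q(p) * (t - LR(p,t)) can be turned into a small refresh scheduler. The sketch assumes that Q(p) is a quality score for page p and LR(p,t) is the time of its last refresh before t; that reading, the function names, and the sample values are assumptions.

```python
def refresh_priority(quality, now, last_refresh):
    """P(p, t) = Q(p) * (t - LR(p, t)): good pages that are stale come first."""
    return quality * (now - last_refresh)

def pick_refresh_batch(pages, now, k):
    """pages: dict url -> (Q(p), last refresh time). Return the k most urgent URLs."""
    ranked = sorted(
        pages,
        key=lambda url: refresh_priority(pages[url][0], now, pages[url][1]),
        reverse=True,
    )
    return ranked[:k]

pages = {"a": (0.9, 90), "b": (0.5, 10), "c": (0.1, 0)}
batch = pick_refresh_batch(pages, now=100, k=2)
```

Here "a" has the highest quality but was refreshed recently, so the staler pages "b" and "c" are picked first.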
5. Spider
Information Retrieval
Pre-process
Universal/Opening, Distribution, Clean
Pros and Cons
Anti-Spam
[Figure 4.1. Labels: Site/Domain, HTML parser, Anchors, Link-relation, Title, Content, Content extractor, Indexer, De-duplicate, Pre-process, Segmentation, Quality Selection, page analyze, Specials, Dictionary]
1. Quality Selection
Page
Importance
GeneralQuality Selection
/
Pros and Cons
4.1.1
Sites Dictionary
2-4
<20%
2-5
>80%
Site Map
Site Topics
3-5 90%
2-4 2-5
20% 80% 3 N
1N N>4
1/5=20% 4/5=80%
Google Baidu
Site Evaluation
Website
1)/Credibility/ Authority
.org
com
.net/.cn
Lucene, Hadoop: Doug Cutting
http://cutting.wordpress.com/
Semantic Web
Email
2)Reputation
TrafficIndex
Alex 100
Navigation
3)Audience
4)Completeness
5)Access/Workability
6)Accuracy
7)Currency
8)Uniqueness
9)/Facticity/Objectivity
Encyclopaedia
Wikipedia Ask
Wikipedia Google
10)(Quality of writing)
Typographical errors/spelling mistakes
Google
Sign of Zodiac
4.1.3
1
10
11
12
/Link/Anchor Content
Canon
(Saint)
Authority and Hub
Link
Anchor
Relevant Linkage Principle [Kleinberg 1997]
Link_A Link_B Link_A Link_B
Topical Unity Principle [Kessler 1963, Small 1973]
Link_C Link_A Link_B Link_A Link_B
Lexical Affinity Principle [Maarek et al. 1991]
Link_A Link_B URL Link_A
Link_B Anchor
Link_A
Link_B
Page Clean Site Templates
Pagelets Analysis
HTML DOM TREE HTML
(ordered linear space) two-dimensional space
DOM Tree
DOM Tree Web Page Cleaning for Web Mining through Feature Weighting
Visual Tree Entropy-Based Visual Tree Evaluation on Block Extraction
Site Templates Joint Optimization of Wrapper Generation and Template Detection
Site Templates Site-Independent Template-Block Detection
Site Templates Page-level Template Detection via Isotonic Smoothing
Visual Tree
DOM Tree
CSS
HTML
4.1.2
http://news.sina.com.cn/c/2010-07-29/012020778393.shtml
Site Templates
http://news.sina.com.cn/c/2010-07-29/163620785082.shtml
Pagelets Analysis
1 PageRank Hilltop
2 HITS SALSA
3 Entropy Analysis
Ranking Link Analysis Ranking
PageRank
The Anatomy of a Large-Scale Hypertextual Web Search Engine
4.1.5 PageRank
Hilltop
When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics
Krishna Bharat, George Andrei Mihaila
4.1.6 Hilltop
HITS
Hyperlink-Induced Topic Search --Authoritative Sources in a Hyperlinked Environment
4.1.7 HITS
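A compact sketch of the HITS iteration (hub and authority scores that reinforce each other). The toy graph, the fixed iteration count, and the L2 normalization are choices made for this example.

```python
import math

def hits(graph, iterations=50):
    """graph: dict page -> list of pages it links to. Returns (hub, authority)."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth

toy = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
hub, auth = hits(toy)
```

In this toy graph "a" is cited by both hubs and so ends up with the higher authority score, while "h1" points to more authorities and ends up the better hub.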
SALSA
The Stochastic Approach for Link-Structure Analysis (SALSA) and the TKC Effect
4.1.8 SALSA
Entropy Analysis
Entropy-Based Link Analysis for Mining Web Informative Structures
Mining Web Informative Structures and Contents Based on Entropy Analysis
4.1.9 Entropy
Text Content
Pros and Cons
Noise
HTML Semi-structured
Fixed Structured
Multimedia data
...
More structured
Table
List
...
Unstructured
(Plain Text)
...
HTML
4.1.10
Wikipedia 14 Topic
10
1 2 3
45P2P 6Email
78910
Universal Search: Google, Baidu, Yahoo, Bing
Vertical Search: kooxoo, gougou, qihoo
4.1.2
Blog
News
Image
Video
Forum
(P2P)
Wap
2. De-duplicate
Duplicate/Near-Duplicate Detection
Copy Detection / Plagiarism Detection / Duplicate Detection),
// 76 Ottenstein
Attribute Counting Copy Detection
20 1993 . Udi Manber Arizona SIFF (Finding
http://blog.csdn.net/malefactor/
http://blog.csdn.net/malefactor/archive/2006/06/09/782882.aspx
Google Gurmeet Singh Manku Detecting Near Duplicates for Web Crawling
SimHash
http://infolab.stanford.edu/~manku/papers/07www-duplicates.ppt
Yahoo P Govindarajulu Duplicate and Near Duplicate
Documents Detection: A Review
http://www.eurojournals.com/ejsr_32_4_08.pdf
MIT Shreyes Seshasai 09 Efficient Near Duplicate Document Detection
for Specialized Corpora
http://via.mit.edu/documents/Seshasai.pdf
http://sewm.pku.edu.cn/TianwangLiterature/PhdDissertation/%5BHuang,2008%5D/hle_thesis.pdf
(Process Introduction)
1,
2,
20% 30%
3,
1
exact duplicates: mirroring, plagiarism
near duplicates: Advertisements, Template Frames, timestamps
2
post process:
inline process:
url
4
1.
LOGO, Page Clean/Noise Reduction
Abstract Extraction
2. /FingerPrint
3. Resemblance, 2 Distance
Fingerprint online
4. Cluster: Iterative, Graph
Union Find
5. Delegates
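Steps 4 and 5 above (Cluster with Union Find, then Delegates) can be sketched like this. The input is assumed to be a list of page ids plus the pairs that step 3 judged to be duplicates; choosing the smallest id as the delegate of each cluster is an arbitrary rule for this example.

```python
def cluster_duplicates(items, duplicate_pairs):
    """Group items connected by duplicate pairs; key of each group is its delegate."""
    parent = {x: x for x in items}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one, so the root
            # (the delegate) is always the smallest id in the cluster.
            parent[max(ra, rb)] = min(ra, rb)

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return clusters

clusters = cluster_duplicates(["p1", "p2", "p3", "p4"], [("p1", "p2"), ("p2", "p3")])
```

Only the delegate of each cluster would then be kept for indexing; the other members are recorded as its duplicates.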
Hashing=>Signatures=>Fingerprint; Vector=>Cosine=>Distance=>Resemblance;
Delegates
Cluster
[Figure 4.2.2. Labels: De-duplicate, FingerPrint, Resemblance, Segmentation, HTML, Dictionary, Page Clean]
Link-duplicate
outlinksPath Hash
Hash
SEOSPAM
proper
subgraph
Content-duplicate
Fingerprint
FingerPrintMilestones
1, CheckSum: Checks MD5 & SHA & CRCs
2, Longest Common Subsequence
3, Shingling Broder 1997: Jaccard index of tokens
4, SimHash Charikar 2002 / Gurmeet Singh Manku, WWW 2007:
5, I Match Chowdhury 2002: IDF tokens
tokens tokens Digest Digest
Jaccard
7, Bloom Filter Bloom 1970 / Chazelle 2004
: K Hash functions; the k-th Hash maps an element x to a bit position m
(x, k => m, 0 <= m <= M-1), and that bit is set to 1
8, Chunk HP LAB 2005/2009: Window Chunk
Chunk
1. CheckSum
URL
URL
Statement
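A CheckSum fingerprint only detects exact duplicates: any change in the text changes the digest. The sketch below uses MD5 from the list above; collapsing whitespace and lowercasing before hashing are extra normalization steps assumed here, not taken from the original.

```python
import hashlib

def content_checksum(text):
    """MD5 digest of the text after collapsing whitespace and lowercasing."""
    canonical = " ".join(text.split()).lower()
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

same = content_checksum("Hello   World\n") == content_checksum("hello world")
different = content_checksum("hello world") != content_checksum("hello world!")
```

One extra character is enough to produce a different checksum, which is why the near-duplicate methods that follow are needed.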
3. Shingling
Broder 1997: Jaccard index of tokens
Syntactic Clustering of the Web
www.hpl.hp.com/techreports/Compaq-DEC/SRC-TN-1997-015.pdf
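Shingling as summarized above (Jaccard index over token sequences) can be sketched in a few lines. Whitespace tokenization and a shingle width of 3 are assumptions; the original paper additionally samples the shingles to keep the sets small, which is left out here.

```python
def shingles(text, k=3):
    """Set of all runs of k consecutive tokens (the 'shingles') in text."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def resemblance(a, b, k=3):
    """Jaccard index of the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

score = resemblance("a b c d e", "a b c d f")
```

The two five-token texts share 2 of 4 distinct shingles, so their resemblance is 0.5 even though only one token differs; larger k makes the measure stricter.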
4. SimHash
Google --Detecting Near-Duplicates for Web Crawling
http://infolab.stanford.edu/~manku/papers/07www-duplicates.ppt
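A minimal SimHash sketch: every token votes on each bit of a 64-bit fingerprint, and near-duplicate pages are expected to end up within a small Hamming distance of each other. Using MD5 as the per-token hash and giving every token a weight of 1 are simplifications assumed for this example.

```python
import hashlib

def simhash(tokens, bits=64):
    """Fingerprint in which bit i is 1 when most token hashes have bit i set."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the token
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

fp1 = simhash("the quick brown fox jumps over the lazy dog".split())
fp2 = simhash("the quick brown fox jumped over the lazy dog".split())
distance = hamming_distance(fp1, fp2)
```

Two pages are then treated as near duplicates when the distance is below a small threshold; the cited paper works with 64-bit fingerprints and a threshold of a few bits.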
5. I Match
Improved Robustness of Signature-Based Near-Replica Detection via Lexicon Randomization
www.ir.iit.edu/~abdur/publications/470-kolcz.pdf
6. Spotsig
SpotSigs: Robust and Efficient Near Duplicate Detection in Large Web Collections
http://ilpubs.stanford.edu:8090/831/1/2008-14.pdf
7. BloomFilter
Using Bloom Filters to Refine Web Search Results
www.cs.utexas.edu/users/dahlin/papers/webdb-167.pdf
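A small Bloom Filter sketch: K hash functions each map an item to one of M bit positions, and those bits are set to 1. Deriving the K hashes from SHA-256 with a numeric prefix, and the sizes used, are assumptions made for this example.

```python
import hashlib

class BloomFilter:
    """Set membership with no false negatives and a small false-positive rate."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # K different hashes are simulated by prefixing the item with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

seen_urls = BloomFilter()
seen_urls.add("http://a.com/")
```

A membership test can report an item that was never added (a false positive) but never misses one that was, which is acceptable for a "have I seen this already" check when memory is tight.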
8. Chunk
A Framework for Analyzing and Improving Content-Based Chunking Algorithms
http://www.hpl.hp.com/techreports/2005/HPL-2005-30R1.pdf
Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup
www.hpl.hp.com/personal/Mark_Lillibridge/Extreme/final.pdf
Resemblance Distance: Euclidean, Manhattan, Chebyshev, Jaccard, Cosine, Correlation Coefficient
6
1. Cosine Similarity
2. Jaccard Index
3. Tanimoto Index
4. Pearson Correlation Coefficient
5. SimRank
6. Levenshtein distance
1. Cosine Similarity.
TF-IDF
Stop
Words 1. 2.
Document A
Document B
Document C
Stop Words
1000
1.
**
2.
3.
4.
*
*
5.
6
*
*
*
*
Document A 1 4 1 6 7
Document B 1 3 4 6
Document C 2 7
Document A Document C
Term        1 2 3 4 5 6 7
Document A  2 0 0 1 0 1 1
Document B  1 0 1 1 0 1 0
Document C  0 1 0 0 0 0 1
Number(i)) )
Number(i)) )
Document B ?
Document C
Document A Document B [0~1]
$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{A \cdot B}{\sqrt{A \cdot A} \, \sqrt{B \cdot B}}$
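The cosine formula can be checked against the three term-count vectors given above for Document A, Document B and Document C (terms 1 to 7). The function name is hypothetical; raw counts are used instead of TF-IDF weights to keep the example short.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

doc_a = [2, 0, 0, 1, 0, 1, 1]
doc_b = [1, 0, 1, 1, 0, 1, 0]
doc_c = [0, 1, 0, 0, 0, 0, 1]

sim_ab = cosine_similarity(doc_a, doc_b)  # 4 / (sqrt(7) * 2), about 0.756
sim_ac = cosine_similarity(doc_a, doc_c)  # 1 / sqrt(14), about 0.267
sim_bc = cosine_similarity(doc_b, doc_c)  # no shared term
```

With non-negative counts the result always falls in the range 0 to 1, and Document A comes out much closer to Document B than to Document C.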
Google
2. Jaccard Index
Cosine Similarity Jaccard Index
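Jaccard Coefficient of sets A, B: the size of the intersection of A, B over the size of the union of A, B
Jaccard Coefficient = $\|A \cap B\| / \|A \cup B\|$
Jaccard Distance = 1 - Jaccard Coefficient = $(\|A \cup B\| - \|A \cap B\|) / \|A \cup B\|$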
Document A 1 4 1 6 7
Document B 1 3 4 6
Jaccard Distance = 1 - || (1, 4, 6) || / || (1, 3, 4, 6, 7) || = 1 - 3 / 5 = 0.4
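The worked example above (Document A with terms 1 4 1 6 7, Document B with terms 1 3 4 6) can be reproduced directly with sets; the function name is hypothetical.

```python
def jaccard_distance(a, b):
    """1 minus (size of intersection / size of union) of the two term sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(a & b) / len(a | b)

doc_a = [1, 4, 1, 6, 7]
doc_b = [1, 3, 4, 6]
distance = jaccard_distance(doc_a, doc_b)  # 1 - 3/5
```

Repeated terms (the second "1" in Document A) do not matter here, which is the main difference from the count-based cosine measure.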
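3. Tanimoto Index
The Tanimoto Index combines 1. Cosine Similarity and 2. Jaccard Index:
$T(A, B) = \frac{A \cdot B}{A \cdot A + B \cdot B - A \cdot B}$ (vectors)
$T(A, B) = \frac{\|A \cap B\|}{\|A\| + \|B\| - \|A \cap B\|}$ (sets)
Dice Coefficient
$D(A, B) = \frac{2 \, A \cdot B}{A \cdot A + B \cdot B}$ (vectors)
$D(A, B) = \frac{2 \, \|A \cap B\|}{\|A\| + \|B\|}$ (sets)
$D = 2J / (1 + J)$ and $J = D / (2 - D)$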
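A quick numeric check of the Tanimoto and Dice definitions, and of the conversions D = 2J / (1 + J) and J = D / (2 - D), on the Document A and Document B example used earlier. Function names are hypothetical, and non-empty inputs are assumed.

```python
def tanimoto(a, b):
    """Vector form: A.B / (A.A + B.B - A.B). Assumes a non-zero denominator."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)

def dice(a, b):
    """Set form: 2 |A and B| / (|A| + |B|). Assumes at least one non-empty set."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

terms_a, terms_b = [1, 4, 1, 6, 7], [1, 3, 4, 6]
vec_a, vec_b = [2, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 1, 0]

j = jaccard(terms_a, terms_b)  # 3/5
d = dice(terms_a, terms_b)     # 6/8
t = tanimoto(vec_a, vec_b)     # 4 / (7 + 4 - 4)
```

On 0/1 vectors the vector form of Tanimoto equals the Jaccard coefficient; here the count of 2 for term 1 in Document A makes the two values differ slightly.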
5. SimRank
$R_0(a, b) = 1$ if $a = b$, otherwise $0$;
$R_{k+1}(a, b) = \frac{C}{\|I(a)\| \, \|I(b)\|} \sum_{i=1}^{\|I(a)\|} \sum_{j=1}^{\|I(b)\|} R_k(I_i(a), I_j(b))$
I(x): the set of in-neighbors of x
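The SimRank recurrence can be sketched as a plain iteration over all node pairs. The decay constant C = 0.8, the iteration count, and the dictionary layout are assumptions; every node, including those without in-neighbors, must appear as a key.

```python
def simrank(in_neighbors, c=0.8, iterations=10):
    """in_neighbors: dict node -> list of nodes linking to it. Returns pair scores."""
    nodes = list(in_neighbors)
    # R_0(a, b) = 1 if a == b else 0
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                ia, ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    new_sim[(a, b)] = 1.0
                elif not ia or not ib:
                    new_sim[(a, b)] = 0.0  # no in-neighbors: nothing to compare
                else:
                    total = sum(sim[(x, y)] for x in ia for y in ib)
                    new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim

# "a" and "b" are both linked from "u", so they come out as similar.
scores = simrank({"u": [], "a": ["u"], "b": ["u"]})
```

This direct form costs time quadratic in the number of nodes per iteration, so it is only suitable for small graphs.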
6. Levenshtein distance
Levenshtein distance Edit distance
Document A 1 4 1 6 7
Document B 1 3 4 6
Levenshtein distance = 3
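The value 3 for Document A and Document B can be verified with the standard dynamic-programming algorithm for Edit distance, applied to the term sequences rather than to characters. The function name is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, free when equal
            ))
        previous = current
    return previous[-1]

doc_a = [1, 4, 1, 6, 7]
doc_b = [1, 3, 4, 6]
distance = levenshtein(doc_a, doc_b)
```

One shortest edit script is: insert 3 after the first term, delete the second 1, delete the final 7.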
3. Anti-spam
Spam Spam
WEB2.0
Spam
anti-spam
http://blog.csdn.net/malefactor/archive/2006/05/30/762895.aspx
Stanford University Zoltan Gyongyi Hector Garcia-Molina
http://infolab.stanford.edu/~zoltan/
http://infolab.stanford.edu/people/hector.html
Web Spam Taxonomy Spam
Boosting, Hiding
repetition (dumping), weaving, stitching
honey pot, directory, posting, exchange, farm, dir. clone
Link-spam
Spam cheat
outlinks
Hub pages
inlinks
Spam farm
In-link exchange
Web directory
Honey pot
Anti-Spam
1, Spam HITS
2, Spam PageRank
3, TrustRank (VLDB2004)
4, BadRank (WWW2005)
5, SpamRank (WWW2005, workshop)
6, ParentRank (WWW2005)
1. Spam HITS
Improvements of HITS algorithms for spam links
2. Spam PageRank
Microsoft --Robust PageRank and Locally Computable Spam Detection Features
3. TrustRank
Yahoo -- Combating Web Spam with TrustRank
Propagating Trust and Distrust to Demote Web Spam
4. BadRank
Google -- PR0 -Google's PageRank 0 penalty.
Generalized BadRank with Graduated Trust
5. SpamRank
SpamRank Fully Automatic Link Spam Detection
6. ParentRank
Identifying link farm spam pages
Content-Spam
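Zipf's law, Heaps' law
1. Zipf's law
Zipf's law (G. K. Zipf, 1935) is a 1/f law: the word of rank 2 occurs about 1/2 as often as the word of rank 1,
the word of rank 3 about 1/3 as often, and the word of rank n about 1/n as often.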
4.3.1 GKZipf
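Simon Newcomb: in base b, the probability that the leading digit of a number is n is
$P(n) = \log_b(n + 1) - \log_b(n)$
In base 10 the leading digit 1 is the most common; 2 occurs 17.6% of the time, 3 occurs 12.5%, and 9 only 4.6%.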
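The leading-digit percentages quoted above follow directly from the log formula. A short check, with a hypothetical function name:

```python
import math

def leading_digit_probability(digit, base=10):
    """log_b(n + 1) - log_b(n): expected share of numbers whose first digit is n."""
    return math.log(digit + 1, base) - math.log(digit, base)

shares = {d: round(100 * leading_digit_probability(d), 1) for d in range(1, 10)}
total = sum(leading_digit_probability(d) for d in range(1, 10))
```

Content whose numbers or word statistics deviate strongly from such expected distributions is one possible signal for machine-generated Spam.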
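Heaps' law: $V = K n^{\beta}$, with $K \in [10, 100]$ and $\beta \in [0.4, 0.6]$ (V: vocabulary size, n: text length in words)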
Spam
3.
Title
Anchor, Meta Spam
5% 30%
4.3.2
A successful search engine requires more bandwidth to upload query result pages than its
crawler needs to download pages
http://www.seo.com.cn/seopdf/.pdf
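Precision, Recall, F1:
Precision = (relevant documents retrieved) / (documents retrieved)
Recall = (relevant documents retrieved) / (relevant documents)
F1 = 2 * Precision * Recall / (Precision + Recall)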
:
1. Mean Average Precision MAP: MAP
MAP MAP MAP
2. R-Precision: R-Precision R R
R-Precision R-Precision
3. P@10: P@10 10
10
P@10
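The evaluation measures named above (Precision, Recall, F1, MAP, R-Precision, P@10) can be written out in a few lines. The definitions follow the usual information-retrieval conventions; the function names and the sample ranking are assumptions for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based Precision, Recall and their harmonic mean F1."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def precision_at_k(ranked, relevant, k=10):
    """Share of relevant documents among the first k results (P@10 for k=10)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def r_precision(ranked, relevant):
    """Precision at R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked results, relevant set), one entry per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p, r, f1 = precision_recall_f1(ranked, relevant)
ap = average_precision(ranked, relevant)
```

For this sample, 2 of the 4 retrieved documents are relevant and 2 of the 3 relevant documents are found, so Precision is 1/2 and Recall is 2/3.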
--
http://www.sales2marketing2.com/PracticalInternetMarketing_VincentCheng.ppt
Online Marketing Channels are:
1.
2.
3.
4.
5.
6.
Email Marketing
7.
Viral Marketing
8.
9.
Blogs
,,
, ,
WWW
5.1. ""
5.2.
a)
b)
c)
d) e)
5.3.
5.4.
Google Google Answer AnswerBot
"how can kill virus of computer?"
"virus"
"how can kill virus of computer?"
5.5.
FTP, Flash
5.6.
5.7.
Google, Yahoo
""
XML