Viraj Paripatyadar
GS Lab
Contents
A recommendation problem
What is a recommender
Building a recommender using Mahout
Tips and tweaks
Recommender considerations
A book store
Sells books:
By various authors
Of various categories
On different subjects
From various publishers
Item-based: look for all books that are similar to the books your friend owns, then pick books from this set that your friend doesn't own.
User-based: look for users similar to your friend and see what they read, then pick books these people like that your friend doesn't own.
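The item-based strategy can be sketched in plain Java. This is a minimal illustration, not Mahout's implementation: it assumes each user's collection is a set of book titles and uses co-occurrence (how often two books sit on the same shelf) as the similarity between books. The class `BookPicker` and method `pickBooks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: item-based picking with co-occurrence as the book similarity.
class BookPicker {

    // Scores every book the friend does not own by how often it appears on the same
    // shelf as one of the friend's books, then returns candidates, best first.
    static List<String> pickBooks(Map<String, Set<String>> shelves, String friend) {
        Set<String> owned = shelves.get(friend);
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : shelves.entrySet()) {
            if (entry.getKey().equals(friend)) {
                continue; // the friend's own shelf contributes no new books
            }
            Set<String> shelf = entry.getValue();
            for (String ownedBook : owned) {
                if (!shelf.contains(ownedBook)) {
                    continue;
                }
                // Every book on this shelf that the friend lacks co-occurs with ownedBook once.
                for (String candidate : shelf) {
                    if (!owned.contains(candidate)) {
                        scores.merge(candidate, 1, Integer::sum);
                    }
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken alphabetically so the order is deterministic.
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }
}
```

Co-occurrence is only one choice of item similarity; the similarity measures listed later in this deck can be swapped in without changing the overall "similar books the friend doesn't own" structure.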
Example data
(user ID, item ID, preference)
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
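Mahout's `FileDataModel` reads comma-separated files in this userID,itemID,preference layout. As a plain-Java sketch of what that loading step produces, the hypothetical helper below parses the same lines into a map from user ID to that user's item preferences.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: parses "userID,itemID,preference" lines into
// userID -> (itemID -> preference). Not Mahout's FileDataModel.
class PreferenceData {

    static Map<Long, Map<Long, Double>> parse(String csv) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        for (String line : csv.split("\\R")) { // split on any line break
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] fields = trimmed.split(",");
            if (fields.length != 3) {
                throw new IllegalArgumentException("expected 3 fields: " + trimmed);
            }
            long user = Long.parseLong(fields[0].trim());
            long item = Long.parseLong(fields[1].trim());
            double preference = Double.parseDouble(fields[2].trim());
            prefs.computeIfAbsent(user, u -> new TreeMap<>()).put(item, preference);
        }
        return prefs;
    }
}
```

On the example data this yields five users; user 5 has rated six books and user 1 three.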
A pictorial representation
[Figure: graph linking users to the items 101 to 107 they rated]
Visualize
[Figure: user-item graph, items 101 to 107]
1,109,3.5
1,112,4.0
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
2,107,4.5
2,113,3.5
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
3,115,4.0
4,104,4.5
4,106,4.0
4,109,2.0
4,111,2.5
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
5,109,3.0
5,112,4.0
6,101,4.5
6,115,5.0
7,103,4.5
7,104,2.5
7,108,4.0
7,109,3.5
7,110,3.5
7,112,2.5
8,101,2.0
8,105,4.0
8,106,4.5
8,110,3.0
8,114,5.0
8,115,3.5
A pictorial representation
[Figure: user-item graph for the extended data, items 101 to 115]
Use cases:
Recommenders
Clustering
Classification
Recommenders in Mahout
Recommenders use data culled from user behavior.
Recommending using Mahout
Similarity between users or items
Usually expressed as a number between 0 and 1 (correlation-based measures such as Pearson range from -1 to 1)
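To show how a distance becomes a similarity between 0 and 1, here is a sketch of one common formulation, 1 / (1 + d), where d is the Euclidean distance between two users' ratings of the books they both rated. Mahout's `EuclideanDistanceSimilarity` is built on the same idea, but its exact scaling may differ between versions; the class below is a hypothetical stand-alone illustration.

```java
import java.util.Map;

// Hypothetical helper, not Mahout's EuclideanDistanceSimilarity.
class EuclideanSimilarity {

    // Similarity 1 / (1 + d) over co-rated items: identical ratings give 1.0 and the
    // value falls toward 0 as ratings drift apart. Returns NaN when the users share
    // no items, since there is nothing to compare.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumOfSquares = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumOfSquares += diff * diff;
                common++;
            }
        }
        if (common == 0) {
            return Double.NaN;
        }
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }
}
```

For users 1 and 5 in the example data, the shared books are 101, 102 and 103, so d² = 1² + 0² + 0.5² = 1.25 and the similarity is 1 / (1 + √1.25), about 0.47.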
Similarity
Various algorithms:
Euclidean distance
Pearson correlation
Cosine measure
Spearman correlation
Tanimoto coefficient
Log-likelihood
Effectiveness depends on the input data
The choice influences running time and memory
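Two of the listed measures behave quite differently, which is one reason effectiveness depends on the data. The sketch below, with hypothetical names and no claim to match Mahout's `PearsonCorrelationSimilarity` or `TanimotoCoefficientSimilarity` exactly, computes Pearson correlation over co-rated items (uses rating values, range -1 to 1) and the Tanimoto coefficient (ignores values, compares only which items were rated).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone versions of two similarity measures.
class SimilarityMeasures {

    // Pearson correlation over the items both users rated, each user centered on
    // their own mean over those items. Returns NaN with fewer than two shared items
    // or when one user's shared ratings are all equal.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> common = new HashSet<>(a.keySet());
        common.retainAll(b.keySet());
        int n = common.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0.0, meanB = 0.0;
        for (Long item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varA * varB);
    }

    // Tanimoto coefficient: size of intersection / size of union of the rated-item sets.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }
}
```

For users 1 and 5 of the example data, Pearson gives 2.5 / √7 ≈ 0.94 (their ratings of 101 to 103 rise and fall together), while Tanimoto gives 3 / 6 = 0.5 because user 5 rated three books user 1 never did.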
Neighborhood
Nearest N neighborhood (say, 4):
[Figure: user U and its nearest neighbors]
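A nearest-N neighborhood simply ranks all other users by similarity to the target user and keeps the top N. Mahout provides this as `NearestNUserNeighborhood`; the sketch below is a hypothetical plain-Java version that, as an assumption, uses the 1 / (1 + d) Euclidean similarity and skips users who share no items with the target.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone nearest-N user neighborhood, not Mahout's class.
class NearestNNeighborhood {

    // Euclidean-based similarity 1 / (1 + d) over co-rated items; NaN if none are shared.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumOfSquares = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumOfSquares += diff * diff;
                common++;
            }
        }
        return common == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    // Returns the IDs of the n users most similar to userId, most similar first.
    static List<Long> neighbors(Map<Long, Map<Long, Double>> prefs, long userId, int n) {
        Map<Long, Double> target = prefs.get(userId);
        Map<Long, Double> sims = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> entry : prefs.entrySet()) {
            if (entry.getKey() == userId) {
                continue; // a user is not their own neighbor
            }
            double s = similarity(target, entry.getValue());
            if (!Double.isNaN(s)) {
                sims.put(entry.getKey(), s);
            }
        }
        List<Long> ranked = new ArrayList<>(sims.keySet());
        // Most similar first; ties broken by user ID for a deterministic order.
        ranked.sort((x, y) -> {
            int bySim = Double.compare(sims.get(y), sims.get(x));
            return bySim != 0 ? bySim : Long.compare(x, y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

With the example data and N = 2, user 1's neighbors come out as users 4 and 5. User 4 ranks first on only two shared books, which illustrates why the similarity measure and N both need tuning against real data.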
Recommender
Recommenders
Generic recommender (user-based or item-based)
Slope-one recommender
Singular Value Decomposition (SVD) based
Linear interpolation based
Cluster-based
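In Mahout, `GenericUserBasedRecommender` combines a data model, a user neighborhood and a user similarity. The core estimation idea can be sketched as a similarity-weighted average of the neighbors' ratings for items the user hasn't rated. This is a simplified, hypothetical version (the neighbor similarities are passed in precomputed, and non-positive similarities are skipped); it is not Mahout's exact code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical user-based estimator, not Mahout's GenericUserBasedRecommender.
class UserBasedRecommender {

    // Estimates userId's preference for each unseen item as the similarity-weighted
    // average of the neighbors' ratings, and returns the top howMany (best first).
    static List<Map.Entry<Long, Double>> recommend(Map<Long, Map<Long, Double>> prefs,
                                                   long userId,
                                                   Map<Long, Double> neighborSimilarity,
                                                   int howMany) {
        Map<Long, Double> own = prefs.getOrDefault(userId, Map.of());
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Map.Entry<Long, Double> neighbor : neighborSimilarity.entrySet()) {
            double weight = neighbor.getValue();
            if (weight <= 0.0) {
                continue; // this sketch ignores dissimilar neighbors
            }
            for (Map.Entry<Long, Double> rating
                    : prefs.getOrDefault(neighbor.getKey(), Map.of()).entrySet()) {
                long item = rating.getKey();
                if (own.containsKey(item)) {
                    continue; // only recommend items the user hasn't rated
                }
                weightedSum.merge(item, weight * rating.getValue(), Double::sum);
                weightTotal.merge(item, weight, Double::sum);
            }
        }
        List<Map.Entry<Long, Double>> estimates = new ArrayList<>();
        for (Map.Entry<Long, Double> e : weightedSum.entrySet()) {
            estimates.add(Map.entry(e.getKey(), e.getValue() / weightTotal.get(e.getKey())));
        }
        // Highest estimate first; ties broken by item ID.
        estimates.sort((x, y) -> {
            int byValue = Double.compare(y.getValue(), x.getValue());
            return byValue != 0 ? byValue : Long.compare(x.getKey(), y.getKey());
        });
        return new ArrayList<>(estimates.subList(0, Math.min(howMany, estimates.size())));
    }
}
```

For example, if user 1's neighbors are user 4 (similarity 2/3) and user 5 (similarity 0.5), book 104 is estimated at (2/3 · 4.5 + 0.5 · 4.0) / (2/3 + 0.5) = 30/7 ≈ 4.29, ahead of 106 (4.0) and 105 (3.5).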
Overall design
[Figure: overall design] Third-party applications, phone/tablet applications and the web application each talk to the Recommender REST service through REST calls; user and application data are stored in a MySQL database.
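A minimal sketch of the "Recommender REST service" box, using the JDK's built-in `com.sun.net.httpserver.HttpServer`. Everything here is an assumption for illustration: the `/recommendations?user=<id>` endpoint, the JSON array response, and the `RecommendationBackend` interface (which, in the design above, would wrap the recommender and its MySQL-backed data) are hypothetical, not the author's service.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical REST front end for a recommender.
class RecommenderRestService {

    // Hypothetical backend; would wrap the recommender and the MySQL user/application data.
    interface RecommendationBackend {
        List<Long> recommend(long userId, int howMany);
    }

    // Extracts the "user" parameter from a query string such as "user=42&n=3".
    static long parseUserId(String query) {
        if (query != null) {
            for (String pair : query.split("&")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2 && kv[0].equals("user")) {
                    return Long.parseLong(kv[1]); // may throw NumberFormatException
                }
            }
        }
        throw new IllegalArgumentException("missing user parameter");
    }

    // Renders item IDs as a JSON array, e.g. [104,106].
    static String toJson(List<Long> itemIds) {
        return itemIds.stream().map(String::valueOf).collect(Collectors.joining(",", "[", "]"));
    }

    // Starts a server answering GET /recommendations?user=<id> with up to 5 item IDs.
    static HttpServer start(int port, RecommendationBackend backend) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/recommendations", exchange -> {
            int status;
            String body;
            try {
                long userId = parseUserId(exchange.getRequestURI().getQuery());
                body = toJson(backend.recommend(userId, 5));
                status = 200;
            } catch (IllegalArgumentException e) { // includes NumberFormatException
                body = "{\"error\":\"bad or missing user parameter\"}";
                status = 400;
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes); // closing the body also closes the exchange
            }
        });
        server.start();
        return server;
    }
}
```

For instance, `RecommenderRestService.start(8080, (user, n) -> List.of(104L, 106L))` would answer `GET /recommendations?user=1` with `[104,106]`. Keeping the HTTP layer this thin lets phone, web and third-party clients share one recommender instance.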
[Figure: distribution of readers by the number of news articles/topics they read. Y-axis: no. of readers with x articles each, and no. of readers with x topics each; x-axis: number of news articles/topics.]
Learnings
Know thy user
Frequency of visits
Preference logic with respect to the user
Questions
Thank you
viraj@gslab.com viraj.paripatyadar@gmail.com