Information Filtering
Part II: Collaborative Filtering
Chengxiang Zhai
• User similarity
– If Jamie liked the paper, I’ll like the paper
– ? If Jamie liked the movie, I’ll like the movie
– Suppose Jamie and I viewed similar movies in the past six months …
• Item similarity
– Since 90% of those who liked Star Wars also liked
Independence Day, and, you liked Star Wars
– You may also like Independence Day
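The item-similarity intuition above ("90% of those who liked Star Wars also liked Independence Day") can be sketched as a simple co-occurrence estimate of P(likes Y | likes X). The data and the function name below are illustrative only, not from any particular system.

```python
# Sketch of the item-similarity intuition: estimate P(likes Y | likes X)
# from co-occurrence counts in hypothetical per-user like-sets.

def cooccurrence_confidence(likes, x, y):
    """Fraction of users who liked x that also liked y."""
    liked_x = [u for u, items in likes.items() if x in items]
    if not liked_x:
        return 0.0
    liked_both = [u for u in liked_x if y in likes[u]]
    return len(liked_both) / len(liked_x)

# Toy data: 9 of the 10 users who liked "Star Wars" also liked
# "Independence Day".
likes = {f"u{i}": {"Star Wars", "Independence Day"} for i in range(9)}
likes["u9"] = {"Star Wars"}

print(cooccurrence_confidence(likes, "Star Wars", "Independence Day"))  # 0.9
```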
Collaborative Filtering vs. Content-based Filtering
• Basic filtering question: Will user U like item X?
• Two different ways of answering it
– Look at what items U likes => characterize X => content-based filtering
– Look at who likes X => characterize U => collaborative filtering
• Can be combined
Rating-based vs. Preference-based
• Rating-based: User’s preferences are encoded using
numerical ratings on items
– Complete ordering
– Absolute values can be meaningful
– But, values must be normalized to combine
• Preference-based: User’s preferences are represented by a partial ordering of items
– Partial ordering
– Easier to exploit implicit preferences
A Formal Framework for Rating
• Objects: O = {o1, o2, …, oj, …, on}
• Users: U = {u1, u2, …, ui, …, um}
• Unknown function f: U x O -> R, with Xij = f(ui, oj)
• [Rating matrix figure omitted: rows are users, columns are objects; only some cells (e.g., a few of u2’s ratings) are filled in]
• The task
– Assume known f values for some (u,o)’s
– Predict f values for other (u,o)’s
– Essentially function approximation, like other learning problems
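The partially observed function f can be sketched as a sparse mapping from (user, object) pairs to ratings; the pairs and values below are toy placeholders for the matrix on the slide.

```python
# Sketch of the formal framework: a partially observed rating function
# f: U x O -> R stored sparsely; CF must predict the missing entries
# (function approximation). All entries here are hypothetical.

ratings = {                     # known values of f(u, o)
    ("u1", "o2"): 3.0,
    ("u2", "o1"): 1.5,
    ("u2", "o3"): 2.0,
}

def f(u, o):
    """Return the known rating, or None if it must be predicted."""
    return ratings.get((u, o))

print(f("u2", "o1"))   # known: 1.5
print(f("u1", "o1"))   # unknown: None (a CF algorithm must predict it)
```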
Where are the intuitions?
• Similar users have similar preferences
– If u ≈ u’, then for all o’s, f(u,o) ≈ f(u’,o)
• Similar objects have similar user preferences
– If o ≈ o’, then for all u’s, f(u,o) ≈ f(u,o’)
• In general, f is “locally constant”
– If u ≈ u’ and o ≈ o’, then f(u,o) ≈ f(u’,o’)
– “Local smoothness” makes it possible to predict unknown values by interpolation or extrapolation
• What does “local” mean?
Two Groups of Approaches
• Memory-based approaches
– f(u,o) = g(u)(o) ≈ g(u’)(o) if u ≈ u’
– Find “neighbors” of u and combine g(u’)(o)’s
• Model-based approaches
– Assume structures/model: object cluster, user cluster, f’
defined on clusters
– f(u,o) = f’(cu, co)
– Estimation & Probabilistic inference
Memory-based Approaches (Breese et al. 98)
• General ideas:
– x_{ij}: rating of object j by user i
– n_i: average rating of all objects by user i
– Normalized ratings: v_{ij} = x_{ij} - n_i
– Memory-based prediction:
  \hat{v}_{aj} = k \sum_{i=1}^{m} w(a,i) v_{ij}, \quad \hat{x}_{aj} = \hat{v}_{aj} + n_a, \quad k = 1 / \sum_{i=1}^{m} |w(a,i)|
• Specific approaches differ in the weight w(a,i)
• Pearson correlation coefficient:
  w_p(a,i) = \frac{\sum_j (x_{aj} - n_a)(x_{ij} - n_i)}{\sqrt{\sum_j (x_{aj} - n_a)^2} \sqrt{\sum_j (x_{ij} - n_i)^2}}
• Cosine measure:
  w_c(a,i) = \frac{\sum_{j=1}^{n} x_{aj} x_{ij}}{\sqrt{\sum_{j=1}^{n} x_{aj}^2} \sqrt{\sum_{j=1}^{n} x_{ij}^2}}
• Many other possibilities!
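The memory-based prediction scheme can be sketched end to end: Pearson-correlation weights over commonly rated items, then a mean-centered weighted sum. The `ratings` dictionary is toy data, and restricting the correlation to commonly rated items is one common reading of the formula, not a claim about the paper's exact implementation.

```python
import math

# Sketch of memory-based prediction (in the style of Breese et al. 98):
# predict user a's rating of object j as a's mean rating plus a weighted
# sum of the other users' mean-centered ratings of j, with
# Pearson-correlation weights. All ratings below are toy data.

ratings = {
    "a":  {"o1": 4, "o2": 5, "o3": 1},
    "u1": {"o1": 5, "o2": 5, "o3": 2, "o4": 4},
    "u2": {"o1": 1, "o2": 2, "o3": 5, "o4": 2},
}

def mean(r):
    return sum(r.values()) / len(r)

def pearson(a, i):
    """w_p(a, i), summing over the items both users rated."""
    common = set(ratings[a]) & set(ratings[i])
    na, ni = mean(ratings[a]), mean(ratings[i])
    num = sum((ratings[a][j] - na) * (ratings[i][j] - ni) for j in common)
    da = math.sqrt(sum((ratings[a][j] - na) ** 2 for j in common))
    di = math.sqrt(sum((ratings[i][j] - ni) ** 2 for j in common))
    return num / (da * di) if da and di else 0.0

def predict(a, j):
    """x-hat_{aj} = n_a + k * sum_i w(a,i) * (x_{ij} - n_i)."""
    na = mean(ratings[a])
    peers = [i for i in ratings if i != a and j in ratings[i]]
    k = 1.0 / sum(abs(pearson(a, i)) for i in peers)   # normalizer
    return na + k * sum(pearson(a, i) * (ratings[i][j] - mean(ratings[i]))
                        for i in peers)

print(round(predict("a", "o4"), 2))  # 3.57
```

Note how u1 (who agrees with a) pulls the prediction up while u2 (who disagrees) contributes with a negative weight.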
Improving User Similarity Measures (Breese et al. 98)
• General ideas
– Assume that the data/ratings are explained by a probabilistic model with parameter θ
– Estimate/learn the model parameter θ based on the data
– Predict an unknown rating using E_θ[x_{k+1} | x_1, …, x_k], computed with the estimated model:
  E_θ[x_{k+1} | x_1, …, x_k] = \sum_r p(x_{k+1} = r | x_1, …, x_k, θ) \cdot r
• Specific methods differ in the model used and how the model is estimated
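The expectation above is just a probability-weighted average over the rating scale. A minimal sketch, assuming a hypothetical, already-estimated posterior over a 1..5 scale:

```python
# Sketch of model-based prediction: the predicted rating is the expected
# value of x_{k+1} under p(x_{k+1} = r | x_1..x_k, theta). The posterior
# below is hypothetical; estimating theta is a separate step.

def expected_rating(posterior):
    """posterior: dict mapping rating r -> p(x_{k+1} = r | history)."""
    assert abs(sum(posterior.values()) - 1.0) < 1e-9
    return sum(r * p for r, p in posterior.items())

posterior = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.25}
print(round(expected_rating(posterior), 2))  # 3.7
```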
Probabilistic Clustering
• Clustering users based on their ratings
– Assume ratings are observations of a
multinomial mixture model with parameters
p(C), p(xi|C)
– Model estimated using standard EM
• CF + content-based filtering
• Generative model: (u,d,w) as observations, z as a hidden variable
• Standard EM; essentially clustering the joint data
• Evaluation on ResearchIndex data
• Found it’s better to treat (u,w) as observations
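One EM iteration for a mixture model over discrete ratings can be sketched as follows. This is a deliberately simplified two-cluster model (rating probabilities shared across items, toy data, hypothetical initial parameters), not the exact model from the slides.

```python
import math

# Sketch (simplified, not the paper's exact model): one EM iteration for
# a two-cluster multinomial mixture over users' discrete ratings.
# pC and pR are hypothetical initial parameters; ratings are toy data.

users = {"u1": {"o1": 1, "o2": 1}, "u2": {"o1": 2, "o2": 2}}
pC = {0: 0.5, 1: 0.5}
# p(rating r | cluster c), shared across items (a simplification)
pR = {0: {1: 0.8, 2: 0.2}, 1: {1: 0.2, 2: 0.8}}

def e_step():
    """Posterior p(c | user's ratings) for each user (responsibilities)."""
    resp = {}
    for u, rates in users.items():
        joint = {c: pC[c] * math.prod(pR[c][r] for r in rates.values())
                 for c in pC}
        z = sum(joint.values())
        resp[u] = {c: joint[c] / z for c in joint}
    return resp

def m_step(resp):
    """Re-estimate the cluster priors p(C) from the responsibilities."""
    m = len(users)
    return {c: sum(resp[u][c] for u in users) / m for c in pC}

resp = e_step()
print(resp["u1"][0], m_step(resp))
```

After the E-step, u1 (all low ratings) is assigned mostly to cluster 0 and u2 to cluster 1; the M-step then keeps the priors balanced.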
Evaluation Criteria (Breese et al. 98)
• Rating accuracy
– Average absolute deviation: S_a = \frac{1}{|P_a|} \sum_{j \in P_a} |\hat{x}_{aj} - x_{aj}|
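The average-absolute-deviation metric is a direct translation of the formula; the predictions and ground-truth ratings below are toy numbers.

```python
# Sketch of average absolute deviation: the mean of |predicted - actual|
# over the items P_a that user a rated in the test set (toy data).

def avg_abs_deviation(predicted, actual):
    assert predicted.keys() == actual.keys()
    n = len(actual)
    return sum(abs(predicted[j] - actual[j]) for j in actual) / n

pred  = {"o1": 3.5, "o2": 2.0, "o3": 4.0}
truth = {"o1": 4.0, "o2": 2.0, "o3": 3.0}
print(avg_abs_deviation(pred, truth))  # 0.5
```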
• Motivation
– Explicit ratings are not always available, but
implicit orderings/preferences might be available
– Only relative ratings are meaningful, even if when
ratings are available
– Combining preferences has other applications, e.g.,
• Merging results from different search engines
A Formal Model of Preferences
The Hypothesis Space H
• Initialization: w_i^1 is uniform
• Updating: w_i^{t+1} = \frac{w_i^t \, \beta^{L(R_i^t, F^t)}}{Z_t}, with β ∈ [0,1]
• L = 0 => weight stays
• L is large => weight is decreased
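The multiplicative weight update for combining preference functions (Hedge) can be sketched as below; the loss values are toy numbers, and β = 0.5 is an arbitrary illustrative choice.

```python
# Sketch of the Hedge-style update for preference combination:
# w_i^{t+1} = w_i^t * beta**L_i / Z_t, with beta in [0, 1].
# Losses and beta here are illustrative.

def hedge_update(weights, losses, beta=0.5):
    raw = [w * beta ** l for w, l in zip(weights, losses)]
    z = sum(raw)                      # Z_t keeps the weights normalized
    return [r / z for r in raw]

w = [0.25, 0.25, 0.25, 0.25]          # uniform initialization
losses = [0.0, 0.0, 1.0, 1.0]         # L = 0 -> stays, large L -> shrinks
new_w = hedge_update(w, losses)
print(new_w)
```

Experts with zero loss end up with weight 1/3 each, while the two lossy experts shrink to 1/6.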
Some Theoretical Results
• Greedy ordering: rank the node v with the highest potential value
  \pi(v) = \sum_{u \in O} R(v,u) - \sum_{u \in O} R(u,v)
  above all others
• Efficient combination (RankBoost) maintains a factored distribution:
  D_t(x_0, x_1) = v_t(x_0) \, v_t(x_1), \quad Z_t = Z_t^0 Z_t^1
  v_{t+1}(x_0) = \frac{v_t(x_0) \, e^{\alpha_t h_t(x_0)}}{Z_t^0}, \quad v_{t+1}(x_1) = \frac{v_t(x_1) \, e^{-\alpha_t h_t(x_1)}}{Z_t^1}
  D_{t+1}(x_0, x_1) = \frac{D_t(x_0, x_1) \, e^{\alpha_t (h_t(x_0) - h_t(x_1))}}{Z_t}
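The greedy ordering step above can be sketched directly: repeatedly pick the node with the highest potential among those not yet ranked. The PREF values below are toy, and `greedy_order` is an illustrative name.

```python
# Sketch of greedy ordering: repeatedly rank the node v with the highest
# potential pi(v) = sum_u PREF(v, u) - sum_u PREF(u, v) among the
# remaining nodes, then remove it. PREF values are toy data.

def greedy_order(nodes, pref):
    remaining, order = set(nodes), []
    while remaining:
        def potential(v):
            return (sum(pref.get((v, u), 0) for u in remaining) -
                    sum(pref.get((u, v), 0) for u in remaining))
        best = max(remaining, key=potential)
        order.append(best)
        remaining.remove(best)
    return order

# a preferred to b and c; b preferred to c
pref = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0}
print(greedy_order(["a", "b", "c"], pref))  # ['a', 'b', 'c']
```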
Performance Comparison: Cohen et al. 99 vs. Freund et al. 99
[Experimental results figure omitted; the setup reported # users, # movies/user, and # feedback movies]
Summary
• CF is “easy”
– The user’s expectation is low
– Any recommendation is better than none
– Making it practically useful
• CF is “hard”
– Data sparseness
– Scalability
– Domain-dependent
Summary (cont.)
• CF as a Learning Task
– Rating-based formulation
• Learn f: U x O -> R
• Algorithms
– Instance-based/memory-based (k-nearest neighbors)
– Model-based (probabilistic clustering)
– Preference-based formulation
• Learn PREF: U x O x O -> R
• Algorithms
– General preference combination (Hedge), greedy ordering
– Efficient restricted preference combination (RankBoost)
Summary (cont.)
• Evaluation
– Rating-based methods
• Simple methods seem to be reasonably effective
• Advantage of sophisticated methods seems to be limited
– Preference-based methods
• More effective than rating-based methods according to
one evaluation
• Evaluation on meta-search is weak
Research Directions
• Exploiting complete information
– CF + content-based filtering + domain knowledge +
user model …
• More “localized” kernels for instance-based methods
– Predicting movies needs different “neighbor users” than predicting books
– Suggestion: use items similar to the target item as features for finding neighbors
Research Directions (cont.)
• Modeling time
– There might be sequential patterns on the items a user
purchased (e.g., bread machine -> bread machine mix)
• Probabilistic model of preferences
– Making the preference function a probability function, e.g., P(A>B|U)
– Clustering items and users
– Minimizing preference disagreements
References
• Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). “Learning to Order Things”. Journal of AI Research, Volume 10, pp. 243-270.
• Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). “An Efficient Boosting Algorithm for Combining Preferences”. Machine Learning Journal.
• Breese, J. S., Heckerman, D., and Kadie, C. (1998). “Empirical Analysis of Predictive Algorithms for Collaborative Filtering”. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52.