
International Journal of Advanced Research in Engineering and Technology (IJARET)
ISSN 0976 - 6480 (Print), ISSN 0976 - 6499 (Online)
Volume 5, Issue 1, January (2014), pp. 73-82
IAEME: www.iaeme.com/ijaret.asp
Journal Impact Factor (2013): 5.8376 (Calculated by GISI), www.jifactor.com

ENHANCING MOVIE RECOMMENDER SYSTEM


Ronak Patel (1), Priyank Thakkar (2), K Kotecha (3)

(1) Assistant Professor, CE Department, C. S. Patel Institute of Technology, Changa - 388421, Gujarat, India
(2) Assistant Professor, CSE Department, Institute of Technology, Nirma University, Ahmedabad - 382 481, Gujarat, India
(3) Director, Institute of Technology, Nirma University, Ahmedabad - 382 481, Gujarat, India

ABSTRACT

A recommender system helps customers buy products/items efficiently and at the same time benefits the business. It can be built using approaches such as (1) Collaborative Filtering, (2) Content Based Filtering and (3) Hybrid Filtering. In a collaborative recommender system, ratings of the most similar users (in user based collaborative filtering) or items (in item based collaborative filtering) are used to predict the rating of a new item. In Content Based Filtering, a user profile is constructed from the content of the items liked by the user in the past, and recommendations are made based on the similarity between the user profile and the item profiles. Hybrid Filtering combines the collaborative and content based approaches. In this paper, we focus on the movie recommendation task. The prediction task is modelled as a classification task where the aim is to predict whether an item (a movie in our case) will be liked or disliked by the user. We propose an item based recommender which combines usage, tag and movie specific data such as genres, star cast and directors to improve the accuracy of the recommender system. We have tested our approach using the Hetrec2011-movielens-2k dataset. We use Accuracy and F-measure to evaluate the performance of the proposed system.

Key words: Movie Recommender System, Content Based Filtering, Collaborative Filtering, Hybrid Recommender System.

1. INTRODUCTION

The information about products is increasing at an exponential rate as the e-commerce industry grows and becomes more complex. In such an environment, it has become difficult for customers to find the best information about products/items in the tremendous amount of information available. To help their customers choose products/items more efficiently, major e-business
companies are developing their own recommender systems (RS). Customers benefit by receiving truly useful information about the products they are planning to buy. At the same time, the business benefits from growth in its sales.

Recommender systems emerged in the mid-90s in order to filter out irrelevant information and select content that meets users' needs. A recommendation system has been described as "an information filtering technology that produces individualized recommendations as an output or has the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options" [1]. These systems can be used for different purposes in several domains, from offering products to consumers in e-commerce to finding proper information in research. Various movie businesses such as Netflix [2], IMDB [3] and Hulu [4] also recommend movies to their customers. Although several factors affect the quality of a recommender system, recommendations based on the common viewpoints of users have become increasingly trustworthy and widely used. The recommendation task is often reduced to the problem of estimating what rating a user would give to an unseen item, or to finding a list of items that the user is most likely to enjoy.

Movie recommendation is an open research area with unanswered problems and with growing social networking data. There is a need to systematically fuse different types of data about movies and users from various sources to improve the quality of recommendations. Recommendation systems are categorized as content-based, collaborative or hybrid [5]. A content-based recommendation system recommends to the user items similar to the ones the user favoured in the past. However, it suffers from problems such as limited content analysis, overspecialization and the new user problem [5].
User-based collaborative filtering (CF) is a technique for producing personalized recommendations by computing the similarity between the current user and other users with similar choices. Thus, the current user's choice is predicted by gathering choice information from other users with similar preferences. If choices matched in the past, it is assumed that they will match in the future as well. However, it suffers from problems such as sparsity, the new user problem and the new item problem [5]. In item-based collaborative filtering, the similarity between items is found first; then, to predict the rating of item $i$ by user $u$, the ratings of $u$ for the items most similar to $i$ are used. Hybrid approaches combine collaborative and content-based methods to overcome certain limitations of these individual techniques. A hybrid recommender can be built in different ways, such as combining separate recommenders, adding content-based characteristics to collaborative models, adding collaborative characteristics to content-based models and developing a single unifying recommendation model [5]. In this paper, we propose an item based hybrid recommender that combines usage, tag and movie specific data such as genres, star cast and directors to improve the accuracy of the recommender system.

2. RELATED WORK

Separate collaborative and content-based systems can be implemented and then used to build a hybrid recommender system. Outputs obtained from the individual recommendation systems are combined linearly in [6], while [7] uses a voting scheme for the same purpose. In [8], additional ratings are calculated using a pure content-based predictor. These ratings are then used to augment the user's rating vector in collaborative filtering. Latent Semantic Indexing is used in [9] to generate a collaborative view of a collection of user profiles. A rule-based classifier using content-based and collaborative characteristics is proposed in [10].
The book recommender system proposed by Liang in [11] is built from tag information only. The authors state that tags can capture the content information of items. However, tags are
sometimes meaningful only to the users that assigned them. They can be ambiguous and can have many synonyms. The authors proposed to address this problem by expanding the tag set.

Weighted Tag Rating Recommender (WTRR), proposed in [1], extends the Weighted Tag Recommender (WTR) [11]. WTR exploits tag data but does not use ratings data or other information available about the items. Tags may not always capture the true preferences of users. This is addressed in WTRR by using actual ratings together with tags. One main difference between WTR and WTRR is that, instead of simply counting the number of times a user $u$ has tagged an item with the tag $t$, WTRR also considers the ratings of the movies which are tagged with $t$ by user $u$. We have made two key observations about WTRR: (1) it is a user-based recommender system and it does not use the information available about items apart from tags and ratings; (2) during prediction, it only uses the ratings of those movies which have also been tagged. In our approach, we also use item (movie in our case) specific information such as the genre, star cast and director of the movie. This information is used alongside ratings and tags to find the similarity between items. During rating prediction, we use all the available ratings rather than only the ratings of movies that have also been tagged.

3. ITEM-BASED COLLABORATIVE FILTERING

The first objective in item-based collaborative filtering is to find the similarity between items. In our implementation of basic item-based collaborative filtering, we have used Pearson correlation to find the similarity between items (movies), where an item's profile consists of the ratings given to it by different users. The similarity between items $i$ and $j$ is given in equation (1).

$$sim(i, j) = \frac{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{u,j} - \bar{r}_j)^2}} \tag{1}$$

where $U_{ij}$ is the set of users who have rated both $i$ and $j$, $r_{u,i}$ is the rating of item $i$ by user $u$, and $\bar{r}_i$ is the average rating of item $i$.

Once the similarity between items is calculated, we predict the ratings of unseen items as follows. To predict the rating $P_{u,m}$ of user $u$ for an unseen movie $m$, equation (2) is used.

$$P_{u,m} = \frac{\sum_{n \in N} sim(m, n)\, r_{u,n}}{\sum_{n \in N} |sim(m, n)|} \tag{2}$$

where $N$ is the ordered set of movies which are most similar to $m$ and rated by user $u$. If the predicted rating is more than 3, then we consider that the user will like the movie; otherwise it is considered that the user will dislike the movie.

4. PROPOSED APPROACH

As stated earlier, we have used the Hetrec2011-movielens-2k dataset. From this dataset, we first constructed the user-movie rating matrix and the user-movie-tag matrix. The user-movie rating matrix stores the ratings given by users to movies, while the user-movie-tag matrix stores the number of times a tag is assigned to a movie by a user. A third matrix, the user-movie sub-rating matrix, is then constructed from the two matrices. This matrix stores ratings only for those user-movie pairs where the user provided a tag as well as a rating. After preparing the matrices as above, the steps described below are followed. Our proposed approach is inspired by the work done in [1] and [11].
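As a concrete reference point for the baseline of Section 3 operating on a user-movie rating matrix, the following minimal sketch computes a Pearson item similarity in the spirit of equation (1), a weighted-sum prediction in the spirit of equation (2), and the like/dislike decision with the threshold of 3. The dictionary layout, the function names (`item_mean`, `pearson_item_sim`, `predict_rating`, `will_like`), the toy ratings, the use of the item average over all of its raters, and the choice to keep only positively correlated neighbours are all illustrative assumptions, not the authors' implementation.

```python
from math import sqrt

# Toy user-movie rating matrix (hypothetical data): ratings[user][movie] = rating.
ratings = {
    "u1": {"A": 5.0, "B": 4.5, "C": 1.0},
    "u2": {"A": 4.0, "B": 4.0, "C": 2.0, "D": 4.5},
    "u3": {"A": 1.5, "B": 2.0, "C": 5.0, "D": 1.0},
    "u4": {"A": 4.5, "C": 1.5},
}


def item_mean(ratings, item):
    """Average rating of an item over all users who rated it (assumed convention)."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)


def pearson_item_sim(ratings, i, j):
    """Pearson-style correlation between items i and j over their co-rating users."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    mean_i, mean_j = item_mean(ratings, i), item_mean(ratings, j)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j) for u in common)
    den_i = sqrt(sum((ratings[u][i] - mean_i) ** 2 for u in common))
    den_j = sqrt(sum((ratings[u][j] - mean_j) ** 2 for u in common))
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)


def predict_rating(ratings, user, movie, k=2):
    """Weighted sum over the k most similar movies already rated by the user."""
    sims = [(pearson_item_sim(ratings, movie, n), n)
            for n in ratings[user] if n != movie]
    # Assumption: only positively correlated neighbours are kept.
    neighbours = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    den = sum(abs(s) for s, _ in neighbours)
    if den == 0:
        return None  # no usable neighbour for this user and movie
    return sum(s * ratings[user][n] for s, n in neighbours) / den


def will_like(ratings, user, movie, k=2, threshold=3.0):
    """Classify as 'like' when the predicted rating exceeds the threshold."""
    predicted = predict_rating(ratings, user, movie, k)
    return predicted is not None and predicted > threshold


print(predict_rating(ratings, "u4", "B"), will_like(ratings, "u4", "B"))
```

In this toy data, movie B correlates positively with A and negatively with C, so the prediction for user u4 is driven by that user's rating of A alone. The neighbourhood size `k` plays the role of the number of nearest neighbours varied in the experiments.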

4.1 Movies Tag Profile Generation

All the users that tagged movies in the hetrec2011-movielens-2k dataset [12][3][18][19] are contained in the user set $U = \{u_1, u_2, \ldots, u_{|U|}\}$. All the movies from the corpus are contained in the movie set $M = \{m_1, m_2, \ldots, m_{|M|}\}$, while all the tags used by the users in $U$ to label movies in $M$ are contained in the tag set $T = \{t_1, t_2, \ldots, t_{|T|}\}$. Finally, $R = \{0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5\}$ is used to denote the set of all possible ratings that users can give. The following steps are performed to construct the Movies Tag Profile.

(1) Calculate the relevance of a tag to a movie as a weight.
(2) Calculate the relevance of a tag to a user as a weight.
(3) Estimate the relatedness between two tags using these weights.
(4) Construct the tag profile of the movie using the relatedness.

4.1.1 Relevance of a Tag to a Movie

To find the relevance of a tag $t$ to a movie $m$ in a way that captures ratings in addition to the tag, equation (3) is proposed in [1].

$$w_m(t) = \frac{\sum_{u \in U_{m,t}} r_{u,m}}{\sum_{u \in U_m} r_{u,m}} \tag{3}$$

where the numerator is the sum of the ratings $r_{u,m}$ assigned to the movie $m$ by all the users who used $t$ to annotate it, and $U_{m,t}$ denotes the set of users who used $t$ to tag $m$. The denominator is the sum of all the ratings from the users who tagged $m$, with $U_m$ denoting the set of those users. The true popularity of the tag $t$ with respect to a movie $m$ is now captured by the value of $w_m(t)$.

4.1.2 Relevance of a Tag to a User

In [1], the relevance of a tag to a user, which signifies how strongly the user feels about the tag, is defined as stated in equation (4).

[Equation (4)]

where the sum of the ratings assigned to the movie by all the users who used $t$ to annotate it is represented by the numerator, and the sum over all ratings assigned to the movie by all the users who tagged it is signified by the denominator.

4.1.3 Tag Relatedness Metric for the Movie

We can calculate the relatedness of two tags with respect to a movie given the relevance of a tag with respect to the user. The relatedness metric is used to avoid semantic ambiguity while constructing the movie profiles. The relatedness metric between two tags $t_x$ and $t_y$ is denoted by $rel_m(t_x, t_y)$ and it represents the degree of correspondence (or connection) between the tags with respect to movie $m$. It measures the similarity between tags $t_x$ and $t_y$ in the context of the movie $m$. The formula to calculate the tag relatedness metric is given in equation (5).

[Equation (5)]

4.1.4 Movies Tag Profile

Assuming tag $t_y$ as the representative for the movie $m$, the weight or relevance of tag $t_y$ to the movie $m$ is calculated as a summation of the relatedness $rel_m(t_x, t_y)$ between each tag $t_x$ used by movie $m$ (i.e., $t_x \in T_m$) and the target tag $t_y$. The total relevance weight of $t_y$ for the movie $m$ is denoted as $W_m(t_y)$. It is defined in equation (6).

$$W_m(t_y) = \sum_{t_x \in T_m} rel_m(t_x, t_y) \tag{6}$$

Similar to the concept of the inverse document frequency (IDF) in information retrieval, to measure the general importance of the tag in the topic preference identification of the movie, a tag's occurrence across all movies must be taken into consideration. We denote $imf(t_y)$ as the inverse movie frequency of tag $t_y$ and it is defined in equation (7).

$$imf(t_y) = \frac{1}{\log(e + n_{t_y})} \tag{7}$$

where $n_{t_y}$ is the number of movies tagged with $t_y$ and $e$ is Euler's number. It is easy to note that $0 < imf(t_y) \le 1$. The tag profile for each movie is then defined as in equation (8).

$$TP_m = \{\, W_m(t_y) \cdot imf(t_y) \mid t_y \in T \,\} \tag{8}$$
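The tag-to-movie relevance of Section 4.1.1 and the inverse movie frequency of equation (7) can be illustrated with a small sketch. It assumes that the relevance is the ratio of rating sums described in the text, that the inverse movie frequency has the form 1 / log(e + n) (consistent with the stated bound and the mention of Euler's number), and that every user who tagged a movie also rated it, as in the user-movie sub-rating matrix. The data, the tuple layout and the function names are hypothetical.

```python
from math import e, log

# Hypothetical tag assignments (user, movie, tag) and ratings keyed by (user, movie).
tag_assignments = [
    ("u1", "m1", "funny"),
    ("u2", "m1", "funny"),
    ("u3", "m1", "dark"),
    ("u1", "m2", "dark"),
]
ratings = {
    ("u1", "m1"): 4.0,
    ("u2", "m1"): 5.0,
    ("u3", "m1"): 1.0,
    ("u1", "m2"): 3.0,
}


def tag_relevance_to_movie(movie, tag):
    """Sum of ratings from users who used `tag` on `movie`,
    divided by the sum of ratings from all users who tagged `movie`."""
    users_with_tag = {u for u, m, t in tag_assignments if m == movie and t == tag}
    users_any_tag = {u for u, m, t in tag_assignments if m == movie}
    num = sum(ratings[(u, movie)] for u in users_with_tag)
    den = sum(ratings[(u, movie)] for u in users_any_tag)
    return num / den if den else 0.0


def inverse_movie_frequency(tag):
    """Assumed form 1 / ln(e + n), with n the number of movies carrying `tag`."""
    n = len({m for _, m, t in tag_assignments if t == tag})
    return 1.0 / log(e + n)


for tag in ("funny", "dark"):
    print(tag, tag_relevance_to_movie("m1", tag), inverse_movie_frequency(tag))
```

Because the users who applied "funny" to m1 also rated it highly, that tag receives most of the relevance mass for m1, whereas a plain count of tag assignments would ignore the ratings. A tag spread over more movies gets a smaller inverse movie frequency, mirroring the role of IDF.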

4.2 Movies Rating Profile Generation

The user preference of a movie takes the popularity of the movie into consideration when two movies are compared, and it is given in equation (9).

[Equation (9)]

where $n_m$ specifies the number of users who have tagged movie $m$, and $imf(u)$ designates the inverse movie frequency of user $u$, which is defined in equation (10).

$$imf(u) = \frac{1}{\log(e + |M_u|)} \tag{10}$$

where $|M_u|$ indicates the number of movies which have been tagged by user $u$.

4.3 Neighborhood Formation

In order to predict a user's rating for an unseen movie $m$, we first find the list of movies similar to $m$. The fundamental idea is to recognize, for each movie $m$, an ordered list of the $k$ most similar movies, $N = \{n_1, n_2, \ldots, n_k\}$, such that $m \notin N$ and $sim(m, n_1)$ is maximum, $sim(m, n_2)$ is the second highest and so on. The $k$-nearest movies are selected based on the similarity value. Each movie is encoded with its own topic preferences and user preferences, where topic preferences are captured by tags while user preferences are captured by ratings. The similarity between two
movies $m_i$ and $m_j$ based on tags is denoted as $sim_T(m_i, m_j)$, where $T_{ij}$ is the set of tags used to tag movies $m_i$ and $m_j$. We use the Pearson correlation coefficient to measure the similarity between two movies, each of which is represented by the set of all its weighted tags. $sim_T(m_i, m_j)$ is defined in equation (11).

$$sim_T(m_i, m_j) = \frac{\sum_{t \in T_{ij}} (v_{m_i,t} - \bar{v}_{m_i})(v_{m_j,t} - \bar{v}_{m_j})}{\sqrt{\sum_{t \in T_{ij}} (v_{m_i,t} - \bar{v}_{m_i})^2}\,\sqrt{\sum_{t \in T_{ij}} (v_{m_j,t} - \bar{v}_{m_j})^2}} \tag{11}$$

where $v_{m,t}$ is the weight of tag $t$ in the tag profile of movie $m$ and $\bar{v}_m$ is the mean tag weight of movie $m$.

The similarity between two movies based on user movie preference is denoted as $sim_U(m_i, m_j)$, where $U$ is the set of all users. Given the tag and rating profiles of movies $m_i$ and $m_j$, the similarity between these two based on the tag and rating profiles is given by equation (12).

$$sim(m_i, m_j) = \alpha \cdot sim_T(m_i, m_j) + (1 - \alpha) \cdot sim_U(m_i, m_j) \tag{12}$$

where $\alpha$ is a weighting parameter such that $0 \le \alpha \le 1$. It controls the extent of the collaborative dimension of the algorithm. As we decrease the value of $\alpha$, the algorithm becomes predominantly collaborative, as the contribution of the movies' user preferences dominates. During the experimental phase, we varied $\alpha$ to see the impact on the quality of recommendations.

We have also found the similarity between movies from their genre, star cast and director profiles, and we have experimented with combinations of these profiles in the calculation of similarity between movies. In the star-cast profile of an item, we have included only the first five actors of each movie, in the order in which they appear on the movie's IMDB cast page. Pearson correlation is used throughout to calculate the similarity between movies.
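The role of the weighting parameter in equation (12) can be illustrated with a short sketch that blends a tag-based and a rating-based Pearson similarity. It assumes that the parameter weights the tag-based similarity and its complement weights the rating-based one, which matches the statement that lowering it makes the algorithm more collaborative. The profile vectors and the function names `pearson` and `combined_similarity` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (sqrt(sum((a - mean_x) ** 2 for a in x))
           * sqrt(sum((b - mean_y) ** 2 for b in y)))
    return num / den if den else 0.0


def combined_similarity(tags_i, tags_j, ratings_i, ratings_j, alpha):
    """alpha * tag-based similarity + (1 - alpha) * rating-based similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sim_tags = pearson(tags_i, tags_j)
    sim_ratings = pearson(ratings_i, ratings_j)
    return alpha * sim_tags + (1.0 - alpha) * sim_ratings


# Hypothetical profiles of two movies: weighted tags and ratings by the same users.
tags_i, tags_j = [0.6, 0.3, 0.1], [0.5, 0.4, 0.1]
ratings_i, ratings_j = [5.0, 4.0, 1.0, 2.0], [1.0, 2.0, 5.0, 4.0]

for alpha in (0.0, 0.5, 0.9, 1.0):
    print(alpha, combined_similarity(tags_i, tags_j, ratings_i, ratings_j, alpha))
```

In this toy case the two movies look alike by tags but are rated in opposite ways, so the blended similarity moves from strongly negative to strongly positive as the parameter goes from 0 to 1. The same linear blend, with fixed weights, is what Experiment 15 applies to tag-based and genre-based similarities.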

Figure 1. Conceptual Flow of the Proposed Framework


4.4 Rating Prediction Formula

To predict the rating of user $u$ for an unseen movie $m$, we use the formula given in equation (2). If the predicted rating is more than 3, then we consider that the user will like the movie; otherwise it is considered that the user will dislike the movie. These steps are summarized in Fig. 1.

5. EMPIRICAL EVALUATION

In this section, we discuss the dataset used, the experimental methodology and the measures used to evaluate our system.

5.1 Dataset

We have used the dataset hetrec2011-movielens-2k [12][3][18][19], dated May 2011, in our experiments. Cantador et al. [12] have made it available to the public. It is based on the original MovieLens10M dataset, published by the GroupLens research group. In this dataset, movies also refer to their corresponding web pages at the IMDB website. The dataset contains 2,113 users, 10,197 movies and a total of 13,222 unique tags. These tags fall into 47,957 tag assignment tuples of the form [user, tag, movie]. It also contains 855,598 user ratings ranging from 0.5 to 5.0, in increments of 0.5, leading to a total of 10 distinct rating values. There is an average of 405 ratings per user, and 85 per movie. There are 20 genre types, 20,809 movie-genre assignments, 4,060 directors and 95,321 actors. There are on average 22 actors per movie. We have preprocessed the data to construct the user-movie rating matrix and the user-movie-tag matrix. A third matrix, the user-movie sub-rating matrix, is then constructed from the two matrices. This matrix stores the ratings of only those movies which have also been tagged. In the construction of the star-cast profile of a movie, only those actors who have worked in more than 2 movies are considered. This dataset has been previously used in [13][14][15].

5.2 Evaluation Measure

Accuracy and F-measure are used as the evaluation measures in our work. Accuracy is the ratio of the number of correctly classified instances in the test set to the total number of instances in the test set.
In our work, we consider a user liking a movie as the positive class and a user disliking a movie as the negative class. In this sense, true positive (TP), false negative (FN), false positive (FP) and true negative (TN) are defined as under [16].

TP: the number of correct classifications of the positive instances
FN: the number of incorrect classifications of the positive instances
FP: the number of incorrect classifications of the negative instances
TN: the number of correct classifications of the negative instances

Based on the above formulations, precision ($p$) and recall ($r$) are defined in equations (13) and (14) respectively.

$$p = \frac{TP}{TP + FP} \tag{13}$$

$$r = \frac{TP}{TP + FN} \tag{14}$$

F-measure ($F$) is used to compare classifiers on a single measure and it is represented by equation (15).

$$F = \frac{2pr}{p + r} \tag{15}$$
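The measures of this section can be sketched in a few lines. The code below computes accuracy together with precision, recall and F-measure for the positive (like) class, repeats the computation with the negative (dislike) class as the target, and combines the two F-measures into a weighted average. Weighting each class by its number of test instances is an assumption about what "weighted average" means here; the function names and the toy labels are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure from counts for one target class."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(actual, predicted):
    """Return accuracy, F for likes, F for dislikes and their weighted average."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    accuracy = (tp + tn) / len(pairs)
    _, _, f_pos = precision_recall_f(tp, fp, fn)
    # With dislike as the target class, TN, FN and FP take the roles of TP, FP and FN.
    _, _, f_neg = precision_recall_f(tn, fn, fp)
    # Assumption: each class is weighted by its number of actual instances.
    weighted_f = ((tp + fn) * f_pos + (tn + fp) * f_neg) / len(pairs)
    return accuracy, f_pos, f_neg, weighted_f


# Hypothetical test fold: True means "like", False means "dislike".
actual = [True, True, True, False, False, True, False, True]
predicted = [True, True, False, False, True, True, False, True]
print(evaluate(actual, predicted))
```

Reporting the weighted average rather than the positive-class F-measure alone keeps a classifier that labels every movie as liked from looking better than it is when likes dominate the test set.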

Precision, recall and F-measure for positive examples are calculated using the above formulations. We also compute the same for negative examples and then report their weighted average in the results section.

5.3 Experimental Methodology

To evaluate and compare the outcomes of our experiments, we carried out 5-fold cross validation for all the experiments. For each experiment, we selected 20 items as the target items. These items are rated and tagged by a minimum of 20 and a maximum of 50 users. The following experiments were carried out.

Experiment 1: Basic item-based collaborative filtering where similarity between movies is found using the user-movie rating matrix and predictions are made using the user-movie rating matrix.
Experiment 2: Hybrid filtering where similarity between movies is found using the genre profile of the movies and predictions are made using the user-movie rating matrix.
Experiment 3: Hybrid filtering where similarity between movies is found using the genre and star cast profiles of the movies and predictions are made using the user-movie rating matrix.
Experiment 4: Hybrid filtering where similarity between movies is found using the genre, star cast and director profiles of the movies and predictions are made using the user-movie rating matrix.
Experiment 5: Hybrid filtering where similarity between movies is found using the Boolean tag profile of the movies and predictions are made using the user-movie rating matrix.
Experiment 6: Hybrid filtering where similarity between movies is found using the bag-of-words tag profile of the movies and predictions are made using the user-movie rating matrix.
Experiment 7: Hybrid filtering where similarity between movies is found using the term frequency (TF) [17] tag profile of the movies and predictions are made using the user-movie rating matrix.
Experiment 8: Hybrid filtering where similarity between movies is found using the term frequency inverse document frequency (TFIDF) [17] tag profile of the movies and predictions are made using the user-movie rating matrix.
Experiment 9: Hybrid filtering where similarity between movies is found by setting $\alpha = 0.9$ in equation (12) and predictions are made using the user-movie sub-rating matrix.
Experiment 10: Hybrid filtering where similarity between movies is found by setting $\alpha = 0.9$ in equation (12) and predictions are made using the user-movie rating matrix.
Experiment 11: Hybrid filtering where similarity between movies is found by setting $\alpha = 1.0$ in equation (12) and predictions are made using the user-movie rating matrix.
Experiment 12: Hybrid filtering where similarity between movies is found by modeling movie profiles as a combination of tag profiles ($\alpha = 1.0$ in equation (12)) and ratings, and predictions are made using the user-movie rating matrix.
Experiment 13: Hybrid filtering where similarity between movies is found by modeling movie profiles as a combination of tag profiles ($\alpha = 1.0$ in equation (12)) and the genre profile, and predictions are made using the user-movie rating matrix.
Experiment 14: Hybrid filtering where similarity between movies is found by modeling movie profiles as a combination of ratings and the genre profile, and predictions are made using the user-movie rating matrix.
Experiment 15: Hybrid filtering where we first find the similarity between movies using the movies' tag profile and then find the similarity between movies using their genre profile. To compute the final similarity between movies, we combine these two similarities with weights 0.8 and 0.2 respectively.
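The four tag profiles compared in Experiments 5 to 8 can be made concrete with a small sketch. The definitions used here are assumptions, since the text does not spell them out: Boolean marks whether a tag was assigned at all, bag-of-words keeps the raw assignment count, TF divides each count by the movie's total number of tag assignments, and TF-IDF multiplies TF by log(N / df), with N the number of movies and df the number of movies carrying the tag. The tag counts and function names are hypothetical.

```python
from math import log, sqrt

# Hypothetical tag counts: counts[movie][tag] = number of times the tag was assigned.
counts = {
    "m1": {"funny": 3, "dark": 1},
    "m2": {"funny": 1, "classic": 2},
    "m3": {"dark": 4},
}
vocab = sorted({t for tags in counts.values() for t in tags})  # fixed tag order


def boolean_profile(movie):
    return [1.0 if t in counts[movie] else 0.0 for t in vocab]


def bow_profile(movie):
    return [float(counts[movie].get(t, 0)) for t in vocab]


def tf_profile(movie):
    total = sum(counts[movie].values())
    return [c / total for c in bow_profile(movie)]


def tfidf_profile(movie):
    n_movies = len(counts)
    idf = [log(n_movies / sum(1 for m in counts if t in counts[m])) for t in vocab]
    return [tf * w for tf, w in zip(tf_profile(movie), idf)]


def pearson(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (sqrt(sum((a - mean_x) ** 2 for a in x))
           * sqrt(sum((b - mean_y) ** 2 for b in y)))
    return num / den if den else 0.0


for build in (boolean_profile, bow_profile, tf_profile, tfidf_profile):
    print(build.__name__, pearson(build("m1"), build("m2")))
```

Under these assumed definitions, the bag-of-words and TF profiles of a movie differ only by a positive constant factor, and Pearson correlation is unchanged by such rescaling, so both yield the same movie similarities. That would be consistent with the identical figures reported for Experiments 6 and 7 in Table 1, although the source does not state how the two profiles were built.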

6. RESULTS AND DISCUSSION

The performance of the recommender under the various settings discussed in the last section is shown in Table 1. We have experimented with varying neighbourhood sizes but, due to space limitations, for each technique we report results for the neighbourhood size at which it performed best. It is evident from the results that the hybrid recommender system outperforms the basic item-based collaborative filtering in all settings apart from those in experiments 5, 6, 7 and 9. The approach proposed in [1] uses the user-movie sub-rating matrix to calculate the rating to be predicted. We propose to use the user-movie rating matrix instead. Use of the user-movie sub-rating matrix is natural for the construction of the movie profile and for finding the similarity between movies, but we advocate using the user-movie rating matrix rather than the user-movie sub-rating matrix during the rating prediction phase. This allows us to predict a rating from a larger number of ratings, which leads to an improvement in performance.

7. CONCLUSION & FUTURE WORK

We propose an item-based hybrid filtering approach which combines usage, tag and content data of movies. The movie recommendation task is modelled as a classification problem where our aim is to predict whether the user will like or dislike the movie. The movie recommender system we propose exploits movie specific data such as movie genres, star cast and directors in addition to the ratings and tags. Results show that combining the right type of data in the right manner in the phases of constructing the item profile and calculating the item similarity improves the quality of recommendations. It is also seen that using the user-movie rating matrix rather than the user-movie sub-rating matrix for predictions improves the quality of recommendations. In future, we plan to model the problem in a machine learning framework.

Table 1. Experimental Results

Experiment       Number of Nearest Neighbours   Accuracy   F-measure
Experiment 1     100                            0.7024     0.6880
Experiment 2     100                            0.7198     0.7220
Experiment 3     500                            0.7454     0.7510
Experiment 4     500                            0.7454     0.7510
Experiment 5     40                             0.7101     0.6775
Experiment 6     40                             0.7090     0.6567
Experiment 7     40                             0.7090     0.6567
Experiment 8     40                             0.7465     0.6956
Experiment 9     5                              0.6698     0.5957
Experiment 10    20                             0.7570     0.7384
Experiment 11    20                             0.7570     0.7384
Experiment 12    100                            0.7117     0.6921
Experiment 13    100                            0.7430     0.7258
Experiment 14    100                            0.7198     0.7220
Experiment 15    10                             0.7726     0.7511


REFERENCES

1. Swapnill Nagar, A Hybrid Recommender: User Profiling from Tags/Keywords and Ratings, Master's Thesis, Rajiv Gandhi Technical University, 2012.
2. http://www.netflix.com
3. http://www.imdb.com
4. http://www.hulu.com
5. Alexander Tuzhilin, Gediminas Adomavicius, Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, IEEE Transactions on Knowledge and Data Engineering, Volume 17(6), pages 734-749, June 2005.
6. M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin, Combining Content-Based and Collaborative Filters in an Online Newspaper, Proc. ACM SIGIR '99 Workshop Recommender Systems: Algorithms and Evaluation, Aug. 1999.
7. M. Pazzani, A Framework for Collaborative, Content-Based, and Demographic Filtering, Artificial Intelligence Rev., pages 393-408, Dec. 1999.
8. P. Melville, R.J. Mooney, and R. Nagarajan, Content-Boosted Collaborative Filtering for Improved Recommendations, Proc. 18th Nat'l Conf. Artificial Intelligence, 2002.
9. I. Soboroff and C. Nicholas, Combining Content and Collaboration in Text Filtering, Proc. Int'l Joint Conf. Artificial Intelligence Workshop: Machine Learning for Information Filtering, Aug. 1999.
10. C. Basu, H. Hirsh, and W. Cohen, Recommendation as Classification: Using Social and Content-Based Information in Recommendation, Recommender Systems: Papers from 1998 Workshop, Technical Report WS-98-08, AAAI Press, 1998.
11. Huizhi Liang, Yue Xu, Yuefeng Li, Richi Nayak, Gavin Shaw, A Hybrid Recommender Systems Based on Weighted Tags, International Conference on Data Mining (SDM2010), May 2011.
12. I. Cantador, P. Brusilovsky, and T. Kuflik, Second Workshop on Information Heterogeneity and Fusion in Recommender Systems (Hetrec 2011), In Proceedings of the Fifth ACM Conference on Recommender Systems, pages 387-388, ACM, 2011.
13. E. Bothos, K. Christidis, D. Apostolou, and G. Mentzas, Information Market Based Recommender Systems Fusion, In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pages 1-8, ACM, 2011.
14. A. Said, E.W. De Luca, B. Kille, B. Jain, I. Micus, and S. Albayrak, KMulE: A Framework for User-Based Comparison of Recommender Algorithms, In Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, ACM, 2012.
15. C. Jones, J. Ghosh, and A. Sharma, Learning Multiple Models for Exploiting Predictive Heterogeneity in Recommender Systems, In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, HetRec '11, pages 17-24, New York, NY, USA, ACM, 2011.
16. Bing Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer, 2007.
17. Zdravko Markov, Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure and Usage, Wiley-Interscience, A John Wiley & Sons, Inc., Publication, 2007.
18. http://www.grouplens.org
19. http://www.rottentomatoes.com
20. Paulo J. G. Lisboa, Huda Naji Nawaf and Wesam S. Bhaya, Recommendation System Based on Association Rules Applied to Consistent Behavior Over Time, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 412 - 421, ISSN Print: 0976 - 6367, ISSN Online: 0976 - 6375.
21. Anuj Verma and Kishore Bhamidipati, A Survey of Memory Based Methods for Collaborative Filtering Based Techniques for Online Recommender Systems, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 366 - 372, ISSN Print: 0976 - 6367, ISSN Online: 0976 - 6375.
