Project Proposal 2

Copyright:

Attribution Non-Commercial (BY-NC)

Uploaded by

potatoid

B656: Web Mining, Prof. Fil Menczer
Project proposal: Learning to Predict Movie Ratings From the Netflix Dataset

The Netflix Prize is a competition in which contestants are asked to devise a learning algorithm that
will accurately predict movie ratings based on users' past rating history. Netflix set up the prize in
2006 to encourage the development of an algorithm that performs at least 10% better than Netflix's own
Cinematch algorithm, as measured by the root mean squared error (RMSE) of predicted ratings against
actual ratings. Since the start of the competition, hundreds of teams have entered, and the current
leading solutions improve on Cinematch by more than 9%.
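
As a minimal sketch of the evaluation measure described above, RMSE and the relative improvement over a
baseline could be computed as follows. The function names and the toy ratings are our own illustration and
are not taken from the Netflix Prize materials.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which candidate_rmse is lower than baseline_rmse."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy ratings, made up for illustration only.
actual = [4, 3, 5, 1]
baseline_predictions = [3, 3, 4, 2]
candidate_predictions = [4, 3, 4, 1]

baseline_score = rmse(baseline_predictions, actual)
candidate_score = rmse(candidate_predictions, actual)
improvement = relative_improvement(baseline_score, candidate_score)
```

Under this definition, a candidate meets the 10% target when `improvement` is at least 0.10 relative to
the baseline's RMSE.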

In our project, we propose to use the Netflix dataset to investigate and compare the accuracy with
which supervised learning algorithms predict movie ratings by users. This will include implementing
classifiers such as Naïve Bayes and SVM (and potentially others as our research progresses), as well
as investigating metrics for similarity between users.
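
We have not yet chosen a similarity metric. As one possible candidate, and purely as an assumption for
illustration, the sketch below computes the Pearson correlation between two users over the movies both
have rated. The dictionary layout (movie ID mapped to rating) and the function name are hypothetical.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over co-rated movies; 0.0 when it is undefined.

    ratings_a and ratings_b map movie IDs to numeric ratings.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too little overlap to measure a correlation
    a = [ratings_a[m] for m in common]
    b = [ratings_b[m] for m in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # one user gave identical ratings to every shared movie
    return covariance / math.sqrt(var_a * var_b)


# Toy users, made up for illustration only.
alice = {1: 5, 2: 3, 3: 1}
bob = {1: 4, 2: 3, 3: 2, 4: 5}
carol = {1: 1, 2: 3, 3: 5}
```

Values near 1 indicate users who rank their shared movies alike, and values near -1 indicate opposite
tastes. Other metrics, such as cosine similarity, would fit the same interface.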

In addition, we intend to extend the original Netflix dataset with additional features. This will let us
compare how the algorithms perform on data of higher dimension, and it will also allow us to
investigate whether the choice of learning algorithm or the availability of more background knowledge
has the greater effect on classification performance. We suspect that using additional data features
will do more to improve our results than the choice of algorithm.

In more detail, the current Netflix dataset consists of <UserID, Rating, Date> data items for each of
about 18,000 movies, identified by movie IDs. There is also a mapping from IDs to movie titles.
We plan to use the movie titles to crawl movie portal sites (such as IMDb or Rotten Tomatoes; the
choice is still to be decided) and gather additional features such as genre, director, writer, lead
actor, and lead actress. The exact set of features might change slightly, and we might add more
features if we see fit. Additionally, if time allows, we would like to mine movie reviews to come up
with a list of adjectives (e.g. “boring”, “sad”, “sentimental”) that describe each movie, and use
these with our data as well.
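
To illustrate how the rating items and the crawled features might be combined into training examples,
here is a minimal sketch. It assumes each rating arrives as a comma-separated "UserID,Rating,Date" line
with an ISO-formatted date. The class, the function names, and the feature keys are hypothetical and
will depend on the features we finally collect.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    movie_id: int
    user_id: int
    rating: int
    rated_on: date


def parse_rating(movie_id, line):
    """Parse one 'UserID,Rating,Date' line (assumed format) for a given movie."""
    user_id, rating, rated_on = line.strip().split(",")
    return Rating(movie_id, int(user_id), int(rating), date.fromisoformat(rated_on))


def to_example(rating, movie_features):
    """Join a rating with the crawled features of its movie.

    Returns (features, label), where label is the rating to be predicted.
    Movies without crawled features contribute only the rating-derived fields.
    """
    features = {"user_id": rating.user_id, "year_rated": rating.rated_on.year}
    features.update(movie_features.get(rating.movie_id, {}))
    return features, rating.rating


# Hypothetical crawled features keyed by movie ID (values made up).
movie_features = {1: {"genre": "Drama", "director": "Jane Doe"}}

example_rating = parse_rating(1, "1488844,3,2005-09-06\n")
features, label = to_example(example_rating, movie_features)
```

Keeping the join separate from the parsing lets us train on the original data alone (an empty feature
mapping) and on the extended data with the same code, which is the comparison we want to make.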

Since the Netflix dataset is huge, we will consider dividing it into several groups according to
a shared property (e.g., genre) and then building training/test sets by selecting elements from each
group. In addition, we will consider using Latent Semantic Indexing (LSI) to reduce the computation
time of some of our training algorithms. Since the dataset provided by Netflix is in a plain text
format, we will import the data into PostgreSQL, which we believe can provide good performance with
large volumes of data.
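
The grouping idea above can be sketched as a stratified split: items are grouped by a property and a
fixed fraction of each group goes to the test set. This is only one way to realize it. The function
name, the 20% test fraction, and the toy items are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(items, group_of, test_fraction=0.2, seed=0):
    """Split items into (train, test), sampling the same fraction from each group.

    group_of maps an item to its group key (for example, its genre).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    train, test = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy (movie, genre) pairs, made up for illustration only.
items = [("drama-%d" % i, "drama") for i in range(10)]
items += [("comedy-%d" % i, "comedy") for i in range(5)]

train, test = stratified_split(items, group_of=lambda item: item[1])
```

Sampling within each group keeps small genres represented in both sets, which a plain random split
over the whole dataset does not guarantee.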

We are considering developing the initial implementation of the classifiers and setting up our crawling
infrastructure in parallel. Before we begin any implementation, we need to research and evaluate
existing APIs or open source projects to help with the following portions of our project:
– Classification algorithms
– Crawler
– HTML parser
– Some storage/indexing system
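
As a rough sketch of what the HTML parsing step could look like before we settle on a library, the
example below uses Python's standard `html.parser` module to pull feature values out of a page. The
page markup, the class names ("genre", "director"), and the `FieldExtractor` class are hypothetical.
Real portal pages will be structured differently and may be better served by an existing parser
project.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the text of tags whose class attribute names a wanted field.

    Simplification: capturing stops at the first closing tag of any kind,
    so nested markup inside a field is not handled.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.wanted:
            self._current = css_class

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields.setdefault(self._current, []).append(data.strip())

    def handle_endtag(self, tag):
        self._current = None


# Hypothetical page fragment, made up for illustration only.
page = (
    '<div><span class="genre">Drama</span>'
    '<span class="director">Jane Doe</span>'
    "<p>ignored text</p></div>"
)

extractor = FieldExtractor(wanted=["genre", "director"])
extractor.feed(page)
movie_fields = extractor.fields
```

The extracted dictionary could then be stored alongside the movie ID in whichever storage system we
select.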
