
Deep Learning in Recommender

Systems

Ashish Sharma
14MCB1025
Guide: Dr. Geetha S.

CONTENTS:

Problem Statement.
Architecture Design.
Literature Survey.
Proposed Algorithm.
Implementation Techniques.
Modules and their current status.
Snapshots.
Expected Results.
Milestones.
References.

PROBLEM STATEMENT OF THE PROJECT:
Unsupervised feature learning: can we learn
meaningful features from unlabeled data?
The project is primarily concerned with exploring
the application of convolutional neural networks
to classification and retrieval problems as they
pertain to images of items.

Classification of items in a dataset.

Retrieving a product based on its classification.
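As a minimal sketch of the retrieval step (an illustration, not the project's actual pipeline): once each item image is mapped to a feature vector, retrieval reduces to finding the catalog items with the highest cosine similarity to the query's features. The random vectors below stand in for CNN activations.

```python
import numpy as np

def cosine_retrieve(query, catalog, top_k=3):
    """Return indices of the top_k catalog items most similar to query."""
    q = query / np.linalg.norm(query)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per item
    return np.argsort(-scores)[:top_k]  # highest similarity first

rng = np.random.default_rng(0)
catalog = rng.standard_normal((100, 64))             # 100 items, 64-d features
query = catalog[7] + 0.01 * rng.standard_normal(64)  # query near item 7

print(cosine_retrieve(query, catalog))               # item 7 ranks first
```

In the real system the feature vectors would come from the trained network rather than a random generator; the ranking logic stays the same.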

ARCHITECTURE DESIGN:

What is Deep Learning?

Deep learning is about learning multiple levels of
representation and abstraction that help make sense of
data such as images, sound, and text.
Train networks with many layers.
Multiple layers work together to build an improved feature space:
the first layer learns first-order features (e.g., edges), and the
second layer learns higher-order features (combinations of
first-layer features, such as combinations of edges).
In current models, layers often learn in an unsupervised mode,
discovering general features of the input space that serve
multiple related tasks (image recognition, etc.).
The final-layer features are then fed into supervised layer(s).
By exploiting parallelism, very large and effective deep neural
models can be trained quickly on very large datasets.
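The layer hierarchy above can be sketched as a stack of linear maps with ReLU nonlinearities (a toy forward pass with random weights, not the project's trained network): each layer's output feeds the next, so layer-2 units respond to combinations of layer-1 features, and the final layer produces class probabilities.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)
x = rng.standard_normal(784)          # e.g. a flattened 28x28 image
W1 = rng.standard_normal((256, 784))  # layer 1: low-level features (edges)
W2 = rng.standard_normal((64, 256))   # layer 2: combinations of layer-1 features
W3 = rng.standard_normal((10, 64))    # final supervised layer: class scores

h1 = relu(W1 @ x)                     # first-order features
h2 = relu(W2 @ h1)                    # higher-order features
scores = W3 @ h2
probs = np.exp(scores - scores.max())
probs /= probs.sum()                  # softmax over 10 classes
```

Training would adjust W1..W3 by backpropagation; here the weights are random purely to show the shape of the computation.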

PROPOSED ALGORITHM
MULTIMODAL LEARNING
We present a model that generates natural language descriptions
of images and their regions.
Our approach leverages datasets of images and their sentence
descriptions to learn the inter-modal correspondences
between language and visual data. Our alignment model is
based on a novel combination of convolutional neural networks
over image regions, bidirectional recurrent neural networks
over sentences, and a structured objective that aligns the two
modalities through a multimodal embedding. We then describe a
multimodal recurrent neural network architecture that uses the
inferred alignments to learn to generate novel descriptions of
image regions. We demonstrate that our alignment model
produces state-of-the-art results in retrieval experiments on the
Flickr30K and MSCOCO datasets.
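The structured objective described above can be illustrated with a standard bidirectional max-margin ranking loss over a shared embedding space (a common formulation, shown here as an assumed sketch rather than the exact objective of the project): each image's true sentence should score higher than mismatched sentences by a margin, and vice versa.

```python
import numpy as np

def ranking_loss(img_emb, sent_emb, margin=0.1):
    """Bidirectional max-margin loss; row i of each matrix is a true pair."""
    S = img_emb @ sent_emb.T          # similarity matrix; S[i, i] = true pairs
    diag = np.diag(S)
    # penalize sentences that outscore the true sentence for each image
    cost_s = np.maximum(0.0, margin + S - diag[:, None])
    # penalize images that outscore the true image for each sentence
    cost_i = np.maximum(0.0, margin + S - diag[None, :])
    np.fill_diagonal(cost_s, 0.0)     # true pairs incur no cost
    np.fill_diagonal(cost_i, 0.0)
    return cost_s.sum() + cost_i.sum()
```

With perfectly aligned embeddings (e.g., identical orthonormal rows for both modalities) the loss is zero; mismatched pairs scoring within the margin of a true pair contribute positive cost.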

PROPOSED ALGORITHM
Convolutional Networks express a single differentiable function
from raw image pixel values to class probabilities.
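This pixels-to-probabilities pipeline can be made concrete with a toy forward pass (an assumed miniature architecture, not the network used in the project): every stage, convolution, ReLU, max pooling, and softmax, is differentiable, so the whole composed function can be trained end to end by backpropagation.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D convolution of a single-channel image with one filter."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(2)
img = rng.standard_normal((8, 8))     # raw pixel values
kernel = rng.standard_normal((3, 3))  # one learned filter (random stand-in)

feat = np.maximum(conv2d(img, kernel), 0.0)         # conv + ReLU -> 6x6 map
pooled = feat.reshape(3, 2, 3, 2).max(axis=(1, 3))  # 2x2 max pool -> 3x3
W = rng.standard_normal((10, 9))                    # fully connected layer
scores = W @ pooled.ravel()
probs = np.exp(scores - scores.max())
probs /= probs.sum()                                # class probabilities
```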


IMPLEMENTATION TOOLS:

Languages used: Python, C++

Platform used: Torch7, LuaRocks

Software for visualization: Torch

MODULES:

MODULE DESCRIPTION                 CURRENT STATUS
1. Identify needs and benefits     Completed
2. Collection of dataset           Completed
3. Creation of model and dataset   Completed
4. Design and implementation       Completed
5. Creation of visualizations      Pending

EXPECTED RESULTS:
We decompose images and sentences into fragments
and infer their inter-modal alignment using a
ranking objective. Our model aligns contiguous
segments of sentences, which are more meaningful,
interpretable, and not fixed in length. We use
convolutional neural network (CNN) models for
image classification and object detection. On the
sentence side, our work takes advantage of
pretrained word vectors to obtain low-dimensional
representations of words. Finally, recurrent
neural networks have previously been used in
language modeling.
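The use of pretrained word vectors can be sketched as follows (the three-dimensional vectors and vocabulary below are made up for illustration; real pretrained embeddings such as word2vec vectors are typically 100-300 dimensional): each word maps to a low-dimensional vector, and a simple fixed-length representation of a sentence fragment is the mean of its word vectors.

```python
import numpy as np

# Stand-in for a pretrained embedding table (hypothetical tiny vocabulary).
embeddings = {
    "red":   np.array([1.0, 0.0, 0.0]),
    "shoe":  np.array([0.0, 1.0, 0.0]),
    "dress": np.array([0.0, 0.0, 1.0]),
}

def sentence_vector(words):
    """Mean of the word vectors: a crude fixed-length fragment code."""
    return np.mean([embeddings[w] for w in words], axis=0)

v = sentence_vector(["red", "shoe"])   # -> [0.5, 0.5, 0.0]
```

The alignment model would feed such word representations into a bidirectional RNN rather than averaging them, but the lookup step is the same.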

MILESTONE:

WORK TASK                          PLANNED START   PLANNED COMPLETE
1. Identify needs and benefits     September       September
2. Collection of reviews           October         October
3. Creation of model and dataset   October         November
4. Design and implementation       December        January
5. Creation of visualizations      March           March

REFERENCES:
A. Krizhevsky. Learning multiple layers of features from tiny images. Master's
thesis, Department of Computer Science, University of Toronto, 2009.
A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge 2010.
www.imagenet.org/challenges, 2010.
A. Krizhevsky and G. E. Hinton. Using very deep autoencoders for content-based
image retrieval. In ESANN, 2011.
Y. LeCun, K. Kavukcuoglu, and C. Farabet. Convolutional networks and
applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE
International Symposium on, pages 253-256. IEEE, 2010.
H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks
for scalable unsupervised learning of hierarchical representations. In Proceedings
of the 26th Annual International Conference on Machine Learning, pages 609-616.
ACM, 2009.

THANK YOU!!
