Jacob Chandran
IM 6, English 12 AP
5/15/17
Torralba, A., Murphy, K.P., and Freeman, W.T., Sharing Visual Features for Multiclass and Multiview Object Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 5, pp. 854-869, May 2007.
The human brain allows us to put different objects into categories on the spot
regardless of whether we have seen the object before or not. Even if we see a new object,
the brain still receives information on the object using words to describe it, for example
yellow, spotted, and small. Computers do not have this capability: if they do not recognize a certain object, they obtain no information about it. In order to program a computer to recognize certain objects, thousands of images
must be passed through until the computer can recognize the object based on certain
attributes. Now, computer scientists are trying to get computers to recognize semantic
qualities of an object like yellow, spotted, or small for both known and unknown objects.
This is difficult because there are many kinds of spots; if the computer did not recognize the particular kind of spot, it would return no data, whereas a human would still recognize it as a spot, even one never seen before. This
example represents the main dilemma for visual recognition which is the question of how
to get a computer to analyze an object that it had never encountered before. This paper
has helped me understand the basic abilities of visual technology up until this point in
time, and the main problem that computer scientists are trying to solve in order to make
visual technology more like that of the human brain.
The Support Vector Machine (SVM) is a very popular method for solving problems in classification and novelty detection. An important property of support vector machines is that determining the model parameters corresponds to a convex optimization problem, so any local solution is also a global optimum. However, the support vector machine is fundamentally a two-class classifier, while in practice we often need a multiclass classifier where the number of classes is greater than two. Various methods have been proposed for combining multiple two-class SVMs in order to build a multiclass classifier. One commonly used approach is the one-versus-the-rest approach. In this method, if there are K classes, K separate SVMs are constructed, where the kth model is trained using data from class k as positive examples and data from the remaining K-1 classes as negative examples. A disadvantage of this method is that combining the decisions of the individual classifiers can lead to inconsistent results in which an input is assigned to multiple classes simultaneously. Despite this disadvantage, SVM is among the best classifiers available for object classification; in our research experiments the number of classes is large (e.g., 33 classes) and the one-versus-the-rest approach is used. It is
important to know about the different equations used for Support Vector Machines
because for different situations, different equations need to be used. This article was
helpful in describing the different equations and why they are used.
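The one-versus-the-rest scheme described in this entry can be sketched in a few lines. This is only a toy illustration: a simple perceptron stands in for each two-class SVM, and the clustered data and training settings are invented for the example.

```python
# Toy one-versus-the-rest multiclass classification: K binary classifiers,
# each trained with one class as positive and the other K-1 as negative.
# A perceptron stands in for each two-class SVM (assumption for brevity).
import numpy as np

def train_binary(X, y, epochs=50, lr=0.1):
    """Train one linear classifier; labels y are in {+1, -1}."""
    w = np.zeros(X.shape[1] + 1)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:              # misclassified -> update
                w += lr * yi * xi
    return w

def train_one_vs_rest(X, labels, K):
    """K separate classifiers: class k positive, the remaining K-1 negative."""
    return [train_binary(X, np.where(labels == k, 1.0, -1.0)) for k in range(K)]

def predict(models, x):
    """Take the class with the largest score, avoiding the ambiguity of
    several raw binary decisions firing at once."""
    xb = np.append(x, 1.0)
    return int(np.argmax([w @ xb for w in models]))

# Three well-separated toy clusters, 20 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in ([0, 0], [3, 0], [0, 3])])
labels = np.repeat(np.arange(3), 20)
models = train_one_vs_rest(X, labels, K=3)
print(predict(models, np.array([3.0, 0.1])))   # point near cluster 1
```

Scoring with `argmax` rather than taking each classifier's raw positive/negative decision is one common way around the inconsistency problem noted above.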
Boyd, S., and Vandenberghe, L., Convex Optimization, Chapter 8.6, Classification, pp. 422-431, Cambridge University Press, 2004.
In a pattern recognition and classification problem, two sets of patterns are given and the task is to find a function that is positive on the first set and negative on the second. In linear discrimination, an affine function classifies one set of points as positive and the other set as negative. Geometrically, a hyperplane that separates the two sets of points is determined. When the two sets of points cannot be linearly separated, it is necessary to find an affine function that approximately classifies the points, for example one that minimizes the number of points misclassified. In general, it is very difficult to find an exact solution to this problem, but an approximate linear discrimination based on the support vector classifier can be used to solve it. The support vector machine is very successful in image classification and hence is used in our research. At APL, one of the foundational aspects of the work is programming a support vector machine, so it is necessary to understand its processes and how it can aid in classifying objects.
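The linear discrimination described here reduces to checking the sign of an affine function. A minimal sketch of the geometry, with the weight vector and bias assumed rather than learned:

```python
# An affine function f(x) = w.x + b is positive on one side of the
# hyperplane w.x + b = 0 and negative on the other. The weights below
# are invented for illustration, not fitted to data.
import numpy as np

w, b = np.array([1.0, -1.0]), 0.0      # hyperplane: x1 - x2 = 0

def classify(x):
    """Return +1 or -1 according to the sign of the affine function."""
    return 1 if w @ x + b > 0 else -1

print(classify(np.array([2.0, 1.0])))   # one side of the hyperplane
print(classify(np.array([1.0, 2.0])))   # the other side
```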
Miller, E.G., Matsakis, N.E., and Viola, P.A., Learning from One Example Through Shared Densities on Transforms, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
Fan, R., Chang, K., Hsieh, C., Wang, X., and Lin, C., LIBLINEAR: A Library for Large Linear Classification, The Journal of Machine Learning Research, vol. 9, pp. 1871-1874, 2008.
Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A.C., and Berg, T.L., Babytalk: Understanding and Generating Simple Image Descriptions, Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1601-1608, 2011.
Computer scientists have created a system to reduce a picture with many objects
into words describing the picture. The system uses a natural language processor to take
all the objects in the picture and turn them into words, similar to what the human brain
would do. This process demonstrates a refined attribute processor which can correctly
label different objects in a picture. In addition, this has the potential to make labeling
more specific, labeling objects inside a picture as opposed to just labeling the picture as
one object. This refined ability to label smaller objects in a larger picture is critical to the research done at APL, which aims to improve the computer's ability to recognize an object based on a given description.
Jayaraman, D., and Grauman, K., Zero-Shot Recognition with Unreliable Attributes, Advances in Neural Information Processing Systems 27, Dec. 2014.
Jegou, H., and Chum, O., Negative Evidences and Co-occurrences in Image Retrieval: The Benefit of PCA and Whitening, Proc. European Conference on Computer Vision (ECCV), pp. 774-787, 2012.
The main problem with large scale image retrieval and object classification is that
many approaches are limited to search in a database of only a few million images on a
single machine due to computational or memory constraints. In this paper a short vector
representation of images, which compacts images, is introduced as a possible solution to
the memory constraints. The Principal Component Analysis (PCA) method is used to
perform dimensionality reduction which would basically compact the information into a
smaller space. PCA is a two-step process: (1) centering the data and (2) selecting a decorrelated (orthogonal) basis of a subspace that minimizes the dimensionality reduction error. The method uses two feature sets for image classification: the Vector of Locally Aggregated Descriptors (VLAD) and Bag of Words (BOW). VLAD and BOW vectors are high-dimensional and therefore take up lots of space. The size of VLAD is 512 x 128 and
BOW ranges from one thousand to one million in size. To limit the size of these feature
sets while still retaining most of the important information, dimensionality reduction by
PCA can be used to compact these matrices.
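The two PCA steps named above, centering and then projecting onto an orthogonal basis that minimizes reconstruction error, can be sketched as follows; the toy 128-dimensional vectors are invented stand-ins for VLAD or BOW descriptors.

```python
# Minimal PCA dimensionality reduction: (1) center the data, then
# (2) project onto the top-d right singular vectors, an orthogonal basis
# minimizing reconstruction error.
import numpy as np

def pca_reduce(X, d):
    Xc = X - X.mean(axis=0)                       # step 1: centering
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # step 2: orthogonal basis
    return Xc @ Vt[:d].T                          # compact d-dim representation

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 128))   # 200 toy descriptors, 128 dims each
Z = pca_reduce(X, d=16)
print(Z.shape)                    # (200, 16)
```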
Lampert, C.H., Nickisch, H., and Harmeling, S., Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
This article discusses the latest research in the classification of visual objects for
which no training examples are available (zero-shot learning). Standard object classification methods use labeled training images to classify objects; however, humans are capable of identifying an object if they are provided with a description of it (e.g., a large gray animal with a trunk is an elephant). In attribute-based classification, high-level descriptions of objects such as color, shape, or geographic information are used instead of only images to classify the objects. This allows attributes of known objects to be transferred and used to describe objects of an unknown class for which no training data exists. For example, for the attribute striped, images of zebras, bees, and tigers can be used. Many tests have been done passing known and unknown objects through an
attribute-based classification system, and the results show that by including specific attribute information for classes, information can be transferred between known and unknown classes, which forms the foundation for zero-shot learning.
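The between-class attribute transfer idea can be sketched as a small example: attribute detectors trained on known classes score a new image, and the unseen class whose attribute signature best matches the scores is chosen. The attribute table and scores below are made up for illustration and are not the paper's data.

```python
# Toy attribute-based zero-shot classification: unseen classes are defined
# only by attribute signatures (hypothetical values), never by training images.
import numpy as np

unseen_classes = {
    "zebra":    np.array([1, 1, 0]),   # striped, four-legged, not gray
    "elephant": np.array([0, 1, 1]),   # not striped, four-legged, gray
}

def classify_unseen(attr_scores):
    """Pick the unseen class whose signature is closest to the
    predicted attribute scores (squared-distance match)."""
    return min(unseen_classes,
               key=lambda c: np.sum((unseen_classes[c] - attr_scores) ** 2))

# Attribute detectors (trained on known classes) score a new image:
scores = np.array([0.9, 0.8, 0.1])    # strongly striped, four-legged, not gray
print(classify_unseen(scores))
```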
Larochelle, H., Erhan, D., and Bengio, Y., Zero-Data Learning of New Tasks, Proc. 23rd National Conference on Artificial Intelligence, vol. 1, no. 2, pp. 646-651, 2008.
Machine learning has in the past required many thousands of training values and
much human involvement to recognize objects. With the rise of a new technique,
Zero-Shot Learning, systems are now becoming able to recognize unknown objects by
transferring characteristics of known objects to unknown objects. This process often requires a human to tell the machine that a certain unknown object has the same characteristics as an object in the training data. The goal of computer scientists is to produce a system for which no human involvement is necessary, where the system itself can automatically recognize similarities between seemingly different objects, as the human brain is capable of doing. As of now, machines are semi-supervised: they are able to link different attributes to words, so if a new object is introduced to the system with a definition, the machine can detect a known word and realize that the two objects share a characteristic.
The large amount of image data available on the internet can be used to develop more sophisticated and robust models and algorithms to index, retrieve, organize, and interact with images and multimedia data. ImageNet, an image database built on WordNet (whose roughly 80,000 noun synsets are sets of synonyms), is a useful resource for visual recognition applications such as image classification. ImageNet aims to populate the 80,000 synsets (synonym sets) with an average of 500-1000 clean images each. The data is collected using Amazon Mechanical Turk. The database is used as a training resource to transfer knowledge of common attributes to learn new rare objects, and as a high-quality benchmark dataset to test new algorithms. The significance of the dataset is that it introduces new semantic relations for visual modeling. Because ImageNet is uniquely linked to all nouns of WordNet, whose synsets are richly interconnected, the semantic relations of different words can be used to learn new models in zero-shot learning. The authors illustrate the usefulness of ImageNet through applications in object recognition, image classification, and automatic clustering. The scale, accuracy, diversity, and hierarchical structure of the database provide a good training resource for developing zero-shot learning methods in computer vision.
Palatucci, M., Pomerleau, D., Hinton, G., and Mitchell, T., Zero-Shot Learning with Semantic Output Codes, Neural Information Processing Systems (NIPS), Dec. 2009.
Previously, research in zero-shot learning focused on describing unknown objects based on a foundation of known objects pre-defined in a database. This research is now being applied to a new concept: comparing two unknown objects. A system built with today's technology could recognize unknown objects at a basic descriptive level, but what would happen if two similar unknown objects were introduced? The current system would likely misclassify the second object as the first unknown object it resembled. The technology in this article could potentially
combine the power of discriminative attributes and comparison. First, if discriminatory
variables were used, there would be a lower chance of confusion, but now a system is
being built so that two similar unknown objects can be distinguished by comparison
statements that would further discriminate the two objects. This would be very useful in
expanding the bounds of computer recognition. Right now, recognition is heavily based on the original training set, so scientists have had to pick objects demonstrating many characteristics so that those characteristics could be recognized in new objects. If this technology could introduce comparisons between unknown objects, the focus could shift away from the original data set and onto the attributes themselves, which would make the system more flexible. The original data sets could then be made smaller, which would free up memory and make the system quicker, because programming an original data set requires thousands of images.
Parikh, D., and Grauman, K., Interactively Building a Discriminative Vocabulary of Nameable Attributes, Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1681-1688, 2011.
Burlina, P.M., Schmidt, A.C., and Wang, I.-J., Zero-Shot Deep Learning from Semantic Attributes, IEEE International Conference on Machine Learning and Applications (ICMLA), Dec. 2015.
Rai, P., MATLAB for Machine Learning, University of Utah, Fall 2011.
Razavian, A.S., Azizpour, H., Sullivan, J., and Carlsson, S., CNN Features Off-the-Shelf: An Astounding Baseline for Recognition, Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 512-519, 2014.
Convolutional neural network OverFeat features are very powerful features for image classification and attribute detection. These generic descriptors (OverFeat features) extracted from convolutional neural networks consistently give superior results compared to state-of-the-art visual classification systems on various datasets. The OverFeat system obtains its accurate classification rate by using a linear Support Vector Machine classifier applied to a feature representation of size 4096 extracted from a layer in the net. In attribute detection, an attribute is defined as a semantic or abstract quality which different categories share. In the experiment on attribute detection, two attribute datasets, one containing shape, part, or material attributes (UIUC 64) and the other containing 9 human attributes (H3D), were used along with OverFeat features. Linear kernels with LIBSVM in a one-vs-one setup were then used for multi-class classification. At APL, research on zero-shot learning primarily deals with OverFeat features, and understanding the process by which the OverFeat network was selected is important in understanding the zero-shot learning projects.
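The one-vs-one multiclass scheme mentioned in this entry can be sketched as a majority vote over pairwise classifiers. The pairwise decision functions below are hypothetical stand-ins for trained linear SVMs on OverFeat features.

```python
# Toy one-vs-one multiclass classification: one binary classifier per pair
# of classes, combined by majority vote over all pairwise decisions.
from itertools import combinations
from collections import Counter

def one_vs_one_predict(x, classes, pairwise):
    """pairwise[(a, b)](x) returns a or b; the class with the most
    pairwise votes wins."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Invented stand-in: classify a number by which range it falls in.
classes = ["low", "mid", "high"]
pairwise = {
    ("low", "mid"):  lambda x: "low" if x < 5 else "mid",
    ("low", "high"): lambda x: "low" if x < 10 else "high",
    ("mid", "high"): lambda x: "mid" if x < 15 else "high",
}
print(one_vs_one_predict(7, classes, pairwise))   # -> mid
```

One-vs-one needs K(K-1)/2 classifiers instead of the K required by one-vs-the-rest, but each is trained on only two classes' worth of data.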
This presentation provides a basic understanding of using visual attributes for zero-shot learning. Visual attributes are visual qualities of objects, such as red, striped, or spotted; nameable attributes are those capable of being named or identified. An attribute-based detector first predicts the presence of an array of visual properties (e.g., spotted, metallic, etc.) and then uses the outputs of those models as features for object classification. Nameable attribute discovery can be performed in a feature space to identify attributes that are discriminable, and discriminable attributes can also be obtained from product descriptions. Attributes are used for zero-shot learning as follows: train relative attributes on a set of categories; describe unseen categories with comparisons (bears are furrier than giraffes but less furry than rabbits; lions are larger than dogs, as large as tigers, but less large than elephants); build a model based on the attributes for the unseen categories; and test the accuracy of that model. Unseen categories are modeled as Gaussian distributions in attribute space constrained by the category definition.
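The final idea above, modeling an unseen category as a Gaussian in attribute space and classifying by highest density, can be sketched as follows; the attribute means and the shared variance are invented for illustration.

```python
# Toy Gaussian models in attribute space: each unseen category is a
# Gaussian centered on its attribute definition; a test point is assigned
# to the category with the highest (log) density.
import numpy as np

def log_density(x, mean, var=0.25):
    """Log of an isotropic Gaussian density (additive constants dropped,
    since they do not affect the argmax)."""
    return -np.sum((x - mean) ** 2) / (2 * var)

# Category definitions as points in a 2-d attribute space (furriness, size)
category_means = {"bear": np.array([0.8, 0.7]), "rabbit": np.array([0.9, 0.1])}

def classify(x):
    return max(category_means, key=lambda c: log_density(x, category_means[c]))

print(classify(np.array([0.85, 0.15])))   # furry and small -> rabbit
```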
This article discusses a new learning method to find a mid-level feature representation that combines the semantic attribute representation with non-semantic features derived from images. A semantic representation augmented with non-semantic attributes derived from images was found to improve object classification in a visual classification system. Semantic attributes are used to transfer knowledge in zero-shot classification systems where training data is not available. The method is extended to cases where a few training samples are given, either with class annotation (supervised) or without it (unsupervised). In this method the semantic attribute representation is
augmented with additional non-semantic mid-level features to improve the classification accuracy. The non-semantic part of the representation is learned and added to the semantic part, and the additional feature dimensions overcome the shortcomings of the semantic ones.
Socher, R., Ganjoo, M., Manning, C.D., and Ng, A.Y., Zero-Shot Learning Through Cross-Modal Transfer, Advances in Neural Information Processing Systems 26 (NIPS), 2013.
This article proposes a new system called a joint system. The purpose of this
system is to combine advances in many new technologies such as zero shot learning
which identifies new objects, one shot learning which can determine objects based on few
descriptive words or attributes, and knowledge and visual attribute transfer which has the
ability to classify different objects and categorize them. The joint system model has two different methods, one for known objects and one for unknown objects: a picture is put through one set of procedures if it shows a known object and another set if the object is new or unknown. This technology finally implements zero-shot technology into a larger framework that includes memory (recognizing familiar items) and categorization (describing new objects). This technology is in its incipient stages but would be revolutionary for the work being done at APL to recognize unknown objects. By providing a way to store known objects and classify unknown objects within the same substructure, scientists could not only store mass amounts of data, but use attributes of the known objects to reinforce the categorization of unknown objects. By using the technique of a joint system along with the technology of zero-shot learning at APL, the error in classification could drop significantly, potentially producing accuracy of 90% and above.
In one-shot or zero-shot learning problems, the object categories have only one or no training example per category for classification, so conventional learning algorithms cannot function due to the lack of training examples. To solve these problems, knowledge transfer is important, wherein prior knowledge obtained from known categories is transferred to unknown categories via object attributes. Object attributes are high-level descriptions of object categories such as color, texture, shape, parts, context, etc. The semantic knowledge of the attributes represents common properties across different categories, and they can be used to transfer knowledge between known and unknown categories. In this paper, an attribute-based transfer learning framework is developed. A generative attribute model based on the Author-Topic Model is used to learn the probabilistic distribution of image features for each attribute (attribute priors). These attribute priors are used to (1) classify unseen images of target categories (zero-shot learning) and (2) facilitate learning classifiers for target categories when there is only one training
example per target category. The main contributions of the method are (1) a generative attribute model which offers flexible representations for attribute knowledge transfer, and (2) two methods that use attribute priors in learning target classifiers and incorporate the training examples of target categories when they are available. The Animals with Attributes dataset is used to demonstrate the performance of the methods.