
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org   Email: editor@ijettcs.org
Volume 3, Issue 5, September-October 2014   ISSN 2278-6856

A New Approach to Introduce Celebrities from an Image
Ms. Jaya Marotirao Jadhav¹, Ms. Deipali Vikram Gore²

¹Department of Computer Engineering, PES's Modern College of Engineering, Shivaji Nagar, Pune University, Pune, India
²Assistant Professor, Department of Computer Engineering, PES's Modern College of Engineering, Shivaji Nagar, Pune University, Pune, India

Abstract
There has been explosive growth in the number of images available on the web. Among them, celebrity images are available in large numbers, in the form of posters, photographs and images taken at different events, and celebrity-related queries rank high: most end users are interested in celebrity-related data and images. To better serve this demand, we have developed an application that provides information about a celebrity when an image is given as input. Face detection uses the HAAR cascade algorithm, which reduces the false positive rate compared with Canny edge detection and thus increases accuracy and detection speed.

Keywords: Face detection method, Common HAAR feature, HAAR Cascade Classifier, CFW dataset, PCA.

1. INTRODUCTION
A general web page does not always contain the name of the celebrity shown in its images, and noise in web data makes it difficult to identify the celebrity's name from the page text. There are two main challenges. First, the text surrounding a web image usually lacks standard grammatical structure, so it is difficult to apply natural language processing techniques to extract celebrity names from it. Second, the celebrity's face in an image may appear with different poses, makeup, expressions and occlusions caused by sunglasses or fancy hairstyles, so it is difficult to identify the celebrity through visual analysis against a normal face database. To address this challenge, the CFW dataset can be used, which contains millions of celebrity images in different poses, makeup and expressions. Work so far has been conducted on news images, where descriptive captions are usually provided and most of the time the caption contains the name of the celebrity in the image. This project provides not only the name of the celebrity but also other information related to the celebrity.
This paper presents related work on image annotation in Section 2 and introduces the proposed system using the HAAR Cascade Classifier and CFW dataset in Section 3.

2. RELATED WORK
Name annotation systems have been developed for family albums, news images, and the ARISTA project. Family photographs are normally indexed according to when, where, who and what; modern digital cameras provide the date and time as well as location data. The paper [1] focuses on how to automatically extract the "who" in family photographs. Compared with general image collections, most images in family photo albums are in colour, and the same individual often appears in a number of photographs taken on the same day or at the same event. In family photo albums, only a limited number of people, e.g. ten to fifty, are of concern, and they appear frequently. When a new photograph is imported into the system and a face is detected, the system allows the user either to provide a name label for that face, or to confirm or select a name from a recommended list derived from prior history.
Image professionals, such as journalists, need new solutions to efficiently and effectively store, index and retrieve the images they produce. The paper [2] presents an image annotation scheme that associates pictures with textual information extracted from surrounding text, relying on a large parallel text-image corpus consisting of news articles and its associated ground truth. Each image comes with a caption that is often divided into two parts: the first caption sentence, in bold, describes the image precisely, while the rest of the caption recalls the context of the article. From the corpus, article texts, image captions, or both can be used as textual information. This method allows work with large-scale corpora of any size at no manual annotation cost.
The paper [3] introduces the ARISTA project (lARge-scale Image Search To Annotation), which deals with images of popular concepts as an attempt to investigate the performance of image annotation on a real web-scale image dataset; two billion web images were leveraged for this investigation. The goal is to build a practical image annotation engine able to automatically annotate images of any popular concept. Examples of such concepts are celebrities, logos, products, landmarks, posters, patterns, etc., which are popular and therefore more likely to be duplicated. Near-duplicate images are a special type of visually similar image, and near-duplicate detection is a well-defined problem in contrast to visual similarity search.
The system in the paper [4] works as follows. The first step of the annotation framework is to automatically construct a large-scale celebrity name vocabulary from semi-structured web data. Near-duplicate images are searched across the web, and from the surrounding text a list of celebrity names is obtained with confidence scores; the final name assignment takes place among these names.

3. IMPLEMENTATION
The work proposed in this project extends the above system by introducing celebrities with additional information such as date of birth, designation, achievements and family details, rather than just name tagging. The input to the system is an image of a celebrity; the output is the name assigned to the celebrity in the image, together with other related information. Therefore, to better serve end-user demand and foster multimedia research, we propose:
1. a scalable and accurate face annotation approach to name celebrities in general web images, and
2. methods to infer more properties of the celebrity from the web, such as date of birth, designation, family background and achievements.
The system presented in this paper uses the HAAR cascade algorithm for face detection, which minimizes the false positive rate and increases both accuracy and detection speed. As the MSRA-CFW dataset is a rich source of images of various celebrities, it helps to detect faces in any pose, makeup and hairstyle; alternatively, a custom database can be built from celebrity images downloaded from Google Images. The working of the system and the algorithms used in each phase are explained in the rest of this section.
The input to the system is an image of a celebrity. First the face is detected, and then the same image is submitted to tineye.com for a similar-image search, which returns the names of celebrities who may match the query image. From this list of celebrities the best match is found by face recognition. Finally, the details are extracted from the corresponding Wikipedia page and displayed as the end result.
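The end-to-end flow just described can be sketched as follows. This is an illustrative outline only: every helper name here (detect_face, reverse_image_search, nearest_celebrity, fetch_wikipedia_details) is a hypothetical stand-in for one stage of the system, not an actual function of the implementation.

```python
def detect_face(image):
    # Stand-in for HAAR cascade face detection (Section 3.1.2).
    return image  # assume the whole image is the face region

def reverse_image_search(face):
    # Stand-in for a TinEye-style similar-image lookup (Section 3.2).
    return ["Candidate A", "Candidate B"]

def nearest_celebrity(face, candidates):
    # Stand-in for PCA-based face recognition (Section 3.3).
    return candidates[0]

def fetch_wikipedia_details(name):
    # Stand-in for reading the celebrity's Wikipedia infobox (Section 3.4).
    return {"name": name, "dob": "unknown"}

def annotate(image):
    # Pipeline wiring: detect -> search -> recognize -> fetch details.
    face = detect_face(image)
    candidates = reverse_image_search(face)
    name = nearest_celebrity(face, candidates)
    return fetch_wikipedia_details(name)
```

With the stub stages above, `annotate(image)` returns the details dictionary for the best-matching candidate.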
3.1 Algorithms used for face detection
In an image annotation system, the face in the image must be detected first. Face detection methods can be divided into four categories [5]; these categories may overlap, so an algorithm could belong to two or more of them:
Knowledge-based (rule-based) methods: encode our knowledge of human faces as a set of rules.
Feature-invariant methods: algorithms that try to find features of a face that are invariant to its angle or position.
Template matching methods: algorithms that compare input images with stored patterns of faces or features.
Appearance-based methods: template matching methods whose pattern database is learnt from a set of training images.
These methods, with their strengths and limitations, are summarized in Table 1.


Table 1: Face detection approaches with strengths and limitations

Knowledge-based methods
  Strengths: It is easy to come up with some simple rules.
  Limitations: Difficult to build an appropriate set of rules; unable to find many faces in a complex image.

Feature-invariant methods
  Strengths: Success rate of 94%.
  Limitations: Skin colour can vary significantly if lighting conditions change; skin-colour-based detection is unreliable when the face is occluded, e.g. by sunglasses.

Template matching methods
  Strengths: Define a face as a function; simple to implement.
  Limitations: Limited to frontal faces; cannot achieve good results with variations in pose, scale and shape.

Appearance-based methods
  Strengths: Use a wide variety of classification methods; sometimes two or more classifiers are combined to achieve better results.
  Limitations: Rely on techniques from statistical analysis and machine learning to find the relevant characteristics of face images.

Many different algorithms exist to perform face detection, each with its own strengths and limitations. Most are based on per-pixel analysis, and they suffer from the same problem: they are slow and expensive. An image is only a collection of colour and/or light intensity values, and analyzing these pixels for face detection is a lengthy process, made difficult by the wide variations of shape and pigmentation within a human face. Viola and Jones therefore devised an algorithm, called HAAR classifiers, to rapidly detect any object, including human faces, using AdaBoost classifier cascades based on HAAR-like features rather than individual pixels. Two face detection algorithms are examined below: the Canny edge detector and HAAR cascade classifiers.
3.1.1 Canny edge detector
The algorithm was presented by John F. Canny, who aimed to devise an optimal edge detection technique. He followed a list of criteria to improve on existing methods of edge detection:
Good detection: the first and most obvious criterion is a low error rate; edges occurring in images should not be missed, and there should be no responses to non-edges.
Good localization: the edge points should be well localized; in other words, the distance between the edge pixels found by the detector and the actual edge is at a minimum.
Minimal response: there should be only one response to a single edge. This criterion was added because the first two were not sufficient to completely eliminate the possibility of multiple responses to an edge.
Based on these criteria, the Canny edge detector [6] first smooths the image to eliminate noise. It then finds the image gradient to highlight regions with high spatial derivatives, tracks these regions, and suppresses any pixel that is not at the maximum (non-maximum suppression). The gradient array is further reduced by hysteresis, which is used to track the remaining pixels that have not been suppressed. Hysteresis uses two thresholds: if the magnitude is below the low threshold, the pixel is set to zero (made a non-edge); if the magnitude is above the high threshold, it is made an edge; and if the magnitude is between the two thresholds, it is set to zero unless there is a path from this pixel to a pixel with a gradient above the high threshold. The steps of the Canny edge detection algorithm are described in more detail below.
Noise reduction: the input image is convolved with a Gaussian mask, so that the output looks like a blurred copy of the original, in order to reduce the effect of single noisy pixels. The larger the width of the Gaussian mask, the lower the detector's sensitivity to noise; however, the localization error in the detected edges also increases slightly as the Gaussian width increases.
Finding the intensity gradient of the image: an edge in an image may point in a variety of directions, so the Canny algorithm uses four masks to detect horizontal, vertical and diagonal edges. The result of convolving the original image with each of these masks is stored; for each pixel, the largest response and the direction of the mask that produced it are recorded. This yields, from the original image, a map of the intensity gradient at each point together with its direction. Based on the value of the gradient direction, each detected direction is rounded to one of the four directions that can be traced in an image (north, south, east and west).
Tracing edges through the image: after the edge directions are known, non-maximum suppression is applied. Non-maximum suppression traces along the edge in the edge direction and suppresses any pixel value (sets it to 0) that is not considered an edge, giving a thin line in the output image. Higher intensity gradients are more likely to be edges, but there is no exact value at which a given intensity gradient switches from not being an edge to being an edge; Canny therefore uses thresholding with hysteresis, which requires two thresholds, high and low. Assuming that the important edges lie along continuous lines through the image allows a faint section of a given line to be followed while avoiding the identification of a few noisy pixels that do not constitute a line. We therefore begin by applying the high threshold to mark out the edges we can be fairly sure are genuine. Starting from these, and using the directional information derived earlier, edges are traced through the image; while tracing a line, the lower threshold is applied, allowing faint sections of lines to be traced as long as a starting point has been found. Once this process is complete, we have a binary image in which each pixel is marked as either an edge pixel or a non-edge pixel. Figure 1 shows an example image and the result of applying the Canny algorithm to it.

Figure 1 An example of Canny edge detection: a grayscale input image and the corresponding response of the Canny edge detection algorithm.
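The hysteresis step described above can be sketched in code. The following is a simplified NumPy illustration using 4-connectivity and iterative propagation, not the paper's implementation (which, as noted above, also follows the gradient direction while tracing):

```python
import numpy as np

def hysteresis_threshold(grad, low, high):
    # Strong edges: gradient magnitude at or above the high threshold.
    strong = grad >= high
    # Weak edges: between the two thresholds; kept only if connected
    # (here: 4-connected) to a strong edge.
    weak = (grad >= low) & ~strong
    edges = strong.copy()
    changed = True
    while changed:
        # Grow the current edge map by one pixel in each direction.
        grown = edges.copy()
        grown[1:, :] |= edges[:-1, :]
        grown[:-1, :] |= edges[1:, :]
        grown[:, 1:] |= edges[:, :-1]
        grown[:, :-1] |= edges[:, 1:]
        # Promote weak pixels that now touch an edge pixel.
        new_edges = edges | (grown & weak)
        changed = bool((new_edges != edges).any())
        edges = new_edges
    return edges
```

On a row of gradient magnitudes such as 50, 120, 60, 10 with thresholds low=40, high=100, the pixel with magnitude 120 is a strong edge and the adjacent 50 and 60 are kept as weak edges connected to it, while 10 is discarded.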
3.1.2 Facial feature detection using HAAR classifiers
Viola and Jones [7] introduced a method for accurate and rapid face detection within an image. The technique detects facial features accurately because the area of the image analyzed for a facial feature is regionalized to the location with the highest probability of containing that feature: for example, the eyes are detected in the upper part of the face, the mouth at the bottom, and the nose at the centre of the face. By regionalizing the detection area, false positives are eliminated and the speed of detection is increased due to the reduction of the area examined.
b) HAAR cascade classifiers
The key ingredient of HAAR classifier object detection is the HAAR-like feature. These features are based on the change in contrast between adjacent rectangular groups of pixels rather than on the intensity of a single pixel; in a human face, for example, the eye-area pixels are darker than the nose-area pixels. The contrast values between adjacent rectangular groups of pixels are used to determine relative light and dark areas, and two or three adjacent groups with relative contrast form a HAAR-like feature. Figure 2 shows the common HAAR-like features used to detect features in an image, and how these features detect the eyes and nose in a face image.
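The contrast computation behind HAAR-like features can be illustrated with an integral image (summed-area table), which the Viola-Jones framework uses so that the sum over any rectangle costs only four table lookups. The sketch below is an illustrative NumPy version, not the OpenCV implementation; the synthetic patch and feature placement are assumptions made for the example.

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a leading zero row/column:
    # ii[y, x] holds the sum of img[:y, :x].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    # Sum of the h-by-w rectangle with top-left corner (y, x):
    # four lookups, independent of the rectangle size.
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def two_rect_feature(ii, y, x, h, w):
    # Two-rectangle HAAR-like feature: lower half minus upper half.
    # A large value signals a dark-over-light contrast edge, such as
    # a dark eye strip above the brighter cheeks.
    upper = rect_sum(ii, y, x, h // 2, w)
    lower = rect_sum(ii, y + h // 2, x, h // 2, w)
    return lower - upper

# Synthetic 6x6 patch: a dark band ("eyes") above a brighter band.
patch = np.full((6, 6), 200, dtype=np.int64)
patch[1:3, :] = 60
ii = integral_image(patch)
response = two_rect_feature(ii, 1, 0, 4, 6)  # large over the dark band
flat = two_rect_feature(ii, 3, 0, 2, 6)      # zero over a uniform area
```

The feature responds strongly where the upper rectangle covers the dark band and the lower covers the bright band, and is zero over a uniform region, which is exactly the relative-contrast idea described above.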


Figure 2 Common HAAR features

c) Training classifiers for facial features
A HAAR classifier must be trained to detect human facial features such as the mouth, eyes and nose. To train the classifiers, the AdaBoost algorithm and HAAR feature algorithms must be implemented. Intel developed an open source library devoted to easing the implementation of computer vision related programs, called the Open Computer Vision Library (OpenCV). Training requires two sets of images: a negative set and a positive set. The negative set contains images or scenes that do not contain the object to be detected, in our case a facial feature; the positive set contains images of the facial features themselves. The location of each object within a positive image is specified by the image name, the upper-left pixel, and the height and width of the object. For training the facial features, 5,000 negative images of at least one-megapixel resolution were used; these consisted of everyday objects, such as paperclips, and of natural scenery, such as photographs of forests and mountains, but no face images. For more accurate facial feature detection, the positive set of images must include large variation between different people, including race, gender and age. Three separate classifiers were trained: one for the eyes, one for the nose, and one for the mouth. Once the classifiers were trained, they were used to detect the facial features within another set of images in the database. The accuracy of each classifier is shown in Table 2.

Table 2: Accuracy of classifiers
Facial Feature | Positive Hit Rate | Negative Hit Rate
Eyes           | 93%               | 23%
Nose           | 100%              | 29%
Mouth          | 67%               | 28%

d) Regionalized detection
A method is needed to reduce the false positive rate of the classifier and to increase accuracy without modifying the classifier training attributes. The proposed method is to limit the region of the image that is analyzed for each facial feature: for example, the analysis region for the mouth is limited to the bottom area of the face, for the nose to the centre area, and for the eyes to the upper area. By reducing the area analyzed, accuracy increases, since there is less area in which to produce false positives. Efficiency also increases, since fewer features need to be computed and the area of the integral images is smaller. To regionalize the image, one must first determine the likely area where a facial feature might exist. The simplest method is to perform face detection on the image first; the area containing the face will also contain the facial features. However, the facial feature cascades often detect other facial features as well, as illustrated in Figure 3.

Figure 3 Inaccurate detection: eyes (red), nose (blue), and mouth (green).
To eliminate inaccurate feature detection, it can be assumed that the eyes are located near the top of the head, the nose in the centre area, and the mouth near the bottom. The upper 5/8 of the face is analyzed for the eyes; the centre of the face, an area 5/8 by 5/8 of the face, is used for detection of the nose; and the lower half of the facial image is used to detect the mouth. The accurate feature detection results after regionalization are shown in Figure 4. Since the portion of the image used to detect a feature


becomes smaller than the whole image, detecting all three facial features takes less time on average than detecting the face itself. Regionalization thus provides a tremendous increase in the efficiency of facial feature detection and also increases its accuracy: all false positives were eliminated, and the detection rate is around 95% for the nose and eyes. Mouth detection has a lower rate due to the minimum area used for detection; changing the height and width parameters to represent the dimensions of the mouth more accurately and retraining the classifier should raise its accuracy to that of the other features.

Figure 4 Detected objects: face (white), eyes (red), nose (blue) and mouth (green).
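The regionalization fractions quoted above can be sketched as simple array slicing on the detected face rectangle. This is an illustrative sketch: how the 5/8-by-5/8 nose box is centred is our assumption, since the text does not specify it.

```python
import numpy as np

def feature_regions(face):
    # Restrict each feature search to the regions described in the text:
    # eyes in the upper 5/8 of the face, nose in a central 5/8-by-5/8 box,
    # mouth in the lower half of the detected face rectangle.
    h, w = face.shape[:2]
    eyes = face[: (5 * h) // 8, :]
    # Assumption: centre the 5/8 nose box, leaving 3/16 margins.
    cy0 = (3 * h) // 16
    cx0 = (3 * w) // 16
    nose = face[cy0:cy0 + (5 * h) // 8, cx0:cx0 + (5 * w) // 8]
    mouth = face[h // 2:, :]
    return eyes, nose, mouth
```

Each cascade is then run only on its sub-array, so fewer HAAR features are evaluated and false positives outside the plausible region cannot occur.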
3.2 Similar image search
After face detection in the query image, a few similar celebrities must be found in the large celebrity database, which is possible with tineye.com. TinEye [8] is a reverse image search engine: an image can be submitted to TinEye to find out where it came from, how it is being used, and whether modified or higher-resolution versions of the image exist. TinEye was the first image search engine on the web to use image identification technology rather than keywords, metadata or watermarks. It is free to use for non-commercial searching. TinEye crawls the web for new images regularly and also accepts contributions of complete online image collections; to date, TinEye has indexed 6,202,942,860 images from the web.
3.3 Face recognition
Face recognition systems recognize a given face image by comparing it against the images in a database. The database of a face recognizer is formed from a training set: the set of features extracted from face images of different persons. The face recognition system finds the feature vector in the training set most similar to the feature vector of a given test image. Here, we want to recognize the identity of a person from an image of that person (the test image) given to the system [9]. Principal Component Analysis (PCA) works as shown in Figure 5.

Figure 5 Face recognition


In the training phase, a feature vector is extracted, calculated and stored for each training image. In the recognition (testing) phase, a test image of a known person is given; to identify the person, the similarities between the feature vector of the test image and all of the feature vectors in the training set are computed. The similarity between feature vectors can be computed using the Euclidean distance or any other comparative measure, and the identity associated with the most similar feature vector is the output of the face recognizer. A schematic diagram of the face recognition system to be implemented is shown in Figure 5.
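The training and recognition phases described above can be sketched as a minimal eigenface-style PCA in NumPy. This is an illustrative sketch of the general technique, not the system's implementation: it obtains the principal components via SVD of the centred training matrix and matches by Euclidean distance in the projected space.

```python
import numpy as np

def train_pca(faces, k):
    # faces: n x d matrix, one flattened face image per row.
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Top-k principal components from the SVD of the centred data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    # Project every training face into the k-dimensional eigenspace.
    weights = centered @ components.T
    return mean, components, weights

def nearest_face(test_face, mean, components, weights):
    # Project the test face and return the index of the closest
    # training face by Euclidean distance in eigenspace.
    w = (test_face - mean) @ components.T
    dists = np.linalg.norm(weights - w, axis=1)
    return int(np.argmin(dists))
```

In use, `train_pca` runs once over the celebrity training set, and `nearest_face` answers each query; the returned index identifies the recognized person.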
3.4 Get details of celebrity after identification
After the celebrity in the image has been identified, the name is passed to Wikipedia, where each celebrity has a page containing a table on the right-hand side, with an image, that holds basic details about that person. This table is used by the system to introduce the celebrity with some details.
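Extracting details from such a table can be sketched as below. The HTML snippet is a hypothetical, simplified stand-in for a Wikipedia infobox; real infobox markup is much richer, and a production system would more likely use the MediaWiki API or a dedicated HTML-parsing library.

```python
from html.parser import HTMLParser

class InfoboxParser(HTMLParser):
    # Minimal sketch: collects th/td text pairs from table rows into a dict.
    def __init__(self):
        super().__init__()
        self.details = {}
        self._field = None
        self._target = None

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._target = "key"
        elif tag == "td":
            self._target = "value"

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._target = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._target is None:
            return
        if self._target == "key":
            self._field = text
        elif self._field:
            self.details[self._field] = text
            self._field = None

# Hypothetical, simplified infobox-style markup for illustration only.
html = """<table>
<tr><th>Born</th><td>1 January 1970</td></tr>
<tr><th>Occupation</th><td>Actor</td></tr>
</table>"""
parser = InfoboxParser()
parser.feed(html)
```

After feeding the page, `parser.details` maps each field label to its value, which is the kind of dictionary the system would display alongside the recognized name.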
3.5 Dataset
Many face datasets are available for face recognition, and many of them contain labelled celebrities. Some are accessible to all for research purposes; CFW is one of them. MSRA-CFW (Microsoft Research Asia - Celebrities on the Web) [11] is a dataset of celebrity face images collected from the web. The CFW dataset is advantageous in several respects [4]. First, it contains many more celebrity images than previous datasets, most of which contain only tens of thousands of faces. Second, since all faces in CFW are collected from the web, they exhibit large variation in pose, expression, hairstyle and makeup. Moreover, the labels of the CFW dataset are much more accurate than those of previous automatically generated datasets. Alternatively, we can build our own dataset containing celebrities of interest, with distinct images downloaded from Google Images; in this project, the dataset concentrates mostly on Indian celebrities from every field.

The system also has a web crawler which, starting from a given Wikipedia link, finds and downloads images of celebrities from the web. It also stores new links found on each crawled web page in a database table, to be crawled in turn; thus the database keeps growing.
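The crawler's bookkeeping (queue newly found links, crawl each page exactly once) can be sketched as a breadth-first traversal. Here the `link_graph` dictionary is a stand-in for the web: a real crawler would fetch and parse each page to discover its outgoing links, and the `seen` set plays the role of the database table of pending links described above.

```python
from collections import deque

def crawl(start, link_graph, limit=100):
    # Breadth-first crawl: visit pages in discovery order, queueing
    # each newly seen link exactly once, up to `limit` pages.
    visited = []
    queue = deque([start])
    seen = {start}
    while queue and len(visited) < limit:
        page = queue.popleft()
        visited.append(page)          # a real crawler would fetch here
        for link in link_graph.get(page, []):
            if link not in seen:      # queue each new link only once
                seen.add(link)
                queue.append(link)
    return visited
```

Starting from one seed page, the traversal reaches every page linked directly or indirectly, without revisiting pages even when links form cycles.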

4. RESULT
Figure 6 shows how the system works. An input image goes through face detection and is submitted to tineye.com; the resulting list shows possible names from the database. After correct face recognition, a new window shows current details about the celebrity from Wikipedia.

5. CONCLUSION
This system was developed to identify celebrities in images, which may be posters, photographs or web images where the name is missing. This paper introduced different approaches to face detection, which can be combined for better results. HAAR is a feature-based method for face detection; HAAR features, integral images, and regionalized detection of features improve face detection in terms of both speed and accuracy, and the HAAR algorithm also yields a small false positive rate. We have also seen the CFW dataset, which contains a large number of celebrity images; CFW is open to all for research purposes and is downloadable [11]. It gives better results for name annotation, as it contains celebrity images with wide variation in pose, expression, hairstyle and makeup.


Figure 6 Working of system

References
[1] Lei Zhang, Longbin Chen, Mingjing Li, H. Zhang, "Bayesian Face Annotation in Family Albums", Microsoft Research Asia.
[2] P. Tirilly, V. Claveau, and P. Gros, "News image annotation on a large parallel text-image corpus", presented at LREC, Malta, 2010, pp. 2564-2569.
[3] Xin-Jing Wang, Lei Zhang, Ming Liu, Yi Li, Wei-Ying Ma, "ARISTA - Image Search to Annotation on Billions of Web Photos", in Proc. CVPR, 2010, pp. 2987-2994.
[4] Xiao Zhang, Lei Zhang, Xin-Jing Wang, Heung-Yeung Shum, "Finding Celebrities in Billions of Web Images", IEEE Transactions on Multimedia, Vol. 14, No. 4, August 2012, pp. 995-1007.
[5] Ion Marques, "Face Recognition Algorithms", June 16, 2010.
[6] Abdallah S. Abdallah, "Investigation of New Techniques for Face Detection", May 9, 2007, Blacksburg, Virginia.
[7] Phillip Ian Wilson, Dr. John Fernandez, "Facial Feature Detection Using HAAR Classifiers", JCSC 21, 4 (April 2006), pp. 127-133.
[8] TinEye.com
[9] "A Tutorial on Face Recognition Using Principal Component Analysis".
[10] R. Padilla, C. F. F. Costa Filho and M. G. F. Costa, "Evaluation of HAAR Cascade Classifiers Designed for Face Detection", World Academy of Science, Engineering and Technology, Vol. 6, 2012, pp. 323-326.
[11] http://research.microsoft.com/en-us/projects/msracfw/default.aspx
