An Approach to Detect an Image as a Selfie Using Object Recognition Methods

Madhuri A. Bhalekar, Mangesh V. Bedekar and Saba Aslam

Abstract A selfie is a self-portrait taken with the front camera of a mobile
phone. By inspecting a captured image, one can identify details such as the number
of objects it contains, the location where it was taken, and much more. Using object
recognition, we can determine whether an image is a selfie or not. To do so, we
first separate the foreground and the background of the image. From the foreground
one can identify the main object (i.e., the person taking the selfie), and from the
background one can infer the location. Object recognition methods such as
exhaustive search, segmentation, selective search, and the Gaussian mixture model
can be used to detect the objects, the foreground, and the background. The
foreground and background values are then compared against a threshold value, and
the result indicates whether the image is a selfie or not. In this paper we present
an approach that can be used to detect whether an image is a selfie.

 
Keywords Object recognition · Selfie · Exhaustive search · Segmentation · Selective search · Gaussian mixture model

1 Introduction

The Internet has become a major part of our lives. Earlier, it was used mainly for
purposes such as research, official work, military applications, and communication.
As times have changed, entertainment has claimed its share. People use the Internet
to watch and share movies, upload pictures, connect through social media, and much
more. As social media has become very popular, it generates a lot of data. This data
consists mainly of photographs, with comparatively little text. More and more of
these photographs are selfies and groupfies.
Photographs have long been an important part of our lives. Traditionally,
photographs were stored as personal memories, but over time they have come to be
used for self-representation on the Internet. They are now used to create an image
of oneself that is visible to the world, and to convey geographical mobility,
standard of living, culture, and social circles. The term ‘selfie’ came into wide
use around 2013, when it was named Oxford Dictionaries’ Word of the Year. It is
defined as a self-photograph taken with the front camera of a mobile phone. The
first known photographic self-portrait was taken in 1839 by Robert Cornelius.

M. A. Bhalekar (✉) · M. V. Bedekar · S. Aslam
Department of Computer Engineering, MAEER’S Maharashtra Institute of Technology,
Savitribai Phule Pune University, Pune, Maharashtra, India
e-mail: madhuri.bhalekar@mitpune.edu.in
M. V. Bedekar
e-mail: mangesh.bedekar@mitpune.edu.in
S. Aslam
e-mail: saba.cs14@gmail.com

© Springer Nature Singapore Pte Ltd. 2019
P. K. Mallick et al. (eds.), Cognitive Informatics and Soft Computing,
Advances in Intelligent Systems and Computing 768,
https://doi.org/10.1007/978-981-13-0617-4_12

1.1 Selfie and Social Media

Popular social media platforms such as Instagram, Flickr, Twitter, Facebook, and
Tumblr are the major sources of selfies. These websites act as platforms for
commercializing the pictures that people take. Whenever anyone (the general public,
actors, singers, dancers, YouTubers, influential speakers, politicians, etc.) posts
a picture, social media accumulates more data, and this data can reveal a lot about
a person’s life, interests, mobility, social circles, etc.

1.2 Selfie Detection

A selfie is detected by identifying the foreground and the background in the
picture, as shown in Fig. 1. This separation helps in identifying objects,
locations, persons, animals, and more. Selfie detection can further help in finding
the behavior and the lifestyle of the person in the selfie. It could also highlight
a person’s hobbies, likes and dislikes, and interests.

Fig. 1 Portrait which represents a selfie

2 Overview

As social media has become popular and widely used, it generates a lot of data in
the form of images. Managing this large amount of data and classifying whether a
photograph taken by the user is a selfie or not is a new challenge. Figure 2 shows
the system flow of the proposed approach for detecting whether an image is a selfie.
An image consists of many elements. It can contain a single object or a group of
objects, people, or nature, or it can be a plain single-colored image. To identify
the objects, people, or nature, we can apply various object recognition algorithms,
such as exhaustive search, segmentation, selective search, and the Gaussian mixture
model. The rest of the paper is organized as follows. Section 3 surveys related
work, Sect. 4 reviews object recognition methods, and Sect. 5 elaborates the
proposed approach together with its mathematical model.

Fig. 2 System flow for the proposed approach to detect whether an image is a selfie

3 Related Work

The related work is divided into two parts. The first part focuses on selfie
culture in today’s world, and the second part elaborates on different methods used
for object recognition.
Orekh et al. [1] discuss the selfie culture in today’s world. They explain how
people have adopted this culture as a way of self-acceptance and
self-representation, and how it can both build and destroy people’s self-esteem.
They also discuss information sharing: selfies can be used to obtain information
about standard of living, geographical mobility, friend circles, workplace, access
to exclusive places and things, hobbies, and more. They also discuss how seriously
people take their representation on the Internet.
Du Preez [2] presents another aspect of selfie culture: a study on how taking a
selfie can sometimes be dangerous. The emphasis falls on the analysis of selfies of
death, which overlap with the sublime experience almost entirely, so that it becomes
nearly impossible to distinguish between selfie and sublimity.
Dalal and Triggs [3] describe how different objects, such as several people, can be
detected in a single photograph. First, the faces in the photograph are recognized.
Then the face with the best expression is selected, and finally a synthesized photo
is produced by color transfer and image completion.
Harzallah et al. [4] reviewed and analyzed object detection approaches and report
that the Histogram of Oriented Gradients (HOG) approach performs best for human
detection.
Viola and Jones [5] describe finding localized objects and then classifying them
using a Support Vector Machine (SVM). The training images fall into two categories.
The positive training set consists of images in which objects were identified
correctly and which the SVM classifier assigned to the right category. The other
category consists of negative examples, which are rejected by the classifier and
not used further. An example is tagged negative either because the classifier
cannot classify the identified object correctly, or because object identification
itself failed to recognize the object.
Felzenszwalb et al. [6] describe a robust, step-by-step machine learning approach
for visual object detection. This method provides more accurate and faster results.
Alexe et al. [7] proposed a segmentation-based greedy decision approach with a
graph-based representation. This approach gives much clearer and more distinct
boundaries for object detection in the image.
Uijlings et al. [8] introduced selective search, which combines exhaustive search
and segmentation. It helps in taking all objects into consideration and diversifies
the search. The Mean Average Best Overlap (MABO) value is used to evaluate the
quality of the object hypotheses that are found. The Average Best Overlap (ABO) is
first computed for each object class, and the mean over all classes gives the MABO
value.
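The MABO computation can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code of [8]: boxes are assumed to be `(x1, y1, x2, y2)` tuples, overlap is assumed to be intersection over union, and the function names and the single shared proposal list are hypothetical simplifications.

```python
from statistics import mean


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_best_overlap(ground_truth, proposals):
    """ABO for one class: mean, over its ground-truth boxes, of the best IoU."""
    return mean(max(iou(g, p) for p in proposals) for g in ground_truth)


def mean_average_best_overlap(ground_truth_by_class, proposals):
    """MABO: mean of the per-class ABO values."""
    return mean(
        average_best_overlap(boxes, proposals)
        for boxes in ground_truth_by_class.values()
    )


# Hypothetical toy data: one annotated box per class, three proposals.
ground_truth = {"person": [(0, 0, 10, 10)], "dog": [(20, 20, 30, 30)]}
proposals = [(0, 0, 10, 10), (20, 20, 30, 40), (100, 100, 110, 110)]
mabo = mean_average_best_overlap(ground_truth, proposals)
```

In this toy example the person box is matched exactly (overlap 1.0) and the dog box is half covered (overlap 0.5), so the MABO is 0.75. The sketch assumes at least one proposal is available.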

Anghelescu et al. [9] describe an iterative clustering approach for efficient edge
detection. Kim et al. [10] discuss differentiating the foreground and the
background of the image. This separation helps in finding the location of the
person.

4 Methods Surveyed for Object Recognition

4.1 Exhaustive Search

Exhaustive search is a brute-force search method: it considers a very large number
of candidate locations and examines every detail in the picture, whether necessary
or not. This makes exhaustive search computationally very expensive. It is
typically combined with Histogram of Oriented Gradients (HOG) descriptors and
classifiers such as Support Vector Machines (SVM) [6].
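To give a feel for why the search is expensive, the following Python sketch enumerates sliding windows over an image. It is only an illustration under stated assumptions: the window sizes, the stride, and the function name `sliding_windows` are hypothetical, and the classifier that would score each window is left out.

```python
def sliding_windows(image_w, image_h, window_sizes, stride):
    """Yield every (x, y, w, h) window of the given sizes at a fixed stride."""
    for w, h in window_sizes:
        for y in range(0, image_h - h + 1, stride):
            for x in range(0, image_w - w + 1, stride):
                yield (x, y, w, h)


# Hypothetical settings: a tiny 64 x 48 image, two window sizes, stride 8.
windows = list(sliding_windows(64, 48, [(16, 16), (32, 32)], stride=8))
print(len(windows))  # number of locations a classifier would have to score
```

Even this tiny image yields 50 candidate windows. The count grows quickly with image size, with a finer stride, and with every additional window size or aspect ratio.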

4.2 Segmentation

Segmentation helps in generating class-independent object hypotheses. Contour
lines are drawn around the objects present in the photograph. This is done by
convolving the grayscale image with local derivative filters to find edges. Because
contours are drawn around the objects, the background is separated from the picture
and the focus is entirely on the objects and persons present in the photograph.
Given an input image, hierarchical segmentation is performed on it. The resulting
proposed regions are ranked, and the ranked regions are the image regions fed into
the classifiers. This helps in localizing the objects [6].
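The idea of finding contours from local derivatives can be illustrated with a small Python sketch. It is not the segmentation method of [6]. It assumes a grayscale image stored as a list of rows, approximates the derivatives by differences between neighboring pixels, and uses a hypothetical threshold to decide which pixels lie on an edge.

```python
def edge_mask(gray, threshold):
    """Mark pixels whose local gradient magnitude exceeds the threshold."""
    h, w = len(gray), len(gray[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Differences between neighboring pixels, clamped at the border.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            mask[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return mask


# Hypothetical image: dark left half, bright right half.
image = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
mask = edge_mask(image, threshold=50)
```

In this example only the two columns on either side of the brightness step are marked, which is the vertical contour separating the two regions.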

4.3 Bag of Words Approach

The Bag of Words model helps in predicting the object location. This is done using
jumping windows, with which the relation between individual visual words and the
object location is learned. In the Bag of Words method, an image is treated as a
document. To define the words, we need to follow certain steps:
1. Feature detection: The objects present in the image are detected using
distinctive points on the objects. For example, if a door is to be detected, its
four corners would be the points used to recognize it.
2. Feature description: A description is made for every possible angle of the
object. An open door might expose two points for recognition, whereas a closed door
might expose four.
3. Codebook generation: These entries are then collected in a single codebook,
which is used later for identifying objects.
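Once a codebook exists, an image can be represented as a histogram of visual words. The Python sketch below shows this step only, under stated assumptions: descriptors and codebook entries are plain numeric tuples, each descriptor is assigned to its nearest codebook entry by Euclidean distance, and the names and toy values are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def bag_of_words(descriptors, codebook):
    """Count how many descriptors fall on each visual word."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, codebook)] += 1
    return histogram


# Hypothetical 2-D descriptors and a three-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (4.0, 6.0), (5.2, 4.9)]
histogram = bag_of_words(descriptors, codebook)
```

Here the histogram is `[1, 1, 2]`: one descriptor maps to each of the first two words and two map to the third. In practice the codebook is usually learned by clustering descriptors from training images, which this sketch does not cover.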

4.4 Selective Search

The selective search strategy is an improved method of object recognition. It
follows certain design conditions:
Fast to compute: This method of object recognition is faster than the methods
discussed above.
Diversification: The method of searching for objects within photographs must be
diversified. It should not follow just one method or path to identify the objects.
The strategies for selecting objects should be varied and efficient.
Capture all scales: Object size can vary within a picture, and an object can have
a full or a partial boundary. The algorithm should take every detail into account
and identify the objects present in the image [8].
Selective search therefore recognizes objects better than the methods above. It
helps to find objects of different textures in an image by means of a hierarchical
grouping algorithm, which can detect an object that lies on top of, or is contained
in, another object. For example, if a wooden salad bowl stands on a wooden table,
the hierarchical algorithm helps in detecting the two wooden objects as distinct
from each other.
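The hierarchical grouping idea can be sketched as a greedy merge of neighboring regions in which every intermediate region is kept as an object proposal. The Python code below is a simplified illustration, not the algorithm of [8]: similarity is assumed to be the difference in mean color only, whereas selective search combines several similarity measures, and the region format and names are hypothetical.

```python
def merge(a, b):
    """Merge two regions: union of boxes, summed size, size-weighted mean color."""
    size = a["size"] + b["size"]
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a["box"], b["box"]
    return {
        "box": (min(ax1, bx1), min(ay1, by1), max(ax2, bx2), max(ay2, by2)),
        "size": size,
        "color": (a["color"] * a["size"] + b["color"] * b["size"]) / size,
    }


def hierarchical_grouping(regions, neighbours):
    """Greedily merge the most similar neighboring regions, keeping every box."""
    regions = dict(regions)
    pairs = {frozenset(p) for p in neighbours}
    proposals = [r["box"] for r in regions.values()]
    next_id = max(regions) + 1

    def color_gap(pair):
        i, j = pair
        return abs(regions[i]["color"] - regions[j]["color"])

    while pairs:
        best = min(pairs, key=color_gap)
        i, j = best
        regions[next_id] = merge(regions[i], regions[j])
        proposals.append(regions[next_id]["box"])
        # Neighbors of either merged region become neighbors of the new one.
        rewired = set()
        for pair in pairs - {best}:
            if pair & best:
                (other,) = pair - best
                rewired.add(frozenset((other, next_id)))
            else:
                rewired.add(pair)
        pairs = rewired
        del regions[i], regions[j]
        next_id += 1
    return proposals


# Hypothetical initial regions: two similar dark patches and one bright patch.
initial = {
    0: {"box": (0, 0, 4, 4), "size": 16, "color": 10.0},
    1: {"box": (4, 0, 8, 4), "size": 16, "color": 12.0},
    2: {"box": (8, 0, 12, 4), "size": 16, "color": 200.0},
}
proposals = hierarchical_grouping(initial, [(0, 1), (1, 2)])
```

The two dark patches are merged first, then the result is merged with the bright patch, so the proposals contain boxes at three scales: the initial patches, the merged dark area, and the whole strip. Keeping all of these is what lets the method capture objects at all scales.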
The comparative analysis of exhaustive search, segmentation and selective
search methods is shown in Table 1.

Table 1 Analysis of the different methods used in object recognition

Technique          Characteristic
Exhaustive search  Image is divided into regular grids
                   – Effective algorithm for recognizing objects in an image
                   – It aims to capture all possible object locations [6]
Segmentation       Image is recognized using contour lines
                   – Scans the image by dividing it into grid patterns
                   – It tries to find a single partitioning of the image into its
                     unique objects before any recognition [6]
Selective search   Combination of exhaustive search and segmentation
                   – Supports a hierarchy of the image
                   – Compared with exhaustive search, it provides a reduced number
                     of locations by uniting the advantages of the segmentation
                     and exhaustive search methods [8]

4.5 Gaussian Mixture Model

In this method, two burst shots are taken and compared to find the foreground
(i.e., the person taking the selfie) and the background. A tri-map is then
constructed and used to blur the background of the image. The tri-map consists of
four regions: foreground (FG), probable foreground (PRFG), probable background
(PRBG), and background (BG). The estimated shoulder points are defined by learning
RGB information from the initial foreground (FG) and background (BG) regions using
a Gaussian Mixture Model (GMM). The PRFG and PRBG regions are located within a
certain margin of the FG region [10, 11].
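The margin-based placement of the PRFG and PRBG regions can be sketched as follows. This Python code is only an illustration under stated assumptions: the initial FG mask is given, distance to the FG region is measured as Chebyshev distance in pixels, and the two margin values and all names are hypothetical. The GMM color learning step of [10] is not reproduced.

```python
FG, PRFG, PRBG, BG = "FG", "PRFG", "PRBG", "BG"


def build_trimap(fg_mask, prfg_margin, prbg_margin):
    """Label each pixel by its distance to the nearest foreground pixel."""
    h, w = len(fg_mask), len(fg_mask[0])
    fg_pixels = [(y, x) for y in range(h) for x in range(w) if fg_mask[y][x]]
    trimap = [[BG] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if fg_mask[y][x]:
                trimap[y][x] = FG
                continue
            # Chebyshev distance to the closest foreground pixel, if any.
            d = min(
                (max(abs(y - fy), abs(x - fx)) for fy, fx in fg_pixels),
                default=None,
            )
            if d is None:
                continue  # no foreground at all: everything stays BG
            if d <= prfg_margin:
                trimap[y][x] = PRFG
            elif d <= prbg_margin:
                trimap[y][x] = PRBG
    return trimap


# Hypothetical one-row mask with a single foreground pixel in the middle.
mask = [[False, False, False, True, False, False, False]]
trimap = build_trimap(mask, prfg_margin=1, prbg_margin=2)
```

For this mask the row is labeled BG, PRBG, PRFG, FG, PRFG, PRBG, BG: the uncertain bands surround the foreground and everything farther away is treated as background.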
To gather information from the selfie, the foreground must be separated from the
background. Motion vectors are used for this purpose. Burst shots are images taken
one after another at a very short interval, on the order of milliseconds, so the
photographs are very closely timed. The motion vectors describe the objects that
have moved between the two burst shots, and are obtained by comparing the two
images. When such pictures are compared by eye, hardly any difference can be found.
The motion vectors expose this difference, because they capture even the slightest
change in the foreground, while the stationary background stays the same.
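A very rough way to see this effect is to compare the two burst shots pixel by pixel. The Python sketch below uses simple frame differencing as a stand-in for motion vectors, which is an assumption made here for brevity and not the method of [10]. The images are grayscale lists of rows, and the threshold and names are hypothetical.

```python
def moving_pixels(shot_a, shot_b, threshold):
    """Mark pixels whose intensity changed by more than the threshold."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(shot_a, shot_b)
    ]


# Hypothetical burst shots: a bright object shifts one pixel to the right.
shot_a = [[10, 10, 50, 10], [10, 10, 50, 10]]
shot_b = [[10, 10, 10, 52], [10, 11, 10, 52]]
moved = moving_pixels(shot_a, shot_b, threshold=20)
```

Only the columns the object left and entered are marked as changed, while the small intensity fluctuation in the stationary background stays below the threshold.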

5 An Approach to Identify Image as a Selfie

A selfie is itself an image, so detecting whether an image is a selfie or not
requires some processing of the image, which mostly includes the following steps:
– Image Preprocessing
– Feature Extraction
– Segmentation
– Object Recognition
– Detecting a Selfie
The main objective of image preprocessing is to improve the image data by
suppressing unwanted distortions and enhancing the image features that are required
for further processing.
An image consists of many elements that can be detected as objects. To identify
the objects we can apply various object recognition algorithms. Here we consider
the following methods for object recognition: exhaustive search, segmentation,
selective search, and the Gaussian mixture model.

5.1 Mathematical Modeling

For the proposed approach we consider selfie images taken by hand and not with a
selfie stick. The flow of the proposed mathematical model is shown in Fig. 3.
Consider an image with attributes such as width, height, color, texture, aspect
ratio, orientation, and resolution. To detect whether the image can be classified
as a selfie, let the system under consideration be S.
It is represented as S = {Input (I), Function (F), Output (O)}
The function F can be defined as

$F = \{f_1, f_2, f_3, f_4, f_5\}$

The different functions considered are as follows:

$f_1 = \{\text{Gaussian Mixture Model}\}$
$f_2 = \{\text{Exhaustive method}\}$
$f_3 = \{\text{Segmentation}\}$
$f_4 = \{\text{Selective method}\}$
$f_5 = \{\text{Bag of Words}\}$

These functions are discussed in Sect. 4.


The input passes through the function f1, which produces the following outputs:

Fig. 3 Flow of mathematical modeling (the input image passes through f1, the
Gaussian Mixture Model, which detects the foreground (fg) and probable foreground
(pfg) and the background (bg) and probable background (pbg); the functions f2
(exhaustive method), f3 (segmentation), f4 (selective search), and f5 (Bag of
Words) then recognize objects in the foreground and background; the image is
detected as a selfie if fg > bg and the difference is above a certain value)

$O_1 = \{\text{Foreground}\}$
$O_2 = \{\text{Background}\}$
$O_3 = \{\text{Probable Foreground}\}$
$O_4 = \{\text{Probable Background}\}$

The outputs O1 and O2 give the foreground and the background present in the input
image. The outputs O3 and O4 cover the undefined region, that is, the region that
could not be classified definitively as foreground or background, and classify it
as probable foreground or probable background.
Once the image is separated into these regions, objects in the foreground, such as
a human figure, are recognized using the remaining functions f2, f3, f4, and f5.
The background area is then checked for objects that help recognize the location.
A value is assigned to the foreground and to the background, and these values are
used to determine whether the given input image is a selfie or not, by the
following comparison:
If the value of the foreground is greater than the value of the background and the
difference is above a certain limit, we predict that the input image is a selfie.
Using the above-mentioned approach we can identify different objects in an image
and, by analyzing its foreground and background details, further detect whether the
image is a selfie or not.
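The comparison can be written down as a small Python function. This is a sketch under stated assumptions: the text does not define how the foreground and background values are computed, so they are taken here to be hypothetical scores in the range 0 to 1 (for example, area fractions), and the limit of 0.2 is an arbitrary illustrative choice.

```python
def is_selfie(fg_value, bg_value, min_difference):
    """Apply the rule: fg > bg and the difference exceeds a given limit."""
    return fg_value > bg_value and (fg_value - bg_value) > min_difference


# Hypothetical foreground/background values for three images.
print(is_selfie(0.70, 0.30, min_difference=0.2))  # large, dominant foreground
print(is_selfie(0.55, 0.45, min_difference=0.2))  # difference too small
print(is_selfie(0.30, 0.70, min_difference=0.2))  # background dominates
```

Only the first image satisfies both conditions. How the limit should be chosen is left open by the approach and would have to be determined experimentally.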

6 Conclusion

In this paper we present an approach to detect whether a given image is a selfie
or not. For object recognition, different methods such as exhaustive search,
segmentation, and selective search were analyzed. Exhaustive search considers a
very large number of candidate locations and goes through every detail in the
image. Segmentation is a search process that forms contour lines, which help in
defining object boundaries. Based on the methods analyzed, the algorithms used for
object recognition can also be used for selfie detection. The Gaussian Mixture
Model helps in differentiating the foreground and the background, through which
selfies can be detected.

References

1. Orekh, E., Sergeyeva, O., Bogomiagkova, E.: Selfie phenomenon in the visual content of
social media. IEEE Conference (2016)
2. Du Preez, A.: Sublime selfies: to witness death. Eur. J. Cult. Stud. Published 10 August
(2017)
3. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
4. Harzallah, H., Jurie, F., Schmid, C.: Combining efficient object localization and image
classification. In: ICCV (2009)
5. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. CVPR
1, 511–518 (2001)
6. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int.
J. Comput. Vision 59, 167–181 (2004)
7. Alexe, B., Deselaers, T., Ferrari, V.: What is an object? In: CVPR (2010)
8. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object
recognition. Int. J. Comput. Vision 104(2), 154–171 (2013)
9. Anghelescu, P., Iliescu, V.G., Mara, C., Gavriloaia, M.B.: Automatic thresholding method for
edge detection algorithms. In: ECAI—International Conference, 8th edn (2016)
10. Kim, S.J., Kim, B.S., Kim, H.I., Hong, T.H., Son, J.Y.: The method for defocusing selfie
taken by mobile frontal camera using burst shot. IEEE Conference (2016)
11. Kamate, S., Yilmazer, N.: Application of object detection and tracking techniques
for unmanned aerial vehicles. Published by Elsevier, Science Direct (2015)
