
A Survey on Pixel-Based Skin Color Detection Techniques

Vladimir Vezhnevets ∗ Vassili Sazonov Alla Andreeva


Graphics and Media Laboratory †
Faculty of Computational Mathematics and Cybernetics
Moscow State University,
Moscow, Russia.

∗ e-mail: vvp@graphics.cmc.msu.ru
† www: http://graphics.cmc.msu.ru

Abstract

Skin color has proven to be a useful and robust cue for face detection, localization and tracking. Image content filtering, content-aware video compression and image color balancing applications can also benefit from automatic detection of skin in images. Numerous techniques for skin color modelling and recognition have been proposed during the past several years. A few papers comparing different approaches have been published [Zarit et al. 1999], [Terrillon et al. 2000], [Brand and Mason 2000]. However, a comprehensive survey on the topic is still missing. We try to fill this vacuum by reviewing the most widely used methods and techniques and collecting their numerical evaluation results.

Keywords: image processing, color segmentation, color space model selection, skin detection

1 Introduction

Face detection and tracking have been topics of extensive research for the past several decades. Many heuristic and pattern-recognition based strategies have been proposed for achieving a robust and accurate solution. Among feature-based face detection methods, the ones using skin color as a detection cue have gained strong popularity. Color allows fast processing and is highly robust to geometric variations of the face pattern. Also, experience suggests that human skin has a characteristic color, which is easily recognized by humans. So trying to employ skin color for face detection was an idea suggested both by task properties and common sense.

When building a system that uses skin color as a feature for face detection, the researcher usually faces three main problems: first, what colorspace to choose; second, how exactly the skin color distribution should be modelled; and finally, how the color segmentation results should be processed for face detection. This paper covers the first two questions, leaving the third (an equally important one) for another discussion.

In this paper we discuss pixel-based skin detection methods, which classify each pixel as skin or non-skin individually, independently from its neighbors. In contrast, region-based methods [Kruppa et al. 2002], [Yang and Ahuja 1998], [Jedynak et al. 2002] try to take the spatial arrangement of skin pixels into account during the detection stage to enhance the method's performance.

Pixel-based skin detection has a long history, but surprisingly few papers that provide surveys or comparisons of different techniques have been published. [Zarit et al. 1999] have provided a comparison of five colorspaces (actually their chrominance planes) and two non-parametric skin modelling methods (lookup table and Bayes skin probability map). [Terrillon et al. 2000] have compared nine chrominance spaces and two parametric techniques (Gaussian and mixture of Gaussians models). [Brand and Mason 2000] have evaluated three different skin color modelling strategies. [Lee and Yoo 2002] have also compared the two most popular parametric skin models in different chrominance spaces and have proposed a model of their own.

Our goal in this paper is to classify published skin modelling techniques, describe their key ideas and try to find out and summarize their advantages, disadvantages and characteristic features. The rest of the paper is organized as follows. Section 2 is devoted to the description of different colorspaces used for skin detection. Section 3 covers the existing skin color modelling methods. In Section 4 numerical evaluation of some of the described methods is provided. In Sections 5 and 6 we discuss and compare the colorspaces and modelling methods. In Section 7 the conclusions are drawn.

2 Colorspaces used for skin modelling

Colorimetry, computer graphics and video signal transmission standards have given birth to many colorspaces with different properties. A wide variety of them have been applied to the problem of skin color modelling. We will briefly review the most popular colorspaces and their properties.

2.1 RGB

RGB is a colorspace that originated from CRT (or similar) display applications, where it was convenient to describe color as a combination of three colored rays (red, green and blue). It is one of the most widely used colorspaces for processing and storing digital image data. However, the high correlation between channels, significant perceptual non-uniformity (see Section 2.6 for an explanation of perceptual uniformity) and the mixing of chrominance and luminance data make RGB not a very favorable choice for color analysis and color-based recognition algorithms. This colorspace was used in [Brand and Mason 2000], [Jones and Rehg 1999].

2.2 Normalized RGB

Normalized RGB is a representation that is easily obtained from the RGB values by a simple normalization procedure:

$$ r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}, \quad b = \frac{B}{R+G+B} \qquad (1) $$

As the sum of the three normalized components is known (r + g + b = 1), the third component does not hold any significant information and can be omitted, reducing the space dimensionality. The remaining components are often called "pure colors", for the dependence of r and g on the brightness of the source RGB color is diminished by the normalization. A remarkable property of this representation is that for matte surfaces, while ignoring ambient light, normalized RGB is invariant (under certain assumptions) to changes of surface orientation relative to the light source [Skarbek and Koschan 1994]. This, together with the transformation
simplicity, helped this colorspace to gain popularity among researchers [Brown et al. 2001], [Zarit et al. 1999], [Soriano et al. 2000], [Oliver et al. 1997], [Yang et al. 1998].

2.3 HSI, HSV, HSL - Hue Saturation Intensity (Value, Lightness)

Hue-saturation based colorspaces were introduced when there was a need for the user to specify color properties numerically. They describe color with intuitive values, based on the artist's idea of tint, saturation and tone. Hue defines the dominant color (such as red, green, purple and yellow) of an area; saturation measures the colorfulness of an area in proportion to its brightness [Poynton 1995]. The "intensity", "lightness" or "value" is related to the color luminance. The intuitiveness of the colorspace components and the explicit discrimination between luminance and chrominance properties made these colorspaces popular in the works on skin color segmentation [Zarit et al. 1999], [McKenna et al. 1998], [Sigal et al. 2000], [Birchfield 1998], [Jordao et al. 1999]. Several interesting properties of Hue were noted in [Skarbek and Koschan 1994]: it is invariant to highlights at white light sources, and also, for matte surfaces, to ambient light and surface orientation relative to the light source. However, [Poynton 1995] points out several undesirable features of these colorspaces, including hue discontinuities and the computation of "brightness" (lightness, value), which conflicts badly with the properties of color vision.

$$ H = \arccos \frac{\frac{1}{2}\left((R-G) + (R-B)\right)}{\sqrt{(R-G)^2 + (R-B)(G-B)}} \qquad (2) $$

$$ S = 1 - 3\,\frac{\min(R, G, B)}{R+G+B} \qquad (3) $$

$$ V = \frac{1}{3}(R+G+B) \qquad (4) $$

An alternative way of computing hue and saturation using log opponent values was introduced in [Fleck et al. 2002], where an additional logarithmic transformation of the RGB values was aimed at reducing the dependence of chrominance on the illumination level.

The polar coordinate system of Hue-Saturation spaces, which results in the cyclic nature of the colorspace, makes it inconvenient for parametric skin color models that need a tight cluster of skin colors for best performance. A different representation of Hue-Saturation using Cartesian coordinates can be used [Brown et al. 2001]:

$$ X = S \cos H, \quad Y = S \sin H \qquad (5) $$

2.5 YCrCb

YCrCb is an encoded nonlinear RGB signal, commonly used by European television studios and for image compression work. Color is represented by luma (which is luminance, computed from nonlinear RGB [Poynton 1995]), constructed as a weighted sum of the RGB values, and two color difference values Cr and Cb that are formed by subtracting luma from the RGB red and blue components.

$$ Y = 0.299R + 0.587G + 0.114B, \quad C_r = R - Y, \quad C_b = B - Y \qquad (9) $$

The simplicity of the transformation and the explicit separation of luminance and chrominance components make this colorspace attractive for skin color modelling [Phung et al. 2002], [Zarit et al. 1999], [Menser and Wien 2000], [Hsu et al. 2002], [Ahlberg 1999], [Chai and Bouzerdoum 2000].

2.6 Perceptually uniform color systems

"Skin color" is not a physical property of an object but rather a perceptual phenomenon, and therefore a subjective human concept. Therefore, a color representation similar to the color sensitivity of the human vision system should help to obtain a high performance skin detection algorithm.

CIELAB and CIELUV are perceptually uniform colorspaces (reasonably perceptually uniform, to be exact) that were proposed by G. Wyszecki and standardized by the CIE (Commission Internationale de L'Eclairage). Perceptual uniformity means that a small perturbation to a component value is approximately equally perceptible across the range of that value [Poynton 1995]. The well-known RGB colorspace is far from being perceptually uniform; the non-linear transformations to CIELAB and CIELUV try to correct the situation. The price for better perceptual uniformity is complex transformation functions from and to RGB space, demanding far more computation than most other colorspaces described here. These colorspaces were used in [Zarit et al. 1999], [Yang and Ahuja 1999], [Schumeyer and Barner 1998], [Yang and Ahuja 1998].

The psychologist Farnsworth has proposed an even more perceptually uniform color system, derived from psychophysical experiments. It also uses nonlinear transforms from an RGB space. It was first used for skin detection in [Chen et al. 1995].
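The hue, saturation and value formulas (2)-(4) and the Cartesian mapping (5) of Section 2.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not code from any of the cited papers: the function names are hypothetical, the handling of black and gray pixels (where the formulas are undefined) is an assumption, and the hue follows equation (2) literally, so it stays in the range [0, π] without the extra reflection that some HSI definitions apply when B > G.

```python
import math


def rgb_to_hsv_eq(R, G, B):
    """Return (H, S, V) for one pixel following equations (2)-(4).

    H is in radians. Assumption: for black or gray pixels, where the
    formulas divide by zero, the undefined quantity is set to 0.
    """
    total = R + G + B
    if total == 0:
        return 0.0, 0.0, 0.0  # black pixel: hue and saturation undefined

    numerator = 0.5 * ((R - G) + (R - B))
    denominator = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if denominator == 0:
        H = 0.0  # gray pixel (R == G == B): hue undefined
    else:
        # Clamp to [-1, 1] to guard against floating-point round-off.
        H = math.acos(max(-1.0, min(1.0, numerator / denominator)))

    S = 1.0 - 3.0 * min(R, G, B) / total
    V = total / 3.0
    return H, S, V


def hs_to_cartesian(H, S):
    """Map polar (H, S) to Cartesian (X, Y) as in equation (5)."""
    return S * math.cos(H), S * math.sin(H)
```

For a pure red pixel, `rgb_to_hsv_eq(255, 0, 0)` gives a hue of 0 and a saturation of 1, so the Cartesian coordinates are (1, 0). The Cartesian form places hues on either side of the 0 / 2π wrap-around next to each other, which is the property that makes it more convenient for parametric models.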
2.4 TSL - Tint, Saturation, Lightness

The normalized chrominance-luminance TSL space is a transformation of the normalized RGB into more intuitive values, close to hue and saturation in their meaning.

$$ S = \left[\tfrac{9}{5}\left(r'^2 + g'^2\right)\right]^{1/2} \qquad (6) $$

$$ T = \begin{cases} \arctan(r'/g')/2\pi + 1/4, & g' > 0 \\ \arctan(r'/g')/2\pi + 3/4, & g' < 0 \\ 0, & g' = 0 \end{cases} \qquad (7) $$

$$ L = 0.299R + 0.587G + 0.114B \qquad (8) $$

where r' = r − 1/3, g' = g − 1/3 and r, g come from (1). [Terrillon et al. 2000] have compared nine different colorspaces for skin modelling with a unimodal Gaussian joint pdf (only the chrominance components of the colorspaces were used). They argue that the normalized TSL space is superior to other colorspaces for this task. [Brown et al. 2001] have also employed this representation in their approach.

2.7 RGB channels ratio

It was observed that skin invariably contains a significant level of red. Using this observation, certain values of the R/G ratio were used as skin presence indicators [Wark and Sridharan 1998]. The usefulness of other RGB-space ratios (R/B and G/B) for skin detection was tested and evaluated by [Brand and Mason 2000].

2.8 Other colorspaces

Besides YCrCb, several other linear transforms of the RGB space were employed for skin detection: YES [Saber and Tekalp 1998], YUV [Marques and Vilaplana 2000] and YIQ [Brand and Mason 2000], [Wang and Brandstein 1999]. Among less frequently used colorspaces, CIE-xyz [Terrillon et al. 2000] can be mentioned.

3 Skin modelling

The final goal of skin color modelling is to build a decision rule that will discriminate between skin and non-skin pixels. This is usually
accomplished by introducing a metric, which measures the distance (in a general sense) of the pixel color to skin tone. The type of this metric is defined by the skin color modelling method.

3.1 Explicitly defined skin region

One method to build a skin classifier is to define explicitly (through a number of rules) the boundaries of the skin cluster in some colorspace. For example [Peer et al. 2003]:

(R, G, B) is classified as skin if:

    R > 95 and G > 40 and B > 20 and
    max{R, G, B} − min{R, G, B} > 15 and
    |R − G| > 15 and R > G and R > B          (10)

The simplicity of this method has attracted (and still does) many researchers [Peer et al. 2003], [Ahlberg 1999], [Fleck et al. 2002], [Jordao et al. 1999]. The obvious advantage of this method is the simplicity of the skin detection rules, which leads to the construction of a very rapid classifier. The main difficulty in achieving high recognition rates with this method is the need to find both a good colorspace and adequate decision rules empirically. Recently, a method has been proposed that uses machine learning algorithms to find both a suitable colorspace and a simple decision rule that achieve high recognition rates [Gomez and Morales 2002]. The authors start with a normalized RGB space and then apply a constructive induction algorithm (see [Gomez and Morales 2002] for details) to create a number of new sets of three attributes, each being a superposition of r, g, b and a constant 1/3, constructed by basic arithmetic operations. A decision rule similar to (10) that achieves the best possible recognition is estimated for each set of attributes. The authors prohibit the construction of overly complex rules, which helps to avoid data overfitting, which is possible when the training set is not representative enough. They have achieved results that outperform the Bayes skin probability map classifier (see Section 3.2.2) in RGB space for their dataset.

3.2 Nonparametric skin distribution modelling

The key idea of the non-parametric skin modelling methods is to estimate the skin color distribution from the training data without deriving an explicit model of the skin color. The result of these methods is sometimes referred to as the construction of a Skin Probability Map (SPM) [Brand and Mason 2000], [Gomez 2000], that is, assigning a probability value to each point of a discretized colorspace.

3.2.1 Normalized lookup table (LUT)

Several face detection and tracking algorithms [Chen et al. 1995], [Zarit et al. 1999], [Schumeyer and Barner 1998], [Sigal et al. 2000], [Soriano et al. 2000], [Birchfield 1998] use a histogram-based approach to skin pixel segmentation. The colorspace (usually, the chrominance plane only) is quantized into a number of bins, each corresponding to a particular range of color component value pairs (in the 2D case) or triads (in the 3D case). These bins, forming a 2D or 3D histogram, are referred to as the lookup table (LUT). Each bin stores the number of times this particular color occurred in the training skin images. After training, the histogram counts are normalized, converting the histogram values to a discrete probability distribution:

$$ P_{skin}(c) = \frac{skin[c]}{Norm} \qquad (11) $$

where skin[c] gives the value of the histogram bin corresponding to color vector c, and Norm is the normalization coefficient (the sum of all histogram bin values [Jones and Rehg 1999], or the maximum bin value present [Zarit et al. 1999]). The normalized values of the lookup table bins constitute the likelihood that the corresponding colors will correspond to skin.

3.2.2 Bayes classifier

The value of Pskin(c) computed in (11) is actually a conditional probability P(c|skin): the probability of observing color c, knowing that we see a skin pixel. A more appropriate measure for skin detection would be P(skin|c): the probability of observing skin, given a concrete color value c. To compute this probability, the Bayes rule is used:

$$ P(\text{skin} \mid c) = \frac{P(c \mid \text{skin})\,P(\text{skin})}{P(c \mid \text{skin})\,P(\text{skin}) + P(c \mid \neg\text{skin})\,P(\neg\text{skin})} \qquad (12) $$

P(c|skin) and P(c|¬skin) are directly computed from the skin and non-skin color histograms (11). The prior probabilities P(skin) and P(¬skin) can also be estimated from the overall number of skin and non-skin samples in the training set [Jones and Rehg 1999], [Zarit et al. 1999], [Chai and Bouzerdoum 2000]. The inequality P(skin|c) ≥ Θ, where Θ is a threshold value, can be used as a skin detection rule [Jones and Rehg 1999]. The receiver operating characteristics (ROC) curve [Trees 1968] shows the relationship between correct detections and false detections for a classification rule as a function of the detection threshold. It turns out that the ROC curve for P(skin|c) ≥ Θ is invariant to the choice of prior probabilities, due to the nature of the Bayes model. This means that the P(skin) value affects only the choice of the threshold Θ.

One can avoid computing (12) explicitly if what is really needed is the comparison of P(skin|c) to P(¬skin|c), not their exact values. Using (12), the ratio of P(skin|c) to P(¬skin|c) can be written as:

$$ \frac{P(\text{skin} \mid c)}{P(\neg\text{skin} \mid c)} = \frac{P(c \mid \text{skin})\,P(\text{skin})}{P(c \mid \neg\text{skin})\,P(\neg\text{skin})} \qquad (13) $$

Comparing (13) to a threshold K produces the skin/non-skin decision rule, which, after some manipulation, can be rewritten as:

$$ \frac{P(c \mid \text{skin})}{P(c \mid \neg\text{skin})} > \Theta, \qquad \Theta = K \times \frac{1 - P(\text{skin})}{P(\text{skin})} \qquad (14) $$

This shows why the choice of prior probabilities does not affect the overall detector behavior: for any prior probability P(skin) it is possible to choose the appropriate value of K that gives the same detection threshold Θ. It is also clear that the maximum likelihood (ML) and maximum a posteriori (MAP) Bayes classification rules compared in [Zarit et al. 1999] are equivalent to (14) with different Θ values.

3.2.3 Self Organizing Map

The Self-Organizing Map (or SOM), devised by Kohonen in the 1980s, is now one of the most popular types of unsupervised artificial neural network. In [Brown et al. 2001] a SOM-based skin detector was proposed. Two SOMs, skin-only and skin + non-skin, were trained from a set of about 500 manually labelled images. The detectors' performance was tested on the authors' training/test image set and on the well-known Compaq skin database [Jones and Rehg 1999]. Several colorspaces (normalized RGB, Hue-Saturation, Cartesian Hue-Saturation and the chrominance plane of TSL) were tested with the SOM detector. The results have shown that SOM skin detectors do not exhibit a marked performance change when using different colorspaces. The SOM performance on the authors' dataset
is marginally better than that of the Gaussian mixture model, while for the Compaq database the SOM performance is inferior to the RGB histograms used in [Jones and Rehg 1999]. The authors stress that the SOM method needs considerably fewer resources than histogram and mixture models and is efficiently implemented for run-time applications by means of SOM hardware.

3.2.4 Non-parametric methods summary

Two clear advantages of the non-parametric methods are: (i) they are fast in training and usage, and (ii) they are theoretically independent of the shape of the skin distribution (which is not true for explicit skin cluster definition and parametric skin modelling). The disadvantages are the large storage space required and the inability to interpolate or generalize the training data. If, for example, we consider RGB quantized to 8 bits per color, we will need an array of 2^24 elements to store the skin probabilities. To reduce the amount of memory needed and to account for possible training data sparsity, coarser colorspace samplings are used: 128x128x128, 64x64x64 and 32x32x32. The evaluation of different RGB samplings in [Jones and Rehg 1999] has shown that 32x32x32 gives the best performance.

3.3 Parametric skin distribution modelling

The most popular histogram-based non-parametric skin models require much storage space, and their performance directly depends on the representativeness of the training image set. The need for a more compact skin model representation for certain applications, along with the ability to generalize and interpolate the training data, stimulates the development of parametric skin distribution models.

3.3.1 Single Gaussian

The skin color distribution can be modelled by an elliptical Gaussian joint probability density function (pdf), defined as:

$$ p(c \mid \text{skin}) = \frac{1}{2\pi\,|\Sigma_s|^{1/2}} \cdot e^{-\frac{1}{2}(c-\mu_s)^T \Sigma_s^{-1} (c-\mu_s)} \qquad (15) $$

Here, c is a color vector and µs and Σs are the distribution parameters (mean vector and covariance matrix respectively). The model parameters are estimated from the training data by:

$$ \mu_s = \frac{1}{n}\sum_{j=1}^{n} c_j; \qquad \Sigma_s = \frac{1}{n-1}\sum_{j=1}^{n} (c_j - \mu_s)(c_j - \mu_s)^T \qquad (16) $$

where n is the total number of skin color samples c_j. The p(c|skin) probability can be used directly as the measure of how "skin-like" the color c is [Menser and Wien 2000], or, alternatively, the Mahalanobis distance from the color vector c to the mean vector µs, given the covariance matrix Σs, can serve the same purpose [Terrillon et al. 2000]:

$$ \lambda_s(c) = (c - \mu_s)^T \Sigma_s^{-1} (c - \mu_s) \qquad (17) $$

The single Gaussian modelling method was also employed in [Hsu et al. 2002], [Ahlberg 1999], [Yang and Ahuja 1998], [Saber and Tekalp 1998].

3.3.2 Mixture of Gaussians

A more sophisticated model, capable of describing complex-shaped distributions, is the Gaussian mixture model. It is a generalization of the single Gaussian; the pdf in this case is:

$$ p(c \mid \text{skin}) = \sum_{i=1}^{k} \pi_i \cdot p_i(c \mid \text{skin}) \qquad (18) $$

In (18) k is the number of mixture components, πi are the mixing parameters, obeying the normalization constraint ∑_{i=1}^{k} πi = 1, and pi(c|skin) are Gaussian pdfs, each with its own mean and covariance matrix. Model training is performed with a well-known iterative technique called the Expectation Maximization (EM) algorithm, which assumes the number of components k to be known beforehand. The details of training a Gaussian mixture model with EM can be found, for example, in [Yang and Ahuja 1999], [Terrillon et al. 2000]. Classification with a Gaussian mixture model is done by comparing the p(c|skin) value to some threshold.

The choice of the number of components k is important here. The model needs to explain the training data reasonably well on the one hand, and avoid data over-fitting on the other. The number of components used by different researchers varies significantly, from 2 [Yang and Ahuja 1999] to 16 [Jones and Rehg 1999]. A bootstrap test to justify the k = 2 hypothesis was performed in [Yang and Ahuja 1999]; in [Terrillon et al. 2000] k = 8 was chosen as a "good compromise between the accuracy of estimation of the true distributions and the computational load for thresholding". [McKenna et al. 1998], [Oliver et al. 1997] have also used Gaussian mixture models.

3.3.3 Multiple Gaussian clusters

Approximation of the skin color cluster with three 3D Gaussians in YCbCr space is described in [Phung et al. 2002]. A variant of the k-means clustering algorithm for Gaussian clusters performs the model training. The pixel is classified as skin if the Mahalanobis distance from the color vector c to the closest model cluster center is below a pre-defined threshold.

3.3.4 Elliptic boundary model

By examining skin and non-skin distributions in several colorspaces, [Lee and Yoo 2002] have concluded that the skin color cluster, being approximately elliptic in shape, is not well enough approximated by the single Gaussian model. Due to the asymmetry of the skin cluster with respect to its density peak, usage of the symmetric Gaussian model leads to a high false positives rate. They propose an alternative they call an "elliptical boundary model", which is as fast and simple in training and evaluation as the single Gaussian model and gives superior detection results on the Compaq database [Jones and Rehg 1999] compared both to the single Gaussian and to the mixture of Gaussians. The elliptical boundary model is defined as:

$$ \Phi(c) = (c - \phi)^T \Lambda^{-1} (c - \phi) \qquad (19) $$

The model training procedure has two steps. First, up to 5% of the training color samples with low frequency are eliminated to remove noise and negligible data. Then, the model parameters (φ and Λ) are estimated by

$$ \phi = \frac{1}{n}\sum_{i=1}^{n} c_i; \quad \Lambda = \frac{1}{N}\sum_{i=1}^{n} f_i \cdot (c_i - \mu)(c_i - \mu)^T; \quad \mu = \frac{1}{N}\sum_{i=1}^{n} f_i c_i; \quad N = \sum_{i=1}^{n} f_i \qquad (20) $$

where n is the total number of distinctive training color vectors ci of the training skin pixel set (not the total number of samples!), and fi is the number of skin samples of color vector ci. A pixel with color c is classified as skin when Φ(c) < θ, where θ is a threshold value. The authors claim that their model approximates
the skin cluster better, because the data skew does not affect the calculation of the model centroid φ.

3.3.5 Parametric methods summary

All the described parametric methods (except the one described in Section 3.3.3) operate in the colorspace chrominance plane, ignoring the luminance information.

Of course, since an explicit distribution model is used, the question of model validation arises. Obviously, the goodness of fit is more dependent on the distribution shape, and therefore on the colorspace used, for parametric than for non-parametric skin models. This is clearly visible in the results of [Terrillon et al. 2000], [Lee and Yoo 2002], where the model performance varies significantly from colorspace to colorspace.

Only a few authors have included theoretical justification for the validity of the models they used. [Yang et al. 1998] have shown that the skin color distribution of a single person under fixed lighting conditions in normalized RGB space obeys a Gaussian distribution. [Yang and Ahuja 1999] have justified the hypotheses of skin data normality in CIELuv space and the validity of a two-component Gaussian mixture model by statistical tests. Others relied either on the observation of the nearly elliptic shape of the skin chrominance cluster in the colorspace they used (to employ a single Gaussian model or similar), or on its clearly non-elliptical shape (to employ a mixture of Gaussians or several Gaussian clusters), with further model performance evaluation as the acceptance criterion [Terrillon et al. 2000], [Lee and Yoo 2002], [McKenna et al. 1998].

3.4 Dynamic skin distribution models

A family of skin modelling methods was designed and tuned specifically for skin detection during face tracking. This task makes skin detection different from static image analysis in several aspects. First, in principle, the skin model can be less general (more specific), i.e. tuned for one concrete person, camera or lighting. Second, an initialization stage is possible, when the face region is discriminated from the background by a different classifier or manually. This gives a possibility to obtain a skin classification model that is optimal for the given conditions (person, camera, lighting, background). Since there is no need for model generality, it is possible to reach higher skin detection rates with low false positives with this specific model than with general skin color models intended to classify skin in a totally unconstrained image set (like in [Jones and Rehg 1999]). On the other hand, the skin color distribution can vary with time, along with lighting or camera white balance changes, so the model should be able to update itself to match the changing conditions. Also, model training and classification time becomes extremely important here, for the skin detection system must work in real time, consuming little computing power.

To summarize the most important properties of a skin color model for face tracking: first, it should be fast in both training and classification, and second, it should be able to update itself to changing conditions. Minding these aspects, many researchers turn to simple parametric skin modelling: it is easily updated when the distribution changes, is acceptably fast (except for many-component mixtures of Gaussians) and needs little storage space. The high false positives rate, a usual companion of parametric skin modelling, is less of a problem here. The need for a specific, not general, skin color model permits achievement of good classification performance. Among non-parametric models, the histogram-based LUT is popular for face tracking tasks, thanks to its simplicity and high training and working speed.

A number of methods for skin color distribution recalculation were proposed: online Expectation Maximization [Oliver et al. 1997], dynamic histograms [Soriano et al. 2000], [Stern and Efros 2002], [Sigal et al. 2000], Gaussian distribution adaptation [Yang et al. 1998].

Several authors have investigated how the color of a single person should be modelled and how it varies with lighting change. The hypothesis of a unimodal Gaussian distribution of one person's skin color under fixed lighting was justified in [Yang et al. 1998]. A special study on skin color change under different lighting conditions was made by [M. Storring 1999] and [Martinkauppi et al. 2001]. An unusual method for automatic colorspace switching during face tracking was proposed in [Stern and Efros 2002]. See the colorspaces discussion (Section 6) for more information on the two latter methods.

4 Comparative evaluation

For fair performance evaluation of different skin color modelling methods, identical testing conditions are preferred. Unfortunately, many skin detection methods provide results on their own, publicly unavailable databases. The most famous training and test image database for skin detection is the Compaq database [Jones and Rehg 1999]. In the table below the best results of different methods, reported by the authors, for this dataset are presented. Table 1 shows true positives (TP) and false positives (FP) rates for different methods. Although different methods use slightly different separations of the database into training and testing image subsets and employ different learning strategies, the table should give an overall picture of the methods' performance.

Method                                                  TP       FP
Bayes SPM in RGB [Jones and Rehg 1999]                  80%      8.5%
                                                        90%      14.2%
Bayes SPM in RGB [Brand and Mason 2000]                 93.4%    19.8%
Maximum Entropy Model in RGB [Jedynak et al. 2002]      80%      8%
Gaussian Mixture models in RGB [Jones and Rehg 1999]    80%      ∼9.5%
                                                        90%      ∼15.5%
SOM in TS [Brown et al. 2001]                           78%      32%
Elliptical boundary model in CIE-xy [Lee and Yoo 2002]  90%      20.9%
Single Gaussian in CbCr [Lee and Yoo 2002]              90%      33.3%
Gaussian Mixture in IQ [Lee and Yoo 2002]               90%      30.0%
Thresholding of I axis in YIQ [Brand and Mason 2000]    94.7%    30.2%

Table 1: Performance of different skin detectors reported by the authors

The best performance (lower false positives for a given correct detection rate) is demonstrated by the Bayes SPM and its descendant, the maximum entropy model [Jedynak et al. 2002]. The parametric modelling techniques (Gaussian, mixture of Gaussians, elliptic boundary model) are left behind, together with the SOM-based detector. The high performance of the mixture of Gaussians used in [Jones and Rehg 1999] is due to the fact that they actually modelled both the p(RGB|skin) and p(RGB|¬skin) pdfs (in contrast to other parametric skin modelling papers). They did not provide a clear indication of how exactly the final skin probability was computed from these pdfs, so we conclude that the Bayesian rule (14) was used. This, together with the high number of mixture components (sixteen), makes this model an approximation of the Bayes SPM. We believe that
this is the explanation of the high performance of the Gaussian mixture
model of Jones and Rehg. A fact worth noting is that simple
thresholding of the I component of the YIQ space, proposed by [Wang and
Brandstein 1999] and evaluated in [Brand and Mason 2000], shows
results comparable to the more sophisticated Gaussian and mixture of
Gaussians skin models.

Another promising recent method, which is not included in this table,
is the automatic construction of a colorspace and of an explicitly
defined skin cluster in it [Gomez 2000], [Gomez and Morales 2002]
(refer back to Section 3.1 for more details). The authors have
achieved results that outperform the Bayes SPM classifier in RGB space
on their dataset, giving a significantly lower false positive rate
(around 6% against 22%) and an almost equal false negative rate
(around 5%).


5 Methods discussion

The main advantage of methods that use explicitly defined skin cluster
boundaries (Section 3.1) is the simplicity and intuitiveness of the
classification rules. Their drawback is the need to find both a good
colorspace and adequate decision rules empirically. The recently
proposed method that uses machine learning algorithms to find both a
suitable colorspace and simple decision rules [Gomez and Morales 2002]
has shown a way to overcome these difficulties.

Non-parametric methods (Section 3.2) are fast in both training and
classification, and are independent of the distribution shape and
therefore of colorspace selection (see Section 6 for more on this
topic). However, they require much storage space and a representative
training dataset.

Parametric methods (Section 3.3) can also be fast. They have a useful
ability to interpolate and generalize incomplete training data, are
expressed by a small number of parameters and need very little storage
space. However, they can be really slow in both training and
classification (as with mixtures of Gaussians), and their performance
depends strongly on the shape of the skin distribution. Besides, most
parametric skin modelling methods ignore the non-skin color
statistics. This, together with the dependence on the skin cluster
shape, results in a higher false positive rate compared to
non-parametric methods.


6 Colorspaces discussion

At first glance, colorspace selection seems to be crucial for
color-based skin detection. One important question is: what is the
best colorspace for skin detection, or, more generally, is there an
optimal colorspace for skin classification? Surprisingly, many papers
on skin detection do not provide strict justification of their
colorspace choice, probably because acceptable skin detection results
can be obtained on a limited dataset with almost any colorspace.
Only a few papers have been devoted to comparative analysis of
different colorspaces used for skin detection [Zarit et al. 1999],
[Terrillon et al. 2000], [Gomez 2000], [Gomez and Morales 2002],
[Stern and Efros 2002]. Several authors have seriously considered the
problem of colorspace selection and have provided justifications for
the optimality (or adequacy) of their choice for the skin model they
employed [Yang and Ahuja 1999], [Yang et al. 1998], [Storring et al.
1999], [Schumeyer and Barner 1998].

Colorspace 'goodness' for skin modelling is usually evaluated by two
different families of measures. The first is the training and test set
classification error, computed after color model parameter estimation.
It is a well-known principle of classifier performance evaluation,
which clearly indicates the goodness of fit of the selected model to
the given dataset. The second family of measures is the overlap of
skin and non-skin colors in the given colorspace and the compactness
of the skin cluster. These measures are independent of the color
modelling strategy and are intended to evaluate colorspace performance
for skin detection 'in general'. They can surely provide an overall
impression of the distribution of the skin and non-skin samples of the
training set, but their suitability for evaluating colorspace goodness
seems doubtful to us.

Recently, several papers have emerged that seriously doubt any
significant influence of colorspace selection on the final skin
detection result [Shin et al. 2002], [Albiol et al. 2001]. In [Shin
et al. 2002], scatter matrices of the skin and non-skin clusters and
the overlap of skin and non-skin histograms were used as colorspace
performance metrics. The authors' conclusion was that the separability
of the skin and non-skin color classes is highest in RGB space, and
that dropping the luminance component significantly worsens the
separability. We do not quite agree with the colorspace comparison
strategy used in [Shin et al. 2002]. Our strong belief is that a valid
colorspace comparison is one carried out not 'in general' (by
assessing skin and non-skin color overlap and skin cluster shape), but
for a certain skin distribution model. The performance of parametric
skin classifiers depends heavily on the colorspace choice, as can be
observed from the results obtained in [Terrillon et al. 2000], [Lee
and Yoo 2002]. Methods that use an explicitly defined skin region also
benefit much from an appropriate colorspace choice [Peer et al. 2003],
[Gomez 2000], [Gomez and Morales 2002]. Non-parametric methods (Bayes
SPM, SOM, LUT), on the contrary, are almost independent of the
colorspace choice [Zarit et al. 1999], [Brown et al. 2001], [Albiol
et al. 2001]. We believe that skin and non-skin overlap heavily
damages the performance of parametric skin color models [Terrillon
et al. 2000], [Lee and Yoo 2002] and of the lookup-table (LUT) method
[Zarit et al. 1999], because this overlap is not taken into account by
the model.

The independence of colorspace choice for most non-parametric models
fits well with the theoretical results obtained in [Albiol et al.
2001]. The authors state that for an optimal skin detector D(x) in
colorspace C and for an invertible colorspace transformation
T : C → C′, there exists a classifier D′(x′) in colorspace C′ that has
the same correct detection and false positive rates. They give an
example of Bayes SPM, which performs almost equally in several
colorspaces.

Many works on skin detection drop the luminance component of the
colorspace. This decision seems logical, as the goal is to model what
can be thought of as "skin tone", which is controlled more by the
chrominance than by the luminance coordinates. The dimensionality
reduction achieved by discarding luminance also simplifies the
subsequent color analysis. Another argument for ignoring luminance is
that skin color differs from person to person mostly in brightness and
less in the tone itself. Illumination conditions clearly affect the
color of the objects in the scene. The goal of any color-based system
is to diminish this influence to make color-based recognition robust
to illumination change. It seems that chrominance-only color analysis
should render the system partially independent of the lighting
conditions. The benefit of luminance component removal, which had
seemed perfectly logical to many researchers, was questioned by [Shin
et al. 2002]. The tests the authors performed show that luminance
removal does not increase the separability of skin and non-skin
clusters. This is, of course, true, because the projection of 3D data
onto a plane almost certainly smears the skin and non-skin classes
together. But we think that dropping luminance is a matter of training
data generalization. For a training dataset with a sparse distribution
of skin luminances (e.g. a small number of face images under similar
lighting conditions), the removal of the luminance component helps to
construct a skin classifier that will also work for images with
different lighting intensity. Also, the reduction of space
dimensionality is very attractive in some cases.

An interesting study of skin color distribution behavior under
changing lighting conditions was performed in [Storring et al. 1999]
and [Martinkauppi et al. 2001]. The authors have shown that, for
different lighting conditions, the skin color from their dataset (of
approximately 125 individuals) lies inside a well-defined region of
the colorspace, the so-called skin locus, which can be modelled by one
or two functions of at most quadratic order. The locus is
camera-specific and is used by [Soriano et al. 2000] as the skin color
filter for dynamic skin histogram updating during face tracking.
The locus may be found experimentally or, in principle, may be
calculated. A database of illuminants and skin spectral reflectances,
together with knowledge of the camera sensitivities (for example,
supplied by the manufacturer), allows the user to compute the camera
skin locus [Martinkauppi and Soriano 2001].

An adaptive colorspace switching method for face tracking was
proposed in [Stern and Efros 2002]. The optimal colorspace for a given
video frame is determined by a simple colorspace quality measure.
Dynamic change of the colorspace is intended to contribute to the
robustness of the face tracking method. However, judging from the
experimental data the authors have provided, among five colorspace
chromaticity planes (normalized RGB, HS, YQ, and CrCb) and the RG
plane of the RGB space, the normalized RGB and HS planes performed
almost equally and much better than the others. This suggests that
little was gained by adaptive colorspace switching, compared to using
solely HS or normalized RGB.


7 Conclusion

In this paper, we have provided the description, comparison and
evaluation results of popular methods for skin modelling and
detection. We tried to summarize the most notable and significant
differences between the methods, their advantages and disadvantages.
The most important conclusions we draw are listed below:

• Parametric skin modelling methods are better suited for
  constructing classifiers when the training set and the expected
  target dataset are limited. The generalization and interpolation
  ability of these methods makes it possible to construct a
  classifier with acceptable performance from incomplete training
  data.

• Methods that are less dependent on the skin cluster shape and take
  skin and non-skin color overlap into account (Bayes SPM, the
  Maximum entropy model [Jedynak et al. 2002], automatically
  constructed colorspace and classification rules [Gomez and
  Morales 2002]) look more promising for constructing skin
  classifiers for large target datasets.

• Excluding color luminance from the classification process cannot
  help to achieve better discrimination of skin and non-skin colors,
  but can help to generalize sparse training data.

• Evaluation of colorspace goodness 'in general' by assessing
  skin/non-skin overlap, skin cluster shape, etc., regardless of any
  specific skin modelling method, cannot show how well the colorspace
  is suited for skin modelling, because different modelling methods
  react very differently to a colorspace change.


8 Acknowledgements

We would like to express our gratitude to our colleague Alexey
Ignatenko, who has provided us with a convenient tool for graphical
analysis of statistical data, which was used for testing the skin
color modelling methods.


References

AHLBERG, J. 1999. A system for face localization and facial feature
extraction. Tech. Rep. LiTH-ISY-R-2172, Linkoping University.
ALBIOL, A., TORRES, L., AND DELP, E. J. 2001. Optimum color spaces
for skin detection. In Proceedings of the International Conference on
Image Processing, vol. 1, 122–124.
BIRCHFIELD, S. 1998. Elliptical head tracking using intensity gradients
and color histograms. In Proceedings of CVPR '98, 232–237.
BRAND, J., AND MASON, J. 2000. A comparative assessment of
three approaches to pixellevel human skin-detection. In Proc. of the
International Conference on Pattern Recognition, vol. 1, 1056–1059.
BROWN, D., CRAW, I., AND LEWTHWAITE, J. 2001. A som based
approach to skin detection with application in real time systems. In Proc.
of the British Machine Vision Conference, 2001.
CHAI, D., AND BOUZERDOUM, A. 2000. A bayesian approach to skin
color classification in ycbcr color space. In Proceedings IEEE Region
Ten Conference (TENCON'2000), vol. 2, 421–424.
CHEN, Q., WU, H., AND YACHIDA, M. 1995. Face detection by fuzzy
pattern matching. In Proc. of the Fifth International Conference on
Computer Vision, 591–597.
FLECK, M., FORSYTH, D. A., AND BREGLER, C. 2002. Finding naked
people. In Proc. of the ECCV, vol. 2, 592–602.
GOMEZ, G., AND MORALES, E. 2002. Automatic feature construction
and a simple rule induction algorithm for skin detection. In Proc. of the
ICML Workshop on Machine Learning in Computer Vision, 31–38.
GOMEZ, G. 2000. On selecting colour components for skin detection. In
Proc. of the ICPR, vol. 2, 961–964.
HSU, R.-L., ABDEL-MOTTALEB, M., AND JAIN, A. K. 2002. Face
detection in color images. IEEE Trans. Pattern Analysis and Machine
Intelligence 24, 5, 696–706.
JEDYNAK, B., ZHENG, H., DAOUDI, M., AND BARRET, D. 2002.
Maximum entropy models for skin detection. Tech. Rep. XIII, Universite
des Sciences et Technologies de Lille, France.
JONES, M. J., AND REHG, J. M. 1999. Statistical color models with
application to skin detection. In Proc. of the CVPR '99, vol. 1, 274–280.
JORDAO, L., PERRONE, M., COSTEIRA, J., AND SANTOS-VICTOR, J.
1999. Active face and feature tracking. In Proceedings of the 10th
International Conference on Image Analysis and Processing, 572–577.
KRUPPA, H., BAUER, M. A., AND SCHIELE, B. 2002. Skin patch
detection in real-world images. In Annual Symposium for Pattern
Recognition of the DAGM 2002, Springer LNCS 2449, 109–117.
LEE, J. Y., AND YOO, S. I. 2002. An elliptical boundary model for
skin color detection. In Proc. of the 2002 International Conference on
Imaging Science, Systems, and Technology.
MARQUES, F., AND VILAPLANA, V. 2000. A morphological approach for
segmentation and tracking of human faces. In International Conference
on Pattern Recognition (ICPR'00), vol. 1, 5064–5068.
MARTINKAUPPI, B., AND SORIANO, M. 2001. Basis functions of the
color signals of skin under different illuminants. In 3rd Intl conference
on Multispectral Color Science, 21–24.
MARTINKAUPPI, B., LAAKSONEN, M., AND SORIANO, M. 2001.
Behavior of skin color under varying illumination seen by different
cameras at different color spaces. In Machine Vision Applications in
Industrial Inspection IX, Proceedings of SPIE, vol. 4301, 102–112.
MCKENNA, S., GONG, S., AND RAJA, Y. 1998. Modelling facial colour
and identity with gaussian mixtures. Pattern Recognition 31, 12, 1883–
1892.
MENSER, B., AND WIEN, M. 2000. Segmentation and tracking of facial
regions in color image sequences. In Proc. SPIE Visual Communications
and Image Processing 2000, 731–740.
OLIVER, N., PENTLAND, A., AND BERARD, F. 1997. Lafter: Lips and
face real time tracker. In Proc. Computer Vision and Pattern Recognition,
123–129.
PEER, P., KOVAC, J., AND SOLINA, F. 2003. Human skin colour clustering
for face detection. In submitted to EUROCON 2003 - International
Conference on Computer as a Tool.
PHUNG, S. L., BOUZERDOUM, A., AND CHAI, D. 2002. A novel
skin color model in ycbcr color space and its application to human
face detection. In IEEE International Conference on Image Processing
(ICIP'2002), vol. 1, 289–292.
POYNTON, C. A. 1995. Frequently asked questions about colour. In
ftp://www.inforamp.net/pub/users/poynton/doc/colour/ColorFAQ.ps.gz.
SABER, E., AND TEKALP, A. 1998. Frontal-view face detection and facial
feature extraction using color, shape and symmetry based cost functions.
In Pattern Recognition Letters, vol. 9, 669–680.
SCHUMEYER, R., AND BARNER, K. 1998. A color-based classifier for
region identification in video. In Visual Communications and Image
Processing 1998, SPIE, vol. 3309, 189–200.
SHIN, M. C., CHANG, K. I., AND TSAP, L. V. 2002. Does colorspace
transformation make any difference on skin detection? In IEEE
Workshop on Applications of Computer Vision.
SIGAL, L., SCLAROFF, S., AND ATHITSOS, V. 2000. Estimation and
prediction of evolving color distributions for skin segmentation under
varying illumination. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition, vol. 2, 152–159.
SKARBEK, W., AND KOSCHAN, A. 1994. Colour image segmentation
– a survey –. Tech. rep., Institute for Technical Informatics, Technical
University of Berlin, October.
SORIANO, M., HUOVINEN, S., MARTINKAUPPI, B., AND LAAKSONEN,
M. 2000. Skin detection in video under changing illumination
conditions. In Proc. 15th International Conference on Pattern
Recognition, vol. 1, 839–842.
STERN, H., AND EFROS, B. 2002. Adaptive color space switching for
face tracking in multi-colored lighting environments. In Proc. of the
International Conference on Automatic Face and Gesture Recognition,
249–255.
STORRING, M., ANDERSEN, H., AND GRANUM, E. 1999. Skin colour
detection under changing lighting condition. In Araujo and J. Dias (ed.)
7th Symposium on Intelligent Robotics Systems, 187–195.
TERRILLON, J.-C., SHIRAZI, M. N., FUKAMACHI, H., AND AKAMATSU,
S. 2000. Comparative performance of different skin chrominance
models and chrominance spaces for the automatic detection of human
faces in color images. In Proc. of the International Conference on Face
and Gesture Recognition, 54–61.
TREES, H. L. V. 1968. Detection, Estimation, and Modulation Theory,
vol. I. Wiley.
WANG, C., AND BRANDSTEIN, M. 1999. Multi-source face tracking with
audio and visual data. In IEEE MMSP, 169–174.
WARK, T., AND SRIDHARAN, S. 1998. A syntactic approach to automatic
lip feature extraction for speaker identification. In ICASSP, 3693–3696.
YANG, M.-H., AND AHUJA, N. 1998. Detecting human faces in color
images. In International Conference on Image Processing (ICIP), vol. 1,
127–130.
YANG, M., AND AHUJA, N. 1999. Gaussian mixture model for human skin
color and its application in image and video databases. In Proc. of the
SPIE: Conf. on Storage and Retrieval for Image and Video Databases
(SPIE 99), vol. 3656, 458–466.
YANG, J., LU, W., AND WAIBEL, A. 1998. Skin-color modeling and
adaptation. In Proceedings of ACCV 1998, 687–694.
ZARIT, B. D., SUPER, B. J., AND QUEK, F. K. H. 1999. Comparison of
five color models in skin pixel classification. In ICCV'99 Int'l Workshop
on recognition, analysis and tracking of faces and gestures in Real-Time
systems, 58–63.
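
Supplementary illustration. The evaluation discussion notes that a skin
probability can be obtained from the two class-conditional pdfs
p(RGB|skin) and p(RGB|¬skin) by the Bayes rule. The sketch below shows
one way this could look, assuming the standard form of Bayes' theorem.
The function names, the prior value and the decision threshold are
hypothetical choices made for this illustration and are not taken from
any of the surveyed papers.

```python
def skin_posterior(p_c_skin, p_c_nonskin, prior_skin=0.5):
    """Return P(skin | c) from the class-conditional likelihoods of a color c.

    p_c_skin    -- p(c | skin), e.g. read from a skin histogram or mixture model
    p_c_nonskin -- p(c | non-skin), from the corresponding non-skin model
    prior_skin  -- P(skin); 0.5 is an arbitrary default for this sketch
    """
    numerator = p_c_skin * prior_skin
    evidence = numerator + p_c_nonskin * (1.0 - prior_skin)
    if evidence == 0.0:
        # Color unseen in both classes: fall back to the prior.
        # This fallback is an assumption of the sketch.
        return prior_skin
    return numerator / evidence


def is_skin(p_c_skin, p_c_nonskin, prior_skin=0.5, threshold=0.5):
    """Label a pixel as skin when the posterior reaches the threshold."""
    return skin_posterior(p_c_skin, p_c_nonskin, prior_skin) >= threshold
```

Because the non-skin likelihood enters the denominator, a color that is
frequent in both classes receives a low posterior. This is the sense in
which such a classifier takes skin and non-skin overlap into account,
while a model built from skin samples alone does not.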
