
(ICRCICN)

Detection of Drowsiness based on HOG features and SVM classifiers
Leo Pauly, Deepa Sankar
Division of Electronics Engineering
School of Engineering
Cochin University of Science and Technology
Kochi -682022, Kerala, India
leopauly@ieee.org, deepasankar@cusat.ac.in

Abstract: This paper presents an accurate method of drowsiness detection for images obtained using low resolution consumer grade web cameras under normal lighting conditions. The method uses a Haar based cascade classifier for eye tracking and a combination of Histogram of Oriented Gradients (HOG) features and a Support Vector Machine (SVM) classifier for blink detection. Once the eye blinks are detected, the PERCLOS value is calculated from them. If the PERCLOS value is greater than 6 seconds, the person is said to be drowsy. The presented system was validated by comparing its predictions with those of a human rater. The system matched the human observer with 91.6% accuracy.

Index Terms: Drowsiness detection, eye tracking, blink detection, PERCLOS, Haar based cascade classifier, SVM, HOG feature

I. INTRODUCTION

Drowsiness is an involuntary human physical activity. Webster's Dictionary defines drowsiness as a feeling of being sleepy and lethargic [1]. Since drowsiness is directly related to human concentration and activeness, drowsiness detection has been applied in fields such as human behavioral analysis, fatigue detection and alertness level measurement.

Many different methods have been proposed for drowsiness detection. They can be broadly divided into two categories: intrusive methods and non-intrusive methods. Intrusive methods place measuring apparatus on the human body, whereas in non-intrusive methods no device comes in direct contact with the human body. Methods such as [2]-[7] are examples of intrusive drowsiness detection methods, and methods such as [8]-[12] are examples of non-intrusive ones.

Most of these methods face challenges. The intrusive methods always require measuring devices in contact with the human body, as well as additional hardware. The non-intrusive methods require high resolution cameras for capturing images. Even though most of the methods give highly accurate results in constrained lab environments, their performance drops when they are used in real world conditions in an unconstrained environment.

Against this background, this paper presents a new method for drowsiness detection using consumer grade web cameras. It uses a Haar based cascade classifier for eye tracking, and Histogram of Oriented Gradients (HOG) features with Support Vector Machines (SVM) for blink detection. After blink detection the PERCLOS value is calculated. If the PERCLOS value is greater than 6000 ms, the person is said to be drowsy.

The presented system is user friendly, non-intrusive and does not require any specialized hardware. It performs well in uncontrolled lighting conditions at the normal resolutions of a USB web camera. The prototype of the system was developed using MATLAB 2014a. The reliability of the system was verified by comparing its performance with the judgments of a human rater. The system showed a 91.6% match with the judgments of the human rater.

II. DEVELOPED SYSTEM

The steps involved in the algorithm of the developed system are as follows:

1. Capturing video frames using the web camera
2. Face detection and extraction
3. Eye region extraction and eye detection
4. Blink detection
5. PERCLOS calculation
6. Drowsiness detection

Each of these steps is explained in detail in the following sections.

A. Capturing Image using the web camera
In the first step, the video of the subject under test is captured using the web camera. The captured video is stored as a collection of frames (images). Each of these frames is extracted and processed separately.

B. Detection and extraction of face image
The faces are extracted from the video frames captured by the web camera. For this, the face detection method of Haar based cascade classifiers proposed by Viola and Jones [13] is used.
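
The detector named above is built on rectangular Haar features, which are detailed in the subsection that follows. As a rough illustration only, the sketch below evaluates one two-rectangle feature on a grayscale image held as a list of rows. It assumes the usual Viola-Jones convention of weight +1 for the white rectangle and -1 for the black one, and it uses the integral image from [13] to sum rectangles quickly; the integral image is not described in this paper, and all function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table with one extra leading row and column of zeros."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixel intensities inside one rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_haar(ii, top, left, height, width):
    """Two-rectangle feature: white left half (+1) minus black right half (-1)."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black


# A bright left half next to a dark right half gives a strong response.
edge = [[10, 10, 0, 0] for _ in range(4)]
flat = [[7, 7, 7, 7] for _ in range(4)]
edge_response = two_rect_haar(integral_image(edge), 0, 0, 4, 4)
flat_response = two_rect_haar(integral_image(flat), 0, 0, 4, 4)
```

A uniform patch yields a zero response, which is why such features respond to contrast patterns like the darker eye band against brighter cheeks rather than to absolute brightness.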

1) Haar features: The face detection method uses Haar features. Haar features are extracted using a set of rectangular black and white windows. The black region has a weight of -1 and the white region has a weight of +1. The windows are first applied to the image and the weights are multiplied with the corresponding pixel intensities. These values are then added together to obtain the Haar feature corresponding to the window used. Not all of the extracted Haar features are required for successful detection of faces. Hence a boosting algorithm [14] is used to find the most important features for face detection.

2) Cascade classifiers: Once the Haar features are obtained, individual classifiers are built based on the values of each Haar feature. These individual classifiers are then arranged into a cascade classifier. A cascade classifier is a combination of several classifiers arranged in stages that follow one after another. The number of classifiers in each stage and their threshold values are determined by the boosting algorithm during the training of the classifiers with labeled face images. The cascade classifier used here has 22 stages and a total of 2135 features.

3) Face detection: The general structure of the cascade classifier is given in Fig 1. A certain number of features are checked at each stage of the cascade classifier. When a region of the image is taken, it is first checked using the first stage of the cascade classifier. If the image region fails to pass the first stage, it is rejected as not containing a face; otherwise it is passed to the next stage. If the region satisfies all the stages, it is classified as containing a face. The advantage of using a cascade classifier is that it reduces the computational load, as each stage needs only a certain set of features to be extracted rather than the entire set of 2135 features at once.

Fig 1: General structure of the cascade classifier

C. Eye Region Extraction and eye detection
Once the face images are obtained, the eye region is extracted. The regions containing the eyes are extracted using the geometrical properties of the human face. According to [15] there is a geometrical ratio followed in the arrangement of organs like the eyes, nose and mouth. If d corresponds to the total length of the human face, then the eye region lies between 0.2d and 0.6d from the top of the face. Fig 2 shows these geometrical proportions of the human face. This concept is used to extract the eye region from the face. The process is illustrated in detail in section III during the discussion of implementation.

Fig 2: Geometric ratios of human face

Once the eye regions are obtained, the next step is the detection of the eyes in those regions. The left and right eye regions are processed separately. Eye detection is performed using the Haar based cascade classifier described in section II.B. Here a cascade classifier trained with eye images is used. The classifier divides the eye region into portions that contain an eye and portions that do not. The portions containing an eye are then extracted, and thus the eyes are detected. These eye images are sent to the next stage for further processing.

This detection method fails when the eyes are closed. In such situations the eye images are cut out of the extracted eye region at the eye positions found in the preceding frame of the video sequence.

D. Blink detection
Once the eyes are detected, the next stage is blink detection. The presented system uses Histograms of Oriented Gradients (HOG) as features and Support Vector Machines as binary classifiers for eye blink detection from the eye images.

1) HOG features: The HOG features method [16] was developed by Dalal and Triggs in 2005. It is a feature descriptor used for various object detection applications in the field of computer vision. The key idea of HOG features is to group gradient magnitudes into the bins of a histogram based on their orientation.

HOG features are extracted from the eye images obtained in the previous section. For that, the image is first resized to 24x24 pixels. The image is then divided into 4 blocks of size 16x16 pixels, with each block overlapping half of the region covered by the preceding block. Each block has 4 cells of size 8x8 pixels. Next, the gradients are computed for each pixel inside the cells using Sobel filters. The gradient magnitudes are then plotted in a histogram which has magnitudes on the y-axis and orientations on the x-axis. The x-axis is divided into 9 bins, each bin having a width of 20 degrees. The gradient magnitudes are arranged into these 9-bin histograms based on their orientation. The values of these 9 bins are the feature values extracted from each cell of the image. Thus each cell of 64 pixels is represented using 9 feature values. The feature values of all the cells inside each block are then concatenated to obtain the final feature descriptor, giving 4 x 4 x 9 = 144 feature values per eye image. Fig 3 shows the HOG features extracted from open and closed eye images. A more detailed description of the HOG features can be found in [16].

Fig 3: HOG features extracted from the open and closed eyes

2) SVM classifier: Support Vector Machines were initially developed by Vapnik and his team, and were later improved by other researchers. The SVM is a statistical learning model commonly used for classification problems. A detailed description of the SVM is given in [17].

Let the data points of the eye images be represented by $(h_1, y_1), (h_2, y_2), \ldots, (h_n, y_n)$, where $h_i$ is the HOG feature vector of the i-th eye image and $y_i$ is the class of the i-th eye image. $y$ can take the two values 0 and 1: 0 represents a closed eye image and 1 represents an open eye image. The basic idea of the SVM is to find a hyperplane with the maximum margin that separates the two classes. For linearly separable data the hyperplane in terms of support vectors is given by:

\[ f(h) = \sum_{i=1}^{n} \alpha_i \, y_i \, (h_i \cdot h) + b \]

where $y_i$ denotes the class of the data point $h_i$, $h$ is the feature vector being classified, and the data points $h_i$ with non-zero $\alpha_i$ are the support vectors. Finding the hyperplane is a Lagrange optimization problem, and it is solved using the Lagrange multipliers $\alpha_i$ $(i = 1, \ldots, n)$. If the data is not linearly separable, it is first mapped using a kernel function into another feature space where it is linearly separable. The equation then becomes:

\[ f(h) = \sum_{i=1}^{n} \alpha_i \, y_i \, V(h_i, h) + b \]

where $V$ represents the kernel function.

In the presented system the SVM classifier uses a Gaussian Radial Basis Function for mapping the data into another feature space. The classifier is trained using 40 eye images each of open and closed eyes. Once the classifier is trained, the test images are applied to it for classification.

E. Measurement of PERCLOS
PERCLOS [18] is a measure of the duration of time for which the eyes were closed. It is defined as the amount of time the eyelids remain closed in 1 minute. PERCLOS (in seconds) is calculated in the presented system using the following equation:

\[ \text{PERCLOS} = \frac{\text{number of frames in which the eyes are closed in one minute} \times 60}{\text{total number of frames in one minute}} \]

F. Drowsiness Detection
Once the PERCLOS value of the user is calculated, the next step is to detect whether the person is in a drowsy state or not. The basic principle is that the eyes are closed for longer durations when the person is drowsy than when the person is active. Hence if the value of PERCLOS exceeds a particular threshold, the person can be said to be in a drowsy state; otherwise the person can be said to be active. According to [19] the average blink duration of a human being is 100-400 ms, and according to [20] the number of blinks per minute is 10-15. Taking the upper values, the time for which the eyes of a normal person in the active state are closed in 1 minute would be 400 x 15 = 6000 ms. So the threshold value for PERCLOS is estimated to be 6.00 seconds. If the PERCLOS duration exceeds this limit, the person is estimated to be in a drowsy state; otherwise he is said to be active.

III. RESULTS & DISCUSSIONS

A. Implementation
In the presented system the video frames are captured using an ordinary CMOS web camera which has a resolution of 640 x 480 pixels and a frame rate of 5ps. Since the captured video is an array of video frames (images), each of these frames is separated and processed individually. Fig 4 shows a single video frame extracted from the video captured by the web camera.
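
Looking back at the HOG layout of Section II.D, the following sketch shows one way a 24x24 eye image could be turned into 4 blocks x 4 cells x 9 bins = 144 values. It is an illustration under stated assumptions rather than the authors' MATLAB code: border pixels are replicated for the Sobel filter, orientations are treated as unsigned (0 to 180 degrees, which is what 9 bins of 20 degrees imply), and no block normalization is applied because the paper does not describe one. The function names are hypothetical.

```python
import math


def sobel_gradients(img):
    """Horizontal and vertical Sobel responses, replicating border pixels."""
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx[r][c] = float(
                px(r - 1, c + 1) + 2 * px(r, c + 1) + px(r + 1, c + 1)
                - px(r - 1, c - 1) - 2 * px(r, c - 1) - px(r + 1, c - 1))
            gy[r][c] = float(
                px(r + 1, c - 1) + 2 * px(r + 1, c) + px(r + 1, c + 1)
                - px(r - 1, c - 1) - 2 * px(r - 1, c) - px(r - 1, c + 1))
    return gx, gy


def cell_histogram(gx, gy, top, left, cell=8, bins=9):
    """Accumulate gradient magnitudes of one cell into orientation bins."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins  # 20 degrees per bin
    for r in range(top, top + cell):
        for c in range(left, left + cell):
            magnitude = math.hypot(gx[r][c], gy[r][c])
            angle = math.degrees(math.atan2(gy[r][c], gx[r][c])) % 180.0
            hist[min(int(angle // bin_width), bins - 1)] += magnitude
    return hist


def hog_descriptor(img, block=16, stride=8, cell=8):
    """Concatenate the cell histograms of all half-overlapping blocks."""
    size = len(img)
    gx, gy = sobel_gradients(img)
    features = []
    for block_top in range(0, size - block + 1, stride):
        for block_left in range(0, size - block + 1, stride):
            for cell_top in range(block_top, block_top + block, cell):
                for cell_left in range(block_left, block_left + block, cell):
                    features.extend(cell_histogram(gx, gy, cell_top, cell_left))
    return features


# Synthetic 24x24 test image with a single vertical edge down the middle.
vertical_edge = [[0] * 12 + [255] * 12 for _ in range(24)]
descriptor = hog_descriptor(vertical_edge)
```

For this synthetic image every gradient is horizontal, so all of the energy falls into the first orientation bin of the cells that straddle the edge, and the descriptor has 144 entries.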

Fig 4: Video frame captured by the web camera

Then the face is detected in the image using the Haar based cascade classifier described in section II.B. The classifier divides the frame into portions that contain a face and portions that do not. The portion containing a face is extracted, and thus the face is detected. Fig 5 shows the face detection algorithm working on the frame captured by the web camera.

Fig 5: Face detected from the frame using Haar based cascade classifier

The geometrical properties of the human face are computed on the face image extracted from the video frame, as described in section II.C. These geometrical ratios and the extracted eye region are shown in Fig 6(a) and Fig 6(b) respectively. The extracted eye region is then separated into two regions by cutting exactly at the center, as shown in Fig 6(c), and sent to the next stages for further processing.

Fig 6: (a) geometrical ratios of the face, (b) extracted eye region, (c) eye region separated at the center

After that, just as face detection is performed by Haar based cascade classifiers, these are used for detection of the eyes in the eye region as explained in section II.C. Fig 7 shows the left and right eyes extracted from the two eye regions. These eye images are passed on to the next stage for further classification.

Fig 7: Eyes detected using Haar based cascade classifier applied on the eye region

With the extraction of the eyes as shown in Fig 7, the eye tracking algorithm is completed. The eyes are tracked successfully in the frame extracted from the video sequence. Next is the eye blink detection. For detecting the eye blinks the combination of HOG features and SVM classifier is used. 144 HOG features are extracted from each eye image. These feature values are applied to the trained SVM classifier, which classifies the eye image as either open or closed.
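
The classification step can be pictured with the kernel decision function of Section II.D. The sketch below evaluates that function with a Gaussian radial basis kernel. It is only illustrative: the support vectors, multipliers, bias and `gamma` are made-up toy values rather than parameters from the trained classifier, the two-dimensional vectors stand in for the 144-value HOG descriptors, and the paper's class labels 1 (open) and 0 (closed) are assumed to be mapped to +1 and -1 as is usual in the SVM formulation.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian radial basis kernel between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def decision_value(h, support_vectors, labels, alphas, bias, gamma):
    """f(h) = sum_i alpha_i * y_i * K(h_i, h) + b over the support vectors."""
    total = sum(alpha * y * rbf_kernel(sv, h, gamma)
                for sv, y, alpha in zip(support_vectors, labels, alphas))
    return total + bias


def classify_eye(h, support_vectors, labels, alphas, bias, gamma):
    """Map the sign of the decision value to the two eye states."""
    value = decision_value(h, support_vectors, labels, alphas, bias, gamma)
    return "open" if value >= 0 else "closed"


# Toy model (hypothetical values): one support vector per class.
toy_support_vectors = [[0.0, 0.0], [1.0, 1.0]]
toy_labels = [-1, +1]          # -1 = closed, +1 = open (assumed mapping)
toy_alphas = [1.0, 1.0]
toy_bias = 0.0
toy_gamma = 1.0

near_open = classify_eye([0.9, 0.9], toy_support_vectors, toy_labels,
                         toy_alphas, toy_bias, toy_gamma)
near_closed = classify_eye([0.1, 0.1], toy_support_vectors, toy_labels,
                           toy_alphas, toy_bias, toy_gamma)
```

A query vector is assigned to the class of the support vectors it lies closest to, weighted by their multipliers, which is how the kernel lets the classifier separate data that is not linearly separable in the original feature space.
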
Next is the PERCLOS measurement. To measure the PERCLOS, the number of frames in which the eyes were detected to be closed was counted. It was then divided by the number of frames in each minute and multiplied by 60 to express it in seconds, as in section II.E. If the value is greater than 6 seconds, the person is said to be drowsy; otherwise he is termed active.
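
The measurement and the decision rule can be written down in a few lines. This is a minimal sketch, assuming the per-frame output of the blink classifier is available as a list of booleans covering one minute of video (True meaning the eyes were classified as closed) and reading the blink rate cited from [20] as blinks per minute; the names are hypothetical.

```python
BLINK_DURATION_MS = 400   # upper end of the 100-400 ms range cited from [19]
BLINKS_PER_MINUTE = 15    # upper end of the 10-15 range cited from [20]

# 400 ms x 15 blinks = 6000 ms of eye closure per minute, i.e. 6 seconds.
PERCLOS_THRESHOLD_S = BLINK_DURATION_MS * BLINKS_PER_MINUTE / 1000.0


def perclos_seconds(eye_closed_flags):
    """Closed frames x 60 / total frames, for one minute of frames."""
    total_frames = len(eye_closed_flags)
    if total_frames == 0:
        raise ValueError("at least one frame is required")
    closed_frames = sum(1 for closed in eye_closed_flags if closed)
    return closed_frames * 60.0 / total_frames


def is_drowsy(eye_closed_flags):
    """Drowsy when the eye closure time exceeds the 6 second threshold."""
    return perclos_seconds(eye_closed_flags) > PERCLOS_THRESHOLD_S
```

For example, in a one-minute window of 1800 frames, 270 closed frames give 270 x 60 / 1800 = 9 seconds of closure, which is above the threshold, whereas 90 closed frames give 3 seconds and the subject is termed active.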

B. Validation
For validating the method, twelve test videos of different durations and frame rates, captured under the normal lighting conditions of an ordinary room, were used. Fig 8 shows sample frames from the test videos used.
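
As a quick consistency check on this validation, the sketch below applies the 6 second rule to the PERCLOS values listed in Table 1 and compares the outcome with the predictions reported in the same table. The figures are copied from Table 1; the 11-out-of-12 agreement with the human rater is taken from the text of the paper, and the variable names are hypothetical.

```python
PERCLOS_THRESHOLD_S = 6.0

# (subject number, PERCLOS in seconds, prediction reported in Table 1)
TABLE_1 = [
    (1, 26.71, "Drowsy"), (2, 15.11, "Drowsy"), (3, 18.71, "Drowsy"),
    (4, 1.09, "Active"), (5, 3.05, "Active"), (6, 17.48, "Drowsy"),
    (7, 9.09, "Drowsy"), (8, 27.72, "Drowsy"), (9, 2.74, "Active"),
    (10, 39.37, "Drowsy"), (11, 27.41, "Drowsy"), (12, 4.02, "Active"),
]


def predict(perclos_s):
    """Threshold rule of section II.F applied to one PERCLOS value."""
    return "Drowsy" if perclos_s > PERCLOS_THRESHOLD_S else "Active"


# Every reported prediction should follow from the threshold rule.
rule_matches_table = all(predict(p) == reported for _, p, reported in TABLE_1)

# Agreement with the human rater as stated in the paper: 11 of 12 videos.
matching_cases, total_cases = 11, 12
accuracy_percent = int(matching_cases / total_cases * 1000) / 10  # truncated
```

The ratio 11/12 is 0.9166..., so the reported 91.6% corresponds to truncating rather than rounding the percentage.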

Fig 8: Sample frames from the test videos used

The algorithm is then applied to the test videos and the PERCLOS value is calculated. From the PERCLOS value it is determined whether the subject is drowsy or not. The correctness of the prediction is then validated by a human rater. The human rater was asked to rate the drowsiness level on the Karolinska Sleepiness Scale (KSS) [21]. For simplicity the KSS scale was further divided into 2 broad categories, Active and Drowsy [22]. Fig 9 shows the KSS scale and its two category classification.

Fig 9: Karolinska Sleepiness Scale divided into two classes

The human rater views the videos and determines whether the subject is actually drowsy by rating on the KSS scale, which is provided in the form of a graphical scale. Fig 10 shows the graphical scale used by the human rater. The results are tabulated in Table 1.

Fig 10: Graphical scale used by human rater

Table 1: Comparison of predictions made by the developed system and human rater

Sub No.  Time (sec)  Resolution  Fps  PERCLOS (s)  Prediction  Human rater (KSS)
1        35          640 x 480   30   26.71        Drowsy      9
2        31          320 x 234   25   15.11        Drowsy      7
3        100         480 x 360   15   18.71        Drowsy      7
4        68          640 x 480   10   1.09         Active      1
5        79          640 x 360   15   3.05         Active      5
6        65          640 x 360   15   17.48        Drowsy      9
7        46          640 x 480   30   9.09         Drowsy      6
8        47          640 x 480   30   27.72        Drowsy      8
9        40          640 x 480   30   2.74         Active      4
10       36          640 x 480   30   39.37        Drowsy      9
11       38          320 x 184   30   27.41        Drowsy      8
12       37          320 x 184   30   4.02         Active      2

From the table it can be inferred that the presented system was successful in detecting human drowsiness. The predictions of the human rater and the system matched in 11 out of the 12 cases tested. The wrongly classified case is highlighted in the above table. Thus the system has an accuracy of 91.6%.

IV. CONCLUSIONS AND FUTURE SCOPE

In this paper a drowsiness detection method for images obtained using web cameras is presented. The system uses a Haar based cascade classifier for eye tracking and a HOG-SVM combination for eye blink detection. After blink detection the PERCLOS value was calculated. If the PERCLOS value was greater than 6 seconds, the person was detected to be drowsy; otherwise he was termed active. The system was further validated by comparing its results with the observations of a human rater. The system gave results with accuracy comparable to that of a human observer.

The presented system performs well under normal lighting conditions and normal resolutions. The method is non-intrusive and hence user friendly. It does not need any special hardware other than a normal web camera. This makes the system suitable for implementation on desktop computers, mobile devices and so on.

This method can be used in a wide variety of applications such as driver alertness measurement, liveliness detection, concentration measurement and measurement of attentiveness.

REFERENCES
[1] Sandberg, David, et al. "The characteristics of sleepiness during real driving at night - a study of driving performance, physiology and subjective experience," Sleep, 34(10), pp. 1317, 2011.
[2] Lin, C. T., Chang, C. J., Lin, B. S., Hung, S. H., Chao, C. F., & Wang, I. J. "A real-time wireless brain-computer interface system for drowsiness detection," In IEEE Transactions on Biomedical Circuits and Systems, 4(4), 214-222, 2010.
[3] Lin, C. T., Chen, Y. C., Huang, T. Y., Chiu, T. T., Ko, L. W., Liang, S. F., ... & Duann, J. R. "Development of wireless brain computer interface with embedded multitask scheduling and its application on real-time driver's drowsiness detection and warning," In IEEE Transactions on Biomedical Engineering, 55(5), 1582-1591, 2008.
[4] Picot, Antoine, Sylvie Charbonnier, and Alice Caplier. "On-line automatic detection of driver drowsiness using a single electroencephalographic channel," In proceedings of 30th Annual International Conference of the Engineering in Medicine and Biology Society, 2008. EMBS, IEEE, 2008.
[5] Pal, Nikhil R., et al. "EEG-based subject-and session-independent drowsiness detection: an unsupervised approach," EURASIP Journal on Advances in Signal Processing, 2008.
[6] Lin, Chin-Teng, et al. "EEG-based drowsiness estimation for safety driving using independent component analysis," Circuits and Systems I: Regular Papers, IEEE Transactions on, 52(12), pp. 2726-2738, 2005.
[7] Sahayadhas, Arun, Kenneth Sundaraj, and Murugappan Murugappan. "Drowsiness detection during different times of day using multiple features," Australasian Physical & Engineering Sciences in Medicine, 36(2), pp. 243-250, 2013.
[8] Kurian, D., Johnson Joseph, P. L., Radhakrishnan, K., & Balakrishnan, A. "Drowsiness Detection using Photoplethysmography Signal," In Fourth International Conference on Advances in Computing and Communications (ICACC), pp. 73-76, IEEE, 2014.
[9] Patel, S. P., Patel, B. P., Sharma, M., Shukla, N., & Patel, H. M. "Detection of Drowsiness and Fatigue level of Driver," In International Journal for Innovative Research in Science and Technology, 1(11), 133-138, 2015.
[10] Jo, J., Lee, S. J., Jung, H. G., Park, K. R., & Kim, J. "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, 50(12), 2011.
[11] Dasgupta, A., George, A., Happy, S. L., & Routray, A. "A vision-based system for monitoring the loss of attention in automotive drivers," In IEEE Transactions on Intelligent Transportation Systems, 14(4), 1825-1838, 2013.
[12] You, Chuang-Wen, et al. "CarSafe: a driver safety app that detects dangerous driving behavior using dual-cameras on smartphones," In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, ACM, 2012.
[13] Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features," In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1. IEEE, 2001.
[14] Freund, Y., Schapire, R.E. "A short introduction to boosting," J. Jap. Soci. Artif. Intell. 14(5), 771-780, 1999.
[15] Oguz, . "The proportion of the face in younger adults using the thumb rule of Leonardo da Vinci," Surgical and Radiologic Anatomy, 18(2), pp. 111-114, 1996.
[16] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection," In proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) 2005, vol. 1, pp. 886-893. IEEE, 2005.
[17] Cristianini, Nello, and John Shawe-Taylor. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.
[18] Wierwille, W. W. "Historical perspective on slow eyelid closure: Whence PERCLOS," Ocular Measures of Driver Alertness, Technical Conference Proceedings. 1999.
[19] Schiffman, H.R., Sensation and Perception. An Integrated Approach, New York: John Wiley and Sons, Inc., 2001.
[20] https://www.ucl.ac.uk/media/library/blinking
[21] Murray W Johns, "What is excessive day time sleeping?", 2009.
[22] Trutschel, Udo, et al. "PERCLOS: An alertness measure of the past," In Proceedings of the Sixth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design. 2011.
