

Driver Drowsiness Detection Based on an Active IR Illumination

Student: Kuang-Siyong Chang Advisor: Prof. Din-Chang Tseng

June 2005 Institute of Computer Science and Information Engineering National Central University Chung-li, Taiwan 320

Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science and Information Engineering from National Central University

Abstract
An active computer vision system is proposed to extract various visual cues for the drowsiness detection of drivers. The visual cues include eye close/open state, eye blinking, eyelid movement, and face direction. The proposed system consists of four parts: an active image acquisition equipment, an eye detector, an eye tracker, and a visual cue extractor. To work under various ambient light conditions, we used an IR camera equipped with a blinking IR illuminator to acquire the driver's pupils and face for detecting and tracking eyes. The bright and dark pupil images acquired by the active equipment share the same background and external illumination, so we can simply subtract the two images to extract the pupils. Based on the location of a pupil, the eye region is clipped and verified by the SVM method. If the detection succeeds in three consecutive frames, the procedure turns to the tracking phase. There are two stages in the tracking phase. The first-stage method is the same as the detection method; if it fails, the second tracking strategy is launched based on a matching principle. We conducted several experiments under various ambient light conditions, such as day and night, to evaluate the proposed system. From the experimental results, we find that the proposed approach can accurately detect and track eyes under various ambient light conditions.


Contents
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
    1.1 Motivation
    1.2 System overview
    1.3 Thesis organization
Chapter 2 Related Works
    2.1 Techniques for detecting drowsiness
    2.2 Existing drowsiness detection systems
        2.2.1 Driver's feature detection
        2.2.2 Drowsiness judgment
Chapter 3 An Active Image Acquisition Equipment
    3.1 Introduction of bright/dark pupil phenomenon
    3.2 Two-ring IR illuminator
    3.3 Hardware and software architectures
Chapter 4 Eye Detection
    4.1 Subtraction
    4.2 Adaptive thresholding
    4.3 Connected-component generation
    4.4 Geometric constraints
    4.5 Eye verification using support vector machine (SVM)
        4.5.1 Introduction of support vector machine
        4.5.2 Training data
Chapter 5 Eye Tracking
    5.1 Prediction
        5.1.1 Basic concept
        5.1.2 Normal update
        5.1.3 Tracking-fail update
    5.2 Two-stage verification
Chapter 6 Visual Cue Extraction
    6.1 Eye close/open
    6.2 Face orientation
Chapter 7 Experiments
    7.1 Experimental platform
    7.2 Experimental results
        7.2.1 Eye detection
        7.2.2 Eye tracking
        7.2.3 Discussions
Chapter 8 Conclusions and Future Work
    8.1 Conclusions
    8.2 Future work
References

List of Figures
Fig.1.1. The process steps of the driver drowsiness detection system.
Fig.2.1. Eye deformable templates.
Fig.2.2. The masks of dimension (2Rmax + 1) × (2Rmax + 1) that are convoluted with the gradient image.
Fig.2.3. Face segmentation. (a) Original color image. (b) Skin segmentation. (c) Connected components. (d) Best-fit ellipses.
Fig.2.4. The first Purkinje image.
Fig.2.5. Composition of image capture apparatus.
Fig.2.6. Difference of shapes around the eyes.
Fig.2.7. Evaluation criteria for brain waves, blinking, and facial expression.
Fig.2.8. Number of eyes close times and alertness level.
Fig.3.1. Principle of bright and dark pupil effects. (a) Bright pupil effect. (b) Dark pupil effect.
Fig.3.2. Examples of bright/dark pupils. (a) Bright pupil image. (b) Dark pupil image.
Fig.3.3. IR light source configuration.
Fig.3.4. The active image acquisition equipment. (a) Front view. (b) Side view.
Fig.3.5. IR LED control circuit.
Fig.3.6. Connection diagram of the image acquisition equipment and PC.
Fig.3.7. Three different configuration types of DirectShow filters. (a) Recording. (b) Live. (c) Playback.
Fig.4.1. Steps for eye detection.
Fig.4.2. An example of subtraction. (a) The C frame image (bright pupil image). (b) The L frame image (dark pupil image). (c) The difference image.
Fig.4.3. Some examples of positive and negative sets. (a) Positive bright pupil set. (b) Positive dark pupil set. (c) Negative non-eye set.
Fig.5.1. The flowchart of two-strategy eye tracking.
Fig.5.2. The concept of prediction and tracking.
Fig.5.3. The ongoing discrete Kalman filter cycle.
Fig.5.4. A complete diagram of prediction.
Fig.5.5. The steps of the search strategy.
Fig.6.1. Eye open/close cue. (a) Image of opened eye. (b) Binary image of opened eye. (c) Image of closed eye. (d) Binary image of closed eye.
Fig.6.2. An example of locating the nose position.
Fig.7.1. Results of eye detection in different light situations. (a) Strong light. (b) Daytime. (c) Indoor light. (d) Indoor light. (e) Indoor light. (f) In the Dark.
Fig.7.2. The eye tracking results in a sequence of consecutive frames. The face is looking downward.
Fig.7.3. The eye tracking results in a sequence of consecutive frames. The face is looking to left.
Fig.7.4. The eye tracking results in a sequence of consecutive frames. The face is looking to right.
Fig.7.5. The falsely detected eye region.

List of Tables
Table 2.1. Technique for Detecting Drowsiness
Table 7.1. The Detection Rate of Three Different Testing Videos

Chapter 1 Introduction
In this chapter, we describe the motivation of our study, present the overview of our drowsiness detection system and give the organization of this thesis.

1.1 Motivation
Long-distance driving is a hard endurance test for drivers. It is also difficult for them to pay attention to driving for the entire trip unless they have strong willpower, patience, and persistence. Thus, driver drowsiness has become an important factor in traffic accidents, and driver assistance and warning systems have been promoted to detect the driver's state of consciousness [6]. We focus on the development of computer vision techniques to extract useful visual cues for this detection. The acquired images should have relatively consistent photometric properties under different climatic and ambient conditions; the images should also provide distinguishable features to facilitate the subsequent image processing. In this study, we used an IR camera and a blinking IR illuminator. The use of an infrared illuminator has three purposes: (i) It minimizes the impact of different ambient light conditions, ensuring image quality under varying real-world conditions including poor illumination, day, and night tours. (ii) It produces the bright pupil effect, which constitutes the foundation for detecting and tracking the proposed visual cues. (iii) Infrared is barely visible to the driver; thus, the IR illumination minimizes any interference with the driver's driving.


1.2 System overview


The main tasks of the proposed system are to detect, track, extract, and analyze the driver's eye and head movements for monitoring the driver's state of consciousness. The proposed system consists of four components: an active image acquisition equipment, eye detection, eye tracking, and visual cue extraction, as shown in Fig.1.1. To work under various light environments, we used an IR camera equipped with a blinking IR illuminator to acquire the driver's pupils and face. Alternating bright/dark pupil images are obtained by the active image acquisition equipment, and the eye positions are detected and tracked in these alternating images. After obtaining the eye positions, we extract the necessary visual cues, such as eyelid movement and head movement, to detect the drowsiness of the driver.

1.3 Thesis organization


The remainder of this thesis is organized as follows. Chapter 2 surveys the related works. In Chapter 3, we introduce the bright and dark pupil effects; the blinking IR illuminator based on these effects is also presented in this chapter. Chapter 4 describes the eye detection and verification methods. Chapter 5 describes the eye tracking method. Chapter 6 presents the eye close/open and face orientation estimation methods. Chapter 7 reports the experiments and their results. Chapter 8 gives the conclusions and some suggestions for future work.


Fig.1.1. The process steps of the driver drowsiness detection system.


Chapter 2 Related Works


In this chapter, several works related to our study are surveyed.

2.1 Techniques for detecting drowsiness


Ueno et al. [23] presented possible techniques for detecting the drowsiness of drivers, which can be broadly divided into five major categories as shown in Table 2.1.

Table 2.1. Technique for Detecting Drowsiness

Sensing of human driver's physiological phenomena
  - Physiological signals: detection by changes in brain waves, blinking, heart rate, skin electric potential, etc.
  - Physical reactions: detection by changes in inclination of the driver's head, sagging posture, frequency at which the eyes close, gripping force on the steering wheel, etc.
Sensing of driving operation: detection by changes in driving operations (steering, accelerator, braking, shift lever, etc.)
Sensing of vehicle behavior: detection by changes in vehicle behavior (speed, lateral G, yaw rate, lateral position, etc.)
Response of driver: detection by periodic requests for a response
Traveling conditions: detection by measurement of traveling time and conditions (day hour, night hour, speed, etc.)

Among these methods, the best one is based on the physiological

phenomena, which can be accomplished in two ways. One way is to measure the changes of physiological signals, such as brain waves, eye blinking, heart rate, pulse rate, and skin electric potential, as a means of detecting a drowsy state. This approach is suitable for making accurate and quantitative judgments of alertness levels; however, attaching the sensing electrodes directly to the body annoys drivers. Thus, it would be difficult to use such sensors under real-world driving conditions. The approach also has the disadvantage of being ill-suited to measurement over a long period of time, owing to the large effect of perspiration on the sensors. The other way focuses on physical changes, such as the inclination of the driver's head, sagging posture, decline in gripping force on the steering wheel, or the open/closed state of the eyes. The measurements of these physical changes are classified into contact and non-contact types. The contact type detects movement by direct means, such as using a hat or eyeglasses or attaching sensors to the driver's body. The non-contact type uses optical sensors or video cameras to detect the changes.

2.2 Existing drowsiness detection systems


Many active systems [5-7, 10-12, 15, 16, 19, 20, 23, 25] have been proposed for detecting driver drowsiness to avoid traffic accidents. Typical drowsiness detection systems are generally divided into two parts: driver's feature detection and drowsiness judgment.

2.2.1 Driver's feature detection


The primary task of driver's feature detection is to detect the driver's face and eyes through image processing and computer vision techniques.

Yuille et al. [27] presented a method for detecting and describing eye features using deformable templates, as shown in Fig.2.1. The eye feature is described by a parameterized template, and an energy function is used to link edges, peaks, and valleys in the image intensity to the corresponding properties of the template. The template then interacts dynamically with the image by altering its parameter values to minimize the energy function, thereby deforming itself to find the best fit.

Fig.2.1. Eye deformable templates.

Bakic and Stockman [1] used a skin color model (normalized RGB) within a connected-components algorithm to extract a face region. The eyes and nose are found based on knowledge of the face geometry. The eye blobs are found by gradual thresholding of the smoothed red-component intensity. For each thresholded image, the connected-components algorithm is used to find dark blobs, and each pair of blobs that are candidates for the eyes is matched to find the nose. To find the nose, the image of the red-component intensity is thresholded at the average intensity. Frames are processed in sequence, and a Kalman filter is used to smooth the feature point coordinates over time. The next frame is then predicted; if the predictions are verified, the initial frame processing is passed; if the predictions are not verified, the entire process

described above is repeated. D'Orazio et al. [3] presented an approach based on the phenomenon that the iris is always darker than the sclera, no matter what color it is. In this way, the edge of the iris is relatively easy to detect. Then a circle detection operator based on the directional circular Hough transform is used to locate the iris. A range [Rmin, Rmax] is set to handle different iris dimensions. The algorithm is based on convolutions applied to the edge image. The masks shown in Fig.2.2 represent, at each point, the direction of the radial vector scaled by the distance from the center in a ring with minimum radius Rmin and maximum radius Rmax. The circle detection operator is applied on the whole image without any constraint on a plain background or limitation to eye regions. The maximum value M1 of the output convolution over the whole image is the best candidate to contain an eye; the second maximum value M2 is searched in the region that is the candidate to contain the second eye. It is then verified whether M1 and M2 form an eye pair or not.

Fig.2.2. The masks of dimension (2Rmax + 1) × (2Rmax + 1) that are convoluted with the gradient image.

Sobottka and Pitas [20] presented an approach for face localization based on the oval shape of the face and skin color; one example is shown in Fig.2.3.



Fig.2.3. Face segmentation. (a) Original color image. (b) Skin segmentation. (c) Connected components. (d) Best-fit ellipses.

Huang and Marianu [9] presented a method to detect the human face and eyes. They used multi-scale filters to obtain the pre-attentive features of objects. These features are then supplied to three different models to analyze the image further. The first is a structural model that partitions the features into facial candidates; after obtaining a geometric structure that fits their constraints, they use affine transformations to fit the real-world face. Secondly, they used a texture model to measure the color similarity of a candidate with the face model, which includes variation between facial regions, symmetry of the face, and color similarity between regions of the face. The texture comparison relies on the cheek regions. Finally, they used a feature model to obtain the candidate positions of the eyes, using eigen-eyes combined with image feature analysis for eye detection. Then they zoom in on the eye region and

perform more detailed analysis, which includes Hough transforms to find circles and reciprocal operations using contour correlation. Shih et al. [17] presented a system using 3-D vision techniques to estimate and track the 3-D sight line of a person. The approach uses multiple cameras and multiple point light sources to estimate the line of sight without using user-dependent parameters, thus avoiding cumbersome calibration processes. The method uses a simplified eye model, and it first uses the Purkinje images of an infrared light source to determine the eye location. When light hits a medium, part is reflected and part is refracted. The first Purkinje image is the light reflected by the exterior cornea, as shown in Fig.2.4 [11]. Then they use linear constraints to determine the line of sight, based on their estimation of the cornea center.

Fig.2.4. The first Purkinje image.

Hamada et al. [5] presented a capture system and image processing method to measure the blinking of a driver during driving. First, they developed a capture method that is robust against changes in the surrounding illumination,

as shown in Fig.2.5. The image processing then deals with the differences in the shapes of faces and the shapes around the eyes of individual people, as shown in Fig.2.6, and with the blinking waveforms that differ from individual to individual. Finally, it estimates the driver's consciousness from the changes in the blinking period.

Fig.2.5. Composition of image capture apparatus.

Fig.2.6. Difference of shapes around the eyes.


2.2.2 Drowsiness judgment


The primary purpose of drowsiness judgment is to judge the driver's drowsiness from different cues. Ueno et al. [23] estimated the driver's alertness level by counting the number of eye closures, and they verified it using brain waves (α waves). They devised an alertness index for quantitative judgments of the driver's state of drowsiness. This index is based on the assignment of points to brain waves, blinking, and facial expression; the point total provides a quantitative measure for judging the alertness level. The specific procedure for rating these three elements is shown in Fig.2.7. As a person's level of alertness drops, a large number of α2 waves appear and their amplitude becomes larger. Blinking is rated by evaluating the measured waveforms for the upper and lower electric potential of the eyes. In a normal state of alertness, blinking appears as sharp spikes in the waveform. As the level of alertness drops, the spikes appear more frequently and subsequently lose their shape, becoming a gentle waveform when a person grows drowsy. Eventually, the waveform shows trapezoidal shapes, indicating that the eyes close for long intervals. Fig.2.8 presents experimental results showing the alertness level and the number of times the driver's eyes closed for two or more seconds while driving on a test course. This result indicates that a reduced level of alertness can be detected with good accuracy by monitoring changes in the degree of openness of the driver's eyes.


Fig.2.7. Evaluation criteria for brain waves, blinking, and facial expression.

Fig.2.8. Number of eyes close times and alertness level.

Smith et al. [18, 19] presented a system not only for detecting the drowsiness of a driver but also for analyzing the driver's visual attention. The system relies on estimation of global motion and color statistics to track a person's head and facial features. It classifies rotation in all viewing directions, detects eye/mouth occlusion, detects eye blinking and eye closure, and recovers the three-dimensional gaze of the eyes to determine the driver's visual attention by a hierarchical detection and tracking method.

Chapter 3 An Active Image Acquisition Equipment


The proposed eye detection and tracking methods are based on the special bright and dark pupil effects. In this chapter, we explain what the bright and dark pupil effects are. We obtain the bright and dark pupil images with our active image acquisition equipment, which consists of an IR camera and a two-ring infrared illuminator. We illustrate its configuration and then describe our hardware and software architectures.

3.1 Introduction of bright/dark pupil phenomenon


According to the original patent of Hutchinson [10], a bright pupil can be obtained if the eyes are illuminated with a NIR illuminator beaming light along the camera optical axis at a certain wavelength. At the NIR wavelength, pupils reflect almost all of the IR light they receive along the path back to the camera, producing the bright pupil effect, very similar to the red-eye effect in photography. If illuminated off the camera optical axis, the pupils appear dark since the reflected light does not enter the camera lens; this produces the so-called dark pupil effect. Fig.3.1 illustrates the principle of the bright and dark pupil effects. Examples of the bright and dark pupil images are shown in Fig.3.2.


Fig.3.1. Principle of bright and dark pupil effects. (a) Bright pupil effect. (b) Dark pupil effect.

Fig.3.2. Examples of bright/dark pupils (a) Bright pupil image. (b) Dark pupil image.


3.2 Two-ring IR illuminator


We used a simple geometric disposition of the IR LEDs, similar to that of Morimoto et al. [14], which can achieve the bright and dark pupil effects with minimal reduction of the camera's operational view. The IR illuminator consists of two sets of IR LEDs, distributed evenly and symmetrically along the circumferences of two coplanar concentric rings, as shown in Fig.3.3. We used a USB infrared camera to acquire the bright and dark pupil images. Mounting our two-ring IR illuminator in front of the camera, a dark pupil image is produced when the outer ring is turned on, and a bright pupil image is produced when the inner ring is turned on. A physical set-up of the IR illuminator is shown in Fig.3.4.

Fig.3.3. IR light source configuration.


Fig.3.4. The active image acquisition equipment. (a) Front view. (b) Side view.

The IR illuminator control circuit is shown in Fig.3.5. The inner and outer rings are each switched by a transistor connected to the RS-232 port. The control transistor of the inner ring connects to the DTR pin, and the outer ring's transistor connects to the RTS pin. A ring is turned on if the voltage of its control pin is set high; otherwise, the ring is turned off. Turning the rings on and off is synchronized with the image capture. The connection diagram of the image acquisition equipment and the PC is shown in Fig.3.6.
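As a rough illustration, the following is a minimal sketch of how such ring switching could be driven from a Windows program through the serial port's DTR and RTS control lines; the COM port name and the active-high polarity are assumptions, not details taken from the thesis.

```cpp
#include <windows.h>

// Minimal sketch: toggle the inner/outer IR rings through the RS-232
// DTR and RTS control lines (COM port name and active-high polarity
// are assumptions, not taken from the thesis).
class IrRingSwitch {
public:
    bool Open(const char* port = "COM1") {
        hCom_ = CreateFileA(port, GENERIC_READ | GENERIC_WRITE,
                            0, nullptr, OPEN_EXISTING, 0, nullptr);
        return hCom_ != INVALID_HANDLE_VALUE;
    }
    // Inner ring on the DTR pin: high = bright pupil frame.
    void SetInnerRing(bool on) {
        EscapeCommFunction(hCom_, on ? SETDTR : CLRDTR);
    }
    // Outer ring on the RTS pin: high = dark pupil frame.
    void SetOuterRing(bool on) {
        EscapeCommFunction(hCom_, on ? SETRTS : CLRRTS);
    }
    void Close() { if (hCom_ != INVALID_HANDLE_VALUE) CloseHandle(hCom_); }
private:
    HANDLE hCom_ = INVALID_HANDLE_VALUE;
};

// Usage sketch: alternate the rings once per captured frame.
// IrRingSwitch sw; sw.Open("COM1");
// sw.SetInnerRing(frameIsEven); sw.SetOuterRing(!frameIsEven);
```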


Fig.3.5. IR LED control circuit.

Fig.3.6. Connection diagram of the image acquisition equipment and PC.


3.3 Hardware and software architectures


To synchronize the inner and outer LEDs with every frame of the alternating images, we developed a program that synchronizes the outer ring and the inner ring of LEDs with the even and odd frames, respectively, so that they are turned on and off alternately. We used the Microsoft Foundation Classes (MFC) library and the DirectShow SDK to develop our USB camera capture and IR switch control program. The building block of DirectShow is a software component called a filter. A filter performs some operation on a multimedia stream; for example, DirectShow filters can read files, get video from a video capture device, decode various stream formats such as MPEG-1 video, and pass data to the graphics or sound card. In DirectShow, an application performs a task by connecting chains of filters together. Our program uses three different configurations of DirectShow filter graphs: recording, live, and playback, as shown in Fig.3.7. The recording type is used to record video to a file. The live type is used to perform eye detection and tracking in real time. The playback type plays, frame by frame, a video file recorded by the recording type; it helps us develop our eye detection and tracking algorithms. In all three configurations, a special DirectShow filter, the Sample Grabber filter, is used. It is a transform filter that can grab media samples from a stream as they pass through the filter graph and notify us when a sample is retrieved. Therefore we can process each sample and switch the inner and outer ring LEDs.


[Fig.3.7 filter graphs: (a) Recording: capture source filter → sample grabber (IR switch control) → AVI mux → AVI file. (b) Live: capture source filter → sample grabber (eye detection/tracking and IR switch control) → AVI decompressor → video renderer. (c) Playback: file source filter → sample grabber (eye detection/tracking) → AVI decompressor → video renderer.]

Fig.3.7. Three different configuration types of DirectShow filters. (a) Recording. (b) Live. (c) Playback.


Chapter 4 Eye Detection


In this chapter, we describe the eye detection procedure shown in Fig.4.1. Each step is described in the following sections.

[Fig.4.1 pipeline: alternating images (C frame, L frame) → subtraction → difference image → adaptive thresholding → binary image → connected-component generation → blobs → geometric constraints → eye candidates → eye verification using SVM → eyes.]

Fig.4.1. Steps for eye detection.


4.1 Subtraction
The C frame is the currently obtained frame and the L frame is the frame stored at the previous time step. If the C frame is a bright pupil image, then the L frame is a dark pupil image, and vice versa. Since both images share the same background and external illumination, the pupils in the bright pupil image look significantly brighter than in the dark pupil image, as shown in Fig.4.2. To eliminate the background and reduce the external illumination, the C frame is subtracted from the L frame, producing the difference image shown in Fig.4.2(c), in which most of the background and external illumination effects are removed.
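For illustration, a minimal sketch of the frame subtraction, assuming 8-bit grayscale frames stored as raw byte buffers of equal size:

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of Section 4.1: subtract the two alternating frames so that
// only the pupils (bright in one frame, dark in the other) remain prominent.
// Frames are assumed to be 8-bit grayscale buffers of equal size.
std::vector<uint8_t> DifferenceImage(const std::vector<uint8_t>& minuend,
                                     const std::vector<uint8_t>& subtrahend) {
    std::vector<uint8_t> diff(minuend.size());
    for (size_t i = 0; i < minuend.size(); ++i) {
        int d = static_cast<int>(minuend[i]) - static_cast<int>(subtrahend[i]);
        diff[i] = static_cast<uint8_t>(d > 0 ? d : 0);  // clamp negatives to zero
    }
    return diff;
}
```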

Fig.4.2. An example of subtraction. (a) The C frame image (bright pupil image). (b) The L frame image (dark pupil image). (c) The difference image.

4.2 Adaptive thresholding


After the subtraction, we obtain a difference image. The difference image is a gray-level image and is thresholded to extract the pupils. The pupil regions in the difference image are brighter than the background; pixels with gray level less than the threshold value Tdf are labeled black; otherwise, they are labeled white. Thus the pupil pixels are white in the binary image. The adaptive thresholding algorithm is as follows.
Step 1. Select an initial threshold value T; usually it is set to 128.
Step 2. Threshold the image with the threshold value T. This produces two groups of pixels: G1, consisting of all pixels with intensity values >= T, and G2, consisting of pixels with values < T.
Step 3. Compute the average intensity values μ1 and μ2 of the pixels in groups G1 and G2.
Step 4. Compute a new threshold value: $$T = \tfrac{1}{2}(\mu_1 + \mu_2).$$
Step 5. Repeat Steps 2 through 4 until the difference of T in successive iterations is smaller than a predefined parameter T0.
Step 6. Finally, we obtain a threshold value T and adjust it to suit the proposed system: if T >= 40, Tdf is derived from T (kept within the range [40, 255]); otherwise Tdf = 40.
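For illustration, a minimal sketch of the iterative threshold selection above, assuming an 8-bit difference image; the initial value and convergence parameter follow the listed steps:

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Minimal sketch of the iterative threshold selection in Section 4.2:
// split pixels by the current threshold, average the two groups, and
// iterate until the threshold stabilizes. Assumes an 8-bit difference image.
int AdaptiveThreshold(const std::vector<uint8_t>& image,
                      int initialT = 128, int T0 = 1) {
    int T = initialT;
    for (;;) {
        long sumHigh = 0, sumLow = 0;
        long countHigh = 0, countLow = 0;
        for (uint8_t p : image) {
            if (p >= T) { sumHigh += p; ++countHigh; }
            else        { sumLow  += p; ++countLow;  }
        }
        // Mean of each group (fall back to the current T if a group is empty).
        double mu1 = countHigh ? static_cast<double>(sumHigh) / countHigh : T;
        double mu2 = countLow  ? static_cast<double>(sumLow)  / countLow  : T;
        int newT = static_cast<int>((mu1 + mu2) / 2.0);
        if (std::abs(newT - T) < T0) return newT;  // converged
        T = newT;
    }
}
```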

4.3 Connected-component generation


After adaptive thresholding, we obtain a binary image. The next task is to apply a connected-components algorithm, label each bright pixel, and find the center, size, and bounding box of each blob. Let I be a binary image and let F and B be the foreground and background pixel subsets of I, respectively. A connected component of I, here


referred to as C, is a subset of F of maximal size such that all the pixels in C are connected. Two pixels p and q are connected if there exists a path of pixels (p0, p1, ..., pn) such that p0 = p, pn = q, and for all 1 ≤ i ≤ n, p_{i-1} and p_i are neighbors. Here, the definition of a connected component relies on that of a pixel's neighborhood: if all paths between pixels in C are 4-connected, then C is a 4-connected component. The classical sequential algorithm for labeling connected components consists of two subsequent raster scans of I. In the first scan, a temporary label is assigned to each pixel of F based on the values of its neighbors already visited by the scan. For 4-connected components, the previously visited neighbors are the upper and left neighbor pixels. As a result of the scan, no temporary label is shared by pixels belonging to different components, but different labels may be associated with the same component. Therefore, after completion of the first scan, equivalent labels are sorted into equivalence classes and a unique class identifier is assigned to each class. Then a second scan is run over the image to replace each temporary label with the class identifier of its equivalence class. The classical sequential method is not efficient enough, so we implemented a simple and efficient 4-connected components labeling algorithm [21].
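For illustration, a minimal sketch of the classical two-pass 4-connected labeling described above, using a union-find structure to record label equivalences; the buffer layout and the non-zero foreground convention are assumptions:

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of classical two-pass 4-connected labeling (Section 4.3).
// 'binary' is a width*height buffer where non-zero pixels are foreground.
// Returns a label image; labels of the same component are unified via union-find.
std::vector<int> LabelComponents(const std::vector<uint8_t>& binary,
                                 int width, int height) {
    std::vector<int> labels(binary.size(), 0);
    std::vector<int> parent(1, 0);  // parent[0] is the background label

    // Union-find helpers for recording label equivalences.
    auto find = [&](int x) { while (parent[x] != x) x = parent[x] = parent[parent[x]]; return x; };
    auto unite = [&](int a, int b) { parent[find(a)] = find(b); };

    // First scan: assign temporary labels from the upper and left neighbors.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (!binary[y * width + x]) continue;
            int up   = (y > 0) ? labels[(y - 1) * width + x] : 0;
            int left = (x > 0) ? labels[y * width + x - 1]   : 0;
            if (!up && !left) {                       // new temporary label
                parent.push_back(static_cast<int>(parent.size()));
                labels[y * width + x] = static_cast<int>(parent.size()) - 1;
            } else if (up && left) {                  // both neighbors labeled
                labels[y * width + x] = up;
                if (up != left) unite(up, left);      // record the equivalence
            } else {
                labels[y * width + x] = up ? up : left;
            }
        }
    }
    // Second scan: replace each temporary label by its class representative.
    for (int& l : labels)
        if (l) l = find(l);
    return labels;
}
```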

4.4 Geometric constraints


After connected-component generation, we obtain many binary candidate blobs, and the pupils are somewhere among these candidates. However, it is usually not possible to isolate the eye blobs only by picking the right threshold value, since pupils are often small and not bright enough compared with other noise blobs. We can distinguish pupil blobs from other noise blobs by their geometric shapes. In our system, we defined several constraints to make the

distinction, as follows: (i) 1 < blob size < 60; (ii) 1 < width of the blob bounding box < 20; (iii) 1 < height of the blob bounding box < 20; and (iv) (bounding box size − blob size) < 10.
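A small sketch of this constraint check, with illustrative field names for the blob statistics assumed to come from the connected-component step:

```cpp
// Sketch of the geometric constraints in Section 4.4; the Blob fields
// (area and bounding-box width/height) are illustrative names assumed
// to be filled in from the connected-component statistics.
struct Blob {
    int area;        // number of labeled pixels
    int boxWidth;
    int boxHeight;
};

bool IsPupilCandidate(const Blob& b) {
    int boxArea = b.boxWidth * b.boxHeight;
    return b.area > 1 && b.area < 60 &&
           b.boxWidth  > 1 && b.boxWidth  < 20 &&
           b.boxHeight > 1 && b.boxHeight < 20 &&
           (boxArea - b.area) < 10;
}
```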

4.5 Eye verification using support vector machine (SVM)


With the proposed geometric constraints, several non-pupil blobs may remain because they are similar to the pupil blobs in shape and size, and we cannot distinguish the real pupils from them. So we need another feature to separate pupil and non-pupil blobs. Here we use support vector machine (SVM) classification [24] to verify whether each candidate blob is an eye or not.

4.5.1 Introduction of support vector machine


For the case of two-class pattern recognition, the task of predictive learning from examples can be formulated as follows [24]. Given a set of functions

$$\{ f_\alpha : \alpha \in \Lambda \}, \quad f_\alpha : \mathbb{R}^N \rightarrow \{-1, +1\} \qquad (4.1)$$

($\Lambda$ is an index set) and a set of $l$ examples

$$(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_l, y_l), \quad \mathbf{x}_i \in \mathbb{R}^N, \; y_i \in \{-1, +1\}, \qquad (4.2)$$

each one generated from an unknown probability distribution $P(\mathbf{x}, y)$, one wants to find a function $f^*$ which provides the smallest possible value for the risk

$$R(\alpha) = \int \left| f_\alpha(\mathbf{x}) - y \right| \, dP(\mathbf{x}, y). \qquad (4.3)$$

The SVM implementation seeks separating hyperplanes $D(\mathbf{x})$ defined as

$$D(\mathbf{x}) = (\mathbf{w} \cdot \mathbf{x}) + w_0 \qquad (4.4)$$

by mapping the input data $\mathbf{x}$ into a higher-dimensional space $\mathbf{z}$ using a nonlinear function $g$. Low weights $\mathbf{w}$ defining the class boundaries imply low VC-dimension and lead to high separability between the class patterns. An optimal hyperplane has maximal margin. The data points at the (maximum) margin are called the support vectors, since they alone define the optimal hyperplane. The reason for mapping the input into a higher-dimensional space is that this mapping leads to better class separability. The complexity of the SVM decision boundary, however, is independent of the dimensionality of the feature space $\mathbf{z}$, which can be very large (or even infinite). SVM optimization takes advantage of the fact that the evaluation of the inner products between the feature vectors in a high-dimensional feature space is done indirectly via the evaluation of the kernel $H$ between support vectors and vectors in the input space,

$$(\mathbf{z} \cdot \mathbf{z}') = H(\mathbf{x} \cdot \mathbf{x}'), \qquad (4.5)$$

where the vectors $\mathbf{z}$ and $\mathbf{z}'$ are the vectors $\mathbf{x}$ and $\mathbf{x}'$ mapped into the feature space. In the dual form, the SVM decision function has the form

$$D(\mathbf{x}) = \sum_{i=1}^{M} \alpha_i y_i H(\mathbf{x}_i, \mathbf{x}). \qquad (4.6)$$

The RBF kernels $H$ are given by

$$H(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\frac{\| \mathbf{x} - \mathbf{x}_i \|^2}{\sigma^2} \right), \qquad (4.7)$$

and the corresponding SVM hyperplanes are then defined as

$$f(\mathbf{x}) = \mathrm{sign}\!\left( \sum_{i=1}^{M} \alpha_i \exp\!\left( -\frac{\| \mathbf{x} - \mathbf{x}_i \|^2}{\sigma^2} \right) + b \right), \qquad (4.8)$$

and can be fully specified using dual quadratic optimization in terms of the number of kernels used, $M$, and their width. The polynomial kernels $H$ of degree $q$ are given by

$$H(\mathbf{x}', \mathbf{x}) = \left[ (\mathbf{x}' \cdot \mathbf{x}) + 1 \right]^q, \qquad (4.9)$$

and the corresponding SVM hyperplanes are defined as

$$f(\mathbf{x}) = \mathrm{sign}\!\left( \sum_{i=1}^{M} \alpha_i \left[ (\mathbf{x}_i \cdot \mathbf{x}) + 1 \right]^q + b \right). \qquad (4.10)$$
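For illustration, a minimal sketch of evaluating the RBF decision function (4.8) for one candidate patch, assuming the support vectors, coefficients, kernel width, and bias come from an already-trained SVM:

```cpp
#include <cmath>
#include <vector>

// Minimal sketch of evaluating the RBF decision function (4.8) for one
// candidate patch. The support vectors, coefficients alpha_i, width sigma,
// and bias b are assumed to come from an already-trained SVM.
int RbfSvmDecision(const std::vector<double>& patch,                 // candidate feature vector
                   const std::vector<std::vector<double>>& supportVectors,
                   const std::vector<double>& alpha,                 // one coefficient per support vector
                   double sigma, double b) {
    double sum = b;
    for (size_t i = 0; i < supportVectors.size(); ++i) {
        double dist2 = 0.0;                                          // squared Euclidean distance
        for (size_t j = 0; j < patch.size(); ++j) {
            double d = patch[j] - supportVectors[i][j];
            dist2 += d * d;
        }
        sum += alpha[i] * std::exp(-dist2 / (sigma * sigma));        // RBF kernel term
    }
    return sum >= 0.0 ? +1 : -1;                                     // +1: eye, -1: non-eye
}
```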

4.5.2 Training data


Training data are needed to obtain the optimal hyperplane. The image size we use is 30×20 pixels, and the image pixels are processed using histogram equalization and normalized to the [0, 1] range before training. The eye training images were divided into three sets: a positive bright pupil set, a positive dark pupil set, and a negative set. In the whole positive image set, we include eye images of different gazes, different degrees of opening, different subjects, and with/without glasses. The non-eye images were placed in the negative image set. Some examples of eye and non-eye images in the training sets are shown in Fig.4.3. The SVM can work under different illumination conditions because of the intensity normalization of the training images via histogram equalization.
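For illustration, a minimal sketch of this preprocessing, assuming each patch is an 8-bit buffer; the equalization here is a simple cumulative-distribution mapping:

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of the patch preprocessing in Section 4.5.2: histogram
// equalization of an 8-bit patch followed by normalization to [0, 1].
std::vector<double> EqualizeAndNormalize(const std::vector<uint8_t>& patch) {
    // Histogram and cumulative distribution of the 256 gray levels.
    std::vector<int> hist(256, 0);
    for (uint8_t p : patch) ++hist[p];
    std::vector<int> cdf(256, 0);
    int running = 0;
    for (int g = 0; g < 256; ++g) { running += hist[g]; cdf[g] = running; }

    // Map each pixel through the CDF and scale to [0, 1].
    std::vector<double> out(patch.size());
    double total = static_cast<double>(patch.size());
    for (size_t i = 0; i < patch.size(); ++i)
        out[i] = cdf[patch[i]] / total;      // equalized value in (0, 1]
    return out;
}
```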


Fig.4.3. Some examples of positive and negative sets. (a) Positive bright pupil set. (b) Positive dark pupil set. (c) Negative non-eye set.


Chapter 5 Eye Tracking


In this chapter, we present our tracking method. If eye detection succeeds in three consecutive frames, the procedure turns to the tracking phase. There are two strategies in the tracking phase. The first strategy is eye detection in a predicted region; if eye detection in the predicted region fails, the second strategy is used, in which we search for the darkest eyeball center in the predicted region. The detailed steps are shown in Fig.5.1 and described in the following sections.

[Fig.5.1 flow: the detection phase runs eye detection on the alternating images; after three consecutive successes, the prediction parameters are initialized and the tracking phase starts. In the tracking phase, eye detection is first attempted in the predicted region (normal update on success); on failure, the darkest eyeball center is searched in the predicted region; if both strategies fail, a tracking-fail update is performed, and after three consecutive failed frames the procedure returns to the detection phase.]

Fig.5.1. The flowchart of two-strategy eye tracking.

5.1 Prediction
The concept of prediction and tracking is illustrated in Fig.5.2, in which the position at time t+1 is predicted from the position and velocity of the pupil blob at times t, t-1, and t-2.

[Fig.5.2 marks the detected position (xt, yt) at time t and the predicted position and search area at time t+1.]

Fig.5.2. The concept of prediction and tracking.

5.1.1 Basic concept


The Kalman filter is the most popular method to estimate the position of moving objects. Here we use the Kalman filter concept to predict the new position of the pupil. In 1960, R. E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem, and the Kalman filter has since been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation [26]. The Kalman filter addresses the general problem of trying to estimate the state x ∈ R^n of a discrete-time controlled process with an observation z ∈ R^m. We can use the models and observations in two ways. First, multiple observations z1, z2, ... should permit an improved estimate of the underlying model x. Second, the estimate of x at time k may also provide a prediction for the state at time k+1, and thereby for the observation zk+1. Thus we observe zk, estimate xk, predict xk+1 and thereby zk+1, observe zk+1 taking advantage of the prediction, and then update our estimate of xk+1. This predictor-corrector cycle is shown in Fig.5.3.

Fig.5.3. The ongoing discrete Kalman filter cycle.

5.1.2 Normal update


We here use the Kalman filter concept to predict the new positions of the pupils. Let (xt, yt) represent the pupil center pixel at time t and (ut, vt) be its velocity at time t in the x and y directions, respectively. The predicted position of the pupil can be calculated by

$$\hat{x}_{t+1} = x_t + u_t \Delta t \qquad (5.1)$$

and

$$\hat{y}_{t+1} = y_t + v_t \Delta t, \qquad (5.2)$$

where Δt is the time interval between two frames.


Then we use the corrected position of the pupil to update its velocity:

$$u_t = (1 - \alpha)\, u_{t-1} + \alpha\, s_t, \qquad (5.3)$$
$$v_t = (1 - \alpha)\, v_{t-1} + \alpha\, z_t, \qquad (5.4)$$
$$s_t = (x_t - x_{t-1}) / \Delta t, \qquad (5.5)$$
$$z_t = (y_t - y_{t-1}) / \Delta t, \qquad (5.6)$$

where (st, zt) is the velocity computed from the corrected pupil positions at time t, and α is the weight used to correct (ut, vt). The complete diagram of the prediction operation is shown in Fig.5.4.
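For illustration, a minimal sketch of the prediction and velocity update of Eqs. (5.1)-(5.6); the weight value and frame interval shown are illustrative assumptions:

```cpp
// Minimal sketch of the prediction (5.1)-(5.2) and velocity update (5.3)-(5.6).
// The weight alpha and the frame interval dt are illustrative values.
struct PupilPredictor {
    double x = 0, y = 0;      // last corrected position
    double u = 0, v = 0;      // smoothed velocity
    double alpha = 0.5;       // weight for the velocity correction (assumed)
    double dt = 1.0 / 30.0;   // frame interval at 30 frames per second

    // Predict the position at the next time step, Eqs. (5.1)-(5.2).
    void Predict(double& xPred, double& yPred) const {
        xPred = x + u * dt;
        yPred = y + v * dt;
    }

    // Update with the position detected in the current frame, Eqs. (5.3)-(5.6).
    void Correct(double xDet, double yDet) {
        double s = (xDet - x) / dt;           // measured velocity in x
        double z = (yDet - y) / dt;           // measured velocity in y
        u = (1.0 - alpha) * u + alpha * s;
        v = (1.0 - alpha) * v + alpha * z;
        x = xDet;
        y = yDet;
    }
};
```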

Fig.5.4. A complete diagram of prediction.

5.1.3 Tracking-fail update


The predicted region is based on the predicted position; its width and height are invariant. Sometimes both of our strategies fail because the pupil is briefly covered by something. If both strategies fail in the tracking phase at time t, the equations

$$\hat{x}_{t+1} = x_{t-1} + 2\, u_{t-1} \Delta t \qquad (5.7)$$

and

$$\hat{y}_{t+1} = y_{t-1} + 2\, v_{t-1} \Delta t \qquad (5.8)$$

are used to update the predicted parameters.

If tracking succeeds in the next frame, we update the predicted parameters normally; otherwise the tracking-fail update is done again. If tracking fails in three consecutive frames, the procedure returns to the detection phase, and the predicted parameters are cleared and reinitialized.

5.2 Two-stage verification


After the prediction, we detect the eye in the predicted region. The eye detection method is the same as the global eye detection; the only difference is that the operator works in a small region, which avoids much unnecessary detection and saves much detection time. If the detection fails, the searching stage is launched: if the SVM was not well trained, or the pupils are not bright enough due to either face orientation or external illumination interference, the detection may fail, and the searching strategy is then used to locate the eyeball center. The steps of the search are described in Fig.5.5. In this strategy, we clip the predicted region from the current frame and make a binary image from the clipped region with a threshold value Tdk. The threshold value is computed by the adaptive thresholding algorithm. Pixels with gray level greater than Tdk are labeled black; otherwise, pixels are labeled white. Then the connected-components algorithm is employed to find the center and bounding box of each blob. We take the largest blob and use its center as the found eye center. We define some constraints to verify the found eye position. The constraints are: (i) 20 < blob size < 400; (ii) the distance between this position and the position in the last frame < 10.

[Fig.5.5 flow: predicted eye region image → adaptive thresholding → binary image → connected-component generation → blobs → search the center of the largest blob → verification; on success the eye center is returned, otherwise the search fails.]

Fig.5.5. The steps of the search strategy.


Chapter 6 Visual Cue Extraction


In this chapter, we explain the methods used to extract visual cues for drowsiness detection; the visual cues include the eye close/open state and the face orientation.

6.1 Eye close/open


After we obtain the positions of the eyes, we determine whether the eyes are opened or closed. First, a binary image is generated from the eye region by the adaptive thresholding algorithm: pixels whose gray levels are larger than Tew are labeled white and pixels whose gray levels are less than or equal to Tew are labeled black. The result can be expressed as

$$b(x, y) = \begin{cases} 1, & \text{if } g(x, y) > T_{ew}, \\ 0, & \text{if } g(x, y) \le T_{ew}, \end{cases} \qquad (6.1)$$

where g(x, y) is the gray level of pixel (x, y). We accumulate the white pixels in the eye region,

$$n = \sum_{x=1}^{M} \sum_{y=1}^{N} b(x, y), \qquad (6.2)$$

where M is the width of the eye region and N is its height. We define a threshold value Tcp. If n is greater than Tcp, we classify the eye as closed; if n is less than Tcp, we classify the eye as open, as in the examples shown in Fig.6.1.
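For illustration, a minimal sketch of this decision on an 8-bit eye-region patch; the threshold values are illustrative, not the thesis values:

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of the eye close/open cue, Eqs. (6.1)-(6.2): count the
// pixels brighter than Tew in the eye region and compare against Tcp.
// The threshold values used here are illustrative, not the thesis values.
bool IsEyeClosed(const std::vector<uint8_t>& eyeRegion,
                 int Tew, int Tcp) {
    int n = 0;                                  // number of white pixels
    for (uint8_t g : eyeRegion)
        if (g > Tew) ++n;
    return n > Tcp;                             // more bright pixels => eyelid covers the eye
}
```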


Fig.6.1. Eye open/close cue. (a) Image of opened eye. (b) Binary image of opened eye. (c) Image of closed eye. (d) Binary image of closed eye.

6.2 Face orientation


The face orientation is also an important visual cue for detecting drowsiness. When the driver does not pay attention to the forward direction for a long time, the system should give the driver a warning. In this study, we use the positions of the two eyes and the nose to estimate the face orientation. After extracting the two eyes, we utilize their positions to locate the position of the nose. First, we use the distance between the two eye-region centers to define a square searching area, as shown in Fig.6.2; the area is just below the eyes, with side length equal to the mentioned distance. Secondly, in this area we accumulate the gray levels in both the horizontal and vertical directions to form the horizontal and vertical accumulation curves. We search along the horizontal accumulation curve and take the first valley point as the y coordinate of the nostrils, and we search along the vertical accumulation curve and take the average of its two valley points as the x coordinate of the midpoint between the two nostrils.
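For illustration, a minimal sketch of the row-profile search for the nostrils, under the simplifying assumption that the first interior local minimum of the accumulation curve is taken as the valley:

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of the nose localization in Section 6.2: accumulate gray
// levels per row inside the square search area and take the first valley
// (local minimum) of the horizontal accumulation curve as the nostril row.
// The image layout and the valley definition are simplifying assumptions.
int FindNostrilRow(const std::vector<uint8_t>& image, int imageWidth,
                   int areaX, int areaY, int areaSide) {
    std::vector<long> rowSum(areaSide, 0);
    for (int r = 0; r < areaSide; ++r)
        for (int c = 0; c < areaSide; ++c)
            rowSum[r] += image[(areaY + r) * imageWidth + (areaX + c)];

    // First interior local minimum of the accumulation curve.
    for (int r = 1; r + 1 < areaSide; ++r)
        if (rowSum[r] < rowSum[r - 1] && rowSum[r] <= rowSum[r + 1])
            return areaY + r;                 // y coordinate of the nostrils
    return -1;                                // no valley found
}
```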

Fig.6.2. An example of locating the nose position.


Chapter 7 Experiments
Several experiments and comparisons are reported in this chapter. First, we introduce our development platform; secondly, we demonstrate several detection and tracking results.

7.1 Experimental platform


All algorithms were implemented in the C++ programming language with the Microsoft Foundation Classes (MFC) library and the DirectShow SDK. All experiments were executed on a general PC with an AMD Athlon 2500+ CPU running the Microsoft Windows XP Professional operating system.

7.2 Experimental results


All experimental image sequences were recorded by our active image acquisition equipment. The frame size is 320×240 pixels at a rate of 30 frames/sec.

7.2.1 Eye detection


Three light conditions (strong light, normal light, and dark) were considered to evaluate the performance of the eye detector. Six results are shown in Fig.7.1; the detector works correctly under the different ambient light conditions.



Fig.7.1. Results of eye detection in different light situations. (a) Strong light. (b) Daytime. (c) Indoor light. (d) Indoor light. (e) Indoor light. (f) In the Dark.


7.2.2 Eye tracking


Three videos were used to evaluate the eye tracker, as shown in Figs.7.2-7.4. When the tester turned his head, our tracker continuously tracked his eye positions.

7.2.3 Discussions
In all the videos, the proposed detector detected two regions, each containing an eye; but sometimes the detected center of the eye region is not the actual eye, as in the example shown in Fig.7.5, where the two eye regions are marked with blue rectangles but the detected positions of the two eyes (the centers of the eye regions) are not correct. We checked all detected frames to count the detection rate, as listed in Table 7.1. In the table, B is the number of false detections in bright pupil frames, and D is the number of false detections in dark pupil frames.

Table 7.1. The Detection Rate of Three Different Testing Videos

Video  Eye    Total frames  Correct  Error B  Error D  Total error  Detection rate
V1     left   157           124      12       21       33           78.9%
V1     right  157           127      9        21       30           80.8%
V2     left   130           110      0        20       20           84.6%
V2     right  130           121      2        7        9            93.0%
V3     left   379           369      4        6        10           97.3%
V3     right  379           374      2        3        5            98.6%



Fig.7.2. The eye tracking results in a sequence of consecutive frames. The face is looking downward.



Fig.7.3. The eye tracking results in a sequence of consecutive frames. The face is looking to left.



Fig.7.4. The eye tracking results in a sequence of consecutive frames. The face is looking to right.


Fig.7.5. The falsely detected eye region.


Chapter 8 Conclusions and Future Work


In this chapter, we give conclusions and suggestions for future work.

8.1 Conclusions
In this study, we developed a prototype computer vision system with active image acquisition equipment for driver drowsiness detection. It detects and tracks the driver's eyes in the alternating images obtained by our active image acquisition equipment, so that visual cues can be extracted from the detected eyes. The system can detect eye positions under different ambient light conditions.

8.2 Future work


Several problems should be improved in the future. First, the SVM eye verification is time-consuming. Second, if the driver's head moves too fast, the tracker cannot track accurately; we will improve the tracking method. Third, the extracted visual cues are not enough to determine the driver's status; we can add more visual cues, such as gaze direction and facial expression.


References
[1] Bakic, V. and G. Stockman, "Real-time tracking of face features and gaze direction determination," in Proc. IEEE Workshop Applications of Computer Vision, Princeton, NJ, Oct. 19-21, 1998, pp. 256-257.
[2] Cortes, C. and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[3] D'Orazio, T., M. Leo, G. Cicirelli, and A. Distante, "An algorithm for real time eye detection in face images," in Proc. 17th Int. Conf. Pattern Recognition, Cambridge, UK, Aug. 23-26, 2004, vol. 3, pp. 278-281.
[4] Grace, R., V. E. Byrne, D. M. Bierman, J.-M. Legrand, D. Gricourt, B. K. Davis, J. J. Staszewski, and B. Carnahan, "A drowsy driver detection system for heavy vehicles," in Proc. Conf. Digital Avionics Systems, Oct. 31-Nov. 7, 1998, vol. 2, pp. I36/1-I36/8.
[5] Hamada, T., T. Ito, K. Adachi, T. Nakano, and S. Yamamoto, "Detecting method for drivers' drowsiness applicable to individual features," in Proc. IEEE Int'l Conf. Intelligent Transportation Systems, Oct. 12-15, 2003, vol. 2, pp. 1405-1410.
[6] Haro, A., M. Flickner, and I. Essa, "Detecting and tracking eyes by using their physiological properties, dynamics, and appearance," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Hilton Head Island, SC, Jun. 13-15, 2000, vol. 1, pp. 163-168.
[7] Hayami, T., K. Matsunaga, K. Shidoji, and Y. Matsuki, "Detecting drowsiness while driving by measuring eye movement - a pilot study," in Proc. IEEE Int'l Conf. Intelligent Transportation Systems, 2002, pp. 156-161.
[8] Horng, W. B., C. Y. Chen, Y. Chang, and C. H. Fan, "Driver fatigue detection based on eye tracking and dynamic template matching," in Proc. Int. Conf. Networking, Sensing and Control, Taiwan, Mar. 21-23, 2004, pp. 7-12.
[9] Huang, W. and M. Robert, "Face detection and precise eyes location," in Proc. 15th Int. Conf. Pattern Recognition, Barcelona, Spain, Sep. 3-7, 2000, vol. 4, pp. 722-727.
[10] Hutchinson, T. E., Eye Movement Detection with Improved Calibration and Speed, U.S. Patent 4950069, Apr. 1990.
[11] Ito, T., S. Mita, K. Kozuka, T. Nakano, and S. Yamamoto, "Driver blink measurement by the motion picture processing and its application to drowsiness detection," in Proc. IEEE Conf. Intelligent Transportation Systems, Singapore, Sep. 3-6, 2002, pp. 168-173.
[12] Ji, Q. and Y. Xiaojie, "Real-time eye, gaze, and face pose tracking for monitoring driver vigilance," Real-Time Imaging, vol. 8, no. 5, pp. 357-377, Oct. 2002.
[13] Moreno, F., F. Aparicio, W. Hernandez, and J. Paez, "A low-cost real-time FPGA solution for driver drowsiness detection," in Proc. IEEE Conf. Industrial Electronics Society, Nov. 2-6, 2003, vol. 2, pp. 1396-1401.
[14] Morimoto, C. H., D. Koons, A. Amir, and M. Flickner, "Pupil detection and tracking using multiple light sources," Image and Vision Computing, vol. 8, pp. 331-335, 2000.
[15] Parikh, P. and E. Micheli-Tzanakou, "Detecting drowsiness while driving using wavelet transform," in Proc. IEEE Conf. Annual Northeast Bioengineering, Apr. 17-18, 2004, pp. 79-80.
[16] Popieul, J. C., P. Simon, and P. Loslever, "Using driver's head movements evolution as a drowsiness indicator," in Proc. IEEE Intelligent Vehicles Sym., Jun. 9-11, 2003, pp. 616-621.
[17] Shih, S. W., Y. T. Wu, and J. Liu, "A calibration-free gaze tracking technique," in Proc. 15th Int. Conf. Pattern Recognition, Barcelona, Spain, Sep. 3-7, 2000, vol. 4, pp. 201-204.
[18] Smith, P., M. Shah, and N. da Vitoria Lobo, "Monitoring head/eye motion for driver alertness with one camera," in Proc. Int'l Conf. Pattern Recognition, Barcelona, Spain, Sep. 3-8, 2000, vol. 4, pp. 636-642.
[19] Smith, P., M. Shah, and N. da Vitoria Lobo, "Determining driver visual attention with one camera," IEEE Trans. Intelligent Transportation Systems, vol. 4, pp. 205-218, Dec. 2003.
[20] Sobottka, K. and I. Pitas, "Extraction of facial regions and features using color and shape information," in Proc. Int. Conf. Pattern Recognition, Vienna, Austria, Aug. 25-29, 1996, pp. 421-425.
[21] Stefano, L. D. and A. Bulgarelli, "A simple and efficient connected components labeling algorithm," in Proc. Int. Conf. Image Analysis and Processing, 1999, pp. 322-327.
[22] Terrillon, J. C., M. N. Shirazi, M. Sadek, H. Fukamachi, and S. Akamatsu, "Invariant face detection with support vector machines," in Proc. 15th Int. Conf. Pattern Recognition, Barcelona, Spain, Sep. 3-7, 2000, vol. 4, pp. 210-217.
[23] Ueno, H., M. Kaneda, and M. Tsukino, "Development of drowsiness detection system," in Proc. Conf. Vehicle Navigation and Information Systems, Yokohama, Japan, Aug. 31-Sep. 2, 1994, pp. 15-20.
[24] Vapnik, V., The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[25] Vuckovic, A., D. Popovic, and V. Radivojevic, "Artificial neural network for detecting drowsiness from EEG recordings," in Proc. Int'l Conf. Neural Network Applications, Yugoslavia, Sep. 26-28, 2002, pp. 155-158.
[26] Welch, G. and G. Bishop, An Introduction to the Kalman Filter, 1997.
[27] Yuille, A. L., D. S. Cohen, and P. W. Hallinan, "Feature extraction from faces using deformable templates," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, San Diego, CA, Jun. 4-8, 1989, pp. 104-109.
[28] Zhu, Z., K. Fujimura, and Q. Ji, "Real-time eye detection and tracking under various light conditions," in Proc. Eye Tracking Research & Applications Sym., New Orleans, Louisiana, 2002, pp. 139-144.

