
Xavier University - Ateneo de Cagayan
College of Engineering
Department of Electronics Engineering

Research Method ACE 08 D

Date Submitted: September 1, 2011

Submitted by: Adrian Y. Zayas

Remarks: Instructor: Engr. Mary Jean O. Apor Signature:

I. Research Title: Converting Filipino Sign Language into Its Equivalent in Sound

II. Name of Proponent: Adrian Y. Zayas

III. Background of the Study: Since their existence, humans have been physically and mentally challenged by various impediments that hinder their advancement on all economic, political, and social levels. Statistics around the world show a relatively high rate of people with speaking difficulties: rates average about 9% of the whole population, whether in the Middle East, Europe, Africa, the Americas, or other continents. Considering individual age groups, the speaking-impairment rate increases with age: the elderly (65 years and above) show a rate of 25%, and people aged between 55 and 64 have a rate of 15%. [1]

It would be unjust for such people to be excluded from society simply because they lack elementary means of communication. They are mentally capable individuals who deserve, and are even expected, to play an effective role in society. Society cannot dismiss whatever potential they have, simply because it needs that potential; it is known that society advances only with the collective effort of all of its members. To this end, technology is utilized to aid humans in their collective effort to overcome such impediments.

In this proposal, a system is presented that enables speaking-impaired Filipinos to connect further with their society and helps them overcome the communication obstacles created by society's inability to understand sign language. The proposed system is based on translating motion into sound: it involves human gesture and sign language recognition via a computer processor that assimilates these gestures and signs and produces their equivalent in sound. The objective of the project is to provide a practical way of translating Filipino sign language into speech, offering people with vocal disabilities a means of communication with people who cannot understand sign language.

I first survey existing literature on systems dealing with the recognition of sign language and then propose a design that can achieve the desired objective. The hardware components of the system comprise a video camera, a relatively fast processor, and output speakers. The software components comprise different algorithms to detect and track the face and hand. Other algorithms are needed to capture frames from the camera, subtract their background, normalize them, and then compare them to a set of stored images in a reference database to reach a decision about the meaning of the gesture made. The C++ OpenCV library functions are used to write the corresponding image processing algorithms.

IV. Review of Related Literature: In order to have an idea about the best approach to follow in building the system, it was important to review already developed systems related to this study. To this end, I researched and studied a set of seven papers that were most relevant to the system. I then analyzed and compared the methods of implementation and development applied in each paper, and afterwards came up with an approach most suitable for developing the system that converts Filipino sign language into its equivalent in sound.

Machine Gesture and Sign Language Recognition
Machine gesture and sign language recognition is, as the name suggests, the recognition of gestures and sign language using computers. The hardware technique used for gathering information about body positioning described in this paper [4] is image-based, using a digital camera as the interface. However, getting the data is only the first step. The second step, recognizing the sign or gesture once it has been captured, is much more challenging, especially in a continuous stream. To that end, the paper describes an effective system developed by Tony Heap for tracking hand position using a video camera. While not directly involving gesture recognition, it is an important preprocessing step for video-based gesture recognition. After undergoing a normalization stage, this tracked hand motion is then compared automatically with thousands of images present in a database in the system's memory unit. With the help of this material, I was able to form a conceptual framework of the system, and I applied a similar approach to build a system that performs a specific function: converting Filipino sign language into its equivalent in sound.

Online, Interactive Learning of Gestures for Human/Robot Interfaces
This paper [5] describes a gesture recognition system that can interactively recognize gestures and perform online learning of new gestures. Based on Hidden Markov Models (HMMs), i.e. on a doubly stochastic process, it is able to update its model of a gesture iteratively with each example it recognizes. The system has demonstrated reliable recognition of 14 different gestures after only one or two examples of each. It is currently interfaced to a cyber glove used to recognize sign language gestures and is being implemented as part of an interactive interface for robot tele-operation and programming by example. The approach used to develop this system is based on automated generation and interactive training of a set of HMMs that represent human gestures. Each gesture represented by an HMM contains a list of example observation sequences and an optional action to be performed upon recognition of the gesture. The image recognition approach of this system is similar to the one mentioned above, except that it uses a cyber glove to recognize sign language.

Recognition of the Gesture Components of the Sign Language
A sensor-based approach [6] that can be employed in sign language description and recognition methods is based on recognition of the gesture components of the sign language context. The focus of other approaches was on the recognition of finger spellings and the simple linear motions created by the signer. However, such simplicity does not provide an accurate, meaningful representation of the signer. The actual gestures of the sign language are composed of complex motions accompanied by a sequence of continuous sign language words in a sentence (known as cheremes). Therefore, a recognition method that is able to recognize complex gestures in a continuous pattern is quite effective. In this (gesture-based) method, the sign language is represented as a combination of gestures or intuitions of the hand (motion). The hand motion is studied in terms of the position and direction of the entire hand, as well as the bending movements of the fingers.
This method uses a sensing glove as a way to input the sign language gestures; however, sensing gloves cause motion impracticality for the signer, and gesture interpretation is limited only to the motion of the hands, which means neither face nor body expressions take part in the interpretation.

Video-Based Approach
In contrast to the above, another method of collecting data for the recognition process is based on video data collection. This video-based approach [7] leads to a more efficient and successful interface since, unlike a glove-based method, the signer is not required to be directly connected to the computer. In some of the video-based approaches, three video cameras were used in addition to an electromagnetic tracking system to acquire the full 3D movements of the signer's hands. These movements (known as epenthesis), along with their directions, were tracked and recorded using the cameras for further synthesis. However, the sentence structure in this case was unconstrained and the number of signs within a sentence was variable. Another video-based approach, presented in 1998, employed two video cameras for real-time recognition: the first was used for observing the frontal view of the signer, while the second was responsible for image recording. The image recording process was based on the Hidden Markov Models (HMM) algorithm to distinguish the changes in the x and y positions (area and angle changes) of the moving hand/arm. The recognition accuracy ranged between 70% and 95%, depending on the camera position.

Real-Time American Sign Language Recognition from Video Using Hidden Markov Models
Similar to a few of the above approaches, this approach [8] describes an extensible system that uses a single color camera to track hands in real time and interprets American Sign Language (ASL) using HMMs. Instead of a fine-grain description of hand shape, the hand tracking process of the system produces a coarse description of the hand shape, orientation, and trajectory. While a skin-color hand tracker has been demonstrated, the paper requires that the user wear solid-colored gloves to facilitate the hand tracking frame rate and stability. The shape, orientation, and trajectory information is then input to an HMM for recognition of the signed words. Through the use of HMMs, low error rates were achieved on both the training set and an independent test set without invoking complex models of the hands. The paper expects lower error rates with a larger training set and context modeling, leading to a freer, person-independent ASL recognition system.

OGRE: Open Gesture Recognition Engine
This article [9] discusses a gesture recognition engine based on Computer Vision (CV). The system initially removes the background of captured images, eliminating irrelevant pixel information. The human hand is then detected and segmented, and its contours are localized. Significant metrics are derived from these contours, allowing a search in a pre-defined library of hand poses, where each pose has previously been converted into a set of metric values. This article discusses the essentials of gesture recognition that will be applied in the proposed system.

A System for Sign Language Recognition Using Fuzzy Object Similarity Tracking
An Arabic sign language recognition system is presented in this article [10]. In addition to the techniques mentioned above, fuzzy object similarity tracking methods were used by the researchers to segment hands in different frames. Skin color detection methods were utilized to locate the face of the signer, and the centroid of the signer's face was the parameter used to centralize the signer's body position in all the frames. Simple features such as the hand area, centroid, eccentricity of the bounding ellipse, and angle of the axis of least inertia were employed to form a 10-element feature vector. These vectors were used for parameter estimation, training, and testing of five HMMs with the EM (Expectation Maximization) algorithm. A recognition rate of 98% on testing data was achieved over a dataset of 50 words.
Although these papers aided the attempt to build the proposed system, none of them presents a system that converts Filipino sign language into its equivalent in sound, and some of the papers lacked an implementation of their system, presenting only a theoretical approach to building it. A set of other papers that discussed an image processing approach lacked the necessary algorithms needed to capture, track, and recognize the video frames. Furthermore, some papers provided experimental results but did not provide all the information needed to describe the testing environment and the methodology used. In order to decide on the system design, the main design alternatives are compared to come up with an optimal design that best suits the objective. The objective is to build a system that can detect, track, and recognize face and body gestures for the sole purpose of outputting their equivalent in sound or simply in text. In addition, the actual implementation of the system should take practicality and ease of use into account.

Design Alternatives
a. Sensor-Based Approach: One technology for detecting the motion of, or the gestures made by, the hands is based on using sensory detectors. In most of the sensor-based approaches, a cyber glove is used as a way of interfacing with the processor. The tiny sensors integrated in each glove detect the position and can in some cases measure the velocity of the corresponding segments of the hand. During processing, each sensor translates the coordinates of its point into a certain eigenspace relative to a certain reference. The input data from the sensors is manipulated by an algorithm that synthesizes it into either recognized or unrecognized gestures.

b. Image Processing Approach: This second approach to gesture recognition is based on image processing, where frames are captured using a video camera. After data is acquired, the frames are analyzed in order to detect the hands and face. The use of skin color facilitates the differentiation of the hand and face from the background colors using a specific color space. In some of the approaches, distinctly colored gloves are used in order to have faster detection of the hands. The motion of the hands or the face is tracked using parameters such as the centroid, area, and vertical and horizontal axes of the hands and face. Recognition of the gestures is sometimes done using Hidden Markov Models that use a set of stored images taken as reference.

c. Sensors and Image Processing Approach: A third alternative is one that uses both sensors and image processing, each in a specific area. The sensors would be used to detect motion, while an image recognition method would be employed to detect still images. Since the sensors detect the exact angle at which the different body parts are positioned, a large number of sensors needs to be spread out over the entire body, a fact that makes a system based on such an approach less portable. In addition, a system performing image recognition based on both image processing and sensors is more complex and costs more than either approach implemented alone, because it combines two technologies based on different principles that require distinct hardware and software components in order to function successfully.

After investigating the existing related designs and studying the available alternatives given several constraints, I decided that a design based on image processing is better suited to achieve the sought-after objective. The image processing approach has two main advantages over the sensor-based approach. The first is that a sensor-based approach requires the signer to be continuously connected to sensing devices, a condition that contradicts the objective of having a practical system that can be used in daily life without much complication. An image processing approach, on the other hand, does not require the signer to wear any kind of external device such as gloves. The second advantage lies in the fact that an image processing approach allows the recognition of a wider variety of movements and gestures than those 'sensed' by a sensor-based approach. In a video-based approach, body movements and face gestures can be detected more easily using the same video frames used to detect hand signs.
This is in contrast to the sensor-based approach, which, first, cannot detect face gestures at all and, second, requires even more sensors spread over the whole body in order to track body movements, which further undermines practicality. Thus, these two advantages render the image processing approach the more effective and efficient one.

With no additional devices attached to the signer and with a wider variety of gestures and movements detected, the image processing approach presents itself as the best design to achieve the objective. To build this design, the hardware components needed are a digital video camera to capture sequential frames, a relatively fast processor that can assimilate frames and data in real time, and a sound output module. The software algorithms needed are an algorithm to detect and track frames, an algorithm to recognize frames, and an algorithm to read data after comparison and output it as sound; these algorithms can be written in either C++ or MATLAB. [4] A rough sketch of how these software modules could be organized is given below.
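To make the module breakdown concrete, the following is a minimal C++/OpenCV sketch of the software interfaces. The function names and signatures are illustrative assumptions, not taken from any of the cited papers.

```cpp
// Hypothetical module interfaces for the proposed system (illustrative sketch only).
#include <opencv2/opencv.hpp>

// Detect the hand in the incoming frame and keep tracking it (e.g. with CAMSHIFT).
cv::Rect detectAndTrackHand(const cv::Mat& frame);

// Remove the background so that only the foreground gesture pixels remain.
cv::Mat subtractBackground(const cv::Mat& frame);

// Normalize the binary gesture image (spatial centering, fixed size) before comparison.
cv::Mat normalizeGesture(const cv::Mat& binaryGesture);

// Compare the normalized gesture with the reference database;
// return the ID of the matching sign, or -1 if no match is found.
int recognizeGesture(const cv::Mat& normalizedGesture);

// Look up the text meaning of the matched sign ID and output it as sound.
void outputSound(int signId);
```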

V. Conceptual/Theoretical Framework of the Study: To build the system, a divide-and-conquer approach is followed in which the entire system is partitioned into several modules, making it possible to focus on small tasks simultaneously and complete them in parallel. The next step is to assemble the modules and test their functionality together. These modules are shown below in the block diagram of the overall system.

Figure 1. Block diagram of the entire system

As stated earlier, the project is based on image recognition. To that end, the user interface is a digital camera that provides portability to the system in addition to being an excellent medium for capturing continuous images in real time and facilitating the data acquisition process.

The user's hand and face are first detected and tracked. After this, the system starts capturing frames from the camera. The system then performs background subtraction and passes the modified images on to be normalized. The images are then sent to the comparator, where they are compared with reference images in the database. After the signs are recognized by an algorithm, a text reader reads the corresponding text and outputs it to the speakers.

Hand Detection and Tracking
Since the application aims at recognizing hand gestures, it is highly important to continuously keep track of the hand after successfully detecting it. This is achieved by the use of the CAMSHIFT (Continuously Adaptive Mean-Shift) algorithm described in the OpenCV library [7].

Data Acquisition (Frame Capturing)
The DirectShow program will be used for data acquisition [11]. DirectShow is particularly useful for processing image sequences or sequences captured using PC cameras. The DirectShow architecture relies on a filter architecture: the processing of a sequence is done using a series of filters connected together, the output of one filter becoming the input of the next one. The first filter is usually a decompressor that reads a file stream, and the last filter could be a renderer that displays the sequence in a window.

Background Subtraction
Following the process of data acquisition and object detection, captured frames can be processed as still images stored as data structures in memory. The background subtraction method depends on the variation in brightness between the rear pixels that form the background of the image and the front pixels that form the gesture to be recognized. The image's brightness and contrast are initially modified in order to further increase the variation in pixel brightness. The image is then passed through a threshold filter that removes pixels with brightness below a certain threshold, and the image is converted into a binary image composed of the bright foreground pixels only. After subtracting the background, which might otherwise act as noise or interference, the image is ready to be normalized.

Normalization Technique
Normalization is the process of setting the image to a certain spatial center or a certain normalized moment. Each image should be normalized before comparison with data from the reference database. A spatial difference corresponds to a difference in pixel filling and shape; this would give a false negative recognition decision when the image is compared to a similar image with a different spatial orientation or grey-level intensity [11].

Gesture Recognition
In gesture recognition, the captured normalized image is first compared to the first image in the database. If the comparator yields a mismatch, the comparison continues with the next image in the database, and so on until the image is found. If the comparator yields a mismatch against all the images in the database, the captured image is not in the database. The user interface provided in the system allows the user to save the image of the new sign to the database as well as the corresponding text equivalent that can be read later by the text reader.

Text-To-Sound Converter
In case of a match in gesture recognition, the meaning of the matched image is sent through a text-to-sound converter to generate the corresponding sound.
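As a minimal sketch of how these modules could be wired together, the following capture loop reuses the hypothetical function names sketched in the design section; it illustrates the intended flow in Figure 1 and is not a complete implementation.

```cpp
// Sketch of the overall flow in Figure 1 (hypothetical helper functions assumed).
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture camera(0);                 // digital camera as the user interface
    if (!camera.isOpened()) return 1;

    cv::Mat frame;
    while (camera.read(frame)) {
        cv::Rect hand = detectAndTrackHand(frame);     // CAMSHIFT-based tracking
        cv::Mat gesture = subtractBackground(frame(hand));
        cv::Mat normalized = normalizeGesture(gesture);
        int signId = recognizeGesture(normalized);     // compare with reference database
        if (signId >= 0)
            outputSound(signId);                       // text-to-sound output
        if (cv::waitKey(30) == 27) break;              // press ESC to stop
    }
    return 0;
}
```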

VI. Statement of the Problem: How can we help speaking-impaired persons, specifically Filipinos, communicate with a community that does not understand Filipino sign language?

VII. Assumptions:

VIII. Significance of the Study. The translation of sign language into speech opens a new communication channel between people incapable of understanding sign language and people incapable of audible articulation. This paper proposes an effective and efficient solution for such communication. The existence of a system such as this will not only enable speaking-impaired individuals to be more involved in society, allowing them to live up to their true potential and be more productive; it will also allow society to overcome the impediments created by miscommunication, or the lack of communication, and hence be able, as a whole, to make use of every existing human resource and advance on all economic, political, and social levels. This project can have a fundamental impact on society and, if further developed, can open the way to potentially breakthrough technological applications.

IX. Scope and Limitation. The scope of the study is the design of a system that will convert Filipino sign language into its equivalent in sound. It also includes a survey of related literature, which serves as the basis of the design and as a source of existing designs to be revised for the specific purpose of this study, namely the conversion of Filipino sign language into sound. The study is limited to the design of the system and the revision of algorithms used in other existing projects to make them suitable for converting Filipino sign language.

X. Definition of Terms.

OpenCV (Open Source Computer Vision Library). It is a library of programming functions mainly aimed at real-time computer vision, developed by Intel and now supported by Willow Garage. It is free for use under the open-source BSD license. The library is cross-platform and focuses mainly on real-time image processing. If the library finds Intel's Integrated Performance Primitives on the system, it will use these proprietary optimized routines to accelerate itself.

Hidden Markov model (HMM). It is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network. Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, and gesture recognition, part-of-speech tagging, musical score following, partial discharges, and bioinformatics.
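For reference, the probability that an HMM with parameters λ = (A, B, π) assigns to an observation sequence O = (o_1, ..., o_T) along a hidden state sequence Q = (q_1, ..., q_T) can be written in standard textbook notation as:

```latex
P(O, Q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
```

where a_{ij} are the state-transition probabilities, b_j(o) the observation probabilities, and π the initial state distribution. In gesture recognition, the gesture model λ with the highest likelihood P(O | λ), obtained by summing over all state sequences, is chosen as the recognized gesture.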

Normalization. It is the process of isolating statistical error in repeated measured data. A normalization is sometimes based on a property; quantile normalization, for instance, is normalization based on the magnitude (quantile) of the measures.

Data acquisition. It is the process of sampling signals that measure real-world physical conditions and converting the resulting samples into digital numeric values that can be manipulated by a computer. Data acquisition systems (abbreviated DAS or DAQ) typically convert analog waveforms into digital values for processing. The components of data acquisition systems include sensors that convert physical parameters to electrical signals; signal conditioning circuitry that converts sensor signals into a form that can be converted to digital values; and analog-to-digital converters, which convert the conditioned sensor signals to digital values.

Background subtraction. It is a commonly used class of techniques for segmenting out objects of interest in a scene for applications such as surveillance. It involves comparing an observed image with an estimate of the image as it would appear if it contained no objects of interest. The areas of the image plane where there is a significant difference between the observed and estimated images indicate the location of the objects of interest. The name "background subtraction" comes from the simple technique of subtracting the observed image from the estimated image and thresholding the result to generate the objects of interest.

Chereme. It is a basic unit of signed communication, functionally and psychologically equivalent to the phonemes of oral languages; it has largely been replaced by that term in the academic literature. Cherology is the study of cheremes.

Epenthesis. In phonology, it is the addition of one or more sounds to a word, especially to the interior of a word. Epenthesis may be divided into two types: excrescence, for the addition of a consonant, and anaptyxis, for the addition of a vowel.

Expectation-maximization (EM) algorithm. It is a method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models where the model depends on unobserved latent variables. EM is an iterative method that alternates between an expectation (E) step, which computes the expectation of the log-likelihood evaluated using the current estimate of the parameters, and a maximization (M) step, which computes the parameters maximizing the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.

Cyberglove. It is a glove-like input device for human-computer interaction, often in virtual reality environments. Various sensor technologies are used to capture physical data such as the bending of fingers. Often a motion tracker, such as a magnetic or inertial tracking device, is attached to capture the global position/rotation data of the glove. These movements are then interpreted by the software that accompanies the glove, so any one movement can mean any number of things. Gestures can then be categorized into useful information, such as recognizing sign language or other symbolic functions.

Eigenspace. It is the vector space spanned by the eigenvectors associated with a given eigenvalue; its dimension is the number of linearly independent eigenvectors.

ODBC (Open Database Connectivity). It is a standard software interface for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems. Thus, any application can use ODBC to query data from a database, regardless of the platform it is on or the DBMS it uses. ODBC accomplishes platform and language independence by using an ODBC driver as a translation layer between the application and the DBMS. The application thus only needs to know ODBC syntax, and the driver can then pass the query to the DBMS in its native format, returning the data in a format the application can understand.

SQL (Structured Query Language). It is a programming language designed for managing data in relational database management systems (RDBMS). Originally based upon relational algebra and tuple relational calculus, its scope includes data insert, query, update and delete, schema creation and modification, and data access control. SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks". Despite not adhering to the relational model as described by Codd, it became the most widely used database language.

XI. Methodology. This project is based on image recognition, and the user interface is a digital camera that provides portability to the system in addition to being an excellent medium for capturing continuous images in real time and facilitating the data acquisition process. The user's hand and face are first detected and tracked. After this, the system starts capturing frames from the camera. The system then performs background subtraction and passes the modified images on to be normalized. The images are then sent to the comparator, where they are compared with reference images in the database. After the signs are recognized by an algorithm, a text reader reads the corresponding text and outputs it to the speakers.

Hand Detection and Tracking
Since the application aims at recognizing hand gestures, it is highly important to continuously keep track of the hand after successfully detecting it. This is achieved by the use of the CAMSHIFT (Continuously Adaptive Mean-Shift) algorithm described in the OpenCV library [7]. The algorithm is as follows. After acquiring frames from the camera, each frame (image) is converted to a color probability distribution image using a color histogram model. The color distribution of the object to be tracked is continuously distinguished and hence tracked. Originally, the algorithm requires us to determine the color of the object we plan to track and its initial position. This can be done either by manually assigning the position of the hand (object) using the mouse and drawing a small window around it, or by implementing a motion-detection algorithm that determines the position, and hence color, of the hand (object) we intend to track and automatically draws a window around it. Using the latter method, the user is required to initially move his hand in a simple, smooth motion in order for the program to detect the position and color of the hand. After determining the position of the object, its color, center, and size are found based on the color probability and color distribution of the image. The current size and location of the tracked hand are reported and used as an initial guess for determining the new location of the search window in the next frames. The process is repeated for each frame and, as a result, the program keeps tracking the object by finding its new position in each frame. [4]
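A compact sketch of this tracking loop using the OpenCV C++ interface is given below. The initial hand window and the histogram parameters are illustrative assumptions; in the actual system they would come from the motion-detection step described above.

```cpp
// Sketch of CAMSHIFT hand tracking with OpenCV (initial window assumed known).
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);
    cv::Mat frame, hsv, mask, backproj;
    cv::Rect trackWindow(200, 150, 80, 80);      // assumed initial hand position

    // Build a hue histogram of the hand region from the first frame.
    cap >> frame;
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 60, 32), cv::Scalar(180, 255, 255), mask);
    int histSize = 16;
    float hueRange[] = {0, 180};
    const float* ranges[] = {hueRange};
    int channels[] = {0};
    cv::Mat hist, roiHsv = hsv(trackWindow), roiMask = mask(trackWindow);
    cv::calcHist(&roiHsv, 1, channels, roiMask, hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    // Track the hand: back-project the histogram and let CamShift follow the peak.
    while (cap.read(frame)) {
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        cv::calcBackProject(&hsv, 1, channels, hist, backproj, ranges);
        cv::RotatedRect box = cv::CamShift(backproj, trackWindow,
            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
        cv::ellipse(frame, box, cv::Scalar(0, 0, 255), 2);   // visualize the track
        cv::imshow("tracking", frame);
        if (cv::waitKey(30) == 27) break;                    // ESC to stop
    }
    return 0;
}
```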

Data Acquisition (Frame Capturing)
The DirectShow program will be used for data acquisition. The OpenCV library works with DirectShow technology, which is part of Microsoft DirectX technology [11]. DirectShow is particularly useful for processing image sequences or sequences captured using PC cameras. The DirectShow architecture relies on a filter architecture. The filters are basically of three types: (1) source filters that output video and/or audio signals; (2) transform filters that process an input signal and produce one or several outputs; and (3) rendering filters that display or save a media signal. The processing of a sequence is therefore done using a series of filters connected together, the output of one filter becoming the input of the next one. The first filter is usually a decompressor that reads a file stream, and the last filter could be a renderer that displays the sequence in a window. In DirectShow terminology, a series of filters is called a filter graph. The DirectShow program is first used to determine the type of filter graph required for acquiring the video stream, whether from a camera or from a file (MPEG, AVI, etc.). The filters are then implemented (or included) through the C++ OpenCV library in order to start capturing the video frames using the HighGUI component of the OpenCV library.

Background Subtraction
Following the process of data acquisition and object detection, captured frames can be processed as still images stored as data structures in memory. The background subtraction method depends on the variation in brightness between the rear pixels that form the background of the image and the front pixels that form the gesture to be recognized. The image's (frame's) brightness and contrast are initially modified using the ContrastBrightness() function in order to further increase the variation in pixel brightness. The image is then passed through a threshold filter that removes pixels with brightness below a certain threshold, using the built-in function cvThreshold(src, dst, 200, 255, CV_THRESH_BINARY). The image is then converted into a binary image composed of the bright foreground pixels only. After subtracting the background, which might otherwise act as noise or interference and cause an error in the output, the image is ready to be normalized.

Normalization Technique
Normalization is the process of setting the image to a certain spatial center or a certain normalized moment. Each image should be normalized before comparison with data from the reference database. A spatial difference corresponds to a difference in pixel filling and shape; this would give a false negative recognition decision when the image is compared to a similar image with a different spatial orientation or grey-level intensity [11].

Gesture Recognition
The gesture recognition is based on image subtraction and threshold comparison (a compact sketch of this comparison is given at the end of this section). The captured normalized image is first compared to the first image in the database. If the comparator yields a mismatch, the comparison continues with the next image in the database, and so on until the image is found. If the comparator yields a mismatch against all the images in the database, the captured image is not in the database. The user interface provided in the system allows the user to save the image of the new sign to the database as well as the corresponding text equivalent that can be read later by the text reader. If the image is found, the comparator module returns the ID of the reference image to which the match was successful. This ID is then used in the text-to-sound mapping module in order to provide the corresponding correct sound output.

Reference Database
The database contains the images, their corresponding IDs, and the text meaning of each sign. Microsoft Access was used to build it because, in addition to its ease of use, it provides the ability to store images and sounds. An ODBC data source is used to connect the database to the C code. The C code reads, from the text file produced by the gesture recognition module, the ID of the image that yielded a successful match. An SQL SELECT statement is used to locate the image in the database in order to get the text meaning of the sign. The corresponding sound is generated after the text-to-sound converter reads from a text file that contains the meaning of the recognized sign.

Text-To-Sound Converter
In order to make the system dynamic and give the user control over it, the user is provided with the option of inserting new signs into the database. To simplify the addition of sound to the database, the feature of inserting the meaning of the new sign as a string of text into the database is made available. This text is then processed (by the user) using the text-to-sound module to produce a wave file of the corresponding sound. This sound file is added to the sound files folder, and thus the system can then comprehend the new sign. [4]
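The subtraction-and-threshold comparison described above can be sketched as follows in C++ with OpenCV. The reference images, the mismatch threshold, and the table and column names in the SQL comment ("signs", "id", "meaning") are illustrative assumptions rather than final design decisions.

```cpp
// Sketch of the subtraction/threshold comparator used for gesture recognition.
// Reference images are assumed to be binary, normalized, and of equal size.
#include <opencv2/opencv.hpp>
#include <vector>

// Return the ID (index) of the best-matching reference image, or -1 if none matches.
int recognizeGesture(const cv::Mat& normalized,
                     const std::vector<cv::Mat>& references) {
    const int maxDifferingPixels = 500;               // assumed mismatch threshold
    for (size_t id = 0; id < references.size(); ++id) {
        cv::Mat diff;
        cv::absdiff(normalized, references[id], diff);        // image subtraction
        cv::threshold(diff, diff, 200, 255, cv::THRESH_BINARY);
        if (cv::countNonZero(diff) < maxDifferingPixels)
            return static_cast<int>(id);                       // match found
    }
    return -1;                                                 // sign not in database
}

// The text meaning of the matched ID would then be fetched through ODBC with a
// statement such as:  SELECT meaning FROM signs WHERE id = ?;
// and written to the text file read by the text-to-sound converter.
```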

XII. Working Bibliography:
[1] http://books.google.com/books?id=BYiMgQytRU8C&pg=PA6&lpg=PA6&dq=%22hearing+mutism%22&source=web&ots=W2ypNqM5Zm&sig=abMOBzrb9WReh8MYU9lkVie9UnM&hl=en#v=onepage&q=%22hearing%20mutism%22&f=false
[2] http://www.linkedin.com/company/mute/statistics
[3] http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F5028855%2F5069167%2F05069179.pdf%3Farnumber%3D5069179&authDecision=-203
[4] D. Chai and K. N. Ngan, "Locating facial region of a head-and-shoulders color image," in Proc. 3rd Int. Conf. Automatic Face and Gesture Recognition, 1998, pp. 124-129.
[5] BigEye: A Real-Time Video to MIDI Macintosh Software. [Online]. Available: http://www.steim.nl/bigeye.html
[6] Ohki, M. "The Sign Language Telephone," 7th World Telecommunication Forum, Vol. 1, pp. 391-395, 1995.
[7] V. Pavlovic, R. Sharma, and T. Huang. "Visual interpretation of hand gestures for human-computer interaction: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):677-695, July 1997.
[8] Starner T and Pentland A. "Real-Time American Sign Language Recognition from Video Using Hidden Markov Models," Perceptual Computing Section, The Media Laboratory, Massachusetts Institute of Technology. IEEE, 1995.
[9] Dias J, Nande P, Barata N, Correia A. "O.G.R.E. Open Gestures Recognition Engine," ADETTI/ISCTE, Lisboa, Portugal. XVII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI 2004).
[10] Sarfraz M, Yusuf A Syed, Zaeshan M. "A System for Sign Language Recognition using Fuzzy Object Similarity Tracking," Information and Computer Science Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia. Ninth International Conference on Information Visualisation (IV 2005).
[11] Intel Corporation. (2006). Open Source Computer Vision Library. [Online]. Available: http://www.intel.com/technology/computing/opencv/index.htm
