In Partial Fulfillment
of the Requirements for the Degree
BACHELOR OF SCIENCE IN INFORMATION TECHNOLOGY
Ommar, Vargas
Tracy, Cadiente
Marissa, Bonilla
Mary Grace, Asas
Joshua, Canada
Mark, Galicinao
In partial fulfillment of the requirements for the degree BACHELOR OF SCIENCE IN INFORMATION
TECHNOLOGY, this Capstone project entitled “Voice Writer” has been prepared and submitted by Asas
Mary Grace, Bonilla Marissa, Cadiente Tracy, Canada Marc Joshua, Galicinao Mark and Vargas Ommar
who are hereby recommended for project presentation.
<Chair>
Chair, Defense Panel
Accepted and approved in partial fulfillment of the requirements for the degree BACHELOR OF
SCIENCE IN INFORMATION TECHNOLOGY.
First and foremost, praises and thanks to God, the Almighty, for His showers of blessings throughout our
project work, which allowed us to complete this documentation successfully.
We are using this opportunity to express our gratitude to everyone who supported us throughout
the study of this CAPSTONE project. We are thankful for their inspiring guidance, unending support, and
friendly advice during the project work. We are sincerely grateful to them for sharing their truthful ideas
and views on our project. Their guidance helped us throughout the research and documentation of this
thesis.
We would like to express our deep and sincere gratitude to our Professor Herminino Lagunzad for giving
us the opportunity to do research and for providing invaluable guidance throughout this research. His
sincerity and motivation have deeply inspired us. It was a great privilege and honor to work and study
under his guidance.
Besides our Professor, we would also like to thank our panelists, Ma’am Hilarion, Sir De Guzman, and Sir
Pineda, for their insightful comments and encouragement, which greatly improved our research. Without
their precious support it would not have been possible to conduct this research.
Last but not least, we would like to thank our families, especially our parents, for trusting us and
supporting us financially and spiritually throughout the writing of this thesis.
ABSTRACT
When people talk, they often move their hands and their arms. These movements seem to have some type of
relationship with concurrent speech. Some of the results of a research project on gesture are presented
here. These studies aimed at describing the different gestures that occur during speech.
Deaf people are often assumed to have no need for mobile technology because they cannot hear or
speak. However, many deaf people use mobile phones in their daily lives for
various purposes such as communication and learning. Many studies have attempted to identify the
needs of deaf people in mobile applications and their level of usage of those applications. This study
examines recent research on mobile applications for the deaf to understand how
important mobile technology is for this community. It finds that the
studies conducted so far are limited and that more research on deaf users is needed to ensure
that they can benefit from mobile technology and its applications. Identifying deaf
users' requirements for mobile applications is left as a future study.
1 INTRODUCTION
Deafness is one of the most common disabilities in the world. Being deaf is not
easy. Deaf people may need much more time to understand spoken language. They can feel isolated and find it hard
to get information or help in an emergency. It is difficult for them to interact and communicate with the
people who surround them. Deafness can affect one’s life in many ways, for better or for worse. As a
deaf person, you rely on your eyes for clues to what people are saying or feeling, and you rely on other clues,
like vibrations in the floor, to be aware of what is going on around you.
Deaf people have difficulty hearing sounds, and many of them do not
speak.
Deaf people normally use sign language to communicate with each other. With this communication
system alone, they are often unable to convey the ideas or messages they want to express to people who
do not sign.
In today’s world, technology develops very fast and presents information in digital form, whether
as images or as audio. To improve the lives of deaf people, an application needs to
be developed so that they have the opportunity to learn new things and a chance to be introduced to
new technologies.
Our group conducted research aimed at improving communication between hearing and deaf
people. Our app is called “Voice Writer”. An app like this can be useful in many ways.
Voice writing is the process of converting speech, in the form of a sound signal, into a sequence of written
words. It is typically used to dictate text into a computer. Voice Writer allows the user to provide input to an
application with the human voice. Just as tapping a button, typing on a keyboard, or pressing a key on
the phone keypad provides input to an application, speech recognition allows us to provide input by
talking. Voice Writer is an application in which audio, text, graphics, and video are integrated to convey
various types of information. Conversions between different media are often necessary. The objective is
to provide more natural interaction between the human user and the computer.
While natural language and the audio channel are the primary means of human-to-human
communication, they are little used between human and machine. This is an area in transition, however:
when one considers communicating with sound, speech generally comes first to mind.
This study is conducted to provide a solution for the deaf community that they can use
anytime, anywhere. Voice Writer offers a quick method of writing. It is also useful for people with
disabilities who find it hard to use a keyboard. This app can also assist those who have difficulty
transferring ideas onto paper, as it takes the focus off the mechanics of writing (e.g., spelling and
sentence structure). Not all speech recognition software packages are equal in function, capability,
or ease of use.
The Voice Writer application requires each word to be separated by a distinct space (i.e., a brief pause
between words). This allows the software to determine where one word stops and the next begins.
Continuous speech voice recognition applications allow a user to dictate text fluently onto the phone
screen. Voice recognition is becoming increasingly popular as a built-in function, especially in tablets and
smartphones. Comparing recognition accuracy and ease of use may help determine whether the built-in function is good enough
or whether the user requires a more robust solution, especially for writing longer blocks of text. Voice Writer
will serve as the “ear” for deaf people.
MAIN OBJECTIVE
To create an application for deaf people
SPECIFIC OBJECTIVE
To develop a voice and text recorder for documentation and review
To develop a voice-to-text format converter
To develop clear communication between deaf and hearing people
SCOPE
Our system can convert voice into text. It can also recognize languages such as English and Filipino.
It can convert fingerspelling into text form. It allows the user to save and delete recorded speech and text. It
is accurate in noisy environments, meaning that our app can successfully handle noisy audio from a variety of
environments. It filters inappropriate content in text results for some languages.
2 CONCEPTUAL FRAMEWORK
DRAGON DICTATION
Dragon Dictation started as a speech recognition application for Apple’s iOS platforms, including
iPhone, iPod Touch and iPad. The app provided automatic speech-to-text capabilities. It was developed
by Nuance Communications and released in December 2009 as a free app. It is now commonly found
licensed in vehicle infotainment systems and healthcare equipment.
It works as an online solution, requiring the user to have an Internet connection. It is an automated dictation
program that turns spoken words into editable text. In the full version, users can enter text into any
application on their computer by speaking into a microphone. Over time, the software improves its
accuracy by learning the voice and pronunciation of the user. Voice commands also allow users to
perform basic editing functions and insert punctuation.
SIRI
Siri is an intelligent personal assistant, part of Apple Inc.’s iOS, watchOS, macOS and tvOS
operating systems. This assistant uses voice queries and a natural language user interface to attempt to
answer questions, make recommendations, and perform actions by delegating requests to a set of
Internet services. The software adapts to the user’s individual language usage, searches, and preferences
with continuing use. Returned results are individualized.
Artificial Intelligence – is the intelligence of machines and the branch of computer science which aims to
create it.
BETA – a pre-release of software that is given out to a large group of users to try under real conditions.
Beta versions have gone through alpha testing in-house and are generally fairly close in look, feel and
function to the final product; however, design changes often occur as a result.
DEAFNESS – the condition of lacking the power of hearing or having impaired hearing.
DIGITAL – describes electronic technology that generates, stores, and processes data in terms of two
states: positive and non-positive.
DISABILITY – a physical or mental condition that limits a person’s movements, senses, or activities.
iOS – a shortened way of saying “iPhone OS”, or “iPhone Operating System”.
macOS – is the computer operating system for Apple Computer’s Macintosh line of personal computers
and workstations.
tvOS – an operating system developed by Apple Inc. for the
fourth-generation and later Apple TV digital media players.
VIRTUAL – not physically existing as such but made by software to appear to do so.
watchOS – the operating system of the Apple Watch, a line of smartwatches designed, developed, and
marketed by Apple Inc. The Apple Watch incorporates fitness tracking and health-oriented capabilities
with integration with iOS and other Apple products and services.
3 OPERATIONAL FRAMEWORK
3.1 Materials
3.1.1 Software
Microsoft Visual Studio/Xamarin
Microsoft Visual Studio is an Integrated Development Environment (IDE) from Microsoft. It
is used to develop computer programs as well as web sites, web apps, web services, and
mobile apps.
Xamarin is a mobile app development platform for building native iOS, Android, and
Windows apps from a common C#/.NET code base, achieving 75% to nearly 100% code
reuse between platforms.
Our group plans to use this IDE and platform because Microsoft Visual Studio supports the C#
language, which our group is planning to use to develop our application.
OpenCV
It is an open source computer vision and machine learning software library. OpenCV was built
to provide a common infrastructure for computer vision applications and to accelerate the
use of machine perception in commercial products. OpenCV is a BSD-licensed product.
Our group is planning to use this library to recognize fingerspelling. OpenCV itself is written
in C++, but it can be used from C# through wrapper libraries, and C# is the language we are
going to use to develop our application.
3.1.2 Hardware
Laptop
A computer that is portable and suitable for use while travelling.
Our group is planning to use a laptop because it can be carried anywhere and it
functions as a computer, which allows us to develop our application.
Smartphone
A mobile phone that performs many of the functions of a computer, typically having a
touchscreen interface, internet access, and an operating system capable of running
downloaded applications.
Our group is planning to use a smartphone as the test device because it can
perform as a computer, which makes it a good test device for our application.
3.1.3 Data
Voice, Visual, Fingerspelling
3.2 Method
3.2.1 Algorithm Under Study
End Point Detection
Speech signals have more energy at low frequencies than at high frequencies.
Therefore, the energy of the signal needs to be boosted at high frequencies (pre-emphasis).
Depending on the environment, unwanted noise may lower the
recognition rate. This problem can be reduced by the end point detection
method, which locates where speech begins and ends in the signal.
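As a rough illustration of the two ideas above, the sketch below boosts high frequencies with a first-order pre-emphasis filter and then locates the start and end of speech by thresholding short-time frame energy. It is a minimal sketch in Python under stated assumptions, not the project's implementation: the function names, the 0.97 filter coefficient, the 160-sample frame length, and the 0.01 energy threshold are all illustrative choices that do not come from this study.

```python
def pre_emphasis(samples, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value.
    """
    if not samples:
        return []
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(samples[i] - alpha * samples[i - 1])
    return out


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def detect_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start_sample, end_sample) of the speech region, or None.

    A frame counts as speech when its energy exceeds the (assumed)
    threshold; the region runs from the first to the last such frame.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

A fixed threshold only works when the background is quiet. In a noisy room the threshold would normally be estimated from the first few frames, which are assumed to contain no speech.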
Mel Frequency Cepstral Coefficient (MFCC)
Feature extraction is the most important part of the entire system. The aim of
feature extraction is to reduce the data size of the speech signal before pattern
classification or recognition. MFCC is the feature extraction method under study.
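To make the data reduction concrete, the sketch below computes MFCCs for a single frame in the usual order: Hamming window, power spectrum, triangular mel filterbank, logarithm, and discrete cosine transform. It is a minimal pure-Python sketch, not the project's implementation. The function names, the 20 filters, the 12 coefficients, and the naive DFT (used here instead of an FFT to avoid external libraries) are assumptions made for illustration.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(w * math.cos(2 * math.pi * k * i / n)
                 for i, w in enumerate(windowed))
        im = -sum(w * math.sin(2 * math.pi * k * i / n)
                  for i, w in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (num_filters + 1)
            for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    """Return num_coeffs cepstral coefficients (c1..c12 by default) for one frame."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Log filterbank energies; the floor avoids log(0) for empty bands.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
             for row in bank]
    # DCT-II of the log energies, skipping coefficient 0.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_e))
            for c in range(1, num_coeffs + 1)]
```

With these assumed settings, a 256-sample frame is reduced to 12 numbers, which is the size reduction described above.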
Color Model
A color model is a system for creating a full range of colors from a small set of primary colors.
There are two types of color models: additive and subtractive. Additive color
models use light to display color, while subtractive color models use printing inks.
The most common color models that graphic designers work with are the CMYK
model for printing and the RGB model for computer display.
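The relationship between the additive RGB model and the subtractive CMYK model can be sketched with the simple textbook conversion below. It is a minimal Python sketch and the function names are hypothetical. Real print workflows rely on device color profiles rather than this formula, so the sketch only illustrates how the two models relate.

```python
def rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB (0-255) to CMYK fractions in the range 0.0-1.0."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(r, g, b)          # black is what the brightest channel lacks
    if k == 1.0:                    # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k


def cmyk_to_rgb(c, m, y, k):
    """Convert CMYK fractions (0.0-1.0) back to 8-bit RGB."""
    return tuple(round(255 * (1.0 - x) * (1.0 - k)) for x in (c, m, y))
```

For example, pure red in RGB (255, 0, 0) needs no cyan but full magenta and yellow ink, because in the subtractive model those two inks together absorb everything except red light.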
Image segmentation
Image segmentation is the process of partitioning a digital image into multiple segments. The goal of
segmentation is to simplify and/or change the representation of an image into
something that is more meaningful and easier to analyze. Image segmentation is
typically used to locate objects and boundaries (lines, curves, etc.) in images. More
precisely, image segmentation is the process of assigning a label to every pixel in an
image such that pixels with the same label share certain characteristics.
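The idea of assigning a label to every pixel can be sketched with one of the simplest segmentation schemes: threshold the image, then give each connected bright region its own label. This is a minimal Python sketch on a small grayscale grid, not the method used in this project. The function name, the threshold value, and the choice of 4-connectivity are assumptions made for illustration.

```python
from collections import deque


def segment(image, threshold):
    """Label 4-connected regions of pixels brighter than threshold.

    image is a list of rows of grayscale values. Returns (labels, count),
    where labels has the same shape as image, background pixels are 0,
    and each bright region carries a distinct label 1..count.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or labels[r][c]:
                continue                     # background or already labeled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:                     # flood-fill the region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# A tiny 3x5 image with two separate bright regions.
image = [
    [0, 200, 200, 0,   0],
    [0, 200,   0, 0, 180],
    [0,   0,   0, 0, 180],
]
labels, count = segment(image, 128)
```

Here every pixel ends up with a label: 0 for the background and 1 or 2 for the two bright regions, so pixels that share a label share the characteristic of being bright and connected.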