
COE 490 - Senior Design Report
Date of Submission: 9th May 2015

Read2Me: A Reading Aid for the Visually Impaired
Supervised by: Dr. Assim Sagahyroon

Anza Shaikh 42554


Heba Saleous 42416
Ragini Gupta 49089
Abstract
The inability to read has a huge impact on the lives of the visually impaired. In this
context, our paper focuses on the development of a device that can translate an image of text into
audible speech for the user. Mobile applications are used extensively today to collect
information in a digital format, which can then be deciphered without any human assistance.
Therefore, through this project we propose a portable camera device that can be used to take
images of any printed material, which is then processed and converted into synthesized speech
using OCR and TTS software in a mobile application.

In addition, the application on the mobile platform will incorporate GPS technology to
help users locate themselves, or to allow others to locate them, in case they lose their way.

Acknowledgement
We would like to express our heartfelt gratitude to Dr. Assim Sagahyroon, our senior
design advisor, for his constant motivation and guidance throughout the project. We would also
like to thank Dr. Raafat Abu Rukba for his guidance on how to implement a cloud system for our
project. We would also like to extend our thanks to our parents and friends for their support and
guidance during this work.

Table of Contents
I. Introduction .......................................................................................................................................... 7
II. Background and Statement of Problem................................................................................................ 9
PREVIOUS WORK.......................................................................................................................................... 9
III. Requirement Specifications .......................................................................................................... 24
IV. Design Objectives .......................................................................................................................... 27
DESIGN OBJECTIVES ................................................................................................................................... 27
DESIGN CONSTRAINTS ................................................................................................................................ 28
COMPONENT LEVEL SPECIFICATION .............................................................................................................. 29
V. Technical Approach and Solution .................................................................................................... 32
OCR SOFTWARE ........................................................................................................................................ 34
TTS SOFTWARE .......................................................................................................................................... 35
PROGRAMMING LANGUAGE ........................................................................................................................ 35
RASPBIAN OS ............................................................................................................................................ 35
FLOW CHART ............................................................................................................................................. 36
SYSTEM ARCHITECTURE .............................................................................................................................. 37
HARDWARE ARCHITECTURE ......................................................................................................................... 39
VI. Preliminary Design ........................................................................................................................ 40
VII. Preliminary Cost Estimates .......................................................................................................... 44
VIII. Project Management ..................................................................................................................... 45
PRELIMINARY SCHEDULE ............................................................................................................................. 45
GANTT CHART ............................................................................................................................................ 48
IX. Societal Impact .............................................................................................................................. 49
X. List of Components............................................................................................................................ 50
HARDWARE ............................................................................................................................................... 50
SOFTWARE ................................................................................................................................................ 51
NETWORK.................................................................................................................................................. 51
XI. Future Prospects ........................................................................................................................... 52
MOTION DETECTION .................................................................................................................................. 52
SECURITY ................................................................................................................................................... 52
DISTANCE SENSORS .................................................................................................................... 52
SIRI-LIKE APPLICATION ................................................................................................................ 52
CLOUD COMPUTING SYSTEM ....................................................................................................... 52
SHARING FACILITY ...................................................................................................................... 52
XII. Conclusion ...................................................................................................................................... 54
XIII. Bibliography .................................................................................................................................... 59

List of Figures
Figure 1: Read2Me Prototype ....................................................................................................................... 7
Figure 2: Functional diagram for an automated pen explaining the different processes from capturing an
image to reception by a ZigBee headset [3] ................................................................................................. 9
Figure 3: Simulation of a newspaper page for extracting the textual contents [3] .................................... 10
Figure 4: A block diagram of the architecture and design of K-NFB Reader [4] ......................................... 11
Figure 5: Tyflon Reader Prototype [6]......................................................................................................... 12
Figure 6: Finger Reader ............................................................................................................................... 15
Figure 7: Software portraying the detected text while finger scanning, and the extracted text in camera
analysis ........................................................................................................................................................ 15
Figure 8: Block Based Text Tracking ............................................................................................................ 16
Figure 9: Comparison of Frameworks ......................................................................................................... 17
Figure 10: System architecture for Camera Reading for Blind ................................................................... 17
Figure 11: The System Diagram of the Open Source OCR Framework ....................................................... 18
Figure 12: The object is rotated to capture different scales and angles of the text intended for reading
[14] .............................................................................................................................................................. 19
Figure 13: Each angled image captured by the camera while the object is being rotated is set vertically
[14] .............................................................................................................................................................. 20
Figure 14: An overall image is created from segments of the vertical images [14] ................................... 20
Figure 15: The proposed flowchart of the system, along with the input and output hardware [16] ........ 21
Figure 16: The Tactobook concept. a) The e-book is stored onto a memory drive and plugged into the
Tactobook. b) The reader feels for the Braille on the Tactobook as it prints out the e-book ten cells at a
time [17]...................................................................................................................................................... 22
Figure 17: Raspberry Pi Model B+ ............................................................................................................... 29
Figure 18: Read2Me System Architecture .................................................................................................. 37
Figure 19: System Use Case ........................................................................................................................ 38
Figure 20: Read2Me Hardware Architecture .............................................................................................. 39
Figure 21: A draft of the layout of the system ............................................................................................ 40
Figure 22: Read2Me Clicker Diagram (Design View and Implementation) ................................................ 41
Figure 23: A tactile push switch .................................................................................................................. 42

List of Tables

Table 1: Example Positioning Commands [6].............................................................................................. 13


Table 2: Example Reader Commands [6] .................................................................................................... 13
Table 3: Specifications of Raspberry Pi Model B+....................................................................................... 30
Table 4: Specifications of Raspberry Pi Camera Module ............................................................................ 31
Table 5: Comparison of Arduino and Raspberry Pi [7]................................................................................ 33
Table 6: Raspberry Pi models comparison [13] .......................................................................................... 34
Table 7: Hardware cost ............................................................................................................................... 44

I. Introduction

Worldwide, there are 285 million visually impaired people. Of these, 39 million are blind
and 246 million have low vision. About 1.4 million children under the age of 15 are blind. More
than 90% of the world's visually impaired people live in low- and middle-income countries, and
82% of blind people are aged 50 and above [1]. Popular belief holds that vision dominates
human experience. According to an experiment described in [2], blocking vision resulted in the
largest loss of functional information, increased task difficulty, and fostered dependency.
Consequently, this impediment affects nearly all activities of the visually disabled.

However, despite entrenched research efforts, the world of printed information, such as
newspapers, books, signs, and menus, remains mostly out of reach for visually impaired
individuals. Therefore, our project seeks an answer to this persistent problem by developing
an assistive technology, referred to from here on as Read2Me. A prototype is shown in
Figure 1.


Figure 1: Read2Me Prototype


In this project, we describe the development of an innovative device which will help the
visually impaired read literary works and labels without any external assistance. This device
will be valuable for gaining access to information that the visually challenged cannot otherwise
read, in other words, printed material. Enabling the visually impaired to access books and other
printed material helps them become educated and literate on par with other members of
society.
As most visually impaired people struggle in the work environment and often end up in low-
income jobs, one of the goals of this project is to design a cost-effective technology.
The main goal of this project is to devise a reading aid system with the following features:
small size
light weight
efficient use of computational resources
low cost
With the rapid advancements in mobile applications, the mobile phones available today are fully
capable of running complicated applications such as business card readers, Siri-integrated
apps, road sign detectors and translators, autonomous navigation, etc. Therefore, to make our
product scalable and efficient, we will develop an Android application with an intuitive
interface and OCR and TTS capability for the user.
Our system uses a camera module mounted on any wearable or standalone device, such
as glasses or a bookstand, that the user already owns or has to purchase. This module is used to
capture textual information, perform optical character recognition (OCR), and provide audio
feedback through an earpiece. Read2Me is switched on using the on/off button of the clicker,
upon which the user is notified with two beeps. Image capture is triggered by the wired clicker
held in the user's hand. Positioning of the camera will be handled through training of the users.
Additionally, the application will be responsible for playing and pausing the audio feedback of
the text to the user.
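The capture-OCR-TTS flow described above can be sketched as a simple pipeline. The sketch below is illustrative only: the function names are our own, and the capture, OCR, and TTS back-ends are injected placeholders standing in for the real camera module, OCR engine, and earpiece output.

```python
from typing import Callable

def read2me_pipeline(capture: Callable[[], bytes],
                     ocr: Callable[[bytes], str],
                     tts: Callable[[str], None]) -> str:
    """Capture an image, extract its text with OCR, and speak it aloud.

    The back-ends are injected so the same flow works with any engine
    (e.g. a camera driver, an OCR library, and a speech synthesizer).
    """
    image = capture()          # triggered by the wired clicker
    text = ocr(image).strip()  # optical character recognition
    if text:                   # speak only if something was recognized
        tts(text)
    return text

# Stub back-ends standing in for the camera, OCR engine, and earpiece:
spoken = []
result = read2me_pipeline(
    capture=lambda: b"fake-image-bytes",
    ocr=lambda img: "  Hello, reader!  ",
    tts=spoken.append,
)
print(result)   # Hello, reader!
print(spoken)   # ['Hello, reader!']
```

Injecting the back-ends also makes the flow easy to test without hardware, which matters on a headless Raspberry Pi setup.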
II. Background and Statement of Problem
PREVIOUS WORK

The engineering report [3] discusses the implementation of an automated electronic pen
to aid the visually impaired in reading and understanding textual content, as shown in Figure 2.
The pen consists of a pinhole camera which captures an image of the text highlighted by the
pen; this image is then fed to an Intelligent Word Recognition engine to convert the image into
text, as shown in Figure 3. The text is then read out aloud using a text-to-speech converter.
The paper uses ZigBee technology for the transmission and reception of the audio signals.
However, it did not consider the amount of training a blind person would require to place
the pen on the words to be read. This could be a major problem, since a blind person would
clearly struggle to place the pen accurately on the words, thereby producing inaccurate
results.

Figure 2: Functional diagram for an automated pen explaining the different processes from capturing an image to reception by a
ZigBee headset [3]
Figure 3: Simulation of a newspaper page for extracting the textual contents [3]

The research article [4] introduces the KNFB Reading Technology, which puts the
functionality of a reading machine into a multifunction cell phone, as shown in Figure 4. This
device likewise requires the user to take a photo of the document of interest, which is then
processed into speech and read out aloud. Simultaneously, it can display the text on the phone's
built-in screen and highlight each word as it is spoken. However, this feature is only compatible
with a select number of high-end phones, which are costly and not affordable to a larger set
of blind users. Moreover, its word recognition rates are too low for reliable use by the visually
impaired. Also, the device has a human-machine interface that makes use of Braille, making it
unusable for those who do not know Braille. It is also more prone to being stolen from a blind
user.
Figure 4: A block diagram of the architecture and design of K-NFB Reader [4]

The journal article [5] discusses the implementation of a bar-code detector and reader.
This helps a blind person distinguish between different products in a supermarket or in a
pantry at home. The interesting thing about this reader is that it provides speech feedback to
guide a blind user to a detected barcode, thereby eliminating the kind of inaccuracy that might
arise from wrong positioning of the reader, which was a problem in [3]. Nonetheless, the target
platform of the device is the Nokia N95 phone, which means the reader is limited to those who
know how to operate this phone and are in possession of one. It is clear from the above
description that the ultimate goal of this device is to help the blind recognize different
products; this could also be achieved by a device that detects text and simply reads out the
labels instead.

Figure 5: Tyflon Reader Prototype [6]

The journal article [6] discusses a wearable document reader for the visually impaired
known as the Tyflos Reader (shown in Figure 5). The device is a pair of glasses with two stereo
vision cameras mounted on top, one on either side. The device reads out the text extracted from
the images captured by the two stereo vision cameras. It uses image processing technology
similar to that described above; however, the interesting aspect of this reader is that it not only
provides speech feedback (commands shown in Table 1) but also takes speech commands from
the user (i.e., a voice user interface) and acts appropriately. Some of the user commands are
shown in Table 2. Moreover, the device performs page segmentation through the Zeta Scanning
Algorithm, which segments the document image into textual blocks depending on font size.
This was done specifically for newspapers, so that headlines could be separated from the
supporting text. The primary processing device is a PDA or laptop, which implies that the user
needs to purchase one before using the device. Moreover, the voice user interface might not
function well in a noisy environment, rendering it limited to indoor use.
Table 1: Example Positioning Commands [6]

Table 2: Example Reader Commands [6]
The journal article [8] elaborates on an innovative device named the Finger Reader
(shown in Figure 6), a ring-like wearable device that supports text reading for the visually
impaired. The device was designed for the blind in response to several difficulties encountered
while reading text with existing technology, such as alignment, mobility, accuracy, positioning,
and efficiency. The Finger Reader introduces the innovative concept of local sequential text
scanning, reading large blocks of text line by line. It can also be used to skim to the major parts
of a text, with auditory feedback provided to the user in parallel. The hardware implementation
comprises two vibration motors fixed on the top and bottom of the ring, which give haptic
feedback to the user through signal patterns such as pulsing, in order to guide the direction in
which the user should move the camera. Alongside the hardware design, a software stack is
implemented (as depicted in Figure 7) in a PC application that comprises a text extraction
algorithm, a hardware control driver, the Tesseract OCR engine, and the Flite text-to-speech
software. The text extraction algorithm includes image binarization and selective contour
extraction methods that refine the line equations sequentially before sending them to the OCR
engine. The user hears each word that falls under his or her finger, and at the end of every line
read, the system triggers auditory feedback. However, one major drawback of this device is
that as the user moves from one line to the next, the audio feedback is segmented rather than
continuous, which confuses the user when positioning the device on each line.

Figure 6: Finger Reader

Figure 7: Software portraying the detected text while finger scanning, and the extracted text in camera analysis

Another journal article [9], on a text-tracking wearable camera system for the blind,
describes a wearable camera with the ability to locate text regions in natural scenes and
translate that text into audible speech or Braille. The two important mechanisms in this device
are text detection and tracking text in a video sequence. The device merges identical text
regions found in a natural scene so as to eliminate redundant audio synthesis or Braille output.
The device consists of a head-mounted video camera, an NTSC video converter, and a laptop
PC. The captured videos are converted into a DV stream through the converter. The procedure
in this system includes: segmenting the current frame into pixel blocks; extracting text blocks
using a DCT feature [10]; merging identical text blocks; creating groups of text regions
between the current and initial frames using the particle filter mechanism; filtering non-text
regions from the frame; and feeding the data to the OCR engine. Text tracking (shown in
Figure 8) is performed using a particle filter that scatters weighted particles over the text
frame. Non-text areas are removed from the image using the edge count method.
To obtain a clean text image, image binarization (using Fisher's Discriminant Ratio), edge
density, edge count, summation of text area image and width, and average intensity of vertical
edges using the Sobel edge detector are computed before the image is fed to the character
recognition stage. Higher values of these physical parameters ensure better OCR accuracy.
Finally, after text image selection and filtering, the text images are fed to a commercial OCR
engine, Panasonic YomitoriKakumei 9.

Figure 8: Block Based Text Tracking

The journal article [11] on Camera Reading for Blind People describes the development
of a mobile application that allows a blind user to read printed text. It integrates OCR and TTS
tools so that a picture taken with the smartphone can be converted into audio feedback. To
choose an efficient OCR framework, a preliminary test was performed by taking pictures of
text with different layouts, sizes, lighting, and shapes. A text file was then transcribed from
each image to compare the text produced by optical character recognition against the original
text. A Levenshtein distance function was created to measure the similarity between the
optically recognized text and the original text. The Levenshtein distance is calculated as the
number of operations needed to make the two strings equal. For instance, if the original string
p = "kitten" and the OCR-ed string t = "sitten", the Levenshtein distance d(p, t) is 1, since only
one character has to be substituted to make the two strings equal. Hence, the smaller the value
of d, the better the framework's performance. This test was conducted with 30 images
recognized by three different OCR frameworks, and the differences between the original and
OCR-ed text were measured. The results of the Levenshtein distance computed for the three
OCR frameworks are shown in Figure 9.
Figure 9: Comparison of the median value of the string distance in images for three frameworks Tesseract, Abbyy and Leadtools
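The Levenshtein distance used in [11] can be sketched in a few lines with the classic dynamic-programming formulation, which counts the minimum number of insertions, deletions, and substitutions needed to turn one string into the other (the function name here is our own; [11]'s actual implementation is not reproduced):

```python
def levenshtein(p: str, t: str) -> int:
    """Minimum number of single-character edits turning p into t."""
    # previous[j] holds the distance between the processed prefix of p
    # and t[:j]; each iteration of the outer loop advances one row.
    previous = list(range(len(t) + 1))
    for i, pc in enumerate(p, start=1):
        current = [i]
        for j, tc in enumerate(t, start=1):
            cost = 0 if pc == tc else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitten"))  # 1: one substitution (k -> s)
```

With this function, the paper's evaluation loop reduces to computing d over each (original, OCR-ed) pair and comparing medians across frameworks.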

It was observed from these results that, although commercial frameworks like Abbyy
and Leadtools performed better, the research project was based on the free software Tesseract
due to project budget limitations. Furthermore, the TTS tool implemented in the application
was the AVSpeechSynthesizer supported by iOS 7 for human voice synthesis. For better system
optimization and efficiency, two additional stages were also included: preprocessing and
post-processing. In the preprocessing stage, image filters such as CIColorControls and
CIColorMonochrome were applied to improve image quality before feeding the images to the
OCR engine, whereas in the post-processing stage, a function was created to calculate an
error-rate percentage through a mathematical formula. If the error rate exceeded a defined
value, the user was prompted to repeat the process of capturing the text image. The system
architecture of their work is shown in Figure 10.

Figure 10: System architecture for Camera Reading for Blind
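The retry behavior described above, in which the user is prompted to recapture the image when the error rate is too high, can be sketched as a simple loop. The error-rate formula from [11] is not reproduced here, so `estimate_error_rate` is an injected placeholder, and the threshold and attempt count are assumed values, not those from the paper:

```python
MAX_ERROR_RATE = 20.0  # percent; an assumed threshold, not the value from [11]

def capture_until_acceptable(capture_and_ocr, estimate_error_rate, max_attempts=3):
    """Repeat image capture until the estimated OCR error rate is acceptable."""
    text = ""
    for _ in range(max_attempts):
        text = capture_and_ocr()
        if estimate_error_rate(text) <= MAX_ERROR_RATE:
            return text   # good enough: hand off to TTS
    return text           # give up after max_attempts, keep the last result

# Stubs: the first capture is noisy, the second is clean.
captures = iter(["h3#llo w0r|d", "hello world"])
noisy_score = lambda s: 80.0 if "#" in s else 5.0
result = capture_until_acceptable(lambda: next(captures), noisy_score)
print(result)  # hello world
```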

Since the application is aimed at blind users, one limitation of this product is that it is
difficult for a blind person to orient the image capture with proper positioning of the mobile
camera. The user will require some external assistance, as image capture is not implemented
as an automatic process. Also, the user has to purchase an iPhone to be able to access the
application.

The paper [12], Open Source OCR Framework using Mobile Devices, describes a
project in which an image is captured using a Microsoft Windows mobile phone camera and
processed to be read out aloud through the phone's built-in speaker. For the image processing,
existing open source desktop technology is used to develop a complete OCR framework with
TTS capability. The popular Tesseract software was used for text recognition and detection,
whereas the Flite speech synthesis module was used to add the text-to-speech functionality.
The captured image is converted into a bitmap format, which is transformed into a single-
channel intensity image used by the OCR engine. The Tesseract OCR engine translates this
converted image into a text file of ASCII-coded characters. This ASCII text file is post-
processed to remove all non-alphanumeric characters before being fed to the speech
synthesizer. The total time for the entire system to capture an image and synthesize the text
into speech was around 8 to 12 seconds. The following diagram illustrates the overall
architecture of the system:

Figure 11: The System Diagram of the Open Source OCR Framework
From the above figure, we can observe that the application's core components comprise a
simple GUI for user interaction, a DAI (Digital Audio Interface) for output of the synthesized
speech, an adapter to transform the input image into data readable by the OCR engine, the
OCR engine itself, and the TTS synthesizer. The open source OCR engine and TTS tool were
ported to the mobile platform using the Visual Studio IDE and the Windows Mobile 5.0 SDK.
The system developed cannot, however, produce efficient real-time results for visually
impaired users, who have to operate the mobile camera themselves. Images captured on the
mobile platform suffer from noise and distortion. Moreover, images taken from a mobile
camera have to be converted into a format readable by the OCR engine. This image-conversion
overhead could be eliminated by using a digital camera device that supports image capture in
BMP.
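The post-processing step described in [12], stripping the OCR output of characters that would trip up the speech synthesizer, can be sketched as a simple filter. This is our own illustrative version, not the code from [12]; as a slight variation on the paper's strict alphanumeric filter, it keeps basic sentence punctuation so the synthesizer can pause naturally:

```python
import re

def clean_ocr_output(ocr_text: str) -> str:
    """Keep only alphanumeric characters and basic separators before TTS.

    Stray symbols in OCR output (misrecognized marks, control codes)
    would otherwise be spoken literally or confuse the synthesizer.
    """
    # Replace every run of characters that is not a letter, digit,
    # or basic sentence punctuation with a single space.
    cleaned = re.sub(r"[^A-Za-z0-9.,!? ]+", " ", ocr_text)
    # Collapse the repeated whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", cleaned).strip()

print(clean_ocr_output("Read2Me: ~reads* aloud\x0c"))  # Read2Me reads aloud
```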

An issue with reading text via a camera rather than eyesight is the shape of the
physical medium carrying the text. Work done by Ye, Yi, and Tian [14] suggests that reading
from cylindrical objects is possible using a camera attached to a pair of sunglasses or a cap. To
compensate for improper aiming by a blind user, a wide-angle camera was used to ensure that
the whole image of what is to be read is captured. However, such a camera captures not only
the intended reading material but also the background and surrounding objects; using the raw
camera image would confuse the system and cause inaccurate audio output. In [14], motion-
based background subtraction (BGS) is therefore used to extract the actual reading material
from its surroundings. To improve accuracy, BGS methods based on Gaussian mixtures assist
in dynamic environments where lighting changes and foreground movement occur [14]. A
Gaussian mixture is "a parametric probability density function represented as a weighted sum
of Gaussian component densities" [15].
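The Gaussian mixture definition quoted from [15] can be made concrete with a small numerical sketch: the density is just a weighted sum of Gaussian components, with the weights summing to one. The component values below are arbitrary illustrative numbers, not parameters from [14]:

```python
import math

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a single one-dimensional Gaussian component."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x: float, weights, means, stds) -> float:
    """Weighted sum of Gaussian component densities: sum_k w_k * N(x; mu_k, s_k)."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, means, stds))

# Two components, e.g. a dark background mode and a bright foreground mode.
weights = [0.7, 0.3]     # mixture weights must sum to 1
means   = [50.0, 200.0]  # pixel-intensity modes
stds    = [10.0, 20.0]

# A pixel near the background mode is far more likely under the mixture
# than one between the modes, which is the cue BGS methods exploit.
print(mixture_pdf(50.0, weights, means, stds))
print(mixture_pdf(125.0, weights, means, stds))
```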

Figure 12: The object is rotated to capture different scales and angles of the text intended for reading [14]

The objects in the images are normalized such that the intended text stands vertically in
the image, as seen in Figure 13. From each of these vertical images, an overall image is
stitched together, as seen in Figure 14.


Figure 13: Each angled image captured by the camera while the object is being rotated is set vertically [14]

Figure 14: An overall image is created from segments of the vertical images [14]

For this project, where the wearer of the glasses would be trying to read books and labels,
this method of image-to-text conversion would be convenient, since the reader will not always
be looking at a flat surface. Book pages tend to curve near the binding at the spine, and labels
are often found on non-flat surfaces such as cans, bottles, and bags. However, the actual
program described in this article was only made for labels on cylindrical objects such as cans.
The project discussed in this report is mainly intended for book reading, and using this
program as-is may pose issues in image-to-text conversion.

Another scholarly article [16], related to reading labels, was released one year after
the previously mentioned work. The purpose of this proposal was to establish the ability to
read label text from products with colorful text of varying fonts and patterns [16]. A wide-
angle camera was also used in this system in order to gather as much input as possible for the
image-to-text conversion. The project uses a camera attached to a pair of sunglasses, acting as
a live webcam, to capture images using OpenCV libraries, resulting in images in the RGB24
format to be processed [16]. The article proposes the use of algorithms to process the images
and extract the label text from the surrounding clutter. The proposed algorithms then localize
the detected text into a readable format [16].

The reader is required to shake the object bearing the text to be read, as in the program
proposed by Ye et al. Once the text has been detected, localized, and processed, it is read out
as audio to the glasses wearer via a Bluetooth earpiece [16].

Figure 15: The proposed flowchart of the system, along with the input and output hardware [16]

Like the previous work by Ye et al., these algorithms are useful for reading curved
surfaces with large text, but may pose problems when reading books. Books are usually printed
in small text, which these algorithms may not be able to detect. The project discussed in this
report is made mainly for reading books, rather than only signs and labels.

One engineering article [17] has approached the issue of aiding blind people with reading in a
different manner. Rather than converting images of actual text into audio to be read to the
reader, Velazquez, Hernandez, and Preza have proposed a portable e-book solution in which
text is converted into Braille for Braille readers to read themselves, rather than relying on
audio. This product is called the Tactobook; the concept can be seen in Figure 16. In the
Tactobook, the pins felt by the reader rise and fall depending on the voltage applied to the
transducers they are connected to [17].

Figure 16: The Tactobook concept. a) The e-book is stored onto a memory drive and plugged into the Tactobook. b) The
reader feels for the Braille on the Tactobook as it prints out the e-book ten cells at a time [17]

This device gives another approach to allowing blind people to read by themselves. This
method may boost their confidence because they are reading the Braille alone rather than having
an aid of any form read it to them. The ability of this proposed idea to translate text in a digital
format into a mechanical form opens the window to other possibilities that may further help the
blind.

There are many mobile applications available in the market that integrate OCR
technology and TTS software for reading out text from images. A few examples of such
public apps are:

SayText [18] is an iOS application that reads out the text in photos. The
photographs taken can also be shared through an email service or used for
reading the embedded text.
Google Goggles [19] is an Android application based on OCR technology. It can
be used to translate images into editable text, which can then be copied and sent
as a message through email or SMS. Moreover, Goggles comes with Google
Translate, which can translate the text into various languages such as French,
Italian, German, and Spanish.
Voxdox [20] is an Android text-to-speech application that can read aloud almost
any form of text in more than 10 languages. It can be used to read e-books, PDF
or .doc documents, papers, articles, or web pages using TTS. Voxdox can also be
utilized as a document scanner through the mobile's camera: the device takes an
image of a document in any language and converts its text into speech.
CapturaTalk [21] is available on both the iOS and Android platforms. It
integrates five high-quality applications, namely a talking browser, messages,
type and speech, a file manager, and OCR, into one simple, novel app.

STATEMENT OF PROBLEM

Despite the availability of extensive studies and applications on the theme proposed in
this paper, we observe that many shortcomings remain in these works with respect to real-life
scenarios, such as image capture and efficiency of text recognition when conditions are not
ideal. Also, most of the systems developed are built using expensive hardware components that
are beyond the reach of many visually impaired people.

After reviewing the published literature, we can infer that there is no technology small enough
for the visually challenged to carry with them at all times and fast enough to match the reading
pace of a person with normal sight. We intend to introduce a product that not only helps the
visually challenged to read but also helps the user gain confidence in exploring the city without
the fear of being lost. Moreover, we intend to add features such as GPS tracking to locate the
user in case they feel lost. We will also try to develop a cost-effective product that is within the
reach of the vast population of visually impaired people and those who cannot read.

III. Requirement Specifications

Most visually impaired people are distinguishable by the thick, black glasses they usually
wear to cover their eyes. From that observation, we decided to use these glasses as the platform
for the project. Just as people wear ordinary glasses to see, the blind will be able to use these
glasses to read. To make this possible, a Raspberry Pi microcomputer unit will be designed and
implemented. It will be integrated with a camera module and interfaced with a mobile
application. The weight of the whole unit will not exceed 3 lb.

For this prototype, three separate devices will be involved: the glasses to be worn, a small
case to accompany them, and a clicker to be carried while the glasses are in use.

Glasses components:

The glasses will contain the following parts:

Camera

The camera is the main component of the glasses part of this project. Its main functionality
is to capture pictures of whatever it is aimed at. These pictures are then sent to the Raspberry
Pi's memory.

Earpiece

The earpiece provides the main output of this product, which is the audio playback of the text
detected in the images sent. The audio will be received from the mobile phone once the image
has completed its processing cycle. The audio played back should be clearly audible in the
intended language (English, in this case).

Mobile Application:

The mobile application platform that we have chosen is Android, because it has a larger number
of users and is more development friendly than the iOS platform.
Other Carried components:

The following components will be included in the case:

Raspberry Pi B+ microcomputer
Battery
SD Card (8GB)
Clicker
Android phone

The Raspberry Pi Model B+ is the center of operations. This device will control the
camera connected to it, send and receive pictures taken by the camera, temporarily store audio
conversions of captured images, and play the output when requested.

The battery component is connected to some of the GPIO pins on the microcomputer and is
estimated to power the overall system for approximately four hours.

The Android application will run on the Android device to perform image processing and
speech synthesis.

The audio jack of the mobile phone will be used to connect a headset that acts as the medium
for the audio produced by image processing.

The 8 GB SD card inserted into the Raspberry Pi will act as the memory of the system. The
operating system, images, and audio results will be stored on this card.

The clicker will be used as the medium of interaction between the user and the glasses. It will
include buttons to switch the glasses on/off, capture images, and send the GPS location to a
friend (specified by the user in advance).
Additional Capabilities:

With the clicker included in the system, further expansion can occur where extra
features are added:

1) The ability to pause audio. While the audio output is being played back through the
headset, the user will be able to pause it. This stops the audio until the wearer of the
glasses wants it to continue. This will be done through the clicker.
2) The ability to replay. If the user is not currently taking any pictures to be converted, they
can request a replay of the last recorded audio session. In that case, the most recent audio
file is retrieved and played back through the headset. While this is happening, pause and
play remain possible.
3) The ability to increase or decrease the volume of the audio output as requested by the
user.
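The pause, replay, and volume rules above can be modeled as a small state object. The sketch below is illustrative only: the class and method names are our own, and a real implementation would drive the phone's audio player rather than just tracking state.

```python
class PlaybackController:
    """Minimal model of the pause/replay/volume behaviour described above.

    Names here are illustrative; the production system would forward these
    actions to the Android device's media player."""

    def __init__(self, volume=5, min_vol=0, max_vol=10):
        self.volume = volume
        self.min_vol = min_vol
        self.max_vol = max_vol
        self.paused = False
        self.last_clip = None        # most recent audio file, kept for replay

    def play(self, clip):
        # Playing a new clip remembers it so the user can replay it later.
        self.last_clip = clip
        self.paused = False

    def toggle_pause(self):
        # A clicker press flips between paused and playing.
        self.paused = not self.paused

    def replay(self):
        # Replay is only possible when a previous clip exists.
        if self.last_clip is None:
            return None
        self.paused = False
        return self.last_clip

    def change_volume(self, step):
        # Volume is clamped to the supported range.
        self.volume = max(self.min_vol, min(self.max_vol, self.volume + step))
        return self.volume
```

Pause and replay remain available during a replayed clip exactly as during a fresh one, since both paths go through the same state.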

IV. Design Objectives
DESIGN OBJECTIVES

The system should have a clicker attached to the RPi through a wired connection. The
clicker consists of three buttons, namely On/Off, Capture, and Trace Me, as shown in the
figure.
The system will start with a click of the On/Off button on the clicker.
The camera attached to the glasses is portable, so it can be mounted on any wearable
device or a stand.
The system will allow the visually impaired to take images of printed material they desire
to read at the click of the Capture button on the clicker attached to the glasses. The
clicker will trigger the camera to initiate capturing.
The device will be able to scan and read a range of English printed material, including
books, bills, and documents.
The image taken by the wearable glasses will be sent to the Android application through a
USB connection so that it can be processed into an editable text format such as .txt or
.docx.
No images are stored long-term on the RPi. Once an image is sent to the Android device, it
will be deleted from the SD card of the board.
An OCR framework will be implemented in the application for optical recognition of the
existing text. This can be implemented using the open-source Tesseract framework.
A text-to-speech feature will also be integrated into the mobile application so that it can
synthesize a human voice from the provided string of the text file.
A headphone or earphone on the audio jack of the mobile will play audio feedback in
English with a male or female reading voice.
The application will implement a GPS feature, activated by a click of the Trace Me button
on the clicker when the blind person feels lost. This click will send an SMS with the GPS
location to the phone of the blind person's human aide, from which the aide will be able to
track the blind person.
Since Read2Me is intended for indoor as well as outdoor use, and the Pi board must be
powered without wires, the best choice is a battery power supply.
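The Trace Me SMS described above amounts to packaging the last GPS fix into a short message. The sketch below is our own illustration: the function name, wording, and Google Maps link format are assumptions, and the actual SMS dispatch would be handled by the Android application.

```python
def trace_me_sms(lat, lon, name="Read2Me user"):
    """Compose the SMS body sent when the Trace Me button is pressed.

    `lat`/`lon` are decimal degrees from the GPS module. The maps URL is a
    widely used convention for sharing a coordinate; all names and wording
    here are illustrative, not part of any fixed API."""
    maps_link = "https://maps.google.com/?q=%.6f,%.6f" % (lat, lon)
    return ("%s may be lost. Last known GPS position: %.6f, %.6f. "
            "Map: %s" % (name, lat, lon, maps_link))
```

The aide receiving the SMS can open the link directly, so no special software is needed on their phone.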

DESIGN CONSTRAINTS
One of the major constraints in our system will be the correct positioning and alignment
of the user while taking a photo of the printed text with the wearable glasses. To
mitigate this problem, we will conduct special training sessions to make users
accustomed to the proper use of the device.
The 5-megapixel camera module is used with an 8 GB Raspberry Pi SD card, which can
be exhausted by approximately 4000 photos. To avoid this, we delete each captured
image as soon as it is converted to speech. As for the audio clips, at most 10 audio
clips are stored on the SD card at any one time.
Since the camera module used does not support night vision, the user has to be in a
well-lit environment to capture images.
The user can only take an image after the processing and audio feedback for the last
image have been completed.
Image processing on the application will demand some time and considerable
computational power from the mobile phone, which can result in fast battery drain.
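The storage policy above (delete images immediately, keep at most 10 audio clips) can be enforced with a small housekeeping routine run after each conversion. This is a sketch of the policy only; the function name and directory layout are our own assumptions.

```python
import os

def prune_audio_clips(directory, keep=10):
    """Delete the oldest .wav clips in `directory` so at most `keep` remain.

    Sketch of the SD-card storage policy described above; the real system
    would call this after each text-to-speech conversion. Returns the
    number of clips left on disk."""
    clips = sorted(
        (os.path.join(directory, name)
         for name in os.listdir(directory) if name.endswith(".wav")),
        key=os.path.getmtime)              # oldest clip first
    excess = max(len(clips) - keep, 0)
    for path in clips[:excess]:
        os.remove(path)                    # drop the oldest clips
    return len(clips) - excess
```

Sorting by modification time means the most recently synthesized clips, which the user is most likely to replay, always survive.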

Figure 17: Raspberry Pi Model B+

COMPONENT LEVEL SPECIFICATION

Table 3: Specifications of Raspberry Pi Model B+

Chip: Broadcom BCM2835 SoC, a full-HD multimedia applications processor
Core Architecture: ARM11
CPU: 700 MHz processor
Operating System: Boots from a micro SD card, running a version of Linux; also supports Raspbian, RISC OS, Arch Linux, and more
Ethernet: 10/100 Ethernet RJ45 socket on board
Audio Output: 3.5 mm jack, HDMI
Camera Interface: 15-pin MIPI Camera Serial Interface (CSI-2)
USB 2.0: 4 USB ports
Storage: MicroSD
RAM: 512 MB SDRAM @ 400 MHz
Power Draw/Voltage: 600 mA up to 1.8 A @ 5 V
GPIO: 40 pins
Operating Voltage: 5 V
Dimensions: 85 x 56 x 17 mm
Specifications of Raspberry Pi Camera Module

Photo Resolution: 5 megapixel
Lens: 5M
Aperture: f/2.9
Focal Length: 2.9 mm
Power: Operates at 250 mA
Usage: Connect the ribbon cable from the module to the CSI port of the Raspberry Pi
Video Resolution: 1080p30
Picture Formats Supported: JPEG, PNG, GIF, BMP, uncompressed YUV, uncompressed RGB

Table 4: Specifications of Raspberry Pi Camera Module

Specifications of the GPS module:

-165 dBm sensitivity, 10 Hz updates, 66 channels
Only 20 mA current draw
Built-in Real Time Clock (RTC): slot in a CR1220 backup battery for 7 years or more of
timekeeping even if the Pi is off
PPS output on fix, by default connected to pin #4
Reported to work up to ~32 km altitude (the GPS theoretically has no limit until 40 km)
Internal patch antenna which works quite well outdoors, plus a u.FL connector for an
external active antenna for indoor use or locations without a clear sky view
Fix status LED blinks to indicate when the GPS has determined the current coordinates
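The GPS module streams standard NMEA sentences over serial. Extracting a position for the Trace Me feature means parsing the $GPGGA sentence, whose coordinates are packed as degrees-plus-minutes. The sketch below is a minimal illustration (the function name is ours, and a production reader would also verify the NMEA checksum and read from the serial port).

```python
def parse_gpgga(sentence):
    """Convert an NMEA $GPGGA sentence into decimal degrees.

    Returns (lat, lon), or None when the module has no satellite fix yet.
    Minimal sketch: checksum verification and serial-port reading are
    deliberately omitted."""
    fields = sentence.split(",")
    if len(fields) < 7 or not fields[0].endswith("GGA"):
        return None
    if fields[6] == "0" or not fields[2]:
        return None                      # fix-quality 0 means no fix yet

    def to_degrees(value, hemisphere, head):
        # NMEA packs coordinates as (d)ddmm.mmmm: whole degrees, then minutes.
        degrees = int(value[:head]) + float(value[head:]) / 60.0
        return -degrees if hemisphere in ("S", "W") else degrees

    lat = to_degrees(fields[2], fields[3], 2)    # ddmm.mmmm
    lon = to_degrees(fields[4], fields[5], 3)    # dddmm.mmmm
    return lat, lon
```

The decimal-degree pair returned here is what the SMS/maps link for the Trace Me button would need.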
V. Technical Approach and Solution
The literature review showed that most of the image processing took place on a PDA or a
small laptop, which required a certain amount of memory for processing. We decided to use a
microcomputer for processing, whose memory is variable and depends on the capacity of the SD
card. We intend to send the images captured by the camera to the RPi through the ribbon cable
and send the processed output, i.e. a .wav file, to the audio jack.

Also, because we intend our device to be wireless and ready to use anywhere, we decided to
power the RPi through a battery.

There are many microcontrollers that support media transfer, but we narrowed our choices
down to the Arduino and the Raspberry Pi because of their popularity and support for REST
APIs (for future prospects). However, we had to pick one board to establish connectivity with
the GPS and motion sensors so that the device could later be scaled to add more functionality.
The following table summarizes the advantages and disadvantages of the Arduino and the
Raspberry Pi.

Table 5: Comparison of Arduino and Raspberry Pi [7]

As can be inferred from Table 5, the Raspberry Pi is 40 times faster than the Arduino and
has 128,000 times more RAM; since our project involves sending and receiving multimedia,
RAM size and fast processing are among the main goals we have to achieve. Moreover, we will
be using the newer version of the Raspberry Pi, i.e. Model B+, which has a total of 4 USB ports.

According to [7], the Raspberry Pi is best suited for projects that require a graphic interface
or the internet, and because of its various inputs and outputs it also tends to be the preferred
board for multimedia projects. Hence we chose the Raspberry Pi as the primary controlling
device for our project. Furthermore, the Arduino does not support audio output, which is a
primary requirement for our project.

As per [13], out of the three Raspberry Pi models (Model A, Model B, and Model B+), we
chose to work with the Raspberry Pi Model B+, since it provides additional USB ports while
consuming 0.5 W to 1 W less power than the other models. In addition, the audio circuitry on
the B+ model is improved with an integrated low-noise power supply. The table below gives a
comprehensive comparison of the three Raspberry Pi models:

Table 6: Raspberry Pi models comparison [13]

OCR SOFTWARE
The software that we will be using for conversion from image to text is the Tesseract
OCR engine, because it is free and available on the Raspberry Pi. Tesseract converts .jpg
images to .txt files. For the time being, we will be testing the OCR as well as the TTS (text-to-
speech) functionality on the Raspberry Pi itself, and we will extend it to the cloud in the next
semester.
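Tesseract can be driven from Python through its command-line interface: `tesseract image.jpg result` writes the recognised text to `result.txt`. The sketch below shows this wrapping; the helper function names are our own, and it assumes the `tesseract-ocr` package is installed on Raspbian.

```python
import subprocess

def tesseract_command(image_path, out_base, lang="eng"):
    # Tesseract CLI convention: input image, output base name (Tesseract
    # appends .txt itself), and the language pack to use.
    return ["tesseract", image_path, out_base, "-l", lang]

def image_to_text(image_path, out_base="result"):
    """Run Tesseract on a captured image and return the recognised text.

    Sketch only: assumes the tesseract-ocr package is installed
    (e.g. `sudo apt-get install tesseract-ocr` on Raspbian)."""
    subprocess.check_call(tesseract_command(image_path, out_base))
    with open(out_base + ".txt") as f:
        return f.read()
```

Keeping the command construction in its own function makes it easy to test without running the OCR engine itself.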
TTS SOFTWARE
The software that we will be using for conversion from text to speech is Festival (with Flite
as an alternative), because both are compatible with the RPi and synthesize speech offline, so
no active internet connection is required.
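Like Tesseract, Flite is easiest to drive from its command line: `flite -t "text" -o out.wav` synthesizes a sentence into a wav file offline. The helper names below are our own, and the sketch assumes the `flite` package is installed (Festival's `text2wave` helper could be substituted).

```python
import subprocess

def flite_command(text, wav_path):
    # Flite CLI: -t supplies the text to speak, -o names the output wav file.
    return ["flite", "-t", text, "-o", wav_path]

def text_to_wav(text, wav_path="speech.wav"):
    """Synthesize speech offline with Flite and return the wav path.

    Sketch only: assumes the flite package is installed on the Pi;
    Festival's text2wave helper is a drop-in alternative."""
    subprocess.check_call(flite_command(text, wav_path))
    return wav_path
```

Because synthesis happens locally, the pipeline keeps working with no network connection, which matters for outdoor use.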

PROGRAMMING LANGUAGE
The RPi can be programmed in many languages, but we decided to program it in Python
because of the readily available documentation for media-related projects, most of which are
coded in Python, and because Python is well supported on the RPi.

RASPBIAN OS
Raspbian is a Debian-based free operating system that makes the Raspberry Pi hardware
run. More than a pure OS, it comes with multiple software packages and various pre-compiled
software combined in an efficient format for easy installation on the Raspberry Pi. We will be
burning this OS onto our 8 GB SD card so we can start using the Raspberry Pi.

FLOW CHART

SYSTEM ARCHITECTURE

The system flow is as follows: pressing the On/Off button of the clicker activates the camera
module connected to the RPi; captured images are stored on the micro SD card of the RPi; OCR
converts each image to an editable text format; TTS then converts the text to speech; the
resulting audio clip is stored on the SD card; and the audio jack connected to the AV port of the
RPi plays the clip to the user.

Figure 18: Read2Me System Architecture


The user interacts with the Raspberry Pi through the following actions: Capture, Pause/Play,
Replay, Trace Me, and Increase/Decrease Volume.

Figure 19: System Use Case

HARDWARE ARCHITECTURE

INPUT
The breadboard serving as a clicker, connecting tactile push switches to the RPi; the camera
module; the 8 GB SD card inserted at the back of the RPi board; and the battery pack.

OUTPUT
Earphones and the Android device.

Figure 20: Read2Me Hardware Architecture


VI. Preliminary Design

The design of the final product, as a prototype, will consist of four parts: the glasses, the
clicker, the RPi unit as a pocket device, and an Android phone. As mentioned previously in this
report, the glasses will hold the camera that takes pictures of the text to be read, while the
earpiece plays the resulting audio back to the wearer. The camera will be mounted on the
bridge of the glasses in order to capture most of the text the wearer desires to read. The
earpiece can be attached from the phone to the user. Below is a figure of the drafted design for
the system.

Figure 21: A draft of the layout of the system, showing the camera, earpiece, and Raspberry Pi B+

The case with the Raspberry Pi will include a battery. There will be a wired connection
between the RPi board and the camera module as well as the clicker. The ribbon cable from the
camera will hang from the side of the glasses rather than the front, in order to prevent
discomfort for the wearer. The wired connection to the Raspberry Pi case will be made long
enough for the wearer to place the case wherever they want on their person.
The user will carry a clicker (as shown in figure 22) when using the glasses to read. The
clicker is used to interact with the glasses.

Figure 22: Read2Me Clicker Diagram (Design View and Implementation)

Each of the buttons on the clicker is a tactile push switch, as shown in Figure 23. All of the
switches will be connected to a small breadboard, which will in turn be connected to the RPi.
This small breadboard will act as the clicker in the prototype. The user will distinguish the
buttons by their sizes.


Figure 23: A tactile push switch
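On the Pi, each push switch would be wired to a GPIO pin and read with the RPi.GPIO library's event callbacks. The button-to-action logic itself is hardware-independent, so it can be sketched and tested as a pure function. The pin numbers and action names below are placeholders of our own; the actual wiring will be fixed during assembly.

```python
# Mapping from GPIO pin to clicker action. Pin numbers are placeholders;
# on the Pi these pins would be registered with RPi.GPIO event detection.
BUTTON_PINS = {17: "power", 22: "capture", 27: "trace_me"}

def dispatch(pin, state):
    """Translate a button press into an action name.

    `state` is a dict tracking whether the system is powered on. Presses of
    Capture or Trace Me while the system is off are ignored; the power
    button always toggles. Returns the action performed, or None."""
    action = BUTTON_PINS.get(pin)
    if action is None:
        return None                      # unknown pin: ignore
    if action == "power":
        state["on"] = not state.get("on", False)
        return "power_on" if state["on"] else "power_off"
    if not state.get("on"):
        return None                      # system off: ignore other buttons
    return action
```

Keeping this dispatch logic separate from the GPIO layer means it can be exercised on a desktop machine before the breadboard clicker is assembled.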

Communication between the RPi and Android Device:

An efficient communication mode between the RPi board and the Android device is essential in
order to foster fast transmission of images and a quick audio response from the phone. There are
three different ways to implement this:

Bluetooth: The smartphone comes with an inbuilt feature for sending and receiving videos
or images through its Bluetooth interface. To enable the RPi to send images via Bluetooth,
it will require a Bluetooth radio USB adaptor connected to one of its USB ports, which will
install Bluetooth support on the Linux system of the RPi. This will be followed by setting
up the Android phone as the Bluetooth FTP (File Transfer Profile) server and the RPi as
the Bluetooth File Transfer Profile client. After this preliminary setup, images can be sent
to the Android device using the Bluetooth File Transfer Profile client on the Raspberry Pi
[22]. One limitation of this approach is pairing issues between the RPi and the Android
device: the RPi unit and the standalone Android device are constrained to stay in close
vicinity to each other to remain in range.
Wireless USB dongle or wired Ethernet connection.

USB connection: This is the most efficient approach for making the RPi communicate
with the Android device. The RPi will serve as the USB host, and the Android phone is the
client connected to the USB port.
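Whichever link is chosen, the image transfer itself can run over a TCP socket (for the USB option, a socket can be forwarded across the cable, e.g. with `adb forward`). Since TCP is a byte stream, each image needs framing so the receiver knows where it ends; a 4-byte length prefix is a common convention. The sketch below is our own illustration of that framing, not code from any library.

```python
import socket
import struct

def send_image(sock, data):
    """Send one image as a 4-byte big-endian length prefix followed by
    the image bytes, so the receiver can find the message boundary."""
    sock.sendall(struct.pack(">I", len(data)) + data)

def recv_image(sock):
    """Read exactly one length-prefixed image from the socket."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

def _recv_exact(sock, n):
    # recv() may return fewer bytes than asked for, so loop until done.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf
```

The same framing works in the other direction for sending the synthesized audio or status messages back to the RPi.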


VII. Preliminary Cost Estimates
Item | Available in COE store | Quantity | Cost (Dhs)
Raspberry Pi Model B+ | Y | 1 | 350
Raspberry Pi camera module | N | 1 | 175
Black glasses | N | 1 | 93
Gomadic portable AA battery pack | Y | 1 | 116
Raspberry Pi Model B+ enclosure | N | 1 | 39
8 GB SD card for Raspberry Pi, preinstalled with Raspbian Wheezy | Y | 1 | 80
Tactile push switches (clicker) | Y | 3 | 30
3.5 mm audio jack headphone | N | 1 | 30
Huawei E173 3G modem | N | 1 | 78
Android phone | N | 1 | 500

Total cost: 1740 Dhs

Table 7: Hardware cost

VIII. Project Management
PRELIMINARY SCHEDULE

# | Task Name | Duration | Start | Finish | Predecessors | Resource Names
1 | Project Declaration | 5 days | Sun 2/15/15 | Thu 2/19/15 | |
2 | Brainstorming Project Ideas | 3 days | Sun 2/15/15 | Tue 2/17/15 | | Anza, Heba, Ragini
3 | Project Ideas Narrowed Down | 1 day | Wed 2/18/15 | Wed 2/18/15 | 2 | Anza, Heba, Ragini
4 | Single Project Idea is Agreed On | milestone | Thu 2/19/15 | Thu 2/19/15 | 3 | Anza, Heba, Ragini
5 | Research | 3 days | Fri 2/20/15 | Tue 2/24/15 | |
6 | Discuss and research more about OCR and text-to-audio technologies | 1 day | Fri 2/20/15 | Fri 2/20/15 | 4 | Anza, Heba, Ragini
7 | Research similar projects | 1 day | Mon 2/23/15 | Mon 2/23/15 | 6 | Anza, Heba, Ragini
8 | Research OCR | 1 day | Tue 2/24/15 | Tue 2/24/15 | 7 | Anza, Heba, Ragini
9 | Planning | 10 days | Fri 2/20/15 | Thu 3/5/15 | |
10 | Flow chart drawn and finalized | 1 wk | Fri 2/20/15 | Thu 2/26/15 | 4 | Anza
11 | Glasses design drafted | 3 days | Sun 3/1/15 | Tue 3/3/15 | 10 | Heba
12 | System diagram drafted and finalized | 5 days | Fri 2/27/15 | Thu 3/5/15 | 10 | Anza, Ragini
13 | Part Research | 7 days | Fri 3/6/15 | Mon 3/16/15 | |
14 | Brainstorm and research possible hardware parts | 4 days | Fri 3/6/15 | Wed 3/11/15 | 12 | Anza, Heba, Ragini
15 | Discuss required hardware parts | 3 days | Thu 3/12/15 | Sat 3/14/15 | 14 | Anza, Heba, Ragini
16 | Ordered parts | milestone | Mon 3/16/15 | Mon 3/16/15 | 15 | Heba
17 | Researched how to use Raspberry Pi | 13 days | Tue 3/17/15 | Thu 4/2/15 | |
18 | Watched programming tutorials | 3 days | Tue 3/17/15 | Thu 3/19/15 | 16 | Anza, Heba, Ragini
19 | Met with lab instructor for explanation of setup | 1 day | Mon 3/23/15 | Mon 3/23/15 | 16 | Heba, Ragini
20 | Raspberry Pi practice | 13 days | Tue 3/17/15 | Thu 4/2/15 | 16 | Anza, Ragini
21 | Report Preparation | 12 days | Wed 4/1/15 | Thu 4/16/15 | |
22 | Divide parts | 1 day | Wed 4/1/15 | Wed 4/1/15 | 16 | Anza, Heba, Ragini
23 | Report progress | 8 days | Thu 4/2/15 | Mon 4/13/15 | 22 | Anza, Heba, Ragini
24 | First draft submitted to advisor | 3 days | Tue 4/14/15 | Thu 4/16/15 | 23 | Anza, Heba, Ragini
25 | Implementation | 11 days | Mon 4/20/15 | Mon 5/4/15 | |
26 | OCR installed and tested on RPi | 3 days | Mon 4/20/15 | Wed 4/22/15 | | Anza, Ragini
27 | TTS installed and tested on RPi | 3 days | Thu 4/23/15 | Sat 4/25/15 | 26 | Anza
28 | Connect camera to Raspberry Pi and test | 2 days | Mon 4/27/15 | Tue 4/28/15 | 27 | Anza
29 | Test overall system with several scenarios | 4 days | Wed 4/29/15 | Mon 5/4/15 | 28 | Anza, Heba, Ragini
30 | Presentation Preparation | 5 days | Wed 5/6/15 | Tue 5/12/15 | |
31 | Prepare presentation slides | 2 days | Wed 5/6/15 | Thu 5/7/15 | 24 | Anza, Heba, Ragini
32 | Rehearse parts | 2 days | Fri 5/8/15 | Mon 5/11/15 | 31 | Anza, Heba, Ragini
33 | Presentation Day | 1 day | Tue 5/12/15 | Tue 5/12/15 | 32 |
34 | Report Revision | 8 days | Wed 5/13/15 | Sun 5/24/15 | |
35 | Regroup | 2 days | Wed 5/13/15 | Thu 5/14/15 | 33 | Anza, Heba, Ragini
36 | Revise first draft of final report | 5 days | Fri 5/15/15 | Thu 5/21/15 | 35 | Anza, Heba, Ragini
37 | Submit final report | milestone | Sun 5/24/15 | Sun 5/24/15 | 36 | Anza, Heba, Ragini

GANTT CHART

IX. Societal Impact
OCR technology is rapidly evolving into an instrumental part of our everyday lives. Even
though OCR is applied in many different categories such as business, teaching, and medicine,
its most effective and efficient application may be for the disabled. In this context, Read2Me
combines OCR technology with a speech synthesis tool to make reading an easy task for the
visually impaired. It eliminates the need to learn Braille, which might take a blind person years
to master. The goal of this project is to make the best use of the available technology to
improve the lives of these people. Read2Me will significantly speed up the reading process
using OCR, without the text having to be transcribed manually from an image. Another
advantage of this device is that the user does not have to install any additional hardware or
software, and they can start reading a document anywhere.

Integrating a GPS feature into Read2Me makes it a more robust product. It can help a blind
person be located by their loved ones in case they are lost. All in all, Read2Me can serve as a
complete package to enhance the lives of visually impaired people to a great extent.

The product can also be used as literacy support for people who are learning to read or who
cannot read, for the visually impaired, as well as for people with dyslexia.

X. List of Components
HARDWARE

Item | Available in COE store | Quantity
Raspberry Pi Model B+ | Y | 1
Raspberry Pi camera module | N | 1
Black glasses | N | 1
GPS HAT for Raspberry Pi B+ | N | 1
Gomadic portable AA battery pack | N | 1
Raspberry Pi Model B+ enclosure | N | 1
8 GB SD card for Raspberry Pi, preinstalled with Raspbian Wheezy | Y | 1
Push buttons (clicker) | Y | 3
Huawei E173 3G modem | N | 1
Android phone | N | 1

SOFTWARE
Software packages needed:

Raspbian Wheezy software


Tesseract OCR software
Festival and Flite TTS software

NETWORK
Network protocols used:

I2C serial communication between the Raspberry Pi and the camera module
USB connection between the Raspberry Pi and the Android device

XI. Future Prospects
MOTION DETECTION
Read2Me could be scaled to include a passive infrared (PIR) motion sensor to detect motion
from pets or humans from about 20 feet away. This could be helpful for the blind because it
would give them the confidence of knowing whether anyone is within 20 feet, and it would be
an indication for them to be careful while walking.

SECURITY
Read2Me could be made secure by using a fingerprint sensor. Security might be of interest to
users who desire to keep their GPS locations private.

DISTANCE SENSORS
The product could use infrared (IR) distance sensors, also known as IR break-beam sensors, to
determine how close the nearest object is (for distances over 1 m). This would also boost the
blind user's confidence and alert them when they are about to approach an object.

SIRI-LIKE APPLICATION
The Raspberry Pi could host a Siri-like application allowing the user to talk to the glasses. To
implement this application, the RPi needs a listener/talker pair to provide a voice user interface
(VUI). We decided not to implement this feature in order to limit the scope of our project.

CLOUD COMPUTING SYSTEM


TO BE DONE

SHARING FACILITY
The user could also have the add-on feature of sharing the book that he or she is reading, i.e.
the audio output of the glasses, with any other user who possesses the same earpiece and is
present within the same wireless network. For this, the Raspberry Pi must be connected to the
internet or to Bluetooth, as must the other earpieces that are expected to receive the audio. This
sharing facility would allow a blind person in possession of Read2Me glasses to share the book
he or she is reading with any other user who has a wireless earpiece and is within wireless
range.

XII. Conclusion
Assistive technologies have been evolving rapidly, and they are a major step in aiding the
blind and visually impaired (BVI) in educational preparation, for work, and in employment. The
use of these technologies has helped the BVI access information that was previously out of their
reach. There have been various solutions and improvements in the area of assisting the blind to
read; however, the technology has largely been limited to Braille, which the blind must first
learn. Other technologies that eliminate the need to learn Braille have so far been limited to
research, and their functionality is restricted to reading only. Our proposed project is set to give
greater independence to the blind, not only by allowing them to read books of their own choice
but also by giving them the confidence to navigate different places without the fear of being
lost, since our project integrates GPS positioning, which will allow the people authorized by the
owner to locate them.

XIII. Standards

The Android accessory communication protocol will be used on the RPi to communicate with the Android device.

XIV. Testing

Appendix

Glossary

TTS
OCR
RPi
RESTful
Assistive technology
Image Binarization
NTSC
DV Stream
DCT feature
Levenshtein distance
Sobel edge count detector
CIColour control
CIColorMonochrome

XV. Bibliography

[1] World health organization official website. August 2014. [Online]. Available at
http://www.who.int/mediacentre/factsheets/fs282/en. Accessed on March 4, 2015.

[2] H. Schifferstein & P. Desmet. The effects of sensory impairments on product experience and
personal well-being. Ergonomics, 50, 2026-2048.

[3] Joshi Kumar, A.V., T. MadhanPrabhu, and S. Mohan Raj. A pragmatic approach to aid
visually impaired people in reading, visualizing and understanding textual contents with an
automatic electronic pen'. AMR 433-440 (2012): 5287-5292. Web. 4 Apr. 2015.

[4] Celine Mancas-Thillou, Silvio Ferreira, Jonathan Demeyer, Christophe Minetti, and Bernard
Gosselin. A Multifunctional Reading Assistant for the Visually Impaired. Hindawi Publishing
Corporation, EURASIP Journal on Image and Video Processing, Volume 2007, Article ID 64295,
11 pages. doi:10.1155/2007/64295

[5] Tekin, E., & Coughlan, J. M. (2009). An Algorithm Enabling Blind Users to Find and Read
Barcodes. Proceedings / IEEE Workshop on Applications of Computer Vision. IEEE Workshop
on Applications of Computer Vision, 2009, 18. doi:10.1109/WACV.2009.5403098

[6] R. Keefer., & N. Bourbakis. 'Interaction with a Mobile Reader for the Visually Impaired'.
21st IEEE International Conference with Artificial Intelligence Tools. 18.03 (2009): 229-236.
Web.

[7] T. Klosowski. (2013, Nov. 7). How to Pick the Right Electronics Board for Your DIY Project.
[Online]. Available: http://lifehacker.com/how-to-pick-the-right-electronics-board-for-your-diy-pr-742869540

[8] R. Shilkrot & P. Maes. (2014, May 1). FingerReader: A wearable device to support text
reading on the go. [Online]. Available: http://fluid.media.mit.edu/sites/default/files/paper317.pdf

[9] H. Goto & M. Tanaka. Text Tracking Wearable Camera System for the Blind. 10th
International Conference on Document Analysis and Recognition, 2009. [Online].
Available: http://www.cvc.uab.es/icdar2009/papers/3725a141.pdf

[10] H. Goto. Redefining the DCT-based feature for scene text detection: analysis and
comparison of spatial frequency-based features. International Journal on Document Analysis and
Recognition (IJDAR), 11(1):1-8, 2008.

[11] R. Neto & N. Fonseca. Camera Reading for Blind People. Volume 11 (2014), 1200-1209.
[Online]. Available at http://www.sciencedirect.com/science/article/pii/S2212017314003624
[12] Zhou, S.Z., Open Source OCR Framework Using Mobile Devices, Multimedia on Mobile
Devices 2008. Edited by Creutzburg, Reiner; Takala, Jarmo H. Proceedings of the SPIE, Volume
6821, article id. 682104, 6 pp. (2008)

[13] E. Brown (2014, July 14). Raspberry Pi Model B+ adds USB ports, expansion pins.
[Online]. Available at http://linuxgizmos.com/raspberry-pi-model-b-plus-adds-usb-ports-expansion-pins/.
Accessed on April 12, 2015.

[14] Ye, Z., Yi, C., Tian, Y. Reading labels of cylinder objects for blind persons in 2013 IEEE
International Conference on Multimedia and Expo, 2013, IEEE. doi: 10.1109/ICME.2013.6607632

[15] Reynolds, D., Gaussian Mixture Models [Online] Available at:


https://www.ll.mit.edu/mission/cybersec/publications/publication-
files/full_papers/0802_Reynolds_Biometrics-GMM.pdf [Accessed: April 13, 2015]

[16] N. Rajkumar, M. G. Anand, and N. Barathiraka, "Portable camera-based product label reading for blind people," International Journal of Engineering Trends and Technology, vol. 10, no. 11, pp. 521-524, Apr. 2014.

[17] R. Velazquez, H. Hernandez, and E. Preza, "A portable eBook reader for the blind," in 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2010. doi: 10.1109/IEMBS.2010.5626218.

[18] SayText. [Online]. Available: https://itunes.apple.com/PT/app/id376337999

[19] Google Play, "Google Goggles." [Online]. Available: https://play.google.com/store/apps/details?id=com.google.android.apps.unveil&hl=en

[20] Google Play, "Voxdox." [Online]. Available: https://play.google.com/store/apps/details?id=org.apache.cordova.Voxdox&hl=en

[21] CapturaTalk. [Online]. Available: http://www.capturatalk.com/134/features

[22] M. Osman, "Transfer MP3 Songs in Raspberry Pi to Android Phone Using Bluetooth." [Online]. Available: http://www.instructables.com/id/Transfer-MP3-songs-in-Raspberry-Pi-to-Android-Phon/
QUESTIONS:

Why use the mobile application for processing? Why not do it on the RPi itself?

What are the features of the mobile application?

How is the RPi interfaced with the Android device?

Are there open-source OCR and TTS frameworks for the mobile platform (there already are for the desktop)? If so, which ones? Will we use them, or develop our own OCR framework?

What have we inferred from the literature review? Are we following any of the approaches presented there in our system?

Positioning of the user? (Are we using a stand or glasses?)

What is the speed of the serial communication?

Is serial communication the same as the USB connection between the RPi and the Android device?

Check the time it takes to send an image from the RPi to the Android device.
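As a rough starting point for this timing check, the transfer time over a plain serial link can be estimated before measuring it on hardware. The sketch below assumes a UART-style link with 8N1 framing (1 start bit + 8 data bits + 1 stop bit = 10 bits per payload byte) and no protocol overhead; the baud rate and image size are placeholder values, not measured figures.

```python
def serial_transfer_time(num_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Estimate seconds to push num_bytes over a serial link.

    Assumes 8N1 framing (10 bits on the wire per payload byte) and
    no flow control or protocol overhead, so this is a lower bound.
    """
    return num_bytes * bits_per_byte / baud

# A hypothetical 500 KB JPEG over a 115200-baud link:
image_size = 500 * 1024
print(f"{serial_transfer_time(image_size, 115200):.1f} s")  # roughly 44 s
```

At 115200 baud a full-resolution image takes on the order of tens of seconds, which suggests why a faster channel (USB bulk transfer or Bluetooth) may be worth considering for the RPi-to-phone link.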

Write about Tesseract: an open-source OCR engine originally developed by HP. This OCR engine returns the text in the image as a file of text characters. A log file of records is also produced by the OCR engine, which can be used for debugging.
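A minimal sketch of how the Tesseract engine is typically driven from its command line (the file names here are placeholders): `tesseract` writes the recognized characters to `<out_base>.txt`, and its diagnostic log goes to stderr, which can be redirected to a file for the debugging use noted above.

```python
import subprocess

def tesseract_cmd(image_path: str, out_base: str, lang: str = "eng") -> list[str]:
    """Build the argv for a Tesseract OCR run.

    Tesseract writes the recognized text to <out_base>.txt and emits
    its diagnostic log on stderr.
    """
    return ["tesseract", image_path, out_base, "-l", lang]

cmd = tesseract_cmd("page.jpg", "page")
print(" ".join(cmd))

# Uncomment on a machine with Tesseract installed; stderr is captured
# into a log file for debugging:
# with open("ocr.log", "w") as log:
#     subprocess.run(cmd, stderr=log, check=True)
```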
If the OCR engine cannot read the text from the image due to poor image quality, should we integrate pre-processing techniques such as color-space conversion (e.g., RGB to grayscale) before recognition?
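One such pre-processing step, sketched in pure Python for illustration (a real pipeline would likely use an image library such as OpenCV or Pillow; the luminance weights are the standard ITU-R BT.601 coefficients, and the global threshold value of 128 is an arbitrary placeholder):

```python
def rgb_to_gray(r: int, g: int, b: int) -> int:
    """Convert one RGB pixel to an 8-bit luminance value (ITU-R BT.601)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def binarize(gray: int, threshold: int = 128) -> int:
    """Simple global threshold: map a gray level to black (0) or white (255)."""
    return 255 if gray >= threshold else 0

print(rgb_to_gray(255, 255, 255))            # white pixel -> 255
print(binarize(rgb_to_gray(30, 30, 30)))     # dark pixel  -> 0
```

A global threshold is only a first step; OCR engines generally do better with adaptive (local) thresholding when lighting across the page is uneven.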

At what sampling frequency will the TTS tool synthesize speech?

Why are we not using the mobile camera instead of a separate digital camera? The resolution of the mobile camera is 8 MP, while the Raspberry Pi camera is 5 MP.

We use the mobile phone for processing because of its greater memory capacity and computational power; phones also include an FPU.

Since the camera is portable, change "glasses" to "portable camera" everywhere in the report.

https://www.raspberrypi.org/forums/viewtopic.php?p=521067

http://techpp.com/2011/11/03/top-ocr-apps-for-android-and-ios/

Transfer MP3 songs from the Raspberry Pi device to the Android phone:

http://www.instructables.com/id/Transfer-MP3-songs-in-Raspberry-Pi-to-Android-Phon/

Bluetooth:

http://raspberrypi.stackexchange.com/questions/5346/bluetooth-connection
