
CINF E NOR – Vision Conception and Proposed Design Model

Vision Conception, Elicitation, Design and Proposed Model

Computer Interface for Non-visual Friends Environment using
Naturally Oriented Response (CINF E NOR)

Conceived by:
Ahsan Nabi Khan


Document Information

Category          Information
Customer          The Non-Visual User
Project           Computer Interface for Non-visual Friends Environment using Naturally Oriented Response (CINF E NOR)
Document          Vision Conception, Elicitation, Design and Proposed Model
Document Version  1.2
Status            Draft
Author(s)         Ahsan Nabi Khan
Release Date      April 12, 2007

Definition of Terms, Acronyms and Abbreviations


This section provides the definitions of all terms, acronyms and abbreviations required to interpret the terms used in the document properly.

Non-visual: The person or environment that does not or may not depend on visual signal and perception.

Vision Conception: The understanding of the problem, with interest in discovering real needs and requirements. Emphasis is on vision of the unseen dilemma typical of research areas focusing on ill-defined problems.

RS: Requirements Specification

M: Module

APHI: All-Purpose Help Interface

TREBLE: Thematic and Rhythmic Enabling of Music Environment

FESTIVAL: A text-to-speech synthesizer that can pronounce words in many different languages

CART: Classification and Regression Tree

TTS: Text-to-Speech (Synthesizer)

Screen Reader: An application that reads the monitor using a text-to-speech converter, a speech engine, and a lexicon with associated sound files.

Dictators: Applications that recognize the human voice and map it to meaningful text.

Refreshable (Braille display): A Braille display that changes characters mechanically to state different messages on the same physical object. It may be imagined as a tactile graphical user interface.

CD / DVD: The compact optical disks used for installation and large data storage.

Media Player: An application, commonly used in Microsoft Windows but also in other operating systems, that plays media files, mostly audio and video.

APHI Menu: Not a visually determined menu; 'menu' here is an abstract concept and may even resemble a cooking menu or a customer help-line menu.

Menu Bar / Tool Bar / Bold / Italic: These terms are used in the common sense of the visual graphical user interface.


Bell: A discrete unit of sound used in our rhythmic musical interpretation. The sound file associated with a bell is as short as the sound of a single keyboard key press.

Tune: An event-specific musical sound file, such as the welcome sound and the USB-detach sound in Microsoft Windows XP.

Percussion: Resembles drum beats, the dholki (a drum used in South Asia), or keyboard typing sounds.

Brass: Resembles metallic sounds such as the trumpet, xylophone and tuba.

Strings: Resembles the guitar, sitar, or sarangi (musical instruments heard in Pakistan).

Winds: Resembles the flute, the cuckoo's call, or dove cooing.

Generic: Aims to create the most basic and common subset of the entire requirement set from all possible relevant users in Pakistan, in order to streamline the current scope.


Table of Contents

1. Introduction
   1.1 Purpose of Document
   1.2 Project Overview
   1.3 Scope

2. Elicited Functional Requirements
   2.1 All-Purpose Help Interface (APHI)
   2.2 Thematic and Rhythmic Enabling of Music Environment (TREBLE)

3. Elicited Non-functional Requirements
   3.1 Special Emphasis on Informal Touch in Help
   3.2 Ethical Constraints: Unconditional Positive Regard for Main Stakeholders
   3.3 System Level Constraints
   3.4 Aesthetic Sense of Musical Environment Required for TREBLE

4. Design Principles
   4.1 Design Principles for Non-Visual Information
   4.2 Speech Synthesizer Design Principles

5. Conceived Model
   5.1 Model Selection
   5.2 Recent Research Extracts for Relevant Model Selection
       5.2.1 ConSearch: A Concept-Associating Search Interface Using Commonsense (Chia-Hsun Lee, Henry Lieberman)
       5.2.2 A Concept Graph Editor for Computer-Aided Learning (Graham Horton, Richard Grillenbeck, Florian Kraus)
       5.2.3 User Interface Design Principles for Visual Information Seeking Systems (Harald Reiterer, Human-Computer Interaction Group, Department of Computer and Information Science, University of Konstanz)
   5.3 First Draft of Module Diagram
   5.4 Module Descriptions
       5.4.1 Module Description: Run Interface
       5.4.2 Module Description: Find Meaning
       5.4.3 Module Description: Respond with TREBLE
       5.4.4 Module Description: Rescue APHI

6. Engaging Feelings in Any Language Using Efficient Emotional Speech Synthesizers and Improvements in FESTIVAL TTS
   6.1 Emotional FESTIVAL-MBROLA TTS Synthesis
   6.2 General Purpose Unit Selection Speech Synthesizer
       6.2.1 Synthesis Procedure
       6.2.2 Voice Design
       6.2.3 Voice Building Tools
       6.2.4 Real World Voice Performance
   6.3 TTS for Indian Languages

7. APHI Interface Prototype Walkthrough

8. Semantic Network Model Summarizing the Prototype Walkthrough

9. References and Help Taken


7.APHI Interface Prototype Walkthrough.................................................................... 29

8.Semantic Network Model summarizing the Prototype Walkthrough..................... 35

9.References and Helps taken from................................................................................36


Table of Figures

Figure 1: ConSearch system architecture
Figure 2: Concept graph being edited in a Java application
Figure 3: Semantic network representing knowledge in the field of music
Figure 4: A module diagram for CINF E NOR using rules, triggers, facts and flows through knowledge bases and other storage systems
Figure 5: Welcome screen on start, introducing the keys and simultaneously waiting for input. The user gets the next option with the ENTER or SPACEBAR keys
Figure 6: Option to change volume
Figure 7: Option is selected
Figure 8: System prompts to open the volume changer
Figure 9: Volume Control appears; change by Tab and cursor keys
Figure 10: User returns to APHI by ALT+TAB
Figure 11: Help option is given for program help and general documentary help
Figure 12: "Manage Computer" option is given
Figure 13: "Manage Computer" option is selected
Figure 14: "Manage Computer" submenu starts with voice echo
Figure 15: "Explore My Documents" option is given
Figure 16: "Explore My Documents" option is selected
Figure 17: System prompts to open "My Documents"
Figure 18: "My Documents" opens up


Figure 19: User returns to APHI using ALT+TAB
Figure 20: Next option is to open "My Computer"
Figure 21: Next option is to open the D drive, and so on
Figure 22: User wants to return to the main menu and chooses the 'home' option
Figure 23: Semantic Network


1. Introduction

1.1 Purpose of Document

This document aims at an understanding of non-visual human-computer interaction. It attempts to assess, derive and possibly discover the real needs and requirements of non-visual users. Emphasis is on the vision of the unseen dilemma typical of information systems and interfaces limited by visual signal detection and representation. The document lists and describes the elicited requirements broadly related to business needs, user needs, functionality and constraints within the defined scope, relevant to our background vision. Our main aim is to build a computer interface specially designed for the unaddressed needs of the non-visual, which may also be the basic needs of the visually limited. The vision and scope discussed in the overview elaborate this aim further.

1.2 Project Overview

1.2.1 Understanding of the non-visual


There are millions of non-visual people, mostly non-congenital, and hundreds of millions of visually limited people in the world. They use their sense of sight minimally or not at all, even if they have some visual experience and knowledge from the past. Their cognition therefore relies heavily on other senses, which means their perception of the world is different from ours.

However, their percepts are not necessarily a subset of our percepts. Ordinary people are prone to visual illusions and tend to rely on visual sensory input more than on other inputs, so they may ignore the cues from other senses. Our knowledge base is likewise biased by visual associations. Understanding the words of Milton's Paradise Lost is difficult for ordinary people who are biased by visual cues.

“As one great Furnace flam'd, yet from those flames


No light, but rather darkness visible
Serv'd onely to discover sights of woe…”

And earlier:

“Dove-like satst brooding on the vast Abyss


And mad'st it pregnant: What in me is dark
Illumin, what is low raise and support;”

(Taken from Book 1, Milton’s Paradise Lost)

As can be inferred from these words and elsewhere, the knowledge base and cognition of the non-visual do not necessarily depend upon shapes, sizes, colors, shades, and spatial configuration. Rather, their word associations may be more abstract, aurally and tactilely inspired. They do not necessarily resemble autistic individuals, who tend to be aloof from the world: a limit on sensory input is by no means an indication of detachment from the environment.

When interacting with the environment, our subjects may have special needs which, when not catered for, can produce issues specific to their professions, age groups, economic and technological factors, etc. Let us classify these issues:

1) Professional Issues related to:


a) Home-based earners
b) Students, Teachers and Researchers


c) Laborers
d) Administrative Staff
e) Technical Staff

2) Age group Issues:


a) Learning and training expectations
b) Community interaction and communication issues
c) Educational and Information gathering issues

3) Economic Issues:
a) Market demand and supply of related products
b) Government, NGO, etc. involvement and interest

4) Technology issues:
a) Availability of related helpful technology
b) Patents and Licensing
c) Relevance and completeness for users
d) Extensibility and compatibility

1.2.2 Available Products for the non-visual and the visually limited
The available products can be classified into the following categories:

1) Products related to computer operation


a) Screen readers
b) Dictators
c) Braille display (refreshable), typewriters, translators

2) Products for movement


a) Walking helpers
b) Lasers
c) Location tellers

3) Products for specialized objects manipulation


a) Money counters
b) Time tellers
c) Talking Calendars
d) Notebooks

1.2.3 Core features observed in existing software applications


1) Screen readers specific to particular software and operating systems, with some stress patterns
2) Keyboard shortcuts
3) Braille viewers, scanners, magnifiers, printers
4) Inaccurate voice recognition systems

1.3 Scope
We are limiting our scope to products related to computer operation, because our expertise is limited to computers. However, computers also have to interact with products from many other categories and fields: electronic notebooks, geographic information systems, location tellers, and many real-time operating systems need to communicate and interface with ordinary personal computers. Therefore, we cannot isolate the computer from external factors, fields and environments when designing software for the non-visual. Our product should be extensible and compatible with any relevant present and future technology.


Presently, we are also limiting our study to users who understand the English language, primarily because of the ease of requirement gathering and our expertise and resources in English language processing. Replicating the localization techniques specific to any other culture and language should not be much of a challenge in other regions of the world once we realize the formal product.

Interviews with users and their recorded viewpoints can help us categorize and prioritize users. However, we admit that all categories of the user set cannot be searched exhaustively. Secondly, getting complete and relevant requirements from all the discovered categories and implementing them is also not possible, primarily because of the issues listed in the introduction. Therefore, problems of scope include the following:

•	If we let our product be specific to a certain group, profession, software application or commercial firm, then we do nothing to cater for the needs of the remaining millions of users.

•	Limiting ourselves to the specific technical needs of a minor user subset may require double the domain expertise and quadruple the complexity. For example, developing a genuinely 'user-friendly' graphical user interface enabling computer vision for interpreting video clips, abstract art paintings, 3-D contour graphs and landscape exploration may be close to the ultimate achievement in interfaces for the non-visual, but it is far from feasible.

•	Limiting ourselves to a specific existing software application risks our product inheriting that application's problems of compatibility, reliability, volatility, licensing, recurring costs, etc., in addition to our product's own intrinsic set of problems. The two kinds of errors combined amount to more than their sum.

Therefore, the critical determinants of scope boundaries are:

•	Our user is the generic non-visual person who understands our natural language of communication. Most people fall into this category. 'Generic' aims to create the most basic and common subset of possible user requirements. Requirements specific to a particular profession, age group, community, etc. may be evaluated on demand later, reusing the constructs developed for the basic and common requirements.

•	We do not assume or expect our user to be computer literate. If we required him to be computer literate, then he might not require us to create a basic computer interface for him. Many elderly, poor, and female Pakistanis are computer-shy, or unaware, or simply not inspired by the complex graphical user interface even if they have good vision. They are included among our potential users.

•	Choosing from a list of options by just listening and imagining is an innate human ability, demonstrated to be practically efficient in automated and live customer-care help lines. We call these constructs 'menus'. Common computer commands and tasks can also employ this model.

•	Focus should be on sense and sensibility in dimensions other than the visual. The non-visual can extract more meaning out of sounds, voices, words and their explanations, tones, music, rhythms, vibrations, surface texture, and possibly many unknown phenomena. Illustrating the functional aspects of computers can be achieved through responses using any or all of these.


2. Elicited Functional Requirements


2.1 All-Purpose Help Interface (APHI)

APHI (pronounced 'aahpee') is a familiar and informal Urdu word for an elder sister; in English it is close in connotation to 'Bro' or 'Sis'. The product shall provide help when the user presses F1. Help shall be provided in menu form: the product reads out a menu from which the user can choose options and get help easily. Menu options can cover:

1. Reduce or increase the volume and speed of speech and music, or change the language to
make the computer more accessible
2. Launch any program or installed software on system
3. Find and manage files
4. Run audio media
5. Manage other system resources

There can even be user-defined and customized submenus and keyboard shortcuts for any task. Some examples could be:
1. To shut down the system
2. To read the date and time from the system
3. To search for a particular file on the system
4. To start playing a program from the CD or DVD ROM
5. To get help with the installation of any program
6. To calculate
7. To run JAWS, Microsoft Narrator, or other text-to-speech systems
8. To read out emails and RSS feeds (online news)

When giving options for launching any program or installed software, the user should be able to
choose from two lists:
1. Most frequently or recently used programs, e.g. media player, calculator, etc.
2. All programs
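The menu-and-selection model above can be sketched as a small program. The class name, option labels and actions below are illustrative assumptions, not the project's actual design:

```python
class AphiMenu:
    """A circular list of spoken options; the user cycles and selects."""

    def __init__(self, options):
        self.options = list(options)   # (label, action) pairs
        self.index = 0

    def current(self):
        # The label APHI would read out at this point.
        return self.options[self.index][0]

    def next_option(self):
        # ENTER or SPACEBAR advances to the next option, wrapping around.
        self.index = (self.index + 1) % len(self.options)
        return self.current()

    def select(self):
        # Selection runs the action bound to the current option.
        label, action = self.options[self.index]
        return action()

# Top-level options drawn from the list above; actions are placeholders.
main_menu = AphiMenu([
    ("Change volume, speed or language", lambda: "open volume changer"),
    ("Launch a program",                 lambda: "list programs"),
    ("Find and manage files",            lambda: "open file finder"),
    ("Run audio media",                  lambda: "open media player"),
    ("Manage other system resources",    lambda: "open resource menu"),
])
```

A submenu (for example "Manage Computer" in the prototype walkthrough) could itself be an AphiMenu bound as an option's action, giving the nested structure described later.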

Most importantly, our APHI help must be understood, functional and error-free. For that we have to deal with the following issues:

Communicate with the user when


1. volume is zero
2. language is not understood
3. music is distracting and interfering abnormally
4. any other fatal exception occurs

Our software's help feature should be extendable to integrate help and support elsewhere. In future, our software's Help could be integrated with the help files of other software applications and programs. Text-to-speech converters may read the respective help files, while our software may provide a musical environment for illustrating and consolidating messages through any such texts.

As an example, if and when our software is integrated with a text-to-speech converter with special support for Urdu, it should help the local user of an Urdu text editor to activate menus or focus on text virtually.

2.2 Thematic and Rhythmic Enabling of Music Environment (TREBLE):


1. The monotone bells may be broadly classified into bells of distinguishable quality or
instruments, e.g. percussion, brass, strings, winds, etc.

2. For each monotone bell class, the bell's pitch may vary over the basic pentatonic scale of music, or any other scale, as may be suitable to implement and use.

3. Moving through menus may couple with string bells, moving through text may couple with percussion bells, moving through continuous sliders and scales may couple with winds bells, and this kind of appropriate differentiation should be maintained consistently.

4. The bell pitch goes higher as the user moves up the screen with the standard hardware keys
used for moving up. Similarly, the pitch goes lower as user moves down the screen using the
standard keys. Similar rules can be formed for left and right directions.

5. The user can hear five levels of bells. The five notes of the pentatonic scale may determine the pitch at each level. The higher the level, the higher the pitch, and the higher the importance of what it indicates. For example, when operating a text editor, a command error may take the highest level used (Level 4), a spelling error one level down (Level 3), an alphanumeric character Level 2 (but with the different quality class of bells relevant to text movement), and punctuation and spaces the lowest pitch (Level 1) in that class.

6. Tunes are assigned to specific computer outputs. Browsing through the Help feature of our software triggers tunes specific to the semantics inherent in the events. Other read-only documents trigger various tunes that attempt to illustrate their general structure depending upon formatting (headings and specially formatted text taking a higher level); reading out the menus takes Level 2, simple errors and computer prompts for user input take Level 3, and serious errors take Level 4.

7. If these tunes become distracting, the user can decrease the frequency of ringing or turn the tunes off.
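As a minimal sketch of rules 1 to 5 above, the mapping from interaction context to instrument class and from bell level to pentatonic pitch can be written down directly. The 440 Hz base frequency, the major-pentatonic offsets and the context table are assumptions for illustration, not the project's fixed choices:

```python
# Rule 3: each interaction context couples with one instrument class.
CONTEXT_INSTRUMENT = {
    "menu": "strings",      # moving through menus -> string bells
    "text": "percussion",   # moving through text  -> percussion bells
    "slider": "winds",      # sliders and scales   -> winds bells
}

# Rule 2/5: semitone offsets of a major pentatonic scale, one per level 1..5.
PENTATONIC_SEMITONES = [0, 2, 4, 7, 9]
BASE_FREQ_HZ = 440.0  # assumed reference pitch for the lowest bell

def bell_for(context, level):
    """Return (instrument, frequency in Hz) for a movement event."""
    if not 1 <= level <= 5:
        raise ValueError("bell levels run from 1 (lowest) to 5 (highest)")
    semitones = PENTATONIC_SEMITONES[level - 1]
    # Equal temperament: each semitone multiplies frequency by 2**(1/12).
    freq = BASE_FREQ_HZ * 2 ** (semitones / 12)
    return CONTEXT_INSTRUMENT[context], round(freq, 1)
```

Under rule 4, moving up one level in a menu would then ring a string bell one pentatonic step higher, and moving down one step lower.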

3. Elicited Non-functional Requirements


3.1 Special Emphasis on Informal Touch in Help

Our software's help from APHI aims to use its own sound files for reading and interacting with the user in an informal, friendly, human-like and aesthetically appealing manner. This is defined more subjectively than for simple, uninformed text-to-speech converters. Since we know the semantic content of our inbuilt help files and documentation, we can assign points of stress, tone change, choice of bells and tunes, etc. more accurately. To fulfil this requirement we may record audio files of trained native speakers reading out the statements in the help files effectively. The critical emphasis lies in illustrating our scenarios to the user in close-to-natural interaction with predefined sound files and native-speaker dialogues.

3.2 Ethical Constraints: Unconditional Positive Regard for main


Stakeholders

We cannot frame our questions to hint at users being 'special' or 'impaired' or 'disabled' or 'limited'. Some things are so obvious that asking them is embarrassing; for the rest, one has to read between the lines so as not to let the user become defensive. An understanding of psychology is hence necessary. For example, some past questionnaires designed by computer science researchers lacking experience in psychology wrongly asked questions in the manner below:

• Is there any difference between the learning capabilities of a Blind student and a sighted
student?
• Do you have any experience of teaching to sighted students?
• Do you give some concept of web designing to students?

Naturally, the responses did not exceed a brief statement, and reading between the lines became out of the question. A better approach would be an informal psychoanalysis by a psychologist, inspiring the user to say freely what he wants rather than to help us fill in questionnaires. Unconditional Positive Regard hence becomes the crucial determinant in requirement gathering.

Secondly, we cannot overlook the crucial needs or requirements of the pivotal stakeholders when catering for the specific needs of less important ones, such as a commercial firm or a particular class exclusively. Hence we list the main stakeholders in order of decreasing priority; those with higher priority have to be satisfied first.

Stakeholders:
Priority 0: (Implicitly having the highest priority) God Almighty.
Priority 1: The general 'non-visual' user, not necessarily computer literate.
Priority 2: The ordinary 'non-visual' or 'partially sighted', old or young, who would rather avoid computer experience for reasons such as age, gender, profession, economic background, social limitations, partial linguistic exposure, dependency on the visual graphical user interface, etc.
Priority 3: The quality brand that produces the 'Computer Interface for the Non-Visual Environment' effectively for all the more important stakeholders mentioned above.
Priority 4: Quality assurance analysts, software testers and ordinary non-visual users as described above, who can help in the gap analysis of requirement elicitation and prototype use.
Priority 5: The international open-source and freeware software community, for example GNU and the freeware Microsoft .NET community.
Priority 6: Non-governmental organizations, governmental organizations (for example the Pakistan Information Technology Board) and international organizations.

3.3 System Level Constraints


The application should reside in a system that is compatible, maintainable, reliable, stable,
scalable, current, learnable, cost-effective and efficient in performance. The points listed below
specify such constraints:

•	Our software application should be platform independent. It should use cross-platform applications such as Adobe Acrobat 6.0® to create interactive, generic task-performing, index- and lexicon-building, easy-to-navigate documents that may also serve as the APHI interface, with TREBLE using compatible sound files, e.g. WAV files, which are supported on both Microsoft Windows® and Macintosh®.

•	Currently it should support English and one more native language. APHI should play back recorded phrases and sentences in native pronunciation and stress, using language-specific phonetics and phonology for friendly interaction.

•	The CINF E NOR application should be learnable within a single day. The user should be able to operate the basic functionalities of a computer after getting used to interacting with APHI for one day.


•	When the user does not understand the language, cannot hear APHI's sound, or cannot resolve her words easily, APHI must be 'rescued' in order to save her from crashing. APHI must communicate with the user to inform him or her of any of these exceptions.

•	Currently it should support the keyboard. Support needs to be extended in future to the mouse, refreshable Braille on keyboard and mouse, mobile technology and real-time operating systems.

•	Currently a user may not be able to increase or decrease APHI's speed, since her voice is recorded sound; but if and when a text-to-speech converter is plugged in, APHI may talk slower or faster.

•	Speakers, headphones or any other listening device that plays the sound files correctly must be available to the user in order to do his or her particular task.

•	APHI and TREBLE should not consume so much CPU time that other complex applications lose efficiency or become unstable.

•	APHI should be able to tell if the operating system has become unstable. In future APHI may catch viruses and spies using other anti-virus and anti-spyware technologies.

•	APHI should not connect online without being requested to; rather, she should stay offline at present. This is in order to prevent viruses and spyware, and also because of our limited expertise in handling situations related to intranets and the Internet. APHI may also decline to install any suspicious software, or any software that is not in her trust list, and may ask the help of a computer administrator for that.
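The 'rescue' constraint above, together with the exception list in Section 2.1, can be sketched as a check APHI runs before speaking. The condition names and the fallback result are hypothetical illustrations, not a specified interface:

```python
def needs_rescue(volume, language_understood, music_interfering):
    """Return the list of exception conditions that require a rescue."""
    problems = []
    if volume <= 0:
        problems.append("volume is zero")
    if not language_understood:
        problems.append("language is not understood")
    if music_interfering:
        problems.append("music is distracting")
    return problems

def speak_or_rescue(message, volume, language_understood, music_interfering):
    """Speak normally, or fall back to a rescue channel on any exception."""
    problems = needs_rescue(volume, language_understood, music_interfering)
    if problems:
        # A real rescue would use a channel independent of the failing one,
        # e.g. restore a default volume, mute TREBLE, or beep a system bell.
        return ("rescue", problems)
    return ("speak", message)
```

The point of the sketch is that the check runs before every utterance, so APHI detects the exception instead of crashing or failing silently.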

3.4 Aesthetic Sense of Musical Environment required for TREBLE

The user interacts with the computer in a musical environment, with bells and tunes designed for special needs.

• The bells intend to help our user build up a mental picture of where he/she is and where things
are on the screen. This mental picture should lead to improved and accurate navigation across
programs, windows, menus, etc.

•	The tunes are by nature more complex than monotone bells and bear more meaning than bells can carry. Carefully selected, relevant tunes can illustrate and consolidate messages from the computer to the user. Existing examples are the system start-up and shutdown tunes in mature and popular operating systems, which mark those events with a welcoming note and a see-you-later feeling. Our software intends to add other important events specific to the non-visual and visually limited within our scope; for each added event, a new tune is to be selected to illustrate it suitably. For example, opening a text editor can trigger a tune resembling a typewriter typing at speed, followed by a 'ding'!

•	We observe from user viewpoints, interviews and preferences that non-visual people have a great interest in music, so a musical environment will give them an aesthetically pleasing effect. This fills in the dry space of existing screen readers, which mainly act as close-to-natural text-to-speech converters but do not address aesthetic appeal. Such an environment also attempts to supplement the minimal sound and bell output of existing operating systems. Presently, events like system start-up, shutdown, error alerts and hardware plug-ins are marked by these minimal sounds, unless the user simply plays a distracting song in the background. Here we attempt to add events to our bell and tune responses, and also to add new aesthetically appealing, event-related tunes.


• The tunes intend to engage the non-visual and the visually limited emotionally and
aesthetically, as well as to illustrate the messages from the computer. They also intend to
give an informal or thematic touch whenever needed. Naturally, there can be more than one
theme for all possible tune responses, subject to the temperament of the user. This is similar
to the visual themes that many applications and operating systems provide for general users.
We do not intend to do detailed research in user music themes at present, but our software
should be extendable to incorporate newer themes, even customized themes, just like the
visual themes.
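As a concrete illustration, the event-to-tune idea above could be sketched as a simple lookup table. The event names and sound file names here are invented for illustration; they are not part of any existing system:

```python
# Hypothetical sketch of TREBLE's event-to-tune mapping.
# Event names and tune file names are illustrative only.
EVENT_TUNES = {
    "startup": "welcome_theme.wav",             # welcoming note on start-up
    "shutdown": "see_you_later.wav",            # parting feeling on shutdown
    "open_text_editor": "typewriter_ding.wav",  # typewriter rhythm plus a 'ding'
    "menu_move": "soft_bell.wav",               # bell marking position on screen
}

def tune_for(event: str) -> str:
    """Return the tune file for an event, falling back to a neutral bell."""
    return EVENT_TUNES.get(event, "neutral_bell.wav")
```

A theme, in this sketch, would simply be a different dictionary with the same keys, which matches the requirement that themes be swappable and customizable.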


4. Design Principles
Our project follows the design principles below. These principles are empirically derived from
several different domains, including but not limited to those listed. The list is not exhaustive,
and domain expertise is required to incorporate additional principles already derived from past
research and projects.

1. Psychology
2. Human Computer Interaction (HCI)
3. Speech Synthesis, Signal Processing and Phonetics
4. Semantic, Syntactic and Morphological Level Natural Language Processing
5. Computer Vision
6. Design (Architectural Design, Computational Design, Software Design)
7. Data Mining and Information Trend Analysis
8. Business Intelligence and Data Warehouse
9. Knowledge Management

4.1 Design Principles for Non-Visual Information

These principles are empirically derived from at least six different domains relevant to Visual
Information Seeking Systems.

1 Human Computer Interaction (HCI) and Usability Engineering (UE)
2 Information Visualization (InfoVis)
3 Design (e.g. Computational Design)
4 Information Retrieval (IR) and Data Mining (DM)
5 Business Intelligence (BI) and Data Warehouse (DWH)
6 Knowledge Management (KM)

The principles are quoted verbatim to avoid inaccuracies of intermediate interpretation.

User Interface Design Principles

1. Design an easy to use system that supports the user’s work in an effective and efficient
manner. UE
2. Design an easy to learn system that shows the user the possibilities of its use during the
interaction with it. HCI
3. Offer support during the formulation of the query to allow the user to express the right
information needs. IR + InfoVis
4. Offer a quick and insightful overview about all search results to find the “needles in the
haystack”. InfoVis
5. Offer the right amount of information in the context where the user needs it. InfoVis
6. Present different aspects of interest at the same time to compare them or to get more
information at a glance. InfoVis
7. Offer possibilities to restrict the amount of information to selected topics of interest. InfoVis
8. Offer the possibility to customize the system reflecting the user’s personal needs (e.g. kind of
visualization, MCV, amount of information). HCI


9. Design a digital information space that offers the user a rich representation of information
from different information sources in an integrated fashion. BI + DWH
10. Offer different spheres of interest to keep and manage the user’s information needs and
search results for later use. KM

Some of these domains appear in activities streamlined by Peter Coad's methodology to derive an
object model. However, an object model created in the domains of User Interfaces, Business
Intelligence and Database Management may not precisely map to the conception of our model.

For example, one crucial domain that deserves additional mention is 'Human Cognitive Experience
and Psychology' (emphasizing the human element). Human Computer Interaction and Usability
Engineering are close to this domain, but they often fail to emphasize the experience beyond
engineering and usability. Here we need the expertise of an ordinary individual who has
experienced tremendous communication hurdles and overcome them successfully: for example, a
non-visual, partially sighted, non-auditory, immobile or autistic yet accomplished professional.
A comprehensive study through their introspection should be a rich source of knowledge
representation techniques. A non-visual programmer, for instance, can help in transferring 3-D
data to 0-D, where D represents the visual dimension. In other words, we can communicate data
not seen on even a single line!

4.2 Speech Synthesizer Design Principles

1 Script design: keep a good sampling, at least including all the unique diphones.

2 Recording conditions: the quality of the recorded sound and the speaker should be
professional, as in commercial media.

3 Performance should be as fast as currently experienced when using Microsoft Narrator and
JAWS.

4 Pitch and quality of sound should reflect the stress and tone variation in speech.

5 Close-to-neutral yet emotional parameters should be catered for, at least as closely as
currently done using FESTIVAL techniques for foreign and native languages.
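Principle 1 (diphone coverage in the script) can be checked mechanically. A minimal sketch, assuming a phoneme-level transcription of each script utterance is available; the function names and toy inventory are illustrative only:

```python
def diphones(phonemes):
    """All adjacent phoneme pairs (diphones) in one transcribed utterance."""
    return {(a, b) for a, b in zip(phonemes, phonemes[1:])}

def missing_diphones(script, required):
    """Diphones required by the language but absent from the recording script."""
    covered = set()
    for utterance in script:           # each utterance: a list of phoneme symbols
        covered |= diphones(utterance)
    return required - covered

# Toy example: a required inventory versus a one-line script.
required = {("h", "e"), ("e", "l"), ("l", "o"), ("o", "w")}
script = [["h", "e", "l", "l", "o"]]
# missing_diphones(script, required) reports {("o", "w")} as uncovered.
```

A script designer would iterate, adding utterances until `missing_diphones` returns an empty set.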


5. Conceived Model
5.1 Model Selection

Modeling systems that employ different signals has produced a proliferation of models using
domain-specific technical terminology and symbolism. In some domains it is convenient to consider
systems composed primarily of Objects; in others, a Structural paradigm; sometimes the objects
are weaved through Aspects; sometimes sequenced through States and Spaces; sometimes defined as
Engineering Processes and Components; yet other domains find their own real-time and rational
methods of modeling and knowledge representation. All of these work well in their domains, but
they need to communicate with each other by way of humans or other agents. Increasingly cryptic
domain-specific knowledge implies increasingly complex, imprecise and fuzzy knowledge transfer.

Our project has to work with a common knowledge-base system for extracting and transferring
information from a huge diversity of fields in different material aspects. The reason for
commonality lies in the human's own natural knowledge-base structure. From the biological and
cognitive perspective of psychology, all external knowledge that is useful to a human being in
any sense must be encoded, processed, and/or stored in the human nervous system, especially the
specialized gray and white matter of the brain.

All the external signals transduced directly by our senses or indirectly by machines for sensory
representation must eventually be encoded and processed by the network of neurons within the nervous
system. Therefore there already exists a model of common knowledge base for all the knowledge
gained by humans. The challenge is how to implement the neural network model of our brain in order
to exploit the power of such a knowledge base coupled with technological advances for understanding
material systems.

Here are three suggestions derived from recent research and project assignments undertaken:

1. Concept Graphs with Search Interfaces and Computer-Aided Learning
2. Semantic Networks using Frames, Maps and Multi-maps
3. Directed Acyclic Graph representations

Through these representations, we should be able to:

1. Formulate queries in close-to-natural language,
2. Search for facts and behaviors with close-to-natural responses,
3. Refine queries by ranking paths with the help of:
a. Probabilistic techniques,
b. Heuristic techniques,
c. Conceptual building blocks,
d. Invariant rules of knowledge bases and meta-knowledge,
e. Inference rules of logic.
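The path-ranking idea in point 3 could be sketched over a small semantic network as a best-first enumeration of paths, where edge weights stand in for whatever probabilistic or heuristic scores are used. All names and weights here are invented for illustration:

```python
from heapq import heappush, heappop

def ranked_paths(graph, start, goal, limit=3):
    """Enumerate paths from start to goal, cheapest (best-ranked) first.
    graph: dict mapping a concept to {neighbour: heuristic edge cost}."""
    results, frontier = [], [(0.0, [start])]
    while frontier and len(results) < limit:
        cost, path = heappop(frontier)       # cheapest partial path so far
        node = path[-1]
        if node == goal:
            results.append((cost, path))
            continue
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path:              # avoid cycles
                heappush(frontier, (cost + w, path + [nxt]))
    return results

# Toy semantic network: "volume" relates to "sound", which relates to "music".
net = {"volume": {"sound": 1.0}, "sound": {"music": 1.0, "speech": 2.0}}
# ranked_paths(net, "volume", "music") yields the single cheapest path.
```

Swapping the edge weights for probabilities (and costs for negative log-probabilities) would give the probabilistic variant of the same ranking.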

5.2 Recent Research Extracts for Relevant Model Selection

Quoted below are some research extracts on Concept Graphs and Semantic Networks:


5.2.1 ConSearch: A Concept-Associating Search Interface using Commonsense
Chia-Hsun Lee, Henry Lieberman, MIT Media Laboratory, 20 Ames St., E15-324
{jackylee, lieber}@media.mit.edu

ABSTRACT
This paper presents ConSearch- a concept-associating search interface based on a cognitive model of
web searching. Web search usually isn't a good experience when possible results are totally
unsearchable. People consume heavy mental loads filtering out irrelevant web links. To make the
search experience easier, the search mechanism should be mapped into our mental model. Human
cognition has a great advantage over machines on recognizing things that make sense. Adding a layer
of conceptual relationship could help users easily figure out the right ways to go. ConSearch provides
an interactive way of retrieving search results by associating concepts.

Figure 1: ConSearch System Architecture

5.2.2 A Concept Graph Editor for Computer-Aided Learning
Graham Horton, Richard Grillenbeck, Florian Kraus

Concept Graphs focus on ideas and the relationships between them without worrying about the
syntax and semantics of language, such as grammar or sentence construction. This focus makes
Concept Graphs significantly easier to write and read than standard text. The typical linear,
single-colour presentation of information, such as this paper, is not at all conducive to the
effective communication of knowledge. Concept Graphs can achieve this significantly better.

The preparation of presentations and documents is greatly facilitated by the use of concept
graphs, since:
• ideas can be ordered easily and placed in context
• the overall structure can grow dynamically and adaptively
• creativity is enhanced by adding a visual element to thinking
• by adding visual stimuli, the right creative hemisphere of the brain is used in conjunction
with the left, thereby taking advantage of both sides of the brain


Figure 2: Concept Graph being edited in Java Application

5.2.3 User Interface Design Principles for Visual Information Seeking Systems
Harald Reiterer, Human-Computer Interaction Group, Department of Computer and Information
Science, University of Konstanz

Comprehensive visual support during:
• formulation of the query using a visualization of the semantic network (thesaurus)
• review of the search results using multiple synchronized visualizations
• refinement of the query using a visualization of the semantic network (thesaurus)


Figure 3: Semantic Network representing knowledge in the field of music

Potentials of user interface design principles for VISS:

• They relieve the user from drowning in the information flood,
• They help him find the particular “needles in the haystack”,
• They facilitate explorations through intuition,
• They help in finding patterns and exceptions, and
• They even make browsing fun (Ahlberg & Shneiderman, 1994).

5.3 First Draft of Module Diagram

[Module diagram summary: An Actor or External System invokes <Run Interface>. <Run Interface>
calls <Find Meaning>, which draws on a Knowledge Base for Non-visual Meaning Description;
<Respond with TREBLE>, which draws on a Storage for Bells and Tunes with rules; and
<Rescue APHI>, which is driven by Interface Rules, Facts and Triggers.]
Figure 4: A Module Diagram for CINF E NOR using rules, triggers, facts and flows through Knowledge Bases
and other storage systems
5.4 Module Descriptions
5.4.1 Module Description: Run Interface
<Module Id: Run Interface>
Actors: Non-visual user
Feature: The Module runs the project's interface and, through it, provides the required
functionality for computer use.
Pre-condition: The interface application can be activated preferably at computer
start-up, or whenever the user needs to. The interface assumes that
it can be rescued in exceptions defined.
Scenarios
Step 1:
Action: User enters the interface.
Software Reaction: System welcomes the user and asks for a choice from the main menu using
<Respond with TREBLE>. (Exception 1)
Step 2:
Action: User specifies the choice by one key minimum and three keys maximum, that is: ENTER,
SPACEBAR, and only in rare cases one more key. (Alternative 1)
Software Reaction: System carries out the work implied in the choice, be it a submenu, a shell
or DOS command, a configuration variation, etc., and waits for the next choice. This is done
with the sound of APHI.
Step 3:
Action: User presses ESC, the extreme top-left button of the keyboard.
Software Reaction: APHI asks permission to leave and the interface exits.
Alternate Scenarios (additional, optional, branching or iterative steps; refer to the specific
action number to ensure understandability):
1. User does not understand some word(s) and asks for the meaning by pressing a key (easily
accessible and unique for word meaning). Calls <Find Meaning>.
Exceptions:

1. The user faces some interface communication problems like non-native language,
incomprehensible or irritable speed, pitch or volume. The Module invokes <Rescue APHI> to
resolve the issues.

Post Conditions
The running operating system or the external system that called our Interface gets back the
control.
Modules Cross-referenced: <Respond with TREBLE>, <Rescue APHI>, <Find Meaning>
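The minimal-key interaction style described in the scenarios above (ENTER for the next option, SPACEBAR to select, ESC to exit) could be prototyped as a simple dispatch loop. The key constants, menu labels and spoken-transcript approach here are assumptions for illustration, not the actual implementation:

```python
# Hypothetical sketch of <Run Interface>'s minimal-key dispatch loop.
ENTER, SPACEBAR, ESC = "ENTER", "SPACEBAR", "ESC"

def run_interface(keys, menu):
    """Walk the menu with ENTER (next option), SPACEBAR (select), ESC (exit).
    keys: iterable of key names; menu: list of option labels.
    Returns the transcript of what APHI would announce aloud."""
    transcript = ["Welcome. Main menu.", f"Option: {menu[0]}"]
    index = 0
    for key in keys:
        if key == ENTER:                      # move to the next option, wrapping
            index = (index + 1) % len(menu)
            transcript.append(f"Option: {menu[index]}")
        elif key == SPACEBAR:                 # select the current option
            transcript.append(f"Selected: {menu[index]}")
        elif key == ESC:                      # ask permission to leave, then exit
            transcript.append("May I leave? Goodbye.")
            break
    return transcript
```

Because every announcement is also an event, each appended line is a natural hook for a TREBLE bell or tune.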

5.4.2 Module Description: Find Meaning
<Module Id: Find Meaning>
Actors: Non-visual user
Feature: The Module finds for the user the meaning, explanation, description, or online
search result of any word(s) in dictionary, technical resources, the Web, etc.
Pre-condition: The user has access to the words to find and also the resources
where to find.
Scenarios
Step 1:
Action: User inputs the word(s) to find.
Software Reaction: System asks the user for the resource in which to find the word(s), using
APHI and TREBLE.
Step 2:
Action: User specifies the resource from a list or types the location.
Software Reaction: System searches for the word(s) in the location and returns the result to
the user through <Run Interface>. Module exits.
Alternate Scenarios (additional, optional, branching or iterative steps; refer to the specific
action number to ensure understandability): none.

Post Conditions
Interface is running and <Run Interface> gets the returned meaning.
Modules Cross-referenced: <Run Interface>, <Respond with TREBLE>

5.4.3 Module Description: Respond with TREBLE
<Module Id: Respond with TREBLE>
Actors: Non-visual user
Feature: The Module provides a musically aesthetic environment for the interface
background that aims to echo user choices, inputs, and actions.
Pre-condition: The interface application has been activated and sound is enabled.
Scenarios
Step 1:
Action: User presses some key or combination of keys.
Software Reaction: System reads the scan codes of the keys and determines the action that the
input generates. System responds with an appropriate sound, bell, or tune to echo the action,
then listens for the next key press through <Run Interface>. (Exception 1)
Step 2:
Action: User presses ESC, the extreme top-left button of the keyboard.
Software Reaction: System echoes the 'Bye Bye' or 'Keep in touch' response with the action and
exits with APHI.
Alternate Scenarios: additional, optional, branching or iterative steps. Refer to specific action
number to ensure understandability.

Exceptions:

1. The user faces some interface communication problems like irritable or disturbing speed,
pitch or volume of music, no detection of keyboard, etc. The Module invokes <Rescue APHI>
to resolve the issues.

Post Conditions
System returns control to the caller.
Modules Cross-referenced: <Run Interface>, <Rescue APHI>
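The scan-code-to-sound echo in Step 1 might be sketched as a small table-driven dispatcher. The scan codes below match the common PC keyboard set-1 codes for ENTER, SPACEBAR and ESC, but the action and sound names are invented for illustration:

```python
# Hypothetical mapping from key scan codes to (action, echo sound).
SCANCODE_ACTIONS = {
    28: ("next_option", "bell_step.wav"),       # ENTER
    57: ("select_option", "chime_select.wav"),  # SPACEBAR
    1:  ("exit", "bye_bye.wav"),                # ESC
}

def echo_keypress(scancode):
    """Determine the action a key generates and the sound that echoes it,
    falling back to a neutral sound for unmapped keys."""
    return SCANCODE_ACTIONS.get(scancode, ("unknown", "neutral.wav"))
```

Keeping the table separate from the dispatch logic means a new bell-and-tune theme only has to replace the table, as Section 3.4 requires.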

5.4.4 Module Description: Rescue APHI

This Module shall be used when the volume is zero, the language is not understood, or the music
is distracting. The user can decrease or increase the music speed.

<Module Id: Rescue APHI>
Actors: Non-Visual user
Feature: The Module tries to fix some interface communication problems like non-native
language, incomprehensible or irritable speed, pitch or volume of speech and music.
Pre-condition: <Run Interface> has invoked the Module.
Scenarios
Step 1:
Action: User specifies the problem if not already identified. User inputs by one key minimum
and three keys maximum, that is: ENTER, SPACEBAR, and only in rare cases one more key.
(Exception 1)
Software Reaction: System tries to fix the problem and asks the user to test the fixed state.
Step 2:
Action: User accepts or rejects the suggested solution of the problem.
Software Reaction: System returns to the caller with the accepted or last state.
Alternate Scenarios (additional, optional, branching or iterative steps; refer to the specific
action number to ensure understandability): none.

Exceptions:
1. If ENTER, SPACEBAR, or other easily accessible keys are not functioning, or there are other
serious hardware faults or failures, the System should notify the prescribed and authorized
hardware technician through some means such as an automatic telephone call.
Post Conditions
System has reached a stable state, or has communicated the problem to a technician (the
telephone line works).
Modules Cross-referenced: <Run Interface>, <Respond with TREBLE>
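The fix-or-escalate behaviour of this module, including the hardware-failure exception, could be sketched as follows. The function signature, the callable parameters and the state names are assumptions made for illustration:

```python
def rescue(problem, fix_attempt, keys_working=True, notify_technician=None):
    """Try to fix an interface problem; escalate to a technician when the
    basic keys cannot confirm the fix.
    fix_attempt: callable returning True when the problem is resolved.
    notify_technician: callable used for escalation (e.g. an automatic call)."""
    if not keys_working:                  # Exception 1: hardware failure path
        if notify_technician:
            notify_technician(problem)
        return "escalated"
    # Normal path: try the fix, then keep it or fall back to the last state.
    return "fixed" if fix_attempt() else "last_state"
```

The return value mirrors the module's post-condition: a stable ("fixed" or "last_state") state, or a recorded escalation.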

6. Engaging feelings in any language using Efficient Emotional
Speech Synthesizers and improvements in FESTIVAL TTS
In the future, APHI should have feelings, emotions, stress, tone and other semantic- and
psychological-level variations on a close-to-neutral scale. Possible research resources for
efficient emotional speech synthesis include:

6.1 Emotional FESTIVAL-MBROLA TTS Synthesis

This is an extension of previous research on speech synthesis using the Italian FESTIVAL TTS and
the CART technique. The extension is based on some valuable observations:

1 Voice quality and pitch show emotions.
2 Punctuation and word, phrase, clause, sentence and paragraph boundaries signal typical stress
and tone patterns.
3 Several basic emotions are stored in an Emotional Database: Neutral, Anger, Disgust, Fear,
Joy, Sadness and Surprise.
4 Emotions correlate with pitch boundaries.
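A crude sketch of how such an emotional database might parameterize prosody, with pitch and rate offsets relative to Neutral. All numeric values here are invented for illustration and are not taken from the cited work:

```python
# Invented prosody offsets per basic emotion (relative to Neutral).
EMOTION_PROSODY = {
    "Neutral": {"pitch_shift": 0.0,   "rate": 1.0},
    "Anger":   {"pitch_shift": 20.0,  "rate": 1.2},
    "Sadness": {"pitch_shift": -15.0, "rate": 0.8},
    "Joy":     {"pitch_shift": 25.0,  "rate": 1.1},
}

def apply_emotion(base_pitch_hz, emotion):
    """Shift a neutral pitch value and pick a speaking rate for the emotion,
    falling back to Neutral for unknown labels."""
    p = EMOTION_PROSODY.get(emotion, EMOTION_PROSODY["Neutral"])
    return base_pitch_hz + p["pitch_shift"], p["rate"]
```

A real system such as the cited FESTIVAL-MBROLA work learns these variations from data rather than hard-coding them; the table above only illustrates the interface between an emotion label and prosody control.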

See their reference for the VQ-PaIntE (Vector Quantized Parametric Representation of Intonation
Events) model used to effectively represent intonation curves.

The results achieved satisfactory accuracy; the performance measurements are quoted in their
paper.

6.2 General Purpose Unit Selection Speech Synthesizer

Clark, Richmond and King define a step-by-step procedure to build a native general-purpose speech
synthesizer using FESTIVAL techniques. They also give a practical design and performance
measurements. The main modules in the procedure are listed below:

6.2.1 Synthesis Procedure

• Target construction: using phones as basic building blocks for speech structure rather than
determining prosodic-level candidates
• Pre-selection: limit the number of candidates to be used in the join and Viterbi search
• Backing-off: to deal with the problem of 'missing diphones'
• Target cost and join cost: used to sum user-defined weights and penalize bad candidates
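The target-cost and join-cost combination that drives the candidate ranking could be sketched as below. The feature names, weights and pitch-based join penalty are made up for illustration; a real unit-selection engine uses richer acoustic features:

```python
def target_cost(target, candidate, weights):
    """Weighted sum of mismatches between the target spec and a candidate."""
    return sum(w for feat, w in weights.items()
               if target.get(feat) != candidate.get(feat))

def join_cost(prev, cand):
    """Penalty for a bad acoustic join, here just a pitch discontinuity."""
    return abs(prev.get("pitch", 0.0) - cand.get("pitch", 0.0))

# Toy example: two candidates for one target phone.
weights = {"phone": 10.0, "stress": 2.0}
target = {"phone": "a", "stress": 1}
c1 = {"phone": "a", "stress": 0, "pitch": 120.0}   # stress mismatch, smooth join
c2 = {"phone": "a", "stress": 1, "pitch": 180.0}   # exact match, jarring join
prev = {"pitch": 125.0}
# c1 totals 2.0 (target) + 5.0 (join); c2 totals 0.0 + 55.0, so c1 wins here.
```

A Viterbi search then minimizes the sum of these two costs along the whole utterance, not per unit, which is why a locally worse candidate can still win globally.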

6.2.2 Voice Design

• Script design: keep a good sampling, at least all the diphones
• Recording conditions: quality of recording sound and speaker

6.2.3 Voice Building Tools

Automatic labelling: using Hidden Markov Models for lexical look-up, letter-to-sound and other
voice synthesis modules.

6.2.4 Real-World Voice Performance

6.3 TTS for Indian Languages

This is a TeNeT group project for developing speech interfaces. The people involved are faculty
members Dr. Hema A. Murthy and Dr. C. S. Ramalingam, and project members N. Sridhar Krishnan,
Samuel Thomas, M. Nageshwara Rao and Y. R. Venugopalakrishna. See

http://lantana.tenet.res.in/apache2-default/Research/Speech/TTS/contents/main.html

They have identified these critical issues:

• A representative set of phones for Indian languages.
• Basic unit selection: half-phones, diphones, or syllables?
• A new database for storing all language variations.
• A prosodic model.

They address these issues as follows:

• They have used two common sets of phones: one set covering Hindi and similar sister languages
of Aryan origin, the other covering Telugu and similar sister languages of Dravidian origin.

• They use the diphone as the basic unit or building block.

• They use a data-driven approach with Classification and Regression Trees (CART), so they also
consider prosody.

• They concatenate syllable-like units.
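The concatenation of syllable-like units from a recorded inventory could be sketched naively as below. The unit names, file names and the back-off marker are an invented toy, not the TeNeT group's actual design:

```python
# Toy sketch of concatenating syllable-like units from a recorded inventory.
# Unit names and the syllabification are illustrative only.
UNIT_INVENTORY = {"na": "na.wav", "ma": "ma.wav", "ste": "ste.wav"}

def synthesize(syllables):
    """Return the waveform files to concatenate, in order. None marks a
    missing unit that would need backing off to smaller units (e.g. diphones)."""
    return [UNIT_INVENTORY.get(s) for s in syllables]

# synthesize(["na", "ma", "ste"]) yields the three files to join in order.
```

The `None` entries are exactly where the CART-driven back-off described above would take over, choosing diphones or half-phones instead of a whole syllable.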

The sample voices created are available at their website:

http://lantana.tenet.res.in/apache2-default/Research/Speech/TTS/contents/main.html

7. APHI Interface Prototype Walkthrough

Figure 5: Welcome screen on start, introducing the keys and simultaneously waiting for input.
User gets the next option by the ENTER or SPACEBAR keys.
Figure 6: Option to change volume.
Figure 7: Option is selected.
Figure 8: System prompts to open the volume changer.
Figure 9: Volume Control appears. Change by tab and cursor keys.
Figure 10: User returns to APHI by ALT+TAB.

Figure 11: Help option is given for program help and general documentary help.
Figure 12: "Manage Computer" option is given.
Figure 13: "Manage Computer" option is selected.
Figure 14: "Manage Computer" submenu starts with voice echo.
Figure 15: "Explore My Documents" option is given.
Figure 16: "Explore My Documents" option is selected.
Figure 17: System prompts to open "My Documents".
Figure 18: "My Documents" opens up.
Figure 19: User returns to APHI using ALT+TAB.
Figure 20: Next option is to open "My Computer".
Figure 21: Next option is to open the D drive, and so on.
Figure 22: User wants to return to the main menu and chooses the 'home' option.

The walkthrough continues to show how APHI performs more actions and operations, such as sound
recording, opening a text editor, and opening Internet Explorer, while remembering the submenus
passed through ("Manage Computer" has been remembered).

Some more important options use text-to-speech systems with smoothing effects to speak Urdu and
Arabic by simply pronouncing Roman letters, and to increase or decrease the speed and pitch
level of the discourse.

Screenshots use cross-platform media players.

8. Semantic Network Model summarizing the Prototype
Walkthrough

[Figure content: The APHI INTERFACE node is the hub of the network. APHI welcomes the user, asks
if help is needed, and introduces the two or three keys for the "next", "select" and "back"
options. Branches lead to: Change Volume; Change Speech, Voice or Language (English, Urdu,
Arabic, with voices Mekaal/Michael and Maryam/Mary); Manage Computer (My Computer, My
Documents); Help Documentation; and Open Application Programs (Word text editor, Calculator,
Record Sound, Google, and audio content such as Surah Faatiha, Surah Falaq, Surah Ikhlaas and
Bayaanul Quraan audio).]
Figure 23: Semantic Network
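The semantic network of Figure 23 could be captured for navigation as a nested dictionary; the structure below transcribes only the top menu branches of the figure, with leaf details omitted, and the helper name is an assumption:

```python
# Top-level branches of the APHI menu semantic network (from Figure 23).
APHI_NETWORK = {
    "Change Volume": {},
    "Change Speech, Voice or Language": {"English": {}, "Urdu": {}, "Arabic": {}},
    "Manage Computer": {"My Computer": {}, "My Documents": {}},
    "Help Documentation": {},
    "Open Application Programs": {
        "Word Text Editor": {}, "Calculator": {}, "Record Sound": {}, "Google": {},
    },
}

def options(network, path=()):
    """List the option labels reachable at a given path in the network."""
    node = network
    for label in path:        # walk down the chosen branch
        node = node[label]
    return sorted(node)
```

For example, `options(APHI_NETWORK, ("Manage Computer",))` lists the two submenu entries the walkthrough visits, which is the same lookup the "next"/"select"/"back" keys would traverse.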

9. References and Helps taken from
[1] Milton’s Paradise Lost, Book 1

[2] User Interface Design Principles for Visual Information Seeking Systems, Harald Reiterer, Human-Computer Interaction
Group, Department of Computer and Information Science, University of Konstanz

[3] ConSearch: A Concept-Associating Search Interface using Commonsense, Chia-Hsun Lee, Henry Lieberman, MIT
Media Laboratory

[4] A Concept Graph Editor for Computer Aided Learning, Graham Horton, Richard Grillenbeck, Florian Kraus

[5] Emotional FESTIVAL-MBROLA TTS Synthesis, Fabio Tesser, Piero Cosi, Carlo Drioli, Graziano Tisato, Istituto
Trentino di Cultura – Centro per le Ricerca Scientifica e Tecnologica, Trento, Italy and Istituto di Scienze e Tecnologie
della Cognizione, C.N.R. Padova, Italy

[6] Cosi P., Tesser F., Gretter R., and Pianesi F., “A modified ‘PaIntE Model’ for Italian TTS”, CDROM Proc. of IEEE
Workshop on Speech Synthesis, Santa Monica, California, 2002.

[7] FESTIVAL 2 – Build Your Own General Purpose Unit Selection Speech Synthesizer. Robert A.J. Clark, Korin
Richmond and Simon King CSTR, The University of Edinburgh

[8] Text-To-Speech Synthesis for Indian Languages and Indian English, Dr. Hema.A.Murthy, Dr. C. S. Ramalingam.
http://lantana.tenet.res.in/apache2-default/Research/Speech/TTS/contents/main.html

[9] Job Access With Speech (JAWS) Documentation

[10] Personal Informal Interviews with non-visual users Mr. Hassan Tareen, Punjab University, and Sajjad Ali Khan, ex-
Engineer and relative residing in USA.

[11] Past Interview documents created by researchers in FAST-NUCES for gathering requirements.

[12] “Major Pentatonic on One Chord”, Lesson 34, The Songwriter’s Workshop: Melody, Jimmy Kachulis, Berklee College of
Music, 2005, http://berkleeshares.com

[13] “Deeper into Recording with Pro Tools”, Chapter 3 of Producing in the Home Studio with Pro Tools, Second Edition,
David Franz, Berklee College of Music, 2005, http://berkleeshares.com

[14] Adobe Acrobat Professional Documentation

[15] Presentation Slides Templates from OpenOffice and Microsoft used for prototypes.

[16] “Information and communication need assessment”, Dr. Qaiser Durrani, Ms. Sabeen Durrani, et al., FAST-NUCES
(unpublished paper)

