
VOICEXML 2.0 AND THE W3C SPEECH INTERFACE FRAMEWORK



DEPT OF IT, SBIT, KHAMMAM


1. INTRODUCTION

Cloud computing has great potential to provide robust computational power to society at reduced cost. It enables customers with limited computational resources to outsource large computation workloads to the cloud and economically enjoy massive computational power, bandwidth, storage, and even appropriate software on a pay-per-use basis. Despite these tremendous benefits, security is the primary obstacle preventing wide adoption of this promising computing model, especially when customers' confidential data are consumed and produced during the computation.

On the one hand, outsourced computation workloads often contain sensitive information, such as business financial records, proprietary research data, or personally identifiable health information. To prevent unauthorized information leakage, sensitive data have to be encrypted before outsourcing, so as to provide end-to-end data confidentiality assurance in the cloud and beyond. However, ordinary data encryption techniques in essence prevent the cloud from performing any meaningful operation on the underlying plaintext data, making computation over encrypted data a very hard problem. On the other hand, the operational details inside the cloud are not transparent enough to customers. As a result, there exist various motivations for the cloud server to behave unfaithfully and return incorrect results; that is, it may behave beyond the classical semi-honest model.

Fully homomorphic encryption (FHE), a general solution to secure computation outsourcing, has been shown viable in theory: the computation is represented as an encrypted combinational Boolean circuit that can be evaluated over encrypted private inputs.








Existing System

Despite the tremendous benefits, outsourcing computation to the commercial public cloud also deprives customers of direct control over the systems that consume and produce their data during the computation, which inevitably brings new security concerns and challenges to this promising computing model. As discussed above, a cloud server may behave unfaithfully, beyond the classical semi-honest model. For example, for computations that require a large amount of computing resources, there are huge financial incentives for the cloud to be lazy if customers cannot tell whether the output is correct. Besides, possible software bugs, hardware failures, or even outsider attacks might also affect the quality of the computed results.

Thus, we argue that the cloud is intrinsically not secure from the customers' viewpoint. Without a mechanism for secure computation outsourcing, i.e., one that protects the sensitive input and output of the workloads and validates the integrity of the computation result, it would be hard to expect cloud customers to turn over control of their workloads from local machines to the cloud solely for its economic savings and resource flexibility. As a practical consideration, such a design should further ensure that customers perform fewer operations under the mechanism than by completing the computations themselves directly; otherwise, there is no point in seeking help from the cloud. Recent research in both the cryptography and theoretical computer science communities has made steady advances in securely outsourcing expensive computations.







Motorola embraced the markup approach as a way to provide mobile users with up-to-the-minute information and interactions. Given the corporate focus on mobile productivity, Motorola concentrated on hands-free access, which led to an emphasis on speech recognition rather than touch-tones as an input mechanism. Also, by starting later, Motorola was able to base its language on the recently developed XML framework. These efforts led to the October 1998 announcement of the VoxML technology. Since the announcement, thousands of developers have downloaded the VoxML language specification and software development kit.


There has been growing interest in this general concept of using a markup language to
define voice access to Web-based applications. For several years Netphonic has had a product
known as Web-on-Call that used an extended HTML and software server to provide telephone
access to Web services; in 1998, General Magic acquired Netphonic to support Web access for
phone customers. In October 1998, the World Wide Web Consortium (W3C) sponsored a
workshop on Voice Browsers. A number of leading companies, including AT&T, IBM, Lucent,
Microsoft, Motorola, and Sun, participated.
Some systems, such as Vocalis' SpeecHTML, use a subset of HTML, together with a
fixed set of interaction policies, to provide interactive voice services.
Most recently, IBM has announced SpeechXML, which provides a markup language for
speech interfaces to Web pages; the current version provides a speech interface for desktop PC
browsers.
The VoiceXML Forum will explore public domain ideas from existing work in the voice
browser arena, and where appropriate include these in its final proposal. As the standardization
process for voice browsers develops, the VoiceXML Forum will work with others to find
common ground and the right solution for business needs.











GOALS OF VOICEXML
VoiceXML's main goal is to bring the full power of web development and content
delivery to voice response applications, and to free the authors of such applications from
low-level programming and resource management. It enables integration of voice services
with data services using the familiar client-server paradigm. A voice service is viewed as a
sequence of interaction dialogs between a user and an implementation platform. The dialogs
are provided by document servers, which may be external to the implementation platform.
Document servers maintain overall service logic, perform database and legacy system
operations, and produce dialogs. A VoiceXML document specifies each interaction dialog to
be conducted by a VoiceXML interpreter. User input affects dialog interpretation and is
collected into requests submitted to a document server. The document server may reply with
another VoiceXML document to continue the user's session with other dialogs.
VoiceXML is a markup language that:
Minimizes client/server interactions by specifying multiple interactions per document.
Shields application authors from low-level and platform-specific details.
Separates user interaction code (in VoiceXML) from service logic (CGI scripts).
Promotes service portability across implementation platforms. VoiceXML is a common
language for content providers, tool providers, and platform providers.
Is easy to use for simple interactions, and yet provides language features to support
complex dialogs.


While VoiceXML strives to accommodate the requirements of a majority of voice response
services, services with stringent requirements may best be served by dedicated applications
that employ a finer level of control.
SCOPE OF VOICEXML
The language describes the human-machine interaction provided by voice response
systems, which includes:
Output of synthesized speech (text-to-speech).
Output of audio files.
Recognition of spoken input.
Recognition of DTMF input.
Recording of spoken input.
Telephony features such as call transfer and disconnect.
The language provides means for collecting character and/or spoken input, assigning
the input to document-defined request variables, and making decisions that affect the
interpretation of documents written in the language. A document may be linked to other
documents through Universal Resource Identifiers (URIs).













CREATING A BASIC VOICE XML DOCUMENT

VoiceXML is an extensible markup language (XML) for the creation of automated
speech recognition (ASR) and interactive voice response (IVR) applications. Based on the XML
tag/attribute format, the VoiceXML syntax involves enclosing instructions (items) within a tag
structure in the following manner:
<element_name attribute_name="attribute_value">
   ...contained items...
</element_name>
A VoiceXML application consists of one or more text files called documents. These
document files are denoted by a ".vxml" file extension and contain the various VoiceXML
instructions for the application. It is recommended that the first instruction in any document to be
seen by the interpreter be the XML version tag:
<?xml version="1.0"?>
The remainder of the document's instructions should be enclosed by the vxml tag with the
version attribute set equal to the version of VoiceXML being used ("1.0" in the present case) as
follows:
<vxml version="1.0">
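Closing the element completes the document, so a bare skeleton (a minimal sketch; the comment marks where the dialogs described in the following sections would go) looks like:

<?xml version="1.0"?>
<vxml version="1.0">
  <!-- One or more dialogs (forms or menus) go here. -->
</vxml>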










VOICEXML ELEMENTS

Element Purpose
<assign> Assign a variable a value.
<audio> Play an audio clip within a prompt.
<block> A container of (non-interactive) executable code.
<break> JSML element to insert a pause in output.
<catch> Catch an event.
<choice> Define a menu item.
<clear> Clear one or more form item variables.
<disconnect> Disconnect a session.
<div> JSML element to classify a region of text as a particular type.
<dtmf> Specify a touch-tone key grammar.
<else> Used in <if> elements.
<elseif> Used in <if> elements.
<emp> JSML element to change the emphasis of speech output.
<enumerate> Shorthand for enumerating the choices in a menu.
<error> Catch an error event.
<exit> Exit a session.
<field> Declares an input field in a form.
<filled> An action executed when fields are filled.
<form> A dialog for presenting information and collecting data.
<goto> Go to another dialog in the same or different document.
<grammar> Specify a speech recognition grammar.
<help> Catch a help event.
<if> Simple conditional logic.
<initial> Declares initial logic upon entry into a (mixed-initiative) form.
<link> Specify a transition common to all dialogs in the link's scope.
<menu> A dialog for choosing amongst alternative destinations.

<meta> Define a meta data item as a name/value pair.
<noinput> Catch a noinput event.
<nomatch> Catch a nomatch event.
<object> Interact with a custom extension.
<option> Specify an option in a <field>.
<param> Parameter in <object> or <subdialog>.
<prompt> Queue TTS and audio output to the user.
<property> Control implementation platform settings.
<pros> JSML element to change the prosody of speech output.
<record> Record an audio sample.
<reprompt> Play a field prompt when a field is re-visited after an event.
<return> Return from a subdialog.
<vxml> Top-level element in each VoiceXML document.
<script> Specify a block of ECMAScript client-side scripting logic.
<subdialog> Invoke another dialog as a subdialog of the current one.
<submit> Submit values to a document server.
<throw> Throw an event.
<transfer> Transfer the caller to another destination.
<value> Insert the value of an expression in a prompt.












ARCHITECTURAL MODEL
The architectural model assumed by this document has the following components:



FIG. 7.1: The VoiceXML architectural model.
A document server (e.g. a web server) processes requests from a client application, the
VoiceXML Interpreter, through the VoiceXML interpreter context. The server produces
VoiceXML documents in reply, which are processed by the VoiceXML Interpreter. The
VoiceXML interpreter context may monitor user inputs in parallel with the VoiceXML
interpreter. For example, one VoiceXML interpreter context may always listen for a special
escape phrase that takes the user to a high-level personal assistant, and another may listen for
escape phrases that alter user preferences like volume or text-to-speech characteristics.



The implementation platform is controlled by the VoiceXML interpreter context and
by the VoiceXML interpreter. For instance, in an interactive voice response application, the
VoiceXML interpreter context may be responsible for detecting an incoming call, acquiring
the initial VoiceXML document, and answering the call, while the VoiceXML interpreter
conducts the dialog after answer. The implementation platform generates events in response to
user actions (e.g. spoken or character input received, disconnect) and system events (e.g. timer
expiration). Some of these events are acted upon by the VoiceXML interpreter itself, as
specified by the VoiceXML document, while others are acted upon by the VoiceXML
interpreter context.


























PRINCIPLES OF DESIGN
VoiceXML is an XML application. For details about XML, refer to the Annotated XML
Reference Manual.
1. The language promotes portability of services through abstraction of platform resources.
2. The language accommodates platform diversity in supported audio file formats, speech
grammar formats, and URI schemes. While platforms will respond to market pressures and
support common formats, the language per se will not specify them.
3. The language supports ease of authoring for common types of interactions.
4. The language has a well-defined semantics that preserves the author's intent regarding
the behavior of interactions with the user. Client heuristics are not required to determine
document element interpretation.
5. The language has a control flow mechanism.
6. The language enables a separation of service logic from interaction behavior.
7. It is not intended for heavy computation, database operations, or legacy system
operations. These are assumed to be handled by resources outside the document interpreter,
e.g. a document server.
8. General service logic, state management, dialog generation, and dialog sequencing are
assumed to reside outside the document interpreter.
9. The language provides ways to link documents using URIs, and also to submit data to
server scripts using URIs.
10. VoiceXML provides ways to identify exactly which data to submit to the server, and
which HTTP method (GET or POST) to use in the submittal (see the sketch after this list).
11. The language does not require document authors to explicitly allocate and deallocate
dialog resources, or deal with concurrency. Resource allocation and concurrent threads of
control are to be handled by the implementation platform.
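As a sketch of principles 9 and 10 (the form id, grammar, and server URL are illustrative), the following form collects a single variable and submits exactly that variable with an explicit HTTP method; a <goto> whose URI names another document, e.g. "other.vxml#dialog_id", links documents in the same way:

<form id="order">
  <field name="drink">
    <prompt>Coffee or tea?</prompt>
    <grammar type="application/x-jsgf"> coffee | tea </grammar>
  </field>
  <block>
    <!-- namelist identifies exactly which data to submit; method selects GET or POST. -->
    <submit next="http://www.example.com/order.asp" method="post" namelist="drink"/>
  </block>
</form>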





IMPLEMENTATION PLATFORM REQUIREMENTS
This section outlines the requirements on the hardware/software platforms that will
support a VoiceXML interpreter.
Document acquisition. The interpreter context is expected to acquire documents for the
VoiceXML interpreter to act on. In some cases, the document request is generated by the
interpretation of a VoiceXML document, while other requests are generated by the interpreter
context in response to events outside the scope of the language, for example an incoming
phone call.
Audio output. An implementation platform can provide audio output using audio files and/or
using text-to-speech (TTS). When both are supported, the platform must be able to freely
sequence TTS and audio output. Audio files are referred to by a URI. The language does not
specify a required set of audio file formats.
Audio input. An implementation platform is required to detect and report character and/or
spoken input simultaneously and to control input detection interval duration with a timer
whose length is specified by a VoiceXML document.
It must report characters (for example, DTMF) entered by a user.
It must be able to receive speech recognition grammar data dynamically. Some
VoiceXML elements contain speech grammar data; others refer to speech grammar data
through a URI. The speech recognizer must be able to accommodate dynamic update of the
spoken input for which it is listening through either method of speech grammar data
specification.
It should be able to record audio received from the user. The implementation platform
must be able to make the recording available to a request variable.
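For example, recording spoken input and making it available to the server might be sketched as follows (the form id, variable name, and URL are illustrative):

<form id="voicemail">
  <record name="message" beep="true" maxtime="30s">
    <prompt>Please leave a message after the beep.</prompt>
  </record>
  <block>
    <!-- The recording travels to the server as the request variable "message". -->
    <submit next="http://www.example.com/save_message.asp"
            method="post" enctype="multipart/form-data" namelist="message"/>
  </block>
</form>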






CONCEPTS
A VoiceXML document (or a set of documents called an application) forms a
conversational finite state machine. The user is always in one conversational state, or dialog,
at a time. Each dialog determines the next dialog to transition to. Transitions are specified
using URIs, which define the next document and dialog to use. If a URI does not refer to a
document, the current document is assumed. If it does not refer to a dialog, the first dialog in
the document is assumed. Execution is terminated when a dialog does not specify a successor,
or if it has an element that explicitly exits the conversation.
10.1 DIALOGS AND SUBDIALOGS
There are two kinds of dialogs: forms and menus. Forms define an interaction that
collects values for a set of field item variables. Each field may specify a grammar that defines
the allowable inputs for that field. If a form-level grammar is present, it can be used to fill
several fields from one utterance. A menu presents the user with a choice of options and then
transitions to another dialog based on that choice.
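For instance, a simple menu (the destination URIs are illustrative) transitions to another dialog based on what the caller says:

<menu>
  <prompt>Say sports, weather, or news.</prompt>
  <choice next="#sports">sports</choice>
  <choice next="#weather">weather</choice>
  <choice next="http://www.example.com/news.vxml">news</choice>
</menu>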
A subdialog is like a function call, in that it provides a mechanism for invoking a new
interaction, and returning to the original form. Local data, grammars, and state information are
saved and are available upon returning to the calling document. Subdialogs can be used, for
example, to create a confirmation sequence that may require a database query; to create a set
of components that may be shared among documents in a single application; or to create a
reusable library of dialogs shared among many applications.
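A confirmation subdialog might be sketched like this (the dialog ids and server URL are illustrative):

<form id="order">
  <subdialog name="confirm" src="#get_confirmation">
    <filled>
      <if cond="confirm.answer == 'yes'">
        <submit next="http://www.example.com/place_order.asp"/>
      </if>
    </filled>
  </subdialog>
</form>

<form id="get_confirmation">
  <field name="answer">
    <prompt>Is that correct? Say yes or no.</prompt>
    <grammar type="application/x-jsgf"> yes | no </grammar>
  </field>
  <block>
    <!-- Return the collected value to the calling dialog as confirm.answer. -->
    <return namelist="answer"/>
  </block>
</form>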
10.2 SESSIONS
A session begins when the user starts to interact with a VoiceXML interpreter context,
continues as documents are loaded and processed, and ends when requested by the user, a
document, or the interpreter context.



10.3 APPLICATIONS
An application is a set of documents sharing the same application root document.
Whenever the user interacts with a document in an application, its application root document
is also loaded. The application root document remains loaded while the user is transitioning
between other documents in the same application, and it is unloaded when the user transitions
to a document that is not in the application. While it is loaded, the application root document's
variables are available to the other documents as application variables, and its grammars can
also be set to remain active for the duration of the application.

Figure 10.3.1: Transitioning between documents in an application.
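In markup, each document in the application points at the shared root through the application attribute of its <vxml> element (the root file name here is illustrative):

<?xml version="1.0"?>
<vxml version="1.0" application="app-root.vxml">
  <form id="main">
    <block>
      <!-- Variables and grammars declared in app-root.vxml are visible here. -->
      <prompt>Main menu.</prompt>
    </block>
  </form>
</vxml>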
10.4 GRAMMARS
Each dialog has one or more speech and/or DTMF grammars associated with it. In
machine-directed applications, each dialog's grammars are active only when the user is in that
dialog. In mixed-initiative applications, where the user and the machine alternate in
determining what to do next, some of the dialogs are flagged to make their grammars active
(i.e., listened for) even when the user is in another dialog in the same document, or on another
loaded document in the same application. In this situation, if the user says something matching
another dialog's active grammars, execution transitions to that other dialog, with the user's
utterance treated as if it were said in that dialog. Mixed initiative adds flexibility and power to
voice applications.
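As a sketch of grammar scoping (the dialog ids and phrases are illustrative), the first form's grammar is active only within that form, while the second form's grammars are given document scope so the caller can jump to it from any dialog in the document:

<form id="main">
  <field name="city">
    <prompt>Which city?</prompt>
    <!-- Active only while the user is in this dialog. -->
    <grammar type="application/x-jsgf"> boston | denver | chicago </grammar>
  </field>
</form>

<!-- scope="document" keeps this form's grammars listened for everywhere,
     so saying "help desk" in another dialog transitions here. -->
<form id="helpdesk" scope="document">
  <grammar type="application/x-jsgf"> help desk </grammar>
  <field name="topic">
    <prompt>Billing or repairs?</prompt>
    <grammar type="application/x-jsgf"> billing | repairs </grammar>
  </field>
</form>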


10.5 EVENTS
VoiceXML provides a form-filling mechanism for handling "normal" user input. In
addition, VoiceXML defines a mechanism for handling events not covered by the form
mechanism.
Events are thrown by the platform under a variety of circumstances, such as when the
user does not respond, doesn't respond intelligibly, requests help, etc. The interpreter also
throws events if it finds a semantic error in a VoiceXML document. Events are caught by
catch elements or their syntactic shorthand. Each element in which an event can occur may
specify catch elements. Catch elements are also inherited from enclosing elements "as if by
copy". In this way, common event handling behavior can be specified at any level, and it
applies to all lower levels.
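For example, a field might handle the common events like this (the prompts and escalation threshold are illustrative):

<field name="drink">
  <prompt>Would you like coffee or tea?</prompt>
  <grammar type="application/x-jsgf"> coffee | tea </grammar>
  <noinput>I did not hear anything. <reprompt/></noinput>
  <nomatch>Sorry, I did not catch that. <reprompt/></nomatch>
  <nomatch count="3">
    <!-- Fires on the third nomatch in this field: give up politely. -->
    <prompt>Let us try again another time. Goodbye.</prompt>
    <exit/>
  </nomatch>
  <help>Please say coffee or tea.</help>
</field>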
10.6 LINKS
A link supports mixed initiative. It specifies a grammar that is active whenever the user
is in the scope of the link. If user input matches the link's grammar, control transfers to the
link's destination URI. A <link> can also be used to throw an event instead of going to a destination URI.
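A document-level link might be sketched as follows (the destination URI is illustrative):

<!-- Placed as a child of <vxml>, this grammar is active in every dialog
     of the document; saying "operator" transfers control. -->
<link next="http://www.example.com/operator.vxml">
  <grammar type="application/x-jsgf"> operator </grammar>
</link>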















SUPPORTED AUDIO FILE FORMATS
VoiceXML recommends that a platform support playing and recording the audio
formats specified below. Note: a platform need not support both A-law and mu-law
simultaneously.

Audio Format                                               MIME Type
Raw (headerless) 8kHz 8-bit mu-law [PCM] single channel.   audio/basic
Raw (headerless) 8kHz 8-bit A-law [PCM] single channel.    audio/x-alaw-basic
WAV (RIFF header) 8kHz 8-bit mu-law [PCM] single channel.  audio/wav
WAV (RIFF header) 8kHz 8-bit A-law [PCM] single channel.   audio/wav
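In markup, a prompt references an audio file by URI; the contained text is spoken as a TTS fallback if the file cannot be played (the URL is illustrative):

<prompt>
  <audio src="http://www.example.com/welcome.wav">
    Welcome to the service.
  </audio>
</prompt>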













EXAMPLE
The top-level element is <vxml>, which is mainly a container for dialogs. There are
two types of dialogs: forms and menus. Forms present information and gather input; menus
offer choices of what to do next. The first example below has a single form, which contains a
block that synthesizes and presents "Hello World!" to the user. Since the form does not specify
a successor dialog, the conversation ends.
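A minimal such document looks like this:

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>Hello World!</block>
  </form>
</vxml>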
Our second example asks the user for a choice of drink and then submits it to a server
script:
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<field name="drink">
<prompt>Would you like coffee, tea, milk, or nothing?</prompt>
<grammar src="drink.gram" type="application/x-jsgf"/>
</field>
<block> <submit next="http://www.drink.example/drink2.asp"/> </block>
</form>
</vxml>
A field is an input field. The user must provide a value for the field before proceeding
to the next element in the form. A sample interaction is:

C (computer): Would you like coffee, tea, milk, or nothing?
H (human): Orange juice.
C: I did not understand what you said.
C: Would you like coffee, tea, milk, or nothing?
H: Tea
C: (continues in document drink2.asp)



APPLICATIONS OF VOICEXML

Below are a few examples of how VoiceXML applications can be used:
Voice portals: Just like Web portals, voice portals can provide personalized services to
access information such as stock quotes, weather, restaurant listings, and news.
Location-based services: You can receive targeted information specific to the location you
are dialing from; applications can infer that location from the telephone number of the caller.
Voice alerts (such as for advertising): VoiceXML can be used to send targeted alerts to a
user. The user would sign up to receive special alerts informing him of upcoming events.
Commerce: VoiceXML can be used to implement applications that allow users to make
purchases over the phone. Because voice conveys less information than graphics, products
that do not need a lot of description (such as tickets, CDs, and office supplies) work well.





















CONCLUSION
VoiceXML is designed for creating audio dialogs that feature synthesized speech,
digitized audio, recognition of spoken and DTMF key input, recording of spoken input,
telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of web-
based development and content delivery to interactive voice response applications. We can
conclude that developing speech recognition applications with VoiceXML is greatly simplified
by using familiar web infrastructure, including tools and Web servers. Instead of using a PC
with a Web browser, any telephone can access VoiceXML applications via a VoiceXML
"interpreter" (also known as a "browser") running on a telephony server.

Future versions of the standard:
VoiceXML 3.0 will be the next major release of VoiceXML, introducing major new features.
It includes a new XML state-chart description language called SCXML.









