
Contents

Articles
Speech synthesis
Prosody (linguistics)
Tone (linguistics)



Speech synthesis
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.[1]

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.[2]

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1980s.

Overview of text processing


A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion.[3] Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
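To make the front-end/back-end split concrete, the following is a minimal Python sketch of a toy front-end: it expands a numeric token into words, looks up phonetic transcriptions in a tiny pronunciation dictionary, and returns the symbolic representation a back-end could consume. The lexicon entries, phone symbols, and function names are invented for illustration and do not reflect any particular TTS system.

```python
import re

# Toy pronunciation lexicon (ARPAbet-style symbols); entries are illustrative only.
LEXICON = {
    "the": ["DH", "AH"],
    "answer": ["AE", "N", "S", "ER"],
    "is": ["IH", "Z"],
    "forty": ["F", "AO", "R", "T", "IY"],
    "two": ["T", "UW"],
}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TENS = {"4": "forty"}  # only what this toy example needs

def normalize(text):
    """Front-end step 1: expand digits into plain written-out words."""
    tokens = []
    for tok in re.findall(r"[A-Za-z]+|\d+", text.lower()):
        if tok.isdigit() and len(tok) == 2 and tok[0] in TENS:
            tokens.append(TENS[tok[0]])
            if tok[1] != "0":
                tokens.append(ONES[int(tok[1])])
        else:
            tokens.append(tok)
    return tokens

def to_phonemes(words):
    """Front-end step 2: grapheme-to-phoneme conversion via dictionary lookup."""
    return [(w, LEXICON.get(w, ["<unk>"])) for w in words]

def front_end(text):
    """Produce the symbolic linguistic representation handed to the back-end."""
    words = normalize(text)
    # A real front-end would also attach prosodic phrase boundaries here.
    return {"words": words, "phones": to_phonemes(words)}

if __name__ == "__main__":
    print(front_end("The answer is 42"))
```

A real back-end would then turn the phone sequence and prosody marks into audio using one of the synthesizer technologies described below.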

History
Long before electronic signal processing was invented, there were those who tried to build machines to create human speech. Some early legends of the existence of "speaking heads" involved Gerbert of Aurillac (d. 1003 AD), Albertus Magnus (1198-1280), and Roger Bacon (1214-1294).

In 1779, the Danish scientist Christian Kratzenstein, working at the Russian Academy of Sciences, built models of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation, they are [a], [e], [i], [o] and [u]).[4] This was followed by the bellows-operated "acoustic-mechanical speech machine" by Wolfgang von Kempelen of Vienna, Austria, described in a 1791 paper.[5] This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1857, M. Faber built the "Euphonia". Wheatstone's design was resurrected in 1923 by Paget.[6]

In the 1930s, Bell Labs developed the VOCODER, a keyboard-operated electronic speech analyzer and synthesizer that was said to be clearly intelligible. Homer Dudley refined this device into the VODER, which he exhibited at the 1939 New York World's Fair. The Pattern Playback was built by Dr. Franklin S. Cooper and his colleagues at Haskins Laboratories in the late 1940s and completed in 1950. There were several different versions of this hardware device but only one currently survives. The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. Using this device, Alvin Liberman and colleagues were able to discover acoustic cues for the perception of phonetic segments (consonants and vowels).

Early electronic speech synthesizers sounded robotic and were often barely intelligible. The quality of synthesized speech has steadily improved, but output from contemporary speech synthesis systems is still clearly distinguishable from actual human speech. As speech synthesizers become cheaper and more accessible, more people will benefit from the use of text-to-speech programs.[7]

Electronic devices
The first computer-based speech synthesis systems were created in the late 1950s, and the first complete text-to-speech system was completed in 1968. In 1961, physicist John Larry Kelly, Jr., and his colleague Louis Gerstman[8] used an IBM 704 computer to synthesize speech, in what was one of the most prominent events in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel 2001: A Space Odyssey,[9] where the HAL 9000 computer sings the same song as it is being put to sleep by astronaut Dave Bowman.[10] Despite the success of purely electronic speech synthesis, research is still being conducted into mechanical speech synthesizers.[11]

Synthesizer technologies
The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible. Speech synthesis systems usually try to maximize both characteristics. The two primary technologies for generating synthetic speech waveforms are concatenative synthesis and formant synthesis. Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used.

Concatenative synthesis
Concatenative synthesis is based on the concatenation (or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech. However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output. There are three main sub-types of concatenative synthesis.

Unit selection synthesis

Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences. Typically, the division into segments is done using a specially modified speech recognizer set to a "forced alignment" mode with some manual correction afterward, using visual representations such as the waveform and spectrogram.[12] An index of the units in the speech database is then created based on the segmentation and acoustic parameters like the fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection). This process is typically achieved using a specially weighted decision tree (a toy sketch of this search appears at the end of this section).

Unit selection provides the greatest naturalness, because it applies only a small amount of digital signal processing (DSP) to the recorded speech. DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the point of concatenation to smooth the waveform. The output from the best unit-selection systems is often indistinguishable from real human voices, especially in contexts for which the TTS system has been tuned. However, maximum naturalness typically requires unit-selection speech databases to be very large, in some systems ranging into the gigabytes of recorded data, representing dozens of hours of speech.[13] Also, unit selection algorithms have been known to select segments from a place that results in less than ideal synthesis (e.g. minor words become unclear) even when a better choice exists in the database.[14]

Diphone synthesis

Diphone synthesis uses a minimal speech database containing all the diphones (sound-to-sound transitions) occurring in a language. The number of diphones depends on the phonotactics of the language: for example, Spanish has about 800 diphones, and German about 2500. In diphone synthesis, only one example of each diphone is contained in the speech database. At runtime, the target prosody of a sentence is superimposed on these minimal units by means of digital signal processing techniques such as linear predictive coding, PSOLA[15] or MBROLA.[16] The quality of the resulting speech is generally worse than that of unit-selection systems, but more natural-sounding than the output of formant synthesizers. Diphone synthesis suffers from the sonic glitches of concatenative synthesis and the robotic-sounding nature of formant synthesis, and has few of the advantages of either approach other than small size. As such, its use in commercial applications is declining, although it continues to be used in research because there are a number of freely available software implementations.

Domain-specific synthesis

Domain-specific synthesis concatenates prerecorded words and phrases to create complete utterances. It is used in applications where the variety of texts the system will output is limited to a particular domain, like transit schedule announcements or weather reports.[17] The technology is very simple to implement, and has been in commercial use for a long time, in devices like talking clocks and calculators. The level of naturalness of these systems can be very high because the variety of sentence types is limited, and they closely match the prosody and intonation of the original recordings.

Because these systems are limited by the words and phrases in their databases, they are not general-purpose and can only synthesize the combinations of words and phrases with which they have been preprogrammed. The blending of words within naturally spoken language, however, can still cause problems unless the many variations are taken into account. For example, in non-rhotic dialects of English the "r" in words like "clear" /kliə/ is usually only pronounced when the following word has a vowel as its first letter (e.g. "clear out" is realized with a linking r, /kliər ˈaʊt/). Likewise in French, many final consonants are no longer silent if followed by a word that begins with a vowel, an effect called liaison. This alternation cannot be reproduced by a simple word-concatenation system, which would require additional complexity to be context-sensitive.
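As a rough illustration of the unit-selection search described above, the sketch below picks one candidate unit per target position so that the sum of target costs (mismatch with the requested pitch and duration) and join costs (pitch discontinuity at the concatenation point) is minimized with a small dynamic-programming (Viterbi-style) search. The cost functions, feature set, and the tiny candidate inventory are invented for illustration; production systems use many more features and far larger databases.

```python
# Minimal unit-selection sketch: choose the chain of candidate units that
# minimizes the sum of target costs and join costs (Viterbi search).

def target_cost(unit, spec):
    """How far the candidate's pitch/duration are from the requested values."""
    return abs(unit["f0"] - spec["f0"]) / 50.0 + abs(unit["dur"] - spec["dur"]) / 0.05

def join_cost(prev, unit):
    """Penalize pitch discontinuities at the concatenation point."""
    return abs(prev["f0"] - unit["f0"]) / 25.0

def select_units(specs, candidates):
    # best[i][j] = (cost of best path ending in candidate j at position i, backpointer)
    best = [[(target_cost(c, specs[0]), None) for c in candidates[0]]]
    for i in range(1, len(specs)):
        row = []
        for c in candidates[i]:
            tc = target_cost(c, specs[i])
            cost, back = min(
                (best[i - 1][j][0] + join_cost(p, c) + tc, j)
                for j, p in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # Trace back the cheapest chain of units.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(specs) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))

# Two target diphones with desired pitch (Hz) and duration (s), plus fake candidates.
specs = [{"f0": 120, "dur": 0.09}, {"f0": 110, "dur": 0.11}]
candidates = [
    [{"name": "d-a#1", "f0": 118, "dur": 0.08}, {"name": "d-a#2", "f0": 150, "dur": 0.09}],
    [{"name": "a-g#1", "f0": 112, "dur": 0.12}, {"name": "a-g#2", "f0": 90, "dur": 0.10}],
]
print([u["name"] for u in select_units(specs, candidates)])
```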

Formant synthesis
Formant synthesis does not use human speech samples at runtime. Instead, the synthesized speech output is created using additive synthesis and an acoustic model (physical modelling synthesis).[18] Parameters such as fundamental frequency, voicing, and noise levels are varied over time to create a waveform of artificial speech. This method is sometimes called rules-based synthesis; however, many concatenative systems also have rules-based components. Many systems based on formant synthesis technology generate artificial, robotic-sounding speech that would never be mistaken for human speech. However, maximum naturalness is not always the goal of a speech synthesis system, and formant synthesis systems have advantages over concatenative systems. Formant-synthesized speech can be reliably intelligible, even at very high speeds, avoiding the acoustic glitches that commonly plague concatenative systems. High-speed synthesized speech is used by the visually impaired to quickly navigate computers using a screen reader. Formant synthesizers are usually smaller programs than concatenative systems because they do not have a database of speech samples. They can therefore be used in embedded systems, where memory and microprocessor power are especially limited. Because formant-based systems have complete control of all aspects of the output speech, a wide variety of prosodies and intonations can be output, conveying not just questions and statements, but a variety of emotions and tones of voice.

Examples of non-real-time but highly accurate intonation control in formant synthesis include the work done in the late 1970s for the Texas Instruments toy Speak & Spell, in the early 1980s Sega arcade machines,[19] and in many Atari, Inc. arcade games[20] using the TMS5220 LPC chips. Creating proper intonation for these projects was painstaking, and the results have yet to be matched by real-time text-to-speech interfaces.[21]
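As a rough sketch of the source-filter idea behind formant synthesis (not any particular system's implementation), the following Python code excites a cascade of second-order resonators, one per formant, with a periodic pulse train. Varying the fundamental frequency and the formant frequencies over time is what lets such a synthesizer produce different vowels and intonation; the formant values here approximate an /a/-like vowel and are assumptions for illustration only.

```python
import math
import struct
import wave

RATE = 16000                                      # sample rate in Hz
F0 = 120                                          # fundamental frequency (pitch) in Hz
FORMANTS = [(700, 80), (1220, 90), (2600, 120)]   # (centre freq, bandwidth) for an /a/-like vowel

def pulse_train(n_samples, f0, rate):
    """Crude glottal source: one impulse per pitch period."""
    period = int(rate / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, freq, bandwidth, rate):
    """Second-order recursive resonator modelling a single formant."""
    r = math.exp(-math.pi * bandwidth / rate)
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / rate)
    a = 1.0 - b - c                               # normalize low-frequency gain
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

speech = pulse_train(RATE, F0, RATE)              # one second of excitation
for freq, bw in FORMANTS:                         # cascade the formant filters
    speech = resonator(speech, freq, bw, RATE)

# Scale and write a mono 16-bit WAV file.
peak = max(abs(s) for s in speech) or 1.0
with wave.open("vowel.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(RATE)
    w.writeframes(b"".join(struct.pack("<h", int(0.8 * 32767 * s / peak)) for s in speech))
```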

Articulatory synthesis
Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The first articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY, was based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Until recently, articulatory synthesis models have not been incorporated into commercial speech synthesis systems. A notable exception is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the GNU General Public License, with work continuing as gnuspeech. The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by Carré's "distinctive region model".

HMM-based synthesis
HMM-based synthesis is a synthesis method based on hidden Markov models, also called Statistical Parametric Synthesis. In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modeled simultaneously by HMMs. Speech waveforms are generated from HMMs themselves based on the maximum likelihood criterion.[22]

Sinewave synthesis
Sinewave synthesis is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles.[23]
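A minimal illustration of the sinewave idea: replace each formant with a single sinusoid whose frequency follows the formant track. The three tracks below are made-up values roughly sweeping from an /a/-like toward an /i/-like configuration; real sinewave replicas are derived from formant measurements of an actual utterance.

```python
import math

RATE = 16000
DURATION = 1.0
N = int(RATE * DURATION)

# Hypothetical formant tracks (Hz): start near /a/, glide toward /i/.
def tracks(t):                      # t runs from 0 to 1
    f1 = 700 - 400 * t              # 700 -> 300
    f2 = 1200 + 1100 * t            # 1200 -> 2300
    f3 = 2600 + 400 * t             # 2600 -> 3000
    return (f1, f2, f3)

samples = []
phases = [0.0, 0.0, 0.0]
for n in range(N):
    t = n / N
    value = 0.0
    for i, f in enumerate(tracks(t)):
        phases[i] += 2.0 * math.pi * f / RATE   # advance each "whistle" smoothly
        value += math.sin(phases[i]) / (i + 1)  # weaker amplitude for higher formants
    samples.append(value)
# `samples` now holds a three-tone sinewave replica; it can be written to a WAV
# file in the same way as in the formant synthesis sketch above.
```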

Challenges
Text normalization challenges
The process of normalizing text is rarely straightforward. Texts are full of heteronyms, numbers, and abbreviations that all require expansion into a phonetic representation. There are many spellings in English which are pronounced differently based on context. For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".

Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective. As a result, various heuristic techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence. Recently TTS systems have begun to use HMMs (discussed above) to generate "parts of speech" to aid in disambiguating homographs. This technique is quite successful for many cases, such as whether "read" should be pronounced as "red", implying past tense, or as "reed", implying present tense. Typical error rates when using HMMs in this fashion are usually below five percent. These techniques also work well for most European languages, although access to required training corpora is frequently difficult in these languages.

Deciding how to convert numbers is another problem that TTS systems have to address. It is a simple programming challenge to convert a number into words (at least in English), like "1325" becoming "one thousand three hundred twenty-five". However, numbers occur in many different contexts; "1325" may also be read as "one three two five", "thirteen twenty-five" or "thirteen hundred and twenty-five". A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous.[24] Roman numerals can also be read differently depending on context. For example, "Henry VIII" reads as "Henry the Eighth", while "Chapter VIII" reads as "Chapter Eight".

Similarly, abbreviations can be ambiguous. For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street". TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs.
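As a sketch of the "simple programming challenge" of reading a number out in full, the routine below expands integers up to the thousands into English words. Handling the alternative readings mentioned above ("one three two five", "thirteen twenty-five") would require additional context-dependent rules on top of this; the function and table names are illustrative.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

def under_hundred(n):
    if n < 20:
        return ONES[n]
    word = TENS[n // 10]
    return word + ("-" + ONES[n % 10] if n % 10 else "")

def number_to_words(n):
    """Expand 0..9999 into the 'in full' reading used by a TTS text normalizer."""
    if n == 0:
        return "zero"
    parts = []
    if n >= 1000:
        parts.append(ONES[n // 1000] + " thousand")
        n %= 1000
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n:
        parts.append(under_hundred(n))
    return " ".join(parts)

print(number_to_words(1325))   # -> "one thousand three hundred twenty-five"
```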

Text-to-phoneme challenges
Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme or grapheme-to-phoneme conversion (phoneme is the term used by linguists to describe distinctive sounds in a language). The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciations is stored by the program. Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary. The other approach is rule-based, in which pronunciation rules are applied to words to determine their pronunciations based on their spellings. This is similar to the "sounding out", or synthetic phonics, approach to learning reading.

Each approach has advantages and drawbacks. The dictionary-based approach is quick and accurate, but completely fails if it is given a word which is not in its dictionary. As dictionary size grows, so too do the memory space requirements of the synthesis system. On the other hand, the rule-based approach works on any input, but the complexity of the rules grows substantially as the system takes into account irregular spellings or pronunciations. (Consider that the word "of" is very common in English, yet is the only word in which the letter "f" is pronounced [v].) As a result, nearly all speech synthesis systems use a combination of these approaches.

Languages with a phonemic orthography have a very regular writing system, and the prediction of the pronunciation of words based on their spellings is quite successful. Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for those few words, like foreign names and borrowings, whose pronunciations are not obvious from their spellings. On the other hand, speech synthesis systems for languages like English, which have extremely irregular spelling systems, are more likely to rely on dictionaries, and to use rule-based methods only for unusual words, or words that are not in their dictionaries.
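A toy illustration of the hybrid strategy described above: look the word up in a pronunciation dictionary first, and fall back to crude letter-to-sound rules when it is missing. The lexicon entries and the one-letter rules below are deliberately simplistic placeholders; real systems use lexica with tens of thousands of entries and much richer rule sets or statistical models.

```python
# Hypothetical mini-lexicon (ARPAbet-like symbols) plus a naive rule fallback.
LEXICON = {
    "of": ["AH", "V"],            # irregular: the only English word where "f" is /v/
    "horse": ["HH", "AO", "R", "S"],
}

# Extremely naive letter-to-sound rules for the fallback path.
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}

def grapheme_to_phoneme(word):
    word = word.lower()
    if word in LEXICON:                       # dictionary-based: fast and accurate
        return LEXICON[word]
    phones = []                               # rule-based: works on any input
    for letter in word:
        if letter in LETTER_RULES:
            phones.extend(LETTER_RULES[letter].split())
    return phones

print(grapheme_to_phoneme("of"))      # ['AH', 'V'] from the dictionary
print(grapheme_to_phoneme("blorp"))   # rule fallback for an out-of-vocabulary word
```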

Evaluation challenges
The consistent evaluation of speech synthesis systems may be difficult because of a lack of universally agreed objective evaluation criteria. Different organizations often use different speech data. The quality of speech synthesis systems also depends to a large degree on the quality of the production technique (which may involve analogue or digital recording) and on the facilities used to replay the speech. Evaluating speech synthesis systems has therefore often been compromised by differences between production techniques and replay facilities. Recently, however, some researchers have started to evaluate speech synthesis systems using a common speech dataset.[25]


Prosodics and emotional content


A study reported in the journal "Speech Communication" by Amy Drahota and colleagues at the University of Portsmouth, UK, found that listeners to voice recordings could determine, at better than chance levels, whether or not the speaker was smiling.[26] It was suggested that identification of the vocal features which signal emotional content may be used to help make synthesized speech sound more natural.

Dedicated hardware
Votrax SC-01A (analog formant), SC-02 / SSI-263 / "Arctic 263"
General Instruments SP0256-AL2 (CTS256A-AL2, MEA8000)
Magnevation SpeakJet (www.speechchips.com TTS256)
Savage Innovations SoundGin
National Semiconductor DT1050 Digitalker (Mozer)
Silicon Systems SSI 263 (analog formant)
Texas Instruments LPC Speech Chips: TMS5110A, TMS5200
Oki Semiconductor ML22825 (ADPCM), ML22573 (HQADPCM)
Toshiba T6721A
Philips PCF8200
TextSpeak Embedded TTS Modules

Computer operating systems or outlets with speech synthesis


Atari
Arguably, the first speech system integrated into an operating system was that of the 1400XL/1450XL personal computers designed by Atari, Inc. using the Votrax SC01 chip in 1983. The 1400XL/1450XL computers used a finite state machine to enable World English Spelling text-to-speech synthesis.[27] Unfortunately, the 1400XL/1450XL personal computers never shipped in quantity.

Apple
The first speech system integrated into an operating system that shipped in quantity was Apple Computer's MacInTalk in 1984. Since the 1980s, Macintosh computers have offered text-to-speech capabilities through the MacinTalk software. In the early 1990s Apple expanded its capabilities, offering system-wide text-to-speech support. With the introduction of faster PowerPC-based computers, it included higher quality voice sampling. Apple also introduced speech recognition into its systems, which provided a fluid command set. More recently, Apple has added sample-based voices. Starting as a curiosity, the speech system of the Apple Macintosh has evolved into a fully supported program, PlainTalk, for people with vision problems. VoiceOver was first featured in Mac OS X Tiger (10.4). During 10.4 (Tiger) and the first releases of 10.5 (Leopard), there was only one standard voice shipping with Mac OS X. Starting with 10.6 (Snow Leopard), the user can choose from a wide range of voices. VoiceOver voices feature the taking of realistic-sounding breaths between sentences, as well as improved clarity at high read rates over PlainTalk. Mac OS X also includes say, a command-line application that converts text to audible speech. The AppleScript Standard Additions include a say verb that allows a script to use any of the installed voices and to control the pitch, speaking rate and modulation of the spoken text.

AmigaOS
The second operating system with advanced speech synthesis capabilities was AmigaOS, introduced in 1985. The voice synthesis was licensed by Commodore International from a third-party software house (Don't Ask Software, now Softvoice, Inc.) and it featured a complete system of voice emulation, with both male and female voices and "stress" indicator markers, made possible by advanced features of the Amiga hardware audio chipset.[28] It was divided into a narrator device and a translator library. Amiga Speak Handler featured a text-to-speech translator. AmigaOS considered speech synthesis a virtual hardware device, so the user could even redirect console output to it. Some Amiga programs, such as word processors, made extensive use of the speech system.

Microsoft Windows
Modern Windows systems use SAPI4- and SAPI5-based speech systems that include a speech recognition engine (SRE). SAPI 4.0 was available on Microsoft-based operating systems as a third-party add-on for systems like Windows 95 and Windows 98. Windows 2000 added a speech synthesis program called Narrator, directly available to users. All Windows-compatible programs could make use of speech synthesis features, available through menus once installed on the system. Microsoft Speech Server is a complete package for voice synthesis and recognition, for commercial applications such as call centers. Text-to-speech (TTS) capability refers to the ability of the operating system to play back printed text as spoken words.[29] An internal driver, installed with the operating system and called a TTS engine, recognizes the text and, using a synthesized voice chosen from several pre-generated voices, speaks the written text. Additional engines, often using a particular jargon or vocabulary, are also available from third-party manufacturers.[29]

Android
Version 1.6 of Android added support for speech synthesis (TTS).[30]

Internet
The most recent TTS development in the web browser is the JavaScript Text to Speech [31] work of Yury Delendik, which ports the Flite C engine to pure JavaScript. This allows web pages to convert text to audio using HTML5 technology. The ability to use Yury's TTS port currently requires a custom browser build [32] that uses Mozilla's Audio Data API [33]. However, much work is being done in the context of the W3C to move this technology into the mainstream browser market through the W3C Audio Incubator Group [34], with the involvement of the BBC and Google Inc.

Currently, there are a number of applications, plugins and gadgets that can read messages directly from an e-mail client and web pages from a web browser or Google Toolbar, such as Text-to-voice, which is an add-on to Firefox. Some specialized software can narrate RSS feeds. On one hand, online RSS narrators simplify information delivery by allowing users to listen to their favourite news sources and to convert them to podcasts. On the other hand, online RSS readers are available on almost any PC connected to the Internet. Users can download generated audio files to portable devices, e.g. with the help of a podcast receiver, and listen to them while walking, jogging or commuting to work.

A growing field in Internet-based TTS is web-based assistive technology, e.g. 'Browsealoud' from a UK company and Readspeaker. It can deliver TTS functionality to anyone (for reasons of accessibility, convenience, entertainment or information) with access to a web browser. Additionally, SPEAK.TO.ME [35] from Oxford Information Laboratories is capable of delivering text to speech through any browser without the need to download any special applications, and includes smart delivery technology to ensure only what is seen is spoken and the content is logically pathed.

Others
Some models of Texas Instruments home computers produced in 1979 and 1981 (Texas Instruments TI-99/4 and TI-99/4A) were capable of text-to-phoneme synthesis or reciting complete words and phrases (text-to-dictionary), using a very popular Speech Synthesizer peripheral. TI used a proprietary codec to embed complete spoken phrases into applications, primarily video games.[36] IBM's OS/2 Warp 4 included VoiceType, a precursor to IBM ViaVoice. A variety of systems run on free and open source software platforms, including Linux; they include open-source programs such as the Festival Speech Synthesis System, which uses diphone-based synthesis (and can use a limited number of MBROLA voices), and gnuspeech, which uses articulatory synthesis,[37] from the Free Software Foundation. Companies which developed speech synthesis systems but which are no longer in this business include BeST Speech (bought by L&H), Eloquent Technology (bought by SpeechWorks), Lernout & Hauspie (bought by Nuance), SpeechWorks (bought by Nuance), and Rhetorical Systems (bought by Nuance).

Speech synthesis markup languages


A number of markup languages have been established for the rendition of text as speech in an XML-compliant format. The most recent is Speech Synthesis Markup Language (SSML), which became a W3C recommendation in 2004. Older speech synthesis markup languages include Java Speech Markup Language (JSML) and SABLE. Although each of these was proposed as a standard, none of them has been widely adopted. Speech synthesis markup languages are distinguished from dialogue markup languages. VoiceXML, for example, includes tags related to speech recognition, dialogue management and touchtone dialing, in addition to text-to-speech markup.
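For concreteness, the snippet below builds a small SSML document as a Python string. The speak, prosody, break, and emphasis elements are part of the W3C SSML recommendation, while the particular attribute values and the surrounding Python code are purely illustrative; an SSML-capable engine of the reader's choice would render the marked-up text with the requested emphasis, pause, rate, and pitch.

```python
# Build a minimal SSML document; <speak>, <emphasis>, <break>, and <prosody>
# are defined in the W3C SSML recommendation. Attribute values are illustrative.
ssml = """<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The next train departs at
  <emphasis level="strong">nine forty-five</emphasis>.
  <break time="500ms"/>
  <prosody rate="slow" pitch="low">Please stand clear of the doors.</prosody>
</speak>
"""
# The string would be passed to whatever SSML-capable synthesizer is available;
# no particular engine or API is assumed here.
print(ssml)
```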

Applications
Speech synthesis has long been a vital assistive technology tool and its application in this area is significant and widespread. It allows environmental barriers to be removed for people with a wide range of disabilities. The longest-standing application has been in the use of screen readers for people with visual impairment, but text-to-speech systems are now commonly used by people with dyslexia and other reading difficulties as well as by pre-literate children. They are also frequently employed to aid those with severe speech impairment, usually through a dedicated voice output communication aid. Sites such as Ananova and YAKiToMe! have used speech synthesis to convert written news to audio content, which can be used for mobile applications. Speech synthesis techniques are also used in entertainment productions such as games and anime. In 2007, Animo Limited announced the development of a software application package based on its speech synthesis software FineSpeech, explicitly geared towards customers in the entertainment industries, able to generate narration and lines of dialogue according to user specifications.[38] The application reached maturity in 2008, when NEC Biglobe announced a web service that allows users to create phrases from the voices of Code Geass: Lelouch of the Rebellion R2 characters.[39] TTS applications such as YAKiToMe! and Speakonia are often used to add synthetic voices to YouTube videos for comedic effect, as in Barney Bunch videos. YAKiToMe! is also used to convert entire books for personal podcasting purposes, RSS feeds and web pages for news stories, and educational texts for enhanced learning.

Software such as Vocaloid can generate singing voices via lyrics and melody. This is also the aim of the Singing Computer project (which uses GNU LilyPond and Festival) to help blind people check their lyric input.[40]

See also
Text-to-voice Mozilla Firefox extension
Comparison of speech synthesizers
Articulatory synthesis
Chinese speech synthesis
Natural language processing
Paperless office
Comparison of screen readers
Sinewave synthesis
Speech processing
Silent speech interface

External links
IVONA Text-To-Speech [41]
Text to Speech Synthesis in the Web Browser with JavaScript [31]
Speech synthesis [42] at the Open Directory Project
Text to Voice or Text to Speech Firefox Addon [43]
Dennis Klatt's History of Speech Synthesis [44]

References
[1] Jonathan Allen, M. Sharon Hunnicutt, Dennis Klatt, From Text to Speech: The MITalk system. Cambridge University Press: 1987. ISBN 0-521-30641-8
[2] Rubin, P., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.
[3] P. H. Van Santen, Richard William Sproat, Joseph P. Olive, and Julia Hirschberg, Progress in Speech Synthesis. Springer: 1997. ISBN 0-387-94701-9
[4] History and Development of Speech Synthesis (http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html), Helsinki University of Technology, retrieved on November 4, 2006.
[5] Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine ("Mechanism of the human speech with description of its speaking machine," J. B. Degen, Wien).
[6] Mattingly, Ignatius G. Speech synthesis for phonetic and phonological models. In Thomas A. Sebeok (Ed.), Current Trends in Linguistics, Volume 12, Mouton, The Hague, pp. 2451-2487, 1974.
[7] Kurzweil, Raymond (2005). The Singularity is Near. Penguin Books. ISBN 0-14-303788-9.
[8] Lambert, Bruce (1992-03-21). "NY Times obituary for Louis Gerstman" (http://query.nytimes.com/search/query?ppds=per&v1=GERSTMAN,LOUIS&sort=newest). New York Times. Retrieved 2010-02-17.
[9] Arthur C. Clarke online biography (http://www.lsi.usp.br/~rbianchi/clarke/ACC.Biography.html)
[10] "Where "HAL" First Spoke (Bell Labs Speech Synthesis website)" (http://www.bell-labs.com/news/1997/march/5/2.html). Bell Labs. Retrieved 2010-02-17.
[11] Anthropomorphic Talking Robot Waseda-Talker Series (http://www.takanishi.mech.waseda.ac.jp/top/research/voice/index.htm)
[12] Alan W. Black, Perfect synthesis for all of the people all of the time. IEEE TTS Workshop 2002. http://www.cs.cmu.edu/~awb/papers/IEEE2002/allthetime/allthetime.html
[13] John Kominek and Alan W. Black. (2003). CMU ARCTIC databases for speech synthesis. CMU-LTI-03-177. Language Technologies Institute, School of Computer Science, Carnegie Mellon University.
[14] Julia Zhang. Language Generation and Speech Synthesis in Dialogues for Language Learning, master's thesis, http://groups.csail.mit.edu/sls/publications/2004/zhang_thesis.pdf, Section 5.6 on page 54.
[15] PSOLA Synthesis (http://www.fon.hum.uva.nl/praat/manual/PSOLA.html)
[16] T. Dutoit, V. Pagel, N. Pierret, F. Bataille, O. van der Vrecken. The MBROLA Project: Towards a set of high quality speech synthesizers of use for non commercial purposes. ICSLP Proceedings, 1996.

[17] L.F. Lamel, J.L. Gauvain, B. Prouts, C. Bouhier, R. Boesch. Generation and Synthesis of Broadcast Messages, Proceedings ESCA-NATO Workshop and Applications of Speech Technology, September 1993.
[18] Dartmouth College: Music and Computers (http://digitalmusics.dartmouth.edu/~book/MATCpages/chap.4/4.4.formant_synth.html), 1993.
[19] Examples include Astro Blaster, Space Fury, and Star Trek: Strategic Operations Simulator.
[20] Examples include Star Wars, Firefox, Return of the Jedi, Road Runner, The Empire Strikes Back, Indiana Jones and the Temple of Doom, 720, Gauntlet, Gauntlet II, A.P.B., Paperboy, RoadBlasters, Vindicators Part II (http://www.arcade-museum.com/game_detail.php?game_id=10319), Escape from the Planet of the Robot Monsters.
[21] John Holmes and Wendy Holmes. Speech Synthesis and Recognition, 2nd Edition. CRC: 2001. ISBN 0-7484-0856-8.
[22] The HMM-based Speech Synthesis System, http://hts.sp.nitech.ac.jp/
[23] Remez, R.E., Rubin, P.E., Pisoni, D.B., & Carrell, T.D. Speech perception without traditional speech cues. Science, 1981, 212, 947-950.
[24] "Speech synthesis" (http://www.w3.org/TR/speech-synthesis/#S3.1.8). World Wide Web Consortium.
[25] Blizzard Challenge, http://festvox.org/blizzard
[26] The Sound of Smiling (http://www.port.ac.uk/aboutus/newsandevents/news/title,74220,en.html)
[27] 1400XL/1450XL Speech Handler External Reference Specification (http://www.atarimuseum.com/ahs_archives/archives/pdf/computers/8bits/1400xlmodem.pdf)
[28] Miner, Jay et al. (1991). Amiga Hardware Reference Manual: Third Edition. Addison-Wesley Publishing Company, Inc. ISBN 0-201-56776-8.
[29] "How to configure and use Text-to-Speech in Windows XP and in Windows Vista" (http://support.microsoft.com/kb/306902). Support.microsoft.com. 2007-05-07. Retrieved 2010-02-17.
[30] Jean-Michel Trivi (2009-09-23). "An introduction to Text-To-Speech in Android" (http://android-developers.blogspot.com/2009/09/introduction-to-text-to-speech-in.html). Android-developers.blogspot.com. Retrieved 2010-02-17.
[31] http://vimeo.com/12039415
[32] https://wiki.mozilla.org/Audio_Data_API#Obtaining_Code_and_Builds
[33] https://wiki.mozilla.org/Audio_Data_API
[34] http://www.w3.org/2010/04/audio/audio-incubator-charter.html
[35] http://www.oxil.co.uk/decSpeakToMe/modResourcesLibrary/HtmlRenderer/SpeakToMe.html
[36] "Smithsonian Speech Synthesis History Project (SSSHP) 1986-2002" (http://www.mindspring.com/~ssshp/ssshp_cd/ss_home.htm). Mindspring.com. Retrieved 2010-02-17.
[37] "gnuspeech" (http://www.gnu.org/software/gnuspeech/). Gnu.org. Retrieved 2010-02-17.
[38] "Speech Synthesis Software for Anime Announced" (http://animenewsnetwork.com/news/2007-05-02/speech-synthesis-software). Animenewsnetwork.com. 2007-05-02. Retrieved 2010-02-17.
[39] "Code Geass Speech Synthesizer Service Offered in Japan" (http://www.animenewsnetwork.com/news/2008-09-09/code-geass-voice-synthesis-service-offered-in-japan). Animenewsnetwork.com. 2008-09-09. Retrieved 2010-02-17.
[40] Brailcom, o.p.s. "Singing Computer" (http://www.freebsoft.org/singing-computer). Free(b)soft. Retrieved 2010-02-17.
[41] http://www.ivona.com
[42] http://www.dmoz.org/Computers/Speech_Technology/Speech_Synthesis/
[43] https://addons.mozilla.org/en-US/firefox/addon/91405/?src=external-wp
[44] http://www.cs.indiana.edu/rhythmsp/ASA/Contents.html



Prosody (linguistics)
In linguistics, prosody (pronounced /ˈprɒsədi/, PROSS-ə-dee) is the rhythm, stress, and intonation of speech. Prosody may reflect various features of the speaker or the utterance: the emotional state of a speaker; whether an utterance is a statement, a question, or a command; whether the speaker is being ironic or sarcastic; emphasis, contrast, and focus; or other elements of language that may not be encoded by grammar or choice of vocabulary.

Acoustic attributes of prosody


In terms of acoustics, the prosodics of oral languages involve variation in syllable length, loudness, pitch, and the formant frequencies of speech sounds. In sign languages, prosody involves the rhythm, length, and tension of gestures, along with mouthing and facial expressions. Prosody is absent in writing, which is one reason e-mail, for example, may be misunderstood. Orthographic conventions to mark or substitute for prosody include punctuation (commas, exclamation marks, question marks, scare quotes, and ellipses), typographic styling for emphasis (italic, bold, and underlined text), and emoticons. The details of a language's prosody depend upon its phonology. For instance, in a language with phonemic vowel length, this must be marked separately from prosodic syllable length. In similar manner, prosodic pitch must not obscure tone in a tone language if the result is to be intelligible. Although tone languages such as Mandarin have prosodic pitch variations in the course of a sentence, such variations are long and smooth contours, on which the short and sharp lexical tones are superimposed. If pitch can be compared to ocean waves, the swells are the prosody, and the wind-blown ripples in their surface are the lexical tones, as with stress in English. The word dessert has greater stress on the second syllable, compared to desert, which has greater stress on the first; but this distinction is not obscured when the entire word is stressed by a child demanding "Give me dessert!" Vowels in many languages are likewise pronounced differently (typically less centrally) in a careful rhythm or when a word is emphasized, but not so much as to overlap with the formant structure of a different vowel. Both lexical and prosodic information are encoded in rhythm, loudness, pitch, and vowel formants.

The prosodic domain


Prosodic features are suprasegmental. They are not confined to any one segment, but occur in some higher level of an utterance. These prosodic units are the actual phonetic "spurts", or chunks of speech. They need not correspond to grammatical units such as phrases and clauses, though they may; and these facts suggest insights into how the brain processes speech. Prosodic units are marked by phonetic cues, such as a coherent pitch contour or the gradual decline in pitch and lengthening of vowels over the duration of the unit, until the pitch and speed are reset to begin the next unit. Breathing, both inhalation and exhalation, seems to occur only at these boundaries where the prosody resets.

"Prosodic structure" is important in language contact and lexical borrowing. For example, in Modern Hebrew, the XiXX verb-template is much more productive than the XaXX verb-template because in morphemic adaptations of non-Hebrew stems, the XiXX verb-template is more likely to retain in all conjugations throughout the tenses the prosodic structure (e.g., the consonant clusters and the location of the vowels) of the stem (see "Hybridity versus Revivability: Multiple Causation, Forms and Patterns" [1], Journal of Language Contact, Varia 2 (2009), pp. 40-67).


Prosody and emotion


Emotional prosody is the expression of feelings using prosodic elements of speech. It was recognized by Charles Darwin in The Descent of Man as predating the evolution of human language: "Even monkeys express strong feelings in different tones - anger and impatience by low, fear and pain by high notes."[2] Native speakers listening to actors reading emotionally neutral text while projecting emotions correctly recognized happiness 62% of the time, anger 95%, surprise 91%, sadness 81%, and neutral tone 76%. When a database of this speech was processed by computer, segmental features allowed better than 90% recognition of happiness and anger, while suprasegmental prosodic features allowed only 44%-49% recognition. The reverse was true for surprise, which was recognized only 69% of the time by segmental features and 96% of the time by suprasegmental prosody.[3] In typical conversation (no actor voice involved), the recognition of emotion may be quite low, of the order of 50%, hampering the complex interrelationship function of speech advocated by some authors.[4]

Brain location of prosody


An aprosodia is an acquired or developmental impairment in comprehending or generating the emotion conveyed in spoken language. Producing these nonverbal elements requires intact motor areas of the face, mouth, tongue, and throat. This area is associated with Brodmann areas 44 and 45 (Broca's area) of the left frontal lobe. Damage to areas 44/45 produces motor aprosodia, with the nonverbal elements of speech being disturbed (facial expression, tone, rhythm of voice). Understanding these nonverbal elements requires an intact and properly functioning Brodmann area 22 (Wernicke's area) in the right hemisphere. Right-hemispheric area 22 aids in the interpretation of prosody, and damage causes sensory aprosodia, with the patient unable to comprehend changes in voice and body language. Prosody is dealt with by a right-hemisphere network that is largely a mirror image of the left perisylvian zone. Damage to the right inferior frontal gyrus causes a diminished ability to convey emotion or emphasis by voice or gesture, and damage to right superior temporal gyrus causes problems comprehending emotion or emphasis in the voice or gestures of others.

References
[1] http://www.zuckermann.org/pdf/Hybridity_versus_Revivability.pdf
[2] Charles Darwin (1871). "The Descent of Man" (http://www.infidels.org/library/historical/charles_darwin/descent_of_man/chapter_19.html), citing Johann Rudolph Rengger, Natural History of the Mammals of Paraguay, s. 49.
[3] R. Barra, J.M. Montero, J. Macías-Guarasa, L.F. D'Haro, R. San-Segundo, R. Córdoba. "Prosodic and segmental rubrics in emotion identification" (http://www-gth.die.upm.es/research/documentation/AG-39Pro-06.pdf).
[4] H.-N. Teodorescu and Silvia Monica Feraru. "A Study on Speech with Manifest Emotions" (http://www.springerlink.com/content/lm172787m377556p/). In: Lecture Notes in Computer Science, Springer Berlin, Heidelberg. ISSN 0302-9743, Volume 4629/2007, Text, Speech and Dialogue. Pages 254-261.

Further reading
Nolte, John. The Human Brain 6th Edition

External links
Lessons in Prosody (University of Freiburg) (http://paul.igl.uni-freiburg.de/lip/)
Prosody on the Web - a tutorial on prosody (http://www.eptotd.btinternet.co.uk/pow/powin.htm)


Tone (linguistics)
Tone is the use of pitch in language to distinguish lexical or grammatical meaning, that is, to distinguish or inflect words. All verbal languages use pitch to express emotional and other paralinguistic information, and to convey emphasis, contrast, and other such features in what is called intonation, but not all languages use tones to distinguish words or their inflections, analogously to consonants and vowels. Such tonal phonemes are sometimes called tonemes.

In the most widely spoken tonal language, Chinese, tones are distinguished by their shape (contour), most syllables carry their own tone, and many words are differentiated solely by tone. Moreover, tone plays little role in modern Chinese grammar, though the tones descend from features in Old Chinese that did have morphological significance. In many tonal African languages, such as most Bantu languages, however, tones are distinguished by their relative level, words are longer, there are fewer minimal tone pairs, and a single tone may be carried by the entire word, rather than a different tone on each syllable. Often grammatical information, such as past versus present, "I" versus "you", or positive versus negative, is conveyed solely by tone.

Many languages use tone in a more limited way. Somali, for example, may only have one high tone per word. In Japanese, fewer than half of the words have a drop in pitch; words contrast according to which syllable this drop follows. Such minimal systems are sometimes called pitch accent, since they are reminiscent of stress-accent languages, which typically allow one principal stressed syllable per word. However, there is debate over the definition of pitch accent, and whether a coherent definition is even possible.

Tonal languages
Most languages of sub-Saharan Africa are tonal, notable exceptions being Swahili in the east and Wolof and Fulani in the west. The Chadic, Omotic, and to some extent Cushitic branches of Afroasiatic are tonal (the Omotic languages heavily so), though their sister families of Semitic, Berber, and Egyptian are not. There are numerous tonal languages in East Asia, including all the Chinese languages (though some, such as Shanghainese, are only marginally tonal), Vietnamese, Thai, and Lao. Some East Asian languages, such as Burmese, Korean, and Japanese, have simpler tone systems, which are sometimes called 'register' or 'pitch accent' systems. However, some languages in the region are not tonal at all, including Mongolian, Khmer, and Malay. Of the Tibetan languages, Central Tibetan (including the dialect of the capital Lhasa) and Amdo Tibetan are tonal, while Khams Tibetan and Ladakhi are not.

Some of the native languages of North and South America are tonal, notably many of the Athabaskan languages of Alaska and the American Southwest (including Navajo), and especially the Oto-Manguean languages of Mexico. Among the Mayan languages, which are mostly non-tonal, Yucatec (with the largest number of speakers), Uspantek, and one dialect of Tzotzil have developed simple tone systems. In Europe, Norwegian, Swedish, Latvian, Lithuanian, Serbo-Croatian, some dialects of Slovene, and Limburgish have simple tone systems generally characterized as pitch accent. Other Indo-European tonal languages, spoken in the Indian subcontinent, are Punjabi, Lahanda, Rabinian and Western Pahari.[1] [2] [3] [4]

Languages that are tonal include:

Some of the Sino-Tibetan languages, including the numerically most important ones. Most forms of Chinese are strongly tonal (an exception being Shanghainese, where the system has collapsed to only a two-way contrast at the word level with some initial consonants, and no contrast at all with others), while some of the Tibetan languages, including the standard languages of Lhasa and Bhutan, and Burmese, are more marginally tonal. However, Nepal Bhasa, the original language of Kathmandu, is non-tonal, as are several Tibetan dialects and many other Tibeto-Burman languages.

In the Austro-Asiatic family, Vietnamese and its closest relatives are strongly tonal. Other languages of this family, such as Mon, Khmer, and the Munda languages, are non-tonal.

The entire Kradai family, spoken mainly in China, Vietnam, Thailand, and Laos, is strongly tonal.

The entire Hmong-Mien language family is strongly tonal.

Many Afroasiatic languages in the Chadic, Cushitic and Omotic families have register-tone systems, such as Chadic Hausa. Many of the Omotic tone systems are quite complex. However, many other languages in these families, such as the Cushitic language Somali, have minimal tone.

The vast majority of Niger-Congo languages, such as Ewe, Igbo, Lingala, Maninka, Yoruba, and Zulu, have register-tone systems. The Kru languages have contour tones. Notable non-tonal Niger-Congo languages are Swahili, Fula, and Wolof.

Possibly all Nilo-Saharan languages have register-tone systems.

All Khoisan languages in southern Africa have contour-tone systems.

Slightly more than half of the Athabaskan languages, such as Navajo, have simple register-tone systems (languages in California, Oregon and a few in Alaska excluded), but the languages that have tone fall into two groups that are mirror images of each other. That is, a word which has a high tone in one language will have a cognate with a low tone in another, and vice versa.

Iroquoian languages are tonal; for example, the Mohawk language has three tones.

All Oto-Manguean languages are tonal. Most have register-tone systems, some contour systems. These are perhaps the most complex tone systems in North America.

The Kiowa-Tanoan languages.

Scattered languages of the Amazon basin, usually with rather simple register-tone systems.

Scattered languages of New Guinea, usually with rather simple register-tone systems.

A few Indo-European languages, namely Panjabi, Ancient Greek, Vedic Sanskrit, Swedish, Norwegian, Limburgish, Lithuanian, and West South Slavic languages (Slovene, Croatian and Serbian), have limited word-tone systems which are sometimes called pitch accent or "tonal accents". Generally there can only be at most one tonic syllable per word, of 2-5 different registers, as well as additional distinctive and non-distinctive pre- and post-tonic lengths.

Some European-based creole languages, such as Saramaccan and Papiamentu, have tone from their African substratum languages.

The vast majority of Austronesian languages are non-tonal, but a small number, for example Ma'ya (which also has lexical stress), have developed tone.

No tonal language has been reported from Australia.

In some cases it is difficult to determine whether a language is tonal. For example, the Ket language has been described as having up to eight tones by some investigators, as having four tones by others, but by some as having no tone at all. In cases such as these, the classification of a language as tonal may depend on the researcher's interpretation of what tone is. For instance, the Burmese language has phonetic tone, but each of its three tones is accompanied by a distinctive phonation (creaky, murmured or plain vowels). It could be argued either that the tone is incidental to the phonation, in which case Burmese would not be phonemically tonal, or that the phonation is incidental to the tone, in which case it would be considered tonal. Something similar appears to be the case with Ket.



Mechanics
Most languages use pitch as intonation to convey prosody and pragmatics, but this does not make them tonal languages. In tonal languages, each syllable has an inherent pitch contour, and thus minimal pairs exist between syllables with the same segmental features but different tones. Here is a minimal tone set from Mandarin Chinese, which has five tones, here transcribed by diacritics over the vowels:

1. A high level tone (pinyin ā)
2. A tone starting with mid pitch and rising to a high pitch (pinyin á)
3. A low tone which dips briefly before, if there is no following syllable, rising to a high pitch (pinyin ǎ)
4. A sharply falling tone, starting high and falling to the bottom of the speaker's vocal range (pinyin à)
5. A neutral tone, sometimes indicated by a dot (·) in Pinyin, has no specific contour; its pitch depends on the tones of the preceding and following syllables. Mandarin speakers refer to this tone as the "light tone" (simplified Chinese: 轻声; traditional Chinese: 輕聲; pinyin: qīngshēng), also called the "fifth tone", "zeroth tone", or "neutral tone". Note, however, that in Mandarin the occurrence of this tone on single-syllable words is marginal, and furthermore it only occurs with grammatical syllables. In disyllabic words, there is a strong tendency in modern Mandarin for the second syllable to be pronounced with a light tone.

These tones combine with a syllable such as "ma" to produce different words. A minimal set based on "ma", in pinyin transcription, is:

1. mā "mother"
2. má "hemp"
3. mǎ "horse"
4. mà "scold"
5. ma (an interrogative particle)

These may be combined into the rather contrived sentence 妈妈骂马的麻吗？ (traditional: 媽媽罵馬的麻嗎？), Pinyin: māma mà mǎ de má ma?, English: "Is Mother scolding the horse's hemp?" A well-known tongue-twister in the Thai language is /mǎi mài mâi mǎi/, "Does new silk burn?"[5] Tones can interact in complex ways through a process known as tone sandhi.

Register tones and contour tones


Tone systems fall into two broad patterns: register tone systems and contour tone systems. Most Chinese languages use contour tone systems, where the distinguishing feature of the tones is their shift in pitch (that is, the pitch is a contour), such as rising, falling, dipping, or level. Most Bantu languages, on the other hand, have register tone systems, where the distinguishing feature is the relative difference between the pitches, such as high, mid, or low, rather than their shapes. In many register tone systems there is a default tone, usually low in a two-tone system or mid in a three-tone system, that is more common and less salient than other tones. There are also languages that combine register and contour tones, such as many Kru languages, where nouns are distinguished by contour tones and verbs by register. Others, such as Yoruba, have phonetic contours, but these can easily be analysed as sequences of register tones, with, for example, sequences of high-low becoming falling and sequences of low-high becoming rising.


Register languages
The term "register", when not used in the phrase "register tone", commonly indicates vowel phonation combined with tone in a single phonological system. Burmese, for example, is a register language, where differences in pitch are so intertwined with vowel phonation that neither can be considered without the other.

Tone terracing and tone sandhi


Tone terracing
Tones are realized as pitch only in a relative sense. 'High tone' and 'low tone' are only meaningful relative to the speaker's vocal range and in comparing one syllable to the next, rather than as a contrast of absolute pitch such as one finds in music. As a result, when one combines tone with sentence prosody, the absolute pitch of a high tone at the end of a prosodic unit may be lower than that of a low tone at the beginning of the unit, because of the universal tendency (in both tonal and non-tonal languages) for pitch to decrease with time in a process called downdrift. Tones may affect each other just as consonants and vowels do. In many register-tone languages, low tones may cause a downstep in following high or mid tones; the effect is such that even while the low tones remain at the lower end of the speaker's vocal range (which is itself descending due to downdrift), the high tones drop incrementally like steps in a stairway or terraced rice fields, until finally the tones merge and the system has to be reset. This effect is called tone terracing. Sometimes a tone may remain as the sole realization of a grammatical particle after the original consonant and vowel disappear, so it can only be heard by its effect on other tones. It may cause downstep, or it may combine with other tones to form contours. These are called floating tones.

Tone sandhi
In many contour-tone languages, one tone may affect the shape of an adjacent tone. The affected tone may become something new, a tone that only occurs in such situations, or it may be changed into a different existing tone. This is called tone sandhi. In Mandarin Chinese, for example, a dipping tone between two other tones is reduced to a simple low tone, which otherwise does not occur in Mandarin, whereas if two dipping tones occur in a row, the first becomes a rising tone, indistinguishable from other rising tones in the language. For example, the words hěn "very" and hǎo "good", both carrying the dipping tone, produce the phrase hén hǎo "very good", with the first syllable realized with a rising tone.
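The Mandarin rule just described can be stated as a small rewriting procedure over tone numbers: a third (dipping) tone becomes a second (rising) tone before another third tone. The sketch below applies that rule to a sequence of numbered pinyin syllables; it ignores the further "half-third" reduction and neutral tones, and the representation (syllables tagged with tone digits) is chosen purely for illustration.

```python
def third_tone_sandhi(syllables):
    """Apply Mandarin third-tone sandhi: tone 3 becomes tone 2 before another tone 3.
    Syllables are (pinyin, tone_number) pairs, e.g. ("hen", 3)."""
    out = list(syllables)
    # Scan right-to-left so only syllables immediately before a remaining
    # third tone are changed.
    for i in range(len(out) - 2, -1, -1):
        syl, tone = out[i]
        if tone == 3 and out[i + 1][1] == 3:
            out[i] = (syl, 2)
    return out

print(third_tone_sandhi([("hen", 3), ("hao", 3)]))   # [('hen', 2), ('hao', 3)]
```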

Word tones and syllable tones


Another difference between tonal languages is whether the tones apply independently to each syllable or to the word as a whole. In Cantonese, Thai, and to some extent the Kru languages, each syllable may have any tone, whereas in Shanghainese, the Scandinavian languages, and many Bantu languages, the contour of each tone operates at the word level. That is, a trisyllabic word in a three-tone syllable-tone language has many more tonal possibilities (3 × 3 × 3 = 27) than a monosyllabic word (3), but there is no such difference in a word-tone language. For example, Shanghainese has two contrastive tones no matter how many syllables are in a word. Many languages described as having pitch accent are word-tone languages.

Tone sandhi is an intermediate situation, as tones are carried by individual syllables but affect each other, so that they are not independent of one another. For example, a number of Mandarin suffixes and grammatical particles have what is called (when describing Mandarin) a "neutral" tone, which has no independent existence. If a syllable with a neutral tone is added to a syllable with a full tone, the pitch contour of the resulting word is entirely determined by that other syllable:


Realization of neutral tones in Mandarin


Tone in isolation    Tone pattern with added "neutral tone"    Pinyin    English meaning
high (Tone 1)        high, then mid                            bōli      glass
rising (Tone 2)      rising, then mid                          bóbo      elder uncle
dipping (Tone 3)     low, then high (contour spreads)          lǎba      horn
falling (Tone 4)     falling, then low                         tùzi      rabbit

After the high level and high rising tones, the neutral syllable has an independent pitch that looks like a mid-register tone (the default tone in most register-tone languages). However, after a falling tone it takes on a low pitch; the contour tone remains on the first syllable, but the pitch of the second syllable matches where the contour leaves off. And after a low-dipping tone, the contour spreads to the second syllable: the overall contour remains the same whether the word has one syllable or two. In other words, the tone is now the property of the word, not the syllable. Shanghainese has taken this pattern to its extreme, as the pitches of all syllables are determined by the tone before them, so that only the tone of the initial syllable of a word is distinctive.
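The behaviour just described is essentially a lookup keyed on the preceding full tone. A rough sketch, with pitch labels that merely paraphrase the description above rather than giving measured values:

    # Informal restatement of how a Mandarin neutral-tone syllable is realized,
    # keyed by the full tone of the preceding syllable (per the prose above).
    NEUTRAL_TONE_REALIZATION = {
        "high level (1)": "mid pitch (independent, mid-register-like)",
        "rising (2)":     "mid pitch (independent, mid-register-like)",
        "dipping (3)":    "high pitch (the dipping contour spreads onto it)",
        "falling (4)":    "low pitch (picks up where the fall leaves off)",
    }

    for full_tone, neutral in NEUTRAL_TONE_REALIZATION.items():
        print(f"after {full_tone}: neutral syllable -> {neutral}")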

Tonal polarity
Languages with simple tone systems or pitch accent may have one or two syllables specified for tone, with the rest of the word taking a default tone. Such languages differ in which tone is marked and which is the default. In Navajo, for example, syllables have a low tone by default, while marked syllables have high tone. In the related language Sekani, however, the default is high tone, and marked syllables have low tone.[6] There are parallels with stress: English stressed syllables have a higher pitch than unstressed syllables, whereas in Russian, stressed syllables have a lower pitch.

Uses of tone
In East Asia, tone is typically lexical. This is characteristic of heavily tonal languages such as Chinese, Vietnamese, Thai, and Hmong. That is, tone is used to distinguish words which would otherwise be homonyms, rather than in the grammar, though some Yue Chinese dialects have minimal grammatical use of tone. However, in many African languages, especially in the Niger-Congo family, tone is crucial to the grammar, with relatively little lexical use. In the Kru languages, a combination of these patterns is found: nouns tend to have complex tone systems reminiscent of East Asia, but are not much affected by grammatical inflections, whereas verbs tend to have simple tone systems of the type more typical of Africa, which are inflected to indicate tense and mood, person, and polarity, so that tone may be the only distinguishing feature between 'you went' and 'I won't go'. In colloquial Yoruba, especially when spoken quickly, vowels may assimilate to each other, and consonants elide, so that much of the lexical and grammatical information is carried by tone. In languages of West Africa such as Yoruba, people may even communicate with so-called "talking drums", which are modulated to imitate the tones of the language, or by whistling the tones of speech.


Phonetic notation
There are three main approaches to notating tones in phonetic descriptions of a language. The easiest from a typological perspective is a numbering system, with the pitch levels assigned numerals and each tone transcribed as a numeral or sequence of numerals. Such systems tend to be idiosyncratic, for example with high tone being assigned the numeral 1, 3, or 5, and so have not been adopted for the International Phonetic Alphabet. Also simple, for simple tone systems, is a series of diacritics, such as ⟨á⟩ for high tone and ⟨à⟩ for low tone. This has been adopted by the IPA, but is not easy to adapt to complex contour tone systems (see under Asia below for one work-around). The five IPA diacritics for level tones are ⟨ő ó ō ò ȍ⟩ (extra-high, high, mid, low, extra-low). These may be combined to form contour tones, such as ⟨ô⟩ (falling) and ⟨ǒ⟩ (rising), though font support for the rarer combinations is sparse. Sometimes a non-IPA vertical diacritic is seen for a second, higher mid tone, ⟨o̍⟩, so that a language with four level tones may be transcribed ⟨ó o̍ ō ò⟩. The most flexible system is that of tone letters, which are iconic schematics of the pitch trace of the tone in question. They are most commonly used for complex contour systems, as in Liberia and southern China.

Africa
In African linguistics (as well as in many African orthographies), usually a set of accent marks is used to mark tone. The most common phonetic set (which is also included in the International Phonetic Alphabet) is found below:
High tone    acute accent    ⟨á⟩
Mid tone     macron          ⟨ā⟩
Low tone     grave accent    ⟨à⟩

Several variations are found. In many three-tone languages, it is common to mark high and low tone as indicated above but to omit marking of the mid tone, e.g., má (high), ma (mid), mà (low). Similarly, in some two-tone languages, only one tone is marked explicitly. With more complex tonal systems, such as in the Kru and Omotic languages, it is usual to indicate tone with numbers, with 1 for HIGH and 4 or 5 for LOW in Kru, but 1 for LOW and 5 for HIGH in Omotic. Contour tones are then indicated 14, 21, etc.

Asia
In the Chinese tradition, numerals are assigned to various tones. For instance, Standard Mandarin has four lexically contrastive tones, and the numerals 1, 2, 3, and 4 are assigned to them. Syllables can sometimes be toneless and are described as having a neutral tone, typically indicated by omitting tone markings. Chinese dialects are traditionally described in terms of four tonal categories: píng 'level', shǎng 'rising', qù 'exiting', and rù 'entering'. Depending on the dialect, each of these categories may then be divided into two tones, typically called yin and yang. Syllables carrying the rù tones are closed by voiceless stops in all Chinese dialects, so that rù is not a tonal category in the sense used by Western linguistics but rather a category of syllable structures; Chinese phonologists perceived these checked syllables as having concomitant short tones, justifying them as a tonal category. During the period of Middle Chinese, when the tonal categories were established, the shǎng and qù tones also had characteristic final obstruents with concomitant tonic differences, whereas syllables bearing the píng tone ended in a simple sonorant. An alternative to using the Chinese category names is to assign each category a numeral ranging from 1 to 8, or sometimes higher for dialects with additional tone splits. Syllables belonging to the same tone category can differ drastically in actual phonetic tone across the Chinese dialects; for example, the yin píng tone is a high level tone in Beijing Mandarin but a low level tone in Tianjin Mandarin.
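The 1–8 numbering mentioned above can be generated mechanically from the four traditional categories and the yin/yang split. A small sketch, assuming the common convention of numbering yin before yang within each category (descriptions of particular dialects may order the numbers differently):

    # The four traditional Chinese tone categories, each split into a yin and
    # a yang register, give the 1-8 numbering mentioned above.
    CATEGORIES = ["ping (level)", "shang (rising)", "qu (exiting)", "ru (entering)"]
    REGISTERS = ["yin", "yang"]

    tone_numbers = {}
    number = 1
    for category in CATEGORIES:
        for register in REGISTERS:
            tone_numbers[number] = f"{register} {category}"
            number += 1

    for num, name in tone_numbers.items():
        print(num, name)   # 1 yin ping, 2 yang ping, ..., 8 yang ru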

A more iconic system uses tone numbers, or an equivalent set of graphic pictograms known as 'Chao tone letters'. These divide the pitch into five levels, with the lowest being assigned the value 1 and the highest the value 5. (This is the opposite of the equivalent systems in Africa and the Americas.) The variation in pitch of a tone contour is notated as a string of two or three numbers. For instance, the four Mandarin tones are transcribed as follows (note that the tone letters will not display properly unless a compatible font is installed):


Tones of Standard Mandarin


High tone           55     (Tone 1)
Mid rising tone     35     (Tone 2)
Low dipping tone    214    (Tone 3)
High falling tone   51     (Tone 4)
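Read as Chao-style numerals, the table above is simply a mapping from citation tone to a pitch-contour string (1 = lowest level, 5 = highest). A brief sketch; the function name and the verbal classification are this sketch's own:

    # Chao-style contour numerals for the four Standard Mandarin citation tones,
    # as given in the table above.
    MANDARIN_CONTOURS = {1: "55", 2: "35", 3: "214", 4: "51"}

    def describe(tone_number):
        """Return a rough verbal description of a contour string."""
        contour = MANDARIN_CONTOURS[tone_number]
        levels = [int(c) for c in contour]
        if len(levels) == 3 and levels[1] < levels[0] and levels[1] < levels[2]:
            shape = "dipping"
        elif levels[0] == levels[-1]:
            shape = "level"
        elif levels[0] < levels[-1]:
            shape = "rising"
        else:
            shape = "falling"
        return f"tone {tone_number}: {contour} ({shape})"

    for n in MANDARIN_CONTOURS:
        print(describe(n))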

A mid-level tone would be indicated by /33/, a low level tone by /11/, and so on. Standard IPA notation is also sometimes seen for Chinese. One reason it is not more widespread is that only two contour-tone diacritics, rising ⟨ǒ⟩ and falling ⟨ô⟩, are widely supported by IPA fonts, while several Chinese languages have more than one rising or falling tone. One common work-around is to retain the standard IPA ⟨ǒ⟩ and ⟨ô⟩ for the high-rising (/35/) and high-falling (/53/) tones, and to use the subscript diacritics ⟨o̗⟩ and ⟨o̖⟩ for the low-rising (/13/) and low-falling (/31/) tones. The Thai language has five tones: high, mid, low, rising and falling. The Thai written script is an alphasyllabary which specifies the tone unambiguously: tone is indicated by an interaction of the initial consonant of a syllable, the vowel, the final consonant (if present), and sometimes a tone mark, and a particular tone mark may denote different tones depending on the initial consonant. Vietnamese uses the Latin alphabet, and its six tones are marked by diacritics above or below a certain vowel of each syllable. (In many words that end in diphthongs, exactly which vowel is marked is still debatable.) Notation for the Vietnamese tones is as follows:

Tones of northern Vietnamese


Name     Contour            Diacritic       Example
ngang    mid level          not marked      a
huyền    low falling        grave accent    à
sắc      high rising        acute accent    á
hỏi      dipping            hook            ả
ngã      creaky rising      tilde           ã
nặng     creaky falling     dot below       ạ

The Latin-based Hmong and Iu Mien alphabets use full letters for tones. In Hmong, one of the eight tones (the mid tone) is left unwritten, while the other seven are indicated by the letters b, m, d, j, v, s, g at the end of the syllable. Since Hmong has no phonemic syllable-final consonants, there is no ambiguity. This system enables Hmong speakers to type their language with an ordinary Latin-letter typewriter without having to resort to diacritics. In Iu Mien, the letters v, c, h, x, z indicate tones, but unlike Hmong it also has final consonants, which are written before the tone letter. The Japanese language does not have tone, but it does have downstep, so that ame 'rain', with a drop in pitch after the first syllable, is distinguished from ame 'candy', which has no drop.


Americas
Several North American languages have tone, one of which is Oklahoma Cherokee, said to be the most musical of the Iroquoian languages. Cherokee has six tones (1 low, 2 medium, 3 high, 4 very high, 23 rising and 32 falling). In Mesoamericanist linguistics, /1/ stands for high tone and /5/ stands for low tone, except in Oto-Manguean languages, where /1/ may be low tone and /3/ high tone. It is also common to see acute accents for high tone and grave accents for low tone, with combinations of these for contour tones. Several popular orthographies use j or h after a vowel to indicate low tone.

Southern Athabascan languages, which include the Navajo and Apache languages, are tonal and are analyzed as having two tones, high and low. One variety of Hopi has developed tone, as has the Cheyenne language. The Mesoamerican language stock called Oto-Manguean is famously tonal and is the largest language family in Mesoamerica, containing languages such as Zapotec, Mixtec, and Otomí, some of which have as many as eight different tones (Chinantec) and others only two (Matlatzinca and Chichimeca Jonaz). Other languages in Mesoamerica that have tone are Huichol, Yukatek Maya, the Tzotzil Maya of San Bartolo, Uspantec Maya (the Quiché of Uspantán), and one variety of Huave.

A number of languages of South America are tonal. For example, the Pirahã language has three tones. The Ticuna language isolate is exceptional in having five level tones; the only other languages with such a system are the Trique language and the Usila dialect of Chinantec, both Oto-Manguean languages of Mexico.

Europe
Both Swedish and Norwegian have simple word-tone systems, often called pitch accent, which appear only in words of two or more syllables. These differentiate some two-syllable words depending on their morphological structure. The two word tones are usually called accent 1 and accent 2 (or acute accent and grave accent), respectively. In Limburgish, tones can also occur in words of a single syllable: daag 'one day' contrasts with daag 'several days', the two being distinguished only by their tone.

Practical orthographies
In practical alphabetic orthographies, a number of approaches are used. Diacritics are common, as in pinyin, though these tend to be omitted.[7] Thai uses a combination of redundant consonants and diacritics. Tone letters may also be used, for example in Hmong RPA and several minority languages in China. Or tone may simply be ignored. This is possible even for highly tonal languages: for example, the Chinese navy has successfully used toneless pinyin in government telegraph communications for decades, and likewise Chinese reporters abroad may file their stories in toneless pinyin. Dungan, a variety of Mandarin spoken in Central Asia, has, since 1927, been written in orthographies that do not indicate tone.[7] Ndjuka, where tone is less important, ignores tone except for a negative marker. However, the reverse is also true: in the Congo, there have been complaints from readers that newspapers written in orthographies without tone marking are insufficiently legible.


Number of tones
Languages may distinguish up to five levels of pitch, though the Chori language of Nigeria is described as distinguishing six surface tone registers. Since tone contours may involve up to two shifts in pitch, there are theoretically 5 × 5 × 5 = 125 distinct tones for a language with five registers. However, the most that are actually used in a language is a tenth of that number. Several Kam-Sui languages of southern China have nine tones, including contour tones, assuming that checked syllables are not counted as having additional tones, as they traditionally are in China. Preliminary work on the Wobé language of Liberia and Ivory Coast and the Chatino languages of southern Mexico suggests that some dialects may distinguish as many as fourteen tones, but many linguists have expressed doubts, believing that many of these will turn out to be sequences of tones or prosodic effects.
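The 125-contour ceiling is just the count of three-slot sequences drawn from five pitch levels, with level tones written as repeated levels. A short sketch of that enumeration:

    from itertools import product

    LEVELS = range(1, 6)   # five pitch levels, 1 = lowest, 5 = highest (Chao-style)

    # A contour with at most two shifts in pitch can be written as a sequence of
    # three target levels (a level tone repeats its level, e.g. 5-5-5), so the
    # theoretical ceiling for a five-level language is 5**3 distinct shapes.
    contours = list(product(LEVELS, repeat=3))
    print(len(contours))       # 125
    print(contours[:3])        # [(1, 1, 1), (1, 1, 2), (1, 1, 3)]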

Tonal consonants
Tone is carried by the word or syllable, so syllabic consonants such as nasals and trills may bear tone. This is especially common with syllabic nasals, for example in many Bantu and Kru languages, but also occurs in Serbo-Croatian and Yorùbá.

Origin

André-Georges Haudricourt established that Vietnamese tone originated in earlier consonantal contrasts, and suggested similar mechanisms for Chinese.[8] It is by now well established that Old Chinese did not have phonemically contrastive tone. The historical origin of tone is called tonogenesis, a term coined by James Matisoff.

Tone is frequently an areal rather than a genealogical feature. That is, a language may acquire tones through bilingualism if influential neighboring languages are tonal, or if speakers of a tonal language shift to the language in question and bring their tones with them. In other cases, tone may arise spontaneously, and surprisingly quickly: the dialect of Cherokee in Oklahoma has tone, but the dialect in North Carolina does not, although the two were only separated in 1838.

Very often, tone arises as an effect of the loss or merger of consonants. (Such trace effects of disappeared tones or other sounds have been nicknamed Cheshirisation, after the lingering smile of the disappearing Cheshire Cat in Alice in Wonderland.) In a non-tonal language, voiced consonants commonly cause following vowels to be pronounced at a lower pitch than other consonants do. This is usually a minor phonetic detail of voicing. However, if consonant voicing is subsequently lost, that incidental pitch difference may be left over to carry the distinction that the voicing had carried, and thus becomes meaningful (phonemic). This can be seen historically in Panjabi: the Panjabi murmured (voiced aspirate) consonants have disappeared and left tone in their wake. If the murmured consonant was at the beginning of a word, it left behind a low tone; if at the end, a high tone. If there was no such consonant, the pitch was unaffected; however, the unaffected words are limited in pitch so as not to interfere with the low and high tones, and this restricted pitch has become a tone of its own: the mid tone. The historical connection is so regular that Panjabi is still written as if it had murmured consonants, and tone is not marked: the written consonants tell the reader which tone to use.
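The Panjabi development described above is regular enough to state as a small rule. A sketch under the assumptions in the preceding paragraph; the function name and labels are invented for illustration:

    # Sketch of the Panjabi tonogenesis rule described above: a lost murmured
    # (voiced aspirate) consonant leaves a low tone if it was word-initial and a
    # high tone if it was word-final; words that never had one take the mid tone.
    def panjabi_tone(had_murmured_consonant, position=None):
        """position is 'initial' or 'final' when had_murmured_consonant is True."""
        if not had_murmured_consonant:
            return "mid"
        return "low" if position == "initial" else "high"

    print(panjabi_tone(False))              # mid
    print(panjabi_tone(True, "initial"))    # low
    print(panjabi_tone(True, "final"))      # high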

Similarly, final fricatives or other consonants may phonetically affect the pitch of preceding vowels, and if they then weaken to /h/ and finally disappear completely, the difference in pitch, now a true difference in tone, carries on in their stead. This was the case with the Chinese languages: two of the three tones of Middle Chinese, the "rising" and "leaving" tones, arose as the Old Chinese final consonants /ʔ/ and /s/ (later /h/) disappeared, while syllables that ended in neither of these consonants were interpreted as carrying the third tone, "even".

Most dialects descending from Middle Chinese were further affected by a tone split, in which each tone divided in two depending on whether the initial consonant was voiced: vowels following an unvoiced consonant acquired a higher tone, while those following a voiced consonant acquired a lower tone as the voiced consonants lost their distinctiveness. The same changes affected many other languages in the same area, and at around the same time (AD 1000–1500); the tone split also occurred, for example, in Thai, Vietnamese, and the Lhasa dialect of Tibetan. In general, voiced initial consonants lead to low tones, while vowels after aspirated consonants acquire a high tone. When final consonants are lost, a glottal stop tends to leave a preceding vowel with a high or rising tone (although glottalized vowels tend to be low tone, so if the glottal stop causes vowel glottalization, that will tend to leave behind a low vowel), whereas a final fricative tends to leave a preceding vowel with a low or falling tone. Vowel phonation also frequently develops into tone, as can be seen in the case of Burmese.

Tone arose in the Athabascan languages at least twice, in a patchwork of two systems. In some languages, such as Navajo, syllables with glottalized consonants (including glottal stops) in the syllable coda developed low tones, whereas in others, such as Slavey, they developed high tones, so that the two tonal systems are almost mirror images of each other. Syllables without glottalized codas developed the opposite tone, for example high tone in Navajo and low tone in Slavey, due to contrast with the tone triggered by the glottalization. Other Athabascan languages, namely those in western Alaska (such as Koyukon) and on the Pacific coast (such as Hupa), did not develop tone. Thus, the Proto-Athabascan word for 'water', *tu, is toneless to in Hupa, high-tone tó in Navajo, and low-tone tù in Slavey, while the Proto-Athabascan word for 'knee' is likewise toneless in Hupa, low-tone in Navajo, and high-tone in Slavey. Kingston (2005) provides a phonetic explanation for the opposite development of tone based on the two different ways of producing glottalized consonants: with either (a) tense voice on the preceding vowel, which tends to produce a high F0, or (b) creaky voice, which tends to produce a low F0. Languages with "stiff" glottalized consonants and tense voice developed high tone on the preceding vowel, and those with "slack" glottalized consonants and creaky voice developed low tone.

The Bantu languages also have "mirror" tone systems, in which the languages in the northwest corner of the Bantu area have the opposite tones of other Bantu languages.

Three Algonquian languages developed tone independently of one another and of neighboring languages: Cheyenne, Arapaho, and Kickapoo. In Cheyenne, tone arose via vowel contraction: the long vowels of Proto-Algonquian contracted into high-pitched vowels in Cheyenne, while the short vowels became low-pitched. In Kickapoo, a vowel followed by [h] acquired a low tone, and this tone later extended to all vowels followed by a fricative.


Bibliography
Bao, Zhiming (1999). The Structure of Tone. New York: Oxford University Press. ISBN 0-19-511880-4.
Chen, Matthew Y. (2000). Tone Sandhi: Patterns across Chinese Dialects. Cambridge: Cambridge University Press. ISBN 0-521-65272-3.
Clements, George N.; Goldsmith, John (eds.) (1984). Autosegmental Studies in Bantu Tone. Berlin: Mouton de Gruyter.
Fromkin, Victoria A. (ed.) (1978). Tone: A Linguistic Survey. New York: Academic Press.
Halle, Morris; Stevens, Kenneth (1971). A note on laryngeal features. Quarterly Progress Report 101. MIT.
Haudricourt, André-Georges (1954). De l'origine des tons en vietnamien. Journal Asiatique, 242: 69–82.
Haudricourt, André-Georges (1961). Bipartition et tripartition des systèmes de tons dans quelques langues d'Extrême-Orient. Bulletin de la Société de Linguistique de Paris, 56: 163–180.
Hombert, Jean-Marie; Ohala, John J.; Ewan, William G. (1979). Phonetic explanations for the development of tones. Language, 55: 37–58.
Hyman, Larry (2007). There is no pitch-accent prototype. Paper presented at the 2007 LSA Meeting, Anaheim, CA.
Hyman, Larry (2007). How (not) to do phonological typology: the case of pitch-accent. UC Berkeley Phonology Lab Annual Report: 654–685. Available online.[9]
Kingston, John (2005). The phonetics of Athabaskan tonogenesis. In S. Hargus & K. Rice (eds.), Athabaskan Prosody (pp. 137–184). Amsterdam: John Benjamins.
Maddieson, Ian (1978). Universals of tone. In J. H. Greenberg (ed.), Universals of Human Language: Phonology (Vol. 2). Stanford: Stanford University Press.
Michaud, Alexis (2008). Tones and intonation: some current challenges. Proceedings of the 8th International Seminar on Speech Production (ISSP'08), Strasbourg, pp. 13–18. (Keynote lecture.) Available online.[10]
Odden, David (1995). Tone: African languages. In J. Goldsmith (ed.), Handbook of Phonological Theory. Oxford: Basil Blackwell.
Pike, Kenneth L. (1948). Tone Languages: A Technique for Determining the Number and Type of Pitch Contrasts in a Language, with Studies in Tonemic Substitution and Fusion. Ann Arbor: University of Michigan Press. (Reprinted 1972, ISBN 0-472-08734-7.)
Wee, Lian-Hee (2008). Phonological patterns in the Englishes of Singapore and Hong Kong. World Englishes, 27(3/4): 480–501.
Yip, Moira (2002). Tone. Cambridge Textbooks in Linguistics. Cambridge: Cambridge University Press. ISBN 0-521-77314-8 (hbk), ISBN 0-521-77445-4 (pbk).

References
[1] Barbara Lust, James Gair. Lexical Anaphors and Pronouns in Selected South Asian Languages. Walter de Gruyter, 1999. ISBN 9783110143881. Page 637.
[2] http://www.omniglot.com/writing/gurmuki.htm
[3] Phonemic Inventory of Punjabi (http://www.crulp.org/Publication/Crulp_report/CR02_21E.pdf)
[4] Geeti Sen. Crossing Boundaries. Orient Blackswan, 1997. ISBN 9788125013419. Page 132. Quote: "Possibly, Punjabi is the only major South Asian language that has this kind of tonal character. There does seem to have been some speculation among scholars about the possible origin of Punjabi's tone-language character but without any final and convincing answer."
[5] Tones change over time, but may retain their original spelling. The Thai spelling of the final word in the tongue-twister indicates a rising tone, but the word is now commonly pronounced with a high tone; a new spelling reflecting this pronunciation is therefore occasionally seen.
[6] Kingston, John (2004). "The Phonetics of Athabaskan Tonogenesis" (http://people.umass.edu/jkingstn/web page/research/athabaskan tonogenesis camera ready final 21 october 04.pdf). Athabaskan Prosody. John Benjamins Press. pp. 131–179. Retrieved 2008-11-14.
[7] Implications of the Soviet Dungan Script for Chinese Language Reform (http://www.pinyin.info/readings/texts/dungan.html)
[8] The seminal references are the two Haudricourt articles published in 1954 and 1961.
[9] http://linguistics.berkeley.edu/phonlab/annual_report/documents/2007/Hyman_Pitch-Accent.pdf
[10] http://halshs.archives-ouvertes.fr/halshs-00325982/

External links
World map of tone languages (http://wals.info/feature/13?tg_format=map) The World Atlas of Language Structures Online

