
AN INTERACTIVE SPEECH CODING TOOL USING LABVIEW™

Karthikeyan N. Ramamurthy, Jayaraman J. Thiagarajan and Andreas Spanias
SenSIP Center, School of ECEE, Arizona State University, Tempe, AZ USA 85287-5706
ABSTRACT

Code Excited Linear Prediction (CELP) is a closed-loop analysis-by-synthesis speech coding algorithm that has been standardized in Federal Standard 1016. Variants of the CELP algorithm form the core of many speech coding standards that exist today. In this paper, we discuss the development of an interactive speech coding tool in National Instruments LabVIEW™ software for the Federal Standard 1016 CELP algorithm. A brief description of the speech coding algorithm and the features of the LabVIEW speech coding tool are presented, and illustrations demonstrating the use of the interactive tool in analyzing the speech coding algorithm are provided. The tool can be used to teach the various modules of CELP-based speech coders to undergraduate and graduate students.

Index Terms: Speech coding, LabVIEW, code excited linear prediction, interactive tools.
1. INTRODUCTION

Speech coding is concerned with compact digital representations of voice signals for the purpose of efficient transmission or storage [1-6]. Linear predictive coding is the core of many speech coding standards that exist today [7,8]. It relies on the source-system model of speech production, which is inspired by the human speech production mechanism: voiced speech is produced by exciting the vocal tract filter with periodic impulses, and unvoiced speech is generated using random, pseudo-white noise excitation. The vocal tract is usually represented by a tenth-order digital all-pole filter. This source-system analysis-synthesis model is used in most standardized algorithms; in fact, the Levinson-Durbin linear prediction algorithm is embedded in every cell phone. Closed-loop source-system encoders use linear prediction (LP) along with an excitation determined by closed-loop analysis-by-synthesis (A-by-S) optimization: the excitation sequence that minimizes the perceptually weighted (PW) mean-square error (MSE) between the input speech and the reconstructed speech is chosen as the optimal one [9]. In the CELP algorithm [10,11], the excitation sequences are stored in two codebooks, and the codebook indices are chosen during the PW MSE minimization process.

The adaptive codebook (ACB) models the periodic (pitch) component of the excitation through the long-term predictor (LTP), and the stochastic codebook (SCB) models its random component. Other components of a generic CELP encoder include autocorrelation analysis, linear prediction, and line spectral pair (LSP) computation. The CELP decoder is contained within the encoder, since the encoder must synthesize speech in order to evaluate each candidate excitation. A generic CELP encoder is illustrated in Figure 1.

LabVIEW™ [12] was chosen as the programming environment for implementing the CELP algorithm because it has a rich set of signal processing and visualization functions as well as real-time signal acquisition capabilities. Implementation of speech coding algorithms involves integrating software and hardware components, which can be easily performed with LabVIEW. The graphical programming approach enables users to easily visualize and understand the basic blocks of the speech analysis-synthesis procedure. The speech coding tool is also scalable, in the sense that additional options and capabilities can be added.

In this paper, we extend the work published in [13] and discuss the implementation of the Federal Standard 1016 (FS-1016) version of the CELP speech coder in LabVIEW™. Our main goal is to introduce and demonstrate the concepts of speech coding to students and to enhance their learning experience using an interactive visual interface. We chose the CELP coder because it connects to several concepts covered in DSP classes, including digital filter theory, estimation of periodicity, autocorrelation computation and filter stability. Exercises will be developed that expose students to the non-stationarity of the speech signal, the all-pole spectral modeling performed by LP analysis-synthesis, and the distortion caused by quantization of the LSP parameters. The tool can be used along with the books [4,10], which contain demonstrations of the FS-1016 algorithm and MATLAB-based exercises.
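As a minimal sketch of the LP analysis step mentioned above (a tenth-order all-pole vocal tract model fitted with the Levinson-Durbin recursion), the Python code below computes a frame's autocorrelation and solves the LP normal equations. It assumes the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and omits windowing and the other refinements a standardized coder may apply; the function names and the synthetic test frame are illustrative, not taken from the FS-1016 code.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns (a, k, err): the predictor polynomial (a[0] == 1), the
    reflection coefficients, and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the order-(i-1) prediction error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        # Order update: a_i[j] = a_{i-1}[j] + k_i * a_{i-1}[i - j].
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki  # error energy shrinks at every order
    return a, k, err


if __name__ == "__main__":
    # Hypothetical test frame: 240 samples (30 ms at 8 kHz) of white noise
    # shaped by a stable two-pole resonator, standing in for real speech.
    rng = random.Random(0)
    frame, y1, y2 = [], 0.0, 0.0
    for _ in range(240):
        y = rng.gauss(0.0, 1.0) + 1.3 * y1 - 0.8 * y2
        frame.append(y)
        y1, y2 = y, y1
    a, k, err = levinson_durbin(autocorrelation(frame, 10), 10)
    print("stable:", all(abs(ki) < 1.0 for ki in k))  # stable: True
```

With the autocorrelation method, every reflection coefficient has magnitude below one, so the resulting synthesis filter 1/A(z) is stable; this ties the recursion to the filter-stability topic listed above.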
The speech coding tool is of value not only to undergraduate and graduate students but also to DSP practitioners. After some simplifications, the tool can also be used in high school science classes to demonstrate the basic aspects of speech coding and transmission. Assessment instruments will be developed, and pre- and post-quizzes and interviews will be conducted with the students.
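The closed-loop A-by-S codebook selection described in the introduction can be illustrated with an exhaustive gain-shape search. In this hypothetical Python sketch, each code vector is passed through 1/A(z), the optimal gain g = <t, y>/<y, y> is computed in closed form, and the index that minimizes the remaining squared error ||t||^2 - <t, y>^2/<y, y> is kept. For brevity it assumes the target t is already perceptually weighted, uses the plain synthesis filter in place of the weighted one, and ignores the filter memory (zero-input response) that a real CELP coder removes from the target. The toy ternary codebook is illustrative only.

```python
def synthesize(excitation, a):
    """Zero-state output of the all-pole filter 1/A(z), where a[0] == 1."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for j in range(1, min(n, len(a) - 1) + 1):
            acc -= a[j] * out[n - j]
        out.append(acc)
    return out


def search_codebook(target, codebook, a):
    """Exhaustive gain-shape search; returns (index, gain, error).

    For a synthesized candidate y, ||target - g*y||^2 is minimized by
    g = <t, y>/<y, y>, which leaves ||t||^2 - <t, y>^2/<y, y>.
    """
    energy_t = sum(t * t for t in target)
    best_index, best_gain, best_err = -1, 0.0, float("inf")
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        cross = sum(t * v for t, v in zip(target, y))
        energy_y = sum(v * v for v in y)
        if energy_y == 0.0:
            continue  # an all-zero code vector cannot match anything
        err = energy_t - cross * cross / energy_y
        if err < best_err:
            best_index, best_gain, best_err = index, cross / energy_y, err
    return best_index, best_gain, best_err


if __name__ == "__main__":
    # Hypothetical toy setup: 8-sample ternary code vectors, A(z) = 1 - 0.9 z^-1.
    codebook = [
        [1, 0, -1, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, -1, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, -1, 1],
        [1, 1, 0, 0, 0, 0, 0, -1],
    ]
    a = [1.0, -0.9]
    target = [0.5 * v for v in synthesize(codebook[2], a)]
    index, gain, err = search_codebook(target, codebook, a)
    print(index, round(gain, 3))  # 2 0.5
```

Searching for the shape first and then using its closed-form gain is what makes the gain-shape structure cheap; the sparse ternary code vectors of a stochastic codebook further reduce the cost of each synthesis step.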

978-1-61284-227-1/11/$26.00 ©2011 IEEE


DSP/SPE 2011

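(Figure 1 diagram: the input speech and the synthetic speech ŝ(i) are each filtered by the PWF W(z); the synthetic speech is produced by scaling codebook excitation vectors x(i) by the gain g(i) and passing them through the LTP synthesis filter 1/AL(z) and the LP synthesis filter 1/A(z); the weighted residual error e(i) drives the MSE minimization.)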

Figure 1. Block diagram of a generic CELP encoder.

2. CELP BASED SPEECH CODING STANDARDS

The speech coding standards based on CELP are surveyed in this section. We divide the CELP-based algorithms into three categories according to the chronology of their development: first-generation CELP (1986-1992), second-generation CELP (1993-1998), and third-generation CELP (1999-present). A detailed description of the FS-1016 standard is also provided.

2.1. Survey of Speech Coders

The first-generation CELP algorithms are generally of high complexity and non-toll quality, and they operate at bit rates between 4.8 kb/s and 16 kb/s. First-generation CELP algorithms include the FS-1016 CELP, the IS-54 vector sum excited linear prediction (VSELP), the ITU-T G.728 low-delay CELP, and the IS-96 Qualcomm CELP. The newer second- and third-generation A-by-S coders have replaced most of these standardized CELP algorithms. The second-generation CELP algorithms are targeted at Internet audio streaming, voice-over-Internet-protocol (VoIP), teleconferencing applications, and secure communications. Second-generation CELP algorithms include the ITU-T G.723.1 dual-rate speech codec [14], the GSM EFR [15], the IS-127 Relaxed CELP (RCELP) [16], and the ITU-T G.729 CS-ACELP [17]. Algebraic CELP (ACELP) uses algebraic codes in place of the SCB and hence greatly reduces the computational complexity of the codebook search. The third-generation (3G) CELP algorithms accommodate different bit rates and are multimodal: they are designed to operate in different modes (low-mobility, high-mobility, indoor, etc.), consistent with the vision of wideband wireless standards. At least two algorithms have been developed and standardized for these applications.
The Global System for Mobile Communications (GSM) standardized the Adaptive Multi-Rate (AMR) coder [18] in Europe, and the Telecommunications Industry Association (TIA) has tested the Selectable Mode Vocoder (SMV) [19] in the U.S.

2.2. The FS-1016 CELP Coder

FS-1016 is a 4.8 kb/s CELP algorithm that was adopted in the late 1980s by the Department of Defense (DoD) for use in the third-generation secure telephone unit (STU-III). The FS-1016 CELP remains interesting for our study because it contains core elements of A-by-S algorithms that are still very useful. The synthesis configuration for the FS-1016 CELP is shown in Figure 2. In FS-1016, speech is sampled at 8 kHz and segmented into 30 ms frames, and each frame is divided into four sub-frames of 7.5 ms. The excitation is formed by combining vectors from an adaptive and a stochastic codebook (gain-shape VQ), and the excitation vectors are selected in every sub-frame by minimizing the perceptually weighted error measure. The codebooks are searched sequentially, starting with the ACB. The ACB contains the history of past excitation signals, and the LTP lag search is carried out over 128 integer delays (20 to 147) and 128 non-integer delays. To reduce the computational complexity, only a subset of lags is searched in even sub-frames. The SCB contains 512 sparse and overlapping code vectors [20]. Each code vector consists of sixty samples, and each sample is ternary valued (-1, 0, +1) [21] to allow for fast convolution. Ten short-term prediction parameters are encoded as LSPs on a frame-by-frame basis; LSPs are transmitted instead of LP coefficients because they are more amenable to quantization. Sub-frame LSPs are obtained by linear interpolation of the frame LSPs. A short-term pole-zero postfilter is also part of the standard. The details of the bit allocation are given in the standard [11]. The computational complexity of FS-1016 CELP was estimated at 16 million instructions per second (MIPS) for partially searched codebooks, and the Diagnostic Rhyme Test (DRT) score and Mean Opinion Score (MOS) were reported to be 91.5 and 3.2, respectively.

Figure 2. FS-1016 CELP synthesis. (Diagram: the adaptive codebook output, selected by the lag index and scaled by the gain ga, and the stochastic codebook output, selected by the VQ index and scaled by the gain gs, are summed and passed through the LP synthesis filter 1/A(z) and the postfilter to produce speech.)

Figure 3. Block diagram illustrating the steps involved in building the speech coding tool: MATLAB implementation of the algorithm; create a shared library using MATLAB Compiler; create a C++ wrapper function; create a new shared library for LabVIEW; build the user interface in LabVIEW.

3. LABVIEW SPEECH CODING TOOL

In this section, we present the software tool developed for teaching speech coding theory and the CELP algorithm using the National Instruments LabVIEW package. Implementation of highly complex signal processing algorithms involves integrating several software and hardware components developed on different platforms. Hence, there is a need for a scalable framework that can be extended flexibly and supports detailed analysis under different system conditions. Such a framework can be realized using two different approaches: a) hybrid programming and b) integration of existing software. Hybrid programming combines the native graphical programming functions of LabVIEW with text-based programming in MathScript. The primary limitations of this approach are the speed of execution and the overhead of converting external source code from its native programming language to MathScript. The other approach integrates existing software, which requires a complete understanding of the platform on which the native code was developed.
The primary challenge in this case is to develop suitable software interfaces through which LabVIEW can communicate with the different components. The main limitation of this approach is that extending the algorithm may require modifying the native source code. We nevertheless base our speech coding tool on this approach, since hybrid programming is not fast enough for real-time applications.

Figure 4. An example LabVIEW model using the built dll.

3.1. Basic Framework

The framework has been built using shared libraries that exploit the existing MATLAB implementation along with LabVIEW's native functionality. Shared libraries built from the native implementation are integrated with software and hardware components developed in LabVIEW. The basic steps involved in building the speech coding tool in LabVIEW are illustrated in Figure 3:

i) MATLAB implementation: The speech coding/processing algorithms are implemented in MATLAB [10]. This implementation includes functions from specific toolboxes. The inputs and outputs of the speech coding tool are identified.
ii) Create a shared library from MATLAB: The MATLAB Compiler is used to build a shared C library of the algorithm. This requires the MATLAB Component Runtime (MCR).
iii) Create a C++ wrapper: A C++ wrapper is built to interface the MATLAB library with LabVIEW. This step includes identifying the functions that are to be exposed to LabVIEW.
iv) Make a library for LabVIEW: A new shared library is built over the wrapper code. In effect, invoking a function of this new library implicitly invokes the functions in the MATLAB shared library.
v) Build the user interface for the tool: A Call Library Function Node is used to call the external shared library in LabVIEW, and a graphical interface is developed to handle the inputs and outputs of the library function.

Figure 5. User interface of the LabVIEW speech coding tool.

3.2. Challenges

The primary challenge in this process is the creation of the wrapper function. In addition to converting data types between MATLAB and LabVIEW, the wrapper must account for the way LabVIEW manages memory. When LabVIEW loads a VI, it loads all of its subVIs into memory, including all the shared libraries (*.dll) they use, and the dlls are removed from memory only when the top-level VI is closed. This is a problem for the MATLAB libraries used in our speech coding tool: once the tool has terminated the MATLAB dll, the dll cannot be initialized again, because it stays in memory until the LabVIEW application is restarted. In effect, the dll can be run only once. Our solution is to split the initialize, execute and terminate functions of the MATLAB libraries. When the tool runs in LabVIEW, the libraries are initialized only once, before any dll function is called, and are terminated just before the tool shuts down. Figure 4 illustrates an example LabVIEW model that uses the built dll.

4. USER INTERFACE OF THE TOOL

Figure 5 shows the user interface of the LabVIEW speech coding tool. The interface consists of multiple tabs that illustrate several modules of the FS-1016 algorithm. The software can take its input either from an audio (.wav) file or from real-time speech. The user can also change certain speech parameters to analyze the performance and behavior of the algorithm under different conditions. The preprocessed input speech is displayed and processed on a frame-by-frame basis. Frame-by-frame display is also used to view the spectra of the decoded output speech frames, the LP spectral envelopes before and after quantizing the LSPs, pole-zero plots of the synthesis filter, synthesized speech waveforms, etc.
The software has options to save the output speech. The user can also assess the subjective quality of the coded speech by listening to the synthesized speech with the playback feature. In the following subsections, the outputs obtained with the LabVIEW tool for a single frame of speech are shown. The analyzed frame is voiced, with a pitch period of 65 samples.
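A simple way to check the pitch period of a voiced frame is to search the autocorrelation over the integer lag range of the FS-1016 ACB (20 to 147 samples). The Python sketch below is an open-loop simplification, not the closed-loop ACB search of the standard, which also evaluates non-integer delays; the function name and the synthetic pulse-train frame are hypothetical stand-ins for the tool's analyzed frame.

```python
def open_loop_pitch(frame, min_lag=20, max_lag=147):
    """Return the integer lag in [min_lag, max_lag] that maximizes the
    unnormalized autocorrelation r(lag) = sum_n frame[n] * frame[n - lag]."""
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


if __name__ == "__main__":
    # Hypothetical voiced frame: 30 ms at 8 kHz (240 samples), a decaying
    # pulse repeated every 65 samples.
    frame = [0.7 ** (n % 65) for n in range(240)]
    print(open_loop_pitch(frame))  # 65
```

In this example, using the unnormalized autocorrelation favors shorter lags, because fewer products contribute at longer lags; this keeps the search from picking the multiple of the true period (130 samples), which also lies inside the lag range.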

Figure 6. (a) Input spectrum and (b) output spectrum for the given frame of speech data.


4.1. Input and Output Spectra

The Fourier magnitude spectra of the input and output speech frames can be observed using the tool. Figure 6 shows the spectra of the analyzed frame as obtained from the LabVIEW tool; the user can analyze the spectra of any desired frame.

Figure 7. (a) Input LP spectrum and (b) output LP spectrum for the given frame of speech data.

4.2. Quantized and Unquantized LP Spectra

The LP spectra obtained before and after quantizing the LSPs for the given frame are shown in Figure 7. This feature of the tool is useful for analyzing the spectral distortion caused by the quantization of the LSP parameters. The roots of the input LP polynomial, obtained from the unquantized LSPs, and of the output LP polynomial, obtained from the quantized LSPs, are shown in Figure 9. The output LP filter remains stable after quantization, i.e., all of its poles stay inside the unit circle. This can be used to demonstrate that quantizing LSPs instead of LP coefficients preserves stability: the synthesis filter is guaranteed to be stable as long as the quantized LSPs remain in ascending order, whereas quantization of LP coefficients is more likely to result in an unstable filter.

Figure 9. Roots of input (left) and output (right) LP polynomials.

4.3. Subjective Quality

The subjective quality of the speech coder can be analyzed using the options to play back the postfiltered, high-pass filtered and non-postfiltered speech, as shown in Figure 8.

Figure 8. Options for subjective quality analysis.

5. UTILITY IN EDUCATION AND ASSESSMENT

The main educational objective of the LabVIEW speech coding tool developed in this paper is to introduce and demonstrate the concepts of speech coding, in particular coding based on analysis-by-synthesis methods. The interactive visual interface of LabVIEW is intuitive, and the tabbed interface of the tool allows students to visualize several speech coding concepts simultaneously, which is not possible when text-based programming languages are used. The tool can also be easily extended to include other outputs that are useful for student learning. It can be used to demonstrate speech coding in a DSP class or in a more advanced speech coding class. The authors have written books on audio coding [4] and FS-1016 [10] that contain exercises and demonstrations of the FS-1016 in MATLAB, and the proposed tool can be used along with these books to demonstrate speech coding concepts. The following exercises will be developed and presented to the students as part of the proposed assessment. Assessment results will be generated after introducing the students to the speech coding tool in the DSP class at Arizona State University, and will include pre- and post-quizzes on the fundamentals of speech coding.

5.1. Analysis of Voiced/Unvoiced/Mixed Frames

The students are required to identify voiced, unvoiced and mixed frames in a speech file. They will plot the time-domain input and output waveforms, the Fourier spectra, and the unquantized and quantized LP spectra. The time- and frequency-domain characteristics of the voiced, unvoiced and mixed frames will then be analyzed.

5.2. Subjective Quality Analysis

The students will be asked to evaluate the performance of the FS-1016 coder with the speech files provided. They will listen to three speech outputs, (a) postfiltered speech, (b) non-postfiltered speech and (c) high-pass filtered speech, and analyze the differences in subjective quality. The students will also provide a MOS, which is a measure of perceived speech quality.

5.3. Pitch Forcing

Using the Force Pitch option in the tool, the students can force the pitch period to a predefined value. Different pitch periods (e.g., 40, 75 and 110) will be forced, and the students will evaluate the perceptual quality of the output speech.

6. CONCLUSIONS

In this paper, a LabVIEW speech coding tool that implements the FS-1016 algorithm was presented, and the steps involved in creating the tool from the existing MATLAB implementation of FS-1016 were described. The tool will be very useful to students and practitioners of DSP for teaching and understanding the principles behind CELP-based speech coding algorithms.

7. ACKNOWLEDGEMENTS

Portions of this work have been sponsored by the ASU SenSIP center National Instruments project and the NSF CCLI award 0443137.

8. REFERENCES

[1] A. Spanias, Speech Coding: A Tutorial Review, Proceedings of the IEEE, vol. 82, no. 10, Oct. 1994.
[2] V. Atti, A Simulation Tool for Introducing Algebraic CELP (ACELP) Coding Concepts in a DSP Course, IEEE 2002 DSP Workshop, Callaway, Georgia, Oct. 2002.
[3] A. Spanias and E.M. Painter, A Software Tool for Introducing Speech Coding Fundamentals in a DSP Course, IEEE Trans. on Education, vol. 39, no. 2, pp. 143-152, May 1996.
[4] A. Spanias, T. Painter and V. Atti, Audio Signal Processing and Coding, Wiley, Feb. 2007, ISBN 0-471-79147-4.
[5] A. Spanias, Digital Signal Processing: An Interactive Approach, Jan. 2007, ISBN 978-1-4243-2524-5.
[6] V. Atti, Interactive On-line Undergraduate Laboratories Using J-DSP, IEEE Trans. on Education, Special Issue on Web-based Instruction, vol. 48, no. 4, pp. 735-749, Nov. 2005.

[7] A. Spanias, Chapter 3: Speech Coding Standards (invited), Ed. G. Gibson, Academic Press, 2000, pp. 25-44, ISBN 0-12-282160-2.
[8] FS-1016 CELP C Code Implementation, available on the World Wide Web: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z
[9] M.R. Schroeder and B. Atal, Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates, Proc. ICASSP-85, p. 937, Apr. 1985.
[10] K. Ramamurthy and A. Spanias, MATLAB Software for the Code Excited Linear Prediction Algorithm: The Federal Standard-1016, Morgan and Claypool, 2010.
[11] J.P. Campbell Jr., T.E. Tremain and V.C. Welch, The Federal Standard 1016 4800 bps CELP Voice Coder, Digital Signal Processing, Academic Press, vol. 1, no. 3, pp. 145-155, 1991.
[12] LabVIEW Fundamentals, available on the World Wide Web: http://www.ni.com/pdf/manuals/374029a.pdf
[13] A. Spanias, K. Natesan, J. Jayaraman and P. Spanias, Work in Progress: Teaching Speech Signal Processing and Coding Using LabVIEW™, Proc. IEEE FIE, pp. T1C-22 to T1C-23, Oct. 2007.
[14] ITU Recommendation G.723.1, Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kb/s, Draft, 1995.
[15] TIA/EIA/IS-641, Cellular/PCS Radio Interface Enhanced Full-Rate Speech Codec, TIA, 1996.
[16] TIA/EIA/IS-127, Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, TIA, 1997.
[17] ITU Study Group 15 Draft Recommendation G.729, Coding of Speech at 8 kb/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP), 1995.
[18] R. Ekudden, R. Hagen, I. Johansson and J. Svedburg, The Adaptive Multi-Rate Speech Coder, Proc. IEEE Workshop on Speech Coding, pp. 117-119, Jun. 1999.
[19] Y. Gao et al., The SMV Algorithm Selected by TIA and 3GPP2 for CDMA Applications, Proc. IEEE ICASSP-01, vol. 2, pp. 709-712, May 2001.
[20] W.B. Kleijn, Source-Dependent Channel Coding and its Application to CELP, in Advances in Speech Coding, Eds. B. Atal, V. Cuperman and A. Gersho, Kluwer Academic Publishers, 1990, pp. 257-266.
[21] D. Lin, New Approaches to Stochastic Coding of Speech Sources at Very Low Bit Rates, Proc. EUSIPCO-86, p. 445, 1986.

