Keywords
Voice over IP (VoIP), Error concealment, Error recovery
1. INTRODUCTION
As the Internet grows in popularity, new technologies emerge and become more prevalent over time, usually starting in the business arena and slowly working their way down to the consumer market. One of these technologies is the transmission of voice over the Internet, which enables two or more parties to communicate with each other. In the last few years, voice over IP, or VoIP, has grown immensely in popularity. Businesses have for quite some time recognized the features and cost savings VoIP provides; now consumers are realizing the same benefits. VoIP is a likely candidate to replace today's method of transmitting voice over highly reliable, wired telecommunication networks. The Internet and its underlying protocols, however, may be unreliable and offer only best-effort delivery over a packet-switched network. With the advent of voice over IP and its growing popularity among home users, several questions about errors in the voice transmission need to be considered. These include what should be done when an error within the audio transmission is discovered and how, or whether, one should recover from such an error. Furthermore, circuit-switched networks like the ones used by phone companies today provide a high level of service and quality mainly because a point-to-point connection is established between both parties before a conversation takes place. Errors such as delay or static in the circuit-switched voice environment
packets. In VoIP, this translates to errors in the audio stream. It is therefore important to overcome these errors in order to improve the perceived quality and to offer the same level of quality of service found in circuit-switched telephone networks. To overcome network delay and offer a continuous audio stream, playout buffers are typically used at the receiver. These offer some error concealment by discarding any packets that arrive after their successor packets have been played, but the end user may still experience jitter and an audio stream that is below the quality found in public switched telephone networks.
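The discard rule of a playout buffer described above can be sketched roughly as follows. This is a minimal illustration rather than an implementation from the literature: the class name `PlayoutBuffer`, its methods, and the use of plain integer sequence numbers are assumptions made for the example, and playout timing and sequence-number wraparound are deliberately left out.

```python
import heapq
from typing import List, Optional, Tuple


class PlayoutBuffer:
    """Hypothetical receiver-side buffer that plays packets in sequence order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, bytes]] = []  # min-heap keyed by sequence number
        self.last_played = -1                     # sequence number most recently played
        self.discarded = 0                        # count of packets dropped as too late

    def push(self, seq: int, payload: bytes) -> bool:
        """Buffer an arriving packet; drop it if a successor was already played."""
        if seq <= self.last_played:
            self.discarded += 1
            return False
        heapq.heappush(self._heap, (seq, payload))
        return True

    def pop(self) -> Optional[bytes]:
        """Return the lowest-numbered buffered payload, or None if the buffer is empty."""
        while self._heap:
            seq, payload = heapq.heappop(self._heap)
            if seq > self.last_played:  # skip duplicates of already played packets
                self.last_played = seq
                return payload
        return None
```

Packets pushed out of order (for example 1, 3, 2) come back from `pop()` in order, while a packet numbered 2 that arrives after packet 3 has been played is rejected and counted in `discarded`.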
Although the data rates that CELP provides are quite low and attractive, further improvement is possible by noticing that speech is not a continuous stream of audio but is in fact composed of silence and sound. This makes it possible to encode the audio only when there is sound and to skip the silence, a technique known as Voice Activity Detection (VAD) [2]. This capability is very beneficial to the transmission of voice because it requires less bandwidth than encoding both silence and sound.

The transmission of real-time information such as voice over the Internet is typically made possible by the Real-Time Transport Protocol (RTP), which runs over the User Datagram Protocol (UDP). RTP provides sequence numbers so that packets can be put back in order and timestamps to help compensate for jitter, while UDP provides the end-to-end delivery but is considered unreliable. That is, there is no guarantee that data will reach the intended receiver or that packets will arrive in order, because successful transmission depends on a variety of network conditions such as congestion and connectivity.

With the increased popularity of VoIP and the many features it brings, customers will expect it to have the same voice quality and reliability that public switched telephone networks offer today [2]. The Internet as a whole is an ever-evolving technology, and many people and corporations are trying to improve its quality and reliability with regard to VoIP. Many Internet Service Providers (ISPs) are very capable of offering VoIP technology, but there are still many network paths that can cause poor performance [2]. In fact, a large number of Internet paths have high delay and high delay variability [2], both of which are very unfavorable to VoIP. The quality the end user will experience can be measured by the loss and delay within the network.
Typically, low delay and low delay variability lead to excellent VoIP performance [2]. Both factors are very dependent on the amount of traffic on the network. When the amount of traffic increases, congestion may cause packet delay or even lost
Encoding and decoding between analog and digital signals also adds delay, which could be variable if compression is performed. These delay variations may well cause a receiver to drop packets that arrive late, compounding the jitter with additional packet loss.
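Delay variation of this kind is commonly quantified with the running interarrival jitter estimate defined for RTP in RFC 3550. The sketch below applies that formula, J = J + (|D| - J)/16, to two lists of timestamps; the function name and the choice of milliseconds as the unit are assumptions made for illustration.

```python
from typing import Sequence


def interarrival_jitter(send_ts: Sequence[float], recv_ts: Sequence[float]) -> float:
    """Running jitter estimate in the style of RFC 3550.

    send_ts[i] is the sender timestamp of packet i and recv_ts[i] its arrival
    time, both assumed here to be in milliseconds.
    """
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D: change in transit time between consecutive packets
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        # Exponentially smoothed mean of |D| with gain 1/16
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

When every packet experiences the same transit delay the estimate stays at zero; a single packet delayed by an extra 10 ms raises it to 10/16 = 0.625 ms.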
Table 1. Error handling techniques and their attributes

Technique                              Classification  Type
Beat Pattern Error Concealment         Reactive        Receiver-based
Intra-Flow Loss Recovery and Control   Proactive       Both
Packet Loss Concealment                Reactive        Receiver-based
Muting                                 Reactive        Receiver-based
Interpolation                          Reactive        Receiver-based
Retransmission                         Reactive        Sender-based
Feedback                               Reactive        Receiver-based
Interleaving                           Proactive       Sender-based
Repetition                             Reactive        Receiver-based
Error Spreading                        Reactive        Receiver-based
Forward Error Correction               Proactive       Sender-based
Low Bit-Rate Redundancy                Proactive       Sender-based
Table 1 categorizes the different types of error handling techniques related to VoIP, specifying whether they are reactive or proactive, and if they are sender- or receiver-based or possibly both. Section 6 discusses these in further detail.
case is voice, is known as a flow. Intra-flow loss recovery and control is concerned with correcting errors as the audio stream travels from the sender to the receiver. Within VoIP, this technique is known to increase the perceptual quality an end user experiences [4]. The approach is based on the premise that a short burst of lost packets or data makes it difficult to conceal the error at the receiver, which leads to interruptions within the audio stream. Since voice conversations are made up of periods of both silence and sound, some packets are in fact more important than others: those containing sound should be considered more important than those containing silence. If a packet containing silence is lost, the impact on the quality of the stream is lower than if the packet contains audio. This leads to a practice of marking the packets considered more important after examining how sensitive they are to loss. Intra-flow loss recovery focuses on marking those packets within the flow that are sensitive to loss on the sender's side and those sensitive to loss concealment on the receiver's side [4]. By marking such packets, both the sender and receiver can determine which packets can be lost with the least impact on the overall quality of the stream. Packets that are sensitive to loss can be grouped into two categories: temporal sensitivity and sensitivity due to Application Data Unit (ADU) heterogeneity [4].

Temporal sensitivity is a form of time dependence in which packets depend both on each other and on time. A packet should only be decoded if the one preceding it has been decoded [4]. This ensures that the receiver processes the transmission in the order it was sent and that the audio stream is not played out of order due to late or early arrival of packets.
However, if one of these packets is lost, this is considered an isolated packet loss and not a loss that occurs in bursts [4].

An Application Data Unit (ADU) is a unit of data emitted by a source coder [4]. Heterogeneity reflects the idea that some ADUs or packets are more important in determining the perceived quality of the voice transmission experienced by end users and are thus more sensitive to loss. If, for example, a packet containing silence is lost, the impact on the perceived quality is minimal compared to a lost packet containing sound. Therefore, it is important to identify the packets considered more important so that error recovery can be performed correctly if such a packet is lost.

The main problem with this technique is that adapting the sender's bitrate to the current network congestion state as an intra-flow QoS scheme is difficult to apply to voice [4]. Compared with streaming audio (MP3 music), it generally takes fewer bits to encode a voice conversation than the MP3. However, when a packet
is lost at the receiving end, the cost of transmitting feedback to the sender that the packet was lost is higher because VoIP is a real-time communication technology. Furthermore, when an error is identified and recovery initiated, a significant amount of time may pass, making the lost packets irrelevant because new ones have already arrived.

Packet Loss Concealment (PLC)
This consists of creating a replacement for the lost packet that is similar to what has already been played. The technique is used at the receiver's end, and it is believed that a good PLC algorithm can greatly improve the perceived quality with no bandwidth overhead [1].

Muting
When an error such as a lost or delayed packet is identified on the receiver's side, a simple method of recovering from the error is to insert silence or noise into the decoded audio stream in place of the missing bits [2]. This is the simplest of the error concealment techniques in terms of computational requirements, but inserting silence into the stream is not necessarily desirable to end users and thus lowers the perceived quality of service.

Interpolation
Once the loss of a packet is discovered, recovery begins by examining the area of the lost packet in relation to the packets around it. By looking at the characteristics of the sound (speech) and examining its waveform in the area of the lost packet, either a replacement packet can be generated to fill the gap or a previously played packet that most closely fits the spot where the error occurred can be used. This technique requires both computation and buffering or storing of previously played packets.

Retransmission
One of the most common ways to recover from lost or corrupt data due to transmission on the Internet is retransmission. Although this is a very appealing and widely used technique for static data, it is not practical for VoIP because of its real-time nature [2].
If retransmission is desired, it should only be used when the end-to-end delay between the sender and receiver is very small.

Feedback
Notifying the sender to change the rate at which the audio stream is being sent is a feedback technique. One may increase or decrease the data stream speed, change the rate at which the audio is encoded, or notify the sender to switch to a different type of encoder [2].

Interleaving
This technique separates the voice signal into several frames, where each frame is just a section of the original audio stream [7]. Just before transmission, the order of these frames is scrambled and each is placed into a packet; the frames are then rearranged back into their original order at the receiver's end [2]. In other words, the frames are interleaved over multiple packets, thus spreading the probability of an error across multiple sections of the audio stream. The goal is to optimize the order in which the packets are sent but this requires CPU calculations in
order to separate the audio into frames and to place them into separate packets. Therefore, this technique is not entirely applicable to VoIP. It may add delay, which needs to be avoided when transmitting real-time data such as VoIP. It is, however, a good option for streaming prerecorded audio, especially to small mobile devices, or for streaming MP3 audio in general [7], as long as it isn't a real-time audio stream such as VoIP.

Repetition
Another simple form of error recovery and concealment is to repeat the audio that was last played in the packet arriving just before the error was encountered [1]. For real-time data such as audio streaming, this is acceptable if the lost section of audio is shorter than approximately 20 ms [8]. On the other hand, this technique is not entirely acceptable for voice conversations within VoIP because, depending on the end user's ability to discern imperfections, it might be irritating if the lost section exceeds 20 ms. If the repeated sections are 20 ms or less, however, the repetition would be virtually unnoticeable to end users because the error would result in only a small stutter within the audio stream. Nevertheless, as parts of the conversation are repeated, the level of quality is reduced even if the end user is unable to notice it.

Error Spreading
This technique tries to decrease the impact errors have on the quality of the stream by spreading the error across a larger portion of the audio stream than just the general area where the error occurred. This increases the perceived quality of the stream [6]. It makes the error less noticeable to the user by permuting the input sequence of packets before transmission and unscrambling them on the receiving end [6].

Forward Error Correction (FEC)
Forward error correction is the most popular form of error correction on the Internet.
It works on an end-to-end basis to recover from loss due to an error while still meeting the real-time delay constraints [4]. The technique transmits extra, redundant information in the same packet along with the original data so that there is enough information to reconstruct the packet if any of its sections are damaged in transmission. It is therefore attractive for transmitting audio in VoIP, but there are some disadvantages. FEC recreates lost packets and can reduce the effect of errors but not eliminate them [1]. Furthermore, the amount of redundant information being sent needs to be adaptive to avoid taking bandwidth away from other flows [4]. In other words, only the amount of redundant information necessary to recreate the lost packet should be transmitted. In addition, the packet reconstruction process may introduce additional delay at the receiver when trying to correct jitter [4], and as the size of the redundant FEC data increases, it may increase the overall end-to-end delay [1].

Low Bit-Rate Redundancy (LBR)
LBR is considered an alternative to FEC in which a copy of the audio packet is always sent but at a lower quality
[1]. Therefore, a lost packet containing the original higher-quality audio can be substituted simply by using the copy. An unrecoverable error might be experienced when both the original and the copy are lost. Nevertheless, mean opinion score (MOS) tests indicate that FEC is much preferred over LBR [1].
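Several of the receiver-side options described in this section can be illustrated together as a simple fallback chain. The sketch below is only one possible arrangement under stated assumptions: the packet layout (each packet carrying its own frame plus a low-bit-rate copy of the previous frame), the `lo(...)` string standing in for a lower-quality encoding, and the function names `send` and `receive` are all hypothetical. A lost frame is replaced by its low-bit-rate copy if the next packet arrived, otherwise by repeating the last played frame, and otherwise by silence (muting).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical packet layout: (primary frame, low-bit-rate copy of previous frame)
Packet = Tuple[str, Optional[str]]

SILENCE = "<silence>"


def send(frames: List[str]) -> Dict[int, Packet]:
    """Packet n carries frame n plus a lower-quality copy of frame n-1."""
    packets: Dict[int, Packet] = {}
    for n, frame in enumerate(frames):
        copy = f"lo({frames[n - 1]})" if n > 0 else None
        packets[n] = (frame, copy)
    return packets


def receive(packets: Dict[int, Packet], total: int) -> List[str]:
    """Rebuild the frame sequence from whichever packets arrived."""
    out: List[str] = []
    for n in range(total):
        if n in packets:
            out.append(packets[n][0])      # original frame arrived
        elif n + 1 in packets and packets[n + 1][1] is not None:
            out.append(packets[n + 1][1])  # low bit-rate redundancy
        elif out:
            out.append(out[-1])            # repetition of the last played frame
        else:
            out.append(SILENCE)            # muting: nothing available to repeat
    return out
```

Note that using the copy carried in packet n+1 means the receiver must wait for one further packet before playing frame n, which is the kind of added delay discussed for FEC above.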
7. ANALYSIS
Because VoIP runs over the Internet, errors should be expected. The Internet and its underlying protocols form a best-effort delivery system. There is no guarantee that information will reach the intended receiver, and this may present a problem for VoIP and the quality of service it can offer. Error recovery and concealment techniques try to improve this quality when an error is encountered. Error concealment focuses on hiding (concealing) the error from the end user so as to give the perception that the QoS level hasn't deteriorated even though the actual audio transmission has been affected by network conditions. Error recovery focuses on trying to fix or compensate for the error in order to maintain or even improve the perceived QoS for the end user. Both concealment and recovery are vital to quality and should be used for VoIP.

There are many methods and techniques for recovering from errors experienced in the transmission of voice using VoIP, but no single technique is best. As depicted in Table 2, Forward Error Correction (FEC) seems to be the most favorable, but it alone is not sufficient. An enhanced error recovery and concealment scheme consists of a combination of more than one technique, and the decision of which combination is best can only be made by considering a variety of factors, including the type of error, its size, and the state of the network.

The ideal implementation of this combination, however, is one that is capable of dynamically adapting to the constant changes in the network, and doing so in real time. Dynamic adjustment of a combination is inherently a CPU-intensive process. CPU time and computer resources at either the sender or receiver (or possibly both) are needed to calculate the state of the network and adapt accordingly. The overhead of this dynamic process may introduce additional delays into the audio transmission. Therefore, a static combination is more appropriate for VoIP.
To avoid the delays that a dynamically chosen combination would introduce, we propose that a static combination be chosen ahead of time, independent of any changing network conditions. The scores in Table 2 suggest that a combination of FEC, Error Spreading, and Feedback would provide the most powerful way to conceal or recover from most if not all of the errors. Such a combination would certainly do a better job of concealing and recovering from errors than any of the techniques used alone. Both Feedback and Forward Error Correction provide mechanisms for handling or reducing the likelihood of packet loss, Error Spreading helps reduce the impact of network delays that induce jitter into the audio stream, and all three help to improve the overall sound quality. Determining whether this is the best possible combination, and how much better it would be than other combinations or individual techniques, requires a detailed investigation of both network congestion and computational requirements, which is beyond the scope of this paper.
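The selection of the three highest-scoring techniques can be reproduced with a few lines of arithmetic. The sketch below simply copies the overall scores listed in Table 2 and the mark values from its key; the names `TABLE_2_SCORES`, `MARK_VALUE`, `score`, and `top_techniques` are illustrative and not part of the original analysis.

```python
from typing import Dict, Iterable, List

# Overall scores as listed in Table 2.
TABLE_2_SCORES: Dict[str, int] = {
    "Beat Pattern Error Concealment": 0,
    "Intra-Flow Loss Recovery and Control": 3,
    "Packet Loss Concealment": 3,
    "Muting": 0,
    "Interpolation": 2,
    "Retransmission": 3,
    "Feedback": 4,
    "Interleaving": 2,
    "Repetition": 2,
    "Error Spreading": 4,
    "Forward Error Correction": 6,
    "Low Bit-Rate Redundancy": 3,
}

# Mark values from the key of Table 2.
MARK_VALUE = {"-": 0, "+": 1, "++": 2, "+++": 3}


def score(marks: Iterable[str]) -> int:
    """Overall score of one technique: the sum of its per-criterion marks."""
    return sum(MARK_VALUE[m] for m in marks)


def top_techniques(scores: Dict[str, int], k: int = 3) -> List[str]:
    """Return the k highest-scoring techniques; ties keep table order."""
    return sorted(scores, key=lambda name: -scores[name])[:k]
```

With the marks given for Forward Error Correction (+, ++, +++), `score` yields 6, and `top_techniques(TABLE_2_SCORES)` returns Forward Error Correction, Feedback, and Error Spreading, the combination proposed above.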
Table 2. Error handling techniques and their effectiveness in improving QoS

Technique                              Score
Beat Pattern Error Concealment         0
Intra-Flow Loss Recovery and Control   3
Packet Loss Concealment                3
Muting                                 0
Interpolation                          2
Retransmission                         3
Feedback                               4
Interleaving                           2
Repetition                             2
Error Spreading                        4
Forward Error Correction               6
Low Bit-Rate Redundancy                3

Column marks for the first ten rows (in row order, empty cells omitted):
Jitter: + + + ++
Packet Loss: ++ ++ + ++ + +
Overall Sound Quality: + + + ++ + + ++
Forward Error Correction: Jitter +, Packet Loss ++, Overall Sound Quality +++
Low Bit-Rate Redundancy: + ++

Key: - poor (0), + good (1), ++ better (2), +++ best (3)

8. FUTURE WORK
The area of VoIP technology is becoming more popular each year as businesses and consumers understand the features and realize the benefits it offers. VoIP can also result in significant cost savings when compared to traditional public switched telephone networks. Besides the error recovery and concealment techniques described in this paper, future areas to be examined include reliability, security, scalability, automatic failover, and fault detection within VoIP as compared with circuit-switched networks. These could be inspected and researched in order to determine what additional steps would make VoIP as good as, if not better than, wired circuit-switched telephony.

9. REFERENCES
[1] Jiang, W; Schulzrinne, H; Comparison and Optimization of Packet Loss Repair Methods on VoIP Perceived Quality Under Bursty Loss, Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video, Miami, Florida, 2002, pp. 73-81.
[2] Markopoulou, A; Tobagi, F; Karam, M; Assessing the Quality of Voice Communications Over Internet Backbones, IEEE/ACM Transactions on Networking (TON), Vol. 11, Issue 5, 2003, pp. 747-760.
[3] Matta, J; Pepin, C; Lashkari, K; Jain, R; A Source and Channel Rate Adaptation Algorithm for AMR in VoIP Using the Emodel, Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video, Monterey, California, 2003, pp. 92-99.
[4] Sanneck, H; Long Lee, N; Wolisz, A; Carlie, G; Intraflow Loss Recovery and Control for VoIP, Proceedings of the Ninth ACM International Conference on Multimedia, Ottawa, Canada, 2001, pp. 441-454.
[5] Shenker, S; Partridge, C; Guerin, R; "Specification of Guaranteed Quality of Service", RFC 2212, September 1997.
[6] Varadarajan, S; Ngo, H; Srivastava, J; Error Spreading: A Perception-Driven Approach to Handling Error in Continuous Media Streaming, IEEE/ACM Transactions on Networking, Vol. 10, Issue 1, 2002, pp. 139-152.
[7] Wang, Y; Huang, W; Korhonen, J; A Framework for Robust and Scalable Audio Streaming, Proceedings of the 12th Annual ACM International Conference on Multimedia, New York, New York, 2004, pp. 144-151.
[8] Wang, Y; Vilermo, M; A Compressed Domain Beat Detector Using MP3 Audio Bitstreams, Proceedings of the Ninth ACM International Conference on Multimedia, Ottawa, Canada, 2001, pp. 194-202.
[9] Wroclawski, J; "Specification of the Controlled-Load Network Element Service", RFC 2211, September 1997.