
Video Quality Measure for Mobile IPTV Service

Wonjun Kim and Changick Kim* School of Engineering, Information and Communications University, 119 Munji street, Yuseong-gu, Daejeon, 305-714, Republic of Korea
ABSTRACT
Mobile IPTV is a multimedia service based on wireless networks that offers interactivity and mobility. Under mobile IPTV scenarios, people can watch various contents whenever they want and even deliver their requests to service providers through the network. However, frequent changes of the wireless channel bandwidth may hinder the quality of service. In this paper, we propose an objective video quality measure (VQM) for mobile IPTV services, focused on jitter measurement. Jitter is the result of frame repetition during delay and is one of the most severe impairments in video transmission over mobile channels. We first employ the YUV color space to compute the duration and occurrence of jitter and the motion activity. The VQM is then modeled as a combination of these three factors, calibrated with the results of subjective assessment. Since the proposed VQM is based on a no-reference (NR) model, it can be applied to real-time applications. Experimental results show that the proposed VQM correlates highly with subjective evaluation.

Keywords: Mobile IPTV, Jitter, VQM, NR model

1. INTRODUCTION
With the development of network infrastructure, IPTV has become an essential service offering time-independence and interactivity. Since IPTV provides digital television services over an IP network infrastructure, people can watch various TV contents whenever they want. In this sense, mobile IPTV can be defined as an IPTV service with mobility. Under mobile IPTV scenarios, people can watch TV on mobile devices even while in motion and can deliver their opinions to service providers through the wireless network. However, since the bandwidth of the wireless channel varies frequently with time, two main types of transmission impairments can occur: packet loss and delay [1]. Both impairments result in the loss of parts of the video stream. If a current frame is severely corrupted by these impairments, most decoders discard the corrupted frame and play the previous video frame repeatedly until the next valid decoded frame is available. Jitter is the result of such frame repetition during delay. From the perceptual quality viewpoint, jitter annoys users. For example, since a user's shift between different wireless channels, such as a wireless local area network (WLAN) and code division multiple access (CDMA), results in handover jitter, the user sees a frozen frame until a new frame arrives. Therefore, a VQM that evaluates the impact of jitter is needed to monitor the quality of mobile IPTV service.

While there have been many studies on the analysis of spatial degradations, such as block distortion and blurring [2, 3, 4], little work has been done on temporal degradations like jitter and jerkiness, and few of those studies address the need for a VQM targeting jitter. Chang et al. [5] proposed a delay model to reconstruct input video asynchronously with respect to network delay jitter. Claypool et al. [6] analyzed the effects of jitter on the perceptual quality of video and compared them with the effects of packet loss. Five types of contents (i.e., animation, information, news, comedy, and sports) with different temporal characteristics were used in their experiments. Their results show that subjective scores decrease logarithmically as the jitter distortion level increases. In the approach of [1], the authors use the mean opinion score (MOS) to conduct the subjective assessment and compare the results from different codec systems. Interestingly, they find that a single long frame freeze is preferred over frequent short frame freezes. Qi et al. [7] analyzed the impact of temporal degradation with respect to position, duration, and retransmission. Most of their results are similar to previous work, but they find that the temporal position of jitter is not important to perceived video quality.

*ckim@icu.ac.kr; phone 82 42 866-6168; fax 82 42 866-6245; vega.icu.ac.kr/~ckim


Applications of Digital Image Processing XXXI, edited by Andrew G. Tescher, Proc. of SPIE Vol. 7073, 70730S, (2008) 0277-786X/08/$18 doi: 10.1117/12.799448


Unlike previous work, we propose an objective VQM to evaluate the impact of jitter occurring in mobile IPTV service. First, we detect jitter impairments in the video sequences using the Euclidean distance between color vectors in the YUV color space, and compute the duration and occurrence of jitter as well as the motion activity of the content. The NR model-based VQM is then built from the combination of these factors and the subjective assessment. The rest of this paper is organized as follows: We summarize perceptual video quality measures in Section 2. The proposed jitter detection algorithm and the VQM modeling process are explained in Section 3. Experiments to estimate the correlation between the VQM results and subjective assessment are presented and analyzed in Section 4, followed by conclusions in Section 5.

2. PERCEPTUAL VIDEO QUALITY MEASURE


In general, perceptual objective video quality measures fall into three categories: full-reference (FR), reduced-reference (RR), and no-reference (NR) [8]. Both the original and degraded video sequences are used in the FR model; PSNR and MSE belong to this category. However, since the FR model operates on the pixel values of both the original and degraded videos, its computational burden is large. In the RR model, features extracted from the original and degraded videos are used instead of all pixel values, and the perceptual video quality is computed from these features. Finally, the NR model uses only the degraded video sequence, without requiring the original. Because the original video is not needed, the NR model can be applied to real-time services. Therefore, in this paper, an NR model is employed to measure the impact of jitter. In other words, we use only degraded videos for the jitter detection algorithm and the VQM modeling. The perceptual objective video quality model is shown in Fig. 1.

Fig. 1. The perceptual video quality measure model.
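For reference, the snippet below is a minimal sketch of an FR measure (MSE and PSNR over two aligned frames of identical size); it is provided only to illustrate the FR category and is not part of the proposed NR method.

```python
import numpy as np

def mse(reference: np.ndarray, degraded: np.ndarray) -> float:
    """Mean squared error between a reference and a degraded frame (FR measure)."""
    diff = reference.astype(np.float64) - degraded.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, degraded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; it requires the original frame, hence full-reference."""
    err = mse(reference, degraded)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)
```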

3. PROPOSED METHOD
Our goal is to detect jitter impairments and to model the VQM based on the detection results for better mobile IPTV service. The proposed method consists of two steps: jitter detection and VQM modeling. The overall procedure is shown in Fig. 2, and each step is explained in the following subsections.

[Figure 2: Input video → Jitter detection → VQM modeling]
Fig. 2. The overall procedure for our proposed method.


3.1. Jitter detection

In this section, we measure three factors to detect jitter: the duration of jitter, the occurrence of jitter, and the motion activity of the content. Since the same frame is simply repeated when jitter occurs, we can easily detect jitter from the difference between the previous and current frames. First, we divide each frame into 64 equal-sized blocks and define a color vector from the average YUV values of each block. We then compute the Euclidean distance between the color vectors at the same position in consecutive frames as follows,
$$d_{t,i} = \sqrt{(Y_{t,i} - Y_{t-1,i})^2 + (U_{t,i} - U_{t-1,i})^2 + (V_{t,i} - V_{t-1,i})^2}, \quad 0 \le i < 64, \qquad (1)$$

where $Y_{t,i}$, $U_{t,i}$, and $V_{t,i}$ denote the average Y, U, and V values of each block, and $t$ and $i$ denote the frame number and the block number, respectively. If the distance in Eq. (1) is smaller than a pre-defined threshold, the corresponding block is declared a freezing block. Furthermore, if all blocks in a frame are declared freezing blocks, the frame is detected as a freezing frame and the jitter duration is increased by one. Since, based on our experiments, people generally notice the freezing effect when jitter lasts more than 10 frames (i.e., about 1/3 second), we increase the number of jitter occurrences whenever the number of consecutive freezing frames exceeds 10. Finally, the motion activity is also computed from the distance in Eq. (1). However, since freezing frames and shot changes affect the motion activity even though they are not related to motion, we define freezing and shot-change flags so that the corresponding frames are excluded from the motion activity computation as follows,
$$freezing\_flag_t = \begin{cases} 1, & \text{if the jitter occurrence is increased} \\ 0, & \text{otherwise} \end{cases}$$
$$shot\_change\_flag_t = \begin{cases} 1, & \text{if } D_t > TH \\ 0, & \text{otherwise} \end{cases}, \quad \text{where } D_t = \frac{1}{64}\sum_{i=1}^{64} d_{t,i}. \qquad (2)$$

Here $t$ denotes the frame number, as mentioned above. It is now safe to compute the motion activity as the summation of the average distance $D_t$ as follows,

$$MA = \sum_{t=1}^{L} D_t, \quad \text{only if } freezing\_flag_t = 0 \text{ and } shot\_change\_flag_t = 0, \qquad (3)$$

where L is the total number of video frames. The procedure of jitter detection is shown in Fig. 3 in detail.

[Figure 3: flowchart of jitter detection — compute $d_{t,i}$; if the number of freezing blocks is 64, increase the number of freezing frames by 1; if the number of consecutive freezing frames is more than 10, increase the jitter occurrence by 1; set freezing_flag and shot_change_flag; compute the motion activity.]
Fig. 3. The procedure of jitter detection. We can obtain the duration and occurrence of jitter and the motion activity of the content.
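The sketch below illustrates one possible implementation of this procedure under stated assumptions: frames are already reduced to their 64 per-block average YUV vectors, and the threshold names (FREEZE_TH for the per-block freezing test, SHOT_TH standing in for TH in the shot-change test) and their values are hypothetical placeholders, not the settings used in our experiments.

```python
import numpy as np

FREEZE_TH = 1.0       # per-block distance below which a block is considered frozen (illustrative)
SHOT_TH = 30.0        # threshold on the mean distance D_t for a shot change (illustrative)
MIN_FREEZE_RUN = 10   # about 1/3 s at 30 fps

def detect_jitter(block_yuv: np.ndarray):
    """block_yuv: array of shape (L, 64, 3) holding the average (Y, U, V) of each of the
    64 blocks in every frame. Returns (duration, occurrence, motion_activity)."""
    L = block_yuv.shape[0]
    duration = 0          # total number of freezing frames
    occurrence = 0        # number of freezing runs of at least MIN_FREEZE_RUN frames
    run = 0               # length of the current run of consecutive freezing frames
    motion_activity = 0.0

    for t in range(1, L):
        d = np.linalg.norm(block_yuv[t] - block_yuv[t - 1], axis=1)  # Eq. (1), per block
        D_t = float(d.mean())                                        # mean distance of Eq. (2)
        freezing_frame = bool((d < FREEZE_TH).all())                 # all 64 blocks frozen

        if freezing_frame:
            duration += 1
            run += 1
            if run == MIN_FREEZE_RUN:   # count the run once it reaches 10 consecutive frames
                occurrence += 1
        else:
            run = 0
            if D_t <= SHOT_TH:          # skip freezing frames and shot changes, cf. Eq. (3)
                motion_activity += D_t

    return duration, occurrence, motion_activity
```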


3.2. VQM modeling

To measure the impact of jitter and model the VQM, we combine the three factors described in the previous section with the results of subjective assessment. To evaluate the perceptual video quality, we use the mean opinion score (MOS), in which subjects are asked for an explicit rating from 5 (excellent) to 1 (bad) after watching a video clip [9]. Test video sequences are presented according to the Double-Stimulus Continuous Quality-Scale (DSCQS) methodology. A gray sequence is inserted between the original and degraded videos to remove the influence of the previously viewed video. The MOS scale and the procedure of the subjective assessment are shown in Table 1 and Fig. 4, respectively.
Table 1. MOS scale.

MOS   Quality     Impairment
5     Excellent   Imperceptible
4     Good        Perceptible but not annoying
3     Fair        Slightly annoying
2     Poor        Annoying
1     Bad         Very annoying

[Figure 4: DSCQS presentation timeline — original sequence (10 s), gray (2 s), degraded sequence (10 s), gray (2 s), original sequence (10 s), gray (2 s), degraded sequence (10 s), voting (10 s).]
Fig. 4. The procedure of subjective assessment.

We select three video sequences from three different temporal categories according to our proposed motion activity: News, Movie, and Music video, with motion activities of 0.4, 2.055, and 6.048, respectively. The test video sequences consist of 300 frames encoded in XviD MPEG-4 format at an image size of 320 × 240. We limit the maximum duration and occurrence of jitter to 30 frames and 3, respectively. Twenty graduate students, all around 25 years old with moderate to extensive computer experience, participated in the subjective assessment, which was conducted on a PDA (LG-PDA18264). The results of the subjective assessment are shown in Figs. 5 and 6; F and C on the x-axes of Figs. 5 and 6 denote frames and occurrences, respectively.

[Figure 5: MOS plotted against jitter duration (10F, 20F, 30F) in (a) and against jitter occurrence (1C, 2C, 3C) in (b), for the three test sequences.]

Fig. 5. The result of subjective assessment. (a) MOS vs. duration. (b) MOS vs. occurrence.



Fig. 6. The result of subjective assessment to analyze the effect of occurrence.

First, we evaluate the perceptual video quality for jitter durations of 10, 20, and 30 frames with the occurrence fixed at 1. Since the impact of the position of jitter is negligible [7], the freezing frames are inserted at random positions. As shown in Fig. 5(a), increasing the jitter duration decreases the perceived video quality. We also consider the impact of the jitter occurrence: MOS decreases as the number of occurrences increases, as shown in Fig. 5(b). In both cases, MOS decreases logarithmically with the level of distortion. Moreover, MOS drops more sharply as the motion activity increases. In Fig. 6, the numbers on the x-axis denote the duration of jitter and the numbers in parentheses denote the occurrence of jitter; the results of the subjective assessment are plotted for each motion activity. We can see that, regardless of the motion activity, viewers are less annoyed by a single long freeze than by frequent short freezes. Based on the results of the subjective assessment for the three factors, the VQM is modeled as follows. Let D and O denote the normalized duration and occurrence, obtained by dividing the total jitter duration (in frames) and the total number of occurrences by L/3 and L/30, respectively, where L denotes the total number of frames as mentioned in the previous section. Based on the result of the subjective assessment, we model the VQM using a log function as follows,
$$VQM = 1 - MA \cdot \ln(1 + D \cdot O), \qquad (4)$$

Since the argument of the logarithm cannot be zero, we add 1 to it in Eq. (4); for a jitter-free video this yields ln(1) = 0 and hence VQM = 1. Even when the total jitter duration is the same for two degraded videos, the video with more frequent freezes obtains a lower VQM value under Eq. (4). Therefore, the perceptual video quality can be estimated well by the proposed VQM. Its performance is analyzed in detail in the following section.
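A minimal sketch of Eq. (4) is given below, assuming the jitter duration is supplied in frames, the occurrence as a count, and the normalization by L/3 and L/30 described above; the function name is illustrative.

```python
import math

def vqm(duration_frames: int, occurrences: int, motion_activity: float, total_frames: int) -> float:
    """Eq. (4): VQM = 1 - MA * ln(1 + D * O), with D and O normalized by L/3 and L/30."""
    D = duration_frames / (total_frames / 3.0)   # normalized jitter duration
    O = occurrences / (total_frames / 30.0)      # normalized jitter occurrence
    return 1.0 - motion_activity * math.log(1.0 + D * O)

# Example: a 300-frame clip with a single 10-frame freeze and motion activity 0.498
# (cf. display order 1 in Table 2) yields VQM of about 0.995.
print(round(vqm(10, 1, 0.498, 300), 3))
```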

4. EXPERIMENTAL RESULTS
To analyze the performance of the proposed VQM, we select 15 videos covering five content types popularly serviced on IPTV: news, movie, TV show, music video, and sports. The display order is randomized to minimize the effect of content. Video sequences are displayed at their original resolution of 320 × 240 pixels on the full PDA screen. Both the original and degraded sequences are 10 seconds long. The test sequences are again presented according to the DSCQS methodology. Twenty subjects participated in this assessment so that we can analyze the correlation between the proposed VQM and the perceived video quality. The test videos are shown in Fig. 7.


Fig. 7. Test videos. (a) News. (b) Movie. (c) TV show. (d) Music video. (e) Sports.

Videos with various temporal characteristics are used, as shown in Fig. 7. Since news and TV shows are very static compared to movies, music videos, and sports, some viewers cannot recognize that jitter has occurred when its duration is short. In contrast, since objects in movies, music videos, and sports are large and highly dynamic, even short jitter can have a dramatic impact on viewers. Moreover, in soccer and basketball, losing parts of the video to jitter is critical because the players and the ball move very fast. The jitter durations inserted in the test videos are 10, 20, and 30 frames, and the total duration and occurrence of jitter are limited to 60 frames and 3, respectively, based on the total number of test video frames. The results of jitter detection, the VQM, and the subjective assessment are shown in Table 2. The numbers in parentheses in the second column denote the number of occurrences.
Table 2. The result of jitter detection, VQM, and subjective assessment.

Display order   Duration (Occurrence)   Motion activity   VQM     MOS      Contents
1               10(1)                   0.498             0.995   4.1765   TV show2
2               20(1)                   4.147             0.918   3.706    Movie1
3               10(2), 20(1)            2.109             0.761   2.235    Movie2
4               10(2)                   3.023             0.881   2.882    Music1
5               10(2), 20(1)            2.669             0.698   1.647    Sports3
6               0(0)                    1.475             1       4.882    Music2
7               10(1)                   0.969             0.990   3.941    News1
8               10(2), 20(1)            2.668             0.725   2.529    Movie3
9               20(2)                   2.663             0.795   2.882    News2
10              0(0)                    0.627             1       4.882    TV show1
11              20(3)                   2.011             0.667   1.941    Music3
12              0(0)                    1.387             1       5        Sports1
13              10(3)                   0.807             0.930   2.8235   TV show3
14              10(1), 20(2)            1.153             0.839   2.412    News3
15              10(2), 20(1)            1.643             0.814   2.235    Sports2

The duration and occurrence of jitter are correctly detected by the proposed method, and the resulting motion activity reflects the type of content. As shown, the order of the test video sequences is randomized to minimize contextual effects.


Videos 6, 10, and 12 have higher VQM and MOS values than the other videos because they contain no jitter. For videos 3 and 5, the VQM and MOS values decrease as the motion activity increases under the same distortion condition. As noted earlier, viewers prefer a single long freeze to frequent short freezes; accordingly, although the jitter duration of video 8 equals that of video 9 and their motion activities are similar, both the VQM and MOS of video 8 are lower than those of video 9. Based on the VQM and MOS values in the fourth and fifth columns, the correlation between the two is computed. In this paper, the Pearson correlation [10], which is widely used to quantify the degree of linear relationship between two variables, is employed and defined as follows,
$$r = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}, \qquad (5)$$
where x and y denote the two variables, here the VQM and MOS values, respectively, and n denotes the number of samples. The correlation coefficient r is obtained from the covariance and the variances of the two variables; r equals 1 when there is a perfect positive linear relationship. Since VQM and MOS should change in the same direction, the proposed VQM is considered to estimate the perceptual video quality correctly when the correlation is close to 1. The correlation result is shown in Fig. 8.
[Figure 8: scatter plot of MOS versus VQM for the 15 test videos (r = 0.902).]
Fig. 8. The correlation between VQM and MOS. The correlation coefficient r = 0.902

Each point in Fig. 8 corresponds to the (VQM, MOS) pair of one test video. The result of the proposed VQM is highly correlated with MOS.
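As a quick cross-check of Eq. (5), the following sketch computes r directly from paired VQM and MOS lists; the variable names in the usage comment are illustrative and the data are not hard-coded here.

```python
import math

def pearson_r(x, y):
    """Eq. (5): Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

# Usage: r = pearson_r(vqm_values, mos_values) over the 15 test videos of Table 2.
```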

5. CONCLUSION
An efficient and robust video quality measure for mobile IPTV service is proposed in this paper. The proposed method is divided into jitter detection and VQM modeling based on the detection results. We first compute the Euclidean distance between the color vectors of consecutive frames in the YUV color space. From these distance values, we obtain three factors: the duration of jitter, the occurrence of jitter, and the motion activity. A subjective assessment is carried out to analyze the effect of these three factors, and the VQM is then modeled as a combination of the three factors calibrated with the subjective assessment results. To validate the performance of the proposed VQM, a second subjective assessment is conducted with videos popularly serviced on IPTV. The proposed method is highly correlated with MOS and operates in real time. Therefore, it is well suited to mobile applications aiming to improve the quality of IPTV-serviced videos.


Our future work is to model an improved VQM by incorporating other spatial and temporal distortions and to restore the degraded video for better service quality.

ACKNOWLEDGEMENTS
This research was supported by the Ministry of Knowledge Economy, Korea, under the Information Technology Research Center support program supervised by the Institute of Information Technology Advancement (grant number IITA-2008-C1090-0801-0017).

REFERENCES
[1] Q. Huynh-Thu and M. Ghanbari, Impact of jitter and jerkiness on perceived video quality, International Workshop on Video Processing and Quality Metrics, Jan. 2006.
[2] T. Vlachos, Detection of blocking artifacts in compressed video, Electronics Letters, vol. 36, no. 13, pp. 1106-1108, 2000.
[3] H. R. Wu and M. Yuen, A generalized block-edge impairment metric for video coding, IEEE Signal Processing Letters, vol. 4, no. 11, pp. 317-320, Nov. 1997.
[4] M. C. Q. Farias and S. K. Mitra, No-reference video quality metric based on artifact measurement, International Conference on Image Processing, vol. 3, pp. 11-14, Sept. 2005.
[5] Y. Chang et al., Effects of temporal jitter on video quality: assessment using psychophysical methods, SPIE Human Vision and Electronic Imaging III, pp. 173-179, July 1998.
[6] M. Claypool and J. Tanner, The effects of jitter on the perceptual quality of video, ACM International Conference on Multimedia, pp. 115-118, Oct. 1999.
[7] Y. Qi and M. Dai, The effect of frame freezing and frame skipping on video quality, International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 423-426, Dec. 2006.
[8] User requirements for perceptual video quality monitoring of IPTV, FG-IPTV-ID-0013, Geneva, 10-14 July 2006.
[9] Methods for subjective determination of transmission quality, ITU-T P.800, Aug. 1996.
[10] Final report from the video quality experts group on the validation of objective models of video quality assessment, VQEG, 2000.

