
ACCURATE PARAMETER ESTIMATION AND EFFICIENT FADE DETECTION FOR WEIGHTED PREDICTION IN H.264 VIDEO COMPRESSION


Rui Zhang* and Guy Cote**
Cisco Systems Inc, 170 West Tasman Drive, San Jose, CA 95134*
Apple, 1 Infinite Loop, Cupertino, CA 95014**
ruizhang@cisco.com, cote@apple.com
ABSTRACT
Weighted prediction is a useful tool in video compression for encoding scenes with lighting changes, such as fading scenes. Estimating weighted prediction parameters has been extensively discussed in the literature; however, no mathematical model has been proposed. Moreover, the detection of fading scenes in a real-time encoding system has received little attention. This paper addresses both of these aspects. An accurate parameter estimation algorithm for H.264 encoding is first derived for both the multiplicative factor and the additive offset based on a fading model. An efficient algorithm is then proposed to detect fades in a real-time encoding system, with simple statistics calculations, very low storage requirements, and low encoding delay. Simulation results show very accurate detection and compression gains of 5-30% over existing techniques.
Index Terms: Weighted prediction, fade detection, video compression, H.264

1. INTRODUCTION

Motion compensation is a major tool for achieving compression efficiency in video coding systems: the current picture is predicted from a reference picture and only the prediction difference is encoded. The more correlated the prediction picture is with the current picture, the higher the compression efficiency. However, in some video scenes, particularly fading scenes, the current picture is more correlated with the reference picture scaled by a weighting factor than with the reference picture itself. Hence weighted prediction (WP) is a useful tool in such scenarios. Modern video coding standards, such as H.264, have adopted WP to improve coding efficiency under certain conditions.
In a real-time coding system, there are typically two steps for WP. First, the fading scenes are detected; second, the WP parameters (a multiplicative factor and an additive offset in H.264) are estimated. In a practical system, both tasks must be accomplished with low delay; simple calculations and low storage requirements are also needed.
For the fading detection problem, most of the algorithms in
the literature rely on a relatively long window of pictures to

978-1-4244-1764-3/08/$25.00 2008 IEEE


observe enough statistics for an accurate detection. For example, Alattar proposed a method exploiting the average luminance changes and the semi-parabolic behavior of the variance curve [1]; Qian et al. proposed an algorithm that exploits accumulating histogram differences [2]. However, such methods require the availability of the statistics of the entire fade duration, which introduces long delays and is impractical in real-time encoding systems. In this paper, we focus on algorithms that detect the fade within a very short window of pictures, and that are robust to different conditions, such as motion.
For the parameter estimation problem, the simple, empirical method of using pixel average values (DC) is often used in the literature [3] and in the H.264 reference software [4]. The multiplicative weighting factor is calculated as the ratio of the DC values of the current picture and the reference picture; the additive offset is set to zero. In this paper, an accurate estimation of both the multiplicative weighting factor and the additive offset is derived mathematically from the fade model. The simulation results show that this accuracy in parameter estimation reduces bit rate by 5%-30% for the same video quality. This paper focuses only on uni-directional prediction and global WP; the algorithm can be easily extended to bi-directional prediction and localized WP [5].
The rest of the paper is organized as follows. An overview
of WP in an encoding system is presented in Section 2. An
accurate parameter estimation method is then derived from the
mathematical model of fade in Section 3. An efficient and
robust fade detection algorithm which uses simple statistics is
described in Section 4. Simulation results are presented in
Section 5.

2. WEIGHTED PREDICTION OVERVIEW

Figure 1 shows the procedure of applying WP in a real-time encoding system. First, some statistics are generated through video analysis. The statistics within a small window, from several previous pictures up to the current picture, are then used to detect fade. Each picture is assigned a state value indicating whether it is in the NORMAL state or in the FADE state, and these state values are saved for each picture. When encoding a picture, if there is a FADE state in either the current picture or one of its reference pictures, WP is used for this current-reference pair, and the statistics of the current picture and the corresponding reference picture are processed to estimate the WP parameters. These parameters are then passed on to the encoding engine. Otherwise, normal encoding is done.

[Figure 1: block diagram. Video Analysis produces statistics; Fade Detection assigns each picture a NORMAL/FADE state kept in Picture-state Records; from the states of the current picture and its reference pictures, a decision is made whether to use WP; if yes, WP Parameter Estimation passes the WP parameters to the Encoding stage, otherwise normal encoding is done.]

Figure 1: Weighted Prediction Workflow Chart

ICIP 2008

3. PARAMETER ESTIMATION

This section first describes the general mathematical model of fading scenes. The proposed parameter estimation algorithm is then derived.

First consider the following fade model. Let f(t,i,j) denote the pixel value at position (i,j) in frame t of one original sequence f, and g(t,i,j) denote the pixel value at position (i,j) in frame t of another original sequence g. The linear combination of these two sequences within one particular period T is represented as:

  F(t,i,j) = α(t) f(t,i,j) + β(t) g(t,i,j)    (3-1)

where α(t) + β(t) = 1. When g is a solid color and the weighting factor α(t) is getting larger (smaller), it is called a fade in (out) of f.

Now consider the WP model. For weighted uni-prediction, when the pixel at position (i,j) in frame t is predicted from the pixel at position (m,n) in frame t-1, the following relationship is assumed:

  F(t,i,j) = w(t) F(t-1,m,n) + o(t)    (3-2)

From the fade model, we can derive

  F(t,i,j) = [α(t)/α(t-1)] F(t-1,m,n) + [β(t) g(t,i,j) - (α(t)/α(t-1)) β(t-1) g(t-1,m,n)]    (3-3)

Hence only when g is a solid color C, i.e. its values are the same regardless of time and location, can we match exactly to the WP model, with

  w(t) = α(t)/α(t-1),   o(t) = [β(t) - (α(t)/α(t-1)) β(t-1)] C    (3-4)

Note that α(t), β(t) and C are all unknown to the encoder. We have to estimate w(t) and o(t) from observations of the fading scenes.

Now we derive the parameter estimation method for the fade case. Assuming the signals are ergodic, the mean and variance of the original sequence f can be defined as:

  m = mean(f(t)),   σ² = variance(f(t))    (3-5)

The mean and variance of the combined signal can be defined as:

  M(t) = mean(F(t)),   σ_F²(t) = variance(F(t))    (3-6)

Then we have:

  M(t) = α(t) m + (1 - α(t)) C
  M(t-1) = α(t-1) m + (1 - α(t-1)) C    (3-7)

and

  σ_F²(t) = α²(t) σ_f²(t) = α²(t) σ²
  σ_F²(t-1) = α²(t-1) σ_f²(t-1) = α²(t-1) σ²    (3-8)

Therefore we can derive the weight using

  w(t) = α(t)/α(t-1) = [M(t) - C] / [M(t-1) - C]    (3-9)

or

  w(t) = α(t)/α(t-1) = sqrt( σ_F²(t) / σ_F²(t-1) )    (3-10)

Since the solid color value C is generally unknown to the encoder, using the square root of the variance ratio gives a more accurate and robust estimation.

After the weight is derived, the offset can be easily calculated as:

  o(t) = M(t) - w(t) M(t-1)    (3-11)

In H.264, after a fade is detected and WP is to be used, the parameters are calculated for each pair of current picture and reference picture.
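As an illustration, the estimation in Equations (3-10) and (3-11) amounts to only a few operations per current-reference pair. The following is a minimal sketch, assuming the per-picture luma mean and variance are already available from the video analysis stage (the function name is ours, not from the paper):

```python
import math

def estimate_wp_params(mean_cur, var_cur, mean_ref, var_ref):
    """Estimate the WP weight and offset for one current/reference pair.

    w(t) = sqrt(sigma_F^2(t) / sigma_F^2(t-1))   -- Eq. (3-10)
    o(t) = M(t) - w(t) * M(t-1)                  -- Eq. (3-11)
    """
    w = math.sqrt(var_cur / var_ref)   # variance ratio, robust to unknown C
    o = mean_cur - w * mean_ref        # offset from the picture means
    return w, o
```

As a sanity check against the model: for a synthetic fade out to black with α(t-1) = 1.0 and α(t) = 0.8 over a source with mean 100 and variance 400, the picture statistics are (100, 400) and (80, 256), and the sketch recovers w(t) = 0.8 and o(t) = 0.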


4. FADE DETECTION

Fading effects result in lighting changes, which are reflected in both the luma average values M(t) and the luma variance values σ_F²(t). We propose to check both of these statistics to achieve simple yet efficient and robust detection.
First we look at the first-order derivative of the luma average values for each picture. From Equation (3-7) we have:

  ΔM(t) = M(t) - M(t-1) = (α(t) - α(t-1))(m - C)    (4-1)

For a linear fading model where α(t) = t/T, ΔM(t) is a constant value [1]. For more general cases, ΔM(t) should have the same sign throughout the fade. For example, for a fade out of signal f into a black scene, (m - C) is always greater than zero, while (α(t) - α(t-1)) is always less than zero; hence ΔM(t) is always less than zero. Furthermore, a fade is a steady change between pictures, i.e. the changes between adjacent frames are very similar, so we expect the second derivatives of the luma average values, ΔΔM(t) = ΔM(t) - ΔM(t-1), to be close to zero.
We also define the ratio of the luma variances of two adjacent pictures as:

  r(t) = σ_F²(t) / σ_F²(t-1) = α²(t) / α²(t-1)    (4-2)

It is obvious that for a fade out of f, r(t) is always less than one, while for a fade in of f, it is always greater than one. To avoid false alarms when entering the fading mode, we also expect real fading changes between the pictures, i.e., r(t) should be somewhat away from one when transitioning from the NORMAL mode to the FADE mode.
Fading is a continuous behavior, so it should be detected using a window of pictures. In the following, the statistics of N frames are used; N equal to 1 means that only the current picture's statistics are used. This implies a delay of N-1 frames between the video analysis and the encoding. In a practical encoding system, only a short window is allowed, to achieve low delay. For example, with a hierarchical B-picture GOP (Group of Pictures) structure of IbBbP, where B is a reference bi-directional picture and b is a non-reference bi-directional picture, N can be set to 4 without introducing further delay.
In summary, we define the following criteria for the fade detection, using the above functions of the two statistics. For each current picture, its state is initialized as NORMAL. Only if all of the criteria are satisfied is a fade declared; the state of the current picture and the states of the pictures in the past N-1 frames are then set as FADE.
1. Detect a luminance level change (picture getting brighter or darker) among the past N frames, i.e., ΔM(t), ΔM(t-1), ..., ΔM(t-N+1) have the same sign.
2. Detect a steady change between pictures (the changes between adjacent frames are similar), i.e., ΔΔM(t), ΔΔM(t-1), ..., ΔΔM(t-N+2) are within a threshold MAX_DELTA_DELTA_DC.
3. Detect a consistent change of the luma variance (continuously larger than one or less than one) among the past N frames, i.e., r(t) - 1, r(t-1) - 1, ..., r(t-N+1) - 1 have the same sign.
4. Detect a noticeable change in the ratio of variances, i.e., all of r(t), r(t-1), ..., r(t-N+1) are less than a threshold FADE_MIN_VAR_RATIO or greater than a threshold FADE_MAX_VAR_RATIO. This criterion is only checked if the previous frame t-1 is in the NORMAL state, to avoid false alarms when entering the FADE state.

Default values for MAX_DELTA_DELTA_DC, FADE_MIN_VAR_RATIO and FADE_MAX_VAR_RATIO have been determined experimentally as 10, 0.96 and 1.05, respectively.
When all of the above criteria are satisfied, the states of frames t, t-1, ..., t-N+1 are all set as FADE. Note that a fade is a continuous behavior, so the states of all the frames in this N-frame window are set at the same time; also note that this is where the delay in fade detection occurs. A frame can transition from the NORMAL to the FADE state, but once a frame is in the FADE state it stays there (i.e., a transition from FADE back to NORMAL is not allowed for the same frame). For example, when frames 0-3 are analyzed and the above criteria are not satisfied, their states remain NORMAL; when frames 1-4 are then analyzed and the criteria are satisfied, the states of frames 1-4 are all set as FADE, reflecting the entry into FADE at frame 1. Then, when frames 2-5 are analyzed and the criteria are not satisfied, frame 5 is in the NORMAL state, but frames 2-4 remain in the FADE state, reflecting the exit from FADE at frame 5.
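The window test above can be sketched in a few lines of code, assuming the per-picture luma means M and variances V are available from video analysis; the function and variable names are ours, and the thresholds use the experimentally determined defaults from the text:

```python
# Default thresholds, determined experimentally in the text.
MAX_DELTA_DELTA_DC = 10
FADE_MIN_VAR_RATIO = 0.96
FADE_MAX_VAR_RATIO = 1.05

def detect_fade(M, V, t, N, state):
    """Check criteria 1-4 for the N-frame window ending at picture t.

    M, V  -- per-picture luma means and variances from video analysis
    state -- per-picture "NORMAL"/"FADE" records
    Returns True if a fade is declared; the caller then marks pictures
    t-N+1 .. t as FADE.
    """
    # First-order differences dM(k) = M(k) - M(k-1) over the window.
    dM = [M[k] - M[k - 1] for k in range(t - N + 1, t + 1)]
    # 1. Luminance level change with a consistent sign.
    if not (all(d > 0 for d in dM) or all(d < 0 for d in dM)):
        return False
    # 2. Steady change: second differences within MAX_DELTA_DELTA_DC.
    if any(abs(dM[k] - dM[k - 1]) > MAX_DELTA_DELTA_DC for k in range(1, N)):
        return False
    # 3. Variance ratio consistently above or below one.
    r = [V[k] / V[k - 1] for k in range(t - N + 1, t + 1)]
    if not (all(x > 1 for x in r) or all(x < 1 for x in r)):
        return False
    # 4. Noticeable ratio change, checked only when entering from NORMAL.
    if state[t - 1] == "NORMAL":
        if not (all(x < FADE_MIN_VAR_RATIO for x in r) or
                all(x > FADE_MAX_VAR_RATIO for x in r)):
            return False
    return True
```

For example, a linear fade out to black (means 100, 90, 80, 70, 60 with variances 400, 324, 256, 196, 144) satisfies all four criteria for N = 3 at t = 4, while a static scene with constant statistics fails criterion 1.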
After the fade detection, the decision whether to use WP is made. If any of the reference pictures or the current picture is in the FADE state, WP is used. For each reference picture in the prediction list, if its state or the state of the current picture is FADE, the WP parameters for this pair are calculated and transmitted in the bitstream.
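Before transmission, the real-valued w(t) and o(t) must be mapped to the integer weight and offset that H.264 signals in the slice header for explicit WP. The following is a minimal sketch of that quantization step, assuming 8-bit video; the helper name and the default denominator choice are ours:

```python
def to_h264_wp(w, o, log2_denom=6):
    """Quantize real-valued WP parameters to the integer weight/offset
    carried in the H.264 slice header (explicit WP).

    The weight is fixed-point with denominator 2**log2_denom
    (luma_log2_weight_denom, range 0..7); for 8-bit video both the
    weight and the offset are clipped to the signed range [-128, 127].
    """
    weight = int(round(w * (1 << log2_denom)))   # fixed-point weight
    weight = max(-128, min(127, weight))
    offset = int(round(o))                       # additive offset in pixel units
    offset = max(-128, min(127, offset))
    return weight, offset
```

For instance, with the default denominator of 64, an estimated weight of 0.8 with zero offset is signaled as (51, 0), and a weight of 1.0 (no scaling) as 64.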

5. SIMULATION RESULTS

Three sequences were used in the simulation. Trailer is a 480x204 sequence from a movie trailer with a fade out; Low-Motion and High-Motion are synthetically generated 720x480 sequences with both a fade in and a fade out, containing low-motion scenes and high-motion scenes respectively.
To evaluate the effectiveness of the fade detection algorithm with different delays, delays of 2 frames and 3 frames were simulated. Both are short windows suitable for real-time encoding systems. Figure 2 shows the detection results for the Trailer sequence. In True Transition, value 1 means a NORMAL state, while value 2 means a FADE state. The detection errors are calculated as the difference between the true transition and the detected transition. For this particular sequence, a delay of 2 frames introduced some false alarms, while a delay of 3 frames gave the correct detections. The false alarms happened on some zoom scenes, where the statistics happened to be similar to the fade case. For the other two synthetic sequences, both delays of 2 frames and 3 frames gave correct detections. So a delay of 3 frames is in general sufficient for fade detection with the proposed algorithm.
To evaluate the performance of the proposed parameter estimation algorithm, all three sequences were encoded at QP = 28, 32, 36 and 40 with an H.264 codec using I/P pictures only. Three methods are compared. In No WP, no weighted prediction is used. In WP with DC, the weight is estimated as the ratio of the luma DC values, while the offset is set to zero; this is the algorithm used in the JM encoder and is the most popular method. Proposed WP represents our proposed algorithm. The detection results with a delay of 3 frames are used to decide which pictures use weighted prediction for both WP with DC and Proposed WP, so the only difference is the parameter estimation. Figure 3 and Figure 4 illustrate the rate-distortion (RD) performance of all the methods for Trailer and High-Motion respectively. Table 1 gives the average PSNR gain and bitrate savings using the measurement in [6]. It clearly shows that the proposed WP algorithm outperforms the traditional methods with 5%-30% bitrate savings. The gains are bigger at lower bit rates and in higher-motion scenes.

[Figure 2: fade detection results for Trailer — transition state (1 = NORMAL, 2 = FADE) versus picture number (1-26), showing True Transition, Delay 2 Detection Error, and Delay 3 Detection Error]

Figure 2: Fade Detection Results for Trailer

[Figure 3: RD curves for Trailer — PSNR (dB, 34-41) versus bit rate (200-1200), with curves for No WP, WP with DC, and Proposed WP]

Figure 3: RD Performance for Trailer

[Figure 4: RD curves for High-Motion — PSNR (dB, 31-39) versus bit rate (200-5200), with curves for No WP, WP with DC, and Proposed WP]

Figure 4: RD Performance for High-Motion

Table 1: Performance comparison

Sequence    WP with DC               Proposed WP
            Bitrate(%)   PSNR(dB)    Bitrate(%)   PSNR(dB)
Trailer     -17.93       0.88        -22.78       1.10
Low-Mot     -20.02       1.12        -34.35       1.85
High-Mot    -9.86        0.54        -23.11       1.06

6. CONCLUSIONS

In this paper, an accurate weighted prediction parameter estimation algorithm and an efficient and robust fade detection algorithm were proposed. The algorithms use very simple statistics with low delay, which is suitable for practical real-time encoding systems. Simulation results show accurate detection results and significant compression efficiency gains.

REFERENCES

[1] A. M. Alattar, "Detecting Fade Regions in Uncompressed Video Sequences," ICASSP 1997, pp. 3025-3028.
[2] X. Qian, G. Liu and R. Su, "Effective Fades and Flashlight Detection Based on Accumulating Histogram Differences," IEEE Transactions on CSVT, vol. 16, no. 10, pp. 1245-1258, 2006.
[3] J. Boyce, "Weighted Prediction in the H.264/MPEG4 AVC video coding standard," ISCAS, pp. 789-792, May 2004.
[4] JVT Reference Software, http://bs.hhi.de/~suehring/download
[5] P. Yin, A. Tourapis and J. Boyce, "Localized Weighted Prediction for Video Coding," ISCAS, pp. 4365-4368, May 2005.
[6] G. Bjontegaard, "Calculation of average PSNR differences between RD-Curves," document VCEG-M33, March 2001.

