
ACCURATE PARAMETER ESTIMATION AND EFFICIENT FADE DETECTION FOR WEIGHTED PREDICTION IN H.264 VIDEO COMPRESSION


Rui Zhang* and Guy Cote**
Cisco Systems Inc, 170 West Tasman Drive, San Jose, CA 95134*
Apple, 1 Infinite Loop, Cupertino, CA 95014**
ruizhang@cisco.com, cote@apple.com
ABSTRACT
Weighted prediction is a useful tool in video compression for encoding scenes with lighting changes, such as fading scenes. Estimating weighted prediction parameters has been extensively discussed in the literature; however, no mathematical model has been proposed. Moreover, the detection of fading scenes in a real-time encoding system has received little attention. This paper addresses both of these aspects. An accurate parameter estimation algorithm for H.264 encoding is first derived for both the multiplicative factor and the additive offset based on a fading model. An efficient algorithm is then proposed to detect fades in a real-time encoding system, with simple statistics calculations, very low storage requirements, and low encoding delay. Simulation results show very accurate detection and compression gains of 5-30% over existing techniques.
Index Terms: Weighted prediction, fade detection, video compression, H.264

1. INTRODUCTION

Motion compensation is a major tool for achieving compression efficiency in video coding systems: the current picture is predicted from a reference picture and only the prediction difference is encoded. The more correlated the prediction picture is with the current picture, the higher the compression efficiency. However, in some video scenes, particularly fading scenes, the current picture is more correlated with the reference picture scaled by a weighting factor than with the reference picture itself. Hence weighted prediction (WP) is a useful tool in such scenarios. Modern video coding standards, such as H.264, have adopted WP to improve coding efficiency under certain conditions.
In a real-time coding system, there are typically two steps for WP. First, the fading scenes are detected; second, the WP parameters (a multiplicative factor and an additive offset in H.264) are estimated. In a practical system, both tasks must be accomplished with low delay; simple calculations and low storage requirements are also needed.
For the fading detection problem, most of the algorithms in
the literature rely on a relatively long window of pictures to

978-1-4244-1764-3/08/$25.00 2008 IEEE


observe enough statistics for an accurate detection. For example, Alattar proposed a method exploiting the average luminance changes and the semi-parabolic behavior of the variance curve [1]; Qian et al. proposed an algorithm that exploits accumulating histogram differences [2]. However, such methods require the availability of the statistics of the entire fade duration, which introduces long delays and is impractical in real-time encoding systems. In this paper, we focus on algorithms that detect the fade within a very short window of pictures, and that are robust to different conditions, such as motion.
For the parameter estimation problem, the simple, empirical method of using pixel average values (DC) is often used in the literature [3] and in the H.264 reference software [4]. The multiplicative weighting factor is calculated as the ratio of the DC values of the current picture and the reference picture; the additive offset is set to zero. In this paper, an accurate estimation of both the multiplicative weighting factor and the additive offset is derived mathematically from the fade model. The simulation results show that this accuracy in parameter estimation reduces bit rate by 5%-30% for the same video quality. This paper focuses only on uni-directional prediction and global WP; the algorithm can be easily extended to bi-directional prediction and localized WP [5].
The rest of the paper is organized as follows. An overview
of WP in an encoding system is presented in Section 2. An
accurate parameter estimation method is then derived from the
mathematical model of fade in Section 3. An efficient and
robust fade detection algorithm which uses simple statistics is
described in Section 4. Simulation results are presented in
Section 5.

2. WEIGHTED PREDICTION OVERVIEW

Figure 1 shows the procedure of applying WP in a real-time encoding system. First, some statistics are generated through video analysis. The statistics within a small window, from several previous pictures up to the current picture, are then used to detect fade. Each picture is assigned a state value indicating whether it is in the NORMAL state or in the FADE state, and these state values are saved for each picture. When encoding a picture, if there is a FADE state in either the current picture or one of its reference pictures, WP is used for this current-reference pair, and the statistics of the current picture and the corresponding reference picture are processed to estimate the WP parameters. These parameters are then passed on to the encoding engine. Otherwise, normal encoding is done.

[Figure 1: block diagram. Video Analysis produces statistics; Fade Detection assigns each picture a NORMAL/FADE state kept in Picture-state Records; from the states of the current picture and its reference pictures, a decision is made whether to use WP; if yes, WP Parameter Estimation passes the WP parameters to the Encoding stage, otherwise normal encoding is done.]

Figure 1: Weighted Prediction Workflow Chart

ICIP 2008

3. PARAMETER ESTIMATION

This section first describes the general mathematical model of fading scenes. The proposed parameter estimation algorithm is then derived.

First consider the following fade model. Let f(t,i,j) denote the pixel value at position (i,j) in frame t of one original sequence f, and g(t,i,j) denote the pixel value at position (i,j) in frame t of another original sequence g. The linear combination of these two sequences within one particular period T is represented as:

  F(t,i,j) = α(t) f(t,i,j) + β(t) g(t,i,j)    (3-1)

where α(t) + β(t) = 1. When g is a solid color and the weighting factor α(t) is getting larger (smaller), it is called a fade in (out) of f.

Now consider the WP model. For weighted uni-prediction, when the pixel at position (i,j) in frame t is predicted from the pixel at position (m,n) in frame t-1, the following relationship is assumed:

  F(t,i,j) = w(t) F(t-1,m,n) + o(t)    (3-2)

From the fade model, we can derive

  F(t,i,j) = [α(t)/α(t-1)] F(t-1,m,n) + [β(t) g(t,i,j) - (α(t)/α(t-1)) β(t-1) g(t-1,m,n)]    (3-3)

Hence only when g is a solid color C, i.e. its values are the same regardless of time and location, can we match exactly to the WP model, with

  w(t) = α(t)/α(t-1),   o(t) = [β(t) - (α(t)/α(t-1)) β(t-1)] C    (3-4)

Note that α(t), β(t) and C are all unknown to the encoder. We have to estimate w(t) and o(t) from observations of the fading scenes.

Now we derive the parameter estimation method for the fade case. Assuming the signals are ergodic, the mean and variance of the original sequence f can be defined as:

  m = mean(f(t)),   σ² = variance(f(t))    (3-5)

The mean and variance of the combined signal can be defined as:

  M(t) = mean(F(t)),   σ_F²(t) = variance(F(t))    (3-6)

Then we have:

  M(t) = α(t) m + (1 - α(t)) C
  M(t-1) = α(t-1) m + (1 - α(t-1)) C    (3-7)

and

  σ_F²(t) = α²(t) σ_f²(t) = α²(t) σ²
  σ_F²(t-1) = α²(t-1) σ_f²(t-1) = α²(t-1) σ²    (3-8)

Therefore we can derive the weight using

  w(t) = α(t)/α(t-1) = [M(t) - C] / [M(t-1) - C]    (3-9)

or

  w(t) = α(t)/α(t-1) = sqrt( σ_F²(t) / σ_F²(t-1) )    (3-10)

Since the solid color value C is generally unknown to the encoder, using the square root of the variance ratio gives a more accurate and robust estimation.

After the weight is derived, the offset can be easily calculated as:

  o(t) = M(t) - w(t) M(t-1)    (3-11)

In H.264, after a fade is detected and WP is to be used, the parameters are calculated for each pair of current picture and reference picture.
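As an illustration, the estimation in Equations (3-10) and (3-11) amounts to only a few operations per current-reference pair. The following is a minimal sketch, assuming the per-picture luma mean and variance are already available from the video analysis stage (the function name is ours, not from the paper):

```python
import math

def estimate_wp_params(mean_cur, var_cur, mean_ref, var_ref):
    """Estimate the WP weight and offset for one current/reference pair.

    w(t) = sqrt(sigma_F^2(t) / sigma_F^2(t-1))   -- Eq. (3-10)
    o(t) = M(t) - w(t) * M(t-1)                  -- Eq. (3-11)
    """
    w = math.sqrt(var_cur / var_ref)   # variance ratio, robust to unknown C
    o = mean_cur - w * mean_ref        # offset from the picture means
    return w, o
```

As a sanity check against the model: for a synthetic fade out to black with α(t-1) = 1.0 and α(t) = 0.8 over a source with mean 100 and variance 400, the picture statistics are (100, 400) and (80, 256), and the sketch recovers w(t) = 0.8 and o(t) = 0.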


4. FADE DETECTION

Fading effects result in lighting changes, which are reflected in both the luma average values M(t) and the luma variance values σ_F²(t). We propose to check both of these statistics to achieve simple yet efficient and robust detection.
First we look at the first-order derivative of the luma average values for each picture. From Equation (3-7) we have:

  ΔM(t) = M(t) - M(t-1) = (α(t) - α(t-1))(m - C)    (4-1)

For a linear fading model where α(t) = t/T, ΔM(t) is a constant value [1]. For more general cases, ΔM(t) should have the same sign throughout the fade. For example, for a fade out of signal f into a black scene, (m - C) is always greater than zero, while (α(t) - α(t-1)) is always less than zero; hence ΔM(t) is always less than zero. Furthermore, a fade is a steady change between pictures, i.e. the changes between adjacent frames are very similar, so we expect the second derivatives of the luma average values, ΔΔM(t) = ΔM(t) - ΔM(t-1), to be close to zero.
We also define the ratio of the luma variances of two adjacent pictures as:

  r(t) = σ_F²(t) / σ_F²(t-1) = α²(t) / α²(t-1)    (4-2)

It is obvious that for a fade out of f, r(t) is always less than one, while for a fade in of f, it is always greater than one. To avoid false alarms when entering the fading mode, we also expect real fading changes between the pictures, i.e., r(t) should be somewhat away from one when transitioning from the NORMAL mode to the FADE mode.
Fading is a continuous behavior, so it should be detected using a window of pictures. In the following, the statistics of N frames are used; N equal to 1 means that only the current picture's statistics are used. This implies a delay of N-1 frames between the video analysis and the encoding. In a practical encoding system, only a short window is allowed, to achieve low delay. For example, with a hierarchical B-picture GOP (Group of Pictures) structure of IbBbP, where B is a reference bi-directional picture and b is a non-reference bi-directional picture, N can be set to 4 without introducing further delay.
In summary, we define the following criteria for the fade detection, using the above functions of the two statistics. For each current picture, its state is initialized as NORMAL. Only if all of the criteria are satisfied is a fade declared; the state of the current picture and the states of the pictures in the past N-1 frames are then set as FADE.
1. Detect a luminance level change (picture getting brighter or darker) among the past N frames, i.e., ΔM(t), ΔM(t-1), ..., ΔM(t-N+1) have the same sign.
2. Detect a steady change between pictures (the changes between adjacent frames are similar), i.e., ΔΔM(t), ΔΔM(t-1), ..., ΔΔM(t-N+2) are within a threshold MAX_DELTA_DELTA_DC.
3. Detect a consistent change of the luma variance (continuously larger than one or less than one) among the past N frames, i.e., r(t) - 1, r(t-1) - 1, ..., r(t-N+1) - 1 have the same sign.
4. Detect a noticeable change in the ratio of variances, i.e., all of r(t), r(t-1), ..., r(t-N+1) are less than a threshold FADE_MIN_VAR_RATIO or greater than a threshold FADE_MAX_VAR_RATIO. This criterion is only checked if the previous frame t-1 is in the NORMAL state, to avoid false alarms when entering the FADE state.

Default values for MAX_DELTA_DELTA_DC, FADE_MIN_VAR_RATIO and FADE_MAX_VAR_RATIO have been determined experimentally as 10, 0.96 and 1.05, respectively.
When all of the above criteria are satisfied, the states of frames t, t-1, ..., t-N+1 are all set as FADE. Note that a fade is a continuous behavior, so the states of all the frames in this N-frame window are set at the same time; also note that this is where the delay in fade detection occurs. A frame can transition from the NORMAL to the FADE state, but once a frame is in the FADE state it stays there (i.e., a transition from FADE back to NORMAL is not allowed for the same frame). For example, when frames 0-3 are analyzed and the above criteria are not satisfied, their states remain NORMAL; when frames 1-4 are then analyzed and the criteria are satisfied, the states of frames 1-4 are all set as FADE, reflecting the entry into FADE at frame 1. Then, when frames 2-5 are analyzed and the criteria are not satisfied, frame 5 is in the NORMAL state, but frames 2-4 remain in the FADE state, reflecting the exit from FADE at frame 5.
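The window test above can be sketched in a few lines of code, assuming the per-picture luma means M and variances V are available from video analysis; the function and variable names are ours, and the thresholds use the experimentally determined defaults from the text:

```python
# Default thresholds, determined experimentally in the text.
MAX_DELTA_DELTA_DC = 10
FADE_MIN_VAR_RATIO = 0.96
FADE_MAX_VAR_RATIO = 1.05

def detect_fade(M, V, t, N, state):
    """Check criteria 1-4 for the N-frame window ending at picture t.

    M, V  -- per-picture luma means and variances from video analysis
    state -- per-picture "NORMAL"/"FADE" records
    Returns True if a fade is declared; the caller then marks pictures
    t-N+1 .. t as FADE.
    """
    # First-order differences dM(k) = M(k) - M(k-1) over the window.
    dM = [M[k] - M[k - 1] for k in range(t - N + 1, t + 1)]
    # 1. Luminance level change with a consistent sign.
    if not (all(d > 0 for d in dM) or all(d < 0 for d in dM)):
        return False
    # 2. Steady change: second differences within MAX_DELTA_DELTA_DC.
    if any(abs(dM[k] - dM[k - 1]) > MAX_DELTA_DELTA_DC for k in range(1, N)):
        return False
    # 3. Variance ratio consistently above or below one.
    r = [V[k] / V[k - 1] for k in range(t - N + 1, t + 1)]
    if not (all(x > 1 for x in r) or all(x < 1 for x in r)):
        return False
    # 4. Noticeable ratio change, checked only when entering from NORMAL.
    if state[t - 1] == "NORMAL":
        if not (all(x < FADE_MIN_VAR_RATIO for x in r) or
                all(x > FADE_MAX_VAR_RATIO for x in r)):
            return False
    return True
```

For example, a linear fade out to black (means 100, 90, 80, 70, 60 with variances 400, 324, 256, 196, 144) satisfies all four criteria for N = 3 at t = 4, while a static scene with constant statistics fails criterion 1.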
After the fade detection, the decision whether to use WP is made. If any of the reference pictures or the current picture is in the FADE state, WP is used. For each reference picture in the prediction list, if its state or the state of the current picture is FADE, the WP parameters for this pair are calculated and transmitted in the bitstream.
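Before transmission, the real-valued w(t) and o(t) must be mapped to the integer weight and offset that H.264 signals in the slice header for explicit WP. The following is a minimal sketch of that quantization step, assuming 8-bit video; the helper name and the default denominator choice are ours:

```python
def to_h264_wp(w, o, log2_denom=6):
    """Quantize real-valued WP parameters to the integer weight/offset
    carried in the H.264 slice header (explicit WP).

    The weight is fixed-point with denominator 2**log2_denom
    (luma_log2_weight_denom, range 0..7); for 8-bit video both the
    weight and the offset are clipped to the signed range [-128, 127].
    """
    weight = int(round(w * (1 << log2_denom)))   # fixed-point weight
    weight = max(-128, min(127, weight))
    offset = int(round(o))                       # additive offset in pixel units
    offset = max(-128, min(127, offset))
    return weight, offset
```

For instance, with the default denominator of 64, an estimated weight of 0.8 with zero offset is signaled as (51, 0), and a weight of 1.0 (no scaling) as 64.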

5. SIMULATION RESULTS

Three sequences were used in the simulation. Trailer is a 480x204 sequence from a movie trailer with a fade out; Low-Motion and High-Motion are synthetically generated 720x480 sequences with both a fade in and a fade out, containing low-motion scenes and high-motion scenes respectively.
To evaluate the effectiveness of the fade detection algorithm with different delays, delays of 2 frames and 3 frames were simulated. Both are short windows suitable for real-time encoding systems. Figure 2 shows the detection results for the Trailer sequence. In True Transition, value 1 means a NORMAL state, while value 2 means a FADE state. The detection errors are calculated as the difference between the true transition and the detected transition. For this particular sequence, a delay of 2 frames introduced some false alarms, while a delay of 3 frames gave the correct detections. The false alarms happened on some zoom scenes, where the statistics happened to be similar to the fade case. For the other two synthetic sequences, both delays of 2 frames and 3 frames gave correct detections. So a delay of 3 frames is in general sufficient for fade detection with the proposed algorithm.
To evaluate the performance of the proposed parameter estimation algorithm, all three sequences were encoded at QP = 28, 32, 36 and 40 with an H.264 codec using I/P pictures only. Three methods are compared. In No WP, no weighted prediction is used. In WP with DC, the weight is estimated as the ratio of the luma DC values, while the offset is set to zero; this is the algorithm used in the JM encoder and is the most popular method. Proposed WP represents our proposed algorithm. The detection results with a delay of 3 frames are used to decide which pictures use weighted prediction for both WP with DC and Proposed WP, so the only difference is the parameter estimation. Figure 3 and Figure 4 illustrate the rate-distortion (RD) performance of all the methods for Trailer and High-Motion respectively. Table 1 gives the average PSNR gain and bitrate savings using the measurement in [6]. It clearly shows that the proposed WP algorithm outperforms the traditional methods with 5%-30% bitrate savings. The gains are bigger at lower bit rates and in higher-motion scenes.

[Figure 2: fade detection results for Trailer — transition state (1 = NORMAL, 2 = FADE) versus picture number (1-26), showing True Transition, Delay 2 Detection Error, and Delay 3 Detection Error]

Figure 2: Fade Detection Results for Trailer

[Figure 3: RD curves for Trailer — PSNR (dB, 34-41) versus bit rate (200-1200), with curves for No WP, WP with DC, and Proposed WP]

Figure 3: RD Performance for Trailer

[Figure 4: RD curves for High-Motion — PSNR (dB, 31-39) versus bit rate (200-5200), with curves for No WP, WP with DC, and Proposed WP]

Figure 4: RD Performance for High-Motion

Table 1: Performance comparison

Sequence    WP with DC               Proposed WP
            Bitrate(%)   PSNR(dB)    Bitrate(%)   PSNR(dB)
Trailer     -17.93       0.88        -22.78       1.10
Low-Mot     -20.02       1.12        -34.35       1.85
High-Mot    -9.86        0.54        -23.11       1.06

6. CONCLUSIONS

In this paper, an accurate weighted prediction parameter estimation algorithm and an efficient and robust fade detection algorithm were proposed. The algorithms use very simple statistics with low delay, which is suitable for practical real-time encoding systems. Simulation results show accurate detection results and significant compression efficiency gains.

REFERENCES

[1] A. M. Alattar, "Detecting Fade Regions in Uncompressed Video Sequences," ICASSP 1997, pp. 3025-3028.
[2] X. Qian, G. Liu and R. Su, "Effective Fades and Flashlight Detection Based on Accumulating Histogram Differences," IEEE Transactions on CSVT, vol. 16, no. 10, pp. 1245-1258, 2006.
[3] J. Boyce, "Weighted Prediction in the H.264/MPEG4 AVC video coding standard," ISCAS, pp. 789-792, May 2004.
[4] JVT Reference Software, http://bs.hhi.de/~suehring/download
[5] P. Yin, A. Tourapis and J. Boyce, "Localized Weighted Prediction for Video Coding," ISCAS, pp. 4365-4368, May 2005.
[6] G. Bjontegaard, "Calculation of average PSNR differences between RD-Curves," document VCEG-M33, March 2001.

