
Temporally Coherent Local Tone Mapping of HDR Video

Tunc Ozan Aydın, Nikolce Stefanoski, Simone Croci, Markus Gross, Aljoscha Smolic

Disney Research Zurich ETH Zurich

[Figure 1: (a) tone mapped frames, (b) HDR input frames over time, (c) mean brightness plots.]
Figure 1: Our method allows local tone mapping of HDR video where the scene illumination varies greatly over time, in this case more than 4 log10 units (c). Note that our method is temporally stable despite the complex motion of both the camera and the actor (b), and the resulting tone mapped video shows strong local contrast (a). The plots show the log-mean luminance of the HDR video and the mean pixel values of the tone mapped video.

Abstract

Recent subjective studies showed that current tone mapping operators either produce disturbing temporal artifacts, or are limited in their local contrast reproduction capability. We address both of these issues and present an HDR video tone mapping operator that can greatly reduce the input dynamic range, while at the same time preserving scene details without causing significant visual artifacts. To achieve this, we revisit the commonly used spatial base-detail layer decomposition and extend it to the temporal domain. We achieve high quality spatiotemporal edge-aware filtering efficiently by using a mathematically justified iterative approach that approximates a global solution. Comparison with the state-of-the-art, both qualitatively and quantitatively through a controlled subjective experiment, clearly shows our method's advantages over previous work. We present local tone mapping results on challenging high resolution scenes with complex motion and varying illumination. We also demonstrate our method's capability of preserving scene details at user adjustable scales, and its advantages for low light video sequences with significant camera noise.

CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation–Display Algorithms; I.4.8 [Image Processing and Computer Vision]: Scene Analysis–Time-varying Imagery

Keywords: Video Tone Mapping, Edge-Aware Video Filtering

Links: DL PDF

1 Introduction

New video technologies keep improving the quality of the viewing experience: display resolutions are moving up from HD to 4K, frame rates in cinemas are increasing to 48 fps and beyond, and stereoscopic 3D is introducing depth as an additional dimension. While through these advances the fidelity of the content in terms of spatial and temporal resolution and depth has been gradually increasing, the dynamic range aspect of video has received little attention until recently. However, this can be expected to change quickly as modern high-end cameras (such as the Red Epic Dragon, Sony F55 and F65, and ARRI Alexa XT) can now natively capture High Dynamic Range (HDR) video of up to 14 f-stops. Creatives are highly interested in HDR video because it allows them to show more visual detail and extend the limits of artistic expression. Consequently, the entertainment industry is working towards HDR video pipelines for delivering content to the end user, including related distribution standards (MPEG and JPEG). However, the final element of the pipeline is slightly lagging behind: despite the existence of a small number of products and impressive research prototypes, consumer level HDR display technology is not yet on the horizon.
Still, in the absence of displays that are capable of fully reproducing the captured dynamic range, tone mapping of HDR video provides a means of visualization and artistic expression. More specifically, it may be desirable to reduce the dynamic range of captured HDR content while both maintaining most of the visual details and not hampering the picture quality by introducing visible artifacts. Recent subjective experiments on the state-of-the-art in HDR video tone mapping [Eilertsen et al. 2013; Petit and Mantiuk 2013] revealed that none of the current methods (including a camera response curve) could achieve both goals at the same time.

In this work we propose a new HDR video tone mapping operator (TMO) to help close the gap between the captured dynamic range and the displayed dynamic range. We build upon prior work in image tone mapping that utilizes base and detail layers. Such a decomposition allows compressing the base layer's dynamic range while the local contrast remains intact in the detail layer. Different from prior work, our decomposition utilizes spatiotemporal filtering through per-pixel motion paths. This way, our method enforces temporal coherence and significantly reduces temporal artifacts such as brightness flickering and camera noise without introducing ghosting. As a result, we enable local HDR video tone mapping that can be art-directed without introducing visually significant artifacts.

The two main contributions of our work are the following:

- A temporally coherent and local video tone mapping method that can maintain a high level of local contrast with fewer temporal artifacts compared to the state-of-the-art (validated by a subjective study).
- A practical and efficiently parallelizable filtering approach specifically designed for tone mapping, that reduces halo artifacts by approximating a global solution through iterative application (with a formal analysis of the filter's halo reduction property).

We show our method's advantages both qualitatively through examples (Sections 5.1 and 5.4), and quantitatively through a controlled subjective study (Section 5.2). We also demonstrate that our method allows creative control over spatial and temporal contrast (Section 5.5) through a simple user interface (Section 5.3). Finally, due to temporal filtering we show that our method works especially well for low light shots with significant camera noise (Section 5.6). The next section continues with a brief discussion of the related work.

2 Background

HDR Image Tone Mapping has been extensively studied in computer graphics. Here we give a brief introduction and refer the reader to Reinhard et al. [2010] for a comprehensive overview of the field. Since the early days of photography, reproducing the dynamic range of natural scenes on chemically limited negatives has been a big challenge. This problem has been reduced by adopting an S-shaped tone curve that gradually compresses highlights and shadows [Mantiuk et al. 2008], as well as by utilizing techniques such as dodging and burning during the photographic development process to locally adjust the image exposure [Reinhard et al. 2002]. The disconnect between the technical aspects of tone reproduction and the artistic concerns of photography has been addressed by the Zone System [Adams 1981], which has been widely used since its inception [Reinhard et al. 2002].

Reinhard et al. [2002] attempted to faithfully model this photographic tone reproduction process for digital HDR images. Their method is known for its tendency to preserve the natural look of the original scene. Another class of TMOs [Ferwerda et al. 1996; Pattanaik et al. 2000; Reinhard and Devlin 2005] takes the naturalness aspect one step further by modeling various aspects of the human visual system, aiming to reproduce the scene from the eyes of a hypothetical human observer.

A number of operators take an alternative approach where they focus on reproducing the original scene by preserving its local contrast. Images produced by such local (spatially-varying) TMOs can often easily be recognized due to their characteristic look with reduced brightness differences between highlights and shadows, but greatly enhanced fine scale details (our teaser shows an example).

It is worth noting that results of local and global TMOs can look quite different, and some people have strong aesthetic preferences for one type of tone mapping over the other. In this work, even though we do not advocate any particular visual style, we focus on local tone mapping because it is currently unresolved for video, and is the more general problem among the two.

One way of achieving local image tone mapping is by selectively compressing the input HDR image in the gradient domain, and reconstructing a reduced dynamic range image from the modified gradients [Fattal et al. 2002; Mantiuk et al. 2006]. Another approach is to decompose the HDR image into a base and a detail layer, which respectively contain coarse and fine scale contrast [Durand and Dorsey 2002]. The dynamic range reduction is achieved by recombining the detail layer with a tone compressed base layer. The decomposition is often done using edge-aware filters in order to reduce the effect of the well-known halo artifacts, which appear near strong image edges if the base layer compression is too strong. In principle, our method is also based on a base-detail layer decomposition using edge-aware filters (although we compute these layers in a novel way). Therefore we review edge-aware filtering in the following paragraph from a tone mapping perspective.

Edge-Aware Filtering is a fundamental building block for several computer graphics applications including tone mapping. Milanfar [2013] gives an excellent overview of the many filtering approaches. HDR image tone mapping has been a popular application among the edge-aware filtering methods published since the establishment of edge-aware filtering based tone mapping [Durand and Dorsey 2002]. However, the treatment of tone mapping among these works has (understandably) often been brief, since the main contributions of these works lie elsewhere. In the absence of extensive comparisons, it is difficult to reach any conclusions on how well various edge-aware filters perform on HDR tone mapping.

Nevertheless, if the base layer is compressed strongly during tone mapping, the bilateral filter [Tomasi and Manduchi 1998] has been known to produce visible halo and ringing artifacts in the resulting image. The Weighted Least Squares (WLS) filter [Farbman et al. 2008] has the advantage of preventing halo artifacts by minimizing a function whose data term penalizes the distance between the original and filtered image. On the downside, this approach requires solving large numerically-challenging linear systems [Fattal 2009], and involves the complex machinery of a multi-resolution preconditioned conjugate gradient solver [Paris et al. 2011]. Still, WLS has been used as the reference method in follow-up work due to the high quality of its results [Fattal 2009; Gastal and Oliveira 2011]. The computation of the Edge-avoiding Wavelets [Fattal 2009] is much simpler; however, they have been shown to suffer from aliasing and to generate irregular edges [Paris et al. 2011]. The Local Laplacian Filter is capable of producing high quality results; however, its authors point out the running time as the method's main shortcoming [Paris et al. 2011]. While a faster implementation recently became available [Aubry et al. 2014], the temporal stability aspect is still an open question, and it is not clear how to extend the method to the temporal domain. A number of other recent filtering techniques [Gastal and Oliveira 2011; He et al. 2013] have been shown to produce plausible tone mapping results in a limited number of images and configurations, but have not been thoroughly tested for halos and other tone mapping artifacts. Likewise, recent work involving spatiotemporal filtering has neither been designed nor tested for video tone mapping [Lang et al. 2012; Ye et al. 2014].

In this work we propose a practical and parallelizable filter designed
specifically for video tone mapping. The key properties of our filter are its natural extension to the temporal domain and the reduction of halo artifacts through iterative application (Section 4). Importantly, we also present a formal discussion of our method's halo reduction property and its theoretical similarities to WLS (Section 4.4).

Video Tone Mapping has been a far less active field than image tone mapping. A major obstacle for video tone mapping research has been the absence of high quality HDR content available to the research community, which has been partially addressed by the recent developments in high-end cameras and other experimental systems [Tocci et al. 2011].

A number of TMOs comprise temporal components that allow them to process HDR video. Among them, the global operators [Ferwerda et al. 1996; Pattanaik et al. 2000; Irawan et al. 2005; Van Hateren 2006; Mantiuk et al. 2008; Boitard et al. 2012], including the S-shaped camera response curve, generally produce results with good temporal coherency but low spatial contrast. On the other hand, the local operators [Ledda et al. 2004; Bennett and McMillan 2005; Benoit et al. 2009; Reinhard et al. 2012] often maintain high contrast at the cost of more temporal artifacts. The advantages and shortcomings of these operators have been discussed and subjectively evaluated by Eilertsen et al. [2013] (we present a complementary subjective study in Section 5.2).

Other work in video tone mapping [Kang et al. 2003; Ramsey et al. 2004; Kiser et al. 2012] focused on extending the photographic TMO's tone curve [Reinhard et al. 2002] with temporal filtering for computing a temporally coherent key value. For such global TMOs, the temporal coherence of the tone mapping results is shown to be enhanced by simply ensuring temporal smoothness in the TMO parameter space (they could also benefit from automatic flicker detection [Guthier et al. 2011]). However, like other global TMOs, these methods are inherently limited in local contrast reproduction. Recent work by Boitard et al. [2014b] extends their previous work [Boitard et al. 2012] on temporally coherent global tone mapping. The extension consists of segmenting each video frame (typically into 2–4 segments) and applying a global tone curve to each segment individually. This way they introduce local adaptation at a segment level at the cost of more complex processing involving video segmentation. While they claim local brightness coherency, the effectiveness of their method remains unclear since their user study measures subjective preference rather than assessing temporal artifacts and/or local contrast reproduction. It is important to note that the parameter smoothing approach employed in the aforementioned methods is not an option for local tone mapping, where temporal coherence has to be achieved in the image domain. Such a temporal extension has been proposed for the gradient domain image TMO [Lee and Kim 2007], which utilizes a block matching algorithm between two consecutive frames. However, the effectiveness of the extension is limited to the highest frequency temporal artifacts due to the two-frame temporal neighborhood. Also, the Poisson reconstruction step of the gradient domain operator is costly at high resolutions.

Subjective studies and surveys that evaluated the state-of-the-art in HDR video tone mapping [Eilertsen et al. 2013; Petit and Mantiuk 2013] came to the conclusion that the problem is far from being solved. As for future work, temporal models for avoiding artifacts, and local processing for maintaining a high level of detail and contrast, have been pointed out as two avenues for improvement. In this paper we address both challenges and propose a novel method for temporally coherent local HDR video tone mapping.

3 Local HDR Video Tone Mapping

In this section we give an overview of the various aspects of the HDR video tone mapping problem and describe the main computational steps of our method.

3.1 Visual Trade-offs in Local Video Tone Mapping

Since HDR tone mapping often significantly reduces the input dynamic range, some of the scene contrast is inevitably lost during the process. As such, local image tone mapping involves a visual trade-off between fine and coarse scale contrast. If coarse scale contrast is emphasized, the luminance differences between large image regions, as well as between highlights and shadows, become more pronounced at the cost of the visibility of the fine scale details. If, on the other hand, fine scale contrast is emphasized, the relative reduction of coarse scale contrast often results in a flattening effect.

Additionally, in local video tone mapping another similar trade-off exists between spatial and temporal contrast. Consider an HDR video clip that transitions from bright sunlight to a darker hallway, and therefore has strong temporal contrast (Figure 2). One strategy for tone mapping such a clip is to utilize the full dynamic range independently at every video frame, such that both frame (a) and frame (b) have sufficient brightness to reproduce most of the scene details. This way one can maintain the tone mapped video's spatial contrast, with the side effect of reducing the sensation that the hallway is much darker than the outside. Another (complementary) strategy for tone mapping the same video is maintaining a certain amount of the temporal contrast at the cost of less visible spatial details during the hallway part.

[Figure 2: HDR log-mean luminance over the frame index; tone mapped frame 38 (a) and tone mapped frame 265 with low (b) and high (c) temporal contrast.]

Figure 2: The visual trade-off between emphasizing spatial contrast (a, b) and temporal contrast (a, c). While in both settings frame 38 remains the same (a), frame 265 can be adjusted to either maintain spatial (b) or temporal contrast (c).

In our view, the aforementioned trade-offs are context dependent and ultimately artistic decisions, and no tone mapping strategy is inherently better than others in all possible scenarios. It is therefore desirable that the tone mapping algorithm offers the required artistic freedom by providing explicit control over spatial contrast at different scales, as well as over temporal contrast. Importantly, the tone mapping method should maintain a consistent level of image quality, since the artistic freedom makes little practical sense if a certain set of edits creates visually noticeable artifacts. In that sense, video tone mapping is especially challenging because the temporal dimension emerges as a new source of artifacts. In fact, high temporal frequency artifacts such as brightness flickering, camera noise, as well as any temporal inconsistencies of the TMO, are immediately noticeable because of the human visual system's properties. In particular, Eilertsen et al. [2013] noted that even very small amounts of ghosting and flickering in the tone mapped HDR videos are unacceptable in practice.
3.2 Tone Mapping Pipeline

Recent subjective studies revealed that local TMOs (which usually maintain local contrast) are temporally unstable [Eilertsen et al. 2013]. Similarly, naively applying image TMOs to individual video frames has been found to produce strong temporal artifacts [Eilertsen et al. 2013; Boitard et al. 2014a]. Global TMOs with relatively simple processing steps are found to be less susceptible to temporal artifacts; however, they lack the contrast reproduction capabilities of local operators. The temporal instability of local TMOs underlines the necessity of temporal filtering for local video tone mapping.

[Figure 3: INPUT (HDR frame, forward and backward optical flow) feeds spatiotemporal filtering (base layer) and temporal filtering (detail layer), followed by tone mapping to the OUTPUT tone mapped frame.]

Figure 3: Major processing steps of our method. Refer to the text for details.

Since filtering in the temporal domain is notably more challenging than in the spatial domain, one can assume that the scene is static and filter through a straight line in the temporal dimension [Bennett and McMillan 2005]. However, this approach generates strong ghosting artifacts, as in practice the static scene assumption rarely holds. In contrast, Lang et al. [2012] propose a general purpose approach that filters through each pixel's path over time. The downside of their method is that the path computation is performed globally over the whole video sequence, which therefore needs to be kept in memory. As such, this approach becomes infeasible for longer video sequences at high resolutions. Similarly, for such sequences Ye et al.'s [2014] method is prohibitively expensive, as it is reported to require over a minute to process an 800 × 600 frame (moreover, this method is not designed for tone mapping).

Our approach, similar to Durand and Dorsey [2002] and many others, uses the idea of decomposing each frame into a base and detail layer by utilizing an edge-aware filter, but with the key difference of filtering in the temporal dimension by utilizing optical flow. The main processing steps of our method are depicted in Figure 3. In our approach, the base layer is obtained by edge-aware spatiotemporal filtering of an HDR video cube consisting of a center frame and a number of its temporal neighbors. This way, we reduce the effect of illumination changes over time and enforce temporal coherence in the base layer. The computation of the detail layer also involves filtering the input HDR sequence, but only in the temporal dimension. This way we effectively reduce temporal artifacts with low amplitude and high temporal frequency (such as camera noise and brightness flicker) without sacrificing spatial resolution. While still retaining a contrast reproduction capability comparable to Durand and Dorsey's image tone mapping framework, we also suppress the halo artifacts commonly observed in such frameworks through our shift-variant filter that approximates a global solution.

[Figure 4: (a) HDR frame, (b) temporal neighborhood, (c) spatial permeability map computation, (d) edge-aware spatial filtering, (e) warping, (f) flow confidence, (g) spatiotemporally filtered HDR frame.]

Figure 4: Spatiotemporal filtering process of a single HDR frame within a temporal window. Brighter colors in (c) and (f) indicate higher permeability. Steps (c, d) comprise the spatial filtering, whereas (e, f) comprise the temporal filtering.

The spatiotemporal filtering is performed on a temporal neighborhood and uses forward and backward optical flow estimates to warp each frame's temporal neighborhood, such that their pixels are aligned on a straight line in the time dimension. The temporal filtering proceeds only if the temporal alignment is predicted to be reliable. This way temporal consistency is enhanced while minimizing ghosting. While our method is not bound to a specific optical flow algorithm, in our implementation we use Zimmer et al.'s approach [2011]. We mostly used the method's standard parameters and spent no special effort on fine-tuning the optical flow results, but rather focused on developing measures that detect optical flow errors (discussed in Section 4.3).

Our method's spatiotemporal filtering process is depicted in Figure 4. For a single input HDR frame $I_t$ in the log domain (a), we consider a number of consecutive past and future frames as the temporal neighborhood (b). The optimal size of the temporal neighborhood depends on the content and flow quality; however, we empirically found that 10 frames in each direction works well in our implementation. Next, for each frame in the temporal neighborhood we compute a permeability map (c), which controls the spatial diffusion between neighboring pixels. The permeability map is used as an input to the spatial filtering process (d) and prevents filtering across strong image edges. Specifically, the spatial part of our filtering involves iterative application of the shift-variant spatial filter $h_s$ to the frame $J_t^{(k)}$, where $k$ denotes the iteration, and $J_t^{(0)} = I_t$. The $(k+1)$-th iteration result is computed as:

$$J_t^{(k+1)} = J_t^{(k)} * h_s + \lambda \, (I_t - J_t^{(k)}), \quad (1)$$

where $\lambda$ is a spatially and temporally varying weighting factor that introduces a bias towards the original image (important for halo reduction), and $*$ denotes an ordinary shift-variant convolution with filter $h_s$. Our filtering approach is discussed in detail in Section 4.
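To make the iteration in Equation 1 concrete, below is a minimal Python sketch of the update rule. The box filter is only a stand-in for the shift-variant, permeability-weighted kernel $h_s$ described in Section 4, and `lam` is the (possibly per-pixel) fidelity weight; the function name and parameters are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter  # placeholder for the edge-aware kernel h_s

def iterative_filter(I, lam, n_iter=20, size=5):
    """Sketch of Eq. (1): J^(k+1) = J^(k) * h_s + lam * (I - J^(k)).

    I   : input log-luminance frame (2D array)
    lam : fidelity weight, scalar or array broadcastable to I (bias towards I)
    """
    J = I.copy()  # J^(0) = I
    for _ in range(n_iter):
        # The right-hand side uses the previous iterate J^(k) in both terms.
        J = uniform_filter(J, size=size) + lam * (I - J)
    return J
```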
Next, as the first step of the temporal part of the process in Figure 4, we warp (e) the frames in the temporal neighborhood such that all their pixels are aligned with the corresponding pixels of the input HDR frame $I_t$. The warping process uses forward and backward optical flow¹. For each pixel of a frame $J_{t'}$ with $t' > t$ we compute the cumulated forward vector

$$d_{t \to t'} := \sum_{\tau = t}^{t'-1} d_{\tau \to (\tau+1)} \quad (2)$$

based on the optical flow vectors $d_{\tau \to (\tau+1)}$ between neighboring frames. Next, we warp all pixels of $J_{t'}$ by shifting them in the opposite direction $-d_{t \to t'}$, which temporally aligns the pixels of $J_{t'}$ with those in $J_t$. This process is repeated in 2D for all pixels of all future frames that are in the temporal neighborhood of $J_t$. An analogous warping process is also repeated in the past direction using the backward optical flow.

An inevitable problem with the warping process is the inaccuracies due to the errors and limits of the optical flow. In order to prevent visual artifacts we compute a flow confidence map (f), which is analogous to the permeability map in the sense that it prevents the filter support from growing across certain pixels. Differently, the flow confidence map is used in temporal filtering, and it is computed by combining a photo-constancy measure and a flow derivative measure (discussed later in Section 4.3). Finally, the spatiotemporally filtered HDR frame (g) is obtained by applying the same filter that performs the spatial filtering in the temporal domain, simply using the flow confidence (f) as the permeability map.

We compute our base layer by applying all the steps depicted in Figure 4 to the log luminance of the input HDR video. Additionally, we also apply the temporal filtering steps (e, f) independently to all color channels of the input HDR video, which is then used for computing the detail layer (refer to Figure 3), similarly in the log domain. Formally, we express the temporal filtering combined with the aforementioned warping process as a simple shift-variant temporal convolution:

$$(I \circledast h_t)_t := \tilde{I}_t * h_t, \quad (3)$$

where $\tilde{I}_t(t', p) = I(t', p + d_{t \to t'})$, i.e. $\tilde{I}_t$ is an image sequence in which each frame $t'$ is warped to $t$ using the flow field $d_{t \to t'}$. The $\circledast$ operator denotes the shift-variant convolution with image warping. The spatiotemporal filtering $I \circledast h_{st}$ differs from the temporal filtering in Equation 3 in that the warped image $\tilde{J}$ is computed from the spatially filtered image $J$ instead of $I$:

$$(I \circledast h_{st})_t := \tilde{J}_t * h_t, \quad (4)$$

where $\tilde{J}_t(t', p) = J(t', p + d_{t \to t'})$. Having described and formally expressed the temporal and spatiotemporal filtering operations, we can compute the base ($B$) and detail ($D$) layers as follows:

$$B_t = (I \circledast h_{st})_t \quad \text{and} \quad D_t = (I \circledast h_t)_t - B_t. \quad (5)$$

Note that for computing $D_t$ in color images, $B_t$ is subtracted separately from each color channel of the temporally filtered source HDR video. Once we compute the base and detail layers, we can reduce the dynamic range of the input video without affecting the scene details at selected scales. To achieve this, we first apply a compressive tone curve to the base layer, and then recombine both layers. A simple way of compressing the base layer, proposed by Durand and Dorsey [2002], involves multiplying the log luminance of the base layer by a compression factor. On the other hand, using tone curves such as Drago et al. [2003] or Reinhard et al. [2002] gives greater control over the final look. In this work we used either a compression factor $c$, or Drago et al.'s tone curve (controlled by the bias parameter)², based on personal aesthetic judgement. That said, other tone curves can easily be integrated into our method.

¹ See supplemental material for an illustration of the warping process.
² Formula presented in supplemental material.
4 Edge-Aware Video Filtering

In the previous section, we presented our tone mapping pipeline and discussed the steps involved in the spatiotemporal filtering process, while treating the underlying filter formulation as a black box. In this section we start by introducing a class of iterative filtering methods and analyse them from a tone mapping perspective (Section 4.1). We show that this practical formulation can easily be adapted to perform the spatial (Section 4.2) and temporal (Section 4.3) filtering steps we described earlier in Section 3. We also discuss our filter's relation to the WLS filter (Section 4.4), which is often used as the reference against which new methods are compared in terms of image quality. The main advantage of our parallelizable filter with sub-linear runtime is that it generates visually similar results to WLS even with a small number of iterations³, and that it can be easily extended to the temporal dimension.

4.1 Filter Formulation

A number of anisotropic smoothing filters that are used for image tone mapping [Farbman et al. 2008; Gastal and Oliveira 2011; Durand and Dorsey 2002] can be expressed as a class of iterative filtering methods, which can be represented by the iteration shown in Equation 1. This iteration can be expressed in a slightly different form as follows:

$$J_p^{(k+1)} := \sum_{q \in \Omega} h_{pq} J_q^{(k)} + \lambda \, h_{pp} \left( I_p - J_p^{(k)} \right), \quad (6)$$

where $I_p$ is the input image intensity at pixel position $p$ at frame $t$ (we omit the frame indices in this and the following equations for brevity), $J_p^{(k)}$ is the corresponding diffusion result after $k$ iterations with $J_p^{(0)} = I_p$, the matrix $H := \{h_{pq}\}$ is a row-stochastic matrix, i.e. all matrix coefficients $h_{pq}$ are between 0 and 1, inclusive, with $\sum_q h_{pq} = 1 \; \forall p$, and the set $\Omega$ contains all pixel positions of a frame. The iteration consists of two parts: a sum that computes a weighted average at pixel position $p$, which we denote as the diffusion estimate, and a fidelity term [Mrazek et al. 2004; Nordstrom 1989] whose impact is controlled by a parameter $\lambda$ and which introduces a bias towards the input image.

The fidelity term significantly contributes to the reduction of halo artifacts in tone mapping applications. To provide some intuition for this property, let us introduce the intermediate term diffusion strength at a pixel $p$, and define it as $\alpha_p := 1/h_{pp}$. If the diffusion strength $\alpha_p$ is large ($h_{pp} \ll 1$), then $J_p^{(k+1)}$ is computed by strong contributions from pixels $J_q^{(k)}$ with $q \neq p$. However, if the diffusion strength $\alpha_p$ is small ($h_{pp} \approx 1$), then $J_p^{(k+1)} \approx (1 - \lambda) J_p^{(k)} + \lambda I_p$, which means that the result does not depend on pixel positions other than $p$ and it represents a weighted average. In particular, if the diffusion strength is small and $\lambda = 1$, then

$$J_p^{(k+1)} \approx I_p. \quad (7)$$

In other words, a parameter $\lambda = 1$ leads to a strong shift towards the original image in areas with low diffusion strength.

As discussed earlier, local tone mapping methods often suffer from halo artifacts that appear near strong image edges, especially if the input dynamic range is strongly compressed. The property shown in Equation 7 is therefore highly desirable near strong edges, because it significantly reduces halo artifacts by shifting the filtered result towards the original input. Based on these observations we conclude that our instantiation of the filter shown in Equation 6 should use $\lambda = 1$, and it should have a low diffusion strength in areas close to strong image edges.

³ Refer to the supplemental material for a visual comparison.
Figure 5 shows filtering results for different settings of $\lambda$ on a representative 1D example (we use our instantiation of matrix $H$ that will be introduced in the following section). Note that in the case $\lambda = 0$, strong edge-aware smoothing creates significant halo artifacts in the tone mapping result. These artifacts are substantially reduced in the case $\lambda = 1$, since the filtering result is biased towards the stimulus signal in the area close to the step edge.

[Figure 5: stimulus and base layers (a) for one iteration and for 50 iterations with λ = 1, 0.25, and 0; detail layers (b); tone mappings (c).]

Figure 5: Base layers computed from a stimulus signal (a). The stimulus signal is composed of a step function that is modulated with a sine wave of increasing frequency (a chirp signal) and Gaussian noise. In (b) and (c), corresponding detail layers and tone mapping results are shown, respectively. Tone mapping is performed by compressing the base layer from (a) and keeping the corresponding details from (b). After one iteration, only fine scale details (the Gaussian noise) are present in the detail layer (this result is identical for all $\lambda$), while after 50 iterations also medium scale details (from the sinusoidal wave) are added to the detail layer. Note that $\lambda = 1$ significantly reduces the halo artifact, i.e. the overshooting on both sides of the step edge in (c).

The bilateral filter [Durand and Dorsey 2002] and the domain transform [Gastal and Oliveira 2011] correspond to an instantiation of Equation 6 with $\lambda = 0$. However, the WLS filter [Farbman et al. 2008] uses $\lambda = 1$, as will be shown in Section 4.4.

Note that the result of each iteration step in Equation 6 can be efficiently computed in parallel, since every $J_p^{(k+1)}$ can be computed independently of the other $J_q^{(k+1)}$. Such a parallelized implementation leads to an overall runtime of $O(k|\Omega|)$, where $|\Omega|$ is the total number of pixels and $k$ is the number of iterations. Even faster runtimes are possible depending on the structure of $H$, which we discuss in the following section. Furthermore, Perron-Frobenius and Markov chain theory [Meyer 2001] show that the iteration converges to a non-trivial solution for $0 < \lambda \leq 1$ if all $h_{pp} \neq 0$. This allows us to approximate the convergence result by stopping the iteration after a reasonable number of iteration steps, which depends on the particular instantiation of matrix $H$ in Equation 6.

4.2 Spatial filtering

In this section, we discuss how to perform the spatial filtering step described in Section 3 using the filter formulation from Equation 6. More specifically, while considering our conclusions from the previous section, we construct an iteration matrix $H$ that leads to the smoothing results required for tone mapping after a low number of iterations. Thus, $H$ has to perform a strong edge-aware diffusion per iteration, while having a low diffusion strength close to significant image edges, and $\lambda$ should be 1. We observe that a strong edge-aware diffusion can be achieved even with a single iteration, if all pixels within a connected region of similar color contribute to the diffusion result at all pixels of that region [Cigla and Alatan 2013; Lang et al. 2012; Gastal and Oliveira 2011].

We derive $H$ using permeability weights (mentioned earlier in Figure 4-c). We define the permeability between a pixel $p = (p_x, p_y)$ and its right neighbor $p' = (p_x + 1, p_y)$ as a variant of the Lorentzian edge-stopping function [Durand and Dorsey 2002]

$$\tilde{\pi}_p := \left( 1 + \left( \frac{|I_p - I_{p'}|}{\sigma} \right)^{\gamma} \right)^{-1}. \quad (8)$$

The permeability between $p$ and its right neighbor pixel is close to 0 if the absolute value of the corresponding color difference is high, and close to 1 if the difference is low. The parameter $\sigma$ indicates the point of transition from large to low permeability, while $\gamma$ controls the slope of the transition around $\sigma$. In this work we used $\gamma = 2$ and a $\sigma$ in the range 0.1–1. Similar to [Cigla and Alatan 2013], we use these permeability weights to define the permeability between two arbitrary pixels $p$ and $q$ as

$$\pi_{pq} := \begin{cases} 1 & : p = q \\ \prod_{n=p_x}^{q_x - 1} \tilde{\pi}_{(n, p_y)} & : p_x < q_x, \; p_y = q_y \\ \prod_{n=q_x}^{p_x - 1} \tilde{\pi}_{(n, p_y)} & : p_x > q_x, \; p_y = q_y \\ 0 & : \text{otherwise} \end{cases} \quad (9)$$

Thus, the permeability between two pixels $p$ and $q$ of the same row is large if the permeability between each pair of neighboring pixels on the path between $p$ and $q$ is large. In particular, the permeability between $p$ and $q$ is significantly reduced if there is a pair of neighboring pixels along the path with low permeability (e.g. a strong vertical edge between pixels $p$ and $q$). The permeability between pixels of different rows is defined to be zero. To obtain a stochastic matrix $H := \{h_{pq}\}$, we normalize these weights according to

$$h_{pq} := \frac{\pi_{pq}}{\sum_{n=1}^{w} \pi_{p,(n, p_y)}}, \quad (10)$$

where $w$ is the image width. Note that the diffusion strength $1/h_{pp}$ tends to become smaller in the neighborhood of strong vertical edges, since the permeability to all pixels on the other side of the edge is close to zero. The iteration matrix $H$ is derived from weights describing inter-pixel permeability in the horizontal direction (Equation 8). Analogously, we also derive another matrix for permeability in the vertical direction. We denote the corresponding matrices $H_h$ and $H_v$.

Spatial filtering is conducted according to Equation 6 by alternating between $H_h$ and $H_v$ for each $k$ and using $\lambda = 1$. After each iteration, the direction of diffusion is changed, which allows a fast diffusion of intensities in connected regions of similar color [Lang et al. 2012; Gastal and Oliveira 2011]. Figure 6 shows a spatial filtering result and a visualization of the corresponding permeability map.

Due to the structure of the iteration matrices, diffusion is conducted either only in the horizontal or only in the vertical direction. Thus the overall runtime (on a parallel architecture) is $O(kw + kh)$, where $w$ is the width and $h$ is the height of image $I$, and $k$ is the number of iterations. Thus, in cases where the total number of iterations is significantly smaller than $(wh)/(w + h)$, the runtime becomes sub-linear with respect to the total number of image pixels.
In practice, we observed that after a small number of iterations the filtering results became visually equivalent⁴. Our unoptimized Matlab implementation (using the Parallel Computing Toolbox) requires 19 ms for computing the permeability map, and 12 ms for a filtering iteration (comprising both vertical and horizontal steps), for a megapixel color image on a current consumer PC.

[Figure 6: HDR frame, permeability map, and spatially filtered frame.]

Figure 6: Edge-aware spatial filtering using the permeability map. Brighter colors in the permeability map indicate higher permeability.

4.3 Temporal filtering

An important advantage of our filter formulation from Equation 6 is that it can easily be adapted to also perform the temporal filtering step in our tone mapping pipeline discussed in Section 3. Differently, in temporal filtering diffusion is conducted along the temporal dimension over the warped temporal neighborhood (Figure 4-e), with only one iteration. The permeability weights are computed in the same way as in the spatial filtering, using the Lorentzian edge-stopping function variant (Equation 8). While their computation is the same as in the spatial domain, the interpretation of the permeability weights is slightly different in the temporal domain. In the temporal domain the permeability is additionally affected by the warping errors within the temporal neighborhood. The direct extension of the permeability weights to the temporal domain, where the weights are obtained by applying Equation 8 to the color differences between consecutive frames, can therefore be interpreted as a photo-constancy measure (Figure 7). The photo-constancy measure limits the permeability at strong temporal edges, as well as in regions that are erroneously warped due to incorrect optical flow.

[Figure 7: a warped temporal neighborhood (frames t-10 to t+10) with the photo-constancy measure.]

Figure 7: The photo-constancy measure limits permeability near warping errors, but still allows filtering over temporal paths with small color differences.

While we observed that the photo-constancy measure can be tuned to stop temporal filtering at most warping errors, we found that such a conservative configuration also stops at the temporal artifacts, which defeats the purpose of the temporal filtering. Our solution is to introduce another measure that penalizes flow vectors with high gradients (Figure 8), as a high flow gradient is a good indication of complex motion, and the flow estimation tends to be erroneous in such regions [Mac Aodha et al. 2013].

[Figure 8: a warped temporal neighborhood (frames t-10 to t+10) with the flow gradient measure.]

Figure 8: The flow gradient measure penalizes flow vectors with a high temporal gradient, providing an additional means to prevent warping artifacts.

Both the photo-constancy and the flow gradient measures are normalized by Equation 8, and the final permeability is obtained by multiplying the two. This way, the photo-constancy constraint can be relaxed, which allows regions with temporal artifacts to be permeable. The parameter values of both constraints depend on the tolerance to warping errors, the desired level of temporal smoothness, and the quality of the flow. In our implementation, we empirically found that setting $\sigma$ to 0.1 for the photo-constancy and to 2 for the flow gradient measure, while keeping $\gamma = 2$, results in sufficient temporal filtering while still suppressing warping artifacts.
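The combined temporal permeability can be sketched as follows. The photo-constancy term applies Equation 8 to color differences between consecutive warped frames, and the flow gradient term applies it to temporal flow derivatives; the σ and γ values are taken from the text, while the function names, array shapes and the norm used for the flow difference are our assumptions.

```python
import numpy as np

def lorentzian_permeability(d, sigma, gamma=2.0):
    """Eq. (8) applied to an arbitrary difference magnitude d."""
    return 1.0 / (1.0 + (np.abs(d) / sigma) ** gamma)

def temporal_permeability(warped, flows):
    """Pairwise (consecutive-frame) flow-confidence weights for the neighborhood.

    warped : (T, H, W) frames warped to the center frame
    flows  : (T, H, W, 2) cumulated flow fields used for the warping
    """
    photo = lorentzian_permeability(np.diff(warped, axis=0), sigma=0.1)    # photo-constancy
    grad = lorentzian_permeability(
        np.linalg.norm(np.diff(flows, axis=0), axis=-1), sigma=2.0)        # flow gradient
    return photo * grad  # final permeability: product of the two measures
```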
4.4 Relation to the WLS filter

Finally, we investigate the theoretical relation between the WLS filter, which is known to create anisotropic smoothing results of high quality [Fattal 2009; Gastal and Oliveira 2011], and our permeability weights-based smoothing method. WLS smoothing is uniquely defined as the solution of the following linear system

$$I_p = J_p + \lambda \sum_{q \in N_4(p)} a_{pq} (J_p - J_q), \quad (11)$$

where $I_p$ and $J_p$ are pixel values of the input image and the smoothed output image, respectively, $N_4(p)$ is the 4-neighborhood around pixel $p$, and $a_{pq}$ are the smoothness weights as defined in [Lischinski et al. 2006]

$$a_{pq} := \frac{1}{|I_p - I_q|^{\alpha} + \varepsilon}. \quad (12)$$

Apart from differences in notation, this is the same equation system as shown in the original article [Farbman et al. 2008]. The authors solve the equation system using a multi-resolution preconditioned conjugate gradient solver, but they also indicate that the same solution can be obtained iteratively with a Jacobi iteration. The corresponding Jacobi iteration is [Meyer 2001]

$$J_p^{(k+1)} = \frac{I_p + \lambda \sum_{q \in N_4(p)} a_{pq} J_q^{(k)}}{1 + \lambda \sum_{q \in N_4(p)} a_{pq}}. \quad (13)$$

This iteration is an instance of Equation 6 where $\lambda$ is 1 and the iteration matrix $H$ has the coefficients

$$h_{pq} := \begin{cases} \lambda a_{pq} / \left( 1 + \lambda \sum_{r \in N_4(p)} a_{pr} \right) & : q \in N_4(p) \\ 1 / \left( 1 + \lambda \sum_{r \in N_4(p)} a_{pr} \right) & : q = p \\ 0 & : \text{otherwise} \end{cases} \quad (14)$$

Hence, WLS uses a bias term with $\lambda = 1$, which reduces halo artifacts, and it smoothens the input image by iteratively diffusing intensities in a small 4-neighborhood. Our method also uses a bias term with $\lambda = 1$, but in contrast to WLS it uses a significantly larger diffusion range per iteration, which is controlled by the permeability weights. In our method, by alternating the direction of diffusion between horizontal and vertical directions with each iteration, pixels are diffused in potentially large image regions after only a few iterations. However, although both smoothing methods converge against mathematically different solutions, the filtering results are visually similar in practice⁴.

⁴ A visual comparison is presented in supplemental material.
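To make the correspondence concrete, here is a sketch of one Jacobi step of Equations 13–14 in Python. The wrap-around boundary handling of np.roll and the α and ε values are simplifying assumptions, not the authors' settings.

```python
import numpy as np

def wls_jacobi_step(J, I, lam=1.0, alpha=1.2, eps=1e-4):
    """One Jacobi step of Eq. (13), i.e. Eq. (6) with fidelity weight 1
    and the 4-neighborhood iteration matrix H of Eq. (14)."""
    shifts = [(1, 0), (-1, 0), (1, 1), (-1, 1)]   # (amount, axis) pairs for N4(p)
    num = I.astype(float)
    denom = np.ones_like(num)
    for amount, axis in shifts:
        a = 1.0 / (np.abs(I - np.roll(I, amount, axis)) ** alpha + eps)  # Eq. (12)
        num = num + lam * a * np.roll(J, amount, axis)
        denom = denom + lam * a
    return num / denom
```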
[Figure 9: two representative frames and HDR/tone mapped mean brightness plots for (a) Retina Model, (b) Color Appearance, (c) Virtual Exposures, (d) Local Adaptation, (e) Our Method, (f) Bilateral TMO, (g) Temporal Coherence, (h) Display Adaptive, and (i) Camera TMO.]

Figure 9: Comparison of our method with current TMOs capable of processing videos. For each method, we show two representative frames and plot the mean pixel values (tone mapped) and log-mean luminance (HDR) over time. Note that the HDR plots are shifted and compressed by an exponent for presentation.

5 Results

Our video tone mapping method was developed by necessity for a film production. Similar to Eilertsen et al. [2013], our experience with the state-of-the-art in video tone mapping yielded unsatisfactory results. Our newly developed method has been successfully used in a professional post-production workflow at FullHD and 2880 × 1620 resolutions. Excluding I/O, our unoptimized Matlab implementation of the full tone mapping pipeline requires on average 3.445 seconds to process a color frame at FullHD resolution, and 6.465 seconds at 2880 × 1620 resolution, on a current desktop computer. The computation time of our method scales approximately linearly with the resolution of the input HDR video.

The results in this section have been generated by using the default parameter values presented in Sections 3 and 4, with the exceptions of $\sigma_s$ (which controls the spatial permeability) and the tone curve parameters (either the compression factor $c$ or the bias parameter of the adaptive logarithmic tone curve, depending on which method was used). Since the spatial permeability and tone curve parameters mainly govern the aesthetics of the tone mapped content, their values can be set according to personal taste. For generating the results in this section and the supplemental video we used $\sigma_s = [0.1, 1]$ and bias = [0.2, 0.5] whenever we used the adaptive logarithmic tone curve. When we desired a stronger dynamic range compression, we utilized the log curve with the compression factor $c = [0.2, 0.4]$. We also conservatively set the number of spatial filtering iterations to 20. (Performance benchmarks have been obtained with these settings and a temporal neighborhood size of 21.)

In the remainder of this section, we first qualitatively compare our method's results on the publicly available HDR video data provided by Eilertsen et al. [2013] (Section 5.1). On the same public data set we also performed a controlled subjective study, which validates our initial claim that our method maintains a high level of local contrast with fewer temporal artifacts compared to the state-of-the-art (Section 5.2). After briefly discussing user interaction (Section 5.3), we present our main results obtained from processing HDR footage shot with an Arri Alexa XT camera at FullHD or 2880 × 1620 resolutions⁵ (Section 5.4). Finally, we present examples of how our method allows control over spatial and temporal contrast (Section 5.5), and demonstrate the advantage of using our method on low light HDR sequences (Section 5.6).

5.1 Comparison with Other Methods

In Figure 9 we show a qualitative comparison between a number of current tone mapping operators on the publicly available Hallway sequence [Eilertsen et al. 2013; Kronander et al. 2013a; Kronander et al. 2013b]. In addition to the provided tone mapping results, we generated another sequence by running the Bilateral TMO using standard parameters. Presenting a concise and meaningful visual comparison between multiple video TMOs is challenging for various reasons. In Figure 9, we show for each video two representative frames to demonstrate its ability to show local contrast, as well as a plot of the mean brightness of both the HDR and the tone mapped sequences over time, which is helpful for investigating temporal coherence.

⁵ Refer to the supplemental video for more results.
The brightness flickering problems of the local methods Retina Model [Benoit et al. 2009] (a), Color Appearance [Reinhard et al. 2012] (b) and Virtual Exposures [Bennett and McMillan 2005] (c) can be observed from their mean brightness plots. The Local Adaptation [Ledda et al. 2004] (d) operator performs better in terms of temporal coherence due to its temporal filtering. However, since its temporal filtering does not utilize motion paths, its results show strong ghosting artifacts (see the wall in the bottom image in Figure 9-d). Consistent with the observation that local image tone mapping operators generate strong temporal artifacts [Eilertsen et al. 2013], the mean brightness plot of the Bilateral TMO [Durand and Dorsey 2002] shows notable fluctuations over time. The mean brightness plots of the remaining operators, Temporal Coherence [Boitard et al. 2012], Display Adaptive [Mantiuk et al. 2008] and Camera TMO [Eilertsen et al. 2013], suggest fewer temporal artifacts. The downside of these operators is their limited local contrast reproduction capability due to their spatially-invariant processing. On the other hand, we configured our TMO to emphasize local spatial contrast while also maintaining the temporal contrast between the beginning and the end of the sequence. Our tone mapping result remains temporally coherent, similar to the global operators, even though we strongly amplify the local contrast of the sequence (e).

We also compared our technique with Boitard et al.'s [2014b] recent segmentation based temporal coherency method applied to Ramsey et al. [2004] and the recursive domain transform filter [Gastal and Oliveira 2011] (Figure 10)⁶. The results show that Boitard et al.'s [2014b] method tends to underutilize the available display dynamic range and generates dark frames with relatively low contrast, similar to their earlier work [Boitard et al. 2012] (see Figure 9-g). The mean pixel value plots show fluctuations (notably between frames 100 and 130) even for the Ramsey et al. [2004] version, which is a global TMO. Note that our method is relatively stable in the same interval despite being local and therefore more susceptible to temporal artifacts. Finally, our method reproduces the temporal contrast (Section 3.1) of the source HDR video better. Our result preserves the transition from the bright exterior to the darker hallway by modulating the mean luminance accordingly, whereas this transition is diminished or reversed in Boitard et al.'s [2014b] results.

[Figure 10: mean pixel values over the frame index for Ramsey04 + Boitard14, Ours, and Gastal11 + Boitard14.]

Figure 10: Comparison of our method with Ramsey et al. [2004] and Gastal et al. [2011] (recursive filtering version) combined with Boitard et al.'s [2014] segmentation based temporal coherency method (both provided by Ronan Boitard). The dashed lines denote the min and max of each frame's averaged pixel values over the test sequence. Our method uses the display dynamic range more efficiently and causes fewer temporal artifacts. Note also that Gastal et al. [2011] with Boitard et al. [2014] reverses the temporal contrast of the sequence.

5.2 Subjective Study

While the qualitative comparison in Section 5.1 is useful as an overview and helps to put the many TMOs into perspective, it certainly does not capture all the aspects relevant to the evaluation of video tone mapping. For example, while not apparent from Figure 9, most operators tend to show camera noise in their results. To that end, we conducted a pair of controlled subjective rating studies to compare our method with others in terms of local contrast reproduction and the absence of temporal artifacts. The studies respectively had 20 and 19 paid participants, mainly consisting of college students. The stimuli consisted of 11 tone mapped sequences for each of the 4 HDR video sequences in our test set. The HDR video sequences and, whenever available, the tone mapped results were obtained from the public data set by Eilertsen et al. [2013] (the Hallway, Hallway2, ExhibitionArea and Students sequences were used). The tone mapped videos for the Bilateral TMO [Durand and Dorsey 2002], the Gradient Domain TMO [Fattal et al. 2002] and the WLS TMO [Farbman et al. 2008] were generated from the source HDR videos using default parameters. Also, the results of our method have been generated using identical parameters for each video.

The study was performed through an interface similar to Petit et al. [2013], and the subjects were asked to rate each video (presented on a NEC MultiSync 2490WUxi2 display) on a scale from 0 to 10, in a first experiment for its local contrast reproduction, and in a second and separate experiment for the absence of temporal artifacts. Note that these questions directly assess our primary claim from Section 1 that our method is capable of maintaining a high level of local contrast with fewer artifacts compared to the state-of-the-art. Thus, we collected 88 data points for each subject (11 tone mapped versions × 4 sequences × 2 experiments). In a training session before each experiment, the participants were introduced to the concepts of local contrast and temporal artifacts. While there were no time limitations in our study, the average subject finished in approximately 30 minutes.

[Figure 11: bar chart of average ratings (0-10) for local contrast reproduction and absence of temporal artifacts across the compared methods.]

Figure 11: Average ratings and 95% confidence intervals from our user study.

The average ratings and the bars showing 95% confidence intervals are shown in Figure 11. We performed a one-factor Analysis of Variance (ANOVA) with repeated measures to analyze both experiments.

⁶ Boitard et al.'s [2014b] results for the Hallway sequence were made public after the conditional acceptance of this publication and have therefore been evaluated separately in Figure 10.
the local contrast (p < .001) and temporal artifacts (p < .001) 5.5 Spatial and Temporal Contrast Adjustment
using Greenhouse-Geisser adjusted degrees of freedom. Pairwise
comparison of video processing methods was conducted using Bon- Local tone mapping allows producing various visual styles by ad-
ferronis post-host test. For the contrast reproduction experiment, justing the spatial scale of the detail layer. Figure 14 shows results
our method was assessed to have a signicantly higher (p < .05) of two s values where one emphasizes ne scale spatial details
contrast than all methods except WLS. For the absence of tem- (right), and the other produces a more balanced result (left).
poral artifacts experiment, our method similarly was assessed to
have signicantly fewer (p < .05) artifacts than all methods except
the Temporal Coherence TMO. The similarity in local reproduction
with WLS is expected because of our ltering approachs connec-
tion to WLS (Section 4.4). Since WLS is an image TMO however,
it scores signicantly lower with respect to the absence of temporal
artifacts. On the other hand, we note that the Temporal Coherence
TMO, that scores highest in the second experiment tends to pro-
duce dark frames (See Figure 9-g), which might have effected the
visibility of the temporal artifacts. For the same reason, it gets a Figure 14: Similar to local image TMOs, our method allows the
low average score in the local contrast reproduction experiment. user to adjust the emphasis given to ne scale details. Left image
is generated with s = 1, whereas the right image with s = 0.1
To conclude, the ndings of the subjective study as well as the qual- (extreme parameter settings have been used for illustration). In the
itative comparison in Section 5.1 suggest that our method is capable right image, note how coarse scale details (e.g. shadows) become
of producing results with high local contrast with a similar level of less pronunced at the cost of emphasized ne scale details.
temporal coherence as the global operators.

Our framework also allows explicit control over the temporal con-
5.3 User Interaction
trast (discussed in Section 3.1) of the tone mapped video. One
possibility when tone mapping HDR sequences with high temporal
HDR tone mapping is an artistic process that often involves ex- contrast to apply strongly compressive tone curve, which reduces
perimentation through trial and error. As such, any tone map- the dynamic range of the whole sequence such that it ts into the
ping software is expected to give the user visual feedback at in- display dynamic range without generating any over or under ex-
teractive rates while tone mapping parameters are being changed. posed regions. However, such a tone mapping that signicantly re-
The dilemma is that inter- duces temporal contrast may sometimes be undesirable. The other
active video editing on cur- option of applying a less compressive tone curve, on the other hand,
rent standard hardware at high may generate strongly over or under exposed frames (Figure 15, top
resolutions is challenging to row). This can easily be remedied in our framework using a bright-
achieve beyond simple opera- ness compensation factor that controls the brightness of each frame.
tions. Our solution for inter- Since our methods base layer is temporally stable, such a bright-
active editing consists of the ness compensation factor can be modulated by the inverse of the
ofine pre-computation of the logmean luminance of the spatiotemporal base layer without caus-
base and detail layers required ing brightness ickering (Figure 15, bottom row).
for tone mapping, followed by Figure 12: A screenshot from
and interactive editing session our user interface during edit-
Without Brightness
Compensation

where the user adjusts tone ing process.


mapping parameters through
a simple user interface (Fig-
ure 12). While most of the artistic decisions can be deferred until
after the pre-computation step, the user needs to select the s pa-
With Brightness
Compensation

rameter that controls the base layer smoothness a priori. That said,
we found that our two-step tone mapping workow to work well
in practice, as s and temporal ltering parameters can efciently
set by trial and error on a small number of representative frames. 150
Mean Pixel Value

Overall, despite its simplicity, we observed that user interface no-


100
tably facilitated the creative side of the tone mapping process in a
post-production environment. 50
With Brightness Compensation
Without Brightness Compensation
0
0 100 200 300
5.4 Local HDR Video Tone Mapping

The main application of our method is local video tone mapping. Figure 13, our teaser, and the supplemental video show our method's results produced from HDR sequences obtained with an Arri Alexa XT camera at FullHD or 2880×1620 resolution (due to their high contrast, our results are best viewed on a computer display; printed versions may not be representative of the intended results). The input video sequences contain complex motion and strong changes in scene illumination. Despite that, our method maintains a temporally stable mean brightness over time, as well as high local contrast.

5.5 Controlling Temporal Contrast

Our framework also allows explicit control over the temporal contrast (discussed in Section 3.1) of the tone mapped video. One possibility when tone mapping HDR sequences with high temporal contrast is to apply a strongly compressive tone curve, which reduces the dynamic range of the whole sequence such that it fits into the display dynamic range without generating any over- or under-exposed regions. However, such a tone mapping significantly reduces temporal contrast, which may sometimes be undesirable. Applying a less compressive tone curve, on the other hand, may generate strongly over- or under-exposed frames (Figure 15, top row). This can easily be remedied in our framework using a brightness compensation factor that controls the brightness of each frame. Since our method's base layer is temporally stable, such a brightness compensation factor can be modulated by the inverse of the logmean luminance of the spatiotemporal base layer without causing brightness flickering (Figure 15, bottom row).

Figure 15: Brightness compensation, applied as a function of the logmean value of our method's temporally coherent base layer, can be used to limit the temporal contrast in tone mapping configurations where a strongly compressive tone curve is not desired. The plot compares the mean pixel value per frame with and without brightness compensation.
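As a rough illustration of this idea (not the exact formula of our implementation), the per-frame compensation could be sketched as follows, where `target_logmean` is a hypothetical anchor brightness chosen by the user:

```python
import numpy as np

def brightness_compensation(tonemapped, base_log, target_logmean=0.0):
    """Scale one tone mapped frame by the inverse logmean luminance of
    its (temporally stable) spatiotemporal base layer."""
    logmean = np.mean(base_log)                # logmean of the base layer
    factor = np.exp(target_logmean - logmean)  # inverse modulation
    # Because the base layer is temporally filtered, `factor` varies
    # smoothly from frame to frame and introduces no brightness flicker.
    return tonemapped * factor
```

Since the base layer logmean varies smoothly along the sequence, so does the compensation factor, which is why the adjustment does not reintroduce flickering.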
Figure 13: Representative local tone mapped frames from two sequences (a, d). Despite the high local contrast and strong dynamic range compression, our method maintains a temporally stable logmean luminance over time (b, e). The input HDR videos, which span more than 4 log10 units, are visualized in false color (c, f).

5.6 Tone Mapping of Low-light Sequences

Tone mapping video sequences shot in low-light scenes is often difficult due to camera noise. TMOs cannot distinguish camera noise from scene details, which means that as scene details become more visible during the tone mapping process, so does the noise. Due to the denoising effect of the temporal filtering involved in our tone mapping pipeline, our method is especially well suited to such low-light sequences. Figure 16 shows an example where the TMO significantly amplifies camera noise (a), which can be somewhat suppressed by denoising the HDR frame before applying tone mapping (b). The advantage of our method is that our results are comparable to the latter without the need for an extra denoising step (c).
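The denoising effect of temporal filtering can be illustrated with a strongly simplified sketch that averages each pixel with its motion-compensated temporal neighbors. Unlike our actual filter, it omits the edge-aware weights and flow confidence measures; `flows_to_t[k]` is an assumed dense flow field mapping pixels of frame `t` to frame `k`, and nearest-neighbor warping is used for brevity.

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp `frame` with a dense flow field (nearest-neighbor
    sampling; flow[..., 0] is the x displacement, flow[..., 1] the y)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return frame[sy, sx]

def temporal_denoise(frames, flows_to_t, t, radius=2):
    """Average frame t with its motion-compensated temporal neighbors.
    Zero-mean sensor noise averages out, while scene details that are
    consistent along the per-pixel motion paths are preserved."""
    acc = np.zeros_like(frames[t], dtype=np.float64)
    n = 0
    for k in range(max(0, t - radius), min(len(frames), t + radius + 1)):
        acc += frames[t] if k == t else warp(frames[k], flows_to_t[k])
        n += 1
    return acc / n
```

Averaging N motion-aligned samples reduces the standard deviation of zero-mean sensor noise by roughly a factor of sqrt(N), while scene structure that persists along the motion path is retained.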
Figure 16: A frame-by-frame tone mapping using the bilateral TMO results in significantly visible camera noise (a), which can be reduced by denoising each frame before tone mapping using external video editing software; as an example we use The Foundry NUKE (b). On the other hand, our method achieves comparably low noise levels without an extra denoising step (c).

6 Discussion

While our method does not offer the final solution to the open problem of local HDR video tone mapping, we believe that it takes a big step forward by significantly advancing the state-of-the-art, and that it paves the way for the (much needed) future work in this field. As the entertainment industry is moving towards HDR video pipelines, we think that the relevant research can be stimulated by further advances in filtering techniques geared towards tone mapping applications, and by the emergence of high quality public HDR video data.

Our method is not without limitations. As an example, for artistic reasons it is desirable to have multiple detail layers at different contrast scales. Due to the many technical challenges involved in obtaining even a single detail layer, we deferred a multi-scale approach to future work. Also, as with every optical flow based method, our technique is susceptible to visual artifacts caused by the inevitable errors in the flow computation. Even though the outcome of our subjective study (Section 5.2) suggests that these artifacts can be suppressed to a reasonable level, we still feel that our method can be improved by using additional and more sophisticated flow confidence measures (e.g. by utilizing machine learning [Mac Aodha et al. 2013]). Finally, our method currently provides only a basic means of controlling chrominance in the form of a saturation slider, which we hope to address in future work.

7 Conclusion

We presented a local HDR video tone mapping method that can significantly reduce the input dynamic range while preserving local contrast. Our key difference is the use of temporal filtering along per-pixel motion paths, which allowed us to achieve temporal stability without ghosting. We formulated an edge-aware filter that is applied by iterative shift-variant convolutions and shares the same halo suppression properties as the WLS filter, which allowed us to realize our tone mapping framework efficiently. We presented qualitative and subjective comparisons to the state-of-the-art, which turned out favorably for our method. We showed results produced with various tone mapping configurations from challenging HDR sequences, and presented a simple yet effective user interface. Finally, we noted the advantage of our method for low-light HDR sequences.

Acknowledgements

The authors would like to thank the anonymous reviewers, the participants of our user study, the Lucid Dreams of Gabriel crew, Michael Schaffner, Steven Poulakos, Maurizio Nitti and Ronan Boitard.

References

ADAMS, A. 1981. The Print, The Ansel Adams Photography Series 3. New York Graphic Society.

AUBRY, M., PARIS, S., HASINOFF, S. W., KAUTZ, J., AND DURAND, F. 2014. Fast local laplacian filters: Theory and applications. ACM Trans. Graph. (To Appear) 33, 4.

BENNETT, E. P., AND MCMILLAN, L. 2005. Video enhancement using per-pixel virtual exposures. ACM Trans. Graph. 24, 3.

BENOIT, A., ALLEYSSON, D., HERAULT, J., AND CALLET, P. L. 2009. Spatiotemporal tone mapping operator based on a retina model. In CCIW, Springer, vol. 5646 of Lecture Notes in Computer Science, 12–22.

BOITARD, R., BOUATOUCH, K., COZOT, R., THOREAU, D., AND GRUSON, A. 2012. Temporal coherency for video tone mapping. Proc. SPIE 8499.

BOITARD, R., COZOT, R., THOREAU, D., AND BOUATOUCH, K. 2014. Survey of temporal brightness artifacts in video tone mapping. In HDRi2014.

BOITARD, R., COZOT, R., THOREAU, D., AND BOUATOUCH, K. 2014. Zonal brightness coherency for video tone mapping. Signal Processing: Image Communication 29, 2, 229–246.

CIGLA, C., AND ALATAN, A. A. 2013. Information permeability for stereo matching. Signal Processing: Image Communication 28, 9, 1072–1088.

DRAGO, F., MYSZKOWSKI, K., ANNEN, T., AND CHIBA, N. 2003. Adaptive logarithmic mapping for displaying high contrast scenes. Computer Graphics Forum 22, 3, 419–426.

DURAND, F., AND DORSEY, J. 2002. Fast bilateral filtering for the display of high-dynamic-range images. ACM Trans. Graph. 21, 3, 257–266.

EILERTSEN, G., WANAT, R., MANTIUK, R. K., AND UNGER, J. 2013. Evaluation of tone mapping operators for HDR-video. Computer Graphics Forum 32, 7, 275–284.

FARBMAN, Z., FATTAL, R., LISCHINSKI, D., AND SZELISKI, R. 2008. Edge-preserving decompositions for multi-scale tone and detail manipulation. ACM Trans. Graph. 27, 3, 67:1–67:10.

FATTAL, R., LISCHINSKI, D., AND WERMAN, M. 2002. Gradient domain high dynamic range compression. ACM Trans. Graph. 21, 3, 249–256.

FATTAL, R. 2009. Edge-avoiding wavelets and their applications. ACM Trans. Graph. 28, 3, 22:1–22:10.

FERWERDA, J. A., PATTANAIK, S. N., SHIRLEY, P., AND GREENBERG, D. P. 1996. A model of visual adaptation for realistic image synthesis. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '96, 249–258.

GASTAL, E. S. L., AND OLIVEIRA, M. M. 2011. Domain transform for edge-aware image and video processing. ACM Trans. Graph. 30, 4, 69:1–69:12.

GUTHIER, B., KOPF, S., EBLE, M., AND EFFELSBERG, W. 2011. Flicker reduction in tone mapped high dynamic range video. Proc. SPIE 7866.

HE, K., SUN, J., AND TANG, X. 2013. Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 6, 1397–1409.

IRAWAN, P., FERWERDA, J. A., AND MARSCHNER, S. R. 2005. Perceptually based tone mapping of high dynamic range image streams. In Proceedings of Eurographics Conf. on Rendering Techniques, EGSR '05, 231–242.

KANG, S. B., UYTTENDAELE, M., WINDER, S., AND SZELISKI, R. 2003. High dynamic range video. ACM Trans. Graph. 22, 3, 319–325.

KISER, C., REINHARD, E., TOCCI, M., AND TOCCI, N. 2012. Real time automated tone mapping system for HDR video. In Proceedings of the IEEE International Conference on Image Processing, 2749–2752.

KRONANDER, J., GUSTAVSON, S., BONNET, G., AND UNGER, J. 2013. Unified HDR reconstruction from raw CFA data. In Computational Photography (ICCP), 2013 IEEE International Conference on, 1–9.

KRONANDER, J., GUSTAVSON, S., BONNET, G., YNNERMAN, A., AND UNGER, J. 2013. A unified framework for multi-sensor HDR video reconstruction. In Signal Processing: Image Communication.

LANG, M., WANG, O., AYDIN, T., SMOLIC, A., AND GROSS, M. 2012. Practical temporal consistency for image-based graphics applications. ACM Trans. Graph. 31, 4 (July), 34:1–34:8.

LEDDA, P., SANTOS, L. P., AND CHALMERS, A. 2004. A local model of eye adaptation for high dynamic range images. AFRIGRAPH, 151–160.

LEE, C., AND KIM, C.-S. 2007. Gradient domain tone mapping of high dynamic range videos. In Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, III-461–III-464.

LISCHINSKI, D., FARBMAN, Z., UYTTENDAELE, M., AND SZELISKI, R. 2006. Interactive local adjustment of tonal values. ACM Trans. Graph., 646–653.

MAC AODHA, O., HUMAYUN, A., POLLEFEYS, M., AND BROSTOW, G. J. 2013. Learning a confidence measure for optical flow. IEEE Trans. Pattern Anal. Mach. Intell. 35, 5, 1107–1120.

MANTIUK, R., MYSZKOWSKI, K., AND SEIDEL, H.-P. 2006. A perceptual framework for contrast processing of high dynamic range images. ACM Trans. Appl. Percept. 3, 3, 286–308.

MANTIUK, R., DALY, S., AND KEROFSKY, L. 2008. Display adaptive tone mapping. ACM Trans. Graph. 27, 3, 68:1–68:10.

MEYER, C. D. 2001. Matrix Analysis and Applied Linear Algebra. SIAM: Society for Industrial and Applied Mathematics.

MILANFAR, P. 2013. A tour of modern image filtering. IEEE Signal Processing Magazine 30, 1, 106–128.

MRÁZEK, P., WEICKERT, J., AND BRUHN, A. 2004. On robust estimation and smoothing with spatial and tonal kernels. In Proc. Dagstuhl Seminar: Geometric Properties from Incomplete Data, Springer.

NORDSTRÖM, K. N. 1989. Biased anisotropic diffusion: a unified regularization and diffusion approach to edge detection. Tech. rep., EECS Dept., UC Berkeley.

PARIS, S., HASINOFF, S. W., AND KAUTZ, J. 2011. Local laplacian filters: Edge-aware image processing with a laplacian pyramid. ACM Trans. Graph. 30, 4, 68:1–68:12.

PATTANAIK, S. N., TUMBLIN, J., YEE, H., AND GREENBERG, D. P. 2000. Time-dependent visual adaptation for fast realistic image display. In Proc. of Conf. on Computer Graphics and Interactive Techniques, SIGGRAPH '00, 47–54.

PETIT, J., AND MANTIUK, R. K. 2013. Assessment of video tone-mapping: Are cameras' S-shaped tone-curves good enough? Journal of Visual Communication and Image Representation 24, 7, 1020–1030.

RAMSEY, S. D., JOHNSON III, J. T., AND HANSEN, C. 2004. Adaptive temporal tone mapping. In Proceedings of the 7th IASTED International Conference on Computer Graphics and Imaging, 124–128.

REINHARD, E., AND DEVLIN, K. 2005. Dynamic range reduction inspired by photoreceptor physiology. IEEE Transactions on Visualization and Computer Graphics 11, 13–24.

REINHARD, E., STARK, M., SHIRLEY, P., AND FERWERDA, J. 2002. Photographic tone reproduction for digital images. ACM Trans. Graph. 21, 3, 267–276.

REINHARD, E., WARD, G., PATTANAIK, S., DEBEVEC, P., HEIDRICH, W., AND MYSZKOWSKI, K. 2010. HDR Imaging - Acquisition, Display, and Image-Based Lighting, Second Edition. Morgan Kaufmann.

REINHARD, E., POULI, T., KUNKEL, T., LONG, B., BALLESTAD, A., AND DAMBERG, G. 2012. Calibrated image appearance reproduction. ACM Trans. Graph. 31, 6 (Nov.), 201:1–201:11.

TOCCI, M. D., KISER, C., TOCCI, N., AND SEN, P. 2011. A versatile HDR video production system. ACM Trans. Graph. 30, 4, 41:1–41:10.

TOMASI, C., AND MANDUCHI, R. 1998. Bilateral filtering for gray and color images. In ICCV, 839–846.

VAN HATEREN, J. H. 2006. Encoding of high dynamic range video with a model of human cones. ACM Trans. Graph. 25, 4, 1380–1399.

YE, G., GARCES, E., LIU, Y., DAI, Q., AND GUTIERREZ, D. 2014. Intrinsic video and applications. ACM Trans. Graph. (To Appear) 33, 4.

ZIMMER, H., BRUHN, A., AND WEICKERT, J. 2011. Optic flow in harmony. International Journal of Computer Vision 93, 3, 368–388.
