Image and Video Super-Resolution

Image and Video
Super-resolution
Wenzhe Shi - Magic Pony / Twitter
July 2017
1
Super-Resolution (SR)
Applications:
● Satellite imaging
● Medical imaging
● Face recognition
● Surveillance
● ...
2
Data traffic breakdown on the interwebs
Source: Cisco VNI: Forecasts and Methodology 2013-2018

3
Typical SR problem setup
???
Noise
(optional)
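The setup on this slide (an HR image is downscaled, with optional noise, to produce the LR input that SR must invert) can be sketched as follows. This is a minimal illustration: the function name `degrade`, the box-filter downscale, and the Gaussian noise model are assumptions, since the slide does not specify the degradation kernel.

```python
import numpy as np

def degrade(hr, scale=2, noise_std=0.0, rng=None):
    """Hypothetical LR generator: average scale x scale blocks, then add optional noise."""
    h, w = hr.shape
    h, w = h - h % scale, w - w % scale           # crop so the size divides evenly
    blocks = hr[:h, :w].reshape(h // scale, scale, w // scale, scale)
    lr = blocks.mean(axis=(1, 3))                 # box-filter downscale (assumed kernel)
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        lr = lr + rng.normal(0.0, noise_std, size=lr.shape)
    return lr

hr = np.arange(16, dtype=np.float64).reshape(4, 4)
lr = degrade(hr, scale=2)                         # noise-free 2x downscale, shape (2, 2)
```

An SR model is then trained on pairs `(lr, hr)` to approximate the inverse of `degrade`.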
4
Factors determining success in SR
● Data
● Model
● Objective function (most important for low-level vision!)
5
Outline
1. Efficient CNNs for Super-resolution
○ ESPCN (CVPR 2016)
○ How to initialize ESPCN (arXiv 2017)
○ Video-ESPCN (CVPR 2017)
2. GANs for Super-Resolution

○ SRGAN (CVPR 2017)
6
#RealTimeSR
Sub-pixel convolution
7
SRCNN: C. Dong et al., ECCV 2014, TPAMI 2015
8
Proposed network: ESPCN
Directly map I^LR to I^HR, operating exclusively in LR space
9
Proposed network: ESPCN
Directly map I^LR to I^HR, operating exclusively in LR space
Pixel Shuffler
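The pixel shuffler rearranges the r*r output channels of the last LR-space convolution into an r-times larger image. A minimal NumPy sketch is given below; the channel ordering (channel index = c*r*r + dy*r + dx) is an assumption chosen to match the common deep-learning-framework convention, not something stated on the slide.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (c, dy, dx)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder to (c, h, dy, w, dx)
    return x.reshape(c, h * r, w * r)  # interleave sub-pixels into HR rows and columns

# Four LR channels of size 1x1 become one 2x2 HR patch.
lr_features = np.arange(4).reshape(4, 1, 1)
hr_patch = pixel_shuffle(lr_features, r=2)
```

All convolutions run at LR resolution; only this final, parameter-free rearrangement produces HR pixels.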
10
A note on sub-pixel convolution
Full view of the proposed sub-pixel convolution using just
convolution (x2)
11
A note on sub-pixel convolution
Full view of standard sub-pixel convolution (x2)
12
Convolution in LR or HR space?
● With matching complexity and receptive field
○ Networks in LR space have more parameters
→ more representation power
Shi, et al., “Is the deconvolution layer the same as a convolutional layer?”, arXiv, 2016
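One way to check the claim above is to count parameters and multiply-accumulates for a single layer in each space. The sketch below is a back-of-the-envelope accounting under assumed layer sizes (8 channels, 3x3 LR kernel, scale r = 2); the helper `conv_cost` is hypothetical and ignores biases.

```python
def conv_cost(height, width, c_in, c_out, k):
    """Return (parameters, multiply-accumulates) of one k x k convolution layer."""
    params = c_in * c_out * k * k
    macs = height * width * params     # one kernel application per output pixel
    return params, macs

r, H, W = 2, 32, 32                    # assumed scale factor and LR feature-map size

# HR-space layer: feature maps and kernel are both r times larger.
hr_params, hr_macs = conv_cost(r * H, r * W, c_in=8, c_out=8, k=3 * r)

# LR-space layer: same receptive field in LR pixels, r*r times more channels.
lr_params, lr_macs = conv_cost(H, W, c_in=8 * r * r, c_out=8 * r * r, k=3)
```

With these sizes both layers cost the same number of multiply-accumulates and cover the same image area, while the LR-space layer holds r*r = 4 times as many parameters.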
13
Accuracy and speed (x3 on CPU@2GHz)
14
SR x3 qualitative results
Bicubic 29.43dB
15
SRCNN 32.81dB
16
ESPCN 33.66dB
17
ESPCN+ 34.85dB
18
Ground truth
19
Publications
CVPR 2016
arXiv 2016
20
#Initialization
Remove checkerboard artifacts
21
Checkerboard artifacts
Caused by deconvolution and sub-pixel convolution layer
Radford 2015 Johnson 2016 Dosovitskiy 2015 Gao 2017
22
Deconv Overlap
Odena, et al., http://distill.pub/2016/deconv-checkerboard/ , 2016

Random Initialization
Original Shi 2016

Resize Convolution
Original Shi 2016
Odena 2016
Initialize to Conv NN Resize
Original Shi 2016
Odena 2016 Aitken 2017
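The "initialize to conv + NN resize" idea can be illustrated as follows: draw one sub-kernel per output channel and copy it to all r*r sub-pixel positions, so that at initialization the sub-pixel convolution output equals a nearest-neighbour resize of an ordinary convolution. This is a minimal sketch; the function name `icnr_init`, the Gaussian scale, and the 1x1 kernel used to keep the demo short are assumptions.

```python
import numpy as np

def icnr_init(c_out, c_in, k, r, rng):
    """Sketch: draw (c_out, c_in, k, k) sub-kernels and repeat each r*r times."""
    sub = rng.normal(0.0, 0.02, size=(c_out, c_in, k, k))
    return np.repeat(sub, r * r, axis=0)           # shape (c_out*r*r, c_in, k, k)

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

rng = np.random.default_rng(0)
r = 2
weights = icnr_init(c_out=3, c_in=4, k=1, r=r, rng=rng)
x = rng.normal(size=(4, 5, 5))                     # LR feature maps

# 1x1 convolution written as a channel-mixing einsum.
y = np.einsum("oi,ihw->ohw", weights[:, :, 0, 0], x)
sub_pixel_out = pixel_shuffle(y, r)                # (3, 10, 10)

# Reference: convolve with the sub-kernels only, then nearest-neighbour resize.
base = y[:: r * r]                                 # one channel per sub-kernel
nn_resized = base.repeat(r, axis=1).repeat(r, axis=2)
```

Because every sub-pixel position starts from the same kernel, `sub_pixel_out` matches `nn_resized` and the initial output has no checkerboard pattern; training is then free to move the kernels apart.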

Publications
arXiv 2017
33
#VideoSR
Exploiting temporal redundancy
34
From Image to Video SR
Image SR
Downscale SR
From Image to Video SR
Video SR
Downscale SR
time
Motivation
● Can we exploit temporal redundancies to improve video CNN-based SR?
○ If so, what is the best strategy?
● Can we further improve results with motion compensation?
37
Video ESPCN (VESPCN)
Time
38
Video ESPCN (VESPCN)
Time
39
Data consistency Motion compensation
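Motion compensation amounts to warping a neighbouring frame onto the reference frame before the frames are fused. The sketch below shows only the warping step with bilinear sampling and assumes the dense flow field is already given; how the flow is estimated is not covered here, and the name `warp` and the flow layout (`flow[0]` = vertical, `flow[1]` = horizontal displacement) are assumptions.

```python
import numpy as np

def warp(frame, flow):
    """Sample `frame` at (y + flow[0], x + flow[1]) with bilinear interpolation."""
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y = np.clip(ys + flow[0], 0, h - 1)            # clamp samples to the image border
    x = np.clip(xs + flow[1], 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0                        # fractional offsets
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom

neighbour = np.arange(16, dtype=np.float64).reshape(4, 4)
flow = np.zeros((2, 4, 4))
flow[1] = 1.0                                      # every pixel looks one column to the right
compensated = warp(neighbour, flow)
```

The compensated neighbours can then be stacked with the reference frame as input channels of the SR network.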
Results: State-of-the-art comparison
40
Publications
CVPR 2017
41
#PhotoRealisticSR
SR using a GAN (SRGAN)
42
Limitations of Mean-Squared-Error
43
From MSE to Perceptual Loss
Content Loss
ensures pixel-level
content is preserved
44
Content Loss
ensures high-level
content is preserved
45
● MSE in pixel-space
● MSE in VGG feature-space
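The two losses differ only in the space where the squared error is measured. In the sketch below, `toy_features` is a hypothetical stand-in (simple image gradients) for the VGG feature maps, used so that the example runs without a pretrained network; an actual VGG loss would use activations of a pretrained VGG model instead.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))

def toy_features(img):
    """Hypothetical feature extractor: vertical and horizontal gradients."""
    return np.stack([np.diff(img, axis=0)[:, :-1], np.diff(img, axis=1)[:-1, :]])

def feature_loss(sr, hr, features=toy_features):
    """MSE measured in feature space rather than pixel space."""
    return mse(features(sr), features(hr))

hr = np.arange(16, dtype=np.float64).reshape(4, 4)
sr = hr + 0.5                                      # same structure, shifted brightness

pixel_loss = mse(sr, hr)                           # penalises the brightness shift
feat_loss = feature_loss(sr, hr)                   # structure is identical, so near zero
```

The toy example shows that a feature-space loss can ignore differences that pixel-space MSE penalises, and vice versa.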
[img_source] https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
46
Intuition for VGG loss
VGG
feature loss
Li and Wand. “Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis”, CVPR 2016
47
Perceptual Loss has two components
Content Loss: ensures high-level content is preserved
Adversarial Loss: ensures reconstructed images look real
48
Generative Adversarial Network
D is a network trained to tell apart real from super-resolved images
G is trained to fool the discriminator
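The two objectives can be written as cross-entropy losses on the discriminator's output probabilities. This is a minimal sketch assuming D outputs the probability that its input is a real HR image and that G uses the common non-saturating loss -log D(G(I^LR)); the function names and the `eps` stabiliser are illustrative.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """D wants d_real close to 1 (real HR) and d_fake close to 0 (super-resolved)."""
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))

def generator_adversarial_loss(d_fake, eps=1e-12):
    """G wants D to assign high probability to super-resolved images."""
    return float(-np.mean(np.log(d_fake + eps)))

# Assumed discriminator outputs for a batch of super-resolved images.
fooled = generator_adversarial_loss(np.array([0.9, 0.9]))
caught = generator_adversarial_loss(np.array([0.1, 0.1]))
```

The generator loss is small when D is fooled (`fooled < caught`), which is what pushes G towards outputs that look like real images.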
49
Intuition for Perceptual loss
50
Limitations of PSNR / SSIM
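PSNR is a direct function of pixel-space MSE, so two reconstructions with very different visual structure can receive the same score. The sketch below uses made-up 8x8 test images (a uniform brightness offset versus a checkerboard error of equal magnitude) purely for illustration.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    err = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(err ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

reference = np.full((8, 8), 128.0)

# Error 1: every pixel is 10 levels too bright (smooth, barely visible structure).
offset = reference + 10.0

# Error 2: +10 / -10 in a checkerboard pattern (high-frequency artifact).
signs = np.where(np.indices((8, 8)).sum(axis=0) % 2 == 0, 1.0, -1.0)
checker = reference + 10.0 * signs

psnr_offset = psnr(reference, offset)
psnr_checker = psnr(reference, checker)
```

Both errors have an MSE of 100 and therefore the same PSNR, even though they look nothing alike.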
51
Mean-Opinion-Score (MOS) Testing
● PSNR and SSIM fail to assess perceptual quality
● 26 human raters
○ Give scores 1 (bad) to 5 (excellent)
○ Each rater rated more than 1000 images
52
Results: MOS test
MSE-based 1.3 0.9
[SRCNN] Dong, et al. Learning a deep convolutional network for image super-resolution. ECCV 2014.
[SelfExSR] Huang, et al., “Single image super-resolution from transformed self-exemplars”, CVPR 2015
[DRCN] Kim, et al., “Deeply-recursive convolutional network for image super-resolution”, CVPR 2016
[ESPCN] Shi, et al., “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network”, CVPR 2016
53
Bicubic (4x) Original HR
54
SRResNet (4x SR) Original HR
55
SRGAN (4x SR) Original HR
56
Example
Bicubic interpolation Original HR

(4x upscaling)
57
Example
4x SRResNet Original HR
(MSE-loss)
58
Example
4x SRGAN Original HR
(perceptual loss)
59
NN bicubic SRResNet SRGAN
16x
16x
60
* trained on CelebA
Publications
CVPR 2017
61
Credits &
Acknowledgements
Christian Ledig @LedigChr
Zehan Wang @ZehanWang
Jose Caballero @josecabjim
Andy Aitken @aitken_ap
Lucas Theis @lucastheis
Ferenc Huszár @fhuszar
Johannes Totz @johannes_totz
Alejandro Acosta @aacostad
Aly Tejani @alykhantejani
Rob Bishop @Rob_Bishop
Sebastiaan Van Leuven @svleuven
Joost van Amersfoort @y0ast
Francisco Massa @fvsmassa
Yordan Chaparov @ychaparov
Questions?
Wenzhe Shi
@trustswz
wshi@twitter.com
