
Image and Video

Super-resolution
Wenzhe Shi - Magic Pony / Twitter
July 2017

1
Super-Resolution (SR)
Applications:
● Satellite imaging
● Medical imaging
● Face recognition
● Surveillance
● ...

2
Data traffic breakdown on the interwebs

Source: Cisco VNI: Forecasts and Methodology 2013-2018


3
Typical SR problem setup
???

Noise (optional)

4
Factors determining success in SR

● Data
● Model
● Objective function (most important for low-level vision!)

5
Outline
1. Efficient CNNs for Super-resolution
○ ESPCN (CVPR 2016)
○ How to initialize ESPCN (Arxiv 2017)
○ Video-ESPCN (CVPR 2017)

2. GANs for Super-Resolution
○ SRGAN (CVPR 2017)

6
#RealTimeSR
Sub-pixel convolution

7
SRCNN: C. Dong et al., ECCV 2014, TPAMI 2015

8
Proposed network: ESPCN
Directly map I^LR to I^HR, operating exclusively in LR space

9
Proposed network: ESPCN
Directly map I^LR to I^HR, operating exclusively in LR space
Pixel Shuffler
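
To make the pixel shuffler concrete, here is a minimal pure-Python sketch of the rearrangement for an upscaling factor r: r*r low-resolution feature maps are interleaved into one image that is r times larger in each dimension. The function name `pixel_shuffle` and the channel ordering (channel index = row offset * r + column offset) are assumptions of this sketch, not something the slides specify.

```python
def pixel_shuffle(feature_maps, r):
    """Interleave r*r LR channels (each H x W) into one (H*r) x (W*r) image.

    feature_maps: list of r*r channels, each a list of H rows of W numbers.
    Assumed ordering: channel (dy * r + dx) fills the sub-pixel at offset (dy, dx).
    """
    if len(feature_maps) != r * r:
        raise ValueError("expected exactly r*r channels")
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            channel = (y % r) * r + (x % r)  # which LR channel owns this HR pixel
            out[y][x] = feature_maps[channel][y // r][x // r]
    return out


# Four 2x2 LR channels become one 4x4 HR image (r = 2).
lr_channels = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
    [[1000, 2000], [3000, 4000]],
]
hr = pixel_shuffle(lr_channels, 2)
for row in hr:
    print(row)
```

All convolutions before this step run on the small LR grid; only this final, parameter-free rearrangement touches HR coordinates.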

10
A note on sub-pixel convolution
Full view of the proposed sub-pixel convolution using just
convolution (x2)

11
A note on sub-pixel convolution
Full view of standard sub-pixel convolution (x2)

12
Convolution in LR or HR space?
● With matching complexity and receptive field
○ Networks in LR space have more parameters
→ more representation power

Shi, et al., “Is the deconvolution layer the same as a convolutional layer?”, arXiv, 2016
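
One way to read the bullet above is with a simple count, sketched here under stated assumptions: an HR-space layer uses a (k*r) x (k*r) kernel with c channels on an (r*H) x (r*W) grid, while an LR-space layer uses a k x k kernel with c*r*r channels on an H x W grid. Both cover the same image area and cost the same number of multiply-accumulates, but the LR layer holds r*r times more parameters. The helper `conv_cost` and the sizes chosen are hypothetical.

```python
def conv_cost(kernel, channels, height, width):
    """Return (parameters, multiply-accumulates) of one channels -> channels conv layer."""
    params = kernel * kernel * channels * channels
    macs = params * height * width  # every output position applies all weights once
    return params, macs


r, k, c = 3, 3, 8        # upscaling factor, LR kernel size, HR channel count (illustrative)
H, W = 32, 32            # LR spatial size (illustrative)

# HR space: larger kernel and larger grid, fewer channels.
hr_params, hr_macs = conv_cost(k * r, c, H * r, W * r)
# LR space: small kernel and small grid, r*r times more channels.
lr_params, lr_macs = conv_cost(k, c * r * r, H, W)

print("HR:", hr_params, "params,", hr_macs, "MACs")
print("LR:", lr_params, "params,", lr_macs, "MACs")
print("parameter ratio LR/HR:", lr_params // hr_params)
```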
13
Accuracy and speed (x3 on CPU@2GHz)

14
SR x3 qualitative results
Bicubic 29.43dB

15
SR x3 qualitative results
SRCNN 32.81dB

16
SR x3 qualitative results
ESPCN 33.66dB

17
SR x3 qualitative results
ESPCN+ 34.85dB

18
SR x3 qualitative results
Ground truth
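
The dB figures on these result slides are PSNR values. As a reminder of how such a number is obtained, here is a minimal sketch of the standard definition, 10 * log10(peak^2 / MSE). The 8-bit peak value of 255 and the function name `psnr` are assumptions; the slides do not say which channels or borders were used in the evaluation.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images (lists of rows)."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for a, b in zip(ref_row, est_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


ground_truth = [[0.0, 0.0], [0.0, 0.0]]
reconstruction = [[25.5, 25.5], [25.5, 25.5]]  # constant error of 10% of the peak
print(psnr(ground_truth, reconstruction))
```

Because PSNR is a monotone function of MSE, a network trained with an MSE objective is directly optimizing this metric.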

19
Publications

CVPR 2016

Arxiv 2016

20
#Initialization
Remove checkerboard artifacts

21
Checkerboard artifacts
Caused by deconvolution and sub-pixel convolution layers

Radford 2015, Johnson 2016, Dosovitskiy 2015, Gao 2017

22
Deconv Overlap

Odena, et al., http://distill.pub/2016/deconv-checkerboard/ , 2016
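
The overlap argument can be checked with a small 1-D count: for a transposed convolution, tally how many kernel taps land on each output position. When the kernel size is not divisible by the stride, the count alternates, which is the uneven pattern associated with checkerboard artifacts. The function name `overlap_counts` and the sizes used are illustrative assumptions.

```python
def overlap_counts(input_length, kernel_size, stride):
    """Count the kernel taps contributing to each output position of a 1-D transposed conv."""
    output_length = (input_length - 1) * stride + kernel_size
    counts = [0] * output_length
    for i in range(input_length):          # each input pixel paints one kernel footprint
        for tap in range(kernel_size):
            counts[i * stride + tap] += 1
    return counts


uneven = overlap_counts(input_length=4, kernel_size=3, stride=2)
even = overlap_counts(input_length=4, kernel_size=4, stride=2)
print("kernel 3, stride 2:", uneven)
print("kernel 4, stride 2:", even)
```

Away from the borders, the second configuration receives the same number of contributions everywhere, while the first alternates between one and two.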


Random Initialization

Original Shi 2016


Resize Convolution

Original Shi 2016

Odena 2016
Initialize to Conv NN Resize

Original Shi 2016

Odena 2016 Aitken 2017
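
A minimal sketch of the idea behind "Initialize to Conv NN Resize", under simplifying assumptions: a single-channel input, a 1x1 sub-pixel convolution, and one weight per output channel. If all r*r sub-pixel weights start out identical, the shuffled output equals a convolution followed by nearest-neighbour resize, so every r x r block is constant and no checkerboard pattern is present at initialization. All function names here are hypothetical.

```python
import random


def pixel_shuffle(feature_maps, r):
    """Interleave r*r LR channels into one image r times larger per dimension."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [feature_maps[(y % r) * r + (x % r)][y // r][x // r] for x in range(w * r)]
        for y in range(h * r)
    ]


def random_init(r, rng):
    """Independent weight for every sub-pixel channel (the usual random initialization)."""
    return [rng.gauss(0.0, 1.0) for _ in range(r * r)]


def nn_resize_init(r, rng):
    """Draw one weight and copy it to all r*r sub-pixel channels."""
    w = rng.gauss(0.0, 1.0)
    return [w] * (r * r)


def subpixel_conv_1x1(image, weights, r):
    """1x1 convolution to r*r channels followed by the pixel shuffle."""
    channels = [[[w * v for v in row] for row in image] for w in weights]
    return pixel_shuffle(channels, r)


def blocks_are_constant(hr, r):
    """True if every r x r block of the HR output holds a single value."""
    return all(
        hr[y][x] == hr[y - y % r][x - x % r]
        for y in range(len(hr))
        for x in range(len(hr[0]))
    )


rng = random.Random(0)
image = [[1.0, 2.0], [3.0, 4.0]]
out_random = subpixel_conv_1x1(image, random_init(2, rng), 2)
out_nn = subpixel_conv_1x1(image, nn_resize_init(2, rng), 2)
print("random init, constant blocks:", blocks_are_constant(out_random, 2))
print("NN-resize init, constant blocks:", blocks_are_constant(out_nn, 2))
```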


Publications

Arxiv 2017

33
#VideoSR
Exploiting temporal redundancy

34
From Image to Video SR
Image SR

Downscale SR
From Image to Video SR
Video SR

Downscale SR

time
Motivation
● Can we exploit temporal redundancies to improve video CNN-based SR?
○ If so, what is the best strategy?

● Can we further improve results with motion compensation?
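
As a rough illustration of what motion compensation buys, the sketch below aligns a neighbouring frame to the reference frame before the two would be fed to an SR network. A global integer shift found by exhaustive search is swapped in here for simplicity; it is not the compensation scheme of the network presented in these slides. The helper names, the cyclic border handling, and the toy frames are assumptions.

```python
def shift(frame, dy, dx):
    """Cyclically shift a frame by (dy, dx) pixels."""
    h, w = len(frame), len(frame[0])
    return [[frame[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def mse(a, b):
    """Mean squared error between two equally sized frames."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def compensate(reference, neighbour, max_shift=2):
    """Return the shifted neighbour that best matches the reference, and the shift used."""
    candidates = [
        (dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    best = min(candidates, key=lambda s: mse(reference, shift(neighbour, *s)))
    return shift(neighbour, *best), best


# Toy 5x5 frame with unique pixel values; the neighbour is the same scene moved by (1, -1).
reference = [[y * 5 + x for x in range(5)] for y in range(5)]
neighbour = shift(reference, 1, -1)

aligned, motion = compensate(reference, neighbour)
print("error before compensation:", mse(reference, neighbour))
print("estimated shift:", motion, "error after:", mse(reference, aligned))
```

Once the frames are aligned, the remaining differences between them are small, which is the temporal redundancy a video SR network can exploit.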

37
Video ESPCN (VESPCN)

Time

38
Video ESPCN (VESPCN)

Time

39
Data consistency Motion compensation
Results: State-of-the-art comparison

40
Publications

CVPR 2017

41
#PhotoRealisticSR
SR using a GAN (SRGAN)

42
Limitations of Mean-Squared-Error

43
From MSE to Perceptual Loss
Content Loss
ensures pixel-level
content is preserved

44
From MSE to Perceptual Loss
Content Loss
ensures high-level
content is preserved

45
From MSE to Perceptual Loss
● MSE in pixel-space

● MSE in VGG feature-space

[img_source] https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
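
The two bullets differ only in the space where the squared error is measured. The sketch below makes that explicit with a toy feature extractor (horizontal pixel differences) standing in for a VGG feature map; the stand-in is an assumption chosen to keep the example self-contained, and real VGG features behave differently.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2-D arrays (lists of rows)."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def toy_features(image):
    """Hypothetical stand-in for a VGG feature map: differences between horizontal neighbours."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in image]


def pixel_loss(hr, sr):
    """MSE in pixel space."""
    return mse(hr, sr)


def feature_loss(hr, sr, features=toy_features):
    """MSE in feature space: compare features(hr) with features(sr)."""
    return mse(features(hr), features(sr))


hr_image = [[0, 10, 0, 10], [0, 10, 0, 10]]
sr_image = [[5, 15, 5, 15], [5, 15, 5, 15]]  # same texture, uniformly brighter

print("pixel-space MSE:", pixel_loss(hr_image, sr_image))
print("feature-space MSE:", feature_loss(hr_image, sr_image))
```

In this toy case the pixel loss penalizes the brightness offset while the feature loss only sees that the structure matches; the point is that the choice of space decides which errors count.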
46
Intuition for VGG loss

VGG
feature loss

Li and Wand. “Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis”, CVPR 2016
47
Perceptual Loss has two components
Content Loss: ensures high-level content is preserved
Adversarial Loss: ensures reconstructed images look real

48
Generative Adversarial Network

D is a network trained to tell apart real from super-resolved images
G is trained to fool the discriminator
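
The two roles can be written as a pair of losses on the discriminator's output probability. This is a minimal sketch using the common binary cross-entropy formulation with the non-saturating generator loss; the function names and the adversarial weight of 1e-3 used to combine it with a content loss are assumptions of this sketch, not values taken from the slides.

```python
import math


def discriminator_loss(d_real, d_fake):
    """D wants d_real (probability assigned to a real image) near 1 and d_fake near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_adversarial_loss(d_fake):
    """G wants D to assign a high 'real' probability to its super-resolved image."""
    return -math.log(d_fake)


def perceptual_loss(content_loss, d_fake, adversarial_weight=1e-3):
    """Content loss plus weighted adversarial loss (the weight is an assumed value)."""
    return content_loss + adversarial_weight * generator_adversarial_loss(d_fake)


# An undecided discriminator outputs 0.5 for everything.
print("D loss when undecided:", discriminator_loss(0.5, 0.5))
# The generator's loss drops as its output fools the discriminator more often.
print("G loss, D unconvinced:", generator_adversarial_loss(0.1))
print("G loss, D fooled:", generator_adversarial_loss(0.9))
print("combined loss:", perceptual_loss(content_loss=2.0, d_fake=0.5))
```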

49
Intuition for Perceptual loss

50
Limitations of PSNR / SSIM

51
Mean-Opinion-Score (MOS) Testing
● PSNR and SSIM fail to assess perceptual quality

● 26 human raters
○ Give scores 1 (bad) to 5 (excellent)
○ Each rater rated more than 1000 images
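
For reference, a mean opinion score is simply the average of the raters' integer scores for one method. The sketch below uses made-up ratings and hypothetical method names purely to show the computation; none of the numbers come from the study.

```python
def mean_opinion_score(ratings):
    """Average of integer ratings on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)


# Hypothetical ratings for two hypothetical methods.
ratings_by_method = {
    "method_a": [1, 2, 2, 1, 2],
    "method_b": [4, 3, 4, 5, 4],
}
for name, ratings in ratings_by_method.items():
    print(name, mean_opinion_score(ratings))
```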

52
Results: MOS test

MSE-based 1.3 0.9

[SRCNN] Dong, et al., “Learning a deep convolutional network for image super-resolution”, ECCV 2014
[SelfExSR] Huang, et al., “Single image super-resolution from transformed self-exemplars”, CVPR 2015
[DRCN] Kim, et al., “Deeply-recursive convolutional network for image super-resolution”, CVPR 2016
[ESPCN] Shi, et al., “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network”, CVPR 2016
53
Bicubic (4x) Original HR

54
SRResNet (4x SR) Original HR

55
SRGAN (4x SR) Original HR

56
Example

Bicubic interpolation (4x upscaling) Original HR
57
Example

4x SRResNet (MSE-loss) Original HR
58
Example

4x SRGAN (perceptual loss) Original HR
59
NN bicubic SRResNet SRGAN

16x

16x

60
* trained on CelebA
Publications

CVPR 2017

61
Credits & Acknowledgements
Christian Ledig @LedigChr
Zehan Wang @ZehanWang
Jose Caballero @josecabjim
Andy Aitken @aitken_ap
Lucas Theis @lucastheis
Ferenc Huszár @fhuszar
Johannes Totz @johannes_totz
Alejandro Acosta @aacostad
Aly Tejani @alykhantejani
Rob Bishop @Rob_Bishop
Sebastiaan Van Leuven @svleuven
Joost van Amersfoort @y0ast
Francisco Massa @fvsmassa
Yordan Chaparov @ychaparov

Questions?
Wenzhe Shi
@trustswz
wshi@twitter.com
