
Signal Processing: Image Communication 48 (2016) 63–80


Employing structural and statistical information to learn dictionary(s) for single image super-resolution in sparse domain

Srimanta Mandal*, Anil Kumar Sao
School of Computing and Electrical Engineering (SCEE), Indian Institute of Technology Mandi, HP, India

Article info

Article history:
Received 18 February 2016
Received in revised form 23 August 2016
Accepted 23 August 2016
Available online 25 August 2016

Keywords:
Sparse representation
Dictionary
Edge orientation
Clustering
Edge preserving constraint
Super-resolution

Abstract

It has been argued that structural information plays a significant role in the perceptual quality of images, but the importance of statistical information cannot be neglected. In this work, we have proposed an approach which explores both the structural and the statistical information of image patches to learn multiple dictionaries for super-resolving an image in the sparse domain. Structural information is estimated using the dominant edge orientation, and the mean value of the intensity levels of an image patch is used to represent statistical information. During reconstruction, a low resolution test patch is inspected for its structural as well as statistical information to choose a suitable dictionary. This helps in preserving the orientation of edges during the super-resolution process. Results are further improved by adding an edge (magnitude of edge) preserving constraint, which maintains the edge continuity of the super-resolved image with the input low resolution image. Thus, both characteristics of edge, i.e., orientation and magnitude, are preserved in our proposed approach. The experimental results demonstrate the usefulness of the proposed approach in comparison to state-of-the-art approaches.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

The proliferation of image related applications like medical imaging, remote sensing and surveillance has led to an increase in the demand for high resolution (HR) images. But often this is not feasible due to certain limitations of the imaging environment (limited shutter speed, out-of-focus lens, low-resolution sensors and so on). Reducing the pixels' size in the sensor or increasing the chip size may increase pixel density, which results in an HR image. But reducing pixel size below a certain limit introduces shot noise in the image, and increasing the area of the chip results in higher capacitance [1]. Moreover, these solutions introduce problems like higher storage requirement, higher bandwidth requirement for transferring an image and so on. Thus, we search for techniques which can increase the resolution of an image captured by low resolution (LR) sensors. Interpolation techniques such as nearest neighbor, bi-linear, and bi-cubic [2,3] can serve the purpose up to some extent. The key idea of these methods is to fill up the missing pixels with a weighted average of the neighboring pixels, which produces smooth images. To address this issue, super-resolution (SR) techniques have evolved.

The required number of LR images of the target scene divides the existing SR approaches into two classes: (i) multiple image SR [4,5], and (ii) single image SR [6–14]. In the former class, information provided by multiple sub-pixel shifted LR images of the scene is fused together to derive the HR image of the target scene. Thus the quality of the super-resolved image depends on the number of sub-pixel shifted images, which is the main bottleneck of this approach.

On the contrary, an approach belonging to the latter class does not require multiple images of the same scene and is more suitable for practical scenarios. Generally, these techniques need some example HR images from which to borrow the relevant information for super-resolving the LR input image.

The problem of single image SR can be addressed by modeling the formation process of the LR image, where an LR image y ∈ R^N is produced by blurring followed by down-sampling the original HR scene x ∈ R^M (N ≪ M). It can be expressed mathematically as

y = SBx + n,   (1)

where S ∈ R^(N×M) is the down-sampling operator, B ∈ R^(M×M) is the blurring operator and n ∈ R^N denotes the additive noise. SR seeks an approximation of x from y. Since N ≪ M, S and B together form a rectangular matrix with more columns than rows. Thus, there could be many x which can produce the same y, and the problem of deriving a unique x becomes ill-posed. It is addressed by regularizing the inverse problem based on some prior knowledge. For instance, Tikhonov regularization [15] assumes smoothness as prior knowledge, owing to the prevailing nature of this characteristic in natural images. Nevertheless, this assumption leads to smoothing of image details such as edges, textures and corners. Total variation (TV) regularization addresses the issue up to some extent by minimizing the l1-norm of the gradient of images [16,17]. But the piecewise-constant-like structure of TV regularization tends to generate staircase artifacts in the processed image. Several regularization techniques for preserving image details have been reported [18–20].

The work in reference [21] shows that the sparse nature of wavelet coefficients is an important prior knowledge in the case of image restoration. This has opened up the possibility of restoring an image in the sparse domain, where an image x can be represented as a linear combination of a few columns of an over-complete matrix A, defined as a dictionary [22–31,32,33]. Thus, if we have an appropriate coefficient vector c to weight the columns of the dictionary matrix A, the image can be restored back as x = Ac. Here, the prior knowledge is the sparse nature of the coefficient vector c, i.e., it will have few non-zero entries. Hence, the ill-posed problem of SR (Eq. (1)) can be re-framed as finding the coefficient vector with sparsity priors as:

ĉ = arg min_c ‖c‖_0  s.t.  ‖y − SBAc‖²_2 ≤ ε,   (2)

where the first term denotes the l0-norm of the coefficient vector c, which gives the number of non-zero elements of the vector. The second term is the data term, which makes sure that the output SR result is consistent with the input LR image. ε in the second term is related to the power of the noise n. Solving Eq. (2) involves combinatorial search, which is NP-hard. It has been found that l1-norm minimization is the closest convex approximation of l0-norm minimization [34]. Thus, the above constrained problem can be re-written with a Lagrangian multiplier (λ) as

ĉ = arg min_c { ‖y − SBAc‖²_2 + λ‖c‖_1 }.   (3)

The efficiency of SR based on the sparsity prior is very much affected by the choice of a suitable dictionary. The dictionary can have an analytical form, which is computationally fast, or it can be learned from the data itself, which is proven to be computationally expensive. Due to their adaptive nature, learned dictionaries are shown to give more efficient sparse representations as compared to their analytical counterparts [35], leading to better SR results. Several techniques of learning dictionaries have been documented in the literature [36–42]. Most of them consider patches as the unit instead of the entire image, to faithfully reconstruct the image. Some of them learn a single dictionary (a.k.a. universal dictionary) [36–39] to compute the sparse representation of all the patches of a given image. But these may not be appropriate to represent the variations present in the image. The works proposed in references [40,41] have tried to address this issue by learning multiple dictionaries. But these approaches consider either statistical information or structural information of image patches to learn the dictionary(s). None of them have explored the complementary nature of the two kinds of information.

In this paper, we have proposed an approach of learning multiple dictionaries for sparse representation by considering both the structural information and the statistical information of image patches as cues. Structural information yields the structure of image patches and is strongly related to the geometric details of the patch. In this work, the geometric details are characterized by the dominant edge orientation of the patch, whereas the mean and variance of the gray levels of a patch are associated with its statistical information. Thus the two quantities are complementary in nature, as mean and variance do not carry information about pixel locations. On the contrary, the dominant edge orientation of the patch does convey the spatial distribution of intensity values.

Training image patches are first clustered according to their structural information. A cluster of patches having similar structures (which we denote as an edge-cluster) may vary in statistical information, i.e., intensities may vary across different patches within an edge-cluster. Thus each of the edge-clusters is further clustered using K-means clustering, such that patches having similar structural information and similar statistical information are grouped together. Principal components of each cluster are estimated to learn a compact dictionary, as principal component analysis has proved to be effective in SR reconstruction [40,43]. During super-resolution, a particular dictionary is assigned to a test patch according to its structural as well as its statistical information. Furthermore, the performance of SR is improved by incorporating an edge preserving constraint [25], which maintains edge continuity of the SR result with the input LR image. Being a vector quantity, edge has magnitude as well as direction; hence both of these parameters have to be taken care of while preserving edges. Our prior work [25] has preserved edges in different directions separately, whereas in the current work the magnitude of edge is preserved using the edge-preserving constraint, and the direction of edge is preserved by the proposed dictionary learning and selection method.

The remainder of this paper is organized as follows: Section 2 compares the proposed work with some of the existing relevant works of SR. The proposed approach of learning dictionaries along with edge preservation in SR is described in Section 3. Experimental results are demonstrated in Section 4, and Section 5 summarizes the paper.

* Corresponding author.
E-mail addresses: srimanta_mandal@students.iitmandi.ac.in (S. Mandal), anil@iitmandi.ac.in (A.K. Sao).
http://dx.doi.org/10.1016/j.image.2016.08.006
0923-5965/© 2016 Elsevier B.V. All rights reserved.

2. Related works and contributions

Dictionary learning techniques have been used extensively in example-based single image SR and can be broadly classified into sparse representation based approaches [36,44,22,41,40] and non-sparse representation based approaches [45–49]. Non-sparse representation based approaches either learn dictionaries explicitly [49,45] or implicitly via the hidden layers of a convolutional neural network [47]. Here, approaches related to sparse representation are explained in brief, as the proposed approach belongs to this category.

SR using sparse domain representation is not new; many sophisticated approaches incorporating the sparse domain framework have been published [36,44,22,41,40,50,51]. Some of them consider a universal over-complete dictionary [36,22] and some meditate on learning multiple dictionaries [41,40]. In references [36,44], Yang et al. have learned the relationship between LR and HR image patches by considering two dictionaries. One dictionary Ah has been learned from the HR raw image patches, and another dictionary Al has been learned from the corresponding LR image patches. The basic idea is to derive the sparse coefficient vector c using the LR dictionary Al, and later the same coefficient vector is used for choosing the corresponding HR image patches from the dictionary Ah. The linear relationship between the two types of dictionaries may pose an issue:

Al = SBAh,   (4)

which can be derived as

y = SBx,
Al c = SBAh c.   (5)

Let us say that the dictionary Ah has small mutual coherence, which is one of the important characteristics of efficient dictionaries for sparse representation. But this does not guarantee that the same will be true for the dictionary Al, because of the multiplication by a rectangular matrix SB. This may increase the proximity between its columns, and as a result we may end up with a coefficient vector c which is not suitable for reconstructing the HR image patch [52]. Similarly, the work in reference [22] also exploits the relationship between the LR and HR image patches. But that work follows the K-SVD dictionary learning approach [38,22], instead of using a cluster of raw patches as the dictionary. In addition, both of these approaches also consider universal over-complete dictionaries, which may not be appropriate for representing all variations of image structures.

The work in reference [40] attempts to address the issue of a universal dictionary by learning several sub-dictionaries from the HR training patches. It uses the K-means clustering approach to cluster different patches, but it does not consider the orientation of edges, which reflects the structural information, an important characteristic of images. This is illustrated in Fig. 1(a), where a test image patch having dominant edge orientation of 135° (anticlockwise direction from the x-axis) can be assigned any one, or a linear combination, of the patches having dominant edge orientation 45° or 135°. This is because both patches belong to the same cluster due to their similarity in the mean of intensity values. But ideally it should be assigned the patch having orientation 135°. Thus, faithful reconstruction of the patch may not be possible.

Fig. 1. Significance of structural and statistical information: (a) patches within a cluster are statistically similar (means of intensity values are similar) but structurally different (geometrically dominant edge orientation); and (b) patches within a cluster are structurally similar but statistically different.

On the contrary, the work in reference [41] learns multiple dictionaries from clusters of patches, congregated by structural information. This approach has some similarity with the proposed approach, with notable important differences as follows:

(i) Reference [41] clusters the training patches into eight groups by considering only the structural information, but has neglected the statistical information of the patches. Thus, the dictionary contains structurally similar but statistically different patches, and a linear combination of them may not reconstruct the target patch faithfully, as can be seen in Fig. 1(b). On the contrary, both statistical and structural information, which are complementary in nature, are used in the proposed work of learning dictionaries for better SR reconstruction.

(ii) In order to preserve the detail information, reference [41] adds a super-resolved residual component, computed as the difference between the original LR image and the LR image synthesized from the reconstructed HR counterpart. It can be observed from Fig. 2 (taken from reference [41]) that the residual contains edge information along with noise. If the super-resolved residual is added to the reconstructed HR image, the noise-like structure may degrade the resultant HR image. Unlike reference [41], we have added a simple edge preserving constraint to preserve the edges of the input image in the super-resolved image. This constraint has been proposed in our prior work [25], where we consider continuous gradient information, known as edginess [53], to preserve. One can observe that the edginess of an image (Fig. 7(d)) has less noise-like structure as compared to the residual of the image shown in Fig. 2(c). Thus the proposed work preserves the edges better as compared to the work [41].

Fig. 2. The residual component [41]: (a) the input LR image, (b) generated LR image from the output HR image and (c) the residual component.

(iii) Reference [41] learns the dictionary separately for HR image patches and LR image patches to learn the relationship between HR and LR images, as done in references [36,22]. As explained earlier, the linear mapping from the HR dictionary to the LR dictionary may increase the mutual coherence of the latter, and thus it may not ensure sparse recovery of the signal. On the contrary, the proposed approach learns the dictionary from HR image patches only (employed in reference [40] also); thus the assumption of a linear relationship is avoided. This is an improvement over the traditional sparse representation based approaches [36,22,41]: as the objective of SR is to restore the HR image, it is more logical to use a sparse coefficient vector computed using a dictionary made up of HR image patches than one computed using LR patches.

(iv) Reference [41] considers a separate cluster for smooth patches. This may not be necessary because the human visual system (HVS) is relatively less sensitive to smoother patches, and hence they can be up-sampled directly with simple interpolation approaches. In the proposed approach, only patches containing crucial information are considered for SR; hence, computational overhead is reduced.

Thus, the contributions of our work can be summarized as: (i) We propose a dictionary learning approach which considers structural as well as statistical information of image patches. During reconstruction, a dictionary will be assigned to a target patch by checking its structural and statistical information. (ii) The proposed approach considers only suitable patches (which are not smooth) to learn dictionaries as well as to reconstruct the image. (iii) Our approach considers a simple yet effective edge preserving constraint, which will preserve the edges of the image during SR. (iv) Extensive experimental analysis is performed to validate the approach under noiseless as well as noisy conditions.

3. Proposed approach

The block diagram of the proposed approach, shown in Fig. 3, can be divided into two phases: (i) learning phase and (ii) reconstruction phase. These phases are discussed in greater detail in the following sub-sections. The first phase deals with the learning of several dictionaries using structural as well as statistical information of patches. This is due to the fact that they both contain complementary information, as demonstrated in Fig. 1. The super-resolution process is discussed in the reconstruction phase, where an HR image patch is derived by assigning a suitable dictionary to the target LR patch. Further, an edge preserving constraint is added to preserve the continuous gradient magnitude. This constraint helps in regularizing the SR problem and will be discussed in Section 3.3.
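Both phases build on the LR formation model of Eq. (1). A minimal numpy sketch of that model is given below; the 3×3 box blur kernel, the reflective padding, and the helper names are illustrative assumptions for this sketch, not operators specified by the paper.

```python
import numpy as np

def conv2_reflect(x, k):
    """Tiny 2-D filtering with reflective padding; stands in for the blur B."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='reflect')
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def degrade(x, k, scale, noise_sd=0.0, seed=0):
    """Simulate Eq. (1): y = SBx + n (blur, down-sample, add noise)."""
    y = conv2_reflect(x, k)[::scale, ::scale]      # S applied to Bx
    if noise_sd > 0:
        y = y + np.random.default_rng(seed).normal(0.0, noise_sd, y.shape)
    return y

hr = np.arange(64, dtype=float).reshape(8, 8) / 63.0   # toy HR "scene" x
lr = degrade(hr, np.full((3, 3), 1 / 9), scale=2)      # y = SBx
print(lr.shape)   # (4, 4)
```

The LR image has far fewer pixels than the HR scene (here 16 against 64), which is exactly the N ≪ M under-determinedness that makes the inverse problem ill-posed and motivates the sparsity prior.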

Fig. 3. The proposed approach: the learning phase, where Dicxtroids (dictionaries and centroids) are learned, and the reconstruction phase, where an LR test image is super-resolved.

3.1. Learning phase

Several images rich in content are collected for training purposes, and patches of size √m × √m are extracted from them using a patch extractor matrix Pi. Let ti denote the patch extracted from a training image t using the relation ti = Pit. Since the HVS is less sensitive to smoother regions, we neglect the smoother patches by comparing the standard deviation (s.d.) of the gray levels of the patches with a predefined threshold value (Th).

The selected patches are then distributed into six clusters (R0, R30, R60, R90, R120 and R150) using the process of edge clustering,¹ depending on their structural information, represented by the dominant orientation of edges. Here, Rθ represents a cluster of patches having dominant orientation of edges around θ, where θ is chosen to cover the entire 360° with an interval of 30°, i.e., θ ∈ {0°, 30°, 60°, 90°, 120°, 150°}. Generally, the computation of edge orientation is performed by a gradient operator (derivative operator), which is very sensitive to noise. Thus, a method of estimating the dominant edge orientation is required which is less sensitive to noise. The work in reference [54] shows a robust method for estimating the dominant edge orientation of a patch using SVD, which is followed in the proposed approach. Firstly, the gradient field of a patch ti is calculated as

Ti = [ti,h, ti,v] = [∂ti/∂h, ∂ti/∂v],   (6)

where h and v represent the horizontal and vertical axes, respectively. ti,h and ti,v are both of dimension √m × √m and are arranged in columnized mode in the matrix T ∈ R^(m×2). It is decomposed as T = UΣVᵀ, where Σ ∈ R^(m×2) is a diagonal matrix containing the singular values of T. The columns of the square matrices U ∈ R^(m×m) and V ∈ R^(2×2) represent the left and right singular vectors of T, which are orthonormal in nature. Since the gradient field is decomposed into singular vectors and singular values, the first column of V (denoted as v1) will give the dominant orientation of the gradient field. The specific angle of orientation can be derived by

φ = arctan( v1(2) / v1(1) ).   (7)

Thus, if we add 90° to φ, the angle of dominant orientation of edges of a patch can be achieved. If φ + 90° ∈ (θ − 15°, θ + 15°], that particular patch will be clustered in Rθ. The above approach is repeated for all the patches to generate the edge clusters Rθ.

¹ The term edge clustering is used to describe the process of achieving Rθ using the dominant edge orientation of a patch.

Since each cluster Rθ has been formed using the structural information of patches, its members might differ in a statistical sense, i.e., intensity values could vary largely over a cluster of patches. To make each cluster unique in the structural as well as the statistical sense, we further divide each Rθ into K clusters. We initialize a codebook consisting of K random values (r1,θ, r2,θ, …, rK,θ). If a patch ti,θ from the edge cluster Rθ follows the condition

Rk,θ = { i | ∀ l ≠ k, ‖ti,θ − rk,θ‖₂ < ‖ti,θ − rl,θ‖₂ },   (8)

then the patch will be kept in columnized mode in the cluster Rk,θ. Finally the codebook will be updated as

rk,θ = (1/|Rk,θ|) Σ_{i ∈ Rk,θ} ti,θ.   (9)

In this manner, all the patches will be grouped to generate statistically as well as structurally similar clusters Rk,θ (k = 1, 2, …, K, and θ ∈ {0°, 30°, 60°, 90°, 120°, 150°}). The significance of the proposed approach of clustering the image patches can be observed in Fig. 4. It shows some randomly chosen image patches from the cluster Rk,θ (k = 1 and θ = 60°) obtained using the approach mentioned above (see Fig. 4(c)). We have also shown patches from the clusters formed using only structural information (Fig. 4(a)) and only statistical information (Fig. 4(b)) separately. One can observe that the patches of Fig. 4(a) have lots of variation in intensity but are similar in the structural sense. On the other hand, the patches of Fig. 4(b) have similar intensity levels, but differ structurally. In contrast, the cluster of patches derived using the proposed approach is statistically as well as structurally similar. This observation can be verified further using Fig. 5, which shows the histograms of the variance of the intensity levels and of the dominant edge orientation of patches belonging to a cluster derived using only structural information (edge cluster), only statistical information (statistical cluster), and the proposed approach. One can observe that the variance of intensity levels of the statistical cluster (Fig. 5(b)) and of the cluster generated by the proposed approach (Fig. 5(c)) is smaller as compared to the edge cluster (Fig. 5(a)). This explains that the cluster of patches generated by the proposed approach is statistically similar. Likewise, the range of dominant orientation of the edge cluster (Fig. 5(d)) and of the cluster generated by the proposed approach (Fig. 5(f)) is smaller as compared to the statistical cluster (Fig. 5(e)). It demonstrates that the cluster of patches generated by the proposed approach is structurally similar. Thus, it can be put forward that the cluster of patches generated by the proposed approach is similar both statistically as well as structurally, and contains the complementary information of a given image patch.

Fig. 4. The cluster of patches using: (a) structural information (R60, the edge cluster), (b) statistical information (statistical cluster) and (c) structural as well as statistical information (R1,60, our cluster). (Zoom the pdf soft copy to see the clusters.)

Fig. 5. Histogram of variance of the intensity levels (a) for edge cluster, (b) for statistical cluster and (c) for cluster generated using proposed approach. Histogram of range of dominant edge orientation (d) for edge cluster, (e) for statistical cluster and (f) for cluster generated using proposed approach.

Each of those statistically as well as structurally unique clusters is used to learn a compact dictionary. In principle, the clusters of raw patches Rk,θ could have been used to form over-complete dictionaries, but this would be expensive from a storage point of view. This issue is addressed by learning compact dictionaries from the clusters of raw patches using principal component analysis (PCA). The principal components are computed from the co-variance matrix Ωk,θ, which is estimated as

Ωk,θ = (Rk,θ)(Rk,θ)ᵀ.   (10)

Eigenvectors pk,θ along with the eigenvalues are computed from Ωk,θ. The eigenvectors are arranged according to the descending order of the eigenvalues. The first few (s) significant eigenvectors are selected, corresponding to the significant eigenvalues, and are arranged column-wise to construct Ak,θ, i.e., Ak,θ = [p1,θ, p2,θ, …, ps,θ]. Thus, we have the dictionaries Ak,θ, ∀k, θ, related to the centroids rk,θ, which are together termed Dicxtroids.²

² Here we use the word Dicxtroids to express the dictionaries and centroids together.

3.2. Reconstruction phase

The reconstruction phase starts with interpolating the LR test image by the required up-sampling factor to achieve the initial approximation of the HR image x̂. Patches x̂i of size √m × √m are extracted using the same patch extractor matrix Pi as is done in the case of dictionary learning. Only the patches with higher variance in intensity are selected for SR, to give importance to visually sensitive regions as well as to reduce the computational burden.

Next, the structural information of a test patch x̂i is calculated by the same process as is done in the case of training. Let the angle of dominant edge orientation for a patch be θ̂; it is sent to a dicxtroid selector (see Fig. 3), which selects a particular dicxtroid among all the available dicxtroids using the following criterion:

k̂i,θ = arg min_k { ‖x̂i − rk,θ‖₂ }  s.t.  ∀k, |θ − θ̂| < 15°,   (11)

where k̂i,θ denotes the index of a particular centroid out of the K centroids. The first term of Eq. (11) helps in assigning a suitable dictionary Ak,θ (related to k̂i,θ) out of the K dictionaries, which has been chosen by restricting θ around θ̂, as denoted by the constraint in the equation. Since we are selecting one particular dictionary out of K dictionaries for a patch, it compels the coefficient vector ci to be zero for the other dictionaries; thus the coefficient vector becomes very sparse, and can be obtained by

ĉi = arg min_{ci} ‖x̂i − Aθ ci‖²₂ + λ‖ci‖₁,   (12)

where Aθ is the concatenation of Ak,θ, ∀k, and ĉi (of dimension larger than m) is the estimated sparse vector of the HR image patch. Eq. (12) has to be solved iteratively for all the patches to update the patches of the HR approximation until it converges, and this is performed by the iterative thresholding algorithm [55]. Within each iteration, the selection of dictionaries is updated adaptively by Eq. (11). Once ĉi is estimated for all the patches, the HR patches can be reconstructed by x̂i ≈ Aθ ĉi. In order to reconstruct the complete image from the patches, we have to make sure that the patches extracted from the reconstructed HR image do not differ much from the estimated patches. This is achieved by minimizing the following cost function:

x̂ = arg min_{x̂} Σᵢ₌₁ᴸ ‖x̂i − Pi x̂‖²₂ = arg min_{x̂} Σᵢ₌₁ᴸ ‖Aθ ĉi − Pi x̂‖²₂,   (13)

which is a quadratic equation and has a closed form solution:

x̂ ≈ ( Σᵢ₌₁ᴸ Piᵀ Pi )⁻¹ Σᵢ₌₁ᴸ Piᵀ Aθ ĉi,   (14)

where L denotes the number of patches. In addition, the output HR image x̂ has to follow the input LR image y; thus the following cost function has to be solved:

x̂ = arg min_{x̂} ‖y − SBx̂‖²₂.   (15)

For convenience of expression, Eqs. (12) and (15) can be clubbed together for the entire image, and can be written in a simpler form, as in Eq. (3):

ĉ = arg min_c { ‖y − SBAθ c‖²₂ + λ‖c‖₁ },   (16)

where c is the concatenation of all ci. The weight parameter λ can be estimated using MAP to improve performance [40,25]. Here, the estimation of c is done from the LR test image y based on a Bayesian framework:

ĉ = arg max_c { log P(c|y) } = arg min_c { −log P(y|c) − log P(c) }.   (17)

By assuming that y is contaminated with additive white Gaussian noise (AWGN) of s.d. σn, and that the sparse coefficient vector follows a Laplace distribution, the weight λ can be estimated using the following formula:

λ = 2√2 σn² / (σc + ε),   (18)

where σc is the s.d. of ĉ and ε is a small constant.

Table 1
Pseudo code.

Task: Dictionary Learning
Available Data: Some example training HR images (t).
Data Preparation:
• Extract patches ti from t using the patch extractor matrix Pi.
• Remove patches with lower variances.
Clustering:
• Check the structural information of the patches (θ) by examining the dominant orientation of edges computed using Eqs. (6) and (7).
• Cluster the patches into six groups Rθ, using the criterion φ + 90° ∈ (θ − 15°, θ + 15°], where θ ∈ {0°, 30°, 60°, 90°, 120°, 150°}.
• Each Rθ is further clustered using K-means clustering as described by Eqs. (8) and (9).
Dictionary:
• Compute Ωk,θ from each cluster Rk,θ using Eq. (10).
• Decompose Ωk,θ into its eigenvalues and eigenvectors (pθ).
• Keep the few (s) significant pθ column-wise to construct the dictionaries Ak,θ, ∀k, θ.
Output: Dicxtroids — Dictionaries Ak,θ and Centroids rk,θ.

Task: HR reconstruction
Available Data: LR image (y), scale-up factor, the trained dictionaries (Ak,θ) and the blur kernel (B).
Initialization: Initialize k = 0 and
• Set the stopping criteria: (1) maximum iteration number Max and (2) error threshold.
• Get the initial approximation of the HR image (x̂) by interpolating the LR image.
Iteration: Set k = k + 1 and apply the following steps:
• Extract patches x̂i of size √m × √m from x̂.

3.3. Edge preserving constraint

Edge contains perceptually significant information; thus it needs to be preserved. This is generally done by constraining the edges of the solution image to match those of the LR input image. The reason behind this can be seen in Fig. 6, which shows the intensity profile of a row of the original Cameraman image, and of the interpolated version of the LR image. One can observe that the large variation of the intensity profile of the original image (dotted line) is almost lost in the interpolated image. Another observation is that the smoother regions are almost similar; thus the proposed way of discarding the smoother patches is also justified.

Fig. 6. Intensity profile of the Cameraman image: the dotted line is for the original image and the continuous line is for the interpolated version of the LR image.

Several approaches of edge preservation in SR have been reported [36,22,25,56,57]. But most of them consider a derivative feature as the detail information to preserve, which may not be accurate due to discontinuities present in the filters deployed in those works. The work in reference [53] has proposed a continuous gradient feature (edginess), which is computed by 1-D processing of the image. In this method, the smoothing operator is applied along one direction, and the derivative operator is applied along the orthogonal direction. By repeating this procedure along the orthogonal direction also, two edge gradients are obtained, which together represent the intensity gradient of the image. As the smoothing is done along a direction orthogonal to the direction of the edge extraction, smearing of the edges is reduced. Generally, a Gaussian operator is chosen as the smoothing operator and the derivative of a Gaussian is used as the gradient operator. The outcome of this approach can be observed for the Lena image in Fig. 7, where (b), (c) and (d) represent the vertical edge evidence, the horizontal edge evidence and the gradient magnitude of (a), respectively.

Fig. 7. (a) Original gray scale Lena image, (b) vertical edge evidence (xv), (c) horizontal edge evidence (xh), and (d) magnitude of edge (xg).

To preserve edges (edginess), we have followed the framework shown in Fig. 8, which is the edge preservation block of Fig. 3. Here, the reconstructed HR image is blurred and down-sampled to synthesize the LR counterpart of the input LR image. 1-D processing is applied on both of the LR images to generate the edginess representations, and the two representations are compared against each other. A large difference in this comparison causes the system to return to the SR block. This feedback process continues until the difference falls below a threshold value.

Fig. 8. SR using edge preserving constraint.

Our prior work [25] considers the edge information along different directions to preserve during the SR approach. In this work, we have opted for the same method, but here we consider only the magnitude of the edge to preserve during SR. The reason is that, during dictionary learning and selection, preservation of edge orientation is already addressed, and the magnitude of edge is preserved through the edge preserving constraint. Thus, we preserve both edge direction and magnitude in the proposed SR process. Let us say xv and xh represent the vertical and horizontal edge evidences, respectively. Then the magnitude of edge can be obtained by

xg = √(xv² + xh²).   (19)
 Extract patches x^ i of size m m from x^ .
 Select patches with higher variation to super-resolve. We assume that x g has been obtained by applying Eg operator on
 Check the structural information () of a patch similarly as is done in case of the image x. The magnitude of edge can be preserved by adding
training.
 Select Dicxtroids having orientation if | | < 15. {} { }
the constraint Eg y Eg SBx 22 to Eq. (16). This constraint
 Assign a particular dictionary Ak, based on Eq. (11). maintains the continuity of edge magnitude of HR output image
 Compute coefcient vector c i, from Eq. (12) using iterative thresholding with the LR input image. In fact, it helps in addressing the ill-posed
algorithm [55]. nature of SR problem. Thus, the cost function can be written as
 Achieve the full image by Eq. (14) after having all c i, .
 Minimize the data term y SBx 22 and edge preserving term
^^
{ }
c^ = arg min y SBA c 22 + c 1 , s.t.
c
Eg {}
y
{}
Eg y Eg { }
SBx 22 to derive x .
^^
Stopping Rule: If x
^^ 2
k x k 1 2 , stop. Otherwise iterate till k Max.
Eg { } SBx 22 .
(20)

Output: The result is HR image x. With an appropriate weight (), the above constrained cost func-
tion can be rewritten as

c
{ {}
c^ = arg min y SBA c 22 + c 1 + Eg y

coincides with the variation of the intensity prole of the inter-


polated version of the LR image (continuous line). Thus, it is logical
{
Eg SBA c 22 . } } (21)

to put a constraint, which maintains edge continuity of the down- The equation can be solved using iterative thresholding algorithm
sampled version of the estimated HR image with the input LR [55]. The pseudo code of the proposed approach is given in Table 1.
70 S. Mandal, A.K. Sao / Signal Processing: Image Communication 48 (2016) 6380
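As a concrete illustration, the 1-D edginess computation described above (smoothing along one direction, a derivative-of-Gaussian along the orthogonal one, combined by Eq. (19)) can be sketched as follows. The function name and the NumPy/SciPy realization are ours, and the σ values follow the settings reported later in Section 4.1; the exact filters of reference [53] may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def edge_evidences(img, sigma_smooth=1.0, sigma_deriv=0.4):
    """1-D edginess sketch: smooth along one axis, then apply a
    derivative of Gaussian along the orthogonal axis (and vice versa).
    The vertical/horizontal labeling here is illustrative."""
    img = img.astype(float)
    # smooth along the rows, then differentiate down the columns
    xv = gaussian_filter1d(gaussian_filter1d(img, sigma_smooth, axis=1),
                           sigma_deriv, axis=0, order=1)
    # smooth along the columns, then differentiate across the rows
    xh = gaussian_filter1d(gaussian_filter1d(img, sigma_smooth, axis=0),
                           sigma_deriv, axis=1, order=1)
    xg = np.sqrt(xv ** 2 + xh ** 2)   # Eq. (19): magnitude of edge
    return xv, xh, xg
```

Because the smoothing is orthogonal to the differentiation, a sharp step edge produces a narrow, well-localized response in `xg` rather than a smeared one.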

Fig. 9. Results for clean Butterfly image (×3): (a) LR image, (b) result of Raw Patches (RP) [36] [PSNR = 23.46, SSIM = 0.7747], (c) result of Single image scale-up (SISU) [22] [PSNR = 25.92, SSIM = 0.7801], (d) Aplus result [49] [PSNR = 26.90, SSIM = 0.7733], (e) ASDS result [40] [PSNR = 26.49, SSIM = 0.8861], (f) proposed structural-statistical dictionary (SSD) result [PSNR = 27.16, SSIM = 0.9021], (g) proposed SSD with edge preservation (SSD-EP) result [PSNR = 27.38, SSIM = 0.9087] and (h) the ground truth image.

4. Experimental validation

4.1. Experimental geography

In our experiments, we have extracted patches of size 5×5 from a set of training images, which are rich in content and are provided by the approach [40]. The threshold values (Th) to select the highly varying patches extracted from the training and testing images are chosen as 4.5 and 1.5, respectively, so as to super-resolve most of the patches of the test image (experimentally justified in Section 4.5.1(c)). Following the approach [41], the selected patches from the training images (typically on the order of 100,000) are grouped into one of six clusters, denoted as $R_{\bar\theta}$ ($\bar\theta \in \{0°, 30°, 60°, 90°, 120°, 150°\}$), by allowing a maximum deviation of ±15° for the dominant edge orientation. Further, those groups are clustered into K groups using K-means clustering. K is an important parameter to choose, as a smaller K may wash out the differences among the clusters, whereas a larger K makes each cluster less informative. Empirically, we have found that K = 60 gives the best results. However, slight variation in the number of clusters does not make substantial changes in the results of super-resolution (validated in Section 4.5.1(b)).

In case of testing, first of all, the target image of size 256×256 is blurred by applying a 7×7 Gaussian kernel with s.d. of 1.6. The resultant image is down-sampled by the required up-sampling factor (here 3) horizontally as well as vertically to produce the LR test image of size 86×86. The experimental analysis is conducted for noise-less as well as noisy conditions to check the robustness of the proposed approach. For the noisy environment, additive white Gaussian noise (AWGN) with s.d. of 5 is considered. It has to be noted that the LR images are synthesized in a similar way for all the approaches. The value of λ, which balances the sparsity term and the data term, is adaptively computed using Eq. (18). The parameter γ in Eq. (21) should be larger for the noiseless case as compared to the noisy case. This is because a larger γ value can emphasize the preservation of edges more, but for a noisy image the same value may lead to a degraded result, as the edginess representation will also contain noise. Experimentally, we have found that the best value of γ is 0.1 for the noiseless and 0.001 for the noisy environment (shown in Section 4.5.1(d)). In the computation of the edginess representation, the smoothing function is chosen as a Gaussian with s.d. 1, and the derivative of the same function with s.d. of 0.4 is considered as the derivative operator.

For color images, we first convert the image from the RGB domain to the YCbCr domain, where the luminance (Y) component is selected for SR, and the chroma components (Cb, Cr) are scaled up by bi-cubic interpolation. Later, the super-resolved Y component is combined with the interpolated chroma components to derive the HR color image. This is because the luminance component plays a more significant role than the chroma components in visualizing an image. The quantitative results of the luminance components are compared against the state-of-the-art approaches [36,22,40,49] in terms of the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [58,59] image quality assessment metrics. While the PSNR metric focuses on the mean squared error perspective, SSIM concentrates on structural similarity, which is related to the HVS. Thus, the two metrics examine the quality of the super-resolved image from different perspectives. It has to be noted that the approaches [36,22,49] are retrained in the experiments according to the degradation model (blurring followed by down-sampling) followed in the proposed approach.

Fig. 10. Closer analysis of Fig. 9: the first two rows are the zoomed version (4×) of the boxed part of Fig. 9; all images are kept in the same sequence as the parent figure. The last row represents the error images (results subtracted from the ground truth) of the second row.

4.2. Experimental results without noise

Here we consider ten standard images to illustrate our results without noise. The SR results for the Butterfly image can be visualized in Fig. 9 for up-sampling factor 3, where (a) is the LR input image, (b) is the result of raw patches (RP) [36], (c) is the result of single image scale-up (SISU) [22], (d) is the result of Aplus [49], (e) is the adaptive sparse domain selection (ASDS) result [40], (f) is the proposed structural-statistical dictionary (SSD) result, (g) is the proposed SSD with edge preservation (SSD-EP) result, and (h) is the ground truth image. At first glance, the results are very much comparable to each other. To observe the differences, we zoom the parts within the black-colored box by 4× and keep them in the same sequence in the first two rows of Fig. 10. One can observe that the zoomed versions of (f) and (g) are perceptually better than the results of the first row. The edges of the results in the first row are more smeared than those in the second row. This shows the advantage of learning multiple dictionaries instead of relying on a universal one, as is followed in references [36,22]. However, the zoomed versions of (f) and (g) are both visually very similar to that of (e), which is the result of the ASDS approach [40]. We have also shown the error images, which are obtained by subtracting the zoomed versions of the results from the ground truth version, and are kept in the last row of Fig. 10. It can be observed that the error images of (f) and (g) contain more black regions than (e). This is because of the suitable employment of dictionaries as well as the edge preserving constraint, which are neglected in reference [40]. Similarly, the superiority of the proposed approach can be observed for the Bike image in Fig. 11, and its zoomed and error versions in Fig. 12.

Fig. 11. Results for clean Bike image (×3): (a) LR image, (b) result of RP [36] [PSNR = 22.88, SSIM = 0.7008], (c) result of SISU [22] [PSNR = 24.05, SSIM = 0.7525], (d) result of Aplus [49] [PSNR = 24.30, SSIM = 0.7766], (e) ASDS result [40] [PSNR = 24.20, SSIM = 0.7813], (f) proposed SSD result [PSNR = 24.49, SSIM = 0.7961], (g) proposed SSD-EP result [PSNR = 24.58, SSIM = 0.7987] and (h) the ground truth image.

Fig. 12. Closer analysis of Fig. 11: the first two rows are the zoomed version (4×) of the boxed part of Fig. 11; all images are kept in the same sequence as the parent figure. The last row represents the error images (results subtracted from the ground truth) of the second row.

We have also compared the results quantitatively (PSNR and SSIM) for 10 images in Table 2, where bold fonts represent the best values in the corresponding row. It can be noted that the results of the proposed approach are better than the state-of-the-art approaches in most of the cases. However, images with lots of small details, such as Baboon and Racoon, are sometimes mis-treated by the proposed approach. The reason is that the edge preserving constraint helps in preserving the dominant edges, while edges with small magnitude are often smeared in the down-sampling operation as well as in extracting the edginess feature, as both operations involve initial smoothing. Nevertheless, for the rest of the cases the proposed approach is

Table 2
Results of SR (×3) for noise-less LR image (σ_n = 0); bold fonts represent the best values.

Images      Metrics  BC      RP [36]  SISU [22]  ASDS [40]  Aplus [49]  SSD     SSD-EP
Baboon      PSNR     19.95   22.27    22.63      20.91      20.80       20.90   20.90
            SSIM     0.3435  0.4897   0.5297     0.4993     0.4822      0.5001  0.4982
Barbara     PSNR     22.87   24.21    24.92      24.35      24.30       24.34   24.34
            SSIM     0.6111  0.6854   0.7219     0.7278     0.7141      0.7276  0.7271
Bike        PSNR     20.71   22.88    24.05      24.20      24.30       24.49   24.58
            SSIM     0.5727  0.7008   0.7525     0.7813     0.7766      0.7961  0.7987
Butterfly   PSNR     20.72   23.46    25.92      26.49      26.90       27.16   27.38
            SSIM     0.7222  0.7747   0.7801     0.8861     0.7733      0.9021  0.9087
Cameraman   PSNR     21.48   23.73    25.07      24.67      24.90       24.83   24.91
            SSIM     0.7033  0.7826   0.8249     0.8169     0.8222      0.8225  0.8243
Hat         PSNR     27.13   29.21    30.73      30.76      31.00       31.05   31.16
            SSIM     0.7836  0.8446   0.8716     0.8667     0.8797      0.8749  0.8769
Parrot      PSNR     25.27   27.31    29.07      29.60      29.60       29.96   30.07
            SSIM     0.8168  0.8692   0.9023     0.9047     0.9031      0.9103  0.9119
Pentagon    PSNR     22.04   24.45    25.36      25.81      25.30       26.22   26.27
            SSIM     0.4909  0.6553   0.7007     0.7453     0.7175      0.7618  0.7623
Peppers     PSNR     23.36   26.06    28.82      28.79      28.90       29.10   29.19
            SSIM     0.7229  0.8173   0.8668     0.8597     0.8552      0.8642  0.8648
Racoon      PSNR     26.36   27.58    28.74      29.15      28.90       29.16   29.18
            SSIM     0.6287  0.7007   0.7387     0.7676     0.7504      0.7651  0.7647
Average     PSNR     22.99   25.12    26.53      26.47      26.49       26.72   26.80
            SSIM     0.6396  0.7320   0.7689     0.7855     0.7674      0.7925  0.7938
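The LR inputs behind results such as Table 2 are synthesized by the degradation model of Section 4.1: a 7×7 Gaussian blur with s.d. 1.6 followed by decimation by the up-sampling factor, with AWGN added in the noisy experiments. A sketch of that model follows; the NumPy/SciPy realization and function names are ours, and the paper's exact blur implementation may differ.

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size=7, sigma=1.6):
    # normalized 2-D Gaussian blur kernel
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def synthesize_lr(hr, scale=3, noise_sd=0.0, seed=0):
    """Blur with a 7x7 Gaussian (s.d. 1.6), decimate by `scale`
    horizontally and vertically, and optionally add AWGN."""
    blurred = convolve(hr.astype(float), gaussian_kernel(), mode='nearest')
    lr = blurred[::scale, ::scale]
    if noise_sd > 0:
        lr = lr + np.random.default_rng(seed).normal(0.0, noise_sd, lr.shape)
    return lr
```

With `scale=3`, a 256×256 target produces an 86×86 LR image, matching the sizes quoted in Section 4.1.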

reporting the best results. It demonstrates the importance of combining structural as well as statistical information of image patches for the construction of the dictionary. It has to be noted that there is not much significant quantitative gain while using SSD-EP over SSD, but a perceptual difference can be observed in the figures.

The proposed approach is further compared in Table 3 with two SR approaches [41,60] in terms of metric gains.³ The PSNR and SSIM gains represent the difference between the super-resolved HR image and the bi-cubic interpolated image. It has to be noted that the proposed approach produces the best gains for both images.

Table 3
Noise-less results (×3) in terms of PSNR and SSIM gains over the bi-cubic interpolation method (σ_n = 0).

Images   Metrics  K-means [60]  Geometric dictionary [41]  SSD-EP
Lena     PSNR     1.44          1.93                       5.49
         SSIM     0.0035        0.0046                     0.1550
Peppers  PSNR     0.86          1.32                       5.74
         SSIM     0.0025        0.0035                     0.1401

Fig. 13. SR results for noisy Barbara image (×3): (a) noisy LR image, (b) result of RP [36] [PSNR = 23.76, SSIM = 0.6196], (c) result of SISU [22] [PSNR = 23.99, SSIM = 0.6082], (d) result of Aplus [49] [PSNR = 22.80, SSIM = 0.5349], (e) ASDS result [40] [PSNR = 23.31, SSIM = 0.5923], (f) proposed SSD result [PSNR = 23.56, SSIM = 0.6210], (g) proposed SSD-EP result [PSNR = 23.69, SSIM = 0.6438] and (h) the ground truth image.

³ Since source codes are not available, we compare the approaches [41,60] with ours in terms of metric gain.
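The gains reported in Table 3 are plain differences of PSNR (and SSIM) against the bi-cubic baseline. PSNR follows its standard definition; a sketch, assuming 8-bit images with a peak value of 255 (function names are ours):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    # peak signal-to-noise ratio in dB between ground truth and an estimate
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def psnr_gain(reference, sr_result, bicubic_result):
    # gain of an SR method over plain bi-cubic interpolation, as in Table 3
    return psnr(reference, sr_result) - psnr(reference, bicubic_result)
```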

4.3. Experimental results with noise

The robustness of the SR system is also examined under noisy conditions. For demonstration, we have considered the images of Table 2 for up-sampling factor 3 only. We have added AWGN with σ_n = 5 to the LR image, which is given as input to the SR system. The outputs can be qualitatively visualized for the Barbara and Peppers images in Figs. 13 and 15, respectively. The zoomed versions can be observed in Figs. 14 and 16. It has to be noted that the results of (b) for both examples are smoother, and the edges are more smeared, as compared to the rest of the images. On the other hand, (c)–(e) have more high frequency information, but at the same time they have artifacts in the smoother regions. However, the results of (f) and (g) are perceptually better as compared to the rest of the images, because the reduction of noise as well as the preservation of dominant edges is better for both examples. The reason is that the proposed approach considers the dominant edge orientation and the mean of the intensity values to select a particular dictionary, and the method of computing the dominant edge orientation is less sensitive to noise. It can be observed from Fig. 17 (plotted by taking the average of 13 images) that for all $\bar\theta \in \{0°, 30°, 60°, 90°, 120°, 150°\}$, the correct classification rates⁴ are more than 60%. Thus, it can be inferred that the estimated structural information of the patches has not been affected much by the noise. As a consequence, most of the LR patches select dictionaries which are suitable in the structural sense. The linear combination of clean patches from the selected dictionaries replaces the noisy patches of the LR image to produce a better result.

One important observation is that (g) is slightly better than (f) in terms of suppressing artifacts and preserving edges, because the edge preserving constraint helps in preserving dominant edges while removing noise. This is due to the fact that we have considered edginess as the feature to preserve, which is extracted using 1-D processing of the image. The initial smoothing operation of the 1-D processing does help in reducing the level of noise, but it also smears the finer details. Thus, noisy images having more textures cannot be effectively recovered, as can be noticed in the PSNR and SSIM values for the Baboon, Bike and Racoon images in Table 4, where the quantitative results are shown.

Fig. 14. Closer analysis of Fig. 13: the first two rows are the zoomed version (4×) of the boxed part of Fig. 13; all images are kept in the same sequence as the parent figure. The last row represents the error images (results subtracted from the ground truth) of the second row.

Fig. 15. SR results for noisy Peppers image (×3): (a) noisy LR image, (b) result of RP [36] [PSNR = 25.39, SSIM = 0.7369], (c) result of SISU [22] [PSNR = 26.85, SSIM = 0.7204], (d) result of Aplus [49] [PSNR = 25.50, SSIM = 0.6363], (e) ASDS result [40] [PSNR = 26.28, SSIM = 0.6975], (f) proposed SSD result [PSNR = 26.86, SSIM = 0.7362], (g) proposed SSD-EP result [PSNR = 27.19, SSIM = 0.7721] and (h) the ground truth image.

⁴ Here we consider different $\bar\theta$ as different classes. A noisy patch being correctly classified means that the dominant edge orientation of the patch lies within the $\bar\theta$ ± 15° range, where $\bar\theta$ is the dominant edge orientation of the ground truth patch.

4.4. Experiments with different up-sampling factors and datasets

Further, the proposed approach is scrutinized for different up-

sampling factors as well as for different datasets (provided by [49]) in Tables 5–7. For demonstration in terms of PSNR and SSIM values, we have used only four examples, randomly selected from each dataset, and compared with the existing approaches [36,22,40,49]. Among these approaches, RP [36], SISU [22] and Aplus [49] are retrained for different up-sampling factors according to the degradation model followed in the proposed approach. Bold fonts represent the best case among the approaches for a particular scale. One can note that for most of the datasets the proposed approach produces the best results in terms of PSNR and SSIM across the scales. Hence, one can infer that the proposed approach can be used to super-resolve any image for different up-sampling factors.

4.5. Analyzing the proposed approach

The proposed approach is further analyzed in terms of the following factors: the significance of the parameters employed in the algorithm, and the computational aspect. The parameters are (a) the number and type of training patches, (b) different numbers of K in clustering the training patches, (c) the Th to select patches for SR, (d) the γ of the edge preserving term in Eq. (21), and (e) the size of patch. For each of these experiments, we have considered up-sampling factor 3.

Fig. 16. Closer analysis of Fig. 15: the first two rows are the zoomed version (4×) of the boxed part of Fig. 15; all images are kept in the same sequence as the parent figure. The last row represents the error images (results subtracted from the ground truth) of the second row.

Fig. 17. The rate (%) at which patches get correct dictionaries with similar structure in the presence of noise.

4.5.1. Parametric analysis

(a) Training patches: The goodness of the learned dictionaries is examined for different numbers as well as different types of patches for training. Fig. 18 shows the average behavior of the PSNR and SSIM values of 5 randomly selected images for varying numbers of training patches. Here, we have plotted the results for numbers of training patches ranging from 500 to 4000 in order to demonstrate the dependency of the performance on the size of the training data. However, the improvement is marginal for more than 4000 training patches. It can be observed that initially there is rapid improvement in the performance with increasing number of training patches, and it reaches a steady state after a certain point. The reason could be that initially different variations of training patches are required to generalize the principal components, which are used to estimate dictionary

Table 4
Results (×3) of SR for noisy LR image (σ_n = 5); bold fonts represent the best values.

Images      Metrics  BC      RP [36]  SISU [22]  ASDS [40]  Aplus [49]  SSD     SSD-EP
Baboon      PSNR     19.84   21.94    22.05      20.29      20.10       20.37   20.39
            SSIM     0.3317  0.4520   0.4755     0.4106     0.4042      0.4116  0.4057
Barbara     PSNR     22.65   23.76    23.99      23.31      22.45       23.56   23.69
            SSIM     0.5739  0.6196   0.6082     0.5923     0.5349      0.6210  0.6438
Bike        PSNR     20.66   22.43    23.23      23.49      22.70       23.45   23.33
            SSIM     0.5623  0.6442   0.6677     0.7197     0.6358      0.7215  0.7129
Butterfly   PSNR     20.69   23.04    24.84      25.75      24.30       25.93   25.93
            SSIM     0.7079  0.7573   0.7566     0.8467     0.7166      0.8608  0.8709
Cameraman   PSNR     21.33   23.32    24.06      23.59      23.20       23.78   23.97
            SSIM     0.6290  0.6703   0.6178     0.6112     0.5408      0.6460  0.7089
Hat         PSNR     26.92   27.91    27.81      29.49      26.30       29.68   29.68
            SSIM     0.7523  0.7150   0.6439     0.7947     0.5436      0.8126  0.8278
Parrot      PSNR     25.12   26.46    26.95      28.60      25.80       28.50   28.43
            SSIM     0.7887  0.7491   0.6884     0.8406     0.5825      0.8529  0.8663
Pentagon    PSNR     21.88   23.87    24.33      24.08      23.40       24.33   24.36
            SSIM     0.4760  0.6062   0.6361     0.6310     0.6076      0.6406  0.6353
Peppers     PSNR     23.14   25.39    26.85      26.28      25.50       26.86   27.19
            SSIM     0.6778  0.7369   0.7204     0.6975     0.6363      0.7362  0.7721
Racoon      PSNR     26.21   26.61    26.76      27.99      25.40       27.97   27.86
            SSIM     0.6165  0.6290   0.6202     0.6917     0.5451      0.6865  0.6761
Average     PSNR     22.84   24.47    25.09      25.29      23.95       25.44   25.48
            SSIM     0.6116  0.6580   0.6435     0.6836     0.5747      0.6990  0.7120

Table 5
Results of SR for Set5 for different scales (σ_n = 0); bold fonts represent the best values.

Dataset Scales Images Metrics RP [36] SISU [22] ASDS [40] Aplus [49] SSD-EP

Set5 2 Baby PSNR 33.73 35.95 36.44 36.60 36.93
SSIM 0.9161 0.9360 0.9296 0.9421 0.9435
Time (s) 254.16 5.29 1420.30 7.01 2456.2
Bird PSNR 32.78 35.98 37.73 37.40 38.77
SSIM 0.9382 0.9614 0.9593 0.9704 0.9739
Time (s) 68.23 1.64 433.34 2.14 1502.80
Head PSNR 33.06 34.06 34.35 34.40 34.61
SSIM 0.8175 0.8361 0.8448 0.8454 0.8510
Time (s) 76.20 1.58 183.70 2.10 685.33
Woman PSNR 29.36 31.47 33.36 32.60 33.77
SSIM 0.9077 0.9326 0.9344 0.9453 0.9500
Time (s) 67.62 1.57 181.81 1.96 399.62
Average PSNR 32.23 34.37 35.47 35.25 36.02
SSIM 0.8949 0.9165 0.9170 0.9258 0.9296
Time (s) 116.55 2.52 554.77 3.30 1288.00

3 Baby PSNR 31.51 35.13 35.15 35.00 35.15
SSIM 0.8877 0.9198 0.9208 0.9181 0.9207
Time (s) 128.61 3.95 1189.60 1.80 1772.60
Bird PSNR 30.11 33.59 35.05 35.10 35.41
SSIM 0.8974 0.9342 0.9489 0.9506 0.9551
Time (s) 33.20 1.08 440.46 0.58 1166.50
Head PSNR 32.34 34.31 33.59 33.60 33.64
SSIM 0.8223 0.8460 0.8216 0.8191 0.8225
Time (s) 34.54 0.99 326.96 0.55 905.00
Woman PSNR 27.01 29.72 30.94 30.90 31.43
SSIM 0.8643 0.9051 0.9197 0.9235 0.9275
Time (s) 30.37 1.02 200.45 0.53 843.05
Average PSNR 30.24 33.19 33.68 33.65 33.91
SSIM 0.8679 0.9013 0.9027 0.9028 0.9065
Time (s) 56.68 1.76 539.36 0.86 1171.80

4 Baby PSNR 28.94 32.74 33.05 33.20 35.18
SSIM 0.8249 0.8737 0.8840 0.8822 0.8849
Time (s) 75.93 2.39 1206.90 1.80 2028.80
Bird PSNR 27.58 31.04 31.59 32.00 32.02
SSIM 0.8362 0.8344 0.9029 0.9095 0.9124
Time (s) 20.98 0.83 389.66 0.59 973.15
Head PSNR 29.95 31.99 32.25 32.40 32.48
SSIM 0.7345 0.7648 0.7768 0.7774 0.7821
Time (s) 19.80 0.66 304.71 0.55 709.26
Woman PSNR 24.74 27.44 27.87 28.50 28.66
SSIM 0.8019 0.8574 0.8706 0.8831 0.8861
Time (s) 17.62 0.68 186.06 0.53 772.51
Average PSNR 27.80 30.80 31.19 31.52 31.58
SSIM 0.7994 0.8466 0.8586 0.8630 0.8664
Time (s) 33.58 1.14 521.83 0.87 1120.90
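The K-means subdivision of each orientation group (Eqs. (8) and (9) of the paper, analyzed further in Section 4.5.1(b)) can be sketched with a plain Lloyd's iteration; this is an illustrative stand-in, not the paper's exact implementation, and the default K = 60 follows the empirically chosen value.

```python
import numpy as np

def kmeans(patches, K=60, iters=20, seed=0):
    """Minimal Lloyd's K-means: returns centroids and per-patch labels."""
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), K, replace=False)].astype(float)
    for _ in range(iters):
        # squared Euclidean distance of every patch to every centroid
        d = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(K):
            members = patches[labels == k]
            if len(members):
                centroids[k] = members.mean(0)
    return centroids, labels
```

In the proposed pipeline, the centroid of each resulting cluster is also retained (the "Dicxtroids"), since the mean intensity of a cluster carries the statistical information used at test time.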

(s) in the proposed approach. After a certain point, increasing the number of training patches does not change the behavior of the principal components significantly. Hence, it can be inferred that increasing the number of training patches improves the PCA based dictionary up to a certain point; after that, the goodness of the PCA based dictionary does not improve much.
The robustness of the proposed dictionary is further reviewed using different training datasets. Here, the objective is to check the dependency of the proposed dictionary on the type of training examples. In this case, we have used two datasets: one is provided by the approach [40], which consists of 5 images only, and the other is provided by the approaches [36,22,49,48], which consists of 91 images. The resultant PSNR and SSIM values for both datasets are presented in Table 8. One can note that the results are more or less similar for both datasets. This is because, once we have enough patches with different variations for training the PCA based dictionaries, the proposed approach can produce results that are almost invariant to the training examples.

(b) Different number of K: In the proposed approach, each edge cluster is further divided into K clusters, and PCA based dictionaries are built for each of the clusters. In order to choose the number K, we have plotted the cost of fitting the training patches with respect to different numbers of clusters in Fig. 19. One can observe that the fitting cost reduces with increasing number of clusters. However, the cost reduction from K = 60 to K = 100 is lesser as compared to lower K. Again, we have checked the PSNR values of the super-resolved images for 5 examples using the dictionaries learned with K = 50, 60 and 70; the mean values are 28.47, 28.48 and 28.46, respectively. From both analyses, we have found that for K = 60 the training patches are quite well fitted and the proposed approach produces comparable results. Hence, to reduce the computational cost as well as the fitting cost, we have chosen K = 60.

(c) The threshold value (Th): A suitable threshold value, denoted as Th, is employed in the proposed approach to select or reject patches for learning dictionaries as well as for super-

Table 6
Results of SR for Set14 for different scales (σ_n = 0); bold fonts represent the best values.

Dataset Scales Images Metrics RP [36] SISU [22] ASDS [40] Aplus [49] SSD-EP

Set14 2 Comic PSNR 24.26 25.31 27.04 25.80 27.03
SSIM 0.7667 0.8112 0.8715 0.8299 0.8764
Time (s) 99.35 2.13 391.03 0.66 449.60
Foreman PSNR 32.13 34.27 35.14 34.90 35.59
SSIM 0.9250 0.9421 0.9352 0.9537 0.9496
Time (s) 82.64 2.03 330.82 0.73 799.82
Lena PSNR 31.90 33.78 34.87 34.60 35.24
SSIM 0.8715 0.8919 0.8909 0.9016 0.9048
Time (s) 240.57 5.25 931.64 1.88 1854.20
Zebra PSNR 27.49 30.42 32.00 31.00 32.13
SSIM 0.8385 0.8849 0.9118 0.8953 0.9141
Time (s) 264.83 4.55 708.07 1.62 989.40
Average PSNR 28.95 30.94 32.26 31.57 32.50
SSIM 0.8504 0.8825 0.9024 0.8951 0.9112
Time (s) 171.85 3.49 590.39 1.22 1023.20

3 Comic PSNR 22.92 24.08 24.22 24.10 24.44
SSIM 0.6860 0.7409 0.7696 0.7583 0.7848
Time (s) 50.39 1.38 330.21 0.47 568.87
Foreman PSNR 30.12 32.85 34.17 33.90 34.51
SSIM 0.8944 0.9236 0.9335 0.9384 0.9373
Time (s) 51.58 1.59 310.00 0.57 716.66
Lena PSNR 30.21 33.13 33.30 33.20 33.45
SSIM 0.8597 0.8910 0.8806 0.8782 0.8830
Time (s) 147.83 4.43 839.29 1.44 1621.30
Zebra PSNR 25.42 28.06 28.51 28.50 28.88
SSIM 0.7662 0.8254 0.8457 0.8390 0.8486
Time (s) 103.09 2.88 771.15 1.28 1277.40
Average PSNR 27.17 29.53 30.05 29.92 30.32
SSIM 0.8016 0.8452 0.8574 0.8535 0.8634
Time (s) 88.22 2.57 560.66 0.94 1046.10

4 Comic PSNR 21.27 22.57 22.33 22.50 22.53
SSIM 0.5731 0.6402 0.6478 0.6536 0.6659
Time (s) 23.27 0.72 252.64 0.47 812.94
Foreman PSNR 27.83 30.90 31.61 32.20 32.63
SSIM 0.8536 0.8892 0.9009 0.9124 0.9151
Time (s) 22.71 0.81 235.97 0.57 643.22
Lena PSNR 27.86 30.70 31.16 31.50 31.52
SSIM 0.7874 0.8300 0.8428 0.8450 0.8488
Time (s) 64.95 2.15 664.85 1.44 1411.60
Zebra PSNR 23.26 25.52 25.06 25.70 25.55
SSIM 0.6665 0.7352 0.7371 0.7458 0.7454
Time (s) 64.36 1.96 630.50 1.28 1512.10
Average PSNR 25.05 27.42 27.54 27.97 28.05
SSIM 0.7201 0.7737 0.7822 0.7892 0.7938
Time (s) 43.82 1.41 445.99 0.94 1095.00
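The Th-based selection of highly varying patches discussed in Section 4.5.1(c) amounts to extracting all overlapping patches and keeping only those whose intensity variance exceeds the threshold. A sketch follows; the function names and the unit step size are our assumptions.

```python
import numpy as np

def extract_patches(img, size=5, step=1):
    # collect all overlapping size x size patches, one vectorized patch per row
    H, W = img.shape
    return np.asarray([img[r:r + size, c:c + size].ravel()
                       for r in range(0, H - size + 1, step)
                       for c in range(0, W - size + 1, step)])

def select_high_variance(patches, th=1.5):
    # discard smooth patches: keep only those whose variance exceeds Th
    return patches[patches.var(axis=1) > th]
```

With Th = 4.5 on rich training images few patches are rejected, while the lower testing Th = 1.5 lets more patches of the low-resolution input participate in the SR process, as argued in the text.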

resolution. Since the images collected for the learning purpose are very rich in detail, very few patches will be rejected due to the predefined threshold value. But the same Th would steer rejection of most of the patches of the testing image, as the resolution of that image is lower. Hence, we have set a Th for the testing phase which is lower than its counterpart in the training phase. This value allows more patches of the testing image to participate in the SR process, which leads to better results, as can be observed in Table 9 in terms of PSNR and SSIM values. It can be examined for most of the cases that the results of testing Th = 1.5 are better than those of 4.5 and 0.1 for noisy as well as noiseless conditions. Hence, we choose testing Th = 1.5, which is lower than the training Th = 4.5.

(d) Performance with respect to γ: The parameter γ plays an important role in assigning weight to the edge preserving term in the optimization cost of Eq. (21). The variation of the performance with respect to γ is shown in Fig. 20. It can be observed for a clean image that the performance (in terms of PSNR values) improves with increasing value of γ up to 0.1, and it saturates after that. On the other hand, for a noisy image, the performance degrades with increasing γ from 0.001, and the degradation is quite high for γ > 1. The reason could be that for a clean image, increasing γ emphasizes the edge preservation and can produce a better result; for a noisy image, the edginess feature will contain some unwanted noise due to the derivative operation. Hence, increasing the weight enhances the noise also, and the performance degrades. Considering this observation, we have selected the value of γ as 0.1 and 0.001 (the first points to produce the best results) for clean images and noisy images, respectively, so as to achieve the best performance.

(e) The size of patch: In order to restore the patches in the sparse representation framework, the patches need to contain few image details; otherwise, they cannot be restored using few training patches. Hence, the size of patches needs to be smaller. Here, we have used patches of size 3×3, 5×5, 7×7,

Table 7
Results of SR for B100 for different scales (σ_n = 0). The numbers within parentheses represent the rank of our approach in comparison with the others.

Dataset Scales Images Metrics RP [36] SISU [22] ASDS [40] Aplus [49] SSD-EP

B100 2 Image1 PSNR 25.48 26.19 26.26 25.30 26.13 (3)
SSIM 0.6858 0.7252 0.7414 0.6927 0.7315 (2)
Time (s) 126.36 3.14 619.11 1.44 1363.60
Image2 PSNR 30.04 31.09 30.96 30.10 30.87 (3)
SSIM 0.8622 0.8828 0.8816 0.8588 0.8841 (1)
Time (s) 104.05 2.92 516.11 1.54 1274.40
Image3 PSNR 28.20 28.99 28.53 27.60 28.65 (2)
SSIM 0.8591 0.8784 0.8599 0.8564 0.8726 (2)
Time (s) 119.82 2.91 501.14 1.48 1199.70
Image4 PSNR 32.18 33.46 33.23 32.50 33.36 (2)
SSIM 0.8949 0.9169 0.9115 0.9036 0.9230 (1)
Time (s) 132.94 3.15 489.60 1.45 1182.40
Average PSNR 28.97 29.94 29.75 28.87 29.75 (2)
SSIM 0.8255 0.8508 0.8486 0.8279 0.8528 (1)
Time (s) 120.79 3.03 531.49 1.48 1255.00

3 Image1 PSNR 23.83 24.69 24.56 24.40 24.66 (2)
SSIM 0.5772 0.6236 0.6368 0.6342 0.6417 (1)
Time (s) 62.40 1.97 582.40 0.96 1338.20
Image2 PSNR 27.81 28.98 29.28 29.20 29.36 (1)
SSIM 0.7996 0.8236 0.8377 0.8273 0.8396 (1)
Time (s) 47.43 1.90 595.57 0.97 1153.80
Image3 PSNR 25.93 26.64 26.65 26.60 26.71 (1)
SSIM 0.7960 0.8182 0.8194 0.8250 0.8227 (2)
Time (s) 61.07 1.70 583.27 0.92 1091.80
Image4 PSNR 29.75 31.15 31.54 31.30 31.55 (1)
SSIM 0.8354 0.8667 0.8827 0.8710 0.8827 (1)
Time (s) 91.98 2.25 589.71 1.00 1052.50
Average PSNR 26.83 27.86 28.01 27.87 28.07 (1)
SSIM 0.7520 0.7830 0.7941 0.7894 0.7967 (1)
Time (s) 65.72 1.95 587.74 0.96 1159.10

4 Image1 PSNR 23.07 24.20 23.41 23.60 23.51 (3)


SSIM 0.5281 0.5809 0.5441 0.5639 0.5525 (3)
Time (s) 41.11 1.51 542.77 0.64 1123.90
Image2 PSNR 26.65 28.25 27.87 28.00 28.05 (2)
SSIM 0.7656 0.7938 0.7880 0.7841 0.7928 (2)
Time (s) 34.56 1.28 545.95 0.65 1043.90
Image3 PSNR 25.33 26.25 25.34 25.50 25.51 (2)
SSIM 0.7692 0.7949 0.7687 0.7800 0.7741 (3)
Time (s) 38.82 1.31 532.41 0.60 995.10
Image4 PSNR 28.67 30.44 29.84 30.00 29.95 (3)
SSIM 0.7879 0.8282 0.7371 0.7458 0.7454 (3)
Time (s) 64.36 1.96 520.04 0.68 970.18
Average PSNR 25.93 27.28 26.62 26.77 26.75 (2)
SSIM 0.7127 0.7495 0.7318 0.7184 0.7371 (2)
Time (s) 39.29 1.34 535.29 0.64 1033.30

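For reference, the PSNR values reported in Tables 7–10 follow the standard peak signal-to-noise ratio definition for 8-bit images, PSNR = 10 log10(255²/MSE). A minimal sketch (the function name and the default peak value of 255 for 8-bit data are our assumptions, not the paper's code):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio (in dB) between a reference image and its estimate."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images: error-free reconstruction
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a unit error at every pixel of an 8-bit image gives about 48.13 dB.
ref = np.zeros((8, 8))
est = ref + 1.0
print(round(psnr(ref, est), 2))  # 48.13
```

Higher PSNR is better; SSIM [59], in contrast, compares local luminance, contrast and structure rather than pixel-wise error, which is why the two metrics can rank methods differently in the tables.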
Fig. 18. The variation of image quality (for ×3) with respect to different numbers of training patches, shown as the variation in (a) PSNR and (b) SSIM.
Table 8
Results (×3) of the proposed approach of SR with different training datasets.

Datasets Metrics Baboon Barbara Bike Butterfly Cameraman Hat Parrot Pentagon Peppers Racoon

Dataset1 [40] PSNR 20.90 24.34 24.58 27.38 24.91 31.16 30.07 26.27 29.19 29.18
SSIM 0.4982 0.7271 0.7987 0.9087 0.8243 0.8769 0.9119 0.7623 0.8648 0.7647
Dataset2 [36] PSNR 20.90 24.36 24.56 27.28 24.95 31.11 30.00 26.23 29.16 29.18
SSIM 0.4986 0.7285 0.7973 0.9066 0.8245 0.8760 0.9112 0.7609 0.8648 0.7651
and 9×9 in learning dictionaries as well as for testing, to examine the effect of patch size on the proposed approach. The average results in terms of PSNR and SSIM values for 5 randomly selected images are tabulated in Table 10. One can find that the proposed approach performs best for the patch size 5×5 in the noiseless as well as the noisy case. Hence, we have chosen 5×5 as the patch size in our implementation.

4.5.2. Computational analysis
Here, the computational complexity of the main steps and the overall computational time of the proposed approach are explained separately for the learning and reconstruction phases. Assuming that the patches are already extracted from the training images, the selection of patches is based on comparing the standard deviation (s.d.) of intensity values with a threshold (Th). Computing the s.d. of the gray values of a √m × √m dimensional patch takes linear time, and the comparison takes constant time. Thus, selection of patches is O(m). Since the computation of the s.d. of a patch is independent of the other patches, it can be done in parallel. Further, the thresholding operation admits only a few patches for learning the dictionary; hence, a reduction in the computational time of the subsequent steps is expected. The running time for calculating the gradient of the pixels of a patch along the horizontal and vertical directions is O(m). The SVD of an m × n matrix requires 4m²n + 8mn² + 9n³ computations [61]. In our case n = 2, thus the SVD requires 8m² + 32m + 72 computations, i.e., O(m²) per selected patch. Edge clustering of a patch requires computation that is linear in time, i.e., O(m) per selected patch. K-means clustering involves O(mKi) operations [62], where m is the dimension of the signal, K is the number of clusters, and i is the number of iterations required for convergence. Since K and i are fixed, the computation of K-means clustering reduces to O(mC), where C = Ki. For finding the directions of maximum variance in a group of data with n patches of m dimensions, PCA requires O(min(m³, n³)) [63]. Here, n > m; hence, O(m³) computations are required for computing the principal components of the clusters. Thus, the overall computational complexity of the learning phase is O(m³), which is dominated by the computation of PCA and SVD.

Typical training time for the proposed approach, using MATLAB 7.12.0 (R2011a) on an Intel Core i7 (3.40 GHz) system with 12 GB of RAM, is about 1000 s for 400,000 patches. However, the same can be learned from 100,000 patches in only 130 s, and can produce comparable results. It has to be noted that the proposed approach does not need to learn dictionaries for different up-sampling factors; the learned dictionaries are invariant to the up-sampling factor. In contrast, approaches like RP [36], SISU [22] and Aplus [49] need to learn different dictionaries for different up-sampling factors.

The reconstruction phase involves selection of patches with higher variance, which is linear in time, i.e., O(m). The next task of estimating the dominant edge orientation is the same as in the training phase and thus requires O(m²L) computations, where L is the number of selected patches. Based on the dominant edge orientation and the statistical information of a patch, an appropriate dictionary is chosen, which is done using O(m) computations; hence, for L patches the complexity becomes O(mL). Estimation of the sparse vector using Eq. (12) requires O(m²) computations [55], dominated by the l1-norm minimization. Thus, for L patches, the computational complexity of the reconstruction phase is O(m²L), dominated by the computation of SVD and l1-norm minimization. From Tables 5–7, it can be found that the proposed approach lags behind in testing time, but it clearly produces better results in most cases compared with the state-of-the-art approaches.

Fig. 19. The cost of fitting training patches into different numbers of clusters.

5. Summary

In this paper, we have proposed a method of learning dictionaries based on structural and statistical information of image patches for SR imaging. The dominant edge orientation reflects the structural information, whereas the statistical information is characterized by the mean of the intensity values of the patches. First, suitable training patches are clustered into six groups according to their structural information. The structurally similar patches of a cluster may still vary in the statistical sense. Hence, to make each cluster of patches structurally as well as statistically similar, each of these clusters is further divided into groups based on the statistical information of the patches. As a result, each final cluster contains patches with similar structural and statistical information. Thus, the dictionaries trained from those clusters represent distinct features of images, which is necessary for efficient sparse recovery. During testing, the learned dictionaries are assigned to the LR patches based on both their structural and statistical information. Moreover, an edge preserving constraint is employed to preserve the continuous gradient information during SR. The overall process helps in preserving the orientation as well as the magnitude of edges during SR. The experimental results validate the importance of structural as well as statistical information of image patches in learning dictionaries for SR imaging.
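The two-stage grouping described in the summary can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's code: the six structural clusters are taken as equal 30° bins of the dominant orientation (estimated, as in the paper, from the SVD of the m × 2 gradient matrix), and the statistical split uses a median threshold on the patch mean as a simple stand-in for the K-means step.

```python
import numpy as np

def dominant_orientation(patch):
    """Dominant edge orientation in degrees, in [0, 180), estimated from the SVD
    of the m x 2 matrix of per-pixel horizontal/vertical gradients."""
    gy, gx = np.gradient(patch.astype(np.float64))
    G = np.stack([gx.ravel(), gy.ravel()], axis=1)   # m x 2, so the SVD is cheap (n = 2)
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    # First right-singular vector points along the dominant gradient direction.
    return float(np.degrees(np.arctan2(vt[0, 1], vt[0, 0])) % 180.0)

def two_stage_cluster(patches, n_orient=6):
    """Stage 1: bin patches into n_orient structural groups by dominant orientation.
    Stage 2: split each group by its median patch mean (statistical information).
    Returns a dict mapping (orientation_bin, stat_bin) -> list of patches."""
    groups = {}
    for p in patches:
        ob = int(dominant_orientation(p) // (180.0 / n_orient)) % n_orient
        groups.setdefault(ob, []).append(p)
    clusters = {}
    for ob, members in groups.items():
        means = np.array([m.mean() for m in members])
        median = np.median(means)
        for patch, mu in zip(members, means):
            clusters.setdefault((ob, int(mu > median)), []).append(patch)
    return clusters

# Example: vertical and horizontal step edges land in different structural bins,
# and a brightness offset moves a patch into a different statistical sub-cluster.
v = np.tile((np.arange(5) >= 3).astype(np.float64), (5, 1))  # 5x5 vertical step edge
h = v.T                                                      # 5x5 horizontal step edge
clusters = two_stage_cluster([v, v + 5.0, h, h + 5.0])
```

In the paper itself, a dictionary is then learned per final cluster; the point of the sketch is only that the cluster key combines an orientation index with a statistical index, so structurally identical patches with different mean intensities end up in different dictionaries.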
Fig. 20. The performance with respect to the edge-preserving weight for up-sampling factor 3.

Table 9
Results (×3) of SR with different testing Th (4.5, 0.1 and 1.5) while keeping the training Th = 4.5. For each of the noise strengths σn = 0 and σn = 5, PSNR and SSIM are reported for the test images Baboon, Barbara, Bike, Butterfly, Cameraman, Hat, Parrot, Pentagon, Peppers and Racoon.

Table 10
Results (×3) of the proposed approach of SR with different patch sizes.

Noise strength Metrics 3×3 5×5 7×7 9×9

σn = 0 PSNR 28.31 28.48 28.38 28.27
SSIM 0.8634 0.8722 0.8697 0.8663
σn = 5 PSNR 26.64 26.91 26.74 26.61
SSIM 0.7910 0.8100 0.8045 0.8014

References

[1] S.C. Park, M.K. Park, M.G. Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Process. Mag. 20 (3) (2003) 21–36, http://dx.doi.org/10.1109/MSP.2003.1203207.
[2] R. Keys, Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust. Speech Signal Process. 29 (6) (1981) 1153–1160, http://dx.doi.org/10.1109/TASSP.1981.1163711.
[3] H. Hou, H. Andrews, Cubic splines for image interpolation and digital filtering, IEEE Trans. Acoust. Speech Signal Process. 26 (6) (1978) 508–517, http://dx.doi.org/10.1109/TASSP.1978.1163154.
[4] M. Irani, S. Peleg, Improving resolution by image registration, CVGIP: Graph. Models Image Process. 53 (3) (1991) 231–239, http://dx.doi.org/10.1016/1049-9652(91)90045-L.
[5] S. Farsiu, M. Robinson, M. Elad, P. Milanfar, Fast and robust multiframe super resolution, IEEE Trans. Image Process. 13 (10) (2004) 1327–1344, http://dx.doi.org/10.1109/TIP.2004.834669.
[6] W. Freeman, T. Jones, E. Pasztor, Example-based super-resolution, IEEE Comput. Graph. Appl. 22 (2) (2002) 56–65, http://dx.doi.org/10.1109/38.988747.
[7] K. Zhang, X. Gao, D. Tao, X. Li, Single image super-resolution with non-local means and steering kernel regression, IEEE Trans. Image Process. 21 (11) (2012) 4544–4556, http://dx.doi.org/10.1109/TIP.2012.2208977.
[8] N. Efrat, D. Glasner, A. Apartsin, B. Nadler, A. Levin, Accurate blur models vs. image priors in single image super-resolution, in: IEEE International Conference on Computer Vision (ICCV), 2013, pp. 2832–2839. http://dx.doi.org/10.1109/ICCV.2013.352.
[9] Q. Yan, Y. Xu, X. Yang, T. Nguyen, Single image superresolution based on gradient profile sharpness, IEEE Trans. Image Process. 24 (10) (2015) 3187–3202, http://dx.doi.org/10.1109/TIP.2015.2414877.
[10] K. Zhang, D. Tao, X. Gao, X. Li, Z. Xiong, Learning multiple linear mappings for efficient single image super-resolution, IEEE Trans. Image Process. 24 (3) (2015) 846–861, http://dx.doi.org/10.1109/TIP.2015.2389629.
[11] J.B. Huang, A. Singh, N. Ahuja, Single image super-resolution from transformed self-exemplars, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5197–5206. http://dx.doi.org/10.1109/CVPR.2015.7299156.
[12] R. Timofte, R. Rothe, L.J.V. Gool, Seven ways to improve example-based single image super resolution, CoRR abs/1511.02228.
[13] R. Timofte, V. De Smet, L. Van Gool, Semantic super-resolution: when and where is it useful?, Comput. Vis. Image Underst. 142 (2015) 1–12.
[14] C.-Y. Yang, C. Ma, M.-H. Yang, Single-image super-resolution: a benchmark, in: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part IV, Springer International Publishing, Cham, 2014, pp. 372–386. http://dx.doi.org/10.1007/978-3-319-10593-2_25.
[15] X. Zhang, E. Lam, E. Wu, K. Wong, Application of Tikhonov regularization to super-resolution reconstruction of brain MRI images, in: Medical Imaging and Informatics, Lecture Notes in Computer Science, vol. 4987, Springer, Berlin, Heidelberg, 2008, pp. 51–56. http://dx.doi.org/10.1007/978-3-540-79490-5_8.
[16] L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenom. 60 (1–4) (1992) 259–268, http://dx.doi.org/10.1016/0167-2789(92)90242-F.
[17] A. Marquina, S. Osher, Image super-resolution by TV-regularization and Bregman iteration, J. Sci. Comput. 37 (3) (2008) 367–382, http://dx.doi.org/10.1007/s10915-008-9214-8.
[18] Q. Yuan, L. Zhang, H. Shen, Multiframe super-resolution employing a spatially weighted total variation model, IEEE Trans. Circuits Syst. Video Technol. 22 (3) (2012) 379–392, http://dx.doi.org/10.1109/TCSVT.2011.2163447.
[19] L. Zhang, H. Zhang, H. Shen, P. Li, A super-resolution reconstruction algorithm for surveillance images, Signal Process. 90 (3) (2010) 848–859, http://dx.doi.org/10.1016/j.sigpro.2009.09.002.
[20] A. Kanemura, S.-i. Maeda, S. Ishii, Superresolution with compound Markov random fields via the variational EM algorithm, Neural Netw. 22 (7) (2009) 1025–1034, http://dx.doi.org/10.1016/j.neunet.2008.12.005.
[21] D.L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inf. Theory 41 (3) (1995) 613–627, http://dx.doi.org/10.1109/18.382009.
[22] R. Zeyde, M. Elad, M. Protter, On single image scale-up using sparse-representations, in: Curves and Surfaces, vol. 6920, Springer, Berlin, Heidelberg, 2012, pp. 711–730. http://dx.doi.org/10.1007/978-3-642-27413-8_47.
[23] J. Yang, J. Wright, T. Huang, Y. Ma, Image super-resolution as sparse representation of raw image patches, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8. http://dx.doi.org/10.1109/CVPR.2008.4587647.
[24] W. Dong, L. Zhang, G. Shi, X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838–1857, http://dx.doi.org/10.1109/TIP.2011.2108306.
[25] S. Mandal, A. Sao, Edge preserving single image super resolution in sparse environment, in: 20th IEEE International Conference on Image Processing (ICIP), 2013, pp. 967–971. http://dx.doi.org/10.1109/ICIP.2013.6738200.
[26] Z. Zhang, Y. Xu, J. Yang, X. Li, D. Zhang, A survey of sparse representation: algorithms and applications, IEEE Access 3 (2015) 490–530, http://dx.doi.org/10.1109/ACCESS.2015.2430359.
[27] S. Ravishankar, B. Wen, Y. Bresler, Online sparsifying transform learning; Part I: Algorithms, IEEE J. Sel. Top. Signal Process. 9 (4) (2015) 625–636, http://dx.doi.org/10.1109/JSTSP.2015.2417131.
[28] S. Ravishankar, B. Wen, Y. Bresler, Online sparsifying transform learning; Part II: Convergence analysis, IEEE J. Sel. Top. Signal Process. 9 (4) (2015) 637–646, http://dx.doi.org/10.1109/JSTSP.2015.2407860.
[29] J.J. Thiagarajan, K.N. Ramamurthy, P. Turaga, A. Spanias, Image Understanding Using Sparse Representations, Morgan & Claypool, 2014. http://dx.doi.org/10.2200/S00563ED1V01Y201401IVM015.
[30] S. Mallat, G. Yu, Super-resolution with sparse mixing estimators, IEEE Trans. Image Process. 19 (11) (2010) 2889–2900, http://dx.doi.org/10.1109/TIP.2010.2049927.
[31] H. Chavez, V. Gonzalez, A. Hernandez, V. Ponomaryov, Super resolution imaging via sparse interpolation in wavelet domain with implementation in DSP and GPU, in: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 19th Iberoamerican Congress, CIARP 2014, Puerto Vallarta, Mexico, November 2–5, 2014, Proceedings, Springer International Publishing, Cham, 2014, pp. 973–981. http://dx.doi.org/10.1007/978-3-319-12568-8_118.
[32] S. Mandal, S. Thavalengal, A.K. Sao, Explicit and implicit employment of edge-related information in super-resolving distant faces for recognition, Pattern Anal. Appl. 19 (3) (2016) 867–884, http://dx.doi.org/10.1007/s10044-015-0512-0.
[33] V. Abrol, P. Sharma, A.K. Sao, Greedy dictionary learning for kernel sparse representation based classifier, Pattern Recognit. Lett. 78 (2016) 64–69, http://dx.doi.org/10.1016/j.patrec.2016.04.014.
[34] D.L. Donoho, For most large underdetermined systems of equations, the minimal l1-norm near-solution approximates the sparsest near-solution, Commun. Pure Appl. Math. 59 (7) (2006) 907–934, http://dx.doi.org/10.1002/cpa.20131.
[35] R. Rubinstein, A. Bruckstein, M. Elad, Dictionaries for sparse representation modeling, Proc. IEEE 98 (6) (2010) 1045–1057, http://dx.doi.org/10.1109/JPROC.2010.2040551.
[36] J. Yang, J. Wright, T. Huang, Y. Ma, Image super-resolution as sparse representation of raw image patches, in: IEEE Conference on Computer Vision and Pattern Recognition, June 2008, pp. 1–8. http://dx.doi.org/10.1109/CVPR.2008.4587647.
[37] K. Engan, S. Aase, J. Hakon Husoy, Method of optimal directions for frame design, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, 1999, pp. 2443–2446. http://dx.doi.org/10.1109/ICASSP.1999.760624.
[38] M. Aharon, M. Elad, A. Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Process. 54 (11) (2006) 4311–4322, http://dx.doi.org/10.1109/TSP.2006.881199.
[39] R. Vidal, Y. Ma, S. Sastry, Generalized principal component analysis (GPCA), in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, June 2003, pp. I-621–I-628. http://dx.doi.org/10.1109/CVPR.2003.1211411.
[40] W. Dong, L. Zhang, G. Shi, X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838–1857, http://dx.doi.org/10.1109/TIP.2011.2108306.
[41] S. Yang, M. Wang, Y. Chen, Y. Sun, Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding, IEEE Trans. Image Process. 21 (9) (2012) 4016–4028, http://dx.doi.org/10.1109/TIP.2012.2201491.
[42] M.R. Azimi-Sadjadi, J. Kopacz, N. Klausner, K-SVD dictionary learning using a fast OMP with applications, in: 2014 IEEE International Conference on Image Processing (ICIP), 2014, pp. 1599–1603. http://dx.doi.org/10.1109/ICIP.2014.7025320.
[43] S. Miura, Y. Kawamoto, S. Suzuki, T. Goto, S. Hirano, M. Sakurai, Image quality improvement for learning-based super-resolution with PCA, in: The 1st IEEE Global Conference on Consumer Electronics 2012, 2012, pp. 572–573. http://dx.doi.org/10.1109/GCCE.2012.6379917.
[44] J. Yang, J. Wright, T. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Trans. Image Process. 19 (11) (2010) 2861–2873, http://dx.doi.org/10.1109/TIP.2010.2050625.
[45] R. Timofte, V. De, L. Van Gool, Anchored neighborhood regression for fast example-based super-resolution, in: IEEE International Conference on Computer Vision (ICCV), 2013, pp. 1920–1927. http://dx.doi.org/10.1109/ICCV.2013.241.
[46] C.-Y. Yang, M.-H. Yang, Fast direct super-resolution by simple functions, in: IEEE International Conference on Computer Vision (ICCV), 2013, pp. 561–568. http://dx.doi.org/10.1109/ICCV.2013.75.
[47] H. Chang, D.-Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2004. http://dx.doi.org/10.1109/CVPR.2004.1315043.
[48] C. Dong, C. Loy, K. He, X. Tang, Learning a deep convolutional network for image super-resolution, in: Computer Vision – ECCV 2014, Lecture Notes in Computer Science, vol. 8692, Springer International Publishing, Cham, 2014, pp. 184–199. http://dx.doi.org/10.1007/978-3-319-10593-2_13.
[49] R. Timofte, V. De Smet, L. Van Gool, A+: adjusted anchored neighborhood regression for fast super-resolution, in: Computer Vision – ACCV 2014, Lecture Notes in Computer Science, vol. 9006, Springer International Publishing, Cham, 2015, pp. 111–126. http://dx.doi.org/10.1007/978-3-319-16817-3_8.
[50] X. Li, H. He, R. Wang, D. Tao, Single image superresolution via directional group sparsity and directional features, IEEE Trans. Image Process. 24 (9) (2015) 2874–2888, http://dx.doi.org/10.1109/TIP.2015.2432713.
[51] Y. Zhang, J. Liu, W. Yang, Z. Guo, Image super-resolution based on structure-modulated sparse representation, IEEE Trans. Image Process. 24 (9) (2015) 2797–2810, http://dx.doi.org/10.1109/TIP.2015.2431435.
[52] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, New York, Dordrecht, Heidelberg, London, 2010.
[53] A.K. Sao, B. Yegnanarayana, B. Vijaya Kumar, Significance of image representation for face verification, Signal Image Video Process. 1 (2007) 225–237, http://dx.doi.org/10.1007/s11760-007-0016-5.
[54] X. Feng, P. Milanfar, Multiscale principal components analysis for image local orientation estimation, in: Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, vol. 1, 2002, pp. 478–482. http://dx.doi.org/10.1109/ACSSC.2002.1197228.
[55] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57 (11) (2004) 1413–1457, http://dx.doi.org/10.1002/cpa.20042.
[56] H. Chavez-Roman, V. Ponomaryov, Super resolution image generation using wavelet domain interpolation with edge extraction via a sparse representation, IEEE Geosci. Remote Sens. Lett. 11 (10) (2014) 1777–1781, http://dx.doi.org/10.1109/LGRS.2014.2308905.
[57] V.I. Ponomaryov, H. Chavez-Roman, V. Gonzalez-Huitron, Image resolution enhancement using edge extraction and sparse representation in wavelet domain for real-time application, Proc. SPIE 9139, Real-Time Image and Video Processing, 2014, 91390H. http://dx.doi.org/10.1117/12.2051552.
[58] A. Hore, D. Ziou, Image quality metrics: PSNR vs. SSIM, in: 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 2366–2369. http://dx.doi.org/10.1109/ICPR.2010.579.
[59] Z. Wang, A. Bovik, H. Sheikh, E. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612, http://dx.doi.org/10.1109/TIP.2003.819861.
[60] S. Yang, Z. Liu, M. Wang, F. Sun, L. Jiao, Multitask dictionary learning and sparse representation based single-image super-resolution reconstruction, Neurocomputing 74 (17) (2011) 3193–3203, http://dx.doi.org/10.1016/j.neucom.2011.04.014.
[61] T.K. Moon, W.C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, Upper Saddle River, NJ, 2000.
[62] G. Hamerly, J. Drake, Accelerating Lloyd's algorithm for K-means clustering, in: Partitional Clustering Algorithms, Springer International Publishing, 2015, pp. 41–78. http://dx.doi.org/10.1007/978-3-319-09259-1_2.
[63] I.M. Johnstone, A.Y. Lu, Sparse principal components analysis, arXiv e-prints, arXiv:0901.4392.