Article history: Received 18 February 2016; received in revised form 23 August 2016; accepted 23 August 2016; available online 25 August 2016.

Keywords: Sparse representation; Dictionary; Edge orientation; Clustering; Edge preserving constraint; Super-resolution.

Abstract

It has been argued that structural information plays a significant role in the perceptual quality of images, but the importance of statistical information cannot be neglected. In this work, we have proposed an approach which explores both structural and statistical information of image patches to learn multiple dictionaries for super-resolving an image in the sparse domain. Structural information is estimated using the dominant edge orientation, and the mean of the intensity levels of an image patch is used to represent statistical information. During reconstruction, a low resolution test patch is inspected for its structural as well as statistical information to choose a suitable dictionary. This helps in preserving the orientation of edges during the super-resolution process. Results are further improved by adding an edge (magnitude of edge) preserving constraint, which maintains the edge continuity of the super-resolved image with the input low resolution image. Thus, both characteristics of an edge, i.e., orientation and magnitude, are preserved in our proposed approach. The experimental results demonstrate the usefulness of the proposed approach in comparison to state-of-the-art approaches.

© 2016 Elsevier B.V. All rights reserved.

Corresponding author. E-mail addresses: srimanta_mandal@students.iitmandi.ac.in (S. Mandal), anil@iitmandi.ac.in (A.K. Sao).
http://dx.doi.org/10.1016/j.image.2016.08.006
1. Introduction

The proliferation of image related applications like medical imaging, remote sensing and surveillance has led to an increase in the demand for high resolution (HR) images. But often this is not feasible due to certain limitations of the imaging environment (limited shutter speed, out-of-focus lens, low-resolution sensors and so on). Reducing the pixel size in the sensor or increasing the chip size may increase pixel density, which results in an HR image. But reducing the pixel size below a certain limit introduces shot noise in the image, and increasing the area of the chip results in higher capacitance [1]. Moreover, these solutions introduce problems like higher storage requirement, higher bandwidth requirement for transferring an image and so on. Thus, we search for techniques which can increase the resolution of the image captured by low resolution (LR) sensors. Interpolation techniques such as nearest neighbor, bi-linear, and bi-cubic [2,3] can serve the purpose up to some extent. The key idea of these methods is to fill up the missing pixels by a weighted average of the neighboring pixels, which produces smooth images. To address this issue, super-resolution (SR) techniques have evolved.

The requirement on the number of LR images of the target scene divides the existing SR approaches into two classes: (i) multiple image SR [4,5], and (ii) single image SR [6–14]. In the former class, the information provided by multiple sub-pixel shifted LR images of the scene is fused together to derive the HR image of the target scene. Thus the quality of the super-resolved image depends on the number of sub-pixel shifted images, which is the main bottleneck of this approach.

On the contrary, an approach belonging to the latter class does not require multiple images of the same scene and is more suitable for practical scenarios. Generally, these techniques need some example HR images to borrow the relevant information for super-resolving the LR input image.

The problem of single image SR can be addressed by modeling the formation process of the LR image, where an LR image \(\mathbf{y} \in \mathbb{R}^{N}\) is produced by blurring followed by down-sampling the original HR scene \(\mathbf{x} \in \mathbb{R}^{M}\) (\(N \ll M\)). It can be expressed mathematically as

\[ \mathbf{y} = \mathbf{S}\mathbf{B}\mathbf{x} + \mathbf{n}, \quad (1) \]

where \(\mathbf{S} \in \mathbb{R}^{N \times M}\) is the down-sampling operator, \(\mathbf{B} \in \mathbb{R}^{M \times M}\) is the blurring operator and \(\mathbf{n} \in \mathbb{R}^{N}\) denotes the additive noise. SR seeks an approximation of x from y. Since \(N \ll M\), S and B together form a rectangular matrix with more columns than rows. Thus, there could be many x which can produce the same y, and the problem of deriving a unique x becomes ill-posed. It is addressed by regularizing the inverse problem based on some prior knowledge.
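To make the formation model of Eq. (1) concrete, the following sketch simulates an LR observation from an HR image. The paper does not fix particular blur or decimation operators, so the Gaussian blur, the decimation scheme, and all parameter values here are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(x, scale=3, blur_sigma=1.2, noise_sigma=0.0, rng=None):
    """Toy version of Eq. (1): y = SBx + n.

    B is modeled as a Gaussian blur and S as plain decimation;
    both choices (and the parameter values) are assumptions of
    this sketch, not the operators used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    blurred = gaussian_filter(x.astype(float), sigma=blur_sigma)  # B x
    y = blurred[::scale, ::scale]                                 # S B x
    if noise_sigma > 0:
        y = y + rng.normal(0.0, noise_sigma, y.shape)             # + n
    return y
```

For an M-pixel x, y has roughly M/scale² pixels, which is exactly the N ≪ M situation that makes the inversion ill-posed.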
For instance, Tikhonov regularization [15] assumes smoothness as prior knowledge, due to the prevailing nature of this characteristic in natural images. Nevertheless, this assumption leads to smoothing of image details such as edges, textures and corners. Total variation (TV) regularization addresses the issue up to some extent by minimizing the \(\ell_1\)-norm of the gradient of the image [16,17]. But the piecewise-constant-like structure of TV regularization tends to generate staircase artifacts in the processed image. Several regularization techniques for preserving image details have been reported [18–20].

The work in reference [21] shows that the sparse nature of wavelet coefficients is an important prior knowledge in the case of image restoration. This has opened up the possibility of restoring an image in the sparse domain, where an image x can be represented as a linear combination of a few columns of an over-complete matrix A, defined as a dictionary [22–31,32,33]. Thus, if we have an appropriate coefficient vector c to weight the columns of the dictionary matrix A, the image can be restored back as \(\mathbf{x} = \mathbf{A}\mathbf{c}\). Here, the prior knowledge is the sparse nature of the coefficient vector c, i.e., it will have few non-zero entries. Hence, the ill-posed problem of SR (Eq. (1)) can be re-framed as finding the coefficient vector with sparsity priors as:

\[ \hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \|\mathbf{c}\|_0 \quad \text{s.t.} \quad \|\mathbf{y} - \mathbf{S}\mathbf{B}\mathbf{A}\mathbf{c}\|_2^2 \leq \epsilon, \quad (2) \]
where the first term denotes the \(\ell_0\)-norm of the coefficient vector c, which gives the number of non-zero elements of the vector. The second term is the data term, which makes sure that the output SR result is consistent with the input LR image; \(\epsilon\) in the second term is related to the power of the noise n. Solving Eq. (2) involves combinatorial search, which is NP-hard. It has been found that \(\ell_1\)-norm minimization is the closest convex approximation of \(\ell_0\)-norm minimization [34]. Thus, the above constrained problem can be re-written with a Lagrangian multiplier (\(\lambda\)) as

\[ \hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \left\{ \|\mathbf{y} - \mathbf{S}\mathbf{B}\mathbf{A}\mathbf{c}\|_2^2 + \lambda\|\mathbf{c}\|_1 \right\}. \quad (3) \]
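Eq. (3) is an unconstrained \(\ell_1\)-regularized least-squares problem. As a rough illustration of how such problems are handled by iterative thresholding (the family of solvers the paper adopts later, following [55]), here is a minimal ISTA sketch; the matrix D stands in for the combined operator SBA, and the step size and iteration count are assumptions, not the paper's settings.

```python
import numpy as np

def ista(y, D, lam=0.1, n_iter=200):
    """Minimal ISTA sketch for Eq. (3):
    min_c ||y - D c||_2^2 + lam * ||c||_1, with D standing in for SBA."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # 1/L, L = Lipschitz const.
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ c - y)               # gradient of ||y - Dc||^2
        z = c - step * grad                          # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft-threshold
    return c
```

The soft-thresholding step is what produces the few non-zero entries that the sparsity prior demands.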
The efficiency of SR based on a sparsity prior is very much affected by the choice of a suitable dictionary. It can have an analytical form, which is computationally fast, or it can be learned from the data itself, which is proven to be computationally expensive. Due to their adaptive nature, learned dictionaries are shown to give efficient sparse representations as compared to their analytical counterparts [35], leading to better SR results. Several techniques of learning a dictionary have been documented in the literature [36–42]. Most of them consider patches as the unit instead of the entire image to faithfully reconstruct the image. Some of them learn a single dictionary (a.k.a. universal dictionary) [36–39] to compute the sparse representation of all the patches of a given image. But these may not be appropriate to represent the variations present in the image. The works proposed in references [40,41] have tried to address this issue by learning multiple dictionaries. But these approaches consider either statistical information or structural information of image patches to learn the dictionary(s). None of them has explored the complementary nature of both types of information.

In this paper, we have proposed an approach of learning multiple dictionaries for sparse representation by considering both the structural information and the statistical information of image patches as cues. Structural information yields the structure of image patches and is strongly related to the geometric details of the patch. In this work, the geometric details are characterized by the dominant edge orientation of the patch, whereas the mean and variance of the gray levels of a patch are associated with its statistical information. Thus the two quantities are complementary in nature, as the mean and variance do not carry information about pixel locations. On the contrary, the dominant edge orientation of the patch does convey the spatial distribution of intensity values.

Training image patches are first clustered according to their structural information. A cluster of patches having similar structures (which we denote as an edge-cluster) may vary in statistical information, i.e., intensities may vary across different patches within an edge-cluster. Thus each of the edge-clusters is further clustered using K-means clustering, such that patches having similar structural information and similar statistical information are grouped together. Principal components of each cluster are estimated to learn a compact dictionary, as principal component analysis is proved to be effective in SR reconstruction [40,43]. During super-resolution, a particular dictionary is assigned to a test patch according to its structural as well as its statistical information. Furthermore, the performance of SR is improved by incorporating an edge preserving constraint [25], which maintains edge continuity of the SR result with the input LR image. Being a vector quantity, an edge has magnitude as well as direction; hence both of these parameters have to be taken care of while preserving edges. Our prior work [25] has preserved edges in different directions separately, whereas in the current work the magnitude of the edge is preserved using the edge-preserving constraint, and the direction of the edge is preserved by the proposed dictionary learning and selection method.

The remainder of this paper is organized as follows: Section 2 compares the proposed work with some of the existing relevant works on SR. The proposed approach of learning dictionaries along with edge preservation in SR is described in Section 3. Experimental results are demonstrated in Section 4, and Section 5 summarizes the paper.

2. Related works and contributions

Dictionary learning techniques have been used extensively in example-based single image SR and can be broadly classified into sparse representation based approaches [36,44,22,41,40] and non-sparse representation based approaches [45–49]. Non-sparse representation based approaches either learn dictionaries explicitly [49,45] or implicitly via the hidden layers of a convolutional neural network [47]. Here, approaches related to sparse representations are explained in brief, as the proposed approach belongs to this category.

SR using a sparse domain representation is not new; many sophisticated approaches incorporating the sparse domain framework have been published [36,44,22,41,40,50,51]. Some of them consider a universal over-complete dictionary [36,22] and some meditate on learning multiple dictionaries [41,40]. In references [36,44], Yang et al. have learned the relationship between LR and HR image patches by considering two dictionaries. One dictionary \(\mathbf{A}_h\) has been learned from the HR raw image patches, and another dictionary \(\mathbf{A}_l\) has been learned from the corresponding LR image patches. The basic idea is to derive the sparse coefficient vector c using the LR dictionary \(\mathbf{A}_l\), and later the same coefficient vector is used for choosing the corresponding HR image patches from the dictionary \(\mathbf{A}_h\). The linear relationship between the two types of dictionaries may pose an issue:

\[ \mathbf{A}_l = \mathbf{S}\mathbf{B}\mathbf{A}_h, \quad (4) \]

which can be derived from \(\mathbf{y} = \mathbf{S}\mathbf{B}\mathbf{x}\) as

\[ \mathbf{A}_l\mathbf{c} = \mathbf{S}\mathbf{B}\mathbf{A}_h\mathbf{c}. \quad (5) \]

Let us say that the dictionary \(\mathbf{A}_h\) has small mutual coherence, which is one of the important characteristics of efficient dictionaries for sparse representation.
Fig. 1. Significance of structural and statistical information: (a) patches within a cluster are statistically similar (means of intensity values are similar) but structurally different (geometrically dominant edge orientation); and (b) patches within a cluster are structurally similar but statistically different.

Fig. 2. The residual component [41]: (a) the input LR image, (b) the LR image generated from the output HR image and (c) the residual component.

But this does not guarantee that the same will be true for the dictionary \(\mathbf{A}_l\), because of the multiplication by the rectangular matrix SB. This may increase the proximity of the dictionary atoms, and as a result we may end up with a coefficient vector c which is not suitable for reconstructing the HR image patch [52].
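The mutual coherence argument above can be checked numerically. The sketch below computes the standard mutual coherence of a dictionary (the largest normalized inner product between distinct atoms) and compares a hypothetical HR dictionary with its projection through a random stand-in for SB; the matrices, sizes and the random construction are illustrative assumptions, not the paper's data.

```python
import numpy as np

def mutual_coherence(A):
    """Mutual coherence: max absolute normalized inner product
    between distinct columns (atoms) of A."""
    G = A / np.linalg.norm(A, axis=0, keepdims=True)   # unit-norm atoms
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)                        # ignore self-correlation
    return gram.max()

# Illustrative check: pushing atoms through a fat "SB"-like matrix
# tends to raise their coherence (exact values are example-dependent).
rng = np.random.default_rng(0)
Ah = rng.standard_normal((64, 256))     # hypothetical HR dictionary
SB = rng.standard_normal((16, 64))      # stand-in for the S*B operator
print(mutual_coherence(Ah), mutual_coherence(SB @ Ah))
```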
Similarly, the work in reference [22] also exploits the relationship between the LR and HR image patches. But that work follows the K-SVD dictionary learning approach [38,22], instead of using a cluster of raw patches as the dictionary. In addition, both of these approaches consider universal over-complete dictionaries, which may not be appropriate for representing all variations of image structures.

The work in reference [40] attempts to address the issue of a universal dictionary by learning several sub-dictionaries from the HR training patches. It uses the K-means clustering approach to cluster different patches, but it does not consider the orientation of edges, which reflects the structural information, an important characteristic of images. This is illustrated in Fig. 1(a), where a test image patch having a dominant edge orientation of 135° (anticlockwise from the x-axis) can be assigned any, or a linear combination, of the patches having dominant edge orientation 45° or 135°. This is because both patches belong to the same cluster due to their similarity in the mean of intensity values. But ideally it should be assigned the patch having orientation 135°. Thus, faithful reconstruction of the patch may not be possible.

On the contrary, the work in reference [41] learns multiple dictionaries from clusters of patches congregated by structural information. This approach has some similarity with the proposed approach, with notable differences as follows:

(i) Reference [41] clusters the training patches into eight groups by considering only the structural information, but has neglected the statistical information of the patches. Thus, the dictionary contains structurally similar but statistically different patches, and a linear combination of them may not reconstruct the target patch faithfully, as can be seen in Fig. 1(b). On the contrary, both statistical and structural information, which are complementary in nature, are used in the proposed work of learning dictionaries for better SR reconstruction.

(ii) In order to preserve the detail information, reference [41] adds the super-resolved residual component, computed as the difference between the original LR image and the LR image synthesized from the reconstructed HR counterpart. It can be observed from Fig. 2 (taken from reference [41]) that the residual contains edge information along with noise. If the super-resolved residual is added to the reconstructed HR image, the noise-like structure may degrade the resultant HR image. Unlike reference [41], we have added a simple edge preserving constraint to preserve the edges of the input image in the super-resolved image. This constraint has been proposed in our prior work [25], where we consider continuous gradient information, known as edginess [53], as the feature to preserve. One can observe that the edginess of an image (Fig. 7(d)) has less noise-like structure as compared to the residual of the image shown in Fig. 2(c). Thus the proposed work preserves the edges better as compared to the work [41].

(iii) Reference [41] learns the dictionary separately for HR image patches and LR image patches to learn the relationship between HR and LR images, as done in references [36,22]. As explained earlier, the linear mapping from the HR dictionary to the LR dictionary may increase the mutual coherence of the latter, and thus it may not ensure sparse recovery of the signal. On the contrary, the proposed approach learns the dictionary from HR image patches only (employed in reference [40] also); thus the assumption of a linear relationship is avoided. This is an improvement over the traditional sparse representation based approaches [36,22,41]: as the objective of SR is to restore the HR image, it is more logical to use a sparse coefficient vector computed using a dictionary made up of HR image patches than one computed using LR patches.

(iv) Reference [41] considers a separate cluster for smooth patches. This may not be necessary because the human visual system (HVS) is relatively less sensitive to smoother patches, and hence they can be up-sampled directly with simple interpolation approaches. In the proposed approach, only patches containing crucial information are considered for SR; hence, the computational overhead is reduced.

Thus, the contributions of our work can be summarized as: (i) We propose a dictionary learning approach which considers structural as well as statistical information of image patches. During reconstruction, a dictionary will be assigned to a target patch by checking its structural and statistical information. (ii) The proposed approach considers only suitable patches (which are not smooth) to learn dictionaries as well as to reconstruct the image. (iii) Our approach considers a simple yet effective edge preserving constraint, which preserves the edges of the image during SR. (iv) Extensive experimental analysis is performed to validate the approach under noiseless as well as noisy conditions.

3. Proposed approach

The block diagram of the proposed approach, shown in Fig. 3, can be divided into two phases: (i) the learning phase and (ii) the reconstruction phase. These phases are discussed in greater detail in the following sub-sections. The first phase deals with the learning of several dictionaries using structural as well as statistical information of patches. This is due to the fact that the two contain complementary information, as demonstrated in Fig. 1. The super-resolution process is discussed in the reconstruction phase, where an HR image patch is derived by assigning a suitable dictionary to the target LR patch. Further, an edge preserving constraint is added to preserve the continuous gradient magnitude. This constraint helps in regularizing the SR problem and will be discussed in Section 3.3.
Fig. 3. The proposed approach: the learning phase, where Dictroids (dictionary centroids) are learned, and the reconstruction phase, where an LR test image is super-resolved.
3.1. Learning phase

Several images rich in content are collected for training purposes, and patches of size \(\sqrt{m} \times \sqrt{m}\) are extracted from them using a patch extractor matrix \(\mathbf{P}_i\). Let \(\mathbf{t}_i\) denote the patch extracted from a training image t using the relation \(\mathbf{t}_i = \mathbf{P}_i\mathbf{t}\). Since the HVS is less sensitive to smoother regions, we neglect the smoother patches by comparing the standard deviation (s.d.) of the gray levels of the patches with a predefined threshold value (Th).

The selected patches are then distributed into six clusters (\(R_{0}\), \(R_{30}\), \(R_{60}\), \(R_{90}\), \(R_{120}\) and \(R_{150}\)) using the process of edge clustering¹, depending on their structural information, represented by the dominant orientation of edges. Here, \(R_\theta\) represents a cluster of patches having a dominant orientation of edges near about \(\theta\), where \(\theta\) is chosen to cover the entire 360° with an interval of 30°, i.e., \(\theta \in \{0°, 30°, 60°, 90°, 120°, 150°\}\). Generally, the computation of edge orientation is performed by a gradient operator (derivative operator), which is very sensitive to noise. Thus, a method of estimating the dominant edge orientation is required which is less sensitive to noise. The work in reference [54] shows a robust method for estimating the dominant edge orientation of a patch using SVD, which is followed in the proposed approach. Firstly, the gradient field of a patch \(\mathbf{t}_i\) is calculated as

\[ \mathbf{T}_i = [\mathbf{t}_{i,h}, \mathbf{t}_{i,v}] = \left[ \frac{\partial \mathbf{t}_i}{\partial h}, \frac{\partial \mathbf{t}_i}{\partial v} \right], \quad (6) \]

where h and v represent the horizontal and vertical axes, respectively. \(\mathbf{t}_{i,h}\) and \(\mathbf{t}_{i,v}\) are both of dimension \(\sqrt{m} \times \sqrt{m}\) and are arranged in columnized mode in the matrix \(\mathbf{T} \in \mathbb{R}^{m \times 2}\). It is decomposed as \(\mathbf{T} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}\), where \(\boldsymbol{\Sigma} \in \mathbb{R}^{m \times 2}\) is a diagonal matrix containing the singular values of T. The columns of the square matrices \(\mathbf{U} \in \mathbb{R}^{m \times m}\) and \(\mathbf{V} \in \mathbb{R}^{2 \times 2}\) represent the left and right singular vectors of T, which are orthonormal in nature. Since the gradient field is decomposed into singular vectors and singular values, the first column of V (denoted as \(\mathbf{v}_1\)) will give the dominant orientation of the gradient field. The specific angle of orientation can be derived by

\[ \tilde{\theta} = \arctan\left( \frac{\mathbf{v}_1(2)}{\mathbf{v}_1(1)} \right). \quad (7) \]

Thus, if we add 90° to \(\tilde{\theta}\), the angle of the dominant orientation of edges of a patch is obtained. If \(\tilde{\theta} + 90° \in (\theta - 15°, \theta + 15°]\), that particular patch will be clustered into \(R_\theta\). The above approach is repeated for all the patches to generate the edge clusters \(R_\theta\).

Since each cluster \(R_\theta\) has been formed using the structural information of patches, the patches might differ in the statistical sense, i.e., intensity values could vary largely over a cluster. To make each cluster unique in the structural as well as the statistical sense, we further divide each \(R_\theta\) into K clusters. We initialize a codebook consisting of K random values \((\mathbf{r}_{1,\theta}, \mathbf{r}_{2,\theta}, \ldots, \mathbf{r}_{K,\theta})\). If a patch \(\mathbf{t}_{i,\theta}\) from the edge cluster \(R_\theta\) follows the condition

\[ R_{k,\theta} = \left\{ i \mid \forall l \neq k, \ \|\mathbf{t}_{i,\theta} - \mathbf{r}_{k,\theta}\|_2 < \|\mathbf{t}_{i,\theta} - \mathbf{r}_{l,\theta}\|_2 \right\}, \quad (8) \]

then the patch will be kept in columnized mode in the cluster \(R_{k,\theta}\). Finally, the codebook is updated as

\[ \mathbf{r}_{k,\theta} = \frac{1}{|R_{k,\theta}|} \sum_{i \in R_{k,\theta}} \mathbf{t}_{i,\theta}. \quad (9) \]

In this manner, all the patches are grouped to generate statistically as well as structurally similar clusters \(R_{k,\theta}\) (\(k = 1, 2, \ldots, K\) and \(\theta \in \{0°, 30°, 60°, 90°, 120°, 150°\}\)).

¹ The term edge clustering is used to describe the process of achieving \(R_\theta\) using the dominant edge orientation of a patch.
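A compact sketch of the edge-clustering step described above: the dominant orientation is obtained from the SVD of the patch gradient field (Eqs. (6)–(7)), and the patch is assigned to the nearest cluster centre, which is equivalent to the (\(\theta\) − 15°, \(\theta\) + 15°] test. The use of np.gradient as the derivative operator and the reduction of orientations modulo 180° are assumptions of this sketch.

```python
import numpy as np

def dominant_orientation(patch):
    """Dominant edge orientation of a patch via SVD of its gradient
    field, in the spirit of Eqs. (6)-(7) [54]."""
    gv, gh = np.gradient(patch.astype(float))       # vertical, horizontal derivs
    T = np.column_stack([gh.ravel(), gv.ravel()])   # m x 2 gradient field, Eq. (6)
    _, _, Vt = np.linalg.svd(T, full_matrices=False)
    v1 = Vt[0]                                      # dominant gradient direction
    theta = np.degrees(np.arctan2(v1[1], v1[0]))    # Eq. (7)
    return (theta + 90.0) % 180.0                   # edge orientation = gradient + 90 deg

def edge_cluster(patch):
    """Assign a patch to the nearest of the six clusters R_theta."""
    angle = dominant_orientation(patch)
    centers = np.arange(0, 180, 30)                 # {0, 30, ..., 150} degrees
    dist = np.abs(((angle - centers) + 90) % 180 - 90)  # angular distance mod 180
    return centers[np.argmin(dist)]
```

The subsequent K-means refinement (Eqs. (8)–(9)) is the standard nearest-centroid assignment followed by a mean update within each \(R_\theta\).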
The significance of the proposed approach of clustering the image patches can be observed in Fig. 5, which shows the histograms of the variance of the intensity levels and of the dominant edge orientation for patches belonging to a cluster derived using only structural information (edge cluster), using only statistical information (statistical cluster), and using the proposed approach.

Eigenvectors \(\mathbf{p}_{k,\theta}\), along with the eigenvalues, are computed from \(\boldsymbol{\Sigma}_{k,\theta}\). The eigenvectors are arranged according to the descending order of the eigenvalues. The first few significant eigenvectors (s), corresponding to the significant eigenvalues, are selected and are arranged …

Fig. 5. Histogram of variance of the intensity levels (a) for the edge cluster, (b) for the statistical cluster and (c) for the cluster generated using the proposed approach. Histogram of the range of dominant edge orientation (d) for the edge cluster, (e) for the statistical cluster and (f) for the cluster generated using the proposed approach.
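The fragment above outlines the PCA step for each cluster \(R_{k,\theta}\). Assuming \(\boldsymbol{\Sigma}_{k,\theta}\) denotes the covariance matrix of the patches in that cluster (its defining text is not fully recoverable here), and that patches are centered first, a sub-dictionary of the s leading eigenvectors could be built as in the following sketch.

```python
import numpy as np

def pca_dictionary(cluster_patches, s):
    """Sketch of a compact PCA dictionary for one cluster R_{k,theta}.

    cluster_patches: m x n array, one columnized patch per column.
    Returns the m x s matrix of the s leading eigenvectors; the
    centering step is an assumption of this sketch.
    """
    X = cluster_patches - cluster_patches.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]               # cluster covariance Sigma_{k,theta}
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort, largest first
    return eigvecs[:, order[:s]]             # compact sub-dictionary A_{k,theta}
```

Keeping only s components per cluster is what makes each sub-dictionary compact while still spanning the dominant variations of its patches.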
\[ \hat{\mathbf{c}}_i = \arg\min_{\mathbf{c}_i} \|\hat{\mathbf{x}}_i - \mathbf{A}\mathbf{c}_i\|_2^2 + \lambda\|\mathbf{c}_i\|_1, \quad (12) \]

where \(\mathbf{A} \in \mathbb{R}^{m \times \Gamma}\) is the concatenation of the sub-dictionaries \(\mathbf{A}_{k,\theta}\), \(\forall k, \forall \theta\), and \(\hat{\mathbf{c}}_i \in \mathbb{R}^{\Gamma}\) (\(\Gamma > m\)) is the estimated sparse vector of the HR image patch.
Eq. (12) has to be solved iteratively for all the patches to update the patches of the HR approximation until it converges, and this is performed by the iterative thresholding algorithm [55]. Within each iteration, the selection of dictionaries is updated adaptively by Eq. (11). Once \(\hat{\mathbf{c}}_i\) is estimated for all the patches, the HR patches can be reconstructed by \(\hat{\mathbf{x}}_i \approx \mathbf{A}\hat{\mathbf{c}}_i\). In order to reconstruct the complete image from the patches, we have to make sure that the patches extracted from the reconstructed HR image do not differ much from the estimated patches. This is achieved by minimizing the following cost function:
\[ \hat{\mathbf{x}} = \arg\min_{\hat{\mathbf{x}}} \sum_{i=1}^{L} \|\hat{\mathbf{x}}_i - \mathbf{P}_i\hat{\mathbf{x}}\|_2^2 = \arg\min_{\hat{\mathbf{x}}} \sum_{i=1}^{L} \|\mathbf{A}\hat{\mathbf{c}}_i - \mathbf{P}_i\hat{\mathbf{x}}\|_2^2, \quad (13) \]

which leads to

\[ \hat{\mathbf{x}} \approx \left( \sum_{i=1}^{L} \mathbf{P}_i^{T}\mathbf{P}_i \right)^{-1} \sum_{i=1}^{L} \mathbf{P}_i^{T}\mathbf{A}\hat{\mathbf{c}}_i. \quad (14) \]

Fig. 6. Intensity profile of the Cameraman image: the dotted line is for the original image and the continuous line is for the interpolated version of the LR image.
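Since each \(\mathbf{P}_i\) simply crops a patch, \(\sum_i \mathbf{P}_i^{T}\mathbf{P}_i\) in Eq. (14) is a diagonal matrix counting how many patches cover each pixel, so the closed form reduces to averaging the overlapping patch estimates. A minimal sketch, where the patch size and the coordinate convention are assumptions:

```python
import numpy as np

def assemble_from_patches(patches, coords, img_shape, p):
    """Eq. (14) in action: average overlapping patch estimates.

    patches: iterable of p*p patch estimates (A c_i), flattened.
    coords:  top-left (row, col) of each patch in the image.
    """
    acc = np.zeros(img_shape)   # accumulates sum of P_i^T A c_i
    cnt = np.zeros(img_shape)   # diagonal of sum of P_i^T P_i (coverage count)
    for patch, (r, c) in zip(patches, coords):
        acc[r:r + p, c:c + p] += patch.reshape(p, p)
        cnt[r:r + p, c:c + p] += 1.0
    return acc / np.maximum(cnt, 1.0)   # avoid division by zero at uncovered pixels
```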
The idea is to put a constraint which maintains the edge continuity of the down-sampled version of the estimated HR image with the input LR image. With an appropriate weight (\(\eta\)), the above constrained cost function can be rewritten as

\[ \hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \left\{ \|\mathbf{y} - \mathbf{S}\mathbf{B}\mathbf{A}\mathbf{c}\|_2^2 + \lambda\|\mathbf{c}\|_1 + \eta\,\|E_g(\mathbf{y}) - E_g(\mathbf{S}\mathbf{B}\mathbf{A}\mathbf{c})\|_2^2 \right\}, \quad (21) \]

where \(E_g(\cdot)\) denotes the edginess feature. The equation can be solved using the iterative thresholding algorithm [55]. The pseudo code of the proposed approach is given in Table 1, whose output is the HR image x.
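For illustration, the sketch below evaluates the three terms of the regularized cost above, using the gradient magnitude as a simple stand-in for the edginess feature \(E_g\) [53]; the exact form of \(E_g\), the reshaping convention, and the reconstruction of Eq. (21) itself are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def edge_term(y, x_hat_lr):
    """Edge preserving penalty: compare gradient magnitudes (a simple
    stand-in for the edginess feature Eg [53]) of the input LR image
    and the down-sampled HR estimate."""
    def grad_mag(img):
        gv, gh = np.gradient(img.astype(float))
        return np.hypot(gh, gv)
    return np.sum((grad_mag(y) - grad_mag(x_hat_lr)) ** 2)

def sr_objective(y, SBA, c, lam, eta, lr_shape):
    """Value of the regularized cost (cf. Eq. (21)): data term +
    sparsity term + weighted edge term. Section 4 suggests
    eta = 0.1 for clean and 0.001 for noisy images."""
    x_lr = (SBA @ c).reshape(lr_shape)      # down-sampled HR estimate SBAc
    data = np.sum((y - x_lr) ** 2)          # fidelity to the LR input
    return data + lam * np.sum(np.abs(c)) + eta * edge_term(y, x_lr)
```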
Fig. 9. Results for the clean Butterfly image (×3): (a) LR image, (b) result of Raw Patches (RP) [36] [PSNR = 23.46, SSIM = 0.7747], (c) result of Single image scale-up (SISU) [22] [PSNR = 25.92, SSIM = 0.7801], (d) Aplus result [49] [PSNR = 26.90, SSIM = 0.7733], (e) ASDS result [40] [PSNR = 26.49, SSIM = 0.8861], (f) proposed structural-statistical dictionary (SSD) result [PSNR = 27.16, SSIM = 0.9021], (g) proposed SSD with edge preservation (SSD-EP) result [PSNR = 27.38, SSIM = 0.9087] and (h) the ground truth image.

Fig. 11. Results for the clean Bike image (×3): (a) LR image, (b) result of RP [36] [PSNR = 22.88, SSIM = 0.7008], (c) result of SISU [22] [PSNR = 24.05, SSIM = 0.7525], (d) result of Aplus [49] [PSNR = 24.30, SSIM = 0.7766], (e) ASDS result [40] [PSNR = 24.20, SSIM = 0.7813], (f) proposed SSD result [PSNR = 24.49, SSIM = 0.7961], (g) proposed SSD-EP result [PSNR = 24.58, SSIM = 0.7987] and (h) the ground truth image.

Table 2. Results of SR (×3) for noise-less LR images (σn = 0); bold fonts represent the best values. Columns: Images, Metrics, BC, RP [36], SISU [22], ASDS [40], Aplus [49], SSD, SSD-EP.
There is not much significant quantitative gain when using SSD-EP over SSD, but a perceptual difference can be observed in the figures. The proposed approach is further compared in Table 3 with two SR approaches [41,60] in terms of metric gains.³ The PSNR and SSIM gains represent the difference between the super-resolved HR image and the bi-cubic interpolated image. It has to be noted that the proposed approach produces the best gains for both images.

Table 3. Noise-less results (×3) in terms of PSNR and SSIM gains over the bi-cubic interpolation method (σn = 0).

Images   Metrics  K-means [60]  Geometric dictionary [41]  SSD-EP
Lena     PSNR     1.44          1.93                       5.49
         SSIM     0.0035        0.0046                     0.1550
Peppers  PSNR     0.86          1.32                       5.74
         SSIM     0.0025        0.0035                     0.1401
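The PSNR and SSIM gains reported in Table 3 can be computed for any method as metric(SR result) minus metric(bi-cubic baseline). A sketch using scikit-image, where simulating the LR image by plain bi-cubic resizing is an assumption of the sketch:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.transform import resize

def gains_over_bicubic(hr, sr, scale=3):
    """PSNR/SSIM gains in the sense of Table 3, for a grayscale float image.

    hr: ground-truth HR image; sr: super-resolved result of the same shape.
    """
    lr = resize(hr, (hr.shape[0] // scale, hr.shape[1] // scale), order=3)
    bc = resize(lr, hr.shape, order=3)          # bi-cubic baseline
    dr = hr.max() - hr.min()
    psnr_gain = (peak_signal_noise_ratio(hr, sr, data_range=dr)
                 - peak_signal_noise_ratio(hr, bc, data_range=dr))
    ssim_gain = (structural_similarity(hr, sr, data_range=dr)
                 - structural_similarity(hr, bc, data_range=dr))
    return psnr_gain, ssim_gain
```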
Fig. 13. SR results for the noisy Barbara image (×3): (a) noisy LR image, (b) result of RP [36] [PSNR = 23.76, SSIM = 0.6196], (c) result of SISU [22] [PSNR = 23.99, SSIM = 0.6082], (d) result of Aplus [49] [PSNR = 22.80, SSIM = 0.5349], (e) ASDS result [40] [PSNR = 23.31, SSIM = 0.5923], (f) proposed SSD result [PSNR = 23.56, SSIM = 0.6210], (g) proposed SSD-EP result [PSNR = 23.69, SSIM = 0.6438] and (h) the ground truth image.

Fig. 15. SR results for the noisy Peppers image (×3): (a) noisy LR image, (b) result of RP [36] [PSNR = 25.39, SSIM = 0.7369], (c) result of SISU [22] [PSNR = 26.85, SSIM = 0.7204], (d) result of Aplus [49] [PSNR = 25.50, SSIM = 0.6363], (e) ASDS result [40] [PSNR = 26.28, SSIM = 0.6975], (f) proposed SSD result [PSNR = 26.86, SSIM = 0.7362], (g) proposed SSD-EP result [PSNR = 27.19, SSIM = 0.7721] and (h) the ground truth image.
4.3. Experimental results with noise

The robustness of the SR system is also examined under noisy conditions. For demonstration, we have considered the images of Table 2 for the up-sampling factor 3 only. We have added AWGN with σn = 5 to the LR image, which is given as input to the SR system. The outputs can be qualitatively visualized for the Barbara and Peppers images in Figs. 13 and 15, respectively. The zoomed versions can be observed in Figs. 14 and 16. It has to be noted that the results of (b) for both examples are smoother and the edges are more smeared as compared to the rest of the images. On the other hand, (c)–(e) have more high frequency information, but at the same time they have artifacts in the smoother regions. However, the results of (f) and (g) are perceptually better as compared to the others.

The smoothing operation of 1-D processing of an image does help in reducing the level of noise, but it also smears the finer details. Thus, noisy images having more textures cannot be effectively recovered, as can be noticed in terms of the PSNR and SSIM values for the Baboon, Bike and Racoon images in Table 4, where the quantitative results are shown.

4.4. Experiments with different up-sampling factors and datasets

Further, the proposed approach is scrutinized for different up-sampling factors and datasets …

⁴ Here we consider different θ as different classes. A noisy patch being correctly classified means that the dominant edge orientation of the patch lies within the θ ± 15° range, where θ is the dominant edge orientation of the ground truth patch.
Table 4. Results (×3) of SR for noisy LR images (σn = 5); bold fonts represent the best values. Columns: Images, Metrics, BC, RP [36], SISU [22], ASDS [40], Aplus [49], SSD, SSD-EP.

Table 5. Results of SR for Set5 for different scales (σn = 0); bold fonts represent the best values. Columns: Dataset, Scales, Images, Metrics, RP [36], SISU [22], ASDS [40], Aplus [49], SSD-EP.
…(s) in the proposed approach. After a certain point, increasing the number of training patches does not change the behavior of the principal components significantly. Hence, it can be inferred that increasing the number of training patches can improve the PCA based dictionary up to a certain point, after which the goodness of the PCA based dictionary will not improve much.

The robustness of the proposed dictionary is further reviewed using different training datasets. Here, the objective is to check the dependency of the proposed dictionary on the type of training examples. In this case, we have used two datasets: one is provided by the approach in [40], which consists of 5 images only, and another is provided by the approaches in [36,22,49,48], which consists of 91 images. The resultant PSNR and SSIM values for both datasets are presented in Table 8. One can note that the results are more or less similar for both datasets. This is because, once we have enough patches with different variations for training the PCA based dictionaries, the proposed approach can produce results that are almost invariant to the training examples.

(b) Different number of K: In the proposed approach, each edge cluster is further divided into K clusters, and PCA based dictionaries are built for each of the clusters. In order to choose the number K, we have plotted the cost of fitting the training patches with respect to different numbers of clusters in Fig. 19. One can observe that the fitting cost reduces with an increasing number of clusters. However, the cost reduction from K = 60 to K = 100 is smaller as compared to that for lower K. Again, we have checked the PSNR values of the super-resolved images for 5 examples using the dictionaries learned with K = 50, 60 and 70; the mean values are 28.47, 28.48 and 28.46, respectively. From both analyses, we have found that for the case K = 60, the training patches are quite well fitted and the proposed approach produces a comparable result. Hence, to reduce the computational cost as well as the fitting cost, we have chosen K = 60.

(c) The threshold value (Th): A suitable threshold value, denoted as Th, is employed in the proposed approach to select or reject patches for learning dictionaries as well as for super-resolution.
Table 6. Results of SR for Set14 for different scales (σn = 0); bold fonts represent the best values. Columns: Dataset, Scales, Images, Metrics, RP [36], SISU [22], ASDS [40], Aplus [49], SSD-EP.
Since the images collected for the learning purpose are very rich in detail, very few patches will be rejected due to the predefined threshold value. But the same Th would steer rejection of most of the patches of the testing image, as the resolution of that image is lower. Hence, we have set a Th for the testing phase which is lower as compared to its counterpart in the training phase. This value allows more patches of the testing image to participate in the SR process, which leads to better results, as can be observed in Table 9 in terms of PSNR and SSIM values. It can be examined for most of the cases that the results of testing Th = 1.5 are better than those of 4.5 and 0.1 for noisy as well as noiseless conditions. Hence, we choose the testing Th = 1.5, which is lower than the training Th = 4.5.

(d) Performance with respect to η: The parameter η plays an important role in assigning weight to the edge preserving term in the optimization cost of Eq. (21). The variation of the performance with respect to η is shown in Fig. 20. It can be observed for the clean image that the performance (in terms of PSNR values) improves with increasing η up to 0.1, and it saturates after that. On the other hand, for the noisy image, the performance degrades with increasing η from 0.001, and the degradation is quite high for η > 1. The reason could be that for a clean image, increasing η will emphasize the edge preservation, and can produce a better result. On the other hand, for a noisy image, the edginess feature will contain some unwanted noise due to the derivative operation. Hence, increasing the weight enhances the noise also, and the performance degrades. Considering this observation, we have selected the value of η as 0.1 and 0.001 (the first points to produce the best results) for clean images and noisy images, respectively, so as to achieve the best performance.

(e) The size of patch: In order to restore the patches in the sparse representation framework, the patches need to contain few image details; otherwise, they cannot be restored using few training patches. Hence, the size of the patches needs to be smaller. Here, we have used patches of size 3 × 3, 5 × 5, 7 × 7, …
Table 7. Results of SR for B100 for different scales (σn = 0). The numbers within parentheses represent the rank of our approach in comparison with the others. Columns: Dataset, Scales, Images, Metrics, RP [36], SISU [22], ASDS [40], Aplus [49], SSD-EP.

Fig. 18. The variation of image quality (for ×3) with respect to different numbers of training patches, shown as the variation in (a) PSNR and (b) SSIM.
Table 8. Results (×3) of the proposed approach of SR with different training datasets.

Datasets       Metrics  Baboon  Barbara  Bike    Butterfly  Cameraman  Hat     Parrot  Pentagon  Peppers  Racoon
Dataset1 [40]  PSNR     20.90   24.34    24.58   27.38      24.91      31.16   30.07   26.27     29.19    29.18
               SSIM     0.4982  0.7271   0.7987  0.9087     0.8243     0.8769  0.9119  0.7623    0.8648   0.7647
Dataset2 [36]  PSNR     20.90   24.36    24.56   27.28      24.95      31.11   30.00   26.23     29.16    29.18
               SSIM     0.4986  0.7285   0.7973  0.9066     0.8245     0.8760  0.9112  0.7609    0.8648   0.7651
Fig. 20. The performance with respect to η for up-sampling factor 3.

Table 9. Results of SR (×3) with different testing threshold values (Th = 0.1, 1.5, 4.5) and noise strengths, in terms of per-image PSNR and SSIM values.
References

[1] S.C. Park, M.K. Park, M.G. Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Process. Mag. 20 (3) (2003) 21–36, http://dx.doi.org/10.1109/MSP.2003.1203207.
[2] R. Keys, Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust. Speech Signal Process. 29 (6) (1981) 1153–1160, http://dx.doi.org/10.1109/TASSP.1981.1163711.
[3] H. Hou, H. Andrews, Cubic splines for image interpolation and digital filtering, IEEE Trans. Acoust. Speech Signal Process. 26 (6) (1978) 508–517, http://dx.doi.org/10.1109/TASSP.1978.1163154.
[14] C.-Y. Yang, C. Ma, M.-H. Yang, Single-image super-resolution: a benchmark, in: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part IV, Springer International Publishing, 2014.
[16] L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenom. 60 (1–4) (1992) 259–268, http://dx.doi.org/10.1016/0167-2789(92)90242-F.
[17] A. Marquina, S. Osher, Image super-resolution by TV-regularization and Bregman iteration, J. Sci. Comput. 37 (3) (2008) 367–382, http://dx.doi.org/10.1007/s10915-008-9214-8.
[18] Q. Yuan, L. Zhang, H. Shen, Multiframe super-resolution employing a spatially weighted total variation model, IEEE Trans. Circuits Syst. Video Technol. 22 (3) (2012) 379–392, http://dx.doi.org/10.1109/TCSVT.2011.2163447.
[19] L. Zhang, H. Zhang, H. Shen, P. Li, A super-resolution reconstruction algorithm for surveillance images, Signal Process. 90 (3) (2010) 848–859, http://dx.doi.org/10.1016/j.sigpro.2009.09.002.
[20] A. Kanemura, S.-i. Maeda, S. Ishii, Superresolution with compound Markov random fields via the variational EM algorithm, Neural Netw. 22 (7) (2009) 1025–1034, http://dx.doi.org/10.1016/j.neunet.2008.12.005.
[21] D.L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inf. Theory 41 (3) (1995) 613–627, http://dx.doi.org/10.1109/18.382009.
[22] R. Zeyde, M. Elad, M. Protter, On single image scale-up using sparse-representations, in: Curves and Surfaces, vol. 6920, Springer, Berlin, Heidelberg, 2012, pp. 711–730, http://dx.doi.org/10.1007/978-3-642-27413-8_47.
[23] J. Yang, J. Wright, T. Huang, Y. Ma, Image super-resolution as sparse representation of raw image patches, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8, http://dx.doi.org/10.1109/CVPR.2008.4587647.
[24] W. Dong, L. Zhang, G. Shi, X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838–1857, http://dx.doi.org/10.1109/TIP.2011.2108306.
[25] S. Mandal, A. Sao, Edge preserving single image super resolution in sparse environment, in: 20th IEEE International Conference on Image Processing (ICIP), 2013, pp. 967–971, http://dx.doi.org/10.1109/ICIP.2013.6738200.
[26] Z. Zhang, Y. Xu, J. Yang, X. Li, D. Zhang, A survey of sparse representation: algorithms and applications, IEEE Access 3 (2015) 490–530, http://dx.doi.org/10.1109/ACCESS.2015.2430359.
[27] S. Ravishankar, B. Wen, Y. Bresler, Online sparsifying transform learning — Part I: Algorithms, IEEE J. Sel. Top. Signal Process. 9 (4) (2015) 625–636, http://dx.doi.org/10.1109/JSTSP.2015.2417131.
[28] S. Ravishankar, B. Wen, Y. Bresler, Online sparsifying transform learning — Part II: Convergence analysis, IEEE J. Sel. Top. Signal Process. 9 (4) (2015) 637–646, http://dx.doi.org/10.1109/JSTSP.2015.2407860.
[29] J.J. Thiagarajan, K.N. Ramamurthy, P. Turaga, A. Spanias, Image Understanding Using Sparse Representations, Morgan & Claypool, 2014, http://dx.doi.org/10.2200/S00563ED1V01Y201401IVM015.
[30] S. Mallat, G. Yu, Super-resolution with sparse mixing estimators, IEEE Trans. Image Process. 19 (11) (2010) 2889–2900, http://dx.doi.org/10.1109/TIP.2010.2049927.
[31] H. Chavez, V. Gonzalez, A. Hernandez, V. Ponomaryov, Super resolution imaging via sparse interpolation in wavelet domain with implementation in DSP and GPU, in: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 19th Iberoamerican Congress, CIARP 2014, Puerto Vallarta, Mexico, November 2–5, 2014, Proceedings, Springer International Publishing, Cham, 2014, pp. 973–981, http://dx.doi.org/10.1007/978-3-319-12568-8_118.
[32] S. Mandal, S. Thavalengal, A.K. Sao, Explicit and implicit employment of edge-related information in super-resolving distant faces for recognition, Pattern Anal. Appl. 19 (3) (2016) 867–884, http://dx.doi.org/10.1007/s10044-015-0512-0.
[33] V. Abrol, P. Sharma, A.K. Sao, Greedy dictionary learning for kernel sparse representation based classifier, Pattern Recognit. Lett. 78 (2016) 64–69, http://dx.doi.org/10.1016/j.patrec.2016.04.014.
[34] D.L. Donoho, For most large underdetermined systems of equations, the minimal l1-norm near-solution approximates the sparsest near-solution, Commun. Pure Appl. Math. 59 (7) (2006) 907–934, http://dx.doi.org/10.1002/cpa.20131.
[35] R. Rubinstein, A. Bruckstein, M. Elad, Dictionaries for sparse representation modeling, Proc. IEEE 98 (6) (2010) 1045–1057, http://dx.doi.org/10.1109/JPROC.2010.2040551.
[36] J. Yang, J. Wright, T. Huang, Y. Ma, Image super-resolution as sparse representation of raw image patches, in: IEEE Conference on Computer Vision and Pattern Recognition, June 2008, pp. 1–8, http://dx.doi.org/10.1109/CVPR.2008.4587647.
[37] K. Engan, S. Aase, J. Hakon Husoy, Method of optimal directions for frame design, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, 1999, pp. 2443–2446, http://dx.doi.org/10.1109/ICASSP.1999.760624.
[38] M. Aharon, M. Elad, A. Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Process. 54 (11) (2006) 4311–4322, http://dx.doi.org/10.1109/TSP.2006.881199.
[39] R. Vidal, Y. Ma, S. Sastry, Generalized principal component analysis (GPCA), in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, June 2003, pp. I-621–I-628, http://dx.doi.org/10.1109/CVPR.2003.1211411.
[40] W. Dong, L. Zhang, G. Shi, X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838–1857, http://dx.doi.org/10.1109/TIP.2011.2108306.
[41] S. Yang, M. Wang, Y. Chen, Y. Sun, Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding, IEEE Trans. Image Process. 21 (9) (2012) 4016–4028, http://dx.doi.org/10.1109/TIP.2012.2201491.
[42] M.R. Azimi-Sadjadi, J. Kopacz, N. Klausner, K-SVD dictionary learning using a fast OMP with applications, in: 2014 IEEE International Conference on Image Processing (ICIP), 2014, pp. 1599–1603, http://dx.doi.org/10.1109/ICIP.2014.7025320.
[43] S. Miura, Y. Kawamoto, S. Suzuki, T. Goto, S. Hirano, M. Sakurai, Image quality improvement for learning-based super-resolution with PCA, in: The 1st IEEE Global Conference on Consumer Electronics 2012, 2012, pp. 572–573, http://dx.doi.org/10.1109/GCCE.2012.6379917.
[44] J. Yang, J. Wright, T. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Trans. Image Process. 19 (11) (2010) 2861–2873, http://dx.doi.org/10.1109/TIP.2010.2050625.
[45] R. Timofte, V. De Smet, L. Van Gool, Anchored neighborhood regression for fast example-based super-resolution, in: IEEE International Conference on Computer Vision (ICCV), 2013, pp. 1920–1927, http://dx.doi.org/10.1109/ICCV.2013.241.
[46] C.-Y. Yang, M.-H. Yang, Fast direct super-resolution by simple functions, in: IEEE International Conference on Computer Vision (ICCV), 2013, pp. 561–568, http://dx.doi.org/10.1109/ICCV.2013.75.
[47] H. Chang, D.-Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2004, http://dx.doi.org/10.1109/CVPR.2004.1315043.
[48] C. Dong, C. Loy, K. He, X. Tang, Learning a deep convolutional network for image super-resolution, in: Computer Vision – ECCV 2014, Lecture Notes in Computer Science, vol. 8692, Springer International Publishing, Cham, 2014, pp. 184–199, http://dx.doi.org/10.1007/978-3-319-10593-2_13.
[49] R. Timofte, V. De Smet, L. Van Gool, A+: adjusted anchored neighborhood regression for fast super-resolution, in: Computer Vision – ACCV 2014, Lecture Notes in Computer Science, vol. 9006, Springer International Publishing, Cham, 2015, pp. 111–126, http://dx.doi.org/10.1007/978-3-319-16817-3_8.
[50] X. Li, H. He, R. Wang, D. Tao, Single image superresolution via directional group sparsity and directional features, IEEE Trans. Image Process. 24 (9) (2015) 2874–2888, http://dx.doi.org/10.1109/TIP.2015.2432713.
[51] Y. Zhang, J. Liu, W. Yang, Z. Guo, Image super-resolution based on structure-modulated sparse representation, IEEE Trans. Image Process. 24 (9) (2015) 2797–2810, http://dx.doi.org/10.1109/TIP.2015.2431435.
[52] M. Elad, Sparse and Redundant Representations – From Theory to Applications in Signal and Image Processing, Springer, New York, Dordrecht, Heidelberg, London, 2010.
[53] A.K. Sao, B. Yegnanarayana, B. Vijaya Kumar, Significance of image representation for face verification, Signal Image Video Process. 1 (2007) 225–237, http://dx.doi.org/10.1007/s11760-007-0016-5.
[54] X. Feng, P. Milanfar, Multiscale principal components analysis for image local orientation estimation, in: Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, vol. 1, 2002, pp. 478–482, http://dx.doi.org/10.1109/ACSSC.2002.1197228.
[55] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57 (11) (2004) 1413–1457, http://dx.doi.org/10.1002/cpa.20042.
[56] H. Chavez-Roman, V. Ponomaryov, Super resolution image generation using wavelet domain interpolation with edge extraction via a sparse representation, IEEE Geosci. Remote Sens. Lett. 11 (10) (2014) 1777–1781, http://dx.doi.org/10.1109/LGRS.2014.2308905.
[57] V.I. Ponomaryov, H. Chavez-Roman, V. Gonzalez-Huitron, Image resolution enhancement using edge extraction and sparse representation in wavelet domain for real-time application, in: Proc. SPIE 9139, Real-Time Image and Video Processing, 2014, 91390H, http://dx.doi.org/10.1117/12.2051552.
[58] A. Hore, D. Ziou, Image quality metrics: PSNR vs. SSIM, in: 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 2366–2369, http://dx.doi.org/10.1109/ICPR.2010.579.
[59] Z. Wang, A. Bovik, H. Sheikh, E. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612, http://dx.doi.org/10.1109/TIP.2003.819861.
[60] S. Yang, Z. Liu, M. Wang, F. Sun, L. Jiao, Multitask dictionary learning and sparse representation based single-image super-resolution reconstruction, Neurocomputing 74 (17) (2011) 3193–3203, http://dx.doi.org/10.1016/j.neucom.2011.04.014.
[61] T.K. Moon, W.C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, Upper Saddle River, NJ, 2000.
[62] G. Hamerly, J. Drake, Accelerating Lloyd's algorithm for K-means clustering, in: Partitional Clustering Algorithms, Springer International Publishing, 2015, pp. 41–78, http://dx.doi.org/10.1007/978-3-319-09259-1_2.
[63] I.M. Johnstone, A.Y. Lu, Sparse principal components analysis, arXiv e-prints, arXiv:0901.4392.