
ABSTRACT

While there has been a lot of recent work on object recognition and image understanding, the
focus has been on carefully establishing mathematical models for images, scenes, and
objects. In this paper, we propose a novel, nonparametric approach for object recognition and
scene parsing using a new technology we name label transfer. For an input image, our system
first retrieves its nearest neighbors from a large database containing fully annotated images.
Then, the system establishes dense correspondences between the input image and each of the
nearest neighbors using the dense SIFT flow algorithm which aligns two images based on
local image structures. Finally, based on the dense scene correspondences obtained from
SIFT flow, our system warps the existing annotations and integrates multiple cues in a
Markov random field framework to segment and recognize the query image. Promising
experimental results have been achieved by our nonparametric scene parsing system on
challenging databases. Compared to existing object recognition approaches that require
training classifiers or appearance models for each object category, our system is easy to
implement, has few parameters, and embeds contextual information naturally in the
retrieval/alignment procedure.
CHAPTER 1: INTRODUCTION

Scene Parsing: Segmentation + Grouping + Recognition


Natural images consist of an overwhelming number of visual patterns generated by very
diverse stochastic processes in nature. The objective of image understanding is to parse an
input image into its constituent patterns. For example, a stadium scene can be parsed
hierarchically into humans (faces and clothes), the sports field (a point process, a curve
process, homogeneous color regions, text), and spectators (textures, persons).

Scene parsing, or recognizing and segmenting objects in an image, is one of the core
problems of computer vision.

Traditional approaches to object recognition begin by specifying an object model, such as
template matching, constellations, bags of features, or shape models. These approaches
typically work with a fixed number of object categories and require training generative or
discriminative models for each category from training data. In the parsing stage, these
systems try to align the learned models to the input image and associate object category labels
with pixels, windows, edges, or other image representations. Recently, contextual information
has also been carefully modeled to capture the relationships between objects at the semantic
level. Encouraging progress has been made by these models on a variety of object recognition
and scene parsing tasks.

However, these learning-based methods do not, in general, scale well with the number of
object categories. For example, to include more object categories in an existing system, we
need to train new models for the new categories and, typically, adjust system parameters.
Training can be a tedious job if we want to include thousands of object categories in a scene
parsing system. In addition, the complexity of contextual relationships among objects also
increases rapidly as the number of object categories grows.

Recently, the emergence of large databases of images has opened the door to a new family
of methods in computer vision. Large database-driven approaches have shown the potential
of nonparametric methods in several applications. Instead of training sophisticated
parametric models, these methods try to reduce the inference problem for an unknown image
to that of matching it to an existing set of annotated images. For example, one approach
estimates the pose of a human by relying on 0.5 million training examples; another fills holes
in an input image by introducing elements that are likely to be semantically correct through
searching a large image database; a third infers the possible object categories that may appear
in an image by retrieving similar images from a large database. Moreover, it has been shown
that, with a database of 80 million images, even a simple SSD match can give semantically
meaningful parsing for 32×32 images.

In this paper, we propose a novel, nonparametric scene parsing system to transfer the labels
from existing samples in a large database to annotate an image, as illustrated in Fig. 1. For a
query image (Fig. 1a), our system first retrieves the top matches (Fig. 1b) in a large, annotated
image database using a combination of GIST matching and SIFT flow. Since these top matches
are labeled, we transfer their annotations (Fig. 1c) to the query image and obtain the scene
parsing result shown in Fig. 1d. For comparison, the ground-truth user annotation of the query
is displayed in Fig. 1e. Our system is able to generate promising scene parsing results if images
from the same scene type as the query are retrieved from the annotated database.

However, it is nontrivial to build an efficient and reliable scene parsing system using dense
scene alignment. To account for the multiple annotation suggestions from the top matches,
a Markov random field model is used to merge multiple cues (e.g., likelihood, prior, and
spatial smoothness) into a robust annotation. Promising experimental results are achieved
on images from the LabelTransfer database. Our goal is to explore the performance of
scene parsing through the transfer of labels from existing annotated images, rather than to
build a comprehensive object recognition system. We show, however, that our system
outperforms existing approaches on our databases.
Fig. 1. For a query image (a), our system finds the top matches (b) (three are shown here) using scene retrieval and a SIFT flow
matching algorithm. The annotations of the top matches (c) are transferred and integrated to parse the input image, as shown in
(d). For comparison, the ground-truth user annotation of (a) is shown in (e).
Chapter 2: MATLAB

2.1: Introduction
MATLAB (matrix laboratory) is a fourth-generation high-level programming language and
interactive environment for numerical computation, visualization and programming.

It allows matrix manipulations, plotting of functions and data, implementation of
algorithms, creation of user interfaces, and interfacing with programs written in other
languages, including C, C++, Java, and FORTRAN. It can be used to analyze data, develop
algorithms, and create models and applications.

It has numerous built-in commands and math functions that help you in mathematical
calculations, generating plots, and performing numerical methods.
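
As a brief, illustrative sketch of this interactive style (the values below are arbitrary), a few
lines of MATLAB can define a matrix, solve a linear system, and plot a function:

% Minimal illustrative MATLAB session (arbitrary example values).
A = [4 1; 2 3];            % a 2x2 matrix
b = [1; 2];                % a right-hand side vector
x = A \ b;                 % solve the linear system A*x = b
disp(x)                    % display the solution

t = 0:0.01:2*pi;           % sample points
plot(t, sin(t));           % plot sin(t) against t
xlabel('t'); ylabel('sin(t)'); title('A simple plot');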

2.2: Power Of Computational Mathematics


MATLAB is used in every facet of computational mathematics. Following are some of the
calculations where it is most commonly used −

 Dealing with Matrices and Arrays
 2-D and 3-D Plotting and graphics
 Linear Algebra
 Algebraic Equations
 Non-linear Functions
 Statistics
 Data Analysis
 Calculus and Differential Equations
 Numerical Calculations
 Integration
 Transforms
 Curve Fitting
 Various other special functions
2.3: Features
Following are the basic features of MATLAB −

 It is a high-level language for numerical computation, visualization, and application
development.

 It also provides an interactive environment for iterative exploration, design, and
problem solving.

 It provides a vast library of mathematical functions for linear algebra, statistics, Fourier
analysis, filtering, optimization, numerical integration, and solving ordinary
differential equations.

 It provides built-in graphics for visualizing data and tools for creating custom plots.

 MATLAB's programming interface gives development tools for improving code
quality and maintainability and for maximizing performance.

 It provides tools for building applications with custom graphical interfaces.

 It provides functions for integrating MATLAB-based algorithms with external
applications and languages such as C, Java, .NET, and Microsoft Excel.

2.4: Uses
MATLAB is widely used as a computational tool in science and engineering encompassing
the fields of physics, chemistry, math and all engineering streams. It is used in a range of
applications including −

 Signal Processing and Communications
 Image and Video Processing
 Control Systems
 Test and Measurement
 Computational Finance
 Computational Biology
CHAPTER 3: RELATED WORK

Object recognition is an area of research that has greatly evolved over the last decade. Many
works focusing on single-class modeling, such as faces, digits, characters, and pedestrians,
have proven successful and, in some cases, the problems have largely been deemed solved.
Recent efforts have mainly turned to multiclass object recognition. In creating an object
detection system, there are many basic building blocks to take into account; feature
description and extraction is the first stepping stone. Examples of descriptors include
gradient-based features such as SIFT and HOG, shape context, and patch statistics. Selected
feature descriptors can then be applied to images either in a sparse manner, by selecting the
top key points with the highest response from the feature descriptor, or densely, by observing
feature statistics across the image.

Sparse key point representations are often matched among pairs of images. Since the generic
problem of matching two sets of key points is NP-hard, approximation algorithms have been
developed to efficiently compute key point matches while minimizing error rates (e.g., the
pyramid match kernel and vocabulary trees). On the other hand, dense representations have
been handled by modeling distributions of the visual features over neighborhoods in the
image or over the image as a whole. We chose the dense representation in this paper due to
recent advances in dense image matching.

At a higher level, we can also distinguish two types of object recognition approaches:
parametric approaches, which consist of learning generative/discriminative models, and
nonparametric approaches, which rely on image retrieval and matching. In the parametric
family we find numerous template-matching methods, where classifiers are trained to
discriminate between an image window containing an object and one containing background.
However, these methods assume that objects are mostly rigid and susceptible to little or no
deformation. To account for articulated objects, constellation models have been designed to
model objects as ensembles of parts, considering spatial information, depth ordering, and
multi-resolution modes.

Recently, the idea of integrating humans in the loop via crowdsourcing for visual
recognition of specialized classes, such as plant and animal species, has emerged; this
approach reduces the description of an object to fewer than 20 discriminative questions that
humans can answer after visually inspecting the image.

In the realm of nonparametric methods, we find systems such as Video Google, a system that
allows users to specify a visual query of an object in a video and subsequently retrieve
instances of the same object across the movie. Another nonparametric system matches a
previously unseen query image against a densely labeled image database; the nearest
neighbors are used to build a label probability map for the query, which is further used to
prune out object detectors of classes that are unlikely to appear in the image. Nonparametric
methods have also been widely used on web data to retrieve similar images. For example, a
customized distance function can be used at the retrieval stage to compute the distance
between a query image and images in the training set, which subsequently cast votes to infer
the object class of the query. In the same spirit, our nonparametric label transfer system
avoids modeling object appearances explicitly: our system parses a query image using the
annotations of similar images in a training database and dense image correspondences.

Recently, several works have also considered contextual information in object detection to
clean up and reinforce individual results. Among the contextual cues that have been used are
object-level co-occurrences, spatial relationships, and 3D scene layout; more detailed and
comprehensive studies and benchmarks of contextual methods are available in the literature.
Instead of explicitly modeling context, our model incorporates context implicitly, as object
co-occurrences and spatial relationships are retained in label transfer. An earlier version of
our work appeared previously; in this paper, we explore the label-transfer framework in depth
with more thorough experiments and insights. Other recent papers have also introduced
similar ideas. For instance, over-segmentation can be performed on the query image, and
segment-based classifiers trained on the nearest neighbors applied to recognize each segment;
scene boundaries can be discovered from the common edges shared by nearest neighbors.

Fig. 2. System pipeline. There are three key algorithmic components (rectangles) in our system: scene
retrieval, dense scene alignment, and label transfer. The ovals denote data representations.
CHAPTER 4: SYSTEM OVERVIEW

The core idea of our nonparametric scene parsing system is recognition-by-matching. To
parse an input image, we match the visual objects in the input image to the images in a
database. If the images in the database are annotated with object category labels and if the
matching is semantically meaningful (i.e., building corresponds to building, window to
window, person to person), then we can simply transfer the labels of the database images to
parse the input. Nevertheless, we need to deal with many practical issues in order to build a
reliable system.

Fig. 2 shows the pipeline of our system, which consists of the following three algorithmic
modules:

1. Scene retrieval: Given a query image, use scene retrieval techniques to find a set of
nearest neighbors that share a similar scene configuration (including objects and their
relationships) with the query.

2. Dense scene alignment: Establish dense scene correspondence between the query
image and each of the retrieved nearest neighbors. Choose the nearest neighbors with
the top matching scores as voting candidates.

3. Label transfer: Warp the annotations from the voting candidates to the query image
according to the estimated dense correspondence. Reconcile the multiple labelings and
impose spatial smoothness under a Markov random field (MRF) model.

Although we are going to choose concrete algorithms for each module in this paper, any
algorithm that fits to the module can be plugged into our nonparametric scene parsing system.
For example, we use SIFT flow for dense scene alignment, but it would also suffice to use
sparse feature matching and then propagate sparse correspondences to produce dense
counterparts.
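
To make the three modules concrete, the following MATLAB-style sketch outlines the
overall pipeline; the helper names (retrieveNearestNeighbors, siftFlow, transferLabels) are
hypothetical placeholders for the modules described above, not the actual implementation.

% High-level sketch of the label-transfer pipeline (illustrative only).
% retrieveNearestNeighbors, siftFlow, and transferLabels are hypothetical
% placeholders for the three modules described above.
function labelMap = parseScene(queryImage, database, K, M)
    % 1. Scene retrieval: find K nearest neighbors (e.g., by GIST distance).
    neighbors = retrieveNearestNeighbors(queryImage, database, K);

    % 2. Dense scene alignment: compute SIFT flow to each neighbor and
    %    rerank by the minimum matching energy; keep the top M candidates.
    for i = 1:numel(neighbors)
        [neighbors(i).flow, neighbors(i).energy] = siftFlow(queryImage, neighbors(i).image);
    end
    [~, order] = sort([neighbors.energy]);
    candidates = neighbors(order(1:M));

    % 3. Label transfer: warp candidate annotations and fuse them in an MRF.
    labelMap = transferLabels(queryImage, candidates);
end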

A key component of our system is a large, dense, and annotated image database. In this
paper, we use two sets of databases, both annotated using the LabelMe online annotation tool
to build and evaluate our system. The first is the LabelMe Outdoor (LMO) database
containing 2,688 fully annotated images, most of which are outdoor scenes including street,
beach, mountains, fields, and buildings. The second is the SUN database containing 9,566
fully annotated images, covering both indoor and outdoor scenes; in fact, LMO is a subset
of SUN.

(Footnote: Other scene parsing and image understanding systems also require such a
database; we do not require more than others.)

We use the LMO database to explore our system in-depth, and also report the results on the
SUN database.

Before jumping into the details of our system, it is helpful to look at the statistics of the LMO
database. The 2,688 images in LMO are randomly split into 2,488 for training and 200 for
testing. We chose the top 33 object categories with the most labeled pixels. The pixels that
are not labeled, or labeled as other object categories, are treated as the 34th category:
“unlabeled.” The per pixel frequency count of these object categories in the training set is
shown at the top of Fig. 3. The color of each bar is the average RGB value of the
corresponding object category from the training data with saturation and brightness boosted
for visualization purposes. The top 10 object categories are sky, building, mountain, tree,
unlabeled, road, sea, field, grass, and river. The spatial priors of these object categories are
displayed at the bottom of Fig. 3, where white denotes zero probability and the saturation of
color is directly proportional to its probability. Note that, consistent with common
knowledge, sky occupies the upper part of the image grid and field occupies the lower part.
Furthermore, there are only limited samples for the sun, cow, bird, and moon classes.
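
As an illustration of how such spatial priors can be tabulated, the sketch below (with stand-in
label maps and an assumed common grid size; the variable names are ours) counts, for each
object category, how often it occurs at each location:

% Sketch: estimate spatial priors hist_l(p) from training annotations.
% Stand-in label maps (replace with real LabelMe annotations, values 1..L).
L = 33;
labelMaps = {randi(L, [240 320]), randi(L, [240 320])};
H = 64; W = 64;                          % common grid for the prior
prior = zeros(H, W, L);
for n = 1:numel(labelMaps)
    lm = imresize(labelMaps{n}, [H W], 'nearest');
    for l = 1:L
        prior(:, :, l) = prior(:, :, l) + (lm == l);   % occurrence counts
    end
end
% Normalize so that, at each location, the values sum to one.
prior = bsxfun(@rdivide, prior, max(sum(prior, 3), 1));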
Fig. 3. Top: The per-pixel frequency counts of the object categories in our data set (sorted in
descending order). The color of each bar is the average RGB value of each object category
from the training data with saturation and brightness boosted for visualization. Bottom: The
spatial priors of the object categories in the database. White means zero and the saturated
color means high probability.

Chapter 5: Image Processing


5.1: Introduction to Image Processing

Image processing is a set of techniques to enhance raw images received from cameras/sensors
placed on space probes, aircraft, and satellites, or pictures taken in normal day-to-day life, for
various applications. An image is a rectangular graphical object. Image processing involves
issues related to image representation, compression techniques, and the various complex
operations that can be carried out on image data. Operations that come under image
processing include image enhancement operations such as sharpening, blurring, brightening,
and edge enhancement. Image processing is any form of signal processing for which the
input is an image, such as photographs or frames of video; the output of image processing
can be either an image or a set of characteristics or parameters related to the image. Most
image-processing techniques involve treating the image as a two-dimensional signal and
applying standard signal-processing techniques to it. Image processing usually refers to
digital image processing, but optical and analog image processing are also possible.

5.2: Image formats supported by MATLAB

The following image formats are supported by MATLAB:

 BMP
 HDF
 JPEG
 PCX
 TIFF
 XWD

Most images found on the Internet are JPEG images, JPEG being the name of one of the most
widely used compression standards for images. If you have stored an image, you can usually
tell from the file suffix what format it is stored in.
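
For example (a small sketch using a demo image that ships with MATLAB), a stored image
can be read and its format inspected as follows:

% Read a demo image and inspect its file format and dimensions.
info = imfinfo('peppers.png');      % file metadata, including the format
disp(info.Format)                   % e.g., 'png'
rgbImage = imread('peppers.png');   % load the pixel data
size(rgbImage)                      % rows x columns x color bands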

5.3: Image Representation

If an image is stored as a JPEG image on your disk, we first read it into MATLAB. However,
in order to start working with an image, for example to perform a wavelet transform on it,
we must convert it into a different format. This section explains the following common
formats.

5.3.1: Intensity image (gray scale image)


This is equivalent to a "gray scale image" and it is the image type we will mostly work with
in this course. It represents an image as a matrix where every element has a value
corresponding to how bright/dark the pixel at the corresponding position should be colored.
There are two ways to represent the number that encodes the brightness of a pixel. The
double class (or data type) assigns a floating-point number ("a number with decimals")
between 0 and 1 to each pixel; the value 0 corresponds to black and the value 1 corresponds
to white. The other class, called uint8, assigns an integer between 0 and 255 to represent the
brightness of a pixel; the value 0 corresponds to black and 255 to white. The class uint8 only
requires roughly 1/8 of the storage compared to the class double. On the other hand, many
mathematical functions can only be applied to the double class. We will see below how to
convert between double and uint8.
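
For example (a minimal sketch using a built-in demo image), im2double and im2uint8
convert between the two classes:

% Convert a gray scale image between the uint8 and double classes.
grayU8 = imread('cameraman.tif');   % uint8 image, values 0..255
grayD  = im2double(grayU8);         % double image, values 0..1
backU8 = im2uint8(grayD);           % back to uint8, values 0..255
isequal(grayU8, backU8)             % true: the round trip is lossless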

5.3.2: Binary image


This image format also stores an image as a matrix but can only color a pixel black or white
(and nothing in between). It assigns a 0 for black and a 1 for white.

5.3.3: Indexed image


This is a practical way of representing color images. (In this course we will mostly work
with gray scale images, but once you have learned how to work with a gray scale image you
will also know in principle how to work with color images.) An indexed image stores an
image as two matrices. The first matrix has the same size as the image, with one number for
each pixel. The second matrix is called the colormap and its size may be different from the
image. Each number in the first matrix is an index into a row of the colormap matrix, which
specifies the color to use for that pixel.
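
For example (a small sketch using a built-in demo image), an indexed image and its colormap
can be loaded and converted to RGB with ind2rgb:

% Load an indexed image with its colormap and convert it to RGB.
[X, map] = imread('trees.tif');   % X: index matrix, map: N-by-3 colormap
rgbImage = ind2rgb(X, map);       % double RGB image with values in 0..1
imshow(rgbImage)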

5.3.4: RGB image


This is another format for color images. It represents an image with three matrices whose
sizes match the image dimensions. Each matrix corresponds to one of the colors red, green,
or blue and specifies how much of that color a certain pixel should use.

5.3.5: Multiframe image

In some applications we want to study a sequence of images. This is very common in
biological and medical imaging, where you might study a sequence of slices of a cell. For
these cases, the multiframe format is a convenient way of working with a sequence of
images. In case you choose to work with biological imaging later on in this course, you may
use this format.

CHAPTER 6: SEGMENTATION

6.1: Introduction
Natural images consist of an overwhelming number of visual patterns generated by very
diverse stochastic processes in nature. The objective of image understanding is to parse an
input image into its constituent patterns. Depending on the type of patterns that a task is
interested in, the parsing problem is called, respectively: 1) image segmentation, for
homogeneous grey/color/texture region processes; 2) perceptual grouping, for point, curve,
and general graph processes; and 3) object recognition, for text and objects.

6.2: Segmentation
In computer vision, segmentation refers to the process of partitioning a digital image into
multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the
representation of an image into something that is more meaningful and easier to analyze.
Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in
images.
The result of image segmentation is a set of regions that collectively cover the entire image,
or a set of contours extracted from the image. Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as color, intensity, or texture.
Adjacent regions are significantly different with respect to the same characteristic(s).
Segmentation algorithms are generally based on one of two basic properties of intensity values:
Discontinuity: partitioning an image based on abrupt changes in intensity (such as edges).
Similarity: partitioning an image into regions that are similar according to a set of predefined
criteria.
For intensity images (i.e. those represented by point-wise intensity levels) four popular
approaches are: threshold techniques, edge-based methods, region-based techniques, and
connectivity-preserving relaxation methods.
Threshold techniques, which make decisions based on local pixel information, are effective
when the intensity levels of the objects fall squarely outside the range of levels in the
background. Because spatial information is ignored, however, blurred region boundaries can
create havoc. Edge-based methods center around contour detection: their weakness in
connecting broken contour lines makes them, too, prone to failure in the presence of
blurring.
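
As a small illustration of threshold-based segmentation (a sketch using a built-in demo image
and Otsu's method via graythresh; the cleanup step is optional):

% Threshold-based segmentation with a global Otsu threshold.
gray  = imread('coins.png');        % gray scale demo image
level = graythresh(gray);           % Otsu threshold, in the range 0..1
bw    = im2bw(gray, level);         % binary image: objects vs. background
bw    = bwareaopen(bw, 50);         % remove small spurious regions
figure; imshow(bw); title('Thresholded image');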
A region-based method usually proceeds as follows: the image is partitioned into connected
regions by grouping neighboring pixels of similar intensity levels, and adjacent regions are
then merged under some criterion involving, perhaps, homogeneity or the sharpness of region
boundaries. Overly stringent criteria create fragmentation; overly lenient ones overlook
blurred boundaries and over-merge. Hybrid techniques using a mix of the methods above are
also popular.
A connectivity-preserving relaxation-based segmentation method, usually referred to as the
active contour model, has also been proposed. The main idea is to start with some initial
boundary shape represented in the form of spline curves and to iteratively modify it by
applying various shrink/expansion operations according to some energy function. Although
the energy-minimizing model is not new, coupling it with the maintenance of an "elastic"
contour model gives it an interesting new twist. As usual with such methods, getting trapped
in a local minimum is a risk against which one must guard; this is no easy task.

6.3: Application of Image Segmentation


 Identifying objects in a scene for object-based measurements such as size and shape.
 Identifying objects in a moving scene for object-based video compression (MPEG-4).
 Locate tumors and other pathologies.
 Measure tissue volumes.
 Computer-guided surgery.
 Diagnosis.
 Treatment planning.
 Study of anatomical structure.
 Locate objects in satellite images (roads, forests, etc.).
 Face recognition.
 Fingerprint recognition.
 Traffic control systems.
 Brake light detection.
 Machine vision.


CHAPTER 7: DELTA E OR DELTA E*


7.1: Introduction

Delta E is the standard calculation metric that correlates with human visual judgment of the
difference between two perceived colors. The standard quantifies this difference and is used
to calculate the deviation from benchmark standards, which allows a tolerance level to be set
(based on L*a*b* coordinates). Generally speaking, the lower the Delta E number, the more
closely the display's reproduced color matches the input color. The Commission
Internationale de l'Eclairage (International Commission on Illumination), or CIE, has
established Delta E as the standard color-distance metric, revising past definitions to account
for the human eye being more sensitive to certain colors. To address these issues, in addition
to defining accepted tolerance levels, the CIELAB scale can be used, with the underlying
theory being that no color can be both red and green, nor both blue and yellow, at the same
time. With a lightness scale placed vertically in the center, colors can then be expressed with
single values.

The term Delta E or ∆E is used to describe color differences in the CIELAB color space. The
term stems from the Greek letter delta, which is used in science to denote difference. The E
stands for Empfindung, a German word meaning sensation; put together, Delta E means a
"difference in sensation."

7.2: The History of Delta E


The scientific community and color application industries have spent a great deal of time and
resources developing methods for quantifying the visual perception of color. Even before
industrial applications existed to utilize such knowledge, people explored human vision and
hypothesized how color is perceived. In the latter half of the twentieth century, the
Commission Internationale de l'Eclairage (International Commission on Illumination,
denoted CIE) made two formal recommendations for perceptual color spaces and color
difference formulas that could be utilized in industry to quantify and judge color. These color
spaces became known as CIELAB and CIELUV.

What led to the creation of these two color spaces and their associated color difference
formulas? It is a long and interesting history that began with Isaac Newton hypothesizing
about the origin of white light.

7.3: Early Theories of Color Vision


The science of color can be traced back to the times of Isaac Newton. Newton hypothesized
that white light was made up of 'rays of colored light.' The nineteenth century saw an
increased pace in research and the development of theories of color vision. Thomas Young,
James Clerk Maxwell, Hermann von Helmholtz, and Ewald Hering reported major theories
during this time. In 1807 Young described a trichromatic color theory in his Lectures on
Natural Philosophy and Mechanical Arts. Young's theory stated that all colors could be
created from three wavelengths, with red, green, and blue making up the dominant colors of
those wavelengths. Maxwell further quantified Young's theory in empirical studies.

In 1855 Helmholtz published his Handbook of Physiological Optics, in which he introduced
the concepts of hue, saturation, and brightness. Helmholtz was also a proponent of Young's
three-color theory. In 1878 Hering published On the Theory of Sensibility to Light, in which
he proposed, in opposition to the work of Young, Maxwell, and Helmholtz, a theory of
opposing colors. Hering's opponent color theory stated that some color primaries oppose each
other and cannot be combined to produce a perceivable color. These colors are red and green,
yellow and blue, and white and black.

It is known today that both Helmholtz and Hering were correct in their theories of human
color vision. The human vision system is complex and starts with receptors that are sensitive
to the long (L), medium (M), and short (S) wavelengths of the electromagnetic region that
make up visible light. The vision system then transforms these visual stimuli into opponent
color signals and sends them to the brain for processing.

7.4: Working

CIE L*, a*, b* color values provide a complete numerical descriptor of the color, in a
rectangular coordinate system.

 L* represents lightness, with 0 being a perfect black of 0% reflectance or transmission;
50 a middle gray; and 100 a perfect white of 100% reflectance or a perfect clear of 100%
transmission.
 a* represents the redness-greenness of the color. Positive values of a* are red; negative
values of a* are green; 0 is neutral.
 b* represents the yellowness-blueness of the color. Positive values of b* are yellow;
negative values of b* are blue; 0 is neutral.

Delta L*, delta a* and delta b* values provide a complete numerical descriptor of the
color differences between a Sample or lot and a Standard or target color.

 dL* represents the lightness difference between the sample and standard colors.
 da* represents the redness or greenness difference between the sample and standard
colors.
 db* represents the blueness-yellowness difference between the sample and standard
colors.

In the case of dL*, da*, db*, the higher the value, the greater the difference in that dimension.
Delta E* (Total Color Difference) is calculated based on delta L*, a*, b* color differences
and represents the distance of a line between the sample and standard.
In addition to quantifying the overall color difference between a sample and standard color,
delta E* was intended to be a single number metric for Pass/Fail tolerance decisions.
Effectively a delta E* tolerance value defines an acceptance sphere around the standard or
target color.
The lower the delta E* value is, the closer the sample is to the standard. A delta E* value of
0.00 means the color of the sample is identical to the color of the standard.
Please note that while CIE L*, a*, b* are being used in this example, delta E (no star)
represents the overall sample difference in Hunter L, a, b coordinates.
The "E" in delta E or delta E* is derived from "Empfindung", the German word for
sensation. Delta E means a "difference in sensation" for any delta E-type metric, CIE or
Hunter.
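
In other words, Delta E* (CIE 1976) is simply the Euclidean distance between the two colors
in L*a*b* space; a minimal MATLAB sketch with made-up sample and standard values is:

% Delta E (CIE 1976): Euclidean distance between two L*a*b* colors.
% The sample and standard values below are arbitrary illustrations.
labStandard = [52.0, 5.3, -10.1];          % standard (L*, a*, b*)
labSample   = [50.5, 6.0,  -8.7];          % sample   (L*, a*, b*)
d = labSample - labStandard;               % [dL*, da*, db*]
deltaE = sqrt(sum(d.^2));                  % total color difference
fprintf('Delta E* = %.2f\n', deltaE);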

7.5: Meaning of Delta E Number


The higher the Delta E (ΔE), the further the colour is from the true hue, using CIELAB. A
perfect colour match has a Delta E of zero, a difference that cannot be detected by the human
eye. The minimal detectable difference is between about 1 and 2.5 Delta E. Without a fully
realised colour management system, it is very unlikely that monitor calibration can achieve
this level of performance for primary and secondary colours. Most displays have a full
complement of grey-scale settings, so for white, calibration can often meet this standard. If
the Delta E between two colours that are not touching is less than 1, the difference is barely
perceivable by the average human observer. A Delta E between 3 and 6 is usually considered
acceptable in commercial reproduction, but the colour difference may still be perceived by
printing and graphics professionals. (Note: human vision is more sensitive to colour
differences when two colours actually touch each other.)
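
These rules of thumb can be wrapped in a small helper (a sketch; describeDeltaE is our own
illustrative function, and the thresholds are the ones quoted above):

% Rough interpretation of a Delta E value using the thresholds quoted above.
function label = describeDeltaE(dE)
    if dE < 1
        label = 'barely perceivable by the average observer';
    elseif dE <= 2.5
        label = 'around the minimal detectable difference';
    elseif dE <= 6
        label = 'usually acceptable in commercial reproduction';
    else
        label = 'clearly perceptible difference';
    end
end
% Example: describeDeltaE(4.2) returns 'usually acceptable in commercial reproduction'.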

7.6: Importance Of Delta E


 Colour Accuracy: With a lower Delta E, the image colour from input signals, like
cameras and camcorders, can be more accurate while displayed on the monitor,
without colour distortion. This is incredibly important for professionals that need to
have exact colour replication.

 No colour difference between multiple monitors: Professionals may use multiple
monitors for creating graphics. On one monitor, graphics and video can be edited and
then reviewed on a second, without colour degradation or inaccuracies. With a lower
Delta E, professionals are able to have accurate and consistent colour reproduction.

7.7: Color Difference Revisions


While the goal of CIELAB was to create a perceptually uniform color space that could be
used for specifying color differences, it was found that some non-uniformities in the space
caused color-matching problems in industrial applications of the color space.
CMC Color Difference Formula

The Color Measurement Committee (CMC) of the Society of Dyers and Colourists published
a new equation for determining color differences in 1986. This equation became known as
the ∆E_CMC color difference formula. The CMC's goal was to derive a formula that better
handled the small color differences found in the colorant industries.[4] This was accomplished
by adding weighting factors to the equations that made them correlate better with what the
eye senses. The equation uses ellipsoids to create weighting factors for the lightness and
chroma factors.[6]

It is known that changes in lightness are harder to perceive than changes in chroma. ∆E_CMC
takes this into account by introducing a lightness-to-chroma ratio, traditionally set to 2:1; hue
is assigned a constant weight of 1.[6] The CMC color difference equation has been adopted
by the colorant industry and by the graphic arts as a more accurate color difference equation,
and it became an ISO standard for the textile industry in 1995.

CHAPTER 8: SYSTEM DESIGN


In this section, we will describe each module of our nonparametric scene parsing system.

8.1: Scene Retrieval

The objective of scene retrieval is to retrieve a set of nearest neighbors in the database for a
given query image. There exist several ways of defining a nearest neighbor set. The most
common definition consists of taking the K closest points to the query (K-NN). Another
model, ε-NN, widely used in texture synthesis, considers all of the neighbors within (1 + ε)
times the minimum distance from the query. We generalize these two types to ⟨K, ε⟩-NN, and
define it as

    N(x) = { y_i | dist(x, y_i) ≤ (1 + ε) · dist(x, y_1),  y_1 = arg min_i dist(x, y_i),  i ≤ K }.    (1)

As ε → ∞, ⟨K, ∞⟩-NN reduces to K-NN. As K → ∞, ⟨∞, ε⟩-NN reduces to ε-NN. The ⟨K, ε⟩-NN
representation, however, gives us the flexibility to deal with the density variation of the graph,
as shown in Fig. 5. We will show how K affects the performance in the experimental section.
In practice, we found that ε = 5 is a good parameter, and we use it throughout our experiments.
Nevertheless, a dramatic improvement of ⟨K, ε⟩-NN over K-NN is not expected, as sparse
samples are few in our databases.
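
A minimal sketch of this ⟨K, ε⟩-NN selection rule (assuming a precomputed vector of
distances from the query to every database image; the function name is ours):

% Select the <K, epsilon>-nearest neighbors from precomputed distances.
% dists: 1xN vector of distances from the query to each database image.
function idx = kEpsNN(dists, K, epsilon)
    [sorted, order] = sort(dists, 'ascend');   % closest images first
    K = min(K, numel(sorted));
    topK = order(1:K);                         % the K closest images...
    % ...restricted to those within (1 + epsilon) of the minimum distance.
    idx = topK(sorted(1:K) <= (1 + epsilon) * sorted(1));
end
% Example: idx = kEpsNN(gistDistances, 50, 5);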

We have not yet defined the distance function dist(·,·) between two images. Measuring
image similarities/distances is still an active research area; a systematic study of image
features for scene recognition can be found in the literature. In this paper, three distances are
used: the Euclidean distance of GIST, the spatial pyramid histogram intersection of HOG
visual words, and the spatial pyramid histogram intersection of the ground-truth annotation.
For the HOG distance, we use the standard pipeline of computing HOG features on a dense
grid and quantizing the features to visual words over a set of images using k-means clustering.
The ground truth-based distance metric is used to estimate an upper bound of our system for
evaluation purposes. Both the HOG and the ground-truth distances are computed in the same
manner. The ground-truth distance is computed by building histograms of pixel-wise labels.
To include spatial information, the histograms are computed by dividing an image into 2×2
windows and concatenating the four histograms into a single vector. Histogram intersection
is used to compute the ground-truth distance. We obtain the HOG distance by replacing pixel-
wise labels with HOG visual words.
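
A minimal sketch of this spatial (2×2) histogram-intersection similarity for ground-truth
label maps (the function name is ours; the corresponding distance can be taken as one minus
this value):

% Spatial histogram intersection between two integer label maps.
% lm1, lm2: HxW annotation images with labels in 1..L.
function sim = spatialHistIntersection(lm1, lm2, L)
    [H, W] = size(lm1);
    rows = {1:floor(H/2), floor(H/2)+1:H};
    cols = {1:floor(W/2), floor(W/2)+1:W};
    h1 = []; h2 = [];
    for r = 1:2
        for c = 1:2
            % Histogram of labels within each of the 2x2 windows.
            h1 = [h1, histc(reshape(lm1(rows{r}, cols{c}), 1, []), 1:L)];
            h2 = [h2, histc(reshape(lm2(rows{r}, cols{c}), 1, []), 1:L)];
        end
    end
    h1 = h1 / sum(h1);  h2 = h2 / sum(h2);       % normalize
    sim = sum(min(h1, h2));                      % histogram intersection
end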

In Fig. 4, we show the importance of the distance metric, as it defines the neighborhood
structure of the large image database. We randomly selected 200 images from the LMO
database and computed pair-wise image distances using GIST (top) and the ground-truth
annotation (bottom). Then, we use multidimensional scaling (MDS) to map these images to
points on a 2D grid for visualization. Although the GIST descriptor is able to form a
reasonably meaningful image space where semantically similar images are clustered, the
image space defined by the ground-truth annotation truly reveals the underlying structure
of the image database. This will be further examined in the experimental section.

Fig. 4. The structure of a database depends on the image distance metric. Top: The ⟨K, ε⟩-NN graph of the
LabelMe Outdoor database visualized by scaled MDS using the GIST feature as distance. Bottom: The
⟨K, ε⟩-NN graph of the same database visualized using the pyramid histogram intersection of the ground-truth
annotation as distance. Left: RGB images; right: annotation images. Notice how the ground-truth annotation
emphasizes the underlying structure of the database. In (c) and (d), we see that the image content changes
from urban streets (right), to highways (middle), to nature scenes (left) as we pan from right to left. Eight
hundred images are randomly selected from LMO for this visualization.

8.2: SIFT Flow for Dense Scene Alignment

As our goal is to transfer the labels of existing samples to parse an input image, it is essential
to find dense correspondences between images across scenes. In our previous work, we
demonstrated that SIFT flow is capable of establishing semantically meaningful
correspondences between two images by matching local SIFT descriptors. We further
extended SIFT flow into a hierarchical computational framework to improve its performance.
In this section, we provide a brief explanation of the algorithm; please refer to [28] for a
detailed description.

Similarly to optical flow, the task of SIFT flow is to find dense correspondence between
two images. Let p = (x, y) denote the spatial coordinate of a pixel, and w(p) = (u(p), v(p)) the
flow vector at p. Denote by s1 and s2 the per-pixel SIFT descriptors [30] of the two images
(SIFT descriptors are computed at each pixel over a 16×16 window divided into 4×4 cells,
with the image gradients in each cell quantized into an 8-bin histogram, so the pixel-wise
SIFT feature is a 128-D vector), and let ε contain all the spatial neighborhoods (a
four-neighbor system is used). The energy function for SIFT flow is defined as:

    E(w) = Σ_p min( ‖s1(p) − s2(p + w(p))‖_1 , t )                                   (2)
         + Σ_p η ( |u(p)| + |v(p)| )                                                 (3)
         + Σ_{(p,q)∈ε} [ min( α|u(p) − u(q)| , d ) + min( α|v(p) − v(q)| , d ) ],    (4)
which contains a data term, a small-displacement term, and a smoothness term (a.k.a. spatial
regularization). The data term in (2) constrains the SIFT descriptors to be matched along the
flow vector w(p). The small-displacement term in (3) constrains the flow vectors to be as
small as possible when no other information is available. The smoothness term in (4)
constrains the flow vectors of adjacent pixels to be similar. In this objective function,
truncated L1 norms are used in both the data term and the smoothness term to account for
matching outliers and flow discontinuities, with t and d as the respective thresholds (η and α
weight the small-displacement and smoothness terms).
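
To make the objective concrete, here is a small sketch that evaluates the energy (2)-(4) for a
given flow field; the per-pixel descriptors, flow, and parameter values are assumed inputs,
and this is only an illustration, not the optimizer used by the system:

% Evaluate the SIFT flow energy (2)-(4) for a given flow field.
% s1, s2: HxWxD per-pixel descriptors; u, v: HxW flow components.
function E = siftFlowEnergy(s1, s2, u, v, t, d, eta, alpha)
    [H, W, ~] = size(s1);
    [X, Y] = meshgrid(1:W, 1:H);
    Xt = min(max(X + round(u), 1), W);   % clamp target coordinates
    Yt = min(max(Y + round(v), 1), H);
    % Data term (2): truncated L1 distance between matched descriptors.
    dataTerm = 0;
    for yy = 1:H
        for xx = 1:W
            diffVec = s1(yy, xx, :) - s2(Yt(yy, xx), Xt(yy, xx), :);
            dataTerm = dataTerm + min(sum(abs(diffVec(:))), t);
        end
    end
    % Small-displacement term (3).
    smallDisp = eta * sum(abs(u(:)) + abs(v(:)));
    % Smoothness term (4) over the four-neighbor system.
    smoothTerm = sum(sum(min(alpha * abs(u(:, 1:end-1) - u(:, 2:end)), d))) + ...
                 sum(sum(min(alpha * abs(v(:, 1:end-1) - v(:, 2:end)), d))) + ...
                 sum(sum(min(alpha * abs(u(1:end-1, :) - u(2:end, :)), d))) + ...
                 sum(sum(min(alpha * abs(v(1:end-1, :) - v(2:end, :)), d)));
    E = dataTerm + smallDisp + smoothTerm;
end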

While SIFT flow has demonstrated the potential for aligning images across scenes [29],
the original implementation scales poorly with respect to the image size. In SIFT flow, a
pixel in one image can literally match to any other pixel in another image. Suppose the image
has h² pixels; then the time and space complexity of the belief propagation algorithm used to
estimate the SIFT flow is O(h⁴). As reported in [29], the computation time for 145×105
images with an 80×80 searching neighborhood is 50 seconds. The original implementation
of SIFT flow would require more than two hours to process a pair of 256×256 images in our
database, with a memory usage of 16 GB to store the data term. To address this performance
drawback, a coarse-to-fine SIFT flow matching scheme was designed to significantly improve
the performance. As illustrated in Fig. 6, the basic idea consists of estimating the flow at a
coarse level of the image grid and then gradually propagating and refining the flow from
coarse to fine; please refer to [28] for details. As a result, the complexity of this coarse-to-fine
algorithm is O(h² log h), a significant speedup compared to O(h⁴). The matching between two
256×256 images takes 31 seconds on a workstation with two quad-core 2.67 GHz Intel Xeon
CPUs and 32 GB of memory, in a C++ implementation. We also discovered that the
coarse-to-fine scheme not only runs significantly faster, but also achieves lower energies most
of the time compared to the ordinary matching algorithm.

Fig. 5. An image database can be nonuniform, as illustrated by these random 2D points. The green node (A)
is surrounded densely by neighbors, whereas the red node (B) resides in a sparse area. If we use K-NN
(K = 5), then some samples (orange nodes) far away from the query (B) can be chosen as neighbors. If,
instead, we use ε-NN and choose the radius as shown in the picture, then there can be too many neighbors
for a sample such as (A). The combination, ⟨K, ε⟩-NN, shown as gray edges, provides a good balance between
these two criteria.

Fig. 6. An illustration of our coarse-to-fine pyramid SIFT flow matching. The green square denotes the
searching window for p_k at each pyramid level k. For simplicity, only one image is shown here, where p_k
is on image s1 and c_k and w(p_k) are on image s2. The details of the algorithm can be found in [28].

Fig. 7. For a query image, we first find a ⟨K, ε⟩-nearest neighbor set in the database using GIST matching
[34]. The nearest neighbors are reranked using SIFT flow matching scores, and form a top M voting
candidate set. The annotations are transferred from the voting candidates to parse the query image.

Some SIFT flow examples are shown in Fig. 8, where dense SIFT flow fields (Fig. 8f) are
obtained between the query images (Fig. 8a) and the nearest neighbors (Fig. 8c). It is easy to
verify that the warped SIFT images (Fig. 8h) based on the SIFT flows (Fig. 8f) look very
similar to the SIFT images (Fig. 8b) of the inputs (Fig. 8a), and that the SIFT flow fields
(Fig. 8f) are piecewise smooth. The essence of SIFT flow is manifested in Fig. 8g, where the
same flow field is applied to warp the RGB image of the nearest neighbor to the query. SIFT
flow tries to hallucinate the structure of the query image by smoothly shuffling the pixels of
the nearest neighbors. Because of the intrinsic similarities within each object category, it is
not surprising that, through aligning image structures, objects of the same category are often
matched. In addition, it is worth noting that one object in the nearest neighbor can correspond
to multiple objects in the query, since the flow is asymmetric. This allows labels to be reused
to parse multiple object instances.

8.3: Scene Parsing through Label Transfer

Now that we have a large database of annotated images and a technique for establishing
dense correspondences across scenes, we can transfer the existing annotations to parse a
query image through dense scene alignment. For a given query image, we retrieve a set of
⟨K, ε⟩-nearest neighbors in our database using GIST matching [34]. We then compute the
SIFT flow from the query to each nearest neighbor, and use the achieved minimum energy
(defined in (2)-(4)) to rerank the ⟨K, ε⟩-nearest neighbors. We further select the top M-ranked
retrievals (M ≤ K) to create our voting candidate set. This voting set will be used to transfer
its contained annotations to the query image. This procedure is illustrated in Fig. 7.

Under this setup, scene parsing can be formulated as the following label transfer problem:
For a query image I with its corresponding SIFT image s, we have a set of voting candidates
{s_i, c_i, w_i}, i = 1, ..., M, where s_i, c_i, and w_i are the SIFT image, annotation, and SIFT
flow field (from s to s_i) of the ith voting candidate, respectively. c_i is an integer image
where c_i(p) ∈ {1, ..., L} is the index of the object category for pixel p. We want to obtain
the annotation c for the query image by transferring c_i to the query image according to the
dense correspondence w_i.
We build a probabilistic Markov random field model to integrate multiple labels, prior
information on the object categories, and spatial smoothness of the annotation to parse
image I. Similarly to [43], the posterior probability is defined (in negative log form) as

    −log P( c | I, s, {s_i, c_i, w_i} ) = Σ_p ψ( c(p); s, {s_i} ) + α Σ_p λ( c(p) ) + β Σ_{(p,q)∈ε} φ( c(p), c(q); I ) + log Z,

where Z is the normalization constant of the probability. This posterior contains three
components: likelihood, prior, and spatial smoothness.

The likelihood term is defined as

    ψ( c(p) = l ) = min_{i ∈ Ω_{p,l}} ‖ s(p) − s_i(p + w_i(p)) ‖   if Ω_{p,l} ≠ ∅,   and   ψ( c(p) = l ) = τ   if Ω_{p,l} = ∅,

where Ω_{p,l} = { i : c_i(p + w_i(p)) = l }, l = 1, ..., L, is the index set of the voting candidates
whose label is l after being warped to pixel p. τ is set to the value of the maximum difference
of SIFT features: τ = max_{s1, s2, p} ‖ s1(p) − s2(p) ‖.

The prior term λ( c(p) = l ) indicates the prior probability that object category l appears at
pixel p. This is obtained by counting the occurrence of each object category at each location
in the training set:

    λ( c(p) = l ) = −log hist_l(p),

where hist_l(p) is the spatial histogram of object category l.

The smoothness term φ is defined to bias neighboring pixels toward having the same label
when no other information is available, and the penalty depends on the edges of the image:
the stronger the luminance edge between two neighboring pixels, the more likely it is that
they take different labels.

Notice that the energy function is controlled by four parameters: K and M, which decide the
mode of the model, and α and β, which control the influence of the spatial prior and the
smoothness. Once the parameters are fixed, we again use the BP-S algorithm to minimize the
energy. The algorithm converges in two seconds on a workstation with two quad-core
2.67 GHz Intel Xeon CPUs. A significant difference between our model and that in [43] is
that we have fewer parameters because of the nonparametric nature of our approach, whereas
classifiers were trained in [43]. In addition, color information is not included in our model at
present, as the color distribution of each object category is diverse in our databases.
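
As an illustration of the likelihood/voting step alone (ignoring the prior and the smoothness
terms, i.e., a per-pixel winner-take-all over the warped candidates; the array names and the
simple L1 distance are our own choices):

% Per-pixel label voting from warped candidates (sketch; the prior and the
% MRF smoothness term are omitted, so this is only the likelihood part).
% s:       HxWxD SIFT image of the query.
% warpedS: HxWxDxM warped candidate SIFT images.
% warpedC: HxWxM warped candidate annotations, values in 1..L.
% tau:     cost assigned when no candidate votes for a label at a pixel.
function labelMap = voteLabels(s, warpedS, warpedC, L, tau)
    [H, W, ~, M] = size(warpedS);
    cost = tau * ones(H, W, L);
    for i = 1:M
        d = sum(abs(double(warpedS(:, :, :, i)) - double(s)), 3);  % SIFT distance
        for l = 1:L
            mask = (warpedC(:, :, i) == l);
            cl = cost(:, :, l);
            cl(mask) = min(cl(mask), d(mask));   % best vote for label l
            cost(:, :, l) = cl;
        end
    end
    [~, labelMap] = min(cost, [], 3);            % per-pixel minimum-cost label
end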
Fig. 8. System overview. For a query image, our system uses scene retrieval techniques such as [34] to find
⟨K, ε⟩-nearest neighbors in our database. We apply coarse-to-fine SIFT flow to align the query image to the
nearest neighbors, and obtain the top M as voting candidates (M = 3 here). (c), (d), (e): The RGB image, SIFT
image, and user annotation of the voting candidates. (f): The inferred SIFT flow field, visualized using the
color scheme shown on the left (hue: orientation; saturation: magnitude). (g), (h), and (i) are the warped
versions of (c), (d), (e) with respect to the SIFT flow in (f). Notice the similarity between (a) and (g), (b) and
(h). Our system combines the voting from multiple candidates and generates the scene parsing in (j) by
optimizing the posterior. (k): The ground-truth annotation of (a).
CHAPTER 9: RESULTS

EXPECTED OUTPUT:
Output 1: A blue-colored portion of the beach.
Output 2: An onion.
CONCLUSION

We have presented a nonparametric scene parsing system to integrate and transfer the
annotations from a large database to an input image via dense scene alignment. A
coarse-to-fine SIFT flow matching scheme is proposed to reliably and efficiently establish
dense correspondences between images across scenes. Using the dense scene
correspondences, we warp the pixel labels of the existing samples to the query. Furthermore,
we integrate multiple cues to segment and recognize the query image into the object
categories in the database. Promising results have been achieved by our scene alignment and
parsing system on challenging databases. Compared to existing approaches that require
training for each object category, our nonparametric scene parsing system is easy to
implement, has only a few parameters, and embeds contextual information naturally in the
retrieval/alignment procedure.
FUTURE SCOPE

The future of image processing and scene parsing will involve scanning the heavens for other
intelligent life out in space. New intelligent digital species, created entirely by research
scientists in various nations of the world, will also draw on advances in image processing
applications. Due to advances in image processing and related technologies, there will be
millions and millions of robots in the world in a few decades' time, transforming the way the
world is managed. Advances in image processing and artificial intelligence will involve
spoken commands, anticipating the information requirements of governments, translating
languages, recognizing and tracking people and things, diagnosing medical conditions,
performing surgery, reprogramming defects in human DNA, and automatically driving all
forms of transport. With the increasing power and sophistication of modern computing, the
concept of computation can go beyond the present limits; in the future, image processing
technology will advance and the visual system of man may be replicated. The future trend in
remote sensing will be toward improved sensors that record the same scene in many spectral
channels. Graphics data is becoming increasingly important in image processing applications.
The future image processing applications of satellite-based imaging range from planetary
exploration to surveillance applications.
REFERENCES

1. E.H. Adelson, “On Seeing Stuff: The Perception of Materials by Humans and
Machines,” Proc. SPIE, vol. 4299, pp. 1-12, 2001.
2. S. Belongie, J. Malik, and J. Puzicha, “Shape Context: A New Descriptor for
Shape Matching and Object Recognition,” Proc. Advances in Neural
Information Processing Systems, 2000.
3. A. Berg, T. Berg, and J. Malik, “Shape Matching and Object Recognition
Using Low Distortion Correspondence,” Proc. IEEE Conf. Computer Vision
and Pattern Recognition, 2005
4. I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and
Applications, second ed. Springer-Verlag, 2005.
5. M.J. Choi, J.J. Lim, A. Torralba, and A. Willsky, “Exploiting Hierarchical
Context on a Large Database of Object Categories,” Proc. IEEE Conf.
Computer Vision and Pattern Recognition, 2010.
6. D. Crandall, P. Felzenszwalb, and D. Huttenlocher, “Spatial Priors for Part-
Based Recognition Using Statistical Models,” Proc. IEEE Conf. Computer
Vision and Pattern Recognition, 2005.
7. G. Edwards, T. Cootes, and C. Taylor, “Face Recognition Using Active
Appearance Models,” Proc. European Conf. Computer Vision, 1998.
8. P. Felzenszwalb and D. Huttenlocher, “Pictorial Structures for Object
Recognition,” Int’l J. Computer Vision, vol. 61, no. 1, pp. 55-79, 2005.
9. G. Edwards, T. Cootes, and C. Taylor, “Face Recognition Using Active
Appearance Models,” Proc. European Conf. Computer Vision, 1998.

10. A.A. Efros and T. Leung, “Texture Synthesis by Non-Parametric
Sampling,” Proc. IEEE Int’l Conf. Computer Vision, 1999.
11. R. Fergus, P. Perona, and A. Zisserman, “Object Class Recognition by
Unsupervised Scale-Invariant Learning,” Proc. IEEE Conf. Computer Vision
and Pattern Recognition, 2003.

12. A. Frome, Y. Singer, and J. Malik, “Image Retrieval and Classification Using
Local Distance Functions,” Proc. Advances in Neural Information Processing
Systems, 2006.

13. A. Gupta and L.S. Davis, “Beyond Nouns: Exploiting Prepositions and
Comparative Adjectives for Learning Visual Classifiers,” Proc. European
Conf. Computer Vision, 2008.

14. J. Hays and A.A. Efros, “Scene Completion Using Millions of
Photographs,” ACM Trans. Graphics, vol. 26, no. 3, 2007.
15. B.C. Russell, A. Torralba, K.P. Murphy, and W.T. Freeman, “LabelMe: A
Database and Web-Based Tool for Image Annotation,” Int’l J. Computer
Vision, vol. 77, nos. 1-3, pp. 157-173, 2008.
APPENDICES

PROGRAM:

% The RGB image is converted to LAB color space and then the user draws
% some freehand-drawn irregularly shaped region to identify a color.
% The Delta E (the color difference in LAB color space) is then calculated
% for every pixel in the image between that pixel's color and the average
% LAB color of the drawn region. The user can then specify a number that
% says how close to that color would they like to be. The software will
% then find all pixels within that specified Delta E of the color of the drawn
% region.

function DeltaE()
clc; % Clear command window.
clear; % Delete all variables.
close all; % Close all figure windows except those created by imtool.
% imtool close all; % Close all figure windows created by imtool.
workspace; % Make sure the workspace panel is showing.

% Change the current folder to the folder of this m-file.


if(~isdeployed)
cd(fileparts(which(mfilename))); % From Brett
end

try
% Check that user has the Image Processing Toolbox installed.
hasIPT = license('test', 'image_toolbox');
if ~hasIPT
% User does not have the toolbox installed.
message = sprintf('Sorry, but you do not seem to have the Image Processing Toolbox.\nDo you want to try to continue anyway?');
reply = questdlg(message, 'Toolbox missing', 'Yes', 'No', 'Yes');
if strcmpi(reply, 'No')
% User said No, so exit.
return;
end
end

% Continue with the demo. Do some initialization stuff.


close all;
fontSize = 14;
figure;
% Maximize the figure.
set(gcf, 'Position', get(0, 'ScreenSize'));
set(gcf,'name','Color Matching Demo by ImageAnalyst','numbertitle','off')

% Change the current folder to the folder of this m-file.


% (The line of code below is from Brett Shoelson of The Mathworks.)
if(~isdeployed)
cd(fileparts(which(mfilename)));
end

% Ask user if they want to use a demo image or their own image.
message = sprintf('Do you want to use a standard demo image,\nor pick one of your own?');
reply2 = questdlg(message, 'Which Image?', 'Demo','My Own', 'Demo');
% Open an image.
if strcmpi(reply2, 'Demo')
% Read standard MATLAB demo image.
message = sprintf('Which demo image do you want to use?');
selectedImage = questdlg(message, 'Which Demo Image?', 'Onions', 'Peppers', 'Stained Fabric', 'Onions');
if strcmp(selectedImage, 'Onions')
fullImageFileName = 'onion.png';
elseif strcmp(selectedImage, 'Peppers')
fullImageFileName = 'peppers.png';
else
fullImageFileName = 'fabric.png';
end
else
% They want to pick their own.
% Change default directory to the one containing the standard demo images for the MATLAB Image Processing Toolbox.
originalFolder = pwd;
folder = fullfile(matlabroot, '\toolbox\images\imdemos');
if ~exist(folder, 'dir')
folder = pwd;
end
cd(folder);
% Browse for the image file.
[baseFileName, folder] = uigetfile('*.*', 'Specify an image file');
fullImageFileName = fullfile(folder, baseFileName);
% Set current folder back to the original one.
cd(originalFolder);
selectedImage = 'My own image'; % Needed for the threshold selection statement later.
end

% Check to see that the image exists. (Mainly to check on the demo images.)
if ~exist(fullImageFileName, 'file')
message = sprintf('This file does not exist:\n%s', fullImageFileName);
WarnUser(message);
return;
end

% Read in image into an array.


[rgbImage, storedColorMap] = imread(fullImageFileName);
[rows, columns, numberOfColorBands] = size(rgbImage);
% If it's monochrome (indexed), convert it to color.
% Check to see if it's an 8-bit image (needed later for scaling).
if strcmpi(class(rgbImage), 'uint8')
% Flag for 256 gray levels.
eightBit = true;
else
eightBit = false;
end
if numberOfColorBands == 1
if isempty(storedColorMap)
% Just a simple gray level image, not indexed with a stored color map.
% Create a 3D true color image where we copy the monochrome image into all 3 (R, G, & B) color planes.
rgbImage = cat(3, rgbImage, rgbImage, rgbImage);
else
% It's an indexed image.
rgbImage = ind2rgb(rgbImage, storedColorMap);
% ind2rgb() will convert it to double and normalize it to the range 0-1.
% Convert back to uint8 in the range 0-255, if needed.
if eightBit
rgbImage = uint8(255 * rgbImage);
end
end
end

% Display the original image.


h1 = subplot(3, 4, 1);
imshow(rgbImage);
drawnow; % Make it display immediately.
if numberOfColorBands > 1
title('Original Color Image', 'FontSize', fontSize);
else
caption = sprintf('Original Indexed Image\n(converted to true color with its stored colormap)');
title(caption, 'FontSize', fontSize);
end

% Let user outline region over rgb image.


% [xCoords, yCoords, roiPosition] = DrawBoxRegion(h1); % Draw a box.
mask = DrawFreehandRegion(h1, rgbImage); % Draw a freehand, irregularly-shaped region.

% Mask the image.
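% (Note, added for clarity: bsxfun() below multiplies every color channel of the
% RGB image, element by element, by the 0/1 mask cast to the image's own class,
% so pixels outside the drawn region become black while pixels inside keep
% their original color.)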


maskedRgbImage = bsxfun(@times, rgbImage, cast(mask, class(rgbImage)));
% Display it.
subplot(3, 4, 5);
imshow(maskedRgbImage);
title('The Region You Drew', 'FontSize', fontSize);

% Convert image from RGB colorspace to lab color space.


cform = makecform('srgb2lab');
lab_Image = applycform(im2double(rgbImage),cform);
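% (Note, added for clarity: with a double input image, applycform() with the
% 'srgb2lab' transform returns CIE L*a*b* values, with L* on a 0-100 scale and
% a*/b* typically somewhere in roughly the -100 to +100 range.)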

% Extract out the color bands from the original image
% into 3 separate 2D arrays, one for each color component.
LChannel = lab_Image(:, :, 1);
aChannel = lab_Image(:, :, 2);
bChannel = lab_Image(:, :, 3);

% Display the lab images.


subplot(3, 4, 2);
imshow(LChannel, []);
title('L Channel', 'FontSize', fontSize);
subplot(3, 4, 3);
imshow(aChannel, []);
title('a Channel', 'FontSize', fontSize);
subplot(3, 4, 4);
imshow(bChannel, []);
title('b Channel', 'FontSize', fontSize);

% Get the average lab color value.


[LMean, aMean, bMean] = GetMeanLABValues(LChannel, aChannel, bChannel, mask);

% Get box coordinates and get mean within the box.


% x1 = round(roiPosition(1));
% x2 = round(roiPosition(1) + roiPosition(3) - 1);
% y1 = round(roiPosition(2));
% y2 = round(roiPosition(2) + roiPosition(4) - 1);
%
% LMean = mean2(LChannel(y1:y2, x1:x2))
% aMean = mean2(aChannel(y1:y2, x1:x2))
% bMean = mean2(bChannel(y1:y2, x1:x2))

% Make uniform images of only that one single LAB color.


LStandard = LMean * ones(rows, columns);
aStandard = aMean * ones(rows, columns);
bStandard = bMean * ones(rows, columns);

% Create the delta images: delta L, delta A, and delta B.


deltaL = LChannel - LStandard;
deltaa = aChannel - aStandard;
deltab = bChannel - bStandard;

% Create the Delta E image.


% This is an image that represents the color difference.
% Delta E is the square root of the sum of the squares of the delta images.
deltaE = sqrt(deltaL .^ 2 + deltaa .^ 2 + deltab .^ 2);
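% (Note, added for clarity: this is the CIE76 color-difference formula, often
% written Delta E*ab. On this scale a Delta E of roughly 2 is commonly cited as
% a just-noticeable difference, which is a useful reference when choosing the
% tolerance below.)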

% Mask it to get the Delta E in the mask region only.


maskedDeltaE = deltaE .* mask;
% Get the mean delta E in the mask region
% Note: deltaE(mask) is a 1D vector of ONLY the pixel values within the masked area.
meanMaskedDeltaE = mean(deltaE(mask));
% Get the standard deviation of the delta E in the mask region
stDevMaskedDeltaE = std(deltaE(mask));
message = sprintf('The mean LAB = (%.2f, %.2f, %.2f).\nThe mean Delta E in the masked region is %.2f +/- %.2f', ...
LMean, aMean, bMean, meanMaskedDeltaE, stDevMaskedDeltaE);

% Display the masked Delta E image - the delta E within the masked region only.
subplot(3, 4, 6);
imshow(maskedDeltaE, []);
caption = sprintf('Delta E between image within masked region\nand mean color within masked region.\n(With amplified intensity)');
title(caption, 'FontSize', fontSize);

% Display the Delta E image - the delta E over the entire image.
subplot(3, 4, 7);
imshow(deltaE, []);
caption = sprintf('Delta E Image\n(Darker = Better Match)');
title(caption, 'FontSize', fontSize);

% Plot the histograms of the Delta E color difference image,
% both within the masked region, and for the entire image.
PlotHistogram(deltaE(mask), deltaE, [3 4 8], 'Histograms of the 2 Delta E Images');

message = sprintf('%s\n\nRegions close in color to the color you picked\nwill be dark in the Delta E image.\n', message);
msgboxw(message);

% Find out how close the user wants to match the colors.
prompt = {sprintf('First, examine the histogram.\nThen find pixels within this Delta E (from the average color in the region you drew):')};
dialogTitle = 'Enter Delta E Tolerance';
numberOfLines = 1;
% Set the default tolerance to be the mean delta E in the masked region plus three standard deviations.
strTolerance = sprintf('%.1f', meanMaskedDeltaE + 3 * stDevMaskedDeltaE);
defaultAnswer = {strTolerance}; % Suggest this number to the user.
response = inputdlg(prompt, dialogTitle, numberOfLines, defaultAnswer);
% Update tolerance with user's response.
tolerance = str2double(cell2mat(response));
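% (Guard added for robustness; not part of the original demo. If the user
% cancelled the dialog or typed something non-numeric, str2double() yields NaN,
% so fall back to the suggested default tolerance.)
if isempty(response) || isnan(tolerance)
tolerance = meanMaskedDeltaE + 3 * stDevMaskedDeltaE;
end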

% Let them interactively select the threshold with the threshold() m-file.
% (Note: This is a separate function in a separate file in my File Exchange.)
% threshold(deltaE);

% Place a vertical bar at the threshold location.


handleToSubPlot8 = subplot(3, 4, 8); % Get the handle to the plot.
PlaceVerticalBarOnPlot(handleToSubPlot8, tolerance, [0 .5 0]); % Put a vertical dark green line there.

% Find pixels within that delta E.


binaryImage = deltaE <= tolerance;
subplot(3, 4, 9);
imshow(binaryImage, []);
title('Matching Colors Mask', 'FontSize', fontSize);

% Mask the image with the matching colors and extract those pixels.
matchingColors = bsxfun(@times, rgbImage, cast(binaryImage, class(rgbImage)));
subplot(3, 4, 10);
imshow(matchingColors);
caption = sprintf('Matching Colors (Delta E <= %.1f)', tolerance);
title(caption, 'FontSize', fontSize);

% Mask the image with the NON-matching colors and extract those pixels.
nonMatchingColors = bsxfun(@times, rgbImage, cast(~binaryImage, class(rgbImage)));
subplot(3, 4, 11);
imshow(nonMatchingColors);
caption = sprintf('Non-Matching Colors (Delta E > %.1f)', tolerance);
title(caption, 'FontSize', fontSize);
% Display credits: the MATLAB logo and my name.
ShowCredits(); % Display logo in plot position #12.

% Alert user that the demo has finished.


message = sprintf('Done!\n\nThe demo has finished.\nRegions close in color to the color you picked\nwill be dark in the Delta E image.\n');
msgbox(message);

catch ME
errorMessage = sprintf('Error running this m-file:\n%s\n\nThe error message is:\n%s', ...
mfilename('fullpath'), ME.message);
errordlg(errorMessage);
end
return; % from DeltaE()
% ---------- End of main function ---------------------------------

%----------------------------------------------------------------------------
% Display the MATLAB logo.
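% (The body of ShowCredits() was not included in this listing; the stub below is
% a minimal stand-in so the demo runs end-to-end. It draws the classic MATLAB
% membrane "logo" surface in plot position #12; the original may have displayed
% an actual logo image instead.)
function ShowCredits()
subplot(3, 4, 12);
surf(membrane(1), 'EdgeColor', 'none'); % The L-shaped membrane surface used in the MATLAB logo.
axis off;
title('Demo by ImageAnalyst', 'FontSize', 14);
return; % from ShowCredits()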

%-----------------------------------------------------------------------------
function [xCoords, yCoords, roiPosition] = DrawBoxRegion(handleToImage)
try
% Open a temporary full-screen figure if requested.
enlargeForDrawing = true;
axes(handleToImage);
if enlargeForDrawing
hImage = findobj(gca,'Type','image');
numberOfImagesInside = length(hImage);
if numberOfImagesInside > 1
imageInside = get(hImage(1), 'CData');
else
imageInside = get(hImage, 'CData');
end
hTemp = figure;
hImage2 = imshow(imageInside, []);
[rows columns NumberOfColorBands] = size(imageInside);
set(gcf, 'Position', get(0,'Screensize')); % Maximize figure.
end

txtInfo = sprintf('Draw a box over the unstained fabric by clicking and dragging over the image.\nDouble click inside the box to finish drawing.');
text(10, 40, txtInfo, 'color', 'r', 'FontSize', 24);

% Prompt user to draw a region on the image.


msgboxw(txtInfo);

% Erase all previous lines.


if ~enlargeForDrawing
axes(handleToImage);
% ClearLinesFromAxes(handles);
end

hBox = imrect;
roiPosition = wait(hBox);
roiPosition % Echo the box position [x, y, width, height] to the command window.
% Erase all previous lines.
if ~enlargeForDrawing
axes(handleToImage);
% ClearLinesFromAxes(handles);
end

xCoords = [roiPosition(1), roiPosition(1)+roiPosition(3), roiPosition(1)+roiPosition(3), roiPosition(1), roiPosition(1)];
yCoords = [roiPosition(2), roiPosition(2), roiPosition(2)+roiPosition(4), roiPosition(2)+roiPosition(4), roiPosition(2)];

% Plot the mask as an outline over the image.


hold on;
plot(xCoords, yCoords, 'linewidth', 2);
close(hTemp);
catch ME
errorMessage = sprintf('Error running DrawBoxRegion:\n\n\nThe error message is:\n%s', ...
ME.message);
WarnUser(errorMessage);
end
return; % from DrawBoxRegion

%-----------------------------------------------------------------------------
function [mask] = DrawFreehandRegion(handleToImage, rgbImage)
try
fontSize = 14;
% Open a temporary full-screen figure if requested.
enlargeForDrawing = true;
axes(handleToImage);
if enlargeForDrawing
hImage = findobj(gca,'Type','image');
numberOfImagesInside = length(hImage);
if numberOfImagesInside > 1
imageInside = get(hImage(1), 'CData');
else
imageInside = get(hImage, 'CData');
end
hTemp = figure;
hImage2 = imshow(imageInside, []);
[rows columns NumberOfColorBands] = size(imageInside);
set(gcf, 'Position', get(0,'Screensize')); % Maximize figure.
end

message = sprintf('Left click and hold to begin drawing.\nSimply lift the mouse button to finish.');
text(10, 40, message, 'color', 'r', 'FontSize', fontSize);

% Prompt user to draw a region on the image.


uiwait(msgbox(message));

% Now, finally, have the user freehand draw the mask in the image.
hFH = imfreehand();

% Once we get here, the user has finished drawing the region.
% Create a binary image ("mask") from the ROI object.
mask = hFH.createMask();
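% (Note, added for clarity: createMask() returns a logical matrix the same size
% as the displayed image, true for pixels inside the freehand outline and false
% elsewhere.)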

% Close the maximized figure because we're done with it.


close(hTemp);
% Display the freehand mask.
subplot(3, 4, 5);
imshow(mask);
title('Binary mask of the region', 'FontSize', fontSize);

% Mask the image.


maskedRgbImage = bsxfun(@times, rgbImage, cast(mask,class(rgbImage)));
% Display it.
subplot(3, 4, 6);
imshow(maskedRgbImage);
catch ME
errorMessage = sprintf('Error running DrawFreehandRegion:\n\n\nThe error message is:\n%s', ...
ME.message);
WarnUser(errorMessage);
end
return; % from DrawFreehandRegion

%-----------------------------------------------------------------------------
% Get the average lab within the mask region.
function [LMean, aMean, bMean] = GetMeanLABValues(LChannel, aChannel, bChannel, mask)
try
LVector = LChannel(mask); % 1D vector of only the pixels within the masked area.
LMean = mean(LVector);
aVector = aChannel(mask); % 1D vector of only the pixels within the masked area.
aMean = mean(aVector);
bVector = bChannel(mask); % 1D vector of only the pixels within the masked area.
bMean = mean(bVector);
catch ME
errorMessage = sprintf('Error running GetMeanLABValues:\n\n\nThe error message is:\n%s', ...
ME.message);
WarnUser(errorMessage);
end
return; % from GetMeanLABValues
%==============================================================================
function WarnUser(warningMessage)
uiwait(warndlg(warningMessage));
return; % from WarnUser()

%==============================================================================
function msgboxw(message)
uiwait(msgbox(message));
return; % from msgboxw()

%==============================================================================
% Plots the histograms of the pixels in both the masked region and the entire image.
function PlotHistogram(maskedRegion, doubleImage, plotNumber, caption)
try
fontSize = 14;
subplot(plotNumber(1), plotNumber(2), plotNumber(3));

% Find out where the edges of the histogram bins should be.
maxValue1 = max(maskedRegion(:));
maxValue2 = max(doubleImage(:));
maxOverallValue = max([maxValue1 maxValue2]);
edges = linspace(0, maxOverallValue, 100);
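% (Note, added for clarity: histc() counts how many values fall into each
% half-open interval [edges(k), edges(k+1)); values exactly equal to edges(end)
% are counted in the last bin.)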

% Get the histogram of the masked region into 100 bins.


pixelCount1 = histc(maskedRegion(:), edges);

% Get the histogram of the entire image into 100 bins.


pixelCount2 = histc(doubleImage(:), edges);

% Plot the histogram of the entire image.


plot(edges, pixelCount2, 'b-');

% Now plot the histogram of the masked region.


% However there will likely be so few pixels that this plot will be so low and flat
% compared to the histogram of the entire image that you probably won't be able to
% see it. To get around this, let's scale it to make it higher so we can see it.
gainFactor = 1.0;
maxValue3 = max(max(pixelCount2));
pixelCount3 = gainFactor * maxValue3 * pixelCount1 / max(pixelCount1);
hold on;
plot(edges, pixelCount3, 'r-');
title(caption, 'FontSize', fontSize);

% Scale x axis manually.


xlim([0 edges(end)]);
legend('Entire', 'Masked');

catch ME
errorMessage = sprintf('Error running PlotHistogram:\n\n\nThe error message is:\n%s', ...
ME.message);
WarnUser(errorMessage);
end
return; % from PlotHistogram

%==============================================================================
% Shows vertical lines going up from the X axis to the curve on the plot.
function lineHandle = PlaceVerticalBarOnPlot(handleToPlot, x, lineColor)
try
% If the plot is visible, plot the line.
if strcmpi(get(handleToPlot, 'Visible'), 'on')
axes(handleToPlot); % Make the passed-in plot axes the current axes.
% Make sure x location is in the valid range along the horizontal X axis.
XRange = get(handleToPlot, 'XLim');
maxXValue = XRange(2);
if x > maxXValue
x = maxXValue;
end
% Erase the old line.
%hOldBar=findobj('type', 'hggroup');
%delete(hOldBar);
% Draw a vertical line at the X location.
hold on;
yLimits = ylim;
lineHandle = line([x x], [yLimits(1) yLimits(2)], 'Color', lineColor, 'LineWidth', 3);
hold off;
end
catch ME
errorMessage = sprintf('Error running PlaceVerticalBarOnPlot:\n\n\nThe error message is:\n%s', ...
ME.message);
WarnUser(errorMessage);
end
return; % End of PlaceVerticalBarOnPlot
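%==============================================================================
% Usage note (an assumption based on the code above, not part of the original
% listing): save this file as DeltaE.m, make sure the Image Processing Toolbox
% is installed, and run "DeltaE" at the MATLAB command prompt; everything else
% in the demo is interactive.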
