NPRCET Syllabus
DIGITAL IMAGE PROCESSING
UNIT I DIGITAL IMAGE FUNDAMENTALS AND TRANSFORMS
Elements of visual perception; image sampling and quantization; basic relationships between pixels; basic geometric transformations. Introduction to Fourier transform and DFT; properties of the 2-D Fourier transform; FFT; separable image transforms: Walsh-Hadamard, discrete cosine, Haar, Slant, and Karhunen-Loeve transforms.
UNIT II IMAGE ENHANCEMENT TECHNIQUES
Spatial domain methods: basic gray level transformations; histogram equalization; image subtraction; image averaging; spatial filtering; smoothing and sharpening filters; Laplacian filters. Frequency domain filters: smoothing and sharpening filters; homomorphic filtering.
UNIT III IMAGE RESTORATION
Model of the image degradation/restoration process; noise models; inverse filtering; least mean square filtering; constrained least mean square filtering; blind image restoration; pseudo-inverse; singular value decomposition.
UNIT IV IMAGE COMPRESSION
Lossless compression: variable length coding; LZW coding; bit plane coding; predictive coding; DPCM. Lossy compression: transform coding; wavelet coding; basics of image compression standards: JPEG, MPEG; basics of vector quantization.
UNIT V IMAGE SEGMENTATION AND REPRESENTATION
Edge detection; thresholding; region based segmentation; boundary representation: chain codes, polygonal approximation, boundary segments; boundary descriptors: simple descriptors, Fourier descriptors; regional descriptors: simple descriptors, texture.
TEXT BOOK
1. Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, 2nd Edition,
Pearson Education, 2003.
REFERENCES
1. William K. Pratt, Digital Image Processing, John Wiley, 2001.
2. Milan Sonka, Vaclav Hlavac, Roger Boyle, Image Processing, Analysis and Machine Vision, Brooks/Cole, Thomson Learning, 1999.
Projections
There are two basic types of projection:
1. Perspective Projection
Objects closer to the capture device appear bigger. Most image formation situations can be considered to be under this category, including images taken by a camera and by the human eye.
2. Orthographic Projection
This is unnatural: objects appear the same size regardless of their distance to the capture device.
Both types of projections can be represented via mathematical formulas. Orthographic projection is easier and is sometimes used as a mathematical convenience.
f(x', y') = ∫ c_p(x', y', λ) V(λ) dλ   (1)

where V(λ) is the sensitivity function of the capture device.
Let us determine the image functions for the above sensitivity functions imaging the same scene:
1. This is the most realistic of the three. Sensitivity is concentrated in a band around λ0:

f1(x', y') = ∫ c_p(x', y', λ) V1(λ) dλ
2. This is an unrealistic capture device which has sensitivity only to a single wavelength λ0, as determined by the delta function. However, there are devices that get close to such selective behavior:

f2(x', y') = ∫ c_p(x', y', λ) δ(λ − λ0) dλ = c_p(x', y', λ0)
3. This is what happens if you take a picture without taking the cap off the lens of your camera: V3(λ) = 0, so

f3(x', y') = ∫ c_p(x', y', λ) V3(λ) dλ = ∫ c_p(x', y', λ) · 0 dλ = 0
For color imaging, three sensitivity functions V_R(λ), V_G(λ), V_B(λ) are used:

f_R(x', y') = ∫ c_p(x', y', λ) V_R(λ) dλ
f_G(x', y') = ∫ c_p(x', y', λ) V_G(λ) dλ
f_B(x', y') = ∫ c_p(x', y', λ) V_B(λ) dλ

These three image functions can be used by a suitable display device (such as your monitor or your eye) to show a color image.
The values taken by the image function are real numbers which again vary in a continuum, i.e. an interval: fC(x', y') ∈ [fmin, fmax]. Digital computers cannot handle quantities that vary over a continuum, so we have to discretize.
Quantization
After sampling we have fC(i, j) (i = 0, ..., N−1, j = 0, ..., M−1), but there is one discretization left: fC(i, j) ∈ [fmin, fmax] for every (i, j). Discretize the range into P levels of width Q = (fmax − fmin)/P, where

Q(fC(i, j)) = (k + 1/2)Q + fmin
  if and only if fC(i, j) ∈ [fmin + kQ, fmin + (k + 1)Q)
  if and only if fmin + kQ ≤ fC(i, j) < fmin + (k + 1)Q
for k = 0, ..., P − 1.   (4)
This is quantization to P levels.
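A minimal sketch of this uniform quantizer, assuming NumPy; the names fmin, fmax, P and Q follow equation (4):

import numpy as np

def quantize(fC, fmin, fmax, P):
    # Uniform quantization to P levels, following equation (4):
    # level k covers the interval [fmin + k*Q, fmin + (k+1)*Q).
    Q = (fmax - fmin) / P
    k = np.clip(np.floor((fC - fmin) / Q), 0, P - 1)
    return (k + 0.5) * Q + fmin   # reconstruction value (k + 1/2)Q + fmin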
Grayscale Images
A grayscale image fgray(i, j) carries a single sampled and quantized value per pixel, typically in {0, ..., 255}. Advantage: a single channel is simpler to store and process than three color channels. Our emphasis will be on grayscale images.
Images as Matrices
Much of what we will do amounts to transforming an image A with A(i, j) ∈ {0, ..., 255} into a new matrix B, which may not have integer entries! In these cases we must suitably scale and round the elements of B in order to display it as an image.
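A minimal sketch of this scale-and-round step, assuming NumPy; the linear mapping to [0, 255] is one common choice:

import numpy as np

def to_displayable(B):
    # Linearly map the entries of B to [0, 255], then round to integers
    # so the result can be shown as an 8-bit grayscale image.
    B = B.astype(float)
    Bmin, Bmax = B.min(), B.max()
    if Bmax == Bmin:                      # constant matrix: map to mid-gray
        return np.full(B.shape, 128, dtype=np.uint8)
    scaled = 255 * (B - Bmin) / (Bmax - Bmin)
    return np.round(scaled).astype(np.uint8)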
Periodicity properties
The Fourier spectrum computed over the range [0, N−1] shows back-to-back half periods; the shifted spectrum shows a full period in the same range. Multiplying f(x, y) by (−1)^(x+y) before taking the transform moves the origin F(0, 0) to the center of the frequency rectangle.
Average Value
A widely used expression for the average value of a 2-D discrete function is:

f̄ = (1/N²) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y)

Since the DFT with 1/N normalization gives F(0, 0) = (1/N) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y), therefore

f̄ = (1/N) F(0, 0)
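This can be checked numerically; note that NumPy's fft2 applies no 1/N factor, so there F(0, 0) is the plain sum and the mean is F(0, 0)/N². A sketch assuming NumPy:

import numpy as np

N = 8
f = np.random.rand(N, N)
F00 = np.fft.fft2(f)[0, 0].real            # unnormalized DFT: F(0,0) = sum of f
assert np.isclose(F00 / N**2, f.mean())    # mean = F(0,0)/N^2 for this convention
# With the 1/N-normalized forward DFT used above, F(0,0) = N * mean instead.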
The Laplacian
The Laplacian of a two-variable function f(x, y) is given as:

∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y²
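In the discrete case the Laplacian is commonly approximated by differences of neighboring pixels:

∇²f(x, y) ≈ f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4 f(x, y)

which corresponds to the familiar 3 × 3 Laplacian mask with center value −4 and the four 4-neighbors equal to 1.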
Convolution

f(x) * g(x) = ∫ f(τ) g(x − τ) dτ

(Figure: the panels show g(τ), its reflection g(−τ), the shifted copy g(x − τ), and the product f(τ)g(x − τ).)
f(x) * g(x) = x/2 for 0 ≤ x ≤ 1, = 1 − x/2 for 1 ≤ x ≤ 2, and = 0 elsewhere.

Graphically, f(x) * g(x) is a triangle with peak value 1/2 at x = 1.
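The result can be checked numerically; a sketch assuming NumPy, with f taken as a unit-height pulse on [0, 1] and g as a half-height pulse on [0, 1] (choices consistent with the triangle above):

import numpy as np

dx = 0.001
x = np.arange(0.0, 3.0, dx)
f = ((x >= 0) & (x <= 1)).astype(float)   # unit-height pulse on [0, 1] (assumed)
g = 0.5 * ((x >= 0) & (x <= 1))           # half-height pulse on [0, 1] (assumed)
conv = np.convolve(f, g) * dx             # Riemann-sum approximation of the integral
print(conv.max())                         # ~0.5, the triangle's peak at x = 1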
The sifting property of the unit impulse:

∫ f(x) δ(x − x0) dx = f(x0)

∫_{x0−ε}^{x0+ε} δ(x − x0) dx = ∫ δ(x − x0) dx = 1

(Figure: an impulse of strength A at x0, written A δ(x − x0), together with the example functions f(τ) and g(τ).)
Frequency domain filters H(u, v) are commonly specified in terms of the distance from the origin of the (centered) frequency plane:

D(u, v) = (u² + v²)^(1/2)
UNIT I
2 Marks
1. Define image.
2. What is dynamic range?
3. Define brightness.
4. Define tapered quantization.
5. What do you mean by gray level?
6. What do you mean by color model?
7. List the hardware-oriented color models.
8. What are hue and saturation?
9. Explain the separability property of the 2-D Fourier transform.
10. What are the properties of the Haar and Slant transforms?
11. Define resolution.
12. What is meant by pixel?
13. Define digital image.
14. What are the steps involved in DIP?
16. Specify the elements of a DIP system.
18. What are the types of light receptors?
19. Differentiate photopic and scotopic vision.
26. Define sampling and quantization.
27. Find the number of bits required to store a 256 × 256 image with 32 gray levels.
28. Write the expression to find the number of bits needed to store a digital image.
30. What is meant by zooming and shrinking of digital images?
32. Write short notes on the neighbors of a pixel.
33. Explain the types of connectivity.
34. What is meant by a path?
36. What is a geometric transformation?
40. What is an image transform?
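A worked instance of questions 27 and 28: an N × M image with L gray levels needs N × M × log2(L) bits; for a 256 × 256 image with 32 gray levels this is 256 × 256 × 5 = 327,680 bits.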
16 Marks
# Properties of the Hadamard transform:
Real and orthogonal
Fast transform
Faster than the sine transform
Good energy compaction for images
# Applications:
Image data compression, filtering, and design of codes
# Properties of the Hotelling (K-L) transform:
Real and orthogonal
Not a fast transform
Best energy compaction for images
# Applications:
Useful in performance evaluation and for finding performance bounds
13. Explain the Haar transform in detail.
# Define k = 2^p + q − 1
# Find the Haar functions h_k(z)
14. Explain the K-L transform in detail.
Consider a set of n M-dimensional discrete signals represented as column vectors x1, x2, ..., xn, each having M elements:

X = [x1 x2 ... xn]^T
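A compact sketch of the K-L (Hotelling) transform on such data, assuming NumPy; X holds the signals as columns and the eigenvectors of the covariance matrix form the transform:

import numpy as np

def kl_transform(X):
    # X: M x n array whose columns are the signals x1..xn (M elements each).
    mx = X.mean(axis=1, keepdims=True)   # mean vector
    C = np.cov(X)                        # M x M covariance matrix
    eigvals, A = np.linalg.eigh(C)       # eigen-decomposition of C
    A = A[:, ::-1]                       # columns ordered by decreasing eigenvalue
    Y = A.T @ (X - mx)                   # decorrelated (K-L) coefficients
    return Y, A, mx

# X is recovered exactly as A @ Y + mx; truncating rows of Y gives the
# energy compaction mentioned above.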
Contrast
When an image A is mapped through a point function g(l), its histogram changes accordingly; the relationship between the two histograms is what matters here. You must learn how to calculate hB(l) given hA(l) and the point function g(l). In the example, B has 10 times fewer distinct pixel values than A; note also the vertical axis scaling in hB(l). In practice one designs an overall point function which includes contrast stretching/compression, emphasis/de-emphasis, rounding, normalizing etc., and applies it to the given image A.
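A sketch of applying a point function and reading off both histograms, assuming NumPy; g is a 256-entry integer lookup table, one output level per input level:

import numpy as np

def apply_point_function(A, g):
    # A: image with values in {0, ..., 255} (integer dtype);
    # g: 256-entry integer lookup table with g[l] = new value of level l.
    B = g[A]                                     # pointwise relabeling
    hA = np.bincount(A.ravel(), minlength=256)   # histogram of A
    hB = np.bincount(B.ravel(), minlength=256)   # histogram of B
    return B, hA, hB

# Example point function with roughly 10x fewer distinct levels, as in the text:
# g = (np.arange(256) // 10) * 10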
Image Segmentation
Histogram-based segmentation rests on the histogram of the image. For a given image, decompose the range of pixel values (0, ..., 255) into discrete intervals Rt = [at, bt], t = 1, ..., T, where T is the total number of segments. Each object in the scene is assumed to be composed of pixels with similar values, so each Rt is taken to correspond to one object or region. Label the pixels with pixel values within each Rt via a point function (see the sketch after this section).

Limitations
Histogram-based segmentation operates on each image pixel independently. As mentioned earlier, the main assumption is that objects must be composed of pixels with similar pixel values.
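A minimal sketch of the interval labeling described above, assuming NumPy; the intervals Rt are passed as (at, bt) pairs:

import numpy as np

def histogram_segment(G, intervals):
    # G: grayscale image; intervals: list of (a_t, b_t) pairs, t = 1..T.
    # Each pixel gets label t if a_t <= G(i,j) <= b_t, else it keeps label 0.
    labels = np.zeros(G.shape, dtype=int)
    for t, (a, b) in enumerate(intervals, start=1):
        labels[(G >= a) & (G <= b)] = t
    return labels

# Usage: labels = histogram_segment(G, [(0, 80), (81, 160), (161, 255)])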
Histogram Equalization
For a given image A, we will now design a special point function gAe(l), which is called the histogram equalizing point function for A. The techniques we are going to use to get gAe(l), which stretch/compress the gray scale, are also applicable in histogram modification/specification.

Let g1(l) = Σ_{k=0}^{l} pA(k).

Assuming gAe is an array that contains the computed gAe(l) values, you can use >> B = gAe(A + 1); in MATLAB to obtain the equalized image (the +1 accounts for MATLAB's 1-based indexing).
The equalizing point function is built from the cumulative distribution:

g1(l) = Σ_{k=0}^{l} pA(k)  ⇒  g1(l) − g1(l − 1) = pA(l) = hA(l)/(NM)   (l = 1, ..., 255)
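A sketch of histogram equalization built from these formulas, assuming NumPy; the cumulative sum of pA plays the role of g1(l), scaled to the output range:

import numpy as np

def equalize(A):
    # A: grayscale image with integer values in {0, ..., 255}.
    h = np.bincount(A.ravel(), minlength=256)   # histogram hA(l)
    p = h / A.size                              # pA(l) = hA(l)/(N*M)
    g = np.cumsum(p)                            # g1(l) = sum_{k<=l} pA(k), in [0, 1]
    gAe = np.round(255 * g).astype(np.uint8)    # scale to {0, ..., 255}
    return gAe[A]                               # B = gAe(A); cf. the MATLAB line above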
Histogram
The histogram hA(l) of an N × M image A counts the number of pixels of A at each gray level l; dividing by NM gives the normalized histogram pA(l). Warning: remember, very different images can have identical histograms, so given images A and B, hA(l) = hB(l) for all l does not imply A = B.
UNIT II
2 Marks
1. Specify the objective of image enhancement techniques.
2. Explain the two categories of image enhancement.
3. What is contrast stretching?
4. What is gray level slicing?
5. Define image subtraction.
6. What is meant by masking?
7. Define histogram.
8. What is meant by histogram equalization?
9. Differentiate linear and non-linear spatial filters.
10. What is meant by a Laplacian filter?
11. Write the applications of sharpening filters.
16 Marks
1. Explain the types of gray level transformations used for image enhancement.
# Linear (negative and identity)
# Logarithmic (log and inverse log)
# Power-law (nth root and nth power)
# Piecewise-linear (contrast stretching, gray level slicing, bit plane slicing)
2. What is a histogram? Explain histogram equalization.
# P(rk) = nk/n
# Ps(s) = 1 means the histogram is uniform.
3. Discuss the image smoothing filter with its model in the spatial domain.
# LPF - blurring
# Median filter - noise reduction
4. What are image sharpening filters? Explain the various types.
# Used for highlighting fine details
# HPF - output gets sharpened and the background becomes darker
# High boost - output gets sharpened but the background remains unchanged
# Derivative - first and second order derivatives
Applications:
# Medical imaging
# Electronic printing
# Industrial inspection
Image Restoration
Restoration is an objective process that attempts to recover an image that has been degraded, using a priori knowledge of the degradation phenomenon. Restoration techniques are generally oriented toward modeling the degradation and applying the inverse process to recover the original image. This involves formulating some criterion (or criteria) of goodness that is used to measure the desired result.
Degradation/restoration model:

f(x, y) → [Degradation function H] → (+) ← noise η(x, y) → g(x, y) → [Restoration filter(s)] → f̂(x, y)

so that g(x, y) = H[f(x, y)] + η(x, y).
Noise models
Common sources of noise:
Acquisition: environmental conditions (heat, light), imaging sensor quality
Transmission: noise in the transmission channel
The commonly used noise models are:
Gaussian noise
Rayleigh noise
Erlang (Gamma) noise
Exponential noise
Uniform noise
Impulse (salt-and-pepper) noise
Gaussian noise
Gaussian (normal) noise models are simple to work with. The PDF of a Gaussian random variable z is given as:

p(z) = (1/(√(2π) σ)) e^(−(z − z̄)²/(2σ²))

where z represents intensity, z̄ is the mean (average) value of z, and σ is its standard deviation. Approximately 70% of the values of z will be within one standard deviation of the mean, and approximately 95% will be within two standard deviations.
Rayleigh noise
The PDF of Rayleigh noise is given as:

p(z) = (2/b)(z − a) e^(−(z − a)²/b)  for z ≥ a
p(z) = 0                             for z < a

where z represents intensity, and

z̄ = a + √(πb/4),  σ² = b(4 − π)/4.
Erlang (Gamma) noise
The PDF of Erlang noise is given as:

p(z) = (a^b z^(b−1) / (b − 1)!) e^(−az)  for z ≥ 0
p(z) = 0                                 for z < 0

where z represents intensity, a > 0, b is a positive integer, and

z̄ = b/a,  σ² = b/a².
Exponential noise
The PDF of exponential noise is given as:

p(z) = a e^(−az)  for z ≥ 0
p(z) = 0          for z < 0

where z represents intensity, a > 0, and

z̄ = 1/a,  σ² = 1/a².

This PDF is a special case of the Erlang PDF with b = 1.
Uniform noise
The PDF of uniform noise is given as:

p(z) = 1/(b − a)  if a ≤ z ≤ b
p(z) = 0          otherwise

where z represents intensity, and

z̄ = (a + b)/2,  σ² = (b − a)²/12.
Impulse (salt-and-pepper) noise
The PDF of (bipolar) impulse noise is given as:

p(z) = Pa  for z = a
p(z) = Pb  for z = b
p(z) = 0   otherwise

If b > a, then any pixel with intensity b will appear as a light dot in the image, while pixels with intensity a will appear as a dark dot.
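These PDFs map directly onto NumPy's random generators; a sketch with arbitrary example parameters (note that NumPy's Rayleigh scale s relates to the b above by b = 2s²):

import numpy as np

rng = np.random.default_rng()
shape = (256, 256)   # example field size

gaussian    = rng.normal(loc=128.0, scale=10.0, size=shape)   # mean z-bar, std sigma
rayleigh    = 5.0 + rng.rayleigh(scale=4.0, size=shape)       # shift a; b = 2*scale**2
erlang      = rng.gamma(shape=3, scale=1 / 0.5, size=shape)   # b = 3, a = 0.5 (scale = 1/a)
exponential = rng.exponential(scale=1 / 0.5, size=shape)      # a = 0.5 (scale = 1/a)
uniform     = rng.uniform(low=0.0, high=20.0, size=shape)     # a = 0, b = 20

# Impulse (salt-and-pepper): a fraction Pa of pixels forced dark, Pb forced light.
Pa, Pb = 0.05, 0.05
u = rng.random(shape)
noisy = np.where(u < Pa, 0, np.where(u > 1 - Pb, 255, 128))   # 128 stands in for the image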
The mean and variance of the noise can be estimated from the gray levels zi and their normalized histogram p(zi):

z̄ = Σ_{i=0}^{L−1} zi p(zi)  and  σ² = Σ_{i=0}^{L−1} (zi − z̄)² p(zi)
Mean Filters
Arithmetic mean filter:

f̂(x, y) = (1/mn) Σ_{(s,t)∈Sxy} g(s, t)

The operation is generally implemented using a spatial filter of size m × n in which all coefficients have value 1/mn. A mean filter smoothes local variations in an image, and noise is reduced as a result of blurring.

Geometric mean filter:

f̂(x, y) = [ Π_{(s,t)∈Sxy} g(s, t) ]^(1/mn)

Harmonic mean filter:

f̂(x, y) = mn / Σ_{(s,t)∈Sxy} (1/g(s, t))

Contraharmonic mean filter of order Q:

f̂(x, y) = Σ_{(s,t)∈Sxy} g(s, t)^(Q+1) / Σ_{(s,t)∈Sxy} g(s, t)^Q
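A sketch of the four mean filters, assuming NumPy and SciPy; uniform_filter computes a windowed average, so the mn factors cancel where they should, and the small epsilon guarding against zeros is an implementation choice:

import numpy as np
from scipy.ndimage import uniform_filter

eps = 1e-12   # avoids log(0) and division by zero

def arithmetic_mean(g, m, n):
    return uniform_filter(g.astype(float), size=(m, n))

def geometric_mean(g, m, n):
    # (product over the window)^(1/mn) = exp(mean of log g)
    return np.exp(uniform_filter(np.log(g.astype(float) + eps), size=(m, n)))

def harmonic_mean(g, m, n):
    # mn / sum(1/g) = 1 / mean(1/g)
    return 1.0 / uniform_filter(1.0 / (g.astype(float) + eps), size=(m, n))

def contraharmonic_mean(g, m, n, Q):
    # sum(g^(Q+1)) / sum(g^Q) = mean(g^(Q+1)) / mean(g^Q)
    gf = g.astype(float) + eps
    return uniform_filter(gf ** (Q + 1), size=(m, n)) / uniform_filter(gf ** Q, size=(m, n))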
UNIT III
2 Marks
1. What is meant by image restoration?
2. What are the two properties of a linear operator?
3. Explain the additivity property of a linear operator.
4. How is a degradation process modeled?
5. Explain the homogeneity property of a linear operator.
8. Define circulant matrix.
10. What are the two methods of the algebraic approach?
11. Define gray-level interpolation.
12. What is a pseudo-inverse filter?
13. What is meant by a least mean square filter?
14. Write the properties of singular value decomposition (SVD).
16 Marks
1. Explain the algebraic approach in image restoration.
# Unconstrained
# Constrained
2. What is the use of the Wiener filter in image restoration? Explain.
# Calculate f̂
# Calculate F̂(u, v)
3. What is meant by inverse filtering? Explain.
# Recovering the input from its output
# Calculate f̂(x, y)
4. Explain singular value decomposition and specify its properties.
# U = Σ_{m=1}^{r} √λm φm ψm^T
This equation is called the singular value decomposition of an image.
# Properties:
The SVD transform varies drastically from image to image.
The SVD transform gives the best energy packing efficiency for any given image.
The SVD transform is useful in the design of filters, for finding the least squares, minimum-norm solution of linear equations, and for finding the rank of large matrices.
5. Explain the image degradation model/restoration process in detail.
# Image degradation model/restoration process diagram
# Degradation model for continuous functions
# Degradation model for discrete functions, 1-D and 2-D
6. What are the two approaches for blind image restoration? Explain in detail.
> Direct measurement
> Indirect estimation
Objectives
At the end of this lesson, the students should be able to:
1. Explain the need for standardization in image transmission and reception.
2. Name the coding standards for fax and bi-level images and state their characteristics.
3. Present the block diagrams of the JPEG encoder and decoder.
4. Describe the baseline JPEG approach.
5. Describe the progressive JPEG approach through spectral selection.
6. Describe the progressive JPEG approach through successive approximation.
7. Describe the hierarchical JPEG approach.
8. Describe the lossless JPEG approach.
9. Convert RGB images to YUV.
10. Illustrate the interleaved and non-interleaved ordering for color images.
Introduction
With the rapid developments of imaging technology and of image compression and coding tools and techniques, it is necessary to evolve coding standards so that there is compatibility and interoperability between the image communication and storage products manufactured by different vendors. Without the availability of standards, encoders and decoders cannot communicate with each other; the service providers will have to support a variety of formats to meet the needs of the customers, and the customers will have to install a number of decoders to handle a large number of data formats. Towards the objective of setting up coding standards, the international standardization agencies, such as the International Standards Organization (ISO), the International Telecommunications Union (ITU), the International Electro-technical Commission (IEC) etc., have formed expert groups and solicited proposals from industries, universities and research laboratories. This has resulted in establishing standards for bi-level (facsimile) images and continuous tone (gray scale) images. In this lesson, we are going to discuss the highlighting features of these standards. These standards use the coding and compression techniques, both lossless and lossy, which we have already studied in the previous lessons.
The first part of this lesson is devoted to the standards for bi-level image coding.
Modified Huffman (MH) and Modified Relative Element Address Designate (MREAD) standards are used for text-based documents, but more recent standards like JBIG1 and JBIG2, proposed by the Joint Bi-level Experts Group (JBIG), can efficiently encode handwritten characters and binary halftone images. The latter part
of this lesson is devoted to the standards for continuous tone images. We are going to
discuss in details about the Joint Photographic Experts Group (JPEG) standard and its
different modes, such as baseline (sequential), progressive, hierarchical and lossless.
Coding Standards for Fax and Bi-level Images
Consider an A4-sized (8.5 in x 11 in) scanned page having 200 dots/in. An
uncompressed image would require transmission of 3,740,000 bits for this
scanned page. It is, however, seen that most of the information on the scanned page is highly correlated along the scan lines, which proceed from left to right in top-to-bottom order, and also between the scan lines. The coding standards have exploited this redundancy to compress bi-level images. The coding standards
proposed for bi-level images are:
(a) Modified Huffman (MH): This algorithm performs one-dimensional run
length coding of scan lines, along with special end-of-line (EOL), end-of- page
(EOP) and synchronization codes. The MH algorithm on an average achieves a
compression ratio of 20:1 on simple text documents.
(b) Modified Relative Element Address Designate (MREAD): This algorithm uses a two-dimensional run length coding to take advantage of vertical spatial redundancy, along with horizontal spatial redundancy. It uses the previous scan line as a reference when coding the current line. The position of each black-to-white or white-to-black transition is coded relative to a reference element in the current scan line. The compression ratio is improved to 25:1 for this algorithm.
(c) JBIG1: The earlier two algorithms just mentioned work well for printed texts
but are inadequate for handwritten texts or binary halftone images (continuous
images converted to dot patterns). The JBIG1 standard, proposed by the
Joint Bi-level Experts Group uses a larger region of support for coding the
pixels. Binary pixel values are directly fed into an arithmetic coder, which
utilizes a sequential template of nine adjacent and previously coded pixels plus
one adaptive pixel to form a 10-bit context. Other than the sequential mode
just described, JBIG1 also supports progressive mode in which a reduced
resolution starting layer image is followed by the transmission of progressively
higher resolution layers. The compression ratio of the JBIG1 standard is slightly better than that of MREAD for text images, but shows an improvement of 8-to-1 for binary halftone images.
(d) JBIG2: This is a more recent standard proposed by the Joint bi-level Experts
Group. It uses a soft pattern matching approach to provide a solution to the
problem of substitution errors in which an imperfectly scanned symbol is
wrongly matched to a different symbol, as frequently observed in Optical
Character Recognition (OCR). JBIG2 codes the bitmap of each mark, rather
than its matched class index. In case a good match cannot be found for the
current mark, it becomes a token for a new class. This new token is then coded
using JBIG1 with a fixed template of previous pixels around the current mark.
The JBIG2 standard is seen to be 20% more efficient than the JBIG1 standard
for lossless compression.
Continuous tone still image coding standards
A different set of standards had to be created for compressing and coding continuous tone monochrome and color images of any size and sampling rate. Of these, the Joint Photographic Experts Group's first standard, known as JPEG, is the most widely used one. Only in recent times has the new standard JPEG-2000 seen implementations in still image coding systems. JPEG is a very simple and easy to use standard that is based on the Discrete Cosine Transform (DCT).
JPEG Encoder
The figure shows the block diagram of a JPEG encoder, which has the following
components:
(a) Forward Discrete Cosine Transform (FDCT): The still images are first partitioned into non-overlapping blocks of size 8 × 8, and the image samples are shifted from unsigned integers with range [0, 2^p − 1] to signed integers with range [−2^(p−1), 2^(p−1) − 1], where p is the number of bits (here, p = 8). The theory of the DCT has already been discussed in lesson 8 and will not be repeated here. It should however be mentioned that, to preserve freedom for innovation and customization within implementations, JPEG specifies neither a unique FDCT algorithm nor a unique IDCT algorithm.
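A sketch of the level shift and FDCT on a single block, assuming NumPy and SciPy; SciPy's dctn is one valid choice of FDCT, consistent with the remark that JPEG mandates no particular algorithm:

import numpy as np
from scipy.fft import dctn, idctn

p = 8                                                # bits per sample
block = np.random.randint(0, 2**p, size=(8, 8))      # one 8x8 block of image samples
shifted = block.astype(float) - 2**(p - 1)           # [0, 2^p - 1] -> [-2^(p-1), 2^(p-1) - 1]
coeffs = dctn(shifted, norm='ortho')                 # FDCT of the block
restored = idctn(coeffs, norm='ortho') + 2**(p - 1)  # IDCT undoes it (before quantization)
assert np.allclose(restored, block)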
JPEG Decoder
The figure shows the block diagram of the JPEG decoder. It performs the inverse operation
of the JPEG encoder.
Modes of Operation in JPEG
The JPEG standard supports the following four modes of operation:
Baseline or sequential encoding
Progressive encoding (includes spectral selection and successive approximation approaches)
Hierarchical encoding
Lossless encoding
Baseline Encoding: Baseline sequential coding is for images with 8-bit samples and uses Huffman coding only. In baseline encoding, each block is encoded in a single left-to-right and top-to-bottom scan. It encodes and decodes complete 8 × 8 blocks with full precision one at a time and supports interleaving of color components. The FDCT, quantization, DC difference and zig-zag ordering proceed as described earlier. In order to claim JPEG compatibility, a product must include support for at least the baseline encoding system.
Progressive Encoding: Unlike baseline encoding, each block in progressive
encoding is encoded in multiple scans, rather than a single one. Each scan follows the
zig-zag ordering, quantization and entropy coding, as done in baseline encoding, but
takes much less time to encode and decode, as compared to the single scan of
baseline encoding, since each scan contains only a part of the complete information.
With the first scan, a crude form of image can be reconstructed at the decoder and with
successive scans, the quality of the image is refined. You must have experienced this
while downloading web pages containing images. It is very convenient for browsing
applications, where crude reconstruction quality at the early scans may be sufficient for
quick browsing of a page.
There are two forms of progressive encoding: (a) spectral selection approach and (b)
successive approximation approach. Each of these approaches is described below.
Progressive scanning through spectral selection: In this approach, the first scan
sends some specified low frequency DCT coefficients within each block. The
corresponding reconstructed image obtained at the decoder from the first scan therefore
appears blurred as the details in the forms of high frequency components are missing.
In subsequent scans, bands of coefficients, which are higher in frequency than the
previous scan, are encoded and therefore the reconstructed image gets richer with
details. This procedure is called spectral selection, because each band typically
contains coefficients which occupy a lower or higher part of the frequency spectrum
for that 8x8 block.
The spectral selection approach: here all 64 DCT coefficients in a block are of 8-bit resolution and successive blocks are stacked one after the other in the scanning order. The spectral selection approach performs the slicing of coefficients horizontally and picks up a band of coefficients, starting with low frequency, and encodes them to full resolution.
Progressive scanning through successive approximation: This is also a multiple
scan approach. Here, each scan encodes all the coefficients within a block, but not to
their full quantized accuracy. In the first scan, only the N most significant bits of each
coefficient are encoded (N is specifiable) and in successive scans, the next lower
significant bits of the coefficients are added and so on until all the bits are sent. The
resulting reconstruction quality is good even from the early scans, as the high
frequency coefficients are present from the initial scans.
The successive approximation approach: the organization of the DCT coefficients and the stacking of the blocks are the same as before. The successive approximation approach performs the slicing operation vertically and picks up a group of bits, starting with the most significant ones and progressively considering the less significant ones.
Hierarchical Encoding: the steps are as follows.
Obtain the reduced resolution images, starting with the original and, for each level, reducing the resolution by a factor of two, as described above.
Encode the reduced resolution image from the topmost layer of the pyramid.
Decode the above reduced resolution image. Interpolate and up-sample it by a factor of two horizontally and/or vertically, using the identical interpolation filter which the decoder must use. Use this interpolated and up-sampled image as a predicted image for encoding the next lower layer (finer resolution) of the pyramid.
Encode the difference between the image in the next lower layer and the predicted image using baseline, progressive or lossless encoding.
Repeat the steps of encoding and decoding until the lowermost layer (finest resolution) of the pyramid is encoded.
Lossless Encoding: In lossless encoding, the 8 × 8 block structure is not used and each pixel is predicted based on three adjacent pixels, as illustrated in the figure, using one of the eight possible predictor modes listed below:

Mode  Prediction
0     None
1     A
2     B
3     C
4     A + B − C
5     A + (B − C)/2
6     B + (A − C)/2
7     (A + B)/2
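A sketch of these predictor modes, assuming the mode numbering 0-7 shown above; A, B and C are the left, upper and upper-left neighbors of the current pixel:

def predict(A, B, C, mode):
    # The eight lossless-JPEG predictor modes for one pixel, given its
    # left (A), upper (B) and upper-left (C) neighbors.
    if   mode == 0: return 0            # no prediction
    elif mode == 1: return A
    elif mode == 2: return B
    elif mode == 3: return C
    elif mode == 4: return A + B - C
    elif mode == 5: return A + (B - C) // 2
    elif mode == 6: return B + (A - C) // 2
    elif mode == 7: return (A + B) // 2

# The encoder entropy-codes the prediction error: e = pixel - predict(A, B, C, mode).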
Color images are customarily acquired and displayed in the RGB format. However, from efficient encoding considerations, RGB is not the best format. Color spaces such as YUV, CIELUV, CIELAB and others represent the chromatic (color) information in two components and the luminance (intensity) information in one component. These formats are more efficient from image compression considerations, since our eyes are relatively insensitive to the high frequency information from the chrominance channels, and thus the chrominance components can be represented at a reduced resolution as compared to the luminance components, for which full resolution representation is necessary.
It is possible to convert an RGB image into YUV using the following relations, where Y is the luminance component (a weighted sum of R, G and B):

U = (B − Y)/2 + 0.5
V = (R − Y)/1.6 + 0.5
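A sketch of the conversion, assuming NumPy; the exact luminance weights for Y are not shown above, so the standard 0.299/0.587/0.114 weights are assumed:

import numpy as np

def rgb_to_yuv(R, G, B):
    # R, G, B: float arrays scaled to [0, 1].
    Y = 0.299 * R + 0.587 * G + 0.114 * B   # standard luminance weights (assumed)
    U = (B - Y) / 2 + 0.5                   # per the relation above
    V = (R - Y) / 1.6 + 0.5                 # per the relation above
    return Y, U, V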
JPEG quality at various coded rates:

Bits per pixel   Quality             Compression Ratio
2                Indistinguishable   8:1
1.5              Excellent           10.7:1
0.75             Very good           21.4:1
0.5              Good                32:1
0.25             Fair                64:1
A more advanced still image compression standard JPEG-2000 has evolved in recent times. This will
be our topic in the next lesson.
UNIT IV
2 Marks
1. What is image compression?
2. What is data compression?
3. What are the two main types of data compression?
4. What is the need for compression?
5. What are the different compression methods?
7. Define interpixel redundancy.
8. What is run length coding?
9. Define compression ratio.
10. What are the operations performed by error-free compression?
11. What is variable length coding?
12. Define Huffman coding.
16 Marks
1. What is data redundancy? Explain the three basic data redundancies.
Definition of data redundancy
The three basic data redundancies are:
> Coding redundancy
> Interpixel redundancy
> Psychovisual redundancy
2. What is image compression? Explain any four variable length coding compression schemes.
Definition of image compression
Variable length coding:
* Huffman coding
* B2 code
* Huffman shift
* Truncated Huffman
* Binary shift
* Arithmetic coding
3. Explain the image compression model.
The source encoder and decoder
The channel encoder and decoder
4. Explain error-free compression.
a. Variable length coding
i. Huffman coding
ii. Arithmetic coding
b. LZW coding
c. Bit plane coding
d. Lossless predictive coding
5. Explain lossy compression.
Lossy predictive coding
Transform coding
Wavelet coding
7. Explain how compression is achieved in transform coding, and explain the DCT.
Block diagram of encoder and decoder
Bit allocation
1-D transform coding
2-D transform coding, applications
1-D, 2-D DCT
8. Explain arithmetic coding.
Non-block code
One example
9. Explain image compression standards.
Binary image compression standards
Continuous tone still image compression standards
Video compression standards
10. Discuss the MPEG standard and compare it with JPEG.
Motion Picture Experts Group
1. MPEG-1
2. MPEG-2
3. MPEG-4
Block diagram
I-frame
P-frame
B-frame
GB(x, y) = 1, if G(x, y) ≥ T
GB(x, y) = 0, if G(x, y) < T   (1)
In equation (1), G(x,y) indicates the intensity value of pixel (x,y) in the grey image G. GB is the segmentation result. Actually it
forms a binary image, in which each value of GB(x,y) gives the category (foreground or background) that the corresponding pixel
belongs to. If GB(x,y) = 1, then pixel (x,y) in the image G is classified as a foreground pixel, otherwise it is classified as a
background pixel.
Equation (1) is formulated under the assumption that foreground pixels in the image G have relatively high intensity values and
background pixels take low intensity values. Of course you can reverse the equation when you need to set the low intensity region
as the foreground.
Basic notations
G(x,y): The input gray image that we want to segment.
GB(x,y): The segmentation result of G. It is a binary image; the value of GB(x,y) is either 0 or 1, indicating that the corresponding pixel (x,y) in G belongs to the background or the foreground, respectively.
K: The maximum possible intensity value defined by G. If G is an 8-bit gray image, then K takes the value 255.
T: The thresholding value. It is an integer within the range [0..K].
N: The total number of pixels in G. If G has width w and height h, then of course N = w × h.
The frequencies (fractions) of background and foreground pixels for a given T, ωB(T) and ωF(T), are calculated as:

ωB(T) = Σ_{i=0}^{T} PG(i),  ωF(T) = Σ_{i=T+1}^{K} PG(i)   (3)

where PG(i) is the fraction of pixels in G that have intensity i (the normalized histogram). Of course the frequency of the entire image G is calculated as ωB(T) + ωF(T) = 1, no matter what value T takes. The mean intensity values of background and foreground, μB(T) and μF(T), are calculated as:

μB(T) = (Σ_{i=0}^{T} i·PG(i)) / ωB(T),  μF(T) = (Σ_{i=T+1}^{K} i·PG(i)) / ωF(T)   (4)

The mean intensity value of the entire image can be calculated as: μ = ωB(T)·μB(T) + ωF(T)·μF(T), and no matter what value T takes, μ keeps the same. The intensity variances of background and foreground, σ²B(T) and σ²F(T), are calculated as:

σ²B(T) = (Σ_{i=0}^{T} (i − μB(T))² PG(i)) / ωB(T),  σ²F(T) = (Σ_{i=T+1}^{K} (i − μF(T))² PG(i)) / ωF(T)   (5)
Having the definition of the variances of the background and the foreground, it is time to define the so-called within-class variance σ²within:

σ²within(T) = ωB(T)·σ²B(T) + ωF(T)·σ²F(T)   (6)

Subtracting σ²within(T) from the total variance σ² gives the between-class variance σ²between:

σ²between(T) = σ² − σ²within(T) = ωB(T)·ωF(T)·[μB(T) − μF(T)]²   (7)

σ² indicates the intensity variance of the entire image G and is calculated as:

σ² = Σ_{i=0}^{K} (i − μ)² PG(i)   (8)
Otsu's algorithm
Otsu's algorithm is simple. We let T try all the intensity values from 0 to K and choose the one that gives the minimum within-class variance σ²within as the optimal thresholding value. Formally speaking:

Optimal value of T = TOpt, where σ²within(TOpt) = min_{0 ≤ T ≤ K} σ²within(T)   (9)

As we said before, σ² = σ²within(T) + σ²between(T), and σ² is independent of the selection of T; therefore, minimization of σ²within means maximization of σ²between. So the optimal value of T can also be taken as:

Optimal value of T = TOpt, where σ²between(TOpt) = max_{0 ≤ T ≤ K} σ²between(T)   (10)

In fact equation (10) is the usual way to find the optimal thresholding value, because for each T the calculation of σ²between only needs the calculations of ωB, ωF, μB and μF according to equation (7). And these values can be updated iteratively:
Initially, T = 0: calculate the mean intensity of the entire image, μ, and set ωB(0) = PG(0).
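A direct sketch of equation (10), assuming NumPy; it simply scans every T rather than using the iterative update:

import numpy as np

def otsu_threshold(G, K=255):
    h = np.bincount(G.ravel(), minlength=K + 1)
    P = h / G.size                                   # P_G(i)
    i = np.arange(K + 1)
    best_T, best_between = 0, -1.0
    for T in range(K + 1):
        wB, wF = P[:T + 1].sum(), P[T + 1:].sum()    # omega_B(T), omega_F(T)
        if wB == 0 or wF == 0:
            continue
        muB = (i[:T + 1] * P[:T + 1]).sum() / wB     # mu_B(T)
        muF = (i[T + 1:] * P[T + 1:]).sum() / wF     # mu_F(T)
        between = wB * wF * (muB - muF) ** 2         # sigma^2_between(T), eq. (7)
        if between > best_between:
            best_between, best_T = between, T
    return best_T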
Counting connected objects
Step 1:
Create a label image GL with the same size (width and height) as GB, and initially set each GL(x,y) = −1 (unlabeled). Create a variable, current_label, for recording the currently available label, and initially set current_label = 0.
Step 2:
Scan the binary image GB sequentially and update the label image GL as follows:
FOR each y FROM 0 TO height − 1
  FOR each x FROM 0 TO width − 1
    IF pixel (x,y) is a foreground pixel, which means GB(x,y) = 1,
    THEN
      Check the 8 neighborhood pixels around the pixel (x,y);
      FOR each neighborhood pixel (x',y')
        IF it has a nonnegative label in the label image GL, which means GL(x',y') >= 0,
        THEN
          SET GL(x,y) = GL(x',y');
        ELSE
          SET GL(x,y) = current_label;
          SET current_label = current_label + 1;
        END IF
      END FOR
    END IF
  END FOR
END FOR
Step 3:
Build the 2D transit matrix M_T. Initially create an empty matrix:
M_T[0..current_label−1][0..current_label−1].
Clearly it is a 2D array of size current_label × current_label, and each element in the matrix is first set to 0. Then assign 1 to each element of the matrix located on the diagonal; that is: set M_T[i][i] = 1, where i runs from 0 to current_label−1.
After the above initialization, we need to update the matrix M_T according to the label image GL and let M_T become the transit matrix of GL. The update procedure is given as follows:
FOR each y FROM 0 TO height − 1
  FOR each x FROM 0 TO width − 1
    IF pixel (x,y) is a foreground pixel, which means GB(x,y) = 1,
    THEN
      Check the 8 neighborhood pixels around pixel (x,y).
      FOR each neighborhood pixel (x',y')
        IF its label is nonnegative, which means GL(x',y') >= 0,
        THEN
          SET M_T[GL(x,y)][GL(x',y')] = 1;
          SET M_T[GL(x',y')][GL(x,y)] = 1;
        END IF
      END FOR
    END IF
  END FOR
END FOR
After the above procedure, you get the updated matrix M_T, which represents the transit relationship in GL.
After the above procedure, you get the updated matrix M_T, which represents the transit relationship in GL.
Step 4:
Calculate the transit closure matrix, M_TC, of M_T. This calculation of the transit closure matrix is based on Warshall's algorithm, which is an iterative procedure described as follows:
(1) Initially create two temporary matrices M_0 and M_1, which have the same size as the matrix M_T. Copy each value in M_T into the corresponding position in M_0 and M_1, which is:
FOR i FROM 0 TO current_label − 1
  FOR j FROM 0 TO current_label − 1
    SET M_0[i][j] = M_1[i][j] = M_T[i][j];
(2) Update the matrix M_1 using the transitivity law, which is:
FOR i FROM 0 TO current_label − 1
  FOR j FROM 0 TO current_label − 1
    FOR k FROM 0 TO current_label − 1
      IF M_1[i][j] = 1 AND M_1[j][k] = 1
      THEN
        SET M_1[i][k] = 1;
        SET M_1[k][i] = 1;
      END IF
    END FOR
  END FOR
END FOR
(3) Compare M_1 and M_0 to see if they are exactly the same, which means each element of M_1 has the same value as the corresponding element of M_0. If so, set M_TC = M_1, which means we have got the transit closure matrix. If not, copy M_1 to M_0 and go back to (2).
Step 5:
Count the number of connected objects (or the number of equivalence classes) in the transit closure matrix M_TC. It is not hard to see that the number of connected objects equals the number of distinct rows (or columns) in the matrix M_TC. Assume M_TC[i] and M_TC[j] are two rows of the matrix M_TC, where i ≠ j. We say these two rows are distinct if and only if there exists at least one k ∈ [0..current_label−1] such that M_TC[i][k] ≠ M_TC[j][k].
Step 6:
Output the number of connected objects (or the number of distinct rows of M_TC) and return.
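A compact sketch of Steps 4-6, assuming NumPy; repeated Boolean matrix products play the role of the triple loop, iterating until the matrix stops changing, as in Step 4 (3):

import numpy as np

def count_objects(M_T):
    # Iterate the transitivity law until a fixpoint is reached (the fixpoint
    # is the transit closure M_TC), then count distinct rows, one per object.
    M1 = (np.asarray(M_T) != 0).astype(int)
    while True:
        M0 = M1
        M1 = (M0 | (M0 @ M0 > 0)).astype(int)   # one round of transitivity
        if np.array_equal(M0, M1):              # closure reached: M_TC = M1
            break
    return len(np.unique(M1, axis=0))           # number of distinct rows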
Then, according to Step 4 (3), we compare matrix M_1 (Figure 2) with matrix M_0 (Figure 1). We find M_1 is different from M_0. Therefore we copy M_1 to M_0, as shown in Figure 3. Then we go back to (2), do the same operations on M_1 again, and get M_1 updated again, as shown in Figure 4.
Then we compare M_1 (Figure 4) and M_0 (Figure 3). Again we find that they are different. So we need to copy M_1 to M_0 (as
shown in Figure 5) and go back to (2) to get M_1 updated again (as shown in Figure 6).
This time we find that M_1 did not change, which means M_0 (Figure 5) and M_1 (Figure 6) are identical. So we say the current
M_1 is the transit closure matrix that we want, and set M_TC = M_1. Furthermore we discover there are two distinct rows in the
transit closure matrix: (1111100) and (0000011). It means that there are two connected objects.
Detection of Discontinuities
Edge Linking and Boundary Detection
Thresholding
Region-Based Segmentation
Segmentation by Morphological Watersheds
The Use of Motion Segmentation
UNIT V
2 Marks
1. What is segmentation?
2. Write the applications of segmentation.
3. What are the three types of discontinuity in a digital image?
4. How are derivatives obtained in edge detection?
5. Write about linking edge points.
6. What are the two properties used for establishing the similarity of edge pixels?
7. Define gradient operator.
8. Define region growing.
9. Define compactness.
16 Marks
1. What is image segmentation? Explain in detail.
Definition - image segmentation
Discontinuity - point, line, edge
Similarity - thresholding, region growing, splitting and merging
2. Explain edge detection in detail.
* Basic formulation
* Gradient operators
* Laplacian operators
3. Define thresholding and explain the various methods of thresholding in detail.
Foundation
The role of illumination
Basic global thresholding
Basic adaptive thresholding
Optimal global and adaptive thresholding
4. Discuss region based image segmentation techniques. Compare threshold-based and region-based techniques.
* Region growing
* Region splitting and merging
* Comparison
5. Define and explain the various representation approaches.
Chain codes
Polygonal approximations
Signatures
Boundary segments
Skeletons
6. Explain boundary descriptors.
Simple descriptors
Fourier descriptors