
Satellite Image Processing

and Analysis
Prof. B. Krishna Mohan
CSRE, IIT Bombay
bkmohan@csre.iitb.ac.in

About CSRE
One of the academic units of IIT Bombay, offering M.Tech. and
PhD programs in Geoinformatics and Natural Resources Engineering
Current faculty strength: 12; multidisciplinary in nature, with PhDs in
EE, CS, CE, Earth Sciences, Physics, Chemistry, and Maths
Research areas: satellite image analysis, GIS, GPS, microwave
remote sensing, hyperspectral remote sensing, applications to
resources exploration, environmental monitoring, engineering
applications, natural hazards and disaster management
Excellent infrastructure: ArcGIS, ERDAS, ENVI, eCognition,
Matlab, Geomatica, spectroradiometers, GPS receivers, large
format map scanner, Marine Research Lab, photogrammetry
workstations, wireless sensor modules

Today's Presentation
(Very brief) Introduction to Remote Sensing: the source of images
Steps in Satellite Image Processing and Analysis
Image Analysis and its relation to Geospatial Technologies
Some Applications:
Noise Filtering with Curvelets
High spatial resolution image analysis
High spectral resolution image analysis

What is Remote Sensing?


Remote sensing is the art and science of making
measurements about an object or the
environment without being in physical contact
with it

Importance of Remote Sensing


Remote Sensing provides vital data for many critical
applications
Resources management
Environmental monitoring
Defence
Urban / rural development and planning
Crop yield forecasting
Hazard zonation and disaster mitigation

Electromagnetic Spectrum

Visible and Reflective Infrared

Reflectance measurements in different wavelengths:
the ratio of reflected to incident energy
Ranges from 0% to 100%
Highly wavelength dependent

Basic Premise of RS
Each object on the earth's surface has a unique reflectance
pattern as a function of wavelength

Reflectance Spectra of Earth Objects

Atmospheric Windows
The atmosphere interferes with the radiation passing through it
It is essential to block the harmful UV rays in solar radiation
from reaching the earth
It should not block the radiation in wavelengths used for
earth observation
Choice of wavelengths:
Clear response of earth surface features
Minimal interference from atmospheric constituents

Atmospheric Windows

[Figure: atmospheric transmission (%) vs. wavelength (microns), with windows in the visible, near infrared, and far infrared]

Concept of Resolution
Four types of resolution in remote sensing:
Spatial resolution
Spectral resolution
Radiometric resolution
Temporal resolution

Spatial Resolution
Ability of the sensor to observe closely spaced features on the ground
A function of the instantaneous field of view (IFOV) of the sensor
Large IFOV: coarse spatial resolution; a pixel covers more area on the ground
Small IFOV: fine spatial resolution; a pixel covers less area on the ground
A sensor with pixel area 5x5 metres has a higher spatial resolution
than a sensor with pixel area 10x10 metres

[Figure: the same area (CSRE) imaged at pixel sizes of 0.6m x 0.6m, 5.8m x 5.8m, and 23.25m x 23.25m]

Effect of High Spatial Resolution


High resolution images are information rich:
Spatial information
Multispectral information
Textural information
The image can be viewed as a collection of objects with
spatial relationships: adjacent to, north of, south of, etc.

Spectral Resolution
Ability of the sensor to distinguish differences in
reflectance of ground features in different
wavelengths
Characterized by many sensors, each operating in a
narrow wavelength band
Essential to discriminate between sub-classes of a
broad class such as vegetation

High Spectral Resolution

[Figure: sensor response vs. wavelength, with a large number of contiguous sensors, each of narrow bandwidth]

Coarse Spectral Resolution

[Figure: sensor response vs. wavelength, with a small number of sensors, each of large bandwidth]

Reflectance Spectra

[Figure: reflectance vs. wavelength, showing the unique spectra of objects]

Steerable Sensor Systems

Steps in Digital Image Processing


Image Acquisition -> Image Corrections -> Image Enhancement -> Image Transforms -> Feature Selection -> Image Classification -> Final Interpretation

Concept of Image Classification


Image classification: assigning pixels in the image to
categories or classes of interest
Examples: built-up areas, waterbody, green vegetation,
bare soil, rocky areas, cloud, shadow, etc.

Concept of Image Classification


In order to classify a set of data into different classes or
categories, the relationship between the data and the
classes into which they are classified must be well understood
To achieve this by computer, the computer must be trained
Training is key to the success of classification
Classification techniques were originally developed out
of research in the field of Pattern Recognition

Concept of Image Classification


Computer classification of images involves the
process of the computer program learning the
relationship between the data and the
information classes
Important aspects of accurate classification
Learning techniques
Feature sets

Supervised Classification
The classifier has the advantage of an analyst's domain
knowledge, which can guide it to learn the relationship
between the data and the classes.
The number of classes and prototype pixels for each
class can be identified using this prior knowledge

Unsupervised Classification
When access to domain knowledge or the
experience of an analyst is missing, the data
can still be analyzed by numerical exploration,
whereby the data are grouped into subsets or
clusters based on statistical similarity

Partially Supervised Classification


When prior knowledge is available
for some classes and not for others, or
for some dates and not for others in a multitemporal dataset,
a combination of supervised and unsupervised methods can be
employed for partially supervised classification of images

Statistical Characterization of Classes


Each class has a conditional probability density
function (pdf), denoted p(x | ck)
The distribution of feature vectors in each class ck
is indicated by p(x | ck)
We estimate P(ck | x), the conditional probability of
class ck given that the pixel's feature vector is x
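
The step from the class-conditional density to the posterior is Bayes' rule; with class priors P(ck) (assumed known, or estimated from training data):

P(ck | x) = p(x | ck) P(ck) / [ p(x | c1) P(c1) + ... + p(x | cL) P(cL) ]

The pixel is then assigned to the class with the largest posterior.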

Supervised Classification Principles


Typical characteristics of classes:
Mean vector
Covariance matrix
Minimum and maximum gray levels within each band
Posterior probability P(Ci | x), where Ci is the ith class
and x is the feature vector
The number of classes L into which the image is to be
classified should be specified by the user

Parallelepiped Classifier: Example of a Supervised Classifier
Assign ranges of values for each class in each
band
Really a feature space classifier
Training data provide bounds for each feature for
each class
Results in bounding boxes for each class
A pixel is assigned to a class only if its feature
vector falls within the corresponding box

[Figure: parallelepiped classifier, showing bounding boxes for classes C1 to C6 in a feature space of Bands 1, 2, and 3]

[Figure: parallelepiped classifier, showing a pixel whose feature vector falls inside the C5 box being assigned to class 5; pixels falling outside all boxes remain unclassified]
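
A minimal Python sketch of the bounding-box rule described above; the function and variable names are illustrative, not from the original slides.

    import numpy as np

    def train_boxes(samples_per_class):
        # Per class: min and max of each band over the training samples
        return {c: (s.min(axis=0), s.max(axis=0))
                for c, s in samples_per_class.items()}

    def classify(x, boxes):
        # Assign x to the first class whose box contains it, else unclassified
        for c, (lo, hi) in boxes.items():
            if np.all(x >= lo) and np.all(x <= hi):
                return c
        return "unclassified"

    # Toy example: two classes in a 2-band feature space
    training = {
        "water":      np.array([[10, 40], [12, 45], [11, 42]]),
        "vegetation": np.array([[60, 90], [65, 95], [62, 92]]),
    }
    boxes = train_boxes(training)
    print(classify(np.array([11, 43]), boxes))   # water
    print(classify(np.array([30, 70]), boxes))   # unclassified

Overlapping boxes are a known weakness of this classifier; in this sketch a pixel falling in two boxes simply takes the first match.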

Hierarchical Classification
Decision Tree Classifier
Multistage classification technique
Series of decisions taken to determine the label to
be associated with a pixel
Different data sources, different features, and even
different classification algorithms can be used at
each decision stage
Classifier training is simpler, since fewer classes are
considered at each stage

[Figure: decision tree classifier hierarchy. The full class set {w1, w2, w3, w4, w5} first splits into {w1, w2, w3} and {w4, w5}; {w1, w2, w3} splits into w1 and {w2, w3}, which splits into w2 and w3; {w4, w5} splits into w4 and w5]

[Figure: supervised classification result]

Non-Parametric Classifiers

Nearest-Neighbor Classifier
Non-parametric in nature
The algorithm:
Find the distance of a given feature vector x from ALL
the training samples
Assign x to the class of the nearest training sample
(in the feature space)
This method does not depend on class statistics such as
the mean and covariance.

Concept
[Figure: 2-D feature space (Band 1 vs. Band 2) with clusters of feature vectors for Classes 1, 2, and 3; a sample to be classified is assigned to the class of its nearest neighbor]

K-NN Classifier
K-nearest neighbour classifier
Simple in concept, but computationally expensive
For a pixel to be classified, find the K closest training
samples (in terms of feature vector similarity, i.e.
smallest feature vector distance)
Among the K samples, find the most frequently
occurring class Cm
Assign the pixel to class Cm

K-NN Classifier
Let ki be the number of samples for class Ci (out of the K
closest samples), i = 1, 2, ..., L (number of classes)
Note that k1 + k2 + ... + kL = K
The discriminant for the K-NN classifier is
gi(x) = ki
The classifier rule is:
Assign x to class Cm if gm(x) > gi(x) for all i, i != m
(see the sketch below)

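
A minimal Python sketch of the K-NN rule above (illustrative names; with k = 1 it reduces to the nearest-neighbour classifier).

    import numpy as np
    from collections import Counter

    def knn_classify(x, train_X, train_y, k=5):
        # Distances from x to ALL training samples
        d = np.linalg.norm(train_X - x, axis=1)
        # Indices of the K closest samples
        nearest = np.argsort(d)[:k]
        # Votes per class among the K samples: the discriminant gi(x) = ki
        votes = Counter(train_y[i] for i in nearest)
        # Assign to the most frequently occurring class Cm
        return votes.most_common(1)[0][0]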

Spectral Angle Mapper


Given a large dimensional data set, computing the covariance
matrix S, its inverse, and the Mahalanobis distance
(x - m)^T S^(-1) (x - m) for each pixel is highly time
consuming; if the covariance matrix is close to singular, its
inverse can be unstable, leading to erroneous results
In such cases, alternate methods can be applied, such as the
Spectral Angle Mapper

S.A.M. Principle
If each class is represented by a vector vi, then the
angle θ between the class vector and the pixel feature
vector x is given by
cos θ = [vi · x] / [|vi| |x|]
For small values of θ, the value of cos θ is large
The likelihood of x belonging to different classes can be
ranked according to the value of cos θ.
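
A minimal Python sketch of the S.A.M. rule (illustrative names, not from the original slides).

    import numpy as np

    def spectral_angle(v, x):
        # Angle between class reference vector v and pixel spectrum x
        cos_t = np.dot(v, x) / (np.linalg.norm(v) * np.linalg.norm(x))
        return np.arccos(np.clip(cos_t, -1.0, 1.0))

    def sam_classify(x, class_vectors):
        # Smallest angle (largest cos) wins
        return min(class_vectors, key=lambda c: spectral_angle(class_vectors[c], x))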

S.A.M. Advantage
The value of the angle is not greatly affected by
minor changes in vi or x.
The computation is simpler than the Mahalanobis
distance computation involved in the ML
(maximum likelihood) method

Role of Image Classifier


The image classifier performs the role of a discriminant:
it discriminates one class against the others
Multiclass case: the discriminant value is highest for one
class and lower for the other classes
Two-class case: the discriminant value is positive for one
class and negative for the other

Discriminant Function
g(ck, x) is the discriminant function relating feature vector x
and class ck, k = 1, ..., L
Denote g(ck, x) as gk(x) for simplicity
Multiclass case:
gk(x) > gl(x) for all l = 1, ..., L, l != k  =>  x in ck
Two-class case:
g(x) > 0 => x in c1; g(x) < 0 => x in c2

[Figure: discriminant-based classifier. Inputs x1, x2, x3, ..., xn feed discriminant functions g1(x), g2(x), g3(x), ..., gK(x); a decision stage outputs the class]

Linear Discriminant Function


The discriminant in the 2-class case can be written as
g(x) = w^T x + w0, with decision boundary g(x) = 0
The weight vector w = [w1 ... wL]^T is an L-dimensional
vector, where L is the number of bands
If two samples x1 and x2 are on the boundary, we can write
w^T (x1 - x2) = 0, i.e., w is perpendicular to the
decision boundary

Algorithm for Linear Discriminant Classifier

Given two classes c1 and c2, we have:
w^T x > 0 when x is in class c1
w^T x < 0 when x is in class c2
Training this classifier involves determining the weights w
They may be determined from training data by
optimization methods

Support Vector Machines

Linear Classifiers

A linear classifier has the form f(x, w, b) = sign(w · x - b),
separating samples labelled +1 from samples labelled -1
[Figure: several candidate separating lines through 2-D training data]
Any of these would be fine... but which is best?

Margin

Define the margin of a linear classifier as the width that the
boundary could be increased by before hitting a datapoint.

Maximum Margin

The maximum margin linear classifier is the linear classifier
with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM, linear SVM)

Linear SVM

[Figure: maximum margin linear classifier; the support vectors are the datapoints that the margin pushes up against]

Why Maximum Margin?

1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary
(it's been jolted in its perpendicular direction), this gives us
the least chance of causing a misclassification.
3. The model is immune to removal of any non-support-vector
datapoints.
4. Empirically it works very, very well.

Estimate the Margin

For the separating hyperplane w · x + b = 0, the distance from
a point x to the hyperplane is

d(x) = |x · w + b| / ||w||2 = |x · w + b| / sqrt( sum_{i=1..d} wi^2 )

Margin

The margin is the distance from the hyperplane to the closest
training point:

margin = min_{x in D} d(x) = min_{x in D} |x · w + b| / sqrt( sum_{i=1..d} wi^2 )

Maximize Margin

Choose w and b to maximize the margin:

argmax_{w,b} margin(w, b, D)
= argmax_{w,b} min_{xi in D} d(xi)
= argmax_{w,b} min_{xi in D} |b + xi · w| / sqrt( sum_{i=1..d} wi^2 )

Maximize Margin

argmax_{w,b} min_{xi in D} |b + xi · w| / sqrt( sum_{i=1..d} wi^2 )
subject to: for all xi in D, yi (xi · w + b) >= 0

Maximize Margin

Strategy: fix the scale of (w, b) so that
min_{xi in D} |b + xi · w| = 1.
Then maximizing the margin is equivalent to

argmin_{w,b} sum_{i=1..d} wi^2
subject to: for all xi in D, yi (xi · w + b) >= 1

Maximum Margin Linear Classifier:

{w*, b*} = argmax_{w,b} 2 / sqrt( sum_{k=1..d} wk^2 )
subject to
y1 (w · x1 + b) >= 1
y2 (w · x2 + b) >= 1
....
yN (w · xN + b) >= 1

This is a standard constrained quadratic optimization problem,
solvable using known techniques described in detail in most
standard textbooks
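
As one concrete way to solve that QP, a minimal sketch using scikit-learn (not part of the original slides). Note the slides write f(x, w, b) = sign(w · x - b), while scikit-learn's decision function uses w · x + b; the data below are toy values.

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-band training data with labels +1 / -1
    X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
    y = np.array([-1, -1, 1, 1])

    clf = SVC(kernel="linear", C=1e6)   # large C approximates the hard-margin LSVM
    clf.fit(X, y)

    print(clf.coef_, clf.intercept_)    # w and b
    print(clf.support_vectors_)         # the datapoints the margin pushes against
    print(clf.predict([[3.0, 3.0]]))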

Multilayer Perceptron
Neural Networks

Artificial Neural Networks for Engineering Problems

Face recognition
Detecting fraudulent use of credit cards
Controlling blast furnaces
Rating bond investment risk
Automatically diagnosing the cause of aircraft engine failures
Driving robotic automobiles down busy highways
Satellite image classification

Inspiration from Neurobiology

[Figure: a physical neuron and the elements of an artificial neuron: input layer, link weights, summation and activation, output layer]

Mathematical Representation

Inputs x1, x2, ..., xn with link weights w1, w2, ..., wn and bias b:

net = sum_{i=1..n} wi xi + b
y = f(net)

Mathematical Representation of the Activation Function

Examples of the nonlinear form of the neuron, y = f(x, w):

Sigmoidal function: y = 1 / (1 + e^(-w · x))
Gaussian function: y = e^(-||x - w||^2 / (2 a^2))

A Simple Perceptron Training Algorithm

It's a single-layer network
Change the weight by an amount proportional to the difference
between the target output and the actual output:
ΔW = η (T - Y) X
Wnew = Wold + ΔW

Perceptron Learning Rule

[Figure: single-layer perceptron with inputs x1, x2, ..., xm, weights w11, w12, ..., w2m, and outputs y1, y2; Y = hardlim(WX)]
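
A minimal Python sketch of the perceptron rule above, trained on the logical AND function (a linearly separable toy problem, so the rule converges); eta plays the role of the gain term.

    import numpy as np

    def train_perceptron(X, T, eta=0.1, epochs=100):
        # W_new = W_old + eta * (T - Y) * X, with hardlim activation
        W, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for x, t in zip(X, T):
                y = 1.0 if np.dot(W, x) + b >= 0 else 0.0   # hardlim(WX + b)
                W += eta * (t - y) * x
                b += eta * (t - y)
        return W, b

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([0, 0, 0, 1], dtype=float)
    W, b = train_perceptron(X, T)
    print([1.0 if np.dot(W, x) + b >= 0 else 0.0 for x in X])   # [0, 0, 0, 1]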

Delta Rule
The delta rule requires that the activation function be
differentiable. The learning takes place using a set of pairs of
inputs and corresponding known outputs, and weights are
adjusted accordingly.

If y_nk = t_nk, then change no weights.
If y_nk > t_nk, then Δw_kj = -η x_j^n
If y_nk < t_nk, then Δw_kj = +η x_j^n

If a perceptron can solve the problem, the perceptron learning
procedure will converge in a finite number of steps to a
solution.

Multilayer Perceptron Network

[Figure: input nodes feeding one or more hidden layers, which feed the output nodes]

Multilayered Perceptron
ANN research was in limbo for nearly two decades after the
single-stage perceptron network was found unable to deal with
many commonly encountered problems
The multistage perceptron network was felt to be the answer.
However, the problem is:
How is the multilayered perceptron network trained?

Training the Multilayer Perceptron Network

Notation for any given neuron j (with y0 = +1 and
wj0(n) = bj(n), the bias):

net_j(n) = sum_{i=0..m} w_ji(n) y_i(n)
y_j(n) = f_j(net_j(n))
e_j(n) = d_j(n) - y_j(n)

where d_j(n) is the desired output and y_j(n) is the computed output

Training the Multilayer Perceptron Network

Total error computed over all neurons in the output layer:
E(n) = (1/2) sum_{j in C} e_j^2(n)
If there are N training samples, then the average error is
E_av = (1/N) sum_{n=1..N} E(n)

Training the Multilayer Perceptron Network

Given that the weights are the unknowns, find the derivative of
the error function with respect to the weights and move in the
direction of the negative gradient (chain rule):

dE(n)/dw_ji(n) = [dE(n)/de_j(n)] [de_j(n)/dy_j(n)] [dy_j(n)/dnet_j(n)] [dnet_j(n)/dw_ji(n)]

We have to compute each of the terms on the RHS to obtain
the derivative on the LHS

Multilayer Feed-forward Network
Forward pass: signals flow from the input layer towards the
output layer
Backward pass: the error is backpropagated! The error signal
flows from the output layer backwards towards the input layer,
updating the weights along the way

Moving Towards Error Minimum


The error does not decrease monotonically towards the
minimum value
Oscillations and stagnation are common during the gradient
descent procedure

Local Minima of the Error Function

[Figure: error vs. weight values, showing local minima alongside the global minimum]

Network Configuration issues

The network size cannot be too small, or it cannot learn the
relationship between the input and the output
The network size cannot be too large, or it takes too long to
train and then generalizes poorly

Weight Initialization
Before training begins, what values should weights
have?
Too large weights or all zero weights are not
desirable.
Thus, weights are typically initialized to small random
values.
Optimization techniques can be used for smart
network weight initialization. Genetic algorithms are
one such approach

Momentum
Oscillations can be reduced by adding a momentum term:

w_ij(t+1) = w_ij(t) - η dE/dw_ij(t) + m [ w_ij(t) - w_ij(t-1) ]

Weights at iteration t+1 depend on the error derivative and on
the difference between the weights at iterations t and t-1.
m is the momentum term, usually smaller than the gain term η.
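
A minimal sketch of the momentum update above on a toy 1-D error function E(w) = w^2 (whose gradient is 2w); the eta and m values are illustrative.

    def momentum_step(w, grad, prev_dw, eta=0.1, m=0.05):
        # w(t+1) = w(t) - eta * dE/dw + m * (w(t) - w(t-1))
        dw = -eta * grad + m * prev_dw
        return w + dw, dw

    w, dw = 1.0, 0.0
    for _ in range(50):
        w, dw = momentum_step(w, 2.0 * w, dw)
    print(round(w, 6))   # close to the minimum at w = 0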

Problems with BP Algorithm

Inability to learn
Error on training patterns never reaches a low level. This
typically means that either the network architecture is
inappropriate or the learning process has pushed the
network into a bad part of weight space.

Inability to generalize
The network may master the training patterns but fail to
generalize to novel situations. This typically means that
either the training environment is impoverished or the
network has over-learned from the inputs

Selected Applications of MLP Classifier

Landuse/landcover classification
Edge and line detection

Supervised Image
Classification
Identify the number of classes
Identify the training data and generate the
training patterns
Define the network
Input/Output layers
Hidden layers
Gain and momentum terms

Supervised Image Classification


Size of input layer = number of bands in the input data
Size of output layer = number of classes into which
the data is mapped
Hidden layer(s): in practice, one or two hidden layers are used
Usually the first hidden layer has more nodes
and the second hidden layer has fewer nodes

Supervised Image
Classification
Gain term = 0.1 to 0.5 in practice
The higher the gain term, the faster the weights change,
but too high a value can lead to instability
Momentum term = 0.01 to 0.1 in practice
Gain and momentum values are modified if
convergence is slow
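
A minimal sketch of such a configuration using scikit-learn's MLPClassifier (an assumption of this note, not the software used in the original work); the data are random placeholders, and only the layer sizes, gain (learning_rate_init) and momentum settings matter here.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(300, 4)           # 4 input nodes = 4 bands
    y = np.random.randint(0, 3, 300)     # 3 output classes

    clf = MLPClassifier(hidden_layer_sizes=(16, 8),  # first hidden layer larger
                        solver="sgd",
                        learning_rate_init=0.1,      # gain term
                        momentum=0.05,               # momentum term
                        max_iter=500)
    clf.fit(X, y)
    print(clf.predict(X[:5]))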

[Figure: low resolution image (WiFS) and a seven-class neural network classification using a 3-layer network]

Texture Analysis
[Figures: Mumbai, IRS-1C PAN data, 1024x1024 pixels; texture feature (IDM); texture classification by neural networks. Legend: water; marshy land / shallow water; highly built-up area; partially built-up area; open areas / grounds]

Genetic Algorithms

Genetic Algorithms
Genetic algorithms are one of the well-known tools for
optimization.
They are employed to generate optimal solutions using the
principles of genetic evolution.
They employ the concept of random search instead of
deterministic search
One application in the context of neural nets is smart
initialization of network weights

Basic Principle
A numerical approach to problem solving
Uses genetics as its model of problem solving
Applies rules of reproduction, gene crossover, and mutation
Starts with a number of candidate solutions that evolve over
genetic cycles towards the optimal solution
Solutions are evaluated using fitness criteria
The fittest survive

Genetic Algorithm Approach

[Figure: maximization by hill climbing. A single climber can get stuck on a local peak below the global peak]
Multiple candidates (multi-climbers i, j, k) search in parallel
Motivation: in the course of time, at least one candidate may
reach the global peak

Survival of the Fittest


The main principle of evolution used in GA is
survival of the fittest.
The good solutions survive, while bad ones die.
The definition of fitness is application dependent

Smart Initialization of Link Weights by GA

1. Initialize the population
2. Select individuals for the mating pool
3. Perform crossover
4. Perform mutation
5. Insert offspring into the population
6. Stop? If no, return to step 2; if yes, the end

Designing a GA...

How to represent genomes?
How to define the crossover operator?
How to define the mutation operator?
How to define the fitness function?
How to generate the next generation?
How to define stopping criteria?

Simple Crossover Step
[Figure: single-point crossover at random crossover point = 3, producing offspring 1 and offspring 2]

Multi-point Crossover
Shuffling Crossover

Single Point Mutation
[Figure: one bit flipped (1 -> 0) at a random mutation point]

Uniform Mutation

Swap Mutation
[Figure: two genes exchanged at random points chosen for swapping]

(A small sketch of these operators follows.)
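
A minimal Python sketch of single-point crossover and single-point mutation on bit-string genomes, matching the figures above (names illustrative).

    import random

    def single_point_crossover(p1, p2, point=3):
        # Swap the tails of the two parents at the crossover point
        return p1[:point] + p2[point:], p2[:point] + p1[point:]

    def single_point_mutation(chrom):
        # Flip one bit at a random mutation point
        i = random.randrange(len(chrom))
        return chrom[:i] + [1 - chrom[i]] + chrom[i + 1:]

    p1 = [1, 0, 1, 1, 0, 0, 1, 0]
    p2 = [0, 1, 1, 0, 1, 1, 0, 1]
    o1, o2 = single_point_crossover(p1, p2)   # offspring 1 and 2
    print(o1, o2, single_point_mutation(o1))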

Fitness Function
Cost associated with a weight set =
Average error in classification for the entire set of
test samples
Lower error = Higher fitness
Using a number of candidate weight sets, a
multilayer perceptron network is initialized.
Image classification application using ANN

Selection
Roulette wheel selection
Rank selection
Tournament selection

Roulette Wheel Selection
[Figure: roulette wheel, with selection probability proportional to fitness]

Rank Selection
Combine all 2n members from both the old and new populations
and rank them according to their fitnesses.
Select the first n ranks as the population for the next iteration.

Tournament Selection
Combine all 2n members of both the old and new populations
From these 2n members, randomly select 2 members
Compare the fitnesses of the two members and select the fitter
one, returning the other member to the population
Continue the procedure until n members are selected for the
next iteration

Convergence Criteria
Image Classification
Accept the member if the corresponding classification error
is within the user-specified tolerance limit
(e.g. accuracy > 95%, or error < 5%)


Sample Input Data

InputDataFileForGA01
InputNodes 4; HiddenLayers 2
HiddenNodes 16 13; OutPutNodes 7
PopulationSize 20; No.OfGenerations 200
SearchMinValue -5.0; SearchMaxValue +5.0
AllowbleError 0.01; CrossOverProbability 0.80
MutationProbability 0.1
TrainingDataFileName rajtrpat.dat
NetWorkWeightsFile raj.wgt

Sample Input Data

InputDataFileForGA01
InputNodes 4; HiddenLayers 2
HiddenNodes 16 19; OutPutNodes 15
PopulationSize 20; No.OfGenerations 20
SearchMinValue -2.0; SearchMaxValue +2.0
AllowbleError 0.01; CrossOverProbability 0.80
MutationProbability 0.1
TrainingDataFileName kdatrpat.dat
NetWorkWeightsFile kdaga.wgt

[Figures: input image and GA-NN supervised classification; example 2 and its results]

High Resolution Image Analysis

High resolution may be spatial or spectral
[Figure: IRS-1C LISS3 (23.5 m) vs. a Quickbird window of the same area]

Per-pixel methods in high spatial resolution image analysis

Pixel-based classification uses only the spectral (color)
information of each pixel; object-based classification can
additionally use form/shape, area/size, texture, and context
Per-pixel methods give an excessively detailed classification
of the land surface
[Figure: object-based (OB) vs. pixel-based (PB) classification of the same scene]

Generic Framework for High Spatial Resolution Image Analysis

High Resolution Satellite Image
-> Pre-processing
-> Decompose image at different levels
-> Segment image at different resolutions
-> Link the regions of different resolutions
-> Connected component labeling
-> Spatial, spectral, and texture features; context
-> Object-specific classification / general purpose classification
-> Post-processing (context based process)
-> Classified Image

Rizvi, I.A. and Mohan, B.K., IEEE-TGRS, Dec. 2011

[Figure: illustration of the working of edge-preserving smoothing]

Classification Strategies

The same pipeline, with post-processing by a relaxation labeling
process:
High Resolution Satellite Image -> Pre-processing -> Decompose
image at different levels -> Segment image at different
resolutions -> Link the regions of different resolutions ->
Connected component labeling -> spatial, spectral, and texture
features; context -> object-specific / general purpose
classification -> Post-processing (relaxation labeling process)
-> Classified Image

[Figure: medium resolution input image and the classified image (Mumbai)]

Results

Study Area 1: OB segmentation and object based image
classification
Study Area 2: OB segmentation and object based classification
(legend: built-up, open ground, vegetation)
Study Area 3: OB segmentation and object based classification
(legend: grass, vegetation, roof top, concrete, open ground)
Parameters for OB segmentation: merging threshold = 15;
number of regions reduced from 3541 to 1986; AFI value = 0.022

Study Area 4: OB segmentation; classification by Mod CBFNN and
CBFNN+RLP (legend: water, sand, vegetation, built-up, road,
slum, open area)
Study Area 5: OB segmentation and object based classification
(legend: buildings 1, open ground, road, shadow, buildings 2,
vegetation)
Study Area 6: OB segmentation; classification by Mod CBFNN and
CBFNN+RLP (legend: aeroplane, open area, vegetation, water,
shadow, settlements, vehicle, road)

Object Specific Classifiers

Extraction of buildings
Roads
Trees
Waterbodies
Airfields

Examples

Road extraction: Biplab Banerjee, Siddharth Buddhiraju and
Krishna Mohan Buddhiraju, Proc. ICVGIP 2012

Examples

Building outline extraction by object based image analysis:
Biplab Banerjee and Krishna Mohan Buddhiraju, UDMS 2013,
in Claire Ellul et al. (eds.), CRC Press, May 2013

High Spectral Resolution Image Analysis

Processing chain for a high spectral resolution image:
Atmospheric correction
Dimensionality reduction
Pure pixel / training data identification
Then, with spectral libraries: supervised classification,
mixture modeling, or spectral matching
Outputs: general purpose classification, abundance mapping,
and sub-pixel mapping & super-resolution classification

INTRODUCTION
Hyperspectral sensors:
Large number of contiguous bands
Narrow spectral bandwidth
Advantages:
Better discrimination among classes on the ground
Highly correlated bands
Huge information from a contiguous and smooth spectrum
[Figure: hyperspectral data of a scene (source: remotesensing.spiedigitallibrary.org)]

INTRODUCTION
Problems in Hyperspectral Remote Sensing:
Cost of the system
Computational complexity: huge storage memory, fast
processors, high transmission bandwidth
Hughes phenomenon: more dimensions require more training
samples to represent the class statistics; for a limited
training set size, the accuracy of classification decreases as
the dimension increases beyond a certain point
[Figure: the Hughes phenomenon (source: doi.ieeecomputersociety.org)]

INTRODUCTION
Dimension Reduction approaches:

Feature Selection:
Selects a subset of the available bands; original band
information is preserved
Involves a search algorithm with a criterion function
E.g. sequential search, tabu search, genetic algorithms, etc.

Feature Extraction:
Transformation of bands onto a lower dimensional space;
new projected axes are formed
Involves a transform that maximizes the de-correlation,
ranks the axes, and is invertible
E.g. PCA, MNF, projection pursuit, etc.

GA-Based IMPLEMENTATION
List of all parameters used:
Desired features = 30
Available features = 155
Population size = 30
Number of subpopulations = 2
Maximum generations = different runs for 10, 20, 30 and 40
Crossover probability = 0.75
Mutation probability = 0.05 (simple mutation)
Stopping criterion = maximum iterations
Migration rate = 0.5, i.e. half the candidates are exchanged
Migration policy = best-worst exchange
Migration interval = different runs for 0.5, 0.33 and 0.25

IMPLEMENTATION

Integer encoding: a chromosome is a string of integers equal to the desired number of bands
E.g. chromosome = { 4 12 14 20 21 23 25 29 32 35 41 45 46 57 60 67 69 72 86 91 93 99 105 115 120 131 141 145 149 154 }

Single point crossover

Fitness function: the classification error, i.e. the number of pixels that are wrongly classified by a 1-nearest neighbour classifier:
fitness(x) = | misclassified pixels |
The GA minimizes this fitness function (see the sketch below)

Selection: rank selection, where we first combine the members from both the old and the new population and then rank them according to their fitness values. Members having a low fitness value represent better candidates and are selected for further evolution.
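
A minimal sketch of the integer encoding and the 1-NN misclassification fitness described above (illustrative names, with the evolution loop omitted).

    import random
    import numpy as np

    D, d = 155, 30   # available bands, desired bands

    def random_chromosome():
        # A chromosome: d distinct band indices out of the D available bands
        return sorted(random.sample(range(D), d))

    def fitness(chrom, X_train, y_train, X_test, y_test):
        # Number of pixels wrongly classified by 1-NN using only the
        # selected bands; the GA minimizes this value
        A, B = X_train[:, chrom], X_test[:, chrom]
        errors = 0
        for x, t in zip(B, y_test):
            nearest = np.argmin(np.linalg.norm(A - x, axis=1))
            errors += int(y_train[nearest] != t)
        return errors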


IMPLEMENTATION

[Figure: the stepwise workflow adopted]

INPUT DATASET
Details of the hyperspectral scene:

Sensor: Hyperion on board the EO-1
Date of imagery: Jan 27, 2014
Scene Request ID: EO11480472014027110KZ
Site Latitude: 19.04
Site Longitude: 72.84
HYP Start Time: 2014 027 04:50:36
HYP Stop Time: 2014 027 04:54:56

Spatial co-ordinates of the input image:

Upper left corner: latitude 19.544844, longitude 72.916448
Upper right corner: latitude 19.531030, longitude 72.988297
Lower left corner: latitude 18.626129, longitude 72.701583
Lower right corner: latitude 18.612367, longitude 72.773605

It covers part of Mumbai and consists of 242 bands, some of
which do not contain any information due to atmospheric
absorption
The bad bands are removed and the remaining bands are
pre-processed for atmospheric correction, leaving a final
dataset of 155 bands
The original full scene is 3400 x 256 pixels with 16-bit
original radiometric resolution and BIL band interleave

[Figure: the original scene and two subsets, imageSubset1 (600 x 256 x 242) and imageSubset2 (500 x 256 x 242)]

RESULTS: MGA Selected Features

MGA parameters: d = 30, D = 155, Psize = 30, Pc = 0.75,
Pm = 0.05; migration interval = 0.25; migration rate = half;
migration policy = best-worst exchange

[Tables: the 30-band subsets selected by MGA for imageSubset1 and for imageSubset2, listed as band numbers with wavelengths (nm) from about 457 nm to 2315 nm]

RESULTS: Spectral Plot

[Figures: spectrum of a random pixel taken from dataset imageSubset2, and the spectrum of the same pixel from the dimension-reduced image of imageSubset2]

RESULTS: MGA vs SGA

Best fitness values for SGA and MGA with different migration
intervals:

Number of Generations | SGA | MGA (0.5) | MGA (0.33) | MGA (0.25)
10                    | 104 | 94        | 93         | 88
20                    | 94  | 88        | 80         | 79
30                    | 87  | 83        | 76         | 75
40                    | 87  | 82        | 75         | 73

[Figure: performance of SGA and MGA with different migration intervals vs. the number of generations]

RESULTS: MGA vs FE METHODS

[Figure: results of LU-LC classification on the dataset imageSubset1. All 155 bands, MGA selected 30 bands, 30 PCA components, 30 MNF components, 30 ICA components]

RESULTS: MGA vs FE METHODS

Accuracy and kappa coefficient for all datasets used for
classification of imageSubset1:

Dataset used for classification   | Overall Accuracy (%) | Kappa coefficient
Classification with all 155 bands | 92.6883              | 0.8969
MGA reduced 30 bands              | 94.7009              | 0.9251
PCA transformed 30 components     | 94.1654              | 0.9176
MNF transformed 30 components     | 93.8146              | 0.9127
ICA transformed 30 components     | 94.6455              | 0.9244

RESULTS: MGA vs FE METHODS

[Figure: results of LU-LC classification on the dataset imageSubset2. All 155 bands, MGA selected 30 bands, 30 PCA components, 30 MNF components, 30 ICA components]

RESULTS: MGA vs FE METHODS

Accuracy and kappa coefficient for all datasets used for
classification of imageSubset2:

Dataset used for classification   | Overall Accuracy (%) | Kappa coefficient
Classification with all 155 bands | 93.4378              | 0.9104
MGA reduced 30 bands              | 96.0134              | 0.9442
PCA transformed 30 components     | 94.7816              | 0.9271
MNF transformed 30 components     | 94.7144              | 0.9262
ICA transformed 30 components     | 94.7816              | 0.9271

Robust Watermarking of
Satellite Images

What is Watermarking?
A watermark is a pattern of bits inserted into a digital
image, audio, or video file that identifies the file's copyright
information (author, rights, etc.).
Objectives:
To find a technique to watermark satellite images robustly
against attacks
To implement and analyze the proposed technique
To find a suitable technique for point selection from a vector
dataset for watermarking
To implement and analyze the proposed technique

Transform and LSB combination Methodology

Comparison of methods:

Method          | Domain and Type   | Image           | Experimental Attacks
LSB             | Spatial, fragile  | Greyscale       | cannot handle compression and simple operations
DCT             | Frequency, robust | Greyscale       | robust to JPEG compression, filters, Gaussian noise, histogram equalization, stretching
Third Level DWT | Frequency, robust | Satellite image | robust to JPEG compression, suitable for copyright protection
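
For illustration, a minimal sketch of the (fragile) LSB scheme from the first row of the table: the watermark bits overwrite the least significant bit of the first pixels. Names are illustrative, not from the original work.

    import numpy as np

    def embed_lsb(image, wm_bits):
        # Overwrite the least significant bit of the first len(wm_bits) pixels
        flat = image.flatten()                       # flatten() returns a copy
        flat[:len(wm_bits)] = (flat[:len(wm_bits)] & 0xFE) | wm_bits
        return flat.reshape(image.shape)

    def extract_lsb(image, n_bits):
        return image.flatten()[:n_bits] & 1

    img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
    wm = np.random.randint(0, 2, 16, dtype=np.uint8)
    marked = embed_lsb(img, wm)
    assert np.array_equal(extract_lsb(marked, 16), wm)

Any lossy compression or smoothing destroys these bits, which is why the table marks LSB as fragile.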

Texture Classification
Texture is the visual and especially tactile quality of a surface.
Each pixel of an image is classified into one of two classes:
1. High texture
2. Low texture
The method applied is a modified k-means (unsupervised
classification) algorithm
[Figures: k-means algorithm; texture based classification; suitable areas and embedding of the watermark; watermarking locations]

Proposed method for raster watermarking: texture based DWT
alpha method

Results of the texture based DWT alpha method:

No. | Input Image | Input Watermark | Attack type            | Correlation
1   | 2048x2048   | 16x16           | Smoothing 3x3          | 0.9482
2   | 2048x2048   | 16x16           | Smoothing 5x5          | 0.8768
3   | 2048x2048   | 16x16           | Smoothing 7x7          | 0.8254
4   | 2048x2048   | 16x16           | Contrast stretch       | 0.528
5   | 2048x2048   | 16x16           | salt_and_pepper (0.01) | 0.7837
6   | 2048x2048   | 16x16           | speckle noise (0.005)  | 0.8174

Vector Dataset point selection


The watermarking technique for vector images follows
Zope-Chaudhari et al. (2015).
There, all points are selected for watermarking, which makes
the watermarked vector data look visibly distorted.
To reduce this distortion, we propose two techniques:
1. Slope based point selection method
2. Alternate point selection method

Vector point selection method proposed

1. Slope based selection of points
The concept is to select points that do not fall in a straight
line and are not junction points, to reduce distortion in the
image.
We also reduce the alpha used by the watermarking method of
Zope-Chaudhari et al. (2015a) to embed the watermark in the
selected points.
During retrieval we use the same algorithm to select the points.

Algorithm for slope based selection of points

Read the input shape file (info).
List all x and y coordinates in vectors x and y. Remove the
junction points.
Calculate the slopes by taking three consecutive points, and
select the points whose two slopes have opposite signs and
differ by at least 0.3.
Save this set of points in vectors xn and yn.
Watermark the selected points and replace them in the vector
data. (A sketch of the selection step follows.)
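
A minimal Python sketch of the slope test above (junction removal and the watermark embedding itself are assumed to be done elsewhere; names are illustrative).

    def select_slope_points(xs, ys, min_slope_diff=0.3):
        # Keep interior points whose two adjacent segment slopes have
        # opposite signs and differ by at least min_slope_diff
        selected = []
        for i in range(1, len(xs) - 1):
            dx1, dx2 = xs[i] - xs[i - 1], xs[i + 1] - xs[i]
            if dx1 == 0 or dx2 == 0:      # skip vertical segments in this sketch
                continue
            s1 = (ys[i] - ys[i - 1]) / dx1
            s2 = (ys[i + 1] - ys[i]) / dx2
            if s1 * s2 < 0 and abs(s1 - s2) >= min_slope_diff:
                selected.append(i)
        return selected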

Results of the slope based point selection method:
[Figure: comparison of the original dataset and the watermarked dataset]

Vector point selection method proposed

2. Alternate point selection method
Select every alternate point in the dataset which is not a
junction.
Apply the watermarking algorithm on the selected points.
This method reduces the number of points used for watermarking,
thereby reducing error.
It is a simple method to implement.

Results of the alternate points method:
[Figure: comparison of the original dataset and the watermarked dataset]

Comparison of vector methods

Method                | Input points | Selected points | Average error | Maximum error | MSE
All points            | 58568        | 58568           | 0.0068        | 0.01          | 6.77E-05
Slope based selection | 58568        | 6814            | 6.89E-08      | 1.00E-07      | 6.89E-15
Alternate points      | 58568        | 28280           | 5.06E-04      | 0.001         | 5.06E-07

The above results show that the slope based method has the
minimum error, as it uses the minimum number of points for
watermarking.
The points embedded in the slope based method are not in a
straight line, so it causes less distortion.
The error decreases as the number of embedded points decreases.
Reducing the watermarking strength alpha also reduces the error,
but reducing alpha too much can reduce watermark robustness.

Multi-resolution Noise Filtering


Fast Discrete Curvelet Transform

Forward transform: 2-D data -> FFT -> frequency data ->
zero padding & windowing -> curvelet coefficients in the
frequency domain -> IFFT -> curvelet coefficients
Inverse transform: curvelet coefficients -> FFT -> curvelet
coefficients in the frequency domain -> windowing & truncation
-> reconstructed frequency data -> IFFT -> reconstructed data

Proposed Adaptive Thresholding

1. Input the curvelet coefficients.
2. Find the maximum and minimum values of the coefficients.
3. Iterate through thresholds from min to max, executing the
curvelet reconstruction algorithm for each.
4. Store the results in an array for each loop, as
temp = [threshold, PSNR].
5. When the loop is completed, select the threshold that gives
the maximum PSNR.
6. Perform curvelet reconstruction using the best threshold and
store the result.
(A sketch of the threshold scan follows.)
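
A minimal sketch of the threshold scan, assuming a reconstruct(coeffs, t) function that zeroes the curvelet coefficients below t and inverts the transform; that function is a placeholder for the reconstruction described above, not a real library call.

    import numpy as np

    def psnr(ref, test, peak=255.0):
        mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)

    def best_threshold(coeffs, reconstruct, reference, n_steps=50):
        # Scan thresholds from min to max coefficient magnitude and keep
        # the (threshold, PSNR) pair with the maximum PSNR
        lo, hi = np.abs(coeffs).min(), np.abs(coeffs).max()
        results = [(t, psnr(reference, reconstruct(coeffs, t)))
                   for t in np.linspace(lo, hi, n_steps)]
        return max(results, key=lambda r: r[1])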

MRA based Hybrid Approach for noise filtering

[Figures: original image; wavelet denoising; curvelet denoising]

Proposed Method
Curvelet patterns/artifacts are observed while denoising
These patterns are not present in wavelet based denoising
To overcome this problem and retain the advantages of curvelet
denoising, we propose a combined approach of wavelet and
curvelet algorithms:
Homogeneous areas => wavelets
Heterogeneous areas => curvelets

Proposed Method
Evolve a hybrid scheme whereby an image is first delineated
into smooth and textured regions, and
the denoising scheme is applied differently for each type
of region

Proposed Method
1) Find the edge magnitude by a standard edge detector on the
noisy image
2) Run a 5x5 or 7x7 window over the result of step 1 and find
the average edge gradient magnitude in this window
3) Apply k-means to select the threshold for segmentation,
apply thresholding, and declare homogeneous and heterogeneous
regions
4) Perform denoising using iterative thresholding on the
transform domain coefficients
(A sketch of steps 1-3 follows.)

Ansari, Rizwan Ahmed and Buddhiraju, Krishna Mohan, "k-means
based Hybrid Wavelet and Curvelet Transform Approach for
Denoising of Remotely Sensed Images", Remote Sensing Letters,
vol. 6 (12), pp. 982-991, 2015.
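
A minimal sketch of steps 1-3, assuming Sobel as the "standard edge detector" and a two-cluster 1-D k-means on the windowed edge magnitude; the actual wavelet/curvelet denoising of step 4 is omitted.

    import numpy as np
    from scipy import ndimage

    def region_map(noisy, window=5):
        # Step 1: edge magnitude from Sobel gradients
        gx = ndimage.sobel(noisy.astype(float), axis=0)
        gy = ndimage.sobel(noisy.astype(float), axis=1)
        mag = np.hypot(gx, gy)
        # Step 2: average edge magnitude in a window x window neighbourhood
        avg = ndimage.uniform_filter(mag, size=window)
        # Step 3: 1-D k-means (k = 2) to split pixels by average magnitude
        c = np.array([avg.min(), avg.max()])
        for _ in range(20):
            labels = (np.abs(avg - c[0]) > np.abs(avg - c[1])).astype(int)
            for k in (0, 1):
                if np.any(labels == k):
                    c[k] = avg[labels == k].mean()
        return labels   # 0 = homogeneous (wavelets), 1 = heterogeneous (curvelets)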

Extended Work

Four different methods are considered for segmentation into
homogeneous and heterogeneous areas:
Quadtree based
Entropy from GLCM based
Variance based
Edge magnitude based, with fuzzy c-means

Edge Preservation Analysis

Edge Enhancing Index (EEI): EEI = | V_f1 - V_f2 | / | V_1 - V_2 |,
where V_1 and V_2 are gray values on either side of an edge in
the original image and V_f1, V_f2 are the corresponding values
in the filtered image
Image Detail Preserving Coefficient (IDPC): the correlation
coefficient between the original image and the filtered image

[Figure: original Ikonos image (Powai area), 1m x 1m]
[Figure: noisy Ikonos image, noise level = 20]
[Figure: quadtree based segmentation]

Denoising results on the noisy Ikonos image:

Method                     | PSNR    | EEI  | MSE | IDPC
Wavelet based denoising    | 26.7 dB | 0.63 | 149 | 0.73
Contourlet based denoising | 25.9 dB | 0.69 | 157 | 0.75
Curvelet based denoising   | 28.9 dB | 0.88 | 127 | 0.91
ContCurv based denoising   | 29.1 dB | 0.81 | 119 | 0.88
WavCurv based denoising    | 32.1 dB | 0.92 | 99  | 0.90

Summary
Many remote sensing / image processing operations extract
information from images that can feed spatial problems with
useful data
The outcomes of satellite remotely sensed image analysis are
eventually linked to GIS
High spatial resolution image analysis outputs can be vectorized
into polygon layers, with some pre-processing to remove tiny
polygons
Hyperspectral image analysis and similar specialized tools can
provide information for GIS fields, such as identifying a
degraded forest or a polluted lake, which requires the detailed
investigation that such tools make possible
Image preprocessing through noise filtering can improve the
analysis procedures

Thank You!
