Canny PIC

Case Studies
Canny Edge Detection

Particle-In-Cell Methods
Edges for inference

surface normal discontinuity
depth discontinuity
surface color discontinuity
illumination discontinuity
Edges in an image have many causes

An edge presents an opportunity to infer/
compress information required
Look at a few pixels in a binary image as opposed to
all pixels in a grayscale image
Biological plausibility
initial stages of mammalian vision systems involve
detection of edges and local features
Edge detection
Important 1st step of many
vision/graphics/compression algorithms
If you know the edges, the stuff in between
has low dimensional representations
Issues in edge detection

Noise and gradients
Edge localization (obtained edge must be at
true edge)
Edges of interest are in between other
gradients
Need non-maximum
suppression
Finally, edges returned

by the detector need to
be completed or pruned
using some labeling technique
Effects of noise
Consider a single row or column of the image
Plotting intensity as a function of position gives a
signal
Why does this happen and how can we find the edge?
Recall derivative definition
Laplacian of Gaussian
Consider
Laplacian of Gaussian
operator
Where is the edge?
Zero-crossings of bottom graph
Canny Algorithm
Gaussian convolution
Compute gradients via first derivative operators to
find magnitude and direction
Find pixels exhibiting the maxima of first derivative
gradient. i.e. find zero-crossings of the second
derivative, suppress other pixels
Run hysteresis
Cost of CPU version of Canny
Previous work
Reference CPU implementation: OpenCV computer vision
library.
Extremely fast.
Takes advantage of multicore and special processor instructions
Implementation on Parallel hardware:

Neoh & Hazanchuk, 2005: Edge Detection on FPGAs
Reference GPGPU implementation: Fung & Mann 2004, Fung
2005
J.-P Farrugia et al. GGPUCV

Filter part of the algorithm implemented in the NVIDIA SDK
(Podlozhnyuk, Image Convolution with CUDA)
First steps of Canny on GPU

Since these are done in the CUDA SDK will
not discuss much
Smoothing via the CUDA SDK
implementation
Coalesced read/write possible in one
direction due to natural ordering of image
pixels
Introduce a data structure with column apron
pixels placed in contiguous memory to allow
coalesced reads and writes for these
Smoothing Convolution filter

Non-separable Convolution Apron
Separable Convolution Apron
2 28 A pr on Pi xel s
7x7 Fil ter Wi n do w
16 x16 Th rea d B lock
192 A pro n Pi xel s
Gradient estimation
Filter 1
Sourc e Image
1
-1
=6
-2
-2
-2
-2
-2
-1
-2
-1
Interm ediate W indow
Filter 2
Sam ple Intermediate

Results
Window
Res ult
-5
Direction finding
Pixel Configurations
135 o
0o
45 o
90 o
45o
0o
90 o 135 o
= arctan(
Gy
)
Gx
At q, we have a maximum if the value is

larger than those at both p and at r.
Interpolate to get these values.
Non maxima suppression

Sample Gradient Magnitude Maxima
64
77
64
1 88
17 0
133
155
1 66
17 7
2 25
19 9
122
198
1 99
18 9
1 88
21 2
64
Gra di en t D ire ctio n
Ri dg e Pi xe ls
1 60
N on -ri dge Pixe ls
169
1 85
17 9
1 55
13 2
188
1 70
15 6
79
1 40
Hysteresis
Hysteresis Principles
2 thresholds technique for filtering edges
Low threshold t1
High threshold t2
Ridge pixels with magnitude m >= t2: Mark as

an edge pixel
t1 < m <t2 and adjacent to an edge pixel: Mark
as an edge pixel
Identifies strong edges and accounts for
comparatively weaker ones
Hysteresis Contd
Generalized Algorithm
Identify and mark all ridge pixels having
m >= t2 as visited edges
Add pixel into queue Q
Run a breadth first search (BFS) on Q
For each pixel i in Q
For each unvisited adjacent pixel j of i
Mark j as visited
If m(j) > t1, mark j as an edge and add j to Q
Remove i from Q
Terminate when Q is empty
Hysteresis Contd
CUDA Algorithm
Similar to the generalized algorithm
Multi-pass approach:
Preprocessing
Load central and apron gradient magnitude data from
global to shared memory
Set initial edge states
BFS
For each thread, run a BFS that marks potential edges
as definite edges
Write-back
Write all edge states to the gradient magnitude space
and reiterate
Hysteresis Contd
Preprocessing
For each thread-block

Load the central gradient magnitudes from
global to shared memory
Load a 1 unit wide apron from global to
shared memory
For each thread
Map the thread to unique central pixel i (nonapron)
Set pixel state as
S(mi) = {definite edge (-2): t2 <= mi
potential edge (-1): t1 <= mi < t2
non-edge (0): 0 < mi
(mi): mi <= 0}
Hysteresis Contd
Breadth First Search
For each thread with pixel i with edge state =
potential edge (-1)
If i has an adjacent pixel j with edge state = definite
edge (-2), add i to queue Qi
For each pixel k in Qi
Mark k as visited in
shared memory
Set edge state of k to
definite edge
For each unvisited
adjacent pixel l of k
Mark l as visited
If m(l) > t1, add l
to Qi
Remove k from Qi
Preprocessing
Breadth First Search
54
66
155
166
188
147
188
132
60
78
156
178
187
166
132
52
80
96
170
190
192
188
189
200
150
166
170
225
199
255
160
150
198
199
189
199
212
180
155
156
170
179
183
185
140
188
170
130
156
150
160
111
122
189
154
120
150
166
145
122
125
200
143
144
54
66
155
166
188
147
188
132
A1
B1
A2
A2
B2
B2
A1
A2
B1
A2
A2
B2
4
A3
60
-1
B2
A3
A2
B2
B2
A1
A2
B1
A2
A2
B2
B2
A2
B2
B2
A1
A2
B1
A2
A2
B2
B2
52
80
-1
-1
-1
-1
200
150
-2
-1
-2
198
-1
-1
-1
-1
-1
156
170
-1
130
A3
156
-1
120
150
166
145
122
125
200
143
144
A2
A3
B3
A2
B3
B4
Definite Edge
Non-edge
Potential Edge
Apron Pixel
t1 = 224
t 2 = 187
Write-back
Hysteresis
Contd
Iteration 1:
Preprocessing
Iteration 1:
BFS
Iteration 2:
Preprocessing
Iteration 2:
BFS
Non-edge
54
66
155
166
188
147
188
132
54
66
155
166
188
147
188
60
78
170
78
156
178
187
166
132
132
52
80
198
199
188
187
186
188
199
190
199
188
190
192
188
189
200
150
166
150
166
170
225
199
255
160
150
198
222
187
199
198
199
150
199
212
180
155
156
170
237
198
199
188
188
190
188
183
185
140
188
170
130
156
200
156
200
160
111
122
189
154
120
150
166
145
122
125
200
143
144
150
166
145
122
125
200
143
144
54
66
155
166
188
147
188
132
54
66
155
166
188
147
188
132
60
78
170
-2
52
80
-1
-1
-1
-1
-1
-1
199
190
-2
--2
-2
-2
-2
-2
200
150
166
150
-2
-2
-2
150
198
-2
-2
199
198
-1
-2
-2
-2
156
170
-2
-2
-2
-2
188
190
-1
-2
130
156
200
156
-1
-2
120
150
166
145
122
125
200
143
144
150
166
145
122
125
200
143
144
54
66
155
166
188
147
188
132
54
66
155
166
188
147
188
60
-2
52
80
-1
-1
-1
-1
-1
-1
-2
-1
-2
--2
-2
-2
-2
-2
200
150
198
-2
170
-2
-2
-2
-2
156
150
166
145
122
125
54
66
155
166
60
80
-2
-2
150
198
170
132
-2
-2
-2
150
-2
-1
-2
-2
-2
156
-1
-2
-1
-2
130
-1
-1
-2
120
200
143
144
150
166
145
122
125
200
143
144
188
147
188
132
54
66
155
166
188
147
188
132
-2
52
-2
-2
-2
-2
-2
-1
-2
--2
-2
-2
-2
-2
200
-2
-2
-2
150
-2
-2
-1
-2
-2
-2
-2
-2
156
-2
-2
-2
-2
-1
-2
-2
-2
130
156
-1
-2
-2
120
150
166
145
122
125
200
143
144
150
166
145
122
125
200
143
144
Potential-edge
Iteration 2
-2
-1
Preprocessing
Definite-edge
Apron Pixel
When all BFS from treads

terminate, write edge states
of central pixels from shared
memory to the gradient
magnitude space in global
memory
Multi-pass
Multiple calls to the entire
procedure allow data
migration
1 unit wide aprons allow
previous edge states to be
passed between threadblocks at the beginning of the
preprocessing stage
Bordering pixels along
thread-blocks that became
definite edges will be visible
to adjacent thread-blocks in
the following pass
profiling
Comparison with OpenCV
LENA
MANDRILL
Matlab
CUDA interoperability with Matlab via
compiled mex files (SDK example)
Adapted existing code to use
glCudaEdgeDetector(A,nChannels,threshold
Low,thresholdHigh,hysteresisIts,sigma);
More substantial speedups
Particle in Cell Methods

Arise in many areas
Fluid Mechanics
Plasma simulations
Ingredients
Field Equation
Particle evolution equation
Particles may be
Real particles (electrons, ions, bubbles, solid particles)
Point samples of a continuous function
General scheme
Separation of length and time scales
Field evolves at a macroscopic scale and particles contribute to
this evolution in an integral sense
Particles evolve and are influenced by local field, and possibly
local particle collisions
Particle Equations
Maxwells equations for E and B
How Does PIC work?

q
pp Bp )
(
m
Solve Maxwell Equations on grid
pi := ip (zero self-force)
Lorentz-Force: Fp = qE p +
qi = ip q p
Dual
DualGrid
GridCell
Cell
qi
E p = pi Ei
( Ei , B i )
(E
qp
,Bp )
Grid-Point
Grid-PointCharge
Charge
Charge Assignment
Force Interpolation
Basis of PIC plasma simulation

Acceleration and
increment of velocity
Calculation of forces
from fields and
velocity
Computation of electric
field. Magnetic is given.
Initial step
with =0
Displacements and new

positions. Boundary
conditions.
Calculation of
density. Current
profile fixed.
Resolution of Poisson
equation for the
electrostatic potential.
Major computational tasks

Interpolation of fields forces to particles
Evolving particles
Interpolating particle properties to grid for
next step
Major issues coalesced reads and writes
as particles move
Fast solution of field equations
Spectral methods
Finite-difference

Canny PIC

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Canny PIC

Uploaded by

Copyright:

Available Formats

Case Studies

Canny Edge Detection

Edges for inference

Edges in an image have many causes

Issues in edge detection

Finally, edges returned

Where is the edge?

Zero-crossings of bottom graph

Cost of CPU version of Canny

Implementation on Parallel hardware:

J.-P Farrugia et al. GGPUCV

First steps of Canny on GPU

Smoothing Convolution filter

192 A pro n Pi xel s

Interm ediate W indow

Sam ple Intermediate

At q, we have a maximum if the value is

Non maxima suppression

Gra di en t D ire ctio n

Ridge pixels with magnitude m >= t2: Mark as

Terminate when Q is empty

For each thread-block

Breadth First Search

When all BFS from treads

Comparison with OpenCV

More substantial speedups

Particle in Cell Methods

Maxwells equations for E and B

How Does PIC work?

Basis of PIC plasma simulation

Displacements and new

Major computational tasks

You might also like