You are on page 1of 31

Case Studies

Canny Edge Detection


Particle-In-Cell Methods

Edges for inference


surface normal discontinuity
depth discontinuity
surface color discontinuity
illumination discontinuity

Edges in an image have many causes


An edge presents an opportunity to infer/
compress information required
Look at a few pixels in a binary image as opposed to
all pixels in a grayscale image

Biological plausibility
initial stages of mammalian vision systems involve
detection of edges and local features

Edge detection
Important 1st step of many
vision/graphics/compression algorithms
If you know the edges, the stuff in between
has low dimensional representations

Issues in edge detection


Noise and gradients
Edge localization (obtained edge must be at
true edge)
Edges of interest are in between other
gradients
Need non-maximum
suppression

Finally, edges returned


by the detector need to
be completed or pruned
using some labeling technique

Effects of noise
Consider a single row or column of the image
Plotting intensity as a function of position gives a
signal

Why does this happen and how can we find the edge?
Recall derivative definition

Laplacian of Gaussian
Consider

Laplacian of Gaussian
operator

Where is the edge?

Zero-crossings of bottom graph

Canny Algorithm
Gaussian convolution
Compute gradients via first derivative operators to
find magnitude and direction
Find pixels exhibiting the maxima of first derivative
gradient. i.e. find zero-crossings of the second
derivative, suppress other pixels
Run hysteresis

Cost of CPU version of Canny

Previous work
Reference CPU implementation: OpenCV computer vision
library.
Extremely fast.
Takes advantage of multicore and special processor instructions

Implementation on Parallel hardware:


Neoh & Hazanchuk, 2005: Edge Detection on FPGAs
Reference GPGPU implementation: Fung & Mann 2004, Fung
2005

J.-P Farrugia et al. GGPUCV


Filter part of the algorithm implemented in the NVIDIA SDK
(Podlozhnyuk, Image Convolution with CUDA)

First steps of Canny on GPU


Since these are done in the CUDA SDK will
not discuss much
Smoothing via the CUDA SDK
implementation
Coalesced read/write possible in one
direction due to natural ordering of image
pixels
Introduce a data structure with column apron
pixels placed in contiguous memory to allow
coalesced reads and writes for these

Smoothing Convolution filter


Non-separable Convolution Apron
Separable Convolution Apron

2 28 A pr on Pi xel s
7x7 Fil ter Wi n do w
16 x16 Th rea d B lock

192 A pro n Pi xel s

Gradient estimation

Filter 1

Sourc e Image
1

-1

=6

-2

-2

-2
-2
-2

-1

-2

-1

Interm ediate W indow

Filter 2

Sam ple Intermediate


Results

Window

Res ult
-5

Direction finding
Pixel Configurations

135 o

0o
45 o

90 o

45o
0o

90 o 135 o

= arctan(

Gy
)
Gx

At q, we have a maximum if the value is


larger than those at both p and at r.
Interpolate to get these values.

Non maxima suppression


Sample Gradient Magnitude Maxima

64

77

64

1 88

17 0

133

155

1 66

17 7

2 25

19 9

122

198

1 99

18 9

1 88

21 2

64

Gra di en t D ire ctio n

Ri dg e Pi xe ls
1 60
N on -ri dge Pixe ls

169

1 85

17 9

1 55

13 2

188

1 70

15 6

79

1 40

Hysteresis
Hysteresis Principles
2 thresholds technique for filtering edges
Low threshold t1
High threshold t2

Ridge pixels with magnitude m >= t2: Mark as


an edge pixel
t1 < m <t2 and adjacent to an edge pixel: Mark
as an edge pixel
Identifies strong edges and accounts for
comparatively weaker ones

Hysteresis Contd
Generalized Algorithm
Identify and mark all ridge pixels having
m >= t2 as visited edges
Add pixel into queue Q
Run a breadth first search (BFS) on Q
For each pixel i in Q
For each unvisited adjacent pixel j of i
Mark j as visited
If m(j) > t1, mark j as an edge and add j to Q
Remove i from Q

Terminate when Q is empty

Hysteresis Contd
CUDA Algorithm
Similar to the generalized algorithm
Multi-pass approach:
Preprocessing
Load central and apron gradient magnitude data from
global to shared memory
Set initial edge states

BFS
For each thread, run a BFS that marks potential edges
as definite edges

Write-back
Write all edge states to the gradient magnitude space
and reiterate

Hysteresis Contd
Preprocessing

For each thread-block


Load the central gradient magnitudes from
global to shared memory
Load a 1 unit wide apron from global to
shared memory
For each thread
Map the thread to unique central pixel i (nonapron)
Set pixel state as
S(mi) = {definite edge (-2): t2 <= mi
potential edge (-1): t1 <= mi < t2
non-edge (0): 0 < mi
(mi): mi <= 0}

Hysteresis Contd
Breadth First Search
For each thread with pixel i with edge state =
potential edge (-1)
If i has an adjacent pixel j with edge state = definite
edge (-2), add i to queue Qi
For each pixel k in Qi
Mark k as visited in
shared memory
Set edge state of k to
definite edge
For each unvisited
adjacent pixel l of k
Mark l as visited
If m(l) > t1, add l
to Qi

Remove k from Qi

Preprocessing

Breadth First Search

54

66

155

166

188

147

188

132

60

78

156

178

187

166

132

52

80

96

170

190

192

188

189

200

150

166

170

225

199

255

160

150

198

199

189

199

212

180

155

156

170

179

183

185

140

188

170

130

156

150

160

111

122

189

154

120

150

166

145

122

125

200

143

144

54

66

155

166

188

147

188

132

A1

B1

A2

A2

B2

B2

A1

A2

B1

A2

A2

B2

4
A3

60

-1

B2

A3

A2

B2

B2

A1

A2

B1

A2

A2

B2

B2

A2

B2

B2

A1

A2

B1

A2

A2

B2

B2

52

80

-1

-1

-1

-1

200

150

-2

-1

-2

198

-1

-1

-1

-1

-1

156

170

-1

130

A3

156

-1

120

150

166

145

122

125

200

143

144

A2

A3

B3

A2

B3

B4

Definite Edge

Non-edge

Potential Edge

Apron Pixel

t1 = 224
t 2 = 187

Write-back

Hysteresis
Contd
Iteration 1:
Preprocessing

Iteration 1:
BFS

Iteration 2:
Preprocessing

Iteration 2:
BFS

Non-edge

54

66

155

166

188

147

188

132

54

66

155

166

188

147

188

60

78

170

78

156

178

187

166

132

132
52

80

198

199

188

187

186

188

199

190

199

188

190

192

188

189

200

150

166

150

166

170

225

199

255

160

150

198

222

187

199

198

199

150

199

212

180

155

156

170

237

198

199

188

188

190

188

183

185

140

188

170

130

156

200

156

200

160

111

122

189

154

120

150

166

145

122

125

200

143

144

150

166

145

122

125

200

143

144

54

66

155

166

188

147

188

132

54

66

155

166

188

147

188

132

60

78

170

-2

52

80

-1

-1

-1

-1

-1

-1

199

190

-2

--2

-2

-2

-2

-2

200

150

166

150

-2

-2

-2

150

198

-2

-2

199

198

-1

-2

-2

-2

156

170

-2

-2

-2

-2

188

190

-1

-2

130

156

200

156

-1

-2

120

150

166

145

122

125

200

143

144

150

166

145

122

125

200

143

144

54

66

155

166

188

147

188

132

54

66

155

166

188

147

188

60

-2

52

80

-1

-1

-1

-1

-1

-1

-2

-1

-2

--2

-2

-2

-2

-2

200

150

198

-2

170

-2

-2

-2

-2

156

150

166

145

122

125

54

66

155

166

60

80

-2

-2

150

198

170

132

-2

-2

-2

150

-2

-1

-2

-2

-2

156

-1

-2

-1

-2

130

-1

-1

-2

120

200

143

144

150

166

145

122

125

200

143

144

188

147

188

132

54

66

155

166

188

147

188

132

-2

52

-2

-2

-2

-2

-2

-1

-2

--2

-2

-2

-2

-2

200

-2

-2

-2

150

-2

-2

-1

-2

-2

-2

-2

-2

156

-2

-2

-2

-2

-1

-2

-2

-2

130

156

-1

-2

-2

120

150

166

145

122

125

200

143

144

150

166

145

122

125

200

143

144

Potential-edge

Iteration 2
-2
-1
Preprocessing

Definite-edge

Apron Pixel

When all BFS from treads


terminate, write edge states
of central pixels from shared
memory to the gradient
magnitude space in global
memory

Multi-pass
Multiple calls to the entire
procedure allow data
migration
1 unit wide aprons allow
previous edge states to be
passed between threadblocks at the beginning of the
preprocessing stage
Bordering pixels along
thread-blocks that became
definite edges will be visible
to adjacent thread-blocks in
the following pass

profiling

Comparison with OpenCV

LENA

MANDRILL

Matlab
CUDA interoperability with Matlab via
compiled mex files (SDK example)
Adapted existing code to use
glCudaEdgeDetector(A,nChannels,threshold
Low,thresholdHigh,hysteresisIts,sigma);

More substantial speedups

Particle in Cell Methods


Arise in many areas
Fluid Mechanics
Plasma simulations

Ingredients
Field Equation
Particle evolution equation

Particles may be
Real particles (electrons, ions, bubbles, solid particles)
Point samples of a continuous function

General scheme
Separation of length and time scales
Field evolves at a macroscopic scale and particles contribute to
this evolution in an integral sense
Particles evolve and are influenced by local field, and possibly
local particle collisions

Particle Equations

Maxwells equations for E and B

How Does PIC work?


q
pp Bp )
(
m
 Solve Maxwell Equations on grid
pi := ip (zero self-force)
 Lorentz-Force: Fp = qE p +

qi = ip q p

Dual
DualGrid
GridCell
Cell

qi

E p = pi Ei

( Ei , B i )

(E

qp

,Bp )

Grid-Point
Grid-PointCharge
Charge

Charge Assignment

Force Interpolation

Basis of PIC plasma simulation


Acceleration and
increment of velocity

Calculation of forces
from fields and
velocity

Computation of electric
field. Magnetic is given.

Initial step
with =0

Displacements and new


positions. Boundary
conditions.

Calculation of
density. Current
profile fixed.

Resolution of Poisson
equation for the
electrostatic potential.

Major computational tasks


Interpolation of fields forces to particles
Evolving particles
Interpolating particle properties to grid for
next step
Major issues coalesced reads and writes
as particles move
Fast solution of field equations
Spectral methods
Finite-difference

You might also like