
3D Computer Vision

Radim Sara, Martin Matousek

Center for Machine Perception


Department of Cybernetics
Faculty of Electrical Engineering
Czech Technical University in Prague

https://cw.felk.cvut.cz/doku.php/courses/a4m33tdv/
http://cmp.felk.cvut.cz
mailto:sara@cmp.felk.cvut.cz
phone ext. 7203

rev. January 7, 2014

Open Informatics Masters Course


Course Overview: Lectures

Programme:
1. how to do things right in 3D vision: a cookbook of effective methods, pitfall avoidance
2. things useful beyond computer vision: an exercise in task formulation, powerful robust optimization methods

Content:
Elements of projective geometry
Perspective camera
Geometric problems in 3D vision
Epipolar geometry (and motion-induced homography, TBD)
Optimization methods in 3D vision
3D structure and camera motion from (many) images
Stereoscopic vision
Shape from reflectance

3D Computer Vision: I. (p. 9/211) R. Sara, CMP; rev. 7Jan2014


Background

► Important Material Covered Elsewhere


Homography as a multiview model [H&Z Secs: 2.5, 2.3, 2.4, 4.1, 4.2, A6.1, A6.2]
Sparse image matching using RANSAC [H&Z Sec. 4.7]

Necessary Prior Knowledge


basic geometry
elementary linear algebra
  vectors, matrices, bases, linear systems of equations, eigensystems, decompositions incl. SVD
elementary optimization in the continuous domain
  quadratic problems, constrained optimization, gradient descent, Newton's method
the basics of Bayesian thinking: prior, posterior, likelihood, evidence

3D Computer Vision: I. (p. 10/211) R. Sara, CMP; rev. 7Jan2014


► Reading

Annotated Slides: these are the reference material (not static; they evolve over the years)
for deeper digs into geometry, see the GVG lecture notes at
https://cw.felk.cvut.cz/doku.php/courses/a4m33gvg/start
there is a Czech-English and English-Czech dictionary for this course

The Book, referenced as [H&Z]

Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision.


Cambridge University Press, 2nd ed, 2003. Secs. 2, 4, 6, 7, 9, 10, 11, 12.

you can borrow this book from the CMP library


contact Ms. Pavla Maresova, room G102, mailto:pavla.maresova@fel.cvut.cz
indicate you are a student of this course

The Stereo Paper, referenced as [SP]


How To Teach Stereoscopic Matching? (Invited Paper)
Radim Sara
Center for Machine Perception, Department of Cybernetics
Czech Technical University in Prague, Czech Republic
Email: sara@cmp.felk.cvut.cz

Sara, R. Stereoscopic Vision. 2010

(the first page of the paper is reproduced on the slide; its full text is omitted here)

Czech: http://cmp.felk.cvut.cz/~sara/Teaching/TDV/SP-cz.pdf
English: http://cmp.felk.cvut.cz/~sara/Teaching/TDV/SP-en.pdf

3D Computer Vision: I. (p. 11/211) R. Sara, CMP; rev. 7Jan2014


Optional Reading
(available from Google Scholar or CTU Library)

M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395, 1981.

C. Harris and M. Stephens. A combined corner and edge detector. In Proc ALVEY Vision Conf, pp. 147-151, University of Manchester, England, 1988.

D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.

H. Li and R. Hartley. Five-point motion estimation made easy. In Proc ICPR, pp. 630-633, 2006.

B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. A comprehensive survey of bundle adjustment in computer vision. In Proc Vision Algorithms: Theory and Practice, LNCS 1883:298-372. Springer Verlag, 1999.

25 years of RANSAC. In CVPR '06 Workshop and Tutorial [on-line], http://cmp.felk.cvut.cz/ransac-cvpr2006/. 2006.

G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, USA, 3rd edition, 1996.

3D Computer Vision: I. (p. 12/211) R. Sara, CMP; rev. 7Jan2014


On-line Resources

1. OpenCV (Open Source Computer Vision): A library of programming functions for real
time computer vision. [on-line]
http://opencv.org/

2. T. Pajdla, Z. Kukelova, M. Bujnak. Minimal problems in computer vision. [on-line]


http://cmp.felk.cvut.cz/minimal/
Last update Oct 22, 2009.

3. Rob Hess. SIFT Feature Detector. [on-line]


http://robwhess.github.io/opensift/.
Last update Oct 24, 2013.

4. Marco Zuliani. RANSAC Toolbox for Matlab. [on-line]


http://vision.ece.ucsb.edu/~zuliani/Code/Code.html.
Last update Oct 18, 2009

5. Manolis Lourakis. A Generic Sparse Bundle Adjustment C/C++ Package based on


the Levenberg-Marquardt Algorithm. [on-line]
http://www.ics.forth.gr/~lourakis/sba/.
Last update Jan 5, 2010.

3D Computer Vision: I. (p. 13/211) R. Sara, CMP; rev. 7Jan2014


► Notes on Slide Style

I am using a consistent slide style:

main material is in black (like this)

remarks and notes are in a small blue font, typically flushed to the right (like this)

papers or books are referenced like this: [H&Z, p. 100] or [Golub & van Loan 1996]
  except for H&Z or SP, references point to the list on Slide 12

most references are linked (clickable) in the PDF; the triple of icons at the bottom helps you navigate back and forth; they are: back, find, forward
  check the references above

each major part of the lecture starts with a slide listing the equivalent written material

slides containing examined material have a bold title with a marker ► (like this slide)

mandatory homework exercises are in a small red font, after a circled asterisk
  ⊛ H1; 10pt: syntax: <problem ID in submission system>; <points>: explanation
  deadline: Lecture Day + 2 weeks; unit penalty per 7 days; see the Submission System

non-mandatory homework exercises are in green

  ⊛ P1; 1pt: same syntax; deadline: end of semester

3D Computer Vision: I. (p. 14/211) R. Sara, CMP; rev. 7Jan2014


Part II

Perspective Camera

1 Basic Entities: Points, Lines

2 Homography: Mapping Acting on Points and Lines

3 Canonical Perspective Camera

4 Changing the Outer and Inner Reference Frames

5 Projection Matrix Decomposition

6 Anatomy of Linear Perspective Camera

7 Vanishing Points and Lines

8 Real Camera with Radial Distortion

covered by
[H&Z] Secs: 2.1, 2.2, 3.1, 6.1, 6.2, 8.6, 2.5, 7.4, Example: 2.19

3D Computer Vision: II. Perspective Camera (p. 15/211) R. Sara, CMP; rev. 7Jan2014
► Basic Geometric Entities, their Representation, and Notation

entities have names and representations

names and their components:

entity    in 2-space       in 3-space
point     m = (u, v)       X = (x, y, z)
line      n                O
plane                      π

associated vector representations:

m = [u, v]^T,   X = [x, y, z]^T,   n

will also be written in an in-line form as m = (u, v), X = (x, y, z), etc.
vectors are always meant to be columns, x ∈ R^{n,1}

associated homogeneous representations:

m = [m1, m2, m3]^T,   X = [x1, x2, x3, x4]^T,   n

in-line forms: m = (m1, m2, m3), X = (x1, x2, x3, x4), etc.
matrices are Q ∈ R^{m,n}

3D Computer Vision: II. Perspective Camera (p. 16/211) R. Sara, CMP; rev. 7Jan2014
► Image Line

a line in the plane: au + bv + c = 0

corresponds to the (homogeneous) vector n ≃ (a, b, c)

and to the equivalence class: for λ ∈ R, λ ≠ 0: λ(a, b, c) ≃ (a, b, c)

the set of equivalence classes of vectors in R^3 \ {(0, 0, 0)} forms the projective space P^2
  (a set of rays)

the standard representative for a finite n = (n1, n2, n3) is λn, where λ = 1/sqrt(n1^2 + n2^2)
  assuming n1^2 + n2^2 ≠ 0; 1 is the unit, usually 1 = 1

naming convention: a special entity is the Ideal Line (line at infinity)

n∞ ≃ (0, 0, 1)    (standard representative)

I may sometimes wrongly use = instead of ≃; help me chase the mistakes down

3D Computer Vision: II. Perspective Camera (p. 17/211) R. Sara, CMP; rev. 7Jan2014
► Image Point

Point m = (u, v) is incident on the line n = (a, b, c) iff    (this works both ways!)

au + bv + c = 0

can be rewritten (with the scalar product): (u, v, 1) · (a, b, c) = m^T n = 0

the point is thus also represented by a homogeneous vector m ≃ (u, v, 1)

and the equivalence class for λ ∈ R, λ ≠ 0 is (λ m1, λ m2, λ m3) = λ m ≃ m

the standard representative for a finite point m is λm, where λ = 1/m3
  assuming m3 ≠ 0
when 1 = 1 then the units are pixels and λm = (u, v, 1)
when 1 = f then all components have a similar magnitude, f ≈ image diagonal
  use 1 = 1 unless you know what you are doing;
  all entities participating in a formula must be expressed in the same units

naming convention: Ideal Point (point at infinity) m∞ ≃ (m1, m2, 0)
  a proper member of P^2
all such points lie on the ideal line n∞ ≃ (0, 0, 1), i.e. m∞^T n∞ = 0

3D Computer Vision: II. Perspective Camera (p. 18/211) R. Sara, CMP; rev. 7Jan2014
► Line Intersection and Point Join

The point of intersection m of image lines n and n', n ≄ n', is

m ≃ n × n'

proof: if m = n × n' is the intersection point, it must be incident on both lines; indeed,
  n^T (n × n') ≡ 0,   n'^T (n × n') ≡ 0

The join n of two image points m and m', m ≄ m', is

n ≃ m × m'

Parallel lines intersect at the line at infinity n∞ ≃ (0, 0, 1):

  a u + b v + c = 0,
  a u + b v + d = 0,  d ≠ c
  (a, b, c) × (a, b, d) ≃ (b, -a, 0)

all such intersections lie on n∞

the line at infinity therefore represents the set of all directions in the plane

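A minimal Matlab check of the two cross-product formulas above; the example lines and points are made up for illustration:

  n1 = [1; 0; -1];        % the line u = 1
  n2 = [0; 1; -2];        % the line v = 2
  m = cross(n1, n2);      % homogeneous intersection point
  m = m / m(3)            % standard representative: (1, 2, 1)
  p1 = [0; 0; 1];         % the origin
  p2 = [1; 1; 1];         % the point (1, 1)
  n = cross(p1, p2)       % their join: the line u - v = 0, up to scale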
3D Computer Vision: II. Perspective Camera (p. 19/211) R. Sara, CMP; rev. 7Jan2014
► Homography

Projective plane P^2: the vector space of dimension 3,
  excluding the zero vector, R^3 \ {(0, 0, 0)},
  but including points at infinity and the line at infinity,
  factorized to linear equivalence classes (rays); vectors x ∈ R^3 represent elements of P^2

Collineation: Let x1, x2, x3 be collinear points in P^2 (coplanar rays in R^3). A bijection
h : P^2 → P^2 is a collineation iff h(x1), h(x2), h(x3) are collinear in P^2 (coplanar in R^3).
  (bijection = 1:1 and onto)

collinear image points are mapped to collinear image points: lines are mapped to lines
concurrent image lines are mapped to concurrent image lines (bijection!)
  (concurrent = intersecting at the same point)
point-line incidence is preserved

a mapping h : P^2 → P^2 is a collineation iff there exists a non-singular 3 × 3 matrix H s.t.

h(x) ≃ H x  for all x ∈ P^2

  homogeneous matrix representant: det H = 1
  in this course we will use the term homography but mean collineation
3D Computer Vision: II. Perspective Camera (p. 20/211) R. Sara, CMP; rev. 7Jan2014
Some Homographic Tasters

Rectification of camera rotation: Slides 64 (geometry), 123 (homography estimation)

Homographic Mouse for Visual Odometry: Slide TBD

illustrations courtesy of AMSL Racing Team, Meiji University and LIBVISO: Library for VISual Odometry

3D Computer Vision: II. Perspective Camera (p. 21/211) R. Sara, CMP; rev. 7Jan2014
► Mapping Points and Lines by Homography

m' ≃ H m    (image point)
n' ≃ H^-T n    (image line);  H^-T = (H^-1)^T = (H^T)^-1

incidence is preserved: (m')^T n' ≃ m^T H^T H^-T n = m^T n = 0

a basic computation recipe (a Matlab sketch follows below):
1. a collineation has 8 DOF; it is given by 4 correspondences (points or lines) in a general position
2. extend pixel coordinates to homogeneous coordinates, m = (u, v, 1)
3. map by the homography, e.g. m' = H m
4. convert the result m' = (m'1, m'2, m'3) to canonical coordinates (pixels):
   u' = (m'1/m'3) · 1,   v' = (m'2/m'3) · 1
5. one can use the unity for the homogeneous coordinate on one side of the equation only!
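A minimal Matlab sketch of steps 2-4; the homography H and the pixel values are made-up assumptions:

  H = [1 0.1 5; 0 1.2 -3; 1e-4 0 1];   % hypothetical homography
  m = [100; 50; 1];                    % pixel (100, 50), homogeneous
  mp = H * m;                          % mapped point (homogeneous)
  u = mp(1)/mp(3);  v = mp(2)/mp(3);   % step 4: back to pixels
  n = [1; -1; 0];                      % the image line u - v = 0
  np = inv(H)' * n;                    % lines map by the inverse transpose
  % incidence check: (H*m)'*np equals m'*n up to rounding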
3D Computer Vision: II. Perspective Camera (p. 22/211) R. Sara, CMP; rev. 7Jan2014
Elementary Decomposition of a Homography

Unique decompositions: H = HS HA HP (= H'P H'A H'S)

HS = [ sR   t ]    similarity
     [ 0^T  1 ]

HA = [ K    0 ]    special affine
     [ 0^T  1 ]

HP = [ I    0 ]    special projective
     [ v^T  w ]

K: upper triangular matrix with positive diagonal entries
R: orthogonal, R^T R = I, det R = 1
s, w ∈ R, s > 0, w ≠ 0

H = [ sRK + t v^T   w t ]
    [ v^T           w   ]

one must use the "slim" QR decomposition, which is unique [Golub & van Loan 1996, Sec. 5.2.6]
HS, HA, HP are collineation subgroups
  (e.g. K = K1 K2, K1, I are all upper triangular with unit determinant; associativity holds)

3D Computer Vision: II. Perspective Camera (p. 23/211) R. Sara, CMP; rev. 7Jan2014
Homography Subgroups
group       DOF   matrix                       invariant properties

projective  8     [h11 h12 h13]                incidence, concurrency, collinearity,
                  [h21 h22 h23]                cross-ratio, order of contact (intersection,
                  [h31 h32 h33]                tangency, inflection), tangent
                                               discontinuities and cusps

affine      6     [a11 a12 tx]                 all above plus: parallelism, ratio of areas,
                  [a21 a22 ty]                 ratio of lengths on parallel lines, linear
                  [ 0   0   1]                 combinations of vectors (e.g. midpoints),
                                               the line at infinity n∞ (not pointwise)

similarity  4     [s cos φ  -s sin φ  tx]      all above plus: ratio of lengths, angle,
                  [s sin φ   s cos φ  ty]      the circular points I = (1, i, 0),
                  [   0         0     1]       J = (1, -i, 0)

Euclidean   3     [cos φ  -sin φ  tx]          all above plus: length, area
                  [sin φ   cos φ  ty]
                  [  0       0    1]

[H&Z, p. 44]


3D Computer Vision: II. Perspective Camera (p. 24/211) R. Sara, CMP; rev. 7Jan2014
► Canonical Perspective Camera (Pinhole Camera, Camera Obscura)

(figure: center of projection C, optical axis O along z, image plane π at unit distance,
space point X = (x, y, z) projecting to (x', y', 1))

1. right-handed canonical coordinate system (x, y, z)
2. origin = the center of projection C
3. image plane π at unit distance from C
4. optical axis O is perpendicular to π
5. principal point xp: the intersection of O and π
6. in this picture we are looking down the street
7. the perspective camera is given by C and π

the projected point in the natural image coordinate system (by similar triangles):

y'/1 = y/(1 + (z - 1)) = y/z,    x' = x/z
3D Computer Vision: II. Perspective Camera (p. 25/211) R. Sara, CMP; rev. 7Jan2014
► Natural and Canonical Image Coordinate Systems

projected point in the canonical camera:

[x', y', 1]^T = [x/z, y/z, 1]^T ≃ [x, y, z]^T = [1 0 0 0; 0 1 0 0; 0 0 1 0] [x; y; z; 1] = P0 X

projected point in the scanned image:    (notice the chimney!)

(figure: scanned image with origin (0, 0) in the corner, pixel (u, v), principal point xp = (u0, v0))

u = f x/z + u0,    v = f y/z + v0

[f x + z u0; f y + z v0; z] ≃ [f 0 u0; 0 f v0; 0 0 1] [1 0 0 0; 0 1 0 0; 0 0 1 0] X = K P0 X = P X

the calibration matrix K transforms the canonical camera P0 to the standard projective camera P
3D Computer Vision: II. Perspective Camera (p. 26/211) R. Sara, CMP; rev. 7Jan2014
► Computing with Perspective Camera Projection Matrix

m = [m1; m2; m3] = [f 0 u0 0; 0 f v0 0; 0 0 1 0] [x; y; z; 1] ≃ [x + (z/f) u0; y + (z/f) v0; z/f]

m1/m3 = f x/z + u0 = u,    m2/m3 = f y/z + v0 = v    when m3 ≠ 0

f: focal length; converts length ratios to pixels, [f] = px, f > 0
(u0, v0): principal point in pixels

Perspective Camera:
1. dimension reduction, since P ∈ R^{3,4}
2. nonlinear unit change 1 ↦ 1 · z/f, since m ≃ (x, y, z/f)
   for convenience we use P11 = P22 = f rather than P33 = 1/f, and u0, v0 in relative units
3. m3 = 0 represents points at infinity in the image plane π (points with z = 0)

3D Computer Vision: II. Perspective Camera (p. 27/211) R. Sara, CMP; rev. 7Jan2014
► Changing The Outer (World) Reference Frame

A transformation of a point from the world to the camera coordinate system:

Xc = R Xw + t

R: camera rotation matrix (the world orientation in the camera coordinate frame)
t: camera translation vector (the world origin in the camera coordinate frame)

P Xc = K P0 [Xc; 1] = K P0 [R Xw + t; 1] = K P0 [R t; 0^T 1] [Xw; 1] = K [R t] [Xw; 1]

P0 selects the first 3 rows of T = [R t; 0^T 1] and discards the last row
R is a rotation: R^T R = I, det R = +1;  I ∈ R^{3,3} is the identity matrix
6 extrinsic parameters: 3 rotation angles (Euler theorem), 3 translation components

alternative, often used, camera representations:

P = K [R t] = K R [I -C]

C: camera position in the world reference frame; t = -R C
r3^T: the third row of R, the optical axis in the world reference frame: r3 = R^-1 [0, 0, 1]^T

we can save some conversion and computation by noting that K R [I -C] X = K R (X - C)
3D Computer Vision: II. Perspective Camera (p. 28/211) R. Sara, CMP; rev. 7Jan2014
► Changing the Inner (Image) Reference Frame

The general form of the calibration matrix K includes
  digitization raster skew, angle θ
  pixel aspect ratio a

K = [ f   -f cot θ      u0 ]
    [ 0   f/(a sin θ)   v0 ]
    [ 0   0             1  ]

units: [f] = px, [u0] = px, [v0] = px, [a] = 1

⊛ H1; 2pt: Verify this K; hints: u' e_u' + v' e_v' = u e_u + v e_v, boldface are basis vectors;
K maps from an orthogonal system to a skewed system, [w'u', w'v', w']^T = K [u, v, 1]^T;
first skew, then sampling, then shift by u0, v0.    deadline LD+2 wk

a general finite perspective camera has 11 parameters:
  5 intrinsic parameters: f, u0, v0, a, θ    (finite camera: det K ≠ 0)
  6 extrinsic parameters: t, R(α, β, γ)

m ≃ P X,    P = [Q q] = K [R t] = K R [I -C]    (a recipe for filling P)

Representation Theorem: The set of projection matrices P of finite projective cameras is isomorphic
to the set of homogeneous 3 × 4 matrices with the left-hand 3 × 3 submatrix Q non-singular.

random finite camera: Q=rand(3,3); while det(Q)==0, Q=rand(3,3); end, P=[Q, rand(3,1)];

3D Computer Vision: II. Perspective Camera (p. 29/211) R. Sara, CMP; rev. 7Jan2014
► Projection Matrix Decomposition

P = [Q q] = K R [I -C] = K [R t]

Q ∈ R^{3,3}: full rank (if a finite perspective camera)
K ∈ R^{3,3}: upper triangular with positive diagonal entries
R ∈ R^{3,3}: rotation, R^T R = I and det R = +1

1. C = -Q^-1 q    (see the next slide)
2. RQ decomposition of Q = K R using three Givens rotations [H&Z, p. 579]:
   K = Q R32 R31 R21, where the product R32 R31 R21 = R^-1
3. t = -R C

R_ij zeroes element ij in Q, affecting only columns i and j, and the sequence preserves
previously zeroed elements; e.g.

R32 = [1 0 0; 0 c -s; 0 s c],  c^2 + s^2 = 1,  gives
c = q33/sqrt(q32^2 + q33^2),   s = -q32/sqrt(q32^2 + q33^2)

⊛ P1; 1pt: Multiply known matrices K, R and then decompose back; discuss the numerical errors
RQ decomposition non-uniqueness: K R = K T^-1 T R, where T = diag(-1, -1, 1) is also a
rotation; we must correct the result so that the diagonal elements of K are all positive
  ("slim" RQ decomposition)
care must be taken to avoid overflow, see [Golub & van Loan 1996, sec. 5.2]
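A Matlab sketch of the whole decomposition following the Givens scheme above; the helper name is hypothetical, and the final sign fixes absorb the homogeneous scale of P:

  function [K, R, C] = P2KRC(P)
    Q = P(:, 1:3);  q = P(:, 4);
    C = -Q \ q;                          % step 1: projection center
    K = Q;  R = eye(3);
    pairs = [3 2; 3 1; 2 1];             % elements (i,j) to zero, in order
    for k = 1:3
      i = pairs(k, 1);  j = pairs(k, 2);
      d = hypot(K(i,j), K(i,i));
      c = K(i,i)/d;  s = -K(i,j)/d;      % as in the R32 example above
      G = eye(3);
      G(j,j) = c;  G(j,i) = -s;  G(i,j) = s;  G(i,i) = c;
      K = K * G;                         % step 2: K = Q*R32*R31*R21
      R = G' * R;                        % accumulate R = (R32*R31*R21)^-1
    end
    D = diag(sign(diag(K)));
    K = K * D;  R = D * R;               % make diag(K) > 0 (note D = D^-1)
    if det(R) < 0, R = -R; end           % scale of P by -1 is allowed
  end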
3D Computer Vision: II. Perspective Camera (p. 30/211) R. Sara, CMP; rev. 7Jan2014
RQ Decomposition Step

One step of the RQ decomposition in Mathematica:

Q = Array[q, {3, 3}];
R32 = {{1, 0, 0}, {0, c, -s}, {0, s, c}};
R32 // MatrixForm

  1  0   0
  0  c  -s
  0  s   c

Q1 = Q.R32;
Q1 // MatrixForm

  q[1,1]  c q[1,2] + s q[1,3]  -s q[1,2] + c q[1,3]
  q[2,1]  c q[2,2] + s q[2,3]  -s q[2,2] + c q[2,3]
  q[3,1]  c q[3,2] + s q[3,3]  -s q[3,2] + c q[3,3]

s1 = Solve[{Q1[[3]][[2]] == 0, c^2 + s^2 == 1}, {c, s}];
s1 = s1[[2]]

  {c -> q[3,3]/Sqrt[q[3,2]^2 + q[3,3]^2],  s -> -q[3,2]/Sqrt[q[3,2]^2 + q[3,3]^2]}

Q1 = Q1 /. s1 // Simplify // MatrixForm

  q[1,1]  (-q[1,3] q[3,2] + q[1,2] q[3,3])/Sqrt[...]  (q[1,2] q[3,2] + q[1,3] q[3,3])/Sqrt[...]
  q[2,1]  (-q[2,3] q[3,2] + q[2,2] q[3,3])/Sqrt[...]  (q[2,2] q[3,2] + q[2,3] q[3,3])/Sqrt[...]
  q[3,1]  0                                            Sqrt[q[3,2]^2 + q[3,3]^2]

  (where Sqrt[...] stands for Sqrt[q[3,2]^2 + q[3,3]^2])
3D Computer Vision: II. Perspective Camera (p. 31/211) R. Sara, CMP; rev. 7Jan2014
► Center of Projection

Observation: a finite P has a non-trivial right null-space    (rank 3 but 4 columns)

Theorem
Let there be B ≠ 0 s.t. P B = 0. Then B is equal to the projection center C (in the world
coordinate frame).

Proof.
1. Consider the spatial line A ∨ B (B is given). We can write

   X(λ) ≃ A + λ B,  λ ∈ R

2. it images to

   P X(λ) ≃ P A + λ P B = P A

the whole line images to a single point, so the line must pass through the optical center of P
this holds for all choices of A, so the only common point of all such lines is C, i.e. B ≃ C  □

Hence

0 = P C = [Q q] [C'; 1] = Q C' + q   so that   C' = -Q^-1 q

C = (cj), where cj = (-1)^j det P^(j), in which P^(j) is P with column j dropped
Matlab: C_homo = null(P); or C = -Q\q;
3D Computer Vision: II. Perspective Camera (p. 32/211) R. Sara, CMP; rev. 7Jan2014
► Optical Ray

Optical ray: the spatial line that projects to a single image point.

1. consider the line X(λ) = C + λ d    (d: unit line direction vector, ||d|| = 1, λ ∈ R)
2. the image of point X(λ) is

m ≃ [Q q] [C + λd; 1] = Q(C + λd) + q = λ Q d = [Q q] [d; 0]

the optical ray line corresponding to image point m is

X(λ) = C + λ Q^-1 m,   λ ∈ R

an optical ray may be represented by its point at infinity, (d, 0)

3D Computer Vision: II. Perspective Camera (p. 33/211) R. Sara, CMP; rev. 7Jan2014
► Optical Axis

Optical axis: the line through C that is perpendicular to the image plane π

1. a line parallel to π images to the line at infinity in π:

   [u; v; 0] ≃ [q1^T q14; q2^T q24; q3^T q34] [X; 1]

2. point X lies in a plane parallel to π iff q3^T X + q34 = 0
3. this is a plane with q3 as the normal vector
4. optical axis direction: the substitution P ↦ λP must not change the direction
5. we select (assuming det(R) > 0)

   o = det(Q) q3

if P ↦ λP then det(Q) ↦ λ^3 det(Q) and q3 ↦ λ q3    [H&Z, p. 161]

3D Computer Vision: II. Perspective Camera (p. 34/211) R. Sara, CMP; rev. 7Jan2014
► Principal Point

Principal point: the intersection of the image plane and the optical axis

1. we take the point at infinity on the optical axis, (q3; 0), which must project to the principal point m0
2. then

   m0 ≃ [Q q] [q3; 0] = Q q3

principal point: m0 ≃ Q q3

the principal point is also the center of radial distortion (see Slide 50)

3D Computer Vision: II. Perspective Camera (p. 35/211) R. Sara, CMP; rev. 7Jan2014
► Optical Plane

A spatial plane with normal p passing through the optical center C and a given image line n.

the optical ray given by m:  d = Q^-1 m
the optical ray given by m': d' = Q^-1 m'

p = d × d' = (Q^-1 m) × (Q^-1 m') ≃ Q^T (m × m') = Q^T n    (note the factoring-out of Q!)

hence, 0 = p^T (X - C) = n^T Q (X - C) = n^T P X = (P^T n)^T X for every X in the plane
  (see Slide 28)

the optical plane is given by n:   φ ≃ P^T n,   φ1 x + φ2 y + φ3 z + φ4 = 0

3D Computer Vision: II. Perspective Camera (p. 36/211) R. Sara, CMP; rev. 7Jan2014
Cross-Check: Optical Ray as Optical Plane Intersection

the optical plane normal given by n:   p = Q^T n
the optical plane normal given by n':  p' = Q^T n'

d = p × p' = (Q^T n) × (Q^T n') ≃ Q^-1 (n × n') = Q^-1 m

3D Computer Vision: II. Perspective Camera (p. 37/211) R. Sara, CMP; rev. 7Jan2014
► Summary: Optical Center, Ray, Axis, Plane

General finite camera:

P = [Q q] = [q1^T q14; q2^T q24; q3^T q34] = K [R t] = K R [I -C]

C ≃ right null(P)       optical center (world coords.)
d = Q^-1 m              optical ray direction (world coords.)
o = det(Q) q3           outward optical axis (world coords.)
m0 ≃ Q q3               principal point (in the image plane)
φ = P^T n               optical plane (world coords.)

K = [ f  -f cot θ  u0;  0  f/(a sin θ)  v0;  0 0 1 ]    camera (calibration) matrix (f, u0, v0 in pixels)
R    camera rotation matrix (cam coords.)
t    camera translation vector (cam coords.)
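A Matlab sketch collecting these formulas for one finite camera; the camera P, image point m, and image line n below are illustrative assumptions:

  P = [eye(3), [0; 0; 5]];      % hypothetical camera: K = R = I, C = (0,0,-5)
  m = [1; 2; 1];  n = [1; -1; 0];
  Q = P(:, 1:3);  q = P(:, 4);
  C = null(P);  C = C / C(4)    % optical center (homogeneous)
  d = Q \ m                     % optical ray direction of m
  o = det(Q) * Q(3, :)'         % outward optical axis
  m0 = Q * Q(3, :)'             % principal point
  phi = P' * n                  % optical plane of n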
3D Computer Vision: II. Perspective Camera (p. 38/211) R. Sara, CMP; rev. 7Jan2014
What Can We Do with An Uncalibrated Perspective Camera?

How far is the engine?

the distance between sleepers is 0.806 m, but we cannot count them; the resolution is too low

We will review some life-saving theory...

3D Computer Vision: II. Perspective Camera (p. 39/211) R. Sara, CMP; rev. 7Jan2014
► Vanishing Point

Vanishing point: the limit of the projection of a point that moves along a space line
infinitely in one direction.    (the image of the point at infinity on the line)

m ≃ lim_{λ→±∞} P [X0 + λd; 1] = Q d    ⊛ P1; 1pt: Derive or prove

the V.P. is independent of the line position; it depends on the line orientation only
all parallel lines share the same V.P.
the image of the V.P. of a spatial line with direction vector d is m = Q d
a V.P. m corresponds to the spatial direction d = Q^-1 m    (the optical ray through m)

the V.P. is the image of a point at infinity on any line, not just of the optical ray as on Slide 33
3D Computer Vision: II. Perspective Camera (p. 40/211) R. Sara, CMP; rev. 7Jan2014
Some Vanishing Point Applications

where is the sun? | what is the wind direction? (must have video) | fly above the lane, at constant altitude!

3D Computer Vision: II. Perspective Camera (p. 41/211) R. Sara, CMP; rev. 7Jan2014
► Vanishing Line

Vanishing line: the set of vanishing points of all lines in a plane
  (the image of the line at infinity in the plane, and in all parallel planes)

(figure: a plane with normal n, vanishing points v1, v2, line orientation vectors)

a V.L. n corresponds to a space plane of normal vector p = Q^T n

a space plane of normal vector p has the V.L. represented by n = Q^-T p

3D Computer Vision: II. Perspective Camera (p. 42/211) R. Sara, CMP; rev. 7Jan2014
► Cross Ratio

Four collinear space points R, S, T, U define the cross-ratio

[RSTU] = (|RT| / |RU|) · (|SU| / |ST|)

(figure: the points R, S, T, U on a world line, their images r, s, t, u, camera center C; a mnemonic for the distance pairing)

|RT|: the signed distance from R to T (w.r.t. a fixed line orientation)

[SRUT] = [RSTU],   [RSUT] = 1/[RSTU],   [RTSU] = 1 - [RSTU]

Obs:

[RSTU] = (|r, t, v| / |r, u, v|) · (|s, u, v| / |s, t, v|),   |r, t, v| = det[r, t, v] = (r × t)^T v    (1)

Corollaries:
the cross-ratio is invariant under collineations (homographies) x' ≃ Hx    (plug Hx in (1))
the cross-ratio is invariant under perspective projection: [RSTU] = [rstu]

4 collinear points: any perspective camera will see the same cross-ratio of their images
we measure the same cross-ratio in the image as on the world line
one of the points R, S, T, U may be at infinity
3D Computer Vision: II. Perspective Camera (p. 43/211) R. Sara, CMP; rev. 7Jan2014
► 1D Projective Coordinates

The 1-D projective coordinate of a point P is defined by the following cross-ratio:

[P] = [P P0 PI P∞] = [p p0 pI p∞] = (|p0 p| · |pI p∞|) / (|p0 pI| · |p p∞|) = [p]

naming convention:
P0: the origin, [P0] = 0
PI: the unit point, [PI] = 1
P∞: the supporting point, [P∞] = ±∞

[P] is equal to the Euclidean coordinate along N
[p] is its measurement in the image plane

Applications:
given the image of a line N, the origin, the unit point, and the vanishing point, the
Euclidean coordinate of any point P ∈ N can be determined    (see Slide 45)
finding the v.p. of a line through a regular object    (see Slide 46)

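A Matlab sketch of [P] from four image measurements given as signed 1-D positions along the image line; the input values are made up:

  p0 = 0;  pI = 12;  pinf = 100;  p = 30;      % pixels along the line
  X = ((p - p0) * (pinf - pI)) / ((pI - p0) * (pinf - p))
  % X is the Euclidean coordinate of P in units of |P0 PI|;
  % sanity checks: p = p0 gives 0, p = pI gives 1, p -> pinf diverges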
3D Computer Vision: II. Perspective Camera (p. 44/211) R. Sara, CMP; rev. 7Jan2014
Application: Counting Steps

Namesti Miru underground station in Prague

(figure: the staircase with the supporting line and the marked points p0, p1, pI, p∞;
detail around the vanishing point)

Result: [P] = 214 steps (the correct answer is 216 steps)    (4 Mpx camera)

3D Computer Vision: II. Perspective Camera (p. 45/211) R. Sara, CMP; rev. 7Jan2014
Application: Finding the Horizon from Repetitions

p0

pI
p P0

PI

P
p1

in 3D: |P0 P | = 2|P0 PI | then [H&Z, p. 218]~ P1; 1pt: How high is the camera above the floor?
|P0 P | |p0 pI | |p0 p|
[P P0 PI P ] = =2 |p p0 | =
|P0 PI | |p0 p| 2|p0 pI |
could be applied to counting steps (Slide 45)
3D Computer Vision: II. Perspective Camera (p. 46/211) R. Sara, CMP; rev. 7Jan2014
Homework Problem

⊛ H2; 3pt: What is the ratio of heights of Building A to Building B?
  expected: a conceptual solution; use the notation from this figure
  deadline: +2 weeks

(figure: Buildings A and B with tops tA, tB and feet fA, fB; horizon n∞; auxiliary line h through tB and m; auxiliary points u, m)

Hints:
1. what are the properties of the line h connecting the top of Building B, tB, with the point m at which the
   horizon is intersected by the line p joining the feet fA, fB of both buildings? [1 point]
2. how do we actually get the horizon n∞? [1 point] (we do not see it directly, there are hills there)
3. what tool measures the length ratio? [formula = 1 point]
3D Computer Vision: II. Perspective Camera (p. 47/211) R. Sara, CMP; rev. 7Jan2014
2D Projective Coordinates

(figure: a plane grid with origin p0, unit points pxI, pyI, vanishing points px∞, py∞, and a point p with axis projections px, py)

[Px] = [Px P0 PxI Px∞]

[Py] = [Py P0 PyI Py∞]

3D Computer Vision: II. Perspective Camera (p. 48/211) R. Sara, CMP; rev. 7Jan2014
Application: Measuring on the Floor (Wall, etc)

San Giovanni in Laterano, Rome

measuring distances on the floor in terms of tile units


what are the dimensions of the seal? Is it circular (assuming square tiles)?
needs no explicit camera calibration
because we can see the calibrating object (vanishing points)

3D Computer Vision: II. Perspective Camera (p. 49/211) R. Sara, CMP; rev. 7Jan2014
► Real Camera with Radial Distortion

(figure triptych: an image with no radial distortion; an extreme case of radial distortion; the image undistorted by the division model)

distortion types: none (κ = 0), barrel (κ = 0.3), pincushion (κ = -0.3)
3D Computer Vision: II. Perspective Camera (p. 50/211) R. Sara, CMP; rev. 7Jan2014
► The Radial Distortion Mapping

y0: the center of radial distortion (usually the principal point)
yL: the linearly projected point
yR: the radially distorted point
the radial distortion r maps yL to yR along the radial direction
the magnitude of the transfer depends on the radius ||yL - y0|| only

circles centered at y0 map to centered circles; lines incident on y0 map onto themselves

the mapping r(·) can be scaled to a·r(·) so that a particular circle Cn of radius ρn does not scale

distortion     inside Cn      outside Cn
barrel         expanding      contracting
pincushion     contracting    expanding
3D Computer Vision: II. Perspective Camera (p. 51/211) R. Sara, CMP; rev. 7Jan2014
► Radial Distortion Models

let z = y - y0    (non-homogeneous)
we have zR = r(zL)    (zL linear, zR distorted)
but we are often interested in the inverse, yL = r^-1(yR)
yn: a no-distortion point on Cn, r(yn) = yn;  zn = yn - y0
yn: choose a boundary point that preserves all image content within the same image size

(figure: barrel distortion; the arrows represent zR - r(zL), in pincushion and in barrel)

Division Model

zL = zR / (1 - κ ||zR||^2/||zn||^2),    zR = 2 zL / (1 + sqrt(1 + 4κ ||z'||^2)),  where z' = zL/||zn||

a single parameter κ: κ > 0 barrel distortion, κ < 0 pincushion distortion
has an analytic inverse
models even some fish-eye lenses
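A Matlab sketch of the division model above; κ, the normalization radius, and the pixel values are illustrative assumptions:

  kappa = 0.3;  zn = 400;  y0 = [320; 240];
  yR = [500; 400];                             % a distorted pixel
  zR = yR - y0;
  zL = zR / (1 - kappa * (norm(zR)/zn)^2);     % undistorted (linear) point
  % the analytic inverse (distort again), as a round-trip check:
  zR2 = 2*zL / (1 + sqrt(1 + 4*kappa*(norm(zL)/zn)^2));
  norm(zR2 - zR)                               % ~0 up to rounding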
3D Computer Vision: II. Perspective Camera (p. 52/211) R. Sara, CMP; rev. 7Jan2014
contd

Polynomial Model

zL = zR · D(zR; zn, k) / (1 + Σ_{i=1}^{n} ki),

D(zR; zn, k) = 1 + k1 ρ^2 + k2 ρ^4 + ... + kn ρ^{2n},   ρ = ||zR|| / ||zn||,   k = (ki)

e.g. ki ≥ 0 barrel distortion, ki ≤ 0 pincushion distortion, i = 1, ..., n
a better fit for n ≥ 3
no analytic inverse
may lose monotonicity without requiring an equal sign in all ki
hard to calibrate (tends to overfit)

Zernike polynomials R_i^0 are a better choice:

R_2^0(ρ) = 2ρ^2 - 1,   R_4^0(ρ) = 6ρ^4 - 6ρ^2 + 1,   R_6^0(ρ) = 20ρ^6 - 30ρ^4 + 12ρ^2 - 1, ...

but one must know the field of view of the lens in the image plane, since ρ must satisfy 0 ≤ ρ ≤ 1
3D Computer Vision: II. Perspective Camera (p. 53/211) R. Sara, CMP; rev. 7Jan2014
► Real and Linear Camera Models

linear camera:   xL ≃ A K0 [R t] X
real camera:     xR ≃ A r(K0 [R t] X)
undistortion:    xL = A r^-1(A^-1 xR)

(diagram: perspective projection yL, distortion yR, scanning xR)

K0 = [f 0 0; 0 f 0; 0 0 1]    ideal calibration matrix;    A K0 = [f  s f  u0; 0  a f  v0; 0 0 1]

A = [1 s u0; 0 a v0; 0 0 1]    everything affecting the radial distortion: center, skew, aspect ratio

r: the radial distortion function (here, it includes the conversion from/to the homogeneous representation!)

Notes
assumption: the principal point and the center of radial distortion coincide
f is included in K0 to make radial distortion independent of focal length
A makes radial lens distortion an elliptic image distortion
it suffices in practice that r^-1 is an analytic function (r need not be)
3D Computer Vision: II. Perspective Camera (p. 54/211) R. Sara, CMP; rev. 7Jan2014
Calibrating Radial Distortion

radial distortion calibration includes at least 5 parameters: κ, u0, v0, s, a

1. detect a set of straight-line segment images {si}_{i=1}^{n} from a calibration target

2. select a suitable set of k measurement points per segment    (how to select k?)

3. define an invariant radial transfer error per measurement point ei,j in segment i:

   ei^2 = Σ_{j=1}^{k} ei,j^2    (e.g. the line-fit residual; closed form)

4. minimize the total radial transfer error while preserving yn:

   argmin over κ, u0, v0, s, a of Σ_{i=1}^{n} ei^2

line segments from real-world images require segmentation into inliers/outliers
  (inliers = lines that are straight in reality)
marginalisation over the hidden inlier/outlier label gives a robust error, e.g. (Slide 111)

   e'i^2 = -log( exp(-ei^2/σ^2) + t ),   t > 0

direct optimization usually suffices, but in general such optimization is unstable

3D Computer Vision: II. Perspective Camera (p. 55/211) R. Sara, CMP; rev. 7Jan2014
Example Calibrations

Low-resolution (VGA) wide field-of-view (130°) camera:

(figure: Camera 0, image 6: reprojection errors (16×); Camera 0: error histogram, % of data points vs. error in pixels)

Cam 0: RMS 0.33 px, max 1.97 px; f = 94.59 px, a = 1.0951, u0 = 243.26 px, v0 = 353.37 px;
(poly) k1 = 0.8256, k2 = 0.2261, k3 = 1.2524

4 Mpix consumer camera:

(plots: calibration errors [px] (RMS and max) and radial distortion coefficient values, for the division model and the polynomial model (n = 2, coefficients k1, k2), against the focal length from EXIF data, 6-22 mm)

radial distortion is slightly dependent on focal length
3D Computer Vision: II. Perspective Camera (p. 56/211) R. Sara, CMP; rev. 7Jan2014
Part III

Computing with a Single Camera

9 Calibration: Internal Camera Parameters from Vanishing Points and Lines

10 Camera Resection: Projection Matrix from 6 Known Points

11 Exterior Orientation: Camera Rotation and Translation from 3 Known Points

covered by
[1] [H&Z] Secs: 8.6, 7.1, 22.1
[2] Fischler, M.A. and Bolles, R.C. Random Sample Consensus: A Paradigm for Model
Fitting with Applications to Image Analysis and Automated Cartography.
Communications of the ACM 24(6):381-395, 1981
[3] [Golub & van Loan 1996, Sec. 2.5]

3D Computer Vision: III. Computing with a Single Camera (p. 57/211) R. Sara, CMP; rev. 7Jan2014
Obtaining Vanishing Points and Lines

orthogonal pairs can be collected from more images by camera rotation

vanishing line can be obtained without vanishing points (see Slide 46)

3D Computer Vision: III. Computing with a Single Camera (p. 58/211) R. Sara, CMP; rev. 7Jan2014
► Camera Calibration from Vanishing Points and Lines

Problem: Given finite vanishing points and/or vanishing lines, compute K.

di = Q^-1 vi,  i = 1, 2, 3    (Slide 40)
pij = Q^T nij,  i, j = 1, 2, 3, i ≠ j    (Slide 36)

(figure: three mutually orthogonal directions d1, d2, d3, their vanishing points v1, v2, v3, and the vanishing lines n12, n23, n31)

Constraints
1. orthogonal rays d1 ⊥ d2 in space; then

   0 = d1^T d2 = v1^T Q^-T Q^-1 v2 = v1^T (K K^T)^-1 v2 = v1^T ω v2    (ω: the IAC)

2. orthogonal planes pij ⊥ pik in space:

   0 = pij^T pik = nij^T Q Q^T nik = nij^T ω^-1 nik

3. an orthogonal ray and plane, dk ∥ pij, k ≠ i, j    (normal parallel to the optical ray):

   pij ≃ dk  so that  Q^T nij = λ Q^-1 vk,  hence  nij = λ Q^-T Q^-1 vk = λ ω vk,  λ ≠ 0

nij may be constructed from non-orthogonal vi and vj, e.g. using the cross-ratio
ω is a symmetric, positive definite 3 × 3 matrix    (IAC = Image of the Absolute Conic)

3D Computer Vision: III. Computing with a Single Camera (p. 59/211) R. Sara, CMP; rev. 7Jan2014
► contd

condition                                    constraint                     # constraints

(2) orthogonal v.p.                          vi^T ω vj = 0                  1
(3) orthogonal v.l.                          nij^T ω^-1 nik = 0             1
(4) v.p. orthogonal to v.l.                  nij = λ ω vk                   2
(5) orthogonal raster, θ = π/2               ω12 = ω21 = 0                  1
(6) unit aspect, a = 1 when θ = π/2          ω11 = ω22                      1
(7) known principal point, u0 = v0 = 0       ω13 = ω31 = ω23 = ω32 = 0      2

these are homogeneous linear equations for the 5 parameters in ω, in the form D w = 0
  (a Matlab sketch follows below)
λ can be eliminated from (4)
we will come to solving overdetermined homogeneous equations later    (Slide ??)
we need at least 5 constraints for a full symmetric 3 × 3 ω
we get K from ω^-1 = K K^T by Choleski decomposition
  the decomposition returns a positive definite upper triangular matrix
  one avoids solving a set of quadratic equations for the parameters in K

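A Matlab sketch of this pipeline for constraint (2); the row-stacking and the symmetric parametrization of ω are assumptions of this sketch, and v1, v2, v3 are assumed given:

  % one row of D per orthogonal vanishing-point pair (v, u):
  row = @(v, u) [v(1)*u(1), v(1)*u(2) + v(2)*u(1), v(2)*u(2), ...
                 v(1)*u(3) + v(3)*u(1), v(2)*u(3) + v(3)*u(2), v(3)*u(3)];
  D = [row(v1, v2); row(v1, v3); row(v2, v3)];  % add rows from (5)-(7)
  [~, ~, V] = svd(D);  w = V(:, end);           % least-squares D*w = 0
  omega = [w(1) w(2) w(4); w(2) w(3) w(5); w(4) w(5) w(6)];
  if omega(1,1) < 0, omega = -omega; end        % fix the homogeneous sign
  U = chol(omega);                              % omega = U'*U
  K = inv(U);  K = K / K(3,3)                   % then K*K' = inv(omega)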
3D Computer Vision: III. Computing with a Single Camera (p. 60/211) R. Sara, CMP; rev. 7Jan2014
In[1]:= K = {{f, s, u0}, {0, a f, v0}, {0, 0, 1}};
K // MatrixForm

  f  s    u0
  0  a f  v0
  0  0    1

In[4]:= ω = Inverse[K.Transpose[K]] Det[K]^2;
ω // Simplify // MatrixForm

  a^2 f^2                   -a f s                        a f (-a f u0 + s v0)
  -a f s                    f^2 + s^2                     a f s u0 - (f^2 + s^2) v0
  a f (-a f u0 + s v0)      a f s u0 - (f^2 + s^2) v0     a^2 f^2 (f^2 + u0^2) - 2 a f s u0 v0 + (f^2 + s^2) v0^2

In[8]:= ω/f^2 /. s -> 0 // Simplify // MatrixForm

  a^2       0     -a^2 u0
  0         1     -v0
  -a^2 u0   -v0   a^2 (f^2 + u0^2) + v0^2

In[10]:= ω /. {u0 -> 0, v0 -> 0} // MatrixForm

  a^2 f^2   -a f s      0
  -a f s    f^2 + s^2   0
  0         0           a^2 f^4

In[17]:= ω/f^2 /. {a -> 1, s -> 0} // Simplify // MatrixForm

  1     0     -u0
  0     1     -v0
  -u0   -v0   f^2 + u0^2 + v0^2
3D Computer Vision: III. Computing with a Single Camera (p. 61/211) R. Sara, CMP; rev. 7Jan2014
Examples

Assuming an orthogonal raster and unit aspect (ORUA): θ = π/2, a = 1

Ex 1:
Assuming ORUA and a known m0 = (u0, v0), two finite orthogonal vanishing points give f
  (in this formula, vi, m0 are not homogeneous!)

f^2 = -(v1 - m0)^T (v2 - m0)

Ex 2:
Non-orthogonal vanishing points vi, vj with a known angle φ:

cos φ = (vi^T ω vj) / (sqrt(vi^T ω vi) · sqrt(vj^T ω vj))

leads to polynomial equations

ORUA:

ω = (1/f^2) [1 0 -u0; 0 1 -v0; -u0 -v0 f^2 + u0^2 + v0^2]

ORUA and u0 = v0 = 0 gives

(f^2 + vi^T vj)^2 = (f^2 + ||vi||^2) (f^2 + ||vj||^2) cos^2 φ

3D Computer Vision: III. Computing with a Single Camera (p. 62/211) R. Sara, CMP; rev. 7Jan2014
► Camera Orientation from Two Finite Vanishing Points

Problem: Given K and two vanishing points corresponding to two known orthogonal
directions d1, d2, compute the camera orientation R with respect to the plane.

coordinate system choice, e.g.: d1 = (1, 0, 0), d2 = (0, 1, 0)

we know that

di ≃ Q^-1 vi = (K R)^-1 vi = R^-1 (K^-1 vi) = R^-1 wi,   hence   R di ≃ wi

then wi/||wi|| is the i-th column ri of R
the third column is orthogonal: r3 = r1 × r2

R = [ w1/||w1||,  w2/||w2||,  (w1 × w2)/||w1 × w2|| ]

(figure: some suitable scenes)
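A Matlab sketch of this construction; K, v1, v2 are assumed given, and the closing SVD projection is an added safeguard for noisy input, not part of the formula above:

  w1 = K \ v1;  w2 = K \ v2;
  r1 = w1 / norm(w1);  r2 = w2 / norm(w2);
  r3 = cross(w1, w2);  r3 = r3 / norm(r3);
  R = [r1, r2, r3];
  % with noisy vanishing points, r1 and r2 are not exactly orthogonal;
  % project to the nearest rotation:
  [U, ~, V] = svd(R);
  R = U * diag([1, 1, det(U*V')]) * V';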
3D Computer Vision: III. Computing with a Single Camera (p. 63/211) R. Sara, CMP; rev. 7Jan2014
Application: Planar Rectification

Principle: rotate the camera parallel to the plane of interest.

m ≃ K R [I -C] X,    m' ≃ K [I -C] X

m' ≃ K (K R)^-1 m = K R^T K^-1 m = H m

H is the rectifying homography

both K and R can be calibrated from two finite vanishing points    (Slide 63)
not possible when one (or both) of them are infinite

3D Computer Vision: III. Computing with a Single Camera (p. 64/211) R. Sara, CMP; rev. 7Jan2014
► Camera Resection

Camera calibration and orientation from a known set of k ≥ 6 reference points and their
images {(Xi, mi)}_{i=1}^{k}.

Xi is considered exact
mi is a measurement, m'i ≃ P Xi
projection error: ei = ||mi - m'i||

(figures: the projection error; a calibration target with a translation stage; automatic calibration point detection; the calibration chart)


3D Computer Vision: III. Computing with a Single Camera (p. 65/211) R. Sara, CMP; rev. 7Jan2014
► The Minimal Problem for Camera Resection

Problem: Given k = 6 corresponding pairs {(Xi, mi)}_{i=1}^{k}, find P.

λi mi = P Xi,   P = [q1^T q14; q2^T q24; q3^T q34],
Xi = (xi, yi, zi, 1),  mi = (ui, vi, 1),  λi ∈ R, λi ≠ 0,  i = 1, 2, ..., k, k = 6
  (easy to modify for infinite points Xi)

expanded:  λi ui = q1^T Xi + q14,   λi vi = q2^T Xi + q24,   λi = q3^T Xi + q34

after eliminating λi:

(q3^T Xi + q34) ui = q1^T Xi + q14,   (q3^T Xi + q34) vi = q2^T Xi + q24

Then, with q = (q1, q14, q2, q24, q3, q34) ∈ R^12 and two rows per point
(a Matlab sketch follows below),

A q = [ X1^T   0^T    -u1 X1^T ]
      [ 0^T    X1^T   -v1 X1^T ]
      [         ...            ] q = 0,   with the homogeneous Xi^T = (xi, yi, zi, 1)    (8)
      [ Xk^T   0^T    -uk Xk^T ]
      [ 0^T    Xk^T   -vk Xk^T ]

we need 11 independent parameters for P
A ∈ R^{2k,12}, q ∈ R^12
6 points in a general position give rank A = 12 and there is no non-trivial null space
drop one row to get a rank-11 matrix; then the basis of the null space of A gives q
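A Matlab sketch of (8); X (4 × 6, homogeneous columns) and u, v (6 × 1) are assumed given:

  A = zeros(12, 12);
  for i = 1:6
    Xi = X(:, i)';                            % row (xi, yi, zi, 1)
    A(2*i-1, :) = [Xi, zeros(1, 4), -u(i)*Xi];
    A(2*i,   :) = [zeros(1, 4), Xi, -v(i)*Xi];
  end
  A(end, :) = [];                             % drop one row: rank 11
  [~, ~, V] = svd(A);
  P = reshape(V(:, end), 4, 3)'               % rows (q1,q14), (q2,q24), (q3,q34)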
3D Computer Vision: III. Computing with a Single Camera (p. 66/211) R. Sara, CMP; rev. 7Jan2014
► The Jack-Knife Solution for k = 6

given the 6 correspondences, we have 12 equations for the 11 parameters
can we use all the information present in the 6 points?

Jack-knife estimation
1. n := 0
2. for i = 1, 2, ..., 2k do
   a. delete the i-th row from A, this gives Ai
   b. if dim null Ai > 1, continue with the next i
   c. n := n + 1
   d. compute the right null-space q^i of Ai    (e.g. by economy-size SVD)
   e. q^i := q^i normalized by its 11th component q11 and dimension-reduced
      (assuming a finite camera with P3,3 = 1)
3. from all n vectors q^i collected in Step 2d compute

   q' = (1/n) Σ_{i=1}^{n} q^i,    var[q] = ((n-1)/n) Σ_{i=1}^{n} diag((q^i - q')(q^i - q')^T)
     (regular for n ≥ 11)

we have a solution + an error estimate, per individual elements of P
at least 5 points must be in a general position    (see Slide 68)
a large error indicates near-degeneracy
the computation is not efficient with k > 6 points: it needs (2k choose 11) draws, e.g. k = 7 gives 364 draws
a better error estimation method: decompose Pi to Ki, Ri, ti (Slide 30), represent Ri with 3 parameters
(e.g. Euler angles, or the Cayley representation, see Slide 139) and compute the errors for the parameters

3D Computer Vision: III. Computing with a Single Camera (p. 67/211) R. Sara, CMP; rev. 7Jan2014
► Degenerate (Critical) Configurations for Camera Resection

Let X = {Xi; i = 1, ...} be a set of points and P1 ≄ P2 two regular (rank-3) cameras.
Then the two configurations (P1, X) and (P2, X) are image-equivalent if

P1 Xi ≃ P2 Xi  for all Xi ∈ X

if all calibration points Xi ∈ X lie on a plane κ, then camera resection is non-unique and
all image-equivalent camera centers lie on a spatial line C, with C' = C ∩ κ excluded
  (this also means we cannot resect if all Xi are infinite)
by adding points Xi ∈ X to C we gain nothing
there are additional image-equivalent configurations, see Case 4 next
see the proof sketch in the notes or in [H&Z, Sec. 22.1.2]

Note that if Q, T are suitable homographies then P1 ≃ Q P0 T, where P0 is canonical, and the
analysis can be made with the transformed points Yi = T Xi:

P0 Yi ≃ (Q^-1 P2 T^-1) Yi  for all Yi ∈ Y

3D Computer Vision: III. Computing with a Single Camera (p. 68/211) R. Sara, CMP; rev. 7Jan2014
contd (all cases)

Cases 5 and 6: the cameras C1, C2 are co-located at a point C; the points lie on three
optical rays, or on one optical ray and one optical plane
  Case 5: we see 3 isolated point images
  Case 6: we see a line of points and an isolated point

Cases 3 and 4: the cameras lie on a line C \ {C', C''}; the points lie on C and
  1. on two lines meeting C at C', C''    (Case 3: we see 2 lines of points)
  2. or on a plane meeting C at C'    (Case 4)

Case 2: the cameras lie on a planar conic C \ {C'} (not necessarily an ellipse); the points
lie on the conic C and on an additional line meeting the conic at C'
  Case 2: we see 2 lines of points

Case 1: the cameras and points all lie on a twisted cubic C
  Case 1: we see a conic
3D Computer Vision: III. Computing with a Single Camera (p. 69/211) R. Sara, CMP; rev. 7Jan2014
► Three-Point Exterior Orientation Problem (P3P)

Calibrated camera rotation and translation from Perspective images of 3 reference Points.

Problem: Given K and three corresponding pairs {(mi, Xi)}_{i=1}^{3}, find R, C by solving

λi mi = K R (Xi - C),  i = 1, 2, 3

1. Transform vi := K^-1 mi. Then    (the configuration w/o rotation)

   λi vi = R (Xi - C).    (9)

2. Eliminate R by taking the norm; rotation preserves length, ||Rx|| = ||x||:

   |λi| ||vi|| = ||Xi - C|| = zi    (10)

3. Consider only the angles among vi and apply the Cosine Law per
   triangle (C, Xi, Xj), i, j = 1, 2, 3, i ≠ j:

   dij^2 = zi^2 + zj^2 - 2 zi zj cij,
   zi = ||Xi - C||,  dij = ||Xj - Xi||,  cij = cos(angle(vi, vj))

4. Solve the system of 3 quadratic equations in the 3 unknowns zi [Fischler & Bolles, 1981]
   there may be no real root; there are up to 4 solutions that cannot be ignored
   (verify on additional points)
5. Compute C by trilateration (3-sphere intersection) from Xi and zi; then λi from (10) and R
   from (9)

Similar problems (P4P with unknown f) at http://cmp.felk.cvut.cz/minimal/ (with code)
3D Computer Vision: III. Computing with a Single Camera (p. 70/211) R. Sara, CMP; rev. 7Jan2014
Degenerate (Critical) Configurations for Exterior Orientation

unstable solution
  the center of projection C is located on the orthogonal circular cylinder whose base
  circumscribes the three points Xi
  unstable: a small change of Xi results in a large change of C
  can be detected by error propagation

degenerate
  the camera C is coplanar with the points (X1, X2, X3) but is not on the circumscribed
  circle of (X1, X2, X3): we see a line

no solution
  C is cocyclic with (X1, X2, X3): we see a line

additional critical configurations depend on the method used to solve the quadratic
equations    [Haralick et al. IJCV 1994]

3D Computer Vision: III. Computing with a Single Camera (p. 71/211) R. Sara, CMP; rev. 7Jan2014
► Populating A Little ZOO of Minimal Geometric Problems in CV

problem                given                                                   unknown   slide
camera resection       6 world-img correspondences {(Xi, mi)}_{i=1}^{6}        P         66
exterior orientation   K, 3 world-img correspondences {(Xi, mi)}_{i=1}^{3}     R, C      70

camera resection and exterior orientation are similar problems in a sense:
  we do resectioning when our camera is uncalibrated
  we do orientation when our camera is calibrated

more problems to come

3D Computer Vision: III. Computing with a Single Camera (p. 72/211) R. Sara, CMP; rev. 7Jan2014
Part IV

Computing with a Camera Pair

12 Camera Motions Inducing Epipolar Geometry

13 Estimating Fundamental Matrix from 7 Correspondences

14 Estimating Essential Matrix from 5 Correspondences

15 Triangulation: 3D Point Position from a Pair of Corresponding Points

covered by
[1] [H&Z] Secs: 9.1, 9.2, 9.6, 11.1, 11.2, 11.9, 12.2, 12.3, 12.5.1
[2] H. Li and R. Hartley. Five-point motion estimation made easy. In Proc ICPR 2006, pp. 630-633

additional references
H. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature,
293(5828):133-135, 1981.

3D Computer Vision: IV. Computing with a Camera Pair (p. 73/211) R. Sara, CMP; rev. 7Jan2014
► Geometric Model of a Camera Pair

Epipolar geometry:
brings constraints necessary for inter-image matching
its parametric form encapsulates information about the relative pose of two cameras

Description
the baseline b joins the projection centers C1, C2:  b = C2 - C1
the epipole ei ∈ πi is the image of Cj:  e1 ≃ P1 C2,  e2 ≃ P2 C1
li ∈ πi is the image of the epipolar plane ε = (C2, X, C1); lj is the epipolar line in image j
induced by mi in image i

(figure: the two-camera setup)

Epipolar constraint: d2, b, d1 are coplanar    (a necessary condition, see also Slide 86)

3D Computer Vision: IV. Computing with a Camera Pair (p. 74/211) R. Sara, CMP; rev. 7Jan2014
Epipolar Geometry Example: Forward Motion

(figure: image 1 and image 2 of a forward camera motion;
red: correspondences (click to see their IDs),
green: epipolar line pairs per correspondence (same ID in both images))

How high was the camera above the floor?

(figure: camera movement sketch, h = ?)
3D Computer Vision: IV. Computing with a Camera Pair (p. 75/211) R. Sara, CMP; rev. 7Jan2014
► Cross Products and Maps by Skew-Symmetric 3 × 3 Matrices

There is an equivalence b × m = [b]_x m, where [b]_x is a 3 × 3 skew-symmetric matrix

[b]_x = [ 0  -b3  b2; b3  0  -b1; -b2  b1  0 ],   assuming b = (b1, b2, b3)

Some properties
1. [b]_x^T = -[b]_x    (the general antisymmetry property)
2. [b]_x^3 = -||b||^2 [b]_x
3. ||[b]_x||_F = sqrt(2) ||b||    (Frobenius norm, ||A||_F^2 = Σ_{i,j} |aij|^2)
4. [b]_x b = 0
5. rank [b]_x = 2 iff ||b|| > 0    (check the minors of [b]_x)
6. for any regular B:  B^T [Bz]_x B = det B · [z]_x    (follows from the factoring on Slide 36)
7. special case: if R R^T = I then [Rb]_x = R [b]_x R^T
8. if Rb is a rotation about b then [Rb b]_x = [b]_x
9. note: [b]_x is not a homography; it is singular
10. if n is a line incident on the point b then [b]_x^2 n ≃ n
    (since [b]_x^2 = b b^T - ||b||^2 I and b^T n = 0)
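A small Matlab sketch: the [·]_x operator and a numeric check of properties 2 and 4 (b and m are made-up vectors):

  hat = @(b) [0, -b(3), b(2); b(3), 0, -b(1); -b(2), b(1), 0];
  b = [1; 2; 3];  m = [4; 5; 6];
  norm(cross(b, m) - hat(b)*m)          % 0: [b]_x realizes the cross product
  norm(hat(b)^3 + norm(b)^2 * hat(b))   % 0: property 2
  hat(b) * b                            % zero vector: property 4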
3D Computer Vision: IV. Computing with a Camera Pair (p. 76/211) R. Sara, CMP; rev. 7Jan2014
► Expressing Epipolar Constraint Algebraically

Pi = [Qi qi] = Ki [Ri ti],  i = 1, 2
R21: relative camera rotation, R21 = R2 R1^T
t21: relative camera translation, t21 = t2 - R21 t1 = -R2 b

remember: C = -Q^-1 q = -R^T t    (Slides 30 and 32)

0 = d2^T p_ε ≃ (Q2^-1 m2)^T Q1^T l1 = m2^T Q2^-T Q1^T (e1 × m1) = m2^T (Q2^-T Q1^T [e1]_x) m1
    (d2: optical ray; p_ε: normal of the epipolar plane; l1: image of ε in π1;
     Q2^-T Q1^T [e1]_x: the fundamental matrix F)

Epipolar constraint: m2^T F m1 = 0 is a point-line incidence constraint

point m2 is incident on the epipolar line l2 ≃ F m1
point m1 is incident on the epipolar line l1 ≃ F^T m2
F e1 = F^T e2 = 0 (non-trivially): all epipolars meet at the epipole

e1 ≃ Q1 C2 + q1 = Q1 C2 - Q1 C1 = K1 R1 b

F = Q2^-T Q1^T [e1]_x = Q2^-T Q1^T [K1 R1 b]_x ≃ K2^-T [t21]_x R21 K1^-1
    (by Property 6, Slide 76; the middle part [t21]_x R21 is the essential matrix E)

E = [t21]_x R21 = [-R2 b]_x R21 = R21 [-R1 b]_x
    (the baseline in Cam 2)       (the baseline in Cam 1)

3D Computer Vision: IV. Computing with a Camera Pair (p. 77/211) R. Sara, CMP; rev. 7Jan2014
► Key Properties of Fundamental Matrix

F = K2^-T [t21]_x R21 K1^-1    (the middle part [t21]_x R21 is the essential matrix E)

1. E captures the relative camera pose only [Longuet-Higgins 1981]
   (a change of the world coordinate system does not change E):

   [Ri' ti'] = [Ri ti] [R t; 0^T 1] = [Ri R   Ri t + ti],

   then
   R21' = R2' (R1')^T = ... = R21
   t21' = t2' - R21' t1' = ... = t21

2. the translation length ||t21|| is lost, since E is homogeneous

3. e2 × (e2 × F m1) ≃ F m1;  in general F ≃ [e2]_x^{2a} F [e1]_x^{2b} for any a, b ∈ N
   (follows from Property 10 on Slide 76)
   (diagram: point/line transmutation; e2 × F m1 is a point on the line F m1; the point e2
   (black) does not lie on the line of the same coordinates e2 (red, dashed): e2^T e2 ≠ 0)

3D Computer Vision: IV. Computing with a Camera Pair (p. 78/211) R. Sara, CMP; rev. 7Jan2014
► Some Mappings by the Fundamental Matrix

0 = m2^T F m1
e1 ≃ null(F),   e2 ≃ null(F^T)
l2 = F m1,   l1 = F^T m2
l2 = F [e1]_x l1,   l1 = F^T [e2]_x l2

on epipolar lines these act as the epipolar-plane homography (Slide 36):
F [e1]_x l1 ≃ Q2^-T Q1^T l1   and   F^T [e2]_x l2 ≃ Q1^-T Q2^T l2

proof of l2 ≃ F [e1]_x l1 (line-to-line transfer by line/point transmutation):
take the point x ≃ k × l1 with k := e1; x lies on l1, and x ≄ e1 since k^T e1 = ||e1||^2 ≠ 0; then

l2 ≃ F x ≃ F (k × l1) = F [k]_x l1 = F [e1]_x l1
3D Computer Vision: IV. Computing with a Camera Pair (p. 79/211) R. Sara, CMP; rev. 7Jan2014
► SVD Decomposition of 3 × 3 Skew-Symmetric Matrices

Every real skew-symmetric 3 × 3 matrix A can be written as A = U S U^T, where U is
orthogonal and

S = β [ 0 1 0; -1 0 0; 0 0 0 ],   β ∈ R.

Note that

S = β [ 1 0 0; 0 1 0; 0 0 0 ] · [ 0 1 0; -1 0 0; 0 0 1 ]    (11)
        (D)                       (W)

Then A = U S U^T = β U D W U^T and the SVD of A is

A = U D' V^T,  where D' = [ β 0 0; 0 β 0; 0 0 0 ]  and  V^T = W U^T    (12)

Note that W is a rotation about (0, 0, 1) by 90°.

3D Computer Vision: IV. Computing with a Camera Pair (p. 80/211) R. Sara, CMP; rev. 7Jan2014
► Representation Theorem for Essential Matrices

Theorem
Let E be a 3 × 3 matrix with SVD E = U D V^T. Then E is essential iff D ≃ diag(1, 1, 0).

Proof. Assume D = diag(1, 1, 0) and, w.l.o.g., ||t|| = 1.

Direct:
Let E ≃ [t]_x R and let W be as in (11). Then E = (U D W U^T) R = U D (W U^T R) = U D V^T
is the SVD of E, with V^T := W U^T R.

Converse:
Let E = U D V^T be the SVD of E. We can write E = (U D W U^T)(U W^T V^T) for some
orthogonal W such that S = D W is skew-symmetric. The only such W are

W = [ 0 α 0; -α 0 0; 0 0 β ]  with |α| = |β| = 1,  hence  S = D W = α [ 0 1 0; -1 0 0; 0 0 0 ] = [s]_x

and E = (U S U^T)(U W^T V^T) = [t]_x R has the form of an essential matrix.  □

note: U S U^T = U [s]_x U^T = [U s]_x = ±[u3]_x, where u3 is the last column of U    (Slide 76)
3D Computer Vision: IV. Computing with a Camera Pair (p. 81/211) R. Sara, CMP; rev. 7Jan2014
► Essential Matrix Decomposition

We are decomposing E to E = [t21]_x R21    [H&Z, sec. 9.6]

1. compute the SVD of E = U D V^T s.t. D = diag(1, 1, 0)
2. if det U < 0, transform it to -U; do the same for V    (the overall sign is dropped)
3. compute

   R21 = U [ 0 α 0; -α 0 0; 0 0 1 ] V^T,   t21 = β u3,   |α| = 1,  β ≠ 0    (13)
            (W)

Notes
t21 is recoverable up to the scale β and the direction sign
the result for R21 is unique up to α = ±1, despite the non-uniqueness of the SVD
a change of sign in W rotates the solution by 180° about t:
  with R1 = U W V^T, R2 = U W^T V^T:  T = R2 R1^T = ... = U diag(-1, -1, 1) U^T, which is
  a rotation by 180° about u3 = t21:

  U diag(-1, -1, 1) U^T u3 = U diag(-1, -1, 1) [0; 0; 1] = u3

4 solution sets for the 4 sign combinations of α, β    (see the next slide for the geometric interpretation)

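A Matlab sketch enumerating the four (R21, t21) candidates from (13); selecting among them by chirality is the subject of the next slide:

  [U, ~, V] = svd(E);
  if det(U) < 0, U = -U; end
  if det(V) < 0, V = -V; end
  W = [0 1 0; -1 0 0; 0 0 1];         % alpha = +1; W' is alpha = -1
  R = {U*W*V', U*W'*V'};
  t = {U(:, 3), -U(:, 3)};            % beta = +1 / -1 (scale stays unknown)
  % the 4 candidate pairs are (R{i}, t{j}), i, j in {1, 2}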
3D Computer Vision: IV. Computing with a Camera Pair (p. 82/211) R. Sara, CMP; rev. 7Jan2014
► Four Solutions to Essential Matrix Decomposition

Transform the world coordinate system so that the origin is in Camera 2. Then t21 = -b,
and W rotates about the baseline b.    (Slide 77)

(figure: the four configurations)
(α, β): the correct configuration
(-α, β): twisted by W (rotated by 180° about the baseline)
(α, -β): baseline reversal
(-α, -β): combination of both

chirality constraint: all 3D points must be in front of both cameras
this singles out the upper-left case    [H&Z, Sec. 9.6.3]
3D Computer Vision: IV. Computing with a Camera Pair (p. 83/211) R. Sara, CMP; rev. 7Jan2014
I7-Point Algorithm for Estimating Fundamental Matrix
Problem: Given a set {(x_i, y_i)}, i = 1, …, k, of k = 7 correspondences, estimate the fundamental matrix F.

    y_i^T F x_i = 0,  i = 1, …, k,   known: x_i = (u1_i, v1_i, 1), y_i = (u2_i, v2_i, 1)

terminology: correspondence = the truth; later: match = the algorithm's result, a hypothesised correspondence

Solution: each correspondence contributes one row of a data matrix D ∈ R^{k,9},

    D = [ u1_1·u2_1  u1_1·v2_1  u1_1  v1_1·u2_1  v1_1·v2_1  v1_1  u2_1  v2_1  1
          u1_2·u2_2  u1_2·v2_2  u1_2  v1_2·u2_2  v1_2·v2_2  v1_2  u2_2  v2_2  1
          ...
          u1_k·u2_k  u1_k·v2_k  u1_k  v1_k·u2_k  v1_k·v2_k  v1_k  u2_k  v2_k  1 ]

and we solve D f = 0 for f = (f11, f21, f31, …, f33)^T ∈ R^9 (F stacked by columns).

for k = 7 we have a rank-deficient system; the null space of D is 2-dimensional
but we know that det F = 0, hence
1. find a basis F1, F2 of the null space of D, by SVD or QR factorization
2. get up to 3 real solutions α_i from

    det(α F1 + (1 - α) F2) = 0    (a cubic equation in α)

3. get up to 3 fundamental matrices F = α_i F1 + (1 - α_i) F2  (check rank F = 2)

- the result may depend on image transformations
- normalization improves conditioning → Slide 91
- this gives a good starting point for the full algorithm → Slide 107
- dealing with mismatches need not be a part of the 7-point algorithm → Slide 108
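
A hedged Matlab sketch of the algorithm, assuming x and y are 3×7 matrices of homogeneous points (one correspondence per column):

    D = zeros(7,9);
    for i = 1:7
      D(i,:) = kron(x(:,i)', y(:,i)');   % row ordering matches f = F(:)
    end
    [~,~,V] = svd(D);
    F1 = reshape(V(:,8),3,3);  F2 = reshape(V(:,9),3,3);   % null-space basis
    % det(a*F1 + (1-a)*F2) = 0 is a cubic in a; recover its coefficients by
    % exact interpolation at 4 points (a convenient numerical shortcut):
    as = 0:3;
    c  = polyfit(as, arrayfun(@(a) det(a*F1 + (1-a)*F2), as), 3);
    r  = roots(c);  r = real(r(abs(imag(r)) < 1e-8));      % up to 3 real roots
    for a = r'
      F = a*F1 + (1-a)*F2;   % one solution; check that rank(F) == 2
    end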
3D Computer Vision: IV. Computing with a Camera Pair (p. 84/211) R. Sara, CMP; rev. 7Jan2014
IDegenerate Configurations for Fundamental Matrix Estimation

When is F not uniquely determined from any number of correspondences?   [H&Z, Sec. 11.9]
1. when the images are related by a homography
   a. camera centers coincide, C1 = C2:  H = K2 R21 K1^{-1}
   b. the camera moves but all 3D points lie in a plane (n, d):  H = K2 (R21 - t21 n^T/d) K1^{-1}

   - epipolar geometry is not defined
   - we do get an F from the 7-point algorithm but it is of the form F = [s]× H with s arbitrary (nonzero):
     y is the image of x, y ≃ Hx; this can be written as y ∈ l, l ≃ s × Hx for an arbitrary s, since
     0 = y^T (s × Hx) = y^T [s]× H x

2. both camera centers and all 3D points lie on a ruled quadric
   - hyperboloid of one sheet, cones, cylinders, two planes
   - there are 3 solutions for F

notes
- estimation of E can deal with planes: [s]× H = [s]× (R21 - t21 n^T/d) has two equal singular values iff s ≃ t21, and then the decomposition works (non-uniquely, as before)   ~ P1; 1pt for a proof
- a complete treatment with additional degenerate configurations is in [H&Z, Sec. 22.2]
- a stronger, oriented epipolar constraint could reject some configurations → next slide
3D Computer Vision: IV. Computing with a Camera Pair (p. 85/211) R. Sara, CMP; rev. 7Jan2014
A Note on Oriented Epipolar Constraint
- a tighter epipolar constraint preserves orientations
- requires all points and cameras to be on the same side of the plane at infinity

(figure: point X projecting via rays d1, d2 to m1, m2 over baseline b between C1, C2, with epipoles e1, e2 and epipolar lines l1, l2)

    e2 × m2 ≃+ F m1

notation: m ≃+ n means m = λn for some λ > 0

- note that the constraint is not invariant to a change of sign of either m_i
- all 7 correspondences in the 7-point alg. must give the same sign → see later
- this may help reject some wrong matches, see Slide 108   [Chum et al. 2004]
- an even tighter constraint: scene points in front of both cameras; this is expensive
  it is called the chirality constraint
3D Computer Vision: IV. Computing with a Camera Pair (p. 86/211) R. Sara, CMP; rev. 7Jan2014
I5-Point Algorithm for Relative Camera Orientation

Problem: Given corresponding image points {m_i, m'_i}, i = 1, …, 5, and the calibration matrix K, recover the camera motion R, t.

Obs:
1. E has 9 numbers but an arbitrary overall scale → 8 numbers at most
2. R has 3 DOF; of t we can recover 2 DOF only (the scale is lost); in total 5 DOF → we need 3 more constraints on E
3. E is essential iff it has two equal singular values and the third is zero (→ Slide 81)

This gives an equation system:

    v'_i^T E v_i = 0                          5 linear constraints   (v ≃ K^{-1} m)
    det E = 0                                 1 cubic constraint
    E E^T E - (1/2) tr(E E^T) E = 0           9 cubic constraints, 2 independent

    ~ P1; 1pt: verify this equation from E = U D V^T, D = diag(1, 1, 0)

1. estimate the null space of the linear constraints v'_i^T E v_i = 0 by the null-space method (SVD)
2. this gives E = x E1 + y E2 + z E3 + E4
3. at most 10 (complex) solutions for x, y, z from the cubic constraints

- when all 3D points lie on a plane: at most 2 solutions (the twisted pair); they can be disambiguated in 3 views or by the chirality constraint (→ Slide 83), unless all 3D points are closer to one camera
- a 6-point problem exists for unknown f   [Kukelova et al. BMVC 2008]
- resources at http://cmp.felk.cvut.cz/minimal/5_pt_relative.php
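
A quick numeric sanity check of these constraints in Matlab, assuming R is a rotation matrix and t a 3-vector (so E = [t]× R is essential by construction):

    tx = [0 -t(3) t(2); t(3) 0 -t(1); -t(2) t(1) 0];
    E  = tx*R;
    det(E)                              % ~ 0 up to round-off
    norm(E*E'*E - 0.5*trace(E*E')*E)    % ~ 0 up to round-off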
3D Computer Vision: IV. Computing with a Camera Pair (p. 87/211) R. Sara, CMP; rev. 7Jan2014
IThe Triangulation Problem
Problem: Given cameras P1, P2 and a correspondence x ↔ y, compute a 3D point X projecting to x and y:

    λ1 x = P1 X,   λ2 y = P2 X,   x = (u1, v1, 1),  y = (u2, v2, 1),   P_i = [ (p_1^i)^T; (p_2^i)^T; (p_3^i)^T ]

where (p_j^i)^T denotes the j-th row of P_i.

Linear triangulation method: eliminate λ1, λ2:

    u1 (p_3^1)^T X = (p_1^1)^T X,    u2 (p_3^2)^T X = (p_1^2)^T X,
    v1 (p_3^1)^T X = (p_2^1)^T X,    v2 (p_3^2)^T X = (p_2^2)^T X.

This gives

    D X = 0,   D = [ u1 (p_3^1)^T - (p_1^1)^T
                     v1 (p_3^1)^T - (p_2^1)^T
                     u2 (p_3^2)^T - (p_1^2)^T
                     v2 (p_3^2)^T - (p_2^2)^T ],   D ∈ R^{4,4},  X ∈ R^4    (14)

- back-projected rays will generally not intersect due to image error → see next
- using the Jack-knife (Slide 67) is not recommended: it is sensitive to small error
- we will use SVD (Slide 89)
- but the result will not be invariant to the projective frame:
  replacing P1 ↦ P1 H, P2 ↦ P2 H does not always result in X ↦ H^{-1} X
- the homogeneous form in (14) can represent points at infinity
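
A minimal Matlab sketch of building D in (14), assuming 3×4 cameras P1, P2 and pixel coordinates (u1,v1), (u2,v2):

    D = [ u1*P1(3,:) - P1(1,:)
          v1*P1(3,:) - P1(2,:)
          u2*P2(3,:) - P2(1,:)
          v2*P2(3,:) - P2(2,:) ];

The least-squares solution of D X = 0 follows on the next slide.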
3D Computer Vision: IV. Computing with a Camera Pair (p. 88/211) R. Sara, CMP; rev. 7Jan2014
IThe Least-Squares Triangulation by SVD

if D is full-rank we may minimize the algebraic least-squares error

    ε²(X) = ‖D X‖²   s.t.  ‖X‖ = 1,  X ∈ R^4

let D_i be the i-th row of D; then

    ‖D X‖² = Σ_{i=1}^4 (D_i X)² = Σ_{i=1}^4 X^T D_i^T D_i X = X^T Q X,   where Q = Σ_{i=1}^4 D_i^T D_i = D^T D ∈ R^{4,4}

we write the SVD of Q as   [Golub & van Loan 1996, Sec. 2.5]

    Q = Σ_{j=1}^4 σ_j² u_j u_j^T,   with σ_1² ≥ σ_2² ≥ σ_3² ≥ σ_4² ≥ 0  and  u_l^T u_m = 0 if l ≠ m, 1 otherwise

then

    X = arg min_{q, ‖q‖=1} q^T Q q = u_4

Proof.

    q^T Q q = Σ_{j=1}^4 σ_j² q^T u_j u_j^T q = Σ_{j=1}^4 σ_j² (u_j^T q)²

we have a sum of non-negative elements, 0 ≤ (u_j^T q)² ≤ 1; let q = u_4 + δq s.t. δq ⊥ u_4, then

    q^T Q q = σ_4² + Σ_{j=1}^3 σ_j² (u_j^T δq)² ≥ σ_4²

so the minimum σ_4² is attained at q = u_4.  □
3D Computer Vision: IV. Computing with a Camera Pair (p. 89/211) u
t R. Sara, CMP; rev. 7Jan2014
Icontd

- if σ_4 ≪ σ_3, there is a unique solution X = u_4 with residual error ε²(D X) = σ_4²
- the quality (conditioning) of the solution may be expressed as q = σ_3/σ_4 (greater is better)

Matlab code for the least-squares solver:

    [U,O,V] = svd(D);
    X = V(:,end);
    q = O(3,3)/O(4,4);

~ P1; 2pt: Why did we decompose D and not Q = D^T D? Could we use QR decomposition instead of SVD?

3D Computer Vision: IV. Computing with a Camera Pair (p. 90/211) R. Sara, CMP; rev. 7Jan2014
INumerical Conditioning
The equation D X = 0 in (14) may be ill-conditioned for numerical computation, which results in a poor estimate for X.

Why: a row of D mixes big entries with small ones, e.g. projection centers in mm and image points in px give entries of orders

    D ~ [ 10^3   0      10^3   10^6
          0      10^3   10^3   10^6
          10^3   0      10^3   10^6
          0      10^3   10^3   10^6 ]

Quick fix:
1. re-scale the problem by a regular diagonal conditioning matrix S ∈ R^{4,4}

    0 = D q = (D S)(S^{-1} q) = D' q'

   choose S to make the entries of D' = D S all smaller than unity in absolute value:

    S = diag(10^-3, 10^-3, 10^-3, 10^-6)        S = diag(1./max(max(abs(D)),1))

2. solve for q' as before
3. get the final solution as q = S q'
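
A minimal Matlab sketch of the quick fix, assuming D from (14):

    S = diag(1./max(max(abs(D)),1));   % per-column scaling, as suggested above
    [~,~,V] = svd(D*S);
    X = S*V(:,end);                    % final solution q = S*q'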
when SVD is used in camera resection, conditioning is essential for success Slide 66
3D Computer Vision: IV. Computing with a Camera Pair (p. 91/211) R. Sara, CMP; rev. 7Jan2014
Algebraic Error vs Reprojection Error
algebraic error: from SVD → Slide 90

    ε² = σ_4² = Σ_{c=1}^2 [ (u^c (p_3^c)^T X - (p_1^c)^T X)² + (v^c (p_3^c)^T X - (p_2^c)^T X)² ]

reprojection error:

    e² = Σ_{c=1}^2 [ (u^c - (p_1^c)^T X / (p_3^c)^T X)² + (v^c - (p_2^c)^T X / (p_3^c)^T X)² ]

- algebraic error zero ⇔ reprojection error zero: σ_4 = 0 gives a non-trivial null space, the epipolar constraint is satisfied, and the results are equivalent
- in general: minimizing the algebraic error is cheap but gives inferior results
- minimizing the reprojection error is expensive but gives good results
- the midpoint of the common perpendicular to both optical rays gives about 50% greater error in 3D
- the gold standard method is deferred to Slide 103

Ex: forward camera motion, error f/50 in image 2, orthogonal to the epipolar plane (figure legend):
- X_T  noiseless ground-truth position
- X_r  reprojection error minimizer
- X_a  algebraic error minimizer
- m    measurement (m_T with noise in v^2)

3D Computer Vision: IV. Computing with a Camera Pair (p. 92/211) R. Sara, CMP; rev. 7Jan2014
Optimal Triangulation for the Geeks
- detected image points x, y do not satisfy the epipolar geometry exactly
- as a result, optical rays do not intersect in space; we must first correct the image points to x̂, ŷ

(figure: point X, epipolar lines l1, l2 with measured x, y and corrected x̂, ŷ, epipole e1)

1. given corresponding epipolar lines l1 and l2, l2 ≃ F [e1]× l1, let x̂, ŷ be the points on l1, l2 closest to x, y
2. parameterize all possible l1 by a scalar θ: after translating x, y to (0, 0, 1) and rotating the epipoles to (1, 0, f1), (1, 0, f2), take l1(θ) = (0, θ, 1) × (1, 0, f1)
3. minimize the error

    θ* = arg min_θ  d²(x, l1(θ)) + d²(y, l2(θ))

   the problem reduces to 6th-degree polynomial root finding, see [H&Z, Sec. 12.5.2]
4. compute x̂, ŷ and triangulate using the linear method on Slide 88

a fully optimal procedure requires a re-definition of the error in order to get the most probable x̂, ŷ
3D Computer Vision: IV. Computing with a Camera Pair (p. 93/211) R. Sara, CMP; rev. 7Jan2014
IWe Have Added to The ZOO

Continuation from Slide 72

    problem                given                                                unknown   slide
    camera resection       6 world-img correspondences {(X_i, m_i)}, i = 1…6    P         66
    exterior orientation   K, 3 world-img correspondences {(X_i, m_i)}, i=1…3   R, C      70
    fundamental matrix     7 img-img correspondences {(m_i, m'_i)}, i = 1…7     F         84
    relative orientation   K, 5 img-img correspondences {(m_i, m'_i)}, i=1…5    R, t      87
    triangulation          P1, P2, 1 img-img correspondence (m_i, m'_i)         X         88

A bigger ZOO at http://cmp.felk.cvut.cz/minimal/
- calibrated problems
  - have fewer degenerate configurations
  - can do with fewer points (good for geometry proposal generators → Slide 116)
- algebraic error optimization (with SVD) makes sense in camera resection and triangulation only
- but it is not the best method; we will now focus on optimizing optimally
3D Computer Vision: IV. Computing with a Camera Pair (p. 94/211) R. Sara, CMP; rev. 7Jan2014
Part V

Optimization for 3D Vision

16 The Concept of Error for Epipolar Geometry


17 Levenberg-Marquardt's Iterative Optimization
18 The Correspondence Problem
19 Optimization by Random Sampling

covered by
[1] [H&Z] Secs: 11.4, 11.6, 4.7
[2] Fischler, M.A. and Bolles, R.C. Random Sample Consensus: A Paradigm for Model
Fitting with Applications to Image Analysis and Automated Cartography.
Communications of the ACM 24(6):381–395, 1981
additional references
P. D. Sampson. Fitting conic sections to very scattered data: An iterative refinement of the Bookstein
algorithm. Computer Vision, Graphics, and Image Processing, 18:97–108, 1982.

O. Chum, J. Matas, and J. Kittler. Locally optimized RANSAC. In Proc DAGM, LNCS 2781:236–243.
Springer-Verlag, 2003.

O. Chum, T. Werner, and J. Matas. Epipolar geometry estimation via RANSAC benefits from the oriented
epipolar constraint. In Proc ICPR, vol 1:112–115, 2004.

3D Computer Vision: V. Optimization for 3D Vision (p. 95/211) R. Sara, CMP; rev. 7Jan2014
IThe Concept of Error for Epipolar Geometry

Problem: Given at least 8 matched points x_i ↔ y_i in a general position, estimate the most likely (or most probable) fundamental matrix F.

    x_i = (u1_i, v1_i),  y_i = (u2_i, v2_i),  i = 1, 2, …, k,  k ≥ 8

(figure: detected points x_i, y_i and corrected points x̂_i, ŷ_i in images 1 and 2, related by F)

- detected points x_i, y_i; the set of matches is S = {(x_i, y_i)}, i = 1…k
- corrected points x̂_i, ŷ_i; the set is Ŝ = {(x̂_i, ŷ_i)}, i = 1…k
- corrected points satisfy the epipolar geometry exactly: ŷ_i^T F x̂_i = 0, i = 1, …, k
- a small correction is more probable
- let e(·) be an error function (vector), e.g., per match i,

    e_i(x̂_i, ŷ_i | x_i, y_i, F) = [ x_i - x̂_i ; y_i - ŷ_i ],    ‖e_i(·)‖² = e_i²(·) = ‖x_i - x̂_i‖² + ‖y_i - ŷ_i‖²    (15)

- for calibrated cameras (unknown E) the solution is essentially the same
3D Computer Vision: V. Optimization for 3D Vision (p. 96/211) R. Sara, CMP; rev. 7Jan2014
Icontd

the total error (over all data) then is

    L(Ŝ | S, F) = Σ_{i=1}^k e_i²(x̂_i, ŷ_i | x_i, y_i, F)

and the optimization problem is

    (Ŝ*, F*) = arg min_{F, rank F = 2}  min_{Ŝ, ŷ_i^T F x̂_i = 0}  Σ_{i=1}^k e_i²(x̂_i, ŷ_i | x_i, y_i, F)    (16)

Three approaches
they differ in how the corrected points x̂_i, ŷ_i are obtained:
1. direct optimization of the reprojection error over all variables Ŝ, F → Slide 98
2. Sampson optimal correction = approximate minimization of L(Ŝ | S, F) over Ŝ, followed by minimization over F → Slide 99
3. removing x̂_i, ŷ_i altogether = marginalization of L(Ŝ, S | F) over Ŝ, followed by minimization over F

3D Computer Vision: V. Optimization for 3D Vision (p. 97/211) R. Sara, CMP; rev. 7Jan2014
Method 1: Geometric Error Optimization
- we need to encode the constraints ŷ_i^T F x̂_i = 0, rank F = 2
- idea: reconstruct the 3D point via equivalent projection matrices and use the reprojection error
- equivalent projection matrices are   (see [H&Z, Sec. 9.5] for a complete characterization)

    P1 = [ I | 0 ],    P2 = [ [e2]× F + e2 e1^T | e2 ]

  ~ H3; 2pt: Verify that F is a f.m. of P1, P2, for instance that F ≃ Q2^{-T} Q1^T [e1]×

1. compute F^(0) by the 7-point algorithm (→ Slide 84); construct camera P2^(0) from F^(0)
2. triangulate 3D points X_i^(0) from the matches (x_i, y_i) for all i = 1, …, k (→ Slide 88)
3. express the error function as the reprojection error

    r_i²(x_i, y_i | X_i, P2) = ‖x_i - x̂_i‖² + ‖y_i - ŷ_i‖²,   where x̂_i ≃ P1 X_i, ŷ_i ≃ P2 X_i

4. starting from P2^(0), X^(0), minimize

    (X*, P2*) = arg min_{P2, X}  Σ_{i=1}^k r_i²(x_i, y_i | X_i, P2)

5. compute F from P1, P2*

- 3k + 12 parameters to be found; latent: X_i for all i (correspondences!), non-latent: P2
- minimal representation: 3k + 7 parameters, P2 = P2(F) → Slide 142
- there are pitfalls; this is essentially bundle adjustment; we will return to this later → Slide 134
3D Computer Vision: V. Optimization for 3D Vision (p. 98/211) R. Sara, CMP; rev. 7Jan2014
IMethod 2: First-Order Error Approximation

An elegant method for solving problems like (16):
- we will get rid of the latent parameters Ŝ needed for obtaining the correction   [H&Z, p. 287], [Sampson 1982]
- we will recycle the algebraic error ε = y^T F x from Slide 84

Observations:
- correspondences x̂_i ↔ ŷ_i satisfy ŷ_i^T F x̂_i = 0, x̂_i = (û1, v̂1, 1), ŷ_i = (û2, v̂2, 1)
- this is a manifold V_F ⊂ R^4: the set of points Ẑ = (û1, v̂1, û2, v̂2) consistent with F
- let Ẑ_i be the closest point on V_F to the measurement Z_i = (u1, v1, u2, v2); then (see (15))

    e_i(x̂_i, ŷ_i | x_i, y_i; F) = Ẑ_i - Z_i = e_i(Ẑ_i, Z_i)    (F dropped to simplify notation)

  (this is the figure on Slide 96, just redrawn: Z_i is the measurement, Ẑ_i ∈ V_F its correction, e_i(Ẑ_i, Z_i) the correction vector)
- the algebraic error vanishes at Ẑ_i:  0 = ε_i(Ẑ_i) = ŷ_i^T F x̂_i

Sampson's idea: Linearize the algebraic error ε_i(Z) at Z_i (where it is non-zero) and evaluate the resulting linear function at Ẑ_i (where it should be zero). This replaces V_F by a linear manifold L. The point on V_F closest to Z_i is replaced by the closest point on L.
3D Computer Vision: V. Optimization for 3D Vision (p. 99/211) R. Sara, CMP; rev. 7Jan2014
ISampson's Idea
Linearize ε_i(Z) at the measurement Z_i, evaluate it at Ẑ_i, and get a constraint on e_i(Ẑ_i, Z_i):
have: ε_i(Z_i); want: e_i(Ẑ_i, Z_i)

    0 = ε_i(Ẑ_i) ≈ ε_i(Z_i) + (∂ε_i(Z_i)/∂Z_i)(Ẑ_i - Z_i) = ε_i(Z_i) + J_i(Z_i) e_i(Ẑ_i, Z_i)

Illustration on circle fitting
We are estimating the distance from a point x to the circle V_C of radius r in canonical position. The circle is ‖x̂‖² - r² = 0. We convert the non-linear error function ε(x̂) = ‖x̂‖² - r² to the linear function ε_L(x̂), which replaces the circle ε(x̂) = 0 by a line ε_L(x̂) = 0:

    ε(x̂) ≈ ε(x) + (∂ε(x)/∂x)(x̂ - x) = 2 x^T x̂ - (r² + ‖x‖²) = ε_L(x̂)

with J(x) = 2 x^T and e(x̂, x) = x̂ - x. Here ε_L(x̂) = 0 is a line with normal x/‖x‖ and intercept (r² + ‖x‖²)/(2‖x‖); it is not tangent to V_C, it lies outside!

(figure: the circle V_C, measurements x_1, x_2, the corresponding lines ε_L1(x̂) = 0, ε_L2(x̂) = 0, and the corrections e(x̂_i, x_i))
- ε(x̂) is a nonlinear algebraic error; ε(x̂) = 0 defines V_C ⊂ R²
- ε_L(x̂) is a linear function over R²; ε_L(x̂) = 0 is a line in R²

3D Computer Vision: V. Optimization for 3D Vision (p. 100/211) R. Sara, CMP; rev. 7Jan2014
ISampson Error Approximation

Given a single measurement Z_i and its algebraic error ε_i(Z_i), estimate the optimal correction e_i(Ẑ_i, Z_i). In general, the Taylor expansion is

    ε_i(Ẑ_i) ≈ ε_i(Z_i) + J_i(Z_i) e_i(Ẑ_i, Z_i) = 0,    ε_i ∈ R^n,  J_i ∈ R^{n,d},  e_i ∈ R^d

n is the algebraic error dimension, d the dimension of the latent parameter space (n < d; n = 1, d = 4 for y^T F x = 0)

to find the Ẑ_i closest to Z_i, we estimate e_i from ε_i by minimizing

    e_i* = arg min_{e_i} ‖e_i‖²   subject to   ε_i + J_i e_i = 0

which gives a closed-form solution   ~ P1; 1pt: derive e_i*

    e_i* = -J_i^T (J_i J_i^T)^{-1} ε_i,     ‖e_i*‖² = ε_i^T (J_i J_i^T)^{-1} ε_i

- note that J_i itself is not invertible!
- we often do not need Ẑ_i, just the squared distance ‖e_i*‖²   (exception: triangulation → Slide 103)
- the unknown parameters F are inside: e_i = e_i(F), ε_i = ε_i(F), J_i = J_i(F)

3D Computer Vision: V. Optimization for 3D Vision (p. 101/211) R. Sara, CMP; rev. 7Jan2014
ISampson Error: Result for Fundamental Matrix Estimation
The fundamental matrix estimation problem becomes   (e_i is scalar, hence e_i*)

    F* = arg min_{F, rank F = 2}  Σ_{i=1}^k e_i²(F)

Let F = [F^1 F^2 F^3] (per columns) = [(F_1)^T; (F_2)^T; (F_3)^T] (per rows), and S = [1 0 0; 0 1 0; 0 0 0]. Then:

    ε_i(F) = y_i^T F x_i ∈ R    (scalar algebraic error from Slide 84)

    J_i(F) = [ ∂ε_i/∂u1_i, ∂ε_i/∂v1_i, ∂ε_i/∂u2_i, ∂ε_i/∂v2_i ] ∈ R^{1,4}    (derivatives over point coordinates)
           = [ (F^1)^T y_i,  (F^2)^T y_i,  (F_1)^T x_i,  (F_2)^T x_i ]

    e_i(F) = ε_i(F)/‖J_i(F)‖ = y_i^T F x_i / sqrt( ‖S F x_i‖² + ‖S F^T y_i‖² ) ∈ R    (Sampson error)

- the Sampson correction normalizes the algebraic error
- it automatically copes with multiplicative factors F ↦ λF
- actual optimization not yet covered → Slide 106
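
A minimal Matlab sketch of the Sampson error for one correspondence, assuming homogeneous 3-vectors x, y with unit last component:

    S = [1 0 0; 0 1 0; 0 0 0];
    e = (y'*F*x) / sqrt(norm(S*F*x)^2 + norm(S*F'*y)^2);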
3D Computer Vision: V. Optimization for 3D Vision (p. 102/211) R. Sara, CMP; rev. 7Jan2014
IBack to Triangulation: The Gold Standard Method

We are given P1, P2 and a single correspondence x ↔ y, and we look for the 3D point X projecting to x and y. → Slide 88
Idea:
1. compute F from P1, P2, e.g. F = (Q1 Q2^{-1})^T [q1 - (Q1 Q2^{-1}) q2]×
2. correct the measurement by the linear estimate of the correction vector → Slide 101

    (û1, v̂1, û2, v̂2)^T ≈ (u1, v1, u2, v2)^T - (y^T F x)/(‖S F x‖² + ‖S F^T y‖²) · J^T,
    J^T = [ (F^1)^T y;  (F^2)^T y;  (F_1)^T x;  (F_2)^T x ]

3. use the SVD algorithm with numerical conditioning → Slide 89

Ex (cont'd from Slide 92), figure legend:
- X_T  noiseless ground-truth position
- X_r  reprojection error minimizer
- X_s  Sampson-corrected algebraic error minimizer
- X_a  algebraic error minimizer
- m    measurement (m_T with noise in v^2)
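
A hedged Matlab sketch of steps 2–3, assuming F and one correspondence (u1,v1) ↔ (u2,v2):

    x = [u1; v1; 1];  y = [u2; v2; 1];
    S = [1 0 0; 0 1 0; 0 0 0];
    J = [F(:,1)'*y; F(:,2)'*y; F(1,:)*x; F(2,:)*x];     % J' as a 4-vector
    Z = [u1; v1; u2; v2] ...
        - (y'*F*x)/(norm(S*F*x)^2 + norm(S*F'*y)^2) * J;
    % then triangulate the corrected pair Z(1:2) <-> Z(3:4) by the
    % conditioned SVD method of Slides 88-91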

3D Computer Vision: V. Optimization for 3D Vision (p. 103/211) R. Sara, CMP; rev. 7Jan2014
Levenberg-Marquardt (LM) Iterative Estimation

Consider an error function e_i(θ) = f(x_i, y_i, θ) ∈ R^m, with x_i, y_i given and θ ∈ R^q unknown.

Our goal:   θ* = arg min_θ Σ_{i=1}^k ‖e_i(θ)‖²       (e.g. θ = F, q = 9, m = 1 for f.m. estimation)

Idea 1 (Gauss-Newton approximation): proceed iteratively for s = 0, 1, 2, …

    θ^{s+1} := θ^s + d_s,   where  d_s = arg min_d Σ_{i=1}^k ‖e_i(θ^s + d)‖²    (17)

using the linearization

    e_i(θ^s + d) ≈ e_i(θ^s) + L_i d,    (L_i)_{jl} = ∂(e_i(θ))_j/∂θ_l,   L_i ∈ R^{m,q}   (typically a long matrix)

Then the solution to Problem (17) is a set of normal equations

    -Σ_{i=1}^k L_i^T e_i(θ^s) = ( Σ_{i=1}^k L_i^T L_i ) d_s    (18)

with the left side in R^q and the system matrix L = Σ_i L_i^T L_i in R^{q,q}.
- d_s can be solved for by Gaussian elimination using Choleski decomposition of L
- L is symmetric → use Choleski, almost 2× faster than Gauss-Seidel; see bundle adjustment, Slide 137
- such plain updates do not lead to stable convergence → the ideas of Levenberg and Marquardt
3D Computer Vision: V. Optimization for 3D Vision (p. 104/211) R. Sara, CMP; rev. 7Jan2014
LM (cont'd)

Idea 2 (Levenberg): replace Σ_i L_i^T L_i with Σ_i L_i^T L_i + λ I for some damping factor λ ≥ 0.

Idea 3 (Marquardt): replace λI with λ Σ_i diag(L_i^T L_i) to adapt to the local curvature:

    -Σ_{i=1}^k L_i^T e_i(θ^s) = ( Σ_{i=1}^k L_i^T L_i + λ diag( Σ_{i=1}^k L_i^T L_i ) ) d_s

Idea 4 (Marquardt): adapt λ: small λ → Gauss-Newton, large λ → gradient descent
1. choose λ ≈ 10^-3 and compute d_s
2. if Σ_i ‖e_i(θ^s + d_s)‖² < Σ_i ‖e_i(θ^s)‖², then accept d_s and set λ := λ/10, s := s + 1
3. otherwise set λ := 10λ and recompute d_s

- sometimes different constants are needed in place of the 10 and 10^-3
- note that L_i ∈ R^{m,q} is a long matrix, but each contribution L_i^T L_i is a square singular q×q matrix (the sum is always singular for k < q)
- the error can be made robust to outliers, see the trick on Slide 109
- we have approximated the least-squares Hessian by ignoring the second derivatives of the error function (the Gauss-Newton approximation); see [Triggs et al. 1999, Sec. 4.3]
- this helps avoid the consequences of gauge freedom → Slide 139
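
A hedged Matlab sketch of the adaptive loop (Ideas 2–4), assuming user-supplied functions res(theta), returning the stacked error vector, and jac(theta), returning its Jacobian, plus an iteration cap maxit:

    lambda = 1e-3;
    for s = 1:maxit
      e = res(theta);  L = jac(theta);
      A = L'*L;
      d = -(A + lambda*diag(diag(A))) \ (L'*e);   % damped normal equations (18)
      if norm(res(theta + d))^2 < norm(e)^2
        theta = theta + d;  lambda = lambda/10;   % accept: toward Gauss-Newton
      else
        lambda = 10*lambda;                       % reject: toward gradient descent
      end
    end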

3D Computer Vision: V. Optimization for 3D Vision (p. 105/211) R. Sara, CMP; rev. 7Jan2014
LM with Sampson Error for Fundamental Matrix Estimation

Sampson error (derived by linearization over the point coordinates u1, v1, u2, v2):

    e_i(F) = ε_i/‖J_i‖ = y_i^T F x_i / sqrt( ‖S F x_i‖² + ‖S F^T y_i‖² ),    S = [1 0 0; 0 1 0; 0 0 0]

LM (by linearization over the parameters F):

    L_i = ∂e_i(F)/∂F = 1/(2‖J_i‖) · [ 2 y_i x_i^T - (2 e_i/‖J_i‖) ( (S F x_i) x_i^T + y_i (S F^T y_i)^T ) ]

- L_i is a 3×3 matrix here; it must be reshaped to a dimension-9 vector
- x_i and y_i in the Sampson error are normalized to unit homogeneous coordinate
- reinforce rank F = 2 after each LM update to stay in the fundamental matrix manifold, and ‖F‖ = 1 to avoid gauge freedom (by SVD, see Slide 107)
- the LM linearization could also be done by numerical differentiation (the dimension is small)
3D Computer Vision: V. Optimization for 3D Vision (p. 106/211) R. Sara, CMP; rev. 7Jan2014
ILocal Optimization for Fundamental Matrix Estimation

Given a set {(x_i, y_i)}, i = 1…k, of k > 7 inlier correspondences, compute an efficient estimate for the fundamental matrix F.

1. Find the conditioned (→ Slide 91) 7-point F^0 (→ Slide 84) from a suitable 7-tuple
2. Improve F^0 using the LM optimization (→ Slides 104–105) and the Sampson error (→ Slide 106) on all inliers, reinforcing a rank-2, unit-norm F^k after each LM iteration using SVD

if there are no wrong matches (outliers), this gives a local optimum


contamination of (inlier) correspondences by outliers may wreak havoc with this algorithm
the full problem involves finding the inliers!
in addition, we need a mechanism for jumping out of local minima (and exploring the space of
all fundamental matrices)

3D Computer Vision: V. Optimization for 3D Vision (p. 107/211) R. Sara, CMP; rev. 7Jan2014
IThe Full Problem of Matching and Fundamental Matrix Estimation
Problem: Given two sets of image points X = {x_i}, i = 1…m, and Y = {y_j}, j = 1…n, and their descriptors D, find the most probable
1. inliers S_X ⊆ X, S_Y ⊆ Y
2. one-to-one perfect matching M: S_X → S_Y    (perfect matching: 1-factor of the bipartite graph)
3. fundamental matrix F such that rank F = 2
4. such that for each x_i ∈ S_X and y_j = M(x_i) it is probable that
   a. the image descriptor D(x_i) is similar to D(y_j), and
   b. the total geometric error Σ_ij e_ij²(F) is small    (note a slight change in notation: e_ij)
5. inlier-outlier and outlier-outlier matches are improbable

(figure: bipartite matching between X and Y, equivalently a binary matching table M: X × Y → {0, 1} with m_ij = 1 on matched pairs and 0 elsewhere)

    (M*, F*) = arg max_{M,F} p(M, F | X, Y, D)    (19)

- a probabilistic model: an efficient language for task formulation
- the (19) is a p.d.f. over all the involved variables (there is a constant number of variables!)
- binary matching table M_ij ∈ {0, 1} of fixed size m × n
  - each row/column contains at most one unity
  - zero rows/columns correspond to unmatched points x_i / y_j
3D Computer Vision: V. Optimization for 3D Vision (p. 108/211) R. Sara, CMP; rev. 7Jan2014
Deriving A Robust Matching Model by Marginalization

For algorithmic efficiency, instead of (M*, F*) = arg max_{M,F} p(M, F | X, Y, D) we will solve

    F* = arg max_F p(F | X, Y, D)    (20)

by marginalization of p(M, F | X, Y, D) over M    (this simplification changes the problem!)

    p(M, F | X, Y, D) ∝ p(M, F, X, Y, D) = p(X, Y, D, M | F) p(F)

assuming correspondence-wise independence:

    p(X, Y, D, M | F) = ∏_{i=1}^m ∏_{j=1}^n p(x_i, y_j, D, m_ij | F) = ∏_{i=1}^m ∏_{j=1}^n p_e(e_ij, d_ij, m_ij | F)

- e_ij represents the geometric error for the match x_i ↔ y_j:  e_ij(x_i, y_j | F)
- d_ij represents the descriptor similarity for the match x_i ↔ y_j:  d_ij = ‖d(x_i) - d(y_j)‖

Marginalization:

    p(X, Y, D | F) = Σ_{m_11 ∈ {0,1}} Σ_{m_12} … Σ_{m_mn} ∏_{i=1}^m ∏_{j=1}^n p_e(e_ij, d_ij, m_ij | F)
                   = ∏_{i=1}^m ∏_{j=1}^n Σ_{m_ij ∈ {0,1}} p_e(e_ij, d_ij, m_ij | F)

we will continue with the inner term
3D Computer Vision: V. Optimization for 3D Vision (p. 109/211) R. Sara, CMP; rev. 7Jan2014
Robust Matching Model (cont'd)

    Σ_{m_ij ∈ {0,1}} p_e(e_ij, d_ij, m_ij | F) = Σ_{m_ij ∈ {0,1}} p_e(e_ij, d_ij | m_ij, F) p(m_ij | F)
      = p_e(e_ij, d_ij | m_ij = 1, F) p(m_ij = 1 | F) + p_e(e_ij, d_ij | m_ij = 0, F) p(m_ij = 0 | F)
      = (1 - P_0) p_1(e_ij, d_ij | F) + P_0 p_0(e_ij, d_ij | F)    (21)

where p(m_ij = 1 | F) = 1 - P_0 and p(m_ij = 0 | F) = P_0.

- the p_0(e_ij, d_ij | F) ≈ const is a penalty for missing a correspondence, but it should be a p.d.f. (it cannot be a constant); see Slide 111 for a simplification
- choose P_0 → 1, p_0 → 0 so that (P_0/(1 - P_0)) p_0 ≈ const
- the p_1(e_ij, d_ij | F) is typically an easy-to-design component: assuming independence of geometric error and descriptor similarity,

    p_1(e_ij, d_ij | F) = p_1(e_ij | F) p_1(d_ij)

  we choose, e.g.

    p_1(e_ij | F) = (1/T_e(σ_1, F)) exp( -e_ij²(F)/(2σ_1²) ),
    p_1(d_ij)    = (1/T_d(σ_d, dim d)) exp( -‖d(x_i) - d(y_j)‖²/(2σ_d²) )    (22)

- σ_1, σ_d, P_0 are hyper-parameters
- the form of T_e(σ_1, F) depends on the error definition
- we will continue with the result from (21)
3D Computer Vision: V. Optimization for 3D Vision (p. 110/211) R. Sara, CMP; rev. 7Jan2014
ISimplified Robust Energy (Error) Function

assuming the choice of p_1 as in (22), we are simplifying

    p(X, Y, D | F) = ∏_{i=1}^m ∏_{j=1}^n [ (1 - P_0) p_1(e_ij, d_ij | F) + P_0 p_0(e_ij, d_ij | F) ]    (23)

- we define energy as V(x) = -log p(x); this helps simplify the formulas
- for simplicity, we omit d_ij
- we choose P_0 → 1 and the missed-correspondence penalty function as

    p_0(e_ij | F) = (1/T_e(σ_0, F)) exp( -e_ij²(F)/(2σ_0²) )

then

    V(X, Y, D | F) = -Σ_{i=1}^m Σ_{j=1}^n log[ ((1 - P_0)/T_e(σ_1, F)) ( exp(-e_ij²(F)/(2σ_1²)) + (P_0 T_e(σ_1, F))/((1 - P_0) T_e(σ_0, F)) exp(-e_ij²(F)/(2σ_0²)) ) ]

where the leading factor defines λ(F) = T_e(σ_1, F)/(1 - P_0) and the second term in the parentheses is ≈ t, a constant. By choosing a representative of F such that λ(F) = const, we get

    V(X, Y, D | F) = m n log λ + Σ_{i=1}^m Σ_{j=1}^n [ -log( exp(-e_ij²(F)/(2σ_1²)) + t ) ]    (24)

with the per-pair robust energy V(e_ij) = -log( exp(-e_ij²(F)/(2σ_1²)) + t ).

note we are summing over all possible matches (m, n are fixed)
3D Computer Vision: V. Optimization for 3D Vision (p. 111/211) R. Sara, CMP; rev. 7Jan2014
IThe Action of the Robust Matching Model on Data
Example for V(e) from (24), with σ_1 = 1 and t = 0.25 (plot legend):
- red: the usual (non-robust) error, i.e. V when t = 0
- blue: the rejected-correspondence penalty -log t
- green: the robust energy (24), V when t = 0.25

- if the error of a correspondence exceeds a limit, it is ignored
- then V(e) ≈ const and we essentially count outliers in (24)
- t controls the turn-off point
- the inlier/outlier threshold e_T is the error for which (1 - P_0) p_1(e_T) = P_0 p_0(e_T); note that t ≥ 0 and

    e_T = σ_1 sqrt( -log t² )    (25)

The full optimization problem is (20):

    F* = arg max_F p(F | X, Y, D) = arg max_F [ p(X, Y, D | F) p(F) ] / p(X, Y, D)
       = arg min_F [ V(X, Y, D | F) + V(F) ]

where p(X, Y, D | F) is the likelihood, p(F) the prior, and p(X, Y, D) the evidence.
- typically we take V(F) = 0, unless we need to stabilize the computation, e.g. when a video camera moves smoothly (on a high-mass vehicle) and we have a prediction for F
- the evidence is not needed unless we want to compare different models
3D Computer Vision: V. Optimization for 3D Vision (p. 112/211) R. Sara, CMP; rev. 7Jan2014
Discussion: On The Art of Probabilistic Model Design…
a few models for fitting a zero-centered circle C of radius r to points in R²; three error models are compared:
1. error marginalized over C, with perturbation x ~ N(0, σ²I)
2. orthogonal deviation from C (radial perturbation in a polar parametrization)
3. Sampson approximation of the orthogonal deviation, x ~ N(0, σ²I)

(figure: for each model, the radial p.d.f. p(‖x‖ | r = 1) for σ = 0.25, 0.5, 1, 2, and a random sample drawn from p(x | r))

qualitative behavior of p(x | r):

    marginalized over C          orthogonal deviation         Sampson approximation
    mode inside the circle       peak at the center           mode at the circle
    models the inside well       unusable for small radii     hole at the center
    tends to normal distrib.     tends to Dirac distrib.      tends to normal distrib.
3D Computer Vision: V. Optimization for 3D Vision (p. 113/211) R. Sara, CMP; rev. 7Jan2014
How To Find the Global Maxima (Modes) of a PDF?
given a function p(x) (in the plotted example a p.d.f. on [0, 1] with the global mode near x = 0.1), consider several methods:

1. exhaustive search

       bestp = -inf;
       step = 1/(iterations-1);
       for x = 0:step:1
         if p(x) > bestp
           bestx = x; bestp = p(x);
         end
       end

   a slow algorithm (definite quantization); faster variants exist; fast to implement

2. randomized search with uniform sampling (inside a sampling loop):

       x = rand(1);
       if p(x) > bestp
         bestx = x; bestp = p(x);
       end

   a slow algorithm but with better convergence; fast to implement; how to stop it?

3. random sampling from p(x) (Gibbs sampler)
   a faster algorithm; fast to implement but often infeasible (e.g. when p(x) is data-dependent, which is our case)

4. Metropolis-Hastings sampling
   almost as fast (with care); not so fast to implement; rarely infeasible; RANSAC belongs here

(figure: number of proposals before |x - x_true| ≤ step, averaged over 10^4 trials, for the exhaustive, randomized, MH-crawl, and Gibbs methods over 5000 iterations)
3D Computer Vision: V. Optimization for 3D Vision (p. 114/211) R. Sara, CMP; rev. 7Jan2014
How To Generate Random Samples from a Complex Distribution?

(figure: target distribution p(x) in red and a scaled proposal q(x | x0)/10 in blue)

red: the probability density function p(x) of a toy target distribution on the unit interval,

    p(x) = Σ_{i=1}^4 γ_i Be(x; α_i, β_i),   Σ_{i=1}^4 γ_i = 1,  γ_i ≥ 0,
    Be(x; α, β) = x^{α-1} (1 - x)^{β-1} / B(α, β)

note we can generate samples from this p(x) - how?

suppose we cannot sample from p(x) but we can sample from some simple distribution q(x | x0), given the last sample x0 (blue): the proposal distribution. For example,

    q(x | x0) = U_{0,1}(x)                               (independent) uniform sampling
    q(x | x0) = Be(x; x0/T + 1, (1 - x0)/T + 1)          beta diffusion (crawler), T = temperature
    q(x | x0) = p(x)                                     (independent) Gibbs sampler

- note we have unified all the random sampling methods from the previous slide
- how to transform proposal samples from q(x | x0) into samples from the target distribution p(x)?
3D Computer Vision: V. Optimization for 3D Vision (p. 115/211) R. Sara, CMP; rev. 7Jan2014
IMetropolis-Hastings (MH) Sampling

C … configuration (of all variable values); here C ≅ F and p(C) ≅ p(F | X, Y, D)

Goal: Generate a sequence of random samples {C_i} from p(C)
- set up a Markov chain with a suitable transition probability function so that it generates the sequence

Sampling procedure
1. given C_i, generate a random sample S from q(S | C_i)
   (q may use some information from C_i - Hastings)
2. compute the acceptance probability    (the evidence term drops out)

    a = min{ 1,  p(S) q(C_i | S) / ( p(C_i) q(S | C_i) ) }

3. generate a random number u from the unit-interval uniform distribution U_{0,1}
4. if u ≤ a then C_{i+1} := S, else C_{i+1} := C_i

Programming an MH sampler
1. design a proposal distribution (mixture) q and a sampler from q
2. write functions q(C_i | S) and q(S | C_i) that are proper distributions - not always simple

Finding the mode
- remember the best sample: fast implementation, but one must wait long to hit the mode
- use simulated annealing: very slow
- start local optimization from the best sample: a good trade-off between speed and accuracy
3D Computer Vision: V. Optimization for 3D Vision (p. 116/211) R. Sara, CMP; rev. 7Jan2014
MH Sampling Demo

- initial sample
- sampling process (video, 7:33, 100k samples)
- blue point: the current sample
- green circle: the best sample so far (quality = p(x))
- histogram: the current distribution of visited states; the vicinities of the modes are the most often visited states
- final distribution of visited states

3D Computer Vision: V. Optimization for 3D Vision (p. 117/211) R. Sara, CMP; rev. 7Jan2014
Demo Source Code (Matlab)

    function x = proposal_gen(x0)
    % proposal generator q(x | x0)
    T = 0.01;  % temperature
    x = betarnd(x0/T+1, (1-x0)/T+1);
    end

    function p = proposal_q(x, x0)
    % proposal distribution q(x | x0)
    T = 0.01;
    p = betapdf(x, x0/T+1, (1-x0)/T+1);
    end

    function p = target_p(x)
    % target distribution p(x)
    % shape parameters:
    a = [2 40 100 6];
    b = [10 40 20 1];
    % mixing coefficients:
    w = [1 0.4 0.253 0.50];  w = w/sum(w);
    p = 0;
    for i = 1:length(a)
      p = p + w(i)*betapdf(x, a(i), b(i));
    end
    end

    %% DEMO script
    k = 10000;       % number of samples
    X = NaN(1,k);    % list of samples

    x0 = proposal_gen(0.5);
    for i = 1:k
      x1 = proposal_gen(x0);
      a = target_p(x1)/target_p(x0) * ...
          proposal_q(x0,x1)/proposal_q(x1,x0);
      if rand < a
        X(i) = x1;  x0 = x1;
      else
        X(i) = x0;
      end
    end

    figure(1)
    x = 0:0.001:1;
    plot(x, target_p(x), 'r', 'linewidth', 2);
    hold on
    binw = 0.025;    % histogram bin width
    n = histc(X, 0:binw:1);
    h = bar(0:binw:1, n/sum(n)/binw, 'histc');
    set(h, 'facecolor', 'r', 'facealpha', 0.3)
    xlim([0 1]);  ylim([0 2.5])
    xlabel 'x'
    ylabel 'p(x)'
    title 'MH demo'
    hold off

3D Computer Vision: V. Optimization for 3D Vision (p. 118/211) R. Sara, CMP; rev. 7Jan2014
ISimplifying MH Sampler to RANSAC

RANSAC = five ideas:   [Fischler & Bolles 1981]

1. configuration = an s-tuple of inlier correspondences
   the minimization will be over a discrete set of epipolar geometries proposable from 7-tuples
2. the proposal distribution q(·) is given by the empirical distribution of the data sample:
   a. select an s-tuple from the data independently, q(S | C_i) = q(S)
      i. q uniform, q(S) = (mn choose s)^{-1} → MAPSAC (p(S) includes the prior)
      ii. q dependent on descriptor similarity → PROSAC (similar pairs are proposed more often)
   b. solve the minimal geometric problem → a geometry proposal (e.g. F from s = 7)
   (figure: pairs of points define a line distribution from p(n | X); random correspondence tuples drawn uniformly propose samples of F from a data-driven distribution q(F | X, Y))
3. independent sampling & keeping the best sample → no need to filter the proposals by the acceptance probability a
4. standard RANSAC replaces probability maximization with consensus maximization
   (figure: a line with the inlier band of half-width e_T around it; e_T is the inlier/outlier threshold from (25))
5. stopping based on the probability of mode-hitting → Slide 121
3D Computer Vision: V. Optimization for 3D Vision (p. 119/211) R. Sara, CMP; rev. 7Jan2014
IA Variant of RANSAC with Local Optimization (LO-MAPSAC)

1. initialize the best sample as empty, C_b := ∅
2. estimate the number of iterations as N := (mn choose s)
3. repeat N times:
   a. draw a minimal random sample S of size s from q(S)
   b. if p(S) > p(C_b) then
      i. update the best sample, C_b := S    (p(S) marginalized as in (24); p(S) includes a prior → MAP)
      ii. threshold-out the inliers using the e_T band from (25)
      iii. start local optimization from the inliers of C_b:
           LM optimization with the Sampson error, possibly weighted by the posterior p(m_ij)   [Chum et al. 2003]
      iv. re-estimate N from the inlier counts (→ Slide 121 for the derivation):

           N = log(1 - P) / log(1 - ε^s),    ε = |inliers(C_b)| / (mn)

4. output C_b

see MPV course for RANSAC details see also [Fischler & Bolles 1981], [25 years of RANSAC]
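
A hedged Matlab sketch of the skeleton above, without the LO step, assuming hypothetical helpers f7pt (the 7-point algorithm of Slide 84, returning a cell array of up to 3 candidate F) and support (counting correspondences with Sampson error below eT):

    P = 0.999;  s = 7;  N = inf;  i = 0;  best = 0;  k = size(x,2);
    while i < N
      i = i + 1;
      idx = randperm(k, s);                 % draw a minimal sample
      Fs = f7pt(x(:,idx), y(:,idx));
      for j = 1:numel(Fs)
        n = support(Fs{j}, x, y, eT);
        if n > best
          best = n;  Fb = Fs{j};
          ef = n/k;                         % current inlier-fraction estimate
          N = log(1-P)/log(1 - ef^s);       % re-estimate the stopping bound
        end
      end
    end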
3D Computer Vision: V. Optimization for 3D Vision (p. 120/211) R. Sara, CMP; rev. 7Jan2014
IStopping RANSAC

Principle: what is the number of proposals N needed to hit an all-inlier sample?
- P … probability that at least one sample is all-inlier (1 - P … all N samples were bad)
- ε … the fraction of inliers among tentative correspondences, ε ≤ 1
- s … sample size (7 in the 7-point algorithm)

    ε^s            … a proposal contains no outlier
    1 - ε^s        … a proposal contains at least one outlier
    (1 - ε^s)^N    … all N proposals contained an outlier = 1 - P

hence

    N ≥ log(1 - P) / log(1 - ε^s)

N for s = 7:

    ε      P = 0.8      P = 0.99
    0.5    205          590
    0.2    1.3·10^5     3.5·10^5
    0.1    1.6·10^7     4.6·10^7

(figure: N (proposals) vs. ε (inlier fraction), log-log, for P = 0.5, 0.8, 0.99, 0.999)

- N can be re-estimated using the current estimate for ε (if there is LO, then after LO)
- the quasi-posterior estimate for ε is the average over all samples generated so far
- for ε → 0 we gain nothing over the standard MH-sampler stopping criterion
3D Computer Vision: V. Optimization for 3D Vision (p. 121/211) R. Sara, CMP; rev. 7Jan2014
Example Matching Results for the 7-point Algorithm with RANSAC

(figure: input images → interest points (ca. 3600) → tentative correspondences (416) → matching (340))

- notice some wrong matches (they have wrong depth, even negative)
- they cannot be rejected without additional constraints or scene knowledge
- without local optimization, the minimization is over a discrete set of epipolar geometries proposable from 7-tuples

3D Computer Vision: V. Optimization for 3D Vision (p. 122/211) R. Sara, CMP; rev. 7Jan2014
Example: MH Sampling for a More Complex Problem
Task: Find two vanishing points from line segments detected in the input image.

Model
- principal point known, square pixel
- explicit variables
  1. two unknown vanishing points v_1, v_2
- latent variables
  1. each line has a vanishing-point label λ_i ∈ {∅, 1, 2}, where ∅ represents an outlier
  2. "mother lines" through the vanishing points

    arg min_{v_1, v_2, λ, L} V(v_1, v_2, λ, L | S)

(figure + video: detected segments labeled λ = 1, λ = 2, λ = ∅, and the two vanishing points v_1, v_2)

simplifications
- vanishing points restricted to the set of all pairwise segment intersections
- mother lines fixed by the segment centroids
3D Computer Vision: V. Optimization for 3D Vision (p. 123/211) R. Sara, CMP; rev. 7Jan2014
Beyond RANSAC

By the marginalization in (20) we have lost the constraints on M (e.g. uniqueness). One can choose a better model when not marginalizing:

    p(M, F, X, Y, D) = p(X, Y | M, F) · p(D | M) · p(M) · p(F)
                       (geometric error)  (similarity)  (constraints)  (prior)

this is a global model: decisions on the m_ij are no longer independent!

In the MH scheme
- one can work with the full p(M, F | X, Y, D); then S = (M, F)
- explicit labeling m_ij can be done by, e.g., sampling from

    q(m_ij | F) ∝ ( (1 - P_0) p_1(e_ij | F),  P_0 p_0(e_ij | F) )

  when p(M) is uniform the proposal is always accepted, a = 1   ~ derive
- we can compute the posterior probability of each match, p(m_ij), by histogramming m_ij over {S_i}
- local optimization can then use explicit inliers and p(m_ij)
- the error can be estimated for the elements of F from {S_i}    (this does not work in RANSAC!)
- a large error indicates a problem degeneracy; this is not directly available in RANSAC
- good conditioning is not a requirement: we work with the entire distribution p(F)
- one can find the most probable number of epipolar geometries by reversible-jump MCMC (homographies or other models)
- if there are multiple models explaining the data, RANSAC will return one of them randomly
3D Computer Vision: V. Optimization for 3D Vision (p. 124/211) R. Sara, CMP; rev. 7Jan2014
Part VI

3D Structure and Camera Motion

20 Introduction

21 Reconstructing Camera Systems

22 Bundle Adjustment

covered by
[1] [H&Z] Secs: 9.5.3, 10.1, 10.2, 10.3, 12.1, 12.2, 12.4, 12.5, 18.1
[2] Triggs, B. et al. Bundle Adjustment - A Modern Synthesis. In Proc ICCV Workshop
on Vision Algorithms. Springer-Verlag. pp. 298–372, 1999.

additional references
D. Martinec and T. Pajdla. Robust Rotation and Translation Estimation in Multiview Reconstruction. In
Proc CVPR, 2007.

M. I. A. Lourakis and A. A. Argyros. SBA: A Software Package for Generic Sparse Bundle Adjustment.
ACM Trans Math Software 36(1):1–30, 2009.

3D Computer Vision: VI. 3D Structure and Camera Motion (p. 125/211) R. Sara, CMP; rev. 7Jan2014
IConstructing Cameras from the Fundamental Matrix

Given F, construct some cameras P1, P2 such that F is their fundamental matrix.

Solution    (see [H&Z, p. 256])

    P1 = [ I | 0 ],    P2 = [ [e2]× F + e2 v^T | λ e2 ]

where
- v is any 3-vector, e.g. v = e1 to make the camera finite
- λ ≠ 0 is a scalar
- e2 = null(F^T), i.e. e2^T F = 0^T

Proof
1. S is skew-symmetric iff x^T S x = 0 for all x    (look up the proof!)
2. we have x ≃ P X
3. a non-zero F is a f.m. of (P1, P2) iff P2^T F P1 is skew-symmetric
4. if P1 = [ I | 0 ] and P2 = [ S F | e2 ] then F corresponds to (P1, P2) by Step 3
5. we can write S = [s]×
6. a suitable choice is s = e2   [Luong96]
7. for the full class including v, see [H&Z, Sec. 9.5]
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 126/211) R. Sara, CMP; rev. 7Jan2014
IThe Projective Reconstruction Theorem

Observation: Unless the P_i are constrained, then for any number of cameras i = 1, …, k

    m_i ≃ P_i X = (P_i H^{-1})(H X) = P'_i X'

- when the P_i and X are both determined from correspondences (including the calibrations K_i), they are given up to a common 3D homography H
  (translation, rotation, scale, shear, pure perspectivity)

(figure: two cameras with images m_1, m_2 of a point; the reconstructions X and X' are related by H)

- when the cameras are internally calibrated (K_i known) then H is restricted to a similarity, since it must preserve the calibrations K_i   [H&Z, Secs. 10.2, 10.3], [Longuet-Higgins 1981]
  (translation, rotation, scale)
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 127/211) R. Sara, CMP; rev. 7Jan2014
IReconstructing Camera Systems

Problem: Given a set of p decomposed pairwise essential matrices Ê_ij = [t̂_ij]× R̂_ij and calibration matrices K_i, reconstruct the camera system P_i, i = 1, …, k.    (→ 81 and 142 on representing E)

(figure: a camera graph with vertices P1…P8 and edges labeled by pairwise essential matrices Ê_12, Ê_18, Ê_78, Ê_82, …)

We construct calibrated camera pairs P̂_ij ∈ R^{6,4}  (→ 126):

    P̂_ij = [ K_i^{-1} P_i ; K_j^{-1} P_j ] = [ I, 0 ; R̂_ij, t̂_ij ] ∈ R^{6,4}

- singletons i, j correspond to graph vertices V (k vertices)
- pairs ij correspond to graph edges E (p edges)

The P̂_ij are in different coordinate systems, but these are related by similarities, P̂_ij H_ij = P_ij:

    [ I, 0 ; R̂_ij, t̂_ij ] · [ R_i, t_i ; 0^T, s_ij ] = [ R_i, t_i ; R_j, t_j ]    (26)
      (R^{6,4})               (H_ij ∈ R^{4,4})            (R^{6,4})

- (26) is a linear system of 24p equations in 7p + 6k unknowns: 7p for (t̂_ij, R̂_ij, s_ij) and 6k for (R_i, t_i)
- each P_i appears on the right side as many times as the degree of its vertex (e.g. P5 three times)
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 128/211) R. Sara, CMP; rev. 7Jan2014
Icont'd

Eq. (26) implies

    [ R̂_ij R_i,  R̂_ij t_i + s_ij t̂_ij ] = [ R_j,  t_j ]

so R_ij and t_ij can be eliminated:

    R̂_ij R_i = R_j,    R̂_ij t_i + s_ij t̂_ij = t_j,    s_ij > 0    (27)

note the transformations that do not change these equations (assuming no error in R̂_ij):
1. R_i ↦ R_i R    2. t_i ↦ λ t_i and s_ij ↦ λ s_ij    3. t_i ↦ t_i + R_i t

the global frame is fixed by e.g. selecting

    R_1 = I,    Σ_{i=1}^k t_i = 0,    (1/p) Σ_{i,j} s_ij = 1    (28)

- the rotation equations are decoupled from the translation equations
- in principle, s_ij could correct the sign of t̂_ij from the essential matrix decomposition (→ 81), but R_i cannot correct the sign in R̂_ij
- therefore make sure all points are in front of the cameras, and constrain s_ij > 0 (→ 83)

+ pairwise correspondences are sufficient
- suitable for well-located cameras only (dome-like configurations); otherwise intractable or numerically unstable
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 129/211) R. Sara, CMP; rev. 7Jan2014
Finding The Rotation Component in Eq. (27)

Task: Solve R̂_ij R_i = R_j for all i, j ∈ V, (i, j) ∈ E, where each R is a 3×3 rotation matrix.

Per columns c = 1, 2, 3 of R_i, R_j:

    R̂_ij r_i^c - r_j^c = 0,   for all i, j    (29)

fix c and stack the c-th columns of all rotation matrices, r^c = (r_1^c, r_2^c, …, r_k^c)^T ∈ R^{3k}; then (29) becomes D r^c = 0, with D ∈ R^{3p,3k}

3p equations for 3k unknowns, p ≥ k - 1; in a 1-connected graph we have to fix r_1^c (e.g. the c-th column of R_1 = I)

Ex (k = p = 3):

    R̂_12 r_1^c - r_2^c = 0
    R̂_23 r_2^c - r_3^c = 0        D r^c = [ R̂_12  -I    0
    R̂_13 r_1^c - r_3^c = 0                  0     R̂_23  -I
                                             R̂_13  0     -I ] r^c = 0

must hold for any c

Idea:   [Martinec & Pajdla CVPR 2007]
1. find the space of all r^c ∈ R^{3k} that solve (29)    (D is sparse; use [V,E] = eigs(D'*D,3,0); in Matlab)
2. choose 3 unit orthogonal vectors in this space    (the 3 smallest eigenvectors)
3. find the closest rotation matrix per camera using SVD, because ‖r^c‖ = 1 is necessary but insufficient:
   R_i* = U V^T, where R̃_i = U D V^T
   (the global world rotation is arbitrary)
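
A hedged Matlab sketch of the idea, assuming a p×2 edge list E and a cell array Rh of pairwise rotations R̂_ij:

    D = sparse(3*p, 3*k);
    for e = 1:p
      i = E(e,1);  j = E(e,2);
      D(3*e-2:3*e, 3*i-2:3*i) = Rh{e};      % R-hat_ij block
      D(3*e-2:3*e, 3*j-2:3*j) = -eye(3);
    end
    [V,~] = eigs(D'*D, 3, 0);               % eigenpairs closest to 0, as above
    for i = 1:k                             % project each 3x3 block to a rotation
      [U,~,W] = svd(V(3*i-2:3*i, :));
      Ri = U*W';  if det(Ri) < 0, Ri = -Ri; end
      R{i} = Ri;
    end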

3D Computer Vision: VI. 3D Structure and Camera Motion (p. 130/211) R. Sara, CMP; rev. 7Jan2014
Finding The Translation Component in Eq. (27)

From eqs. (27) and (28):    (d … rank of the camera-center set, p … no. of pairs, k … no. of cameras)

    R̂_ij t_i + s_ij t̂_ij - t_j = 0,    Σ_{i=1}^k t_i = 0,    Σ_{i,j} s_ij = p,    s_ij > 0,    t_i ∈ R^d

in rank d: dp + d + 1 equations for dk + p unknowns  →  p ≥ (d(k - 1) - 1)/(d - 1)

Ex: chains and circuits (a construction from sticks of known orientation and unknown length):
- chain, p = k - 1: ok for k = 2, any d
- circuit, k = p = 3: ok for d ≥ 2 (non-collinear cameras)
- circuit, k = p = 4: ok for d ≥ 3 (non-planar cameras)
- circuit, k = p > 4: not possible for d ≤ k - 1

- the rank is not sufficient for chains, trees, or when d = 1 (collinear cameras)
- 3-connectivity gives a sufficient rank for d = 3 (cameras in general position in 3D):
  an s-connected graph has p ≥ ⌈sk/2⌉ edges for s ≥ 2, hence p ≥ ⌈3k/2⌉
- 4-connectivity gives a sufficient rank for any k for d = 2 (coplanar cameras), since p ≥ 2k ≥ 2k - 3
- maximal planar triangulated graphs have p = 3k - 6 and give the rank for k ≥ 3
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 131/211) R. Sara, CMP; rev. 7Jan2014
cont'd

The linear equations in (27) and (28) can be rewritten as

    D t = 0,    t = ( t_1^T, t_2^T, …, t_k^T, s_12, …, s_ij, … )^T

for d = 3: t ∈ R^{3k+p}, and D ∈ R^{3p,3k+p} is sparse

    t* = arg min_{t, s_ij > 0}  t^T D^T D t

this is a quadratic programming problem (because of the constraints!)

    z = zeros(3*k+p,1);
    t = quadprog(D'*D, z, diag([zeros(3*k,1); -ones(p,1)]), z);

but check the rank first!
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 132/211) R. Sara, CMP; rev. 7Jan2014
ISolving Eq. (27) by Stepwise Gluing

Given: calibration matrices K_j and tentative correspondences per camera triples.

Initialization
1. initialize the camera cluster C with P1, P2 and points {X_i}:
2. find the essential matrix E_12 and matches M_12 by the 5-point algorithm (→ 87)
3. construct the camera pair

    P1 = K_1 [ I | 0 ],    P2 = K_2 [ R | t ]

4. compute a 3D reconstruction {X_i} per match from M_12 (→ 93)
5. initialize the point cloud X with the {X_i} satisfying the chirality constraint z_i > 0 and the apical angle constraint |α_i| > α_T

(figure: cameras P1, P2 and an attached P_j, with projection errors e_i1(X_i, P1), e_ij(X_i, P_j) at image points m_i1, m_i2, m_ij)

Attaching camera P_j ∉ C
1. select the points X_j from X that have matches to P_j
2. estimate P_j using X_j: RANSAC with the 3-point algorithm (P3P), projection errors e_ij over X_j (→ 70)
3. reconstruct 3D points from all tentative matches from P_j to all P_l ∈ C, l ≠ j, that are not yet in X
4. filter them by the chirality and apical angle constraints and add them to X
5. add P_j to C
6. perform bundle adjustment on X and C    (coming next)
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 133/211) R. Sara, CMP; rev. 7Jan2014
IBundle Adjustment

Given:
1. a set of 3D points {X_i}, i = 1…p
2. a set of cameras {P_j}, j = 1…c
3. fixed tentative projections m_ij

Required:
1. corrected 3D points {X'_i}, i = 1…p
2. corrected cameras {P'_j}, j = 1…c

Latent:
1. a visibility decision v_ij ∈ {0, 1} per m_ij

(figure: point X_i observed by cameras P1, P2, …, P_j at image points m_i1, m_i2, m_ij, with projection errors e_i1(X_i; P1), e_ij(X_i; P_j))

- for simplicity, X, m are considered direct (not homogeneous)
- we have a projection error e_ij(X_i, P_j) = x̂_i - m_i per image feature, where x̂_i = P_j X_i
- for simplicity, we will work with the scalar error e_ij = ‖e_ij‖
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 134/211) R. Sara, CMP; rev. 7Jan2014
Robust Objective Function for Bundle Adjustment
The likelihood is constructed by marginalization, as in the Robust Matching Model (→ 110):

    p({m} | {P}) = ∏_{pts: i=1}^p ∏_{cams: j=1}^c [ (1 - P_0) p_1(e_ij | X_i, P_j) + P_0 p_0(e_ij | X_i, P_j) ]

the marginalized log-likelihood is (→ 111)

    V({m} | {P}) = -log p({m} | {P}) = Σ_i Σ_j [ -log( exp(-e_ij²(X_i, P_j)/(2σ_1²)) + t ) ] = Σ_i Σ_j ρ_ij²(X_i, P_j)

- e_ij is the projection error (not the Sampson error)
- ρ_ij is a robust error function; it is non-robust (ρ_ij = e_ij) when t = 0
- ρ(·) is a robustification function we often find in M-estimation

(figure: e_ij²(x) = x² vs. the robustified ρ_ij²(x), for σ_1 = 1, t = 0.02)

the L_ij in Levenberg-Marquardt changes to the vector

    (L_ij)_l = ∂ρ_ij/∂θ_l = [ 1/(1 + t exp(e_ij²(θ)/(2σ_1²))) ] · [ 1/(4σ_1² ρ_ij(θ)) ] · ∂e_ij²(θ)/∂θ_l    (30)

where the first factor is small for big e_ij;
- the LM method itself stays the same as on Slides 104–105
- outliers have almost no impact on d_s in the normal equations, because the attenuating factor in (30) scales the contributions to both sums down:

    -Σ_{i,j} L_ij^T ρ_ij(θ^s) = ( Σ_{i,j} L_ij^T L_ij ) d_s
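
A minimal Matlab sketch of the robustified error and the down-weighting factor in (30), assuming parameters sigma1 and t as above:

    rho2 = @(e2) -log(exp(-e2/(2*sigma1^2)) + t);   % robust squared error
    w    = @(e2) 1./(1 + t*exp(e2/(2*sigma1^2)));   % attenuation of outlier terms

For example, w(0) = 1/(1+t) ≈ 1 leaves inlier terms intact, while w(e²) → 0 for large errors.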
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 135/211) R. Sara, CMP; rev. 7Jan2014
ISparsity in Bundle Adjustment

We have q = 3p + 11c parameters: θ = (X_1, X_2, …, X_p; P_1, P_2, …, P_c)    (points, cameras)

We will use a running index r = 1, …, k, k = p·c; each r corresponds to some pair (i, j):

    θ* = arg min_θ Σ_{r=1}^k ρ_r²(θ),    θ^{s+1} := θ^s + d_s,
    -Σ_{r=1}^k L_r^T ρ_r(θ^s) = ( Σ_{r=1}^k L_r^T L_r + λ diag( Σ_{r=1}^k L_r^T L_r ) ) d_s

The block form of L_r in Levenberg-Marquardt (→ 104) is zero except in the columns of X_i and P_j, since the r-th error term is ρ_r² = ρ(e_ij²(X_i, P_j)):

    L_r = [ 0 … 0 | ∂ρ/∂X_i (1×3) | 0 … 0 | ∂ρ/∂P_j (1×11) | 0 … 0 ]    (3p point columns, then 11c camera columns)

Hence L_r^T L_r has non-zero blocks only at (X_i, X_i) of size 3×3, (X_i, P_j) of size 3×11, and (P_j, P_j) of size 11×11, and the sum Σ_{r=1}^k L_r^T L_r is sparse with an arrowhead-like structure in the "points first, then cameras" ordering scheme.

- standard bundle adjustment eliminates the points, solves for the cameras, then back-substitutes
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 136/211) R. Sara, CMP; rev. 7Jan2014
ICholeski Decomposition for B. A.
The most expensive computation in B. A. is solving the normal equations:

    find d_s such that  -Σ_{r=1}^k L_r^T ρ_r(θ^s) = ( Σ_{r=1}^k L_r^T L_r + λ diag(Σ_r L_r^T L_r) ) d_s

This is a linear set of equations A x = b, where
- A is very large: approx. 3·10^4 × 3·10^4 for a small problem of 10000 points and 5 cameras
- A is sparse and symmetric, but A^{-1} is dense → direct matrix inversion is prohibitive

Choleski: Every symmetric positive definite matrix A can be decomposed to A = L L^T, where L is lower triangular. If A is sparse then L is sparse, too.

1. decompose A = L L^T; this transforms the problem to solving L (L^T x) = b
2. solve for x in two passes:

    L c = b:     c_i := L_ii^{-1} ( b_i - Σ_{j<i} L_ij c_j )     forward substitution, i = 1, …, q
    L^T x = c:   x_i := L_ii^{-1} ( c_i - Σ_{j>i} L_ji x_j )     back-substitution

- Choleski decomposition is fast (it does not touch the zero blocks)
- the non-zero elements number 9p + 121c + 66pc ≈ 3.4·10^6, ca. 250× fewer than all elements
- it can be computed on single elements or on entire blocks
- use profile Choleski for sparse A, and diagonal pivoting for semi-definite A   [Triggs et al. 1999]
- λ controls the definiteness
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 137/211) R. Sara, CMP; rev. 7Jan2014
Profile Choleski Decomposition is Simple

    function L = pchol(A)
    %
    % PCHOL  profile Choleski factorization,
    %  L = PCHOL(A) returns a lower-triangular sparse L such that A = L*L'
    %  for a sparse square symmetric positive definite matrix A,
    %  especially useful for arrowhead sparse matrices.

    [p,q] = size(A);
    if p ~= q, error('Matrix must be square'); end

    L = sparse(q,q);
    F = ones(q,1);
    for i = 1:q
      F(i) = find(A(i,:),1);  % first non-zero on row i; we are building F gradually
      for j = F(i):i-1
        k = max(F(i),F(j));
        a = A(i,j) - L(i,k:(j-1))*L(j,k:(j-1))';
        L(i,j) = a/L(j,j);
      end
      a = A(i,i) - sum(full(L(i,F(i):(i-1))).^2);
      if a < 0, error('Matrix must be positive definite'); end
      L(i,i) = sqrt(a);
    end
    end
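
A usage sketch, solving the normal equations A d = b with the two substitution passes of Slide 137 (Matlab's backslash performs the triangular solves):

    L = pchol(A);
    c = L \ b;      % forward substitution
    d = L' \ c;     % back-substitution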

3D Computer Vision: VI. 3D Structure and Camera Motion (p. 138/211) R. Sara, CMP; rev. 7Jan2014
IGauge Freedom

1. The external frame is not fixed:    (see the Projective Reconstruction Theorem, → 127)

    m_i ≃ P_j X_i = (P_j H^{-1})(H X_i) = P'_j X'_i

2. Some representations are not minimal, e.g.
   - P is 12 numbers for 11 parameters
   - we may represent P in the decomposed form K, R, t
   - but R is 9 numbers representing the 3 parameters of a rotation

As a result
- there is no unique solution
- the matrix Σ_r L_r^T L_r is singular

Solutions
- fixing the external frame (e.g. a selected camera frame), explicitly or by constraints
- imposing constraints on projective entities
  - cameras, e.g. P_{3,4} = 1    (this excludes affine cameras)
  - points, e.g. ‖X_i‖² = 1    (this way we can represent points at infinity)
- using minimal representations
  - points in their Euclidean representation X_i, but finite points may be an unrealistic model
  - a rotation matrix can be represented by the Cayley transform → see next
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 139/211) R. Sara, CMP; rev. 7Jan2014
IImplementing Simple Constraints
What for?
1. fixing the external frame, θ_i = θ_i^0    (trivial gauge)
2. representing additional knowledge, θ_i = θ_j    (e.g. cameras share the calibration matrix K)

We introduce reduced parameters θ̂ via an affine reparametrization

    θ = T θ̂ + t,    T ∈ R^{p,p̂},  p̂ ≤ p

For example (with θ ∈ R^5, θ̂ ∈ R^3), T and t can represent

    θ_1 = θ̂_1          no change
    θ_2 = θ̂_2          no change
    θ_3 = t_3          constancy
    θ_4 = θ_5 = θ̂_3    equality

Then L_r in LM changes to L_r T, and everything else stays the same.

- T deletes the columns of L_r that correspond to fixed parameters → it reduces the problem size
- consistent initialisation: θ^0 = T θ̂^0 + t; or filter the initialization by the pseudoinverse, θ̂^0 := T^+ θ^0
- we need not compute derivatives for the θ_j that correspond to all-zero rows of T (fixed params)
- constraining projective entities → minimal representations
- more complex constraints tend to make the normal equations dense
- implementing constraints is safer than an explicit renaming of the parameters, and gives flexibility to experiment
- other methods are much more involved, see [Triggs et al. 1999]
- BA resource: http://www.ics.forth.gr/~lourakis/sba/   [Lourakis 2009]
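
A minimal Matlab sketch of the example above (hypothetical sizes, θ ∈ R^5, θ̂ ∈ R^3):

    T = [1 0 0
         0 1 0
         0 0 0
         0 0 1
         0 0 1];
    t = [0; 0; theta3_fixed; 0; 0];    % theta3_fixed is an assumed constant
    theta = T*theta_hat + t;           % expand the reduced parameters
    Lr_reduced = Lr*T;                 % Jacobian w.r.t. the reduced parameters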
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 140/211) R. Sara, CMP; rev. 7Jan2014
IMinimal Representations for Rotation

o … rotation axis, ‖o‖ = 1; φ … rotation angle
wanted: a simple mapping to/from rotation matrices

1. Rodrigues representation

    R = I + sin φ [o]× + (1 - cos φ) [o]×²
    sin φ [o]× = (R - R^T)/2,    cos φ = (tr R - 1)/2

   hiding φ in the vector o, as in [sin φ · o]×, is not so easy; Cayley tried:

2. Cayley's representation: let a = o tan(φ/2), then

    R = (I + [a]×)(I - [a]×)^{-1}
    [a]× = (R + I)^{-1} (R - I)

    a_1 ∘ a_2 = (a_1 + a_2 - a_1 × a_2) / (1 - a_1^T a_2)    (composition of rotations, R = R_1 R_2)

   - no trigonometric functions
   - cannot represent rotation by 180°
   - explicit composition formula

3. exponential map R = exp [φ o]×, with the inverse given by the Rodrigues formula
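
A minimal Matlab sketch of Cayley's representation (item 2), assuming R is a rotation other than by 180°:

    ax = (R + eye(3)) \ (R - eye(3));    % [a]_x = (R+I)^{-1}(R-I), skew-symmetric
    a  = [ax(3,2); ax(1,3); ax(2,1)];    % the 3 free parameters
    % and back:
    Ax = [0 -a(3) a(2); a(3) 0 -a(1); -a(2) a(1) 0];
    R2 = (eye(3) + Ax) / (eye(3) - Ax);  % recovers R up to round-off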


3D Computer Vision: VI. 3D Structure and Camera Motion (p. 141/211) R. Sara, CMP; rev. 7Jan2014
Minimal Representations for Other Entities
1. with the help of rotation we can minimally represent
   - the fundamental matrix: F = U D V^T, D = diag(1, d_2, 0), U, V rotations; 3 + 1 + 3 = 7 DOF
   - the essential matrix: E = [t]× R, R a rotation, ‖t‖ = 1; 3 + 2 = 5 DOF
   - the camera: P = K [ R | t ]; 5 + 3 + 3 = 11 DOF
2. a homography can be represented via the exponential map

    exp A = Σ_{k=0}^∞ (1/k!) A^k    (note: A^0 = I)

   some properties:
   - exp 0 = I,   exp(-A) = (exp A)^{-1},   exp(A + B) ≠ exp(A) exp(B) in general
   - exp(A^T) = (exp A)^T; hence if A is skew-symmetric then exp A is orthogonal:
     (exp A)^T = exp(A^T) = exp(-A) = (exp A)^{-1}
   - det exp A = exp(tr A); this is the key to the homography representation:

    H = exp Z such that tr Z = 0,  e.g.  Z = [ z11 z12 z13; z21 z22 z23; z31 z32 -(z11 + z22) ],   8 DOF
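
A minimal Matlab sketch of this 8-DOF homography parametrization, assuming z is an 8-vector of parameters:

    Z = [z(1) z(2) z(3); z(4) z(5) z(6); z(7) z(8) -(z(1)+z(5))];  % tr Z = 0
    H = expm(Z);                                % det H = exp(tr Z) = 1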
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 142/211) R. Sara, CMP; rev. 7Jan2014
Part VII

Stereovision
23 Introduction
24 Epipolar Rectification
25 Binocular Disparity and Matching Table
26 Image Likelihood
27 Maximum Likelihood Matching
28 Uniqueness and Ordering as Occlusion Models
29 Three-Label Dynamic Programming Algorithm

mostly covered by
Sara, R. How To Teach Stereoscopic Matching? Proc. ELMAR 2010    (referenced as [SP])

additional references
C. Geyer and K. Daniilidis. Conformal rectification of omnidirectional stereo pairs. In Proc Computer Vision
and Pattern Recognition Workshop, p. 73, 2003.

J. Gluckman and S. K. Nayar. Rectifying transformations that minimize resampling effects. In Proc IEEE
CS Conf on Computer Vision and Pattern Recognition, vol. 1:111–117, 2001.

M. Pollefeys, R. Koch, and L. V. Gool. A simple and efficient rectification method for general motion. In
Proc Int Conf on Computer Vision, vol. 1:496–501, 1999.

3D Computer Vision: VII. Stereovision (p. 143/211) R. Sara, CMP; rev. 7Jan2014
What Are The Relative Distances?

monocular vision already gives a rough 3D sketch because we understand the scene

3D Computer Vision: VII. Stereovision (p. 144/211) R. Sara, CMP; rev. 7Jan2014
What Are The Relative Distances?

Centrum för teknikstudier at Malmö Högskola, Sweden

- we have no help from image interpretation here
- this is how difficult low-level stereo is; this is the problem we will attempt to solve

3D Computer Vision: VII. Stereovision (p. 145/211) R. Sara, CMP; rev. 7Jan2014
What Are The Relative Distances? (Why?)

a combination of lack of texture and occlusion → ambiguous interpretation

3D Computer Vision: VII. Stereovision (p. 146/211) R. Sara, CMP; rev. 7Jan2014
Repetition: How Many Scenes Correspond to a Stereopair?

Consider the fence and the fortress worlds . . .

lack of texture is a limiting case of repetition


3D Computer Vision: VII. Stereovision (p. 147/211) R. Sara, CMP; rev. 7Jan2014
I How Difficult Is Stereo?

when we do not recognize the scene and cannot use high-level constraints the problem
seems difficult (right, less so in the center)
most stereo matching algorithms do not require scene understanding prior to matching
the success of a model-free stereo matching algorithm is unlikely:

WTA Matching:
for every left-image
pixel find the most
similar right-image
pixel along the
corresponding epipolar
left image disparity map disparity map from WTA line [Marroquin 83]
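
A hedged Matlab sketch of WTA matching on a rectified pair, assuming grayscale double images L and R, a disparity range 0..dmax, and SAD aggregated over a 5×5 window:

    [h,w] = size(L);  best = inf(h,w);  dmap = zeros(h,w);
    for d = 0:dmax
      diff = abs(L(:,1+d:w) - R(:,1:w-d));      % pixel dissimilarity at disparity d
      cost = conv2(diff, ones(5), 'same');      % window aggregation
      upd  = cost < best(:,1+d:w);              % where this d is the best so far
      b = best(:,1+d:w);  b(upd) = cost(upd);  best(:,1+d:w) = b;
      m = dmap(:,1+d:w);  m(upd) = d;          dmap(:,1+d:w) = m;
    end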

3D Computer Vision: VII. Stereovision (p. 148/211) R. Sara, CMP; rev. 7Jan2014
Why Model-Free Stereo Fails?

- lack of an occlusion model
- structural ambiguity
- lack of a continuity model

(figure: left and right images of surface points A, B, C, and two interpretations consistent with the same correspondences; interpretation 1 and interpretation 2 place the points at different depths)
3D Computer Vision: VII. Stereovision (p. 149/211) R. Sara, CMP; rev. 7Jan2014
But What Kind of Continuity Model Applies Here?

- continuity alone is not a sufficient model
- an occlusion model is more primal
- but an occlusion model alone is insufficient, since it does not resolve structural ambiguity

3D Computer Vision: VII. Stereovision (p. 150/211) R. Sara, CMP; rev. 7Jan2014
A Summary of Our Observations and an Outlook

- simple matching algorithms do not work
- decisions on matches are not independent, due to occlusions
- the occlusion constraint works along epipolar lines only
- an occlusion model alone is insufficient: it does not resolve the structural ambiguity
- a continuity model can resolve structural ambiguity
- but continuity is piecewise, due to object boundaries
- in sufficiently complex scenes the only possibility is that stereopsis uses scene interpretation (or a measurement from another modality)
Outlook:
1. represent the occlusion constraint:
epipolar rectification
disparity
uniqueness as an occlusion constraint
2. represent piecewise continuity
ordering as a weak continuity model
3. use a consistent framework
looking for the most probable solution (MAP)

3D Computer Vision: VII. Stereovision (p. 151/211) R. Sara, CMP; rev. 7Jan2014
I Epipolar Rectification

Problem: Given fundamental matrix F or camera matrices P1, P2, transform images so that epipolar lines become horizontal with the same row coordinate. The result is a standard stereo pair. ⇒ easier correspondence search
Procedure:
1. find a pair of rectification homographies H1 and H2,
2. warp images using H1 and H2 and modify the fundamental matrix F ↦ H2⁻ᵀ F H1⁻¹, or the cameras P1 ↦ H1 P1, P2 ↦ H2 P2.
Original pair

Rectification 1 Rectification 2

there is a 9-parameter family of rectification homographies for binocular rectification, see next
trinocular rectification has 9 or 6 free parameters (depends on additional constraints)
3D Computer Vision: VII. Stereovision (p. 152/211) R. Sara, CMP; rev. 7Jan2014
Rectification Example
Four cameras in general position Rectified pairs

cam 1 cam 2 pair 1 2

cam 3 cam 4

pair 2 4

pair 1 4

3D Computer Vision: VII. Stereovision (p. 153/211) R. Sara, CMP; rev. 7Jan2014
I Rectification Homographies

Cameras (P1, P2) are rectified by a homography pair (H1, H2):

   P̂i = Hi Pi = Hi Ki Ri [ I | −Ci ],   i = 1, 2

rectified entities: F̂, l̂2, m̂1, etc.
(figure: rectified image pair with axes û, v̂; corresponding points m̂1, m̂2, epipole ê2, coincident epipolar lines l̂1 = l̂2)
corresponding epipolar lines must be:
1. parallel to image rows ⇒ the epipoles become ê1 = ê2 = (1, 0, 0)
2. equivalent: l̂2 = l̂1, since l̂2 ≃ l̂1 ≃ ê1 × m̂1 = [ê1]× m̂1 = F̂ m̂1
both conditions together give the rectified fundamental matrix

   F̂ ≃ [ 0 0 0; 0 0 −1; 0 1 0 ]
A two-step rectification procedure:
1. find some pair of primitive rectification homographies H̄1, H̄2,
2. upgrade them to a pair of optimal rectification homographies from the class preserving F̂.

3D Computer Vision: VII. Stereovision (p. 154/211) R. Sara, CMP; rev. 7Jan2014
I Geometric Interpretation of Linear Rectification

What pair of physical cameras is compatible with F̂?

we know that F = (Q1 Q2⁻¹)⁻ᵀ [e1]×   →79
we choose Q̂1 = K̂1, Q̂2 = K̂2 R̂; then (Q̂1 Q̂2⁻¹)⁻ᵀ [ê1]× = (K̂1 R̂ᵀ K̂2⁻¹)⁻ᵀ F̂
we look for R̂, K̂1, K̂2 compatible with

   (K̂1 R̂ᵀ K̂2⁻¹)⁻ᵀ F̂ ≃ F̂,   R̂ R̂ᵀ = I,   K̂1, K̂2 upper triangular

we also want the baseline b̂ from ê1 ≃ P̂1 C2 = K̂1 b̂   (b̂ in the camera-1 frame)
result:

   R̂ = I,   b̂ = (b, 0, 0)ᵀ,   K̂1 = [ k11 k12 k13; 0 f v0; 0 0 1 ],   K̂2 = [ k21 k22 k23; 0 f v0; 0 0 1 ]   (31)

rectified cameras are in canonical position with respect to each other
   not rotated, canonical baseline
rectified calibration matrices can differ in the first row only
when K̂1 = K̂2 the rectified pair is called the standard stereo pair and the homographies standard rectification homographies

3D Computer Vision: VII. Stereovision (p. 155/211) R. Sara, CMP; rev. 7Jan2014
I contd

rectification is a homography (per image)
rectified camera centers are equal to the original ones
standard rectified cameras are in canonical orientation
rectified image projection planes are coplanar
standard rectification guarantees equal rectified calibration matrices
⇒ rectified image projection planes are equal

standard rectification homographies reproject onto a common image plane parallel to the baseline
(figure: common image plane at focal length f above the camera centers C1, C2)

Corollary: the standard rectified stereo pair has vanishing disparity for 3D points at infinity
but known F alone does not give any constraints for obtaining standard rectification homographies
for that we need either of these:
1. projection matrices, or
2. calibrated cameras, or
3. a few points at infinity calibrating k1i, k2i, i = 1, 2, 3 in (31)

3D Computer Vision: VII. Stereovision (p. 156/211) R. Sara, CMP; rev. 7Jan2014
I Primitive Rectification

Goal: Given fundamental matrix F, derive some simple rectification homographies H̄1, H̄2

1. let the SVD of F be U D Vᵀ = F, where D = diag(1, d2, 0), 1 ≥ d2 > 0
2. decompose D = Aᵀ F̂ B, where F̂ is the rectified fundamental matrix (→154), e.g.

   A = [ 0 0 1; 0 d2 0; 1 0 0 ],   B = [ 0 0 1; 1 0 0; 0 −1 0 ]

   (signs chosen so that Aᵀ F̂ B = diag(1, d2, 0) for F̂ = [ 0 0 0; 0 0 −1; 0 1 0 ])
3. then

   F = U D Vᵀ = (A Uᵀ)ᵀ F̂ (B Vᵀ) = H̄2ᵀ F̂ H̄1

and the primitive rectification homographies are

   H̄2 = A Uᵀ,   H̄1 = B Vᵀ

~ P1; 1pt: derive some A, B from the admissible class

⇒ rectification homographies do exist
there are other primitive rectification homographies; those suggested here are just simple to obtain
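For concreteness, a minimal Matlab sketch of this construction (not from the slides; the particular signs in A and B are one valid member of the admissible class asked for in homework P1):

function [H1,H2] = rectify_primitive(F)
% primitive rectification homographies from a rank-2 fundamental matrix F (a sketch)
[U,D,V] = svd(F);
d2 = D(2,2)/D(1,1);              % normalized singular values: diag(1,d2,0)
A  = [0 0 1; 0 d2 0; 1 0 0];     % signs chosen so that A'*Fr*B = diag(1,d2,0)
B  = [0 0 1; 1 0 0; 0 -1 0];
H2 = A*U';                       % H2 = A*U', H1 = B*V'
H1 = B*V';
Fr = [0 0 0; 0 0 -1; 0 1 0];     % rectified fundamental matrix (slide 154)
assert(norm(H2'\(F/D(1,1))/H1 - Fr) < 1e-9)   % check H2^-T * F * H1^-1 = Fr
end

The images can then be warped with, e.g., the Image Processing Toolbox: tform = projective2d(H1'); im1r = imwarp(im1, tform); (projective2d takes the transposed matrix).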

3D Computer Vision: VII. Stereovision (p. 157/211) R. Sara, CMP; rev. 7Jan2014
I Primitive Rectification Suffices for Calibrated Cameras

Obs: for calibrated cameras (essential matrix) d2 = 1 ⇒ H̄1, H̄2 are orthogonal

1. determine the primitive rectification homographies (H̄1, H̄2) from the essential matrix
2. choose a suitable common calibration matrix K, e.g.

   K = [ f 0 u0; 0 f v0; 0 0 1 ],   f = (f1 + f2)/2,   u0 = (u01 + u02)/2,  etc.

3. the final rectification homographies are H1 = K H̄1, H2 = K H̄2

⇒ we get a standard camera pair and non-negative disparity:

   P⁺i := Ki⁻¹ Pi = Ri [ I | −Ci ],   i = 1, 2     (note we started from E, not F)

   H1 P⁺1 = K H̄1 P⁺1 = K (B Vᵀ R1) [ I | −C1 ] = K R̂ [ I | −C1 ]
   H2 P⁺2 = K H̄2 P⁺2 = K (A Uᵀ R2) [ I | −C2 ] = K R̂ [ I | −C2 ]

one can prove that B Vᵀ R1 = A Uᵀ R2 (= R̂) with the help of (13)
points at infinity project to K R̂ in both images ⇒ they have zero disparity   →162
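A sketch of the calibrated pipeline in Matlab, reusing rectify_primitive from above (assumes known calibration matrices K1, K2; the entrywise mean is one way to realize f = (f1 + f2)/2, u0 = (u01 + u02)/2, etc.):

[H1p,H2p] = rectify_primitive(E);   % E = essential matrix, so d2 = 1
K = (K1 + K2)/2;                    % common calibration matrix
K(2,1) = 0; K(3,1) = 0; K(3,2) = 0; % keep K upper triangular
H1 = K*H1p;  H2 = K*H2p;            % final rectification homographies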

3D Computer Vision: VII. Stereovision (p. 158/211) R. Sara, CMP; rev. 7Jan2014
I The Degrees of Freedom in Epipolar Rectification

Proposition 1: Homographies A1 and A2 are rectification-preserving, i.e. the images stay rectified, iff A2⁻ᵀ F̂ A1⁻¹ ≃ F̂, which gives

   A1 = [ l1 l2 l3; 0 sv tv; 0 q 1 ],   A2 = [ r1 r2 r3; 0 sv tv; 0 q 1 ],

where sv ≠ 0, tv, l1 ≠ 0, l2, l3, r1 ≠ 0, r2, r3, q are the 9 free parameters.

   parameters   general transformation      canonical       type
   l1, r1       horizontal scales           l1 = r1         algebraic
   l2, r2       horizontal skews            l2 = r2         algebraic
   l3, r3       horizontal shifts           l3 = r3         algebraic
   q            common special projective                   geometric
   sv           common vertical scale                       geometric
   tv           common vertical shift                       algebraic
                9 DoF                       9 − 3 = 6 DoF

q is a rotation about the baseline; proof: find a rotation G that brings K̂ to upper triangular form via RQ decomposition: A1 K̂1 = K̂1′ G and A2 K̂2 = K̂2′ G
sv changes the focal length
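A quick numerical check of Proposition 1 in Matlab (a sketch with arbitrary illustrative parameter values; one can show A2ᵀ F̂ A1 = (sv − q·tv) F̂, which is equivalent to the stated condition):

sv = 0.8; tv = 3; q = 0.01;            % common vertical scale, shift, projective q
A1 = [1.2  0.1 -5; 0 sv tv; 0 q 1];
A2 = [0.9 -0.2  7; 0 sv tv; 0 q 1];
Fr = [0 0 0; 0 0 -1; 0 1 0];
G  = A2'*Fr*A1;                        % rectification-preserving test
disp(G/(sv - q*tv))                    % recovers Fr; the scale is sv - q*tv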
3D Computer Vision: VII. Stereovision (p. 159/211) R. Sara, CMP; rev. 7Jan2014
The Rectification Group

Corollary of Proposition 1: Let H1 and H2 be (primitive or other) rectification homographies. Then H1′ = A1 H1, H2′ = A2 H2 are also rectification homographies.

Proposition 2: Pairs of rectification-preserving homographies (A1, A2) form a group with the group operation (A1′, A2′) ∘ (A1, A2) = (A1′ A1, A2′ A2).
Proof:
closure by Proposition 1
associativity by matrix multiplication
the identity belongs to the set
the inverse element belongs to the set, by A2⁻ᵀ F̂ A1⁻¹ ≃ F̂ ⇔ F̂ ≃ A2ᵀ F̂ A1

3D Computer Vision: VII. Stereovision (p. 160/211) R. Sara, CMP; rev. 7Jan2014
Optimal and Non-linear Rectification

Optimal choice of the free parameters
by minimization of residual image distortion, e.g. [Gluckman & Nayar 2001]

   A1* = arg min over A1 of ∬ ( det J(A1 H̄1 x) − 1 )² dx

by minimization of image information loss [Matousek, ICIG 2004]
non-linear rectification suitable for forward motion [Pollefeys et al. 1999], [Geyer & Daniilidis 2003]
(figures: forward egomotion; rectified images, Pollefeys' method)
3D Computer Vision: VII. Stereovision (p. 161/211) R. Sara, CMP; rev. 7Jan2014
I Binocular Disparity in Standard Stereo Pair

Assumptions: single image line, standard camera pair
(figure: top view — baseline b, point X = (x, z), viewing angles α1, α2; side view — row coordinate v and the y coordinate; origin midway between C1 and C2)

   b = z cot α1 − z cot α2,   u1 = f cot α1,   u2 = f cot α2

X = (x, y, z) from the disparity d = u1 − u2:

   z = b f / d,   x = (b / d) · (u1 + u2) / 2,   y = b v / d

f, d, u, v in pixels; b, x, y, z in meters

Observations
a constant-disparity surface is a frontoparallel plane
distant points have small disparity
the relative error in z is large for small disparity:

   (1/z) · (dz/dd) = −1/d

increasing the baseline increases disparity and reduces the error
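The closed form amounts to a few lines of Matlab (a sketch; u1, u2, v are pixel coordinates, b the baseline in meters, f the focal length in pixels):

d = u1 - u2;               % disparity [px]
z = b*f ./ d;              % depth [m]
x = b*(u1 + u2) ./ (2*d);  % lateral position [m], origin midway between cameras
y = b*v ./ d;              % height [m]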

3D Computer Vision: VII. Stereovision (p. 162/211) R. Sara, CMP; rev. 7Jan2014
I Understanding Basic Occlusion Types
(figure: half occlusion and mutual occlusion — a left-camera ray l and right-camera rays r1, r2, r3; a surface point at the intersection (l, r1); the point (l, r3) is occluded, the point (l, r2) is transparent; X1, X2 are half-occluded, the yellow zone is mutually occluded)

the surface point at the intersection of rays l and r1 occludes a world point at the intersection (l, r3) and implies that the world point (l, r2) is transparent; therefore
(l, r3) and (l, r2) are excluded by (l, r1)
in half-occlusion, every world point such as X1 or X2 is excluded by a binocularly visible surface point ⇒ decisions on correspondences are not independent
in mutual occlusion this is no longer the case: any X in the yellow zone is not excluded ⇒ decisions in the zone are independent of the rest

3D Computer Vision: VII. Stereovision (p. 163/211) R. Sara, CMP; rev. 7Jan2014
I Matching Table

Based on the observation on mutual exclusion we expect each pixel to match at most once.

(figure: rays in the epipolar plane of cameras C1, C2 and the corresponding matching table T; rows and columns are indexed by the rays, full nodes mark correspondences)

matching table T:
rows and columns represent optical rays
nodes: possible correspondence pairs
full nodes: correspondences
numerical values associated with nodes: descriptor similarities (see next)

3D Computer Vision: VII. Stereovision (p. 164/211) R. Sara, CMP; rev. 7Jan2014
Image Point Descriptors And Their Similarity

Descriptors: Tag image points by their (viewpoint-invariant) physical properties:


texture window [Moravec 77]
reflectance profile under a moving illuminant
photometric ratios [Wolff & Angelopoulou 93-94]
dual photometric stereo [Ikeuchi 87]
polarization signature
...
similar points are more likely to match
we will compute image similarity for all match candidates and get the matching table

video
3D Computer Vision: VII. Stereovision (p. 165/211) R. Sara, CMP; rev. 7Jan2014
I Constructing A Suitable Image Similarity

let pi = (l, r), and let L(l), R(r) be the (left, right) image descriptors (vectors) constructed from local image neighborhood windows
(figure: descriptor windows around pixel l in the left image and pixel r in the right image; the corresponding node (l, r) in matching table T)

a natural descriptor similarity is

   sim(l, r) = − ‖L(l) − R(r)‖² / σI²(l, r)

σI² is the difference scale; a suitable (plug-in) estimate is ½ ( s²(L(l)) + s²(R(r)) ); dropping the constant offset and scale this gives

   sim(l, r) = 2 s( L(l), R(r) ) / ( s²(L(l)) + s²(R(r)) ) = ρ( L(l), R(r) ),   where s(·) is the sample (co-)variance   (32)

MNCC — Moravec's Normalized Cross-Correlation [Moravec 1977]
ρ² ∈ [0, 1]; the sign of ρ captures the phase
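A direct Matlab transcription of (32) (a sketch; L and R are two equal-size descriptor windows):

function rho = mncc(L, R)
% Moravec's normalized cross-correlation of two descriptor windows;
% the 1/n factors of the sample (co-)variances cancel in the ratio
L = L(:) - mean(L(:));  R = R(:) - mean(R(:));
rho = 2*(L'*R) / (L'*L + R'*R + eps);   % eps guards textureless windows
end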

3D Computer Vision: VII. Stereovision (p. 166/211) R. Sara, CMP; rev. 7Jan2014
contd

we choose some probability distribution on ρ² ∈ [0, 1], e.g. the Beta distribution

   p1( sim(l, r) ) = ρ^(2(α−1)) (1 − ρ²)^(β−1) / B(α, β)

note that the uniform distribution is obtained for α = β = 1
(plot: Be(ρ²; α, β) and −log Be(ρ²; α, β) over ρ ∈ [0, 1] for α = 10, β = 1.5)

the mode is at ρ = √( (α − 1) / (α + β − 2) ) ≈ 0.9733 for α = 10, β = 1.5
if we chose β = 1 then the mode would be at ρ = 1
⇒ perfect similarity is suspicious (depends on the expected camera noise level)
from now on we will work with

   V1( sim(l, r) ) = −log p1( sim(l, r) )   (33)
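The matching cost (33) is then a one-liner (a sketch; al, be stand for α, β; beta is Matlab's Beta function):

V1 = @(rho, al, be) -log( rho.^(2*(al-1)) .* (1 - rho.^2).^(be-1) ./ beta(al, be) );
V1(0.9733, 10, 1.5)   % the cost is smallest near the mode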

3D Computer Vision: VII. Stereovision (p. 167/211) R. Sara, CMP; rev. 7Jan2014
How A Scene Looks in The Filled-In Similarity Table

(figures: scene, left image, right image; filled-in similarity tables for 5×5, 11×11, and 3×3 windows)
MNCC used (parameters 1.5 and 1)

high-correlation structures correspond to scene objects
constant disparity ⇒ a diagonal in the correlation table; zero disparity is the main diagonal
depth discontinuity ⇒ a horizontal or vertical jump in the correlation table
large image window ⇒ better correlation, worse occlusion localization (see next)
repeated texture ⇒ horizontal and vertical block repetition
5×5 window: a good tradeoff; 11×11 window: occlusion artefacts; 3×3 window: undiscriminable
3D Computer Vision: VII. Stereovision (p. 168/211) R. Sara, CMP; rev. 7Jan2014
Note: Errors at Occlusion Boundaries for Large Windows

NCC, Disparity Error

this used a really large window of 25 × 25 px
errors depend on the relative contrast across the occlusion boundary
the direction of the overflow depends on the combination of texture contrast and edge contrast
solutions:
1. small windows (5 × 5 typically suffices)
2. e.g. guided filtering methods for computing image similarity [Hosni 2011]
3D Computer Vision: VII. Stereovision (p. 169/211) R. Sara, CMP; rev. 7Jan2014
I Marroquin's Winner-Take-All (WTA) Matching Algorithm
1. per left-image pixel: find the most similar right-image pixel

   SAD(l, r) = ‖L(l) − R(r)‖1   (the L1 norm instead of the L2 norm in (32); unnormalized)

2. represent the dissimilarity table diagonals in a compact form: one image-sized slice per disparity d = 0, 1, 2, ...
(figure: the diagonals d = 0, 1, 2 of the table stored as stacked slices)
3. use the image sliding aggregation algorithm: sum the per-pixel dissimilarities over a window win sliding over the overlap of iml and the shifted imr
(figure: window win sliding along disparity slice d)
4. threshold the results by the maximal allowed dissimilarity

3D Computer Vision: VII. Stereovision (p. 170/211) R. Sara, CMP; rev. 7Jan2014
The Matlab Code for WTA

function dmap = marroquin(iml,imr,disparityRange)


% iml, imr - rectified gray-scale images
% disparityRange - non-negative disparity range

% (c) Radim Sara (sara@cmp.felk.cvut.cz) FEE CTU Prague, 10 Dec 12

thr = 20; % bad match rejection threshold


r = 2;
winsize = 2*r+[1 1]; % 5x5 window (neighborhood)

% the size of each local patch; it is N=(2r+1)^2 except for boundary pixels
N = boxing(ones(size(iml)), winsize);

% computing dissimilarity per pixel (unscaled SAD)
V = inf(size(iml,1), size(iml,2), disparityRange+1); % preallocate; inf marks
                                                     % pixels with no candidate
for d = 0:disparityRange % cycle over all disparities
slice = abs(imr(:,1:end-d) - iml(:,d+1:end)); % pixelwise dissimilarity
V(:,d+1:end,d+1) = boxing(slice, winsize)./N(:,d+1:end); % window aggregation
end

% collect winners, threshold, and output disparity map


[cmap,dmap] = min(V,[],3);
dmap(cmap > thr) = NaN; % mask-out high dissimilarity pixels
end

function c = boxing(im, wsz)


% if the mex is not found, run this slow version:
c = conv2(ones(1,wsz(1)), ones(wsz(2),1), im, 'same');
end
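Example usage of the routine above (a sketch; the file names are placeholders):

iml = double(imread('left.png'));   % any rectified grayscale pair works
imr = double(imread('right.png'));
dmap = marroquin(iml, imr, 60);     % search disparities 0..60
imagesc(dmap); axis image; colorbar % NaN marks rejected matches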
3D Computer Vision: VII. Stereovision (p. 171/211) R. Sara, CMP; rev. 7Jan2014
WTA: Some Results
thr = 20 thr = 10

results are bad


false matches in textureless image regions and on repetitive structures (book shelf)
a more restrictive threshold (thr=10) does not work as expected
we searched the true disparity range, results get worse if the range is set wider
chief failure reasons:
unnormalized image dissimilarity does not work well
no occlusion model
3D Computer Vision: VII. Stereovision (p. 172/211) R. Sara, CMP; rev. 7Jan2014
I Negative Log-Likelihood of Observed Images

given a matching M, what is the likelihood of the observed data D?
we need the ability not to match
matches are pairs pi = (li, ri), i = 1, ..., n
we will mask-out some matches by a binary label λ(pi) ∈ {e, m} — excluded, matched
a labeled matching is a set

   M = { (p1, λ(p1)), (p2, λ(p2)), ..., (pn, λ(pn)) }

pi are matching table pairs; there are no more than n of them in the table T

The negative log-likelihood of the data D given the labeled matching M is then

   V(D | M) = Σ_{pi ∈ M} V( D(pi) | λ(pi) )

Our choice:
   V( D(pi) | λ(pi) = e ) = Ve   — penalty for unexplained data, Ve ≥ 0
   V( D(pi) | λ(pi) = m ) = V1( D(l, r) )   — negative log-probability of the match pi = (l, r) from (33)

the V( D(pi) | λ(pi) = e ) could also be a non-uniform distribution but the extra effort does not pay off


3D Computer Vision: VII. Stereovision (p. 173/211) R. Sara, CMP; rev. 7Jan2014
I Maximum Likelihood (ML) Matching

Uniqueness constraint: Each point in the left image matches at most once, and vice versa.
A node set of T that follows the uniqueness constraint is called a matching in graph theory:
a set of pairs M = {pi}, i = 1, ..., n, pi ∈ T is a matching iff ∀ pi, pj ∈ M, i ≠ j: pj ∉ X(pi).
(figure: the X-zone X(p) of a pair p in matching table T — the row and the column through p)

X(p) is called the X-zone of p and it defines the dependencies
ML matching will observe the uniqueness constraint only
epipolar lines are independent wrt the uniqueness constraint ⇒ we can solve the problem per image line independently:

   M* = arg min_{M ∈ M} Σ_{p ∈ M} V( D(p) | λ(p) )
      = arg min_{M ∈ M} [ |M|e · Ve + Σ_{p ∈ M: λ(p) = m} V( D(p) | λ(p) = m ) ]

(the first term counts unexplained pixels; the second is the matching likelihood proper)
~ H4; 2pt: How many are there: (1) binary partitionings of T, (2) maximal matchings in T; prove the results.

M — the set of all perfect labeled matchings; |M|e — the number of pairs with λ = e in M, |M|e ≤ n
perfect = every table row (column) contains exactly one match
⇒ the total number of individual terms in the sum is n (which is fixed)

3D Computer Vision: VII. Stereovision (p. 174/211) R. Sara, CMP; rev. 7Jan2014
I Programming The ML Matching Algorithm

we restrict ourselves to a single (rectified) image line and reduce the problem to min-cost perfect matching
extend every matching table pair p ∈ T, p = (j, k), to the 4 combinations ( (j, sj), (k, sk) ), sj ∈ {0, 1} and sk ∈ {0, 1} — this selects/rejects pixels for matching (unlike selecting matches)
the binary label mjk = 1 then means that (j, sj) matches (k, sk)

costs in the extended table (figure: the four quadrants):
   (j, 1)–(k, 1):  Vjk = V( D(j, k) | λjk = m )   — the original ('yellow') quadrant
   (j, 1)–(k, 0), (j, 0)–(k, 1):  Vjk = ½ Ve      — pixel excluded
   (j, 0)–(k, 0):  Vjk = 0                        — auxiliary ('gray') quadrant

each (j, 1) either matches some (k, 1) or it matches (j, 0)
each (k, 1) either matches some (j, 1) or (k, 0)
if M is maximal in the yellow quadrant then there will be n auxiliary matches in the gray quadrant
otherwise every empty line in the yellow quadrant induces an empty column in the gray quadrant; the cost is 2 · ½ Ve = Ve

our problem becomes minimum-cost perfect matching in an (m + n) × (m + n) table:

   M⁺ = arg min_M Σ_{j,k} Vjk mjk,   subject to  Σ_k mjk = 1 for every j,  Σ_j mjk = 1 for every k

we collect our matches M* in the yellow quadrant
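Since R2019a, Matlab's matchpairs solves precisely this kind of problem — min-cost linear assignment where every unmatched row and column pays a fixed cost — so a one-line sketch suffices (V is the per-line cost table of V1 values from (33)):

M = matchpairs(V, Ve/2);   % an unmatched row and an unmatched column pay Ve/2
                           % each, i.e. Ve per excluded pair, as constructed above
% M(:,1) - matched left pixels j, M(:,2) - matched right pixels k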
3D Computer Vision: VII. Stereovision (p. 175/211) R. Sara, CMP; rev. 7Jan2014
Some Results for the ML Matching

unlike the WTA we can efficiently control the density/accuracy tradeoff


middle row: Ve set to achieve a 3% error rate (a density of 61% is achieved); holes are black
bottom row: Ve set to achieve a density of 76% (a 4.3% error rate is achieved)
3D Computer Vision: VII. Stereovision (p. 176/211) R. Sara, CMP; rev. 7Jan2014
Some Notes on ML Matching
an algorithm for maximum weighted bipartite matching can be used as well, with V ↦ −V
maximum weighted bipartite matching = maximum weighted assignment problem, solvable by e.g. the Hungarian algorithm

Idea? Q: This looks simpler: run the matching with Ve = 0 and then threshold the result to remove bad matches. A: It does not work well; see this example (entries are match qualities, an excluded pair scores Ve = 8):

   ordinary ML with thresholding        our ML matching (→174)
   [  8  3  9 ]                         [  8  3  9 ]
   [ 10  6  9 ]                         [ 10  6  9 ]
   [  7  1  8 ]                         [  7  1  8 ]
   V = 9 + 2·8 = 25                     V = 9 + 10 + 8 = 27

our matching gives a better total and also a greater cardinality (density)


3D Computer Vision: VII. Stereovision (p. 177/211) R. Sara, CMP; rev. 7Jan2014
A Stronger Model Needed

notice the many small isolated errors in the ML matching
⇒ we need a continuity model
does human stereopsis teach us something?

Potential models for M:
1. Monotonicity (i.e. ordering preserved): for all (i, j) ∈ M and (k, l) ∈ M: k > i ⇒ l > j
   Notation: (i, j) ∈ M or j = M(i) — left-image pixel i matches right-image pixel j.
2. Coherence [Prazdny 85]: the world is made of objects, each occupying a well-defined 3D volume

(figure: three configurations of matches between the two image lines — non-monotonic & incoherent, non-monotonic & coherent, monotonic (continuous) & coherent; model strength increases to the right)
3D Computer Vision: VII. Stereovision (p. 178/211) R. Sara, CMP; rev. 7Jan2014
I An Auxiliary Construct: Cyclopean Camera

Cyclopean coordinate u — from the psychophysiology of vision [Julesz 1971]

   new:  u = f x / z,   known:  d = f b / z,   x = (b/d) (u1 + u2)/2   ⇒   u = (u1 + u2)/2

(figure: cyclopean camera C midway between C1 and C2, baseline b, focal length f; points X = (x, z), X′ = (x′, z′) and their cyclopean images m, m′)

Disparity gradient [Pollard, Mayhew, Frisby 1985]:

   DG = |d − d′| / |u − u′| = ( b f |1/z − 1/z′| ) / ( f |x/z − x′/z′| ) = b |z′ − z| / |x z′ − x′ z|

human stereovision fails to perceive a continuous surface when the disparity gradient exceeds a limit

3D Computer Vision: VII. Stereovision (p. 179/211) R. Sara, CMP; rev. 7Jan2014
I Forbidden Zone and The Ordering Constraint

Forbidden zone F(X): DG > k, with the boundary b (z′ − z) = ± k (x z′ − x′ z)

the boundary is a pair of lines in the x–z plane — a degenerate conic
the point x = x′, z = z′ lies on the boundary
the boundary coincides with the optical rays for k = 2
small k means a wide F(X)
(figure: the forbidden zone F(X) of X = (x, z), a point X′ = (x′, z′), and the rays m1, m2, m1′, m2′ from C1, C2)

the disparity gradient limit is exceeded when X′ ∈ F(X)
symmetry: X′ ∈ F(X) ⇔ X ∈ F(X′)
Obs: X′ and X swap their order in the other image when X′ ∈ F(X), k = 2
real scenes often preserve ordering
thin and close objects violate ordering (see next)
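The k = 2 case is easy to verify numerically in image coordinates (a Matlab sketch; (u1, u2) and (u1p, u2p) are two candidate matches on one epipolar line):

u = (u1 + u2)/2;  up = (u1p + u2p)/2;  % cyclopean coordinates
d = u1 - u2;      dp = u1p - u2p;      % disparities
DG = abs(d - dp) / abs(u - up);        % disparity gradient
swapped = (u1 - u1p)*(u2 - u2p) < 0;   % order differs between the two images

Writing a = u1 − u1p and b = u2 − u2p, DG > 2 means |a − b| > |a + b|, which holds exactly when a·b < 0, i.e. when the order is swapped.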

3D Computer Vision: VII. Stereovision (p. 180/211) R. Sara, CMP; rev. 7Jan2014
Ordering and Critical Distance

(figure: a thick foreground object over a background; X1 ∈ F(X4);
black — binocularly visible; yellow — half-occluded; red — ordering violated wrt the foreground;
solid red — a zone of depth: spatial points visible in neither camera, bounded by the foreground object;
the rays meet the cameras C1, C2 in the orders 1-2-4-3 and 1-4-2-3)

Ordering is violated iff both Xi, Xj with Xi ∈ F(Xj) are visible in both cameras, e.g. X2, X4.

ordering is preserved in scenes where the critical distances are not exceeded, i.e. when the red background hides in the solid red zone
Thinner objects and/or a wider baseline require flatter scenes to preserve ordering.

3D Computer Vision: VII. Stereovision (p. 181/211) R. Sara, CMP; rev. 7Jan2014
I The X-zone and the F-zone in Matching Table T

these are necessary and sufficient conditions for uniqueness and monotonicity

Uniqueness Constraint: a set of pairs M = {pi}, i = 1, ..., N, pi ∈ T is a matching iff ∀ pi, pj ∈ M, i ≠ j: pj ∉ X(pi).

Ordering Constraint: a matching M is monotonic iff ∀ pi, pj ∈ M: pj ∉ F(pi).

(figure: the X-zone X(p) — the row and column through p — and the F-zone F(p) of a pair p in the matching table)

the ordering constraint: matched points form a monotonic set in both images
ordering is a powerful constraint: there are O(4^N) monotonic matchings ≪ O(N!) all matchings in an N × N table
~ 2: how many maximal monotonic matchings are there?

the uniqueness constraint is a basic occlusion model
the ordering constraint is a weak continuity model, and partly also an occlusion model

3D Computer Vision: VII. Stereovision (p. 182/211) R. Sara, CMP; rev. 7Jan2014
I Understanding Matching Table

this is essentially the picture from Slide 181

(figure: the matching table with left-image pixel index lI (rows) against right-image pixel index rJ (columns); marked regions:
binocularly visible foreground points; binocularly visible background points violating ordering;
the depth discontinuity in the left and in the right image; monocularly visible points; the invisible region;
dk — the critical disparity)
3D Computer Vision: VII. Stereovision (p. 183/211) R. Sara, CMP; rev. 7Jan2014
Bayesian Decision Task for Matching
Idea: L(d, M) — the decision cost (loss); d — our decision (matching); M — the true correspondences

Choice:  L(d, M) = 0 if d = M and L(d, M) = 1 if d ≠ M,  i.e.  L(d, M) = [d ≠ M]

Bayesian loss:

   L(d | D) = Σ_{M ∈ M} p(M | D) L(d, M)

M — the set of all matchings;  D = {IL, IR} — the data

Solution for the best decision d*:

   d* = arg min_d Σ_{M ∈ M} p(M | D) (1 − [d = M]) = arg min_d ( 1 − Σ_{M ∈ M} p(M | D) [d = M] )
      = arg max_d Σ_{M ∈ M} p(M | D) [d = M] = arg max_{M ∈ M} p(M | D)
      = arg min_{M ∈ M} ( −log p(M | D) ) = arg min_{M ∈ M} V(M | D) = arg min_{M ∈ M} ( V(D | M) + V(M) )

(the first term is the likelihood, the second the prior)

this is the Maximum Aposteriori Probability (MAP) estimate
other loss functions result in different solutions
our choice of L(d, M) looks oversimplified but it results in algorithmically tractable problems
3D Computer Vision: VII. Stereovision (p. 184/211) R. Sara, CMP; rev. 7Jan2014
I Constructing The Prior Model Term V(M)

the prior V(M) should capture:
1. uniqueness
2. ordering
3. coherence

   M* = arg min_{M ∈ M} ( V(D | M) + V(M) )

we need a suitable representation to encode V(M)

Every p = (l, r) of the |I| × |J| matching table T (except for the last row and column) receives two successors, (l + 1, r) and (l, r + 1)
(figure: the matching table as a grid graph with source s and sink t)
this gives an acyclic directed graph G — optimal paths in acyclic graphs are an easier problem
the set of s-t paths starting in s and ending in t will represent the set of matchings
all such s-t paths have equal length n = |I| + |J| − 1
⇒ all prospective matchings will have the same number of terms in V(D | M) and in V(M)

3D Computer Vision: VII. Stereovision (p. 185/211) R. Sara, CMP; rev. 7Jan2014
Endowing s-t Paths with Useful Properties
introduce node labels λ ∈ {m, eL, eR} — matched, left-excluded, right-excluded
s-t path neighbors are allowed only some label combinations: all pairs except m → m
(figure: an example labeled s-t path in the matching table; the numbers along the path are disparities)

Observations
no two neighbors have label m
in each labeled s-t path there is at most one transition:
1. m → eL or eR → m per matching table row,
2. m → eR or eL → m per matching table column
⇒ the pairs labeled m on every s-t path satisfy the uniqueness and ordering constraints
transitions eL → eR or eR → eL along an s-t path allow skipping a contiguous segment in either or in both images — this models half occlusion and mutual occlusion
the disparity change is the number of eL → eL or eR → eR edges
a given monotonic matching can be traversed by one or more s-t paths

Labeled s-t paths:   P = ( (p1, λ1), (p2, λ2), ..., (pn, λn) )
3D Computer Vision: VII. Stereovision (p. 186/211) R. Sara, CMP; rev. 7Jan2014
The Structure of The Prior Model V(P) Gives an MC Recognition Problem

ideas:
we choose the energy of a path P dependent on its labeling only
we choose an additive penalty per transition eL → eL, eR → eR, and eL → eR, eR → eL
no penalty for m → eL, m → eR

Employing Markovianity:

   V(P) = V(λn, λn−1, ..., λ1) = V(λn | λn−1, ..., λ1) + V(λn−1, ..., λ1)
        = V(λn | λn−1) + V(λn−1, ..., λ1) = V(λ1) + Σ_{i=2}^{n} V(λi | λi−1)

The matching problem is then a decision over labeled s-t paths P ∈ P:

   P* = arg min_{P ∈ P} { Vp1(D | λ1) + V(λ1) + Σ_{i=2}^{n} [ Vpi(D | λi) + V(λi | λi−1) ] }   (34)

the data likelihood term Vpi(D | λi) is the same as in (33) (→167)
note that one can add/subtract a fixed term from any of the functions Vp, V in (34)

3D Computer Vision: VII. Stereovision (p. 187/211) R. Sara, CMP; rev. 7Jan2014
A Choice of V(λi | λi−1)
A natural requirement: symmetry of the probability p(λi, λi−1) = e^(−V(λi, λi−1))
⇒ 3 DOF, 1 normalization constraint ⇒ 2 parameters

the joint probability p(λi, λi−1) (rows: λi−1; columns: λi = m, eL, eR):

   m:    0           p(m, e)       p(m, e)
   eL:   p(m, e)     p(e, e)       p(eL, eR)
   eR:   p(m, e)     p(eL, eR)     p(e, e)

   ε1 = p(eL, eR) / p(e, e),   0 ≤ ε1 ≤ 1
   ε2 = p(m, e) / p(e, e),     0 < ε2 ≤ 1 + ε1

Result for V(λi | λi−1), after subtracting common terms (m → m is forbidden):

   m:    —                       0                       0
   eL:   ln (1+ε1+ε2)/ε2         ln (1+ε1+ε2)            ln (1+ε1+ε2)/ε1
   eR:   ln (1+ε1+ε2)/ε2         ln (1+ε1+ε2)/ε1         ln (1+ε1+ε2)

by marginalization:  V(m) = ln (1+ε1+ε2)/(2 ε2),   V(eL) = V(eR) = 0

parameters:
ε1 — likelihood of mutual occlusion (ε1 = 0 forbids mutual occlusion)
ε2 — likelihood of irregularity (small ε2 helps suppress small objects and holes)
α, β — similarity model parameters (see V1(D(l, r)) →167)
Ve — penalty for disregarded data (see V(D(pi) | λ(pi) = e) →173)
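The conditional table can be generated numerically from the joint probabilities (a Matlab sketch; e1, e2 stand for ε1, ε2):

e1 = 0.1; e2 = 0.2;                % example values
P = [0 e2 e2; e2 1 e1; e2 e1 1];   % joint table up to the common factor p(e,e)
V = -log( P ./ sum(P,2) );         % V(lambda_i | lambda_{i-1}), rows conditioned
% rows eL, eR reproduce ln((1+e1+e2)/e2), ln(1+e1+e2), ln((1+e1+e2)/e1);
% row m gives [Inf log(2) log(2)] - the common log(2) is the subtracted term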
3D Computer Vision: VII. Stereovision (p. 188/211) R. Sara, CMP; rev. 7Jan2014
Programming the Matching Algorithm: 3LDP
given G, construct a directed graph G⁺:
a triple of vertices per node of the s-t path, representing the three hypotheses λ(p) for p
arcs have the costs V(λi | λi−1), nodes have the costs V(D | λi)
the orientation of G⁺ is inherited from the orientation of the s-t paths
⇒ we converted the shortest labeled-path problem to an ordinary shortest-path problem

(figure: the neighborhood of p = (l, r) in G⁺ — each of the nodes (l−1, r), (l, r−1), (l, r), (l, r+1), (l+1, r) carries the three label vertices m, eL, eR; the strong blue edges are of zero penalty)

3D Computer Vision: VII. Stereovision (p. 189/211) R. Sara, CMP; rev. 7Jan2014
contd: Dynamic Programming on G⁺

G⁺ is a topologically ordered directed graph
⇒ we can use dynamic programming on G⁺
(figure: a node q with its two predecessors p1, p2 on the paths from s)

   Vs:q(λq) = min over z ∈ {p1, p2} and λz of { Vs:z(λz) + Vz(D | λz) + V(λq | λz) }

Vs:q(λq) — the cost of the minimal path from s to label λq at node q

the complexity is O(|I| · |J|) per line, i.e. stereo matching on N × N images needs O(N³) time
speedup: limit the range in which the disparities d = l − r are allowed to vary
3D Computer Vision: VII. Stereovision (p. 190/211) R. Sara, CMP; rev. 7Jan2014
Implementation of 3LDP in a few lines of code. . .

#define clamp(x, mi, ma) ((x) < (mi) ? (mi) : ((x) > (ma) ? (ma) : (x)))
#define MAXi(tab,j) clamp((j)+(tab).drange[1], (tab).beg[0], (tab).end[0])
#define MINi(tab,j) clamp((j)+(tab).drange[0], (tab).beg[0], (tab).end[0])

#define ARG_MIN2(Ca, La, C0, L0, C1, L1) \
  if ((C0) < (C1)) { Ca = C0; La = L0; } else { Ca = C1; La = L1; }

#define ARG_MIN3(Ca, La, C0, L0, C1, L1, C2, L2) \
  if ((C0) <= MIN(C1, C2)) { Ca = C0; La = L0; } \
  else if ((C1) < MIN(C0, C2)) { Ca = C1; La = L1; } \
  else { Ca = C2; La = L2; }

/* forward pass: fill the cost tables C_m, C_oL, C_oR and the
   back-pointer tables P_m, P_oL, P_oR (one per label) */
void DP3LForward(MatchingTableT tab) {
  int i = tab.beg[0]; int j = tab.beg[1];
  C_m[j][i-1] = C_m[j-1][i] = MAXDOUBLE;      /* boundary conditions */
  C_oL[j][i-1] = C_oR[j-1][i] = 0.0;
  C_oL[j-1][i] = C_oR[j][i-1] = -penalty[0];

  for (j = tab.beg[1]; j <= tab.end[1]; j++)
    for (i = MINi(tab,j); i <= MAXi(tab,j); i++) {
      ARG_MIN2(C_m[j][i], P_m[j][i],
               C_oR[j-1][i] + penalty[2], lbl_oR,
               C_oL[j][i-1] + penalty[2], lbl_oL);
      C_m[j][i] += 1.0 - tab.MNCC[j][i];      /* node cost of label m */

      ARG_MIN3(C_oL[j][i], P_oL[j][i],
               C_m[j-1][i], lbl_m,
               C_oL[j-1][i] + penalty[0], lbl_oL,
               C_oR[j-1][i] + penalty[1], lbl_oR);
      C_oL[j][i] += penalty[3];

      ARG_MIN3(C_oR[j][i], P_oR[j][i],
               C_m[j][i-1], lbl_m,
               C_oR[j][i-1] + penalty[0], lbl_oR,
               C_oL[j][i-1] + penalty[1], lbl_oL);
      C_oR[j][i] += penalty[3];
    }
}

/* reverse pass: trace the back-pointers and output the disparities D[i] */
void DP3LReverse(double *D, MatchingTableT tab) {
  int i, j; labelT La; double Ca;
  for (i = 0; i < nl; i++) D[i] = nan; /* not-a-number */

  i = tab.end[0]; j = tab.end[1];
  ARG_MIN3(Ca, La, C_m[j][i], lbl_m,
           C_oL[j][i], lbl_oL, C_oR[j][i], lbl_oR);

  while (i >= tab.beg[0] && j >= tab.beg[1] && La > 0)
    switch (La) {
      case lbl_m: D[i] = i-j;                 /* matched: record disparity */
        switch (La = P_m[j][i]) {
          case lbl_oL: i--; break;
          case lbl_oR: j--; break;
          default: Error(...);
        } break;
      case lbl_oL: La = P_oL[j][i]; j--; break;
      case lbl_oR: La = P_oR[j][i]; i--; break;
      default: Error(...);
    }
}

3D Computer Vision: VII. Stereovision (p. 191/211) R. Sara, CMP; rev. 7Jan2014
Some Results: AppleTree

left image — right image — ML (→175)
3LDP (→189) — naïve DP [Cox et al. 1992] — stable segmented 3LDP (see [SP])
3LDP parameters εi, Ve learned on Middlebury stereo data http://vision.middlebury.edu/stereo/

3D Computer Vision: VII. Stereovision (p. 192/211) R. Sara, CMP; rev. 7Jan2014
Some Results: Larch

left image — right image — ML (→175)
3LDP (→189) — naïve DP — stable segmented 3LDP

naïve DP does not model mutual occlusion
but even 3LDP has errors in the mutually occluded region
stable segmented 3LDP has few errors in the mutually occluded region since it uses a weak form of image understanding
3D Computer Vision: VII. Stereovision (p. 193/211) R. Sara, CMP; rev. 7Jan2014
Algorithm Comparison

Winner-Take-All (WTA)
   the ur-algorithm [Marroquin 83] — no model
   dense disparity map
   O(N³) algorithm, simple but it rarely works

Maximum Likelihood (ML)
   semi-dense disparity map
   many small isolated errors
   models basic occlusion
   O(N³ log(N V)) algorithm — max-flow by cost scaling

MAP with Min-Cost Labeled Path (3LDP)
   semi-dense disparity map
   models occlusion in flat, piecewise continuous scenes
   has illusions if ordering does not hold
   O(N³) algorithm

Stable Segmented 3LDP
   better (fewer errors at any given density)
   O(N³ log N) algorithm
   requires image segmentation — itself a difficult task

(plot: ROC curves and their average error rate bounds; density [%] against inaccuracy [%];
3LDP (3.65 ± 0.26), WTA (4.71 ± 0.17), ML (4.60 ± 0.65), GCS (4.29 ± 1.47))
the ROC-like curve captures the density/accuracy tradeoff
GCS is the one used in the exercises
more algorithms at http://vision.middlebury.edu/stereo/ (good luck!)
3D Computer Vision: VII. Stereovision (p. 194/211) R. Sara, CMP; rev. 7Jan2014
Thank You