https://cw.felk.cvut.cz/doku.php/courses/a4m33tdv/
http://cmp.felk.cvut.cz
mailto:sara@cmp.felk.cvut.cz
phone ext. 7203
Programme:
1. how to do things right in 3D vision: a cookbook of effective methods, pitfall avoidance
2. things useful beyond CV: a task-formulation exercise, powerful robust optimization methods
Content:
Elements of projective geometry
Perspective camera
Geometric problems in 3D vision
Epipolar geometry (and motion-induced homography, TBD)
Optimization methods in 3D vision
3D structure and camera motion from (many) images
Stereoscopic vision
Shape from reflectance
Annotated Slides: this is the reference material; it is not static and evolves over the years
for deeper digs into geometry, see the GVG lecture notes at
https://cw.felk.cvut.cz/doku.php/courses/a4m33gvg/start
there is a Czech-English and English-Czech dictionary for this course
Abstract: This paper describes a simple but non-trivial semi-dense stereoscopic matching algorithm that could be taught in Computer Vision courses. The description is meant to be instructive and accessible to the student. The level of detail is sufficient for a student to understand all aspects of the algorithm design and to make his/her own modifications. The paper includes the core parts of the algorithm in C code. We make the point of explaining the algorithm so that all steps are derived from first principles in a clear and lucid way. A simple method is described that helps encourage the student to benchmark his/her own improvements of the algorithm.

Besides the basic knowledge in computer science and algorithms, basics of probability theory and statistics, and introductory-level computer vision or image processing, it is assumed that the reader is familiar with the concept of epipolar geometry and plane homography. The limited extent of this paper required shortening some explanations.

The purpose of this paper is not to give a balanced state-of-the-art overview. The presentation, however, does give references to relevant literature during the exposition of the problem.
I. INTRODUCTION

Teaching stereovision in computer vision classes is a difficult task. There are very many algorithms described in the literature that address various aspects of the problem. A good up-to-date comprehensive review is missing. It is thus very difficult for a student to acquire a balanced overview of the area in a short time. For a non-expert researcher who wants to design and experiment with his/her own algorithm, reading the literature is often a frustrating exercise.

This paper tries to introduce a well selected, simple, and reasonably efficient algorithm. My selection is based on a long-term experience with various matching algorithms and on an experience with teaching a 3D computer vision class. I believe that by studying this algorithm the student is gradually educated about the way researchers think about stereoscopic matching. I tried to address all elementary components of the algorithm design, from occlusion modeling to matching task formulation and benchmarking.

An algorithm suitable for teaching stereo should meet a number of requirements:
Simplicity: The algorithm must be non-trivial but easy to implement.
Accessibility: The algorithm should be described in a complete and accessible form.
Performance: Performance must be reasonably good for the basic implementation to be useful in practice.
Education: The student should exercise formal problem formulation. Every part of the algorithm design must be well justified and derivable from first principles.
Development potential: The algorithm must possess potential for encouraging the student to do further development.
Learnability: There should be a simple recommended method for choosing the algorithm parameters and/or selecting their most suitable values for a given application.

II. THE BASIC STRUCTURE OF STEREOSCOPIC MATCHING

Stereoscopic matching is a method for computing disparity maps from a set of images. In this paper we will only consider pairs of cameras that are not co-located. We will often call them the left and the right camera, respectively. A disparity map codes the shift in location of the corresponding local image features induced by camera displacement. The direction of the shift is given by the epipolar geometry of the two cameras [1] and the magnitude of the shift, called disparity, is approximately inversely proportional to the camera-object distance. Some disparity map examples are shown in Fig. 10. In these maps, disparity is coded by color: close objects are reddish, distant objects bluish, and pixels without disparity are black. Such color-coding makes various kinds of errors clearly visible.

This section formalizes object occlusion so that we could derive useful necessary constraints on disparity map correctness. We will first work with spatial points. The 3D space in front of the two cameras is spanned by two pencils of optical rays, one system per camera. The spatial points can be thought of as the intersections of the optical rays. They will be called possible correspondences. The task of matching is to accept some of the possible correspondences and reject the others. Fig. 1 shows a part of a surface in the scene (black line segment) and a point p on the surface. Suppose p is visible in both cameras (by optical ray r1 in the left camera and by optical ray t1 in the right camera; both cameras are located above the scene). Then every point on ray r1 that lies in front of p must be transparent and each point of r1 behind p must be occluded. A similar observation holds for ray t1. This shows that the decisions on point acceptance and rejection are not independent: if the correspondence given by point p is accepted, then a set of correspondences X(p)

Czech version: http://cmp.felk.cvut.cz/~sara/Teaching/TDV/SP-cz.pdf
English version: http://cmp.felk.cvut.cz/~sara/Teaching/TDV/SP-en.pdf
M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395, 1981.
C. Harris and M. Stephens. A combined corner and edge detector. In Proc ALVEY Vision Conf, pp. 147-151, University of Manchester, England, 1988.
H. Li and R. Hartley. Five-point motion estimation made easy. In Proc ICPR, pp. 630-633, 2006.
B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. A comprehensive survey of bundle adjustment in computer vision. In Proc Vision Algorithms: Theory and Practice, LNCS 1883:298-372. Springer Verlag, 1999.
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, USA, 3rd edition, 1996.
OpenCV (Open Source Computer Vision): a library of programming functions for real-time computer vision. [on-line] http://opencv.org/
remarks and notes are in a small blue font, typically flushed to the right, like this
papers or books are referenced like this: [H&Z, p. 100] or [Golub & van Loan 1996]
except for H&Z or SP, references point to the list on Slide 12
most references are linked (clickable) in the PDF; the triple of icons at the bottom helps you navigate back and forth; they are: back, find, forward
check the references above
each major part of the lecture starts with a slide listing the equivalent written material
slides containing examined material have a bold title with a marker ►, like this slide
► mandatory homework exercises are in a small red font, after a circled asterisk ⊛
⊛ H1; 10pt: syntax: <problem ID in submission system>; <points>: explanation
deadline: Lecture Day + 2 weeks; unit penalty per 7 days; see the Submission System
Perspective Camera
covered by
[H&Z] Secs: 2.1, 2.2, 3.1, 6.1, 6.2, 8.6, 2.5, 7.4, Example: 2.19
3D Computer Vision: II. Perspective Camera (p. 15/211) R. Sara, CMP; rev. 7Jan2014
►Basic Geometric Entities, their Representation, and Notation
►Image Line
the set of equivalence classes of vectors in R^3 \ {(0, 0, 0)} forms the projective space P^2 (a set of rays)
standard representative for a finite line n = (n1, n2, n3) is λ n, where λ = 1/sqrt(n1^2 + n2^2), assuming n1^2 + n2^2 ≠ 0 (the unit is usually 1)
naming convention: a special entity is the Ideal Line (line at infinity)
I may sometimes wrongly use = instead of ≃; help me chase the mistakes down
►Image Point
Point m = (u, v) is incident on the line n = (a, b, c) iff a u + b v + c = 0 (this works both ways!)
standard representative for a finite point m is λ m, where λ = 1/m3, assuming m3 ≠ 0
when the unit is 1, image units are pixels and m = (u, v, 1)
when the unit is f, all components have a similar magnitude (f ≈ image diagonal)
use unit 1 unless you know what you are doing; all entities participating in a formula must be expressed in the same units
naming convention: Ideal Point (point at infinity) m∞ ≃ (m1, m2, 0), a proper member of P^2
all such points lie on the ideal line n∞ ≃ (0, 0, 1), i.e. m∞^T n∞ = 0
►Line Intersection and Point Join
the join of two image points is the line n ≃ m × m′ (dually, the intersection of two image lines is the point m ≃ n × n′)
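Both operations are a single cross product in homogeneous coordinates; a minimal numpy sketch (the function names meet and join are mine, not from the slides):

```python
import numpy as np

def meet(l1, l2):
    """Intersection point of two image lines (homogeneous 3-vectors)."""
    return np.cross(l1, l2)

def join(m1, m2):
    """Line through two image points (homogeneous 3-vectors)."""
    return np.cross(m1, m2)

# the lines u = 1 and v = 2, written as n = (a, b, c) with a*u + b*v + c = 0
l1 = np.array([1.0, 0.0, -1.0])
l2 = np.array([0.0, 1.0, -2.0])
x = meet(l1, l2)           # homogeneous intersection point
x = x / x[2]               # standard representative: (1, 2, 1)

# two parallel lines meet in an ideal point (third coordinate 0)
l3 = np.array([1.0, 0.0, -3.0])   # u = 3, parallel to l1
ideal = meet(l1, l3)
```

Note how the ideal-point case needs no special handling; that is the point of the homogeneous representation.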
►Homography
collinear image points are mapped to collinear image points (lines are mapped to lines)
concurrent image lines are mapped to concurrent image lines (concurrent = intersecting at the same point); the mapping is a bijection!
point-line incidence is preserved
Some Homographic Tasters
illustrations courtesy of AMSL Racing Team, Meiji University and LIBVISO: Library for VISual Odometry
►Mapping Points and Lines by Homography
5. one can use unity for the homogeneous coordinate on one side of the equation only!
Elementary Decomposition of a Homography
Homography Subgroups

group      | DOF | matrix                                                  | invariant properties
similarity |  4  | [s cos φ, -s sin φ, tx; s sin φ, s cos φ, ty; 0, 0, 1]  | all above plus: ratio of lengths, angle, the circular points I = (1, i, 0), J = (1, -i, 0)
Euclidean  |  3  | [cos φ, -sin φ, tx; sin φ, cos φ, ty; 0, 0, 1]          | all above plus: length, area
[figure: canonical perspective camera; right-handed coordinate system (x, y, z) with the center of projection C = O, image point (u, v), principal point xp = (u0, v0), and a spatial point X = (x, y, z)]

u = f x/z + u0,  v = f y/z + v0

in homogeneous coordinates:

[f x + z u0; f y + z v0; z] ≃ [f 0 u0 0; 0 f v0 0; 0 0 1 0] (x, y, z, 1)^T = K P0 X = P X

calibration matrix K transforms the canonical camera P0 = [I | 0] to the standard projective camera P
►Computing with Perspective Camera Projection Matrix

m = (m1, m2, m3)^T ≃ [f 0 u0 0; 0 f v0 0; 0 0 1 0] (x, y, z, 1)^T = (f x + z u0, f y + z v0, z)^T

u = m1/m3 = f x/z + u0,  v = m2/m3 = f y/z + v0,  when m3 ≠ 0

Perspective camera:
1. dimension reduction, since P ∈ R^{3,4}
2. nonlinear unit change 1 ↦ z/f, since m ≃ (x + z u0/f, y + z v0/f, z/f)
   for convenience we use P11 = P22 = f rather than P33 = 1/f, and u0, v0 in relative units
3. m3 = 0 represents points at infinity in the image plane (i.e. z = 0)
►Changing The Outer (World) Reference Frame

⊛ H1; 2pt: Verify this K; hints: u′ e_u′ + v′ e_v′ = u e_u + v e_v (boldface symbols are basis vectors); K maps from an orthogonal system to a skewed system, (w′ u′, w′ v′, w′)^T = K (u, v, 1)^T; first skew, then sampling, then shift by u0, v0. deadline: LD + 2 wk

Representation Theorem: The set of projection matrices P of finite projective cameras is isomorphic to the set of homogeneous 3×4 matrices with the left-hand 3×3 submatrix Q non-singular.
a random finite camera in Matlab: Q=rand(3,3); while det(Q)==0, Q=rand(3,3); end, P=[Q, rand(3,1)];
►Projection Matrix Decomposition

P = [Q q] ≃ K R [I | -C] = K [R t]

1. C = -Q^-1 q (see next)
2. RQ decomposition of Q = K R using three Givens rotations [H&Z, p. 579]:
   K = Q R32 R31 R21,  R = (R32 R31 R21)^T
3. t = -R C

Rij zeroes element ij in Q, affecting only columns i and j, and the sequence preserves previously zeroed elements, e.g.

R32 = [1 0 0; 0 c -s; 0 s c],  c^2 + s^2 = 1,  gives  c = q33/sqrt(q32^2 + q33^2),  s = -q32/sqrt(q32^2 + q33^2)

⊛ P1; 1pt: Multiply known matrices K, R and then decompose back; discuss numerical errors

RQ decomposition non-uniqueness: K R = K T^-1 T R, where T = diag(-1, -1, 1) is also a rotation; we must correct the result so that the diagonal elements of K are all positive
"slim" RQ decomposition; care must be taken to avoid overflow, see [Golub & van Loan 1996, Sec. 5.2]
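The three-Givens RQ step, including the sign correction by T, can be sketched in numpy (my illustration; the function name rq3 is mine). It also performs the round trip asked for in homework P1:

```python
import numpy as np

def rq3(Q):
    """RQ decomposition Q = K @ R by three Givens rotations (cf. [H&Z, p. 579]).
    Assumes det(Q) > 0 (scale a homogeneous P by -1 otherwise), so that K gets a
    positive diagonal and R is a proper rotation."""
    K = np.array(Q, dtype=float)
    R = np.eye(3)
    # zero K[i, j] by mixing column j with the pivot column p
    for i, j, p in [(2, 1, 2), (2, 0, 2), (1, 0, 1)]:
        c, s = K[i, p], -K[i, j]
        r = np.hypot(c, s)
        c, s = c / r, s / r
        G = np.eye(3)                     # Givens rotation in the (j, p) plane
        G[j, j], G[p, p], G[p, j], G[j, p] = c, c, s, -s
        K = K @ G                         # accumulates Q R32 R31 R21
        R = G.T @ R                       # accumulates (R32 R31 R21)^T
    T = np.diag(np.sign(np.diag(K)))      # make the diagonal of K positive
    return K @ T, T @ R                   # T is a rotation when det(Q) > 0

# round trip (homework P1): multiply known K, R, then decompose back
K0 = np.array([[1200.0, 0.5, 320.0], [0, 1100.0, 240.0], [0, 0, 1.0]])
cz, sz, cx, sx = np.cos(0.3), np.sin(0.3), np.cos(0.2), np.sin(0.2)
R0 = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]) @ \
     np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
K, R = rq3(K0 @ R0)
```

With exact inputs the round trip recovers K and R to machine precision; with noisy P the discussion of numerical errors is exactly the point of the homework.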
RQ Decomposition Step (Mathematica)

Q = Array[q, {3, 3}];
R32 = {{1, 0, 0}, {0, c, -s}, {0, s, c}};
R32 // MatrixForm
  [1 0 0; 0 c -s; 0 s c]
Q1 = Q.R32;
s1 = Solve[{Q1[[3, 2]] == 0, c^2 + s^2 == 1}, {c, s}];
s1 = s1[[2]]
s1 // Simplify
  {c -> q[3, 3]/Sqrt[q[3, 2]^2 + q[3, 3]^2], s -> -q[3, 2]/Sqrt[q[3, 2]^2 + q[3, 3]^2]}
►Center of Projection

Theorem: Let there be B ≠ 0 such that P B = 0. Then B is equal to the projection center C (in the world coordinate frame).

Proof.
1. Consider the spatial line through A and B (B is given): X(λ) ≃ A + λ B, λ ∈ R
2. It images to P X(λ) ≃ P A + λ P B = P A
   the whole line images to a single point, so it must pass through the optical center of P
   this holds for all choices of A; the only common point of all such lines is the optical center, i.e. B ≃ C

Hence
0 = P C = [Q q] (C; 1) = Q C + q  ⟹  C = -Q^-1 q
alternatively, C = (cj), where cj = (-1)^j det P^(j), in which P^(j) is P with column j dropped
Matlab: C_homo = null(P); or C = -Q\q;
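The cofactor formula for C can be checked directly (numpy sketch; the function name and the synthetic camera are mine):

```python
import numpy as np

def camera_center(P):
    """Homogeneous projection center C of a 3x4 camera P, so that P @ C = 0:
    c_j = (-1)^j * det of P with column j dropped (j = 1..4)."""
    return np.array([(-1.0) ** (j + 1) * np.linalg.det(np.delete(P, j, axis=1))
                     for j in range(4)])

# a finite camera with known center (2, 1, 3): P = Q [I | -C]
C_true = np.array([2.0, 1.0, 3.0])
Q = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P = Q @ np.hstack([np.eye(3), -C_true[:, None]])

C = camera_center(P)
C = C[:3] / C[3]        # standard representative of the homogeneous center
```

The alternating signs implement a Laplace-style expansion: appending any row of P to P gives a singular 4x4 matrix, which is exactly the statement P C = 0.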
►Optical Ray
the optical ray of image point m: X(λ) = C + λ d with direction d = Q^-1 m, i.e.
X(λ) = C + λ Q^-1 m,  λ ∈ R
►Optical Axis
o = det(Q) q3, where q3^T is the third row of Q (the factor det(Q) fixes the orientation so that o points towards the front of the camera, independently of the overall scale of P)
►Principal Point
Principal point m0: the intersection of the image plane and the optical axis
1. take the point at infinity on the optical axis, (q3; 0); it must project to the principal point m0
2. then m0 ≃ [Q q] (q3; 0) = Q q3
the principal point is also the center of radial distortion (see Slide 50)
►Optical Plane
A spatial plane with normal p passing through the optical center C and a given image line n.
[figure: the optical plane spanned by the optical rays d = Q^-1 m and d′ = Q^-1 m′ of two points m, m′ on the line n]
hence, for every X in the plane: 0 = p^T (X - C) = n^T Q (X - C) = n^T P X = (P^T n)^T X (the last two with X homogeneous; see Slide 28), so the plane normal is p = Q^T n
the intersection of two optical planes is an optical ray:
d = p × p′ = (Q^T n) × (Q^T n′) ≃ Q^-1 (n × n′) = Q^-1 m, where m ≃ n × n′ is the intersection point of the two image lines
►Summary: Optical Center, Ray, Axis, Plane
[summary table; the formulas are on Slides 32-37]
example note: the distance between sleepers is 0.806 m, but we cannot count them; the resolution is too low
►Vanishing Point
Vanishing point: the limit of the projection of a point that moves along a space line infinitely in one direction (the image of the point at infinity on the line)

m∞ ≃ lim_{λ→∞} P (X0 + λ d; 1) ≃ Q d   ⊛ P1; 1pt: Derive or prove this

the V.P. is independent of the line position; it depends only on its orientation
all parallel lines have the same V.P.
the image of the V.P. of a spatial line with direction vector d is m∞ = Q d
V.P. m∞ corresponds to the spatial direction d = Q^-1 m∞ (the optical ray through m∞)
the V.P. is the image of a point at infinity on any line, not just the optical ray as on Slide 33
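Numerically, m∞ ≃ Q d and d ≃ Q^-1 m∞ (a small numpy check; for simplicity I take R = I so that Q = K, and all values are arbitrary):

```python
import numpy as np

Q = np.array([[1000.0, 0, 500], [0, 1000, 300], [0, 0, 1]])  # Q = K R with R = I
d = np.array([1.0, 0.0, 1.0])      # a space direction (not a point)

m = Q @ d                          # homogeneous vanishing point m ~ Q d
m = m / m[2]                       # pixel representative

# all lines parallel to d share this V.P.; recover the direction back:
d_back = np.linalg.solve(Q, m)     # d ~ Q^-1 m (up to scale)
d_back = d_back / np.linalg.norm(d_back)
```

The recovered d_back is the unit direction of the optical ray through the vanishing point, as stated on the slide.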
Some Vanishing Point Applications
[example panels] where is the sun?; what is the wind direction? (must have video); fly above the lane, at constant altitude!
►Vanishing Line
the vanishing line n∞ of a plane joins the vanishing points of lines in that plane: n∞ ≃ v1 × v2; it determines the plane normal (Q^T n∞, cf. Slide 36)
►Cross Ratio
Corollaries:
cross ratio is invariant under collineations (homographies) x′ ≃ H x (plug H x into (1))
cross ratio is invariant under perspective projection: [R S T U] = [r s t u]
4 collinear points: any perspective camera will see the same cross-ratio of their images; we measure the same cross-ratio in the image as on the world line
one of the points R, S, T, U may be at infinity
►1D Projective Coordinates

[P] = [P P0 PI P∞] = [p p0 pI p∞] = (|p - pI| / |p0 - pI|) · (|p0 - p∞| / |p - p∞|) = [p]

naming convention:
P0 … the origin, [P0] = 0
PI … the unit point, [PI] = 1
P∞ … the supporting point, [P∞] = ±∞

[P] is equal to the Euclidean coordinate along the world line N; [p] is its measurement in the image plane

Applications
Given the image of a line N, the origin, the unit point, and the vanishing point, the Euclidean coordinate of any point P ∈ N can be determined (see Slide 45)
Finding the v.p. of a line through a regular object (see Slide 46)
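The invariance behind these applications can be checked numerically (plain Python; the Mobius map h below is an arbitrary example of a 1D homography, chosen by me):

```python
def cross_ratio(a, b, c, d):
    """Cross ratio [a b c d] = ((a - c)(b - d)) / ((b - c)(a - d)) of four
    collinear points given by their coordinates on the line."""
    return ((a - c) * (b - d)) / ((b - c) * (a - d))

def h(x):
    # an arbitrary 1D homography (Mobius map), for illustration only
    return (2.0 * x + 1.0) / (x + 3.0)

pts = [0.0, 1.0, 2.0, 4.0]
cr_world = cross_ratio(*pts)                  # cross ratio on the "world" line
cr_image = cross_ratio(*[h(x) for x in pts])  # cross ratio after the homography
```

Whatever 1D homography is applied, the two values agree; this is exactly what lets us read Euclidean coordinates off the image once p0, pI, and p∞ are identified.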
Application: Counting Steps
[figure: steps counted along the supporting line using 1D projective coordinates; labeled points p0, pI, p, p∞]
Application: Finding the Horizon from Repetitions

[figure: a repeated pattern on a plane; image points p0, pI, p of the world points P0, PI, P, and the horizon point p∞]

in 3D: if |P0 P| = 2 |P0 PI|, then [H&Z, p. 218]

[P P0 PI P∞] = |P0 P| / |P0 PI| = 2

expressing the same cross-ratio in the image, (|p - pI| / |p0 - pI|) · (|p0 - p∞| / |p - p∞|) = 2, can be solved for the horizon point p∞

⊛ P1; 1pt: How high is the camera above the floor?

could be applied to counting steps (Slide 45)
Homework Problem

[figure: buildings A and B with tops tA, tB and feet fA, fB; the horizon; a point m; the line h; the unknown height]

Hints
1. what are the properties of the line h connecting the top tB of building B with the point m at which the horizon is intersected by the line p joining the feet fA, fB of both buildings? [1 point]
2. how do we actually get the horizon n∞? [1 point] (we do not see it directly; there are hills there)
3. what tool measures the length ratio? [formula = 1 point]
2D Projective Coordinates
[figure: 2D projective coordinates of a point p from the origin p0, the unit points pxI, pyI, and the vanishing points px∞, py∞ along the two coordinate lines]
Application: Measuring on the Floor (Wall, etc)
►Real Camera with Radial Distortion
[figure triptych: an image with no radial distortion; an extreme case of radial distortion; the image undistorted by the division model]
[figure: distortion types]
►The Radial Distortion Mapping
y0 … center of radial distortion (usually the principal point)
yL … linearly projected point
yR … radially distorted point
radial distortion r maps yL to yR along the radial direction
the magnitude of the transfer depends only on the radius ||yL - y0||
►Radial Distortion Models
let z = y - y0 (non-homogeneous); we have zR = r(zL); zL … linear, zR … distorted
but we are often interested in the inverse, yL = r^-1(yR)
yn … a no-distortion point on the circle Cn: r(yn) = yn; zn = yn - y0
choose yn as a boundary point that preserves all image content within the same image size
[figure: barrel distortion; the arrows represent zR - r(zL), in pincushion and in barrel distortion]

Division Model
zL = zR / (1 - ||zR||^2/||zn||^2),  inverted by  zR = 2 zL / (1 + sqrt(1 + 4 ||zL||^2/||zn||^2))

Polynomial Model
zL = zR · D(zR; zn, k) / (1 + Σ_{i=1..n} ki),  where  D(zR; zn, k) = 1 + k1 ρ^2 + k2 ρ^4 + … + kn ρ^{2n},  ρ = ||zR|| / ||zn||,  k = (ki)
e.g. ki ≥ 0 … barrel distortion, ki ≤ 0 … pincushion distortion, i = 1, …, n
better fit for n ≥ 3
no analytic inverse
may lose monotonicity unless all ki are required to have equal signs
hard to calibrate (tends to overfit)
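A minimal sketch of the polynomial model under the normalization D/(1 + Σ ki) that keeps zn fixed (my reading of the model above; the function name and the numbers are mine):

```python
import numpy as np

def undistort_poly(zR, zn, k):
    """Polynomial radial model zL = zR * D(rho) / (1 + sum(k)), rho = |zR|/|zn|.
    Coordinates are relative to the distortion center y0; the no-distortion
    radius |zn| maps to itself by construction."""
    rho = np.linalg.norm(zR) / np.linalg.norm(zn)
    D = 1.0 + sum(ki * rho ** (2 * (i + 1)) for i, ki in enumerate(k))
    return zR * D / (1.0 + sum(k))

zn = np.array([400.0, 300.0])       # boundary point preserved by the model
k = [0.1]                           # k_i >= 0: barrel distortion
z_mid = np.array([200.0, 150.0])    # a point halfway to the boundary
zL = undistort_poly(z_mid, zn, k)
```

The fixed point at zn is what the normalization buys: the image content at the chosen boundary radius stays inside the same image size.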
►Real and Linear Camera Models
linear camera: xL ≃ A K0 [R t] X
real camera: xR ≃ A r(K0 [R t] X)
undistortion: xL = A r^-1(A^-1 xR)
(the pipeline: perspective projection, then distortion, then scanning)

K0 = [f 0 0; 0 f 0; 0 0 1] … ideal calibration matrix
A = [1 s u0; 0 a v0; 0 0 1] … everything affecting the radial distortion center: skew, aspect ratio
A K0 = [f, s f, u0; 0, a f, v0; 0, 0, 1]
r … radial distortion function (here it includes the conversion from/to the homogeneous representation!)

Notes
assumption: the principal point and the center of radial distortion coincide
f is included in K0 to make the radial distortion independent of the focal length
A makes radial lens distortion an elliptic image distortion
it suffices in practice that r^-1 is an analytic function (r need not be)
Calibrating Radial Distortion
radial distortion calibration includes at least 5 parameters: k, u0, v0, s, a
4. minimize the total radial transfer error while preserving yn:
   arg min_{k, u0, v0, s, a} Σ_{i=1..n} ei^2
Example Calibrations
calibrated parameters (one example): a = 1.0951, u0 = 243.26 px, v0 = 353.37 px; polynomial model: k1 = 0.8256, k2 = 0.2261, k3 = 1.2524
[plots: histogram of the error in pixels (% of data points vs. error, 0-2 px); polynomial-model coefficient values and error [px] versus the focal length from EXIF data [mm]]
radial distortion is slightly dependent on the focal length
Part III: Computing with a Single Camera
covered by
[1] [H&Z] Secs: 8.6, 7.1, 22.1
[2] Fischler, M. A. and Bolles, R. C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 24(6):381-395, 1981
[3] [Golub & van Loan 1996, Sec. 2.5]
Obtaining Vanishing Points and Lines
vanishing line can be obtained without vanishing points (see Slide 46)
►Camera Calibration from Vanishing Points and Lines
di = Q^-1 vi, i = 1, 2, 3 (Slide 40)
pij = Q^T nij, i, j = 1, 2, 3, i ≠ j (Slide 36)
Constraints
1. orthogonal rays d1 ⊥ d2 in space; then
   0 = d1^T d2 ≃ v1^T Q^-T Q^-1 v2 = v1^T (K K^T)^-1 v2 = v1^T ω v2, where ω = (K K^T)^-1
2. orthogonal planes pij ⊥ pik in space give analogous constraints
nij may be constructed from non-orthogonal vi and vj, e.g. using the cross-ratio
ω is a symmetric, positive definite 3×3 matrix (IAC = Image of the Absolute Conic)
►contd.
these are homogeneous linear equations in the 5 parameters of ω, of the form D w = 0
the unknown scale can be eliminated from (4)
we will come to solving overdetermined homogeneous equations later
we need at least 5 constraints for the full symmetric 3×3 ω
we get K from ω^-1 = K K^T by Cholesky decomposition
the decomposition returns a positive definite upper triangular matrix; one avoids solving a set of quadratic equations for the parameters in K
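The Cholesky step can be sketched in numpy (my illustration; numpy's cholesky returns a lower-triangular factor, so the matrix is flipped with an exchange matrix to obtain an upper-triangular K):

```python
import numpy as np

def k_from_omega(omega):
    """Recover the upper-triangular K (normalized to K[2,2] = 1) from
    omega = (K K^T)^-1, via a Cholesky factorization."""
    A = np.linalg.inv(omega)            # A = K K^T
    J = np.fliplr(np.eye(3))            # exchange (flip) matrix
    L = np.linalg.cholesky(J @ A @ J)   # lower-triangular factor of the flipped A
    K = J @ L @ J                       # upper-triangular, satisfies K K^T = A
    return K / K[2, 2]

# round trip: build omega from a known K and recover it
K_true = np.array([[1000.0, 2.0, 500.0], [0, 950.0, 300.0], [0, 0, 1.0]])
omega = np.linalg.inv(K_true @ K_true.T)
K_rec = k_from_omega(omega)
```

Because the Cholesky factor with a positive diagonal is unique, the recovered K matches the original exactly (up to rounding); no quadratic equations need to be solved.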
In[1]:= K = {{f, s, u0}, {0, a f, v0}, {0, 0, 1}};
K // MatrixForm
Out[2]//MatrixForm=
  [f, s, u0; 0, a f, v0; 0, 0, 1]
In[4]:= ω = Inverse[K.Transpose[K]] Det[K]^2;
ω // Simplify // MatrixForm
Out[5]//MatrixForm=
  [a^2 f^2,               -a f s,                  -a f (a f u0 - s v0);
   -a f s,                f^2 + s^2,               a f s u0 - (f^2 + s^2) v0;
   -a f (a f u0 - s v0),  a f s u0 - (f^2 + s^2) v0,  a^2 f^2 (f^2 + u0^2) - 2 a f s u0 v0 + (f^2 + s^2) v0^2]

Ex 1:
Assuming ORUA (s = 0, a = 1) and known m0 = (u0, v0), two finite orthogonal vanishing points give f:
f^2 = -(v1 - m0)^T (v2 - m0)
in this formula, v1, v2, m0 are not homogeneous!

Ex 2:
Non-orthogonal vanishing points vi, vj with known angle φ:
cos φ = (vi^T ω vj) / (sqrt(vi^T ω vi) · sqrt(vj^T ω vj))
►Camera Orientation from Two Finite Vanishing Points
Problem: Given K and two vanishing points corresponding to two known orthogonal directions d1, d2, compute the camera orientation R with respect to the plane.
coordinate system choice, e.g.: d1, d2 along the first two world axes
we know that
di ≃ Q^-1 vi = (K R)^-1 vi = R^-1 K^-1 vi = R^-1 wi,  with  wi := K^-1 vi,  i.e.  R di ≃ wi
then wi/||wi|| is the i-th column ri of R
the third column is orthogonal to them: r3 = r1 × r2
R = [w1/||w1||, w2/||w2||, (w1 × w2)/||w1 × w2||]
[figure: some suitable scenes]
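The column-by-column construction of R can be sketched in numpy (my illustration; it assumes the world axes are chosen along d1 = (1,0,0) and d2 = (0,1,0), and that the homogeneous scales of v1, v2 are positive):

```python
import numpy as np

def rotation_from_two_vps(K, v1, v2):
    """Camera orientation R from the images v1, v2 (homogeneous) of two
    orthogonal space directions: R d_i ~ w_i with w_i = K^-1 v_i, and
    the third column completed as r3 = r1 x r2."""
    w1 = np.linalg.solve(K, v1)
    w2 = np.linalg.solve(K, v2)
    r1 = w1 / np.linalg.norm(w1)
    r2 = w2 / np.linalg.norm(w2)
    r3 = np.cross(r1, r2)
    return np.column_stack([r1, r2, r3 / np.linalg.norm(r3)])

# synthetic check: the columns of R0 are the world axis directions in camera coords
K = np.array([[900.0, 0, 400], [0, 900, 300], [0, 0, 1]])
c, s = np.cos(0.4), np.sin(0.4)
c2, s2 = np.cos(0.2), np.sin(0.2)
R0 = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]) @ \
     np.array([[1, 0, 0], [0, c2, -s2], [0, s2, c2]])
v1, v2 = K @ R0[:, 0], K @ R0[:, 1]     # vanishing points of d1, d2
R = rotation_from_two_vps(K, v1, v2)
```

With noisy vanishing points r1 and r2 would not be exactly orthogonal; renormalizing via the cross product, as above, is the simple fix the slide's formula uses.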
Application: Planar Rectification
m ≃ K R [I | -C] X (the original camera);  m′ ≃ K [I | -C] X (a rotation-compensated camera with the same center)
►Camera Resection
Camera calibration and orientation from a known set of k ≥ 6 reference points and their images {(Xi, mi)}, i = 1, …, k.

Xi is considered exact; mi is a measurement with error ei^2 = ||mi - m̂i||^2, where m̂i ≃ P Xi

λi mi = P Xi,  P = [q1^T q14; q2^T q24; q3^T q34],  Xi = (xi, yi, zi, 1),  mi = (ui, vi, 1),  λi ∈ R, λi ≠ 0,  i = 1, 2, …, k (k = 6)
(easy to modify for infinite points Xi)

expanded:  λi ui = q1^T Xi + q14,  λi vi = q2^T Xi + q24,  λi = q3^T Xi + q34

Then, eliminating λi:

A q = [ X1^T  0^T  -u1 X1^T
        0^T   X1^T -v1 X1^T
        …
        Xk^T  0^T  -uk Xk^T
        0^T   Xk^T -vk Xk^T ] · (q1; q14; q2; q24; q3; q34) = 0        (8)

we need 11 independent parameters for P
A ∈ R^{2k,12}, q ∈ R^12
6 points in a general position give rank A = 12 (the measurements are noisy) and there is no non-trivial null space
drop one row to get a rank-11 matrix; then the basis of the null space of A gives q
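A noiseless sketch of assembling and solving (8) in numpy (with exact measurements rank A = 11 and the SVD null vector recovers P up to scale; the function name and the synthetic camera are mine):

```python
import numpy as np

def resect_dlt(X, m):
    """Camera resection by DLT: two rows of (8) per point, then q = the right
    null-space vector of A (last right singular vector), reshaped into P."""
    rows = []
    for (x, y, z), (u, v) in zip(X, m):
        Xh = [x, y, z, 1.0]
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)          # P is determined up to scale

# synthetic camera and k = 6 points in general (non-coplanar) position
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P0 = K @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [0.3]])])
X = np.array([[0, 0, 5], [1, 0, 6], [0, 1, 7],
              [1, 1, 5], [-1, 0.5, 8], [0.3, -0.7, 6.5]], dtype=float)
Xh = np.hstack([X, np.ones((6, 1))])
m = Xh @ P0.T
m = m[:, :2] / m[:, 2:]                  # exact image measurements (u, v)

P = resect_dlt(X, m)
P = P / np.linalg.norm(P) * np.sign(P[2, 3])       # fix scale and sign
P0n = P0 / np.linalg.norm(P0) * np.sign(P0[2, 3])  # for comparison
```

With real, noisy measurements all 12 rows are kept only after dropping one (or the jack-knife below is used); here the exact data make the null space one-dimensional directly.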
►The Jack-Knife Solution for k = 6
given the 6 correspondences, we have 12 equations for the 11 parameters
can we use all the information present in the 6 points?

Jack-knife estimation
1. n := 0
2. for i = 1, 2, …, 2k do
   a. delete the i-th row from A; this gives Ai
   b. if dim null Ai > 1, continue with the next i
   c. n := n + 1
   d. compute the right null-space vector qi of Ai (e.g. by economy-size SVD)
   e. qi := qi normalized by q11 and dimension-reduced (assuming a finite camera with P(3,3) = 1)
3. from all n vectors qi collected in Step 2d compute

   q̄ = (1/n) Σ_{i=1..n} qi,   var[q] = ((n-1)/n) Σ_{i=1..n} diag((qi - q̄)(qi - q̄)^T)   (regular for n ≥ 11)
►Degenerate (Critical) Configurations for Camera Resection
Let X = {Xi; i = 1, …} be a set of points and P1 ≄ P2 be two regular (rank-3) cameras. Then the two configurations (P1, X) and (P2, X) are image-equivalent if
P1 Xi ≃ P2 Xi for all Xi ∈ X
if all calibration points Xi ∈ X lie on a plane κ, then camera resection is non-unique and all image-equivalent camera centers lie on a spatial line C (with the intersection C ∩ κ excluded)
this also means we cannot resect if all Xi are infinite
by adding points Xi ∈ X to C we gain nothing
there are additional image-equivalent configurations, see Case 4 next
see the proof sketch in the notes or in [H&Z, Sec. 22.1.2]
Note that if Q, T are suitable homographies then P1 ≃ Q P0 T, where P0 is canonical, and the analysis can be made with
P0 (T Xi) ≃ P2′ (T Xi), i.e. with the transformed points Yi := T Xi ∈ Y
contd. (all cases)
cameras C1, C2 co-located at a point C: the critical sets are points on three optical rays, or on one optical ray and one optical plane
Case 5: we see 3 isolated point images; Case 6: we see a line of points and an isolated point
cameras lie on a line C \ {C′, C″}
Calibrated camera rotation and translation from perspective images of 3 reference points (P3P)

Problem: Given K and three corresponding pairs {(mi, Xi)}, i = 1, 2, 3, find R, C by solving
λi mi = K R (Xi - C),  i = 1, 2, 3

1. Transform vi := K^-1 mi. Then
   λi vi = R (Xi - C).        (9)
2. Eliminate R using the fact that rotation preserves length, ||R x|| = ||x||:
   |λi| ||vi|| = ||Xi - C|| =: zi        (10)
3. Consider only the angles among the vi and apply the Cosine Law per triangle (C, Xi, Xj), i, j = 1, 2, 3, i ≠ j:
   dij^2 = zi^2 + zj^2 - 2 zi zj cij,   zi = ||Xi - C||,  dij = ||Xj - Xi||,  cij = cos(angle(vi, vj))
4. Solve the system of 3 quadratic equations in the 3 unknowns zi [Fischler & Bolles, 1981]
   there may be no real root; there are up to 4 solutions that cannot be ignored (verify on additional points)
5. Compute C by trilateration (3-sphere intersection) from Xi and zi; then λi from (10) and R from (9)

Similar problems (P4P with unknown f) at http://cmp.felk.cvut.cz/minimal/ (with code)
Degenerate (Critical) Configurations for Exterior Orientation
unstable solution: the center of projection C is located on the orthogonal circular cylinder with base circumscribing the three points Xi; unstable means a small change of Xi results in a large change of C; this can be detected by error propagation
degenerate: camera C is coplanar with the points (X1, X2, X3) but is not on the circumscribed circle of (X1, X2, X3); we see a line of points; no solution
Part IV: Computing with a Camera Pair
covered by
[1] [H&Z] Secs: 9.1, 9.2, 9.6, 11.1, 11.2, 11.9, 12.2, 12.3, 12.5.1
[2] H. Li and R. Hartley. Five-point motion estimation made easy. In Proc ICPR 2006, pp. 630-633
additional references
H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293(5828):133-135, 1981.
►Geometric Model of a Camera Pair
Epipolar geometry:
brings constraints necessary for inter-image matching
its parametric form encapsulates information about the relative pose of the two cameras

Description ([figure: two-camera setup])
baseline b joins the projection centers C1, C2: b = C2 - C1
epipole ei ∈ πi is the image of the other center Cj: e1 ≃ P1 C2, e2 ≃ P2 C1
li ∈ πi is the image of the epipolar plane ε = (C2, X, C1)
lj is the epipolar line in image j induced by mi in image i
Epipolar Geometry Example: Forward Motion
[figure: two images of the same scene with numbered correspondences 1-18]
red: correspondences (click to see their IDs)
green: epipolar line pairs per correspondence (same ID in both images)
[the camera moves forward, from the pose of image 2 to the pose of image 1]
►Cross Products and Maps by Skew-Symmetric 3×3 Matrices
4. [b]_x b = 0
5. rank [b]_x = 2 iff ||b|| > 0 (check the minors of [b]_x)
6. for any regular B: B^T [B z]_x B = det(B) [z]_x (follows from the factoring on Slide 36)
7. special case: if R R^T = I then [R b]_x = R [b]_x R^T
8. if Rb is a rotation about b then [Rb b]_x = [b]_x
9. note: [b]_x is not a homography; it is singular
10. if n is a line and A is skew-symmetric, then A n is a point incident to n (since n^T A n = 0)
►Expressing Epipolar Constraint Algebraically

Pi = [Qi qi] = Ki [Ri ti],  i = 1, 2
R21 … relative camera rotation, R21 = R2 R1^T
t21 … relative camera translation, t21 = t2 - R21 t1 = -R2 b
remember: C = -Q^-1 q = -R^T t (Slides 30 and 32)

0 = d2^T p ≃ (Q2^-1 m2)^T Q1^T l1 = m2^T Q2^-T Q1^T (e1 × m1) = m2^T Q2^-T Q1^T [e1]_x m1
(d2 … direction of the optical ray of m2; p = Q1^T l1 … normal of the optical plane of l1; l1 = e1 × m1 … epipolar line, the image of the plane in π1; F = Q2^-T Q1^T [e1]_x … fundamental matrix)

e1 ≃ P1 C2 = Q1 C2 + q1 = Q1 C2 - Q1 C1 = K1 R1 b

F = Q2^-T Q1^T [e1]_x = Q2^-T Q1^T [K1 R1 b]_x ≃ K2^-T [t21]_x R21 K1^-1   (Slide 76)
where E = [t21]_x R21 is the essential matrix

E = [t21]_x R21 = [-R2 b]_x R21 = R21 [-R1 b]_x
(the baseline expressed in Camera 2, resp. in Camera 1 coordinates)
►Key Properties of Fundamental Matrix
F = K2^-T [t21]_x R21 K1^-1, where E = [t21]_x R21 is the essential matrix
ISome Mappings by the Fundamental Matrix
0 = m2^T F m1
e1 ≃ null(F),   e2 ≃ null(F^T)
l2 = F m1,   l1 = F^T m2
l2 = F [e1]× l1,   l1 = F^T [e2]× l2

proof of l2 ≃ F [e1]× l1 (line/point transmutation): choose the auxiliary point x = k × l1 on l1 with k = e1; then
l2 ≃ F x ≃ F (k × l1) = F [k]× l1 = F [e1]× l1
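These mappings are easy to verify numerically; a sketch with a made-up camera pair (all values are illustrative assumptions):

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

# illustrative pair: P1 = K[I|0], P2 = K[R21|t21]
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
a = 0.1
R21 = np.array([[np.cos(a), 0., np.sin(a)],
                [0., 1., 0.],
                [-np.sin(a), 0., np.cos(a)]])
t21 = np.array([-1., 0.1, 0.2])
F = np.linalg.inv(K).T @ skew(t21) @ R21 @ np.linalg.inv(K)

# epipoles from the null spaces
e1 = np.linalg.svd(F)[2][-1]              # F e1 = 0
e2 = np.linalg.svd(F.T)[2][-1]            # F^T e2 = 0

# l2 = F m1 passes through both m2 and the epipole e2
X = np.array([0.5, 0.4, 6., 1.])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R21, t21[:, None]])
m1, m2 = P1 @ X, P2 @ X
l2 = F @ m1
print(abs(m2 @ l2), abs(e2 @ l2))         # both numerically zero
```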
3D Computer Vision: IV. Computing with a Camera Pair (p. 79/211) R. Sara, CMP; rev. 7Jan2014
ISVD Decomposition of 3 3 Skew-Symmetric Matrices
3D Computer Vision: IV. Computing with a Camera Pair (p. 80/211) R. Sara, CMP; rev. 7Jan2014
IRepresentation Theorem for Essential Matrices
Theorem
Let E be a 3×3 matrix with SVD E = U D V^T. Then E is essential iff D ≃ diag(1, 1, 0).

Converse:
Let E = U D V^T be the SVD of E with D = diag(1, 1, 0). We can write E = U D W W^T V^T for some orthogonal W that makes S = D W skew-symmetric. The only such W are

W = [ 0  α  0 ;  −α  0  0 ;  0  0  β ]   with |α| = |β| = 1,

hence   S = D W = α [ 0  1  0 ;  −1  0  0 ;  0  0  0 ] = [s]×   with s ≃ (0, 0, 1),

and   E = (U S U^T)(U W^T V^T) = [t]× R   has the form of an essential matrix.  □
(the first factor is [t]×, the second is R)

note: U S U^T = U [s]× U^T ≃ [U s]× = [u3]×, where u3 is the last column of U.   (Slide 76)
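A quick numerical check of the theorem: the singular values of any [t]× R come out as (σ, σ, 0). The rotation and translation below are arbitrary illustrative choices:

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

# an arbitrary rotation (about the z-axis here) and translation
a = 0.3
R = np.array([[np.cos(a), -np.sin(a), 0.],
              [np.sin(a), np.cos(a), 0.],
              [0., 0., 1.]])
t = np.array([2., -1., 0.5])
E = skew(t) @ R                      # essential matrix by construction

d = np.linalg.svd(E, compute_uv=False)
print(d / d[0])                      # approximately [1, 1, 0]
```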
3D Computer Vision: IV. Computing with a Camera Pair (p. 81/211) R. Sara, CMP; rev. 7Jan2014
IEssential Matrix Decomposition
4 solution sets for the 4 sign combinations of α, β; see the next slide for the geometric interpretation
3D Computer Vision: IV. Computing with a Camera Pair (p. 82/211) R. Sara, CMP; rev. 7Jan2014
IFour Solutions to Essential Matrix Decomposition
Transform the world coordinate system so that the origin is in Camera 2; then t21 is (up to sign) the baseline b, and W rotates about the baseline b.   (Slide 77)

[Figure: the four solutions, combining baseline reversal with the twisted pair (rotation by W about the baseline)]

When is F not uniquely determined from any number of correspondences? [H&Z, Sec. 11.9]
1. when the images are related by a homography
   a. camera centers coincide, C1 = C2:  H = K2 R21 K1^{−1}
   b. the camera moves but all 3D points lie in a plane (n, d):  H = K2 (R21 − t21 n^T/d) K1^{−1}
notes
estimation of E can deal with planes: [s]× H = [s]× (R21 − t21 n^T/d) has equal eigenvalues iff s = t21; then the decomposition works (nonunique, as before)   ~ P1; 1pt for a proof
a complete treatment with additional degenerate configurations is in [H&Z, Sec. 22.2]
a stronger epipolar constraint could reject some configurations
3D Computer Vision: IV. Computing with a Camera Pair (p. 85/211) R. Sara, CMP; rev. 7Jan2014
A Note on Oriented Epipolar Constraint
a tighter epipolar constraint preserves orientations
requires all points and cameras be on the same side of the plane at infinity
[Figure: two-camera setup with X, rays d1, d2, epipolar lines l1, l2, baseline b, epipoles e1, e2]

e2 × m2 ≃+ F m1
notation: m ≃+ n means m = λ n for some λ > 0
note that the constraint is not invariant to a change of sign of either mi
all 7 correspondences in the 7-point algorithm must have the same sign   (see later)
this may help reject some wrong matches, see Slide 108 [Chum et al. 2004]
an even tighter constraint: scene points in front of both cameras (expensive)
this is called the chirality constraint
3D Computer Vision: IV. Computing with a Camera Pair (p. 86/211) R. Sara, CMP; rev. 7Jan2014
I5-Point Algorithm for Relative Camera Orientation
Problem: Given five corresponding image points {(mi, m′i)}_{i=1}^{5} and the calibration matrix K, recover the camera motion R, t.
Obs:
1. E has 9 entries, i.e. 8 numbers up to scale
2. R has 3 DOF; of t we can recover the direction only (2 DOF), 5 DOF in total ⇒ we need 3 constraints on E
3. E is essential iff it has two equal singular values and the third is zero
when all 3D points lie on a plane: at most 2 solutions (twisted-pair) can be disambiguated in 3 views
or by chirality constraint (Slide 83) unless all 3D points are closer to one camera
6-point problem for unknown f [Kukelova et al. BMVC 2008]
resources at http://cmp.felk.cvut.cz/minimal/5_pt_relative.php
3D Computer Vision: IV. Computing with a Camera Pair (p. 87/211) R. Sara, CMP; rev. 7Jan2014
IThe Triangulation Problem
Problem: Given cameras P1, P2 and a correspondence x ↔ y, compute a 3D point X projecting to x and y.
λ1 x = P1 X,   λ2 y = P2 X,   x = (u¹, v¹, 1),   y = (u², v², 1),   Pi = [ (p_1^i)^T ; (p_2^i)^T ; (p_3^i)^T ]

Linear triangulation method
eliminate the unknown scales λ1, λ2:
u¹ (p_3¹)^T X = (p_1¹)^T X,    u² (p_3²)^T X = (p_1²)^T X,
v¹ (p_3¹)^T X = (p_2¹)^T X,    v² (p_3²)^T X = (p_2²)^T X,

which gives

D X = 0,   D = [ u¹ (p_3¹)^T − (p_1¹)^T ;
                 v¹ (p_3¹)^T − (p_2¹)^T ;
                 u² (p_3²)^T − (p_1²)^T ;
                 v² (p_3²)^T − (p_2²)^T ],   D ∈ R^{4,4},  X ∈ R^4      (14)
back-projected rays will generally not intersect due to image error, see next
using the Jack-knife (Slide 67) is not recommended: it is too sensitive to small errors
we will use SVD (Slide 89)
but the result will not be invariant to the projective frame:
replacing P1 ↦ P1 H, P2 ↦ P2 H does not always result in X ↦ H^{−1} X
the homogeneous form in (14) can represent points at infinity
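The linear method of eq. (14), including the column conditioning discussed later (Slide 91), can be sketched as follows; the cameras and the ground-truth point are illustrative assumptions:

```python
import numpy as np

def triangulate(P1, P2, x, y):
    """Linear (DLT) triangulation: stack the four equations of (14)
    and take the right null vector of the conditioned D."""
    u1, v1 = x
    u2, v2 = y
    D = np.vstack([u1 * P1[2] - P1[0],
                   v1 * P1[2] - P1[1],
                   u2 * P2[2] - P2[0],
                   v2 * P2[2] - P2[1]])
    # conditioning: scale columns to unit maximum absolute value
    s = 1.0 / np.maximum(np.abs(D).max(axis=0), 1.0)
    _, _, Vt = np.linalg.svd(D * s)
    return s * Vt[-1]                 # undo the conditioning: X = S q

# illustrative camera pair and ground-truth point
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.], [0.], [0.]])])
Xt = np.array([0.2, -0.1, 4., 1.])

m1, m2 = P1 @ Xt, P2 @ Xt
X = triangulate(P1, P2, m1[:2] / m1[2], m2[:2] / m2[2])
X = X / X[3]
print(X)        # close to the ground truth (0.2, -0.1, 4, 1)
```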
3D Computer Vision: IV. Computing with a Camera Pair (p. 88/211) R. Sara, CMP; rev. 7Jan2014
IThe Least-Squares Triangulation by SVD
Proof.

q^T Q q = Σ_{j=1}^{4} σj² q^T uj uj^T q = Σ_{j=1}^{4} σj² (uj^T q)²

q^T Q q = σ4² + Σ_{j=1}^{3} (σj² − σ4²) (uj^T q)²  ≥  σ4²
□
3D Computer Vision: IV. Computing with a Camera Pair (p. 89/211) R. Sara, CMP; rev. 7Jan2014
Icontd
~ P1; 2pt: Why did we decompose D and not Q = D> D? Could we use QR decomposition
instead of SVD?
3D Computer Vision: IV. Computing with a Camera Pair (p. 90/211) R. Sara, CMP; rev. 7Jan2014
INumerical Conditioning
The equation DX = 0 in (14) may be ill-conditioned for
numerical computation, which results in a poor estimate for X.
1. condition the system:  0 = D q = (D S)(S^{−1} q) = D̄ q̄
   choose S to make the entries of D̄ all smaller than unity in absolute value, e.g.
   S = diag(10⁻³, 10⁻³, 10⁻³, 10⁻⁶)   or, in Matlab,   S = diag(1./max(max(abs(D)),1))
2. solve for q̄ as before
3. get the final solution as q = S q̄
when SVD is used in camera resection, conditioning is essential for success Slide 66
3D Computer Vision: IV. Computing with a Camera Pair (p. 91/211) R. Sara, CMP; rev. 7Jan2014
Algebraic Error vs Reprojection Error
algebraic error: from SVD   (Slide 90)

ε² = σ4² = Σ_{c=1}^{2} [ ( u^c (p_3^c)^T X − (p_1^c)^T X )² + ( v^c (p_3^c)^T X − (p_2^c)^T X )² ]

reprojection error:

e² = Σ_{c=1}^{2} [ ( u^c − (p_1^c)^T X / (p_3^c)^T X )² + ( v^c − (p_2^c)^T X / (p_3^c)^T X )² ]
algebraic error zero ⇔ reprojection error zero ⇔ σ4 = 0 ⇔ non-trivial null space ⇔ epipolar constraint satisfied; in that case the two minimizations give equivalent results
in general: minimizing algebraic error cheap but it gives inferior results
minimizing reprojection error expensive but it gives good results
the midpoint of the common perpendicular to both optical rays gives about 50% greater error in 3D
the gold standard method deferred to Slide 103
Ex (forward camera motion): error f/50 in image 2, orthogonal to the epipolar plane
X_T … noiseless ground-truth position
X_r … reprojection error minimizer
X_a … algebraic error minimizer
m … measurement (m_T with noise in v²)
3D Computer Vision: IV. Computing with a Camera Pair (p. 92/211) R. Sara, CMP; rev. 7Jan2014
Optimal Triangulation for the Geeks
detected image points x, y do not satisfy the epipolar geometry exactly
as a result, the optical rays do not intersect in space; we must first correct the image points to x̂, ŷ

[Figure: point X, epipolar lines l1, l2, detected points x, y and corrected points x̂, ŷ]

1. given epipolar lines l1 and l2, l2 ≃ F [e1]× l1, let x̂, ŷ be the closest points on l1, l2
2. parameterize all possible l1 by a scalar θ:
   after translating x, y to (0, 0, 1) and rotating the epipoles to (1, 0, f1), (1, 0, f2),
   parameterise l1(θ) = (0, θ, 1) × (1, 0, f1)
3. minimise the error over θ:
   θ* = arg min_θ [ d²(x, l1(θ)) + d²(y, l2(θ)) ]
   the problem reduces to finding the roots of a 6th-degree polynomial, see [H&Z, Sec. 12.5.2]
4. compute x̂, ŷ and triangulate using the linear method on Slide 88
a fully optimal procedure requires error re-definition in order to get the most probable x, y
3D Computer Vision: IV. Computing with a Camera Pair (p. 93/211) R. Sara, CMP; rev. 7Jan2014
IWe Have Added to The ZOO
algebraic error optimization (with SVD) makes sense in camera resection and triangulation
only
but it is not the best method; we will now focus on optimizing optimally
3D Computer Vision: IV. Computing with a Camera Pair (p. 94/211) R. Sara, CMP; rev. 7Jan2014
Part V
covered by
[1] [H&Z] Secs: 11.4, 11.6, 4.7
[2] Fischler, M.A. and Bolles, R.C . Random Sample Consensus: A Paradigm for Model
Fitting with Applications to Image Analysis and Automated Cartography.
Communications of the ACM 24(6):381395, 1981
additional references
P. D. Sampson. Fitting conic sections to very scattered data: An iterative refinement of the Bookstein
algorithm. Computer Vision, Graphics, and Image Processing, 18:97108, 1982.
O. Chum, J. Matas, and J. Kittler. Locally optimized RANSAC. In Proc DAGM, LNCS 2781:236243.
Springer-Verlag, 2003.
O. Chum, T. Werner, and J. Matas. Epipolar geometry estimation via RANSAC benefits from the oriented
epipolar constraint. In Proc ICPR, vol 1:112115, 2004.
3D Computer Vision: V. Optimization for 3D Vision (p. 95/211) R. Sara, CMP; rev. 7Jan2014
IThe Concept of Error for Epipolar Geometry
[Figure: detected points xi in image 1 and yi in image 2, related by F]

detected points xi, yi; the set of matches is S = {(xi, yi)}_{i=1}^{k}
corrected points x̂i, ŷi; the set is Ŝ = {(x̂i, ŷi)}_{i=1}^{k}
corrected points satisfy the epipolar geometry exactly:  ŷi^T F x̂i = 0,  i = 1, …, k
a small correction is more probable

let e(·) be an error function (vector), e.g., per match i,

ei(x̂i, ŷi | xi, yi, F) = ( xi − x̂i ; yi − ŷi ),   ‖ei(·)‖² = ei²(·) = ‖xi − x̂i‖² + ‖yi − ŷi‖²      (15)

L(Ŝ | S, F) = Σ_{i=1}^{k} ei²(x̂i, ŷi | xi, yi, F)

(Ŝ*, F*) = arg min_{F, rank F = 2}  min_{Ŝ: ŷi^T F x̂i = 0}  Σ_{i=1}^{k} ei²(x̂i, ŷi | xi, yi, F)      (16)
Three approaches
they differ in how the corrected points xi , yi are obtained:
1. direct optimization of reprojection error over all variables S, F Slide 98
3D Computer Vision: V. Optimization for 3D Vision (p. 97/211) R. Sara, CMP; rev. 7Jan2014
Method 1: Geometric Error Optimization
we need to encode the constraints ŷi^T F x̂i = 0, rank F = 2
idea: reconstruct the 3D points via equivalent projection matrices and use the reprojection error
equivalent projection matrices are   (see [H&Z, Sec. 9.5] for a complete characterization)

P1 = [ I  0 ],   P2 = [ [e2]× F + e2 e1^T  |  e2 ]

~ H3; 2pt: Verify that F is a f.m. of P1, P2, for instance that F ≃ Q2^{−T} Q1^T [e1]×
1. compute F^(0) by the 7-point algorithm (Slide 84); construct camera P2^(0) from F^(0)
2. triangulate 3D points X_i^(0) from the matches (xi, yi) for all i = 1, …, k   (Slide 88)
3D Computer Vision: V. Optimization for 3D Vision (p. 98/211) R. Sara, CMP; rev. 7Jan2014
IMethod 2: First-Order Error Approximation
Observations:
corrected correspondences x̂i ↔ ŷi satisfy ŷi^T F x̂i = 0, x̂i = (û¹, v̂¹, 1), ŷi = (û², v̂², 1)
this is a manifold V_F ⊂ R⁴: the set of points Ẑ = (û¹, v̂¹, û², v̂²) consistent with F
Sampson's idea: Linearize the algebraic error εi(Z) at Zi (where it is non-zero) and evaluate the resulting linear function at Ẑi (where it should be zero). This replaces V_F by a linear manifold L. The point on V_F closest to Zi is replaced by the closest point on L.
3D Computer Vision: V. Optimization for 3D Vision (p. 99/211) R. Sara, CMP; rev. 7Jan2014
ISampsons Idea
Linearize εi(Ẑ) at the measured correspondence Zi, evaluate it at Ẑi, and get a constraint on ei(Ẑi, Zi)
have: εi(Zi),   want: ei(Ẑi, Zi)

0 = εi(Ẑi) ≈ εi(Zi) + (∂εi(Zi)/∂Zi) (Ẑi − Zi) = εi(Zi) + Ji(Zi) ei(Ẑi, Zi)
(Ji(Zi) = ∂εi(Zi)/∂Zi,   ei(Ẑi, Zi) = Ẑi − Zi)

Illustration on circle fitting
We are estimating the distance from a point x to the circle V_C of radius r in canonical position, ε(x) = ‖x‖² − r² = 0. We convert the non-linear error function ε to the linear function ε_L, which replaces the circle ε(x) = 0 by the line ε_L(x̂) = 0:

ε(x̂) ≈ ε(x) + (∂ε(x)/∂x)(x̂ − x) = 2 x^T x̂ − (r² + ‖x‖²) = ε_L(x̂)
(J(x) = 2 x^T,   e(x̂, x) = x̂ − x)

where ε_L(x̂) = 0 is a line with normal x/‖x‖ and intercept (r² + ‖x‖²)/(2‖x‖)
note the line is not tangent to V_C; it lies outside!

[Figure: the nonlinear algebraic error ε(x) over R² with zero set V_C, and the linearized lines ε_L1(x̂) = 0, ε_L2(x̂) = 0 at points x1, x2 with the corrections e(x̂i, xi)]
3D Computer Vision: V. Optimization for 3D Vision (p. 100/211) R. Sara, CMP; rev. 7Jan2014
ISampson Error Approximation
Given a single measurement Zi and its algebraic error εi(Zi), estimate the optimal correction ei(Ẑi, Zi).
In general, the Taylor expansion gives

εi(Zi) + Ji(Zi) ei(Ẑi, Zi) = 0,   εi ∈ R^n,   Ji ∈ R^{n,d},   ei ∈ R^d

n … algebraic error dimension, d … dimension of the latent parameter space
(n < d; n = 1, d = 4 for y^T F x = 0)

the minimum-norm solution of this under-determined system is

ei = −Ji^T (Ji Ji^T)^{−1} εi,      ‖ei‖² = εi^T (Ji Ji^T)^{−1} εi
3D Computer Vision: V. Optimization for 3D Vision (p. 101/211) R. Sara, CMP; rev. 7Jan2014
ISampson Error: Result for Fundamental Matrix Estimation
The fundamental matrix estimation problem becomes   (εi is scalar, hence ei = −Ji^T εi / ‖Ji‖²)

F* = arg min_{F, rank F = 2} Σ_{i=1}^{k} ei²(F)

Let F = [F¹ F² F³] (per columns) = [ (F₁)^T ; (F₂)^T ; (F₃)^T ] (per rows), and S = [ 1 0 0 ; 0 1 0 ; 0 0 0 ]. Then

Sampson:
εi(F) = yi^T F xi,   εi ∈ R   (scalar algebraic error from Slide 84)
Ji(F) = [ ∂εi(F)/∂u¹i,  ∂εi(F)/∂v¹i,  ∂εi(F)/∂u²i,  ∂εi(F)/∂v²i ] = [ (F¹)^T yi,  (F²)^T yi,  (F₁) xi,  (F₂) xi ],   Ji ∈ R^{1,4}   (derivatives over point coordinates)
ei(F) = εi(F)/‖Ji(F)‖,   ei ∈ R   (Sampson error)

which gives

ei(F) = yi^T F xi / sqrt( ‖S F xi‖² + ‖S F^T yi‖² )
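The closed-form Sampson error above is short in practice; a sketch (the camera pair and the perturbation below are made-up illustrative values):

```python
import numpy as np

def sampson_error(F, x, y):
    """First-order (Sampson) error for one correspondence; x, y are
    homogeneous image points with last coordinate 1."""
    S = np.diag([1., 1., 0.])
    num = y @ F @ x
    den = np.sqrt(np.sum((S @ F @ x) ** 2) + np.sum((S @ F.T @ y) ** 2))
    return num / den

def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

# illustrative camera pair
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
a = 0.1
R = np.array([[np.cos(a), 0., np.sin(a)],
              [0., 1., 0.],
              [-np.sin(a), 0., np.cos(a)]])
t = np.array([-1., 0., 0.])
F = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t[:, None]])
X = np.array([0.4, 0.3, 5., 1.])
m1, m2 = P1 @ X, P2 @ X
x, y = m1 / m1[2], m2 / m2[2]

print(abs(sampson_error(F, x, y)))       # exact match: numerically zero
y_noisy = y + np.array([2., 1., 0.])     # a small pixel perturbation
print(abs(sampson_error(F, x, y_noisy))) # roughly the off-epipolar displacement
```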
3D Computer Vision: V. Optimization for 3D Vision (p. 103/211) R. Sara, CMP; rev. 7Jan2014
Levenberg-Marquardt (LM) Iterative Estimation
Idea 2 (Levenberg): regularize the normal equations; replace Σ_i Li^T Li with Σ_i Li^T Li + λ I for some damping factor λ ≥ 0
with the damping scaled by the curvature (the Levenberg-Marquardt variant), the step d^s solves

Σ_{i=1}^{k} Li^T ei(θ^s) = −( Σ_{i=1}^{k} Li^T Li + λ diag( Σ_{i=1}^{k} Li^T Li ) ) d^s
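One damped step of this kind can be sketched on a toy least-squares problem; the linear model and data below are illustrative (for a linear model a single step with small damping already lands near the optimum):

```python
import numpy as np

def lm_step(residual, jac, theta, lam):
    """One damped Gauss-Newton (Levenberg-Marquardt) step:
    solve (J^T J + lam * diag(J^T J)) d = -J^T e, return theta + d."""
    e = residual(theta)
    J = jac(theta)
    A = J.T @ J
    d = np.linalg.solve(A + lam * np.diag(np.diag(A)), -J.T @ e)
    return theta + d

# toy problem: fit y = a*x + b to noiseless data
xs = np.array([0., 1., 2., 3.])
ys = 2.0 * xs + 1.0
residual = lambda th: th[0] * xs + th[1] - ys
jac = lambda th: np.stack([xs, np.ones_like(xs)], axis=1)

theta = lm_step(residual, jac, np.array([0., 0.]), lam=1e-9)
print(theta)        # near (2, 1)
```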
3D Computer Vision: V. Optimization for 3D Vision (p. 105/211) R. Sara, CMP; rev. 7Jan2014
LM with Sampson Error for Fundamental Matrix Estimation
3D Computer Vision: V. Optimization for 3D Vision (p. 106/211) R. Sara, CMP; rev. 7Jan2014
ILocal Optimization for Fundamental Matrix Estimation
Given a set {(xi, yi)}_{i=1}^{k} of k > 7 inlier correspondences, compute an efficient estimate of the fundamental matrix F.
1. Find the conditioned (Slide 91) 7-point F0 (Slide 84) from a suitable 7-tuple
2. Improve F0 using the LM optimization (Slides 104-105) and the Sampson error (Slide 106) on all inliers; re-enforce rank 2 and unit norm of F^k after each LM iteration using SVD
3D Computer Vision: V. Optimization for 3D Vision (p. 107/211) R. Sara, CMP; rev. 7Jan2014
IThe Full Problem of Matching and Fundamental Matrix Estimation
Problem: Given two sets of image points X = {xi}_{i=1}^{m} and Y = {yj}_{j=1}^{n} and their descriptors D, find the most probable
1. inliers S_X ⊆ X, S_Y ⊆ Y
2. one-to-one perfect matching M : S_X → S_Y   (perfect matching: 1-factor of the bipartite graph)
3. fundamental matrix F such that rank F = 2
4. such that for each xi ∈ S_X and yj = M(xi) it is probable that
   a. the image descriptor D(xi) is similar to D(yj), and
   b. the total geometric error Σ_ij eij²(F) is small   (note a slight change in notation: eij)
5. inlier-outlier and outlier-outlier matches are improbable
[Figure: bipartite matching M between X and Y; mij = 0 for non-matches, mij = 1 for matches]
F* = arg max_F p(F | X, Y, D)      (20)

Marginalization over the matching M = (m11, …, mmn):

p(X, Y, D | F) = Σ_{m11∈{0,1}} Σ_{m12} ⋯ Σ_{mmn} p(X, Y, D, M | F)
              = Σ_{m11} Σ_{m12} ⋯ Σ_{mmn} Π_{i=1}^{m} Π_{j=1}^{n} pe(eij, dij, mij | F)
              = Π_{i=1}^{m} Π_{j=1}^{n} Σ_{mij∈{0,1}} pe(eij, dij, mij | F)

we will continue with the inner-sum term
3D Computer Vision: V. Optimization for 3D Vision (p. 109/211) R. Sara, CMP; rev. 7Jan2014
Robust Matching Model (contd)
Σ_{mij∈{0,1}} pe(eij, dij, mij | F) = Σ_{mij∈{0,1}} pe(eij, dij | mij, F) p(mij | F)

the p0(eij, dij | F) ≈ const term is a penalty for missing a correspondence, but it should be a p.d.f. (it cannot be a constant)   (see Slide 111 for a simplification)
choose 0 ≤ α0 ≤ 1 and p0 ≥ 0 so that p0 ≈ const

the p1(eij, dij | F) term is typically an easy-to-design component: assuming independence of geometric error and descriptor similarity,

p1(eij, dij | F) = p1(eij | F) p1(dij)

we choose, e.g.,

p1(eij | F) = (1/Te(σ1, F)) exp( −eij²(F)/(2σ1²) ),
p1(dij) = (1/Td(σd, dim d)) exp( −‖d(xi) − d(yj)‖²/(2σd²) )      (22)

σ1, σd, α0 are hyper-parameters; the form of Te(σ1, F) depends on the error definition
we will continue with the result from (21)
3D Computer Vision: V. Optimization for 3D Vision (p. 110/211) R. Sara, CMP; rev. 7Jan2014
ISimplified Robust Energy (Error) Function
we define energy as V(x) = −log p(x)   (this helps simplify the formulas)
for simplicity, we omit dij
we choose the missed-correspondence penalty function as

p0(eij | F) = (1/Te(σ0, F)) exp( −eij²(F)/(2σ0²) )

then, up to an additive constant,

V(X, Y, D | F) = Σ_{i=1}^{m} Σ_{j=1}^{n} −log [ exp( −eij²(F)/(2σ1²) ) + t exp( −eij²(F)/(2σ0²) ) ]      (24)

where t = α0 Te(σ1, F) / ((1 − α0) Te(σ0, F)) ≈ const

note we are summing over all possible matches (m, n are fixed)
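The shape of this energy is easy to probe numerically. A sketch of the per-pair term, dropping the additive constant and approximating the broad outlier component by a constant (σ0 → ∞); the values of σ1 and t are illustrative:

```python
import numpy as np

def robust_V(e, sigma1=0.25, t=0.25):
    """Robust per-pair energy: V(e) = -log(exp(-e^2/(2 s1^2)) + t),
    up to an additive constant."""
    return -np.log(np.exp(-e**2 / (2.0 * sigma1**2)) + t)

# small errors behave quadratically, large errors saturate at -log(t)
print(robust_V(0.0))        # = -log(1 + t)
print(robust_V(10.0))       # saturation plateau near -log(t)
```

The saturation is what makes outliers stop pulling on the estimate: beyond a few σ1 the gradient of V vanishes.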
3D Computer Vision: V. Optimization for 3D Vision (p. 111/211) R. Sara, CMP; rev. 7Jan2014
IThe Action of the Robust Matching Model on Data
Example for V(e) from (24), with σ1 = 1/4:
red: the usual (non-robust) error, i.e. V with t = 0
blue: the rejected-correspondence penalty (t = 0.25)
green: the robust energy (24) with t = 0.25
[Figure: random samples from N(0, σ²I) and from a ring-shaped distribution, with the corresponding densities p(‖x‖ | r = ±1)]
consider several methods:

1. exhaustive search
   step = 1/(iterations-1);
   for x = 0:step:1
     if p(x) > bestp
       bestx = x; bestp = p(x);
     end
   end
   slow algorithm (definite quantization); faster variants exist; fast to implement

2. randomized search with uniform sampling
   x = rand(1);
   if p(x) > bestp
     bestx = x; bestp = p(x);
   end
   slow algorithm but better convergence; fast to implement; how to stop it?

3. random sampling from p(x) (Gibbs sampler)
   faster algorithm; fast to implement but often infeasible (e.g. when p(x) is data-dependent, which is our case)

4. Metropolis-Hastings sampling
   almost as fast (with care); not so fast to implement; rarely infeasible; RANSAC belongs here

[Figure: p(x) over [0, 1] and the convergence |x − x_true| vs. iterations for the exhaustive, randomized, MH-crawl, and Gibbs methods, averaged over 10⁴ trials]
3D Computer Vision: V. Optimization for 3D Vision (p. 114/211) R. Sara, CMP; rev. 7Jan2014
How To Generate Random Samples from a Complex Distribution?
red: probability density function p(x) of a toy distribution on the unit interval   (target distribution)

p(x) = Σ_{i=1}^{4} γi Be(x; αi, βi),   Σ_{i=1}^{4} γi = 1,   γi ≥ 0

Be(x; α, β) = (1/B(α, β)) x^{α−1} (1 − x)^{β−1}

suppose we cannot sample from p(x) but we can sample from some simple distribution q(x | x0), given the last sample x0 (blue)   (proposal distribution)
U_{0,1}(x)   (independent) uniform sampling
q(x | x0) = Be(x; x0/T + 1, (1 − x0)/T + 1)   beta diffusion (crawler), T … temperature
p(x)   (independent) Gibbs sampler
note we have unified all the random sampling methods from the previous slide
how do we transform proposal samples from q(x | x0) to target-distribution p(x) samples?
3D Computer Vision: V. Optimization for 3D Vision (p. 115/211) R. Sara, CMP; rev. 7Jan2014
IMetropolis-Hastings (MH) Sampling
initial sample
3D Computer Vision: V. Optimization for 3D Vision (p. 117/211) R. Sara, CMP; rev. 7Jan2014
Demo Source Code (Matlab)
3D Computer Vision: V. Optimization for 3D Vision (p. 118/211) R. Sara, CMP; rev. 7Jan2014
ISimplifying MH Sampler to RANSAC
3. independent sampling & looking for the best sample: no need to filter the proposals by the acceptance ratio
4. standard RANSAC replaces probability maximization with consensus maximization
see MPV course for RANSAC details see also [Fischler & Bolles 1981], [25 years of RANSAC]
3D Computer Vision: V. Optimization for 3D Vision (p. 120/211) R. Sara, CMP; rev. 7Jan2014
IStopping RANSAC
Principle: how many proposals N are needed to hit an all-inlier sample?

P … probability that at least one sample is all-inlier (1 − P … all N samples were bad)
ε … fraction of inliers among tentative correspondences, ε ≤ 1
s … sample size (7 in the 7-point algorithm)

ε^s … probability that a proposal contains no outlier
1 − ε^s … proposal contains at least one outlier
(1 − ε^s)^N … all N proposals contained an outlier, which must equal 1 − P

N ≥ log(1 − P) / log(1 − ε^s)

[Figure: N (proposals) vs. ε for s = 7 and P ∈ {0.5, 0.8, 0.99, 0.999}]
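The stopping bound is a two-line computation; for example, for the 7-point algorithm (s = 7) at an illustrative 50% inlier fraction:

```python
import math

def ransac_trials(P, eps, s):
    """Number of proposals N so that with probability P at least one
    sample of size s is all-inlier, given inlier fraction eps."""
    return math.ceil(math.log(1.0 - P) / math.log(1.0 - eps**s))

for P in (0.5, 0.8, 0.99, 0.999):
    print(P, ransac_trials(P, 0.5, 7))
```

At 50% inliers the required N grows from under a hundred proposals at P = 0.5 to under a thousand at P = 0.999.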
input images interest points (ca. 3600) tentative corresp. (416) matching (340)
notice some wrong matches (they have wrong depth, even negative)
they cannot be rejected without additional constraints or scene knowledge
without local optimization the minimization is over a discrete set of epipolar geometries
proposable from 7-tuples
3D Computer Vision: V. Optimization for 3D Vision (p. 122/211) R. Sara, CMP; rev. 7Jan2014
Example: MH Sampling for a More Complex Problem
Task: Find two vanishing points from line segments detected in input image.
Model
principal point known, square pixel
explicit variables
1. two unknown vanishing points v1, v2
latent variables
1. each line has a vanishing point label λi ∈ {∅, 1, 2}; ∅ represents an outlier
2. mother lines through vanishing points

[Figure (video): line segments labeled λ = 1, λ = 2, λ = ∅ and their vanishing points v1, v2]

simplifications
vanishing points restricted to the set of all pairwise segment intersections
mother lines fixed by segment centroids
3D Computer Vision: V. Optimization for 3D Vision (p. 123/211) R. Sara, CMP; rev. 7Jan2014
Beyond RANSAC
By the marginalization in (20) we have lost constraints on M (e.g. uniqueness). One can choose a better model when not marginalizing:

p(M, F, X, Y, D) = p(X, Y | M, F) · p(D | M) · p(M) · p(F)
(geometric error · similarity · constraints · prior)
3D Computer Vision: V. Optimization for 3D Vision (p. 124/211) R. Sara, CMP; rev. 7Jan2014
Part VI
20 Introduction
22 Bundle Adjustment
covered by
[1] [H&Z] Secs: 9.5.3, 10.1, 10.2, 10.3, 12.1, 12.2, 12.4, 12.5, 18.1
[2] Triggs, B. et al. Bundle AdjustmentA Modern Synthesis. In Proc ICCV Workshop
on Vision Algorithms. Springer-Verlag. pp. 298372, 1999.
additional references
D. Martinec and T. Pajdla. Robust Rotation and Translation Estimation in Multiview Reconstruction. In
Proc CVPR, 2007
M. I. A. Lourakis and A. A. Argyros. SBA: A Software Package for Generic Sparse Bundle Adjustment.
ACM Trans Math Software 36(1):130, 2009.
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 125/211) R. Sara, CMP; rev. 7Jan2014
IConstructing Cameras from the Fundamental Matrix
where
v is any 3-vector, e.g. v = e1, chosen to make the camera finite
λ ≠ 0 is a scalar
e2 = null(F^T), i.e. e2^T F = 0

Proof
1. S is skew-symmetric iff x^T S x = 0 for all x   (look up the proof!)
2. we have λ x ≃ P X
3. a non-zero F is a f.m. iff P2^T F P1 is skew-symmetric
4. if P1 = [I  0] and P2 = [S F  e2] then F corresponds to (P1, P2) by Step 3
5. we can write S = [s]×
6. a suitable choice is s = e2   [Luong96]
7. for the full class including v, see [H&Z, Sec. 9.5]
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 126/211) R. Sara, CMP; rev. 7Jan2014
IThe Projective Reconstruction Theorem
mi ≃ Pi X = Pi H^{−1} H X = P′i X′
with P′i = Pi H^{−1} and X′ = H X

when the cameras are internally calibrated (Ki known), H is restricted to a similarity (translation, rotation, scale), since it must preserve the calibrations Ki   [H&Z, Secs. 10.2, 10.3], [Longuet-Higgins 1981]
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 127/211) R. Sara, CMP; rev. 7Jan2014
IReconstructing Camera Systems
Problem: Given a set of p decomposed pairwise essential matrices Êij = [t̂ij]× R̂ij and calibration matrices Ki, reconstruct the camera system Pi, i = 1, …, k
(Slides 81 and 142 on representing E)

We construct calibrated camera pairs (→ 126)

P̄ij = [ Ki^{−1} Pi ; Kj^{−1} Pj ] = [ I  0 ; R̂ij  t̂ij ] ∈ R^{6,4}

singletons i, j correspond to graph vertices V (k vertices); pairs ij correspond to graph edges E (p edges)

[Figure: cameras P1, …, P8 as vertices, estimated Êij as edges]

the P̄ij are each in a different coordinate system, but these are related by similarities Hij:

[ I  0 ; R̂ij  t̂ij ] · [ Ri  ti ; 0^T  sij ]  =  [ Ri  ti ; R̂ij Ri  R̂ij ti + sij t̂ij ]  ≙  [ Ri  ti ; Rj  tj ]      (26)
(P̄ij ∈ R^{6,4},  Hij ∈ R^{4,4})

(26) is a linear system of 24p equations in 7p + 6k unknowns: 7p from (t̂ij, R̂ij, sij), 6k from (Ri, ti)
each Pi appears on the right-hand side as many times as the degree of its vertex (e.g. P5 three times)
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 128/211) R. Sara, CMP; rev. 7Jan2014
Icontd
Eq. (26) implies  R̂ij Ri = Rj  and  R̂ij ti + sij t̂ij = tj,
so R̂ij and t̂ij can be eliminated
note the transformations that do not change these equations (assuming no error in R̂ij):
1. Ri ↦ Ri R,   2. ti ↦ α ti and sij ↦ α sij,   3. ti ↦ ti + Ri t

Ex (k = p = 3): rotation part; denote by ri^c the c-th column of Ri:

[Figure: triangle graph with cameras P1, P2, P3 and edges Ê12, Ê23, Ê13]

R̂12 r1^c − r2^c = 0
R̂23 r2^c − r3^c = 0      i.e.   D r^c = 0,   D = [ R̂12  −I  0 ;  0  R̂23  −I ;  R̂13  0  −I ],   r^c = (r1^c ; r2^c ; r3^c)
R̂13 r1^c − r3^c = 0

must hold for every c
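That the true rotations satisfy D r^c = 0 can be verified on a synthetic triple; the rotations below are arbitrary illustrative choices:

```python
import numpy as np

def rotz(a):
    return np.array([[np.cos(a), -np.sin(a), 0.],
                     [np.sin(a), np.cos(a), 0.],
                     [0., 0., 1.]])

# ground-truth absolute rotations and the consistent pairwise ones
R1, R2, R3 = rotz(0.0), rotz(0.4), rotz(1.1)
R12, R23, R13 = R2 @ R1.T, R3 @ R2.T, R3 @ R1.T

I, Z = np.eye(3), np.zeros((3, 3))
D = np.block([[R12, -I, Z],
              [Z, R23, -I],
              [R13, Z, -I]])

# stack the first columns of R1, R2, R3 into r^c
r = np.concatenate([R1[:, 0], R2[:, 0], R3[:, 0]])
print(np.abs(D @ r).max())    # numerically zero
```

In the noisy case one would instead take the null-space basis of D (via SVD) and re-project the recovered 3x3 blocks onto the rotation group.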
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 130/211) R. Sara, CMP; rev. 7Jan2014
Finding The Translation Component in Eq. (27)
From eqs. (27) and (28):  d … rank of the camera center set,  p … number of pairs,  k … number of cameras

R̂ij ti + sij t̂ij − tj = 0,    Σ_{i=1}^{k} ti = 0,    Σ_{i,j} sij = p,   sij > 0,   ti ∈ R^d

in rank d: d·p + d + 1 equations for d·k + p unknowns  ⇒  p ≥ (d(k − 1) − 1)/(d − 1)

Ex: Chains and circuits: construction from sticks of known orientation and unknown length
p = k − 1 (chain),   k = p = 3,   k = p = 4,   k = p > 4
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 132/211) R. Sara, CMP; rev. 7Jan2014
ISolving Eq. (27) by Stepwise Gluing
Attaching camera Pj to the current cluster C
1. select the points Xj ⊆ X that have matches to Pj
2. estimate Pj using Xj by RANSAC with the 3-point algorithm (P3P), projection errors eij on Xj   (→ 70)
3. reconstruct 3D points from all tentative matches from Pj to all Pl ∈ C, l ≠ j, that are not yet in X
4. filter them by the chirality and apical-angle constraints and add them to X
5. add Pj to C
6. perform bundle adjustment on X and C   (coming next)
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 133/211) R. Sara, CMP; rev. 7Jan2014
IBundle Adjustment
Given:
1. a set of 3D points {Xi}_{i=1}^{p}
2. a set of cameras {Pj}_{j=1}^{c}
3. fixed tentative projections mij
Required:
1. corrected 3D points {X′i}_{i=1}^{p}
2. corrected cameras {P′j}_{j=1}^{c}
Latent:
1. visibility decision vij ∈ {0, 1} per mij

[Figure: point Xi projected by cameras P1, P2, …, Pj, with reprojection errors eij(Xi; Pj) between mij and the projections]

the robustified gradient of the negative log-likelihood (cf. Slide 111) is

(Lij)_l = ∂(−log pij)/∂θ_l = [ 1 / (1 + t e^{eij²(θ)/(2σ1²)}) ] · (eij(θ)/σ1²) · ∂eij(θ)/∂θ_l      (30)

the first factor is ≈ 1 for inliers and small for big eij, so outliers are down-weighted

the Jacobian Lr has blocks  ∂/∂Xi (1×3)  and  ∂/∂Pj (1×11)
the normal matrix Lr^T Lr = Σ_{r=1}^{k} Lr^T Lr has blocks
Xi-Xi (3×3),  Xi-Pj (3×11),  Pj-Pj (11×11):  a sparse arrowhead structure
function L = pchol(A)
%
% PCHOL profile Cholesky factorization.
% L = PCHOL(A) returns a lower-triangular sparse L such that A = L*L'
% for a sparse square symmetric positive definite matrix A;
% especially useful for arrowhead sparse matrices.
[p,q] = size(A);
if p ~= q, error('Matrix must be square'); end
L = sparse(q,q);
F = ones(q,1);
for i=1:q
 F(i) = find(A(i,:),1); % 1st non-zero on row i; we are building F gradually
 for j = F(i):i-1
  k = max(F(i),F(j));
  a = A(i,j) - L(i,k:(j-1))*L(j,k:(j-1))';
  L(i,j) = a/L(j,j);
 end
 a = A(i,i) - sum(full(L(i,F(i):(i-1))).^2);
 if a < 0, error('Matrix must be positive definite'); end
 L(i,i) = sqrt(a);
end
end
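The factorization computed by pchol can be cross-checked against a dense Cholesky factorization; a minimal numpy analogue on a small arrowhead-like SPD matrix (all values illustrative):

```python
import numpy as np

# small "arrowhead" SPD matrix, shaped like the reduced BA normal matrix:
# block-diagonal point part plus a dense camera border
n = 6
A = np.eye(n)
A[-1, :] = 0.3
A[:, -1] = 0.3
A[-1, -1] = 3.0          # SPD: Schur complement 3.0 - 5 * 0.3^2 > 0

L = np.linalg.cholesky(A)            # lower-triangular, A = L @ L.T
print(np.allclose(L @ L.T, A))       # True
```

The fill-in stays confined to the last (border) row, which is exactly the property the profile factorization exploits.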
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 138/211) R. Sara, CMP; rev. 7Jan2014
IGauge Freedom
1. The external frame is not fixed: See Projective Reconstruction Theorem, 127
mi ≃ Pj Xi = Pj H^{−1} H Xi = P′j X′i
2. Some representations are not minimal, e.g.
P is 12 numbers for 11 parameters
we may represent P in decomposed form K, R, t
but R is 9 numbers representing the 3 parameters of rotation
As a result
there is no unique solution
the matrix Σ_r Lr^T Lr is singular
Solutions
fixing the external frame (e.g. a selected camera frame) explicitly or by constraints
imposing constraints on projective entities
cameras, e.g. P3,4 = 1 this excludes affine cameras
points, e.g. kXi k2 = 1 this way we can represent points at infinity
using minimal representations
points in their Euclidean representation Xi but finite points may be an unrealistic model
rotation matrix can be represented by Cayley transform see next
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 139/211) R. Sara, CMP; rev. 7Jan2014
IImplementing Simple Constraints
What for?
1. fixing the external frame: θi = θi⁰   (trivial gauge)
2. representing additional knowledge: θi = θj   (e.g. cameras share the calibration matrix K)
T deletes the columns of Lr that correspond to fixed parameters; it reduces the problem size
consistent initialisation: θ⁰ = T θ′⁰ + t
or filter the initialization by the pseudoinverse: θ′⁰ ↦ T⁺ θ⁰
we need not compute derivatives for the θj that correspond to all-zero rows Tj   (fixed params)
constraining projective entities minimal representations
more complex constraints tend to make normal equations dense
implementing constraints is safer than explicit renaming of the parameters, gives a flexibility
to experiment
other methods are much more involved, see [Triggs et al. 1999]
BA resource: http://www.ics.forth.gr/~lourakis/sba/ [Lourakis 2009]
3D Computer Vision: VI. 3D Structure and Camera Motion (p. 140/211) R. Sara, CMP; rev. 7Jan2014
IMinimal Representations for Rotation
1. Rodrigues representation

R = I + sin φ [o]× + (1 − cos φ) [o]×²

sin φ [o]× = (R − R^T)/2,   cos φ = (tr R − 1)/2

hiding φ in the vector o, as in [sin φ · o]×, is not so easy
Cayley tried:
Stereovision
23 Introduction
24 Epipolar Rectification
25 Binocular Disparity and Matching Table
26 Image Likelihood
27 Maximum Likelihood Matching
28 Uniqueness and Ordering as Occlusion Models
29 Three-Label Dynamic Programming Algorithm
mostly covered by
Sara, R. How To Teach Stereoscopic Vision. Proc. ELMAR 2010 referenced as [SP]
additional references
C. Geyer and K. Daniilidis. Conformal rectification of omnidirectional stereo pairs. In Proc Computer Vision
and Pattern Recognition Workshop, p. 73, 2003.
J. Gluckman and S. K. Nayar. Rectifying transformations that minimize resampling effects. In Proc IEEE
CS Conf on Computer Vision and Pattern Recognition, vol. 1:111117. 2001.
M. Pollefeys, R. Koch, and L. V. Gool. A simple and efficient rectification method for general motion. In
Proc Int Conf on Computer Vision, vol. 1:496501, 1999.
3D Computer Vision: VII. Stereovision (p. 143/211) R. Sara, CMP; rev. 7Jan2014
What Are The Relative Distances?
monocular vision already gives a rough 3D sketch because we understand the scene
3D Computer Vision: VII. Stereovision (p. 144/211) R. Sara, CMP; rev. 7Jan2014
What Are The Relative Distances?
3D Computer Vision: VII. Stereovision (p. 145/211) R. Sara, CMP; rev. 7Jan2014
What Are The Relative Distances? (Why?)
3D Computer Vision: VII. Stereovision (p. 146/211) R. Sara, CMP; rev. 7Jan2014
Repetition: How Many Scenes Correspond to a Stereopair?
when we do not recognize the scene and cannot use high-level constraints the problem
seems difficult (right, less so in the center)
most stereo matching algorithms do not require scene understanding prior to matching
the success of a model-free stereo matching algorithm is unlikely:

WTA Matching: for every left-image pixel find the most similar right-image pixel along the corresponding epipolar line   [Marroquin 83]

[Figure: left image, ground-truth disparity map, disparity map from WTA]
3D Computer Vision: VII. Stereovision (p. 148/211) R. Sara, CMP; rev. 7Jan2014
Why Model-Free Stereo Fails?
[Figure: two scene interpretations consistent with the same stereo pair; surface points A, B, C matched as (A,1), (B,2), (C,3) in interpretation 1 and as (A,3), (B,2), (C,1) in interpretation 2]
3D Computer Vision: VII. Stereovision (p. 149/211) R. Sara, CMP; rev. 7Jan2014
But What Kind of Continuity Model Applies Here?
3D Computer Vision: VII. Stereovision (p. 150/211) R. Sara, CMP; rev. 7Jan2014
A Summary of Our Observations and an Outlook
3D Computer Vision: VII. Stereovision (p. 151/211) R. Sara, CMP; rev. 7Jan2014
I Epipolar Rectification
Rectification 1 Rectification 2
there is a 9-parameter family of rectification homographies for binocular rectification, see next
trinocular rectification has 9 or 6 free parameters (depending on additional constraints)
3D Computer Vision: VII. Stereovision (p. 152/211) R. Sara, CMP; rev. 7Jan2014
Rectification Example
Four cameras in general position Rectified pairs
cam 3 cam 4
pair 2 4
pair 1 4
3D Computer Vision: VII. Stereovision (p. 153/211) R. Sara, CMP; rev. 7Jan2014
I Rectification Homographies
P̂i = Ĥi Pi = Ĥi Ki Ri [I  −Ci],   i = 1, 2

rectified entities: F̂, l̂2, l̂1, etc.

corresponding epipolar lines must be:
1. parallel to image rows ⇒ the epipoles become ê1 = ê2 = (1, 0, 0)
2. equivalent: l̂2 = l̂1 ⇒ l̂2 ≃ l̂1 ≃ ê1 × m̂1 = [ê1]× m̂1 = F̂ m̂1

both conditions together give the rectified fundamental matrix

F̂ ≃ [ 0  0  0 ;  0  0  −1 ;  0  1  0 ]

A two-step rectification procedure
1. Find some pair of primitive rectification homographies Ĥ1, Ĥ2
2. Upgrade them to a pair of optimal rectification homographies from the class preserving F̂.
I Geometric Interpretation of Linear Rectification
I cont'd
Corollary
the standard rectified stereo pair has vanishing disparity for 3D points at infinity
but known F alone does not give any constraints to obtain standard rectification homographies
for that we need either of these:
1. projection matrices, or
2. calibrated cameras, or
3. a few points at infinity, calibrating k_{1i}, k_{2i}, i = 1, 2, 3 in (31)
I Primitive Rectification
I Primitive Rectification Suffices for Calibrated Cameras

\bar P_2 = \hat H_2 P_2 = K\, \underbrace{A U^\top R_2}_{\bar R}\, [I \mid -C_2] = K \bar R\, [I \mid -C_2]

one can prove that B V^\top R_1 = A U^\top R_2 (= \bar R) with the help of (13)
points at infinity project via K \bar R in both images, hence they have zero disparity (→ 162)
I The Degrees of Freedom in Epipolar Rectification
Optimal and Non-linear Rectification
forward egomotion
rectified images, Pollefeys' method
I Binocular Disparity in Standard Stereo Pair
[figure: top view of a standard stereo pair with baseline b, optical centers C_1, C_2, image points m_1, m_2, and a scene point X = (x, z); side view showing v and y]

Assumptions: single image line, standard camera pair

u_1 = f \cot\alpha_1, \qquad u_2 = f \cot\alpha_2, \qquad b = z \cot\alpha_1 - z \cot\alpha_2

X = (x, y, z) from disparity d = u_1 - u_2:

z = \frac{b f}{d}, \qquad x = \frac{b}{d} \cdot \frac{u_1 + u_2}{2}, \qquad y = \frac{b\, v}{d}

f, d, u, v in pixels; b, x, y, z in meters

Observations
constant-disparity surface is a frontoparallel plane
distant points have small disparity
relative error in z is large for small disparity:
\frac{1}{z}\,\frac{\mathrm{d}z}{\mathrm{d}d} = -\frac{1}{d}
increasing the baseline increases disparity and reduces the error
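The triangulation formulas can be coded directly; a minimal C sketch (the struct and function names are mine, not from the slides):

```c
#include <assert.h>
#include <math.h>

typedef struct { double x, y, z; } point3;

/* Standard stereo pair: recover X = (x, y, z) in meters from image
   measurements u1, u2, v in pixels, baseline b in meters, and focal
   length f in pixels, using z = bf/d, x = b(u1+u2)/(2d), y = bv/d. */
static point3 triangulate(double b, double f, double u1, double u2, double v)
{
    double d = u1 - u2; /* disparity in pixels; must be nonzero */
    point3 X;
    X.z = b * f / d;
    X.x = b * (u1 + u2) / (2.0 * d);
    X.y = b * v / d;
    return X;
}
```

Since z is inversely proportional to d, a fixed one-pixel disparity error translates into a depth error that grows quadratically with distance, which is the error observation above.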
I Understanding Basic Occlusion Types
[figure: left-camera ray l and right-camera rays r_1, r_2, r_3; the surface point X lies at (l, r_1); points in front of the surface are transparent, points behind it occluded; X_1, X_2 are half-occluded, a yellow zone marks mutual occlusion]
a surface point at the intersection of rays l and r_1 occludes a world point at the intersection (l, r_3) and implies the world point (l, r_2) is transparent; therefore (l, r_3) and (l, r_2) are excluded by (l, r_1)
in half-occlusion, every world point such as X_1 or X_2 is excluded by a binocularly visible surface point; decisions on correspondences are therefore not independent
in mutual occlusion this is no longer the case: any X in the yellow zone is not excluded, and decisions in the zone are independent of the rest
I Matching Table
Based on the observation on mutual exclusion we expect each pixel to match at most once.
[figure: rays 1–5 in the epipolar plane from C_1 and from C_2, and the corresponding matching table T]
matching table
rows and columns represent optical rays
nodes: possible correspondence pairs
full nodes: correspondences
numerical values associated with nodes: descriptor similarities (see next)
Image Point Descriptors And Their Similarity
[video]
I Constructing A Suitable Image Similarity
let pi = (l, r) and L(l), R(r) be (left, right) image descriptors (vectors) constructed from
local image neighborhood windows
[figure: the pair p_i = (l, r) in matching table T, and the window descriptor L(l) at pixel l in the left image]
a natural descriptor similarity is

\mathrm{sim}(l, r) = -\frac{\|L(l) - R(r)\|^2}{\sigma_I^2(l, r)}

\sigma_I^2 is the difference scale; a suitable (plug-in) estimate is \tfrac{1}{2}\bigl(s^2(L(l)) + s^2(R(r))\bigr), giving

\mathrm{sim}(l, r) = \frac{2\, s\bigl(L(l), R(r)\bigr)}{s^2(L(l)) + s^2(R(r))}, \qquad s(\cdot)\ \text{is sample (co-)variance} \tag{32}

MNCC: Moravec's Normalized Cross-Correlation [Moravec 1977]
\mathrm{sim}^2 \in [0, 1]; the sign represents phase
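Eq. (32) can be implemented directly. This C sketch takes the two window descriptors as flat arrays; the function name and interface are mine:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* MNCC per (32): sim(l, r) = 2 s(L, R) / (s^2(L) + s^2(R)), where s(.)
   denotes the sample (co-)variance of the window descriptors L and R. */
static double mncc(const double *L, const double *R, size_t n)
{
    double mL = 0.0, mR = 0.0, vL = 0.0, vR = 0.0, cov = 0.0;
    for (size_t k = 0; k < n; k++) { mL += L[k]; mR += R[k]; }
    mL /= (double)n; mR /= (double)n;
    for (size_t k = 0; k < n; k++) {
        vL  += (L[k] - mL) * (L[k] - mL);
        vR  += (R[k] - mR) * (R[k] - mR);
        cov += (L[k] - mL) * (R[k] - mR);
    }
    return 2.0 * cov / (vL + vR); /* in [-1, 1]; sign carries phase */
}
```

Subtracting the window means makes the measure invariant to additive intensity offsets between the two images, one reason MNCC is preferred over a raw sum of squared differences.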
cont'd

[plot: \mathrm{Be}(\mathrm{sim}^2; \alpha, \beta) and \log \mathrm{Be}(\mathrm{sim}^2; \alpha, \beta) for \alpha = 10, \beta = 1.5]

p_1\bigl(\mathrm{sim}(l, r)\bigr) = \frac{\mathrm{sim}^{2(\alpha - 1)}\,\bigl(1 - \mathrm{sim}^2\bigr)^{\beta - 1}}{B(\alpha, \beta)}

note that the uniform distribution is obtained for \alpha = \beta = 1

the mode is at \sqrt{\frac{\alpha - 1}{\alpha + \beta - 2}} \approx 0.9733 for \alpha = 10, \beta = 1.5

if we chose \beta = 1 then the mode would be at \mathrm{sim} = 1; but perfect similarity is suspicious (it depends on the expected camera noise level)

from now on we will work with

V_1\bigl(\mathrm{sim}(l, r)\bigr) = -\log p_1\bigl(\mathrm{sim}(l, r)\bigr) \tag{33}
How A Scene Looks in The Filled-In Similarity Table
[figures: scene, left image, right image; similarity tables filled in with MNCC (parameter values 1.5 and 1) for 3×3, 5×5, and 11×11 windows]

high-correlation structures correspond to scene objects
constant disparity: a diagonal in the correlation table; zero disparity is the main diagonal
depth discontinuity: a horizontal or vertical jump in the correlation table
large image window: better correlation, worse occlusion localization (see next)
repeated texture: horizontal and vertical block repetition
window size tradeoff: a large window produces occlusion artefacts, with a small window the structures are undiscriminable, the medium window is a good tradeoff
Note: Errors at Occlusion Boundaries for Large Windows
[figure: a matching window win spanning a depth discontinuity at X in iml and imr produces erroneous disparities d = 0, 1, 2 near the boundary]
The Matlab Code for WTA
% the size of each local patch; it is N=(2r+1)^2 except for boundary pixels
N = boxing(ones(size(iml)), winsize);
The negative log-likelihood of the data D given a labeled matching M is

V(D \mid M) = \sum_{p_i \in M} V\bigl(D(p_i) \mid \lambda(p_i)\bigr)

Our choice:
V\bigl(D(p_i) \mid \lambda(p_i) = e\bigr) = V_e ... penalty for unexplained data, V_e \geq 0
V\bigl(D(p_i) \mid \lambda(p_i) = m\bigr) = V_1\bigl(D(l, r)\bigr) ... probability of match p_i = (l, r) from (33)
the V(D(p_i) | \lambda(p_i) = e) could also be a non-uniform distribution, but the extra effort does not pay off
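With this choice the total cost is trivial to evaluate; a C sketch in which `match[i] < 0` marks pixel i as unexplained and `v1[i]` holds the V1 value of its accepted match (the array layout is my illustrative assumption):

```c
#include <assert.h>

/* Total negative log-likelihood V(D | M): each unexplained pixel
   contributes Ve, each matched pixel contributes its V1 value. */
static double matching_cost(const double *v1, const int *match,
                            int n, double Ve)
{
    double V = 0.0;
    for (int i = 0; i < n; i++)
        V += (match[i] < 0) ? Ve : v1[i];
    return V;
}
```

The constant V_e acts as a per-pixel price for leaving data unexplained, so it controls the density of the resulting semi-dense disparity map.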
I Maximum Likelihood (ML) Matching
Uniqueness constraint: Each point in the left image matches at most once, and vice versa.

A node set of T that follows the uniqueness constraint is called a matching in graph theory:

A set of pairs M = \{p_i\}_{i=1}^{n}, p_i \in T, is a matching iff \forall\, p_i, p_j \in M,\ i \neq j:\ p_j \notin X(p_i).

The X(p) is called the X-zone of p; it defines the dependencies.

[figure: a pair p_i in the matching table with its X-zone X(p) and another pair p_j]

H4 (2 pt): How many are there: (1) binary partitionings of T, (2) maximal matchings in T; prove the results.

ML matching will observe the uniqueness constraint only
epipolar lines are independent wrt the uniqueness constraint, so we can solve the problem per image line i independently:

M^* = \arg\min_{M}\, \sum_{p} V\bigl(D(p) \mid \lambda(p)\bigr)
    = \arg\min_{M}\, \Bigl[\underbrace{|\bar{M}|\, V_e}_{\text{unexplained pixels}}
      + \underbrace{\sum_{p \in M:\ \lambda(p) = m} V\bigl(D(p) \mid \lambda(p) = m\bigr)}_{\text{matching likelihood proper}}\Bigr]
I Programming The ML Matching Algorithm
we restrict ourselves to a single (rectified) image line and reduce the problem to min-cost perfect matching
extend every matching table pair p \in T, p = (j, k), to the 4 combinations of (j, s_j), (k, s_k), with s_j \in \{0, 1\} and s_k \in \{0, 1\}; s selects/rejects pixels for matching, unlike selecting matches
the binary label m_{jk} = 1 then means that (j, s_j) matches (k, s_k)

Idea? Q: This looks simpler: run matching with V_e = 0 and then threshold the result to remove bad matches. A: It does not work well; see this example with V_e = 8:

ordinary ML w/ thresholding    our ML matching (→ 174)
 8  3  9                        8  3  9
10  6  9                       10  6  9
 7  1  8                        7  1  8
V = 9 + 2·8 = 25               V = 9 + 10 + 8 = 27

our matching gives a better cost, and also a greater cardinality (density)
[figure: matchings in the table — coherent, monotonic, continuous]

new: u = f\,\frac{x}{z}; \qquad known: d = f\,\frac{b}{z}, \quad x = \frac{b}{d}\,\frac{u_1 + u_2}{2}, \quad u = \frac{u_1 + u_2}{2}

Disparity gradient [Pollard, Mayhew, Frisby 1985]:

DG = \frac{|d - d'|}{|u - u'|} = \frac{b f \left|\frac{1}{z} - \frac{1}{z'}\right|}{f \left|\frac{x}{z} - \frac{x'}{z'}\right|} = \frac{b\,|z' - z|}{|x\, z' - x'\, z|}

[figure: points X = (x, z) and X' = (x', z') viewed from C_1 and C_2 with image points m_1, m_2 and m_1', m_2']

human stereovision fails to perceive a continuous surface when the disparity gradient exceeds a limit
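The final expression DG = b|z' − z| / |x z' − x' z| is easy to check numerically against |d − d'| / |u − u'|; a C sketch (the function name is mine):

```c
#include <assert.h>
#include <math.h>

/* Disparity gradient of two points (x, z), (x', z') in the epipolar
   plane, baseline b: DG = b |z' - z| / |x z' - x' z|.
   Note that the focal length f cancels out. */
static double disparity_gradient(double b, double x, double z,
                                 double xp, double zp)
{
    return b * fabs(zp - z) / fabs(x * zp - xp * z);
}
```

For example, with b = 0.1 m and f = 1000 px, the points (0, 5) and (0.2, 4) have disparities 20 and 25 px and cyclopean coordinates 0 and 50 px, so both expressions give DG = 5/50 = 0.1.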
I Forbidden Zone and The Ordering Constraint

[figure: the forbidden zone F(X) of a point X, with image points m_1, m_2 of X and m_1', m_2' of X'; cameras C_1, C_2]

the disparity gradient limit is exceeded when X' \in F(X)
symmetry: X' \in F(X) iff X \in F(X')
Obs: X' and X swap their order in the other image when X' \in F(X) (k = 2)
real scenes often preserve ordering
thin and close objects violate ordering (see next)
Ordering and Critical Distance
[figure: a thick object with points X_1, ..., X_4; black: binocularly visible, yellow: half-occluded, red: ordering violated wrt the foreground, bounded by the foreground object; here X_1 \in F(X_4)]

Ordering is violated iff both X_i, X_j with X_i \in F(X_j) are visible in both cameras, e.g. X_2, X_4.
I The X-zone and the F-zone in Matching Table T

these are necessary and sufficient conditions for uniqueness and monotonicity

Uniqueness Constraint: A set of pairs M = \{p_i\}_{i=1}^{N}, p_i \in T, is a matching iff \forall\, p_i, p_j \in M,\ i \neq j:\ p_j \notin X(p_i).
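With the X-zone of the uniqueness constraint — the pairs sharing a row or a column of T with p — the constraint check can be written directly; an O(n²) C sketch for clarity (the type and function names are mine):

```c
#include <assert.h>

/* A pair set M is a matching iff no two pairs share a left pixel l
   (a row of T) or a right pixel r (a column of T). */
typedef struct { int l, r; } pairT;

static int is_matching(const pairT *M, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (M[i].l == M[j].l || M[i].r == M[j].r)
                return 0; /* p_j lies in the X-zone of p_i */
    return 1;
}
```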
I Understanding the Matching Table

[figure: matching table with critical disparity d_k; the zone beyond d_k is invisible]
Bayesian Decision Task for Matching
Idea: L(d, M) ... decision cost (loss); d ... our decision (matching); M ... true correspondences

Choice: L(d, M) = 0 if d = M and L(d, M) = 1 if d \neq M, i.e. L(d, M) = [d \neq M]

Bayesian Loss

L(d \mid D) = \sum_{M \in \mathbb{M}} p(M \mid D)\, L(d, M)

\mathbb{M} ... the set of all matchings; D = \{I_L, I_R\} ... data

[figure: matching table with an s-t path from node s to node t through pairs p]

this gives an acyclic directed graph G; optimal paths in acyclic graphs are an easier problem
the set of s-t paths starting in s and ending in t will represent the set of matchings
all such s-t paths have equal length n = |I| + |J| - 1
all prospective matchings will have the same number of terms in V(D | M) and in V(M)
Endowing s-t Paths with Useful Properties
introduce node labels \lambda \in \{m, e_L, e_R\}: matched, left-excluded, right-excluded

s-t path neighbors are allowed only some label combinations: all pairs except m \to m

[figure: a labeled s-t path in the matching table; the numbers are disparities]

Observations
no two neighbors have label m
in each labeled s-t path there is at most one transition:
1. m \to e_L or e_R \to m per matching table row,
2. m \to e_R or e_L \to m per matching table column
pairs labeled m on every s-t path satisfy the uniqueness and ordering constraints
transitions e_L \to e_R or e_R \to e_L along an s-t path allow skipping a contiguous segment in either or in both images; this models half occlusion and mutual occlusion
disparity change is the number of e_L \to e_L or e_R \to e_R edges
a given monotonic matching can be traversed by one or more s-t paths

ideas:
we choose the energy of a path P dependent on its labeling only
we choose an additive penalty per transition e_L \to e_L, e_R \to e_R, and e_L \to e_R, e_R \to e_L
no penalty for m \to e_L, m \to e_R
Employing Markovianity
[figure: chain of nodes p_1, p_2, p_3, ..., p_n]

V(P) = V(\lambda_n, \lambda_{n-1}, \ldots, \lambda_1)
     = V(\lambda_n \mid \lambda_{n-1}, \ldots, \lambda_1) + V(\lambda_{n-1}, \ldots, \lambda_1)
     = V(\lambda_n \mid \lambda_{n-1}) + V(\lambda_{n-1}, \ldots, \lambda_1)
     = V(\lambda_1) + \sum_{i=2}^{n} V(\lambda_i \mid \lambda_{i-1})
A Choice of V(\lambda_i \mid \lambda_{i-1})

A natural requirement is symmetry of the joint probability p(\lambda_i, \lambda_{i-1}) = e^{-V(\lambda_i, \lambda_{i-1})}; the transition costs V(\lambda_i \mid \lambda_{i-1}) follow by marginalization. Transitions leaving a match are free, V(m \to e_L) = V(m \to e_R) = 0, the node costs satisfy V(e_L) = V(e_R) = 0, and the remaining penalties are logarithms of ratios with the common normalizer 1 + \varepsilon_1 + \varepsilon_2.

parameters
\varepsilon_1 ... likelihood of mutual occlusion (\varepsilon_1 = 0 forbids mutual occlusion)
\varepsilon_2 ... likelihood of irregularity (\varepsilon_2 \to 0 helps suppress small objects and holes)
\alpha, \beta ... similarity model parameters (see V_1(D(l, r)), → 167)
V_e ... penalty for disregarded data (see V(D(p_i) | \lambda(p_i) = e), → 173)
Programming the Matching Algorithm: 3LDP
given G, construct a directed graph G^+:
a triple of vertices per node of the s-t path, representing the three hypotheses \lambda(p) for p
arcs have costs V(\lambda_i \mid \lambda_{i-1}); nodes have costs V(D \mid \lambda_i)
the orientation of G^+ is inherited from the orientation of the s-t paths
we have converted the shortest labeled-path problem to an ordinary shortest-path problem

[figure: the neighborhood of p = (l, r) in G and in G^+; the neighbors (l-1, r), (l, r-1), (l, r+1), (l+1, r) each carry the vertex triple m, e_L, e_R; strong blue edges are of zero penalty]
cont'd: Dynamic Programming on G^+

[figure: node q with its two predecessors p_1, p_2]

V_{s:q}(\lambda_q) = \min_{z \in \{p_1, p_2\},\ \lambda_z} \bigl\{ V_{s:z}(\lambda_z) + V_z(D \mid \lambda_z) + V(\lambda_q \mid \lambda_z) \bigr\}

V_{s:q}(\lambda_q) ... cost of the min-path from s to label \lambda_q at node q

complexity is O(|I| \cdot |J|), i.e. stereo matching on N \times N images needs O(N^3) time
speedup by limiting the range in which the disparities d = l - r are allowed to vary
Implementation of 3LDP in a few lines of code…
#define clamp(x, mi, ma) ((x) < (mi) ? (mi) : ((x) > (ma) ? (ma) : (x)))
#define MAXi(tab,j) clamp((j)+(tab).drange[1], (tab).beg[0], (tab).end[0])
#define MINi(tab,j) clamp((j)+(tab).drange[0], (tab).beg[0], (tab).end[0])
#define ARG_MIN2(Ca, La, C0, L0, C1, L1) \
  if ((C0) < (C1)) { Ca = C0; La = L0; } else { Ca = C1; La = L1; }
/* 3-way minimum, analogous to ARG_MIN2 */
#define ARG_MIN3(Ca, La, C0, L0, C1, L1, C2, L2) \
  { double _c; labelT _l; ARG_MIN2(_c, _l, C0, L0, C1, L1); \
    ARG_MIN2(Ca, La, _c, _l, C2, L2); }

/* forward pass: accumulate min-path costs C_* and predecessors P_* */
int i = tab.beg[0]; int j = tab.beg[1];
C_m[j][i-1] = C_m[j-1][i] = MAXDOUBLE;
C_oL[j][i-1] = C_oR[j-1][i] = 0.0;
C_oL[j-1][i] = C_oR[j][i-1] = -penalty[0];

for(j = tab.beg[1]; j <= tab.end[1]; j++)
  for(i = MINi(tab,j); i <= MAXi(tab,j); i++) {
    ARG_MIN2(C_m[j][i], P_m[j][i],
             C_oR[j-1][i] + penalty[2], lbl_oR,
             C_oL[j][i-1] + penalty[2], lbl_oL);
    C_m[j][i] += 1.0 - tab.MNCC[j][i];

    ARG_MIN3(C_oL[j][i], P_oL[j][i], C_m[j-1][i], lbl_m,
             C_oL[j-1][i] + penalty[0], lbl_oL,
             C_oR[j-1][i] + penalty[1], lbl_oR);
    C_oL[j][i] += penalty[3];

    ARG_MIN3(C_oR[j][i], P_oR[j][i], C_m[j][i-1], lbl_m,
             C_oR[j][i-1] + penalty[0], lbl_oR,
             C_oL[j][i-1] + penalty[1], lbl_oL);
    C_oR[j][i] += penalty[3];
  }

/* backtracking: recover the disparity map D from the predecessors */
int i, j; labelT La; double Ca;
for(i = 0; i < nl; i++) D[i] = nan; /* not-a-number */

i = tab.end[0]; j = tab.end[1];
ARG_MIN3(Ca, La, C_m[j][i], lbl_m,
         C_oL[j][i], lbl_oL, C_oR[j][i], lbl_oR);

while (i >= tab.beg[0] && j >= tab.beg[1] && La > 0)
  switch (La) {
    case lbl_m: D[i] = i-j;
      switch (La = P_m[j][i]) {
        case lbl_oL: i--; break;
        case lbl_oR: j--; break;
        default: Error(...);
      } break;
    case lbl_oL: La = P_oL[j][i]; j--; break;
    case lbl_oR: La = P_oR[j][i]; i--; break;
    default: Error(...);
  }
Some Results: AppleTree
[figure: disparity maps — 3LDP (→ 189), naïve DP [Cox et al. 1992], stable segmented 3LDP (see [SP])]

3LDP parameters \varepsilon_i, V_e learned on the Middlebury stereo data, http://vision.middlebury.edu/stereo/
Some Results: Larch
[figures: Larch disparity maps]

Winner-Take-All (WTA)
the ur-algorithm [Marroquin 83], no model
dense disparity map
O(N^3) algorithm, simple but it rarely works

Maximum Likelihood (ML)
semi-dense disparity map
many small isolated errors

[plot: ROC curves and their average error rate bounds, density [%] vs. error]
Thank You
3D Computer Vision: enlarged figures R. Sara, CMP; rev. 7Jan2014
How To Teach Stereoscopic Matching?
(Invited Paper)
Radim Šára
Center for Machine Perception
Department of Cybernetics
Czech Technical University in Prague, Czech Republic
Email: sara@cmp.felk.cvut.cz
Abstract—This paper describes a simple but non-trivial semi-dense stereoscopic matching algorithm that could be taught in Computer Vision courses. The description is meant to be instructive and accessible to the student. The level of detail is sufficient for a student to understand all aspects of the algorithm design and to make his/her own modifications. The paper includes the core parts of the algorithm in C code. We make the point of explaining the algorithm so that all steps are derived from first principles in a clear and lucid way. A simple method is described that helps encourage the student to benchmark his/her own improvements of the algorithm.

I. INTRODUCTION

Teaching stereovision in computer vision classes is a difficult task. There are very many algorithms described in the literature that address various aspects of the problem. A good up-to-date comprehensive review is missing. It is thus very difficult for a student to acquire a balanced overview of the area in a short time. For a non-expert researcher who wants to design and experiment with his/her own algorithm, reading the literature is often a frustrating exercise.

This paper tries to introduce a well selected, simple, and reasonably efficient algorithm. My selection is based on a long-term experience with various matching algorithms and on an experience with teaching a 3D computer vision class. I believe that by studying this algorithm the student is gradually educated about the way researchers think about stereoscopic matching. I tried to address all elementary components of the algorithm design, from occlusion modeling to matching task formulation and benchmarking.

An algorithm suitable for teaching stereo should meet a number of requirements:

Simplicity: The algorithm must be non-trivial but easy to implement.
Accessibility: The algorithm should be described in a complete and accessible form.
Performance: Performance must be reasonably good for the basic implementation to be useful in practice.
Education: The student should exercise formal problem formulation. Every part of the algorithm design must be well justified and derivable from first principles.
Development potential: The algorithm must possess potential for encouraging the student to do further development.
Learnability: There should be a simple recommended method for choosing the algorithm parameters and/or selecting their most suitable values for a given application.

Besides the basic knowledge in computer science and algorithms, basics of probability theory and statistics, and introductory-level computer vision or image processing, it is assumed that the reader is familiar with the concept of epipolar geometry and plane homography. The limited extent of this paper required shortening some explanations.

The purpose of this paper is not to give a balanced state-of-the-art overview. The presentation, however, does give references to relevant literature during the exposition of the problem.

II. THE BASIC STRUCTURE OF STEREOSCOPIC MATCHING

Stereoscopic matching is a method for computing disparity maps from a set of images. In this paper we will only consider pairs of cameras that are not co-located. We will often call them the left and the right camera, respectively. A disparity map codes the shift in location of the corresponding local image features induced by camera displacement. The direction of the shift is given by epipolar geometry of the two cameras [1] and the magnitude of the shift, called disparity, is approximately inversely proportional to the camera-object distance. Some disparity map examples are shown in Fig. 10. In these maps, disparity is coded by color, close objects are reddish, distant objects bluish, and pixels without disparity are black. Such color-coding makes various kinds of errors clearly visible.

This section formalizes object occlusion so that we could derive useful necessary constraints on disparity map correctness. We will first work with spatial points. The 3D space in front of the two cameras is spanned by two pencils of optical rays, one system per camera. The spatial points can be thought of as the intersections of the optical rays. They will be called possible correspondences. The task of matching is to accept some of the possible correspondences and reject the others. Fig. 1 shows a part of a surface in the scene (black line segment) and a point p on the surface. Suppose p is visible in both cameras (by optical ray r1 in the left camera and by optical ray t1 in the right camera, both cameras are located above the scene). Then every point on ray r1 that lies in front of p must be transparent and each point of r1 behind p must be occluded. A similar observation holds for ray t1. This shows that the decision on point acceptance and rejection are not independent: If the correspondence given by point p is accepted, then a set of correspondences X(p)
[enlarged figure: supporting line through points p_0 and p_1]

[enlarged plot: error [px] vs. focal length from EXIF data [mm]]
[enlarged plot: radial distortion coefficient values vs. focal length from EXIF data [mm] — division model and polynomial model (h = 2), k_1]
[enlarged figure: matching table nodes indexed by coordinate pairs, with ray numbering]
[enlarged figure: per-point errors e(x_i, ·)]

[enlarged plots: p(\|x\| \mid r = 1) for three cases]
[enlarged plot: ROC curves, density [%] vs. inaccuracy [%]; average error rate bounds: 3LDP (3.65 ± 0.26), WTA (4.71 ± 0.17), ML (4.60 ± 0.65), GCS (4.29 ± 1.47)]