CHAPTER 1
INTRODUCTION TO 3D
1.1 3D RECONSTRUCTION FROM SINGLE 2D IMAGES
Given a single image of an everyday object, a sculptor can recreate its 3D shape (i.e.,
produce a statue of the object), even if the particular object has never been seen before.
Presumably, it is familiarity with the shapes of similar 3D objects (i.e., objects from the
same class) and how they appear in images that enables the artist to estimate its shape.
This might not be the exact shape of the object, but it is often a good enough estimate for
many purposes.
In general, the problem of 3D reconstruction from a single 2D image is ill-posed,
since different shapes may give rise to the same intensity patterns. To cope with this,
one approach reconstructs outdoor scenes by assuming they can be labelled as ground,
sky, and vertical billboards.
The target of the system is the geometric model of the scene. We therefore consider
geometric reconstruction and not photometric (or image-based) reconstruction, which
directly generates new views of a scene without (completely) reconstructing the 3D
structure. With the purpose and application context stated, the limits are set as follows:
Varying intrinsic camera parameters: the camera intrinsic parameters (e.g. focal
length) can vary freely. Together with the previous assumption, this adds flexibility to the
system.
The process starts with the data capturing step, in which a person moves around
the scene recording a video. After that, the video sequence is processed to produce a 3D
model of the scene. Finally, the 3D model can be rendered, or exported for editing using
3D modeling tools.
The 3D reconstruction (step 3) can be divided into four main tasks, which are as follows:
1. Feature detection and matching: the objective of this step is to find and match
corresponding features across the images.
2. Structure and motion recovery: this step recovers the structure and motion of
the scene (i.e. 3D coordinates of detected features; position, orientation and parameters of
the camera at the capturing positions).
Dept. of ECE, MRITS
3. Stereo mapping: this step creates a dense matching map. In conjunction with the
structure recovered in the previous step, this enables building a dense depth map.
4. Modeling: this step includes the procedures needed to make a real model of the
scene.
Some define the input as an image sequence, but Fig. 1.1 defines it as a video
sequence, since our practical objective is a system that performs reconstruction from video. By
defining it like that, we want to clearly state that the intermediate step to go from video to
image sequences (i.e. frame selection) is a part of the reconstruction process.
1.2.1 Feature Detection and Matching
This process creates the relations used by the next step, structure and motion
recovery, by detecting and matching features in different images. To date, the features
used in structure recovery processes are points and lines, so here features are understood
as points or lines.
1.3 LINES
Two-view projective reconstruction can only use point correspondences. But in
three or more view structure recovery it is possible to use line correspondences.
1.3.1 Line detection
Line detection usually includes edge detection, followed by line extraction.
1.3.2 Edge detection
The key to solve the problem is the intensity change, which is shown via the
gradient of the image. Edge detectors usually follow the same routine: smoothing,
applying edge enhancement filters, applying a threshold, and edge tracing.
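As an illustration of that routine, the sketch below implements a minimal smoothing, gradient-enhancement and thresholding pipeline in Python with NumPy. The 3×3 box filter, the Sobel kernels and the threshold value are illustrative assumptions, not values prescribed by the text; a full Canny implementation would additionally perform non-maximum suppression and hysteresis-based edge tracing.

```python
import numpy as np

def sobel_edges(img, threshold=0.25):
    """Minimal edge detector: smooth, compute gradient magnitude, threshold.

    img: 2D float array in [0, 1]. The box blur and the threshold value
    are illustrative choices, not prescribed by the text.
    """
    h, w = img.shape
    # 1. Smoothing: 3x3 box filter to suppress noise.
    pad = np.pad(img, 1, mode="edge")
    smooth = sum(pad[i:i + h, j:j + w]
                 for i in range(3) for j in range(3)) / 9.0

    # 2. Edge enhancement: Sobel gradients along x and y.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(smooth, 1, mode="edge")
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            gx += kx[i, j] * p[i:i + h, j:j + w]
            gy += ky[i, j] * p[i:i + h, j:j + w]
    magnitude = np.hypot(gx, gy)

    # 3. Threshold: keep only strong gradient responses.
    return magnitude > threshold

# A vertical step edge: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

Only the pixels around the intensity step respond; the flat regions on either side stay below the threshold.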
Evaluations of edge detectors are inconsistent and do not converge, for reasons such
as unclear objectives and varying parameters. One series of evaluations tests algorithms
in different tasks, with the application acting as a black box; one of those tasks is structure
from motion. That evaluation shows that, overall, the Canny detector is the most suitable
because of its performance: the fastest speed and low sensitivity to parameter variation.
However, the structure from motion algorithm used there is not a three-view one and uses
line segments rather than lines.
Lines are generally highly structured features and give stronger constraints. Lines are
plentiful and easy to extract in scenes dominated by artificial objects, e.g. urban
architecture. However, the fact that evaluations of line extraction and matching for
structure recovery are not complete and concrete is probably the reason why, although the
theory of three-view reconstruction with lines has been available for a long time, methods
in structure recovery usually use point correspondences. One of the few works that uses
line correspondences and trifocal tensors is that of Breads, but lines are not used directly:
point correspondences are still used first to recover geometry information.
On the other hand, laser-based methods are complex to handle for large scale
outdoor scenes, especially for aerial data acquisition. In contrast to passive image-matching
approaches that rely on 2.5D data fusion of pairwise stereo depth maps, the
correspondence chaining (i.e. measurement linking) and triangulation approach takes full
advantage of the achievable baseline (i.e. triangulation angles). In contrast to voxel-based
approaches, polygonal meshes and local patches, we focus on algorithms representing
geometry as a set of depth maps. This eliminates the need for resampling the geometry in
the three-dimensional domain and can be easily parallelized. We evaluate the approach on
a multi-view benchmark data set that provides accurate ground truth, and on large scale
aerial images.
z = \frac{f \, b}{d} ..........(1)

where z is the point depth, f the focal length, b the image baseline and d the disparity.
Hence the depth precision is mainly a function of the ray intersection angle. In contrast,
for multi-view image matching and triangulation, the redundancy not only implies more
measurements but additionally constrains the 3D point location through multiple ray
intersections. These entities are not independent but are coupled, since they rely on the
network geometry.
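As a quick numerical check of equation (1), the snippet below evaluates z = f·b/d for illustrative values of focal length, baseline and disparity (these numbers are assumptions for the example, not measurements from the text):

```python
# Depth from disparity, z = f * b / d (equation (1)).
# All three values are illustrative, not taken from the text.
f = 700.0       # focal length in pixels
b = 0.12        # baseline in metres
d = 14.0        # measured disparity in pixels

z = f * b / d   # depth in metres

# Halving the disparity doubles the estimated depth, so a fixed
# disparity error hurts most for distant points.
z_far = f * b / (d / 2)
```

This inverse relationship is why the text stresses the ray intersection angle: wide baselines give larger disparities and hence better depth precision.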
Given a point track \widehat{track}^m = (\langle \hat{x}_1, \hat{y}_1 \rangle, \langle \hat{x}_2, \hat{y}_2 \rangle, \ldots, \langle \hat{x}_k, \hat{y}_k \rangle) and ground truth projection
matrices P_i, i = 1 \ldots N, the 3D position of the respective point in space is determined. This
process requires the intersection of at least two known rays in space. Hence, we use a
linear triangulation method to determine the 3D position of point tracks. This method
generalizes easily to the intersection of multiple rays, providing a least squares solution.
Optionally, a non-linear optimizer based on the Levenberg-Marquardt algorithm is used to
refine the solution.
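The linear triangulation step can be sketched as a homogeneous least-squares problem: each view contributes two rows, x·P₃ − P₁ and y·P₃ − P₂, to a system A·X = 0, whose SVD null vector is the homogeneous 3D point. The cameras and the point below are synthetic, chosen only to exercise the code:

```python
import numpy as np

def triangulate(points_2d, proj_mats):
    """Linear (DLT) triangulation of one 3D point from k >= 2 views.

    points_2d: list of (x, y) image observations.
    proj_mats: list of corresponding 3x4 projection matrices.
    Returns the 3D point minimizing the algebraic error in a
    least-squares sense, via SVD.
    """
    rows = []
    for (x, y), P in zip(points_2d, proj_mats):
        # From x ~ P X: each view gives two independent linear equations.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # right singular vector of the smallest value
    return X[:3] / X[3]        # de-homogenize

# Synthetic check: two cameras looking along +z, point at (1, 2, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])              # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])  # shifted 1 unit in x

def project(P, X):
    x = P @ np.append(X, 1.0)
    return (x[0] / x[2], x[1] / x[2])

X_true = np.array([1.0, 2.0, 5.0])
X_est = triangulate([project(P1, X_true), project(P2, X_true)], [P1, P2])
```

Because more views simply append more rows to A, the same routine handles the multiple-ray case mentioned above; a Levenberg-Marquardt pass would then minimize the reprojection error rather than the algebraic one.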
Using the singular value decomposition, the covariance matrix can then be diagonalized:

C_X = U \Sigma V^T ..........(5)
Fig.1.8: Block matching a macro block of side 16 pixels and a search parameter p of
size 7 pixels.
Larger motions require a larger p, and the larger the search parameter, the more
computationally expensive the process of motion estimation becomes. Usually the macro
block is taken as a square of side 16 pixels, and the search parameter p is 7 pixels. The
idea is represented in Fig. 1.8. The matching of one macro block with another is based on
the output of a cost function; the macro block that results in the least cost is the one that
matches the current block most closely. There are various cost functions, of which the
most popular and least computationally expensive is the Mean Absolute Difference
(MAD), given by equation (6). Another cost function is the Mean Squared Error (MSE),
given by equation (7).
MAD = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} |C_{ij} - R_{ij}| ..........(6)

MSE = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (C_{ij} - R_{ij})^2 ..........(7)

where N is the side of the macro block, and C_{ij} and R_{ij} are the pixels being compared in
the current macro block and the reference macro block respectively. The Peak Signal-to-Noise
Ratio (PSNR), given by equation (8), characterizes the motion compensated image that is
created by using motion vectors and macro blocks from the reference frame:

PSNR = 10 \log_{10} \left[ \frac{(\text{peak pixel value})^2}{MSE} \right] ..........(8)
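A minimal exhaustive (full) search using the MAD cost of equation (6) might look as follows. The tiny synthetic frames and the small block/search sizes are illustrative, chosen so the example runs quickly; the text's defaults are N = 16 and p = 7:

```python
import numpy as np

def mad(block_c, block_r):
    """Mean Absolute Difference, equation (6)."""
    return np.abs(block_c.astype(float) - block_r.astype(float)).mean()

def full_search(current, reference, bx, by, n=16, p=7):
    """Exhaustive block matching: try every displacement within +/- p
    pixels and return the motion vector with the lowest MAD cost.
    The vector points from the current block to its best match in the
    reference frame."""
    block = current[by:by + n, bx:bx + n]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + n > reference.shape[0] or x + n > reference.shape[1]:
                continue  # candidate window would leave the frame
            cost = mad(block, reference[y:y + n, x:x + n])
            if cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv, best

# Synthetic frames: a bright 4x4 patch moves 2 px right and 1 px down
# between the reference and the current frame.
ref = np.zeros((16, 16)); ref[4:8, 4:8] = 255
cur = np.zeros((16, 16)); cur[5:9, 6:10] = 255
mv, cost = full_search(cur, ref, bx=6, by=5, n=4, p=3)
```

With N = 16 and p = 7 this evaluates (2p+1)² = 225 candidate positions per macro block, which is exactly the cost the fast algorithms of the following sections try to avoid.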
Fig.1.9: Three Step Search procedure. The motion vector is (5, -3).
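The procedure of Fig. 1.9 can be sketched as follows: evaluate the centre and its eight neighbours at spacing S = 4, recentre on the cheapest, halve S, and repeat until S = 1. The MAD cost function and the synthetic frames below are illustrative assumptions, with the patch displaced by (5, -3) as in the figure:

```python
import numpy as np

def mad(a, b):
    return np.abs(a.astype(float) - b.astype(float)).mean()

def three_step_search(current, reference, bx, by, n=8):
    """Three Step Search: check 9 points spaced S apart, recentre on the
    best, halve S (4 -> 2 -> 1). Needs far fewer cost evaluations than
    an exhaustive search over the same +/-7 window."""
    block = current[by:by + n, bx:bx + n]
    cx, cy, s = bx, by, 4
    while s >= 1:
        best, best_xy = np.inf, (cx, cy)
        for dy in (-s, 0, s):
            for dx in (-s, 0, s):
                x, y = cx + dx, cy + dy
                if x < 0 or y < 0 or y + n > reference.shape[0] or x + n > reference.shape[1]:
                    continue
                cost = mad(block, reference[y:y + n, x:x + n])
                if cost < best:
                    best, best_xy = cost, (x, y)
        cx, cy = best_xy
        s //= 2
    # Displacement of the block from its reference position to its
    # current one.
    return (bx - cx, by - cy)

# The textured 8x8 patch sits at (3, 10) in the reference frame and has
# moved 5 px right and 3 px up to (8, 7) in the current frame.
ref = np.zeros((24, 24)); ref[10:18, 3:11] = np.arange(64).reshape(8, 8)
cur = np.zeros((24, 24)); cur[7:15, 8:16] = np.arange(64).reshape(8, 8)
mv = three_step_search(cur, ref, bx=8, by=7)
```

At most 9 + 8 + 8 = 25 cost evaluations are needed, versus 225 for the exhaustive search over the same window.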
Fig.1.11: Search patterns corresponding to each selected quadrant: (a) Shows all
quadrants (b) quadrant I is selected (c) quadrant II is selected (d) quadrant III is
selected (e) quadrant IV is selected.
Fig.1.12: The SES procedure. The motion vector is (3, 7) in this example.
Fig.1.13: Search patterns of the FSS. (a) First step (b) Second/Third
step (c) Second/Third step (d) Fourth step.
Once again, if the least-weight location is at the center of the 5x5 search window,
we jump to the fourth step; otherwise we move on to the third step. The third step is
exactly the same as the second. In the fourth step the window size is dropped to 3x3,
i.e. S = 1.
Fig.1.15: Adaptive Rood Pattern: the predicted motion vector is (3, -2), and the step
size S = Max(|3|, |-2|) = 3.
Fig.1.16: Search points per macro block while computing the PSNR performance of
fast block matching algorithms.
The main advantage of this algorithm over DS is that if the predicted motion vector is
(0, 0), it does not waste computational time performing the LDSP; rather, it directly
starts using the small diamond search pattern (SDSP).
CHAPTER 2
STEREO VISION ALGORITHMS
2.1 INTRODUCTION TO STEREO VISION
The stereo correspondence problem has historically been, and continues to be, one of
the most investigated topics in computer vision, and a large body of literature has been
published on it. The correspondence problem concerns the matching of points, or other
kinds of primitives, in two or more images such that the matched elements are the
projections of the same physical element in the 3D scene; the resulting displacement of a
projected point in one image with respect to the other is termed disparity. Similarity is the
guiding principle for solving the correspondence problem; however, stereo
correspondence is an ill-posed task, and in order to make it tractable it is usually
necessary to exploit some additional information or constraints.
The most popular constraint is the epipolar constraint, which reduces the search
from two dimensions to one. Other commonly used constraints are the disparity
uniqueness constraint and the continuity constraint.
The origin of the word stereo is the Greek word stereos, which means firm or
solid; with stereo vision, objects are seen solid in three dimensions, with range. In
stereo vision, the same scene is captured using two sensors from two different angles. The
two captured images have many similarities and a smaller number of differences. In
human vision, the brain combines the two captured images by matching the similarities
and integrating the differences to obtain a three-dimensional model of the seen objects.
In machine vision, the three-dimensional model of the captured objects is obtained
by finding the similarities between the stereo images and using projective geometry to
process these matches. The difficulty of reconstruction using stereo is finding matching
correspondences between the stereo pair.
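A minimal sketch of that matching step for a rectified pair, where the epipolar constraint reduces the similarity search to a single scanline. The window size, disparity range and synthetic images are illustrative assumptions; real systems add 2D windows, uniqueness checks and sub-pixel refinement:

```python
import numpy as np

def disparity_map(left, right, max_disp=8, w=3):
    """Winner-take-all stereo matching on a rectified pair.

    For each left-image pixel, slide a (2w+1)-wide 1D window along the
    same scanline of the right image and keep the disparity with the
    smallest sum of absolute differences.
    """
    h, width = left.shape
    disp = np.zeros((h, width), dtype=int)
    for y in range(h):
        for x in range(w, width - w):
            best, best_d = np.inf, 0
            # Only disparities that keep the window inside the image.
            for d in range(0, min(max_disp, x - w) + 1):
                cost = np.abs(left[y, x - w:x + w + 1].astype(float)
                              - right[y, x - d - w:x - d + w + 1].astype(float)).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic rectified pair: a textured stripe shifted 3 px between views.
right = np.zeros((5, 20)); right[:, 5:10] = [10, 40, 90, 160, 250]
left = np.zeros((5, 20));  left[:, 8:13] = [10, 40, 90, 160, 250]
disp = disparity_map(left, right, max_disp=5, w=2)
```

The recovered disparity of 3 at the stripe can then be converted to depth via equation (1) of Chapter 1.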
Finding pairs of matched points, such that each point in the pair is the projection
of the same scene point, is the core of the matching task. Ambiguous correspondence
between points in the two images may lead to several different interpretations of the
scene.
Fig 2.7: Geometry of epipolar lines, where C1 and C2 are the left and right camera
lens centers, respectively. Point P1 in one image plane may have arisen from any of
the points on the line C1P1, and may appear in the alternate image plane at any point
on the epipolar line E2.
As numerous methods have been proposed since then, this section aims to review the
most recent ones. Most of the results presented in the rest of this chapter are based on the
image sets and tests provided there.
The most common image sets are presented in Figure 2.8. Table 2.1 summarizes
their size as well as the number of disparity levels. Experimental results based on these
image sets are given where available. The metric adopted in this chapter, in order to
depict the quality of the resulting disparity maps, is the percentage of pixels whose
absolute disparity error is greater than 1 in the non-occluded areas of the image. This
metric, considered the most representative of result quality, was used so as to allow
direct comparison of the algorithms.
Fig.2.8:Left image of the stereo pair (left) and ground truth (right) for the Tsukuba
(a), Sawtooth (b), Map (c), Venus (d), Cones (e) and Teddy (f) stereo pair.
The speed with which the algorithms process input image pairs is expressed in
frames per second (fps). This metric naturally depends heavily on the computational
platform used and on the kind of implementation. Inevitably, speed results are not
directly comparable.
Image set    Size in pixels    Disparity levels
Tsukuba      384x288           16
Map          284x216           30
Sawtooth     434x380           20
Venus        434x383           20
Cones        450x375           60
Teddy        450x375           60

Table 2.1: Characteristics of the most common image sets
Global methods minimize an energy function of the form E = E_data + E_smooth,
where the data term E_data takes into consideration the pixel values throughout the
image and the smoothness term E_smooth favours consistent disparities between
neighbouring pixels.
The main disadvantage of the global methods is that they are more time
consuming and computationally demanding. The source of these characteristics is the
iterative refinement approaches that they employ. They can be roughly divided into those
performing a global energy minimization and those pursuing the minimum for
independent scan lines using dynamic programming (DP).
In Figure 2.10 the main characteristics of the global algorithms discussed below
are presented. It is clear that the recently published works utilize global optimization
in preference to DP. It should be taken into consideration that under the term global
optimization there are actually quite a few different methods. Additionally, DP tends to
produce inferior, thus less impressive, results. Therefore, applications that do not have
running-speed constraints preferably utilize global optimization methods.
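The scanline-DP variant can be sketched as follows: each pixel/disparity pair carries a data cost, neighbouring pixels pay a smoothness penalty proportional to their disparity jump, and the minimum-energy path is recovered by backtracking. The penalty weight and the synthetic scanline below are illustrative assumptions:

```python
import numpy as np

def dp_scanline(left_row, right_row, max_disp=4, smooth=2.0):
    """Minimum-energy disparities for one scanline via dynamic programming.

    Data term (E_data): |L[x] - R[x - d]|.  Smoothness term (E_smooth):
    `smooth` times the disparity jump between neighbouring pixels.
    """
    n, D = len(left_row), max_disp + 1

    def data(x, d):
        if x - d < 0:
            return np.inf            # disparity would leave the image
        return abs(float(left_row[x]) - float(right_row[x - d]))

    cost = np.full((n, D), np.inf)
    back = np.zeros((n, D), dtype=int)
    for d in range(D):
        cost[0, d] = data(0, d)
    for x in range(1, n):
        for d in range(D):
            # Best predecessor disparity, paying the smoothness penalty.
            prev = cost[x - 1] + smooth * np.abs(np.arange(D) - d)
            back[x, d] = int(np.argmin(prev))
            cost[x, d] = data(x, d) + prev[back[x, d]]

    # Backtrack the minimum-energy path.
    disp = np.zeros(n, dtype=int)
    disp[-1] = int(np.argmin(cost[-1]))
    for x in range(n - 1, 0, -1):
        disp[x - 1] = back[x, disp[x]]
    return disp

# Synthetic scanline: the left row is the right row shifted by 2 pixels.
right_row = [0, 10, 20, 30, 40, 50, 60, 70]
left_row = [0, 0, 0, 10, 20, 30, 40, 50]
disp = dp_scanline(left_row, right_row)
```

Because each scanline is solved independently, this is fast but prone to the horizontal "streaking" artifacts that make DP results look inferior to full 2D global optimization.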
The work indicates that the computational cost of the graph cuts stereo
correspondence technique can be efficiently decreased using the results of a simple local
method.
CHAPTER 3
IMAGE PYRAMID AND BLOCK MATCHING
3.1 ALGORITHM IMPLEMENTATION
1) Define a small part of the whole problem and find an optimum solution to this
small part.
2) Enlarge this small part slightly and find an optimum solution to the new problem
using the previous solution.
3) Continue with step 2 until the problem has been enlarged sufficiently that it
covers the whole problem.
4) Track back the solution to the whole problem from the optimum solutions to the
sub-problems.
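These steps are commonly realized with an image pyramid: the small problem is matching at the coarsest level, and each enlargement moves one level finer, reusing (and doubling) the previous motion estimate. A sketch of the pyramid construction itself, with an illustrative 2x2-averaging kernel and depth:

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Pyramid by repeated 2x2 averaging.

    pyramid[0] is the full-resolution image; each subsequent level
    halves both dimensions. A coarse-to-fine matcher solves the
    problem at pyramid[-1] first, then refines (doubling the motion
    vector) at each finer level -- the enlarge-and-refine loop of the
    steps above. The number of levels is an illustrative choice.
    """
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        a = pyramid[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2
        a = a[:h, :w]  # trim odd rows/columns before averaging
        down = (a[0::2, 0::2] + a[1::2, 0::2]
                + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
        pyramid.append(down)
    return pyramid

img = np.arange(64, dtype=float).reshape(8, 8)
pyr = build_pyramid(img, levels=3)
```

Matching at the coarsest level covers a large effective search range with small p, which is where the accuracy and efficiency gains claimed below come from.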
3.2 ADVANTAGES
Improves accuracy
Efficient technique

3.3 DISADVANTAGES
Computational time is more
3.4 APPLICATIONS
Object recognition.
Position determination.
Shape and size detection.
Product processing and assembly.
Obstacle avoidance and navigation.
CHAPTER 4
EXPERIMENTAL RESULTS AND DISCUSSIONS
CHAPTER 5
CONCLUSION AND FUTURE SCOPE
The process of stereo matching using the block matching technique is able to
establish correspondences by matching image pixel intensities. The output is a
disparity map, which stores the depth, or distance, of each pixel in the image: each
pixel in the map corresponds to the depth at that point rather than to a gray shade or
color. In future, the peak signal-to-noise ratio can be improved by improved stereo
vision techniques.
REFERENCES
1. Hartley, R. and Zisserman, A., 2004. Multiple View Geometry in Computer Vision,
second edition. Cambridge University Press.
2. Blostein, S. and Huang, T., 1987. Quantization error in stereo triangulation. In: IEEE
Int. Conf. on Computer Vision.
3. Hartley, R., Gupta, R. and Chang, T., 1992. Stereo from uncalibrated cameras. In:
IEEE Conf. on Computer Vision and Pattern Recognition.
4. Koch, R., Pollefeys, M. and Van Gool, L., 2000. Realistic surface reconstruction of 3D
scenes from uncalibrated image sequences. In: Visualization and Computer Animation.
5. Hernandez, C., Schmitt, F. and Cipolla, R., 2007. Silhouette coherence for camera
calibration under circular motion. In: IEEE Trans. on Pattern Analysis and Machine
Intelligence.
6. Lazebnik, S., Furukawa, Y. and Ponce, J., 2006. Projective visual hulls. IJCV.
7. Szeliski, R., 1993. Rapid octree construction from image sequences. Computer Vision,
Graphics and Image Processing, 58(1):23-32.
8. Matusik, W., Buehler, C., Raskar, R., Gortler, S. and McMillan, L., 2000. Image-based
visual hulls. In: ACM SIGGRAPH.
9. Khan, S., Yan, P. and Shah, M., 2007. A homographic framework for the fusion of
multi-view silhouettes. In: IEEE Int. Conf. on Computer Vision.
APPENDIX
SOFTWARE IMPLEMENTATION
Software Requirement
MATLAB
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization and programming in an easy-to-use environment. MATLAB
stands for matrix laboratory. It was originally written to provide easy access to matrix
software developed by the LINPACK (linear system package) and EISPACK (eigensystem
package) projects. MATLAB is therefore built on a foundation of sophisticated matrix
software in which the basic element is a matrix that does not require pre-dimensioning.
Typical uses of MATLAB
The typical usage areas of MATLAB are:
Algorithm development
Data acquisition

MATLAB is an interactive system whose basic data element is an array that
does not require dimensioning. This allows you to solve many technical computing
problems, especially those with matrix and vector formulations, in a fraction of the time it
would take to write a program in a scalar non-interactive language such as C or
FORTRAN.
MATLAB features a family of add-on application-specific solutions called
toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and
apply specialized technology. Toolboxes are comprehensive collections of MATLAB
functions (M-files) that extend the MATLAB environment to solve particular classes of
problems.
Fig: Features and capabilities of MATLAB. The diagram shows the MATLAB
programming language, user-written and built-in functions, graphics (2-D and 3-D),
computation (linear algebra, signal processing, quadrature), external interfaces to C and
FORTRAN programs, and toolboxes such as signal processing, image processing, control
systems, neural networks and communications.
Development Environment
This is the set of tools and facilities that help you use MATLAB functions and
files. Many of these tools are graphical user interfaces. It includes the MATLAB desktop
and Command Window, a command history, an editor and debugger, and browsers for
viewing help, the workspace, files, and the search path.
The MATLAB Mathematical Function
This is a vast collection of computational algorithms, ranging from elementary
functions like sum, sine, cosine, and complex arithmetic, to more sophisticated functions
like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
The MATLAB Language
This is a high-level matrix/array language with control flow statements, functions,
data structures, input/output, and object-oriented programming features. It allows both
"programming in the small" to rapidly create quick and dirty throw-away programs, and
"programming in the large" to create complete large and complex application programs.
The GUI construction
MATLAB has extensive facilities for displaying vectors and matrices as graphs,
as well as annotating and printing these graphs. It includes high-level functions for
two-dimensional and three-dimensional data visualization.
Program the GUI: GUIDE automatically generates an M-file that controls how the
GUI operates. The M-file initializes the GUI and contains a framework for all the GUI
callbacks -- the commands that are executed when a user clicks a GUI component. Using
the M-file editor, we can add code to the callbacks to perform the desired functions.
GUIDE stores a GUI in two files, which are generated the first time we save or run
the GUI:
A FIG-file, with extension .fig, which contains a complete description of the GUI
layout and the components of the GUI: push buttons, menus, axes, and so on.
An M-file, with extension .m, which contains the code that controls the GUI.
These two files correspond to the tasks of laying out and programming the GUI. When
we lay out the GUI in the Layout Editor, our work is stored in the FIG-file. When we
program the GUI, our work is stored in the M-file.