
Computer Vision Notes 

Don’t forget to bring a calculator.

Confirmed Midterm Exam Guide (Kisi-kisi UTS) 


● Point-based processing: Image transformation, Histogram equalization 
● Area-based processing: Filtering → Convolution and Correlation 
● Canny Edge Detector (Explain how the edge detector works, step-by-step) 
● Harris Corner Detector (Most probably explaining how it works step-by-step again) 
● Case: SIFT and SURF explanation (according to Pak Diaz), paper-related, in the following link.

Point-based Processing 

Image Transformation (Transformasi Citra) 


For an in-depth explanation, you may open the following link.
 
Image transformation can be achieved through matrix multiplication. 

Rotation (Rotasi) 
The following formula is used to rotate an image, where θ (theta) is the angle of rotation:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
Easy way to remember rotation

Say you want to rotate a vector $(x, y)$ by 90 degrees, twice, counterclockwise.

In the first rotation, the coordinates of the vector become:

$(-y, x)$

In the second rotation, the coordinates become:

$(-x, -y)$

Now, mathematically, we can do this 90-degree rotation by multiplying some unknown 2×2 matrix with the vector, twice:
 
First multiplication:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

The result would be

$$\begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix}$$
Second multiplication:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} -y \\ x \end{bmatrix}$$

The result would be

$$\begin{bmatrix} -ay + bx \\ -cy + dx \end{bmatrix} = \begin{bmatrix} -x \\ -y \end{bmatrix}$$
Solving these equations, the full result matrix with entries a, b, c, d is

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

Since cos 90° = 0, sin 90° = 1, and −(sin 90°) = −1, we can rewrite the result matrix as:

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
Hurray ​\(^ω^\)  
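To check the trick numerically, here is a minimal NumPy sketch (the example vector is our own, not from the notes):

```python
import numpy as np

def rotation_matrix(theta_deg):
    """2x2 counterclockwise rotation matrix for an angle in degrees."""
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

R = rotation_matrix(90)
v = np.array([1.0, 0.0])           # example vector (1, 0)

v1 = R @ v                         # first 90-degree rotation  -> (0, 1)
v2 = R @ v1                        # second 90-degree rotation -> (-1, 0)
print(np.round(v1), np.round(v2))  # [0. 1.] [-1. -0.]
```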

Shearing (Shear) 
Shearing (a.k.a. skewing) is an operation that displaces lines vertically or horizontally, depending on the shear matrix used.
 
There are two types of shearing: 
 
● Vertical 
This type of shearing displaces lines vertically, depending on the values of α and x:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \alpha & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \qquad y' = \alpha x + y$$
● Horizontal 
This type of shearing displaces lines horizontally, depending on the values of α and y:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \qquad x' = x + \alpha y$$
Scaling 
A transformation that enlarges or shrinks the image by a certain scale factor (constant).
 
There are two kinds of scaling transformations: 
 
● Uniform/Isotropic scaling (same constant)
This type of scaling uses the same scale factor for the x and y components of the vector:

$$\begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix}$$
● Non-uniform/Anisotropic scaling (different constants)
This type of scaling uses different scale factors for the x and y components of the vector:

$$\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$
Translation (Translasi) 
A transformation that moves every component of the image by a given distance. It cannot be written as the product of a 2×2 matrix with a 2×1 vector.
 
● Homogeneous Coordinates (Koordinat Homogen) 

To allow translation, the image must use homogeneous coordinates, where the 2D vector $(x, y)$ is represented as a 3D vector $(x, y, z)$, with $z$ acting as a scale for the $x$ and $y$ components.
 
● Translation with Homogeneous Coordinates (Translasi dalam Koordinat Homogen) 
Translation can be written as the product of a 3×3 matrix with a homogeneous vector (with $z = 1$):

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \alpha \\ 0 & 1 & \beta \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
Where 𝑥 is moved by 𝛼 units and 𝑦 by 𝛽. 

Converting a 2x2 matrix to 3x3 for homogeneous coordinates

The 2×2 transformation matrix can be converted to a 3×3 matrix for transformation with homogeneous coordinates:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \rightarrow \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
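A minimal sketch of composing transformations in homogeneous coordinates (the point and the offsets are made-up examples):

```python
import numpy as np

def translation(alpha, beta):
    """3x3 homogeneous translation: moves x by alpha and y by beta."""
    return np.array([[1, 0, alpha],
                     [0, 1, beta],
                     [0, 0, 1]], dtype=float)

def rotation_h(theta_deg):
    """2x2 rotation embedded in a 3x3 homogeneous matrix."""
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0,          0,         1]])

p = np.array([2.0, 1.0, 1.0])      # point (2, 1) with z = 1
# Rotate 90 degrees, then translate by (5, -3), as one matrix product:
M = translation(5, -3) @ rotation_h(90)
print(np.round(M @ p))             # [ 4. -1.  1.], i.e. (-1, 2) + (5, -3)
```

Because translation is now just another matrix, any chain of transformations collapses into a single 3×3 matrix product.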

Histogram Equalization 
To calculate the equalized histogram, use the CDF (cumulative distribution function):

$$\mathrm{CDF}(k) = \frac{1}{MN}\sum_{j=0}^{k} f_j$$

By calculating the CDF, we can obtain the new intensity of every level by rounding each $f_N$ result (the table below rounds to the nearest integer; the worked example later in these notes floors instead):

$$f_N = \mathrm{CDF}(k) \times (L - 1)$$

L = number of intensity levels
f_k = frequency of intensity k (so the sum above is the cumulative frequency)
 
Example:

| Intensity | f_k | CDF   | f_N  | New intensity | New f_k   |
|-----------|-----|-------|------|---------------|-----------|
| 0         | 2   | 2/25  | 0.56 | 1             | 2         |
| 1         | 4   | 6/25  | 1.68 | 2             | 4         |
| 2         | 5   | 11/25 | 3.08 | 3             | 5         |
| 3         | 2   | 13/25 | 3.64 | 4             | ↓         |
| 4         | 3   | 16/25 | 4.48 | 4             | 2 + 3 = 5 |
| 5         | 3   | 19/25 | 5.32 | 5             | 3         |
| 6         | 3   | 22/25 | 6.16 | 6             | 3         |
| 7         | 3   | 25/25 | 7    | 7             | 3         |

(Intensities 3 and 4 both map to 4, so their frequencies merge: 2 + 3 = 5.)

  
 

Intensity Transformation (point operators) 


Image Negative

$$s = (L - 1) - r$$

where s is the output intensity value for input intensity r.

Using the equation above, we reverse the intensity levels of an image to produce the equivalent of a photographic negative.

This type of processing is suited for enhancing white or gray detail embedded in mostly dark regions of an image.

 
Log Transformation

$$s = c \log(1 + r)$$

where c is a constant (usually 1) and $r \ge 0$.

This type of transformation is suited for expanding the dark values in an image while compressing the high-intensity values.
 
 
We can see from Figure 3.3 that:
● The log function maps a narrow range of low input intensities to a wide range of output levels, and a wide range of high input intensities to a narrow range of output levels.
● The inverse log function does the opposite (low intensity -> narrow output, high intensity -> wide output).
 
Power Law (Gamma) Transformation

$$s = c\,r^{\gamma}$$

where s is the output intensity value, and c and γ are positive constants.


 
● Gamma transformation is more versatile than the log transformation for compressing intensity values.
● A variety of devices used for image capture, printing, and display respond according to a power law. The process used to correct these power-law response phenomena is called gamma correction.
 
 
 
We can see from Figure 3.6 that:
● Fractional values (0 < γ < 1) map a narrow range of low input intensities to a wider range of output values, while the opposite is true for high input intensities. (Lowering a fractional gamma may reduce the contrast of an image and make it look “washed out”.)
● Values of γ greater than 1 map a wide range of low input intensities to a narrow range of output values, while the opposite is true for high input intensities.
● The gamma transformation becomes the identity transformation when γ = 1.
 
Example of gamma correction:

A CRT device has an intensity-to-voltage response that is a power function with exponent γ = 2.5. Looking at Figure 3.6 (γ = 2.5), the response of the CRT tends to produce a darker image. We see in Figure 3.7(b) that indeed the image viewed on the CRT monitor is darker than the original image in Figure 3.7(a).

Thus we need to apply gamma correction with the power-law transformation

$$s = r^{1/2.5} = r^{0.4}$$

with c = 1 before displaying the image on the CRT monitor.

 
Gamma correction is useful for:
● Displaying an image accurately on a computer screen.
● Reproducing the colors of an image correctly (gamma changes not only the intensity values but also the ratio of red, green, and blue in a color image).
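All three point operators above are one-liners on a NumPy array. A sketch, assuming an 8-bit grayscale image (L = 256); the scaling constants are our own choices to keep outputs in range:

```python
import numpy as np

L = 256                                   # number of intensity levels (8-bit)
img = np.arange(256, dtype=np.uint8)      # stand-in for a real grayscale image
r = img.astype(float)

negative = (L - 1) - r                    # s = (L-1) - r

c = (L - 1) / np.log(L)                   # chosen so outputs stay in [0, 255]
log_t = c * np.log(1 + r)                 # s = c * log(1 + r)

gamma = 0.4                               # fractional gamma brightens the image
power = (L - 1) * (r / (L - 1)) ** gamma  # s = c * r^gamma on normalized input

out = np.clip(power, 0, L - 1).astype(np.uint8)
```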
 
Histogram Equalization 
 
The probability of occurrence of an input intensity level in a digital image is approximated by:

$$p_r(r_j) = \frac{n_j}{MN}$$
where
● $r_j$ is the input intensity level j (0 to L−1, where L is the color bit depth, i.e., the number of bins).
● $n_j$ is the number of pixels that have intensity level j.
● M is the number of pixel rows and N is the number of pixel columns (for example, if the image resolution is 640 × 480, then MN = 307200).

The discrete form of the transformation function (the CDF) is:

$$s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j) = \frac{L-1}{MN}\sum_{j=0}^{k} n_j$$

where $s_k$ (in the output image) is the mapping of each corresponding pixel with intensity $r_k$ (in the input image).
 
Example:
Say there is a 3-bit image represented as a 5×5 matrix:

5 6 3 1 5
1 2 5 3 3
6 4 1 7 7
3 4 0 6 2
2 7 5 0 5
 
We can calculate the frequency of each intensity value:

| Intensity $r_j$ | Frequency $n_j$ |
|-----------------|-----------------|
| 0               | 2               |
| 1               | 3               |
| 2               | 3               |
| 3               | 4               |
| 4               | 2               |
| 5               | 5               |
| 6               | 3               |
| 7               | 3               |

Since a 3-bit image contains 8 different intensity values, (L−1) = (8−1) = 7, and MN = 5×5 = 25.
 
The equation becomes:

$$s_k = \frac{7}{25}\sum_{j=0}^{k} n_j$$

Calculate each $s_k$ from 0 to 7:
s_0 = 7/25 × 2 = 0.56
s_1 = 7/25 × (2 + 3) = 1.4
s_2 = 7/25 × (2 + 3 + 3) = 2.24
s_3 = 7/25 × (2 + 3 + 3 + 4) = 3.36
s_4 = 7/25 × (2 + 3 + 3 + 4 + 2) = 3.92
s_5 = 7/25 × (2 + 3 + 3 + 4 + 2 + 5) = 5.32
s_6 = 7/25 × (2 + 3 + 3 + 4 + 2 + 5 + 3) = 6.16
s_7 = 7/25 × (2 + 3 + 3 + 4 + 2 + 5 + 3 + 3) = 7
 
Floor all fractional results, since pixel values cannot be fractions (IIRC PaoPao said to round down):

s_0 = 0 (no change)
s_1 = 1 (no change)
s_2 = 2 (no change)
s_3 = 3 (no change)
s_4 = 3 (changed)
s_5 = 5 (no change)
s_6 = 6 (no change)
s_7 = 7 (no change)
 
Since only the pixels of intensity 4 are mapped to 3 in the output image, we replace intensity 4 with 3, and the output image matrix becomes (changes in bold):

5 6 3 1 5
1 2 5 3 3
6 **3** 1 7 7
3 **3** 0 6 2
2 7 5 0 5

(Admittedly a poor example, since the histogram distribution was fairly balanced to begin with.)
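The whole worked example can be reproduced in a few NumPy lines; a sketch that follows the flooring convention used above:

```python
import numpy as np

def equalize(img, L=8):
    """Histogram equalization: s_k = floor((L-1)/MN * cumulative count)."""
    counts = np.bincount(img.ravel(), minlength=L)   # n_j per intensity
    cdf = np.cumsum(counts)                          # cumulative frequency
    s = np.floor((L - 1) * cdf / img.size).astype(img.dtype)
    return s[img]                                    # map every pixel r -> s_r

img = np.array([[5, 6, 3, 1, 5],
                [1, 2, 5, 3, 3],
                [6, 4, 1, 7, 7],
                [3, 4, 0, 6, 2],
                [2, 7, 5, 0, 5]])
print(equalize(img))   # only intensity 4 changes, mapping to 3
```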
 
Spatial Transformation (Neighbourhood operations)
 
Definition of filters 
 
There are two kinds of filter:
● Low-pass filter: a filter that passes low frequencies; the effect produced by this filter is blurring/smoothing an image (such filters are also called averaging filters).
● High-pass filter: a filter that passes high frequencies; the effect produced by this filter is sharpening (if the result of the filter is added back to the original image).
 
We can achieve these effects by using spatial filters (also called spatial masks). A spatial filter consists of:
1. A neighbourhood, typically a small rectangle.
2. A predefined operation that is performed on the image pixels encompassed by the neighbourhood.

Spatial filtering creates a new pixel (in the output image) with coordinates equal to the coordinates of the center of the neighbourhood. If the operation performed on the image is linear, the filter is called a linear spatial filter; otherwise the filter is nonlinear.
Spatial Correlation and Convolution 
 
There are two methods of spatial filtering:
1. Correlation is the process of moving the filter mask over the image and computing the sum of products at each location.
2. Convolution is the same as correlation, but the filter mask is rotated 180 degrees first.

Note that if the filter mask is symmetric, then correlation and convolution lead to the same result.
 
 

 
Here is a step-by-step video on how to convolve a mask with an image:
https://youtu.be/XuD4C8vJzEQ?t=185
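To make the sum of products and the 180-degree rotation explicit, here is a direct (unoptimized) sketch with a toy image:

```python
import numpy as np

def correlate2d(image, mask):
    """Slide the mask over the image and take the sum of products at
    every location (zero padding keeps the output the same size)."""
    kh, kw = mask.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * mask)
    return out

def convolve2d(image, mask):
    """Convolution = correlation with the mask rotated 180 degrees."""
    return correlate2d(image, mask[::-1, ::-1])

image = np.array([[0, 0, 0, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0]])
box = np.ones((3, 3)) / 9.0   # symmetric box filter
# For a symmetric mask both operations give the same result:
print(np.allclose(correlate2d(image, box), convolve2d(image, box)))  # True
```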

 
 
 
Smoothing Spatial Filters (averaging)

● Smoothing is analogous to integration.
● Smoothing filters are used for blurring (removal of small details in an image) and noise reduction.

Blurring occurs because replacing each pixel with the average intensity of its neighbourhood reduces sharp transitions in intensity between adjacent pixels, which also leads to noise reduction. However, edges (which are also characterized by sharp intensity transitions) get blurred as well.

 
● The mask in Figure 3.32(a) is called a box filter because all the coefficients in the matrix are the same.
● The mask in Figure 3.32(b) is called a weighted average filter; this terminology indicates that pixels are multiplied by different coefficients, giving more importance/weight to some pixels (in this case, the closer a pixel is to the center, the larger its coefficient).

Sharpening Spatial Filters

● Sharpening is analogous to differentiation.
● Sharpening filters are based on first-order and second-order derivatives.
● All the coefficients in the mask must sum to 0 (an image with constant intensity must have zero derivative).
 
Unsharp Masking and High Boost Filtering

Unsharp masking is sharpening an image by subtracting an unsharp (smoothed) version of the original image from the original image.
High-boost filtering is multiplying the mask created from unsharp masking by a constant k > 1.

The process of unsharp masking and high-boost filtering is (a sketch follows this list):
1. Blur the original image.
2. Subtract the blurred image from the original (this difference is the mask).
3. Multiply the mask by some constant k > 1.
4. Add the multiplied mask to the original image.
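A sketch of these four steps, using SciPy's Gaussian filter for the blur (k and sigma are arbitrary example values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def high_boost(image, k=1.5, sigma=1.0):
    """Unsharp masking (k = 1) / high-boost filtering (k > 1)."""
    img = image.astype(float)
    blurred = gaussian_filter(img, sigma)   # step 1: blur
    mask = img - blurred                    # step 2: unsharp mask
    sharpened = img + k * mask              # steps 3-4: weight and add back
    return np.clip(sharpened, 0, 255).astype(np.uint8)

# usage: sharp = high_boost(gray_image, k=2.0)
```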

 
Edge Detection

How does a computer define an edge? It is a sudden change in colour, or in colour intensity.
Mathematical definition: an edge is a zero-crossing point of the second derivative.

All you have to understand is that the derivative is the gradient (slope) at any point on the graph.
The first-derivative graph is obtained by calculating the gradient at every point of the colour-intensity graph.
The second-derivative graph is obtained by calculating the gradient at every point of the first-derivative graph.
First and second order derivative

The first-order derivative in a digital image is defined as the partial derivative with respect to both x and y:

$$\frac{\partial f}{\partial x} = f(x+1, y) - f(x, y), \qquad \frac{\partial f}{\partial y} = f(x, y+1) - f(x, y)$$

And for the second-order derivative:

$$\frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y), \qquad \frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y)$$
 
 
Laplacian Edge Detection

The second-order derivative in image processing is implemented using the Laplacian operator.

The Laplacian operator is defined as the sum of the second-order derivatives with respect to x and y:

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$

The equation above can be implemented as a 3×3 filter mask:

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

● The left mask does not take diagonal pixels into account when computing the derivative and is invariant to 90-degree rotation.
● The right mask is an extension of the original equation that also takes the diagonal pixels into account, and is invariant to 90- and 45-degree rotation.
● A rotation-invariant mask is called an isotropic filter.
● We can sharpen an image by adding the result of filtering with the Laplacian mask to the original image.
 
 
The first-order derivative in image processing is implemented using the Sobel masks, among others.
 
Sobel Edge Detection

The two principal Sobel masks are:

$$\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \qquad \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$

● One mask computes the derivative in the horizontal direction, the other in the vertical direction.
● The masks in the second row of the original figure compute the derivative in the diagonal directions.
● Sobel also smooths the image while differentiating.

Canny Edge Detector 


The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images.

The Canny edge detection algorithm consists of the following steps:

1. Noise reduction with a Gaussian filter
Convolve the image with a Gaussian filter to remove noise, since edge detection is highly sensitive to noise.
2. Gradient magnitude
First, we detect edge intensity and direction by calculating the gradient of the image using the Sobel filter along the x and y axes. Then we calculate the magnitude as the hypotenuse of the x and y derivatives. Finally, we calculate the direction (in degrees) of the gradient as the arctangent of the ratio of the y and x derivatives.
3. Non-maxima suppression
Non-maxima suppression is a method to eliminate spurious (read: false) edges and corners in Canny and Harris. By using it, you get only the true edges and corners. Basically, NMS checks the direction of the edge or corner and then checks the surrounding pixels, eliminating pixels with low intensity and keeping the high-intensity pixels.
4. Double thresholding
Set a high threshold to identify strong pixels (intensity higher than the high threshold) and a low threshold to identify non-relevant pixels (intensity lower than the low threshold). Pixels with intensity between the high and low thresholds are flagged as weak.
5. Hysteresis thresholding
Hysteresis helps us decide whether the pixels flagged by double thresholding should be considered strong or non-relevant. A weak pixel is transformed into a strong pixel if there is at least one strong pixel around it.
 
More details about Canny here!
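If you just want to see the five steps in action, OpenCV bundles steps 2-5 into a single call, so only the Gaussian blur of step 1 appears explicitly; a sketch with a made-up file name and thresholds:

```python
import cv2

# Hypothetical file name; any grayscale image works.
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Step 1 done explicitly; cv2.Canny then performs the gradient computation,
# non-maxima suppression, double thresholding, and hysteresis (steps 2-5).
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("edges.jpg", edges)
```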
Harris Corner Detector 
 
The Harris corner detector algorithm consists of the following steps:
1. Convert the image to grayscale and compute the image derivatives (optionally smooth it first).
2. Find the second moment matrix / structure tensor matrix M by approximating the response difference using a first-order Taylor expansion.
3. Plug the eigenvalues of the matrix M into the corner response function to get the response value R.
4. Perform non-maxima suppression on the list of candidate corners (R) to find the correct corners.
(gonna explain all of this later, maybe)
 
Step 1:
See APPENDIX A for the gradient, and the Smoothing Spatial Filters section for smoothing.
 
Step 2:
In this step we're going to use the SSD (sum of squared differences) to extract the structure tensor matrix. The purpose of this is to find the biggest response difference when we move the window in any direction (finding a candidate corner point, in a nutshell).
 
This window operation is mathematically defined as:

$$E(u, v) = \sum_{x,y} w(x, y)\,\big[I(x+u,\, y+v) - I(x, y)\big]^2$$

● E is the difference between the original and the moved window.
● u is the window's displacement in the x direction.
● v is the window's displacement in the y direction.
● w(x, y) is the window at position (x, y). This acts like a mask, ensuring that only the desired window is used.
● I is the intensity of the image at a position (x, y).
● I(x+u, y+v) is the intensity of the moved window.
● I(x, y) is the intensity of the original.

Let's ignore w(x, y) for now and focus on the squared difference:

$$\sum_{x,y} \big[I(x+u,\, y+v) - I(x, y)\big]^2$$

We can approximate $I(x+u, y+v)$ using a first-order multivariate Taylor expansion, and the equation becomes:

$$\sum_{x,y} \big[I(x, y) + u I_x + v I_y - I(x, y)\big]^2$$

In the above equation, $I(x, y)$ cancels out, so expanding gives:

$$\sum_{x,y} u^2 I_x^2 + 2uv\, I_x I_y + v^2 I_y^2$$

This can be turned into a matrix-vector multiplication form (since the summation depends only on x and y, we can leave $\begin{bmatrix} u & v \end{bmatrix}$ and its transpose outside the summation):

$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} \left( \sum_{x,y} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix}$$

Now we can extract the matrix in parentheses, which is called the structure tensor matrix / second moment matrix M (adding back w(x, y), since it also sits inside the summation):

$$M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

Now the window operation is simplified into:

$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
Step 3:

Compute the eigenvalues of every matrix M (one for each x, y coordinate).

Forgot how to calculate eigenvalues? Here's a refresher: solve

$$\det(M - \lambda I) = 0$$

Then plug them into the response function.

The response function is defined as:

$$R = \det(M) - k\,(\operatorname{trace} M)^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2$$

where k is an empirically chosen constant (typically 0.04 to 0.06).
Step 4:

Perform NMS on the list of R corner coordinates to find the best corners and eliminate unnecessary corner candidates that do not lie on the 'true edges' (see APPENDIX A).
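These four steps map directly onto OpenCV's cornerHarris; a sketch (the file name, blockSize, and the final thresholding are our own choices, and the thresholding is only a crude stand-in for full NMS):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# blockSize = neighbourhood window for M, ksize = Sobel aperture (step 1),
# k = the empirical constant in R = det(M) - k * trace(M)^2 (step 3).
R = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)

# Crude thresholding of the response map as a stand-in for NMS (step 4):
corners = np.argwhere(R > 0.01 * R.max())
print(len(corners), "candidate corners")
```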

Blob Detection 
In computer vision, blob detection methods are aimed at detecting regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions.
 
Methods:  
● Laplacian of Gaussian (LoG) 
● Difference of Gaussians (DoG) 
● Determinant of Hessian (DoH) 
 
Laplacian of Gaussian

Given an input image I, create a Gaussian-blurred version of it, G. Applying the Laplacian operator to the blurred image (taking the second derivative, which is the very definition of an edge, if you remember) gives you the LoG. (source: http://fourier.eng.hmc.edu/e161/lectures/gradient/node8.html)
 
Difference of Gaussians

Given an input image I, create multiple Gaussian-blurred versions of it with different kernel sizes and take the differences between consecutive pairs; this approximates the Laplacian. (SIFT uses this.)
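A sketch of DoG with SciPy; sigma and the scale ratio k ≈ 1.6 (the classic LoG-approximating choice) are conventional values, not from the notes:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma=1.0, k=1.6):
    """Blur the image at two nearby scales and subtract; the result
    approximates the Laplacian of Gaussian at that scale."""
    img = image.astype(float)
    return gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)
```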
 
 
 
 
Determinant of Hessian

The Hessian operator is, simply put, a better version of the Laplacian operator. (SURF uses this.)
(source: https://en.wikipedia.org/wiki/Blob_detection#The_determinant_of_the_Hessian)

This is simply because the Hessian operator contains more information: it contains all the possible second-order partial derivatives, whereas the Laplacian operator only stores the sum of the second-order partial derivatives. The Hessian matrix looks like:

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$
Hough Transform 
The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing.[1] The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.
The classical Hough transform was concerned with the identification of lines in the image, but it has since been extended to identifying positions of arbitrary shapes, most commonly circles or ellipses.
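A sketch of the voting procedure in practice, using OpenCV's classical line transform (the file name and vote threshold are made up):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
edges = cv2.Canny(img, 50, 150)                      # Hough votes on an edge map

# Each detected line comes back as (rho, theta) in parameter space;
# `threshold` is the minimum number of accumulator votes.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=100)
if lines is not None:
    for rho, theta in lines[:, 0]:
        print(f"rho={rho:.1f}, theta={np.degrees(theta):.1f} deg")
```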

Image Descriptors 
● Most features can be thought of as templates, histograms (counts), or combinations 
● The ideal descriptor should be 
○ Robust and Distinctive 
○ Compact and Efficient 
● Most available descriptors focus on edge/gradient information 
○ Capture texture information 
○ Color rarely used 
 
Main Components 
1. Detection: Identify the interest points 
2. Description: Extract vector feature descriptor surrounding each interest point. 
3. Matching: Determine correspondence between descriptors in two views 
Scale Invariant Feature Transform (SIFT) 
https://en.wikipedia.org/wiki/Scale-invariant_feature_transform 
 
 
 
The scale-invariant feature transform (SIFT) is a feature detection algorithm in computer vision to detect and describe local features in images, published by David Lowe in 1999. SIFT can robustly identify objects even among clutter and under partial occlusion, because the SIFT feature descriptor is invariant to uniform scaling, orientation, and illumination changes, and partially invariant to affine distortion.

Affine distortion example:
An image of a fern-like fractal that exhibits affine self-similarity.
 
SIFT keypoints of objects are first extracted from a set of reference images and stored in a database. An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on the Euclidean distance of their feature vectors.
Key locations are defined as maxima and minima of the result of a difference of Gaussians (DoG) function applied in scale space to a series of smoothed and resampled images.

SIFT uses a modified version of the k-d tree (a binary search tree that stores k-dimensional coordinates) called the best-bin-first search method[14] that can identify the nearest neighbors with high probability using only a limited amount of computation. The BBF algorithm uses a modified search ordering for the k-d tree algorithm so that bins in feature space are searched in the order of their closest distance from the query location. This search order requires the use of a heap-based priority queue for efficient determination of the search order. The best candidate match for each keypoint is found by identifying its nearest neighbor in the database of keypoints from training images. The nearest neighbors are defined as the keypoints with minimum Euclidean distance from the given descriptor vector. The probability that a match is correct can be determined by taking the ratio of the distance from the closest neighbor to the distance of the second closest.

Lowe[3] rejected all matches in which the distance ratio is greater than 0.8, which eliminates 90% of the false matches while discarding less than 5% of the correct matches. To further improve efficiency, the best-bin-first search was cut off after checking the first 200 nearest-neighbor candidates. For a database of 100,000 keypoints, this provides a speedup over exact nearest-neighbor search by about 2 orders of magnitude, yet results in less than a 5% loss in the number of correct matches.
SIFT uses the Hough transform to identify clusters of features with a consistent interpretation, by using each feature to vote for all object poses that are consistent with the feature. When clusters of features are found to vote for the same pose of an object, the probability of the interpretation being correct is much higher than for any single feature.

Finally, outliers can now be removed by checking for agreement between each image feature and the model, given the parameter solution. Given the linear least squares solution (linear regression), each match is required to agree within half the error range that was used for the parameters in the Hough transform bins. As outliers are discarded, the linear least squares solution is re-solved with the remaining points, and the process is iterated. If fewer than 3 points remain after discarding outliers, the match is rejected. In addition, a top-down matching phase is used to add any further matches that agree with the projected model position, which may have been missed from the Hough transform bin due to the similarity-transform approximation or other errors.

The final decision to accept or reject a model hypothesis is based on a detailed probabilistic model.[15] This method first computes the expected number of false matches to the model pose, given the projected size of the model, the number of features within the region, and the accuracy of the fit. A Bayesian probability analysis then gives the probability that the object is present based on the actual number of matching features found. A model is accepted if the final probability for a correct interpretation is greater than 0.98. Lowe's SIFT-based object recognition gives excellent results except under wide illumination variations and under non-rigid transformations.
 
SIFT consists of the following steps:
1. Scale-space extrema detection
Use the difference of Gaussians (DoG) to identify potential interest points that are invariant to scale and orientation.
2. Keypoint localization
Reject low-contrast points and eliminate edge responses.
3. Orientation assignment
Each keypoint is assigned one or more orientations based on local image gradient directions, to achieve invariance to rotation.
4. Keypoint descriptor
Compute a descriptor vector for each keypoint, such that the descriptor is highly distinctive and partially invariant to the remaining variations.
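A sketch of the detection/description/matching pipeline with OpenCV, including Lowe's 0.8 ratio test from above (file names are hypothetical; BFMatcher stands in for best-bin-first; SIFT_create needs OpenCV >= 4.4):

```python
import cv2

# Hypothetical file names for a reference object and a cluttered scene.
img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matcher instead of best-bin-first; the ratio test keeps a
# match only if it is clearly closer than the second-best candidate.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(len(good), "matches passed the 0.8 ratio test")
```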
 

Speeded up robust features (SURF) 


SURF is partly inspired by the scale-invariant feature transform (SIFT) descriptor. The standard version of SURF is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT, but SURF is sometimes less accurate when faced with rotations. It is also only partially invariant to affine distortion, like SIFT, meaning it can sometimes be inaccurate.
To detect interest points, SURF uses an integer approximation of the determinant of Hessian blob detector, which can be computed with 3 integer operations using a precomputed integral image. Its feature descriptor is based on the sum of the Haar wavelet responses around the point of interest. These can also be computed with the aid of the integral image.

APPENDIX A 
Image gradient 
 
Let's define the derivative with respect to x as $g_x$ and with respect to y as $g_y$:

$$g_x = \frac{\partial f}{\partial x}, \qquad g_y = \frac{\partial f}{\partial y}$$

(see the First and second order derivative section for an explanation of the derivation).

To find the edge strength and direction at location (x, y) in image f, we need to compute the gradient:

$$\nabla f = \begin{bmatrix} g_x \\ g_y \end{bmatrix}$$

The magnitude (length) of the gradient vector, denoted M(x, y), is its Euclidean norm:

$$M(x, y) = \sqrt{g_x^2 + g_y^2}$$

The direction of the gradient vector at point (x, y), given as an angle with respect to the x axis, is:

$$\alpha(x, y) = \tan^{-1}\!\left(\frac{g_y}{g_x}\right)$$
 
We can use gradient operators (such as the Sobel masks) to compute edge direction and strength.
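A sketch computing M(x, y) and α(x, y) with Sobel derivatives (the file name is hypothetical):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # g_x = df/dx
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)   # g_y = df/dy

M = np.hypot(gx, gy)                             # M(x, y) = sqrt(gx^2 + gy^2)
alpha = np.degrees(np.arctan2(gy, gx))           # angle w.r.t. the x axis
```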
 

 
 
Non-Maxima suppression

Example:

Let $d_1$, $d_2$, $d_3$, and $d_4$ denote the four basic edge directions for a 3×3 region: horizontal (0° and ±180°), −45°, vertical (+90° and −90°), and +45°, respectively. We can formulate the following non-maxima suppression scheme for a 3×3 region centered at point (x, y) in the image:
1. Compute the gradient magnitude M(x, y) and angle α(x, y).
2. Find the direction $d_k$ that is closest to α(x, y).
a. For example: if α(x, y) = 20°, the closest direction to the gradient (the edge normal) is horizontal, since (20 − 0) = 20 < (45 − 20) = 25.
b. Since the edge direction is perpendicular to the edge normal, the edge direction is 0 + 90 = +90° and 0 − 90 = −90° (the vertical direction).
3. If the value of M(x, y) is less than at least one of its two neighbors along $d_k$, let f(x, y) = 0 (suppression); otherwise, let f(x, y) = M(x, y).
a. Continuing the example in 2: the two neighbors along the horizontal direction $d_1$ are (x, y − 1) and (x, y + 1), with x as the row index and y as the column index.
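A sketch of this scheme in NumPy, comparing each pixel with its two neighbours along the direction closest to the gradient angle (the 22.5° bin boundaries are a common convention, not from the notes):

```python
import numpy as np

def non_maxima_suppression(M, alpha):
    """Keep a pixel only if it is a local maximum along the direction
    closest to its gradient angle (M = magnitude, alpha = degrees).
    Border pixels are skipped for simplicity."""
    out = np.zeros_like(M)
    ang = np.mod(alpha, 180.0)        # directions are symmetric modulo 180
    H, W = M.shape
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            a = ang[i, j]
            if a < 22.5 or a >= 157.5:   # ~horizontal gradient: row neighbours
                n1, n2 = M[i, j - 1], M[i, j + 1]
            elif a < 67.5:               # ~ +45 degree gradient
                n1, n2 = M[i - 1, j + 1], M[i + 1, j - 1]
            elif a < 112.5:              # ~vertical gradient: column neighbours
                n1, n2 = M[i - 1, j], M[i + 1, j]
            else:                        # ~ -45 degree gradient
                n1, n2 = M[i - 1, j - 1], M[i + 1, j + 1]
            if M[i, j] >= n1 and M[i, j] >= n2:
                out[i, j] = M[i, j]
    return out
```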
 
 
 
Characteristics of Good Feature Detectors (may come up in theory)
● Repeatability
○ The same feature can be found in several images despite geometric and photometric transformations
● Saliency
○ Each feature is distinctive
● Compactness and efficiency
○ Many fewer features than image pixels
● Locality
○ A feature occupies a relatively small area of the image; robust to clutter and occlusion
Criteria for Optimal Edge Detection (this too)
● Good detection
○ The optimal detector must minimize the probability of false positives (detecting spurious edges caused by noise), as well as that of false negatives (missing real edges)
● Good localization
○ The edges detected must be as close as possible to the true edges
● Single response constraint
○ The detector must return one point only for each true edge point, that is, minimize the number of local maxima around the true edge (created by noise)
