
Projected Inter-Active Display

for DLSU-Manila Campus Map

by

Arcellana, Anthony A.
Ching, Warren S.
Guevara, Ram Christopher M.
Santos, Marvin S.
So, Jonathan N.

October 2006
Chapter 3 Theoretical Considerations
3.1 Image

3.1.1 Image Representation

A digital image is a representation of a two-dimensional image as a finite set of digital values called pixels, a term derived from “picture element”. The image has been discretized both in spatial coordinates and in brightness. Each pixel of an image corresponds to a part of a physical object in the 3D world, which is illuminated by light that is partly reflected and partly absorbed by it. Part of the reflected light reaches the sensor used to image the scene and is responsible for the value recorded for that specific pixel. The pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers (Petrou & Bosdogianni, 1999). The number of horizontal and vertical samples in the pixel grid is called the image dimensions, specified as width × height. The pixel values are often transmitted or stored in a compressed form. The number of bits b needed to store an image of size N × N with 2^m different grey levels is

b = N × N × m

Because this determines the storage size, m and N are often reduced as far as possible without significant loss of quality. Digital images can be created in a variety of ways with input devices such as digital cameras and scanners.
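As a small illustration of this representation, the following C++ sketch (with illustrative names; the dimensions and bit depth are arbitrary example values) stores a grayscale image as a two-dimensional array of 8-bit samples and evaluates the storage formula b = N × N × m.

    #include <cstdio>
    #include <vector>

    int main() {
        const int N = 8;   // example: an 8 x 8 image (arbitrary size)
        const int m = 8;   // 8 bits per sample gives 2^m = 256 grey levels

        // Raster map: a two-dimensional array of small integers (one byte per pixel).
        std::vector<std::vector<unsigned char>> image(N, std::vector<unsigned char>(N, 0));
        image[3][4] = 255;                 // set one pixel to white

        const long b = (long)N * N * m;    // storage requirement in bits: b = N x N x m
        std::printf("A %dx%d image with %d-bit samples needs %ld bits (%ld bytes)\n",
                    N, N, m, b, b / 8);
        return 0;
    }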

3.1.2 Binary and Grayscale

Digital images can be classified according to the number and nature of the values each pixel may take; common kinds are binary, grayscale, and color images. Binary images are images that have been quantized to two values, usually denoted 0 and 1, but often stored as pixel values 0 and 255, representing black and white. A grayscale image is an image in which the value of each pixel is a single sample. Images of this sort are typically composed of shades of gray, varying from black to white depending on intensity, though in principle the samples could be displayed as shades of any color, or even coded with different colors for different intensities. An example of such an image is shown in figure 3.1. The leftmost image is the letter “a” as a grayscale image with intensities from 0 to 255, the center image is a zoomed-in version that reveals the individual pixels of the letter, and the rightmost image shows the normalized numerical value of each pixel. For this example the coding used is that 1 (255) is brightest and 0 (0) is darkest.

Figure 3.1 A grayscale image of the letter “a” (left), a zoomed-in view showing its individual pixels (center), and the normalized pixel values (right)
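The normalized coding used in the rightmost panel of Figure 3.1 amounts to mapping the 8-bit intensities onto the range [0, 1]; a minimal sketch of that mapping is shown below (the function name is illustrative).

    #include <cstdio>

    // Map an 8-bit grey level (0 = darkest, 255 = brightest) onto the range [0, 1].
    double normalize(unsigned char grey) {
        return grey / 255.0;
    }

    int main() {
        unsigned char samples[] = {0, 64, 128, 255};
        for (unsigned char s : samples)
            std::printf("%3d -> %.2f\n", s, normalize(s));
        return 0;
    }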
3.1.3 Color

A color image is a digital image that includes color information for each pixel, usually stored in memory as a raster map, a two-dimensional array of small integer triplets, or as three separate raster maps, one for each channel. One of the most popular colour models is the RGB model. The primaries red, green, and blue were formalized by the CIE (Commission Internationale d'Eclairage), which in 1931 specified the spectral characteristics of red (R), green (G), and blue (B) to be monochromatic light of wavelengths 700 nm, 546.1 nm, and 435.8 nm respectively (Morris, 2004). Almost any colour can be matched using a linear combination of red, green, and blue:

C = rR + gG + bB

Today there are many RGB standards in use, among them ISO RGB, sRGB, ROMM RGB, and NTSC RGB (Buckley et al., 1999). These standards are specifications of the RGB color space for particular applications.
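In memory, then, a color image can be held either as a single raster map of triplets or as three separate channel maps; the C++ sketch below illustrates the two layouts described above (names and dimensions are illustrative).

    #include <vector>

    // One pixel of a colour image: a triplet of small integers, one per channel.
    struct RgbPixel {
        unsigned char r, g, b;
    };

    int main() {
        const int width = 4, height = 3;   // arbitrary example dimensions

        // Layout 1: a single raster map of RGB triplets.
        std::vector<std::vector<RgbPixel>> packed(height, std::vector<RgbPixel>(width));

        // Layout 2: three separate raster maps, one for each channel.
        std::vector<std::vector<unsigned char>> red(height, std::vector<unsigned char>(width));
        std::vector<std::vector<unsigned char>> green(height, std::vector<unsigned char>(width));
        std::vector<std::vector<unsigned char>> blue(height, std::vector<unsigned char>(width));

        // Both layouts carry the same information; converting between them is direct.
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x) {
                red[y][x]   = packed[y][x].r;
                green[y][x] = packed[y][x].g;
                blue[y][x]  = packed[y][x].b;
            }
        return 0;
    }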

Figure 3.2 The RGB colorspace
3.1.4 Resolution

The term resolution is often used as a pixel count in digital imaging. Resolution is sometimes given as the width and height of the image as well as the total number of pixels in the image. For example, an image that is 2048 pixels wide and 1536 pixels high (2048 × 1536) contains 3,145,728 pixels, or about 3.1 megapixels. The resolution of an image expresses how much detail we can see in it clearly, and it depends on N and m. Resolution is also a measurement of sampling density: the resolution of a bitmap image gives the relationship between its pixel dimensions and its physical dimensions, and the most often used measurement is ppi, pixels per inch.

3.1.5 Scaling / Resampling

When we need an image with dimensions different from those of the image we have, we scale the image. Resampling algorithms try to reconstruct the original continuous image and sample it again on a new grid.
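A minimal sketch of such a resampling, assuming nearest-neighbour interpolation (the text does not name a particular algorithm, and the helper name is illustrative): a new sample grid is laid over the source image and each new pixel takes the value of the nearest original sample.

    #include <vector>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Resample src (srcW x srcH) onto a new grid of dstW x dstH pixels
    // using nearest-neighbour interpolation.
    Image resampleNearest(const Image& src, int srcW, int srcH, int dstW, int dstH) {
        Image dst(dstH, std::vector<unsigned char>(dstW));
        for (int y = 0; y < dstH; ++y) {
            for (int x = 0; x < dstW; ++x) {
                int sx = x * srcW / dstW;   // corresponding source column
                int sy = y * srcH / dstH;   // corresponding source row
                dst[y][x] = src[sy][sx];
            }
        }
        return dst;
    }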

3.1.6 Sample depth

Sample depth is the precision of the binary representation used for each sample of the image, that is, the number of bits per sample. The spatial continuity of the image is approximated by the spacing of the samples in the sample grid, and the values that can be represented for each pixel are determined by the sample format chosen.


3.2 Input and Output Devices

3.2.1 PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing via the Internet. Images acquired from this device can be uploaded to a web server, making them accessible through the World Wide Web, instant messaging, or a PC video-calling application. Web cameras typically include a lens, an image sensor, and some support electronics. The image sensor can be a CMOS or a CCD device, the former being dominant for low-cost cameras. Typically, consumer webcams offer a resolution in the VGA region at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to adjust the camera focus manually. Support electronics read the image from the sensor and transmit it to the host computer.
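Frames from such a camera can be grabbed through the OpenCV library discussed in Section 3.6; the sketch below, assuming the OpenCV 1.x C interface and a single webcam at index 0, simply pulls a short burst of frames and prints their dimensions.

    #include <cv.h>
    #include <highgui.h>
    #include <cstdio>

    int main() {
        CvCapture* capture = cvCaptureFromCAM(0);   // first camera on the system
        if (!capture) {
            std::printf("No camera found\n");
            return 1;
        }
        // Grab a handful of frames; cvQueryFrame returns a buffer owned by the capture.
        for (int i = 0; i < 25; ++i) {
            IplImage* frame = cvQueryFrame(capture);
            if (!frame) break;
            std::printf("frame %d: %d x %d\n", i, frame->width, frame->height);
        }
        cvReleaseCapture(&capture);
        return 0;
    }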

3.2.2 Projector

Projectors are classified into two technologies, DLP (Digital Light Processing) and LCD (Liquid Crystal Display). This refers to the internal mechanisms that the projector uses to compose the image (Projectorpoint).

3.2.2.1 DLP

Digital Light Processing technology, originally developed by Texas Instruments, uses an optical semiconductor known as the Digital Micromirror Device, or DMD chip, to recreate the source material. There are two ways in which DLP projection creates a color image: with single-chip projectors and with three-chip projectors. In a single-chip projector, colors are generated by placing a color wheel between the lamp and the DMD chip. The color wheel is divided into four sectors: red, green, blue, and an additional clear section to boost brightness. The latter is often omitted, since it reduces color saturation. The DMD chip is synchronized with the rotating color wheel, so that when a certain color section of the wheel is in front of the lamp, that color is displayed by the DMD. In a three-chip DLP projector, a prism is used to split the light from the lamp; each primary color of light is routed to its own DMD chip, then recombined and directed out through the lens. Three-chip DLP is referred to in the market as DLP2.

3.2.2.2 LCD

LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal sent to the projector.

As the light passes through the LCD panels, individual pixels can be opened to allow light to pass or closed to block the light. This activity modulates the light and produces the image that is projected onto the screen (Projectorpoint).

3.2.2.3 Keystone Correction

Keystoning occurs when a projector is aligned non-perpendicularly to a screen, or when the projection screen has an angled surface. The resulting image is trapezoidal rather than rectangular (trapezoidal distortion). To avoid this distortion, keystone correction is applied (Projector People). Keystone correction is basically changing the shape of the projected image to compensate for the trapezoidal distortion (Presenters Online).

There are two methods of keystone correction, optical and digital. Optical keystone correction is done by physically modifying the light path through the lens; the correction is applied after the light has been reflected off the image panels in the projector. Digital keystone correction adjusts the image proportions by shrinking the image at the edge furthest away from the screen before the projector generates it (HTRgroup). The amount of keystone correction varies between projectors: some offer 13 to 35 degrees of vertical keystone correction, and some offer both vertical and horizontal correction (Projector People).
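A rough sketch of the digital approach, under simplifying assumptions (a purely linear shrink applied row by row, with illustrative names; real projectors apply a proper perspective warp): each row is narrowed by an amount that grows toward the edge farthest from the screen, so the projected trapezoid becomes rectangular again.

    #include <vector>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Pre-shrink each row so that the projector's trapezoidal spread cancels out.
    // 'shrink' (0 <= shrink < 1) is the fraction by which the farthest row (y = 0)
    // is narrowed; the nearest row (y = h - 1) is left at full width.
    Image digitalKeystone(const Image& src, double shrink) {
        int h = (int)src.size(), w = (int)src[0].size();
        Image dst(h, std::vector<unsigned char>(w, 0));
        for (int y = 0; y < h; ++y) {
            double scale = 1.0 - shrink * (double)(h - 1 - y) / (h > 1 ? h - 1 : 1);
            int rowWidth = (int)(w * scale);
            int offset = (w - rowWidth) / 2;          // keep each row centred
            for (int x = 0; x < rowWidth; ++x) {
                int sx = (int)(x / scale);            // sample from the full-width row
                if (sx >= w) sx = w - 1;
                dst[y][x + offset] = src[y][sx];
            }
        }
        return dst;
    }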

3.3 Image Processing

Image processing is basically the transformation of images into other images. The images undergo signal processing techniques that manipulate them to the user's needs; these techniques either enhance the wanted parts of an image or suppress the unwanted ones.

3.3.1 Preprocessing Algorithms

Preprocessing algorithms and techniques are used to perform the necessary data reduction and to make the subsequent analysis easier. This stage is basically where unwanted information is eliminated for the specific application. Such techniques include extracting the Region-of-Interest (ROI), performing basic mathematical operations, enhancing specific features, and reducing data (Umbaugh, 2005).

3.3.1.1 Defining Region-of-Interest


In image analysis we seldom need the whole image; we usually want to concentrate on a specified area of the image called the Region-of-Interest (ROI). Image geometry operations such as crop, zoom, and rotate are used to extract the ROI.
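A crop, for example, can be sketched as copying a rectangular window out of the image; the rectangle coordinates are whatever the application defines as its region of interest, and the helper name is illustrative.

    #include <vector>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Extract the rectangular Region-of-Interest whose top-left corner is (x0, y0)
    // and whose size is roiW x roiH pixels. The caller must keep the rectangle
    // inside the source image.
    Image cropRoi(const Image& src, int x0, int y0, int roiW, int roiH) {
        Image roi(roiH, std::vector<unsigned char>(roiW));
        for (int y = 0; y < roiH; ++y)
            for (int x = 0; x < roiW; ++x)
                roi[y][x] = src[y0 + y][x0 + x];
        return roi;
    }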

3.3.1.2 Arithmetic and Logical Operations

Arithmetic and logical operations are applied in the preprocessing stage to combine images in different ways. These operations include addition, subtraction, multiplication, division, AND, OR, and NOT.
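For instance, the pixel-wise subtraction of two images of the same size, one of the operations listed above, can be sketched as follows; the absolute value is taken here so the result stays in the 8-bit range, which is a simplifying choice rather than something prescribed by the text.

    #include <vector>
    #include <cstdlib>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Pixel-wise absolute difference of two images with identical dimensions.
    Image absoluteDifference(const Image& a, const Image& b) {
        Image out(a.size(), std::vector<unsigned char>(a[0].size()));
        for (size_t y = 0; y < a.size(); ++y)
            for (size_t x = 0; x < a[0].size(); ++x)
                out[y][x] = (unsigned char)std::abs((int)a[y][x] - (int)b[y][x]);
        return out;
    }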

3.3.1.3 Spatial Filters

Spatial filtering is used for noise reduction and image enhancement. This is done by applying filter functions or filter operators in the domain of the image space.
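As one concrete example of a spatial filter, the sketch below applies a 3 × 3 mean (averaging) mask, a common choice for noise reduction; the specific mask is an assumption, since the text does not name one, and border pixels are simply copied for brevity.

    #include <vector>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Apply a 3 x 3 mean filter; border pixels are left unchanged for simplicity.
    Image meanFilter3x3(const Image& src) {
        int h = (int)src.size(), w = (int)src[0].size();
        Image dst = src;
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                int sum = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        sum += src[y + dy][x + dx];
                dst[y][x] = (unsigned char)(sum / 9);
            }
        return dst;
    }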

3.3.2 Thresholding

Thresholding is the process of reducing the grey levels of a monochrome image to two values, and it is the simplest way to do image segmentation. One of the two values marks “object pixels” and the other “background pixels”. A pixel is marked as an object pixel when its value is greater than the threshold value, and as a background pixel otherwise. Usually, an object pixel is given a value of '1' while a background pixel is given a value of '0':

g(i, j) = 0   if f(i, j) ≤ θ
g(i, j) = 1   otherwise

The main parameter in thresholding lies in selecting the correct value for the threshold θ. There are many ways to choose this value; the simplest is to take the mean or median grey level. This is effective provided that the object pixels are brighter than the background, and also brighter than the average. Another way is to record the frequency of occurrence of the image pixel values in a histogram and use its valley point as the threshold. The histogram approach assumes that there is some average value for the background pixels and another for the object pixels, but that the actual pixel values have some variation around these averages. A more effective way to acquire the threshold is to use iterative methods.
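The rule g(i, j) above, with the mean grey level used as a first estimate of θ, can be sketched as follows (names are illustrative, and object pixels are simply those brighter than θ):

    #include <vector>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Threshold the image with theta taken as the mean grey level:
    // object pixels ('1') are brighter than theta, background pixels ('0') are not.
    Image thresholdAtMean(const Image& f) {
        long sum = 0, count = 0;
        for (size_t i = 0; i < f.size(); ++i)
            for (size_t j = 0; j < f[0].size(); ++j) { sum += f[i][j]; ++count; }
        double theta = (double)sum / count;

        Image g(f.size(), std::vector<unsigned char>(f[0].size()));
        for (size_t i = 0; i < f.size(); ++i)
            for (size_t j = 0; j < f[0].size(); ++j)
                g[i][j] = (f[i][j] > theta) ? 1 : 0;
        return g;
    }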

There are two ways to perform the iterative method. The first method incrementally searches through the histogram for a threshold. Starting at the lower end of the histogram, the average of the grey values less than the suggested threshold is computed and labelled L, and likewise the average of the grey values greater than the suggested threshold is labelled G. The average of L and G is then computed; if it is equal to the suggested threshold, that value is taken as the threshold, otherwise the suggested threshold is incremented and the process repeats (Umbaugh, 2005).

The second method searches the histogram iteratively. First an initial threshold value is suggested; a suitable choice is the average of the image's four corner pixels. The next steps are similar to those of the first method, the only difference being how the suggested threshold is updated: in this method the updated value is set to the average of L and G (Umbaugh, 2005).
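The first method can be sketched on a 256-bin grey-level histogram as follows; the exact-match stopping rule mirrors the description above, and in practice a small tolerance (and the fallback used here) may be needed.

    #include <vector>

    // Incrementally search a 256-bin histogram for a threshold t that equals the
    // average of L (mean of grey values below t) and G (mean of grey values above t).
    int iterativeThreshold(const std::vector<long>& hist) {
        for (int t = 1; t < 255; ++t) {
            long sumL = 0, cntL = 0, sumG = 0, cntG = 0;
            for (int v = 0; v < t; ++v)       { sumL += (long)v * hist[v]; cntL += hist[v]; }
            for (int v = t + 1; v < 256; ++v) { sumG += (long)v * hist[v]; cntG += hist[v]; }
            if (cntL == 0 || cntG == 0) continue;
            double L = (double)sumL / cntL;
            double G = (double)sumG / cntG;
            if ((int)((L + G) / 2.0) == t)    // average of L and G equals the candidate
                return t;
        }
        return 128;   // fall back to the middle of the range if no candidate matches
    }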

3.3.3 Edge Detection


Edges are important structures in images and in image processing; they define significant structures in a scene, particularly the outlines of objects and parts of objects.

Morris (2004) defines an edge as a significant, local change in image intensity. Edge detectors can be classified into two types of operators: template matching (TM) and differential gradient (DG). Examples of template matching operators are the Prewitt, Kirsch, and Robinson operators, while the Roberts and Sobel operators are differential gradient operators. Both template matching and differential gradient estimate local intensity gradients with the help of suitable convolution masks (Davies, 2005).

In the TM approach, the local gradient magnitude g is approximated by taking the maximum of the responses for the different component masks:

g = max(gi : i = 1, …, n)

where n is the number of masks used, usually 8 to 12.

In the DG approach, the local edge magnitude may be computed vectorially using the transformation

g = (gx² + gy²)^½

and the edge orientation is calculated as

θ = tan⁻¹(gy / gx)
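A DG sketch using the Sobel operator, one of the DG operators named above: the two convolution responses gx and gy are combined into the magnitude at every interior pixel (the helper name is illustrative, and the orientation is shown only as a comment).

    #include <vector>
    #include <cmath>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Compute the Sobel gradient magnitude g = sqrt(gx^2 + gy^2) at interior pixels.
    std::vector<std::vector<double>> sobelMagnitude(const Image& img) {
        int h = (int)img.size(), w = (int)img[0].size();
        std::vector<std::vector<double>> g(h, std::vector<double>(w, 0.0));
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                // Sobel convolution masks for the horizontal and vertical gradients.
                int gx = -img[y-1][x-1] + img[y-1][x+1]
                         - 2*img[y][x-1] + 2*img[y][x+1]
                         - img[y+1][x-1] + img[y+1][x+1];
                int gy = -img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1]
                         + img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1];
                g[y][x] = std::sqrt((double)gx * gx + (double)gy * gy);
                // Orientation, if needed: double theta = std::atan2((double)gy, (double)gx);
            }
        return g;
    }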

3.4 Motion Detection

3.4.1 Image Differencing

A common method for detecting moving objects is image differencing. Differencing successive pairs of frames should reveal the differing pixels, which should belong to the moving object. However, certain considerations complicate the matter. Regions of constant intensity and edges parallel to the direction of motion give no sign of motion (Davies, 2005). Image differencing also suffers from noise: it is prone to errors due to subtle changes in illumination, caused both by environmental changes and by the digitization process of the camera, in which internal noise produces subtle changes between successive frames.

The documentation of the OpenCV library suggests using the mean of a number of frames as the reference for the differencing. The mean is calculated as

m(x, y) = S(x, y) / N

and the standard deviation is

σ(x, y) = ( Sq(x, y) / N − m(x, y)² )^½

where S(x, y) is the sum of the individual pixel intensities at point (x, y), Sq(x, y) is the sum of the squares of the individual pixel intensities at point (x, y), and N is the total number of frames.

A pixel p(x, y) of the current frame is regarded as part of the moving object if it satisfies the condition

( m(x, y) − p(x, y) ) > Cσ(x, y)

where C is a constant that controls the sensitivity of the differencing. If C = 3, this is known as the 3-sigma rule (Intel, 2001).
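A sketch of this scheme, assuming the per-pixel sums S and Sq have already been accumulated over N reference frames; the difference from the mean is taken in absolute value here, which is a simplifying choice.

    #include <vector>
    #include <cmath>

    typedef std::vector<std::vector<double>> Map;
    typedef std::vector<std::vector<unsigned char>> Image;

    // Mark a pixel as moving when |m(x,y) - p(x,y)| > C * sigma(x,y),
    // with m = S / N and sigma = sqrt(Sq / N - m^2).
    Image detectMotion(const Map& S, const Map& Sq, const Image& p, int N, double C) {
        int h = (int)p.size(), w = (int)p[0].size();
        Image moving(h, std::vector<unsigned char>(w, 0));
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                double m     = S[y][x] / N;
                double sigma = std::sqrt(Sq[y][x] / N - m * m);
                if (std::fabs(m - p[y][x]) > C * sigma)
                    moving[y][x] = 1;
            }
        return moving;
    }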

3.5 Image Segmentation


The term image segmentation refers to the partitioning of an image into a set of regions according to a given criterion. Regions may also be defined as groups of pixels having both a border and a particular shape such as a circle, ellipse, or polygon. Image segmentation is a very important tool in many image processing and computer vision problems. Division of the image into regions corresponding to objects of interest is necessary before any processing can be done at a level higher than that of the pixel. Most image segmentation algorithms are modifications, extensions, or combinations of two basic concepts: the measure of homogeneity of regions within themselves and the measure of contrast with the objects on their border. Image segmentation techniques can be divided into three main categories: (1) region growing and shrinking, (2) clustering methods, and (3) boundary detection (Umbaugh, 2005).

3.5.1 Region Growing Technique

The region growing and shrinking methods use the row- and column-based image domain. Seed-based region growing is a bottom-up segmentation approach (Yakimovsky, 1976). A seed point within the region of interest is selected, and the adjacent pixels that satisfy the homogeneity property are added. This process outputs a single connected region of the image. To fully partition the image into N regions, seed points must be selected in each region and the region growing process repeated N times.

The selection of seed points for region growing is often accomplished by manually selecting points within the objects of interest. This process of selecting seed points ensures that the resulting object meets the needs of the application. An alternative is to scan the image automatically and acquire seed points based on some expected properties of the region of interest. A local intensity maximum is usually used as a seed point, since in the majority of images the objects are brighter than their background.

Once a seed point (x, y) is identified, the neighbors of that point, (x+1, y), (x−1, y), (x, y+1) and (x, y−1), are examined to see which belong in the region. All pixels whose colour is within a radius Rmax of the mean region colour cr are part of the region; these points are added to the region and their neighbors are considered next. As the region grows, the list of adjacent pixels also grows. The region stops growing when all of the neighboring pixels lie outside the colour radius Rmax (Sangwine, 1998).
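A sketch of this seed-based growing on a grayscale image, with the homogeneity test taken as “within Rmax of the running region mean” as described above (applied to grey values rather than colours, and with illustrative names):

    #include <vector>
    #include <queue>
    #include <cmath>
    #include <utility>

    typedef std::vector<std::vector<unsigned char>> Image;

    // Grow a region from seed (sx, sy): a 4-connected neighbour joins the region
    // while its value stays within rMax of the current region mean.
    // Returns a 0/1 mask marking the grown region.
    Image growRegion(const Image& img, int sx, int sy, double rMax) {
        int h = (int)img.size(), w = (int)img[0].size();
        Image mask(h, std::vector<unsigned char>(w, 0));
        std::queue<std::pair<int, int>> frontier;
        double sum = img[sy][sx];
        long count = 1;
        mask[sy][sx] = 1;
        frontier.push(std::make_pair(sx, sy));

        const int dx[4] = {1, -1, 0, 0};
        const int dy[4] = {0, 0, 1, -1};
        while (!frontier.empty()) {
            std::pair<int, int> p = frontier.front();
            frontier.pop();
            for (int k = 0; k < 4; ++k) {
                int nx = p.first + dx[k], ny = p.second + dy[k];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h || mask[ny][nx]) continue;
                double mean = sum / count;
                if (std::fabs(img[ny][nx] - mean) <= rMax) {   // homogeneity test
                    mask[ny][nx] = 1;                          // pixel joins the region
                    sum += img[ny][nx];
                    ++count;
                    frontier.push(std::make_pair(nx, ny));
                }
            }
        }
        return mask;
    }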

3.5.2 Clustering Techniques

The clustering technique is an image segmentation method wherein individual elements are placed into groups based on some measure of similarity within the group. The major difference between the clustering technique and the region growing technique is that domains other than the row-and-column (x, y) image space (the spatial domain) may be considered as the primary domain for clustering. Other domains include color spaces, histogram spaces, or complex feature spaces.

The process starts by looking for clusters in the domain (mathematical space) of interest. The simplest method is to divide the space of interest into regions by selecting the center or median along each dimension and splitting it there; this is the method used in the center and median segmentation algorithms. It is effective only if the space being used and the entire algorithm are designed intelligently, because the center or median split alone may not find good clusters.
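As a minimal illustration, assuming a one-dimensional grey-level space as the domain of interest: the pixels are split into two clusters at the median intensity (names are illustrative, and a non-empty input is assumed).

    #include <vector>
    #include <algorithm>

    // Split a set of grey values into two clusters at their median: values at or
    // below the median are labelled 0, the rest 1. Returns the median used.
    int medianSplit(const std::vector<unsigned char>& values, std::vector<int>& labels) {
        std::vector<unsigned char> sorted = values;
        std::nth_element(sorted.begin(), sorted.begin() + sorted.size() / 2, sorted.end());
        int median = sorted[sorted.size() / 2];

        labels.resize(values.size());
        for (size_t i = 0; i < values.size(); ++i)
            labels[i] = (values[i] <= median) ? 0 : 1;
        return median;
    }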

3.5.3 Boundary Detection

Boundary detection is performed by finding the boundaries between objects, thus indirectly defining the objects. The process starts by marking points that may be part of an edge. These points are then merged into line segments, and the line segments are merged into object boundaries. Edge detectors are used to mark points of rapid change, indicating the possibility of an edge. These edge points represent local discontinuities in specific features, such as brightness, color, or texture.

After the detection of edges, the next step is to threshold the results. One method is to consider the histogram of the edge detection results, looking for the best valley manually. This thresholding method works best with a bimodal (two-peaked) histogram (Umbaugh, 2005).
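The valley search can also be sketched automatically, under the assumption of a roughly bimodal histogram: take the two largest bins as the peaks and return the lowest bin between them (an illustrative simplification of the manual procedure described above).

    #include <vector>
    #include <algorithm>

    // Find a threshold for edge-detection results: locate the two largest bins of
    // the histogram (taken as the two peaks) and return the lowest bin between them.
    int valleyThreshold(const std::vector<long>& hist) {
        int p1 = 0, p2 = 0;                      // indices of the two largest bins
        for (int v = 1; v < (int)hist.size(); ++v) {
            if (hist[v] > hist[p1]) { p2 = p1; p1 = v; }
            else if (hist[v] > hist[p2]) { p2 = v; }
        }
        int lo = std::min(p1, p2), hi = std::max(p1, p2);
        int valley = lo;
        for (int v = lo; v <= hi; ++v)
            if (hist[v] < hist[valley]) valley = v;
        return valley;
    }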

3.6 OpenCV

Intel developed an open source computer vision library named OpenCV, which is intended for use, incorporation, and modification by researchers, commercial software developers, governments, and camera vendors, as reflected in its license. The OpenCV Library is a collection of algorithms and sample code for various computer vision problems. The library is cross-platform and runs on both Windows and Linux operating systems. It focuses mainly on real-time image processing, with applications in areas such as Human-Computer Interaction (HCI), object identification, face recognition, gesture recognition, motion tracking, and mobile robotics. The philosophy behind the creation of the library is to aid commercial uses of computer vision in human-computer interfaces, robotics, monitoring, biometrics, and security by providing a free and open infrastructure where the distributed efforts of the vision community can be consolidated and performance optimized.

3.6.1 Advantages of Using OpenCV Library

The library provides a set of image processing functions as well as image and pattern analysis functions. The functions are optimized for Intel® architecture processors and are particularly effective at taking advantage of MMX™ technology. The OpenCV Library is a way of establishing an open source vision community that will make better use of up-to-date opportunities to apply computer vision in the growing PC environment. The library is open, has a platform-independent interface, and is supplied with complete C sources.

3.6.2 Relation Between OpenCV and Other Libraries

OpenCV is designed to be used together with the Intel® Image Processing Library (IPL), whose functionality it extends toward image and pattern analysis. At a lower level it uses the Intel® Integrated Performance Primitives (IPP), which provide a cross-platform interface to highly optimized low-level functions that perform image processing and computer vision operations. OpenCV can automatically benefit from using IPP on platforms such as IA-32, IA-64, and StrongARM.

3.6.3 Data Types Supported


To make the OpenCV API simpler and more uniform, a few fundamental and helper data types are introduced. The fundamental data types include array-like types, “IplImage” (IPL image) and “CvMat” (matrix); growable and mixed-type collections, “CvSeq”, “CvSet”, and “CvGraph”; and “CvHistogram” (multi-dimensional histogram). Helper data types include “CvPoint” (2D point), “CvSize” (width and height), “CvTermCriteria” (termination criteria for iterative processes), “CvMoments” (spatial moments), and many others.
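A short sketch showing a few of these types in use, assuming the OpenCV 1.x C interface: an 8-bit single-channel IplImage is created from a CvSize, and a CvPoint marks its centre.

    #include <cv.h>

    int main() {
        CvSize size = cvSize(320, 240);                        // width and height
        IplImage* img = cvCreateImage(size, IPL_DEPTH_8U, 1);  // 8-bit, single channel
        CvPoint center = cvPoint(size.width / 2, size.height / 2);

        cvZero(img);              // clear the image before use
        (void)center;             // the point would be passed to drawing or ROI calls

        cvReleaseImage(&img);
        return 0;
    }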

3.7 Microsoft Visual C++

Microsoft Visual C++ is an integrated development environment (IDE) for the C and C++ programming languages, engineered by Microsoft Corporation. It contains tools for creating and debugging C++ code and possesses features such as syntax highlighting, auto-completion, and debugging functions. Its compile-and-build system, precompiled header files, "minimal rebuild" functionality, and incremental linking significantly shorten the turn-around time to edit, compile, and link a program (Wikipedia). Visual C++ is included in the Visual Studio suite.

3.7.1 Visual C++ Libraries

These include the industry-standard Active Template Library (ATL), the Microsoft Foundation Class (MFC) libraries, and standard libraries such as the Standard C++ Library and the C Runtime Library (CRT), which has been extended to provide security-enhanced alternatives to functions known to pose security issues. A newer library, the C++ Support Library, is designed to simplify programs that target the CLR (MSDN).


3.8 .Net Windows API

The Microsoft .NET Framework is a software component that can be added to the Microsoft Windows operating system. It is a development and execution environment that allows different programming languages and libraries to work together to create Windows-based applications that are easier to build, manage, and integrate with other networked systems (MSDN).

The Windows API is designed for use by C/C++ programs and is the most direct way for a software application to interact with a Windows system (MSDN). An API (Application Program Interface) is a set of predefined Windows functions used to control the appearance and behavior of every Windows element, from the look of the desktop to the memory allocation for new processes. Every action triggers several API functions that tell Windows what has happened (Nair, 2002).

The APIs can be found in the DLLs (Dynamic Link Libraries) in the Windows system directory. A Dynamic Link Library is Microsoft's implementation of the shared library concept in Microsoft operating systems (Wikipedia). The core Win32 APIs are split into three libraries: User32.dll, which handles the user interface; Kernel32.dll, which handles file operations and memory management; and Gdi32.dll, which handles graphics (Nair, 2002).
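A minimal example of calling one of these User32.dll functions directly from a C++ program (the Windows headers declare the prototype and the import library resolves the call):

    #include <windows.h>

    int main() {
        // MessageBoxA lives in User32.dll, one of the core Win32 API libraries.
        MessageBoxA(NULL, "Hello from the Win32 API", "Example", MB_OK);
        return 0;
    }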
References

Answers. (n.d.). Webcam. Retrieved June 3, 2006, from http://www.answers.com/topic/web-cam

Buckley, R., et al. (1999). Standard RGB color spaces. In IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications. Scottsdale, Arizona.

Davies, E. (2005). Machine vision: Theory, algorithms, practicalities. Elsevier: CA.

DLP and LCD projector technology explained. (n.d.). Retrieved June 2, 2006, from http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm

Home Theater Research Group. (n.d.). Keystone correction. Retrieved September 24, 2006, from http://htrgroup.com/?tab=projector-docs&section=keystone

Intel. (2001). Open source computer vision library reference manual. Retrieved September 22, 2006, from http://developer.intel.com

Kolas, O. (2005). Image processing with gluas: Introduction to pixel molding. Retrieved September 24, 2006, from http://pippin.gimp.org/image_processing/chap_dir.html

Microbus. (2003). Image, resolution, size and compression. Retrieved September 23, 2006, from http://www.microscope-microscope.org/imaging/image-resolution.htm

Morris, T. (2004). Computer vision and image processing. Palgrave Macmillan: NY.

MSDN. (2006). Microsoft developer network: .NET Framework fundamentals. Retrieved September 23, 2006, from http://msdn.microsoft.com/netframework/programming/fundamentals/default.aspx

Nair, S. (2002). Working with Win32 API in .NET. Retrieved September 24, 2006, from http://www.c-sharpcorner.com/Code/2002/Nov/win32api.asp

Petrou, M., and Bosdogianni, P. (1999). Image processing: The fundamentals. John Wiley & Sons, Ltd: New York.

Presenters Online. (n.d.). Fixing a distorted image with keystone correction. Retrieved September 24, 2006, from http://www.presentersonline.com/technology/projector/keystonecorrection.shtml

Projector People. (n.d.). Projector keystone correction. Retrieved September 24, 2006, from http://www.projectorpeople.com/tutorials/keystone-correction.asp

Sangwine, S. (1998). The colour image processing handbook. Chapman and Hall: London.

Shapiro, L., and Stockman, G. (2001). Computer vision. Prentice Hall: Upper Saddle River, New Jersey.

Umbaugh, S. (2005). Computer imaging: Digital image analysis and processing. CRC Press: Boca Raton, Florida.

Wikipedia. (n.d.). .NET framework. Retrieved September 23, 2006, from http://en.wikipedia.org/wiki/.NET_Framework_3.0

Wikipedia. (n.d.). Segmentation. Retrieved September 19, 2006, from http://en.wikipedia.org/wiki/Segmentation_(image_processing)
