
UNIT V

APPLICATIONS OF IMAGE PROCESSING


Image Classification – Image Recognition – Image Understanding – Video Motion Analysis – Image Fusion –
Steganography – Digital Compositing – Mosaics – Colour Image Processing.

IMAGE CLASSIFICATION
The intent of the classification process is to categorize all pixels in a digital image into one of several land
cover classes, or "themes". This categorized data may then be used to produce thematic maps of the
land cover present in an image. Normally, multispectral data are used to perform the classification and,
indeed, the spectral pattern present within the data for each pixel is used as the numerical basis for
categorization (Lillesand and Kiefer, 1994). The objective of image classification is to identify and
portray, as a unique gray level (or color), the features occurring in an image in terms of the object or
type of land cover these features actually represent on the ground.

Figure Spectral Reflectance curve of 3 land covers

Image classification is perhaps the most important part of digital image analysis. It is very nice to have a
"pretty picture" or an image showing a magnitude of colors illustrating various features of the
underlying terrain, but it is quite useless unless one knows what the colors mean. The two main
classification approaches are:

• Supervised Classification
• Unsupervised Classification

Common decision rules applied within supervised classification include:

• Maximum Likelihood Classification
• Minimum Distance Classification
• Parallelepiped Classification

Supervised Classification
With supervised classification, we identify examples of the Information classes (i.e., land cover type) of
interest in the image. These are called "training sites". The image processing software system is then
used to develop a statistical characterization of the reflectance for each information class. This stage is
often called "signature analysis" and may involve developing a characterization as simple as the mean or
the range of reflectance in each band, or as complex as detailed analyses of the mean, variances and
covariance over all bands. Once a statistical characterization has been achieved for each information
class, the image is then classified by examining the reflectance for each pixel and making a decision
about which of the signatures it resembles most.

Figure Steps in Supervised classification

Maximum likelihood Classification

Maximum likelihood classification is a statistical decision criterion to assist in the classification of
overlapping signatures; pixels are assigned to the class of highest probability.

The maximum likelihood classifier is considered to give more 'accurate' results than parallelepiped
classification; however, it is much slower due to the extra computation. We put the word 'accurate'
in quotes because this assumes that classes in the input data have a Gaussian distribution and that
signatures were well selected; this is not always a safe assumption.
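As an illustration of this decision rule, the following is a minimal sketch (not a production classifier), assuming one Gaussian signature per class has already been estimated from training pixels; the function name and array shapes are illustrative.

import numpy as np

def max_likelihood_classify(pixels, means, covs):
    """pixels: (N, B) array of N pixels with B bands.
    means: list of (B,) class mean vectors; covs: list of (B, B) class covariance matrices.
    Returns, for each pixel, the index of the class with the highest Gaussian likelihood."""
    log_likes = []
    for mu, cov in zip(means, covs):
        diff = pixels - mu                                    # deviations from the class mean, (N, B)
        inv_cov = np.linalg.inv(cov)
        maha = np.einsum('nb,bc,nc->n', diff, inv_cov, diff)  # squared Mahalanobis distance per pixel
        # Gaussian log-likelihood; the constant term is the same for every class and is dropped
        log_likes.append(-0.5 * (np.log(np.linalg.det(cov)) + maha))
    return np.argmax(np.stack(log_likes, axis=1), axis=1)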

Minimum distance Classification

Minimum distance classifies image data on a database file using a set of 256 possible class
signature segments as specified by signature parameter. Each segment specified in signature, for
example, stores signature data pertaining to a particular class. Only the mean vector in each class
signature segment is used. Other data, such as standard deviations and covariance matrices, are
ignored (though the maximum likelihood classifier uses them).
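A minimal sketch of this rule, assuming the class mean vectors have already been extracted from the signature segments; names and shapes are illustrative.

import numpy as np

def min_distance_classify(pixels, means):
    """pixels: (N, B) pixel vectors; means: (K, B) class mean vectors.
    Each pixel is assigned to the class whose mean is nearest in feature space."""
    dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)  # (N, K) distances
    return np.argmin(dists, axis=1)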

The result of the classification is a theme map directed to a specified database image channel. A
theme map encodes each class with a unique gray level. The gray-level value used to encode a
class is specified when the class signature is created. If the theme map is later transferred to the
display, then a pseudo-color table should be loaded so that each class is represented by a
different color.

Parallelepiped Classification

The parallelepiped classifier uses the class limits stored in each class signature to determine
if a given pixel falls within the class or not. The class limits specify the dimensions (in standard
deviation units) of each side of a parallelepiped surrounding the mean of the class in feature
space.

If the pixel falls inside the parallelepiped, it is assigned to the class. However, if the pixel falls
within more than one class, it is put in the overlap class (code 255). If the pixel does not fall
inside any class, it is assigned to the null class (code 0).

The parallelepiped classifier is typically used when speed is required. The drawback is (in many
cases) poor accuracy and a large number of pixels classified as ties (or overlap, class 255).
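A minimal sketch of this classifier, assuming per-class, per-band means and standard deviations; the number of standard deviations k is an illustrative parameter, and codes 0 and 255 follow the null and overlap conventions described above.

import numpy as np

def parallelepiped_classify(pixels, means, stds, k=2.0):
    """pixels: (N, B); means, stds: (K, B) per-class, per-band statistics.
    Returns class codes 1..K, 0 for the null class, 255 for the overlap class."""
    lo = means - k * stds                                     # lower box limits, (K, B)
    hi = means + k * stds                                     # upper box limits, (K, B)
    inside = np.all((pixels[:, None, :] >= lo) & (pixels[:, None, :] <= hi), axis=2)  # (N, K)
    hits = inside.sum(axis=1)
    codes = np.zeros(len(pixels), dtype=np.uint8)             # null class (code 0) by default
    single = hits == 1
    codes[single] = inside[single].argmax(axis=1) + 1         # unique class, numbered from 1
    codes[hits > 1] = 255                                     # overlap class
    return codes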

Unsupervised Classification

Unsupervised classification is a method which examines a large number of unknown pixels and
divides them into a number of classes based on natural groupings present in the image values. Unlike
supervised classification, unsupervised classification does not require analyst-specified training
data. The basic premise is that values within a given cover type should be close together in the
measurement space (i.e. have similar gray levels), whereas data in different classes should be
comparatively well separated (i.e. have very different gray levels).

The classes that result from unsupervised classification are spectral classes based on natural
groupings of the image values. The identity of each spectral class will not be known initially; the
classified data must be compared with some form of reference data (such as larger scale imagery,
maps, or site visits) to determine the identity and informational values of the spectral classes.
Thus, in the supervised approach we define useful information categories and then examine their
spectral separability; in the unsupervised approach the computer determines spectrally separable
classes, and we then define their information value.
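One common clustering procedure used for this grouping is k-means; operational systems often use ISODATA-style variants. The sketch below is an illustration of the idea only, with illustrative names and parameters.

import numpy as np

def kmeans_classify(pixels, n_classes, n_iter=20, seed=0):
    """pixels: (N, B) pixel vectors. Returns a spectral-class label per pixel plus the cluster centres."""
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), n_classes, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign every pixel to its nearest cluster centre
        labels = np.argmin(np.linalg.norm(pixels[:, None] - centres[None], axis=2), axis=1)
        for k in range(n_classes):
            if np.any(labels == k):                           # leave empty clusters unchanged
                centres[k] = pixels[labels == k].mean(axis=0)
    return labels, centres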

Unsupervised classification is becoming increasingly popular in agencies involved in long-term
GIS database maintenance. The reason is that there are now systems that use clustering
procedures that are extremely fast and require little in the way of operational parameters. Thus
it is becoming possible to train GIS analysts with only a general familiarity with remote sensing
to undertake classifications that meet typical map accuracy standards. With suitable ground truth
accuracy assessment procedures, this tool can provide a remarkably rapid means of producing
quality land cover data on a continuing basis.
IMAGE RECOGNITION

Image recognition is the identification of objects in an image. This process would probably start with image
processing techniques such as noise removal, followed by (low-level) feature extraction to locate
lines, regions and possibly areas with certain textures.
The clever bit is to interpret collections of these shapes as single objects, e.g. cars on a road,
boxes on a conveyor belt or cancerous cells on a microscope slide. One reason this is an
AI problem is that an object can appear very different when viewed from different angles or
under different lighting. Another problem is deciding what features belong to what object and
which are background or shadows etc. The human visual system performs these tasks mostly
unconsciously but a computer requires skillful programming and lots of processing power to
approach human performance.
Important application areas are:

• Fingerprint recognition

• Character recognition

• Face recognition

• Person identification

FINGERPRINT RECOGNITION

Fingerprint recognition is a rapidly evolving technology, which is being widely used in forensics
such as criminal identification and prison security, and has the potential to be used in a large
range of civilian application areas. Fingerprint recognition can be used to prevent unauthorized
access to ATMs, cellular phones, smart cards, desktop PCs, workstations, and computer
networks. It can be used during transactions conducted via telephone and Internet (electronic
commerce and electronic banking).

Principles and Algorithms

Fingerprints can be compared by examining the ridge characteristics of two different impressions
to show the coincidence of the features. These features are called minutiae and are represented by
specific ridge characteristics. The basis for all worldwide forensic fingerprint identification systems
today is found in the ability to identify and match basic minutiae characteristics in the fingerprint
images. To accomplish this, four major fingerprint identification steps are implemented:

• image capture

• image processing

• feature extraction and

• matching.
Intelligent systems evaluate the fingerprint minutiae features by showing the coincidence of the
features, taking into consideration the similarity, number, and unit relationship of the
characteristics to each other. Searching and matching of fingerprints is accomplished by
assigning each minutia a position in an x/y coordinate system, a direction of flow, and a
relationship to other minutiae.
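As a toy sketch of this kind of matching, the code below assumes both minutiae sets are already aligned to a common coordinate frame; real systems additionally estimate rotation, translation and elastic distortion. The tolerances and names are illustrative.

import numpy as np

def minutiae_match_score(minutiae_a, minutiae_b, pos_tol=10.0, ang_tol=np.pi / 12):
    """minutiae_*: (N, 3) arrays of (x, y, ridge direction in radians).
    Returns a similarity score in [0, 1] based on greedily paired minutiae."""
    matched, used = 0, set()
    for x, y, theta in minutiae_a:
        for j, (u, v, phi) in enumerate(minutiae_b):
            ang_diff = np.abs(np.angle(np.exp(1j * (theta - phi))))   # wrapped angle difference
            if j not in used and np.hypot(x - u, y - v) < pos_tol and ang_diff < ang_tol:
                matched += 1
                used.add(j)
                break
    return matched / max(len(minutiae_a), len(minutiae_b), 1)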

CHARACTER RECOGNITION

Handwritten input is often used in business 'form-filling' applications, including handwritten
postal addresses, cheques, insurance applications, mail-order forms, tax returns, credit card sales
slips, customs declarations and many others. These applications all generate handwritten script
from an unconstrained population of writers and writing implements, which must subsequently
be processed off-line by computer. The objective is to develop techniques that allow the
automatic input of handwritten forms data to computers.

Principles and Algorithms

Forms-based Character Recognition applications require the integration of several independent
components (hand-printed addresses, signatures, cursive handwriting, numeric and alphanumeric
hand-printed codes, etc.) together with form-specific knowledge if adequate recognition
performance is to be achieved.

FACE RECOGNITION

Face recognition in general and the recognition of moving people in natural scenes in particular,
require a set of visual tasks to be performed robustly.

These include:

1. Acquisition: the detection and tracking of face-like image patches in a dynamic scene

2. Normalisation: the segmentation, alignment and normalisation of the face images

3. Recognition: the representation and modelling of face images as identities, and the
association of novel face images with known models.

These tasks seem to be sequential and have traditionally often been treated as such. However, it
is both computationally and psychophysically more appropriate to consider them as a set of co-
operative visual modules with closed-loop feedbacks.

Applications

A biometrics system based on face recognition has a very large number of applications: security
systems, criminal identification, image and movie processing, man-machine interaction.
Unfortunately, the development of computational models for face recognition is a very
difficult task, because even today we do not know how the human brain recognizes faces.
Automatic face recognition involves the resolution of some complex problems:
• face localization in complex scenes

• invariance to pose and illumination

• invariance to change in expression

• invariance to moustache, beard, glasses, hairstyle etc.

Technology

Technologies used for face recognition are many and varied, but they all consist of the same
three basic steps.

1. Segmentation of the face region from cluttered scenes

- edge map, several heuristics

2. Extraction of features from the face region

- holistic features (where each feature is a characteristic of the whole face)

- partial features (hair, nose, mouth, eyes, etc.)

3. Decision

- identification

- recognition of person

- categorization

Eigenface Technology

A large number of pictures of faces are collected in a database. Combining all the pictures then
produces a set of eigenfaces - two-dimensional face-like arrangements of light and dark areas -
obtained by looking at what is common to groups of individuals and where they differ most.

Eigenfaces work as 'component faces.' Just as a color can be created by mixing primary colors, a
facial image can be built by adding together, with different intensities of light, a relatively small
number of eigenfaces (about 100 is enough to identify most persons, but just for one pose and one
lighting condition).

To identify a face, the program compares its eigenface characteristics, which are encoded into
numbers called a template, with those in the database, selecting the faces whose templates
match the target most closely.
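A minimal numpy sketch of this idea, assuming a stack of aligned, equal-size grayscale face images; the number of components and the function names are illustrative, not part of any particular product.

import numpy as np

def build_eigenfaces(faces, n_components=100):
    """faces: (N, H, W) aligned grayscale face images."""
    n, h, w = faces.shape
    data = faces.reshape(n, h * w).astype(float)
    mean_face = data.mean(axis=0)
    centred = data - mean_face
    # The rows of vt are the principal components (eigenfaces) of the face set
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    eigenfaces = vt[:n_components]
    templates = centred @ eigenfaces.T                        # (N, n_components) weight vectors
    return mean_face, eigenfaces, templates

def identify(probe, mean_face, eigenfaces, templates):
    """Return the index of the database face whose template is closest to the probe's."""
    weights = (probe.astype(float).ravel() - mean_face) @ eigenfaces.T
    return int(np.argmin(np.linalg.norm(templates - weights, axis=1)))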

Local Feature Analysis

Local feature analysis is derived from the eigenface method but overcomes some of its problems
by not being sensitive to deformations in the face and changes in poses and lighting. Local
feature analysis considers individual features instead of relying on only a global representation of
the face. The system selects a series of blocks that best define an individual face. These features
are the building blocks from which all facial images can be constructed. The procedure starts by
collecting a database of photographs and extracting eigenfaces from them. Applying local feature
analysis, the system selects the subset of building blocks, or features, in each face that differ
most from other faces. Any given face can be identified with as few as 32 to 50 of those

blocks. The most characteristic points are the nose, eyebrows, mouth and the areas where the
curvature of the bones changes. The patterns have to be elastic to describe
possible movements or changes of expression. The computer knows that those points, like the
branches of a tree on a windy day, can move slightly across the face in combination with the
others without losing the basic structure that defines that face. To determine someone's identity,

• the computer takes an image of that person and

• determines the pattern of points that make that individual differ most from other people.

The system then starts creating patterns,

• either randomly or

• based on the average eigenface.

For each selection, the computer constructs a face image and compares it with the target face to
be identified. New patterns are created until a facial image that matches the target can be
constructed. When a match is found, the computer looks in its database for the matching pattern
of a real person.

Neural Network

As face recognition has evolved from eigenfaces to local feature analysis, three performance
issues have remained: locating a face in images with complex backgrounds, accommodating
variable lighting conditions and improving recognition accuracy. Neural network technology has
dealt with all these issues in the next evolutionary step of face recognition technology.

Once a face is isolated and located, features from the entire face are extracted as visual contrast
elements such as the eyes, side of the nose, mouth, eyebrows, cheek line and others.

The features are quantified, normalized and compressed into a template code. The combination
of face isolation and face encoding takes about 1 second on a Pentium II. The size of the
template code is about 1,300 bytes (Face Template). To verify someone, a claim is first made
about who the person should be. This claim is in the form of a user ID that can come from either
a PIN, bar code, card, token, proxcard or other biometrics. The user ID claim is used to fetch the
face template representing the true identity of the person (reference template), which is stored in
a database. Then the program matches the known reference template against the live face
template and produces a match score. When the match score between the live face template and
the stored reference template is greater than a preset threshold, the two faces are deemed to come
from the same person. Otherwise they are
deemed to come from different people.
Future of Face Recognition – Person Recognition

Face recognition systems used today work very well under constrained conditions, although all
systems work much better with frontal mug-shot images and constant lighting. All current face
recognition algorithms fail under the vastly varying conditions under which humans need to and
are able to identify other people. Next generation person recognition systems will need to
recognize people in real-time and in much less constrained situations.

IMAGE UNDERSTANDING

Image understanding is the process of actually interpreting those regions/objects to figure out
what’s actually happening in the image. This may include figuring out what the objects are, their
spatial relationships to each other, etc. It may also include ultimately making some decision for
further action.

Control Strategies

1. Bottom-Up
The process as we’ve just described it is bottom-up: it starts from the raw image data, works
upward to object shape representation, and from there to a higher-level analysis or decision.
2. Top-Down
Image understanding can also work from the top down. Such processing makes hypotheses about
what is happening, then uses the image data to validate or reject those hypotheses. Most of these
approaches are model-based. That is, they have an approximate model of what they think they’re
looking at, then try to fit that model to the data. In a sense, primitive-fitting approaches such as
the Hough transform used this idea. This idea can be extended further to so-called deformable
models, in which you can deform the model to better fit the data. The goodness of the match is
the inverse of how much you have to work to deform the model to fit the data.
3. Hybrid Hierarchical Control
Obviously, these two paradigms aren’t mutually exclusive: one might use bottom-up processing
to bring the information to something higher than the pixel level, then switch to top-down
processing using this intermediate representation to validate or reject hypotheses.
4. Active Vision
A third alternative, neither quite bottom-up nor top-down, presents itself when the vision system is
part of a larger system capable of acting so as to be able to influence the position/view/etc. Such
active vision approaches more accurately model the way people interact with the world. The idea
is to make a tentative hypothesis (using either top-down or bottom-up processing) then ask
yourself, “based on what I already know (and suspect), what do I need to do to be able to acquire
the information that will best help me analyze the image or otherwise accomplish my task?”

Active Contours (Snakes)


The earliest and best known active contour approach is snakes: deformable splines that are acted
upon by image, internal, and user-defined “forces” and deform to minimize the “energy” they
exert in resisting these forces.

Notice that the general form of a snake follows the idea introduced earlier when we discussed
graph-based approaches:
1. Establish the problem as the minimization of some cost function.
2. Use established optimization techniques to find the optimal (minimum cost) solution.

In the case of a snake, the cost function is the “energy” exerted by the snake in resisting the
forces put upon it. The original formulation of this “energy” was

E_snake = w_int E_int + w_image E_image + w_con E_con


where each term is as follows:
• E_int (internal energy): keeps the snake from bending too much
• E_image (image energy): guides the snake along important image features
• E_con (constraint energy): pushes or pulls the snake away from or towards user-defined positions

The total energy for the snake is the integral of the energy at each point v(s) along the contour:

E_total = ∫ E_snake(v(s)) ds

Internal Energy

The internal energy term tries to keep the snake smooth. Such smoothness constraints are
also a common theme in computer vision, occurring in such approaches as
• Bayesian reconstruction
• Shape from shading
• Stereo correspondence
• and many others . . .
The internal energy term in general keeps the model relatively close to its original a
priori shape. In this case, we explicitly assume that the contour we want is generally smooth but
otherwise unconstrained. Other models might start with an approximation of a brain, a heart, a
kidney, a lung, a house, a chair, a person, etc., and deform this model; in all cases, the internal
energy term constrains the deformation.
One has to be careful with the weighting given to internal energy terms, though: too
much weight means the model stays too "rigid" and the system "sees what it wants to see"; too
little weight means the model is too flexible and can pretty much match up to anything.
In the original snakes implementation, two terms were used to define the internal energy:
one to keep the snake from stretching or contracting along its length (elasticity) and another to
keep the snake from bending (curvature):

E_int = ( α(s) |v_s(s)|^2 + β(s) |v_ss(s)|^2 ) / 2

where v_s and v_ss are the first and second derivatives of the contour v(s) with respect to arc
length. Notice that both α(s) and β(s) are functions of the arc length along the snake. This means
that we can (perhaps interactively) keep the snake more rigid in some segments and more flexible
in others.
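A discretised sketch of this internal energy term, assuming a closed snake sampled at N points and finite-difference approximations of the derivatives; alpha and beta may be scalars or per-point arrays, matching the remark above about varying rigidity along the contour.

import numpy as np

def internal_energy(points, alpha=1.0, beta=1.0):
    """points: (N, 2) snake vertices on a closed contour. alpha, beta: scalars or (N,) arrays."""
    d1 = np.roll(points, -1, axis=0) - points                                    # first differences (elasticity)
    d2 = np.roll(points, -1, axis=0) - 2 * points + np.roll(points, 1, axis=0)   # second differences (curvature)
    elastic = alpha * np.sum(d1 ** 2, axis=1)
    bending = beta * np.sum(d2 ** 2, axis=1)
    return 0.5 * np.sum(elastic + bending)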

Image Energy

The image energy term is what drives the model towards matching the image. It is
usually inversely based on image intensity (bright curve finding), gradient magnitude (edge
finding), or similar image features. Make sure to note the inverse relationship: strong features are
low energy, and weak features (or no features) are high energy.
An interesting image energy term used in the original snakes paper also tracked line
terminations. These are useful in analyzing visual illusions such as the Kanizsa and Ehrenstein
illusions illustrated in your text.

Constraint Energy
Some systems, including the original snakes implementation, allowed for user interaction
to guide the snakes, not only in initial placement but also in their energy terms. Such constraint
energy can be used to interactively guide the snakes towards or away from particular features.
The original snakes paper used “springs” (attraction to specified points) and “volcanos”
(repulsion from specified points).

Point-Density Models
Point density models can be thought of as deformable models in which the deformation
“energy” is based on statistical properties of a large number of training examples. This has the
powerful advantage of allowing deformation where the objects themselves normally differ while
remaining more rigid where the objects themselves are normally consistent.
First identify a number of key landmark points for the object. These need not be on the
contour but can be any identifiable points.
Now, gather a collection of sample images with varying shapes that you want to
recognize. (For example, if you’re building a system to recognize and analyze brains, get a
collection of sample brains; if you’re building a system to recognize different kinds of fish, get
yourself samples of each kind of fish; etc.) For each image in your training set, find the
landmarks and store their locations. Now, you need to register these images by transforming
(translating, rotating) the landmark points so that they are all registered relative to a common
mean shape.
We can now measure the covariance matrix for all of the landmarks across all of the
training shapes. This tells us not only the consistency of each landmark but the way each
landmark tends to move as the others move.
The covariance matrix can be further analyzed by computing its eigenvalues and
eigenvectors (called principal component analysis). These eigenvectors are called the modes of
variation of the shapes, and the eigenvalues tell you the rigidity or flexibility along these modes.
Notice that a mode is not a single direction—it’s an overall change in all of the landmarks
relative to each other.
Point-density or landmark-based models, though they can be computationally expensive,
are among the most powerful models for shape description and deformation currently in use.
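A minimal sketch of building such a model from registered landmark sets: the covariance of the landmark coordinates is decomposed into eigenvectors (the modes of variation) and eigenvalues (the flexibility along each mode). Array shapes and names are illustrative.

import numpy as np

def build_shape_model(shapes):
    """shapes: (N, L, 2) registered landmark sets (N training shapes, L landmarks each)."""
    n, l, _ = shapes.shape
    flat = shapes.reshape(n, 2 * l).astype(float)
    mean_shape = flat.mean(axis=0)
    cov = np.cov(flat - mean_shape, rowvar=False)         # (2L, 2L) covariance of all landmarks
    eigvals, eigvecs = np.linalg.eigh(cov)                # symmetric matrix: real eigen-decomposition
    order = np.argsort(eigvals)[::-1]                     # largest (most flexible) modes first
    return mean_shape, eigvals[order], eigvecs[:, order]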

VIDEO MOTION ANALYSIS

Video Motion Analysis is the technique used to get information about moving objects from video.
Examples of this include gait analysis, sport replays, speed/acceleration calculations and, in the case of
team/individual sports, task performance analysis. The motion analysis technique usually involves a
digital movie camera and a computer with software allowing frame-by-frame playback of the video.

Uses

Traditionally, video motion analysis has been used in scientific circles for calculation of speeds
of projectiles, or in sport for improving play of athletes. Recently, computer technology has
allowed other applications of video motion analysis to surface including things like teaching
fundamental laws of physics to school students, or general educational projects in sport and
science.

In Sport, systems have been developed to provide a high level of task, performance and
physiological data to coaches, teams and players. The objective is to improve individual and
team performance and/or analyse opposition patterns of play to give tactical advantage. The
repetitive and patterned nature of sports games lends itself to video analysis in that over a
period of time real patterns, trends or habits can be discerned.

Technique

A digital video camera is mounted on a tripod. The moving object of interest is filmed performing a
motion with a scale in clear view of the camera. Using video motion analysis software, the
image on screen can be calibrated to the size of the scale, enabling measurement of real world
values. The software also notes the time between frames to give a movement versus time
data set. This is useful, for instance, in calculating the acceleration due to gravity from a dropped ball.
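A minimal sketch of this calculation, assuming the object's centre has already been tracked frame by frame and that the visible scale yields a metres-per-pixel calibration factor; the function name and parameters are illustrative.

import numpy as np

def motion_from_track(pixel_positions, fps, metres_per_pixel):
    """pixel_positions: (N, 2) tracked object position per frame."""
    pos = np.asarray(pixel_positions, dtype=float) * metres_per_pixel   # calibrated positions (m)
    t = np.arange(len(pos)) / fps                                       # frame timestamps (s)
    vel = np.gradient(pos, t, axis=0)                                   # velocity per frame (m/s)
    acc = np.gradient(vel, t, axis=0)                                   # acceleration per frame (m/s^2)
    return vel, acc

For a dropped ball, the vertical component of the returned acceleration should come out close to 9.8 m/s^2.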

Sophisticated sport analysis systems such as those by Verusco Technologies in New Zealand use
other methods such as direct feeds from satellite television to provide real-time analysis to
coaches over the internet and more detailed post-game analysis after the game has ended.

Software
There are many commercial packages that enable frame by frame or real time video motion
analysis. There are also free packages available that provide the necessary software functions.
These free packages include the relatively old but still functional Physvis, and a relatively new
program written in Java called Phys Mo which runs on Macintosh and Windows. Upmygame is a
free online application. Video Strobe is free software that creates a strobographic image from a
video; motion analysis can then be carried out with dynamic geometry software such as
GeoGebra.

The objective for video motion analysis will determine the type of software used. Prozone and
Amisco are expensive stadium based camera installations focusing on player movement and
patterns. Both of these provide a service to "tag" or "code" the video with the players' actions,
and deliver the results after the match. In each of these services, the data is tagged according to
the company's standards for defining actions.

Verusco Technologies are oriented more on task and performance and therefore can analyse
games from any ground. Focus X2 and Sports code systems rely on the team performing the
analysis in house, allowing the results to be available immediately, and to the team's own coding
standards.

Match Matix takes the data output of video analysis software, and analyses sequences of events.
Live HTML reports are generated and shared across a LAN, giving updates to the manager on
the touchline while the game is in progress.

IMAGE FUSION

Multisensor Image fusion is the process of combining relevant information from two or more
images into a single image. The resulting image will be more informative than any of the input
images.

In remote sensing applications, the increasing availability of space borne sensors gives a
motivation for different image fusion algorithms. Several situations in image processing require
high spatial and high spectral resolution in a single image. Most of the available equipment is not
capable of providing such data convincingly. The image fusion techniques allow the integration
of different information sources. The fused image can have complementary spatial and spectral
resolution characteristics. But, the standard image fusion techniques can distort the spectral
information of the multispectral data, while merging.

In satellite imaging, two types of images are available. The panchromatic image acquired by
satellites is transmitted with the maximum resolution available and the multispectral data are
transmitted with coarser resolution. This will usually be two or four times lower. At the receiver
station, the panchromatic image is merged with the multispectral data to convey more
information.

Many methods exist to perform image fusion. The very basic one is the high pass filtering
technique. Later techniques are based on DWT, the uniform rational filter bank, and the Laplacian
pyramid.
Why Image Fusion

Multisensor data fusion has become a discipline to which more and more general formal
solutions to a number of application cases are demanded. Several situations in image processing
simultaneously require high spatial and high spectral information in a single image. This is
important in remote sensing. However, the instruments are not capable of providing such
information either by design or because of observational constraints. One possible solution for
this is data fusion.

Standard Image Fusion Methods

Image fusion methods can be broadly classified into two groups - spatial domain fusion and transform
domain fusion.

The fusion methods such as averaging, Brovey method, principal component analysis (PCA) and
IHS based methods fall under spatial domain approaches. Another important spatial domain
fusion method is the high pass filtering based technique. Here the high frequency details are
injected into an upsampled version of the MS images. The disadvantage of spatial domain approaches
is that they produce spatial distortion in the fused image. Spectral distortion becomes a negative
factor when we go for further processing, such as classification problems. Spatial distortion can be
very well handled by transform domain approaches to image fusion. Multiresolution analysis has
become a very useful tool for analysing remote sensing images, and the discrete wavelet transform
has become a very useful tool for fusion. There are also other fusion methods, such as those based
on the Laplacian pyramid, the curvelet transform, etc. These methods show better performance
in the spatial and spectral quality of the fused image compared to the spatial methods of fusion.

The images used in image fusion should already be registered. Misregistration is a major source
of error in image fusion. Some well-known image fusion methods are:

• High pass filtering technique
• IHS transform based image fusion
• PCA based image fusion
• Wavelet transform image fusion
• Pair-wise spatial frequency matching

Applications

1. Image Classification
2. Aerial and Satellite imaging
3. Medical imaging
4. Robot vision
5. Concealed weapon detection
6. Multi-focus image fusion
7. Digital camera application
8. Battle field monitoring
Satellite Image Fusion

There are several methods for merging satellite images. In satellite imagery we can have two
types of images:

• Panchromatic images - an image collected in the broad visual wavelength range but
rendered in black and white.

• Multispectral images - images optically acquired in more than one spectral or
wavelength interval. Each individual image is usually of the same physical area and scale
but of a different spectral band.

The standard merging methods of image fusion are based on Red-Green-Blue (RGB) to
Intensity-Hue-Saturation (IHS) transformation. The usual steps involved in satellite image fusion
are as follows:

1. Resize the low resolution multispectral images to the same size as the panchromatic image.
2. Transform the R,G and B bands of the multispectral image into IHS components.
3. Modify the panchromatic image with respect to the multispectral image. This is usually
performed by histogram matching of the panchromatic image with the intensity component of the
multispectral image as reference.
4. Replace the intensity component by the panchromatic image and perform inverse
transformation to obtain a high resolution multispectral image.
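A minimal sketch of these steps using a simple linear IHS-style transform in which the intensity component is the band average and the histogram matching is approximated by mean/standard-deviation matching. Inputs are assumed to be co-registered float images in [0, 1], with the multispectral image already resampled to the panchromatic grid; names are illustrative.

import numpy as np

def ihs_fuse(ms_rgb, pan):
    """ms_rgb: (H, W, 3) resampled multispectral image; pan: (H, W) panchromatic band."""
    intensity = ms_rgb.mean(axis=2)                                        # step 2: crude I component
    # step 3: match the pan band to the intensity component (mean/std matching)
    pan_matched = (pan - pan.mean()) / (pan.std() + 1e-12) * intensity.std() + intensity.mean()
    # step 4: replace I with the matched pan band; for this linear transform that is
    # equivalent to adding the detail difference back into every band
    fused = ms_rgb + (pan_matched - intensity)[..., None]
    return np.clip(fused, 0.0, 1.0)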

Medical Image Fusion

Image fusion has become a common term used within medical diagnostics and treatment. The
term is used when multiple patient images are registered and overlaid or merged to provide
additional information. Fused images may be created from multiple images from the same
imaging modality, or by combining information from multiple modalities, such as magnetic
resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and
single photon emission computed tomography (SPECT). In radiology and radiation oncology,
these images serve different purposes. For example, CT images are used more often to ascertain
differences in tissue density while MRI images are typically used to diagnose brain tumors.

For accurate diagnoses, radiologists must integrate information from multiple image formats.
Fused, anatomically-consistent images are especially beneficial in diagnosing and treating
cancer. Companies such as Nicesoft, Velocity Medical Solutions, Mirada Medical, Keosys,
MIMvista, IKOE, and BrainLAB have recently created image fusion software for both improved
diagnostic reading, and for use in conjunction with radiation treatment planning systems. With
the advent of these new technologies, radiation oncologists can take full advantage of intensity
modulated radiation therapy (IMRT). Being able to overlay diagnostic images onto radiation
planning images results in more accurate IMRT target tumor volumes.

STEGANOGRAPHY
Steganography is the art and science of writing hidden messages in such a way that no one,
apart from the sender and intended recipient, suspects the existence of the message, a form of
security through obscurity. The word steganography is of Greek origin and means "concealed
writing" from the Greek words steganos (στεγανός) meaning "covered or protected", and
graphein (γράφειν) meaning "to write". The first recorded use of the term was in 1499 by
Johannes Trithemius in his Steganographia, a treatise on cryptography and steganography
disguised as a book on magic. Generally, messages will appear to be something else: images,
articles, shopping lists, or some other covertext and, classically, the hidden message may be in
invisible ink between the visible lines of a private letter.

The advantage of steganography, over cryptography alone, is that messages do not attract
attention to themselves. Plainly visible encrypted messages—no matter how unbreakable—will
arouse suspicion, and may in themselves be incriminating in countries where encryption is
illegal. Therefore, whereas cryptography protects the contents of a message, steganography can
be said to protect both messages and communicating parties.

Steganography includes the concealment of information within computer files. In digital
steganography, electronic communications may include steganographic coding inside of a
transport layer, such as a document file, image file, program or protocol. Media files are ideal for
steganographic transmission because of their large size. As a simple example, a sender might
start with an innocuous image file and adjust the color of every 100th pixel to correspond to a
letter in the alphabet, a change so subtle that someone not specifically looking for it is unlikely to
notice it.
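In the same spirit as that example, here is a toy least-significant-bit sketch in which message bits overwrite the lowest bit of successive pixel values of a cover image, a change far too small to notice by eye. It is an illustration only, not a secure scheme; function names are illustrative.

import numpy as np

def embed(cover, message):
    """cover: uint8 image array; message: bytes. Returns a stego copy of the image."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = cover.flatten()                                    # flatten() returns a copy
    if bits.size > flat.size:
        raise ValueError("message too long for this cover image")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits       # overwrite the least significant bit
    return flat.reshape(cover.shape)

def extract(stego, n_bytes):
    """Read back n_bytes previously hidden by embed()."""
    bits = stego.flatten()[:n_bytes * 8] & 1
    return np.packbits(bits).tobytes()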

Steganographic Techniques

• Physical Steganography

• Digital Steganography

• Network Steganography

• Text Steganography

• Steganography using Sudoku Principles

Digital Steganography

Modern steganography entered the world in 1985 with the advent of the personal computer being
applied to classical steganography problems. Development following that was slow, but has
since taken off, going by the number of "stego" programs available: Over 800 digital
steganography applications have been identified by the Steganography Analysis and Research
Center. Digital steganography techniques include:

• Concealing messages within the lowest bits of noisy images or sound files.
• Concealing data within encrypted data or within random data. The data to be concealed is first
encrypted before being used to overwrite part of a much larger block of encrypted data or a
block of random data (an unbreakable cipher like the one-time pad generates ciphertexts that
look perfectly random if you don't have the private key).
• Chaffing and winnowing.
• Mimic functions convert one file to have the statistical profile of another. This can thwart
statistical methods that help brute-force attacks identify the right solution in a ciphertext-only
attack.
• Concealed messages in tampered executable files, exploiting redundancy in the targeted
instruction set.
• Pictures embedded in video material (optionally played at slower or faster speed).
• Injecting imperceptible delays to packets sent over the network from the keyboard. Delays in
keypresses in some applications (telnet or remote desktop software) can mean a delay in
packets, and the delays in the packets can be used to encode data.
• Changing the order of elements in a set.
• Content-Aware Steganography hides information in the semantics a human user assigns to a
datagram. These systems offer security against a non-human adversary/warden.
• Blog-Steganography. Messages are fractionalized and the (encrypted) pieces are added as
comments of orphaned web-logs (or pin boards on social network platforms). In this case the
selection of blogs is the symmetric key that sender and recipient are using; the carrier of the
hidden message is the whole blogosphere.
• Modifying the echo of a sound file (Echo Steganography)

Applications

Image Steganography has many applications, especially in today's modern, high-tech world.
Privacy and anonymity are a concern for most people on the internet. Image Steganography allows
two parties to communicate secretly and covertly. It allows some morally-conscious
people to safely blow the whistle on internal actions; it allows for copyright protection on digital
files using the message as a digital watermark. One of the other main uses for Image
Steganography is for the transportation of high-level or top-secret documents between
international governments. While Image Steganography has many legitimate uses, it can also be
quite nefarious. It can be used by hackers to send viruses and trojans to compromise machines,
and also by terrorists and other organizations that rely on covert operations to communicate
secretly and safely.

DIGITAL COMPOSITING

Digital compositing is the process of digitally assembling multiple images to make a final image, typically
for print, motion pictures or screen display. It is the evolution into the digital realm of optical film
compositing.

Mathematics

The basic operation used is known as 'alpha blending', where an opacity value, α, is used to
control the proportions of two input pixel values that end up in a single output pixel.

Consider three pixels:

• a foreground pixel, f
• a background pixel, b
• a composited pixel, c

and

• α, the opacity value of the foreground pixel (α = 1 for an opaque foreground, α = 0 for a
completely transparent foreground). A monochrome raster image where the pixel values are to be
interpreted as alpha values is known as a matte.

Then, considering all three colour channels, and assuming that the colour channels are expressed
in a γ=1 colour space (that is to say, the measured values are proportional to light intensity), we
have:

c_r = α f_r + (1 − α) b_r

c_g = α f_g + (1 − α) b_g

c_b = α f_b + (1 − α) b_b

Note that if the operations are performed in a colour space where γ is not equal to 1 then the
operation will lead to non-linear effects which can potentially be seen as aliasing artifacts (or
'jaggies') along sharp edges in the matte. More generally, nonlinear compositing can have effects
such as "halos" around composited objects, because the influence of the alpha channel is non-
linear. It is possible for a compositing artist to compensate for the effects of compositing in non-
linear space.

Performing alpha blending is an expensive operation if performed on an entire image or 3D
scene. If this operation has to be done in real time in video games there is an easy trick to boost
performance.
c_out = α f_in + (1 − α) b_in

c_out = α f_in + b_in − α b_in

c_out = b_in + α (f_in − b_in)

By simply rewriting the mathematical expression one can save 50% of the multiplications
required.
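A minimal sketch of this rewritten blend, assuming linear-light (γ = 1) float images and a matte with alpha values in [0, 1]; the single multiply per channel is exactly the saving described above. Names are illustrative.

import numpy as np

def alpha_blend(fg, bg, alpha):
    """fg, bg: (H, W, 3) float images; alpha: (H, W) or (H, W, 1) matte in [0, 1]."""
    a = np.atleast_3d(alpha)                  # broadcast the matte over the colour channels
    return bg + a * (fg - bg)                 # c = b + alpha * (f - b): one multiply per channel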

Algebraic properties

When many partially transparent layers need to be composited together, it is worthwhile to
consider the algebraic properties of the compositing operators used. Specifically, associativity
and commutativity determine when repeated calculation can or cannot be avoided.

Consider the case when we have four layers to blend to produce the final image:
F=A*(B*(C*D)) where A, B, C, D are partially transparent image layers and "*" denotes a
compositing operator (with the left layer on top of the right layer). If only layer C changes, we
should find a way to avoid re-blending all of the layers when computing F. Without any special
considerations, four full-image blends would need to occur. For compositing operators that are
commutative, such as additive blending, it is safe to re-order the blending operations. In this
case, we might compute T=A*(B*D) only once and simply blend T*C to produce F, a single
operation. Unfortunately, most operators are not commutative. However, many are associative,
suggesting it is safe to re-group operations without changing their order. In this case we may
compute S=A*B once and save this result. To form F with an associative operator, we need only
do two additional compositing operations to integrate the new layer C: F=S*(C*D). Note that
this expression indicates compositing C with all of the layers below it in one step and then
blending all of the layers on top of it with the previous result to produce the final image in the
second step.

If all layers of an image change regularly but a large number of layers still need to be composited
(such as in distributed rendering), the associativity of a compositing operator can still be
exploited to speed up computation through parallelism even when there is no gain from pre-
computation. Again, consider the image F=A*(B*(C*D)). Each compositing operation in this
expression depends on the next, leading to serial computation. However, associativity allows us
to rewrite this as F=(A*B)*(C*D), where there are clearly two operations that do not depend on
each other and may be executed in parallel. In general, we can build a tree of pair-wise
compositing operations with a height that is logarithmic in the number of layers.
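As a small numerical illustration, the sketch below uses the premultiplied-alpha "over" operator, which is associative but not commutative, and checks that regrouping the four-layer composite as described above leaves the result unchanged. The operator choice and layer sizes are assumptions made for the example.

import numpy as np

def over(top, bottom):
    """Compositing 'over' for premultiplied RGBA layers with values in [0, 1]."""
    return top + (1.0 - top[..., 3:4]) * bottom

rng = np.random.default_rng(1)

def random_layer(shape=(8, 8)):
    rgb, a = rng.random(shape + (3,)), rng.random(shape + (1,))
    return np.concatenate([rgb * a, a], axis=-1)          # premultiply colour by alpha

A, B, C, D = (random_layer() for _ in range(4))
serial = over(A, over(B, over(C, D)))                     # F = A*(B*(C*D))
regrouped = over(over(A, B), over(C, D))                  # F = (A*B)*(C*D): two independent halves
assert np.allclose(serial, regrouped)                     # associativity lets us regroup safely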

Compositing

Compositing often also includes scaling, retouching and color correction of images. There are
two radically different digital compositing workflows:

• node-based compositing and

• layer-based compositing.

Node-Based Compositing

Node-based compositing represents an entire composite as a tree graph, linking media
objects and effects in a procedural map, intuitively laying out the progression from source input
to final output, and is in fact the way all compositing applications internally handle composites.
to final output, and is in fact the way all compositing applications internally handle composites.
This type of compositing interface allows great flexibility, including the ability to modify the
parameters of an earlier image processing step "in context" (while viewing the final composite).
Node-based compositing packages often handle keyframing and time effects poorly, as their
workflow does not stem directly from a timeline, as do layer-based compositing packages.
Software which incorporates a node-based interface includes Apple Shake, Blender, eyeon
Fusion, and Foundry Nuke.

Layer-Based Compositing

Layer-based compositing represents each media object in a composite as a separate layer
within a timeline, each with its own time bounds, effects, and keyframes. All the layers are
stacked, one above the next, in any desired order; the bottom layer is usually rendered as a
base in the resultant image, with each higher layer being progressively rendered on top of the
previously composited layers, moving upward until all layers have been rendered into the final
composite. Layer-based compositing is very well suited for rapid 2D and limited 3D effects such
as in motion graphics, but becomes awkward for more complex composites entailing a large
number of layers. A partial solution to this is some programs' ability to view the composite order
of elements (such as images, effects, or other attributes) in a visual diagram called a flowchart,
or to nest compositions, or "comps," directly into other compositions, thereby adding complexity
to the render order by first compositing the layers of the nested composition, then combining that
resultant image with the layers of the enclosing composition, and so on.

MOSAICS

Many problems require finding the coordinate transformation between two images of the same
scene or object. One of them is Image Mosaicing. It is important to have a precise description of
the coordinate transformation between a pair of images. Image mosaics are collections of
overlapping images together with coordinate transformations that relate the different image
coordinate systems. By applying the appropriate transformations via a warping operation and
merging the overlapping regions of the warped images, it is possible to construct a single image
covering the entire visible area of the scene. This merged single image is the motivation for the
term "mosaic".

Image mosaics allow one to compensate for differences in viewing geometry. Thus they can be
used to simplify vision tasks by simulating the condition in which the scene is viewed from a
fixed position with a single camera. Mosaics are therefore quite useful in tasks involving motion or
change detection, or in determining the relative pose of new images that are acquired. They can
be used to determine what parts of the scene visible from that point have been observed.
There are two types of image mosaicing:

• Intensity-Based Image Mosaicing

• Feature-Based Image Mosaicing

Intensity-Based Image Mosaicing

We know that two images in a panoramic image sequence, or two images belonging to a planar
scene, are related by an affine transformation. In particular, the image coordinates (x', y') of any
point in the scene in one image are related to its coordinates (x, y) in the other image by an
equation of the form:

x' = a11 x + a12 y + b1
y' = a21 x + a22 y + b2

In intensity-based mosaicing these affine parameters are estimated directly from the image
intensities in the overlapping region.

Feature-Based Image Mosaicing

In this method we first construct edge images by applying an edge detector to the grey scale
versions of the images. We then find features in one image and correlate them in the next image
of the sequence.

Corner Detection

The basic features that we used for correlation were corners. For detection of the corners we tried
several techniques. The first one was the method proposed by Rangarajan, Shah and Van
Brackle. The basic idea of that method was to classify the types of corner that occur in the
images by considering the quadrants occupied by the cone portion of the corner. For that purpose
they defined 12 types of kernels for the possible types of corners. We found that designing kernels
for the different types of corners and then convolving with them was too costly for our
purpose. Also, for the matching, detection of every kind of corner is not necessary. So next we
tried simple kernels to find "straight angle" (right angle) corners. The kernels that we
used were basically 9x9 windows which have 1's at the sides and zeros everywhere else, for
different rotations of the right angle. We basically did template matching with those kernels. This
method was computationally more efficient than the previous one, since there was no
multiplication. But the edge pixels formed many parallel lines that were close enough together to
cause misclassified corner detections. So lastly we tried a simple heuristic approach for the
detection of corners. Our first observation motivating this method was that the edges were single
pixel thick. This type of edge makes edge following easier. The method that we used basically
follows edge pixels in a 13x13 window: we follow the edge pixels in this window and try to find
a connected component that enters the window from the east or west side and leaves it from the
north or south side. We discard the components whose length is less than the window length. In
this way we avoid detecting duplicate corners and false corners. Also the running time was
impressive.

One of the improvements that we introduced in the corner detection and also in the matching
stages was the assumption of the motion direction. Since the major motion direction was in the x
direction, we put a limit on the area in which we detect features. Some of the features, due to the
movement, fall outside the next image, so detecting these features is not necessary, since we
cannot match them.

Corner Matching

In the matching stage we correlate the edge pixels in the 9x9 template centered on the
corner pixel in one image with the following image in the sequence. As we said above, the
correlation of the template is not done over the entire image. The assumption of only slight y
movement and the threshold put on the x direction movement improved our correlation
method; without this assumption mismatches are inevitable. The scoring method that we
used basically gives a penalty to the mismatched edge pixels and rewards the matches. We
observed that the choice of the window size for the matching template is crucial for getting good
matches. As the window size gets larger, the small group of edge pixels gets covered, which makes
the scoring scheme difficult. But giving a penalty to the unmatched edge pixels sometimes
improves the optimal match, although the matches are not perfect. We think that the reason
for not getting perfect matches, although there is almost no rotation, is the slight contrast
difference between images of the same area, which results in different edge pixels. But the results
that we obtained from the edge-based mosaicing were satisfactory.
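A minimal sketch of this edge-template matching under the assumptions described above (mostly horizontal motion, a small y search range, a reward for coinciding edge pixels and a penalty for mismatches); the window size, search limits and penalty weight are illustrative rather than the authors' exact values.

import numpy as np

def match_corner(edges_a, edges_b, corner, search_x=30, search_y=3, win=4, penalty=0.5):
    """edges_a, edges_b: binary edge maps; corner: (row, col) of a detected corner in image A.
    Returns the best matching (row, col) in image B and its score."""
    r, c = corner
    templ = edges_a[r - win:r + win + 1, c - win:c + win + 1].astype(int)   # 9x9 edge template
    best, best_score = None, -np.inf
    for dy in range(-search_y, search_y + 1):             # only slight vertical movement assumed
        for dx in range(-search_x, search_x + 1):         # wider search along the motion direction
            rr, cc = r + dy, c + dx
            if rr - win < 0 or cc - win < 0:
                continue                                  # window would fall off the image
            patch = edges_b[rr - win:rr + win + 1, cc - win:cc + win + 1].astype(int)
            if patch.shape != templ.shape:
                continue
            score = np.sum(templ * patch) - penalty * np.sum(templ ^ patch)  # reward matches, penalise mismatches
            if score > best_score:
                best, best_score = (rr, cc), score
    return best, best_score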

Blending images into one Mosaic

After finding the transformation parameters for each successive image pair, we have to put all of
the images into one big image. An easy way to put all images into one mosaic is the superimposing
method. For example, initially the mosaic is empty; then the first image is put into the mosaic; then
the second image is put into the mosaic wherever the mosaic has empty pixels. If any pixel in the
second image is mapped to a mosaic pixel which has not already been occupied by a previous image
pixel, then the value of that pixel (from the second image) is used as the value of the mapped pixel in
the mosaic. In this method, each pixel in the mosaic takes its value from only one image. This method
gives unblurred results, but it has many artifacts, especially at the corners of the overlapping area,
because of misalignment.
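A minimal sketch of this superimposing step, assuming each image has already been warped onto the mosaic grid and comes with a validity mask; a mosaic pixel keeps the value of the first image that covered it. Names are illustrative.

import numpy as np

def superimpose(mosaic, occupied, warped, valid):
    """mosaic: (H, W) canvas; occupied: (H, W) bool mask of already-filled pixels;
    warped: (H, W) warped image; valid: (H, W) bool mask of its usable pixels."""
    write = valid & ~occupied                 # only pixels the mosaic has not yet been given
    mosaic[write] = warped[write]
    occupied |= valid                         # these mosaic pixels are now filled
    return mosaic, occupied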
