
Geophysical Signals and Data Processing

Introduction
Geophysical "signals" exist in a variety of forms. The signals can depend upon time or location, or both.
For instance at any location on Earth, the magnetic field changes with time and we want to record this
temporal variation. Alternatively, a buried drum or pipe may have a magnetic field which varies with
location. Sometimes we have signals that depend upon both time and space. Seismic and ground
penetrating radar signals fall into this category.

A map of data, with values changing as a function of position.

A "section" of data, with echo strength varying as a function of time (vertical direction) and position along a line (horizontal direction).

Signals in time or space are continuous. Consider the magnetic field, barometric pressure, ground motion,
or wind speed at any location. Each of these quantities varies continually with time. Most instruments are
electromechanical devices that produce a continuous waveform as a function of time. If this waveform
were recorded directly then we would have an analogue signal. (e.g. a barometric pressure chart, ground
motion measured with a pen on a rotating drum). Prior to the age of digital recorders analogue records
were the only form of data collection. Today most signals are digitally sampled so they can be stored on
computers, processed and plotted. Before looking at sampling we consider the simplest and most
fundamental of all signals, sinusoids.

Sinusoids
Sinusoids can be written as:

    A sin(ωt + φ)   or   A cos(ωt + φ)

These are the harmonic continuous waveforms, where:
1. ω is the angular frequency expressed in radians/sec.
2. φ is the phase (in radians).
3. ω = 2πf, where f is the linear frequency measured in Hz (hertz) or cycles per second. So we could write each of the above sinusoids like sin(2πft).

Any signal may be made up of a linear combination of sinusoids with each sinusoid having its own
frequency, amplitude and phase. This is the heart of Fourier analysis which we shall talk briefly about
later.

Digital data, Sampling Interval, Nyquist Frequency, Aliasing


Analogue data are difficult to work with. Much greater flexibility in processing data is obtained by
working with digital data. Effectively, a signal is "sampled" at uniform intervals (Δt for a time-domain
signal, Δx for a spatial signal). Consider the example below. The continuous sinusoid at the top is sampled
at 12 equispaced intervals. After sampling, the numbers (0, .5, .9, ...-.5, 0) are the only knowledge that we
have of the signal. If we plotted out the points and connected them, we would obtain a reasonable picture
of the initial signal. However, as you sample this signal with fewer and fewer points, your ability to
recover the information in the original signal will progressively deteriorate. Imagine trying to represent
this waveform with a single sample value!
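The sampling step described above can be sketched in a few lines of Python (a minimal illustration; the rounded values only approximately match the list quoted in the text):

```python
import numpy as np

# Sample one period of a unit-amplitude sinusoid at 12 equispaced points.
n_samples = 12
t = np.arange(n_samples) / n_samples      # times as fractions of the period
samples = np.sin(2 * np.pi * t)           # the only knowledge we keep of the signal

# The continuous waveform is now represented by just these numbers;
# rounding to one decimal gives values like 0, 0.5, 0.9, ..., -0.5.
print(np.round(samples, 1))
```

Joining these points recovers a reasonable picture of the original sinusoid; with fewer samples the reconstruction degrades.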

Signal (top) sampled at intervals = 1/12th of the sinusoid's period (bottom).
From An Introduction to Geophysical Exploration, Kearey and Brooks, 1991.

This is effectively illustrated in a second example below; click the figure for a larger version with an
explanation. The top curve is an analogue signal and the dots represent the sampling points. The sampled
time series is shown in the middle, and the reconstructed signal, obtained by joining the dots in (b) is given
at the bottom. Notice how much information is lost.

Top: An analogue signal, with dots representing the sampling points.
Middle: The sampled time series.
Bottom: The reconstructed signal, obtained by joining the dots in (b). Notice how much information is lost.
From Seismic Data Processing, Ozdogan Yilmaz, 1987.

How often do you need to sample a signal so that all information is retained?
In order to retain all of the information that is in the initial signal, it is necessary to sample so that you
have at least two samples per cycle for the highest frequency that is present in your data. If you sample a
sinusoid with a frequency of 50 Hz then your sampling interval must be less than or equal to 0.01 seconds; in other words, a 50 Hz sinusoid must be sampled at least 100 times per second. If you sample a
complicated signal made up of many frequencies, say from 50 Hz out to 300 Hz (like a typical seismic
signal), you must sample this signal with at least 600 samples per second in order to recover the signal
accurately.
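The sampling-rate rule above is easy to encode. A minimal sketch (the function names are my own, purely for illustration):

```python
def min_sampling_rate(f_max):
    """Minimum samples per second needed to capture frequencies up to f_max (Hz)."""
    return 2.0 * f_max

def max_sampling_interval(f_max):
    """Largest sampling interval (seconds) that still captures f_max."""
    return 1.0 / (2.0 * f_max)

# A 50 Hz sinusoid needs at least 100 samples/s, i.e. dt <= 0.01 s.
print(min_sampling_rate(50.0), max_sampling_interval(50.0))   # 100.0 0.01
# A seismic signal with content out to 300 Hz needs at least 600 samples/s.
print(min_sampling_rate(300.0))                                # 600.0
```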
If you have a fixed sampling interval (for example, you only took measurements every 5 meters along a
line), what is the highest signal frequency you could recover from the data? The question is so important
that there is a name for the highest frequency sinusoid that can be recovered: it is called the Nyquist
frequency and its formula is

    f_N = 1/(2Δt)

where Δt is the sampling interval. If the frequency of a signal in your data exceeds the Nyquist frequency
(or equivalently if your sample interval is too large and you have fewer than two samples per cycle) then
you will in fact recover a sinusoid that has a lower frequency than the real signal. The phenomenon is
called "aliasing". The next figure illustrates why it happens:

Aliasing: Sinusoid must be sampled at least 2 times per cycle.


The Nyquist frequency is given by f_N = 1/(2Δt).
From An Introduction to Geophysical Exploration, Kearey and Brooks, 1991.

Aliasing can have disastrous consequences when interpreting geophysical data, and it should be guarded
against whenever possible. Proper observation begins by knowing the highest frequency (or wavenumber
for spatial signals) in the signal and choosing a sampling interval accordingly. This is generally possible in
time-recordings, but it is more difficult in the spatial domain because earth structure near the surface can
give rise to high frequency (or more correctly, high wavenumber) signals.

A continuously sampled profile. Figure 1.7 from Reynolds, 1997.

Two versions with 10 m sampling intervals, but at different locations.

2 m sampling interval: profile is still aliased.

1 m sampling interval: this profile is close enough to (A).

The effects can be serious for maps as well. Click the images for full-page versions of these two figures
from Reynolds, 1997.
Examples of contouring different patterns of data. (A) shows a set of radial lines, and (B) an even grid of data, both with 114 points per square kilometre. (C) has far too few data points unevenly spread over the same area (23 data points per square kilometre); the results are seriously "aliased", i.e. the spatial information obtained is "wrong". (D) shows the result of contouring an even grid of 453 data points per square kilometre. The contours are lines of constant total magnetic field strength (in nanoteslas). The data are from a ground magnetometer investigation of north-west Dartmoor, England.

Figure 1.9

Example of spatial aliasing on aeromagnetic data, showing


the loss of higher-frequency anomalies with increasing
separation between flight lines, and the increasing "bullseye"
effect caused by stretching the data too far.
From An introduction to applied and environmental geophysics, J.
M. Reynolds, 1997.

Figure 1.8 from Reynolds, 1997.

In the left figure, data have been acquired along the vertical solid lines, and then a contour map was
produced. In the right-hand figure, data have been acquired only at the positions of solid dots. Click the
figures to see full scale versions of these figures.
What in fact will the recorded data look like if it has been aliased? Suppose you are sampling data which has a frequency f_true that is greater than the Nyquist frequency. The digital series will look as if it has a sinusoid at frequency f_rec, where

    f_rec = f_true − n/Δt

where n = 0, 1, 2, ... and is large enough so that −f_N ≤ f_rec ≤ f_N.
The negative frequencies are needed in the description. Perhaps the easiest way to get an intuitive feel for the need for negative frequencies is to remember the old western movies with wagon wheels. The camera is sampling the image at a fixed rate. Depending upon the actual speed of the wagon, the wagon wheel can appear to be moving clockwise (positive frequency), stopped (zero frequency), or moving counter-clockwise (negative frequency). Anyone who is familiar with watching scenes in the presence of strobe lighting will be familiar with these effects.
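The recorded (aliased) frequency can be computed by folding the true frequency back into the band between −f_N and +f_N. A small sketch of this folding rule:

```python
def aliased_frequency(f_true, dt):
    """Frequency (Hz) that a sinusoid of f_true appears to have when sampled
    at interval dt. A negative result means the apparent rotation sense is
    reversed (the wagon wheel turning backwards)."""
    fs = 1.0 / dt                    # sampling frequency
    f_rec = f_true % fs              # fold into the range [0, fs)
    if f_rec > fs / 2.0:             # fold the upper half to negative frequencies
        f_rec -= fs
    return f_rec

# A 75 Hz sinusoid sampled every 8 ms (Nyquist = 62.5 Hz) appears at 50 Hz,
# with reversed sense; a 30 Hz sinusoid (below Nyquist) is unchanged.
print(aliased_frequency(75.0, 0.008))   # -50.0
print(aliased_frequency(30.0, 0.008))   # 30.0
```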

Signals depicted in time or frequency


Here is one more example of aliasing. Click figure for a larger version. This figure introduces a new
kind of graph - the spectrum. Normal plots of data show signal strength versus time or position. (Contour
maps do this for x-y positions.) A spectrum plots the signal strength versus frequency. The x-axis has units
of Hertz (cycles per second). The spectrum of a signal that is a combination of two sinusoids will look like
two spikes (as shown below).
A time series synthesized from two sinusoids at 12.5 and 75 Hz at a 2-ms sampling rate remains unchanged when resampled at 4 ms. However, at 8 ms, its high-frequency component shifts from 75 to 50 Hz, while its low-frequency component remains the same.
From Seismic Data Processing, Ozdogan Yilmaz, 1987.
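The 75 to 50 Hz shift in the example above can be verified numerically. A sketch using NumPy's FFT, with a one-second window at an 8-ms sampling interval:

```python
import numpy as np

dt = 0.008                             # 8-ms sampling interval -> Nyquist = 62.5 Hz
n = 125                                # one second of data
t = np.arange(n) * dt
signal = np.sin(2 * np.pi * 75.0 * t)  # 75 Hz is above the Nyquist frequency

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n, dt)         # frequency axis, 1 Hz spacing

# The spectral peak shows up at 50 Hz, not 75 Hz: the component has aliased.
print(freqs[np.argmax(spectrum)])      # 50.0
```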

To complete the illustration of time versus frequency plots (a normal signal versus its spectrum), here is a
seismic signal (ground motion caused by an earthquake). The time series is a complicated looking signal.
In fact, this signal can be described as a combination of a very large number of sinusoids, all with
amplitudes as shown in the spectrum graph.
TOP: A record of ground motion
caused by an earthquake.
Amplitude is speed at which the
ground moved up and down.
This signal seems to have some
dominant frequency, but the
signal is certainly not only a
simple sinusoid.
BOTTOM: This signal's amplitude
spectrum. The signal's dominant
frequency is clear (the spike at
2 Hertz), but the spectrum shows
that there are also small amounts of
other lower and higher frequencies
contributing to the signal.

Digital Filtering
There are many reasons for wanting to filter your data:
1. to remove high-frequency fluctuations which you feel reflect "noise" rather than desired signal;
2. to separate regional and anomalous fields;
3. to perform upward or downward continuation on the data (that is, to see what the data would have looked like if they had been recorded at a different elevation);
4. to amplify the signal;
5. to remove outliers.
Some types of filtering are most easily carried out by working with the original recorded time series. In
other cases the filtering is most easily accomplished by first converting the recorded time (or space - as in
a map) data into the frequency domain, working with this so-called "Fourier transformed data", then
converting the result back into the time (or space) domains to see and use the filtered data. These two
options for filtering are illustrated in the following diagram. Fourier transforms and the related
mathematics are interesting enough to be the subject of whole courses in engineering, financial analysis, as
well as geophysics and other disciplines.
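The second route (transform, modify in the frequency domain, transform back) can be sketched as a simple low-pass filter. This is an illustration of the idea only, not any particular geophysical package:

```python
import numpy as np

def lowpass(signal, dt, cutoff_hz):
    """Zero out all Fourier components above cutoff_hz, then transform back."""
    spectrum = np.fft.rfft(signal)                 # to the frequency domain
    freqs = np.fft.rfftfreq(len(signal), dt)
    spectrum[freqs > cutoff_hz] = 0.0              # the "filtering" step
    return np.fft.irfft(spectrum, n=len(signal))   # back to the time domain

# A 5 Hz "signal" contaminated by 60 Hz "noise", sampled at 1 ms.
dt = 0.001
t = np.arange(1000) * dt
clean = np.sin(2 * np.pi * 5.0 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 60.0 * t)

filtered = lowpass(noisy, dt, cutoff_hz=30.0)
print(np.max(np.abs(filtered - clean)))            # tiny: the 60 Hz part is gone
```

An abrupt cutoff like this causes ringing on real data; practical filters taper the spectrum smoothly instead, but the transform-modify-invert pattern is the same.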

Noise and Stacking


Geophysical data are the sum of two elements:
1. SIGNAL: the response that we are looking for and which will ultimately be interpreted
2. NOISE: everything else that is contaminating the signal
so

OBSERVATION = SIGNAL + NOISE

There are many different kinds of "noise" and in fact there is no unique definition because noise depends
upon the goals of the problem. The saying "one man's signal is another man's noise" is often valid for
geophysical data. (e.g. seismic reflection/refraction data; ocean bottom magnetometer data; magnetic vs
magnetotelluric data).
Noise in geophysical data may be:
1. Instrumental noise (from electronics)
2. Cultural noise (power lines, fences)
3. Natural noise (due to winds, electromagnetic storms)
4. Geologic noise (additional signal caused by localized distorting bodies, topography).

It is possible to reduce noise without loss of signal if you collect the noisy data several times. Here's how it
works. Suppose you averaged a whole set of independent versions of the experiment. Each version has noise, but if the noise is random, the noise is different in each version of the data. Let us name the noise on the "ith" version of the experiment X_i. This noise has a standard deviation of σ. If we averaged all the noisy parts by themselves, the result would look like

    Y = (1/N)(X_1 + X_2 + ... + X_N)

where N is the actual number of times the experiment was done (in other words, the number of trials). The mean (average) value of Y is zero because the mean value of each random variable X_i is zero. However, the standard deviation of this average is

    σ_Y = σ/√N

This says that the standard deviation of our averaged noise is smaller than the standard deviation of any one trial by a factor of 1 over the square root of the number of trials.
This has important implications. If you sum geophysical observations which are composed of the same
signal but different realizations of the noise then the signal to noise ratio is improved by a factor of
sqrt(N). For this reason it is often customary to record data that come from repetitions of the same
experiment, and to then average the results. This is sometimes referred to as "stacking". (NOTE: to be strictly correct, the arguments above assume that the random noise is in fact "Gaussian", which is a particular type of randomness. This is not an unreasonable assumption for many types of noise on geophysical data.)
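The square-root-of-N improvement is easy to check with simulated noise. A sketch assuming Gaussian noise of unit standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100                      # number of repeats of the experiment
n_experiments = 10000               # independent estimates, to measure the spread

# Each row holds the noise from one full set of n_trials repeats.
noise = rng.normal(0.0, 1.0, size=(n_experiments, n_trials))
stacked = noise.mean(axis=1)        # "stacking" = averaging the repeats

# The standard deviation drops from 1 to about 1/sqrt(100) = 0.1.
print(stacked.std())
```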
Example of stacking. The top figure in the image below is the original signal. Click the buttons to see versions of the signal before and after stacking, as indicated by each button's label: single instances of random noise added to the original signal, and the results of stacking 2, 4, 9, 25, 100, 625, and 10,000 such signals, each with a different instance of noise.
Figures by R. Shekhtman and F. Jones.


Question to think about ... How much improvement in signal-to-noise ratio can be expected for each
case?

Gaussian Noise
Gaussian noise is characterized by a probability that any realization of the noise takes on a certain value. The probability density function looks like:

    p(x) = (1/(σ√(2π))) exp(−x²/(2σ²))

The quantity σ is the standard deviation of the variable and is not to be confused with the electrical conductivity. If you take many samples of the random variable you will find that about 68% lie between ±σ and 95% lie between ±2σ.
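These percentages are easy to verify empirically. A quick sketch (68% and 95% are the usual one- and two-sigma figures for a Gaussian distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
x = rng.normal(0.0, sigma, size=100_000)   # many realizations of Gaussian noise

within_1 = np.mean(np.abs(x) < 1 * sigma)  # fraction inside +/- one sigma
within_2 = np.mean(np.abs(x) < 2 * sigma)  # fraction inside +/- two sigma
print(within_1, within_2)                  # roughly 0.68 and 0.95
```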

Here are 1000 Gaussian random numbers between +50 and −50. Bars show the actual distribution of numbers (in percent). The theoretical probability distribution is shown by the red curve.

Finally, in general we don't know very much about the noise which is contaminating the data. We need to
ascribe some uncertainty to the measurement so we often make the simplest assumption. We say that the
noise is Gaussian and has a given standard deviation and is unbiased. Furthermore we often assume that
the noise is uncorrelated between adjacent samples. All of these assumptions are likely violated to some
degree but this still gives us a good place to start.
In advanced interpretation we are interested in finding a distribution of physical properties that reproduces
or "fits" the data. The concept of adequately fitting the data requires a knowledge of what the errors are on
the data. Consider the following illustration.

Very noisy data: data are compatible with both +ve and −ve slope.
"Clean" data: data require a +ve slope.

The point here is, if we cannot tell how noisy the data are, we can NOT estimate reliably what kind of model could cause the data. Even if assigning some measure for error bars is done based upon experience rather than by actually measuring the noise level, it is still important. And, when the error bars are assigned, it is usual to assume the errors have Gaussian behaviour.
UBC Earth and Ocean Sciences, F. Jones.
