1. INTRODUCTION
Research in autonomous detection of machinery threats on the energy pipeline right of way from imagery captured
at altitudes of around 3000 feet is challenging, as a multitude of factors come into play. Issues in aerial imagery
such as the complex appearances of machinery (belonging to a similar kind), the resolution of the object, the noise
introduced by the image capture process and, more importantly, the height and angle at which the object has been
captured are some of the challenges that an object detection algorithm for aerial imagery needs to address. The
main objectives of this research are to tackle these issues through careful analysis of the imagery and to develop a
full-fledged system which can detect potential threats and aid human analysts in threat evaluation and subsequent
actions. To be more specific, the objectives in relation to machinery threat detection are as
follows:
To detect and classify various types of construction machinery on the pipeline ROW (Right of Way): Each
piece of construction equipment is characterized according to the following criteria. An illustration
showing the different types of equipment is shown in Figure 1.
Color, size and type of the machinery or equipment.
Outer and inner structural details of the machinery which are visible at around 1000-3000 feet above
the ground.
Further author information: (Send correspondence to Binu M Nair)
Binu M Nair: E-mail: nairb1@udayton.edu
Dr. Vijayan K Asari: E-mail: vasari1@udayton.edu
Figure 1 (panels): (a) Excavator, (b) Backhoe, (c) Mini-Excavator, (d) Trencher, (e) SkidSteer.
Figure panels: (b) Cast Shadows, (e) Overexposure.
fA(x) = f(x) + i fH(x)    (1)
Here f(x) is the original signal and fH(x) is its Hilbert transform, which can be calculated in the frequency domain
as given in Equation 2.

FH(u) = H1(u) F(u)    (2)

Here F(u) is the frequency domain representation of f(x) and H1(u) = i sgn(u) is the frequency-domain definition of the Hilbert
transform. Therefore, from Equation 1, the phase of a signal can be computed as

φ(x) = arctan(fH(x), f(x))    (3)
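The phase computation of Equations 1-3 can be sketched numerically. This is our illustration, not code from the paper; it assumes numpy and uses the convention H1(u) = i sgn(u) stated above:

```python
import numpy as np

def hilbert_phase(f):
    """Local phase of a real 1-D signal via an FFT-based Hilbert transform.
    Follows H1(u) = i*sgn(u) (Eq. 2) and phase = arctan(fH, f) (Eq. 3)."""
    F = np.fft.fft(f)
    H1 = 1j * np.sign(np.fft.fftfreq(len(f)))   # frequency response of Eq. 2
    fH = np.fft.ifft(H1 * F).real               # Hilbert transform fH(x)
    return np.arctan2(fH, f)                    # Eq. 3, quadrant-aware arctan

# Example: local phase of a pure cosine
x = np.arange(256)
f = np.cos(2 * np.pi * 8 * x / 256)
phase = hilbert_phase(f)
```

Note that `arctan2` is used so the phase is resolved over the full [-π, π] range rather than [-π/2, π/2].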
There have been multiple techniques that have attempted to extend the analytical signal representation to
multiple dimensions, such as the use of steerable filters. However, those techniques were not purely isotropic in
nature. The isotropic extension of the analytical signal representation is given by the monogenic signal [2] as in
Equation 4.

fM(x1, x2) = (f, fR)(x1, x2)    (4)

where fR(x1, x2) = (h ∗ f)(x1, x2) and h = (h1, h2) is the Riesz kernel. The spatial and frequency domain
representations of the Riesz kernel are given in Equations 5 and 6 respectively.
(h1, h2)(x1, x2) = ( x1 / (2π|x|^3), x2 / (2π|x|^3) ),  x = (x1, x2) ∈ R^2    (5)

(H1, H2)(u1, u2) = ( i u1 / |u|, i u2 / |u| ),  u = (u1, u2) ∈ R^2    (6)
As in Equation 3, the local phase can be computed for the two dimensional signal from the monogenic signal
representation as in Equation 7.

Φ(x) = (fR(x) / |fR(x)|) arctan( |fR(x)| / f(x) ) = φ(x) exp(i θ(x))    (7)

where φ(x) is the local phase and θ(x) is the local orientation. Both of these are computed from the Riesz transform
of the signal as shown in Equations 8 and 9.

φ = arctan( √(R1^2(f) + R2^2(f)), f )    (8)

θ = arctan( R2(f) / R1(f) ),  θ ∈ [0, π)    (9)

A measure of the local contrast can be estimated as shown in Equation 10.

A = √( f^2(x) + |fR(x)|^2 )    (10)
In the monogenic signal representation, the implicit assumption is that any signal, in a local sense, is considered
to be intrinsically one dimensional. The local amplitude, local phase and local orientation are computed for this
intrinsic 1D signal. Therefore, the design of the bandpass filter used in the construction of the monogenic signal representation
is significant.
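A minimal numerical sketch of the monogenic quantities in Equations 7-10 is given below. This is our illustration, not the authors' code; it computes the Riesz transform in the frequency domain with numpy, using the frequency-domain kernels of Equation 6:

```python
import numpy as np

def monogenic(f):
    """Local amplitude, phase and orientation of a 2-D signal via the Riesz
    transform computed in the frequency domain (Eqs. 6, 8-10)."""
    rows, cols = f.shape
    u1, u2 = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    mag = np.sqrt(u1**2 + u2**2)
    mag[0, 0] = 1.0                          # avoid division by zero at DC
    H1, H2 = 1j * u1 / mag, 1j * u2 / mag    # frequency-domain Riesz kernels
    F = np.fft.fft2(f)
    R1 = np.fft.ifft2(H1 * F).real           # Riesz components fR = (R1, R2)
    R2 = np.fft.ifft2(H2 * F).real
    fR = np.sqrt(R1**2 + R2**2)
    A = np.sqrt(f**2 + fR**2)                # local amplitude (Eq. 10)
    phase = np.arctan2(fR, f)                # local phase (Eq. 8)
    theta = np.mod(np.arctan2(R2, R1), np.pi)  # local orientation, mod pi (Eq. 9)
    return A, phase, theta

# Example: a cosine grating varying along x; its local amplitude is ~1 everywhere
y, x = np.mgrid[0:64, 0:64]
img = np.cos(2 * np.pi * x / 16)
A, phase, theta = monogenic(img)
```

In practice the signal would first be bandpassed (e.g. with a log-Gabor filter) before the Riesz transform, as the text notes; the sketch omits that step for brevity.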
3. METHODOLOGY
The general approach to object detection from aerial imagery is to use a novel object representation scheme
which uniquely describes an object's shape, structure, color and texture, and which is invariant to non-uniform
illumination, view-point, orientation and noise. These artifacts are mainly due to the image acquisition
process in the optical sensor onboard the aircraft. One of the main issues in designing an object representation
is invariance to lighting, i.e., to non-uniform illumination. For an object present in a good lighting
condition and another present in a dark lighting condition, the representations of the two objects should ideally
be similar. One such representation is
the use of phase information computed from frequency spectrum image analysis. A more localized version of
phase can be computed by monogenic signal analysis, where the local phase represents the local structure of the object
irrespective of the lighting present in the image. This is illustrated in Figure 3. We see that the local phase
brings out the structural details of the backhoe irrespective of the lighting present and helps in distinguishing it
from the surrounding objects in the background region such as trees, shrubs, buildings etc. Since the local phase
is illumination invariant, it is not affected by overexposure to lighting or very low illumination conditions, and it
projects the regular edges and corners associated with the description and representation of the object.
Figure 3: Top Row: Backhoe captured in Flight 8 (left) and the corresponding local phase computed in that region.
Bottom Row: Backhoe captured in Flight 6 (left) and the corresponding local phase information computed in
that region. (Courtesy of Vendor 1)
Figure 4: Images of the Excavator ((a) Excavator 1, (b) Excavator 2, (c) Excavator 3) illustrating the various
constraints that occur in aerial imagery for both Vendor 4 (left three) and Vendor 1 (right three).
A feature descriptor extracted from the local phase information tends to preserve this illumination-invariance
property, and hence this representation is very effective in describing objects captured
by optical sensors at an altitude of 500-3000 feet. Some of the constraints in using the local phase domain are
that the computation of local phase depends on the following factors:
Size of the object region: The sampling frequency here refers to the sampling used to create the monogenic
filters for computing the local phase information, which in turn is related to the size of the
region of interest containing the object or construction machinery.
Orientation of the object: The local phase changes with the orientation of the object. Since the local phase
inherently depends on the frequency spectrum of the object, a change in the orientation of the object causes
its frequency spectrum to get shifted thereby changing the local structure in a square neighborhood region.
Image Resolution: The variation in the resolution of the object captured in the scene can also cause changes
in the computation of the local phase. More specifically, the frequency content captured by the monogenic
signal analysis shifts to a different band in the frequency spectrum as the resolution of the object changes.
So, to extract similar local phase information from two similar objects but appearing at different image
resolutions, the frequency band at which the local phase operates should be varied.
So, any object descriptor computed from the local phase information needs to be normalized for scale (related
to the size of the object), orientation and image resolution. Illustrations of the constraints are shown
in Figure 4. To counter these constraints, we use a multi-stage approach where at each stage a suitable type of
descriptor is extracted to incorporate rotation, scale and viewpoint invariance.
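The resolution constraint described above amounts to retargeting the frequency band of the analysis. A radial log-Gabor transfer function of the kind used later for local phase computation can illustrate this; the code below is our sketch (the paper does not give its filter parameters), where shifting the centre frequency moves the band:

```python
import numpy as np

def log_gabor_radial(shape, center_freq, sigma_ratio=0.65):
    """Radial log-Gabor transfer function
    G(u) = exp(-(log(|u|/f0))^2 / (2*log(sigma)^2)).
    Shifting f0 retargets the frequency band of the local phase analysis."""
    rows, cols = shape
    u1, u2 = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(u1**2 + u2**2)
    radius[0, 0] = 1.0                # avoid log(0); DC response is set below
    G = np.exp(-(np.log(radius / center_freq))**2
               / (2 * np.log(sigma_ratio)**2))
    G[0, 0] = 0.0                     # log-Gabor has zero DC response
    return G

# Halving the object's apparent resolution doubles the appropriate centre frequency
G_hi = log_gabor_radial((128, 128), center_freq=0.05)
G_lo = log_gabor_radial((128, 128), center_freq=0.10)
```

The `center_freq` and `sigma_ratio` values here are illustrative assumptions; in a pipeline they would be tuned per pyramid level.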
4. DETECTION FRAMEWORK
The detection framework used to automatically locate construction vehicles in the pipeline right of way follows
a three stage approach. The detection framework is usually preceded by a training stage where the template for
each construction equipment is computed. The template is extracted from the local phase information of a high
resolution image and stored in a multi-scale fashion.
Local phase based template matching : This is a preliminary stage where a possible set of regions for
location of the object are noted by matching a template of the object on the training set to the test image.
Selection of Orientation and Cluster Voting: Here, the orientation of the object present in possible regions
is determined and a shortlist of such regions is made through Hierarchical clustering.
Final Detection by cluster selection and Feature matching by Histogram of Oriented Phase (HOP): From
the final set of clusters, we extract a feature descriptor known as the Histogram of Oriented Phase from
the local phase information for feature matching with the original template.
4.1 Training
The training stage involves selection of a suitable image for the creation of the template. We use a high resolution,
nadir-view (top-view) image of the construction vehicle as the training image. The selection of a high resolution image
enables us to create a multi-scale template pyramid with each level corresponding to a lower resolution. This
scaling enables the algorithm to search for objects at a different resolution than the object images present in
the training set. The local phase information of the training image is computed and the template to be used
is selected from a closely cropped region containing the local phase of the actual object. This template from
a high resolution image is down sampled to different lower sizes to create a local phase template pyramid. An
illustration of the template selection is shown in Figure 6. Some of the steps involved in computing the local
phase are given below.
Generation of Log-Gabor filters and monogenic filters for local phase computation.
Computation of local phase of training image using the Log-Gabor filters and monogenic filters to create
a frequency-scale space representation.
Selection of the template region in the local phase domain.
Creation of a template pyramid by down sampling of the template obtained from the local phase of the high-resolution training image.
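The pyramid-creation step above can be sketched as follows. This is our illustration; simple 2x2 block-average downsampling stands in for whatever resampling the authors used:

```python
import numpy as np

def template_pyramid(template, levels=4):
    """Build a multi-scale template pyramid by repeatedly halving the
    template with 2x2 block averaging (a stand-in for the paper's
    down sampling of the local phase template)."""
    pyramid = [template]
    for _ in range(levels - 1):
        t = pyramid[-1]
        r, c = (t.shape[0] // 2) * 2, (t.shape[1] // 2) * 2  # crop to even size
        t = t[:r, :c]
        t = (t[0::2, 0::2] + t[1::2, 0::2]
             + t[0::2, 1::2] + t[1::2, 1::2]) / 4.0
        pyramid.append(t)
    return pyramid

# Example: a 64x48 local-phase template reduced over four levels
pyr = template_pyramid(np.zeros((64, 48)), levels=4)
sizes = [p.shape for p in pyr]
```

Each level then serves as the match template for objects appearing at the corresponding lower resolution.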
Figure: Block diagram of the detection framework: orientation selection and voting scheme, scale detections, zoomed-in view, HOP matching, and final detection.
an important factor as different viewing angles of the camera can lead to different viewpoints. So, the testing on
these three datasets will provide a good evaluation of the construction equipment detection framework described
in the previous sections. In this section, we will evaluate the algorithm by testing it on the three datasets for
the detection of the Backhoe, and provide statistics on the accuracy, false detection rate and miss rate. The
accuracy will be in terms of percentages. The false detection rate is the number of detections that were
incorrectly identified as construction equipment in the final stage. The miss rate is the number of construction
equipment (Backhoe) instances which were not located by the automated algorithm. As mentioned earlier, the algorithm
has a training stage and 3 stages in the testing phase:
Stage 1: Local Phase based Template Matching.
Stage 2: Orientation Selection and Cluster Voting.
Stage 3: Cluster Selection and Matching by Histogram of Oriented Phase.
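The accuracy, false-detection and miss-rate bookkeeping described above reduces to simple counting; the sketch below is a hypothetical illustration (the function name and example counts are ours, not results from the paper):

```python
def evaluation_summary(true_positives, false_positives, misses):
    """Accuracy (percentage of equipment instances found), false-detection
    count and miss count, as defined in the text."""
    total = true_positives + misses          # all equipment instances present
    accuracy = 100.0 * true_positives / total if total else 0.0
    return {"accuracy_pct": accuracy,
            "false_detections": false_positives,
            "misses": misses}

# Hypothetical example: 9 of 10 Backhoes found, with 2 false alarms
summary = evaluation_summary(true_positives=9, false_positives=2, misses=1)
```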
Figure 10: Detection of the Backhoe at different stages on sample images. (Courtesy of Vendor 1)
Figure 11: Detection of the Backhoe at different stages on sample images. (Courtesy of Vendor 2)
(a) Training Image. (d) Stage 2: Orientation Selection and Cluster Voting.
Figure 12: Detection of the Backhoe at different stages on sample images. (Courtesy of Vendor 3)
False positives per test image (Vendor 1 dataset): 0, 0, 0, 1, 0, 1, 0; total: 2.
False positives per test image (Vendor 2 dataset): 0, 0, 0, 0, 0, 1, 1, 0; total: 2.
False positives per test image (Vendor 3 dataset): 0, 0, 2, 4, 0, 0, 2, 4, 0, 0, 0; total: 12.
6. CONCLUSIONS
We have proposed an algorithm which can autonomously detect construction equipment in various lighting
conditions and different equipment orientations using a multi-layer framework. This framework is based on
feature extraction from the local phase information generated by a monogenic analysis of the image. The local
phase information brings out the spatial structure of the object, makes it stand out from the surrounding homogeneous
background and is invariant to the illumination present in that region. By computing the histogram of phase
and the histogram of oriented phase (HOP) along with a template matching scheme, we have successfully detected
construction equipment such as the Backhoe in three different datasets provided by Vendors 1, 2 and 3.
Future work will include the detection of other construction equipment such as the Excavator, Mini-Excavator,
Trencher etc. on the pipeline right of way (ROW), which is more challenging as their sizes are considerably smaller
than the Backhoe.
ACKNOWLEDGMENTS
This project has been funded by the Pipeline Research Council International (PRCI), with the test imagery
captured in Gary, Indiana. (Project No: PR-433-133700)
REFERENCES
[1] Gonzalez, R. C. and Woods, R. E., [Digital Image Processing], Addison-Wesley Longman Publishing Co.,
Inc., Boston, MA, USA, 2nd ed. (1992).
[2] Felsberg, M. and Sommer, G., "The monogenic signal," IEEE Transactions on Signal Processing 49(12),
3136-3144 (2001).
[3] Yao, J. and Zhang, Z., "Semi-supervised learning based object detection in aerial imagery," in [Computer
Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on], 1, 1011-1016 (2005).
[4] Khan, S., Cheng, H., Matthies, D., and Sawhney, H., "3D model based vehicle classification in aerial imagery,"
in [Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on], 1681-1687 (2010).
[5] Guo, Z., Zhang, L., and Zhang, D., "Rotation invariant texture classification using LBP variance (LBPV) with
global matching," Pattern Recognition 43, 706-719 (Mar. 2010).
[6] Pietikainen, M., Hadid, A., Zhao, G., and Ahonen, T., [Computer Vision Using Local Binary Patterns], Springer
(2011).
[7] Mathew, A. and Asari, V., "Local region statistical distance measure for tracking in wide area motion
imagery," in [Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on], 248-253
(2012).
[8] Rubner, Y., Tomasi, C., and Guibas, L. J., "The earth mover's distance as a metric for image retrieval,"
International Journal of Computer Vision 40(2), 99-121 (2000).
[9] Matungka, R., Zheng, Y., and Ewing, R., "Object recognition using log-polar wavelet mapping," in [Tools
with Artificial Intelligence, 2008. ICTAI '08. 20th IEEE International Conference on], 2, 559-563 (2008).