
J Indian Soc Remote Sens (March 2011) 39(1):1–25
DOI 10.1007/s12524-011-0063-9

RESEARCH ARTICLE

An Integrated Multistage Framework for Automatic Road Extraction from High Resolution Satellite Imagery
T. T. Mirnalinee & Sukhendu Das & Koshy Varghese

Received: 6 October 2009 / Accepted: 6 April 2010 / Published online: 12 March 2011
© Indian Society of Remote Sensing 2011

Abstract Automated procedures to rapidly identify road networks from high-resolution satellite imagery are necessary for modern applications in GIS. In this paper, we propose an approach for automatic road extraction that integrates a set of appropriate modules in a unified framework to solve this complex problem. The two main properties of roads used are: (1) spectral contrast with respect to the background and (2) a locally linear path. A Support Vector Machine is used to discriminate between road and non-road segments. We propose a Dominant Singular Measure (DSM) for the task of detecting locally linear road boundaries. This pair of information about road segments, obtained using a Probabilistic SVM (PSVM) and DSM, is integrated using a modified Constraint Satisfaction Neural Network. The results of this integration alone are not satisfactory, due to occlusion of roads, variation of road material, and curvilinear patterns. Suitable post-processing modules (segment linking and region part segmentation) have been designed to address these issues. The proposed non-model-based approach is verified with extensive experimentation, and its performance is compared with two state-of-the-art techniques and a GIS-based tool, using multi-spectral satellite images. The proposed methodology is robust and shows superior performance (completeness and correctness are used as measures) in automating the process of road network extraction.

Keywords Dominant singular measure . PSVM . CSNN-CII . Road edges . Road segments . Fusion . Segment linking . Region part segmentation

T. T. Mirnalinee : S. Das (*)
Visualization and Perception Lab, Dept. of CSE, Indian Institute of Technology Madras, Chennai 600 036, India
e-mail: sdas@iitm.ac.in
T. T. Mirnalinee
e-mail: mirna@cse.iitm.ac.in
K. Varghese
Dept. of Civil Engg., Indian Institute of Technology Madras, Chennai 600 036, India
e-mail: koshy@iitm.ac.in

Introduction

Road networks are essential modes of transportation, and provide a backbone for human civilization. Cartographic object extraction from digital imagery is a fundamental operation for GIS update. However, the complete automation of the extraction process is still an unsolved problem. Road feature extraction from a raster image is a non-trivial and image-specific process. Hence, it is difficult to have a general method to extract roads from any given raster image. Road layers on raster maps typically have two geometric properties that distinguish them from other layers: (1) road lines are straight within a small distance (i.e., several meters in a street block); (2) unlike building layers, which can have many small distinct connected


components, roads are connected to each other to form a road network. Road layers usually have few connected objects, or even only one huge connected object forming the whole road layer. Many works on this topic have been presented (Laptev et al. 2000; Shi and Zhu 2002; Hinz and Baumgartner 2003; Hu and Tao 2007; Mokhtarzade and Zoej 2007; Mena 2003; Tupin et al. 2002). However, the manual intervention of an operator in extracting, defining and validating cartographic objects for GIS update is still needed. Applications of the road extraction process are found in updating GIS records, urban planning, traffic control, car navigation, map generation, etc.

Most of the works published in the literature on road detection from satellite images fall into two categories: (1) semi-automatic (Gruen and Li 1995; Udomhunsakul 2004; Bucha et al. 2006; Zhang et al. 2008; Hu et al. 2004; Xiao et al. 2005) processes that require help from a human operator. In contrast to the automatic methods, they demand a number of seed points, which are usually chosen by the operator in an interactive fashion. Given such seed points, the semi-automatic algorithm connects them by a path which is most likely a road. On the other hand, (2) automatic (Laptev et al. 2000; Shi and Zhu 2002; Hinz and Baumgartner 2003; Mokhtarzade and Zoej 2007; Baumgartner et al. 2002; Zhu et al. 2005) road extraction methods require no initial (prior) information about the presence and location of roads. In the following, we discuss the automatic road extraction process. Automated extraction of roads from high-resolution imagery is a difficult task because of the complexity in the spatial and spectral variability of the road network. Roads exhibit a variety of spectral responses due to differences in age and/or material, and vary widely in physical dimensions. In addition, road networks in dense urban areas typically have different geometric characteristics than those in suburban and rural areas.
Techniques to extract road networks using binarization and line segment matching of high-resolution IKONOS urban imagery were presented in (Shi and Zhu 2002; Zhu et al. 2005). A line segment matching method was used to detect long linear groups of pixels for classification as roads. These road pixels are then simplified into road centerlines with the use of morphological operators. Mayer et al. (1997) presented a complex road network extraction approach that attempts to accurately map both the road network and the road edges through the use of snakes (Kass et al. 1987). In another approach, Hinz and Baumgartner (2003) utilized multiple very high-resolution aerial images and detailed scene models to perform road extraction. One can find a survey of road extraction methods from satellite images in Mena (2003). Tupin et al. (2002) presented a road extraction algorithm using feature extraction (a line detector) and network reconstruction (graph labeling), which uses multiple views of the same scene. According to McKeown (1996), roads extracted from one raster image need not be extracted in the same way from another raster image, as there can be a drastic change in the values of important parameters based on nature, instrument variation, and photographic orientation. Yang and Wang (2007) proposed a road extraction algorithm which detects two types of road primitives, namely blob-like primitives and line-like primitives. These primitives are defined, measured, extracted and linked using different methods for dissimilar road scenes. Tuncer (2007) proposed a method which comprises preprocessing the image via a series of wavelet-based filter banks and reducing the data into a single image of the same size as the original satellite image. Then a fuzzy inference algorithm is utilized to perform road detection. Each wavelet function resolves features at a different resolution level, associated with the frequency response of the corresponding FIR filter. The resulting two images are fused together using the Karhunen–Loève transform (KLT), which is based on principal component analysis (PCA). This process underlines the prominent features of the original image as well as denoising it, since the prominent features appear in both of the wavelet-transformed images while noise does not correlate strongly between scales.
Next, a fuzzy logic inference algorithm based on statistical information and geometry is used to extract the road pixels. The approach is only suitable for IKONOS data of rural areas, where roads are mostly homogeneous and are not disturbed by shadows or occlusions. The central idea is to take the spectral information into account by means of a (fuzzy) classification approach. A back-propagation neural network (BPNN) with one hidden layer has been proposed for road


extraction in Mokhtarzade and Zoej (2007). The output layer consists of one neuron that expresses the network's response by a number between 0 and 1, for background and road pixels respectively. Networks with different hidden-layer sizes were trained with different numbers of iterations before converging. The training and recall stages were time consuming in this approach. Doucette et al. (2001) introduced a self-organizing road map algorithm to extract roads from high-resolution multi-spectral imagery. The self-organizing road map, a specialized version of the self-organizing neural network model, performs spatial clustering to identify and group together elongated regions.

Most of the methods discussed so far use a limited set of image samples of a particular area to obtain decent results. Some of them do not exhibit performance analysis or a comparative study with existing state-of-the-art techniques. The techniques adopted are often ad hoc and tuned for the particular (few) samples acquired to show results. Our study of road extraction is based solely on the road characteristics (geometrical and spectral) stored in an implicit manner in a raster image. It is often difficult to obtain satisfactory results by using only one of these methods to detect road structures in complex pictures. However, it is possible to improve the results by using the complementary nature of edge-based and region-based information. A large amount of work on the fusion of edge and region information has been reported in the literature (Haddon and Boyce 1990; Chu and Aggarwal 1993; Moigne and Tilton 1995; Pavlidis and Liow 1990) for image segmentation. Pavlidis and Liow (1990) described a method to combine segments obtained using an (over-segmenting) region-growing approach, where the edges between regions are eliminated or modified based on contrast, gradient and smoothness of the boundary.
Haddon and Boyce (1990) generate regions by partitioning the image co-occurrence matrix and then refining them by relaxation using the edge information. Chu and Aggarwal (1993) present an optimization method to integrate segmentation and edge maps obtained from several channels, including visible, infrared, etc., where user specified weights and arbitrary mixing of region and edge maps are allowed. Most of the methods proposed for combining region and edge information are highly sensitive to the correctness of edge map.

Lin et al. (1992) proposed a constraint satisfaction neural network for image segmentation. They posed the image segmentation problem as a constraint satisfaction problem (CSP) by interpreting the process as one of assigning labels to pixels subject to certain spatial constraints. Kurugollu and Sankur (1999) proposed a segmentation algorithm for color images which implements the MAP estimation of the label field using a CSNN. In their work, the initial class probabilities are obtained via a fuzzy C-means algorithm, in contrast to the method of Lin et al. (1992), where an ad hoc fuzzification of an initial map takes place. They combined the advantages of the GMRF formulation (Raghu and Yegnanarayana 1996) with those of the CSNN-based (Lin et al. 1992) relaxation; their results are shown on synthetic images. In a recent work by Lalit et al. (2008), a CSNN-CII (Constraint Satisfaction Neural Network with Complementary Information Integration) has been used for texture segmentation, with results shown on simulated and real-world images.

The focus of this paper is on the design and development of a technique which enables the user to extract road segments from an input image without much user interaction. The motivation for our work comes from the fact that the complementary information of regions (road pixels in our case) and edges (road boundaries) has not been exploited jointly to obtain a decent road map from satellite images. Either of these techniques, when applied alone, produces errors, and in general the two kinds of errors do not occur together (simultaneously). This is because the criteria for classifying pixels as road regions look for continuity and local smoothness, whereas methods to detect road boundaries look for discontinuities in raster images. Road regions are separated from non-road regions in our proposed framework using a PSVM (Probabilistic Support Vector Machine) classifier.
In our previous work on a DSM (Dominant Singular Measure) based road extractor (Mirnalinee et al. 2009), the performance was low, as only the local contrast between regions was considered. Therefore, we decided to merge the information from both DSM and PSVM using a CSNN-CII (Constraint Satisfaction Neural Network with Complementary Information Integration) (Lalit et al. 2008) to produce better results. A modified constraint satisfaction neural network (CSNN) has been designed for this task, which uses a novel dynamic window to merge


the complementary information of edges and regions. The output of CSNN-CII needs to be processed further to remove some undesired artifacts and errors. A segment linking algorithm is used to bridge the discontinuities detected between road segments. A region part segmentation algorithm separates the roads from protruding or attached non-road regions, thereby improving the accuracy. Results are shown using four categories of high-resolution satellite images, from the following areas: (1) developed suburban, (2) developed urban, (3) emerging suburban and (4) emerging urban. Performance analysis is presented using the completeness and correctness measures (Heipke et al. 1997).

This paper is organized as follows: Section "Research Issues and Design Strategy" deals with the research issues and design strategies. Section "Proposed Method" presents the overall proposed methodology. The various stages of the proposed framework are described in Section "Description of the Different Stages in Our Proposed Framework". We present experimental results in Section "Experimental Results and Comparative Study" and conclude the paper in Section "Conclusions".

image, its physical appearance tends to exist as long continuous features. Humans fuse these vital clues to identify a foreground road object from the background layer. This motivated us to develop a generic framework that integrates suitable processing modules for extracting the different types of features present in road objects in satellite scenes. We present the characteristics of roads next, followed by suitable modules designed specifically to address these issues. We also validate the efficiency of the extraction system using experimental results. The most significant characteristics of roads appearing in high-resolution satellite imagery are:

1. Roads have a distinctively contrasting spectral signature (both locally and globally) with respect to the background layer (e.g. vegetation, soil, waterways, manmade structures, etc.).
2. Roads are mostly elongated structures, with locally linear properties.
3. The road surface is usually homogeneous, with occasional variations.
4. Discontinuities appear in a road structure mainly due to occluding objects, such as trees, buildings and large vehicles, or even shadows.
5. Roads do not appear as a small segment or patch, either in isolation or attached to a large linear segment.
6. Roads rarely terminate (no abrupt ending) within short distances. In fact, they intersect, occlude one another (bridges and highways) and bifurcate to build a network (global appearance).
7. Roads have near-parallel boundaries, with both linear and curvilinear patterns.
8. Road structures are rarely non-smooth and generally occur without many sharp bends.

Among the different properties stated above, the two major characteristics of roads are their geometrical shape and spectral contrast (as stated in (1) and (2) above). Roads in high spatial-resolution images of urban areas appear as piecewise linear segments with spectrally homogeneous characteristics.
These are vital clues, which form the basis of the design of our framework for automatically detecting roads in satellite imagery. In the design of a framework for road detection, we first need to exploit these two vital characteristics of roads. In such a case, one may be tempted to use a foreground extracting algorithm trained with spectral

Research Issues and Design Strategy

The difficulty in designing an automated road network extraction system using remotely-sensed imagery lies in the fact that the image characteristics of road features vary according to sensor type, spectral and spatial resolution, ground characteristics, etc. Even for an image taken over a particular urban area, different parts of the road network reveal different characteristics. In the real world, a road network is too complex to be modeled using a mathematical formulation or an abstract model. Other objects (e.g., buildings and trees) cast shadows that occlude road features, complicating the extraction process. Human perception of roads involves (Jin and Davis 2005) extracting geometric, radiometric and topological characteristics of an image. Humans usually recognize a road first by its geometric characteristics, considering a road to be a long, elongated feature with uniform width and similar radiometric variance along its path. Even though the spectral characteristics of a road vary within an


patterns for roads and then use linear features on top of it. However, a classifier based only on spectral features will produce false alarms (identifying non-road objects as roads, and filtering parts of roads as background, due to the reasons mentioned in points (3) and (4) above). On the other hand, a pattern classifier trained with geometrical features (for classifying roads) is useless unless the target (the road, in this case) is available. It is also not possible to simultaneously extract and fuse this pair of distinct features, since the linear features cannot be estimated unless the road-like structures are first filtered from the background. It is impossible to design an operator or mask for this purpose, as it would need to simultaneously extract spectral and RST-invariant shape (geometrical) features from the image data. It is also not possible to formulate a mathematical (parametric) model for a road network which works for all the complex variations in the geometric design patterns (linear and curvilinear) formed by roads in urban scenarios. Due to these complex phenomena, it is almost impossible to consider and model all these situations in a single module or processing stage for road network extraction. This drove us to formulate and design a hierarchical pipelined framework, consisting of supervised classification, information integration, filtering and local neighborhood analysis, to obtain decent results of acceptable quality. Results will be compared with two state-of-the-art methods (Tuncer 2007; Mokhtarzade and Zoej 2007) published in the literature and one GIS-based software package (Geospace 2008) used for raster image analysis. Because of the issues mentioned earlier, in most cases with hyper-spectral datasets the spectral information alone is not sufficient to define roads. We
need an integrated multistage framework to achieve our goal. Each stage of the framework deals with a particular characteristic of roads, listed in the left column of Table 1. The center column gives the corresponding strategy (processing module) used to address it, while the right-hand column notes the difficulties/drawbacks that one may face in executing that stage. In the next section we describe our proposed multistage method based on the issues discussed in this section, followed by design details of the road extraction modules listed in Table 1.

Table 1 Road characteristics and the corresponding processing modules

Sl. No | Characteristics | Strategy/Module | Remarks
1 | Contrast w.r.t. background | SVM classifier using mean and variance of spectral response | Misclassification of non-road objects with identical spectral response
2 | Mostly homogeneous; elongated structure | DSM on edge map; shape features | Discontinuity due to occlusion
3 | Discontinuities and distortions in linear pattern | CSNN-CII and segment linking | Chance of linking roads with other structures
4 | Not appearing in isolation; rarely terminates | Region part segmentation | Removal of small road fragments

Proposed Method

A multistage pipelined framework for road extraction has been proposed in this paper. Figure 1 shows the flowchart of our proposed method of road extraction, which is a hierarchical pipelined multistage framework based on the details specified in Table 1. The first stage consists of an iterative merging of region- and edge-based information using a set of constraints. Road edges (boundaries) are extracted from edge features using DSM. We assume roads appearing in satellite images to be locally linear. Soft class labels (probabilities) for each pixel belonging to either road or non-road regions are produced by the PSVM. Then a modified CSNN, termed CSNN-CII (Lalit et al. 2008), is used for integrating the complementary information from the edge and region outputs. A fruitful cooperation can thus be established between region-based and edge-based methods to extract elongated thick objects like roads in high-resolution satellite imagery. An elongatedness measure (shape feature) is used to remove isolated non-road structures. Then a


segment linking algorithm is used to link the discontinuous road segments which result from occlusion. The region part segmentation module removes the non-road structures which appear due to adjacent manmade structures. The steps of the algorithm, depicting the process illustrated in Fig. 1, are given in Algorithm 1. In the next section, we present the description of the different stages of our framework, along with intermediate results of processing using two satellite image samples.

Algorithm 1 Proposed framework for road detection.
Input: Image.
Output: Segmented image.
Steps:
1. Compute edge maps of the image using DSM.
2. Compute the probability of the class label for each pixel using PSVM.
3. Integrate region information and edge information (outputs of steps (2) and (1)) using CSNN-CII (Lalit et al. 2008): initialize the neurons in CSNN-CII using the probabilities obtained from PSVM; iterate and update the probabilities and edge map to get the final segmented map.
4. Post-process the CSNN-CII output to remove stray patches and unnecessary artifacts.
5. Perform segment linking to reduce false negatives.
6. Perform the region part segmentation algorithm to reduce false positives.
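The control flow of Algorithm 1 can be sketched as a simple pipeline. The stage names below (dsm_edges, psvm_probs, etc.) are hypothetical placeholders for the modules described in the following sections, not the authors' code; passing the stages in as callables simply makes the six-step wiring explicit.

```python
import numpy as np

def road_extraction_pipeline(image, dsm_edges, psvm_probs, csnn_fuse,
                             postprocess, link_segments, split_region_parts):
    """Hypothetical wiring of Algorithm 1; each argument is one stage,
    supplied as a callable, so the driver stays agnostic to the
    concrete implementations of the individual modules."""
    edges = dsm_edges(image)             # step 1: DSM edge map
    probs = psvm_probs(image)            # step 2: PSVM soft class labels
    labels = csnn_fuse(probs, edges)     # step 3: CSNN-CII fusion
    labels = postprocess(labels)         # step 4: remove stray patches
    labels = link_segments(labels)       # step 5: reduce false negatives
    labels = split_region_parts(labels)  # step 6: reduce false positives
    return labels
```

Because the stages are injected, each module can be developed and tested in isolation and swapped without touching the driver.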

Description of the Different Stages in Our Proposed Framework

DSM Based Edge Detection

Roads are expected to be locally linear. Hence, we extract the local orientation from the image of the road network. Extracting linear features from satellite images has been of interest to the pattern recognition community for some time (Cooper and Cowan 2007; Granlund and Knutsson 1995; Lyvers and Mitchell 1988; Wei and Xin 2008; Majidi and Bab-Hadiashar

Fig. 1 Framework of the proposed method for road detection

2009). In the work by Cooper and Cowan (2007), amplitude-balanced horizontal derivatives were used for enhancing linear features in images. However, if the dataset possesses features with large variations in amplitude, then the horizontal derivative will have the same property, and the smaller-amplitude features (which may be of considerable importance) may be hard to discern. Granlund and Knutsson (1995) devised an elegant method for combining the outputs of quadrature pairs to extract a measure of orientation. Perona (1998) extended the idea of anisotropic diffusion to orientation maps. Bigun et al. (1991) posed the problem as the least-squares fitting of a plane in the Fourier transform domain. Another technique (Haglund and Fleet 1994) based on steerable filters (Jacob and Unser 2004) is limited in precision and generalization. Lyvers and Mitchell (1988) examined the accuracy of various local differential operators in noiseless situations, as well as in the presence of additive Gaussian noise. Jiang (2007) proposed an image integration operator which leads to unbiased orientation estimation.


Our method of obtaining the dominant direction using PCA and a gradient matrix (obtained using 1-D Canny (Kumar et al. 2000)) for orientation estimation to extract road segments is novel, more efficient and produces more robust results. Most established local orientation estimation techniques are based on the analysis of the local gradient field of the image. However, local gradients are very sensitive to noise, making the estimate of local orientation from these images unreliable. We use the method of Principal Component Analysis (PCA) for image orientation estimation. For each pixel in the image, we first calculate the local image gradients (using 1-D Canny (Kumar et al. 2000)) and then perform SVD of the gradient matrix. The gradient of the image f(x, y) at a point (x_k, y_k) is denoted by:

\nabla f_k = \nabla f(x_k, y_k) = \left[ \frac{\partial f(x_k, y_k)}{\partial x}, \; \frac{\partial f(x_k, y_k)}{\partial y} \right]^T \qquad (1)

which involves 1-D processing along orthogonal directions (for details see (Kumar et al. 2000)). For example, the smoothing operator used along one direction (say, x) is the Gaussian filter:

G(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{x^2}{2\sigma_1^2} \right) \qquad (2)

and the 1-D Canny operator for computing the derivative along y is:

dG(y) = -\frac{y}{\sqrt{2\pi}\,\sigma_2^3} \exp\left( -\frac{y^2}{2\sigma_2^2} \right) \qquad (3)

Similar processing is applied along the y and x directions, where the two operators interchange their directions of processing. This method is efficient and produces better gradient vectors, which are orthogonal to the dominant orientation of the image pattern. Let us assume that in the image of interest f(x, y) the orientation field is piecewise constant. Under this assumption, the gradient vectors in an image block should on average be orthogonal to the dominant orientation of the image pattern. Orientation estimation can thus be formulated as the task of finding a unit vector \hat{a} which maximizes the average of the angles between \hat{a} and the gradient vectors (Feng and Milanfar 2002). The computational basis of PCA is the calculation of the Singular Value Decomposition (SVD) of the data covariance matrix. The majority of the eigenvectors form a cluster along a dominant direction, indicating the presence of a linear structure. The eigenvalue reflects the strength (peakiness) of the distribution of the gradients towards a particular direction. Generally, the first eigenvalue is larger than the second one; for an ideal straight line the second eigenvalue is zero (indicating no spread along the orthogonal direction). However, a digital line is represented stepwise (aliased), and hence the second eigenvalue for a line in a digital image is non-zero. To obtain the local orientation estimate, we rearrange the gradient vectors into a 2 × N² matrix, where a window of size N × N is used for processing around each pixel, as shown below:

G = \left[ \nabla f_1 \;\; \nabla f_2 \;\; \nabla f_3 \; \cdots \; \nabla f_{N^2} \right] \qquad (4)

where \nabla f_i = \nabla f(x_i, y_i), i = 1, 2, \ldots, N^2 (see Eq. 1). We then compute the SVD (Singular Value Decomposition) (Strang 2005) of the gradient matrix for each pixel, computed using a window of size N × N. The SVD of the gradient matrix is computed as

G = U S V^T \qquad (5)

where U is an orthogonal 2 × 2 matrix whose first column represents the dominant orientation of the gradient field, S is a 2 × N² matrix representing the energy along the dominant directions, and V is an orthogonal matrix of size N² × N² representing each vector's contribution to the singular values.

Dominant Singular Measure

The Dominant Singular Measure (DSM) is computed as the ratio between the singular value of the major axis and the sum of the singular values. This measure approaches 1 for an elongated shape. DSM is defined as:

\mathrm{DSM} = \frac{s_1}{s_1 + s_2}, \quad s_1 \ge s_2 \qquad (6)

When all the gradient components have the same direction, only one singular value (s_1) is non-zero, which makes the DSM value equal to 1. If both singular values are equal and non-zero, the DSM value is 0.5. The range of DSM values thus lies in [0.5, 1]. We use the DSM measure to distinguish between scattered or disoriented image patterns and an
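As a concrete illustration of the computation in Eqs. (4)–(6), the per-pixel DSM can be sketched in NumPy as follows. The Gaussian-derivative kernels here stand in for the 1-D Canny operators, and the window size N and sigma are illustrative defaults, not the tuned parameter values used by the authors.

```python
import numpy as np

def dsm_map(image, N=5, sigma=1.0):
    """Dominant Singular Measure (Eq. 6) for each pixel of a 2-D image.

    For every pixel, the gradients in an N x N window are stacked into a
    2 x N^2 matrix G (Eq. 4); its singular values s1 >= s2 (Eq. 5) give
    DSM = s1 / (s1 + s2), which lies in [0.5, 1]."""
    # Gaussian smoothing kernel and its derivative (stand-ins for Eqs. 2-3)
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    dg = -t * g / sigma**2

    # Separable filtering: smooth along one axis, differentiate the other
    def sepconv(img, kx, ky):
        out = np.apply_along_axis(lambda v: np.convolve(v, kx, mode='same'), 1, img)
        return np.apply_along_axis(lambda v: np.convolve(v, ky, mode='same'), 0, out)

    fx = sepconv(image.astype(float), dg, g)   # derivative along x
    fy = sepconv(image.astype(float), g, dg)   # derivative along y

    h, w = image.shape
    half = N // 2
    out = np.full((h, w), 0.5)                 # flat patches default to 0.5
    for i in range(half, h - half):
        for j in range(half, w - half):
            Gx = fx[i-half:i+half+1, j-half:j+half+1].ravel()
            Gy = fy[i-half:i+half+1, j-half:j+half+1].ravel()
            G = np.vstack([Gx, Gy])            # 2 x N^2 matrix (Eq. 4)
            s = np.linalg.svd(G, compute_uv=False)  # s[0] >= s[1]
            if s[0] > 1e-12:
                out[i, j] = s[0] / (s[0] + s[1])    # Eq. 6
    return out
```

On a synthetic vertical step edge, all window gradients share one direction, so DSM reaches 1 along the edge; in flat or isotropically noisy patches it stays near 0.5, so thresholding the map separates oriented (road-like) structure from background.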


Fig. 2 The results of DSM on a synthetic image. a Input image, b corresponding DSM, with the color bar indicating the values of DSM in the normalized range [0–1]

image region with an orientation pattern. If the DSM is less than a threshold (with the threshold lying strictly between 0.5 and 1), it is very likely that the corresponding image block is noisy and contains no dominant orientation, and it is hence considered as background (non-road). The default values for the parameters used in this process are given in Table 3. The result of this orientation detection process is robust, and performs well on both synthetic and natural images. The results of DSM (an unsupervised method) include the orientation information of streets/roads, rivers and other linear structures. However, if additional information and knowledge are available, non-road structures may be masked (Gruen and Li 1995). Figure 2 shows the result of this algorithm applied to a synthetic image. We observed that the most dominant structure has a DSM value of 1. Image patches which do not have any oriented pattern produce a DSM value just greater than 0.5, and these patches are considered as non-roads. Hence DSM is an efficient method for automatically generating low-level representations of linear structures directly from satellite images. The color bar in Fig. 2 shows the DSM value in a normalized range [0–1] after thresholding (>0.5). The measured orientations and strengths accurately reflect the linearity in the oriented structures of the input image. This method fails in the case of junctions, certain textures, and transparent or overlapping objects, which do not contain a dominant local orientation. Edge candidates are detected for individual channels as described in (Kumar et al. 2000). Edges can occur over a wide range of strengths as well as scales in real images. Selecting an optimal scale for all edges in an image is difficult. Combining the estimates from different scales, we get a final edge map using the method described in (Qian and Huang 1996). Figure 3 shows the results of the DSM for a satellite image of an urban scene: Fig. 3a shows the input image, Fig. 3b shows the edge map obtained using the multi-scale 1-D Canny edge detector (Kumar et al. 2000; Qian and Huang 1996), and Fig. 3c shows the corresponding DSM results. Figure 4 shows the same results for an image of a suburban scene.

Segmentation Using Probabilistic SVM

The goal of region-based techniques is to segment an image into clusters using classification or region-growing algorithms. A support vector machine (SVM) is a relatively new classification technique that has grown from the field of statistical learning theory. SVM classifiers have yielded excellent results in other application domains. In Mantero et al. (2005), a classification strategy is described that

Fig. 3 The results of DSM on a satellite image of an urban scene. a Input image, b edge map extracted using multi-scale Canny (Kumar et al. 2000; Qian and Huang 1996), c corresponding DSM output


Fig. 4 The results of DSM on a satellite image of a suburban scene. a Input image, b edge map extracted using multi-scale Canny (Kumar et al. 2000; Qian and Huang 1996), c corresponding DSM output

allows the identification of samples drawn from unknown classes through the application of a suitable Bayesian decision rule (Duda et al. 2000). This approach is based on support vector machines (SVMs) for the estimation of probability density functions, using a recursive procedure to generate prior probability estimates for known and unknown classes. SVMs are exploited as a classifier for road extraction by Yager and Sowmya (2003), in two stages of processing. Here, the SVM is trained using edge-based features such as edge length, gradient and intensity within the edge pair. In level 1, the SVM classifies edges as road edges or non-road edges. Edges classified as road edges are given as input to the SVM in level 2, where opposite edges are paired into road segments. However, they reported a very low correctness measure. A method (Miliaresis and Kokkas 2007) has also been presented for the extraction of buildings from light detection and ranging (LIDAR) digital elevation models (DEMs) on the basis of segmentation principles. The accuracy of supervised classification largely depends on the quality of the training data. The locations and sample size of training data are difficult to optimize, depending on the image data types and the classifiers to be used. Support vector machines represent a promising development in machine learning research that is not yet widely used within the remote sensing community (Pal and Mather 2005). The architecture of an SVM (Theodoridis and Koutroumbas 2006) is given in Fig. 5; the number of nodes is determined by the number of support vectors Ns. The main idea of SVM is to separate the classes with a hyperplane surface so as to maximize the

margin between them. In this paper, support vector machines are used to classify roads in satellite imagery. In an SVM, the input vectors are mapped nonlinearly to a very high-dimensional feature space (Cortes and Vapnik 1995). Considering a two-class pattern classification problem, let the training set of size N be {(X_i, d_i)}, i = 1, ..., N, where X_i ∈ R^n is the input pattern for the i-th example and d_i ∈ {−1, +1} is the corresponding desired response. The classifier is represented by the function f(x; α) → y, with α as the parameters of the classifier. The SVM method involves finding the optimum separating hyperplane such that: 1. Samples with labels y = +1 and y = −1 are located on opposite sides of the hyperplane. 2. The distances of the closest vectors to the hyperplane on each side are maximum. These closest vectors are called support vectors, and the distance is the optimal margin. The membership decision rule is based on the function f(x), where f(x) represents the discriminant

Fig. 5 Architecture of SVM


function associated with the hyperplane in the transformed space, defined as:

f(x) = w*·φ(x) + w0    (7)

where w* is the weight vector, w0 is the bias, and φ(x) ∈ R^d′ with d′ > d maps the input into a higher-dimensional space. The SVM classifies every pixel into either the road or the non-road group based on the sign of the discriminant function, y = sgn(f(x)). Pixels belonging to roads are assigned to group 1 and the others to group 2, using training sample images. Since the SVM has good generalization ability, the decision function obtained through training can be applied to extract road structures from satellite images. The feature vectors are fed into the SVM classifier first for training (to learn the pattern from known examples), and then, once training is complete, for predicting the labels of unknown samples. A classifier that produces a posterior probability is very useful in practical recognition problems; posterior probabilities are also required when a classifier forms a small part of an overall decision and its output must be combined with others for the final decision. As described above, the SVM is principally a binary classifier; a polynomial kernel of degree two was used here, owing to its superiority over other kernels in most applications. However, the SVM (Cortes and Vapnik 1995) produces an uncalibrated value that is not a probability. In the next section, we describe a mechanism to obtain a probabilistic classification of pixels as roads or non-roads, using soft class labels from the SVM.

Soft Class Labels Using PSVM

A standard SVM does not provide any estimate of its classification confidence, and thus does not allow us to incorporate a priori information. Hence we use the PSVM to produce the posterior probability P(Class | Input). The posterior probability outputs of SVMs are based on the distance between the testing vectors and the support vectors. Following the method presented in Platt (1999), a sigmoid model is used to map binary SVM scores into probabilities:

P(y = 1 | f) = 1 / (1 + exp(A f + B))    (8)

where y is the binary class label and f is the output of the SVM decision function (Eq. 7). The two parameters A and B are fitted by maximum likelihood on the training set (f_i, y_i), i.e., by minimizing the negative log-likelihood of the training data. An image block is declared road if the probability output by the PSVM is larger than a predetermined threshold. As a result, the model has a probabilistic output available for further processing. The probabilistic output of a classifier makes it possible to use existing results from fusion theory, especially when a classifier forms a small part of an overall decision and the classification outputs must be combined for the final decision. Training samples are gathered from the regions surrounding road pixels. The sample sub-images shown in Fig. 6 illustrate the discriminative features between road and non-road samples: the spectral characteristics differ between the two classes, which is analyzed by the PSVM. As seen from Fig. 6a, the locally homogeneous orientation of the road class is captured by DSM, whereas the non-road structures shown in Fig. 6b produce distributed orientations. In order to demonstrate the performance of the proposed method, we used the generated dataset described in Section Dataset Description and Performance Measures. Our system is trained with 5,000 samples of the road class and 7,200 samples of the non-road class. Once the classifier is trained, it is asked to predict the labels for the test

Fig. 6 a Road samples and b non-road samples, of size 21×21
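The SVM score of Eq. 7 and Platt's sigmoid of Eq. 8 can be sketched as follows. This is an illustrative sketch, not the authors' code: the support vectors, coefficients, bias and training scores are toy values, and the plain gradient-descent fit stands in for Platt's model-trust minimization of the negative log-likelihood.

```python
import numpy as np

def poly_kernel(a, b, degree=2):
    """K(a, b) = (a . b + 1)^degree -- the degree-2 polynomial kernel used in the paper."""
    return (np.dot(a, b) + 1.0) ** degree

def svm_decision(x, svs, alphas, labels, w0):
    """f(x) = sum_i alpha_i d_i K(X_i, x) + w0; one node per support vector (Fig. 5)."""
    return w0 + sum(a * d * poly_kernel(sv, x)
                    for sv, a, d in zip(svs, alphas, labels))

def platt_probability(f, A, B):
    """Eq. 8: P(y = 1 | f) = 1 / (1 + exp(A f + B))."""
    t = np.clip(A * f + B, -30.0, 30.0)   # guard against exp overflow
    return 1.0 / (1.0 + np.exp(t))

def fit_platt(scores, y, iters=5000, lr=0.01):
    """Fit (A, B) by gradient descent on the negative log-likelihood."""
    A, B = -1.0, 0.0
    for _ in range(iters):
        p = platt_probability(scores, A, B)
        # for this parametrisation, d(NLL)/d(A f + B) = y - p
        A -= lr * np.sum((y - p) * scores)
        B -= lr * np.sum(y - p)
    return A, B

# Toy SVM scores for six samples (label 1 = road, 0 = non-road).
scores = np.array([-2.0, -1.0, 0.3, -0.3, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
A, B = fit_platt(scores, y)
p_road = platt_probability(2.0, A, B)   # a confident road score maps to a high probability
```

After the fit, A is negative, so large positive scores f map to probabilities close to 1 and large negative scores to probabilities close to 0, giving the calibrated output the CSNN fusion stage expects.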


image pixels. Figure 7 shows the results of the P-SVM for the images given in Figs. 3a and 4a. Experimental results for different scenarios, namely urban and suburban areas of developed and emerging countries, and their discussion are presented in Section Results and Discussion. In the next section, we discuss the method of fusing the two complementary sources of information (the segment class from PSVM and the linear edge map obtained using DSM), using a CSNN (Constraint Satisfaction Neural Network) based integrator.

CSNN for Integration

Edge extraction from satellite images often delivers partly fragmented and erroneous results. Attributes describing the geometrical and radiometric properties of the line segments can be helpful in sorting out the most probable false alarms; however, these attributes may be ambiguous and are not considered reliable enough when used alone. Region-based segmentation produces over-segmentation, whereas edge-based segmentation may lead to under-segmentation. We used a fusion strategy proposed by Lalit et al. (2008), which uses constraints to iteratively correct both these erroneous outputs and produce a better result. The method is described briefly in the following, for completeness. Each neuron in CSNN-CII contains two fields: probability and rank. The rank field stores the rank of the probability, in decreasing order, for that neuron. We exploit the soft class labels produced by PSVM to compute the ranks, which in turn are used to initialize the interconnection weights of the CSNN. In addition to region-based constraints, CSNN-CII also incorporates edge constraints. The number of neighbors considered

for computation is determined using edge information. The initial class probabilities are obtained using PSVM (Platt 1999), and the initial edge maps are obtained using the DSM-based technique for road edge extraction.

Dynamic Window

The interconnection weights of the CSNN are computed only for those neurons which lie within the effective size of the dynamic window. This effective width is based on the presence of edge information around the seed pixel, and the stopping criterion is based on the presence of the edge pixels. Hence this process helps to mutually exploit the complementary information of regions and edges inside the window. The window is considered dynamic (or adaptive), as its effective size depends on both sources of information: one (region) for the initial estimation and the other (edge) for convergence. The obvious advantage of using a dynamic window at region boundaries is that only the neurons which correspond to a single class are processed, and the neurons which may confuse the network are not used for computation. The optimal size of the dynamic window (m × n) was obtained empirically as 31×21. Lalit et al. (2008) used a square window, whereas we use an oriented rectangular window; the orientation of the rectangle is obtained from the DSM output. It was observed from experimentation that when a larger window size was used, small regions (or small sections of a region) were merged with larger adjacent regions, while a smaller window size makes the CSNN take a longer time to converge to the final solution. Figure 8 shows



Fig. 7 a The results of P-SVM for the image shown in Fig. 3a; b the results of P-SVM for the image shown in Fig. 4a


Fig. 8 The results of CSNN-CII obtained by: a combining those in Fig. 3c & Fig. 7a; b combining those in Fig. 4c & Fig. 7b, for the images in Figs. 3a & 4a respectively
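The growth of the dynamic window's effective extent can be sketched as follows. For simplicity the sketch uses an axis-aligned window, whereas the paper's window is oriented and of nominal size 31×21; the function name, half-sizes and toy edge map are our illustrative assumptions.

```python
import numpy as np

def effective_extent(edge_map, seed, half_h=15, half_w=10):
    """Grow the window from the seed in each direction until an edge pixel
    (or the image border) is met, capped at the nominal half-sizes."""
    r0, c0 = seed
    H, W = edge_map.shape
    def grow(dr, dc, cap):
        for step in range(1, cap + 1):
            r, c = r0 + dr * step, c0 + dc * step
            if not (0 <= r < H and 0 <= c < W) or edge_map[r, c]:
                return step - 1          # stop just before the edge/border
        return cap
    return (grow(-1, 0, half_h), grow(1, 0, half_h),
            grow(0, -1, half_w), grow(0, 1, half_w))

# Toy edge map: a horizontal road edge limits the downward growth.
edges = np.zeros((40, 40), dtype=bool)
edges[20, :] = True
up, down, left, right = effective_extent(edges, seed=(17, 20))
```

Only the neurons inside the returned extent take part in the weight computation, so pixels beyond a road boundary never influence the seed's update.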


the results of CSNN-CII using inputs from the intermediate results of processing shown in Figs. 3, 4 and 7, for the images in Figs. 3a and 4a.

Post-Processing and Segment Linking

The objective of the refinement process presented here is to eliminate false segments which do not belong to roads. The result of the CSNN integration contains a few undesired patches which do not correspond to road segments; in satellite images, a few undesired or noisy structures will be erroneously classified as road segments. To eliminate these false alarms (segments), we use connected component labeling (Haralick and Shapiro 1992) to extract the disjoint segments from the output of our algorithm. Segments with area less than a prefixed

threshold TA are deleted. The major-axis and minor-axis lengths of each component are computed using the second central moments of each segment, as shown below:

m20 = M20 − x̄ M10,   m02 = M02 − ȳ M01,
x̄ = M10 / M00,   ȳ = M01 / M00,   Mpq = Σ_x Σ_y x^p y^q I(x, y)

We computed the ratio of the major-axis length to the minor-axis length of each component as:

E = m20 / m02

Components having a value of E less than a threshold TE are usually non-road structures and are hence deleted. The steps of the post-processing stage are given below in Algorithm 2.

Algorithm 2 Steps of post-processing for refining the result.
1. Compute the connected components.
2. Compute the area (A) of each connected component.
3. Compute the eccentricity (E) of each connected component.
4. For each component:
     if (E < TE) then delete that component
     else if (A < TA) then delete that component
     end if
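A minimal sketch of this post-processing step, assuming a binary road map as input (function names are ours; note that the ratio E = m20 / m02 as written is orientation-dependent, and taking the larger-to-smaller ratio of the two moments would make it orientation-independent):

```python
import numpy as np

def label_components(mask):
    """4-connected component labelling by iterative flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue
        n += 1
        stack = [(r0, c0)]
        labels[r0, c0] = n
        while stack:
            r, c = stack.pop()
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                        and mask[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = n
                    stack.append((rr, cc))
    return labels, n

def refine(mask, T_E=0.7, T_A=50):
    """Keep only components with elongation E >= T_E and area >= T_A."""
    labels, n = label_components(mask)
    out = np.zeros(mask.shape, dtype=bool)
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        area = xs.size
        m20 = np.mean((xs - xs.mean()) ** 2)   # second central moments
        m02 = np.mean((ys - ys.mean()) ** 2)
        E = m20 / m02 if m02 > 0 else np.inf
        if E >= T_E and area >= T_A:
            out[labels == k] = True
    return out

# A 3x60 road-like stripe survives; a 5x5 blob is deleted (area < T_A).
mask = np.zeros((40, 80), dtype=bool)
mask[10:13, 0:60] = True
mask[28:33, 28:33] = True
refined = refine(mask)
```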
We used a region linking algorithm (Rizvandi et al. 2008) to eliminate the discontinuities detected between road segments. Initially, a dilation operation is performed on the input image. Since dilation is an operation that thickens or grows objects, edge segments which are very close to each other are automatically linked. In our algorithm, the structuring element used for the dilation operation is a disk of radius 10. The image is then thinned, and the edges are broken down into smaller straight-line edge segments. Heuristics based upon proximity and the alignment of road features are used to cluster and integrate the fragmented segments: for each segment, the best neighbor is determined based on the difference in direction and the minimum distance between the end points. Results of post-processing and segment linking are shown in Figs. 9 and 10.
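The gap-closing effect of the dilation step can be sketched as follows. A radius-3 disk keeps the toy example small, whereas the paper uses radius 10; the thinning step and the direction/distance heuristics of the full linking algorithm are omitted here.

```python
import numpy as np

def disk(radius):
    """Disk-shaped structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def dilate(mask, se):
    """Binary dilation of mask by the (symmetric) structuring element se."""
    r = se.shape[0] // 2
    padded = np.pad(mask, r)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if se[dy + r, dx + r]:
                out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

# Two collinear segments with a 4-pixel gap become one linked segment,
# since the gap is smaller than twice the disk radius.
mask = np.zeros((11, 28), dtype=bool)
mask[5, 0:10] = True       # first segment
mask[5, 14:24] = True      # second segment; gap at columns 10-13
linked = dilate(mask, disk(3))
```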

Fig. 9 The results of: a post-processing using the output shown in Fig. 8a; b segment linking using the output shown in (a)


Fig. 10 The results of: a post-processing using the output shown in Fig. 8b; b segment linking using the output shown in (a)

Region Part-Segmentation

Region part-segmentation is necessary to eliminate some large patches of non-road structures which appear to be fused to roads. These patches are man-made structures, such as roof-tops and parking lots, with spectral characteristics similar to those of roads. The proposed algorithm for region part-segmentation is based on part-segmentation (Bennamoun and Mamic 2002) and consists of the following steps:

1. Compute the smoothed inner and outer contours (closed) of the image.
2. Compute the smoothed curvature of the contours.
3. Determine the local extrema, where the derivative of the smoothed curvature equals zero, with curvature value greater than a threshold.
4. Compute the convex/concave dominant points, at which the interior angle is greater/less than 180°, by tracing the outer/inner contour of the region as shown in Fig. 11.
5. Compute the effective convex (CDPcx) and concave (CDPce) dominant points, on the outer and inner contours respectively, by a logical AND operation on the outputs of steps 3 and 4.
6. Move the CDPs (both CDPcx and CDPce) along the normal for a fixed number of iterations (all the CDPs must move simultaneously) on the respective contours.
7. A moving CDP stops (freezes) only if it touches another moving CDP, or a point on the same contour within a specified path distance from it. For the outer contour, if the contour of the segment touches the boundary of the image, the respective CDPs are not frozen.
8. Trace back all the frozen CDPs and join the pairs of corresponding CDPs, or the CDP and the contour point, using line segments.
9. For each line segment obtained in step 8, form two adjacent regions within a closed contour, using the line as the new boundary.
10. Merge the new pair of adjacent regions if they have similar structural properties (orientation of the line segments near the CDPs).
11. Set a threshold and eliminate all the connected components with area below the threshold.

Curvature Computation

A curve is represented in parametric form, where t is the path length and x, y are the coordinates of the contour:

r(t) = (x(t), y(t))    (9)

If there is more than one object, the outer contour is traced for each object; if there is a child object inside an object, we then trace the outer contour of the child object as well. Inner boundary pixels are extracted by tracing the pixels on the inner contour of an object. A smoothing of the contour with a Gaussian kernel is then needed prior to the computation of the curvature, to overcome the problem of discontinuities in the derivatives needed for the curvature calculation (Pei and Lin 1992). The smoothed contour is represented as:

x_s(t) = x(t) * G,   y_s(t) = y(t) * G    (10)

where * denotes convolution with the Gaussian kernel G.
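Equation 10 can be sketched as a circular 1-D convolution of each coordinate sequence with a Gaussian kernel; circular padding is used because the contour is closed, and the kernel radius of 3σ as well as the value of σ are our choices.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def smooth_contour(x, y, sigma=3.0):
    """Eq. 10: convolve each coordinate sequence of a closed contour with G."""
    g = gaussian_kernel(sigma)
    r = len(g) // 2
    def circular_convolve(v):
        padded = np.concatenate([v[-r:], v, v[:r]])   # wrap-around padding
        return np.convolve(padded, g, mode='valid')
    return circular_convolve(x), circular_convolve(y)

# A circle with a high-frequency ripple: smoothing damps the ripple,
# so the spread of the radius values shrinks.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
rad = 10.0 + 0.5 * np.sin(9 * theta)
xs, ys = smooth_contour(rad * np.cos(theta), rad * np.sin(theta), sigma=3.0)
```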

Fig. 11 a Input image; b inner and outer contours

Figure 11a shows an image having one object with two holes. The outermost pixels of the object are traced to extract the outer contour and the boundary of the holes gives the inner contours as shown in


Fig. 11b. Curvature is defined as the rate of change of slope as a function of the arc length t:

K(t) = dθ(t)/dt    (11)

where θ(t) is the tangent to the curve at t. The curvature of the smoothed contour is computed as (Bennamoun and Mamic 2002):

K_s(t) = (x_s′(t) y_s″(t) − y_s′(t) x_s″(t)) / (x_s′(t)² + y_s′(t)²)^(3/2)    (12)

The curvature obtained from Eq. 12 is smoothed with a Gaussian kernel (Eq. 2) to obtain a smoothed curvature, given by:

K_s(t) = K(t) * G    (13)

Figure 12c shows the curvature plot of the image shown in Fig. 12a. The smoothed curvature obtained using Eq. 13 is shown in Fig. 12d.

Extraction of Dominant Points

It has been suggested, from the viewpoint of the human visual system (Bennamoun 1994), that dominant points have high curvature, i.e., the rate of change of slope along the path length is high. In this paper, we detect these points and use them to decompose the object and remove the non-road structures. Dominant points are points having a curvature value greater than a threshold. Local extrema are defined as the points at which the derivative of the curvature equals zero (Pei and Lin 1992):

dK_s(t)/dt = 0    (14)

which is equivalent to convolving the curvature with the derivative of a Gaussian and taking the zero crossings of this operation. Figure 12e shows the local extrema for the input image in Fig. 12a.
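Equations 12 and 14 can be sketched with central-difference derivatives (via np.gradient); function names are ours. The sanity check below uses the fact that a circle of radius R has constant curvature 1/R.

```python
import numpy as np

def curvature(x, y):
    """Eq. 12: curvature of a parametric contour from its first and
    second derivatives, approximated by central differences."""
    xd, yd = np.gradient(x), np.gradient(y)
    xdd, ydd = np.gradient(xd), np.gradient(yd)
    return (xd * ydd - yd * xdd) / (xd ** 2 + yd ** 2) ** 1.5

def extrema_indices(k):
    """Eq. 14: indices where the curvature derivative crosses zero."""
    dk = np.gradient(k)
    return np.nonzero(np.diff(np.sign(dk)) != 0)[0]

# Sanity check: a circle of radius 5 has constant curvature 1/5 = 0.2.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
k_circle = curvature(5.0 * np.cos(t), 5.0 * np.sin(t))
```

Away from the array ends (where np.gradient falls back to one-sided differences) the computed curvature matches 1/R closely.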

Fig. 12 a Input synthetic image; b smoothed contour; c curvature plot; d smoothed curvature; e local extrema; f effective CDPs marked on the smoothed curvature in (d); g CDPs marked on the smoothed contour; h contour normals at the CDPs; i segmented map of (a)


The convex dominant points on the outer contour are combined with the local extrema using an AND operation to give the effective CDPcx; similarly, the concave dominant points on the inner contour are combined with the local extrema to get the effective CDPce. These points are then used to segment the non-road parts from the given image. The CDPcx are moved inwards along the direction of their normals, whereas the CDPce are moved outwards along the direction of their normals. For a particular contour, all the CDPs (both CDPcx and CDPce) are allowed to move simultaneously, and a CDP freezes only when it touches another moving CDP on the same contour, or a point on the contour itself which is within a specified path length. The specified path length of the moving CDP dictates the maximum perimeter of the non-road region considered for elimination. All the frozen CDPs are traced back to their origins, and the corresponding CDPs, or the CDP and the contour point, are joined using a line segment. The effective CDPs on the smoothed curvature of Fig. 12d are shown in Fig. 12f; the same are marked on the smoothed contour of Fig. 12b in Fig. 12g. Figure 12i shows the result of region part segmentation for the image in Fig. 12a. Unlike the algorithm of Bennamoun and Mamic (2002), there is no necessity to freeze all the CDPs: we move the CDPs only for a fixed number of iterations, and unfrozen CDPs are not taken into account for segmentation. The regions bounded by the new line segments are then isolated as separate components, and, by setting an area threshold, small noisy non-road structures are eliminated. Figure 13 shows the results of the region part segmentation algorithm for the images shown in Figs. 9b and 10b. It is observed that the non-road regions have been eliminated,

thereby improving the accuracy of road extraction results (Fig. 13).

Experimental Results and Comparative Study

We now describe the results of experimentation using our proposed framework. The performance of the proposed method is verified on satellite images of size 512×512 each, and is compared with two state-of-the-art techniques, Tuncer (2007) and Mokhtarzade and Zoej (2007), as well as with a freely available commercial tool for feature extraction (Geospace 2008), termed FeatureObjeX. FeatureObjeX is a semi-automatic system that allows the user to select the training samples. Once a seed is created, intensity distributions are computed for a set of pixels around the seed, which are then used to fit a multivariate normal distribution. Each seed region is modeled by a naive Bayes classifier (Duda et al. 2000), and the likelihood of a given pixel is computed with respect to each of the seed distributions. If the likelihood of a particular pixel is the same as or greater than the likelihood of the seed, that pixel is assigned to the target class. FeatureObjeX was used to segment the image into road and non-road classes using color features. Several configuration changes were made in FeatureObjeX before the tests, to make it more efficient and closer to our requirement of working on road scenes over urban and suburban environments.

Dataset Description and Performance Measures

We created a database of satellite images with 1 m/pixel resolution from Wikimapia (Koriakine and Saveliev 2006); the commercial cost of this type of imagery is very high. We screen-captured 100 images of developed countries and 100 images of emerging countries that we considered useful for our work. In our case, the place and date were not critical; the only characteristic we were looking for was image content with views of highways and roads.
For creating the dataset, we consider selected sections (512×512 pixels) of scenes from satellite images of 1 m/pixel resolution acquired from Wikimapia (Koriakine and Saveliev 2006), which include: (1) sub-urban and (2) urban

Fig. 13 The results of region part segmentation for: a the output shown in Fig. 9b; and b the output shown in Fig. 10b


areas from developed and emerging countries. Figures 15a and 17a show three examples of images from suburban areas in developed and emerging countries, whereas Figs. 16a and 18a show three examples of images from urban areas in developed and emerging countries. For each image in the dataset, a ground-truth (road) map was also obtained from a human operator. A portion of the dataset can be downloaded from (Visualisation and Perception Lab 2006). The categorization of the data into four groups was done with the advice (based on visual observation and geo-location) of a GIS expert. As the data was distributed in four groups of 50 images each, we trained four different P-SVMs with data (25 images) from each respective group; the rest (25 images) were used for testing and performance analysis of the output of our proposed multistage framework. To assess the performance of the road extraction system, the length of the extracted road network (obtained after morphological thinning) that falls within a prespecified range of the reference road network is used to calculate the accuracy measures. The road segments in the test sites were manually digitized to form the reference road network. This subjectively obtained reference network, which covers all roads present in the image, is used as the ground truth for estimating the accuracy of road extraction. Two measures are used to evaluate the accuracy of the extracted road network (Heipke et al. 1997), defined as follows. Completeness is the percentage of the reference data which was detected during road extraction:

completeness = (length of matched reference) / (length of reference)    (15)

Correctness represents the percentage of the extracted road data which is correct:

correctness = (length of matched extraction) / (length of extraction)    (16)
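Equations 15 and 16 can be sketched on thinned (one-pixel-wide) road maps, with a pixel counted as matched when the other skeleton passes within a buffer distance; the brute-force matching and the buffer width below are our simplifications of the buffer method of Heipke et al. (1997), and the function names are ours.

```python
import numpy as np

def matched_fraction(a, b, buffer_px=2.0):
    """Fraction of the pixels of skeleton a lying within buffer_px of skeleton b."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    if pa.size == 0 or pb.size == 0:
        return 0.0
    d2 = ((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=-1)
    return float((np.sqrt(d2.min(axis=1)) <= buffer_px).mean())

def completeness(extracted, reference, buffer_px=2.0):
    """Eq. 15: share of the reference network covered by the extraction."""
    return matched_fraction(reference, extracted, buffer_px)

def correctness(extracted, reference, buffer_px=2.0):
    """Eq. 16: share of the extracted network lying on the reference."""
    return matched_fraction(extracted, reference, buffer_px)

# Reference: a 20-pixel line; the extraction recovers only its first 10 pixels,
# so correctness is perfect while completeness is partial.
reference = np.zeros((11, 21), dtype=bool)
reference[5, 0:20] = True
extracted = np.zeros((11, 21), dtype=bool)
extracted[5, 0:10] = True
```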

Results and Discussion

Separate training of the P-SVM was necessary for the four categories of image samples, as the spectral characteristics exhibited by roads were different for

the four cases of our study; the road intensity and contrast also vary between the four types of image samples. The proposed CSNN-based algorithm iteratively shuttles between adding new and removing redundant edge pixels, and hence inherently provides a correction mechanism in the process of fusion. Edge maps are obtained using the method discussed in Section DSM Based Edge Detection. The CSNN-CII algorithm requires the probability values of all the pixels for each class in an image; the initial probabilistic values and segmented maps are obtained using the method discussed in Section Segmentation Using Probabilistic SVM. In order to directly compare our approach with the recently published results in (Tuncer 2007; Mokhtarzade and Zoej 2007), we use a pair of images published by them for evaluation. We first show the results on the images used in (Tuncer 2007) and (Mokhtarzade and Zoej 2007), and then on a few examples from the testing dataset acquired from Wikimapia (Koriakine and Saveliev 2006), in Figs. 15, 16, 17 and 18. Figure 14a shows the two sample images used in (Tuncer 2007) and (Mokhtarzade and Zoej 2007). The outputs of a human operator detecting roads in the two images are presented in Fig. 14b. Figure 14c-I presents the result published in (Tuncer 2007), while that in Fig. 14c-II is taken from (Mokhtarzade and Zoej 2007). The result in Fig. 14c-I shows that only roads with rather large pixel widths, such as the main highways, are recovered as thinned structures; prominent roads are recovered with good accuracy, but narrow inner-city roads and road intersections have not been recovered. Similarly, for the method presented in (Mokhtarzade and Zoej 2007), more false positives (non-road structures) occur, which reduces the correctness measure for this method (see Fig. 14c-II); some pixels belonging to rooftops of buildings were falsely identified as roads.
The completeness and correctness measures calculated on the given test images for Tuncer (2007) and Mokhtarzade and Zoej (2007), as well as for our proposed method, are shown in Table 2. The completeness measure in (Mokhtarzade and Zoej 2007) is higher than that in (Tuncer 2007), as the true positives (actual road parts) are detected more accurately. The results of our proposed method are much better in both cases, as shown in Fig. 14d. It can be


Fig. 14 a Images presented in (Tuncer 2007) and (Mokhtarzade and Zoej 2007); b output of manual (hand-drawn) extraction; c results reproduced from (I) Tuncer (2007) and (II) Mokhtarzade and Zoej (2007); d results of our proposed approach

observed that our method outperforms both the prior published works. Figures 15 and 16 show the results obtained using the proposed methodology on satellite images of developed countries, whereas Figs. 17 and 18 show the results for emerging countries. Figures 15b, 16b, 17b and 18b show the results of feature extraction using the FeatureObjeX tool for the images in Figs. 15a, 16a, 17a and 18a respectively. Figures 15c, 16c, 17c and 18c show the results of the algorithm proposed in (Tuncer 2007). Figures 15d, 16d, 17d and 18d show the road segments extracted from the input satellite images using the technique presented in (Mokhtarzade and Zoej 2007). Figures 15e, 16e, 17e and 18e show manually plotted reference road layouts for the respective input images. It can be observed that the results of our proposed method, given in Figs. 15f, 16f, 17f and 18f, are significantly better than those of the other approaches and quite close to the ground truth given in Figs. 15e, 16e, 17e and 18e. Our system outperforms FeatureObjeX (Geospace 2008) and the other state-of-the-art methods in all the cases. The optimal values of the parameters used in our proposed approach, obtained empirically from a large set of experiments, are given in Table 3. Table 4 compares the accuracy of the results presented in Figs. 15, 16, 17 and 18 using the completeness and correctness measures; our proposed method outperforms the other techniques in almost all the cases, and only in a very few cases is the completeness measure of the FeatureObjeX tool marginally better than that of our method. Tables 5, 6, 7 and 8 show the average classification accuracy obtained using the proposed method, FeatureObjeX (Geospace 2008) and the two state-of-the-art techniques (Tuncer 2007; Mokhtarzade and Zoej 2007), over 25 images in each of the four categories.

Table 2 Performance of the proposed approach and of the algorithms presented in (Tuncer 2007; Mokhtarzade and Zoej 2007)

Methods                                      Completeness   Correctness
Tuncer (2007) (Fig. 14c-I)                   82%            96%
Proposed (Fig. 14d-I)                        100%           100%
Mokhtarzade and Zoej (2007) (Fig. 14c-II)    92%            82%
Proposed (Fig. 14d-II)                       96%            85%

Fig. 15 a Three satellite images of size 512×512, from a suburban area of a developed region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road maps; f results of our proposed method


Fig. 16 a Three satellite images of size 512×512, from an urban area of a developed region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road maps; f results of our proposed method


Fig. 17 a Three satellite images of size 512×512, from a suburban area of an emerging region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road maps; f results of our proposed method


Fig. 18 a Three satellite images of size 512×512, from an urban area of an emerging region; b results from FeatureObjeX (Geospace 2008); c results of the method proposed in (Tuncer 2007); d results of the method proposed in (Mokhtarzade and Zoej 2007); e hand-drawn (manual) road maps; f results of our proposed method

Table 3 Values of the parameters used in our proposed approach

Road image type   σ1   σ2   σ3   N               TE    TA
Suburban          2    2.5  3.5  9×9      0.6    0.7   50
Urban             2    2.5  3.5  11×11    0.7    0.7   50


It is observed from the results shown in Figs. 15, 16, 17 and 18 and Tables 4, 5, 6, 7 and 8 that the performance of our proposed algorithm is superior to that of the other methods. The results obtained using the proposed methodology are much better than those of the methods presented in (Tuncer 2007; Mokhtarzade and Zoej 2007) and close to the manually drawn reference road network. Compared to our preliminary investigation in (Mirnalinee et al. 2009), the performance in terms of the completeness and correctness measures has been enhanced significantly: the region linking algorithm improves the completeness measure, whereas the region part segmentation improves the correctness measure. The correctness and completeness measures obtained for scenes from emerging countries are in most cases lower than those for scenes from developed countries. This decrease in accuracy is expected, since there are many more opportunities for error in these areas, due to the large numbers of linear non-road features, four-way crossings, non-linear road structures and unplanned layouts. Comparing the results for developed urban and suburban scenes, the performance on urban scenes is lower because of distortions: images of urban areas exhibit a more complex structure than scenes of suburban areas, as the number of different
objects and their heterogeneity is much higher in urban scenes, and some of the roads comprise several lanes that are linked by complex road crossings. Generally, as shown in Fig. 15, the extraction results for open landscape areas are nearly complete and correct. Suburban scenes of emerging countries are covered by vegetation; moreover, the spectral response of roads in these areas is on certain occasions similar to the spectral response of open fields and roof-tops, which in turn increases the false positives, thereby reducing the correctness measure. Overall, our proposed method outperforms FeatureObjeX (Geospace 2008) and the two state-of-the-art methods (Tuncer 2007; Mokhtarzade and Zoej 2007), for observations averaged over images of 50 developed and 50 emerging areas.

Table 4 Performance of the system for the images shown in Figs. 15, 16, 17 and 18 (A: Completeness, B: Correctness; I, II and III denote the three images of each figure)

Road image type                    FeatureObjeX     Tuncer           Mokhtarzade      Proposed
                                   I    II   III    I    II   III    I    II   III    I    II   III
Developed suburban (Fig. 15)   A   97   84   100    98   82   97     98   68   95     100  94   100
                               B   88   72   91     93   74   92     86   56   85     98   89   90
Developed urban (Fig. 16)      A   85   96   97     75   92   91     65   66   76     92   100  99
                               B   79   82   72     83   83   68     52   54   57     96   100  94
Emerging suburban (Fig. 17)    A   96   83   94     73   62   92     61   51   87     88   96   95
                               B   68   73   63     67   56   74     58   51   62     92   93   89
Emerging urban (Fig. 18)       A   91   87   83     62   52   74     71   64   59     89   92   82
                               B   74   71   75     72   51   61     57   58   58     83   92   85

Conclusions

A novel and efficient method for automatically extracting roads directly from satellite images, using low-level information and based on region and edge integration, has been introduced and demonstrated. This new method combines the outputs of PSVM and DSM in such a way that it preserves the strong discriminative ability of the SVM while simultaneously exploiting the linear-like characteristics of the features derived using DSM. For the determination of discontinuities and the elimination of non-road parts, two approaches were shown, the first using several criteria concerning properties of the road parts and their relations to each other. The segment linking module solves the problem of discontinuity to some extent, thereby increasing the completeness. Region part segmentation and shape analysis based on elongated-

J Indian Soc Remote Sens (March 2011) 39(1):125 Table 5 Performance of the system averaged over 25 images of suburban scenes of developed countries Methods FeatureObjeX (Geospace 2008) Tuncer (Tuncer 2007) Mokhtarzade et al. (Mokhtarzade and Zoej 2007) Proposed Method Completeness 86% 81% 63% 93%

23 Correctness 79% 72% 60% 89%

Table 6 Performance of the system averaged over 25 images of urban scenes of developed countries

Methods FeatureObjeX (Geospace 2008) Tuncer (Tuncer 2007) Mokhtarzade et al. (Mokhtarzade and Zoej 2007) Proposed Method

Completeness 84% 81% 62% 93%

Correctness 74% 65% 89% 91%

Table 7 Performance of the system averaged over 25 images of suburban scenes of emerging countries

Methods FeatureObjeX (Geospace 2008) Tuncer (Tuncer 2007) Mokhtarzade et al. (Mokhtarzade and Zoej 2007) Proposed Method

Completeness 89% 78% 64% 87%

Correctness 62% 64% 58% 91%

Table 8 Performance of the system averaged over 25 images of urban scenes of emerging countries

Methods FeatureObjeX (Geospace 2008) Tuncer (Tuncer 2007) Mokhtarzade et al. (Mokhtarzade and Zoej 2007) Proposed Method

Completeness 78% 58% 63% 85%

Correctness 60% 52% 52% 87%

24

J Indian Soc Remote Sens (March 2011) 39(1):125 Doucette, P., Agouris, P., Stefanidis, A., & Musavi, M. (2001). Self-organized clustering for road extraction in classified imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 55, 347358. Duda, R., Hart, P., & Stork, D. (2000). Pattern classification. Wiley Interscience. Feng, X., & Milanfar, P. (2002). Multiscale principal components analysis for image local orientation estimation. In: Proceedings of The 36th Asilomar Conference on Signals, Systems and Computers, pp. 478482. Geospace (2008). FeatureObjeX, http://www.pcigeomatics. com/. Granlund, G., & Knutsson, H. (1995). Signal processing for computer vision. Boston: Kluwer Academic. Gruen, A., & Li, H. (1995). Road extraction from aerial and satellite images by dynamic programming. ISPRS Journal of Photogrammetry and Remote Sensing, 50(4), 1120. Haddon, J., & Boyce, J. (1990). Image segmentation by unifying region and boundary information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12 (10), 929948. Haglund, L., & Fleet, D. (1994). Stable estimation of image orientation. In: Proceedings of the First IEEE International Conference on Image Processing III, pp. 6872. Haralick, R., & Shapiro, L. (1992). Computer and robot vision. Addison Wesley. Heipke, C., Mayer, H., Wiedemann, C., & Jamet, O. (1997). Evaluation of automatic road extraction. International Archives of Photogrammetry and Remote Sensing, pp. 4756. Hinz, S., & Baumgartner, A. (2003). Multiview fusion of road objects supported by self diagnosis. In: In Proceeding of 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, pp. 137141. Hu, X., & Tao, V. (2007). Automatic extraction of main road centerlines from high resolution satellite imagery using hierarchical grouping. Photogrammetric Engineering and Remote Sensing, 73(9), 10491056. Hu, X., Zhang, Z., & Tao, V. (2004). 
A robust method for semiautomatic extraction of road centerlines using a piece-wise parabolic model and least square template matching. The International Journal of Photogrammetric engineering and Remote Sensing, 70(12), 13931398. Jacob, M., & Unser, M. (2004). Design of steerable filters for feature detection using Canny like criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (8), 10071019. Jiang, X. (2007). Extracting image orientation feature by using integration operator. Pattern Recognition, 40, 705717. Jin, X., & Davis, C. (2005). An integrated system for automatic road mapping from high-resolution multispectral satellite imagery by information fusion. Information Fusion, pp. 257273. Kass, M., Witkin, A., & Terzopoulos, D. (1987). Snakes: active contour models. International Journal of Computer Vision, 1, 321331. Koriakine, A., & Saveliev, E. (2006). Data, http://www. wikimapia.org/. Kumar, P., Das, S., & Yegnanarayana, B. (2000). One-dimensional processing of images. In: International Conference on Multimedia Processing and Systems, pp. 451454.

ness measure eliminates non-road parts and increases the correctness. The results prove that the proposed system is able to effectively extract major sections of the road network, a few junctions and curved roads from high-resolution satellite images. It is observed that the road detection process produces a high degree of accuracy especially for the scenes of developed countries. In urban areas however, only major roads with larger pixel widths have been detected. Moreover, the presence of buildings and other features similar to roads made the extraction process somewhat more difficult compared to the suburban case. Linking of discontinuous segments, road junction detection and modeling of shadows are issues to be addressed in future scope of work for this problem. Vectorization of the extracted road segments can also be a nice extension of this work for GIS updates. The next step may include the formation of a road network by searching for junctions connecting road segments. Results may improve with the help of a road hypothesis verification using parallelism of road boundaries and use of a graph data structure to form a complete road network representation.

References
Baumgartner, A., Hinz, S., & Wiedemann, C. (2002). Efficient methods and interfaces for road tracking. In: Proceedings of the ISPRS commission III Symp. Photogrammet. Comput. Vision, pp. 2831. Bennamoun, M. (1994). A contour based part segmentation algorithm. In: Proc. of the IEEE ICASSP, pp. 4144. Bennamoun, M., & Mamic, G. J. (2002). Object recognition fundamentals and case studies. Springer. Bigun, J., Granlund, G., & Wiklund, J. (1991). Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern analysis and Machine Intelligence, 13(8), 775790. Bucha, V., Uchida, S., & Ablameyko, S. (2006). Interactive road extraction with pixel force fields. In: IEEE The 18th International Conference on Pattern Recognition (ICPR06), pp. 829832. Chu, J., & Aggarwal, J. (1993). The integration of image segmentation maps using region and edge information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 12411252. Cooper, G., & Cowan, D. (2007). Enhancing linear features in image data using horizontal orthogonal gradient ratios. Computers and Geosciences, 33, 981984. Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20(3), 273297.

J Indian Soc Remote Sens (March 2011) 39(1):125 Kurugollu, F., & Sankur, B. (1999). Map segmentation of color images using constraint satisfaction neural network. In: International Conference on Image Processing, pp. 236 239. Lalit, G., Mangai, U. G., & Das, S. (2008). Integrating region and edge information for texture segmentation using a modified constraint satisfaction neural network. Image and Vision Computing, pp. 11061117. Laptev, I., Mayer, H., Lindeberg, T., Eckstein, W., Steger, C., & Baumgartner, A. (2000). Automatic extraction of roads from aerial images based on scale space and snakes. Machine Vision and Applications, 12(1), 2331. Lin, W., Kuo, E., & Chen, C. (1992). Constraint satisfaction neural networks for image segmentation. Pattern Recognition, 25(7), 679693. Lyvers, E., & Mitchell, O. (1988). Precision edge contrast and orientation estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6), 927937. Majidi, B., & BabHadiashar, A. (2009). Aerial tracking of elongated objects in rural environments. Machine Vision and Applications, 20, 2334. Mantero, P., Moser, G., & Serpico, S. (2005). Partially supervised classification of remote sensing images through SVM-based probability density estimation. IEEE Transactions on Geoscience and Remote Sensing, 43(3), 559 570. Mayer, H., Laptev, I., Baumgartner, A., & Steger, C. (1997) Automatic road extraction based on multi-scale modelling, context and snakes. In: International Archives of Photogrammetry and Remote Sensing, pp. 106113. McKeown, D. (1996). Top ten lessons learned in automated cartography. Mena, J. B. (2003). State of the art on automatic road extraction for GIS update: a novel classification. Pattern Recognition Letters, 24(16), 30373058. Miliaresisa, G., & Kokkasb, N. (2007). Segmentation and object-based classification for the extraction of the building class from LIDAR DEMs. Computers and Geosciences, 33, 10761087. Mirnalinee, T., Das, S., & Varghese, K. (2009). 
Integration of region and edge based information for efficient road extraction from high resolution satellite imagery. In: IEEE Proceedings of ICAPR, Kolkata, India, pp. 373376. Moigne, J., & Tilton, J. (1995). Refining image segmentation by integration of edge and region data. IEEE Transactions on Geoscience and Remote Sensing, 33, 605615. Mokhtarzade, M., & Zoej, M. (2007). Road detection from high-resolution satellite images using artificial neural networks. International Journal of applied Earth Observation and Geoinformation, 9(1), 3240. Pal, M., & Mather, P. (2005). Support Vector Machines for classification in remote sensing. International Journal of Remote Sensing, 26(5), 10071011. Pavlidis, T., & Liow, Y. (1990). Integrating region growing and edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 225233. Pei, S., & Lin, C. (1992). The detection of dominant points on digital curves by scale space filtering. Pattern Recognition, pp. 13071314.

25 Perona, P. (1998). Orientation diffusions. IEEE Transactions on Image processing, 7(3), 457467. Platt, J. C. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, MIT Press, pp. 6174. Qian, R., & Huang, T. (1996). Optimal edge detection in twodimensional images. IEEE Transaction on Image processing, 5, 12151220. Raghu, P., & Yegnanarayana, B. (1996). Segmentation of Gaborfiltered textures using deterministic relaxation. IEEE Transactions on Image processing, 5(12), 424429. Rizvandi, N., Pizurica, A., Philips, W., & Ochoa, D. (2008). Edge linking based method to detect and separate individual c. elegans worms in culture. In: DICTA, pp. 6570. Shi, W., & Zhu, C. (2002). The line segment match method for extracting road network from high-resolution satellite images. IEEE Transactions on Geoscience and Remote Sensing, 40(2), 511514. Strang, G. (2005). Linear Algebra and its application. Thomson Brooks. Theodoridis, S., & Koutroumbas, K. (2006). Pattern Recognition. Academic. Tuncer, O. (2007). Fully automatic road network extraction from satellite images. In: Recent Advances in Space Technologies, pp. 708714. Tupin, F., Houshmand, B., & Datcu, M. (2002). Road detection in dense urban areas using SAR imagery and the usefulness of multiple views. IEEE Transactions on Geoscience and Remote Sensing, 40, 24052414. Udomhunsakul, S. (2004). Semi-automatic road detection from satellite imagery. In: IEEE International Conference on Image Processing (ICIP), pp. 17231726. Visualisation and Perception Lab (2006). http://www.cse.iitm. ac.in/~sdas/vplab/downloads.html. Wei, W., & Xin, Y. (2008). Feature extraction for man-made objects segmentation in aerial images. Machine Vision and Applications, 19, 5764. Xiao, Y., Tan, T., & Tay, S. (2005). Utilizing edge to extract roads in high-resolution satellite imagery. In: IEEE International Conference on Image Processing (ICIP), pp. 637640. 
Yager, N., & Sowmya, A. (2003). Support vector machines for road extraction from remotely sensed images. LNCS, 2756, 285292. Yang, J., & Wang, R. (2007). Classified road detection from satellite images based on perceptual organization. International Journal of Remote Sensing, 28, 46534669. Zhang, H., Xiao, Z., & Zhou, Q. (2008). Research on road extraction semi-automatically from high resolution remote sensing images. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XXXVII (Part B):536538. Zhu, C., Shi, W., Pesaresi, M., & Liu, L. (2005). The recognition of road network from high-resolution satellite remotely sensed data using image morphological characteristics. International Journal of Remote Sensing, 26(24), 54935508.
