
Open Foris Toolkit

Example process chain
Here is an example of the use of the provided toolset, e.g. for land cover classification.
Training data collection
Motivation: There are two main categories of image classification methodologies: unsupervised and
supervised. Unsupervised classification creates natural groupings in the image values, called spectral
clusters or classes. In this fashion, values with similar grey levels are assumed to belong to the same
cover type. The analyst must then determine the identity of these spectral clusters. Supervised
classification relies on a priori knowledge of the location and identity of the land cover types in
the image (training data). This can be achieved through field work, study of aerial photographs or
other independent sources of information (http://www.ccrs.nrcan.gc.ca/glossary/index_e.php).
1. Use of field data
Existing field data may be used for land cover classification. There are, however, several points to
be considered before employing the field plots:
-Are there differences between image date and field data collection date?
-Is the sampling design suitable (e.g. dense enough, covering all required classes)?
-Does the sample plot size correspond to the pixel size?
-Do the attributes collected in the field allow derivation of the desired land cover classes?
2. Use of remotely sensed data
Concerning independent remotely sensed data, Google Earth is one possibility for collecting the
training data, especially in areas with up-to-date, high resolution imagery.
2.1. Training data collection tool
There is a tool to be used with Google Earth (TZ_TD_Collector.swf), tailored for the NAFORMA LULC
classification task. It starts from elementary land cover levels and goes into detail within each land
cover class. The forest cover percentage is also recorded, as well as the type and date of the imagery
available in Google Earth at each point. Pre-defined locations (100 x 100 m squares) are used as training
areas. The point of using the tool is that the output is consistent, i.e. all required information is
collected at each point and there is no variation in the class names etc. Areas with old or coarse
resolution imagery can be left uninterpreted, or they can be screened out later based on the recorded
attributes. It is advisable to check the locations against the imagery employed in the actual
classification, as there may be clouds.
There is a tool, PointsToSquares.py, which creates a kml file containing the 100 x 100 m training
squares from a list of coordinates provided by the user. The sampling design (systematic, stratified)
can be selected by the user. Within the training squares there are 25 systematically laid plots,
which can be used to assess the crown coverage. The script gengrid.bash can be used to generate
a systematic sample with user-determined distances in the X and Y directions. The produced list of
coordinates can be used as input to the kml-generating program PointsToSquares.py, as sketched below.
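A minimal sketch of chaining the two scripts (the argument names and order are assumptions, not the documented interfaces; check each script's usage message):

# Hypothetical arguments: area bounds and X/Y spacing in metres, point list on standard output
./gengrid.bash 500000 9200000 600000 9300000 5000 5000 > points.txt
# Hypothetical arguments: input point list and output kml file
python PointsToSquares.py points.txt training_squares.kml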
The interpreted training areas are saved into xml files, which can then be converted into shapefiles
using the scripts GExml2csv.bash and CsvToPolygon.py.
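A sketch of the conversion chain (the argument names are assumptions; consult the scripts' usage messages):

# Hypothetical arguments: convert the saved Google Earth xml into a csv file
./GExml2csv.bash training_areas.xml training_areas.csv
# Hypothetical arguments: build a polygon shapefile from the csv
python CsvToPolygon.py training_areas.csv training_areas.shp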
2.2. Subjectively selected training areas
Training areas may also be selected subjectively and digitized using Google Earth's built-in facilities.
There are two scripts, genericGEkml2csv.bash and genericCsvToPolygon.py, which are meant for
converting the produced separate kml files into one shapefile.
When selecting the training areas subjectively, extra care must be taken not to avoid areas that are
difficult to interpret, and to ensure the spatial and informational coverage of the training sites.
3. Use of other auxiliary data
Existing GIS data may also be used, especially to support image interpretation. If a good but slightly
outdated map is available, the training data could be sampled from the map, outliers (i.e. changed
areas) removed from the sample, and a new classification run.
Image pre-processing
1. Geometric correction/image reprojection
The generic program gdalwarp can be used for these purposes. It is an image mosaicking, reprojection
and warping utility. The program can reproject to any supported projection, and can also apply GCPs
stored with the image (http://www.gdal.org/gdalwarp.html).
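For example, a scene can be reprojected to UTM zone 36 South / WGS84 (the EPSG code and file names are illustrative; adjust them to the data at hand):

# Reproject input.tif to EPSG:32736 (UTM zone 36S on the WGS84 datum)
gdalwarp -t_srs EPSG:32736 input.tif output_utm36s.tif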
2. Atmospheric correction to directional surface reflectance
Directional surface reflectance represents the value that would be measured by an ideal sensor
held at the same sun-view geometry and located just above the Earth's surface if there were no
atmosphere (http://ledaps.nascom.nasa.gov/docs/pdf/SR_productdescript_dec06.pdf).
Motivation:
-Removes atmospheric distortions (caused by emission and absorption) within one image
-Improves the correlation between the ground truth and the pixel values
-Allows use of common training data over several scenes
-Allows the mosaicking of scenes, use of time series etc.
(-Is a prerequisite for methods, which require images calibrated to ground reflectance)
Landsat imagery from USGS is processed using programs created in the LEDAPS project by NASA
(http://ledaps.nascom.nasa.gov/).
The program package performs atmospheric correction and produces surface reflectance values. The
script ledapsscript.bash can be used; however, the program source codes must be obtained from
NASA and compiled on the user's system. Auxiliary data are also needed from the same source.
The output is a set of HDF files: one containing the corrected bands and some extra layers
(atmospheric opacity and a Quality Assurance layer), one containing a cloud and snow mask, and one
containing the thermal band.
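The layers inside the HDF outputs can be inspected with the generic GDAL tools (the file name below is illustrative):

# List the subdatasets (corrected bands, QA layers) contained in the HDF file
gdalinfo lndsr_output.hdf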
3. Correction of bidirectional reflectance effects
Correction of surface bidirectional effects in remotely sensed images, where both sun and viewing
angles are varying.
The reflectance from a target on the ground varies with both incidence angle and exitance angle
(the angle between the surface normal and the view vector). The variation is caused by both
topography and the anisotropic nature of how light reflects from the Earth's surface due to different
land cover types. The light reflected from the surface is therefore highly dependent on the incidence
and exitance angles, and the phase angle between the incidence and exitance vectors. One
combination of incidence, exitance and phase angles results in one possible, bi-directional,
reflectance value. An infinite number of combinations results in the bi-directional reflectance
distribution function (BRDF) for a surface. (http://www.scribd.com/doc/37526174/15arspc-Submission-23)
Motivation:
-Improves the correlation between the ground truth and the pixel values
-Allows use of common training data over several scenes
-Allows the mosaicking of scenes, use of time series etc.
For this, a tool is under development.
4. Creating image stacks
Motivation:
-Some programs require image composites rather than separate bands
The HDF data created in the atmospheric correction process (with ledapsscript.bash) can be
converted into an Erdas Imagine .img composite using the script stack.bash, as sketched below.
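A hypothetical invocation (the actual arguments of stack.bash may differ; check the script's usage message):

# Hypothetical arguments: input LEDAPS HDF file and output Erdas Imagine composite
./stack.bash lndsr_output.hdf stacked.img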
In case a composite is needed directly from TIFF band layers provided by USGS, the generic program
gdal_merge.py can be used:
gdal_merge.py -of HFA -o output_filename.img -separate b1.tif b2.tif b3.tif b4.tif b5.tif b6.tif b7.tif
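Here -of HFA selects the Erdas Imagine output format, and -separate places each input file into a separate band of the output instead of mosaicking the inputs together.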
5. Masking of clouds/cloud shadows and Landsat 7 gaps
Motivation:
-Remove cloud/shadow/gap pixels from the training data before classification, in case an algorithm is
used which is sensitive to outliers.
-Obtain a wall-to-wall classification without blank areas. The cloud/shadow/gap pixels are filled
based on the mask, using the stand-alone program gdal_gapfill. See point 6 below.
Some options:
-unsupervised classification: for this the script cluster.bash can be used
-treating Landsat 7 gaps only: for this the script mask_single_image.bash can be used
-LEDAPS project gap/cloud/shadow mask: for this no script is yet available for a single image, but for
gap-filling (gdal_gapfill) purposes a masking script trim_mask.bash is provided, which extracts the
bad data in two images of the same scene and detects the combined effective area of the images.
-manual digitizing: for this a script is to be provided soon, which combines the digitized AOI layers,
inserts the gaps from the anchor and filler images and detects the combined effective area of the
images.
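Once a mask raster exists (produced by any of the options above), the flagged pixels can be blanked out with the generic GDAL utility gdal_calc.py. This is a generic alternative to the toolkit scripts; the mask convention (0 = good data) and the file names are assumptions:

# Zero out all bands of the stack wherever the mask flags bad data (mask != 0)
gdal_calc.py -A mask.img -B stacked.img --allBands=B --calc="B*(A==0)" --format=HFA --outfile=masked.img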
6. Filling of cloud/shadow/gap areas
Motivation:
-Obtain a wall-to-wall classification without blank areas.
The cloud/shadow/gap areas in the target image can be filled using another image, preferably from
the same season and not too distant in calendar years. A simple way is to substitute the missing
areas with data from the second image.
However, even when the image dates are close to each other and the atmospheric correction has
been carried out, there may be dissimilarities. Therefore, a stand-alone program, gdal_gapfill, has been
developed, which computes local regression models and fills the missing areas using the model. For
large holes, a large-area regression model is applied. The user can define the parameters of the
modelling windows for the local and large-area models. In areas with severe changes (e.g. clear-cuts)
the filling will not succeed. Also, the model is sensitive to other outliers, so the cloud/shadow mask
must be more or less perfect for the method to work well.
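A hypothetical invocation (the argument names and order are assumptions; check the program's usage message):

# Hypothetical arguments: image to be filled, filler image, mask, output
gdal_gapfill anchor.img filler.img mask.img filled.img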
Classification
There is a multitude of methods, of which we present a selection:
1. Unsupervised classification
Unsupervised classification algorithms include e.g. K-means clustering, ISODATA clustering, and
Narendra-Goldberg clustering (http://www.ccrs.nrcan.gc.ca/glossary/index_e.php).
The problem with unsupervised classification is that the spectral classes do not necessarily match
any informational classes, due e.g. to mixels and illumination conditions. A large number of
preliminary classes is therefore usually required; these can later be combined into the actual desired classes.
In this package, a k-means algorithm is included (stand-alone program gdal_kmeans). It clusters the
input image into a user-defined number of clusters, based on a sample of pixels (the sampling density
is determined by the user). The script cluster.bash can be used to run the program, as sketched below.
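A hypothetical invocation (the arguments of cluster.bash are assumptions; check the script's usage message):

# Hypothetical arguments: input stack, output raster, number of clusters, sampling density
./cluster.bash stacked.img clustered.img 50 1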
2. Supervised classification
2.1. Maximum likelihood
The maximum likelihood method computes statistics from the training site pixels, and the probability
density function gives the probability that a target pixel belongs to a certain class. The class with
the highest probability is assigned to the pixel. For this, no tool is provided within this package yet.
2.2. K nearest neighbors (k-nn)
This method performs a non-parametric regression. For each target pixel, we want to find the k most
similar observations from which the variables of interest have been measured. One way to
determine the similarity is to calculate the simple Euclidean distance between the DNs of the target
pixel and those of the observations with measured variables. The method must be parameterized for
the data and the problem at hand, e.g. the number of nearest neighbors must be chosen. Increasing
the value of k generally improves the accuracy of the estimates, but the amount of training data
becomes a limiting factor.
The stand-alone program gdal_nn can be used, as sketched below.
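A hypothetical invocation (the arguments of gdal_nn are assumptions; check the program's usage message):

# Hypothetical arguments: input stack, training observations, number of neighbors k, output
gdal_nn stacked.img training_data.txt 5 classified.img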
2.3. Random Forest
For this, no tool is provided within this package yet.
Evaluation
Motivation:
-Accuracy affects the legal standing and operational usefulness of maps and reports derived from
remotely sensed data (Campbell 2002)
-No image product should be released without descriptive error statistics.
Typically, error matrices are computed, from which producer's and user's accuracies, overall
accuracy, kappa and adjusted kappa values can be derived. Possibilities for obtaining ground
reference information for error matrices include the use of independent test data (field observations,
image interpretation, auxiliary data) and the use of leave-one-out cross-validation. Spatial variation is
not covered by these measures, however. Maps of uncertainty can also be produced, revealing e.g.
the distance to the class centre or the variation between the nearest neighbors.
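As an illustration (the numbers are invented): suppose that of 100 reference plots of the class 'forest', 80 were classified as forest on the map; the producer's accuracy for forest is then 80/100 = 80%. If 120 of the reference plots were mapped as forest, of which the same 80 are correct, the user's accuracy is 80/120, i.e. about 67%. The overall accuracy is the sum of the diagonal of the error matrix divided by the total number of reference plots.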
After evaluation, additional training data may need to be acquired for classes that were detected
poorly, and the classification re-run.
Editing of output
Motivation:
-Remove noise from the classified image
-Correct obvious errors
-Separate classes for which classification failed
-Create subclasses using auxiliary data or user-determined filters
-Match classifications on overlapping areas of two scenes