ModEco Manual

ModEco: Integrated Software for
Ecological Niche Modeling (Version 1)

Qinghua Guo and Yu Liu
University of California Merced
Last updated: November, 2009
1 Introduction
1.1 Environmental niche modeling
1.2 Functions of the toolbox
1.3 User interface
2 Project management
2.1 Project structure
2.2 Import environmental factor layers
2.3 Import species data
2.4 Save elements in a project
3 Interactive display
3.1 Zoom in/Zoom out
3.2 Set symbols of map
3.3 Overlay species data points [View->Overlay species data points...]
3.4 Overlay result map [View->Overlay result map]
3.5 Overlay Base map
4 Factor analysis
4.1 Factor histogram [Analysis->Factor histogram]
4.2 Scatter plot [Analysis->Scatter plot]
4.3 Factor importance analysis [Analysis->Factor importance analysis]
4.4 Factor importance analysis based on sub-groups [Analysis->Factor importance analysis (based on sub-groups)]
4.5 Principal component analysis [Analysis->Principal component analysis]
5 Model training and prediction
5.1 Models introduction
Support Vector Machines
BioClim
Domain
Generalized Linear Model
Maximum Likelihood Classification
Artificial Neural Network
Rough Set
5.2 Train a model [Model->Train->The concrete model]
5.3 Predict using a trained model [Model->Predict]
5.4 Run two-class model based on pseudo absence points approach [Pseudo-absence based two class model...]
6 Accuracy assessment
6.1 Accuracy of prediction result [Analysis->Accuracy of prediction]
6.2 Cross-validation accuracy assessment [Model->Evaluate model->The concrete model]
6.3 ROC curve [Model->ROC curve->The concrete model]
6.4 True positive rate/prediction area plot [Model->True positive rate/prediction area plot]
6.5 Search maximum Kappa value [Model->Search maximum ]
7 Tutorial
7.1 Data description
7.2 Create a project to manage data
7.3 Estimate factor importance
7.4 Search maximum Kappa value
7.5 Train and run the model
7.6 Assess the accuracy of the prediction result
7.7 Save the prediction result map
8 Acknowledgement
9 References
1 Introduction
1.1 Environmental niche modeling
Modeling species distributions for biodiversity conservation and obtaining a better
understanding of the relationship between environmental factors and species distribution
are important tasks (Funk & Richardson 2002; Thuiller 2004). With the increasing
availability of digital ecological data (Graham et al. 2004; Wieczorek et al. 2004),
environmental niche modeling has gained much attention for various ecological
applications (Pearce & Boyce 2006). For example, niche models have been used to study potential
distributions of invasive species (Guo et al. 2005; Thuiller et al. 2005), study the response
of species distribution to climate change (Kueppers et al. 2005; Broenniman et al. 2006),
and perform biodiversity assessments (Feria & Peterson 2002; Chefaoui et al. 2005).
A range of environmental niche models have been proposed for studying species
distributions such as BioClim (Busby 1986), Domain (Carpenter et al. 1993), linear,
multivariate and logistic regressions (Mladenoff et al. 1995; Bian & West 1997; Kelly et
al. 2001; Felicisimo et al. 2002; Fonseca et al. 2002), generalized linear modeling and
generalized additive modeling (Frescino et al. 2001; Guisan et al. 2002a), discriminant
analysis (Livingston et al. 1990; Fielding & Haworth 1995; Manel et al. 1999a),
classification and regression tree analysis (De'ath & Fabricius 2000; Fabricius & De'ath
2001; Kelly 2002), genetic algorithms (Stockwell & Peters 1999), artificial neural
networks (Manel et al. 1999a; Spitz & Lek 1999; Moisen & Frescino 2002), and support
vector machines (Guo et al. 2005).
1.2 Functions of the toolbox

The objective of the toolbox is to provide an integrated environment for running the
common models mentioned in the previous section. Functions in the toolbox
include data visualization, feature selection, accuracy assessment, and a
range of niche models. Unlike other species modeling packages, which often implement
one or two species modeling techniques, we developed a range of niche models in the
toolbox (i.e. BioClim, Domain, generalized linear model, artificial neural network
(ANN), support vector machines (SVM), and rough set). We chose these models
because: 1) they are commonly used in the literature (e.g. BioClim, Domain, generalized
linear models); or 2) they are advanced machine learning algorithms, but are not available
in existing species modeling packages (e.g. SVM, ANN, rough set).
Fig. 1 Software architecture of ModEco

The functional structure of the ModEco architecture is shown in Fig. 1. It
includes the following features:
1. Data management: import/export GIS data layers, project management
2. Spatial data visualization: zoom in/zoom out, overlay display
3. Feature analysis: analyze the importance of the factors involved in a model
instance to help the user select appropriate environmental factors.
4. Model training and prediction: train different niche models and predict the
species distributions based on the model
5. Accuracy assessment and model evaluation
1.3 User interface

The tool runs on the Microsoft Windows platform, so the UI (user
interface) looks like that of most Windows applications (Fig. 2). The main UI components
include:
1. Data window
The data windows are designed to display the geographic data layers in a project.
The data may be environmental factor layers, species data point layers, or prediction
result maps.
2. Project management window
The project management window provides a convenient way for the users to
browse the elements in a project using a tree control.
3. Status bar
In ModEco, the status bar includes three panes: a progress bar to indicate the
progress of a time-consuming operation, a coordinate pane to show the geographical
coordinate of the current mouse position, and an attribute pane to show the attribute value
in a data layer associated with the current mouse position.
4. Output window
The output window displays the running information of a model, which is
especially useful when the run is time-consuming.
5. Menu bar and tool bar
They are similar to those of common Windows applications.
Fig. 2 User interface of ModEco
2 Project management
2.1 Project structure
In ModEco, the data are managed using an XML-based project file. Its file extension
name is SML. It includes the following six elements (Fig. 3):
1. Environmental factor group
A factor data group encapsulates a set of factor layers in raster format used in
predicting a species distribution. The raster layers may be temperature or precipitation
maps that influence the species distribution. However, nominal data (such as soil
type) are not supported in this version of the package.
More than one factor group can be managed in a project. Thus, the user may train
a model using one group (e.g. the historic data), but predict using another group (e.g. the
contemporary data).
The following metadata are necessary for a factor layer:
Name
Data unit
Key words
If the user wants to train a model using one group but predict based on another
group, their data unit and key words must be consistent. This tool also provides a
function to match the layers in different groups automatically.
When importing factor layers, the acceptable raster data may be a BIP file or an
ARCGIS ASCII file. For a BIP file, a header file (*.hdr) specifying the size and
geographic extent of the layer is needed.
2. Species data points
Species data points are managed using a simple text file (*.smp) similar to the
following:
-119.809924 37.485305 lobata
-119.804719 37.471919 lobata
-119.936354 37.407816 lobata
-119.951348 37.371039 lobata
-118.632031 36.529882 lobata
-116.927998 34.273594 N/A
-122.627998 40.673594 N/A
-121.677998 38.273594 N/A
-117.277998 34.523594 N/A
-115.977998 35.873594 N/A
where the first two columns specify the geographical coordinates of a species
point. If the third column reads N/A, it means no species is observed at this point;
otherwise, it is the name of a species. If a species layer includes more than one species, it
is a multi-species layer; otherwise, it is a single-species layer. A single-species layer
without N/A rows is a one-class layer; otherwise, it is a two-class layer. Different
models are suitable for different types of species point layers.
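The classification rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section, not ModEco's own parser. The function names `parse_smp` and `layer_type` are hypothetical, and the sketch assumes that columns are separated by whitespace.

```python
from typing import List, Optional, Tuple

# (x, y, species name), with None standing for an N/A (absence) row
Point = Tuple[float, float, Optional[str]]


def parse_smp(text: str) -> List[Point]:
    """Parse whitespace-separated rows: x, y, species name or N/A."""
    points = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or incomplete rows
        x, y, name = float(parts[0]), float(parts[1]), parts[2].strip()
        points.append((x, y, None if name == "N/A" else name))
    return points


def layer_type(points: List[Point]) -> str:
    """Classify a species layer using the rules given in the text."""
    species = {name for _, _, name in points if name is not None}
    has_absence = any(name is None for _, _, name in points)
    if len(species) > 1:
        return "multi-species"
    return "two-class" if has_absence else "one-class"


sample = """-119.809924 37.485305 lobata
-119.804719 37.471919 lobata
-116.927998 34.273594 N/A"""
print(layer_type(parse_smp(sample)))
```

A check of this kind could be used to decide which models accept a given layer, since one-class and two-class models need different inputs.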
3. Model instance
In a project, the model instance manages the parameters and the trained result of a
model. After being trained, a model instance can be invoked at any time.
4. Result data
Based on a factor group and a trained model, the result map representing the
spatial distribution of a species can be obtained. The result map is a nominal raster data
layer. Generally, in a result layer, the value 0 stands for absence of a species, while
value 255 stands for a null value.
5. Base Map
In ModEco, the base maps are vector layers in ARCGIS shape format that can be
loaded and overlaid on a factor layer, species point layer, or prediction result layer.
6. Preferences
Some global settings are needed to implement certain projects. For example, the
extent, which defines the study area of interest, is important for input environmental data
layers that are greater than the study area of interest. If not defined properly, the extent
could also affect the performance of several analyses. For example, histogram
comparisons between the observed species distribution and background distribution
highly depend on the definition of extent, which may change the results dramatically.
Other analyses, such as principal component analysis and scatter plots, are also related to the
extent. In addition, the extent could also significantly affect the pseudo absence data
generation for the niche models. Consequently, we developed the extent as a global
preference for ModEco.
Fig. 3 XML schema of a project in ModEco
2.2 Import environmental factor layers

This function can be executed by clicking menu [File->Add factor layer]. The
information of the factor layer to be added can be entered in a dialog box (Fig. 4). In
the dialog box:
Click the button labeled to select a raster data file. It can be a binary BIP
file or an ASCII file (extension name is grd) (3).
Click >>Remove or Add<< to remove or add a key word for this data
layer (6) (8).
Input data unit (7).
Specify to which existing group this layer is to be added. If we want to create
a new group, simply input a new group name (9).
Fig. 4 Dialog box for importing a factor layer

After the layer is selected, a dialog box opens to specify the data type of the
layer. It may be 1 byte (unsigned), 2 byte integer, 4 byte integer, single real, or
double real. If the user is not sure of the data type, the layer can be opened in ARCGIS
to check it.
A window showing the added factor layer (Fig.5) is opened after the file has been
imported successfully.
Fig. 5 Environmental factor layer display window
2.3 Import species data

In ModEco, there are three possible data sources for importing species point layers. They
are ARCGIS shape files, common text files, and BIP raster files.
2.3.1 Import species point layers from ARCGIS shape files [File->Import species data->From shape files]
Only a point shape file can be imported by this function. In the dialog box (Fig. 6),
enter the shape file name (4) and the field in the associated data table that specifies
the species names (7).
Fig. 6 Dialog box for importing shape files

2.3.2 Import species point layers from text files [File->Import species data->From text files]
A text file containing species data points may look like:
X Y Elevation Name
-119.809924 37.485305 1000 lobata
-119.804719 37.471919 1200 lobata
-119.936354 37.407816 2000 lobata
In such a file, the header line specifies the attribute names (X, Y, Elevation, and
Name). In order to import a text file, the user should specify (Fig. 7):
The text file name (4).
The coordinate and species fields. The program loads and parses the text file
and lists the candidate field names in the combo boxes (7)(9), from which the
user selects the correct attributes.
There are two approaches in which the species names are specified in the text
file: 1) the attribute is the name directly; and 2) the attribute value may be 1 or
0 that stands for presence or absence of a species. In the second case (11), the
user should input the species name in the dialog box (12).
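The two ways of specifying species names can be illustrated with a small Python sketch. It is a hypothetical reading of the import step, not ModEco's code. The function `import_text` and its parameters are invented for illustration, and mapping a 0 flag to an `N/A` row is an assumption based on the *.smp format described in Section 2.1.

```python
from typing import List, Optional, Tuple


def import_text(text: str, x_field: str, y_field: str, species_field: str,
                species_name: Optional[str] = None
                ) -> List[Tuple[float, float, str]]:
    """Return (x, y, label) rows, where label is a species name or "N/A".

    If species_name is given, species_field is read as a 1/0 presence flag
    (the second approach); otherwise it holds the species name directly.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    ix, iy, isp = (header.index(f) for f in (x_field, y_field, species_field))
    rows = []
    for line in lines[1:]:
        cells = line.split()
        if species_name is None:
            label = cells[isp]
        else:
            # Assumption: a flag of 0 becomes an N/A (absence) row.
            label = species_name if cells[isp] == "1" else "N/A"
        rows.append((float(cells[ix]), float(cells[iy]), label))
    return rows


named = """X Y Elevation Name
-119.809924 37.485305 1000 lobata
-119.804719 37.471919 1200 lobata"""
flagged = """X Y Present
1.0 2.0 1
3.0 4.0 0"""
print(import_text(named, "X", "Y", "Name"))
print(import_text(flagged, "X", "Y", "Present", species_name="lobata"))
```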
Fig. 7 Dialog box for importing text files

2.3.3 Import species point layers from raster files [File->Import species data->From BIP files]
If the species data points are sampled regularly, they can be managed using a common
raster file. Thus they can be imported via BIP raster file (Fig. 8). The following
information is required.
The BIP file name (4).
The species name (5) and the corresponding code in the raster file (6).
Fig. 8 Dialog box for importing raster files as species point layer
After a species layer has been imported, a window will be opened to show all the
points in this layer (Fig. 9).
Fig. 9 Species point layer display window
2.4 Save elements in a project

In ModEco, the following elements can each be saved to a separate file:
Table 1 Elements that can be saved to a file alone
Element                        Saving command                              File extension name
Imported species data points   [File->Saving species data], active when    smp
                               the associated data window is open.
Model instance                 [Right click the item->Save]                mod
Prediction result map          [File->Saving result map], active when      bip or grd (ASCII file)
                               the associated data window is open.
If the opened project has been changed, we can use the menu command
[FileSave project] to save it. However, if an element has not been saved previously, it
will be removed from the project.
When the user wants to close the application, a dialog box (Fig. 10) will appear to
prompt the user to save the unsaved components in the current project.
Fig. 10 Dialog box prompting user to save components in a project
3 Interactive display
For the convenience of analyzing the data, this tool provides some features on viewing
the associated maps.
3.1 Zoom in/Zoom out

These two functions can be accessed by clicking menu [View->Zoom in] or
[View->Zoom out], or by clicking the two corresponding buttons in the tool bar. The
operations of these two functions are similar to those of common GIS software packages.
3.2 Set symbols of map

3.2.1 Set color of environmental factor layers [View->Set color of factors]
In the present version of ModEco, the factor layers are displayed using the stretched
mode. One can change its symbol by selecting different palette files (*.pal) (Fig. 11).
There are some palette files enclosed in the software package.
Fig. 11 Set symbol of factor layers

3.2.2 Set symbol color of species [View->Set color of species]
In the project, different species are symbolized with different colors. The user may
modify the default color set by double-clicking the species name in the dialog box shown
in Fig. 12.
Fig. 12 Set symbol color of species
3.3 Overlay species data points [View->Overlay species data points...]
The species point layers can be overlaid on a factor layer or result map (Fig. 13). Thus,
we can view the relative distribution of species points and roughly estimate the prediction
accuracy. This function can also be accessed from the toolbar.
Fig. 13 Overlay a species point layer on prediction result map
3.4 Overlay result map [View->Overlay result map]

If we have obtained some prediction result maps, we can overlay one of them on the
factor maps (Fig. 14). This function can also be accessed from the toolbar.
Fig. 14 A factor layer overlaid by a prediction result map
3.5 Overlay Base map

This function is designed to overlay a common vector GIS layer (e.g. transportation map)
on the geographical layers in ModEco (Fig. 15). This function can be accessed from the
toolbar.
Fig. 15 A factor layer overlaid by a base map (counties of California state)
4 Factor analysis
Before we start to predict species distribution, it is important to examine the input
environmental and species data. In ModEco, we provide some basic functions that allow
users to visualize the relationship between observed species localities and environmental
features. Functions include the factor histogram, scatter plot, and factor importance
analysis.
4.1 Factor histogram [Analysis->Factor histogram]

The Factor histogram is designed for comparing the distributions of environmental
variables between the observed species localities and the whole study area.
This function is implemented in a dialog box (Fig. 16). In the left side of the
dialog box, we can determine:
The environmental factor layer on which the histogram is computed.
For comparison, the user may select a species point layer and/or a prediction
result map to compute the associated histograms.
In the right side of the dialog box, the user can change the display mode of the
histogram by setting the following two parameters:
The number of groups used to compute the histogram.
The drawing style of the histogram, which may be bar or line.
Fig. 16 Factor histograms (Factor: January precipitation, Species: Black oak)
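The comparison behind the factor histogram can be sketched as follows. This is a minimal illustration that assumes equal-width groups over the value range of the study area. The function name `relative_histogram` and the sample values are hypothetical and do not come from ModEco.

```python
from typing import List, Sequence


def relative_histogram(values: Sequence[float], lo: float, hi: float,
                       groups: int) -> List[float]:
    """Fraction of values in each of `groups` equal-width bins on [lo, hi]."""
    if hi <= lo or groups < 1:
        raise ValueError("need hi > lo and at least one group")
    counts = [0] * groups
    width = (hi - lo) / groups
    for v in values:
        if lo <= v <= hi:
            # The upper edge of the range belongs to the last bin.
            idx = min(int((v - lo) / width), groups - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Made-up January precipitation values (mm), for illustration only.
background = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # all cells in the extent
at_species = [60, 70, 80, 90]                           # cells with observations

lo, hi = min(background), max(background)
print(relative_histogram(background, lo, hi, 3))
print(relative_histogram(at_species, lo, hi, 3))
```

Because both histograms share the bins of the background layer, changing the extent changes `lo`, `hi`, and the background counts, which is why the extent preference in Section 2.1 matters for this analysis.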
4.2 Scatter plot [Analysis->Scatter plot]

The scatter plot is another graphical tool used to evaluate the ability of two selected
environmental factors in discriminating the species distribution.
In the Scatter plot dialog box (Fig. 17), the user can set:
The factor layers associated with the X and Y axis in the scatter plot. Note that
two axes may share the same layer, in which case a dotted line will be drawn.
The species point layer to be tested.
In the scatter plot, we use different colors to denote different features. If the
presence (denoted by blue dots) and absence data (denoted by green dots) can be easily
separated (in other words, fewer red dots), then these two environmental factors have the
potential to discern the presence data from the absence data, and consequently, could be
included into the niche models for improving the model performance.
Fig. 17 Distribution of black oak in feature space (annual precipitation vs. annual
temperature)
4.3 Factor importance analysis [Analysis->Factor importance analysis]

The factor importance analysis is used to examine the contributions of different
environmental factors to the overall classification accuracy based on a specific niche
model. We use Kappa values to evaluate the model performance, and the factor
importance analysis evaluates how the classification accuracy changes when the model is
built with only, or without, a specific environmental factor (Phillips 2006).
In order to run this operation for a specific model (e.g. SVM), the user should first
determine:
The parameter values associated with this model, e.g. Gamma and Nu for
SVM model. (Detailed information will be introduced in section 5).
The factors group and species point layer involved in the analysis.
The analysis result is shown in Fig. 18. In the dialog box, "with only" a factor
means that the classification accuracy is evaluated using that factor alone, and
"without" a factor means that the model performance is evaluated using all the
factors except the factor of interest. An environmental variable that is important
for the species distribution will therefore have a high Kappa value when the model
contains this factor only, and the Kappa value may decrease significantly when this
factor is excluded from the model. Less important factors, on the other hand, change
the Kappa value little whether they are included or excluded. For instance, as
shown in Fig. 18, pt_pm7 is the least important factor, while pt_pm1 may be the most
important one for a maximum likelihood model. Note that factor importance analysis
could be an algorithm-sensitive analysis. Different models may lead to different results.
Fig. 18 Result of factor importance analysis
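The with-only/without procedure can be sketched in Python. The manual does not define the Kappa statistic, so the sketch assumes the standard Cohen's Kappa for presence/absence labels. The names `cohen_kappa`, `factor_importance`, and the callback `kappa_for` (which would train a model on the given factors and return its Kappa) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def cohen_kappa(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Cohen's Kappa for two-class labels (1 = presence, 0 = absence)."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate case: a single class in both vectors
    return (p_observed - p_chance) / (1.0 - p_chance)


def factor_importance(factors: List[str],
                      kappa_for: Callable[[List[str]], float]
                      ) -> Dict[str, Tuple[float, float]]:
    """Map each factor to (Kappa with only it, Kappa without it)."""
    result = {}
    for f in factors:
        only = kappa_for([f])
        without = kappa_for([g for g in factors if g != f])
        result[f] = (only, without)
    return result


print(cohen_kappa([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

Under this reading, an important factor shows a high first value and a low second value, while an unimportant factor shows the opposite pattern.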
4.4 Factor importance analysis based on sub-groups [Analysis->Factor importance analysis (based on sub-groups)]
The above analysis can also be executed for a sub-group of layers instead of individual
layers. In this operation, the user is required to define sub-groups based on an existing
environmental group.
In the dialog box for defining sub-groups of factors (Fig. 19), the user can:
Create a new sub-group by clicking Add sub-group;
Select a sub-group to query its included layers;
Press >> or double click a layer to add it into a subgroup;
Press << to remove a layer from the subgroup.
The analysis result of this function is similar to that based on individual layers
(Fig. 18).
Fig. 19 Dialog box for defining sub-groups of factors
4.5 Principal component analysis [Analysis->Principal component analysis]
Strong multicollinearity may exist among the environmental factors. Principal
component analysis (PCA) is designed to transform the original variables into
uncorrelated components. In ModEco, running PCA includes the following steps:
1. Specify the environmental factor layers to run PCA (Fig. 20).
Fig. 20 Specify the environmental factor layers in PCA

2. After the eigenvectors and eigenvalues have been computed, a dialog box
appears to prompt the user to select result components and input the group name and file
name to save the result data (Fig. 21).
Fig. 21 Dialog box for saving PCA result maps

3. The result layers of PCA will be added to the project as a new data group. Fig.
22 shows one of these layers.
Fig. 22 Data window of PCA result
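The eigenvalue computation mentioned in step 2 can be illustrated with a small pure-Python sketch. It uses power iteration on the covariance matrix to find only the first principal component. This is a stand-in chosen for brevity, and the manual does not say which eigen-solver ModEco uses. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def covariance_matrix(rows: List[List[float]]) -> List[List[float]]:
    """Sample covariance of the columns; each row is one raster cell."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_component(cov: List[List[float]],
                      iters: int = 200) -> Tuple[float, List[float]]:
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    p = len(cov)
    # Start from a non-uniform vector. Power iteration cannot converge if the
    # start is exactly orthogonal to the leading eigenvector; this choice makes
    # that unlikely but does not rule it out.
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return eigenvalue, v


# Two perfectly correlated factors collapse onto a single component.
cells = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
value, vector = leading_component(covariance_matrix(cells))
print(value, vector)
```

A full PCA would also return the remaining components, for example through a library eigen-decomposition routine.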
5 Model training and prediction

In the current version of ModEco, eleven models are implemented. They are:
Support vector machine (SVM)
BioClim
Domain
Generalized linear model (GLM)
Maximum likelihood classification
Artificial neural network (ANN) trained using back-propagation algorithm
Artificial neural network (ANN) trained using particle swarm optimization
(PSO) algorithm
Rough set
Classification and Regression Tree (CART)
Maximum Entropy (MaxEnt)
Ensemble model
Different models are suitable for different training data. Some models
(one-class SVM, BioClim, and Domain) are one-class models, so the species
point layer involved can contain only one species. Two-class species layers are
acceptable for these models; however, the absence points are filtered out before
training. The other models support two or more classes, so one-class species layers
are not acceptable for them. In order to employ such models, the pseudo absence
approach (Section 5.4) can be adopted to make the species point layer two-class.
5.1 Models introduction

Support Vector Machines (SVMs)
SVMs, originally developed by Vapnik (Vapnik 1995), are considered to be a new
generation of learning algorithms. SVMs have several appealing characteristics for
modelers, including: they are statistically based models rather than loose analogies with
natural learning systems, and they theoretically guarantee performance (Cristianini &
Scholkopf 2002). Typically, SVMs are designed for two-class problems where both
positive and negative objects exist. For these classification problems, two-class SVMs
seek to find a hyperplane in the feature space that maximally separates the two target
classes. Scholkopf et al. (Scholkopf et al. 1999) developed one-class SVMs to deal with
the one-class problem. The applications of one-class SVMs include document
classification (distinguishing one specific category from other categories) (Manevitz &
Yousef 2002), texture segmentation (distinguishing one specific texture from other
textures) (Tax & Duin 2002), and image retrieval (retrieving a subset of images based on
the similarity between given query images) (Lai et al. 2002). Recently, Guo et al. (2005)
applied one-class SVM in modeling a newly found tree disease in California and found
that one-class SVM is a promising addition to environmental niche modeling approaches.
BioClim
The BioClim model identifies locations where all environmental factors fall
within the certain percentiles (e.g. 95%) of the observation records (Busby 1986). Thus,
BioClim defines the environmental envelope for the target species as a hyperbox. The
idea of BioClim is analogous to the parallelepiped or boxcar image classification
methods (Carpenter et al. 1993). Despite the coarse description of environmental
envelopes, the BioClim model has been commonly used in niche modeling because the
concept of the BioClim model is very straightforward and often provides reasonably good
results (Rissler et al. 2006). Another advantage of the BioClim model is that it only
requires one free parameter (i.e. percentile), and can be easily implemented in a
geographic information system. Thus, the BioClim model is often used to provide a base
result for comparisons among other advanced niche models (Elith et al. 2006).
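The hyperbox idea can be sketched in Python as below. The sketch assumes that the percentile parameter selects the central portion of the presence values for each factor (for example, 95% leaves 2.5% in each tail) and uses linear interpolation between sorted values. Both choices are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + (sorted_vals[upper] - sorted_vals[lower]) * frac


def fit_envelope(presence: List[Sequence[float]],
                 pct: float = 95.0) -> List[Tuple[float, float]]:
    """Per-factor (low, high) bounds enclosing the central `pct` percent."""
    tail = (100.0 - pct) / 2.0
    bounds = []
    for j in range(len(presence[0])):
        column = sorted(row[j] for row in presence)
        bounds.append((percentile(column, tail),
                       percentile(column, 100.0 - tail)))
    return bounds


def inside(bounds: List[Tuple[float, float]], cell: Sequence[float]) -> bool:
    """A cell is predicted present only if every factor is within bounds."""
    return all(lo <= v <= hi for (lo, hi), v in zip(bounds, cell))


# Made-up presence records: (temperature, precipitation) per observation.
presence = [[10.0, 1.0], [20.0, 2.0], [30.0, 3.0]]
box = fit_envelope(presence, pct=100.0)
print(box, inside(box, [20.0, 2.0]), inside(box, [20.0, 5.0]))
```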
Domain
The Domain model is considered an improvement over BioClim model
(Carpenter et al. 1993). The DOMAIN procedure assigns a classification value to an
unknown site based on the distance of its closest similar site in environmental space. The
similarity metric is the only free parameter needed in the DOMAIN model. Essentially,
the DOMAIN model is analogous to nearest neighbor classification which is commonly
used in spatial interpolation or image classification. In a recent method comparison
(Rissler et al. 2006), the DOMAIN model has been demonstrated to be a very competitive
model based on its performance and relatively easy implementation.
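A nearest-neighbor score of this kind can be sketched as follows. The sketch uses a range-standardized Gower similarity, which is the metric usually associated with the Domain procedure. Whether ModEco computes it in exactly this way is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def factor_ranges(presence: List[Sequence[float]]) -> List[float]:
    """Range (max - min) of each factor over the presence points."""
    columns = list(zip(*presence))
    # A zero range would cause a division by zero later; fall back to 1.0.
    return [(max(c) - min(c)) or 1.0 for c in columns]


def gower_similarity(a: Sequence[float], b: Sequence[float],
                     ranges: Sequence[float]) -> float:
    """1 minus the mean range-standardized absolute difference."""
    dist = sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(ranges)
    return 1.0 - dist


def domain_score(cell: Sequence[float], presence: List[Sequence[float]],
                 ranges: Sequence[float]) -> float:
    """Similarity between a cell and its most similar presence point."""
    return max(gower_similarity(cell, p, ranges) for p in presence)


presence = [[0.0, 0.0], [10.0, 10.0]]
ranges = factor_ranges(presence)
print(domain_score([5.0, 5.0], presence, ranges))
```

A cell would then be classified as suitable when its score exceeds a user-chosen similarity threshold.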
Generalized Linear Model (GLM)
GLM is a generalization of the general linear model that relaxes the assumptions
about the error distribution and constant variance that are commonly required by
traditional linear models such as linear regression. The GLM is commonly used to model
dependent variables that follow discrete distributions and are nonlinearly related to
independent variables through a link function (Guisan et al. 2002). Consequently, the GLM model is
particularly suitable for predicting species distributions, and has been proven to be
successful in various ecological applications (Guisan et al. 2002; Latimer et al. 2006).
Three link functions are currently implemented in ModEco: logit link, log-log link, and
complementary log-log link.
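For reference, the three link functions and their inverses can be written in Python as below. The forms shown are the common textbook definitions. The sign convention of the log-log link varies between texts, so the version used here is an assumption rather than a statement about ModEco's implementation.

```python
import math


def logit(mu: float) -> float:
    return math.log(mu / (1.0 - mu))


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def loglog(mu: float) -> float:
    # Assumed convention: eta = -ln(-ln(mu))
    return -math.log(-math.log(mu))


def inv_loglog(eta: float) -> float:
    return math.exp(-math.exp(-eta))


def cloglog(mu: float) -> float:
    return math.log(-math.log(1.0 - mu))


def inv_cloglog(eta: float) -> float:
    return 1.0 - math.exp(-math.exp(eta))


# Each inverse link maps a linear predictor to a presence probability in (0, 1).
for name, inverse in (("logit", inv_logit), ("log-log", inv_loglog),
                      ("complementary log-log", inv_cloglog)):
    print(name, [round(inverse(eta), 3) for eta in (-2.0, 0.0, 2.0)])
```

The logit link is symmetric around a probability of 0.5, while the two log-log variants are asymmetric and approach 0 and 1 at different rates.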
Maximum Likelihood Classification (MLC)
MLC is one of the most popular classification methods in remote sensing
(Richards & Jia 1999). The idea of the MLC is to label an unknown location to the class
(either presence or absence) of the maximum likelihood. The likelihood is defined as the
posterior probability of the unknown location belonging to either presence or absence.
The MLC method relies heavily on a normal distribution of each environmental factor,
and it takes into consideration the variance and covariance of environmental factors of
presence and absence data by using a covariance matrix. The MLC method is considered
to be one of the most accurate classifiers if the data meet the assumptions (Richards & Jia
1999; Duda et al. 2001).
Artificial Neural Networks (ANNs)
ANNs, which were originally inspired by the central nervous system, have been
commonly used to model complex relationships between dependent variables and
independent variables or to mine patterns in data. The idea of ANNs is to extract
linear combinations of the input variable as derived features, and model the output as a
nonlinear function of these derived features (Hastie et al. 2001). ANNs have been
successfully used to predict species distribution (Manel et al. 1999a; Maravelias et al.
2003). In ModEco, a four-layer feed-forward ANN (one input layer, one output layer, and
two hidden layers) is implemented. It can be trained using either the backpropagation
algorithm (Werbos 1994) or the Particle Swarm Optimization (PSO) algorithm (Eberhart &
Kennedy 1995).
Rough Set
The idea of rough set was proposed by Pawlak (1991) as a new mathematical tool
to deal with vague concepts. Rough set based reduction is particularly useful for rule
generation and feature selection in data mining. At present, rough sets have been viewed
as a theoretical basis for some problems in machine learning. They have been widely used in
pattern recognition, classification, and other related areas. In ModEco, we use a
heuristic algorithm for rule reduction, since it is an NP-hard problem.
Classification and Regression Tree (CART)
CART seeks to recursively partition the response variable into increasingly pure
binary subsets with splits and stop criteria (Venables & Ripley 2002). A tree can overgrow
and fit the training data exactly. The method has several advantages: 1) it can handle any
combination of categorical data and continuous data in the classification and regression.
For example, in this study, we can use the aspect directly in the classification tree; 2)
the results from CART are presented as a set of if-then logical splits that allows for
accurate prediction and classification of cases, consequently, it is easy to interpret the
CART results; and 3) it has the ability to capture hierarchical and nonlinear relationship
among predictor variables (De'ath & Fabricius 2000).
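To make the idea of "increasingly pure binary subsets" concrete, the following sketch searches for the best single split of one continuous factor. It assumes the Gini index as the purity measure, which the manual does not specify, and the elevation values are invented for illustration.

```python
def gini(labels):
    # Gini impurity of a list of 0/1 labels; 0.0 means a pure subset.
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) for the rule value <= threshold."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: elevation in feet and presence (1) / absence (0).
elevation = [120, 300, 450, 900, 1100, 1500]
present = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(elevation, present)
```

A full tree would apply `best_split` recursively to each resulting subset, over all factors, until a stopping criterion is met.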
Maximum Entropy (MaxEnt)
MaxEnt was first proposed by Jaynes (1957) and has been widely used in natural
language processing, text segmentation, part-of-speech tagging, prepositional phrase
attachment, and niche modeling. Entropy is a fundamental concept in information theory,
and it measures how much choice is involved in the selection of an event (Shannon,
1948). The principle of maximum entropy states that the distribution model that
satisfies any given constraints should be as uniform as possible. Such a model agrees
with everything that is known, but carefully avoids assuming anything that is not known.
MaxEnt has been shown to be a promising approach to modeling species distributions
(Phillips et al. 2004, Elith et al. 2006).
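The statement that the least committed distribution is the most uniform one can be checked numerically with Shannon entropy. The two four-cell distributions below are made-up examples, not ModEco output.

```python
import math


def shannon_entropy(probabilities):
    # Entropy in bits; cells with zero probability contribute nothing.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.70, 0.10, 0.10, 0.10]

entropy_uniform = shannon_entropy(uniform)
entropy_peaked = shannon_entropy(peaked)
```

With no constraints, the uniform distribution has the highest entropy of any distribution over the four cells. Constraints derived from the environmental factors at presence points are what pull a MaxEnt model away from uniformity.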
5.2 Train a model [Model->Train->The concrete model]

For the models in ModEco, the training procedures are similar. The only difference is that
different models need different parameters. Thus, we will take SVM as an example to
introduce this function.
As shown in Fig. 23, in order to train a model, the following information is
required:
• The environmental factor group and the involved layers. The user may select a
  subset of the layers by clicking the corresponding check boxes.
• The species point layer from which to extract the training data.
  Note that ModEco supports batch run mode for training a model. If the user
  checks Batch run, the species point layer does not need to be specified;
  the model is then run once for each species point layer in the project.
• If the box Predict after training is checked, the resulting species
  distribution map will be generated using the trained model and the selected
  factor layers.
Fig. 23 Dialog box for training SVM model

After the training and prediction are complete, two items are added to the opened
project:
Result map
Model instance
By clicking an item in the project, the user can query its properties. Fig. 24 and
Fig. 25 demonstrate the properties of a model instance and a prediction result map
respectively.
Fig. 24 Properties of a trained model instance
Fig. 25 Properties of a prediction result map

For SVM models, the associated parameters include:
• SVM types: May be C-SVC (support vector classification), nu-SVC, one-class SVM
  (distribution estimation), epsilon-SVR (support vector regression), and nu-SVR.
  If the species data point layer is one-class, only one-class SVM is suitable.
• Kernel function types: May be linear, polynomial, radial basis function
  (RBF), or sigmoid. The default option is RBF.
• Cost: It is required for training C-SVC, epsilon-SVR, and nu-SVR. The
  default value is 1.
• Gamma: It is a coefficient in the kernel function. The default value is 1/k,
  where k is the number of involved factors.
• Nu: It is required for nu-SVC, one-class SVM, and nu-SVR. The default
  value is 0.05.
• Degree: It is the degree in the kernel function. The default value is 3.
The user may press the Set SVM button (Fig. 16) to set them.
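The default values listed above can be collected in a small settings object. This is a sketch for illustration only: the class name, field names, and the `resolved_gamma` helper are hypothetical and are not part of ModEco.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SvmSettings:
    svm_type: str                  # e.g. "one-class", "C-SVC", "nu-SVC"
    kernel: str = "rbf"            # default kernel is RBF
    cost: float = 1.0              # used by C-SVC, epsilon-SVR, and nu-SVR
    gamma: Optional[float] = None  # None means "use the default 1/k"
    nu: float = 0.05               # used by nu-SVC, one-class SVM, and nu-SVR
    degree: int = 3                # degree in the kernel function

    def resolved_gamma(self, n_factors: int) -> float:
        # Default gamma is 1/k, where k is the number of factor layers.
        if self.gamma is not None:
            return self.gamma
        return 1.0 / n_factors


settings = SvmSettings(svm_type="one-class")
gamma_for_11_factors = settings.resolved_gamma(11)
```

With the 11 factor layers of the tutorial data set (Section 7.1), the default gamma would be 1/11, or roughly 0.09.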
For the other models, fewer parameters are required:
BioClim
Percentiles of the observed records. Permitted range: 50~100; default value:
90%.
DOMAIN
Similarity to the observed records. Permitted range: 0.8~1; default value:
0.95.
GLM
Link function types. May be LOGIT, LOG-LOG, or complementary LOG-LOG.
Threshold. Permitted range: 0~1; default value: 0.5.
MLC
No required parameter.
BP-ANN
Momentum. Default value: 0.3.
Learning rate. Default value: 0.1.
PSO-ANN
The number of particles. Default value: 10.
Rough set
The number of discretized grades. Default value: 10.
The option to choose the lower approximation or the upper approximation to train
the model.
Classification Tree
The number of trials. Default value: 10.
Window size. Default value: 20.
Pruning confidence level. Default value: 0.25.
Maximum Entropy
Validation Set. Default: 25%.
Empirical Threshold that is used to generate the binary output. Default: 0.05
omission rate.
Fig. 26 demonstrates the prediction result maps using the eight models for the
same species.
Fig. 26 Prediction result maps using the eight models for the same species
5.3 Predict using a trained model [Model->Predict]

After a model has been trained, it can be used to predict species distribution based on
another environmental factor group. The layers in different groups are matched
automatically according to their keywords. For example, if there is a layer with the
keywords January and Temperature in the training data group, then the layer with the same
keywords in the prediction group will be loaded to play the same role as the first layer.
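The keyword matching can be sketched as a lookup on keyword sets. The function name, the prediction layer names, and the case-insensitive comparison are assumptions made for this illustration; the manual does not describe how ModEco compares keywords internally.

```python
def match_layers(training_layers, prediction_layers):
    """Both arguments map a layer name to its set of keywords.

    Returns a dict mapping each training layer to the prediction layer
    that carries the same keywords.
    """
    lookup = {
        frozenset(k.lower() for k in keywords): name
        for name, keywords in prediction_layers.items()
    }
    matches = {}
    for name, keywords in training_layers.items():
        key = frozenset(k.lower() for k in keywords)
        if key not in lookup:
            raise KeyError(f"no prediction layer with keywords {sorted(key)}")
        matches[name] = lookup[key]
    return matches


training = {
    "ta_pm1": {"January", "Temperature"},
    "pt_pm1": {"January", "Precipitation"},
}
# Hypothetical layers of a second environmental factor group.
prediction = {
    "future_temp_jan": {"Temperature", "January"},
    "future_prec_jan": {"Precipitation", "January"},
}
matches = match_layers(training, prediction)
```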
As shown in Fig. 27, the user may simply specify a trained model instance and the
involved environmental data group to run the model and to predict the species
distribution.
Fig. 27 Dialog box for running a trained model
5.4 Run two-class model based on pseudo absence points approach
[Pseudo-absence based two class model...]
For a one-class species point layer, pseudo absence points should be created if we want to
employ a two-class model. ModEco provides two approaches to generate pseudo absence
points and train a model: loop mode and iterative mode. The inputs of these two approaches
are similar. They include:
• The environmental factor group and the layers involved in the model;
• Species data points. It should be a one-class layer, i.e., it only contains the
  presence points;
• Maximum loops and classification threshold. The model can be trained many
  times with different pseudo-absence points. In iterative mode, the pseudo-absence
  points are obtained based on the previously predicted result map. In loop mode,
  however, the absence points are simply computed based on the distribution of known
  presence points. If the number of loops is n, then after training and predicting n
  times, a pixel in the result raster map may be predicted to be presence m times and
  absence n-m times. Thus, we need to specify a threshold such that the pixel is
  classified as presence if m is larger than this threshold.
• Range definition layer. We can specify a layer within which the pseudo-absence
  points are generated.
• Model. In this version of ModEco, 2-Class SVM, GLM, and BP-ANN are
  supported in this function. For each model, the user should input different
  associated parameters (Fig. 28).
Fig. 28 Dialog box for running a two class model based on pseudo-absence points
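The role of the classification threshold can be sketched as a simple vote over the n runs. The function name is hypothetical, and the sketch assumes the threshold is expressed as a count of runs rather than as a fraction.

```python
def vote_presence(loop_predictions, threshold):
    """loop_predictions: n lists of 0/1 values, one list per run, one value per pixel.

    A pixel is classified as presence (1) if it was predicted as presence
    in more than `threshold` of the n runs.
    """
    n_pixels = len(loop_predictions[0])
    counts = [sum(run[i] for run in loop_predictions) for i in range(n_pixels)]
    return [1 if m > threshold else 0 for m in counts]


# Three runs (n = 3) over four pixels, each trained with different pseudo-absence points.
runs = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
final_map = vote_presence(runs, threshold=1)
```

In this example the four pixels are predicted as presence 3, 1, 2, and 0 times, so with a threshold of 1 only the first and third pixels are kept as presence.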
6 Accuracy assessment
6.1 Accuracy of prediction result [Analysis->Accuracy of prediction]
Once we have obtained the predicted distribution map, we can compute the accuracy of
the prediction by an overlay operation. To perform this function, the user needs to select
the result map and the relevant species point layer. The computed accuracy is shown in
Fig. 29. It includes:
The error matrix;
The Kappa value;
The true positive rate;
The area of the predicted species distribution.
If the species point layer is one-class, the error matrix and Kappa value cannot be
computed.
Fig. 29 Accuracy assessment for the prediction result
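For a two-class layer, the Kappa value and the true positive rate follow directly from the four cells of the error matrix. The sketch below uses the standard Cohen's Kappa formula; the function name and the counts are made up for illustration.

```python
def accuracy_measures(tp, fp, fn, tn):
    """tp/fp/fn/tn: counts of true/false positives and false/true negatives."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return {
        "overall_accuracy": observed,
        "kappa": kappa,
        "true_positive_rate": tp / (tp + fn),
    }


# Hypothetical error matrix: 50 presence points and 50 absence points.
result = accuracy_measures(tp=40, fp=10, fn=10, tn=40)
```

For these counts the overall accuracy is 0.8 and chance agreement is 0.5, giving a Kappa of about 0.6. With a one-class layer there are no absence points, so only the true positive rate remains defined.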
6.2 Cross-validation accuracy assessment [Model->Evaluate
model->The concrete model]
The idea of cross-validation accuracy assessment is implemented as follows: first, the
training data are randomly split into n subsets of equal size (n is specified by the user,
e.g. 5 or 10). Second, each subset is used in turn for accuracy testing and the remaining
n-1 subsets for training. Finally, the total accuracy is estimated by averaging the
accuracy of each test.
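The three steps can be sketched as follows. The function names are hypothetical, and `evaluate` stands for any user-supplied routine that trains a model on the training indices and returns its accuracy on the test indices.

```python
import random


def k_fold_splits(n_samples, n_folds, seed=0):
    # Step 1: shuffle the sample indices and deal them into n_folds subsets
    # (subset sizes differ by at most one when n_samples is not a multiple).
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    # Step 2: each subset is the test set once; the rest form the training set.
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validated_accuracy(n_samples, n_folds, evaluate):
    # Step 3: average the accuracy obtained on each test subset.
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, n_folds)]
    return sum(scores) / len(scores)


def dummy_evaluate(train, test):
    # Placeholder for "train on train, score on test": returns 1.0 when the
    # two index lists do not overlap, which should always be the case.
    return 1.0 if not set(train) & set(test) else 0.0


splits = k_fold_splits(n_samples=20, n_folds=10)
mean_score = cross_validated_accuracy(20, 10, dummy_evaluate)
```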
In order to run this operation, the user needs to input the number of folds for
cross-validation in addition to the model parameters. The default number of folds is 10
(Fig. 30).
Fig. 30 Dialog box for n-fold cross-validation

The evaluation results are listed in Fig. 31. This function supports batch running.
If a species point layer is two-class, the Kappa value is computed; otherwise, the true
positive rate is obtained.
Fig. 31 Result of n-fold cross-validation
6.3 ROC curve [Model->ROC curve->The concrete model]

The Receiver Operating Characteristic (ROC) curve is a plot of the sensitivity (true
positive rate) vs. 1-specificity (false positive rate, i.e., one minus the true negative
rate) obtained by varying the discrimination threshold. One advantage of the ROC curve is
that it does not depend on a specific threshold. Comparing different ROC curves often
requires calculating the area under the ROC curve (AUC).
In order to generate a ROC curve for a specific model, the relevant parameters
should vary in a range. For each parameter value, the model is trained and run to compute
a respective true positive rate and true negative rate. Thus the ROC and AUC can be
obtained.
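The construction of the curve and the AUC can be sketched as below. For simplicity, the sketch varies a threshold on a continuous suitability score instead of retraining the model for each parameter value as ModEco does, and the scores and labels are invented. Each point is (false positive rate, true positive rate), where the false positive rate is one minus the true negative rate.

```python
def roc_points(scores, labels):
    """Return one (false positive rate, true positive rate) point per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    # Trapezoidal rule over the points ordered by false positive rate.
    ordered = sorted(points)
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(ordered, ordered[1:])
    )


# Hypothetical suitability scores with presence (1) / absence (0) labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

An AUC of 0.5 corresponds to a random prediction and 1.0 to a perfect separation of presence and absence points.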
Fig. 32 demonstrates the ROC curve as well as the AUC. Note that this function
supports batch running. If the user chooses batch run mode, the AUC will be computed
for each species point layer in the project (Fig. 33). More detailed information is
written to the output window. The user may use the menu command [View->Output]
to open this window (Fig. 34).
Figure 32: ROC curve and AUC value for predicting Black Oaks in California based on
the BioClim model.
Fig. 33 AUC computed in batch run mode
Fig. 34 Output window for outputting detailed information

The current version of ModEco supports four models for the ROC function:
SVM, BioClim, Domain, and GLM. For different models, the parameters to be tuned are
different:
One-class SVM: Nu;
Two-class SVM: Cost or Gamma (determined by the user);
BioClim: Percentile;
Domain: Similarity threshold;
GLM: Classification Threshold.
6.4 True positive rate/ prediction area plot [Model->True positive rate/
prediction area plot]
For real presence-only data (one-class species point layers), which are very common in
ecological observation data, the above accuracy measures are not applicable. One
solution is to generate pseudo absence data and treat them as real absence data, so that
conventional accuracy assessment methods such as the Kappa value and cross-validation can
be applied. Alternatively, Engler et al. (2004) proposed that a good prediction model
with presence-only data should predict a potential area as small as possible while still
covering the maximum number of species occurrences. Guo et al. (2005) demonstrated the
concept for selecting parameters of one-class SVM. ModEco allows users to plot the true
positive rate vs. the prediction area to help them select appropriate parameters for the
model.
This feature is only available for one-class species point layers. Its
implementation is similar to that of the ROC curve. The difference between them is that
the prediction area is obtained instead of the true negative rate, which cannot be
computed for one-class data (Fig. 35).
Fig. 35 Dialog box for showing the true positive rate/ prediction area plot
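The trade-off between the true positive rate and the prediction area can be tabulated as in the sketch below. It varies a threshold on a per-pixel suitability score as a stand-in for the model parameter being tuned; the function name and all values are hypothetical.

```python
def tpr_area_curve(suitability, presence_pixels, thresholds):
    """suitability: per-pixel scores; presence_pixels: indices of known presences.

    Returns a list of (threshold, true positive rate, predicted area fraction).
    """
    rows = []
    for t in thresholds:
        predicted = [s >= t for s in suitability]
        area_fraction = sum(predicted) / len(predicted)
        tpr = sum(predicted[i] for i in presence_pixels) / len(presence_pixels)
        rows.append((t, tpr, area_fraction))
    return rows


# Eight pixels, three of which contain known presence points.
suitability = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1, 0.05]
presence_pixels = [0, 1, 3]
curve = tpr_area_curve(suitability, presence_pixels, [0.25, 0.5, 0.75])
```

In this example, raising the threshold from 0.25 to 0.5 shrinks the predicted area while all presence points remain covered, whereas a threshold of 0.75 starts to lose presence points. Following Engler et al. (2004), the middle setting would be preferred.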
6.5 Search maximum Kappa value [Model->Search maximum]

The Kappa value is often used in evaluating classification accuracy (Loiselle et al. 2003;
Elith et al. 2006). It takes into consideration the effect of random chance on an accuracy
assessment. The maximum Kappa is found by iteratively varying the threshold until
the model reaches its maximum Kappa value. Therefore, the maximum Kappa value can
be considered the best possible accuracy achieved by the model with a specific set of
parameters.
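The search can be sketched as a sweep over candidate thresholds that keeps the one with the highest Kappa. This simplified version scores each threshold on a single fixed data set, whereas ModEco uses cross-validation in this function; the function names and data are hypothetical.

```python
def kappa(predicted, observed):
    n = len(observed)
    tp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 1)
    tn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 0)
    fp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 0)
    fn = n - tp - tn - fp
    agreement = (tp + tn) / n
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if chance == 1.0:
        return 0.0  # all points in one class: Kappa is undefined, report 0
    return (agreement - chance) / (1.0 - chance)


def search_max_kappa(scores, observed, candidates):
    best_threshold, best_kappa = None, float("-inf")
    for t in candidates:
        predicted = [1 if s >= t else 0 for s in scores]
        k = kappa(predicted, observed)
        if k > best_kappa:
            best_threshold, best_kappa = t, k
    return best_threshold, best_kappa


# Hypothetical model scores with presence (1) / absence (0) observations.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
observed = [1, 1, 0, 1, 0, 0]
best_threshold, best_kappa = search_max_kappa(scores, observed, [0.2, 0.35, 0.5])
```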
In ModEco, this function has the following characteristics:
It is only suitable for two-class species layers, since Kappa values cannot be
computed based on one-class layers.
It supports batch running mode. Thus, we can find the optimized parameters
for different species layers.
Four models are involved in this function. They are SVM, BioClim, Domain,
and GLM. Each of them requires different parameters to be tuned:
-- One-class SVM: Nu and Gamma;
-- Two-class SVM: Gamma and Cost;
-- BioClim: Percentiles;
-- Domain: Similarity;
-- GLM: Threshold.
Fig. 36 shows the search result for four species. The parameter values found can be
used to train the model and predict the species distribution. Note that the maximum
Kappa value found by the search may differ from the value obtained when assessing the
accuracy of the result map (Section 6.1), since cross-validation is used in this function.
Fig. 36 Result of searching maximum Kappa values for BioClim model
7 Tutorial
In this section, we present a demonstration of how to use ModEco. After installing
the software, you can open a project called Species.sml under the directory where
ModEco is installed (e.g. c:\program files\ModEco\Data\Species.sml). The project file is
saved in XML format; therefore, users can open the file in a text editor. For example,
Notepad++ is able to read and edit the XML file (available at:
http://notepadplus.sourceforge.net/uk/site.htm). The Species.sml file contains the
following data set for California.
7.1 Data description

The example data include one data group containing 11 environmental factor layers:
DEM (dem300_res)
Annual Precipitation (pt_pa)
Precipitation of January (pt_pm1)
Precipitation of April (pt_pm4)
Precipitation of July (pt_pm7)
Precipitation of October (pt_pm10)
Annual Temperature (ta_pa)
Temperature of January (ta_pm1)
Temperature of April (ta_pm4)
Temperature of July (ta_pm7)
Temperature of October (ta_pm10)
and 11 species data point layers:
Blue oak
Black oak
Coast oak
Calibay
Fremont CW
Interior oak
Canyon oak
Valley oak
Tan oak
Madrone
Oregon oak.
The species data are originally presence-only; pseudo-absence data are generated using
the function "Create Pseudo-absence points". For one-class methods (such as BioClim,
Domain, and one-class SVM) and for background vs. presence-only data models (such as
MaxEnt, GLM, and SVM), only presence data are used in the analysis, and pseudo-absence
points are ignored. Note that for background vs. presence-only data models, ModEco will
generate the background points for the model.
7.2 Create a project to manage data

In addition to opening the existing project file, users can also create their own project
file to manage the data. After the ModEco program has been started, an empty new project
is created, and data can be added to it. Use the command [File->Add
environmental layer] to add a factor layer. Note that the data units are feet (DEM layer),
mm (precipitation layers), and degrees (temperature layers). The associated keywords may
be entered for each layer.
Click the menu command [File->Add species data] to add a species data point
layer. Repeat this operation to add all 11 layers. Then you may save the project.
7.3 Estimate factor importance

In this tutorial, we select the BioClim model and the species Calibay to run the analyses.
Choose the menu command [Analysis->Factor importance analysis->BioCLIM]. In the
dialog box, set the species point layer to Calibay (the layer name is Calibar_R) and the
percentile to 95%. Run the model; the analysis result is shown in Fig. 37.
Fig. 37 Factor importance analysis result for Calibay based on BioCLIM model
According to the result, three factors (DEM, pt_pm7, and ta_pm1) are less
important. We can thus use the other eight factors to predict the species distribution. Note
that the analysis result is specific to the BioClim model with a percentile of 95% and the
species Calibay. For other models and species, the result may be different.
7.4 Search maximum Kappa value

In the above analysis, the parameter value (95%) was chosen arbitrarily. We can search
for an optimized value using the command [Model->Search maximum Kappa
value->BioCLIM]. With the selected eight factors, we achieve the maximum Kappa
value of 0.66 when the percentile is 94.8%. Note that the result values may change slightly
if we run this function more than once, since it is based on random cross-validation.
7.5 Train and run the model

Using the eight factor layers and the searched parameter value, we may train and run the
model. Fig. 38 shows the input dialog box for this function, and the prediction result map
overlaid by the species points is shown in Fig. 39.
Fig. 38 Dialog box for inputting information of BioClim model
Fig. 39 Prediction result map overlaid by species data points
7.6 Assess the accuracy of the prediction result

Based on the prediction result map, we can assess the accuracy using the command
[Analysis->Accuracy of prediction]. The result is shown in Fig. 40. We can see the
Kappa index is 0.6608, which is a rather high value.
Fig. 40 Accuracy of prediction result
7.7 Save the prediction result map

We may save the prediction result to a BIP file or an ASCII grid file using the command
[File->Save result map]. These files can be loaded in common GIS software products,
such as ArcGIS, for further analysis.
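For readers who want to produce or inspect such files outside ModEco, the sketch below formats a small raster in the widely used Esri ASCII grid layout (six header lines followed by the rows from north to south). Whether ModEco writes exactly this header is an assumption, and the corner coordinates and cell size are invented.

```python
def format_ascii_grid(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """grid: list of rows (northernmost row first), each a list of cell values."""
    header = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    rows = [" ".join(str(value) for value in row) for row in grid]
    return "\n".join(header + rows) + "\n"


# Hypothetical 2 x 3 presence (1) / absence (0) map with one missing cell.
prediction = [
    [1, 1, 0],
    [0, 1, -9999],
]
text = format_ascii_grid(prediction, xllcorner=-124.5, yllcorner=32.5, cellsize=0.01)
```

The returned string can be written to a file with a `.asc` extension, for example with `open("result.asc", "w").write(text)`.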
8 Acknowledgement
This research is partially supported by the National Science Foundation (BDI-0742986)
and the BioGeomancer project from the Gordon and Betty Moore Foundation. The
development of ModEco also benefited from discussion and comments from John
Wieczorek and Craig Moritz at the Museum of Vertebrate Zoology at UC Berkeley. We
also thank Otto Alvarez, Hong Yu, Miguel Fernandez, and Andrew Zumkehr for help with
software debugging and website maintenance.
9 References
Anderson, R.P. and Martinez-Meyer, E., 2004, "Modeling species' geographic
distributions for preliminary conservation assessments: An implementation with
the spiny pocket mice (heteromys) of ecuador", Biological Conservation, 116:
167-179.
Bian, L. and West, E., 1997, "Gis modeling of elk calving habitat in a prairie
environment with statistics", Photogrammetric Engineering & Remote Sensing,
63: 161-167.
Broenniman, O., Thuiller, W., Hughes, G., Midgley, G.F., Alkemade, J.M.R. and Guisan,
A., 2006, "Do geographic distribution, niche property and life form explain
plants' vulnerability to global change?" Global Change Biology, 12: 1079-1093.
Busby, J.R., 1986, "A biogeoclimatic analysis of nothofagus cunninghamii (hook.) oerst.
In southeastern australia", Australian Journal of Ecology, 11: 1-7.
Carpenter, G., Gillison, A.N. and Winter, J., 1993, "Domain - a flexible modeling
procedure for mapping potential distributions of plants and animals", Biodiversity
and Conservation, 2: 667-680.
Chefaoui, R.M., Hortal, J. and Lobo, J.M., 2005, "Potential distribution modelling, niche
characterization and conservation status assessment using gis tools: A case study
of iberian copris species", Biological Conservation, 122: 327-338.
Cristianini, N. and Scholkopf, B., 2002, "Support vector machines and kernel methods -
the new generation of learning machines", AI Magazine, 23: 31-41.
De'ath, G. and Fabricius, K., 2000, "Classification and regression trees: A powerful yet
simple technique for ecological data analysis", Ecology, 81: 3178-3192.
Duda, R.O., Hart, P.E. and Stork, D.G., 2001, Pattern classification, New York: John
Wiley & Sons.
Eberhart, R.C. and Kennedy, J., 1995, A new optimizer using particle swarm theory.
Proceedings of the Sixth International Symposium on Micromachine and Human
Science, Japan: Nagoya, 39-43.
Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J.,
Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle,
B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M.,
Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire,
R.E., Soberon, J., Williams, S., Wisz, M.S. and Zimmermann, N.E., 2006,
"Novel methods improve prediction of species' distributions from occurrence
data", Ecography, 29: 129-151.
Engler, R., Guisan, A. and Rechsteiner, L., 2004, "An improved approach for predicting
the distribution of rare and endangered species from occurrence and
pseudo-absence data", Journal of Applied Ecology, 41: 263-274.
Fabricius, K. and De'ath, G., 2001, "Environmental factors associated with the spatial
distribution of crustose coralline algae on the great barrier reef", Coral Reefs, 19:
303-309.
Felicisimo, A.M., Frances, E., Fernandez, J.M., Gondalez-Diez, A. and Varas, J., 2002,
"Modeling the potential distribution of forests with a gis", Photogrammetric
Engineering & Remote Sensing, 68: 455-462.
Feria, T.P. and Peterson, A.T., 2002, "Prediction of bird community composition based
on point-occurrence data and inferential algorithms: A valuable tool in
biodiversity assessments", Diversity and Distributions, 8: 49-56.
Fielding, A.H. and Haworth, P.F., 1995, "Testing the generality of bird-habitat models",
Conservation Biology, 9: 1466-1481.
Fonseca, M.S., Whitfield, P.E., Kelly, N.M. and Bell, S.S., 2002, "Statistical modeling of
seagrass landscape pattern and associated ecological attributes in relation to
hydrodynamic gradients", Ecological Applications, 12: 218-237.
Frescino, T.S., Edwards, T.C. and Moisen, G.G., 2001, "Modeling spatially explicit
forest structural attributes using generalized additive models", Journal of
Vegetation Science, 12: 15-26.
Funk, V.A. and Richardson, K.S., 2002, "Systematic data in biodiversity studies: Use it
or lose it", Systematic Biology, 51: 303-316.
Graham, C.H., Ferrier, S., Huettman, F., Moritz, C. and Peterson, A.T., 2004, "New
developments in museum-based informatics and applications in biodiversity
analysis", Trends in Ecology & Evolution, 19: 497-503.
Graham, C.H., Moritz, C. and Williams, S.E., 2006, "Habitat history improves prediction
of biodiversity in rainforest fauna", Proceedings of the National Academy of
Sciences of the United States of America, 103: 632-636.
Guisan, A., Edwards, T.C. and Hastie, T., 2002, "Generalized linear and generalized
additive models in studies of species distributions: Setting the scene", Ecological
Modelling, 157: 89-100.
Guo, Q.H., Kelly, M. and Graham, C.H., 2005, "Support vector machines for predicting
distribution of sudden oak death in california", Ecological Modelling, 182: 75-90.
Hastie, T., Tibshirani, R. and Friedman, J., 2001, The elements of statistical learning:
Data mining, inference and prediction., New York: Springer.
Iguchi, K., Matsuura, K., McNyset, K.M., Peterson, A.T., Scachetti-Pereira, R., Powers,
K.A., Vieglais, D.A., Wiley, E.O. and Yodo, T., 2004, "Predicting invasions of
north american basses in japan using native range data and a genetic algorithm",
Transactions of the American Fisheries Society, 133: 845-854.
Jaynes, E.T., 1957, "Information theory and statistical mechanics", Physical Review,
106: 620-630.
Kelly, N.M., 2002, Monitoring sudden oak death in california using high-resolution
imagery. In: pp. 799-810. USDA-Forest Service.
Kelly, N.M., Fonseca, M. and Whitfield, P., 2001, "Predictive mapping for management
and conservation of seagrass beds in north carolina", Aquatic Conservation:
Marine and Freshwater Ecosystems, 11: 437-451.
Kueppers, L.M., Snyder, M.A., Sloan, L.C., Zavaleta, E.S. and Fulfrost, B., 2005,
"Modeled regional climate change and california endemic oak ranges", Proceedings
of the National Academy of Sciences of the United States of America, 102:
16281-16286.
Lai, C., Tax, D., Duin, R., Pekalska, E. and Paclik, P., 2002, On combining one-class
classifiers for image database retrieval. In: Roli, F. & Kittler, J. (eds.) Multiple
classifier systems, pp. 212-221. Springer-Verlag, Berlin.
Latimer, A.M., Wu, S.S., Gelfand, A.E. and Silander, J.A., 2006, "Building statistical
models to analyze species distributions", Ecological Applications, 16: 33-50.
Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J. and Aulagnier, S., 1996,
"Application of neural networks to modelling nonlinear relationships in ecology",
Ecological Modelling, 1 -13.
Livingston, S.A., Todd, C.S., Krohn, W.B. and Owen, R.B., 1990, "Habitat models for
nesting bald eagles in maine", Journal of Wildlife Management, 54: 644-665.
Loiselle, B.A., Howell, C.A., Graham, C.H., Goerck, J.M., Brooks, T., Smith, K.G. and
Williams, P.H., 2003, "Avoiding pitfalls of using species distribution models in
conservation planning", Conservation Biology, 17: 1591-1600.
Manel, S., Dias, J.M., Buckton, S.T. and Ormerod, S.J., 1999a, "Alternative methods for
predicting species distribution: An illustration with himalayan river birds",
Journal of Applied Ecology, 36: 734-747.
Manel, S., Dias, J.M. and Ormerod, S.J., 1999b, "Comparing discriminant analysis,
neural networks and logistic regression for predicting species distributions: A case
study with a himalayan river bird", Ecological Modelling, 120: 337-347.
Manevitz, L.M. and Yousef, M., 2002, "One-class svms for document classification",
Journal of Machine Learning Research, 2: 139-154.
Maravelias, C.D., Haralabous, J. and Papaconstantinou, C., 2003, "Predicting demersal
fish species distributions in the mediterranean sea using artificial neural
networks", Marine Ecology-Progress Series, 255: 249-258.
Mastrorillo, S., Lek, S., Dauba, F. and Belaud, A., 1997, "The use of artificial neural
networks to predict the presence of small-bodied fish in a river", Freshwater
Biology, 237-246.
Mladenoff, D.J., Sickley, T.A., Haight, R.G. and Wydeven, A.P., 1995, "A regional
landscape analysis and prediction of favorable grey wolf habitat in the northern
great lakes region", Conservation Biology, 9: 279-294.
Moisen, G.G. and Frescino, T.S., 2002, "Comparing five modelling techniques for
predicting forest characteristics", Ecological Modelling, 157: 209-225.
Pawlak, Z., 1991, Rough Sets: Theoretical Aspects of Reasoning About Data. Dordrecht:
Kluwer Academic Publishing.
Pearce, J.L. and Boyce, M.S., 2006, "Modelling distribution and abundance with
presence-only data", Journal of Applied Ecology, 43: 405-412.
Peterson, A.T., Ball, L.G. and Cohoon, K.P., 2002, "Predicting distributions of mexican
birds using ecological niche modelling methods", Ibis, 144: E27-E32.
Phillips, S., 2006, Maxent software for species habitat modeling. In:
http://www.cs.princeton.edu/~schapire/maxent/.
Richards, J.A. and Jia, X., 1999, Remote sensing digital image analysis: An introduction,
New York: Springer.
Rissler, L.J., Hijmans, R.J., Graham, C.H., Moritz, C. and Wake, D.B., 2006,
"Phylogeographic lineages and species comparisons in conservation analyses: A
case study of california herpetofauna", American Naturalist, 167: 655-666.
Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C., 1999,
"Estimation the support of a high-dimensional distribution", Technical Report
MSR-TR-99-87, Microsoft Research.
Spitz, F. and Lek, S., 1999, "Environmental impact prediction using neural network
modelling. An example in wildlife damage", Journal of Applied Ecology, 36:
317-326.
Stockman, A.K., Beamer, D.A. and Bond, J.E., 2006, "An evaluation of a garp model as
an approach to predicting the spatial distribution of non-vagile invertebrate
species", Diversity and Distributions, 12: 81-89.
Stockwell, D. and Peters, D., 1999, "The garp modelling system: Problems and solutions
to automated spatial prediction", International Journal of Geographical
Information Science, 13: 143-158.
Stockwell, D.R.B. and Peterson, A.T., 2002, "Effects of sample size on accuracy of
species distribution models", Ecological Modelling, 148: 1-13.
Tax, D.M.J. and Duin, R.P.W., 2002, "Uniform object generation for optimizing
one-class classifiers", Journal of Machine Learning Research, 2: 155-173.
Thuiller, W., 2004, "Patterns and uncertainties of species' range shifts under climate
change", Global Change Biology, 10: 2020-2027.
Thuiller, W., Richardson, D.M., Pysek, P., Midgley, G.F., Hughes, G.O. and Rouget, M.,
2005, "Niche-based modelling as a tool for predicting the risk of alien plant
invasions at a global scale", Global Change Biology, 11: 2234-2250.
Vapnik, V., 1995, The nature of statistical learning theory, New York: Springer-Verlag.
Venables, W.N. and Ripley, B.D., 2002, Modern applied statistics with S, New York:
Springer.
Werbos, P.J., 1994, The Roots of Backpropagation, Wiley.
Wieczorek, J., Guo, Q.G. and Hijmans, R.J., 2004, "The point-radius method for
georeferencing locality descriptions and calculating associated uncertainty",
International Journal of Geographical Information Science, 18: 745-767.
Wieland, R., Voss, M., Holtmann, X., Mirschel, W. and Ajibefun, I., 2006, "Spatial
analysis and modeling tool (samt): 1. Structure and possibilities", Ecological
Informatics, 1: 67-76.
