
UPTEC IT05015

Degree Project 20 p
February 2005
Eikos
A Simulation Toolbox for Sensitivity Analysis
Per-Anders Ekström
Faculty of Science and Technology
The UTH unit
Visiting address:
Ångströmlaboratoriet
Lägerhyddsvägen 1
House 4, Level 0
Postal address:
Box 536
SE-751 21 Uppsala
SWEDEN
Telephone: +46 (0)18 471 30 03
Telefax: +46 (0)18 471 30 00
Homepage: http://www.teknat.uu.se/student
Abstract
Eikos
A Simulation Toolbox for Sensitivity Analysis
Per-Anders Ekström
Computer-based models can be used to approximate real-life processes. These models are usually based on mathematical equations which depend on several variables. The predictive capability of models is therefore limited by the uncertainty in the values of these variables. Sensitivity analysis is used to apportion the relative importance each uncertain input factor has on the output variation. Sensitivity analysis is therefore an essential tool in simulation modeling and for performing risk assessments.
Simple sensitivity analysis techniques based on fitting the output to a linear equation are often used, for example correlation or linear regression coefficients. These methods work well for linear models, but for non-linear models their sensitivity estimates are not accurate. Models of complex natural systems are usually non-linear.
Various software products exist for sensitivity analysis; unfortunately these products cannot efficiently be linked to the Matlab environment. The algorithms used in these products are also hidden. Many model builders (in the radioecological field) develop their models in the Matlab/Simulink environment. A Matlab toolbox for sensitivity analysis is therefore of interest.
Within the scope of this work, various sensitivity analysis methods, which can cope with linear, monotone as well as non-monotone problems, have been implemented in Matlab in a generic way. A graphical user interface has also been developed, from which the user can easily load or call a model and perform a sensitivity analysis.
The sensitivity analysis methods have been benchmarked with well-known test functions and compared with other sensitivity analysis software, with successful results. With this toolbox, model builders can easily perform simple as well as advanced sensitivity analysis on their models.
Supervisors: Rodolfo Avila Moreno and Robert Broed
Subject controller: Virgil Stokes
Examiner: Lars-Henrik Eriksson
UPTEC IT05015
Sponsors: Facilia AB, The Norwegian Radiation Protection Authority and
Posiva Oy
Summary
Eikos
A Toolbox for Sensitivity Analysis

This document summarizes my degree project, which concludes my studies for the Master of Science in Engineering at Uppsala University. The work was carried out at a consulting company in Stockholm called Facilia AB. Typical assignments for Facilia AB include performing risk analyses in matters of radiation protection and nuclear waste.

When carrying out a risk analysis, computer-based models are used to approximate actual processes in nature. The models are usually built on mathematical equations, which in turn depend on a large number of factors. Examples of such factors are evaporation rate, the water concentration of the soil, the thickness of the soil's root layer, and time delay. The models' ability to predict credible results is therefore strongly limited by the uncertainty in these factors.

Sensitivity analysis is used to find the uncertain factors that affect the model's results the most. Such an analysis is therefore an important component in the work of building models and performing risk analyses.

The goal of my work at Facilia AB was originally divided into three parts:

First and foremost, I was to implement sensitivity analysis methods that can handle the types of models most often used for risk analyses of ecosystems.

Next, these methods were to be tested and their correctness verified with the help of known test functions and models developed at Facilia AB.

Finally, software with a user-friendly graphical interface was also to be developed, i.e. it should be easy for the user to operate.

Facilia has developed a platform, called Ecolego, which is specially adapted for simulation and modeling of ecological systems. This platform uses Matlab, and in particular the Simulink tool, as its mathematical engine. For that reason it has been natural to carry out the work in the Matlab environment.

I have divided the implemented sensitivity analysis methods into three categories. The first type is screening methods, which can be used at an early stage when there may be hundreds of uncertain factors, or when a single evaluation of the model takes a very long time. With a relatively small number of model evaluations, these methods can identify the model's most important uncertainty factors.

The second type is sampling-based methods. These methods build on Monte Carlo simulations and assume a linear relationship between the values of the factors and the model's results. If the linearity assumption is fulfilled, these methods can estimate well how large a share of the uncertainty in the model's results a given factor contributes. The last type has been called variance-based methods. These methods also use Monte Carlo techniques, but unlike the sampling-based methods they are relatively complicated and require many model evaluations. The advantage of the variance-based methods is that they are model independent, i.e. they place no requirement that the model's results depend linearly on the factors.

Various software for sensitivity analysis exists on the market today, but unfortunately none of it can easily be linked to the Matlab environment. The algorithms used in these products are also hidden from the user. Many model builders (in the radioecological field) develop their models in a Matlab/Simulink environment. A toolbox for sensitivity analysis developed in Matlab is therefore of great interest.

The implemented sensitivity analysis methods have been tested with well-known test functions and compared with other sensitivity analysis software, with successful results.

With this toolbox, named Eikos, model builders can perform simple as well as advanced sensitivity analysis on their models.
Table of Contents

1 Introduction
  1.1 Background
  1.2 Uncertainty analysis
  1.3 Sensitivity analysis
  1.4 Objective
  1.5 Outline of the report
2 Monte Carlo simulations
  2.1 Simple random sampling
  2.2 Latin hypercube sampling
  2.3 Rank order correlation
3 Probability distributions
  3.1 Normal (Gaussian) distribution
  3.2 Log-normal distribution
  3.3 Uniform (Rectangular) distribution
  3.4 Log-uniform distribution
  3.5 Triangular distribution
  3.6 Log-triangular distribution
  3.7 Exponential distribution
  3.8 Extreme value (Gumbel) distribution
  3.9 Beta distribution
  3.10 Gamma distribution
  3.11 Chi-square distribution
  3.12 Weibull distribution
4 Model evaluations
5 Screening methods
  5.1 Morris method
6 Sampling-based methods
  6.1 Graphical methods
  6.2 Regression analysis
  6.3 Correlation coefficients
  6.4 Rank transformations
  6.5 Two-sample tests
  6.6 Discussion
7 Variance-based methods
  7.1 Sobol indices
  7.2 Jansen (Winding Stairs)
  7.3 Fourier Amplitude Sensitivity Test
  7.4 Extended Fourier Amplitude Sensitivity Test
  7.5 Discussion
8 Eikos: The application
  8.1 The main window
  8.2 Using a Simulink model
  8.3 Using a standard Matlab function
  8.4 The distribution chooser window
  8.5 The correlation matrix window
  8.6 The uncertainty analysis window
  8.7 The sensitivity analysis window
  8.8 The scatter plot window
  8.9 The variance-based methods window
  8.10 The Morris screening window
  8.11 The Monte Carlo filtering window
  8.12 Exporting and saving results
9 Results of benchmarking
  9.1 Test of code for inducing rank correlation
  9.2 Benchmarking of procedure for generating probability distributions
  9.3 Benchmarking of SA methods for linear models
  9.4 Benchmarking of SA methods for non-linear monotonic models
  9.5 Benchmarking of SA methods for non-monotonic models
    9.5.1 The Sobol g-function
    9.5.2 The Ishigami function
    9.5.3 The Morris function
10 Example of practical application
11 General discussion
12 Acknowledgements
A Appendix
  A.1 Computer code for Iman & Conover rank correlation
  A.2 Computer code for the inverse of the CDF of the gamma distribution
  A.3 Computer code for Morris design
  A.4 Computer code for FAST
  A.5 Computer code for EFAST
1 Introduction
This work has been carried out as a Degree Project to obtain an M.Sc. in Information Technology Engineering at Uppsala University. The work has taken place at, and been organized by, Facilia AB, Stockholm. External funding has been provided by the Norwegian Radiation Protection Authority (NRPA) and the Finnish Nuclear Waste Management Company, Posiva Oy. Supervisors for the project have been Dr. Rodolfo Avila Moreno, director of Facilia AB, and Robert Broed, Ph.D. student at the Department of Nuclear Physics, Stockholm University.
1.1 Background
Facilia AB is a scientific consulting company active in the fields of environmental and health risk assessments, safety assessments in radioactive waste management, and radiation protection.

Facilia AB has developed a platform (Ecolego) for simulation modeling and environmental risk assessment based on Matlab/Simulink and Java. This platform should include procedures for sensitivity analysis, which can be used to assess the influence of different model parameters on simulation endpoints. Such assessments are needed for ranking the parameters in order of importance, serving various purposes, for example:

(a) to identify major contributors to the uncertainties in the predictions with a model, or
(b) to identify research priorities to improve risk assessments with a model.
Simulations with computer models are used for the risk
assessment of systems that are too complex to analyse
directly.
In the following text, the model is seen as a black box function $f$ with $k$ input factors $x = (x_1, x_2, \ldots, x_k)$ and one single scalar output $y$, i.e.

$$y = f(x_1, x_2, \ldots, x_k). \qquad (1.1)$$
In an application, $y$ may of course be a vector instead of a single scalar value; this is determined by the function or program of interest. The term input factor is used to denote any quantity that can be changed in the model prior to its execution. It may be, for example, a parameter, an initial value, a variable or a module of the model.
During this project a Matlab toolbox, named Eikos¹, has been developed. In Eikos a model can be a Matlab function, a general Simulink model or any type of stand-alone executable code that has been compiled with a Matlab wrapper. Matlab supports C/C++, FORTRAN and Java.

¹ Eikos: Ancient Greek for "to be expected with some degree of certainty".
1.2 Uncertainty analysis
When pursuing risk assessments, the general features of the modeled systems are usually quite well understood, but problems may arise as there exist uncertainties in some important model inputs. Thus the input factors of models are unfortunately not always known with a sufficient degree of certainty. Input uncertainty can be caused by natural variations as well as by errors and uncertainties associated with measurements.

In this context, the model is assumed to be deterministic, i.e. the same input data would produce the same output if the model is run twice. Therefore, the input uncertainties are the only uncertainties propagated through the model affecting the output uncertainty.
The uncertainty of input factors is often expressed in terms of probability distributions; it can also be specified by samples of measured values, i.e. empirical probability distributions. The uncertainties of the different input factors may have dependencies on each other, i.e. they may be correlated².

² Correlation: a term used to describe the degree to which one factor is related to another.

Generally, the main reason for performing an uncertainty analysis is to assess the uncertainty in the model output that derives from uncertainty in the inputs. A question to be investigated is: "How does y vary when x varies according to some assumed joint probability distributions?"
1.3 Sensitivity analysis

Andrea Saltelli [Sal00] states that "Sensitivity analysis (SA) is the study of how the variation in the output of a model (numerical or otherwise) can be apportioned, qualitatively or quantitatively, to different sources of variation, and how the given model depends upon the information fed into it." Saltelli also lists a set of reasons why modelers should carry out a sensitivity analysis; these are to determine:
(a) if a model resembles the system or processes under study;
(b) the factors that mostly contribute to the output variability;
(c) the model parameters (or parts of the model itself) that are insignificant;
(d) if there is some region in the space of input factors for which the model variation is maximal;
(e) the optimal regions within the space of the factors for use in a subsequent calibration study;
(f) if and which (groups of) factors interact with each other.
Sensitivity analysis aims at determining how sensitive the model output is to changes in the model inputs. When input factors are relatively certain, we can look at the partial derivative of the output function with respect to the input factors. This sensitivity measure can easily be computed numerically by performing multiple simulations varying the input factors around a nominal value. This reveals the local impact of the factors on the model output, and therefore techniques like these are called local SA. For environmental and health risk assessments, input factors will often be uncertain, and therefore local SA techniques will not be usable for a quantitative analysis. We want to find out which of the uncertain input factors are more important in determining the uncertainty in the output of interest. To find this we need to consider global SA methods, which are usually implemented using Monte Carlo simulations and are, therefore, called sampling-based methods.
Different SA techniques will do well on different types of model problems. In an initial phase, for models with a large number of uncertain input factors, a screening method can be used to find out qualitatively which are the most important factors and which are unimportant. The screening method implemented in Eikos is the Morris design [Mor91]. A natural starting point in the analysis with sampling-based methods would be to examine scatter plots. With these, the modeler can graphically detect nonlinearities, nonmonotonicity and correlations between the input-output factors. For linear models, linear relationship measures like the Pearson product moment correlation coefficient (CC), Partial Correlation Coefficients (PCC) and Standardized Regression Coefficients (SRC) will perform well. For non-linear but monotonic models, measures based on rank transforms like the Spearman Rank Correlation Coefficient (RCC), Partial Rank Correlation Coefficients (PRCC) and Standardized Rank Regression Coefficients (SRRC) will perform well. For non-linear non-monotonic models, methods based on decomposing the variance are the best choice. Examples of these methods are the Sobol method, Jansen's alternative, the Fourier Amplitude Sensitivity Test (FAST) and the Extended Fourier Amplitude Sensitivity Test (EFAST). Methods of partitioning the empirical input distributions according to quantiles³ (or other restrictions) of the output distribution are called Monte Carlo filtering. Measures of their difference are called two-sample tests. Two non-parametric⁴ two-sample tests are implemented, the Smirnov test and the Cramér-von Mises test.

³ The q-quantile of a random variable X is any value x such that the probability $P(X \le x) = q$.
⁴ Non-parametric: minimal or no assumptions are made about the probability distributions of the factors being assessed.
1.4 Objective

This work deals with implementing SA methods that can cope with the types of models commonly used in environmental risk assessment. The work also includes testing the implemented methods with well-known test functions and environmental risk assessment models developed at Facilia AB, as well as providing user-friendly software written in Matlab that can later be incorporated into the Ecolego platform.
1.5 Outline of the report

The remainder of this report begins with a description of the Monte Carlo method in Section 2. Sample generation and how to induce dependencies on the input factor distributions are the subjects considered.

Elementary probability theory and statistics is the subject of Section 3. The properties of the various probability distributions offered in Eikos are also described.

Section 4 describes how the input-output mapping is done in Eikos, i.e. how to evaluate the models with different inputs to obtain outputs.

The implemented methods of sensitivity analysis are described in Sections 5-7, including screening designs, graphical techniques and the usual global SA methods. The last of these sections (Section 7) describes the state-of-the-art variance-based methods.

A presentation of the graphical user interface (GUI) of Eikos comes in Section 8. The GUI has been implemented so that the SA process can be performed easily.

Results of benchmarking the probability distributions generated in Eikos against commercial systems are shown in Section 9. Comparisons of results from running known test functions, as well as a test of the code for inducing rank correlation, are presented. Section 10 presents a sensitivity study of a risk assessment model implemented in Ecolego.

Finally, in Section 11 a general discussion is given.

Throughout the report, Matlab code snippets implementing parts of the algorithms have been included to make the theory easier for the reader to understand. In the Appendix, larger, more complete Matlab codes for some of the implemented methods are collected.
2 Monte Carlo simulations
Monte Carlo (MC) methods are algorithms for solving various kinds of computational problems by using random numbers [tfe05]. They include any technique of statistical sampling employed to approximate solutions to quantitative problems. MC sampling refers to the technique of using random or pseudo-random⁵ numbers to sample from a probability distribution; it is a conventional approach to solving uncertainty analysis problems.

⁵ The prefix pseudo- is used since the random numbers are generated by a computational algorithm and not from a truly random event such as radioactive decay.

Figure 1: A schematic view of sampling-based sensitivity analysis, Saltelli [Sal00].

A MC simulation is based on performing multiple model evaluations with probabilistically selected model inputs. The results of these evaluations can be used to determine the uncertainty in the model output (prediction) and to perform SA.
Eikos is a Matlab utility designed to assist the user in the implementation of a MC simulation, followed by a sensitivity analysis based on the MC simulation (sampling-based SA). The main steps of a SA based on a MC simulation are:

(1) For the defined model $f$ with input factors $x_i$, $i = 1, \ldots, k$, and output $y$, ranges and distributions are selected for each input factor. These $k$ distributions $D_1, D_2, \ldots, D_k$ characterize the uncertainty in the input factors. An expert in the risk assessment field must define these distributions.
(2) Samples $X = X_1, X_2, \ldots, X_k$, corresponding to the ranges and distributions defined in the previous step, are generated. These samples can be thought of as a matrix which contains, for $N$ model runs, the sampled values of each of the $k$ input factors under examination.

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nk} \end{pmatrix} \qquad (2.1)$$
(3) The model is evaluated $N$ times, once for each element of the sample, creating a mapping in the input-output space. The produced output is $Y = f(X_1, X_2, \ldots, X_k)$.

(4) The output of the previous step is used as the basis for the uncertainty analysis. The uncertainty can, for example, be characterized with a mean value and a variance.

(5) In the last step the mapping between $Y$ and $X$ is used by various sensitivity analysis techniques to rank the importance of the input factors.
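As a hedged, minimal end-to-end illustration of steps (1)-(5), the following sketch runs a toy two-factor model; the model, the sample size and the use of plain correlation coefficients in step (5) are assumptions for the example, not part of Eikos:

f = @(x1,x2) x1 + x2.^2;     % step (1): a toy model, both factors uniform on [0,1]
N = 1000; k = 2;
X = rand(N,k);               % step (2): the N-by-k sample matrix
Y = f(X(:,1), X(:,2));       % step (3): N model evaluations
m = mean(Y); v = var(Y);     % step (4): uncertainty characterized by mean/variance
for i = 1:k                  % step (5): rank the factors, here by a simple CC
    r = corrcoef(X(:,i), Y);
    fprintf('CC of factor %d: %.2f\n', i, r(1,2));
end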
To recreate the input factor distributions through sampling by the MC method, a large number of iterations may be required. If too few iterations are made, the values in the outer ranges of the distributions may not be represented in the samples, and thus their impact on the results will not be included in the simulation output. To address this problem the sampling technique called Latin Hypercube Sampling (LHS) was developed. LHS is a stratified sampling technique that guarantees that samples are drawn over the whole range of the distribution. LHS is described in Section 2.2.
2.1 Simple random sampling
To generate random samples from any distribution one
usually starts with uniform random numbers. With uni-
form random numbers one can produce random num-
bers from other distributions, either directly or by using
inversion or rejection methods.
In Eikos, the inversion method is used to replicate the
expected distributions. The sampling procedure of a
distribution can be divided into two steps:
(1) generate uniformly distributed numbers in
the interval [0, 1], and then
(2) deterministically transform these numbers
into the wanted probability distributions us-
ing the inversion method.
The inversion method works due to a fundamental theorem that relates the uniform distribution to other continuous distributions:

Theorem 1. If $\Phi$ is a continuous distribution with inverse $\Phi^{-1}$, and $x$ is a uniform random number, then $\Phi^{-1}(x)$ has distribution $\Phi$.

So, to generate a random sample from any probability distribution, a random number $r$ is first generated, drawn with equal probability in the interval $[0, 1]$. This value is then used in the equation of the inverse distribution to determine the value to be generated for the distribution:

$$\Phi^{-1}(r) = x \qquad (2.2)$$

Unfortunately, this approach is usually not the most efficient.
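As a hedged example of the inversion method, a uniform sample can be transformed into an exponential sample using the inverse CDF given later in Section 3.7 (the rate alpha is an assumption for the example):

alpha = 2;
r = rand(1000,1);        % uniform random numbers on [0,1]
x = -log(1-r)/alpha;     % inversion yields exponential samples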
With simple random sampling the samples are entirely random, in the sense that each sample is drawn independently from all other samples. Too few samples may not give good coverage of the entire range space.

If the computer code of the model is not time-consuming and a large number of iterations can be performed, then random sampling is the best technique to use, since it yields unbiased estimates of the desired output characteristics, for example the mean and the variance.
In Matlab there exists a very efficient uniform pseudo-random number generator named rand. This uniform random generator is state-of-the-art and extremely fast, since it does not use any multiplications or divisions at all. It is based on the work of George Marsaglia [Mar68] and has been shown to have a period as large as $2^{1492}$. The function call to generate uniform samples in Matlab for the $N \times k$ input factor matrix $X$ would be:

X = rand(N,k);
2.2 Latin hypercube sampling
Latin hypercube sampling (LHS) was introduced by McKay, Conover and Beckman [MCB79] and is today, according to David Vose [Vos96], an essential feature of any risk analysis software package.
LHS is designed to accurately recreate the input distribution through sampling in fewer iterations compared with simple random sampling. LHS is a so-called stratified sampling technique, where the random variable distributions are divided into equal probability intervals. The technique used is known as stratified sampling without replacement. This procedure ensures that each sub-interval for each variable is sampled exactly once, in such a way that the entire range of each variable is explored.
A LHS proceeds as follows:
(1) The probability distribution of an input fac-
tor is split into N non-overlapping intervals
of equal marginal probability 1/N.
(2) In the first iteration, one of these intervals is selected randomly.
(3) Within this interval one observation is chosen randomly and is now the first sample of this specific input factor.
(4) In the second and the rest of the N iterations, one interval which has not already been chosen is randomly selected and the procedure is repeated from step (3) to obtain the N samples. Since N samples are generated, each of the N intervals will have been sampled from only once.
(5) The whole procedure is repeated for each of the k input factors.
Hence, LHS is equivalent to uniformly sampling from the quantiles of the distribution (equivalent to sampling the vertical axis of the cumulative distribution function, CDF⁶) and then inverting the CDF to obtain the actual distribution values that those quantiles represent [WJ98]. Therefore the LHS procedure can always start by generating uniformly distributed samples in the interval (0,1), after which the inverse of the CDF can be applied to obtain the sought sample.

⁶ The cumulative distribution function is described in Section 3.
Figure 2: Division of the normal probability distribu-
tion into 10 equiprobable intervals.
LHS allows an unbiased estimate of the output mean with a smaller number of runs, i.e. it will require fewer samples than simple random sampling for similar accuracy. One drawback of LHS is that it can yield biased estimates of the sample variance.

To generate uniform LHS samples in Matlab for the $N \times k$ input factor matrix $X$, the following code is enough:

for i=1:k
    X(:,i) = (randperm(N)'-rand(N,1))/N;
end

The built-in Matlab function randperm(N) returns a random permutation of the integers from 1 to N.
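As a hedged follow-up example, the LHS uniforms above can be pushed through the inversion method of Section 2.1 to obtain LHS samples of any distribution, here for an exponential distribution with an assumed rate alpha (see Section 3.7 for its inverse CDF):

N = 100; alpha = 2;               % assumed sample size and rate
u = (randperm(N)'-rand(N,1))/N;   % one stratified uniform per interval
x = -log(1-u)/alpha;              % invert the exponential CDF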
2.3 Rank order correlation

In the generation of samples we have assumed that the input factors are independent of each other and should therefore be uncorrelated. But since input factors can be related to each other, there is a need to be able to induce a desired correlation structure onto the sample.

The technique used in Eikos to correlate input samples was developed by Iman and Conover [IC82]. It is a restricted pairing technique that is based on rank correlations rather than sample correlations. When using ranks, the data are replaced with their corresponding ranks. The ranks are defined as follows: the smallest value is assigned rank 1, the second smallest 2, and so on up to the largest value, which is assigned rank N. The technique involves a re-permutation of the columns of the $N \times k$ matrix $X$ so that a desired rank-correlation structure results between the individual input factors. Iman and Conover [IC82] list the following desirable features of their method:
(a) It is distribution-free. That is, it may be used with equal facility on all types of distribution functions.
(b) It is simple. No unusual mathematical techniques are required to implement the method.
(c) It can be applied to any sampling scheme for which correlated input variables can logically be considered, while preserving the intent of the sampling scheme. That is, the same numbers originally selected as input values are retained; only their pairing is affected to achieve the desired rank correlations. This means that in Latin hypercube sampling the integrity of the intervals is maintained. If some other structure is used for selection of values, that same structure is retained.
(d) The marginal distributions⁷ remain intact.

⁷ Given $N$ random variables $X_1, \ldots, X_N$ with joint probability density function $\phi(x_1, \ldots, x_N)$, the marginal distribution of $x_r$ is obtained by integrating the joint probability density over all variables but $x_r$: $g_r(x_r) = \int \cdots \int \phi(x_1, \ldots, x_N)\,dx_1 \cdots dx_{r-1}\,dx_{r+1} \cdots dx_N$.

David Vose [Vos96] says that most risk analysis products now offer a facility to correlate probability distributions within a risk analysis model using rank order correlation.
In Eikos the user has the possibility of defining a $k \times k$-dimensional correlation matrix $C$ with the wanted correlation structure. The following 4-dimensional correlation matrix $C$ tells us that input factors one and two are positively correlated to a degree 0.5, input factors two and four are negatively correlated to a degree 0.3, and the rest of the factors are fully uncorrelated with each other.

$$C = \begin{pmatrix} 1.0 & 0.5 & 0.0 & 0.0 \\ 0.5 & 1.0 & 0.0 & -0.3 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ 0.0 & -0.3 & 0.0 & 1.0 \end{pmatrix} \qquad (2.3)$$
If two factors are fully correlated, it corresponds to a 1 in the correlation matrix $C$ (that is why the diagonal consists of ones). There are ways to define a correlation structure that is impossible: for example, if $X_1$ and $X_2$ are highly positively correlated, as well as $X_2$ and $X_3$, then $X_1$ and $X_3$ cannot be highly negatively correlated. In general, $C$ is valid only if it is positive semi-definite. The Hermitian part⁸ of a positive semi-definite matrix has only non-negative eigenvalues.

⁸ The Hermitian part of a real matrix $A$ is defined as $A_H \equiv \frac{1}{2}(A + A^T)$.

The code used in Eikos to check for validity is:
% Hermitian part.
HP = 0.5*(C+C');
% Get eigenvalues.
[V,D] = eig(HP);
eigenvals = diag(D);
% Look for negative values.
if any(real(eigenvals) < 0)
    error('Matrix is not positive semi-definite.');
end
Theoretical details of the rank correlation method can be found in the original article by Iman and Conover [IC82]. The algorithm can be summarized by the following seven steps, where $R$ is the $N \times k$ input factor matrix and $C$ is the $k \times k$ matrix corresponding to the wanted correlation matrix:

(1) Calculate the sample correlation matrix $T$ of $R$.
(2) Calculate the lower triangular Cholesky decomposition $P$ of $C$, i.e. $C = PP^T$.
(3) Calculate the lower triangular Cholesky decomposition $Q$ of $T$, i.e. $T = QQ^T$.
(4) Obtain $S$ such that $C = STS^T$; it can be calculated as $S = PQ^{-1}$.
(5) Obtain $R_{score}$ by rank-transforming $R$ and converting to van der Waerden scores⁹.
(6) Calculate the target correlation matrix $R^{*} = R_{score}\,S^T$.
(7) Match up the rank pairing in $R$ according to $R^{*}$.

⁹ The van der Waerden scores are $\Phi^{-1}(i/(N+1))$, where $\Phi^{-1}$ is the inverse CDF of the standard normal distribution, $i$ is the assigned rank and $N$ is the total number of samples.

A Matlab implementation of the rank correlation procedure can be found in Appendix A.1.
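As a condensed, hedged sketch of the seven steps above (the complete implementation is in Appendix A.1): the inverse normal CDF is computed here via erfcinv as in Section 3.1, the sample correlation T of step (1) is taken from the score matrix, and the example matrix C is the one from Eq. (2.3):

C = [1 .5 0 0; .5 1 0 -.3; 0 0 1 0; 0 -.3 0 1];
R = rand(100,4);                   % any N-by-k sample matrix
[N,k] = size(R);
[dummy,idx] = sort(R);             % step (5): ranks of each column...
ranks = zeros(N,k);
for j = 1:k
    ranks(idx(:,j),j) = (1:N)';
end
Rscore = -sqrt(2)*erfcinv(2*ranks/(N+1));  % ...as van der Waerden scores
T = corrcoef(Rscore);              % step (1): sample correlation matrix
P = chol(C)';                      % step (2): lower triangular factor of C
Q = chol(T)';                      % step (3): lower triangular factor of T
S = P/Q;                           % step (4): S = P*inv(Q)
Rstar = Rscore*S';                 % step (6): scores with target correlation
for j = 1:k                        % step (7): re-pair R to the ranks of Rstar
    [dummy,ord] = sort(Rstar(:,j));
    R(ord,j) = sort(R(:,j));
end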
3 Probability distributions
Some basic theory of probability and statistics is essential for understanding, for example, the inversion technique used to generate samples of general distributions mentioned in the previous section. In this section some elementary theory about distributions and a few essential moments is therefore described, followed by a list of, and some details about, the probability distributions available in Eikos.
Consider the distribution of an uncertain input variable $x$. The cumulative distribution function $\Phi(x)$ (CDF) gives the probability $P$ that the variable $X$ will be less than or equal to $x$, i.e.

$$\Phi(x) = P(X \le x) \qquad (3.1)$$

The CDF has the following two theoretical properties:

(1) It is always non-decreasing.
(2) $\Phi(x) = 0$ at $x = -\infty$ and $\Phi(x) = 1$ at $x = +\infty$.

The CDF therefore ranges from zero to one. Fig. 3 shows the CDF of a normal distribution with $\mu = 0$ and $\sigma = 1$.

Figure 3: CDF of a normal distribution, with $\mu = 0$ and $\sigma = 1$.

The inverse of the CDF gives the value of $x$ for a given value of $\Phi(x)$. This inverse function is written as

$$\Phi^{-1}(\Phi(x)) = x \qquad (3.2)$$

Fig. 4 shows the inverse of the CDF of a normal distribution with $\mu = 0$ and $\sigma = 1$.
Figure 4: The inverse of the CDF of a normal distribution, with $\mu = 0$ and $\sigma = 1$.
The uncertainty of the input factors is described with probability density functions $\phi(x)$ (PDF). A PDF has two theoretical properties:

(1) The PDF is zero or positive for every possible outcome.
(2) The integral of a PDF over its entire range of values is one.

The PDF of a continuous random variable $X$ is the function $\phi(x)$ such that

$$P(a < X \le b) = \int_a^b \phi(x)\,dx, \qquad (3.3)$$

for all real $a$ and $b$, $a < b$. Fig. 5 shows the PDF of a normal distribution with $\mu = 0$ and $\sigma = 1$.

Figure 5: PDF of a normal distribution, with $\mu = 0$ and $\sigma = 1$.

Since the random variable $X$ is continuous, the probability of observing any particular value is zero. Therefore, the PDF needs to be integrated over the interval of interest. The PDF can also be described as the rate of change (the gradient) of the CDF:

$$\phi(x) = \frac{d}{dx}\Phi(x). \qquad (3.4)$$

From this, it is easy to see that the first of the theoretical properties of a PDF listed above is true: the PDF is zero or positive since the gradient of a non-decreasing curve is always non-negative. The probability of lying between any two exact values $(a, b)$ is

$$P(a \le x \le b) = \Phi(b) - \Phi(a), \quad b > a. \qquad (3.5)$$
The expected value of a random variable $X$ is denoted either $\mu$ or $E(X)$. Other names for $E(X)$ are the expectation, mean or average of $X$. The expectation indicates the position of the center of mass of the distribution. Expectation is given by:

$$E(X) = \mu = \int_{-\infty}^{+\infty} x\,\phi(x)\,dx, \qquad (3.6)$$

where $\phi(x)$ is the probability density function of $X$.

The mode of a random variable $X$ with probability density function $\phi(x)$ is the $x$-value at which $\phi(x)$ is largest, i.e. it is the most likely outcome. The mode is denoted $M_o$.

The variance of a random variable $X$ is denoted either $\sigma^2$ or $V(X)$. It is the average of the squared deviations about the mean and measures how widely dispersed the values are in the distribution. Variance is given by:

$$V(X) = \sigma^2 = \int_{-\infty}^{+\infty} (x-\mu)^2\,\phi(x)\,dx, \qquad (3.7)$$

where $\mu = E(X)$. Variance can also be defined purely in terms of expectations of the random variable $X$:

$$V(X) = \sigma^2 = E[(X - E(X))^2] = E(X^2) - E(X)^2. \qquad (3.8)$$

The positive square root of the variance ($\sigma = \sqrt{V(X)}$) is called the standard deviation.
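As a quick consistency check of Eq. (3.8), the variance quoted for the uniform distribution in Section 3.3 can be recovered (a worked example added here for illustration):

$E(X) = \frac{a+b}{2}, \qquad E(X^2) = \int_a^b \frac{x^2}{b-a}\,dx = \frac{a^2+ab+b^2}{3},$

$V(X) = E(X^2) - E(X)^2 = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}.$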
In Eikos, distributions with infinite tails can be bounded or limited by truncation. If a probability distribution function is truncated, the characteristics of the PDF will change. For example, if a normal distribution with $\mu = 0$ and $\sigma = 1$ is truncated such that the probability for values smaller than -6 is zero, then the actual mean will be slightly larger than zero. The truncation is implemented as follows: when generating samples, if for example a left bound on the PDF is supposed to be at -6, then the random uniform samples are drawn from the range $[\Phi(-6), 1]$ instead of $[0, 1]$; these samples are then used to obtain the needed ones by the inversion method.
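A hedged sketch of this truncation scheme for the standard normal example above, assuming the cdfnorm and invnorm helper functions referred to in Section 3.2:

N = 1000;
pmin = cdfnorm(0,1,-6);          % probability mass below the left bound
p = pmin+(1-pmin)*rand(N,1);     % uniforms restricted to [cdfnorm(-6), 1]
x = invnorm(0,1,p);              % inversion yields the truncated samples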
3.1 Normal (Gaussian) distribution
The normal distribution is probably the most frequently used distribution in probability theory. Its popularity comes from the fact that many natural phenomena follow a statistically normal distribution.

The normal distribution is specified with a mean value $\mu$ and a standard deviation $\sigma$. The PDF is defined as:

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \qquad (3.9)$$
Fig. 5 shows the normal PDF using a mean value of $\mu = 0$ and standard deviation $\sigma = 1$. In Matlab the PDF can be computed as (x being the vector of $X$-values):
y = exp(-0.5*((x-mu)/sigma).^2)/(sqrt(2*pi)*sigma);
There is no analytic formula for the CDF $\Phi$ of a normal distribution. The function exists, but cannot be expressed in terms of other standard functions. Therefore both the CDF and its inverse are approximated by computer algorithms. In Matlab the CDF and its inverse can be computed with the help of the error function erf and its inverse erfinv respectively (or their complements). The CDF can be computed with:
p = 0.5*erfc(-((x-mu)/sigma)/sqrt(2));
The inverse of the CDF can be computed as (p being a vector of uniform random samples):
p(p<0) = NaN;
p(p>1) = NaN;
r = (-sqrt(2)*sigma)*erfcinv(2*p)+mu;
Range: $-\infty \le X \le +\infty$
Parameters: $-\infty \le \mu \le +\infty$, $0 < \sigma$
$E(X) = \mu$
$M_o = \mu$
$V(X) = \sigma^2$
3.2 Log-normal distribution
As with the normal distribution, many natural phenomena occur according to the log-normal distribution. It is often used when most values occur near the minimum, i.e. the values are positively skewed¹⁰.

¹⁰ Positively skewed: a distribution in which most of the values occur at the lower end of the range.

If the natural logarithm of a random variable $X$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then its distribution is log-normal. The log-normal PDF is given by:

$$\phi(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\log x - \mu}{\sigma}\right)^2} \qquad (3.10)$$

Fig. 6 shows the PDF of a log-normal distribution with $\mu = 0$ and $\sigma = 1$.

Figure 6: PDF of a log-normal distribution, with $\mu = 0$ and $\sigma = 1$.
In Matlab the PDF of the log-normal distribution is
computed as:
y = exp(-0.5*((log(x)-mu)/sigma).^2)./ ...
(x*sqrt(2*pi)*sigma);
As with the normal distribution, no closed formula for the CDF or the inverse of the CDF exists for the log-normal distribution. Eikos uses function calls to the normal distribution functions to obtain the CDF and its inverse. In Matlab the CDF of the log-normal distribution is computed as:
x(x<0) = 0;
p = cdfnorm(mu,sigma,log(x));
Where cdfnorm is the function that returns the CDF
of a normal distribution. In Matlab the inverse of the
CDF of the log-normal distribution is computed as:
p(p<0) = NaN;
p(p>1) = NaN;
r = exp(invnorm(0,1,p)*sigma+mu);
Where invnorm is the function that returns the inverse of the CDF of a normal distribution.

Range: $0 \le X \le +\infty$
Parameters: $0 < \mu$, $0 < \sigma$
$E(X) = e^{\mu + 0.5\sigma^2}$
$M_o = e^{\mu - \sigma^2}$
$V(X) = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$
3.3 Uniform (Rectangular) distribution
If it is only possible to define a minimum and a maximum value for a factor, a uniform distribution is appropriate. For a uniform distribution all values between the minimum and maximum value are equally likely to be sampled. It is the simplest of all continuous distributions.

Parameters are the upper and lower bounds, $a$ and $b$. The uniform PDF is given as:

$$\phi(x) = \begin{cases} \frac{1}{b-a}, & x \in (a,b) \\ 0, & x \notin (a,b). \end{cases} \qquad (3.11)$$
Fig. 7 shows the PDF of a uniform distribution with a minimum value a = 0 and a maximum value b = 1.

Figure 7: PDF of a uniform distribution, with a = 0 and b = 1.
In Matlab the uniform PDF can be computed as:
y = zeros(size(x));
k = find(x>=a & x<=b & a<b);
y(k) = 1/(b-a);
The CDF of a uniform distribution is defined as:

$$\Phi(x) = \begin{cases} 0, & x \le a \\ \frac{x-a}{b-a}, & x \in (a,b) \\ 1, & x \ge b. \end{cases} \qquad (3.12)$$
In Matlab the CDF of a uniform distribution can be computed as:
x(x<a) = a;
x(x>b) = b;
p = (x-a)/(b-a);
The inverse of the CDF of a uniform distribution is defined as:

$$\Phi^{-1}(p) = a + (b-a)p, \quad p \in (0,1) \qquad (3.13)$$

In Matlab the inverse of the CDF can be computed as:

p(p<0) = NaN;
p(p>1) = NaN;
r = a+(b-a)*p;
Range: $a \le X \le b$
Parameters: $-\infty < a$, $a < b < +\infty$
$E(X) = \frac{a+b}{2}$
$M_o$ = no unique mode
$V(X) = \frac{1}{12}(b-a)^2$
3.4 Log-uniform distribution
The log-uniform distribution is like the standard uniform distribution in that all values between the lower and upper bounds are equally likely to be sampled. The difference is that it is sampled on a logarithmic scale instead of a linear scale. The upper and lower bounds are still entered in linear space. The PDF of a log-uniform distribution is defined as:

$$\phi(x) = \frac{1}{x(\log(b) - \log(a))} \qquad (3.14)$$
Fig. 8 shows the PDF of a log-uniform distribution with
a minimum value a = 1 and a maximum value b = 10.
Figure 8: Log-uniform PDF with a = 1 and b = 10.
In Matlab the log-uniform PDF can be computed as:
y = zeros(size(x));
k = find(x>=a & x<=b & a<b);
y(k) = 1 ./(x(k)*(log(b)-log(a)));
The CDF of a log-uniform distribution is defined as:

$$\Phi(x) = \begin{cases} 0, & x \le a \\ \frac{\log(x)-\log(a)}{\log(b)-\log(a)}, & x \in (a,b) \\ 1, & x \ge b. \end{cases} \qquad (3.15)$$
In Matlab the CDF of a log-uniform distribution can
be computed as:
p(x<a) = 0;
k = find(x>=a & x<=b);
p(k) = (log(x(k))-log(a))/(log(b)-log(a));
p(x>b) = 1;
The inverse of the CDF of a log-uniform distribution is defined as:

$$\Phi^{-1}(p) = a\,e^{(\log(b)-\log(a))p}, \quad p \in (0,1) \qquad (3.16)$$

In Matlab the inverse of the CDF can be computed as:
r = a*exp((log(b)-log(a))*p);
r(p<0) = NaN;
r(p>1) = NaN;
Range: $a \le X \le b$
Parameters: $0 < a$, $a < b < +\infty$
$E(X) = \frac{\log(a)+\log(b)}{2}$
$M_o = a$
$V(X) = \frac{\log(b/a)^2}{12}$
3.5 Triangular distribution
If, in addition to a minimum and a maximum value, a nominal value is considered most likely, a triangular distribution is appropriate.

Parameters are the minimum, maximum and nominal values, $a$, $b$ and $c$ respectively. The PDF of a triangular distribution is defined as:

$$\phi(x) = \begin{cases} \frac{2(x-a)}{(b-a)(c-a)}, & a \le x \le c \\ \frac{2(b-x)}{(b-a)(b-c)}, & c < x \le b \\ 0, & x \notin (a,b) \end{cases} \qquad (3.17)$$
Fig. 9 shows the PDF of a triangular distribution with
a minimum value a = 0, a maximum value b = 10 and a
nominal value c = 3.
Figure 9: PDF of a triangular distribution, with a = 0,
b = 10 and c = 3.
In Matlab the PDF of a triangular distribution can be
computed as:
y = zeros(size(x));
k = find(x>a & x<=c & a~=c);
y(k) = (2.0*(x(k)-a)/(b-a)/(c-a));
k = find(x>a & x>c & x<=b & c~=b);
y(k) = (2.0*(b-x(k))/(b-a)/(b-c));
The CDF of a triangular distribution is defined as:

$$\Phi(x) = \begin{cases} 0, & x < a \\ \frac{(x-a)^2}{(b-a)(c-a)}, & a \le x \le c \\ 1 - \frac{(b-x)^2}{(b-a)(b-c)}, & c < x \le b \\ 1, & x > b. \end{cases} \qquad (3.18)$$
In Matlab the CDF of a triangular distribution can
be computed as:
p = zeros(size(x));
k = find(x>=a & x<=c);
p(k) = (x(k)-a).^2/((b-a)*(c-a));
k = find(x>c & x<=b);
p(k) = 1-(b-x(k)).^2/((b-a)*(b-c));
p(x>b) = 1;
The inverse of the CDF of a triangular distribution is defined as:

$$\Phi^{-1}(p) = \begin{cases} a + \sqrt{p(b-a)(c-a)}, & p < \frac{c-a}{b-a} \\ b - \sqrt{(1-p)(b-a)(b-c)}, & p \ge \frac{c-a}{b-a}. \end{cases} \qquad (3.19)$$
In Matlab the inverse of the CDF of a triangular dis-
tribution can be computed as:
p(p>1) = NaN;
p(p<0) = NaN;
r = (p<(c-a)/(b-a)).*(a+sqrt(p*(b-a)*(c-a))) + ...
(p>=(c-a)/(b-a)).*(b-sqrt((1-p)*(b-a)*(b-c)));
Range: $a \le X \le b$
Parameters: $-\infty < a$, $a < b$, $a \le c \le b$
$E(X) = \frac{a+b+c}{3}$
$M_o = c$
$V(X) = \frac{a^2+b^2+c^2-ab-ac-bc}{18}$
3.6 Log-triangular distribution
The log-triangular distribution is like the triangular distribution except that it is sampled on a logarithmic scale instead of a linear scale. The upper and lower bounds are still entered in linear space. The PDF of a log-triangular distribution is defined as:

$$\phi(x) = \begin{cases} \frac{2(\log(x)-\log(a))}{(\log(b)-\log(a))(\log(c)-\log(a))\,x}, & a \le x \le c \\ \frac{2(\log(b)-\log(x))}{(\log(b)-\log(a))(\log(b)-\log(c))\,x}, & c < x \le b \\ 0, & x \notin (a,b) \end{cases} \qquad (3.20)$$
Fig. 10 shows the PDF of a log-triangular distribution
with a minimum value a = 1, a maximum value b = 10
and a nominal value c = 4.
Figure 10: PDF of a log-triangular distribution, with
a = 1, b = 10 and c = 4.
In Matlab the PDF of a log-triangular distribution can be computed as:
y = zeros(size(x));
k = find(x>a & x<=c & a~=c);
y(k) = (2.0*(log(x(k))-log(a))/(log(c)-log(a))/ ...
(log(b)-log(a)))./x(k);
k = find(x>a & x>c & x<=b & c~=b);
y(k) = (2.0*(log(b)-log(x(k)))/(log(b)-log(a))/ ...
(log(b)-log(c)))./x(k);
The CDF of a log-triangular distribution is defined as:

$$\Phi(x) = \begin{cases} 0, & x < a \\ \frac{(\log(x)-\log(a))^2}{(\log(b)-\log(a))(\log(c)-\log(a))}, & a \le x \le c \\ 1 - \frac{(\log(b)-\log(x))^2}{(\log(b)-\log(a))(\log(b)-\log(c))}, & c < x \le b \\ 1, & x > b. \end{cases} \qquad (3.21)$$
In Matlab the CDF of a log-triangular distribution
can be computed as:
p = zeros(size(x));
k = find(x>=a & x<=c);
p(k) = (log(x(k))-log(a)).^2/((log(b)-log(a))* ...
(log(c)-log(a)));
k = find(x>c & x<=b);
p(k) = 1-(log(b)-log(x(k))).^2/((log(b)-log(a))* ...
(log(b)-log(c)));
p(x>b) = 1;
The inverse of the CDF of a log-triangular distribution is defined as:

$$\Phi^{-1}(p) = \begin{cases} e^{\log(a)+\sqrt{p(\log(b)-\log(a))(\log(c)-\log(a))}}, & p < \frac{\log(c)-\log(a)}{\log(b)-\log(a)} \\ e^{\log(b)-\sqrt{(1-p)(\log(b)-\log(a))(\log(b)-\log(c))}}, & p \ge \frac{\log(c)-\log(a)}{\log(b)-\log(a)}. \end{cases} \qquad (3.22)$$
In Matlab the inverse of the CDF of a log-triangular distribution can be computed as:
p(p>1) = NaN;
p(p<0) = NaN;
a = log(a);
c = log(c);
b = log(b);
r = exp((p<(c-a)/(b-a)).*(a+sqrt(p*(b-a)*(c-a))) + ...
(p>=(c-a)/(b-a)).*(b-sqrt((1-p)*(b-a)*(b-c))));
Range: $a \le X \le b$
Parameters: $0 < a$, $a < b$, $a \le c \le b$
Analytical values for the moments $E(X)$, $M_o$ and $V(X)$ have not been found.
3.7 Exponential distribution
The exponential distribution is often used to describe random events in time or space, for example the decay of a radionuclide over time, or interarrival times for Markovian queues.

It has only one parameter, $\alpha$, which equals the reciprocal of the mean of the distribution. The PDF of an exponential distribution is given by:

$$\phi(x) = \alpha e^{-\alpha x} \qquad (3.23)$$
Fig. 11 shows the PDF of an exponential distribution with $\alpha = 1$.

Figure 11: PDF of an exponential distribution, with $\alpha = 1$.
In Matlab the PDF of an exponential distribution can
be computed as:
y = alpha*exp(-alpha*x);
y(x<0) = 0;
The CDF of an exponential distribution is defined as:

$$\Phi(x) = 1 - e^{-\alpha x} \qquad (3.24)$$
In Matlab the CDF of an exponential distribution can
be computed as:
p = 1-exp(-x*alpha);
p(x<0) = 0;
The inverse of the CDF of an exponential distribution is defined as:

$$\Phi^{-1}(p) = -\frac{\log(1-p)}{\alpha} \qquad (3.25)$$
In Matlab the inverse of the CDF of an exponential
distribution can be computed as:
p(p<0) = NaN;
p(p>1) = NaN;
r = -log(1-p)/alpha;
Range: $0 \le X \le +\infty$
Parameters: $0 < \alpha$
$E(X) = \frac{1}{\alpha}$
$M_o = 0$
$V(X) = \frac{1}{\alpha^2}$
3.8 Extreme value (Gumbel) distribution

The extreme value distribution (the Gumbel distribution) is commonly used to describe the largest value of a response over a period of time, for example in flood flows, rainfall, and earthquakes. Other applications include the breaking strengths of materials, construction design, and aircraft loads and tolerances [Dec04].

The extreme value distribution is described by a location parameter $\mu$ and a scale parameter $\sigma$; its PDF is given by:

$$\phi(x) = \frac{1}{\sigma}\, e^{(\mu-x)/\sigma}\, e^{-e^{(\mu-x)/\sigma}} \qquad (3.26)$$

Fig. 12 shows the PDF of an extreme value distribution with $\mu = 0$ and $\sigma = 1$.
Figure 12: Extreme value PDF with $\mu = 0$ and $\sigma = 1$.
In Matlab the PDF of the extreme value distribution
can be computed as:
y = exp((mu-x)/sigma-exp((mu-x)/sigma))/sigma;
The CDF of the extreme value distribution is defined as:

$$\Phi(x) = e^{-e^{(\mu-x)/\sigma}} \qquad (3.27)$$
In Matlab the CDF of the extreme value distribution
can be computed as:
p = exp(-exp((mu-x)/sigma));
The inverse of the CDF of the extreme value distribution is defined as:

$$\Phi^{-1}(p) = \mu - \sigma\log(-\log(p)) \qquad (3.28)$$
In Matlab the inverse of the CDF of the extreme value
distribution can be computed as:
p(p<0) = NaN;
p(p>1) = NaN;
r = mu-sigma*log(-log(p));
Range: $-\infty \le X \le +\infty$
Parameters: $-\infty \le \mu \le +\infty$, $0 < \sigma$
$E(X) = \mu + \sigma\gamma$, where $\gamma$ = Euler's constant $\approx 0.577216$
$M_o = \mu$
$V(X) = \frac{\pi^2}{6}\sigma^2$
3.9 Beta distribution
The beta distribution is commonly used to represent variability over a fixed range. Another common use of this distribution is to describe empirical data and predict the random behavior of percentages and fractions [Dec04].

The beta distribution has two parameters $\alpha$ and $\beta$, and its PDF is given by:

$$\phi(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B[\alpha,\beta]}, \qquad (3.29)$$

where $B[\alpha,\beta]$ is the beta function with parameters $\alpha$ and $\beta$, given by

$$B[\alpha,\beta] = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx. \qquad (3.30)$$

Fig. 13 shows the PDF of a beta distribution with $\alpha = 2$ and $\beta = 2$.
Figure 13: PDF of the beta distribution, with $\alpha = 2$ and $\beta = 2$.
In Matlab the PDF of the beta distribution can be computed as follows, where a and b are the parameters $\alpha$ and $\beta$ respectively:
y = x.^(a-1).*(1-x).^(b-1)/beta(a,b);
Where beta(a,b) is the beta function in Eq. 3.30, which is built into Matlab. The CDF of the beta distribution is defined as:

$$\Phi(x) = \frac{\int_0^x t^{\alpha-1}(1-t)^{\beta-1}\,dt}{B[\alpha,\beta]} \qquad (3.31)$$
In Matlab the CDF of the beta distribution can be
computed by using the incomplete beta function which
is a built-in function in Matlab:
p = betainc(x,a,b);
The formula for the inverse of the CDF of the beta
distribution does not exist in a simple closed form. It
is computed numerically. In Matlab the inverse of the
CDF of the beta distribution can be computed as:
b = min(b,100000);               % guard against numerical overflow
r = a/(a+b);                     % start from the mean of the distribution
dr = 1;
% Newton iteration on betainc(r,a,b) = p, kept inside (0,1)
while any(any(abs(dr)>256*eps*max(r,1)))
    dr = (betainc(r,a,b)-p)./pdfbeta(r,a,b);
    r = r-dr;
    r = r+(dr-r)/2.*(r<0);       % fell below 0: retry at half the old iterate
    r = r+(1+(dr-r))/2.*(r>1);   % overshot 1: retry midway between old iterate and 1
end
The above code has been adapted from codes developed
by Anders Holtsberg [Hol00].
Range: $0 \le X \le 1$
Parameters: $0 < \alpha$, $0 < \beta$
$E(X) = \frac{\alpha}{\alpha+\beta}$
$M_o = \frac{\alpha-1}{\alpha+\beta-2}$, $\alpha > 1$ and $\beta > 1$
$V(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
3.10 Gamma distribution
The gamma distribution applies to a wide range of physical quantities. It is used in meteorological processes to represent pollutant concentrations and moisture quantities. The gamma distribution is also used to measure the time between occurrences of events when the event process is not completely random [Dec04].
The gamma distribution is defined by two parameters $\beta$ and $\eta$; its PDF is given by:

$$\phi(x) = \frac{1}{\eta\,\Gamma[\beta]}\left(\frac{x}{\eta}\right)^{\beta-1} e^{-x/\eta} \qquad (3.32)$$

Fig. 14 shows the PDF of a gamma distribution with $\beta = 3$ and $\eta = 1$.
Figure 14: PDF of the gamma distribution, with $\beta = 3$ and $\eta = 1$.
In Matlab the PDF of the gamma distribution can be
computed as:
x(x<0) = Inf;
y = (beta-1)*log(x/eta)-x/eta-gammaln(beta);
y(x==0 & beta==1) = 0;
y((x==Inf & isfinite(beta)) | (x<Inf & beta==Inf)) = -Inf;
y = exp(y)/eta;
Where gammaln is the logarithm of the gamma function.
No closed formulas exist in general for the CDF and the inverse of the CDF of the gamma distribution. In Matlab the CDF of the gamma distribution can be computed using the incomplete gamma function, which is built into Matlab:
p = gammainc(x/eta,beta);
p(x/eta==inf) = 1;
The Matlab code used in Eikos for computing the inverse of the CDF of the gamma distribution has been adapted from code in the WAFO toolbox [gro04]. The code can be found in Appendix A.2.
Range: $0 \le X \le +\infty$
Parameters: $0 < \beta$, $0 < \eta$
$E(X) = \beta\eta$
$M_o = \eta(\beta - 1)$, $\beta \ge 1$
$V(X) = \eta^2\beta$
3.11 Chi-square distribution

The chi-square distribution has an input parameter $n$. It is the gamma distribution with parameters $\beta = n/2$ and $\eta = 2$. The chi-square distribution is used in many cases for the critical regions of hypothesis tests and in determining confidence intervals. It is rarely used for modeling applications. The PDF of the chi-square distribution is given by:

$$\phi(x) = \frac{x^{\frac{n}{2}-1}\, e^{-\frac{x}{2}}}{2^{\frac{n}{2}}\,\Gamma\!\left[\frac{n}{2}\right]} \qquad (3.33)$$

Fig. 15 shows the PDF of a chi-square distribution with $n = 4$.
Figure 15: PDF of the chi-square distribution with $n = 4$.
Since the chi-square distribution is just a special case of the gamma distribution, restricted function calls to the gamma distribution functions have been used to implement the chi-square distribution functions.
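As a one-line hedged illustration of such a restricted call, the chi-square CDF can be obtained directly from the built-in incomplete gamma function used for the gamma CDF above, with eta = 2 and beta = n/2:

n = 4; x = 0:0.1:15;        % assumed degrees of freedom and evaluation grid
p = gammainc(x/2, n/2);     % chi-square CDF as a gamma CDF with eta=2, beta=n/2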
Range: $0 \le X \le +\infty$
Parameters: $0 < n$, $n = 1, 2, \ldots$
$E(X) = n$
$M_o = n - 2$, $n \ge 2$
$V(X) = 2n$
3.12 Weibull distribution
The Weibull distribution is a family of distributions that can assume the properties of several other distributions. For example, depending on the shape parameter you define, the Weibull distribution can be used to model the exponential and Rayleigh distributions, among others.
The Weibull distribution is widely used in the field of life phenomena, as the distribution of the lifetime of some object. It is commonly used to describe failure times in reliability studies, and the breaking strengths of materials in reliability and quality control tests. Weibull distributions are also used to represent various physical quantities, such as wind speed [Dec04].
The Weibull distribution is defined by a shape parameter $\alpha$ and a scale parameter $\beta$; its PDF is defined as:

$$\phi(x) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1} e^{-(x/\beta)^{\alpha}} \qquad (3.34)$$

Fig. 16 shows the PDF of a Weibull distribution with $\alpha = 1$ and $\beta = 2$.

Figure 16: PDF of the Weibull distribution, with $\alpha = 1$ and $\beta = 2$.
In Matlab the PDF of the Weibull distribution can be
computed as:
y = alpha*(x/beta).^(alpha-1)/beta.* ...
exp(-((x/beta).^alpha));
y(x<0) = 0;
The Weibull CDF is defined as:

$$\Phi(x) = 1 - e^{-(x/\beta)^{\alpha}} \qquad (3.35)$$
In Matlab the CDF of the Weibull distribution can be
computed as:
p = 1-exp(-((x/beta).^alpha));
The inverse of the CDF of the Weibull distribution is defined as:

$$\Phi^{-1}(p) = \beta\left(\log\frac{1}{1-p}\right)^{1/\alpha} \qquad (3.36)$$
In Matlab the inverse of the CDF of the Weibull dis-
tribution can be computed as:
p(p<0) = NaN;
p(p>1) = NaN;
r = beta*(-log(1-p)).^(1/alpha);
Range: $0 \le X \le +\infty$
Parameters: $0 < \alpha$, $0 < \beta$
$E(X) = \frac{\beta}{\alpha}\,\Gamma\!\left[\frac{1}{\alpha}\right]$
$M_o = \beta(1 - 1/\alpha)^{1/\alpha}$ for $\alpha \ge 1$; $0$ for $\alpha \le 1$
$V(X) = \frac{\beta^2}{\alpha}\left(2\,\Gamma\!\left[\frac{2}{\alpha}\right] - \frac{1}{\alpha}\,\Gamma\!\left[\frac{1}{\alpha}\right]^2\right)$
4 Model evaluations
The model function $f$ is evaluated $N$ times with input factors coming from the $N$ rows of the $N \times k$ matrix $X$. The outcomes of these $N$ evaluations are the $N$ scalar elements of the output vector $Y$.
In Eikos there is the possibility to use a standard Mat-
lab function as a model, or a Simulink model. The out-
put can, of course, have a larger dimension than N 1,
i.e. more than one output is possible. The sensitiv-
ity analyses are then performed independently for each
column in the output matrix. If f is a Matlab func-
tion then its parameters, for each iteration i = 1, . . . , N,
are just the ith row of X. The outputs of a Matlab
function are directly obtained with a comma separated
vector.
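A minimal sketch of this evaluation loop, assuming the model is a Matlab function handle f taking the k factors as separate arguments and returning a scalar:
Y = zeros(N,1);
for i = 1:N
    args = num2cell(X(i,:));   % ith row as a comma separated list
    Y(i) = f(args{:});
end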
In the case of Simulink models, Eikos looks for constant blocks or global variables. For each iteration, before evaluating the model, Eikos updates the constant blocks in the model, or assigns variables to the Matlab workspace with the current data.
The outputs from a Simulink model are obtained by
adding To Workspace-blocks for each output. Outputs
from a Simulink model are vectors over the spanned
time interval. The default simulation endpoint is the
last value in this vector, i.e. the scalar value obtained
at the last time point. Eikos has the choice of do-
ing analysis at any given time point or time points in
the evaluated time-span. This is done by interpolat-
ing within the elements of the output vector. If using
a variable step-size ODE-solver an interpolation must
be performed, since the evaluated time points are cho-
sen by the numerical algorithm and therefore are not
the ones desired. The choice to analyse the maximal
evaluated value also exists.
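As a sketch of the interpolation step (the variable names tout and yout are illustrative, standing for the time vector and output vector returned by a To Workspace block):
tq = 100;                       % requested analysis time point
yq = interp1(tout, yout, tq);   % interpolate the simulated trajectory
ymax = max(yout);               % alternative: analyse the maximum value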
From Eikos it is possible to change the time span and the specific numerical ODE solver used for evaluating the Simulink model. Various other solver-specific settings, such as maximum and minimum step size, can also be configured within Eikos.
As FORTRAN and C/C++ code can be called from Matlab if compiled with a wrapping function, Eikos can also analyse computer models developed in these languages.
5 Screening methods
When dealing with computationally expensive models containing large numbers of uncertain input factors, screening methods can be used to isolate the set of factors that has the strongest effect on the output variability with very few model evaluations. It is valuable to identify the most important input factors in an initial phase of the analysis, as the number of uncertain input factors to examine might then be reduced. It is often the case that the number of important input factors is quite small compared to the total number of input factors of a model.
The most appealing property of the screening methods is their low computational cost, i.e. the required number of model evaluations. A drawback of this feature is that the sensitivity measure is only qualitative: the input factors are ranked in order of importance, but it is not quantified how much more important a given factor is than the others.
There are various screening techniques described in the literature. One of the simplest is the one-factor-at-a-time design (OAT). In OAT designs the input factors are varied in turn and the effect each has on the output is measured. Normally, the factors that are not varied are fixed at nominal values, which are the values that best estimate the factors. A maximum and minimum value representing the range of likely values is often used for each factor. Usually the nominal value is chosen to be midway between these extremes.
The OAT designs can be used to compute the local
impact of the input factors on the model outputs. This
method is often called local sensitivity analysis. It is
usually carried out by computing partial derivatives of
the output functions with respect to the input variables.
The approach often seen in the literature is, instead of
computing derivatives, to vary the input factors in a
small interval around the nominal value. The interval
is usually a fixed (e.g. 5%) fraction of the nominal value
and is not related to the uncertainty in the value of the
factors.
In general, the number of model evaluations required
for an OAT design is of the order O(k) (often, 2k + 1),
k being the number of factors examined.
Elementary OAT designs are implemented in Eikos, with
the possibility of choosing a nominal value and extremes
for each input factor and then to vary each factor in turn
to compute local sensitivities (partial derivatives) if the
extremes are not too far away from the nominal values.
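A minimal sketch of such an OAT estimate of the local sensitivities, assuming the model f takes a 1 × k row vector, x0 is the vector of nominal values and h a vector of small perturbations:
y0 = f(x0);
S = zeros(1,k);
for i = 1:k
    x = x0;
    x(i) = x0(i) + h(i);          % vary factor i only
    S(i) = (f(x) - y0)/h(i);      % finite-difference derivative
end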
5.1 Morris method
Morris [Mor91] came up with an experimental plan that is composed of individually randomized OAT designs. The data analysis is then based on the so-called elementary effects, the changes in an output due to changes in a particular input factor in the OAT design. The method is global in the sense that it varies over the whole range of uncertainty of the input factors. The Morris method can determine if the effect of the input factor x_i on the output y is negligible, linear and additive, nonlinear, or involved in interactions with other input factors x_∼i.¹¹
According to Morris the input factor x_i may be important if:
(a) f(x_i + Δ, x_∼i) − f(x) is nonzero; then x_i affects the output.
(b) f(x_i + Δ, x_∼i) − f(x) varies as x_i varies; then x_i affects the output nonlinearly.
(c) f(x_i + Δ, x_∼i) − f(x) varies as x_∼i varies; then x_i affects the output with interactions.
Δ is the variation size.
The input factor space is discretized and the possible input factor values are restricted to lie inside a regular k-dimensional p-level grid, where p is the number of levels of the design. The elementary effect of a given value x of input factor X_i is defined as the finite-difference derivative approximation
ee_i(x) = [f(x_1, x_2, . . . , x_(i−1), x_i + Δ, x_(i+1), . . . , x_k) − f(x)]/Δ,    (5.1)
where each x_i takes values in {0, 1/(p − 1), 2/(p − 1), . . . , 1} and Δ is a predetermined multiple of 1/(p − 1). The influence of x_i is then evaluated by computing several elementary effects at randomly selected values of x_i and x_∼i.
If all samples of the elementary effect of the ith input factor are zero, then x_i does not have any effect on the output y; the sample mean and standard deviation will both be zero. If all elementary effects have the same value, then y is a linear function of x_i, and the standard deviation of the elementary effects will again be zero. For more complex behaviour, due to interactions between factors and nonlinearity, Morris states that if the mean of the elementary effects is relatively large and the standard deviation is relatively small, the effect of x_i on y is mildly nonlinear. If the opposite holds, i.e. the mean is relatively small and the standard deviation is relatively large, then the effect is supposed to be strongly nonlinear. As a rule of thumb:
(a) a high mean indicates a factor with an important overall influence on the output and
(b) a high standard deviation indicates that either the factor is interacting with other factors or the factor has nonlinear effects on the output.
11 x_i is the input factor under consideration; x_∼i is all input factors except the one under consideration.
To compute r elementary effects of each of the k inputs we would need 2rk model evaluations. With the use of the Morris randomized OAT design the number of evaluations is reduced to r(k + 1).
The design that Morris proposed is based on the construction of a (k + 1) × k orientation matrix B*. The rows of B* represent input vectors x^(1), x^(2), . . . , x^(k+1) that define a trajectory¹² in the input factor space, for which the corresponding experiment provides k elementary effects, one for each input factor. The algorithm of the Morris design is:
(a) Randomly choose a base value x* for x, sampled from the set {0, 1/(p − 1), . . . , 1}.
(b) One or more of the k elements of x* are increased by Δ, giving the vector x^(1).
(c) The estimated elementary effect of the ith component of x^(1) is (if changed by Δ)
ee_i(x^(1)) = [f(x^(1)_1, x^(1)_2, . . . , x^(1)_(i−1), x^(1)_i + Δ, x^(1)_(i+1), . . . , x^(1)_k) − f(x^(1))]/Δ,
if x^(1)_i has been increased by Δ, or
ee_i(x^(1)) = [f(x^(1)) − f(x^(1)_1, x^(1)_2, . . . , x^(1)_(i−1), x^(1)_i − Δ, x^(1)_(i+1), . . . , x^(1)_k)]/Δ,
if x^(1)_i has been decreased by Δ.
(d) Let x^(2) be the new vector (x^(1)_1, . . . , x^(1)_i ± Δ, . . . , x^(1)_k) defined above. Select a new vector x^(3) such that x^(3) differs from x^(2) in only one component j: either x^(3)_j = x^(2)_j + Δ or x^(3)_j = x^(2)_j − Δ, with j ≠ i. The estimated elementary effect of factor j is then
ee_j(x^(2)) = [f(x^(3)) − f(x^(2))]/Δ
if Δ > 0, or
ee_j(x^(2)) = [f(x^(2)) − f(x^(3))]/Δ
otherwise.
(e) The previous step is then repeated such that a succession of k + 1 input vectors x^(1), x^(2), . . . , x^(k+1) is produced, with two consecutive vectors differing in only one component.
12 Trajectory: a sequence of points starting from a random base vector in which two consecutive elements differ in only one component.
To produce the randomized orientation matrix B* containing the k + 1 input vectors, Morris proposed:
B* = (J_(k+1,1) x* + (Δ/2)((2B − J_(k+1,k))D* + J_(k+1,k)))P*,    (5.2)
where J_(k+1,k) is a (k + 1) × k matrix of ones, B is a (k + 1) × k sampling matrix which has the property that for every one of its columns i = 1, 2, . . . , k, there are two rows of B that differ only in their ith entries, D* is a k-dimensional diagonal matrix with elements ±1, and finally P* is a k × k random permutation matrix, in which each column contains one element equal to 1 and all the others equal to 0, and no two columns have ones in the same position.
B* provides one elementary effect per factor that is randomly selected; r different orientation matrices B* then have to be generated in order to provide an r × k-dimensional sample.
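A minimal sketch of constructing one randomized orientation matrix according to Eq. (5.2); p, k and Delta are assumed given (this is an illustration, not the Appendix A.3 code):
J1 = ones(k+1,1); Jk = ones(k+1,k);
B = tril(ones(k+1,k),-1);                       % strictly lower triangular ones
levels = 0:1/(p-1):1-Delta;                     % admissible base values
xstar = levels(ceil(numel(levels)*rand(1,k)));  % random base row vector
Dstar = diag(2*(rand(1,k)>0.5)-1);              % random diagonal of +/-1
I = eye(k); Pstar = I(:,randperm(k));           % random permutation matrix
Bstar = J1*xstar + (Delta/2)*((2*B-Jk)*Dstar + Jk)*Pstar;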
The main advantage of the Morris design is its relatively low computational cost. The design requires only about one model evaluation per computed elementary effect.
One drawback with the Morris design is that it only gives an overall measure of the interactions, indicating whether interactions exist, but it does not say which
are the most important. Also it can only be used with a
set of orthogonal input factors, i.e. correlations cannot
be induced on the input factors.
In an implementation, there is a need to think about the
choice of the p levels among which each input factor is
varied. In Eikos these levels correspond to quantiles of
the input factor distributions, if the distributions are
not uniform. For uniform distributions, the levels are
obtained by dividing the interval into equidistant parts.
The choice of the sizes of the levels p and realizations
r is also a problem; various experimenters have demon-
strated that the choice of p = 4 and r = 10 produces
good results [STCR04].
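One common way to realize the quantile mapping is to place equidistant levels in (0,1) and push them through the inverse CDF of the factor; the sketch below assumes a normal factor and uses norminv from the Statistics Toolbox as an illustrative choice, so the exact level placement in Eikos may differ:
p = 4;                             % number of levels
u = ((1:p)-0.5)/p;                 % equidistant quantile levels in (0,1)
x_levels = norminv(u, mu, sigma);  % grid levels for a normal factor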
The Matlab code for the Morris method can be found
in Appendix A.3.
6 Sampling-based methods
The sampling-based methods are among the most commonly used techniques in sensitivity analysis. They are computed on the basis of the input-output mapping generated by the Monte Carlo simulation. Sampling-based methods are sometimes called global, since these methods evaluate the effect of x_i while all other input factors x_j, j ≠ i, are varied as well. All input factors are varied over their entire range. This is in contrast to local perturbational approaches, where the effect of x_i is evaluated while the others x_j, j ≠ i, are kept constant at a nominal value.
In the rest of this section various common sampling-
based sensitivity analysis methods are described.
6.1 Graphical methods
Providing a means of visualizing the relationships be-
tween the output and input factors, graphical methods
play an important role in SA.
A plot of the points [X_ji, Y_j] for j = 1, 2, . . . , N, usually called a scatter plot, can reveal nonlinear or other unexpected relationships between the input factor x_i and the output y. Scatter plots are undoubtedly the simplest sensitivity analysis technique and a natural starting point in the analysis of a complex model. They facilitate the understanding of model behavior and the planning of more sophisticated sensitivity analyses. When only
one or two inputs dominate the outcome, scatter plots
alone often completely reveal the relationships between
the model input X and output Y . Using Latin Hyper-
cube sampling can be particularly revealing due to the
full stratification over the range of each input variable.
In Matlab the function scatter(X,Y) produces a scat-
ter plot of the vectors X and Y.
A tornado graph is another type of plot often used to
present the results of a sensitivity study. It is a simple bar graph where the sensitivity statistics are visualized vertically in order of descending absolute value. All of
the global sensitivity measures presented in this section
can be presented with a tornado graph.
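A hedged sketch of producing such a tornado graph from a vector S of sensitivity statistics (names is an assumed cell array of factor names):
[foo,order] = sort(abs(S));              % ascending, so the largest ends on top
barh(S(order));                          % horizontal bar graph
set(gca,'YTickLabel',names(order));      % label bars with factor names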
6.2 Regression analysis
A sensitivity measure of a model can be obtained by using multiple regression to fit the input data to a theoretical equation that could produce the output data with as small an error as possible. The most common regression technique in sensitivity analysis is least squares linear regression. The objective is thus to fit the input data to a linear equation (Ŷ = aX + b) approximating the output Y, with the criterion that the sum of the squared differences between the line and the data points in Y is minimized. A linear regression model of the N × k input sample X to the output Y takes the form:
Y_i = β_0 + Σ_{j=1}^{k} β_j X_ij + ε_i,    (6.1)
where β_j are regression coefficients to be determined and ε_i is the error due to the approximation, i.e. ε_i = Y_i − Ŷ_i.
A measure of the extent to which the regression model can match the observed data is the model coefficient of determination, R², which is defined as:
R² = Σ_{i=1}^{N} (Ŷ_i − Ȳ)² / Σ_{i=1}^{N} (Y_i − Ȳ)²,    (6.2)
where Ŷ_i is the approximated output obtained from the regression model, and Y_i and Ȳ are the original values and their mean, respectively. If R² is close to 1, then the regression model accounts for most of the variability in Y. If, on the other hand, R² is low, nonlinear behavior is implied and a linear approximation is therefore not good; another method of analysis should then be used. In Matlab (y is the vector of original values and yhat is the approximation from the regression model):
R2 = norm(yhat-mean(y))^2/norm(y-mean(y))^2;
Alternatively, R² can be obtained in Matlab by the following code (X is the input factor matrix and y is the output vector):
% compute correlation matrix
corr = corrcoef([X y]);
% invert the correlation matrix
corr = inv(corr);
% compute the model coefficient of determination
R2 = 1-1/corr(end,end);
The regression coefficients β_j, j = 1, . . . , k, measure the linear relationship between the input factors and the output. Their sign indicates whether the output increases (positive coefficient) or decreases (negative coefficient) as the corresponding input factor increases. Since the coefficients are dependent on the units in which X and Y are expressed, the normalized form of the regression model is used in sensitivity analysis:
(Ŷ_i − Ȳ)/s = Σ_{j=1}^{k} (β_j s_j / s)(X_ij − X̄_j)/s_j,    (6.3)
where
s = [Σ_{i=1}^{N} (Y_i − Ȳ)² / (N − 1)]^(1/2),  s_j = [Σ_{i=1}^{N} (X_ij − X̄_j)² / (N − 1)]^(1/2).    (6.4)
In sensitivity analysis, the standardized coefficients β_j s_j / s in Eq. (6.3), called standardized regression coefficients (SRCs), are used as a sensitivity measure.
If the X_j are independent, SRCs provide a measure of importance based on the effect of moving each variable away from its expected value by a fixed fraction of its standard deviation while retaining all other variables at their expected values. Calculating SRCs is equivalent to performing the regression analysis with the input and output variables normalized to mean zero and standard deviation one.
In Eikos, SRCs are computed with the least squares method (X is the N × k input matrix and y is the N-dimensional output vector):
% Add a constant term.
X = [ones(N,1) X];
% Find the least squares solution by the use
% of Matlab backslash operator.
% b is the vector of regression coefficients.
b = X\y;
% "Standardize" the regression coefficients.
r = b.*std(X)'/std(y);
% Remove the constant term.
SRC = r(2:end);
An alternative approach to compute the SRCs in Mat-
lab is:
% compute correlation matrix
corr = corrcoef([X y]);
% invert the correlation matrix
corr = inv(corr);
for i=1:length(corr)-1
% compute standardized regression coefficients
SRC(i,1) = -corr(i,end)/corr(end,end);
end
6.3 Correlation coefficients
The correlation coefficients (CC), usually known as Pearson's product moment correlation coefficients, provide a measure of the strength of the linear relationship between two variables. The CC between two N-dimensional vectors x and y is defined by:
ρ_xy = Σ_{k=1}^{N} (x_k − x̄)(y_k − ȳ) / ([Σ_{k=1}^{N} (x_k − x̄)²]^(1/2) [Σ_{k=1}^{N} (y_k − ȳ)²]^(1/2)),    (6.5)
where x̄ and ȳ are the means of x and y respectively. CC can be reformulated as:
ρ_xy = cov(x, y)/(σ(x)σ(y)),    (6.6)
where cov(x, y) is the covariance between the data sets x and y, and σ(x) and σ(y) are the sampled standard deviations.
The correlation coefficient is then the normalized covariance between the two data sets and (like SRC) produces a unitless index between −1 and +1. CC is equal in absolute value to the square root of the model coefficient of determination R² associated with the linear regression.
In Matlab there is a function corrcoef(x,y) that computes the correlation coefficient, but since it returns the whole correlation matrix an alternative code is given by:
x_m = x-mean(x); %Remove mean of x
y_m = y-mean(y); %Remove mean of y
SSxx = x_m'*x_m;
SSyy = y_m'*y_m;
SSxy = x_m'*y_m;
rho = SSxy/sqrt(SSxx*SSyy);
CC only measures the linear relationship between two variables, without considering the effect that other possible variables might have. So when more than one input factor is under consideration, as is usually the case, partial correlation coefficients (PCCs) can be used instead to provide a measure of the linear relationship between two variables when all linear effects of the other variables are removed. The PCC between an individual variable X_i and Y can be obtained by using a sequence of regression models. Begin by constructing the following regression models:
X̂_i = c_0 + Σ_{j=1, j≠i}^{k} c_j X_j,   Ŷ = b_0 + Σ_{j=1, j≠i}^{k} b_j X_j.    (6.7)
The PCC is then defined as the CC of X_i − X̂_i and Y − Ŷ. PCC can also be written in terms of simple correlation coefficients. Denote the PCC of X_1 and Y, holding Z = X_2, . . . , X_k fixed, by ρ_{X_1 Y | X_2,...,X_k}. Then
ρ_{X_1 Y | Z} = (ρ_{X_1 Y} − ρ_{X_1 Z} ρ_{Y Z}) / [(1 − ρ²_{X_1 Z})(1 − ρ²_{Y Z})]^(1/2).    (6.8)
The following Matlab code computes PCC for every
input factor in the input matrix:
% compute correlation matrix
corr = corrcoef([X y]);
for i=1:length(corr)-1
% 2 by 2 correlation matrix of Xi and y.
S11 = [1 corr(i,end);corr(i,end) 1];
% k-1 by k-1 correlation matrix of X~i.
S22 = corr;
% Remove effects of Xi and y.
S22(:,i)=[];S22(i,:)=[];S22(:,end)=[];S22(end,:)=[];
% 2 by k-1 matrix of all pairwise
% correlations between (Xi,y) and X~i.
S12 = [corr(i,:); corr(end,:)];
% Remove effects of Xi and y.
S12(:,i)=[];S12(:,end)=[];
% Form the conditional correlation matrix of (Xi,y)
% given X~i.
S112 = S11-S12*inv(S22)*S12';
% PCC is the off-diagonal of S112 divided by the
% geometric mean of the two diagonals.
PCC(i,1) = S112(1,2)/sqrt(S112(1,1)*S112(2,2));
end
An alternative approach to compute the PCCs in Mat-
lab is:
% compute correlation matrix
corr = corrcoef([X y]);
% invert the correlation matrix
corr = inv(corr);
for i=1:length(corr)-1
% compute partial correlation coefficient
PCC(i,1) = -corr(i,end)/sqrt(corr(i,i)*corr(end,end));
end
PCC characterizes the strength of the linear relation-
ship between two variables after a correction has been
made for the linear eects of the other variables in the
analysis. SRC, on the other hand, characterizes the effect on the output variable that results from perturbing an input variable by a fixed fraction of its standard deviation. Thus, PCC and SRC provide related, but not
identical, measures of the variable importance. When
input factors are uncorrelated, results from PCC and
SRC are identical.
6.4 Rank transformations
Since the above methods are based on the assumption of linear relationships between the input and output factors, they will perform poorly if the relationships are nonlinear. Rank transformation of the data can be used to transform a nonlinear but monotonic¹³ relationship into a linear relationship. When using rank transformation the data are replaced with their corresponding ranks. Ranks are defined by assigning 1 to the smallest value, 2 to the second smallest, and so on, until the largest value has been assigned the rank N. If there are ties in the data set, then the average rank is assigned to them. The usual regression and correlation procedures are then performed on the ranks instead of the original data values. Standardized rank regression coefficients (SRRC) are SRC calculated on ranks, the Spearman rank correlation coefficient (RCC) is the corresponding CC calculated on ranks, and partial rank correlation coefficients (PRCC) are PCC calculated on ranks.
13 A function is monotone if its output only increases or decreases with the inputs, i.e. it preserves the order.
The model coefficient of determination R² in Eq. (6.2) can be computed with the ranked data, and then measures how well the model matches the ranked data.
Rank-transformed statistics are more robust, and provide a useful solution in the presence of long-tailed input-output distributions. A rank-transformed model is not only more linear, it is also more additive. Thus the relative weight of the first-order terms is increased at the expense of higher-order terms and interactions.
The following Matlab code is used in Eikos to acquire the ranks r of a vector x:
[foo,i ] = sort(x);          % ascending sort
[foo,r1] = sort(i);          % ranks, ties in original order
[foo,i ] = sort(flipud(x));  % sort the reversed vector
[foo,r2] = sort(i);          % ranks, ties in reverse order
r = (r1+flipud(r2))/2;       % average the two, giving mean ranks for ties
6.5 Two-sample tests
Two-sample tests were originally designed to check the hypothesis that two different samples belong to the same population [Con99]. In sensitivity analysis, two-sample tests can be used together with Monte Carlo filtering. In Monte Carlo filtering the input samples are partitioned into two sub-samples according to restrictions on the output distribution. The two-sample test is then performed on the two empirical cumulative distributions of the sub-samples, and measures whether their difference is significant or not, i.e. if the two distributions are different, then it can be said that the factor influences the output.
To obtain in Matlab the empirical cumulative distributions F_1(x) and F_2(x) of two sub-samples X_1 and X_2, the following code can be used (X1, X2 are the two input samples and F1x, F2x are their corresponding empirical CDFs):
% Set up an EDGES vector.
edges = [-inf;sort([X1;X2]);inf];
% Count number of values falling
% between elements in EDGES vector.
counts1 = histc(X1,edges);
counts2 = histc(X2,edges);
% Normalized cumulative sum of the values.
sumCounts1 = cumsum(counts1)./sum(counts1);
sumCounts2 = cumsum(counts2)./sum(counts2);
% Remove last element.
F1x = sumCounts1(1:end-1);
F2x = sumCounts2(1:end-1);
The Smirnov test is defined as the greatest vertical distance between the two empirical cumulative distributions:
smirnov(Y, X_j) = max_{X_j} |F_1(X_j) − F_2(X_j)|,    (6.9)
where F_1 and F_2 are the cumulative distributions of X_j estimated on the two sub-samples and the difference is evaluated at all the points x_ij, i = 1, . . . , N. In Matlab the Smirnov statistic is computed as:
% Smirnov test
smirnov = max(abs(F1x - F2x));
A test similar to the Smirnov test, but with slightly more calculations, is the Cramer-von Mises test, which is given by:
cramer(Y, X_j) = (N_1 N_2/(N_1 + N_2)²) Σ_{X_j} (F_1(X_j) − F_2(X_j))²,    (6.10)
where the squared difference in the summation is computed at each x_ij, i.e. the statistic depends upon the total area between the two distributions. In Matlab (n1, n2 are respectively the number of samples in the two sub-samples X_1 and X_2):
% Cramer-Von Mises test
cramer = n1*n2/(n1+n2)^2*sum((F1x - F2x).^2);
6.6 Discussion
According to Saltelli et al. [SM90], [SH91], the estimators PRCC and SRRC appear to be, in general, the most robust and reliable of the methods described in this section, followed by Spearman's RCC and the Smirnov test. Nevertheless, all the above tests have been included in Eikos since they could be of use in different contexts.
The rankings of SRRC and PRCC are usually identical, so they could be considered redundant. Differences occur only when there are significant correlations amongst the input factors.
Predictions using SRRC are strongly correlated with those of Spearman's RCC and the Smirnov test. Therefore, the value of the model coefficient of determination R² plays a crucial role, as it indicates the degree of reliability of the regressed model for SRRCs as well as for the other techniques.
7 Variance-based methods
The main idea of the variance-based methods is to quantify the amount of variance that each input factor X_i contributes to the unconditional variance of the output, V(Y).
We consider the model function Y = f(X), where Y is the output and X = (X_1, X_2, . . . , X_k) are k independent input factors, each one varying over its own probability density function.
We want to rank the input factors according to the amount of variance that would disappear if we knew the true value x_i* of a given input factor X_i. V(Y | X_i = x_i*) is the conditional variance of Y given X_i = x_i*, and is obtained by taking the variance over all factors but X_i.
In most cases we do not know the true value x_i* for each X_i. Therefore, the average of this conditional variance over all possible values x_i* of X_i is used, i.e. E[V(Y | X_i)] is the expectation value over the whole variation interval of the input X_i. Having the unconditional variance of the output V(Y), the above average, and using the following property of the variance:
V(Y) = V(E[Y | X_i]) + E[V(Y | X_i)],    (7.1)
we obtain the variance of the conditional expectation V_i = V(E[Y | X_i]). This measure is sometimes called the main effect and is used as an indicator of the importance of X_i on the variance of Y, i.e. the sensitivity of Y to X_i. Normalizing the main effect V_i by the unconditional variance of the output we obtain:
S_i = V(E[Y | X_i]) / V(Y).    (7.2)
The ratio S_i was named the first order sensitivity index by Sobol [Sob93]. Various other names for this ratio can be found in the literature: importance measure, correlation ratio and first order effect.
The first order sensitivity index measures only the main effect contribution of each input parameter to the output variance. It does not take into account the interactions between input factors. Two factors are said to interact if their total effect on the output is not the sum of their first order effects. The effect of the interaction between two orthogonal factors X_i and X_j on the output Y would in terms of conditional variances be:
V_ij = V(E[Y | X_i, X_j]) − V(E[Y | X_i]) − V(E[Y | X_j]).    (7.3)
V(E[Y | X_i, X_j]) describes the joint effect of the pair (X_i, X_j) on Y. This effect is known as the second-order effect.
Higher-order effects can be computed in a similar fashion, i.e. the variance of the third-order effect between the three orthogonal factors X_i, X_j and X_l would be:
V_ijl = V(E[Y | X_i, X_j, X_l]) − V_ij − V_il − V_jl − V_i − V_j − V_l.    (7.4)
A model without interactions is said to be additive, for example a linear one. The first order indices sum to one in an additive model with orthogonal inputs. For additive models, the first order indices coincide with what can be obtained with regression methods (described in Section 6.2). For non-additive models we would like information from all interactions as well as the first order effect. For non-linear models the sum of all first order indices can be very low. The sum of all the order effects that a factor accounts for is called the total effect [HS96]. So for an input X_i, the total sensitivity index S_Ti is defined as the sum of all indices relating to X_i (first and higher order). For a model with three input factors (k = 3), the total sensitivity index for input factor X_1 would be: S_T1 = S_1 + S_12 + S_13 + S_123. Computing all order effects to obtain the total effect by brute force is not advisable when the number of input factors k increases, since as many as 2^k − 1 terms need to be evaluated.
In the following subsections, four methods to obtain these sensitivity indices are described: the Sobol indices, Jansen's Winding Stairs technique, the Fourier Amplitude Sensitivity Test (FAST) and the Extended Fourier Amplitude Sensitivity Test (EFAST). All methods except FAST can obtain both the first and total order effects in an efficient way; the standard FAST method only computes the first order effect. The Sobol method can obtain effects of all orders, but the number of required model evaluations then grows too quickly to be practical.
The input factor space Ω^k is hereafter assumed to be the k-dimensional unit hypercube:
Ω^k = (X | 0 ≤ X_i ≤ 1; i = 1, . . . , k).    (7.5)
This gives no loss of generality, since the factors can be deterministically transformed from uniform distributions to a general PDF as described in Section 3. The input factors are also assumed to be orthogonal to each other; thus no correlation structure can be induced on the input factors.
The expected value of the output E(Y) can be evaluated by the k-dimensional integral:
E(Y) = ∫_{Ω^k} f(X)p(X)dX = ∫_{Ω^k} f(X)dX,    (7.6)
where p(X) is the joint probability density function, assumed to be uniform for each input factor.
7.1 Sobol indices
Sobol [Sob93] introduced the first order sensitivity index by decomposing the model function f into summands of increasing dimensionality:
f(X_1, . . . , X_k) = f_0 + Σ_{i=1}^{k} f_i(X_i) + Σ_{i=1}^{k} Σ_{j=i+1}^{k} f_ij(X_i, X_j) + . . . + f_{1...k}(X_1, . . . , X_k).    (7.7)
This representation of the model function f(X), which is a decomposition into summands of increasing dimensionality, holds if f_0 is a constant (f_0 is the expectation of the output, i.e. E[Y]) and the integrals of every summand over any of its own variables are zero, i.e.:
∫_0^1 f_{i_1...i_s}(X_{i_1}, . . . , X_{i_s}) dX_{i_k} = 0, if 1 ≤ k ≤ s.    (7.8)
As a consequence of this, all the summands are mutually orthogonal.
The total variance V(Y) is defined as:
V(Y) = ∫_{Ω^k} f²(X)dX − f_0²    (7.9)
and the partial variances are computed from each of the terms in Eq. (7.7):
V_{i_1...i_s} = ∫_0^1 · · · ∫_0^1 f²_{i_1...i_s}(X_{i_1}, . . . , X_{i_s}) dX_{i_1} . . . dX_{i_s},    (7.10)
where 1 ≤ i_1 < · · · < i_s ≤ k and s = 1, . . . , k.
The sensitivity indices are then obtained by:
S_{i_1...i_s} = V_{i_1...i_s} / V(Y), for 1 ≤ i_1 < · · · < i_s ≤ k.    (7.11)
The integrals in Eq. (7.9) and Eq. (7.10) can be computed with Monte Carlo methods. For a given sample size N the Monte Carlo estimate of f_0 is:
f̂_0 = (1/N) Σ_{m=1}^{N} f(X_m),    (7.12)
where X_m is a sampled point in the input space Ω^k. In Matlab (the sample matrix X is of size N × k and function2evaluate is the model function under consideration):
Y = function2evaluate(X);
f0 = mean(Y);
The Monte Carlo estimate of the output variance V(Y) is:
V̂(Y) = (1/N) Σ_{m=1}^{N} f²(X_m) − f̂_0².    (7.13)
In Matlab:
% second parameter makes var() normalize
% by N instead of N-1
V = var(Y,1);
The main effect of input factor X_i is estimated as:
V̂_i = (1/N) Σ_{m=1}^{N} f(X^(M1)_∼im, X^(M1)_im) f(X^(M2)_∼im, X^(M1)_im) − f̂_0².    (7.14)
Here we need to use two sampling matrices, X^(M1) and X^(M2), both of size N × k. X^(M1)_∼im means the mth sample from X^(M1) for the full set of factors except the ith one. Matrix X^(M1) is usually called the data base matrix while X^(M2) is called the resampling matrix [CST00].
In Matlab this is implemented as (XM1 and XM2 are the two different sampling matrices, both of size N × k; n = N and f02 denotes the squared estimate f̂_0²):
YXM1 = function2evaluate(XM1);
for i=1:k
XM2_i = XM2;
XM2_i(:,i) = XM1(:,i);
YXM2_i = function2evaluate(XM2_i);
Vi(i) = 1/n*sum(YXM1.*YXM2_i)-f02;
end
For formulas for computing partial variances of higher order than in Eq. (7.14) I refer to [HS96]; a separate MC integral is required to compute each effect. Counting the MC integral needed for computing f̂_0, a total of 2^k MC integrals are therefore needed for a full characterization of the system.
In 1996, Homma and Saltelli proposed an extension for direct evaluation of the total sensitivity index S_Ti [HS96]. Now S_Ti can be evaluated with just one Monte Carlo integral instead of computing the 2^k integrals. They suggested dividing the set of input factors into two subsets, one containing the given variable X_i and the other containing its complementary set X_ci. The decomposition of f(X) would then become:
f(X) = f_0 + f_i(X_i) + f_ci(X_ci) + f_{i,ci}(X_i, X_ci).    (7.15)
The total output variance V(Y) would be:
V(Y) = V_i + V_ci + V_{i,ci}    (7.16)
and the total effect sensitivity index S_Ti is:
S_Ti = S_i + S_{i,ci} = 1 − S_ci.    (7.17)
Thus, to obtain the total sensitivity index for variable X_i we only need to obtain its complementary index S_ci = V_ci/V(Y). Homma and Saltelli [HS96] show that V_ci can be estimated with just one Monte Carlo integral:
V̂_ci = (1/N) Σ_{m=1}^{N} f(X^(M1)_∼im, X^(M1)_im) f(X^(M1)_∼im, X^(M2)_im) − f̂_0².    (7.18)
In Matlab:
for i=1:k
XM1_i = XM1;
XM1_i(:,i) = XM2(:,i);
YXM1_i = function2evaluate(XM1_i);
Vci(i)= 1/n*sum(YXM1.*YXM1_i)-f02;
end
So to compute the first and total order sensitivity indices we take S_i = V̂_i / V̂(Y) and S_Ti = 1 − V̂_ci / V̂(Y) respectively.
In Matlab:
Si = Vi/V;
Sti = 1-Vci/V;
To obtain both first and total order sensitivity indices for k factors and N samples with the Sobol method, we need to make N(2k + 1) model evaluations (this if we use either X^(M1) or X^(M2) to compute the unconditional variance V(Y), as described in Eq. (7.13)).
When the mean value (f_0) is large, a loss of accuracy is induced in the computation of the variances by the Monte Carlo methods presented above [Sob01]. If we define c_0 as an approximation to the mean value f_0, the new model function f(x) − c_0 could be used instead of the original model function f(x) in the analysis. For the new model function the constant term will be very small, and the loss of accuracy due to a large mean value will disappear. This transformation is not implemented in Eikos; a sketch of it is given below.
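The transformation itself is essentially a one-liner; a sketch assuming a pilot sample Xpilot is available for estimating the mean:
c0 = mean(function2evaluate(Xpilot));       % rough estimate of f0
fc = @(X) function2evaluate(X) - c0;        % centred model to analyse instead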
When computing Sobol indices, the standard MC sampling schemes are usually not used; instead Sobol's LP_τ sequences are used. LP_τ sequences are quasi-random sequences used to produce points uniformly distributed in the unit hypercube. The difference between quasi-random numbers and uncorrelated pseudo-random numbers is that the quasi-random numbers maintain a nearly uniform density of coverage of the domain, while pseudo-random numbers may leave some places relatively undersampled and cluster points in others. The main reason to use quasi-random numbers instead of pseudo-random numbers in an MC simulation is that the former converge faster [Sob01].
Eikos has not yet incorporated algorithms to produce LP_τ sequences; thus the Sobol indices have to be computed on the basis of pseudo-random numbers.
7.2 Jansen (Winding Stairs)
Chan, Saltelli and Tarantola [CST00] proposed the use of a new sampling scheme to compute both first and total order sensitivity indices in only N·k model evaluations. The sampling method used to measure the main effect was called Winding Stairs, developed by Jansen, Rossing and Deemen in 1994 [JRD94]. The main effect is computed as:
V^J_i = V(Y) − (1/2) E[f(X_∼i, X_i) − f(X_∼i, X'_i)]²,    (7.19)
and the total effect is computed as:
V^J_Ti = (1/2) E[f(X_∼i, X_i) − f(X'_∼i, X_i)]².    (7.20)
Here (X'_∼i, X_i) denotes that all variables are resampled except for the ith one. Jansen's method uses the squared differences of two sets of model outputs to compute the indices, whereas the Sobol method uses the product [CTSS00]. It has been shown that the covariances in
(1/2) E[f(X_∼i, X_i) − f(X_∼i, X'_i)]² = V(Y) − Cov[f(X_∼i, X_i), f(X_∼i, X'_i)]    (7.21)
and
(1/2) E[f(X_∼i, X_i) − f(X'_∼i, X_i)]² = V(Y) − Cov[f(X_∼i, X_i), f(X'_∼i, X_i)]    (7.22)
are equivalent to those of the Sobol first and total partial variances.
The Winding Stairs sampling scheme was designed to make multiple use of model evaluations. With a single series of model evaluations, it can compute both the first-order and the total sensitivity indices. The Winding Stairs method consists in computing the model output after each drawing of a new value for an individual parameter, building up a so-called WS-matrix.
The WS-matrix is set up in such a way that no two observations within a column share common input parameters. Therefore, the outputs within each column of the matrix are independent and can be used to estimate the variance of the output. In total, k(N + 1) input points are generated; for the theory on how the WS-matrix is cyclically built up I refer to [CST00]. An example of the output Winding Stairs matrix for k = 3 and N = 5:
[ y_1  y_2  y_3  ]   [ f(X_11, X_21, X_31)  f(X_11, X_22, X_31)  f(X_11, X_22, X_32) ]
[ y_4  y_5  y_6  ]   [ f(X_12, X_22, X_32)  f(X_12, X_23, X_32)  f(X_12, X_23, X_33) ]
[ y_7  y_8  y_9  ] = [ f(X_13, X_23, X_33)  f(X_13, X_24, X_33)  f(X_13, X_24, X_34) ]
[ y_10 y_11 y_12 ]   [ f(X_14, X_24, X_34)  f(X_14, X_25, X_34)  f(X_14, X_25, X_35) ]
[ y_13 y_14 y_15 ]   [ f(X_15, X_25, X_35)  f(X_15, X_26, X_35)  f(X_15, X_26, X_36) ]    (7.23)
Matlab code to build up the WS matrix is as follows (the sample matrix Xsample is of size (N + 1) × k):
for i=1:k
X = Xsample(1:N,:);
WS(1:N,i) = function2evaluate(X);
if i~=k
Xsample(1:N,i+1) = Xsample(2:N+1,i+1);
end
end
The WS sample estimate of V(Y) is then computed as:
V̂^WS(Y) = (1/(k(N − 1))) Σ_{i=1}^{k} [ Σ_{m=1}^{N} y²(m, i) − (1/N)(Σ_{m=1}^{N} y(m, i))² ],    (7.24)
where y(m, i) is the (m, i)th element of the WS-matrix.
In Matlab:
V = 1/k*sum(var(WS));
Estimates of the main effect V_i are computed as:
V̂^WS_i = V̂^WS(Y) − (1/(2N)) Σ_{j=1}^{N} [y_(k(j−1)+i) − y_(kj+i−1)]²,    (7.25)
where y_k is the kth y as in the example WS-matrix in Eq. (7.23), circularly shifted if the index is out of bounds.
Matlab supports one-dimensional (linear) indexing into the two-dimensional WS-matrix. But since Matlab stores its matrices column-wise instead of row-wise, as in other programming languages like C, we first need to transpose the WS-matrix:
WS = WS';
for i=1:k
Vi(i) = V-1/2/N*sum((WS(k*(i-1:N-1)+i)- ...
WS([k*(i:N-1)+i-1 ...
mod(i-2,N*k)+1])).^2);
end
Estimates of the complementary effect V_ci are computed as:
V̂^WS_ci = V̂^WS(Y) − (1/(2N)) Σ_{j=1}^{N} [y_(jk) − y_(jk+1)]²,  if i = 1,
V̂^WS_ci = V̂^WS(Y) − (1/(2N)) Σ_{j=1}^{N} [y_(k(j−1)+i−1) − y_(k(j−1)+i)]²,  if i ≠ 1.
In Matlab:
for i=1:k
if i==1
Vci(i) = V-1/2/N*sum((WS(k*(1:N))- ...
WS([k*(1:N-1)+1 1])).^2);
else
Vci(i) = V-1/2/N*sum((WS(k*(0:N-1)+i-1)- ...
WS(k*(0:N-1)+i)).^2);
end
end
As before, the first and total order sensitivity indices are computed as S_i = V̂^WS_i / V̂^WS(Y) and S_Ti = 1 − V̂^WS_ci / V̂^WS(Y) respectively.
7.3 Fourier Amplitude Sensitivity Test
The Fourier Amplitude Sensitivity Test (FAST) was proposed already in the 70s [CFS+73, SS73, CSS75], and was at the time successfully applied to two chemical reaction systems involving sets of coupled, nonlinear rate equations.
The main idea underlying the FAST method is to convert the k-dimensional integral in Eq. (7.6) into a one-dimensional integral, applying a theorem proposed by Weyl [Wey38]. Each uncertain input factor X_i is related to a frequency ω_i and transformed by X_i(s) = G_i(sin(ω_i s)), where G_i is a suitably defined parametric equation which allows each factor to be varied in its range as the parameter s is varied. The set {ω_1, . . . , ω_k} are linearly independent integer frequencies. The parametric equations define a curve that systematically explores the whole input factor space Ω^k. According to Chan et al. [CTSS00], the multidimensional integral in Eq. (7.6) can be estimated by integrating over the curve:
Ê(Y) = (1/2π) ∫_{−π}^{π} f(s)ds,    (7.26)
where f(s) = f(G_1(sin(ω_1 s)), G_2(sin(ω_2 s)), . . . , G_k(sin(ω_k s))).
The output variance may be approximated by performing a Fourier analysis:
V̂^FAST(Y) = (1/2π) ∫_{−π}^{π} f²(s)ds − Ê²(Y) ≈ Σ_{j=−∞}^{∞} (A_j² + B_j²) − (A_0² + B_0²) ≈ 2 Σ_{j=1}^{N} (A_j² + B_j²),    (7.27)
where A_j and B_j are the Fourier coefficients defined as:
A_j = (1/2π) ∫_{−π}^{π} f(s) cos(js)ds,
B_j = (1/2π) ∫_{−π}^{π} f(s) sin(js)ds.    (7.28)
Finally, the partial variances are approximated by:
V̂^FAST_i = 2 Σ_{p=1}^{M} (A²_(pω_i) + B²_(pω_i)),    (7.29)
where M is the maximum harmonic we consider¹⁴, usually assigned the value 4 or 6.
An application needs to numerically evaluate the Fourier coefficients A_(pω_i) and B_(pω_i). McRae et al. [MTS82] proposed the following difference expressions for the Fourier coefficients, which can be derived by a simple numerical quadrature technique:
A_j = 0, if j is odd,
A_j = (1/N)(y_0 + Σ_{p=1}^{q} (y_p + y_{−p}) cos(πjp/N)), if j is even,
B_j = 0, if j is even,
B_j = (1/N)(Σ_{p=1}^{q} (y_p − y_{−p}) sin(πjp/N)), if j is odd,
where q = (N − 1)/2.
Matlab code to compute the Fourier coefficients AC and BC is as follows (Y is the N × 1 output vector):
AC = zeros(N,1); % initially zero
BC = zeros(N,1); % initially zero
q = (N-1)/2;
N0 = q+1;
for j=2:2:N % j is even
AC(j) = 1/N*(Y(N0)+(Y(N0+(1:q))+Y(N0-(1:q)))* ...
cos(pi*j*(1:q)/N));
end
for j=1:2:N % j is odd
BC(j) = 1/N*(Y(N0+(1:q))-Y(N0-(1:q)))* ...
sin(pi*j*(1:q)/N);
end
Saltelli et al. [STC99] recommend a suitable transformation parametric equation G_i defined as:
X_i(s) = G_i(sin(ω_i s)) = 1/2 + (1/π) arcsin(sin(ω_i s)).    (7.30)
According to Saltelli et al., this transformation provides more uniformly distributed samples for each factor X_i in the unit hypercube Ω^k than those proposed by various others, among them Cukier et al. [CFS+73].
Matlab code to generate the N × k input factor matrix X according to Eq. (7.30) (OM is a 1 × k vector of frequencies):
S = pi/2*(2*(1:N)-N-1)/N;        % N sample points symmetric around 0
ANGLE = S'*OM;                   % N by k matrix of omega_i*s values
X = 0.5+asin(sin(ANGLE))/pi;
Saltelli et al. [STC99] have shown that, according to the Nyquist criterion, the minimum sample size required to compute V̂^FAST_i is 2Mω_max + 1, where ω_max is the largest frequency in the set {ω_1, . . . , ω_k}.
14 M is the maximum number of Fourier coefficients that may be retained in calculating the partial variances without interference between the assigned frequencies.
In 1998, Saltelli and Bolado [SB98] proved that the ratio V̂^FAST_i / V̂^FAST(Y) computed with the FAST method is equivalent to the first order sensitivity index proposed by Sobol [Sob93].
The Matlab code for FAST can be found in Appendix
A.4.
7.4 Extended Fourier Amplitude Sensitivity Test
In 1999, Saltelli et al. [STC99] proposed an improvement of the FAST method. They called it the Extended Fourier Amplitude Sensitivity Test (EFAST). With this method they could estimate the total effect indices, as in the Sobol method, by estimating the variance of the complementary set, V̂^FAST_ci. This is done by assigning a frequency ω_i to the factor X_i (usually high) and almost identical frequencies to the rest, ω_∼i (usually low). The partial variance of the complementary set is then computed as:
V̂^FAST_ci = 2 Σ_{p=1}^{M} (A²_(pω_∼i) + B²_(pω_∼i)).    (7.31)
A modification of the parametric equation in Eq. (7.30) was also introduced to get a more flexible sampling scheme. Since G_i in Eq. (7.30) always returns exactly the same points in Ω^k, a random phase-shift φ_i was added. The new equation becomes
X_i(s) = G_i(sin(ω_i s)) = 1/2 + (1/π) arcsin(sin(ω_i s + φ_i)).    (7.32)
Because of symmetry properties the curve must now be sampled over (−π, π). The technique of using many phases, generating different curves in Ω^k, doing independent Fourier analyses over them, and finally taking the arithmetic mean over the estimates is called resampling.
A whole new set of model evaluations is needed to compute each of the k complementary variances V̂^FAST_ci, so the computational cost to obtain all first and total order indices is k(2Mω_max + 1)N_r, where N_r is the number of resamples.
The Matlab code for EFAST can be found in Ap-
pendix A.5.
7.5 Discussion
The variance-based methods described in this section are considered to be quantitative SA methods. All the methods can compute the main effect contribution of each input factor to the output variance. The total sensitivity index can also be obtained by the Sobol, Jansen and EFAST methods. The total effect index is a more accurate measure of the influence of a factor on the model output, since it takes into account all interaction effects involving that factor.
To compute the main effect contributions, the FAST method only requires a single set of model evaluations. EFAST needs k(2Mω_max + 1)N_r model evaluations to compute both the main effect and the total effect contributions. To compute the Sobol indices, the required number of model runs is N(2k + 1) using the Sobol method and only N·k model evaluations using the WS-sampling scheme, which was designed to make multiple use of model evaluations. Chan et al. [CTSS00] express a concern that the reduction of model evaluations in Jansen's method might affect the accuracy of the obtained estimates.
Quasi-random sampling has not been implemented in Eikos, and as it is the commonly used sampling scheme for computing the Sobol indices, the other methods might be preferable.
8 Eikos: The application
Eikos includes a graphical user interface (GUI) which facilitates the SA process. Essentially, the GUI is a package that puts together all the implemented methods with the help of GUIDE¹⁵. Although the GUI has not been the essential part of this work, a reasonable amount of time has nevertheless been spent on its development. It should be noted that GUIDE is not a very flexible tool and the GUI therefore suffers from several bloopers and smaller bugs.
In this section several essential views of the GUI are presented.
8.1 The main window
The user can start Eikos by typing
>>eikos
at the Matlab prompt. The Main window, shown in
Fig. 17, will then appear.
The main window is the base of the toolbox. It is divided into three sections: Pre Processing, Model Evaluation, and Post Processing. Usually a SA will begin with the functions in pre processing, then continue with the ones in model evaluation and finish with post processing, although there are some cases that follow a different order of operations.
15 GUIDE is the Matlab GUI development toolkit.
The user starts by choosing which type of model will be analysed. With the radio buttons in the Model Type panel, a Simulink model or a Matlab function can be chosen. These two different model types are described in the next two sub-sections.
Figure 17: Main window of Eikos.
8.2 Using a Simulink model
To analyse a Simulink model the user should choose Simulink model in the Model Type panel. Simulink-specific buttons then become enabled and the Matlab-function-specific buttons disabled. The user then has to load a specific Simulink model into Eikos, by pushing the button Load model. Eikos automatically finds all possible input/output factors in the loaded Simulink model and saves them into a structure. Solver-specific settings are also saved into the structure. By clicking on the button Solver Config, the window in Fig. 18 appears and the user can choose the simulation time interval, the solver to be used and the solver-specific settings. If the user wants to run the model only once, then he just has to choose the output factors of interest, by pushing the button Select in the Output Factors panel, and push the RUN button in the Model Evaluation section. The results can be plotted by pushing the Single Run button in the Post Processing section.
Figure 18: Window for choosing solver-specific settings.
If the user wants to do a SA/UA on specific input factors, he has to select the input factors of interest by pushing the Select button in the Input Factors panel. Then the user needs to assign a distribution to each of them, by pushing the button Configure (Section 8.4 shows how to assign distributions to the factors). A correlation matrix can be assigned to induce correlation between the input factors by pushing the Correlate button in the Input Factors panel. In the Model Evaluation panel the user can specify the time point(s) for which the SA is going to be performed. If no time point is specified, then the SA will be performed for the last simulation time. The user can also choose to perform the SA for the maximum value of the output.
8.3 Using a standard Matlab function
To perform SA/UA on a Matlab function, the user should choose Matlab model in the Model Type panel, and add input and output factors using the Select buttons in the corresponding factor panel. Now any function, with parameters corresponding to the input factors and outputs to the desired output factors, can be written in the Matlab fnc/expr edit block.
8.4 The distribution chooser window
When pushing the button Configure in the Input Factors panel, the distribution chooser window appears (Fig. 19 shows the normal distribution with mean μ = 2.5 and standard deviation σ = 1, truncated to [0,6], assigned to input factor X1).
In the distribution chooser window the user can assign a distribution to each of the different input factors. All probability distributions described in Section 3 can be chosen. The distributions can be truncated, so that the probability of getting very large or very small values is zero. Constant values can be set for input factors that are not supposed to be included in the SA.
Figure 19: Distribution chooser window
8.5 The correlation matrix window
When the user pushes the Correlate button in the Input Factors panel, the correlation matrix window appears. In the correlation matrix window the user can assign a correlation structure describing pairwise dependencies between the input factors. The window is generic and extends and contracts according to the number of input factors under consideration. Fig. 20 shows how the window would look if a correlation matrix such as in Eq. (2.3) was desired.
Only the lower triangular part of the matrix can be changed by the user. The diagonal always consists of ones, and the upper triangular part automatically mirrors the lower part. The values are restricted to the range [-1,1]. If the matrix is not positive semi-definite the user will not be able to save it. If the user does not open the correlation matrix window, and thus does not choose any correlation matrix, rank correlation is not induced on the input samples.
8.6 The uncertainty analysis window
After running a simulation, the user can open the uncertainty analysis window (Fig. 21) by pushing the UA button in the Post Processing section. The output of a Monte Carlo simulation can be visualized with a frequency plot, cumulative plot or inverse cumulative plot of the empirical distribution. These can be shown as histogram, area, stem or line plots. The mean, variance and standard deviation of the empirical distributions are also shown. The distributions assigned to the inputs and computed for the outputs can also be visualized in this window.
Figure 21: Uncertainty analysis window showing the
Weibull distribution assigned to the input factor X1.
For the input factors, the analytical PDF curve of the chosen distribution is also plotted in the chart (Fig. 21), which gives an indication of whether the samples estimate the desired distribution well enough.
8.7 The sensitivity analysis window
To visualize the sampling-based sensitivity analysis, the user should open, after a simulation, the sensitivity analysis window by pushing the SA button in the Post Processing section. A plot with the values of the statistics for each input factor, as well as a tornado plot for the sampling-based methods, is shown.
The results shown in Fig. 22 indicate that the output under consideration (Y) is strongly negatively correlated with input factor X4. The low and high coefficients of determination obtained for the raw and ranked data, respectively, indicate that there is a monotonic dependency between the input factors and the output.
Figure 22: Sensitivity analysis window showing the re-
sults of a SA using the SRRC method.
8.8 The scatter plot window
When the user pushes the Scatter plot button in the Post Processing section after an MC simulation, he can view scatter plots of the input-output relationships. These plots can use the rank-transformed or the raw data samples. A regression line based on least squares can also be added to the plot.
Figure 23: Scatter plot window.
Fig. 23 shows the scatter plot window with a model that seems to have an output (Y) that is linearly dependent on input factor X3. The regression line approximates the data very well.
8.9 The variance-based methods window
If an analysis has been done with any of the variance-based methods (Sobol, Jansen, FAST or EFAST), the results can be examined by pushing the corresponding buttons in the Post Processing section. The window shows the results using pie charts and value plots of both first order and total effect sensitivity indices. Fig. 24 shows the result of an EFAST analysis of the non-monotone Ishigami function described in Section 9.5.2. Unexplained variance is shown in white in the pie chart. Due to the presence of higher order effects, the first order indices of the Ishigami function do not explain all of the output variance.
Figure 24: Window showing results from variance based methods.
8.10 The Morris screening window
After performing a Morris screening on a model, the results can be graphically displayed by pushing the Morris button in the Post Processing section. A plot of the estimated means against the estimated standard deviations of the elementary effects will be shown (see Fig. 25).
Figure 25: Window showing results from a Morris screening of the Sobol g-function described in Section 9.5.1.
8.11 The Monte Carlo filtering window
Figure 26: Window showing results from Monte Carlo filtering.
In order to activate the Monte Carlo filtering window after a simulation, the user should push the MCF button in the Post Processing section. In this window, the user can divide the input distributions according to criteria applied on the output. The difference between the two empirical distributions of the input sub-samples is then plotted. The Smirnov and Cramer-von Mises test statistics of the two empirical distributions of the sub-samples are also shown.
Fig. 26 shows an example of the Monte Carlo filtering window. The output is divided at the median of its empirical distribution, i.e. the distribution is divided into upper and lower quantiles.
8.12 Exporting and saving results
After a simulation the user can save or export the results in various formats:
(a) File|Save... (Ctrl+S)
The whole session will be saved to a binary file that can later be loaded into Eikos.
(b) Push the button ToWorkspace in the Post Processing section. A structure describing the simulation together with the results will be exported to the Matlab workspace. The data can then be further analysed.
(c) File|Report Generator...
Or push the button Report Generator in the Post Processing section. The report generator window will open, from where results can be saved to either an Excel sheet (.xls) or an ordinary ASCII text file (.txt).
Fig. 27 shows the report generator window where the
user can choose which data he would like to save.
Figure 27: Report generator window.
9 Results of benchmarking
The Eikos software can cope with general Matlab or Simulink models as well as models coded in FORTRAN, C/C++ and JAVA. It is designed to do sensitivity analysis as well as uncertainty analysis. Sensitivity indices and other statistics are calculated. The raw data can easily be exported to the Matlab workspace, where the user can do complementary data analysis.
There exist various other software packages that can do computations similar to the ones offered in Eikos. One example is the package PREP/SPOP, coded in FORTRAN 77, which can prepare the MC-generated input data and compute sampling-based statistics on the model output. PREP/SPOP requires the model to be FORTRAN 77 compatible.
Another example is SimLab, which is a software package for doing uncertainty and sensitivity analysis on program codes. SimLab has been developed at the Joint Research Centre of the European Commission with Saltelli and Tarantola as project leaders. With SimLab, executable code as well as simple functions can be analysed. SimLab is built on the basis of the PREP/SPOP codes and has been extended with various other methods, for example screening designs and variance-decomposing methods.
A simple but often used tool in risk analysis is MS Excel. MS Excel becomes very powerful together with program modules like @RISK or Crystal Ball. With these add-ons, MS Excel can generate samples from various different distributions, as well as perform simple SA.
The procedure for sampling distributions in Eikos has been benchmarked against the sampling routines offered in @RISK. To benchmark the sensitivity analysis routines implemented in Eikos, SimLab has been used, together with well-known test functions found in the literature.
All test functions presented in the rest of this section have been implemented as test cases in Eikos.
9.1 Test of code for inducing rank correlation
The code for inducing rank correlation, included in Appendix A.1, was tested for a model with four input factors (R = (X_1, X_2, X_3, X_4)). A uniform distribution was assigned to all factors. One thousand random samples were generated for each of the four factors, in the range [0,1] (using R = rand(1000,4);). The correlation matrix obtained for the input sample matrix R was (using corrcoef(R)):
\begin{pmatrix}
1.0000 & 0.0107 & 0.0090 & 0.0262 \\
0.0107 & 1.0000 & 0.0072 & 0.0016 \\
0.0090 & 0.0072 & 1.0000 & 0.0056 \\
0.0262 & 0.0016 & 0.0056 & 1.0000
\end{pmatrix}_R   (9.1)
Samples generated with rand are assumed to be uncor-
related. But as can be seen in the resulting correla-
tion matrix in Eq. (9.1), small undesired correlations
between the factors has occurred. This is due to the
relatively small sample size. If more uncorrelated sam-
ples are desired, the rank correlation procedure could be
called with the identity matrix as correlation structure.
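For example, this minimal sketch reproduces the test using the rankCorr routine listed in Appendix A.1 (exact values vary with the random seed):

R = rand(1000,4);          % nominally uncorrelated uniform samples
corrcoef(R)                % small spurious correlations, cf. Eq. (9.1)
Rd = rankCorr(eye(4),R);   % identity matrix as target correlation structure
corrcoef(Rd)               % the spurious correlations are reduced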
The matrix in Eq. (2.3) describes the wanted corre-
lations between the input factors. After executing the
code, the correlated samples had the following correla-
tion matrix (obtained by corrcoef(rankCorr(Cstar,R))):
\begin{pmatrix}
1.0000 & 0.4808 & 0.0008 & 0.0040 \\
0.4808 & 1.0000 & 0.0072 & -0.2742 \\
0.0008 & 0.0072 & 1.0000 & 0.0013 \\
0.0040 & -0.2742 & 0.0013 & 1.0000
\end{pmatrix}_{\text{Eikos}}
For the 1000 samples, the sought correlation was al-
most obtained after the call to the Matlab function
rankCorr(Cstar,R). As a comparison, the same pro-
cedure has been performed with SimLab and @RISK.
The resulting correlation matrix from SimLab was:
\begin{pmatrix}
1.0000 & 0.4988 & 0.0020 & 0.0282 \\
0.4988 & 1.0000 & 0.0264 & -0.1684 \\
0.0020 & 0.0264 & 1.0000 & 0.0788 \\
0.0282 & -0.1684 & 0.0788 & 1.0000
\end{pmatrix}_{\text{SimLab}}
SimLab does very well in inducing the positive correlation between factor X_1 and factor X_2. But since the negative correlation between factor X_2 and factor X_4 was supposed to be -0.3, Eikos did a better overall job than SimLab. The resulting correlation matrix from @RISK was:
\begin{pmatrix}
1.0000 & 0.4893 & 0.0263 & 0.0048 \\
0.4893 & 1.0000 & 0.0530 & -0.2447 \\
0.0263 & 0.0530 & 1.0000 & 0.0093 \\
0.0048 & -0.2447 & 0.0093 & 1.0000
\end{pmatrix}_{\text{@RISK}}
@RISK and Eikos showed a comparable performance
when inducing rank correlation.
9.2 Benchmarking of procedure for generating probability distributions
Ten of the twelve probability distributions offered in Eikos have been compared with @RISK. @RISK does not offer the log-triangular and log-uniform distributions, so these have not been benchmarked. For the comparison, 10,000 samples were generated for each distribution, both with Eikos and with @RISK. The samples were compared with each other using the two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test (kstest2), available in the Matlab Statistics Toolbox. The default parameter settings were used for all distributions. Truncated versions of the distributions were also tested by the same method.
Distribution Monte Carlo LHS Truncated (LHS)
Normal 0.0075 0.0001 0.0001
Log-normal 0.0100 0.0001 0.0001
Uniform 0.0104 0.0001 0.0001
Triangular 0.0086 0.0001 0.0001
Exponential 0.0228 0.0001 0.0001
Extreme value 0.0148 0.0001 0.0001
Beta 0.0130 0.0001 0.0001
Gamma 0.0089 0.0001 0.0001
Chi-square 0.0118 0.0001 0.0001
Weibull 0.0112 0.0001 0.0001
Table 1: Results of the Kolmogorov-Smirnov Goodness-
of-Fit Test of probability distributions.
The obtained K-S test statistics are presented in Table 1, showing the maximal difference between the empirical distributions generated with Eikos and @RISK. For all tested distributions, the results indicate that the samples generated with Eikos and @RISK were drawn from the same probability distribution. Hence, it can be concluded that there is good agreement between the sampling procedures implemented in Eikos and @RISK.
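A sketch of this comparison for one distribution; both sample vectors are generated locally here as stand-ins for the Eikos and @RISK output:

eikosSample = randn(10000,1);   % stand-in for 10,000 Eikos normal samples
riskSample  = randn(10000,1);   % stand-in for 10,000 imported @RISK samples
[h,p,ksstat] = kstest2(eikosSample,riskSample)
% ksstat is the maximal distance between the two empirical CDFs, the
% quantity reported in Table 1; h == 0 means the hypothesis that both
% samples come from the same distribution cannot be rejected.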
9.3 Benchmarking of SA methods for
linear models
The following test model has been used for benchmarking the SA methods applicable for linear models:

Y = X_1 + X_2 + X_3,   (9.2)

where the distributions for the three input factors X_i, i = 1, ..., 3, are uniform in the intervals [0.5, 1.5], [1.5, 4.5] and [4.5, 13.5] respectively. This model has a model coefficient of determination R^2 = 1, thus a linear fit approximates the function very well. The model is linear, and since factor X_3 has the widest distribution, and therefore the largest variance, it should be the most important factor. The results of the SA obtained from simulating the model 10,000 times, using LHS, with SimLab and Eikos are presented in Table 2.
As can be noticed from Table 2, the results of the SA
obtained with Eikos and SimLab coincide well, for all
methods.
Factor          CC      RCC     SRC     SRRC    PCC     PRCC
X_1   SimLab   0.107   0.100   0.105   0.098   1       0.806
      Eikos    0.108   0.101   0.105   0.098   1.000   0.829
X_2   SimLab   0.313   0.295   0.314   0.296   1       0.972
      Eikos    0.313   0.297   0.315   0.299   1.000   0.976
X_3   SimLab   0.944   0.948   0.943   0.947   1       0.997
      Eikos    0.943   0.947   0.944   0.947   1.000   0.998
Table 2: Sensitivity results of the linear model in Eq.
(9.2).
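As an illustrative sketch of how such coefficients can be computed (using regress from the Statistics Toolbox; not necessarily the Eikos implementation):

N = 10000;
X = [0.5+rand(N,1), 1.5+3*rand(N,1), 4.5+9*rand(N,1)];  % U[0.5,1.5] etc.
Y = sum(X,2);                     % the linear model of Eq. (9.2)
b = regress(Y,[ones(N,1) X]);     % ordinary least-squares fit
SRC = b(2:end).*std(X)'./std(Y)   % ~ [0.105; 0.315; 0.943], cf. Table 2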
9.4 Benchmarking of SA methods for
non-linear monotonic models
For non-linear monotonic models, results from the rank-transformed sampling-based statistics should yield better estimates of the underlying importance of the different input factors. The following test model was used to benchmark the SA methods for non-linear monotonic models:

Y = X_1 + X_2^4.   (9.3)
Two different tests were made with this model. In the first test, a uniform distribution in the interval [0, 1] was assigned to both factors. With these settings the second factor (X_2) should always give a smaller contribution, since a number between zero and one raised to the power of four becomes smaller, making X_2 less important than the first factor (X_1). With this setting the model coefficient of determination for both raw and ranked data is R^2 = 0.89. The results of the SA obtained from simulating the model 10,000 times, using LHS, with SimLab and Eikos are presented in Table 3.
Factor          CC      RCC     SRC     SRRC    PCC     PRCC
X_1   SimLab   0.734   0.763   0.739   0.767   0.909   0.919
      Eikos    0.735   0.762   0.734   0.761   0.908   0.917
X_2   SimLab   0.582   0.550   0.588   0.556   0.866   0.860
      Eikos    0.588   0.558   0.587   0.557   0.866   0.860
Table 3: Sensitivity results of the non-linear monotonic
model in Eq. (9.3) with uniform distributions in the
range [0,1].
In the second test, a uniform distribution in the interval
[0, 5] was assigned to both input factors. This makes the
second factor very important since a number larger than
one is raised to the power of four, and the model will
be monotonic.
Factor          CC      RCC     SRC     SRRC    PCC     PRCC
X_1   SimLab  -0.001   0.071   0.008   0.080   0.015   0.516
      Eikos    0.022   0.095   0.007   0.078   0.015   0.506
X_2   SimLab   0.866   0.988   0.866   0.989   0.866   0.991
      Eikos    0.866   0.988   0.866   0.987   0.866   0.991
Table 4: Sensitivity results of the non-linear monotonic
model in Eq. (9.3) with uniform distributions in the
range [0,5].
The results of the SA obtained from simulating the
model 10,000 times, using LHS, with SimLab and Eikos
are presented in Table 4.
As can be seen in Tables 3 and 4, the sampling-based SA results obtained with Eikos and SimLab coincide very well with each other for both tests performed.
For the model given by Eq. (9.3), when using uniform distributions in the range [0,5], the coefficient of determination obtained for raw data was R^2 = 0.75, while for the ranked data it was R^2 = 0.98. This implies that rank-transformed measures give better sensitivity estimates for this model.
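A minimal sketch of this R^2 comparison (tiedrank and regress are in the Statistics Toolbox; the values come out close to those above):

N = 10000;
X = 5*rand(N,2); Y = X(:,1)+X(:,2).^4;     % model (9.3) on U[0,5]
R2 = @(x,y) corr(y,[ones(N,1) x]*regress(y,[ones(N,1) x]))^2;
R2raw  = R2(X,Y)                           % ~0.75 with raw data
R2rank = R2(tiedrank(X),tiedrank(Y))       % ~0.98 with ranked data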
To illustrate how rank-transformed data creates better linear fits, Figs. 28 and 29 show scatter plots of input factor X_2 against the output for the model in Eq. (9.3) when using uniform distributions in the range [0,5].
Figure 28: Scatter plot of X_2 against Y with raw data.
Figure 29: Scatter plot of X_2 against Y using ranked data.
Fig. 28 shows that the scatter plot obtained with the raw sample data cannot be fitted well by a regression line. In contrast, Fig. 29 shows that the scatter plot obtained with the rank-transformed sample data can be fitted well by a regression line.
9.5 Benchmarking of SA methods for
non-monotonic models
For non-monotonic models, correlation coefficients and statistics based on linear regression do not perform well. For this type of model, the variance decomposing methods described in Sec. 7 are recommended. The first order sensitivity index S_i = V(E[Y|X_i])/V(Y) and the total effect sensitivity index S_Ti, which includes all interactions, can be used as sensitivity measures.
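To make the definition concrete, here is a hedged brute-force sketch that estimates S_i for a cheap model by binning X_i and taking the variance of the conditional bin means; this is just the definition made literal, not how the Sobol, Jansen or (E)FAST estimators work:

N = 1e5; nb = 50;                          % samples and bins
X = rand(N,2); Y = X(:,1)+X(:,2).^4;       % cheap test model, cf. Eq. (9.3)
bin = max(min(ceil(X(:,2)*nb),nb),1);      % equal-probability bins of X_2
condMean = accumarray(bin,Y,[nb 1],@mean); % ~ E[Y|X_2] within each bin
Si = var(condMean)/var(Y)                  % ~ V(E[Y|X_2])/V(Y), here ~0.46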
9.5.1 The Sobol g-function
Saltelli and Sobol proposed in 1995 [SS95] a non-monotonic analytical test function, known as the Sobol g-function, which has the property that the user can decide which factors should be important, which should be unimportant and which should be non-significant. The Sobol g-function is defined as:

Y = \prod_{j=1}^{k} \frac{|4X_j - 2| + a_j}{1 + a_j},   (9.4)
where a_j ≥ 0 is a parameter and k is the number of factors. All input factor distributions are uniform in the range [0,1].
A very desirable property of this test function is that the sensitivity indices of all orders can be computed analytically. Saltelli and Bolado give the formulas for computing all order partial variances as well as the total variance [SB98]. Partial variances of the first order are

V_i = \frac{1}{3(1 + a_i)^2}.   (9.5)
Higher order partial variances are then products of the lower ones, for example

V_{12} = V_1 V_2.   (9.6)
And finally the total variance is given by

V = -1 + \prod_{i=1}^{k} (1 + V_i).   (9.7)
For our test case we used eight input factors with parameters a = {0, 1, 4.5, 9, 99, 99, 99, 99}. A low value of the parameter a_j implies that the corresponding input factor X_j is highly important, while a high value implies low importance.
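A small sketch that evaluates Eqs. (9.5)-(9.7) for these parameters and reproduces the analytic column of Table 5:

a  = [0 1 4.5 9 99 99 99 99];
Vi = 1./(3*(1+a).^2);   % first order partial variances, Eq. (9.5)
V  = prod(1+Vi)-1;      % total variance, Eq. (9.7)
Si = Vi/V               % first order indices; Si(1) ~ 0.7165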
Fig. 30 shows a scatter plot of the most important factor X_1 against the output. The model is very non-monotonic and a regression line cannot fit the data: the coefficient of determination for a regression line is zero.
Figure 30: Scatter plot of factor X_1 against Y obtained with the Sobol g-function.
Table 5 presents the first order sensitivity indices, for the three variance decomposing methods available in Eikos, obtained when evaluating the model at most 10,000 times.
Factor   S_i        S_i^SOBOL  S_i^SOBOL  S_i^WS     S_i^FAST   S_i^FAST
         analytic   Eikos      SimLab     Eikos      Eikos      SimLab
X_1      0.7165     0.7649     0.7366     0.7137     0.7052     0.7023
X_2      0.1791     0.1809     0.1836     0.2243     0.1764     0.1699
X_3      0.0237     0.0712     0.0242     0.0200     0.0233     0.0211
X_4      0.0072     0.0321     0.0064    -0.0032     0.0071     0.0066
X_5      0.0001     0.0275     0.0001    -0.0139     0.0001     0.0001
X_6      0.0001     0.0272     0.0000    -0.0133     0.0001     0.0001
X_7      0.0001     0.0288     0.0001    -0.0126     0.0001     0.0001
X_8      0.0001     0.0288     0.0001    -0.0122     0.0001     0.0001
Table 5: First order sensitivity indices obtained for
the Sobol g-function Eq. (9.4) with parameter a =
{0, 1, 4.5, 9, 99, 99, 99, 99}.
Table 6 presents the total effect sensitivity indices, for the three variance decomposing methods available in Eikos, obtained when evaluating the model at most 10,000 times.
Factor   S_Ti       S_Ti^SOBOL  S_Ti^SOBOL  S_Ti^WS    S_Ti^EFAST  S_Ti^EFAST
         analytic   Eikos       SimLab      Eikos      Eikos       SimLab
X_1      0.7871     0.7705      0.8028      0.7702     0.7917      0.7761
X_2      0.2420     0.1926      0.2475      0.2418     0.2315      0.2363
X_3      0.0340     0.0252      0.0360      0.0378     0.0327      0.0321
X_4      0.0105     0.0087      0.0091      0.0098     0.0107      0.0108
X_5      0.0001    -0.0013      0.0001      0.0001     0.0002      0.0010
X_6      0.0001    -0.0006      0.0003      0.0001     0.0002      0.0010
X_7      0.0001     0.0025      0.0003      0.0001     0.0002      0.0010
X_8      0.0001     0.0030      0.0000      0.0001     0.0002      0.0010
Table 6: Total effect sensitivity indices obtained for the Sobol g-function Eq. (9.4) with parameter a = {0, 1, 4.5, 9, 99, 99, 99, 99}.
The sensitivity results given in Tables 5 and 6 for the different SA methods were expected to differ from each other. This is because the number of iterations was limited to at most 10,000, and computing Sobol indices demands far more iterations to reach the same degree of accuracy than, for example, the extended FAST method. Results obtained with the same SA method, computed in Eikos and SimLab, should instead be compared.
The less important indices, computed with the Sobol method in SimLab, were closer to the analytical values than the corresponding indices computed with Eikos. This might be due to the advantage of using quasi-random numbers from the LP_τ sequence instead of the pseudo-random numbers offered in Eikos. SimLab does not have an implementation of the WS sampling scheme, but the results from Eikos were quite close to the analytical values for the important factors. The results from the Fourier analysis-based methods obtained with Eikos were close to the analytical values as well as to the values computed with SimLab.
9.5.2 The Ishigami function
Another popular non-monotonic test function is the Ishigami function [CSST00], which has three input factors and is defined as:

Y = \sin X_1 + 7 \sin^2 X_2 + 0.1\, X_3^4 \sin X_1,   (9.8)
where all three input factors have uniform distributions in the range [-π, π]. As for the Sobol g-function, the behavior of this function is very non-monotonic. The model coefficient of determination for a regression line is not zero as for the Sobol g-function, but it is very low, R^2 = 0.19.
Both the sampling-based methods and the variance de-
composing techniques have been tested with this func-
tion. Table 7 presents the results of SA obtained with
sampling-based methods for 10,000 model simulations,
using LHS.
Factor          CC      RCC     SRC     SRRC    PCC     PRCC
X_1   SimLab   0.440   0.441   0.440   0.441   0.440   0.441
      Eikos    0.436   0.437   0.436   0.437   0.436   0.437
X_2   SimLab  -0.001  -0.001  -0.001  -0.001  -0.001  -0.001
      Eikos    0.003   0.001   0.003   0.002   0.003   0.001
X_3   SimLab   0.004   0.002   0.004   0.003   0.004   0.003
      Eikos    0.001   0.002   0.001   0.002   0.001   0.002
Table 7: Results of the SA obtained for the non-
monotonic Ishigami function in Eq. (9.8) using
sampling-based methods.
The sensitivity coefficients obtained from all sampling-based SA methods indicate that input factor X_1 is the only important factor. However, since the Ishigami function is non-monotonic, the coefficients in Table 7 are not a good measure of the relative importance of the input factors. The similarity of the results obtained from Eikos and from SimLab indicates, on the other hand, that the Eikos implementation of the sampling-based SA methods is correct.
As with the Sobol g-function, analytical solutions exist for the sensitivity indices; these are described by Homma and Saltelli [HS96].
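For reference, a small sketch of the Ishigami function as used in these tests (X is an N-by-3 sample matrix with columns uniform on [-pi,pi]):

ishigami = @(X) sin(X(:,1)) + 7*sin(X(:,2)).^2 + ...
                0.1*X(:,3).^4.*sin(X(:,1));   % Eq. (9.8)
X = -pi + 2*pi*rand(10000,3);                 % U[-pi,pi] samples
Y = ishigami(X);                              % output used in the SA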
Factor   S_i        S_i^SOBOL  S_i^SOBOL  S_i^WS     S_i^FAST   S_i^FAST
         analytic   Eikos      SimLab     Eikos      Eikos      SimLab
X_1      0.3138     0.2849     0.3234     0.2832     0.3115     0.3090
X_2      0.4424     0.4336     0.4409     0.4313     0.4426     0.4436
X_3      0.0000    -0.0062    -0.0082     0.0217     0.0002     0.0289
Table 8: First order sensitivity indices obtained for the
Ishigami function in Eq. (9.8).
The first order sensitivity indices indicate that factor X_2 is more important than factor X_1. Note that the sampling-based SA results in Table 7 indicated that only factor X_1 was important, which is not correct.
Factor   S_Ti       S_Ti^SOBOL  S_Ti^SOBOL  S_Ti^WS    S_Ti^EFAST  S_Ti^EFAST
         analytic   Eikos       SimLab      Eikos      Eikos       SimLab
X_1      0.5574     0.5209      0.5725      0.5710     0.5386      0.5256
X_2      0.4442     0.4853      0.4409      0.4432     0.4879      0.4893
X_3      0.2410     0.2297      0.2408      0.2490     0.2369      0.2361
Table 9: Total effect sensitivity indices obtained for the Ishigami function in Eq. (9.8).
The total effect sensitivity indices reveal even more information about the input-output relationships. From the analytical values of S_Ti listed in Table 9, X_1 can be concluded to be the most important contributor to the output variability. Also, factor X_3, thought to be totally unimportant according to the first order sensitivity indices, has some importance.
Fig. 31 shows scatter plots of how the output from the
Ishigami function depends on its three input factors.
Figure 31: Scatter plots of the three input factors X_1-X_3 against Y obtained with the Ishigami function in Eq. (9.8).
As with the sensitivity results obtained for the Sobol g-function, the Sobol indices computed in SimLab for the Ishigami function were more accurate than the ones computed in Eikos. Apart from that, the results obtained with SimLab and Eikos were very similar.
9.5.3 The Morris function
To demonstrate the basic idea of the Morris design, Morris [Mor91] proposed a computational model with 20 input factors defined as:

y = \beta_0 + \sum_{i=1}^{20} \beta_i w_i + \sum_{i<j}^{20} \beta_{i,j} w_i w_j + \sum_{i<j<l}^{20} \beta_{i,j,l} w_i w_j w_l + \sum_{i<j<l<s}^{20} \beta_{i,j,l,s} w_i w_j w_l w_s,   (9.9)
where w_i = 2(x_i - 1/2), except for i = 3, 5, and 7, where w_i = 2(1.1 x_i/(x_i + 0.1) - 1/2). Coefficients of relatively large value were assigned as
\beta_i = +20,  i = 1, \ldots, 10,
\beta_{i,j} = -15,  i, j = 1, \ldots, 6,
\beta_{i,j,l} = -10,  i, j, l = 1, \ldots, 5,
\beta_{i,j,l,s} = +5,  i, j, l, s = 1, \ldots, 4.   (9.10)
The remaining first- and second-order coefficients were generated independently from a normal distribution with zero mean and unit standard deviation, and the remaining third- and fourth-order coefficients were set to 0.
Four random orientations (r = 4) were generated using p = 4 levels (Δ = 2/3) to produce an 84-run design. Based on the 84 computed values of y, a random sample of four elementary effects was observed for each of the 20 inputs. Fig. 32 shows the results of the experiment. The mean and standard deviation have been computed on the absolute values of the estimated elementary effects.
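With the morris routine listed in Appendix A.3, this experiment corresponds to a call of the following form (a sketch; the routine performs the r*(k+1) = 84 model evaluations internally):

k = 20; p = 4; r = 4;            % factors, levels, random orientations
[mu,sigma] = morris(k,p,r);      % means/std devs of the elementary effects
plot(mu,sigma,'o')               % reproduces the layout of Fig. 32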
The input factors in the Morris test function have been reported to have the following characteristics [STCR04]:
(a) the first ten factors are important;
(b) of these, the first seven have significant effects that involve either interactions or curvatures;
(c) the other three are important mainly because of their first order effects.
The results of the Morris screening on the Morris function shown in Fig. 32 correspond quite well to the listed characteristics. The first seven factors have the largest estimated means and standard deviations of the elementary effects. After these come factors X_8, X_9 and X_10. Clustered to the left of the plot, with small estimated means and standard deviations, are the factors that are not important.
Figure 32: Plot of the estimated means against the estimated standard deviations of the elementary effects obtained with the Morris test function.
Morris screening in Eikos is not as robust as the other SA methods, i.e. the factor ranking is not always consistent. Important factors are ranked more consistently than unimportant ones. The Morris design is therefore recommended to be repeated several times. This inconsistency of the Morris screening method has also been noted in SimLab.
10 Example of practical application
A model of the long-term transfer of radionuclides in
forests has been developed by Facilia AB. This model
simulates the long-term behavior in temperate and bo-
real forests of radionuclides entering the ecosystem with
subsurface water. It can be applied for most radionu-
clides that are of relevance for high level waste manage-
ment and allows estimating radionuclide concentrations
in soil, trees, understorey plants, mushrooms etc.
The model has been implemented in Ecolego. A sensitivity analysis has been performed for the concentration of Cs-135 in the soil layer after 10,000 and 100,000 years, with a constant radionuclide input of 1 Bq/m^2/y. The Simulink numerical ODE solver ode15s was used for integration of the model; ode15s is a variable step-size solver specialized in solving stiff ordinary differential equations.
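As a generic illustration of such a call (rhs, tspan and y0 are hypothetical placeholders for the forest model's equations, simulation span and initial state):

rhs = @(t,y) -0.01*y + 1;        % placeholder right-hand side
tspan = [0 1e5]; y0 = 0;         % simulate up to year 100,000
opts = odeset('RelTol',1e-6);    % illustrative tolerance setting
[t,y] = ode15s(rhs,tspan,y0,opts);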
Uncertain input factors under consideration are listed
in Table 10.
Factor          Information                                            Units
Water content   Volumetric water content in soil                       kg/m^2
Bulk density    Soil bulk density                                      kg/m^3
ET              Area normalised evaporation rate                       m^3/m^2/y
P               Area normalised precipitation rate                     m^3/m^2/y
TC_LiToS        Transfer coefficient from litter to soil               y^-1
TC_WToLi        Transfer coefficient from tree wood to litter          y^-1
I_loss          Interception loss ratio                                r.u.
TreeLT          Time delay                                             y
h               Thickness of the soil rooting layer                    m
Kd              Distribution coefficient of Cs-135 in soil             m^3/kg
CR_W            Concentration ratio from soil to tree wood of Cs-135   r.u.
CR_H            Concentration ratio from soil to understorey plants of Cs-135   r.u.
CR_L            Concentration ratio from soil to tree leaves of Cs-135 r.u.
Table 10: Information on the uncertain input factors.
Each of the thirteen uncertain input factors was assigned a probability distribution based on values reported in the literature. The distribution coefficient values were generally found either for forest soils or for organic soils. Due to the wide intervals of variation, values representing boreal forests are assumed to be included. Table 11 lists the distributions used.
Factor          Distribution     Best Est.  Min.    Max.   Units
Water content   Triangular       0.2        0.1     0.5    kg/m^2
Bulk density    Triangular       1180.0     700     1500   kg/m^3
ET              Triangular       0.335      0.15    0.46   m^3/m^2/y
P               Triangular       0.674      0.588   0.76   m^3/m^2/y
TC_LiToS        Triangular       0.16       0.005   0.5    y^-1
TC_WToLi        Triangular       0.0090     0.002   0.008  y^-1
I_loss          Triangular       0.3        0.1     0.5    r.u.
TreeLT          Triangular       100.0      70      200    y
h               Triangular       0.3        0.2     0.5    m
Kd              Log-triangular   0.2        0.4     53     m^3/kg
CR_W            Triangular       0.8        0.1     5.8    r.u.
CR_H            Log-triangular   7.0        0.1     100    r.u.
CR_L            Triangular       3.4        0.8     14     r.u.
Table 11: Distributions assigned to the input factors.
A Morris screening was performed with the forest model. Twenty elementary effects were computed for six levels, demanding 20*(13+1) = 280 model evaluations. Table 12 presents the estimated means and standard deviations of the twenty elementary effects.
Factor          year 10,000          year 100,000
                mean     std dev     mean     std dev
Water content   0.0003   0.0001      0.0004   0.0001
Bulk density    0.0449   0.0703      0.0008   0.0002
ET              1.7103   0.5829      1.8905   0.4995
P               0.8028   0.2968      0.7850   0.3121
TC_LiToS        0.0037   0.0036      0.0001   0.0001
TC_WToLi        0.0129   0.0116      0.0002   0.0001
I_loss          1.4827   0.4956      1.6303   0.4397
TreeLT          0.0000   0.0000      0.0000   0.0000
h               0.0354   0.0676      0.0005   0.0002
Kd              4.1210   0.9222      4.5928   1.2451
CR_W            0.0411   0.0582      0.0007   0.0005
CR_H            0.0125   0.0187      0.0002   0.0001
CR_L            0.0022   0.0033      0.0000   0.0000
Table 12: Results of Morris screening.
Figure 33: Estimated means against estimated stan-
dard deviations of the elementary eects at year 10,000.
Figs. 33 and 34 show plots of the estimated means against the standard deviations, for year 10,000 and year 100,000 respectively. Both figures indicate that there are four important input factors in this model. These are, in descending order of importance: Kd, ET, I_loss and P. The rest of the input factors are all scattered close to zero and should therefore be classified as unimportant.
Figure 34: Estimated means against estimated stan-
dard deviations of the elementary eects at year
100,000.
A Monte Carlo simulation using 1000 iterations and Latin Hypercube sampling has been performed on the model. Fig. 35 shows how the predicted activity concentration in soil for Cs-135 varies with time, for twenty of these MC simulations.
Figure 35: Twenty simulations of the predicted activity
concentration in soil for Cs-135 (Bq/kg).
One can note that the concentration converges to equilibrium at different speeds and at different values. For some input factor combinations, equilibrium occurs quickly and at a low concentration value, while for other combinations equilibrium has not been reached even after 10,000 years, resulting in a very high concentration of Cs-135.
At year 10,000, the frequency distribution of the predicted activity concentration obtained from the MC simulation has a mean E(Y) = 11.5 and variance V(Y) = 36.3. Fig. 36 shows a frequency plot of this distribution.
Figure 36: Histogram over the predicted activity con-
centration in soil for Cs-135 (Bq/kg) at year 10,000.
The model coefficient of determination of a regression line computed from the raw data is R^2 = 0.6279 and with ranked data it is R^2 = 0.8815; thus, sensitivity coefficients computed from ranked sample data should provide more valuable information than coefficients based on raw sample data.
Table 13 presents the sampling-based SA results ob-
tained from the 1000 MC simulations at year 10,000.
Factor          CC       RCC      SRC      SRRC     PCC      PRCC
Water content   0.0259   0.0410   0.0268   0.0145   0.0259   0.0409
Bulk density   -0.1565  -0.1321  -0.1586  -0.1262  -0.1565  -0.1320
ET              0.3351   0.3343   0.3385   0.3259   0.3350   0.3342
P              -0.1633  -0.1757  -0.1594  -0.1496  -0.1633  -0.1757
TC_LiToS        0.0217   0.0131   0.0182   0.0405   0.0216   0.0131
TC_WToLi        0.0741   0.0537   0.0726   0.0512   0.0740   0.0537
I_loss          0.2889   0.2882   0.2908   0.2867   0.2888   0.2881
TreeLT          0.0028   0.0039   0.0034  -0.0007   0.0028   0.0039
h              -0.1782  -0.1341  -0.1792  -0.1571  -0.1782  -0.1340
Kd              0.5496   0.7798   0.5513   0.7669   0.5496   0.7798
CR_W           -0.1135  -0.0953  -0.1133  -0.0874  -0.1134  -0.0953
CR_H           -0.1577  -0.1272  -0.1587  -0.1226  -0.1577  -0.1272
CR_L           -0.0144  -0.0111  -0.0084  -0.0022  -0.0143  -0.0111
Table 13: Sampling based SA results of the forest model
at year 10,000 (1000 simulations).
Figure 37: Tornado plot of SRRC at year 10,000.
Most of the sampling-based methods presented in Table 13 rank the input factors similarly. To get an easier overview of the results, a tornado plot of the standardized ranked regression coefficients is shown in Fig. 37. The tornado plot in Fig. 37 shows that the model is most influenced by a positive correlation with Kd. After Kd, ET and I_loss are the most important input factors.
Since Kd is thought to be the most important factor
according to the sampling-based coecients for year
10,000, a scatter plot of Kd against the output for the
rank transformed data, at year 10,000, is shown in Fig.
38.
Figure 38: Scatter plot of Kd against the predicted ac-
tivity concentration for ranked data, at year 10,000.
At year 100,000, the frequency distribution of the predicted activity concentration obtained from the MC simulation has a mean E(Y) = 37.4 and variance V(Y) = 2221.4. Fig. 39 shows a frequency plot of this distribution. The model coefficient of determination computed from the raw data is R^2 = 0.6312 and with ranked data it is R^2 = 0.9308; thus, sensitivity coefficients computed from ranked sample data should provide more valuable information than coefficients based on raw sample data.
Figure 39: Histogram over the predicted activity con-
centrations in soil for Cs-135 at year 100,000.
Table 14 presents the sampling-based SA results ob-
tained from the 1000 MC simulations at year 100,000.
The tornado plot in Fig. 40 shows that the model depends on the input factors similarly at year 100,000 as at year 10,000. Kd is still the most important factor, with a positive correlation. After Kd, ET and I_loss are the most important input factors. The difference lies in the less important factors: apart from the three most important ones, there is now only one factor with a coefficient larger than 0.15 in magnitude. This is factor P, which is negatively correlated with the model output.
Factor          CC       RCC      SRC      SRRC     PCC      PRCC
Water content   0.0267   0.0441   0.0272   0.0156   0.0267   0.0441
Bulk density   -0.0143  -0.0289  -0.0177  -0.0195  -0.0143  -0.0289
ET              0.3808   0.3755   0.3839   0.3652   0.3808   0.3755
P              -0.1468  -0.1817  -0.1431  -0.1529  -0.1468  -0.1817
TC_LiToS       -0.0131  -0.0241  -0.0162   0.0030  -0.0131  -0.0241
TC_WToLi        0.0204   0.0055   0.0184   0.0026   0.0204   0.0055
I_loss          0.3350   0.3245   0.3367   0.3209   0.3350   0.3245
TreeLT         -0.0171  -0.0032  -0.0159  -0.0087  -0.0170  -0.0032
h              -0.0498  -0.0001  -0.0497  -0.0247  -0.0498  -0.0001
Kd              0.5835   0.8191   0.5852   0.8052   0.5835   0.8191
CR_W            0.0135   0.0108   0.0134   0.0196   0.0135   0.0108
CR_H           -0.0628  -0.0254  -0.0651  -0.0190  -0.0628  -0.0254
CR_L           -0.0339  -0.0136  -0.0285  -0.0074  -0.0339  -0.0136
Table 14: Sensitivity results of the forest model at year
100,000 (1000 simulations).
Figure 40: Tornado plot of SRRC at year 100,000.
Since Kd is thought to be the most important factor
according to the sampling-based coecients for year
100,000, a scatter plot of Kd against the output for
the rank transformed data, at year 100,000, is shown in
Fig. 41.
Figure 41: Scatter plot of Kd against the predicted ac-
tivity concentration for ranked data, at year 100,000.
A variance-based sensitivity analysis has also been performed on the model, using the extended Fourier amplitude sensitivity test with 845 model evaluations, which is the smallest number of iterations required by EFAST for a model with 13 input factors.
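Following the sample-size logic of the EFAST code in Appendix A.5 (at least 65 runs per factor, with M_I = 4 and a single search curve), the figure of 845 evaluations works out as:

wantedN = 845; k = 13; NR = 1; MI = 4;
OMi = floor((wantedN/NR-1)/(2*MI)/k);  % frequency of interest, = 8
N = 2*MI*OMi+1;                        % runs per factor, = 65
total = k*N*NR                         % = 13*65 = 845 model evaluations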
Factor          year 10,000                 year 100,000
                S_i          S_Ti           S_i          S_Ti
Water content   0.0003 (13)  0.0130 (12)    0.0110 (7)   0.0522 (8)
Bulk density    0.0243 (5)   0.0487 (5)     0.0007 (13)  0.0239 (13)
ET              0.1570 (2)   0.2365 (3)     0.1183 (3)   0.2395 (4)
P               0.0154 (7)   0.0376 (6)     0.0495 (5)   0.1246 (5)
TC_LiToS        0.0037 (9)   0.0133 (11)    0.0044 (10)  0.0327 (11)
TC_WToLi        0.0024 (11)  0.0175 (10)    0.0030 (11)  0.0379 (10)
I_loss          0.1428 (3)   0.2411 (2)     0.2495 (2)   0.6444 (1)
TreeLT          0.0025 (10)  0.0241 (9)     0.0048 (9)   0.0805 (7)
h               0.0245 (4)   0.0736 (4)     0.0159 (6)   0.1089 (6)
Kd              0.8193 (1)   0.8973 (1)     0.3632 (1)   0.5741 (2)
CR_W            0.0158 (6)   0.0344 (7)     0.0029 (12)  0.0258 (12)
CR_H            0.0107 (8)   0.0247 (8)     0.0069 (8)   0.0412 (9)
CR_L            0.0005 (12)  0.0046 (13)    0.0750 (4)   0.4277 (3)
Table 15: Sensitivity indices and their corresponding
ranking obtained with EFAST.
The first order sensitivity indices for year 10,000, presented in Table 15, indicate that Kd contributes 67% of the output variability, while I_loss and ET contribute 12% and 13% respectively (the percentages are the indices normalized by their sum). The rest of the input factors contribute less than 3% each. Examining the total indices, Kd is still the most important factor, contributing 54% of the output variability; I_loss and ET contribute 14% each, and the rest contribute less than 6% each.
For year 100,000 the first order indices do not sum to 1; thus there are interactions between the factors accounting for the unexplained part, which is 9%. According to the first order effects, Kd is still the most important input factor, contributing 36% of the output variability. The biggest difference from the other methods is that CR_L, with an 8% contribution to the output variability, is more important than P according to the first order indices. According to the total indices, the contribution from CR_L is almost 20%, and the contribution of I_loss is the largest, at 27%.
The results obtained for year 10,000 with all SA methods lead to the same conclusion: Kd is most important, followed by ET and I_loss, while the importance of the other factors is almost negligible. The relatively high value of the model coefficient of determination for ranked sample data (R^2 = 0.8815) compared to the one for raw data (R^2 = 0.6279) indicates that the concentration depends on the input factors in a monotone way.
The results obtained for year 100,000 were similar to the results obtained for year 10,000 with all SA methods excluding EFAST. The results obtained with the EFAST method indicate that there are interactions between the factors. CR_L was not identified as important by the Morris method, and was ranked almost lowest by the usual sampling-based methods. However, the EFAST method showed that the uncertainty in this parameter explains nearly one fifth of the overall uncertainty. Also, according to the total indices, Kd has a lower importance than suggested by the results obtained with the other methods.
11 General discussion
In general, which sensitivity analysis method should be used? It depends on several factors, such as how heavy the needed model computations are, the number of uncertain input factors, and whether the model output depends linearly, monotonically or non-monotonically on the input factors under consideration.
The methods for factor screening, for example the Morris method, are useful as a first step in dealing with computationally expensive models containing a large number of input factors. The factors that control most of the output variability can be identified at low computational cost.
Local SA is used to investigate the impact of the input factors on the model locally. It measures how sensitive the model is to small perturbations of the input factors. When the model is nonlinear and various input factors are affected by uncertainties of different orders of magnitude, local SA should not be used.
If the model is not linear, correlation and regression coefficients cannot be trusted. To determine if there are nonlinearities in the model, the model coefficient of determination can be examined. An R^2 value lower than 0.6 indicates that the regression model does not describe the dependency between the inputs and the outputs accurately enough. Another way to detect nonlinearities in the model is to examine scatter plots of the inputs versus the outputs.
The coefficients computed on the basis of rank-transformed data can handle non-linear models that are still monotone. Using ranked data improves the R^2 value, but a drawback is that the model under analysis has been altered. The analysis based on rank-transformed data is more robust than that for untransformed data, but for small sample sizes the results are not as trustworthy, due to the loss of information in the transformation.
According to Saltelli et al. [SM90],[SH91], the most robust and reliable of the sampling-based methods are PRCC and SRRC. The results obtained with these methods should therefore be trusted more than results obtained with the other sampling-based methods.
Latin Hypercube sampling can be used instead of simple random sampling in the MC analysis. LHS forces the samples to be drawn from the full range of the desired distribution functions; thus, a lower number of samples can be used to emulate the distribution functions. A drawback is that it can give a biased estimate of the variance of the distributions; therefore, LHS should not be used to generate the sample distributions used in the variance-based methods.
Variance-based methods that can compute the total effect contribution of the input factors to the model output should be used if the model is thought to be non-monotone. These methods require more model evaluations than the other methods, and the number increases with the number of input factors. For models with a moderate number of input factors and a not too long model execution time, these methods are ideal. All variance-based methods give the same type of results: first-order and total sensitivity indices. Different numbers of model evaluations are required by the different methods: the Sobol method requires N(2k+1) model evaluations, while the Jansen method only requires Nk. For the Fourier amplitude sensitivity test the number of model evaluations required is of the order O(k^2), while the extended version requires k(2M_max+1)N_r model evaluations. Drawbacks of the variance-based methods are their higher computational costs and their assumption that all information about the uncertainty in the output is captured by its variance.
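As a quick worked comparison of these costs for, say, k = 13 input factors and a base sample size of N = 1000:

k = 13; N = 1000;
costSobol  = N*(2*k+1)   % Sobol: N(2k+1) = 27,000 model evaluations
costJansen = N*k         % Jansen: Nk = 13,000 model evaluations
% The FAST cost grows roughly as O(k^2), while the extended FAST of
% Section 10 needed only 845 evaluations for the same 13 factors.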
Sensitivity analysis provides a way to identify the model inputs that have the strongest effect on the uncertainty in the model predictions. However, sensitivity analysis does not provide an explanation for such effects. This explanation must come from the analysts involved and, of course, be based on the mathematical properties of the model under consideration.
Sensitivity analysis methods are essential tools in risk analysis and simulation modeling. Due to the difficulties of implementing the methods, most SA is today performed using local methods, correlation or regression coefficients. The variance-based methods described here are recommended for studies of environmental risk assessment models, since these are often non-linear and can be non-monotonic.
By having a toolbox, implemented in a general language such as Matlab, that collects diverse SA methods into one single package, the modeler gets easy access to the more advanced methods.
In Eikos, the most important methods for sensitivity analysis have been implemented and benchmarked against SimLab using well-known test functions. The implemented distribution functions were also benchmarked against samples generated with @RISK.
12 Acknowledgements
I would like to thank my supervisors for their help and support: Rodolfo Avila Moreno, especially for being a never-ending source of ideas and for proofreading and enhancing the text considerably, and Robert Broed for many helpful discussions.
The Norwegian Radiation Protection Authority and Posiva Oy are acknowledged for their interest in the project as well as for providing financial support.
Finally I would like to thank all colleagues at Facilia
AB for contributing with a stimulating and joyful at-
mosphere.
References
[CFS+73] R. I. Cukier, C. M. Fortuin, Kurt E. Shuler, A. G. Petschek, and J. H. Schaibly. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. The Journal of Chemical Physics, 59(8):3873-3878, October 1973.
[Con99] William Jay Conover. Practical Nonparametric Statistics. John Wiley & Sons, Inc., 3rd edition, 1999.
[CSS75] R. I. Cukier, J. H. Schaibly, and Kurt E. Shuler. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. III. Analysis of the approximations. The Journal of Chemical Physics, 63(3):1140-1149, August 1975.
[CSST00] Francesca Campolongo, Andrea Saltelli, Tine Sørensen, and Stefano Tarantola. Hitchhiker's guide to sensitivity analysis. In Sensitivity analysis, Wiley Ser. Probab. Stat., pages 15-47. Wiley, Chichester, 2000.
[CST00] Karen Chan, Andrea Saltelli, and Stefano Tarantola. Winding stairs: A sampling tool to compute sensitivity indices. Statistics and Computing, 10(3):187-196, July 2000.
[CTSS00] Karen Chan, Stefano Tarantola, Andrea Saltelli, and Ilya M. Sobol'. Variance-based methods. In Sensitivity analysis, Wiley Ser. Probab. Stat., pages 167-197. Wiley, Chichester, 2000.
[Dec04] Decisioneering, Inc. Crystal Ball User Manual, 2004.
[gro04] The WAFO group. WAFO; A Matlab Toolbox for
Analysis of Random Waves and Loads, April 2004.
[Hol00] Anders Holtsberg. STIXBOX; A statistics toolbox for Matlab and Octave, May 2000. GNU Public Licence, Copyright ©.
[HS96] Toshimitsu Homma and Andrea Saltelli. Importance measure in global sensitivity analysis of nonlinear models. Reliability Engineering and System Safety, 52(1):1-17, 1996.
[IC82] Ronald L. Iman and W. J. Conover. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics: Simulation and Computations, 11(3):311-334, 1982.
[JRD94] M.J.W. Jansen, W.A.H. Rossing, and R.A. Daamen. Monte Carlo estimation of uncertainty contributions from several independent multivariate sources. Predictability and Nonlinear Modelling in Natural Sciences and Economics, pages 334-343, 1994.
[Mar68] George Marsaglia. Random numbers fall mainly in the planes. Proceedings of the National Academy of Sciences, 61(1):25-28, September 1968.
[MCB79] M. D. McKay, W. J. Conover, and R. J. Beckman. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239-245, 1979.
[Mor91] Max D. Morris. Factorial sampling plans for preliminary computational experiments. Technometrics, 33(2):161-174, May 1991.
[MTS82] Gregory J. McRae, James W. Tilden, and John H. Seinfeld. Global sensitivity analysis - a computational implementation of the Fourier amplitude sensitivity test (FAST). Computers and Chemical Engineering, 6(1):15-25, 1982.
[Sal00] Andrea Saltelli. What is sensitivity analysis? In Sensitivity analysis, Wiley Ser. Probab. Stat., pages 3-13. Wiley, Chichester, 2000.
[SB98] Andrea Saltelli and Ricardo Bolado. An alternative way to compute Fourier amplitude sensitivity test (FAST). Computational Statistics and Data Analysis, 26:445-460, 1998.
[SH91] Andrea Saltelli and T. Homma. LISA package user's guide, part III: SPOP (Statistical POst Processor). Uncertainty and sensitivity analysis for model output. Program description and user guide. Technical report, CEC/JRC Nuclear Science and Technology, 1991.
[SM90] Andrea Saltelli and J. Marivoet. Non-parametric statistics in sensitivity analysis for model output: A comparison of selected techniques. Reliability Engineering and System Safety, 28:229-253, 1990.
[Sob93] I. M. Sobol'. Sensitivity analysis for nonlinear mathematical models. Mathematical Modeling and Computational Experiment, 1(4):407-414, 1993.
[Sob01] I. M. Sobol'. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simulation, 55(1-3):271-280, 2001. The Second IMACS Seminar on Monte Carlo Methods (Varna, 1999).
[SS73] J. H. Schaibly and Kurt E. Shuler. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. II. Applications. The Journal of Chemical Physics, 59(8):3879-3888, October 1973.
[SS95] Andrea Saltelli and I. M. Sobol'. About the use of rank transformation in sensitivity analysis of model output. Reliability Engineering and System Safety, 50:225-239, 1995.
[STC99] Andrea Saltelli, Stefano Tarantola, and Karen Chan. A quantitative model-independent method for global sensitivity analysis of model output. Technometrics, 41(1):39-56, February 1999.
[STCR04] Andrea Saltelli, Stefano Tarantola, Francesca Campolongo, and Marco Ratto. Sensitivity analysis in practice. John Wiley & Sons Ltd., Chichester, 2004. A guide to assessing scientific models.
[tfe05] Wikipedia, the free encyclopedia. Monte Carlo method. http://en.wikipedia.org/wiki/Monte_Carlo_method, 2005.
[Vos96] David Vose. Quantitative risk analysis. John Wiley & Sons Ltd., Chichester, 1996. A guide to Monte Carlo simulation modelling.
[Wey38] H. Weyl. Mean motion. American Journal of Mathematics, 60(4):889-896, October 1938.
[WJ98] Gregory D. Wyss and Kelly H. Jorgensen. A user's guide to LHS: Sandia's Latin hypercube sampling software. Technical report, Risk Assessment and Systems Modeling Department, Sandia National Laboratories, 1998.
A Appendix
A.1 Computer code for Iman & Conover
rank correlation
Fully functional Matlab code for inducing rank correlation onto a sample R with the technique proposed by Iman and Conover [IC82]. Cstar needs to be a positive semi-definite correlation matrix.
function corrR = rankCorr(Cstar,R)
%CORRR = RANKCORR(CSTAR,R)
% Induces rank correlation onto a sample.
% Method of Iman & Conover.
% Iman, R. L., and W. J. Conover. 1982.
% A Distribution-free Approach to Inducing Rank
% Correlation Among Input Variables.
% Communications in Statistics: Simulation and
% Computations 11:311-334.
% Input:
% Cstar : wanted correlation matrix (k,k)
% R : sample matrix (N,k)
% Output:
% corrR : correlated sample matrix (N,k)
%
C = Cstar;
[N k] = size(R);
% Calculate the sample correlation matrix T
T = corrcoef(R);
% Calculate the upper triangular Cholesky
% factor of Cstar, i.e. P'*P = C
P = chol(C);
% Calculate the upper triangular Cholesky
% factor of T, i.e. Q'*Q = T
Q = chol(T);
% S satisfies S*T*S' = C
S = P'*inv(Q');
% Replace values in samples with corresponding
% rank-indices and convert to van der Waerden scores
RvdW = -sqrt(2).*erfcinv(2*(getRanks(R)/(N+1)));
% Matrix RBstar has a correlation matrix
% exactly equal to C
RBstar = RvdW*S';
% Match up the rank pairing in R according to RBstar
ranks = getRanks(RBstar);
sortedR = sort(R);
for i=1:k
corrR(:,i) = sortedR(ranks(:,i),i);
end
function r = getRanks(u)
%R = GETRANKS(U)
% Ranking of a vector (matrix)
% Input:
% u : vector (matrix) (nrow,ncol)
% Output:
% r : rank of the vector (nrow,ncol)
%
% Returns a matrix the size of u where each
% element in u has been replaced by its
% corresponding rank value.
%
% If a matrix is used as argument, each column
% is treated separately.
%
if size(u,1)==1
u = u';
end
[nr nc] = size(u);
[s,ind] = sort(u);
for i=1:nc
r(ind(1:nr,i),i) = 1:nr;
end
A.2 Computer code for the inverse of
the CDF of the gamma distribution
Matlab code for computing the inverse of the CDF
of the gamma distribution. Adapted from codes in the
WAFO toolbox [gro04].
function r = invgamma(beta,eta,p);
%INVGAMMA Inverse of the gamma cumulative distribution
% function (CDF).
% R = INVGAMMA(BETA,ETA,P) returns the inverse of the
% gamma CDF with shape parameter BETA and scale
% parameter ETA, evaluated at the probabilities in P.
%
% Parameter restrictions: BETA > 0, ETA > 0
%
% Some references refer to the gamma distribution with
% a single parameter. This corresponds to INVGAMMA
% with ETA = 1.
%
% Reference: Johnson, Kotz and Balakrishnan (1994)
% "Continuous Univariate Distributions, vol. 1",
% p. 494 ff Wiley
% Tested on; Matlab 7.0.1
% History:
% adapted from stixbox ms 26.06.2000
% Revised by jr 01-August-2000
% - Added approximation for higher degrees of freedom
% added b parameter ms 23.08.2000
% revised pab 23.10.2000
% - added comnsize, nargchk
% Revised by P-A to fit in Eikos, 2004
if nargin < 2
eta = 1;
end
if nargin < 3
p = rand;
end
r = zeros(size(p));
ok = (0<=p & p<=1 & beta>0 & eta>0);
k = find(beta>130 & ok);
% This approximation is from Johnson et al, p. 348,
% Eq. (17.33).
if any(k),
za = sqrt(2)*erfinv(2*p(k)-1);
r(k) = beta+beta*((1-(1/9)*beta^(-1)+ ...
(1/3)*beta^(-0.5)*za).^3-1);
end
k1 = find(0<p & p<1 & beta<=130 & ok);
if any(k1)
% initial guess
r1 = max(beta-1,0.1)*ones(size(k1));
F1 = p(k1);
dr = ones(size(r1));
iy = 0;
max_count = 100;
ir = find(r1);
while(any(ir) & iy<max_count)
ri = r1(ir);
dr(ir) = (gammainc(ri,beta)-F1(ir))./ ...
pdfgamma(ri,beta,1);
ri = ri-dr(ir);
% Make sure that the current guess is > zero.
r1(ir) = ri+0.5*(dr(ir)-ri).*(ri<=0);
iy = iy+1;
ir = find((abs(dr)>sqrt(eps)*abs(r1)) & ...
abs(dr)>sqrt(eps));
end
r(k1) = r1;
if iy==max_count,
disp('Warning: invgamma did not converge.');
str = 'The last step was: ';
outstr = sprintf([str,'%13.8f'],dr(ir));
fprintf(outstr);
end
end
r(p==1 & ok) = Inf;
r = r*eta;
r(~ok) = NaN;
A.3 Computer code for Morris design
Matlab code to compute the mean and standard deviation of the elementary effects as proposed by Max D. Morris [Mor91].
function [mu sigma] = morris(k,p,r)
%[MU SIGMA] = MORRIS(K,P,R)
% Computes the mean and standard deviation of
% the elementary effects.
% Method of Morris.
% Max D. Morris. 1991.
% Factorial Sampling Plans for Preliminary
% Computational Experiments.
% Technometrics 33:161-174
% Input:
% K : no. of factors
% P : no. of levels
% R : no. of samples
% Output:
% MU : mean of elementary effects
% SIGMA: standard deviation of e.f.
%
% No. of model runs is R*(K+1)
%
% Get the delta value
delta = p/2/(p-1);
% Initial sample matrix B, (k+1,k),
% strictly lower triangular matrix of 1s.
B = tril(ones(k+1,k),-1);
% Matrix J, (k+1,k) matrix of 1s.
Jk = ones(k+1,k);
% Get the set of x values, [0,1-delta].
xset = 0:1/(p-1):1-delta;
% Initialize the elementary effects to zero.
eei = zeros(r,k);
indmat = []; xmat = [];
% Generate input data matrix.
for i=1:r
% Get the matrix D*, k-dimensional
% diagonal matrix in which each
% nonzero element is either +1 or
% -1 with equal probability.
Dstar = diag(sign(rand(k,1)*2-1));
% Get matrix P*, (k,k) random
% permutation matrix in which each
% column contains one element equal
% to 1 and all others equal to 0 and
% no two columns have 1s in the same
% position.
I = eye(k);
Pstar = I(:,randperm(k));
% Get vector x*, k-dimensional base value
% of X. Elements are randomly drawn from
% the allowed set.
xstar = xset(ceil(rand(k,1)*floor(p/2)));
% Get matrix B*, (k+1,k) orientation matrix.
% One elementary effect per input.
Bstar = (Jk(:,1)*xstar+(delta/2)* ...
((2*B-Jk)*Dstar+Jk))*Pstar;
% ith component has been decreased by delta.
[idec,jdec] = find(Dstar*Pstar<0);
% ith component has been increased by delta.
[iinc,jinc] = find(Dstar*Pstar>0);
% Build up an index matrix.
indmat = [indmat;jdec idec idec+1; ...
jinc iinc+1 iinc];
% Store the x-values.
xmat = [xmat;Bstar];
end
% Transform nonuniform indata from u(0,1)
% to p quantiles.
ind = find(<nonuniform input factors>);
xmat(:,ind) = 1/2/p+((p-1)/2/p-1/2/p)* ...
xmat(:,ind);
% Transform distributions from standard
% uniform to general.
xmat = distTransform(xmat);
% Perform the r*(k+1) evaluations.
for i=1:r
X = xmat((i-1)*(k+1)+1:i*(k+1),:);
Y(i,:) = function2evaluate(X);
end
% Compute the r elementary effects.
for i=1:r
ind = indmat((i-1)*k+1:i*k,:);
eei(i,ind(:,1)) = (Y(i,ind(:,2))- ...
Y(i,ind(:,3)))/delta;
end
% Take the absolute value of the r
% elementary effects (Simlab does this..).
%eei = abs(eei);
% Get the mean and standard deviation
% of the elementary effects.
mu = mean(eei);
sigma = std(eei);
A.4 Computer code for FAST
Matlab implementation of the Fourier Amplitude Sen-
sitivity Test (FAST):
function Si = FAST(k)
%SI = FAST(K)
% First order indicies for a given model
% computed with Fourier Amplitude
% Sensitivity Test (FAST).
% R. I. Cukier, C. M. Fortuin, Kurt E. Shuler,
% A. G. Petschek and J. H. Schaibly.
% Study of the sensitivity of coupled reaction
% systems to uncertainties in rate coefficients.
% I-III Theory/Applications/Analysis
% The Journal of Chemical Physics
% Input:
% K : no. of input factors
% Output:
% SI[] : sensitivity indices
% Other used variables/constants:
% OM[] : frequencies of parameters
% S[] : search curve
% X[] : coordinates of sample points
% Y[] : output of model
% OMAX : maximum frequency
% N : number of sample points
% AC[],BC[]: fourier coefficients
% V : total variance
% VI : partial variances
MI = 4; %: maximum number of fourier
% coefficients that may be
% retained in calculating the
% partial variances without
% interferences between the
% assigned frequencies
%
% Frequency assignment to input factors.
OM = SETFREQ(k);
% Computation of the maximum frequency
% OMAX and the no. of sample points N.
OMAX = OM(k);
N = 2*MI*OMAX+1;
% Setting the relation between the scalar
% variable S and the coordinates
% {X(1),X(2),...X(k)} of each sample point.
S = pi/2*(2*(1:N)-N-1)/N;
ANGLE = S'*OM;   % (N,k) matrix of angles
X = 0.5+asin(sin(ANGLE))/pi;
% Transform distributions from standard
% uniform to general.
X = distTransform(X);
% Do the N model evaluations.
Y = function2evaluate(X);
% Computation of Fourier coefficients.
AC = zeros(N,1); % initially zero
BC = zeros(N,1); % initially zero
q = (N-1)/2;
N0 = q+1;
for j=2:2:N % j is even
AC(j) = 1/N*(Y(N0)+(Y(N0+(1:q))+Y(N0-(1:q)))'* ...
cos(pi*j*(1:q)/N)');
end
for j=1:2:N % j is odd
BC(j) = 1/N*(Y(N0+(1:q))-Y(N0-(1:q)))'* ...
sin(pi*j*(1:q)/N)';
end
% Computation of the general variance V
% in the frequency domain.
V = 2*(AC'*AC+BC'*BC);
% Computation of the partial variances
% and sensitivity indices.
for i=1:k
Vi=0;
for j=1:MI
Vi = Vi+AC(j*OM(i))^2+BC(j*OM(i))^2;
end
Vi = 2*Vi;
Si(i) = Vi/V;
end
% Selection of a frequency set. Done
% recursively as described in:
% A computational implementation of FAST
% [McRae et al.]
function OM = SETFREQ(k)
OMEGA = [0 3 1 5 11 1 17 23 19 25 41 31 ...
23 87 67 73 85 143 149 99 119 ...
237 267 283 151 385 157 215 449 ...
163 337 253 375 441 673 773 875 ...
873 587 849 623 637 891 943 1171 ...
1225 1335 1725 1663 2019];
DN = [4 8 6 10 20 22 32 40 38 26 56 62 ...
46 76 96 60 86 126 134 112 92 ...
128 154 196 34 416 106 208 328 ...
198 382 88 348 186 140 170 284 ...
568 302 438 410 248 448 388 596 ...
216 100 488 166 0];
OM(1) = OMEGA(k);
for i=2:k
OM(i) = OM(i-1)+DN(k+1-i);
end
% to use the same frequencies as SimLab...
if k==2
OM = [5 9];
elseif k==3
OM = [1 9 15];
end
A.5 Computer code for EFAST
Matlab implementation of the Extended Fourier Amplitude Sensitivity Test (EFAST):
function [Si,Sti] = EFAST(k,wantedN)
%[SI,STI] = EFAST(K,WANTEDN)
% First order and total effect indices for a given
% model computed with Extended Fourier Amplitude
% Sensitivity Test (EFAST).
% Andrea Saltelli, Stefano Tarantola and Karen Chan.
% 1999
% A quantitative model-independent method for global
% sensitivity analysis of model output.
% Technometrics 41:39-56
% Input:
% K : no. of input factors
% WANTEDN : wanted no. of sample points
% Output:
% SI[] : first order sensitivity indices
% STI[] : total effect sensitivity indices
% Other used variables/constants:
% OM[] : vector of k frequencies
% OMI : frequency for the group of interest
% OMCI[] : set of freq. used for the compl. group
% X[] : parameter combination rank matrix
% AC[],BC[]: fourier coefficients
% FI[] : random phase shift
% V : total output variance (for each curve)
% VI : partial var. of par. i (for each curve)
% VCI : part. var. of the compl. set of par...
% AV : total variance in the time domain
% AVI : partial variance of par. i
% AVCI : part. var. of the compl. set of par.
% Y[] : model output
% N : no. of runs on each curve
NR = 1; %: no. of search curves
MI = 4; %: maximum number of fourier
% coefficients that may be
% retained in calculating the
% partial variances without
% interferences between the
% assigned frequencies
%
% Computation of the frequency for the group
% of interest OMi and the no. of sample points N.
OMi = floor((wantedN/NR-1)/(2*MI)/k);
N = 2*MI*OMi+1;
if(N*NR < 65)
fprintf('Error: sample size must be >= 65 per factor.\n');
return;
end
% Algorithm for selecting the set of frequencies.
% OMci(i), i=1:k-1, contains the set of frequencies
% to be used by the complementary group.
OMci = SETFREQ(N-1,OMi/2/MI);
% Loop over the k input factors.
for i=1:k
% Initialize AV,AVi,AVci to zero.
AV = 0;
AVi = 0;
AVci = 0;
% Loop over the NR search curves.
for L=1:NR
% Setting the vector of frequencies OM
% for the k factors.
cj = 1;
for j=1:k
if(j==i)
% For the factor of interest.
OM(i) = OMi;
else
% For the complementary group.
OM(j) = OMci(cj);
cj = cj+1;
end
end
% Setting the relation between the scalar
% variable S and the coordinates
% {X(1),X(2),...X(k)} of each sample point.
FI = rand(1,k)*2*pi; % random phase shift
S_VEC = pi*(2*(1:N)-N-1)/N;
OM_VEC = OM(1:k);
FI_MAT = FI(ones(N,1),1:k);
ANGLE = S_VEC'*OM_VEC+FI_MAT;   % (N,k) matrix of angles
X = 0.5+asin(sin(ANGLE))/pi;
% Transform distributions from standard
% uniform to general.
X = distTransform(X);
% Do the N model evaluations.
Y = function2evaluate(X);
% Subtract the average value.
Y = Y-mean(Y);
% Fourier coeff. at [1:OMi/2].
NQ = (N-1)/2;
N0 = NQ+1;
COMPL = 0;
Y_VECP = Y(N0+(1:NQ))+Y(N0-(1:NQ));
Y_VECM = Y(N0+(1:NQ))-Y(N0-(1:NQ));
for j=1:OMi/2
ANGLE = j*2*(1:NQ)*pi/N;
C_VEC = cos(ANGLE);
S_VEC = sin(ANGLE);
AC(j) = (Y(N0)+Y_VECP'*C_VEC')/N;
BC(j) = Y_VECM'*S_VEC'/N;
COMPL = COMPL+AC(j)^2+BC(j)^2;
end
% Computation of V_{(ci)}.
Vci = 2*COMPL;
AVci = AVci+Vci;
% Fourier coeff. at [P*OMi, for P=1:MI].
COMPL = 0;
Y_VECP = Y(N0+(1:NQ))+Y(N0-(1:NQ));
Y_VECM = Y(N0+(1:NQ))-Y(N0-(1:NQ));
for j=OMi:OMi:OMi*MI
ANGLE = j*2*(1:NQ)*pi/N;
C_VEC = cos(ANGLE);
S_VEC = sin(ANGLE);
AC(j) = (Y(N0)+Y_VECP'*C_VEC')/N;
BC(j) = Y_VECM'*S_VEC'/N;
COMPL = COMPL+AC(j)^2+BC(j)^2;
end
% Computation of V_i.
Vi = 2*COMPL;
AVi = AVi+Vi;
% Computation of the total variance
% in the time domain.
AV = AV+Y'*Y/N;
end % L=1:NR
% Computation of sensitivity indicies.
AV = AV/NR;
AVi = AVi/NR;
AVci = AVci/NR;
Si(i) = AVi/AV;
Sti(i) = 1-AVci/AV;
end % i=1:k
% Algorithm for selection of a frequency
% set for the complementary group. Done
% recursively as described in:
% Appendix of Sensitivity Analysis
% [Saltelli et al.]
function OMci = SETFREQ(Kci,OMciMAX)
if Kci==1
OMci = 1;
elseif OMciMAX==1
OMci = ones(1,Kci);
else
if(OMciMAX < Kci)
INFD = OMciMAX;
else
INFD = Kci;
end
ISTEP = round((OMciMAX-1)/(INFD-1));
if(OMciMAX == 1)
ISTEP = 0;
end
OTMP = 1:ISTEP:INFD*ISTEP;
fl_INFD = floor(INFD);
for i=1:Kci
j = mod(i-1,fl_INFD)+1;
OMci(i) = OTMP(j);
end
end