
ErSE-253

Data Analysis in Geosciences


Homework 4
Due on Thursday, 13 October 2016

Wardana Saputra
146629

Earth Science and Engineering Program


Fall 2016

Problem 1
You throw two dice, D and Q, and subtract the outcome of Q from the outcome of D; the
difference is a new random variable S = D - Q. The die Q is a four-sided die, while the die D is a
normal six-sided die that has been modified so that high numbers are more likely: the probabilities
of rolling 1, 2, and 3 are 1/12, 1/12, and 2/12, respectively, and the probabilities of rolling
4, 5, and 6 are 2/12, 3/12, and 3/12.
a) What is the set of possible outcomes of S?
The possible outcomes are: {-3, -2, -1, 0, 1, 2, 3, 4, 5}.
b) What are the corresponding probabilities?
The outcomes are not equally probable. The corresponding probabilities are:

s_i       -3     -2     -1      0      1      2      3      4      5
p(s_i)    1/48   2/48   4/48   6/48   8/48   10/48  8/48   6/48   3/48
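The probabilities above can be checked by enumerating all 24 (D, Q) pairs. The following is a minimal sketch, assuming that Q is a fair four-sided die and that D and Q are independent; the variable names (pD, pQ, sVals, pS) are chosen here for illustration.

```matlab
% Enumerate every (D, Q) pair and accumulate P(S = D - Q).
pD = [1 1 2 2 3 3] / 12;   % modified six-sided die D, faces 1..6
pQ = [1 1 1 1] / 4;        % four-sided die Q (assumed fair)
sVals = -3:5;              % possible outcomes of S
pS = zeros(size(sVals));
for d = 1:6
    for q = 1:4
        k = (d - q) - sVals(1) + 1;      % position of the outcome d - q in sVals
        pS(k) = pS(k) + pD(d) * pQ(q);   % independence: joint = product
    end
end
disp([sVals; round(pS * 48)])   % second row: numerators over 48
```

The second row of the displayed matrix holds the numerators of p(s_i) over the common denominator 48.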

c) Find the expected value and the standard deviation of S?


s_i             -3      -2      -1      0      1      2       3       4       5
s_i p(s_i)      -3/48   -4/48   -4/48   0      8/48   20/48   24/48   24/48   15/48
s_i^2 p(s_i)     9/48    8/48    4/48   0      8/48   40/48   72/48   96/48   75/48

The expected value is the probability-weighted average of the n possible outcomes:

$$E\{S\} = \mu_S = \sum_{i=1}^{n} s_i \, p(s_i) = \frac{80}{48} = 1.6667$$

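The variance of a random variable can be expressed as:

$$\mathrm{Var}\{S\} = \sigma_S^2 = \sum_{i=1}^{n} s_i^2 \, p(s_i) - \left( \sum_{i=1}^{n} s_i \, p(s_i) \right)^2 = \frac{312}{48} - \left( \frac{80}{48} \right)^2 = 3.7222$$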

Thus, the standard deviation of S is

$$\sigma_S = \sqrt{\mathrm{Var}\{S\}} = 1.9293$$
d) Now make a joint variable using S and the higher outcome H of D and Q. The joint
variable is (S,H). What are the possible outcomes and probabilities for the joint variable?
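Pairs of joint discrete random variables (S, H) may be defined by stating all possible
outcomes
$$\{(s_{(1)}, h_{(1)}), \dots, (s_{(1)}, h_{(m)}), \dots, (s_{(n)}, h_{(1)}), \dots, (s_{(n)}, h_{(m)})\}$$
with corresponding probabilities that sum to 1:
$$\{p_{11}, \dots, p_{1m}, \dots, p_{n1}, \dots, p_{nm}\}$$
Here S has n = 9 possible outcomes and H has m = 6.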

Wardana Saputra | 146629

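Joint probabilities of (S, H). Rows are the possible outcomes of S, columns are the possible outcomes of H:

S \ H     1      2      3      4      5      6
-3        0      0      0      1/48   0      0
-2        0      0      1/48   1/48   0      0
-1        0      1/48   1/48   2/48   0      0
 0        1/48   1/48   2/48   2/48   0      0
 1        0      1/48   2/48   2/48   3/48   0
 2        0      0      2/48   2/48   3/48   3/48
 3        0      0      0      2/48   3/48   3/48
 4        0      0      0      0      3/48   3/48
 5        0      0      0      0      0      3/48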

e) What is the marginal probability that H is equal to 3?


The marginal probability that H is equal to 3 is the sum of all joint probabilities with
H = 3, taken over the 9 possible outcomes of S:

$$P\{H = 3\} = \sum_{i=1}^{9} p(s_i, H = 3) = \frac{1 + 1 + 2 + 2 + 2}{48} = \frac{8}{48} = 0.16667$$
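The joint probabilities and the marginal of H can be reproduced with the same kind of enumeration. This is a sketch under the same assumptions as the problem statement (Q fair, D and Q independent); the matrix name J and the index names are illustrative only.

```matlab
% Build the joint PMF of (S, H) with S = D - Q and H = max(D, Q).
pD = [1 1 2 2 3 3] / 12;   % modified six-sided die D
pQ = [1 1 1 1] / 4;        % four-sided die Q (assumed fair)
sVals = -3:5;              % outcomes of S (rows of J)
hVals = 1:6;               % outcomes of H (columns of J)
J = zeros(numel(sVals), numel(hVals));
for d = 1:6
    for q = 1:4
        r = (d - q) - sVals(1) + 1;   % row index for S = d - q
        c = max(d, q);                % column index for H
        J(r, c) = J(r, c) + pD(d) * pQ(q);
    end
end
pH = sum(J, 1);   % marginal of H: sum each column over all outcomes of S
fprintf('P(H = 3) = %.5f\n', pH(3));
```

Summing J along its rows instead (sum(J, 2)) gives the marginal of S from part (b).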


Problem 2
In this exercise we look at calculating the global mean. Please load in the same two variables as
in Homework 3:
a) First calculate directly the mean and standard deviation of the two variables Ds and Es
and then use the cell declustering method to calculate the global mean using 1.5 km x 1.5
km cell size.
Variable   Direct mean   Direct standard deviation   Global mean with cell declustering
Ds         2.2772        0.8832                      2.5305
Es         111.5490      52.3449                     86.1246

The direct mean can be misleading when the data are highly variable and the samples are
spatially clustered; the global mean from cell declustering is likely the better estimate
because it reduces the weight of densely sampled cells.
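To illustrate why the two means in the table can differ, here is a small sketch of cell declustering on made-up data (the coordinates and values are hypothetical, not the homework data). Four samples with high values are clustered in one cell, and two isolated samples have low values.

```matlab
% Hypothetical samples: four clustered high values and two isolated low values.
x = [0.2 0.4 0.6 0.8 2.0 4.0];   % east coordinate (km)
y = [0.2 0.4 0.3 0.5 2.0 4.0];   % north coordinate (km)
v = [10 10 10 10 2 4];           % sample values
h = 1.5;                         % cell size (km)

% Assign every sample to a square cell of size h.
ix = floor(x / h) + 1;
iy = floor(y / h) + 1;
[~, ~, cellId] = unique([ix(:) iy(:)], 'rows');   % one id per non-empty cell

% Average within each cell, then average the cell means.
cellMeans = accumarray(cellId, v(:), [], @mean);
directMean = mean(v);
declusteredMean = mean(cellMeans);
fprintf('direct = %.4f, declustered = %.4f\n', directMean, declusteredMean);
```

The clustered cell counts only once in the declustered mean, so the estimate moves away from the over-sampled high values. Like the globalmean function in the appendix, this sketch averages over non-empty cells only.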
b) Now systematically change the cell size from 0.5 km to 10 km and plot how the global
mean of Es and Ds changes as a function of cell size. From these graphs, pick your global
mean estimate for both Es and Ds.

The global mean of Ds tends to decrease as the cell size grows, and its variability
increases with increasing cell size. Averaging the declustered means over all tested cell
sizes gives a global mean estimate for Ds of about 2.4940.


On the other hand, the global mean of Es tends to increase as the cell size grows.
Its variability also seems to increase with increasing cell size. The global mean estimate
for Es is 93.4990.
c) Study how the global mean estimate of Ds and Es changes for independently considering
Ds and Es where Cs=1 on one hand, and Cs=2 on the other hand.


Separating Ds and Es by the variable Cs seems to be a good idea, as both graphs above
show. It is clear that Cs affects both Ds and Es. Without separating by Cs, the mean may
be biased toward a value in between the two groups, as a comparison of these results
with (b) shows.


Problem 3
Now we look at point estimation for Ds and Es:
a) What are the estimated values of Ds and Es at (x_1, y_1) = (4, 15) and (x_2, y_2) = (12, 2) using
the polygonal, triangulation and the inverse distance methods? Explain your answer and
how you choose what points are used in the estimation.

The polygonal method is the easiest: we simply take the value of the sample that is
closest to the estimation point. It often works, but it can fail badly if there are
large discontinuities between the data. We can therefore regard the polygonal result
as the most pessimistic of the three.
Triangulation is a better way to avoid major discontinuities. To define the triangle,
I first take the three samples closest to the estimation point (shown in black in the
picture above). Then I check whether the point is covered by this triangle: the three
angles subtended at the point by the triangle's edges should sum to 360°. If they do
not, I replace the third sample with the next-closest one and check again. Finally, I
estimate the value at the point O by weighting each vertex value by the area of the
opposite sub-triangle:
$$\hat{v}_O = \frac{A_{OBC} \, v_A + A_{OAC} \, v_B + A_{OAB} \, v_C}{A_{ABC}}$$
where A, B, and C are the triangle vertices with sample values v_A, v_B, and v_C.
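The area weighting described above can be sketched on a made-up triangle (the vertex coordinates and values below are hypothetical, not the homework data). The inside check here compares areas rather than angles: the point is covered when the three sub-triangle areas add up to the full area.

```matlab
% Hypothetical triangle A, B, C with sample values and an estimation point O.
A = [0 0];  B = [4 0];  C = [0 4];
vA = 1;     vB = 2;     vC = 3;
O = [1 1];

% Area of a triangle from its three vertices (polyarea is a built-in).
triArea = @(P, Q, R) polyarea([P(1) Q(1) R(1)], [P(2) Q(2) R(2)]);

aOBC = triArea(O, B, C);   % sub-triangle opposite vertex A
aOAC = triArea(O, A, C);   % sub-triangle opposite vertex B
aOAB = triArea(O, A, B);   % sub-triangle opposite vertex C
aABC = triArea(A, B, C);   % full triangle

% O is covered by the triangle when the sub-areas add up to the full area.
isInside = abs(aOBC + aOAC + aOAB - aABC) < 1e-12;

% Weight each vertex value by the area of the opposite sub-triangle.
vO = (aOBC * vA + aOAC * vB + aOAB * vC) / aABC;
fprintf('inside = %d, estimate = %.4f\n', isInside, vO);
```

The result agrees with the plane through the three vertex values evaluated at O, which is what triangulation interpolates.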

The inverse distance method is also considered better, since it accounts for the distance
from each sample to the estimation point. Moreover, if we have a lot of data we can use all
of them instead of only three data points as in triangulation. In this problem, I define
the neighboring points as those inside a circle of radius 3 km centered on the estimation
point, as shown in both pictures. With distances d_i to the n neighboring samples v_i, the
inverse distance estimate is:
$$\hat{v} = \frac{\sum_{i=1}^{n} \frac{1}{d_i} v_i}{\sum_{i=1}^{n} \frac{1}{d_i}}$$
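A minimal sketch of the inverse distance estimate with a circular neighborhood, using made-up samples (the coordinates and values are hypothetical). The power p and the radius r are parameters; the fourth sample lies outside the radius and is ignored.

```matlab
% Hypothetical samples around the estimation point O.
xs = [1 0  0 5];          % east coordinate (km)
ys = [0 2 -2 0];          % north coordinate (km)
vs = [10 20 30 100];      % sample values
O = [0 0];                % estimation point
p = 1;                    % power of the inverse distance weighting
r = 3;                    % neighborhood radius (km)

d = sqrt((xs - O(1)).^2 + (ys - O(2)).^2);   % distance of every sample to O
near = d <= r;                               % samples inside the neighborhood
w = 1 ./ d(near).^p;                         % inverse distance weights
vHat = sum(w .* vs(near)) / sum(w);          % weighted average
fprintf('inverse distance estimate = %.4f\n', vHat);
```

Setting p = 2 gives the variant used in part (b). The sketch does not handle a sample located exactly at the estimation point, where d = 0 would make the weight infinite.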

Finally, the results of the three methods at the two estimation points are tabulated
below for Ds and Es:
Method                   Ds at (4,15)   Ds at (12,2)   Es at (4,15)   Es at (12,2)
Polygonal                3.6455         1.6859         25.9574        174.1716
Triangulation            2.8831         1.5603         36.4157        132.4452
Inverse distance, p=1    2.8644         1.7232         43.8984        163.1645


b) Now estimate Ds on a regular grid (0.1 x 0.1 grid spacing) using all the sample
values, using the inverse distance method with p=2. Display your result. Make sure to
show axes values and a colorbar.

In this problem, since we have to use all the sample values, I do not restrict the
neighborhood to a circle as in (a). Moreover, the formula is now different, since the
power is p = 2:
$$\hat{v} = \frac{\sum_{i=1}^{n} \frac{1}{d_i^2} v_i}{\sum_{i=1}^{n} \frac{1}{d_i^2}}$$

Applying this method at every estimation point of the 0.1 x 0.1 grid results in a
complete color map of Ds, as shown in the picture above. Again, this is only an
estimate from a simple inverse distance method. In the bonus problem, I use kriging, a
more sophisticated method that is known to be unbiased and should give a better estimate of Ds.


Bonus
Repeat (a) using Kriging and report the steps you take on the way and the resulting values at the
two estimation locations.
In order to do kriging, the first step is to find the covariogram model that best represents the
whole data sample. Here I calculate the omni-directional variogram of Ds and fit it with
this variogram function:
$$\gamma(h) = C_0 + C_1 \left( 1 - \exp\left( -\frac{3|h|}{a} \right) \right)$$
We get C_0, which is the nugget effect, = 0.3, C_1 = 0.5, and a = 6. I fit the variogram first
because the nugget effect is easier to detect on the variogram than on the covariogram.

Then we can convert the variogram function into a covariogram function as follows:

$$C(h) = C_1 \exp\left( -\frac{3|h|}{a} \right) = 0.5 \exp(-0.5|h|)$$
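As a quick numerical check of the two model functions, the sketch below evaluates them with the fitted parameters C_0 = 0.3, C_1 = 0.5, and a = 6. The function handle names and the chosen lags are illustrative only.

```matlab
% Exponential variogram and the matching covariogram with the fitted parameters.
C0 = 0.3;   % nugget effect
C1 = 0.5;   % sill contribution of the exponential structure
a  = 6;     % range parameter (km)
gammaFun = @(h) C0 + C1 * (1 - exp(-3 * abs(h) / a));
covFun   = @(h) C1 * exp(-3 * abs(h) / a);

h = [0 1 3 6 12];                     % a few lags (km)
disp([h; gammaFun(h); covFun(h)])     % rows: lag, variogram, covariogram

% Fraction of C1 reached by the variogram at h = a, equal to 1 - exp(-3).
fracAtRange = (gammaFun(a) - C0) / C1;
fprintf('fraction of C1 reached at h = a: %.4f\n', fracAtRange);
```

With the factor 3 in the exponent, the variogram has covered about 95% of C_1 at the lag h = a, and for every lag the two functions add up to C_0 + C_1.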

Now we can create the covariance matrix C by evaluating the covariogram at the distances
between all pairs of data points, and the covariance vector D by evaluating it at the
distances between the estimation point and all available data.
Finally, we obtain the weight of each data point from the inverse of C applied to D,
$w = C^{-1} D$. The estimated value at the unknown location can be determined as follows:
$$\hat{v} = \sum_{i=1}^{n} w_i \, v_i$$
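The steps above can be sketched on three made-up samples (the coordinates and values are hypothetical). As in the appendix code, the system is augmented with a row and a column of ones so that the weights sum to 1; the covariogram uses C_1 = 0.5 and a = 6 from the fit.

```matlab
% Hypothetical samples (column vectors) and the fitted covariogram.
xs = [0; 3; 0];   ys = [0; 0; 4];   vs = [2.0; 3.0; 1.5];
covFun = @(h) 0.5 * exp(-0.5 * h);
n = numel(vs);

% Distances between all pairs of samples.
[XI, XJ] = meshgrid(xs, xs);
[YI, YJ] = meshgrid(ys, ys);
H = sqrt((XI - XJ).^2 + (YI - YJ).^2);

% Augmented kriging matrix: covariances plus the sum-to-one constraint.
K = [covFun(H), ones(n, 1); ones(1, n), 0];

% Right-hand side for the estimation point P.
P = [1 1];
dP = sqrt((xs - P(1)).^2 + (ys - P(2)).^2);
rhs = [covFun(dP); 1];

sol = K \ rhs;        % solve the linear system instead of forming inv(K)
w = sol(1:n);         % kriging weights (the last entry is the Lagrange multiplier)
vHat = sum(w .* vs);  % estimate at P
fprintf('weights sum to %.4f, estimate = %.4f\n', sum(w), vHat);
```

Solving with the backslash operator avoids forming the inverse explicitly. When P coincides with a sample location, this system returns that sample's value, since the right-hand side then equals a column of K.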


By the kriging method, the estimated value of Ds is 3.1161 at (x_1, y_1) = (4, 15) and 1.9260 at
(x_2, y_2) = (12, 2).
I am confident that this is the best estimate among all the methods in the previous problems,
since kriging is an unbiased method that aims for a mean residual close to 0, and it is "best"
in the sense that it minimizes the residual variance $\sigma_R^2$.

Finally, I applied kriging on a 0.1 km x 0.1 km grid. After waiting a few minutes, I obtained this
color map of Ds, which looks much better than that of Problem 3(b).


Matlab codes
%% ==================================== Problem 1 ==================================
clc
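s=[-3,-2,-1,0,1,2,3,4,5];        % possible outcomes of S
p=[1,2,4,6,8,10,8,6,3]./48;      % corresponding probabilities
ES=sum(p.*s);                    % expected value
VarS=sum(p.*s.^2)-ES^2;          % variance
stdS=sqrt(VarS);                 % standard deviation (name avoids shadowing std)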
%% ==================================== Problem 2 ==================================
clc
clear all
close all
load 'DataHW3'
%% 2.a
meanDs=mean(Ds);
stdDs=std(Ds);
meanEs=mean(Es);
stdEs=std(Es);
globalmeanDs=globalmean(Ds,Xs,Ys,1.5);
globalmeanEs=globalmean(Es,Xs,Ys,1.5);
%% 2.b
h=0.5:0.05:10;
for n=1:length(h)
meanDs(n)=globalmean(Ds,Xs,Ys,h(n));
meanEs(n)=globalmean(Es,Xs,Ys,h(n));
end
globmeanDs=mean(meanDs);
globmeanEs=mean(meanEs);
figure
plot(h,meanDs,'k.-')
hold on
plot([0.5 10],[globmeanDs globmeanDs],'r')
xlabel('Cell size (km)')
ylabel('Global mean of Ds (units of cm)')
title('Global mean of Ds as a function of cell size')
figure
plot(h,meanEs,'k.-')
hold on
plot([0.5 10],[globmeanEs globmeanEs],'r')
xlabel('cell size (km)')
ylabel('global mean of Es (m above sea)')
title('Global mean of Es as a function of cell size')
%% 2.c
clc
Ds1=Ds(Cs==1); Ds2=Ds(Cs==2);
Es1=Es(Cs==1); Es2=Es(Cs==2);
Xs1=Xs(Cs==1); Xs2=Xs(Cs==2);
Ys1=Ys(Cs==1); Ys2=Ys(Cs==2);
h=0.5:0.05:10;
for n=1:length(h)
meanDs1(n)=globalmean(Ds1,Xs1,Ys1,h(n));
meanEs1(n)=globalmean(Es1,Xs1,Ys1,h(n));
meanDs2(n)=globalmean(Ds2,Xs2,Ys2,h(n));
meanEs2(n)=globalmean(Es2,Xs2,Ys2,h(n));
end
globmeanDs1=mean(meanDs1);
globmeanEs1=mean(meanEs1);
globmeanDs2=mean(meanDs2);
globmeanEs2=mean(meanEs2);
c1=[1 0.8 0]; %dark yellow
c2=[0.6 0 0]; %dark red
figure
h1=plot(h,meanDs1,'.-','color',c1);
hold on
h2=plot(h,meanDs2,'.-','color',c2);


hold on
plot([0.5 10],[globmeanDs1 globmeanDs1],'k--')
hold on
plot([0.5 10],[globmeanDs2 globmeanDs2],'k--')
xlabel('Cell size (km)')
ylabel('Global mean of Ds (units of cm)')
legend([h1,h2],'Cs=1','Cs=2')
title('Global mean of Ds as a function of cell size')
corrcoef(h,meanDs1)
corrcoef(h,meanDs2)
figure
h1=plot(h,meanEs1,'.-','color',c1);
hold on
h2=plot(h,meanEs2,'.-','color',c2);
hold on
plot([0.5 10],[globmeanEs1 globmeanEs1],'k--')
hold on
plot([0.5 10],[globmeanEs2 globmeanEs2],'k--')
xlabel('cell size (km)')
ylabel('global mean of Es (m above sea)')
legend([h1,h2],'Cs=1','Cs=2')
title('Global mean of Es as a function of cell size')
corrcoef(h,meanEs1)
corrcoef(h,meanEs2)
% function to obtain global mean of V by cell declustering with cell size h
function [meanV]=globalmean(V,Xs,Ys,h)
nx=ceil(max(Xs)/h)+1;
ny=ceil(max(Ys)/h)+1;
cellmean=nan(nx,ny); % NaN marks empty cells; name avoids shadowing built-in cell
for i=1:nx
    for j=1:ny
        incell=Xs>=(i-1)*h & Ys>=(j-1)*h & Xs<i*h & Ys<j*h;
        if any(incell)
            cellmean(i,j)=mean(V(incell));
        end
    end
end
meanV=mean(cellmean(~isnan(cellmean))); % average over non-empty cells only
end
% function to obtain Euclidean distance between two points
function [d]=distance(Xs,Ys,P)
d=sqrt((Xs-P(1)).^2+(Ys-P(2)).^2);
end
% function to obtain point estimation by polygonal method
function [mu]=estimatepoly(M,Xs,Ys,A)
dist=distance(Xs,Ys,A);
mu=M(dist==min(dist));
end
% function to obtain point estimation by triangulation method
function [mu]=estimatetriangular(M,Xs,Ys,O)
dist=distance(Xs,Ys,O);
[~,ind]=sort(dist); % sample indices ordered from nearest to farthest
n=3;
P(1:n)=ind(1:n);
% the triangle has to cover the point
while (~inpolygon(O(1),O(2),Xs(P),Ys(P)))
    n=n+1;
    P(3)=ind(n);
end
% areas of the three sub-triangles and of the full triangle
Aoab=polyarea([O(1),Xs(P(1)),Xs(P(2))],[O(2),Ys(P(1)),Ys(P(2))]);
Aobc=polyarea([O(1),Xs(P(2)),Xs(P(3))],[O(2),Ys(P(2)),Ys(P(3))]);
Aoac=polyarea([O(1),Xs(P(1)),Xs(P(3))],[O(2),Ys(P(1)),Ys(P(3))]);
Aabc=polyarea(Xs(P),Ys(P));
% weight each vertex value by the area of the opposite sub-triangle
mu=(M(P(1))*Aobc+M(P(2))*Aoac+M(P(3))*Aoab)/Aabc;
end


% function to obtain point estimation by inverse distance method


function [mu]=estimateinversedist(M,Xs,Ys,O,n)
dist=distance(Xs,Ys,O);
[distin ind]=sort(dist);
r=2.5; %radius of neighborhood
P=ind(distin<=r);
mu=sum((1./dist(P)).^n.*M(P))/sum((1./dist(P)).^n);
end
%% ==================================== Problem 3 ==================================
%% 3.a
clc
clear all
close all
load 'DataHW3'
A=[4,15];
B=[12,2];
% polygonal
polyDs_A=estimatepoly(Ds,Xs,Ys,A);
polyEs_A=estimatepoly(Es,Xs,Ys,A);
polyDs_B=estimatepoly(Ds,Xs,Ys,B);
polyEs_B=estimatepoly(Es,Xs,Ys,B);
% triangulation
triDs_A=estimatetriangular(Ds,Xs,Ys,A);
triEs_A=estimatetriangular(Es,Xs,Ys,A);
triDs_B=estimatetriangular(Ds,Xs,Ys,B);
triEs_B=estimatetriangular(Es,Xs,Ys,B);
% inverse distance with power n=1
n=1;
invDs_A=estimateinversedist(Ds,Xs,Ys,A,n);
invEs_A=estimateinversedist(Es,Xs,Ys,A,n);
invDs_B=estimateinversedist(Ds,Xs,Ys,B,n);
invEs_B=estimateinversedist(Es,Xs,Ys,B,n);
%% 3.b
n=2; %power of inverse distance method
s=0.1; %grid spacing
x=0:s:ceil(max(Xs));
y=0:s:ceil(max(Ys));
D=zeros(length(x)-1,length(y)-1);
for i=1:length(x)-1
    for j=1:length(y)-1
        P=[1/2*(x(i+1)+x(i)),1/2*(y(j+1)+y(j))]; % cell center
        dist=distance(Xs,Ys,P); % distances from all samples to the cell center
        D(i,j)=sum((1./dist).^n.*Ds)/sum((1./dist).^n);
    end
end
D=D';
imagesc(x,y,D)
colormap jet
colorbar
set(gca,'YDir','normal','DataAspectRatio',[1 1 1])
title('Measure of humidity content (units of cm)')
xlabel('East location (km)')
ylabel('North location (km)')
%% ==================================== Kriging ====================================
clc
clear all
close all
load 'DataHW3'
A=[4,15];
B=[12,2];
% construct matrix of distances
for i=1:length(Ds)
for j=1:length(Ds)
dist(i,j)=distance(Xs(j),Ys(j),[Xs(i),Ys(i)]);
end
end


% find omni-directional variogram
for h=1:25
    IND=find(and(dist>(h-1),dist<h)); % sample pairs whose lag falls in (h-1,h)
    [I,J]=ind2sub(size(dist),IND); % subscripts into the N-by-N distance matrix
    u=Ds(I);
    v=Ds(J);
    varh(h)=1/(2*length(u))*sum((v-u).^2);
end
% fit variogram function
clc
n=17;
h=[0:n-1];
varh=varh(1:n);
plot(h,varh(1:n),'k.')
hold on
fun=@(x,h)0.3+x(1)*(1-exp(-3.*h/x(2))); % nugget fixed at 0.3; x(1)=C1, x(2)=a
x0=[1,1];
x0=lsqcurvefit(fun,x0,h,varh);
fx=fun(x0,h);
plot(h,fx,'r','linewidth',1) % plot against the lag h
title('omni-directional variogram of Ds')
xlabel('lag h (km)')
ylabel('\gamma (h)')
% flip as covariogram function
figure
fx=0.5*exp(-0.5.*h);
plot(h,fx,'r','linewidth',1)
title('Covariogram of Ds')
xlabel('lag h (km)')
ylabel('C(h)')
%% fill covariogram matrix
clc
C=0.5*exp(-0.5.*dist);
n=length(C);
C(n+1,:)=1;
C(:,n+1)=1;
C(n+1,n+1)=0;
%% Estimate point A
distA=distance(Xs,Ys,A);
D=0.5*exp(-0.5.*distA);
D(n+1)=1;
w=C\D;
vA=sum(w(1:n).*Ds);
%% Estimate point B
distB=distance(Xs,Ys,B);
D=0.5*exp(-0.5.*distB);
D(n+1)=1;
w=C\D;
vB=sum(w(1:n).*Ds);
%% Estimate all
s=0.1; %grid spacing
x=0:s:ceil(max(Xs));
y=0:s:ceil(max(Ys));
for i=1:length(x)-1
for j=1:length(y)-1
P=[1/2*(x(i+1)+x(i)),1/2*(y(j+1)+y(j))];
distP=distance(Xs,Ys,P);
D=0.5*exp(-0.5.*distP);
D(n+1)=1;
w=C\D;
cell(i,j)=sum(w(1:n).*Ds);
end
end
cell=cell';
