
Sta 104 - Simulation Homework #2

Due March 19th in class

Moments and Law of Large Numbers


The goal of this simulation assignment is to give you some insight into how the moments of a
distribution affect the rate at which the estimate of the first central moment converges to 0. Our
approach will involve sampling from the normal, exponential, and gamma distributions with different
parameter values, using different numbers of samples to estimate the first central moment, and exploring
the distribution of estimates that results from multiple simulations.

Simulating the Normal Distribution


We can produce samples from the normal distribution in R using the rnorm function, which takes the
following arguments: n - the number of normal samples to generate, mean - the mean of the distribution,
sd - the standard deviation of the distribution. Therefore, if we wanted to generate 10 normal samples
with mean 1 and variance 1 (so sd = 1), we can run the following:
> rnorm(10, 1, 1)
[1] -0.2070657 1.2774292 2.0844412 -1.3456977 1.4291247 1.5060559
[7] 0.4252600 0.4533681 0.4355480 0.1099622

To calculate the estimate of the first central moment, we take the mean of the sample and then subtract
the mean of the distribution:
> mean(rnorm(10, 1, 1)) - 1
[1] -0.1181707
> mean(rnorm(10, 1, 1)) - 1
[1] -0.3879468
Note that we calculated the estimate twice and, since each call draws a new random sample, the two
results differ. We can write a function based on this to automate the process of running multiple
simulations and returning the results:
> sim_normal = function(nsim, nsamp, mean = 0, sd = 1) {
+     sapply(1:nsim, function(x) mean(rnorm(nsamp, mean, sd)) -
+         mean)
+ }
> sim_normal(5, 10, 1, 1)
[1] -0.76619306 -0.60979706 -0.27886474 0.61659223 -0.04230209
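Because every call draws fresh random numbers, the values printed here will not match a new run. If reproducible output is wanted (for example, when checking your work), one option is to fix the random seed with set.seed before simulating. The sketch below repeats the sim_normal definition so that it runs on its own; the seed value 104 and the names first and second are arbitrary choices for illustration.

```r
# Sample mean minus the true mean, repeated nsim times.
sim_normal = function(nsim, nsamp, mean = 0, sd = 1) {
  sapply(1:nsim, function(x) mean(rnorm(nsamp, mean, sd)) - mean)
}

set.seed(104)                      # arbitrary seed, chosen only for illustration
first = sim_normal(5, 10, 1, 1)

set.seed(104)                      # resetting the seed replays the same random draws
second = sim_normal(5, 10, 1, 1)

identical(first, second)           # TRUE: same seed, same simulated estimates
```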
For a larger number of simulations it makes sense to use summary statistics like the mean and variance
to describe the resulting distribution. Before doing this we need to save the results in a variable so
that we can calculate the mean and variance of the same results (since every run of sim_normal returns
different results).

> x = sim_normal(10000, 10, 1, 1)
> mean(x)
[1] 0.003233639
> var(x)
[1] 0.09902793
> hist(x)

[Figure: Histogram of x (y-axis: Frequency)]

Since this distribution is centered around 0, the mean is not very interesting, but the variance tells us
something about the spread of the distribution and hence about how close our results are getting to
zero. To explore this we can look at the variance we obtain as we increase the number of samples in
each of the simulated iterations and observe how that affects the variance of the simulated
distribution.

> var(sim_normal(10000, 10, 1, 1))
[1] 0.09724735
> var(sim_normal(10000, 100, 1, 1))
[1] 0.009958978
> var(sim_normal(10000, 1000, 1, 1))
[1] 0.001008965
> var(sim_normal(10000, 10000, 1, 1))
[1] 9.777369e-05
These results clearly show that the variance of the simulated distribution shrinks as the number of
samples per iteration is increased, which is what we would expect based on the Law of Large Numbers.
What we would now like to explore is how quickly this convergence occurs as the variance and
skewness of a distribution change.
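The variances above fall by roughly a factor of 10 each time nsamp grows by a factor of 10. This agrees with the standard result that the mean of n independent draws, each with variance sigma^2, has variance sigma^2 / n. A minimal sketch of that comparison follows; the names sizes, sd_true, observed, and expected are illustrative, and a smaller nsim than in the text is used only to keep the run time short.

```r
# Sample mean minus the true mean, repeated nsim times.
sim_normal = function(nsim, nsamp, mean = 0, sd = 1) {
  sapply(1:nsim, function(x) mean(rnorm(nsamp, mean, sd)) - mean)
}

set.seed(1)                        # arbitrary seed so the run can be repeated
sizes = c(10, 100, 1000)           # samples per simulated iteration
sd_true = 1                        # standard deviation of the sampled distribution

# Observed variance of the simulated estimates for each sample size.
observed = sapply(sizes, function(n) var(sim_normal(2000, n, 1, sd_true)))

# Theoretical variance of a sample mean: sigma^2 / n.
expected = sd_true^2 / sizes

data.frame(nsamp = sizes, observed = observed, expected = expected,
           ratio = observed / expected)
```

The ratio column should stay close to 1, with some simulation noise that shrinks as nsim grows.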

Simulation Functions
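# Each function returns nsim estimates: the sample mean minus the true mean.
sim_normal = function(nsim, nsamp, mean = 0, sd = 1) {
    sapply(1:nsim, function(x) mean(rnorm(nsamp, mean, sd)) - mean)
}
# Exponential with rate lambda; the true mean is 1/lambda.
sim_exp = function(nsim, nsamp, lambda) {
    sapply(1:nsim, function(x) mean(rexp(nsamp, lambda)) - 1/lambda)
}
# Gamma with shape k and scale theta; the true mean is k*theta.
sim_gamma = function(nsim, nsamp, k, theta) {
    sapply(1:nsim, function(x) mean(rgamma(nsamp, shape=k, scale=theta)) - k*theta)
}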
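As a usage sketch for the exponential and gamma versions: rexp takes a rate, so an exponential with rate lambda has mean 1/lambda and variance 1/lambda^2, while rgamma with shape=k and scale=theta has mean k*theta and variance k*theta^2. The example assumes that a table entry such as Exp(1/5) denotes the rate lambda and that Gamma(2, 2) denotes (k, theta); the assignment does not state either explicitly. The names v_exp and v_gamma, the seed, and the reduced nsim are illustrative choices.

```r
sim_exp = function(nsim, nsamp, lambda) {
  sapply(1:nsim, function(x) mean(rexp(nsamp, lambda)) - 1/lambda)
}
sim_gamma = function(nsim, nsamp, k, theta) {
  sapply(1:nsim, function(x) mean(rgamma(nsamp, shape = k, scale = theta)) - k*theta)
}

set.seed(2)                        # arbitrary seed so the run can be repeated
nsamp = 100

# Observed variance of the simulated estimates.
v_exp = var(sim_exp(2000, nsamp, lambda = 1/5))
v_gamma = var(sim_gamma(2000, nsamp, k = 2, theta = 2))

# Observed value next to the theoretical variance of the sample mean.
c(observed = v_exp, expected = (1 / (1/5))^2 / nsamp)    # expected is 25/100
c(observed = v_gamma, expected = 2 * 2^2 / nsamp)        # expected is 8/100
```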

Assignment
On your own, answer the following questions:
1. Below is a table of results for simulations with different numbers of samples. For the missing cells,
run the appropriate simulation and enter the observed variance of the simulated distribution.
For all results use nsim=10000.
                                       Number of samples
Distribution   Variance   Skewness     10       100      1,000    10,000
N(0, 1^2)
N(0, 5^2)
N(0, 10^2)
N(0, 25^2)
Exp(1)
Exp(1/5)
Exp(1/10)
Exp(1/25)
Gamma(1, 2)
Gamma(2, 2)
Gamma(3, 2)
Gamma(4, 2)
2. Create a line plot for the normal, exponential, and gamma results separately where the x-axis is the
number of samples and the y-axis is the observed variance. Each plot should contain a separate line
for each of the parameter settings (4 lines on the normal plot, 4 lines on the exponential plot, and
4 lines on the gamma plot respectively).
3. Describe any patterns that you observe as they relate to the speed of convergence of the different
distributions based on their variance and skewness.
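One possible way to organize the simulations for the table and the line plot is sketched below for the normal case only. It is a starting point under stated assumptions, not a required approach: the names sizes, sds, and vars are illustrative, nsim and the list of sample sizes are reduced to keep the run short (the assignment asks for nsim=10000 and sample sizes up to 10,000), and the logarithmic axes are an optional choice that makes the lines easier to compare.

```r
# Sample mean minus the true mean, repeated nsim times.
sim_normal = function(nsim, nsamp, mean = 0, sd = 1) {
  sapply(1:nsim, function(x) mean(rnorm(nsamp, mean, sd)) - mean)
}

set.seed(3)                        # arbitrary seed so the run can be repeated
nsim = 1000                        # reduced for speed; the assignment uses 10000
sizes = c(10, 100, 1000)           # add 10000 for the full table
sds = c(1, 5, 10, 25)              # the four normal parameter settings

# Rows correspond to sample sizes, columns to parameter settings.
vars = sapply(sds, function(s) {
  sapply(sizes, function(n) var(sim_normal(nsim, n, 0, s)))
})
dimnames(vars) = list(nsamp = as.character(sizes), sd = as.character(sds))
vars

# One line per parameter setting, drawn on log-log axes.
matplot(sizes, vars, type = "b", log = "xy", lty = 1, pch = 1, col = 1:4,
        xlab = "Number of samples", ylab = "Observed variance")
legend("topright", legend = paste0("N(0, ", sds, "^2)"), col = 1:4, lty = 1)
```

The same pattern should carry over to sim_exp and sim_gamma by swapping the simulation call and the vector of parameter values.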
