
Foundations of Statistical Inference Year 2014-2015

Practice 1

Point Estimation and Interval Estimation

Practice objectives: Use simulation in R to evaluate the properties of estimators of a parameter
obtained by different methods.
Learn the basic use of R to calculate confidence intervals. Empirically test the frequentist
concept of the confidence of an interval estimator.

EXERCISE 1
Assume that the measurement error of a certain device follows a uniform
distribution on the interval [0, θ]. We want to estimate the parameter θ from samples of different
sizes, simulated with R.

1. Construct the estimators of θ using the method of moments and the maximum likelihood method.

2. Simulation exercise:

Generate the value of the parameter, fixing the seed. The true value of the parameter
will be a random value between 0 and 2:
> set.seed(12345)
> theta=runif(1,min=0,max=2)
We run the process once: measure k=10 measurement errors (generated with
R) and estimate θ using both methods. (Hint: remember that in R, uniform values
are generated by the function runif, the maximum is calculated with the max function
and the mean with the mean function.)
We do the simulation 1000 times and record the values of both estimators for each
sample:
#Matrix with 1000 rows and 2 columns for the estimators
> res=matrix(0,nc=2,nr=1000)
> for (i in 1:1000){
+ x=...generate k=10 values of a uniform between 0 and theta...
+ res[i,]=c(...value of MM estimator...,...value of ML estimator...)
+ }
We plot the values of the estimators in a boxplot and mark the true value of θ with
a dotted red line.
> boxplot(res)
> abline(h=theta,col=2,lty=3)
Compute the mean and the variance of the calculated statistics:
> apply(res,2,mean)
> apply(res,2,var)

               Moments method        Maximum likelihood
Sample size    Mean     Variance     Mean     Variance
k = 10
k = 100
k = 1000

3. Repeat this simulation exercise with samples of size k = 100 and k = 1000. Fill in the table above.

4. Based on the results, discuss the advantages and disadvantages of each estimator as a function of the
sample size. According to the criterion of lowest mean squared error, which do you think is the
best estimator?
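
The simulation steps of this exercise can be sketched as follows. This is only one possible way to fill in the loop, not the official solution: it assumes the standard results that the method of moments estimator for a uniform distribution on [0, θ] is twice the sample mean and that the maximum likelihood estimator is the sample maximum. The names simulate_estimators, summarise_estimators and n_rep are hypothetical and do not appear in the handout.

```r
# Sketch only: helper names and n_rep are illustrative assumptions.
set.seed(12345)
theta <- runif(1, min = 0, max = 2)   # true parameter, generated as in the handout

# Draw n_rep samples of size k and record both estimators for each sample.
simulate_estimators <- function(k, theta, n_rep = 1000) {
  res <- matrix(0, nrow = n_rep, ncol = 2,
                dimnames = list(NULL, c("MM", "ML")))
  for (i in seq_len(n_rep)) {
    x <- runif(k, min = 0, max = theta)
    # Assumed estimators: MM = 2 * sample mean, ML = sample maximum
    res[i, ] <- c(2 * mean(x), max(x))
  }
  res
}

# Empirical mean, variance and mean squared error of each estimator.
summarise_estimators <- function(res, theta) {
  rbind(mean     = colMeans(res),
        variance = apply(res, 2, var),
        mse      = colMeans((res - theta)^2))
}

for (k in c(10, 100, 1000)) {
  cat("k =", k, "\n")
  print(summarise_estimators(simulate_estimators(k, theta), theta))
}
```

The mse row is the average squared distance to the true θ, which combines the variance and the bias of each estimator and so gives one way to compare them at each sample size.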

EXERCISE 2
Suppose that the height of the inhabitants of a country can be modeled by a random
variable X ~ N(μ = 175, σ² = 10²).

1. We will generate a random sample of size n = 20 from X (that is, as if we picked 20 people at random
and measured their heights).

x <- rnorm(20,mean=175,sd=10)

2. Carry out a descriptive analysis and calculate the mean and the standard deviation.

> summary(x)
> mean(x)
> sd(x)

3. Calculate the CI95%(μ), assuming σ² known, using the function below:

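# Confidence interval for the mean of a normal sample with known sigma
ic <- function (x, sigma, alpha = 0.05)
{
  n <- length(x)
  m <- mean(x)
  # upper and lower limits: mean +/- normal quantile * standard error
  ul <- m + qnorm(1 - alpha/2) * sigma/sqrt(n)
  ll <- m - qnorm(1 - alpha/2) * sigma/sqrt(n)
  cat("CI (", 100 * (1 - alpha), "%):", ll, ul, "\n")
}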

We can calculate the confidence interval by writing

> ic(x,10)

The code of a function can be saved in a file with the extension .R. Stored functions can be loaded
into the session using

> source("C:/misprogramas/R/ic.R")

where ic.R is the file containing the code of the function ic.

4. Calculate the CI99%(μ). How do you explain that this interval is both wider and more reliable than the
previous one?

5. Calculate the CI95%(μ) assuming σ² unknown.

> n <- length(x)
> alpha <- 0.05
> m <- mean(x)
> s <- sd(x)
> ul <- m + qt(1 - alpha/2, n - 1) * s/sqrt(n)
> ll <- m - qt(1 - alpha/2, n - 1) * s/sqrt(n)

Why is this interval expected to be wider than the previous one? Could it have been narrower?

6. The CI95%(μ) assuming σ² unknown can also be obtained with the R function t.test:

> t.test(x)
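
The intervals of this exercise can be gathered in one place with a small helper that returns the limits instead of printing them. The sketch below is illustrative only: ci_mean is a hypothetical function (not part of the handout) that uses the normal quantile when sigma is supplied and the Student t quantile with n - 1 degrees of freedom otherwise, and the seed is arbitrary.

```r
# Sketch: ci_mean is a hypothetical helper, not part of the handout.
ci_mean <- function(x, sigma = NULL, alpha = 0.05) {
  n <- length(x)
  m <- mean(x)
  if (is.null(sigma)) {
    # sigma unknown: t quantile and sample standard deviation
    half <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  } else {
    # sigma known: normal quantile
    half <- qnorm(1 - alpha / 2) * sigma / sqrt(n)
  }
  c(lower = m - half, upper = m + half)
}

set.seed(1)  # arbitrary seed, used only to make this example reproducible
x <- rnorm(20, mean = 175, sd = 10)

ci_z95 <- ci_mean(x, sigma = 10)                # item 3
ci_z99 <- ci_mean(x, sigma = 10, alpha = 0.01)  # item 4
ci_t95 <- ci_mean(x)                            # item 5

# Item 6: t.test() stores its interval in the conf.int component
ci_tt <- t.test(x)$conf.int

print(rbind(ci_z95, ci_z99, ci_t95))
print(all.equal(unname(ci_t95), as.numeric(ci_tt)))
```

Comparing diff(ci_t95) with diff(ci_z95) over several seeds is one way to explore the question raised in item 5.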

EXERCISE 3
We will now repeat the experiment many times: we take 200 samples of 50 people each and measure their heights.
That is, we take 200 samples of the random variable height X ~ N(μ = 175, σ² = 10²), each of size n = 50.

1. We generate random data in a matrix of 200 rows and n = 50 columns.

> X <- matrix(rnorm(200*50,mean=175,sd=10),ncol=50)

2. Calculate the sample mean of each of the 200 samples of size 50:

> vm <- apply(X,1,mean)
> vm

vm now contains 200 values of the sample mean X̄_n, for n = 50.

3. Calculate the mean and the standard deviation of the 200 sample means. Draw a histogram.

> mvm <- mean(vm)
> svm <- sd(vm)
> hist(vm)

Does the shape of the histogram resemble the density of a
normal (Gaussian) distribution?

4. Calculate the sample variance of each of the 200 samples of size 50:

> vs2 <- apply(X,1,var)
> vs2

vs2 now contains 200 values of the sample variance S², for n = 50. Draw a histogram of vs2.

> hist(vs2)

Does the shape of the histogram resemble the density of a
chi-square distribution?

5. For each of these 200 samples, calculate the lower end and the upper end of the confidence
interval CI95%(μ) with σ known:
Lower end:

> ll <- vm - qnorm(0.975)*10/sqrt(50)

Upper end:

> ul <- vm + qnorm(0.975)*10/sqrt(50)

6. How many of these intervals do not contain the true value of the parameter μ = 175?

> sum(ll > 175 | ul < 175)

You can draw the 200 intervals in a graph, which also shows the proportion of intervals
that do not contain the true value of μ:

> plot(0,type="n",xlim=c(0,200),ylim=c(167,183))
> abline(h=175,col=4)
> segments(1:200,ll,1:200,ul,col=1+(ll>175 | ul<175))

7. Repeat the last two items, constructing the CI95%(μ) with σ unknown.
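
Item 7 can be approached by replacing the known σ with each sample's own standard deviation and the normal quantile with a Student t quantile. The following is a minimal sketch under those assumptions, not the official solution. The names s, tq, ll_t, ul_t and miss_t are illustrative, and the seed is arbitrary.

```r
# Sketch for item 7: variable names and seed are illustrative assumptions.
set.seed(2014)  # arbitrary seed, for reproducibility of this example only
n <- 50
X <- matrix(rnorm(200 * n, mean = 175, sd = 10), ncol = n)

vm <- apply(X, 1, mean)   # sample mean of each row
s  <- apply(X, 1, sd)     # sample standard deviation of each row

# t-based 95% limits, with n - 1 degrees of freedom
tq   <- qt(0.975, df = n - 1)
ll_t <- vm - tq * s / sqrt(n)
ul_t <- vm + tq * s / sqrt(n)

# Intervals that do not contain the true mean 175
miss_t <- ll_t > 175 | ul_t < 175
cat("Intervals missing mu = 175:", sum(miss_t), "of", length(miss_t), "\n")

# Same plot as in item 6; missed intervals are drawn in red (col = 2)
plot(0, type = "n", xlim = c(0, 200), ylim = range(ll_t, ul_t),
     xlab = "Sample", ylab = "Height")
abline(h = 175, col = 4)
segments(1:200, ll_t, 1:200, ul_t, col = 1 + miss_t)
```

Unlike the intervals with σ known, these intervals differ in length from sample to sample, because each one uses its own estimate s.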
