LAB 3 Joey Martinez

05/02/16
Math 311

The following data sets were generated using the random data generator in Minitab. Part 1
of this lab demonstrates the behavior of these randomly generated ‘nested’ raw data sets and their
tendency toward normality as the sample size increases. We begin by examining samples of size 10,
50, 100, and 1000. We then compare and contrast these data sets based on the type of
distribution generated. The first distribution we analyze is a standard normal distribution, with mean
µ=0 and standard deviation σ=1. The second is another normal distribution, this time with
mean µ=2 and standard deviation σ=10. Finally, we examine the behavior of
a binomial distribution with 10 trials and probability of success p=.5. In Part 2, we determine the
probabilities of hypothetical events from three types of probability distributions: the normal,
binomial, and uniform distributions.

Part 1)

A) Using a standard normal distribution (n_1), with mean µ=0 and standard deviation σ=1, we
generate random samples of size 10, 50, 100, and 1000. Each sample is denoted by
n_1= k, where k is the sample size. Each larger sample contains all the values of the
previous, smaller sample; that is, each sample is nested in the larger ones. Using the
random data generator, we obtain the following histograms and descriptive statistics:

[Figure: Histograms of n_1 =10, n_1 =50, n_1 =100, and n_1 =1000, each with a fitted normal curve. Horizontal axis: sample value; vertical axis: Frequency. Fitted curves: Mean -0.6222, StDev 1.178 (N 10); Mean -0.004256, StDev 1.112 (N 50); Mean -0.003299, StDev 1.101 (N 100); Mean -0.006312, StDev 1.051 (N 1000).]

Descriptive Statistics: n_1 =10, n_1 =50, n_1 =100, n_1 =1000

Variable     N     N*  Mean     SE Mean  StDev   Minimum  Q1       Median   Q3      Maximum
n_1 =10      10    0   -0.622   0.373    1.178   -1.866   -1.470   -1.069   0.050   1.589
n_1 =50      50    0   -0.004   0.157    1.112   -2.860   -0.839   -0.120   0.953   1.976
n_1 =100     100   0   -0.003   0.110    1.101   -2.860   -0.809   -0.036   0.851   2.680
n_1 =1000    1000  0   -0.0063  0.0332   1.0512  -3.7080  -0.7009  0.0071   0.6865  3.1119

The most obvious result from our random variable generation is the change in our sample mean
values as the size of each sample increases. The samples have the following means: x̄10=-.622, x̄50=-.004,
x̄100=-.003, and x̄1000=-.0063, where x̄n represents the sample mean of the sample of size n. Now consider the
absolute differences between the sample means and the population mean: |µ-x̄10|=.622, |µ-x̄50|=.004,
|µ-x̄100|=.003, and |µ-x̄1000|=.0063. The smallest sample gives by far the most disparate value; once the
sample size reaches 50, the sample mean stays within about .006 of the population mean. The decrease is
not strictly monotone (the difference for n=1000 is slightly larger than for n=100), but both values lie
well within one standard error of µ=0. As we generate larger samples, the distributions (as seen in the
histograms above) become more normal and the sample means settle near the true population value µ=0.
We also note that the sample standard deviation moves toward σ=1 (1.178, 1.112, 1.101, 1.051) and the
median of the samples tends toward 0 as the sample size increases.
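
The nesting procedure described above can be sketched outside Minitab. The following is a minimal Python sketch, not the Minitab procedure itself: the function name `nested_samples` and the seed are assumptions made for illustration, and the generated values will differ from the ones reported in this lab.

```python
import random
import statistics


def nested_samples(mu, sigma, sizes, seed=0):
    """Draw max(sizes) normal values once, then take prefixes,
    so every smaller sample is contained in each larger one."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    draws = [rng.gauss(mu, sigma) for _ in range(max(sizes))]
    return {n: draws[:n] for n in sizes}


samples = nested_samples(mu=0.0, sigma=1.0, sizes=[10, 50, 100, 1000])

# Print sample size, sample mean, and sample standard deviation.
for n, xs in samples.items():
    print(n, round(statistics.mean(xs), 4), round(statistics.stdev(xs), 4))
```

Taking prefixes of a single list of draws is one simple way to guarantee that the sample of size 10 sits inside the sample of size 50, and so on.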

Next, we perform the same random data generation and ‘nesting’ procedure, except that we
now sample from a normal distribution with mean µ=2 and
standard deviation σ=10. Our new samples are denoted by n_2= k’, where k’ is the size of the
sample. The results are as follows:

[Figure: Histograms of n_2 =10, n_2 =50, n_2 =100, and n_2= 1000, each with a fitted normal curve. Horizontal axis: sample value; vertical axis: Frequency. Fitted curves: Mean 3.864, StDev 6.699 (N 10); Mean 0.6027, StDev 8.773 (N 50); Mean 1.181, StDev 8.048 (N 100); Mean 1.610, StDev 9.960 (N 1000).]

Descriptive Statistics: n_2 =10, n_2 =50, n_2 =100, n_2= 1000

Variable     N     N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3     Maximum
n_2 =10      10    0   3.86    2.12     6.70    -3.12    -2.21   2.75    9.04   16.19
n_2 =50      50    0   0.60    1.24     8.77    -17.68   -5.66   -0.54   7.33   20.12
n_2 =100     100   0   1.181   0.805    8.048   -17.987  -3.663  0.645   6.756  21.696
n_2= 1000    1000  0   1.610   0.315    9.960   -31.761  -5.007  1.623   8.399  30.855

As with the previous set of data, we make note of the sample means. The generated samples of
increasing size have the following sample means: x̄10=3.86, x̄50=.60, x̄100=1.181, and x̄1000=1.610, where
x̄n denotes the mean of the sample of size n. We also consider the absolute differences between the
population mean and the sample means: |µ-x̄10|=1.86, |µ-x̄50|=1.40, |µ-x̄100|=.819, and |µ-x̄1000|=.390.
As the samples increase in size, the absolute differences between the sample means and the true mean
approach 0. Hence, as before, the increasing sample sizes correspond with our sample
distributions tending toward the normal distribution with the assigned parameters µ=2 and σ=10.
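
One way to put these absolute differences in context is to compare them with the theoretical standard error of the mean, σ/√n. The short Python sketch below does this for the n_2 samples; the sample means are copied from the descriptive statistics table, and the helper name `standard_error` is an assumption made for illustration.

```python
import math

MU = 2.0      # population mean of the n_2 distribution
SIGMA = 10.0  # population standard deviation of the n_2 distribution

# Sample means reported in the descriptive statistics for n_2.
sample_means = {10: 3.86, 50: 0.60, 100: 1.181, 1000: 1.610}


def standard_error(sigma, n):
    """Theoretical standard deviation of the sample mean for sample size n."""
    return sigma / math.sqrt(n)


# Print sample size, |mu - xbar|, and sigma / sqrt(n) side by side.
for n, xbar in sample_means.items():
    print(n, round(abs(MU - xbar), 3), round(standard_error(SIGMA, n), 3))
```

The standard error shrinks like 1/√n, which is why a tenfold increase in sample size only narrows the typical gap between x̄ and µ by a factor of about 3.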

The law of large numbers states that, as n → ∞, the sample mean x̄ approaches the population
mean. The data above demonstrate compliance with this law. Given an initial population with
parameters µ and σ, as we increased our sample sizes, our sample values and sample distributions
tended toward the population values and distribution. Consider the following comparisons between
the population distributions and our sample distributions of sizes 10 and 1000, where the sample size N
increases from left to right:

[Figure: Two rows of three panels, with N increasing from left to right toward ∞. Top row: Histogram of n_1 =10, Histogram of n_1 =1000, Distribution Plot (n_1): Normal, Mean=0, StDev=1. Bottom row: Histogram of n_2 =10, Histogram of n_2= 1000, Distribution Plot (n_2): Normal, Mean=2, StDev=10. Vertical axes: Frequency (histograms), Density (distribution plots).]

B) Now we randomly generate samples of size 10, 50, 100, and 1000 from a binomial
distribution with k=10 trials and probability of success p=.5. The values of each sample are
nested in the samples of larger size, as with the other distributions. We denote the
samples by n_bi=m, where m is the size of the sample. We obtain the following
histograms and descriptive statistics:

[Figure: Histograms of n_bi=10, n_bi=50, n_bi=100, and n_bi=1000. Horizontal axis: sample value. Vertical axis: Density for n_bi=10, 50, and 100; Frequency for n_bi=1000, which also shows a fitted normal curve with Mean 4.925, StDev 1.573, N 1000.]

Descriptive Statistics: n_bi=10, n_bi=50, n_bi=100, n_bi=1000

Variable     N     N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
n_bi=10      10    0   4.600   0.452    1.430   2.000    3.750   5.000   5.250   7.000
n_bi=50      50    0   4.760   0.215    1.519   2.000    4.000   5.000   6.000   8.000
n_bi=100     100   0   4.510   0.164    1.636   1.000    4.000   4.000   5.750   8.000
n_bi=1000    1000  0   4.9250  0.0498   1.5735  1.0000   4.0000  5.0000  6.0000  9.0000

By the above, the differences between the distributions of the samples are most apparent
between n_bi=10 and n_bi=1000. The distribution of the sample of size 10 has no obvious
characteristics, aside from being roughly symmetric; however, as the sample becomes larger, we notice
that the distributions of the samples of size 50 and 100 approach a normal distribution. In fact, when we
reach the final sample of size 1000, the shape of the distribution becomes approximately
normal, with mean x̄1000=4.9250 ≈ µ=kp=5 and standard deviation s1000=1.5735 ≈ σ=sqrt(kp(1-p))=sqrt(2.5)≈1.581,
where k=10 is the number of trials and p=.5 is the probability of success, as defined above.
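
The formulas µ=kp and σ=sqrt(kp(1-p)) can be evaluated directly for k=10 and p=.5. The following is a small Python sketch; the function name `binomial_mean_sd` is an assumption made for illustration.

```python
import math


def binomial_mean_sd(k, p):
    """Return (mean, standard deviation) of a binomial with k trials and success chance p."""
    mean = k * p
    sd = math.sqrt(k * p * (1 - p))
    return mean, sd


mean, sd = binomial_mean_sd(10, 0.5)
print(mean, round(sd, 4))
```

These theoretical values can be set beside the sample statistics for n_bi=1000 (mean 4.9250, standard deviation 1.5735).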

Part 2)

We now consider scenarios involving the normal, binomial and uniform distributions. We
subdivide this section of the lab into parts a, b and c for the normal, binomial, and uniform distributions,
respectively.

a) Suppose we are in the land of Springfield. Our friend Homer Simpson is at the local tavern
Moe’s enjoying a beer. Homer measures the amount of beer in his 16oz mug and finds it to be
filled with 14oz. Let the amount of beer in a mug at Moe’s be normally distributed, with mean
µ=15oz and standard deviation σ=.75oz. We determine whether Homer is being undercut by
Moe or experiencing a coincidence by finding the probability of being served a beer of 14 oz or
less. Consider the following graph demonstrating the cumulative probability of receiving 14oz or
less:
[Figure: Distribution Plot of Beer Amounts at Moe's. Normal, Mean=15, StDev=0.75. Horizontal axis: X=Ounces of Beer; vertical axis: Density. Shaded area to the left of 14: 0.09121.]

We find that P(x ≤ 14oz)=.09121, and hence we are forced to tell Homer that only about 9% of the beers that
Moe serves are filled with 14oz or less. This suggests that Moe may well be singling Homer out and purposefully under
filling his mug; perhaps Moe has finally found out that Homer’s son Bart has been the one crank calling his tavern for all
these years…doh!
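
The same cumulative probability can be checked without Minitab. The following is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

# Amount of beer per mug at Moe's, as assumed in the problem statement.
beer = NormalDist(mu=15.0, sigma=0.75)

# P(x <= 14): probability of being served 14oz or less.
p_under_14 = beer.cdf(14.0)
print(round(p_under_14, 5))
```

Here `cdf(14.0)` is the area under the normal curve to the left of 14oz, the same quantity that is shaded in the distribution plot.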

Unfortunately, this is not the news that Homer had hoped for, and he demands that no more than 5% of mugs be under
filled. Since the mean is 15oz, we consider an under filled mug to be any amount under this value. This gives a
cut-off of 14.91oz: about 5% of pours fall between 14.91oz and the mean of 15oz, which corresponds to the cumulative
probability P(x ≤ 14.91oz)=.45 shown in the graph below.

[Figure: Distribution Plot of Beer Amounts at Moe's. Normal, Mean=15, StDev=0.75. Horizontal axis: X=Ounces of Beer; vertical axis: Density. Shaded area to the left of 14.91: 0.45.]
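
The 14.91oz cut-off can be reproduced with an inverse cumulative probability. The Python sketch below uses `NormalDist.inv_cdf` from the standard library; the variable names are assumptions made for illustration. For comparison only, it also computes the 5th percentile of the distribution, which is the cut-off one would get under a different reading of Homer's demand (the amount below which only 5% of all pours fall).

```python
from statistics import NormalDist

beer = NormalDist(mu=15.0, sigma=0.75)

# Cut-off with cumulative probability .45, as in the plot above.
cutoff_45 = beer.inv_cdf(0.45)

# Share of pours between that cut-off and the mean of 15oz.
between = beer.cdf(15.0) - beer.cdf(cutoff_45)

# Alternative reading (an assumption, not the lab's method): the 5th percentile.
fifth_percentile = beer.inv_cdf(0.05)

print(round(cutoff_45, 2), round(between, 2), round(fifth_percentile, 2))
```

The two readings give noticeably different cut-offs, so it matters which definition of "under filled" is intended.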

b) Suppose we are examining the weather in Ellensburg, WA. We find that during the month of
April the chance of experiencing high winds in Ellensburg on a given day is 36%, on average. To answer the
questions below, we choose 10 random days in the month of April. Assuming each day’s wind is
independent of the next, the chance of wind is fixed, each day is measured in the same way,
and each day either has high wind or does not, we can apply binomial probability to our
situation. We let the binomial distribution have k=10 trials and chance of success p=.36.
1. We begin by finding the probability that exactly 3 days have high wind. This is
determined by the following graph:

[Figure: Distribution Plot (x=3). Binomial, n=10, p=0.36. Horizontal axis: X; vertical axis: Probability. Shaded probability at X=3: 0.2462.]

Thus we find that the probability is given by P(x=3)=.2462 or approximately 24.62%.
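
The probability of exactly 3 windy days follows from the binomial formula P(x)=C(n,x)·p^x·(1-p)^(n-x). The following is a minimal Python sketch of that formula; the function name `binom_pmf` is an assumption made for illustration.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success chance p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Exactly 3 high-wind days out of 10, with a 36% chance each day.
p_exactly_3 = binom_pmf(3, 10, 0.36)
print(round(p_exactly_3, 4))
```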

2. Now we are interested in the chance that at least 3 days will have high winds:

[Figure: Distribution Plot. Binomial, n=10, p=0.36. Horizontal axis: X; vertical axis: Probability. Shaded probability for X ≥ 3: 0.7595.]

We find the probability to be P(3≤x)=.7595 or approximately 75.95%.



3. Finally, we wish to find the chance of experiencing high winds for no more than 3
days:
[Figure: Distribution Plot. Binomial, n=10, p=0.36. Horizontal axis: X; vertical axis: Probability. Shaded probability for X ≤ 3: 0.4868.]

We find that the chance of experiencing high wind for no more than 3 days is
given by P(x≤3)=.4868 or 48.68%, as desired.
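
Both cumulative questions (at least 3 days and no more than 3 days) come from summing binomial probabilities. The Python sketch below is one way to do this; the function names `binom_pmf` and `binom_cdf` are assumptions made for illustration.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Probability of at most x successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))


n, p = 10, 0.36
p_at_most_3 = binom_cdf(3, n, p)       # P(x <= 3)
p_at_least_3 = 1 - binom_cdf(2, n, p)  # P(x >= 3) = 1 - P(x <= 2)
print(round(p_at_most_3, 4), round(p_at_least_3, 4))
```

The two probabilities sum to more than 1 because the outcome x=3 is counted in both events.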

c) Consider a package delivery service that divides its packages into weight classes. Let the
packages in the 14lbs to 20lbs weight class be uniformly distributed. Suppose we wish to find
the following probabilities.
1. If customers are charged an extra fee for packages between 18-20lbs, what is the probability
a customer will have to pay this fee? In other words, we must find the chances of a package
falling between these values. We consider the following distribution:

[Figure: Distribution Plot. Uniform, Lower=14, Upper=20. Horizontal axis: X; vertical axis: Density. Shaded area between 18 and 20: 0.3333.]

Thus, from the shaded area under the density above, we find that the chances are
P(18≤x≤20)=.3333, or about 33.33%.

2. Now we wish to find the probability of a randomly selected package weighing 15lbs or less:

[Figure: Distribution Plot. Uniform, Lower=14, Upper=20. Horizontal axis: X; vertical axis: Density. Shaded area between 14 and 15: 0.1667.]

We find that the probability of a package weighing 15lbs or less is P(x≤15)=.1667;
that is, the chance of a package being 15lbs or less is about 16.67%.

3. Finally, we are interested in determining the probability of selecting a random package
with a weight of exactly 16.5lbs. Unfortunately, since a single point gives a
rectangle of zero area under the curve above, the chance of receiving any one exact weight is
always 0% in a uniform distribution, as in any continuous distribution.
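
All three uniform probabilities are areas of rectangles with height 1/(20-14). The following is a minimal Python sketch of that calculation; the function name `uniform_prob` and its default bounds are assumptions made for illustration.

```python
def uniform_prob(a, b, lower=14.0, upper=20.0):
    """P(a <= X <= b) for X uniformly distributed on [lower, upper]."""
    a = max(a, lower)  # clip the interval to the support
    b = min(b, upper)
    if b <= a:
        return 0.0     # empty interval or a single point has zero area
    return (b - a) / (upper - lower)


p_fee = uniform_prob(18, 20)         # package between 18 and 20lbs
p_light = uniform_prob(14, 15)       # package of 15lbs or less
p_exact = uniform_prob(16.5, 16.5)   # package of exactly 16.5lbs
print(round(p_fee, 4), round(p_light, 4), p_exact)
```

The single-point case returns 0 because the interval has zero width, which mirrors the zero-area argument given above.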
