
YI-TING CHANG

006699414
Data Driven Decision Making

[Figure: histogram "Sampling distribution of means"; x-axis: Weights (80 to 420), y-axis: Frequency (0 to 11)]

From this exercise, I learned that we can use Excel functions to randomly select sample groups, find the mean
of each group, and build a sampling distribution of those means. Even though the population distribution of
the 5000 sumo wrestlers and jockeys is bimodal, the distribution of the sample means is approximately normal.
The CLT states that as n gets larger, the sampling distribution of the means approaches a normal distribution;
at the same time, the standard error of the mean becomes smaller.
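
The sampling procedure described above can be sketched in Python instead of Excel. This is only a minimal sketch: the group sizes, means, and standard deviations used to build the population are invented for illustration and are not the values from the exercise, and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

def sample_means(population, n, num_samples, rng):
    """Draw num_samples random samples of size n and return their means."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical two-group population of 5000 weights. The numbers below are
# assumptions made for this sketch, not data from the exercise.
jockeys = [rng.gauss(115, 8) for _ in range(2500)]
sumo = [rng.gauss(330, 30) for _ in range(2500)]
population = jockeys + sumo

means = sample_means(population, n=50, num_samples=200, rng=rng)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
print(f"population mean      = {pop_mean:.1f}")
print(f"mean of sample means = {statistics.mean(means):.1f}")
print(f"sd of sample means   = {statistics.stdev(means):.1f}")
print(f"sigma / sqrt(n)      = {pop_sd / 50 ** 0.5:.1f}")
```

The spread of the sample means should come out close to sigma / sqrt(n) and far smaller than the spread of the population itself.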

To check this, I put the numbers into the standard error formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and compare n = 50 with n = 1000:
If n = 50, $\sigma_{\bar{x}} = 1.62/\sqrt{50} \approx 0.23$
If n = 1000, $\sigma_{\bar{x}} = 1.62/\sqrt{1000} \approx 0.05$
$0.05 < 0.23$

Therefore, when n gets larger, the standard error of the mean becomes smaller and the sampling distribution
of the means becomes closer to normal. This is consistent with the CLT.
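
The same arithmetic can be reproduced with a few lines of Python. This is a small sketch that assumes the 1.62 in the calculation is the population standard deviation; `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 1.62  # value used in the calculation in the text
for n in (50, 1000):
    print(f"n = {n:4d}: standard error = {standard_error(sigma, n):.2f}")
```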

Distribution of Sample Data Confidence Intervals for Population Mean

A confidence interval is a range of values that we are fairly sure contains the true value. From the values the
professor gave me, I have $\bar{x} = 0$, $\alpha = 0.05$ (so z = 1.96), $\sigma = 1$, $n_1 = 1$, and $n_2 = 100$.
To find the confidence interval, I put the numbers into the equation $\bar{x} \pm z\,\sigma/\sqrt{n}$. Test #1 gives
$0 \pm 1.96 \cdot 1/\sqrt{1} = \pm 1.96$, and test #2 gives $0 \pm 1.96 \cdot 1/\sqrt{100} = \pm 0.196$. So the 95%
confidence interval for the true mean is -1.96 to 1.96 in test #1 and -0.196 to 0.196 in test #2.
In my simulator, with the confidence level fixed at 95%, the sample mean was inside the CI 100% of the time in
test #1 and 96% of the time in test #2. In conclusion, we should use a larger sample to get a narrower interval
and a more precise estimate of the true value.
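
The two intervals can be recomputed with a short Python sketch. It assumes a known population standard deviation and uses the rounded critical value z = 1.96 from the text; `z_confidence_interval` is a hypothetical helper name.

```python
import math

def z_confidence_interval(x_bar, sigma, n, z=1.96):
    """Return (lower, upper) for x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

for n in (1, 100):
    lower, upper = z_confidence_interval(x_bar=0, sigma=1, n=n)
    print(f"n = {n:3d}: 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the margin shrinks with the square root of n, going from n = 1 to n = 100 makes the interval ten times narrower.
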
Distribution of Sample Data Distribution of Test Statistics

A hypothesis test evaluates two mutually exclusive statements about a population to determine which one is
better supported by the sample data. In this demo, the null hypothesis is $\mu = 0$ and the alternative hypothesis is $\mu \neq 0$.
When $\alpha = 0.05$, z-critical = 1.96 and z-calc = $\frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{-0.2 - 0}{2.064/\sqrt{25}} \approx -0.48$.
If |z-calc| > |z-critical|, reject the null hypothesis; if |z-calc| ≤ |z-critical|, fail to reject it.
Here |-0.48| < |1.96|, so we fail to reject the null hypothesis that $\mu = 0$. A larger sample gives a more accurate result.
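
The decision rule can be written as a short Python sketch using the numbers from this demo. The function names `z_statistic` and `reject_null` are hypothetical, and the rounded critical value 1.96 is taken from the text.

```python
import math

def z_statistic(x_bar, mu0, s, n):
    """Test statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def reject_null(z_calc, z_critical=1.96):
    """Two-sided rule: reject the null hypothesis when |z_calc| > z_critical."""
    return abs(z_calc) > z_critical

z_calc = z_statistic(x_bar=-0.2, mu0=0, s=2.064, n=25)
print(f"z-calc = {z_calc:.2f}")
print("reject H0" if reject_null(z_calc) else "fail to reject H0")
```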

a. Explain the inter-relationships between “business information intelligence”, “business statistical intelligence”, and “business modeling intelligence”.
Together, we use statistical concepts, supply chain management methods, and the information we have collected
and analyzed to build business models that help us make better business decisions.

b. Explain the Plan, Perform, Analyze, Reflect (PPAR) Cycle.
First, we plan to apply a tool that answers a business question. Second, we perform the statistical analysis.
Third, we analyze and evaluate the results. Fourth, we reflect on the results to gain insights. Last, we go back
to the first step to plan the next step. Each time we repeat this process, we gain a little more insight. We
repeat this cycle until we have sufficient results to tell a story.
c. Why is it important to take a random sample?
So that every member of the population has an equal chance of being selected, which lets the sample statistics better estimate their corresponding population parameters.

d. What is the meaning of the central limit theorem?
The central limit theorem states that the distribution of sample means approaches a normal distribution as the
sample size gets larger.
