
Statistical inference

Statistical inference comprises methods for drawing conclusions about a population from sample data. Two key methods:
- Point estimate: calculate a single value (such as a mean or a proportion) from the sample.
- Confidence interval: calculate a range of values that is likely to contain the true value of the parameter.

Standard error of an estimate


The standard deviation of an estimate is called the standard error (SE). It describes the typical error or uncertainty associated with the estimate.

Computing the SE for the sample mean:

SE = σ / √n

where n is the number of independent observations and σ is the population standard deviation. Since the population standard deviation is typically unknown, if the sample size is n ≥ 30 we can use the sample standard deviation s instead.
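As a rough sketch of this computation (standard library only; the sample values below are made up for illustration):

```python
import math
import statistics

def standard_error(sample):
    # SE of the sample mean: s / sqrt(n), with s the sample standard deviation
    s = statistics.stdev(sample)
    return s / math.sqrt(len(sample))

sample = [4.0, 6.0, 8.0, 10.0]
se = standard_error(sample)
```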

Central Limit Theorem


If a sample consists of at least 30 independent observations and the data are not strongly skewed, then the distribution of the sample mean is well approximated by a normal model.

How to verify that sample observations are independent:
- a random sample consists of less than 10% of the population (rule of thumb)
- subjects in an experiment are randomly assigned.

Confidence interval
point estimate ± z* × SE

z* × SE is the margin of error. Interpreting a confidence interval: "We are XX% confident that the population parameter is between ... and ..."
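A minimal sketch of building such an interval (the data and the 95% critical value z* = 1.96 are illustrative):

```python
import math
import statistics

def confidence_interval(sample, z_star=1.96):
    # point estimate +/- z* x SE; z* = 1.96 gives a 95% interval
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = z_star * se          # margin of error
    return mean - margin, mean + margin

lo, hi = confidence_interval([4.0, 6.0, 8.0, 10.0] * 10)
```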

Nontechnical Introduction to Statistical Inference Prepared by: Gabriela Hromis

Hypothesis testing

                    H0 true                         HA true
Do not reject H0    Correct decision (prob. 1 - α)  Type 2 error (prob. β)
Reject H0           Type 1 error (prob. α)          Correct decision (prob. 1 - β, the power)

α ... significance level (probability, under H0, that the test concludes HA)
1 - β ... power of the test (probability, under HA, that the test concludes HA)

The p-value quantifies how strongly the data favor HA over H0; that is, it is the probability, assuming H0 is true, of observing a difference between the two treatments at least as large as the one in the data purely by chance. A small p-value (usually < 0.05) corresponds to sufficient evidence to reject H0 in favor of HA.

Hypotheses must be set up before observing the data. If they are not, the test must be two-sided.

Test statistic: Z, when the point estimate is nearly normal.
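The two-sided p-value of a Z statistic can be sketched with the standard normal CDF, which Python's standard library supports via `math.erf`:

```python
import math

def normal_cdf(z):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p(z):
    # probability, under H0, of a statistic at least this extreme in either direction
    return 2.0 * (1.0 - normal_cdf(abs(z)))

p = two_sided_p(1.96)   # close to 0.05, matching the usual cutoff
```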

Inference for numerical data


Paired Data
Two sets of observations are paired if each observation in one set has a special correspondence or connection with exactly one observation in the other data set.
s_diff² = s_x² + s_y² - 2 r s_x s_y

(r is the correlation between the two sets of observations)

SE = s_diff / √n

Z = (x̄_diff - 0) / SE
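A sketch of a paired Z statistic (the paired measurements are hypothetical; in practice the same result comes from taking the standard deviation of the differences directly, which is what the s_diff formula computes):

```python
import math
import statistics

# hypothetical paired measurements (e.g. before/after on the same subjects)
x = [10.0, 12.0, 9.0, 11.0, 14.0]
y = [8.0, 11.0, 9.0, 10.0, 12.0]

diffs = [a - b for a, b in zip(x, y)]
n = len(diffs)

s_diff = statistics.stdev(diffs)       # standard deviation of the differences
se = s_diff / math.sqrt(n)
z = (statistics.mean(diffs) - 0) / se  # H0: mean difference is 0
```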

Difference of two means


If the sample means x̄1 and x̄2 each meet the criteria for having nearly normal sampling distributions, and the observations in the two samples are independent, then the difference in sample means, x̄1 - x̄2, will have a sampling distribution that is nearly normal.


SE = √(σ1²/n1 + σ2²/n2) ≈ √(s1²/n1 + s2²/n2)

Since we usually don't know the population standard deviations σ1 and σ2, if each sample has at least 30 observations we can use the sample standard deviations s1 and s2 with a Z-test. For small samples, we use a t-test.
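A sketch of the two-sample Z statistic (the tiny samples here only illustrate the arithmetic; as noted above, a Z-test needs at least 30 observations per group):

```python
import math
import statistics

def two_sample_z(x1, x2):
    # Z = (x1_bar - x2_bar) / sqrt(s1^2/n1 + s2^2/n2), under H0: mu1 = mu2
    s1, s2 = statistics.stdev(x1), statistics.stdev(x2)
    se = math.sqrt(s1 ** 2 / len(x1) + s2 ** 2 / len(x2))
    return (statistics.mean(x1) - statistics.mean(x2)) / se

z = two_sample_z([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```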

One-sample means with the t distribution


We use it when the population is nearly normally distributed, the sample is small, and the population standard deviation is unknown. The t distribution lets us accurately estimate the standard error from a small sample.

df = n - 1

Confidence interval:

x̄ ± t_df × s/√n
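A sketch of a t interval. The critical value t* must be looked up (in a t table or with a statistics library) for the chosen confidence level and df = n - 1; the value 2.776 below is the 95% critical value for df = 4:

```python
import math
import statistics

def t_interval(sample, t_star):
    # x_bar +/- t_df x s / sqrt(n); t_star comes from a t table for df = n - 1
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - t_star * se, mean + t_star * se

# 95% interval for n = 5 (df = 4), using t* = 2.776 from a t table
lo, hi = t_interval([2.0, 4.0, 6.0, 8.0, 10.0], t_star=2.776)
```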

t distribution for the difference of means


For small samples, when the data are independent and nearly normally distributed.

We use x̄1 - x̄2 as a point estimate for μ1 - μ2:

T = ((x̄1 - x̄2) - (μ1 - μ2)) / √(s1²/n1 + s2²/n2)

When the standard deviations of the two groups are nearly equal, we can use a pooled standard deviation (by pooling the data we improve the estimate of the variance):

s²_pooled = (s1²(n1 - 1) + s2²(n2 - 1)) / (n1 + n2 - 2)

df = n1 + n2 - 2
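A sketch of the pooled two-sample t statistic (illustrative data; the resulting t would then be compared against a t distribution with df = n1 + n2 - 2):

```python
import math
import statistics

def pooled_t(x1, x2):
    # pooled variance: (s1^2 (n1-1) + s2^2 (n2-1)) / (n1 + n2 - 2)
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)
    s2_sq = statistics.variance(x2)
    sp_sq = (s1_sq * (n1 - 1) + s2_sq * (n2 - 1)) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq / n1 + sp_sq / n2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / se
    return t, n1 + n2 - 2   # statistic and degrees of freedom

t, df = pooled_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```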


Comparing many means with ANOVA


H0: μ1 = μ2 = ... = μk
HA: at least one mean is different

We simultaneously consider many groups and evaluate whether their sample means differ more than we would expect from natural variation.

Test statistic for ANOVA:

F = MSG / MSE

MSG ... mean square between groups (measures variability between the group means), dfG = k - 1
MSE ... mean square error (measures variability within the groups), dfE = n - k

If H0 is true, the variation in the sample means (MSG) should be relatively small compared to the within-group variation (MSE).

Conditions for an ANOVA analysis:
- independence of the data
- approximately normal distributions in each group
- approximately constant variance across the groups
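The F statistic can be computed directly from its definition, as in this sketch (the three small groups are made up for illustration):

```python
import statistics

def anova_f(groups):
    # F = MSG / MSE
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # between-group sum of squares
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares
    sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    msg = ssg / (k - 1)   # dfG = k - 1
    mse = sse / (n - k)   # dfE = n - k
    return msg / mse

f = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
```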

Inference for categorical data


Inference for a single proportion
Conditions for normality:
- the sample observations are independent
- we expect to see at least 10 successes and 10 failures in our sample, i.e. np ≥ 10 and n(1 - p) ≥ 10. This is called the success-failure condition.

Standard error:

SE_p̂ = √(p(1 - p)/n)

Confidence interval:

p̂ ± z* × SE
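A sketch of the interval for a single proportion (z* = 1.96 for a 95% interval; p̂ = 0.5 with n = 100 is an illustrative input):

```python
import math

def proportion_interval(p_hat, n, z_star=1.96):
    # SE = sqrt(p(1 - p)/n), interval p_hat +/- z* x SE
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

lo, hi = proportion_interval(0.5, 100)   # SE = 0.05 here
```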


Test statistic:

Z = (point estimate - null value) / SE

Choosing a sample size when estimating a proportion:


We want a sample size n that ensures the margin of error is below some threshold m:

z* × √(p(1 - p)/n) ≤ m

Solving for n:

n ≥ (z*/m)² × p(1 - p)

If we have a good estimate of p, we use it. Otherwise, the standard error is largest when p = 0.5, so to cover the worst-case scenario when we are not sure about the true p, we choose p = 0.5:

z* × √(0.25/n) ≤ m

Solving for n:

n ≥ (z*/m)² × 0.25
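This calculation is a one-liner; the example below asks for a 95% interval (z* = 1.96) with a margin of error of at most 3 percentage points:

```python
import math

def required_sample_size(m, z_star=1.96, p=0.5):
    # smallest n with z* x sqrt(p(1 - p)/n) <= m, i.e. n >= (z*/m)^2 x p(1 - p)
    return math.ceil((z_star / m) ** 2 * p * (1 - p))

n = required_sample_size(0.03)   # worst-case p = 0.5
```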

Difference of two proportions


Conditions for normality: each sample proportion is nearly normally distributed, and the two samples are independent of each other.

SE(p̂1 - p̂2) = √(SE(p̂1)² + SE(p̂2)²) = √(p1(1 - p1)/n1 + p2(1 - p2)/n2)

Confidence interval:

(p̂1 - p̂2) ± z* × SE

Hypothesis testing:

H0: p1 - p2 = 0
HA: p1 - p2 ≠ 0

When the null hypothesis is p1 = p2, we can use a pooled estimate of the proportion (since we are assuming equal proportions):

p̂ = (p̂1 n1 + p̂2 n2) / (n1 + n2)

SE = √(p̂(1 - p̂)/n1 + p̂(1 - p̂)/n2)
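A sketch of the pooled two-proportion Z statistic (the success counts 40/100 and 30/100 are made up for illustration):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    # pooled proportion under H0: p1 = p2;
    # (x1 + x2)/(n1 + n2) equals (p1_hat*n1 + p2_hat*n2)/(n1 + n2)
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    return (x1 / n1 - x2 / n2) / se

z = two_proportion_z(40, 100, 30, 100)
```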

Chi-square Test
Use a chi-square test to:
- determine, given a sample of cases that can be classified into several groups, whether the sample is representative of the general population
- evaluate whether data resemble a particular distribution, such as a normal distribution or a geometric distribution.

(a) One-way table

A one-way table describes counts for each outcome of a single variable. We can put our data in a table like this:

Category:  C1  C2  C3  C4  Total
Observed:
Expected:

We want to establish whether the observed counts differ from the expected counts by more than chance alone would produce, i.e. whether or not the sample is representative of the population. The chi-square statistic is:

χ² = Σ_j (observed count_j - expected count_j)² / expected count_j

degrees of freedom: k - 1 (where k is the number of categories)
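The statistic itself is a short sum, as in this sketch (the observed and expected counts are hypothetical; the result would be compared against a chi-square distribution with k - 1 degrees of freedom):

```python
def chi_square_stat(observed, expected):
    # sum over categories of (observed - expected)^2 / expected
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# hypothetical counts over k = 3 categories
stat = chi_square_stat([50, 30, 20], [40, 40, 20])
df = 3 - 1
```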


(b) Two-way table

A two-way table describes counts for combinations of outcomes of two variables:

         S1   S2   S3   Total
C1
C2
Total

The expected count for each cell is:

Expected count = (Row Total × Column Total) / Table Total

and the test statistic is again:

χ² = Σ_j (observed count_j - expected count_j)² / expected count_j

degrees of freedom: (number of rows - 1) × (number of columns - 1)

Conditions for the chi-square test:
- independent observations
- each cell has at least 5 expected cases
- degrees of freedom of at least 2
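Computing the expected counts for a two-way table can be sketched as follows (the 2x2 table of counts is made up for illustration):

```python
def expected_counts(table):
    # expected count = row total x column total / table total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

expected = expected_counts([[10, 20], [30, 40]])
```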

