Stats Notes

Contents
Univariate Data
Central tendency
Mean
Median
Mode
Trimean
Trimmed Mean
Spread
Range
Semi-Interquartile Range
Variance and Standard Deviation
Skew
Graphical Representations
Frequency Polygons & Cumulative Frequency Polygons
Histograms & Bar Graphs
Stem and Leaf plots
Box Plots
Describing Bivariate Data
Scatterplots
Pearson's Correlation
Spearman's Rho
Probability
Binomial Distribution
Assumptions
Subjective Probability
Normal Distribution
Standard Normal Distribution
Sampling Distribution
Standard Error of Statistic
Central Limit Theorem
Area under the Sampling Distribution of the Mean
Sampling Distribution, Difference between Independent means
Sampling Distribution of a Linear Combination of Means
Sampling Distribution of Pearson's r
Sampling Distribution of Difference between Independent Pearson's r's
Sampling Distribution of Median
Sampling Distribution of the Standard Deviation
Sampling Distribution of a Proportion
Correction for Continuity
Sampling Distribution of Difference between Two Proportions
Point Estimation
Characteristics of Estimators
Estimating Variance
Confidence Intervals
CI for Mean, SD Known
CI for mean, SD Estimated
General Formula
CI for Difference between means, Independent Groups, SD known
CI for Difference between means, Independent Groups, SD Estimated
CI for Linear combination of Means, Independent Groups
CI on Pearson's Correlation
CI between Independent Correlations
Hypothesis Testing
Null Hypothesis
Steps in Hypothesis testing
Univariate Data
Central tendency
Mean
Good for roughly symmetric distributions

Misleading in skewed distributions
Most efficient for normal distributions
Equal to expected value
Geometric mean (GM) is more resistant to skew than the arithmetic mean (AM)
Median
Midpoint of the distribution

Better than others for highly skewed data
Prone to sampling fluctuations
Standard error of the median is about 25% larger than that of the mean (for normal distributions).
Mode
Most frequently occurring score in a distribution

Only measure that can be used with nominal data
Highly susceptible to sampling fluctuations
Trimean
The trimean is almost as resistant to extreme scores as the median and is less subject to
sampling fluctuations than the arithmetic mean in extremely skewed distributions.
It is less efficient than the mean for normal distributions.
The trimean is a good measure of central tendency and is probably not used as much as it
should be.
Trimmed Mean
Discard a certain % of low and high scores and compute mean

The median is the limiting case: the mean trimmed 100% (50% removed from each end)
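The measures above can be compared on one small data set. This is a minimal sketch using only the Python standard library; the data, the helper names `trimean` and `trimmed_mean`, the "inclusive" quartile method, and the choice to trim the given proportion from each end are all assumptions made for illustration, since the notes do not fix these conventions.

```python
import statistics

def trimean(scores):
    """Trimean = (Q1 + 2 * median + Q3) / 4."""
    # Quartile conventions vary; the "inclusive" method is an assumption here.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

def trimmed_mean(scores, proportion):
    """Mean after dropping `proportion` of the scores from EACH end."""
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)  # number of scores cut per end
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical right-skewed data: one extreme score pulls the mean upward.
scores = [2, 3, 3, 4, 5, 6, 7, 8, 40]

print("mean        ", statistics.mean(scores))
print("median      ", statistics.median(scores))
print("mode        ", statistics.mode(scores))
print("trimean     ", trimean(scores))
print("trimmed mean", trimmed_mean(scores, 0.2))
```

With the extreme score of 40 present, the mean sits well above the median, while the trimean and trimmed mean stay close to the median.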
Spread
Range
Largest score - smallest score
Sensitive to extremes
Semi-Interquartile Range
(Q3 - Q1)/2
Good for skewed distributions
Not better than SD for normal distributions
Variance and Standard Deviation
In a normal distribution, about 68% of the scores are within one standard deviation of the mean
and about 95% of the scores are within two standard deviations of the mean.
Supplement with the semi-interquartile range when extreme scores are possible.
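The spread measures above can be sketched in a few lines of standard-library Python. The data are hypothetical, and the "inclusive" quartile method is an assumption (other quartile conventions give slightly different values).

```python
import statistics

# Hypothetical scores with one extreme value.
scores = [2, 3, 3, 4, 5, 6, 7, 8, 40]

score_range = max(scores) - min(scores)          # largest - smallest
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
semi_iqr = (q3 - q1) / 2                         # (Q3 - Q1) / 2
pop_variance = statistics.pvariance(scores)      # divides by N
pop_sd = statistics.pstdev(scores)               # square root of the variance

print("range   ", score_range)
print("semi-IQR", semi_iqr)
print("variance", pop_variance)
print("SD      ", pop_sd)
```

The single extreme score inflates the range and the standard deviation, while the semi-interquartile range is unaffected by it.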
Skew
One tail longer than other.

Tail to the right: right skewed or positively skewed. Mean larger than median.
Skew and kurtosis
Graphical Representations
Frequency Polygons & Cumulative Frequency Polygons
Histograms & Bar Graphs
Class interval vs. number of scores in the interval
Shape varies and depends on the size of the interval
Bar graphs leave space between the bars and are used for qualitative variables
Stem and Leaf plots

Box Plots
Max, Q3, Mean, Median, Q1, Min
Describing Bivariate Data

Scatterplots
Scores of one variable vs scores of another
Pearson's Correlation
Reflects degree of linear relationship between variables

1 => Perfect positive linear relationship
Check for a reduced (restricted) range, which can distort r

Linear transformations have no effect on r, e.g. changing the range from 200-800 to 100-700
Spearman's Rho
Measure of the linear relationship between two variables, computed on ranks

Convert values to ranks
Rank-Randomization test
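Both coefficients can be sketched in plain Python. The helper names `pearson_r`, `ranks`, and `spearman_rho` and the data are illustrative assumptions; ties are given average ranks, which is a common convention but not one stated in these notes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data on a 200-800 scale, then shifted to 100-700.
x = [200, 350, 500, 650, 800]
y = [1.0, 2.1, 2.9, 4.2, 4.8]
shifted = [value - 100 for value in x]

print(pearson_r(x, y), pearson_r(shifted, y))  # same r before and after the shift
print(spearman_rho(x, y))
```

Shifting x by a constant leaves r unchanged, which illustrates the note on linear transformations.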
Probability
If A & B are not independent,
P(A & B) = P(A) * P(B|A)
Binomial Distribution
Consider situations where an event has only two outcomes.

One of them can be generalized as success and the other as failure.
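For r successes in N trials, $P(r) = \frac{N!}{r!\,(N-r)!}\,\pi^r (1-\pi)^{N-r}$, where $\pi$ is the probability of success on a single trial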
Can be approximated by a normal distribution
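The probability of r successes in N trials can be computed directly. This is a small sketch; the function name `binomial_pmf` and the symbol `pi` for the per-trial success probability are assumptions, and the coin-flip numbers are a made-up example.

```python
import math

def binomial_pmf(r, n, pi):
    """P(r successes in n trials), success probability pi on each trial."""
    return math.comb(n, r) * pi ** r * (1 - pi) ** (n - r)

# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))

# The probabilities over all possible r sum to 1.
print(sum(binomial_pmf(r, 4, 0.5) for r in range(5)))
```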
Assumptions:
Events are:
1. Dichotomous
2. Mutually exclusive
3. Independent
4. Randomly selected
Subjective Probability
Example : Probability that India will defeat Pakistan
Normal Distribution
A family of distributions that have the same general shape (Bell).
Defined by mean and standard deviation.
Standard Normal Distribution
Normal distributions with mean of 0 and Standard deviation of 1

Can standardize using z scores: $z = (X - \mu)/\sigma$
Called z-distribution
Z is normal only if X is normal.
Can convert between scores (z) and percentile ranks (Area under the curve).
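Converting between raw scores, z scores, and percentile ranks can be sketched with `statistics.NormalDist` from the standard library. The score of 130 on a scale with mean 100 and standard deviation 15 is a hypothetical example, and `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize a raw score: z = (X - mu) / sigma."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)

z = z_score(130, mu=100, sigma=15)
percentile = standard_normal.cdf(z) * 100      # area below z, as a percentage
z_for_975 = standard_normal.inv_cdf(0.975)     # z with 97.5% of the area below it

print(z, percentile, z_for_975)
```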
Sampling Distribution
Sample mean and population mean will be different

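Sampling distribution: the limit of the relative frequency distribution of means observed for a given sample size, as the number of samples grows.
Different sampling distribution for each sample size.
Population: mean $\mu$, standard deviation $\sigma$
Sample: mean $M$
Standard error of the mean: $\sigma_M = \sigma/\sqrt{N}$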

Spread of sampling distribution decreases as the sample size increases.
Standard Error of Statistic
Standard deviation of the sampling distribution of that statistic

Decreases with sample size
Central Limit Theorem

Given a distribution with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/N$ as $N$, the sample size, increases.
Number of samples is assumed to be infinite.

Sample size is the number of scores in each sample
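The theorem can be checked informally by simulation. This is a rough sketch, not a proof: it draws a finite number of samples (5000, an arbitrary choice) from a uniform distribution on [0, 1], which has mean 0.5 and variance 1/12, and compares the spread of the sample means with the theoretical standard error. The constants and variable names are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30     # number of scores in each sample (N)
NUM_SAMPLES = 5000   # finite stand-in for "infinitely many" samples

sample_means = [
    statistics.mean([random.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

observed_mean = statistics.mean(sample_means)
observed_se = statistics.pstdev(sample_means)
theoretical_se = math.sqrt((1 / 12) / SAMPLE_SIZE)  # sqrt(variance / N)

print(observed_mean, observed_se, theoretical_se)
```

The mean of the sample means should land near 0.5 and their standard deviation near the theoretical standard error, even though the parent distribution is not normal.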
Area under the Sampling Distribution of the Mean

View the area as a probability: the chance of the sample mean falling in a particular range of values given the sample size, assuming that the sampling distribution is normal.
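As a sketch of this idea, the probability that a sample mean falls in a given range is the area under a normal curve centered on the population mean, with the standard error as its standard deviation. The function name and the numbers (population mean 100, standard deviation 10, sample size 25) are hypothetical.

```python
import math
from statistics import NormalDist

def prob_mean_between(low, high, mu, sigma, n):
    """Area under the sampling distribution of the mean between low and high."""
    standard_error = sigma / math.sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

# Standard error is 10 / sqrt(25) = 2, so 98..102 is one standard error each side.
p = prob_mean_between(98, 102, mu=100, sigma=10, n=25)
print(p)
```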
Sampling Distribution, Difference between Independent means
Means from independent samples

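Let the difference between the sample means be $M_d = M_1 - M_2$. Its sampling distribution has mean $\mu_{M_d} = \mu_1 - \mu_2$ and standard error $\sigma_{M_d} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.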
Sampling Distribution of a Linear Combination of Means
K populations with the same standard deviation $\sigma$.

N subjects sampled randomly from each population with mean computed with each sample
Linear combination (L) of these means is computed as weighted average of these means.
Can be used to state many experimental hypotheses
$L = a_1M_1 + a_2M_2 + \dots + a_kM_k$
$\mu_L = a_1\mu_1 + a_2\mu_2 + \dots + a_k\mu_k$
Example: use the averages of 12-, 13-, and 14-year-olds to infer the average for teens across all of these age groups
Sampling Distribution of Pearson's r
When the absolute value of the correlation is low (< 0.4), the sampling distribution is approximately normal.
For higher (positive) values, the distribution is negatively skewed.
Fisher's $z' = 0.5[\ln(1+r) - \ln(1-r)]$
Fisher's z' is approximately normally distributed with standard error $\sigma_{z'} = 1/\sqrt{N-3}$
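The transformation is short enough to write out directly. In this sketch the function names are illustrative; the standard error 1/sqrt(N - 3) is the standard result for Fisher's z' and is stated here as an assumption about what the notes intend.

```python
import math

def fisher_z(r):
    """Fisher's z' = 0.5 * [ln(1 + r) - ln(1 - r)]."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

def fisher_z_se(n):
    """Standard error of z' for a sample of n pairs: 1 / sqrt(n - 3)."""
    return 1 / math.sqrt(n - 3)

print(fisher_z(0.5))       # same value as math.atanh(0.5)
print(fisher_z_se(28))     # 1 / sqrt(25)
```

`math.atanh(r)` computes the same quantity, and `math.tanh` converts z' back to r.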
Sampling Distribution of Difference between Independent Pearson's r's
Convert both r's to z' values
Sampling Distribution of Median
Standard error of the median is approximately $1.253\,\sigma/\sqrt{N}$; this does not hold for non-normal distributions
Sampling Distribution of the Standard Deviation
Positively skewed for small N but approximately normal for N > 25
Sampling Distribution of a Proportion
Parameter $\pi$ and statistic $p$

Sampling distribution will be a binomial distribution with mean $= \pi$ and standard error $\sigma_p = \sqrt{\pi(1-\pi)/N}$
Correction for Continuity
Probability that a sample value is exactly equal to a specific value is 0 (continuous distribution)
So to approximate the probability of a value of exactly 10, calculate the area under the curve between 9.5 and 10.5
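The correction can be illustrated by comparing the exact binomial probability of 10 successes with the normal-curve area between 9.5 and 10.5. The setting (20 trials, success probability 0.5) is a made-up example, and the variable names are assumptions.

```python
import math
from statistics import NormalDist

n, pi = 20, 0.5                              # hypothetical: 20 fair coin flips
mean = n * pi                                # binomial mean
sd = math.sqrt(n * pi * (1 - pi))            # binomial standard deviation

normal_approx = NormalDist(mean, sd)
p_approx = normal_approx.cdf(10.5) - normal_approx.cdf(9.5)   # area from 9.5 to 10.5
p_exact = math.comb(n, 10) * pi ** 10 * (1 - pi) ** (n - 10)  # exact P(10 successes)

print(p_approx, p_exact)
```

The two values agree to about three decimal places in this case.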
Sampling Distribution of Difference between Two Proportions
Point Estimation
Estimates for parameters

Used in confidence intervals and significance testing
Characteristics of Estimators
Unbiasedness
Consistency
Relative efficiency
Estimating Variance
$\sigma^2 = \sum (X - \mu)^2 / N$
$s^2 = \sum (X - M)^2 / (N - 1)$
Confidence Intervals
CI for Mean, SD Known
Values needed
o Sample Mean
o Z-value depends on level of confidence
o Standard Error
CI is $M - z\,\sigma_M \le \mu \le M + z\,\sigma_M$
95% z-value = 1.96
99% z-value = 2.58
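A minimal sketch of this interval, assuming the population standard deviation is known. The function name and the example values (sample mean 50, standard deviation 10, sample size 25) are hypothetical; the z-value is looked up with `NormalDist.inv_cdf` rather than a table.

```python
import math
from statistics import NormalDist

def ci_mean_known_sd(m, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    standard_error = sigma / math.sqrt(n)
    return m - z * standard_error, m + z * standard_error

low, high = ci_mean_known_sd(m=50, sigma=10, n=25)
print(low, high)
```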
CI for mean, SD Estimated
$s$ is used as an estimate of $\sigma$
Use t-value instead of z-value
$M - t\,s_M \le \mu \le M + t\,s_M$
$M$ is the sample mean, $s_M$ is an estimate of $\sigma_M$, and $t$ depends on the degrees of freedom and the level of confidence
General Formula
$\text{statistic} \pm (z)(\sigma_{\text{stat}})$

If the standard error has to be estimated, then use t instead of z.
$\text{statistic} \pm (t)(s_{\text{stat}})$
But for proportions, z is used even when the standard error is estimated
CI for Difference between means, Independent Groups, SD known
Compute Md = M1 - M2
Compute $\sigma_{M_d} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Find z (1.96 for 95% interval; 2.58 for 99% interval)
Lower limit = $M_d - z\,\sigma_{M_d}$
Upper limit = $M_d + z\,\sigma_{M_d}$
CI for Difference between means, Independent Groups, SD Estimated
Compute Md = M1 - M2
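Compute $SSE_1 = \sum (X - M_1)^2$ for Group 1 and $SSE_2 = \sum (X - M_2)^2$ for Group 2
Compute SSE = SSE1 + SSE2
Compute df = N - 2 where N = n1 + n2
Compute MSE = SSE/df
Find the t-value for the df and level of confidence
Compute $s_{M_d} = \sqrt{MSE\,(1/n_1 + 1/n_2)}$
Lower limit = $M_d - t\,s_{M_d}$
Upper limit = $M_d + t\,s_{M_d}$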
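The steps above can be sketched as a single function. The standard library has no t distribution, so the critical t-value is passed in as `t_crit`, taken from a t table; the function name, the data, and the pooled standard error sqrt(MSE * (1/n1 + 1/n2)) are assumptions for illustration.

```python
import math

def ci_diff_means(group1, group2, t_crit):
    """CI for M1 - M2 with independent groups and an estimated SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    md = m1 - m2
    # Pooled sum of squared deviations from each group's own mean.
    sse = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    df = n1 + n2 - 2
    mse = sse / df
    standard_error = math.sqrt(mse * (1 / n1 + 1 / n2))
    return md - t_crit * standard_error, md + t_crit * standard_error, df

# Hypothetical data; 2.306 is the two-tailed 95% t-value for df = 8.
group1 = [3, 4, 5, 6, 7]
group2 = [1, 2, 3, 4, 5]
low, high, df = ci_diff_means(group1, group2, t_crit=2.306)
print(low, high, df)
```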
CI for Linear combination of Means, Independent Groups
Compute the sample mean (M) for each group.

Compute the sample variance ($s^2$) for each of the k groups.
Find the coefficients (a's) so that $\sum a_i\mu_i$ is the parameter to be estimated.
Compute $L = a_1M_1 + a_2M_2 + \dots + a_kM_k$
Compute $MSE = \sum s_i^2 / k$
Compute $s_L = \sqrt{\sum a_i^2 \, MSE / n}$
Compute df = k(n-1) where k is the number of groups and n is the number of subjects in each group.
Find t for the df and level of confidence
Lower limit = $L - t\,s_L$
Upper limit = $L + t\,s_L$
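The procedure above can be sketched as follows, assuming equal group sizes. The critical t-value is supplied by the caller from a t table; the function name, the three small groups, the coefficients, and the standard error formula sqrt(sum(a^2) * MSE / n) are assumptions for illustration.

```python
import math
import statistics

def ci_linear_combination(groups, coefficients, t_crit):
    """CI for a linear combination of means, equal-sized independent groups."""
    k = len(groups)
    n = len(groups[0])                                     # subjects per group
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]   # N - 1 denominator
    combo = sum(a * m for a, m in zip(coefficients, means))
    mse = sum(variances) / k
    s_l = math.sqrt(sum(a ** 2 for a in coefficients) * mse / n)
    df = k * (n - 1)
    return combo - t_crit * s_l, combo + t_crit * s_l, df

# Hypothetical: compare the average of groups 1 and 2 with group 3.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
coefficients = [0.5, 0.5, -1]
low, high, df = ci_linear_combination(groups, coefficients, t_crit=2.447)  # df = 6, 95%
print(low, high, df)
```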
CI on Pearson's Correlation
Compute the sample r.

Use the r to z' table to convert the value of r to z'.
Use a z table to find the value of z for the level of confidence desired.
Lower limit = $z' - (z)(\sigma_{z'})$

Upper limit = $z' + (z)(\sigma_{z'})$, where $\sigma_{z'} = 1/\sqrt{N-3}$
Use the r to z' table to convert the lower and upper limits from z' to r
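The same procedure can be sketched in code, with `math.atanh` and `math.tanh` standing in for the r to z' table. The function name and the example (r = 0.5 from 28 pairs) are hypothetical, and the standard error 1/sqrt(N - 3) is the usual result for Fisher's z', stated here as an assumption.

```python
import math
from statistics import NormalDist

def ci_pearson_r(r, n, confidence=0.95):
    """Confidence interval on a correlation via Fisher's z' transformation."""
    z_prime = math.atanh(r)                          # r to z'
    standard_error = 1 / math.sqrt(n - 3)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    lower = z_prime - z * standard_error
    upper = z_prime + z * standard_error
    return math.tanh(lower), math.tanh(upper)        # z' back to r

low, high = ci_pearson_r(r=0.5, n=28)
print(low, high)
```

The interval is not symmetric around r, because the limits are symmetric on the z' scale rather than the r scale.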
CI between Independent Correlations

1. Compute the sample r's.
2. Use an r to z' table to convert the values of r to z'.
3. Compute z'1 - z'2.
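4. Use a z table to find the value of z for the level of confidence desired.
5. Compute: $\sigma_{z'_1 - z'_2} = \sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}$
6. Lower limit = $z'_1 - z'_2 - (z)(\sigma_{z'_1 - z'_2})$
7. Upper limit = $z'_1 - z'_2 + (z)(\sigma_{z'_1 - z'_2})$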
8. Use an r to z' table to convert the lower and upper limits to r's.
Hypothesis Testing
Ruling out chance as an explanation
Null Hypothesis
A hypothesis about a population parameter

Viability of this hypothesis is tested in light of the given data
Usually reverse of what the experimenter actually believes.
Data will be used to contradict this hypothesis.
Steps in Hypothesis testing

1. Specify the null and alternate hypotheses (H0 & H1).
2. Select a significance level (typically 0.05 or 0.01).
3. Calculate the statistic analogous to the parameter specified in the null hypothesis.
4. Calculate the p-value: the probability, assuming the null hypothesis is true, of obtaining a statistic as different or more different from the parameter specified in the null hypothesis.
a. The outcome is statistically significant if the null hypothesis is rejected (p-value at or below the significance level).
5. If the outcome is statistically significant, reject the null hypothesis in favor of the alternate hypothesis.
6. The final step is to describe the result and the statistical conclusion in an understandable way.
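As one possible illustration of these steps, here is a two-tailed z test of a sample mean against a hypothesized population mean, assuming the population standard deviation is known. The test choice, the function name, and the numbers are assumptions for the example; the notes do not prescribe a particular test.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 against H1: mu != mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error          # step 3: the statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # step 4: two-tailed p-value
    significant = p_value <= alpha                    # step 5: compare with alpha
    return z, p_value, significant

# Hypothetical: H0 says the mean is 100; 25 scores average 104, sigma = 10.
z, p_value, significant = z_test(sample_mean=104, mu0=100, sigma=10, n=25)
print(z, p_value, significant)
```

In plain words for step 6: a sample mean this far from 100 would arise by chance less than 5% of the time if the null hypothesis were true, so the null hypothesis is rejected at the 0.05 level.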
