
Contents

Univariate Data
  Central tendency
    Mean
    Median
    Mode
    Trimean
    Trimmed Mean
  Spread
    Range
    Semi-Interquartile Range
    Variance and Standard Deviation
  Skew
  Graphical Representations
    Frequency Polygons & Cumulative Frequency Polygons
    Histograms & Bar Graphs
    Stem and Leaf plots
    Box Plots
Describing Bivariate Data
  Scatterplots
  Pearson's Correlation
  Spearman's Rho
Probability
  Binomial Distribution
    Assumptions
  Subjective Probability
Normal Distribution
  Standard Normal Distribution
Sampling Distribution
  Standard Error of Statistic
  Central Limit Theorem
  Area under the Sampling Distribution of the Mean
  Sampling Distribution, Difference between Independent means
  Sampling Distribution of a Linear Combination of Means
  Sampling Distribution of Pearson's r
  Sampling Distribution of Difference between Independent Pearson's r's
  Sampling Distribution of Median
  Sampling Distribution of the Standard Deviation
  Sampling Distribution of a Proportion
  Correction for Continuity
  Sampling Distribution of Difference between Two Proportions
Point Estimation
  Characteristics of Estimators
  Estimating Variance
Confidence Intervals
  CI for Mean, SD Known
  CI for mean, SD Estimated
  General Formula
  CI for Difference between means, Independent Groups, SD known
  CI for Difference between means, Independent Groups, SD Estimated
  CI for Linear combination of Means, Independent Groups
  CI on Pearson's Correlation
  CI between Independent Correlations
Hypothesis Testing
  Null Hypothesis
  Steps in Hypothesis testing

Univariate Data
Central tendency
Mean

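Good for roughly symmetric distributions
Misleading in skewed distributions
Most efficient for normal distributions
Equal to the expected value
The geometric mean (GM) can be better than the arithmetic mean (AM) for positively skewed data, since it is less affected by very large values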
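A small sketch of the last point using Python's standard `statistics` module; the scores are hypothetical. One large score drags the arithmetic mean well above the bulk of the data, while the geometric mean (defined only for positive values) stays closer to the median.

```python
import statistics

# Hypothetical, positively skewed scores: one very large value.
scores = [2, 3, 3, 4, 5, 6, 48]

am = statistics.mean(scores)            # arithmetic mean: sum of scores / N
gm = statistics.geometric_mean(scores)  # N-th root of the product (positive values only)
md = statistics.median(scores)

print(f"AM = {am:.2f}, GM = {gm:.2f}, median = {md}")
```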

Median

Midpoint of the distribution


Better than others for highly skewed data
Prone to sampling fluctuations (more so than the mean)
For normal distributions, the standard error of the median is about 25% larger than that of the mean.
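The 25% figure can be checked by simulation. This sketch assumes a standard normal population, samples of N = 101, and 4000 repetitions (all arbitrary choices); the ratio of the spreads of the two simulated sampling distributions should come out near 1.25 (the large-sample value is about 1.253).

```python
import random
import statistics

random.seed(1)
N = 101        # sample size (odd, so each median is a single score)
REPS = 4000    # number of simulated samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(0, 1) for _ in range(N)]   # standard normal population
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Ratio of the SDs of the two simulated sampling distributions.
ratio = statistics.stdev(medians) / statistics.stdev(means)
print(f"SE(median) / SE(mean) = {ratio:.2f}")
```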

Mode

Most frequently occurring score in a distribution


Only measure that can be used with nominal data
Highly susceptible to sampling fluctuations

Trimean

The trimean is almost as resistant to extreme scores as the median and is less subject to
sampling fluctuations than the arithmetic mean in extremely skewed distributions.
It is less efficient than the mean for normal distributions.
The trimean is a good measure of central tendency and is probably not used as much as it
should be.
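For reference, Tukey's trimean is (Q1 + 2 × Median + Q3)/4. The sketch below assumes quartiles computed by `statistics.quantiles` with its default method; texts that use hinges may get slightly different values. The data are hypothetical.

```python
import statistics

def trimean(data):
    """Tukey's trimean: (Q1 + 2 * Q2 + Q3) / 4.

    Quartiles come from statistics.quantiles (default 'exclusive' method);
    other quartile definitions give slightly different results.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q1 + 2 * q2 + q3) / 4

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme score
print(trimean(scores), statistics.mean(scores), statistics.median(scores))
```

Here the trimean (5.5) matches the median, while the single extreme score pulls the mean up to 14.5.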

Trimmed Mean

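Discard a given percentage of the scores, half from the low end and half from the high end, and compute the mean of the rest

Median is the 100% trimmed mean; the arithmetic mean is the 0% trimmed mean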
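A minimal sketch of a trimmed mean, assuming the percentage refers to the total amount trimmed (split evenly between the two ends) and that the number dropped from each end is rounded down. The function name `trimmed_mean` is illustrative only.

```python
import statistics

def trimmed_mean(data, pct):
    """Mean after discarding pct percent of the scores in total,
    half from each end (pct=20 drops the lowest 10% and highest 10%).

    pct=0 gives the ordinary mean; when nothing would be left,
    fall back to the median (the 100% trimmed mean).
    """
    xs = sorted(data)
    n = len(xs)
    k = int(n * pct / 100 / 2)   # number of scores dropped from EACH end
    if 2 * k >= n:
        return statistics.median(xs)
    return statistics.mean(xs[k:n - k])

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data
print(trimmed_mean(scores, 0), trimmed_mean(scores, 20), trimmed_mean(scores, 100))
```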

Spread
Range

Largest-Smallest
Sensitive to extremes

Semi-Interquartile Range

(Q3 - Q1)/2
Good for skewed distributions
Less efficient than the SD for normal distributions
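A short comparison on hypothetical data: adding one extreme score changes the range drastically but leaves the semi-interquartile range untouched. Quartiles come from `statistics.quantiles` with its default method, which is an assumption; other quartile conventions shift the numbers slightly.

```python
import statistics

def spread(data):
    """Return (range, semi-interquartile range) for a list of scores."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' quartiles
    return max(data) - min(data), (q3 - q1) / 2

no_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(spread(no_outlier), spread(with_outlier))
```

Both lists give a semi-interquartile range of 2.75, while the range jumps from 9 to 99.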

Variance and Standard Deviation

In a normal distribution, about 68% of the scores are within one standard deviation of the mean
and about 95% of the scores are within two standard deviations of the mean.
Supplement with the semi-interquartile range when extreme scores are possible.

Skew

One tail longer than other.


Tail to the right: right-skewed (positively skewed); the mean is usually larger than the median
Skew and kurtosis
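One common way to put a number on skew (assumed here, since these notes give no formula) is the average cubed z score, $\sum (X-\mu)^3 / (N\sigma^3)$. It is positive for a longer right tail and zero for a perfectly symmetric set of scores. The data are hypothetical.

```python
import statistics

def skew(data):
    """Moment-based skewness: mean of the cubed z scores (population SD)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum((x - mu) ** 3 for x in data) / (len(data) * sigma ** 3)

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10, 20]   # hypothetical scores with a long right tail
print(skew(right_tailed), statistics.mean(right_tailed), statistics.median(right_tailed))
```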

Graphical Representations
Frequency Polygons & Cumulative Frequency Polygons
Histograms & Bar Graphs

Plot the number of scores (frequency) falling in each class interval
The shape varies and depends on the width of the class intervals
Bar graphs leave space between bars and are used for qualitative variables

Stem and Leaf plots


Box Plots

Max, Q3, Mean, Median, Q1, Min

Describing Bivariate Data


Scatterplots

Scores of one variable plotted against scores of another

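Pearson's Correlation

Reflects the degree of linear relationship between two variables

r ranges from -1 to 1: 1 => perfect positive linear relationship, -1 => perfect negative linear relationship, 0 => no linear relationship

Check for patterns in a reduced (restricted) range of scores; restricting the range usually lowers r

Linear transformations have no effect on the size of r (a negative multiplier only flips its sign). E.g., changing the scale from 200-800 to 100-700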
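A sketch of Pearson's r computed from deviation scores, with hypothetical data, checking the property above: shifting one variable (a 200-800 scale to a 100-700 scale) leaves r unchanged, and multiplying it by a negative constant only flips the sign.

```python
import math

def pearson_r(xs, ys):
    """Sum of cross-products of deviations divided by the square root
    of the product of the two sums of squared deviations."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [200, 350, 420, 560, 610, 780]   # hypothetical scores on a 200-800 scale
y = [2.1, 2.4, 3.0, 2.9, 3.5, 3.8]

r = pearson_r(x, y)
r_shifted = pearson_r([v - 100 for v in x], y)   # same scores on a 100-700 scale
r_flipped = pearson_r([-2 * v for v in x], y)    # negative multiplier
print(round(r, 4), round(r_shifted, 4), round(r_flipped, 4))
```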

Spearman's Rho

Measure of the monotonic relationship between two variables (Pearson's r computed on ranks)

Convert values to ranks
Significance can be tested with a rank-randomization test
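A sketch of Spearman's rho as Pearson's r on ranks, assuming tied scores receive the average of the ranks they span (a common convention). The helper names `ranks` and `spearman_rho` are hypothetical. For a monotonic but curved relationship, rho is 1 even though Pearson's r is not.

```python
import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank scores 1..N; tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1   # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # y = x**2: monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```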

Probability
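If A and B are not independent,
P(A and B) = P(A) * P(B|A)
(This holds in general; when A and B are independent, P(B|A) = P(B), so P(A and B) = P(A) * P(B).)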

Binomial Distribution

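Consider situations where an event has only two possible outcomes.
One of them can be labeled success and the other failure.
For r successes in N trials, where $\pi$ is the probability of success on each trial:
$P(r) = \frac{N!}{r!(N-r)!}\,\pi^r (1-\pi)^{N-r}$
Can be approximated by a normal distribution when N is large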
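A direct translation of the binomial formula, using `math.comb` (Python 3.8+) for $N!/(r!(N-r)!)$. The coin-toss example is illustrative.

```python
import math

def binomial_pmf(r, n, pi):
    """P(r successes in n independent trials), success probability pi per trial."""
    return math.comb(n, r) * pi ** r * (1 - pi) ** (n - r)

# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))   # 0.375
```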

Assumptions:
Events are:
1. Dichotomous
2. Mutually exclusive
3. Independent
4. Randomly selected

Subjective Probability

Example: the probability that India will defeat Pakistan

Normal Distribution
A family of distributions that have the same general shape (Bell).
Defined by mean and standard deviation.

Standard Normal Distribution

The normal distribution with a mean of 0 and a standard deviation of 1

Can standardize using z scores: $z = (X - \mu)/\sigma$

Called the z-distribution
z is normal only if X is normal.
Can convert between z scores and percentile ranks (area under the curve).
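A sketch using the standard library's `statistics.NormalDist`, with a hypothetical normal population (mean 100, SD 15): a score is converted to z, then to a percentile rank (area below it), and an area is converted back to a z value.

```python
from statistics import NormalDist

mu, sigma = 100, 15                     # hypothetical population mean and SD
x = 130
z = (x - mu) / sigma                    # z = (X - mu) / sigma
pct = NormalDist().cdf(z)               # area below z under the standard normal curve
back = NormalDist().inv_cdf(0.975)      # z value that cuts off the top 2.5%
print(z, round(pct, 4), round(back, 2))
```

A score of 130 gives z = 2, about the 97.7th percentile; the z that cuts off the top 2.5% is about 1.96.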

Sampling Distribution

A sample mean will generally differ from the population mean

Sampling distribution: the limit of the relative frequency distribution of the means observed in repeated samples of a given size.
There is a different sampling distribution for each sample size.
Population: mean $\mu$, standard deviation $\sigma$
Sample: mean M, standard deviation s

Standard error of the mean: $\sigma_M = \sigma/\sqrt{N}$

The spread of the sampling distribution decreases as the sample size increases.

Standard Error of Statistic

Standard deviation of the sampling distribution of that statistic


Decreases with sample size

Central Limit Theorem


Given a distribution with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/N$ as N, the sample size, increases.

Number of samples is assumed to be infinite.


Sample size is the number of scores in each sample
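A simulation sketch of the theorem under assumed settings: an exponential population with mean 1 and SD 1 (strongly right-skewed, skew = 2), samples of N = 30, and 5000 samples standing in for infinitely many. The mean of the sample means should land near $\mu = 1$, their SD near $\sigma/\sqrt{N} \approx 0.183$, and their skew far below the population's.

```python
import random
import statistics

random.seed(2)
N = 30        # sample size (scores per sample)
REPS = 5000   # number of samples, standing in for "infinitely many"

# Each entry is the mean of one sample drawn from an exponential(1) population.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(N))
         for _ in range(REPS)]

m = statistics.fmean(means)     # close to mu = 1
s = statistics.pstdev(means)    # close to sigma / sqrt(N) = 1 / sqrt(30)
skew = sum((v - m) ** 3 for v in means) / (len(means) * s ** 3)
print(round(m, 3), round(s, 3), round(skew, 2))
```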

Area under the Sampling Distribution of the Mean


View area as probability: the area under the sampling distribution between two values is the probability that the mean of a sample of size N falls between them, assuming the sampling distribution is normal.

Sampling Distribution, Difference between Independent means

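Means from independent samples

Let the difference between the sample means be Md = M1 - M2
The sampling distribution of Md has mean $\mu_1 - \mu_2$ and standard error $\sigma_{M_d} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$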
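A sketch of this sampling distribution as a `NormalDist`, assuming it is normal (normal populations or large samples) and using hypothetical population values. Its mean is $\mu_1 - \mu_2$ and its standard error is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

```python
import math
from statistics import NormalDist

def diff_means_dist(mu1, sigma1, n1, mu2, sigma2, n2):
    """Normal sampling distribution of Md = M1 - M2 for independent samples."""
    mean = mu1 - mu2
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return NormalDist(mean, se)

# Hypothetical populations: equal means, SDs 10 and 12, samples of 25 and 36.
dist = diff_means_dist(50, 10, 25, 50, 12, 36)
p = 1 - dist.cdf(5)   # chance that M1 exceeds M2 by more than 5 points
print(round(dist.stdev, 3), round(p, 4))
```

Here the standard error is √8 ≈ 2.83, and the chance that M1 exceeds M2 by more than 5 points is about 0.04.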

Sampling Distribution of a Linear Combination of Means

k populations with the same standard deviation $\sigma$.

n subjects sampled randomly from each population, with the mean computed for each sample
A linear combination (L) of these means is a weighted sum of the means.
Can be used to state many experimental hypotheses
L = a1M1 + a2M2 + ... + akMk
Mean of the sampling distribution of L: $\mu_L = a_1\mu_1 + a_2\mu_2 + \dots + a_k\mu_k$
Standard error of L: $\sigma_L = \sqrt{\sigma^2 \sum a_i^2 / n}$

Example: use the average scores of 12-, 13-, and 14-year-olds to draw inferences about the average across all three age groups
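A sketch computing L and its standard error, assuming k independent samples of equal size n from populations sharing the standard deviation σ, so that $\sigma_L = \sqrt{\sigma^2 \sum a_i^2 / n}$. The means, σ, and n for the three age groups are hypothetical; coefficients of 1/3 make L the average of the three means.

```python
import math

def linear_combination(coeffs, means, sigma, n):
    """Return L = sum(a_i * M_i) and its standard error sigma_L, assuming
    independent samples of n scores each from populations with common SD sigma."""
    L = sum(a * m for a, m in zip(coeffs, means))
    sigma_L = math.sqrt(sigma ** 2 * sum(a ** 2 for a in coeffs) / n)
    return L, sigma_L

# Hypothetical means for 12-, 13- and 14-year-olds (n = 20 each, sigma = 6).
L, se = linear_combination([1 / 3, 1 / 3, 1 / 3], [50, 53, 56], sigma=6, n=20)
print(round(L, 2), round(se, 3))
```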

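Sampling Distribution of Pearson's r

When the absolute value of the population correlation is low (below about 0.4), the sampling distribution of r is approximately normal.
For higher positive values it is negatively skewed (and positively skewed for high negative values).
Fisher's z' = 0.5[ln(1 + r) - ln(1 - r)]
Fisher's z' is approximately normally distributed, with mean equal to the z' of the population correlation and standard error $\sigma_{z'} = 1/\sqrt{N - 3}$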

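Sampling Distribution of Difference between Independent Pearson's r's

Convert both r's to z'; the difference z'1 - z'2 is approximately normal with standard error $\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}$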

Sampling Distribution of Median

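For normal distributions, the standard error of the median is about $1.253\,\sigma/\sqrt{N}$; this approximation does not work for non-normal distributions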

Sampling Distribution of the Standard Deviation

Positively skewed for small N, but approximately normal for N > 25

Sampling Distribution of a Proportion

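Parameter $\pi$ (population proportion) and statistic p (sample proportion)

The sampling distribution of p is a binomial distribution divided by N, with mean $\mu_p = \pi$ and standard deviation $\sigma_p = \sqrt{\pi(1-\pi)/N}$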

Correction for Continuity

Under a continuous distribution (such as the normal approximation to the binomial), the probability of any single exact value is 0.
So to approximate the probability of a value of exactly 10, calculate the area under the curve between 9.5 and 10.5.
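A check of the correction on a hypothetical case, 20 flips of a fair coin: the exact binomial probability of exactly 10 heads is compared with the normal-curve area between 9.5 and 10.5, using the binomial's mean $N\pi$ and SD $\sqrt{N\pi(1-\pi)}$.

```python
import math
from statistics import NormalDist

n, pi = 20, 0.5   # hypothetical: 20 flips of a fair coin
exact = math.comb(n, 10) * pi ** 10 * (1 - pi) ** 10

# Normal approximation with the binomial's mean and SD.
approx_dist = NormalDist(n * pi, math.sqrt(n * pi * (1 - pi)))
approx = approx_dist.cdf(10.5) - approx_dist.cdf(9.5)   # area from 9.5 to 10.5

print(round(exact, 4), round(approx, 4))
```

The two values agree to within about 0.001 (roughly 0.176 vs 0.177).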

Sampling Distribution of Difference between Two Proportions

Point Estimation

Estimates for parameters


Used in confidence intervals and significance testing

Characteristics of Estimators

Unbiasedness
Consistency
Relative efficiency

Estimating Variance
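$\sigma^2 = \sum (X - \mu)^2 / N$ (population variance, when $\mu$ is known)

$s^2 = \sum (X - M)^2 / (N - 1)$ (estimate when $\mu$ is estimated by the sample mean M)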
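A simulation sketch (assumed settings: standard normal population, N = 5, 20000 samples) showing why the sample formula divides by N - 1: dividing the sum of squared deviations from M by N underestimates σ² = 1 on average, by a factor of (N - 1)/N = 0.8, while dividing by N - 1 does not.

```python
import random
import statistics

random.seed(3)
N, REPS = 5, 20000   # small samples make the bias easy to see

biased, unbiased = [], []
for _ in range(REPS):
    sample = [random.gauss(0, 1) for _ in range(N)]   # population variance = 1
    M = statistics.fmean(sample)
    ss = sum((x - M) ** 2 for x in sample)            # squared deviations from M
    biased.append(ss / N)            # divides by N, as if M were mu
    unbiased.append(ss / (N - 1))    # s^2

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)
print(round(mean_biased, 3), round(mean_unbiased, 3))
```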

Confidence Intervals
CI for Mean, SD Known

Values needed
o Sample mean
o z-value (depends on the level of confidence)
o Standard error, $\sigma_M = \sigma/\sqrt{N}$
CI is $M - z\sigma_M \le \mu \le M + z\sigma_M$
95% z-value = 1.96
99% z-value = 2.58
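A sketch of the interval with `statistics.NormalDist` supplying the z value for any confidence level; the sample mean, σ, and N are hypothetical.

```python
import math
from statistics import NormalDist

def ci_mean_sd_known(M, sigma, N, level=0.95):
    """CI for mu when sigma is known: M +/- z * sigma / sqrt(N)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%, 2.58 for 99%
    se = sigma / math.sqrt(N)
    return M - z * se, M + z * se

print(ci_mean_sd_known(M=90, sigma=36, N=9))
```

With M = 90, σ = 36, and N = 9, the standard error is 12 and the 95% interval is roughly 66.48 to 113.52.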

CI for mean, SD Estimated

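s is used as an estimate of $\sigma$, and $s_M = s/\sqrt{N}$ as an estimate of $\sigma_M$
Use a t-value instead of a z-value

$M - t\,s_M \le \mu \le M + t\,s_M$
M is the sample mean, $s_M$ is an estimate of $\sigma_M$, and t depends on the degrees of freedom (N - 1) and the level of confidence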
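A sketch of the t interval. Python's standard library has no t distribution, so the critical t value is passed in from a t table (2.776 is the two-tailed 95% value for df = 4); with SciPy available, `scipy.stats.t.ppf(0.975, df)` would give it instead. The data are hypothetical.

```python
import math
import statistics

def ci_mean_sd_estimated(scores, t_crit):
    """CI for mu when sigma is estimated: M +/- t * s_M, with s_M = s / sqrt(N).

    t_crit must come from a t table for df = N - 1 and the desired level.
    """
    M = statistics.mean(scores)
    s_M = statistics.stdev(scores) / math.sqrt(len(scores))   # stdev divides by N - 1
    return M - t_crit * s_M, M + t_crit * s_M

scores = [2, 3, 5, 6, 9]   # hypothetical data: N = 5, df = 4
print(ci_mean_sd_estimated(scores, t_crit=2.776))
```

For these scores M = 5 and $s_M = \sqrt{1.5} \approx 1.22$, giving roughly 1.60 to 8.40.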

General Formula

$\text{statistic} \pm (z)(\sigma_{\text{statistic}})$

If the standard error has to be estimated, use t instead of z:
$\text{statistic} \pm (t)(s_{\text{statistic}})$
But for proportions, z is used even when the standard error is estimated

CI for Difference between means, Independent Groups, SD known

Compute Md = M1 - M2

Compute $\sigma_{M_d} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Find z (1.96 for 95% interval; 2.58 for 99% interval)

Lower limit = $M_d - z\,\sigma_{M_d}$

Upper limit = $M_d + z\,\sigma_{M_d}$

CI for Difference between means, Independent Groups, SD Estimated

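Compute Md = M1 - M2
Compute SSE1 = $\sum (X - M_1)^2$ for Group 1 and SSE2 = $\sum (X - M_2)^2$ for Group 2
Compute SSE = SSE1 + SSE2
Compute df = N - 2 where N = n1 + n2
Compute MSE = SSE/df
Compute $s_{M_d} = \sqrt{MSE\,(1/n_1 + 1/n_2)}$
Find the t-value for df and the level of confidence

Lower limit = $M_d - t\,s_{M_d}$

Upper limit = $M_d + t\,s_{M_d}$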
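A sketch of this procedure on two hypothetical groups of five scores, with the standard error computed as $\sqrt{MSE\,(1/n_1 + 1/n_2)}$. The critical t (2.306 for 95%, df = 8) is taken from a t table rather than computed.

```python
import math
import statistics

def ci_diff_means_pooled(g1, g2, t_crit):
    """CI on mu1 - mu2 using the pooled variance estimate MSE.

    t_crit comes from a t table for df = n1 + n2 - 2.
    """
    n1, n2 = len(g1), len(g2)
    M1, M2 = statistics.fmean(g1), statistics.fmean(g2)
    Md = M1 - M2
    sse = sum((x - M1) ** 2 for x in g1) + sum((x - M2) ** 2 for x in g2)
    df = n1 + n2 - 2
    mse = sse / df
    s_md = math.sqrt(mse * (1 / n1 + 1 / n2))
    return Md - t_crit * s_md, Md + t_crit * s_md

g1 = [5, 6, 7, 8, 9]   # hypothetical group 1
g2 = [3, 4, 5, 6, 7]   # hypothetical group 2
print(ci_diff_means_pooled(g1, g2, t_crit=2.306))
```

Here Md = 2, MSE = 2.5, and $s_{M_d} = 1$, so the interval is about -0.306 to 4.306.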

CI for Linear combination of Means, Independent Groups

Compute the sample mean (M) for each group.

Compute the sample variance ($s^2$) for each of the k groups.
Find the coefficients (a's) so that $\sum a_i\mu_i$ is the parameter to be estimated.
Compute L = a1M1 + a2M2 + ... + akMk
Compute MSE = $\sum s^2 / k$ (the mean of the k sample variances)

Compute $s_L = \sqrt{\sum a_i^2 \cdot MSE / n}$
Compute df = k(n-1) where k is the number of groups and n is the number of subjects in each group.
Find t for the df and level of confidence
Lower limit = $L - t\,s_L$
Upper limit = $L + t\,s_L$
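A sketch of the same procedure for k equal-sized hypothetical groups, with coefficients (1, -0.5, -0.5) comparing group 1 with the average of groups 2 and 3. It uses $s_L = \sqrt{\sum a_i^2 \cdot MSE / n}$; the critical t (2.179 for 95%, df = 12) comes from a t table.

```python
import math
import statistics

def ci_linear_combination(groups, coeffs, t_crit):
    """CI on sum(a_i * mu_i) for k independent groups of equal size n.

    Returns (lower, upper, df); t_crit comes from a t table for df = k * (n - 1).
    """
    k, n = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    L = sum(a * m for a, m in zip(coeffs, means))
    mse = statistics.fmean(statistics.variance(g) for g in groups)   # mean of the s^2 values
    s_L = math.sqrt(sum(a ** 2 for a in coeffs) * mse / n)
    return L - t_crit * s_L, L + t_crit * s_L, k * (n - 1)

groups = [[5, 6, 7, 8, 9], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]   # hypothetical data
print(ci_linear_combination(groups, [1, -0.5, -0.5], t_crit=2.179))
```

Here L = 1.5 and $s_L = \sqrt{0.75} \approx 0.87$, giving roughly -0.39 to 3.39.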

CI on Pearson's Correlation

Compute the sample r.

Use the r to z' table to convert the value of r to z'.
Use a z table to find the value of z for the level of confidence desired.
Compute $\sigma_{z'} = 1/\sqrt{N - 3}$

Lower limit = $z' - z\,\sigma_{z'}$

Upper limit = $z' + z\,\sigma_{z'}$

Use the r to z' table to convert the lower and upper limits from z' back to r
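A sketch of these steps with the table lookups replaced by formulas: Fisher's z' is `0.5 * (log(1 + r) - log(1 - r))` (identical to `math.atanh(r)`), its standard error is $1/\sqrt{N-3}$, and `math.tanh` converts z' back to r. The r and N values are hypothetical.

```python
import math
from statistics import NormalDist

def ci_pearson_r(r, N, level=0.95):
    """CI on a population correlation via Fisher's z' transformation."""
    z_prime = 0.5 * (math.log(1 + r) - math.log(1 - r))   # same as math.atanh(r)
    se = 1 / math.sqrt(N - 3)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo, hi = z_prime - z * se, z_prime + z * se
    return math.tanh(lo), math.tanh(hi)   # tanh converts z' back to r

print(ci_pearson_r(0.6, 34))
```

For r = 0.6 and N = 34 the 95% interval is roughly 0.33 to 0.78; note that it is not symmetric around 0.6.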

CI between Independent Correlations


1. Compute the sample r's.
2. Use an r to z' table to convert the values of r to z'.
3. Compute z'1 - z'2.
4. Use a z table to find the value of z for the level of confidence desired.
5. Compute $\sigma_{z'_1 - z'_2} = \sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}$
6. Lower limit = $z'_1 - z'_2 - z\,\sigma_{z'_1 - z'_2}$
7. Upper limit = $z'_1 - z'_2 + z\,\sigma_{z'_1 - z'_2}$
8. Use an r to z' table to convert the lower and upper limits to r's.
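A sketch of steps 1 to 8 with `math.atanh` and `math.tanh` standing in for the r to z' table, using the standard error $\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}$; the correlations and sample sizes are hypothetical. The function returns the limits both in z' units and converted back to r.

```python
import math
from statistics import NormalDist

def ci_diff_correlations(r1, n1, r2, n2, level=0.95):
    """CI on the difference between two independent correlations.

    Returns ((lower, upper) in z' units, (lower, upper) converted with tanh).
    """
    diff = math.atanh(r1) - math.atanh(r2)           # z'1 - z'2
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo, hi = diff - z * se, diff + z * se
    return (lo, hi), (math.tanh(lo), math.tanh(hi))

print(ci_diff_correlations(0.6, 28, 0.4, 35))
```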

Hypothesis Testing

Ruling out chance as an explanation

Null Hypothesis

A hypothesis about a population parameter


Viability of this hypothesis is tested in light of the given data
Usually reverse of what the experimenter actually believes.
Data will be used to contradict this hypothesis.

Steps in Hypothesis testing


1. Specify the null hypothesis (H0) and the alternative hypothesis (H1).
2. Select a significance level (typically 0.05 or 0.01).
3. Calculate the statistic analogous to the parameter specified in the null hypothesis.
4. Calculate the p value: the probability, assuming the null hypothesis is true, of obtaining a statistic as different or more different from the parameter specified in the null hypothesis as the statistic actually obtained.
   a. If the p value is less than or equal to the significance level, the null hypothesis is rejected and the outcome is called statistically significant.
5. If the outcome is statistically significant, reject the null hypothesis in favor of the alternative hypothesis.
6. The final step is to describe the result and the statistical conclusion in an understandable way.
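A sketch of these steps for one simple case, assumed here for illustration: a two-tailed z test of H0: μ = μ0 when σ is known, with hypothetical data.

```python
import math
from statistics import NormalDist

def z_test_mean(M, mu0, sigma, N):
    """Two-tailed z test of H0: mu = mu0 with sigma known.

    Returns the z statistic and the p value: the probability, if H0 is true,
    of a sample mean at least as far from mu0 as the one observed.
    """
    z = (M - mu0) / (sigma / math.sqrt(N))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

alpha = 0.05                                           # step 2: significance level
z, p = z_test_mean(M=104, mu0=100, sigma=15, N=64)     # hypothetical data
print(round(z, 3), round(p, 4), "reject H0" if p <= alpha else "do not reject H0")
```

With M = 104, μ0 = 100, σ = 15, and N = 64, z ≈ 2.13 and p ≈ 0.033, which is below 0.05, so H0 is rejected.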
