
E370

1/13/2015
Flow Charts
Statistics
- Sampling Methods
  - Probability
    - Simple random
    - Pseudo random
    - Stratified
    - Systematic
    - Cluster
  - Non-probability
    - Convenience
    - Judgment
- Data Types
  - Qualitative or Categorical
    - Nominal
    - Ordinal
  - Quantitative or Numerical
    - Discrete
    - Continuous
- Graphical Types
  - Qualitative
    - Bar/column
    - Pie
    - Pareto diagram
  - Quantitative
    - Histogram
    - Frequency polygon
    - Ogive
    - Stem-and-leaf plot
- Descriptive Statistics
  - Qualitative
    - Mode (Nominal & Ordinal)
    - Median (Ordinal only)
  - Quantitative
    - Center: Mean, Median, Mode
    - Spread: Range, Variance, Standard Deviation, Coefficient of Variation
    - Shape: Skewness, Symmetric, Uni-modal, Bi-modal, etc.
Descriptive Statistics
- Single Variable Methods
  - Qualitative
    - Mode (Nominal & Ordinal)
    - Median (Ordinal only)
  - Quantitative
    - Center
    - Spread
    - Shape
    - Estimating Probabilities
      - Chebyshev's Rule
      - Empirical or Normal Rule
- Two Variable Methods
  - At least 1 qualitative variable
    - Contingency Tables
    - Probabilities
      - Simple or Marginal
      - Joint
      - Conditional
      - Independence Condition
  - 2 quantitative variables
    - Scatter plots
    - Covariance
    - Correlation
    - Least Squares Lines
Random Variables -- Discrete:

Generic Discrete: Described by a valid probability distribution, a list of outcomes and associated probabilities.
  E(X) = μ = Σ x·P(x)
  V(X) = σ² = Σ (x − μ)²·P(x)

Bernoulli: A single trial resulting in 1 of 2 mutually exclusive and collectively exhaustive outcomes. Parameter is π.
  E(X) = π    V(X) = π(1 − π)

Binomial: Repeated Bernoulli trials where X is the number of successes, π is constant over all trials, and each trial is independent of every other trial. Parameters are n and π.
  X ~ B(n, π)
  E(X) = nπ    V(X) = nπ(1 − π)
  If π < 0.5, right skewed; if π > 0.5, left skewed; if π = 0.5, symmetric.
  If nπ > 5 and n(1 − π) > 5, the distribution can be considered symmetric for certain purposes.
  =BINOM.DIST(x, n, π, 0/1)
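The binomial formulas above can be checked numerically. This is a minimal sketch in Python using only the standard library; the pmf plays the role of Excel's =BINOM.DIST(x, n, π, 0), and the n and π values are illustrative, not from the notes.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3  # hypothetical parameters
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# E(X) and V(X) computed directly from the distribution,
# which should match E(X) = n*pi and V(X) = n*pi*(1-pi)
mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean)**2 * pmf[x] for x in range(n + 1))

print(round(mean, 6))  # n*p     = 3.0
print(round(var, 6))   # n*p*(1-p) = 2.1
```

Summing the pmf over all outcomes also confirms it is a valid probability distribution (total probability 1).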
Random Variables -- Continuous

Generic Continuous: Described by a formula called a pdf, probability density function.
  E(X) = μ    V(X) = σ²

Uniform: Simplest continuous distribution, sometimes called a rectangular distribution, represented by a line (curve) parallel to the x-axis. The distance from the x-axis to the curve is the pdf. Parameters are a and b.
  X ~ U(a, b)
  μ = E(X) = (a + b)/2    σ = SD(X) = (b − a)/√12
  P(c < x < d) = (d − c)/(b − a)

Normal Family: One of the most important continuous distributions; an infinite number of normal distributions, each defined by its parameters, μ and σ; a bell-shaped and symmetric distribution.
  X ~ N(μ, σ)
  E(X) = μ    SD(X) = σ
  =NORM.DIST(x, μ, σ, 1)    =NORM.INV(probability, μ, σ)

The Standard Normal: Transforms any normal into the number of standard deviations its values are away from its mean.
  Z = (X − μ)/σ    Z ~ N(0, 1)
  =NORM.S.DIST(Z, 1)    =NORM.S.INV(probability)
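The uniform and normal probability calculations above can be reproduced in Python. This is a sketch using only the standard library: the normal CDF is built from math.erf, standing in for Excel's =NORM.DIST(x, μ, σ, 1); the example numbers are illustrative.

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Equivalent of Excel =NORM.DIST(x, mu, sigma, 1)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def uniform_prob(c, d, a, b):
    """P(c < X < d) for X ~ U(a, b): (d - c) / (b - a)."""
    return (d - c) / (b - a)

# Uniform example: X ~ U(0, 10)
print(uniform_prob(2, 5, 0, 10))   # 0.3

# Standard normal example: area within one sigma of the mean
p = norm_cdf(1, 0, 1) - norm_cdf(-1, 0, 1)
print(round(p, 4))                 # 0.6827, the Empirical Rule's 68%
```

The second print recovers the Empirical (Normal) Rule value referenced earlier in these notes.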
The Normal Family of Distributions and its Uses

Normal Family: One of the most important continuous distributions; an infinite number of normal distributions, each defined by its parameters, μ and σ; a bell-shaped and symmetric distribution.
  X ~ N(μ, σ)    E(X) = μ    SD(X) = σ
  =NORM.DIST(x, μ, σ, 1)    =NORM.INV(probability, μ, σ)

Using the Normal for Approximations: Approximate the Binomial Using the Normal
  If the binomial is sufficiently symmetric [(n·π) > 5 AND n·(1 − π) > 5], the normal can provide a reasonable approximation of binomial probabilities.
  The use of the Continuity Correction Factor (CCF = +/− 0.5) is essential for accurate estimates. Its usefulness diminishes as n increases, although there is no rule for when it can be ignored.
  Use the expected values of the specific binomial for the parameters in the NORM.DIST or NORM.INV commands.

The Standard Normal: Transforms any normal into the number of standard deviations its values are away from its mean.
  Z = (X − μ)/σ    Z ~ N(0, 1)    =NORM.S.DIST(Z, 1)    =NORM.S.INV(probability)
  Use by itself to compare relative locations in two different normal distributions. Use in conjunction with other statistics for confidence intervals and hypothesis testing.

The Student's t: A family of approximations of the Standard Normal; a one-parameter distribution, each member described completely by its degrees of freedom (df). Centered at 0; units are approximate standard deviations.
  X ~ t(df)    t = (x̄ − μ)/(s/√n)    =T.DIST(t, df, 1)    =T.INV(probability, df)
  Use when the population standard deviation, σ, is unknown, in conjunction with other statistics for confidence intervals and hypothesis testing.
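The normal approximation with the continuity correction can be compared against the exact binomial. A standard-library sketch (the n and π shown are hypothetical, chosen so both n·π and n·(1 − π) exceed 5):

```python
from math import erf, sqrt, comb

def norm_cdf(x, mu, sigma):
    """Equivalent of Excel =NORM.DIST(x, mu, sigma, 1)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 40, 0.4                      # n*p = 16 and n*(1-p) = 24, both > 5
mu, sigma = n * p, sqrt(n * p * (1 - p))  # binomial expected values feed the normal

exact = binom_cdf(15, n, p)               # exact P(X <= 15)
approx = norm_cdf(15 + 0.5, mu, sigma)    # normal with CCF of +0.5
print(round(exact, 4), round(approx, 4))
```

Note the approximation uses the binomial's own mean nπ and standard deviation √(nπ(1 − π)) as the normal's parameters, exactly as the slide instructs for NORM.DIST.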


x̄, The Sample Mean
  x̄ ~ N(μ, σ/√n) if σ is known and if X~N or if n > 30.
  If σ is unknown and if X~N or if n > 30, use the Student's t with df = n − 1 and standard error s/√n.

p, The Sample Proportion
  p ~ N(π, √(π(1 − π)/n)) if (n·π) > 5 AND n·(1 − π) > 5.
Two Important Variables

Sample Mean, x̄, is the estimator for the population mean, μ.
- x̄ is the mean of some simple random sample of size n.
- There is an infinite number of simple random samples of size n that can be drawn from a population.
- There is an infinite number of x̄s, one for every simple random sample.
- Individual x̄s vary in value because each sample most likely contains different observations.
- x̄ is a random variable. Thus, x̄ has a distribution, that is, an expected value, a standard deviation, and a shape.
  x̄ ~ ?(μ, σ/√n)

The Central Limit Theorem: For any X ~ ?(μ, σ), x̄ ~ N(μ, σ/√n) if X~N or if n > 30.

Sample Proportion, p, is the estimator for the population proportion, π.
- p is the proportion of successes found in a fixed number of trials in a binomial, that is, p = X/n where X ~ B(n, π).
- p is the continuous variable version of the discrete binomial and can be any value between 0 and 1.
- Since X ~ B(n, π) is a random variable, so is p a random variable; it is a linear transformation of X. Thus p has a distribution, that is, an expected value, a standard deviation, and a shape.
  p = X/n ~ N(π, √(π(1 − π)/n)) if X ~ B(n, π) is sufficiently symmetric.
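The claims that E(x̄) = μ and SD(x̄) = σ/√n can be verified exactly for a tiny population by enumerating every possible sample. A standard-library sketch (the four-value population is purely illustrative):

```python
from itertools import product

# Hypothetical population of 4 values
population = [1, 2, 3, 4]
N = len(population)
mu = sum(population) / N                          # population mean: 2.5
var = sum((x - mu)**2 for x in population) / N    # population variance: 1.25

n = 2  # sample size; enumerate all samples drawn with replacement
means = [sum(s) / n for s in product(population, repeat=n)]

e_xbar = sum(means) / len(means)                          # E(x-bar)
v_xbar = sum((m - e_xbar)**2 for m in means) / len(means) # V(x-bar)

print(e_xbar)  # 2.5   -> equals mu
print(v_xbar)  # 0.625 -> equals var / n
```

Every one of the 16 possible samples contributes one x̄, and the distribution of those x̄s has exactly the mean μ and variance σ²/n the slide states.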
Discrete w/ Probability Distribution: Single variable with probability distribution; E(X), V(X)

Linear Combinations: Multiple variables with known expected values; E(aX + bY), V(aX + bY)

Bernoulli: Single trial resulting in one of two mutually exclusive and collectively exhaustive outcomes.
  One parameter, π    E(X) = π    V(X) = π(1 − π)

Binomial: X ~ B(n, π). Repeated, independent Bernoulli trials with constant probability of success.
  Parameters n, π    E(X) = n·π    V(X) = n·π·(1 − π)

Uniform: X ~ U(a, b). Simplest continuous distribution; pdf is horizontal, parallel to the X axis.
  Parameters a, b    μ = (a + b)/2    σ² = (b − a)²/12    P(c < x < d) = (d − c)/(b − a)

Normal: X ~ N(μ, σ). Bell-shaped and symmetric. Parameters μ, σ.

Utility Distributions: Standard Normal = Z ~ N(0, 1); Student's t: t(df)

Sampling Distributions:
  Sample mean: x̄ ~ N(μ, σ/√n) if X~N or if n > 30.
  Sample proportion: p ~ N(π, √(π(1 − π)/n)) if X is a binomial, and n·π > 5 AND n·(1 − π) > 5.

Confidence Interval Estimation

Population Mean, σ known:    x̄ ± z_{α/2} · σ/√n
Population Mean, σ unknown:  x̄ ± t_{α/2} · s/√n
Population Proportion, n·p > 5, n·(1 − p) > 5:  p ± z_{α/2} · √(p(1 − p)/n)
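The σ-known interval above can be sketched in a few lines of Python. The sample summary numbers are hypothetical, and 1.96 is the familiar z value for 95% confidence (Excel's =NORM.S.INV(0.975)):

```python
from math import sqrt

# Hypothetical sample summary (illustrative numbers, not from the notes)
xbar, sigma, n = 72.5, 8.0, 64
z = 1.96  # z_{alpha/2} for 95% confidence

half_width = z * sigma / sqrt(n)
lo, hi = xbar - half_width, xbar + half_width
print(round(lo, 2), round(hi, 2))  # 70.54 74.46
```

Swapping z for the appropriate t_{α/2} value (and σ for s) gives the σ-unknown interval with the same structure.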

Hypothesis Testing

Hypotheses (Parameter + sign + value)
  H0 (Null): possible signs ≤, =, ≥
  H1 (Alternative): possible signs >, ≠, <
Tests
  Simple null; two-tailed test: H0 uses = & H1 uses ≠
  Composite null; one-tailed test:
    Right-tailed test: H0 uses ≤ & H1 uses >
    Left-tailed test: H0 uses ≥ & H1 uses <

Level of significance, alpha (α)
  Divide α in half for a 2-tailed test.
  Be sure it is in the correct tail for a 1-tailed test.
Critical Value(s): Z_CRIT or t_CRIT; 2 for a 2-tailed test, 1 for a 1-tailed test
Test Statistic: Z_OBS or t_OBS
Rejection region: Values on the number line outside the critical values
Non-rejection region: Values on the number line near the null, including the critical values
Decision Rule: A statement of the test result for every possible value of the test statistic
p-value: The area under the curve beyond the test statistic in the direction of the alternative
The Test: Reject the null and conclude the alternative, or fail to reject the null
Type I error: Rejecting a TRUE null; probability = α
Type II error: Failing to reject a FALSE null; probability = β
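A two-tailed z-test ties these pieces together. This sketch uses only the standard library; the hypothesized mean, sample summary, and α are all hypothetical:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, equivalent of Excel =NORM.S.DIST(z, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical test: H0: mu = 50 vs H1: mu != 50, with sigma known
mu0, sigma = 50, 10
xbar, n = 53, 36
alpha = 0.05  # two-tailed, so alpha/2 = 0.025 in each tail

z_obs = (xbar - mu0) / (sigma / sqrt(n))   # test statistic: 1.8
p_value = 2 * (1 - norm_cdf(abs(z_obs)))   # area beyond z_obs in both tails

print(round(z_obs, 2))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here z_obs = 1.8 falls inside the critical values ±1.96, and equivalently the p-value exceeds α, so the decision is to fail to reject the null.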
LINEAR REGRESSION

Simple Regression: One numerical dependent, response or y variable; one independent, causal or x variable, numerical or dummy (indicator).
  Model: y = β₀ + β₁x + ε
  Estimated equation: ŷ = b₀ + b₁x

Multiple Regression: One numerical dependent, response or y variable; multiple independent, causal or x variables, numerical or dummy (indicator).
  Model: y = β₀ + β₁x₁ + β₂x₂ + . . . + βₖxₖ + ε
  Estimated equation: ŷ = b₀ + b₁x₁ + b₂x₂ + . . . + bₖxₖ

Output Map
  Regression Statistics: Multiple R, R Square, Adjusted R Square, Observations
  ANOVA: df, SS for Regression, Residual, Total
  Coefficients Table: Coefficient, Standard Error, t stat, P-value, Lower & Upper (1 − α)%
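The estimated equation's coefficients come from the least squares formulas b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄. A standard-library sketch with made-up data (the points are illustrative, generated to lie near a line with slope 2):

```python
# Hypothetical data lying close to y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Sums of squared/cross deviations
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar)**2 for x in xs)

b1 = sxy / sxx          # slope estimate
b0 = ybar - b1 * xbar   # intercept estimate
print(round(b1, 3), round(b0, 3))  # close to slope 2, intercept 0
```

Spreadsheet regression output reports these same b₀ and b₁ in the coefficients table, alongside their standard errors and t stats.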
LINEAR REGRESSION

Hypothesis Tests
  Default hypotheses: H0: βᵢ = 0 vs. H1: βᵢ ≠ 0
  The reported P-value is for the 2-tailed test.
Confidence Intervals: The coefficient is the point estimate.

Regression Diagnostics
  Check and clean the data:
    Analyze and graph each variable separately.
    Check for missing data.
    Check any values with a standard score more than |2.5| for accuracy.
    Eliminate all observations that are not complete.
    Retain all eliminated observations.
  Model Assumptions:
    A causal relationship exists between the Y variable and each X variable.
    The relationship between Y and each X is linear. Create scatter plots of each X variable against Y and look for a linear relationship.
    X variables are independent of one another. If they are not, one has multicollinearity: calculate the correlation matrix. Any correlation between independent variables of |r| > 0.8 means a potential problem.
  Error Assumptions:
    ε ~ N: Plot the residuals in a histogram or ogive; look for a normal histogram or S-shaped ogive.
    E(ε) = 0: Calculate descriptive statistics of the residuals and look for 0 mean and 0 sum.
    σ_ε is constant over the full range of the dependent variable, that is, homoscedastic. To check for heteroscedasticity, plot predicted Y on the x-axis and residuals or standardized residuals on the y-axis; look for a pattern across the x-axis.
  Influential Observations:
    Check standardized residuals for any outliers.
    Re-run the model without outliers and see how the regression results differ, if at all.
  Adjust the model? Retain or remove non-significant variables?
    Depends on purpose. If the purpose is to make predictions, non-significant variables with |t stat| < 1 MAY be eliminated from the model and the model re-run.
    If any other purpose exists for the regression, no variables are ever removed regardless of statistical results.