
E370

1/13/2015
Flow Charts
Statistics
- Sampling Methods
  - Probability
    - Simple random
    - Pseudo random
    - Stratified
    - Systematic
    - Cluster
  - Non-probability
    - Convenience
    - Judgment
- Data Types
  - Qualitative or Categorical
    - Nominal
    - Ordinal
  - Quantitative or Numerical
    - Discrete
    - Continuous
- Graphical Types
  - Qualitative
    - Bar/column
    - Pie
    - Pareto diagram
  - Quantitative
    - Histogram
    - Frequency polygon
    - Ogive
    - Stem-and-leaf plot
- Descriptive Statistics
  - Qualitative
    - Mode (Nominal & Ordinal)
    - Median (Ordinal only)
  - Quantitative
    - Center: Mean, Median, Mode
    - Spread: Range, Variance, Standard Deviation, Coefficient of Variation
    - Shape: Skewness, Symmetric, Uni-modal, Bi-modal, etc.
Descriptive Statistics
- Single Variable Methods
  - Qualitative
    - Mode (Nominal & Ordinal)
    - Median (Ordinal only)
  - Quantitative
    - Center
    - Spread
    - Shape
    - Estimating Probabilities
      - Chebyshev's Rule
      - Empirical or Normal Rule
- Two Variable Methods
  - At least 1 qualitative variable
    - Contingency Tables
    - Probabilities
      - Simple or Marginal
      - Joint
      - Conditional
      - Independence Condition
  - 2 quantitative variables
    - Scatter plots
    - Covariance
    - Correlation
    - Least Squares Lines
Random Variables -- Discrete:

Generic Discrete: Described by a valid probability distribution, a list of outcomes and associated probabilities.
  E(X) = μ = Σ x·P(x)
  V(X) = σ² = Σ (x − μ)²·P(x)

Bernoulli: A single trial resulting in 1 of 2 mutually exclusive and collectively exhaustive outcomes. Parameter is π.
  E(X) = π    V(X) = π(1 − π)

Binomial: Repeated Bernoulli trials where X is the number of successes, π is constant over all trials, and each trial is independent of every other trial. Parameters are n and π.
  X ~ B(n, π)
  E(X) = nπ    V(X) = nπ(1 − π)
  If π < 0.5, right skewed; if π > 0.5, left skewed; if π = 0.5, symmetric.
  If nπ > 5 and n(1 − π) > 5, the distribution can be considered symmetric for certain purposes.
  =BINOM.DIST(x, n, π, 0/1)
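The binomial formulas above can be checked numerically. This is a minimal sketch in Python using only the standard library; the pmf plays the role of Excel's =BINOM.DIST(x, n, π, 0), and the n and π values are illustrative, not from the notes.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3  # hypothetical parameters
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# E(X) and V(X) computed directly from the distribution,
# which should match E(X) = n*pi and V(X) = n*pi*(1-pi)
mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean)**2 * pmf[x] for x in range(n + 1))

print(round(mean, 6))  # n*p     = 3.0
print(round(var, 6))   # n*p*(1-p) = 2.1
```

Summing the pmf over all outcomes also confirms it is a valid probability distribution (total probability 1).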
Random Variables -- Continuous

Generic Continuous: Described by a formula called a pdf, probability density function.
  E(X) = μ    V(X) = σ²

Uniform: Simplest continuous distribution, sometimes called a rectangular distribution, represented by a line (curve) parallel to the x-axis. The distance from the x-axis to the curve is the pdf. Parameters are a and b.
  X ~ U(a, b)
  μ = E(X) = (a + b)/2    σ = SD(X) = (b − a)/√12
  P(c < x < d) = (d − c)/(b − a)

Normal Family: One of the most important continuous distributions; an infinite number of normal distributions, each defined by its parameters, μ and σ; a bell-shaped and symmetric distribution.
  X ~ N(μ, σ)
  E(X) = μ    SD(X) = σ
  =NORM.DIST(x, μ, σ, 1)    =NORM.INV(probability, μ, σ)

The Standard Normal: Transforms any normal into the number of standard deviations its values are away from its mean.
  Z = (X − μ)/σ    Z ~ N(0, 1)
  =NORM.S.DIST(Z, 1)    =NORM.S.INV(probability)
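The uniform and normal probability calculations above can be reproduced in Python. This is a sketch using only the standard library: the normal CDF is built from math.erf, standing in for Excel's =NORM.DIST(x, μ, σ, 1); the example numbers are illustrative.

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Equivalent of Excel =NORM.DIST(x, mu, sigma, 1)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def uniform_prob(c, d, a, b):
    """P(c < X < d) for X ~ U(a, b): (d - c) / (b - a)."""
    return (d - c) / (b - a)

# Uniform example: X ~ U(0, 10)
print(uniform_prob(2, 5, 0, 10))   # 0.3

# Standard normal example: area within one sigma of the mean
p = norm_cdf(1, 0, 1) - norm_cdf(-1, 0, 1)
print(round(p, 4))                 # 0.6827, the Empirical Rule's 68%
```

The second print recovers the Empirical (Normal) Rule value referenced earlier in these notes.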
The Normal Family of Distributions and its Uses

Normal Family: One of the most important continuous distributions; an infinite number of normal distributions, each defined by its parameters, μ and σ; a bell-shaped and symmetric distribution.
  X ~ N(μ, σ)    E(X) = μ    SD(X) = σ
  =NORM.DIST(x, μ, σ, 1)    =NORM.INV(probability, μ, σ)

Using the Normal for Approximations: Approximate the Binomial Using the Normal
  If the binomial is sufficiently symmetric [(n·π) > 5 AND n·(1 − π) > 5], the normal can provide a reasonable approximation of binomial probabilities.
  The use of the Continuity Correction Factor (CCF = +/− 0.5) is essential for accurate estimates. Its usefulness diminishes as n increases, although there is no rule for when it can be ignored.
  Use the expected values of the specific binomial for the parameters in the NORM.DIST or NORM.INV commands.

The Standard Normal: Transforms any normal into the number of standard deviations its values are away from its mean.
  Z = (X − μ)/σ    Z ~ N(0, 1)    =NORM.S.DIST(Z, 1)    =NORM.S.INV(probability)
  Use by itself to compare relative locations in two different normal distributions. Use in conjunction with other statistics for confidence intervals and hypothesis testing.

The Student's t: A family of approximations of the Standard Normal; a one-parameter distribution, each member described completely by its degrees of freedom (df). Centered at 0; units are approximate standard deviations.
  X ~ t(df)    t = (x̄ − μ)/(s/√n)    =T.DIST(t, df, 1)    =T.INV(probability, df)
  Use when the population standard deviation, σ, is unknown, in conjunction with other statistics for confidence intervals and hypothesis testing.
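The normal approximation with the continuity correction can be compared against the exact binomial. A standard-library sketch (the n and π shown are hypothetical, chosen so both n·π and n·(1 − π) exceed 5):

```python
from math import erf, sqrt, comb

def norm_cdf(x, mu, sigma):
    """Equivalent of Excel =NORM.DIST(x, mu, sigma, 1)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 40, 0.4                      # n*p = 16 and n*(1-p) = 24, both > 5
mu, sigma = n * p, sqrt(n * p * (1 - p))  # binomial expected values feed the normal

exact = binom_cdf(15, n, p)               # exact P(X <= 15)
approx = norm_cdf(15 + 0.5, mu, sigma)    # normal with CCF of +0.5
print(round(exact, 4), round(approx, 4))
```

Note the approximation uses the binomial's own mean nπ and standard deviation √(nπ(1 − π)) as the normal's parameters, exactly as the slide instructs for NORM.DIST.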


x̄, The Sample Mean
  x̄ ~ N(μ, σ/√n) if σ is known and if X~N or if n > 30.
  If σ is unknown and if X~N or if n > 30, use the Student's t with df = n − 1 and standard error s/√n.

p, The Sample Proportion
  p ~ N(π, √(π(1 − π)/n)) if (n·π) > 5 AND n·(1 − π) > 5.
Two Important Variables

Sample Mean, x̄, is the estimator for the population mean, μ.
- x̄ is the mean of some simple random sample of size n.
- There is an infinite number of simple random samples of size n that can be drawn from a population.
- There is an infinite number of x̄s, one for every simple random sample.
- Individual x̄s vary in value because each sample most likely contains different observations.
- x̄ is a random variable. Thus, x̄ has a distribution, that is, an expected value, a standard deviation, and a shape.
  x̄ ~ ?(μ, σ/√n)

The Central Limit Theorem: For any X ~ ?(μ, σ), x̄ ~ N(μ, σ/√n) if X~N or if n > 30.

Sample Proportion, p, is the estimator for the population proportion, π.
- p is the proportion of successes found in a fixed number of trials in a binomial, that is, p = X/n where X ~ B(n, π).
- p is the continuous variable version of the discrete binomial and can be any value between 0 and 1.
- Since X ~ B(n, π) is a random variable, so is p a random variable; it is a linear transformation of X. Thus p has a distribution, that is, an expected value, a standard deviation, and a shape.
  p = X/n ~ N(π, √(π(1 − π)/n)) if X ~ B(n, π) is sufficiently symmetric.
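The claims that E(x̄) = μ and SD(x̄) = σ/√n can be verified exactly for a tiny population by enumerating every possible sample. A standard-library sketch (the four-value population is purely illustrative):

```python
from itertools import product

# Hypothetical population of 4 values
population = [1, 2, 3, 4]
N = len(population)
mu = sum(population) / N                          # population mean: 2.5
var = sum((x - mu)**2 for x in population) / N    # population variance: 1.25

n = 2  # sample size; enumerate all samples drawn with replacement
means = [sum(s) / n for s in product(population, repeat=n)]

e_xbar = sum(means) / len(means)                          # E(x-bar)
v_xbar = sum((m - e_xbar)**2 for m in means) / len(means) # V(x-bar)

print(e_xbar)  # 2.5   -> equals mu
print(v_xbar)  # 0.625 -> equals var / n
```

Every one of the 16 possible samples contributes one x̄, and the distribution of those x̄s has exactly the mean μ and variance σ²/n the slide states.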
Discrete w/ Probability Distribution: Single variable with probability distribution; E(X), V(X)

Linear Combinations: Multiple variables with known expected values; E(aX + bY), V(aX + bY)

Bernoulli: Single trial resulting in one of two mutually exclusive and collectively exhaustive outcomes.
  One parameter, π    E(X) = π    V(X) = π(1 − π)

Binomial: X ~ B(n, π). Repeated, independent Bernoulli trials with constant probability of success.
  Parameters n, π    E(X) = n·π    V(X) = n·π·(1 − π)

Uniform: X ~ U(a, b). Simplest continuous distribution; pdf is horizontal, parallel to the X axis.
  Parameters a, b    μ = (a + b)/2    σ² = (b − a)²/12    P(c < x < d) = (d − c)/(b − a)

Normal: X ~ N(μ, σ). Bell-shaped and symmetric. Parameters μ, σ.

Utility Distributions: Standard Normal = Z ~ N(0, 1); Student's t: t(df)

Sampling Distributions:
  Sample mean: x̄ ~ N(μ, σ/√n) if X~N or if n > 30.
  Sample proportion: p ~ N(π, √(π(1 − π)/n)) if X is a binomial, and n·π > 5 AND n·(1 − π) > 5.

Confidence Interval Estimation

Population Mean, σ known:    x̄ ± z_{α/2} · σ/√n
Population Mean, σ unknown:  x̄ ± t_{α/2} · s/√n
Population Proportion, n·p > 5, n·(1 − p) > 5:  p ± z_{α/2} · √(p(1 − p)/n)
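The σ-known interval above can be sketched in a few lines of Python. The sample summary numbers are hypothetical, and 1.96 is the familiar z value for 95% confidence (Excel's =NORM.S.INV(0.975)):

```python
from math import sqrt

# Hypothetical sample summary (illustrative numbers, not from the notes)
xbar, sigma, n = 72.5, 8.0, 64
z = 1.96  # z_{alpha/2} for 95% confidence

half_width = z * sigma / sqrt(n)
lo, hi = xbar - half_width, xbar + half_width
print(round(lo, 2), round(hi, 2))  # 70.54 74.46
```

Swapping z for the appropriate t_{α/2} value (and σ for s) gives the σ-unknown interval with the same structure.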

Hypothesis Testing

Hypotheses (Parameter + sign + value)
  H0 (Null): possible signs ≤, =, ≥
  H1 (Alternative): possible signs >, ≠, <
Tests
  Simple null; two-tailed test: H0 uses = & H1 uses ≠
  Composite null; one-tailed test:
    Right-tailed test: H0 uses ≤ & H1 uses >
    Left-tailed test: H0 uses ≥ & H1 uses <

Level of significance, alpha (α)
  Divide α in half for a 2-tailed test.
  Be sure it is in the correct tail for a 1-tailed test.
Critical Value(s): Z_CRIT or t_CRIT; 2 for a 2-tailed test, 1 for a 1-tailed test
Test Statistic: Z_OBS or t_OBS
Rejection region: Values on the number line outside the critical values
Non-rejection region: Values on the number line near the null, including the critical values
Decision Rule: A statement of the test result for every possible value of the test statistic
p-value: The area under the curve beyond the test statistic in the direction of the alternative
The Test: Reject the null and conclude the alternative, or fail to reject the null
Type I error: Rejecting a TRUE null; probability = α
Type II error: Failing to reject a FALSE null; probability = β
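A two-tailed z-test ties these pieces together. This sketch uses only the standard library; the hypothesized mean, sample summary, and α are all hypothetical:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, equivalent of Excel =NORM.S.DIST(z, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical test: H0: mu = 50 vs H1: mu != 50, with sigma known
mu0, sigma = 50, 10
xbar, n = 53, 36
alpha = 0.05  # two-tailed, so alpha/2 = 0.025 in each tail

z_obs = (xbar - mu0) / (sigma / sqrt(n))   # test statistic: 1.8
p_value = 2 * (1 - norm_cdf(abs(z_obs)))   # area beyond z_obs in both tails

print(round(z_obs, 2))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here z_obs = 1.8 falls inside the critical values ±1.96, and equivalently the p-value exceeds α, so the decision is to fail to reject the null.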
LINEAR REGRESSION

Simple Regression: One numerical dependent, response or y variable; one independent, causal or x variable, numerical or dummy (indicator).
  Model: y = β₀ + β₁x + ε
  Estimated equation: ŷ = b₀ + b₁x

Multiple Regression: One numerical dependent, response or y variable; multiple independent, causal or x variables, numerical or dummy (indicator).
  Model: y = β₀ + β₁x₁ + β₂x₂ + . . . + βₖxₖ + ε
  Estimated equation: ŷ = b₀ + b₁x₁ + b₂x₂ + . . . + bₖxₖ

Output Map
  Regression Statistics: Multiple R, R Square, Adjusted R Square, Observations
  ANOVA: df, SS for Regression, Residual, Total
  Coefficients Table: Coefficient, Standard Error, t stat, P-value, Lower & Upper (1 − α)%
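The estimated equation's coefficients come from the least squares formulas b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄. A standard-library sketch with made-up data (the points are illustrative, generated to lie near a line with slope 2):

```python
# Hypothetical data lying close to y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Sums of squared/cross deviations
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar)**2 for x in xs)

b1 = sxy / sxx          # slope estimate
b0 = ybar - b1 * xbar   # intercept estimate
print(round(b1, 3), round(b0, 3))  # close to slope 2, intercept 0
```

Spreadsheet regression output reports these same b₀ and b₁ in the coefficients table, alongside their standard errors and t stats.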
LINEAR REGRESSION

Hypothesis Tests
  Default hypotheses: H0: βᵢ = 0 vs. H1: βᵢ ≠ 0
  The reported P-value is for the 2-tailed test.
Confidence Intervals: The coefficient is the point estimate.

Regression Diagnostics
  Check and clean the data:
    Analyze and graph each variable separately.
    Check for missing data.
    Check any values with a standard score more than |2.5| for accuracy.
    Eliminate all observations that are not complete.
    Retain all eliminated observations.
  Model Assumptions:
    A causal relationship exists between the Y variable and each X variable.
    The relationship between Y and each X is linear. Create scatter plots of each X variable against Y and look for a linear relationship.
    X variables are independent of one another. If they are not, one has multicollinearity: calculate the correlation matrix. Any correlation between independent variables of |r| > 0.8 means a potential problem.
  Error Assumptions:
    ε ~ N: Plot the residuals in a histogram or ogive; look for a normal histogram or S-shaped ogive.
    E(ε) = 0: Calculate descriptive statistics of the residuals and look for 0 mean and 0 sum.
    σ_ε is constant over the full range of the dependent variable, that is, homoscedastic. To check for heteroscedasticity, plot predicted Y on the x-axis and residuals or standardized residuals on the y-axis; look for a pattern across the x-axis.
  Influential Observations:
    Check standardized residuals for any outliers.
    Re-run the model without outliers and see how the regression results differ, if at all.
  Adjust the model? Retain or remove non-significant variables?
    Depends on purpose. If the purpose is to make predictions, non-significant variables with |t stat| < 1 MAY be eliminated from the model and the model re-run.
    If any other purpose exists for the regression, no variables are ever removed regardless of statistical results.