1/13/2015
Flow Charts
Statistics
Sampling Methods
Simple Random
Pseudo Random: Stratified, Systematic, Cluster
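As a rough illustration of how three of these sampling methods differ in practice, here is a minimal Python sketch. The function names, the toy population of 100 numbered units, and the two strata are assumptions made for the example; they are not part of the flow chart.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Simple random: every subset of size n is equally likely."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n, seed=0):
    """Systematic: random start, then every k-th unit."""
    k = len(population) // n                      # sampling interval
    start = random.Random(seed).randrange(k)      # random start in the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Stratified: a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population and strata, for illustration only.
population = list(range(1, 101))
strata = {"low": population[:50], "high": population[50:]}

srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, 10)
strat = stratified_sample(strata, 0.10)
```

The systematic sample is evenly spaced by construction, while the stratified sample guarantees that each stratum is represented in proportion to its size.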
Data Types
Qualitative or Categorical: Nominal, Ordinal
Quantitative or Numerical: Discrete, Continuous
Graphical Types
Qualitative: Bar/column, Pie, Pareto Diagram
Quantitative: Histogram, Frequency Polygon, Ogive, Stem-n-leaf Plot
Non probability: Convenience, Judgment
Quantitative: Center, Spread, Shape
Estimating Probabilities: Chebyshev's Rule, Empirical or Normal Rule
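The two estimating rules can be compared numerically. The sketch below assumes the usual statements of each rule: Chebyshev's Rule guarantees at least 1 − 1/k² of any distribution lies within k standard deviations of the mean, and the Empirical Rule reports the share of a normal distribution in the same range. The function names are illustrative.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k):
    """Minimum share of ANY distribution within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1 - 1 / k**2

def normal_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    z = NormalDist()                  # standard normal: mean 0, sd 1
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 4), round(normal_within(k), 4))
```

Chebyshev's bound is always the more conservative of the two because it makes no assumption about shape.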
=BINOM.DIST(x, n, π, 0/1)
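For readers working outside Excel, the following is a minimal Python sketch of what the BINOM.DIST call computes, assuming its last argument selects the exact probability (0) or the cumulative probability (1). The function name binom_dist and its argument names are hypothetical, chosen to mirror the Excel call.

```python
from math import comb

def binom_dist(x, n, pi, cumulative):
    """Binomial probability: P(X = x) if cumulative is 0, P(X <= x) if 1."""
    def pmf(k):
        # Number of ways to place k successes times the probability of each way.
        return comb(n, k) * pi**k * (1 - pi)**(n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

exact_two_heads = binom_dist(2, 4, 0.5, 0)     # P(X = 2) in 4 fair coin flips
at_most_two = binom_dist(2, 4, 0.5, 1)         # P(X <= 2)
```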
Random Variables: Continuous
Generic Continuous: Described by a formula called a pdf, probability density function.
μ = E(X) = ∫ x f(x) dx, integrated from −∞ to +∞
σ² = V(X) = ∫ (x − μ)² f(x) dx, integrated from −∞ to +∞
Uniform: X~U(a, b). Simplest continuous distribution, sometimes called a rectangular distribution, represented by a line (curve) parallel to the x-axis. The distance from the x-axis to the curve is the pdf. Parameters are a and b.
f(x) = 1/(b − a)
P(c < x < d) = (d − c)/(b − a)
μ = (a + b)/2
σ² = (b − a)²/12
Normal Family: X~N(μ, σ). One of the most important continuous distributions; an infinite number of normal distributions, each defined by its parameters, μ and σ; a bell-shaped and symmetric distribution.
=NORM.DIST(x, μ, σ, 1)    =NORM.INV(probability, μ, σ)
The Standard Normal: Z~N(0, 1). Transforms any normal into the number of standard deviations its values are away from its mean.
Z = (X − μ)/σ
=NORM.S.DIST(Z,1)    =NORM.S.INV(probability)
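The uniform and normal calculations above can be sketched in Python with the standard library. The helper names and the example parameters (a normal with mean 100 and standard deviation 15, a uniform on 0 to 10) are assumptions for illustration; NormalDist.cdf and NormalDist.inv_cdf play the roles of NORM.DIST (cumulative) and NORM.INV.

```python
from statistics import NormalDist

def uniform_prob(c, d, a, b):
    """P(c < X < d) for X~U(a, b): width of the overlap divided by (b - a)."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)

def uniform_mean_sd(a, b):
    """Mean (a + b)/2 and standard deviation (b - a)/sqrt(12) of X~U(a, b)."""
    return (a + b) / 2, (b - a) / 12 ** 0.5

X = NormalDist(mu=100, sigma=15)       # hypothetical X~N(100, 15)
p_below_115 = X.cdf(115)               # like =NORM.DIST(115, 100, 15, 1)
x_at_975 = X.inv_cdf(0.975)            # like =NORM.INV(0.975, 100, 15)
z_score = (115 - X.mean) / X.stdev     # standard score: Z = (X - mu)/sigma
p_from_z = NormalDist().cdf(z_score)   # like =NORM.S.DIST(Z, 1)
```

Standardizing first and then using the standard normal gives the same probability as working with the original normal directly.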
The Normal Family of Distributions and its Uses
Using the Normal for Approximations: Approximate the Binomial Using the Normal
** If the binomial is sufficiently symmetric [(n*π)>5 AND n*(1-π)>5], the normal can provide a reasonable approximation of binomial probabilities.
** The use of the Continuity Correction Factor (CCF = +/- 0.5) is essential for accurate estimates.
** The CCF's usefulness diminishes as n increases, although there is no rule for when it can be ignored.
** Use the expected values of the specific binomial, μ = n*π and σ = √(n*π*(1-π)), for the parameters in the NORM.DIST or NORM.INV commands.
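To see what the continuity correction buys, the sketch below compares an exact binomial probability with its normal approximation, with and without the correction. The example values (n = 50, π = 0.4, x = 18) and the function names are assumptions chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(x, n, pi):
    """Exact P(X <= x) for X~B(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(x + 1))

def normal_approx_cdf(x, n, pi, ccf=0.5):
    """Normal approximation to P(X <= x), shifting x by the continuity correction."""
    if not (n * pi > 5 and n * (1 - pi) > 5):
        raise ValueError("binomial is not sufficiently symmetric")
    mu = n * pi                          # expected value of the binomial
    sigma = sqrt(n * pi * (1 - pi))      # standard deviation of the binomial
    return NormalDist(mu, sigma).cdf(x + ccf)

n, pi, x = 50, 0.4, 18
exact = binom_cdf(x, n, pi)
with_ccf = normal_approx_cdf(x, n, pi)             # uses x + 0.5
without_ccf = normal_approx_cdf(x, n, pi, ccf=0.0)
print(round(exact, 4), round(with_ccf, 4), round(without_ccf, 4))
```

For a "less than or equal to" probability the correction adds 0.5; for "greater than or equal to" it would subtract 0.5, which is why the CCF is written as +/- 0.5.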
The Standard Normal: Z~N(0, 1) ** Z = (X − μ)/σ ** =NORM.S.DIST(Z,1) ** =NORM.S.INV(probability)
Transforms any normal into the number of standard deviations its values are away from its mean.
The Student's t: X~t(df) ** t = (x̄ − μ)/(s/√n) ** =T.DIST(t,df,1) ** =T.INV(probability,df)
A family of approximations of the Standard Normal; a one-parameter distribution, each member described completely by its degrees of freedom (df). Centered at 0; units are approximate standard deviations. Use when the population standard deviation, σ, is unknown, in conjunction with other statistics for confidence intervals and hypothesis testing.
x̄, The Sample Mean: x̄ ~ N(μ, σ/√n) if σ is known and if X~N or if n > 30. If σ is unknown and if X~N or if n > 30, x̄ ~ t(μ, s/√n).
p, The Sample Proportion: p ~ N(π, √(π(1-π)/n)) if (n*π)>5 AND n*(1-π)>5.
Two Important Variables
Sample Mean, x̄, is the estimator for the population mean, μ.
x̄ is the mean of some simple random sample of size n.
There is an infinite number of simple random samples of size n that can be drawn from a population.
There is an infinite number of x̄s, one for every simple random sample.
Individual x̄s vary in value because each sample most likely contains different observations.
x̄ is a random variable. Thus, x̄ has a distribution, that is, an expected value, a standard deviation, and a shape.
The Central Limit Theorem: For any X~?(μ, σ), x̄ ~ N(μ, σ/√n) if X~N or if n > 30.
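A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a deliberately skewed population (exponential with mean 1 and standard deviation 1), a sample size of 36, and 5,000 repeated samples; all of these choices, and the fixed seed, are illustrative assumptions.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)          # fixed seed so the run is repeatable
n, draws = 36, 5000              # sample size and number of repeated samples

# Each entry is the mean of one simple random sample of size n
# drawn from an exponential population with mu = 1 and sigma = 1.
sample_means = [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(draws)]

center = mean(sample_means)      # should be close to mu = 1
spread = stdev(sample_means)     # should be close to sigma / sqrt(n)
expected_spread = 1.0 / n ** 0.5
print(round(center, 3), round(spread, 3), round(expected_spread, 3))
```

Even though individual observations are far from normal, the sample means cluster around the population mean with a standard deviation near σ/√n.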
Sample Proportion, p, is the estimator for the population proportion, π.
Sample p is the proportion of successes found in a fixed number of trials in a binomial, that is, p = X/n where X~B(n, π).
p is treated as the continuous variable version of the discrete binomial and can be any value between 0 and 1.
Since X~B(n, π) is a random variable, so is p; it is a linear transformation of X. Thus p has a distribution, that is, an expected value, a standard deviation, and a shape.
p = X/n ~ N(π, √(π(1-π)/n)) if X~B(n, π) is sufficiently symmetric.
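As a worked example of the sample proportion's distribution, the sketch below builds the approximating normal and uses it to find a probability. The sample size of 200, the population proportion of 0.3, and the function name are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sample_proportion_dist(n, pi):
    """Approximate distribution of p = X/n when X~B(n, pi) is sufficiently symmetric."""
    if not (n * pi > 5 and n * (1 - pi) > 5):
        raise ValueError("binomial is not sufficiently symmetric for the normal")
    standard_error = sqrt(pi * (1 - pi) / n)
    return NormalDist(mu=pi, sigma=standard_error)

dist = sample_proportion_dist(200, 0.3)
prob_below_035 = dist.cdf(0.35)      # P(p < 0.35)
print(round(dist.stdev, 4), round(prob_below_035, 4))
```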
Discrete w/ Probability Distribution    Single variable with probability distribution; E(X), V(X)
Linear Combinations    Multiple variables with known expected values; E(aX + bY), V(aX + bY)
Bernoulli    Single trial resulting in one of two mutually exclusive and collectively exhaustive outcomes.
Binomial: X~B(n, π)    Repeated, independent Bernoulli trials with constant probability of success.
Student's t: t(df)
Sample proportion    p ~ N(π, √(π(1-π)/n)) if X is a binomial, and n*π > 5 AND n*(1-π) > 5
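The first two rows can be illustrated with a short calculation. The sketch below assumes the standard formulas E(aX + bY) = aE(X) + bE(Y) and V(aX + bY) = a²V(X) + b²V(Y) + 2ab·Cov(X, Y); the small probability table and the function names are made up for the example.

```python
def expected_value(dist):
    """E(X) for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X): probability-weighted squared deviations from E(X)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def linear_combination(a, ex, vx, b, ey, vy, cov=0.0):
    """Mean and variance of aX + bY; cov defaults to 0 (independent X and Y)."""
    mean = a * ex + b * ey
    var = a**2 * vx + b**2 * vy + 2 * a * b * cov
    return mean, var

X = {0: 0.2, 1: 0.5, 2: 0.3}          # hypothetical probability distribution
ex, vx = expected_value(X), variance(X)
combo_mean, combo_var = linear_combination(2, ex, vx, 3, 2.0, 1.0)
```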
Population Mean, unknown /
/
Population Mean, unknown
( )
Population Proportion, n*π>5, n*(1-π) > 5 /
Output Map
Multiple R
R Square
Adjusted R Square
Observations
ANOVA (df, SS): Regression, Residual, Total
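To show where the items on the output map come from, here is a minimal sketch for a regression with a single X variable, computed from first principles. The five data points and the function name are assumptions for illustration; a spreadsheet regression tool reports the same quantities for the same data.

```python
from math import sqrt

def simple_regression_summary(x, y):
    """Least-squares fit of y on one x, returning the output-map quantities."""
    n, k = len(x), 1                                   # k = number of X variables
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                                     # slope
    b0 = my - b1 * mx                                  # intercept
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_resid
    r2 = ss_reg / ss_total
    return {
        "Multiple R": sqrt(r2),
        "R Square": r2,
        "Adjusted R Square": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "Observations": n,
        "df": {"Regression": k, "Residual": n - k - 1, "Total": n - 1},
        "SS": {"Regression": ss_reg, "Residual": ss_resid, "Total": ss_total},
    }

summary = simple_regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

The ANOVA rows always add up: the Regression and Residual entries sum to the Total, for both df and SS.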
LINEAR REGRESSION
Hypothesis Tests
Default hypotheses
H0: βi = 0
H1: βi ≠ 0
Regression Diagnostics
Check and clean the data
Analyze and graph each variable separately.
Check for missing data.
Check any value whose standard score exceeds 2.5 in absolute value for accuracy.
Eliminate all observations that are not complete.
Retain all eliminated observations.
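The data-cleaning steps above can be sketched as two small helpers: one flags values whose standard score is beyond 2.5 in absolute value, the other separates complete observations from incomplete ones while keeping the removed rows. The function names, the use of None to mark a missing value, and the sample data are all assumptions for illustration.

```python
from statistics import mean, stdev

def flag_for_review(values, threshold=2.5):
    """Return (index, value, standard score) for values beyond the threshold."""
    m, s = mean(values), stdev(values)
    scored = [(i, v, (v - m) / s) for i, v in enumerate(values)]
    return [item for item in scored if abs(item[2]) > threshold]

def split_complete(rows):
    """Separate complete observations from those with any missing (None) field."""
    complete = [r for r in rows if None not in r.values()]
    removed = [r for r in rows if None in r.values()]   # retained, not discarded
    return complete, removed

# Hypothetical data: 19 ordinary readings and one suspicious entry.
values = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
          11, 9, 10, 12, 10, 9, 11, 10, 10, 40]
flagged = flag_for_review(values)

rows = [{"y": 5.0, "x1": 1.0}, {"y": None, "x1": 2.0}, {"y": 7.0, "x1": 3.0}]
complete, removed = split_complete(rows)
```

A flagged value is a prompt to check the entry for accuracy, not an automatic deletion.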
Model Assumptions
Causal relationship exists between the Y variable and each X variable.
The relationship between Y and each X is linear. Create scatter plots of each X variable against Y and look for a linear relationship.
X variables are independent of one another. If they are not, there is multicollinearity: calculate the correlation matrix. Any correlation between independent variables with |r| > 0.8 signals a potential problem.
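The multicollinearity check can be sketched as a pairwise correlation scan over the X variables. The helper names and the three small predictor columns are hypothetical; the 0.8 cutoff is the one stated above.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def multicollinearity_pairs(predictors, cutoff=0.8):
    """List pairs of X variables whose |r| exceeds the cutoff."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(predictors.items(), 2):
        r = pearson_r(col1, col2)
        if abs(r) > cutoff:
            flagged.append((name1, name2, r))
    return flagged

# Hypothetical X variables: x2 is nearly a multiple of x1, x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "x3": [5, 1, 4, 2, 6, 3],
}
problem_pairs = multicollinearity_pairs(predictors)
```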