Applied Regression Analysis

Understand the hypothesized relationships between variables
Explanatory and predictive
Trying to fit a line to data
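A minimal sketch of what "fitting a line" means for one predictor, using ordinary least squares and only the Python standard library. The function name fit_line and the numbers are made up for illustration; they are not from the lecture.

```python
# Ordinary least squares for a single predictor.
# The data are invented purely to show the mechanics.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The slope is the unstandardized B for this one-variable case: the expected change in Y for a 1-unit change in X.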

Gaussian distribution: the data are assumed to be normal

No multicollinearity: the independent variables should not be highly correlated with each other

Errors are assumed to be normally distributed, because the model is based on point estimates

Homoscedasticity: the errors should have the same variance across everything you're
observing. Unequal error variances (heteroscedasticity) violate this assumption.
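One rough way to eyeball this, sketched under stated assumptions: fit a line, sort the residuals by X, and compare the residual variance in the upper half with the lower half. This is an informal check, not a formal test, and the helper names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def spread_ratio(xs, ys):
    """Residual variance in the upper half of x divided by that in the lower half."""
    slope, intercept = fit_line(xs, ys)
    # Residual = actual value minus the value predicted by the line; sort by x.
    by_x = sorted((x, y - (intercept + slope * x)) for x, y in zip(xs, ys))
    half = len(by_x) // 2
    low = [res for _, res in by_x[:half]]
    high = [res for _, res in by_x[-half:]]
    return sample_variance(high) / sample_variance(low)

# Made-up data whose noise grows with x (a heteroscedastic pattern).
xs = list(range(1, 11))
noise = [0.1, -0.1, 0.2, -0.2, 0.3, -1.0, 1.5, -2.0, 2.5, -3.0]
ys = [2 * x + e for x, e in zip(xs, noise)]
ratio = spread_ratio(xs, ys)
print(f"high/low residual variance ratio: {ratio:.1f}")
```

A ratio near 1 is consistent with equal variances; a ratio far from 1, as here, suggests the spread of the errors changes with X.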

You can still violate these assumptions and build a model

When should it be used?

When it should not be used
How to diagnose common data problems
How to use dummy variables
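As an illustration of dummy variables: a nominal variable with k categories is usually entered as k - 1 columns of 0/1, with one category left out as the reference. The sketch below assumes a hypothetical "region" column; dummy_code is an illustrative helper, not a library function.

```python
def dummy_code(values, reference):
    """Turn a nominal column into 0/1 columns, one per category except the reference."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical nominal variable with three categories.
region = ["north", "south", "east", "south", "north"]
dummies = dummy_code(region, reference="north")

for name, column in dummies.items():
    print(name, column)
```

Each dummy coefficient is then read as the difference from the reference category ("north" here), which is why one category has to be left out.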
Y variable: the criterion variable, also called the dependent variable.
Correlation is a good place to start to figure out something about the data
If the variables are correlated, the hypothesized relationship has something to it.
R-squared value and t-test
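A small sketch of how these three quantities relate when there is a single predictor: R-squared is the square of the Pearson correlation, and the t statistic for the slope can be computed from r and the sample size. The data and function names are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope_t_stat(r, n):
    """t statistic for 'no linear relationship', with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
r = pearson_r(xs, ys)
r_squared = r ** 2          # share of the variance in y explained by x
t = slope_t_stat(r, len(xs))
print(f"r = {r:.3f}, R-squared = {r_squared:.3f}, t = {t:.2f}")
```

With several predictors, R-squared comes from the full model rather than from a single correlation, so this shortcut only applies to the one-predictor case.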

Model summary provides most of the insights

Data that appears to be coded strangely

Know your data

Know the source
Is something missing
Distribution of variables: are the samples representative?
80% of the work is data prep, knowing the data really well
What is the relationship between key economic variables and female life expectancy?
If data is missing there are ways to get around it, for example by excluding the incomplete records
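Excluding records can be sketched as follows: keep only the rows where every variable used in the model is present. The field names and values below are invented for illustration and are not the class dataset.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"country": "A", "gdp": 1200.0, "life_exp": 61.0},
    {"country": "B", "gdp": None, "life_exp": 70.5},
    {"country": "C", "gdp": 8300.0, "life_exp": 76.2},
    {"country": "D", "gdp": 450.0, "life_exp": None},
]

def drop_incomplete(rows, fields):
    """Keep only rows in which every listed field has a value."""
    return [row for row in rows if all(row.get(f) is not None for f in fields)]

complete = drop_incomplete(records, ["gdp", "life_exp"])
print([row["country"] for row in complete])
```

The cost is a smaller sample, so it is worth checking how many records are lost and whether the remaining ones are still representative.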
Q-Q plots and other normality checks tell us whether the data is normally distributed
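A Q-Q plot pairs each sorted observation with the value a normal distribution would be expected to produce at the same rank. A minimal sketch of the points behind such a plot, assuming the (i - 0.5) / n plotting position (one common choice among several) and a made-up sample:

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    n = len(sample)
    ordered = sorted(sample)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(sample), stdev(sample))
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

sample = [4.1, 5.0, 5.9, 4.6, 5.4, 5.2, 4.8]
points = qq_points(sample)
for expected, observed in points:
    print(f"expected {expected:.2f}   observed {observed:.2f}")
```

If the data are roughly normal, the pairs fall close to a straight line; systematic curvature suggests skew or heavy tails.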
Data transformation makes it harder to draw inferences
Linear regression assumes that the data are numeric: ordered and countable (interval or ratio level)
Ordinal variable: for example a likelihood (Likert-type) scale from 1 to 5
Nominal variable: name, gender, etc.
Interval data: 2 is greater than 1 and the gap between values is meaningful; on a ratio scale 2 is also twice as large as 1
The scatter plot with all the variables can tell you if something is linear or curvilinear
Graphs > Legacy Dialogs > Scatter > Matrix scatter
If the IVs (independent variables) are correlated with each other, that is bad
Transform the variables if they aren't normal
The most common type of transformation is a log transformation
You want to preserve the variable's inherent properties and its inherent outcome, but you want
to make the relationship linear
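A small illustration of why a log transformation helps, using invented numbers: when X grows multiplicatively while Y grows in equal steps, the raw relationship is curved, but log(X) against Y is a straight line. The ordering of the observations is preserved, only the spacing changes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation, used here as a measure of how linear the relationship is."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: x doubles each step while y rises by one unit each step.
x = [1, 2, 4, 8, 16, 32]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

r_raw = pearson_r(x, y)
r_log = pearson_r([math.log(v) for v in x], y)
print(f"correlation with raw x: {r_raw:.3f}")
print(f"correlation with log x: {r_log:.3f}")
```

The trade-off noted above still applies: after the transformation, the coefficient describes a change in log(X), which is harder to explain than a change in X itself.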
Collinearity option tells you about the Variance Inflation Factor
Throw all of the variables in at once: this is the default
Forward selection adds them in one by one
Backward selection starts with all of them and removes them one by one; step-wise selection enters variables one step at a time

Parsimony: build a parsimonious model, with no more variables than it needs

The F value: the higher it is, the better the model
The significance number should be as small as possible
The significance value (p-value) tells you how likely a result this strong would be if only chance were involved; the smaller it is, the more confident you are that it isn't chance
You want VIF to be less than 10
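For reference, the VIF of a predictor is 1 / (1 - R-squared), where R-squared comes from regressing that predictor on the other predictors. With only two predictors this reduces to the squared correlation between them, which is what the sketch below assumes. The data are made up, and the cutoff of 10 is the rule of thumb from these notes.

```python
def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    # Squared correlation between the two predictors.
    r_squared = (s12 * s12) / (s11 * s22)
    return 1.0 / (1.0 - r_squared)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]                 # moderately correlated with x1
x3 = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]     # almost a copy of x1

vif_moderate = vif_two_predictors(x1, x2)
vif_severe = vif_two_predictors(x1, x3)
print(f"x1 with x2: VIF = {vif_moderate:.2f}")
print(f"x1 with x3: VIF = {vif_severe:.1f}")
```

The second pair lands far above 10, which is the kind of value that signals the two variables are carrying nearly the same information.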
Beta coefficients: if it's positive, then the more phones in a household, the higher the life expectancy
You have to make sure the direction of the relationship makes sense
Sign flipping is a big problem; if the signs don't flip, there is no problem
Unstandardized B: the change in Y for a 1-unit change in the variable
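Interpretation is different when a variable is logged: a 1-unit change in the logged variable means the original variable is multiplied by a constant factor, not increased by 1 unit.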
Why would you standardize a variable? It becomes important when you have two variables on
different scales
Standardized Beta: a 1 standard deviation change in the variable changes Y by Beta standard deviations (1 stdev change * coefficient)
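The link between the two kinds of coefficient can be sketched as: standardized Beta = unstandardized B * (standard deviation of X / standard deviation of Y). The example below uses made-up numbers and hypothetical helper names, with a single predictor.

```python
from statistics import mean, stdev

def unstandardized_b(xs, ys):
    """Least-squares slope: change in y per 1-unit change in x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def standardized_beta(b, xs, ys):
    """Rescale b so it reads as 'standard deviations of y per standard deviation of x'."""
    return b * stdev(xs) / stdev(ys)

x = [1, 2, 3, 4, 5]
y = [10, 30, 20, 50, 40]

b = unstandardized_b(x, y)
beta = standardized_beta(b, x, y)
print(f"B = {b:.2f} (y units per x unit), Beta = {beta:.2f} (stdevs per stdev)")
```

Because Beta has no units, it can be compared across predictors measured on different scales, which B cannot.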
You need to find the difference between the point value (the prediction) and the actual value and look at
these residuals. Residuals are important
If the error terms are not normally distributed.
