Kamakshaiah Musunuru
kamakshaiah@dhruvacollege.net
Contents

0.1 What is Regression?
    0.1.1 Types
0.2 Methodology
    0.2.1 Goal
0.3 Models
    0.3.1 The linear Model
    0.3.2 The concept of small residuals
    0.3.3 How to find coefficients
0.4 How good is the fit
    0.4.1 R²
    0.4.2 Small r
    0.4.3 What is adjusted R²
0.1 What is Regression?
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or criterion variable) changes when any one of the independent variables is varied, while the other independent variables are held fixed. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. A few typical applications are:

1. Prediction
2. Forecasting
3. Machine learning
4. Exploring causal relationships
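As a minimal sketch of this idea (Python is used purely for illustration, and the data are invented), a fitted straight line estimates how the typical value of the dependent variable changes with the independent variable:

```python
# Invented data: independent variable x and dependent variable y
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 70]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares estimates of slope (b1) and intercept (b0)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(b0, b1)  # b1 is the typical change in y per unit change in x
```

Holding everything else fixed, each unit increase in x changes the typical value of y by b1.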
0.1.1 Types
There are many types of regression techniques, but broadly speaking it is possible to categorize them under two headings, i.e. statistical regression techniques and non-statistical regression techniques. The following are a few of the regression techniques used in the area of statistics.

Linear regression model: used when the relationship between two variables, known as (1) the response variable (dependent variable) and (2) the explanatory variable (independent variable), is linear.

Simple linear regression: one dependent variable with only one independent variable.

Logistic regression: sometimes known as logit regression; used for predicting a categorical dependent variable based on one or more predictor variables. The logistic function used is f(x) = eˣ/(eˣ + 1). An interesting property of this function is that if the regression function β0 + β1X is taken on the horizontal axis and the respective values of f on the vertical axis, the curve so generated is S-shaped and intersects the vertical axis at 0.5 above the origin.

Nonlinear regression: a technique used when the relationship is found to be non-linear or curvilinear, for example Y = aXᵇC. Such relations can be linearized by taking logarithms, e.g. ln Y = ln a + b ln X + ln C, and so on.

Nonparametric regression: the predictor does not take a predetermined form but is constructed according to information derived from the data.

Robust regression: used when the fit should not rest on the principle of least squares alone, i.e. when the data contain outliers to which ordinary least squares is highly sensitive.

Stepwise regression: a model-selection technique which chooses the predictive variables by (1) forward selection, (2) backward elimination, or (3) bidirectional elimination.
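The logistic function mentioned above can be sketched in a few lines of Python: it equals 0.5 at x = 0 and rises in an S-shape from 0 toward 1:

```python
import math

def logistic(x):
    """Logistic function f(x) = e^x / (e^x + 1)."""
    return math.exp(x) / (math.exp(x) + 1)

print(logistic(0))   # 0.5 -- the curve crosses the vertical axis at 0.5
print(logistic(-4))  # close to 0
print(logistic(4))   # close to 1
```

Because its output stays between 0 and 1, the function is a natural model for the probability of a categorical outcome.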
0.2 Methodology
Regression is typically used to test how well a model fits the data: if H0: β1 = 0 happens to be true, then x has no effect on y; otherwise x contributes to the model and the model is useful. Hence we often refer to regression as a Model Utility Test. In other words, the smaller the p-value, the more valuable the model, since H0 is rejected.
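As a sketch of the model utility test (Python rather than a statistical package; the data are invented for the example), the t statistic for the slope can be computed by hand:

```python
import math

# Invented data for testing H0: beta1 = 0 (the slope is zero)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx             # estimated slope
b0 = y_bar - b1 * x_bar    # estimated intercept

# Residual standard error with n - 2 degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

t = b1 / (s / math.sqrt(sxx))  # t statistic for H0: beta1 = 0
print(b1, t)                   # a large |t| means a small p-value
```

Here |t| is far above any usual critical value, so H0 would be rejected and the model judged useful.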
0.2.1 Goal
The ultimate goal of regression is to find the values of β0 and β1 that minimize the error.
0.3 Models
While executing regression in R, the user needs to keep in mind certain formats of expression to specify a model. The following are important model specifications in R.
Formula           Comments
y ~ x             With intercept and slope
y ~ x - 1         Without intercept
y ~ x + I(x^2)    Polynomial
y ~ A + B         First order model without interaction between A and B
y ~ A * B         A model containing first order interactions between A and B
y ~ (A + B)^n     A model including all first order effects and interactions up to the nth order
0.3.1 The linear Model
1. The population parameters β0 and β1 in Yi = β0 + β1xi + εi are unknown, and εi is the random error.
2. We compute estimates β̂0 and β̂1 of the population parameters; ŷi = β̂0 + β̂1xi are called fitted values, and êi is the residual.
3. êi = yi − ŷi = yi − (β̂0 + β̂1xi)
4. Residuals are used to check the statistical assumptions of the model.
5. A line that fits the data well has small residuals.
0.3.2 The concept of small residuals
1. We want residuals to be small in magnitude.
2. There might exist large negative residuals (henceforth called resids) as well as large positive resids for any given fit.
3. We cannot simply require Σêi = 0.
4. In fact, any line through the means of the variables, i.e. through the point (x̄, ȳ), satisfies Σêi = 0.
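Point 4 can be checked numerically (a Python sketch with invented data): every line through (x̄, ȳ) has residuals summing to zero, yet the sums of squared residuals differ wildly, which is why least squares minimizes Σêi² instead:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 10]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

def residuals(m):
    """Residuals of the line through (x_bar, y_bar) with slope m."""
    return [yi - (y_bar + m * (xi - x_bar)) for xi, yi in zip(x, y)]

for m in (0.0, 1.7, 100.0):  # three very different slopes
    e = residuals(m)
    print(m, sum(e), sum(ei ** 2 for ei in e))
# sum(e) is (numerically) zero for every slope,
# but the sum of squared residuals differs greatly
```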
0.3.3 How to find coefficients
During the process of minimizing the error, i.e. Σêi², we arrive at the constants, or coefficients. The points are given under:

1. SSE stands for Sum of Squared Errors.
2. SSE(β0, β1) = Σi (yi − β0 − β1xi)²
3. Thus we set the partial derivatives of SSE with respect to β0 and β1 equal to zero.
4. ∂SSE(β0, β1)/∂β0 = −2 Σi (yi − β0 − β1xi) = 0
5. ∂SSE(β0, β1)/∂β1 = −2 Σi xi (yi − β0 − β1xi) = 0
6. Hence, β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² and β̂0 = ȳ − β̂1x̄.
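The closed-form coefficients in point 6 can be verified numerically (a Python sketch with invented data): SSE evaluated at (β̂0, β̂1) is no larger than at any nearby pair of coefficients:

```python
x = [1, 2, 3, 4, 5]
y = [1.8, 4.3, 5.6, 8.1, 9.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx             # closed-form slope
b0 = y_bar - b1 * x_bar    # closed-form intercept

def sse(beta0, beta1):
    """Sum of squared errors for a candidate pair of coefficients."""
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))

best = sse(b0, b1)
for d0 in (-0.5, 0.0, 0.5):      # perturb the intercept...
    for d1 in (-0.2, 0.0, 0.2):  # ...and the slope
        assert sse(b0 + d0, b1 + d1) >= best
print(b0, b1, best)
```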
0.4 How good is the fit

0.4.1 R²
1. SSE when x is absent from the regression (the fit is just ȳ): TSS = Σêi² = Σ(yi − ȳ)²
2. SSE when x is present in the regression: SSE = Σêi² = Σ(yi − ŷi)²
3. Reg.SS = TSS − SSE
4. Then the proportional reduction in SSE due to LR (Linear Regression) is: R² = Reg.SS/TSS = 1 − SSE/TSS
5. Thus, R² is the proportion of variation in Y explained by the LR.
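The quantities above fit together as follows (Python sketch, invented data):

```python
x = [1, 2, 3, 4, 5]
y = [2.0, 4.5, 5.5, 8.5, 9.5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# TSS: squared errors when x is absent (the fit is just y_bar)
tss = sum((yi - y_bar) ** 2 for yi in y)
# SSE: squared errors when x is present in the regression
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

reg_ss = tss - sse
r2 = reg_ss / tss   # equivalently 1 - sse/tss
print(tss, sse, r2) # r2: proportion of variation in y explained
```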
0.4.2 Small r
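This heading refers to the sample correlation coefficient r; in simple linear regression, r² equals R². A minimal Python check (invented data):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2.0, 4.5, 5.5, 8.5, 9.5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)  # sample correlation coefficient

# R^2 from the regression of y on x (TSS here equals syy)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / syy

print(r, r2)  # r squared agrees with R^2
```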
0.4.3 What is adjusted R²

R²adj = 1 − [SSE/(n − p − 1)] / [TSS/(n − 1)]

where n is the number of observations and p is the number of predictors.
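A Python sketch of the formula (invented data, p = 1 predictor): adjusted R² penalizes the plain R² for the number of predictors used:

```python
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.8, 6.1, 8.2, 9.7, 12.3]

n, p = len(x), 1  # n observations, p predictors

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

tss = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - sse / tss
adj_r2 = 1 - (sse / (n - p - 1)) / (tss / (n - 1))

print(r2, adj_r2)  # adjusted R^2 never exceeds R^2
```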