
How To Do Regression In R?

Kamakshaiah Musunuru
kamakshaiah@dhruvacollege.net

November 16, 2013

Contents
0.1 What is Regression?
    0.1.1 Types
0.2 Methodology
    0.2.1 Goal
0.3 Models
    0.3.1 The linear Model
    0.3.2 The concept of small residuals
    0.3.3 How to find coefficients
0.4 How good is the fit
    0.4.1 R²
    0.4.2 small r
    0.4.3 What is adjusted R²

0.1 What is Regression?

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or criterion variable) changes when any one of the independent variables is varied while the other independent variables are held fixed. In regression analysis it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. A few typical applications are:

1. Prediction
2. Forecasting
3. Machine learning
4. Exploring causal relationships


0.1.1 Types

There are many types of regression techniques, but broadly speaking it is possible to categorize them under two headings: statistical regression techniques and non-statistical regression techniques. The following are a few of the regression techniques used in statistics.

Linear regression model: Used when the relationship is linear between two variables, known as (1) the response variable (dependent variable) and (2) the explanatory variable (independent variable).

Simple linear regression: One dependent variable with only one independent variable.

Logistic regression: Sometimes known as logit regression; used for predicting a categorical dependent variable from one or more predictor variables. The logistic function used is f(x) = e^x / (e^x + 1). An interesting property of this function is that if the regression function β₀ + β₁X is taken on the horizontal axis and f on the vertical axis, the curve generated is S-shaped (a sigmoid), crossing the vertical axis at f = 0.5.

Nonlinear regression: Used when the relationship is found to be non-linear or curvilinear, for example Y = aX^b·C. Such relations may be linearized by a log transformation, ln Y = ln a + b ln X + ln C, and so on.

Nonparametric regression: The predictor does not take a predetermined form but is constructed from information derived from the data.

Robust regression: Used when estimates should not rely solely on the principle of least squares, i.e. when the data contain outliers to which ordinary least squares is highly sensitive.

Stepwise regression: A model-selection technique which chooses predictive variables by (1) forward selection, (2) backward elimination, or (3) bidirectional elimination.
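As a minimal sketch of logistic regression in R, the following fits glm() with a binomial family on synthetic data (the variable names and the true coefficient value 2 are illustrative assumptions, not from the text):

```r
# Logistic regression sketch on synthetic data.
set.seed(42)
x <- rnorm(100)
# Assumed true model: P(y = 1) = e^(2x) / (e^(2x) + 1), i.e. the logistic function
p <- exp(2 * x) / (1 + exp(2 * x))
y <- rbinom(100, size = 1, prob = p)

fit <- glm(y ~ x, family = binomial)   # logit link is the default
coef(fit)                              # estimates on the log-odds scale

# Fitted probabilities always lie in (0, 1), as the sigmoid guarantees
probs <- predict(fit, type = "response")
range(probs)
```

The estimated slope should be positive and roughly near the assumed value of 2, and every fitted probability stays strictly between 0 and 1.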

0.2 Methodology

Regression is typically used to test how well the model fits the data: if H₀: β₁ = 0 happens to be true, then x has no effect on y and the model is not useful; if H₀ is rejected, then x contributes and the model is useful. Hence we often refer to this as the Model Utility Test. In other words, the smaller the p-value, the stronger the evidence against H₀ and the more valuable the model.
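A sketch of the Model Utility Test in R, using the built-in cars dataset (the dataset choice is an assumption for illustration):

```r
# Model utility test: is the slope zero?
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)
s$coefficients                       # the "Pr(>|t|)" column holds the p-values

p_slope <- s$coefficients["speed", "Pr(>|t|)"]
p_slope < 0.05                       # reject H0 (beta1 = 0): the model is useful
```

Here the p-value for the slope is far below 0.05, so H₀ is rejected and speed is a useful predictor of stopping distance.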

0.2.1 Goal

The ultimate goal of regression is to find the values of β₀ and β₁ that minimize the error.

0.3 Models

While executing regression in R, the user needs to keep in mind certain formats of expression to specify a model. The following are important model specifications in R.


Expression        Model                                              Comments
Y ~ A             Y = β₀ + β₁A                                       With intercept and slope
Y ~ -1 + A        Y = β₁A                                            Without intercept
Y ~ A + I(A^2)    Y = β₀ + β₁A + β₂A²                                Polynomial
Y ~ A + B         Y = β₀ + β₁A + β₂B                                 First-order model without interaction between A and B
Y ~ A:B           Y = β₀ + β₁AB                                      The interaction between A and B only
Y ~ A * B         Y = β₀ + β₁A + β₂B + β₃AB                          First-order effects and the interaction between A and B
Y ~ (A+B+C)^2     Y = β₀ + β₁A + β₂B + β₃C + β₄AB + β₅BC + β₆CA      All first-order effects and interactions up to the second order
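The formula notations above can be tried directly with lm(); this sketch uses the built-in mtcars dataset (the dataset and variables are illustrative assumptions):

```r
# Several of the formula specifications, fitted to mtcars
m1 <- lm(mpg ~ wt, data = mtcars)             # intercept and slope
m2 <- lm(mpg ~ -1 + wt, data = mtcars)        # without intercept
m3 <- lm(mpg ~ wt + I(wt^2), data = mtcars)   # polynomial
m4 <- lm(mpg ~ wt * hp, data = mtcars)        # wt + hp + wt:hp interaction

names(coef(m4))   # includes the interaction term "wt:hp"
```

Counting the coefficients confirms each specification: m1 has 2, m2 has 1 (no intercept), m3 has 3, and m4 has 4, including the wt:hp interaction.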

0.3.1 The linear Model

1. The population parameters α and β are unknown; the model is yᵢ = α + βxᵢ + εᵢ, where εᵢ is the error term.
2. We compute estimates α̂ and β̂ of the population parameters; ŷᵢ = α̂ + β̂xᵢ are called the fitted values.
3. êᵢ = yᵢ − ŷᵢ = yᵢ − (α̂ + β̂xᵢ) is the residual.
4. Residuals are used to check the statistical assumptions of the model.
5. A line that fits the data well has small residuals.
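The fitted values and residuals above are available directly from a fitted model in R; a minimal sketch using the built-in cars dataset (an illustrative assumption):

```r
# Fitted values and residuals of a simple linear model
fit <- lm(dist ~ speed, data = cars)

fitted_vals <- fitted(fit)      # y-hat_i = alpha-hat + beta-hat * x_i
resids <- residuals(fit)        # e-hat_i = y_i - y-hat_i

# Each observation decomposes exactly as y_i = y-hat_i + e-hat_i
all.equal(cars$dist, as.numeric(fitted_vals + resids))
# plot(fit, which = 1)          # residuals vs fitted, for checking assumptions
```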

0.3.2 The concept of small residuals

1. We want residuals to be small in magnitude.
2. There may be large negative residuals (henceforth called resids) as well as large positive resids for any given fit.
3. We cannot simply require Σêᵢ = 0.
4. In fact, any line through the point of means (x̄, ȳ) satisfies Σêᵢ = 0.

0.3.3 How to find coefficients
During the process of minimizing the error we arrive at the constants, or coefficients. The points are given below.

1. SSE stands for Sum of Squared Errors.
2. We want to find the point (α̂, β̂) that minimizes SSE(α, β) = Σ(yᵢ − α − βxᵢ)².
3. Thus we set the partial derivatives of SSE(α, β) with respect to α and β equal to zero:
   ∂SSE(α, β)/∂α = Σ(−1)·2(yᵢ − α − βxᵢ) = 0
   ∂SSE(α, β)/∂β = Σ(−xᵢ)·2(yᵢ − α − βxᵢ) = 0
4. Solving these equations gives
   β̂ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
5. and α̂ = ȳ − β̂x̄.
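The closed-form least-squares solution can be computed by hand in R and checked against lm(); a sketch using the built-in cars dataset (an illustrative assumption):

```r
# Least-squares coefficients from the closed-form solution
x <- cars$speed
y <- cars$dist

beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Compare with R's built-in fit: (Intercept), slope
fit <- lm(dist ~ speed, data = cars)
all.equal(unname(coef(fit)), c(alpha_hat, beta_hat))
```

The hand-computed α̂ and β̂ agree with lm() up to numerical tolerance, confirming the derivation.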



0.4 How good is the fit

0.4.1 R²
1. SSE when the predictor is absent from the regression (intercept-only model): TSS = Σêᵢ² = Σ(yᵢ − ȳ)²
2. SSE when the predictor is present in the regression: SSE = Σêᵢ² = Σ(yᵢ − ŷᵢ)²
3. Reg.SS = TSS − SSE
4. Then the proportional reduction in SSE due to LR (Linear Regression) is: R² = Reg.SS/TSS = 1 − SSE/TSS
5. Thus R² is the proportion of the variation in Y explained by the LR.
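R² can be computed from TSS and SSE as defined above and checked against summary(); a sketch using the built-in cars dataset (an illustrative assumption):

```r
# R^2 as the proportional reduction in SSE
fit <- lm(dist ~ speed, data = cars)
y <- cars$dist

TSS <- sum((y - mean(y))^2)     # error with the predictor absent
SSE <- sum(residuals(fit)^2)    # error with the predictor present
R2  <- 1 - SSE / TSS

all.equal(R2, summary(fit)$r.squared)
```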

0.4.2 small r

1. r gives the strength of the relationship.
2. r = ±√R² (its sign matches the sign of the slope).
3. r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² Σ(yᵢ − ȳ)²)
4. Using this formula, β̂ = r·SDy/SDx.
5. r is symmetric in x and y, and |r| ≤ 1.
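These relations between r, R², and β̂ can be verified directly in R; a sketch using the built-in cars dataset (an illustrative assumption):

```r
# Relations between r, R^2, and the slope
x <- cars$speed
y <- cars$dist

r <- cor(x, y)
fit <- lm(y ~ x)

all.equal(r^2, summary(fit)$r.squared)                 # r^2 equals R^2
all.equal(unname(coef(fit)["x"]), r * sd(y) / sd(x))   # beta-hat = r * SDy / SDx
all.equal(cor(x, y), cor(y, x))                        # r is symmetric
```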

0.4.3 What is adjusted R²

The adjusted R² can be computed with the help of SSE/(n − p − 1) and TSS/(n − 1) as

adjusted R² = 1 − [SSE/(n − p − 1)] / [TSS/(n − 1)]

where n is the number of observations and p is the number of predictors. The residual standard error is √(SSE/(n − p − 1)).
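The adjusted R² and residual standard error formulas can be checked against summary(); a sketch using the built-in cars dataset (an illustrative assumption):

```r
# Adjusted R^2 and residual standard error from the formulas
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)
p <- 1                                  # one predictor (speed)

SSE <- sum(residuals(fit)^2)
TSS <- sum((cars$dist - mean(cars$dist))^2)

adj_R2 <- 1 - (SSE / (n - p - 1)) / (TSS / (n - 1))
rse    <- sqrt(SSE / (n - p - 1))       # residual standard error

all.equal(adj_R2, summary(fit)$adj.r.squared)
all.equal(rse, summary(fit)$sigma)
```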

