
LOGISTIC REGRESSION MODEL

1 Introduction

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response variable and one or more explanatory variables. It is often the case that the outcome variable is discrete, taking on two or more possible values. Over the last decade the logistic regression model has become, in many fields, the standard method of analysis in this situation [1].

Before beginning a study of logistic regression it is important to understand that the goal of an analysis using this method is the same as that of any model-building technique used in statistics: to find the best-fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent or response) variable and a set of independent (predictor or explanatory) variables. These independent variables are often called covariates. The most common example of modeling, and one assumed to be familiar to the readers of this text, is the usual linear regression model, where the outcome variable is assumed to be continuous [1]. What distinguishes the logistic regression model from the linear regression model is that the outcome variable in logistic regression is binary or dichotomous. This difference is reflected both in the choice of a parametric model and in the assumptions [1].

In this chapter, we focus on logit analysis (a.k.a. logistic regression analysis) as an optimal method for the regression analysis of dichotomous (binary) dependent variables. Before considering the full model, let's examine one of its components: the odds of an event [2].

1.2 Odds and Odds Ratios

To appreciate the logit model, it's helpful to have an understanding of odds and odds ratios. Most people regard probability as the natural way to quantify the chances that an event will occur.
We automatically think in terms of numbers ranging from 0 to 1, with 0 meaning that the event will certainly not occur, and 1 meaning that the event certainly will occur [2]. Probability can be computed as follows:

    P(event) = (number of times the event occurs) / (total number of trials)

For example, in the penalty-trial data introduced below, the probability of a death sentence is 50/147, or about 0.34.

However, there are other ways of representing the chances of an event, one of which, the odds, has a nearly equal claim to being natural. Consider Table 1, which shows the cross-tabulation of race of defendant by death sentence for the 147 penalty-trial cases. The numbers in the table are the actual numbers of cases with the stated characteristics.

Table 1: Death Sentences by Race of Defendant for 147 Penalty Trials

                Death   Life   Total
    Black         28     45      73
    Nonblack      22     52      74
    Total         50     97     147
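The counts in Table 1 already determine the probability, odds, and odds ratio figures derived below. A minimal Python sketch (the variable names are mine, not from the text):

```python
# Counts from Table 1 (death sentences by race, 147 penalty trials).
black_death, black_life = 28, 45
nonblack_death, nonblack_life = 22, 52

# Probability of a death sentence overall.
p_death = (black_death + nonblack_death) / 147   # 50/147 ≈ 0.34

# Odds = p / (1 - p); for raw counts this reduces to events / non-events.
odds_black = black_death / black_life            # 28/45 ≈ 0.622
odds_nonblack = nonblack_death / nonblack_life   # 22/52 ≈ 0.423

# Odds ratio comparing black to nonblack defendants.
odds_ratio = odds_black / odds_nonblack          # ≈ 1.47

print(round(p_death, 2), round(odds_black, 3),
      round(odds_nonblack, 3), round(odds_ratio, 2))
```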

    Odds = p / (1 - p)

    Odds ratio = odds_1 / odds_2, or exp(b)

In a logistic regression, the odds ratio exp(b) represents the change in the odds of being in one of the categories of outcome when the value of a predictor increases by one unit. For example (additional exercise, April 2010):

    I.   Odds of a death sentence = 50/97 ≈ 0.515
    II.  Odds of a death sentence for blacks = 28/45 ≈ 0.622
    III. Odds of a death sentence for nonblacks = 22/52 ≈ 0.423
    IV.  Odds ratio of the blacks to the nonblacks = 0.622/0.423 ≈ 1.47

Interpretation: we may say that the odds of a death sentence for blacks are 47% higher than for nonblacks. We can also say that the odds of a death sentence for nonblacks are 1/1.47 ≈ 0.68 times the odds of a death sentence for blacks. So, depending on which categories we're comparing, we either get an odds ratio greater than 1 or its reciprocal, which is less than 1.

1.3 The Logit Model

Now we're ready to introduce the logit model, otherwise known as the logistic regression model. For explanatory variables x_1, ..., x_k and individuals i = 1, ..., n, the model is

    log[ p_i / (1 - p_i) ] = β_0 + β_1 x_i1 + ... + β_k x_ik

where p_i is the probability that y_i = 1. The expression on the left-hand side is usually referred to as the logit or log-odds. The logit, being the log of the odds, is linear not only in X but also in the parameters.

Positive logit values indicate that the odds are in favour of an event happening, while negative logit values indicate that the odds are against the occurrence of an event.

We can solve the logit equation for p_i to obtain

    p_i = exp(β_0 + β_1 x_i1 + ... + β_k x_ik) / [ 1 + exp(β_0 + β_1 x_i1 + ... + β_k x_ik) ]

We can simplify further by dividing both numerator and denominator by the numerator itself:

    p_i = 1 / [ 1 + exp( -(β_0 + β_1 x_i1 + ... + β_k x_ik) ) ]

In mathematical terms, this formula is called the logistic function and can be written as:

    f(η) = 1 / (1 + e^(-η)),   where η = β_0 + β_1 x_i1 + ... + β_k x_ik

η ranges from -∞ to +∞, p_i ranges between 0 and 1, and p_i is nonlinearly related to η.

Simple logit model

Let y be a binary outcome and x a single explanatory variable, so that

    logit(p) = log[ p / (1 - p) ] = β_0 + β_1 x

Hence,

    P(y = 1 | x) = exp(β_0 + β_1 x) / [ 1 + exp(β_0 + β_1 x) ]
    P(y = 0 | x) = 1 / [ 1 + exp(β_0 + β_1 x) ]
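The relationship between the logit and the logistic function can be checked numerically. A small sketch (function names are my own) confirming that the two are inverses:

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def logistic(eta):
    """Inverse of the logit: maps any real eta back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# A positive logit corresponds to p > 0.5 (odds in favour); a negative
# logit to p < 0.5 (odds against); the two functions undo each other.
for p in (0.1, 0.5, 0.9):
    assert abs(logistic(logit(p)) - p) < 1e-12
```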

1.4 Interpretation of the Odds Ratio

Suppose, for example, that in a model for lung cancer with smoking as the predictor the estimated odds ratio is

    Odds ratio = exp(b) = 3

This odds ratio indicates that a smoker is 3 times more likely to develop lung cancer than a nonsmoker.

1.5 Applying Logistic Regression

This statistical method was applied to a data set to assess its ability to classify a baby as low birth weight or normal based on several predictor variables.

Description of Variables

    Variable  Description        Type                        Coding
    Y         Birth weight       Categorical                 1 = low birth weight, 0 = normal
    X1        Race               Categorical                 2 = Malay, 1 = Chinese, 0 = Indian
    X2        Gender             Categorical                 1 = male, 0 = female
    X3        Mother's age       Continuous (years)
    X4        Father's income    Continuous (RM)
    X5        Parity             Integer (children)
    X6        Abortion           Categorical                 1 = yes, 0 = no
    X7        Mother's height    Continuous (cm)
    X8        Vitamin            Continuous (mg)
    X9        Weight gain        Continuous (kg)
    X10       Antenatal visits   Integer (number of times)

Table 2: SPSS Results for Multiple Logistic Regression.

The estimated logistic regression model obtained:

    logit(p̂) = b_0 + b_1 X1 + b_2 X2 + ... + b_10 X10

where p̂ is the estimated probability that a baby is of low birth weight, and the b_j are the estimated coefficients reported in Table 2.

Interpreting the b values

The b values provided in the SPSS output are equivalent to the b values obtained in a multiple regression analysis. These are the values you would use in an equation to calculate the probability of a case falling into a specific category. You should check whether your b values are positive or negative; this tells you the direction of the relationship (whether the odds increase or decrease with the predictor).

Tests concerning β

The crucial statistic is the Wald statistic, which has a chi-square distribution and tells us whether the coefficient for a predictor is significantly different from zero. If the coefficient is significantly different from zero, then we conclude that the predictor is making a significant contribution to the prediction of the outcome. In this sense it is analogous to the t-tests found in multiple regression [3].

    H_0: β_j = 0    vs    H_1: β_j ≠ 0

    Decision rule: if the Wald p-value < α, reject H_0.

The Wald p-values in Table 2 indicated that there were only four significant variables (p-value < α) in the model: weight gain, antenatal visits, abortion, and mother's height.
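The Wald decision rule can be sketched in a few lines of Python. The coefficient and standard error below are invented for illustration, not taken from Table 2:

```python
from statistics import NormalDist

def wald_p_value(b, se):
    """Two-sided p-value for H0: beta_j = 0 based on the Wald statistic.
    z = b / se, and z**2 follows a chi-square distribution with 1 df,
    so the two-sided normal tail gives the same p-value."""
    z = b / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical coefficient and standard error (for demonstration only).
b, se, alpha = -0.194, 0.080, 0.05
p = wald_p_value(b, se)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p = {p:.4f} -> {decision}")
```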

Interpretation of the odds ratios

    i.   Weight gain: exp(b) = 0.824. For every 1 kg increase in weight gain, the odds of low birth weight decrease by a factor of 0.824.
    ii.  Antenatal visits: exp(b) = 0.824. When a mother increases antenatal visits by 1, the odds of low birth weight decrease by a factor of 0.824.
    iii. Abortion: exp(b) = 1.912. A mother with a history of abortion(s) is approximately 2 times more likely to have a baby with low birth weight than one with no history of abortion(s).
    iv.  Mother's height: exp(b) = 0.898. The odds of low birth weight are lower for taller mothers; each additional cm of height multiplies the odds by 0.898.
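The odds ratios above come from exponentiating the fitted coefficients. A short sketch of that conversion, back-solving a coefficient from one of the reported odds ratios (the helper names are mine):

```python
import math

def odds_ratio(b):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(b)

def percent_change_in_odds(b):
    """Percent change in the odds per one-unit increase in the predictor."""
    return 100 * (math.exp(b) - 1)

# Coefficient implied by the reported odds ratio for weight gain.
b_weight_gain = math.log(0.824)
print(round(odds_ratio(b_weight_gain), 3))              # 0.824
print(round(percent_change_in_odds(b_weight_gain), 1))  # -17.6 (odds fall ~17.6%)
```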

Assessing the Goodness of Fit of the Model

I. The Omnibus Tests of Model Coefficients give an overall indication of how well the model performs compared with a model with none of the predictors entered. For this result, we want a highly significant value (p-value less than 0.05).

II. The Hosmer-Lemeshow test is used to test the goodness of fit of the model.

III. The Cox & Snell R-square and Nagelkerke R-square values provide an indication of the amount of variation in the dependent variable explained by the model.

IV. The classification table shows how well the model predicts the correct category. This table also provides the sensitivity and specificity of the model. Sensitivity measures the proportion of actual positives that are correctly identified, whereas specificity measures the proportion of actual negatives that are correctly identified. A model with high percentages of both sensitivity and specificity is good and can be used for prediction.

For the Hosmer-Lemeshow test:

    H_0: the logistic regression model is a good fit for the data
    H_1: the logistic regression model is not a good fit for the data

    Decision rule: if p-value < α, reject H_0; if p-value > α, do not reject H_0.

Since the p-value (0.511) > α = 0.05, we do not reject H_0. We can conclude that the logistic regression model is a good fit for the data.

The R-square values suggest that this model can explain about 15.7 to 24.6 percent of the total variation in the dependent variable.

Example of Sensitivity and Specificity Analysis

Overall predictive efficiency = (141 + 54)/263 = 0.7414 or 74.1%
Sensitivity (actual positives correctly identified) = 141/170 = 0.8294 or 82.9%
Specificity (actual negatives correctly identified) = 54/93 = 0.5806 or 58.1%

Based on these results, we can conclude that the logistic regression model for Bahasa Inggeris can be used to predict Form Four students' achievement.
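These figures can be verified directly from the classification counts. A sketch using the numbers reported above (the cell labels are my own reading of the table):

```python
# Counts from the classification table: 170 actual positives, 93 actual negatives.
true_pos, false_neg = 141, 29   # actual positives: 141 + 29 = 170
true_neg, false_pos = 54, 39    # actual negatives: 54 + 39 = 93

sensitivity = true_pos / (true_pos + false_neg)              # 141/170
specificity = true_neg / (true_neg + false_pos)              # 54/93
accuracy = (true_pos + true_neg) / (170 + 93)                # 195/263

print(f"sensitivity {sensitivity:.1%}, "
      f"specificity {specificity:.1%}, overall {accuracy:.1%}")
```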
