
UNIT 8 CORRELATION AND REGRESSION ANALYSIS

Structure
8.0 Objectives
8.1 Introduction
8.2 Correlation
    8.2.1 Concept
    8.2.2 Correlation and Independence
    8.2.3 Nonsense Correlation
8.3 Regression
    8.3.1 Concept
    8.3.2 Correlation and Regression
    8.3.3 Simple Regression
    8.3.4 Multiple Regression
8.4 Types of Data
8.5 Let Us Sum Up
8.6 Exercises
8.7 Key Words
8.8 Some Useful Books
8.9 Answers or Hints to Check Your Progress
8.10 Answers or Hints to Exercises

8.0 OBJECTIVES

After going through this unit, you will be able to:
- refresh the concept of linear correlation;
- state that zero correlation does not imply that the variables are independent; on the other hand, independence of variables implies zero correlation;
- appreciate the fact that the presence of a high degree of correlation does not necessarily amount to the existence of a meaningful relationship among the variables under consideration;
- distinguish between the concept of correlation and that of regression;
- refresh the method of least squares in connection with two-variable regression;
- distinguish between direct regression and reverse regression;
- understand how the approach to multiple regression is an extension of the approach followed in two-variable regression; and
- know about various types of data that can be used in regression analysis.

8.1 INTRODUCTION

Quantitative techniques are important tools of analysis in today's research in Economics. These tools can be broadly divided into two classes: mathematical tools and statistical tools. Economic research is often concerned with theorizing about some economic phenomenon. Different mathematical tools are employed to express such a theory in a precise mathematical form. This mathematical form of an economic theory is what is generally called a mathematical model. A major purpose of the formulation of a mathematical model is to subject it to further mathematical treatment to gain a deeper understanding of the economic phenomenon that the researcher may be primarily interested in. However, the theory so developed needs to be tested in the real-world situation.

In other words, the usefulness of a mathematical model depends on its empirical verification. Thus, in economic research, a researcher is often hard-pressed to put the mathematical model in such a form that it can render itself to empirical verification. For this purpose, various statistical techniques have been found

to be extremely useful. We should note here that such techniques have often been appropriately modified to suit the purposes of economists. Consequently, a very rich and powerful area of economic analysis known as Econometrics has grown over the years. We may provide a working definition of Econometrics here: it may be described as the application of statistical tools in the quantitative analysis of economic phenomena. We may mention here that econometricians have not only provided important tools for economic analysis but their contributions have also significantly enriched the subject matter of Statistical Science in general. Today, no researcher can possibly ignore the need for being familiar with econometric tools for the purpose of serious empirical economic analysis. In the subsequent units, you will learn about regression models of econometric analysis. The concepts of correlation and regression form the core of regression models. You are already familiar with these two concepts, as you have studied them in the compulsory course on Quantitative Methods (MEC-003). In this unit we are going to put the two concepts in the perspective of empirical research in Economics. Here, our emphasis will be on examining how the applications of these two concepts are important in studying the possibility of relationships that may exist among economic variables.

8.2 CORRELATION

8.2.1 Concept

In the introduction to this unit, we have already referred to a mathematical model of some real-world observable economic phenomenon. In general, a model consists of some functional relationships, some equations, some identities and some constraints. Once a model like this is formulated, the next issue is to examine how this model works in the real-world situation, for example, in India. This is what is known as the estimation of an econometric model. It may be mentioned here that Lawrence Klein did some pioneering work in the formulation and estimation of such models.

In fact, many complex econometric models consisting of hundreds of functions, equations, identities and constraints have been constructed and estimated for different economies of the world, including India, by using empirical data.

The estimation of such complete macro-econometric models, however, involves certain issues that are beyond our scope. As a result, we shall abstract from such kind of a model and focus on a single-equation economic relationship and consider its empirical verification. For example, in the Keynesian model of income determination, the consumption function plays a pivotal role. The essence of this relationship is that consumption depends on income. We may specify a simple consumption function in the form of a linear equation with two constraints: one, the autonomous part of consumption being positive and two, the marginal propensity to consume being more than zero but less than one. Thus, our consumption equation is

C = a + bY,  with a > 0 and 0 < b < 1

This kind of a single equation and its estimation is commonly known as the regression model in the econometric literature. It may be mentioned here that such a single-equation regression model need not be a part of any econometric model and can be a mathematical formulation of some independently observed economic phenomenon.

Any scientific inquiry has to be conducted systematically, and economic inquiry is no exception. In the case of our regression model involving consumption and income, for example, a preliminary step may be to examine whether, in the real-world situation, there exists any relationship between consumption and income at all. This is precisely what we attempt with the help of the concept of correlation. Thus, at the moment, we are not concerned with the issue of the dependence of consumption on income or vice-versa. We are simply interested in the possible co-movement of the two variables. We shall focus on the difference between correlation and regression later.

Correlation can be defined as a quantitative measure of the degree or strength of the relationship that may exist between two variables. You are already familiar with the concept of Karl Pearson's coefficient of correlation. If X and Y are two variables, we know that this correlation coefficient is given by the ratio of the covariance between X and Y to the product of the standard deviation of X and that of Y. In symbols:

r = Cov(X, Y) / (σ_X σ_Y)

The symbols have their usual meaning. Here, the covariance in the numerator is important. This, in fact, gives a measure of the simultaneous change in the two variables. It is divided by the product of the standard deviations of X and Y to make the measure free of any unit, in order to facilitate a comparison between more than one set of bi-variate data which may be expressed in different units. It may be noted here that this measure of the correlation coefficient is independent of a shift in the origin and a change of scale. The correlation coefficient lies between +1 and -1.
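The definition just stated can be computed directly from data. The small function below is an illustrative sketch (not part of the unit's material); the data values are invented. It also checks the property noted above: r is unaffected by a shift of origin and a change of scale.

```python
def pearson_r(x, y):
    """Karl Pearson's r: Cov(X, Y) divided by the product of the
    standard deviations of X and Y (population form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Shifting the origin and changing the scale of X leaves r unchanged:
r_transformed = pearson_r([10 * a + 3 for a in x], y)
print(round(r, 4), round(r_transformed, 4))  # → 0.7746 0.7746
```

Because both the covariance and the standard deviation scale by the same factor, the ratio is unit-free, which is exactly why the measure permits comparison across data sets in different units.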
In symbols:

-1 ≤ r ≤ +1

If the two variables tend to move in the same direction, the correlation coefficient is positive. In the event of the two variables tending to move in opposite directions, the correlation coefficient assumes a negative value. In the case of a perfect correlation, the correlation coefficient is either +1 or -1, which is almost impossible in economics. When there does not seem to be any relationship between the two variables on the basis of the available data, the correlation coefficient may assume a value equal to zero.

It should be noted here that Karl Pearson's correlation coefficient measures the linear correlation between two variables, that is, the extent to which the two variables move together along a straight line. For example, we may find that the correlation coefficient between disposable income and personal consumption expenditure in India on the basis of some national income data is 0.7. This indicates a fairly strong positive linear association between consumption and income. We again stress here that at the moment we are not commenting on whether income is the independent variable and consumption is the dependent variable or it is the other way round.

It is important here to comment on what is known as the coefficient of determination. Although it is numerically equal to the square of the correlation coefficient, conceptually it is quite different from the correlation coefficient. We shall discuss this concept in detail in the next unit.

Example 8.1

If three uncorrelated variables x1, x2 and x3 have the same standard deviation, find the correlation coefficient between x1 + x2 and x2 + x3.

Suppose U = x1 + x2 and, similarly, V = x2 + x3. Then we have to find r_UV. Let σ1, σ2 and σ3 be the standard deviations of x1, x2 and x3 respectively, with σ1 = σ2 = σ3 = σ. Since it is given that the variables are uncorrelated, the covariances between the pairs (x1, x2), (x1, x3) and (x2, x3) are all zero. Therefore,

Var(U) = Var(x1) + Var(x2) = 2σ²
Var(V) = Var(x2) + Var(x3) = 2σ²
Cov(U, V) = Cov(x2, x2) = σ²

Hence,

r_UV = Cov(U, V) / (σ_U σ_V) = σ² / (√(2σ²) · √(2σ²)) = σ² / 2σ² = 0.5

8.2.2 Correlation and Independence

We should appreciate that in the real-world situation the relationship between two variables may not be linear in nature. In fact, variables are often involved in all kinds of non-linear relationships. Thus, we should be very clear that even when Karl Pearson's correlation coefficient is found to be zero, the two variables might still be related in a non-linear manner. The frequently quoted statement, "Independence of two variables implies a zero correlation coefficient but the converse is not necessarily true", exemplifies this fact. The statistics, and consequently the econometrics, of non-linear relationships are quite involved in nature and beyond the scope of the present discussion.
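A quick numerical illustration of the statement above (an illustrative sketch; the data are invented): take Y = X² with X symmetric about zero. Y is completely determined by X, yet the covariance, and hence Pearson's r, is exactly zero.

```python
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]   # Y is a deterministic (non-linear) function of X

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Positive and negative products cancel exactly, so the covariance is zero:
print(cov)  # → 0.0
```

Since the covariance is zero, r = 0 even though knowing X fixes Y exactly: zero linear correlation does not imply independence.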
Consequently, at this stage, linearity should be taken as a necessary simplifying assumption. However, we shall see later that a non-linear relationship can sometimes be reduced to a linear relationship through some appropriate transformation, and the tools of linear analysis can still be effectively applied to such transformed relationships. We often employ such techniques as a practical solution to the complexities involved in a non-linear relationship.

8.2.3 Nonsense Correlation

Sometimes two variables, even when they do not seem to be related in any manner, may display a high degree of correlation. Yule called this kind of correlation 'nonsense correlation'. If we measure two variables at regular time intervals, both the variables may display a strong time-trend. As a result, the two variables may display a strong correlation even when they are unrelated. Thus, one should be very careful while using such a source of data. In fact, a new branch of econometrics, known as Time Series econometrics, has been developed exclusively for handling such a situation. Another situation in which two seemingly unrelated variables may display a high degree of correlation is the result of the influence of a third variable on both of them.

Thus, the existence of a correlation between two variables does not necessarily imply a relationship between them. It only indicates that the data are not inconsistent with the possibility of such a relationship. The reasonableness of a possible relationship must be established on theoretical considerations first, and then we should proceed with the computation of correlation.

Check Your Progress 1

1) i) Define correlation between two variables.
   ii) How do you measure linear correlation between two variables?
   iii) Why is this measure called a measure of linear correlation?

2) Explain how two independent variables have zero correlation but the converse is not true.

3) Does the presence of strong correlation between two variables necessarily imply the existence of a meaningful relationship between them?

8.3 REGRESSION

8.3.1 Concept

The term regression literally means a backward movement. Francis Galton first used the term in the late nineteenth century. He studied the relationship between the height of parents and that of children. Galton observed that although tall parents had tall children and, similarly, short parents had short children in a statistical sense, in general the children's height tended towards an average value. In other words, the children's height moved backward, or regressed, to the average. However, the term regression in statistics now has nothing to do with its earlier connotation of a backward movement.
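Galton's observation can be mimicked with a small simulation. This is a hedged sketch only: the population mean of 170 cm, the inheritance factor of 0.5 and the spreads are invented for illustration, not taken from Galton's data.

```python
import random

random.seed(1)
mu = 170.0  # assumed population mean height in cm (illustrative)

# A child's height inherits only part of the parent's deviation from the mean.
parents = [random.gauss(mu, 8) for _ in range(5000)]
children = [mu + 0.5 * (p - mu) + random.gauss(0, 6) for p in parents]

# Among distinctly tall parents, the children are still tall on average,
# but closer to the overall mean than their parents are: "regression".
tall = [(p, c) for p, c in zip(parents, children) if p > mu + 8]
avg_parent = sum(p for p, _ in tall) / len(tall)
avg_child = sum(c for _, c in tall) / len(tall)
print(avg_parent > avg_child > mu)  # → True
```

The children of tall parents average above the population mean but below their parents' average, which is precisely the "backward movement" Galton described.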
Regression analysis can be described as the study of the dependence of one variable on another variable or more variables. In other words, we can use it for examining the relationship that may exist among certain variables. For example, we may be interested in issues like how the aggregate demand for money depends upon the aggregate income level in an economy. We may employ the regression technique to examine this. Here, the aggregate demand for money is called the dependent variable and the aggregate income level is called the independent variable. Consequently, we have a simple demand for money function. In this context, we present the following table to show some of the terms that are also used in the literature in place of dependent variable and independent variable.

Table 8.1: Classifying Terms for Variables in Regression Analysis

Dependent Variable        Independent Variable
Explained Variable        Explanatory Variable
Regressand                Regressor
Predictand                Predictor
Endogenous Variable       Exogenous Variable
Controlled Variable       Control Variable
Target Variable           Control Variable
Response Variable         Stimulus Variable

Source: Maddala (2002) and Gujarati (2003).

It is now important to clarify that the terms dependent and independent do not necessarily imply a causal connection between the two types of variables. Thus, regression analysis per se is not really concerned with causality analysis. A causal connection has to be established first by some theory that is outside the parlance of regression analysis. In our earlier example of the consumption function and the present example of the demand for money function, we have theories like the Keynesian income hypothesis and the transaction demand for money. On the basis of such theories, perhaps we can employ the regression technique to get some preliminary idea of some causal connection involving certain variables. In fact, causality study is now a highly specialized branch of econometrics and goes far beyond the scope of ordinary regression analysis.

A major purpose of regression analysis is to predict the value of one variable given the value of another variable or more variables. Thus, we may be interested in predicting the aggregate demand for money from a given value of aggregate income.

We should be clear that, by virtue of the very nature of economics and other branches of social science, the concern is a statistical relationship involving some variables rather than an exact mathematical relationship as we may obtain in natural science. Consequently, if we are able to establish some kind of a relationship between an independent variable X and a dependent variable Y, it can be expected to give us some sort of an average value of Y for a given value of X. This kind of a relationship is known as a statistical or stochastic relationship.
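The idea of an "average value of Y for a given X" can be sketched numerically. The figures below are entirely hypothetical: weekly income X and consumption Y for a few families, with several families observed at each income level. The statistical relationship concerns the conditional averages of Y, not any single observation.

```python
# Hypothetical data: several consumption values Y observed at each income X.
data = {
    100: [65, 70, 74, 80],
    200: [110, 115, 120, 130],
    300: [152, 160, 165, 178],
}

# For each income level, the individual Y values scatter, but their
# average traces out the underlying relationship:
avg_y = {x: sum(ys) / len(ys) for x, ys in data.items()}
for x in sorted(avg_y):
    print(x, avg_y[x])
```

No single family's consumption lies exactly on a line, yet the conditional averages rise steadily with income; it is this average relationship that regression estimates.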
The regression method is essentially concerned with the analysis of such a stochastic relationship. From the above discussion, it should be clear that in our context the dependent variable is assumed to be stochastic or random. In contrast, the independent variables are taken to be non-stochastic or non-random. However, we must mention here that, at an advanced level, even the independent variables are assumed to be stochastic. In the next unit, we shall discuss the stochastic nature of regression analysis in detail.

If a regression relationship has just one independent variable, it is called a two-variable or simple regression. On the other hand, if we have more than one independent variable in it, then it is a multiple regression.

8.3.2 Correlation and Regression

Earlier we made a reference to the conceptual difference between correlation and regression. We may discuss it here. In regression analysis, we examine the nature of

the relationship between the dependent and the independent variables. Here, as stated earlier, we try to estimate the average value of one variable from the given values of other variables. In correlation, on the other hand, our focus is on the measurement of the strength of such a relationship. Consequently, in regression we classify the variables into two classes of dependent and independent variables. In correlation, the treatment of the variables is rather symmetric; we do not have such a classification. Finally, in regression, at our level, we take the dependent variable as random or stochastic and the independent variables as non-random or fixed. In correlation, in contrast, all the variables are implicitly taken to be random in nature.

8.3.3 Simple Regression

Here, we are focusing on just one independent variable. The first thing that we have to do is to specify the relationship between X and Y. Let us assume that there is a linear relationship between the two variables like:

Y = a + bX

The concept of linearity, however, requires some clarification. We are postponing that discussion to the next unit. Moreover, there can be various types of intrinsically non-linear relationships also. The treatment of such relationships is beyond our scope. Our purpose is to estimate the constants a and b from empirical observations on X and Y.

The Method of Least Squares

Usually, we have a sample of observations of a given size, say n. If we plot the n pairs of observations, we obtain a scatter-plot, as it is known in the literature. An example of a scatter-plot is presented below.

[Fig. 8.1: Scatter-Plot]

A visual inspection of the scatter-plot makes it clear that for different values of X, the corresponding values of Y are not aligned on a straight line. As we have mentioned earlier, in regression we are concerned with an inexact or statistical relationship, and this scatter is the consequence of such a relationship. Now, the constants a and b are respectively the intercept and slope of the straight line described by the above-mentioned linear equation, and several straight lines with different pairs of values (a, b) can be passed through the above scatter. Our concern is the choice of a particular pair as the estimates of a and b for the regression equation under consideration. Obviously, this calls for an objective criterion.

Such a criterion is provided by the method of least squares. The philosophy behind the least squares method is that we should fit a straight line through the scatter-plot in such a manner that the vertical differences between the observed values of Y and the corresponding values obtained from the straight line for different values of X, called errors, are minimum. The line fitted in such a fashion is called the regression line. The values of a and b obtained from the regression line are taken to be the estimates of the intercept and slope (regression coefficient) of the regression equation. The values of Y obtained from the regression line are called the estimated values of Y. A stylized scatter-plot with a straight line fitted in it is presented below:

[Figure: stylized scatter-plot with a fitted straight line]

The method of least squares requires that we should choose our a and b in such a manner that the sum of the squares of the vertical differences between the actual or observed values of Y and the ones obtained from the straight line is minimum. Putting it mathematically, we minimize

Σ (Y - Ŷ)²  with respect to a and b,

where Ŷ = a + bX is the estimated value of Y. The values of a and b so obtained are known as the least-squares estimates of a and b and are normally denoted by â and b̂. This is a well-known minimization procedure of calculus and you must have done that in the course on Quantitative Methods (MEC-003). You also must have obtained the normal equations and solved them for obtaining â and b̂. We are leaving that as an exercise for this unit. The earlier shown scatter-plot with a regression line is presented below:

[Fig. 8.3: Scatter-Plot with the Regression Line]

This regression line, obviously, has a negative intercept. If we recapitulate, the two normal equations that we obtain from the above-mentioned procedure are given by

Σ Y = na + b Σ X
and
Σ XY = a Σ X + b Σ X²

After solving the two equations simultaneously, we obtain the least-squares estimates

b̂ = Σ (X - X̄)(Y - Ȳ) / Σ (X - X̄)²
and
â = Ȳ - b̂ X̄

In regression analysis, the slope coefficient assumes special significance. It measures the rate of change of the dependent variable with respect to the independent variable. As a result, it is this constant that indicates whether there exists a relationship between X and Y or not. Since the regression equation is in fact called the regression of Y on X, the slope b of this equation is termed the regression coefficient of Y on X. It is also denoted by b_yx. A glance at the expression of the regression coefficient of Y on X makes it quite clear that it can also be written as

b_yx = Cov(X, Y) / Var(X) = r (σ_Y / σ_X)

Thus, putting in the values of â and b̂, the regression equation of Y on X can be written as

Y - Ȳ = b_yx (X - X̄)

Reverse Regression

Suppose, in another regression relationship, X acts as the dependent variable and Y as the independent variable. Then that relationship is called the regression of X on Y. Here, we should definitely avoid the temptation of expressing X in terms of Y from the regression equation of Y on X to obtain that of X on Y and trying to mechanically
Aglance at the exp ression of the regression coefficient Yon X makes it quite clear that the above expressi on can also be written as Thus, putting the values of a and b, the regression equation of Y on X can be written as Reverse Regression Suppose, in another regression relationship Xacts as the dependent variable and Y as the independent variable. Then that relationship is called the regression ofX on Y. Here, we should dewtely avoid the temptation of expressing Xin terms of Y fiom the regression equation of Y on Xto obtain that ofXon Y and trying to mechanical ly

extract the least-squares estimates of its constants from the already known values of â and b̂. The regression of X on Y is in fact intrinsically different from that of Y on X. Geometrically speaking, in the regression of X on Y we minimize the sum of the squares of the horizontal distances, as against the minimization of the sum of the squares of the vertical distances in Y on X, for obtaining the least-squares estimates. If our regression equation of X on Y is given by

X = a' + b'Y

then its least-squares estimates are given by the criterion: minimize

Σ (X - X̂)²  with respect to a' and b'

By applying the usual minimization procedure, we obtain the following two normal equations:

Σ X = na' + b' Σ Y
and
Σ XY = a' Σ Y + b' Σ Y²

We can simultaneously solve these two equations to get the least-squares estimates

b̂' = Σ (X - X̄)(Y - Ȳ) / Σ (Y - Ȳ)²
and
â' = X̄ - b̂' Ȳ

The slope b' of the regression of X on Y is called the regression coefficient of X on Y. It measures the rate of change of X with respect to Y. In order to distinguish it clearly from the regression coefficient of Y on X, we also use the symbol b_xy for it. Putting in the values of â' and b̂', the regression equation of X on Y can be written as

X - X̄ = b_xy (Y - Ȳ)

To highlight the intrinsic difference between the two kinds of regression, the regression of Y on X is sometimes termed the direct regression and that of X on Y is called the reverse regression. Maddala (2002) gives an example of direct regression and reverse regression in connection with the issue of gender bias in the offer of emoluments. Let us assume that the variable X represents qualifications and the variable Y represents emoluments. We may be interested in finding whether males and females with the same qualifications receive the same emoluments or not. We may examine this by running the direct regression of Y on X. Alternatively, we may be curious about whether males and females with the same emoluments possess the same qualifications or not. We may investigate this by running the reverse regression of X on Y.
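The two fits can be compared on toy data (an illustrative sketch with invented numbers). In particular, the reverse-regression slope b_xy is not the reciprocal of the direct slope b_yx unless the correlation is perfect, which is why mechanically inverting the direct regression is the wrong way to obtain the regression of X on Y.

```python
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

b_yx = sxy / sxx           # direct regression: minimizes vertical distances
a_yx = my - b_yx * mx      # intercept of the direct regression line
b_xy = sxy / syy           # reverse regression: minimizes horizontal distances

print(b_yx, b_xy, 1 / b_yx)
# b_xy differs from 1/b_yx: reversing the regression is not algebraic inversion
```

On this sample both slopes happen to equal 0.9, while 1/b_yx ≈ 1.11; only when r = ±1 do the two regression lines coincide.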
Thus, it is perhaps valid to run both the regressions in order to have a clear insight into the question of gender bias in emoluments.

Properties

Let us now briefly consider some of the properties of regression.

1) The product of the two regression coefficients is always equal to the square of the correlation coefficient:

   b_yx · b_xy = r²

2) The two regression coefficients have the same sign. In fact, the sign of the two coefficients depends upon the sign of the correlation coefficient. Since the standard deviations of both X and Y are, by definition, positive, if the correlation coefficient is positive, both the regression coefficients are positive and, similarly, if the correlation coefficient happens to be negative, both the regression coefficients become negative.

3) The two regression lines always intersect each other at the point (X̄, Ȳ).

4) When r = ±1, there is an exact linear relationship between X and Y and, in that case, the two regression lines coincide with each other.

5) When r = 0, the two regression equations reduce to Y = Ȳ and X = X̄. In such a situation, neither Y nor X can be estimated from its respective regression equation.

As mentioned earlier, the coefficient of determination is an important concept in the context of regression analysis. However, the concept will be more contextual if we discuss it in the next unit.

Example 8.2

From the following results, obtain the two regression equations and estimate the yield of crop when the rainfall is 22 cm, and the rainfall when the yield is 600 kg.

                        Yield in kg    Rainfall in cm
Mean                    508.4          26.7
Standard Deviation      36.8           4.6

Coefficient of correlation between yield and rainfall = 0.52.

Let Y be yield and X be rainfall. So, for estimating the yield, we have to run the regression of Y on X, and for the purpose of estimating the rainfall, we have to use the regression of X on Y. We have

X̄ = 26.7, Ȳ = 508.4, σ_X = 4.6, σ_Y = 36.8 and r = 0.52

Therefore, the regression coefficients are

b_yx = 0.52 × (36.8 / 4.6) = 4.16 and b_xy = 0.52 × (4.6 / 36.8) = 0.065

Hence, the regression equation of Y on X is

Y - 508.4 = 4.16 (X - 26.7), or Y = 4.16X + 397.33

Similarly, the regression equation of X on Y is

X - 26.7 = 0.065 (Y - 508.4), or X = 0.065Y - 6.346

When X = 22, Y = 4.16 × 22 + 397.33 = 488.8
When Y = 600, X = 0.065 × 600 - 6.346 = 32.7

Hence, the estimated yield of crop is 488.8 kg and the estimated rainfall is 32.7 cm.

8.4 TYPES OF DATA

We conclude this unit by discussing the types of data that may be used for the purpose of economic analysis in general and regression analysis in particular. We can use three kinds of data for the empirical verification of any economic phenomenon: time series, cross section, and pooled or panel data.
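Example 8.2's arithmetic, together with property (1) of the previous subsection, can be checked with a short script. The values are taken from the example itself; the rounding mirrors the text.

```python
r = 0.52
mean_x, sd_x = 26.7, 4.6        # rainfall in cm
mean_y, sd_y = 508.4, 36.8      # yield in kg

b_yx = r * sd_y / sd_x          # regression coefficient of Y on X
b_xy = r * sd_x / sd_y          # regression coefficient of X on Y

# Property (1): the product of the two regression coefficients equals r squared
assert abs(b_yx * b_xy - r ** 2) < 1e-12

yield_at_22 = mean_y + b_yx * (22 - mean_x)     # Y on X, evaluated at X = 22
rain_at_600 = mean_x + b_xy * (600 - mean_y)    # X on Y, evaluated at Y = 600
print(round(b_yx, 2), round(b_xy, 3))           # → 4.16 0.065
print(round(yield_at_22, 1), round(rain_at_600, 1))  # → 488.8 32.7
```

Note that each prediction is made from the appropriate regression: yield from Y on X, rainfall from X on Y, never by inverting one line to get the other.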
Time Series Data

A time series is a collection of the values of a variable observed at different points of time. Generally, the interval between two successive points of time remains fixed. In other words, we collect data at regular time intervals. Such data may

be collected daily, weekly, monthly, quarterly or annually. We have, for example, a daily data series for gold prices, weekly money supply figures, a monthly price index, a quarterly GDP series and annual budget data. Sometimes, we may have the same data in more than one time-interval series; for example, both quarterly and annual GDP series may be available. The time interval is generally called the frequency of the time series. It should be clear that the above-mentioned list of time intervals is by no means an exhaustive one. There can be, for example, an hourly time series like that of a stock price sensitivity index. Similarly, we may have decennial population census figures. We should note that, conventionally, if the frequency is one year or more, it is called a low frequency time series. On the other hand, if the frequency is less than one year, it is termed a high frequency time series. A major problem with time series is what is known as non-stationary data. The presence of non-stationarity is the main reason for the nonsense correlation that we talked about in connection with our discussion on correlation.

Cross Section Data

In cross section data, we have observations for a variable for different units at the same point of time. For example, we have the state domestic product figures for different states in India for a particular year. Similarly, we may collect various stock price figures at the same point of time on a particular day. Cross section data are also not free from problems. One main problem with this kind of data is that of the heterogeneity that we shall refer to in the next unit.

Pooled Data

Here, we may have time series observations for various cross-sectional units. For example, we may have a time series of domestic product for each state of India, and we may have a panel of such series. This is why such kind of a data set is called panel data. Thus, in this kind of data, we combine the element of time series with that of cross section data. One major advantage with such kind of data is that we
One major advantage with such kind of data is that w e may have quite a large data set and the problem of degrees of freedom that mainl y arises due to the non-availability of adequate data can largely be overcome. Rec ently, the treatment of panel data has received much attention inempirical economic ana lysis. Check Your Progress 2 1 ) Explain how regression is not primarily concerned with causality analysis. 2) Bring out the difference between correlation and regression. I .............................................................................. .................................... ! ......................................... ......................................................................... 1 3) What is the distinction between Time Series Data and Cross Section Data? ! ................................................................................

.................................. ............................................... ................................ ................................... ................................................................................ .................................. 4) Explh the concept of reverse regression. I I 8.5 LET US SUM UP I I Regression models occupy a central place in empirical economic analysis. These I i models are essentially based on the concepts of comelation and regression. Cor relation - .is a quantitative measure of the strength of the linear relationship that may exist among some variables. The existence of a high degree of correlation, however, is not necessarily the evidence of a meaningll relationship. It only suggests that the data are not inconsistent with the possibility of such kind of a relationship. R egression on the other hand focuses on the direction of a linear relationship. Here, one i s concerned with the dependence of one variable on other variables. Regression, in itself, does not suggest any causal relationship. Correlation and regression, bo th are concerned with a statistical or stochastic relationship as against amathematical or an exact relationship. In the conventional regression analysis, the dependent varia ble is treated to be stochastic or random, whereas, the independent variables are taken to , be non-stochastic in nature. The constants of a regression equation are estima ted h mth e empirical observations by using the least square technique. In atwo vari able regression equation, there is one dependent variable and one independent variabl e. The slope coefficient of a regression equation is called the regression coeffici ent. It measures the rate of change of the dependent variable with respect to the indepe ndent variable. The distinction between the concept of direct regression and that of t he reverse regression is crucial in the regression analysis. 
Sometimes, by running both kinds of regression, important insight can be gained in empirical economic analysis. In multiple regression, there are at least two independent variables. Finally, in regression analysis, three types of data, namely time series, cross section and pooled data, can be used.

8.6 EXERCISES

1) Prove that the correlation coefficient lies between -1 and +1.

2) Show that the correlation coefficient is unaffected by a shift in the origin and a change of scale.

3) For the regression equation of Y on X, derive the least square estimators of the parameters. Try to work out the same for the regression equation of X on Y.

4) From the following data, derive the regression equation which you consider to be economically more meaningful. Give justification for your choice.

   Output            5     7     9     11    13    15
   Profit per unit   1.70  2.40  2.80  3.40  3.70  4.40

5) To study the effect of rain on the yield of wheat, the following results were obtained:

                         Mean    Standard Deviation
   Yield in kg per acre   800         12
   Rainfall in inches      50          2

   The correlation coefficient is 0.80. Estimate the yield when rainfall is 80 inches.

8.7 KEY WORDS

Coefficient of Determination : It is equal to the square of the correlation coefficient.

Correlation : It is a quantitative measure of the strength of the relationship that may exist among certain variables.

Cross Section Data : In cross section data, we have observations on a variable for different units at the same point of time.

Econometrics : It is described as the application of statistical tools in the quantitative analysis of economic phenomena.

Mathematical Model : The mathematical form of some economic theory is what is generally called a mathematical model.

Method of Least Squares : It is the method of estimating the parameters of a regression equation in such a fashion that the sum of the squares of the differences between the actual or observed values of the dependent variable and their estimated values from the regression equation is minimum.

Multiple Regression : It is a regression equation with more than one independent variable.

Nonsense Correlation : The presence of correlation between two variables when there does not exist any meaningful relationship between them is known as nonsense correlation.

Pooled Data : In pooled data, we have time series observations for various cross sectional units. Here, we combine the element of time series with that of cross section data.
Regression Equation : It is the equation that specifies the relationship between the dependent and the independent variables, for the purpose of estimating the constants or parameters of the equation with the help of empirical data on the variables.

Regression : It is a statistical analysis of the nature of the relationship between the dependent and the independent variables.

Reverse Regression : It is an independent estimation of a new regression equation in which the independent variable of the original equation is changed into the dependent variable, and the dependent variable of the original equation is changed into the independent variable.

Time Series Data : It is a series of the values of a variable obtained at different points of time.

Two Variable Regression : It is a regression equation with one independent variable.

8.8 SOME USEFUL BOOKS

Gujarati, Damodar N. (2003); Basic Econometrics, Fourth Edition, Chapters 2, 3 and 7, McGraw-Hill, New York.

Maddala, G.S. (2002); Introduction to Econometrics, Third Edition, Chapters 3 and 4, John Wiley & Sons Ltd., West Sussex.

Pindyck, Robert S. and Rubinfeld, Daniel L. (1991); Econometric Models and Economic Forecasts, Third Edition, Chapter 1, McGraw-Hill, New York.

Karmel, P.H. and Polasek, M. (1986); Applied Statistics for Economists, Fourth Edition, Chapter 8, Khosla Publishing House, Delhi.

8.9 ANSWERS OR HINTS TO CHECK YOUR PROGRESS

Check Your Progress 1
1) i) See section 8.2.1.
   ii) See section 8.2.1. No, because it is assumed that there exists a linear relationship between the variables.
2) See section 8.2.2.
3) See section 8.2.3.

Check Your Progress 2
1) See section 8.3.1.
2) See section 8.3.2.
3) See section 8.4.

8.10 ANSWERS OR HINTS TO EXERCISES
1) Do yourself.
2) Do yourself.
3) Do yourself.
4) Y = 0.257X + 0.50
5) 944 kg per acre.
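The numerical hints above can be checked with a short least-squares computation. The sketch below reproduces the hinted equation Y = 0.257X + 0.50 for the output and profit-per-unit data, and the 944 kg per acre estimate of Exercise 5, where only the means, standard deviations and r are given, using the regression line Y = mean(Y) + r·(s_Y/s_X)·(X − mean(X)):

```python
# Check for the output-profit exercise: least-squares fit of
# profit per unit (Y) on output (X).
x = [5, 7, 9, 11, 13, 15]
y = [1.70, 2.40, 2.80, 3.40, 3.70, 4.40]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx
print(f"Y = {slope:.3f}X + {intercept:.2f}")  # matches the hint: Y = 0.257X + 0.50

# Check for Exercise 5: yield regressed on rainfall, built from the
# summary statistics alone (no raw observations are needed).
mean_yield, sd_yield = 800, 12
mean_rain, sd_rain = 50, 2
r = 0.80
estimated_yield = mean_yield + r * (sd_yield / sd_rain) * (80 - mean_rain)
print(estimated_yield)  # 944.0 kg per acre
```

Note that the Exercise 5 prediction works from summary statistics because the least-squares slope equals r·(s_Y/s_X) and the fitted line passes through the point of means.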
