CHAPTER 4

Demand Estimation

CHAPTER OUTLINE

The Identification Problem
Marketing Research Approaches to Demand Estimation
  * Consumer Surveys and Observational Research
  * Case Application 4-1: Micromarketing: Marketers Zero in on Their Customers
  * Consumer Clinics
  * Market Experiments
  * Case Application 4-2: Estimation of the Demand for Oranges by Market Experiment
  * Case Application 4-3: Virtual Shopping and Virtual Management as New Managerial Tools
Introduction to Regression Analysis
Simple Regression Analysis
  * The Ordinary Least-Squares Method
  * Tests of Significance of Parameter Estimates
  * Other Aspects of Significance Tests and Confidence Intervals
  * Test of Goodness of Fit and Correlation
Multiple Regression Analysis
  * The Multiple Regression Model
  * The Coefficient of Determination and Adjusted $R^2$
  * Analysis of Variance
  * Point and Interval Estimates
Problems in Regression Analysis
  * Multicollinearity
  * Heteroscedasticity
  * Autocorrelation
Demand Estimation by Regression Analysis
  * Model Specification
  * Collecting Data on the Variables
  * Specifying the Form of the Demand Equation
  * Testing the Econometric Results
  * Case Application 4-4: Estimation of the Demand for Air Travel over the North Atlantic
Estimating the Demand for U.S. Imports and Exports
  * Case Application 4-5: Price and Income Elasticities of Imports and Exports in the Real World
  * Case Application 4-6: The Major Commodity Exports and Imports of the United States
  * Case Application 4-7: The Major Trade Partners of the United States
  * Case Application 4-8: The Top U.S. International Exporters
Summary * Discussion Questions * Problems * Internet Site Addresses

KEY TERMS (in order of their appearance)

Identification problem
Consumer surveys
Observational research
Micromarketing
Customer relationship management (CRM)
Consumer clinics
Market experiments
Scatter diagram
Regression analysis
Least-squares method
Degrees of freedom (df)
t statistic
t test
Confidence interval
Coefficient of determination ($R^2$)
Total variation
Explained variation
Unexplained variation
Coefficient of correlation
Multiple regression analysis
Adjusted $R^2$ ($\bar{R}^2$)
Analysis of variance
F statistic
Standard error (SE) of the regression
Multicollinearity
Heteroscedasticity
Cross-sectional data
Autocorrelation
Time-series data
Durbin-Watson statistic (d)

In this chapter we build on the analysis of consumer demand theory examined in Chapter 3 to show how a firm can estimate the demand for the product it sells. We saw in Chapter 3 that the forces that affect demand are the price of the commodity, consumers' incomes, the price of related (i.e., substitute and complementary) commodities, consumers' tastes, and other more specific forces that are important for the particular commodity. We also saw in Chapter 3 that reliable estimates of the quantitative effect on sales of all the significant forces that affect demand are essential for the firm to make the best operating decisions and for planning.

Important questions to which we seek an answer in this chapter are: How much will the revenues of the firm change after increasing the price of the commodity by a certain amount? How much will the quantity demanded of the commodity increase if consumers' incomes increase by a specific amount? What if the firm doubles its advertising expenditures and/or if it provides a particular credit incentive to consumers? How much would the demand that a firm faces for its product fall if competitors lowered their prices, increased their advertising expenditures, or provided credit incentives? Firms must know the answers to these and other questions to achieve the objective of maximizing their value. The answers are just as important for not-for-profit organizations.
For example, it is crucial for a state university to know how much enrollment would decline with a 10 percent increase in tuition, how the socioeconomic composition of its student body would change, and how the number of out-of-state students would be affected.

In this chapter, we begin by examining some general difficulties encountered in deriving the demand curve for a product from market data (the identification problem). Then, we briefly discuss some marketing research approaches to demand estimation. Subsequently, we focus on regression analysis as the most useful and common method of demand estimation. Finally, we discuss the estimation of the demand for U.S. imports and exports. In the next chapter, we will examine methods of forecasting demand.

The Identification Problem

The demand curve for a commodity is generally estimated from market data on the quantity purchased of the commodity at various prices over time (i.e., using time-series data) or for various consuming units or markets at one point in time (i.e., using cross-sectional data). However, simply joining the price-quantity observations on a graph does not generate the demand curve for the commodity. The reason is that each price-quantity observation is given by the intersection of a different (but unobserved) demand and supply curve of the commodity.¹

Over time or across different individuals or markets, the demand for the commodity shifts or differs because of changes or differences in tastes, incomes, price of related commodities, and so on. Similarly, over time or across different individuals or markets, the supply curve shifts or is different because of changes or differences in technology, factor prices, and weather conditions (for agricultural commodities). The intersection (equilibrium) of the different but unknown demand and supply curves generates the different price-quantity points observed.
(If the demand and supply curves did not shift or differ, the commodity price would remain the same.) Therefore, by simply joining the different price-quantity observations, we do not generate the demand curve for the commodity. The demand curve cannot be identified so simply. This is referred to as the identification problem.

For example, in Figure 4-1, only price-quantity points $E_1$, $E_2$, $E_3$, and $E_4$ are observed.² Each of these price-quantity observations, however, lies on a different demand and supply curve. These different demand curves result from changes in tastes, incomes, and prices of related commodities over time (with time-series analysis), or from differences in tastes, incomes, and prices of related commodities across different individuals or markets (with cross-sectional data). It is clear, therefore, that simply joining points $E_1$, $E_2$, $E_3$, and $E_4$ by a line as in Figure 4-1 does not generate the demand curve for the commodity. Thus, the dashed line connecting points $E_1$, $E_2$, $E_3$, and $E_4$ in Figure 4-1 is not the demand curve for the commodity.

In order to derive the demand curve for the commodity from the observed price-quantity data points, we should allow the supply curve of the commodity to shift or to differ, in an unrestricted manner, as shown in Figure 4-1, while we adjust or correct for the shifts or differences in the demand curve. That is, we must adjust or correct for the effect on the demand for the commodity resulting from changes or differences in consumers' incomes, in the price of related commodities, in consumers' tastes, and in other factors that cause the demand curve of the particular commodity to shift or to be different, so that we can isolate or identify the effect on the quantity demanded of the commodity resulting only from a change in its price.
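The correction described above can be illustrated with a small simulation. The sketch below is not taken from the chapter: the demand and supply equations, their coefficients, and the names `simulate_market` and `ols` are all hypothetical, chosen only so that the true demand slope is known in advance. It assumes that income is the only force shifting demand and that it is observed, while supply shifts freely for unobserved reasons.

```python
import numpy as np

# Hypothetical structural model (every number here is an illustrative assumption):
#   demand: Q = 100 - 2.0*P + 1.0*I   (I = consumers' income, observed)
#   supply: Q =  10 + 1.0*P + w       (w = unobserved supply shift, e.g. weather)
TRUE_DEMAND_SLOPE = -2.0


def simulate_market(n=500, seed=0):
    """Return equilibrium prices, quantities, and incomes for n markets."""
    rng = np.random.default_rng(seed)
    income = rng.normal(50.0, 5.0, n)  # shifts the demand curve
    w = rng.normal(0.0, 5.0, n)        # shifts the supply curve
    # Equate demand and supply: 100 - 2P + I = 10 + P + w  =>  P = (90 + I - w) / 3
    price = (90.0 + income - w) / 3.0
    quantity = 10.0 + price + w        # read the quantity off the supply curve
    return price, quantity, income


def ols(y, *regressors):
    """Least-squares coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


price, quantity, income = simulate_market()

# "Joining the points": regress quantity on price alone.
naive_slope = ols(quantity, price)[1]

# Correcting for the demand shifter: regress quantity on price and income.
controlled_slope = ols(quantity, price, income)[1]

print(f"true demand slope      : {TRUE_DEMAND_SLOPE:.2f}")
print(f"slope ignoring income  : {naive_slope:.2f}")
print(f"slope holding income   : {controlled_slope:.2f}")
```

Under these assumptions the regression that holds income constant recovers the demand slope, while the line fitted through the raw equilibrium points implies a much weaker response of quantity to price. The recovery is exact only because this sketch has no unobserved demand shifters; with real market data that condition would not hold.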
The price-quantity relationship that remains after correcting for all the forces that cause the demand curve to shift or to be different gives the true demand curve for the commodity (say, $D_2$ in Figure 4-1).

Note that in Figure 4-1, the colored demand curves that we seek to identify are flatter or more elastic than the dashed line joining the price-quantity observation points. Which of the colored demand curves shown in Figure 4-1 we actually derive depends on the level at which we hold constant consumers' incomes, the price of related commodities, consumers' tastes, and other forces that cause the demand curve

¹From principles of economics and the appendix to Chapter 1, we know that the supply curve of a commodity shows the quantity supplied of the commodity per time period at various prices of the commodity, while holding constant all the other determinants of supply.
²To derive a demand curve from market data, we need many more points, but in order to keep the figure simple, we assume that we have only the four price-quantity observations shown in Figure 4-1.

and the effectiveness of commercials, as well as television viewing patterns. Scanners and people meters, however, raise legal questions about privacy.

Observational research does not, however, render consumer surveys useless. Sometimes consumer surveys are the only way to obtain information about possible consumers' responses. For example, if a firm is thinking of introducing a new product or changing the quality of an existing one, the only way that the firm can test consumers' reactions is to directly ask them, since no other data are available. From the survey, the researcher then typically tries to determine the demographic characteristics (age, sex, education, income, family size) of consumers who are most likely to purchase the product.
The same might be true in detecting changes in consumer tastes and preferences and in determining consumers' expectations about future prices and business conditions. Consumer surveys can also be useful in detecting consumers' awareness of an advertising campaign by the firm. Furthermore, if the survey shows that consumers are unaware of price differences between the firm's product and competitive products, this might be a good indication that the demand for the firm's product is price inelastic. Case Application 4-1 examines micromarketing and customer relationship management, two of the most important new marketing research approaches to demand estimation and marketing.

Consumer Clinics

Another approach to demand estimation is consumer clinics. These are laboratory experiments in which the participants are given a sum of money and asked to spend it in a simulated store to see how they react to changes in the commodity price, product packaging, displays, price of competing products, and other factors affecting demand. Participants in the experiment can be selected so as to closely represent the socioeconomic characteristics of the market of interest. Participants have an incentive to purchase the commodities they want the most because they are usually allowed to keep the goods purchased. Thus, consumer clinics are more realistic than consumer surveys. By being able to control the environment, consumer clinics also avoid the pitfall of actual market experiments (discussed next), which can be ruined by extraneous events.

Consumer clinics also face serious shortcomings, however. First, the results are questionable because participants know that they are in an artificial situation and that they are being observed. Therefore, they are not likely to act normally, as they would in a real market situation. For example, suspecting that the researchers might be interested in their reaction to price changes, participants are likely to show more sensitivity to price changes than in their everyday shopping. Second, the sample of participants must necessarily be small because of the high cost of running the experiment. Inferring a market behavior from the results of an experiment based on a very small sample can also be dangerous. Despite these disadvantages, consumer clinics can provide useful information about the demand for the firm's product, particularly if consumer clinics are supplemented with consumer surveys.

CASE APPLICATION 4-1
Micromarketing: Marketers Zero in on Their Customers

More and more consumer-product companies are narrowing their marketing strategy from the region and city to the individual neighborhood and single store. The aim of such detailed point-of-sale information, or micromarketing, is to identify, store by store, the types of products with the greatest potential appeal for the specific customers in the area. Using census data and checkout scanners, Market Metrics, a marketing research firm, collects consumer information at more than 30,000 supermarkets around the country.

For example, for a particular grocery store in Georgia, Pennsylvania, Market Metrics found that potential customers were predominantly white, blue collar, and owned two cars, that they lived in households of three or four people and had an average income of $54,421, and that 26 percent of the people were below the age of 15. Based on these demographic and economic characteristics, Market Metrics determined that the strongest sellers in this market would be baby foods and grooming items, baking mixes, desserts, dry dinner mixes, cigarettes, laundry supplies, first-aid products, and milk.
Less strong would be sales of artificial sweeteners, tea, books, film, prepared food, yogurt, wine, and liquor. Such store-specific micromarketing is likely to become more and more common and necessary for successful retailing.

As marketers refine their tools, they are increasingly taking aim at the ultimate narrow target: the individual consumer. Indeed, many companies, led by banks, are assembling customer profiles and employing sophisticated technology called neural networks in order to set up one-to-one marketing (also called relationship marketing or customer relationship management). This seeks to reach the individual consumer and establish a learning relationship with each customer, starting with the most valuable ones. This is exactly what Amazon.com does when it reminds a customer that a book that might interest her has just come in. One-to-one marketing requires identifying the company's customers, differentiating among them, interacting with them, and customizing the product or service to fit each individual customer's needs.

For example, the Quaker Oats Company tracks how your household redeems coupons and uses the information to refine the coupons it will offer you in the future, and Merrill Lynch & Co. provides detailed financial information about its customers to its brokers in order to help them promote the company's financial products. Depositing a $10,000 check may eliminate the customer as a likely candidate for a car loan but not for a home mortgage loan. It might even determine whether your telephone call gets answered first (if your profile, which comes up immediately on the bank's computer screen, identifies you as a valued customer) or last. Although it is not easy to set up one-to-one marketing, and most companies may not be capable of it or ready for it, it is almost certain that marketing will be getting more and more personalized in the future.

Related to micromarketing is customer relationship management (CRM), which refers to the use of business strategy, marketing, and information technology (IT) by which a firm can try to increase business with customers that the firm already has, especially the most profitable ones, while trying to attract new ones. It tries to do this by trying to learn everything possible about the buying habits of customers and making them feel that they are receiving personal attention. A number of companies have sprung up that offer all sorts of IT-based CRM services to their clients, such as setting up call centers, sales force automation, marketing, data analysis, and Web site management. One of the largest of these CRM consultancy companies, Accenture, estimates that with a 10 percent improvement in CRM capabilities a firm can increase its profits by as much as 4 or 5 percent.

Source: "Ore faut Customer," The Wall Street Journal (June 21, 1999), p. R18; "Is Your Company Ready for One-to-One Marketing?" Harvard Business Review (January-February 1999), pp. [51-16; "Winning in Smart Markets," Sloan Management Review (Summer 1999), pp. 39-69; and "Recs on Customer Relationship Management," Financial Times (October 17, 2001), p. 1.

Market Experiments

Unlike consumer clinics, which are conducted under strict laboratory conditions, market experiments are conducted in the actual marketplace. There are many ways of performing market experiments. One method is to select several markets with similar socioeconomic characteristics and change the commodity price in some markets or stores, packaging in other markets or stores, and the amount and type of promotion in still other markets or stores, then record the responses (purchases) of consumers in the different markets.
By using census data or surveys for various markets, a firm can also determine the effect of age, sex, level of education, income, family size, and so forth on the demand for the commodity. Alternatively, the firm could change, one at a time, each of the determinants of demand under its control in a particular market over time and record consumers' responses.

The advantages of market experiments are that they can be conducted on a large scale to ensure the validity of the results and that consumers are not aware that they are part of an experiment. Market experiments also have serious disadvantages, however. One of these is that in order to keep costs down, the experiment is likely to be conducted on too limited a scale and over a fairly short period of time, so that inferences about the entire market and for a more extended period of time are questionable. Extraneous occurrences, such as a strike or unusually bad weather, may seriously bias the results in uncontrolled experiments. Competitors could try to sabotage the experiment by also changing prices and other determinants of demand under their control. They could also monitor the experiment and gain useful information that the firm would prefer not to disclose. Finally, a firm might permanently lose customers in the process of raising prices in the market where it is experimenting with a high price.

Despite these shortcomings, market experiments may be useful to a firm in determining its best pricing strategy and in testing different packaging, promotional campaigns, and product qualities. Market experiments are particularly useful in the process of introducing a product for which no other data exist. They may also be useful in verifying the results of other statistical techniques used to estimate demand and in providing some of the data required for these other statistical techniques of demand estimation.

Case Application 4-2 shows how the price elasticity and cross-price elasticity of demand for Florida and California oranges have been estimated by market experiment, while Case Application 4-3 deals with two much more sophisticated marketing research approaches to demand estimation: virtual shopping and virtual management.

CASE APPLICATION 4-2
Estimation of the Demand for Oranges by Market Experiment

In 1962, researchers at the University of Florida conducted a market experiment in Grand Rapids, Michigan, to determine the price elasticity and the cross-price elasticity of demand for three types of Valencia oranges: those from the Indian River district of Florida, those from the interior district of Florida, and those from California. Grand Rapids was chosen as the site for the market experiment because its size, demographic characteristics, and economic base were representative of other midwestern markets for oranges.

Nine supermarkets participated in the experiment, which involved changing the price of the three types of oranges, each day, for 31 consecutive days and recording the quantity sold of each variety. The price changes ranged within 16 cents, in 4-cent increments, around the price of oranges that prevailed in the market at the time of the study. More than 9,250 dozen oranges were sold in the nine supermarkets during the 31 days of the experiment. Each of the supermarkets was provided with an adequate supply of each type of orange so that supply effects could be ignored. The length of the experiment was also sufficiently short so as to ensure no change in tastes, incomes, population, the rate of inflation, and determinants of demand other than price.

The results, summarized in Table 4-1, indicate that the price elasticity of demand for all three types of oranges was fairly high (the numbers on the main diagonal of the table). For example, the price elasticity of demand for the Indian River oranges of -3.07 indicates that a 1 percent increase in their price leads to a 3.07 percent decline in their quantity demanded. More interestingly, the off-diagonal entries in the table show that while the cross-price elasticities of demand between the two types of Florida oranges were larger than 1, they were close to zero with respect to the California oranges. In other words, while consumers regarded the two types of Florida oranges as close substitutes, they did not view the California oranges as such. In pricing their oranges, therefore, producers of each of the two Florida varieties would have to carefully consider the price of the other (as consumers switch readily among them as a result of price changes) but need not be much concerned about the price of California oranges.

Table 4-1  Price Elasticity and Cross-Price Elasticity of Demand for Florida Indian River, Florida Interior, and California Oranges

                          Price Elasticities and Cross-Price Elasticities
Type of Orange            Florida Indian River   Florida Interior   California
Florida Indian River             -3.07                +1.56            +0.01
Florida Interior                 +1.16                -3.01            +0.14
California                       +0.18                +0.09            -2.76

Source: M. B. Godwin, W. F. Chapman, and W. T. Hanley, "Competition between Florida and California Valencia Oranges in the Fruit Market," Bulletin 704 (Gainesville: University of Florida, December 1965).

CASE APPLICATION 4-3
Virtual Shopping and Virtual Management as New Managerial Tools

The past few years have seen the development of the new and exciting marketing tools of virtual shopping and virtual management. In virtual shopping, a representative sample of consumers shops in a virtual store simulated on the computer screen, instead of in a simulated physical store, as in consumer clinics.
By doing so, virtual shopping eliminates the high cost in terms of time and money involved in consumer clinics. Virtual shopping has been made possible by recent advances in computer graphics and three-dimensional (3D) modeling, which allows marketers to re-create the atmosphere of an actual retail store on the computer screen. Consumers can see shelves stocked with all kinds of products, they can view up close any product by touching its image on the screen so as to be able to read its label and check its content, and they can then purchase the product by touching the picture of a shopping cart. The sample consumers are then asked to take a series of trips through the simulated virtual store and shop as they would in a regular retail store. Prices, packaging, displays, and promotions are changed in subsequent trips and the consumers' reactions recorded. Virtual shopping simulations can help in making intelligent marketing decisions quickly and inexpensively. Preliminary tests seem to indicate that virtual shopping can track rather closely the buying behavior of consumers in a real store.

Much more sophisticated is virtual management. As defined in Section 2-8, virtual management refers to the ability of a manager to simulate consumer behavior using computer models based on the emerging science or theory of complexity. If successful, such computational models would mimic human behavior sufficiently closely to allow top management to simulate or test the impact of managerial decisions (such as, for example, changing the product price or its characteristics) before implementing those decisions in the real world. As contrasted to virtual shopping, virtual management does not rely on actual subjects to simulate actual shopping, but relies instead on consumer behavior that has already been inputted into the program. In virtual management, there are no actual or physical subjects involved at all.
Their behavior has already been distilled and incorporated into the program and is ready to be used in managerial simulations. As such, virtual management is much more complex and potentially much more valuable than virtual shopping.

For example, Macy's Department Store in New York City has created an elaborate computer model based on information from consumer research surveys and other database information in which hundreds of synthetic shoppers interact in a virtual shopping experience. Macy's hopes that the system will allow it to determine (1) the number of salespeople needed in each department of the store, (2) how to turn browsers into shoppers, and (3) how to locate service desks and cash registers to maximize sales. But with a sufficiently more elaborate system, management could conceivably simulate almost any type of managerial decision, from customers' reactions to price changes, to the effectiveness of different types of advertisements, to the sales effect of different shelf arrangements, and so on, allowing management to maximize the value of the firm and possibly avoid embarrassing mistakes since it operates with a synthetic rather than a real public. Marketing scenarios also lead to dramatic time compression: a marketing day can be simulated in a few minutes and a month in a few hours.

Some top U.S. corporations, such as Citicorp, Coca-Cola, Shell International Petroleum, and Texas Instruments, are betting hundreds of thousands of dollars by sponsoring research in the field of complexity at the Santa Fe Institute in New Mexico, the foremost research center in this new science. Virtual management harnesses the power of database information, econometrics, and the new information technology in a way that is much easier for management to understand and use than linear programming and operations research, and thus potentially much more useful. The challenge of setting up simulation models that closely duplicate human market behavior, however, remains dauntingly difficult.

Sources: "A New Laboratory for Economists," Business Week (March 17, 1997), pp. 96-97; "Test Marketers Use Virtual Shopping to Gauge Potential of Real Products," The New York Times (December 22, 1997), D3; and "Virtual Management," Business Week (September 21, 1998), pp. 80-82.

Introduction to Regression Analysis⁴

⁴Readers familiar with regression analysis can use this and the next three sections as a review or skip them and go on directly to Section 4-7.

In order to introduce regression analysis, suppose that a manager wants to determine the relationship between the firm's advertising expenditures and its sales revenue. The manager wants to test the hypothesis that higher advertising expenditures lead to higher sales for the firm, and, furthermore, she wants to estimate the strength of the relationship (i.e., how much sales increase for each dollar increase in advertising expenditures). To this end, the manager collects data on advertising expenditures and on sales revenue for the firm over the past 10 years. In this case, the level of advertising expenditures ($X$) is the independent or explanatory variable, while sales revenues ($Y$) is the dependent variable that the manager seeks to explain. Suppose that the advertising-sales data for the firm in each of the past 10 years that the manager has collected are those in Table 4-2.

Table 4-2  Advertising Expenditures and Sales Revenues of the Firm (millions of dollars)

Year (t)                         1    2    3    4    5    6    7    8    9   10
Advertising expenditures (X)    10    9   11   12   11   12   13   13   14   15
Sales revenues (Y)              44   40   42   46   48   52   54   58   56   60
If we now plot each pair of advertising-sales values in Table 4-2 as a point on a graph, with advertising expenditures (the independent or explanatory variable) measured along the horizontal axis and sales revenues (the dependent variable) measured along the vertical axis, we get the points (dots) in Figure 4-2. This is known as a scatter diagram since it shows the spread of the points in the X-Y plane.

Figure 4-2  Advertising Expenditures and Sales Revenues of the Firm in Each of 10 Years
Advertising expenditure ($X$), the independent variable, is measured along the horizontal axis (in millions of dollars), while sales revenue ($Y$), the dependent variable, is measured along the vertical axis. Each point (dot) in the figure represents one of the advertising-sales combinations shown in Table 4-2.

From Figure 4-2 (scatter diagram), we see that there is a positive relationship between the level of the firm's advertising expenditures and its sales revenues (i.e., higher advertising expenditures are associated with higher sales revenues) and that this relationship is approximately linear.

One way to estimate the approximate linear relationship between the firm's advertising expenditures and its sales revenues is to draw in, by visual inspection, the positively sloped straight line that "best" fits between the data points (so that the points are about equally distant on either side of the line). By extending the line to the vertical axis, we can then estimate the firm's sales revenues with zero advertising expenditures. The slope of the line will then provide an estimate of the increase in the sales revenues that the firm can expect with each $1 million increase in its advertising expenditures.
This will give us a rough estimate of the linear relationship between the firm's sales revenues ($Y$) and its advertising expenditures ($X$) in the form of Equation 4-1:

$$Y = a + bX \tag{4-1}$$

In Equation 4-1, $a$ is the vertical intercept of the estimated linear relationship and gives the value of $Y$ when $X = 0$, while $b$ is the slope of the line and gives an estimate of the increase in $Y$ resulting from each unit increase in $X$. The manager could use this information to estimate how much the sales revenues of the firm would be if its advertising expenditures were anywhere between $9 million and $15 million per year (the range of the advertising expenditures given in Table 4-2 and shown in Figure 4-2), or if advertising expenditures increased, say, to $16 million per year, or fell to $8 million per year.

The difficulty with the visual fitting of a line to the data points in Figure 4-2 is that different researchers would probably fit a somewhat different line to the same data points and obtain somewhat different results. Regression analysis is a statistical technique for obtaining the line that best fits the data points according to an objective statistical criterion, so that all researchers looking at the same data would get exactly the same result (i.e., obtain the same line). Specifically, the regression line is the line obtained by minimizing the sum of the squared vertical deviations of each point from the regression line. This method is, therefore, appropriately called the "ordinary least-squares method," or "OLS" in short. The regression line fitted by such a least-squares method is shown in Figure 4-3.

In Figure 4-3, $Y_1$ refers to the actual or observed sales revenue of $44 million associated with the advertising expenditures of $10 million in the first year for which the data were collected (see Table 4-2).
The Ŷ₁ (reads: Y hat sub 1) shown in the figure is the corresponding sales revenue of the firm estimated from the regression line for the advertising expenditure of $10 million in the first year. The symbol e₁ in the figure is then the corresponding vertical deviation or error of the actual or observed sales revenue of the firm from the sales revenue estimated from the regression line in the first year. That is,

\[ e_1 = Y_1 - \hat{Y}_1 \tag{4-2} \]

Errors of this type arise because (1) numerous explanatory variables with only slight or irregular effect on Y are not included in Equation 4-1, (2) there are possible errors of measurement in Y, and (3) random human behavior leads to different results (say, different purchases of a commodity) under identical conditions.

Since there are 10 observation points in Figure 4-3, we have 10 such vertical deviations or errors. These are labeled e₁ to e₁₀ in the figure. The regression line shown in Figure 4-3 is the line that best fits the data points in the sense that the sum of the squared (vertical) deviations from the line is minimum. That is, each of the 10 e values is first squared and then summed. The regression line is the line for which the sum of these squared deviations is a minimum.³ How the values of â (the vertical intercept) and b̂ (the slope coefficient) of the regression line that minimizes the sum of the squared deviations are actually obtained is shown next.

³ The errors are squared before they are added in order to avoid the cancellation of errors of equal size but opposite signs. Squaring the errors also penalizes larger errors relatively more than smaller ones.

[Figure 4-3: Regression line Ŷₜ = 7.60 + 3.53Xₜ, with sales (Y) on the vertical axis and advertising (X) on the horizontal axis. The regression line shown in the figure is the line that best fits the data points in the sense that the sum of the squared vertical deviations of the points from the line is a minimum.]
4-4 Simple Regression Analysis

In this section we examine how to (1) calculate the value of a (the vertical intercept) and the value of b (the slope coefficient) of the regression line; (2) conduct tests of significance of parameter estimates; (3) construct confidence intervals for the true parameter; and (4) test for the overall explanatory power of the regression. While all of these tasks are usually performed by the computer, we will do these operations by hand at first with very simple numbers in order to show exactly how regression analysis is performed and what it entails.

The Ordinary Least-Squares Method

We have seen in the previous section that a regression line is the line that best fits the data points in the sense that the sum of the squared deviations from the line is a minimum. The objective of regression analysis is to obtain estimates of a (the vertical intercept) and b (the slope) of the regression line:

\[ \hat{Y}_t = \hat{a} + \hat{b}X_t \tag{4-3} \]

In Equation 4-3, Ŷₜ is the estimate of the firm's sales revenues in year t obtained from the regression line for the level of advertising in year t (Xₜ), and â and b̂ are estimates of parameters a and b, respectively. The deviation or error (eₜ) of each observed sales revenue (Yₜ) from its corresponding value estimated from the regression line (Ŷₜ) is then

\[ e_t = Y_t - \hat{Y}_t = Y_t - \hat{a} - \hat{b}X_t \tag{4-4} \]

The sum of these squared errors or deviations can thus be expressed as

\[ \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2 = \sum_{t=1}^{n} (Y_t - \hat{a} - \hat{b}X_t)^2 \tag{4-5} \]

where Σ (from t = 1 to n) is the sum of all observations, from time period t = 1 to t = n. The estimated values of a and b (that is, â and b̂) are obtained by minimizing the sum of the squared deviations (i.e., by minimizing the value of Equation 4-5).⁴ The value of b̂ is given by

\[ \hat{b} = \frac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (X_t - \bar{X})^2} \tag{4-6} \]

where Ȳ and X̄ are the mean or average values of the Yₜ and the Xₜ, respectively.
The value of â is then obtained from

\[ \hat{a} = \bar{Y} - \hat{b}\bar{X} \tag{4-7} \]

Table 4-3 shows the calculation to determine the values of â and b̂ for the advertising-sales data in Table 4-2. Substituting the values obtained from Table 4-3 into Equation 4-6, we get the value of b̂:

\[ \hat{b} = \frac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (X_t - \bar{X})^2} = \frac{106}{30} = 3.533 \]

⁴ The expressions for â and b̂ are obtained by finding the partial derivatives of Equation 4-5 with respect to â and b̂, setting the resulting two normal equations equal to zero, and solving them simultaneously to obtain Equation 4-6. See Dominick Salvatore and Derrick Reagle, Theory and Problems of Statistics and Econometrics, 2d ed. (New York: McGraw-Hill, 2000), chap. 6.

Table 4-3 Calculations to Estimate Regression Line for Sales-Advertising Problem

Year (t)   Advertising (Xₜ)   Sales (Yₜ)   Xₜ - X̄   Yₜ - Ȳ   (Xₜ - X̄)(Yₜ - Ȳ)   (Xₜ - X̄)²
   1             10              44          -2        -6            12               4
   2              9              40          -3       -10            30               9
   3             11              42          -1        -8             8               1
   4             12              46           0        -4             0               0
   5             11              48          -1        -2             2               1
   6             12              52           0         2             0               0
   7             13              54           1         4             4               1
   8             13              58           1         8             8               1
   9             14              56           2         6            12               4
  10             15              60           3        10            30               9
n = 10     ΣXₜ = 120         ΣYₜ = 500                          Σ = 106          Σ = 30
           X̄ = 12            Ȳ = 50

By then substituting the value of b̂ found above and the values of Ȳ and X̄ found in Table 4-3 into Equation 4-7, we get the value of â:

\[ \hat{a} = \bar{Y} - \hat{b}\bar{X} = 50 - 3.533(12) = 7.60 \]

Thus, the equation of the regression line is

\[ \hat{Y}_t = 7.60 + 3.53X_t \tag{4-8} \]

This regression line indicates that with zero advertising expenditures (i.e., with Xₜ = 0), the expected sales revenue of the firm (Ŷₜ) is $7.60 million. With advertising of $10 million as in the first observation year (i.e., with X₁ = $10 million), Ŷ₁ = $7.60 + $3.53(10) = $42.90 million. On the other hand, with X₁₀ = $15 million, Ŷ₁₀ = $7.60 + $3.53(15) = $60.55 million. Plotting these last two points (10, 42.90) and (15, 60.55) and joining them by a straight line, we obtain the regression line plotted in Figure 4-3.⁵

The estimated regression line could also be used to estimate that the firm's sales revenue with advertising expenditures of $16 million would be $7.60 + $3.53(16) = $64.08 million, or $3.53 million higher than with advertising expenditures of $15 million.
Caution should, however, be exercised in using the regression line to estimate the sales revenue of the firm for advertising expenditures very different from those used in the estimation of the regression line itself. Strictly speaking, the regression line should be used only to estimate the sales revenues of the firm resulting from advertising expenditures that are within the range, or at least near the advertising values, used in the estimation of the regression line. Thus, not much confidence can usually be attached to the value of the estimated a coefficient, since this gives the sales revenues of the firm when advertising expenditures are zero (far off from the observed values). Because of this, we will concentrate our attention on the value of the b̂, or slope, coefficient. The value of b̂ measures the increase in the firm's sales revenues resulting from each unit (in this case, each $1 million) increase in the advertising expenditures of the firm. That is, b̂ = ΔY/ΔX. In the terminology of Chapter 2, b̂ measures the marginal effect on Y (sales) from each unit change in X (advertising).⁶

⁵ Note that the regression line goes through point X̄ = 12, Ȳ = 50. This is always the case and will be useful in the analysis that follows.

Regression analysis is based on a number of crucial assumptions. These are that the error term (1) is normally distributed, (2) has zero expected value or mean, and (3) has constant variance in each time period and for all values of X, and that (4) its value in one time period is unrelated to its value in any other period. These assumptions are required so as to obtain unbiased estimates of the slope coefficient and to be able to utilize probability theory to test for the reliability of the estimates. How this is done is shown next.
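The least-squares calculation of this section can also be sketched in a few lines of Python. This is only an illustration: the function name fit_simple_ols and the variable names are hypothetical, and the data are the ten advertising-sales pairs listed in Table 4-3.

```python
# Advertising (X) and sales (Y), in millions of dollars, from Table 4-3.
advertising = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]
sales = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]


def fit_simple_ols(x, y):
    """Return (a_hat, b_hat) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of Equation 4-6.
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sum_xy / sum_xx
    # Equation 4-7: the fitted line passes through the point of means.
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


a_hat, b_hat = fit_simple_ols(advertising, sales)
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.3f}")

# Estimated sales for advertising of $10 million and $15 million.
for x in (10, 15):
    print(f"X = {x}: estimated Y = {a_hat + b_hat * x:.2f}")
```

Because the sketch keeps the unrounded slope (106/30) instead of 3.53, the estimated sales values differ from the $42.90 million and $60.55 million in the text in the second decimal place.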
Tests of Significance of Parameter Estimates

In the previous section, we estimated the slope coefficient (b̂) from one sample of the advertising-sales data of the firm. If we had used a different sample (say, data for a different 10-year period), we would have obtained a somewhat different estimate of b. The greater the dispersion of (i.e., the more spread out are) the estimated values of b that we would obtain if we were to actually run many regressions for different data samples, the smaller is the confidence that we have in our single estimated value of the b coefficient.

To test the hypothesis that b is statistically significant (i.e., that advertising positively affects sales), we need first of all to calculate the standard error (deviation) of b̂. The standard error of b̂ (s_b̂) is routinely provided as part of the computer printout of the regression analysis, but it is important to know how it is calculated and how it is used in tests of significance. The standard error of b̂ is given by

\[ s_{\hat{b}} = \sqrt{\frac{\sum (Y_t - \hat{Y}_t)^2}{(n - k)\sum (X_t - \bar{X})^2}} = \sqrt{\frac{\sum e_t^2}{(n - k)\sum (X_t - \bar{X})^2}} \tag{4-9} \]

where Yₜ and Xₜ are the actual sample observations of the dependent and independent variables in year t, Ŷₜ is the value of the dependent variable in year t estimated from the regression line, X̄ is the expected value or mean of the independent variable, e is the error term or Yₜ - Ŷₜ, n is the number of observations or data points used in the estimation of the regression line, and k is the number of estimated coefficients in the regression. The value of n - k is called the degrees of freedom (df). Since in simple regression analysis we estimate two parameters, â and b̂, the value of k is 2, and the degrees of freedom is n - 2.

The value of s_b̂ for our advertising-sales example can be calculated by substituting the values from Table 4-4 (an extension of Table 4-3) into Equation 4-9.
⁶ In terms of calculus, b̂ is the derivative of Y with respect to X, or dY/dX.

Table 4-4 Calculations to Estimate the Standard Error of b̂

  (1)       (2)     (3)      (4)         (5)              (6)               (7)
Year (t)    Xₜ      Yₜ       Ŷₜ      Yₜ - Ŷₜ = eₜ    (Yₜ - Ŷₜ)² = eₜ²    (Xₜ - X̄)²
   1        10      44      42.90        1.10             1.2100             4
   2         9      40      39.37        0.63             0.3969             9
   3        11      42      46.43       -4.43            19.6249             1
   4        12      46      49.96       -3.96            15.6816             0
   5        11      48      46.43        1.57             2.4649             1
   6        12      52      49.96        2.04             4.1616             0
   7        13      54      53.49        0.51             0.2601             1
   8        13      58      53.49        4.51            20.3401             1
   9        14      56      57.02       -1.02             1.0404             4
  10        15      60      60.55       -0.55             0.3025             9
n = 10   ΣXₜ = 120  ΣYₜ = 500                        Σeₜ² = 65.4830    Σ(Xₜ - X̄)² = 30
         X̄ = 12    Ȳ = 50

In Table 4-4, the values of Ŷₜ in column 4 are obtained by substituting the various advertising expenditures of column 2 into Equation 4-8. Column 5 is obtained by subtracting the values in column 4 from the corresponding values in column 3, column 6 is obtained by squaring the values in column 5, and column 7 is repeated from Table 4-3. Thus, the value of s_b̂ is equal to

\[ s_{\hat{b}} = \sqrt{\frac{\sum e_t^2}{(n - k)\sum (X_t - \bar{X})^2}} = \sqrt{\frac{65.4830}{(10 - 2)(30)}} = \sqrt{0.2728} = 0.52 \]

Having obtained the value of s_b̂, we next calculate the ratio b̂/s_b̂. This is called the t statistic or t ratio. The higher this calculated t ratio is, the more confident we are that the true but unknown value of b that we are seeking is not equal to zero (i.e., that there is a significant relationship between advertising and sales). For our sales-advertising example, we have

\[ t = \frac{\hat{b}}{s_{\hat{b}}} = \frac{3.53}{0.52} = 6.79 \tag{4-10} \]

In order to conduct an objective or significance test for b̂, we compare the calculated t ratio to the critical value of the t distribution with n - k = 10 - 2 = 8 df given by Table C-2 on page 610.⁷ This t test of the statistical significance of the estimated coefficient is usually performed at the 5 percent level of significance. Thus, we go down the column headed 0.05 (referring to 2.5 percent of the area or probability in each tail of the t distribution, for a total of 5 percent in both tails) in Table C-2 until we reach 8 df. This gives the critical value of t = 2.306 for this two-tailed t test. Since our calculated value of t = 6.79 exceeds the tabular value of t = 2.306 for the 5 percent level of significance with 8 df, we reject the null hypothesis that there is no relationship between X (advertising) and Y (sales) and accept the alternative hypothesis that there is in fact a significant relationship between X and Y. To say that there is a statistically significant relationship between X and Y at the 5 percent level means that we are 95 percent confident that such a relationship exists. In other words, there is less than 1 chance in 20 (i.e., less than 5 percent chance) of being wrong, or accepting the hypothesis that there is a significant relationship between X and Y, when in fact there isn't.

⁷ The t distribution is a bell-shaped, symmetrical distribution about its zero mean that is flatter than the standard normal distribution (see the figures on pages 609 and 610 in Appendix C at the end of the book), so that more of its area falls within the tails. While there is a single standard normal distribution, there is a different t distribution for each sample size, n. However, as n becomes larger, the t distribution approaches the standard normal distribution until, when n > 30, they are approximately equal. Thus, for large samples we can conduct a significance test using the normal distribution without concerning ourselves with degrees of freedom. As a rule of thumb, an estimated parameter is likely to be statistically significant at the 5 percent level if the value of the t statistic for the coefficient is greater than 2. Since in this case we have 10 observations or data points in our advertising-sales example and we estimate two parameters (a and b), the degrees of freedom is n - k = 10 - 2 = 8 and we use the t distribution to conduct our significance test.

Other Aspects of Significance Tests and Confidence Intervals

In the previous section we showed how to conduct statistical tests to show that the slope coefficient is different from zero at the 5 percent level of significance. Other tests of significance are possible as well.
For example, we can construct confidence intervals for the true parameter from the estimated coefficient. Moreover, we could test the hypothesis that the slope coefficient is different from zero at the 1 percent level of significance rather than at the 5 percent level. In that case, we would be allowing for only 1 chance in 100 of being wrong (i.e., of accepting the alternative hypothesis that there is a relationship between X and Y when in fact no such relationship exists). To test the hypothesis at the 1 percent level, we go down the column headed 0.01 in Table C-2 until once again we reach 8 df. The critical value of t that we get from the t table is 3.355. Since the calculated t value of 6.79 exceeds this critical tabular value, we accept the hypothesis that there is in fact a significant relationship between X and Y at the 1 percent level also.

While tests of significance are sometimes conducted at the 1 percent or even at the 10 percent level of significance, it is more common to use the 5 percent level. Note also that the greater the number of degrees of freedom (i.e., the greater the number of observations or data points in relation to the number of estimated parameters in the regression analysis), the smaller are the critical t values in Table C-2, regardless of the level of significance that we choose. Therefore, the greater the number of degrees of freedom, the more likely we are to accept the hypothesis that a statistically significant relationship exists between the independent variable(s) and the dependent variable.

Note that tests of significance are not usually conducted for the a coefficient (the vertical intercept), since this coefficient usually has little or no significance. Also note that in our presentation, we have tested only the hypothesis that b is significantly different from zero.
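The standard error, the t ratio, and the comparison with the tabular values can be sketched in Python as follows. This is a minimal illustration with hypothetical variable names. The critical values 2.306 and 3.355 are simply copied from the text (8 df) because no statistical library is assumed, and the last lines print the interval b̂ ± t·s_b̂ implied by the 5 percent critical value.

```python
import math

# Advertising (X) and sales (Y), in millions of dollars (Table 4-4, columns 2 and 3).
advertising = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]
sales = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]
n = len(sales)
k = 2  # estimated parameters: intercept and slope

x_bar = sum(advertising) / n
y_bar = sum(sales) / n
sum_xx = sum((x - x_bar) ** 2 for x in advertising)
b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales)) / sum_xx
a_hat = y_bar - b_hat * x_bar

# Sum of squared errors around the fitted line.
sum_e2 = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(advertising, sales))

se_b = math.sqrt(sum_e2 / ((n - k) * sum_xx))  # standard error of the slope
t_ratio = b_hat / se_b                         # t statistic for the slope

# Two-tailed critical t values for 8 df, as quoted in the text.
critical_t = {"5 percent": 2.306, "1 percent": 3.355}
print(f"s_b = {se_b:.4f}, t = {t_ratio:.2f}")
for level, t_crit in critical_t.items():
    verdict = "significant" if abs(t_ratio) > t_crit else "not significant"
    print(f"{level} level: critical t = {t_crit}, slope is {verdict}")

# 95 percent interval for the true slope.
half_width = critical_t["5 percent"] * se_b
lower, upper = b_hat - half_width, b_hat + half_width
print(f"95 percent interval: {lower:.2f} to {upper:.2f}")
```

The sketch works with unrounded intermediate values, so it gives a sum of squared errors of about 65.47 and a t ratio of about 6.77, slightly different from the 65.4830 and 6.79 that the hand calculation obtains from coefficients rounded to two decimals.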
Since b̂ can be significantly different from zero by being either negative or positive, we conducted a two-tailed test. That is, we allowed for the possibility of b̂ being significantly positive or significantly negative and examined areas (probabilities) under the t distribution in both tails. We could also test, however, the hypothesis that b is larger or smaller than some specified value. In those cases, we would conduct a single-tailed test and examine the area (probability) that the value of b̂ falls only in the right or in the left tail of the t distribution (and look under the column headed 0.10 for the 5 percent test).

The above concepts can also be used to determine confidence intervals for the true b coefficient. Thus, using the tabular value of t = 2.306 for the 5 percent level of significance (2.5 percent in each tail) and 8 df in our advertising-sales example, we can say that we are 95 percent confident that the true value of b will be between

\[ \hat{b} \pm 2.306(s_{\hat{b}}) = 3.53 \pm 2.306(0.52) = 3.53 \pm 1.20 \]

That is, we are 95 percent confident that the true value of b lies between 2.33 and 4.73. Similarly, we can say that we are 99 percent confident that the true value of b will be between 3.53 ± 3.355(0.52), or 1.79 and 5.27 (the value of t = 3.355 is obtained by going down the column headed 0.01 in Table C-2 until we reach 8 df).

Test of Goodness of Fit and Correlation

Besides testing for the statistical significance of a particular estimated parameter, we can also test for the overall explanatory power of the entire regression. This is accomplished by calculating the coefficient of determination, which is usually denoted by R². The coefficient of determination (R²) is defined as the proportion of the total variation or dispersion in the dependent variable (about its mean) that is explained by the variation in the independent or explanatory variable(s) in the regression. In terms of our advertising-sales example, R² measures how much of the variation in the firm's sales is explained by the variation in its advertising expenditures. The closer the observed data points fall to the regression line, the greater is the proportion of the variation in the firm's sales explained by the variation in its advertising expenditures, and the larger is the value of the coefficient of determination, or R².

We can calculate the coefficient of determination (R²) by defining the total, the explained, and the unexplained or residual variation in the dependent variable, Y. The total variation in Y can be measured by squaring the deviation of each observed value of Y from its mean and then summing. That is,

\[ \text{Total variation in } Y = \sum_{t=1}^{n} (Y_t - \bar{Y})^2 \tag{4-11} \]

Regression analysis breaks up this total variation in Y into two components: the variation in Y that is explained by the independent variable (X) and the unexplained or residual variation in Y. The explained variation in Y is given by Equation 4-12:

\[ \text{Explained variation in } Y = \sum_{t=1}^{n} (\hat{Y}_t - \bar{Y})^2 \tag{4-12} \]

The values of Ŷₜ in Equation 4-12 are obtained by substituting the various observed values of X (the independent variable) into the estimated regression equation. The mean of Y (Ȳ) is then subtracted from each of the estimated values of Yₜ (Ŷₜ). As indicated by Equation 4-12, these differences are then squared and added to get the explained variation in Y. Finally, the unexplained variation in Y is given by Equation 4-13:

\[ \text{Unexplained variation in } Y = \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2 \tag{4-13} \]

That is, the unexplained or residual variation in Y is obtained by first subtracting from each observed value of Y the corresponding estimated value of Ŷ, and then squaring and summing.
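The three measures of variation just defined can be computed directly from the data, as in the following Python sketch (variable names are hypothetical). It fits the line with unrounded coefficients and then evaluates the total, explained, and unexplained sums of squares for the advertising-sales data.

```python
# Advertising (X) and sales (Y), in millions of dollars.
advertising = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]
sales = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]
n = len(sales)

x_bar = sum(advertising) / n
y_bar = sum(sales) / n
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))
         / sum((x - x_bar) ** 2 for x in advertising))
a_hat = y_bar - b_hat * x_bar
fitted = [a_hat + b_hat * x for x in advertising]  # estimated sales for each year

# Total, explained, and unexplained (residual) variation in Y.
total = sum((y - y_bar) ** 2 for y in sales)
explained = sum((f - y_bar) ** 2 for f in fitted)
unexplained = sum((y - f) ** 2 for y, f in zip(sales, fitted))

print(f"total       = {total:.4f}")
print(f"explained   = {explained:.4f}")
print(f"unexplained = {unexplained:.4f}")
print(f"explained + unexplained = {explained + unexplained:.4f}")
```

With unrounded coefficients the explained and unexplained parts add up to the total exactly (up to floating-point error). A hand calculation that uses the rounded line 7.60 + 3.53X gives slightly different component values and a small rounding gap.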
Summarizing, we have

Total variation = explained variation + unexplained variation

\[ \sum (Y_t - \bar{Y})^2 = \sum (\hat{Y}_t - \bar{Y})^2 + \sum (Y_t - \hat{Y}_t)^2 \tag{4-14} \]

This breakdown of the total variation in Y into the explained and the unexplained variation is shown in Figure 4-4 for one particular observation or data point for our advertising-sales example.

Now, the coefficient of determination, R², is defined as the ratio of the explained variation in Y to the total variation in Y. That is,

\[ R^2 = \frac{\text{explained variation in } Y}{\text{total variation in } Y} = \frac{\sum (\hat{Y}_t - \bar{Y})^2}{\sum (Y_t - \bar{Y})^2} \tag{4-15} \]

If all the data points were to fall on the regression line (a most unusual occurrence), all the variation in the dependent variable (Y) would be explained by the variation in the independent or explanatory variable (X), and R² would be equal to 1 or 100 percent. At the opposite extreme, if none of the variation in Y were explained by the variation in X, R² would be equal to zero. Thus, the value of R² can assume any value from 0 to 1.

While the coefficient of determination is also routinely provided in the computer printout of the regression analysis, we will now show how to actually calculate R² for our advertising-sales problem. The calculations are shown in Table 4-5. From the bottom of column 4, we see that the total variation in Y (sales) is $440 million. The explained variation is $373.84 million, as shown at the bottom of column 7. Thus, the coefficient of determination for our advertising-sales problem is

\[ R^2 = \frac{\$373.84}{\$440} = 0.85 \]

This means that 85 percent of the total variation in the firm's sales is accounted for by the variation in the firm's advertising expenditures.

[Figure 4-4: Total, Explained, and Residual Variation. The figure plots sales (Y) against advertising (X) with the regression line Ŷₜ = 7.60 + 3.53Xₜ and the mean Ȳ = 50. The total variation in the dependent variable, Σ(Yₜ - Ȳ)², is equal to the explained variation, Σ(Ŷₜ - Ȳ)², plus the unexplained or residual variation, Σ(Yₜ - Ŷₜ)². For the observation (X = 13, Y = 58), Yₜ - Ȳ = 58 - 50 = 8, Ŷₜ - Ȳ = 53.49 - 50 = 3.49, and Yₜ - Ŷₜ = 58 - 53.49 = 4.51 (Ŷₜ = 53.49 is the estimated value of Yₜ for Xₜ = 13 in the fourth column of Table 4-4).]

Table 4-5 Calculations to Estimate the Coefficient of Determination (R²)

  (1)      (2)     (3)        (4)        (5)      (6)         (7)          (8)
Year (t)   Yₜ    Yₜ - Ȳ   (Yₜ - Ȳ)²     Ŷₜ     Ŷₜ - Ȳ    (Ŷₜ - Ȳ)²    (Yₜ - Ŷₜ)²
   1       44      -6         36       42.90     -7.10      50.4100       1.2100
   2       40     -10        100       39.37    -10.63     112.9969       0.3969
   3       42      -8         64       46.43     -3.57      12.7449      19.6249
   4       46      -4         16       49.96     -0.04       0.0016      15.6816
   5       48      -2          4       46.43     -3.57      12.7449       2.4649
   6       52       2          4       49.96     -0.04       0.0016       4.1616
   7       54       4         16       53.49      3.49      12.1801       0.2601
   8       58       8         64       53.49      3.49      12.1801      20.3401
   9       56       6         36       57.02      7.02      49.2804       1.0404
  10       60      10        100       60.55     10.55     111.3025       0.3025
         Ȳ = 50          Σ = 440                         Σ = 373.8430   Σ = 65.4830

The last column of Table 4-5 gives the unexplained variation in Y (and has been copied from column 6 of Table 4-4). The unexplained variation in Y for our advertising-sales example is $65.48 million. The sum of the explained and unexplained variation in Y ($373.84 + $65.48 = $439.32) is equal to the total variation in Y ($440), except for rounding errors.⁸

Two final things must be pointed out with respect to the coefficient of determination. The first is that in simple regression analysis the square root of the coefficient of determination (R²) is the (absolute value of the) coefficient of correlation, which is denoted by r. That is,

\[ r = \sqrt{R^2} \tag{4-16} \]

This is simply a measure of the degree of association or covariation that exists between variables X and Y. For our advertising-sales example,

\[ r = \sqrt{R^2} = \sqrt{0.85} = 0.92 \]

This means that variables X and Y vary together 92 percent of the time. The coefficient of correlation ranges in value between -1 (if all the sample observation points fall on a negatively sloped straight line) and 1 (for perfect positive linear correlation). It should be noted that the sign of the coefficient of correlation (r) is always the same as the sign of the estimated slope coefficient (b̂).
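As a cross-check on the relationship between r and R², the following Python sketch computes the correlation coefficient directly from the sums of cross products and compares its square with the ratio of explained to total variation. The direct formula r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² Σ(Y - Ȳ)²] is the standard sample correlation formula and is not given in the text; the variable names are hypothetical.

```python
import math

# Advertising (X) and sales (Y), in millions of dollars.
advertising = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]
sales = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]
n = len(sales)

x_bar = sum(advertising) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))
s_xx = sum((x - x_bar) ** 2 for x in advertising)
s_yy = sum((y - y_bar) ** 2 for y in sales)  # total variation in Y

# Correlation coefficient computed directly (standard formula, see lead-in).
r = s_xy / math.sqrt(s_xx * s_yy)

# R squared as explained variation over total variation; for a least-squares
# line with an intercept the explained variation equals b_hat * s_xy.
b_hat = s_xy / s_xx
r_squared = (b_hat * s_xy) / s_yy

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
print("sign of r matches sign of slope:", (r > 0) == (b_hat > 0))
```

The direct calculation carries its own sign, which is why it agrees with the sign of the slope, whereas the square root of R² only recovers the absolute value of r.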
As opposed to regression analysis, which implies that the variation in Y results from the variation in X, correlation analysis measures only the degree of association or covariation between the two variables, without any implication of causality or dependence. In short, we can find the correlation coefficient between any two variables, but we run a regression analysis only if we believe that the variation in one variable (the independent variable, X) affects or somehow results in some variation in Y (the dependent variable).

This brings us to the second point. That is, although regression analysis implies causality (i.e., that the variation in X causes the variation in Y), only theory can tell us if we can expect the variation in X to result in the variation in Y. In fact, it is possible that a high coefficient of determination (and correlation) between X and Y may be due to some other factor that affects both X and Y and that is not included in the regression analysis. For example, expenditures on food and housing may both depend on the level of consumers' income rather than on each other. In such a case, we would simply say that there is correlation or covariation between X and Y without identifying one variable (X) as the independent or explanatory variable.

⁸ Note that once we obtain two of the three values of the total, explained, and unexplained variation in Y, we can obtain the remaining measure simply by subtraction.

4-5 Multiple Regression Analysis

We now extend the simple regression model to multiple regression analysis. We will show how to estimate the regression parameters, how to conduct tests of their statistical significance, and how to measure and test the overall explanatory power of the entire regression.
The Multiple Regression Model

When the dependent variable that we seek to explain is hypothesized to depend on more than one independent or explanatory variable, we have multiple regression analysis. For example, the firm's sales revenue may be postulated to depend not only on the firm's advertising expenditures (as examined in Section 4-4) but also on its expenditures on quality control. The regression model can then be written as

\[ Y = a + b_1X_1 + b_2X_2 \tag{4-17} \]

where Y is the dependent variable referring to the firm's sales revenue, X₁ refers to the firm's advertising expenditures, and X₂ refers to its expenditures on quality control. The coefficients a, b₁, and b₂ are the parameters to be estimated.

The a coefficient is the constant or vertical intercept and gives the value of Y when both X₁ and X₂ are equal to zero. On the other hand, b₁ and b₂ are the slope coefficients. They measure the change in Y per unit change of X₁ and X₂, respectively. Specifically, b₁ measures the change in sales (Y) per unit change in advertising expenditures (X₁), while holding quality-control expenditures (X₂) constant. Similarly, b₂ measures the change in Y per unit change in X₂ while holding X₁ constant. That is, b₁ = ΔY/ΔX₁, while b₂ = ΔY/ΔX₂.⁹ In our sales-advertising and quality-control problem we postulate that both b₁ and b₂ are positive, or that the firm can increase its sales by increasing its expenditures for advertising and quality control.

The model can also be generalized to any number of independent or explanatory variables (k′), as indicated in Equation 4-18:

\[ Y = a + b_1X_1 + b_2X_2 + \cdots + b_{k'}X_{k'} \tag{4-18} \]

The only assumptions made in multiple regression analysis in addition to those made for simple regression analysis are that the number of independent or explanatory variables in the regression be smaller than the number of observations and that there be no perfect linear correlation among the independent variables.¹⁰

⁹ In terms of calculus, b₁ = ∂Y/∂X₁, while b₂ = ∂Y/∂X₂. Thus, b₁ and b₂ are often referred to as the "partial regression coefficients."

¹⁰ If the number of independent or explanatory variables (the X's) is equal to or larger than the number of observations, or if there is a perfect linear relationship among some or all of the independent or explanatory variables, the regression equation cannot be estimated.

Table 4-6 Yearly Expenditures on Advertising and Quality Control, and Sales of the Firm (in millions of dollars)

Year (t)               1    2    3    4    5    6    7    8    9   10
Advertising (X₁)      10    9   11   12   11   12   13   13   14   15
Quality control (X₂)   3    4    3    3    4    5    6    7    7    8
Sales revenue (Y)     44   40   42   46   48   52   54   58   56   60

Table 4-7 Computer Results of Regression of Y on X₁ and X₂

SMPL 1-10
10 Observations
LS // Dependent variable is Y

Variable     Coefficient    Standard Error    T Statistic
Constant      17.9437         5.91914          3.03147
X₁             1.87324        0.70334          2.66335
X₂             1.91549        0.68101          2.81272

R squared             0.930154     Mean of dependent var    50.0000
Adjusted R squared    0.910198     SD of dependent var       6.992061
SE of regression      2.095311     Sum of squared resid     30.73242
Durbin-Watson stat    1.841400     F statistic              46.6100
Log likelihood      -19.80301

The process of estimating the parameters or coefficients of a multiple regression equation is, in principle, the same as in simple regression analysis, but since the calculations are much more complex and time-consuming, they are invariably done with computers. The computer also routinely provides the standard error of the estimates, the t statistics, the coefficient of multiple determination, and several other important statistics that are used to conduct other statistical tests of the results (to be examined later). All that is required is to be able to set up the regression analysis, feed the data into the computer, and interpret the results. For example, if we regress the firm's sales (Y) on its expenditures for advertising (X₁) and quality control (X₂) using the data in Table 4-6 (an extension of Table 4-2), we obtain the results given in Table 4-7.¹¹
From the results shown in Table 4-7, we can write the following regression equation:

\[ \hat{Y}_t = 17.944 + 1.873X_{1t} + 1.915X_{2t} \tag{4-19} \]

t statistic:          (2.663)      (2.813)

¹¹ The results given in Table 4-7 are in the form provided by a standard computer program (TSP). Other computer programs (such as SPSS, EViews, and RATS) usually provide the same general information in a similar format. Different computer programs, however, usually give slightly different results because of differences in rounding.

These results indicate that for each $1 million increase in expenditures on advertising and quality control, the sales of the firm increase by $1.87 million (the estimated coefficient of X₁) and $1.92 million (the estimated coefficient of X₂), respectively. To perform t tests for the statistical significance of the estimated parameters or coefficients, we need to determine the critical value of t from the table of the t distribution. At the 0.05 level of significance for n - k = 10 - 3 = 7 df (where k is the number of estimated parameters, including the constant term), this is 2.365, obtained by going down the column headed 0.05 in Table C-2 (for the two-tailed test with 2.5 percent of the area under each tail of the t distribution) until we reach 7 df. Since the values of the calculated t statistics (2.663 and 2.813) exceed the critical t value of 2.365, we conclude that both parameters are statistically different from zero at the 5 percent level of significance.¹²

The Coefficient of Determination and Adjusted R²

As in simple regression analysis, the coefficient of determination measures the proportion of the total variation in the dependent variable that is explained by the variation in the independent or explanatory variables in the regression. From Table 4-7, we see that for our example the coefficient of determination, or R², is 0.93.
This means that variation in the firm's expenditures on advertising and quality control explains 93 percent of the variation in the firm's sales revenues. This is larger than the R² of 0.85 that we obtained for the simple regression of sales on advertising expenditures alone that we found on page 159. This was to be expected. That is, as more relevant independent or explanatory variables are included in the regression, we generally expect a larger proportion of the total variation in the dependent variable to be "explained."

However, in order to take into consideration that the number of degrees of freedom declines as additional independent or explanatory variables are included in the regression, we calculate the adjusted R² (R̄²) as

\[ \bar{R}^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k}\right) \tag{4-20} \]

where n is the number of observations or sample data points and k is the number of parameters or coefficients estimated. For example, in the regression analysis of Y on X₁ and X₂, n = 10, n - k = 10 - 3 = 7, and R² = 0.930154. Substituting these values into Equation 4-20, we get the value of R̄² = 0.910198 (the same as in the computer printout given by Table 4-7). This means that when due consideration is given to the fact that including the firm's expenditures on quality control as an additional explanatory variable in the regression reduces the degrees of freedom, the proportion of the total variation in sales explained by the regression is 91 percent rather than 93 percent. This is still larger than the 85 percent explained by advertising as the single independent variable in the simple regression.¹³

¹² If the t statistics were not provided by the computer printout, [...]. We will follow this practice in the rest of the text unless otherwise indicated.
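As a rough check on the printed results, the two-variable regression can be reproduced with plain Python by solving the normal equations in deviation form. Several things here are assumptions rather than the text's own method: the quality-control series x2 is a reading of Table 4-6, checked only by the fact that it reproduces the printed coefficients; the closed-form expressions for b₁, b₂, and their standard errors are the standard two-regressor least-squares formulas; and all names are hypothetical.

```python
import math

# Data in millions of dollars. x2 (quality control) is an assumed reading of
# Table 4-6; it reproduces the coefficients printed in Table 4-7.
x1 = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]   # advertising
x2 = [3, 4, 3, 3, 4, 5, 6, 7, 7, 8]            # quality control
y = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]   # sales revenue
n, k = len(y), 3  # k counts the constant term and the two slopes


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y, syy = cross(x1, y), cross(x2, y), cross(y, y)
det = s11 * s22 - s12 ** 2  # zero only under perfect linear correlation

# Least-squares slopes and intercept (normal equations in deviation form).
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
a = mean(y) - b1 * mean(x1) - b2 * mean(x2)

explained = b1 * s1y + b2 * s2y
r2 = explained / syy
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)  # Equation 4-20

# Standard errors and t statistics of the two slopes.
s2 = (syy - explained) / (n - k)  # estimated error variance
se_b1 = math.sqrt(s2 * s22 / det)
se_b2 = math.sqrt(s2 * s11 / det)

print(f"a = {a:.4f}, b1 = {b1:.5f}, b2 = {b2:.5f}")
print(f"t(b1) = {b1 / se_b1:.3f}, t(b2) = {b2 / se_b2:.3f}")
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}")
```

With more than two explanatory variables these closed-form expressions no longer apply, and a matrix-based routine would be the more natural choice.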
The inclusion of expenditures on quality control in the regression analysis also leads to a very different value of b̂₁ (the estimated coefficient of advertising expenditures in the multiple regression), as compared to the value of b̂ (the estimated coefficient of advertising expenditures in the simple regression). The value of b̂ was found to be 3.53 in Equation 4-8, while the value of b̂₁ is 1.87 in Equation 4-19. Thus, omission of an important explanatory variable (expenditures for quality control, in this case) from the simple regression gives biased results for the estimated slope coefficient. Specifically, simple regression analysis attributes a much greater influence of advertising on sales than warranted. In other words, advertising gets credited for some of the influence on sales that is, in fact, due to expenditures on quality control. Thus, it is crucial to include in the regression analysis all important independent or explanatory variables.

Analysis of Variance

The overall explanatory power of the entire regression can be tested with the analysis of variance. This uses the value of the F statistic, or F ratio, which is also provided by the computer printout. Specifically, the F statistic is used to test the hypothesis that the variation in the independent variables (the X's) explains a significant proportion of the variation in the dependent variable (Y). Thus, we can use the F statistic to test the null hypothesis that all the regression coefficients are equal to zero against the alternative hypothesis that they are not all equal to zero. The value of the F statistic is given by

\[ F = \frac{\text{explained variation}/(k - 1)}{\text{unexplained variation}/(n - k)} \tag{4-21} \]

where, as usual, n is the number of observations and k is the number of estimated parameters or coefficients in the regression.
It is because the F statistic is the ratio of two variances that this test is often referred to as the "analysis of variance." The F statistic can also be calculated in terms of the coefficient of determination as follows:

\[ F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} \tag{4-22} \]

Using the values of R² = 0.930154, n = 10, and k = 3 for our example, we obtain F = 46.61, the same value as in the computer printout in Table 4-7.

To conduct the F test or analysis of variance, we compare the calculated or regression value of the F statistic with a critical value from the table of the F distribution.

¹³ Note that the adjustment (reduction) in R² to obtain R̄² is smaller the larger the value of n is in relation to k.
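The F value can be checked numerically in two ways, as in the Python sketch below. The inputs (R² = 0.930154, a sum of squared residuals of 30.73242, and a total variation of 440) are taken from the printed results and tables; the variable names are hypothetical.

```python
# Inputs taken from the regression printout and the earlier tables.
r_squared = 0.930154       # R squared
unexplained = 30.73242     # sum of squared residuals
total_variation = 440.0    # total variation in Y about its mean
n, k = 10, 3               # observations and estimated parameters

# F computed from the coefficient of determination.
f_from_r2 = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))

# F computed as a ratio of explained to unexplained variation,
# each divided by its degrees of freedom.
explained = total_variation - unexplained
f_from_variation = (explained / (k - 1)) / (unexplained / (n - k))

print(f"F from R^2:       {f_from_r2:.2f}")
print(f"F from variation: {f_from_variation:.2f}")
```

Both routes give the same F value to two decimals, because (1 - R²) is by definition the unexplained share of the total variation.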
