
STUDY QUESTIONS on LOGARITHMS in REGRESSION.

All of the questions deal with logarithms to the base e, not base 10 (although many of the properties are true of base 10, also).

1  Show that log(1000) = log(10) + log(100). Why should this be true?
2  Show that log(100) = log(1000) - log(10). Why should this be true?
3  Show that log(1000) = 3*log(10). Why should this be true?
4  Show that log(.001) = -3*log(10). Why should this be true?
5  Show that antilog(5) = antilog(2)*antilog(3).
6  Show that antilog(4) = antilog(5)*antilog(-1).
7  Show that log(1 + 0.09) is approximately 0.09.
8  Show that log(1 - 0.09) is approximately -0.09.
9  Show that exp(.09) is approximately 1.09.
10 Show that exp(-.09) is approximately 0.91.
11 Suppose that $100 grows at a compounded annual interest rate of 6%. What will the principal be at the end of 12 years?
12 Show that your answer for the previous Q is approximately 100*exp(0.72). Why should this be true?
13 How do the previous two questions illustrate the "Rule of 72"?

The Moore Data worksheet in this workbook shows the growth of processor chip computing power over time. It is the empirical basis of a computer industry rule called Moore's Law, named after Gordon Moore, one of the founders of Intel. Consider the problem of trying to extrapolate Moore's Law into the future in order to estimate the power of future processors.

14 What is the response variable and what is the predictor variable?
15 Are these data cross-sectional or time series?
16 Are these data numerical or categorical?
17 Are these data discrete or continuous?
18 What do we mean by "these data"? Just the power? Just the year? Both? None of the above?
19 What is the population for these data?
20 Could these data be both a random sample and a regression process?

Consider the issue of whether these data are a regression process. Assume the population includes all capacities that could have been introduced, and that the data are a random sample from this population. (The latter might be a stretch.) Run the appropriate regression, using all of the data in cols A and B, and answer the following:

21 What must we check to verify whether these data are a regression process?
22 Why is it important to check these conditions?
23 What difference does it make if the "N" assumption is false?
24 State the hypothesis for the "L" assumption.
25 Qualitatively assess the "L" assumption.
26 Suppose POWER = A*exp(B*YEAR). Is this a linear relationship between POWER and YEAR?
27 How can it be made linear?

Run a regression of log(POWER) on YEAR and answer the following questions:

28 Qualitatively assess the "L" assumption.
29 Supposing LHI true, use the linear regression model to estimate POWER in the year 2000.
30 Supposing LHI, what would an average margin of error be for your estimate in the preceding question?
31 Supposing LHI, use the log model to estimate POWER in the year 2000.
32 Supposing LHI, what would an average margin of error be for your estimate in the preceding question?
33 How reasonable do you think the estimates are in Q29 and Q31?
34 If the log model were true, about how long would it take processor power to double?
35 What major criticism could be made of Moore's Law on statistical grounds?

Omit 1980 and its datum. Then re-run a regression of log(POWER) on YEAR and answer the following questions:

36 State the hypothesis for the "H" assumption.
37 Quantitatively assess the "H" assumption. Use significance level = .05.
38 State the hypothesis for the "I" assumption.
39 Quantitatively assess the "I" assumption. Use significance level = .05.
41 State the hypothesis for the "N" assumption.
42 Quantitatively assess the "N" assumption. Use significance level = .05.
43 Evaluate the "L" assumption.
44 Supposing LHI, use the new log model to estimate POWER in the year 2000.
45 Supposing LHI, what would be an average margin of error for your estimate?
46 If the new log model were true, about how long would it take processor power to double?
47 How can we interpret the meaning of the "standard error of the estimate" (0.1795)?
48 Supposing LHIN, provide a 90% confidence interval for your estimate in Q44.


Answers to Logarithm Questions

1  Show that log(1000) = log(10) + log(100). Why should this be true?
   ln(1000) = 6.907755279; ln(10) = 2.302585093; ln(100) = 4.605170186; sum = 6.907755279.
   It is true because logs convert multiplication into addition: log(10*100) = log(10) + log(100).
2  Show that log(100) = log(1000) - log(10). Why should this be true?
   ln(100) = 4.605170186; ln(1000) = 6.907755279; ln(10) = 2.302585093; difference = 4.605170186.
   It is true because logs convert division into subtraction: log(1000/10) = log(1000) - log(10).
3  Show that log(1000) = 3*log(10). Why should this be true?
   ln(1000) = 6.907755279; ln(10) = 2.302585093; 3*ln(10) = 6.907755279.
   It is true because logs convert exponentiation into multiplication: log(1000) = log(10^3) = 3*log(10).
4  Show that log(.001) = -3*log(10). Why should this be true?
   ln(.001) = -6.907755279; ln(10) = 2.302585093; -3*ln(10) = -6.907755279.
   It is true because logs convert exponentiation into multiplication: log(.001) = log(10^-3) = -3*log(10).
5  Show that antilog(5) = antilog(2)*antilog(3).
   antilog(5) = exp(5) = 148.4131591; antilog(2) = exp(2) = 7.389056099; antilog(3) = exp(3) = 20.08553692; product = 148.4131591.
6  Show that antilog(4) = antilog(5)*antilog(-1).
   antilog(4) = exp(4) = 54.59815003; antilog(5) = exp(5) = 148.4131591; antilog(-1) = exp(-1) = 0.367879441; product = 54.59815003.
7  Show that log(1 + 0.09) is approximately 0.09.
   log(1 + .09) = 0.086177696.
8  Show that log(1 - 0.09) is approximately -0.09.
   log(1 - .09) = -0.094310679.
9  Show that exp(.09) is approximately 1.09.
   exp(.09) = 1.094174284.
10 Show that exp(-.09) is approximately 0.91.
   exp(-.09) = 0.913931185.
11 Suppose that $100 grows at a compounded annual interest rate of 6%. What will the principal be at the end of 12 years?
   100*(1 + .06)^12 = 201.2196472.
12 Show that your answer for the previous Q is approximately 100*exp(0.72). Why should this be true?
   100*exp(.72) = 205.4433211.
   It is true because 1 + .06 is approximately exp(.06) [see Q7], so 100*(1 + .06)^12 = approx 100*exp(.06)^12 = 100*exp(.72).
13 How do the previous two questions illustrate the "Rule of 72"?
   In 12 years, the principal roughly doubled at 6%. The "Rule of 72" says doubling time (years) times compounding rate (percent) = approx 72.
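The arithmetic behind Q11-Q13 can be checked outside the spreadsheet. Below is a minimal Python sketch (not part of the original workbook) of the compound-interest calculation and the Rule-of-72 approximation; the variable names are illustrative.

import math

principal = 100.0
rate = 0.06      # 6% compounded annually
years = 12

exact = principal * (1 + rate) ** years        # Q11: about 201.22
approx = principal * math.exp(rate * years)    # Q12: about 205.44, since ln(1.06) is approximately 0.06

# Q13, Rule of 72: doubling time (years) * rate (percent) is approximately 72
exact_doubling = math.log(2) / math.log(1 + rate)   # about 11.9 years at 6%
rule_of_72 = 72 / (rate * 100)                      # 12 years

print(exact, approx, exact_doubling, rule_of_72)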

The Moore Data worksheet in this workbook shows the growth of processor chip computing power over time. It is the empirical basis of a computer industry rule called Moore's Law, named after Gordon Moore, one of the founders of Intel. Consider the problem of trying to extrapolate Moore's Law into the future in order to estimate the power of future processors.

14 What is the response variable and what is the predictor variable?
   Response = POWER; predictor = YEAR.
15 Are these data cross-sectional or time series?
   Time series.
16 Are these data numerical or categorical?
   Numerical.


17 Are these data discrete or continuous?
   Discrete (processor capacities are certain powers of 2, here rounded off in the usual "k" notation).
18 What do we mean by "these data"? Just the power? Just the year? Both? None of the above?
   The set of paired values (YEAR, POWER).
19 What is the population for these data?
   From one point of view the data are the population. However, we can imagine that there were alternative capacities that could have been introduced in the given years, or in other years, but were not. In the latter case, the population includes those alternatives, as it does for the future.
20 Could these data be both a random sample and a regression process?
   Yes; the assumptions of a regression process and a random sample are not mutually exclusive.


Consider the issue of whether these data are a regression process. Assume the population includes all capacities that could have been introduced, and that the data are a random sample from this population. (The latter might be a stretch.) Run the appropriate regression, using all of the data in cols A and B, and answer the following:

21 What must we check to verify whether these data are a regression process?
   The residuals must be level; the residuals must be homoscedastic; the residuals must be independent.
22 Why is it important to check these conditions?
   If the residuals are LHI, then we can rest assured that the statistical procedures that have been developed for regression will be valid. Otherwise, those procedures may not be valid.
23 What difference does it make if the "N" assumption is false?
   If the sample size is sufficiently large (n = 6 here is not), the N assumption is required only to use the NORMSDIST function for confidence intervals and hypothesis tests about individual years. Without N, we can still count cases for those purposes, but there are not enough cases here (6) for more than a few confidence levels -- and not a very good estimate for those. If n is sufficiently large, N is not necessary for inferences about the slope or intercept, the correlation or R-square, the F test, or estimating mean capacity for a given year (although the concept of mean capacity for a year may not make sense). But n is not sufficiently large here for any of these inferences, so testing N is critical.
24 State the hypothesis for the "L" assumption.
   The residuals are level.
25 Qualitatively assess the "L" assumption.
   L fails: the residual plot shows large, patterned residuals. The plot of POWER vs YEAR, below, is also clearly not linear.

[Charts: Residuals vs Fits for the linear model; Databit Power per Chip (in 1000s) vs Year.]

26 Suppose POWER = A*exp(B*YEAR). Is this a linear relationship between POWER and YEAR?
   No.
27 How can it be made linear?
   By taking logs of both sides: log(POWER) = log(A) + B*YEAR. Then the "new" Y becomes log(POWER).
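A minimal Python sketch (not in the workbook) of the linearization in Q26-Q27: regress log(POWER) on YEAR by ordinary least squares. The data values come from the Moore Data worksheet; the use of numpy.polyfit is an illustrative choice.

import numpy as np

# Moore Data worksheet values (year, power in 1000s)
year = np.array([1980, 1985, 1988, 1990, 1993, 1995], dtype=float)
power = np.array([64, 256, 1000, 4000, 16000, 64000], dtype=float)

# POWER = A*exp(B*YEAR)  <=>  log(POWER) = log(A) + B*YEAR, which is linear in YEAR
log_power = np.log(power)
B, log_A = np.polyfit(year, log_power, 1)   # returns (slope, intercept)

print(log_A, B)   # roughly -919.78 and 0.4664, matching the regression output later in this workbook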


Run a regression of log(POWER) on YEAR and answer the following questions:

28 Qualitatively assess the "L" assumption.
   L fails: the residual plot shows large, patterned residuals. However, the plot of log(POWER) vs YEAR, below, looks rather linear.

[Charts: Residuals vs Fits for the log model; log(Databit Power per Chip - in K) vs Year.]

29 Supposing LHI true, use the linear regression model to estimate POWER in the year 2000.
   Estimate = -6,530,671.50 + 3,291.37 * 2000 = 52,071.18.
30 Supposing LHI, what would an average margin of error be for your estimate in the preceding question?
   +/- 19,597.62.
31 Supposing LHI, use the log model to estimate POWER in the year 2000.
   Log(Est) = -919.78 + 0.4664 * 2,000.00 = 12.97, so estimate = exp(log(Est)) = 430,262.11.
32 Supposing LHI, what would an average margin of error be for your estimate in the preceding question?
   For log(Est), +/- 0.461560488; so for the estimate, approximately +/- 58.65%. More precisely, lower limit of the margin of error = 271,194 and upper limit = 682,631.
33 How reasonable do you think the estimates are in Q29 and Q31?
   The linear estimate is out of date, already having been achieved in about 1993. The log estimate is also too low, given that the residuals tend to be high in recent years (U-shaped residual plot). For example, the estimate of POWER for 1995 = 41,784, which is too low.
34 If the log model were true, about how long would it take processor power to double?
   Starting from power level P = exp(-919.78 + 0.4664*YEAR), after n more years we would achieve power level 2P = exp(-919.78 + 0.4664*(YEAR + n)). Dividing the latter equation by 2 and equating the result to the former, we have exp(-919.78 + 0.4664*YEAR) = 0.5*exp(-919.78 + 0.4664*(YEAR + n)). Canceling common terms on both sides gives 1 = 0.5*exp(0.4664*n). Taking logs of both sides, log(2) = 0.4664*n, so n = log(2)/0.4664 = 1.486239942. This is Moore's Law: processor power doubles about every year and a half (18 months), on average.
35 What major criticism could be made of Moore's Law on statistical grounds?
   The log model fits only crudely. The residual plot shows that Moore's Law underestimated gains both early on and recently, and overestimated gains in the middle period. There are large uncertainties in its estimates: on average about 50-60% (st. dev. of residuals in log scale = 0.46156).
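A short Python check (not from the workbook) of the Q29, Q31, and Q34 arithmetic, using the rounded coefficients quoted above; the results differ slightly from the worksheet values because of that rounding.

import math

lin_intercept, lin_slope = -6_530_671.50, 3_291.37   # POWER on YEAR (rounded)
log_intercept, log_slope = -919.78, 0.4664           # log(POWER) on YEAR (rounded)

# Q29: linear-model estimate for the year 2000
print(lin_intercept + lin_slope * 2000)              # about 52,068 (worksheet: 52,071)

# Q31: log-model estimate for the year 2000
print(math.exp(log_intercept + log_slope * 2000))    # about 450,000 (worksheet, unrounded fit: 430,262)

# Q34: the doubling time n solves exp(B*n) = 2, so n = ln(2)/B
print(math.log(2) / log_slope)                       # about 1.49 years, i.e. roughly 18 months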

Omit 1980 and its datum. Then re-run a regression of log(POWER) on YEAR and answer the following questions:

36 State the hypothesis for the "H" assumption.
   The residuals are homoscedastic.
37 Quantitatively assess the "H" assumption. Use significance level = .05.
   The White test p-value is 0.807998, which is above all reasonable significance levels. So accept H.
38 State the hypothesis for the "I" assumption.
   The residuals are independent.
39 Quantitatively assess the "I" assumption. Use significance level = .05.
   The p-value for the autocorrelation test is 0.081768782 > 0.05, so accept the hypothesis of independence.
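The autocorrelation test in Q39 can be reproduced with a few lines of Python. This sketch (not from the workbook) assumes the test used is the common one that compares the lag-1 autocorrelation of the residuals to a standard error of 1/sqrt(n) with a two-sided normal p-value; the helper name is illustrative.

import numpy as np
from scipy.stats import norm

def lag1_autocorr_test(residuals):
    """Lag-1 autocorrelation of the residuals and a two-sided z test
    using the large-sample standard error 1/sqrt(n)."""
    r = np.asarray(residuals, dtype=float) - np.mean(residuals)
    n = len(r)
    autocorr = np.sum(r[1:] * r[:-1]) / np.sum(r ** 2)
    se = 1.0 / np.sqrt(n)
    z = autocorr / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return autocorr, se, z, p_value

# Residuals of the no-1980 log model, taken from the regression output later in this workbook
print(lag1_autocorr_test([0.106, -0.181, 0.105, -0.158, 0.128]))
# Worksheet values: autocorr = -0.7784, StErr = 0.4472, p-value = 0.0818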


41 State the hypothesis for the "N" assumption.
   The residuals are normally distributed.
42 Quantitatively assess the "N" assumption. Use significance level = .05.
   The Lilliefors p-value is > 0.15 > 0.05, so accept N.
43 Evaluate the "L" assumption.
   Visually, the plot of residuals vs fits (below) does not look good for L. However, there are only 5 points, so this degree of "smiley" might be due to chance variation. That can be tested quantitatively by fitting the equation RESIDUAL = A + B1*Fit + B2*Fit^2 and testing Ho: B2 = 0 (see the sketch after the chart). The result is that Ho is accepted, so L is accepted. BTW, when the same quantitative test is applied in Q28 (including the 1980 data), Ho (L) is rejected.

[Chart: Residuals vs Fits for the log model without 1980.]
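A Python sketch (not from the workbook) of the curvature test described in Q43: fit RESIDUAL = A + B1*Fit + B2*Fit^2 and test Ho: B2 = 0 with an ordinary t test. The helper name and the use of numpy/scipy are illustrative assumptions about how the worksheet carried out the test.

import numpy as np
from scipy import stats

def curvature_test(fits, residuals):
    """Regress residuals on (1, fit, fit^2) and t-test the quadratic coefficient."""
    x = np.asarray(fits, dtype=float)
    y = np.asarray(residuals, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # covariance of the coefficients
    t_b2 = coef[2] / np.sqrt(cov[2, 2])
    p_value = 2 * stats.t.sf(abs(t_b2), df=n - k)
    return coef[2], t_b2, p_value

# Fits and residuals of the no-1980 log model (from the worksheet output)
fits = [5.439, 7.089, 8.189, 9.839, 10.938]
resid = [0.106, -0.181, 0.105, -0.158, 0.128]
print(curvature_test(fits, resid))   # worksheet conclusion: Ho accepted, so L is accepted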

44 Supposing LHI, use the new log model to estimate POWER in the year 2000.
   Log(Est) = -1,086.12 + 0.5499 * 2,000.00 = 13.69, so estimate = exp(log(Est)) = 880,225.96.
45 Supposing LHI, what would be an average margin of error for your estimate?
   For log(Est), +/- 0.179463282; so for the estimate, approximately +/- 19.66%. More precisely, lower limit of the margin of error = 735,621 and upper limit = 1,053,256. Notice that the margin of error is much reduced from the log model with the 1980 data (Q32).
46 If the new log model were true, about how long would it take processor power to double?
   Starting from power level P = exp(-1086.12 + 0.5499*YEAR), after n more years we would achieve power level 2P = exp(-1086.12 + 0.5499*(YEAR + n)). Dividing the latter equation by 2 and equating the result to the former, we have exp(-1086.12 + 0.5499*YEAR) = 0.5*exp(-1086.12 + 0.5499*(YEAR + n)). Canceling common terms on both sides gives 1 = 0.5*exp(0.5499*n). Taking logs of both sides, log(2) = 0.5499*n, so n = log(2)/0.5499 = 1.260485316, or about every 15 months.
47 How can we interpret the meaning of the "standard error of the estimate" (0.1795)?
   As the approximate +/- percentage margin of error for an estimate of POWER.
48 Supposing LHIN, provide a 90% confidence interval for your estimate in Q44.
   In log scale, lower limit = 13.39271682 and upper limit = 13.98315102. In POWER scale, lower limit = 655,213.77 and upper limit = 1,182,511.38. (Actually, even with "N", the T distribution should be used because of the small sample size. So a better answer is: in POWER scale, lower limit = 576,996.57 and upper limit = 1,342,811.69.)
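A Python sketch (not from the workbook) of the Q48 interval: form the 90% interval on the log scale from the standard error of the estimate, then exponentiate. The t version with n - 2 = 3 degrees of freedom corresponds to the worksheet's "better answer"; exact limits differ slightly here because the coefficients are rounded.

import math
from scipy import stats

log_est = -1086.12 + 0.5499 * 2000   # Q44 estimate on the log scale, about 13.69
se_est = 0.1795                      # standard error of the estimate
n = 5                                # observations used (1980 omitted)

for label, crit in [("normal", stats.norm.ppf(0.95)),
                    ("t, 3 df", stats.t.ppf(0.95, df=n - 2))]:
    lo = math.exp(log_est - crit * se_est)
    hi = math.exp(log_est + crit * se_est)
    print(label, lo, hi)
# Worksheet: normal limits about 655,214 to 1,182,511; t limits about 576,997 to 1,342,812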












Databit_power_per_chip (in 1000s)

  Year     Power
  1980        64
  1985       256
  1988     1,000
  1990     4,000
  1993    16,000
  1995    64,000

Simple linear regression of POWER on YEAR.

  Year     Power    Fitted Values      Residuals
  1980        64      -13,756.656     13,820.656
  1985       256        2,700.201     -2,444.201
  1988     1,000       12,574.314    -11,574.314
  1990     4,000       19,157.057    -15,157.057
  1993    16,000       29,031.171    -13,031.171
  1995    64,000       35,613.913     28,386.087

Results of multiple regression for Power

Summary measures
  Multiple R        0.7164
  R-Square          0.5132
  Adj R-Square      0.3915
  StErr of Est      19,597.62

ANOVA Table
  Source        df    SS               MS               F         p-value
  Explained      1    1,619,552,096    1,619,552,096    4.2169    0.1093
  Unexplained    4    1,536,267,136      384,066,784

Regression coefficients
              Coefficient      Std Err         t-value    p-value
  Constant    -6,530,671.50    3,187,203.00    -2.0490    0.1098
  Year             3,291.37        1,602.81     2.0535    0.1093

White test results:        n = 6, White stat = 2.199, p-value = 0.138
Lilliefors test results:   n = 6, Lilliefors stat = 0.245, approx p-value > 0.15
Autocorrelations for Residuals:  lag 1 autocorr = -0.0016, StErr = 0.4082

[Charts: Databit Power per Chip (in 1000s) vs Year; Residuals vs Fits.]
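For readers working outside the spreadsheet, the summary measures above can be reproduced with scipy (a sketch, not part of the workbook):

import numpy as np
from scipy import stats

year = np.array([1980, 1985, 1988, 1990, 1993, 1995], dtype=float)
power = np.array([64, 256, 1000, 4000, 16000, 64000], dtype=float)

fit = stats.linregress(year, power)
print(fit.intercept, fit.slope)       # about -6,530,672 and 3,291.4
print(fit.rvalue ** 2, fit.pvalue)    # R-square about 0.513, slope p-value about 0.109

residuals = power - (fit.intercept + fit.slope * year)
st_err_est = np.sqrt(np.sum(residuals ** 2) / (len(year) - 2))
print(st_err_est)                     # about 19,598, the StErr of Est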

Regress log(POWER) on YEAR.

  Year     Power    logPower    Fitted Values    Residuals
  1980        64     4.15888            3.645        0.514
  1985       256     5.54518            5.976       -0.431
  1988     1,000     6.90776            7.376       -0.468
  1990     4,000     8.29405            8.308       -0.014
  1993    16,000     9.68034            9.708       -0.027
  1995    64,000    11.06664           10.640        0.426

Results of multiple regression for logPower

Summary measures
  Multiple R        0.9871
  R-Square          0.9745
  Adj R-Square      0.9681
  StErr of Est      0.4616

ANOVA Table
  Source        df    SS         MS         F           p-value
  Explained      1    32.5173    32.5173    152.6360    0.0002
  Unexplained    4     0.8522     0.2130

Regression coefficients
              Coefficient    Std Err    t-value     p-value
  Constant      -919.7806    75.0646    -12.2532    0.0003
  Year             0.4664     0.0377     12.3546    0.0002

White test results:        n = 6, White stat = 2.069, p-value = 0.150
Lilliefors test results:   n = 6, Lilliefors stat = 0.185, approx p-value > 0.15
Autocorrelations for Residuals:  lag 1 autocorr = -0.0288, StErr = 0.4082

[Charts: Residuals vs Fits; Residuals in Time Order; Databit Power per Chip (in 1000s) vs Year; log(Databit Power per Chip - in K) vs Year.]

Regress log(POWER) on YEAR, but omit 1980 data.

  Year     Power    logPower    Fitted Values    Residuals
  1985       256     5.54518            5.439        0.106
  1988     1,000     6.90776            7.089       -0.181
  1990     4,000     8.29405            8.189        0.105
  1993    16,000     9.68034            9.839       -0.158
  1995    64,000    11.06664           10.938        0.128

Results of multiple regression for logPower

Summary measures
  Multiple R        0.9975
  R-Square          0.9949
  Adj R-Square      0.9933
  StErr of Est      0.1795

ANOVA Table
  Source        df    SS         MS         F           p-value
  Explained      1    18.9904    18.9904    589.6356    0.0002
  Unexplained    3     0.0966     0.0322

Regression coefficients
              Coefficient    Std Err    t-value     p-value
  Constant     -1086.1221    45.0706    -24.0982    0.0002
  Year             0.5499     0.0226     24.2824    0.0002

White test results:        n = 5, White stat = 0.059, p-value = 0.808
Lilliefors test results:   n = 5, Lilliefors stat = 0.246, approx p-value > 0.15
Autocorrelations for Residuals:  lag 1 autocorr = -0.7784, StErr = 0.4472, t-value = -1.7405141, p-value = 0.08176878

[Chart: Residuals vs Fits.]
