
2. Basic Tools for Forecasting


Introduction
2.1 Types of data
2.2 Time series plots
2.3 Scatter plots
2.4 Summarizing the data
    2.4.1 Notation
    2.4.2 Measures of average
    2.4.3 Measures of variation
    2.4.4 Assessing variability
    2.4.5 An example: hot growth companies
2.5 Correlation
2.6 Transformations
    2.6.1 Differences and growth rates
    2.6.2 The log transform
2.7 How to measure forecasting accuracy?
    2.7.1 Measures of forecast accuracy
    2.7.2 Measures of absolute error
2.8 Prediction intervals
2.9 Basic Principles
Summary
References
Exercises
Mini-case 2.1: Are the outcomes of NFL games predictable?
Mini-case 2.2: Whither Wal-Mart?
Mini-case 2.3: Economic recessions

DDD: Draw the doggone diagram! (In memory of the late David Hildebrand, who stated the matter rather more forcibly!)

Introduction
In most of the chapters in this book we assume that we have available some kind of database from which to build numerical forecasts. The data may be incomplete or subject to error, they may not relate directly to the key variables of interest, and they may not be available in a timely fashion. Nevertheless, they are all we have, and we must learn to understand and respect them, if not to love them; indeed, such is the basis of any good relationship!

At a conceptual level, we need to understand how the data are compiled and how they relate to the forecasting issues we seek to address. We must then examine the data to understand their structure and main features, and to summarize the information available. In section 2.1, we examine the types of data that arise in practice, and then examine graphical summaries in sections 2.2 and 2.3. Section 2.4 describes the basic numerical summaries that are useful, and we then move on to measures of association in section 2.5. Sometimes the original form of the data is not appropriate and some kind of transformation or modification is needed; this topic is the focus of section 2.6. Methods for the generation of forecasts are the focus of later chapters, but in this chapter we also consider the evaluation of outputs from the forecasting process. In section 2.7 we examine measures of forecasting accuracy and the evaluation of forecasting performance, then turn to prediction intervals in section 2.8. The chapter ends with a summary and a discussion of some underlying principles.

2.1 Types of data


A database may be thought of as a table with multiple dimensions, as the following examples illustrate:

- A survey of prospective voters in an upcoming election; the variables measured might include voting intentions, party affiliation, age, gender and address.
- A portfolio of stocks listed on the London Stock Exchange; for each company we would record contact information, market capitalization, closing stock prices and dividend payments over suitable periods, and news announcements.
- The economy of the United States; factors of interest would certainly include gross domestic product (GDP), consumer expenditures, capital investment, imports and exports.

The reader will certainly be able to add to these lists. The survey of voters refers to cross-sectional data, in that the purpose is to collect information as close as possible to the same time for all those interviewed. Of course, in practice the survey will cover several days, but what matters is that the inherent variation in the data is across respondents. For practical purposes, we view the data as being collected in the same (short) time period. Of course, voters may change their minds at a later stage, and such shifts of opinion are a major source of discrepancies between opinion polls and election outcomes.

The daily closing prices for a particular stock or fund over some time period represent time series data. We are interested in the movement of the price over time. The same applies if we track the movements over time of macroeconomic variables such as GDP; it is the development over time that is important.

Cross-sectional data refer to measurements on multiple units, recorded in a single time period. A time series is a set of measurements recorded on a single unit over multiple time periods.

From these examples, we see that a database may be cross-sectional or time-dependent or both (consider tracking voting intentions over time or looking at consumer expenditures each quarter for different regions of a country). Although forecasting practice often involves multiple series (such as the sales of different product lines), the methods we examine have the common theme of using data from the past and present to predict future outcomes. Thus, our primary focus in the first part of the book will be upon the use of time series data. However, as methods of data capture have become more sophisticated (e.g. scanners in supermarkets) it has become possible to develop databases that relate to individuals such as consumers and their spending habits. Forecasting may then involve the use of cross-sectional data to predict individual preferences or to evaluate a new customer based upon individuals with similar demographic characteristics.

By way of example, consider the data shown in Tables 2.1, 2.2 and 2.3. Table 2.1 shows the weekly sales of a consumer product in a certain market area, produced by a major U.S. manufacturer. This data set will be examined in greater detail in Chapter 4. The data are genuine but we have labeled the product WFJ Sales to preserve confidentiality.

Table 2.2 shows the annual numbers of domestic passengers at Washington Dulles International Airport for the years 1963-2007. Clearly, both data sets are time series, but the sales figures are fairly stable (at least after the first 12 weeks or so) whereas the passenger series shows a strong upward movement. Table 2.3, appearing later in the chapter, involves cross-sectional data showing the financial characteristics of a sample of companies.

2.1.1 Use of large databases

A manager responsible for a large number of product lines may well claim that the forecasting can all be done by computer and that there is no need to waste time on model-building or detailed examination of individual series. This assertion is half-right. The computer can indeed remove most of the drudgery from the forecasting exercise; see, for example, the forecasting methods described in Chapters 4 and 13. However, a computer is like a sheep-dog. Properly trained, it can deliver a sound flock of forecasts; poorly trained, it can create mayhem. Even if the forecasting task in question involves thousands of series to be forecast, there is no substitute for understanding the general structure of the data so that we can identify appropriate forecasting methods. The manager can then focus on those products that are producing unusual results.

In order to develop an effective forecasting process, therefore, we need to understand the kind of data we are handling. That does not mean examining every series in detail, or even at all, but rather looking at a sample of series to establish a framework for effective forecasting. Thus, it is important to understand when and how to use forecasting methods, how to interpret the results, and how to recognize their limitations and the potential for improvement.

Table 2.1: Value (in $) of weekly sales of product WFJ Sales [Week 1 is first week of January; WFJ Sales.xlsx]

Week  Sales    Week  Sales    Week  Sales
  1   23056     22   33631     43   32187
  2   24817     23   32900     44   30322
  3   24300     24   34426     45   34588
  4   23242     25   33777     46   38879
  5   22862     26   34849     47   37166
  6   22863     27   30986     48   37111
  7   23391     28   33321     49   39021
  8   22469     29   34003     50   40737
  9   22241     30   35417     51   42358
 10   24367     31   33822     52   51914
 11   29457     32   32723     53   35404
 12   31294     33   34925     54   30555
 13   38713     34   33460     55   30421
 14   35749     35   30999     56   30972
 15   39768     36   31286     57   32336
 16   32419     37   35030     58   28194
 17   37503     38   34260     59   29203
 18   31474     39   35001     60   28155
 19   35625     40   36040     61   28404
 20   33159     41   36056     62   34128
 21   34306     42   31397

Table 2.2: Washington Dulles International Airport, Domestic Passengers 1963-2007 [Numbers of passengers in '000s; Source: U.S. Department of Transportation, Bureau of Transport Statistics. Dulles.xlsx]

Year  Passengers    Year  Passengers    Year  Passengers
1963      641       1978     2518       1993     8501
1964      728       1979     2858       1994     8947
1965      920       1980     2086       1995     9653
1966     1079       1981     1889       1996    10095
1967     1427       1982     2248       1997    10697
1968     1602       1983     2651       1998    12445
1969     1928       1984     3136       1999    16055
1970     1869       1985     4538       2000    15873
1971     1881       1986     8394       2001    14021
1972     1992       1987     9980       2002    13146
1973     2083       1988     8650       2003    12928
1974     2004       1989     9224       2004    18213
1975     2000       1990     9043       2005    22129
1976     2251       1991     9406       2006    17787
1977     2267       1992     9408       2007    18792

2.2 Time series plots


Our aim in the next several sections is not to provide detailed technical presentations on the construction of the various plots; rather, we indicate their application in the current context. Guidelines for producing these plots and other analyses are provided on the book's website for Excel, EViews, Forecast Pro, Minitab, SAS and SPSS. Also, the reader is encouraged to make use of the tutorials provided by these programs, as well as their Help commands. The plots in this chapter were generated using Minitab, unless stated otherwise.

The time series plot for WFJ Sales is shown in Figure 2.1. As its name suggests, a time series plot shows the variable of interest on the vertical axis and time on the horizontal axis. Several features are immediately apparent. Sales are low for the first twelve weeks and then remain stable until week 46, when there is an increase over the Thanksgiving-to-Christmas period and then a peak in the last week of the year. Sales in the following year are lower than for the final weeks of the previous year, but higher than for the corresponding period a year before. We would not wish to make too much of data for one product over little more than a year, but inspection of the plot has revealed a number of interesting patterns that we would want to check out for similar products. If these patterns were found to persist across a number of product lines, we would need to take them into account in production planning. For example, the company might initiate extra shifts or overtime to cover peak periods and plan to replenish inventories during slack periods.

The second time series plot presents the data on airline passengers given in Table 2.2. Figure 2.2 shows steady growth from 1963-79 (the airport opened in 1962), then a pause followed by rapid growth in the late eighties. After a further pause in the early nineties, there was a long period of growth, with peaks in 1999 and 2005 followed by short-term declines. A detailed explanation of these changes lies outside the present discussion; a fuller account would require us to examine airport expansion plans, overall levels of passenger demand, the traffic at other airports in the area, and so on. The key point is that the time series plot can tell us a lot about the phenomenon under study and will often suggest suitable approaches to forecasting.

Figure 2.1: Plot of weekly WFJ Sales [WFJ Sales.xlsx]


[Figure: time series plot of WFJ Sales; vertical axis Sales ($), roughly 20,000-55,000; horizontal axis week index 1-62]

Figure 2.2: Plot of Domestic Passengers at Dulles, 1963-2007 [Dulles.xlsx]


[Figure: time series plot of Dulles domestic passengers ('000s), 0-25,000, by year 1963-2007]

2.2.1 Seasonal plots

Figure 2.1 had some elements of a seasonal pattern (the end-of-year peak) but only just over one year of data from which to identify seasonal behavior. Clearly, Figure 2.2 has no seasonal pattern, since the figures represent complete years. However, seasonal variations are often very important for planning purposes, and it is desirable to have a graphical procedure that allows us to explore whether seasonal patterns exist. For monthly data, for example, we may plot the dependent variable against the months and generate a separate, but overlaid, plot for each successive year. In Figure 2.3A we provide such a plot for airline revenue passenger miles (RPM). RPM measures the total number of revenue-generating miles flown by passengers of U.S. airlines, measured in billions of miles. To avoid cluttering the diagram, we use only five years of data, for 1995-99; a multi-colored diagram created on-line is more informative and can readily accommodate more years without confusion.

Figure 2.3: Seasonal plots of airline Revenue Passenger Miles for 1995-99 [Revenue miles.xlsx]

A. Plot by month, with years overlaid
[Figure: RPM (billions of miles), roughly 30-46, plotted against month (Jan-Nov), one overlaid line per year, 1995-1999]

B. Time series plot, with each year identified as a sub-group
[Figure: the same RPM series plotted against the observation index 1-60, with each year's 12 observations identified as a sub-group]

In Figure 2.3A, the line for each year lies above those for earlier years with only rare exceptions, indicating the steady growth in airline traffic over this period. Figure 2.3B also shows the trend and the seasonal peaks, and allows easy comparison of successive seasonal cycles. There is a major seasonal peak in the summer and a lesser peak in March/April, depending on the timing of Easter. These plots of the data provide considerable insight into the variations in demand for air travel. Draw the Doggone Diagram (DDD) is indeed wise counsel.
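The regrouping behind a seasonal plot can be sketched in a few lines. The snippet below uses hypothetical monthly values (not the actual RPM figures) purely to show how a single monthly series is rearranged into one row per year, with each row becoming one overlaid line in a plot like Figure 2.3A:

```python
# Hypothetical monthly values (NOT the actual RPM data), two years' worth,
# used only to illustrate the year-by-month regrouping for a seasonal plot.
values = list(range(24))
years = [1995, 1996]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# One row per year, one column per month: each row is one overlaid line
by_year = {year: values[i * 12:(i + 1) * 12] for i, year in enumerate(years)}

print(by_year[1995])  # the first 12 observations
print(by_year[1996])  # the next 12 observations
```

Any plotting package can then draw one line per dictionary entry against the month labels.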

2.3 Scatter plots


The time series plots, as displayed so far, show the evolution of a single series over time. As we saw with the seasonal plot, it is possible to show multiple series on the same chart, although some care is required to make the axes sufficiently similar. An alternative is to plot the variable of interest against potential explanatory variable(s), to see how far knowledge of the explanatory variable might improve the forecasts of the variable of interest. Such scatter plots are valuable for both cross-sectional and time series data.

In Figure 2.4 we show a cross-sectional scatter plot for data taken from Business Week that refer to 100 Hot Growth Companies. The companies were selected based upon their performance over the previous three years in terms of sales growth, earnings growth and the return on invested capital, all taken as three-year averages. The data are given in Table 2.3. In the diagram, we plot the price-earnings ratio (P-E Ratio) against the return on capital (ROC), in order to determine whether variations in ROC are a major determinant of a company's P-E Ratio. Although there is clearly some relationship between the two variables, it is also clear that stock prices reflect a much more complex evaluation of a company's performance than simply looking at the recent return on capital.

In its report, Business Week ranked the 100 companies, and we may check how far these two factors went into the ranking by looking at scatter plots against the rank. These plots are shown in Figure 2.5. The plots show a strong relationship between ROC and Rank, but a much weaker one between the P-E Ratio and Rank. This finding is hardly surprising! The BW ranking gave 50% weight to a company's ROC ranking, along with 25% each for sales growth and profits growth.

Table 2.3: Rank, Return on Capital (ROC) and P-E Ratio for 100 Hot Growth Companies [Source: Business Week, June 7, 2004, pages 104-109; Growth companies.xlsx]

Rank   ROC  P-E Ratio    Rank   ROC  P-E Ratio    Rank   ROC  P-E Ratio
   1  51.5     37          35  19.6     17          68  15.5     14
   2  68.5     53          36  16.7     11          69  12.6     17
   3  35.3     71          37  13.1     41          70   7.6     24
   4  28.7     21          38  17.8     43          71   7.5     43
   5  28.2     26          39  17.7     28          72  13.4     28
   6  29.8     14          40  14.5     52          73   7.4     14
   7  24.4     36          41  15.2     22          74  15.6     26
   8  27.4     38          42   8.9      *          75  11.5     25
   9  32.3     24          43  14.7     31          76  10.0     31
  10  35.1     37          44  12.0     19          77   8.6     15
  11  32.5     26          45  10.9     19          78  14.0     31
  12  15.9     36          46  14.8     22          79  12.9     20
  13  18.5     47          47  15.9     11          80  10.4     16
  14  15.0     22          48  12.9     38          81  12.2     19
  15  22.3     13          49  21.8     21          82   9.3     32
  16  17.2     26          50  12.1     30          83  12.6     21
  17  21.2     24          51   8.5     25          84  12.5     26
  18  13.5     19          52  18.6     25          85  13.0     39
  19  30.3     48          53  12.1     22          86  15.8     32
  20  21.2     27          54  15.1     21          87  14.4     28
  21  14.1     50          55  17.9     15          88  10.6     16
  22  26.7     55          56   9.0     25          89  12.2     31
  23  16.1     23          57  11.0     29          90  11.4     13
  24  20.3     33          58  18.1     24          91  11.7      6
  25  21.2     15          59   6.3     30          92  12.3     37
  26  18.3     14          60  15.1     23          93  12.3     29
  27  30.6     22          61  15.9     22          94  14.5     20
  28  16.6     34          62  15.8     19          95  13.0     28
  29  15.7      7          63  18.9     23          96  14.0     23
  30  13.4     42          64  12.7     18          97  10.6     30
  31  28.2     18          65  15.1     28          98  11.0     21
  32  11.2     45          66   8.7     29          99  10.4     31
  33  17.3     26          67  15.1     25         100  11.1     29
  34  19.9     27

* P-E Ratio not recorded

Figure 2.4: Plot of P-E Ratio against Return on Capital for 100 Hot Growth Companies [Growth companies.xlsx]

[Figure: scatter plot of P-E Ratio (0-70) against Return on Capital (0-70) for the 100 companies]

Figure 2.5A: Plot of Return on Capital against Rank


[Figure: scatter plot of Return on Capital (0-70) against Rank (1-100)]

Figure 2.5B: Plot of P-E Ratio against Rank.


[Figure: scatter plot of P-E Ratio (0-70) against Rank (1-100)]

We will often have multiple variables of interest and wish to look for relationships among them. Rather than generate a series of scatter plots as above, we may combine them into a matrix plot, which is simply a two-way array of plots of each variable against each of the others. The matrix plot for these three variables is shown in Figure 2.6. The three plots in the upper right are those shown in Figures 2.4 and 2.5. The plots in the bottom left part of the diagram are these same three plots, but with the X and Y axes reversed. The matrix plot provides a condensed summary of the relationships among multiple variables and is a useful screening device for identifying relevant variables in the early stages of a forecasting exercise.

Figure 2.6: Matrix plot for P-E Ratio, ROC and Rank for 100 Hot Growth Companies [Growth companies.xlsx]
[Figure: 3x3 matrix plot with P-E Ratio, Return on Capital and Rank on both axes; the panels above the diagonal correspond to Figures 2.4 and 2.5]

For these data, the relationship between the P-E Ratio and ROC appears to be weaker than we might expect. The reason for this lies in the set of data used. We are looking at the records of 100 companies selected for their recent strong performance, so that they all have strong financial foundations. A random sample of all companies would show greater variation in performance but a stronger overall relationship. A fundamental but often overlooked feature of any statistical study, and especially any forecasting study, is that the sample data must be relevant for the task in hand. The data set in Table 2.3 is useful if we are trying to evaluate a possible investment in a growth company, but much less so if we are trying to understand overall market valuations.
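The visual impression given by each panel of a matrix plot can be backed up by a single number. Correlation is treated formally in section 2.5, but as a numerical companion to the scatter plots, a Pearson correlation coefficient can be sketched from first principles. The values below are a small illustrative subset, not the full Table 2.3 data set:

```python
import math

def pearson(x, y):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares, x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares, y
    return sxy / math.sqrt(sxx * syy)

# Illustrative ROC and P-E values only (a handful of pairs, not all 100)
roc = [51.5, 35.3, 28.7, 24.4, 15.9, 12.1]
pe = [37, 71, 21, 36, 36, 22]
print(pearson(roc, pe))
```

A correlation near zero would match the weak pattern seen in Figure 2.4; a value near +1 or -1 would match the strong ROC-versus-Rank pattern in Figure 2.5A.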

2.4 Summarizing the data


Graphical summaries provide invaluable insights. Time plots and scatter plots should always be used in the early stages of a forecasting study to aid understanding. Furthermore, as we shall see in later chapters, such diagrams also play an invaluable role in providing diagnostics for further model development. Even when we have a large number of items to forecast, plots for a sample from the whole set of series will provide useful guidance and insights.

At the same time, we must recognize that while graphical methods provide qualitative insights, we often need some kind of numerical summary, such as the average level of sales over time or the variability in P-E ratios across companies. These measures are also valuable for diagnostic purposes, when we seek to summarize forecasting errors, as in section 2.7.

2.4.1 Notation

At this stage we need to elaborate upon some notational conventions, since we will use this framework throughout the remainder of the book.

1. Random variables and observations. When we speak of an observation, it is something we have already recorded, a specific number or category. By contrast, when we talk about future observations, uncertainty exists. For example, if we talk about tomorrow's closing price of the Dow Jones Index, a range of possibilities exists, which can be described by a probability distribution. Such a variable, with both a set of possible values and an associated probability distribution, is known as a random variable. Texts with a more theoretical orientation often use upper-case letters to denote random variables and lower-case letters for observations that have already been recorded. More applied books often make no distinction, but rely upon the context to make the difference clear. We will follow the second course and generally use the same notation for both existing observations and random variables.

2. Variables and parameters. As just noted, variables are entities that we can observe, such as sales or incomes. By contrast, parameters contribute to the description of an underlying process (e.g. a population mean) and are typically not observable. We distinguish these concepts by using the usual (Roman) alphabet for variables (sample values), but Greek letters for parameters (population values). Thus, the variable we wish to forecast will always be denoted by Y and, where appropriate, the sample mean and standard deviation by Ȳ and S. The corresponding population mean and standard deviation will be denoted by μ and σ respectively.

2.4.2 Measures of average

By far the most important measure of average is the arithmetic mean, often known simply as the mean or the average.


Given a set of n values Y1, Y2, ..., Yn, the arithmetic mean is given by:

    Ȳ = (Y1 + Y2 + ... + Yn)/n = (1/n) Σ(i=1 to n) Yi.    (2.1)

When the range of summation is clear from the context, such as the index going from 1 to n in the above formula, we will often write the summation sign without including the limits.

An alternate measure of average is given by the median, defined as follows.

Given a set of n values Y1, Y2, ..., Yn, we place these values in ascending order, written as Y(1) ≤ Y(2) ≤ ... ≤ Y(n). The median is the middle observation. If n is odd, n can be written n = 2m + 1 and the median is Y(m+1). If n is even, n = 2m and the median is [Y(m) + Y(m+1)]/2.

Example 2.1: Calculation of the mean and median

Suppose the sales of a popular book over a seven-week period are:

Week          1   2   3   4   5   6   7
Sales (000s)  15  10  12  16  9   8   14

The mean is Ȳ = (15 + 10 + 12 + 16 + 9 + 8 + 14)/7 = 12.

The order statistics (as we often refer to the values placed in increasing order) are: 8, 9, 10, 12, 14, 15, 16.


Hence the median is the fourth value in the sequence, which also happens to be 12. If data for week 8 now become available (sales = 16), the mean becomes 12.5 and the median is [12 + 14]/2 = 13. However, suppose that sales for week 8 had been 116, because of a sudden surge in popularity. The mean becomes 25, yet the median remains at 13. In general, the mean is sensitive to extreme observations but the median is not.
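The calculations in Example 2.1 can be reproduced with a few lines of Python's standard `statistics` module; this is a sketch of the example's arithmetic, not part of the original text:

```python
import statistics

# Weekly book sales (in thousands) from Example 2.1
sales = [15, 10, 12, 16, 9, 8, 14]

print(statistics.mean(sales))    # 12
print(statistics.median(sales))  # 12

# Week 8 arrives with sales = 16: both measures shift modestly
print(statistics.mean(sales + [16]))    # 12.5
print(statistics.median(sales + [16]))  # 13.0

# A surge to 116 pulls the mean up sharply; the median barely moves
print(statistics.mean(sales + [116]))    # 25
print(statistics.median(sales + [116]))  # 13.0
```

The contrast between the last two lines is exactly the sensitivity of the mean, and robustness of the median, to extreme observations.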

Which value represents the true average? The question cannot be answered as framed. The median provides a better view of typical weekly sales over the first 8 weeks, but the publisher and the author are more interested in the numbers actually sold. The forecaster has the unenviable task of trying to decide whether future sales will continue at the giddy level of 100 thousand plus, or whether they will revert to the earlier, more modest level. The wise forecaster would enquire into the reasons for the sudden jump, such as a rare large order or a major publicity event.

2.4.3 Measures of variation

A safe investment is one whose value does not fluctuate much over time. Similarly, inventory planning is much more straightforward if sales are virtually the same each period. Implicit in both these statements is the idea that we use some measure of variability to evaluate risk, whether of losing money or of running out of stock. There are three measures of variability in common use: the range, the mean absolute deviation, and the standard deviation. The standard deviation is derived from the variance, which we also define here.


The range denotes the difference between the largest and smallest values in the sample:

    Range = Y(n) − Y(1)

The deviations are defined as the differences between each observation and the mean. By construction, the mean of the deviations is zero, so to compute a measure of variability we use either the absolute values or the squared values. If we use the squares, our units of measurement become squared also; for example, revenues (in $) become $², so we reverse the operation after computing the average by taking the square root, to ensure the measure remains in $s. These various measures are defined as follows, in terms of the deviations di = Yi − Ȳ.

The Mean Absolute Deviation is the average of the absolute deviations about the mean:

    MAD = Σ|di| / n    (2.2)

The Variance is an average of the squared deviations about the mean:

    S² = Σdi² / (n − 1)    (2.3)

The Standard Deviation is the square root of the variance:

    S = √(S²) = √[Σdi² / (n − 1)]    (2.4)


Example 2.2: Calculation of measures of variation

Consider the values for the seven weeks of book sales given in Example 2.1. From the order statistics, we immediately see that the range is: Range = 16 − 8 = 8. However, if week 8 is entered with sales = 116, the range shoots up to 116 − 8 = 108. This simple example illustrates both the strength and the weakness of the range: it is very easy to compute, but it is severely affected by extreme values. Its vulnerability to extreme values makes it unsuitable for most purposes in forecasting. The deviations for the seven weeks are:

Week          1   2   3   4   5   6   7   Sum
Sales (000s)  15  10  12  16  9   8   14  84
Deviation     +3  -2  0   +4  -3  -4  +2  0
|d|           3   2   0   4   3   4   2   18
d²            9   4   0   16  9   16  4   58

From the table, we have MAD = 18/7 = 2.57, S² = 58/6 = 9.67 and S = 3.11.
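The same measures can be computed directly from equations (2.2)-(2.4); this sketch reproduces Example 2.2 and cross-checks the variance and standard deviation against the standard library:

```python
import statistics

sales = [15, 10, 12, 16, 9, 8, 14]
n = len(sales)
ybar = statistics.mean(sales)                 # 12
deviations = [y - ybar for y in sales]        # sum to zero by construction

mad = sum(abs(d) for d in deviations) / n     # (2.2): 18/7 ≈ 2.57
s2 = sum(d * d for d in deviations) / (n - 1) # (2.3): 58/6 ≈ 9.67
s = s2 ** 0.5                                 # (2.4): ≈ 3.11

# statistics.variance and statistics.stdev use the same (n - 1) divisor
assert abs(s2 - statistics.variance(sales)) < 1e-9
assert abs(s - statistics.stdev(sales)) < 1e-9

data_range = max(sales) - min(sales)          # 16 - 8 = 8
```

Note that the library routines agree with the textbook formulas because both divide the sum of squared deviations by (n − 1).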

Why do we use (n − 1) rather than n in the denominator of the variance? Since we are using the deviations, if we had only one observation, its deviation would necessarily be zero; that is, we would have no information about the variability in the data. Likewise, in our sample of seven, if you tell me six of the deviations, I can work out the value of the seventh observation from the fact that the deviations must sum to zero. In effect, by subtracting the mean from each observation we have lost an observation. In statistical parlance, this is known as losing a degree of freedom, and we say that the variance is computed using (n − 1) degrees of freedom, which we abbreviate to (n − 1) DF. In later chapters, we sometimes lose several DF, and the definitions of variability change accordingly. This adjustment has the benefit of making the sample variance an unbiased estimator for the population variance. Why don't we use (n − 1) in the MAD? Standard practice is to use n, but there is no compelling reason other than convention.
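The unbiasedness claim can be checked by simulation. This illustrative sketch (not from the text) draws many small samples from a population with known variance 1 and averages the two candidate estimators; the (n − 1) divisor averages close to the true value, while the n divisor averages close to (n − 1)/n of it:

```python
import random

random.seed(42)  # fixed seed so the experiment is reproducible
n, reps = 5, 100_000
div_n, div_n1 = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]  # population variance = 1
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)           # sum of squared deviations
    div_n.append(ss / n)                             # divide by n: biased low
    div_n1.append(ss / (n - 1))                      # divide by n - 1: unbiased

print(sum(div_n1) / reps)  # close to 1.0
print(sum(div_n) / reps)   # close to (n - 1)/n = 0.8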

Is S always bigger than MAD? S gives greater weight to the more extreme observations by squaring them, and it may be shown that S > MAD whenever MAD is greater than zero. A rough relationship between the two is: S ≈ 1.25 × MAD.

2.4.4 Assessing variability

The statement that our book sales have a standard deviation of 3.11 (thousand, remember) conveys little about the inherent variability in the data from week to week, unless we live and breathe details about the sales of that particular book, like any penniless author. To provide a more standard frame of reference, we use standardized scores. Given a sample mean Ȳ and sample standard deviation S, we define the standardized scores for the observations, also known as Z-scores, as:

    Zi = (Yi − Ȳ) / S

Following our simple example, we obtain:

Week       1     2      3   4     5      6      7
Sales      15    10     12  16    9      8      14
Deviation  +3    -2     0   +4    -3     -4     +2
Z-score    0.96  -0.64  0   1.29  -0.96  -1.29  0.64
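The Z-score table can be reproduced directly from the definition; this sketch also applies the |Z| > 3 screen for outliers that is discussed next:

```python
import statistics

sales = [15, 10, 12, 16, 9, 8, 14]
ybar = statistics.mean(sales)   # 12
s = statistics.stdev(sales)     # ≈ 3.11, (n - 1) divisor

# Z-score for each observation, rounded to match the table in the text
z_scores = [round((y - ybar) / s, 2) for y in sales]
print(z_scores)  # [0.96, -0.64, 0.0, 1.29, -0.96, -1.29, 0.64]

# Flag potential outliers: |Z| > 3 is very atypical under normality
outliers = [week for week, z in enumerate(z_scores, start=1) if abs(z) > 3]
print(outliers)  # [] -- nothing extreme in this small sample
```

Had week 8 recorded sales of 116, its Z-score would be enormous and the screen would flag it immediately.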

The Z-scores still do not provide much information until we provide a frame of reference. In this book, we usually use Z-scores to examine forecast errors, and proceed in three steps:

1. Check that the observed distribution of the errors is approximately normal (for details, see Appendix X).
2. If the assumption is satisfied, relate the Z-score to the normal tables (provided in Appendix X):
   - The probability that |Z| > 1 is about 0.32
   - The probability that |Z| > 2 is about 0.046
   - The probability that |Z| > 3 is about 0.0027

3. Create a time series plot of the residuals (and/or Z-scores) when appropriate, to determine which observations appear to be extreme.

At this stage we do not pursue the systematic use of Z-scores, except to recognize that whenever you see a Z-score greater than 3 in absolute value, the observation is very atypical, since the probability of such an occurrence is less than 3 in 1,000. Often, such large values signify that something unusual has happened, and we refer to such observations as outliers. In cross-sectional studies it is sometimes admissible simply to delete such observations (e.g. a report of a 280-year-old man is undoubtedly a recording error). In time series forecasting, we wish to retain the complete sequence of values and must investigate more closely, often finding special circumstances (e.g. a strike, bad weather, a special sales promotion) for which we had not allowed. Outliers indicate the need for further exploration, not routine rejection. We defer the detailed treatment of outliers to Chapter X.

2.4.5 An example: hot growth companies

The default summary outputs from Minitab and Excel for the data in Table 2.3 on hot growth companies are shown in Figure 2.7. The output from other programs may have a somewhat different format, but the summary measures included are similar and most programs allow a variety of options. Excel typically produces too many decimal places; for ease of comparison, our output has been edited to show a reasonable number of decimal places. Note that the count of observations is one fewer for the P-E Ratio, as we had a missing value.

Both sets of summary statistics we show include a number of measures we do not need until later. However, Minitab introduces Q1 (quartile 1, the value with 25% of observations below Q1 and 75% above) and Q3 (quartile 3, with 75% below and 25% above). These, together with the median (Q2), are often useful for summarizing variables.

Figure 2.7: Descriptive Statistics for Hot Growth Companies [Growth companies.xlsx]

(a) Minitab

Variable             N   N*    Mean  SE Mean   StDev  Variance  Minimum      Q1  Median      Q3  Maximum   Range
Return on Capital  100    0  17.028    0.900   8.998    80.961    6.300  12.100  14.900  18.575   68.500  62.200
P-E Ratio           99    1   27.06     1.12   11.11    123.36     6.00   20.00   25.00   31.00    71.00   65.00

(b) Excel

                      Return on Capital    P-E Ratio
Mean                             17.028        27.06
Standard Error                    0.900         1.12
Median                           14.900        25.00
Mode                             15.100        26.00
Standard Deviation                8.998        11.11
Sample Variance                  80.961       123.36
Kurtosis                         11.676         1.95
Skewness                          2.814         1.07
Range                            62.200        65.00
Minimum                           6.300         6.00
Maximum                          68.500        71.00
Sum                            1702.800      2679.00
Count                               100           99

Given the mean and standard deviation, we proceed to compute the Z-scores, shown in Table 2.4. We list the top seven companies, as the interesting features are associated with the top few on the list; those ranked 94-100 are provided for comparison. In Table 2.4, we have highlighted (*) those Z-scores that are greater than 3 in absolute value; none of the remaining companies had any Z-scores outside 3. From the table, it is clear that the first two companies on the list have an ROC much greater than the rest, whereas the third one has a P-E Ratio that is much larger. Turning to numbers 94-100, they have negative Z-scores for ROC and typically small scores for their P-E Ratios. That is, they do not look so good, although that is only with reference to the illustrious company they keep. Across all public companies, these 100 would show impressive figures that yielded positive Z-scores.

Table 2.4: Z-scores for hot growth companies

Rank  Return on Capital  P-E Ratio  Z-ROC   Z-PE
  1        51.5             37       3.83*   0.89
  2        68.5             53       5.72*   2.33
  3        35.3             71       2.03    3.95*
  4        28.7             21       1.30   -0.55
  5        28.2             26       1.24   -0.10
  6        29.8             14       1.42   -1.18
  7        24.4             36       0.82    0.80

 94        14.5             20      -0.28   -0.64
 95        13.0             28      -0.45    0.08
 96        14.0             23      -0.34   -0.37
 97        10.6             30      -0.71    0.26
 98        11.0             21      -0.67   -0.55
 99        10.4             31      -0.74    0.35
100        11.1             29      -0.66    0.17

* |Z| > 3
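The Z-score screening described above is easy to automate. The following sketch computes Z-scores for the top three companies' Return on Capital, using the mean (17.028) and standard deviation (8.998) quoted earlier, and flags values with |Z| > 3 as potential outliers; the helper function names are our own, not from any particular package.

```python
# Z-scores for Return on Capital, using the summary statistics quoted above
# (mean 17.028, SD 8.998). The three values are the top companies in Table 2.4.
def z_scores(values, mean, sd):
    """Return the Z-score (value - mean) / sd for each value."""
    return [(v - mean) / sd for v in values]

roc_top3 = [51.5, 68.5, 35.3]
z_roc = z_scores(roc_top3, mean=17.028, sd=8.998)
print([round(z, 2) for z in z_roc])          # [3.83, 5.72, 2.03]

# Flag outliers: under normality, |Z| > 3 occurs with probability < 3 in 1,000
outliers = [v for v, z in zip(roc_top3, z_roc) if abs(z) > 3]
print(outliers)                               # [51.5, 68.5]
```

The first two companies are flagged, matching the highlighted entries in Table 2.4; in a time series setting these flagged values would prompt investigation, not deletion.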

2.5 Correlation
In the previous section we produced numerical summaries to complement the graphical analysis of section 2.2. We now develop a statistic that performs a similar function for the scatter plots of section 2.3, known as the correlation. Before developing the coefficient, we examine Figure 2.8; in each case the horizontal axis may be interpreted as time. The six plots suggest the following:
- Y1 increases with time, and is perfectly related to time;
- Y2 decreases with time, and is perfectly related to time;
- Y3 tends to increase with time but is not perfectly related to time;
- Y4 tends to decrease with time but the relationship is weaker than for Y3;
- Y5 shows virtually no relationship with time;
- Y6 is perfectly related to time, but the relationship is not linear.

Our measure should reflect these differences, but not be affected by changes in the origin or changes of scale; the origins and scales of the variables are deliberately omitted from the diagrams as they do not affect the degree of association between the two variables. The most commonly used measure that satisfies these criteria is the (Pearson) Product Moment Correlation Coefficient, which we simply refer to as the correlation. We use the letter r to denote the sample coefficient and the Greek letter ρ (rho) to denote the corresponding population quantity. Our definition refers to the sample quantity; the population definition follows on replacing the sample components by their expected values, but we shall not need that expression explicitly.

Figure 2.8: Plots of hypothetical data against time. [Six panels: Y1-Y6]

The sample correlation between X and Y is defined as:

r = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / √[ Σ_{i=1}^{n} (X_i − X̄)² · Σ_{i=1}^{n} (Y_i − Ȳ)² ]    (2.5)

When we divide each of the two sums inside the square root sign by (n−1), they become the sample variances of X and Y respectively. That is, taking square roots, they represent the two standard deviations, S_X and S_Y. The numerator divided by (n−1) is known as the sample covariance between X and Y, denoted by S_XY. That is, the correlation may be written as:
r = S_XY / (S_X S_Y)    (2.6)

It may be shown that, for Y1 in Figure 2.8, r = 1, the maximum value possible. Similarly, Y2 has r = -1, the minimum possible. The other correlations are, for Y3, Y4, Y5 and Y6: 0.93, -0.66, -0.09 and 0 respectively. In general, we see that the absolute value of r declines as the relationship gets weaker. At first sight the result for Y6 appears odd. There is a clear relationship with X, but the correlation is zero. The reason for this is that r measures linear association but the relationship with X is quadratic rather than linear. A good example would be the relationship between total revenue and price: charge too much or too little and total revenue is low.

Example 2.3: Calculation of the correlation

Using the data from Example 2.1, the detailed calculations for the correlation between sales and time are shown in the table. A spreadsheet could readily be set up in this format for direct calculation, but all standard software packages have a correlation function.

Week, X  Sales (000s), Y  X−X̄  Y−Ȳ  (X−X̄)²  (Y−Ȳ)²  (X−X̄)(Y−Ȳ)
   1          15           -3    +3      9        9        -9
   2          10           -2    -2      4        4         4
   3          12           -1     0      1        0         0
   4          16            0    +4      0       16         0
   5           9           +1    -3      1        9        -3
   6           8           +2    -4      4       16        -8
   7          14           +3    +2      9        4         6
Sums                        0     0     28       58       -10
Mean   X̄ = 4    Ȳ = 12                  4      8.3      -1.4

Thus, S_XY = -10/6, S_X = √(28/6) and S_Y = √(58/6), so that r = -10/√(28 × 58) = -0.248.


The example shows a weak negative correlation for sales with time; that is, sales may be declining slightly over time.
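The calculation in Example 2.3 can be sketched directly from equation (2.5); the function name below is our own, and the data are the week/sales figures from the worked table.

```python
from math import sqrt

def correlation(x, y):
    """Pearson product moment correlation, following equation (2.5)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # numerator sum
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

weeks = [1, 2, 3, 4, 5, 6, 7]
sales = [15, 10, 12, 16, 9, 8, 14]      # sales (000s) from Example 2.3
print(round(correlation(weeks, sales), 3))   # -0.248
```

The result reproduces the hand calculation; note that dividing numerator and denominator by (n−1) cancels, so the (n−1) factors in S_XY, S_X and S_Y never need to be computed explicitly.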

Example 2.4: Correlation for hot growth companies

For the data given in Table 2.3, the correlations among Rank, ROC and P-E Ratio are:

Variables               Correlation
Rank and ROC               -0.647
Rank and P-E Ratio         -0.267
ROC and P-E Ratio           0.306

As expected, there is a strong negative correlation between Rank and ROC, since 50 percent of the weight for the ranking is based upon ROC (high ROC relates to a small number for Rank). The correlation of P-E Ratio and Rank is also negative, but weaker (no direct weighting). Finally, we see a modest positive correlation between the ROC and the P-E Ratio. We may compare these numbers with the plots in Figure 2.5 to gain some insight into their interpretation.

2.6 Transformations
We now examine the annual figures for number of passengers on domestic flights out of Dulles airport. The descriptive statistics are as follows:
Descriptive Statistics: Passengers (from Dulles.xlsx)

Variable     N  N*  Mean  SE Mean  StDev  Minimum    Q1  Median     Q3  Maximum
Passengers  45   0  7111      892   5987      641  1996    4538  10396    22129

Dulles, we have a problem! What does the average of 7,111 mean? Such levels were typical of the mid-eighties, but the average in a strongly trending series like this one has no meaning. Certainly, it would make no sense to use either the mean or the median to forecast the next year's traffic.

How should we deal with a series that reveals a strong trend? Everyday conversation provides a clue. We talk of the return on an investment, an increase of a certain number in sales or the percentage change in GDP. This approach is partly a matter of convenience; some ideas are more readily communicated using (percentage) changes rather than raw figures. Thus we may regard 3 percent growth in GDP in the US or Europe as reasonable, 1 percent as anemic and 10 percent as unsustainable (except in China and India). The same information conveyed in currency terms, measured in trillions of US$, would be hard to comprehend.

From the forecasting perspective there are two further reasons for considering such alternatives:
- The forecast relates directly back to the previously observed value, so that such forecasts are unlikely to be wildly off-target.
- Averages measured in terms of changes or percentage changes in the time series are often more stable and more meaningful than averages computed from the original series.
We now explore these options in greater detail.

2.6.1 Differences and growth rates


The change in the absolute level of the series from one period to the next is known as the (first) difference¹ of the series, and it is written as:

DY_t = Y_t − Y_{t−1}.    (2.7)

At time t, the previous value Y_{t−1} is already known. If the forecast for the difference is written as D̂_t, the forecast for Y_t becomes:

F_t = Ŷ_t = Y_{t−1} + D̂_t.    (2.8)

We use ^, the hat symbol, to denote a forecast or an estimate.


¹ Many texts use the Greek capital letter Δ (delta) and others use the inverted delta ∇ (del), but the use of D seems a better mnemonic device for difference.


The percentage growth rate² of the series is defined as:

GY_t = 100 (Y_t − Y_{t−1}) / Y_{t−1}.    (2.9)

Expression (2.9) also defines the one-period return on an investment, given the opening price Y_{t−1}. Once the growth rate has been predicted, denoted by Ĝ_t, the forecast for the price of the next time period is:

F_t = Ŷ_t = Y_{t−1} [1 + Ĝ_t / 100].    (2.10)

The time plots for DY and GY for the Dulles passengers series are shown in Figures 2.9A and B. Both series show a fairly stable level over time, so that the mean becomes a useful summary again, although GY is trending slightly downwards, indicating a slowing in percentage growth.

Another feature of Figure 2.9A is that the variability in DY is much greater at the end of the series than it is at the beginning. By contrast, the GY series has more consistent fluctuations. We could claim that GY has a stable variance over time, a claim that it would be hard to make for DY. Which should we use? In part, the choice will depend upon the purpose behind the forecasting exercise, but an often reasonable guideline is a common-sense one Do you naturally think of changes in the time series in absolute terms or relative (i.e. percentage) terms? If the answer is absolute use DY; if it is relative use GY. In the present case, both transformed series show some unusual values and further investigation would be warranted.
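The two transforms are one-line computations. The sketch below applies equations (2.7), (2.9) and (2.10) to a short series; the passenger counts are made-up illustrative values, not the actual Dulles series.

```python
def first_difference(y):
    """DY_t = Y_t - Y_{t-1} (equation 2.7)."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def growth_rate(y):
    """GY_t = 100 (Y_t - Y_{t-1}) / Y_{t-1} (equation 2.9)."""
    return [100 * (y[t] - y[t - 1]) / y[t - 1] for t in range(1, len(y))]

# Illustrative passenger counts (hypothetical values)
y = [641, 700, 840, 924]
print(first_difference(y))                     # [59, 140, 84]
print([round(g, 1) for g in growth_rate(y)])   # [9.2, 20.0, 10.0]

# Forecasting with a predicted growth rate, equation (2.10):
g_hat = 10.0                                   # assumed growth forecast, percent
forecast = y[-1] * (1 + g_hat / 100)
print(round(forecast, 1))                      # 1016.4
```

Note that each transformed series is one observation shorter than the original, which is why the summary statistics below report N = 44 rather than 45.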

The summary statistics for DY and GY are:


² The use of G to describe the growth rate is non-standard; we use it for the same reason as above: it is a convenient mnemonic device.

Descriptive Statistics: Difference, Growth rate (from Dulles.xlsx)

Variable      N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Difference   44   1    413      232   1536    -4342    -74     221    552     5285
Growth rate  44   1   9.41     2.83  18.78   -26.99  -1.53    5.93  18.21    84.95

These figures also reflect the considerable fluctuations that appear in each series.

Figure 2.9A: Time plot for the first differences of the Dulles passengers series. [Dulles.xlsx]

Figure 2.9B: Time plot for the growth rates for the Dulles passengers series [Dulles.xlsx]


2.6.2 The log transform


In George Orwell's classic novel 1984 there is a scene where the chocolate ration is reduced by 50 percent and then increased by 50 percent. The main character, Winston, complains that he does not have as much chocolate as before, but he is sharply rebuked for his remarks. However, Winston is right, since
(1 − 50/100)(1 + 50/100) = 0.75,

so that Winston has 25 percent less chocolate than before. To avoid this asymmetry, we may use the logarithmic, or just log, transform, usually with the natural logarithm defined on the base e = 2.71828… . The log transform may be written as L_t = ln(Y_t) and the (first) difference in logarithms becomes:

DL_t = ln(Y_t) − ln(Y_{t−1}).    (2.11)

36

The primary purpose of the log transform is to convert exponential (or proportional) growth into linear growth. The transform often has the secondary benefit of stabilizing the variance, as did the use of growth rates. Indeed, the log and growth rate transforms tend to produce very similar results, as can be seen by comparing the plot of the log differences for the Dulles passengers series in Figure 2.10 with Figure 2.9B.

Figure 2.10: Time plot for the first difference of logarithms for the Dulles passengers series (DL_pass) [Dulles.xlsx]


If we generate a forecast of the log-difference, D̂L_t say, the forecast for the original series, given the previous value Y_{t−1}, becomes:

Ŷ_t = Y_{t−1} exp(D̂L_t).    (2.12)

Example 2.5: Calculation of forecast using log-differences The actual number of Dulles passengers for 2007 was 18,792 (in thousands). To make a forecast for 2008, we might use the last value for the log-difference as the forecast of DLt , which is 0.05496. Then equation (2.12) yields

Ŷ_t = 18792 exp(0.05496) = 18792 × 1.0565 = 19,854.
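Example 2.5 can be reproduced in a few lines using equation (2.12):

```python
from math import exp

# Reproduce Example 2.5: forecast 2008 Dulles passengers from the 2007 value,
# using the last observed log-difference as the forecast of DL_t.
y_2007 = 18792          # thousands of passengers (from the text)
dl_hat = 0.05496        # forecast log-difference (last observed value)

forecast = y_2007 * exp(dl_hat)      # equation (2.12)
print(round(forecast))               # 19854
```

The same answer, up to rounding, is obtained by forecasting a growth rate of 100 × (exp(0.05496) − 1) ≈ 5.65 percent and applying equation (2.10).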

2.7 How to measure forecasting accuracy?


A key question in any forecasting endeavor is how to measure performance. Such measures are of particular value when we come to select a forecasting procedure, since we may compare alternatives and choose the method with the best track record. Then, once the method is being used on a regular basis, we need similar measures to tell us whether the forecasts are maintaining their historical level of accuracy. If a particular set of forecasts is not performing adequately, managerial intervention will be needed to get things back on track, by putting improvements in place such as more timely data or better statistical methods and software (see Chapters 13 and 14).

The generation of forecasts and the selection of a preferred method will occupy a major portion of the book. Therefore, in order to discuss issues of accuracy without the need to develop forecasting methods explicitly at this stage, we consider an example taken from meteorology. Weather forecasts that appear in the media are not directed at a particular audience and there is no reason to suppose that forecasts of temperature would have any inherent bias. However, we would expect that such forecasts (and this is typically true of all forecasts) would become less accurate as the forecast horizon increases, in this case, the number of days ahead.

We consider a set of local forecasts for daily high temperatures, extracted from the Washington Post for the period December 17 2003 to January 5 2004. The forecasts are generated by Accuweather, a weather forecasting organization. The forecasts appear for 1 to 5 days ahead, so the initial data could be summarized as shown in Table 2.5 (first few days only). However, this form of presentation is not useful for the evaluation of the forecasts since, for example, the 4-days ahead forecast made on December 17 refers to conditions to be observed on December 21. To match forecasts to actual outcomes we must slide the columns down, as shown in Table 2.6. We may now compare forecasts in the same row.

Table 2.5: Temperature forecasts for Washington DC, December 17-22 2003. [Figures represent daily highs at Reagan National Airport. Source: Washington Post. DC weather.xlsx]
Date        Forecasts, days ahead           Actual
             1    2    3    4    5           Temp
17-Dec-03   42   40   42   44   48            50
18-Dec-03   36   38   42   50   54            38
19-Dec-03   38   40   52   54   52            37
20-Dec-03   44   52   56   54   48            40
21-Dec-03   52   58   56   48   48            44
22-Dec-03   58   56   44   48   50            57

Table 2.6: Temperature forecasts for Washington DC, December 17 2003 to January 5, 2004 [ Figures represent daily highs at Reagan National Airport; DC weather.xlsx]
Date        Forecasts, days ahead           Actual
             1    2    3    4    5           Temp
17-Dec-03    -    -    -    -    -            50
18-Dec-03   42    -    -    -    -            38
19-Dec-03   36   40    -    -    -            37
20-Dec-03   38   38   42    -    -            40
21-Dec-03   44   40   42   44    -            44
22-Dec-03   52   52   52   50   48            57
23-Dec-03   58   58   56   54   54            62
24-Dec-03   54   56   56   54   52            56
25-Dec-03   45   44   44   48   48            43
26-Dec-03   47   46   49   48   48            44
27-Dec-03   54   52   54   54   50            54
28-Dec-03   54   55   54   50   54            52
29-Dec-03   60   59   58   54   54            60
30-Dec-03   55   53   49   49   51            54
31-Dec-03   53   50   55   53   48            49
1-Jan-04    52   52   56   57   50            55
2-Jan-04    54   54   45   52   48            50
3-Jan-04    69   64   64   62   60            68
4-Jan-04    64   64   62   60   56            72
5-Jan-04    62   58   54   54   51            49

A general format following the structure of Table 2.6 is shown in Table 2.7, where Y_t denotes the actual value in period t and F_{t|t−h} denotes the forecast made in period t−h for period t, the h-step-ahead forecast. Period t−h is called the forecast origin. Often, we are interested in one-step-ahead forecasts and we then simplify the notation³ to F_t instead of F_{t|t−1}. Thus, F_15 refers to the one-step-ahead forecast made for period 15 at time 14, F_{15|13} to the two-step-ahead forecast made for period 15 at time 13, and so on. These values will eventually be compared to the observed value in period 15, Y_15.
³ The notation for forecasts is not standard. Some texts use F_{t+h} to denote forecasts h steps ahead for Y_{t+h}. While this notation is simpler than ours, and appears to work well when expressed algebraically as here, the notation F_{13+2} [since it is not equal to F_15!] is potentially confusing and F_{15|13} is clearer.

40

The forecast origin is the time period from which the forecasts are made.

Table 2.7: Structure of forecasts for 1, 2, 3, … periods ahead

Period   Days ahead that forecasts were made                          Actual
          1                        2             3             …
t−1      F_{t−1} or F_{t−1|t−2}   F_{t−1|t−3}   F_{t−1|t−4}          Y_{t−1}
t        F_t or F_{t|t−1}         F_{t|t−2}     F_{t|t−3}            Y_t
t+1      F_{t+1} or F_{t+1|t}     F_{t+1|t−1}   F_{t+1|t−2}          Y_{t+1}
t+2      F_{t+2} or F_{t+2|t+1}   F_{t+2|t}     F_{t+2|t−1}          Y_{t+2}

2.7.1 Measures of forecast accuracy

Now that we have a set of forecasts and actual values with which to compare them, how should the comparisons be made? A natural approach would be to look at the differences between the observed values and the forecasts, and to use their average as a performance measure. Suppose that we start from forecast origin t, so that the forecasts are made successively (one-step-ahead) for times t+1, t+2, …, t+m; there being m such forecasts in all. The one-step-ahead forecast error at time t+i may be denoted by:

e_{t+i} = Y_{t+i} − F_{t+i}.

A possible indicator is the mean of the errors. The Mean Error (ME) is given by:

ME = (1/m) Σ_{i=1}^{m} (Y_{t+i} − F_{t+i}) = (1/m) Σ_{i=1}^{m} e_{t+i}.    (2.13)

The Mean Error is a useful way of detecting bias in a forecast; that is, ME will be large and positive (negative) when the actual value is consistently greater (less) than the forecast. When the variable of interest is strictly positive, as with the number of employees or sales revenues, a percentage measure is often more useful.

The Mean Percentage Error (MPE) is

MPE = (100/m) Σ_{i=1}^{m} (Y_{t+i} − F_{t+i}) / Y_{t+i} = (100/m) Σ_{i=1}^{m} e_{t+i} / Y_{t+i}.    (2.14)

Note that the ME is a useful measure for the temperature data, but MPE is not, since the temperature can fall below zero. More importantly, temperature does not have a natural origin, so that the MPE would give different (and equally meaningless) results depending on whether we used the Fahrenheit or Celsius scales.

Example 2.6: Calculation of ME and MPE (Electric errors.xlsx) The calculations of ME and MPE are illustrated in Table 2.8. The data in this table represent the monthly electricity consumption (in KWH, kilowatt hours) in a Washington DC household for 2003; the column of forecasts represents the consumption in the corresponding month in 2002. Consumption is low in the winter and high in the summer because the home uses gas heating and electric air conditioning.

As noted, the ME and MPE are useful measures of bias; from Table 2.8 we see that the household generally reduced its consumption over the year, so the forecasts tended to be too high. In passing, we note that the year-on-year change is given by comparing the totals for 2002 and 2003, 13,190 and 11,270 KWH respectively, which results in a 14.6% drop. The 18.7 percent average given by the MPE reflects month-by-month forecasting performance, not the change in the totals.

A limitation of these measures is that they do not reflect variability. Positive and negative errors could virtually cancel each other out, yet substantial forecasting errors could remain. To see this effect, suppose we used the average monthly figure for 2002 to predict the months of 2003. The average is 1099 KWH, a figure that seriously underestimates summer consumption and overestimates the rest of the year. Yet the ME would be unchanged. The MPE expands to -37.2 since the errors are larger in the months with low consumption; however, this apparent gain is largely illusory. For example, a forecast value of 800 KWH per month is clearly not very useful, yet it reduces the MPE to 0.1, as shown in Table 2.9! From this discussion it is evident that we also need measures that take account of the magnitude of an error regardless of sign.

2.7.2 Measures of absolute error

The simplest way to gauge the variability in forecasting performance is to examine the absolute errors, defined as the value of the error ignoring its sign and expressed as:

|e_i| = |Y_i − F_i|.    (2.15)

Thus, if we generate a forecast of F = 100, the absolute error is 20 whenever the actual value turns out to be either 80 or 120. As before, we may consider various averages, based upon the absolute errors. Those in common use are:

Mean Absolute Error:

MAE = (1/m) Σ_{i=1}^{m} |Y_{t+i} − F_{t+i}| = (1/m) Σ_{i=1}^{m} |e_{t+i}|.    (2.16)

Mean Absolute Percentage Error:

MAPE = (100/m) Σ_{i=1}^{m} |Y_{t+i} − F_{t+i}| / Y_{t+i} = (100/m) Σ_{i=1}^{m} |e_{t+i}| / Y_{t+i}.    (2.17)

Mean Square Error:

MSE = (1/m) Σ_{i=1}^{m} (Y_{t+i} − F_{t+i})² = (1/m) Σ_{i=1}^{m} e²_{t+i}.    (2.18)

Root Mean Square Error:

RMSE = √MSE.    (2.19)

Mean Absolute Scaled Error:

MASE = Σ_{i=1}^{m} |Y_{t+i} − F_{t+i}| / Σ_{i=1}^{m} |Y_{t+i} − Y_{t+i−1}|.    (2.20)

MASE is a new measure introduced by Hyndman and Koehler (2005). The MASE is the ratio of the MAE for the current set of forecasts relative to the MAE for forecasts made using the random walk; the random walk forecast specifies the most recent observation as the forecast for the next period. When the MASE is greater than one, we may conclude that the random walk forecasts are superior. When MASE is less than one, the method under consideration is superior to the random walk.

The following comments are in order:
1. MAPE should only be used when Y > 0; MASE is not so restricted.
2. MAPE is the most commonly used error measure in practice. It is sensitive to values of Y close to zero, in which case the median version, MdAPE, can be used in its place.
3. The RMSE is used since the MSE involves squared errors, so that if the original series is in dollars, the MSE is measured in terms of (dollars)². Taking the square root to obtain the RMSE restores the original units.
4. The RMSE gives greater weight to large (absolute) errors. It is therefore sensitive to extreme errors. It may be shown that RMSE ≥ MAE for any set of m forecasts.
5. The measure using absolute values always equals or exceeds the absolute value of the measure based on the errors, so that MAE ≥ |ME| and MAPE ≥ |MPE|. If the values are close in magnitude, that suggests a systematic bias in the forecasts.
6. Both MAPE and MASE are scale-free and so can be used to make comparisons across multiple series. The other measures are scale-dependent and cannot be used to make such comparisons without an additional scaling.
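Equations (2.13)-(2.20) can be collected into a single function. The sketch below applies them to the electricity data of Table 2.8; the function name is our own. Note that the MASE denominator here uses the m−1 random walk errors available within the sample, which reproduces the random walk MAE of 263.6 reported in Table 2.8(b).

```python
from math import sqrt

def error_measures(actual, forecast):
    """ME, MPE, MAE, MAPE, RMSE and MASE, following equations (2.13)-(2.20).
    MASE is scaled by the MAE of one-step random walk forecasts formed
    within the evaluation sample (m - 1 naive errors)."""
    m = len(actual)
    e = [a - f for a, f in zip(actual, forecast)]
    me = sum(e) / m
    mpe = 100 * sum(ei / a for ei, a in zip(e, actual)) / m
    mae = sum(abs(ei) for ei in e) / m
    mape = 100 * sum(abs(ei) / a for ei, a in zip(e, actual)) / m
    rmse = sqrt(sum(ei ** 2 for ei in e) / m)
    rw_mae = sum(abs(actual[t] - actual[t - 1]) for t in range(1, m)) / (m - 1)
    return {"ME": me, "MPE": mpe, "MAE": mae, "MAPE": mape,
            "RMSE": rmse, "MASE": mae / rw_mae}

# Electricity consumption (Table 2.8): 2003 actuals, 2002 values as forecasts
actual   = [790, 810, 680, 500, 520, 810, 1120, 1840, 1600, 1250, 740, 610]
forecast = [820, 790, 720, 640, 780, 980, 1550, 1850, 1880, 1600, 890, 690]

result = error_measures(actual, forecast)
print({k: round(v, 1) for k, v in result.items()})
# {'ME': -160.0, 'MPE': -18.7, 'MAE': 163.3, 'MAPE': 19.1, 'RMSE': 210.9, 'MASE': 0.6}
```

The values match Table 2.8, with MASE = 163.3/263.6 = 0.62 as in Table 2.9.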

Example 2.7: Calculation of absolute error measures

The absolute error measures are computed in Table 2.8 for the electricity forecasts. The individual terms are shown in the various columns and MAE, MAPE and MSE are then evaluated as the column averages. RMSE follows directly from equation (2.19); that is, by taking the square root of the MSE. The lower part of the table yields the MAE for the random walk forecasts, so that the MASE is given by the ratio of the forecast MAE to the MAE of the random walk forecasts. That is, relative to the random walk method, the forecasts based upon the same month in the previous year provide a 38 percent [= 100 × (263.6 − 163.3)/263.6] reduction in the mean absolute error.

Table 2.8: Analysis of forecasting accuracy for electricity consumption in a Washington DC household [The monthly forecasts for 2003 are the corresponding actual values for 2002] (Electric Errors.xlsx)

(a) Error analysis for actual forecasts

Month    Actual  Forecast     Errors  Absolute  Percentage  Absolute %   Squared
                 [= 2002]              errors     errors      errors      errors
Jan-03      790       820       -30       30       -3.8         3.8          900
Feb-03      810       790        20       20        2.5         2.5          400
Mar-03      680       720       -40       40       -5.9         5.9         1600
Apr-03      500       640      -140      140      -28.0        28.0        19600
May-03      520       780      -260      260      -50.0        50.0        67600
Jun-03      810       980      -170      170      -21.0        21.0        28900
Jul-03     1120      1550      -430      430      -38.4        38.4       184900
Aug-03     1840      1850       -10       10       -0.5         0.5          100
Sep-03     1600      1880      -280      280      -17.5        17.5        78400
Oct-03     1250      1600      -350      350      -28.0        28.0       122500
Nov-03      740       890      -150      150      -20.3        20.3        22500
Dec-03      610       690       -80       80      -13.1        13.1         6400

ME = -160.0    MAE = 163.3    MSE = 44483.3
MPE = -18.7    MAPE = 19.1    RMSE = 210.9

(b) Error analysis for random walk

Month    Actual  Random walk forecast   Errors  Absolute errors
Jan-03      790       690                   -          -
Feb-03      810       790                  20         20
Mar-03      680       810                -130        130
Apr-03      500       680                -180        180
May-03      520       500                  20         20
Jun-03      810       520                 290        290
Jul-03     1120       810                 310        310
Aug-03     1840      1120                 720        720
Sep-03     1600      1840                -240        240
Oct-03     1250      1600                -350        350
Nov-03      740      1250                -510        510
Dec-03      610       740                -130        130

MAE = 263.6

Table 2.9 provides a comparison of three sets of forecasts for electricity consumption:
- Last year's values, as given in Table 2.8;
- The yearly average (= 1099);
- All months set at 800.
The random walk forecasts (as above) provide the scaling used in computing the MASE.

Table 2.9: Comparison of forecasts for electricity data (Electric Errors.xlsx)

                            Error Measure
Forecast              ME     MPE    MAE   MAPE   RMSE   MASE
Last year's values  -160   -18.7    164   19.1    211   0.62
Monthly average     -160   -37.2    396   51.5    440   1.67
All F = 800          139     0.1    299   28.8    433   1.64

From the table, we see that the forecasts based upon last year's figures are clearly more accurate. Indeed, some local utilities use these forecasts to estimate customers' bills when no meter recordings are available. The ME and MPE values indicate that the forecasts were somewhat biased. Did the household deliberately try to conserve energy? Perhaps, but another factor was certainly that 2003 had a cooler summer; not something that could be reliably forecast at the beginning of January.

Example 2.8: Comparison of weather forecasting errors

We may now use these measures to assess the performance of the forecasts presented in Table 2.6. The results are given in Table 2.10; the MPE and MAPE are not reported since they are not sensible measures in this case. As is to be expected, the MAE and RMSE generally increase as the forecast horizon is extended; forecasts should improve as we get nearer to the event. The MASE increases because we used the first-order lag to define the random walk. If we had used the same order of lag as the original forecasts, the MASE would be more similar to the lag-one value.

Table 2.10: Summary of forecast errors for weather data [DC_weather.xlsx]

            Steps Ahead
Measure     1       2       3       4       5
ME        -0.47    0.61    1.00    1.63    3.53
MAE        3.11    3.17    3.59    4.38    5.27
RMSE       4.35    3.92    4.41    5.49    6.43
MASE       0.40    0.44    0.55    0.68    0.87

Theoretically, the RMSE for forecasts should increase as the lead time increases. However, we note that this expectation may be violated because of the small numbers of observations being used to compute the summary measures, as illustrated in the table for lags 1 and 2.

2.8 Prediction intervals


Thus far, our discussion has centered upon point forecasts; that is, future observations for which we report a single forecast value. For many purposes, managers seem to feel comfortable with a single figure. However, such confidence in a single number is often misplaced.

Consider, for example, the generation of weekly sales forecasts. Our method might generate a forecast for next week of 600 units. The manager who plans for the sale of exactly 600 units and never considers the possibility of selling more or fewer units is a fool, and will probably become an ex-manager fairly quickly! Why? Because the demand for most products is inherently variable. Some weeks will see sales below the forecast level and some will see more. When sales fall short of the point forecast, the business will incur holding costs for unsold inventory or may have to destroy perishable stock. When sales exceed the point forecast, not only will the business lose sales, but disappointed customers may go elsewhere in the future. The best choice of inventory level will depend upon the relative costs of lost sales and excess inventory, together with the statistical distribution of possible sales, known as the predictive distribution. The selection of the best inventory level in this case is known as the newsvendor problem, since it was originally formulated in the context of selling newspapers. Our purpose here is not to dwell upon the details, which may be found in most management science texts, such as Winston and Albright (2001), but rather to emphasize the fundamental role that the predictive distribution plays in such cases.

If we know the (relative) magnitudes of the costs of lost sales and of excess inventory we may define an overall cost function and then select the level of inventory to minimize cost; see Exercise 2.13 for further details.
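As a sketch of that calculation: under the standard newsvendor result, the cost-minimizing stock level sits at the critical fractile cu/(cu + co) of the predictive distribution, where cu is the unit cost of a lost sale and co the unit cost of excess inventory. The unit costs below are hypothetical, and a normal predictive distribution is assumed.

```python
from statistics import NormalDist

# Hypothetical unit costs: lost sale costs 4, excess inventory costs 1
cu, co = 4.0, 1.0
fractile = cu / (cu + co)              # critical fractile = 0.8

# Predictive distribution: point forecast 600, SD 25 (assumed normal)
mean, sd = 600, 25
stock = NormalDist(mean, sd).inv_cdf(fractile)
print(round(stock))                    # 621
```

With cu = co the fractile is 0.5 and the rule reduces to stocking the point forecast (the median of the predictive distribution); as lost sales become relatively more costly, the stock level rises above the forecast.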

Sometimes, these costs are difficult to assess, and the manager will prefer to guarantee a certain level of service. For example, suppose that we wish to meet demand 95 percent of the time. We then need to add a safety stock to the point forecast to ensure that the probability of a stock-out is no more than 5 percent. Typically, we assume that the predictive distribution for demand follows the normal law (although such an assumption is at best an approximation and needs to be checked).⁴ If we assume that the standard deviation (SD) of the distribution is known, we may use the upper 95 percent point of the standard normal distribution (this value is 1.645; see Appendix Table A1), so the appropriate stock level is:

Mean + 1.645 × SD.    (2.21)

The mean in this case is the point forecast. Thus, if the point forecast is 600 with an associated SD of 25, the manager would stock 600 + 1.645*25 or 641 units to achieve the desired level of customer service.

⁴ The normal distribution is by far the most widely used in the construction of prediction intervals. This usage makes it critical to check that the forecast errors are approximately normally distributed. See Appendix X.

50

Expression (2.21) is an example of a one-sided prediction interval: the probability is 0.95 that demand would be equal to or less than 641, assuming that our forecasting method is appropriate for the sales of that particular product. Typically the SD is unknown and must be estimated from the sample that was used to generate the point forecast; that is, we use the RMSE to estimate the SD. Further, in most forecasting applications, it is more common to employ two-sided prediction intervals. Putting these ingredients together, we define the two-sided 100(1−α) percent prediction interval as:

Forecast ± z_{α/2} × RMSE.    (2.22)

Here z_{α/2} denotes the upper 100(1−α/2) percentage point of the normal distribution. We should recognize at this point that although we are using the sample value of RMSE to estimate the SD, we are not making any allowance for this fact. In Chapter 6, we define prediction intervals more precisely. For present purposes, expression (2.22) will suffice.

The general purpose of such intervals is to provide an indication of the reliability of the point forecasts. The limits derived from (2.22) are sometimes expressed as optimistic and pessimistic forecasts; such nomenclature is useful as a way of presenting the concept to others, but a precise formulation of the limits as in (2.22) should be used rather than a vague assessment of extreme outcomes.

Example 2.9: Evaluation of prediction intervals

The RMSE of the one-step-ahead forecasts for the weather data is given in Table 2.10 as 4.35. We may use this sample value as an estimate of the RMSE for the process. Thus the 95 percent one-step-ahead prediction intervals would be:

Point forecast ± 1.96*4.35 = Point forecast ± 8.53.

Comparison of forecast and actual values in Table 2.6 reveals that one value out of 19 lies outside these limits (that of Jan 5). We also may observe that the extreme changes in the weather at the end of the sample period led to greater inaccuracies in forecasting, and a considerable increase in the RMSE. Finally, note that prediction intervals may be used for retrospective analysis as here, but their primary purpose is to provide assessments of uncertainty for future events.
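Such a retrospective check amounts to counting how many one-step-ahead errors exceed z*RMSE in absolute value. A minimal sketch, with purely illustrative error values rather than the actual weather series:

```python
from statistics import NormalDist

def interval_coverage(errors, rmse, coverage=0.95):
    """Fraction of one-step-ahead errors that fall inside +/- z * RMSE."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - coverage) / 2.0)
    inside = sum(1 for e in errors if abs(e) <= z * rmse)
    return inside / len(errors)

# Illustrative errors (not the actual data): only -9.8 exceeds 1.96 * 4.35
errors = [1.2, -3.0, 4.1, -0.5, 2.2, -9.8, 0.4, 3.3, -2.1, 1.0]
print(interval_coverage(errors, rmse=4.35))  # 0.9
```

With a nominal 95 percent interval, coverage well below 0.95 in a reasonably long error record suggests the RMSE understates the true uncertainty.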

A detailed discussion of prediction intervals must await the formal development of forecasting models. The reader who wishes to preview these discussions should consult sections 6.xx.

An alternative approach to using theoretical formulae when calculating prediction intervals is to use the observed errors to show the range of variation expected in the forecasts. For the extended weather data set we can calculate the one-step-ahead errors made using the random walk forecasts. These errors form a histogram, as shown in Figure ?. From this we can see that 90% of the forecast errors fall within the interval ?.

We can also fit a theoretical probability density to the observed errors, as follows. We use a normal distribution here, but other distributions are possible, since in many applications more extreme errors are observed than a normal distribution would suggest. Fitting a distribution gives us more precise estimates of the prediction intervals; these are called empirical prediction intervals. To be useful, empirical prediction intervals need to be based on a large sample of errors.
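The simplest empirical variant skips the fitted density altogether and reads the interval limits off the sample percentiles of past errors. A sketch under that assumption, using a nearest-rank percentile rule and illustrative errors (the function name is our own):

```python
def empirical_interval(forecast, errors, coverage=0.90):
    """Prediction interval from the empirical percentiles of past errors
    (nearest-rank rule; needs a reasonably large error sample to be useful)."""
    errs = sorted(errors)
    n = len(errs)
    alpha = 1.0 - coverage
    lo = errs[int(round(alpha / 2 * (n - 1)))]
    hi = errs[int(round((1 - alpha / 2) * (n - 1)))]
    return forecast + lo, forecast + hi

# Illustrative errors: 21 equally likely values from -10 to 10
interval = empirical_interval(100, list(range(-10, 11)), coverage=0.90)
print(interval)  # (91, 109) with these illustrative errors
```

Because the limits come straight from the observed errors, asymmetry and heavy tails in the error distribution are reflected automatically, at the cost of needing many past errors for stable estimates.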


2.9 Basic Principles


This book, like most other texts on business forecasting, tends to devote most of its space to discussions of forecasting methods and underlying statistical models. However, if the groundwork is not properly laid, the best methods in the world cannot save the forecaster from the effects of poor data selection and inadequate preparation. In this and later chapters, we select principles from Armstrong (2001) and other sources that are particularly relevant to the material just covered. We number these principles in order within each chapter to facilitate cross-referencing, but we also give the original number from Armstrong (2001) in square brackets where appropriate. The principles have been reworded to meet present needs and are not direct quotations; however, the cross-reference is cited wherever a given principle adheres to the spirit of the original statement.

Principle 2.1 [2.2] Ensure that the data match the forecasting situation.

Once the underlying purpose of the forecasting exercise has been specified, the ideal data set can be identified. However, there are many reasons why the ideal may not be available. For example, macroeconomic data are published with a lag that may be of several months' duration and, even then, may be published only as a preliminary estimate. The forecaster needs to examine the available data with respect to the end use to which the forecasts will be put, and make sure that a match exists.

Principle 2.2 [5.1] Clean the data.

Data may be wrongly recorded, omitted, or affected by changing definitions. Adjustments should be made where necessary, but a record of such changes should be kept and made

available to users of the forecasts. Data cleaning can be very time-consuming, although the plots and numerical summaries described in this chapter will go a long way towards identifying data errors. Failure to clean data can lead to the familiar situation of garbage in, garbage out.

Principle 2.3 [5.2] Use transformations as required by expectations.

We considered differences, growth rates and log transforms in section 2.6. The forecaster needs to consider whether the original measurements provide the most appropriate framework for generating forecasts or whether some form of transformation is desirable. The basic pattern of no growth (use original data), linear growth (use differences) or relative growth (use growth rates or log differences) will often provide adequate guidance. This issue will be revisited periodically in later chapters.

Principle 2.4 [5.8] Use graphical displays for data.

As we have seen in sections 2.2 and 2.3, plotting the data can provide a variety of insights and may also suggest suitable transformations or adjustments. Graphical analysis should always be the first step when developing forecasting procedures, even if applied to only a small sample from a larger set of series.

Principle 2.5 [5.4] Adjust for unsystematic past events (e.g. outliers).

Data may be affected by the weather, political upheavals, supply shortages, or other events. Such factors need to be taken into account when clear reasons can be identified for the unusual observations. The forecaster should resist the temptation to give the data a face-lift by over-adjusting for every minor event.

Principle 2.6 [5.5] Adjust for systematic events (e.g. seasonal effects).


Systematic events such as weekends, public holidays and seasonal patterns can affect the observed process and must be taken into account. We will discuss these adjustments in chapters 9 and X.

Principle 2.7 [13.20, modified] Use error measures that adjust for scale in the data when comparing across series.

When you compare forecasts for a single series, scale-dependent measures such as MAE or RMSE are useful. However, when you compare across different series, you should use scale-free measures such as MAPE (if appropriate) or MASE.

Principle 2.8 [13.25] Use multiple measures of performance based upon the errors.

If forecast users are able to compare performance using different measures, they will be better able to assess performance relative to their particular needs. Multiple measures allow users to focus on those attributes of a forecasting procedure that they deem most relevant and also to check on the robustness of their conclusions. For example, one user may value simplicity and be willing to accept somewhat reduced accuracy in order to keep things simple. Another may wish to avoid large errors, in which case the RMSE becomes most relevant, since it depends upon squared errors. A third may avoid RMSE precisely because it gives such weight to large errors.

Summary
In this chapter we have described the basic tools of data analysis, particularly as they relate to the analysis of time series. In particular, we examined:
- Scatter plots and time series plots for preliminary analysis of the data (sections 2.2 and 2.3)

- Basic summary statistics for individual variables (section 2.4)
- Correlation as a measure of association for cross-sectional data (section 2.5)
- Transformations of the data (section 2.6)
- Measures of forecasting accuracy (section 2.7)
- Prediction intervals as a measure of the uncertainty related to point forecasts (section 2.8).

Finally, in section 2.9 we briefly examined some of the underlying principles that should be kept in mind when starting out on a forecasting exercise.

References
Anderson, D.R., Sweeney, D.J. and Williams, T.A. (2005). Statistics for Business and Economics, 9th edition. Mason, OH: South-Western.

Armstrong, J.S. (2001). Principles of Forecasting: A Handbook for Researchers and Practitioners. Boston, MA: Kluwer.

Hyndman, R.J. and Koehler, A.B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22, 679-688.

Winston, W. and Albright, S.C. (2001). Practical Management Science, 2nd edition. Pacific Grove, CA: Duxbury.

Exercises
2.1 The average monthly temperatures for Boulder, Colorado from January 1991 to September 2008 are given in Boulder.xlsx [Source: U.S. Department of Commerce, National Oceanic and Atmospheric Administration]. Plot the time

series and also create a seasonal plot for the first four years of the series. Comment upon your results.

2.2 The following table contains data on railroad passenger injuries in the U.S. (rail safety.xlsx) from 1990 to 2007. Injuries represents the number of persons injured in the given year, train-miles denotes the millions of miles travelled by trains, and the final column is the ratio describing the number of injuries per 100 million miles travelled.
a. Create a scatterplot for injuries against train-miles.
b. Plot each of the three time series.
c. Does the level of injuries appear to be changing over time? If so, in what way?

Year    Injuries    Train-miles    Injuries per 100M T-M
1990    473     72    657
1991    382     74    516
1992    411     74    555
1993    559     75    745
1994    497     75    663
1995    573     76    754
1996    513     77    666
1997    601     78    770
1998    535     78    683
1999    481     82    584
2000    658     84    781
2001    746     88    850
2002    877     90    979
2003    726     89    812
2004    679     89    760
2005    935     90    1,040
2006    761     92    828
2007    938     95    990

Source: U.S. Department of Transportation, Federal Railroad Administration.

2.3 An investor has a portfolio consisting of holdings in nine stocks (returns.xlsx). The end-of-year returns over the previous year are: -5.0, -3.7, 0.9, 4.8, 6.2, 8.9, 11.2, 18.6, 25.4.
a. Compute the summary statistics (mean, median, MAD and S).
b. Just before the close of business in the last trading session of the year, the company that had reported the -5.0 percent drop declares bankruptcy, so that the return becomes -100 percent. Re-compute the results and comment on your findings.
c. Are simple summary statistics relevant to this investor? How would you modify the calculations, if at all?

2.4 For the temperature data (Boulder.xlsx) in Exercise 2.1, compute the summary statistics (mean, median, MAD and S) for each month. Comment upon your results. Does it make sense to compute summary statistics across all values, rather than month by month? Explain why or why not.

2.5 Compute the summary statistics (mean, median, MAD and S) for each of the variables listed in Exercise 2.2 (rail safety.xlsx). Are these numbers a sensible summary of safety conditions?

2.6 Calculate the correlation between the monthly values of electricity consumption (electricity.xlsx) for 2002 (listed as forecasts in the table) and 2003 in Table 2.8. Interpret the result.

2.7 Compute the correlations for the hot growth companies in Table 2.3 (Growth companies.xlsx) for each pair of variables Rank, P-E Ratio and Return, using the first 50 and the second 50 separately. Compare these results with those given in Example 2.4. Explain the differences.

2.8 The quarterly sales figures and percentage growth figures for Netflix are given in Netflix.xlsx and the table below. Produce time series plots for each of quarterly sales, absolute growth and the growth rate.

a. Are the mean and median useful in this case? Explain why or why not.
b. Calculate the growth rate for each quarter relative to the same quarter in the previous year; that is, for 2001 Q1 we have 100*(17.06 - 5.17)/5.17 = 230. After allowing for the start-up phase of the company, do sales show signs of leveling off?

Year    Quarter    Quarterly Sales    Growth-absolute    Growth-percent
2000    1    5.17      *        *
2000    2    7.15      1.97     38.1
2000    3    10.18     3.04     42.5
2000    4    13.39     3.21     31.5
2001    1    17.06     3.67     27.4
2001    2    18.36     1.30     7.6
2001    3    18.88     0.52     2.8
2001    4    21.62     2.74     14.5
2002    1    30.53     8.91     41.2
2002    2    36.36     5.83     19.1
2002    3    40.73     4.37     12.0
2002    4    45.19     4.46     10.9
2003    1    55.67     10.48    23.2
2003    2    63.19     7.52     13.5
2003    3    72.20     9.02     14.3
2003    4    81.19     8.99     12.4
2004    1    99.82     18.63    22.9
2004    2    119.71    19.89    19.9
2004    3    140.41    20.70    17.3
2004    4    140.66    0.25     0.2
2005    1    152.45    11.79    8.4
2005    2    164.03    11.58    7.6
2005    3    172.74    8.71     5.3
2005    4    193.00    20.26    11.7
2006    1    224.13    31.13    16.1
2006    2    239.35    15.22    6.8
2006    3    255.95    16.60    6.9
2006    4    277.23    21.28    8.3
2007    1    305.32    28.09    10.1
2007    2    303.69    -1.63    -0.5
2007    3    293.97    -9.72    -3.2
2007    4    302.36    8.39     2.9

Source: Netflix Annual Reports


2.9 Use the data in Boulder.xlsx, sheet Data(2), to generate 12-month-ahead forecasts by using the value for the same month in the previous year. Compute the ME, MAE, RMSE and MASE for these forecasts. Repeat the analysis using the monthly averages calculated in Exercise 2.4.
a. Which set of forecasts appears to work better?
b. Is the comparison fair? [Hint: What do you know and when do you know it?]
c. What conclusions would you draw about the choice between the two methods?

2.10 The average temperatures (in degrees Fahrenheit) on the day (January 20th) of the President's inauguration (Inauguration.xlsx) are shown in the following table.
a. Summarize the data numerically and graphically.
b. Create a 95 percent prediction interval for the inauguration of President Obama in 2009.

Year    Temperature
1937    33
1941    29
1945    36
1949    38
1953    48
1957    44
1961    22
1965    37
1969    35
1973    42
1977    27
1981    55
1985    7
1989    52
1993    40
1997    34
2001    36
2005    35

[Source: Washington Post, November 12, 2008]

2.11 Compute 95 percent prediction intervals for the 12-month-ahead forecasts for temperature (Boulder.xlsx) generated in Exercise 2.9, using the estimated RMSE calculated there. Find the percentage of the observations that lie within the limits. Is this figure close to 95 percent?

2.12 Use the data in Table 2.8 (electricity.xlsx) to generate forecasts for electricity consumption for the household in 2004, based on the end of 2003 as the forecast origin. Generate 90 percent prediction intervals for these forecasts.

2.13 Following the discussion in section 2.8 on inventory management, construct a cost function assuming that the cost of lost sales is C times the cost of holding unsold stock. Past records show that the predictive distribution for future sales is:
P(Sales = x) = 1/21,    x = 90, 91, ..., 109, 110
P(Sales = x) = 0,    otherwise

Find the optimal inventory level, in the sense of minimizing overall cost, when C = 1 and when C = 3. What should the safety stock be to guarantee a service level of 90 percent?

Mini-case 2.1: Are the outcomes of NFL games predictable?


On Sunday, December 7, 2008, a total of fourteen games were played in the National Football League (NFL). Inevitably, many pundits attempt to predict the outcome of each game, and their individual performances are all over the map. Two of the best-known experts are Jeff Sagarin and Danny Sheridan. Their forecasts, published in USA Today on December 5th, are summarized in the table below (NFL.xlsx). The first column shows the eventual winner, the second column the loser, and the third the final margin of victory.

62

The Sagarin and Sheridan predictions (recorded as the points by which the winner was favored, or not) appear in the next two columns. Which criteria do you regard as appropriate for comparing the forecasts with the outcomes? How would you define a chance outcome? Is there any evidence to suggest that the experts do better than chance? If you were paying a fee to obtain these forecasts, would you favor one expert over the other, or are they sufficiently different to merit obtaining forecasts from both sources?

Winner           Loser              Margin    Sagarin    Sheridan
Baltimore        Washington         14        8.66       5
Minnesota        Detroit            4         14.53      10
Tennessee        Cleveland          19        14.77      13.5
Houston          Green Bay          3         -5.7       -5.5
New Orleans      Atlanta            4         -0.4       3
Chicago          Jacksonville       13        7.51       6.5
Philadelphia     New York Giants    6         -9.52      -7
Indianapolis     Cincinnati         32        14.73      13.5
Miami            Buffalo            13        2.6        -1
New England      Seattle            3         6.98       4.5
San Francisco    New York Jets      10        -6.2       -3.5
Denver           Kansas City        7         9.58       9
Arizona          Saint Louis        24        14.78      14
Pittsburgh       Dallas             7         6.49       3


Mini-Case 2.2: Whither Wal-Mart?


As Wal-Mart has grown, so its stock has proved to be a solid investment in both good times and bad. In order to determine whether a future investment in the stock would be worthwhile, we need to consider the plans the company has for future growth. The annual reports provide a considerable amount of information; see http://walmartstores.com/investors/. One particular aspect of Wal-Mart's future strategy is its investment in different types of retail outlet, known as Wal-Mart Stores, Superstores and Sam's Clubs. As the name suggests, the Superstore may be thought of as an upgrade of the Wal-Mart store, being generally larger in size and carrying a wider range of merchandise. The Sam's Clubs are more oriented to bulk purchasing. The spreadsheet Walmart.xlsx provides annual data on the numbers of each type of store within the U.S. at the fiscal year-end (January 31) for the period 1995-2008.

Another feature of interest to the potential investor is the growth in sales over time. The spreadsheet also provides quarterly sales figures for the period 2003, Q1 through 2007, Q4.
1) Summarize the changes in types of store over the period. What does this say about Wal-Mart's plans for the future?
2) Compute the growth in sales over time. Is there any evidence that the rate of growth is slowing or increasing?

Table 2.11: Summary results for Wal-Mart [Source: Wal-Mart Annual Reports, 1994-2008]

Number of stores at fiscal year-end:
Year    Wal-Mart Stores    Superstores    Sam's Clubs
1995    2176    154     453
1996    2218    255     470
1997    1960    344     436
1998    1921    441     443
1999    1860    564     451
2000    1801    721     463
2001    1736    888     475
2002    1647    1066    500
2003    1568    1258    525
2004    1478    1471    538
2005    1353    1713    551
2006    1209    1980    567
2007    1075    2256    579
2008    971     2447    591

Quarterly sales:
Year    Quarter    Sales ($ billion)
2003    3    62.6
2003    4    62.5
2004    1    74.5
2004    2    64.8
2004    3    69.7
2004    4    68.5
2005    1    82.2
2005    2    70.0
2005    3    75.9
2005    4    74.6
2006    1    88.4
2006    2    78.8
2006    3    84.5
2006    4    83.5
2007    1    98.1
2007    2    85.4
2007    3    92.0
2007    4    90.9
2008    1    106.3

Note: Financial year ends on January 31st.

Mini-Case 2.3: Economic recessions


The data given below summarize the length of each recession in the period 1929-2008, as determined by the National Bureau of Economic Research.
1) Calculate the average length of a recession and provide a 95 percent confidence interval for this quantity. Interpret the result.

65

2) Calculate the average time between recessions and provide a 95 percent confidence interval for this quantity. Interpret the result.
3) Is there any correlation between the length of a recession and the period of growth immediately preceding it (referred to as Gap in the table)?
4) Is there any correlation between the length of a recession and the period of growth immediately following a recession?
5) Comment on your findings.

Information on recessions in the U.S. economy, 1929 2008 [Source: National Bureau of Economic Research]
Onset     Duration (months)    End             Gap (months)
Aug-29    43                   Feb-33          *
May-37    13                   May-38          50
Feb-45    8                    Sep-45          80
Nov-48    11                   Sep-49          37
Jul-53    10                   Apr-54          45
Aug-57    8                    Mar-58          39
Apr-60    10                   Jan-61          24
Dec-69    11                   Oct-70          106
Nov-73    16                   Feb-75          36
Jan-80    6                    Jun-80          58
Jul-81    16                   Oct-82          12
Jul-90    8                    Feb-91          92
Mar-01    8                    Oct-01          120
Dec-07    12                   (continuing)    73

